Temporal Pyramid Pooling Based Relation Network for Action Recognition
Abstract
Efficient spatiotemporal representations play a vital role in video understanding. In this study, we propose a novel temporal pyramid pooling based relation network (TPPRN) that learns spatiotemporal representations in an end-to-end fashion. Specifically, TPPRN pools, at multiple temporal scales, the high-level features that a convolutional neural network extracts from sampled frames. The features of same-length segments are then concatenated so that the network can reason about the relations within segments of the same length. Finally, the different relations are aggregated to make a comprehensive prediction. Our first contribution is a carefully designed sampling strategy: it splits a video evenly into three clips and uniformly samples four frames from each, which reduces computation and memory cost. Our second contribution is multi-scale temporal pyramid pooling, which provides segments of various granularities for the relation module to reason over. Experimental results on two standard benchmarks, HMDB-51 and UCF-101, demonstrate the effectiveness of the learned spatiotemporal representations, and the proposed TPPRN achieves performance comparable to the state-of-the-art.
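The two contributions described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names, the choice of average pooling, and the pyramid scales `(2, 3, 4)` are assumptions made for clarity; the paper only specifies the 3-clip / 4-frame sampling and multi-scale pooling over same-length segments.

```python
import numpy as np

def sample_frame_indices(num_frames, num_clips=3, frames_per_clip=4):
    """Split a video into `num_clips` even clips and sample
    `frames_per_clip` frames uniformly within each clip (3x4 = 12
    frames total, matching the sampling strategy in the abstract)."""
    clip_len = num_frames / num_clips
    indices = []
    for c in range(num_clips):
        start = c * clip_len
        # Uniformly spaced positions inside this clip.
        offsets = np.linspace(start, start + clip_len - 1, frames_per_clip)
        indices.extend(int(round(o)) for o in offsets)
    return indices

def temporal_pyramid_segments(features, scales=(2, 3, 4)):
    """features: (T, D) array of per-frame CNN features.

    For each scale s (scales here are an illustrative choice), pool
    consecutive frames into non-overlapping s-frame segments (average
    pooling assumed), then concatenate the segment features of the same
    length. Each concatenated vector would be the input to a relation
    module at that scale; the per-scale relation outputs are what TPPRN
    aggregates for the final prediction."""
    T, D = features.shape
    pyramid = {}
    for s in scales:
        segs = [features[i:i + s].mean(axis=0)
                for i in range(0, T - s + 1, s)]
        pyramid[s] = np.concatenate(segs)
    return pyramid
```

For a 120-frame video, `sample_frame_indices(120)` yields 12 ordered indices (4 per 40-frame clip), and with 12 sampled frames the pyramid produces 6, 4, and 3 segments at scales 2, 3, and 4 respectively.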