Purely Attention Based Local Feature Integration for Video Classification
Abstract
Recently, substantial research effort has focused on applying CNNs or RNNs to better capture temporal patterns and thereby improve the accuracy of video classification. In this paper, we investigate the potential of purely attention-based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the outputs of multiple attention units applied in parallel and introduces a shifting operation to capture more diverse signals. Experiments show that BAC achieves excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring sequential relationships. To improve on BAC, we further propose the channel pyramid attention schema, which splits features into sub-features at multiple scales for coarse-to-fine modeling of sub-feature interactions, and the temporal pyramid attention schema, which divides the feature sequence into ordered sub-sequences of multiple lengths to account for sequential order. We demonstrate the effectiveness of our final model, Pyramid-Pyramid Attention Clusters (PPAC), on seven real-world video classification datasets.
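The abstract describes BAC and the two pyramid schemas only at a high level. The pure-Python sketch below illustrates one way they could fit together; it is not the authors' implementation. Assumptions: each attention unit scores frames with a single linear projection `w` followed by a softmax; the shifting operation takes the form (αv + β) / (√N ‖αv + β‖₂) for N attention units, which is our reading of the earlier Attention Clusters formulation; the pyramid scales (1, 2, 4) and the plain concatenation of all levels are illustrative choices. All function names (`attention_unit`, `shift`, `attention_cluster`, `channel_pyramid`, `temporal_pyramid`, `random_cluster_params`) are hypothetical, and the random parameters stand in for learned weights.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_unit(features, w):
    # features: list of T local feature vectors, each of length D.
    # Score each vector with an assumed linear projection w, then return the
    # attention-weighted sum over the sequence (frame order is ignored here).
    weights = softmax([sum(wi * xi for wi, xi in zip(w, x)) for x in features])
    dim = len(features[0])
    return [sum(a * x[d] for a, x in zip(weights, features)) for d in range(dim)]


def shift(v, alpha, beta, n_units):
    # Assumed shifting form: affine transform, then L2-normalize and scale by
    # 1/sqrt(n_units) so a cluster of n_units outputs has unit total L2 norm.
    shifted = [alpha * vi + beta for vi in v]
    norm = math.sqrt(sum(s * s for s in shifted)) or 1.0
    return [s / (math.sqrt(n_units) * norm) for s in shifted]


def random_cluster_params(n_units, dim, rng):
    # Hypothetical stand-ins for learned weights: one (w, alpha, beta) per unit.
    return [([rng.gauss(0, 1) for _ in range(dim)], rng.gauss(1, 0.1), rng.gauss(0, 0.1))
            for _ in range(n_units)]


def attention_cluster(features, params):
    # BAC-style cluster: parallel attention units, shifted, then concatenated.
    out = []
    for w, alpha, beta in params:
        out.extend(shift(attention_unit(features, w), alpha, beta, len(params)))
    return out


def channel_pyramid(features, scales, n_units, rng):
    # At each scale s, split the D channels into s equal sub-features
    # (assumes D is divisible by every s) and run a separate cluster on each.
    dim = len(features[0])
    out = []
    for s in scales:
        step = dim // s
        for g in range(s):
            sub = [x[g * step:(g + 1) * step] for x in features]
            out.extend(attention_cluster(sub, random_cluster_params(n_units, step, rng)))
    return out


def temporal_pyramid(features, scales, n_units, rng):
    # At each scale s, split the sequence into s contiguous segments in time
    # order (assumes T >= every s; trailing frames beyond s * (T // s) are
    # dropped in this sketch) and concatenate segment outputs in that order.
    dim = len(features[0])
    out = []
    for s in scales:
        step = len(features) // s
        for k in range(s):
            segment = features[k * step:(k + 1) * step]
            out.extend(attention_cluster(segment, random_cluster_params(n_units, dim, rng)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(8)]  # T=8 frames, D=8 channels
    print(len(attention_cluster(frames, random_cluster_params(3, 8, rng))))  # 24
    print(len(channel_pyramid(frames, (1, 2, 4), 3, rng)))  # 72
    print(len(temporal_pyramid(frames, (1, 2, 4), 3, rng)))  # 168
```

Splitting channels lets each cluster attend over a subset of feature channels instead of treating the whole channel vector as one indivisible unit, while the temporal split keeps segment outputs in time order, so the concatenated vector carries coarse sequential information that a single set-level attention discards.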