Combining multiple sources of knowledge in deep CNNs for action recognition
Top 1% of 2016 papers
Abstract
Although deep convolutional neural networks (CNNs) have shown remarkable results for feature learning and prediction tasks, many recent studies have demonstrated improved performance by incorporating additional handcrafted features or by fusing predictions from multiple CNNs. Usually, these combinations are implemented via feature concatenation or by averaging output prediction scores from several CNNs. In this paper, we present new approaches for combining different sources of knowledge in deep learning. First, we propose feature amplification, where we use an auxiliary, handcrafted feature (e.g., optical flow) to perform spatially varying soft-gating on intermediate CNN feature maps. Second, we present a spatially varying multiplicative fusion method for combining multiple CNNs trained on different sources, which yields robust predictions by amplifying or suppressing feature activations based on their agreement. We test these methods in the context of action recognition, where information from both spatial and temporal cues is useful, obtaining results that are comparable with state-of-the-art methods and outperform methods using only CNNs and optical flow features.
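The two ideas in the abstract can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the paper's implementation: the resizing scheme, the gate normalization to a [1, 2] range, and the function names are all assumptions. Feature amplification multiplies intermediate feature maps by a spatial gate derived from optical-flow magnitude; multiplicative fusion takes an elementwise product of two networks' activations so that agreement is amplified and disagreement suppressed.

```python
import numpy as np

def feature_amplification(feature_maps, flow_magnitude, eps=1.0):
    """Soft-gate CNN feature maps with an auxiliary motion cue (sketch).

    feature_maps:   (C, H, W) intermediate CNN activations.
    flow_magnitude: (H', W') optical-flow magnitude map, resized to (H, W).
    """
    # Nearest-neighbour resize of the flow map to the feature-map grid
    # (an assumption; the paper does not dictate the interpolation scheme).
    H, W = feature_maps.shape[1:]
    ys = np.arange(H) * flow_magnitude.shape[0] // H
    xs = np.arange(W) * flow_magnitude.shape[1] // W
    gate = flow_magnitude[np.ix_(ys, xs)]
    # Normalise the gate to roughly [1, 2]: high-motion regions are
    # amplified while static regions pass through unchanged.
    gate = 1.0 + gate / (gate.max() + eps)
    return feature_maps * gate  # gate broadcasts over the channel axis

def multiplicative_fusion(feats_a, feats_b):
    """Elementwise product of two networks' activations: locations where
    both networks fire strongly are amplified; disagreement is suppressed."""
    return feats_a * feats_b
```

In practice the gated or fused activations would feed the remaining CNN layers; here the gate is a single spatial map applied identically across channels, which is the simplest form of spatially varying soft-gating.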
Related Papers
- → Human activity recognition using optical flow based feature set (2016), 40 citations
- → Human Action Recognition Using Optical Flow Accumulated Local Histograms (2009), 13 citations
- → Predictive Coding Networks Meet Action Recognition (2020), 2 citations
- → Action Recognition: First- and Second-Order 3D Feature in Bi-Directional Attention Network (2018), 1 citation
- → Combining Spatio-Temporal Appearance Descriptors and Optical Flow for Human Action Recognition in Video Data (2013), 2 citations