0 citations0 references

YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition

2013pp. 2712–2719

Citations Over TimeTop 1% of 2013 papers

Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond J. Mooney, Trevor Darrell, Kate Saenko

Abstract

Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities ``in-the-wild''. We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction for a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help to choose an appropriate level of generalization, and priors learned from Web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects, we also use a Web-scale language model to ``fill in'' novel verbs, i.e. when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate it is able to generate short sentence descriptions of video clips better than baseline approaches.

Related Papers

→ Accounting for baseline targets in NDCs(2018)3 cited
Method for Improvement of Baseline Resolving Quality in GPS Measurement(2007)
Analysis of the Baseline Decorrelation and Critical Baseline of Interferometric SAR(2003)
Research of Baseline Implement(2010)
Study and application of telecom operators system security state baseline(2012)