End-to-End People Detection in Crowded Scenes
2016, pp. 2325–2333
Abstract
Current people detectors operate either by scanning an image in a sliding-window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
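The abstract says training uses a loss defined on sets of detections, so the loss should not depend on the order in which the LSTM emits its hypotheses. The code below is a minimal sketch of one such loss, not the paper's exact formulation. It assumes axis-aligned boxes in a hypothetical (cx, cy, w, h) format, an L1 localization term, and a binary cross-entropy confidence term. Each ground-truth box is matched to a distinct prediction by minimum total cost. Matched predictions are pushed toward confidence 1 and all others toward 0. The names `best_matching` and `set_loss` and the weight `alpha` are illustrative. The brute-force matching stands in for a proper assignment solver such as the Hungarian algorithm.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes, used as both matching and localization cost."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_matching(preds: Sequence[Box], gts: Sequence[Box]) -> List[Tuple[int, int]]:
    """Assign every ground-truth box to a distinct prediction with minimum total L1 cost.

    Returns (gt_index, pred_index) pairs. Brute force over permutations is only
    practical for a few boxes; it stands in for an assignment solver such as
    the Hungarian algorithm. Assumes len(preds) >= len(gts).
    """
    best_cost, best_pairs = math.inf, []
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        cost = sum(l1(gts[g], preds[p]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs


def set_loss(preds: Sequence[Box], confs: Sequence[float],
             gts: Sequence[Box], alpha: float = 1.0) -> float:
    """Order-invariant loss: L1 on matched boxes plus binary cross-entropy on confidences.

    alpha is a hypothetical weight balancing the two terms.
    """
    pairs = best_matching(preds, gts)
    matched = {p for _, p in pairs}
    loc = sum(l1(gts[g], preds[p]) for g, p in pairs)
    eps = 1e-9  # guards against log(0)
    conf_loss = 0.0
    for j, c in enumerate(confs):
        # Matched predictions should fire (target 1); all others should stay silent (target 0).
        conf_loss -= math.log(c + eps) if j in matched else math.log(1.0 - c + eps)
    return alpha * loc + conf_loss


if __name__ == "__main__":
    preds = [(0.0, 0.0, 1.0, 1.0), (5.0, 5.0, 1.0, 1.0), (10.0, 10.0, 1.0, 1.0)]
    confs = [0.9, 0.8, 0.1]
    gts = [(5.0, 5.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
    print(best_matching(preds, gts))  # [(0, 1), (1, 0)]
    print(set_loss(preds, confs, gts))
```

The matching is recomputed for every image, so reordering the ground-truth boxes leaves the loss unchanged. In this sketch, a second prediction on an already-covered person stays unmatched. The confidence term penalizes it during training instead of relying on non-maximum suppression to remove it afterwards.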