Where to Look: Focus Regions for Visual Question Answering
CVPR 2016, pp. 4613–4621
Citations: top 10% of 2016 papers
Abstract
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method shows significant improvements on questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. We evaluate our model on the VQA dataset, which is, to our knowledge, the largest human-annotated visual question answering dataset.
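The core idea, weighting image regions by their relevance to the question and answering from the attended feature, can be illustrated with a minimal sketch. The sketch below is an illustrative assumption, not the paper's exact architecture: the dot-product scoring, feature dimensions, and `question_guided_attention` name are hypothetical (the paper learns its region–question scoring end-to-end from data).

```python
import numpy as np

def question_guided_attention(region_feats, question_feat):
    """Score each image region against the question embedding and
    return attention weights plus the attended image feature.

    region_feats:  (num_regions, d) array of per-region visual features
    question_feat: (d,) array embedding the text-based query
    """
    # Relevance score per region: dot product with the question embedding.
    # (Illustrative choice; a learned bilinear or MLP scorer also works.)
    scores = region_feats @ question_feat              # (num_regions,)
    # Softmax turns scores into a distribution over regions.
    scores = scores - scores.max()                     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()    # (num_regions,)
    # Attended feature: regions relevant to the question dominate the sum.
    attended = weights @ region_feats                  # (d,)
    return weights, attended

# Toy usage: 5 candidate regions with 8-dim features.
rng = np.random.default_rng(0)
regions = rng.standard_normal((5, 8))
question = rng.standard_normal(8)
w, feat = question_guided_attention(regions, question)
print("region weights:", np.round(w, 3))
```

A soft weighting like this keeps the model differentiable, so the region scorer can be trained jointly with the answer classifier; it also explains the gains on localized questions such as "what color," where mass concentrates on the one region that matters.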
Related Papers
- Hierarchical Question-Image Co-Attention for Visual Question Answering (2016), 1,216 citations
- VQA: Visual Question Answering (2015), 1,094 citations
- Dynamic Memory Networks for Visual and Textual Question Answering (2016), 593 citations
- Stacked Attention Networks for Image Question Answering (2015), 193 citations
- Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering (2015), 105 citations