Diagram Understanding in Geometry Questions
Citations over time: top 10% of 2014 papers
Abstract
Automatically solving geometry questions is a long-standing AI problem. A geometry question typically includes a textual description accompanied by a diagram. The first step in solving geometry questions is diagram understanding, which consists of identifying visual elements in the diagram, their locations, their geometric properties, and aligning them to corresponding textual descriptions. In this paper, we present a method for diagram understanding that identifies visual elements in a diagram while maximizing agreement between textual and visual data. We show that the method's objective function is submodular; thus we are able to introduce an efficient method for diagram understanding that is close to optimal. To empirically evaluate our method, we compile a new dataset of geometry questions (textual descriptions and diagrams) and compare with baselines that utilize standard vision techniques. Our experimental evaluation shows an F1 boost of more than 17% in identifying visual elements and 25% in aligning visual elements with their textual descriptions.
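The link the abstract draws between submodularity and a close-to-optimal efficient method can be illustrated with the standard greedy procedure for monotone submodular maximization. The sketch below is a generic illustration, not the paper's algorithm: the function `greedy_select`, the toy `coverage` table, the primitive names, and the cardinality budget are all assumptions made for this example. Each hypothetical candidate primitive "explains" a set of diagram pixels, and the objective counts the distinct pixels explained, which is a textbook submodular function.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def greedy_select(
    candidates: Iterable[str],
    objective: Callable[[FrozenSet[str]], float],
    budget: int,
) -> List[str]:
    """Greedily pick up to `budget` items, each time taking the candidate
    with the largest positive marginal gain in `objective`."""
    selected: List[str] = []
    remaining = list(candidates)
    current = objective(frozenset())
    while remaining and len(selected) < budget:
        best_gain, best_item = 0.0, None
        for item in remaining:
            gain = objective(frozenset(selected + [item])) - current
            if gain > best_gain:
                best_gain, best_item = gain, item
        if best_item is None:  # no candidate improves the objective
            break
        selected.append(best_item)
        remaining.remove(best_item)
        current += best_gain
    return selected


# Hypothetical toy data: pixels (by id) that each candidate primitive explains.
coverage: Dict[str, Set[int]] = {
    "line_AB": {1, 2, 3, 4},
    "line_AB_dup": {2, 3, 4},   # near-duplicate detection of the same line
    "circle_O": {5, 6, 7},
    "line_CD": {4, 8},
}


def covered(chosen: FrozenSet[str]) -> float:
    """Number of distinct pixels explained by the chosen primitives."""
    pixels: Set[int] = set()
    for name in chosen:
        pixels |= coverage[name]
    return float(len(pixels))


if __name__ == "__main__":
    print(greedy_select(coverage, covered, budget=4))
```

On this toy input the greedy pass skips the redundant `line_AB_dup`, because its marginal gain is zero once `line_AB` is selected. For monotone submodular objectives under a cardinality constraint, this greedy strategy is known to reach at least a (1 - 1/e) fraction of the optimal value; whether the paper's objective and constraints match that setting exactly is not stated in the abstract.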