MURAL: Multimodal, Multitask Representations Across Languages
Abstract
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN (Jia et al., 2021), a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL's performance matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets. More importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL-BASE improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning. We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.
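The abstract describes a dual encoder trained jointly on image-text matching and translation-pair matching. The sketch below illustrates that multitask setup with a bidirectional in-batch softmax contrastive loss over paired embeddings, which is the standard objective family for dual encoders like ALIGN; the function names, task weights, and random embeddings are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def contrastive_loss(a, b, temperature=0.1):
    """Bidirectional in-batch softmax contrastive loss.

    a, b: (batch, dim) L2-normalized embeddings, where row i of `a`
    is paired with row i of `b` (image i with caption i, or source
    sentence i with its translation i).
    """
    logits = a @ b.T / temperature          # (batch, batch) similarities
    idx = np.arange(len(a))                 # positives lie on the diagonal

    def xent(l):
        # cross-entropy of each row against its diagonal positive
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average a->b and b->a retrieval directions
    return 0.5 * (xent(logits) + xent(logits.T))

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
img = normalize(rng.normal(size=(4, 8)))    # image embeddings
cap = normalize(rng.normal(size=(4, 8)))    # caption embeddings
src = normalize(rng.normal(size=(4, 8)))    # source-language text embeddings
tgt = normalize(rng.normal(size=(4, 8)))    # translation embeddings

# Multitask objective: weighted sum of the two matching losses
# (w_i2t, w_t2t are hypothetical weights, not values from the paper).
w_i2t, w_t2t = 1.0, 1.0
total = w_i2t * contrastive_loss(img, cap) + w_t2t * contrastive_loss(src, tgt)
```

Because the translation-matching term only needs text pairs, it supplies a training signal for languages that have few or no image captions, which is the mechanism the abstract credits for the gains on under-resourced languages.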