Translation and Transliteration Based Data Augmentation for Multilingual Semantic Parsing
Abstract
Multilingual semantic parsing is one of the natural language understanding tasks powering modern virtual assistants. Annotating training data for every supported language is expensive, so methods that rely on machine translation and label projection are used to perform language adaptation. In this paper, we revisit the assumption that a separate label projection step is necessary, with the goal of saving compute and reducing the complexity of the data augmentation pipeline. We create synthetic training examples by applying translation and transliteration directly at the slot level. We show that without a dedicated and expensive label projection component, we achieve 97% of state-of-the-art data augmentation performance on multilingual semantic parsing and obtain the same performance as the best systems for code-mixed and code-switched semantic parsing.
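The idea of converting text "directly at the slot level" can be illustrated with a small sketch. It rests on these assumptions, none of which come from the paper:

- An utterance is stored as a list of segments, each either carrier text or a labeled slot.
- Output uses a TOP-style bracketed format.
- `toy_translate` is a hypothetical phrase-table stand-in for a real machine translation system. A transliteration function with the same `str -> str` signature could be passed in its place.

Because each segment is converted on its own and its slot label is copied unchanged, the labels of the synthetic example are known without any alignment or label projection step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Segment:
    text: str
    slot: Optional[str] = None  # None marks carrier (non-slot) text


def augment(segments: List[Segment], convert: Callable[[str], str]) -> List[Segment]:
    """Convert each segment independently and copy its slot label unchanged.

    `convert` may be a translator or a transliterator (assumption: any str -> str
    function). Slot boundaries are never crossed, so no label projection is needed.
    """
    return [Segment(convert(seg.text), seg.slot) for seg in segments]


def to_top(intent: str, segments: List[Segment]) -> str:
    """Serialize segments in an illustrative TOP-style bracketed format."""
    parts = [
        seg.text if seg.slot is None else f"[SL:{seg.slot} {seg.text} ]"
        for seg in segments
    ]
    return f"[IN:{intent} " + " ".join(parts) + " ]"


# Hypothetical stand-in for an MT system: a tiny English-to-Spanish phrase table.
TOY_EN_ES = {"set an alarm": "pon una alarma", "for 7 am": "para las 7 am"}


def toy_translate(text: str) -> str:
    return TOY_EN_ES.get(text, text)  # unknown phrases pass through unchanged


if __name__ == "__main__":
    source = [Segment("set an alarm"), Segment("for 7 am", "DATE_TIME")]
    print(to_top("CREATE_ALARM", source))
    print(to_top("CREATE_ALARM", augment(source, toy_translate)))
```

Running the script prints the source parse `[IN:CREATE_ALARM set an alarm [SL:DATE_TIME for 7 am ] ]` followed by the synthetic Spanish parse `[IN:CREATE_ALARM pon una alarma [SL:DATE_TIME para las 7 am ] ]`. A plausible cost of this design is that each segment is translated without the context of its neighbors, so word order or agreement across slot boundaries may be less natural than in a full-sentence translation followed by projection.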