Translation and Transliteration Based Data Augmentation for Multilingual Semantic Parsing

Frontiers in Artificial Intelligence and Applications, 2024

Sarthak Jauhari, Massimo Nicosia, Ankush Chatterjee, Rahul Goel

Abstract

Multilingual semantic parsing is one of the natural language understanding tasks powering modern virtual assistants. Annotating training data for every supported language is expensive, so methods that rely on machine translation and label projection are used to perform language adaptation. In this paper, we revisit the assumption that a separate label projection step is necessary, with the goal of saving compute and reducing the complexity of the data augmentation pipeline. We create synthetic training examples by applying translation and transliteration directly at the slot level. We show that, without a dedicated and expensive label projection component, we achieve 97% of state-of-the-art data augmentation performance on multilingual semantic parsing and match the performance of the best systems for code-mixed and code-switched semantic parsing.
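As an illustration only, the slot-level idea can be sketched as follows. This is one plausible reading of the abstract, not the authors' pipeline: the segment representation, the function names `augment_at_slot_level` and `toy_convert`, and the tiny lexicon are all hypothetical, and `toy_convert` merely stands in for a real translation or transliteration system. The point it shows is that converting each slot value in place keeps its label attached, so no separate label projection step is needed.

```python
from typing import Callable, List, Tuple

# A labeled utterance as (text, label) segments; "O" marks text outside any slot.
# This representation is an assumption made for the sketch.
Segment = Tuple[str, str]


def augment_at_slot_level(
    segments: List[Segment], convert: Callable[[str], str]
) -> List[Segment]:
    """Convert only the slot values, leaving each label attached to its segment."""
    return [
        (text if label == "O" else convert(text), label)
        for text, label in segments
    ]


def toy_convert(text: str) -> str:
    """Hypothetical stand-in for a translation or transliteration system."""
    lexicon = {"tomorrow": "mañana", "seven": "siete"}
    return " ".join(lexicon.get(word, word) for word in text.split())


example = [("set an alarm for", "O"), ("seven", "time"), ("tomorrow", "date")]
augmented = augment_at_slot_level(example, toy_convert)
utterance = " ".join(text for text, _ in augmented)
```

In this toy setup the result is a code-mixed utterance whose slot labels are unchanged, because each slot value is converted without ever being detached from its label.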
