XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

arXiv (Cornell University), 2020

Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Song Xia, Arul Menezes, Furu Wei

Abstract

Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and on the OPUS-100 corpus with 94 pairs. Surprisingly, the method remains effective even on top of the strong back-translation baseline. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation. The code will be available at https://aka.ms/xlm-t.
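The central idea, starting from pretrained encoder weights instead of a random initialization, can be sketched as a parameter-initialization step that runs before fine-tuning. This is a minimal illustration under stated assumptions, not the authors' code: the flat name-to-list-of-floats parameter format, the `encoder.` prefix, the parameter names in the usage example, and the Gaussian(0, 0.02) random initialization for unmatched parameters are all hypothetical. The sketch copies only encoder-side weights and leaves everything else (such as the decoder) randomly initialized. The abstract does not say whether the decoder is also initialized from the pretrained model.

```python
import random
from typing import Dict, List

# Hypothetical flat parameter format: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def init_from_pretrained_encoder(
    model_sizes: Dict[str, int],
    pretrained: Params,
    prefix: str = "encoder.",
    seed: int = 0,
) -> Params:
    """Build initial parameters for an encoder-decoder model (sketch).

    Parameters whose name starts with `prefix` are copied from the
    pretrained encoder checkpoint when a same-named, same-sized entry
    exists; all other parameters are randomly initialized, as in a
    from-scratch baseline.
    """
    rng = random.Random(seed)
    params: Params = {}
    for name, size in model_sizes.items():
        if name.startswith(prefix):
            src = pretrained.get(name[len(prefix):])
            if src is not None and len(src) == size:
                # Copy so that fine-tuning never mutates the checkpoint.
                params[name] = list(src)
                continue
        # Assumed random init for parameters with no pretrained match.
        params[name] = [rng.gauss(0.0, 0.02) for _ in range(size)]
    return params


if __name__ == "__main__":
    sizes = {"encoder.layer0.attn": 4, "encoder.layer0.ffn": 3, "decoder.layer0.attn": 4}
    ckpt = {"layer0.attn": [0.1, 0.2, 0.3, 0.4], "layer0.ffn": [1.0, 2.0, 3.0]}
    init = init_from_pretrained_encoder(sizes, ckpt)
    print(init["encoder.layer0.attn"])  # copied from the checkpoint
    print(len(init["decoder.layer0.attn"]))  # randomly initialized
```

The copied and random parameters would then be fine-tuned together on multilingual parallel data. In a real framework, the same effect is usually achieved by loading a checkpoint non-strictly into the encoder submodule.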

Related Papers

→ Blood and Black Lace (aka Sei donne per l’assassino [ Six Women for the Murderer ], aka Fashion House of Death )(2007)
→ Twitch of the Death Nerve (aka Reazione a catena [ Chain Reaction ], aka A Bay of Blood , aka Antefatto , Before the Fact – Ecology of a Crime , aka Bloodbath Bay of Death , aka Carnage , aka Last House on the Left Part II , aka New House on the Left , aka The Ecology of a Crime )(2007)
→ Kill, Baby . . . Kill! (aka Operazione paura [ Operation Fear ], aka Don’t Walk in the Park , aka Curse of the Dead )(2007)
Chandra X-ray observations of SWIFT J0045.2+4151 (aka Sw J0045 aka GRB 150301C aka p[PFH2005] 622)(2015)