XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
Abstract
Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs. Surprisingly, the method remains effective even on top of a strong baseline that uses back-translation. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation. The code will be available at https://aka.ms/xlm-t.