Multilingual Constituency Parsing with Self-Attention and Pre-Training
Abstract
We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).
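The parameter-sharing scheme the abstract describes can be made concrete with a small sketch. The PyTorch code below is an illustration under assumed details: a generic Transformer encoder stands in for the pretrained multilingual model, and a single per-language embedding vector plays the role of the "small number" of language-specific parameters. It is not the authors' released implementation.

```python
# Minimal sketch of joint multilingual fine-tuning with a shared parser body
# and tiny per-language parameters (illustrative assumptions, not the paper's
# exact architecture).
import torch
import torch.nn as nn


class JointMultilingualParser(nn.Module):
    def __init__(self, languages, vocab_size=30000, hidden_dim=256, num_labels=64):
        super().__init__()
        # Shared parameters: the overwhelming majority of the weights.
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.label_scorer = nn.Linear(hidden_dim, num_labels)
        # Per-language parameters: one hidden_dim-sized vector per language,
        # so adding a language costs only a few hundred extra weights.
        self.lang_embed = nn.ParameterDict(
            {lang: nn.Parameter(torch.zeros(hidden_dim)) for lang in languages}
        )

    def forward(self, token_ids, lang):
        # Add the language-specific offset to every token, then run the shared stack.
        x = self.embed(token_ids) + self.lang_embed[lang]
        h = self.encoder(x)
        return self.label_scorer(h)


if __name__ == "__main__":
    model = JointMultilingualParser(["en", "zh", "de"])
    tokens = torch.randint(0, 30000, (2, 12))  # batch of 2 sentences
    scores = model(tokens, lang="zh")          # shape: (2, 12, num_labels)
    print(scores.shape)
```

Under this kind of setup, supporting ten languages adds only ten small vectors to one shared model rather than requiring ten separately fine-tuned copies, which is the source of the roughly 10x size reduction the abstract trades against a 3.2% relative error increase.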