Building English-to-Serbian Machine Translation System for IMDb Movie Reviews
Abstract
This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translating English IMDb user movie reviews into Serbian in a low-resource scenario. We explore the potential and limits of (i) phrase-based and neural machine translation systems trained on clean out-of-domain parallel data from news articles, and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are handled better by the neural approach than by the phrase-based approach, even in this low-resource, mismatched-domain scenario, but that the situation is different for the lexical aspect, especially for person names. This finding also indicates that, in general, machine translation of person names into Slavic languages (especially those which require or allow transcription) should be investigated more systematically.