German Compounds and Statistical Machine Translation. Can they get along?
Citations: top 12% of 2014 papers
Abstract
This paper reports several experiments designed to study the impact of using linguistic knowledge to preprocess German compounds prior to translation in Statistical Machine Translation (SMT). Compounds are a known challenge in Machine Translation (MT) and in translation generally, as well as in other Natural Language Processing (NLP) applications. In SMT, German compounds are split into their constituents to reduce the number of unknown words and to improve scores on evaluation measures such as Bleu. To assess to what extent German compounds need to be handled during preprocessing in SMT systems, we have tested different compound splitters and strategies, such as adding lists of compounds and their translations to the training set. This paper summarizes the results of our experiments and our attempts to obtain better translations of German nominal compounds into Spanish, and shows that our approach improves on the baseline by up to 1.4 Bleu points.
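The abstract does not describe how the compound splitters it compares work. As an illustration only, the following is a minimal sketch of one well-known frequency-based approach, the geometric-mean heuristic described by Koehn and Knight: every segmentation of a word into known parts (optionally joined by a linking "s" or "es") is scored by the geometric mean of the parts' corpus frequencies, and the unsplit word competes as a candidate too. The vocabulary, counts, filler list and minimum part length below are hypothetical, and this is not claimed to be the method used in the paper.

```python
from math import prod

# Hypothetical toy corpus frequencies; a real system would count tokens
# in the German side of the training data.
FREQ = {
    "aktionsplan": 852,
    "aktion": 960,
    "plan": 710,
    "akt": 224,
    "ion": 1,
    "haus": 1200,
    "tür": 300,
    "haustür": 40,
}

FILLERS = ("", "s", "es")  # assumed linking elements between parts
MIN_PART = 3  # assumed minimum part length, to block spurious splits


def splits(word):
    """Yield every segmentation of `word` into known parts, allowing fillers."""
    if word in FREQ:
        yield [word]  # the unsplit word is itself a candidate
    for i in range(MIN_PART, len(word) - MIN_PART + 1):
        head = word[:i]
        if head not in FREQ:
            continue
        rest = word[i:]
        for filler in FILLERS:
            if not rest.startswith(filler):
                continue
            tail = rest[len(filler):]
            if len(tail) < MIN_PART:
                continue
            # Recursively segment what remains after the head and filler.
            for tail_parts in splits(tail):
                yield [head] + tail_parts


def score(parts):
    """Geometric mean of the parts' corpus frequencies."""
    return prod(FREQ[p] for p in parts) ** (1 / len(parts))


def split_compound(word):
    """Return the best-scoring segmentation; unsplittable words pass through."""
    word = word.lower()
    candidates = list(splits(word))
    if not candidates:
        return [word]
    return max(candidates, key=score)


print(split_compound("Haustürplan"))
print(split_compound("Aktionsplan"))
```

With these toy counts, the unseen word "haustürplan" is split into three known parts ("haus", "tür", "plan"), which is the reduction in unknown words the abstract refers to. The frequent compound "aktionsplan" is kept whole, because its own count (852) exceeds the geometric mean of "aktion" and "plan" (about 826).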