
German Compounds and Statistical Machine Translation. Can they get along?

2014, pp. 48–56

Citations over time: top 12% of 2014 papers

Carla Parra Escartín, Stephan Peitz, Hermann Ney

Abstract

This paper reports on experiments designed to study the impact of linguistically informed preprocessing of German compounds prior to translation in Statistical Machine Translation (SMT). Compounds are a known challenge in Machine Translation (MT), in translation in general, and in other Natural Language Processing (NLP) applications. In SMT, German compounds are typically split into their constituents to decrease the number of unknown words and to improve the results of evaluation measures such as the BLEU score. To assess to what extent German compounds need to be handled as part of preprocessing in SMT systems, we tested different compound splitters and strategies, such as adding lists of compounds and their translations to the training set. This paper summarizes the results of our experiments and our attempts to yield better translations of German nominal compounds into Spanish, and shows that our approach improves on the baseline by up to 1.4 BLEU points.
