Arabic preprocessing schemes for statistical machine translation

2006, pp. 49–52

Abstract

In this paper, we study the effect of different word-level preprocessing decisions for Arabic on SMT quality. Our results show that, given large amounts of training data, splitting off only proclitics performs best. However, for small amounts of training data, it is best to apply English-like tokenization using part-of-speech tags and sophisticated morphological analysis and disambiguation. Moreover, choosing the appropriate preprocessing produces a significant increase in BLEU score if there is a change in genre between training and test data.
