Arabic preprocessing schemes for statistical machine translation
2006, pp. 49–52
Abstract
In this paper, we study the effect of different word-level preprocessing decisions for Arabic on SMT quality. Our results show that given large amounts of training data, splitting off only proclitics performs best. However, for small amounts of training data, it is best to apply English-like tokenization using part-of-speech tags, and sophisticated morphological analysis and disambiguation. Moreover, choosing the appropriate preprocessing produces a significant increase in BLEU score if there is a change in genre between training and test data.