Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection
2018, pp. 133–143
Abstract
Measuring the domain relevance of data and identifying or selecting well-fitting domain data for machine translation (MT) is a well-studied topic, but denoising is not. Denoising addresses a different kind of data quality: it tries to reduce the negative impact of data noise on MT training, in particular on neural MT (NMT) training. This paper generalizes methods for measuring and selecting data for domain MT and applies them to denoising NMT training. The proposed approach uses trusted data and a denoising curriculum realized by online data selection. Intrinsic and extrinsic evaluations show that the approach is significantly effective for training NMT on data with severe noise.
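To make "trusted data plus online data selection" concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. It assumes each sentence pair is scored by a length-normalized log-probability difference between a model trained on the noisy data and a copy fine-tuned on the trusted data (a cross-entropy-difference style score of the kind used in domain data selection). It also assumes the curriculum is a linear schedule that shrinks the fraction of each batch kept for training. The log-probabilities are taken as precomputed inputs, and the names `Pair`, `denoise_score`, `keep_fraction`, `select_batch`, and the `final_frac=0.2` default are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Pair:
    src: str
    tgt: str
    logp_noisy: float     # hypothetical: sum of target log-probs under the noisy-data model
    logp_denoised: float  # hypothetical: same, under the model fine-tuned on trusted data
    tgt_len: int          # number of target tokens


def denoise_score(p: Pair) -> float:
    """Assumed score: per-token gain in log-probability from trusted-data fine-tuning.

    Higher values suggest the pair looks more like the trusted (clean) data.
    """
    return (p.logp_denoised - p.logp_noisy) / max(p.tgt_len, 1)


def keep_fraction(step: int, total_steps: int, final_frac: float = 0.2) -> float:
    """Hypothetical linear curriculum: keep the whole batch at step 0,
    shrinking to final_frac of it by total_steps (and staying there)."""
    progress = min(step / total_steps, 1.0)
    return 1.0 - (1.0 - final_frac) * progress


def select_batch(batch: Sequence[Pair], step: int, total_steps: int,
                 final_frac: float = 0.2) -> List[Pair]:
    """Online selection: rank the current batch by score and keep the top share."""
    frac = keep_fraction(step, total_steps, final_frac)
    k = max(1, round(len(batch) * frac))  # always train on at least one pair
    ranked = sorted(batch, key=denoise_score, reverse=True)
    return ranked[:k]
```

Because selection happens per batch during training rather than once up front, the model first sees most of the data and is then gradually restricted to the pairs the trusted-data model favors. The linear schedule is only one choice; any monotonically decreasing keep fraction would express the same curriculum idea.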