On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation
Abstract
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT systems. We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER. We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations, namely, (a) a semantic mismatch between representations of mutual translations and, more prominently, (b) the inability to punish "translationese", i.e., low-quality literal translations. We propose two partial remedies: (1) post-hoc re-alignment of the vector spaces and (2) coupling of semantic-similarity based metrics with target-side language modeling. In segment-level MT evaluation, our best metric surpasses reference-based BLEU by 5.7 correlation points. We make our MT evaluation code available.
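To make the basic setup concrete, the sketch below scores a system translation directly against its source by cosine similarity of mean-pooled M-BERT sentence embeddings. This is a minimal illustration of a reference-free similarity metric, not the authors' exact method: the model name, mean pooling, and the single-sentence scoring function are all assumptions for the example.

```python
# Minimal sketch of a reference-free similarity score: cosine similarity
# between mean-pooled multilingual BERT embeddings of source and hypothesis.
# Model choice and pooling are illustrative assumptions, not the paper's metric.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-multilingual-cased"  # assumed encoder for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the final-layer token embeddings into one sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    hidden = outputs.last_hidden_state               # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def reference_free_score(source: str, hypothesis: str) -> float:
    """Cosine similarity between source and system-translation embeddings."""
    src, hyp = embed(source), embed(hypothesis)
    return torch.nn.functional.cosine_similarity(src, hyp).item()

# Example: a German source scored against an English system output.
print(reference_free_score("Das Haus ist klein.", "The house is small."))
```

A metric of this form has exactly the weaknesses the abstract points out: embeddings of mutual translations are not perfectly aligned across languages, and a word-by-word "translationese" output can sit close to the source in embedding space, which motivates the proposed re-alignment and the coupling with a target-side language model.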