Quantifying the utility of parallel corpora
2001, pp. 398–399
Abstract
Our English-Chinese cross-language IR system is trained from parallel corpora; we investigate its performance as a function of training corpus size for three different training corpora. We find that the performance of the system as trained on the three parallel corpora can be related by a simple measure, namely the out-of-vocabulary rate of query words.
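The out-of-vocabulary (OOV) rate mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact definition, so this assumes a token-level rate: the fraction of query word tokens that never appear on the query-language side of the training corpus. The function name `oov_rate` and the toy sentences and queries are hypothetical and are not taken from the paper.

```python
def oov_rate(query_words, training_vocabulary):
    """Return the fraction of query word tokens missing from the vocabulary.

    Assumption: the rate is computed over tokens (repeated words count
    each time they occur), which may differ from the paper's definition.
    """
    words = list(query_words)
    if not words:
        return 0.0  # no query words, so nothing can be out of vocabulary
    missing = sum(1 for w in words if w not in training_vocabulary)
    return missing / len(words)


# Hypothetical toy data standing in for the English side of a parallel corpus.
training_sentences = [
    "the bank raised interest rates",
    "the river bank flooded",
]
vocabulary = {w for s in training_sentences for w in s.split()}

# Hypothetical queries; "forecast" and "pollution" are unseen in training.
queries = ["interest rates forecast", "river pollution"]
query_words = [w for q in queries for w in q.split()]

rate = oov_rate(query_words, vocabulary)
print(f"OOV rate: {rate:.2f}")
```

Under this reading, a larger or better-matched training corpus would lower the rate by covering more of the query vocabulary, which is one way to compare corpora of different sizes on a common scale.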
Related Papers
- Paraphrasing with bilingual parallel corpora (2005), 544 citations
- Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment (2011)
- Creating and using large monolingual parallel corpora for sentential paraphrase generation (2014)
- Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation (2006)
- Using Parallel Corpora for Word Sense Disambiguation (2011)