
Interpreting BLEU/NIST Scores: How Much Improvement Do We Need to Have a Better System?

2004

Citations Over TimeTop 10% of 2004 papers

Abstract

Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU and the related NIST metric, are becoming increasingly important in MT. Yet their behavior is not fully understood. In this paper, we analyze several flaws in the BLEU/NIST metrics. A better understanding of these problems lets us interpret reported BLEU/NIST scores more reliably. In addition, this paper reports a novel method for calculating confidence intervals for BLEU/NIST scores using bootstrapping. With this method, we can determine whether two MT systems are significantly different from each other.
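The abstract does not spell out the bootstrapping procedure. The sketch below is one common way to bootstrap a corpus-level BLEU score and should not be read as the paper's exact method. It assumes uniform 4-gram weights, closest-reference-length brevity penalty, no smoothing, and pre-tokenized input. Because corpus BLEU is not an average of sentence scores, the sketch stores per-sentence sufficient statistics (clipped n-gram matches, n-gram totals, hypothesis and reference lengths), resamples test sentences with replacement, and recomputes corpus BLEU from the summed statistics of each resample. The paired variant, which counts how often system A beats system B on the same resampled sentences, is also an illustrative assumption. All function names (`sentence_stats`, `corpus_bleu`, `bootstrap_ci`, `paired_bootstrap`) are hypothetical, and NIST is not implemented here.

```python
import math
import random
from collections import Counter

MAX_N = 4  # assumption: standard BLEU-4 with uniform weights


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp, refs):
    """Per-sentence sufficient statistics for BLEU (hypothetical helper).

    Returns [hyp_len, closest_ref_len, match_1, total_1, ..., match_4, total_4].
    """
    # Closest reference length; ties go to the shorter reference.
    closest = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    stats = [len(hyp), closest]
    for n in range(1, MAX_N + 1):
        hyp_counts = ngrams(hyp, n)
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        stats += [clipped, max(len(hyp) - n + 1, 0)]
    return stats


def corpus_bleu(stats_list):
    """Corpus BLEU from summed sentence statistics (no smoothing)."""
    totals = [sum(col) for col in zip(*stats_list)]
    hyp_len, ref_len = totals[0], totals[1]
    log_prec = 0.0
    for n in range(MAX_N):
        match, total = totals[2 + 2 * n], totals[3 + 2 * n]
        if match == 0 or total == 0:
            return 0.0
        log_prec += math.log(match / total) / MAX_N
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def bootstrap_ci(stats_list, n_samples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for corpus BLEU."""
    rng = random.Random(seed)
    m = len(stats_list)
    scores = sorted(
        corpus_bleu([stats_list[rng.randrange(m)] for _ in range(m)])
        for _ in range(n_samples)
    )
    lo = scores[int(n_samples * alpha / 2)]
    hi = scores[int(n_samples * (1 - alpha / 2)) - 1]
    return lo, hi


def paired_bootstrap(stats_a, stats_b, n_samples=1000, seed=0):
    """Fraction of resampled test sets on which system A outscores system B."""
    rng = random.Random(seed)
    m = len(stats_a)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(m) for _ in range(m)]  # same sentences for both systems
        if corpus_bleu([stats_a[i] for i in idx]) > corpus_bleu([stats_b[i] for i in idx]):
            wins += 1
    return wins / n_samples
```

Resampling whole sentences, rather than n-grams, keeps the dependence between a sentence's n-gram counts intact. In the paired variant, a win fraction close to 1 (for example, at least 0.95) is one conventional reading of "A is significantly better than B". That threshold is a convention assumed here, not a figure from the abstract.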
