A Tutorial on Evaluation Metrics used in Natural Language Generation
Abstract
There has been a massive surge of Natural Language Generation (NLG) models in recent years, accelerated by deep learning and the availability of large-scale datasets. With such rapid progress, it is vital to assess the extent of scientific progress made and to identify the areas and components that need improvement. To accomplish this in an automatic and reliable manner, the NLP community has actively pursued the development of automatic evaluation metrics. Especially in the last few years, there has been an increasing focus on evaluation metrics, with several criticisms of existing metrics and proposals for several new ones. This tutorial presents the evolution of automatic evaluation metrics to their current state, along with the emerging trends in this field, by specifically addressing the following questions: (i) What makes NLG evaluation challenging? (ii) Why do we need automatic evaluation metrics? (iii) What are the existing automatic evaluation metrics and how can they be organised in a coherent taxonomy? (iv) What are the criticisms and shortcomings of existing metrics? (v) What are the possible future directions of research?

* Human/Manual Evaluation
* Automatic Evaluation
- Tutorial Roadmap
Challenges of Automatic Evaluation of NLG tasks (20 min)
- Breakdown of evaluation criteria for different tasks
* Machine