Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference
Abstract
While recent progress on abstractive summarization has led to remarkably fluent summaries, factual errors in generated summaries still severely limit their use in practice. In this paper, we evaluate summaries produced by state-of-the-art models via crowdsourcing and show that such errors occur frequently, in particular with more abstractive models. We study whether textual entailment predictions can be used to detect such errors and whether they can be reduced by reranking alternative predicted summaries. This constitutes an interesting downstream application for entailment models. In our experiments, we find that out-of-the-box entailment models trained on NLI datasets do not yet offer the desired performance for the downstream task, and we therefore release our annotations as additional test data for future extrinsic evaluations of NLI.
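The reranking idea described in the abstract can be sketched as follows: score each candidate summary by an entailment model's probability that the source document entails it, then pick the highest-scoring candidate. This is a minimal illustration, not the paper's implementation; in particular, `entailment_prob` here is a toy lexical-overlap stand-in for a real NLI model's entailment probability.

```python
# Hedged sketch of entailment-based summary reranking.
# ASSUMPTION: `entailment_prob` is a placeholder for a trained NLI model's
# P(entailment | premise, hypothesis); here it is a toy token-overlap proxy.

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Toy proxy: fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(premise.lower().replace(".", "").split())
    hyp_tokens = hypothesis.lower().replace(".", "").split()
    if not hyp_tokens:
        return 0.0
    return sum(tok in premise_tokens for tok in hyp_tokens) / len(hyp_tokens)

def rerank_summaries(source: str, candidates: list[str]) -> list[str]:
    """Order candidate summaries by descending entailment score w.r.t. the source."""
    return sorted(candidates, key=lambda s: entailment_prob(source, s), reverse=True)

source = "The company reported record profits in 2018 and hired 200 new employees."
candidates = [
    "The company lost money in 2018.",               # factually unsupported
    "The company reported record profits in 2018.",  # faithful to the source
]
best = rerank_summaries(source, candidates)[0]
```

In practice, the scoring function would be replaced by a sequence-pair classifier trained on an NLI dataset; the paper's finding is precisely that such off-the-shelf models do not yet rank faithful summaries reliably.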