CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Abstract
Machine reading comprehension is a task related to question answering in which questions are not generic in scope but concern a particular document. Recently, very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets were made available to the scientific community to develop supervised methods based on deep neural networks, with promising results. These methods need very large training corpora to be effective; however, such data currently exists only for English and Chinese. The aim of this study is to develop such resources for other languages by generating questions semi-automatically from the semantic Frame analysis of large corpora. The collection of natural questions is thereby reduced to a validation/test set. We applied this method to the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.
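To make the idea of generating questions from Frame annotations more concrete, here is a minimal sketch in Python. It is not the procedure used to build CALOR-QUEST, which the abstract does not detail. The `FrameAnnotation` class, the `Discovery` frame, its role names, and the question templates are all hypothetical and chosen for illustration only. The sketch assumes that each annotated sentence carries a frame with labelled elements, and that a question can be formed by asking about one element while filling a template with the others.

```python
from dataclasses import dataclass


@dataclass
class FrameAnnotation:
    """One annotated sentence (hypothetical structure, for illustration)."""
    sentence: str
    frame: str
    elements: dict  # role name -> text span found in the sentence


# Hypothetical templates: (frame, role being asked about) -> question pattern.
# Placeholders in the pattern refer to the *other* roles of the same frame.
TEMPLATES = {
    ("Discovery", "Discoverer"): "Who discovered {Discovered_entity}?",
    ("Discovery", "Time"): "When was {Discovered_entity} discovered?",
}


def generate_triplets(ann):
    """Return (document, question, answer) triplets for one annotation."""
    triplets = []
    for (frame, target_role), pattern in TEMPLATES.items():
        # Skip templates for other frames or for roles absent from the sentence.
        if frame != ann.frame or target_role not in ann.elements:
            continue
        # The answer span must not leak into the question.
        context = {r: s for r, s in ann.elements.items() if r != target_role}
        try:
            question = pattern.format(**context)
        except KeyError:
            # A role required by the template is not annotated; skip it.
            continue
        triplets.append((ann.sentence, question, ann.elements[target_role]))
    return triplets


example = FrameAnnotation(
    sentence="Marie Curie discovered polonium in 1898.",
    frame="Discovery",
    elements={
        "Discoverer": "Marie Curie",
        "Discovered_entity": "polonium",
        "Time": "1898",
    },
)

for document, question, answer in generate_triplets(example):
    print(question, "->", answer)
```

Under these assumptions, each annotated frame element yields one training triplet whose answer is a span of the source sentence, which is the format used by corpora such as SQuAD. Naturally written questions would still be needed for the validation/test set, as the abstract notes.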