Automatically Classifying Edit Categories in Wikipedia Revisions
Abstract
In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase, or vandalism to edits in a document. Our features are based on differences between two versions of a document, including metadata, textual and language properties, and markup. In a supervised machine learning experiment, we achieve a micro-averaged F1 score of 0.62 on a corpus of edits from the English Wikipedia. In this corpus, each edit has been multi-labeled according to a 21-category taxonomy. A model trained on the same data achieves state-of-the-art performance on the related task of fluency edit classification. We apply pattern mining to automatically labeled edits in the revision histories of different Wikipedia articles. Our results suggest that high-quality articles show a higher degree of homogeneity with respect to their collaboration patterns as compared to random articles.
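For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed in a multi-label setting, where each edit may carry several categories at once. It is not the authors' evaluation code. The function name `micro_f1` and the label strings are illustrative assumptions and do not reproduce the 21-category taxonomy. Micro-averaging pools true positives, false positives, and false negatives over all edits and labels before computing precision and recall.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions.

    gold[i] and pred[i] are the sets of category labels assigned to edit i.
    Counts are pooled across all edits and labels before computing the score.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for three edits (illustrative only).
gold = [{"spelling"}, {"paraphrase", "markup"}, {"vandalism"}]
pred = [{"spelling"}, {"paraphrase"}, {"spelling"}]

score = micro_f1(gold, pred)  # pooled counts: tp=2, fp=1, fn=2
print(round(score, 4))
```

Because the counts are pooled, frequent categories influence the score more than rare ones, which is the usual reason to report micro-averaged rather than macro-averaged figures for a skewed label distribution.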