Lexicon-assisted tagging and lemmatization in Latin: A comparison of six taggers and two lemmatization models
2015, pp. 105–113
Abstract
We present a survey of tagging accuracies, covering both part-of-speech (PoS) and full morphological tagging, for several taggers based on a corpus of medieval church Latin (see www.comphistsem.org). The best tagger in our sample, Lapos, has a PoS tagging accuracy of close to 96% and an overall tagging accuracy (including full morphological tagging) of about 85%. When we 'intersect' the taggers with our lexicon, the overall tagging accuracy increases to almost 91% for Lapos. A conservative assessment of lemmatization accuracy on our data yields a score of 93-94% for a lexicon-based lemmatization strategy and a score of 94-95% for lemmatizing via trained lemmatizers.
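The abstract does not spell out how taggers are 'intersected' with the lexicon. One plausible reading, sketched below purely as an illustration, is that the tagger's ranked candidate tags for a word form are filtered down to the tags the lexicon licenses for that form. The function name, the tag strings, and the assumption that the tagger exposes a ranked candidate list are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def intersect_with_lexicon(
    form: str,
    ranked_tags: List[str],
    lexicon: Dict[str, Set[str]],
) -> str:
    """Pick the best-ranked tagger candidate that the lexicon allows.

    Hypothetical sketch: `ranked_tags` is assumed to be the tagger's
    candidates for `form`, best first; `lexicon` maps lower-cased word
    forms to the set of tags it licenses for them.
    """
    if not ranked_tags:
        raise ValueError("tagger produced no candidates")

    allowed = lexicon.get(form.lower())
    if not allowed:
        # Unknown form: the lexicon cannot help, keep the tagger's choice.
        return ranked_tags[0]

    for tag in ranked_tags:
        if tag in allowed:
            return tag

    # No candidate is licensed by the lexicon: fall back to the tagger.
    return ranked_tags[0]


# Toy data with made-up tag labels.
toy_lexicon = {"dei": {"NOUN|Gen|Sg", "NOUN|Nom|Pl"}}
toy_candidates = ["NOUN|Dat|Sg", "NOUN|Gen|Sg"]
chosen = intersect_with_lexicon("Dei", toy_candidates, toy_lexicon)
```

In this toy case the tagger's first choice is not licensed by the lexicon, so the second candidate is selected. Forms missing from the lexicon keep the tagger's original top choice.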