Enriching the knowledge sources used in a maximum entropy part-of-speech tagger
2000, Vol. 13, pp. 63–70
Abstract
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
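The three feature classes named in the abstract can be illustrated with a minimal sketch of a contextual feature extractor of the kind a maximum entropy tagger consumes. The abstract does not give the actual feature templates, so everything here is an assumption made for illustration: the function name `extract_features`, the feature names, the word lists `AUXILIARIES` and `PARTICLE_CANDIDATES`, and the three-token look-back window are all hypothetical.

```python
from typing import Dict, List

# Hypothetical word lists, chosen only for illustration; the abstract
# does not specify which words or templates the tagger actually uses.
AUXILIARIES = {"have", "has", "had", "be", "is", "are", "was", "were",
               "been", "will", "would", "to"}
PARTICLE_CANDIDATES = {"up", "out", "off", "down", "in", "on", "over"}


def extract_features(words: List[str], i: int, is_unknown: bool) -> Dict[str, bool]:
    """Return binary features for the token at position i (illustrative only)."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<s>"
    feats = {
        f"word={word.lower()}": True,
        f"prev_word={prev_word.lower()}": True,
    }

    # (i) Capitalization cues, fired only for words unseen in training.
    if is_unknown:
        if word[:1].isupper():
            feats["unk_init_cap"] = True
            if i > 0:
                # Capitalized away from the sentence start.
                feats["unk_init_cap_not_sentence_initial"] = True
        if word.isupper():
            feats["unk_all_caps"] = True

    # (ii) Verb-tense cue: an auxiliary among the previous three tokens
    # (the window size is an assumption).
    window = [w.lower() for w in words[max(0, i - 3):i]]
    if any(w in AUXILIARIES for w in window):
        feats["aux_in_prev_3"] = True

    # (iii) Particle cue: pair a particle-prone word with the preceding word.
    if word.lower() in PARTICLE_CANDIDATES:
        feats["particle_candidate"] = True
        feats[f"particle_candidate_after={prev_word.lower()}"] = True

    return feats
```

In a full tagger, each such binary feature would be paired with a candidate tag and given a weight estimated during training; the sketch covers only the extraction step.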