NOVA
Abstract
A feasible and flexible annotation system is designed for joint tokenization and part-of-speech (POS) tagging of languages that lack a natural definition of words. The design is motivated by the fact that word separators are not used in many highly analytic East and Southeast Asian languages. Although several of these languages are well studied, e.g., Chinese and Japanese, many are understudied and low-resourced, e.g., Burmese (Myanmar) and Khmer. In the first part of the article, the proposed annotation system, named nova, is introduced. nova contains only four basic tags (n, v, a, and o); these tags can be further modified and combined to adapt to complex linguistic phenomena in tokenization and POS tagging. In the second part of the article, the feasibility and flexibility of nova are illustrated through annotation practice on Burmese and Khmer. The relation between nova and two universal POS tagsets is discussed in the final part of the article.