Named Entity Recognition: A Survey for Indian Languages
Abstract
Named Entity Recognition (NER) is a technique, grounded in Artificial Intelligence (AI) and Natural Language Processing (NLP), for automatically tagging Named Entities in unstructured text. Named Entities are generally proper nouns, such as the names of persons, organizations, and locations. NER is a crucial component of NLP, with application areas that include Information Extraction and Retrieval, Machine Translation, and Text Summarization. This paper analyses the approaches used for NER in Indian languages, with emphasis on Hindi, and compares three of them: Machine Learning (ML), Rule-based, and Hybrid. The study is designed to provide a gap analysis of the available NER systems for Indian languages, especially Hindi. Such an analysis is needed because existing NER systems report accuracies on predefined datasets, and those results do not carry over to arbitrary datasets. In addition, no standard dataset is available for a fair comparison of accuracies across Indian languages. We restrict our scope to Hindi because it is the official language of India and is representative of other Indo-Aryan languages with a similar structure, which makes a solution for Hindi more broadly applicable. Moreover, the set of Named Entity tags covered by current systems is limited, although there is scope for finer-grained classification of Named Entities. As future work, we intend to develop a system that is more accurate and that covers more Named Entity tags than the Hindi NER tools presently available.