Sentiment Analysis for a Resource Poor Language—Roman Urdu
Citations over time: top 10% of 2019 papers
Abstract
Sentiment analysis is an important sub-task of Natural Language Processing that aims to determine the polarity of a review. Most work on sentiment analysis targets the world's resource-rich languages, while very little has been done for resource-poor languages. In this work, we focus on developing a sentiment analysis system for Roman Urdu, a resource-poor language. To this end, a dataset of 11,000 reviews was gathered from six different domains. Comprehensive annotation guidelines were defined, and the dataset was annotated using a multi-annotator methodology. State-of-the-art algorithms were then trained on the annotated dataset to build a sentiment analysis system. To improve the results of these algorithms, four different studies were carried out based on word-level features, character-level features, and feature union. The best results reduced the error rate by 12% relative to the baseline (80.07%). To check whether the improvements are statistically significant, we applied a t-test and confidence intervals to the obtained results and found that the best result of each study is significantly different from the baseline.
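The abstract names word-level features, character-level features, and their union without giving the exact settings. The sketch below shows one common way to build such a combined representation. It is a minimal illustration, not the authors' pipeline. The n-gram ranges (word 1 to 2, character 2 to 4), the whitespace tokenizer, the raw counts, and the `w:`/`c:` key prefixes are all assumptions, and the function names are hypothetical. The Roman Urdu sentence is an invented example.

```python
from collections import Counter


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> Counter:
    """Count word n-grams after lowercasing and whitespace tokenization (assumed settings)."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["w:" + " ".join(tokens[i:i + n])] += 1
    return feats


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count character n-grams inside each word, padding word edges with a space."""
    feats = Counter()
    for token in text.lower().split():
        padded = f" {token} "
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                feats["c:" + padded[i:i + n]] += 1
    return feats


def feature_union(text: str) -> Counter:
    """Concatenate both feature spaces; the key prefixes keep them disjoint."""
    return word_ngrams(text) + char_ngrams(text)


# Hypothetical Roman Urdu review ("this phone is very good").
features = feature_union("yeh phone bohat acha hai")
print(features["w:bohat acha"])  # → 1
```

Because the prefixes keep the two vocabularies separate, adding the counters concatenates the feature spaces rather than merging them. The result can be vectorized and passed to any standard classifier. In scikit-learn, the analogous construction is a `FeatureUnion` of a `CountVectorizer(analyzer="word")` and a `CountVectorizer(analyzer="char_wb")`.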