Code-mixed sentiment analysis of Indonesian language and Javanese language using Lexicon based approach
Abstract
Mixing one language with another, in speech or in writing, has become common practice for bilingual speakers in daily conversation and on social media. The lexicon-based approach is one way to extract sentiment from text. This study compares two lexicon models, SentiWordNet and VADER, in extracting the polarity of code-mixed sentences in Indonesian and Javanese. A total of 3,963 tweets were gathered from two accounts that post code-mixed tweets. The tweets were pre-processed by removing duplicates, translating to English, filtering special characters, converting to lower case, and filtering stop words. The positive and negative word scores from each lexicon model were then combined using a simple mathematical formula to classify the polarity. Compared with manual labelling, SentiWordNet performed better than VADER on negative sentiments, but neither lexicon model performed well on neutral and positive sentiments. In overall performance, VADER was better than SentiWordNet. The study found that the main cause of misclassification was that many Indonesian and Javanese words were treated as positive by both lexicon models.
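The abstract does not give the exact formula or tooling, so the following is only a minimal sketch of the kind of pipeline it describes: duplicates are removed, special characters and stop words are filtered, text is lower-cased, and polarity is taken from the sign of the summed positive scores minus the summed negative scores. The toy lexicon, the stop-word list, the function names, and the scoring rule are all assumptions made for illustration, and the translation step is assumed to have been done already. A real lexicon model such as VADER would replace the toy dictionary.

```python
import re

# Hypothetical toy lexicon: word -> (positive score, negative score).
# These values are illustrative only and are not taken from any real lexicon.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "sad": (0.0, 0.75),
}

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "i", "am", "this"}


def preprocess(tweets):
    """Drop exact duplicates, strip special characters, lower-case, remove stop words.

    Assumes the tweets have already been translated to English.
    Returns one token list per unique tweet, in first-seen order.
    """
    seen = set()
    cleaned = []
    for tweet in tweets:
        if tweet in seen:
            continue
        seen.add(tweet)
        # Replace anything that is not a letter or whitespace, then lower-case.
        text = re.sub(r"[^A-Za-z\s]", " ", tweet).lower()
        tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
        cleaned.append(tokens)
    return cleaned


def classify(tokens, lexicon=TOY_LEXICON):
    """Assumed scoring rule: sign of (sum of positive scores - sum of negative scores)."""
    pos = sum(lexicon.get(tok, (0.0, 0.0))[0] for tok in tokens)
    neg = sum(lexicon.get(tok, (0.0, 0.0))[1] for tok in tokens)
    score = pos - neg
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sample = ["This is GOOD!!!", "This is GOOD!!!", "I am sad :("]
    for tokens in preprocess(sample):
        print(tokens, classify(tokens))
```

Words missing from the lexicon contribute nothing to the score, so a sentence made up only of unknown words is labelled neutral under this rule.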