Sentiment classification for Indonesian message in social media
Citations Over TimeTop 10% of 2012 papers
Abstract
Nowadays, classifying sentiment from social media has been a strategic thing since people can express their feeling about something in an easy way and short text. Mining opinion from social media has become important because people are usually honest with their feeling on something. In our research, we tried to identify the problems of classifying sentiment from Indonesian social media. We identified that people tend to express their opinion in text while the emoticon is rarely used and sometimes misleading. We also identified that the Indonesian social media opinion can be classified not only to positive, negative, neutral and question but also to a special mix case between negative and question type. Basically there are two levels of problem: word level and sentence level. Word level problems include the usage of punctuation mark, the number usage to replace letter, misspelled word and the usage of nonstandard abbreviation. In sentence level, the problem is related with the sentiment type such as mentioned before. In our research, we built a sentiment classification system which includes several steps such as text preprocessing, feature extraction, and classification. The text preprocessing aims to transform the informal text into formal text. The word formalization method in that we use is the deletion of punctuation mark, the tokenization, conversion of number to letter, the reduction of repetition letter, and using corpus with Levensthein to formalize abbreviation. The sentence formalization method that we use is negation handling, sentiment relative, and affixes handling. Rule-based, SVM and Maximum Entropy are used as the classification algorithms with features of count of positive, negative, and question word in sentence and bigram. From our experimental result, the best classification method is SVM that yields 83.5% accuracy.
Related Papers
- Learning About Punctuation(1996)
- The Cassell guide to punctuation(2000)
- → Features of idiolect in the punctuation of electronic mail(2011)1 cited
- → Aleksandr Solzhenitsyn and Codification of Russian Punctuation(2018)1 cited
- THE EXPRESSIVE USE OF PUNCTUATION IN FICTION (on the material of the French)(2011)