An N-gram-Based BERT model for Sentiment Classification Using Movie Reviews
Abstract
An abundance of product reviews and opinions is produced every day across the internet and other media. Sentiment analysis examines such data and classifies each item as positive or negative. In this paper, a classification model is proposed for n-gram sentiment analysis using BERT. Specifically, the large IMDB movie review dataset, which contains 50K instances, is used. This dataset is tokenized and encoded into unigrams, bigrams, and trigrams, as well as their combinations: unigram and bigram; bigram and trigram; and unigram, bigram, and trigram. The proposed BERT model is applied to these extracted features. The model is then evaluated using the F1 score and its micro, macro, and weighted averages. The model shows results comparable to state-of-the-art methods for all n-gram features. In particular, it achieves its highest accuracies on the combined features: 94.64% for bigram and trigram features, and 94.68% for unigram, bigram, and trigram features, both higher than for the other n-gram features.
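The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: whitespace tokenization, the function names `ngrams`, `combined_ngrams`, and `f1_averages`, and the toy labels are all assumptions made for illustration, and the abstract does not say how the n-gram features are passed to BERT.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def combined_ngrams(tokens: Sequence[str], orders: Sequence[int]) -> List[Tuple[str, ...]]:
    """Concatenate n-grams of several orders, e.g. (2, 3) for bigram + trigram."""
    features: List[Tuple[str, ...]] = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


def f1_averages(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute micro, macro, and weighted-average F1 from two label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    per_class: Dict[int, float] = {}
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    return {
        # Micro: pool the counts over all classes, then compute one F1.
        "micro": 2 * tp_sum / micro_denom if micro_denom else 0.0,
        # Macro: unweighted mean of the per-class F1 scores.
        "macro": sum(per_class.values()) / len(labels),
        # Weighted: per-class F1 weighted by class support.
        "weighted": sum(per_class[l] * support[l] for l in labels) / len(y_true),
    }


# Hypothetical review text; whitespace splitting stands in for a real tokenizer.
tokens = "the movie was not good".split()
bigram_trigram = combined_ngrams(tokens, (2, 3))
scores = f1_averages([1, 1, 0, 0], [1, 0, 0, 0])
```

Here `combined_ngrams(tokens, (1, 2, 3))` would give the unigram, bigram, and trigram combination. For single-label classification, micro-averaged F1 equals accuracy, which is one reason the three averages are usually reported together.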
Related Papers
- Optimizing n‑gram Order of an n‑gram Based Language Identification Algorithm for 68 Written Languages (2009), 18 citations
- N-Gram Accuracy Analysis in the Method of Chatbot Response (2018), 13 citations
- An N-gram-Based BERT model for Sentiment Classification Using Movie Reviews (2022), 8 citations
- Class phrase models for language modeling (2002), 45 citations