Abstract
Movie reviews are a valuable source of information for potential viewers, but reading all of them is time-consuming and overwhelming. A summary of the reviews helps viewers make an informed choice without reading each one. Sentiment analysis, or opinion mining, extracts subjective information from movie reviews, such as the reviewer’s overall opinion of the movie, its strengths and weaknesses, and the reviewer’s recommendations. This information helps potential viewers decide whether to watch a movie. XLNet and Bidirectional Encoder Representations from Transformers (BERT) are pre-trained language models that learn bidirectional relationships between words, which improves performance on many natural language processing tasks. BERT uses a masked language modeling objective, while XLNet uses a permutation language modeling objective. In this experiment, we apply the proposed XLNet- and BERT-based method, alongside popular baseline techniques, to the Internet Movie Database (IMDB) Dataset of 50K reviews and the Rotten Tomatoes dataset. We pre-processed both datasets with data cleaning, removal of duplicate reviews, lemmatization, and handling of chat words to improve the results of the baseline techniques. The results indicate that XLNet achieved the highest accuracy on both datasets. The experiment shows that sentiment analysis provides insight into how emotions and attitudes are expressed in movie reviews, and that the overall sentiment of the reviews can be used to predict a movie’s performance.
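The pre-processing steps named in the abstract (data cleaning, duplicate removal, and chat-word handling) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `CHAT_WORDS` map, the function names, and the order of the steps are assumptions made for illustration, and lemmatization is left out because it requires an external resource such as NLTK's WordNet data.

```python
import re

# Hypothetical chat-word map; the abstract does not list the one actually used.
CHAT_WORDS = {"imo": "in my opinion", "gr8": "great", "u": "you", "btw": "by the way"}

TAG_RE = re.compile(r"<[^>]+>")      # HTML remnants such as <br />
TOKEN_RE = re.compile(r"[a-z0-9]+")  # lowercase alphanumeric tokens


def clean_review(text: str) -> str:
    """Strip HTML tags, lowercase, drop punctuation, and expand chat words."""
    text = TAG_RE.sub(" ", text).lower()
    tokens = [CHAT_WORDS.get(tok, tok) for tok in TOKEN_RE.findall(text)]
    return " ".join(tokens)


def preprocess(reviews: list[str]) -> list[str]:
    """Clean every review and drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for review in reviews:
        cleaned = clean_review(review)
        # Deduplicating after cleaning also merges reviews that differ
        # only in case, punctuation, or markup (an assumption of this sketch).
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

For example, `preprocess(["Great film!", "great film", "Bad."])` keeps only one copy of the first two reviews, since they are identical after cleaning.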
Data Availability
The data that support the findings of this study are openly available through the Open Science Framework at https://github.com/Ankit152/IMDB-sentiment-analysis.git and https://www.kaggle.com/datasets/talha002/rottentomatoes-400k-review
Abbreviations
- AI: Artificial Intelligence
- BERT: Bidirectional Encoder Representations from Transformers
- BOWs: Bag-of-Words
- CV: Cross-Validation
- DL: Deep Learning
- IMDB: Internet Movie Database
- KNN: K-Nearest Neighbours
- LR: Logistic Regression
- LSVM: Linear Support Vector Machines
- ML: Machine Learning
- MLP: Multi-layer Perceptron
- MNB: Multinomial Naive Bayes
- NB: Naive Bayes
- NLP: Natural Language Processing
- NLTK: Natural Language Tool Kit
- OP: Opinion Mining
- PAC: Passive Aggressive Classifier
- RT: Rotten Tomatoes
- SA: Sentiment Analysis
- TF-IDF: Term Frequency-Inverse Document Frequency
Acknowledgements
We sincerely thank everyone who helped us complete this research paper. We are grateful to the participants for their helpful feedback and ideas, which improved our research methods and the quality of our results. This research would not have been possible without those who gave their time to join our study.
Funding
This paper is for free publication.
Ethics declarations
Ethical Approval
Not Applicable.
Conflict of Interests
The authors of this paper declare that they do not have any conflicts of interest.
Financial Interests
The authors of this paper have no competing interests relevant to this article’s content to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Danyal, M.M., Khan, S.S., Khan, M. et al. Proposing sentiment analysis model based on BERT and XLNet for movie reviews. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18156-5
DOI: https://doi.org/10.1007/s11042-024-18156-5