Proposing sentiment analysis model based on BERT and XLNet for movie reviews

Danyal, Mian Muhammad; Khan, Sarwar Shah; Khan, Muzammil; Ullah, Subhan; Mehmood, Faheem; Ali, Ijaz

doi:10.1007/s11042-024-18156-5

Published: 15 January 2024

Multimedia Tools and Applications (2024)

Mian Muhammad Danyal, Sarwar Shah Khan, Muzammil Khan (ORCID: orcid.org/0000-0003-4656-1041), Subhan Ullah, Faheem Mehmood & Ijaz Ali

Abstract

Movie reviews are a valuable source of information for potential viewers, but reading all of them is time-consuming and overwhelming. Summarizing the reviews helps viewers make a choice without reading every one. Sentiment analysis, or opinion mining, can extract subjective information from movie reviews, such as the reviewer’s overall opinion of the movie, its strengths and weaknesses, and the reviewer’s recommendations. This information can help potential viewers make informed decisions about whether to watch a movie. XLNet and Bidirectional Encoder Representations from Transformers (BERT) are advanced pre-trained language models that learn bidirectional relationships between words, improving performance on many natural language processing tasks. BERT uses a masked language modeling objective, while XLNet uses a permutation language modeling objective. The experiment evaluates the proposed method, based on XLNet and BERT, alongside popular baseline techniques, using the Internet Movie Database (IMDB) dataset of 50K reviews and the Rotten Tomatoes dataset. We pre-processed both datasets using data cleaning, removal of duplicate reviews, lemmatization, and handling of chat words to improve the results of the baseline techniques. The results indicate that XLNet achieved the highest accuracy on both datasets. The experiment shows that sentiment analysis provides insight into how emotions and attitudes are expressed in movie reviews, which can be used to predict a movie’s performance from the overall sentiment of its reviews.
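The pre-processing steps named in the abstract (data cleaning, duplicate removal, and chat-word handling) can be illustrated with a minimal sketch. This is not the authors' pipeline: the chat-word table, the regular expressions, the function names, and the choice to remove duplicates after cleaning are all assumptions made for illustration. Lemmatization is omitted because it needs an external library such as NLTK.

```python
import html
import re

# Hypothetical chat-word table; the paper's actual list is not given in the abstract.
CHAT_WORDS = {"imo": "in my opinion", "btw": "by the way", "gr8": "great", "u": "you"}

TAG_RE = re.compile(r"<[^>]+>")                # HTML tags such as <br />
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s']")     # anything except letters, digits, spaces, apostrophes


def clean_review(text: str) -> str:
    """Lowercase, strip HTML tags and punctuation, and expand chat words."""
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text).lower()
    text = NON_ALNUM_RE.sub(" ", text)
    tokens = [CHAT_WORDS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def preprocess(reviews: list[str]) -> list[str]:
    """Clean each review and drop duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for review in reviews:
        cleaned = clean_review(review)
        # Assumption: duplicates are detected on the cleaned text, so reviews
        # that differ only in markup, case, or punctuation count as duplicates.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


sample = [
    "Gr8 movie, IMO!<br />Loved it.",
    "gr8 movie imo loved it",
    "Boring... btw the ending was weak.",
]
cleaned_reviews = preprocess(sample)
print(cleaned_reviews)
```

In this sketch the first two sample reviews collapse into one entry because they are identical after cleaning. A pipeline that removes only exact duplicates of the raw text would keep both.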

Data Availability

The data that support the findings of this study are openly available on GitHub at https://github.com/Ankit152/IMDB-sentiment-analysis.git and on Kaggle at https://www.kaggle.com/datasets/talha002/rottentomatoes-400k-review

Abbreviations

AI: Artificial Intelligence
BERT: Bidirectional Encoder Representations from Transformers
BOWs: Bag-of-Words
CV: Cross-Validation
DL: Deep Learning
IMDB: Internet Movie Database
KNN: K-Nearest Neighbours
LR: Logistic Regression
LSVM: Linear Support Vector Machines
ML: Machine Learning
MLP: Multi-layer Perceptron
MNB: Multinomial Naive Bayes
NB: Naive Bayes
NLP: Natural Language Processing
NLTK: Natural Language Tool Kit
OP: Opinion Mining
PAC: Passive Aggressive Classifier
RT: Rotten Tomatoes
SA: Sentiment Analysis
TF-IDF: Term Frequency-Inverse Document Frequency

Acknowledgements

We sincerely thank everyone who helped us finish this research paper. We are grateful to the participants for their helpful feedback and ideas, which improved our research methods and the quality of our results. We appreciate everyone who gave their time to join our study, as this research wouldn’t have been possible without them. Thank you to everyone who took the time to contribute to this research paper.

Funding

This paper is for free publication.

Author information

Mian Muhammad Danyal, Muzammil Khan, Subhan Ullah, Faheem Mehmood, and Ijaz Ali contributed equally to this work.

Authors and Affiliations

Department of Computer Science, City University of Science and Information Technology, Peshawar, 25000, Pakistan
Mian Muhammad Danyal & Subhan Ullah
Department of Computer and Software Technology, University of Swat, Swat, 19130, Pakistan
Sarwar Shah Khan & Muzammil Khan
Department of Computer Science, Iqra National University Swat, Swat, 19130, Pakistan
Sarwar Shah Khan & Ijaz Ali
Department of Computer Science, Air University Islamabad, Islamabad, 44320, Pakistan
Faheem Mehmood

Corresponding author

Correspondence to Muzammil Khan.

Ethics declarations

Ethical Approval

Not Applicable.

Conflict of Interests

The authors of this paper declare that they do not have any conflicts of interest.

Financial Interests

The authors of this paper have no competing interests relevant to this article’s content to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

The snapshot of the IMDB dataset can be seen in Fig. 11.

The snapshot of the Rotten Tomatoes dataset can be seen in Fig. 12.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Danyal, M.M., Khan, S.S., Khan, M. et al. Proposing sentiment analysis model based on BERT and XLNet for movie reviews. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18156-5

Received: 24 July 2023
Revised: 27 December 2023
Accepted: 03 January 2024
Published: 15 January 2024
DOI: https://doi.org/10.1007/s11042-024-18156-5
