Proposing sentiment analysis model based on BERT and XLNet for movie reviews

Multimedia Tools and Applications

Abstract

Movie reviews are a valuable source of information for potential viewers, but reading all of them can be time-consuming and overwhelming. Summarizing the reviews helps viewers make an informed choice without reading each one. Sentiment analysis, or opinion mining, can extract subjective information from movie reviews, such as the reviewer’s overall opinion of the movie, its strengths and weaknesses, and the reviewer’s recommendations. This information can help potential viewers decide whether to watch a movie. XLNet and Bidirectional Encoder Representations from Transformers (BERT) are pre-trained language models that learn bidirectional relationships between words, improving performance on many natural language processing tasks. BERT uses a masked language modeling objective, while XLNet uses a permutation language modeling objective. In this study, we evaluate the proposed XLNet- and BERT-based method alongside popular baseline techniques on the Internet Movie Database (IMDB) Dataset of 50K reviews and the Rotten Tomatoes dataset. We pre-processed both datasets using data cleaning, removal of duplicate reviews, lemmatization, and handling of chat words to improve the results of the baseline techniques. The results indicate that XLNet achieved the highest accuracy on both datasets. The experiments show that sentiment analysis provides insight into how emotions and attitudes are expressed in movie reviews, which can be used to predict a movie’s performance from the overall sentiment of its reviews.
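The pre-processing steps named in the abstract (data cleaning, duplicate removal, chat-word handling) might look roughly like the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the CHAT_WORDS table, the function names, and the choice to detect duplicates after cleaning are all hypothetical. Lemmatization is left out because it would need an external library such as NLTK.

```python
import html
import re

# Hypothetical chat-word table for illustration only; the list used in
# the paper is not given on this page.
CHAT_WORDS = {
    "gr8": "great",
    "imo": "in my opinion",
    "u": "you",
    "luv": "love",
}

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags such as <br />
NOISE_RE = re.compile(r"[^a-z0-9'\s]")   # anything but letters, digits, ', space


def clean_review(text: str) -> str:
    """Strip HTML, lowercase, remove punctuation, and expand chat words."""
    text = html.unescape(text)       # decode entities like &amp;
    text = TAG_RE.sub(" ", text)     # replace tags with a space
    text = text.lower()
    text = NOISE_RE.sub(" ", text)   # replace punctuation with a space
    tokens = [CHAT_WORDS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def preprocess(reviews: list[str]) -> list[str]:
    """Clean every review and drop duplicates, keeping the first occurrence.

    Assumption: two reviews count as duplicates when their cleaned text
    is identical.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for review in reviews:
        cleaned = clean_review(review)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


if __name__ == "__main__":
    sample = ["Gr8 movie!<br />IMO u will luv it.", "Great film!", "great film"]
    for line in preprocess(sample):
        print(line)
```

Comparing cleaned text rather than raw text means reviews that differ only in case, markup, or punctuation are treated as the same review. Whether the original study applied that rule is not stated here.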

Data Availability

The data that support the findings of this study are openly available on GitHub at https://github.com/Ankit152/IMDB-sentiment-analysis.git (IMDB) and on Kaggle at https://www.kaggle.com/datasets/talha002/rottentomatoes-400k-review (Rotten Tomatoes).

Abbreviations

AI: Artificial Intelligence
BERT: Bidirectional Encoder Representations from Transformers
BOWs: Bag-of-Words
CV: Cross-Validation
DL: Deep Learning
IMDB: Internet Movie Database
KNN: K-Nearest Neighbours
LR: Logistic Regression
LSVM: Linear Support Vector Machines
ML: Machine Learning
MLP: Multi-layer Perceptron
MNB: Multinomial Naive Bayes
NB: Naive Bayes
NLP: Natural Language Processing
NLTK: Natural Language Tool Kit
OP: Opinion Mining
PAC: Passive Aggressive Classifier
RT: Rotten Tomatoes
SA: Sentiment Analysis
TF-IDF: Term Frequency-Inverse Document Frequency

Acknowledgements

We sincerely thank everyone who helped us complete this research. We are grateful to the participants for their feedback and ideas, which improved our research methods and the quality of our results, and to everyone who gave their time to join our study; this research would not have been possible without them.

Funding

This paper is for free publication.

Author information

Corresponding author

Correspondence to Muzammil Khan.

Ethics declarations

Ethical Approval

Not Applicable.

Conflict of Interests

The authors of this paper declare that they do not have any conflicts of interest.

Financial Interests

The authors of this paper have no competing interests relevant to this article’s content to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

The snapshot of the IMDB dataset can be seen in Fig. 11.

Fig. 11: IMDB Dataset

The snapshot of the Rotten Tomatoes dataset can be seen in Fig. 12.

Fig. 12: Rotten Tomatoes Dataset Snapshot

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Danyal, M.M., Khan, S.S., Khan, M. et al. Proposing sentiment analysis model based on BERT and XLNet for movie reviews. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18156-5

  • DOI: https://doi.org/10.1007/s11042-024-18156-5
