Abstract
Twitter Sentiment Analysis (TSA), a text classification task, has attracted considerable attention from researchers in recent years. This paper presents a machine learning approach that solves the TSA problem in three phases. In the second phase, a suitable value for representing each feature in the Vector Space Model is determined through a weighted combination of the values obtained from four methods (i.e., Term Frequency and Inverse Document Frequency, semantic similarity, sentiment scoring using SentiWordNet, and sentiment scoring based on the class of tweets). Finding the contribution, or weight, of each method is formulated as an optimization problem and solved using a genetic algorithm. The weighted values obtained from the four methods are then combined using the Einstein sum, a well-known T-conorm. Finally, the performance of the proposed method is evaluated by the accuracy of support vector machine and multinomial naïve Bayes classifiers on four well-known Twitter datasets, namely the Stanford testing dataset, the STS-Gold dataset, the Obama-McCain Debate dataset, and the Strict Obama-McCain Debate dataset. The results show that the proposed method outperforms the other methods it is compared with.
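The combination step described in the abstract can be illustrated with a minimal sketch. The Einstein sum of two values a and b in [0, 1] is (a + b) / (1 + ab). The function names, the assumption that each method's value and weight are already scaled to [0, 1], and the left-to-right folding over the four weighted values are illustrative choices made here, not details confirmed by the abstract. The weights are taken as given, and the genetic algorithm search for them is not shown.

```python
from functools import reduce


def einstein_sum(a: float, b: float) -> float:
    """Einstein sum, a T-conorm on [0, 1]: (a + b) / (1 + a * b)."""
    return (a + b) / (1.0 + a * b)


def combine_feature_value(values, weights):
    """Fold the weighted method values into one feature value.

    Hypothetical helper: `values` holds one score per method
    (e.g. TF-IDF, semantic similarity, SentiWordNet score, class-based
    score) and `weights` holds the contribution of each method.
    Both are assumed to lie in [0, 1].
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    weighted = [w * v for w, v in zip(weights, values)]
    if any(x < 0.0 or x > 1.0 for x in weighted):
        raise ValueError("weighted values must lie in [0, 1]")
    # 0 is the identity element of the Einstein sum, so it is a safe start.
    return reduce(einstein_sum, weighted, 0.0)
```

For example, `combine_feature_value([0.5, 0.5, 0.0, 0.0], [1, 1, 1, 1])` evaluates to 0.8, since (0.5 + 0.5) / (1 + 0.25) = 0.8 and combining with 0 leaves a value unchanged. Because the Einstein sum is associative and commutative, the order in which the four values are folded does not affect the result in exact arithmetic.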
Notes
Stanford dataset official page: http://help.sentiment140.com/forstudents.
OMD dataset: https://github.com/pmbaumgartner/text-feat-lib.
References
Supriya BN, Kallimani V, Prakash S, Akki CB (2016) Twitter sentiment analysis using binary classification technique. In: International conference on nature of computation and communication ICTCC 2016: nature of computation and communication pp 91–396
Haque MdA, Rahman T (2014) Sentiment analysis by using fuzzy logic. Int J Comput Sci Eng Inf Technol (IJCSEIT) 4:33–48
Shirdastian H, Laroche M, Richard M-O (2019) Using big data analytics to study brand authenticity sentiments: the case of starbucks on twitter. Int J Inf Manage 48:291–307
Mansour R, Hady MFA, Hosam E, Amr H, Ashour A (2015) Feature selection for twitter sentiment analysis: an experimental study. In: International conference on intelligent text processing and computational linguistics CICLing computational linguistics and intelligent text processing, pp 92–103
Bao Y, Quan Ch, Wang L, Ren F (2014) The role of pre-processing in twitter sentiment analysis. In: International conference on intelligent computing ICIC: intelligent computing methodologies, pp 615–624
Keshavarz H, Abadeh M-S (2017) ALGA: adaptive lexicon learning using genetic algorithm for sentiment analysis of microblogs. Knowl-Based Syst 122:1–16
Ismail H-M, Belkhouche B, Zaki N (2018) Semantic twitter sentiment analysis based on a fuzzy thesaurus. Soft Comput 22:6011–6024
Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5:1093–1113
Asghar M-Z, Khan A, Khan F, Kundi F-M (2018) RIFT: a rule induction framework for twitter sentiment analysis. Arabian J Sci Eng 43:857–877
Le B, Nguyen H (2015) Twitter sentiment analysis using machine learning techniques. In: Advanced computational methods for knowledge engineering AISC: advances in intelligent systems and computing, pp 279–289
Pandey A-Ch, Rajpoot D-S, Saraswat M (2017) Twitter sentiment analysis using hybrid cuckoo search method. Inf Process Manage 53:764–779
Speriosu M, Sudan N, Upadhyay S, Baldridge J (2011) Twitter polarity classification with label propagation over lexical links and the follower graph. In: Conference on empirical methods in natural language processing, UK, pp 53–63
Masud F, Khan A, Ahmad S, Asghar M-Z (2014) Lexicon-based sentiment analysis in the social web. J Basic Appl Sci Res 4(6):238–248
Asghar M-Z, Kundi F-M, Ahmad Sh, Khan A, Khan F (2018) T-SAF: twitter sentiment analysis framework using a hybrid classification scheme. Exp Syst 35:1–19
Saif H, He Y, Fernandez M, Alani H (2016) Contextual semantics for sentiment analysis of Twitter. Inf Process Manage 52:5–19
Khan F-H, Qamar U, Bashir S (2016) SentiMI: introducing point-wise mutual information with SentiWordNet to improve sentiment polarity detection. Appl Soft Comput 39:140–153
Esuli A, Sebastiani F (2006) Sentiwordnet: a publicly available lexical resource for opinion mining. In: Proceedings of the fifth international conference on language resources and evaluation, pp 417–422
Nielsen F-A (2011) A new ANEW: evaluation of a word list for sentiment analysis for microblogs. In: Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: big things come in small packages, pp 93–98
Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37:267–307
Paltoglou G, Thelwall M (2010) A study of information retrieval weighting schemes for sentiment analysis. In: Proceedings of the 48th annual meeting of the association for computational linguistics: association for computational linguistics, pp 1386–1395
Yager RR, Kelman A (1996) Fusion of fuzzy information with considerations for compatibility, partial aggregation, and reinforcement. Int J Appr Reason 15:93–122
Appel O, Chiclana F, Carter J, Fujita H (2016) A hybrid approach to the sentiment analysis problem at the sentence level. Knowl-Based Syst 108:110–124
Gassert H (2018) Operators on fuzzy sets: Zadeh and Einstein. Operations on fuzzy sets, properties of T-norms and T-conorms. https://pdfs.semanticscholar.org/a045/52b74047208d23d77b8aa9f5f334b59e65ea.pdf. Accessed 8 Dec 2018
Goldberg D-E (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Massachusetts
Effrosynidis D, Symeonidis S, Arampatzis A (2017) A comparison of pre-processing techniques. In: International conference on theory and practice of digital libraries TPDL: research and advanced technology for digital libraries, pp 394–406
Salton G, Wong A, Yang C-S (1975) A vector space model for automatic indexing. Commun ACM 18:613–620
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. University of Illinois at Urbana-Champaign. Elsevier Inc
Vieira S-M, Mendonca L-F, Farinha G-J, Sousa J-M-C (2013) Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients. Appl Soft Comput 13:3494–3504
Gen M, Cheng R (1997) Genetic algorithms and engineering design. Wiley
Vapnik V-N (1995) The nature of statistical learning theory. Springer, New York
Saif H, Fernandez M, He Y, Alani H (2013) Evaluation datasets for twitter sentiment analysis: a survey and a new dataset, the STS-Gold. In: 1st international workshop on emotion and sentiment in social and expressive media: approaches and perspectives from AI (ESSEM 2013), Turin, Italy, pp 9–21
Go A, Bhayani R, Huang L (2010) Twitter sentiment classification using distant supervision. Technical report Stanford University
Shapiro SS, Wilk MB, Chen HJ (1968) A comparative study of various tests for normality. J Am Stat Assoc 63(324):1343–1372
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
The Shapiro–Wilk test is a statistical test of normality that was published in 1965. It is an appropriate choice when the sample size is small; its ability to handle small samples (n < 20) is one of its advantages [33]. The null hypothesis of the test is that the population is normally distributed. This hypothesis is rejected at the significance level α when the p-value of the test falls below α, which indicates that the data are unlikely to be normally distributed. Table 9 indicates that the results reported above in this research are normally distributed at the 0.05 significance level.
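For reference, the test statistic can be written in its standard textbook form. The notation below is not taken from this article: x_(i) denotes the i-th smallest observation, x̄ the sample mean, and a_i the tabulated coefficients derived from the expected values and covariances of standard normal order statistics.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^{2}}
         {\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}}
```

W lies in (0, 1]. Values close to 1 are consistent with normality, while small values lead to rejection of the null hypothesis.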
Cite this article
Zarisfi Kermani, F., Sadeghi, F. & Eslami, E. Solving the twitter sentiment analysis problem based on a machine learning-based approach. Evol. Intel. 13, 381–398 (2020). https://doi.org/10.1007/s12065-019-00301-x
DOI: https://doi.org/10.1007/s12065-019-00301-x