Solving the twitter sentiment analysis problem based on a machine learning-based approach

  • Research Paper
  • Evolutionary Intelligence

Abstract

Twitter Sentiment Analysis (TSA), a text classification task, has received wide attention from researchers in recent years. This paper presents a machine learning approach that solves the TSA problem in three phases. In the second phase, a suitable value for representing each feature in the Vector Space Model is determined through a weighted combination of the values obtained from four methods: Term Frequency and Inverse Document Frequency, semantic similarity, sentiment scoring using SentiWordNet, and sentiment scoring based on the class of tweets. Finding the contribution (weight) of each method is formulated as an optimization problem and solved with a genetic algorithm. The weighted values obtained from the four methods are then combined using the Einstein sum, an important T-conorm. Finally, the proposed method is evaluated by the accuracy of support vector machine and multinomial naïve Bayes classifiers on four well-known Twitter datasets: the Stanford testing dataset, the STS-Gold dataset, the Obama-McCain Debate dataset, and the Strict Obama-McCain Debate dataset. The results show that the proposed method outperforms the other methods compared.
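The combination step described in the abstract can be illustrated with a minimal sketch. It uses the standard definition of the Einstein sum, S(a, b) = (a + b) / (1 + ab), and assumes that the four method scores and their weights all lie in [0, 1]. The function names, the example scores, and the fixed weights are hypothetical; in the paper the weights are found by a genetic algorithm, and the exact formulation used by the authors is not reproduced here.

```python
from functools import reduce


def einstein_sum(a: float, b: float) -> float:
    """Einstein sum T-conorm: S(a, b) = (a + b) / (1 + a * b) for a, b in [0, 1]."""
    return (a + b) / (1.0 + a * b)


def combine_feature_value(scores, weights):
    """Weight each method's score, then fold the weighted values with the Einstein sum.

    Assumption: scores and weights are in [0, 1], so every weighted value
    is also in [0, 1] and the result stays in [0, 1].
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    for value in list(scores) + list(weights):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores and weights must lie in [0, 1]")
    weighted = [w * s for w, s in zip(weights, scores)]
    # 0 is the neutral element of a T-conorm, so it is a safe starting value.
    return reduce(einstein_sum, weighted, 0.0)


# Hypothetical scores for one feature from the four methods
# (TF-IDF, semantic similarity, SentiWordNet score, class-based score).
scores = [0.5, 0.5, 0.5, 0.5]
# Hypothetical fixed weights; the paper optimizes these with a genetic algorithm.
weights = [0.25, 0.25, 0.25, 0.25]

print(combine_feature_value(scores, weights))
```

Because the Einstein sum is associative and commutative, the order in which the four weighted values are folded does not affect the result.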

Notes

  1. https://zephoria.com/twitter-statistics-top-ten/.

  2. https://www.internetlivestats.com/twitter-statistics/.

  3. http://sentiment.christopherpotts.net/tokenizing.html.

  4. https://en.wikipedia.org/wiki/List_of_emoticons.

  5. https://www.noslang.com/dictionary.

  6. https://www.netlingo.com/acronyms.php.

  7. Stanford dataset official page: http://help.sentiment140.com/forstudents.

  8. STS-Gold dataset: https://github.com/pollockj/world_mood/blob/master/sts_gold_v03/sts_gold_tweet.csv.

  9. OMD dataset: https://github.com/pmbaumgartner/text-feat-lib.

References

  1. Supriya BN, Kallimani V, Prakash S, Akki CB (2016) Twitter sentiment analysis using binary classification technique. In: International conference on nature of computation and communication ICTCC 2016: nature of computation and communication, pp 391–396

  2. Haque MdA, Rahman T (2014) Sentiment analysis by using fuzzy logic. Int J Comput Sci Eng Inf Technol (IJCSEIT) 4:33–48

  3. Shirdastian H, Laroche M, Richard M-O (2019) Using big data analytics to study brand authenticity sentiments: the case of starbucks on twitter. Int J Inf Manage 48:291–307

  4. Mansour R, Hady MFA, Hosam E, Amr H, Ashour A (2015) Feature selection for twitter sentiment analysis: an experimental study. In: International conference on intelligent text processing and computational linguistics CICLing computational linguistics and intelligent text processing, pp 92–103

  5. Bao Y, Quan Ch, Wang L, Ren F (2014) The role of pre-processing in twitter sentiment analysis. In: International conference on intelligent computing ICIC: intelligent computing methodologies, pp 615–624

  6. Keshavarz H, Abadeh M-S (2017) ALGA: adaptive lexicon learning using genetic algorithm for sentiment analysis of microblogs. Knowl-Based Syst 122:1–16

  7. Ismail H-M, Belkhouche B, Zaki N (2018) Semantic twitter sentiment analysis based on a fuzzy thesaurus. Soft Comput 22:6011–6024

  8. Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5:1093–1113

  9. Asghar M-Z, Khan A, Khan F, Kundi F-M (2018) RIFT: a rule induction framework for twitter sentiment analysis. Arabian J Sci Eng 43:857–877

  10. Le B, Nguyen H (2015) Twitter sentiment analysis using machine learning techniques. In: Advanced computational methods for knowledge engineering AISC: advances in intelligent systems and computing, pp 279–289

  11. Pandey A-Ch, Rajpoot D-S, Saraswat M (2017) Twitter sentiment analysis using hybrid cuckoo search method. Inf Process Manage 53:764–779

  12. Speriosu M, Sudan N, Upadhyay S, Baldridge J (2011) Twitter polarity classification with label propagation over lexical links and the follower graph. In: Conference on empirical methods in natural language processing, UK, pp 53–63

  13. Masud F, Khan A, Ahmad S, Asghar M-Z (2014) Lexicon-based sentiment analysis in the social web. J Basic Appl Sci Res 4(6):238–248

  14. Asghar M-Z, Kundi F-M, Ahmad Sh, Khan A, Khan F (2018) T-SAF: twitter sentiment analysis framework using a hybrid classification scheme. Exp Syst 35:1–19

  15. Saif H, He Y, Fernandez M, Alani H (2016) Contextual semantics for sentiment analysis of Twitter. Inf Process Manage 52:5–19

  16. Khan F-H, Qamar U, Bashir S (2016) SentiMI: introducing point-wise mutual information with SentiWordNet to improve sentiment polarity detection. Appl Soft Comput 39:140–153

  17. Esuli A, Sebastiani F (2006) Sentiwordnet: a publicly available lexical resource for opinion mining. In: Proceedings of the fifth international conference on language resources and evaluation, pp 417–422

  18. Nielsen F-A (2011) A new ANEW: evaluation of a word list for sentiment analysis for microblogs. In: Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: big things come in small packages, pp 93–98

  19. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37:267–307

  20. Paltoglou G, Thelwall M (2010) A study of information retrieval weighting schemes for sentiment analysis. In: Proceedings of the 48th annual meeting of the association for computational linguistics: association for computational linguistics, pp 1386–1395

  21. Yager RR, Kelman A (1996) Fusion of fuzzy information with considerations for compatibility, partial aggregation, and reinforcement. Int J Appr Reason 15:93–122

  22. Appel O, Chiclana F, Carter J, Fujita H (2016) A hybrid approach to the sentiment analysis problem at the sentence level. Knowl-Based Syst 108:110–124

  23. Gassert H (2018) Operators on fuzzy sets: Zadeh and Einstein. Operations on fuzzy sets, properties of T-norms and T-conorms. https://pdfs.semanticscholar.org/a045/52b74047208d23d77b8aa9f5f334b59e65ea.pdf. Accessed 8 Dec 2018

  24. Goldberg D-E (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Massachusetts

  25. Effrosynidis D, Symeonidis S, Arampatzis A (2017) A comparison of pre-processing techniques. In: International conference on theory and practice of digital libraries TPDL: research and advanced technology for digital libraries, pp 394–406

  26. Salton G, Wong A, Yang C-S (1975) A vector space model for automatic indexing. Commun ACM 18:613–620

  27. Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. University of Illinois at Urbana-Champaign, printed on Elsevier Inc

  28. Vieira S-M, Mendonca L-F, Farinha G-J, Sousa J-M-C (2013) Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients. Appl Soft Comput 13:3494–3504

  29. Gen M, Cheng R (1997) Genetic algorithms and engineering design. Wiley

  30. Vapnik V-N (1995) The nature of statistical learning theory. Springer, New York

  31. Saif H, Fernandez M, He Y, Alani H (2013) Evaluation datasets for twitter sentiment analysis: a survey and a new dataset, the STS-Gold. In: 1st international workshop on emotion and sentiment in social and expressive media: approaches and perspectives from AI (ESSEM 2013), Turin, Italy, pp 9–21

  32. Go A, Bhayani R, Huang L (2010) Twitter sentiment classification using distant supervision. Technical report Stanford University

  33. Shapiro SS, Wilk MB, Chen HJ (1968) A comparative study of various tests for normality. J Am Stat Assoc 63(324):1343–1372

Author information

Corresponding author

Correspondence to Fatemeh Zarisfi Kermani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

The Shapiro–Wilk test is a statistical test of normality, published in 1965. It is an appropriate choice when the sample size is small; the ability to handle small samples (n < 20) is identified as one of its advantages [33]. The null hypothesis of the test is that the population is normally distributed. This hypothesis is rejected at significance level α when the p-value of the test falls below α, that is, when the data give sufficient evidence against normality. Table 9 indicates that, at the 0.05 significance level, the results of the methods mentioned in this paper are consistent with a normal distribution.

Table 9 The results of the Shapiro–Wilk test on all methods mentioned in this paper
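As an illustration of how such a normality check might be run in practice, the following minimal sketch applies the Shapiro–Wilk test with `scipy.stats.shapiro`, which returns the test statistic W and a p-value. It assumes SciPy is installed. The sample is synthetic and generated only for demonstration; it does not reproduce the values in Table 9.

```python
import random

from scipy import stats

random.seed(0)

# Hypothetical accuracy values (%), generated for illustration only;
# these are NOT results from the paper.
accuracies = [random.gauss(85.0, 1.5) for _ in range(15)]

alpha = 0.05  # significance level used in the appendix

# Null hypothesis: the sample comes from a normally distributed population.
statistic, p_value = stats.shapiro(accuracies)

# Reject normality only when the p-value falls below alpha.
reject_normality = p_value < alpha

print(f"W = {statistic:.4f}, p = {p_value:.4f}")
print("reject normality" if reject_normality else "no evidence against normality")
```

A p-value at or above α does not prove normality; it only means that the test found no sufficient evidence against it at the chosen level.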

About this article

Cite this article

Zarisfi Kermani, F., Sadeghi, F. & Eslami, E. Solving the twitter sentiment analysis problem based on a machine learning-based approach. Evol. Intel. 13, 381–398 (2020). https://doi.org/10.1007/s12065-019-00301-x

  • DOI: https://doi.org/10.1007/s12065-019-00301-x
