Text Detergent: The Systematic Combination of Text Pre-processing Techniques for Social Media Sentiment Analysis

  • Conference paper
Advances on Intelligent Informatics and Computing (IRICT 2021)

Abstract

During catastrophes such as natural or man-made disasters, social media services have become a crucial tool that communities use to disseminate information. Because a vast amount of social media data now feeds many applications, sentiment analysis has become a useful and in-demand task. Social media data cannot be used directly because it is raw and either unstructured or semi-structured. Consequently, text pre-processing becomes one of the most important tasks, and the process is strongly constrained by the dependencies among its steps. These dependencies create complex patterns in pre-processing workflows. To study the impact of pre-processing on the accuracy of machine learning algorithms, different text pre-processing techniques were applied to Twitter, Facebook, and YouTube datasets. This paper applies the techniques in a specific sequence based on significance testing and examines their influence on sentiment classification accuracy using a Support Vector Machine (SVM) classifier. The results show that applying all 14 techniques systematically achieves an accuracy of up to 82.57% for the SVM classifier with unigram representations. With Text Detergent, the YouTube dataset achieves the highest accuracy, ahead of the Facebook and Twitter datasets. This approach can potentially improve the quality of the text and lead to better feature extraction, which in turn helps the sentiment analyst produce a better classifier.
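
The idea of applying pre-processing techniques in a fixed sequence can be illustrated with a minimal Python sketch. The abstract does not list the 14 techniques or their order, so the six steps below, their names, and the `PIPELINE` ordering are assumptions chosen only for illustration; they are not the Text Detergent sequence. The sketch shows why order matters: stripping punctuation before removing URLs leaves URL fragments behind as spurious unigrams.

```python
import re
from collections import Counter
from typing import Callable, List

# Each step is a plain str -> str function.
# NOTE: these steps are hypothetical examples, not the paper's 14 techniques.

def remove_urls(text: str) -> str:
    return re.sub(r"https?://\S+", " ", text)

def remove_mentions(text: str) -> str:
    return re.sub(r"@\w+", " ", text)

def strip_hashtag_symbol(text: str) -> str:
    # Keep the hashtag word, drop only the leading '#'.
    return re.sub(r"#(\w+)", r"\1", text)

def lowercase(text: str) -> str:
    return text.lower()

def remove_punctuation(text: str) -> str:
    return re.sub(r"[^\w\s]", " ", text)

def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

# Assumed ordering: pattern-based removals run before punctuation stripping,
# because they rely on characters such as '://', '@' and '#'.
PIPELINE: List[Callable[[str], str]] = [
    remove_urls,
    remove_mentions,
    strip_hashtag_symbol,
    lowercase,
    remove_punctuation,
    collapse_whitespace,
]

def clean(text: str, steps: List[Callable[[str], str]] = PIPELINE) -> str:
    """Apply each pre-processing step in the given order."""
    for step in steps:
        text = step(text)
    return text

def unigrams(text: str) -> Counter:
    """Count single-token (unigram) features in already-cleaned text."""
    return Counter(text.split())

if __name__ == "__main__":
    post = "@Bob Check THIS out: https://example.com/a?b=1 #Vaccine!!"
    print(clean(post))
    # Same steps, but punctuation is stripped first: the URL is broken into
    # fragments before remove_urls can match it.
    bad_order = [remove_punctuation] + [s for s in PIPELINE if s is not remove_punctuation]
    print(clean(post, bad_order))
    print(unigrams(clean(post)))
```

In a full experiment, the unigram counts would be fed to a classifier such as an SVM, and each candidate ordering would be compared on held-out accuracy; that part is omitted here to keep the sketch dependency-free.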

Notes

  1. https://www.kaggle.com.

  2. https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets.

  3. https://www.kaggle.com/alechelyar/facebook-antivaccination-dataset.

  4. https://www.kaggle.com/datasnaek/youtube?select=GBcomments.csv.

  5. https://samrose3.github.io/algorithm-explorer/.

Author information

Corresponding author

Correspondence to Ummu Hani’ Hair Zaki.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hair Zaki, U.H., Ibrahim, R., Abd Halim, S., Kamsani, I.I. (2022). Text Detergent: The Systematic Combination of Text Pre-processing Techniques for Social Media Sentiment Analysis. In: Saeed, F., Mohammed, F., Ghaleb, F. (eds) Advances on Intelligent Informatics and Computing. IRICT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 127. Springer, Cham. https://doi.org/10.1007/978-3-030-98741-1_5
