Text Detergent: The Systematic Combination of Text Pre-processing Techniques for Social Media Sentiment Analysis

Hair Zaki, Ummu Hani’; Ibrahim, Roliana; Abd Halim, Shahliza; Kamsani, Izyan Izzati

doi:10.1007/978-3-030-98741-1_5


Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 127)

Included in the following conference series:

International Conference of Reliable Information and Communication Technology


Abstract

During catastrophes such as natural or man-made disasters, social media services have evolved into a crucial tool that communities use to disseminate information. Because vast amounts of social media data are used in many applications, sentiment analysis of such data has become a useful and demanding problem. Social media data cannot be used directly because it is raw and unstructured or semi-structured. Consequently, text pre-processing becomes one of the most important tasks, and because its steps depend on one another, pre-processing workflows follow complex patterns. To this end, different text pre-processing techniques have been applied to Twitter, Facebook, and YouTube datasets to study their impact on the accuracy of machine learning algorithms. This paper applies different text pre-processing techniques in a specific sequence based on significance testing and examines their influence on sentiment classification accuracy using a Support Vector Machine (SVM) classifier. The results show that applying all 14 techniques systematically achieves an accuracy of up to 82.57% with the SVM classifier using unigram representations. With Text Detergent, the YouTube dataset achieves the highest accuracy, compared with the Facebook and Twitter datasets. This can improve the quality of the text and lead to better feature extraction, which in turn helps the sentiment analyst produce a better classifier.
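
The idea of applying pre-processing techniques in a fixed sequence before building unigram features can be illustrated with a minimal Python sketch. The five steps, their order, the function names, and the sample text below are all assumptions chosen for illustration. The abstract does not list the 14 techniques or the sequence that Text Detergent uses, and the SVM classification step is omitted here.

```python
import re
from collections import Counter

# Hypothetical cleaning steps; these are NOT the paper's 14 techniques.
def remove_urls(text):
    return re.sub(r"https?://\S+", " ", text)

def remove_mentions(text):
    return re.sub(r"@\w+", " ", text)

def lowercase(text):
    return text.lower()

def remove_punctuation(text):
    return re.sub(r"[^\w\s]", " ", text)

def collapse_whitespace(text):
    return re.sub(r"\s+", " ", text).strip()

# Assumed order: URLs and mentions are removed before punctuation is stripped,
# because those patterns rely on characters such as "://" and "@".
PIPELINE = [remove_urls, remove_mentions, lowercase,
            remove_punctuation, collapse_whitespace]

def clean(text, steps=PIPELINE):
    """Apply each pre-processing step in the given order."""
    for step in steps:
        text = step(text)
    return text

def unigrams(text):
    """Count single-word tokens (unigrams) in the cleaned text."""
    return Counter(clean(text).split())

sample = "@user Flood warning!! Stay SAFE: https://example.com/alert"
print(clean(sample))
print(unigrams(sample))

# Same steps, different order: stripping punctuation first breaks the URL
# and mention patterns, so their fragments survive as noisy tokens.
reordered = [remove_punctuation, remove_urls, remove_mentions,
             lowercase, collapse_whitespace]
print(clean(sample, reordered))
```

In this sketch the reordered run leaves tokens such as `https` and `user` in the output, which shows why the order of the steps can affect the features a classifier sees.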



Author information

Authors and Affiliations

School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81300, Skudai, Malaysia
Ummu Hani’ Hair Zaki, Roliana Ibrahim, Shahliza Abd Halim & Izyan Izzati Kamsani


Corresponding author

Correspondence to Ummu Hani’ Hair Zaki.

Editor information

Editors and Affiliations

Birmingham City University, Birmingham, UK
Faisal Saeed
School of Computing, Universiti Utara Malaysia (UUM), Sintok, Kedah, Malaysia
Fathey Mohammed
Department of Computer Science, School of Computing, Universiti Teknologi Malaysia, Skudai, Malaysia
Fuad Ghaleb


Cite this paper

Hair Zaki, U.H., Ibrahim, R., Abd Halim, S., Kamsani, I.I. (2022). Text Detergent: The Systematic Combination of Text Pre-processing Techniques for Social Media Sentiment Analysis. In: Saeed, F., Mohammed, F., Ghaleb, F. (eds) Advances on Intelligent Informatics and Computing. IRICT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 127. Springer, Cham. https://doi.org/10.1007/978-3-030-98741-1_5


DOI: https://doi.org/10.1007/978-3-030-98741-1_5
Published: 30 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98740-4
Online ISBN: 978-3-030-98741-1
eBook Packages: Intelligent Technologies and Robotics (R0)
