Abstract
During catastrophes such as natural or man-made disasters, social media services have become a crucial tool that communities use to disseminate information. Because a vast amount of social media data is now used in many applications, sentiment analysis has become a very useful and demanding problem. Social media data cannot be used directly because it is raw and either unstructured or semi-structured. Consequently, text pre-processing becomes one of the most important tasks. Its steps depend on one another, which makes pre-processing workflows complex. To study the impact of pre-processing on the accuracy of machine learning algorithms, different text pre-processing techniques were applied to Twitter, Facebook, and YouTube datasets. This paper applies the techniques in a specific sequence determined by significance testing and examines their influence on sentiment classification accuracy using a machine learning classifier, Support Vector Machines (SVM). The results show that applying all 14 techniques systematically yields an accuracy of up to 82.57% with the SVM classifier and unigram representations. With the proposed systematic combination, Text Detergent, the YouTube dataset achieves the highest accuracy, ahead of the Facebook and Twitter datasets. This approach can potentially improve the quality of the text and lead to better feature extraction, which in turn helps the sentiment analyst produce a better classifier.
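As a rough illustration of what applying pre-processing techniques in a fixed sequence and then building a unigram representation can look like, the following minimal sketch chains a few common cleaning steps. The abstract does not list the 14 techniques or their order, so the six steps, their ordering, the function names, and the sample text are all assumptions made for illustration; this is not the Text Detergent pipeline itself, and the SVM training stage is omitted.

```python
import re
from collections import Counter
from typing import Callable, List

# Each step is a plain str -> str function (all names here are hypothetical).
def remove_urls(text: str) -> str:
    return re.sub(r"https?://\S+|www\.\S+", " ", text)

def remove_mentions(text: str) -> str:
    return re.sub(r"@\w+", " ", text)

def strip_hashtag_symbol(text: str) -> str:
    # Keep the hashtag word, drop only the leading '#'.
    return re.sub(r"#(\w+)", r"\1", text)

def lowercase(text: str) -> str:
    return text.lower()

def remove_punctuation(text: str) -> str:
    return re.sub(r"[^\w\s]", " ", text)

def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

# Illustrative order only; the paper derives its own order from significance testing.
ILLUSTRATIVE_STEPS: List[Callable[[str], str]] = [
    remove_urls,
    remove_mentions,
    strip_hashtag_symbol,
    lowercase,
    remove_punctuation,
    collapse_whitespace,
]

def preprocess(text: str, steps: List[Callable[[str], str]] = ILLUSTRATIVE_STEPS) -> str:
    """Apply each step in list order; changing the order can change the result."""
    for step in steps:
        text = step(text)
    return text

def unigram_counts(text: str) -> Counter:
    """Bag-of-words unigram counts over whitespace-separated tokens."""
    return Counter(text.split())

sample = "Flood warning!! Stay SAFE @cityalerts #flood https://example.org/info"
cleaned = preprocess(sample)
print(cleaned)
print(unigram_counts(cleaned))
```

The steps depend on one another: if punctuation were removed before URLs, the URL pattern would no longer match and fragments such as `https` would survive as tokens, which is one concrete reason the order of steps matters.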
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hair Zaki, U.H., Ibrahim, R., Abd Halim, S., Kamsani, I.I. (2022). Text Detergent: The Systematic Combination of Text Pre-processing Techniques for Social Media Sentiment Analysis. In: Saeed, F., Mohammed, F., Ghaleb, F. (eds) Advances on Intelligent Informatics and Computing. IRICT 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 127. Springer, Cham. https://doi.org/10.1007/978-3-030-98741-1_5
DOI: https://doi.org/10.1007/978-3-030-98741-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98740-4
Online ISBN: 978-3-030-98741-1
eBook Packages: Intelligent Technologies and Robotics (R0)