Abstract
The growth of data has brought a sharp rise in the volume of messages sent for business purposes, making spam classification a matter of paramount importance. In this paper, a novel approach to spam classification is proposed that uses the passive-aggressive family of algorithms with genetic optimization. The paper discusses the application of this online learning algorithm to spam classification and presents a comparative study with existing approaches. The results demonstrate the robustness of the selected algorithm and provide a study of the effect of hyperparameters on classification. The datasets used for the classification study are the public SMS spam dataset and the spam review and Twitter spam datasets; 80% of each dataset was used for training and 20% for testing. The proposed algorithm outperforms standard benchmark algorithms in terms of accuracy, precision, and recall.
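To make the online learning step concrete, the following is a minimal sketch of one standard passive-aggressive variant (the PA-I update) applied to bag-of-words features. It is an illustration under stated assumptions, not the implementation used in the paper: the whitespace tokenizer, the function names `featurize`, `dot`, `pa1_update`, and `predict`, and the toy messages are all hypothetical, and the aggressiveness parameter `C` is fixed by hand here, whereas the paper tunes hyperparameters with genetic optimization.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words counts (an assumed stand-in for the paper's features)."""
    return Counter(text.lower().split())


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def pa1_update(w, x, y, C=1.0):
    """Apply one PA-I step in place and return the hinge loss before the step.

    y must be +1 (spam) or -1 (ham). C caps the step size.
    """
    loss = max(0.0, 1.0 - y * dot(w, x))
    sq_norm = sum(v * v for v in x.values())
    if loss > 0.0 and sq_norm > 0.0:
        # Passive when the margin is satisfied, aggressive otherwise.
        tau = min(C, loss / sq_norm)
        for tok, v in x.items():
            w[tok] = w.get(tok, 0.0) + tau * y * v
    return loss


def predict(w, x):
    """Return +1 (spam) or -1 (ham)."""
    return 1 if dot(w, x) >= 0.0 else -1


# Hypothetical toy messages, for illustration only.
messages = [
    ("win a free prize now", 1),
    ("are we still meeting for lunch", -1),
    ("free entry claim your prize", 1),
    ("see you at the office tomorrow", -1),
    ("claim your free reward", 1),
]

# 80% for training, 20% for testing, mirroring the split in the abstract.
cut = int(0.8 * len(messages))
train, test = messages[:cut], messages[cut:]

weights = {}
for text, label in train:
    pa1_update(weights, featurize(text), label, C=1.0)

predictions = [predict(weights, featurize(text)) for text, _ in test]
```

Because the model is updated one message at a time, the same loop can keep running as new labelled messages arrive, which is what makes this family of algorithms suitable for online spam filtering. A smaller `C` gives more conservative updates; a search procedure such as a genetic algorithm could be used to select it.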
Data availability
All data used in the paper are publicly available on Kaggle.
Funding
No funding.
Ethics declarations
Conflict of interest
The authors declare that no conflict of interest exists.
About this article
Cite this article
Naravajhula, P., Naravajula, A. Spam Classification: Genetically Optimized Passive-Aggressive Approach. SN COMPUT. SCI. 4, 93 (2023). https://doi.org/10.1007/s42979-022-01517-y