
Spam Classification: Genetically Optimized Passive-Aggressive Approach

  • Original Research
  • SN Computer Science

Abstract

The growth of data has brought a huge surge in messages sent for various business purposes, making spam classification a problem of paramount importance. In this paper, a novel approach to spam classification is proposed that combines the passive-aggressive family of algorithms with genetic optimization. The paper discusses the application of this online learning algorithm to spam classification and compares it with existing approaches. The results demonstrate the robustness of the selected algorithm and include a study of the effect of hyperparameters on classification. The datasets used in the study are the public SMS spam dataset, a spam review dataset and a Twitter spam dataset; for each dataset, 80% of the data was used for training and 20% for testing. The proposed algorithm outperforms standard benchmark algorithms in terms of accuracy, precision and recall.
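The abstract does not spell out the implementation, so what follows is only a rough illustration of the general idea under stated assumptions. It assumes the PA-I update rule of Crammer et al. (2006), a simple bag-of-words feature map, and a small genetic algorithm that tunes two hyperparameters (the aggressiveness parameter `C` and the number of training epochs) using validation accuracy as fitness. The function names (`featurize`, `train_pa1`, `genetic_search`), the genome encoding, the selection, crossover and mutation operators, and the toy messages are all hypothetical choices, not the authors' method or data.

```python
import random
from collections import defaultdict


def featurize(text):
    """Hypothetical feature map: lowercase token counts plus a constant bias feature."""
    feats = defaultdict(float)
    for tok in text.lower().split():
        feats[tok] += 1.0
    feats["__bias__"] = 1.0  # guarantees ||x||^2 > 0 in the PA-I step size
    return dict(feats)


def dot(w, x):
    """Sparse dot product between weight dict w and feature dict x."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pa1(data, C, epochs=5, seed=0):
    """Binary PA-I (Crammer et al., 2006). Labels are +1 (spam) / -1 (ham).

    On each example with positive hinge loss, the weights move by
    tau * y * x with tau = min(C, loss / ||x||^2).
    """
    w = {}
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # one online pass over the (shuffled) training stream
        for i in order:
            x, y = data[i]
            loss = max(0.0, 1.0 - y * dot(w, x))
            if loss > 0.0:
                tau = min(C, loss / sum(v * v for v in x.values()))
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + tau * y * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1


def metrics(w, data):
    """Accuracy, precision and recall with spam (+1) as the positive class."""
    tp = fp = fn = correct = 0
    for x, y in data:
        p = predict(w, x)
        correct += p == y
        tp += p == 1 and y == 1
        fp += p == 1 and y == -1
        fn += p == -1 and y == 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return correct / len(data), precision, recall


def genetic_search(train, valid, pop_size=8, generations=5, seed=0):
    """Hypothetical GA over genomes (C, epochs); fitness = validation accuracy."""
    rng = random.Random(seed)

    def fitness(genome):
        C, epochs = genome
        return metrics(train_pa1(train, C, epochs), valid)[0]

    # Initial population: C log-uniform in [1e-3, 10], epochs in 1..10.
    pop = [(10 ** rng.uniform(-3, 1), rng.randint(1, 10)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then mutation (log-normal on C, +/-1 on epochs).
            C = rng.choice((a[0], b[0])) * 10 ** rng.gauss(0, 0.2)
            epochs = min(10, max(1, rng.choice((a[1], b[1])) + rng.randint(-1, 1)))
            children.append((C, epochs))
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy messages invented for illustration; NOT the SMS, review or Twitter datasets.
    raw = [
        ("win a free prize now", 1), ("are we meeting for lunch", -1),
        ("claim your free cash reward", 1), ("see you at the office tomorrow", -1),
        ("urgent prize claim call now", 1), ("can you send the report", -1),
        ("free entry win cash", 1), ("lunch tomorrow at noon", -1),
        ("win free cash now", 1), ("send me the meeting notes", -1),
    ]
    data = [(featurize(t), y) for t, y in raw]
    cut = int(0.8 * len(data))      # 80% train / 20% test, as in the abstract
    train, test = data[:cut], data[cut:]
    inner = int(0.75 * len(train))  # hold out part of the training split for GA fitness
    C, epochs = genetic_search(train[:inner], train[inner:])
    w = train_pa1(train, C, epochs)
    print(f"C={C:.4f} epochs={epochs} test (acc, prec, rec)={metrics(w, test)}")
```

Running the script prints the selected hyperparameters and the accuracy, precision and recall on the held-out 20%; with ten toy messages these numbers say nothing about real-world performance. Truncation selection keeps the search easy to follow; a practical version would cache fitness values and use the actual datasets and feature extraction (for example TF-IDF) described in the paper.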


Fig. 1
Fig. 2
Fig. 3
Fig. 4


Data availability

All data used in this paper are publicly available as Kaggle datasets.


Funding

No funding.

Author information

Correspondence to Alekhya Naravajula.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Naravajhula, P., Naravajula, A. Spam Classification: Genetically Optimized Passive-Aggressive Approach. SN COMPUT. SCI. 4, 93 (2023). https://doi.org/10.1007/s42979-022-01517-y

  • DOI: https://doi.org/10.1007/s42979-022-01517-y
