SMS and E-mail Spam Classification Using Natural Language Processing and Machine Learning

  • Conference paper
Proceedings of the NIELIT's International Conference on Communication, Electronics and Digital Technology (NICE-DT 2023)

Abstract

Billions of messages are sent over the Internet every day, and the majority of them are spam. As their number keeps growing, these spam messages have become a primary source of distraction and a security threat for users. Many researchers have addressed this problem with a variety of approaches. In the present study, the machine learning algorithms Naïve Bayes, Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Random Forest, Gradient Boosting, and Extra Trees Classifier are used to predict whether an incoming message or e-mail is spam. Model performance is evaluated using accuracy, precision, F1-score, and the confusion matrix. Three datasets are used, two SMS datasets and one e-mail dataset, and the Extra Trees Classifier achieves a maximum F1-score of 96.06% and an accuracy of 99.12%, which is 0.02% higher than the highest accuracy previously reported for the SMS Spam Collection Dataset.
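The evaluation measures named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the function names, the "spam"/"ham" label strings, and the toy label lists are assumptions made for illustration. Recall is computed only because the F1-score is defined as the harmonic mean of precision and recall.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn), treating `positive` as the spam class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight messages (illustrative only, not study data).
y_true = ["spam", "ham", "ham", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "ham", "spam", "spam", "ham"]

print(confusion_counts(y_true, y_pred))
print(evaluate(y_true, y_pred))
```

Because spam datasets are typically imbalanced, accuracy alone can look high even when many spam messages are missed, which is one reason to report precision and F1-score alongside it.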

Corresponding author

Correspondence to Prince Bari.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Bari, P., Mathew, V., Tandel, S.P., Aniket, P., Chaudhari, K.S., Naik, S. (2023). SMS and E-mail Spam Classification Using Natural Language Processing and Machine Learning. In: Singh, S.N., Mahanta, S., Singh, Y.J. (eds) Proceedings of the NIELIT's International Conference on Communication, Electronics and Digital Technology. NICE-DT 2023. Lecture Notes in Networks and Systems, vol 676. Springer, Singapore. https://doi.org/10.1007/978-981-99-1699-3_6
