Learning-Based Model for Phishing Attack Detection

Artificial Intelligence in Cyber Security: Theories and Applications

Abstract

The growing number of phishing attacks is one of the main concerns for security researchers today. Traditional methods for categorizing phishing websites rely on signature-based approaches, and such tools and techniques fail against advanced and complex phishing webpages. Learning-based algorithms, which are already widely adopted in many industries, can be applied to analyze phishing websites, and their detection accuracy can be increased with a large dataset and complex features. This chapter addresses the problem of analyzing such phishing activities on web pages. Specifically, we propose a machine learning-based framework with various supervised learning algorithms such as Random Forest, Support Vector Machine, and Decision Trees. We perform hyperparameter optimization of these algorithms using the scikit-learn machine learning framework. Finally, we discuss an applied implementation that classifies phishing attacks on a real-world dataset available on Kaggle.
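As a rough illustration of the workflow the abstract describes, the sketch below tunes the three named classifiers with scikit-learn's `GridSearchCV`. It is not the chapter's implementation: the data is a synthetic stand-in generated with `make_classification` rather than the Kaggle dataset, and the parameter grids, the 3-fold cross-validation, the accuracy metric, and the feature scaling applied before the SVM are all assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for phishing/legitimate samples
# (assumption: the chapter's Kaggle dataset is not reproduced here).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search spaces; the chapter's actual grids are not given.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
    "svm": (
        # SVMs are sensitive to feature scale, so scale inside the pipeline.
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Exhaustive grid search with 3-fold cross-validation on the training split.
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        # Accuracy of the refitted best model on the held-out test split.
        "test_accuracy": search.score(X_test, y_test),
    }

for name, r in results.items():
    print(name, r["best_params"], round(r["test_accuracy"], 3))
```

Keeping the test split out of the search means the reported test accuracy is not influenced by the tuning itself. On a real dataset the grids would normally be wider, and metrics such as precision and recall may matter more than accuracy when classes are imbalanced.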



Author information

Corresponding author

Correspondence to Jayesh Soni.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Soni, J., Sirigineedi, S., Vutukuru, K.S., Sirigineedi, S.S.C., Prabakar, N., Upadhyay, H. (2023). Learning-Based Model for Phishing Attack Detection. In: Bhardwaj, T., Upadhyay, H., Sharma, T.K., Fernandes, S.L. (eds) Artificial Intelligence in Cyber Security: Theories and Applications. Intelligent Systems Reference Library, vol 240. Springer, Cham. https://doi.org/10.1007/978-3-031-28581-3_11
