Learning-Based Model for Phishing Attack Detection

Soni, Jayesh; Sirigineedi, Surya; Vutukuru, Krishna Sai; Sirigineedi, S. S. ChandanaEswari; Prabakar, Nagarajan; Upadhyay, Himanshu

doi:10.1007/978-3-031-28581-3_11

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 240)

Abstract

The growing number of phishing attacks is one of the main concerns for security researchers today. Traditional methods for categorizing phishing websites rely on signature-based approaches. Such tools and techniques fail against advanced and complex phishing webpages. Learning-based algorithms, which are already widely adopted in many industries, can instead be applied to analyze phishing websites, and their detection accuracy can be increased with a large dataset and complex features. This chapter addresses the problem of analyzing such phishing activities on web pages. Specifically, we propose a machine learning-based framework with various supervised learning algorithms such as Random Forest, Support Vector Machine, and Decision Trees. We perform hyperparameter optimization of these algorithms using the scikit-learn machine learning framework. Finally, we discuss the applied implementation of classifying phishing attacks on a real-world dataset available on Kaggle.
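
The workflow outlined in the abstract (several supervised classifiers, each tuned with scikit-learn) might look roughly like the sketch below. It is an illustration only, not the chapter's implementation: the data is a synthetic stand-in produced by `make_classification` rather than the Kaggle dataset, and the hyperparameter grids, the 5-fold cross-validation, the feature scaling for the SVM, and the accuracy metric are all assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a phishing dataset (assumption: 1 = phishing, 0 = legitimate).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three classifiers named in the abstract, each paired with an
# illustrative (assumed) hyperparameter grid.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
    "svm": (
        # SVMs are sensitive to feature scale, so scale inside the pipeline.
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Exhaustive grid search with 5-fold cross-validation on the training split.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        # Accuracy of the refitted best model on the held-out split.
        "test_accuracy": search.score(X_test, y_test),
    }

for name, r in results.items():
    print(
        f"{name}: params={r['best_params']} "
        f"cv={r['cv_accuracy']:.3f} test={r['test_accuracy']:.3f}"
    )
```

To try this on real data, `X` and `y` would be replaced with the feature matrix and labels extracted from the dataset in question; the grids above are small on purpose and would likely need widening for a serious comparison.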

Author information

Authors and Affiliations

Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, USA
Jayesh Soni & Nagarajan Prabakar
Applied Research Center, Florida International University, Miami, FL, USA
Himanshu Upadhyay
Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA
Surya Sirigineedi
Thornton Tomasetti, Miami, FL, USA
Krishna Sai Vutukuru
University of Central Missouri, Warrensburg, USA
S. S. ChandanaEswari Sirigineedi

Corresponding author

Correspondence to Jayesh Soni.

Editor information

Editors and Affiliations

Applied Research Center, Florida International University, Miami, FL, USA
Tushar Bhardwaj
Applied Research Center, Florida International University, Miami, FL, USA
Himanshu Upadhyay
Shobhit University, Gangoh, India
Tarun Kumar Sharma
Department of Computer Science, Creighton University, Omaha, NE, USA
Steven Lawrence Fernandes

About this chapter

Cite this chapter

Soni, J., Sirigineedi, S., Vutukuru, K.S., Sirigineedi, S.S.C., Prabakar, N., Upadhyay, H. (2023). Learning-Based Model for Phishing Attack Detection. In: Bhardwaj, T., Upadhyay, H., Sharma, T.K., Fernandes, S.L. (eds) Artificial Intelligence in Cyber Security: Theories and Applications. Intelligent Systems Reference Library, vol 240. Springer, Cham. https://doi.org/10.1007/978-3-031-28581-3_11

DOI: https://doi.org/10.1007/978-3-031-28581-3_11
Published: 07 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28580-6
Online ISBN: 978-3-031-28581-3
eBook Packages: Intelligent Technologies and Robotics (R0)
