Abstract
The growing number of phishing attacks is one of the main concerns for security researchers today. Traditional methods for categorizing phishing websites rely on signature-based approaches, and such tools and techniques fail against advanced and complex phishing webpages. Learning-based algorithms, which are already widely adopted in many industries, can be applied to analyze phishing websites, and their detection accuracy can be improved with a larger dataset and richer features. This chapter addresses the problem of analyzing such phishing activities on web pages. Specifically, we propose a machine learning-based framework built on several supervised learning algorithms: Random Forest, Support Vector Machine, and Decision Trees. We perform hyperparameter optimization of these algorithms using the scikit-learn machine learning framework. Finally, we discuss a practical implementation that classifies phishing attacks on a real-world dataset available on Kaggle.
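The workflow summarized above (three supervised classifiers, each tuned with scikit-learn) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's actual implementation: the data is a synthetic stand-in generated with `make_classification` rather than the Kaggle dataset, grid search is assumed as the tuning strategy, and the hyperparameter grids are hypothetical values chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for a phishing dataset (1 = phishing, 0 = legitimate).
# The chapter's Kaggle data and feature set are not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models with illustrative (hypothetical) hyperparameter grids.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
    ),
    "svm": (
        # SVMs are sensitive to feature scale, so scaling is part of the pipeline.
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        # Held-out accuracy of the refitted best estimator.
        "test_accuracy": search.score(X_test, y_test),
    }

for name, r in results.items():
    print(f"{name}: cv={r['cv_accuracy']:.3f} test={r['test_accuracy']:.3f} {r['best_params']}")
```

Keeping the test split out of the grid search means the reported test accuracy is not biased by the tuning step. With real phishing data, the synthetic `X` and `y` would be replaced by the extracted features and labels.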
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Soni, J., Sirigineedi, S., Vutukuru, K.S., Sirigineedi, S.S.C., Prabakar, N., Upadhyay, H. (2023). Learning-Based Model for Phishing Attack Detection. In: Bhardwaj, T., Upadhyay, H., Sharma, T.K., Fernandes, S.L. (eds) Artificial Intelligence in Cyber Security: Theories and Applications. Intelligent Systems Reference Library, vol 240. Springer, Cham. https://doi.org/10.1007/978-3-031-28581-3_11
DOI: https://doi.org/10.1007/978-3-031-28581-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28580-6
Online ISBN: 978-3-031-28581-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)