Abstract
Phishing is a major problem in which fraudulent web sites and emails aim to obtain users' important information, such as financial data, emails, and other private information. Phishing activity has been increasing, and many unsuspecting users have fallen victim to these web sites and emails. This paper analyzes the design and evaluation of features used to detect and reduce such fraudulent activity. The selected features depend not only on the characteristics of the URL (Uniform Resource Locator) but also on the website content. The TF-IDF algorithm is used to compute the top keywords of the website content, from which one of the important features is extracted. The technique was evaluated on a dataset of 4,420 legitimate URLs and 5,389 phishing URLs. Using these features with five classification algorithms, the resulting classifiers obtain 98.8% accuracy in detecting phishing website URLs.
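The keyword step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact TF-IDF weighting, the tokenizer, or how the top keywords are turned into a feature, so the smoothed IDF formula, the function names `tokenize` and `top_keywords`, and the toy reference corpus below are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(page_text, corpus_texts, k=5):
    """Return the k terms of page_text with the highest TF-IDF score.

    corpus_texts is a hypothetical reference collection used to estimate
    document frequencies; the paper's actual corpus is not stated in the abstract.
    """
    page_tokens = tokenize(page_text)
    if not page_tokens:
        return []
    term_counts = Counter(page_tokens)
    docs = [set(tokenize(t)) for t in corpus_texts]
    n_docs = len(docs)

    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for d in docs if term in d)
        # Smoothed IDF (an assumption): avoids division by zero for unseen terms.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (count / len(page_tokens)) * idf

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog ran in the park",
        "the bank is open",
    ]
    page = "paypal login the paypal account the verify"
    print(top_keywords(page, corpus, k=3))
```

In a content-based detector, keywords of this kind are typically compared against the page's URL or domain; whether the paper does exactly that is not stated in the abstract.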
© 2016 Springer International Publishing Switzerland
Cite this paper
Nguyen, H.H., Nguyen, D.T. (2016). Machine Learning Based Phishing Web Sites Detection. In: Duy, V., Dao, T., Zelinka, I., Choi, HS., Chadli, M. (eds) AETA 2015: Recent Advances in Electrical Engineering and Related Sciences. Lecture Notes in Electrical Engineering, vol 371. Springer, Cham. https://doi.org/10.1007/978-3-319-27247-4_11
DOI: https://doi.org/10.1007/978-3-319-27247-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27245-0
Online ISBN: 978-3-319-27247-4