CCBLA: a Lightweight Phishing Detection Model Based on CNN, BiLSTM, and Attention Mechanism

Cognitive Computation

Abstract

Phishing, in which social engineering techniques such as emails and instant messaging are employed and malicious links are disguised as normal URLs to steal sensitive information, is currently a major threat to networks worldwide. Phishing detection systems generally adopt feature engineering as one of the most important approaches to detect or even prevent phishing attacks. However, the accuracy of feature engineering systems is heavily dependent on the prior knowledge of features. In addition, extracting comprehensive features from different dimensions for high detection accuracy is time-consuming. To address these issues, this paper proposes a lightweight model that combines convolutional neural network (CNN), bi-directional long short-term memory (BiLSTM), and the attention mechanism for phishing detection. The proposed model, called the char-convolutional and BiLSTM with attention mechanism (CCBLA) model, employs deep learning to automatically extract features from target URLs and uses the attention mechanism to weight the importance of the selected features under different roles during phishing detection. The results of experiments conducted on two datasets with different scales show that CCBLA is accurate in phishing attack detection with minimal time consumption.
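The two ends of the pipeline described above, character-level encoding of a URL and attention-based weighting of the extracted features, can be illustrated with a minimal sketch. It is not the CCBLA implementation: the CNN and BiLSTM layers are omitted, and the vocabulary (printable ASCII), the maximum length of 64, the dot-product scoring function, and the names `encode_url` and `attention_pool` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import string

# Hypothetical character vocabulary: printable ASCII.
# Index 0 is reserved for padding and unknown characters.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.printable)}


def encode_url(url, max_len=64):
    """Map a URL to a fixed-length list of integer character ids."""
    ids = [VOCAB.get(ch, 0) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention (an assumed scoring function).

    Scores each time step's feature vector against `query`, normalises the
    scores with softmax, and returns the weights and the weighted sum.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return weights, context
```

In a full model, the integer ids would feed an embedding layer followed by convolutional and recurrent layers, and `hidden_states` would be the per-position outputs of the recurrent layer rather than hand-written vectors.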

Funding

This study was co-funded by the Natural Science Foundation of Anhui Province, China (2008085MF188, 2108085MF210); and the University Natural Science Research Project of Anhui Province, China (KJ2021A0041).

Author information

Contributions

Conceptualization: Erzhou Zhu and Xianyong Fang; methodology: Erzhou Zhu and Qixiang Yuan; formal analysis and investigation: Zile Chen and Xuejian Li; writing–original draft preparation: Erzhou Zhu; writing–review and editing: Erzhou Zhu and Xianyong Fang; funding acquisition: Erzhou Zhu and Xianyong Fang. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xianyong Fang.

Ethics declarations

Ethics Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, E., Yuan, Q., Chen, Z. et al. CCBLA: a Lightweight Phishing Detection Model Based on CNN, BiLSTM, and Attention Mechanism. Cogn Comput 15, 1320–1333 (2023). https://doi.org/10.1007/s12559-022-10024-4

  • DOI: https://doi.org/10.1007/s12559-022-10024-4
