CCBLA: a Lightweight Phishing Detection Model Based on CNN, BiLSTM, and Attention Mechanism

Zhu, Erzhou; Yuan, Qixiang; Chen, Zhile; Li, Xuejian; Fang, Xianyong

doi:10.1007/s12559-022-10024-4

Published: 18 May 2022

Volume 15, pages 1320–1333, (2023)

Cognitive Computation

Erzhou Zhu¹, Qixiang Yuan¹, Zhile Chen¹, Xuejian Li¹ & Xianyong Fang¹ (ORCID: orcid.org/0000-0002-6045-8430)

858 Accesses
1 Citation

Abstract

Phishing, in which social engineering techniques such as emails and instant messaging are employed and malicious links are disguised as normal URLs to steal sensitive information, is currently a major threat to networks worldwide. Phishing detection systems generally adopt feature engineering as one of the most important approaches to detect or even prevent phishing attacks. However, the accuracy of feature engineering systems is heavily dependent on the prior knowledge of features. In addition, extracting comprehensive features from different dimensions for high detection accuracy is time-consuming. To address these issues, this paper proposes a lightweight model that combines convolutional neural network (CNN), bi-directional long short-term memory (BiLSTM), and the attention mechanism for phishing detection. The proposed model, called the char-convolutional and BiLSTM with attention mechanism (CCBLA) model, employs deep learning to automatically extract features from target URLs and uses the attention mechanism to weight the importance of the selected features under different roles during phishing detection. The results of experiments conducted on two datasets with different scales show that CCBLA is accurate in phishing attack detection with minimal time consumption.
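
The two ideas named in the abstract, character-level URL input and attention weighting of extracted features, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the character vocabulary, the fixed length of 32, the function names, and the dot-product scoring are all assumptions made for illustration, and the per-position feature vectors stand in for what the CNN and BiLSTM layers would produce with learned parameters.

```python
import math
import string

# Hypothetical character vocabulary (an assumption, not from the paper).
# Index 0 is reserved for padding and unknown characters.
_CHARS = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
VOCAB = {ch: i + 1 for i, ch in enumerate(_CHARS)}
MAX_LEN = 32  # assumed fixed input length


def encode_url(url, max_len=MAX_LEN):
    """Map a URL to a fixed-length list of character indices."""
    ids = [VOCAB.get(ch, 0) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight per-position feature vectors and sum them.

    features: one vector per URL position (placeholder for the
              CNN/BiLSTM outputs, which are not modelled here).
    query:    scoring vector; dot-product scoring is an assumption.
    Returns the attention weights and the pooled feature vector.
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, features))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    print(encode_url("http://a.b")[:10])
    weights, pooled = attention_pool(
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0]
    )
    print(weights, pooled)
```

In a trained model, the pooled vector would feed a classifier that labels the URL as phishing or legitimate. Here the positions with the highest scores simply contribute most to the pooled vector.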

Funding

This study was co-funded by the Natural Science Foundation of Anhui Province, China (2008085MF188, 2108085MF210); and the University Natural Science Research Project of Anhui Province, China (KJ2021A0041).

Author information

Authors and Affiliations

Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, 230601, People’s Republic of China
Erzhou Zhu, Qixiang Yuan, Zhile Chen, Xuejian Li & Xianyong Fang

Contributions

Conceptualization: Erzhou Zhu and Xianyong Fang; methodology: Erzhou Zhu and Qixiang Yuan; formal analysis and investigation: Zhile Chen and Xuejian Li; writing–original draft preparation: Erzhou Zhu; writing–review and editing: Erzhou Zhu and Xianyong Fang; funding acquisition: Erzhou Zhu and Xianyong Fang. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xianyong Fang.

Ethics declarations

Ethics Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, E., Yuan, Q., Chen, Z. et al. CCBLA: a Lightweight Phishing Detection Model Based on CNN, BiLSTM, and Attention Mechanism. Cogn Comput 15, 1320–1333 (2023). https://doi.org/10.1007/s12559-022-10024-4

Received: 24 September 2020
Accepted: 08 May 2022
Published: 18 May 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12559-022-10024-4
