A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection

Batur Şahin, Canan; Abualigah, Laith

doi:10.1007/s00521-021-06047-x

A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection

Original Article
Published: 01 May 2021

Volume 33, pages 14049–14067, (2021)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

1826 Accesses
25 Citations
Explore all metrics

Abstract

The automatic detection of software vulnerabilities is considered a complex and common research problem. It is possible to detect several security vulnerabilities using static analysis (SA) tools, but comparatively high false-positive rates are observed in this case. Existing solutions to this problem depend on human experts to identify functionality, and as a result, several vulnerabilities are often overlooked. This paper introduces a novel approach for effectively and reliably finding vulnerabilities in open-source software programs. In this paper, we are motivated to examine the potential of the clonal selection theory. A novel deep learning-based vulnerability detection model is proposed to define features using the clustering theory of the clonal selection algorithm. To our knowledge, this is the first time we have used deep-learned long-lived team-hacker features to process memories of sequential features and mapping from the entire history of previous inputs to target vectors in theory. With an immune-based feature selection model, the proposed approach aimed to improve static analyses' detection abilities. A real-world SA dataset is used based on three open-source PHP applications. Comparisons are conducted based on using a classification model for all features to measure the proposed feature selection methods' classification improvement. The results demonstrated that the proposed method got significant enhancements, which occurred in the classification accuracy also in the true positive rate.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Detecting Vulnerabilities in Source Code Using Machine Learning

A security vulnerability predictor based on source code metrics

Article 17 February 2023

A Survey of the Software Vulnerability Discovery Using Machine Learning Techniques

References

Zou Q, Ni L, Zhang T, Wang Q (2015) Deep learning based feature selection for remote sensing scene classification. IEEE Geosci Remote Sens Lett 12(11):2321–2325
Article Google Scholar
Alves H, Fonseca B, Antunes N (2016) Software metrics and security vulnerabilities: dataset and exploratory study. In: 12th European dependable computing conference (EDCC), Gothenburg, Sweden. pp 37–44
Williams L (2007) Toward the use of automated static analysis alerts for early identification of vulnerability-and attack-prone components. In: Second ınternational conference on ınternet monitoring and protection, San Jose, CA, USA. pp 18–18
Antonios G, Dimitris M, Diomidis S (2018) Vulinoss: a dataset of security vulnerabilities in open-source systems. In: Proceedings of the 15th ınternational conference on mining software repositories. ACM, pp 18–21
Koc U, Saadatpanah P, Foster JS, Porter A (2017) Learning a classifier for false positive error reports emitted by static code analysis tools. In: Proceedings of the 1st ACM SIGPLAN ınternational workshop on machine learning and programming languages, MAPL 2017. ACM , New York, NY, USA, pp 35–42
Twycross J, Aickelin U (2010) Information fusion in the immune system. Inf Fus 11(1):35–44
Article Google Scholar
Li Z, Zou D, Xu S, Jin H et al (2018) VulDeePecker: A deep learning-based system for vulnerability detection, network and distributed systems security (NDSS) symposium 2018, San Diego, CA, USA ISBN: 1-1891562-49-5. http://dx.doi.org/https://doi.org/10.14722/ndss.2018.23158
Dolan-Gavitt B, Hulin P, Kirda E, Leek T, Mambretti A, Robertson WK, Ulrich F, Whelan R (2016) LAVA: large scale automated vulnerability addition. İn: IEEE symposium on security and privacy, SP 2016. pp 110–121. Doi: https://doi.org/10.1109/SP.2016.15
Goeschel K (2019) Feature set selection for ımproved classification of static analysis alerts, Nova Southeastern University, College of Computing and Engineering, CCE Theses and Dissertations
Fang Y, Han S, Huang C, Wu R (2019) TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology. PLoS ONE. https://doi.org/10.1371/journal.pone.0225196
Article Google Scholar
Manjula C, Florence L (2019) Deep neural network based hybrid approach for software defect prediction using software metrics. Cluster Comput. https://doi.org/10.1007/s10586-018-1696-z
Article Google Scholar
Kwon D, Kim H, Kim J et al (2019) A survey of deep learning-based network anomaly detection. Cluster Comput 22:949–961. https://doi.org/10.1007/s10586-017-1117-8
Article Google Scholar
Abualigah L, Diabat A, Mirjalili S, Abd Elaziz M, Gandomi AH (2021) The arithmetic optimization algorithm. Comput Methods Appl Mech Eng 376:113609
Article MathSciNet Google Scholar
Abualigah L, Yousri D, Abd Elaziz M, Ewees AA, Al-qaness MA, Gandomi AH (2021) Aquila optimizer: a novel meta-heuristic optimization algorithm. Comput Ind Eng. https://doi.org/10.1016/j.cie.2021.107250
Article Google Scholar
Alnafessah A, Casale G (2020) Artificial neural networks based techniques for anomaly detection in Apache Spark. Cluster Comput. https://doi.org/10.1007/s10586-019-02998-y
Article Google Scholar
Zlomislić V, Fertalj K, Sruk V (2017) Denial of service attacks, defences and research challenges. Cluster Comput 20:661–671. https://doi.org/10.1007/s10586-017-0730-x
Article Google Scholar
Wang C, Yao H, Liu Z (2019) An efficient DDoS detection based on SU-Genetic feature selection. Cluster Comput 22:2505–2515. https://doi.org/10.1007/s10586-018-2275-z
Article Google Scholar
Xue B, Zhang M, Browne WN, Yao X (2016) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626
Article Google Scholar
Zhang X, Liu F (2009) Feature selection based on clonal selection algorithm. Eval Appl. https://doi.org/10.4018/978-1-60566-310-4.ch009
Article Google Scholar
Sharma A, Sharma D (2011) Clonal selection algorithm for classification. In: Liò P, Nicosia G, Stibor T (eds) Artificial immune systems. ICARIS 2011. Lecture notes in computer science, vol 6825. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22371-6_31
Ambusaidi M, He X, Nanda P, Tan Z (2016) Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans Comput 65(10):2986–2998
Article MathSciNet Google Scholar
Chess B, McGraw G (2004) Static analysis for security. IEEE Secur Priv 2(6):76–79. https://doi.org/10.1109/MSP.2004.111
Article Google Scholar
Timmis J, Knight T, de Castro LN, Hart E (2004) An overview of artificial immune systems. In: Paton R, Bolouri H, Holcombe M, Parish JH, Tateson R (eds) Computation in cells and tissues. Natural computing series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-06369-9_4
De Castro LN, Timmis JI (2003) Artificial immune systems as a novel soft computing paradigm. Soft Comput 7(8):526–544
Article Google Scholar
Huang G, Li Y, Wang Q, Ren J, Cheng Y, Zhao X et al (2019) Automatic Classification method for software vulnerability based on Deep Neural Network. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2900462
Article Google Scholar
Shin Y, Meneely A, Williams L, Osborne JA (2011) Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities. IEEE Trans Softw Eng 37(6):772–787
Article Google Scholar
Abualigah L, Alsalibi B, Shehab M, Alshinwan M, Khasawneh AM, Alabool H (2020) A parallel hybrid krill herd algorithm for feature selection. Int J Mach Learn Cybern. https://doi.org/10.1007/s13042-020-01202-7
Article Google Scholar
Archibald R, Fann G (2007) Feature selection and classification of hyperspectral images with support vector machines. IEEE Geosci Remote Sens Lett 4(4):674–677
Article Google Scholar
https://www.evolvingsciences.com
Dudek G (2012) An artificial immune system for classification with local feature selection. IEEE Trans Evol Comput 16(6):847–860
Article Google Scholar
Alom MDZ, Taha TM et al (2019) A state-of-the-art survey on deep learning theory and architectures. Electronics 8:292. https://doi.org/10.3390/electronics8030292
Article Google Scholar
Goldberg MD, Qu Y, McMillin LM, Wolf W, Zhou L, Divakarla M (2003) AIRS near-real-time products and algorithms in support of operational numerical weather prediction. IEEE Trans Geosci Remote Sens 41(2):379–389
Article Google Scholar
Sherstinsky A (2020) Fundamentals of recurrent neural network (rnn) and long short-term memory (lstm) network. Physica D 404:132306
Article MathSciNet Google Scholar
Li Z, Zou D, Xu S, Jin H, Zhu Y, Chen Z (2018) SySeVR: A framework for using deep learning to detect software vulnerabilities. arXiv:1807.06756
Dasgupta D, Nino F (2009) Immunological computation: theory and applications. Taylor & Francis, London
Google Scholar
Abualigah LM, Khader AT, Hanandeh ES (2018) Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48(11):4047–4071
Article Google Scholar
Abualigah LM, Khader AT, Hanandeh ES, Gandomi AH (2017) A novel hybridization strategy for krill herd algorithm applied to clustering techniques. Appl Soft Comput 60:423–435
Article Google Scholar
Abualigah L (2018) Feature selection and enhanced krill herd algorithm for text document clustering. Springer, Berlin . https://doi.org/10.1007/978-3-030-10674-4 (ISBN: 1860-949X)
Book Google Scholar
Abualigah LM, Khader AT (2017) Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. J Supercomput 73(11):4773–4795
Article Google Scholar
Russell R, Kim L, Hamilton L, Lazovich T, Harer J, Ozdemir O, Ellingwood P, McConley M (2018) Automated vulnerability detection in source code using deep representation learning. İn: Proceedings of 17th IEEE ınternational conference on machine learning and applications (ICMLA). pp 757–762
Dam HK, Tran T, Pham T, Ng SW, Grundy J, Ghose A (2021) Automatic feature learning for predicting vulnerable software components. IEEE Trans Softw Eng 47(1):67–85. https://doi.org/10.1109/TSE.2018.2881961
Article Google Scholar
Xiaomeng W, Tao Z, Runpu W, Wei X, Changyu H (2018) CPGVA: code property graph-based vulnerability analysis by deep learning. İn: Proceedings of the 2018 10th ınternational conference on advanced ınfocomm technology (ICAIT). IEEE, pp 184–188
Zhou Y, Liu S, Siow J, Du X, Liu Y (2019) Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In: Wallach H, Larochelle H, Beygelzimer A, d'AlcheBuc F, Fox E, Garnett R (eds) NIPS proceedings - advances in neural ınformation processing systems 32 (NIPS 2019) (Vol. 32). (Advances in Neural Information Processing Systems). Neural Information Processing Systems (NIPS)
Ghaffarian SM, Shahriari HR (2017) Software vulnerability analysis and discovery using machine-learning and data-mining techniques: a survey. ACM Comput Surv CSUR 50(4):1–36
Article Google Scholar
Brucker AD, Deuster T (2014) U.S. Patent No. 8,881,293. Washington, DC: U.S. Patent and Trademark Office
Graves A. (2012) Long short-term memory. In: Supervised sequence labelling with recurrent neural networks. Studies in computational intelligence, vol 385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24797-2_4
Chu Y et al (2019) DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief Bioinform. https://doi.org/10.1093/bib/bbz152
Article Google Scholar
Zhang YF et al (2020) SPVec: A Word2vec-ınspired feature representation method for drug-target ınteraction prediction. Front Chem 7:895
Article Google Scholar
Wang X et al (2019) STS-NLSP: a network-based label space partition method for predicting the specificity of membrane transporter substrates using a hybrid feature of structural and semantic similarity. Front Bioeng Biotechnol 7:306. https://doi.org/10.3389/fbioe,2019.p.306-319
Article Google Scholar
Junaid M et al (2019) Extraction of molecular features for the drug discovery targeting protein‐protein interaction of Helicobacter pylori CagA and tumor suppressor protein ASSP2. Proteins: Structure, Function, and Bioinformatics. pp 837–849
Khan F et al (2020) Prediction of recombination spots using novel hybrid feature extraction method via deep learning approach. Front Genet 11:1052
Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Engineering and Natural Sciences, Malatya Turgut Ozal University, 44210, Malatya, Turkey
Canan Batur Şahin
Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, 11953, Jordan
Laith Abualigah

Authors

Canan Batur Şahin
View author publications
You can also search for this author in PubMed Google Scholar
Laith Abualigah
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Canan Batur Şahin.

Ethics declarations

Conflict of interest

There is no conflict of interest for authorship.

Ethical approval

This manuscript does not contain any studies with human participants carried out by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Batur Şahin, C., Abualigah, L. A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection. Neural Comput & Applic 33, 14049–14067 (2021). https://doi.org/10.1007/s00521-021-06047-x

Download citation

Received: 21 October 2020
Accepted: 15 April 2021
Published: 01 May 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s00521-021-06047-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection

Abstract

Access this article

Similar content being viewed by others

Detecting Vulnerabilities in Source Code Using Machine Learning

A security vulnerability predictor based on source code metrics

A Survey of the Software Vulnerability Discovery Using Machine Learning Techniques

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection

Abstract

Access this article

Similar content being viewed by others

Detecting Vulnerabilities in Source Code Using Machine Learning

A security vulnerability predictor based on source code metrics

A Survey of the Software Vulnerability Discovery Using Machine Learning Techniques

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation