
Evaluation of random forest classifier in security domain

Published in: Applied Intelligence

Abstract

The security domain, including applications such as spam filtering and malware detection, has an intrinsic adversarial nature: adversaries actively attempt to mislead the detection system. This adversarial nature distinguishes security applications from classical machine learning problems; for instance, an adversary (attacker) may change the distribution of the test data, violating the data-stationarity assumption common to machine learning techniques. Since machine learning methods are not inherently adversary-aware, a classifier designer should investigate the robustness of a learning system under attack. In this respect, recent studies have modeled known attacks against machine learning-based detection systems, allowing a classifier designer to evaluate the performance of a learning system against the modeled attacks. Prior research explored gradient-based approaches to devise attacks against classifiers with a differentiable discriminant function, such as SVM. However, several powerful classifiers, such as Random Forest, have non-differentiable decision boundaries and are commonly used across security domains and applications. In this paper, we present a novel approach to model an attack against classifiers with a non-differentiable decision boundary. In the experiments, we first present an example that visually shows the effect of a successful attack on the MNIST handwritten digits classification task. We then conduct experiments on two well-known applications in the security domain: spam filtering and malware detection in PDF files. The experimental results demonstrate that the proposed attack successfully evades the Random Forest classifier and effectively degrades its performance.
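The core difficulty the abstract describes is that a Random Forest's decision function is piecewise constant, so gradient-based evasion does not apply. The paper's own attack is not reproduced here; as a minimal illustrative stand-in, the sketch below (all names, thresholds, and the greedy coordinate search are hypothetical, not the authors' method) evades a toy non-differentiable score function by trying small per-feature perturbations under an l1 budget and keeping whichever move lowers the malicious-class score most.

```python
def greedy_evasion(score, x, step=1.5, budget=4.0, max_iter=100):
    """Perturb x to reduce score(x) without using gradients.

    score  -- black-box callable returning the malicious-class score
    budget -- maximum allowed l1 distance from the original sample
    """
    x = list(x)
    original = list(x)
    for _ in range(max_iter):
        best_score, best_cand = score(x), None
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                # stay within the l1 perturbation budget
                if sum(abs(a - b) for a, b in zip(cand, original)) > budget:
                    continue
                s = score(cand)
                if s < best_score:
                    best_score, best_cand = s, cand
        if best_cand is None:  # no single-feature move lowers the score
            break
        x = best_cand
    return x

# Toy stand-in for a forest: two non-differentiable threshold rules
def toy_score(x):
    return (1.0 if x[0] > 2.0 else 0.0) + (1.0 if x[1] > 1.0 else 0.0)

adv = greedy_evasion(toy_score, [3.0, 2.0])
print(adv, toy_score(adv))  # the perturbed sample now scores 0.0
```

Because the surrogate search never queries a gradient, it works on any scoring function, which is exactly why such attacks remain feasible against tree ensembles.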


Notes

  1. For a real number p ≥ 1, the ℓp-norm of x is defined by ∥x∥p = (|x1|^p + |x2|^p + |x3|^p + ... + |xn|^p)^(1/p). The ℓ1-norm is known as least absolute errors; the ℓ2-norm is known as least squares.

  2. https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization
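The ℓp-norm definition in footnote 1 can be computed directly; the short sketch below (the helper name `lp_norm` is illustrative, not from the paper) evaluates the ℓ1 and ℓ2 cases mentioned there.

```python
def lp_norm(x, p):
    """l_p norm per footnote 1: (|x1|^p + ... + |xn|^p)^(1/p), p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0]
l1 = lp_norm(x, 1)  # |3| + |-4| = 7.0   (least absolute errors)
l2 = lp_norm(x, 2)  # sqrt(9 + 16) = 5.0 (least squares)
```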

References

  1. Warrender C, Forrest S, Pearlmutter B (1999) Detecting intrusions using system calls: alternative data models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy, pp 133–145


  2. Benferhat S, Boudjelida A, Tabia K, Drias H (2013) An intrusion detection and alert correlation approach based on revising probabilistic classifiers using expert knowledge. Appl Intell 38(4):520–540


  3. Baran A (2013) Stopping spam with sending session verification. Turk J Electr Eng Comput Sci 21(Sup. 2):2259–2268


  4. Khor K-C, Ting C-Y, Phon-Amnuaisuk S (2012) A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection. Appl Intell 36(2):320–329


  5. Zico Kolter J, Maloof MA (2006) Learning to detect and classify Malicious executables in the wild. J Mach Learn Res 7:2721–2744


  6. Biggio B, Corona I, Maiorca D, Nelson B, Šrndić N, Laskov P, Giacinto G, Roli F (2013) Evasion attacks against machine learning at test time. In: Machine Learning and Knowledge Discovery in Databases. Springer, pp 387–402

  7. Barreno M, Nelson B, Sears R, Joseph AD, Tygar JD (2006) Can machine learning be secure? In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, pp 16–25

  8. Biggio B, Fumera G, Roli F (2014) Security evaluation of pattern classifiers under attack. IEEE Trans Knowl Data Eng 26(4):984–996


  9. Zhang F, Chan PPK, Biggio B, Yeung DS, Roli F (2015) Adversarial feature selection against evasion attacks

  10. Brückner M, Scheffer T (2011) Stackelberg games for adversarial prediction problems. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 547–555

  11. Brückner M, Kanzow C, Scheffer T (2012) Static prediction games for adversarial learning problems. J Mach Learn Res 13(1):2617–2654


  12. Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Syst Mag 6(3):21–45


  13. Breiman L (2001) Random forests. Mach Learn 45(1):5–32


  14. Zhu C, Byrd RH, Peihuang L, Nocedal J (1997) Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans Math Softw (TOMS) 23(4):550–560


  15. Byrd RH, Peihuang L, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16(5):1190–1208


  16. Macdonald C, Ounis I, Soboroff I (2007) Overview of the TREC 2007 blog track. In: TREC, vol 7. Citeseer, pp 31–43

  17. Maiorca D, Corona I, Giacinto G (2013) Looking at the bag is not enough to find the bomb: an evasion of structural methods for malicious PDF files detection. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp 119–130

  18. Maiorca D, Giacinto G, Corona I (2012) A pattern recognition system for malicious PDF files detection. In: Machine Learning and Data Mining in Pattern Recognition. Springer, pp 510–524

  19. Smutz C, Stavrou A (2012) Malicious PDF detection using metadata and structural features. In: Proceedings of the 28th Annual Computer Security Applications Conference, pp 239–248

  20. Šrndić N, Laskov P (2013) Detection of malicious PDF files based on hierarchical document structure. In: Proceedings of the 20th Annual Network & Distributed System Security Symposium

  21. Corona I, Maiorca D, Ariu D, Giacinto G (2014) Lux0R: detection of malicious PDF-embedded JavaScript code through discriminant analysis of API references. In: Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, pp 47–57


Acknowledgments

The authors gratefully acknowledge Dr. Richard Wallace from Riverside Research for suggesting changes, reviewing, and editing the grammar and readability of this paper.

Author information


Corresponding author

Correspondence to Zeinab Khorshidpour.


About this article


Cite this article

Khorshidpour, Z., Hashemi, S. & Hamzeh, A. Evaluation of random forest classifier in security domain. Appl Intell 47, 558–569 (2017). https://doi.org/10.1007/s10489-017-0907-2
