Skip to main content
Log in

An in-depth experimental study of anomaly detection using gradient boosted machine

  • Original Article
  • Published:
Neural Computing and Applications Aims and scope Submit manuscript

Abstract

This paper proposes an improved detection performance of anomaly-based intrusion detection system (IDS) using gradient boosted machine (GBM). The best parameters of GBM are obtained by performing grid search. The performance of GBM is then compared with the four renowned classifiers, i.e. random forest, deep neural network, support vector machine, and classification and regression tree in terms of four performance measures, i.e. accuracy, specificity, sensitivity, false positive rate and area under receiver operating characteristic curve (AUC). From the experimental result, it can be revealed that GBM significantly outperforms the most recent IDS techniques, i.e. fuzzy classifier, two-tier classifier, GAR-forest, and tree-based classifier ensemble. These results are the highest so far applied on the complete features of three different datasets, i.e. NSL-KDD, UNSW-NB15, and GPRS dataset using either tenfold cross-validation or hold-out method. Moreover, we prove our results by conducting two statistical significant tests which are yet to discover in the existing IDS researches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  1. Aiello S, Eckstrand E, Fu A, Landry M, Aboyoun P (2016) Machine learning with R and H2O. https://h2o-release.s3.amazonaws.com/h2o/rel-turan/4/docs-website/h2o-docs/booklets/R_Vignette.pdf. Accessed July 2017

  2. Arora A, Candel A, Lanford J, LeDell E, Parmar V (2016) Deep learning with H2O. http://h2o.ai/resources

  3. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

    Article  MATH  Google Scholar 

  4. Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton

    MATH  Google Scholar 

  5. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27

    Article  Google Scholar 

  6. Chebrolu S, Abraham A, Thomas JP (2005) Feature deduction and ensemble design of intrusion detection systems. Comput Secur 24(4):295–307

    Article  Google Scholar 

  7. Conover WJ (1999) Practical nonparametric statistics 3rd edition, John Wiley and Sons, Michigan

  8. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

    MATH  Google Scholar 

  9. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232

    Article  MathSciNet  MATH  Google Scholar 

  10. García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064

    Article  Google Scholar 

  11. Giacinto G, Perdisci R, Del Rio M, Roli F (2008) Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf Fusion 9(1):69–82

    Article  Google Scholar 

  12. Govindarajan M, Chandrasekaran R (2011) Intrusion detection using neural based hybrid classification methods. Comput Netw 55(8):1662–1671

    Article  Google Scholar 

  13. Harb HM, Desuky AS (2011) Adaboost ensemble with genetic algorithm post optimization for intrusion detection. Int J Comput Sci Issues 8:5

    Google Scholar 

  14. Hsu CW, Chang CC, Lin CJ et al (2010) A practical guide to support vector classification. http://www.datascienceassn.org/sites/default/files/Practical Guide to Support Vector Classification.pdf. Accessed July 2017

  15. Kanakarajan NK, Muniasamy K (2016) Improving the accuracy of intrusion detection using GAR-Forest with feature selection. In: Proceedings of the 4th international conference on frontiers in intelligent computing: theory and applications (FICTA) 2015, Springer, New York, pp 539–547

  16. Kevric J, Jukic S, Subasi A (2016) An effective combining classifier approach using tree algorithms for network intrusion detection. Neural Comput Appl 1–8

  17. Krömer P, Platoš J, Snášel V, Abraham A (2011) Fuzzy classification by evolutionary algorithms. In: 2011 IEEE international conference on systems, man, and cybernetics (SMC), IEEE, pp 313–318

  18. Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26

    Article  Google Scholar 

  19. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

    Article  Google Scholar 

  20. Lewis RJ (2000) An introduction to classification and regression tree (CART) analysis. In: Annual meeting of the society for academic emergency medicine in San Francisco, California, pp 1–14

  21. Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23

    Article  Google Scholar 

  22. Mohammadi M, Raahemi B, Akbari A, Nassersharif B (2012) New class-dependent feature transformation for intrusion detection systems. Secur Commun Netw 5(12):1296–1311

    Article  Google Scholar 

  23. Moustafa N, Slay J (2015) UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Military communications and information systems conference (MilCIS), 2015, IEEE, pp 1–6

  24. Moustafa N, Slay J (2016) The evaluation of network anomaly detection systems: statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf Secur J Glob Perspect 25(1–3):18–31

    Article  Google Scholar 

  25. Mukkamala S, Sung AH, Abraham A (2005) Intrusion detection using an ensemble of intelligent paradigms. J Netw Comput Appl 28(2):167–182

    Article  Google Scholar 

  26. Oza NC, Tumer K (2008) Classifier ensembles: select real-world applications. Inf Fusion 9(1):4–20

    Article  Google Scholar 

  27. Pajouh HH, Dastghaibyfard G, Hashemi S (2017) Two-tier network anomaly detection model: a machine learning approach. J Intell Inf Syst 48(1):61–74

    Article  Google Scholar 

  28. Panda M, Abraham A, Patra MR (2010) Discriminative multinomial naive bayes for network intrusion detection. In: Information assurance and security (IAS), 2010 sixth international conference on IEEE, pp 5–10

  29. Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33(1–2):1–39

    Article  Google Scholar 

  30. Sindhu SSS, Geetha S, Kannan A (2012) Decision tree based light weight intrusion detection using a wrapper approach. Expert Syst Appl 39(1):129–141

    Article  Google Scholar 

  31. Tama BA, Rhee KH (2015a) A combination of PSO-based feature selection and tree-based classifiers ensemble for intrusion detection systems. In: Advances in computer science and ubiquitous computing, Springer, New York, pp 489–495

  32. Tama BA, Rhee KH (2015b) Performance analysis of multiple classifier system in DoS attack detection. In: International workshop on information security applications, Springer, New York, pp 339–347

  33. Tama BA, Rhee KH (2016) Classifier ensemble design with rotation forest to enhance attack detection of IDS in wireless network. In: 2016 11th Asia joint conference on information security (AsiaJCIS), IEEE, pp 87–91

  34. Tama BA, Rhee KH (2017) Performance evaluation of intrusion detection system using classifier ensembles. Int J Internet Protoc Technol 10(1):22–29

    Article  Google Scholar 

  35. Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD Cup 99 data set. In: Proceedings of the second IEEE symposium on computational intelligence for security and Defence applications 2009

  36. Therneau TM, Atkinson B, Ripley B et al (2010) rpart: Recursive partitioning. R Package Version 3:1–46

    Google Scholar 

  37. Vilela DW, Ferreira E, Shinoda AA, de Souza Araujo NV, de Oliveira R, Nascimento VE (2014) A dataset for evaluating intrusion detection systems in IEEE 802.11 wireless networks. In: IEEE Colombian conference on communications and computing (COLCOM), IEEE, pp 1–5

Download references

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion). First author acknowledges Korean Government for providing scholarship through KGSP for Graduate 2013–2018.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kyung-Hyune Rhee.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tama, B.A., Rhee, KH. An in-depth experimental study of anomaly detection using gradient boosted machine. Neural Comput & Applic 31, 955–965 (2019). https://doi.org/10.1007/s00521-017-3128-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-017-3128-z

Keywords

Navigation