Convolutional neural networks and extreme learning machines for malware classification

  • Original Paper
Journal of Computer Virology and Hacking Techniques

Abstract

Research in the field of malware classification often relies on machine learning models that are trained on high-level features, such as opcodes, function calls, and control flow graphs. Extracting such features is costly, since disassembly or code execution is generally required. In this paper, we train and evaluate machine learning models for malware classification using features that can be obtained without disassembly or code execution. Specifically, we visualize malware samples as images and apply image analysis techniques to both two-dimensional images and one-dimensional vectors derived from those images. We consider two machine learning techniques, namely, convolutional neural networks (CNNs) and extreme learning machines (ELMs). For two-dimensional images, we find that ELMs can achieve accuracies on par with CNNs, yet ELM training requires less than 2% of the time needed to train a comparable CNN. We also find that ELMs and CNNs perform as well when trained on one-dimensional data as when trained on two-dimensional data. In the one-dimensional case, ELMs are still faster to train than CNNs, but only by a relatively small factor compared with the speedup seen in image-based training.
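The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes NumPy, a fixed image width with zero-padding of the last row, flattening as the way to obtain a one-dimensional vector, a tanh hidden layer, and synthetic toy "samples"; the names `bytes_to_image`, `ELM`, and `toy_sample` are hypothetical. The ELM part follows the general idea of an extreme learning machine: hidden-layer weights are drawn at random and left fixed, and only the output weights are solved for in closed form.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Interpret raw bytes as a grayscale image with `width` pixels per row.

    The last row is zero-padded. The width is an illustrative assumption.
    """
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(buf) // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[: len(buf)] = buf
    return padded.reshape(rows, width)


class ELM:
    """Single-hidden-layer ELM: random fixed hidden weights,
    output weights obtained by a least-squares (pseudo-inverse) solve."""

    def __init__(self, n_hidden: int = 128, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        n_classes = int(y.max()) + 1
        # Hidden-layer parameters are sampled once and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot class targets
        # Only the output weights are fitted, with no iterative training.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Toy data (an assumption for illustration): class 0 has low byte values,
# class 1 has high byte values. Real malware binaries are far larger.
rng = np.random.default_rng(1)


def toy_sample(label: int) -> bytes:
    low = 0 if label == 0 else 192
    return rng.integers(low, low + 64, size=16).astype(np.uint8).tobytes()


labels = np.array([0, 1] * 10)
images = [bytes_to_image(toy_sample(int(lab)), width=4) for lab in labels]

# One-dimensional input: flatten each 4x4 image and scale to [0, 1].
X = np.stack([img.ravel() for img in images]) / 255.0

model = ELM(n_hidden=32).fit(X, labels)
train_accuracy = float((model.predict(X) == labels).mean())
print("training accuracy:", train_accuracy)
```

Because the output weights come from a single pseudo-inverse computation rather than from gradient descent over many epochs, a model of this kind is cheap to fit, which is consistent with the training-time advantage reported in the abstract.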

Author information

Corresponding author

Correspondence to Mark Stamp.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jain, M., Andreopoulos, W. & Stamp, M. Convolutional neural networks and extreme learning machines for malware classification. J Comput Virol Hack Tech 16, 229–244 (2020). https://doi.org/10.1007/s11416-020-00354-y

  • DOI: https://doi.org/10.1007/s11416-020-00354-y
