Abstract
Research in malware classification often relies on machine learning models trained on high-level features, such as opcodes, function calls, and control flow graphs. Extracting such features is costly, since disassembly or code execution is generally required. In this paper, we train and evaluate machine learning models for malware classification using features that can be obtained without disassembly or code execution. Specifically, we visualize malware samples as images and apply image analysis techniques to both two-dimensional images and one-dimensional vectors derived from those images. We consider two machine learning techniques, namely, convolutional neural networks (CNNs) and extreme learning machines (ELMs). For two-dimensional images, we find that ELMs can achieve accuracies on par with CNNs, yet ELM training requires less than 2% of the time needed to train a comparable CNN. We also find that ELMs and CNNs perform as well when trained on one-dimensional data as when trained on two-dimensional data. In the one-dimensional case, ELMs are still faster to train than CNNs, but only by a relatively small factor compared with the speedup observed for image-based training.
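The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: turning raw bytes into a one-dimensional vector or a two-dimensional image, and training an ELM in its standard formulation (a fixed random hidden layer, with output weights solved by least squares). The function and class names, the vector length, the image width, the tanh activation, and the weight scaling are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def bytes_to_vector(data: bytes, length: int = 1024) -> np.ndarray:
    """Map raw bytes to a fixed-length 1-D vector with values in [0, 1].

    Truncating or zero-padding to `length` is an assumed preprocessing choice.
    """
    raw = np.frombuffer(data[:length], dtype=np.uint8).astype(np.float64)
    vec = np.zeros(length, dtype=np.float64)
    vec[:raw.size] = raw / 255.0
    return vec

def vector_to_image(vec: np.ndarray, width: int = 32) -> np.ndarray:
    """Reshape a 1-D vector into a 2-D grayscale image (rows x width).

    The vector length must be divisible by `width`.
    """
    return vec.reshape(-1, width)

class TinyELM:
    """Hypothetical single-hidden-layer ELM classifier (illustration only)."""

    def __init__(self, n_hidden: int = 64, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer weights are random and never updated during training.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "TinyELM":
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Scale weights so tanh does not saturate on high-dimensional inputs.
        self.W = self.rng.normal(size=(n_features, self.n_hidden)) / np.sqrt(n_features)
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]              # one-hot targets
        self.beta = np.linalg.pinv(H) @ T     # least-squares output weights
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Usage on synthetic stand-in data (not real malware samples).
rng = np.random.default_rng(1)
low = rng.random((10, 16)) * 0.3           # class 0: mostly small byte values
high = 0.7 + rng.random((10, 16)) * 0.3    # class 1: mostly large byte values
X_demo = np.vstack([low, high])
y_demo = np.array([0] * 10 + [1] * 10)
demo_model = TinyELM(n_hidden=64).fit(X_demo, y_demo)
demo_accuracy = float(np.mean(demo_model.predict(X_demo) == y_demo))
```

Because only the output weights are fitted, and in closed form, training reduces to one pseudoinverse computation rather than many passes of gradient descent. This is the usual explanation for why ELMs train much faster than CNNs.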
Cite this article
Jain, M., Andreopoulos, W. & Stamp, M. Convolutional neural networks and extreme learning machines for malware classification. J Comput Virol Hack Tech 16, 229–244 (2020). https://doi.org/10.1007/s11416-020-00354-y
DOI: https://doi.org/10.1007/s11416-020-00354-y