Abstract
Behavior-based machine learning plays a vital role in malware classification because it can potentially overcome the limitations of signature-based methods. This paper explores the use of dynamic call sequences extracted by the open-source Noriben tool, which performs dynamic analysis in a virtualized environment. Call sequences of up to 5000 operations are generated for a total of 2000 benign and malware samples. Seven malware families are recognized: ransomware, trojan, backdoor, rootkit, virus, miner, and other. An empirical comparison analyzes five different classifiers: fully connected neural networks, GRU and LSTM, Transformer, and two combination approaches. The best-performing approach overall is a concatenation of a GRU with a Transformer architecture, which yields the highest F1-score, with accuracy and F1-score values of up to 97%.
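As an illustration of how operation sequences capped at 5000 steps might be turned into fixed-length model input, the following is a minimal sketch and not the preprocessing used in the paper. The reserved padding and unknown ids, the helper names `build_vocab` and `encode`, and the example operation names are all assumptions made for this example.

```python
from typing import Dict, List

MAX_LEN = 5000  # sequence cap stated in the abstract
PAD_ID = 0      # assumption: id 0 is reserved for padding
UNK_ID = 1      # assumption: id 1 marks operations missing from the vocabulary


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each distinct operation name, in first-seen order."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for op in seq:
            if op not in vocab:
                vocab[op] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], max_len: int = MAX_LEN) -> List[int]:
    """Map operations to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(op, UNK_ID) for op in seq[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


# Hypothetical traces; the operation names are illustrative only.
traces = [
    ["CreateFile", "RegSetValue", "CreateFile"],
    ["TCP Connect", "CreateFile"],
]
vocab = build_vocab(traces)
encoded = [encode(t, vocab, max_len=4) for t in traces]
```

Here `max_len=4` is used only to keep the example small; with the default, every trace is padded or truncated to 5000 ids, a shape that an embedding layer followed by a recurrent or attention-based model could consume.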
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chanajitt, R., Pfahringer, B., Gomes, H.M. (2022). A Comparison of Neural Network Architectures for Malware Classification Based on Noriben Operation Sequences. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13529. Springer, Cham. https://doi.org/10.1007/978-3-031-15919-0_36
DOI: https://doi.org/10.1007/978-3-031-15919-0_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15918-3
Online ISBN: 978-3-031-15919-0
eBook Packages: Computer Science, Computer Science (R0)