Abstract
Metamorphic malware modifies its code structure using a morphing engine to evade traditional signature-based detection. Previous research has used opcode instructions as the feature representation for Hidden Markov Models (HMMs) in metamorphic malware detection. However, extracting file features at a finer-grained level may be more feasible. In this paper, we propose a novel detection approach that generates structural features by computing the compression ratio, entropy, Jaccard similarity coefficient, and Chi-square statistic over a stream of byte chunks. Nonnegative Matrix Factorization (NMF) is then applied to reduce the feature dimensions, and the coefficient vectors from the reduced space are used to train an HMM. Experimental results show that the proposed structural features perform differently for malware detection and for malware classification.
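The abstract names four per-chunk measures but does not define the chunk size, the compressor, or exactly what the Jaccard coefficient and the Chi-square statistic compare. The Python sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes fixed 256-byte chunks (`CHUNK_SIZE`, hypothetical), zlib (DEFLATE) as the compressor, Jaccard similarity between the sets of distinct byte values in consecutive chunks, and a Pearson Chi-square statistic of byte counts against a uniform distribution over the 256 byte values. All function names are illustrative.

```python
import math
import zlib
from collections import Counter
from typing import Dict, List

CHUNK_SIZE = 256  # hypothetical chunk size; the abstract does not specify one


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a byte stream into consecutive fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def compression_ratio(chunk: bytes) -> float:
    """Compressed size / original size, with zlib as an assumed compressor.

    Very short chunks can exceed 1.0 because of zlib's fixed header overhead.
    """
    if not chunk:
        return 0.0
    return len(zlib.compress(chunk, 9)) / len(chunk)


def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the sets of distinct byte values in two chunks."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def chi_square_uniform(chunk: bytes) -> float:
    """Pearson Chi-square statistic of byte counts against a uniform expectation."""
    if not chunk:
        return 0.0
    expected = len(chunk) / 256
    counts = Counter(chunk)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def structural_features(data: bytes, size: int = CHUNK_SIZE) -> Dict[str, List[float]]:
    """One feature stream per measure; Jaccard compares each chunk with its predecessor."""
    chunks = split_chunks(data, size)
    return {
        "entropy": [shannon_entropy(c) for c in chunks],
        "compression": [compression_ratio(c) for c in chunks],
        "jaccard": [jaccard(p, c) for p, c in zip(chunks, chunks[1:])],
        "chi_square": [chi_square_uniform(c) for c in chunks],
    }
```

Every value these measures produce is nonnegative, so per-file streams (after padding or resampling to a common length, which is a further assumption) could be stacked into the kind of nonnegative matrix that NMF requires. The NMF reduction and the HMM training stages are not sketched here.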
Acknowledgements
The authors are grateful to Universiti Malaysia Sarawak for providing a scholarship that supported the preparation of this manuscript. Special thanks also go to Prof. Mark Stamp and Ass. Prof. Fabio Di Troia of San José State University (SJSU), California, for sharing the Linux metamorphic malware (MWOR) and Malicia datasets, respectively, used in this study.
Ethics declarations
Conflict of interest
The authors declare that they have no known conflicts of interest that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ling, Y.T., Sani, N.F.M., Abdullah, M.T. et al. Metamorphic malware detection using structural features and nonnegative matrix factorization with hidden markov model. J Comput Virol Hack Tech 18, 183–203 (2022). https://doi.org/10.1007/s11416-021-00404-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11416-021-00404-z