Metamorphic malware detection using structural features and nonnegative matrix factorization with hidden markov model

  • Original Paper
Journal of Computer Virology and Hacking Techniques

Abstract

Metamorphic malware uses a morphing engine to modify its code structure and evade traditional signature-based detection. Previous research has used opcode instructions as the feature representation for Hidden Markov Models in metamorphic malware detection. However, it would be more feasible to extract file features at a fine-grained level. In this paper, we propose a novel detection approach that generates structural features from a stream of byte chunks using the compression ratio, entropy, the Jaccard similarity coefficient, and the Chi-square statistic test. Nonnegative Matrix Factorization is also considered to reduce the feature dimensions. We then use the coefficient vectors from the reduced space to train a Hidden Markov Model. Experimental results show that the proposed structural features perform differently on malware detection than on malware classification.
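
To make the idea of chunk-level structural features concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a fixed chunk length (`CHUNK_SIZE`, a hypothetical value), uses zlib's DEFLATE as a stand-in compressor, takes the Jaccard coefficient between the sets of byte values in adjacent chunks, and computes the Chi-square statistic against a uniform byte distribution. The abstract does not specify any of these choices, and all function names are illustrative.

```python
import math
import zlib
from collections import Counter

CHUNK_SIZE = 256  # hypothetical chunk length in bytes; not taken from the paper


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into consecutive fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def compression_ratio(chunk: bytes) -> float:
    """Compressed size divided by original size, using zlib (assumed compressor)."""
    return len(zlib.compress(chunk)) / len(chunk)


def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard coefficient between the sets of byte values in two chunks."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def chi_square(chunk: bytes) -> float:
    """Chi-square statistic of observed byte counts against a uniform distribution."""
    expected = len(chunk) / 256
    counts = Counter(chunk)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def structural_features(data: bytes, size: int = CHUNK_SIZE):
    """Return one (entropy, compression ratio, Jaccard, Chi-square) tuple per chunk."""
    parts = chunks(data, size)
    rows = []
    for i, part in enumerate(parts):
        prev = parts[i - 1] if i > 0 else part  # first chunk is compared with itself
        rows.append((entropy(part), compression_ratio(part),
                     jaccard(prev, part), chi_square(part)))
    return rows
```

Every value produced here is nonnegative, so the per-chunk rows could in principle be stacked into a matrix and passed to a nonnegative matrix factorization routine. That step, and the Hidden Markov Model training on the resulting coefficient vectors, are left out of this sketch.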

Acknowledgements

The authors are grateful to Universiti Malaysia Sarawak for providing a scholarship for the preparation of the manuscript. Special thanks also go to Prof. Mark Stamp and Ass. Prof. Fabio Di Troia of SJSU, California, for sharing the Linux metamorphic malware (MWOR) and Malicia datasets, respectively, used in this study.

Author information

Corresponding author

Correspondence to Nor Fazlida Mohd Sani.

Ethics declarations

Conflict of interest

The authors declare that they have no known conflicts of interest that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ling, Y.T., Sani, N.F.M., Abdullah, M.T. et al. Metamorphic malware detection using structural features and nonnegative matrix factorization with hidden markov model. J Comput Virol Hack Tech 18, 183–203 (2022). https://doi.org/10.1007/s11416-021-00404-z

  • DOI: https://doi.org/10.1007/s11416-021-00404-z
