DOI: 10.1145/2666652.2666665
research-article

Automating Reverse Engineering with Machine Learning Techniques

Published: 07 November 2014

ABSTRACT

Malware remains a persistent threat, with millions of unique variants created every year. Unlike the majority of this malware, Advanced Persistent Threat (APT) malware is created to target a specific network or set of networks and has a precise objective, e.g., exfiltrating sensitive data. While 0-day malware detectors are a good start, they do not help reverse engineers better understand the threats attacking their networks. Understanding the behavior of malware is often a time-sensitive task and can take anywhere from several hours to several weeks. Our goal is to aid reverse engineers by automating the task of identifying the general function of the subroutines in a program's function call graph. Two approaches to modeling the subroutine labels are investigated: a multiclass Gaussian process and a multiclass support vector machine. The output of these methods is the probability that a subroutine belongs to a certain class of functionality (e.g., file I/O or exploit). Promising initial results, illustrating the efficacy of this method, are presented on a sample of 201 subroutines taken from two malicious families.
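
The abstract states that each model outputs, for every subroutine, a probability per functionality class. The page does not describe the features or the calibration step, so the following is only a minimal sketch of that output shape. It assumes a classifier has already produced one raw score per class and uses a softmax as a stand-in normalization. The function name, the scores, and the "network" class are hypothetical; "file I/O" and "exploit" are the classes named in the abstract.

```python
import math


def class_probabilities(scores):
    """Map raw per-class scores to probabilities with a softmax.

    Assumption: softmax is a stand-in; the paper's own calibration
    method is not described on this page.
    """
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical raw scores for a single subroutine.
scores = {"file I/O": 2.1, "exploit": 0.3, "network": -1.0}
probs = class_probabilities(scores)

# The most probable class is the label a reverse engineer would see first.
best = max(probs, key=probs.get)
```

Returning the full distribution rather than only the top label lets an analyst see how confident the model is before trusting a suggested label.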

            Published in

            AISec '14: Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop
            November 2014
            134 pages
            ISBN: 9781450331531
            DOI: 10.1145/2666652

            Copyright © 2014 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            AISec '14 paper acceptance rate: 12 of 24 submissions, 50%
            Overall acceptance rate: 94 of 231 submissions, 41%