research-article

Automating Reverse Engineering with Machine Learning Techniques

Authors:
Blake Anderson, Los Alamos National Laboratory, Los Alamos, USA
Curtis Storlie, Los Alamos National Laboratory, Los Alamos, USA
Micah Yates, Los Alamos National Laboratory, Los Alamos, USA
Aaron McPhall, Los Alamos National Laboratory, Los Alamos, USA
AISec '14: Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, November 2014, Pages 103–112. https://doi.org/10.1145/2666652.2666665

Published: 07 November 2014

ABSTRACT

Malware remains an ongoing threat, with millions of unique variants created every year. Unlike the majority of this malware, Advanced Persistent Threat (APT) malware is created to target a specific network or set of networks and has a precise objective, e.g., exfiltrating sensitive data. While 0-day malware detectors are a good start, they do not help reverse engineers better understand the threats attacking their networks. Understanding the behavior of malware is often a time-sensitive task and can take anywhere from several hours to several weeks. Our goal is to automate the task of identifying the general function of the subroutines in the function call graph of a program to aid reverse engineers. Two approaches to modeling the subroutine labels are investigated: a multiclass Gaussian process and a multiclass support vector machine. The output of these methods is the probability that a subroutine belongs to a certain class of functionality (e.g., file I/O, exploit). Promising initial results, illustrating the efficacy of this method, are presented on a sample of 201 subroutines taken from two malicious families.
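
The abstract describes the output as a per-class probability for each subroutine. As a minimal sketch of what such an output could look like, the Python below turns hypothetical per-class scores into probabilities with a softmax. It is not the paper's multiclass Gaussian process or support vector machine; the third class name, the scores, and the function names are assumptions made purely for illustration.

```python
import math

# Hypothetical functionality classes; the abstract names only file I/O and exploit.
CLASSES = ["file I/O", "exploit", "network"]


def softmax(scores):
    """Map real-valued per-class scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_subroutine(scores):
    """Return (most likely class, per-class probabilities) for one subroutine."""
    probs = softmax(scores)
    by_class = dict(zip(CLASSES, probs))
    best = max(by_class, key=by_class.get)
    return best, by_class


# Made-up scores for a single subroutine, for illustration only.
best, by_class = label_subroutine([2.0, 0.5, -1.0])
print(best, {name: round(p, 3) for name, p in by_class.items()})
```

A reverse engineer could read the full probability vector rather than only the top label, which is useful when two classes of functionality score closely.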

Published in
AISec '14: Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop
November 2014, 134 pages
ISBN: 9781450331531
DOI: 10.1145/2666652
General Chair: Gail-Joon Ahn (Arizona State University, USA)
Program Chairs: Christos Dimitrakakis (Chalmers University of Technology, Sweden), Aikaterini Mitrokotsa (Chalmers University of Technology, Sweden), Benjamin I.P. Rubinstein (University of Melbourne, Australia)
Copyright © 2014 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
computer security
gaussian processes
machine learning
malware
multiple kernel learning
support vector machines
Acceptance Rates
AISec '14 Paper Acceptance Rate: 12 of 24 submissions, 50%. Overall Acceptance Rate: 94 of 231 submissions, 41%.