ABSTRACT
Malware remains a persistent threat, with millions of unique variants created every year. Unlike most of this malware, Advanced Persistent Threat (APT) malware is built to target a specific network or set of networks and has a precise objective, e.g., exfiltrating sensitive data. While 0-day malware detectors are a good start, they do not help reverse engineers better understand the threats attacking their networks. Understanding the behavior of malware is often a time-sensitive task and can take anywhere from several hours to several weeks. Our goal is to aid reverse engineers by automating the task of identifying the general function of each subroutine in a program's function call graph. We investigate two approaches to modeling the subroutine labels: a multiclass Gaussian process and a multiclass support vector machine. The output of each method is the probability that a subroutine belongs to a certain class of functionality (e.g., file I/O or exploit). Promising initial results illustrating the efficacy of this approach are presented on a sample of 201 subroutines taken from two malicious families.
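To make the shape of that output concrete, the sketch below turns per-class scores for one subroutine into class probabilities. It is only an illustration under stated assumptions: it uses a plain softmax normalization rather than the multiclass Gaussian process or support vector machine studied in the paper, the class list beyond "file I/O" and "exploit" is hypothetical, and the scores are made up.

```python
import math

# Hypothetical functionality classes; the abstract names only
# "file I/O" and "exploit". "other" is an assumption for illustration.
CLASSES = ["file I/O", "exploit", "other"]


def softmax(scores):
    """Map real-valued per-class scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_subroutine(scores, classes=CLASSES):
    """Return (most likely class, {class: probability}) for one subroutine."""
    probs = softmax(scores)
    by_class = dict(zip(classes, probs))
    best = max(by_class, key=by_class.get)
    return best, by_class


# Made-up scores for a single subroutine; not data from the paper.
best, by_class = label_subroutine([2.0, 0.5, -1.0])
print(best)
for name, prob in by_class.items():
    print(f"{name}: {prob:.3f}")
```

A reverse engineer would read such a result as a ranked list of candidate functions for each node of the call graph rather than a single hard label, which is the practical benefit of a probabilistic output.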
Index Terms
- Automating Reverse Engineering with Machine Learning Techniques