A semantics-based approach to malware detection

Authors:
Mila Dalla Preda, University of Verona, Verona, Italy
Mihai Christodorescu, University of Wisconsin, Madison, WI
Somesh Jha, University of Wisconsin, Madison, WI
Saumya Debray, University of Arizona, Tucson, AZ

POPL '07: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, January 2007, Pages 377–388, https://doi.org/10.1145/1190216.1190270

Published: 17 January 2007

ABSTRACT

Malware detection is a crucial aspect of software security. Current malware detectors work by checking for "signatures," which attempt to capture (syntactic) characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes such detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter syntactic properties of the malware byte sequence without significantly affecting their execution behavior. This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behaviors of malware as well as the program being checked for infection, and uses abstract interpretation to "hide" irrelevant aspects of these behaviors. As a concrete application of our approach, we show that the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers.
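
The contrast the abstract draws between syntactic signatures and abstracted trace semantics can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's formal framework: the three-instruction language (`set`, `add`, `nop`), the helper names, and the choice of abstraction (observe only a chosen set of variables and collapse steps that leave them unchanged) are all hypothetical.

```python
def run(program):
    """Execute a straight-line toy program; return the trace of environments."""
    env = {}
    trace = []
    for op, *args in program:
        if op == "set":
            var, value = args
            env[var] = value
        elif op == "add":
            var, delta = args
            env[var] = env.get(var, 0) + delta
        elif op == "nop":
            pass
        else:
            raise ValueError(f"unknown instruction: {op}")
        trace.append(dict(env))  # snapshot of the state after this step
    return trace


def abstract(trace, watched):
    """Toy abstraction: keep only watched variables, drop steps that do not change them."""
    result = []
    for env in trace:
        view = tuple((v, env.get(v)) for v in watched)
        if not result or result[-1] != view:
            result.append(view)
    return result


def contains_sublist(needle, haystack):
    """True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def syntactic_detect(malware, program):
    """Signature-style check: the malware's instructions appear verbatim."""
    return contains_sublist(malware, program)


def semantic_detect(malware, program, watched):
    """Trace-style check: the abstracted malware trace appears in the abstracted program trace."""
    return contains_sublist(abstract(run(malware), watched),
                            abstract(run(program), watched))


malware = [("set", "x", 1), ("add", "x", 2)]
# Same behavior on x, obscured by a nop and a write to an unrelated variable.
obfuscated = [("set", "x", 1), ("nop",), ("set", "junk", 7), ("add", "x", 2)]
benign = [("set", "x", 5), ("add", "x", 2)]

print(syntactic_detect(malware, obfuscated))        # False
print(semantic_detect(malware, obfuscated, ["x"]))  # True
print(semantic_detect(malware, benign, ["x"]))      # False
```

In this sketch the inserted `nop` and the write to `junk` break the verbatim instruction match but vanish under the abstraction, so the trace-level check still flags the obfuscated variant while leaving the benign program alone. A real detector would need a far richer language, semantics, and abstraction than this toy assumes.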

References

B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang. On the (im)possibility of obfuscating programs. In Advances in Cryptology (CRYPTO'01), volume 2139 of Lecture Notes in Computer Science, pages 1--18, Santa Barbara, CA, USA, Aug. 19--23, 2001. Springer Berlin/Heidelberg.
D. Chess and S. White. An undetectable computer virus. In Proceedings of the 2000 Virus Bulletin Conference (VB2000), Orlando, FL, USA, Sept. 27--29, 2000. Virus Bulletin.
S. Chow, Y. Gu, H. Johnson, and V. Zakharov. An approach to the obfuscation of control-flow of sequential computer programs. In G. Davida and Y. Frankel, editors, Proceedings of the 4th International Information Security Conference (ISC'01), volume 2200 of Lecture Notes in Computer Science, pages 144--155, Malaga, Spain, Oct. 1--3, 2001. Springer Berlin/Heidelberg.
M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant. Semantics-aware malware detection. In Proceedings of the 2005 IEEE Symposium on Security and Privacy (S&P'05), pages 32--46, Oakland, CA, USA, May 8--11, 2005. IEEE Computer Society.
F. B. Cohen. Computer viruses: Theory and experiments. Computers and Security, 6:22--35, 1987.
C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Sciences, The University of Auckland, July 1997.
C. Collberg, C. Thomborson, and D. Low. Manufacturing cheap, resilient, and stealthy opaque constructs. In Proceedings of the 25th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'98), pages 184--196, San Diego, CA, USA, Jan. 19--21, 1998. ACM Press.
P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction of approximation of fixed points. In Proceedings of the 4th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'77), pages 238--252, Los Angeles, CA, USA, Jan. 17--19, 1977. ACM Press.
P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Proceedings of the 6th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'79), pages 269--282, San Antonio, TX, USA, Jan. 29--31, 1979. ACM Press.
P. Cousot and R. Cousot. Systematic design of program transformation frameworks by abstract interpretation. In Proceedings of the 29th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'02), pages 178--190, Portland, OR, USA, Jan. 16--18, 2002. ACM Press.
M. Dalla Preda and R. Giacobazzi. Control code obfuscation by abstract interpretation. In Proceedings of the 3rd IEEE International Conference on Software Engineering and Formal Methods (SEFM'05), pages 301--310, Koblenz, Germany, Sept. 5--9, 2005. IEEE Computer Society.
M. Dalla Preda and R. Giacobazzi. Semantic-based code obfuscation by abstract interpretation. In Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP'05), volume 3580 of Lecture Notes in Computer Science, pages 1325--1336, Lisboa, Portugal, July 11--15, 2005. Springer Berlin/Heidelberg.
T. Detristan, T. Ulenspiegel, Y. Malcom, and M. S. von Underduk. Polymorphic shellcode engine using spectrum analysis. Phrack, 11(61): published online at http://www.phrack.org (last accessed on Jan. 16, 2004), Aug. 2003.
S. Goldwasser and Y. T. Kalai. On the impossibility of obfuscation with auxiliary input. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05), pages 553--562, Washington, DC, USA, Oct. 22--25, 2005. IEEE Computer Society.
A. Gupta and R. Sekar. An approach for detecting self-propagating email using anomaly detection. In G. Vigna, E. Jonsson, and C. Kruegel, editors, Proceedings of the 6th International Symposium on Recent Advances in Intrusion Detection (RAID'03), volume 2820 of Lecture Notes in Computer Science, pages 55--72, Pittsburgh, PA, USA, Sept. 8--10, 2003. Springer Berlin/Heidelberg.
Intel Corporation. IA-32 Intel Architecture Software Developer's Manual.
M. Jordan. Dealing with metamorphism. Virus Bulletin, pages 4--6, Oct. 2002.
J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith. Detecting malicious code by model checking. In K. Julisch and C. Krügel, editors, Proceedings of the 2nd International Conference on Intrusion and Malware Detection and Vulnerability Assessment (DIMVA'05), volume 3548 of Lecture Notes in Computer Science, pages 174--187, Vienna, Austria, July 7--8, 2005. Springer Berlin/Heidelberg.
J. Z. Kolter and M. A. Maloof. Learning to detect malicious executables in the wild. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04), pages 470--478, Seattle, WA, USA, Aug. 22--25, 2004. ACM Press.
W.-J. Li, K. Wang, S. J. Stolfo, and B. Herzog. Fileprints: Identifying file types by n-gram analysis. In Proceedings of the 6th Annual IEEE Systems, Man, and Cybernetics (SMC) Workshop on Information Assurance (IAW'05), pages 64--71, West Point, NY, June 15--17, 2005. United States Military Academy.
C. Linn and S. Debray. Obfuscation of executable code to improve resistance to static disassembly. In Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS'03), pages 290--299, Washington, DC, USA, Oct. 27--30, 2003. ACM Press.
P. Morley. Processing virus collections. In Proceedings of the 2001 Virus Bulletin Conference (VB2001), pages 129--134, Prague, Czech Republic, Sept. 27--28, 2001. Virus Bulletin.
C. Nachenberg. Computer virus-antivirus coevolution. Communications of the ACM, 40(1):46--51, Jan. 1997.
Rajaat. Polymorphism. 29A Magazine, 1(3), 1999.
Symantec Corporation. Symantec Internet Security Threat Report: Trends for January 06--June 06, volume X. Symantec Corporation, Sept. 25, 2006.
P. Ször. The Art of Computer Virus Research and Defense. Addison-Wesley Professional, 2005.
P. Ször and P. Ferrie. Hunting for metamorphic. In Proceedings of the 2001 Virus Bulletin Conference (VB2001), pages 123--144, Prague, Czech Republic, Sept. 27--28, 2001. Virus Bulletin.
H. Wee. On obfuscating point functions. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC'05), pages 523--532, Baltimore, MD, USA, May 21--24, 2005. ACM Press.
z0mbie. Automated reverse engineering: Mistfall engine. Published online at http://www.madchat.org//vxdevl/papers/vxers/Z0mbie/autorev.txt (last accessed on Sep. 29, 2006).
z0mbie. Real permutating engine. Published online at http://vx.netlux.org/vx.php?id=er05 (last accessed on Sep. 29, 2006).

Index Terms

- Theory of computation
  - Semantics and reasoning
    - Program reasoning
      - Program verification

Published in
POPL '07: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
January 2007
400 pages
ISBN: 1595935754
DOI: 10.1145/1190216
General Chair: Martin Hofmann, Ludwig-Maximilians U, Munich, Germany
Program Chair: Matthias Felleisen, Northeastern University, Boston MA
ACM SIGPLAN Notices, Volume 42, Issue 1
Proceedings of the 2007 POPL Conference
January 2007
379 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1190215
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
abstract interpretation
malware detection
obfuscation
trace semantics
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 824 of 4,130 submissions, 20%