ABSTRACT
The discovery of vulnerabilities in source code is a key for securing computer systems. While specific types of security flaws can be identified automatically, in the general case the process of finding vulnerabilities cannot be automated and vulnerabilities are mainly discovered by manual analysis. In this paper, we propose a method for assisting a security analyst during auditing of source code. Our method proceeds by extracting abstract syntax trees from the code and determining structural patterns in these trees, such that each function in the code can be described as a mixture of these patterns. This representation enables us to decompose a known vulnerability and extrapolate it to a code base, such that functions potentially suffering from the same flaw can be suggested to the analyst. We evaluate our method on the source code of four popular open-source projects: LibTIFF, FFmpeg, Pidgin and Asterisk. For three of these projects, we are able to identify zero-day vulnerabilities by inspecting only a small fraction of the code bases.
- T. Avgerinos, S. K. Cha, B. L. T. Hao, and D. Brumley. AEG: Automatic Exploit Generation. In Proc. of Network and Distributed System Security Symposium (NDSS), 2011.Google Scholar
- I. D. Baxter, A. Yahin, L. Moura, M. S. Anna, and L. Bier. Clone detection using abstract syntax trees. In Proc. of the International Conference on Software Maintenance (ICSM), 1998. Google ScholarDigital Library
- S. Bellon, R. Koschke, I. C. Society, G. Antoniol, J. Krinke, I. C. Society, and E. Merlo. Comparison and evaluation of clone detection tools. IEEE Transactions on Software Engineering, 33: 577--591, 2007. Google ScholarDigital Library
- M. Cova, V. Felmetsger, G. Banks, and G. Vigna. Static detection of vulnerabilities in x86 executables. In Proc. of Annual Computer Security Applications Conference (ACSAC), pages 269--278, 2006. Google ScholarDigital Library
- S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6): 391--407, 1990.Google ScholarCross Ref
- D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: A general approach to inferring errors in systems code. In Proc. of ACM Symposium on Operating Systems Principles (SOSP), pages 57--72, 2001. Google ScholarDigital Library
- N. Falliere, L. O. Murchu, and E. Chien. W32.stuxnet dossier. Symantec Corporation, 2011.Google Scholar
- P. Godefroid, M. Y. Levin, and D. Molnar. SAGE: whitebox fuzzing for security testing. Communications of the ACM, 55(3): 40--44, 2012. Google ScholarDigital Library
- S. Heelan. Vulnerability detection systems: Think cyborg, not robot. IEEE Security & Privacy, 9(3): 74--77, 2011. Google ScholarDigital Library
- J. Hopcroft and J. Motwani, R. Ullmann. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2 edition, 2001. Google ScholarDigital Library
- J. Jang, A. Agrawal, and D. Brumley. ReDeBug: finding unpatched code clones in entire os distributions. In Proc. of IEEE Symposium on Security and Privacy, 2012. Google ScholarDigital Library
- N. Jovanovic, C. Kruegel, and E. Kirda. Pixy: A static analysis tool for detecting web application vulnerabilities. In Proc. of IEEE Symposium on Security and Privacy, pages 6--263, 2006. Google ScholarDigital Library
- T. Kamiya, S. Kusumoto, and K. Inoue. CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering, pages 654--670, 2002. Google ScholarDigital Library
- K. A. Kontogiannis, R. Demori, E. Merlo, M. Galler, and M. Bernstein. Pattern matching for clone and concept detection. Journal of Automated Software Engineering, 3: 108, 1996.Google ScholarCross Ref
- Z. Li and Y. Zhou. PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code. In Proc. of European Software Engineering Conference (ESEC), pages 306--315, 2005. Google ScholarDigital Library
- Z. Li, S. Lu, S. Myagmar, and Y. Zhou. Cp-miner: Finding copy-paste and related bugs in large-scale software code. IEEE Transactions on Software Engineering, 32: 176--192, 2006. Google ScholarDigital Library
- B. Livshits and T. Zimmermann. Dynamine: finding common error patterns by mining software revision histories. In Proc. of European Software Engineering Conference (ESEC), pages 296--305, 2005. Google ScholarDigital Library
- V. B. Livshits and M. S. Lam. Finding security vulnerabilities in java applications with static analysis. In Proc. of USENIX Security Symposium, 2005. Google ScholarDigital Library
- A. Marcus and J. I. Maletic. Identification of high-level concept clones in source code. In Proc. of International Conference on Automated Software Engineering (ASE), page 107, 2001. Google ScholarDigital Library
- L. Moonen. Generating robust parsers using island grammars. In Proc. of Working Conference on Reverse Engineering (WCRE), pages 13--22, 2001. Google ScholarDigital Library
- D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver. Inside the Slammer worm. IEEE Security and Privacy, 1(4): 33--39, 2003. Google ScholarDigital Library
- J. Newsome and D. Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In Proc. of Network and Distributed System Security Symposium (NDSS), 2005.Google Scholar
- T. Parr and R. Quong. ANTLR: A predicated-LL(k) parser generator. Software Practice and Experience, 25: 789--810, 1995. Google ScholarDigital Library
- rats. Rough auditing tool for security. Fortify Software Inc., https://www.fortify.com/ssa-elements/threat-intelligence/rats.html, visited April, 2012.Google Scholar
- G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1986. Google ScholarDigital Library
- C. Shannon and D. Moore. The spread of the Witty worm. IEEE Security and Privacy, 2(4): 46--50, 2004. Google ScholarDigital Library
- M. Sutton, A. Greene, and P. Amini. Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley Professional, 2007. Google ScholarDigital Library
- J. Viega, J. Bloch, Y. Kohno, and G. McGraw. ITS4: A static vulnerability scanner for C and C++ code. In Proc. of Annual Computer Security Applications Conference (ACSAC), pages 257--267, 2000. Google ScholarDigital Library
- T. Wang, T. Wei, Z. Lin, and W. Zou. IntScope: Automatically detecting integer overflow vulnerability in x86 binary using symbolic execution. In Proc. of Network and Distributed System Security Symposium (NDSS), 2009.Google Scholar
- D. A. Wheeler. Flawfinder. http://www.dwheeler.com/flawfinder/, visited April, 2012.Google Scholar
- C. C. Williams and J. K. Hollingsworth. Automatic mining of source code repositories to improve bug finding techniques. IEEE Transactions on Software Engineering, 31: 466--480, 2005. Google ScholarDigital Library
- Y. Xie and A. Aiken. Static detection of security vulnerabilities in scripting languages. In Proc. of USENIX Security Symposium, 2006. Google ScholarDigital Library
- F. Yamaguchi, F. Lindner, and K. Rieck. Vulnerability extrapolation: Assisted discovery of vulnerabilities using machine learning. In USENIX Workshop on Offensive Technologies (WOOT), Aug. 2011. Google ScholarDigital Library
Index Terms
- Generalized vulnerability extrapolation using abstract syntax trees
Recommendations
Vulnerability extrapolation: assisted discovery of vulnerabilities using machine learning
WOOT'11: Proceedings of the 5th USENIX conference on Offensive technologiesRigorous identification of vulnerabilities in program code is a key to implementing and operating secure systems. Unfortunately, only some types of vulnerabilities can be detected automatically. While techniques from software testing can accelerate the ...
Improving the performance of code vulnerability prediction using abstract syntax tree information
PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software EngineeringThe recent emergence of the Log4jshell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies ...
An Abstract Syntax Tree based static fuzzing mutation for vulnerability evolution analysis
Abstract Context:Zero-day vulnerabilities are highly destructive and sudden. However, traditional static and dynamic testing methods cannot efficiently detect them.
Objective:In this paper, a static ...
Comments