Abstract
Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, exemplified by tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas in Java), and more. We present techniques that substantially counteract the unsoundness of a static analysis, with virtually no intrusion into the analysis logic. Our approach is reified in the HeapDL toolchain and consists of taking whole-heap snapshots during program execution; the snapshots are further enriched to capture significant aspects of dynamic behavior, regardless of the causes of such behavior, and are then used as extra inputs to the static analysis. The approach exhibits both portability and significantly increased coverage: heap information gathered under one set of dynamic inputs allows a static analysis to cover many more behaviors under other inputs. A HeapDL-enhanced static analysis of the DaCapo benchmarks computes 99.5% (median) of the call-graph edges of unseen dynamic executions (vs. 76.9% for the Tamiflex tool).
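The idea of feeding heap-snapshot facts into a static analysis, and of measuring the result as call-graph edge coverage, can be illustrated with a toy sketch. Everything below is a hypothetical simplification: the `FieldFact` record, the `resolve_calls` and `coverage` helpers, and the class names `App`, `FileHandler`, and `PluginHandler` are assumptions made for illustration, not HeapDL's actual input format or the analysis it extends. The sketch assumes a single call `this.handler.handle()` whose receiver may be an object that the static analysis cannot see being allocated (for example, one created reflectively).

```python
from typing import NamedTuple


class FieldFact(NamedTuple):
    """Hypothetical fact: a field of an object of base_type holds a target_type object."""
    base_type: str
    field: str
    target_type: str


def resolve_calls(call_sites, field_facts, methods):
    """Resolve calls of the form `this.<field>.<name>()` using field points-to facts.

    call_sites: iterable of (caller, base_type, field, method_name)
    methods:    set of (type, method_name) pairs that actually exist
    Returns a set of call-graph edges (caller, "Type.method").
    """
    edges = set()
    for caller, base_type, field, name in call_sites:
        for fact in field_facts:
            if fact.base_type == base_type and fact.field == field:
                if (fact.target_type, name) in methods:
                    edges.add((caller, f"{fact.target_type}.{name}"))
    return edges


def coverage(static_edges, dynamic_edges):
    """Fraction of dynamically observed edges that the static call graph contains."""
    if not dynamic_edges:
        return 1.0
    return len(static_edges & dynamic_edges) / len(dynamic_edges)


methods = {("FileHandler", "handle"), ("PluginHandler", "handle")}
call_sites = [("App.run", "App", "handler", "handle")]

# Facts the static analysis derives on its own (a visible `new FileHandler()`).
static_facts = [FieldFact("App", "handler", "FileHandler")]
# Facts read from a heap snapshot of one run (the object was created reflectively).
snapshot_facts = [FieldFact("App", "handler", "PluginHandler")]

# Edges observed in a different, previously unseen execution.
dynamic_edges = {
    ("App.run", "FileHandler.handle"),
    ("App.run", "PluginHandler.handle"),
}

static_only = resolve_calls(call_sites, static_facts, methods)
enriched = resolve_calls(call_sites, static_facts + snapshot_facts, methods)

print(coverage(static_only, dynamic_edges))  # 0.5
print(coverage(enriched, dynamic_edges))     # 1.0
```

The analysis logic in `resolve_calls` is the same in both runs; only its input facts grow, which is the sense in which snapshot information can be added with little intrusion into the analysis itself.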
Index Terms
- Heaps don't lie: countering unsoundness with heap snapshots