Heaps don't lie: countering unsoundness with heap snapshots

Authors:
Neville Grech

University of Athens, Greece / University of Malta, Malta

University of Athens, Greece / University of Malta, Malta
View Profile

,
George Fourtounis

University of Athens, Greece

University of Athens, Greece
View Profile

,
Adrian Francalanza

University of Malta, Malta

University of Malta, Malta
View Profile

,
Yannis Smaragdakis

University of Athens, Greece

University of Athens, Greece
View Profile

Proceedings of the ACM on Programming Languages Volume 1 Issue OOPSLAArticle No.: 68pp 1–27https://doi.org/10.1145/3133892

Published:12 October 2017Publication History

Proceedings of the ACM on Programming Languages

Abstract

Static analyses aspire to explore all possible executions in order to achieve soundness. Yet, in practice, they fail to capture common dynamic behavior. Enhancing static analyses with dynamic information is a common pattern, with tools such as Tamiflex. Past approaches, however, miss significant portions of dynamic behavior, due to native code, unsupported features (e.g., invokedynamic or lambdas in Java), and more. We present techniques that substantially counteract the unsoundness of a static analysis, with virtually no intrusion to the analysis logic. Our approach is reified in the HeapDL toolchain and consists in taking whole-heap snapshots during program execution, that are further enriched to capture significant aspects of dynamic behavior, regardless of the causes of such behavior. The snapshots are then used as extra inputs to the static analysis. The approach exhibits both portability and significantly increased coverage. Heap information under one set of dynamic inputs allows a static analysis to cover many more behaviors under other inputs. A HeapDL-enhanced static analysis of the DaCapo benchmarks computes 99.5% (median) of the call-graph edges of unseen dynamic executions (vs. 76.9% for the Tamiflex tool).

References

Edward E. Aftandilian, Sean Kelley, Connor Gramazio, Nathan Ricci, Sara L. Su, and Samuel Z. Guyer. 2010. Heapviz: Interactive Heap Visualization for Program Understanding and Debugging. In Proceedings of the 5th International Symposium on Software Visualization (SOFTVIS ’10). ACM, New York, NY, USA, 53–62. DOI: Google ScholarDigital Library
Davide Ancona, Massimo Ancona, Antonio Cuni, and Nicholas D. Matsakis. 2007. RPython: A Step Towards Reconciling Dynamically and Statically Typed OO Languages. In Proceedings of the 2007 Symposium on Dynamic Languages (DLS ’07). ACM, New York, NY, USA, 53–64. DOI: Google ScholarDigital Library
Molham Aref, Balder ten Cate, Todd J. Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L. Veldhuizen, and Geoffrey Washburn. 2015. Design and Implementation of the LogicBlox System. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). ACM, New York, NY, USA, 1371–1382. DOI: Google ScholarDigital Library
Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014. FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 259–269. DOI: Google ScholarDigital Library
Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khan, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony L. Hosking, Maria Jump, Han Bok Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo benchmarks: Java benchmarking development and analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA ’06). ACM, New York, NY, USA, 169–190. DOI: Google ScholarDigital Library
Sam Blackshear, Alexandra Gendreau, and Bor-Yuh Evan Chang. 2015. Droidel: A General Approach to Android Framework Modeling. In Proceedings of the 4th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP 2015). ACM, New York, NY, USA, 19–25. DOI: Google ScholarDigital Library
Eric Bodden, Andreas Sewe, Jan Sinschek, Hela Oueslati, and Mira Mezini. 2011. Taming reflection: Aiding static analysis in the presence of reflection and custom class loaders. In Proceedings of the 33rd International Conference on Software Engineering (ICSE ’11). ACM, New York, NY, USA, 241–250. DOI: Google ScholarDigital Library
Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses. In Proceedings of the 24th Annual ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA ’09). ACM, New York, NY, USA, 243–261. DOI: Google ScholarDigital Library
Eric Bruneton, Romain Lenglet, and Thierry Coupaye. 2002. ASM: a code manipulation tool to implement adaptable systems. Adaptable and extensible component systems 30, 19 (2002).Google Scholar
Patrick P. F. Chan, Lucas C. K. Hui, and S. M. Yiu. 2011. Dynamic Software Birthmark for Java Based on Heap Memory Analysis. Springer Berlin Heidelberg, Berlin, Heidelberg, 94–107. DOI: Google ScholarCross Ref
Stephen J. Fink. 2015. T.J. Watson Libraries for Analysis (WALA). http://wala.sourceforge.net . (2015).Google Scholar
Michael Furr, Jong-hoon (David) An, and Jeffrey S. Foster. 2009. Profile-guided Static Typing for Dynamic Scripting Languages. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’09). ACM, New York, NY, USA, 283–300. DOI: Google ScholarDigital Library
Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed Automated Random Testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’05). ACM, New York, NY, USA, 213–223. DOI: Google ScholarDigital Library
Brian Goetz. 2010. One VM, Many Languages. https://gotocon.com/dl/jaoo- aarhus- 2010/slides/BrianGoetz_ OneVMManyLanguages.pdf . (2010).Google Scholar
Brian Goetz. 2016. Project Valhalla Update. http://www.oracle.com/technetwork/java/jvmls2016- goetz- 3126134.pdf . (August 2016). JVM Language Summit.Google Scholar
Neville Grech, Julian Rathke, and Bernd Fischer. 2013. Preemptive Type Checking in Dynamically Typed Languages. In Proceedings of the 10th International Colloquium on Theoretical Aspects of Computing (Lecture Notes in Computer Science), Vol. 8049. Springer, 195–212. DOI: Google ScholarCross Ref
Neville Grech and Yannis Smaragdakis. 2017. P/Taint: Unified Points-to and Taint Analysis. Proceedings of the ACM on Programming Languages 1, OOPSLA (October 2017). DOI: Google ScholarDigital Library
Martin Hirzel, Daniel Von Dincklage, Amer Diwan, and Michael Hind. 2007. Fast Online Pointer Analysis. ACM Transactions on Programming Languages and Systems 29, 2, Article 11 (April 2007), 55 pages. DOI: Google ScholarDigital Library
Martin Hirzel, Amer Diwan, and Michael Hind. 2004. Pointer Analysis in the Presence of Dynamic Class Loading. In Proceedings of the 18th European Conference on Object-Oriented Programming (ECOOP ’04). Springer Berlin Heidelberg, Berlin, Heidelberg, 96–122. DOI: Google ScholarCross Ref
Peter Hofer, David Gnedt, and Hanspeter Mössenböck. 2015. Efficient Dynamic Analysis of the Synchronization Performance of Java Applications. In Proceedings of the 13th International Workshop on Dynamic Analysis (WODA 2015). ACM, New York, NY, USA, 14–18. DOI: Google ScholarDigital Library
IBM. 2017. Using the HPROF Profiler. https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.lnx. 80.doc/diag/tools/hprof.html . (2017).Google Scholar
Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. Springer International Publishing, Cham, 422–430. DOI: Google ScholarCross Ref
George Kastrinis and Yannis Smaragdakis. 2013. Hybrid Context-Sensitivity for Points-To Analysis. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’13). ACM, New York, NY, USA, 423–434. DOI: Google ScholarDigital Library
Davy Landman, Alexander Serebrenik, and Jurgen J. Vinju. 2017. Challenges for Static Analysis of Java Reflection – Literature Review and Empirical Study. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017. Google ScholarDigital Library
Sungho Lee, Julian Dolby, and Sukyoung Ryu. 2016. HybriDroid: Static Analysis Framework for Android Hybrid Applications. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016). ACM, New York, NY, USA, 250–261. DOI: Google ScholarDigital Library
Lian Li, Yi Lu, and Jingling Xue. 2017. Dynamic Symbolic Execution for Polymorphism. In Proceedings of the 26th International Conference on Compiler Construction (CC 2017). ACM, New York, NY, USA, 120–130. DOI: Google ScholarDigital Library
Yue Li, Tian Tan, and Jingling Xue. 2015. Effective Soundness-Guided Reflection Analysis. Springer Berlin Heidelberg, Berlin, Heidelberg, 162–180. DOI: Google ScholarCross Ref
Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. 2014. The Java Virtual Machine Specification, Java SE 8 Edition. (2014).Google Scholar
Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In Defense of Soundiness: A Manifesto. Commun. ACM 58, 2 (Jan. 2015), 44–46. DOI: Google ScholarDigital Library
Evan K. Maxwell, Godmar Back, and Naren Ramakrishnan. 2010. Diagnosing Memory Leaks Using Graph Mining on Heap Dumps. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10). ACM, New York, NY, USA, 115–124. DOI: Google ScholarDigital Library
Eric Mercer and Michael Jones. 2005. Model Checking Machine Code with the GNU Debugger. In Proceedings of the 12th International Conference on Model Checking Software (SPIN’05). Springer-Verlag, Berlin, Heidelberg, 251–265. DOI: Google ScholarDigital Library
Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2005. Parameterized object sensitivity for points-to analysis for Java. ACM Transactions on Software Engineering and Methodology 14, 1 (2005), 1–41. DOI: Google ScholarDigital Library
Oracle. 2014a. Java Platform, Standard Edition Java Flight Recorder Runtime Guide. https://docs.oracle.com/javacomponents/ jmc- 5- 4/jfr- runtime- guide/about.htm . (2014).Google Scholar
Oracle. 2014b. JSR 335: Lambda Expressions for the Java™ Programming Language. (2014). https://jcp.org/en/jsr/detail?id=335Google Scholar
Oracle. 2016a. HPROF: A Heap/CP U Profiling Tool. http://docs.oracle.com/javase/8/docs/technotes/samples/hprof.html . (2016).Google Scholar
Oracle. 2016b. jhat. (2016). https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jhat.htmlGoogle Scholar
Oracle. 2016c. VisualVM: Home. https://visualvm.github.io/ . (2016).Google Scholar
Oracle. 2017. JEP 280: Indify String Concatenation. (2017). http://openjdk.java.net/jeps/280Google Scholar
Alex Potanin, James Noble, and Robert Biddle. 2004. Checking Ownership and Confinement: Research Articles. Concurrency and Computation: Practice & Experience - Formal Techniques for Java-like Programs 16, 7 (June 2004), 671–687. DOI: Google ScholarCross Ref
S. P. Reiss. 2009. Visualizing the Java heap to detect memory problems. In 2009 5th IEEE International Workshop on Visualizing Software for Understanding and Analysis. 73–80. DOI: Google ScholarCross Ref
John R. Rose. 2009. Bytecodes Meet Combinators: Invokedynamic on the JVM. In Proceedings of the Third Workshop on Virtual Machines and Intermediate Languages (VMIL ’09). ACM, New York, NY, USA, Article 2, 11 pages. DOI: Google ScholarDigital Library
Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: A Concolic Unit Testing Engine for C. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE-13). ACM, New York, NY, USA, 263–272. DOI: Google ScholarCross Ref
Micha Sharir and Amir Pnueli. 1981. Two Approaches to Interprocedural Data Flow Analysis. In Program flow analysis: theory and applications, Steven S. Muchnick and Neil D. Jones (Eds.). Prentice-Hall, Inc., Englewood Cliffs, NJ, Chapter 7, 189–233.Google Scholar
Yannis Smaragdakis, Martin Bravenboer, and Ondřej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-Sensitivity. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11). ACM, New York, NY, USA, 17–30. DOI: Google ScholarDigital Library
Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie J. Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot - a Java bytecode optimization framework. In Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative research (CASCON ’99). IBM Press, 125–135.Google ScholarDigital Library
Noriaki Yoshiura and Wei Wei. 2014. Static Data Race Detection for Java Programs with Dynamic Class Loading. Springer International Publishing, Cham, 161–173. DOI: Google ScholarCross Ref
YourKit. 2017. https://www.yourkit.com/ . (2017).Google Scholar
Yury Zhauniarovich, Maqsood Ahmad, Olga Gadyatskaya, Bruno Crispo, and Fabio Massacci. 2015. StaDynA: Addressing the Problem of Dynamic Code Updates in the Security Analysis of Android Applications. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy (CODASPY ’15). ACM, New York, NY, USA, 37–48. DOI: Google ScholarDigital Library

Index Terms

Heaps don't lie: countering unsoundness with heap snapshots
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages

Recommendations

On the unsoundness of static analysis for Android GUIs
SOAP 2016: Proceedings of the 5th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis

Android software presents exciting new challenges for the static analysis community. However, static analyses for Android are typically unsound. This is due to the lack of specification of the Android framework, the continuous evolution of framework ...
Read More
Interprocedural pointer alias analysis

We present practical approximation methods for computing and representing interprocedural aliases for a program written in a language that includes pointers, reference parameters, and recursion. We present the following contributions: (1) a framework ...
Read More
Differentially testing soundness and precision of program analyzers
ISSTA 2019: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

In the last decades, numerous program analyzers have been developed both in academia and industry. Despite their abundance however, there is currently no systematic way of comparing the effectiveness of different analyzers on arbitrary code. In this ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the ACM on Programming Languages Volume 1, Issue OOPSLA
October 2017
1786 pages
EISSN:2475-1421
DOI:10.1145/3152284
Issue’s Table of Contents

Copyright © 2017 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 October 2017
Published in pacmpl Volume 1, Issue OOPSLA

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
- Artifacts Evaluated & Functional
Author Tags
Heap Profiling
Instrumentation
Program Analysis
Soundness
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 19
  Total Citations
  View Citations
- 556
  Total Downloads
- Downloads (Last 12 months)68
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Heaps don't lie: countering unsoundness with heap snapshots

Proceedings of the ACM on Programming Languages

Abstract

References

Cited By

Index Terms

Recommendations

On the unsoundness of static analysis for Android GUIs

Interprocedural pointer alias analysis

Differentially testing soundness and precision of program analyzers