Abstract
Algebraic Program Analysis (APA) is a ubiquitous framework that has been employed as a unifying model for various problems in data-flow analysis, termination analysis, invariant generation, predicate abstraction and a wide variety of other standard static analysis tasks. APA models program summaries as elements of a regular algebra A. Suppose that a summary in A is assigned to every transition of the program and that we aim to compute the effect of running the program starting at line s and ending at line t. APA first computes a regular expression capturing all program paths of interest. In the case of intraprocedural analysis, this expression models all paths from s to t, whereas in the interprocedural case it models all interprocedurally-valid paths, i.e., paths that return to the correct caller function when a callee returns. This regular expression is then interpreted over the algebra A to obtain the desired result. Suppose the program has n lines of code and each evaluation of an operation in the regular algebra takes O(k) time. It is well-known that a single APA query, or a set of queries with the same starting point s, can be answered in O(n · α(n) · k), where α is the inverse Ackermann function. In this work, we consider an on-demand setting for APA: the program is given as input and can be preprocessed. The analysis then has to answer a large number of online queries, each providing a pair (s, t) of program lines which are the start and end point of the query, respectively. The goal is to avoid the significant cost of running a fresh APA instance for each query. Our main contribution is a series of algorithms that, after a lightweight preprocessing of O(n · lg n · k), answer each query in O(k) time. In other words, our preprocessing has almost the same asymptotic complexity as a single APA query, except for a sub-logarithmic factor, and then every future query is answered instantly, i.e., by a constant number of operations in the algebra.
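To make the interpretation step concrete, here is a minimal Python sketch of evaluating a path expression over a regular algebra. It assumes a tiny expression AST (the hypothetical classes `Sym`, `Alt`, `Seq` and `Loop`) and represents the algebra by three operations, `plus` (choice), `times` (sequencing) and `star` (iteration); none of these names come from the paper. The instance shown is the min-plus (shortest-path) algebra, in which the star of a non-negative weight is 0 and the star of a negative weight is minus infinity. Swapping in a different algebra yields a different analysis from the same expression.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar, Union

T = TypeVar("T")


# Hypothetical regular-expression AST over CFG transition labels.
@dataclass(frozen=True)
class Sym:
    name: str            # a single transition


@dataclass(frozen=True)
class Alt:
    left: "Regex"        # paths in left OR paths in right
    right: "Regex"


@dataclass(frozen=True)
class Seq:
    first: "Regex"       # paths in first, THEN paths in second
    second: "Regex"


@dataclass(frozen=True)
class Loop:
    body: "Regex"        # zero or more repetitions of body


Regex = Union[Sym, Alt, Seq, Loop]


@dataclass(frozen=True)
class RegularAlgebra(Generic[T]):
    plus: Callable[[T, T], T]    # interprets Alt (choice)
    times: Callable[[T, T], T]   # interprets Seq (sequencing)
    star: Callable[[T], T]       # interprets Loop (iteration)


def interpret(r: Regex, alg: RegularAlgebra[T], summary: Dict[str, T]) -> T:
    """Evaluate r bottom-up: one algebra operation per AST node."""
    if isinstance(r, Sym):
        return summary[r.name]
    if isinstance(r, Alt):
        return alg.plus(interpret(r.left, alg, summary), interpret(r.right, alg, summary))
    if isinstance(r, Seq):
        return alg.times(interpret(r.first, alg, summary), interpret(r.second, alg, summary))
    if isinstance(r, Loop):
        return alg.star(interpret(r.body, alg, summary))
    raise TypeError(f"unknown node {r!r}")


# Min-plus algebra: the summary is the weight of the lightest path.
shortest = RegularAlgebra(
    plus=min,
    times=lambda x, y: x + y,
    star=lambda x: 0.0 if x >= 0 else float("-inf"),  # a negative loop can repeat forever
)

# e ; loop { b ; (c | d) } ; x
body = Seq(Sym("b"), Alt(Sym("c"), Sym("d")))
program = Seq(Sym("e"), Seq(Loop(body), Sym("x")))
weights = {"e": 1.0, "b": 2.0, "c": 5.0, "d": 1.0, "x": 3.0}
print(interpret(program, shortest, weights))  # 4.0
```

Each AST node costs one algebra operation, so an expression of size O(n) is evaluated with O(n) operations of cost O(k) each. Repeating this for every query pair (s, t) is the cost that the on-demand setting aims to avoid.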
We achieve this remarkable speedup by relying on certain structural sparsity properties of control-flow and call graphs (CFGs and CGs). Specifically, we exploit the fact that the control-flow graphs of real-world programs have a tree-like structure, with bounded treewidth and nesting depth, and that their call graphs have small treedepth in comparison to the size of the program. Finally, we provide experimental results demonstrating the effectiveness and efficiency of our approach and showing that it outperforms classical APA in runtime by several orders of magnitude.
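The simplest tree-like structure is a tree itself (treewidth 1). The Python sketch below shows, for that special case only, how precomputed summaries let each query be answered with a single algebra operation. It uses centroid decomposition, a standard divide-and-conquer technique on trees, and is not the authors' algorithm: the paper handles CFGs of bounded treewidth and call graphs of small treedepth, which this sketch does not. The class `TreePathOracle` and its parameters are hypothetical. Since a tree has a unique path between any two nodes, only sequencing (`mul`, with identity `one`) is needed, and each edge carries one summary per direction. The tree is assumed to be connected. Preprocessing stores, for every node and every centroid above it, the summaries to and from that centroid, i.e., O(n · lg n) products.

```python
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class TreePathOracle(Generic[T]):
    """Hypothetical sketch: directed path summaries on a tree via centroid decomposition."""

    def __init__(self, n: int, edges: List[Tuple[int, int, T, T]],
                 mul: Callable[[T, T], T], one: T) -> None:
        # Each edge (u, v, a, b): a summarizes u -> v, b summarizes v -> u.
        self.mul, self.one = mul, one
        self.adj: List[List[Tuple[int, T, T]]] = [[] for _ in range(n)]
        for u, v, a, b in edges:
            self.adj[u].append((v, a, b))
            self.adj[v].append((u, b, a))
        self.removed = [False] * n
        # levels[u] lists (centroid c, summary u -> c, summary c -> u), top level first.
        self.levels: List[List[Tuple[int, T, T]]] = [[] for _ in range(n)]
        if n > 0:
            self._decompose(0)

    def _centroid(self, root: int) -> int:
        # Iterative DFS over the current (not yet removed) component.
        order, parent = [root], {root: -1}
        stack = [root]
        while stack:
            u = stack.pop()
            for v, _, _ in self.adj[u]:
                if not self.removed[v] and v not in parent:
                    parent[v] = u
                    order.append(v)
                    stack.append(v)
        size = {u: 1 for u in order}
        for u in reversed(order):          # children appear after their parents
            if parent[u] != -1:
                size[parent[u]] += size[u]
        total = len(order)
        for u in order:
            heaviest = total - size[u]     # the part of the component "above" u
            for v, _, _ in self.adj[u]:
                if not self.removed[v] and parent.get(v) == u:
                    heaviest = max(heaviest, size[v])
            if 2 * heaviest <= total:
                return u
        raise AssertionError("a tree always has a centroid")

    def _decompose(self, start: int) -> None:
        work = [start]
        while work:
            c = self._centroid(work.pop())
            to_c: Dict[int, T] = {c: self.one}
            from_c: Dict[int, T] = {c: self.one}
            stack = [c]
            while stack:
                u = stack.pop()
                for v, fwd, bwd in self.adj[u]:
                    if not self.removed[v] and v not in to_c:
                        from_c[v] = self.mul(from_c[u], fwd)   # c -> ... -> u -> v
                        to_c[v] = self.mul(bwd, to_c[u])       # v -> u -> ... -> c
                        stack.append(v)
            for u in to_c:
                self.levels[u].append((c, to_c[u], from_c[u]))
            self.removed[c] = True
            work.extend(v for v, _, _ in self.adj[c] if not self.removed[v])

    def query(self, s: int, t: int) -> T:
        # The deepest centroid shared by s and t lies on the s -> t path.
        ls, lt = self.levels[s], self.levels[t]
        i = 0
        while i + 1 < min(len(ls), len(lt)) and ls[i + 1][0] == lt[i + 1][0]:
            i += 1
        return self.mul(ls[i][1], lt[i][2])   # exactly one algebra operation


tree = [(0, 1), (0, 2), (1, 3), (1, 4)]
edges = [(u, v, ((u, v),), ((v, u),)) for u, v in tree]
oracle = TreePathOracle(5, edges, mul=lambda x, y: x + y, one=())
print(oracle.query(3, 2))  # ((3, 1), (1, 0), (0, 2))
```

The usage lines use tuple concatenation as the algebra, so each answer spells out the traversed edges and is easy to check. Locating the shared centroid here scans O(lg n) levels; a constant-time lowest-common-ancestor structure on the centroid tree would remove that scan and leave one algebra operation per query.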
References
- Ali Ahmadi, Krishnendu Chatterjee, Amir Kafshdar Goharshady, Tobias Meggendorfer, Roodabeh Safavi, and Đorđe Žikelić. 2022. Algorithms and Hardness Results for Computing Cores of Markov Chains. In FSTTCS. 250, 29:1–29:20.
- Ali Ahmadi, Majid Daliri, Amir Kafshdar Goharshady, and Andreas Pavlogiannis. 2022. Efficient approximations for cache-conscious data placement. In PLDI. 857–871.
- C Aiswarya. 2022. How treewidth helps in verification. ACM SIGLOG News, 9, 1 (2022), 6–21.
- Noga Alon and Baruch Schieber. 1987. Optimal preprocessing for answering on-line product queries. https://citeseerx.ist.psu.edu/document?repid=rep1&doi=cf740240d3a7440e23e92a09bf590cb70544cf4f
- Ali Asadi, Krishnendu Chatterjee, Amir Kafshdar Goharshady, Kiarash Mohammadi, and Andreas Pavlogiannis. 2020. Faster Algorithms for Quantitative Analysis of MCs and MDPs with Small Treewidth. In ATVA. 253–270.
- Wayne A. Babich and Mehdi Jazayeri. 1978. The Method of Attributes for Data Flow Analysis: Part II. Demand Analysis. Acta Informatica, 10 (1978), 265–272.
- Roland C Backhouse and Bernard A Carré. 1975. Regular algebra applied to path-finding problems. IMA Journal of Applied Mathematics, 15, 2 (1975), 161–186.
- Thomas Ball, Ella Bounimova, Vladimir Levin, Rahul Kumar, and Jakob Lichtenberg. 2010. The Static Driver Verifier Research Platform. In CAV. 6174, 119–122.
- Thomas Ball and Sriram K. Rajamani. 2000. Bebop: A Symbolic Model Checker for Boolean Programs. In SPIN. 1885, 113–130.
- Michael A Bender, Martin Farach-Colton, Giridhar Pemmasani, Steven Skiena, and Pavel Sumazin. 2005. Lowest common ancestors in trees and directed acyclic graphs. Journal of Algorithms, 57, 2 (2005), 75–94.
- Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khan, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony L. Hosking, Maria Jump, Han Bok Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA. ACM, 169–190.
- Eric Bodden. 2012. Inter-procedural data-flow analysis with IFDS/IDE and Soot. In SOAP@PLDI. 3–8.
- Hans L. Bodlaender. 1988. Dynamic Programming on Graphs with Bounded Treewidth. In ICALP. 317, 105–118.
- Hans L. Bodlaender. 1996. A Linear-Time Algorithm for Finding Tree-Decompositions of Small Treewidth. SIAM J. Comput., 25, 6 (1996), 1305–1317.
- Hans L. Bodlaender and Torben Hagerup. 1998. Parallel Algorithms with Optimal Speedup for Bounded Treewidth. SIAM J. Comput., 27, 6 (1998), 1725–1746.
- Richard B Borie, R Gary Parker, and Craig A Tovey. 1992. Automatic generation of linear-time algorithms from predicate calculus descriptions of problems on recursively constructed graph families. Algorithmica, 7 (1992), 555–581.
- Jason Breck. 2020. Enhancing Algebraic Program Analysis. University of Wisconsin.
- Igor Carpanese. 2018. A Visual Introduction to Centroid Decomposition. https://medium.com/carpanese/an-illustrated-introduction-to-centroid-decomposition-8c1989d53308
- Krishnendu Chatterjee, Amir Kafshdar Goharshady, and Ehsan Kafshdar Goharshady. 2019. The treewidth of smart contracts. In SAC. 400–408.
- Krishnendu Chatterjee, Amir Kafshdar Goharshady, Prateesh Goyal, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2019. Faster Algorithms for Dynamic Algebraic Queries in Basic RSMs with Constant Treewidth. ACM Trans. Program. Lang. Syst., 41, 4 (2019), 23:1–23:46.
- Krishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. In POPL. 733–747.
- Krishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2020. Optimal and Perfectly Parallel Algorithms for On-demand Data-Flow Analysis. In ESOP. 112–140.
- Krishnendu Chatterjee, Amir Kafshdar Goharshady, Nastaran Okati, and Andreas Pavlogiannis. 2019. Efficient parameterized algorithms for data packing. In POPL. 53:1–53:28.
- Krishnendu Chatterjee, Amir Kafshdar Goharshady, and Andreas Pavlogiannis. 2017. JTDec: A Tool for Tree Decompositions in Soot. In ATVA. 10482, 59–66.
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Amir Kafshdar Goharshady, and Andreas Pavlogiannis. 2018. Algorithms for Algebraic Path Properties in Concurrent Systems of Constant Treewidth Components. ACM Trans. Program. Lang. Syst., 40, 3 (2018), 9:1–9:43.
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2021. Faster algorithms for quantitative verification in bounded treewidth graphs. Formal Methods Syst. Des., 57, 3 (2021), 401–428.
- Giovanna Kobus Conrado, Amir Kafshdar Goharshady, Kerim Kochekov, Yun Chen Tsai, and Ahmed Khaled Zaher. 2023. Artifact for Exploiting the Sparseness of Control-flow and Call Graphs for Efficient and On-demand Algebraic Program Analysis. Zenodo. https://doi.org/10.5281/zenodo.8320671
- Giovanna Kobus Conrado, Amir Kafshdar Goharshady, Kerim Kochekov, Yun Chen Tsai, and Ahmed Khaled Zaher. 2023. Exploiting the Sparseness of Control-flow and Call Graphs for Efficient and On-demand Algebraic Program Analysis. https://hal.science/hal-04194535
- Giovanna Kobus Conrado, Amir Kafshdar Goharshady, and Chun Kit Lam. 2023. The Bounded Pathwidth of Control-Flow Graphs. In OOPSLA. 232:1–232:26.
- Bruno Courcelle. 1990. The monadic second-order logic of graphs. I. Recognizable sets of finite graphs. Information and computation, 85, 1 (1990), 12–75.
- Patrick Cousot and Radhia Cousot. 1977. Static Determination of Dynamic Properties of Recursive Procedures. In Formal Description of Programming Concepts. 237–278.
- Marek Cygan, Fedor V Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. 2015. Parameterized algorithms. Springer.
- Víctor Dalmau, Phokion G. Kolaitis, and Moshe Y. Vardi. 2002. Constraint Satisfaction, Bounded Treewidth, and Finite-Variable Logics. In CP. Springer, 310–326.
- Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Cheong Schwarzkopf. 2000. More geometric data structures: Windowing. Computational Geometry: algorithms and applications, 211–233.
- Holger Dell, Christian Komusiewicz, Nimrod Talmon, and Mathias Weller. 2017. The PACE 2017 Parameterized Algorithms and Computational Experiments Challenge: The Second Iteration. In IPEC. 89, 30:1–30:12.
- Davide della Giustina, Nicola Prezza, and Rossano Venturini. 2019. A New Linear-Time Algorithm for Centroid Decomposition. In SPIRE. 274–282.
- Evelyn Duesterwald, Rajiv Gupta, and Mary Lou Soffa. 1995. Demand-driven Computation of Interprocedural Data Flow. In POPL. ACM Press, 37–48.
- Michael Elberfeld, Andreas Jakoby, and Till Tantau. 2010. Logspace Versions of the Theorems of Bodlaender and Courcelle. In FOCS. IEEE Computer Society, 143–152.
- Javier Esparza, Stefan Kiefer, and Michael Luttenberger. 2010. Newtonian program analysis. J. ACM, 57, 6 (2010), 33:1–33:47.
- Andrea Ferrara, Guoqiang Pan, and Moshe Y. Vardi. 2005. Treewidth in Verification: Local vs. Global. In LPAR. 489–503.
- Fedor V. Fomin, Daniel Lokshtanov, Saket Saurabh, Michal Pilipczuk, and Marcin Wrochna. 2018. Fully Polynomial-Time Parameterized Computations for Graphs and Matrices of Low Treewidth. ACM Trans. Algorithms, 14, 3 (2018), 34:1–34:45.
- Harold N. Gabow and Robert Endre Tarjan. 1983. A Linear-Time Algorithm for a Special Case of Disjoint Set Union. In STOC. 246–251.
- Amir Kafshdar Goharshady and Fatemeh Mohammadi. 2020. An efficient algorithm for computing network reliability in small treewidth. Reliab. Eng. Syst. Saf., 193 (2020), 106665.
- Amir Kafshdar Goharshady and Ahmed Khaled Zaher. 2023. Efficient Interprocedural Data-Flow Analysis Using Treedepth and Treewidth. In VMCAI. 177–202.
- Jens Gustedt, Ole A. Mæhle, and Jan Arne Telle. 2002. The Treewidth of Java Programs. In ALENEX. 2409, 86–97.
- Susan Horwitz, Thomas W. Reps, and Shmuel Sagiv. 1995. Demand Interprocedural Dataflow Analysis. In FSE. 104–115.
- Camille Jordan. 1869. Sur les assemblages de lignes. Journal für die reine und angewandte Mathematik, 70 (1869), 185–190.
- Alexander Kernozhitsky, Anton Älgmyr, Oleksandr Kulkov, and Wiktor Kuchta. 2022. Sqrt Tree. https://cp-algorithms.com/data_structures/sqrt-tree.html
- Gary A. Kildall. 1973. A Unified Approach to Global Program Optimization. In POPL. 194–206.
- Zachary Kincaid, Jason Breck, Ashkan Forouhi Boroujeni, and Thomas W. Reps. 2017. Compositional recurrence analysis revisited. In PLDI. ACM, 248–262.
- Zachary Kincaid, John Cyphert, Jason Breck, and Thomas W. Reps. 2018. Non-linear reasoning for invariant synthesis. In POPL. 54:1–54:33.
- Zachary Kincaid, Thomas W. Reps, and John Cyphert. 2021. Algebraic Program Analysis. In CAV. 46–83.
- Stephen Kleene. 1956. Representation of events in nerve nets and finite automata. Automata studies, 34 (1956), 3–41.
- Joachim Kneis and Alexander Langer. 2008. A Practical Approach to Courcelle’s Theorem. In MEMICS (Electronic Notes in Theoretical Computer Science, Vol. 251). 65–81.
- Lukasz Kowalik, Marcin Mucha, Wojciech Nadara, Marcin Pilipczuk, Manuel Sorge, and Piotr Wygocki. 2020. The PACE 2020 Parameterized Algorithms and Computational Experiments Challenge: Treedepth. In IPEC. 180, 37:1–37:18.
- Dexter Kozen. 1990. On Kleene Algebras and Closed Semirings. In MFCS. 26–47.
- Daniel Kroening, Natasha Sharygina, Stefano Tonetta, Aliaksei Tsitovich, and Christoph M. Wintersteiger. 2008. Loop Summarization Using Abstract Transformers. In ATVA. 111–125.
- Jørn Lind-Nielsen. 1999. BuDDy: A binary decision diagram package.
- Mohsen Alambardar Meybodi, Amir Kafshdar Goharshady, Mohammad Reza Hooshmandasl, and Ali Shakiba. 2022. Optimal Mining: Maximizing Bitcoin Miners’ Revenues from Transaction Fees. In Blockchain. 266–273.
- Jaroslav Nesetril and Patrice Ossona de Mendez. 2006. Tree-depth, subgraph coloring and homomorphism bounds. Eur. J. Comb., 27, 6 (2006), 1022–1041.
- Rolf Niedermeier. 2004. Ubiquitous Parameterization - Invitation to Fixed-Parameter Algorithms. In MFCS. 84–103.
- Jan Obdrzálek. 2003. Fast Mu-Calculus Model Checking when Tree-Width Is Bounded. In CAV. 80–92.
- Thomas W. Reps. 1993. Demand Interprocedural Program Analysis Using Logic Databases. In ILPS. 163–196.
- Thomas W. Reps, Susan Horwitz, and Shmuel Sagiv. 1995. Precise Interprocedural Dataflow Analysis via Graph Reachability. In POPL. 49–61.
- Thomas W. Reps, Emma Turetsky, and Prathmesh Prabhu. 2017. Newtonian Program Analysis via Tensor Product. ACM Trans. Program. Lang. Syst., 39, 2 (2017), 9:1–9:72.
- Neil Robertson and Paul D. Seymour. 1986. Graph Minors. II. Algorithmic Aspects of Tree-Width. J. Algorithms, 7, 3 (1986), 309–322.
- Shmuel Sagiv, Thomas W. Reps, and Susan Horwitz. 1996. Precise Interprocedural Dataflow Analysis with Applications to Constant Propagation. Theor. Comput. Sci., 167 (1996), 131–170.
- Micha Sharir and Amir Pnueli. 1978. Two approaches to interprocedural data flow analysis. Courant Institute of Mathematical Sciences.
- Manu Sridharan, Denis Gopan, Lexin Shan, and Rastislav Bodík. 2005. Demand-driven points-to analysis for Java. In OOPSLA. ACM, 59–76.
- Robert Endre Tarjan. 1981. Fast Algorithms for Solving Path Problems. J. ACM, 28, 3 (1981), 594–614.
- Robert Endre Tarjan. 1981. A Unified Approach to Path Problems. J. ACM, 28, 3 (1981), 577–593.
- Mikkel Thorup. 1998. All structured programs have small tree width and good register allocation. Information and Computation, 142, 2 (1998), 159–181.
- Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie J. Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot - a Java bytecode optimization framework. In CASCON. 13.
- Dacong Yan, Guoqing Xu, and Atanas Rountev. 2011. Demand-driven context-sensitive alias analysis for Java. In ISSTA. ACM, 155–165.
- Xin Zheng and Radu Rugina. 2008. Demand-driven alias analysis for C. In POPL. ACM, 197–208.
- Shaowei Zhu and Zachary Kincaid. 2021. Termination analysis without the tears. In PLDI. 1296–1311.
Index Terms
- Exploiting the Sparseness of Control-Flow and Call Graphs for Efficient and On-Demand Algebraic Program Analysis