ABSTRACT
There is a huge and growing gap between the speed of accesses to data stored in main memory vs cache. Thus, cache misses account for a significant portion of runtime overhead in virtually every program and minimizing them has been an active research topic for decades. The primary and most classical formal model for this problem is that of Cache-conscious Data Placement (CDP): given a commutative cache with constant capacity k and a sequence Σ of accesses to data elements, the goal is to map each data element to a cache line such that the total number of cache misses over Σ is minimized. Note that we are considering an offline single-threaded setting in which Σ is known a priori. CDP has been widely studied since the 1990s. In POPL 2002, Petrank and Rawitz proved a notoriously strong hardness result: They showed that for every k ≥ 3, CDP is not only NP-hard but also hard-to-approximate within any non-trivial factor unless P=NP. As such, all subsequent works gave up on theoretical improvements and instead focused on heuristic algorithms with no theoretical guarantees.
In this work, we present the first-ever positive theoretical result for CDP. The fundamental idea behind our approach is that real-world instances of the problem have specific structural properties that can be exploited to obtain efficient algorithms with strong approximation guarantees. Specifically, the access graphs corresponding to many real-world access sequences are sparse and tree-like. This was already well-known in the community but has only been used to design heuristics without guarantees. In contrast, we provide fixed-parameter tractable algorithms that provably approximate the optimal number of cache misses within any factor 1 + ε, assuming that the access graph of a specific degree d_ε is sparse, i.e. sparser real-world instances lead to tighter approximations. Our theoretical results are accompanied by an experimental evaluation in which our approach outperforms past heuristics over small caches with a handful of lines. However, the approach cannot currently handle large real-world caches and making it scalable in practice is a direction for future work.
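To make the CDP objective stated above concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes a simplified cache model in which each of the k lines holds one data element at a time, so an access is a miss exactly when the element's assigned line currently holds something else. The function names `count_misses` and `brute_force_cdp` are hypothetical and introduced only for illustration. The exhaustive search is exponential in the number of data elements and is meant only to show what "optimal placement" means on toy inputs.

```python
from itertools import product


def count_misses(accesses, placement, k):
    """Count cache misses of a fixed placement over an access sequence.

    accesses  : the sequence of data elements (Sigma in the abstract)
    placement : dict mapping each data element to a line index in range(k)
    Assumed model: each line holds one element at a time.
    """
    lines = [None] * k  # lines[i] is the element currently stored in line i
    misses = 0
    for x in accesses:
        i = placement[x]
        if lines[i] != x:  # cold miss or conflict miss
            misses += 1
            lines[i] = x  # the accessed element replaces the old content
    return misses


def brute_force_cdp(accesses, k):
    """Try every placement and return (minimum misses, one best placement)."""
    elements = sorted(set(accesses))
    best = None
    for assignment in product(range(k), repeat=len(elements)):
        placement = dict(zip(elements, assignment))
        m = count_misses(accesses, placement, k)
        if best is None or m < best[0]:
            best = (m, placement)
    return best


if __name__ == "__main__":
    sigma = list("abab")
    print(count_misses(sigma, {"a": 0, "b": 0}, 2))  # both on one line: 4 misses
    print(count_misses(sigma, {"a": 0, "b": 1}, 2))  # separate lines: 2 misses
    print(brute_force_cdp(sigma, 2)[0])              # optimum: 2
```

On the sequence a, b, a, b with k = 2, placing both elements on the same line makes every access a miss, while separating them leaves only the two cold misses. This is the kind of difference a good placement is meant to capture.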
- Ali Ahmadi, Majid Daliri, Amir Kafshdar Goharshady, and Andreas Pavlogiannis. 2022. Efficient Approximations for Cache-conscious Data Placement. https://hal.archives-ouvertes.fr/hal-03616652/
- Mohsen Alambardar, Amir Goharshady, Mohammad Reza Hooshmandasl, and Ali Shakiba. 2021. Optimal Mining: Maximizing Bitcoin Miners’ Revenues. https://hal.archives-ouvertes.fr/hal-03232783
- Ali Asadi, Krishnendu Chatterjee, Amir Goharshady, Kiarash Mohammadi, and Andreas Pavlogiannis. 2020. Faster algorithms for quantitative analysis of MCs and MDPs with small treewidth. In ATVA. 253–270.
- Mirza Beg and Peter Van Beek. 2010. A graph theoretic approach to cache-conscious placement of data for direct mapped caches. In ISMM. 113–120.
- Hans Bodlaender. 1996. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on computing, 25, 6 (1996), 1305–1317.
- Hans Bodlaender. 1997. Treewidth: Algorithmic techniques and results. In MFCS. 19–36.
- Hans Bodlaender. 1998. A Partial k-Arboretum of Graphs with Bounded Treewidth. Theor. Comput. Sci., 209, 1-2 (1998), 1–45.
- Hans L Bodlaender. 1988. Dynamic programming on graphs with bounded treewidth. In ICALP. 105–118.
- Hans L Bodlaender. 1994. A tourist guide through treewidth. Acta cybernetica, 11, 1-2 (1994), 1.
- Hans L Bodlaender. 2005. Discovering treewidth. In SOFSEM. 1–16.
- Hendrik Borghorst and Olaf Spinczyk. 2019. CyPhOS - A Component-Based Cache-Aware Multi-core Operating System. In ARCS. 171–182.
- Allan Borodin, Sandy Irani, Prabhakar Raghavan, and Baruch Schieber. 1995. Competitive Paging with Locality of Reference. J. Comput. Syst. Sci., 50, 2 (1995), 244–258.
- Bernd Burgstaller, Johann Blieberger, and Bernhard Scholz. 2004. On the tree width of Ada programs. In ADA. 78–90.
- Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. 1998. Cache-Conscious Data Placement. In ASPLOS. 139–149.
- Krishnendu Chatterjee, Amir Goharshady, and Ehsan Goharshady. 2019. The treewidth of smart contracts. In SAC. 400–408.
- Krishnendu Chatterjee, Amir Goharshady, Prateesh Goyal, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2019. Faster algorithms for dynamic algebraic queries in basic RSMs with constant treewidth. TOPLAS, 41, 4 (2019), 1–46.
- Krishnendu Chatterjee, Amir Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. In POPL.
- Krishnendu Chatterjee, Amir Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2020. Optimal and perfectly parallel algorithms for on-demand data-flow analysis. In ESOP. 112–140.
- Krishnendu Chatterjee, Amir Goharshady, Nastaran Okati, and Andreas Pavlogiannis. 2019. Efficient parameterized algorithms for data packing. In POPL. 53:1–53:28.
- Krishnendu Chatterjee, Amir Goharshady, and Andreas Pavlogiannis. 2017. JTDec: A tool for tree decompositions in soot. In ATVA. 59–66.
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Amir Goharshady, and Andreas Pavlogiannis. 2018. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. TOPLAS, 40, 3 (2018), 1–43.
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2015. Faster algorithms for quantitative verification in constant treewidth graphs. In CAV. 140–157.
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016. Optimal reachability and a space-time tradeoff for distance queries in constant-treewidth graphs. In ESA. 57.
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2021. Quantitative Verification on Product Graphs of Small Treewidth. In FSTTCS.
- Krishnendu Chatterjee and Jakub Łącki. 2013. Faster algorithms for Markov decision processes with low treewidth. In CAV. 543–558.
- Marek Cygan, Fedor Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. 2015. Parameterized algorithms. Springer.
- Chen Ding and Ken Kennedy. 1999. Improving Cache Performance in Dynamic Applications through Data and Computation Reorganization at Run Time. In PLDI. 229–241.
- Wei Ding and Mahmut Kandemir. 2014. CApRI: CAche-conscious data reordering for irregular codes. In SIGMETRICS. 477–489.
- Rodney Downey and Michael Fellows. 2012. Parameterized complexity. Springer.
- John Fearnley and Sven Schewe. 2012. Time and parallelizability results for parity games with bounded treewidth. In ICALP. 189–200.
- Andrea Ferrara, Guoqiang Pan, and Moshe Y Vardi. 2005. Treewidth in verification: Local vs. global. In LPAR. 489–503.
- Amir Goharshady. 2020. Parameterized and algebro-geometric advances in static program analysis. Ph.D. Dissertation. Institute of Science and Technology Austria.
- Amir Goharshady and Fatemeh Mohammadi. 2020. An efficient algorithm for computing network reliability in small treewidth. Reliability Engineering & System Safety, 193 (2020), 106665.
- Jens Gustedt, Ole A Mæhle, and Jan Arne Telle. 2002. The treewidth of Java programs. In ALENEX. 86–97.
- Rahman Lavaee. 2016. The hardness of data packing. In POPL. 232–242.
- Abraham Mendlson, Shlomit Pinter, and Ruth Shtokhamer. 1994. Compile Time Instruction Cache Optimizations. In CC. 404–418.
- Jan Obdržálek. 2003. Fast mu-calculus model checking when tree-width is bounded. In CAV. 80–92.
- Erez Petrank and Dror Rawitz. 2002. The hardness of cache conscious data placement. In POPL. 101–112.
- Leon R Planken, Mathijs M de Weerdt, and Roman PJ van der Krogt. 2012. Computing all-pairs shortest paths by leveraging low treewidth. JAIR, 43 (2012), 353–388.
- Neil Robertson and Paul Seymour. 1984. Graph minors. III. Planar tree-width. J. Comb. Theory, Ser. B, 36, 1 (1984), 49–64.
- Neil Robertson and Paul D. Seymour. 1986. Graph minors. II. Algorithmic aspects of tree-width. Journal of algorithms, 7, 3 (1986), 309–322.
- Theodore Romer, Dennis Lee, Brian Bershad, and Bradley Chen. 1994. Dynamic Page Mapping Policies for Cache Conflict Resolution on Standard Hardware. In OSDI. 255–266.
- Shai Rubin, David Bernstein, and Michael Rodeh. 1999. Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures. In CC. 1575, 259–273.
- Sriram Sankaranarayanan. 2020. Reachability Analysis Using Message Passing over Tree Decompositions. In CAV. 604–628.
- Timothy Sherwood, Brad Calder, and Joel Emer. 1999. Reducing cache misses using hardware and software page placement. In ICS. 155–164.
- Khalid Thabit. 1982. Cache management by the compiler. Rice University.
- Mikkel Thorup. 1998. All Structured Programs have Small Tree-Width and Good Register Allocation. Inf. Comput., 142, 2 (1998), 159–181.
- Thomas van Dijk, Jan-Pieter van den Heuvel, and Wouter Slob. 2006. Computing treewidth with LibTW.
- Raj Vaswani and John Zahorjan. 1991. The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors. In SOSP. ACM, 26–40.
- Chengliang Zhang, Chen Ding, Mitsunori Ogihara, Yutao Zhong, and Youfeng Wu. 2006. A hierarchical model of data locality. In POPL. 16–29.
- Yutao Zhong, Maksim Orlovich, Xipeng Shen, and Chen Ding. 2004. Array regrouping and structure splitting using whole-program reference affinity. In PLDI.