DOI: 10.1145/3519939.3523436
research-article

Efficient approximations for cache-conscious data placement

Published: 09 June 2022

ABSTRACT

There is a huge and growing gap between the speed of accesses to data stored in main memory vs cache. Thus, cache misses account for a significant portion of runtime overhead in virtually every program and minimizing them has been an active research topic for decades. The primary and most classical formal model for this problem is that of Cache-conscious Data Placement (CDP): given a commutative cache with constant capacity k and a sequence Σ of accesses to data elements, the goal is to map each data element to a cache line such that the total number of cache misses over Σ is minimized. Note that we are considering an offline single-threaded setting in which Σ is known a priori. CDP has been widely studied since the 1990s. In POPL 2002, Petrank and Rawitz proved a notoriously strong hardness result: They showed that for every k ≥ 3, CDP is not only NP-hard but also hard-to-approximate within any non-trivial factor unless P=NP. As such, all subsequent works gave up on theoretical improvements and instead focused on heuristic algorithms with no theoretical guarantees.
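The objective can be illustrated with a minimal sketch. It assumes a simplified reading of the model in which each of the k cache lines holds one data element at a time, so an access misses unless the element's assigned line currently holds that element. The names `count_misses` and `brute_force_cdp` are hypothetical and not taken from the paper, and the exhaustive search is only an illustration of the objective on tiny inputs, not the authors' algorithm.

```python
from itertools import product


def count_misses(sequence, placement):
    """Count cache misses for a fixed element-to-line placement.

    Assumed model (not from the paper): every cache line stores one
    element at a time, and an access to x misses unless x's line
    currently stores x.
    """
    lines = {}  # cache line index -> element currently stored there
    misses = 0
    for x in sequence:
        line = placement[x]
        if line not in lines or lines[line] != x:
            misses += 1
            lines[line] = x  # load x, evicting whatever was there
    return misses


def brute_force_cdp(sequence, k):
    """Try all k^n placements and return (fewest misses, placement).

    Exponential in the number of distinct elements, so this is only
    usable on toy instances.
    """
    elements = sorted(set(sequence))
    best = None
    for assignment in product(range(k), repeat=len(elements)):
        placement = dict(zip(elements, assignment))
        misses = count_misses(sequence, placement)
        if best is None or misses < best[0]:
            best = (misses, placement)
    return best


sigma = list("abcabcab")
best_misses, best_placement = brute_force_cdp(sigma, k=2)
```

With three elements competing for two lines, some pair must share a line and evict each other repeatedly, which is the kind of conflict a good placement tries to minimize.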

In this work, we present the first-ever positive theoretical result for CDP. The fundamental idea behind our approach is that real-world instances of the problem have specific structural properties that can be exploited to obtain efficient algorithms with strong approximation guarantees. Specifically, the access graphs corresponding to many real-world access sequences are sparse and tree-like. This was already well-known in the community but has only been used to design heuristics without guarantees. In contrast, we provide fixed-parameter tractable algorithms that provably approximate the optimal number of cache misses within any factor 1 + ε, assuming that the access graph of a specific degree d_ε is sparse, i.e. sparser real-world instances lead to tighter approximations. Our theoretical results are accompanied by an experimental evaluation in which our approach outperforms past heuristics over small caches with a handful of lines. However, the approach cannot currently handle large real-world caches and making it scalable in practice is a direction for future work.
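The abstract does not define the access graph or its degree, so the following is only a rough sketch under an assumed definition: two distinct data elements are joined by an edge when they are accessed within the same window of consecutive accesses. The `window` parameter, the function names, and the density measure are all hypothetical stand-ins for illustration. The paper's actual construction may differ.

```python
from itertools import combinations


def access_graph(sequence, window=2):
    """Build an undirected graph over the accessed elements.

    Assumed definition (not from the paper): u and v are adjacent if
    they both occur in some run of `window` consecutive accesses.
    Returns a set of frozenset({u, v}) edges.
    """
    edges = set()
    for i in range(len(sequence) - window + 1):
        distinct = set(sequence[i:i + window])
        for u, v in combinations(distinct, 2):
            edges.add(frozenset((u, v)))
    return edges


def edge_density(sequence, window=2):
    """Fraction of possible element pairs that are adjacent."""
    n = len(set(sequence))
    max_edges = n * (n - 1) // 2
    if max_edges == 0:
        return 0.0
    return len(access_graph(sequence, window)) / max_edges


sigma = list("abcabd")
edges = access_graph(sigma, window=2)
density = edge_density(sigma, window=2)
```

A low density under a measure like this is one crude proxy for the sparsity the abstract refers to. Tree-likeness is usually quantified by treewidth, which this sketch does not compute.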

References

  1. Ali Ahmadi, Majid Daliri, Amir Kafshdar Goharshady, and Andreas Pavlogiannis. 2022. Efficient Approximations for Cache-conscious Data Placement. https://hal.archives-ouvertes.fr/hal-03616652/
  2. Mohsen Alambardar, Amir Goharshady, Mohammad Reza Hooshmandasl, and Ali Shakiba. 2021. Optimal Mining: Maximizing Bitcoin Miners’ Revenues. https://hal.archives-ouvertes.fr/hal-03232783
  3. Ali Asadi, Krishnendu Chatterjee, Amir Goharshady, Kiarash Mohammadi, and Andreas Pavlogiannis. 2020. Faster algorithms for quantitative analysis of MCs and MDPs with small treewidth. In ATVA. 253–270.
  4. Mirza Beg and Peter Van Beek. 2010. A graph theoretic approach to cache-conscious placement of data for direct mapped caches. In ISMM. 113–120.
  5. Hans Bodlaender. 1996. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on computing, 25, 6 (1996), 1305–1317.
  6. Hans Bodlaender. 1997. Treewidth: Algorithmic techniques and results. In MFCS. 19–36.
  7. Hans Bodlaender. 1998. A Partial k-Arboretum of Graphs with Bounded Treewidth. Theor. Comput. Sci., 209, 1-2 (1998), 1–45.
  8. Hans L Bodlaender. 1988. Dynamic programming on graphs with bounded treewidth. In ICALP. 105–118.
  9. Hans L Bodlaender. 1994. A tourist guide through treewidth. Acta cybernetica, 11, 1-2 (1994), 1.
  10. Hans L Bodlaender. 2005. Discovering treewidth. In SOFSEM. 1–16.
  11. Hendrik Borghorst and Olaf Spinczyk. 2019. CyPhOS - A Component-Based Cache-Aware Multi-core Operating System. In ARCS. 171–182.
  12. Allan Borodin, Sandy Irani, Prabhakar Raghavan, and Baruch Schieber. 1995. Competitive Paging with Locality of Reference. J. Comput. Syst. Sci., 50, 2 (1995), 244–258.
  13. Bernd Burgstaller, Johann Blieberger, and Bernhard Scholz. 2004. On the tree width of Ada programs. In ADA. 78–90.
  14. Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. 1998. Cache-Conscious Data Placement. In ASPLOS. 139–149.
  15. Krishnendu Chatterjee, Amir Goharshady, and Ehsan Goharshady. 2019. The treewidth of smart contracts. In SAC. 400–408.
  16. Krishnendu Chatterjee, Amir Goharshady, Prateesh Goyal, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2019. Faster algorithms for dynamic algebraic queries in basic RSMs with constant treewidth. TOPLAS, 41, 4 (2019), 1–46.
  17. Krishnendu Chatterjee, Amir Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. In POPL.
  18. Krishnendu Chatterjee, Amir Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2020. Optimal and perfectly parallel algorithms for on-demand data-flow analysis. In ESOP. 112–140.
  19. Krishnendu Chatterjee, Amir Goharshady, Nastaran Okati, and Andreas Pavlogiannis. 2019. Efficient parameterized algorithms for data packing. In POPL. 53:1–53:28.
  20. Krishnendu Chatterjee, Amir Goharshady, and Andreas Pavlogiannis. 2017. JTDec: A tool for tree decompositions in soot. In ATVA. 59–66.
  21. Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Amir Goharshady, and Andreas Pavlogiannis. 2018. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. TOPLAS, 40, 3 (2018), 1–43.
  22. Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2015. Faster algorithms for quantitative verification in constant treewidth graphs. In CAV. 140–157.
  23. Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016. Optimal reachability and a space-time tradeoff for distance queries in constant-treewidth graphs. In ESA. 57.
  24. Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2021. Quantitative Verification on Product Graphs of Small Treewidth. In FSTTCS.
  25. Krishnendu Chatterjee and Jakub Łącki. 2013. Faster algorithms for Markov decision processes with low treewidth. In CAV. 543–558.
  26. Marek Cygan, Fedor Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. 2015. Parameterized algorithms. Springer.
  27. Chen Ding and Ken Kennedy. 1999. Improving Cache Performance in Dynamic Applications through Data and Computation Reorganization at Run Time. In PLDI. 229–241.
  28. Wei Ding and Mahmut Kandemir. 2014. CApRI: CAche-conscious data reordering for irregular codes. In SIGMETRICS. 477–489.
  29. Rodney Downey and Michael Fellows. 2012. Parameterized complexity. Springer.
  30. John Fearnley and Sven Schewe. 2012. Time and parallelizability results for parity games with bounded treewidth. In ICALP. 189–200.
  31. Andrea Ferrara, Guoqiang Pan, and Moshe Y Vardi. 2005. Treewidth in verification: Local vs. global. In LPAR. 489–503.
  32. Amir Goharshady. 2020. Parameterized and algebro-geometric advances in static program analysis. Ph.D. Dissertation. Institute of Science and Technology Austria.
  33. Amir Goharshady and Fatemeh Mohammadi. 2020. An efficient algorithm for computing network reliability in small treewidth. Reliability Engineering & System Safety, 193 (2020), 106665.
  34. Jens Gustedt, Ole A Mæhle, and Jan Arne Telle. 2002. The treewidth of Java programs. In ALENEX. 86–97.
  35. Rahman Lavaee. 2016. The hardness of data packing. In POPL. 232–242.
  36. Abraham Mendlson, Shlomit Pinter, and Ruth Shtokhamer. 1994. Compile Time Instruction Cache Optimizations. In CC. 404–418.
  37. Jan Obdržálek. 2003. Fast mu-calculus model checking when tree-width is bounded. In CAV. 80–92.
  38. Erez Petrank and Dror Rawitz. 2002. The hardness of cache conscious data placement. In POPL. 101–112.
  39. Leon R Planken, Mathijs M de Weerdt, and Roman PJ van der Krogt. 2012. Computing all-pairs shortest paths by leveraging low treewidth. JAIR, 43 (2012), 353–388.
  40. Neil Robertson and Paul Seymour. 1984. Graph minors. III. Planar tree-width. J. Comb. Theory, Ser. B, 36, 1 (1984), 49–64.
  41. Neil Robertson and Paul D. Seymour. 1986. Graph minors. II. Algorithmic aspects of tree-width. Journal of algorithms, 7, 3 (1986), 309–322.
  42. Theodore Romer, Dennis Lee, Brian Bershad, and Bradley Chen. 1994. Dynamic Page Mapping Policies for Cache Conflict Resolution on Standard Hardware. In OSDI. 255–266.
  43. Shai Rubin, David Bernstein, and Michael Rodeh. 1999. Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures. In CC. 1575, 259–273.
  44. Sriram Sankaranarayanan. 2020. Reachability Analysis Using Message Passing over Tree Decompositions. In CAV. 604–628.
  45. Timothy Sherwood, Brad Calder, and Joel Emer. 1999. Reducing cache misses using hardware and software page placement. In ICS. 155–164.
  46. Khalid Thabit. 1982. Cache management by the compiler. Rice University.
  47. Mikkel Thorup. 1998. All Structured Programs have Small Tree-Width and Good Register Allocation. Inf. Comput., 142, 2 (1998), 159–181.
  48. Thomas van Dijk, Jan-Pieter van den Heuvel, and Wouter Slob. 2006. Computing treewidth with LibTW.
  49. Raj Vaswani and John Zahorjan. 1991. The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors. In SOSP. ACM, 26–40.
  50. Chengliang Zhang, Chen Ding, Mitsunori Ogihara, Yutao Zhong, and Youfeng Wu. 2006. A hierarchical model of data locality. In POPL. 16–29.
  51. Yutao Zhong, Maksim Orlovich, Xipeng Shen, and Chen Ding. 2004. Array regrouping and structure splitting using whole-program reference affinity. In PLDI.

      Published in

        PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
        June 2022
        1038 pages
        ISBN: 9781450392655
        DOI: 10.1145/3519939
        General Chair: Ranjit Jhala
        Program Chair: Işil Dillig

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 406 of 2,067 submissions, 20%
