Revisiting Cache Block Superloading

  • Conference paper
High Performance Embedded Architectures and Compilers (HiPEAC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5409)

Abstract

Technological advances and increasingly complex and dynamic application behavior argue for revisiting mechanisms that adapt logical cache block size to application characteristics. This approach to bridging the processor/memory performance gap has been studied before, but mostly via trace-driven simulation and only for L1 caches. Given changes in hardware and software technology, we revisit the general approach: we propose a transparent, phase-adaptive, low-complexity mechanism for L2 superloading and evaluate it on a full-system simulator for 23 SPEC CPU2000 codes. Targeting L2 benefits both instruction and data fetches. We investigate cache blocks of 32 B to 512 B, confirming that no fixed size performs well for all applications: performance differences between the best and worst fixed block sizes range from 5% to 49%. Our scheme obtains performance similar to that of the best static block size for each application. In a few cases, it performs slightly worse than the best static size, but the best size varies per application and rarely matches the block size of real hardware. In general, we improve performance over the best static choice by up to 10%. Phase adaptability particularly benefits multiprogrammed workloads with conflicting locality characteristics, yielding performance gains of 5% to 20%. Our approach also outperforms next-line and delta prefetching.
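
The claim that no fixed block size suits every application can be illustrated with a toy model. The sketch below is not the paper's mechanism or its simulator: the cache organization (fully associative, LRU, 64 physical lines of 32 B), the name `count_misses`, and both synthetic traces are assumptions made purely for illustration. On a miss, the model loads every physical line in the aligned logical block that contains the requested address.

```python
from collections import OrderedDict

LINE_BYTES = 32  # assumed physical line size, for illustration only


def count_misses(addresses, logical_block_bytes, num_lines=64):
    """Count misses in a toy fully associative LRU cache of physical lines.

    On a miss, all physical lines of the aligned logical block containing
    the address are loaded. Hypothetical model, not the paper's design.
    """
    lines_per_block = logical_block_bytes // LINE_BYTES
    cache = OrderedDict()  # physical line number -> None, oldest first
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)  # refresh recency on a hit
            continue
        misses += 1
        first = (line // lines_per_block) * lines_per_block
        for neighbor in range(first, first + lines_per_block):
            cache[neighbor] = None
            cache.move_to_end(neighbor)
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# Dense trace: one pass over 8 KiB in 4-byte steps (high spatial locality).
sequential = list(range(0, 8192, 4))

# Sparse trace: 32 addresses spaced 512 B apart, swept four times.
sparse = [i * 512 for i in range(32)] * 4

for size in (32, 128, 512):
    print(size, count_misses(sequential, size), count_misses(sparse, size))
```

In this model the dense trace favors the largest block, while the sparse trace favors the smallest one, because large blocks fill the cache with neighbors that are never touched. A phase-adaptive scheme would aim to pick the better size for each phase rather than committing to one.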

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Watkins, M.A., McKee, S.A., Schaelicke, L. (2009). Revisiting Cache Block Superloading. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_25

  • DOI: https://doi.org/10.1007/978-3-540-92990-1_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-92989-5

  • Online ISBN: 978-3-540-92990-1

  • eBook Packages: Computer Science (R0)