Revisiting Cache Block Superloading

Watkins, Matthew A.; McKee, Sally A.; Schaelicke, Lambert

doi:10.1007/978-3-540-92990-1_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5409)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

Technological advances and increasingly complex and dynamic application behavior argue for revisiting mechanisms that adapt logical cache block size to application characteristics. This approach to bridging the processor/memory performance gap has been studied before, but mostly via trace-driven simulation, looking only at L1 caches. Given changes in hardware/software technology, we revisit the general approach: we propose a transparent, phase-adaptive, low-complexity mechanism for L2 superloading and evaluate it on a full-system simulator for 23 SPEC CPU2000 codes. Targeting L2 benefits both instruction and data fetches. We investigate cache blocks of 32-512 B, confirming that no fixed size performs well for all applications: differences between the best and worst fixed block sizes range from 5% to 49%. Our scheme obtains performance similar to the per-application best static block size. In a few cases, we minimally decrease performance compared to the best static size, but the best size varies per application and rarely matches real hardware. We generally improve performance over best static choices by up to 10%. Phase adaptability particularly benefits multiprogrammed workloads with conflicting locality characteristics, yielding performance gains of 5-20%. Our approach also outperforms next-line and delta prefetching.
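
The claim that no fixed block size suits every application can be illustrated with a toy model. The sketch below is not the authors' superloading mechanism or their full-system simulator; it assumes a hypothetical 4 KiB direct-mapped cache and two synthetic address traces (the names `count_misses`, `best_block_size`, `streaming`, and `sparse` are invented for illustration). It only shows how the miss count of one cache can favor opposite ends of the 32-512 B range depending on the access pattern.

```python
# Toy model only: a direct-mapped cache with an assumed 4 KiB capacity.
# None of the names or parameters below come from the paper.
BLOCK_SIZES = [32, 64, 128, 256, 512]  # bytes, the range named in the abstract


def count_misses(trace, block_size, capacity=4096):
    """Count misses for a list of byte addresses at a given block size."""
    num_sets = capacity // block_size
    tags = [None] * num_sets          # one resident block number per set
    misses = 0
    for addr in trace:
        block = addr // block_size    # block number containing this address
        index = block % num_sets      # direct-mapped set index
        if tags[index] != block:      # cold or conflict miss: fetch the block
            tags[index] = block
            misses += 1
    return misses


def best_block_size(trace):
    """Return the block size with the fewest misses (smallest size wins ties)."""
    return min(BLOCK_SIZES, key=lambda size: count_misses(trace, size))


# Synthetic trace 1: one sequential pass over 64 KiB in 4-byte words.
streaming = list(range(0, 65536, 4))

# Synthetic trace 2: 64 isolated words, 544 bytes apart, revisited 10 times.
sparse = [544 * i for i in range(64)] * 10

if __name__ == "__main__":
    for name, trace in [("streaming", streaming), ("sparse", sparse)]:
        counts = {size: count_misses(trace, size) for size in BLOCK_SIZES}
        print(name, counts, "best:", best_block_size(trace))
```

Under these assumptions the streaming trace has the fewest misses with 512 B blocks, because each fetch brings in more of the data it is about to use. The sparse trace has the fewest with 32 B blocks, because larger blocks leave fewer sets and the isolated words evict one another. A phase-adaptive scheme of the kind the abstract describes aims to track such differences at run time rather than fixing one size in advance.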

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Cornell University, USA
Matthew A. Watkins
Department of Computer Science and Engineering, Chalmers University of Technology, Sweden
Sally A. McKee
Fort Collins Design Center, Intel Corporation, USA
Lambert Schaelicke

Editor information

Editors and Affiliations

IRISA, Campus de Beaulieu, 35042, Rennes Cedex, France
André Seznec
Intel Corporation, Massachusetts Microprocessor Design Center, 77 Reed Road, Hudson, MA 01749, USA
Joel Emer
School of Informatics, Institute for Computing Systems Architecture, King’s Buildings, EH9 3JZ, Edinburgh, United Kingdom
Michael O’Boyle
Department of Electrical Engineering, Princeton University, 34 Olden Street, Princeton, NJ 08544-5263, USA
Margaret Martonosi
Department of Computer Science, University of Augsburg, 86135, Augsburg, Germany
Theo Ungerer

About this paper

Cite this paper

Watkins, M.A., McKee, S.A., Schaelicke, L. (2009). Revisiting Cache Block Superloading. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_25

DOI: https://doi.org/10.1007/978-3-540-92990-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92989-5
Online ISBN: 978-3-540-92990-1
eBook Packages: Computer Science, Computer Science (R0)
