Abstract
Technological advances and increasingly complex and dynamic application behavior argue for revisiting mechanisms that adapt logical cache block size to application characteristics. This approach to bridging the processor/memory performance gap has been studied before, but mostly via trace-driven simulation and only for L1 caches. Given changes in hardware/software technology, we revisit the general approach: we propose a transparent, phase-adaptive, low-complexity mechanism for L2 superloading and evaluate it on a full-system simulator for 23 SPEC CPU2000 codes. Targeting L2 benefits both instruction and data fetches. We investigate cache blocks of 32-512B, confirming that no fixed size performs well for all applications: the difference between the best and worst fixed block sizes ranges from 5% to 49%. Our scheme obtains performance similar to the per-application best static block size. In a few cases, it performs minimally worse than the best static size, but the best size varies per application and rarely matches the block size of real hardware. We generally improve performance over the best static choices by up to 10%. Phase adaptability particularly benefits multiprogrammed workloads with conflicting locality characteristics, yielding performance gains of 5-20%. Our approach also outperforms next-line and delta prefetching.
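The claim that no fixed block size suits every application can be illustrated with a toy model. The sketch below is not the paper's mechanism or simulator: it assumes a hypothetical 4 KB direct-mapped cache, counts misses only (ignoring the extra transfer cost of larger blocks), and uses two synthetic address traces invented for illustration. The names `count_misses` and `best_static_block_size` are likewise hypothetical.

```python
# Toy illustration only: a direct-mapped cache model, not the paper's simulator.
BLOCK_SIZES = [32, 64, 128, 256, 512]  # bytes, the range studied in the abstract


def count_misses(addresses, block_size, cache_bytes=4096):
    """Count misses for a byte-address trace in a direct-mapped cache.

    cache_bytes is an assumed capacity chosen to keep the example small.
    """
    num_sets = cache_bytes // block_size
    tags = [None] * num_sets           # block number currently held by each set
    misses = 0
    for addr in addresses:
        block = addr // block_size     # which memory block the address falls in
        index = block % num_sets       # which cache set that block maps to
        if tags[index] != block:       # miss: fetch the block, evicting the old one
            tags[index] = block
            misses += 1
    return misses


def best_static_block_size(addresses):
    """Return the fixed block size with the fewest misses for this trace."""
    return min(BLOCK_SIZES, key=lambda b: count_misses(addresses, b))


# Synthetic trace 1: stream through 16 KB one 4-byte word at a time (high spatial locality).
sequential = list(range(0, 16384, 4))

# Synthetic trace 2: 16 words spaced 4128 bytes apart, revisited 10 times
# (little spatial locality; large blocks make the words collide in one set).
sparse = [k * 4128 for k in range(16)] * 10

if __name__ == "__main__":
    for name, trace in (("sequential", sequential), ("sparse", sparse)):
        per_size = {b: count_misses(trace, b) for b in BLOCK_SIZES}
        print(name, per_size, "best:", best_static_block_size(trace))
```

In this model the streaming trace favors the largest block and the scattered trace favors the smallest, which is the kind of per-workload disagreement that motivates adapting the block size at run time.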
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Watkins, M.A., McKee, S.A., Schaelicke, L. (2009). Revisiting Cache Block Superloading. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_25
DOI: https://doi.org/10.1007/978-3-540-92990-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92989-5
Online ISBN: 978-3-540-92990-1
eBook Packages: Computer Science (R0)