Abstract
Modern general-purpose processors employ multi-port register files and multiple functional units to support instruction-level parallelism. A fixed bandwidth of one word per cycle between the cache and the register file may limit the processor's spatial and temporal utilization. This paper presents an experimental study of a conventional superscalar processor architecture to determine the benefits that can be expected from enabling variable data bandwidth between the L1 data cache and the register file. Our results demonstrate that widening the bus to 64, 128, and 256 bits reduces data traffic between the 32KB register file and the 32KB cache by up to 29%, 45%, and 53%, respectively, while lowering program execution time by 8%, 13%, and 17% on average, compared with conventional single-word cache access. We also propose an adaptive-bandwidth cache capable of adjusting the cache bandwidth to workload variation.
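The intuition behind the traffic reduction can be illustrated with a toy model. The sketch below is not the authors' simulator or their traffic metric. It assumes 32-bit words, that each transfer moves one aligned group of words as wide as the bus, and that consecutive accesses to the same group are merged into a single transfer. The function name `count_transfers` and the access trace are hypothetical, chosen only for illustration.

```python
WORD_BITS = 32  # assumed word size; the abstract only says "1 word per cycle"


def count_transfers(word_addrs, bus_words):
    """Count cache-to-register transfers for a trace of word addresses.

    Each transfer moves one aligned group of `bus_words` words.
    Consecutive accesses that fall in the same group share one transfer.
    """
    transfers = 0
    last_group = None
    for addr in word_addrs:
        group = addr // bus_words  # index of the aligned group holding addr
        if group != last_group:
            transfers += 1
            last_group = group
    return transfers


# Hypothetical trace: 16 sequential words followed by 4 scattered accesses.
trace = list(range(16)) + [100, 37, 5, 64]

if __name__ == "__main__":
    baseline = count_transfers(trace, 1)
    for bus_bits in (32, 64, 128, 256):
        n = count_transfers(trace, bus_bits // WORD_BITS)
        saved = 100.0 * (baseline - n) / baseline
        print(f"{bus_bits:3d}-bit bus: {n} transfers ({saved:.0f}% fewer)")
```

In this model the sequential part of the trace benefits from a wider bus, while the scattered accesses still cost one transfer each. That is one reason a fixed wide bus may not suit every workload, and it is consistent with the motivation for a bandwidth that adapts to workload variation.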
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hamayasu, K., Moshnyaga, V.G. (2004). Impact of Register-Cache Bandwidth Variation on Processor Performance. In: Yew, PC., Xue, J. (eds) Advances in Computer Systems Architecture. ACSAC 2004. Lecture Notes in Computer Science, vol 3189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30102-8_18
DOI: https://doi.org/10.1007/978-3-540-30102-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23003-8
Online ISBN: 978-3-540-30102-8
eBook Packages: Springer Book Archive