ABSTRACT
The Instruction Register File (IRF) is an architectural extension that provides improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application's instructions. This decreases code size, reduces energy consumption and improves execution time, primarily because of the smaller footprint in the instruction cache. Because packed instructions reference instructions already held in the IRF, their execution can overlap with instruction fetch. This provides a means of tolerating increased fetch latencies, such as those experienced by encrypted ICs or introduced by low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores the use of the increased fetch bandwidth provided by packed instructions. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency and also provides additional flexibility for evaluating design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique. It operates both as a buffer that tolerates fetch bottlenecks and as a source of additional fetch bandwidth for an aggressive pipeline backend.
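The overlap argument above can be made concrete with a deliberately simplified timing model. The sketch below is not the paper's simulation infrastructure. The one-entry fetch buffer, the 1-cycle hit and 5-cycle miss latencies, the four-instruction packed word, and the names `total_cycles` and `FetchWord` are all hypothetical. The sketch shows two effects. A miss on the fetch that follows a packed instruction is largely hidden while the packed instruction's IRF entries execute. A wider backend also benefits from the extra instructions that each packed fetch supplies.

```python
from math import ceil
from typing import List, Tuple

# Hypothetical toy model, not the paper's simulator.
# Each fetched word is (fetch_latency_cycles, instructions_delivered):
# an ordinary instruction delivers 1, a packed instruction delivers
# every IRF entry it references.
FetchWord = Tuple[int, int]


def total_cycles(words: List[FetchWord], issue_width: int = 1) -> int:
    """Cycles to fetch and execute `words` under simplifying assumptions:
    fetch of word i+1 starts when word i begins executing (a one-entry
    fetch buffer), the backend completes up to `issue_width` instructions
    of a word per cycle, and an L0 miss only lengthens that word's fetch."""
    if issue_width < 1:
        raise ValueError("issue_width must be >= 1")
    if not words:
        return 0
    exec_cycles = [ceil(n / issue_width) for _, n in words]
    start = words[0][0]  # nothing executes until the first word arrives
    for i in range(len(words) - 1):
        # Fetching word i+1 (including any miss stall) overlaps with
        # executing word i, so only the longer of the two is exposed.
        start += max(words[i + 1][0], exec_cycles[i])
    return start + exec_cycles[-1]


HIT, MISS = 1, 5  # assumed L0 hit and miss latencies, in cycles

# Six instructions, where the sixth misses in the L0 cache.
unpacked_hit = [(HIT, 1)] * 6
unpacked_miss = [(HIT, 1)] * 5 + [(MISS, 1)]
# The same six instructions, with instructions 2-5 packed into one fetch word.
packed_hit = [(HIT, 1), (HIT, 4), (HIT, 1)]
packed_miss = [(HIT, 1), (HIT, 4), (MISS, 1)]

if __name__ == "__main__":
    print(total_cycles(unpacked_hit), total_cycles(unpacked_miss))  # 7 11
    print(total_cycles(packed_hit), total_cycles(packed_miss))  # 7 8
    print(total_cycles(unpacked_hit, 2), total_cycles(packed_hit, 2))  # 7 5
```

In this toy model, the miss costs four extra cycles without packing (11 vs. 7) but only one with packing (8 vs. 7). A two-wide backend finishes the packed all-hit sequence in 5 cycles. The unpacked sequence stays at 7 cycles because fetch still supplies only one instruction per cycle.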
Addressing instruction fetch bottlenecks by using an instruction register file
Proceedings of the 2007 LCTES conference