Addressing instruction fetch bottlenecks by using an instruction register file

Authors:
Stephen Roderick Hines, Florida State University, Tallahassee, FL
Gary Tyson, Florida State University, Tallahassee, FL
David Whalley, Florida State University, Tallahassee, FL

LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, June 2007, Pages 165–174
https://doi.org/10.1145/1254766.1254800

Published: 13 June 2007

ABSTRACT

The Instruction Register File (IRF) is an architectural extension for providing improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application's instructions, resulting in decreased code size, reduced energy consumption, and improved execution time, primarily due to a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, thus providing a means for tolerating increased fetch latencies, such as those caused by encrypted ICs or by low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores the use of the increased fetch bandwidth provided by packed instructions. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also be used to supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency and provides additional flexibility for evaluating design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique, operating as a buffer that tolerates fetch bottlenecks as well as providing additional fetch bandwidth for an aggressive pipeline backend.
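The overlap between packed-instruction execution and L0 miss stalls can be illustrated with a toy cycle count. The sketch below is not the simulator or the pipeline model used in the paper. It assumes a single-issue, in-order pipeline in which every instruction executes in one cycle, every L0 miss adds a fixed `miss_penalty`, and a packed word with `n_instrs` instructions leaves `n_instrs - 1` spare cycles in which the next fetch can stall unnoticed. The names `Word`, `cycles_no_overlap`, and `cycles_with_overlap` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Word:
    """One fetched instruction word (hypothetical model, not from the paper)."""
    n_instrs: int   # 1 for an ordinary instruction, >1 for a packed IRF word
    l0_miss: bool   # True if fetching this word misses in the L0 cache


def cycles_no_overlap(words: Sequence[Word], miss_penalty: int) -> int:
    """Every miss stall is fully exposed: stalls and execution simply add up."""
    total = 0
    for w in words:
        stall = miss_penalty if w.l0_miss else 0
        total += stall + w.n_instrs
    return total


def cycles_with_overlap(words: Sequence[Word], miss_penalty: int) -> int:
    """Stalls are hidden behind the remaining instructions of the previous word."""
    total = 0
    slack = 0  # spare execution cycles left over from the previous word
    for w in words:
        stall = miss_penalty if w.l0_miss else 0
        exposed = max(0, stall - slack)   # only the uncovered part costs time
        total += exposed + w.n_instrs
        slack = w.n_instrs - 1            # a packed word keeps the pipeline busy
    return total


if __name__ == "__main__":
    # Assumed trace: two L0 misses, each preceded by a packed word.
    trace = [
        Word(1, False),
        Word(4, False),
        Word(1, True),
        Word(3, False),
        Word(1, True),
    ]
    print(cycles_no_overlap(trace, miss_penalty=2))
    print(cycles_with_overlap(trace, miss_penalty=2))
```

In this assumed trace both misses follow a packed word, so the two-cycle stalls are fully covered. For a trace made only of ordinary one-instruction words, the two functions return the same count, which matches the intuition that the benefit comes from packing.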

Published in
LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
June 2007
258 pages
ISBN: 9781595936325
DOI: 10.1145/1254766
General Chair: Santosh Pande, Georgia Institute of Technology, USA
Program Chair: Zhiyuan Li, Purdue University, USA

ACM SIGPLAN Notices Volume 42, Issue 7
Proceedings of the 2007 LCTES conference
July 2007
241 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1273444

Copyright © 2007 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
L0/filter cache
instruction packing
instruction register file
Qualifiers
- Article

Acceptance Rates
Overall Acceptance Rate: 116 of 438 submissions, 26%