DOI: 10.1145/1254766.1254800
Article

Addressing instruction fetch bottlenecks by using an instruction register file

Published: 13 June 2007

ABSTRACT

The Instruction Register File (IRF) is an architectural extension that provides improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application's instructions, which decreases code size, reduces energy consumption, and improves execution time, primarily because of a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, providing a means of tolerating increased fetch latencies, such as those experienced by encrypted ICs or introduced by low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores how to use the increased fetch bandwidth that packed instructions provide. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also be used to supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency and also provides additional flexibility for evaluating design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique: it acts as a buffer that tolerates fetch bottlenecks and provides additional fetch bandwidth for an aggressive pipeline backend.
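To make the packing and fetch-overlap ideas concrete, here is a minimal Python sketch of a toy model. It is not the paper's compiler or simulator. The function names (`build_irf`, `pack`, `cycles`), the choice of IRF contents by static instruction frequency, the greedy packing of consecutive IRF-resident instructions, the per-word capacity, and the fixed L0 miss pattern are all illustrative assumptions. The sketch shows two effects described in the abstract: packing reduces the number of fetch words, and while a packed word executes, the fetch of the next word can proceed, hiding part of an L0 miss stall.

```python
from collections import Counter


def build_irf(program, irf_size):
    """Choose IRF contents: the irf_size most frequent instructions (static counts).

    Assumption: a real compiler may select IRF contents differently
    (for example, from profile data); this is only a stand-in.
    """
    return {insn for insn, _ in Counter(program).most_common(irf_size)}


def pack(program, irf, max_per_word):
    """Greedily group runs of IRF-resident instructions into packed fetch words.

    Returns a list of fetch words, each a list of the instructions it executes.
    Instructions not in the IRF are fetched as ordinary one-instruction words.
    """
    words, run = [], []
    for insn in program:
        if insn in irf:
            run.append(insn)
            if len(run) == max_per_word:  # hypothetical per-word capacity
                words.append(run)
                run = []
        else:
            if run:
                words.append(run)
                run = []
            words.append([insn])
    if run:
        words.append(run)
    return words


def cycles(words, miss_penalty, miss_every):
    """Toy in-order timing model executing one instruction per cycle.

    Every miss_every-th fetch word misses in the L0 cache and stalls for
    miss_penalty cycles. Fetch of the next word overlaps with execution of
    the current one, so a word holding k instructions hides up to k - 1
    cycles of the following word's stall.
    """
    total = 0
    overlap = 0  # execution cycles of the previous word that can hide a stall
    for i, word in enumerate(words):
        if (i + 1) % miss_every == 0:
            total += max(0, miss_penalty - overlap)
        total += len(word)
        overlap = len(word) - 1
    return total


if __name__ == "__main__":
    program = ["add", "lw", "add", "sw", "add", "lw", "mul", "add", "lw", "add"]
    irf = build_irf(program, irf_size=2)
    packed = pack(program, irf, max_per_word=5)
    unpacked = [[insn] for insn in program]
    print(len(unpacked), len(packed))
    print(cycles(unpacked, miss_penalty=4, miss_every=2),
          cycles(packed, miss_penalty=4, miss_every=2))
```

With this made-up ten-instruction program, the IRF holds `add` and `lw`, packing shrinks 10 fetch words to 5, and the toy model's cycle count drops from 30 to 15. Because misses are counted per fetch word, the saving combines fewer fetches (so fewer misses) with stall cycles hidden behind packed execution; a real evaluation would separate these effects with a cache simulator.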


          Published in

          ACM Conferences
          LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
          June 2007
          258 pages
          ISBN:9781595936325
          DOI:10.1145/1254766
            ACM SIGPLAN Notices, Volume 42, Issue 7
            Proceedings of the 2007 LCTES conference
            July 2007
            241 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/1273444

          Copyright © 2007 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Acceptance Rates

          Overall Acceptance Rate: 116 of 438 submissions, 26%
