research-article

Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design

Published: 23 July 2009

Abstract

In this paper, we examine the area-performance design space of a processing core for a chip multiprocessor (CMP), considering both the architectural design space and the tradeoffs of the physical design on which the architecture relies. We first propose a methodology for performing an integrated optimization of both the microarchitecture and the physical circuit design of a microprocessor. In our approach, we use statistical and convex fitting methods to capture a large microarchitectural design space. We then characterize the area-delay tradeoffs of the underlying circuits through RTL synthesis. Finally, we establish the relationship between the architecture and the circuits in an integrative model, which we use to optimize the processor. As a case study, we apply this methodology to explore the performance-area tradeoffs in a highly parallel accelerator architecture for visual computing applications. Based on early circuit tradeoff data, our results indicate that two separate designs are performance/area optimal for our set of benchmarks: a simpler single-issue, 2-way multithreaded core running at a high frequency, and a more aggressively tuned dual-issue, 4-way multithreaded design running at a lower frequency.
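The flow described in the abstract (fit a circuit area-delay curve, then combine it with an architectural model and search for the best performance per area) can be illustrated with a small sketch. This is not the paper's model: the monomial form `area = c * delay**k`, the fixed-area term, the function names `fit_monomial` and `best_design`, and every number below are assumptions chosen only to make the idea concrete.

```python
import math

def fit_monomial(delays, areas):
    """Least-squares fit of area = c * delay**k, done as a line fit in log-log space.

    A monomial is linear (hence convex) after the log transform; this is an
    assumed stand-in for the paper's fitting step, not its actual model.
    """
    xs = [math.log(d) for d in delays]
    ys = [math.log(a) for a in areas]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = math.exp(my - k * mx)
    return c, k

def best_design(configs, c, k, cycle_times):
    """Grid search for the (config, cycle time) pair with the highest performance/area.

    Each config is (name, ipc, fixed_area, logic_scale); all four are hypothetical.
    Area = fixed_area + logic_scale * c * t**k, performance = ipc / t.
    """
    best = None
    for name, ipc, fixed_area, logic_scale in configs:
        for t in cycle_times:
            area = fixed_area + logic_scale * c * t ** k
            score = (ipc / t) / area
            if best is None or score > best[0]:
                best = (score, name, t)
    return best

if __name__ == "__main__":
    # Made-up "synthesis" points: area shrinks as the timing target relaxes.
    delays = [0.8, 1.0, 1.25, 1.6, 2.0]
    areas = [1.0 * d ** -2 for d in delays]
    c, k = fit_monomial(delays, areas)

    # Made-up core configurations; the IPC and area values are illustrative only.
    configs = [
        ("single-issue, 2 threads", 0.8, 0.5, 1.0),
        ("dual-issue, 4 threads", 1.3, 1.0, 2.0),
    ]
    cycle_times = [0.6 + 0.1 * i for i in range(20)]
    print(best_design(configs, c, k, cycle_times))
```

The fixed-area term stands in for structures whose size does not shrink when timing is relaxed. With it, performance per area has an interior optimum in cycle time rather than always favoring the fastest or slowest clock, which is the kind of tradeoff a joint architecture-circuit optimization has to resolve.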



Published in

ACM SIGARCH Computer Architecture News, Volume 37, Issue 2
May 2009
69 pages
ISSN: 0163-5964
DOI: 10.1145/1577129

Copyright © 2009 Authors

Publisher

Association for Computing Machinery
New York, NY, United States

