
Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design

Authors:
Omid Azizi (Stanford University), Aqeel Mahesri (University of Illinois at Urbana-Champaign), Sanjay J. Patel (University of Illinois at Urbana-Champaign), Mark Horowitz (Stanford University)

ACM SIGARCH Computer Architecture News, Volume 37, Issue 2, May 2009, pp. 56–65. https://doi.org/10.1145/1577129.1577138

Published: 23 July 2009


Abstract

In this paper, we examine the area-performance design space of a processing core for a chip multiprocessor (CMP), considering both the architectural design space and the tradeoffs of the physical design on which the architecture relies. We first propose a methodology for performing an integrated optimization of both the microarchitecture and the physical circuit design of a microprocessor. In our approach, we use statistical and convex fitting methods to capture a large microarchitectural design space. We then characterize the area-delay tradeoffs of the underlying circuits through RTL synthesis. Finally, we establish the relationship between the architecture and the circuits in an integrated model, which we use to optimize the processor. As a case study, we apply this methodology to explore the performance-area tradeoffs in a highly parallel accelerator architecture for visual computing applications. Based on early circuit tradeoff data, our results indicate that two separate designs are performance/area optimal for our set of benchmarks: a simpler single-issue, 2-way multithreaded core running at a high frequency, and a more aggressively tuned dual-issue, 4-way multithreaded design running at a lower frequency.
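
As a rough illustration of the kind of integrated area-performance exploration described above, the following Python sketch enumerates a tiny design space and keeps only the Pareto-optimal points. Everything in it is an assumption made for illustration: the function names (`unit_area`, `evaluate`, `pareto_frontier`), the shape of the convex area-delay curve, the toy throughput model, and all constants are hypothetical and are not taken from the paper, whose models are fitted from simulation and RTL synthesis data.

```python
from itertools import product

# NOTE: every constant and formula below is a made-up placeholder,
# not data or a model from the paper.

def unit_area(delay, a0=1.0, k=0.5, d_min=0.4):
    """Hypothetical convex area-delay curve for one logic unit.

    Area rises sharply as the target delay approaches the minimum
    achievable delay d_min, mimicking a synthesis tradeoff curve.
    """
    if delay <= d_min:
        raise ValueError("delay must exceed the minimum achievable delay")
    return a0 + k / (delay - d_min)

def evaluate(cycle_time, issue_width, threads):
    """Return (area, performance) for one candidate core (toy models)."""
    # Logic area scales with issue width; each thread adds fixed state.
    area = issue_width * unit_area(cycle_time) + 0.3 * threads
    # Toy saturating IPC: more threads hide more stalls, capped by width.
    ipc = issue_width * threads / (threads + 2.0)
    return area, ipc / cycle_time

def pareto_frontier(designs):
    """Keep designs that no other design beats on both area and performance."""
    scored = [(d, *evaluate(*d)) for d in designs]
    frontier = []
    for d, area, perf in scored:
        dominated = any(
            a2 <= area and p2 >= perf and (a2 < area or p2 > perf)
            for _, a2, p2 in scored
        )
        if not dominated:
            frontier.append((d, area, perf))
    return sorted(frontier, key=lambda t: t[1])  # ascending area

# Candidate (cycle_time, issue_width, threads) combinations.
designs = list(product([0.5, 0.6, 0.8, 1.0, 1.5], [1, 2], [1, 2, 4]))
frontier = pareto_frontier(designs)
for (cycle, width, threads), area, perf in frontier:
    print(f"cycle={cycle} width={width} threads={threads} "
          f"area={area:.2f} perf={perf:.2f}")
```

The sketch uses exhaustive enumeration only because the toy space is small; a fitted convex model, as the abstract mentions, would allow an optimizer to search a much larger space directly.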



Published in

ACM SIGARCH Computer Architecture News Volume 37, Issue 2
May 2009
69 pages
ISSN: 0163-5964
DOI: 10.1145/1577129

Copyright © 2009 Authors
Publisher
Association for Computing Machinery
New York, NY, United States