Exploiting Structural Duplication for Lifetime Reliability Enhancement

Published: 01 May 2005

Abstract

Increased power densities (and resultant temperatures) and other effects of device scaling are predicted to cause significant lifetime reliability problems in the near future. In this paper, we study two techniques that leverage microarchitectural structural redundancy for lifetime reliability enhancement. First, in structural duplication (SD), redundant microarchitectural structures are added to the processor and designated as spares. Spare structures can be turned on when the original structure fails, increasing the processor's lifetime. Second, graceful performance degradation (GPD) is a technique which exploits existing microarchitectural redundancy for reliability. Redundant structures that fail are shut down while still maintaining functionality, thereby increasing the processor's lifetime, but at a lower performance. Our analysis shows that exploiting structural redundancy can provide significant reliability benefits, and we present guidelines for efficient usage of these techniques by identifying situations where each is more beneficial. We show that GPD is the superior technique when only limited performance or cost resources can be sacrificed for reliability. Specifically, on average for our systems and applications, GPD increased processor reliability to 1.42 times the base value for less than a 5% loss in performance. On the other hand, for systems where reliability is more important than performance or cost, SD is more beneficial. SD increases reliability to 3.17 times the base value for 2.25 times the base cost, for our applications. Finally, a combination of the two techniques (SD+GPD) provides the highest reliability benefit.
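The difference between the two techniques can be illustrated with a toy lifetime model. The sketch below is not the paper's evaluation methodology; it rests on assumptions that do not come from the abstract: the processor is treated as a series system that fails when any structure (or redundant group) is lost, a powered-off spare is assumed not to age, and lifetimes are drawn from an exponential distribution purely for convenience. All function names are hypothetical.

```python
import random


def base_lifetime(structures):
    """Series system: the processor fails when its first structure fails."""
    return min(structures)


def sd_lifetime(structures, spares):
    """Structural duplication (SD): a spare is turned on when its original fails.

    Assumption: a powered-off spare does not age, so the two lifetimes add.
    """
    return min(orig + spare for orig, spare in zip(structures, spares))


def gpd_lifetime(groups):
    """Graceful performance degradation (GPD): failed units are shut down.

    Each group lists the lifetimes of units that are all active from the start.
    Assumption: a group keeps working (at lower performance) until its last
    unit fails, and the processor fails when any whole group is lost.
    """
    return min(max(group) for group in groups)


def mean_lifetimes(trials=10_000, n_structures=4, seed=0):
    """Monte Carlo estimate of mean lifetime for the base and SD processors.

    Assumption: unit-rate exponential lifetimes, chosen only for illustration.
    """
    rng = random.Random(seed)
    base_total = 0.0
    sd_total = 0.0
    for _ in range(trials):
        originals = [rng.expovariate(1.0) for _ in range(n_structures)]
        spares = [rng.expovariate(1.0) for _ in range(n_structures)]
        base_total += base_lifetime(originals)
        sd_total += sd_lifetime(originals, spares)
    return base_total / trials, sd_total / trials
```

For example, `sd_lifetime([5, 3, 8], [4, 7, 1])` evaluates to 9, compared with `base_lifetime([5, 3, 8])`, which is 3. The model ignores cost and performance, which are the axes along which the abstract compares SD and GPD, so it shows only the mechanism and not the reported ratios.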

Published in

ACM SIGARCH Computer Architecture News, Volume 33, Issue 2
ISCA 2005
May 2005
531 pages
ISSN: 0163-5964
DOI: 10.1145/1080695

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
June 2005
541 pages
ISBN: 076952270X

Copyright © 2005 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
