Faults and their manifestation

  • Systems Session II
  • Conference paper
Fault-Tolerant Distributed Computing

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 448))

Abstract

The data in Section 3 indicate that system failures are predominantly transient in nature and follow a decreasing failure rate function (e.g., Weibull) rather than a constant failure rate function (i.e., exponential). System failures have diverse manifestations and diverse causes, ranging from errors in design to component aging. A substantial gap remains between actual system failures and these system failure models. Perhaps as few as 50% of the observed system failures fall into one or more of the high-level fault models defined in Section 1. Much work remains to bridge this gap effectively.
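
The distinction between a decreasing and a constant failure rate can be illustrated with the standard Weibull hazard function, h(t) = (k/λ)(t/λ)^(k-1). The sketch below is only an illustration: the function name `weibull_hazard` and the shape and scale values are assumptions chosen for the example, not parameters taken from the paper's data. A shape parameter below 1 gives a decreasing failure rate, while a shape parameter of exactly 1 reduces the Weibull to the exponential distribution with a constant rate.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at time t > 0.

    h(t) = (shape / scale) * (t / scale) ** (shape - 1)
    """
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative values only (assumed, not measured): times in arbitrary units.
times = [10.0, 100.0, 1000.0]

# shape < 1: the failure rate falls as the system ages.
decreasing = [weibull_hazard(t, shape=0.5, scale=100.0) for t in times]

# shape == 1: the Weibull is an exponential with constant rate 1 / scale.
constant = [weibull_hazard(t, shape=1.0, scale=100.0) for t in times]

if __name__ == "__main__":
    for t, d, c in zip(times, decreasing, constant):
        print(f"t={t:7.1f}  shape=0.5: {d:.5f}  shape=1.0: {c:.5f}")
```

Under a constant-rate model, the time already survived says nothing about the chance of failing next; under a decreasing-rate model, a system that has run longer is less likely to fail in the next interval.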

Editor information

Barbara Simons, Alfred Spector

Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Siewiorek, D.P. (1990). Faults and their manifestation. In: Simons, B., Spector, A. (eds) Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol 448. Springer, New York, NY. https://doi.org/10.1007/BFb0042340

  • DOI: https://doi.org/10.1007/BFb0042340

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-97385-2

  • Online ISBN: 978-0-387-34812-4

  • eBook Packages: Springer Book Archive
