Abstract
The data in Section 3 indicate that system failures are predominantly transient in nature and follow a decreasing failure rate function (i.e., Weibull) rather than a constant failure rate function (i.e., exponential). System failures have diverse manifestations and diverse causes, ranging from errors in design to component aging. A substantial gap remains between actual system failures and these system failure models. Perhaps as little as 50% of the observed system failures fall into one or more of the high-level fault models defined in Section 1. Much work remains to effectively bridge this gap.
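The contrast between a decreasing and a constant failure rate can be illustrated with the standard hazard functions of the two distributions. The following is a minimal sketch, not taken from the chapter: the function names and the shape and scale values are illustrative assumptions, chosen only to show that a Weibull shape parameter below 1 gives a failure rate that falls over time, while a shape of exactly 1 reduces to the constant-rate exponential case.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def exponential_hazard(t, rate):
    """Instantaneous failure rate of an exponential distribution: constant in t."""
    return rate


# Illustrative values only (assumed, not measured data from the chapter).
times = [10.0, 100.0, 1000.0]

# shape < 1: the failure rate decreases as time goes on.
weibull_rates = [weibull_hazard(t, shape=0.5, scale=100.0) for t in times]

# Constant failure rate for comparison.
exponential_rates = [exponential_hazard(t, rate=0.01) for t in times]

for t, w, e in zip(times, weibull_rates, exponential_rates):
    print(f"t={t:7.1f}  weibull={w:.6f}  exponential={e:.6f}")
```

With a shape parameter of 1 and a scale of 100, `weibull_hazard` returns the same constant value as `exponential_hazard` with a rate of 0.01, which is why the exponential model is often treated as a special case of the Weibull model.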
Copyright information
© 1990 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Siewiorek, D.P. (1990). Faults and their manifestation. In: Simons, B., Spector, A. (eds) Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol 448. Springer, New York, NY. https://doi.org/10.1007/BFb0042340
DOI: https://doi.org/10.1007/BFb0042340
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-97385-2
Online ISBN: 978-0-387-34812-4
eBook Packages: Springer Book Archive