Abstract
Design diversity has long been used to achieve a degree of fault tolerance in software-based systems. While there is clear evidence that the approach can be expected to deliver some increase in reliability compared with a single version, there is no agreement about how large that increase is. More importantly, it remains difficult to evaluate exactly how reliable a particular diverse fault-tolerant system is. The difficulty arises because assumptions of independence of failures between different versions have been shown to be untenable; the actual level of dependence must therefore be assessed, and this is hard to do. In this tutorial, we survey the modeling issues involved, with an emphasis on their impact on the problem of assessing the reliability of fault-tolerant systems. The intended audience comprises designers, assessors, and project managers with only a basic knowledge of probability, as well as reliability experts without detailed knowledge of software, who seek an introduction to the probabilistic issues in decisions about design diversity.
Index Terms
- Modeling software design diversity: a review