Abstract
There are two types of software fault tolerance techniques: single-version and multi-version. Single-version techniques aim to improve the fault tolerance of a software component by adding mechanisms for fault detection, containment, and recovery. Multi-version techniques use redundant software components that are developed following design diversity rules. As in the hardware case, various choices have to be examined to determine at which level redundancy should be provided and which modules should be made redundant. The increase in complexity caused by redundancy can be severe and may diminish the dependability improvement unless the redundant resources are allocated properly. In this chapter, we consider common single-version and multi-version software fault tolerance techniques, including checkpoint and restart, recovery blocks, N-version programming, and N self-checking programming. We also briefly cover software testing and common test coverage metrics.
“Programs are really not much more than the programmer’s best guess about what a system should do.” Russel Abbot.
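As a rough illustration of two of the techniques named in the abstract, the sketch below shows a recovery block (alternates tried in order until one passes an acceptance test) and an N-version majority voter. It is a minimal sketch and not the chapter's own formulation. All function names (`recovery_block`, `n_version_vote`, `isqrt_buggy`, `isqrt_slow`, `accept_isqrt`) are hypothetical, and the alternates are assumed to be pure functions, so no checkpointed state has to be restored between attempts.

```python
import math
from collections import Counter


def recovery_block(x, alternates, acceptance_test):
    """Try each alternate in order; return the first result that passes the test."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # an exception is treated as a detected fault
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def n_version_vote(x, versions):
    """Run every version on the same input and return the strict-majority output."""
    outputs = [version(x) for version in versions]
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError("no majority among versions")
    return value


# Hypothetical versions of an integer square root, one deliberately faulty.
def isqrt_buggy(x):
    return int(x ** 0.5) + 1  # off-by-one fault injected for the example


def isqrt_slow(x):
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r


def accept_isqrt(x, r):
    # Acceptance test: r is the floor of the square root of x.
    return r * r <= x < (r + 1) * (r + 1)


primary_then_backup = recovery_block(10, [isqrt_buggy, isqrt_slow], accept_isqrt)
voted = n_version_vote(10, [math.isqrt, isqrt_buggy, isqrt_slow])
```

In the recovery block, the faulty primary is rejected by the acceptance test and the slower alternate supplies the result. In the voter, the two agreeing versions outvote the faulty one. A real system would also need diverse development of the versions and, for stateful modules, a checkpoint to roll back to before trying the next alternate.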
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Dubrova, E. (2013). Software Redundancy. In: Fault-Tolerant Design. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-2113-9_7
DOI: https://doi.org/10.1007/978-1-4614-2113-9_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-2112-2
Online ISBN: 978-1-4614-2113-9
eBook Packages: Engineering, Engineering (R0)