Software Redundancy

Fault-Tolerant Design

Abstract

There are two types of software fault tolerance techniques: single-version and multi-version. Single-version techniques aim to improve the fault tolerance of a software component by adding mechanisms for fault detection, containment, and recovery to it. Multi-version techniques use redundant software components that are developed following design diversity rules. As in the hardware case, various choices have to be examined to determine at which level redundancy should be provided and which modules should be made redundant. The increase in complexity caused by redundancy can be quite severe and may diminish the dependability improvement unless the redundant resources are allocated properly. In this chapter, we consider common single-version and multi-version software fault tolerance techniques, including checkpoint and restart, recovery blocks, N-version programming, and N self-checking programming. We also briefly cover software testing and common test coverage metrics.
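
To make two of the techniques named above concrete, the following is a minimal sketch, not taken from the chapter, of a recovery block (checkpoint, alternate variants, acceptance test) and of a majority voter as used in N-version programming. All function names and the integer square root example are hypothetical and chosen only for illustration.

```python
import copy
import math
from collections import Counter


def recovery_block(state, variants, acceptance_test):
    """Run variants in order from a saved checkpoint; return the first
    result that passes the acceptance test (illustrative sketch)."""
    checkpoint = copy.deepcopy(state)        # save the state before the block
    for variant in variants:
        trial = copy.deepcopy(checkpoint)    # roll back to the checkpoint
        try:
            result = variant(trial)
        except Exception:
            continue                         # a crash counts as a failed variant
        if acceptance_test(checkpoint, result):
            return result
    raise RuntimeError("all variants failed the acceptance test")


def majority_vote(outputs):
    """Return the value produced by a strict majority of the versions."""
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count > len(outputs):
        return value
    raise RuntimeError("no majority among versions")


# Hypothetical, independently written versions of an integer square root.
def isqrt_buggy(x):
    return x // 2                            # deliberately faulty version


def isqrt_loop(x):
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r


def isqrt_lib(x):
    return math.isqrt(x)


def isqrt_ok(x, r):
    """Acceptance test: r is the floor of the square root of x."""
    return r * r <= x < (r + 1) * (r + 1)


# Recovery block: the faulty primary is rejected, the alternate is accepted.
rb_result = recovery_block(17, [isqrt_buggy, isqrt_loop], isqrt_ok)

# N-version programming with N = 3: the voter masks the faulty version.
nvp_result = majority_vote([v(17) for v in (isqrt_buggy, isqrt_loop, isqrt_lib)])
```

The recovery block needs an acceptance test but runs only as many variants as necessary, whereas the voter needs no acceptance test but always runs all N versions; this is the usual trade-off between the two schemes.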

“Programs are really not much more than the programmer’s best guess about what a system should do.” Russell Abbott

Author information

Correspondence to Elena Dubrova.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Dubrova, E. (2013). Software Redundancy. In: Fault-Tolerant Design. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-2113-9_7

  • DOI: https://doi.org/10.1007/978-1-4614-2113-9_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-2112-2

  • Online ISBN: 978-1-4614-2113-9

  • eBook Packages: Engineering, Engineering (R0)
