
Design and Verification of Distributed Recovery Blocks with CSP

Formal Methods in System Design

Abstract

A case study on the application of Communicating Sequential Processes (CSP) to the design and verification of fault-tolerant real-time systems is presented. The distributed recovery block (DRB) scheme is a design technique for the uniform treatment of hardware and software faults in real-time systems. Through a simple fault-tolerant real-time system design using the DRB scheme, the case study illustrates a paradigm for specifying fault-tolerant software and demonstrates how the different behavioural aspects of a fault-tolerant real-time system design can be separately and systematically specified, formulated, and verified using an integrated set of formal techniques based on CSP.
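The core idea of the DRB scheme can be shown with a small simulation. The sketch below is a minimal, hypothetical illustration in plain Python, not the paper's CSP specification. It assumes a simplified version of the scheme originally proposed by K.H. Kim and H.O. Welch: a primary node runs the primary try block and a shadow node runs the alternate try block on the same input, and each result is checked by a shared acceptance test. If the primary's result is rejected, or the primary has crashed (a hardware fault), the shadow's accepted result is used and the two nodes swap roles. All names (`Node`, `attempt`, `drb_step`, `fast_sqrt`, `close_enough`) are invented for this example. The sketch also omits the real scheme's concurrency, timing constraints, state rollback and retry with each node's second try block.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical types: a try block maps an input to an output, and an
# acceptance test judges an (input, output) pair.
TryBlock = Callable[[float], float]
AcceptanceTest = Callable[[float, float], bool]


@dataclass
class Node:
    """One node of a DRB pair; `alive=False` models a crash (hardware fault)."""
    name: str
    first: TryBlock   # the try block this node runs first in its role
    alive: bool = True

    def attempt(self, x: float, test: AcceptanceTest) -> Optional[float]:
        """Run this node's try block on x. Return the result only if it passes
        the acceptance test, or None on crash, exception, or rejection."""
        if not self.alive:
            return None
        try:
            y = self.first(x)
        except Exception:
            return None  # a software fault surfacing as an exception
        return y if test(x, y) else None


def drb_step(primary: Node, shadow: Node, x: float,
             test: AcceptanceTest) -> Tuple[float, Node, Node]:
    """One simplified DRB cycle. Both nodes attempt x (concurrently in the
    real scheme, sequentially here). Returns (output, new_primary, new_shadow)."""
    p_result = primary.attempt(x, test)
    s_result = shadow.attempt(x, test)
    if p_result is not None:
        return p_result, primary, shadow
    if s_result is not None:
        # The primary failed (hardware or software fault): the shadow takes over.
        return s_result, shadow, primary
    raise RuntimeError("neither node produced a result passing the acceptance test")


if __name__ == "__main__":
    def fast_sqrt(x: float) -> float:
        # Hypothetical primary try block with a deliberate bug for large inputs.
        return x ** 0.5 if x < 100 else -1.0

    def close_enough(x: float, y: float) -> bool:
        # Hypothetical acceptance test: a non-negative root that squares back to x.
        return y >= 0 and abs(y * y - x) < 1e-6

    primary, shadow = Node("A", fast_sqrt), Node("B", math.sqrt)
    out, primary, shadow = drb_step(primary, shadow, 400.0, close_enough)
    print(out, primary.name)  # 20.0 B
```

The same code path handles both fault classes. A crashed primary (`alive=False`) and a primary whose output fails the acceptance test both lead to the shadow's accepted result being used. This reflects, in miniature, the uniform treatment of hardware and software faults that the DRB scheme is designed for.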



Yeung, W., Schneider, S. Design and Verification of Distributed Recovery Blocks with CSP. Formal Methods in System Design 22, 225–248 (2003). https://doi.org/10.1023/A:1022997110855
