Abstract
A case study on the application of Communicating Sequential Processes (CSP) to the design and verification of fault-tolerant real-time systems is presented. The distributed recovery block (DRB) scheme is a design technique for the uniform treatment of hardware and software faults in real-time systems. Through a simple fault-tolerant real-time system design using the DRB scheme, the case study illustrates a paradigm for specifying fault-tolerant software and demonstrates how the different behavioural aspects of a fault-tolerant real-time system design can be separately and systematically specified, formulated, and verified using an integrated set of formal techniques based on CSP.
Cite this article
Yeung, W., Schneider, S. Design and Verification of Distributed Recovery Blocks with CSP. Formal Methods in System Design 22, 225–248 (2003). https://doi.org/10.1023/A:1022997110855
DOI: https://doi.org/10.1023/A:1022997110855