Abstract
In this paper, we present a complete architecture for improving the dependability of complex COTS and legacy-based systems. For long-lived applications, such as most of those now being built by integrating legacy subsystems, fault treatment is a very important part of the fault tolerance strategy. The paper advocates careful diagnosis and damage assessment, followed by precise and effective recovery actions tailored to the fault involved and/or to the extent of the damage in the affected component. In our proposal, threshold-based mechanisms are exploited to trigger alternative actions. The design and implementation of the resulting solution are illustrated with respect to a case study: a distributed architectural framework that handles replicated legacy-based subsystems, in which replication and voting are used for error detection and masking. An experimental prototype deployed over a COTS-based LAN is described; it enabled a dependability analysis that combines direct measurements with analytical modeling.
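The interplay between voting and threshold-based triggering described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `majority_vote`, `ThresholdDiagnoser`, and `demo`, the decay factor, and the threshold value are all assumptions chosen for illustration. The score update follows a generic count-and-threshold scheme, in which a replica's score grows by one on each detected error and decays multiplicatively on each correct round, so isolated transient errors fade while recurring errors accumulate.

```python
from collections import Counter


def majority_vote(outputs):
    """Vote over {replica_id: output}; outputs must be hashable.

    Returns (value, suspects): the strict-majority value and the set of
    replicas that disagreed with it. With no strict majority, returns
    (None, set()) because the error cannot be attributed to any replica.
    """
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes * 2 <= len(outputs):
        return None, set()
    return value, {rid for rid, out in outputs.items() if out != value}


class ThresholdDiagnoser:
    """Hypothetical count-and-threshold diagnoser (illustrative parameters)."""

    def __init__(self, decay=0.5, threshold=2.0):
        self.decay = decay          # assumed: score multiplier on a correct round
        self.threshold = threshold  # assumed: score at which to escalate
        self.score = {}

    def observe(self, replica_id, erroneous):
        """Update the replica's score; return True if the threshold is reached."""
        prev = self.score.get(replica_id, 0.0)
        self.score[replica_id] = prev + 1.0 if erroneous else prev * self.decay
        return self.score[replica_id] >= self.threshold


def demo():
    """Run three voting rounds; return replicas whose score hit the threshold."""
    diagnoser = ThresholdDiagnoser()
    rounds = [
        {"r1": 42, "r2": 42, "r3": 42},
        {"r1": 42, "r2": 42, "r3": 7},
        {"r1": 42, "r2": 42, "r3": 9},
    ]
    escalated = []
    for outputs in rounds:
        _, suspects = majority_vote(outputs)
        for rid in outputs:
            if diagnoser.observe(rid, rid in suspects):
                escalated.append(rid)
    return escalated
```

In this sketch the voter masks the error in every round, while the diagnoser decides when a replica has failed often enough to justify a heavier recovery action (for example, a restart rather than a simple retry). Which actions sit on each side of the threshold is a design choice that the sketch leaves open.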
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bondavalli, A., Chiaradonna, S., Cotroneo, D., Romano, L. (2003). A Fault-Tolerant Distributed Legacy-Based System and Its Evaluation. In: de Lemos, R., Weber, T.S., Camargo, J.B. (eds) Dependable Computing. LADC 2003. Lecture Notes in Computer Science, vol 2847. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45214-0_22
DOI: https://doi.org/10.1007/978-3-540-45214-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20224-0
Online ISBN: 978-3-540-45214-0
eBook Packages: Springer Book Archive