Abstract
Even with software engineering best practices and tools, it would be risky to assume that software developed today is fault-free. Moreover, software may face unexpected situations that were not considered during its design. Robustness is therefore a highly desirable and sometimes indispensable requirement, especially for critical systems, where the consequences of a failure can be catastrophic. This chapter outlines existing fault tolerance techniques and then discusses the potential of multiagent systems to enhance the design of robust, fault-tolerant systems, thereby improving the reliability of large-scale, critical, and complex systems.
Copyright information
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
Gutierrez, R.L.Z., Huhns, M. (2008). Multiagent-Based Fault Tolerance Management for Robustness. In: Schuster, A. (ed.) Robust Intelligent Systems. Springer, London. https://doi.org/10.1007/978-1-84800-261-6_2
DOI: https://doi.org/10.1007/978-1-84800-261-6_2
Publisher Name: Springer, London
Print ISBN: 978-1-84800-260-9
Online ISBN: 978-1-84800-261-6
eBook Packages: Computer Science (R0)