Multiagent-Based Fault Tolerance Management for Robustness

  • Chapter
Robust Intelligent Systems

Abstract

Despite the use of software engineering best practices and tools, it would be risky to assume that software developed today is fault-free. Moreover, software may face unexpected situations that were not considered during its design. Robustness is a highly desirable and sometimes indispensable software requirement, especially for critical systems, where the consequences of a system failure can be catastrophic. This chapter outlines existing fault tolerance techniques and then discusses the potential of multiagent systems to enhance the design of robust, fault-tolerant systems, thereby improving the reliability of large-scale, critical, and complex systems.

Author information

Corresponding author

Correspondence to Rosa Laura Zavala Gutierrez.

Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

Gutierrez, R.L.Z., Huhns, M. (2008). Multiagent-Based Fault Tolerance Management for Robustness. In: Schuster, A. (ed.) Robust Intelligent Systems. Springer, London. https://doi.org/10.1007/978-1-84800-261-6_2

  • DOI: https://doi.org/10.1007/978-1-84800-261-6_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-260-9

  • Online ISBN: 978-1-84800-261-6

  • eBook Packages: Computer Science, Computer Science (R0)
