DOI: 10.1145/1555284.1555286
research-article

Characterizing fault tolerance in genetic programming

Published: 19 June 2009

ABSTRACT

Evolutionary Algorithms (EAs), and particularly Genetic Programming (GP), are techniques frequently employed to solve difficult real-life problems, which can require days or even months of computation. One approach to reducing the time to solution is to use parallel computing on distributed platforms. Distributed platforms are prone to failures, and when these platforms are large and/or low-cost, failures are expected events rather than catastrophic exceptions. Therefore, fault tolerance and recovery techniques often become necessary. It turns out that Parallel GP (PGP) applications have an inherent ability to tolerate failures. This ability is quantified via simulation experiments performed using failure traces from real-world distributed platforms, namely desktop grids (DGs), for two well-known GP problems. A simple technique is then proposed by which PGP applications can better tolerate the different, and often high, failure rates seen on different platforms.


Published in

BADS '09: Proceedings of the 2009 workshop on Bio-inspired algorithms for distributed systems
June 2009
114 pages
ISBN: 9781605585840
DOI: 10.1145/1555284

      Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
