DOI: 10.1145/1569901.1570051

Operator equalisation, bloat and overfitting: a study on human oral bioavailability prediction

Published: 08 July 2009

ABSTRACT

Operator equalisation was recently proposed as a new bloat control technique for genetic programming. By controlling the distribution of program lengths inside the population, it can bias the search towards smaller or larger programs. In this paper we propose a new implementation of operator equalisation and compare it to a previous version, using a hard real-world regression problem where bloat and overfitting are major issues. The results show that both implementations of operator equalisation are completely bloat-free, producing smaller individuals than standard genetic programming, without compromising the generalization ability. We also show that the new implementation of operator equalisation is more efficient and exhibits a more predictable and reliable behavior than the previous version. We advance some arguable ideas regarding the relationship between bloat and overfitting, and support them with our results.
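The idea of controlling the distribution of program lengths can be illustrated with a minimal sketch. It assumes a fixed target histogram over length bins and accepts a candidate program only while its bin still has room. The abstract does not describe how either implementation computes its target or handles rejected individuals, so the function names, the bin width, and the acceptance rule below are all hypothetical simplifications rather than the authors' method.

```python
from collections import Counter


def bin_index(length, bin_width):
    """Map a program length (>= 1) to a zero-based length-bin index."""
    return (length - 1) // bin_width


def fill_population(candidate_lengths, target_counts, bin_width=1):
    """Accept candidates in order while their length bin has capacity.

    candidate_lengths: lengths of newly created programs (hypothetical input).
    target_counts: dict mapping bin index -> number of programs allowed.
    Returns the lengths of the accepted programs.
    """
    filled = Counter()
    accepted = []
    for length in candidate_lengths:
        b = bin_index(length, bin_width)
        # Bins missing from the target have capacity 0 and reject everything.
        if filled[b] < target_counts.get(b, 0):
            filled[b] += 1
            accepted.append(length)
    return accepted


# A target that favours short programs: three slots for lengths 1-10,
# one slot for lengths 11-20.
print(fill_population([3, 12, 4, 15, 2, 11, 5], {0: 3, 1: 1}, bin_width=10))
```

Shifting capacity in `target_counts` towards lower or higher bins is what would bias the accepted population towards smaller or larger programs in this simplified picture.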


Published in

    GECCO '09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation
    July 2009
    2036 pages
    ISBN: 9781605583259
    DOI: 10.1145/1569901

    Copyright © 2009 ACM


    Publisher: Association for Computing Machinery, New York, NY, United States
