Skip to main content

Advertisement

Log in

Controlling code growth by dynamically shaping the genotype size distribution

  • Published:
Genetic Programming and Evolvable Machines Aims and scope Submit manuscript

Abstract

Genetic programming is a hyperheuristic optimization approach that seeks to evolve various forms of symbolic computer programs, in order to solve a wide range of problems. However, the approach can be severely hindered by a significant computational burden and stagnation of the evolution caused by uncontrolled code growth. This paper introduces HARM-GP, a novel operator equalization method that conducts an adaptive shaping of the genotype size distribution of individuals in order to effectively control code growth. Its probabilistic nature minimizes the computational overheads on the evolutionary process while its generic formulation allows it to remain independent of both the problem and the genetic variation operators. Comparative results over twelve problems with different dynamics, and over nine other algorithms taken from the literature, show that HARM-GP is excellent at controlling code growth while maintaining good overall performance. Results also demonstrate the effectiveness of HARM-GP at limiting overfitting in real-world supervised learning problems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

Notes

  1. For two of our twelve problems, namely Artificial Ant and Even Parity, the training and testing sets are the same, since the task is rather a planning problem (Artificial Ant) or the training set already samples the full problem space (Even Parity). Considering this, \(f^*_{x,i}\) and \(g^*_{x,i}\) refer to the same values for these specific problems.

  2. We do not refer to the original concept of computational effort developed by Koza, but rather to the actual processing effort needed to perform an evolution.

  3. As we select the runs only based on their fitness on the training set, we actually increase the likelihood of overfitting, and thus impact test error. Using a validation set would prevent this and allow better generalization results.

  4. http://deap.gel.ulaval.ca, http://beagle.gel.ulaval.ca.

References

  1. E. Alfaro-Cid, A. Esparcia-Alcázar, K. Sharman, F. Fernandez de Vega, J. J. Merelo, Prune and plant: a new bloat control method for genetic programming. in Proceedings of the International Conference on Hybrid Intelligent Systems (HIS), pp. 31–35 (2008)

  2. E. Alfaro-Cid, J.J. Merelo, F. Fernandez de Vega, A. Esparcia-Alcazar, K. Sharman, Bloat control operators and diversity in genetic programming: a comparative study. Evol. Comput. 18(2), 305–332 (2010)

    Article  Google Scholar 

  3. E. Alpaydin, Introduction to Machine Learning (MIT Press, Cambridge, 2004)

    Google Scholar 

  4. N. Amil, N. Bredeche, C. Gagné, S. Gelly, M. Schoenauer, O. Teytaud, A statistical learning perspective of genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 327–338 (2009)

  5. J. Bacardit, M. Stout, N. Krasnogor, J. D. Hirst, J. Blazewicz, Coordination number prediction using learning classifier systems: performance and interpretability. in Proceedings of Genetic and Evolutionary Computation Conference (GECCO), pp. 247–254 (2006)

  6. W. Banzhaf, W.B. Langdon, Some considerations on the reason for bloat. Genet. Program. Evolvable Mach. 3(1), 81–91 (2002)

    Article  MATH  Google Scholar 

  7. W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone, Genetic Programming: An Introduction (Morgan Kaufmann, Los Altos, 1997)

    Google Scholar 

  8. S. Bleuler, M. Brack, L. Thiele, E. Zitzler, Multiobjective genetic programming: reducing bloat using SPEA2. in Proceedings of the Congress on evolutionary computation (CEC), 1, pp. 536–543 (2001)

  9. T. Blickle, L. Thiele, Genetic programming and redundancy. in Genetic Algorithms within the Framework of Computation (Workshop at KI-94) (1994)

  10. R. Bock, A. Chilingarian, M. Gaug, F. Hakl, T. Hengstebeck, M. Jiřina, J. Klaschka, E. Kotrč, P. Savickỳ, S. Towers et al., Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip. 516(2), 511–528 (2004)

    Article  Google Scholar 

  11. M. Brameier, W. Banzhaf, A comparison of linear genetic programming and neural networks in medical data mining. IEEE Trans. Evol. Comput. 5(1), 17–26 (2001)

    Article  Google Scholar 

  12. E.K. Burke, M. Hyde, G. Kendall, J. Woodward, A genetic programming hyper-heuristic approach for evolving 2-D strip packing heuristics. IEEE Trans. Evol. Comput. 14(6), 942–958 (2010)

    Article  Google Scholar 

  13. S. Dignum, R. Poli, Generalisation of the limiting distribution of program sizes in tree-based genetic programming and analysis of its effects on bloat. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1588–1595 (2007)

  14. S. Dignum, R. Poli, Crossover, sampling, bloat and the harmful effects of size limits. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 158–169 (2008)

  15. S. Dignum, R. Poli, Operator equalisation and bloat free GP. in Proceedings of the European conference on Genetic Programming (EuroGP), pp. 110–121 (2008)

  16. K. Dimitrios, K. Aigli, T. Konstantinos, L. Spiros, T. Athanasios, M. Seferina, Where we stand, where we are moving: surveying computational techniques for identifying miRNA genes and uncovering their regulatory role. J. Biomed. Inform. 46(3), 563–573 (2013)

    Article  Google Scholar 

  17. P.G. Espejo, S. Ventura, F. Herrera, A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 40(2), 121–144 (2010)

    Article  Google Scholar 

  18. J. Fitzgerald, R. Azad, C. Ryan, Bootstrapping to reduce bloat and improve generalisation in genetic programming. in Companion Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 141–142 (2013)

  19. F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, C. Gagné, DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)

    MATH  MathSciNet  Google Scholar 

  20. C. Gagné, M. Parizeau, Genericity in evolutionary computation software tools: principles and case study. Int. J. Artif. Intel. Tools 15(2), 173–194 (2006)

    Article  Google Scholar 

  21. S. Gelly, O. Teytaud, N. Bredeche, M. Schoenauer, A statistical learning theory approach of bloat. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1783–1784 (2005)

  22. L. Guo, D. Rivero, J. Dorado, C.R. Munteanu, A. Pazos, Automatic feature extraction using genetic programming: an application to epileptic EEG classification. Expert Systems Appl. 38(8), 10425–10436 (2011)

    Article  Google Scholar 

  23. K. Harries, P. Smith, Code growth, explicitly defined introns and alternative selection schemes. Evol. Comput. 6(4), 346–364 (1998)

    Google Scholar 

  24. M. Keijzer, Improving symbolic regression with interval arithmetic and linear scaling. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 70–82 (2003)

  25. K. Kinnear Jr., Evolving a sort: lessons in genetic programming. in Proceedings of the IEEE International Conference on Neural Networks (ICNN), pp. 881–888 (1993)

  26. D. Kinzett, M. Johnston, M. Zhang, Numerical simplification for bloat control and analysis of building blocks in genetic programming. Evol. Intell. 2(4), 151–168 (2009)

    Article  Google Scholar 

  27. A. Kordon, G. Smits, E. Jordaan, E. Rightor, Robust soft sensors based on integration of genetic programming, analytical neural networks, and support vector machines. in Proceedings of the IEEE International Conference on E-Commerce Technology, 1 (2002)

  28. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, Cambridge, 1992)

    MATH  Google Scholar 

  29. J.R. Koza, Human-competitive results produced by genetic programming. Genet. Program. Evolv. Mach. 11(3–4), 251 (2010)

    Article  Google Scholar 

  30. W. Langdon, R. Poli, Fitness causes bloat. in Soft Computing in Engineering Design and Manufacturing, (Springer, London, 1998), pp. 13–22

  31. W. Langdon, T. Soule, R. Poli, J. Foster. The evolution of size and shape. in Advances in Genetic Programming III, chapter 8 (MIT Press, 1999) pp. 163–190

  32. W.B. Langdon, R. Poli, Foundations of Genetic Programming (Springer, Berlin, 2002)

    Book  MATH  Google Scholar 

  33. S. M. Lee, D. S. Kim, J. H. Kim, J. S. Park, Spam detection using feature selection and parameters optimization. in Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), (Washington, DC, USA, 2010) pp. 883–888

  34. S. Luke, L. Panait, Fighting bloat with nonparametric parsimony pressure. in Proceedings of Parallel Problem Solving from Nature (PPSN), pp. 411–421 (2002)

  35. S. Luke, L. Panait, Lexicographic parsimony pressure. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 829–836 (2002)

  36. S. Luke, L. Panait, A comparison of bloat control methods for genetic programming. Evol. Comput. 14(3), 309–344 (2006)

    Article  Google Scholar 

  37. J. F. Miller, P. Thomson, Cartesian genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 121–132 (2000)

  38. P. Nordin, W. Banzhaf et al., Complexity compression and evolution. in Proceedings of the International Conference on Genetic Algorithms (ICGA), pp. 310–317 (1995)

  39. M. O’Neill, L. Vanneschi, S. Gustafson, W. Banzhaf, Open issues in genetic programming. Genet. Program. Evolv. Mach. 11(3–4), 339–363 (2010)

    Article  Google Scholar 

  40. L. Pagie, P. Hogeweg, Evolutionary consequences of coevolving targets. Evol. Comput. 5(4), 401–418 (1997)

    Article  Google Scholar 

  41. L. Panait, S. Luke, Alternative bloat control methods. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 630–641 (2004)

  42. R. Poli, A simple but theoretically-motivated method to control bloat in genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 204–217 (2003)

  43. R. Poli, Covariant tarpeian method for bloat control in genetic programming. in Genetic Programming Theory and Practice VIII, (Springer, 2011), pp. 71–89

  44. R. Poli, W. B. Langdon, S. Dignum, On the limiting distribution of program sizes in tree-based genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 193–204 (2007)

  45. R. Poli, W. B. Langdon, N. F. McPhee, A field guide to genetic programming. freely. http://www.gp-field-guide.org.uk (2008)

  46. R. Poli, N. F. McPhee, Exact schema theorems for gp with one-point and standard crossover operating on linear structures and their application to the study of the evolution of size. in Proceedings of the European Conference on Genetic Programming (EuroGP) (2001)

  47. R. Poli, N.F. McPhee, General schema theory for genetic programming with subtree-swapping crossover: Part II. Evol. Comput. 11(2), 169–206 (2003)

    Article  Google Scholar 

  48. R. Poli, N. F. McPhee, Parsimony pressure made easy. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) pp. 1267–1274 (2008)

  49. R.Y. Rubinstein, D.P. Kroese, Simulation and the Monte Carlo method (Wiley, New York, 2008)

    MATH  Google Scholar 

  50. S. Silva, Reassembling operator equalisation: a secret revealed. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1395–1402 (2011)

  51. S. Silva, E. Costa, Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories. Genet. Program. Evolv. Mach. 10(2), 141–179 (2009)

    Article  MathSciNet  Google Scholar 

  52. S. Silva, S. Dignum, Extending operator equalisation: fitness based self adaptive length distribution for bloat free GP. in Proc. of the European Conference on Genetic Programming (EuroGP), pp. 159–170 (2009)

  53. S. Silva, S. Dignum, L. Vanneschi, Operator equalisation for bloat free genetic programming and a survey of bloat control methods. Genet. Program. Evolv. Mach. 13(2), 197–238 (2011)

    Article  Google Scholar 

  54. S. Silva, L. Vanneschi, Operator equalisation, bloat and overfitting: a study on human oral bioavailability prediction. in Proceedings of Genetic and Evolutionary Computation Conference (GECCO), pp. 1115–1122 (2009)

  55. S. Silva, L. Vanneschi, The importance of being flat—studying the program length distributions of operator equalisation. in Genetic Programming Theory and Practice IX, (Springer, 2011), pp. 211–233

  56. T. Soule, J.A. Foster, Effects of code growth and parsimony pressure on populations in genetic programming. Evolut. Comput. 6(4), 293–309 (1998)

    Article  Google Scholar 

  57. T. Soule, R.B. Heckendorn, An analysis of the causes of code growth in genetic programming. Genet. Program. Evolv. Machin. 3(3), 283–309 (2002)

    Article  MATH  Google Scholar 

  58. W. Tackett, Recombination, selection, and the genetic construction of computer programs. PhD thesis, (University of Southern California, 1994)

  59. A. Teller, M. Veloso, Program evolution for data mining. Int. J. Expert Syst. Res. Appl. 8, 213–236 (1995)

    Article  Google Scholar 

  60. M. Tomassini, L. Vanneschi, J. Cuendet, F. Fernández, A new technique for dynamic size populations in genetic programming. in Proceedings of the Congress on Evolutionary Computation (CEC), 1, pp. 486–493 (2004)

  61. L. Trujillo, E. Naredo, Y. Martínez, Preliminary study of bloat in genetic programming with behavior-based search. in EVOLVE-A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation IV, (Springer, 2013) pp. 293–305

  62. L. Trujillo, S. Silva, P. Legrand, L. Vanneschi, An empirical study of functional complexity as an indicator of overfitting in genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 262–273 (2011)

  63. N.Q. Uy, N.X. Hoai, M. O’Neill, R.I. McKay, E. Galván-López, Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolv. Machin. 12(2), 91–119 (2011)

    Article  Google Scholar 

  64. L. Vanneschi, M. Castelli, S. Silva, Measuring bloat, overfitting and functional complexity in genetic programming. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 877–884 (2010)

  65. E.J. Vladislavleva, G.F. Smits, D. Den Hertog, Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Trans. Evolut. Comput. 13(2), 333–349 (2009)

    Article  Google Scholar 

  66. P.A. Whigham, G. Dick, Implicitly controlling bloat in genetic programming. IEEE Trans. Evolut. Comput. 14(2), 173–190 (2010)

    Article  Google Scholar 

  67. D.R. White, J. McDermott, M. Castelli, L. Manzoni, B.W. Goldman, G. Kronberger, W. Jaśkowski, U.-M. O’Reilly, S. Luke, Better GP benchmarks: community survey results and proposals. Genetic Program. Evolv. Mach. 14(1), 3–29 (2013)

    Article  Google Scholar 

  68. L. Wilkinson, A. Anand, D. N. Tuan, CHIRP: a new classifier based on composite hypercubes on iterated random projections. in Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp. 6–14 (2011)

  69. J. Yu, J. Yu, A.A. Almal, S.M. Dhanasekaran, D. Ghosh, W.P. Worzel, A.M. Chinnaiyan, Feature selection and molecular classification of cancer using genetic programming. Neoplasia 9(4), 292 (2007)

    Article  Google Scholar 

  70. M. Zhang, P. Wong, Genetic programming for medical classification: a program simplification approach. Genetic Program. Evolv. Mach. 9(3), 229–255 (2008)

    Article  Google Scholar 

Download references

Acknowledgments

We acknowledge the financial support of the NSERC (Canada) and FRQ-NT (Québec), and the access to supercomputing facilities of Calcul Québec / Compute Canada. We also thank Annette Schwerdtfeger for proofreading the manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Marc-André Gardner.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10710_2015_9242_MOESM1_ESM.pdf

Appendix A details the sensitivity analysis results reported in Sec. 5.1. Appendix B exposes the general experimental methodology followed, makes a detailed description of each problem and previous results obtained on it, and presents plots and tables detailing the experimental results obtained for each of these problems.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gardner, MA., Gagné, C. & Parizeau, M. Controlling code growth by dynamically shaping the genotype size distribution. Genet Program Evolvable Mach 16, 455–498 (2015). https://doi.org/10.1007/s10710-015-9242-8

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10710-015-9242-8

Keywords

Navigation