Alleviating overfitting in transformation-interaction-rational symbolic regression with multi-objective optimization

Genetic Programming and Evolvable Machines

Abstract

The Transformation-Interaction-Rational is a representation for symbolic regression that limits the search space to functions expressed as the ratio of two nonlinear functions, each defined as a linear regression of transformed variables. Its main objective is to bias the search towards simpler expressions while keeping the approximation power of standard approaches. Genetic Programming with this representation performed substantially better than with its predecessor (Interaction-Transformation) and ranked close to the state of the art on a contemporary symbolic regression benchmark. A closer look at these results showed that performance could be further improved by an additional selective pressure for smaller expressions when the dataset contains just a few data points. Introducing a penalization term applied to the fitness measure improved the results on these smaller datasets. One problem with this approach is that it introduces two additional hyperparameters: (i) a criterion for when the penalization should be activated and (ii) the amount of penalization applied to the fitness function. One possible way to alleviate the burden of setting these hyperparameters correctly is to pose the search as a multi-objective optimization problem that minimizes both the approximation error and the expression size. The main idea is that the selective pressure towards non-dominated solutions will return the simplest model for each particular approximation error on the Pareto front. In this paper, we extend Transformation-Interaction-Rational to support multi-objective optimization, specifically the NSGA-II algorithm, and apply it to the same benchmark. A detailed analysis of the results shows that multi-objective optimization benefits the overall performance on a subset of the benchmarks while keeping the results similar to the single-objective approach on the remaining datasets. For the small datasets specifically, we observe a small (and statistically insignificant) improvement, suggesting that further strategies must be explored.
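
To make the idea of non-dominated solutions concrete, the following is a minimal sketch of extracting a Pareto front when both the approximation error and the expression size are minimized. It is not the paper's implementation and it is not a full NSGA-II (there is no ranking into successive fronts and no crowding distance); the names `Candidate`, `dominates`, and `pareto_front`, as well as the example (error, size) pairs, are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Each candidate is (approximation_error, expression_size); both are minimized.
# This tuple layout is an assumption for the sketch, not the paper's data model.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by expression size."""
    front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
    # set() drops exact duplicates before sorting by size.
    return sorted(set(front), key=lambda c: c[1])


# Hypothetical (error, size) pairs, invented for illustration only.
population = [(0.50, 3), (0.20, 7), (0.25, 7), (0.20, 12), (0.05, 15), (0.60, 5)]
front = pareto_front(population)
print(front)
```

In this toy population, the candidate with error 0.20 and size 12 is discarded because another candidate reaches the same error with size 7, which mirrors the statement that the front keeps the simplest model for each level of approximation error. No penalization weight or activation criterion has to be chosen to obtain this effect.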

Notes

  1. https://cavalab.org/srbench/.

Acknowledgements

This project is funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Grant Number 2021/12706-1 and CNPq through the Grant 301596/2022-0.

Author information

Contributions

FOF wrote the main manuscript, implemented the algorithm, executed the experiments, and prepared all figures and tables.

Corresponding author

Correspondence to Fabrício Olivetti de França.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

de França, F.O. Alleviating overfitting in transformation-interaction-rational symbolic regression with multi-objective optimization. Genet Program Evolvable Mach 24, 13 (2023). https://doi.org/10.1007/s10710-023-09461-3

  • DOI: https://doi.org/10.1007/s10710-023-09461-3
