Abstract
Developing new test statistics is laborious: candidate mathematical functions must be derived and evaluated by hand, and each must satisfy several theoretical properties. Automating this process, which has not previously been done, would greatly accelerate the discovery of much-needed new test statistics. Automation is challenging because the discovery method must know something about the desirable properties of a good test statistic and must also have an engine that can develop and explore candidate mathematical solutions with an intuitive representation. In this paper we describe a genetic programming-based system for the automated discovery of new test statistics. Specifically, our system discovered test statistics as powerful as the t-test for comparing sample means from two distributions with equal variances.
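To make the benchmark concrete, the following minimal sketch computes the classical equal-variance (pooled) two-sample t statistic and estimates the power of any candidate statistic by Monte Carlo simulation. It is an illustration only and not the system described in this paper: the function names (`pooled_t`, `simulate`, `estimate_power`), the normal data model, the sample size, the mean shift, and the use of an empirical critical value are all assumptions chosen for the example.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    nx, ny = len(x), len(y)
    # Pooled variance: degrees-of-freedom-weighted average of the sample variances.
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def simulate(stat, shift, n, reps, rng):
    """Return |stat| for `reps` pairs of unit-variance normal samples.

    The first sample has mean `shift`, the second has mean 0 (assumed data model).
    """
    values = []
    for _ in range(reps):
        x = [rng.gauss(shift, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(abs(stat(x, y)))
    return values


def estimate_power(stat, shift=1.0, n=20, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo power of a two-sided test that rejects for large |stat|."""
    rng = random.Random(seed)
    # Null distribution: both samples share the same mean.
    null = sorted(simulate(stat, 0.0, n, reps, rng))
    # Empirical critical value: approximately the (1 - alpha) quantile under the null.
    critical = null[int((1 - alpha) * reps) - 1]
    # Power: fraction of alternative-hypothesis replicates exceeding the critical value.
    alt = simulate(stat, shift, n, reps, rng)
    return sum(v > critical for v in alt) / reps


if __name__ == "__main__":
    print(estimate_power(pooled_t))
```

Because the critical value is taken from a simulated null distribution rather than from the t distribution, the same `estimate_power` function can be applied to any function of two samples, which is one plausible way to compare a newly proposed statistic against the t-test on equal terms.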
Acknowledgements
This work was supported by National Institutes of Health (USA) grants LM012601, AI116794, and DK112217. We would like to thank the reviewers for their thoughtful suggestions.
About this article
Cite this article
Moore, J.H., Olson, R.S., Chen, Y. et al. Automated discovery of test statistics using genetic programming. Genet Program Evolvable Mach 20, 127–137 (2019). https://doi.org/10.1007/s10710-018-9338-z
DOI: https://doi.org/10.1007/s10710-018-9338-z