The Racing Algorithm: Model Selection for Lazy Learners

Artificial Intelligence Review

Abstract

Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is computationally expensive, especially if the number of testing points is high or the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are helpful, but they can end up with a model that is arbitrarily worse than the best one, or they cannot be used at all because there is no distance metric on the space of discrete models. In this paper we develop a technique called “racing” that tests the set of models in parallel, quickly discards those models that are clearly inferior, and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners, since training requires negligible expense and incremental testing using leave-one-out cross validation is efficient. We use racing to select among various lazy learning algorithms and to find relevant features in applications ranging from robot juggling to lesion detection in MRI scans.
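The racing idea can be illustrated with a minimal sketch. The abstract does not say which statistical test decides that a model is "clearly inferior", so the sketch assumes a Hoeffding bound on the running mean leave-one-out error, with errors bounded in [0, error_range]. All names here (`race`, `hoeffding_epsilon`, `knn_loo_error`, the parameter `delta`) are hypothetical, the bound is applied per model without any correction for multiple comparisons, and the k-nearest-neighbour learner on toy one-dimensional data only stands in for a lazy learner. It is not the authors' implementation.

```python
import math
import random


def hoeffding_epsilon(n, error_range, delta):
    """Half-width of a two-sided Hoeffding interval after n bounded samples."""
    return error_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def race(models, points, loo_error, error_range=1.0, delta=0.05, seed=0):
    """Race models on leave-one-out errors and return {name: mean error} for survivors.

    models:    dict mapping a name to a model description (assumed interface)
    loo_error: function (model, points, i) -> error on points[i] when the
               model is built from all other points; must lie in [0, error_range]
    """
    if not points:
        raise ValueError("need at least one test point")
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)        # test points in random order
    alive = set(models)
    totals = dict.fromkeys(models, 0.0)
    for n, i in enumerate(order, start=1):
        # Every surviving model is tested on the same point "in parallel".
        for name in alive:
            totals[name] += loo_error(models[name], points, i)
        eps = hoeffding_epsilon(n, error_range, delta)
        means = {name: totals[name] / n for name in alive}
        best_upper = min(means.values()) + eps
        # Discard a model once its lower bound exceeds the best upper bound.
        alive = {name for name in alive if means[name] - eps <= best_upper}
        if len(alive) == 1:
            break
    return {name: totals[name] / n for name in alive}


def knn_loo_error(k, points, i):
    """0/1 leave-one-out error of k-nearest-neighbour voting on (x, label) pairs."""
    x0, y0 = points[i]
    others = points[:i] + points[i + 1:]      # "training" is just omitting point i
    nearest = sorted(others, key=lambda p: abs(p[0] - x0))[:k]
    votes = sum(label for _, label in nearest)
    prediction = 1 if 2 * votes > len(nearest) else 0
    return float(prediction != y0)


# Toy data: label is 1 for x >= 20, else 0. Each model is a choice of k.
points = [(float(x), int(x >= 20)) for x in range(40)]
survivors = race({1: 1, 3: 3, 39: 39}, points, knn_loo_error)
print(sorted(survivors))
```

Because a lazy learner defers all work to query time, leaving one point out costs nothing beyond excluding it from the neighbour search, which is why each model's error estimate can be updated one test point at a time. Models whose intervals still overlap when the points run out all survive; the sketch then leaves the final choice among them to the caller.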

Cite this article

Maron, O., Moore, A.W. The Racing Algorithm: Model Selection for Lazy Learners. Artificial Intelligence Review 11, 193–225 (1997). https://doi.org/10.1023/A:1006556606079
