Characterizing rational versus exponential learning curves

  • Conference paper
  • Conference: Computational Learning Theory (EuroCOLT 1995)
  • Book series: Lecture Notes in Computer Science (LNAI, volume 904)

Abstract

We consider the standard problem of learning a concept from random examples. Here a learning curve can be defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a concept class C converges rationally to zero error (i.e., Θ(1/t) for training sample size t). However, Cohn and Tesauro have recently demonstrated that exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^{−Θ(t)}). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. These results support the experimental findings of Cohn and Tesauro: for finite concept classes, any consistent learner achieves exponential convergence, even in the worst case; but for continuous concept classes, no learner can exhibit sub-rational convergence for every target concept and domain distribution. A precise boundary between rational and exponential convergence is drawn for simple concept chains: somewhere dense chains always force rational convergence in the worst case, whereas exponential convergence can always be achieved for nowhere dense chains.
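
To see why finiteness yields exponential convergence for a fixed target c and distribution (the non-uniformity at issue above), note that a consistent learner can only output a hypothesis h that agrees with all t examples, which happens with probability (1 − err(h))^t; a union bound over the finitely many hypotheses gives E[error] ≤ Σ_{h ≠ c} err(h)·(1 − err(h))^t, which decays as e^{−Θ(t)} because the smallest nonzero err(h) is a fixed positive constant. The following Monte Carlo sketch in Python illustrates the dichotomy empirically. It is not taken from the paper: the choice of threshold concepts on [0, 1], the uniform example distribution, and the grid size GRID are all illustrative assumptions.

import math
import random

GRID = 20      # finite class: thresholds {0, 1/GRID, 2/GRID, ..., 1}
THETA = 0.5    # target threshold (a grid point, so it lies in both classes)
TRIALS = 5000  # Monte Carlo repetitions per sample size

def learn(sample, finite):
    """Return a threshold consistent with the labelled sample;
    restrict the hypothesis to the grid class when finite is True."""
    pos = [x for x, y in sample if y]      # examples labelled 1 (x >= THETA)
    neg = [x for x, y in sample if not y]  # examples labelled 0
    lo = max(neg, default=0.0)             # hypothesis must lie above every negative
    hi = min(pos, default=1.0)             # and at or below every positive
    if not finite:
        return hi                          # smallest consistent real threshold
    # Smallest grid point strictly above lo; THETA itself is a consistent
    # grid point in (lo, hi], so this choice is always consistent as well.
    return (math.floor(lo * GRID) + 1) / GRID

for t in (10, 20, 40, 80, 160):
    for finite in (True, False):
        total = 0.0
        for _ in range(TRIALS):
            xs = [random.random() for _ in range(t)]
            sample = [(x, x >= THETA) for x in xs]
            total += abs(learn(sample, finite) - THETA)
        kind = "finite grid" if finite else "continuous"
        print(f"t={t:4d}  {kind:11s}  E[error] ~ {total / TRIALS:.5f}")

On a typical run the finite-class error falls off geometrically (roughly like 0.95^t here, since a single negative example landing within 1/GRID below the target pins the hypothesis exactly), while the continuous-class error shrinks only on the order of 1/t.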

References

  1. S. Amari, N. Fujita, and S. Shinomoto. Four types of learning curves. Neural Computation, 4:605–618, 1992.

  2. R. B. Ash. Real Analysis and Probability. Academic Press, San Diego, 1972.

  3. E. B. Baum and Y.-D. Lyuu. The transition to perfect generalization in perceptrons. Neural Computation, 3:386–401, 1991.

  4. R. Brualdi. Introductory Combinatorics. North-Holland, New York, 1977.

  5. D. Cohn and G. Tesauro. Can neural networks do better than the Vapnik-Chervonenkis bounds? In D. Touretzky, editor, Advances in Neural Information Processing Systems 3. Morgan Kaufmann, San Mateo, CA, 1990.

  6. D. Cohn and G. Tesauro. How tight are the Vapnik-Chervonenkis bounds? Neural Computation, 4:249–269, 1992.

  7. M. Golea and M. Marchand. Average case analysis of the clipped Hebb rule for nonoverlapping Perceptron networks. In Proceedings COLT-93, pages 151–157, 1993.

  8. D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. In Proceedings COLT-94, 1994.

  9. D. Haussler, N. Littlestone, and M. K. Warmuth. Predicting {0,1}-functions on randomly drawn points. In Proceedings COLT-88, pages 280–296, 1988.

  10. D. Haussler, N. Littlestone, and M. K. Warmuth. Predicting {0,1}-functions on randomly drawn points. Technical Report UCSC-CRL-90-54, Computer Research Laboratory, University of California at Santa Cruz, 1990.

  11. R. J. Larsen and M. L. Marx. An Introduction to Mathematical Statistics and its Applications. Prentice-Hall, Englewood Cliffs, NJ, 1981.

  12. M. Opper and D. Haussler. Generalization performance of Bayes optimal classification algorithm for learning a Perceptron. Physical Review Letters, 66(20):2677–2680, 1991.

  13. M. J. Pazzani and W. Sarrett. Average case analysis of conjunctive learning algorithms. In Proceedings ML-90, pages 339–347, 1990.

  14. J. G. Rosenstein. Linear Orderings. Academic Press, New York, 1982.

  15. D. B. Schwartz, V. K. Samalam, S. A. Solla, and J. S. Denker. Exhaustive learning. Neural Computation, 2:374–385, 1990.

  16. H. S. Seung, H. Sompolinsky, and N. Tishby. Learning curves in large neural networks. In Proceedings COLT-91, pages 112–127, 1991.

  17. L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

Author information

D. Schuurmans

Editor information

Paul Vitányi

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schuurmans, D. (1995). Characterizing rational versus exponential learning curves. In: Vitányi, P. (ed.) Computational Learning Theory. EuroCOLT 1995. Lecture Notes in Computer Science, vol 904. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59119-2_184

  • DOI: https://doi.org/10.1007/3-540-59119-2_184

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-59119-1

  • Online ISBN: 978-3-540-49195-8
