Bayes and Tukey Meet at the Center Point

  • Conference paper
Learning Theory (COLT 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3120)

Abstract

The Bayes classifier achieves the minimal error rate by constructing a weighted majority over all concepts in the concept class. The Bayes Point [1] uses the single concept in the class which has the minimal error. This way, the Bayes Point avoids some of the deficiencies of the Bayes classifier. We prove a bound on the generalization error for Bayes Point Machines when learning linear classifiers, and show that it is at most ~1.71 times the generalization error of the Bayes classifier, independent of the input dimension and the length of the training sequence. We show that when learning linear classifiers, the Bayes Point is almost identical to the Tukey Median [2] and the Center Point [3]. We extend these definitions beyond linear classifiers and define the Bayes Depth of a classifier. We prove a generalization bound in terms of this new definition. Finally, we provide a new concentration of measure inequality for multivariate random variables and apply it to the Tukey Median.
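The following toy experiment (a minimal Python sketch, not code from the paper; the sampling scheme, sample sizes, and all variable names are assumptions made for illustration) contrasts the two objects the abstract compares: the Bayes classifier, realized as a majority vote over concepts sampled from the version space, and a single Bayes Point weight vector, here approximated by the sample's center of mass as in the Bayes Point Machine literature [1].

```python
import numpy as np

# Toy illustration (not from the paper): compare the Bayes classifier
# (majority vote over the version space) with a single "Bayes point"
# weight vector, for homogeneous linear classifiers in R^3.
rng = np.random.default_rng(0)
d, n_train, n_test = 3, 10, 5000

teacher = rng.standard_normal(d)               # hidden target concept
X_train = rng.standard_normal((n_train, d))
y_train = np.sign(X_train @ teacher)
X_test = rng.standard_normal((n_test, d))
y_test = np.sign(X_test @ teacher)

# Rejection-sample the version space: unit weight vectors consistent
# with every training label.  Crude, but adequate for a toy example.
accepted = []
while len(accepted) < 501:                     # odd count avoids vote ties
    W_try = rng.standard_normal((4096, d))
    ok = np.all(np.sign(W_try @ X_train.T) == y_train, axis=1)
    accepted.extend(W_try[ok])
W = np.array(accepted[:501])
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Bayes classifier: (here uniform) majority vote over sampled concepts.
bayes_pred = np.sign(np.sign(X_test @ W.T).sum(axis=1))

# Bayes point: one concept standing in for the whole vote; approximated
# by the center of mass of the version-space sample, following [1].
w_bp = W.mean(axis=0)
bp_pred = np.sign(X_test @ w_bp)

print("Bayes classifier error:", np.mean(bayes_pred != y_test))
print("Bayes point error:     ", np.mean(bp_pred != y_test))
```

On typical runs the two error rates are close, in line with the abstract's claim that the Bayes Point costs at most a factor of ~1.71 over the Bayes classifier; the paper's actual argument proceeds through the Tukey depth of the Bayes Point in the version space, not through sampling.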


References

  1. Herbrich, R., Graepel, T., Campbell, C.: Bayes point machines. Journal of Machine Learning Research (2001)

  2. Tukey, J.: Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, vol. 2, pp. 523–531 (1975)

  3. Matoušek, J.: Lectures on Discrete Geometry. Springer, Heidelberg (2002)

  4. Caplin, A., Nalebuff, B.: Aggregation and social choice: A mean voter theorem. Econometrica 59, 1–23 (1991)

  5. Vapnik, V.: Statistical Learning Theory. Wiley, Chichester (1998)

  6. Novikoff, A.B.J.: On convergence proofs on perceptrons. In: Proceedings of the Symposium on the Mathematical Theory of Automata, vol. 12, pp. 615–622 (1962)

  7. Donoho, D., Gasko, M.: Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Annals of Statistics 20, 1803–1827 (1992)

  8. Bagnoli, M., Bergstrom, T.: Log-concave probability and its applications (1989), http://www.econ.ucsb.edu/~tedb/Theory/logconc.ps

  9. Ledoux, M.: The Concentration of Measure Phenomenon. American Mathematical Society, Providence (2001)

  10. Bertsimas, D., Vempala, S.: Solving convex programs by random walks. In: STOC, pp. 109–115 (2002)

  11. Freund, Y., Seung, H., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Machine Learning 28, 133–168 (1997)

  12. Haussler, D., Kearns, M., Schapire, R.E.: Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Machine Learning 14, 83–113 (1994)

  13. Vapnik, V., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16, 264–280 (1971)

  14. Bartlett, P., Mendelson, S.: Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 3, 463–482 (2002)

  15. Littlestone, N.: Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. In: 28th Annual Symposium on Foundations of Computer Science, pp. 68–77 (1987)

  16. Talagrand, M.: Concentration of measure and isoperimetric inequalities in product spaces. Publ. Math. I.H.E.S. 81, 73–205 (1995)

  17. Zuo, Y., Serfling, R.: General notions of statistical depth function. The Annals of Statistics 28, 461–482 (2000)

  18. Prékopa, A.: Logarithmic concave measures with applications to stochastic programming. Acta Sci. Math. (Szeged) 32, 301–315 (1971)

  19. Borell, C.: Convex set functions in d-space. Periodica Mathematica Hungarica 6, 111–136 (1975)


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gilad-Bachrach, R., Navot, A., Tishby, N. (2004). Bayes and Tukey Meet at the Center Point. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_38

  • DOI: https://doi.org/10.1007/978-3-540-27819-1_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22282-8

  • Online ISBN: 978-3-540-27819-1

  • eBook Packages: Springer Book Archive
