Abstract
Within the framework of pac-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size \(k\) for a concept class \(C \subseteq 2^X\) consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in \(C\) and chooses a subset of \(k\) examples as the compression set. The reconstruction function forms a hypothesis on \(X\) from a compression set of \(k\) examples. For any sample set of a concept in \(C\), the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a fixed-size sample compression scheme for a class \(C\) is sufficient to ensure that \(C\) is pac-learnable.
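As a concrete illustration (not taken from the paper itself), the following sketch gives a sample compression scheme of size 2 for the class of closed intervals on the real line: the compression function keeps only the leftmost and rightmost positive examples, and the reconstruction function returns the interval they span. The function names `compress` and `reconstruct` are illustrative.

```python
def compress(sample):
    """Choose at most 2 examples that determine a consistent interval.

    `sample` is a list of (x, label) pairs assumed consistent with
    some closed interval [a, b] (label True iff a <= x <= b).
    """
    positives = [x for x, label in sample if label]
    if not positives:
        return []  # empty compression set -> empty-interval hypothesis
    return [(min(positives), True), (max(positives), True)]

def reconstruct(compression_set):
    """Build a hypothesis (a membership predicate) from a compression set."""
    if not compression_set:
        return lambda x: False
    lo = min(x for x, _ in compression_set)
    hi = max(x for x, _ in compression_set)
    return lambda x: lo <= x <= hi

# The reconstructed hypothesis is consistent with the whole original sample:
sample = [(-3.0, False), (-1.0, True), (0.5, True), (2.0, True), (4.0, False)]
h = reconstruct(compress(sample))
assert all(h(x) == label for x, label in sample)
```

Consistency holds because the reconstructed interval \([\min, \max]\) of the positives is contained in the target interval, so it accepts every positive example and rejects every negative one.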
Previous work has shown that a class is pac-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class is finite. In the second half of this paper we explore the relationship between sample compression schemes and the VC dimension. We define maximum and maximal classes of VC dimension \(d\). For every maximum class of VC dimension \(d\), there is a sample compression scheme of size \(d\), and for sufficiently large maximum classes there is no sample compression scheme of size less than \(d\). We briefly discuss classes of VC dimension \(d\) that are maximal but not maximum. It is an open question whether every class of VC dimension \(d\) has a sample compression scheme of size \(O(d)\).
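Continuing the interval example above, the VC dimension of a class can be checked on small point sets by brute force: a set of points is shattered if every labeling of it is realized by some concept. For intervals it suffices to consider candidate intervals whose endpoints are drawn from the points themselves (any consistent interval can be shrunk to its extremal positives), which shows two points are shattered but three are not, i.e. the class has VC dimension 2. The helper names below are illustrative.

```python
def interval_concepts(points):
    """Candidate intervals with endpoints among `points`, plus the empty concept.

    For the interval class, these realize every labeling that any
    interval can realize on `points`.
    """
    cs = [lambda x, lo=lo, hi=hi: lo <= x <= hi
          for lo in points for hi in points]
    cs.append(lambda x: False)
    return cs

def shatters(points, concepts):
    """True iff `concepts` realize all 2^n labelings of `points`."""
    labelings = {tuple(c(x) for x in points) for c in concepts}
    return len(labelings) == 2 ** len(points)

# Two points are shattered; three are not (the labeling +,-,+ is unrealizable):
assert shatters([0.0, 1.0], interval_concepts([0.0, 1.0]))
assert not shatters([0.0, 1.0, 2.0], interval_concepts([0.0, 1.0, 2.0]))
```

Note the `lo=lo, hi=hi` default arguments in the lambda: they bind the loop variables at definition time, which is required for the list of concepts to behave correctly in Python.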
Additional information
S. Floyd was supported in part by the Director, Office of Energy Research, Scientific Computing Staff, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098.
M. Warmuth was supported by ONR grants N00014-K-86-K-0454 and N00014-91-J-1162 and NSF grant IRI-9123692.
Cite this article
Floyd, S., Warmuth, M. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Mach Learn 21, 269–304 (1995). https://doi.org/10.1007/BF00993593