Abstract
This paper presents an error analysis for classification algorithms generated by regularization schemes with polynomial kernels, and provides explicit convergence rates for support vector machine (SVM) soft margin classifiers. The misclassification error is bounded by the sum of the sample error and the regularization error. The main difficulty in studying algorithms with polynomial kernels is the regularization error, which depends intricately on the degrees of the kernel polynomials. We overcome this difficulty by bounding the reproducing kernel Hilbert space norm of Durrmeyer operators and by estimating the rate of approximation by Durrmeyer operators in a weighted L^1 space (where the weight is a probability distribution). Our analysis shows that the regularization parameter should decrease exponentially fast with the sample size, a special feature of polynomial kernels.
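To make the shape of the argument concrete, the standard decomposition underlying the analysis can be sketched as follows; the symbols (f_z for the classifier produced from the sample z, f_c for the Bayes rule, \lambda for the regularization parameter, m for the sample size) are illustrative assumptions, not notation fixed by the abstract:

\[
  \mathcal{R}(\operatorname{sgn} f_z) - \mathcal{R}(f_c) \;\le\; \mathcal{S}(z,\lambda) + \mathcal{D}(\lambda),
  \qquad
  \lambda = \lambda(m) \asymp e^{-c\,m^{\theta}},
\]

where \mathcal{R} denotes the misclassification error, \mathcal{S}(z,\lambda) the sample error, and \mathcal{D}(\lambda) the regularization error. Balancing the two terms forces an exponential schedule for the regularization parameter; the displayed decay \lambda(m) \asymp e^{-c\,m^{\theta}} (with assumed constants c, \theta > 0 reflecting the degree of the kernel polynomial) is only a schematic form of the paper's claim, and the precise exponents are the content of its convergence rates.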
Additional information
Communicated by Y. Xu
Dedicated to Charlie Micchelli on the occasion of his 60th birthday
Mathematics subject classifications (2000): 68T05, 62J02.
Ding-Xuan Zhou: The first author is partially supported by the Research Grants Council of Hong Kong (Project No. CityU 103704).
Cite this article
Zhou, DX., Jetter, K. Approximation with polynomial kernels and SVM classifiers. Adv Comput Math 25, 323–344 (2006). https://doi.org/10.1007/s10444-004-7206-2