Abstract
Partial differential equations and the Laplacian operator on domains in Euclidean spaces have played a central role in understanding natural phenomena. However, this avenue has been limited in many areas where calculus is obstructed, as in singular spaces, and in spaces of functions on a space X where X is itself a function space. Examples of the latter occur in vision and quantum field theory. In vision, it would be useful to do analysis on the space of images, where an image is itself a function on a patch. Moreover, in analysis and geometry, the Lebesgue measure and its counterpart on manifolds are central. These measures are unavailable in the vision example, and indeed in learning theory in general.
There is one situation in which the problem has been studied with some success over the last several decades: when the underlying space is finite (or even discrete). The introduction of the graph Laplacian has been a major development in algorithm research and is certainly useful for unsupervised learning theory.
The approach taken here is to take advantage of both the classical research and the newer graph-theoretic ideas to develop geometry on probability spaces. This starts with a space X equipped with a kernel (such as a Mercer kernel), which gives a topology and geometry; X is also equipped with a probability measure. The main focus is on the construction of a (normalized) Laplacian, an associated heat equation, diffusion distance, etc. In this setting, the point estimates of calculus are replaced by integral quantities. One thinks of secants rather than tangents. Our main result bounds the error of an empirical approximation to this Laplacian on X.
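The construction is described above only in words. The following is a minimal sketch of one way to realize it on m samples x_1, ..., x_m drawn from the probability measure. It assumes a Gaussian kernel (one example of a Mercer kernel), the symmetric normalization K(x,y)/sqrt(p(x)p(y)) with p(x) = ∫ K(x,y) dρ(y), and a row-distance form of diffusion distance. These are illustrative assumptions, and the paper's exact definitions may differ in detail. The function names and the bandwidth `sigma` are hypothetical. Each integral against ρ is replaced by an average over the sample, which is the kind of empirical approximation whose error the main result bounds.

```python
import numpy as np


def gaussian_kernel(X, sigma=1.0):
    """Hypothetical Mercer kernel: K[i, j] = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-d2 / (2.0 * sigma**2))


def empirical_normalized_laplacian(X, sigma=1.0):
    """Sample version of L = I - L_{K~}, assuming K~(x, y) = K(x, y) / sqrt(p(x) p(y)).

    p(x_i) = (1/m) sum_j K(x_i, x_j) estimates the integral of K(x_i, .) against rho,
    and (1/m) sum_j replaces the integral in (L_{K~} f)(x_i).
    """
    m = X.shape[0]
    K = gaussian_kernel(X, sigma)
    p = K.mean(axis=1)                     # empirical p(x_i)
    K_tilde = K / np.sqrt(np.outer(p, p))  # normalized kernel evaluated on the sample
    return np.eye(m) - K_tilde / m


def heat_operator(L, t):
    """exp(-t L) for symmetric L, computed from its eigendecomposition."""
    evals, evecs = np.linalg.eigh(L)
    return (evecs * np.exp(-t * evals)) @ evecs.T


def diffusion_distance(L, t, i, j):
    """Assumed variant: Euclidean distance between rows i and j of exp(-t L)."""
    H = heat_operator(L, t)
    return float(np.linalg.norm(H[i] - H[j]))
```

For example, `X = np.random.default_rng(0).normal(size=(50, 2))` followed by `L = empirical_normalized_laplacian(X, sigma=0.5)` gives a 50 × 50 symmetric matrix. The 1/m factors cancel, so this matrix coincides with the normalized graph Laplacian I − D^{-1/2} K D^{-1/2} of the sample, where D is the diagonal matrix of row sums of K. This is one concrete link between the kernel-based construction and the graph Laplacian. Because the Gaussian kernel matrix is positive semidefinite with positive entries, the eigenvalues of L lie in [0, 1], and sqrt(p) spans a null vector.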
Additional information
Communicated by Mauro Maggioni.
Cite this article
Smale, S., Zhou, DX. Geometry on Probability Spaces. Constr Approx 30, 311–323 (2009). https://doi.org/10.1007/s00365-009-9070-2
Keywords
- Learning theory
- Reproducing kernel Hilbert space
- Graph Laplacian
- Dimensionality reduction
- Integral operator