August 2023
Learning low-dimensional nonlinear structures from high-dimensional noisy data: An integral operator approach
Xiucai Ding, Rong Ma
Ann. Statist. 51(4): 1744-1769 (August 2023). DOI: 10.1214/23-AOS2306

Abstract

We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from noisy and high-dimensional observations, where the data sets are assumed to be sampled from a nonlinear manifold model and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure that does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, for a general class of kernel functions, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension grows polynomially with the sample size, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove the convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Our results hold even when the dimension of the manifold grows with the sample size. Numerical simulations and analysis of real data sets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various nonlinear manifolds in diverse applications.
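
To make the general idea of a kernel-spectral embedding with a data-driven bandwidth concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a Gaussian kernel and a bandwidth set to a percentile of the pairwise squared distances; the function name `kernel_spectral_embedding`, the `percentile` parameter, and the synthetic noisy-circle data are all hypothetical choices made for illustration, and the paper's exact kernel class, bandwidth rule, and embedding scaling may differ.

```python
import numpy as np


def kernel_spectral_embedding(Y, n_components=2, percentile=50.0):
    """Illustrative sketch: embed rows of Y via leading kernel eigenvectors.

    Y           : (n, p) array of noisy high-dimensional observations.
    percentile  : hypothetical tuning parameter; the bandwidth is taken as
                  this percentile of the pairwise squared distances, so no
                  knowledge of the underlying manifold is needed.
    Returns (embedding, eigenvalues, bandwidth).
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]

    # Pairwise squared Euclidean distances between observations.
    sq_norms = np.sum(Y ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    D2 = np.maximum(D2, 0.0)      # clip tiny negatives from round-off
    np.fill_diagonal(D2, 0.0)

    # Data-driven bandwidth computed from the off-diagonal distances only.
    off_diag = D2[np.triu_indices(n, k=1)]
    h = np.percentile(off_diag, percentile)

    # Gaussian kernel matrix (an assumed kernel choice), scaled by 1/n.
    K = np.exp(-D2 / h) / n

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order], h


# Synthetic example (assumed setup): a circle embedded in p dimensions
# and corrupted by isotropic Gaussian noise in every coordinate.
rng = np.random.default_rng(0)
n, p = 200, 50
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
signal = np.zeros((n, p))
signal[:, 0] = 5.0 * np.cos(theta)
signal[:, 1] = 5.0 * np.sin(theta)
Y = signal + rng.normal(size=(n, p))

embedding, eigvals, h = kernel_spectral_embedding(Y, n_components=3)
```

The columns of `embedding` are orthonormal eigenvectors of the scaled kernel matrix; in a sketch like this they would serve as the low-dimensional coordinates passed on to visualization or clustering.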

Acknowledgments

The authors would like to thank the Editor, the Associate Editor and two anonymous reviewers for their suggestions and comments, which have resulted in a significant improvement of the manuscript.

Citation

Xiucai Ding, Rong Ma. "Learning low-dimensional nonlinear structures from high-dimensional noisy data: An integral operator approach." Ann. Statist. 51 (4) 1744-1769, August 2023. https://doi.org/10.1214/23-AOS2306

Information

Received: 1 September 2022; Revised: 1 March 2023; Published: August 2023
First available in Project Euclid: 19 October 2023

Digital Object Identifier: 10.1214/23-AOS2306

Subjects:
Primary: 62R07, 62R30
Secondary: 47G10

Keywords: high-dimensional data, kernel method, manifold learning, nonlinear dimension reduction, spectral method

Rights: Copyright © 2023 Institute of Mathematical Statistics

JOURNAL ARTICLE
26 PAGES

