Elsevier

Neurocomputing

Volume 97, 15 November 2012, Pages 251-266
A new embedding quality assessment method for manifold learning

https://doi.org/10.1016/j.neucom.2012.05.013

Abstract

Manifold learning is an active research topic in computer science. A crucial issue with current manifold learning methods is that they lack a natural quantitative measure to assess the quality of learned embeddings, which greatly limits their application to real-world problems. In this paper, a new embedding quality assessment method for manifold learning, named normalization independent embedding quality assessment (NIEQA), is proposed. Compared with current assessment methods, which are limited to isometric embeddings, the NIEQA method has a much larger application range due to two features. First, it is based on a new measure that can effectively evaluate how well local neighborhood geometry is preserved under normalization; hence it can be applied to both isometric and normalized embeddings. Second, it can provide both local and global evaluations to output an overall assessment. Therefore, NIEQA can serve as a natural tool in model selection and evaluation tasks for manifold learning. Experimental results on benchmark data sets validate the effectiveness of the proposed method.

Introduction

With advances in techniques for collecting and storing large sets of high-dimensional data, processing such data efficiently poses a challenge for many fields in computer science, such as pattern recognition, visual understanding and data mining. The key problem is caused by “the curse of dimensionality” [1]: when handling such data, the computational complexity of algorithms often grows exponentially with the dimension.

The main approach to address this issue is to perform dimensionality reduction. Classical linear methods, such as principal component analysis (PCA) [2], [3] and multidimensional scaling (MDS) [4], achieve their success under the assumption that data lie in a linear subspace. However, this assumption often does not hold, and a more realistic assumption is that data lie on or close to a low-dimensional manifold embedded in the high-dimensional ambient space. Recently, many methods have been proposed to efficiently find meaningful low-dimensional embeddings from manifold-modeled data, and they form a family of dimensionality reduction methods called manifold learning. Representative methods include locally linear embedding (LLE) [5], [6], ISOMAP [7], [8], Laplacian eigenmap (LE) [9], [10], Hessian LLE (HLLE) [11], diffusion maps (DM) [12], [13], local tangent space alignment (LTSA) [14], maximum variance unfolding (MVU) [15], and Riemannian manifold learning (RML) [16].

Manifold learning methods have drawn great research interest due to their nonlinear nature, simple intuition, and computational simplicity. They also have many successful applications, such as motion detection [17], sample preprocessing [18], gait analysis [19], facial expression recognition [20], hyperspectral imagery processing [21], and visual tracking [22].

Despite these successes, a crucial issue with current manifold learning methods is that they lack a natural measure to assess the quality of learned embeddings. In supervised learning tasks such as classification, the classification rate can be obtained directly from label information and used as a natural tool to evaluate the performance of the classifier. However, manifold learning methods are fully unsupervised and the intrinsic degrees of freedom underlying high-dimensional data are unknown. Therefore, after the training process, we cannot directly assess the quality of a learned embedding. As a consequence, model selection and model evaluation are infeasible. Although visual inspection of the embedding provides an intuitive, qualitative assessment, it cannot provide a quantitative evaluation. Moreover, it cannot be used for embeddings with more than three dimensions.

Recently, several approaches have been proposed to address the issue of embedding quality assessment for manifold learning. They fall into two categories according to their motivation.

  • Methods based on evaluating how well the rank of neighbor samples, according to pairwise Euclidean distances, is preserved within each local neighborhood.

  • Methods based on evaluating how well each local neighborhood matches its corresponding embedding under rigid motion or conformal mapping.

These methods have proved useful for isometric manifold learning methods, such as ISOMAP, MVU and RML. However, a large variety of manifold learning methods, such as LLE, HLLE, LE, and LTSA, to name just a few, output normalized embeddings. In these methods, embeddings have unit variance up to a global scale factor. The distance rank of neighbor samples is then disturbed in the embedding, as pairwise Euclidean distances are no longer preserved. Meanwhile, anisotropic coordinate scaling (that is, separate scaling along each coordinate component) caused by normalization cannot be recovered by rigid motion or conformal mapping. As a consequence, existing methods would report false quality assessments for normalized embeddings.
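The effect described above can be illustrated with a small sketch. It is not taken from the paper: the five 2-D points are made-up toy data, and `neighbor_order` and `normalize_unit_variance` are hypothetical helper names. The sketch assumes that "normalization" means rescaling each coordinate to unit variance, and it compares the distance ranking of the neighbors of one point before and after a rigid rotation and before and after that rescaling.

```python
import math
import statistics


def neighbor_order(points, i):
    """Indices of all other points, sorted by Euclidean distance to points[i]."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))


def normalize_unit_variance(points):
    """Center every coordinate and rescale it to unit (population) variance.

    Each coordinate gets its own scale factor, so the rescaling is
    anisotropic whenever the coordinates have different spreads.
    """
    columns = list(zip(*points))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


# Toy patch (illustrative values only), much wider along x than along y.
patch = [(0.0, 0.0), (3.0, 0.0), (0.0, 2.0), (12.0, 1.0), (-12.0, -2.0)]

# A rigid motion (rotation by 90 degrees) keeps all pairwise distances.
rotated = [(-y, x) for x, y in patch]

# Per-coordinate normalization shrinks x and stretches y.
normalized = normalize_unit_variance(patch)

original_order = neighbor_order(patch, 0)
rotated_order = neighbor_order(rotated, 0)
normalized_order = neighbor_order(normalized, 0)

print("original  :", original_order)
print("rotated   :", rotated_order)
print("normalized:", normalized_order)
```

In this toy case the rotation leaves the neighbor ranking of point 0 unchanged, while the per-coordinate rescaling swaps its two nearest neighbors. A rank-based criterion would count that swap as an error even though the patch has only been rescaled along its axes.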

In this paper, we first propose a new measure, named anisotropic scaling independent measure (ASIM), which can efficiently compare the similarity between two configurations under rigid motion and anisotropic coordinate scaling. Then, based on ASIM, we propose a novel embedding quality assessment method, named normalization independent embedding quality assessment (NIEQA), which can efficiently and quantitatively assess the quality of normalized embeddings. The NIEQA method has three characteristics.

  1. NIEQA can be applied to both isometric and normalized embeddings. Since NIEQA uses ASIM to assess the similarity between patches in the high-dimensional input space and their corresponding low-dimensional embeddings, the distortion caused by normalization can be eliminated. Even if the aspect ratio of a learned embedding is scaled, NIEQA can still give a faithful evaluation of how well the geometric structure of the data manifold is preserved.

  2. NIEQA can provide both local and global assessments. NIEQA consists of two components for embedding quality assessment, a global one and a local one. The global assessment evaluates how well the skeleton of a data manifold, represented by a set of landmark points, is preserved, while the local assessment evaluates how well local neighborhoods are preserved. Therefore, NIEQA can provide an overall evaluation by combining both.

  3. NIEQA can serve as a natural tool for model selection and evaluation tasks. Using NIEQA to provide quantitative evaluations of learned embeddings, we can select optimal parameters for a specific method and compare the performance of different methods.

In order to evaluate the performance of NIEQA, we conduct a series of experiments on benchmark data sets, including both synthetic and real-world data. Experimental results on these data sets validate the effectiveness of the proposed method.

The rest of the paper is organized as follows. A literature review of related work is presented in Section 2. The anisotropic scaling independent measure (ASIM) is described in Section 3. The normalization independent embedding quality assessment (NIEQA) method is then presented in Section 4. Experimental results are reported in Section 5. Discussions and concluding remarks are given in Sections 6 and 7, respectively.

Section snippets

Literature review on related works

In this section, the current state-of-the-art works on embedding quality assessment methods are reviewed. For convenience and clarity of presentation, main notations used in this paper are summarized in Table 1. Throughout the whole paper, all data samples are in the form of column vectors. The superscript of a data vector is the index of its component.

According to motivation and application range, existent embedding quality assessment methods can be categorized into two groups: local and

ASIM: anisotropic scaling independent measure

In this section, we introduce a novel measure, named anisotropic scaling independent measure (ASIM), which can effectively evaluate the similarity between two configurations under rigid motion and anisotropic coordinate scaling. A synthetic example is first given in Section 3.1 to demonstrate why existent assessments fail under normalization. Then the motivation and overall description of ASIM are presented in Section 3.2. Finally, the computational details are stated in Section 3.3.
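To make the idea of comparing two configurations "under rigid motion and anisotropic coordinate scaling" concrete, the following sketch contrasts two fits on toy 2-D data. It is an illustration only and is not the ASIM computation of Section 3.3, which this excerpt does not reproduce. The assumptions are: the model x ≈ R D y with R a rotation (reflections are ignored) and D a diagonal matrix, a brute-force search over the rotation angle, and the hypothetical helper names `rigid_mismatch` and `scaled_mismatch`.

```python
import math


def center(points):
    """Translate a 2-D configuration so that its centroid is the origin."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(x - mx, y - my) for x, y in points]


def rotate(points, theta):
    """Rotate 2-D points counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def residual(a, b):
    """Squared mismatch between matched points, relative to the size of a."""
    num = sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax ** 2 + ay ** 2 for ax, ay in a)
    return num / den


def rigid_mismatch(x, y, steps=3600):
    """Best residual when y may only be translated and rotated."""
    x, y = center(x), center(y)
    return min(residual(x, rotate(y, 2 * math.pi * t / steps)) for t in range(steps))


def fit_axis_scales(x, y):
    """Least-squares scale factor per coordinate so that scale * y matches x."""
    scales = []
    for k in range(2):
        den = sum(q[k] ** 2 for q in y)
        num = sum(p[k] * q[k] for p, q in zip(x, y))
        scales.append(num / den if den else 0.0)
    return scales


def scaled_mismatch(x, y, steps=3600):
    """Best residual when y may be scaled along each of its axes, then rotated."""
    x, y = center(x), center(y)
    best = math.inf
    for t in range(steps):
        theta = 2 * math.pi * t / steps
        x_back = rotate(x, -theta)  # express x in the coordinate frame of y
        sx, sy = fit_axis_scales(x_back, y)
        y_scaled = [(sx * qx, sy * qy) for qx, qy in y]
        best = min(best, residual(x_back, y_scaled))
    return best


# Toy patch and a toy "normalized embedding" of it (illustrative values only):
# the patch is rotated by -30 degrees and then rescaled separately per axis.
patch = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
embedding = [(0.5 * u, 3.0 * v) for u, v in rotate(patch, -math.pi / 6)]

rigid_score = rigid_mismatch(patch, embedding)
scaled_score = scaled_mismatch(patch, embedding)
print("rigid fit:", rigid_score)
print("rotation + per-axis scaling fit:", scaled_score)
```

On this toy pair the rigid fit leaves a large residual, while the fit that also allows one scale factor per embedding coordinate matches the patch almost exactly. The grid search is a deliberately simple stand-in; it trades efficiency for transparency and resolves the rotation only up to the chosen step size.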

For clarity

Normalization independent embedding quality assessment

When assessing the quality of embeddings, we need to consider both local and global evaluations. This leads to two issues.

  • Does the embedding preserve the global topology of the manifold?

  • Does the embedding preserve the geometric structure of local neighborhoods?

In this section, we propose the normalization independent embedding quality assessment (NIEQA) method, which addresses these two issues independently of normalization. NIEQA is based on the ASIM measure stated in Section 3 and consists of

Experiments

In this section, the effectiveness of the NIEQA method is validated through a series of experimental tests on benchmark data sets. In Section 5.1, NIEQA is applied to model evaluation. In Section 5.2, NIEQA is used to select optimal parameters. In experiments, NIEQA is compared with three commonly used assessment methods. We compute 1 − MLC instead of MLC to obtain a unified criterion, that is, a small assessment value close to zero indicates good quality of the embedding. In all experiments, the

Comparison with the Procrustes measure

Although ASIM is motivated by the Procrustes measure (PM), it addresses a crucial issue in embedding quality assessment that has not yet been resolved by PM or any other measure, namely, how to eliminate separate scaling factors along each coordinate in the embedding. Therefore, ASIM is not a simple extension of PM, but a novel approach to achieve fair comparison between both normalized and isometric embeddings. Besides, by using ASIM, the proposed NIEQA method can output not only local but

Conclusions

In this paper, we proposed a novel normalization independent embedding quality assessment (NIEQA) method for manifold learning, which has a wider application range than current approaches. We first proposed a new local measure, which can quantitatively evaluate how well local neighborhood structure is preserved under rigid motion and anisotropic coordinate scaling. Then the NIEQA method, which is designed based on this new measure, can effectively and quantitatively evaluate the quality of both

Acknowledgements

This work was partly supported by the NNSF of China under Grant nos. 41174013 and 11071244. The authors thank the referees for their invaluable comments and suggestions, which helped improve the paper greatly.

References (49)

  • I.T. Jolliffe, Principal Component Analysis (2002)
  • T.F. Cox et al., Multidimensional Scaling (2000)
  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • L.K. Saul et al., Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. (2003)
  • J.B. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
  • V. De Silva, J.B. Tenenbaum, Global versus local methods in nonlinear dimensionality reduction, in: Advances in Neural...
  • M. Belkin, Problems of Learning on Manifolds, Ph.D. Thesis, The University of Chicago,...
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)
  • D.L. Donoho et al., Hessian eigenmaps: locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. USA (2003)
  • R.R. Coifman et al., Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps, Proc. Natl. Acad. Sci. USA (2005)
  • S. Lafon et al., Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • Z. Zhang et al., Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, SIAM J. Sci. Comput. (2005)
  • K. Weinberger et al., Unsupervised learning of image manifolds by semidefinite programming, Int. J. Comput. Vision (2006)
  • T. Lin et al., Riemannian manifold learning, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
Peng Zhang received the B.Sc. degree in pure mathematics from Shandong University, Jinan, China, and the Ph.D. degree in applied mathematics from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, in 2006 and 2012, respectively. He is currently an assistant research fellow with the Data Center, National Disaster Reduction Center of China. His current research interests include machine learning, pattern recognition, data mining and their applications to analyzing and visualizing natural disaster data.

Yuanyuan Ren received her B.A. and M.A. degrees from Shandong Normal University, Jinan, China, and Communication University of China, Beijing, China, in 2006 and 2009, respectively. She is currently with the Career Center and Department of Hydraulic Engineering, Tsinghua University, China. Her current research interests include mining, analyzing and visualizing career data.

Bo Zhang received the B.Sc. degree in mathematics from Shandong University, Jinan, China, the M.Sc. degree in mathematics from Xi’an Jiaotong University, Xi’an, China, and the Ph.D. degree in applied mathematics from University of Strathclyde, Strathclyde, U.K. in 1983, 1985, and 1992, respectively. Currently, he is a professor in the Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China. His current research interests include direct and inverse scattering problems, computational electromagnetics, partial differential equations, and machine learning.
