Pattern Recognition Letters

Volume 43, 1 July 2014, Pages 47-61

Low rank subspace clustering (LRSC)

https://doi.org/10.1016/j.patrec.2013.08.006


Abstract

We consider the problem of fitting a union of subspaces to a collection of data points drawn from one or more subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean, self-expressive dictionary and a matrix of noise and/or gross errors. By self-expressive we mean a dictionary whose atoms can be expressed as linear combinations of themselves with low-rank coefficients. In the case of noisy data, our key contribution is to show that this non-convex matrix decomposition problem can be solved in closed form from the SVD of the noisy data matrix. The solution involves a novel polynomial thresholding operator on the singular values of the data matrix, which requires minimal shrinkage. For one subspace, a particular case of our framework leads to classical PCA, which requires no shrinkage. For multiple subspaces, the low-rank coefficients obtained by our framework can be used to construct a data affinity matrix from which the clustering of the data according to the subspaces can be obtained by spectral clustering. In the case of data corrupted by gross errors, we solve the problem using an alternating minimization approach, which combines our polynomial thresholding operator with the more traditional shrinkage-thresholding operator. Experiments on motion segmentation and face clustering show that our framework performs on par with state-of-the-art techniques at a reduced computational cost.

Introduction

The past few decades have seen an explosion in the availability of datasets from multiple modalities. While such datasets are usually very high-dimensional, their intrinsic dimension is often much smaller than the dimension of the ambient space. For instance, the number of pixels in an image can be huge, yet most computer vision models use a few parameters to describe the appearance, geometry and dynamics of a scene. This has motivated the development of a number of techniques for finding low-dimensional representations of high-dimensional data.

One of the most commonly used methods is Principal Component Analysis (PCA), which models the data with a single low-dimensional subspace. In practice, however, the data points could be drawn from multiple subspaces and the membership of the data points to the subspaces could be unknown. For instance, a video sequence could contain several moving objects and different subspaces might be needed to describe the motion of different objects in the scene. Therefore, there is a need to simultaneously cluster the data into multiple subspaces and find a low-dimensional subspace fitting each group of points. This problem, known as subspace clustering, finds numerous applications in computer vision, e.g., image segmentation (Yang et al., 2008), motion segmentation (Vidal et al., 2008) and face clustering (Ho et al., 2003), image processing, e.g., image representation and compression (Hong et al., 2006), and systems theory, e.g., hybrid system identification (Vidal et al., 2003b).

Over the past decade, a number of subspace clustering methods have been developed. This includes algebraic methods (Boult and Brown, 1991, Costeira and Kanade, 1998, Gear, 1998, Vidal et al., 2003a, Vidal et al., 2004, Vidal et al., 2005), iterative methods (Bradley and Mangasarian, 2000, Tseng, 2000, Agarwal and Mustafa, 2004, Lu and Vidal, 2006, Zhang et al., 2009), statistical methods (Tipping and Bishop, 1999, Sugaya and Kanatani, 2004, Gruber and Weiss, 2004, Yang et al., 2006, Ma et al., 2007, Rao et al., 2008, Rao et al., 2010), and spectral clustering-based methods (Boult and Brown, 1991, Yan and Pollefeys, 2006, Zhang et al., 2010, Goh and Vidal, 2007, Elhamifar and Vidal, 2009, Elhamifar and Vidal, 2010, Elhamifar and Vidal, 2013, Liu et al., 2010, Chen and Lerman, 2009). Among them, methods based on spectral clustering have been shown to perform very well for several applications in computer vision (see Vidal (2011) for a review and comparison of existing methods).

Spectral clustering-based methods (see von Luxburg, 2007 for a review) decompose the subspace clustering problem in two steps. In the first step, a symmetric affinity matrix $C=[c_{ij}]$ is constructed, where $c_{ij}=c_{ji}\ge 0$ measures whether points $i$ and $j$ belong to the same subspace. Ideally $c_{ij}\approx 1$ if points $i$ and $j$ are in the same subspace and $c_{ij}\approx 0$ otherwise. In the second step, a weighted undirected graph is constructed where the data points are the nodes and the affinities $c_{ij}$ are the weights. The segmentation of the data is then found by clustering the eigenvectors of the graph Laplacian using central clustering techniques, such as k-means. Arguably, the most difficult step is to build a good affinity matrix. This is because two points could be very close to each other, but lie in different subspaces (e.g., near the intersection of two subspaces). Conversely, two points could be far from each other, but lie in the same subspace.
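
As an illustration of the second step, here is a minimal sketch of spectral clustering from a given symmetric affinity matrix, following the normalized-Laplacian recipe reviewed in von Luxburg (2007); the helper name and the small regularization constants are our own choices, not part of the paper.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(C, n_clusters):
    """Cluster points given a symmetric nonnegative affinity matrix C."""
    W = np.abs(C) + np.abs(C).T                    # ensure symmetry and nonnegativity
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    # Eigenvectors for the n_clusters smallest eigenvalues of L_sym
    _, V = eigh(L_sym, subset_by_index=[0, n_clusters - 1])
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)  # row-normalize
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)
```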

Earlier methods for building an affinity matrix (Boult and Brown, 1991, Costeira and Kanade, 1998) compute the singular value decomposition (SVD) of the data matrix $D=U\Sigma V^\top$ and let $C=V_1V_1^\top$, where the columns of $V_1$ are the top $r=\mathrm{rank}(D)$ right singular vectors of $D$. The rationale behind this choice is that $c_{ij}=0$ when points $i$ and $j$ are in different independent subspaces and the data are uncorrupted, as shown in Vidal et al. (2005). In practice, however, the data are often contaminated by noise and gross errors. In such cases, the equation $c_{ij}=0$ does not hold, even if the rank of the noiseless $D$ were given. Moreover, selecting a good value for $r$ becomes very difficult, because $D$ is full rank. Furthermore, the equation $c_{ij}=0$ is derived under the assumption that the subspaces are linear. In practice, many datasets are better modeled by affine subspaces.
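
A short numpy sketch of this classical construction, assuming the rank $r$ is known (the function name is ours):

```python
import numpy as np

def shape_interaction_affinity(D, r):
    """Affinity C = V1 V1^T from the top-r right singular vectors of D
    (Boult and Brown, 1991; Costeira and Kanade, 1998)."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    V1 = Vt[:r].T                # N x r: top-r right singular vectors
    return V1 @ V1.T             # N x N; zero across independent subspaces when noiseless
```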

More recent methods for building an affinity matrix address these issues by using techniques from sparse and low-rank representation. For instance, it is shown in Elhamifar and Vidal, 2009, Elhamifar and Vidal, 2010, Elhamifar and Vidal, 2013 that a point in a union of multiple subspaces admits a sparse representation with respect to the dictionary formed by all other data points, i.e., $D=DC$, where $C$ is sparse. It is also shown in Elhamifar and Vidal, 2009, Elhamifar and Vidal, 2010, Elhamifar and Vidal, 2013 that, if the subspaces are independent, the nonzero coefficients in the sparse representation of a point correspond to other points in the same subspace, i.e., if $c_{ij}\neq 0$, then points $i$ and $j$ belong to the same subspace. Moreover, the nonzero coefficients can be obtained by $\ell_1$ minimization. These coefficients are then converted into symmetric and nonnegative affinities, from which the segmentation is found using spectral clustering. A very similar approach is presented in Liu et al. (2010). The major difference is that a low-rank representation is used in lieu of the sparsest representation. While the same principle of representing a point as a linear combination of other points has been successfully used when the data are corrupted by noise and gross errors, from a theoretical viewpoint it is not clear that the above methods are effective when using a corrupted dictionary.
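
As a rough illustration of the sparse self-expression idea (not the exact SSC program of Elhamifar and Vidal (2013)), each point can be regressed on all other points with an $\ell_1$ penalty. Here sklearn's Lasso is used as a stand-in solver and `lam` is a hypothetical regularization weight:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_self_expression(D, lam=0.01):
    """Regress each column of D on all other columns with an l1 penalty,
    so that C approximately satisfies D = DC with sparse C and zero diagonal."""
    M, N = D.shape
    C = np.zeros((N, N))
    for j in range(N):
        idx = [i for i in range(N) if i != j]          # exclude the point itself
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        lasso.fit(D[:, idx], D[:, j])
        C[idx, j] = lasso.coef_
    return C
```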

In this paper, we propose a general optimization framework for solving the subspace clustering problem in the case of data corrupted by noise and/or gross errors. Given a corrupted data matrix $D\in\mathbb{R}^{M\times N}$, we wish to decompose it as the sum of a self-expressive, noise-free and outlier-free (clean) data matrix $A\in\mathbb{R}^{M\times N}$, a noise matrix $G\in\mathbb{R}^{M\times N}$, and a matrix of sparse gross errors $E\in\mathbb{R}^{M\times N}$. We assume that the columns of the matrix $A=[a_1,a_2,\ldots,a_N]$ are points in $\mathbb{R}^M$ drawn from a union of $n\geq 1$ low-dimensional linear subspaces of unknown dimensions $\{d_i\}_{i=1}^n$, where $d_i\ll M$. We also assume that $A$ is self-expressive, which means that the clean data points can be expressed as linear combinations of themselves, i.e., $$a_j=\sum_{i=1}^N a_i c_{ij} \quad\text{or}\quad A=AC,$$ where $C=[c_{ij}]$ is the matrix of coefficients. This constraint aims to capture the fact that a point in a linear subspace can be expressed as a linear combination of other points in the same subspace. Therefore, we expect $c_{ij}$ to be zero if points $i$ and $j$ are in different subspaces.

Notice that the constraint A=AC is non-convex, because both A and C are unknown. This is an important difference with respect to existing methods, which enforce D=DC where D is the dictionary of corrupted data points. Another important difference is that we directly enforce C to be symmetric, while existing methods symmetrize C as a post-processing step.

The proposed framework, which we call Low Rank Subspace Clustering (LRSC), is based on solving the following non-convex optimization problem: $$(P)\quad \min_{A,C,E,G}\;\|C\|_* + \frac{\tau}{2}\|A-AC\|_F^2 + \frac{\alpha}{2}\|G\|_F^2 + \gamma\|E\|_1 \quad\text{s.t.}\quad D=A+G+E \quad\text{and}\quad C=C^\top,$$ where $\|X\|_*=\sum_i \sigma_i(X)$, $\|X\|_F^2=\sum_{ij} X_{ij}^2$ and $\|X\|_1=\sum_{ij}|X_{ij}|$ are, respectively, the nuclear, Frobenius and $\ell_1$ norms of $X$. The above formulation (evaluated numerically in the sketch after the list below) encourages:

  • $C$ to be low-rank (by minimizing $\|C\|_*$),

  • $A$ to be self-expressive (by minimizing $\|A-AC\|_F^2$),

  • $G$ to be small (by minimizing $\|G\|_F^2$), and

  • $E$ to be sparse (by minimizing $\|E\|_1$).
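
As referenced above, a minimal numerical sketch of the objective of $(P)$ (the function and variable names are ours, for illustration only, not a solver):

```python
import numpy as np

def lrsc_objective(A, C, E, G, tau, alpha, gamma):
    """Value of the (P) objective; the constraints D = A + G + E and C = C^T
    are assumed to hold for the arguments passed in."""
    nuclear = np.linalg.svd(C, compute_uv=False).sum()           # ||C||_*
    self_expr = 0.5 * tau * np.linalg.norm(A - A @ C, 'fro')**2  # self-expressiveness
    noise = 0.5 * alpha * np.linalg.norm(G, 'fro')**2            # noise energy
    gross = gamma * np.abs(E).sum()                              # ||E||_1
    return nuclear + self_expr + noise + gross
```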

The main contribution of our work is to show that important particular cases of $(P)$ (see Table 1) can be solved in closed form from the SVD of the data matrix. In particular, we show that in the absence of gross errors (i.e., $\gamma=\infty$), $A$ and $C$ can be obtained by thresholding the singular values of $D$ and $A$, respectively. The thresholding is done using a novel polynomial thresholding operator, which reduces the amount of shrinkage with respect to existing methods. Indeed, when the self-expressiveness constraint $A=AC$ is enforced exactly (i.e., $\tau=\infty$), the optimal solution for $A$ reduces to classical PCA, which does not perform any shrinkage. Moreover, the optimal solution for $C$ reduces to the affinity matrix for subspace clustering proposed by Costeira and Kanade (1998). In the case of data corrupted by gross errors, a closed-form solution appears elusive. We thus use an augmented Lagrange multipliers method. Each iteration of our method involves a polynomial thresholding of the singular values to reduce the rank and a regular shrinkage-thresholding to reduce gross errors.
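
For comparison, the standard singular value shrinkage operator of Cai et al. (2008), which subtracts the same constant from every singular value and thus shrinks more than polynomial thresholding, can be sketched as follows (the function name is ours):

```python
import numpy as np

def singular_value_shrinkage(X, t):
    """Soft-threshold the singular values of X by t (Cai et al., 2008).
    Polynomial thresholding (Section 4) shrinks large singular values far less."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```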

The remainder of the paper is organized as follows (see Table 1): Section 2 reviews existing results on sparse representation and rank minimization for subspace estimation and clustering as well as some background material needed for our derivations. Section 3 formulates the low rank subspace clustering problem for linear subspaces in the absence of noise or gross errors and derives a closed form solution for A and C. Section 4 extends the results of Section 3 to data contaminated by noise and derives a closed form solution for A and C based on the polynomial thresholding operator. Section 5 extends the results to data contaminated by both noise and gross errors and shows that A and C can be found using alternating minimization. Section 6 presents experiments that evaluate our method on synthetic and real data. Section 7 gives the conclusions.

Section snippets

Background

In this section we review existing results on sparse representation and rank minimization for subspace estimation (Section 2.1) and subspace clustering (Section 2.2).

Low rank subspace clustering with uncorrupted data

In this section, we consider the low rank subspace clustering problem $(P)$ in the case of uncorrupted data, i.e., $\alpha=\infty$ and $\gamma=\infty$ so that $G=E=0$ and $D=A$. In Section 3.1, we show that the optimal solution for $C$ can be obtained in closed form from the SVD of $A$ by applying a nonlinear thresholding to its singular values. In Section 3.2, we assume that the self-expressiveness constraint is satisfied exactly, i.e., $\tau=\infty$ so that $A=AC$. As shown in Liu et al. (2011), the optimal solution to this problem can be …
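
A hedged sketch of the closed form discussed in this section, assuming the solution has the form $C=V_1(I-\tau^{-1}\Sigma_1^{-2})V_1^\top$ with $\Sigma_1$ the singular values of $A$ above $1/\sqrt{\tau}$ (as $\tau\to\infty$ this recovers $C=V_1V_1^\top$, consistent with the Costeira–Kanade affinity); the exact statement should be checked against Section 3.1:

```python
import numpy as np

def lrsc_clean_C(A, tau):
    """Closed-form minimizer of ||C||_* + (tau/2)||A - AC||_F^2 with C = C^T,
    assuming the form above: singular values of A below 1/sqrt(tau) are dropped."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > 1.0 / np.sqrt(tau)
    V1, s1 = Vt[keep].T, s[keep]
    return V1 @ np.diag(1.0 - 1.0 / (tau * s1**2)) @ V1.T   # -> V1 V1^T as tau -> inf
```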

Low rank subspace clustering with noisy data

In this section, we consider the low rank subspace clustering problem $(P)$ in the case of noisy data, i.e., $\gamma=\infty$, so that $E=0$ and $D=A+G$. While in principle the resulting problem appears to be very similar to those in eqs. (16) and (19), there are a number of differences. First, notice that instead of expressing the noisy data as a linear combination of itself plus noise, i.e., $D=DC+G$, we search for a clean dictionary $A$, which is self-expressive, i.e., $A=AC$. We then assume that the data are …
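
The exact polynomial thresholding operator is derived in Section 4. As an illustration of its flavor only, the following sketch assumes the per-singular-value decoupling $\min_{\lambda,c}\,|c| + \frac{\tau}{2}\lambda^2(1-c)^2 + \frac{\alpha}{2}(\sigma-\lambda)^2$ implied by the formulation, and compares the stationary candidates of its two branches (a quadratic branch where $c=0$ and a quartic branch where $c>0$). This is our own reconstruction, not a verbatim transcription of the paper's operator:

```python
import numpy as np

def polynomial_threshold(sigma, tau, alpha):
    """Minimize phi(lmb) + (alpha/2)*(sigma - lmb)^2 over lmb >= 0, where
    phi(lmb) = (tau/2)*lmb^2       if lmb <= 1/sqrt(tau)   (optimal c = 0)
             = 1 - 1/(2*tau*lmb^2) otherwise               (optimal c = 1 - 1/(tau*lmb^2))."""
    thr = 1.0 / np.sqrt(tau)
    def psi(lmb):
        phi = 0.5 * tau * lmb**2 if lmb <= thr else 1.0 - 0.5 / (tau * lmb**2)
        return phi + 0.5 * alpha * (sigma - lmb)**2
    # Candidate from the quadratic branch, clipped to its validity region
    cands = [min(alpha * sigma / (alpha + tau), thr)]
    # Real roots of lmb^4 - sigma*lmb^3 + 1/(alpha*tau) = 0 in the second branch
    for r in np.roots([1.0, -sigma, 0.0, 0.0, 1.0 / (alpha * tau)]):
        if abs(r.imag) < 1e-9 and r.real > thr:
            cands.append(r.real)
    return min(cands, key=psi)

def polynomial_threshold_matrix(D, tau, alpha):
    """Apply the operator to every singular value of D to recover A."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U @ np.diag([polynomial_threshold(si, tau, alpha) for si in s]) @ Vt
```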

Low rank subspace clustering with corrupted data

In this section, we consider the low-rank subspace clustering problem in the case of data corrupted by both noise and gross errors, i.e., problem $(P)$. Similar to the case of noisy data discussed in Section 4, the major difference between $(P)$ and the problems in (17), (19) is that, rather than using a corrupted dictionary, we search simultaneously for the clean dictionary $A$, the low-rank coefficients $C$ and the sparse errors $E$. Also, notice that the $\ell_1$ norm of the matrix of coefficients is replaced …
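
The "regular shrinkage-thresholding" step referred to in the Introduction is the elementwise soft-thresholding operator that arises from the $\ell_1$ term; a minimal sketch (the surrounding augmented Lagrangian updates follow Lin et al. (2011) and are omitted here):

```python
import numpy as np

def shrink(X, t):
    """Elementwise soft-thresholding: the proximal operator of t*||.||_1,
    used to update the sparse error term E at each iteration."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
```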

Experiments

In this section we evaluate the performance of LRSC on two computer vision tasks: motion segmentation and face clustering. Using the subspace clustering error, $$\text{subspace clustering error}=\frac{\#\text{ of misclassified points}}{\text{total }\#\text{ of points}},$$ as a measure of performance, we compare LRSC to state-of-the-art subspace clustering algorithms based on spectral clustering, such as LSA (Yan and Pollefeys, 2006), SCC (Chen and Lerman, 2009), LRR (Liu et al., 2010), and SSC (Elhamifar and Vidal, 2013). We choose these …
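
Since cluster labels are arbitrary, this error is normally computed after the best one-to-one matching between estimated and ground-truth labels; a sketch using the Hungarian algorithm (this matching convention is standard in the subspace clustering literature, but the implementation below is our assumption, not the paper's code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_error(pred, gt):
    """Fraction of misclassified points under the best label permutation."""
    labels_p, labels_g = np.unique(pred), np.unique(gt)
    cost = np.zeros((len(labels_p), len(labels_g)))
    for i, p in enumerate(labels_p):
        for j, g in enumerate(labels_g):
            cost[i, j] = -np.sum((pred == p) & (gt == g))   # maximize agreements
    row, col = linear_sum_assignment(cost)
    matched = -cost[row, col].sum()
    return 1.0 - matched / len(gt)
```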

Discussion and conclusion

We have proposed a new algorithm for clustering data drawn from a union of subspaces and corrupted by noise and/or gross errors. Our approach was based on solving a non-convex optimization problem whose solution provides an affinity matrix for spectral clustering. Our key contribution was to show that important particular cases of our formulation can be solved in closed form by applying a polynomial thresholding operator to the SVD of the data. A drawback of our approach to be addressed in the future …

References

  • Yang, A., et al., 2008. Unsupervised segmentation of natural images via lossy data compression. Computer Vision and Image Understanding.
  • Agarwal, P., Mustafa, N., 2004. k-Means projective clustering. In: ACM Symposium on Principles of database...
  • Basri, R., et al., 2003. Lambertian reflection and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Boult, T., Brown, L., 1991. Factorization-based segmentation of motions. In: IEEE Workshop on Motion Understanding, pp....
  • Bradley, P.S., et al., 2000. k-plane clustering. Journal of Global Optimization.
  • Cai, J.-F., et al., 2008. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization.
  • Candès, E., et al., 2011. Robust principal component analysis? Journal of the ACM.
  • Chen, G., et al., 2009. Spectral curvature clustering (SCC). International Journal of Computer Vision.
  • Costeira, J., et al., 1998. A multibody factorization method for independently moving objects. International Journal of Computer Vision.
  • Elhamifar, E., Vidal, R., 2009. Sparse subspace clustering. In: IEEE Conference on Computer Vision and Pattern...
  • Elhamifar, E., Vidal, R., 2010. Clustering disjoint subspaces via sparse representation. In: IEEE International...
  • Elhamifar, E., et al., 2013. Sparse subspace clustering: Algorithm, theory, and applications.
  • Favaro, P., Vidal, R., Ravichandran, A., 2011. A closed form solution to robust subspace estimation and clustering. In:...
  • Gear, C.W., 1998. Multibody grouping from motion images. International Journal of Computer Vision.
  • Goh, A., Vidal, R., 2007. Segmenting motions of different types by unsupervised manifold clustering. In: IEEE...
  • Gruber, A., Weiss, Y., 2004. Multibody factorization with uncertainty and missing data using the EM algorithm. In: IEEE...
  • Ho, J., Yang, M.H., Lim, J., Lee, K., Kriegman, D., 2003. Clustering appearances of objects under varying illumination...
  • Hong, W., et al., 2006. Multi-scale hybrid linear models for lossy image representation. IEEE Transactions on Image Processing.
  • Lauer, F., Schnörr, C., 2009. Spectral clustering of linear subspaces for motion segmentation. In: IEEE International...
  • Lee, K.-C., et al., 2005. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Lin, Z., Chen, M., Wu, L., Ma, Y., 2011. The augmented Lagrange multiplier method for exact recovery of corrupted...
  • Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y., 2011. Robust recovery of subspace structures by low-rank...
  • Liu, G., et al., 2012. Robust recovery of subspace structures by low-rank representation.
  • Liu, G., Lin, Z., Yu, Y., 2010. Robust subspace segmentation by low-rank representation. In: International Conference...