Low rank subspace clustering (LRSC)
Introduction
The past few decades have seen an explosion in the availability of datasets from multiple modalities. While such datasets are usually very high-dimensional, their intrinsic dimension is often much smaller than the dimension of the ambient space. For instance, the number of pixels in an image can be huge, yet most computer vision models use a few parameters to describe the appearance, geometry and dynamics of a scene. This has motivated the development of a number of techniques for finding low-dimensional representations of high-dimensional data.
One of the most commonly used methods is Principal Component Analysis (PCA), which models the data with a single low-dimensional subspace. In practice, however, the data points could be drawn from multiple subspaces and the membership of the data points to the subspaces could be unknown. For instance, a video sequence could contain several moving objects and different subspaces might be needed to describe the motion of different objects in the scene. Therefore, there is a need to simultaneously cluster the data into multiple subspaces and find a low-dimensional subspace fitting each group of points. This problem, known as subspace clustering, finds numerous applications in computer vision, e.g., image segmentation (Yang et al., 2008), motion segmentation (Vidal et al., 2008) and face clustering (Ho et al., 2003), image processing, e.g., image representation and compression (Hong et al., 2006), and systems theory, e.g., hybrid system identification (Vidal et al., 2003b).
Over the past decade, a number of subspace clustering methods have been developed. This includes algebraic methods (Boult and Brown, 1991, Costeira and Kanade, 1998, Gear, 1998, Vidal et al., 2003a, Vidal et al., 2004, Vidal et al., 2005), iterative methods (Bradley and Mangasarian, 2000, Tseng, 2000, Agarwal and Mustafa, 2004, Lu and Vidal, 2006, Zhang et al., 2009), statistical methods (Tipping and Bishop, 1999, Sugaya and Kanatani, 2004, Gruber and Weiss, 2004, Yang et al., 2006, Ma et al., 2007, Rao et al., 2008, Rao et al., 2010), and spectral clustering-based methods (Boult and Brown, 1991, Yan and Pollefeys, 2006, Zhang et al., 2010, Goh and Vidal, 2007, Elhamifar and Vidal, 2009, Elhamifar and Vidal, 2010, Elhamifar and Vidal, 2013, Liu et al., 2010, Chen and Lerman, 2009). Among them, methods based on spectral clustering have been shown to perform very well for several applications in computer vision (see Vidal (2011) for a review and comparison of existing methods).
Spectral clustering-based methods (see von Luxburg, 2007 for a review) decompose the subspace clustering problem into two steps. In the first step, a symmetric affinity matrix $W = [w_{ij}]$ is constructed, where $w_{ij} \geq 0$ measures whether points i and j belong to the same subspace. Ideally, $w_{ij} = 1$ if points i and j are in the same subspace and $w_{ij} = 0$ otherwise. In the second step, a weighted undirected graph is constructed where the data points are the nodes and the affinities are the weights. The segmentation of the data is then found by clustering the eigenvectors of the graph Laplacian using central clustering techniques, such as k-means. Arguably, the most difficult step is to build a good affinity matrix. This is because two points could be very close to each other, but lie in different subspaces (e.g., near the intersection of two subspaces). Conversely, two points could be far from each other, but lie in the same subspace.
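As a concrete illustration of the second step, here is a minimal numpy sketch of the generic spectral clustering pipeline (normalized Laplacian, eigenvector embedding, and a simple k-means-style refinement); the function name and the deterministic farthest-first initialization are illustrative choices, not from the paper:

```python
import numpy as np

def spectral_clustering(W, n_clusters):
    """Cluster points given a symmetric, nonnegative affinity matrix W.
    A sketch of the generic pipeline described above."""
    # Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Embed each point using the eigenvectors of the smallest eigenvalues.
    _, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    Y = evecs[:, :n_clusters]
    # Central clustering (k-means-style Lloyd iterations) on the embedding,
    # with deterministic farthest-first initialization of the centers.
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        dists = np.min([np.square(Y - c).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(50):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = Y[labels == k].mean(axis=0)
    return labels
```

On an affinity matrix with a clear block structure (two disconnected groups), the embedding collapses each group to a single point, so the final clustering step is trivial.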
Earlier methods for building an affinity matrix (Boult and Brown, 1991, Costeira and Kanade, 1998) compute the singular value decomposition (SVD) $D = U \Sigma V^\top$ of the data matrix and let $w_{ij} = |q_{ij}|$, where $Q = V_r V_r^\top$ and the columns of $V_r$ are the top $r$ right singular vectors of D. The rationale behind this choice is that $q_{ij} = 0$ when points i and j are in different independent subspaces and the data are uncorrupted, as shown in Vidal et al. (2005). In practice, however, the data are often contaminated by noise and gross errors. In such cases, the equation $q_{ij} = 0$ does not hold, even if the rank of the noiseless D were given. Moreover, selecting a good value for r becomes very difficult, because D is full rank. Furthermore, the equation is derived under the assumption that the subspaces are linear. In practice, many datasets are better modeled by affine subspaces.
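The classical construction can be sketched in a few lines of numpy (the function name is hypothetical; D is assumed to hold one data point per column):

```python
import numpy as np

def shape_interaction_affinity(D, r):
    """Affinity W = |Q| from the shape interaction matrix Q = V_r V_r^T,
    built from the top-r right singular vectors of the data matrix D.
    For uncorrupted data from independent subspaces, the entries of Q
    vanish across subspaces."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    Vr = Vt[:r].T             # one row per data point
    return np.abs(Vr @ Vr.T)  # w_ij = |q_ij|
```

For example, with points drawn from two orthogonal lines in R^3, the cross-subspace affinities are exactly zero while within-subspace affinities are not; noise destroys this exact cancellation, which is the sensitivity discussed above.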
More recent methods for building an affinity matrix address these issues by using techniques from sparse and low-rank representation. For instance, it is shown in Elhamifar and Vidal, 2009, Elhamifar and Vidal, 2010, Elhamifar and Vidal, 2013 that a point in a union of multiple subspaces admits a sparse representation with respect to the dictionary formed by all other data points, i.e., $D = DC$, where C is sparse. It is also shown in Elhamifar and Vidal, 2009, Elhamifar and Vidal, 2010, Elhamifar and Vidal, 2013 that, if the subspaces are independent, the nonzero coefficients in the sparse representation of a point correspond to other points in the same subspace, i.e., if $c_{ij} \neq 0$, then points i and j belong to the same subspace. Moreover, the nonzero coefficients can be obtained by $\ell_1$ minimization. These coefficients are then converted into symmetric and nonnegative affinities, from which the segmentation is found using spectral clustering. A very similar approach is presented in Liu et al. (2010). The major difference is that a low-rank representation is used in lieu of the sparsest representation. While the same principle of representing a point as a linear combination of other points has been successfully used when the data are corrupted by noise and gross errors, from a theoretical viewpoint it is not clear that the above methods are effective when using a corrupted dictionary.
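A minimal sketch of the sparse self-expression step, solving a lasso problem per point with ISTA (the regularization weight `lam` and iteration count are illustrative choices; the actual SSC optimization details differ):

```python
import numpy as np

def sparse_self_expression(D, lam=0.01, n_iter=500):
    """For each point d_j (a column of D), approximately solve
        min_c  0.5 * ||d_j - D c||^2 + lam * ||c||_1   with  c_j = 0,
    via iterative shrinkage-thresholding (ISTA). Returns the coefficient
    matrix C with columns c_j."""
    N = D.shape[1]
    C = np.zeros((N, N))
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1/L, L = Lipschitz const.
    for j in range(N):
        c = np.zeros(N)
        for _ in range(n_iter):
            grad = D.T @ (D @ c - D[:, j])
            c = c - step * grad
            c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)  # soft-threshold
            c[j] = 0.0  # a point may not represent itself
        C[:, j] = c
    return C
```

The affinity matrix is then typically taken as $W = |C| + |C|^\top$ before spectral clustering. For data from independent subspaces, the recovered coefficients connecting points in different subspaces are (ideally) zero.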
In this paper, we propose a general optimization framework for solving the subspace clustering problem in the case of data corrupted by noise and/or gross errors. Given a corrupted data matrix D, we wish to decompose it as the sum $D = A + G + E$ of a self-expressive, noise-free and outlier-free (clean) data matrix A, a noise matrix G, and a matrix of sparse gross errors E. We assume that the columns of the matrix A are N points in $\mathbb{R}^D$ drawn from a union of n low-dimensional linear subspaces of unknown dimensions $\{d_i\}_{i=1}^n$, where $d_i \ll D$. We also assume that A is self-expressive, which means that the clean data points can be expressed as linear combinations of themselves, i.e., $A = AC$, where $C \in \mathbb{R}^{N \times N}$ is the matrix of coefficients. This constraint aims to capture the fact that a point in a linear subspace can be expressed as a linear combination of other points in the same subspace. Therefore, we expect $c_{ij}$ to be zero if points i and j are in different subspaces.
Notice that the constraint $A = AC$ is non-convex, because both A and C are unknown. This is an important difference with respect to existing methods, which enforce $D = DC$, where D is the dictionary of corrupted data points. Another important difference is that we directly enforce C to be symmetric, while existing methods symmetrize C as a post-processing step.
The proposed framework, which we call Low Rank Subspace Clustering (LRSC), is based on solving the following non-convex optimization problem:
$$(P)\quad \min_{A,\,C,\,E}\;\; \|C\|_* + \frac{\tau}{2}\|A - AC\|_F^2 + \frac{\alpha}{2}\|G\|_F^2 + \gamma\|E\|_1 \quad \text{s.t.}\quad D = A + G + E,\;\; C = C^\top,$$
where $\|X\|_*$, $\|X\|_F$ and $\|X\|_1$ are, respectively, the nuclear, Frobenius and $\ell_1$ norms of X. The above formulation encourages:
- C to be low-rank (by minimizing $\|C\|_*$),
- A to be self-expressive (by minimizing $\|A - AC\|_F^2$),
- G to be small (by minimizing $\|G\|_F^2$), and
- E to be sparse (by minimizing $\|E\|_1$).
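Assuming the objective has the form implied by the four terms above (a hedged reconstruction, with trade-off weights named `tau`, `alpha`, `gamma` for illustration), the cost of a candidate decomposition (A, C, E) can be evaluated as:

```python
import numpy as np

def lrsc_objective(D, A, C, E, tau, alpha, gamma):
    """Evaluate the (assumed) LRSC cost: nuclear norm of C, the
    self-expressiveness residual of A, the Frobenius norm of the noise
    G = D - A - E, and the l1 norm of the gross errors E."""
    G = D - A - E
    return (np.linalg.norm(C, 'nuc')
            + 0.5 * tau * np.linalg.norm(A - A @ C, 'fro') ** 2
            + 0.5 * alpha * np.linalg.norm(G, 'fro') ** 2
            + gamma * np.abs(E).sum())
```

For instance, a perfectly self-expressive, noise-free candidate (A = AC, G = E = 0) pays only the nuclear-norm term, which is what drives the solution toward a low-rank coefficient matrix.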
The main contribution of our work is to show that important particular cases of P (see Table 1) can be solved in closed form from the SVD of the data matrix. In particular, we show that in the absence of gross errors (i.e., $E = 0$), A and C can be obtained by thresholding the singular values of D and A, respectively. The thresholding is done using a novel polynomial thresholding operator, which reduces the amount of shrinkage with respect to existing methods. Indeed, when the self-expressiveness constraint is enforced exactly (i.e., as $\tau \to \infty$), the optimal solution for A reduces to classical PCA, which does not perform any shrinkage. Moreover, the optimal solution for C reduces to the affinity matrix for subspace clustering proposed by Costeira and Kanade (1998). In the case of data corrupted by gross errors, a closed-form solution appears elusive. We thus use an augmented Lagrange multipliers method. Each iteration of our method involves a polynomial thresholding of the singular values to reduce the rank and a regular shrinkage-thresholding to reduce the gross errors.
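The polynomial thresholding operator itself is derived later in the paper; as reference points, the two classical operators that an augmented-Lagrangian iteration of this kind alternates between are entrywise shrinkage (for the sparse errors) and singular value thresholding (the baseline that polynomial thresholding refines by shrinking less):

```python
import numpy as np

def soft_threshold(X, gamma):
    """Entrywise shrinkage S_gamma(x) = sign(x) * max(|x| - gamma, 0),
    the standard update for a sparse error term."""
    return np.sign(X) * np.maximum(np.abs(X) - gamma, 0.0)

def svt(X, tau):
    """Classical singular value thresholding: shrink every singular value
    of X by tau. (Shown for reference; the paper's polynomial thresholding
    modifies this rule to reduce the amount of shrinkage.)"""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Both operators are the proximal maps of the $\ell_1$ norm and the nuclear norm, respectively, which is why they appear as the basic building blocks of such alternating schemes.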
The remainder of the paper is organized as follows (see Table 1): Section 2 reviews existing results on sparse representation and rank minimization for subspace estimation and clustering as well as some background material needed for our derivations. Section 3 formulates the low rank subspace clustering problem for linear subspaces in the absence of noise or gross errors and derives a closed form solution for A and C. Section 4 extends the results of Section 3 to data contaminated by noise and derives a closed form solution for A and C based on the polynomial thresholding operator. Section 5 extends the results to data contaminated by both noise and gross errors and shows that A and C can be found using alternating minimization. Section 6 presents experiments that evaluate our method on synthetic and real data. Section 7 gives the conclusions.
Background
In this section we review existing results on sparse representation and rank minimization for subspace estimation (Section 2.1) and subspace clustering (Section 2.2).
Low rank subspace clustering with uncorrupted data
In this section, we consider the low rank subspace clustering problem P in the case of uncorrupted data, i.e., $G = 0$ and $E = 0$, so that $D = A$ and the problem reduces to minimizing $\|C\|_* + \frac{\tau}{2}\|A - AC\|_F^2$ subject to $C = C^\top$. In Section 3.1, we show that the optimal solution for C can be obtained in closed form from the SVD of A by applying a nonlinear thresholding to its singular values. In Section 3.2, we assume that the self-expressiveness constraint is satisfied exactly, i.e., $A = AC$, so that the problem reduces to $\min_C \|C\|_*$ subject to $A = AC$ and $C = C^\top$. As shown in Liu et al. (2011), the optimal solution to this problem can be
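A sketch of the closed-form solution for C in the relaxed (finite-$\tau$) case, assuming it has the form stated in the paper's lemma: singular values of A below $1/\sqrt{\tau}$ are dropped, and the remaining ones are polynomially rescaled. Treat this as a hedged reconstruction rather than a verbatim implementation:

```python
import numpy as np

def lrsc_uncorrupted_C(A, tau):
    """Assumed closed-form minimizer of ||C||_* + (tau/2) ||A - A C||_F^2:
        C = V1 (I - (1/tau) S1^{-2}) V1^T,
    where A = U S V^T and S1 keeps only the singular values larger than
    1/sqrt(tau). The result is symmetric by construction."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > 1.0 / np.sqrt(tau)
    V1, s1 = Vt[keep].T, s[keep]
    return V1 @ np.diag(1.0 - 1.0 / (tau * s1 ** 2)) @ V1.T
```

Note that as $\tau \to \infty$ every singular value is kept and the rescaling factors tend to 1, so C tends to $V_1 V_1^\top$, recovering the Costeira and Kanade (1998) affinity mentioned above.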
Low rank subspace clustering with noisy data
In this section, we consider the low rank subspace clustering problem P in the case of noisy data, i.e., $E = 0$, so that $D = A + G$ and $G = D - A$. While in principle the resulting problem appears to be very similar to those in eqs. (16) and (19), there are a number of differences. First, notice that instead of expressing the noisy data as a linear combination of itself plus noise, i.e., $D = DC + G$, we search for a clean dictionary, A, which is self-expressive, i.e., $A = AC$. We then assume that the data are
Low rank subspace clustering with corrupted data
In this section, we consider the low-rank subspace clustering problem in the case of data corrupted by both noise and gross errors, i.e., problem P. Similar to the case of noisy data discussed in Section 4, the major difference between P and the problems in (17), (19) is that, rather than using a corrupted dictionary, we search simultaneously for the clean dictionary A, the low-rank coefficients C and the sparse errors E. Also, notice that the norm of the matrix of coefficients is replaced
Experiments
In this section we evaluate the performance of LRSC on two computer vision tasks: motion segmentation and face clustering. Using the subspace clustering error,
$$\text{clustering error} = \frac{\#\ \text{of misclassified points}}{\text{total}\ \#\ \text{of points}} \times 100\%,$$
as a measure of performance, we compare LRSC to state-of-the-art subspace clustering algorithms based on spectral clustering, such as LSA (Yan and Pollefeys, 2006), SCC (Chen and Lerman, 2009), LRR (Liu et al., 2010), and SSC (Elhamifar and Vidal, 2013). We choose these
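Computing the clustering error requires matching the estimated groups to the ground-truth groups; a small sketch that brute-forces the best label permutation (adequate for the handful of groups used in motion segmentation and face clustering benchmarks):

```python
import numpy as np
from itertools import permutations

def clustering_error(true_labels, pred_labels):
    """Subspace clustering error in percent: the fraction of misclassified
    points under the best one-to-one matching between estimated and
    ground-truth group labels."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    groups = np.unique(true_labels)
    best = 1.0
    for perm in permutations(groups):
        mapping = dict(zip(np.unique(pred_labels), perm))
        mapped = np.array([mapping[p] for p in pred_labels])
        best = min(best, float(np.mean(mapped != true_labels)))
    return 100.0 * best
```

The permutation search matters because cluster labels returned by spectral clustering are arbitrary: predicting the groups exactly but with swapped labels should count as zero error.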
Discussion and conclusion
We have proposed a new algorithm for clustering data drawn from a union of subspaces and corrupted by noise/gross errors. Our approach was based on solving a non-convex optimization problem whose solution provides an affinity matrix for spectral clustering. Our key contribution was to show that important particular cases of our formulation can be solved in closed form by applying a polynomial thresholding operator to the SVD of the data. A drawback of our approach to be addressed in the future
References (49)
- Agarwal, P., Mustafa, N., 2004. k-means projective clustering. In: ACM Symposium on Principles of Database Systems.
- Basri, R., Jacobs, D., 2003. Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Boult, T., Brown, L., 1991. Factorization-based segmentation of motions. In: IEEE Workshop on Motion Understanding.
- Bradley, P.S., Mangasarian, O.L., 2000. k-plane clustering. Journal of Global Optimization.
- Cai, J.-F., Candès, E.J., Shen, Z., 2008. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization.
- Candès, E.J., Li, X., Ma, Y., Wright, J., 2011. Robust principal component analysis? Journal of the ACM.
- Chen, G., Lerman, G., 2009. Spectral curvature clustering (SCC). International Journal of Computer Vision.
- Costeira, J., Kanade, T., 1998. A multibody factorization method for independently moving objects. International Journal of Computer Vision.
- Elhamifar, E., Vidal, R., 2009. Sparse subspace clustering. In: IEEE Conference on Computer Vision and Pattern Recognition.
- Elhamifar, E., Vidal, R., 2013. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Gear, C.W., 1998. Multibody grouping from motion images. International Journal of Computer Vision.
- Hong, W., Wright, J., Huang, K., Ma, Y., 2006. Multi-scale hybrid linear models for lossy image representation. IEEE Transactions on Image Processing.
- Lee, K.-C., Ho, J., Kriegman, D., 2005. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y., 2013. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Yang, A.Y., Wright, J., Ma, Y., Sastry, S., 2008. Unsupervised segmentation of natural images via lossy data compression. Computer Vision and Image Understanding.