
Robust Generalized Low Rank Approximations of Matrices

  • Jiarong Shi ,

    jiarongs3@163.com

    Affiliation School of Science, Xi'an University of Architecture and Technology, Xi'an, China

  • Wei Yang,

    Affiliation School of Science, Xi'an University of Architecture and Technology, Xi'an, China

  • Xiuyun Zheng

    Affiliation School of Science, Xi'an University of Architecture and Technology, Xi'an, China

Abstract

In recent years, the intrinsic low rank structure of some datasets has been extensively exploited to reduce dimensionality, remove noise and complete missing entries. As a well-known technique for dimensionality reduction and data compression, Generalized Low Rank Approximations of Matrices (GLRAM) claims superiority over the SVD in computation time and compression ratio. However, GLRAM is very sensitive to sparse large noise or outliers, and its robust version has not yet been explored. To address this problem, this paper proposes a robust method for GLRAM, named Robust GLRAM (RGLRAM). We first formulate RGLRAM as an l1-norm optimization problem which minimizes the l1-norm of the approximation errors. Secondly, we apply the technique of Augmented Lagrange Multipliers (ALM) to solve this l1-norm minimization problem and derive a corresponding iterative scheme. Then the weak convergence of the proposed algorithm is discussed under mild conditions. Next, we investigate a special case of RGLRAM and extend RGLRAM to a general tensor case. Finally, extensive experiments on synthetic data show that it is possible for RGLRAM to exactly recover both the low rank and the sparse components, while this may be difficult for previous state-of-the-art algorithms. We also discuss three issues on RGLRAM: the sensitivity to initialization, the generalization ability and the relationship between the running time and the size/number of matrices. Moreover, the experimental results on images of faces with large corruptions illustrate that RGLRAM achieves better denoising and compression performance than the other methods.

Introduction

In the communities of pattern recognition, machine learning and computer vision, a commonly-used tenet is that the datasets of interest lie in single or multiple linear subspaces. This low rank structure can be exploited to reduce dimensionality, remove noise and complete missing entries. In view of this, low rank subspace methods have attracted broad attention in the past few years. These methods include not only classical paradigms such as Principal Component Analysis (PCA) [1], Singular Value Decomposition (SVD) [2] and Linear Discriminant Analysis (LDA) [3], but also the recently emerged Low Rank Matrix Recovery (LRMR), composed mainly of Matrix Completion (MC) [4,5], Robust Principal Component Analysis (RPCA) [6,7] and Low Rank Representation (LRR) [8,9].

Conventionally, each sample is modeled as a vector and a collection of data samples is represented by a matrix. However, some real-world samples such as gray-level images are two-dimensional in essence. Under this circumstance, the traditional vector subspace model cannot preserve the bilinear structure of the samples. To remedy this deficiency, one-dimensional subspace methods have successively been generalized to the two-dimensional case. The resulting two-dimensional subspace methods mainly include two-dimensional PCA (2dPCA) [10], two-dimensional SVD (2dSVD) [11], two-dimensional LDA (2dLDA) [12], two-directional two-dimensional PCA ((2d)2PCA) [13], Generalized Low Rank Approximations of Matrices (GLRAM) [14,15] and so on. Among them, the latter two methods have equivalent tri-factorization formulations, that is, they use two-sided transformations rather than single-sided ones. Compared to SVD, GLRAM has been verified experimentally to have better compression performance and to consume less computation time [14].

Recently, Liu et al. [15] revealed the relationship between GLRAM and SVD, and gave a lower bound on GLRAM’s objective function; Shi et al. [16] proposed a method of matrix completion via GLRAM. The existing GLRAM is designed to handle Gaussian noise, and the Frobenius norm is adopted to evaluate the magnitude of the approximation errors. However, GLRAM does not work well in the presence of large sparse noise or outliers. As far as we know, the robustness of GLRAM has not been explored yet. To this end, this paper proposes a novel robust approach to GLRAM, named Robust GLRAM (RGLRAM). Mathematically, RGLRAM is formulated as an l1-norm minimization problem and the Augmented Lagrange Multiplier (ALM) method [17] is applied to solve this non-smooth optimization problem. We also extend RGLRAM to the tensor case and present an iterative scheme for robust low rank tensor approximations.

The rest of this paper is organized as follows. Section 2 briefly reviews GLRAM. The model and the algorithm of RGLRAM are proposed in Sections 3 and 4, respectively. Section 5 presents weak convergence results for the proposed algorithm under mild conditions. In Section 6, we discuss the special case of RGLRAM for a single matrix and generalize RGLRAM to the case of tensors. Experimental results on synthetic data and images of faces are reported in Section 7. Conclusions and directions for future work can be found in Section 8.

Review of Generalized Low Rank Approximations of Matrices

As a generalization of SVD, Generalized Low Rank Approximations of Matrices (GLRAM) treats each sample as a two-dimensional matrix pattern and simultaneously carries out tri-factorizations on multiple matrices. Given a collection of N two-dimensional training samples {Di ∈ ℝm×n, i = 1, 2,…, N}, we decompose each sample into the product of three matrices:
Di ≈ LMiRT, i = 1, 2,…, N, (1)
where L ∈ ℝm×r1 and R ∈ ℝn×r2 are orthogonal transformation matrices, Mi ∈ ℝr1×r2, and max(r1, r2) < min(m, n). The goal of GLRAM is to seek N+2 matrices L, R and M1,…, MN such that LMiRT approximates Di for all i.

Mathematically, GLRAM can be formulated as the minimization problem
minL,R,{Mi} Σi ‖Di − LMiRT‖F²  s.t.  LTL = Ir1, RTR = Ir2, (2)
where Ir1 is the r1×r1 identity matrix and ‖⋅‖F is the Frobenius norm of a matrix. If {L, R, M1,…,MN} is the optimal solution of problem (2), then it holds that Mi = LTDiR. Based on this property, we only need to compute the two transformations L and R by minimizing the reconstruction errors:
minL,R Σi ‖Di − LLTDiRRT‖F²  s.t.  LTL = Ir1, RTR = Ir2. (3)

Generally speaking, the above optimization problem does not have a closed-form solution. An iterative procedure for computing L and R was proposed in [14]. Specifically, for fixed L, the optimal R is formed by stacking the top r2 eigenvectors of Σi DiTLLTDi; similarly, for given R, the optimal L is constructed from the top r1 eigenvectors of Σi DiRRTDiT. This iterative procedure is repeated until convergence. In this paper, the stopping condition is set as in (4), where ε is a small positive number.
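For readers who wish to reproduce this step, the following Matlab fragment is a minimal sketch of one GLRAM iteration. It is an illustration rather than the implementation used in [14], and it assumes that D is an N-by-1 cell array of m-by-n matrices and that L holds the current m-by-r1 transformation.

MR = zeros(n, n);
for i = 1:N
    T = D{i}' * L;            % n-by-r1
    MR = MR + T * T';         % accumulates Sum_i Di'*L*L'*Di
end
[R, ~] = eigs(MR, r2);        % top r2 eigenvectors give the new R
ML = zeros(m, m);
for i = 1:N
    T = D{i} * R;             % m-by-r2
    ML = ML + T * T';         % accumulates Sum_i Di*R*R'*Di'
end
[L, ~] = eigs(ML, r1);        % top r1 eigenvectors give the new L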

GLRAM has good compression performance due to the fact that Di is approximated by LMiRT. To store {L, R, M1,…,MN}, mr1 + nr2 + Nr1r2 scalars are required. Hence, the compression ratio of GLRAM is Nmn/(mr1 + nr2 + Nr1r2). Moreover, formulation (1) is equivalent to
vec(Di) ≈ (R ⊗ L)vec(Mi), (5)
where ⊗ denotes the Kronecker product and vec(⋅) is the vectorization operator which stacks the columns of a matrix into a long column vector. Based on formulation (5), GLRAM can be interpreted as a linear representation problem, where R ⊗ L is the basis matrix.
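As a worked example of the compression ratio formula, using for illustration the settings of the face experiments reported later (N = 400 images of size 64×64 and r1 = r2 = 20):

N = 400; m = 64; n = 64; r1 = 20; r2 = 20;
ratio = N*m*n / (m*r1 + n*r2 + N*r1*r2)   % = 1638400/162560, about 10.08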

Robust Model of Generalized Low Rank Approximations of Matrices

We rewrite the tri-factorization formulation (1) as
Di = LMiRT + Ei, i = 1, 2,…, N, (6)
where Ei ∈ ℝm×n are noise matrices. If the entries of Ei obey independent normal distributions with zero means, then the maximum likelihood estimation of the hidden factors is equivalent to minimizing the objective function of problem (2). As is well known, this objective function, also called the cost function, is very sensitive to large sparse corruptions or outliers. To handle such corruptions, we now assume that the entries of Ei are generated by independent Laplacian distributions. According to the method of maximum likelihood estimation, the hidden factors {L, R, M1,…,MN} can then be obtained by solving the following l1-norm minimization problem:
minL,R,{Mi} Σi ‖Di − LMiRT‖1  s.t.  LTL = Ir1, RTR = Ir2, (7)
where ‖⋅‖1 is the component-wise l1-norm of a matrix (i.e. the sum of the absolute values of its entries). To simplify the objective function in problem (7), we introduce N additional variables E1,…, EN. The above minimization problem is then equivalent to
minL,R,{Mi},{Ei} Σi ‖Ei‖1  s.t.  Di = LMiRT + Ei, LTL = Ir1, RTR = Ir2, i = 1, 2,…, N. (8)

We refer to problem (7), or equivalently (8), as Robust Generalized Low Rank Approximations of Matrices (RGLRAM).

Compared with problem (2), problem (7) or (8) is more robust to large sparse noise or outliers because the component-wise l1-norm of a matrix is the tightest convex relaxation of the component-wise l0-norm (i.e. the number of non-zero entries). Although problem (8) involves l1-norm minimization, it is non-convex. Therefore, we cannot directly use the available methods of compressed sensing to solve it. In the next section, we apply the Augmented Lagrange Multipliers (ALM) method to solve problem (8).

Algorithm to Robust Generalized Low Rank Approximations of Matrices

The method of Lagrange multipliers is a strategy for converting the optimization problem with equality constraints into an unconstrained problem. In view of this, we propose an iterative algorithm based on ALM to solve problem (8).

For the sake of simplicity, we set M = {M1,…, MN} and E = {E1,…, EN}. Without considering the orthogonal constraints in problem (8), we construct its partial augmented Lagrange function:
fμ(L, R, M, E, Y) = Σi ( ‖Ei‖1 + 〈Yi, Di − LMiRT − Ei〉 + (μ/2)‖Di − LMiRT − Ei‖F² ), (9)
where 〈⋅,⋅〉 indicates the inner product between matrices, the penalty factor μ is positive, Yi ∈ ℝm×n are Lagrange multiplier matrices and Y = {Y1,…, YN}. For given Y, a block-based gradient descent search technique is proposed to minimize fμ(L, R, M, E, Y). Concretely speaking, at each iteration one block of variables is treated as unknown while the other blocks are fixed, and the unknown block is updated by minimizing fμ(L, R, M, E, Y). The specific update formulations are derived as follows.

Computing L. If L is unknown and the other variables are fixed, we update L by minimizing fμ(L, R, M, E, Y) with respect to L. Considering the orthogonal constraints in problem (8), the subproblem for L is
minL Σi (μ/2)‖Di − LMiRT − Ei + Yi/μ‖F²  s.t.  LTL = Ir1. (10)

Hence, it holds that L solves
maxL 〈L, P〉  s.t.  LTL = Ir1, (11)
where P = Σi (Di − Ei + Yi/μ)RMiT.

The constraint surface in problem (11) is known as the Stiefel manifold. For optimization problems on Stiefel manifolds, Edelman et al. developed Newton and conjugate gradient algorithms [18]. These two algorithms carry a heavy computational burden because they require the Hessian of the objective function. To reduce the computational complexity, we relax the orthogonal constraint into a sphere constraint and obtain the relaxed problem (12), whose optimal solution is a positive scalar multiple of P. Meanwhile, we can verify that this solution does not increase the value of fμ(L, R, M, E, Y), although it probably does not satisfy the orthogonal constraint. We then set
L := QR(P), (13)
where QR(⋅) denotes the thin QR decomposition of a matrix, so that L is an orthonormal basis of the range space of P. It cannot be guaranteed that the iteration formulation (13) decreases the value of fμ(L, R, M, E, Y). However, since (14) holds, the value of fμ(L, R, M, E, Y) surely does not increase after updating Mi.

Computing M. When Mj is unknown and the other blocks of variables are given, the update formulation for Mj is
Mj = LT(Dj − Ej + Yj/μ)R, j = 1, 2,…, N. (15)

Computing R. For each j ∈ {1, 2,…, N}, equality (16) holds, which means that the iterative formulation of R is similar to that of L. Let Q = Σi (Di − Ei + Yi/μ)TLMi. Then R is updated as
R := QR(Q). (17)
It is worth noting that Eq (17) is followed by the calculation procedure of M in order to decrease the value of fμ(L, R, M, E, Y).

Computing E. Fix L, R, M and Y, and minimize fμ(L, R, M, E, Y) with respect to Ej:
Ej = S1/μ[Dj − LMjRT + Yj/μ], (18)
where Sδ[⋅] is the absolute value shrinkage operator defined component-wise by
(Sδ[X])ij = sign(xij) max(|xij| − δ, 0) (19)
for an arbitrary real matrix X = (xij)m×n and δ > 0.
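In Matlab, the shrinkage operator (19) can be written as a small helper function. This is a sketch consistent with the E-update in the Appendix; the function name shrink is ours.

function X = shrink(X, delta)
% Component-wise absolute value shrinkage (soft-thresholding), Eq (19).
X = sign(X) .* max(abs(X) - delta, 0);
end

With this helper, update (18) reads E{j} = shrink(D{j} - L*M{j}*R' + Y{j}/mu, 1/mu) when R is stored as an n-by-r2 matrix (in the Appendix code R is stored transposed, so the corresponding expression there is D{i} + Y{i}/mu - L*M{i}*R).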

Computing Y. For given L, R, M and E, we update Yj according to
Yj := Yj + μ(Dj − LMjRT − Ej), j = 1, 2,…, N. (20)

Denote the m×n zero matrix by Om×n. The whole iterative procedure is outlined as follows.

Algorithm: ALM algorithm for RGLRAM

Input: Data matrices D1, D2,…, DN, r1 and r2.

Initialize: L, R, Mi = LTDiR, Ei = Om×n, Yi = Om×n, ρ = 1.1, μ = 10⁻⁴, μmax = 10¹⁰, i = 1, 2,…, N.

Step 1: while not converged do

Update L according to (13);

Update M according to (15);

Update R according to (17);

Update M according to (15);

Update E according to (18);

Update Y according to (20);

Update μ as μ:= min(ρμ, μmax);

End while.

Output: L, R, M1,…, MN and E1,…, EN.

During the initialization, we can stipulate L = orth(randn(m, r1)) and R = orth(randn(n, r2)), where orth(⋅) returns an orthonormal basis for the range of a matrix and randn(m, r1) is an m×r1 random matrix whose entries are generated from the standard normal distribution. The stopping condition can be set as
(Σi ‖Di − LMiRT − Ei‖F² / Σi ‖Di‖F²)^(1/2) < ε, (21)
or the maximum number of iterations is reached, where ε is a sufficiently small positive number.

Most of the time spent by the above algorithm goes into matrix multiplications and thin QR decompositions. For convenience of description, we suppose that mn > r1r2. Under this assumption, the computation complexity of GLRAM at each loop is O(Nmnr2). Subsequently, we discuss the computation complexity of the ALM algorithm. It takes O(Nmnr2) time to compute P and O(mr1²) time to perform the thin QR decomposition of P. Hence, the computation time of L is O(Nmnr2). Similarly, the computation time of R is also O(Nmnr2). Moreover, the computation times of M, E and Y are all O(Nmnr2). In conclusion, the total computation complexity of the ALM algorithm at each loop is O(Nmnr2), which means that RGLRAM has the same computation complexity as GLRAM.

Convergence Analysis on the Proposed Algorithm

To the best of our knowledge, the convergence theory of ALM algorithms has not been established for non-convex problems or for convex problems with more than two blocks of variables. Consequently, it is very difficult to prove the convergence of the proposed ALM algorithm. However, empirical evidence in Section 7 shows strong convergence behavior of our algorithm. This section discusses weak convergence results for the proposed algorithm under mild conditions.

If both L and R are known, then problem (8) is reformulated as
min{Mi},{Ei} Σi ‖Ei‖1  s.t.  Di = LMiRT + Ei, i = 1, 2,…, N. (22)

The augmented Lagrange function of the above problem is
Σi ( ‖Ei‖1 + 〈Yi, Di − LMiRT − Ei〉 + (μ/2)‖Di − LMiRT − Ei‖F² ). (23)

To solve problem (22), we can simply revise the ALM algorithm by deleting the iterative formulations of L and R. In such a case, we have the following statement.

Theorem 1. Let {M(k), E(k), Y(k)} be the sequences produced by the revised ALM algorithm. If the (un-augmented) Lagrange function of problem (22) has a saddle point, then the sequences {M(k), E(k)} satisfy that the objective function in problem (22) approaches the optimal value as k → ∞.

Proof. The objective function of problem (22) can be rewritten as g(M) + h(E), where g(M) = 0 and h(E) = Σi ‖Ei‖1. It is obvious that g(M) and h(E) are closed, proper and convex functions.

Let d = vec((D1,…, DN)), x = vec((M1,…, MN)), y = vec((E1,…, EN)) and B = diag(R ⊗ L,…, R ⊗ L), where diag(⋅) is the diagonal block matrix operator. Then the equality constraints in problem (22) can be re-expressed as Bx + y = d. We denote g(x) = g(M) and h(y) = h(E). According to the basic convergence result given in [19], we have Bx(k) + y(k) − d → 0 and g(x(k)) + h(y(k)) → f* as k → ∞, where f* is the minimum value of g(M) + h(E) under the constraint Bx + y = d. This ends the proof.

Moreover, problem (8) is essentially equivalent to the following minimization problem without orthogonal constraints:
minL,R,M,E Σi ‖Ei‖1  s.t.  Di = LMiRT + Ei, i = 1, 2,…, N. (24)

Its un-augmented Lagrange function is
φ(L, R, M, E, Y) = Σi ( ‖Ei‖1 + 〈Yi, Di − LMiRT − Ei〉 ). (25)

On the basis of the differentials or sub-differentials of φ(L, R, M, E, Y) with respect to each block of variables, we have the following KKT conditions for problem (24):
Dj = LMjRT + Ej,  Yj ∈ ∂‖Ej‖1,  LTYjR = O,  j = 1, 2,…, N,  Σj YjRMjT = O,  Σj YjTLMj = O, (26)
where ∂‖⋅‖1 is the set of sub-differentials of the component-wise l1-norm of a matrix.

For arbitrary μ > 0, the condition Yj ∈ ∂‖Ej‖1 is equivalent to
Ej = S1/μ[Ej + Yj/μ]. (27)

Denote the associated scalar mapping by Qμ(t); it is monotonically increasing [20]. Applying Qμ(⋅) element-wise to Ej and invoking the equation Dj = LMjRT + Ej, we arrive at (28). The above analysis means that the condition Yj ∈ ∂‖Ej‖1 in (26) can be replaced by (28).

Theorem 2. Let X = {L, R, M, E, Y} and be an iterative sequence produced by the ALM algorithm. If , and , then any accumulation point of satisfies the KKT conditions (26), where

Proof. By the iterative formulations (18) and (20), we obtain (29) and (30).

The assumptions of the theorem imply that both sides of (29) and (30) tend to zero as k approaches infinity. Therefore, (31) and (32) hold.

Moreover, the remaining relation follows from (33). This completes the proof.

Special Case and Tensor Extension of RGLRAM

This section will conduct a further discussion on RGLRAM from two aspects. Firstly, we consider the robust low rank approximations of a single matrix. Secondly, we study the robust low rank tensor approximations by generalizing the proposed method to higher order tensors.

The Case for N = 1

Now, we consider a special case of RGLRAM, namely N = 1. In this case, problem (8) is rewritten as
minL,R,M1,E1 ‖E1‖1  s.t.  D1 = LM1RT + E1, LTL = Ir1, RTR = Ir2. (34)

We call problem (34) the l1-norm SVD. In other words, problem (34) is a robust version of SVD. Recently, Wright et al. [6] and Candès et al. [7] proposed a principal component pursuit method to recover simultaneously the low rank and the sparse components. The proposed method, also referred to as Robust PCA (RPCA), is described mathematically as the convex optimization problem
minA,E ‖A‖* + λ‖E‖1  s.t.  D1 = A + E, (35)
where ‖A‖* is the nuclear norm of A (i.e. the sum of the singular values of A) and the tradeoff parameter λ > 0. A good rule of thumb for choosing the parameter is λ = 1/√max(m, n) [7]. The main algorithms for solving the nuclear norm minimization problem (35) suffer from the high computational burden of an SVD at each iteration. If the rank of the low rank component is properly estimated, then RPCA boils down to problem (34).

We further set r1 = r2 = r. Then problem (34) is transformed into
minU,V ‖D1 − UVT‖1, (36)
where U ∈ ℝm×r and V ∈ ℝn×r. This problem is also known as l1-norm PCA. An alternating optimization method was proposed to solve it [21]. Specifically, one minimizes over one argument, U or V, while keeping the other fixed, and then the optimization problem decomposes into m or n independent linear programs. Kwak [22] proposed another method of PCA based on l1-norm maximization. This method changes problem (36) into the following formulation:
maxU ‖UTD1‖1  s.t.  UTU = Ir. (37)

Although V is absent in the above problem, it is still very hard to directly solve it. A greedy strategy [22] was proposed to solve problem (37). However, the successive greedy solutions might not be ideal.

Extensions to Higher Order Tensors

A K-order tensor is defined as an array 𝒳 ∈ ℝI1×I2×…×IK whose (i1, i2,…, iK) entry is xi1i2…iK, where 1 ≤ ik ≤ Ik, 1 ≤ k ≤ K. The k-mode matricization of 𝒳 is the matrix X(k) ∈ ℝIk×(I1⋯Ik−1Ik+1⋯IK) whose columns are the mode-k fibers of 𝒳. Given a matrix U ∈ ℝJk×Ik, the k-mode product of 𝒳 by U, denoted 𝒳 ×k U, is an I1×…×Ik−1×Jk×Ik+1×…×IK tensor whose (i1,…, ik−1, j, ik+1,…, iK) entry is computed by summing xi1…ik…iK ujik over ik = 1,…, Ik. The inner product, component-wise l1-norm and Frobenius norm of matrices can be easily generalized to tensors. Refer to the survey [23] for further details on tensor algebra.
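The following Matlab fragment is a minimal sketch of these two operations. The helper names unfold and kmode_prod are ours, and the column ordering of the unfolding is simply the one induced by Matlab's column-major layout, which may differ from the convention in [23].

function Xk = unfold(X, k)
% Mode-k matricization: the mode-k fibers of X become the columns of Xk.
K = ndims(X); dims = size(X);
Xk = reshape(permute(X, [k, 1:k-1, k+1:K]), dims(k), []);
end

function Y = kmode_prod(X, U, k)
% k-mode product Y = X x_k U for a K-order tensor X and a Jk-by-Ik matrix U.
K = ndims(X); dims = size(X); perm = [k, 1:k-1, k+1:K];
Xk = reshape(permute(X, perm), dims(k), []);   % mode-k unfolding of X
dims(k) = size(U, 1);                          % the k-th mode size becomes Jk
Y = ipermute(reshape(U * Xk, dims(perm)), perm);
end

With these helpers, a fit term such as the one appearing in the next paragraph can be evaluated as D - kmode_prod(kmode_prod(M, L, 1), R, 2) for numeric arrays D and M.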

In the following, we first consider the tensor expression of RGLRAM. Three-order tensors 𝒟, ℳ and ℰ are formed by concatenating the collections {Di}, {Mi} and {Ei} along the third mode, respectively, where 𝒟, ℰ ∈ ℝm×n×N and ℳ ∈ ℝr1×r2×N. Hence, Eq (6) can be re-expressed as 𝒟 = ℳ ×1 L ×2 R + ℰ. Based on this, problem (8) is reformulated as
minL,R,ℳ,ℰ ‖ℰ‖1  s.t.  𝒟 = ℳ ×1 L ×2 R + ℰ, LTL = Ir1, RTR = Ir2. (38)

The corresponding partial augmented Lagrange function is
‖ℰ‖1 + 〈𝒴, 𝒟 − ℳ ×1 L ×2 R − ℰ〉 + (μ/2)‖𝒟 − ℳ ×1 L ×2 R − ℰ‖F², (39)
where 𝒴 ∈ ℝm×n×N is the Lagrange multiplier tensor.

Next, we extend RGLRAM to the general case of tensors and propose the corresponding iterative scheme for robust low rank tensor approximations. Consider a K-order data tensor 𝒟 ∈ ℝI1×…×IK corrupted by large sparse noise. We assume that the underlying tensor is intrinsically low rank. Thus, 𝒟 is decomposed into the sum of a low rank tensor and a noise component, that is,
𝒟 = ℳ ×1 L1 ×2 L2 ⋯ ×K LK + ℰ, (40)
where ℳ ∈ ℝr1×…×rK, Li ∈ ℝIi×ri, ℰ ∈ ℝI1×…×IK and ri < Ii, i = 1, 2,…, K. We can further impose orthogonal constraints on all mode matrices Li. To recover the low rank and the sparse terms, we solve the following minimization problem:
min{Li},ℳ,ℰ ‖ℰ‖1  s.t.  𝒟 = ℳ ×1 L1 ⋯ ×K LK + ℰ, LiTLi = Iri, i = 1, 2,…, K, (41)
where ‖⋅‖1 denotes the component-wise l1-norm of a tensor. If K = 3 and the third mode is not transformed, then problem (41) reduces to problem (38).

Without considering the orthogonal constraints in problem (41), we construct its partial augmented Lagrange function:
‖ℰ‖1 + 〈𝒴, 𝒟 − ℳ ×1 L1 ⋯ ×K LK − ℰ〉 + (μ/2)‖𝒟 − ℳ ×1 L1 ⋯ ×K LK − ℰ‖F². (42)

Similarly, we propose an ALM method to solve problem (41). Concretely speaking, when 𝒴 is fixed, we alternately update each block of variables by minimizing the augmented Lagrange function (42) with respect to one argument while keeping the others fixed.

Computing L. For fixed j ∈ {1, 2,…, K}, let Lj be unknown and the other blocks of variables be known. Then the update formulation of Lj is given by (43), where Pj denotes the mode-j counterpart of the matrix P in (11). In view of the orthogonality of Lj, we take
Lj := QR(Pj). (44)

Computing ℳ. Once Lj is updated, we immediately recompute ℳ. The calculation procedure is as follows:
ℳ = (𝒟 − ℰ + 𝒴/μ) ×1 L1T ×2 L2T ⋯ ×K LKT. (45)

Computing ℰ. Let ℰ be unknown and the other variables be fixed. We update ℰ according to the following formulation:
ℰ = S1/μ[𝒟 − ℳ ×1 L1 ⋯ ×K LK + 𝒴/μ], (46)
where Sδ[⋅] is the tensor generalization of the absolute value shrinkage operator (19).

Computing 𝒴. The update for 𝒴 is as follows:
𝒴 := 𝒴 + μ(𝒟 − ℳ ×1 L1 ⋯ ×K LK − ℰ). (47)

The complete algorithm description of the robust low rank tensor approximations is omitted due to its similarity with RGLRAM.

Experiments

A. Ethics Statement

Some face datasets were used in this paper to verify the performance of our method. These face datasets are publicly available for face recognition research, and the consent was not needed. The face images and the experimental results are reported in this paper without any commercial purpose.

B. Experimental Results

In this section, we carry out experiments on synthetic data and real-world face datasets to illustrate the feasibility and effectiveness of the proposed method. The experimental results of RGLRAM are compared with those of previous state-of-the-art methods: PCAL1 [22], RPCA and GLRAM. Both PCAL1 and RPCA are applied to each training sample Di, and the inexact ALM algorithm [17] is employed to solve RPCA. For the aforementioned four methods, the tolerance ε is set to 10⁻⁸ and the maximum number of iterations is 1000. All experiments are performed using Matlab R2012a on an Intel Core i3-3220 3.30 GHz machine with 3.48 GB RAM.

Synthetic data.

In this subsection, we synthesize N data matrices according to Di = Ai + Ei, where the Ai are low rank and the Ei are sparse. Concretely speaking, the Ai are generated randomly as
Ai = L0Mi0R0T, i = 1, 2,…, N, (48)
where L0 ∈ ℝm×r′ and R0 ∈ ℝn×r′ are obtained by orthogonalizing random matrices whose entries are independently drawn from the standard normal distribution, and the entries of Mi0 ∈ ℝr′×r′ are also drawn independently from the standard normal distribution. As for each Ei, we first define N projection operators
(PΩi(X))jk = xjk if (j, k) ∈ Ωi, and 0 otherwise, (49)
for an arbitrary X = (xjk) ∈ ℝm×n, where each Ωi is produced by uniformly sampling the entries of {1, 2,…, m}×{1, 2,…, n} with probability p, i = 1, 2,…, N. Then we generate Gi ∈ ℝm×n with entries drawn from the uniform distribution on the interval (−a, a). Finally, we set Ei = PΩi(Gi). (A Matlab script that generates such data is given in Appendix B.) The magnitude of the noise is measured by the Inverse Signal-to-Noise Ratio (ISNR) defined as
ISNR = (Σi ‖Ei‖F² / Σi ‖Ai‖F²)^(1/2). (50)

A large ISNR means heavy noise, which is likely to be disadvantageous to recovering the low rank components.

For each Di, we can obtain its low rank approximation Âi and its noise component Êi by any of the methods discussed in this paper. Relative Errors (REs) are adopted to evaluate the recovery of the low rank and the sparse components, respectively; they are defined as
RE1 = (Σi ‖Ai − Âi‖F² / Σi ‖Ai‖F²)^(1/2), (51)
RE2 = (Σi ‖Êi − Ei‖F² / Σi ‖Ei‖F²)^(1/2). (52)

The smaller the relative error is, the better the recovery performance is.

In the detailed implementation, the parameter settings are as follows: m = n = 100, N = 50, r1 = r2 = r, and the data are generated with true rank r′. We set r = r′ and vary it from 10 to 50 with an interval of 10. Three groups of experiments are designed according to the magnitude of the noise, that is, we take a = 0.5, 5 and 15, respectively. In each group of experiments, we consider two different sampling rates for Ωi: p = 0.1 and 0.2. For fixed r, a and p, the experiments are repeated 10 times and the average results are reported. The experimental results are shown in Tables 1, 2 and 3, respectively.

Table 1. Experimental comparison of different methods for a = 0.5.

https://doi.org/10.1371/journal.pone.0138028.t001

Table 2. Experimental comparison of different methods for a = 5.

https://doi.org/10.1371/journal.pone.0138028.t002

Table 3. Experimental comparison of different methods for a = 15.

https://doi.org/10.1371/journal.pone.0138028.t003

From these three tables, we have the following observations. (I) For the given levels of noise, PCAL1 cannot effectively recover the low rank and the sparse components although it has an advantage in running time. The smallest relative error of PCAL1 is 0.154 and the values of RE1 are larger than or equal to 1.58 when a = 5 or 15. To some extent, the failure of PCAL1 can be attributed to the relatively large ISNR. (II) RPCA achieves good recovery performance for small r and has strong robustness to large sparse noise. When r = 10, the largest relative error of RPCA is 0.00644, which indicates that RPCA obtains satisfactory recovery performance. However, RPCA deteriorates drastically as r increases. Besides, it also has the longest running time. (III) For relatively small noise corruptions (i.e. a = 0.5), the RE1 values of GLRAM are grudgingly acceptable while the RE2 values are unacceptable. In the presence of medium or large noise corruptions (i.e. a = 5 or 15), GLRAM cannot effectively recover either the low rank or the sparse matrices. (IV) If a ∈ {0.5, 5} or p = 0.1, the smallest relative error of RGLRAM is 1.46×10⁻⁷, which means that RGLRAM almost exactly recovers both the low rank and the sparse components. The ISNR corresponding to the case (p, a) = (0.2, 15) is so large that RGLRAM cannot effectively recover the low rank and the sparse terms. With regard to running time, RGLRAM is generally slower than GLRAM and faster than RPCA. In summary, RGLRAM is the most effective of these four methods for recovering the low rank components.

Sensitivity to initialization and parameter choice.

This subsection evaluates the sensitivity of the ALM algorithm to the initialization of {L, R, E, Y} and to the choice of {r1, r2}. We still use artificial datasets generated in the manner of the previous subsection.

A simple initialization strategy for {L, R, E, Y} is adopted in Subsection 7.1, that is, both L and R are random orthogonal matrices while Ei and Yi are initialized as zero matrices. For this synthetic data, we observe from Tables 1, 2 and 3 that RGLRAM successfully recovers the low rank and the sparse components if a ∈ {0.5, 5} or p = 0.1. In this situation, the standard deviation of the running time varies in the interval [0.76, 4.58], that of RE1 varies in the interval [1.77×10⁻⁹, 7.52×10⁻⁸] and that of RE2 varies in the interval [9.02×10⁻¹⁰, 1.43×10⁻⁸]. These observations illustrate that the ALM algorithm is not very sensitive to the random initialization of L and R. Subsequently, we discuss the sensitivity of the ALM algorithm to the initialization of E and Y.

In the previous experiments, we initialized Ei and Yi with zero matrices. Here, a new initialization scheme is considered: Ei = λ1 × randn(m, n) and Yi = λ2 × randn(m, n), where λ1 and λ2 are two given real numbers, i = 1, 2,…, N. Obviously, Ei and Yi are zero matrices if λ1 = λ2 = 0. We consider 11 different values for λ1 and λ2, that is, λ1, λ2 ∈ {0, 1,…, 10}. For convenience of designing the experiments, we only consider two settings: varying one parameter while keeping the other at zero. The data {Di} are generated in the same manner as in Subsection 7.1. The other parameters are set as follows: r1 = r2 = r = 20, p = 0.1, a = 0.5. The experiments are repeated 10 times for each parameter setting, and the results are described in Fig 1, where the left panel shows the relative errors under different λ and the right panel plots the error bar of the running time with varying λ. From Fig 1, we can see that the relative errors lie in the range from 10⁻⁹ to 10⁻⁸, which means that RGLRAM obtains relatively stable values. In addition, RGLRAM shows only small fluctuations in the average running time, with a standard deviation of no more than 1.22 seconds. The above experiments imply that, for our synthetic data, the ALM algorithm is also not too sensitive to the random initialization of E and Y.

Fig 1. Experimental results of RGLRAM with different initializations on E and Y.

The left represents the relative errors vs. λ and the right represents the running time vs. λ.

https://doi.org/10.1371/journal.pone.0138028.g001

Next, we investigate the influence of r on the recovery performance. For this purpose, we consider the case p = 0.1, r′ = 20 and a ∈ {0.5, 5}, and vary the value of r from 18 to 30. Fig 2 illustrates the comparison of relative errors with different r. We can see from this figure that both RE1 and RE2 of RGLRAM are less than 0.0068 when a = 0.5 and 20 ≤ r ≤ 24, and less than 0.0080 when a = 5 and 20 ≤ r ≤ 28. These observations on synthetic data show that RGLRAM can successfully recover the low rank and the sparse components when the rank r lies in some interval. Furthermore, r should be larger than or equal to r′ to obtain better recovery performance. The admissible interval for r appears to be larger in the presence of larger noise. In a word, RGLRAM is not very sensitive to the choice of r on the synthetic data generated in this paper.

Model evaluation and computation complexity verification.

In this subsection, we will evaluate the generalization performance of RGLRAM and verify its computation complexity through extensive experiments.

S-fold cross validation is employed to evaluate the generalization ability of RGLRAM. We first randomly partition the given samples into S equal-sized subsamples. Then one subsample is chosen as the validation set for testing the model, and the remaining S−1 subsamples are used as the training set. This validation process is repeated S times, with each subsample used exactly once for validation. Finally, we average the S results from the folds. In the implementation, we first obtain the two orthogonal matrices L and R by learning from the training set, and then apply them to the test set, as sketched below.
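The following Matlab fragment is a minimal, illustrative sketch of this partition; it assumes the samples are stored in an N-by-1 cell array D.

S = 10; N = numel(D);
idx = randperm(N);                        % random permutation of the samples
fold = mod(0:N-1, S) + 1;                 % fold label for each position
for s = 1:S
    testIdx  = idx(fold == s);            % validation subsample
    trainIdx = idx(fold ~= s);            % remaining S-1 subsamples
    % learn L and R from D(trainIdx) with RGLRAM, then evaluate on D(testIdx)
end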

We still use the synthetic data of Subsection 7.1 and set N = 50, p = 0.1, a = 5, m = n = r, S = 10. For a test sample Dtest = Atest + Etest, we hope to obtain its low rank approximation LMtestRT, where Atest and Etest are the low rank and the sparse components of Dtest, respectively. The optimal Mtest can be obtained by solving the minimization problem
minMtest,Etest ‖Etest‖1  s.t.  Dtest = LMtestRT + Etest. (53)

This optimization problem can be regarded as a special case of problem (8) and solved by simply revising the ALM algorithm. The relative error RE1 is employed to evaluate the model, and the training error and the test error are listed in Table 4. It can be seen from this table that RGLRAM not only has a very small training error, but also almost exactly recovers the low rank components in the test procedure. Hence, RGLRAM has good generalization ability.
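A minimal Matlab sketch of this revised solver is given below (our illustration, not the authors' script). It keeps only the M, E and Y updates of the ALM algorithm, assumes L is m-by-r1 and R is n-by-r2 (so the approximation is L*Mtest*R'), and reuses the parameter schedule of the Appendix.

mu = 1e-4; rho = 1.1; mubar = 1e10;
Mtest = L' * Dtest * R;  Etest = zeros(size(Dtest));  Ytest = Etest;
for iter = 1:1000
    Mtest = L' * (Dtest - Etest + Ytest/mu) * R;        % cf. update (15)
    EE    = Dtest - L*Mtest*R' + Ytest/mu;
    Etest = sign(EE) .* max(abs(EE) - 1/mu, 0);         % shrinkage, cf. (18)
    Ytest = Ytest + mu*(Dtest - L*Mtest*R' - Etest);    % multiplier, cf. (20)
    mu    = min(rho*mu, mubar);
end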

Table 4. Relative error of the training and the test sets.

https://doi.org/10.1371/journal.pone.0138028.t004

Section 4 discusses the computation complexity of the ALM algorithm in one loop. Now, we verify the relationship of the running time with the size of the matrices and with the number of samples. To this end, we design two groups of experiments. In the first group, we study how the total running time changes as the size of the matrices increases. For the sake of convenience, we let N = 1, a = 0.5, p = 0.1 and m = n = 100i, where i = 1, 2,…, 25. For fixed parameters, the experiments are run only once to save time, and the results are illustrated in Fig 3, where the right panel performs a parabola fit on the discrete running times as a reference curve. We can draw two conclusions from this figure: RGLRAM almost exactly recovers both the low rank and the sparse components (RE1 and RE2 are close to zero) in most cases, and the running time grows roughly quadratically (parabolically) with m. The second conclusion is consistent with the analysis in Section 4.

Fig 3. Experimental results of RGLRAM with different m.

The left represents the relative errors with different m and the right represents the running time with different m.

https://doi.org/10.1371/journal.pone.0138028.g003

The second group of experiments aims to explore the relationship between the total running time and N. We let N = 50i, where i = 1, 2,…, 40, and conduct the experiments only once for each parameter setting; the results are outlined in Fig 4. From the left subplot, we can see that both RE1 and RE2 approach zero. A linear fit on the discrete running times is shown in the right subplot. From this plot, we find that the running time grows roughly linearly with N on the whole. This conclusion is also in accordance with the analysis in Section 4.

Fig 4. Experimental results of RGLRAM with different N.

The left represents the relative errors with different N and the right represents the running time with different N.

https://doi.org/10.1371/journal.pone.0138028.g004

Applications in face images datasets.

We carry out experiments on two well-known face image datasets: the Olivetti Research Laboratory (ORL) dataset [24] and the Yale face dataset [3]. The former contains the face images of 40 persons, each providing 10 different images. The 400 images were captured at different times and tolerate some tilting and rotation of the faces of up to 20 degrees. These images show different expressions and facial details, and are normalized to a resolution of 64×64. The latter consists of 165 face images of 15 persons. There are 11 images for each person, showing different facial expressions or configurations. Each face image is downsampled to a size of 64×64 for computational convenience.

We first compare the low rank recovery performance after adding salt and pepper noise with a density of 0.1 to each image. The two face datasets corrupted by sparse noise are then represented by two collections of matrices. We consider four low rank recovery methods and one denoising method: PCAL1, GLRAM, RPCA, RGLRAM and Total Variation (TV) [25]. We let r1 = r2 = 20 in both GLRAM and RGLRAM, and set the rank to 20 in PCAL1. Compared with PCAL1, RPCA and TV, both GLRAM and RGLRAM have better compression performance: the compression ratio of GLRAM or RGLRAM is 10.08 on ORL and 9.86 on Yale. The experimental results are partially shown in Figs 5 and 6, respectively.
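The corrupted collections can be produced with a snippet such as the following (an illustration using Matlab's Image Processing Toolbox; the cell arrays faces and D are placeholder names):

p = 0.1;                                    % salt and pepper noise density
for i = 1:numel(faces)
    I = im2double(faces{i});                % 64-by-64 grayscale image in [0,1]
    D{i} = imnoise(I, 'salt & pepper', p);  % add sparse corruptions
end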

Fig 5. Comparison of low rank recovery and denoising on ORL.

This face dataset is open and can be downloaded freely at http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html. It is publicly available for face recognition research, and the consent was not needed.

https://doi.org/10.1371/journal.pone.0138028.g005

Fig 6. Comparison of low rank recovery and denoising on Yale.

This face dataset is open and can be downloaded freely at http://vision.ucsd.edu/content/yaleface-database. It is publicly available for face recognition research, and the consent was not needed.

https://doi.org/10.1371/journal.pone.0138028.g006

From the above two figures, we can see that PCAL1, GLRAM and TV cannot effectively remove the noise or recover the facial contour features. In contrast, both RPCA and RGLRAM have better recovery performance. In terms of image reconstruction, RPCA seems to achieve better quality than RGLRAM, which may be explained by the relatively small ranks chosen in RGLRAM. Furthermore, RGLRAM has an advantage over RPCA in running time and compression ratio. For instance, RGLRAM requires 82.79 seconds on ORL and 32.93 seconds on Yale, whereas RPCA requires 212.30 seconds on ORL and 86.65 seconds on Yale. To sum up, taking recovery quality, running time and compression together, RGLRAM performs best in recovering the low rank components.

Secondly, we compare the values of the Peak Signal to Noise Ratio (PSNR) under different ranks and noise densities, where the PSNR is defined as in (54), with Di denoting the image matrices corrupted by noise and D̂i the recovered matrices. We set r1 = r2 = r and vary r from 10 to 30 with an interval of 5. The density of the salt and pepper noise is denoted by p. The experiments are repeated 10 times for each parameter setting and the average values of the PSNR are recorded. The PSNR comparison among PCAL1, GLRAM and RGLRAM is shown in Table 5, and the comparison between TV and RPCA is shown in Table 6.
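For reference, a PSNR in this spirit can be computed as follows. This is a hedged sketch: the exact definition in Eq (54) is not reproduced here, and the peak value of 255 and the pooling of the error over all N images are our assumptions.

err = 0;
for i = 1:N
    err = err + norm(D{i} - Dhat{i}, 'fro')^2;   % Dhat{i}: recovered image
end
mse     = err / (N*m*n);                         % mean squared error per pixel
psnrVal = 10 * log10(255^2 / mse);               % PSNR in decibels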

Table 5. PSNR comparison among PCAL1, GLRAM and RGLRAM on ORL and Yale.

https://doi.org/10.1371/journal.pone.0138028.t005

Table 6. PSNR comparison between TV and RPCA on ORL and Yale.

https://doi.org/10.1371/journal.pone.0138028.t006

Combining Tables 5 and 6, we can see that RGLRAM achieves the best PSNR in all cases. The five methods, sorted in order of decreasing PSNR, are: RGLRAM, GLRAM, RPCA, PCAL1 and TV. On average, the PSNR of RGLRAM is 4.55 dB larger than that of GLRAM on ORL and 5.28 dB larger on Yale. These results show that RGLRAM has the best recovery performance.

Conclusions

In this paper, we study a robust model of GLRAM and present an iterative algorithm based on the technique of ALM. The proposed method can also be employed to solve the problem of robust SVD or l1-norm PCA. We then consider the tensor version of robust GLRAM and propose an iterative scheme for robust tensor approximations. It is illustrated experimentally that our method can exactly recover both the low rank and the sparse components under certain conditions.

Our study so far is limited to recovering the low rank and the sparse components for collections of low rank matrices in which a fraction of the entries are arbitrarily corrupted. It would be interesting to investigate the situation in which the noise is the superposition of dense small perturbations and sparse gross errors. Moreover, robust tensor approximations and robust GLRAM with missing entries deserve further research.

Appendix

A. Matlab Function for RGLRAM

function [L M R E iter] = RGLRAM (D,r1,r2)

% D: an N-by-1 cell array of m-by-n matrices

% r1 and r2 are the rank of L and R respectively.

% L and R are orthogonal transformation matrices

% E is noise, D{i} = L*M{i}*R+E{i}

% = = = = = parameters setting = = = = =

mu = 1e-4;mubar = 1e10;

rho = 1.1;epsilon = 1e-8;

maxiter = 1000;

[m n] = size(D{1});N = length(D);

% = = = = = initialization = = = = =

L = orth(randn(m,r1));R = orth(randn(n,r2))';
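% note: R is stored as its transpose (an r2-by-n matrix), so the low rank approximation is written L*M{i}*R below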

for i = 1:N

M{i} = L'*D{i}*R';

E{i} = zeros(m,n);

Y{i} = zeros(m,n);

end

normD = 0;

for i = 1:N, normD = normD+norm(D{i},'fro')^2;end

normD = normD^0.5;

% = = = = = loop = = = = =

iter = 0;flag = 1;

while iter <= maxiter && flag

iter = iter+1;

% update for L

L1 = zeros(m,r1);

for i = 1:N

L1 = L1+(D{i}-E{i}+Y{i}/mu)*R'*M{i}';

end

[L LL] = qr(L1,0);

% update for M

for i = 1:N

M{i} = L'*(D{i}-E{i}+Y{i}/mu)*R';

end

% update for R

R1 = zeros(r2,n);

for i = 1:N

R1 = R1+M{i}'*L'*(D{i}-E{i}+Y{i}/mu);

end

[RT RR] = qr(R1',0);

R = RT';

% update for M

for i = 1:N

M{i} = L'*(D{i}-E{i}+Y{i}/mu)*R';

end

% update for E

for i = 1:N

EE = D{i}+Y{i}/mu-L*M{i}*R;

E{i} = max(EE-1/mu,0);

E{i} = E{i}+min(EE+1/mu,0);

end

% update for Y

normErr = 0;

for i = 1:N

temp = D{i}-L*M{i}*R-E{i};

Y{i} = Y{i}+mu*(temp);

normErr = normErr+norm(temp,'fro')^2;

end

normErr = normErr^0.5;

%update for mu

mu = min(rho*mu,mubar);

% = = = Err = =

if normErr/normD<epsilon

flag = 0;

end

end

B. Demonstration for RGLRAM

clear

m = 100;n = 100;

N = 50;

a = 0.5; p = 0.1;

r1 = 20;r2 = r1;

% = = = = = Generation of D = = = = =

L0 = orth(randn(m,r1));R0 = [orth(randn(n,r1))]';

ErrE = 0;ErrA = 0;

for i = 1:N

H = rand(m,n)>1-p;

E0{i} = (rand(m,n)*(2*a)-a).*H;

M0{i} = randn(r1,r2);

A0{i} = L0*M0{i}*R0;

D{i} = A0{i}+E0{i};

ErrE = ErrE+norm(E0{i},'fro')^2;

ErrA = ErrA+norm(A0{i},'fro')^2;

end

ISNR = sqrt (ErrE/ ErrA);

% = = = = = call RGLRAM = = = = =

t0 = cputime;

[L M R E iter] = RGLRAM (D,r1,r2);

t = cputime-t0;

% = = = = = computation for RE = = = = =

n1 = 0;n2 = 0;

for i = 1:N

n1 = n1+norm(A0{i}-L*M{i}*R,'fro')^2;

n2 = n2+norm(E{i}-E0{i},'fro')^2;

end

RE = [sqrt(n1/ErrA) sqrt(n2/ErrE)];

Author Contributions

Conceived and designed the experiments: JS WY XZ. Performed the experiments: JS XZ. Analyzed the data: JS WY. Contributed reagents/materials/analysis tools: WY. Wrote the paper: JS WY.

References

  1. Jolliffe IT (2002) Principal component analysis (second edition). Springer Series in Statistics.
  2. Golub G, Van Loan C (2012) Matrix computations (fourth edition). The Johns Hopkins University Press.
  3. Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19: 711–720.
  4. Candès EJ, Recht B (2009) Exact matrix completion via convex optimization. Found Comput Math 9: 717–772.
  5. Candès EJ, Tao T (2010) The power of convex relaxation: Near-optimal matrix completion. IEEE Trans Inf Theory 56: 2053–2080.
  6. Wright J, Peng Y, Ma Y, Ganesh A, Rao S (2009) Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. Adv Neural Inf Process Syst: 2080–2088.
  7. Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? Journal of the ACM 58: no. 3, article 11.
  8. Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. Proc Int Conf Machine Learning: 663–670.
  9. Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y (2013) Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 35: 171–184. pmid:22487984
  10. Yang J, Zhang D, Frangi A, Yang J (2004) Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans Pattern Anal Mach Intell 26: 131–137. pmid:15382693
  11. Ding C, Ye J (2005) 2-dimensional singular value decomposition for 2d maps and images. Proc SIAM Int Conf Data Mining: 22–34.
  12. Ye J, Janardan R, Li Q (2004) Two-dimensional linear discriminant analysis. Adv Neural Inf Process Syst: 354–363.
  13. Zhang D, Zhou Z (2005) (2D)2PCA: 2-Directional 2-Dimensional PCA for efficient face representation and recognition. Neurocomputing 69: 224–231.
  14. Ye J (2005) Generalized low rank approximations of matrices. Machine Learning 61: 167–191.
  15. Liu J, Chen S, Zhou Z, Tan X (2010) Generalized low rank approximations of matrices revisited. IEEE Trans Neural Netw 21: 621–632. pmid:20172823
  16. Shi J, Yong L (2014) Matrix completion via generalized low rank approximations of matrices. ICIC Express Letters, Part B: Applications 5: 1619–1625.
  17. Lin Z, Chen M, Wu L, Ma Y (2009) The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical Report UILU-ENG-09-2215, arXiv:1009.5055v2.
  18. Edelman A, Arias TA, Smith ST (1998) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20: 303–353.
  19. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2010) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1): 1–122.
  20. Shen Y, Wen Z, Zhang Y (2012) Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. Optim Methods Softw 29(2): 1–25.
  21. Ke Q, Kanade T (2005) Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming. Proc IEEE Conf Computer Vision and Pattern Recognition.
  22. Kwak N (2008) Principal component analysis based on L1-norm maximization. IEEE Trans Pattern Anal Mach Intell 30: 1672–1680. pmid:18617723
  23. Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51: 455–500.
  24. Samaria F, Harter A (1994) Parameterisation of a stochastic model for human face identification. Proc IEEE Workshop Appl Comput Vis: 138–142.
  25. Gilboa G, Zeevi YY, Sochen N (2003) Texture preserving variational denoising using an adaptive fidelity term. Proc VLSM.