
Robust Generalized Low Rank Approximations of Matrices

  • Jiarong Shi ,

    jiarongs3@163.com

    Affiliation School of Science, Xi'an University of Architecture and Technology, Xi'an, China

  • Wei Yang,

    Affiliation School of Science, Xi'an University of Architecture and Technology, Xi'an, China

  • Xiuyun Zheng

    Affiliation School of Science, Xi'an University of Architecture and Technology, Xi'an, China

Abstract

In recent years, the intrinsic low rank structure of some datasets has been extensively exploited to reduce dimensionality, remove noise and complete missing entries. As a well-known technique for dimensionality reduction and data compression, Generalized Low Rank Approximations of Matrices (GLRAM) claims superiority over the SVD in computation time and compression ratio. However, GLRAM is very sensitive to sparse large noise or outliers, and its robust version has not yet been explored. To address this problem, this paper proposes a robust method for GLRAM, named Robust GLRAM (RGLRAM). We first formulate RGLRAM as an l1-norm optimization problem which minimizes the l1-norm of the approximation errors. Secondly, we apply the technique of Augmented Lagrange Multipliers (ALM) to solve this l1-norm minimization problem and derive a corresponding iterative scheme. Then the weak convergence of the proposed algorithm is discussed under mild conditions. Next, we investigate a special case of RGLRAM and extend RGLRAM to a general tensor case. Finally, extensive experiments on synthetic data show that it is possible for RGLRAM to exactly recover both the low rank and the sparse components, while this may be difficult for previous state-of-the-art algorithms. We also discuss three issues on RGLRAM: the sensitivity to initialization, the generalization ability and the relationship between the running time and the size/number of matrices. Moreover, the experimental results on images of faces with large corruptions illustrate that RGLRAM achieves better denoising and compression performance than the other methods.

Introduction

In the communities of pattern recognition, machine learning and computer vision, a commonly-used tenet is that the datasets of interest lie in single or multiple linear subspaces. This low rank structure can be exploited to reduce dimensionality, remove noise and complete missing entries. In view of this, low rank subspace methods have attracted broad attention in the past few years. These methods include not only classical paradigms such as Principal Component Analysis (PCA) [1], Singular Value Decomposition (SVD) [2] and Linear Discriminant Analysis (LDA) [3], but also the recently emerged Low Rank Matrix Recovery (LRMR), composed mainly of Matrix Completion (MC) [4,5], Robust Principal Component Analysis (RPCA) [6,7] and Low Rank Representation (LRR) [8,9].

Conventionally, each sample is modeled as a vector and a collection of data samples is represented by a matrix. However, some real-world samples such as gray-level images are two-dimensional in essence. Under this circumstance, the traditional vector subspace model cannot preserve the bilinear structure of the samples. To remedy this deficiency, one-dimensional subspace methods have successively been generalized to the two-dimensional case. The resulting two-dimensional subspace methods mainly include two-dimensional PCA (2dPCA) [10], two-dimensional SVD (2dSVD) [11], two-dimensional LDA (2dLDA) [12], two-directional two-dimensional PCA ((2d)2PCA) [13], Generalized Low Rank Approximations of Matrices (GLRAM) [14,15] and so on. Among them, the latter two methods have equivalent tri-factorization formulations, that is, they use two-sided transformations rather than single-sided ones. Compared to SVD, GLRAM has been verified experimentally to have better compression performance and to consume less computation time [14].

Recently, Liu et al. [15] revealed the relationship between GLRAM and SVD, and gave a lower bound on GLRAM’s objective function; Shi et al. [16] proposed a method of matrix completion via GLRAM. The existing GLRAM is designed to handle Gaussian noise, and the Frobenius norm is adopted to evaluate the magnitude of the approximation errors. However, GLRAM does not work well in the presence of large sparse noise or outliers. As far as we know, the robustness of GLRAM has not been explored yet. To this end, this paper proposes a novel robust approach to GLRAM, named Robust GLRAM (RGLRAM). Mathematically, RGLRAM is formulated as an l1-norm minimization problem and the Augmented Lagrange Multiplier (ALM) method [17] is applied to solve this non-smooth optimization problem. We also extend RGLRAM to the tensor case and present an iterative scheme for robust low rank tensor approximations.

The rest of this paper is organized as follows. Section 2 briefly reviews GLRAM. The model and the algorithm of RGLRAM are proposed in Sections 3 and 4, respectively. Section 5 presents weak convergence results for the proposed algorithm under mild conditions. In Section 6, we discuss the special case of RGLRAM for a single matrix and generalize RGLRAM to the case of tensors. Experimental results on synthetic data and images of faces are reported in Section 7. Conclusions and directions for future work can be found in Section 8.

Review of Generalized Low Rank Approximations of Matrices

As a generalization of SVD, Generalized Low Rank Approximations of Matrices (GLRAM) treats each sample as a two-dimensional matrix pattern and simultaneously carries out tri-factorizations on multiple matrices. Given a collection of N two-dimensional training samples {Di ∈ ℝm×n, i = 1, 2,…, N}, we decompose each sample into the product of three matrices:
Di ≈ LMiRT, i = 1, 2,…, N, (1)
where L ∈ ℝm×r1 and R ∈ ℝn×r2 are orthogonal transformation matrices, Mi ∈ ℝr1×r2, and max(r1, r2) < min(m, n). The goal of GLRAM is to seek N+2 matrices L, R and M1,…, MN such that LMiRT approximates Di for all i.

Mathematically, GLRAM can be formulated as the minimization problem
minL,R,{Mi} Σi ‖Di − LMiRT‖F²  s.t.  LTL = Ir1, RTR = Ir2, (2)
where Ir1 is the r1×r1 identity matrix and ‖⋅‖F is the Frobenius norm of a matrix. If {L, R, M1,…,MN} is the optimal solution of problem (2), then it holds that Mi = LTDiR. Based on this property, we only need to compute the two transformations L and R by minimizing the reconstruction errors:
minL,R Σi ‖Di − LLTDiRRT‖F²  s.t.  LTL = Ir1, RTR = Ir2. (3)

Generally speaking, the above optimization problem does not have a closed-form solution. An iterative procedure for computing L and R was proposed in [14]. Specifically, for fixed L, the optimal R is formed by stacking the top r2 eigenvectors of Σi DiTLLTDi; similarly, for given R, the optimal L is constructed from the top r1 eigenvectors of Σi DiRRTDiT. This iterative procedure is repeated until convergence. In this paper, the stopping condition is set as in (4), where ε is a small positive number.
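For readers who wish to reproduce this step, the following Matlab fragment is a minimal sketch of one GLRAM iteration. It is an illustration rather than the implementation used in [14], and it assumes that D is an N-by-1 cell array of m-by-n matrices and that L holds the current m-by-r1 transformation.

MR = zeros(n, n);
for i = 1:N
    T = D{i}' * L;            % n-by-r1
    MR = MR + T * T';         % accumulates Sum_i Di'*L*L'*Di
end
[R, ~] = eigs(MR, r2);        % top r2 eigenvectors give the new R
ML = zeros(m, m);
for i = 1:N
    T = D{i} * R;             % m-by-r2
    ML = ML + T * T';         % accumulates Sum_i Di*R*R'*Di'
end
[L, ~] = eigs(ML, r1);        % top r1 eigenvectors give the new L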

GLRAM has good compression performance due to the fact that Di is approximated by LMiRT. To store {L, R, M1,…,MN}, mr1 + nr2 + Nr1r2 scalars are required. Hence, the compression ratio of GLRAM is Nmn/(mr1 + nr2 + Nr1r2). Moreover, formulation (1) is equivalent to
vec(Di) ≈ (R ⊗ L)vec(Mi), (5)
where ⊗ denotes the Kronecker product and vec(⋅) is the vectorization operator which stacks the columns of a matrix into a long column vector. Based on formulation (5), GLRAM can be interpreted as a linear representation problem, where R ⊗ L is the basis matrix.
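As a worked example of the compression ratio formula, using for illustration the settings of the face experiments reported later (N = 400 images of size 64×64 and r1 = r2 = 20):

N = 400; m = 64; n = 64; r1 = 20; r2 = 20;
ratio = N*m*n / (m*r1 + n*r2 + N*r1*r2)   % = 1638400/162560, about 10.08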

Robust Model of Generalized Low Rank Approximations of Matrices

We rewrite the tri-factorization formulation (1) as
Di = LMiRT + Ei, i = 1, 2,…, N, (6)
where Ei ∈ ℝm×n are noise matrices. If the entries of Ei obey independent normal distributions with zero means, then the maximum likelihood estimation of the hidden factors is equivalent to minimizing the objective function of problem (2). As is well known, this objective function, also called the cost function, is very sensitive to large sparse corruptions or outliers. To handle such corruptions, we now assume that the entries of Ei are generated by independent Laplacian distributions. According to the method of maximum likelihood estimation, the hidden factors {L, R, M1,…,MN} can then be obtained by solving the following l1-norm minimization problem:
minL,R,{Mi} Σi ‖Di − LMiRT‖1  s.t.  LTL = Ir1, RTR = Ir2, (7)
where ‖⋅‖1 is the component-wise l1-norm of a matrix (i.e. the sum of the absolute values of its entries). To simplify the objective function in problem (7), we introduce N additional variables E1,…, EN. The above minimization problem is then equivalent to
minL,R,{Mi},{Ei} Σi ‖Ei‖1  s.t.  Di = LMiRT + Ei, LTL = Ir1, RTR = Ir2, i = 1, 2,…, N. (8)

We refer to problem (7), or equivalently (8), as Robust Generalized Low Rank Approximations of Matrices (RGLRAM).

Compared with problem (2), problem (7) or (8) is more robust to large sparse noise or outliers because the component-wise l1-norm of a matrix is the tightest convex relaxation of the component-wise l0-norm (i.e. the number of non-zero entries). Although problem (8) involves l1-norm minimization, it is non-convex. Therefore, we cannot directly use the available methods of compressed sensing to solve it. In the next section, we apply the Augmented Lagrange Multipliers (ALM) method to solve problem (8).

Algorithm to Robust Generalized Low Rank Approximations of Matrices

The method of Lagrange multipliers is a strategy for converting the optimization problem with equality constraints into an unconstrained problem. In view of this, we propose an iterative algorithm based on ALM to solve problem (8).

For the sake of simplicity, we set M = {M1,…, MN} and E = {E1,…, EN}. Without considering the orthogonal constraints in problem (8), we construct its partial augmented Lagrange function:
fμ(L, R, M, E, Y) = Σi ( ‖Ei‖1 + 〈Yi, Di − LMiRT − Ei〉 + (μ/2)‖Di − LMiRT − Ei‖F² ), (9)
where 〈⋅,⋅〉 indicates the inner product between matrices, the penalty factor μ is positive, Yi ∈ ℝm×n are Lagrange multiplier matrices and Y = {Y1,…, YN}. For given Y, a block-based gradient descent search technique is proposed to minimize fμ(L, R, M, E, Y). Concretely speaking, at each iteration one block of variables is treated as unknown while the other blocks are fixed, and the unknown block is updated by minimizing fμ(L, R, M, E, Y). The specific update formulations are derived as follows.

Computing L. If L is unknown and the other variables are fixed, we update L by minimizing fμ(L, R, M, E, Y) with respect to L. Considering the orthogonal constraints in problem (8), the subproblem for L is
minL Σi (μ/2)‖Di − LMiRT − Ei + Yi/μ‖F²  s.t.  LTL = Ir1. (10)

Hence, it holds that L solves
maxL 〈L, P〉  s.t.  LTL = Ir1, (11)
where P = Σi (Di − Ei + Yi/μ)RMiT.

The constraint surface in problem (11) is known as the Stiefel manifold. For optimization problems on Stiefel manifolds, Edelman et al. developed Newton and conjugate gradient algorithms [18]. These two algorithms carry a heavy computational burden because they require the Hessian of the objective function. To reduce the computational complexity, we relax the orthogonal constraint into a sphere constraint and obtain the relaxed problem (12), whose optimal solution is a positive scalar multiple of P. Meanwhile, we can verify that this solution does not increase the value of fμ(L, R, M, E, Y), although it probably does not satisfy the orthogonal constraint. We then set
L := QR(P), (13)
where QR(⋅) denotes the thin QR decomposition of a matrix, so that L is an orthonormal basis of the range space of P. It cannot be guaranteed that the iteration formulation (13) decreases the value of fμ(L, R, M, E, Y). However, since (14) holds, the value of fμ(L, R, M, E, Y) surely does not increase after updating Mi.

Computing M. When Mj is unknown and the other blocks of variables are given, the update formulation for Mj is
Mj = LT(Dj − Ej + Yj/μ)R, j = 1, 2,…, N. (15)

Computing R. For each j ∈ {1, 2,…, N}, equality (16) holds, which means that the iterative formulation of R is similar to that of L. Let Q = Σi (Di − Ei + Yi/μ)TLMi. Then R is updated as
R := QR(Q). (17)
It is worth noting that Eq (17) is followed by the calculation procedure of M in order to decrease the value of fμ(L, R, M, E, Y).

Computing E. Fix L, R, M and Y, and minimize fμ(L, R, M, E, Y) with respect to Ej:
Ej = S1/μ[Dj − LMjRT + Yj/μ], (18)
where Sδ[⋅] is the absolute value shrinkage operator defined component-wise by
(Sδ[X])ij = sign(xij) max(|xij| − δ, 0) (19)
for an arbitrary real matrix X = (xij)m×n and δ > 0.
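In Matlab, the shrinkage operator (19) can be written as a small helper function. This is a sketch consistent with the E-update in the Appendix; the function name shrink is ours.

function X = shrink(X, delta)
% Component-wise absolute value shrinkage (soft-thresholding), Eq (19).
X = sign(X) .* max(abs(X) - delta, 0);
end

With this helper, update (18) reads E{j} = shrink(D{j} - L*M{j}*R' + Y{j}/mu, 1/mu) when R is stored as an n-by-r2 matrix (in the Appendix code R is stored transposed, so the corresponding expression there is D{i} + Y{i}/mu - L*M{i}*R).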

Computing Y. For given L, R, M and E, we update Yj according to
Yj := Yj + μ(Dj − LMjRT − Ej), j = 1, 2,…, N. (20)

Denote the m×n zero matrix by Om×n. The whole iterative procedure is outlined as follows.

Algorithm: ALM algorithm for RGLRAM

Input: Data matrices D1, D2,…, DN, r1 and r2.

Initialize: L, R, Mi = LTDiR, Ei = Om×n, Yi = Om×n, ρ = 1.1, μ = 10⁻⁴, μmax = 10¹⁰, i = 1, 2,…, N.

Step 1: while not converged do

Update L according to (13);

Update M according to (15);

Update R according to (17);

Update M according to (15);

Update E according to (18);

Update Y according to (20);

Update μ as μ:= min(ρμ, μmax);

End while.

Output: L, R, M1,…, MN and E1,…, EN.

During the initialization, we can stipulate L = orth(randn(m, r1)) and R = orth(randn(n, r2)), where orth(⋅) returns an orthonormal basis for the range of a matrix and randn(m, r1) is an m×r1 random matrix whose entries are generated from the standard normal distribution. The stopping condition can be set as
(Σi ‖Di − LMiRT − Ei‖F² / Σi ‖Di‖F²)^(1/2) < ε, (21)
or the maximum number of iterations is reached, where ε is a sufficiently small positive number.

Most of the time spent by the above algorithm goes into matrix multiplications and thin QR decompositions. For convenience of description, we suppose that mn > r1r2. Under this assumption, the computation complexity of GLRAM at each loop is O(Nmnr2). Subsequently, we discuss the computation complexity of the ALM algorithm. It takes O(Nmnr2) time to compute P and O(mr1²) time to perform the thin QR decomposition of P. Hence, the computation time of L is O(Nmnr2). Similarly, the computation time of R is also O(Nmnr2). Moreover, the computation times of M, E and Y are all O(Nmnr2). In conclusion, the total computation complexity of the ALM algorithm at each loop is O(Nmnr2), which means that RGLRAM has the same computation complexity as GLRAM.

Convergence Analysis on the Proposed Algorithm

To the best of our knowledge, the convergence theory of ALM algorithms has not been established for non-convex problems or for convex problems with more than two blocks of variables. Consequently, it is very difficult to prove the convergence of the proposed ALM algorithm. However, empirical evidence in Section 7 shows strong convergence behavior of our algorithm. This section discusses weak convergence results for the proposed algorithm under mild conditions.

If both L and R are known, then problem (8) is reformulated as
min{Mi},{Ei} Σi ‖Ei‖1  s.t.  Di = LMiRT + Ei, i = 1, 2,…, N. (22)

The augmented Lagrange function of the above problem is
Σi ( ‖Ei‖1 + 〈Yi, Di − LMiRT − Ei〉 + (μ/2)‖Di − LMiRT − Ei‖F² ). (23)

To solve problem (22), we can simply revise the ALM algorithm by deleting the iterative formulations of L and R. In such a case, we have the following statement.

Theorem 1. Let {M(k), E(k), Y(k)} be the sequences produced by the revised ALM algorithm. If the (un-augmented) Lagrange function of problem (22) has a saddle point, then the sequences {M(k), E(k)} satisfy that the objective function in problem (22) approaches the optimal value as k → ∞.

Proof. The objective function of problem (22) can be rewritten as g(M) + h(E), where g(M) = 0 and h(E) = Σi ‖Ei‖1. It is obvious that g(M) and h(E) are closed, proper and convex functions.

Let d = vec((D1,…, DN)), x = vec((M1,…, MN)), y = vec((E1,…, EN)) and B = diag(R ⊗ L,…, R ⊗ L), where diag(⋅) is the diagonal block matrix operator. Then the equality constraints in problem (22) can be re-expressed as Bx + y = d. We denote g(x) = g(M) and h(y) = h(E). According to the basic convergence result given in [19], we have Bx(k) + y(k) − d → 0 and g(x(k)) + h(y(k)) → f* as k → ∞, where f* is the minimum value of g(M) + h(E) under the constraint Bx + y = d. This ends the proof.

Moreover, problem (8) is essentially equivalent to the following minimization problem without orthogonal constraints:
minL,R,M,E Σi ‖Ei‖1  s.t.  Di = LMiRT + Ei, i = 1, 2,…, N. (24)

Its un-augmented Lagrange function is
φ(L, R, M, E, Y) = Σi ( ‖Ei‖1 + 〈Yi, Di − LMiRT − Ei〉 ). (25)

On the basis of the differentials or sub-differentials of φ(L, R, M, E, Y) with respect to each block of variables, we have the following KKT conditions for problem (24):
Dj = LMjRT + Ej,  Yj ∈ ∂‖Ej‖1,  LTYjR = O,  j = 1, 2,…, N,  Σj YjRMjT = O,  Σj YjTLMj = O, (26)
where ∂‖⋅‖1 is the set of sub-differentials of the component-wise l1-norm of a matrix.

For arbitrary μ > 0, the condition Yj ∈ ∂‖Ej‖1 is equivalent to
Ej = S1/μ[Ej + Yj/μ]. (27)

Denote the associated scalar mapping by Qμ(t); it is monotonically increasing [20]. Applying Qμ(⋅) element-wise to Ej and invoking the equation Dj = LMjRT + Ej, we arrive at (28). The above analysis means that the condition Yj ∈ ∂‖Ej‖1 in (26) can be replaced by (28).

Theorem 2. Let X = {L, R, M, E, Y} and be an iterative sequence produced by the ALM algorithm. If , and , then any accumulation point of satisfies the KKT conditions (26), where

Proof. By the iterative formulations (18) and (20), we obtain (29) and (30).

The assumptions of the theorem imply that both sides of (29) and (30) tend to zero as k approaches infinity. Therefore, (31) and (32) hold.

Moreover, the remaining relation follows from (33). This completes the proof.

Special Case and Tensor Extension of RGLRAM

This section will conduct a further discussion on RGLRAM from two aspects. Firstly, we consider the robust low rank approximations of a single matrix. Secondly, we study the robust low rank tensor approximations by generalizing the proposed method to higher order tensors.

The Case for N = 1

Now, we consider a special case of RGLRAM, namely N = 1. In this case, problem (8) is rewritten as
minL,R,M1,E1 ‖E1‖1  s.t.  D1 = LM1RT + E1, LTL = Ir1, RTR = Ir2. (34)

We call problem (34) the l1-norm SVD. In other words, problem (34) is a robust version of SVD. Recently, Wright et al. [6] and Candès et al. [7] proposed a principal component pursuit method to recover simultaneously the low rank and the sparse components. The proposed method, also referred to as Robust PCA (RPCA), is described mathematically as the convex optimization problem
minA,E ‖A‖* + λ‖E‖1  s.t.  D1 = A + E, (35)
where ‖A‖* is the nuclear norm of A (i.e. the sum of the singular values of A) and the tradeoff parameter λ > 0. A good rule of thumb for choosing the parameter is λ = 1/√max(m, n) [7]. The main algorithms for solving the nuclear norm minimization problem (35) suffer from the high computational burden of an SVD at each iteration. If the rank of the low rank component is properly estimated, then RPCA boils down to problem (34).

We further set r1 = r2 = r. Then problem (34) is transformed into
minU,V ‖D1 − UVT‖1, (36)
where U ∈ ℝm×r and V ∈ ℝn×r. This problem is also known as l1-norm PCA. An alternating optimization method was proposed to solve it [21]. Specifically, one minimizes over one argument, U or V, while keeping the other fixed, and then the optimization problem decomposes into m or n independent linear programs. Kwak [22] proposed another method of PCA based on l1-norm maximization. This method changes problem (36) into the following formulation:
maxU ‖UTD1‖1  s.t.  UTU = Ir. (37)

Although V is absent in the above problem, it is still very hard to directly solve it. A greedy strategy [22] was proposed to solve problem (37). However, the successive greedy solutions might not be ideal.

Extensions to Higher Order Tensors

A K-order tensor is defined as an array 𝒳 ∈ ℝI1×I2×…×IK whose (i1, i2,…, iK) entry is xi1i2…iK, where 1 ≤ ik ≤ Ik, 1 ≤ k ≤ K. The k-mode matricization of 𝒳 is the matrix X(k) ∈ ℝIk×(I1⋯Ik−1Ik+1⋯IK) whose columns are the mode-k fibers of 𝒳. Given a matrix U ∈ ℝJk×Ik, the k-mode product of 𝒳 by U, denoted 𝒳 ×k U, is an I1×…×Ik−1×Jk×Ik+1×…×IK tensor whose (i1,…, ik−1, j, ik+1,…, iK) entry is computed by summing xi1…ik…iK ujik over ik = 1,…, Ik. The inner product, component-wise l1-norm and Frobenius norm of matrices can be easily generalized to tensors. Refer to the survey [23] for further details on tensor algebra.
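The following Matlab fragment is a minimal sketch of these two operations. The helper names unfold and kmode_prod are ours, and the column ordering of the unfolding is simply the one induced by Matlab's column-major layout, which may differ from the convention in [23].

function Xk = unfold(X, k)
% Mode-k matricization: the mode-k fibers of X become the columns of Xk.
K = ndims(X); dims = size(X);
Xk = reshape(permute(X, [k, 1:k-1, k+1:K]), dims(k), []);
end

function Y = kmode_prod(X, U, k)
% k-mode product Y = X x_k U for a K-order tensor X and a Jk-by-Ik matrix U.
K = ndims(X); dims = size(X); perm = [k, 1:k-1, k+1:K];
Xk = reshape(permute(X, perm), dims(k), []);   % mode-k unfolding of X
dims(k) = size(U, 1);                          % the k-th mode size becomes Jk
Y = ipermute(reshape(U * Xk, dims(perm)), perm);
end

With these helpers, a fit term such as the one appearing in the next paragraph can be evaluated as D - kmode_prod(kmode_prod(M, L, 1), R, 2) for numeric arrays D and M.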

In the following, we first consider the tensor expression of RGLRAM. Three-order tensors 𝒟, ℳ and ℰ are formed by concatenating the collections {Di}, {Mi} and {Ei} along the third mode, respectively, where 𝒟, ℰ ∈ ℝm×n×N and ℳ ∈ ℝr1×r2×N. Hence, Eq (6) can be re-expressed as 𝒟 = ℳ ×1 L ×2 R + ℰ. Based on this, problem (8) is reformulated as
minL,R,ℳ,ℰ ‖ℰ‖1  s.t.  𝒟 = ℳ ×1 L ×2 R + ℰ, LTL = Ir1, RTR = Ir2. (38)

The corresponding partial augmented Lagrange function is
‖ℰ‖1 + 〈𝒴, 𝒟 − ℳ ×1 L ×2 R − ℰ〉 + (μ/2)‖𝒟 − ℳ ×1 L ×2 R − ℰ‖F², (39)
where 𝒴 ∈ ℝm×n×N is the Lagrange multiplier tensor.

Next, we extend RGLRAM to the general case of tensors and propose the corresponding iterative scheme for robust low rank tensor approximations. Consider a K-order data tensor 𝒟 ∈ ℝI1×…×IK corrupted by large sparse noise. We assume that the underlying tensor is intrinsically low rank. Thus, 𝒟 is decomposed into the sum of a low rank tensor and a noise component, that is,
𝒟 = ℳ ×1 L1 ×2 L2 ⋯ ×K LK + ℰ, (40)
where ℳ ∈ ℝr1×…×rK, Li ∈ ℝIi×ri, ℰ ∈ ℝI1×…×IK and ri < Ii, i = 1, 2,…, K. We can further impose orthogonal constraints on all mode matrices Li. To recover the low rank and the sparse terms, we solve the following minimization problem:
min{Li},ℳ,ℰ ‖ℰ‖1  s.t.  𝒟 = ℳ ×1 L1 ⋯ ×K LK + ℰ, LiTLi = Iri, i = 1, 2,…, K, (41)
where ‖⋅‖1 denotes the component-wise l1-norm of a tensor. If K = 3 and the third mode is not transformed, then problem (41) reduces to problem (38).

Without considering the orthogonal constraints in problem (41), we construct its partial augmented Lagrange function:
‖ℰ‖1 + 〈𝒴, 𝒟 − ℳ ×1 L1 ⋯ ×K LK − ℰ〉 + (μ/2)‖𝒟 − ℳ ×1 L1 ⋯ ×K LK − ℰ‖F². (42)

Similarly, we propose an ALM method to solve problem (41). Concretely speaking, when 𝒴 is fixed, we alternately update each block of variables by minimizing the augmented Lagrange function (42) with respect to one argument while keeping the others fixed.

Computing L. For fixed j ∈ {1, 2,…, K}, let Lj be unknown and the other blocks of variables be known. Then the update formulation of Lj is given by (43), where Pj denotes the mode-j counterpart of the matrix P in (11). In view of the orthogonality of Lj, we take
Lj := QR(Pj). (44)

Computing ℳ. Once Lj is updated, we immediately recompute ℳ. The calculation procedure is as follows:
ℳ = (𝒟 − ℰ + 𝒴/μ) ×1 L1T ×2 L2T ⋯ ×K LKT. (45)

Computing ℰ. Let ℰ be unknown and the other variables be fixed. We update ℰ according to the following formulation:
ℰ = S1/μ[𝒟 − ℳ ×1 L1 ⋯ ×K LK + 𝒴/μ], (46)
where Sδ[⋅] is the tensor generalization of the absolute value shrinkage operator (19).

Computing 𝒴. The update for 𝒴 is as follows:
𝒴 := 𝒴 + μ(𝒟 − ℳ ×1 L1 ⋯ ×K LK − ℰ). (47)

The complete algorithm description of the robust low rank tensor approximations is omitted due to its similarity with RGLRAM.

Experiments

A. Ethics Statement

Some face datasets were used in this paper to verify the performance of our method. These face datasets are publicly available for face recognition research, and the consent was not needed. The face images and the experimental results are reported in this paper without any commercial purpose.

B. Experimental Results

In this section, we carry out experiments on synthetic data and real-world face datasets to illustrate the feasibility and effectiveness of the proposed method. The experimental results of RGLRAM are compared with those of previous state-of-the-art methods: PCAL1 [22], RPCA and GLRAM. Both PCAL1 and RPCA are applied to each training sample Di, and the inexact ALM algorithm [17] is employed to solve RPCA. For the aforementioned four methods, the tolerance ε is set to 10⁻⁸ and the maximum number of iterations is 1000. All experiments are performed using Matlab R2012a on an Intel Core i3-3220 3.30 GHz machine with 3.48 GB RAM.

Synthetic data.

In this subsection, we synthesize N data matrices according to Di = Ai + Ei, where the Ai are low rank and the Ei are sparse. Concretely speaking, the Ai are generated randomly as
Ai = L0Mi0R0T, i = 1, 2,…, N, (48)
where L0 ∈ ℝm×r′ and R0 ∈ ℝn×r′ are obtained by orthogonalizing random matrices whose entries are independently drawn from the standard normal distribution, and the entries of Mi0 ∈ ℝr′×r′ are also drawn independently from the standard normal distribution. As for each Ei, we first define N projection operators
(PΩi(X))jk = xjk if (j, k) ∈ Ωi, and 0 otherwise, (49)
for an arbitrary X = (xjk) ∈ ℝm×n, where each Ωi is produced by uniformly sampling the entries of {1, 2,…, m}×{1, 2,…, n} with probability p, i = 1, 2,…, N. Then we generate Gi ∈ ℝm×n with entries drawn from the uniform distribution on the interval (−a, a). Finally, we set Ei = PΩi(Gi). (A Matlab script that generates such data is given in Appendix B.) The magnitude of the noise is measured by the Inverse Signal-to-Noise Ratio (ISNR) defined as
ISNR = (Σi ‖Ei‖F² / Σi ‖Ai‖F²)^(1/2). (50)

A large ISNR means heavy noise, which is likely to be disadvantageous to recovering the low rank components.

For each Di, we can obtain its low rank approximation Âi and its noise component Êi by any of the methods discussed in this paper. Relative Errors (REs) are adopted to evaluate the recovery of the low rank and the sparse components, respectively; they are defined as
RE1 = (Σi ‖Ai − Âi‖F² / Σi ‖Ai‖F²)^(1/2), (51)
RE2 = (Σi ‖Êi − Ei‖F² / Σi ‖Ei‖F²)^(1/2). (52)

The smaller the relative error is, the better the recovery performance is.

In the detailed implementation, the parameter settings are as follows: m = n = 100, N = 50, r1 = r2 = r, and the data are generated with true rank r′. We set r = r′ and vary it from 10 to 50 with an interval of 10. Three groups of experiments are designed according to the magnitude of the noise, that is, we take a = 0.5, 5 and 15, respectively. In each group of experiments, we consider two different sampling rates for Ωi: p = 0.1 and 0.2. For fixed r, a and p, the experiments are repeated 10 times and the average results are reported. The experimental results are shown in Tables 1, 2 and 3, respectively.

Table 1. Experimental comparison of different methods for a = 0.5.

https://doi.org/10.1371/journal.pone.0138028.t001

Table 2. Experimental comparison of different methods for a = 5.

https://doi.org/10.1371/journal.pone.0138028.t002

Table 3. Experimental comparison of different methods for a = 15.

https://doi.org/10.1371/journal.pone.0138028.t003

From these three tables, we have the following observations. (I) For the given levels of noise, PCAL1 cannot effectively recover the low rank and the sparse components although it has an advantage in running time. The smallest relative error of PCAL1 is 0.154 and the values of RE1 are larger than or equal to 1.58 when a = 5 or 15. To some extent, the failure of PCAL1 can be attributed to the relatively large ISNR. (II) RPCA achieves good recovery performance for small r and has strong robustness to large sparse noise. When r = 10, the largest relative error of RPCA is 0.00644, which indicates that RPCA obtains satisfactory recovery performance. However, RPCA deteriorates drastically as r increases. Besides, it also has the longest running time. (III) For relatively small noise corruptions (i.e. a = 0.5), the RE1 values of GLRAM are grudgingly acceptable while the RE2 values are unacceptable. In the presence of medium or large noise corruptions (i.e. a = 5 or 15), GLRAM cannot effectively recover either the low rank or the sparse matrices. (IV) If a ∈ {0.5, 5} or p = 0.1, the smallest relative error of RGLRAM is 1.46×10⁻⁷, which means that RGLRAM almost exactly recovers both the low rank and the sparse components. The ISNR corresponding to the case (p, a) = (0.2, 15) is so large that RGLRAM cannot effectively recover the low rank and the sparse terms. With regard to running time, RGLRAM is generally slower than GLRAM and faster than RPCA. In summary, RGLRAM is the most effective of these four methods for recovering the low rank components.

Sensitivity to initialization and parameter choice.

This subsection evaluates the sensitivity of the ALM algorithm to the initialization of {L, R, E, Y} and to the choice of {r1, r2}. We still use artificial datasets generated in the manner of the previous subsection.

A simple initialization strategy for {L, R, E, Y} is adopted in Subsection 7.1, that is, both L and R are random orthogonal matrices while Ei and Yi are initialized as zero matrices. For this synthetic data, we observe from Tables 1, 2 and 3 that RGLRAM successfully recovers the low rank and the sparse components if a ∈ {0.5, 5} or p = 0.1. In this situation, the standard deviation of the running time varies in the interval [0.76, 4.58], that of RE1 varies in the interval [1.77×10⁻⁹, 7.52×10⁻⁸] and that of RE2 varies in the interval [9.02×10⁻¹⁰, 1.43×10⁻⁸]. These observations illustrate that the ALM algorithm is not very sensitive to the random initialization of L and R. Subsequently, we discuss the sensitivity of the ALM algorithm to the initialization of E and Y.

In the previous experiments, we initialized Ei and Yi with zero matrices. Here, a new initialization scheme is considered: Ei = λ1 × randn(m, n) and Yi = λ2 × randn(m, n), where λ1 and λ2 are two given real numbers, i = 1, 2,…, N. Obviously, Ei and Yi are zero matrices if λ1 = λ2 = 0. We consider 11 different values for λ1 and λ2, that is, λ1, λ2 ∈ {0, 1,…, 10}. For convenience of designing the experiments, we only consider two settings: varying one parameter while keeping the other at zero. The data {Di} are generated in the same manner as in Subsection 7.1. The other parameters are set as follows: r1 = r2 = r = 20, p = 0.1, a = 0.5. The experiments are repeated 10 times for each parameter setting, and the results are described in Fig 1, where the left panel shows the relative errors under different λ and the right panel plots the error bar of the running time with varying λ. From Fig 1, we can see that the relative errors lie in the range from 10⁻⁹ to 10⁻⁸, which means that RGLRAM obtains relatively stable values. In addition, RGLRAM shows only small fluctuations in the average running time, with a standard deviation of no more than 1.22 seconds. The above experiments imply that, for our synthetic data, the ALM algorithm is also not too sensitive to the random initialization of E and Y.

Fig 1. Experimental results of RGLRAM with different initializations on E and Y.

The left represents the relative errors vs. λ and the right represents the running time vs. λ.

https://doi.org/10.1371/journal.pone.0138028.g001

Next, we investigate the influence of r on the recovery performance. For this purpose, we consider the case p = 0.1, r′ = 20 and a ∈ {0.5, 5}, and vary the value of r from 18 to 30. Fig 2 illustrates the comparison of relative errors with different r. We can see from this figure that both RE1 and RE2 of RGLRAM are less than 0.0068 when a = 0.5 and 20 ≤ r ≤ 24, and less than 0.0080 when a = 5 and 20 ≤ r ≤ 28. These observations on synthetic data show that RGLRAM can successfully recover the low rank and the sparse components when the rank r lies in some interval. Furthermore, r should be larger than or equal to r′ to obtain better recovery performance. The admissible interval for r appears to be larger in the presence of larger noise. In a word, RGLRAM is not very sensitive to the choice of r on the synthetic data generated in this paper.

Model evaluation and computation complexity verification.

In this subsection, we will evaluate the generalization performance of RGLRAM and verify its computation complexity through extensive experiments.

S-fold cross validation is employed to evaluate the generalization ability of RGLRAM. We first randomly partition the given samples into S equal-sized subsamples. Then one subsample is chosen as the validation set for testing the model, and the remaining S−1 subsamples are used as the training set. This validation process is repeated S times, with each subsample used exactly once for validation. Finally, we average the S results from the folds. In the implementation, we first obtain the two orthogonal matrices L and R by learning from the training set, and then apply them to the test set, as sketched below.
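The following Matlab fragment is a minimal, illustrative sketch of this partition; it assumes the samples are stored in an N-by-1 cell array D.

S = 10; N = numel(D);
idx = randperm(N);                        % random permutation of the samples
fold = mod(0:N-1, S) + 1;                 % fold label for each position
for s = 1:S
    testIdx  = idx(fold == s);            % validation subsample
    trainIdx = idx(fold ~= s);            % remaining S-1 subsamples
    % learn L and R from D(trainIdx) with RGLRAM, then evaluate on D(testIdx)
end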

We still use the synthetic data of Subsection 7.1 and set N = 50, p = 0.1, a = 5, m = n = r, S = 10. For a test sample Dtest = Atest + Etest, we hope to obtain its low rank approximation LMtestRT, where Atest and Etest are the low rank and the sparse components of Dtest, respectively. The optimal Mtest can be obtained by solving the minimization problem
minMtest,Etest ‖Etest‖1  s.t.  Dtest = LMtestRT + Etest. (53)

This optimization problem can be regarded as a special case of problem (8) and solved by simply revising the ALM algorithm. The relative error RE1 is employed to evaluate the model, and the training error and the test error are listed in Table 4. It can be seen from this table that RGLRAM not only has a very small training error, but also almost exactly recovers the low rank components in the test procedure. Hence, RGLRAM has good generalization ability.
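A minimal Matlab sketch of this revised solver is given below (our illustration, not the authors' script). It keeps only the M, E and Y updates of the ALM algorithm, assumes L is m-by-r1 and R is n-by-r2 (so the approximation is L*Mtest*R'), and reuses the parameter schedule of the Appendix.

mu = 1e-4; rho = 1.1; mubar = 1e10;
Mtest = L' * Dtest * R;  Etest = zeros(size(Dtest));  Ytest = Etest;
for iter = 1:1000
    Mtest = L' * (Dtest - Etest + Ytest/mu) * R;        % cf. update (15)
    EE    = Dtest - L*Mtest*R' + Ytest/mu;
    Etest = sign(EE) .* max(abs(EE) - 1/mu, 0);         % shrinkage, cf. (18)
    Ytest = Ytest + mu*(Dtest - L*Mtest*R' - Etest);    % multiplier, cf. (20)
    mu    = min(rho*mu, mubar);
end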

Table 4. Relative error of the training and the test sets.

https://doi.org/10.1371/journal.pone.0138028.t004

Section 4 discusses the computation complexity of the ALM algorithm in one loop. Now, we verify the relationship of the running time with the size of the matrices and with the number of samples. To this end, we design two groups of experiments. In the first group, we study how the total running time changes as the size of the matrices increases. For the sake of convenience, we let N = 1, a = 0.5, p = 0.1 and m = n = 100i, where i = 1, 2,…, 25. For fixed parameters, the experiments are run only once to save time, and the results are illustrated in Fig 3, where the right panel performs a parabola fit on the discrete running times as a reference curve. We can draw two conclusions from this figure: RGLRAM almost exactly recovers both the low rank and the sparse components (RE1 and RE2 are close to zero) in most cases, and the running time grows roughly quadratically (parabolically) with m. The second conclusion is consistent with the analysis in Section 4.

Fig 3. Experimental results of RGLRAM with different m.

The left represents the relative errors with different m and the right represents the running time with different m.

https://doi.org/10.1371/journal.pone.0138028.g003

The second group of experiments aims to explore the relationship between the total running time and N. We let N = 50i, where i = 1, 2,…, 40, and conduct the experiments only once for each parameter setting; the results are outlined in Fig 4. From the left subplot, we can see that both RE1 and RE2 approach zero. A linear fit on the discrete running times is shown in the right subplot. From this plot, we find that the running time grows roughly linearly with N on the whole. This conclusion is also in accordance with the analysis in Section 4.

Fig 4. Experimental results of RGLRAM with different N.

The left represents the relative errors with different N and the right represents the running time with different N.

https://doi.org/10.1371/journal.pone.0138028.g004

Applications in face images datasets.

We carry out experiments on two well-known face image datasets: the Olivetti Research Laboratory (ORL) dataset [24] and the Yale face dataset [3]. The former contains the face images of 40 persons, each providing 10 different images. The 400 images were captured at different times and tolerate some tilting and rotation of the faces of up to 20 degrees. These images show different expressions and facial details, and are normalized to a resolution of 64×64. The latter consists of 165 face images of 15 persons. There are 11 images for each person, showing different facial expressions or configurations. Each face image is downsampled to a size of 64×64 for computational convenience.

We first compare the low rank recovery performance after adding salt and pepper noise with a density of 0.1 to each image. The two face datasets corrupted by sparse noise are then represented by two collections of matrices. We consider four low rank recovery methods and one denoising method: PCAL1, GLRAM, RPCA, RGLRAM and Total Variation (TV) [25]. We let r1 = r2 = 20 in both GLRAM and RGLRAM, and set the rank to 20 in PCAL1. Compared with PCAL1, RPCA and TV, both GLRAM and RGLRAM have better compression performance: the compression ratio of GLRAM or RGLRAM is 10.08 on ORL and 9.86 on Yale. The experimental results are partially shown in Figs 5 and 6, respectively.
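The corrupted collections can be produced with a snippet such as the following (an illustration using Matlab's Image Processing Toolbox; the cell arrays faces and D are placeholder names):

p = 0.1;                                    % salt and pepper noise density
for i = 1:numel(faces)
    I = im2double(faces{i});                % 64-by-64 grayscale image in [0,1]
    D{i} = imnoise(I, 'salt & pepper', p);  % add sparse corruptions
end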

Fig 5. Comparison of low rank recovery and denoising on ORL.

This face dataset is open and can be downloaded freely at http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html. It is publicly available for face recognition research, and the consent was not needed.

https://doi.org/10.1371/journal.pone.0138028.g005

Fig 6. Comparison of low rank recovery and denoising on Yale.

This face dataset is open and can be downloaded freely at http://vision.ucsd.edu/content/yaleface-database. It is publicly available for face recognition research, and the consent was not needed.

https://doi.org/10.1371/journal.pone.0138028.g006

From the above two figures, we can see that PCAL1, GLRAM and TV cannot effectively remove the noise or recover the facial contour features. In contrast, both RPCA and RGLRAM have better recovery performance. In terms of image reconstruction, RPCA seems to achieve better quality than RGLRAM, which may be explained by the relatively small ranks chosen in RGLRAM. Furthermore, RGLRAM has an advantage over RPCA in running time and compression ratio. For instance, RGLRAM requires 82.79 seconds on ORL and 32.93 seconds on Yale, whereas RPCA requires 212.30 seconds on ORL and 86.65 seconds on Yale. To sum up, taking recovery quality, running time and compression together, RGLRAM performs best in recovering the low rank components.

Secondly, we compare the values of the Peak Signal to Noise Ratio (PSNR) under different ranks and noise densities, where the PSNR is defined as in (54), with Di denoting the image matrices corrupted by noise and D̂i the recovered matrices. We set r1 = r2 = r and vary r from 10 to 30 with an interval of 5. The density of the salt and pepper noise is denoted by p. The experiments are repeated 10 times for each parameter setting and the average values of the PSNR are recorded. The PSNR comparison among PCAL1, GLRAM and RGLRAM is shown in Table 5, and the comparison between TV and RPCA is shown in Table 6.
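For reference, a PSNR in this spirit can be computed as follows. This is a hedged sketch: the exact definition in Eq (54) is not reproduced here, and the peak value of 255 and the pooling of the error over all N images are our assumptions.

err = 0;
for i = 1:N
    err = err + norm(D{i} - Dhat{i}, 'fro')^2;   % Dhat{i}: recovered image
end
mse     = err / (N*m*n);                         % mean squared error per pixel
psnrVal = 10 * log10(255^2 / mse);               % PSNR in decibels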

Table 5. PSNR comparison among PCAL1, GLRAM and RGLRAM on ORL and Yale.

https://doi.org/10.1371/journal.pone.0138028.t005

Table 6. PSNR comparison between TV and RPCA on ORL and Yale.

https://doi.org/10.1371/journal.pone.0138028.t006

Combining Tables 5 and 6, we can see that RGLRAM achieves the best PSNR in all cases. The five methods, sorted in order of decreasing PSNR, are: RGLRAM, GLRAM, RPCA, PCAL1 and TV. On average, the PSNR of RGLRAM is 4.55 dB larger than that of GLRAM on ORL and 5.28 dB larger on Yale. These results show that RGLRAM has the best recovery performance.

Conclusions

In this paper, we study a robust model of GLRAM and present an iterative algorithm based on the technique of ALM. The proposed method can also be employed to solve the problem of robust SVD or l1-norm PCA. We then consider the tensor version of robust GLRAM and propose an iterative scheme for robust tensor approximations. It is illustrated experimentally that our method can exactly recover both the low rank and the sparse components under certain conditions.

Our study so far is limited to recovering the low rank and the sparse components for collections of low rank matrices in which a fraction of the entries are arbitrarily corrupted. It would be interesting to investigate the situation in which the noise is the superposition of dense small perturbations and sparse gross errors. Moreover, robust tensor approximations and robust GLRAM with missing entries deserve further research.

Appendix

A. Matlab Function for RGLRAM

function [L M R E iter] = RGLRAM (D,r1,r2)

% D: an N-by-1 cell array of m-by-n matrices

% r1 and r2 are the rank of L and R respectively.

% L and R are orthogonal transformation matrices

% E is noise, D{i} = L*M{i}*R+E{i}

% = = = = = parameters setting = = = = =

mu = 1e-4;mubar = 1e10;

rho = 1.1;epsilon = 1e-8;

maxiter = 1000;

[m n] = size(D{1});N = length(D);

% = = = = = initialization = = = = =

L = orth(randn(m,r1));R = orth(randn(n,r2))';
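% note: R is stored as its transpose (an r2-by-n matrix), so the low rank approximation is written L*M{i}*R below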

for i = 1:N

M{i} = L'*D{i}*R';

E{i} = zeros(m,n);

Y{i} = zeros(m,n);

end

normD = 0;

for i = 1:N, normD = normD+norm(D{i},'fro')^2;end

normD = normD^0.5;

% = = = = = loop = = = = =

iter = 0;flag = 1;

while iter <= maxiter && flag

iter = iter+1;

% update for L

L1 = zeros(m,r1);

for i = 1:N

L1 = L1+(D{i}-E{i}+Y{i}/mu)*R'*M{i}';

end

[L LL] = qr(L1,0);

% update for M

for i = 1:N

M{i} = L'*(D{i}-E{i}+Y{i}/mu)*R';

end

% update for R

R1 = zeros(r2,n);

for i = 1:N

R1 = R1+M{i}'*L'*(D{i}-E{i}+Y{i}/mu);

end

[RT RR] = qr(R1',0);

R = RT';

% update for M

for i = 1:N

M{i} = L'*(D{i}-E{i}+Y{i}/mu)*R';

end

% update for E

for i = 1:N

EE = D{i}+Y{i}/mu-L*M{i}*R;

E{i} = max(EE-1/mu,0);

E{i} = E{i}+min(EE+1/mu,0);

end

% update for Y

normErr = 0;

for i = 1:N

temp = D{i}-L*M{i}*R-E{i};

Y{i} = Y{i}+mu*(temp);

normErr = normErr+norm(temp,'fro')^2;

end

normErr = normErr^0.5;

%update for mu

mu = min(rho*mu,mubar);

% = = = Err = =

if normErr/normD<epsilon

flag = 0;

end

end

B. Demonstration for RGLRAM

clear

m = 100;n = 100;

N = 50;

a = 0.5; p = 0.1;

r1 = 20;r2 = r1;

% = = = = = Generation of D = = = = =

L0 = orth(randn(m,r1));R0 = [orth(randn(n,r1))]';

ErrE = 0;ErrA = 0;

for i = 1:N

H = rand(m,n)>1-p;

E0{i} = (rand(m,n)*(2*a)-a).*H;

M0{i} = randn(r1,r2);

A0{i} = L0*M0{i}*R0;

D{i} = A0{i}+E0{i};

ErrE = ErrE+norm(E0{i},'fro')^2;

ErrA = ErrA+norm(A0{i},'fro')^2;

end

ISNR = sqrt (ErrE/ ErrA);

% = = = = = call RGLRAM = = = = =

t0 = cputime;

[L M R E iter] = RGLRAM (D,r1,r2);

t = cputime-t0;

% = = = = = computation for RE = = = = =

n1 = 0;n2 = 0;

for i = 1:N

n1 = n1+norm(A0{i}-L*M{i}*R,'fro')^2;

n2 = n2+norm(E{i}-E0{i},'fro')^2;

end

RE = [sqrt(n1/ErrA) sqrt(n2/ErrE)];

Author Contributions

Conceived and designed the experiments: JS WY XZ. Performed the experiments: JS XZ. Analyzed the data: JS WY. Contributed reagents/materials/analysis tools: WY. Wrote the paper: JS WY.

References

  1. Jolliffe IT (2002) Principal component analysis (second edition). Springer Series in Statistics.
  2. Golub G, Van Loan C (2012) Matrix computations (fourth edition). The Johns Hopkins University Press.
  3. Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19: 711–720.
  4. Candès EJ, Recht B (2009) Exact matrix completion via convex optimization. Found Comput Math 9: 717–772.
  5. Candès EJ, Tao T (2010) The power of convex relaxation: Near-optimal matrix completion. IEEE Trans Inf Theory 56: 2053–2080.
  6. Wright J, Peng Y, Ma Y, Ganesh A, Rao S (2009) Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. Adv Neural Inf Process Syst: 2080–2088.
  7. Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? Journal of the ACM 58: no. 3, article 11.
  8. Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. Proc Int Conf Machine Learning: 663–670.
  9. Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y (2013) Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 35: 171–184. pmid:22487984
  10. Yang J, Zhang D, Frangi A, Yang J (2004) Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans Pattern Anal Mach Intell 26: 131–137. pmid:15382693
  11. Ding C, Ye J (2005) 2-dimensional singular value decomposition for 2d maps and images. Proc SIAM Int Conf Data Mining: 22–34.
  12. Ye J, Janardan R, Li Q (2004) Two-dimensional linear discriminant analysis. Adv Neural Inf Process Syst: 354–363.
  13. Zhang D, Zhou Z (2005) (2D)2PCA: 2-Directional 2-Dimensional PCA for efficient face representation and recognition. Neurocomputing 69: 224–231.
  14. Ye J (2005) Generalized low rank approximations of matrices. Machine Learning 61: 167–191.
  15. Liu J, Chen S, Zhou Z, Tan X (2010) Generalized low rank approximations of matrices revisited. IEEE Trans Neural Netw 21: 621–632. pmid:20172823
  16. Shi J, Yong L (2014) Matrix completion via generalized low rank approximations of matrices. ICIC Express Letters, Part B: Applications 5: 1619–1625.
  17. Lin Z, Chen M, Wu L, Ma Y (2009) The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical Report UILU-ENG-09-2215, arXiv:1009.5055v2.
  18. Edelman A, Arias TA, Smith ST (1998) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20: 303–353.
  19. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2010) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1): 1–122.
  20. Shen Y, Wen Z, Zhang Y (2012) Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. Optim Methods Softw 29(2): 1–25.
  21. Ke Q, Kanade T (2005) Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming. Proc IEEE Conf Computer Vision and Pattern Recognition.
  22. Kwak N (2008) Principal component analysis based on L1-norm maximization. IEEE Trans Pattern Anal Mach Intell 30: 1672–1680. pmid:18617723
  23. Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51: 455–500.
  24. Samaria F, Harter A (1994) Parameterisation of a stochastic model for human face identification. Proc IEEE Workshop Appl Comput Vis: 138–142.
  25. Gilboa G, Zeevi YY, Sochen N (2003) Texture preserving variational denoising using an adaptive fidelity term. Proc VLSM.