Tensor Robust PCA with Nonconvex and Nonlocal Regularization

Tensor robust principal component analysis (TRPCA) is a classical way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular value equally. However, for real-world visual data, large singular values represent more significant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor adjustable logarithmic norm. Unlike TRPCA, our N-TRPCA can adaptively shrink small singular values more and shrink large singular values less. In addition, TRPCA assumes that the whole data tensor is of low rank. This assumption is hardly satisfied in practice for natural visual data, restricting the capability of TRPCA to recover the edges and texture details from noisy images and videos. To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model. Specifically, similar nonlocal patches are grouped as a tensor and then each group tensor is recovered by our N-TRPCA. Since the patches in one group are highly correlated, all group tensors have strong low-rank property, leading to an improvement of recovery performance. Experimental results demonstrate that the proposed NN-TRPCA outperforms existing TRPCA methods in visual data recovery. The demo code is available at https://github.com/qguo2010/NN-TRPCA.


Introduction
Principal component analysis (PCA), aiming to analyze the low-dimensional representation of high-dimensional data, has received considerable attention in the fields of computer vision and machine learning [1][2][3][4]. It is a nonparametric technique and works well on data that is slightly corrupted by small noise. Unfortunately, PCA is sensitive to outliers or large amounts of noise, which are inevitably introduced into visual data during acquisition and transmission.
To alleviate this issue, robust principal component analysis (RPCA) [5] was proposed to recover a low-rank matrix from its observation corrupted by sparse noise, where the matrix rank is uniquely defined as the number of nonzero singular values. Since the matrix rank function is difficult to minimize, RPCA adopts the nuclear norm as a convex surrogate of the matrix rank. Suppose that an observation matrix X ∈ R^{n1×n2} can be decomposed as X = L + E, where L is a low-rank matrix and E is a sparse matrix (noise). RPCA can recover L and E with high probability under several incoherence conditions by solving the following minimization problem:

    min_{L,E} ‖L‖_* + λ‖E‖_1,  s.t.  X = L + E,   (1)

where ‖L‖_* and ‖E‖_1 denote the nuclear norm of L and the ℓ1-norm of E, respectively, and λ is a regularization parameter. Problem (1) can be effectively solved by Singular Value Thresholding (SVT) [6]. To date, RPCA and its extensions have found plenty of applications, including image restoration/alignment [7], foreground detection [8], subspace clustering [9], etc.

Nevertheless, RPCA can only deal with two-dimensional data. In real-world applications, high-dimensional data is growing explosively, and the tensor, rather than the matrix, is the most appropriate representation of such data. For instance, a color image is a third-order tensor of size height × width × channel, while a gray video is a third-order tensor with column, row, and temporal modes. To handle such tensor data, one can apply RPCA to each frontal slice of a tensor independently, but this strategy ignores the multidimensional structure underlying the tensor. It is therefore natural to extend RPCA to the tensor domain. Given an observed tensor X ∈ R^{n1×n2×n3} that can be decomposed as X = L + E, where L is low-rank and E is sparse, tensor robust principal component analysis (TRPCA) aims to exactly recover L and E from the tensor X.

Unlike the matrix rank, which is unique, different definitions of tensor rank are derived from different tensor decomposition methods. Tucker rank [10] is induced by the Tucker decomposition [11] and is defined as the vector of the ranks of the tensor unfolded along each mode. As directly minimizing the Tucker rank is NP-hard, the sum of nuclear norms (SNN) [12] was presented as a convex relaxation of the Tucker rank. Based on SNN, Huang et al. [13] proposed the SNN-TRPCA model

    min_{L,E} Σ_{i=1}^{3} λ_i ‖L_{(i)}‖_* + ‖E‖_1,  s.t.  X = L + E,   (2)

where λ_i > 0 and L_{(i)} denotes the mode-i matricization [10] of tensor L. SNN-TRPCA exploits the low-rank property of the tensor subspace along each mode. However, it is hard to set the weights λ_i because the low-rankness of each mode usually differs in real data; for example, the rank of a gray video along its temporal mode is much lower than those along its spatial modes. Besides, the unfolding operation along one mode can destroy the inherent structure information of tensors.
Recently, the tensor average rank [14] was defined via the tensor singular value decomposition (t-SVD) [15], in which the block circulant matricization arranges the frontal slices of a tensor in a circulant way. As a result, the tensor average rank preserves more structural information across frontal slices than the Tucker rank does. Since minimizing the tensor average rank is NP-hard, the tensor nuclear norm (TNN) [14] was derived as its convex surrogate, and it has been proven in [14] that TNN is the tight convex envelope of the tensor average rank. Based on TNN, the TNN-TRPCA model can be formulated as follows:

    min_{L,E} ‖L‖_TNN + λ‖E‖_1,  s.t.  X = L + E,   (3)

where ‖L‖_TNN denotes the TNN of L (see Definition 6 for details). Problem (3) can be solved by the alternating direction method of multipliers [16], in which tensor Singular Value Thresholding (t-SVT) [14] is the key step. Mathematically, let Y = U * S * V^⊤ be the t-SVD of Y ∈ R^{n1×n2×n3}. For any τ > 0, the proximal problem

    min_X τ‖X‖_TNN + (1/2)‖X − Y‖_F²   (4)

is solved by the t-SVT operator

    D_τ(Y) = U * S_τ * V^⊤,  S_τ = ifft((S̄ − τ)_+, [ ], 3),   (5)

where S̄ is the result of the fast Fourier transform (FFT) on S along the third dimension, (·)_+ keeps the positive part of each entry, and ifft is the inverse FFT. It is easy to see that t-SVT shrinks each singular value equally by the threshold τ. However, in practice, tensor singular values often have different physical meanings: for a noisy color image, the large singular values usually represent the important information in the image, while the small singular values usually represent the noise. This motivates us to use different thresholds, shrinking the large singular values less and the small ones more, so that the significant information is well preserved and the noise is precisely reduced.

Besides, TNN-TRPCA has the potential limitation that it simply assumes the whole underlying tensor is of low rank. For visual data (e.g., natural images and videos), this assumption is often hard to satisfy, so TNN-TRPCA cannot well recover the detail information in visual data, especially for data with complex structures. To address these problems, we propose a variant of TRPCA that shrinks tensor singular values differently and integrates nonlocal self-similarity. Specifically, to better preserve the important information of tensor data, a nonconvex TRPCA (N-TRPCA) model is built using the tensor adjustable logarithmic norm as a nonconvex surrogate of the tensor average rank, which applies adaptive thresholds when shrinking different tensor singular values.
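To make the operator concrete, the following NumPy sketch implements t-SVT as described by Eq. (5); the function name t_svt and the random test tensor are our own illustrative choices, not code from the cited papers.

```python
import numpy as np

def t_svt(Y, tau):
    """Tensor Singular Value Thresholding: shrink every Fourier-domain
    singular value of Y by the same threshold tau."""
    Yf = np.fft.fft(Y, axis=2)                # FFT along the third dimension
    Xf = np.zeros_like(Yf)
    for i in range(Y.shape[2]):               # one SVD per frontal slice
        U, s, Vh = np.linalg.svd(Yf[:, :, i], full_matrices=False)
        s = np.maximum(s - tau, 0.0)          # equal shrinkage of all values
        Xf[:, :, i] = (U * s) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))   # back to the spatial domain

Y = np.random.rand(8, 8, 3)
L = t_svt(Y, tau=0.5)                         # low-rank estimate of Y
```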
Then, nonlocal self-similarity is further introduced into N-TRPCA to derive a nonconvex and nonlocal TRPCA (NN-TRPCA) model. In this way, our model can make full use of the structural redundancy of tensors to recover detail information, resulting in remarkable performance improvements. In summary, our contributions are highlighted as follows: • A nonconvex TRPCA (N-TRPCA) model under the t-SVD framework is proposed for visual data recovery, which shrinks large singular values less and small singular values more. Such a model can effectively preserve the important information in visual data.
• Beyond using the global low-rankness of tensors, nonlocal low-rank property is more crucial to fully utilize the structural redundancy in tensors. To this end, we incorporate nonlocal self-similarity into N-TRPCA and then propose a nonconvex and nonlocal TRPCA model, named NN-TRPCA.
• To solve the proposed NN-TRPCA, we present an effective optimization algorithm based on alternating direction method of multipliers (ADMM), in which the variables can be solved by the closed-form equations.
• We evaluate the efficacy of our NN-TRPCA method in color image restoration and gray video restoration. Extensive experimental results confirm the superiority of our method and show its competitive performance against the state-of-the-art methods.
A preliminary conference version of this work was presented in [17]. We extend it both theoretically and experimentally. First, the recent works on the low-rank property and nonlocal self-similarity of visual data are elaborated to provide theoretical basis for the proposed method. Second, we offer a detailed and rigorous derivation for the closed-form solutions of our N-TRPCA algorithm. Third, we compare two methods of constructing group tensors and find a new observation. Fourth, the proposed NN-TRPCA is evaluated in visual data restoration, where the data is corrupted with random noise. This restoration task is more challenging than denoising visual data corrupted by Gaussian white noise. Fifth, we discuss the influence of parameters in NN-TRPCA on our experiments. Finally, we list several possible future extensions of the proposed NN-TRPCA.
The remainder of this paper is organized as follows. Section 2 displays some notations and preliminaries. Section 3 makes an overview of the low-rankness and nonlocal self-similarity of visual data. In Section 4, we present the NN-TRPCA method and the corresponding optimization algorithm. Extensive experimental results are reported in Section 5. Finally, a conclusion is drawn in Section 6.

Notations and Preliminaries
For convenience of presentation, we first introduce notations used in our paper, and then list some basic definitions and theorems of the tensor algebra.
Throughout this paper, third-order tensors are denoted by boldface calligraphic letters, e.g., A ∈ R^{n1×n2×n3}; matrices by boldface capital letters, e.g., A ∈ R^{n1×n2}; vectors by boldface lowercase letters, e.g., a ∈ R^{n1}; and scalars by lowercase letters, e.g., a ∈ R. For a third-order tensor A, we denote by Ā the fast Fourier transform (FFT) of A along the third dimension, computed by the Matlab command fft, i.e., Ā = fft(A, [ ], 3), and A is recovered by the inverse FFT, i.e., A = ifft(Ā, [ ], 3). A^(i) and Ā^(i) denote the ith frontal slices of A and Ā, respectively. Moreover, the inner product between A and B is defined as ⟨A, B⟩ = Σ_{ijk} a_{ijk} b_{ijk}. The ℓ1-norm, infinity norm, and Frobenius norm of A are defined as ‖A‖_1 = Σ_{ijk} |a_{ijk}|, ‖A‖_∞ = max_{ijk} |a_{ijk}|, and ‖A‖_F = (Σ_{ijk} |a_{ijk}|²)^{1/2}, respectively, where a_{ijk} denotes the (i, j, k)th entry of A. For a matrix A, the matrix nuclear norm is defined as the sum of its singular values, i.e., ‖A‖_* = Σ_i σ_i(A), where σ_i(·) denotes the ith largest singular value of A.

Definition 1. (Block diagonal matrix [14]) The block diagonal matrix of A ∈ R^{n1×n2×n3} is defined as

    bdiag(A) = diag(A^(1), A^(2), ..., A^(n3)) ∈ R^{n1n3×n2n3},

i.e., the frontal slices of A are arranged along the diagonal.

Definition 2. (Block circulant matrix [14]) For A ∈ R^{n1×n2×n3}, the block circulant matrix of A is defined as

    bcirc(A) =
    ⎡ A^(1)    A^(n3)    ···  A^(2) ⎤
    ⎢ A^(2)    A^(1)     ···  A^(3) ⎥
    ⎢   ⋮         ⋮       ⋱     ⋮   ⎥
    ⎣ A^(n3)   A^(n3−1)  ···  A^(1) ⎦  ∈ R^{n1n3×n2n3},

which can be regarded as a new way of matricizing the tensor A.
Theorem 1. (Diagonalization [14]) The block circulant matrix of A can be block diagonalized as follows:

    (F_{n3} ⊗ I_{n1}) · bcirc(A) · (F_{n3}^{−1} ⊗ I_{n2}) = bdiag(Ā),

where ⊗ denotes the Kronecker product, F_{n3} ∈ C^{n3×n3} is the discrete Fourier transform matrix, and I_{n1} ∈ R^{n1×n1} and I_{n2} ∈ R^{n2×n2} are identity matrices.

Definition 4. (T-product [15]) Given A ∈ R^{n1×n2×n3} and B ∈ R^{n2×l×n3}, the t-product A * B is defined as a tensor of size n1 × l × n3,

    A * B = fold(bcirc(A) · unfold(B)),

where unfold(B) = [B^(1); B^(2); ...; B^(n3)] ∈ R^{n2n3×l} stacks the frontal slices of B and fold is its inverse operator. Using Theorem 1, the t-product can be transformed into matrix multiplications in the Fourier domain: C = A * B is equivalent to C̄^(i) = Ā^(i) B̄^(i) for i = 1, 2, ..., n3.
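As a quick illustration of this slice-wise equivalence, the following NumPy sketch computes the t-product in the Fourier domain; the function name t_product and the test tensors are ours.

```python
import numpy as np

def t_product(A, B):
    """t-product C = A * B, computed slice-wise as C̄^(i) = Ā^(i) B̄^(i)."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)    # slice-wise matrix products
    return np.real(np.fft.ifft(Cf, axis=2))

A = np.random.rand(4, 5, 3)
B = np.random.rand(5, 2, 3)
C = t_product(A, B)                           # C has shape (4, 2, 3)
```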
Theorem 2. (T-SVD [15]) For A ∈ R^{n1×n2×n3}, the tensor singular value decomposition (t-SVD) of A is described by

    A = U * S * V^⊤,

where U ∈ R^{n1×n1×n3} and V ∈ R^{n2×n2×n3} are orthogonal tensors, and S ∈ R^{n1×n2×n3} is an f-diagonal tensor. T-SVD is the basis of the tensor average rank and the tensor nuclear norm described below.
Definition 5. (Tensor average rank [14]) Given a tensor A ∈ R^{n1×n2×n3}, its tensor average rank is defined as follows:

    rank_a(A) = (1/n3) rank(bcirc(A)).

Definition 6. (Tensor Nuclear Norm (TNN) [14]) Given a tensor A ∈ R^{n1×n2×n3}, the tensor nuclear norm of A is defined as

    ‖A‖_TNN = (1/n3) ‖bcirc(A)‖_* = (1/n3) Σ_{i=1}^{n3} ‖Ā^(i)‖_*,

which is a convex surrogate of the tensor average rank.
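Following Definitions 5 and 6, both quantities can be read off from the singular values of the Fourier-domain slices, as in this short sketch (the function name is ours):

```python
import numpy as np

def tnn_and_average_rank(A, tol=1e-10):
    """TNN and tensor average rank via the slices of the FFT of A."""
    Af = np.fft.fft(A, axis=2)
    n3 = A.shape[2]
    sv = [np.linalg.svd(Af[:, :, i], compute_uv=False) for i in range(n3)]
    tnn = sum(s.sum() for s in sv) / n3            # (1/n3) * sum of all sigma
    avg_rank = sum(int((s > tol).sum()) for s in sv) / n3
    return tnn, avg_rank
```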

Related Works
The key issue of removing noise from corrupted visual data is to fully utilize the structure priors of the underlying data. Low-rank prior [18][19][20][21] and nonlocal prior [22][23][24][25] are two commonly-used structure priors for visual data recovery. In this section, we briefly review some works on these two priors.

Low-Rank Property
The low-rank prior of images indicates that images contain repeating structures, i.e., structural redundancy. Numerous works [19,20,26] have exploited the low-rank property for image recovery, transforming image recovery into a matrix rank minimization problem. In particular, a gray image can be approximated by a low-rank matrix, and a color image can be approximated by applying low-rank approximation to its three channels independently. Although the matrix rank can describe the global information of a matrix, minimizing it is NP-hard. Thus, many researchers have attempted to find an appropriate surrogate of the matrix rank. Candès and Recht [27] originally proposed the matrix nuclear norm (MNN) as a convex surrogate of the matrix rank, and it was theoretically proved in [28] that MNN is the best convex approximation of the matrix rank. Due to its convexity, the MNN minimization has a global optimal solution and can be efficiently solved by Singular Value Thresholding (SVT) [6]. However, MNN ignores the differences between singular values: for visual data, the large singular values contain more important information than the small singular values. Thus, several variants of MNN were proposed to treat the singular values differently. The matrix truncated nuclear norm (MTNN) [29] shrinks only the smallest n − r singular values, where r is a parameter to be estimated. Although MTNN achieves better performance than MNN, it ignores the fact that the large singular values also contain a small amount of noise. To mitigate rather than fully eliminate the shrinkage of large singular values, Gu et al. [30] proposed the matrix weighted nuclear norm (MWNN) and derived Weighted Singular Value Thresholding (WSVT) to minimize MWNN effectively. By setting the weights to decrease as the singular values increase, the large singular values can be shrunk less and the small singular values more; however, estimating a set of reasonable weights is troublesome. Recently, by using a logarithm function, a nonconvex surrogate of the matrix rank [31] was proposed to adaptively estimate the weights of singular values. Since the estimated weights decrease as the singular values increase, this nonconvex surrogate can adaptively increase the shrinkage of small singular values and reduce the shrinkage of large singular values simultaneously.
One shortcoming of matrix rank based methods is that they cannot preserve the correlations across the frontal slices of multidimensional visual data, such as the correlations across the RGB channels of color images or across the frames of gray videos. Instead of matrices, tensors provide an efficient way to represent visual data without loss of its structural information. Hence, approximating visual data directly by low-rank tensors has gained significant popularity in recent years. However, the definition of tensor rank is not unique; the commonly used definitions are the CP rank [10], the Tucker rank [10], and the tensor average rank [14]. The CP rank is defined as the smallest number of rank-one tensors whose sum generates the given tensor, but specifying it in advance is challenging. The Tucker rank reflects the low-rank property of the matrices unfolded along each mode of a tensor. Similar to the matrix case, minimizing the Tucker rank is NP-hard, so the sum of nuclear norms (SNN) [12] was proposed as its convex approximation for low-rank recovery; however, SNN fails to capture the intrinsic correlations between different modes. Recently, the tensor average rank [14] was proposed. Unlike the Tucker rank, which adopts matricization along several fixed directions, the tensor average rank can capture more correlations across frontal slices via block circulant matricization. Since minimizing the tensor average rank is NP-hard, Lu et al. [14] proposed the tensor nuclear norm (TNN) as its convex surrogate and further derived tensor Singular Value Thresholding (t-SVT) to solve the TNN minimization efficiently. However, whether SNN or TNN is used, a convex surrogate of tensor rank neglects the differences between tensor singular values and thus cannot well preserve the important information of tensor data. Therefore, some nonconvex surrogates of tensor rank [32,33] were proposed to treat the tensor singular values differently.

Nonlocal Self-Similarity
Nonlocal self-similarity is another important prior of images. Compared with global low-rankness, it captures detailed structural redundancy, resulting in more accurate recovery results. Buades et al. [34] first applied nonlocal self-similarity to gray image denoising and presented the nonlocal means (NLM) filter. Concretely, NLM estimates each pixel by a nonlocal average of all pixels in its neighborhood, where the weight of each pixel reflects its similarity to the pixel being estimated. Another representative work is block matching and 3-D filtering (BM3D) [35], which groups similar 2-D image patches into 3-D arrays and processes these arrays with sparse collaborative filtering. To the best of our knowledge, BM3D is the first work that combines sparsity and nonlocal self-similarity for image denoising. Furthermore, Dong et al. [22] proposed nonlocally centralized sparse representation (NCSR) to learn the sparse coding of nonlocal redundancy in images, and then utilized NCSR for several image restoration tasks. Over the past decade, there has been growing interest in exploiting both nonlocal self-similarity and matrix low-rankness in the field of image processing. Nonlocal low-rank regularized compressed sensing (NLR-CS) [23] was presented to introduce nonlocal self-similarity into compressed sensing recovery by patch grouping and low-rank approximation: it groups similar 2-D image patches into a matrix and handles each group matrix by low-rank regularization. Note that each group matrix is strongly low-rank because the similar patches in one group are highly correlated. Similarly, the model presented in [24] classifies similar image patches into a matrix and then estimates each matrix by low-rank approximation with truncated singular values.
Recently, nonlocal low-rank matrix recovery has been extended to the tensor domain. Unlike the matrix case, nonlocal low-rank tensor recovery groups similar image patches into a tensor and treats the group tensors as the basic recovery units. For instance, a tensor-based compressed sensing recovery framework (NLR-TFA) [25] was proposed by utilizing nonlocal self-similarity and low-CP-rank regularization; since computing the CP rank is NP-hard, NLR-TFA uses Jennrich's algorithm [36] to estimate it. Additionally, nonlocal low-rank regularization-based tensor completion (NLRR-TC) [37] combines nonlocal self-similarity and a low-Tucker-rank constraint for hyperspectral image completion, which can capture both the spatial and spectral correlations of hyperspectral images. It is worth noting that the tensor average rank can well describe the correlations across the frontal slices of tensors. In [38], the nonlocal prior is integrated into low-rank tensor completion based on the tensor average rank for visual data completion.
In summary, nonconvex low-rank regularizers and nonlocal self-similarity have been widely studied in the literature. Among these studies, the most relevant to our approach are the works [31] and [38]. Different from the logarithmic function g(x) = log(x + 1) used in [31], our logarithmic function introduces an adjustable parameter θ to further control the level of shrinkage of the tensor singular values. Besides, although our NN-TRPCA and [38] are both nonlocal tensor recovery methods based on the tensor average rank, our NN-TRPCA is able to capture the correlations across the frontal slices of 3-D visual data, whereas [38] processes each frontal slice separately.

Proposed Method
In this section, we first introduce the tensor adjustable logarithmic norm, a nonconvex surrogate of the tensor average rank, and use it to build a nonconvex TRPCA (N-TRPCA) model. Then an optimization algorithm based on ADMM is designed to solve N-TRPCA efficiently. Finally, nonlocal self-similarity is integrated into N-TRPCA to derive the nonconvex and nonlocal TRPCA model, called NN-TRPCA. Fig. 1 illustrates the flowchart of the proposed NN-TRPCA method.

N-TRPCA Model
Tensor nuclear norm (TNN) is a convex surrogate of the tensor average rank. By Eq. (5), we know that t-SVT shrinks all singular values with the same threshold τ when solving the TNN minimization. But in real scenarios, there are great differences among tensor singular values. For instance, the large singular values of a noisy image usually deliver significant information, while the small singular values usually correspond to noise. Accordingly, the large singular values should be shrunk less and the small singular values more. To this end, we introduce a nonconvex surrogate of the tensor average rank as follows:

Definition 7. (Tensor Adjustable Logarithmic Norm (TALN)) Given a tensor A ∈ R^{n1×n2×n3} and r = min(n1, n2), the tensor adjustable logarithmic norm of A is defined as

    ‖A‖_TALN = (1/n3) Σ_{i=1}^{n3} Σ_{j=1}^{r} g(σ_j(Ā^(i))),

where g(x) = log(θx + 1) is a nonconvex function with an adjustable positive parameter θ.
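Under our reading of the reconstructed formula in Definition 7 (the 1/n3 normalization mirrors that of TNN), the norm can be evaluated as in the following sketch; the function name taln is ours.

```python
import numpy as np

def taln(A, theta=2.0):
    """Tensor adjustable logarithmic norm with g(x) = log(theta*x + 1)."""
    Af = np.fft.fft(A, axis=2)
    total = 0.0
    for i in range(A.shape[2]):
        s = np.linalg.svd(Af[:, :, i], compute_uv=False)
        total += np.log(theta * s + 1.0).sum()   # sum of g(sigma_j)
    return total / A.shape[2]
```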
One main advantage of TALN is that it can better preserve the important information in tensor data than the convex envelope of the tensor average rank. In Section 4.2, we show that TALN adaptively estimates a weight for each singular value and that the weight decreases as the singular value increases. According to these weights, different thresholds are used for shrinking the tensor singular values, with smaller thresholds assigned to larger singular values. As a result, TALN shrinks large singular values less and small singular values more. Another advantage of TALN is that it flexibly controls the shrinkage level of g(x) on the tensor singular values through the adjustable parameter θ.

Based on TALN, the proposed N-TRPCA model is formulated as

    min_{L,E} ‖L‖_TALN + λ‖E‖_1,  s.t.  X = L + E,   (15)

where λ is a regularization parameter.

Optimization Algorithm of N-TRPCA
As used in [39][40][41], the alternating direction method of multipliers (ADMM) is an efficient approach for solving optimization problems with multiple constraint terms. In the following, we present an ADMM-based algorithm for solving the model (15). The augmented Lagrangian function of model (15) is

    L_µ(L, E, P) = ‖L‖_TALN + λ‖E‖_1 + ⟨P, L + E − X⟩ + (µ/2) ‖L + E − X‖_F²,   (16)

where P is the Lagrange multiplier and µ is a penalty parameter. The variables L and E can then be solved iteratively by minimizing the function (16).
Theorem 3. Let Y = U * S * V^⊤ be the t-SVD of Y ∈ R^{n1×n2×n3}, and let W ∈ R^{n1×n2×n3} be an f-diagonal tensor whose ith frontal slice is diag(w^i_1, w^i_2, ..., w^i_r), where r = min(n1, n2) and 0 ≤ w^i_1 ≤ w^i_2 ≤ ··· ≤ w^i_r. For any τ > 0, a global optimal solution of the following minimization problem

    min_X (1/n3) Σ_{i=1}^{n3} Σ_{j=1}^{r} τ w^i_j σ_j(X̄^(i)) + (1/2) ‖X − Y‖_F²   (25)

is given by the tensor Weighted Singular Value Thresholding (t-WSVT)

    X* = U * S_{τW} * V^⊤,  S_{τW} = ifft((S̄ − τW)_+, [ ], 3),

where the weights in W are applied to the Fourier-domain singular values.

Proof. Since ‖X − Y‖_F² = (1/n3) ‖X̄ − Ȳ‖_F², problem (25) is equivalent, in the Fourier domain, to

    min_{X̄} (1/n3) Σ_{i=1}^{n3} ( Σ_{j=1}^{r} τ w^i_j σ_j(X̄^(i)) + (1/2) ‖X̄^(i) − Ȳ^(i)‖_F² ).   (28)

In Eq. (28), the variables X̄^(i) are independent, so Eq. (28) can be divided into n3 independent subproblems. By Lemma 1, the global optimal solution of the ith (i = 1, 2, ..., n3) subproblem is the ith frontal slice of X̄*. Thus, X* is a global optimal solution of problem (25).
According to Eq. (20) and Theorem 3, the global solution of the minimization problem (22) can be obtained by the t-WSVT operator, i.e.,

    L_{k+1} = U * S_{W/µ_k} * V^⊤,  S_{W/µ_k} = ifft((S̄ − µ_k^{−1} W)_+, [ ], 3),   (29)

where Q_k = U * S * V^⊤ is the t-SVD of Q_k and W is an f-diagonal tensor whose ith frontal slice is diag(w^i_{1,k}, w^i_{2,k}, ..., w^i_{r,k}). According to the weight tensor W, the operator (29) utilizes small thresholds to shrink the large singular values and large thresholds to shrink the small singular values.
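A NumPy sketch of the t-WSVT operator follows; the weight array W (of shape r × n3, holding the per-slice weights w^i_j applied to the Fourier-domain singular values) and the function name are our illustrative conventions.

```python
import numpy as np

def t_wsvt(Y, W, tau):
    """Weighted shrinkage: the j-th singular value of the i-th Fourier
    slice is shrunk by tau * W[j, i] rather than by a common threshold."""
    Yf = np.fft.fft(Y, axis=2)
    Xf = np.zeros_like(Yf)
    for i in range(Y.shape[2]):
        U, s, Vh = np.linalg.svd(Yf[:, :, i], full_matrices=False)
        s = np.maximum(s - tau * W[:, i], 0.0)   # per-value thresholds
        Xf[:, :, i] = (U * s) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))
```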
Similarly, holding L_{k+1}, P_k, and µ_k fixed, E_{k+1} can be updated by

    E_{k+1} = argmin_E λ‖E‖_1 + (µ_k/2) ‖E − H_k‖_F²,   (30)

where H_k = X − L_{k+1} − µ_k^{−1} P_k. It has the closed-form solution

    E_{k+1} = D_{λ/µ_k}(H_k),   (31)

where D_τ(x) is the soft thresholding operator [42] defined as

    D_τ(x) = sign(x) · max(|x| − τ, 0).   (32)

The whole optimization procedure for the N-TRPCA method is summarized in Algorithm 1.

Algorithm 1. N-TRPCA
Input: corrupted tensor X.
Output: recovered tensor L̂.
Initialize: L_0 = E_0 = P_0 = 0, λ, µ_0, µ_max, ρ, ε.
while not converged do
1. Update L_{k+1} via (29);
2. Update E_{k+1} via (31);
3. Update P_{k+1} via P_{k+1} = P_k + µ_k (L_{k+1} + E_{k+1} − X);
4. Update µ_{k+1} via µ_{k+1} = min(ρµ_k, µ_max);
5. Check the convergence conditions: ‖L_{k+1} − L_k‖_∞ ≤ ε, ‖E_{k+1} − E_k‖_∞ ≤ ε, and ‖L_{k+1} + E_{k+1} − X‖_∞ ≤ ε.
end while
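Putting the two updates together gives the following condensed sketch of Algorithm 1. The weight rule w = θ/(θσ + 1), i.e., the derivative of g, is our assumption for the adaptive weights of Eq. (20); it decreases as the singular value grows, matching the behavior described above.

```python
import numpy as np

def soft_threshold(X, tau):
    """Soft thresholding operator D_tau of Eq. (32)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def n_trpca(X, lam, theta=2.0, mu=1e3, mu_max=1e10, rho=1.1,
            eps=1e-5, max_iter=500):
    """ADMM sketch of N-TRPCA; the weight rule is an assumption (see text)."""
    L = np.zeros_like(X); E = np.zeros_like(X); P = np.zeros_like(X)
    for _ in range(max_iter):
        # L-update: t-WSVT on Q = X - E - P/mu with adaptive weights
        Qf = np.fft.fft(X - E - P / mu, axis=2)
        Lf = np.zeros_like(Qf)
        for i in range(X.shape[2]):
            U, s, Vh = np.linalg.svd(Qf[:, :, i], full_matrices=False)
            w = theta / (theta * s + 1.0)        # assumed adaptive weights
            Lf[:, :, i] = (U * np.maximum(s - w / mu, 0.0)) @ Vh
        L_new = np.real(np.fft.ifft(Lf, axis=2))
        # E-update: soft thresholding, Eq. (31)
        E = soft_threshold(X - L_new - P / mu, lam / mu)
        # Multiplier and penalty updates
        P = P + mu * (L_new + E - X)
        mu = min(rho * mu, mu_max)
        converged = (np.max(np.abs(L_new - L)) < eps and
                     np.max(np.abs(L_new + E - X)) < eps)
        L = L_new
        if converged:
            break
    return L, E
```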

NN-TRPCA Model
As discussed in Section 3.2, nonlocal self-similarity implies that visual data contains a great deal of nonlocal structural redundancy. Therefore, for each patch of a visual data tensor, we can find a group of similar tensor patches. Because of the high correlation between similar patches, the tensor formed by stacking them has a low-rank structure, so we can apply our N-TRPCA to each formed tensor to obtain the final recovered visual data tensor. This inspires us to develop a nonlocal variant of N-TRPCA, i.e., NN-TRPCA. In this subsection, we elaborate the proposed NN-TRPCA model. Its procedure mainly consists of three stages: tensor patch grouping, low-rank recovery of each group tensor by N-TRPCA, and tensor patch aggregation.
Tensor patch grouping: Given a corrupted third-order tensor X ∈ R^{n1×n2×n3}, we divide X into overlapping tensor patches with spatial size p × p. We then consider two methods to construct group tensors; see Fig. 3 for an intuitive illustration. Method 1 constructs each group tensor by stacking similar patches along the third dimension. Specifically, for each 3-D patch, we search for its m − 1 most similar patches based on the Euclidean distance. The group consisting of the reference patch and its similar patches is denoted as Ψ_t = {Y_i ∈ R^{p×p×n3}, i = 1, 2, ..., m} (t = 1, 2, ..., T), where T is the number of groups. Finally, the group Ψ_t is stacked into a third-order tensor X_t ∈ R^{p×p×mn3}. Differently, Method 2 unfolds tensor patches into matrix patches and stacks similar patches along the first dimension. Specifically, we first reshape all 3-D patches into 2-D patches of size p² × n3. For each 2-D patch, we search for its m − 1 most similar patches according to the Euclidean distance. This group is denoted as Ψ_t = {Y_i ∈ R^{p²×n3}, i = 1, 2, ..., m} (t = 1, 2, ..., T), where T is the number of groups. The last step of Method 2 is to stack the group Ψ_t into a third-order tensor X_t ∈ R^{m×p²×n3}. It is worth noting that the tensors X_t constructed by both methods are strongly low-rank because the similar patches in each group have strong correlations.
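The grouping stage can be sketched as follows for Method 2; the stride, the exhaustive nearest-neighbour search, and all names are simplifications of ours rather than the paper's implementation.

```python
import numpy as np

def group_patches(X, p=10, m=100, stride=5):
    """Extract overlapping p x p x n3 patches, reshape them to p^2 x n3
    (Method 2), and group each patch with its m-1 nearest neighbours."""
    n1, n2, n3 = X.shape
    patches, coords = [], []
    for r in range(0, n1 - p + 1, stride):
        for c in range(0, n2 - p + 1, stride):
            patches.append(X[r:r+p, c:c+p, :].reshape(p * p, n3))
            coords.append((r, c))
    P = np.stack(patches)                     # (N, p^2, n3)
    flat = P.reshape(len(P), -1)
    groups = []
    for t in range(len(P)):
        d = np.sum((flat - flat[t]) ** 2, axis=1)   # Euclidean distances
        idx = np.argsort(d)[:m]               # reference patch + neighbours
        groups.append((idx, P[idx]))          # group tensor: (m, p^2, n3)
    return groups, coords
```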
Tensor patch low-rank recovery by N-TRPCA: After nonlocal similar patches are grouped into X_t (t = 1, 2, ..., T), the low-rank tensor L_t is estimated from X_t by our N-TRPCA model, which amounts to solving the following optimization problem:

    min_{L_t, E_t} ‖L_t‖_TALN + λ‖E_t‖_1,  s.t.  X_t = L_t + E_t,  (t = 1, 2, ..., T).   (33)

Tensor patch aggregation: Finally, we return each L_t (t = 1, 2, ..., T) to its original position in the tensor X to obtain the final recovered tensor L̂. Note that a pixel in overlapping regions of patches has multiple estimated values; we average them to obtain the final estimate.
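The aggregation stage then writes each recovered patch back and averages the overlaps, e.g., with a counting buffer as in this sketch (which assumes the groups/coords layout of the grouping sketch above):

```python
import numpy as np

def aggregate(recovered_groups, coords, shape, p=10):
    """Average the multiple estimates of every pixel in overlapping regions."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for idx, G in recovered_groups:           # G: recovered (m, p^2, n3) tensor
        for j, pid in enumerate(idx):
            r, c = coords[pid]
            acc[r:r+p, c:c+p, :] += G[j].reshape(p, p, shape[2])
            cnt[r:r+p, c:c+p, :] += 1.0
    return acc / np.maximum(cnt, 1.0)         # avoid division by zero
```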

Here, we discuss Method 1 and Method 2 of the first stage. Intuitively, if NN-TRPCA is applied to visual data restoration, adopting Method 1 to construct the group tensors should achieve better restoration performance than Method 2, because the unfolding operators in Method 2 destroy the structural information of tensor patches. To test this intuition, we conduct color image restoration experiments on 20 images randomly selected from the Berkeley Segmentation Dataset [46]. The average quantitative results in terms of PSNR [14], SSIM [43], FSIM [44], and ERGAS [45] are tabulated in Table 1, where the best results are highlighted in bold. We observe that, whether Method 1 or Method 2 is adopted, NN-TRPCA outperforms N-TRPCA with respect to all evaluation indices, which shows that the introduction of nonlocal self-similarity is effective for visual data restoration. However, Method 2 obtains better restoration results than Method 1, contrary to our intuition. One possible reason is that the tensor patches are small, so the loss of structural information caused by the unfolding operators is negligible. More importantly, the two methods stack similar patches along different dimensions: Method 1 relies on the fast Fourier transform (FFT) to capture the similarity between patches, while Method 2 relies on the singular value decomposition (SVD). Thus, another possible reason is that the data compression capability of SVD is better than that of the FFT. In this work, to achieve better restoration performance, we choose Method 2 to form the group tensors.
Algorithm 2 describes the overall procedure of the NN-TRPCA model.

Algorithm 2. NN-TRPCA
Input: corrupted tensor X.
Output: recovered tensor L̂.
1. {Ψ_t}_{t=1}^{T} ← Divide the nonlocal similar patches of tensor X into T groups;
2. X_t ← Stack the similar patches in group Ψ_t into a tensor;
3. for t = 1 to T do
4.   L_t ← Solve Eq. (33) on X_t via Algorithm 1;
5. end for
6. L̂ ← Obtain the recovered tensor by aggregating L_t (t = 1, 2, ..., T) to the original positions in X.

Experimental Results
In this section, we evaluate the performance of the proposed NN-TRPCA method in color image and gray video restoration tasks. The color images and gray videos can be considered as third-order tensors, and the restoration task is to estimate the clean visual tensors from their corrupted versions.

Experimental Setup
For color image restoration, we randomly select 100 color images of size 321 × 481 from the popular Berkeley Segmentation Dataset [46] as test images. These images include different natural scenes and objects, e.g., animals, plants, people, scenery, and buildings. For each color image, we vary the noise rate from 10 to 30 percent, i.e., the corresponding percentage of pixels is corrupted by random noise.

We compare the proposed NN-TRPCA with three state-of-the-art TRPCA methods: SNN-TRPCA [13], KBR-TRPCA [39], and TNN-TRPCA [14]. These methods adopt different tensor ranks as low-rank constraints: SNN-TRPCA is based on the Tucker rank [10]; KBR-TRPCA is a Kronecker-basis-representation based method that combines the Tucker rank and the CP rank [10]; and TNN-TRPCA is based on the tensor average rank [14]. Meanwhile, to demonstrate the effectiveness of each part of the proposed method, our N-TRPCA model is also included for comparison. All experiments are performed on a PC with an Intel Core i5-8500 3.00 GHz CPU and 16 GB RAM.
The performance of the different methods is evaluated by four quantitative picture quality indices (PQIs): peak signal-to-noise ratio (PSNR) [14], structural similarity (SSIM) [43], feature similarity (FSIM) [44], and erreur relative globale adimensionnelle de synthèse (ERGAS) [45]. PSNR and SSIM are two commonly used PQIs in image restoration that measure the similarity between the restored image and the reference one based on the MSE and structural consistency, respectively. Unlike SSIM, FSIM is more consistent with human visual perception by utilizing both phase congruency and image gradient magnitude. ERGAS is a measure of spectral fidelity based on the weighted sum of the MSE in each band. Good restoration results correspond to larger values of PSNR, SSIM, and FSIM, and smaller values of ERGAS.
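For reference, two of the four indices can be computed as below; these follow the standard definitions of PSNR and band-wise ERGAS (with a unit resolution ratio, our assumption), not code from the cited papers.

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB; larger is better."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ergas(ref, rec, ratio=1.0):
    """Band-wise relative RMSE aggregated over channels; smaller is better."""
    ref = ref.astype(float); rec = rec.astype(float)
    terms = []
    for b in range(ref.shape[2]):             # per band / frontal slice
        rmse = np.sqrt(np.mean((ref[:, :, b] - rec[:, :, b]) ** 2))
        terms.append((rmse / ref[:, :, b].mean()) ** 2)
    return 100.0 * ratio * np.sqrt(np.mean(terms))
```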
The parameters of all experiments are set as follows. [λ_1, λ_2, λ_3] in SNN-TRPCA is empirically set to [15, 15, 1.5] for color image restoration, which makes SNN-TRPCA perform well in most image cases. For video restoration, [λ_1, λ_2, λ_3] is set differently, since the three videos have different correlations along each mode: [12, 12, 17], [13, 13, 20], and [12, 12, 16] for 'Hall & Monitor', 'Candela m1.10', and 'CAVIAR1', respectively. For KBR-TRPCA and TNN-TRPCA, we follow the default parameter settings suggested by their authors. For our N-TRPCA model, we empirically set µ_0 = 1e3, µ_max = 1e10, ρ = 1.1, ε = 1e−5, and θ = 2. As in [14], the parameter λ is set to 1/√(max(n1, n2) n3). For our NN-TRPCA model, the patch size p and the number of patches m are empirically set to 10 and 100, respectively.

Results on Color Image Restoration
Table 2 displays the average quantitative results of the competing methods on the test images with different noise rates (10%, 20%, 30%). It can be observed that KBR-TRPCA evidently outperforms SNN-TRPCA; the main reason is that, unlike SNN-TRPCA, KBR-TRPCA combines the advantages of the Tucker rank and the CP rank. In addition, TNN-TRPCA is competitive with KBR-TRPCA and better than SNN-TRPCA, which is reasonable because TNN-TRPCA successfully captures the multidimensional structural information in tensors by means of the tensor average rank. More importantly, our N-TRPCA and NN-TRPCA are superior to the other competing methods with respect to all evaluation indices. This can be attributed to two factors: 1) N-TRPCA well preserves the significant information in tensors by shrinking the tensor singular values differently; 2) NN-TRPCA further makes full use of the structural redundancy in color images by introducing nonlocal self-similarity.

For a more intuitive comparison, the restoration results of different methods on eight example images are shown in Fig. 4. One can observe that SNN-TRPCA produces serious block artifacts in all scenarios, because directly unfolding tensors along each mode loses structural information in tensor data. The images restored by N-TRPCA also contain some artifacts, similar to those of TNN-TRPCA. Fortunately, by introducing nonlocal self-similarity, NN-TRPCA removes the artifacts and yields the best visual effect. It is worth noting that our NN-TRPCA method can well restore the detail information of complex images, e.g., the texture of the starfish, the water drops on the flowers, the patterns on the girl's clothes, and the edges of the pyramid. Furthermore, the quantitative results on these example images are recorded in Table 3, from which we make the following observations. Although N-TRPCA produces artifacts like TNN-TRPCA, it is far superior to TNN-TRPCA in terms of all evaluation indices, which confirms the effectiveness of the proposed tensor adjustable logarithmic norm in retaining the important information of color images. Besides, our NN-TRPCA obtains the best evaluation indices among all competing methods by utilizing the nonlocal redundancy of natural color images. Especially for images with complex structures, such as 'Starfish', 'Flower', and 'Girl', NN-TRPCA still shows a significant improvement over N-TRPCA, owing to the abundant nonlocal self-similarity in these complex images: by grouping nonlocal similar patches, each group tensor is strongly low-rank, which contributes to better restoration. In a word, the proposed NN-TRPCA method delivers the best recovery performance in terms of both visual quality and evaluation indices.

Results on Gray Video Restoration
Fig. 5 shows several frames of the restoration results of the different algorithms on the test videos with noise rate 30%. These videos capture a walking human in three different scenes. From the recovery results, we observe that our N-TRPCA restores the important structure information of the videos more clearly than the other competing methods, because the tensor adjustable logarithmic norm used in our model preserves the important information by shrinking the large singular values less and the small singular values more. It can further be found that, although both N-TRPCA and NN-TRPCA recover the main structure of the walking humans, NN-TRPCA restores the contour of the walking man more accurately, which indicates that the introduction of nonlocal self-similarity enables NN-TRPCA to recover more detail information in videos.

Meanwhile, as shown in Table 4, the proposed N-TRPCA and NN-TRPCA yield very competitive scores on the evaluation indices. More specifically, our N-TRPCA significantly outperforms the other competing methods in terms of all evaluation indices; e.g., averaged over the three videos, N-TRPCA achieves a 3.98 dB gain in PSNR and an 18.91 improvement in ERGAS over TNN-TRPCA. This is due to the fact that N-TRPCA treats the tensor singular values differently. Furthermore, by integrating nonlocal self-similarity, the performance of NN-TRPCA is improved even further.

Impacts of Parameters
In NN-TRPCA, there are three algorithmic parameters: the adjustable parameter θ, the patch size p, and the number of patches in each group tensor m. The parameter θ controls the shrinkage level of the tensor singular values, with larger θ indicating more shrinkage. To analyze the impact of the nonconvex parameter θ, we fix the other parameters and run N-TRPCA on the test images with θ = 1, 1.5, 2, ..., 5. The average curves of the four PQIs with varying θ are plotted in Fig. 6, from which we observe that the best restoration results are obtained at θ = 2. Thus, the parameter θ is empirically set to 2.
The patch size p and the number of patches in each group tensor m are two important parameters for capturing the nonlocal redundancy in images. When p is too small, it is hard to preserve the local structure of patches; when p is too large, each patch contains too many details, which reduces the similarity of the patches within a group. Besides, a too small m may divide closely similar patches into different groups, while a too large m may classify dissimilar patches into the same group. Therefore, either extreme affects the accuracy of patch grouping, leading to degraded restoration performance. To study the influence of the parameters p and m, we run our NN-TRPCA algorithm on the test image 'Elephant' with different values of p and m.

Computational Cost
We next compare the computational efficiency of the proposed algorithms with that of the three state-of-the-art TRPCA algorithms. Table 5 summarizes the average running time of the different algorithms on the aforementioned tasks. One can observe that the running time of our N-TRPCA is slightly higher than that of TNN-TRPCA, because TNN-TRPCA simply applies a fixed threshold to all singular values, while our N-TRPCA adaptively calculates a threshold for each singular value to shrink the small singular values more and the large ones less. Note that the computational cost of our NN-TRPCA is considerably higher. The main computational cost of NN-TRPCA lies in the calculation of the t-SVD for each nonlocal group tensor, while the running time of the patch grouping and aggregation stages is negligible. In fact, since the t-SVD of each group tensor is computed independently, a direct way to accelerate the method is a parallel GPU implementation of the t-SVDs. Besides, as used in [48,49], the Lanczos algorithm can be employed to accelerate the singular value decompositions involved.

Conclusion and Future Work
TRPCA, which aims to recover a low-rank tensor from its corrupted version, has attracted considerable interest in the fields of image and video processing. In this work, we first build a nonconvex TRPCA (N-TRPCA) model based on the tensor adjustable logarithmic norm, which can better retain the significant information of visual data. Furthermore, to fully utilize the structural redundancy in visual data, we propose a new TRPCA model with nonconvex and nonlocal regularization (NN-TRPCA) by introducing nonlocal self-similarity. This model can well recover the edges and textures in images and videos. Meanwhile, an efficient algorithm based on the alternating direction method of multipliers is designed to solve the proposed model. Extensive experimental results on widely used datasets confirm the effectiveness of the proposed model in comparison with several state-of-the-art TRPCA models.
For future work, there are two possible directions of extension. First, this paper focuses on three-dimensional visual data because the t-SVD and the tensor average rank are defined on three-way tensors. One can generalize the t-SVD and the tensor average rank to higher-dimensional tensors and further develop a higher-order version of our NN-TRPCA. Second, in order to obtain lower-rank tensors, some studies have attempted to replace the discrete Fourier transform used in the t-SVD with other invertible linear transforms, such as the framelet transform [50] and the discrete cosine transform [51]. Inspired by this, we will attempt to find a more suitable linear transform to capture the low-rankness of visual data, so as to achieve better performance in visual data restoration.