Hyperspectral Image Denoising via Nonlocal Spectral Sparse Subspace Representation

Hyperspectral image (HSI) denoising based on nonlocal subspace representation has attracted a lot of attention recently. However, most of the existing works mainly focus on refining the representation coefficient images (RCIs) using certain nonlocal denoiser but ignore the understanding why these pseudoimages have a similar spatial structure as the original HSI. In this work, we revisit such vein from the respective of principal component analysis (PCA). Inspired by an alternative sparse PCA, we propose a spectral sparse subspace representation strategy to simultaneously learn low-dimensional spectral subspace and novel RCIs with sparse loadings. It turns out that the resulting RCIs possess a more significant spatial structure due to the adaptive sparse combination of spectral bands. A simple nonlocal low-rank approximation is then employed to further remove the residual noise of the RCIs. Finally, the entire denoised HSI is obtained by inverse spectral sparse PCA. Extensive experiments on the simulated and real HSI datasets show that the proposed nonlocal spectral sparse subspace representation method, dubbed as NS3R, has excellent performance both in denoising effect and running time compared with many other state-of-the-art methods.


I. INTRODUCTION
H YPERSPECTRAL images (HSIs) are acquired through an imaging spectrometer with high resolution in the spectrum that can cover the region from ultraviolet to infrared wavelengths. Owning to their rich spectral information, HSIs have been widely applied to countless fields over recent decades, such as environment monitoring, mineral exploration, food safety detection, and military security, to name a few [1], [2], [3], [4]. However, in the presence of instrumental errors during imaging, HSIs are inevitably corrupted by a certain amount of noise, which significantly hinders the subsequence task. Therefore, HSI denoising is an indispensable preprocessing step for subsequent downstream applications.
From the very beginning, it is natural to use classic singleimage denoising methods to denoise HSI by treating each band as a grayscale image [5], [6], [7], [8], [9]. Such heuristics tend to be spatial structure-based approaches, which ignore the very significant global correlations along the spectrum in an HSI. This has led many HSI denoising methods to further take spectral information into account. A typical strategy is to model the clean HSI as a low-rank matrix/tensor via various low-rank factorizations or rank-related regularizations. Such a low-rank modeling manner has been verified to help enhance the stability of the model solution [10], [11], [12], [13], [14], [15], [16].
To further improve the denoising performance, many researchers incorporate other spatial priors into the HSI low-rank models [17], [18], [19], [20], [21], [22], [23]. Among them, the nonlocal self-similarity property has been heavily adopted, i.e., there exist many groups of nonlocal full band patches [(FBPs), stacked by patches at the same spatial location over all bands] that are very similar to each other. This well-known prior inspires a series of advanced methods for Gaussian noise removal [17], [20], [21], [22], [23]. While these methods achieve significant denoising performance, one drawback is that they are in general quite time-consuming due to the large spectral number of HSI data. For example, many classical methods, e.g., LLRT [20] and KBR [21], usually take hundreds or even thousands of seconds on a common HSI data, and thus, they cannot meet the fast processing requirements in practice.
In recent years, the line of HSI denoising based on the nonlocal subspace representation (NSR) has received a lot of attention since it subtly embeds the powerful spatial self-similarity into the basic spectral low-rank modeling within a smaller time complexity [24], [25], [26], [27], [28], [29]. For an HSI X ∈ This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Fig. 1. Comparison of the RCIs in HSI spectral subspace representation based on the existing PCA and the proposed sparse PCA. The used dataset is the simulated DC Mall with Gaussian noise variance 0.01 and the number of subspace is set as 6. One can find that the sparse PCA-based RCIs have clearer structure with much bigger signal-to-noise ratio.
R H×W ×B that lies in a low-dimensional spectral subspace, it can be factorized as where × 3 denotes mode-3 tensor product, V ∈ R B×R (R B) satisfying V T V = I is called the spectral subspace, and U ∈ R H×W ×R is the corresponding coefficient. The R slices of U are called representation coefficient images (RCIs 1 ) due to the observation that these slices look like images with a similar spatial structure as the true HSI, see Fig. 1 for example in noisy case. Generally, two core procedures exist in these NSR-based methods for HSI denoising. One is the subspace estimation using spectral low-rank decomposition, and the other is the denoising of RCIs via some certain nonlocal self-similarity-based denoiser. Note that the number of the RCIs is much smaller than the spectral number of the original HSI, implying that the size of the FBPs is largely reduced. As a result, the NSR-based HSI denoising methods often achieve good denoising performance while drastically reducing the running time.
However, most existing works mainly attempt to remove the noise in the RCIs using certain nonlocal-based denoisers, neglecting to analyze whether the resulting RCIs inherit the strong spatial nonlocal self-similarity properties of the original HSI data in noisy environments. With this in mind, we first rethink the HSI subspace representation. The principle behind that can be viewed as a principal component analysis (PCA) dimension reduction process. Viewing each band of HSI as a variable, the estimated spectral subspace corresponds to the first R feature vectors of the HSIs unfolding matrix along the spectrum, and the obtained RCIs U = X × 3 V T correspond to the principal components (PCs) after dimensionality reduction of feature projection. From the PCA perspective, each PC is a linear combination of all variables, which means that each RCI 1 also called eigenimages in [25] or reduced images in [28].
is a combination of all bands of the HSI, i.e., U (:, :, r) = B i=1 V(i, r) · X (:, :, i)∀r = 1, . . . , R. (2) It should be noted that, for the HSI data, the ground object structure is approximately the same in each band, implying that the image structure is roughly consistent across the bands. This allows each RCI to be viewed as a pseudoimage that looks very similar to the original HSI in terms of spatial structure. Thus, one can expect that there exist similar enough FBPs in U since the corresponding nonlocal patches are also similar in every band.
Note that PCA has one limitation in high-dimensional case considering that the orthogonal matrix V is dense, i.e., the projected PCs lack certain interpretability since every PC is explained by all the variables [30]. Back to the HSI subspace representation, this corresponds that every RCI uses all the spectral information of an HSI, leading to its spatial structures that are not significant enough due to the superposition of every band. Besides, when noise exists, using all spectral bands will also cause the noise in the RCIs to be accumulated, thereby destroying the spatial structure of RCIs. The above phenomenon can be seen in Fig. 1(a) from which one can clearly see that the spatial structures of the third to sixth RCI acquired by PCA have been severely damaged by noise. Such a phenomenon would also bring the bias problem during denoising the RCIs via certain nonlocal denoiser. Although there suggests refinement strategy in recent works [28], [29], where one can estimate the subspace and denoise the RCIs iteratively to enhance denoising performance, the iteration demand inevitably increases the running time. Especially, such refinement trick cannot always bring performance lifting for all nonlocal-based methods.
To alleviate the above problem, we refocus our sights on sparse PCA, a classical alternative of PCA, which is more robust to noise [30], [31]. It attempts to make the PCs more understandable by zeroing out unnecessary or nonsignificant variables in the linear combination of the projected PCs by using a certain sparsity-promoting penalty on the loadings. In particular, when many variables contain only noise, sparse PCA can automatically zero out extremely noisy variables before PCA, leading to more robust-projected PCs. For comparison, we also present the RCIs obtained by sparse PCA in Fig. 1(b). One can see that the spatial structures of the third to sixth RCI obtained by sparse PCA are significantly better than the corresponding RCIs obtained by PCA. It should be additionally pointed out that the variable selection ability of sparse PCA is very important for real HSI denoising because the actual HSI data have very bad pollution in some spectral bands, such as air and water pollution. If the spectral bands with severe noise also participate in the construction of the RCIs, the performance of the denoising methods will inevitably be reduced. To better illustrate this, we take real Indian data as an example to show the variable selection ability of sparse PCA in Fig. 2. From the figures in the third row, it can be seen that the RCIs via sparse PCA have sparse combination with many exact zero coefficients compared with PCA, which means that sparse PCA can identify  [150,163], and 220 covering the region of water absorption fully. One can find that the sparse PCA-induced RCIs loadings distinguish these corrupted bands, and the resulting RCIs contain more significant spatial structure. these contaminated spectral segments so that the acquired RCIs have more significant spatial structure.
Inspired by the above, this work proposes a novel HSI denoising method based on NSR under a new spectral sparse PCA. Concretely, we employ an elastic net sparsity-promoting spectral sparse subspace representation for HSI data to estimate spectral subspaces and corresponding RCIs. Such a strategy enables more flexibility in the subspace representation of HSI data with tunable parameters. Especially, it degrades into the classical subspace representation when dropping the sparse regularization. Through further denoising these enhanced RCIs via a simple nonlocal low-rank matrix approximation on their grouped FBPs with adaptive noise variance estimation, and then inversely subspace projecting to get the ultimate denoised HSI, a novel HSI denoising method that we named as Nonlocal Spectral Sparse Subspace Representation (NS3R) is proposed. The proposed method achieves a comparable or even better denoising performance compared with other state-of-the-art methods while taking much less time.
In summary, the contribution of this work is threefold. 1) This work rethinks the series of NSR-based HSI denoising studies from the PCA perspective. Inspired by the variable selection ability of sparse PCA, we propose a novel spectral sparse subspace representation for HSI, which makes the resulting RCIs more significant in terms of spatial structure and more adaptive to extreme noise corruption in real HSI datasets. 2) Based on the spectral sparse subspace representation, we propose the NS3R algorithm for HSI denoising by further employing a simple nonlocal low-rank approximation on the obtained RCIs and an inverse subspace projection. Especially, the proposed algorithm do not rely on the iterative refinement trick hardly as the related peers. 3) Comprehensive experimental results demonstrate that the proposed NS3R method obtains excellent performance in terms of both denoising effect and running time compared with many other state-of-the-art methods. Especially, it achieves the best level of visual effect while it takes the second shortest running time in real HSI denoising. Notations: Throughout this article, we denote scalars, vectors, matrices, and tensors in light, bold lower case, bold upper case letters, and upper cursive, respectively. For an Norder tensor X ∈ R I 1 ×I 2 ×···×I N , its mode-n unfolding matrix is denoted as X (n) := unfold n (X ) ∈ R I n ×(I 1 ...I n−1 I n+1 ...I N ) , and fold n (X (n) ) = X , where fold n is the inverse of unfolding operator. The mode-n product of a tensor X ∈ R I 1 ×I 2 ×···×I N and a matrix A ∈ R J n ×I n is denoted as Y := X × n A, where Y ∈ R I 1 ×···×I n−1 ×J n ×I n+1 ×···×I N , meaning that Y (n) = AX (n) . The inner product of two matrices XandY of the same size is defined as X, Y : Then, the corresponding Frobenius norm (also called as 2 norm) and 1 norm are defined as X F := X, X and The rest of this article is organized as follows. Section II introduces some related works in detail. In Section III, we present the main proposed method. Experiments are shown in Section IV. Finally, Section V concludes this article.

II. RELATED WORKS
In this section, we detail some related works. For simplicity, we refer to the recent series of NSR-based works as NSR-related works and other classical works as NSR-unrelated works.

A. NSR-Unrelated Works
There are numerous works on HSI denoising. First, classic single-image denoising algorithms, including transform based (such as wavelet [5]), sparse representation based (such as KSVD [7]), and nonlocal based (such as BM3D [8] and WNNM [9]), can be extended to denoise HSI with multiple bands. For example, Rasti et al. [32] used three-dimensional (3-D) wavelet filtering for HSI denoising. In [33], sparse coding was exploited to model the redundancy information in HSI and then noise can be removed by sparse approximated data with a learned dictionary. These methods mainly use spatial information and neglect significant spectral correlations in the HSI.
Considering spectral low rankness in HSI, a series of low-rank modeling-based HSI denoising methods have been developed. Zhang et al. [10] unfolded the HSI along the spectral dimension and adopted low-rank matrix recovery to remove the noise. Later, Chen et al. [12] used a nonconvex low-rank matrix approximation for better rank relaxation. Instead of focusing on the low-rank part, Cao et al. [11] proposed a robust low-rank matrix factorization (LRMF) with mixed exponential power-based noise modeling, which significantly improved the applicability for complex noises in real HSI. Besides, there also have some low-rank tensor-based models viewing the HSI as a three-order tensor, see the articles presented in [14], [15], and [16].
To further enhance the denoising performance, many researchers have introduced spatial priors, such as local smoothness and nonlocal self-similarity into certain low-rank models. The local smoothness is often encoded by total variation (TV) norm that is widely used in HSI mixed noise removal. Typically, He et al. [18] proposed a TV regularized LRMF model. Wang et al. [19] integrated the spatial-spectral joint TV into low-Tucker-rank tensor decomposition. Besides, there has also emerged several transform-based TV variants, such as group TV [34], E3DTV [35], CTV [36], and t-CTV [37], gaining improved denoising performance. Except for the local prior, the nonlocal self-similarity is more widely used in many works for normal Gaussian denoising tasks [17], [20], [21], [22], [23]. For example, Chang et al. [20] used a hyper-Laplacian regularization on the nonlocal low-rank approximation model. Xue et al. [22] extended the nonlocal low-rank model from matrix to tensor. Xie et al. [21] proposed a Kronecker-basis-representation-based tensor sparsity for the nonlocal FBPs constructed low-rank tensor, achieving significant performance enhancement. While these nonlocal-based methods achieve promising performance for HSI denoising, they often face critical limitations in terms of computational complexity due to the large nonlocal FBP of the entire HSI.

B. NSR-Related Works
Very recently, a series of novel nonlocal HSI denoising works that couple well with subspace representations are gaining traction. It was found that the noise is mainly localized in these representation coefficients under spectral low-rank factorization, which has the same property of spatial nonlocal self-similarity as the original HSI. Thus, one can use certain nonlocal denoisers on these small-sized RCIs rather than on the original HSI. Early works include FastHyDe [25], where the authors used famous BM3D to denoise these RCIs separately. Although it runs very fast, the performance needs further improvement since the nonlocal denoising part is very heuristic. They then proposed a multidimensional low-rank approximation on these RCIs' FBP-constructed 3-D tensor for better performance [26]. As another representative method, NGmeet [28] integrated such a global and nonlocal combined model into an iterative framework using a refinement strategy, achieving very advanced effects. Compared with NGmeet, TenSRDe [29] can be viewed as a tensor version of such a NSR-based HSI denoising method. More related works refer to the articles presented in [24], [27], [38], and [39]. To the best of our knowledge, these NSR-related methods are nearly state-of-the-art and also very fast compared with the previous SR-unrelated methods.
All these methods have two key steps in common, namely, subspace estimation and RCI denoising. They usually use truncated singular value decomposition (or PCA) and related variants, such as HySime [40] in the first step, and nonlocal denoiser in the second step is the main point in these works. For example, Zhuang and Bioucas-Dias [25] use the BM3D, the authors in [28] and [29] use the WNNM, and the authors in [26] and [27] use the successive SVD [41]. But they all ignore the analysis of these RCIs possessing well spatial nonlocal self-similarity and whether the resulting RCIs can inherit significant spatial structures of the original HSI data in noisy environments. Although the iterative refinement strategy is used in [28] and [29], where one can estimate the subspace and denoise RCIs iteratively to enhance denoising performance, the iteration demand largely increases the running time. Therefore, there is still room for further enhancement of denoising performance and runtime along this research vein. The recent work [42] has made some analysis that the RCIs inherit the spatial local smoothness from the original HSI. Differently, in this work, we focus on spatial nonlocal self-similarity-based HSI denoising.
It should be noted that all the works introduced above are model driven. In these years, the data-driven approaches represented by deep learning are also very promising for HSI denoising, see the articles presented in [43], [44], [45], [46], [47], [48], [49], [50], and [51]. We omit a detailed introduction to them, considering that our work follows the line of research on model-driven HSI denoising.

A. Problem Formulation
Let X ∈ R H×W ×B be an HSI data in a 3-D cube format, where H and W denote the height and width in spatial, respectively, and B is the number of the band in spectral. Assuming that the HSI is corrupted by the additive Gaussian noise N , then the noisy HSI Y is generated by Without a specific statement, the Gaussian noise N is assumed independent identically distributed with mean zero and variance σ 2 . Under the significant spectral subspace representation of the underlying HSI data X , modeled as (1), the observation model is the orthogonal spectral subspace capturing the common subspace along the spectrum, and U ∈ R H×W ×R is the corresponding coefficients containing R slices of RCI sized H × W . Except for the spectral low rankness, the spatial nonlocal self-similarity can be subtly incorporated into the model (4) via using certain nonlocal regularizer Φ(U ) on the RCIs, thus leading to the following model for HSI denoising: where the first term is the fidelity term corresponding to the Gaussian noise from a Bayesian perspective, and the second one is the regularization term with tradeoff parameter λ.
1) Subspace estimating: 2) RCIs denoising: As previously introduced, the existing methods implement the first step as a PCA process along the spectral direction and mainly focus on the latter RCI denoising step, which brings limitations in terms of significance and robustness to the spatial structure of associated RCIs.

B. Spectral Sparse PCA
To facilitate the above issues, we propose a spectral sparse PCA strategy to estimate spectral subspace with orthogonality and corresponding RCIs with sparse loadings simultaneously from a noisy HSI.
For a given noisy HSI data Y, as analyzed before, the classical spectral subspace estimation is actually a PCA process resulting the PC Y × 3 V T with the orthogonal bases V. This can be reformulated as a least-squares problem minimizing the sum of squared residual errors between the input and the projected data, i.e., The columns of V are also known as principal directions or loadings. The PCA imposes orthogonality constraints on the weight loading matrix V. Such mechanics makes the resulting noisy PCs, Y × 3 V T , lack significance and robustness, namely, each noisy PC (or noisy RCI), Alternatively, sparse PCA aims to find a set of sparse weight loadings, i.e., the constructed PCs with only a few "active" (nonzero) combination coefficients. Here, we adopt the seminal work by Zou et al. [30] that incorporates the sparsityinducing regularizer into the regression formulated PCA. More concretely, we simultaneously estimate the orthogonal spectral subspace V and PCs (or RCIs), Y × 3 S T , with a new, elastic net (a combination of 1 and squared 2 norm) regularized sparse loadings S ∈ R B×R , i.e., convert the model (8) as follows: where α, β > 0 are the regularization parameters. A larger value of α tends to a sparser result of S, and the β-related squared 2 term is used to control its robustness. We adopt the variable projection-based optimization framework [52] to solve the spectral sparse PCA model (9). Given the variable S, the optimization for V reduces to an orthogonal Procrustes problem [53], i.e., Because of the orthogonality of V, it leads to which has a closed-form solution where svd(·) denotes the singular value decomposition. Next, the proximal gradient method [54] is used to minimize the variable S by viewing the elastic net involved parts α S 1 + β 2 S 2 F as regularization term and the rest in as the proximal term. Then, S is updated by where is the proximal gradient of S, γ > 0 is the step size, and Scaled-ST(·) denotes the scaled soft thresholding operator given by

Scaled-ST(Z) = arg min
The overall solver for spectral sparse PCA is summarized in Algorithm 1. Note that such a variable-projected-based algorithm is proved to be sublinear convergent in [52] with recommended step size γ = 1/ Y (3) 2 F . Especially, compared with the previous adopted PCA and the widely used HySime [40], the alternative spectral sparse PCA using Algorithm 1 obtains more significant spectral subspace representation but adds almost no extra time cost, see Table I for example. Visual results can be found in Figs. 1 and 2 mentioned before.

C. Proposed NS3R Method
We then propose the main NS3R method by embedding the above spectral sparse PCA into the two main steps of the previous NSR framework.
For the first step, instead of (6), we use spectral sparse PCA to simultaneously estimate a spectral subspaceV and a sparse weight loading matrixŜ from the noisy HSI Y, i.e.,

{Ŝ,V} = arg min
It, thus, obtains the more significant and robust RCIs Y × 3Ŝ T benefit by the sparsity ofŜ.
1) Block Matching: For 3-D patch P 1 j ∈ R n×n×R inŪ, performing the block matching [8] to search its nonlocal similar patches P 2 j , P 3 j , . . . , P K j , where j is the patch index and K is the searching number of similar patches. 2) Stacking: Stacking these nonlocal similar patches as a patch matrix M j sized n 2 R × K, whose kth column corresponds to the vectorization of P k j . 3) Low-Rank Approximation: Performing the WNNM [9] based low-rank approximation on M j , i.e, where M w, * denotes the weighted nuclear norm from the article presented in [9] and λ is set as σ 2 n /2, where σ 2 n is the noise variance in M j that can be estimated (see Remark 1). 4) Inverse Stacking: Getting back to the denoised patcheŝ P 1 j ,P 2 j , . . . ,P K j from the denoised patch matrixM j . 5) Aggregating: Aggregating all denoised patches back toŪ and finally obtaining the denoised RCIsÛ . Remark 1 (Noise estimation): Suppose the noise variance in Y is σ 2 , then the noise variance σ 2 r of the rth RCI inŪ equalsŜ(:, r) TŜ (:, r) · σ 2 and we estimate the noise variance σ 2 n in the nonlocal patch matrix M j by the mean of all σ 2 r . For real datasets, one can estimate σ 2 by methods, such as multiple regression theory-based approach [40].
Finally, based on the spectral subspace representation with the denoised RCIsÛ via (15) and estimated spectral subspacê V via (14), the entire denoised HSI is obtained bŷ and run another round of the procedures in steps 1 to 3; Output: denoised imageX which we called as the inverse spectral sparse PCA since it is back to the original high-dimensional HSIX from the spectral subspaceV estimated by the spectral sparse PCA. The whole NS3R procedure is summarized in Algorithm 2 and Fig. 3 gives its corresponding flowchart.
Remark 2 (Refinement): In practice, one can iteratively update the input noisy image Y by μX + (1 − μ)Y with control parameter 0 < μ < 1 and then run several more rounds of the procedures in (14)- (16) to enhance the denoising outputs. Such a strategy is called refinement in [28] and [29], which is also widely adopted in many early works, such as the articles presented in [9], [20], and [21]. Different from the existing works [28], [29], the proposed NS3R can achieve good enough performance only in one round and usually only need another round of refinement to reach the best outputs. This makes our NS3R method much faster than the peers [28], [29] as also verified in subsequent experimental results. Especially, if we neglect such a refinement strategy in the related HSI denoising methods, our proposed method achieves nearly the best tradeoff between the denoising effect and running time. One can realize this in the discussion part of the subsequent experiments.
As a supplement, we make some discussion on the computational complexity of the proposed NS3R method, i.e., Algorithm 2. Its main computational cost comes from the first two steps, spectral sparse PCA and nonlocal RCIs denoising. For the former, it is an iterative algorithm whose computational complexity in each iteration is O(HW B 2 + B 2 R + BR 2 ), which is nearly a matrix multiplication time since R B. For the latter, its computational complexity is O(T n 2 RK 2 ), where T is the number of groups of similar patches determined by the patch size and the image itself. In general, the second step costs much more time and, thus, more rounds of refinement will clearly increase the running time.

IV. EXPERIMENTS
In this section, we perform a series of experiments to demonstrate the effectiveness of our proposed NS3R method for HSI denoising. All experiments are implemented on the platform MATLAB (2021a) with Intel(R) Core(TM) i5-10400F 2.90-GHz CPU and 16 GB memory.

A. Simulated Experiments
We first conduct denoising experiments on simulated datasets with ground truth to quantitatively evaluate the performance of the proposed NS3R method.
1) Experimental Setup: Two public benchmarks' HSI datasets were used. The first one, Pavia Centre (PaC 2 ), contains 1400 × 512 pixels and 102 spectral bands. We select its subimage sized 200 × 200 × 80 with noise-free bands in our experiments. The second one, Washington DC Mall (WDC 3 ), contains 1208 × 307 pixels and 210 bands, including 191 noisefree bands. The subimage sized 256 × 256 × 191 is used in our experiments. For all tested HSIs, the gray values are rescaled to [0,1] via min-max formula band by band. We add the additive Gaussian noise with mean zero and variance σ 2 on the benchmark datasets to obtain the simulated noisy HSIs with σ varies from 0.05, 0.1, and 0.2 to 0.4, and then conduct the denoising experiments under different noise levels.
The following 13 related methods are used for comparison, which can be divided into two categories corresponding to the related works in Section II, i.e., the NSR -unrelated methods, including BM4D [55], LRMR [10], E3DTV [35], CTV [36], TDL [17], LLRT [20], KBR [21], and RCTV [42], and NSR -related methods, including FastHyDe [25], SNLRSF [27], NGmeet [28], GLF [26], and TenSRDe [29]. They are mostly representative of the state-of-the-art for HSI denoising. All the involved parameters in these methods are set by their released codes and fine-tuned to obtain the best results. For the number of spectral subspace, R, which is crucial in most NSR-related methods, can be initially estimated by methods, such as HySime [40], and then fine-tuned. 2  Five quantitative quality indices are employed to evaluate the denoising performance, including the peak signal-to-noise ratio (PSNR), structure similarity (SSIM [56]), feature similarity (FSIM [57]), erreur relative global adimensionnelle de synthèse (ERGAS [58]), and spectral angle mean (SAM [18]). Larger values of PSNR, SSIM, and FSIM imply better recovered images. Lower values indicate better results for the other two metrics, ERGAS and SAM, which are widely used for HSI data. By the way, we also very care about the running time of every considered HSI denoising methods since this is very important in practice.
2) Quantitative Comparison: For each noise case, Table II lists the calculated evaluation indices' values of the denoising results by all the competing methods tested on the two benchmark datasets and also the corresponding running time (in s). Note that the values of PSNR, SSIM, and FSIM are computed from the mean value of that in every band.
From the results, as presented in Table II, it can be obviously observed that our proposed NS3R method achieves the best performances on these five quantitative quality indices in all cases. It performs better than NGmeet, the second best method, while its running time is around the half. Compared with FastHyDe, the fastest and recently very popular HSI denoising method, it achieves a significant improvement in denoising within the second fastest running time of all NSR-related methods. For example, there is around 1.5 dB lifting than that of FastHyDe in PSNR on the PaC dataset under case σ = 0.05. This indicates a good tradeoff between the denoising effect and denoising time for our proposed method, indicating its potential in real-world tasks with fast processing requirements.
There have other findings from Table II. Among all NSR-unrelated methods, we can find that these spatial nonlocal-based ones, including BM4D, LLRT, and KBR, are usually better than others, such as LRMR, which only considers the spectral low rankness, and E3DTV, CTV, and RCTV, which further utilize the spatial local smoothness. This verifies that the nonlocal property is more powerful in Gaussian noise removal. Moreover, it can also be observed that these NSR-related methods outperform these NSR-unrelated methods in most cases. Especially, the running time of most NSR-related methods, such as FastHyDe, NGmeet, and our NS3R (around 4 s, 40 s, and 20 s for the PaC dataset, respectively), is also much less than that of those classic nonlocal methods, including BM4D, LLRT, and KBR (around 80 s, 800 s, and thousands of seconds for PaC dataset, respectively), although they all consider the nonlocal self-similarity. The key difference is that these NSRrelated methods perform nonlocal denoising only on these very few RCIs, while those classical nonlocal methods require  nonlocal-related computations on the entire HSI with many bands. This reveals the superiority of NSR-based modeling manners for HSI data since such paradigms well combine the spatial nonlocal structure and spectral low rankness. Note that the proposed NS3R method shows the best performance in the evaluation indices while taking the second shortest running time compared with all state-of-the-art NSR-related methods, demonstrating the positive effect of the embedded spectral sparse PCA.
3) Results Comparison: Next, we show some specific denoising results. Fig. 4 presents the denoised images of all competing methods on the 50th band of the PaC dataset when noise level σ = 0.1 from which we can easily observe that our proposed NS3R obtains the best result. The enlarged local subfigure of our NS3R is closest to that of the original ground truth observing that the residual map (the difference between the denoised band and the original band in the local area) has the least image information. The corresponding values of PSNR, SSIM, and FSIM listed right above the images also show the superiority of our method.
Besides, we also plot the spectral signature curves on a specific pixel point (60, 60) in Fig. 5. Note that the spectral signature is very important for HSI-related applications, such as recognition and unmixing, as it reflects the physical property of the material. From Fig. 5, we can see that the curve of denoised image obtained by our NS3R is closer to that of the original ground truth. This indicates that the proposed NS3R method not only removes the noise well in each band but also adequately preserves the spectral curve of the material.
For the other WDC dataset used for testing, Figs. 6 and 7, respectively, provide some examples of the spectral band and signature curves of the denoised HSI obtained by all competing methods when noise level σ = 0.4. These results also reveal the state-of-the-art performance achieved by the proposed NS3R method. Similar results can be observed for other cases of noise levels and tested datasets. We omit more duplicated results due to page limits.

B. Real Experiments
Compared with the simulated denoising, real HSI denoising is more important and challenging due to the complex noise in real HSI datasets. We then test the proposed NS3R method for HSI denoising on real datasets, showing its remarkable performance.

1) Experimental Setup:
We select four public real HSI datasets, including the Indian, Urban, Local, and Terrain, which are available in Pursue university MultiSpec site. These real datasets often contain complex noise, especially in some bands that are heavily corrupted by atmospheric and water absorption, for example, the 104-108th, 150-163th, and 220th band of the Indian dataset. For comparison, we also make competition with the aforementioned 13 related methods. During experiments, we adopt the multiple regression theory-based algorithm [40] to estimate the initial noise variance, which is also widely used in these related works.
2) Results Comparison: Fig. 8 shows the denoised images of all competing methods on the 106-111th bands, where the red box in the lower right corner enlarges the green boxed subfigures twice the time for better comparison.
There are some observations from these denoised bands. First, the denoised results obtained by our NS3R method achieve very promising visual performance in all bands. Second, the results obtained by the other two methods, SNLRSF and GLF, are also quite close to our results. Compared with other results, the denoised bands by the two methods and our NS3R contain more image information with more clear texture structure, especially observing the 106th band in Fig. 8. Third, due to the complex noise, some classic nonlocal methods, such as BM4D and KBR, perform poorly and their results still contain noise. In contrast, most of the NSR-related methods are relatively better. Again, this demonstrates the importance of the subspace representation for HSI data. Besides, we find that another state-of-the-art method, NGmeet, does not perform as well in real denoising, unlike in the simulated dataset, where the denoised results are clearly worse than our NS3R results. Note that both NGmeet and our NS3R employ WNNM-based nonlocal RCIs denoising; the main difference being that is we propose spectral sparse PCA to obtain novel RCIs, which partially shows the positive effect of spectral sparse PCA. As pointed out in Section I, the reason behind this could be that the sparse PCA can adaptively distinguish the corrupted bands and then get meaningful RCIs (see Fig. 2). Similar findings as above can be observed from the denoised results on other real datasets used for testing, see Fig. 9 for   In anticipation of visual results, denoising time is also very crucial for real HSI denoising tasks. Therefore, we list the detailed running time of all competing methods on various real data in Table III. We can find that our NS3R is the second fastest among all the competing algorithms, and our running time is very close to that of FastHyDe, the fastest method. Another method that is fast enough is RCTV. But, the denoised results by our NS3R method are clearly better than that by the FastHyDe and RCTV observing in Figs. 8 and 9. This indicates that the proposed NS3R achieves a better tradeoff between denoising performance and running time.
C. Discussion 1) Parameter Analysis: Here, we analyze the involved tuning parameters in the NS3R, including the number of subspace R, regularization parameter α, β in spectral sparse PCA step, the self-similar patches' size n and number K in RCIs nonlocal denoising step, and the parameter μ in the refinement step. We take the simulated PaC dataset for testing. Fig. 10 shows the denoising performance (evaluated by PSNR) versus all these parameters, respectively. Especially, we plot the 2-D surface of α and β when noise level σ = 0.1, such grid searching is beneficial to obtain ideal parameter selection. From these results, we can find that the proposed NS3R is relatively stable for the involved parameters, where the PSNR values versus different parameters are slightly fluctuating and reach a fine level in a certain interval. To obtain good performance, we empirically set n = 4, K = 200, μ = 0.5, andβ = 1e-6 in all experiments, and α = {1e-5,3e-5,5e-5,1e-4} and R = {7, 6, 5, 4} for each noise-level case σ = {0.05, 0.1, 0.2, 0.4} as default settings.
2) Refinement Study: Refinement is a common strategy used in many denoising method that can relatively enhance the denoising effect, especially in the nonlocal-based methods. Note that, in the above adopted nonlocal baselines, the LLRT, KBR, NGmeet, and TenSRDe adopt such iterative refinement trick, while the FastHyDe, SNLRSF, and GLF not. For the sake of fairness, we make further ablation study on the refinement strategy in these methods and also our proposed NS3R. Here, we take the simulated WDC dataset for testing. Table IV(a) and (b), respectively, lists the quantitative results of the above-mentioned four baselines (LLRT, KBR, NGmeet, and TenSRDe), which do not use the refinement trick, and the other three baselines (FastHyDe, SNLRSF, and GLF), which use the refinement trick on the contrary. At the meanwhile, we give the denoising performance of our NS3R method in the same setting of the refinement trick to make fair comparison. From the results in Table IV(a), for baselines LLRT, KBR, NGmeet, and TenSRDe, when they do not use the refinement trick, their denoising performance is much worse than that of our proposed NS3R in terms of both denoising effect and running time. From the results in Table IV(b), for other baselines FastHyDe, SNLRSF, and GLF, we find that the refinement strategy will large increase the running time of SNLRSF and GLF, and our NS3R method is still the best in terms of denoising effect with quite short running time. Therefore, if we simultaneous focus on the denoising effect evaluated by PSNR/SSIM and the denoising efficiency evaluated by running time, we can find that our proposed NS3R achieves the best tradeoff performance with and without refinement strategy.

3) Comparison With Data-Driven Methods:
Considering the popularity of data-driven methods in HSI denoising research, we additionally compare the proposed NS3R algorithm with five state-of-the-art data-driven deep learning methods, including HSICNN [43], QRNN3D [46], GRN-Net [47], T3SC [48], and MAC-Net [49]. Here, we take the simulated DC Mall dataset sized 200 × 200 × 160 for testing, which is also tested in recent work [42]. The quantitative comparison results are shown in Table V, where we additionally list the results of NGmeet as a state-of-the-art baseline. From Table V, it can be seen that the proposed NS3R algorithm is quite competitive or even better compared with the other four data-driven deep learning methods. These data-driven methods mainly train certain supervised deep neural networks from carefully collected noisy-clean HSI pairs. They take a lot of time in the training phase and test very quickly. As a model-driven method, our proposed NS3R is relatively slower. But, compared with the NGmeet, one well recognized model-driven baseline, our proposed method has a very significant improvement in computation effectiveness.

V. CONCLUSION
This work proposes an NS3R-based HSI denoising method, which integrates a novel spectral sparse PCA into the NSR framework. It advances other peers in two aspects. First, the proposed spectral sparse PCA strategy yields more significant RCIs and automatically distinguishes the extremely corrupted bands in real noisy datasets. Second, the proposed method achieves a significant compromise between the denoising performance and running time. The two points are observed in extensive experiments both on the simulated and real HSI datasets. In the future, we will extend the proposed method for mixed noise removal, which is more general.