Hyperspectral Unmixing Using Robust Deep Nonnegative Matrix Factorization

Abstract: Nonnegative matrix factorization (NMF) and its numerous variants have been extensively studied and used in hyperspectral unmixing (HU). With the aid of the designed deep structure, deep NMF-based methods demonstrate advantages in exploring the hierarchical features of complex data. However, noise corruption commonly exists in hyperspectral data and severely degrades the unmixing performance of deep NMF-based methods when applied to HU. In this study, we propose an ℓ2,1 norm-based robust deep nonnegative matrix factorization (ℓ2,1-RDNMF) for HU, which incorporates an ℓ2,1 norm into the two stages of the deep structure to achieve robustness. The multiplicative updating rules of ℓ2,1-RDNMF are efficiently derived and provided. The effectiveness of the presented method is verified in experiments using both synthetic and real data.


Introduction
Hyperspectral unmixing (HU) is a key means of making full use of remotely sensed hyperspectral image (HSI) data; it aims to separate mixed pixels into a set of constituent endmembers and the corresponding abundance maps. The classical linear mixing model (LMM) has been widely employed in HU because of its practicality and clear physical significance. Due to the nonnegativity constraints in the LMM, NMF shows advantages in extracting intuitive and interpretable representations and has been commonly applied to HU problems [1]. However, due to its nonconvexity, NMF is prone to getting stuck at local minima and is sensitive to noisy samples [2].
To constrain the solution space and produce superior results, a branch of NMF-based methods with additional constraints has been proposed [3][4][5][6]. Robust NMF methods have also been investigated to address the problem of noise corruption [7][8][9]. The ℓ2,1 norm, first introduced in [10], has been successfully used in a number of applications to deal with noise and outliers [11][12][13][14]. Through the integration of an ℓ2,1 norm, Ma et al. [15] presented a robust sparse HU method (RSU) with an ℓ2,1 norm-based loss function that achieves robustness to noise and outliers. In [16], a robust version of NMF employing an ℓ2,1 norm in the objective function (ℓ2,1-RNMF) is presented, which handles outliers and noise more effectively than conventional NMF. As a nonlinear similarity metric, correntropy has also been introduced to formulate robust NMF methods. To mitigate the corruption of noisy bands, Wang et al. present a correntropy-based NMF (CENMF) that allocates band weights in an adaptive manner [17]. Considering the variability of pixels' noise intensities, Li et al. further utilize the correntropy metric to propose a robust spectral-spatial sparsity-regularized NMF (CSsRS-NMF) [18]. In [19,20], a truncation operation is used in the cost function to promote robustness to outliers. Peng et al. propose a self-paced NMF (SpNMF) utilizing a self-paced regularizer to learn adaptive weights and obviate the effect of noisy atoms (i.e., row-wise bands or column-wise pixels) [21].
Despite their broad and successful applications in HU, NMF-based methods still leave room for improvement in unmixing performance. In general, the large majority of existing NMF-based methods are based on a single-layer decomposition, which restrains the HU performance on complicated and highly mixed hyperspectral data. With the development and enormous success of neural networks [22,23], researchers have been increasingly motivated to create NMF-based approaches with multilayer structures. Rajabi et al. [24] propose a multilayer NMF (MLNMF) [25] for hyperspectral unmixing that decomposes the data matrix and coefficient matrix in a sequential manner. Nevertheless, MLNMF only involves a series of successive single-layer NMFs. There is no feedback from the later layers to the earlier layers in the multi-layer structure of MLNMF [26], which means MLNMF does not fully utilize the hierarchical power of the deep scheme. Deep NMF (DNMF) [27] makes a key advancement by introducing a global objective function and a fine-tuning step to propagate information in the reverse direction. In this manner, the entire reconstruction error of the model can be reduced, and better factorization results can be obtained. In the recent literature, several variants of DNMF have been studied to further improve the quality of solutions [26,28,29]. In these variants, various regularizations and constraints are integrated to limit the solution space and lead to better results. Deep orthogonal NMF [26] is a variant that imposes an orthogonality constraint on the matrix H. Fang et al. [30] proposed a sparsity-constrained deep NMF method for HU using an ℓ1 regularizer of the abundance matrix on each layer. In [31], an ℓ1/2 regularizer and a TV regularizer are introduced to propose a sparsity-constrained deep NMF with total variation (SDNMF-TV) for HU.
In general, most deep NMFs and their variants are based on a squared Euclidean distance objective function, which lacks the capacity to adapt to noisy conditions. In real applications, HSIs are commonly contaminated by various types of noise, including Gaussian noise, dead pixels, and impulse noise, which dramatically degrade the unmixing performance [32]. To address this problem, metrics with robust properties, such as the ℓ2,1 norm [10] and the correntropy-induced metric (CIM) [33], can be considered to enhance the robustness of deep NMF-based methods.
This study introduces an ℓ2,1 norm-based robust deep nonnegative matrix factorization (ℓ2,1-RDNMF) method for HU. To strengthen the robustness of DNMF when exploring hierarchical features, an ℓ2,1 norm is effectively incorporated into the designed deep NMF structure. The proposed ℓ2,1-RDNMF is composed of a pretraining stage and a fine-tuning stage, both adopting an ℓ2,1 norm in the objective function. The pretraining stage conducts a succession of forward decompositions to pretrain the layer-wise factors. Next, the fine-tuning stage obtains the fine-tuned factors by optimizing a global ℓ2,1 norm-based objective function. The superiority of our method is demonstrated through experimental results on synthetic and real data. The major contributions of our article are summarized as follows: (1) We first propose a robust deep NMF unmixing model by applying an ℓ2,1 norm. The suggested unmixing model is made up of two stages: the pretraining stage and the fine-tuning stage. (2) We provide the multiplicative updating rules for the two stages of the proposed model and explain the robustness mechanism of the ℓ2,1 norm from the viewpoint of dynamically updated weights.

Linear Mixing Model
In the classical LMM, the spectrum of a mixed pixel in an HSI is assumed to be representable by a linear mixture of endmember spectra weighted by the related abundances. For an HSI X ∈ R_+^{K×N} with N pixels and K bands, the LMM can be formulated as

X = WH + E, (1)

where W ∈ R_+^{K×P} represents the endmembers, H ∈ R_+^{P×N} represents the abundances, and E ∈ R^{K×N} denotes the residual term. The abundance nonnegativity constraint (ANC) and the abundance sum-to-one constraint (ASC) are two prior constraints generally imposed on the LMM, which can be represented as

H ≥ 0, 1_P^T H = 1_N^T. (2)
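As a concrete illustration, the LMM above can be simulated in a few lines of NumPy. The matrix sizes below are toy values chosen for illustration only; the abundance columns are normalized to satisfy the ASC:

```python
import numpy as np

rng = np.random.default_rng(0)
K, P, N = 50, 3, 100                    # bands, endmembers, pixels (toy sizes)

W = rng.random((K, P))                  # nonnegative endmember signatures
H = rng.random((P, N))
H /= H.sum(axis=0, keepdims=True)       # enforce ASC: each column sums to one

X = W @ H                               # noise-free LMM observation (E = 0)
```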

DNMF
NMF is a popular method for addressing the linear unmixing problem, which learns the endmember matrix and the abundance matrix in a single-layer manner. As the deep variant of NMF, deep NMF integrates a multi-layer network structure and a fine-tuning step to explore hierarchical features with hidden information and further enhance the unmixing performance. As shown in Figure 1, DNMF is composed of the following stages:

(1) Pretraining Stage: The task of this stage is to pretrain the layers by performing basic NMF successively in each layer. To be specific, the data matrix X is decomposed into V_1 and H_1 in the first layer. Then, H_1 is decomposed into V_2 and H_2 in the second layer. A similar procedure is repeated until the last (L-th) layer is finished. In this stage, the objective function for each layer is given by

O_NMF = ||H_{l-1} - V_l H_l||_F^2, s.t. V_l ≥ 0, H_l ≥ 0. (3)

In the first layer (i.e., l = 1), we have M_1 = K and H_0 = X. The multiplicative update rules for (3) are provided in the classical NMF algorithm [34], i.e.,

V_l ← V_l .* (H_{l-1} H_l^T) ./ (V_l H_l H_l^T), (4)
H_l ← H_l .* (V_l^T H_{l-1}) ./ (V_l^T V_l H_l), (5)

where .* and ./ denote element-wise multiplication and division.
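The pretraining of each layer amounts to repeated application of the classical multiplicative update rules for (3). A minimal NumPy sketch follows; the matrix sizes are illustrative, and a small `eps` is added to the denominators for numerical safety (an implementation detail of ours, not stated in the paper):

```python
import numpy as np

def nmf_update(A, V, H, eps=1e-9):
    """One multiplicative update step for min ||A - V H||_F^2 (classical NMF)."""
    V = V * (A @ H.T) / (V @ H @ H.T + eps)
    H = H * (V.T @ A) / (V.T @ V @ H + eps)
    return V, H

rng = np.random.default_rng(1)
A = rng.random((40, 60))                        # stand-in for H_{l-1}
V, H = rng.random((40, 5)), rng.random((5, 60))
err0 = np.linalg.norm(A - V @ H)
for _ in range(100):
    V, H = nmf_update(A, V, H)
err1 = np.linalg.norm(A - V @ H)                # nonincreasing under these rules
```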
(2) Fine-Tuning Stage: In this stage, the hierarchical network structure is considered as a whole in order to minimize the overall reconstruction error. The factors of all layers obtained in the pretraining stage are fine-tuned with the following objective function:

O_deep = ||X - V_1 V_2 · · · V_L H_L||_F^2. (6)

The objective function in (6) can be rewritten as

O_deep = ||X - C_l V_l D_l||_F^2, (7)

where C_l and D_l are defined as

C_l = V_1 · · · V_{l-1}, (8)
D_l = V_{l+1} · · · V_L H_L. (9)

In fact, in the fine-tuning stage, H_l (l < L) acts as an intermediate variable and is not reflected in (6). Only the variables (V_1, · · · , V_L, H_L) need to be alternately updated. By differentiating (6) with respect to V_l and H_L, we have

∇_{V_l} O_deep = 2 C_l^T (C_l V_l D_l - X) D_l^T, (10)
∇_{H_L} O_deep = 2 Ψ^T (Ψ H_L - X), with Ψ = V_1 · · · V_L. (11)

After selecting appropriate step sizes, the update rules for V_l and H_L can be derived through the gradient descent (GD) method in the following multiplicative forms:

V_l ← V_l .* (C_l^T X D_l^T) ./ (C_l^T C_l V_l D_l D_l^T), (12)
H_L ← H_L .* (Ψ^T X) ./ (Ψ^T Ψ H_L). (13)

On the basis of the acquired optimal factors, the final endmember matrix W and abundance matrix H are calculated as follows:

W = V_1 V_2 · · · V_L, (14)
H = H_L. (15)
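The fine-tuning sweep described above can be sketched in NumPy as follows. The helper `chain`, the matrix sizes, and the added `eps` are our illustrative choices, not the authors' implementation; each V_l is updated with the current C_l and D_l, and H_L is updated last:

```python
import numpy as np

def chain(mats, rows):
    """Product of a (possibly empty) list of matrices; identity when empty."""
    out = np.eye(rows)
    for M in mats:
        out = out @ M
    return out

def dnmf_finetune_step(X, Vs, HL, eps=1e-9):
    """One fine-tuning sweep for min ||X - V1 ... VL HL||_F^2."""
    for l in range(len(Vs)):
        C = chain(Vs[:l], X.shape[0])               # C_l = V1 ... V_{l-1}
        D = chain(Vs[l + 1:], Vs[l].shape[1]) @ HL  # D_l = V_{l+1} ... V_L H_L
        Vs[l] *= (C.T @ X @ D.T) / (C.T @ C @ Vs[l] @ D @ D.T + eps)
    Psi = chain(Vs, X.shape[0])                     # Psi = V1 ... VL
    HL *= (Psi.T @ X) / (Psi.T @ Psi @ HL + eps)
    return Vs, HL

rng = np.random.default_rng(0)
X = rng.random((30, 50))
Vs = [rng.random((30, 10)), rng.random((10, 5))]    # two-layer toy example
HL = rng.random((5, 50))
err0 = np.linalg.norm(X - chain(Vs, 30) @ HL)
for _ in range(50):
    Vs, HL = dnmf_finetune_step(X, Vs, HL)
err1 = np.linalg.norm(X - chain(Vs, 30) @ HL)
```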

Proposed ℓ2,1-RDNMF
In real-world scenarios, HSIs frequently suffer from various kinds of interference, including Gaussian noise, impulse noise, and dead pixels. The presence of complicated noise severely decreases the performance of DNMF and may result in the failure of the HU task. To promote the robustness of the deep NMF method, we present a robust deep NMF unmixing model that applies an ℓ2,1 norm to the pretraining and fine-tuning stages. Our suggested unmixing model is likewise made up of two stages:

O_{ℓ2,1-NMF} = ||H_{l-1} - V_l H_l||_{2,1}, (16)
O_{ℓ2,1-deep} = ||X - V_1 V_2 · · · V_L H_L||_{2,1}, (17)

where O_{ℓ2,1-NMF} and O_{ℓ2,1-deep} are the objective functions of the pretraining stage and fine-tuning stage, respectively, and ||·||_{2,1} is the so-called ℓ2,1 norm, which is defined as follows:

||E||_{2,1} = Σ_{n=1}^{N} ||e_n||_2 = Σ_{n=1}^{N} sqrt(Σ_{k=1}^{K} e_{kn}^2), (18)

where e_n and e_{kn} denote the n-th column and the (k, n)-th element of the residual term E.
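For reference, the ℓ2,1 norm defined in (18) is straightforward to compute column-wise:

```python
import numpy as np

def l21_norm(E):
    """||E||_{2,1}: the sum of the Euclidean norms of the columns of E."""
    return np.sqrt((E ** 2).sum(axis=0)).sum()

E = np.array([[3.0, 0.0],
              [4.0, 1.0]])
# the two columns have norms 5 and 1, so ||E||_{2,1} = 6
```

Unlike the squared Frobenius norm, a single badly corrupted column contributes only linearly (not quadratically) to this objective, which is the source of the robustness discussed below.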
For the pretraining stage, the gradients of O_{ℓ2,1-NMF} with respect to V_l and H_l are calculated as follows:

∇_{V_l} O_{ℓ2,1-NMF} = (V_l H_l - H_{l-1}) G_l H_l^T, (19)
∇_{H_l} O_{ℓ2,1-NMF} = V_l^T (V_l H_l - H_{l-1}) G_l, (20)

where G_l ∈ R^{N×N} is a diagonal matrix whose diagonal components are defined as

(G_l)_{nn} = 1 / ||(H_{l-1} - V_l H_l)_{:,n}||_2. (21)

On the basis of (19) and (20), by selecting appropriate step sizes, the update rules for V_l and H_l can be derived through the gradient descent (GD) method in the following multiplicative forms:

V_l ← V_l .* (H_{l-1} G_l H_l^T) ./ (V_l H_l G_l H_l^T), (22)
H_l ← H_l .* (V_l^T H_{l-1} G_l) ./ (V_l^T V_l H_l G_l). (23)

In the pretraining stage, the layer-wise factors are pretrained by the update rules given in (22) and (23).
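A hedged sketch of one robust pretraining step follows. The weight cap `wmax=100` mirrors the cap on G_l mentioned in the implementation notes; the function name, sizes, and the small `eps` safeguard are our own choices:

```python
import numpy as np

def l21_nmf_update(A, V, H, eps=1e-9, wmax=100.0):
    """One robust multiplicative step for min ||A - V H||_{2,1}."""
    R = A - V @ H
    g = np.minimum(1.0 / np.maximum(np.sqrt((R ** 2).sum(axis=0)), 1e-12), wmax)
    G = np.diag(g)                                  # diagonal weights, capped at wmax
    V = V * (A @ G @ H.T) / (V @ H @ G @ H.T + eps)
    H = H * (V.T @ A @ G) / (V.T @ V @ H @ G + eps)
    return V, H

rng = np.random.default_rng(2)
A = rng.random((40, 60))
A[:, :3] += 5.0                                     # a few heavily corrupted columns
V, H = rng.random((40, 4)), rng.random((4, 60))

def l21(R):
    return np.sqrt((R ** 2).sum(axis=0)).sum()

e0 = l21(A - V @ H)
for _ in range(100):
    V, H = l21_nmf_update(A, V, H)
e1 = l21(A - V @ H)
```

Note how the corrupted columns receive small weights in `G`, so they contribute little to the factor updates.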
For the fine-tuning stage, the objective function given in (17) can be rewritten as

O_{ℓ2,1-deep} = ||X - C_l V_l D_l||_{2,1}. (24)

The gradient of O_{ℓ2,1-deep} with respect to V_l is calculated as follows:

∇_{V_l} O_{ℓ2,1-deep} = C_l^T (C_l V_l D_l - X) Q_l D_l^T, (25)

where Q_l ∈ R^{N×N} is a diagonal matrix whose diagonal components are defined as

(Q_l)_{nn} = 1 / ||(X - C_l V_l D_l)_{:,n}||_2. (26)

Further, the gradient of O_{ℓ2,1-deep} with respect to H_L is calculated as follows:

∇_{H_L} O_{ℓ2,1-deep} = Ψ^T (Ψ H_L - X) Q_L, with Ψ = V_1 · · · V_L. (27)

Similar to (12) and (13), by selecting appropriate step sizes, the update rules for (17) can be derived in the following multiplicative forms:

V_l ← V_l .* (C_l^T X Q_l D_l^T) ./ (C_l^T C_l V_l D_l Q_l D_l^T), (28)
H_L ← H_L .* (Ψ^T X Q_L) ./ (Ψ^T Ψ H_L Q_L). (29)

The final results of the proposed ℓ2,1-RDNMF are calculated by (14) and (15). It can be observed from (21) and (26) that the data samples in ℓ2,1-RDNMF are assigned adaptive weights corresponding to the reconstruction error of the column they belong to. Through this robust formulation, the influence of columns with large errors due to high noise intensities is reduced in the updating process. Therefore, ℓ2,1-RDNMF has good column-wise robustness to noisy samples (pixels). Algorithm 1 presents the overall process of ℓ2,1-RDNMF.
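One robust fine-tuning sweep can be sketched analogously to the pretraining step; again, the helper names, matrix sizes, weight cap, and `eps` safeguard are illustrative assumptions of ours:

```python
import numpy as np

def chain(mats, rows):
    """Product of a (possibly empty) list of matrices; identity when empty."""
    out = np.eye(rows)
    for M in mats:
        out = out @ M
    return out

def col_weights(R, wmax=100.0):
    """Diagonal weight matrix: reciprocal column norms of the residual, capped."""
    return np.diag(np.minimum(1.0 / np.maximum(np.sqrt((R ** 2).sum(axis=0)), 1e-12), wmax))

def rdnmf_finetune_step(X, Vs, HL, eps=1e-9):
    """One robust fine-tuning sweep for min ||X - V1 ... VL HL||_{2,1}."""
    for l in range(len(Vs)):
        C = chain(Vs[:l], X.shape[0])               # C_l
        D = chain(Vs[l + 1:], Vs[l].shape[1]) @ HL  # D_l
        Q = col_weights(X - C @ Vs[l] @ D)
        Vs[l] *= (C.T @ X @ Q @ D.T) / (C.T @ C @ Vs[l] @ D @ Q @ D.T + eps)
    Psi = chain(Vs, X.shape[0])
    Q = col_weights(X - Psi @ HL)
    HL *= (Psi.T @ X @ Q) / (Psi.T @ Psi @ HL @ Q + eps)
    return Vs, HL

rng = np.random.default_rng(1)
X = rng.random((30, 50))
X[:, :3] += 5.0                                     # a few heavily corrupted pixels
Vs = [rng.random((30, 10)), rng.random((10, 5))]
HL = rng.random((5, 50))

def l21(R):
    return np.sqrt((R ** 2).sum(axis=0)).sum()

e0 = l21(X - chain(Vs, 30) @ HL)
for _ in range(50):
    Vs, HL = rdnmf_finetune_step(X, Vs, HL)
e1 = l21(X - chain(Vs, 30) @ HL)
```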

Implementation Issues
Vertex component analysis (VCA) [35] is utilized to initialize the matrices V_l, and the fully constrained least squares (FCLS) algorithm [36] is utilized to initialize H_l. Specifically, to prevent pixels with near-zero residuals from dominating the objective function, the maximum value of the diagonal entries of G_l and Q_l is limited to 100. As for the stopping criterion, the algorithm is stopped when the number of iterations reaches a preset maximum number T_max or the relative change of the error falls below a threshold ε; T_max is set to 500 and ε is set to 10^{-4} in this paper. Further, the parameters of the multi-layer structure in deep NMF-based methods are determined with reference to [31] through empirical examination. In Experiment 1, optimal performance is found to be achieved when the number of layers L of the deep NMF structure is set to 3. Therefore, we set L to 3 in the rest of the experiments.
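The stopping rule described above can be expressed as a small helper. The function name and the exact relative-change formula are our own illustrative choices; the constants match the settings stated in the text:

```python
import numpy as np

def stop_check(X, X_hat, prev_err, t, T_max=500, eps_tol=1e-4):
    """True when t reaches T_max or the relative change of the error drops below eps_tol."""
    err = np.linalg.norm(X - X_hat)
    rel = abs(prev_err - err) / max(prev_err, 1e-12)
    return (t >= T_max) or (rel < eps_tol), err

X = np.ones((4, 4))
Xh = np.zeros((4, 4))
done_converged, e = stop_check(X, Xh, prev_err=4.0, t=1)    # no change in error
done_running, _ = stop_check(X, Xh, prev_err=8.0, t=1)      # error still changing
done_maxiter, _ = stop_check(X, Xh, prev_err=8.0, t=500)    # iteration budget hit
```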

Experiments on Synthetic Data
The synthetic data used are generated according to the methodology provided in [2]. The spectra used as endmembers are randomly selected from the United States Geological Survey (USGS) digital spectral library [37]. The generated synthetic data are of size 100 × 100 pixels with 224 bands. To better simulate real scenarios, the synthetic data are contaminated by a complex noise composition involving dead pixels, Gaussian noise, and impulse noise. When adding the Gaussian noise, the pixels' signal-to-noise ratio (SNR) values are drawn from SNR_pixel ∼ N(SNR, σ^2) with reference to [18]. To establish a reliable comparison, the results reported in the experiments are averaged over 20 independent runs for each method. When the iteration number is set to 500, the time consumption of the proposed ℓ2,1-RDNMF and the compared methods is presented in Table 1.

Experiment 1 (Investigation of the Number of Layers)

The number of layers is a key parameter of a deep NMF-based method that determines the depth of the multi-layer structure and influences the unmixing performance. This experiment investigates the impact of different settings of the number of layers on the unmixing performance. The number of layers L is set to 1, 2, 3, · · · , 6. Figures 2 and 3 display the SAD and RMSE results of ℓ2,1-RDNMF under different settings of L. It can be observed that the two metrics show a similar trend as L varies. The performance of the multi-layer structure (L ≥ 2) is superior to that of the single-layer structure (L = 1), as expected. The metrics of endmembers and abundances both gradually improve with increasing L and then degrade when L is larger than 3. In other words, optimal performance can be achieved with a three-layer structure. Therefore, a three-layer structure is adopted in the rest of the experiments.
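The SAD and RMSE metrics used throughout these experiments are standard; a minimal sketch (function names are ours) is:

```python
import numpy as np

def sad(w, w_hat):
    """Spectral angle distance (radians) between a reference and an estimated endmember."""
    cos = np.dot(w, w_hat) / (np.linalg.norm(w) * np.linalg.norm(w_hat))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def rmse(H, H_hat):
    """Root-mean-square error between reference and estimated abundance maps."""
    return float(np.sqrt(((H - H_hat) ** 2).mean()))
```

Lower values of both metrics indicate better endmember extraction (SAD) and abundance estimation (RMSE), respectively.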

Experiment 2 (Investigation of Noise Intensities)
To verify the proposed method's effectiveness, synthetic HSI data with various noise intensities are utilized to assess the algorithms' performance. The value of SNR is specified as 15, 20, 25, 30, or 35 in order to simulate various noise intensities, and σ is set to 5 in the experiment. Figures 4 and 5 show the SAD and RMSE results of the algorithms as the noise level varies. When SNR_pixel = 35, all the algorithms perform well and at a comparable level. As the value of SNR_pixel decreases, the performance of all five methods degrades, as anticipated. However, it can be seen that the proposed ℓ2,1-RDNMF clearly obtains lower values of SAD and RMSE, which validates the robustness of ℓ2,1-RDNMF to various noise levels.
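The per-pixel noise model SNR_pixel ∼ N(SNR, σ^2) used in this experiment can be simulated as below. The helper name and the use of the mean band power as the per-pixel signal power are our assumptions for this sketch:

```python
import numpy as np

def add_pixelwise_gaussian_noise(X, snr_mean=30.0, snr_std=5.0, rng=None):
    """Add Gaussian noise whose per-pixel SNR (in dB) is drawn from N(snr_mean, snr_std^2)."""
    rng = rng or np.random.default_rng()
    K, N = X.shape
    snr_db = rng.normal(snr_mean, snr_std, size=N)       # one SNR value per pixel
    sig_pow = (X ** 2).mean(axis=0)                      # per-pixel signal power
    noise_pow = sig_pow / (10.0 ** (snr_db / 10.0))      # matching noise power
    noise = rng.standard_normal((K, N)) * np.sqrt(noise_pow)
    return X + noise

rng = np.random.default_rng(3)
X = rng.random((224, 100)) + 0.1                         # toy HSI: 224 bands, 100 pixels
Xn = add_pixelwise_gaussian_noise(X, snr_mean=30.0, snr_std=5.0, rng=rng)
```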

Experiment 3 (Investigation of Noise Types)
The algorithms are also investigated on synthetic data with various forms of mixed noise added, which better simulates the complicated mixed noise of real scenarios. The mixed noise is made up of a combination of dead pixels, Gaussian noise, and impulse noise. For the dead pixels, 0.5% of pixels are randomly selected to be dead pixels. For the Gaussian noise, SNR_pixel is set to 30. For the impulse noise, bands 30-40 are degraded by impulse noise with the noise density set to 0.05. Tables 2 and 3 report the SAD and RMSE results of the algorithms under four types of noise situations: Gaussian noise; Gaussian noise and impulse noise; Gaussian noise and dead pixels; and Gaussian noise, impulse noise, and dead pixels. The reported results show that ℓ2,1-RDNMF achieves the best unmixing performance in terms of endmembers and abundances under all four types of noise situations, which further confirms the robustness of the proposed ℓ2,1-RDNMF under complicated mixed noise.

Samson Dataset

The first real HSI dataset used in our experiments is the Samson dataset [38], which contains 156 bands spanning the spectral range of 401-889 nm. As shown in Figure 6, the subscene used consists of 95 × 95 pixels. Based on the previous studies in [31,39], three kinds of targets exist in the subscene: water, trees, and soil. Table 4 presents the SAD results for the three targets achieved by the proposed ℓ2,1-RDNMF and the comparative methods. As the results show, ℓ2,1-RDNMF outperforms the compared methods for all three targets as well as in the mean value. Since the ground truth of the abundances for the real dataset is unavailable, Figure 7 only displays the abundance maps estimated using the proposed ℓ2,1-RDNMF. Compared with the results presented in the previous works [31,39], the abundance maps are in accordance with the reference maps.

Figure 7. Abundance maps estimated using ℓ2,1-RDNMF on the Samson dataset.

Cuprite Dataset
The second scene employed in the experiment is the Cuprite dataset collected by the AVIRIS sensor over the Cuprite area in Nevada, US. As shown in Figure 8, the subimage used contains 250 × 191 pixels and 224 bands. After the badly degraded bands (1-2, 76, 104-113, 148-167, and 221-224) are removed, 188 bands remain and are used in the experiment. Table 5 presents the SAD results of the proposed ℓ2,1-RDNMF and the comparative methods on the Cuprite dataset. It can be observed that ℓ2,1-RDNMF achieves the best performance for several minerals, including Andradite, Dumortierite, Kaolinite #2, Muscovite, Nontronite, and Pyrope, as well as in the mean value. Additionally, Figure 9 presents the abundance maps estimated by ℓ2,1-RDNMF.

Conclusions
In this study, we presented an ℓ2,1 norm-based robust deep NMF method (ℓ2,1-RDNMF) for HU. The ℓ2,1 norm is effectively incorporated into the two stages of the designed deep structure to achieve robustness to noise corruption in HU. Experiments conducted on synthetic data simulated with various noise intensities and types demonstrate that the proposed method possesses strong robustness to complex noise situations. Experiments conducted on real data also verify the effectiveness of our method.