Adaptive Graph Regularized Multilayer Nonnegative Matrix Factorization for Hyperspectral Unmixing

Hyperspectral unmixing is an important technique for remote sensing image analysis. Among various unmixing techniques, nonnegative matrix factorization (NMF) shows a unique advantage in providing a unified solution with good physical interpretability. In order to explore the geometric information of hyperspectral data, graph regularization is often used to improve NMF unmixing performance. It groups neighboring pixels, uses the groups as graph vertices, and then assigns weights to connected vertices. The construction of the neighborhood and the weights is normally determined by k-nearest neighbors or a heat kernel in a deterministic process, which does not fully reveal the structural relationships among the data. In this article, we introduce an adaptive graph to regularize a multilayer NMF (AGMLNMF) model for hyperspectral unmixing. In AGMLNMF, a graph is constructed based on the probabilities between neighbors. This enables the optimal neighborhood to be determined automatically. Moreover, the weights of the graph are assigned based on the relationships among neighbors, which reflects the intrinsic structure of the complex data. Experiments on both synthetic and real datasets show that this method outperforms several state-of-the-art unmixing approaches.

While mixing models can be constructed in both linear and nonlinear ways, the linear mixing model (LMM) is more widely used because of its better interpretability. In LMM, each pixel is considered a linear combination of independent endmembers, and the combination coefficients form the abundance [6], [7]. Spectral unmixing methods can be categorized as geometry based, sparse regression based, or statistics based. Early geometric methods assume that pure pixels are present in the hyperspectral image; therefore, the main goal is to extract the pure pixels from the mixed data [8]-[10]. Later, this restriction was bypassed by simplex-based methods, such as simplex identification via split augmented Lagrangian [11] and minimum volume simplex analysis [12].
SUNSAL and CLSUNSAL [13], [14] are representative sparse regression methods. They reconstruct the hyperspectral image by a sparse linear combination of endmembers selected from a dictionary. Various constraints have been added to improve the unmixing model, for example, total variation [15], spectral and spatial weighting [16], joint sparsity blocks [17], and prior knowledge of endmembers [18]. MUSIC-CSR [19] was proposed to solve the problem in which the signatures in the library and in the real data do not exactly match. Local collaborativity assumes that endmember signatures generally appear in local spatial regions [20]. Double reweighted sparse regression with total variation was proposed to enhance the sparsity of fractional abundances in both the spectral and spatial domains [21].
Independent component analysis [22], [23], the normal compositional model [24], and nonnegative matrix factorization (NMF) [25] are typical statistical methods. Of particular interest in this article are the NMF methods, which simultaneously estimate the endmember and abundance matrices from an image. To cope with the nonconvexity of the matrix factorization problem, various constraints have been introduced to characterize different properties of the hyperspectral image and the decomposed matrices. These include, for example, sparsity of endmembers [26]-[28], signature dissimilarity of endmembers [29], the manifold relationship between the raw image and the abundance [30], [31], smoothness of endmembers or abundance [28], [32], and structural information within an image [33]-[35]. Some approaches introduce prior knowledge into the unmixing process to improve performance. This knowledge can be in the form of a whole endmember library [36] or partially known endmembers [37], [38]. To also estimate the number of endmembers, robust collaborative NMF was proposed to identify the endmembers and estimate the abundance in a chain [39].
Most methods complete the unmixing with a single layer of matrix factorization. However, such factorization may not be able to characterize the complex structure of hyperspectral data, even with the added constraints. Deep matrix models [40], [41] have been proposed to tackle this problem. For example, multilayer NMF (MLNMF) [42], [43] decomposes a matrix into different layers, which can extract more useful information from the hyperspectral imagery and improve the unmixing performance. This method was extended into a manifold MLNMF [44], which uses a graph to regularize MLNMF. In graph regularized approaches, pixels are grouped locally and the graph is built from the groups. The neighboring relationship between vertices is usually determined by the k-nearest neighbor algorithm, and the weights of the edges are then assigned in a deterministic manner, for example using a heat kernel. Therefore, the generated graph may not be optimal, which leads to underperforming unmixing results.
To address this drawback, we propose a novel adaptive graph regularized multilayer NMF (AGMLNMF) method for hyperspectral unmixing. The aim of this method is to integrate an adaptive graph into the NMF unmixing framework to enhance performance. A probabilistic method is adopted during graph construction, and the probabilistic neighbors are computed based on their distances. Compared with traditional k-nearest neighbor-based graph construction methods, the proposed method generates the neighbors and the weights simultaneously, assigning adaptive and optimal neighbors to each data point based on their local distances. In this way, highly mixed data can be assigned to the optimal neighbors with the generated weights. Benefiting from these optimal neighbors, the method can produce better unmixing results than its counterparts. Fig. 1 shows the graph generated for the point marked by a red circle by (a) the k-nearest neighbor method and (b) our proposed method. It can be seen that the red circle corresponds to a grass class. The k-nearest neighbor method selects a tree point (third yellow point from the left), which may deteriorate the result. Our proposed method, however, constructs the graph using neighbors that are all in the grass class.
The rest of this article is organized as follows. Section II describes the background on the linear mixture model and the MLNMF method. Section III describes the proposed method in detail. Section IV presents the experimental results on both synthetic and remote sensing data. Finally, Section V concludes this article.

II. RELATED WORK ON MLNMF-BASED HYPERSPECTRAL UNMIXING
NMF and its multilayer extension (MLNMF) are the basis of the proposed work. This section briefly reviews these two techniques.
Given a hyperspectral image Y ∈ R^(H×N), where H is the number of bands and N is the number of pixels, the linear mixture model describes a pixel y ∈ R^(H×1) as a linear combination of endmembers M ∈ R^(H×P)

y = Mr + e    (1)

where P is the number of endmembers. M = (m_1, ..., m_j, ..., m_P), where m_j is an H × 1 column vector representing the spectral signature of the jth endmember, r is a P × 1 abundance vector, and e is additive Gaussian white noise.
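As an illustration, the mixing process in (1) can be simulated in a few lines. The sizes, the Dirichlet draw for the abundances, and the noise level below are arbitrary illustrative choices, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
H, P = 50, 3                       # bands, endmembers (illustrative sizes)
M = rng.random((H, P))             # endmember signatures, one per column
r = rng.dirichlet(np.ones(P))      # abundances: nonnegative, sum to one
e = 0.01 * rng.standard_normal(H)  # additive Gaussian white noise
y = M @ r + e                      # mixed pixel per the LMM
```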
For the whole image, the matrix form of the linear mixture model is defined as

Y = MR + E    (2)

where R is a P × N abundance matrix and E is an H × N noise matrix.
Hyperspectral unmixing aims to estimate M and R given a hyperspectral image Y. Since both M and R are nonnegative, NMF [45], [46] becomes a natural solution. In reality, however, the structure of the data may be more complex than a single-layer NMF can cope with. This can be better modeled by multilayer NMF (MLNMF), which performs an iterative decomposition using multiple layers [42]

min_{M_l ≥ 0, R_l ≥ 0}  (1/2) ||Y_l − M_l R_l||_F^2 + λ_m ||M_l||_1 + λ_r ||R_l||_1    (3)

where Y_l, M_l, and R_l are defined for each layer of the matrices Y, M, and R, respectively, such that

Y_1 = Y,  Y_{l+1} = R_l,  l = 1, ..., L    (4)

so that Y ≈ M_1 M_2 ⋯ M_L R_L. Here, ||·||_F is the Frobenius norm, and λ_m and λ_r control the sparseness of M_l and R_l, respectively. This objective function can be solved iteratively using the multiplicative update rules

M_l ← M_l .* (Y_l R_l^T) ./ (M_l R_l R_l^T + λ_m)    (5)
R_l ← R_l .* (M_l^T Y_l) ./ (M_l^T M_l R_l + λ_r)    (6)

where (·)^T is the matrix transpose, and .* and ./ denote element-wise multiplication and division, respectively.
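A minimal sketch of the multilayer decomposition described above, assuming simple sparsity-penalized multiplicative updates per layer (the exact sparsity terms and parameter values in [42] may differ); `nmf_layer` and `mlnmf` are illustrative names:

```python
import numpy as np

def nmf_layer(Y, P, n_iter=200, lam=1e-3, eps=1e-9, seed=0):
    """Single-layer NMF via sparsity-penalized multiplicative updates."""
    rng = np.random.default_rng(seed)
    M = rng.random((Y.shape[0], P))
    R = rng.random((P, Y.shape[1]))
    for _ in range(n_iter):
        M *= (Y @ R.T) / (M @ R @ R.T + lam + eps)
        R *= (M.T @ Y) / (M.T @ M @ R + lam + eps)
    return M, R

def mlnmf(Y, P, n_layers=5):
    """Chain layers: Y ~ M1 R1, R1 ~ M2 R2, ... so Y ~ M1 M2 ... ML RL."""
    Ms, Yl = [], Y
    for l in range(n_layers):
        M, R = nmf_layer(Yl, P, seed=l)
        Ms.append(M)
        Yl = R                      # the next layer decomposes the abundances
    return Ms, Yl                   # per-layer endmembers, final abundance
```

Note that the updates keep every entry nonnegative because all factors in the multiplicative ratios are nonnegative.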

III. APPROACH
In this section, we first introduce the approach for generating the adaptive graph and then describe the proposed AGMLNMF method in detail. Finally, some implementation issues are discussed.

A. Generating Adaptive Graph
The first step of our method is to generate an adaptive graph for hyperspectral images. Traditional graph methods [47], [48] first find the nearest neighbors of a given pixel, e.g., via the k-nearest neighbor algorithm, to form the vertices of a graph. Then, the importance of the neighbors is characterized by the weights of the edges, which can be calculated with a heat kernel

W_ij = exp( −||y_i − y_j||^2 / σ )

where y_i is a nearest neighbor of y_j, and σ controls the magnitude of the weights. Compared with existing graph methods [47], [48], the proposed method explores probabilistic neighbors instead. In this way, the optimal neighbors can be chosen adaptively. Moreover, instead of assigning weights by a heat kernel or other deterministic methods, the weights are calculated based on the probabilities.
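For contrast, the traditional construction can be sketched as follows; `heat_kernel_graph` is a hypothetical helper, and the max-symmetrization step is one common convention rather than necessarily the article's exact choice:

```python
import numpy as np

def heat_kernel_graph(Y, k=5, sigma=1.0):
    """Traditional kNN graph: deterministic neighbors, heat-kernel weights.

    Y : (H, N) data matrix, one pixel per column.
    """
    N = Y.shape[1]
    sq = np.sum(Y**2, axis=0)
    # pairwise squared Euclidean distances, clipped against float round-off
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y), 0.0)
    np.fill_diagonal(d2, np.inf)          # a pixel is not its own neighbor
    W = np.zeros((N, N))
    for j in range(N):
        nbrs = np.argsort(d2[:, j])[:k]             # k nearest neighbors of y_j
        W[nbrs, j] = np.exp(-d2[nbrs, j] / sigma)   # heat-kernel weights
    return np.maximum(W, W.T)                       # symmetrize
```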
The assumption here is that if the distance between y_i and y_j is small, a large probability s_ij should be assigned, and vice versa. The following problem is used to set the probabilities:

min_{s_i^T 1 = 1, 0 ≤ s_ij ≤ 1}  Σ_{j=1}^N ||y_i − y_j||_2^2 s_ij    (7)

where s_i ∈ R^(N×1) is a vector whose jth element is s_ij, and s_ij defines the joint probability of y_i and y_j. Therefore, the probabilities should sum to one, i.e., s_i^T 1 = 1. Note that (7) is solved by assigning the nearest neighbor of y_i probability 1, while the probabilities of all other data points are 0. However, this solution considers only the nearest neighbor of a pixel y_i and ignores the other neighbors, which may lead to suboptimal performance. To solve this issue, we add a regularization term Σ_{j=1}^N γ s_ij^2 to (7). The problem is then updated as

min_{s_i^T 1 = 1, 0 ≤ s_ij ≤ 1}  Σ_{j=1}^N ( ||y_i − y_j||_2^2 s_ij + γ s_ij^2 )    (8)

where γ is a parameter that controls the values of s_ij. With this regularization term, the probability weight can be distributed more evenly among the s_ij.
According to [49], (8) can be rewritten as

min_S  Σ_{i,j=1}^N ( ||y_i − y_j||_2^2 s_ij + γ s_ij^2 )   s.t. s_i^T 1 = 1, 0 ≤ s_ij ≤ 1, rank(L_S) = N − c    (9)

where S is the similarity matrix, L_S is the Laplacian matrix of S, N determines the dimension of L_S, and c is the number of connected components, which should equal the number of endmembers in the unmixing problem. Since S is the similarity matrix over all pixels and L_S is its Laplacian, N is the maximum rank of L_S. Equation (9) is difficult to solve because of the rank constraint rank(L_S) = N − c, as the rank cannot be optimized directly as a variable.
Based on [49], this problem can be reformulated as

min_{S,F}  Σ_{i,j=1}^N ( ||y_i − y_j||_2^2 s_ij + γ s_ij^2 ) + 2λ Tr(F^T L_S F)   s.t. s_i^T 1 = 1, 0 ≤ s_ij ≤ 1, F ∈ R^(N×c), F^T F = I    (10)

Compared with (9), (10) is relatively easier to solve by an alternating optimization approach. When S is fixed, (10) becomes

min_{F ∈ R^(N×c), F^T F = I}  Tr(F^T L_S F)    (11)

The optimal solution F to (11) is formed by the c eigenvectors of L_S corresponding to the c smallest eigenvalues. When F is fixed, (10) becomes

min_{s_i^T 1 = 1, 0 ≤ s_ij ≤ 1}  Σ_{i,j=1}^N ( ||y_i − y_j||_2^2 s_ij + γ s_ij^2 + λ ||f_i − f_j||_2^2 s_ij )    (12)

where f_i is the ith row of F. Denote d_ij^y = ||y_i − y_j||_2^2 and d_ij^f = ||f_i − f_j||_2^2, and denote d_i ∈ R^(N×1) as a vector whose jth element is d_ij = d_ij^y + λ d_ij^f. Then (12) can be simplified, independently for each i, as

min_{s_i^T 1 = 1, 0 ≤ s_ij ≤ 1}  || s_i + d_i / (2γ_i) ||_2^2    (13)

The Lagrangian function of (13) is

L = (1/2) || s_i + d_i / (2γ_i) ||_2^2 − β (s_i^T 1 − 1) − τ^T s_i    (14)

where β and τ are the Lagrangian multipliers, and the mean value of the γ_i gives the γ used in (8).
According to the Karush-Kuhn-Tucker (KKT) conditions, the optimal solution s_i should be

s_ij = ( − d_ij / (2γ_i) + β )_+    (15)

where β is the Lagrangian multiplier. According to the KKT conditions, the optimal value of s_ij is nonnegative; the (·)_+ operator, i.e., max(·, 0), enforces this.
If there is a large number of nodes in matrix S, it is time consuming to generate weights for each node. Moreover, the unmixing procedure needs a large amount of memory to store the weights. Therefore, a sparse matrix S is usually used, which means that only the k-nearest neighbors of y_i have a chance to connect to y_i. From s_i^T 1 = 1, the value of β can be computed as

β = 1/k + (1 / (2kγ_i)) Σ_{j=1}^k d_ij    (16)

where k corresponds to the k-nearest neighbors. Therefore, in the proposed graph method, k has a physical meaning and is the only parameter to tune. To make the number of neighbors exactly k, with the d_ij sorted in ascending order, γ_i should be set as

γ_i = (k/2) d_{i,k+1} − (1/2) Σ_{j=1}^k d_ij    (17)

γ can then be set as the mean value of the γ_i

γ = (1/N) Σ_{i=1}^N γ_i    (18)

Then, we can construct the proposed adaptive graph. The weight matrix W is set as

W = (S + S^T) / 2    (19)

We also construct the diagonal matrix D with D_ii = Σ_j W_ij and the Laplacian matrix L = D − W.
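Putting the closed-form pieces together, the first pass of the adaptive graph (with the eigenvector distance term omitted) can be sketched as below; `adaptive_graph` is an illustrative name, and the epsilon guard against ties is our own addition:

```python
import numpy as np

def adaptive_graph(Y, k=5, eps=1e-12):
    """Adaptive probabilistic-neighbor graph, closed form per pixel.

    Y : (H, N) data matrix, one pixel per column.
    Returns the symmetric weights W, the Laplacian L, and the mean gamma.
    """
    N = Y.shape[1]
    sq = np.sum(Y**2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y), 0.0)
    np.fill_diagonal(d2, np.inf)           # a pixel is not its own neighbor

    S = np.zeros((N, N))
    gammas = np.empty(N)
    for i in range(N):
        idx = np.argsort(d2[i])            # neighbors ordered by distance
        di = d2[i, idx[:k + 1]]            # k nearest plus the (k+1)th
        gammas[i] = 0.5 * (k * di[k] - di[:k].sum())     # eq. for gamma_i
        denom = k * di[k] - di[:k].sum() + eps
        S[i, idx[:k]] = (di[k] - di[:k]) / denom         # s_ij, sums to one

    W = 0.5 * (S + S.T)                    # symmetric weight matrix
    D = np.diag(W.sum(axis=1))             # degree matrix
    L = D - W                              # graph Laplacian
    return W, L, gammas.mean()
```

The closed-form row `(di[k] - di[:k]) / denom` is algebraically equivalent to substituting the β and γ_i expressions into s_ij = (−d_ij/(2γ_i) + β)_+ for the k nearest neighbors.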

B. Adaptive Graph Regularized MLNMF
In this section, we describe the AGMLNMF approach. The objective function of AGMLNMF for the lth layer is

O_l = (1/2) ||Y_l − M_l R_l||_F^2 + λ_m ||M_l||_1 + λ_r ||R_l||_1 + (η/2) Tr(R_l L R_l^T)   s.t. M_l ≥ 0, R_l ≥ 0    (21)

where λ_m and λ_r denote the regularization parameters that balance the sparsity of the spectral signatures and the abundance fractions, and η is the adaptive graph regularization parameter. In this model, the first term measures the reconstruction error. The second and third terms enforce the sparseness of the endmember matrix and the abundance matrix. The last term regularizes the abundance with the adaptive graph. Let (Ψ_ik)_l and (Φ_jk)_l be the Lagrange multipliers for the constraints (M_ik)_l ≥ 0 and (R_jk)_l ≥ 0, respectively. Taking the partial derivatives of the Lagrangian of (21) over M_l and R_l gives

∂L/∂M_l = −Y_l R_l^T + M_l R_l R_l^T + λ_m + Ψ_l    (22)
∂L/∂R_l = −M_l^T Y_l + M_l^T M_l R_l + λ_r + η R_l L + Φ_l    (23)

Using the KKT conditions (ψ_ik M_ik)_l = 0 and (φ_jk R_jk)_l = 0, we obtain the updating rules

M_l ← M_l .* (Y_l R_l^T) ./ (M_l R_l R_l^T + λ_m)    (24)
R_l ← R_l .* (M_l^T Y_l + η R_l W) ./ (M_l^T M_l R_l + λ_r + η R_l D)    (25)

In our method, the purpose of the graph is to preserve the manifold of the hyperspectral images. Therefore, in the iterations, the weight matrix, the diagonal matrix, and the Laplacian matrix of each layer are set as W_l = W, D_l = D, and L_l = L, respectively.
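The per-layer updates can be sketched as follows; the initialization, iteration count, and parameter values are illustrative, and the sparsity terms are taken in their simple ℓ1 form, so this is a reading of the update rules rather than the article's exact implementation:

```python
import numpy as np

def agmlnmf_layer(Y, P, W, n_iter=300, lam_m=0.1, lam_r=0.1, eta=0.4,
                  eps=1e-9, seed=0):
    """One layer of adaptive-graph-regularized NMF (illustrative sketch).

    Y : (H, N) layer input; W : (N, N) symmetric adaptive-graph weights.
    """
    rng = np.random.default_rng(seed)
    M = rng.random((Y.shape[0], P))
    R = rng.random((P, Y.shape[1]))
    D = np.diag(W.sum(axis=1))          # degree matrix of the adaptive graph
    for _ in range(n_iter):
        M *= (Y @ R.T) / (M @ R @ R.T + lam_m + eps)
        # graph term: eta*R@W enters the numerator, eta*R@D the denominator,
        # since the gradient of (eta/2)Tr(R L R^T) is eta*R(D - W)
        R *= (M.T @ Y + eta * R @ W) / (M.T @ M @ R + lam_r + eta * R @ D + eps)
    return M, R
```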

C. Implementation Details
Our method initializes the endmember matrix M and the abundance matrix R with random numbers between 0 and 1. On the synthetic dataset, the number of endmembers is set to the number used to generate the synthetic data. On real-world data, the number of endmembers is fixed following several previous studies [50].
Two stopping criteria are used to control the iterative optimization. The first is the maximum iteration number, which is set to 3000. The second is the difference of the objective function between the current and the previous iterations

| O^(t) − O^(t−1) | ≤ ε

where ε is set to 10^−4 in the experiments. The optimization process terminates once either criterion is met. Fig. 2 shows the convergence curve of each layer (five layers in total) on a synthetic dataset. It can be seen that the errors drop quickly before iteration 25 and change only slightly thereafter. The procedure of our adaptive graph regularized NMF method is described in Algorithm 1.
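The dual stopping rule can be expressed as a small driver loop; `run_until_converged` is a hypothetical helper, and treating the criterion as the absolute change of the objective is our assumption:

```python
def run_until_converged(step, objective, max_iter=3000, tol=1e-4):
    """Iterate `step` until max_iter is hit or the objective change drops below tol."""
    prev = objective()
    iters = 0
    for _ in range(max_iter):        # first stopping criterion
        step()
        iters += 1
        cur = objective()
        if abs(prev - cur) < tol:    # second stopping criterion
            break
        prev = cur
    return iters
```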

IV. EXPERIMENTS
We have carefully designed experiments to analyze the performance of the proposed AGMLNMF method. Our method is compared with several baseline approaches, including VCA [10], L_1/2-NMF [26], manifold regularized NMF (MRS-NMF) [30], MLNMF [42], and graph regularized MLNMF (GMLNMF) [44]. All methods are evaluated using the spectral angle distance (SAD) and the root-mean-squared error (RMSE). SAD measures the similarity between the pth endmember signature M_p and its estimate M̂_p

SAD_p = arccos( (M_p^T M̂_p) / (||M_p|| ||M̂_p||) )

RMSE measures the distance between the estimated abundance R̂_p and the reference abundance

RMSE_p = sqrt( (1/N) || R_p − R̂_p ||^2 )

where R_p is the reference abundance matrix for the pth endmember.
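The two metrics can be written directly from their definitions; a short sketch in NumPy:

```python
import numpy as np

def sad(m_ref, m_est):
    """Spectral angle distance between a reference and an estimated signature."""
    cos = np.dot(m_ref, m_est) / (np.linalg.norm(m_ref) * np.linalg.norm(m_est))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards float round-off

def rmse(r_ref, r_est):
    """Root-mean-squared error between reference and estimated abundances."""
    return np.sqrt(np.mean((np.asarray(r_ref) - np.asarray(r_est)) ** 2))
```

Note that SAD is invariant to the scale of the signatures, which is why it suits endmember comparison.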

A. Experiments on Synthetic Data
Experiments on synthetic data allow an objective evaluation of all methods since the ground truth is known from the data generation process. Here, we follow the method of Miao and Qi [51], which is widely used for synthetic data generation. Detailed steps of this approach can be found in the article by Qian et al. [26]. We selected six spectral signatures from the USGS digital spectral library [52] to generate the synthetic data (see Figs. 3 and 4).

1) Experiment 1 (Parameters):
In our method, λ_m and λ_r in (21) are the parameters that control the sparsity of matrices M and R. In our experiments, we set them as in [42]. Because a large number of endmembers may lead to highly mixed data, an adequate number of layers is needed to better reconstruct such data. We first evaluated the optimal number of layers. We ran our algorithm ten times with different numbers of endmembers (4 to 8), calculated the mean SAD and RMSE, and report them in Fig. 5. From Fig. 5, we can see that five layers are best for SAD and six layers are best for RMSE. Therefore, we use five layers in the rest of our experiments. η in (21) is the parameter that controls the weight of the adaptive graph. We varied η from 0.1 to 0.5 with an interval of 0.05 to show its influence. To illustrate the robustness of η, we tested numbers of endmembers from 4 to 8 and calculated the mean results. From Fig. 6, we can see that the optimal value of η is between 0.3 and 0.4 for SAD and between 0.3 and 0.5 for RMSE. Therefore, we set η to 0.4 in our experiments. Another parameter is K, the number of neighbors used to determine the initial graph. We varied K from 3 to 15 and report the mean SAD and RMSE values calculated over 4 to 8 endmembers. Fig. 7 shows that the optimal value is 5 for SAD. Therefore, we set K to 5 in the rest of the experiments.

2) Experiment 2 (Comparison of Different Methods):
In this experiment, several baseline methods are compared with the proposed method, including VCA [10], L_1/2-NMF [26], MRS-NMF [30], MLNMF [42], and GMLNMF [44]. We set the SNR to 25 dB and the total number of endmembers P to 5. Each method is run ten times with different initializations. The means and standard deviations of SAD and RMSE are calculated, as shown by the bars and error lines in Fig. 8.
From this figure, we can see that the SAD and RMSE performance of all MLNMF-based methods is better than that of the NMF-based methods. This verifies that MLNMF-based methods can extract more information than NMF-based methods. The performance of MRS-NMF is marginally better than L_1/2-NMF because it includes the manifold information between the pixels of the hyperspectral data and the abundance matrix. The result of GMLNMF is better than MLNMF because it incorporates the manifold information during the unmixing. The proposed AGMLNMF method performs better than all other methods. This is because we construct the adaptive graph according to the data to regularize MLNMF. Fig. 9(a) shows the comparison of the spectral signatures estimated by AGMLNMF (solid lines) and the reference signatures extracted from the library (bold solid lines); Fig. 9(b) shows the comparison of the spectral signatures estimated by GMLNMF (dashed lines) and the reference signatures (bold solid lines). From the figure, we can see that the results generated by our proposed AGMLNMF and by GMLNMF are indeed close to each other, which is consistent with the SAD scores. However, for most endmembers, our method is relatively closer to the reference signatures, which shows the advantage of the adaptive neighbors.
3) Experiment 3 (Robustness to Noise): This experiment aims to verify the robustness of the methods under comparison. We applied different levels of SNR to the synthetic data: 15, 25, 35, and 45 dB, and infinity (noise free). The SAD and RMSE results from the different methods are shown in Fig. 10. From the figures, we can see that the performance of all MLNMF-based methods is better than that of the NMF methods. This is because the multilayer structure can better characterize the highly mixed data. Moreover, the results of GMLNMF are better than MLNMF because it maintains the manifold structure through the graph. Fig. 10 also shows that our method significantly outperforms all other methods on endmember and abundance estimation. The improvement is due to several reasons. First, our method assigns adaptive and optimal neighbors to each data point. When the noise level is high, this ensures that the neighbors are close to each other and come from the same material, making it better than the KNN used in the GMLNMF method. Second, instead of a heat kernel, our method assigns weights based on the neighbors' local distances. This ensures that each neighbor gets an adequate weight, which enhances the performance under high levels of noise. Finally, the adaptive graph is applied to the multilayer structure, which can extract intrinsic information from the hyperspectral imagery. Therefore, no matter how the noise level changes, our proposed method is more robust to noise than the alternative approaches.

4) Experiment 4 (Influence of Number of Endmembers):
The goal of this experiment is to evaluate how our method responds to changes in the number of endmembers. In this experiment, we varied the total number of endmembers from 4 to 8. We ran the experiments ten times and then calculated the mean values and standard deviations of SAD and RMSE. The results in Fig. 11(a) show how the SAD changes with respect to different numbers of endmembers. Fig. 11(b) shows the influence of the number of endmembers on the accuracy of the abundance estimation. Fig. 11 shows that no matter how the number of endmembers changes, our AGMLNMF method consistently achieves better SAD and RMSE than the other methods under comparison.

B. Experiments on Remote Sensing Data
We used three real-world datasets to evaluate the different methods: Jasper, Washington DC, and Cuprite. For the Jasper and Washington DC datasets, we followed Jia's method [32] to obtain the reference endmembers. This method calculates the reference endmember spectra as the mean values of signatures manually chosen from the hyperspectral data. The reference endmembers for the Cuprite data were obtained from the USGS library. The reference abundance maps were computed by the CVX MATLAB software based on the obtained reference endmembers. To match the reference endmembers with the estimated endmembers, we calculated the correlation between the endmembers and then assigned each estimated endmember to the nearest reference endmember. Because no ground truth on the abundance is available, we only display the abundance map of each endmember for qualitative comparison.
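The correlation-based matching step can be sketched as a simple greedy assignment; `match_endmembers` is a hypothetical helper, and the greedy one-to-one strategy is our assumption about how "nearest" is resolved:

```python
import numpy as np

def match_endmembers(M_ref, M_est):
    """Greedy one-to-one matching of estimated to reference endmembers
    by highest correlation. Returns perm with perm[i] = index of the
    estimated endmember assigned to reference i."""
    P = M_ref.shape[1]
    # cross-correlation block: rows are references, columns are estimates
    C = np.corrcoef(M_ref.T, M_est.T)[:P, P:].copy()
    perm = np.empty(P, dtype=int)
    for _ in range(P):
        i, j = np.unravel_index(np.argmax(C), C.shape)
        perm[i] = j
        C[i, :] = -np.inf           # remove the matched row and column
        C[:, j] = -np.inf
    return perm
```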
1) Jasper Ridge Dataset: The Jasper Ridge dataset contains 224 bands covering the wavelengths from 380 to 2500 nm. Its spectral resolution is 10 nm and each band image is 512 × 614 pixels. In the experiments, we cropped the original image to 100 × 100 pixels starting from coordinates (105, 269) [50]. In the experiment, we calculated the mean SAD values and the standard deviations of the four endmembers. The results in Table I show that, in terms of mean SAD, the proposed method outperforms the baseline methods VCA, L_1/2-NMF, MRS-NMF, MLNMF, and GMLNMF. Moreover, our method also produced the best SAD values on three out of four endmembers. This is because our method not only better characterizes the mixed data with multiple layers but also maintains better manifold information with the adaptive graph. Fig. 13 shows the estimated endmember signatures against the references. Fig. 14 shows the reference abundance maps and the estimated abundance maps. From these figures, it can be seen that the abundance maps and endmember signatures produced by our method are very close to the references.
2) Washington DC Mall Dataset: The Washington DC Mall dataset was captured by the Hyperspectral Digital Imagery Collection Experiment sensor over an urban area. Due to the large size of the raw image, we cropped the original image to a subimage of size 150 × 150. Similar to the operation on the Jasper Ridge dataset, the low-SNR bands 103−106, 138−148, and 207−210 were removed, producing an image of 191 bands. The subimages at the bands of 0.66, 0.92, and 1.21 μm are displayed in Fig. 15. Following Jia and Qian [32], five endmembers are set for the experiments: tree, grass, water, roof, and street.
In the experiment, we calculated the mean SAD values and their standard deviations over these five endmembers. The results in Table II show that the proposed AGMLNMF method outperforms the alternative approaches in terms of mean SAD. Furthermore, it also achieves the best SAD values on two of the five endmembers. Fig. 16 shows that the estimated endmembers and the references are very close to each other. Fig. 17(a)-(e) shows the abundance maps of the five endmembers compared with the reference abundance maps in Fig. 17(f)-(j). From the figures, we can see that the estimated abundance maps are similar to the references. This, again, shows the advantage of our method over the baselines.
3) Cuprite Dataset: We also report results on the Cuprite dataset, which was captured by the AVIRIS sensor. There are 12 endmembers in the dataset. Table III lists the SADs and standard deviations of the different methods. The proposed method produces the best mean SAD value and the best SAD values on 5 of the 12 endmembers. Fig. 19 shows the estimated spectra and the reference spectra of the Cuprite data. Fig. 20 shows the abundance maps of the Cuprite data.
From the synthetic and real-world data experiments, we can see that our method achieves better performance than the alternatives. The main reason is that our method can extract better structural information when the data are highly mixed. In our method, highly mixed pixels are usually assigned as neighbors according to the nearest material in the hyperspectral image, whereas the traditional graph method cannot consistently achieve this. For example, in Fig. 15, the mixed pixels of grass are assigned weights based on the pixels of grass in the image, whereas the traditional graph method fails to do so, and pixels of trees may be assigned as the neighbors of grass.

TABLE III
SADS AND STANDARD DEVIATIONS FROM DIFFERENT METHODS ON CUPRITE DATASET
The bold texts are the best SAD results for each endmember.

V. CONCLUSION
In this article, AGMLNMF has been proposed for hyperspectral unmixing. This method generates an adaptive graph according to the probabilities among connected points. By integrating the adaptive graph, more comprehensive neighborhood information can be extracted, which yields a better graph than alternative graph-based methods. We verified the effectiveness of this approach with a series of experiments on both synthetic and real data. The proposed method achieved better performance than several baseline methods. In the future, we will extend our method to a deep matrix factorization model for hyperspectral unmixing.

Fig. 2 .
Fig. 2. Convergence curve of each layer of AGMLNMF. (a) Layer 1. (b) Layer 2. (c) Layer 3. (d) Layer 4. (e) Layer 5.

TABLE I
SADS AND STANDARD DEVIATIONS FROM DIFFERENT METHODS ON JASPER DATASET
The bold texts are the best SAD results for each endmember.

TABLE II
SADS AND STANDARD DEVIATIONS FROM DIFFERENT METHODS ON WASHINGTON DC DATASET
The bold texts are the best SAD results for each endmember.