HYPERSPECTRAL UNMIXING BY THE ALTERNATING DIRECTION METHOD OF MULTIPLIERS

Abstract. We have developed a method for hyperspectral image data unmixing that requires neither pure pixels nor any prior knowledge about the data. The problem is formulated as a biconvex constrained optimization and solved with the well-established Alternating Direction Method of Multipliers, with the constraints enforced by Bregman splitting. The resulting algorithm estimates the spectral and spatial structure in the image through a numerically stable iterative approach that removes the need for separate endmember and spatial abundance estimation steps. The method is illustrated on data collected by the SpecTIR imaging sensor.


1. Introduction. Hyperspectral (HS) imaging is an important technology for many commercial and military applications such as natural resource discovery and intelligence gathering. Images typically contain hundreds of thousands of discrete pixels, each with a wavelength spectrum spanning hundreds of spectral bands. Usually, the images contain many different topographical features such as soil, vegetation, and water in addition to man-made features such as roads and buildings. These features all have distinctive spectral properties that must be separated by processing algorithms, ideally using as little prior knowledge as possible. The term unmixing refers to this separation.
Because many pixels contain wavelength spectra that are a composite of two or more basic components in the image, algorithms are needed to perform the unmixing automatically. The usual approach is to locate the endmember pixels that have a pure spectrum from a single material. If those pure pixels exist and can be identified, least-squares methods can be used to find the image spatial regions associated with each material. Standard simplex-based algorithms such as N-findr [1] and VCA [2] work reasonably well under high-SNR conditions if pure pixels exist.
The problem with standard endmember selection methods is that pure pixels may not exist due to inadequate spatial resolution of the imaging sensor, or noise that can lead to poor spectral estimates at a single pixel. This was already recognized by the likely originator of the term "endmember," Schowengerdt [3]: "... endmembers only exist as a conceptual convenience and as idealizations in real images." Also, the concept of a pure pixel may not have meaning in the LWIR spectral region.
Nevertheless, an enormous literature has been produced with endmembers as one of the central concepts underlying contemporary hyperspectral processing. The recent treatise on hyperspectral processing methods by Cheng [4], with over 500 references, discusses endmember-based unmixing methods extensively and emphasizes their importance.
Although approaches exist that do not require the pure-pixel assumption, they typically require either training data for supervised learning of the material spectra or large dictionaries of possible spectral signatures. The latter algorithms attempt to use sophisticated learning methods to select a sparse subset of spectra from the dictionary that adequately captures the spectral content of the image. Both approaches require additional information about the image that may not exist in practice. Other methods for relaxing the pure-pixel assumption perform endmember estimation by data transformations [5] or nonnegative matrix factorization [6] applied to simplex models of the data mixing. Those models constrain the abundances to be positive values summing to 1 over the endmembers at each pixel. Because of this constraint, simplex-based methods require an additional, nontrivial processing step to estimate the actual spatial concentrations of each endmember component once the endmember spectra are found. Our approach performs both the spectral and spatial estimations in one step.
There are several challenges to developing an efficient and reliable algorithm for HS unmixing without prior knowledge about the spectral or spatial structure of the data. The most significant challenge is that the unknown spectral and spatial components enter the signal model as a product, thereby producing a biconvex optimization problem: convex in either component given the other, but not jointly convex. Fortunately, there is a well-established optimization framework, the Alternating Direction Method of Multipliers [7] (ADMM), that can accommodate the biconvex unmixing problem. Its implementation here leads to a very natural alternating structure with constraints on the parameter arrays enforced through an augmented Lagrangian with Bregman splitting.
Constraints are needed to produce physically meaningful and reliable results. The constraints chosen here are positivity of the spatial and spectral estimates and a unit-vector constraint on the spectrum of each material. The latter replaces the convexity (simplex) constraints commonly imposed on the abundances and is felt to be more natural physically. We enforce the unit norms of the spectra as equality constraints using SOC (splitting orthogonality constraints), a Bregman splitting method of Lai and Osher [8] that introduces auxiliary variables to split the positivity constraints from the estimation of the concentrations and the spectra. We have also found that improved results are obtained by introducing an l1 total variation (TV) regularization of the spectral estimates, likewise implemented through Bregman splitting and soft thresholding, following Goldstein and Osher [9].
Another processing challenge is the sheer size of the typical data cube. The test data set used to illustrate our approach has 192,000 pixels, each having 360 spectral bands. Processing these data with an iterative algorithm is a genuine challenge for a PC. We have found that substantial computation-time savings result from using uniform spatial subsampling for the unmixing, with little or no degradation of the spectral estimates. The spatial structure at full resolution can then be found by processing the full data cube with the estimated spectra.
The remainder of the paper is organized as follows. In section 2 we derive the HS unmixing algorithm using ADMM with an augmented Lagrangian constructed to implement the constraints and TV regularization of the spectral array. Section 3 assesses the performance of the ADMM unmixing algorithm with respect to several issues using HS test data collected by the SpecTIR sensor over Beltsville, MD. We summarize and discuss generalizations in section 4. Section 5 is an appendix deriving a spectral expansion method for numerically solving the Sylvester equation required by the unmixing algorithm.
2. ADMM unmixing algorithm derivation. As noted above, our optimization problem is biconvex, and as such we cannot expect to achieve globally unique estimates of the spectral and spatial arrays ρ and C; we must settle for finding local minimizers that will be dependent on the initialization of the spectral array ρ. Because of the need to estimate both ρ and C alternately from the same data cube, ADMM is the natural framework to use for the optimization.
Given the HS image data G ∈ R^{M×N}, where M denotes the number of spectral bands and N = N_1 N_2 is the number of pixels after stacking the N_1 rows and N_2 columns into a single vector, we model G as G = ρC + ν with ρ ∈ R^{M×L} and C ∈ R^{L×N}. Here L is the number of assumed materials in the image, and ν is additive noise with zero mean and bounded variance. We then have the constrained problem

(1)  min_{ρ, C ≥ 0} max_m  (1/2)‖G − ρC‖_F^2 + α‖∇ρ‖_1 − ⟨m, c(ρ)⟩ + (λ_m/2) Σ_{l=1}^{L} c_l(ρ)^2,

with m a dual Lagrange multiplier, and c_l(ρ) = ‖ρ_l‖ − 1, 1 ≤ l ≤ L, a unit-norm equality constraint on the spectrum of each material. The first term is the data fidelity component, the second is an l1-regularization term on the derivative (total variation) of the spectral wavelength dependence, and the last two terms are the augmented Lagrangian terms that enforce the unit normalization. The parameter α controls the balance between data fidelity and spectral smoothness.
An important feature of our method is the use of Bregman splitting to enforce the non-negativity constraints on ρ and C as well as the TV regularization of ρ. Analogous to the use of splitting for basis pursuit or other convex optimizations, the SOC (Splitting Orthogonality Constraint) method introduces additional parameters into the objective function that allow a difficult constrained optimization to be broken into simpler problems that often have either analytical or easily implemented iterative solutions.
For the HS unmixing problem we introduce three new sets of parameters: one set r = ρ for the spectra, another set s = ∇ρ for their derivatives, and a set e = C for the spatial concentration array. Letting ρ^k, r^k, C^k, e^k, s^k denote the parameter estimates at iteration k, we have the problem

(2)  min_{ρ, r, C, e, s}  J(C, ρ) + α‖s‖_1  subject to  r = ρ, e = C, s = ∇ρ, r ≥ 0, e ≥ 0, ‖ρ_l‖ = 1 (equivalently ‖r_l‖ = 1), 1 ≤ l ≤ L,

with J(C, ρ) = (1/2)‖G − ρC‖_F^2, where the subscript F denotes the Frobenius (Hilbert-Schmidt) matrix norm.
The augmented Lagrangian for this problem is then

(3)  L(C, e, p, ρ, r, q, s, n, m) = J(C, ρ) + α‖s‖_1 − ⟨m, c(ρ)⟩ + (λ_m/2) Σ_{l=1}^{L} c_l(ρ)^2 − ⟨p, C − e⟩ + (λ_C/2)‖C − e‖_F^2 + I_+(e) + I_+(r) − ⟨q, ρ − r⟩ + (λ_ρ/2)‖ρ − r‖_F^2 − ⟨n, ∇ρ − s⟩ + (λ_s/2)‖∇ρ − s‖_F^2,

with dual Lagrange multiplier parameters m, p, q, and n that enforce the equality constraints. The Lagrange multipliers are updated within the iterative ADMM framework by the method of multipliers developed independently by Hestenes [10] and Powell [11] for equality constraints, and later generalized to inequality constraints by Rockafellar [12]. The fourth, sixth, tenth, and twelfth terms on the right are the quadratic penalty terms that augment the classical Lagrangian. Their addition promotes the constraint enforcement in (2), adds numerical stability to the solution, and provides a systematic method for locally optimizing both the multipliers and {ρ, C} in parallel. By virtue of the additional parameters e, r, and s, we have an unconstrained optimization over C and an equality-constrained optimization over ρ, with the remaining constraints easily enforced on e, r, and s analytically. The parameters λ_m, λ_C, λ_ρ, and λ_s are termed penalty parameters and, within the context of augmented Lagrangian theory, are updated at each iteration k by a factor γ > 1 such that λ_i^k = γλ_i^{k−1}. Reference [13] discusses methods for iteratively updating the penalty parameters; increasing the penalties at each iteration significantly improves the convergence of the algorithm. The seventh and eighth terms are indicator functions, defined elementwise as I_+(e) = 0 for e ≥ 0 and I_+(e) = ∞ for e < 0.
From the structure of (3) we see that the total problem at a given iteration k breaks into two saddle-point subproblems: finding C^k, e^k, p^k given ρ^{k−1}, et al., and finding ρ^k, r^k, q^k, s^k, n^k, m^k given C^k, et al. For the first subproblem we have

(4)  (C^k, e^k, p^k) = arg max_p min_{C, e≥0} L(C, e, p, ρ^{k−1}, r^{k−1}, q^{k−1}, s^{k−1}, n^{k−1}, m^{k−1}).

To solve (4) we use an iterative approach, computing updates to C, e, and p in that order, with ‖C^k − C^{k−1}‖_F as the convergence criterion. In more detail, from

(5)  ∇_C L = ρ^{(k−1)T}(ρ^{k−1}C − G) − p^{k−1} + λ_C^{k−1}(C − e^{k−1}) = 0

we find

(6)  C^k = (ρ^{(k−1)T}ρ^{k−1} + λ_C^{k−1} I)^{−1}(ρ^{(k−1)T}G + p^{k−1} + λ_C^{k−1} e^{k−1}).

Similarly, ∇_e L = p^{k−1} − λ_C^{k−1}(C^k − e) = 0 and the positivity constraint give the projection onto the positive halfspace

(7)  e^k = max(0, C^k − p^{k−1}/λ_C^{k−1}).

From the general theory of augmented Lagrangians [13] we get

(8)  p^k = p^{k−1} − λ_C^{k−1}(C^k − e^k),  λ_C^k = γλ_C^{k−1}.

The second subproblem assumes the form

(9)  (ρ^k, r^k, q^k, s^k, n^k, m^k) = arg max_{q,n,m} min_{ρ, s, r≥0, ‖r_i‖=1, 1≤i≤L} L(C^k, e^k, p^k, ρ, r, q, s, n, m).
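As a concrete sketch of the first subproblem's inner recursion, the following NumPy code (the paper's own computations use Matlab) performs a least-squares C step, the projection of e onto the positive halfspace, and the multiplier update for p. The function name, the fixed penalty parameter, and the stopping rule are ours, and the sign conventions follow our reading of the text rather than the authors' code.

```python
import numpy as np

def update_C_e_p(G, rho, e, p, lam_C, n_iter=100, tol=1e-8):
    """Inner recursion of the first subproblem with rho held fixed (a sketch).

    G : M x N data, rho : M x L spectra, e, p : L x N splitting variable
    and multiplier, lam_C : penalty parameter (kept fixed here)."""
    L = rho.shape[1]
    A = rho.T @ rho + lam_C * np.eye(L)   # normal-equations matrix for the C step
    C_old = None
    for _ in range(n_iter):
        # unconstrained least-squares update of C
        C = np.linalg.solve(A, rho.T @ G + p + lam_C * e)
        # projection of the splitting variable onto the positive halfspace
        e = np.maximum(0.0, C - p / lam_C)
        # method-of-multipliers update of the dual variable
        p = p - lam_C * (C - e)
        if C_old is not None and np.linalg.norm(C - C_old) < tol:
            break
        C_old = C
    return C, e, p
```

With ρ held fixed this is an ADMM solve of a convex nonnegative least-squares problem, so the inner loop converges for any positive penalty parameter.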
Analogous to the first subproblem, we solve (9) by iteratively updating ρ, m, r, q, s, and n in that order, with ‖ρ^k − ρ^{k−1}‖_F as the convergence criterion. Specifically, differentiating L with respect to ρ gives

(10)  ∇_ρ L = (ρC^k − G)C^{kT} − q^{k−1} + λ_ρ^{k−1}(ρ − r^{k−1}) + ∇^T[λ_s^{k−1}(∇ρ − s^{k−1}) − n^{k−1}] + U(ρ),

where the lth column of U(ρ) is [λ_m^{k−1}(‖ρ_l‖ − 1) − m_l^{k−1}] ρ_l/‖ρ_l‖, and, after collecting terms, with the unit-norm gradient terms evaluated at ρ^{k−1}, we find the equation for ρ^k

(11)  λ_s^{k−1} ∇^T∇ρ^k + ρ^k(C^k C^{kT} + λ_ρ^{k−1} I) = GC^{kT} + q^{k−1} + λ_ρ^{k−1} r^{k−1} + ∇^T(λ_s^{k−1} s^{k−1} + n^{k−1}) − U(ρ^{k−1}).

Equation (11) for ρ^k is Sylvester's equation, and can be solved numerically in Matlab using the Hessenberg-Schur algorithm [14] or the simpler, but possibly less numerically stable, spectral method given in the appendix. The associated update equations for m and λ_m are

(12)  m_l^k = m_l^{k−1} − λ_m^{k−1}(‖ρ_l^k‖ − 1), 1 ≤ l ≤ L,  λ_m^k = γλ_m^{k−1}.

From ∇_r L = q^{k−1} − λ_ρ^{k−1}(ρ^k − r) = 0 and the positive-orthant sphere constraint we get

(13)  r_l^k = P_+(ρ_l^k − q_l^{k−1}/λ_ρ^{k−1}) / ‖P_+(ρ_l^k − q_l^{k−1}/λ_ρ^{k−1})‖, 1 ≤ l ≤ L,

where P_+ denotes projection onto the positive orthant. The Lagrange multiplier updates for q and λ_ρ are then

(14)  q^k = q^{k−1} − λ_ρ^{k−1}(ρ^k − r^k),  λ_ρ^k = γλ_ρ^{k−1}.

For updating s we use soft thresholding through the shrinkage mapping S(x, a) ≡ sign(x) max(|x| − a, 0) to find

(15)  s^k = S(∇ρ^k − n^{k−1}/λ_s^{k−1}, α/λ_s^{k−1}).

Finally, the Lagrange dual variable n and λ_s are updated as

(16)  n^k = n^{k−1} − λ_s^{k−1}(∇ρ^k − s^k),  λ_s^k = γλ_s^{k−1}.

Both subproblem recursions, for C by (6)-(8) and for ρ by (11)-(16), are iterated to convergence (with r, e, s, p, q, m, and n initialized at 0) within an outer loop that alternates between the subproblems. The spectral estimates ρ^0 are initialized by positive, unit-norm vectors computed from the VCA algorithm; random positive unit-norm vectors are an alternative, but VCA estimates tend to give faster convergence. The differentiation operator ∇ is implemented as a two-point difference approximation. Also, spatial subsampling is used, with typically every tenth pixel, for estimating the spectra from (11)-(16), followed by a final iteration of (6)-(8) for estimating C at full resolution. A summary of the overall algorithm is given in Box 1.
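Two small pieces used above, the shrinkage mapping S(x, a) and the two-point difference approximation of ∇, can be written directly; a NumPy sketch (function names are ours):

```python
import numpy as np

def shrink(x, a):
    """Soft-thresholding (shrinkage) mapping S(x, a) = sign(x) * max(|x| - a, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

def diff_matrix(M):
    """Two-point (forward) difference approximation of the derivative,
    mapping a length-M spectrum to its M-1 band-to-band differences."""
    D = np.zeros((M - 1, M))
    idx = np.arange(M - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D
```

The s-update is then a single call to shrink applied to the thresholded difference of the current spectral estimate.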
We make the following remarks about the algorithm in Box 1. The HS data cube is denoted by the three-dimensional array [R(j, k_1, k_2)] where 1 ≤ j ≤ M, 1 ≤ k_1 ≤ N_1, and 1 ≤ k_2 ≤ N_2. For the unmixing, R is subsampled by every N_samp pixels to get G after stacking the spatial columns into a single row vector. This operation is denoted by · . The last part of the algorithm estimates C at full resolution using the spectral estimates from the unmixing, taking G to be the full data cube R, after spatial stacking.
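The stacking and subsampling operation described above is essentially a reshape in NumPy; a sketch (names ours), with the cube stored as an M × N_1 × N_2 array:

```python
import numpy as np

def stack_and_subsample(R, nsamp):
    """Stack the N1 x N2 spatial dimensions of the cube R (M x N1 x N2)
    into columns of an M x (N1*N2) matrix, then keep every nsamp-th pixel."""
    M = R.shape[0]
    G_full = R.reshape(M, -1)    # spatial stacking into one pixel index
    return G_full[:, ::nsamp]    # uniform spatial subsampling
```

The full-resolution spatial estimation at the end of the algorithm corresponds to nsamp = 1.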
3. ADMM unmixing assessment using HS data. Given the biconvexity of the ADMM unmixing algorithm above, there are several potential issues with the method that need to be addressed. They include:
• the dependence of the spectral estimates on the use of spatial subsampling versus the full data cube,
• the variability of the spectral estimates under different initializations,
• the stability of the algorithm as the number of materials sought increases,
• the physical significance of the spectral estimates,
• the value of adding the unit-vector constraint on ρ and the total variation regularization to the spectral estimation.
Because of the difficulty in providing analytic answers to these questions, we use Matlab calculations on a data cube collected by the state-of-the-art SpecTIR sensor using 360 spectral bands with about 5 nm resolution between about 393 nm and 2.5 µm. We first address the spatial subsampling question. Figure 1 shows an image of radiance data from the Beltsville, MD cube having 600 by 320 pixels. The picture and data were taken from the SpecTIR Website, www.spectir.com.
We compared the subsampled (N_samp = 10) and full-resolution spectral estimates on these data using 20 trials with VCA initialization and initial penalty parameter choices λ_ρ = 300, λ_C = 0.01, α = 0.3, λ_s = 0.3, λ_m = 0.04, and γ = 1.1 for 6 materials. Figures 2 and 3 compare the full-resolution mean and standard-error estimates (top) with the spatially subsampled estimates (bottom). Although there was some difference in the standard-error estimates, the mean values are very close. The mean unmixing times were 782.4 s (full) and 1.026 s (subsampled). The results indicate that using only 1% of the spatial information gave satisfactory results on these data in much shorter times.
A related concern is the sensitivity of the spectral estimates to the initialization spectra. We address that issue next using the Beltsville data cube with the ADMM algorithm (with spatial subsampling) run 20 times for different VCA initializations using the parameter choices above for 6 materials. Figure 4 plots a superposition of the spectral estimates after permuting the color order to match that of the first trial. We see that the spectral variability from initialization is similar to that seen in Figure 3. It is also similar to the variability observed from running the VCA endmember algorithm repeatedly on the same data. As a side point, we note that the spectral variability from different initializations can be reduced greatly by averaging the estimates from a few (3 to 5) trials. Figures 5 and 6 compare the spatial estimates at the subsampled and full resolution from the spectra at the first trial. The discrete pixel estimates are clearly evident in Figure 5 but not Figure 6.
We next examine the unmixing stability as the number of material components is increased from 2 to 20. Figure 7 plots the fitting error J at convergence, with J ≡ ‖G − ρC‖_F^2/(MN), where C is computed at full spatial resolution. The error is seen to decrease smoothly with material number. This indicates that the algorithm remains stable as the number of materials increases. In effect, the unmixing algorithm is extracting more information from the data set with increasing material numbers until the noise floor in the data is reached. Evidently, about 10-12 materials are enough to capture most of the structure in the image, but the algorithm can still produce stable results for at least 20 materials. Figures 8 and 9 show the spectral and spatial estimates for 20 materials. Three plots are used for the spectra of materials 1:7, 8:14, and 15:20, respectively. The unmixing algorithm converged in 34 iterations to a fitting error of 2.1e-5 in 8.96 s for the unmixing time. Processing was done on a Dell XPS8500 running at 3.4 GHz.
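The fitting error used for Figure 7 is straightforward to compute; a NumPy sketch of J = ‖G − ρC‖_F^2/(MN) (function name ours):

```python
import numpy as np

def fitting_error(G, rho, C):
    """Per-element fitting error J = ||G - rho C||_F^2 / (M N)."""
    M, N = G.shape
    return np.linalg.norm(G - rho @ C, 'fro') ** 2 / (M * N)
```
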
The fitting error J provides a means for selecting the number of materials, and the results here suggest that the exact number chosen is not significant due to the numerical stability of the algorithm. Figure 10 plots the monotonic decrease in the spectral norm difference Δρ^k ≡ ‖ρ^k − ρ^{k−1}‖_F between iterations for 20 materials. This monotonic decrease is another indication of the stability of the algorithm.
The question of the physical significance of the unmixing results would best be addressed using data having known spectral signatures. In the absence of such data, we applied the unmixing algorithm twice, once to estimate ρ and C, and again to unmix data made from their product ρC. Applying this idea to the Beltsville data for 4 materials gave the results in Figures 11 and 12. Figure 11 compares the initially inferred spectra (top) with their estimates from the second unmixing (bottom). Figure 12 compares the spatial estimates for the first component as input (left) and inferred (right). We see close agreement in both cases, suggesting that the estimates have an intrinsic meaning.
Lastly, we look at the value of adding the explicit unit-norm equality constraint ‖ρ_l‖ = 1 and the total variation l1-regularization of ρ to the unmixing algorithm. In Figure 13 the fitting error at convergence for 20 trials of the present algorithm (HS-8), with both the unit-norm constraint and TV regularization of ρ, is compared with an earlier algorithm [15] (HS-4) having neither, but otherwise the same. The TV norm of ρ is defined as ‖∇ρ‖_{1,l} ≡ Σ_{j=1}^{M} |∇ρ_{j,l}|. The number of materials was 6 here. The substantial improvement in the fitting error from adding the TV and unit-norm terms for ρ to the augmented Lagrangian is evident.

4. Summary and generalizations. The hyperspectral unmixing algorithm described here is based on ADMM using a biconvex augmented Lagrangian, with positivity and unit-norm constraints enforced through the splitting orthogonality constraints (SOC) method of Lai and Osher [8], and with total variation l1 regularization of the spectral estimates. The resulting algorithm is an efficient and reliable unmixing method that requires no prior information about the data, such as its spectral content or the assumption of pure pixels. We have found that spatial subsampling using as little as 1% of the spatial information for the unmixing can give almost three orders of magnitude improvement in computation time with little or no degradation in the quality of the spectral estimates. Spatial estimates at full image resolution are easily computed from the spectra by running the spatial-estimation half of the ADMM algorithm on the full data cube. We have achieved numerically stable results from the ADMM unmixing algorithm on SpecTIR data with up to 20 materials.
The basic method presented here for fixed data cubes can be extended in several ways. First, the size of the cube can be generalized to data collections over arbitrarily long tracks from pushbroom sensors by using overlapped blocks of row-data as they become available for fixed numbers of columns (detectors) and spectral bands. The approach has been applied successfully to the SHARE data set [16] having 2805 rows with 320 columns and 360 bands. Second, the spectral estimates from the unmixing can be used as feature vectors in a support vector machine classifier. Because different spectral initializations give similar but distinct output spectra, we have developed an approach that uses the average of a few spectral initializations. We have found that using only 3-5 such initializations gives excellent classification results; the efficiency of the spatial subsampling method makes this computationally feasible. Third, the unmixing approach can be easily adapted for finding faint or small targets by first applying the basic algorithm to estimate and remove the clutter background, and then performing the unmixing again on the residuals. The method has been successful in detecting faint chemical plumes in background clutter [17]. The same method could be used for small targets such as landmines. Finally, the methods developed here for hyperspectral data have been used for processing multispectral lidar data containing mixtures of aerosol and vapor components.

5. Appendix: A spectral method for Sylvester's equation. The ρ update (11) requires solving a Sylvester equation of the form Bρ + ρA = C, where A ∈ R^{L×L}, B ∈ R^{M×M}, and C ∈ R^{M×L} are input matrices (C here denotes the right-hand side of the equation, not the concentration array). One method of solving this equation is to use a Kronecker matrix product construction that involves the inversion of an ML × ML matrix. For small matrices this works well, but for an implementation with M = 360 and L = 16, say, the processing times are too large for practical use within an iterative algorithm such as our ADMM unmixer.
We have developed a very efficient method for solving Sylvester's equation for the special case of symmetric matrices A and B, as they are in our unmixing problem. (A more general algorithm could be based on the SVD expansion.) In our case we represent A and B by their spectral representations

A = Σ_{k=1}^{L} α_k φ_k φ_k^T,  B = Σ_{j=1}^{M} β_j ψ_j ψ_j^T,

where α and β are the eigenvalue spectra and φ and ψ are the orthonormal eigenvectors of dimension L and M, respectively. We expand ρ and C in these eigenvectors as

ρ = Σ_{j=1}^{M} Σ_{k=1}^{L} a_{jk} ψ_j φ_k^T,  C = Σ_{j=1}^{M} Σ_{k=1}^{L} c_{jk} ψ_j φ_k^T,

where the expansion coefficients of C are given by orthonormality as c_{jk} = ψ_j^T C φ_k. Substituting the ρ expansion into the products with A and B, we have

Bρ = Σ_{j,k} β_j a_{jk} ψ_j φ_k^T,  ρA = Σ_{j,k} α_k a_{jk} ψ_j φ_k^T,

where φ_k^T φ_{k'} = δ_{kk'} (≡ 1 for k = k', ≡ 0 for k ≠ k') and similarly for ψ were used. From Sylvester's equation we get

Σ_{j,k} (β_j + α_k) a_{jk} ψ_j φ_k^T = Σ_{j,k} c_{jk} ψ_j φ_k^T.

Therefore, comparing coefficients we find

a_{jk} = c_{jk}/(β_j + α_k),  α_k ≠ −β_j.

However, since α_k ≥ 0 and β_j ≥ 0, α_k = −β_j would require both to vanish. This can be avoided by retaining only eigenvalues greater than the machine precision. We further note that for our iterative application, only A changes between iterations, so, since M > L, the larger matrix expansion for B needs to be done only once. The resulting algorithm applied to random symmetric matrices with M = 360 and L = 16 required 0.1 s, compared to the 2.7 s needed by the Kronecker method; the results differed by less than 2e-10. When applied to a test set of hyperspectral data with just L = 4 materials and 360 spectral bands, the Kronecker method required 1297 s for the unmixing, compared with 13 s using the eigenfunction algorithm.
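The eigenfunction algorithm above amounts to two symmetric eigendecompositions and an elementwise division. A NumPy sketch (the paper's timings were obtained in Matlab; names are ours) solving Bρ + ρA = C for symmetric A (L × L) and B (M × M):

```python
import numpy as np

def sylvester_sym(A, B, C):
    """Solve B @ X + X @ A = C for X (M x L), with A (L x L) and B (M x M)
    symmetric, via their eigendecompositions. Valid when no eigenvalue sum
    beta_j + alpha_k vanishes (guaranteed if A or B is positive definite)."""
    alpha, Phi = np.linalg.eigh(A)             # A = Phi diag(alpha) Phi^T
    beta, Psi = np.linalg.eigh(B)              # B = Psi diag(beta) Psi^T
    Ct = Psi.T @ C @ Phi                       # coefficients c_{jk}
    Y = Ct / (beta[:, None] + alpha[None, :])  # a_{jk} = c_{jk}/(beta_j + alpha_k)
    return Psi @ Y @ Phi.T                     # reassemble the solution
```

Since only A changes between ADMM iterations, the eigendecomposition of the larger matrix B can be computed once and reused, which is the source of the speedup reported above.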