Correntropy Maximization via ADMM - Application to Robust Hyperspectral Unmixing

In hyperspectral images, some spectral bands suffer from low signal-to-noise ratio due to noisy acquisition and atmospheric effects, thus requiring robust techniques for the unmixing problem. This paper presents a robust supervised spectral unmixing approach for hyperspectral images. The robustness is achieved by writing the unmixing problem as the maximization of the correntropy criterion subject to the most commonly used constraints. Two unmixing problems are derived: the first problem considers the fully-constrained unmixing, with both the non-negativity and sum-to-one constraints, while the second one deals with the non-negativity and the sparsity-promoting of the abundances. The corresponding optimization problems are solved efficiently using an alternating direction method of multipliers (ADMM) approach. Experiments on synthetic and real hyperspectral images validate the performance of the proposed algorithms for different scenarios, demonstrating that the correntropy-based unmixing is robust to outlier bands.

problems, by considering the maximization of the so-called correntropy [17], [18]. Due to its stability and robustness to noise and outliers, the correntropy maximization is based on theoretical foundations and has been successfully applied to a wide class of applications, including cancer clustering [19], face recognition [20], and recently hyperspectral unmixing [21], to name a few. In these works, the resulting problem is optimized by the half-quadratic technique [22], either in a supervised manner [20] or as an unsupervised nonnegative matrix factorization [19], [21].
In this paper, we consider the hyperspectral unmixing problem by defining an appropriate correntropy-based criterion, thus taking advantage of its robustness to large outliers, as opposed to the conventional ℓ 2 -norm criteria. By including constraints commonly used for physical interpretation, we propose to solve the resulting constrained optimization problems with alternating direction method of multipliers (ADMM) algorithms. Indeed, the ADMM approach splits a hard problem into a sequence of small and handful ones [13]. Its relevance to solve nonconvex problems was studied in [13,Section 9]. We show that ADMM provides a relevant framework for incorporating different constraints raised in the unmixing problem. We present the socalled CUSAL (for Correntropy-based Unmixing by variable Splitting and Augmented Lagrangian), and study in particularly two algorithms: CUSAL-FC to solve the fully-constrained (ANC and ASC) correntropy-based unmixing problem, and the CUSAL-SP to solve the sparsity-promoting correntropy-based unmixing problem.
The rest of the paper is organized as follows. We first provide a succinct survey on the classical unmixing problems in Section II. In Section III, we propose the correntropy-based unmixing problems subject to the aforementioned constraints, and study the robustness. The resulting optimization problems are solved by the ADMM algorithms described in Section IV.
Experiments on synthetic and real hyperspectral images are presented in Sections V and VI, respectively. Finally, Section VII provides some conclusions and future works.

II. CLASSICAL UNMIXING PROBLEMS
The linear mixture model (LMM) assumes that each spectrum can be expressed as a linear combination of a set of pure material spectra, termed endmembers [1]. Consider a hyperspectral image and let Y ∈ R L×T denote the matrix of the T pixels/spectra of L spectral bands. Let y * t be its t-th column and y l * its l-th row, representing the l-th band of all pixels. For notation simplicity, we denote y t = y * t , for t = 1, . . . , T . The LMM can be written as where M = [m 1 · · · m R ] ∈ R L×R is the matrix composed by the R endmembers with m r = [m 1r · · · m Lr ] ⊤ , x t = [x 1t · · · x Rt ] ⊤ is the abundance vector associated with the t-th pixel, and n t ∈ R L is the additive noise. In matrix form for In the following, the endmembers are assumed known, either from ground-truth information or by using any endmember extraction technique. The spectral unmixing problem consists in estimating the abundances for each pixel, often by solving the least-squares optimization problem for each t = 1, . . . , T , where · 2 denotes the conventional ℓ 2 -norm. The solution to this conventional least-squares problem is given by the pseudo-inverse of the (tall) endmember matrix, with x t = (M ⊤ M ) −1 M ⊤ y t . The least-squares optimization problems (2), for all t = 1, . . . , T , are often written in a single optimization problem using the following matrix formulation where · 2 F denotes the Frobenius norm. Its solution is Finally, this optimization problem can be also tackled by considering all the image pixels at each spectral band, which yields the following least-squares optimization problem where (·) l * denotes the l-th row of its argument. While all these problem formulations have a closed-form solution, they suffer from two major drawbacks. The first one is that several constraints need to be imposed in order to have a physical meaning of the results. The second drawback is its sensitivity to noise and outliers, due to the use of the ℓ 2 -norm as a fitness measure.
These two drawbacks are detailed in the following.
To be physically interpretable, the abundances should be nonnegative (ANC) and satisfy the sum-to-one constraint (ASC).
Considering both constraints, the fully-constrained least-squares problem is formulated as, for each t = 1, . . . , T , where 1 ∈ R R×1 denotes the column vector of ones and 0 is the non-negativity applied element-wise; In matrix form: Since there is no closed-form solution when dealing with the non-negativity constraint, several iterative techniques have been proposed, such as the active set scheme with the Lawson and Hanson's algorithm [23], the multiplicative iterative strategies [24], and the fully-constrained least-squares (FCLS) technique [9]. More recently, the alternating direction method of multipliers (ADMM) was applied with success for hyperspectral unmixing problem, with the SUnSAL algorithm [12].
Recent work in hyperspectral unmixing have advocated the sparsity of the abundance vectors [12], [25], [26]. In this case, each spectrum is fitted by a sparse linear mixture of endmembers, namely only the abundances with respect to a small number of endmembers are nonzero. To this end, the sparsity-promoting regularization with the ℓ 1 -norm is included in the cost function, yielding the following constrained sparse regression problem [12], for each t = 1, . . . , T , where the parameter λ balances the fitness of the least-squares solution and the sparsity level. It is worth noting that the ASC is relaxed when the ℓ 1 -norm is included. This problem is often considered by using the following matrix formulation x t 1 , subject to X 0.

Sensitivity to outliers
All the aforementioned algorithms rely on solving a (constrained) least-squares optimization problem, thus inheriting the drawbacks of using the ℓ 2 -norm as the fitness measure. A major drawback is its sensitivity to outliers, where outliers are some spectral bands that largely deviate from the rest of the bands. Indeed, considering all the image pixels, the least-squares optimization problems take the form subject to any of the aforementioned constraints. From this formulation, it is easy to see how the squared ℓ 2 -norm gives more weight to large residuals, namely to outliers in which predicted values (M X) l * are far from actual observations y l * .
Moreover, it is common for hyperspectral images to present up to 20% of unusable spectral bands due to low signal-to-noise ratio essentially from atmospheric effects, such as water absorption. In the following section, we overcome this difficulty by considering the correntropy maximization principle from the information theoretic learning, which yields an optimization problem that is robust to outliers.

III. CORRENTROPY-BASED UNMIXING PROBLEMS
In this section, we examine the correntropy and write the unmixing problems as correntropy maximization ones. Algorithms for solving these problems are derived in Section IV.

A. Correntropy
The correntropy, studied in [17], [18], is a nonlinear local similarity measure. For two random variables, Y and its estimation Y using some model/algorithm, it is defined by 6 where IE[·] is the expectation operator, and κ(·, ·) is a shift-invariant kernel satisfying the Mercer theorem [27]. In practice, while the joint distribution function of Y and Y is unavailable, the sample estimator of correntropy is adopted instead. Employing a finite number of data {(y l * , y l * )} L l=1 , it is estimated by up to a normalization factor. The Gaussian kernel is the most commonly-used kernel for correntropy [17], [20], [28]. This leads to the following expression for the correntropy where σ denotes the bandwidth of the Gaussian kernel.
The maximization of the correntropy, given by is termed the maximum correntropy criterion [17]. It is noteworthy that well-known second-order statistics, such as the mean square error (MSE) depends heavily on the Gaussian and linear assumptions [17]. However, in presence of non-Gaussian noise and in particular large outliers, i.e., observations greatly deviated from the data bulk, the effectiveness of the MSE-based algorithms will significantly deteriorate [29]. By contrast, the maximization of the correntropy criterion is appropriate for non-Gaussian signal processing, and is robust in particular against large outliers, as shown next.

B. The underlying robustness of the correntropy criterion
In this section, we study the sensitivity to outliers of the correntropy maximization principle, by showing the robustness of the underlying mechanism. To this end, we examine the behavior of the correntropy in terms of the residual error defined by ǫ l = y l * − y l * 2 . Thus, the correntropy (8) becomes Compared with second-order statistics, e.g. MSE, the correntropy is more robust with respect to the outliers, as shown in Fig. 1 illustrating the second-order and the correntropy objective functions in terms of the residual error. As the residual error increases, the second-order function keeps increasing dramatically. On the contrary, the correntropy is only sensitive within a region of small residual errors, this region being controlled by the kernel bandwidth. For large magnitudes of residual error, the correntropy falls to zero. Consequently, the correntropy criterion is robust to large outliers.

C. Correntropy-based unmixing problems
The correntropy-based unmixing problem consists in estimating the unknown abundance matrix X, by minimizing the objective function C (the negative of correntropy), given by where the Gaussian kernel was considered, or equivalently Considering both the ANC and ASC constraints, the fully-constrained correntropy unmixing problem becomes min X C(X), subject to X 0 For the sake of promoting sparse representations, the objective function (9)-(10) can be augmented by the ℓ 1 -norm penalty on the abundance matrix X, leading to the following problem IV. ADMM FOR SOLVING THE CORRENTROPY-BASED UNMIXING PROBLEMS We first briefly review the alternating direction method of multipliers (ADMM), following the expressions in [13,Chap. 3].
Consider an optimization problem of the form where the functions f and g are closed, proper and convex. The ADMM solves the equivalent constrained problem such as having the particular constraint x = z for instance. While this formulation may seem trivial, the optimization problem can now be tackled using the augmented Lagrangian method where the objective function is separable in x and z. By alternating on each variable separately, the ADMM repeats a direct update of the dual variable. In its scaled form, the ADMM algorithm is summarized in Algorithm 1.

A. Correntropy-based unmixing with full-constraints
In the following, we apply the ADMM algorithm to solve the correntropy-based unmixing problem in the fully-constrained case, presented in (11). The main steps are summarized in Algorithm 2. Rewrite the variables to be optimized in a vector x ∈ R RT ×1 , which is stacked by the columns of the matrix X, namely Rewrite also the following vectors By following the formulation of the ADMM in Algorithm 1, we set where I is the identity matrix, 0 ∈ R RT ×1 is the zero vector and ι S (u) is the indicator function of the set S defined by In this case, the subproblem of the x-update (in line 3 of Algorithm 1) addresses a nonconvex problem without any closedform solution. To overcome this difficulty, we apply an inexact ADMM variant in lines 3-5 of Algorithm 2, which solves the subproblem iteratively using the gradient descent method, instead of solving it exactly and explicitly.
Before that, we eliminate the T equality constraints, i.e., the sum-to-one constraints, by replacing x Rt with x rt , for t = 1, . . . , T . Let x ∈ R (R−1)T ×1 be the reduced vector of (R − 1) unknowns to be estimated, stacked by for t = 1, . . . , T . By this means, the objective function in (14) is transformed from (10) into the reduced-form where ǫ l (x t ) = y lt − m lR − R−1 p=1 (m lp − m lR )x pt , for l = 1, . . . , L. The gradient of (15) with respect to x is stacked as , with the entries given by for all r = 1, . . . , (R − 1) and t = 1, . . . , T . Similarly, the function ρ 2 x − z k − u k 2 2 is expressed with respect to x as with the entries in its gradient ∂φ ∂x given by x k+1 = x k+1 − η( ∂f1 ∂x k+1 + ∂φ ∂x k+1 );

B. Sparsity-promoting unmixing algorithm
In order to apply the ADMM algorithm, we express the constrained optimization problem (12) as follows g(x) = ι R RT + (x) + λ x 1 A = −I, B = I and c = 0.
By analogy with the previous case, the x-update in line 3 of Algorithm 1 is solved iteratively with the gradient descent method and is given in Algorithm 3 lines 3-5. The gradient of (17) with respect to xis stacked by ∂f ∂xt , where The z-update in line 4 Algorithm 1 involves solving In [13], the ADMM has been applied to solve various ℓ 1 -norm problems, including the well-known LASSO [30]. The only difference between (18) and the z-update in LASSO is that in the latter, no non-negativity term ι R+ RT (z) is enforced. In this case, the z-update in LASSO is the element-wise soft thresholding operation where the soft thresholding operator [13] is defined by Following [12], it is straightforward to project the result onto the nonnegative orthant in order to include the non-negativity constraint, thus yielding where the maximum function is element-wise. All these results lead to the correntropy-based unmixing algorithm with sparsitypromoting, as summarized in Algorithm 3.
The bandwidth σ in the Gaussian kernel should be well-tuned. Note that a small value for this parameter punishes harder the outlier bands, thus increasing the robustness of the algorithm to outliers [20]. Note that, in this study, the ADMM is applied to address a nonconvex objective function, thus no convergence is guaranteed theoretically, according to [13]. Considering these issues, we propose to fix the bandwidth empirically as summarized in Algorithm 4 and described next.
Following [20], [21], we first initialize the bandwidth parameter as a function of the reconstruction error, given by where X LS is the least-squares solution (4). In the case of a result too apart from that of least-squares solution, the parameter The algorithm divergence occurs if the stopping criterion (ii) is satisfied, namely the primal residual increases over iterations. In this case, either the parameter is too large due to an overestimated initialization, or it is too small. Accordingly, we either decrease it by σ = σ 0 /p, or increase it by σ = 1.2σ, until that the ADMM converges. if σ > 1000σ 0 (due to the overestimated σ 0 ) then 11: p = p + 1; 12: decrease σ = σ 0 /p, and go to line 2 13: else 14: increase σ = 1.2σ, and go to line 2 15: end if 16: end if 13 V. EXPERIMENTS WITH SYNTHETIC DATA In this section, the performance of the proposed fully-constrained (CUSAL-FC) and sparsity-promoting (CUSAL-SP) algorithms is evaluated on synthetic data. A comparative study is performed considering six state-of-the-art methods proposed for linear and nonlinear unmixing models.
• Fully-Constrained Least-Squares (FCLS) [9]: The FCLS is developed for the linear model. Enforcing both ANC and ASC constraints, this technique yields the optimal abundance matrix in the least-squares sense.
where 0 ≤ γ ij,t ≤ 1 controls the interactions between endmembers m i and m j , and ⊙ is the element-wise product. The BayGBM considers both ANC and ASC.
• The Bayesian algorithm for Polynomial Post-Nonlinear Mixing Model (BayPPNMM) [33]: This algorithm estimates the parameters by assuming that the pixel reflectances are nonlinear functions of endmembers using where the nonlinear terms are characterized by b t ∈ R, and both ANC and ASC are required.
• Kernel Fully-Constrained Least-Squares (KFCLS) [15]: This method generalizes FCLS, by replacing the inner product with a kernel function. In the following, the Gaussian kernel is applied for simulation.
• Robust nonnegative matrix factorization (rNMF) [34]: To capture the nonlinear effect (outliers), this NMF-based method introduces a group-sparse regularization term into the linear model. Accounting for both constraints, the problem is optimized by a block-coordinate descent strategy. For fair comparison in this paper, the endmembers are fixed with the real ones. The regularization parameter is set with the degree of sparsity as suggested in [33].
We  Each image, of 50 × 50 pixels, is generated using either the linear mixing model (1) or the polynomial post-nonlinear mixing model (PPNMM) (20), where n t is a Gaussian noise of SNR ∈ {15, 35}dB. The R ∈ {3, 6} endmembers, as shown in Fig. 2, are drawn from the USGS digital spectral library [35]. These endmembers are defined over L = 244 continuous bands with the wavelength ranging from 0.2µm to 3.0µm. The abundance vectors x t are uniformly generated using a Dirichlet distribution as in [35], [36]. For PPNMM, the values of b t are generated uniformly in the set (−3, 3) according to [33]. The unmixing performance is evaluated using the abundance root mean square error (RMSE) [31], [37], defined by where x t is the estimated abundance vector. Fig. 3 and 4 illustrates the average of RMSE over 10 Monte-Carlo realizations, respectively on the LMM and PPNMM data. It is easy to see that, in presence of outlier bands, the proposed CUSAL-FC algorithm outperforms all the comparing methods in terms of RMSE, for different mixture models, noise levels and numbers of endmembers. It is also shown that the performance of the proposed algorithm improves when increasing the SNR.
The unmixing performance with the sparsity-promoting algorithms is evaluated using the signal-to-reconstruction error, measured in decibels, according to [12], [25]. It is defined by The results, averaged over ten Monte-Carlo realizations, are illustrated in Fig. 5. Considering that the abundance matrix under estimation is sparse at different levels, we conclude the following: Concerning the case without outlier bands, the CUSAL-SP outperforms the SUnSAL-SAL for K > 8 and FCLS for K > 12. When the number of outlier bands is increases, the proposed CUSAL-SP algorithm generally provides the best unmixing quality with the highest SRE value, especially for K > 6. The image has been widely investigated in the literature [7], [25]. The raw data contains L = 224 bands, covering a wavelength range 0.4 − 2.5µm. Among, there are 37 relatively noisy ones with low SNR, namely the bands 1 − 3, 105 − 115, 150 − 170, and 223 − 224. The geographic composition of this area is estimated to include up to 14 minerals [3]. Neglecting the similar signatures, we consider 12 endmembers as often investigated in the literature [7], [38]. The VCA technique is first applied to extract these endmembers on the clean image with L = 187 bands. Starting from L = 187 bands, the noisy bands, randomly  Since ground-truth abundances are unknown, the performance is measured with the averaged spectral angle distance (SAD) between the input spectra y t and the reconstructed ones y t , as illustrated in Fig. 6, where the SAD is defined by The estimated abundance maps using 187, 205 and 224 bands are given in Fig. 7, Fig. 8, and Fig. 9, respectively. In absence of noisy bands (i.e., L = 187 bands), all the considered methods lead to satisfactory abundance maps, with BayPPNMM providing the smallest SAD. As the number of noisy bands increases, especially from L = 199 to L = 224, the unmixing performance of the state-of-the-art methods deteriorates drastically, while the proposed CUSAL yields stable SAD. The obtained results confirm the good behavior of the proposed CUSAL algorithms and their robustness in presence of corrupted spectral bands.

VII. CONCLUSION
This paper presented a supervised unmixing algorithm based on the correntropy maximization principle. Two correntropybased unmixing problems were addressed, the first with the non-negativity and sum-to-one constraints, and the second with the non-negativity constraint and a sparsity-promoting term. The alternating direction method of multipliers (ADMM) was investigated in order to solve the correntropy-based unmixing problems. The effectiveness and robustness of the proposed unmixing method were validated on synthetic and real hyperspectral images. Future works include the generalization of the correntropy criterion to account for the multiple reflection phenomenon [31], [39], as well as incorporating nonlinear models [40].     Fig. 9. Cuprite image: Estimated abundance maps using all the 224 bands, with 187 clean bands. Same legend as Fig. 7.