Residual component analysis of hyperspectral images -- Application to joint nonlinear unmixing and nonlinearity detection

This paper presents a nonlinear mixing model for joint hyperspectral image unmixing and nonlinearity detection. The proposed model assumes that the pixel reflectances are linear combinations of known pure spectral components corrupted by an additional nonlinear term, affecting the endmembers and contaminated by an additive Gaussian noise. A Markov random field is considered for nonlinearity detection based on the spatial structure of the nonlinear terms. The observed image is segmented into regions where nonlinear terms, if present, share similar statistical properties. A Bayesian algorithm is proposed to estimate the parameters involved in the model yielding a joint nonlinear unmixing and nonlinearity detection algorithm. The performance of the proposed strategy is first evaluated on synthetic data. Simulations conducted with real data show the accuracy of the proposed unmixing and nonlinearity detection strategy for the analysis of hyperspectral images.


I. INTRODUCTION
Spectral unmixing (SU) of hyperspectral images has attracted growing interest over the last few decades. It consists of distinguishing the materials and quantifying their proportions in each pixel of the observed image. This blind source separation problem has been widely studied for the applications where pixel reflectances are linear combinations of pure component spectra [1]- [5]. However, as explained in [6], [7], the linear mixing model (LMM) can be inappropriate for some hyperspectral images, such as those containing sand, trees or vegetation areas. Nonlinear mixing models (NLMMs) provide an interesting alternative to overcoming the inherent limitations of the LMM. They have been proposed in the hyperspectral image literature and can be divided into two main classes [8].
The first class of NLMMs consists of physical models based on the nature of the environment. These models include the bidirectional reflectance based model proposed in [9] for intimate mixtures associated with sand-like materials and the bilinear models recently studied in [10]- [13] to account for scattering effects mainly observed in vegetation and urban areas.
The second class of NLMMs contains more flexible models allowing for different kinds of nonlinearities to be approximated. These flexible models are constructed from neural networks [14], [15], kernels [16], [17], or post-nonlinear transformations [18].
Unfortunately, developing nonlinear unmixing strategies and refining mixing models usually implies a high computational cost. While consideration of nonlinear effects can be relevant in specific areas, the LMM is often sufficient for approximating the actual mixing models in some image pixels, for instance in homogeneous regions. To reduce the complexity required to process an image, it makes sense to distinguish in any image, linearly mixed pixels which can be easily analyzed, from those nonlinearly mixed requiring deeper analysis. This paper presents a new supervised Bayesian algorithm for joint nonlinear SU and nonlinearity detection. This algorithm is supervised in the sense that the endmembers contained in the image are assumed to be known (chosen from a spectral library or extracted from the data by an endmember extraction algorithm (EEA)). This algorithm is based on a nonlinear mixing model inspired from residual component analysis (RCA) [22]. In the context of SU of hyperspectral images, the nonlinear effects are modeled by additive perturbation terms characterized by Gaussian processes (GPs). This allows the nonlinear terms to be marginalized, yielding a flexible model depending only on the nonlinearity energies. The hyperspectral image to be analyzed is partitioned into homogeneous regions in which the nonlinearities share the same GP. This algorithm relies on an implicit image classification, modeled by labels whose spatial dependencies follow a Potts-Markov random field. Consideration of two classes (linear vs. nonlinear mixtures) would lead to binary detection maps. However, this paper allows for nonlinearly mixed regions to be also identified, based on the energy of the nonlinear effects. More precisely, the proposed algorithm can identify regions with different level of nonlinearity and characterized by different GPs. Most SU algorithms assume additive, independent and identically distributed (i.i.d.) noise sequences. However, based on previous work conducted on real hyperspectral images, non i.i.d. noise vectors are considered in this paper.
In the Bayesian framework, appropriate prior distributions are chosen for the unknown parameters of the proposed RCA model, i.e., the mixing coefficients, the GP hyperparameters, the class labels and the noise covariance matrix. The joint posterior distribution of these parameters is then derived. However, the classical Bayesian estimators cannot be easily computed from this joint posterior. To alleviate this problem, a Markov chain Monte Carlo (MCMC) method is used to generate samples according to the posterior of interest. Finally, the generated samples are used to compute Bayesian estimators as well as measures of uncertainties such as confidence intervals.
The remaining paper is organized as follows. Section II introduces the RCA model for hyperspectral image analysis. Section III presents the hierarchical Bayesian model associated with the proposed RCA model and its posterior distribution. The Metropolis-Within-Gibbs sampler used to sample from the posterior of interest is detailed in Section V. Some simulation results conducted on synthetic and real data are shown and discussed in Sections VI and VII.
Conclusions are finally reported in Section VIII. May 11, 2014 DRAFT II. PROBLEM FORMULATION We consider a set of N observed pixel spectra y n = [y n,1 , . . . , y n,L ] T , n ∈ {1, . . . , N } where L is the number of spectral bands. Each of these spectra is defined as a linear combination of R known spectra m r , referred to as endmembers, contaminated by an additional spectrum φ n and additive noise a r,n m r + φ n + e n = Ma n + φ n + e n , n = 1, . . . , N where m r = [m r,1 , . . . , m r,L ] T is the spectrum of the rth material present in the scene, a r,n is its corresponding proportion in the nth pixel and e n is an additive independently and non identically distributed zero-mean Gaussian noise sequence with diagonal covariance matrix T is the vector of the L noise variances and diag (σ 2 ) is an L × L diagonal matrix containing the elements of the vector σ 2 . Moreover, the term φ n = [φ 1,n , . . . , φ L,n ] T in (1) is an unknown L × 1 additive perturbation vector modeling nonlinear effects occurring in the nth pixel. Note that the usual matrix and vector notations M = [m 1 , . . . , m R ] and a n = [a 1,n , . . . , a R,n ] T have been used in the second row of Eq. (1). There are several motivations for considering the mixing model (1). First, 1) this model reduces to the classical linear mixing model (LMM) for φ n = 0 L , 2) the model (1) is general enough to handle different of kinds of nonlinearities such as the bilinear model studied in [12] (referred to as Fan model (FM)), the generalized bilinear model (GBM) [13], and the polynomial post-nonlinear mixing model (PPNMM) studied for nonlinear spectral unmixing in [18] and nonlinearity detection in [20]. These models assume that the mixing model consists of a linear contribution of the endmembers, corrupted by at least one additive term characterizing the nonlinear effects. In the proposed model, all additive terms are gathered in the vector φ n . Note that a similar model, called robust LMM, has been also introduced in [23].
Due to physical considerations, the abundance vectors a n satisfy the following positivity and sum-to-one constraints R r=1 a r,n = 1, a r,n > 0, ∀r ∈ {1, . . . , R} . ( The problem addressed in this paper consists of the joint estimation of the abundance vectors and the detection of nonlinearly mixed pixels (characterized by φ n = 0 L ). The two next sections present the proposed Bayesian model for joint unmixing and nonlinearity detection. One of the main contributions of this paper is the characterization of the nonlinearities that will addressed later in Section IV.

A. Likelihood
Equation (1) shows that y n |M, a n , φ n , σ 2 is distributed according to a Gaussian distribution with mean Ma n + φ n and covariance matrix Σ 0 , denoted as y n |M, a n , φ n , σ 2 ∼ N (Ma n + φ n , Σ 0 ). Assuming independence between the observed pixels, the joint likelihood of the observation matrix Y can be expressed as denotes the exponential trace and X = MA + Φ is an L × N matrix.

B. Prior for the abundance matrix A
Each abundance vector can be written as a n = [c T n , a R,n ] T with c n = [a 1,n , . . . , a R−1,n ] T and a R,n = 1 − R−1 r=1 a r,n . The LMM constraints (??) impose that c n belongs to the simplex S = c c r > 0, ∀r ∈ 1, . . . , R − 1, To reflect the lack of prior knowledge about the abundances, we propose to assign noninformative prior distributions to the N vectors c n . More precisely, the following uniform is assigned to each vector c n , where 1 S (·) is the indicator function defined on the simplex S. Assuming prior independence between the N abundance vectors {a n } n=1,...,N leads to the following joint prior distribution where C = [c 1 , . . . , c N ] is an (R − 1) × N matrix.
which reflects the absence of knowledge for this parameter (see [24] for motivation). Assuming prior independence between the noise variances, we obtain

IV. MODELING THE NONLINEARITIES
We propose in this paper to exploit spatial correlations between the pixels of the hyperspectral image to be analyzed. It seems reasonable to assume that nonlinear effects occurring in a given pixel are related to the nonlinear effects present in neighboring pixels.
Formally, the hyperspectral image is assumed to be partitioned into K classes denoted as . . , N denote the subset of pixel indexes belonging to the kth is introduced to identify the class of each image pixel, i.e., In each class, nonlinearity vectors to be estimated are assumed to share the same statistical properties, as will be shown in the sequel.
A. Prior distribution for the nonlinearity matrix Φ As mentioned above, the mixing model (1) reduces to the LMM for φ n = 0 L . For nonlinearity detection, it makes sense to consider a pixel class (referred to as class C 0 ) corresponding to linearly mixed pixels. The resulting prior distribution for φ n conditioned upon z n = 0 is given by It can be seen that bilinear models and more generally polynomial models (i.e., model involv- For these reasons, we propose to divide nonlinearly mixed pixels into K − 1 classes and to assign different priors for the nonlinearity vectors belonging to the different classes. The nonlinearities (of nonlinearly mixed pixels) are assumed to be random. Assume y n belongs to the kth class. The prior distribution of the corresponding nonlinear term φ n is given by where K M is an L × L covariance matrix parameterized by the endmember matrix M and s 2 k is a scaling hyperparameter that tunes the energy of the nonlinearities in the kth class. Note that all nonlinearity vectors within the same class share the same prior. The performance of the unmixing procedure depends on the choice of K M , more precisely on the similarity measure associated with the covariance matrix. In this paper, we consider the symmetric second order polynomial kernel, which has received considerable interest in the machine learning community [25]. This kernel is defined as follows where denotes the Hadamard (termwise) product and m i,: denotes the ith row of M.
Polynomial kernels are particularly well adapted to characterize multiple scattering effects (modeled by polynomial functions of the endmembers). Note that the parametrization of the matrix K M in (12) only involves bilinear and quadratic terms 1 with respect to the endmembers m r , r = 1, . . . , R. More, precisely, the matrix K M can be rewritten as matrix. Note also that a polynomial kernel similar to (12) has been recently considered in [16] and that other kernels such as the Gaussian kernel could be investigated to model other nonlinearities as in [22].
1 Note: it can be shown that (11) and (12) can be obtained by defining φ n as a linear combination of terms mi mj (as in [13]) and by marginalizing the corresponding coefficients using a Gaussian prior parameterized by s 2 k . Marginalizing these coefficients allows the number of unknown parameters to be significantly reduced, leading to the nonlinearities being characterized by a single parameter s 2 k .
May 11, 2014 DRAFT B. Prior distribution for the label vector z In the context of hyperspectral image analysis, the labels z 1 , . . . , z N indicate the pixel classes and take values in {0, . . . , K − 1} where K is the number of classes and the set {z n } n=1,...,N forms a random field. To exploit the correlation between pixels, a Markov random field is introduced as a prior distribution for z n given its neighbors z V(n) , i.e., where V(n) is the neighborhood of the nth pixel and z \n = {z n } n =n . More precisely, this paper focuses on the Potts-Markov model since it is very appropriate for hyperspectral image segmentation [21]. Given a discrete random field z attached to an image with N pixels, the where β > 0 is the granularity coefficient, G(β) is a normalizing (or partition) constant and δ(·) is the Dirac delta function. Several neighborhood structures can be employed to define V(n). Fig. 1 shows two examples of neighborhood structures. The eight pixel structure (or 2-order neighborhood) will be considered in the rest of the paper. values of β lead to fewer and larger homogeneous regions. In this paper, the granularity coefficient is assumed to be known. Note however that it could be also included within the Bayesian model and estimated using the strategy described in [26].

C. Hyperparameter priors
The performance of the proposed Bayesian model for spectral unmixing mainly depends on the values of the hyperparameters {s 2 k } k=1,...,K . When the hyperparameters are difficult to adjust, it is the norm to include them in the unknown parameter vector, resulting in a hierarchical Bayesian model [18], [27]. This strategy requires the definition of prior distributions for the hyperparameters.
The following inverse-gamma prior distribution is assigned to the nonlinearity hyperparameters, where (γ, ν) are additional parameters that will be fixed to ensure a noninformative prior for s 2 k ((γ, ν) = (1, 1/4) in all simulations presented in this paper). Assuming prior independence between the hyperparameters, we where

A. Marginalized joint posterior distribution
The resulting directed acyclic graph (DAG) associated with the proposed Bayesian model introduced in Sections III and IV is depicted in Fig. 2.
Assuming prior independence between A, (Φ, z) and σ 2 , the posterior distribution of (Φ, θ) where θ = (C, z, σ 2 , s 2 ) can be expressed as where . This distribution can be marginalized with respect to Φ as follows where . . , K − 1) andȳ n = y n − Ma n . The advantage of this marginalization is to avoid sampling the nonlinearity matrix Φ. Thus, the nonlinearities are fully characterized by the known endmember matrix, the class labels and the values of the hyperparameters in s 2 = [s 2 1 , . . . , s 2 K ] T . Unfortunately, it is difficult to obtain closed form expressions for standard Bayesian estimators associated with (17). In this paper, we propose to use efficient Markov Chain Monte Carlo (MCMC) methods to generate samples asymptotically distributed according to (17). The next part of this section presents the Gibbs sampler which is proposed to sample according to (17).
The principle of the Gibbs sampler is to sample according to the conditional distributions of the posterior of interest [28,Chap. 10]. Due to the large number of parameters to be estimated, it makes sense to use a block Gibbs sampler to improve the convergence of the sampling procedure. More precisely, we propose to sample sequentially the N labels in z, the abundance matrix A, the noise variances σ 2 and s 2 using moves that are detailed in the next paragraphs.

B. Sampling the labels
For the nth pixel (n ∈ {1, . . . , N }), the label z n is a discrete random variable whose conditional distribution is fully characterized by the probabilities P (z n = k|y n , M, θ \zn ) ∝ f (y n |M, s 2 , z n = k, a n ) where θ \zn denotes θ without z n , k = 0, . . . , K − 1 (for K classes). These posterior probabilities are Consequently, sampling z n from its conditional distribution can be achieved by drawing a discrete value in the finite set {0, . . . , K − 1} with the probabilities defined in (19).

C. Sampling the abundance matrix A
Sampling from f (C|Y, M, z, σ 2 , s 2 ) seems difficult due to the complexity of this distribution. However, it can be shown that i.e., the N abundance vectors {a n } n=1,...,N are a posteriori independent and can be sampled independently in a parallel manner. Straightforward computations lead to c n |y n , M, z n = k, σ 2 , s 2 ∼ N S (c n , Ψ n ) where andỹ n = y n − m R . Moreover, N S (c n , Ψ n ) denotes the truncated multivariate Gaussian distribution defined on the simplex S with hidden meanc n and hidden covariance matrix Ψ n . Sampling from (21) can be achieved efficiently using the method recently proposed in [29].

D. Sampling the noise variance σ 2
It can be shown from (17) that where Sampling from (24) is not straightforward. In this case, an accept/reject procedure can be used to update σ 2 , leading to a hybrid Metropolis-within-Gibbs sampler. In this paper, we introduce the standard change of variable δ = log(σ 2 ), δ ∈ R. A Gaussian random walk for δ is used to update the variance σ 2 . Note that the noise variances are a posteriori independent. Thus they can be updated in a parallel manner. The variances of the L parallel Gaussian random walk procedures have been adjusted during the burn-in period of the sampler to obtain an acceptance rate close to 0.5, as recommended in [30, p. 8].

E. Sampling the vector s 2
It can be shown from (17) that  Table II, which represents three possible levels of nonlinearity. For each class, the nonlinear terms have been generated according to (11). The label map generated with β = 1.2 is shown in Fig. 3 (left). The abundance vectors a n , n = 1, . . . , 3600 have been randomly generated according to a uniform distribution over the admissible set defined by the positivity and sum-to-one constraints. The noise variance (depicted in Fig. 4 as a function of the spectral bands) have been arbitrarily fixed using to model a non-i.i.d. (colored) noise. The joint nonlinear SU and nonlinearity detection algorithm, denoted as "RCA-SU", has been applied to this data set with N MC = 3000 and N bi = 1000. Fig. 3 (right) shows that the estimated label map (marginal MAP estimates) is in agreement with the actual label map. Moreover, the confusion matrix depicted in Table   I illustrate the performance of the RCA-SU in term of pixel classification. Table II shows that the RCA-SU provides accurate hyperparameter estimates and thus can be used to obtain information about the importance of nonlinearities in the different regions. Note that the estimation error is computed using |s 2 k −ŝ 2 k |/s 2 k , where s 2 k andŝ 2 k are the actual and estimated dispersion parameters for the kth class. The estimated noise variances, depicted in Fig. 4 are also in good agreement with the actual values of the variances. The quality of abundance with N k = card(I k ) and where a n andâ n are the actual and estimated abundance vectors for the nth pixel of the image. For this scenario, the proposed algorithm is compared with the classical FCLS algorithm [2] assuming the LMM. Comparisons to nonlinear SU methods will be addressed in the next paragraph (scenario 2). Table III shows the RNMSEs obtained with the proposed and the FLCS algorithms for this first data set. These results show that the two algorithms provide similar abundance estimates for the first class, corresponding to linearly mixed pixels. For the three nonlinear classes, the estimation performance is reduced.
However, the proposed algorithm provides better results than the FCLS algorithm that does not handle nonlinear effects.  B. Second scenario: RCA vs. nonlinear unmixing 1) Data set: The performance of the proposed joint nonlinear SU and nonlinearity detection algorithm is then evaluated on a second synthetic image of 60 × 60 pixels containing the R = 3 spectral components presented in the previous section. In this scenario, the image consists of pixels generated according to four different mixing models associated with four classes (K = 4). The label map generated using β = 1.2 is shown in Fig. 5 (a). The class C 0 is associated with the LMM. The pixels of class C 1 have been generated according to the generalized bilinear mixing model (GBM) [13] y n = R r=1 a r,n m r where n ∈ I 1 and the nonlinearity parameters {γ i,j } have been uniformly drawn in [0.5, 1].
The class C 2 is composed of pixels generated according to the PPNMM [18] as follows a r,n m r R r=1 a r,n m r + e n (29) where n ∈ I 2 and b = 0.5 for all pixels in class C 2 . Finally, the class C 3 has been generated according to (1) with s 2 = 0.1. For the four classes, the abundance vectors have been randomly generated according to a uniform distribution over the admissible set defined by the positivity and sum-to-one constraints. All pixels have been corrupted by an additive i.i.d Gaussian noise of variance σ 2 = 10 −4 , corresponding to an average signal-to-noise ratio SNR 30dB. The noise is assumed to be i.i.d. for a fair comparison with SU algorithms assuming i.i.d. Gaussian noise. Fig. 5 (b) shows the log-energy of the nonlinearity parameters for each pixel of the image, i.e., log φ n 2 for n = 1, . . . , 3600. This figure shows that each class corresponds to a different level of nonlinearity.
2) Unmixing: Different estimation procedures have been considered for the four different mixing models: • The FCLS algorithm [2] which is known to have good performance for linear mixtures.
• The GBM-based approach [32] which is particularly adapted for bilinear nonlinearities.
• The gradient-based approach of [18] which is based on a PPNMM and has shown nice properties for various nonlinear models.  • Finally, we consider the K-Hype method [16] to compare our algorithm with state-of-the art kernel based unmixing methods. The kernel used in this paper is the polynomial, second order symmetric kernel whose Gram matrix is defined by (12). This kernel provides better performance on this data set than the kernels studied in [16] (namely the Gaussian and the polynomial, second order asymmetric kernels). All hyperparameters of the K-Hype algorithm have been optimized using preliminary runs.  The unmixing quality is also evaluated by the reconstruction error (RE) defined as  where y n is the nth observation vector andŷ n its estimate.  From a reconstruction point of view, the K-Hype and RCA-SU algorithms provides similar results. However, the proposed algorithm also provides nonlinearity detection maps. The PPNMM and RCA-SU algorithms perform similarly in term of abundance estimation and allow both nonlinearities to be detected in each pixel. However, the nonlinearities can be analyzed more deeply using the RCA-SU, as will be shown in the next part.
3) Nonlinearity detection: The performance of the proposed algorithm for nonlinearity detection is compared to the detector studied in [20], which is coupled with the PPNMM-based

A. Data set
The real image considered in this section was acquired in 2010 by the Hyspex hyperspectral scanner over Villelongue, France (0003'W and 4257'N). L = 160 spectral bands were recorded from the visible to near infrared with a spatial resolution of 0.5m. This dataset has already been studied in [17], [33] and is mainly composed of forested and urban areas.
More details about the data acquisition and pre-processing steps are available in [33]. A subimage (of size 41 × 29 pixels) is chosen here to evaluate the proposed unmixing procedure and is depicted in Fig. 6. The scene is composed mainly of roof, road and grass pixels, resulting in R = 3 endmembers. The spectral signatures of these components have been extracted from the data using the N-FINDR algorithm [34] and are depicted in Fig. 7.

B. Spectral unmixing
The proposed algorithm has been applied to this data set with N MC = 3000 and N bi = 1000.
The number of classes has been set to K = 4 (one linear class and three nonlinear classes).
The granularity parameter of the prior (14) has been fixed to β = 0.7. Fig. 8 shows examples of abundance maps estimated by the FCLS algorithm, the gradient-based method assuming the GBM [32], the PPNMM [18], the K-Hype [16] algorithms and the proposed method.
The abundance maps estimated by the RCA-SU algorithm are in good agreement with the state-of-the art algorithms. However, Table VI shows that K-Hype and the proposed algorithm provide a lower reconstruction error. Fig. 9 compares the noise variances estimated by the RCA-SU for the real image with the noise variances estimated by the HySime algorithm [35].
The HySime algorithm assumes additive noise and estimates the noise covariance matrix of the image using multiple regression. Fig. 9 shows that the two algorithms provide similar noise variance estimates. These results motivate the consideration of non i.i.d. noise for hyperspectral image analysis since the noise variances increase for the highest wavelengths.
The simulations conducted on this real dataset show the accuracy of the proposed RCA-SU in terms of abundance estimation and reconstruction error, especially for applications where the noise variances vary depending on the wavelength. Moreover, it also provides information about the nonlinearities of the scene.  structures, the proposed detector provides homogeneous regions. Similar structures can be identified in this detection map and the true color image of the scene ( Fig. 10 (a)). The estimated class C 0 (black pixels) associated with linearly mixed pixels is mainly located in the roof region. The class C 1 (dark grey pixels) can be related to regions where the main component in the pixels are grass or road. Mixed pixels composed of grass and road are gathered in class C 2 (light grey pixels). Finally, shadowed pixels located between the roof and the road are associated with the last class C 3 (white pixels). Moreover, the RCA-SU can identify three levels of nonlinearity, corresponding to [ŝ 2 1 ,ŝ 2 2 ,ŝ 2 3 ] = [0.03, 0.50, 29.5]. The most influent nonlinearity class is class C 3 , where shadowing effects occurs. Mixed pixels of class C 2 contain weaker nonlinearities. Finally, the remaining pixels of class C 1 are associated with the weakest nonlinearities. The nonlinearities of this class can probably be explained by the endmember variability and/or the endmember estimation error. It is interesting to note that the RCA-SU identifies two rather linear classes associated with homogeneous regions mainly composed of a single parameter (classes C 0 and C 1 ). The two latter classes (classes Simulations conducted on synthetic data illustrated the performance of the proposed algorithm for linear and nonlinear spectral unmixing. An important advantage of the proposed algorithm is its robustness regarding the actual underlying mixing model. Another interesting property resulting from the nonlinear mixing model considered is the possibility of detecting several kinds of linearly and nonlinearly mixed pixels. This detection can be used to identify the image regions affected by nonlinearities in order to characterize the nonlinear effects more deeply. Finally, simulations conducted with real data showed the accuracy of the proposed unmixing and nonlinearity detection strategy for the analysis of real hyperspectral images.
The endmembers contained in the hyperspectral image were assumed to be known in this work. Of course, the performance of the algorithm relies on this endmember knowledge.
We think that estimating the pure component spectra present in the image, jointly with the abundance estimation and the nonlinearity detection is an important issue that should be considered in future work. Finally, the number of classes and the granularity of the scene were assumed to be known in this study. Estimating these parameters is clearly a challenging issue that is under investigation.