Constrained Nonnegative Matrix Factorization for Blind Hyperspectral Unmixing incorporating Endmember Independence


Abstract—Hyperspectral unmixing (HU) has become an important technique in exploiting hyperspectral data since it decomposes a mixed pixel into a collection of endmembers weighted by fractional abundances. The endmembers of a hyperspectral image (HSI) are more likely to be generated by independent sources and be mixed at a macroscopic level before arriving at the sensor element of the imaging spectrometer as mixed spectra. Over the past few decades, many attempts have focused on imposing auxiliary regularizers on the conventional nonnegative matrix factorization (NMF) framework in order to effectively unmix these mixed spectra. As a promising step toward finding an optimum regularizer to extract endmembers, this paper presents a novel blind HU algorithm, referred to as Kurtosis-based Smooth Nonnegative Matrix Factorization (KbSNMF), which incorporates a novel regularizer based on the statistical independence of the probability density functions of endmember spectra. Imposing this regularizer on the conventional NMF framework promotes the extraction of independent endmembers while further enhancing the parts-based representation of data. Experiments conducted on diverse synthetic HSI datasets (with varying numbers of endmembers, spectral bands, pixels, and noise levels) and three standard real HSI datasets demonstrate the validity of the proposed KbSNMF algorithm compared to several state-of-the-art NMF-based HU baselines. The proposed algorithm exhibits superior performance especially in terms of extracting endmember spectra from hyperspectral data; therefore, it could uplift the performance of recent deep learning HU methods which utilize the endmember spectra as supervisory input data for abundance extraction.

I. INTRODUCTION
HYPERSPECTRAL image (HSI) technology has become a leading imaging technology in many fields including medical imaging, food quality assessment, forensic sciences, surveillance, and remote sensing [1]. However, due to the insufficient spatial resolution of spectrometers and the homogeneous mixture of distinct macroscopic materials in imaging scenes, the observed reflectance spectrum at each pixel of an HSI could easily be a mixture of spectra belonging to a set of constituent members (also called endmembers). This mixing phenomenon constitutes a major concern with regard to many applications. As a remedy to this complication, various methods of hyperspectral unmixing (HU) have been implemented to extract endmember spectra along with their fractional compositions (also called abundances). HU is a study of three subproblems, i.e. determining the no. of endmembers, extracting the endmember spectra, and estimating their abundances [2].
In the past, many algorithms have been introduced in order to solve the HU problem [3]–[15] and these algorithms can be categorized under three main schemes according to their basic computational approaches [16]: 1) statistical algorithms, 2) geometric algorithms, and 3) sparse regression-based unmixing algorithms. Statistical algorithms interpret a mixed pixel by utilizing statistical representations. These representations are commonly analytical expressions based on the probability density functions of the underlying mixed pixel spectra. Bayesian self-organizing maps (BSOM) [17], independent component analysis (ICA) [18], [19], independent factor analysis (IFA) [20], dependent component analysis (DECA) [21], automated morphological endmember extraction (AMEE) [22], and the spatial-spectral endmember extraction algorithm (SSEE) [23] are some of the popular statistical algorithms utilized for HU. Geometric algorithms exploit the geometric orientation of HSI data in an n-dimensional space, where n is the no. of spectral bands captured by the imaging spectrometer. Vertex component analysis (VCA) [24], minimum volume transform (MVT) [25], simplex identification via split augmented Lagrangian (SISAL) [26], the optical real-time adaptive spectral identification system (ORASIS) [27], iterative error analysis (IEA) [28], and nonnegative matrix factorization (NMF) [29] are some of the geometric algorithms frequently utilized for HU. Sparse regression-based approaches utilize known spectral libraries: the unmixing problem is formulated as a sparse linear regression problem based on the assumption that every feature can be linearly created from a few elements extracted from known libraries [30]–[33].
In the recent past, several approaches have been introduced where deep learning (DL) is utilized for HU. More often, DL-based methods for HU do not perform blind unmixing, i.e. they do not extract both endmember spectra and abundances simultaneously. In [34], a Hopfield neural network (HNN) machine learning approach is utilized to solve the semi-nonnegative matrix factorization problem, which has illustrated promising performance with regard to abundance extraction when given reliable endmember spectra as supervisory input data. In [35], an artificial neural network (ANN) is utilized to invert the pixel spectral mixture in Landsat imagery. Here, to train the network, a spectral library was created, consisting of endmember spectra collected from the image and simulated mixed spectra. In [36], a two-stage ANN architecture has been introduced in which the first stage reduces the dimension of the input vector utilizing endmember spectra as input data. As can be seen, most of the current DL-based methods for HU utilize endmember spectra as supervisory input data in order to extract the abundances.
Originally introduced by Lee and Seung [29], NMF is a mathematical tool which is utilized to decompose a nonnegative data matrix into the product of two other nonnegative matrices of lower rank based on the optimization of a particular objective function. The nonnegativity criterion does not accommodate any negative elements in the resultant matrices, leading to a parts-based representation of data [37], which also coincides with the objective of HU. Driven by this parts-based representation of the NMF framework, NMF-based algorithms are often utilized to solve the HU problem. However, NMF is an ill-posed geometric algorithm; therefore, it does not possess a unique solution [8]. The non-convex objective functions utilized for NMF compel its solution space to be wide. Thus, many researchers have introduced novel NMF algorithms by adding different auxiliary regularizers to the conventional NMF framework in order to improve the uniqueness of its solution with respect to the HU setting. l1/2-sparsity constrained NMF (l1/2-NMF) [8], spatial group sparsity regularized NMF (SGSNMF) [38], minimum volume rank deficient NMF (Min-vol NMF) [39], manifold regularized sparse NMF [7], double constrained NMF [40], total variation regularized reweighted sparse NMF (TV-RSNMF) [41], subspace clustering constrained sparse NMF (SC-NMF) [42], nonsmooth NMF (nsNMF) [43], robust collaborative NMF (R-CoNMF) [44], subspace structure regularized NMF (SSRNMF) [45], graph regularized NMF (GNMF) [46] and projection-based NMF (PNMF) [47] are some customary NMF-based baselines utilized for HU. Furthermore, a new architecture has recently emerged for blind unmixing under the premise of nonnegative tensor factorization (NTF). In the paper Matrix-Vector NTF for Blind Unmixing of Hyperspectral Imagery (MVNTF) [48], the authors have formalized a novel way of unmixing while preserving the spatial information by factorizing hyperspectral 3D cubes instead of unwrapped spectral datasets.
In HU, the endmembers are typically macroscopic objects in the HSI scene, such as soil, water, vegetation, etc. [2]. In a broader sense, HU attempts to find these macroscopic objects by utilizing the observations of signals that have already interacted (or mixed) with other objects in the scene before arriving at the sensing element of the imaging spectrometer. It is pragmatic to assume that the endmembers are consequences of different physical processes and hence statistically independent [18]. If a particular methodology promotes maximizing the independence of endmembers, each of the endmember spectra extracted utilizing that particular method will be more independent than the mixed pixel spectra. Therefore, such a method would be a progression toward the extraction of more realistic endmember spectra belonging to independent macroscopic objects. Even though the frequently-associated abundance sum-to-one constraint (ASC) [49] in HU does not accommodate the concept of independent endmembers, algorithms such as ICA [18], [19], IFA [20], and independent innovation analysis (IIA) [50] are popular algorithms utilized in HU which consider this concept. Also, several attempts have been made previously in order to incorporate the independence of endmembers into the conventional NMF framework. The authors of [51] have proposed a novel initialization method based on statistical independence between NMF components. In [52], an attempt has been made to initialize NMF with a modified ICA method (modifICA-NMF). In [53], a novel effective method has been introduced unifying independent vector analysis (IVA) and NMF. Our previous work [54] discusses the suitability of utilizing the fundamental notions of kurtosis-based ICA to enhance the conventional NMF algorithm.
Inspired by the interpretable parts-based representations and the simplicity of imposing auxiliary regularizers on the conventional NMF framework, and motivated by our previous work [54]–[60], this study proposes a novel regularizer for the conventional NMF framework, named the Average Kurtosis regularizer. Incorporating this regularizer along with an abundance smoothing mechanism, we present a novel blind HU algorithm named Kurtosis-based Smooth Nonnegative Matrix Factorization (KbSNMF) along with its two variants KbSNMF-fnorm and KbSNMF-div. The motivation of the proposed work is to promote the independence of endmembers while extracting them in accordance with the parts-based representations of the conventional NMF framework, thereby attempting to extract the most realistic endmember spectra from a given HSI. The contributions of this paper are summarized as follows: 1) Introduction of a novel regularizer for HU, based on kurtosis, which promotes the independence of the endmembers of an HSI. 2) Computation of the gradient of the aforesaid regularizer w.r.t. the factors of the conventional NMF framework, and the establishment of a blind HU algorithm named KbSNMF, which effectively promotes the independence of endmembers while maintaining the smoothness of abundance maps.
We also implement and evaluate the performance of the proposed algorithm in comparison with several selected state-of-the-art NMF-based HU baselines. Experiments are conducted on diverse synthetic HSI datasets (with varying numbers of endmembers, spectral bands, pixels, and noise levels) as well as on three standard real HSI datasets. These experiments substantiate that the proposed algorithm outperforms other state-of-the-art NMF-based blind HU algorithms in many instances, especially in extracting endmember spectra. This observation is understandable since the proposed algorithm tries to improve upon the pragmatic characteristics of the endmember spectra, rather than those of the abundance maps. Thus, in an unsupervised setting where there is the luxury of utilizing a DL-based method for abundance extraction, the proposed algorithm would provide a viable counterpart to generate endmember spectra as supervisory input data for the DL-based method.
The rest of the paper is arranged as follows. Section II provides the background related to the proposed algorithm. In Section III, the novel kurtosis-based regularizer is developed along with its derivatives. In Section IV, the novel KbSNMF algorithm is introduced. Section V discusses some key issues related to the implementation of the proposed algorithm. Section VI is devoted to experimental results, and the paper is concluded in Section VII.

II. BACKGROUND

A. Linear Mixture Model (LMM)
LMM is the most frequently utilized model for HU and its implications have been widely discussed in many previous works [4], [6], [8], [24], [61]. This model highly depends on the assumption that the incident light waves reflect only once from the underlying macroscopic objects and are captured by the sensing element of the imaging spectrometer without being subjected to scattering. In the LMM, the spectrum at each pixel is represented as a linear combination of the endmember spectra as

x_j = Σ_{i=1}^{r} S_{ij} a_i + e_j   (1)

where x_j ∈ R^{n×1}_+ is the j-th pixel spectrum, S_{ij} is the fractional composition occupied by the i-th endmember in the j-th pixel, a_i ∈ R^{n×1}_+ is the spectrum of the i-th endmember of the HSI, e_j ∈ R^{n×1} is an additive Gaussian noise associated with modeling errors, and r is the no. of endmembers in the HSI. All spectra are measured in reflectance values; hence the nonnegativity in the x_j's and a_i's. The nonnegativity constraint S_{ij} ≥ 0 and the sum-to-one constraint Σ_{i=1}^{r} S_{ij} = 1 are imposed in order to guarantee that the fractional compositions representing the endmembers are nonnegative and that the abundance summation equals 1 at each pixel. The LMM can be reformulated in matrix notation as

X = AS + E   (2)

where X ∈ R^{n×m}_+ is the HSI data matrix, n being the no. of spectral bands and m being the no. of pixels of the HSI, A ∈ R^{n×r}_+ is the endmember matrix whose columns represent the spectra of each of the r endmembers, S ∈ R^{r×m}_+ is the abundance matrix whose columns represent the fractional compositions at each of the m pixels, and E ∈ R^{n×m} is the noise matrix. This formulation casts the HU problem as a blind source separation (BSS) problem, i.e. simultaneous extraction of the endmember spectra and their abundances at each pixel while utilizing the HSI as the input.
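As a concrete illustration, the matrix form X = AS + E can be simulated in a few lines of NumPy. The dimensions, noise level, and random abundances below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 200, 1000, 4                      # bands, pixels, endmembers (assumed)

A = rng.random((n, r))                      # nonnegative endmember spectra (columns)
S = rng.random((r, m))
S /= S.sum(axis=0, keepdims=True)           # abundance sum-to-one at each pixel
E = 0.01 * rng.standard_normal((n, m))      # additive Gaussian noise

X = A @ S + E                               # X = AS + E, the matrix form of the LMM
```

Blind HU then amounts to recovering A and S from X alone.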

B. Nonnegative Matrix Factorization (NMF)
NMF is a low-rank approximation of nonnegative matrices widely utilized in the fields of computer vision, clustering, data compression, etc. [51], [62]–[65]. NMF was first introduced by Lee and Seung [29] as a parts-based representation technique which permits the data in a nonnegative matrix to be decomposed into two other nonnegative matrices. Given a matrix V ∈ R^{n×m}_+, NMF tries to find nonnegative matrices W ∈ R^{n×r}_+ (known as the source matrix) and H ∈ R^{r×m}_+ (known as the mixing matrix) which satisfy the approximation below.
V ≈ WH   (3)

However, there is an infinite no. of (W, H) solution pairs which satisfy the above approximation. For instance, it is possible to write WH = (WΓ⁻¹)(ΓH) for any invertible Γ ∈ R^{r×r}_+. The conventional procedure to achieve (3) is to define an objective function which quantifies the quality of the approximation between V and WH and to implement an optimization algorithm which minimizes the defined objective function w.r.t. W and H. One of the most commonly utilized objective functions is the square of the Frobenius norm between V and WH as in (4).

‖V − WH‖²_F = Σ_{ij} (V_{ij} − (WH)_{ij})²   (4)
The above expression is lower bounded by zero and distinctly vanishes if and only if V = WH. Another popular objective function is the divergence of V from WH as in (5).

D(V‖WH) = Σ_{ij} ( V_{ij} log( V_{ij} / (WH)_{ij} ) − V_{ij} + (WH)_{ij} )   (5)
Similar to the Frobenius norm, the divergence is also lower bounded by zero and vanishes if and only if V = WH. Even though the functions (4) and (5) are convex in W alone and in H alone, they are not convex in W and H together [29]. Hence, it is not possible to analytically find the global minima of these functions w.r.t. W and H. However, it is possible to find local minima utilizing numerical optimization methods. Lee and Seung [29] have proposed the multiplicative update rules (6) and (7) below to find local minima of the functions (4) and (5) respectively, where the fraction lines denote element-by-element divisions.

H ← H ∘ (WᵀV) / (WᵀWH),   W ← W ∘ (VHᵀ) / (WHHᵀ)   (6)

H_{aj} ← H_{aj} (Σ_k W_{ka} V_{kj}/(WH)_{kj}) / (Σ_k W_{ka}),   W_{ka} ← W_{ka} (Σ_j H_{aj} V_{kj}/(WH)_{kj}) / (Σ_j H_{aj})   (7)

Fig. 1: Underlying mechanism of the proposed algorithm. According to the unmixing strategy, it is discernible that every pixel is a linear combination of several independent endmembers. The proposed method promotes the independence of endmembers by increasing their super-Gaussianity. The algorithm concept derives from the central limit theorem.
Lee and Seung have further proven the convergence of both the above update rules utilizing an auxiliary function analogous to the proof of convergence of the Expectation Maximization algorithm [29].
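For reference, the Frobenius-norm update rules (6) can be sketched as follows; the random initialization, iteration count, and stabilizing `eps` are illustrative choices, not the paper's:

```python
import numpy as np

def nmf_fnorm(V, r, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing ||V - WH||_F^2.

    Each update multiplies by a ratio of nonnegative matrices, so W and H
    remain nonnegative given a nonnegative initialization."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # mixing-matrix update
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # source-matrix update
    return W, H
```

The small `eps` in each denominator guards against division by zero without altering the fixed points appreciably.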
The LMM transforms the HU problem into the form of a conventional NMF problem: if V is the HSI data matrix X, then the source matrix W is the endmember matrix A and the mixing matrix H is the abundance matrix S. Thus, given X, solving the blind HU problem for A and S utilizing the conventional NMF framework can be formulated as in (8) and (9) for the Frobenius norm and divergence-based objective functions respectively.

arg min_{A≥0, S≥0} ‖X − AS‖²_F   (8)

arg min_{A≥0, S≥0} D(X‖AS)   (9)

III. KURTOSIS-BASED REGULARIZER

A. Kurtosis of a Signal
Central moments are often utilized in signal processing in order to characterize the spread of the probability density function (pdf) of a signal [18]. A normalized version of the fourth central moment, given by (10), is called the kurtosis of a signal.

kurtosis = E[(y − ȳ)⁴] / (E[(y − ȳ)²])²   (10)

Here y denotes the signal, ȳ denotes the mean of the signal, and E is the expectation operator. Intuitively, kurtosis provides a measure of the "peaky"ness of the shape of the pdf of a signal. Excess kurtosis is a measure which compares the kurtosis of a given pdf with that of a Gaussian distribution. Since the kurtosis of a Gaussian distribution equals 3, the excess kurtosis can be defined as in (11).
excess kurtosis = kurtosis − 3   (11)

Based on the value of excess kurtosis, distributions are categorized under three main types. A mesokurtic distribution is close to a Gaussian distribution and has an excess kurtosis close to zero. A leptokurtic (also known as super-Gaussian) distribution has a higher and sharper central peak, longer and flatter tails, and positive excess kurtosis. A platykurtic (also known as sub-Gaussian) distribution has a lower and broader central peak, shorter and thinner tails, and negative excess kurtosis. The Central Limit Theorem (CLT) ensures that a mixture of signals is approximately Gaussian irrespective of the distributions of the underlying source signals. Even though the converse of the CLT is not assured, i.e. it is not certain that every Gaussian signal is a mixture of non-Gaussian signals, in practical scenarios Gaussian signals often do consist of mixtures of non-Gaussian signals [18]. Thus, to extract the underlying source signals from a signal mixture, it is common practice in BSS to define a measure of non-Gaussianity and implement an algorithm which maximizes the defined measure, as Fig. 1 illustrates. Excess kurtosis is a suitable candidate for this purpose as it is a measure of non-Gaussianity: if the excess kurtosis of a signal is close to zero, the signal tends to be Gaussian, and if it is away from zero, the signal tends to be non-Gaussian (super- or sub-Gaussian). Since there are two types of non-Gaussian distributions, it is common practice in most BSS methods to assume that the source signals are of only one type. In this work, we assume the constituent spectra of an HSI to have super-Gaussian distributions. Hence, from a given HSI data matrix X, we aim to extract an endmember matrix A whose column-wise average kurtosis is maximized, utilizing an NMF framework. Thus, we introduce a novel constrained NMF algorithm which incorporates the maximization of the average kurtosis of endmembers.
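A minimal sketch of the sample excess kurtosis described above; the three distribution draws are purely illustrative:

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis: normalized fourth central moment minus the Gaussian
    baseline of 3. Positive -> super-Gaussian, negative -> sub-Gaussian."""
    y = np.asarray(y, dtype=float)
    c = y - y.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

rng = np.random.default_rng(0)
excess_kurtosis(rng.laplace(size=100_000))        # leptokurtic: positive
excess_kurtosis(rng.uniform(size=100_000))        # platykurtic: negative
excess_kurtosis(rng.standard_normal(100_000))     # mesokurtic: near zero
```

Sample estimates fluctuate, so large sample sizes are used above to keep the three cases clearly separated.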

B. Average Kurtosis
Obeying the notations introduced in Section II-A, A ∈ R^{n×r}_+ is the endmember matrix whose columns represent the spectra of each of the r endmembers of the HSI. Thus, it is possible to extract the i-th endmember utilizing a simple matrix manipulation as below,

a_i = A Φ_i   (12)

where a_i is the spectrum of the i-th endmember (the i-th column of matrix A) and Φ_i ∈ R^{r×1} is a column vector whose elements are all zeros except for the i-th element, which equals 1. If the kurtosis of the i-th endmember is K_i, it can be expressed as below according to (10),

K_i = E[(a_i − μ_i)⁴] / (E[(a_i − μ_i)²])²   (13)

where μ_i is the mean of a_i.
Thus, the average kurtosis over all r endmembers, K, can be expressed as below utilizing (12) and (13).

K = (1/r) Σ_{i=1}^{r} K_i   (14)
Thus, it is seen that K is a function of A; therefore, it can be written as K(A). We try to maximize K so that the extracted endmembers will have a higher average kurtosis, i.e. they will be closer to super-Gaussian signals. Hence, the proposed framework would favorably influence the extraction of more realistic endmember spectra from the underlying HSI.
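The column-wise average kurtosis K(A) can be estimated as in the following sketch, with sample moments standing in for the expectations in (13) and (14):

```python
import numpy as np

def average_kurtosis(A):
    """K(A): the kurtosis of each endmember spectrum (column of A),
    averaged over all r endmembers."""
    A = np.asarray(A, dtype=float)
    C = A - A.mean(axis=0, keepdims=True)                 # center each column
    kurt = np.mean(C**4, axis=0) / np.mean(C**2, axis=0) ** 2
    return float(kurt.mean())
```

For columns drawn from a Gaussian distribution this estimate is close to 3, i.e. an excess kurtosis near zero.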

C. Derivative of Average Kurtosis
In order to incorporate the average kurtosis regularizer into the conventional NMF framework, it is essential to find the gradient (or the partial derivatives) of K w.r.t. A and S, i.e. ∇_A K ∈ R^{n×r} and ∇_S K ∈ R^{r×m}. Since K is not a function of S, ∇_S K = 0 ∈ R^{r×m}. In this section, we provide a detailed explanation of finding ∇_A K. Since A is the endmember matrix, we denote each of its elements by A_{ki}, meaning the reflectance value belonging to the k-th spectral band of the i-th endmember. Thus, the (k, i)-th element of ∇_A K can be written by implementing an element-wise derivative. For the convenience of simplification, we assume that each of the endmember spectrum vectors has unit variance, i.e. E[(a_i − μ_i)²] = 1 for all i.
In order to rectify the effects of this assumption, a normalization step is carried out as discussed in Section V-B. As a result, we obtain a simplified version of (∇_A K)_{ki} as below, where A_{pi} is the reflectance value belonging to the p-th spectral band of the i-th endmember, and μ_i is the mean reflectance of the i-th endmember. As can be seen, (∇_A K)_{ki} is a summation of n partial derivative terms, for which the solutions can be obtained by utilizing the chain rule of calculus. Thus, (17) can be written as follows,
where S_i = (1/n) Σ_{p=1}^{n} (A_{pi} − μ_i)³ represents a normalized version of the third central moment (skewness) of the i-th endmember. However, in this work we do not explore the implication of skewness within the derivative of the average kurtosis. Concatenating the element-wise derivatives, we then express ∇_A K as the difference between two matrices,

∇_A K ∝ (Q − P)   (20)

where P_{ki} = S_i and Q_{ki} = (A_{ki} − μ_i)³. Then, Q and P can be written as in (21) and (22) respectively for the convenience of incorporating ∇_A K into the NMF framework,

Q = (NA) ∘ (NA) ∘ (NA)   (21)

P = (1/n) 1 Q   (22)
where N = I − (1/n) 1, and 1 ∈ R^{n×n} denotes a matrix whose elements are all ones.
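Under the unit-variance simplification, this gradient can be sketched numerically as below. The constant 4/(nr) comes from our own differentiation of the simplified K (the paper absorbs such constants into a scalar multiplying Q − P), and the finite-difference check in the usage note confirms the algebra:

```python
import numpy as np

def grad_avg_kurtosis(A):
    """Gradient of the average kurtosis w.r.t. A under the unit-variance
    simplification, in the form c * (Q - P): Q_ki = (A_ki - mu_i)^3 and
    P_ki = S_i, the column-wise mean of Q (the skewness-like term).
    The constant c = 4/(n r) follows from differentiating
    K = (1/(n r)) * sum_i sum_p (A_pi - mu_i)^4."""
    n, r = A.shape
    C = A - A.mean(axis=0, keepdims=True)
    Q = C**3
    P = np.tile(Q.mean(axis=0, keepdims=True), (n, 1))    # every row equals S_i
    return 4.0 / (n * r) * (Q - P)
```

Comparing this analytic gradient against central finite differences of the simplified K agrees to numerical precision, which is a convenient sanity check when implementing the update rules.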
IV. KURTOSIS-BASED SMOOTH NONNEGATIVE MATRIX FACTORIZATION (KBSNMF)

In this section, we propose a novel blind HU algorithm which not only promotes the independence of endmembers via the kurtosis regularizer but also promotes the smoothness of the abundance maps by integrating a smoothing matrix into the conventional NMF framework. Hence, we denominate the proposed algorithm Kurtosis-based Smooth NMF (KbSNMF). In the following subsections, we discuss two variants of KbSNMF depending on the objective function utilized for approximation.

A. KbSNMF-fnorm
Here we present KbSNMF based on Frobenius norm (KbSNMF-fnorm). The general optimization problem for KbSNMF-fnorm is as below.

arg min_{A, S} ‖X − AMS‖²_F − γ K(A)   s.t. A ≥ 0, S ≥ 0   (24)
Here, γ ∈ R_+ is a parameter which establishes the trade-off between the approximation error and the non-Gaussianity of the endmembers rendered by K, and M ∈ R^{r×r}_+ is a symmetric matrix called the smoothing matrix, defined as

M = (1 − θ) I + (θ/r) 1 1ᵀ   (25)

where I is the identity matrix, 1 ∈ R^{r×1} is a vector whose elements are all ones, and θ is a parameter which satisfies 0 ≤ θ ≤ 1 and controls the extent of smoothness. Enforcing smoothness onto the abundance matrix can be interpreted as Y = MS, where Y is the smoothness-enforced abundance matrix. When θ = 0, M = I; hence, Y = S and no smoothing occurs in S. As θ → 1, Y becomes smoother and reaches the smoothest possible at θ = 1. Fig. 2 demonstrates the effects of the smoothing parameter θ on the abundance maps.
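A sketch of the smoothing matrix, assuming the nsNMF-style construction M = (1 − θ)I + (θ/r)11ᵀ described above; the values of r and θ in the usage are illustrative:

```python
import numpy as np

def smoothing_matrix(r, theta):
    """Smoothing matrix M = (1 - theta) * I + (theta / r) * 11^T.
    theta = 0 gives M = I (no smoothing); theta = 1 replaces every
    abundance vector with its mean (the smoothest case)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return (1.0 - theta) * np.eye(r) + (theta / r) * np.ones((r, r))
```

Note that the columns of M sum to one, so Y = MS preserves the abundance sum-to-one property of S at every pixel.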
In order to find a solution to (24), we consider the objective function below.

L(A, S) = ‖X − AMS‖²_F − γ K(A)   (26)
In order to make the algorithm simpler, the variable matrices A and S are updated in turns: in each iteration, first A is updated while S is kept constant; then S is updated while A is kept constant. This scheme is called a block-coordinate descent approach and is widely utilized in NMF-based algorithms [66]. The update rules can be primarily written as follows.

A ← A − η_A ∘ ∇_A L,   S ← S − η_S ∘ ∇_S L   (27)
where ∘ denotes the Hadamard (element-by-element) product. Updating A and S directly amounts to computing the partial derivatives ∇_A L ∈ R^{n×r} and ∇_S L ∈ R^{r×m}, and finding suitable learning rates η_A ∈ R^{n×r}_+ and η_S ∈ R^{r×m}_+.
Computing the partial derivatives of L w.r.t. A and S can be seen as two parts, i.e. the partial derivative of the ‖X − AMS‖²_F term and that of the γK(A) term. We refer the readers to [43] and [66] for a detailed explanation of the partial derivative of ‖X − AMS‖²_F. Incorporating the result in (23), we can present the partial derivatives of L as follows,

∇_A L = 2(AMSSᵀMᵀ − XSᵀMᵀ) + γ̄ (Q − P),   ∇_S L = 2((AM)ᵀAMS − (AM)ᵀX)   (28)
where γ̄ is the scalar quantity which equals −2γ/(nr). By substituting ∇_A L and ∇_S L into the block-coordinate descent equations in (27), we can obtain the following update rules for KbSNMF-fnorm.

A ← A − η_A ∘ [2(AMSSᵀMᵀ − XSᵀMᵀ) + γ̄(Q − P)],   S ← S − η_S ∘ [2((AM)ᵀAMS − (AM)ᵀX)]   (29)
However, due to the subtracting terms in the gradients, the update rules (29) can cause A and S to contain negative elements, which contradicts the parts-based representation of the NMF framework as well as the HU setting. Thus, following a methodology similar to that proposed by Lee and Seung [29], we define data-adaptive learning rates η_A and η_S as below in order to ensure all-positive elements in A and S at each update step.
The fraction line denotes element-by-element division. This results in the multiplicative update rules for the proposed KbSNMF-fnorm algorithm as below.
For convenience, we reconfigure the placement of matrices. Therefore, the final update rules for the proposed KbSNMF-fnorm algorithm are as follows.
It can be seen that choosing the data-adaptive learning rates in the form of (30) to avoid subtraction enforces A and S to contain nonnegative elements throughout the block-coordinate descent approach, given initial nonnegative A and S.
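A hypothetical single iteration of KbSNMF-fnorm, assembled from the pieces above: it splits the gradient of L into positive and negative parts and forms multiplicative ratios. This is a sketch only; the constant factors may differ from the paper's exact final rules, and the columns of A are assumed pre-normalized to unit variance:

```python
import numpy as np

def kbsnmf_fnorm_step(X, A, S, M, gamma, eps=1e-9):
    """One hypothetical KbSNMF-fnorm iteration (sketch, not the paper's
    exact rules). Negative gradient parts go to the numerator and positive
    parts to the denominator, so nonnegativity is preserved; factors of 2
    are treated as absorbed into gamma."""
    n, r = A.shape
    # kurtosis gradient under the unit-variance simplification (Section III-C)
    C = A - A.mean(axis=0, keepdims=True)
    Q = C**3
    P = np.tile(Q.mean(axis=0, keepdims=True), (n, 1))
    gK = 4.0 / (n * r) * (Q - P)
    gK_pos, gK_neg = np.maximum(gK, 0.0), np.maximum(-gK, 0.0)

    MS = M @ S
    # maximizing K(A) places its positive gradient part in the numerator
    A *= (X @ MS.T + gamma * gK_pos) / (A @ MS @ MS.T + gamma * gK_neg + eps)
    AM = A @ M
    S *= (AM.T @ X) / (AM.T @ AM @ S + eps)
    return A, S
```

In a full implementation this step would be preceded by the variance normalization of Section V-B and wrapped in the stopping criteria of Section V-D.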

B. KbSNMF-div
Analogously, we present the following optimization problem for KbSNMF based on divergence (KbSNMF-div).

arg min_{A, S} D(X‖AMS) − γ K(A)   s.t. A ≥ 0, S ≥ 0   (33)

Following a similar procedure as in Section IV-A, the multiplicative update rules for the KbSNMF-div algorithm can be derived, where 1 ∈ R^{n×m} is a matrix whose elements are all ones, and the other notations are the same as defined previously.

V. ALGORITHM IMPLEMENTATION
In this section, we will discuss several points related to the implementation of the proposed algorithm.

A. Initialization
Many algorithms have been designed in the past to enhance the initialization of the conventional NMF problem. In this work, we utilize the Nonnegative Double Singular Value Decomposition (NNDSVD) algorithm [67] in order to initialize the matrices A and S. NNDSVD takes the HSI X and the no. of endmembers r as inputs and generates an initial pair of A and S matrices. The basic NNDSVD algorithm is based on two singular value decomposition (SVD) processes: the first approximates the data matrix, and the second approximates the positive sections of the resulting partial SVD factors, incorporating the properties of unit-rank matrices. Extensive evidence can be found to suggest that NNDSVD promotes the rapid convergence of the NMF algorithm.

B. Normalization
To avoid the complexity of computing ∇_A K, the endmember spectra are considered signals of unit variance (see Section III-C), which is not always true in the HU setting. In order to rectify this premise, at the beginning of each iteration of the proposed algorithm, we normalize the endmember spectra by their individual variances (see Algorithm 1: lines 5 and 16). Thus, the resulting algorithm follows the essence of projected gradient descent methods, which are often utilized in signal processing applications [18].

C. Convergence

Here, we have fixed the parameters γ and θ at 3 and 0.4 respectively for KbSNMF-fnorm, and at 8 and 0.4 respectively for KbSNMF-div. The selection of suitable γ and θ and their effects on the unmixing performance are extensively discussed in Section VI-C1. Observing Fig. 3(a) and 3(b), it is evident that KbSNMF converges to a local minimum w.r.t. A and S. Our primary objective of maximizing K has also been achieved, as can be clearly seen in Fig. 3(c) and 3(d). Meanwhile, as seen in Fig. 3(e) and 3(f), the Frobenius norm and the divergence respectively converge to local minima w.r.t. A and S, which ensures the quality of the approximation between X and AMS.

D. Termination
In this work, we utilize two stopping criteria, one based on the maximum no. of iterations and the other based on the rate of change in the objective function. We choose a maximum no. of iterations t_max and a minimum rate of change in the objective function C_min. The algorithm is terminated either if the present iteration t reaches t_max or if the present rate of change in the objective function C(t) falls below C_min. Here, C(t) = |L(t) − L(t−1)| / L(t−1) × 100%, where L(t) is the value of the objective function at the t-th iteration. The selection of suitable t_max and C_min is discussed in Section V-E.
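The two stopping criteria can be combined as in this small sketch; the default t_max and C_min match the choices discussed in Section V-E:

```python
def should_stop(losses, t_max=1000, c_min=1e-5):
    """Stop when the iteration count reaches t_max, or when the percentage
    change in the objective, C(t) = |L(t) - L(t-1)| / L(t-1) * 100, falls
    below c_min. `losses` holds the objective value at each iteration."""
    t = len(losses) - 1                      # current iteration index
    if t >= t_max:
        return True
    if t >= 1 and losses[-2] > 0:
        c_t = abs(losses[-1] - losses[-2]) / losses[-2] * 100.0
        return c_t < c_min
    return False
```

The loss history would be appended to once per iteration of the main KbSNMF loop, with this check evaluated immediately afterwards.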

E. Parameter Selection
Observing Fig. 3(a) and 3(b), it is evident that both variants of the KbSNMF algorithm have converged to local minima by the 1000th iteration. Thus, we fix t_max at 1000, preserving a reasonable allowance. It is also seen that the percentage change in the objective function around the 1000th iteration is in the order of 10⁻⁴. Thus, we fix C_min at 10⁻⁵ to ensure convergence. Determining the optimum control parameters γ and θ is discussed in Section VI-C1 via experiments.
Adhering to all the implementation issues discussed above, the proposed KbSNMF algorithm can be summarized as in Algorithm 1.

VI. EXPERIMENTAL RESULTS

A. Performance Criteria
In order to evaluate the performance of the proposed KbSNMF algorithm and assess its competitiveness with the other state-of-the-art algorithms, we utilize two performance criteria which are commonly adopted in HU performance evaluation, i.e. Spectral Angle Distance (SAD) and Root Mean Square Error (RMSE). In most of the previous literature on HU, SAD has been utilized to compare the extracted endmember spectra with the ground truth endmember spectra, while RMSE has been utilized to compare the extracted abundance maps with the ground truth abundance maps. In our work, SAD_i, as in (35), measures the spectral angle between the i-th ground truth endmember spectrum a_i and the corresponding extracted endmember spectrum â_i, in radians; RMSE_i, as in (36), measures the error between the i-th ground truth abundance map S_i and the corresponding extracted abundance map Ŝ_i.

SAD_i = cos⁻¹( a_iᵀ â_i / (‖a_i‖ ‖â_i‖) )   (35)

RMSE_i = ( (1/m) Σ_{j=1}^{m} (S_{ij} − Ŝ_{ij})² )^{1/2}   (36)
Unless otherwise noted, in all experiments SAD and RMSE are average values over all extracted endmember spectra and abundance maps respectively.
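As a sketch, the two criteria can be computed per endmember as below. The standard angle and root-mean-square definitions are assumed to match (35) and (36); the function names are illustrative, not from the paper.

```python
import numpy as np

def sad(a_true, a_est):
    """Spectral Angle Distance (radians) between a ground truth
    endmember spectrum and its extracted estimate."""
    cos_angle = np.dot(a_true, a_est) / (np.linalg.norm(a_true) * np.linalg.norm(a_est))
    # clip guards against round-off pushing the cosine slightly outside [-1, 1]
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def rmse(s_true, s_est):
    """Root mean square error between a ground truth abundance map
    and its extracted estimate."""
    return np.sqrt(np.mean((np.asarray(s_true) - np.asarray(s_est)) ** 2))
```

Averaging these per-endmember values over all endmembers gives the overall SAD and RMSE figures reported in the tables.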

B. Experimental Setting
The proposed algorithm is tested on simulated as well as real hyperspectral datasets. We also compare the performance of our proposed algorithm with popular state-of-the-art NMF-based HU baselines: L1/2-NMF [8], SGSNMF [38], Min-vol NMF [39], R-CoNMF [44], SSRNMF [45], and MVNTF [48]. To ensure that the evaluations are done on common grounds, we utilize the same initializing procedure and stopping criteria as mentioned in Sections V-A and V-D respectively, for all the competing algorithms except the MVNTF algorithm, which is initialized with random matrices.
Simulated HSI data were generated utilizing the hyperspectral imagery synthesis toolbox (HSIST)3 in order to conduct experiments. HSIST includes the full USGS spectral library4, which contains hundreds of endmember spectra including minerals, organic and volatile compounds, vegetation, and man-made materials. The corresponding abundance maps were generated incorporating a spherical Gaussian field [68].
To assess the performance of the proposed method in real environments, we conduct experiments on real hyperspectral data. The Samson dataset, the Urban dataset, and the Cuprite dataset (see Fig. 4) have been widely utilized for performance evaluation and comparison in recent HU studies [3], [42], [69]. Each pixel of the Samson dataset is recorded at 156 spectral channels covering wavelengths in the range of 401-889 nm with a spectral resolution of 3.13 nm. Each pixel of the Urban dataset is originally recorded at 210 spectral channels; however, due to dense water vapor and atmospheric effects, several bands are customarily removed prior to analysis, resulting in 162 spectral bands ranging from 400-2500 nm with a spectral resolution of 10 nm. The Urban dataset possesses several ground truth versions; here we utilize the one with five endmembers. The Cuprite dataset is a widely used benchmark dataset for HU, and each of its pixels is recorded at 188 spectral channels covering wavelengths in the range of 370-2480 nm.

3 http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Imagery_Synthesis_tools_for_MATLAB
4 https://www.usgs.gov/labs/spec-lab
The ground truths for all real datasets are derived utilizing a procedure similar to that of [4] and [70]. First, the Virtual Dimensionality (VD) algorithm [71] is utilized to determine the number of endmembers of the HSI. Second, the pixels that contain pure endmember spectra are chosen manually in accordance with the USGS mineral spectral library. Finally, the corresponding abundances are computed utilizing the CVX optimization toolbox in MATLAB. Ground truths generated in this manner are often utilized in HU method evaluation and comparison and are readily available5.
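For illustration, the final abundance-computation step can be sketched with the classical fully constrained least squares (FCLS) augmentation trick in place of CVX. This is only a sketch of one common way to impose the nonnegativity and sum-to-one constraints; `fcls_abundances` and the weight `delta` are hypothetical names, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import nnls

def fcls_abundances(A, x, delta=1e3):
    """Estimate the abundance vector s for one pixel spectrum x given the
    pure endmember matrix A (bands x endmembers): nonnegativity comes from
    NNLS, and the sum-to-one constraint is enforced softly by appending a
    heavily weighted row of ones to the system."""
    num_endmembers = A.shape[1]
    A_aug = np.vstack([A, delta * np.ones((1, num_endmembers))])
    x_aug = np.append(x, delta)
    s, _ = nnls(A_aug, x_aug)
    return s
```

Larger `delta` values enforce the sum-to-one constraint more strictly at the cost of numerical conditioning.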
C. Experiments on simulated data

1) Sensitivity to control parameters: We conduct experiments to find optimum values of γ and θ for KbSNMF-fnorm and KbSNMF-div. We increase γ from 0 to 25 in steps of 1, increase θ from 0 to 1 in steps of 0.1, and evaluate the unmixing performance at each step. It is seen that SAD reaches its minimum around γ = 3 and θ = 0.4 in Fig. 5(a) and around γ = 8 and θ = 0.4 in Fig. 5(b). Thus, we fix γ and θ at 3 and 0.4 respectively for KbSNMF-fnorm, and at 8 and 0.4 respectively for KbSNMF-div.
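The parameter sweep above can be sketched as a simple grid search; `run_unmixing` and `average_sad` are hypothetical callables standing in for running KbSNMF at a given (γ, θ) and scoring the result, not functions from the paper.

```python
import numpy as np

def grid_search(run_unmixing, average_sad):
    """Sweep gamma over 0..25 (step 1) and theta over 0..1 (step 0.1),
    returning the (gamma, theta) pair with the lowest average SAD."""
    best_gamma, best_theta, best_score = None, None, np.inf
    for gamma in range(0, 26):
        for theta in np.arange(0.0, 1.05, 0.1):  # 0.0, 0.1, ..., 1.0
            score = average_sad(run_unmixing(gamma, theta))
            if score < best_score:
                best_gamma, best_theta, best_score = gamma, theta, score
    return best_gamma, best_theta
```

In practice each (γ, θ) evaluation would run the full KbSNMF optimization, so the sweep is embarrassingly parallel across grid points.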
2) Unmixing performance: In this experiment, we compare the unmixing performance of KbSNMF with the other HU algorithms. Table I shows SAD values for each of the extracted endmember spectra and Table II shows RMSE values for each of the extracted abundance maps, under the different methods. It is clearly seen that KbSNMF, under both of its variants, dominates the other competing algorithms in terms of SAD while showing competitive performance in terms of RMSE. Figs. 6 and 7 respectively illustrate the endmember spectra and abundance maps extracted utilizing KbSNMF along with their ground truths.
3) Robustness to noise: In this experiment, we analyze how the proposed algorithm performs in noisy environments. We add zero-mean white Gaussian noise to the original noise-free simulated dataset with a predetermined signal-to-noise ratio (SNR) given by (37), where x is the pixel spectrum vector, n is the noise signal vector, and E is the expectation operator. We conduct the experiment under 11 SNR levels (0 dB, 5 dB, 10 dB, 15 dB, 20 dB, 25 dB, 30 dB, 35 dB, 40 dB, 45 dB, and 50 dB), and the results are illustrated in Fig. 8 in terms of SAD and RMSE. Although SSRNMF and MVNTF show high immunity to large noise in terms of SAD, it is discernible that KbSNMF-fnorm and KbSNMF-div report the best performance, outperforming all competing algorithms at noise levels in the range of 15-50 dB. They also show robustness to noise down to 30 dB. In terms of RMSE, both KbSNMF-fnorm and KbSNMF-div show robustness to noise down to 20 dB and gradually deteriorate in performance thereafter. However, both KbSNMF-fnorm and KbSNMF-div outperform SSRNMF at all noise levels in terms of RMSE. The superior performance of KbSNMF-fnorm and KbSNMF-div in terms of SAD is due to the novel auxiliary regularizer on the endmember matrix, which attempts to extract the most realistic endmember spectra.

4) Sensitivity to number of spectral bands: Here we vary the number of spectral bands of the endmembers and observe the unmixing performance of the algorithms. The results are shown in Fig. 9. KbSNMF-fnorm and KbSNMF-div outperform all the competing algorithms in terms of SAD for numbers of spectral bands in the range of 300-960. However, the performance of KbSNMF-fnorm and KbSNMF-div deteriorates drastically in terms of SAD for very low numbers of spectral bands, i.e., around 200 spectral bands. In terms of RMSE, KbSNMF-fnorm and KbSNMF-div outperform MVNTF at high numbers of spectral bands, specifically more than 180, and outperform SGSNMF at low numbers of spectral bands, i.e., below 480 spectral bands.

5) Sensitivity to number of endmembers: In this experiment, we vary the number of endmembers and investigate the performance of the algorithms. The results are illustrated in Fig. 10. All the algorithms have a tendency for their SAD performance to deteriorate as the number of endmembers increases. KbSNMF-fnorm and KbSNMF-div outperform R-CoNMF when the number of endmembers is low, i.e., below 7 endmembers, and outperform SGSNMF when the number of endmembers is high, i.e., above 5 endmembers. In terms of RMSE, KbSNMF-fnorm and KbSNMF-div outperform SGSNMF when the number of endmembers is low, i.e., below 4 endmembers.
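The noise-addition step used in the robustness experiment can be sketched as below, scaling zero-mean white Gaussian noise so that the SNR definition in (37) holds in expectation; `add_noise` is an illustrative name, not the paper's implementation.

```python
import numpy as np

def add_noise(X, snr_db, rng=None):
    """Add zero-mean white Gaussian noise to the pixel spectra X
    (bands x pixels) so that 10*log10(E[x^T x] / E[n^T n]) equals
    snr_db in expectation, following the SNR definition in (37)."""
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(X ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), X.shape)
    return X + noise
```

Lower SNR values (e.g. 0 dB) thus correspond to noise power equal to the signal power, matching the hardest setting in Fig. 8.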
6) Sensitivity to number of pixels: In this experiment, we illustrate how the proposed algorithm performs on simulated HSI datasets with different numbers of pixels. The number of pixels in an HSI is a major concern since it determines the amount of statistical information in the input to the algorithm. The amount of statistical information presented to a numerical algorithm determines its tendency to be trapped in a local minimum [72]. Fig. 11 illustrates the results in terms of SAD and RMSE. The unmixing performance of KbSNMF-fnorm and KbSNMF-div improves in terms of SAD as the number of pixels increases, and they even outperform all competing algorithms except MVNTF and SSRNMF when the number of pixels is very high, i.e., 64×64 and 128×128 pixels. In terms of RMSE, KbSNMF-fnorm and KbSNMF-div outperform SGSNMF when the number of pixels is very high, i.e., 64×64 and 128×128 pixels.

D. Experiments on real data
We compare the unmixing performance of KbSNMF with the other competing methods in terms of SAD and RMSE for the Samson and Urban datasets. For the Cuprite dataset, however, only the SAD values are tabulated.
1) Samson dataset: The endmember spectra extracted by KbSNMF-fnorm and KbSNMF-div closely follow their ground truth spectra. Also, the abundance maps extracted by KbSNMF-fnorm and KbSNMF-div are shown in Fig. 13, and it is evident that KbSNMF-div has managed to accurately extract the abundance maps.
2) Urban dataset: Table V shows SAD values for each of the extracted endmember spectra under the different methods. In terms of SAD, KbSNMF-fnorm outperforms all methods, and KbSNMF-div outperforms the rest of the methods except for Min-vol NMF. Also, KbSNMF-fnorm reports the best performance and KbSNMF-div reports the second best performance in extracting the spectra of the endmembers "Tree" and "Roof". The endmember spectra extracted utilizing KbSNMF-fnorm and KbSNMF-div are shown in Fig. 14, and it can be observed that they closely follow their ground truth spectra. Table VI reports RMSE values for each of the extracted abundance maps under the different methods. In terms of RMSE, Min-vol NMF outperforms all methods, followed by R-CoNMF and MVNTF. KbSNMF-fnorm reports the best performance in extracting the abundance maps of the endmembers "Grass" and "Roof", and KbSNMF-div reports the second best performance in extracting the abundance map of the endmember "Asphalt". The abundance maps extracted utilizing KbSNMF-fnorm and KbSNMF-div are shown in Fig. 15, and it can be observed that they closely follow their ground truth abundance maps.
3) Cuprite dataset: Table VII shows SAD values for each of the extracted endmember spectra under the different methods. In terms of SAD, KbSNMF-div sits in second place while SSRNMF stands at the top. Both KbSNMF variants report compelling results, as they show the best performance in extracting several endmembers, i.e., "Andradite", "Kaolinite 2", "Muscovite", and "Sphene". The endmember spectra extracted utilizing KbSNMF-fnorm and KbSNMF-div are shown in Fig. 16, and it can be observed that they closely follow their ground truth spectra.

VII. CONCLUSION
This paper proposed a blind HU algorithm called KbSNMF, which incorporates the independence of endmembers into the conventional NMF framework. This was done by introducing a novel kurtosis regularizer based on the fourth central moment of a signal, which signifies the statistical independence of the underlying signal. We presented a comprehensive derivation of the proposed algorithm along with its performance evaluation in simulated as well as real environments (diverse simulated HSI datasets and three standard real HSI datasets). We assessed the sensitivity of the proposed algorithm to control parameters, noise levels, number of spectral bands, number of pixels, and number of endmembers of the HSI. We also provided performance comparisons of the proposed algorithm with state-of-the-art NMF-based blind HU baselines. Moreover, experimental results verify that superior performance in endmember extraction can be obtained through the novel algorithm. Hence, the proposed algorithm can be effectively utilized to extract accurate endmembers, which can thereafter be passed as supervisory input data to modern DL methods.