Matched Shrunken Cone Detector (MSCD): Bayesian Derivations and Case Studies for Hyperspectral Target Detection

Hyperspectral images (HSIs) possess non-negative properties for both hyperspectral signatures and abundance coefficients, which can be naturally modeled using cone-based representation. However, in hyperspectral target detection, cone-based methods are barely studied. In this paper, we propose a new regularized cone-based representation approach to hyperspectral target detection, as well as its two working models by incorporating into the cone representation $l_{2}$-norm and $l_{1}$-norm regularizations, respectively. We call the new approach the matched shrunken cone detector (MSCD). Equally important, we provide principled derivations of the proposed MSCD from the Bayesian perspective: we show that MSCD can be derived by assuming a multivariate half-Gaussian distribution or a multivariate half-Laplace distribution as the prior distribution of the coefficients of the models. In the experimental studies, we compare the proposed MSCD with the subspace methods and the sparse representation-based methods for HSI target detection. Two real hyperspectral data sets are used for evaluating the detection performances on sub-pixel targets and full-pixel targets, respectively. Results show that the proposed MSCD can outperform other methods in both cases, demonstrating the competitiveness of the regularized cone-based representation.


Ziyu Wang, Rui Zhu, Kazuhiro Fukui, Member, IEEE, and Jing-Hao Xue

I. INTRODUCTION
With the help of remote sensors, hyperspectral imaging has become an important scientific tool in many real-world applications. In the analysis of hyperspectral images (HSIs), target detection is a major task, which aims to detect small objects or anomalies in a hyperspectral image. Typical target detection applications include military defence, agricultural management and mineral detection.
Target detection is essentially a binary classification problem, whose task is to determine whether an HSI pixel is a target spectrum or a background spectrum. Hence, target detection can be regarded as a binary hypothesis model with two competing hypotheses: the null hypothesis $H_0$ for the absence of the target, and the alternative hypothesis $H_1$ for the presence of the target. Binary hypothesis models for target detection have been reviewed in [1]-[4].
Target objects often appear as sub-pixels in an HSI. That is, the spectrum of an HSI pixel can be a mixture of different component spectra of materials. These component spectra are usually termed endmembers. To model the mixture of an HSI pixel, the linear mixing model (LMM) [5] has been widely adopted. The underlying assumption of LMM is that an HSI pixel can be approximated by a linear combination of endmembers with different fractions. When a target is present in a pixel, the spectrum is decomposed as a linear combination of background endmembers and target endmembers; in contrast, the spectrum of a background pixel is fully represented by background endmembers.
Within the framework of binary hypothesis modelling, researchers have explored a variety of techniques and extensions on the basis of LMM. Since it is difficult to obtain comprehensive spectral libraries to serve as the endmembers for all desired targets, many methods focus on extracting endmembers directly from HSIs. On the one hand, provided with a large number of background samples, subspace methods have been widely developed for target detection. Typical methods, such as the orthogonal subspace projection detector (OSP) [6] and the matched subspace detector (MSD) [7], adopt the leading eigenvectors (with dominant eigenvalues) as the subspace bases and implicitly as the endmembers. On the other hand, sparse representation (SR) techniques [8] originating from compressed sensing have recently been studied in HSI analysis [9]. For HSI target detection, SR-based methods, such as sparse target detection (STD) [10], the sparse representation-based binary hypothesis model (SRBBH) [11] and the hybrid sparsity and statistics detector [12], model a test HSI pixel as a linear combination of only a few training samples (also known as atoms of an over-complete dictionary). These methods implicitly regard the atoms as endmembers; hence the SR-based methods can be viewed as being developed in the original sample space.
These methods can be further extended to nonlinear mixing models. Kernel methods, which define a model in a high-dimensional feature space associated with a nonlinear mapping of the input data, have also been studied for HSI target detection [13]-[15]. In [13], subspace methods such as MSD and OSP have been extended to their kernel versions. Kernelised SR-based methods have also been developed, such as kernel-based STD [14] and kernel-based SRBBH [15].
In terms of physical interpretation, HSIs as instances of natural signals possess non-negative properties for both hyperspectral signatures and abundance coefficients. A number of investigations have focused on non-negative matrix factorisation (NMF) [16], [17] for HSI unmixing problems. NMF factorises a sample data matrix into two low-dimensional matrices in terms of bases and corresponding coefficients, and explicitly enforces non-negativity constraints on both of them. However, in previous research on HSI target detection [6], [7], [9]-[15], the non-negativity properties have not yet been considered, particularly for the abundance coefficients. If we use samples taken directly from HSIs as endmembers, it is desirable to impose non-negativity constraints on the coefficients. In this way, both endmembers and coefficients are non-negative, such that this physical characteristic of hyperspectral signatures is modelled.
Statistically, the estimation of non-negatively constrained coefficients in the LMM is often termed non-negative least squares (NNLS) [18]. Geometrically, the NNLS estimation induces a cone-shaped representation [19]. Suppose that a hyperspectral spectrum $x$ is a $p$-dimensional vector, and that there are $K$ types of materials, i.e. $K$ endmembers potentially constituting an HSI pixel, which are represented by $m_1,\ldots,m_K$ with each $m_k$ also a $p$-dimensional vector. Then the cone-based representation of pixel $x$ expresses the spectral signature of $x$ as a non-negative linear combination of endmembers $m_1,\ldots,m_K$ with corresponding non-negative abundance fractions $a_1,\ldots,a_K$, such that $a_k \geq 0$ for $k = 1,\ldots,K$. More specifically, a convex cone $C$ is defined as
$$C = \{x \mid x = Ma,\ a \geq 0\}, \tag{1}$$
where $M$ is a $p \times K$ matrix whose columns are the $K$ endmember spectra $m_k = [m_{k,1},\ldots,m_{k,p}]^T$, and $a = [a_1,\ldots,a_K]^T$ denotes the abundance vector. For the non-negative LMM, an additional noise term is also considered:
$$x = Ma + n, \quad a \geq 0, \tag{2}$$
where the vector $n$ is assumed to be Gaussian white noise, i.e. $n \sim N(0, \sigma^2 I_p)$, where $I_p$ is the $p \times p$ identity matrix.
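The cone representation above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes SciPy's `nnls` solver is acceptable for the NNLS problem, and the toy endmember matrix `M` and the helper name `cone_coefficients` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy endmembers: p = 3 bands, K = 2 materials (columns of M).
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

def cone_coefficients(M, x):
    """Return non-negative abundances a minimising ||x - M a||_2, and the residual norm."""
    a, residual_norm = nnls(M, x)
    return a, residual_norm

# A pixel inside the cone: x = M @ [2, 3] is represented exactly.
x_inside = M @ np.array([2.0, 3.0])
a_in, r_in = cone_coefficients(M, x_inside)

# A pixel outside the cone: an exact fit would need a negative abundance,
# so NNLS sets that abundance to zero and leaves a non-zero residual.
x_outside = M @ np.array([2.0, -1.0])
a_out, r_out = cone_coefficients(M, x_outside)
```

The residual norm is zero only when the pixel lies inside the cone spanned by the columns of `M`, which is the geometric picture used throughout the paper.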
It is worth noting that LMM-based methods may suffer from high variance of the coefficient estimates. To address this, shrinkage methods [20] have been developed in statistical learning. Typical shrinkage methods include $l_2$-norm regularisation, also known as ridge regression or Tikhonov regularisation, and $l_1$-norm regularisation, also known as the lasso. For convex cone analysis, these regularisations have also been studied, mainly with respect to the computational efficiency of algorithms developed on the basis of NNLS [21]-[24].
In this paper, to account for the non-negativity as well as the shrinkage of the coefficients of the convex cone model (2) for HSI target detection, we propose a new approach called the matched shrunken cone detector (MSCD). Specifically, on the cone representations we propose to shrink the abundance coefficients of target endmembers and background endmembers by imposing constraints; we propose two working models with the $l_2$-norm and $l_1$-norm regularisations, respectively. We call these two methods MSCD-$l_2$ and MSCD-$l_1$. Equally important, we derive the proposed MSCD from the Bayesian perspective, showing that MSCD-$l_2$ and MSCD-$l_1$ can be derived if a multivariate half-Gaussian distribution [25] and a multivariate half-Laplace distribution [26], respectively, are assumed as the prior distributions of the coefficient vectors. To our knowledge, this is the first time that the cone representations with the $l_2$-norm and $l_1$-norm regularisations are derived from the Bayesian perspective and that their prior distributions are identified.
The main novelties and contributions of this paper are summarised as follows.
1) We propose a regularised cone-based representation approach called MSCD for HSI target detection. This is the first time that the cone-based representation and its regularised versions are brought to HSI target detection.
3) More importantly, we derive the proposed MSCD-$l_2$ and MSCD-$l_1$ from the Bayesian perspective, showing that they imply a multivariate half-Gaussian distribution and a multivariate half-Laplace distribution as the prior distributions for the coefficients. To the best of our knowledge, this is the first time that the $l_2$-norm and $l_1$-norm regularised cone representations are derived from the Bayesian perspective with their corresponding prior distributions identified.
4) By illustrating the Bayesian derivations of the proposed MSCD-$l_2$ and MSCD-$l_1$, our principled work opens a door to new regularised models that accommodate various kinds of prior knowledge of practitioners, which provides a valuable direction to further enrich the research on HSI target detection.
5) Last but not least, we illustrate the competitive detection performance of the proposed models, compared with some classical and state-of-the-art HSI target-detection methods, on two real hyperspectral datasets for sub-pixel and full-pixel target detection, respectively.
It is worth noting that our proposed models are by nature different from the widely used sparse-representation-based detectors [10], [11], the collaborative-representation-based detector [27] and their hybrids [28], although our two working models also apply the $l_2$-norm and $l_1$-norm regularisations. From the modelling perspective, motivated by NMF and physical interpretations, our MSCD introduces non-negativity constraints into the estimation of the model coefficients. From the geometrical perspective, these non-negativity constraints induce a cone-shaped representation. Furthermore, we provide comprehensive statistical derivations of our proposed models from both frequentist and Bayesian perspectives. In fact, none of the non-negativity, the cone-based representation, or the Bayesian derivation was presented in [10], [11], [27], and [28].
In the rest of the paper, Section II reviews the binary hypothesis model in terms of the likelihood ratio test. Section III introduces the proposed MSCD. Section IV shows the derivations of the proposed MSCD-$l_2$ and MSCD-$l_1$ from the Bayesian perspective with the prior distributions of the coefficients identified. Section V illustrates the performance of MSCD compared with other subspace and SR-based methods, and Section VI concludes this work.

II. BINARY HYPOTHESIS TESTING MODEL FOR HSI TARGET DETECTION
HSI target detection methods are typically derived from a binary hypothesis testing model [3]. We suppose that an HSI pixel $x$ is a continuous random vector. A likelihood ratio of the conditional probability density functions (pdfs) on two competing hypotheses is constructed as follows:
$$D(x) = \frac{f_{x|H_1}(x)}{f_{x|H_0}(x)} \underset{H_0}{\overset{H_1}{\gtrless}} \nu. \tag{3}$$
In (3), $D(x)$ is an output detector, which is the ratio of the conditional pdf of $x$ under the alternative hypothesis $H_1$ to that under the null hypothesis $H_0$, i.e. $f_{x|H_1}(x)$ and $f_{x|H_0}(x)$; $\nu$ is a predefined detection threshold, such that when $D(x) > \nu$ the test HSI pixel $x$ is identified as a target. The pdfs are usually unknown and estimated parametrically. Specifically, the likelihood ratio is replaced by the generalised likelihood ratio (GLR), using the maximum likelihood estimates (MLEs) of the parameters:
$$D_{GLR}(x) = \frac{f_{x|H_1}(x; \hat{\omega}_1)}{f_{x|H_0}(x; \hat{\omega}_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \nu. \tag{4}$$
In (4), we use $\hat{\omega}_0$ and $\hat{\omega}_1$ to denote the MLEs of $\omega_0$ and $\omega_1$, where $\omega_0$ and $\omega_1$ are the unknown parameters of the conditional pdfs $f_{x|H_0}(x; \omega_0)$ and $f_{x|H_1}(x; \omega_1)$, respectively.

A. Formulation of LMM-Based Binary Hypothesis Models
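In the framework of LMM [5], a test pixel $x$ is modelled by a linear combination of target endmembers and background endmembers. Specifically, the LMM for HSI target detection is constructed as follows:
$$H_0: x = M_B\beta + n_0, \qquad H_1: x = M_T\gamma + M_B\beta + n_1, \tag{5}$$
where $M_T$ is a $p \times N_t$ matrix whose columns are $N_t$ target spectra; $M_B$ is a $p \times N_b$ matrix whose columns are $N_b$ background spectra; $\gamma$ and $\beta$ are the abundance vectors of $M_T$ and $M_B$, respectively; and $n_0$ and $n_1$ are assumed to be $p$-dimensional vectors of Gaussian white noise: $n_0 \sim N(0, \sigma^2_{H_0} I_p)$ and $n_1 \sim N(0, \sigma^2_{H_1} I_p)$, where $I_p$ is the $p \times p$ identity matrix. For a more convenient representation, we let $M$ be the concatenated matrix of $M_T$ and $M_B$:
$$M = [M_T, M_B].$$
Accordingly, we concatenate the abundance vectors $\gamma$ and $\beta$ of model $H_1$ into one vector $\alpha$: $\alpha = [\gamma^T, \beta^T]^T$. Then model $H_1$ can be rewritten as
$$x = M\alpha + n_1,$$
and the LMM-based binary hypothesis model becomes
$$H_0: x = M_B\beta + n_0, \qquad H_1: x = M\alpha + n_1, \tag{7}$$
where now the unknown parameters are $\beta$, $\alpha$, $n_0$ and $n_1$.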

B. Derivations of LMM-Based GLR
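The generalised likelihood ratio (GLR) of LMM for target detection is formulated as
$$D_{GLR}(x) = \frac{\max_{\alpha, \sigma^2_{H_1}} f_{x|H_1}(x)}{\max_{\beta, \sigma^2_{H_0}} f_{x|H_0}(x)} = \frac{(2\pi\hat{\sigma}_1^2)^{-p/2} \exp\left(-\|\hat{n}_1\|_2^2 / (2\hat{\sigma}_1^2)\right)}{(2\pi\hat{\sigma}_0^2)^{-p/2} \exp\left(-\|\hat{n}_0\|_2^2 / (2\hat{\sigma}_0^2)\right)}. \tag{8}$$
The MLEs $\hat{\sigma}_0^2$ and $\hat{\sigma}_1^2$ are equal to $\frac{1}{p}\|\hat{n}_0\|_2^2$ and $\frac{1}{p}\|\hat{n}_1\|_2^2$, respectively, so that both exponential factors in (8) equal $\exp(-p/2)$ and cancel. Taking the $2/p$ power of (8), we have
$$D(x) = \frac{\|\hat{n}_0\|_2^2}{\|\hat{n}_1\|_2^2} = \frac{\|x - M_B\hat{\beta}\|_2^2}{\|x - M\hat{\alpha}\|_2^2}. \tag{9}$$
The MLEs of $\beta$ and $\alpha$ in (9) are given by
$$\hat{\beta} = \arg\min_{\beta} \|x - M_B\beta\|_2^2 \tag{10}$$
and
$$\hat{\alpha} = \arg\min_{\alpha} \|x - M\alpha\|_2^2, \tag{11}$$
and thus by the least-squares estimates
$$\hat{\beta} = (M_B^T M_B)^{-1} M_B^T x \tag{12}$$
and
$$\hat{\alpha} = (M^T M)^{-1} M^T x. \tag{13}$$
Based on solutions (12) and (13), the residual sums of squares (RSS) $e_0$ and $e_1$ for models $H_0$ and $H_1$ are computed as
$$e_0 = \|x - M_B\hat{\beta}\|_2^2 \tag{14}$$
and
$$e_1 = \|x - M\hat{\alpha}\|_2^2, \tag{15}$$
respectively. The final GLR detector of LMM is then
$$D_{LMM}(x) = \frac{e_0}{e_1}. \tag{16}$$
The value of $D_{LMM}(x)$ is compared with a threshold $\nu$ to make the final decision on which hypothesis should be rejected for the test pixel $x$. It is worth noting that over-fitting may occur in (16); to address this, the matched subspace detector (MSD) [7] can be used instead. In MSD, the endmembers of background spectra and target spectra, $M_B$ and $M_T$, are represented by the leading eigenvectors of the background and target subspaces, respectively.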
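The unconstrained LMM detector described above, i.e. the ratio of the two residual sums of squares, can be sketched as follows. This is an illustrative sketch only: the function name `lmm_glr`, the synthetic spectra and the noise level are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def lmm_glr(x, M_T, M_B):
    """GLR detector of the unconstrained LMM: ratio of residual sums of squares e0 / e1."""
    M = np.hstack([M_T, M_B])                        # concatenated matrix [M_T, M_B]
    beta, *_ = np.linalg.lstsq(M_B, x, rcond=None)   # least-squares fit under H0
    alpha, *_ = np.linalg.lstsq(M, x, rcond=None)    # least-squares fit under H1
    e0 = np.sum((x - M_B @ beta) ** 2)
    e1 = np.sum((x - M @ alpha) ** 2)
    return e0 / e1

# Synthetic data (hypothetical): p = 20 bands, 5 background spectra, 1 target spectrum.
rng = np.random.default_rng(0)
p, n_b = 20, 5
M_B = rng.random((p, n_b))
M_T = rng.random((p, 1))
w = rng.random(n_b)
x_background = M_B @ w + 0.01 * rng.standard_normal(p)
x_target = 0.5 * M_T[:, 0] + M_B @ w + 0.01 * rng.standard_normal(p)

d_background = lmm_glr(x_background, M_T, M_B)
d_target = lmm_glr(x_target, M_T, M_B)
```

Because the $H_1$ model contains the $H_0$ model, the ratio is never below one; a pixel that contains the target spectrum yields a much larger value than a pure background pixel.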

III. MATCHED SHRUNKEN CONE DETECTOR (MSCD)
Rather than using an unconstrained LMM, it is desirable to adopt the non-negative linear model for modelling a mixed HSI pixel, so as to obtain a physically reasonable interpretation. On top of that, we also introduce regularisation to the non-negative representation to control the variance of the estimates, and derive the whole new model from the Bayesian perspective. In particular, we introduce the popular $l_2$-norm and $l_1$-norm regularisations to the cone-based representation. We call the proposed approach the matched shrunken cone detector (MSCD), with two specific models MSCD-$l_2$ and MSCD-$l_1$.

A. Regularised Cone
The cone representation of a mixed pixel and its $l_2$-norm and $l_1$-norm regularised models are formulated as follows, where $t > 0$ denotes a bound on the norm of the coefficient vector.
Cone representation:
$$C = \{x \mid x = Ma,\ a \geq 0\}. \tag{17}$$
$l_2$-norm regularised cone representation:
$$C_{l_2} = \{x \mid x = Ma,\ a \geq 0,\ \|a\|_2 \leq t\}. \tag{18}$$
$l_1$-norm regularised cone representation:
$$C_{l_1} = \{x \mid x = Ma,\ a \geq 0,\ \|a\|_1 \leq t\}. \tag{19}$$
To illustrate the relationship among (17), (18) and (19), we show a two-dimensional cone with different constraints in Fig. 1. It is easy to see that a non-negative linear combination of two endmembers $m_1$ and $m_2$ will always lie in the cone. With the additional $l_2$-norm or $l_1$-norm regularisation, the region of the constructed vectors is reduced to a fan or a triangle, respectively. In other words, the $l_2$-norm and $l_1$-norm regularisations shrink the values of the coefficient vector $a$ for the representation of an HSI pixel.
In the following sections, we shall derive the cone-based binary hypothesis models corresponding to the optimisation problems of (17), (18) and (19), respectively.

B. Regularised Cone-Based Estimators of Coefficient Vectors
The cone-based binary hypothesis models for target detection can be formulated as the model in (7) but with additional constraints. We call the models corresponding to (17), (18) and (19) the matched cone detector (MCD), the matched shrunken cone detector with $l_2$-norm regularisation (MSCD-$l_2$) and the matched shrunken cone detector with $l_1$-norm regularisation (MSCD-$l_1$), respectively.
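MCD: given the non-negative constraints (17), the MLEs of $\beta$ and $\alpha$ for models $H_0$ and $H_1$ of (7) are given by
$$\hat{\beta} = \arg\min_{\beta} \|x - M_B\beta\|_2^2 \quad \text{s.t. } \beta \geq 0, \tag{20}$$
$$\hat{\alpha} = \arg\min_{\alpha} \|x - M\alpha\|_2^2 \quad \text{s.t. } \alpha \geq 0. \tag{21}$$
MSCD-$l_2$: given the $l_2$-norm regularised cone representation in (18), the estimators of $\beta$ and $\alpha$ of (7) are given by
$$\hat{\beta} = \arg\min_{\beta} \|x - M_B\beta\|_2^2 + \lambda_0\|\beta\|_2^2 \quad \text{s.t. } \beta \geq 0, \tag{22}$$
$$\hat{\alpha} = \arg\min_{\alpha} \|x - M\alpha\|_2^2 + \lambda_1\|\alpha\|_2^2 \quad \text{s.t. } \alpha \geq 0. \tag{23}$$
MSCD-$l_1$: given the $l_1$-norm regularised cone representation in (19), the estimators of $\beta$ and $\alpha$ of (7) are given by
$$\hat{\beta} = \arg\min_{\beta} \|x - M_B\beta\|_2^2 + \lambda_0\|\beta\|_1 \quad \text{s.t. } \beta \geq 0, \tag{24}$$
$$\hat{\alpha} = \arg\min_{\alpha} \|x - M\alpha\|_2^2 + \lambda_1\|\alpha\|_1 \quad \text{s.t. } \alpha \geq 0. \tag{25}$$

IV. BAYESIAN DERIVATIONS OF MSCD
Given the cone representation under the null hypothesis $H_0$ of (7) and Bayes' theorem
$$f(\beta|x) = \frac{f(x|\beta) f(\beta)}{f(x)}, \tag{26}$$
the maximum a posteriori (MAP) estimate of $\beta$ is
$$\hat{\beta} = \arg\max_{\beta \geq 0} f(\beta|x) = \arg\max_{\beta \geq 0} f(x|\beta) f(\beta). \tag{27}$$
As the noise $n_0 \sim N(0, \sigma^2_{H_0} I_p)$, the likelihood function $f(x|\beta)$ can be formulated as
$$f(x|\beta) = (2\pi\sigma^2_{H_0})^{-p/2} \exp\left(-\frac{\|x - M_B\beta\|_2^2}{2\sigma^2_{H_0}}\right). \tag{28}$$
Similarly, the MAP estimate of $\alpha$ in the alternative hypothesis model is
$$\hat{\alpha} = \arg\max_{\alpha \geq 0} f(\alpha|x) = \arg\max_{\alpha \geq 0} f(x|\alpha) f(\alpha), \tag{29}$$
and as the noise $n_1 \sim N(0, \sigma^2_{H_1} I_p)$, the likelihood function $f(x|\alpha)$ can be formulated as
$$f(x|\alpha) = (2\pi\sigma^2_{H_1})^{-p/2} \exp\left(-\frac{\|x - M\alpha\|_2^2}{2\sigma^2_{H_1}}\right). \tag{30}$$
In the ordinary cone representations (20) and (21) of the MCD model, improper uniform (non-informative) prior distributions are in fact implied for parameters $\beta$ and $\alpha$, with $\beta \geq 0$ and $\alpha \geq 0$. However, in the proposed regularised MSCD-$l_2$ and MSCD-$l_1$, multivariate folded distributions are in fact utilised as the priors for the estimation of $\beta$ in (22) and (24) and of $\alpha$ in (23) and (25), as we shall show below.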
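The non-negative estimators of MCD, MSCD-$l_2$ and MSCD-$l_1$ described in Section III-B can be sketched with a simple projected-gradient solver. This is a minimal sketch under stated assumptions, not the solver used in the paper: the function name `shrunken_cone_fit`, the fixed iteration count and the toy data are hypothetical, and the objective is taken to be the residual sum of squares plus `lam` times the squared $l_2$-norm or the $l_1$-norm of the coefficients.

```python
import numpy as np

def shrunken_cone_fit(M, x, lam=0.0, penalty="l2", n_iter=5000):
    """Minimise ||x - M b||_2^2 + lam * pen(b) subject to b >= 0 by projected gradient.

    pen(b) = ||b||_2^2 for penalty="l2" and pen(b) = ||b||_1 = sum(b) for penalty="l1"
    (the l1-norm is linear on the non-negative orthant). lam = 0 gives the plain
    cone (NNLS) estimator.
    """
    b = np.zeros(M.shape[1])
    # Lipschitz constant of the gradient of the smooth objective.
    L = 2.0 * np.linalg.norm(M, 2) ** 2 + (2.0 * lam if penalty == "l2" else 0.0)
    for _ in range(n_iter):
        grad = 2.0 * M.T @ (M @ b - x)
        grad += 2.0 * lam * b if penalty == "l2" else lam
        b = np.maximum(b - grad / L, 0.0)   # gradient step, then projection onto b >= 0
    return b

# Hypothetical toy problem: 3 bands, 2 endmembers, pixel x = M @ [2, 3].
M = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x = np.array([2.0, 3.0, 5.0])

b_mcd = shrunken_cone_fit(M, x, lam=0.0)                   # no shrinkage
b_l2 = shrunken_cone_fit(M, x, lam=1.0, penalty="l2")      # shrunk towards zero
b_l1 = shrunken_cone_fit(M, x, lam=1.0, penalty="l1")      # shrunk towards zero
b_l1_big = shrunken_cone_fit(M, x, lam=20.0, penalty="l1") # all coefficients zeroed
```

A detector in the spirit of MSCD would then compare the residuals obtained with the background matrix alone and with the concatenated matrix; that step is omitted here to keep the sketch short.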

A. Folded Distributions
Suppose that the pdf of a random variable $Y$ is $g(y)$ with $y \in \mathbb{R}$. The folding of $g(y)$ over to the non-negative line is accomplished via the transform
$$X = |Y|, \tag{31}$$
where $X$ is a random variable on the non-negative real line, with pdf [26]:
$$f_X(x) = g(x) + g(-x), \quad x \geq 0. \tag{32}$$
If we treat the coefficients $\beta_i$ and $\alpha_i$ in (7) as random variables, then the non-negativity constraints on them imply that their pdfs are supported on $\mathbb{R}^+$. We shall show that a multivariate folded Gaussian distribution and a multivariate folded Laplace distribution are the prior distributions of the coefficients in the proposed MSCD-$l_2$ and MSCD-$l_1$, respectively.

B. Prior Distributions of β and α in MSCD-$l_2$
A univariate half-Gaussian distribution is defined as follows. If $Y \sim N(0, \sigma^2)$ with mean zero, then $X = |Y|$ follows a half-Gaussian distribution with mean and variance
$$E(X) = \sigma\sqrt{\frac{2}{\pi}}, \qquad \mathrm{Var}(X) = \sigma^2\left(1 - \frac{2}{\pi}\right).$$
An illustration of the half-Gaussian distribution is shown in Fig. 2. The half-Gaussian distribution is a special case of the folded version of the Gaussian distribution $N(\mu, \sigma^2)$ when $\mu = 0$. We shall show that, if two multivariate half-Gaussian distributions are imposed on the coefficients $\alpha$ and $\beta$, respectively, as the prior distributions, then the estimators (22) and (23) of MSCD-$l_2$ can be derived in a Bayesian way.
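The folding construction and the half-Gaussian moments can be checked by simulation. The sketch below assumes the standard half-normal results, mean $\sigma\sqrt{2/\pi}$ and variance $\sigma^2(1 - 2/\pi)$; the scale `sigma`, the sample size and the seed are arbitrary choices for illustration.

```python
import numpy as np

sigma = 2.0                                # assumed scale, for illustration only
rng = np.random.default_rng(0)
y = rng.normal(0.0, sigma, size=200_000)   # Y ~ N(0, sigma^2)
x = np.abs(y)                              # folding: X = |Y| is half-Gaussian

mean_theory = sigma * np.sqrt(2.0 / np.pi)
var_theory = sigma ** 2 * (1.0 - 2.0 / np.pi)
mean_mc, var_mc = x.mean(), x.var()        # Monte Carlo estimates
```

The Monte Carlo estimates should agree with the closed-form values up to sampling error, and every folded sample is non-negative, as required of a prior on abundance coefficients.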
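In the model of the null hypothesis $H_0$ of the proposed MSCD-$l_2$, let us assume a multivariate half-Gaussian distribution as the prior for the coefficient vector $\beta$. Specifically, suppose that a vector $s = [s_1,\ldots,s_{N_b}]^T$ follows $N(0, \sigma_\beta^2 I_{N_b})$ and that $\beta = |s|$ element-wise. Then $\beta$ follows a multivariate half-Gaussian distribution with the expectation
$$E(\beta) = \sigma_\beta\sqrt{\frac{2}{\pi}}\, 1_{N_b},$$
where $1_{N_b} = [1,\ldots,1]^T$ is an $N_b$-dimensional vector of all ones; the covariance matrix
$$\Sigma_\beta = \sigma_\beta^2\left(1 - \frac{2}{\pi}\right) I_{N_b},$$
where $I_{N_b}$ is the $N_b \times N_b$ identity matrix; and the pdf is
$$f(\beta) = \left(\frac{2}{\pi\sigma_\beta^2}\right)^{N_b/2} \exp\left(-\frac{\|\beta\|_2^2}{2\sigma_\beta^2}\right), \quad \beta \geq 0. \tag{36}$$
In MSCD-$l_2$, placing the likelihood function $f(x|\beta)$ (28) and the prior distribution $f(\beta)$ (36) into the MAP estimate (27) and taking the logarithm, we have
$$\hat{\beta} = \arg\min_{\beta \geq 0} \|x - M_B\beta\|_2^2 + \lambda_0\|\beta\|_2^2, \tag{37}$$
where $\lambda_0 = \sigma^2_{H_0}/\sigma^2_\beta$. In this way, parameter $\lambda_0$ effectively controls the degree of shrinkage via the ratio of the two variances $\sigma^2_{H_0}$ and $\sigma^2_\beta$. Equation (37) is exactly the same as model (22).
Similarly, let us assume that the prior distribution of the coefficients $\gamma$ of the target endmembers in the alternative hypothesis $H_1$ is a multivariate half-Gaussian distribution, with the expectation
$$E(\gamma) = \sigma_\gamma\sqrt{\frac{2}{\pi}}\, 1_{N_t},$$
where $1_{N_t} = [1,\ldots,1]^T$ is an $N_t$-dimensional vector; the covariance matrix
$$\Sigma_\gamma = \sigma_\gamma^2\left(1 - \frac{2}{\pi}\right) I_{N_t},$$
where $I_{N_t}$ is the $N_t \times N_t$ identity matrix; and the pdf is
$$f(\gamma) = \left(\frac{2}{\pi\sigma_\gamma^2}\right)^{N_t/2} \exp\left(-\frac{\|\gamma\|_2^2}{2\sigma_\gamma^2}\right), \quad \gamma \geq 0.$$
Then the concatenated $\alpha$ in model $H_1$ is in fact assumed to follow a half-Gaussian distribution with mean
$$E(\alpha) = \sqrt{\frac{2}{\pi}}\, [\sigma_\gamma 1_{N_t}^T, \sigma_\beta 1_{N_b}^T]^T;$$
the covariance matrix
$$\Sigma_\alpha = \left(1 - \frac{2}{\pi}\right) \mathrm{diag}(\sigma_1^2,\ldots,\sigma_{N_t+N_b}^2),$$
which is an $(N_t + N_b) \times (N_t + N_b)$ matrix; and the pdf is
$$f(\alpha) = \prod_{i=1}^{N_t+N_b} \sqrt{\frac{2}{\pi\sigma_i^2}} \exp\left(-\frac{\alpha_i^2}{2\sigma_i^2}\right), \quad \alpha \geq 0,$$
where $\sigma_i = \sigma_\gamma$ for $i = 1,\ldots,N_t$ and $\sigma_i = \sigma_\beta$ for $i = N_t + 1,\ldots,N_t + N_b$.
When $\sigma_\gamma = \sigma_\beta$, with both denoted by $\sigma_\alpha$, placing the likelihood function $f(x|\alpha)$ (30) and this prior distribution into the MAP estimate of $\alpha$ (29) and taking the logarithm, we have
$$\hat{\alpha} = \arg\min_{\alpha \geq 0} \|x - M\alpha\|_2^2 + \lambda_1\|\alpha\|_2^2, \tag{43}$$
where $\lambda_1 = \sigma^2_{H_1}/\sigma^2_\alpha$. Equation (43) is exactly the same as model (23).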
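We can further generalise (43) to a slightly adaptive shrinkage model:
$$\hat{\alpha} = \arg\min_{\alpha \geq 0} \|x - M\alpha\|_2^2 + \sum_{i=1}^{N_t+N_b} \lambda_{1i}\alpha_i^2. \tag{44}$$
In (44), when $i = 1,\ldots,N_t$, we have $\lambda_{1i} = \sigma^2_{H_1}/\sigma^2_\gamma$; when $i = N_t + 1,\ldots,N_t + N_b$, we have $\lambda_{1i} = \sigma^2_{H_1}/\sigma^2_\beta$.

C. Prior Distributions of β and α in MSCD-$l_1$
A univariate Laplace distribution with location $\mu$ and scale $b > 0$ has the pdf
$$g(y) = \frac{1}{2b} \exp\left(-\frac{|y - \mu|}{b}\right). \tag{45}$$
A folded Laplace distribution is also obtained via the transform (31), and the pdf of the transformed random variable $X$ becomes (32). Placing (45) in (32), we have the pdf of a folded Laplace distribution [26]:
$$f_X(x) = \frac{1}{2b}\left[\exp\left(-\frac{|x - \mu|}{b}\right) + \exp\left(-\frac{|x + \mu|}{b}\right)\right], \quad x \geq 0. \tag{46}$$
Specifically, when $\mu = 0$, (46) reduces to
$$f_X(x) = \frac{1}{b} \exp\left(-\frac{x}{b}\right), \quad x \geq 0, \tag{47}$$
which is the pdf of a half-Laplace distribution with mean $b$.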
We shall also show that, if two multivariate half-Laplace distributions are imposed on the coefficients $\alpha$ and $\beta$, respectively, as the prior distributions, then the estimators (24) and (25) of MSCD-$l_1$ can be derived in a Bayesian way.
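Let a random multivariate vector $\beta$ follow a multivariate half-Laplace distribution with independent components and a common diversity $\varphi_\beta$, whose pdf is
$$f(\beta) = \varphi_\beta^{-N_b} \exp\left(-\frac{\|\beta\|_1}{\varphi_\beta}\right), \quad \beta \geq 0. \tag{48}$$
Then placing the likelihood function $f(x|\beta)$ (28) and the prior distribution $f(\beta)$ (48) into the MAP estimate (27) and taking the logarithm, we have
$$\hat{\beta} = \arg\min_{\beta \geq 0} \|x - M_B\beta\|_2^2 + \lambda_0\|\beta\|_1, \tag{49}$$
where $\lambda_0 = 2\sigma^2_{H_0}/\varphi_\beta$ controls the degree of shrinkage through the ratio of $2\sigma^2_{H_0}$ and $\varphi_\beta$. Equation (49) is exactly the same as model (24).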
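The step "taking the logarithm" can be written out explicitly. The following is a sketch of that step under the assumption that the half-Laplace prior has independent components with a common scale $\varphi_\beta$, i.e. $f(\beta) \propto \exp(-\|\beta\|_1/\varphi_\beta)$ for $\beta \geq 0$; terms that do not depend on $\beta$ are collected in a constant.

```latex
\begin{align*}
-\log f(\beta \mid x)
  &= -\log f(x \mid \beta) - \log f(\beta) + \text{const} \\
  &= \frac{1}{2\sigma_{H_0}^2}\,\|x - M_B\beta\|_2^2
     + \frac{1}{\varphi_\beta}\,\|\beta\|_1 + \text{const},
     \qquad \beta \ge 0, \\
\hat{\beta}
  &= \arg\min_{\beta \ge 0}\;
     \|x - M_B\beta\|_2^2 + \frac{2\sigma_{H_0}^2}{\varphi_\beta}\,\|\beta\|_1 .
\end{align*}
```

The last line follows by multiplying the negative log-posterior by $2\sigma_{H_0}^2$, which does not change the minimiser and gives the shrinkage parameter $\lambda_0 = 2\sigma_{H_0}^2/\varphi_\beta$ stated in the text.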
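In the same fashion, the prior distribution of the coefficients $\gamma$ of the target endmembers in the alternative model $H_1$ is also assumed to be a multivariate half-Laplace distribution with pdf
$$f(\gamma) = \varphi_\gamma^{-N_t} \exp\left(-\frac{\|\gamma\|_1}{\varphi_\gamma}\right), \quad \gamma \geq 0. \tag{50}$$
As a result, the concatenated coefficient vector $\alpha$ in model $H_1$ is in fact assumed to follow a multivariate half-Laplace distribution as well, with pdf
$$f(\alpha) = \prod_{i=1}^{N_t+N_b} \frac{1}{\varphi_i} \exp\left(-\frac{\alpha_i}{\varphi_i}\right), \quad \alpha \geq 0, \tag{51}$$
where $\varphi_i = \varphi_\gamma$ for $i = 1,\ldots,N_t$ and $\varphi_i = \varphi_\beta$ for $i = N_t + 1,\ldots,N_t + N_b$. As with the derivations in Section IV-B, when $\varphi_\gamma = \varphi_\beta$, with both denoted by $\varphi_\alpha$, (51) can be rewritten as
$$f(\alpha) = \varphi_\alpha^{-(N_t+N_b)} \exp\left(-\frac{\|\alpha\|_1}{\varphi_\alpha}\right), \quad \alpha \geq 0. \tag{52}$$
Then placing the likelihood function $f(x|\alpha)$ (30) and the prior distribution $f(\alpha)$ (52) into the MAP estimate of $\alpha$ (29) and taking the logarithm, we have
$$\hat{\alpha} = \arg\min_{\alpha \geq 0} \|x - M\alpha\|_2^2 + \lambda_1\|\alpha\|_1, \tag{53}$$
where $\lambda_1$ is a shrinkage parameter equal to $2\sigma^2_{H_1}/\varphi_\alpha$. Equation (53) is exactly the same as model (25).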
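Again, (53) can be generalised as
$$\hat{\alpha} = \arg\min_{\alpha \geq 0} \|x - M\alpha\|_2^2 + \sum_{i=1}^{N_t+N_b} \lambda_{1i}\alpha_i, \tag{54}$$
where $\lambda_{1i} = 2\sigma^2_{H_1}/\varphi_\gamma$ for $i = 1,\ldots,N_t$ and $\lambda_{1i} = 2\sigma^2_{H_1}/\varphi_\beta$ for $i = N_t + 1,\ldots,N_t + N_b$.
It is worth noting that there is often only one target spectrum available in practice for HSI target detection. In such a case, the target training sample $M_T$ is a single $p \times 1$ vector instead of a $p \times N_t$ matrix. Then the variance $\sigma_\gamma$ defined in MSCD-$l_2$ and the diversity $\varphi_\gamma$ in MSCD-$l_1$ both have to be set to $\infty$, since neither $\sigma_\gamma$ nor $\varphi_\gamma$ can be estimated from a single target sample. In other words, we do not shrink the coefficient $\gamma \in \mathbb{R}$ for the target subset $M_T$ when $N_t = 1$, and we let the non-negative projection of a test HSI pixel $x$ onto the target endmember be as large as possible.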
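The per-coefficient shrinkage and the single-target case can be sketched for the $l_2$ model as follows. This is an illustration under assumptions, not the authors' code: it uses the standard trick of rewriting a ridge-type penalty as extra rows of an NNLS problem and solves it with SciPy's `nnls`; the function name `adaptive_l2_cone_fit` and the toy data are hypothetical. Setting the first penalty to zero corresponds to an infinite prior variance, i.e. no shrinkage of the target coefficient.

```python
import numpy as np
from scipy.optimize import nnls

def adaptive_l2_cone_fit(M, x, lam):
    """Solve min ||x - M a||_2^2 + sum_i lam[i] * a[i]^2  s.t. a >= 0 via augmented NNLS."""
    lam = np.asarray(lam, dtype=float)
    M_aug = np.vstack([M, np.diag(np.sqrt(lam))])    # penalty rows sqrt(lam_i) * e_i
    x_aug = np.concatenate([x, np.zeros(lam.size)])  # matching zero targets
    a, _ = nnls(M_aug, x_aug)
    return a

# Hypothetical example: one target column (not shrunk) and two background columns (shrunk).
M = np.eye(3)                       # columns: [target, background 1, background 2]
x = np.array([1.0, 2.0, 3.0])
lam = [0.0, 1.0, 1.0]               # lam = 0 for the single target endmember
alpha = adaptive_l2_cone_fit(M, x, lam)
```

With orthonormal columns each coefficient is simply `x[i] / (1 + lam[i])`, so the target coefficient is left untouched while the background coefficients are shrunk.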

D. Regularisation and Prior Distributions of MSCD
To adjust (and often improve) the performance of a statistical model like MSD or MCD, prior domain knowledge about the model, particularly about the coefficients, can be incorporated by imposing regularisation (a frequentist approach) or by assuming prior distributions (a Bayesian approach). These two approaches, although from different schools of statistical thinking and inference, can often achieve the same effect, in particular if we can pair a regularisation term with a prior distribution. That is, deriving the prior distribution that corresponds to a regularisation term not only provides a theoretical justification of the latter but also assists a deeper understanding of it, and vice versa. This inspires our derivation of MSCD from the Bayesian perspective.
Specifically, the benefit of proposing MSCD-$l_2$ and MSCD-$l_1$ can be understood from both the regularisation and the Bayesian points of view.
In MSCD-$l_2$, an $l_2$-norm regularisation term is added to impose constraints on the combination coefficients in the model of MCD. This shrinks the values of the coefficients and thus reduces the variances of the estimated coefficients, as is usually achieved by shrinkage methods [20]. From the Bayesian perspective, as the coefficients are non-negative, such an $l_2$-norm regularisation can be derived as corresponding to a multivariate half-Gaussian prior distribution for the coefficients, as we have shown in Section IV-B. Equivalently, using such a prior reduces the posterior variances of the coefficients, in a Bayesian sense. On the one hand, the original MCD models (20) and (21) are equivalent to (37) and (43) when $\lambda_0$ and $\lambda_1$ are zero, which implies the use of prior distributions of infinite prior variance. In contrast, the nonzero shrinkage parameters $\lambda_0$ and $\lambda_1$ in (37) and (43) imply finite prior variances for the coefficients. On the other hand, with such a prior, the posterior variance of a coefficient will be smaller than the variance of the estimator inferred from the likelihood only. Given this lower variance, MSCD-$l_2$ can provide more stable classification performance than MCD.
The case of MSCD-$l_1$ is similar to that of MSCD-$l_2$ in terms of shrinkage, though the $l_1$-norm regularisation on the coefficients of the cone representation-based MCD implies a multivariate half-Laplace prior distribution for the coefficients, as we have shown in Section IV-C. In fact, as is well known, $l_1$-norm regularisation (as in the lasso) or a Laplace prior distribution can induce not only shrinkage of the values of the coefficients but also zero values of some coefficients, i.e. sparsity of the coefficient vectors. This effectively implies an endmember selection in the cone representation for HSI target detection.

V. EXPERIMENTS
We conduct target detection experiments on two real hyperspectral datasets for sub-pixel target detection and full-pixel target detection, respectively. For sub-pixel target detection, a target appearing in an HSI is smaller than an HSI pixel. In this case we compare the target detection methods on the Hymap dataset [29], which was captured at the location of Cook City, USA. For full-pixel target detection, a target appearing in an HSI can occupy more than one HSI pixel. We use the dataset collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) from San Diego, CA, USA to evaluate the performance of detecting the full-pixel targets.
Fig. 4. An illustration of the dual-window scheme adopted for adaptively sampling background. For fair comparison, we set OWR = 15 × 15 and IWR = 9 × 9 for all the compared target detection methods listed in Table IV.
It is worth noting that, in real target detection problems, it is difficult to obtain training background pixels with a global approach. Instead, most works on target detection adopt a local and adaptive approach to obtaining the background samples. It is believed that, if the target samples appearing in an HSI scene are sparse enough, we can use the neighbouring HSI pixels around a test HSI pixel as a set of local background samples. Therefore, as with [3], [10], [11], and [27], we adopt the dual-window scheme. An illustration of a dual window is shown in Fig. 4, which separates a local area around a test HSI pixel into two regions: an inner window region (IWR) and an outer window region (OWR). The IWR is used to enclose the target of interest and need not be large. The OWR is set to be outside the IWR, and the HSI pixels lying between the IWR and the OWR are used to represent the background samples. However, it is often difficult to determine the window sizes in practice. Therefore, as with [11], [12], and [30], we empirically set OWR and IWR to be 15 × 15 and 9 × 9, respectively, for all compared methods, in order to detect targets appearing in both the Hymap and AVIRIS datasets.
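The dual-window sampling described above can be sketched as follows. This is a hypothetical helper, not code from the paper: the function name `dual_window_background`, the `(rows, cols, bands)` array layout and the policy of skipping window positions that fall outside the image are all assumptions of the sketch.

```python
import numpy as np

def dual_window_background(cube, row, col, owr=15, iwr=9):
    """Collect the pixels lying between the inner and outer windows centred at (row, col).

    cube has shape (rows, cols, bands). Window positions outside the image are skipped,
    which is one possible border policy and an assumption of this sketch.
    """
    half_outer, half_inner = owr // 2, iwr // 2
    rows, cols, _ = cube.shape
    samples = []
    for r in range(row - half_outer, row + half_outer + 1):
        for c in range(col - half_outer, col + half_outer + 1):
            in_image = r in range(rows) and c in range(cols)
            in_inner = max(abs(r - row), abs(c - col)) <= half_inner
            if in_image and not in_inner:
                samples.append(cube[r, c])
    return np.array(samples).T   # bands x N_b matrix, one background spectrum per column

# Hypothetical 30 x 30 image with 4 bands.
cube = np.zeros((30, 30, 4))
M_B_centre = dual_window_background(cube, 15, 15)
M_B_corner = dual_window_background(cube, 0, 0)
```

With OWR = 15 × 15 and IWR = 9 × 9, an interior test pixel receives 225 − 81 = 144 background samples, while pixels near the image border receive fewer under the skipping policy assumed here.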

A. The Hymap Dataset
1) Data Description: The Hymap dataset [29] serves as a standard dataset for evaluating hyperspectral target detection, such as in [28] and [31]-[33]. It has a spatial dimension of 280 × 800 and covers 126 spectral bands, as shown in Fig. 5. In this paper, we use the reflectance spectral data and preprocess the Hymap dataset to remove bad spectral bands that have negative values in the collected data, which leaves 119 spectral bands for evaluation.
In the Hymap scene, there exist seven types of targets, including four types of fabrics (F1, F2, F3 and F4) and three types of vehicles (V1, V2 and V3). In total, nine target samples need to be identified: F1, F2, F3a, F3b, F4a, F4b, V1, V2 and V3. The details of the targets and their corresponding locations in the scene are summarised in Table I.
The detection is performed for each type of target. For instance, if we are interested in detecting target F1, other targets that are not of interest will be regarded as background. Note that the spatial resolution of the Hymap dataset is 3 m. From the regions of interest (ROIs) of each type of target as shown in Table I, we can infer that only targets F1 and F2 are nearly full-pixel, as their ROIs occupy 3 m × 3 m. The rest of the targets are all smaller than an HSI pixel, as their ROIs are smaller than 3 m × 3 m. Therefore, the Hymap dataset is a good example for evaluating the sub-pixel target detection performance of all compared methods.
As the targets of interest are mainly located in the central region of the Hymap scene, we spatially crop a 100 × 300 sub-image from the original image in Fig. 5, as with [27], [28], and [34]. The cropped sub-image and the ROIs of all seven types of targets are shown in Fig. 6(a) and Fig. 6(b), respectively.
The ground-truth spectra of the seven types of targets (F1-F4 and V1-V3) are given in Fig. 7, and the sample spectral signatures of the corresponding targets in the Hymap scene are shown in Fig. 8. We can clearly see that the target spectral signatures in the scene are very different from the ground-truth spectra, which makes the detection difficult.
2) Experimental Settings: The ROIs indicate that a target pixel may appear at any coordinates within them; the exact number of pixels occupied by each type of target is unknown. As with the experimental settings in [32] and [34], a detection is regarded as correct if at least one pixel in the ROIs is identified as target. Moreover, since the predefined threshold of each compared detector is unknown, we also adopt the false alarm rate (FAR) defined in [32] and [34] for measuring the detection performance. The FAR is the number of pixels that are not in the target ROIs but have test values equal to or greater than the highest test value of the pixels within the ROIs, divided by the total number of pixels in the Hymap HSI, i.e. 30,000 in the example of Fig. 6. Hence the lower the FAR, the better the detection performance.
The parameters of the compared methods need to be determined. For the subspace methods OSP and MSD, this is the parameter r b , the number of leading eigenvectors of the background subspace. For the sparse-representation methods STD and SRBBH, it is the sparsity level L. For both the proposed MSCD-l 1 and MSCD-l 2 , they are λ 0 and λ 1 , the shrinkage parameters of models H 0 and H 1 , respectively. Owing to the limited number of training samples, we are unable to tune the parameters by cross-validation: we have only one ground-truth spectrum for each type of target, and no ground-truth spectra of background samples within the Hymap HSI. Therefore, for illustration purposes, we manually tune the parameters of each compared method to the values at which its FAR is the lowest, as done in most published works on the Hymap dataset [31], [32], [35]. The range of r b is [1, 119]; the range of L is [1, 30]. For the proposed MSCD-l 1 and MSCD-l 2 , we also manually tune λ 0 and λ 1 to their optimal values by sweeping over {1e-05, 1e-04, 1e-03, 1e-02, 1e-01, 1e+00, 1e+01, 1e+02}. The optimal values of r b for OSP and MSD and of L for STD and SRBBH are listed in Table II. The optimal values of λ 0 and λ 1 for the proposed MSCD-l 1 and MSCD-l 2 are listed in Table III.
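As a rough illustration of the FAR described above, the following Python sketch computes it from a map of test values and a Boolean ROI mask. The function name `false_alarm_rate`, the array layout and the toy numbers are assumptions made for illustration; they are not taken from [32] or [34].

```python
import numpy as np


def false_alarm_rate(scores, roi_mask):
    """FAR as described in the text: the number of pixels outside the ROIs
    whose test value is >= the highest test value inside the ROIs,
    divided by the total number of pixels in the image."""
    scores = np.asarray(scores, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    if scores.shape != roi_mask.shape:
        raise ValueError("scores and roi_mask must have the same shape")
    if not roi_mask.any():
        raise ValueError("roi_mask must contain at least one ROI pixel")
    threshold = scores[roi_mask].max()  # best-scoring pixel inside the ROIs
    false_alarms = np.count_nonzero(scores[~roi_mask] >= threshold)
    return false_alarms / scores.size


# Toy 3x3 "image" (illustrative values only).
scores = np.array([[0.10, 0.90, 0.20],
                   [0.40, 0.70, 0.30],
                   [0.80, 0.05, 0.60]])
roi = np.zeros((3, 3), dtype=bool)
roi[1, 1] = roi[1, 2] = True  # ROI covers the pixels scoring 0.70 and 0.30

far = false_alarm_rate(scores, roi)
print(far)
```

In this toy case the highest ROI value is 0.70 and two outside pixels (0.90 and 0.80) reach it, so the FAR is 2 out of 9 pixels.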

3) Experimental Results and Analysis:
The FARs of all compared methods for detecting each type of target are listed in Table IV. Firstly, for the cone-based detectors MCD, MSCD-l 2 and MSCD-l 1 , we can observe that, in terms of the sum of FARs over all targets, the proposed MSCD-l 2 (5.12e-02) and MSCD-l 1 (13.17e-02) outperform MCD (28.60e-02). This illustrates the effectiveness of incorporating the regularisations into the optimisation of non-negative problems. Furthermore, MSCD-l 2 performs considerably better than MSCD-l 1 , which implies that the l 2 -norm regularised cone representation is more effective than the l 1 -norm regularised cone representation for detecting the targets in the Hymap dataset.
Secondly, comparing all the methods listed in Table IV, we can clearly see that our proposed MSCD-l 2 outperforms ACE, CEM, OSP, MSD, STD, SRBBH, MCD and MSCD-l 1 for detecting targets F1, F4 and V1, and it performs the best in terms of the sum of FARs for the fabric targets F1-F4, with a sum of 0.25e-02. More importantly, MSCD-l 2 also outperforms the others over all types of targets, i.e. F1-F4 and V1-V3, with the smallest sum of FARs, 5.12e-02. This indicates that the proposed MSCD-l 2 is more competitive than the subspace and sparse-representation methods.
Last but not least, we note that, among the compared methods, the subspace method MSD and the sparse-representation method STD each perform better than the other method in their cohort, i.e. MSD is better than OSP and STD is better than SRBBH in terms of the sum of FARs of all targets. STD also has competitive performance for detecting the vehicle targets, particularly V2 and V3. However, neither MSD nor STD is as good as the proposed MSCD-l 2 in terms of the sum of FARs for detecting all targets. This also suggests that MSCD-l 2 is more stable than the other methods across the types and sizes of targets.
To further illustrate the detection performances of the compared methods, we display the prediction maps of all methods for detecting target F4 in Fig. 9. Fig. 9(b) shows the ground-truth map of target F4. The value of each pixel shown in Fig. 9(c)-9(k) represents the test statistic value of the pixel: the brighter the pixel, the higher the test statistic value, and thus the more likely a target. That is, we expect a good prediction map to show a clear pattern in which the pixels located within the ROIs of F4 are brighter than those outside. From these prediction maps, we can visually observe that 1) ACE (Fig. 9(c)), CEM (Fig. 9(d)), OSP (Fig. 9(e)) and MSD (Fig. 9(f)) show no such clear pattern; 2) STD (Fig. 9(g)), SRBBH (Fig. 9(h)), MCD (Fig. 9(i)) and MSCD-l 1 (Fig. 9(j)) look better, but we can easily spot many outside pixels brighter than the pixels within the ROIs of F4; 3) among all the maps, that of MSCD-l 2 in Fig. 9(k) looks the best: the bright pixels largely cluster around the ground truth of F4 rather than spread over the scene as in the other prediction maps, although it still does not achieve a zero FAR (FAR = 0.04e-02 in Table IV).
4) Discussion on Effects of Parameters: We further investigate the effects of two types of parameters on the performance of our proposed MSCD methods: 1) the shrinkage parameters λ 0 and λ 1 ; and 2) the window sizes IWR and OWR.
Firstly, from Fig. 10 we can make two observations. 1) The "−FAR" surface of MSCD-l 2 (Fig. 10(b)) is smoother than that of MSCD-l 1 (Fig. 10(a)), which indicates that MSCD-l 2 is less sensitive to the shrinkage parameters λ 0 and λ 1 than MSCD-l 1 . 2) For both MSCD-l 2 and MSCD-l 1 , the detection performance tends to be stable over a wide range of values of λ 0 and λ 1 ; that is, the values of λ 0 and λ 1 do not have to be exactly the same as those in Table III to achieve a similar performance for MSCD-l 2 and MSCD-l 1 .
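The kind of sweep that produces a surface such as Fig. 10 can be sketched as follows. This is only an outline under stated assumptions: `far_fn` is a hypothetical callable that runs a detector with a given pair (λ 0 , λ 1 ) and returns its FAR, and the quadratic used in the usage lines is a stand-in, not a real detector.

```python
import numpy as np

# The eight candidate values quoted in the text: 1e-05, ..., 1e+02.
LAMBDA_GRID = [10.0 ** k for k in range(-5, 3)]


def sweep_shrinkage(far_fn, grid=LAMBDA_GRID):
    """Evaluate far_fn(lam0, lam1) on every grid pair.

    Returns the FAR surface (rows index lam0, columns index lam1)
    and the pair with the lowest FAR."""
    surface = np.empty((len(grid), len(grid)))
    for i, lam0 in enumerate(grid):
        for j, lam1 in enumerate(grid):
            surface[i, j] = far_fn(lam0, lam1)
    i_best, j_best = np.unravel_index(np.argmin(surface), surface.shape)
    return surface, (grid[i_best], grid[j_best])


def toy_far(lam0, lam1):
    # Stand-in for a detector run: smallest at lam0 = 1e-02, lam1 = 1e+01.
    return (np.log10(lam0) + 2.0) ** 2 + (np.log10(lam1) - 1.0) ** 2


surface, best_pair = sweep_shrinkage(toy_far)
neg_far = -surface  # plotting "-FAR" puts the best setting at the peak
print(best_pair)
```

A flat region around the peak of `neg_far` corresponds to the low sensitivity to λ 0 and λ 1 discussed above.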
Secondly, we investigate the effects of window sizes on the performance of the compared detectors: OSP, MSD, STD, SRBBH, MCD, MSCD-l 1 and MSCD-l 2 . For a simple and effective exploration, we fix the values of the other parameters (r b and L in Table II and λ 0 and λ 1 in Table III), fix the IWR and vary the OWR. From Fig. 11, we can see that, among the detectors, MSCD-l 2 is the most stable with respect to OWR, while the two subspace detectors OSP and MSD are the most sensitive.
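To make the roles of IWR and OWR concrete, the sketch below collects background samples for one pixel under test. It assumes the usual dual-window convention, in which the background samples are the pixels inside the outer window but outside the inner window, both centred on the pixel under test; clipping the windows at the image border is a further assumption, and the function name is hypothetical.

```python
import numpy as np


def dual_window_background(cube, row, col, iwr=9, owr=15):
    """Return the spectra lying inside the outer window (owr x owr) but
    outside the inner window (iwr x iwr), both centred on (row, col).

    cube has shape (height, width, bands); the result has shape
    (n_samples, bands). Windows are clipped at the image border."""
    if iwr % 2 == 0 or owr % 2 == 0 or iwr >= owr:
        raise ValueError("window sizes must be odd, with iwr < owr")
    height, width, _ = cube.shape
    half_out, half_in = owr // 2, iwr // 2
    r0, r1 = max(row - half_out, 0), min(row + half_out + 1, height)
    c0, c1 = max(col - half_out, 0), min(col + half_out + 1, width)
    rows, cols = np.mgrid[r0:r1, c0:c1]
    # A pixel is background if it falls outside the inner window.
    outside_inner = (np.abs(rows - row) > half_in) | (np.abs(cols - col) > half_in)
    return cube[rows[outside_inner], cols[outside_inner], :]


rng = np.random.default_rng(0)
cube = rng.random((30, 30, 5))  # toy cube: 30 x 30 pixels, 5 bands
centre_bg = dual_window_background(cube, 15, 15)
corner_bg = dual_window_background(cube, 0, 0)
print(centre_bg.shape, corner_bg.shape)
```

With the 9 × 9 and 15 × 15 windows used here, an interior pixel yields 15² − 9² = 144 background spectra, so enlarging the OWR directly changes how many background samples each detector sees.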

B. The AVIRIS Dataset
1) Data Description: The AVIRIS data were captured at an airport in San Diego, CA, USA, with the planes as targets. We select a sub-image that spatially covers a region of 100×100 pixels. As with [11] and [12], we remove some bad spectral bands and preserve 189 spectral bands for evaluation. In the AVIRIS scene, there are three planes to be detected, consisting of 58 HSI pixels that are labelled as target pixels. The hyperspectral image scene and the ground-truth map are shown in Fig. 12(a) and Fig. 12(b), respectively. It is clear that each target plane covers more than one HSI pixel. Hence the AVIRIS dataset adopted here is suitable for evaluating the full-pixel target detection performance of the compared methods.
2) Experimental Settings: Because the labels for individual HSI pixels are available in the AVIRIS dataset, we select the three central HSI pixels, one from each plane, as the prior spectra of target signatures, as with [11] and [12]. The rest of the target HSI pixels are used to evaluate the detection performances of the methods. The 58 target spectra and the three training target spectra are shown in Fig. 13(a) and Fig. 13(b), respectively. We can observe that the spectra of the target HSI pixels still look different from each other. However, compared with Fig. 7 and Fig. 8 for the Hymap dataset, the spectral pattern of the AVIRIS targets may be clearer and the targets may be easier to detect, as the training target pixels are from the HSI rather than from spectral libraries.
As with [11] and [12], we use the receiver operating characteristic (ROC) curves to measure the detection performances for the AVIRIS dataset. The reason for using ROC instead of FAR is that we now have the labelling information for every single target HSI pixel, rather than only the ROIs available in the Hymap dataset. The closer an ROC curve lies to the top-left corner of the plot, the better the detection performance of the method. Additionally, we adopt the area under curve (AUC) statistic to quantitatively measure the detection performance alongside the ROC curves.
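For readers who wish to reproduce this kind of evaluation, a minimal NumPy sketch of an ROC curve and its AUC from per-pixel test statistics and ground-truth labels is given below. The function names and the toy scores are assumptions for illustration; the area is computed with the trapezoidal rule.

```python
import numpy as np


def roc_points(scores, labels):
    """False-positive and true-positive rates obtained by lowering the
    detection threshold through every distinct test-statistic value."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    if labels.all() or not labels.any():
        raise ValueError("need at least one target and one background pixel")
    order = np.argsort(-scores, kind="stable")
    scores, labels = scores[order], labels[order]
    # Last index of each run of equal scores, so tied pixels switch together.
    cut = np.r_[np.nonzero(np.diff(scores))[0], scores.size - 1]
    tpr = np.r_[0.0, np.cumsum(labels)[cut] / labels.sum()]
    fpr = np.r_[0.0, np.cumsum(~labels)[cut] / (~labels).sum()]
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy example: three target pixels (label 1) and three background pixels.
scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.40]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
auc_value = area_under_curve(fpr, tpr)
print(auc_value)
```

In the toy example, eight of the nine (target, background) pairs are ranked correctly, which is why the area equals 8/9.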
Similarly, the parameters of each compared method need to be determined: the number of leading eigenvectors r b for the subspace methods OSP and MSD; the sparsity level L for the SR-based methods; and the shrinkage parameters λ 0 and λ 1 for both of the proposed MSCD-l 1 and MSCD-l 2 . Again, for illustration purposes, the parameters are empirically determined and the values are listed in Table V, with the same tuning ranges of values as for the Hymap dataset.
3) Experimental Results and Analysis: The ROC curves of all the compared methods are shown in Fig. 14 and the corresponding AUC statistics are listed in Table V. Once again, we can observe that the proposed MSCD-l 1 and MSCD-l 2 both outperform MCD, which indicates the benefit of incorporating the l 1 -norm and l 2 -norm regularisations into the cone-based representation for HSI target detection. Moreover, the proposed MSCD-l 1 is among the best of all the compared methods. This implies that, for detecting full-size target HSI pixels, introducing the sparsity constraints on the coefficients into the MCD can achieve better performance than the l 2 -norm constraints on the coefficients. Generally speaking, the cone-representation methods are better than the sparse-representation methods, and the sparse-representation methods are better than the subspace methods, for detecting full-size target HSI pixels in the AVIRIS dataset.
We also plot the prediction maps for all the methods and display them in Fig. 15. It can be seen that the cone-representation methods, i.e. MCD (Fig. 15(g)), MSCD-l 1 (Fig. 15(h)) and MSCD-l 2 (Fig. 15(i)), look relatively better than the others. The differences among these three prediction maps are small. Among the other six methods (ACE, CEM, OSP, MSD, STD and SRBBH), MSD (Fig. 15(d)) looks the worst, as it is badly affected by the dual window scheme (Fig. 4); and STD looks better than ACE, CEM, OSP, MSD and SRBBH. However, the colour contrast in Fig. 15(e) of STD is not as large as those in Fig. 15(g)-15(i) of the cone-representation methods. This means that the test statistics of background pixels and target pixels are better separated for MCD, MSCD-l 1 and MSCD-l 2 than for STD, which further illustrates the stable performances of the cone-based methods for detecting targets in the AVIRIS dataset.
4) Discussion on Effects of Parameters: As with the analysis for the Hymap dataset, we also investigate the effects of the shrinkage parameters λ 0 and λ 1 and the window sizes IWR and OWR on the detection performance on the AVIRIS dataset. Firstly, from Fig. 16 we can observe two patterns similar to those from Fig. 10: 1) MSCD-l 2 is less sensitive to λ 0 and λ 1 than MSCD-l 1 ; and 2) both MSCD-l 2 and MSCD-l 1 can achieve good performance over a wide range of values of λ 0 and λ 1 .
Secondly, Fig. 17 shows that MCD, MSCD-l 1 and MSCD-l 2 are less sensitive to OWR, that is, more robust to the variation of background samples, than OSP, MSD, STD and SRBBH.
By analysing the experimental results of the two datasets, we can observe that MSCD-l 2 performs better than MSCD-l 1 for the Hymap dataset, while MSCD-l 1 is better than MSCD-l 2 for the AVIRIS dataset. This is in line with the debate between choosing sparse representation (lasso) or collaborative representation (ridge regression) in HSI analysis: both methods have their own advantages and it remains an open question which one is better. We also note that MSCD-l 1 and MSCD-l 2 require more computation than ACE, CEM, OSP and MSD, because there is no closed-form solution to the cone-based optimisation. The computational costs of STD, SRBBH, MCD, MSCD-l 1 and MSCD-l 2 , measured on an Intel i7-4790 CPU, are listed in Table VI.
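To indicate why an iterative solver is needed, the sketch below fits non-negative coefficients by projected gradient descent. It is a generic illustration, not the solver used in the experiments: the objective forms ||x − Dc||² + λ||c||² and ||x − Dc||² + λ||c||₁ with c ≥ 0, the function name `shrunken_cone_fit`, and the toy dictionary D are all assumptions.

```python
import numpy as np


def shrunken_cone_fit(x, D, lam, penalty="l2", n_iter=2000):
    """Projected-gradient sketch (assumed objective forms) for
        min_{c >= 0} ||x - D c||_2^2 + lam * ||c||_2^2   (penalty="l2")
        min_{c >= 0} ||x - D c||_2^2 + lam * ||c||_1     (penalty="l1")
    where the columns of D are the atoms that span the cone."""
    if penalty not in ("l2", "l1"):
        raise ValueError("penalty must be 'l2' or 'l1'")
    x = np.asarray(x, dtype=float)
    D = np.asarray(D, dtype=float)
    # Lipschitz constant of the gradient of the smooth objective.
    lip = 2.0 * np.linalg.norm(D, 2) ** 2
    if penalty == "l2":
        lip += 2.0 * lam
    if lip == 0.0:
        return np.zeros(D.shape[1])
    step = 1.0 / lip
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ c - x)
        if penalty == "l2":
            grad += 2.0 * lam * c
        else:
            # For c >= 0, ||c||_1 = sum(c), so its gradient is constant.
            grad += lam
        # Gradient step followed by projection onto the non-negative orthant.
        c = np.maximum(c - step * grad, 0.0)
    return c


# Toy check with an identity dictionary, where the minimiser is known.
D = np.eye(2)
x = np.array([1.0, -1.0])
c_l2 = shrunken_cone_fit(x, D, lam=1.0, penalty="l2")
c_l1 = shrunken_cone_fit(x, D, lam=0.5, penalty="l1")
print(c_l2, c_l1)
```

Each pixel requires such an iterative fit under both hypotheses, whereas detectors such as ACE and CEM reduce to closed-form matrix expressions; this is consistent with the higher per-pixel cost noted above.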

VI. CONCLUSION
In this paper, we have proposed a new approach called the matched shrunken cone detector (MSCD) for hyperspectral target detection. Two new working models of MSCD, namely MSCD-l 2 and MSCD-l 1 , have also been proposed, with the l 2 -norm and l 1 -norm regularisations incorporated into the MSCD, respectively. Geometrically, we have analysed the underlying effectiveness of MSCD. The values of the coefficients are shrunken within a cone by either the l 2 -norm regularisation or the l 1 -norm regularisation, which form two different constrained regions for the coefficients. Statistically, we have derived MSCD from the Bayesian perspective. We have shown that if a multivariate half-Gaussian distribution or a multivariate half-Laplace distribution is assumed as the prior distribution of the coefficients, then MSCD-l 2 or MSCD-l 1 , respectively, can be derived. In our experiments, case studies on two real hyperspectral datasets have been conducted, with the Hymap dataset illustrating sub-pixel target detection and the AVIRIS dataset illustrating full-pixel target detection. We have compared four categories of detectors: the baseline methods, the subspace methods, the sparse-representation methods and the cone-representation methods. Experimental results on both datasets have shown the competitive performance of the proposed MSCD.
We would like to make two further notes about the Bayesian derivations. On the one hand, in the Bayesian paradigm, the half-Gaussian or half-Laplace prior distribution can be assumed on the basis of our prior knowledge that the model coefficients are positive. In principle, any distribution of a positive random variable can be assumed as the prior for such a coefficient; in our case, the half-Gaussian and half-Laplace distributions match the l 2 -norm and l 1 -norm regularisations, respectively. That is, the half-Gaussian and half-Laplace priors provide us with a principled Bayesian interpretation of the two regularised models. On the other hand, if practitioners hold some specific prior domain knowledge that is better modelled by other positive prior distributions, such as log-normal or gamma distributions, a Bayesian derivation like ours can open a door to new regularised models that fit their practice better. This can be an interesting and practically valuable direction in which to extend the principled work presented in this paper.
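The correspondence between the two priors and the two regularisers can be summarised by a short maximum a posteriori (MAP) calculation. The following is a sketch under an assumed observation model, $\mathbf{x} = \mathbf{B}\boldsymbol{\beta} + \mathbf{e}$ with $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma^{2}\mathbf{I})$ and $\boldsymbol{\beta} \ge \mathbf{0}$; the symbols $\mathbf{x}$, $\mathbf{B}$ and $\sigma$ are introduced here for illustration, and the resulting expressions for $\lambda$ hold only under these assumptions.

```latex
% Assumed model (illustrative): x = B beta + e,  e ~ N(0, sigma^2 I),  beta >= 0.
\begin{align}
\hat{\boldsymbol{\beta}}_{\mathrm{MAP}}
  &= \arg\max_{\boldsymbol{\beta} \ge \mathbf{0}}
     \; p(\mathbf{x} \mid \boldsymbol{\beta})\, p(\boldsymbol{\beta})
   = \arg\min_{\boldsymbol{\beta} \ge \mathbf{0}}
     \; \frac{1}{2\sigma^{2}} \lVert \mathbf{x} - \mathbf{B}\boldsymbol{\beta} \rVert_2^2
     - \log p(\boldsymbol{\beta}). \\[4pt]
% Half-Gaussian prior with scale sigma_beta:
p(\boldsymbol{\beta}) \propto
  \exp\!\Big( -\tfrac{1}{2\sigma_{\beta}^{2}} \lVert \boldsymbol{\beta} \rVert_2^2 \Big)
  &\;\Longrightarrow\;
  \hat{\boldsymbol{\beta}}_{\mathrm{MAP}}
   = \arg\min_{\boldsymbol{\beta} \ge \mathbf{0}}
     \lVert \mathbf{x} - \mathbf{B}\boldsymbol{\beta} \rVert_2^2
     + \lambda \lVert \boldsymbol{\beta} \rVert_2^2,
  \quad \lambda = \frac{\sigma^{2}}{\sigma_{\beta}^{2}}. \\[4pt]
% Half-Laplace prior with scale b:
p(\boldsymbol{\beta}) \propto
  \exp\!\Big( -\tfrac{1}{b} \lVert \boldsymbol{\beta} \rVert_1 \Big)
  &\;\Longrightarrow\;
  \hat{\boldsymbol{\beta}}_{\mathrm{MAP}}
   = \arg\min_{\boldsymbol{\beta} \ge \mathbf{0}}
     \lVert \mathbf{x} - \mathbf{B}\boldsymbol{\beta} \rVert_2^2
     + \lambda \lVert \boldsymbol{\beta} \rVert_1,
  \quad \lambda = \frac{2\sigma^{2}}{b}.
\end{align}
```

Under these assumptions a tighter prior (smaller $\sigma_{\beta}$ or $b$) corresponds to a larger shrinkage parameter. Replacing the prior by, say, a gamma density would change only the $-\log p(\boldsymbol{\beta})$ term and hence the regulariser.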
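If $\mathbf{s} = [s_1, \ldots, s_{N_b}]^T$ follows a multivariate Gaussian distribution $\mathcal{N}(\mathbf{0}, \sigma_{\beta}^{2}\mathbf{I}_{N_b})$, where $\mathbf{I}_{N_b}$ is the $N_b \times N_b$ identity matrix, then $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_{N_b}]^T$ follows a multivariate half-Gaussian distribution with $\beta_i = |s_i|$ and $\beta_i \ge 0$, where $i = 1, \ldots, N_b$. The expectation of $\boldsymbol{\beta}$ is $\mathrm{E}[\boldsymbol{\beta}] = \sigma_{\beta}\sqrt{2/\pi}\,\mathbf{1}_{N_b}$, where $\mathbf{1}_{N_b}$ is the $N_b$-dimensional vector of ones.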

Fig. 3. Illustration of a half-Laplace distribution.

C. Prior Distributions of β and α in MSCD-l 1
A Laplace distribution is defined as follows. If a random variable $Y$ has a Laplace distribution $L(\mu, b)$, then it has mean $\mu$, variance $2b^{2}$, and pdf $f(y \mid \mu, b) = \frac{1}{2b}\exp\!\big(-|y - \mu| / b\big)$.

Fig. 7. Rescaled prior spectra of all the targets in the SPL files: (a) fabric panels; (b) vehicles.

Fig. 10. Effects of λ 0 and λ 1 of (a) MSCD-l 1 and (b) MSCD-l 2 on detecting target F4 in the Hymap dataset. Window sizes: IWR 9 × 9 and OWR 15 × 15. Since a smaller FAR is better, for easier reading we plot "−FAR" such that the best detection performance occurs at the peak of the surface plot.

Fig. 13. Spectra of targets in the AVIRIS dataset: (a) all target spectra in the hyperspectral scene; (b) spectra of three training target pixels, which are the central pixels of the three planes, respectively.

TABLE I. LIST OF THE TARGETS IN THE HYMAP DATASET [29]

TABLE II. PARAMETER SETTINGS: THE NUMBER r b OF LEADING EIGENVECTORS OF OSP AND MSD; AND THE SPARSITY LEVEL L OF STD AND SRBBH

TABLE IV. FALSE ALARM RATE (FAR) OF COMPARED METHODS FOR THE HYMAP DATASET. THE OWR AND IWR ARE SET TO BE 15 × 15 AND 9 × 9, RESPECTIVELY, FOR OSP, MSD, STD, SRBBH, MCD, MSCD-l 1 AND MSCD-l 2 . THE MINIMUM FARs ARE IN BOLDFACE

TABLE V. PARAMETERS AND AUC STATISTICS OF THE COMPARED METHODS FOR THE AVIRIS DATASET. THE OWR AND IWR ARE SET TO BE 15 × 15 AND 9 × 9, RESPECTIVELY, FOR OSP, MSD, STD, SRBBH, MCD, MSCD-l 1 AND MSCD-l 2 . THE MAXIMAL AUC IS IN BOLDFACE

Fig. 16. Effects of λ 0 and λ 1 of (a) MSCD-l 1 and (b) MSCD-l 2 on detecting targets in the AVIRIS dataset.

Fig. 17. Effects of window sizes on detecting targets in the AVIRIS dataset.

TABLE VI. EXECUTION TIME (SEC/PIXEL) SPENT ON THE AVIRIS DATASET