Hierarchical Sub-Pixel Anomaly Detection Framework for Hyperspectral Imagery

Anomaly detection is an important task in hyperspectral processing. Some previous works, based on statistical information, focus on Reed-Xiaoli (RX), as it is one of the most classical and commonly used methods. However, its performance tends to be affected when anomaly target size is smaller than spatial resolution. Those sub-pixel anomaly target spectra are usually much similar with background spectra, and may results in false alarm for traditional RX method. To address this issue, this paper proposes a hierarchical RX (H-RX) anomaly detection framework to enhance the performance. The proposed H-RX method consists of several different layers of original RX anomaly detector. In each layer, the RX’s output of each pixel is restrained by a nonlinear function and then imposed as a coefficient on its spectrum for the next iteration. Furthermore, we design a spatial regularization layer to enhance the sub-pixel anomaly detection performance. To better illustrate the hierarchical framework, we provide a theoretical explanation of the hierarchical background spectra restraint and regularization process. Extensive experiments on three hyperspectral images illustrate that the proposed anomaly detection algorithm outperforms the original RX algorithm and some other classical methods.


Introduction
Hyperspectral imagery (HSI) records detailed spectral information in hundreds of narrow contiguous bands over the visible light spectrum to the Near Infrared (NIR) or Short-Wave Infrared (SWIR) bands. Each HSI pixel is a vector that represents the radiance or reflectance value of each band; therefore, HIS can be considered a 3-D cube. Such data provide deterministic spectral information about different materials and objects in the scene. Benefitting from this unique characteristic, HSI is becoming a valuable tool for many real-world applications, such as agriculture surveillance [1,2], mineralogy [3], and environmental sciences [4][5][6]. Among these applications, hyperspectral image anomaly detection plays an important role.
Anomaly detection in HSI can be regarded as a particular case of target detection that assumes no prior knowledge of the target's spectral signature [7]. The main task of anomaly detection is to detect an object with an unusual spectral signature with respect to the background, especially small man-made objects. Over the past two decades, several algorithms for hyperspectral anomaly detection have been proposed, such as the RX detector (RXD) [8], robust PCA (RPCA) [9], low-rank and sparse representation (LRASR) [10], and low-rank and sparse matrix decomposition-based Mahalanobis distance (LSMAD) [11]. These algorithms can be classified into two classes: Statistical-based methods and low-rank or sparse matrix decomposition-based methods.
Therefore, all spectra of this hyperspectral image can be arranged in an L × N matrix as X = [x 1 , x 2 , · · · , x n ]. The two competing hypotheses that the RX algorithm must distinguish are given as follows: H 0 : x = n (Target absent) H 1 : x = a * s + n (Target present) (2) where a = 0 under H 0 and a > 0 under H 1 . n is a vector that represents the background clutter noise process, and s is the spectra signature of the signal (target) given by s = s 1 , s 2 , · · · , s j . Therefore, the RX algorithm anomaly detection can be defined as follows: In addition, this formula can be simply defined as: where µ b represents the estimated mean spectral signature and Here, C b is the estimated covariance matrix of the N pixels belonging to the background. The RX algorithm is based on exploiting the difference between the spectral signatures of an input pixel with its surrounding neighbors. This distance comparison is very similar to the Mahalanobis distance measure [7].
Due to the equivalence mentioned above, influence factors on the RX performance can be identified from the perspective of Mahalanobis metrics. The larger Mahalanobis distance indicates the spectra belongs to anomaly category with a higher probability. That is, it greatly depends on the dissimilarity between the anomaly spectra and the average one. Considering the facts that anomaly spectra are sparse among background, and background spectra are not always same, since of the same object with different spectra phenomenon, we can thus infer that the metric is affected by two aspects: (1) Difference between anomaly and average spectra; (2) variation of background spectra.
Such kind of factors would have more impact on the sub-pixel anomaly detection with small ratio. In general, it is intractable to discriminate legitimate anomalies from detections that are not of interest, since of the deficiency of prior knowledge on the anomaly type [7]. Specifically, when the ratio of sub-pixel is small so that anomaly spectra is much similar to the background one, the Mahalanobis distance of false anomaly may be larger than one of legitimate anomalies, and thus degrade the robustness of RX detector.
Given aforementioned analysis, one of feasible strategies to improve the original RX is enhancing discriminability between anomaly spectra and background ones. This is also the motivation of the proposed hierarchical anomaly detection framework. But unlike the iterative methods [16,17], which consist of the many layers with same affects and functions, we built the framework by using different layers with different purposes. Moreover, they use iterative method to remove the spectra once it determined as anomalies. Different from them, we design the hierarchical framework, where we just restrain the background spectra by a nonlinear suppression function so as to better estimate the mean background vector and enhance background-anomalies contrast.

Hierarchical-RX Algorithm
In this paper, we propose that a restraint on the background spectra is helpful for anomaly detection problems. In addition, regularization on the anomaly detection result is helpful for pixel/sub-pixel anomaly detection tasks. The major characteristics of the proposed hierarchical framework can be summarized as follows.
(1) The original RX detector can be considered a single-layer detector, whereas the proposed H-RX detector consists of several layers of traditional RX detectors, and each layer is linked in series.
(2) After each layer of detection, the background spectra are restrained based on the current layer's anomaly score. In this approach, anomaly spectra become more distinguished from the background spectra, which implies that the H-RX method can achieve a better performance in the same situations to which the original RX detector is applied.
(3) To enhance the result for pixel/sub-pixel anomaly detection, a regularization method based on point spread characteristics is designed to eliminate the effects between the anomaly spectra and background spectra around anomaly pixels.
The flowchart of the proposed hierarchical-RX anomaly detection method is shown in Figure 1. Compared with the original RX detection method, the H-RX anomaly detection architecture is structurally divided into two phases: (1) Hierarchical anomaly detection, and (2) anomaly detection result regularization in spatial domain. framework by using different layers with different purposes. Moreover, they use iterative method to remove the spectra once it determined as anomalies. Different from them, we design the hierarchical framework, where we just restrain the background spectra by a nonlinear suppression function so as to better estimate the mean background vector and enhance background-anomalies contrast.

Hierarchical-RX Algorithm
In this paper, we propose that a restraint on the background spectra is helpful for anomaly detection problems. In addition, regularization on the anomaly detection result is helpful for pixel/sub-pixel anomaly detection tasks. The major characteristics of the proposed hierarchical framework can be summarized as follows.
(1) The original RX detector can be considered a single-layer detector, whereas the proposed H-RX detector consists of several layers of traditional RX detectors, and each layer is linked in series.
(2) After each layer of detection, the background spectra are restrained based on the current layer's anomaly score. In this approach, anomaly spectra become more distinguished from the background spectra, which implies that the H-RX method can achieve a better performance in the same situations to which the original RX detector is applied.
(3) To enhance the result for pixel/sub-pixel anomaly detection, a regularization method based on point spread characteristics is designed to eliminate the effects between the anomaly spectra and background spectra around anomaly pixels.
The flowchart of the proposed hierarchical-RX anomaly detection method is shown in Figure 1. Compared with the original RX detection method, the H-RX anomaly detection architecture is structurally divided into two phases: (1) Hierarchical anomaly detection, and (2) anomaly detection result regularization in spatial domain.  For the former phase, a background suppression layer is developed to restrain the background spectra of the input hyperspectral data and a stop criterion layer is designed to determine whether the data need further detection processing. For the anomaly detection result regularization phase, the  For the former phase, a background suppression layer is developed to restrain the background spectra of the input hyperspectral data and a stop criterion layer is designed to determine whether the data need further detection processing. For the anomaly detection result regularization phase, the result regularization layer is tactically designed based on the point spread characteristics, and it is designed to enhance the anomaly detection result.
Consider HSI with N pixels and L spectral bands, which are arranged in an L × N matrix as X = [x 1 , x 2 , · · · , x n ]. For the k th layer, the H-RX detector's output of this layer can be represented as follows: where x(i) k and µ k represent the spectral matrix and the mean spectra of the k th layer, respectively; C k represents the correlation matrix of the k th layer; and y k is the output of the k th layer and represents the possibility of that an anomaly target corresponds to the spectra, and it has a value within [0, 1]. Then, each spectral vector x k i is transformed by multiplying a nonnegative number q y k i based on its output score as follows: In the formulation, the nonlinear function q(t) is imposed on the spectral vector x k i . We consider this function a "soft-threshold" operation that retains the spectrum x i , whose output score is large, while suppressing the spectrum x i , whose output score is small. In this way, the undesired background spectra are gradually suppressed after each layer's detection, whereas the target spectra will remain unchanged. In this paper, the nonlinear suppression function is defined as follows: where λ is a positive parameter to adjust the shape of the function (8). Figure 2 shows the shape of the function (8)  result regularization layer is tactically designed based on the point spread characteristics, and it is designed to enhance the anomaly detection result. Consider HSI with pixels and spectral bands, which are arranged in an × matrix as = [ 1 , 2 , ⋯ , ]. For the ℎ layer, the H-RX detector's output of this layer can be represented as follows: where ( ) and represent the spectral matrix and the mean spectra of the ℎ layer, respectively; represents the correlation matrix of the ℎ layer; and is the output of the ℎ layer and represents the possibility of that an anomaly target corresponds to the spectra, and it has a value within [0, 1]. Then, each spectral vector is transformed by multiplying a nonnegative number ( ) based on its output score as follows: In the formulation, the nonlinear function ( ) is imposed on the spectral vector . We consider this function a "soft-threshold" operation that retains the spectrum , whose output score is large, while suppressing the spectrum , whose output score is small. In this way, the undesired background spectra are gradually suppressed after each layer's detection, whereas the target spectra will remain unchanged. In this paper, the nonlinear suppression function is defined as follows: where is a positive parameter to adjust the shape of the function (8). Figure 2 shows the shape of the function (8)   The potential anomaly spectra and the transformed background spectra will be used to construct the new input data for the anomaly detector in the ( + 1) ℎ layer. The aforementioned steps will be repeated until the output meets the stop criterion. In this paper, we calculate , which is the error of the average output energy of the current layer and the previous layer, and it can be defined as follows: In the Equation (9), refers to a small positive number. Therefore, if ≤ , then the iteration will be stopped; whereas if > , then the operation proceeds directly to the next iteration. After a range of anomaly detection, we incorporated a point spread function (PSF) [18,19]-based regularization into our HSI anomaly detection framework to promote the detection results under small anomaly targets. The potential anomaly spectra and the transformed background spectra will be used to construct the new input data for the anomaly detector in the (k + 1) th layer. The aforementioned steps will be repeated until the output y k meets the stop criterion. In this paper, we calculate δ k , which is the error of the average output energy of the current layer and the previous layer, and it can be defined as follows: In the Equation (9), ε refers to a small positive number. Therefore, if δ k ≤ ε, then the iteration will be stopped; whereas if δ k > ε, then the operation proceeds directly to the next iteration. After a range of anomaly detection, we incorporated a point spread function (PSF) [18,19]-based regularization into our HSI anomaly detection framework to promote the detection results under small anomaly targets. The spectral mixing model for a sub-pixel target is a difficult and complicated modeling task. We consider spectral mixing to have an effect on the detection result; therefore, we design the regularization method in the spatial domain to enhance the anomaly detection performance. In this regularization method, we modeled the radiation character of the anomaly target by using the PSF as follows: In the Equation (10), I 0 is a pixel of the H-RX detection result; (x 0 , y 0 ) represent the location of this pixel; σ x and σ y are the horizontal and vertical extent parameters, respectively; and I(x, y) denotes the value of the surrounding location (x, y). Then, for the center pixel, we have the point spread indicator p: where I M and I N represent the average value of the direct neighbor domain and the diagonal neighbor domain, respectively. The point spread indicator p of all pixels in anomaly detection result is calculated. Then, the pixel is a potential anomaly target if the p value is in the protection interval. At this point, the H-RX is complete. Briefly, the outline of the proposed H-RX algorithm for hyperspectral anomaly detection is given as follows.

Theoretical Analysis
Here, we provide theoretical analyses of the proposed H-RX algorithm. Firstly, a detailed description of our designed background spectral suppression layer is provided. Secondly, the stop criteria layer is analyzed. Finally, the motivation and function of the spatial regularization layer is clarified.

Background Spectral Suppression Layer
Here, we provide a detailed description and theoretical analysis of the background spectral suppression layer. For the (k + 1) th layer and the prior k th layer, according to formula (7) and (8), we have Because y k i ∈ [0, 1], the detection result between the (k + 1) th layer and k th layer has the following relationship: Based on this relationship, the spectral result between the (k + 1) th layer and k th layer has the following relationship: For the background spectral results, each spectral vector x k i is multiplied by a number smaller than 1 and every element of this spectral vector is changed to a much smaller value. However, for the anomaly target spectral result, the original value is maintained for every band. Then, the difference between the background spectral result and anomaly target spectral result will be enlarged, and the background spectral result is restrained so that the vector has only a small value (close to a zero vector).
As a result, the output of the background spectrum will be restrained to a small value, but the output will be much larger for the anomaly spectrum. In the ideal situation, After performing background spectral suppression layer processing, we have the following: After several layers of restraint operations, the background spectral suppression results in the vector containing only the value 0 and only the anomaly target spectrum retaining its original vector value.
The background spectral suppression layer has the following characteristics.
(1) The output of the H-RX algorithm will be restrained to a small constant for the background spectrum, whereas this constant will generally retain the original value for the anomaly target spectrum. (2) This layer can increase the difference between the target spectrum and background spectrum.
(3) Because the background spectrum is restrained to a zero vector, the sparsity of the rebuilt data will be significantly increased. (4) Although the H-RX algorithm contains several RX detectors, as the data sparsity increases, the calculation speed of each layer with the exception of the first layer will increase.
The abovementioned analysis indicates that the performance of the H-RX detector of the k th layer is equal to or better than that of the (k + 1) th layer. By using the background constraint operation, the magnitude of certain background spectra will be suppressed to zero, which leads to an increase of the difference between the background and the target spectra and makes the target easier to detect. The simulation of this situation is shown in Section 4.

Stop Criteria Layer
Based on the analysis of the background spectral suppression layer, we know that after several layer restraint operations, the spectra in the hyperspectral image contains the following characteristics: (1) The background spectral magnitude is reduced while its direction in the spectral space is maintained, and after several ranges of suppression operations, it will be transformed close to a zero vector. (2) Concurrently, the target spectra remain unchanged.
Then, after a range of suppression and detection, the spectral vector in the transformed hyperspectral image is almost restrained to a zero vector and only the anomaly target spectra contain the original spectral vector. As a result, the detection results will contain a small value (close to zero) that corresponds to the background spectra, whereas a much larger value is observed for the anomaly target spectra.
In the hierarchical framework, if there is only one anomaly target, the output energy of the current layer can be formulated as follows: where 1 is the energy of the anomaly and γ k represents the energy sum of the background. For the next layer, the output energy can be formulated as follows: Based on these equations, we can calculate δ k , the error of the average output energy of the current layer and the previous layer, which is defined as follows: Thus, we have: Based on the above analysis, γ k and γ k−1 are small values and are close to zero; then, δ k will also converge to zero. Therefore, where ε refers to a small positive value (we set this value equal to 10 −4 ). By comparing the value between δ k and ε, the algorithm makes the decision to go directly to the next iteration or stop the iteration detection. Then, the stop criteria layer can be described as a low-pass filter like Figure 3 shown us.
Based on the analysis of the background spectral suppression layer, we know that after several layer restraint operations, the spectra in the hyperspectral image contains the following characteristics: (1) The background spectral magnitude is reduced while its direction in the spectral space is maintained, and after several ranges of suppression operations, it will be transformed close to a zero vector. (2) Concurrently, the target spectra remain unchanged.
Then, after a range of suppression and detection, the spectral vector in the transformed hyperspectral image is almost restrained to a zero vector and only the anomaly target spectra contain the original spectral vector. As a result, the detection results will contain a small value (close to zero) that corresponds to the background spectra, whereas a much larger value is observed for the anomaly target spectra.
In the hierarchical framework, if there is only one anomaly target, the output energy of the current layer can be formulated as follows: where 1 is the energy of the anomaly and represents the energy sum of the background. For the next layer, the output energy can be formulated as follows: Based on these equations, we can calculate , the error of the average output energy of the current layer and the previous layer, which is defined as follows: Thus, we have: Based on the above analysis, and −1 are small values and are close to zero; then, will also converge to zero. Therefore, go back to background suppression layer, out put the result, where refers to a small positive value (we set this value equal to 10 −4 ). By comparing the value between and , the algorithm makes the decision to go directly to the next iteration or stop the iteration detection.
Then, the stop criteria layer can be described as a low-pass filter like Figure 3 shown us.   In the whole H-RX algorithm process, the stop criteria later plays the important roles of determining when to stop the detection and identifying whether a potential advantage still exists to improve the anomaly detection performance.

Spatial Regularization Layer Analysis
Before the final detection result is output, we design a regularization layer to enhance the detection performance in the spatial domain. In HSI, the background spectra around the target may be influenced by the target spectra, which prompts consideration of the similarity of the background spectra with the target spectra. As a result, these background pixels have a similar or larger value than the target in the detection result, which leads to false detection.
In this paper, we focus on achieving a better anomaly detection performance for pixel/sub-pixel anomaly targets. To achieve this goal, the noise caused by pixel-mixing phenomena should first be smoothed. Therefore, the key issue of the filtering module is determining how to distinguish the target and noise. Here, we introduce a key technique in infrared target detection systems reported in the literature [20], and then we design an efficient pixel/sub-pixel regularization filter for the regularization layer.
Based on the assumption that the model of pixel-mixing phenomena influences the detection result, we define the following formula: where (x, y) represents the pixel location and f D , f T , f B and f N are the detection result, target, background and noise, respectively. In the method, the radiation character in the detection result caused by pixel mixing has been modeled using the PSF as follows: where I 0 is the value of the anomaly detection result, (x 0 , y 0 ) denotes the center location of the candidate target, δ x and δ y are the horizontal and vertical extent parameters, respectively, and I(x, y) represents the value of the surrounding location (x, y).
Considering the task of single pixel target detection and the above radiation character of the target, a point spread indicator is proposed to protect the potential target and combined with a median filter to smooth the noise. As shown in Figure 4, suppose there is a center pixel C that is surrounded by the direct neighbor domain F m (m = 2, 4, 6, 8) and the diagonal neighbor domain F N (n = 1, 3, 5, 7). In the whole H-RX algorithm process, the stop criteria later plays the important roles of determining when to stop the detection and identifying whether a potential advantage still exists to improve the anomaly detection performance.

Spatial Regularization Layer Analysis
Before the final detection result is output, we design a regularization layer to enhance the detection performance in the spatial domain. In HSI, the background spectra around the target may be influenced by the target spectra, which prompts consideration of the similarity of the background spectra with the target spectra. As a result, these background pixels have a similar or larger value than the target in the detection result, which leads to false detection.
In this paper, we focus on achieving a better anomaly detection performance for pixel/sub-pixel anomaly targets. To achieve this goal, the noise caused by pixel-mixing phenomena should first be smoothed. Therefore, the key issue of the filtering module is determining how to distinguish the target and noise. Here, we introduce a key technique in infrared target detection systems reported in the literature [20], and then we design an efficient pixel/sub-pixel regularization filter for the regularization layer.
Based on the assumption that the model of pixel-mixing phenomena influences the detection result, we define the following formula: where ( , ) represents the pixel location and , , and are the detection result, target, background and noise, respectively. In the method, the radiation character in the detection result caused by pixel mixing has been modeled using the PSF as follows: where 0 is the value of the anomaly detection result, ( 0 , 0 ) denotes the center location of the candidate target, and are the horizontal and vertical extent parameters, respectively, and ( , ) represents the value of the surrounding location ( , ).
Considering the task of single pixel target detection and the above radiation character of the target, a point spread indicator is proposed to protect the potential target and combined with a median filter to smooth the noise. As shown in Figure 4, suppose there is a center pixel C that is surrounded by the direct neighbor domain ( = 2, 4, 6, 8) and the diagonal neighbor domain ( = 1, 3, 5, 7). In the formulation, and represent the average value of the direct neighbor domain and the diagonal neighbor domain, respectively. As Equation (23) indicates, the relationship among , and 0 can be obtained approximately as follows: In the formulation, I M and I N represent the average value of the direct neighbor domain and the diagonal neighbor domain, respectively. As Equation (23) indicates, the relationship among I M , I N and I 0 can be obtained approximately as follows: Thus, we have the following: If the center pixel is an anomaly target, the value of this pixel and the surroundings will be obtained from Equation (26), whereas a misleading target will not be obtained. To distinguish the target and misleading target effectively, the point spread indicator p is defined as follows: The indicator p of the pixel/sub-pixel target is equal to 0.5 only under ideal conditions. Hence, the protection interval of the p value is set as [0.2, 0.8]. By calculating the point spread indicator p of all pixels in the original image, the following condition holds: The pixel is a potential target only if the p value is in the protection interval. The target protection filter strategy combines the anomaly detection result of potential target pixels with the result of a (3 × 3) or (5 × 5) median filter of the other pixels.

Experimental Results and Analysis
In this section, we evaluate the proposed method on three hyperspectral images. In the experiments, we compare our H-RX algorithm with commonly used anomaly detection algorithms: Original RX, Kernel RX, RPCA and low-rank and sparse representation (LRASR). The former algorithm is a classical anomaly detection algorithm, which is also considered a benchmark among the most classical anomaly detection algorithms. Kernel RX is one of the most famous of the improved algorithms for RX. Another two algorithms are recently proposed low-rank-based anomaly detection algorithm. It may be a stretch to try and draw too much from a comparison of four algorithms and only three images, however, we believe our experiments point to the clear potential of the proposed framework.
To compare the performance of different algorithms, we demonstrate the algorithms based on receiver operating characteristic (ROC) curves, which describe the varying relationship of the detection probability and the false alarm rate and are used to provide performance comparisons of different detectors [21,22]. Based on a ground truth image, the ROC curve expresses the relationship between the false alarm rate (F a ) and the probability of detection (P d ) at different thresholds for the anomaly detector's output. F a and P d are defined as follows: where, N f represents the number of false alarm pixels, N b is the total number of background pixels, N c indicates the number of correct detection target pixels, and N t is the number of total true target pixels. For further comparison of the performance of anomaly detectors, the area under the curve (AUC) [23,24], which is one of the most frequently used performance measures extracted from the ex-ROC, has been used.

Dataset Description
In this paper, we evaluate the performance of the proposed H-RX method on three hyperspectral images. Moreover, for the purpose of verifying the method more objectively and comprehensively, we select datasets that were obtained from different hyperspectral imaging cameras.
The first two datasets are collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over ocean regions [25,26]. The main difference between those datasets is the amount of clouds, which influences background complexity. Those data can be downloaded from NASA's website (AVIRIS Dataset download website: http://aviris.jpl.nasa.gov/data/get_aviris_data.html.), where the index of these two datasets are f100830t01p00r19 and f131117t01p00r05, respectively. Due to the large size of the original data, we select two images for the experiments. The upper-left corner of each image is (294, 1665) and (330, 7280), respectively.
The third dataset is the most commonly used one for hyperspectral anomaly target detection. It is collected by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) from a land cover region [27], which is mainly composed of buildings, vegetation, road, and vehicles. More details can be found from the website of US Army Corps of Engineers (HYDICE Dataset download website: https://www.erdc.usace.army.mil/Media/Fact-Sheets/.). Here, 162 bands remained after removing the water absorption bands.
The descriptions of each dataset are shown in Table 1, and corresponding false color images are displayed in Figure 5. For further experiments, we divide the first and second image into twelve patches, which are indexed by the serial number. One can see that there exists, some ship targets which can be selected to generate the sub-pixel anomalies targets in the number 7 patch of Dataset-1, and number 3 patch of Dataset-2. images. Moreover, for the purpose of verifying the method more objectively and comprehensively, we select datasets that were obtained from different hyperspectral imaging cameras. The first two datasets are collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over ocean regions [25,26]. The main difference between those datasets is the amount of clouds, which influences background complexity. Those data can be downloaded from NASA's website (AVIRIS Dataset download website: http://aviris.jpl.nasa.gov/data/get_aviris_data.html.), where the index of these two datasets are f100830t01p00r19 and f131117t01p00r05, respectively. Due to the large size of the original data, we select two images for the experiments. The upper-left corner of each image is (294, 1665) and (330, 7280), respectively.
The third dataset is the most commonly used one for hyperspectral anomaly target detection. It is collected by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) from a land cover region [27], which is mainly composed of buildings, vegetation, road, and vehicles. More details can be found from the website of US Army Corps of Engineers (HYDICE Dataset download website: https://www.erdc.usace.army.mil/Media/Fact-Sheets/.). Here, 162 bands remained after removing the water absorption bands.
The descriptions of each dataset are shown in Table 1, and corresponding false color images are displayed in Figure 5. For further experiments, we divide the first and second image into twelve patches, which are indexed by the serial number. One can see that there exists, some ship targets which can be selected to generate the sub-pixel anomalies targets in the number 7 patch of Dataset-1, and number 3 patch of Dataset-2.

Performance Analysis of Background Suppression Layer
The proposed hierarchical anomaly detection algorithm is composed of several layers with different purposes. The most important and novel one is the background suppression layer, which significantly enhances the original RX detection performance. Then, to prove the theory and real effect of this layer, we use a part of the data region in Dataset-1 for a detailed analysis.
In this experiment, the ship spectra have been used as the target spectra. We select the seventh patch from the first dataset as the background region (upper-left corner is (201, 101)), the size of the selected background region is (20 × 20). As shown in Figure 6, we generated the sub-pixel target (abundance ratio equal to 0.1) at the location (10, 10), and then added Gaussian white noise with 30-dB SNR.
Here, we focus on two important and urgent problems that must be resolved to validate the proposed method: (1) Regards the performance of the proposed background suppression layer, (2)

Performance Analysis of Background Suppression Layer
The proposed hierarchical anomaly detection algorithm is composed of several layers with different purposes. The most important and novel one is the background suppression layer, which significantly enhances the original RX detection performance. Then, to prove the theory and real effect of this layer, we use a part of the data region in Dataset-1 for a detailed analysis.
In this experiment, the ship spectra have been used as the target spectra. We select the seventh patch from the first dataset as the background region (upper-left corner is (201, 101)), the size of the selected background region is (20 × 20). As shown in Figure 6, we generated the sub-pixel target (abundance ratio equal to 0.1) at the location (10, 10), and then added Gaussian white noise with 30-dB SNR.
Here, we focus on two important and urgent problems that must be resolved to validate the proposed method: (1) Regards the performance of the proposed background suppression layer, (2) considers whether rebuilding the hyperspectral information is useful for enhancing the anomaly detection performance. For the first question, the difference between the anomaly target and background according to the previously mentioned theoretical analysis will remarkably increase after several background suppression layer processing steps. Then, to verify this phenomenon, the detection results of the original RX with the restraint process applied once and twice were compared directly.
As illustrated in Figure 7, we can see that after one background restraint, the anomaly target is notably different than the original result. Moreover, by suppressing the background once again, the background is restrained to a smaller value and the anomaly target is much more remarkable than the background. Moreover, to quantitatively analyze the background suppression performance, the distance between the value of the anomaly target and the background with different process steps was also calculated with following formula, and the results are illustrated in Table 2. Note that our method will hold for the situation whether the magnitudes of anomaly target spectra are larger than average ones of background or not.
target-average(background) Distance= average(background) (29)   For the first question, the difference between the anomaly target and background according to the previously mentioned theoretical analysis will remarkably increase after several background suppression layer processing steps. Then, to verify this phenomenon, the detection results of the original RX with the restraint process applied once and twice were compared directly.
As illustrated in Figure 7, we can see that after one background restraint, the anomaly target is notably different than the original result. Moreover, by suppressing the background once again, the background is restrained to a smaller value and the anomaly target is much more remarkable than the background. For the first question, the difference between the anomaly target and background according to the previously mentioned theoretical analysis will remarkably increase after several background suppression layer processing steps. Then, to verify this phenomenon, the detection results of the original RX with the restraint process applied once and twice were compared directly.
As illustrated in Figure 7, we can see that after one background restraint, the anomaly target is notably different than the original result. Moreover, by suppressing the background once again, the background is restrained to a smaller value and the anomaly target is much more remarkable than the background. Moreover, to quantitatively analyze the background suppression performance, the distance between the value of the anomaly target and the background with different process steps was also calculated with following formula, and the results are illustrated in Table 2. Note that our method will hold for the situation whether the magnitudes of anomaly target spectra are larger than average ones of background or not.

target-average(background)
Distance= average(background) (29)   Moreover, to quantitatively analyze the background suppression performance, the distance between the value of the anomaly target and the background with different process steps was also calculated with following formula, and the results are illustrated in Table 2. Note that our method will hold for the situation whether the magnitudes of anomaly target spectra are larger than average ones of background or not. As shown in Table 2, the background suppression process assuredly enlarges the difference between the sub-pixel anomaly target and the background. The distance between the anomaly target and the background in the original RX detection result was nearly 3 times greater after performing one background suppression process, and this gap increased to more than 60 times by performing the background suppression process twice in this experiment.
For the second problem, as mentioned in Section 3.1, we rebuilt the spectral data based on each layer's detection result. In theory, this process can suppress the spectral data whose result is small, but retain the spectral data whose result is large. Then, in the new rebuilt hyperspectral data cube, the anomaly target spectral data should be considerably different than the background spectral data. We analyze this theory from both an objective and subjective perspective.
Subjectively, the spectra of the original data and the rebuilt data are compared directly in Figure 8. The original sub-pixel anomaly target spectra are similar to the background spectra; however, by using the restraint process, these spectra show greater differences in the rebuilt data. After two applications of the background suppression process, the target spectra remain unchanged and the background spectra have been further restrained. Then, after the background suppression, the target spectra remain unchanged and the background spectra are progressively suppressed. As shown in Table 2, the background suppression process assuredly enlarges the difference between the sub-pixel anomaly target and the background. The distance between the anomaly target and the background in the original RX detection result was nearly 3 times greater after performing one background suppression process, and this gap increased to more than 60 times by performing the background suppression process twice in this experiment.
For the second problem, as mentioned in Section 3.1, we rebuilt the spectral data based on each layer's detection result. In theory, this process can suppress the spectral data whose result is small, but retain the spectral data whose result is large. Then, in the new rebuilt hyperspectral data cube, the anomaly target spectral data should be considerably different than the background spectral data. We analyze this theory from both an objective and subjective perspective.
Subjectively, the spectra of the original data and the rebuilt data are compared directly in Figure 8. The original sub-pixel anomaly target spectra are similar to the background spectra; however, by using the restraint process, these spectra show greater differences in the rebuilt data. After two applications of the background suppression process, the target spectra remain unchanged and the background spectra have been further restrained. Then, after the background suppression, the target spectra remain unchanged and the background spectra are progressively suppressed. To compare the difference between the sub-pixel anomaly target and the background spectra objectively, we introduce two spectral distance metrics to measure the distance among these spectra. The first distance metric we used is the spectral angle mapper (SAM) [28], which defines the shape difference between the spectra and is one of the most widely used spectral similarity metrics in hyperspectral classification and target detection tasks [29,30]. The second distance we used is the Euclidean distance [31], which provides a quantitative measure of the distance between two spectra vectors and is the one of the most widely used distance metrics in many feature matching applications.
The spectral difference between two spectra 1 and 2 with bands can be analyzed as the calculated spectral dot product based on the Equation (30). The Euclidean distance metric between two spectra can be expressed the Equation (31).
We use the distance mentioned earlier to calculate the distance between the anomaly target and the average background spectra. In the ideal situation, the average background spectra may suppress the spectral vector close to the zero vector. To avoid this situation, we add a small positive value to each band value in the average background spectra. We set equal to 1, and then the average background spectra 2 can be rebuilt, as shown in the following formula, and demonstrated in the results in Table 2. To compare the difference between the sub-pixel anomaly target and the background spectra objectively, we introduce two spectral distance metrics to measure the distance among these spectra. The first distance metric we used is the spectral angle mapper (SAM) [28], which defines the shape difference between the spectra and is one of the most widely used spectral similarity metrics in hyperspectral classification and target detection tasks [29,30]. The second distance we used is the Euclidean distance [31], which provides a quantitative measure of the distance between two spectra vectors and is the one of the most widely used distance metrics in many feature matching applications.
The spectral difference between two spectra x 1 and x 2 with L bands can be analyzed as the calculated spectral dot product based on the Equation (30). The Euclidean distance metric between two spectra can be expressed the Equation (31).
We use the distance mentioned earlier to calculate the distance between the anomaly target and the average background spectra. In the ideal situation, the average background spectra may suppress the spectral vector close to the zero vector. To avoid this situation, we add a small positive value to each band value in the average background spectra. We set ε equal to 1, and then the average background spectra x 2 can be rebuilt, as shown in the following formula, and demonstrated in the results in Table 2.
As shown in Table 3, the Euclidean distance and SAM score between the target spectra and the average background spectra are gradually enlarged in the rebuilt hyperspectral data. The Euclidean distance has increased by almost one order of magnitude, and the SAM score is almost 20 times greater after applying the data rebuilding process twice compared with the original hyperspectral data. Table 3. Euclidean distance and SAM between sub-pixel anomaly and average background spectra.

Sub-Pixel Anomaly Detection Experiments
We evaluate the proposed method using three hyperspectral datasets, including two pure ocean surface and one relatively complicated land cover scene. For the first two datasets, we use Target Implantation Method mentioned in [32] to generated sub-pixel anomaly targets by mixing clean ship target spectra and ocean background spectra with different abundance ratios, and then replaced them in the corresponding background pixels. This approach is somewhat different than the target implantation method mentioned in [33], which directly replaces the target and background spectra, whereas we mix them in varying proportions. To evaluate the detector's performance more objectively, we add Gaussian white noise with 30-dB SNR in each of the mixed anomaly pixels. While this may not be the best method of generating the sub-pixel anomaly target, it is certainly better than replacing them directly.
For the third dataset, there already exists 21 pixels marked as anomaly targets, and thus we do not add any other synthetic targets in this scene. As we calculated with the linear unmixing method [34], the ratios of those targets ranges from 0.7 to 1, so, there are almost 8 sub-pixel anomalies (abundance ratios range from 0.7 to 0.9) and 13 full pixel anomalies (abundance ratios equals 1). Compared with the experiment on this dataset, the anomaly targets consist of different types of materials and different sizes. For example, the type in this dataset consists of embedded cars and roofs, where the size of the anomaly targets is one or two pixels. Furthermore, the observation area is more complex, with at least four types of objects constituting the background, which increases the difficulty to detect the anomaly targets.
For space and brevity, only the patches involving the target are illustrated in the first two datasets (i.e., patch 7 in Dataset-1, patch 3 in Dataset-2). As shown in Figure 9, we generated sub-pixel anomalies with different abundance ratios (ranging from 0.6 to 1 in Dataset-1 patch 7 and 0.05 to 1 in Dataset-2 patch 3). For more details, the size of selected region is (35 × 100) and (100 × 200) and their upper-left locations are (66, 1) and (101, 1) respectively. In addition, the selected target locations are (57, 35) and (9,32) in each dataset. For other patches, we implant the sub-pixel targets with the same process. Figure 10 illustrates the location of each target in Dataset-3. We draw corresponding spectra in Figure 10b. = ( ) x x l (32) As shown in Table 3, the Euclidean distance and SAM score between the target spectra and the average background spectra are gradually enlarged in the rebuilt hyperspectral data. The Euclidean distance has increased by almost one order of magnitude, and the SAM score is almost 20 times greater after applying the data rebuilding process twice compared with the original hyperspectral data. Table 3. Euclidean distance and SAM between sub-pixel anomaly and average background spectra.

Sub-Pixel Anomaly Detection Experiments
We evaluate the proposed method using three hyperspectral datasets, including two pure ocean surface and one relatively complicated land cover scene. For the first two datasets, we use Target Implantation Method mentioned in [32] to generated sub-pixel anomaly targets by mixing clean ship target spectra and ocean background spectra with different abundance ratios, and then replaced them in the corresponding background pixels. This approach is somewhat different than the target implantation method mentioned in [33], which directly replaces the target and background spectra, whereas we mix them in varying proportions. To evaluate the detector's performance more objectively, we add Gaussian white noise with 30-dB SNR in each of the mixed anomaly pixels. While this may not be the best method of generating the sub-pixel anomaly target, it is certainly better than replacing them directly.
For the third dataset, there already exists 21 pixels marked as anomaly targets, and thus we do not add any other synthetic targets in this scene. As we calculated with the linear unmixing method [34], the ratios of those targets ranges from 0.7 to 1, so, there are almost 8 sub-pixel anomalies (abundance ratios range from 0.7 to 0.9)and 13 full pixel anomalies (abundance ratios equals 1). Compared with the experiment on this dataset, the anomaly targets consist of different types of materials and different sizes. For example, the type in this dataset consists of embedded cars and roofs, where the size of the anomaly targets is one or two pixels. Furthermore, the observation area is more complex, with at least four types of objects constituting the background, which increases the difficulty to detect the anomaly targets.
For space and brevity, only the patches involving the target are illustrated in the first two datasets (i.e., patch 7 in Dataset-1, patch 3 in Dataset-2). As shown in Figure 9, we generated subpixel anomalies with different abundance ratios (ranging from 0.6 to 1 in Dataset-1 patch 7 and 0.05 to 1 in Dataset-2 patch 3). For more details, the size of selected region is (35 × 100) and (100 × 200) and their upper-left locations are (66, 1) and (101, 1) respectively. In addition, the selected target locations are (57, 35) and (9,32) in each dataset. For other patches, we implant the sub-pixel targets with the same process. Figure 10 illustrates the location of each target in Dataset-3. We draw corresponding spectra in Figure 10b.  In the proposed method, two hyper parameters, including iteration numbers and the size of spatial regulation windows are illustrated in Table 4    In the proposed method, two hyper parameters, including iteration numbers and the size of spatial regulation windows are illustrated in Table 4   In the proposed method, two hyper parameters, including iteration numbers and the size of spatial regulation windows are illustrated in Table 4  (c1) Anomalies spectra with different ratios (c2) Anomalies spectra with different ratios  In the proposed method, two hyper parameters, including iteration numbers and the size of spatial regulation windows are illustrated in Table 4. The results of different anomaly target detection algorithms are listed in the Figure 11. For highlighting the anomalies, we use the 3-D figures to show the final results, because the 2-D figures use the grey index to show the score for each pixel and the differences of the backgrounds and anomalies are unclear in the view of 2-D figure.   All the five methods have the ability to distinguish between anomalies and most background pixels on three datasets. But the proposed hierarchical method and Kernel RX show better visual performance than the other three algorithms. Considering the fact that the visual assessments may be inaccurate, we employ the semilog ROC curves and AUC score to further assess the detection performance. Figure 12 shows the semilog ROC curves of the five methods on the selected peaches. One can see that (1) For patch 7 of Dataset-1, the proposed method shows better performance than others. (2) For patch 3 of Dataset-2, RPCA and the proposed method show similar performance in low false alarm rate situation, but the latter is much better as false alarm rate growing. Moreover, with the increasing of false alarm rate, the unsatisfactory performance of Kernel RX enhances expeditiously.
(3) For Dataset-3, the Kernel RX, LARSR and RPCA show better performance than the proposed method. But it is worth noting that the proposed method achieves the smallest false alarm rate when all the anomaly targets are detected. In order to evaluate the performance in the low false alarm situation more clearly, we calculate the probability of detection when the false alarm rate equals 0.01. Additionally, we also calculate the All the five methods have the ability to distinguish between anomalies and most background pixels on three datasets. But the proposed hierarchical method and Kernel RX show better visual performance than the other three algorithms. Considering the fact that the visual assessments may be inaccurate, we employ the semilog ROC curves and AUC score to further assess the detection performance. Figure 12 shows the semilog ROC curves of the five methods on the selected peaches. One can see that (1) For patch 7 of Dataset-1, the proposed method shows better performance than others. (2) For patch 3 of Dataset-2, RPCA and the proposed method show similar performance in low false alarm rate situation, but the latter is much better as false alarm rate growing. Moreover, with the increasing of false alarm rate, the unsatisfactory performance of Kernel RX enhances expeditiously. (3) For Dataset-3, the Kernel RX, LARSR and RPCA show better performance than the proposed method. But it is worth noting that the proposed method achieves the smallest false alarm rate when all the anomaly targets are detected.  All the five methods have the ability to distinguish between anomalies and most background pixels on three datasets. But the proposed hierarchical method and Kernel RX show better visual performance than the other three algorithms. Considering the fact that the visual assessments may be inaccurate, we employ the semilog ROC curves and AUC score to further assess the detection performance. Figure 12 shows the semilog ROC curves of the five methods on the selected peaches. One can see that (1) For patch 7 of Dataset-1, the proposed method shows better performance than others. (2) For patch 3 of Dataset-2, RPCA and the proposed method show similar performance in low false alarm rate situation, but the latter is much better as false alarm rate growing. Moreover, with the increasing of false alarm rate, the unsatisfactory performance of Kernel RX enhances expeditiously.
(3) For Dataset-3, the Kernel RX, LARSR and RPCA show better performance than the proposed method. But it is worth noting that the proposed method achieves the smallest false alarm rate when all the anomaly targets are detected. In order to evaluate the performance in the low false alarm situation more clearly, we calculate the probability of detection when the false alarm rate equals 0.01. Additionally, we also calculate the In order to evaluate the performance in the low false alarm situation more clearly, we calculate the probability of detection when the false alarm rate equals 0.01. Additionally, we also calculate the false alarm rate when probability of detection equals 1. Here, we test all the patches and calculate the mean for the first two datasets. For the third dataset, we repeatly conduct the experiments 10 times, and then calculate the mean and standard deviation of the results. These resulting statistics are illustrated on Table 5. Table 5. Anomaly detection results on three datasets. (P-D at P-FA = 0.01, P-FA at P-D = 1).

Algorithm
Dataset-1 (%) Dataset-2 (%) Dataset-3 (%) P-D P-FA P-D P-FA P-D P-FA We can see that the proposed method achieves the top two probability of detection (when false alarm rate equals 0.01) in all datasets. It proves that the proposed method has a good ability to distinguish most of anomalies with a low false alarm rate. When the probability of detection equals 1, the proposed method gets the second lowest false alarm rate to detect all the anomalies. Overall, our method shows better performance on detecting all the targets and getting higher probability of detection.
To further evaluate the efficiency of the proposed method, the AUC score and time consumption are listed from Tables 6-9. Here, as all the methods show unsatisfactory performance on Dataset-2, we choose this dataset for detailed illustration in Tables 6 and 7. Statistical (average and standard deviation) AUC score and time consumption from all the patches of the first two datasets and retest of the third dataset are illustrated in Tables 8 and 9, respectively. Table 6. Area under the curve (AUC) score for each patches in Dataset-2.

Algorithm
Dataset-2 (s) We test all the discussed methods on an Z820 Workstation with Intel Xeon E5-2630 and 128 GB RAM. The programming environment is MATLAB 2015a. Compared with the Original RX, the proposed hierarchical framework consists of different layers, which increases computational complexity to some extent. As illustrated in [35], the computational complexity of original RX is O(N), and thus, the computational complexity of the proposed method can be briefly calculated as: where, i is the times of RX method employed in the proposed method. It can be seen that the proposed method achieves the highest average AUC score on all datasets as expected. Even though our method shows a little more time-consuming than the Original RX, it is still meaningful for some applications that need high detection accuracy and efficiency simultaneously. Further, our method not only get better detection results, but also much faster than other algorithms (i.e., RPCA, Kernel RX and LARSR).
It is worth noting that the performance of H-RX is close to other methods on Dataset-3 (i.e., similar AUC scores shown in Table 6). This results from the fact that the anomaly target could consist of more than one pixel (e.g., sub-pixel 2~5 in the Dataset-3). In this case, co-occurrences of multiple anomaly pixels within the spatial regularization window would perplex the judgement of true center anomaly one, and thus all of the sub-pixels may be restrained.
Finally, it is necessary to discuss unsatisfactory AUC scores of all the methods on the Dataset-2, which is caused by the low abundance ratios. For instance, when the abundance ratio is smaller than 0.25, the sub-pixel anomaly would be submerged in the background. Then, the H-RX fails to detect it with a low false alarm. Moreover, background is much more complex, especially for the background in the twelfth patch, which involves clouds, stripe noise and land cover regions. However, in such an extreme situation, the proposed method still outperforms the others in the detection results. Thus, the H-RX is a promising method for sub-pixel anomaly detection.

Conclusions
In this paper, we propose a hierarchical hyperspectral anomaly target detection method (H-RX) that consists of several layers with different functions. The main contributions of the proposed method are: (1) A novel hierarchical framework is proposed to restrain background spectra and enlarge the difference between the background and anomaly target spectra; (2) we prove the rationality of the proposed method theoretically and the ability to construct more discriminative spectra than that in previous layers; and (3) experimental results on three datasets show that the proposed hierarchical method has a significant improvement over the classical RX method and other recent sub-pixel anomaly detection methods.

Conflicts of Interest:
The authors declare no conflict of interest.