Impact of linear dimensionality reduction methods on the performance of anomaly detection algorithms in hyperspectral images

Anomaly Detection (AD) has recently become an important application of hyperspectral image analysis. The goal of AD algorithms is to find the objects in the image scene that are anomalous in comparison with their surrounding background. One way to improve the performance and runtime of these algorithms is to use Dimensionality Reduction (DR) techniques. This paper evaluates the effect of three popular linear DR methods on the performance of three benchmark AD algorithms. Principal Component Analysis (PCA), the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT) are used as DR methods and act as a pre-processing step for the AD algorithms. The assessed AD algorithms are Reed-Xiaoli (RX), the kernel-based version of RX (Kernel-RX) and the Dual Window-Based Eigen Separation Transform (DWEST). The AD methods have been applied to two hyperspectral datasets acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and Hyperspectral Mapper (HyMap) sensors. The experiments are evaluated using the Receiver Operating Characteristic (ROC) curve, visual investigation and the runtime of the algorithms. Experimental results show that the DR methods can significantly improve the detection performance of the RX method. The detection performance of neither the Kernel-RX method nor the DWEST method changes noticeably when the DR methods are used. Moreover, these DR methods reduce the runtime of RX and DWEST significantly, which makes them suitable for implementation in real-time applications.


Introduction
Hyperspectral imaging is a suitable tool for target detection and recognition in many applications, including search-and-rescue operations, mine detection, and military usages. Hyperspectral sensors are powerful tools for distinguishing between different materials on the basis of each object's unique spectral signature; these sensors are able to do this because they collect information about surfaces and objects in hundreds of narrow contiguous spectral bands in the visible and infrared regions of the electromagnetic spectrum [1].
Anomaly Detection (AD) is a special kind of target detection (TD) technique that uses no a priori information about the targets. The main purpose of AD algorithms is to find the objects in a given image that are anomalous with respect to their surrounding background [1]. In other words, anomaly detectors look for the pixels whose spectra differ significantly from the background spectra [2]. The main advantage of these methods is that they need neither a priori information about the target signature nor any form of atmospheric or radiometric correction of the data [3].
The Reed-Xiaoli (RX) detector is the most widely used AD algorithm; it is known as a benchmark anomaly detector for multi/hyperspectral images. This algorithm, which is derived from the generalized likelihood ratio test (GLRT), assumes that the background pixels in a local neighbourhood around the target can be modelled by the multivariate normal (Gaussian) distribution [4,5]. The most frequently reported problem for RX and many of its modified versions is the "small sample size" problem, which concerns the estimation of a local background covariance matrix from a small number of very high dimensional samples. This may result in a badly conditioned and unstable estimate of the local background covariance matrix, which strongly affects the detection performance of the AD algorithm [6]. The first solution to this problem is enlarging the sample size by expanding the local window size. However, a larger window makes the local background less homogeneous, which undermines the effectiveness of the covariance matrix estimation. Another solution to this problem is Dimensionality Reduction (DR) [6,7].
The performance of many AD algorithms can be improved by using a pre-processing DR step. The reason is that the hyperspectral data space is relatively large and mostly empty, and the most important or interesting information is represented in a few features [8,9]. The DR step, used as a pre-processing step of the AD algorithm, can reduce the inter-band spectral redundancy and the ever-present noise. Although DR is lossy, it can increase the separation between anomaly and background signatures, and thus improve the detection performance of the anomaly detector. Another reason for using DR algorithms is that AD algorithms, such as RX, involve the inverse of the local clutter covariance matrix. This covariance matrix is usually singular, due to the high dimensionality of the hyperspectral data [10]. In addition, in hyperspectral image data, the correlation between the different bands, i.e. information redundancy, is high. By reducing the number of image bands, the correlation between them is decreased and the singularity problem is therefore mitigated. Furthermore, since DR brings data from a high dimension to a low dimension, it can overcome the "curse of dimensionality" problem [11].
DR techniques are divided into two categories: linear and nonlinear. Although linear techniques do not exploit the nonlinear properties of hyperspectral data, they can be fast enough for real-time applications. A popular linear DR method, which is well suited to small target detection, is Principal Component Analysis (PCA) [12]. There are other linear DR methods, such as the Discrete Wavelet Transform (DWT) and the Fast Fourier Transform (FFT), which can be used to improve the performance and runtime of AD algorithms [13,14].
A general framework of an AD scenario is shown in figure 1. In the first step, the spectral dimension of an image cube is reduced using a DR method. The AD algorithm is then used to analyse the new image; the result is a two-dimensional matrix named the "AD matrix". To specify the locations of anomalies or targets in the image, a post-processing threshold step can be added to the algorithm. In this study, three linear DR methods (PCA, DWT and FFT) are used as a pre-processing step for three well-known AD methods (RX, Kernel-RX and DWEST), and the impact of the DR step on the performance of the AD methods is evaluated. The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and Hyperspectral Mapper (HyMap) datasets are used to apply the methods to real hyperspectral remotely sensed images and to evaluate their performance using the Receiver Operating Characteristic (ROC) curve [15], the area under the ROC curve (AUC) [7], visual investigation and the runtime of the algorithms.
This paper is organized as follows: Section 2 provides a brief overview of three popular AD methods: RX, Kernel-RX, and DWEST. In section 3, the DR methods (PCA, DWT and FFT) are introduced. The results of the experiments are discussed in section 4. Lastly, concluding remarks are given in section 5.
doi:10.5829/idosi.JAIDM.2015.03.01.02

RX detector
The RX algorithm, developed by Reed and Yu [16], is the most famous AD algorithm. It is considered to be a benchmark AD algorithm for hyperspectral images and works as follows. Assume that r is an image pixel vector that has L elements, where L is the number of spectral bands of the image. The RX detector is defined by (1):
$$\delta_{RXD}(r) = (r - \mu)^T C^{-1} (r - \mu) \qquad (1)$$
In this equation, µ is the sample mean and C is the sample covariance matrix of the data. $\delta_{RXD}(r)$ is the well-known (squared) Mahalanobis distance, which measures how abnormal the pixel under test (PUT) is. The result of the AD process is a two-dimensional detection matrix. To determine the exact location of targets (anomalies), a threshold should be applied to the detection matrix.
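As an illustration only, the RX statistic described above can be sketched in NumPy. This is a minimal global version: it assumes that the mean and covariance are estimated from the whole scene rather than from a local window. The function name `rx_detector`, the synthetic cube and the use of a pseudo-inverse are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def rx_detector(cube):
    """Global RX sketch: Mahalanobis-type score of every pixel (rows x cols x bands)."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    mu = pixels.mean(axis=0)                      # sample mean
    cov = np.cov(pixels, rowvar=False)            # sample covariance matrix C
    cov_inv = np.linalg.pinv(cov)                 # assumption: pseudo-inverse guards against a singular C
    centered = pixels - mu
    # (r - mu)^T C^{-1} (r - mu) evaluated for every pixel at once
    scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return scores.reshape(rows, cols)

# Hypothetical data: Gaussian background with one implanted outlier pixel
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 10))
cube[5, 7] += 8.0
ad_matrix = rx_detector(cube)
peak = np.unravel_index(ad_matrix.argmax(), ad_matrix.shape)
```

The resulting `ad_matrix` plays the role of the two-dimensional detection matrix; a threshold would still have to be applied to it to obtain target locations.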

Kernel-RX detector
The Kernel-RX is a nonlinear version of the RX detector, which was introduced by Kwon and Nasrabadi [5]. This method is based on kernel theory and is reported to perform far better than the standard RX detector. The kernelized version of the RX detector is defined by (2):
$$RX_K(r) = (K_r^T - K_{\mu}^T)^T \hat{K}_b^{-1} (K_r^T - K_{\mu}^T) \qquad (2)$$
In this equation, $K_r^T$ is the centred form of $k(X_b, r)^T$: $k(X_b, r)^T$ represents a vector whose entries are the kernels $k(x(i), r)$, $i = 1, \ldots, M$, and $\frac{1}{M}\sum_{i=1}^{M} k(x(i), r)$, the scalar mean of $k(X_b, r)^T$, is subtracted from it. $K_{\mu}^T$ is the corresponding centred kernel vector of the background mean. In addition, $\hat{K}_b = K_b - 1_M K_b - K_b 1_M + 1_M K_b 1_M$ is the centred Gram matrix, where $K_b$ is the Gram matrix before centering and $1_M$ is the M×M matrix with elements $(1_M)_{i,j} = 1/M$.

DWEST
Dual Window-based Eigen Separation Transform (DWEST) is an adaptive anomaly detector, developed by Kwon et al. [17]. This method uses two windows, called the "inner window" and the "outer window", which are designed to maximize the separation between two classes of data: target class data and background class data. The inner window is used to capture targets; the outer window is used to model the local background. This algorithm extracts targets by projecting the differential mean between the two windows onto the eigenvectors associated with the first few largest eigenvalues of the difference covariance matrix. If the covariance matrices of the inner and outer windows are named $C_{in}$ and $C_{out}$, the difference covariance matrix, which represents the differential second-order statistics between the two classes, is defined in the following way:
$$C_{diff} = C_{in} - C_{out}$$
The eigenvalues of $C_{diff}$ are divided into two groups, negative values and positive values. The eigenvectors associated with a small number of large positive eigenvalues of $C_{diff}$ can successfully extract the materials in the inner window that are spectrally distinctive. If the means of the inner and outer windows are represented by $m_{in}$ and $m_{out}$, and the eigenvectors associated with the positive eigenvalues are denoted by $\{v_i\}$, the DWEST detector projects the differential mean of the two windows, which is defined by (7), onto $\{v_i\}$ by (8) [18,19]:
$$m_{diff} = m_{out} - m_{in} \qquad (7)$$
$$\delta_{DWEST}(r) = \left| \sum_i v_i^T m_{diff} \right| \qquad (8)$$
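The DWEST projection described above can be sketched for a single pixel under test as follows. This is a hedged illustration, not the reference implementation: the function name `dwest_score` is hypothetical, the inputs are assumed to be the spectra already gathered from the inner and outer windows, and all eigenvectors with positive eigenvalues are used, whereas the text suggests keeping only a small number of the largest ones.

```python
import numpy as np

def dwest_score(inner, outer):
    """DWEST-style score from inner/outer window samples, each of shape (n_samples, bands)."""
    # Difference covariance matrix between the two windows
    c_diff = np.cov(inner, rowvar=False) - np.cov(outer, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(c_diff)      # c_diff is symmetric, so eigh applies
    v_pos = eigvecs[:, eigvals > 0]                # assumption: keep every positive-eigenvalue direction
    m_diff = outer.mean(axis=0) - inner.mean(axis=0)
    # Project the differential mean onto the selected eigenvectors and sum the projections
    return float(abs(np.sum(v_pos.T @ m_diff)))

# Hypothetical samples: a 3x3 inner window and the surrounding 11x11 ring, 5 bands
rng = np.random.default_rng(1)
outer = rng.normal(size=(112, 5))
inner = rng.normal(loc=3.0, size=(9, 5))
score = dwest_score(inner, outer)
same = dwest_score(outer, outer)                   # identical windows give a zero differential mean
```

Because eigenvectors are only defined up to sign, the summed projection depends on the sign convention of the eigen-solver; an implementation meant for comparison across pixels would need to fix that convention.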
3. Dimensionality reduction methods
3.1. PCA
PCA is the best known technique for data reduction. The main purpose of PCA is to reduce the dimensionality of a dataset that consists of a large number of interrelated variables, while retaining as much of the variation of the dataset as possible. This purpose is achieved by transforming the data into a new set of variables, the principal components (PCs), which are both uncorrelated and ordered so that the first few PCs retain most of the variation present in all of the original variables [20]. An important issue in PCA-RX is choosing the number of PCs, which determines the amount of band reduction for a hyperspectral image.
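A minimal sketch of PCA as a spectral DR step is given below, assuming the image is held as a rows × cols × bands NumPy array. The function name `pca_reduce`, the default of 8 retained components and the synthetic cube are choices made for this illustration. The returned fraction of explained variance is one possible aid for choosing the number of PCs.

```python
import numpy as np

def pca_reduce(cube, n_components=8):
    """Project each pixel spectrum onto the leading principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    components = eigvecs[:, order]
    explained = float(eigvals[order].sum() / eigvals.sum())
    reduced = centered @ components                   # uncorrelated PC scores
    return reduced.reshape(rows, cols, n_components), explained

# Hypothetical 30-band cube reduced to 8 bands
rng = np.random.default_rng(2)
cube = rng.normal(size=(10, 10, 30))
reduced, explained = pca_reduce(cube, n_components=8)
pc_cov = np.cov(reduced.reshape(-1, 8), rowvar=False)
```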

DWT
As a different DR method, one can use the DWT to reduce the dimension of a hyperspectral image; it was first investigated for AD methods by Zare-Baghbidi et al. [13]. The spectrum of a pixel within the hyperspectral image, like any signal, has low-frequency components that carry its major part and high-frequency components that carry its minor part. Thus, the main behaviour of the signal can be found in the approximation coefficients of the DWT, which are related to the low frequencies of the main signal [21].
As an example, part (a) of figure 2 presents the spectral signature of a given pixel from a hyperspectral image with 64 bands. The four-level DWT coefficients of this signal, obtained using the Daubechies4 wavelet transform [22], are presented in part (b) of figure 2. As can be seen, only four samples, which are related to the low frequencies of the original signal, carry relevant information about the signal. The rest of the samples do not contain any important information and can therefore be discarded without losing significant information. As a result, the first four samples, which are the approximation coefficients of the main signal, are used to detect the anomalies.
The DWT DR method first calculates the DWT coefficients of every pixel in a hyperspectral data cube using the Daubechies8 wavelet. This wavelet transformation decomposes the main signal until eight samples are left. These eight samples per pixel are collected in a matrix called the "approximation matrix".
The approximation matrix is an image that has eight bands; it is an abstract of the original image and can represent the main behaviour of the image data. Therefore, the anomaly detectors can be applied to this matrix.
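The idea of keeping only the approximation coefficients can be illustrated with a small NumPy sketch. Note the substitution: to stay self-contained, this sketch uses the Haar wavelet rather than the Daubechies8 wavelet named in the text, so its coefficients will differ from those of the paper. The function name `haar_approximation` and the edge-padding rule for odd lengths are assumptions of this example; a wavelet library such as PyWavelets could supply Daubechies filters instead.

```python
import numpy as np

def haar_approximation(cube, n_out=8):
    """Repeated Haar low-pass steps along the spectral axis until at most n_out bands remain."""
    approx = cube.astype(float)
    while approx.shape[-1] > n_out:
        if approx.shape[-1] % 2:
            # Odd number of bands: repeat the last band so samples can be paired
            approx = np.concatenate([approx, approx[..., -1:]], axis=-1)
        # Haar approximation coefficients: scaled sum of neighbouring samples
        approx = (approx[..., 0::2] + approx[..., 1::2]) / np.sqrt(2.0)
    return approx

# Hypothetical constant cube with 64 bands: three levels leave 8 approximation bands
cube = np.ones((4, 4, 64))
approximation_matrix = haar_approximation(cube, n_out=8)
```

With 64 input bands the loop stops at exactly eight bands; for band counts that are not a power of two, such as 189, it stops at the first length not exceeding `n_out`.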

FFT
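For orientation, the FFT-based reduction covered in this section (DFT of each pixel spectrum, amplitude of the DFT values, then selection of the low-frequency bands) can be sketched in a few lines of NumPy. The function name `fft_reduce` is hypothetical, and keeping the first `n_out` frequency bins, including the zero-frequency bin, is an assumption of this sketch rather than a detail confirmed by the text.

```python
import numpy as np

def fft_reduce(cube, n_out=8):
    """Three-step FFT reduction sketch along the spectral axis."""
    spectrum = np.fft.fft(cube, axis=-1)     # step 1: DFT of every pixel spectrum
    amplitude = np.abs(spectrum)             # step 2: amplitude of the DFT values
    return amplitude[..., :n_out]            # step 3: keep the low-frequency bands

# Hypothetical constant cube: all energy falls in the zero-frequency bin
cube = np.ones((3, 3, 16))
abstract = fft_reduce(cube, n_out=8)
```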
For the purpose of AD, the DFT can be used in a three-step framework (see Figure 3) [14]. In the first step, the Discrete Fourier Transform (DFT) of every image pixel is calculated using the Fast Fourier Transform (FFT) [23]. In the second step, the amplitude of the DFT values is calculated and the results are stored in a matrix named "amplitude" (Figure 3.b). The last step keeps a few bands of the amplitude matrix, which are related to the low frequencies (and high values) of the main image. The new matrix formed by this process is an abstract of the FFT amplitude matrix (Figure 3.c). The size of this abstract matrix determines the amount of band reduction and can be selected during the experiment.

HyMap data
Because the true locations are available only for the self-test targets, some self-test targets (red cotton, blue cotton, yellow nylon, and red nylon) were selected and implanted in another part of the image. To implant the targets in this sub-image (named "Img-I"), a target implantation method [25] has been used. In this method, a synthetic sub-pixel anomaly, z, is a combination of the target and the background, as shown in (9):
$$z = f \cdot t + (1 - f) \cdot b \qquad (9)$$
In this equation, t and b denote the target and the background, respectively. Therefore, the sub-pixel z consists of the target's spectrum with fraction f and the background's spectrum with fraction (1 − f) [25].
This implantation method does not include the adjacency effects of the target spectrum on the local background pixels. For a more realistic condition, the background pixels that neighbour the targets can be affected by the target pixel. This effect can be achieved by using a Gaussian function with a width of w, as shown in (10), where pi is the spatial distance between the background pixel (zi) and the target pixel (t) [4].
To construct the desired image, according to figure 4, a part of the main image is selected; the targets are then implanted in the selected sub-image (Figure 4(a)). To apply the effect of the background on the targets and create sub-pixels, the outlines of the targets have been selected and combined with their adjacent background according to (9) with the coefficient f = 0.6. To apply the effect of the anomalies on the background pixels, (10) is used. The final image with implanted targets includes sub-pixel and full-pixel (or multi-pixel) targets. As a result, this image appears well suited for testing AD and TD algorithms.
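The linear mixing used for sub-pixel implantation, with the target fraction f = 0.6 mentioned above, can be sketched as follows. The function name `implant_subpixel` and the example spectra are hypothetical, and the Gaussian adjacency effect of (10) is deliberately not modelled here.

```python
import numpy as np

def implant_subpixel(target, background, f=0.6):
    """Mix a target spectrum into a background spectrum with target fraction f."""
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)
    return f * target + (1.0 - f) * background

# Hypothetical 4-band spectra
t = np.array([1.0, 1.0, 1.0, 1.0])
b = np.array([0.0, 0.5, 1.0, 2.0])
z = implant_subpixel(t, b, f=0.6)
```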

AVIRIS data
Two other sub-images have been extracted from a hyperspectral image of a naval air station in San Diego, California, collected by the AVIRIS sensor [27,28]. This data cube has 189 useful spectral bands with wavelengths from 400 to 2500 nm and a GSD of 3.5 meters (see Figure 5). The first sub-image, named Img-II, is an 80×80 pixel data cube that contains some military targets as anomalies and is used to evaluate the detection performance of the algorithms using the Receiver Operating Characteristic (ROC) curve (Figure 5(a)). The true locations of the targets in this sub-image are shown in figure 5(b). The second sub-image, named Img-III, is an image window of 100×100 pixels. This sub-image contains 38 anomalous targets, which may be either helicopters or helipads, as shown in figure 5(c).
This sub-image has been used in some TD studies [29]; it is also used here to evaluate the runtime of the anomaly detectors.

Implementation
To evaluate the performance of the AD and DR methods, three AD algorithms, namely RX, Kernel-RX, and DWEST, have been implemented in the standard mode (without a pre-processing DR step) and with the three mentioned DR methods. The algorithms are referred to by the names given in table 1. One of the most important decisions for AD algorithms is the detection window size [4].
Although there is no specific method for choosing these windows [4], the inner window should be almost as large as the biggest target in the scene. In addition, the outer window should be large enough to provide a sufficient number of background samples for modelling the local background [30]. According to both of these rules and the results of the experiments, the inner and outer window sizes for Img-I are set to 3×3 and 11×11 pixels, respectively. The inner and outer windows for Img-II are set to 5×5 and 13×13 pixels, and those for Img-III are set to 5×5 and 11×11 pixels, respectively. An important decision for the DR methods is the amount of reduction, which determines the number of image/feature bands after the DR step. This parameter should be selected according to two metrics: performance and runtime. In this study, based on the experiments, the number of bands of the output images is set to 8. Therefore, in the pre-processing step, the spectral bands of the main image are reduced to 8 useful bands.
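To make the role of the two windows concrete, the following sketch computes a local RX score in which the background statistics come from the outer window with the inner window excluded. It is an illustration under stated assumptions, not the implementation used in the experiments: the function name `local_rx`, the clipping of windows at the image border and the pseudo-inverse are choices made for this example. The default sizes are the 3×3 and 11×11 windows quoted above for Img-I.

```python
import numpy as np

def local_rx(cube, inner=3, outer=11):
    """Dual-window RX sketch: background taken from the outer window minus the inner window."""
    rows, cols, _ = cube.shape
    half_in, half_out = inner // 2, outer // 2
    data = cube.astype(float)
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # Outer window, clipped at the image border (assumption)
            r0, r1 = max(i - half_out, 0), min(i + half_out + 1, rows)
            c0, c1 = max(j - half_out, 0), min(j + half_out + 1, cols)
            window = data[r0:r1, c0:c1]
            mask = np.ones(window.shape[:2], dtype=bool)
            # Exclude the inner window around the pixel under test
            ir0, ir1 = max(i - half_in, 0) - r0, min(i + half_in + 1, rows) - r0
            ic0, ic1 = max(j - half_in, 0) - c0, min(j + half_in + 1, cols) - c0
            mask[ir0:ir1, ic0:ic1] = False
            background = window[mask]                  # shape (n_samples, bands)
            mu = background.mean(axis=0)
            cov = np.cov(background, rowvar=False)
            diff = data[i, j] - mu
            scores[i, j] = diff @ np.linalg.pinv(cov) @ diff
    return scores

# Hypothetical 4-band scene with one anomalous pixel
rng = np.random.default_rng(3)
cube = rng.normal(size=(15, 15, 4))
cube[7, 7] += 10.0
local_scores = local_rx(cube, inner=3, outer=11)
local_peak = np.unravel_index(local_scores.argmax(), local_scores.shape)
```

With only 8 bands after the DR step, an 11×11 outer window provides far more background samples than dimensions, which is the situation the small-sample-size discussion in the introduction argues for.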

Detection performance evaluation
The ROC curve is a standard way to evaluate the detection performance of AD algorithms. The ROC is a curve that shows the true detection rate (TDR) versus the false alarm rate (FAR) in a particular scenario. The TDR and FAR can be computed by varying the detection threshold and counting both the number of correctly detected targets and the corresponding number of false alarms at every threshold value [1]. To evaluate the detection performance of the algorithms more accurately, the AUC is used. This value is a quantitative criterion; it is widely used to evaluate the detection performance of target detection algorithms [7]. Another way to evaluate the performance of the algorithms is visual investigation, which can be a good criterion when the post-processing threshold step is used. In this study, the algorithms are evaluated on the Img-I and Img-II datasets using the ROC curve and the AUC value; in addition, the Img-III data is used to evaluate the algorithms visually.
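The threshold sweep described above can be sketched as follows. The function names `roc_curve` and `auc` are hypothetical, the AUC is computed here with the trapezoidal rule, and tied scores are not given any special treatment; these are assumptions of this sketch.

```python
import numpy as np

def roc_curve(scores, truth):
    """Return (FAR, TDR) arrays obtained by lowering the threshold one pixel at a time."""
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    order = np.argsort(-scores)                    # highest scores are detected first
    hits = truth[order]
    tdr = np.concatenate([[0.0], np.cumsum(hits) / truth.sum()])
    far = np.concatenate([[0.0], np.cumsum(~hits) / (~truth).sum()])
    return far, tdr

def auc(far, tdr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(far) * (tdr[1:] + tdr[:-1]) / 2.0))

# Hypothetical detector output and ground truth for four pixels
scores = np.array([0.9, 0.8, 0.3, 0.1])
perfect = auc(*roc_curve(scores, [1, 1, 0, 0]))    # targets have the highest scores
inverted = auc(*roc_curve(scores, [0, 0, 1, 1]))   # targets have the lowest scores
```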

AD results of Img-I
Figure 6 shows the ROC curves of the RX detector family for Img-I. The ROC curves of the Kernel-RX and DWEST families are shown in figures 7 and 8, respectively. The AUC values of all algorithms are presented in table 2. According to these criteria, the following results can be inferred.
The detection performance of the RX method in the standard mode is very weak in general; however, the use of the pre-processing DR methods increases its performance significantly.
According to the AUC values, although the DWT-RX and FFT-RX methods exhibit the best performance among the RX family, the performance of all the methods that use DR as a pre-processing step is almost the same. The performance of the Kernel-RX method does not change when the PCA or FFT DR methods are used as a pre-processing step, and DWT does not noticeably reduce its performance. For the DWEST family, DWT-DWEST performs best and the performance of the other methods is almost the same. Of all the methods applied to Img-I, DWT-DWEST performs best and the RX method performs worst. The performance of the other methods is acceptable for the detection of anomalies.

AD results of Img-II
According to the ROC curves and AUC values for Img-II, the following results can be inferred. The performance of the RX method without the DR step is very weak; using the DR pre-processing step increases its performance significantly. The performance of the RX family members that use the DR step is almost the same. In the Kernel-RX family, Kernel-RX and PCA-KRX have the best performance, and DWT-KRX and FFT-KRX perform the same; overall, the performance of all methods of this family is almost the same. The performance of the DWEST family is almost the same in all cases, which means that the DR step does not change its performance. Among all AD algorithms applied to Img-II, the DWT-DWEST and FFT-DWEST methods exhibit the best performance; the RX method performs worst. The performance of the other methods is good. These results are almost the same as the results inferred from the evaluation of the algorithms on Img-I.

AD results of Img-III
Img-III is used to evaluate the performance of the anomaly detectors in a real scene. Because the true locations of the targets in this image are not available, the detection performance of the AD algorithms is investigated visually. To achieve a better visual investigation, a threshold step is added at the end of the AD procedure. To execute this post-processing step, a cut-off threshold is needed; this value can be calculated adaptively using (11) [31]:
$$\eta = \mu_d + Z_{\alpha} \, \sigma_d \qquad (11)$$
where $\eta$ is the cut-off threshold that declares whether a pixel is a target or not, $\mu_d$ and $\sigma_d$ are the mean and standard deviation of the output of the AD algorithm, respectively, and $Z_{\alpha}$ is the z statistic at the significance level α, which controls the number of pixels declared to be anomalies. Figure 12 shows the output of the threshold step using the adaptive cut-off threshold of (11). According to these results, the performance of RX is very weak, and the DR step increases its performance significantly. The performance of the Kernel-RX family is almost the same for all members; this family suffers from a high false alarm rate (FAR), which reduces its performance. The performance of the DWEST family algorithms is almost the same.
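An adaptive cut-off of this kind, the mean of the detector output plus a z statistic times its standard deviation, can be sketched as below. The function name `adaptive_threshold` is hypothetical, and taking the z statistic as the one-sided upper quantile of the standard normal distribution at level α is an assumption of this sketch.

```python
import numpy as np
from statistics import NormalDist

def adaptive_threshold(ad_matrix, alpha=0.01):
    """Return a boolean anomaly mask and the cut-off computed from the detector output."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)    # assumption: one-sided upper quantile
    cutoff = ad_matrix.mean() + z_alpha * ad_matrix.std()
    return ad_matrix > cutoff, float(cutoff)

# Hypothetical detector output with a single strong response
ad_output = np.zeros((10, 10))
ad_output[3, 4] = 100.0
mask, cutoff = adaptive_threshold(ad_output, alpha=0.01)
```

A smaller α raises the cut-off and so reduces the number of pixels declared anomalous.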

Runtime evaluation
To evaluate the speed of the AD methods, a computer system with an "Intel Core i5-2410M, 2.3GHz" processor and 4 GB of Random Access Memory (RAM) is used to measure the runtime of the algorithms on Img-III under equal conditions. The runtime of the DR methods is shown in table 4, and the runtime of the AD methods, which includes the runtime of the related DR pre-processing methods, is shown in table 5. In addition, figure 13 compares the runtime of the methods using a column chart. According to these results, among the DR techniques, the FFT DR method has the best runtime. Among the AD families, the RX family has the best runtime, whereas the Kernel-RX family has the worst.
The runtime of the RX and DWEST families that use the DR step is acceptable; these methods can be used in real-time applications with parallel processing or with a hardware implementation of the algorithms on a field programmable gate array (FPGA) [32,33]. Of all the methods, FFT-RX has the best runtime: it is about 124 times faster than the slowest method, Kernel-RX.

Conclusion
This paper evaluated the impact of linear dimensionality reduction methods on the performance of anomaly detection algorithms. By reducing the dimensions of the hyperspectral image as a pre-processing step, the detection performance and runtime of AD algorithms can be improved. PCA, DWT and FFT were used as DR methods in front of the RX, Kernel-RX and DWEST AD algorithms. The results of the experiments on the AVIRIS and HyMap datasets were assessed using the ROC curve, the AUC values, and a visual investigation. According to these results, the DR methods increase the detection performance of the RX method significantly and do not diminish the performance of the Kernel-RX and DWEST methods. In addition, the DR methods improve the runtime of the RX and DWEST detectors significantly, but the improvement for Kernel-RX is small. FFT has the best runtime among the DR methods and FFT-RX has the best runtime among the AD methods. Based on these results, the DR methods, as a pre-processing step, can improve the detection performance of some AD algorithms and the runtime of all of them. This runtime improvement makes the algorithms suitable for real-time TD applications in hyperspectral remotely sensed data.

Figure 1. Flowchart of hyperspectral AD using the pre-processing DR method.

Figure 2. (a) Spectrum of a pixel of a hyperspectral image, (b) 4-level DWT of the main signal.
The truth location of the targets that are either sub-pixel or full-pixel is shown in figure 4(b).

Figure 4. A natural color composite of the HyMap data cube, (a) selected sub-image with implanted targets (Img-I), (b) truth location of targets [26].

Figure 5. A natural color composite of the AVIRIS data cube, (a) sub-image with real targets (Img-II), (b) truth locations of targets in Img-II and (c) sub-image with real targets (Img-III) [14].

Figure 6. ROC curves of the RX AD family for Img-I.

Figure 7. ROC curves of the Kernel-RX AD family for Img-I.

Figure 8. ROC curves of the DWEST AD family for Img-I.

Figure 9. ROC curves of the RX AD family for Img-II.

Figure 11. ROC curves of the DWEST AD family for Img-II.

Figure 12. Detection results of algorithms applied to Img-III.

Figure 13. Runtime comparison of various anomaly detectors applied to Img-III.
The targets of this image are divided into two parts: self-test and blind-test. Because only the real locations of the self-test targets are available, this part of the image cannot be used to evaluate the performance of the AD algorithms.

Table 2. AUC values of the AD methods applied to Img-I.
The ROC curves of the RX, Kernel-RX and DWEST families for Img-II are shown in figures 9, 10, and 11, respectively; the AUC values of these methods are shown in table 3.