Dim Target Detection in Infrared Images Using Saliency Algorithms

Infrared (IR) target detection and tracking are commonly used in modern defense systems. Target detection is the first and a very important step for several surveillance applications. A long distance between imager and targets or bad weather conditions mostly cause dim target appearance with low signal-to-noise ratio (SNR) in IR images. In this study, dim targets in IR images are enhanced and detected using saliency detection algorithms, which have not been used in the IR wavelength before. Performances of the algorithms are evaluated on common IR datasets. Algorithms are compared in terms of SNR, receiver operating characteristic (ROC) and area under curve (AUC) score. Effects of parameter selection are also considered for automatic target detection. Furthermore, the feasibility of the methods for real-time applications is discussed.


Introduction
Detection of targets in IR images is of great importance in IR search and tracking applications. When targets are dim and have a low signal-to-noise ratio, they are hard to differentiate from background noise, which usually causes high false alarm rates for conventional methods. IR imagers are widely used in military applications since they sense thermal diffusion and reflection of objects. Besides their benefits, they have some drawbacks, such as low contrast between target and background and insufficient spatial resolution, which cause difficulties for computer vision algorithms. Automatic target detection is an essential step for various surveillance applications in computer vision. Detection and tracking are laborious in IR images, yet they yield important information about a target. Applying a threshold to an image is the simplest way to detect a target, but it is not advisable if the target has low contrast with the background or has both cold and hot components. A low threshold triggers a high false alarm rate, whereas a high threshold decreases the probability of detection. A general solution to overcome these drawbacks is to enhance the target signal before applying a threshold. Thus, undesirable results may be avoided.
In IR images, even when targets have low contrast with respect to their surroundings or appear in a cluttered background, they are salient regions and have the potential to attract visual attention. In other words, the image structure belonging to the target area differs barely or conspicuously from its local background. In some cases, it is not visually differentiable from its background. For this reason, our motivation is to enhance targets in IR images by utilizing methods that model the attention of the human visual system. A wide variety of approaches have been proposed for saliency detection, such as frequency tuned salient region detection [1], the spectral residual approach [2], image signature [3], a computational model of visual attention called saliency maps [4], rapid scene analysis using a saliency-based visual attention model [5], global contrast based salient region detection [6], context-aware saliency detection [7], graph-based visual saliency [8] and salient object detection [9]. A stable multisubspace learning method has been used to detect dim and small targets in heterogeneous scenes in [10]. The research progress of image saliency has been analyzed in [11]. Various methods for saliency detection in color images have been reviewed in [12].
We compare the performance of several saliency detection methods and configure each of them with two different parameter sets to show the effect of parameter selection on detecting targets in IR images that have low contrast or are situated in complicated scenes. The following methods, designed and successfully applied for images in the visible wavelength, are adapted to IR images and considered in this comparison: frequency tuned (FT) [1], spectral residual (SR) [2] and image signature (IS) [3]. The algorithms are compared according to enhancement, detection performance and execution time.
In the next section, saliency detection and the aforementioned algorithms are described. In Sec. 3 and 4, target detection and experimental results are discussed, respectively. In the last section, the conclusion is given.

Saliency Methods
Target regions are regarded as the principal areas in an image. A saliency map denotes visually dominant locations, that is, it represents the image areas that stimulate visual perception more than the other areas. In this work, saliency maps obtained from the three methods mentioned above are utilized as the enhanced target image for detection purposes.
The FT method uses the Lab color space and detects salient regions with color and luminance features [1]. Some frequency bands in the input image are filtered by utilizing a difference of Gaussians (DoG) as a band pass filter. Very low frequencies from the original image are passed in order to highlight large salient regions, whereas high frequencies are retained to define the boundaries of the regions. In contrast to these preserved frequencies, the highest frequencies are ignored to suppress noise. In our work, since IR images are grayscale, input images are converted to the Lab color space to match the performance of the original work, instead of utilizing the input directly as the luminance L and ignoring the a and b color components. The output saliency map S has the same size as the input image I and aims to highlight the largest salient regions:

$$S(x,y) = \left\| I_{\mu} - I_{whc}(x,y) \right\|$$

where $I_{\mu}$ is the mean intensity vector of the input image, $I_{whc}$ is the Gaussian smoothed version of the original image, and $\|\cdot\|$ denotes the Euclidean distance. Each pixel location is a vector in $[L, a, b]^T$ form. Also, x and y are the horizontal and vertical components, respectively.
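The FT computation described above can be illustrated with a minimal NumPy sketch. It is a simplification under stated assumptions, not the implementation used in this work: the image is kept as a single intensity channel (so the Euclidean distance reduces to an absolute difference) instead of being converted to Lab, and the helper names `gaussian_kernel_1d`, `smooth` and `ft_saliency` are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Normalized 1D Gaussian kernel with an odd number of taps."""
    coords = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def smooth(image, size, sigma):
    """Separable Gaussian smoothing with edge replication (size must be odd)."""
    kernel = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    # Filter along rows, then along columns; "valid" restores the input size.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def ft_saliency(image, size=3, sigma=0.5):
    """Single-channel FT saliency: |mean intensity - smoothed image|.

    Assumption: one intensity channel replaces the [L, a, b] vector.
    """
    img = image.astype(float)
    return np.abs(img.mean() - smooth(img, size, sigma))
```

With a small bright patch on a dark background, the map produced by this sketch is largest inside the patch, since the smoothed intensity there is farthest from the global mean.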
In the spectral residual (SR) approach, the difference between the original log-spectrum of the Fast Fourier Transform (FFT) and its smoothed version, namely the spectral residual, is computed to obtain the regions of interest of an image [2]. Then, the combination of the phase spectrum and the spectral residual is transformed to the spatial domain by the inverse Fast Fourier Transform to obtain the saliency map. Following [2], the method can be summarized as

$$A(i,j) = \left| \mathcal{F}[I(x,y)] \right|$$

$$P(i,j) = \angle\, \mathcal{F}[I(x,y)]$$

$$L(i,j) = \log A(i,j)$$

$$\mathcal{R}(i,j) = L(i,j) - h_n(i,j) * L(i,j)$$

$$S(x,y) = g(x,y) * \left| \mathcal{F}^{-1}\left[ \exp\left( \mathcal{R}(i,j) + \sqrt{-1}\, P(i,j) \right) \right] \right|^2$$

where $A(i,j)$, $P(i,j)$ and $L(i,j)$ are the amplitude, phase and log spectrum of the input image I, respectively, $\mathcal{R}(i,j)$ is the spectral residual and $S(x,y)$ indicates the saliency map. $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the Fast Fourier Transform and the Inverse Fast Fourier Transform, respectively. $h_n(i,j)$ indicates the mean filter of size n × n. g is a Gaussian filter that smooths the saliency map for a better visual effect, and * is the convolution operation. The input image is scaled to a height (or width) of 64 pixels before the FFT, and the saliency map is rescaled to the input image size at the last step. In the original work, a color input image is converted to a grayscale image before the first step [2]. This step is not required here, because the input is already a grayscale image.
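The core SR steps described above (log-amplitude spectrum, mean filtering, residual, inverse transform) can be sketched as follows. This is a minimal illustration under assumptions: the rescaling to 64 pixels and the final Gaussian smoothing are omitted for brevity, the default mean filter size `n=3` and the edge-replicating padding are choices made here, and the names `box_filter` and `spectral_residual_saliency` are hypothetical.

```python
import numpy as np

def box_filter(arr, n):
    """n x n mean filter with edge replication (n must be odd)."""
    pad = n // 2
    padded = np.pad(arr, pad, mode="edge")
    out = np.zeros(arr.shape, dtype=float)
    height, width = arr.shape
    # Sum the n*n shifted copies of the padded array, then average.
    for dy in range(n):
        for dx in range(n):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (n * n)

def spectral_residual_saliency(image, n=3):
    """Spectral residual saliency without resizing or final smoothing."""
    spectrum = np.fft.fft2(image.astype(float))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Small offset (an assumption) avoids log(0) for empty frequency bins.
    log_amplitude = np.log(amplitude + 1e-12)
    residual = log_amplitude - box_filter(log_amplitude, n)
    # Recombine the residual with the original phase and go back to space.
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    return np.abs(recon) ** 2
```

For an image containing a single bright pixel, the amplitude spectrum is flat, the residual vanishes, and the sketch returns a map that peaks at that pixel.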
In [3], a simple image descriptor, called the image signature (IS), is presented. IS is defined as the sign function of the Two-Dimensional Discrete Cosine Transform (2D-DCT) of an image and is given as

$$\mathrm{IS}(I) = \mathrm{sign}(\hat{I}) \qquad (8)$$

where $\hat{I}$ is the 2D-DCT of I. Equation (8) discards the amplitude information and holds only the phase information. For each channel $I_k$ of the image, the saliency map is obtained as

$$S_k = g * \left( \bar{I}_k \circ \bar{I}_k \right), \qquad \bar{I}_k = \mathrm{IDCT}\left[ \mathrm{sign}(\hat{I}_k) \right]$$

where IDCT is the inverse 2D-DCT, g is the Gaussian smoothing filter, and $\circ$ represents the elementwise product operator. Note that IR images have only one channel.
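A minimal single-channel sketch of the image signature steps is given below. It builds an orthonormal DCT-II matrix explicitly so that only NumPy is needed; the final Gaussian smoothing and the rescaling step are omitted, and the names `dct_matrix` and `image_signature_saliency` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C of size n x n, so that C @ C.T = I."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat

def image_signature_saliency(image):
    """Squared reconstruction from the sign of the 2D-DCT (no smoothing)."""
    img = image.astype(float)
    c_rows = dct_matrix(img.shape[0])
    c_cols = dct_matrix(img.shape[1])
    coeffs = c_rows @ img @ c_cols.T        # 2D-DCT
    signature = np.sign(coeffs)             # keep only the sign
    recon = c_rows.T @ signature @ c_cols   # inverse 2D-DCT
    return recon * recon                    # elementwise product
```

Because the transform matrices are orthonormal, the inverse 2D-DCT is obtained simply by transposing them, which keeps the sketch short.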

Target Detection
Saliency maps obtained using the methods in the previous section contain enhanced salient targets. After the targets in IR images are enhanced, a threshold is applied to the maps generated by each of the saliency algorithms to detect the targets.
Using a fixed threshold is not appropriate to label target pixels correctly, since the saliency maps have different statistical properties. Therefore, the threshold should be adaptive to the map. In [1] and [2], the adaptive threshold is only related to the mean value of the obtained saliency map, while [3] gives the detection results in terms of the receiver operating characteristic (ROC) and the area under the ROC curve, and does not apply a single threshold. In contrast, we show that the most suitable threshold value for detecting targets in a saliency map depends on both the mean and the standard deviation of the map. Thus, we introduce a new adaptive threshold Th for the normalized saliency maps, given in (11), where m and s represent the mean and the standard deviation of the saliency map, respectively. This formulation is derived from Th = m + c * s, commonly used to binarize input images, where c is a constant. Before applying the threshold, each pixel value of the saliency map is normalized to [0, 1]. The term (1 - m) reflects the difference between the maximum and the mean value accordingly. Using the threshold given by (11), the binarized map of salient targets is given by

$$B(x,y) = \begin{cases} 1, & S_N(x,y) \ge \min(Th,\, l) \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$

where l is the upper limit of the threshold, which is smaller than 1, and $S_N(x,y)$ is the normalized saliency map. In order to have the highest target detection performance, the optimal threshold value assigns target pixels to 1 and background pixels to 0 in the binarized map B(x,y). In IR images, targets are salient regions because of their contrast and detailed structure. Specifically, the mean and standard deviation of normalized saliency maps obtained from IR images are low, and the target areas are brighter than the background. An example of the mean values and standard deviations of ten different normalized saliency maps is given in Tab. 1. In this table, threshold values are obtained by increasing the threshold from a low value until the false positive rate is low while the true positive rate is still high. Beyond that value, the true positive rate starts to decrease. The highest threshold value amongst all normalized saliency maps examined in our experiments is 0.8. Therefore, setting l to 0.8 is the safest upper bound in order to prevent a target area pixel from being classified as a background pixel. When l is compared to the threshold value Th of the normalized saliency map as in (12), the smaller one of l and Th is chosen, so that the pixel classification results in high sensitivity. Therefore, choosing the lesser value of l and Th increases the target detection performance. Even though classification using the threshold (11) achieves a high true positive rate and most of the threshold values are less than l, using l as an upper bound acts as a top-level safeguard when obtaining the binarized map B(x,y).
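The normalization and the capped binarization described above can be sketched in NumPy as follows. Note the assumptions: the sketch uses the generic form Th = m + c * s mentioned in the text rather than the threshold of (11), the constant `c=3.0` is a hypothetical value chosen only for illustration, and pixels equal to the threshold are counted as target pixels. The upper bound `upper=0.8` corresponds to l.

```python
import numpy as np

def normalize(saliency):
    """Scale a saliency map linearly to the range [0, 1]."""
    s = saliency.astype(float)
    span = s.max() - s.min()
    if span == 0:
        return np.zeros_like(s)
    return (s - s.min()) / span

def binarize(saliency, c=3.0, upper=0.8):
    """Binarize with min(Th, upper), where Th = m + c * s (generic form).

    Assumption: c is a hypothetical constant, not a value from the paper.
    Returns the binary map and the threshold that was actually applied.
    """
    s_n = normalize(saliency)
    m, s = s_n.mean(), s_n.std()
    th = m + c * s
    th_used = min(th, upper)
    return (s_n >= th_used).astype(np.uint8), th_used
```

For a map with a low mean and a low standard deviation, as is typical for the normalized saliency maps discussed here, the adaptive value stays below the upper bound; the bound only takes effect when the statistics of the map push the threshold above 0.8.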

Experimental Results and Discussion
Experiments are conducted with two different parameter sets for each method. Table 2 contains the parameters that may take variable values.
In FT, the size and standard deviation of the Gaussian filter that determines the higher cut-off frequency in the DoG may take numerous values. In SR, the mean filter size is adjustable, but changing its value does not affect the obtained residual [2]. Thus, it is considered a fixed parameter instead of a tunable parameter here. A fixed value is also used for the scaling factor, since a down-scaling size of 64 pixels in width (or height) is recommended in [2] and [3]. The Gaussian filter is employed to smooth the output saliency map at the final stage in [2] and [3]. Changing its standard deviation and size is possible.
In this work, not only are the methods compared, but the effects of the Gaussian filters in each method are also examined in terms of enhancement, detection and required computation time. We use two different parameter sets for the Gaussian filters in order to reflect the performance of the methods in challenging conditions with various target sizes, spatial image resolutions, target contrasts and background complexities. In the first set, Set1, the filter parameters, which are the standard deviation and the size, are set to 0.5 and 3 × 3, respectively. In the second set, Set2, the standard deviation and the filter size are set to 2.2 and 9 × 9, respectively. The filter size is chosen as the minimum that demonstrates the major characteristic of the Gaussian filter at that standard deviation. Test scenarios are collected from the commonly used IR datasets AMCOM [13] and SENSIAC [14], and grouped into two different target size scales, i.e., small and large. Each group also consists of two different resolutions. These scenarios are given in Tab. 3. The letters S and L in the scene column denote scenes containing small and large targets, respectively. While some of the small and large scenes have high resolution, others have low resolution.
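To make the two parameter sets concrete, the following sketch builds the corresponding Gaussian kernels with NumPy. The helper name `gaussian_kernel` is hypothetical, and normalizing the kernel to unit sum is a common convention assumed here.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to unit sum."""
    coords = np.arange(size) - (size - 1) / 2.0
    g1d = np.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g1d, g1d)
    return kernel / kernel.sum()

# Set1: standard deviation 0.5, size 3 x 3 (little smoothing).
SET1_KERNEL = gaussian_kernel(3, 0.5)
# Set2: standard deviation 2.2, size 9 x 9 (strong smoothing).
SET2_KERNEL = gaussian_kernel(9, 2.2)
```

The Set1 kernel concentrates most of its weight on the center pixel, whereas the Set2 kernel spreads the weight over a much wider neighborhood, which is consistent with Set1 suiting small targets and Set2 suiting large ones.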
The resolution and the target size of each scene are given in pixels in the first and the second column of Tab. 3, respectively, and their ratio (target size/resolution) is shown as a percentage in the ratio column.
In order to evaluate the enhancement results, the visibility of the input and output target signals is compared using the SNR, which is computed from $I_T$, $\mu_B$ and $\sigma_B$. Here, $I_T$ represents the mean intensity value of the target pixels in an image, and $\mu_B$ and $\sigma_B$ are the mean intensity value and the standard deviation of the image, respectively.
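As an illustration only, one commonly used form of such a measure is |I_T - μ_B| / σ_B. The sketch below assumes this form; it is not confirmed to be the exact definition used in this work. The function name `snr` and the boolean `target_mask` input are hypothetical.

```python
import numpy as np

def snr(image, target_mask):
    """Assumed SNR form: |mean(target) - mean(image)| / std(image).

    target_mask is a boolean array marking the target pixels.
    Returns 0.0 for a perfectly flat image to avoid division by zero.
    """
    img = image.astype(float)
    i_t = img[target_mask].mean()   # mean intensity of the target pixels
    mu_b = img.mean()               # mean intensity of the image
    sigma_b = img.std()             # standard deviation of the image
    if sigma_b == 0:
        return 0.0
    return abs(i_t - mu_b) / sigma_b
```

Applying the same function to the input image and to each saliency map, with the same target mask, gives comparable input and output values under this assumed definition.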
SNR results for small and large target scenes are given in Tab. 4 and Tab. 5, respectively. The average SNR value is given in the last row of the tables as SNR_µ. For both Set1 and Set2 parameters, the best results for small target scenes are obtained by IS and SR in Tab. 4. Specifically, both methods give higher SNR values than the input, that is, they improve the input for target detection. However, FT is inappropriate for small target scenes. For all three methods, the higher standard deviation and bigger filter size of Set2 result in lower SNR than Set1. In other words, the methods enhance the results better with Set1 parameters than with Set2, since Set1 has a low standard deviation and filter size, which are appropriate for small targets.
SNR results for large target scenes are shown in Tab. 5. In this case, FT enhances the input better for uniform targets such as L3 and L4. IS and SR also improve the input. All three methods obtain slightly higher SNR values with Set2 parameters than with Set1, since the standard deviation and filter size are high in Set2. In addition to SNR, enhancement and target detection results for four small and four large scenes are shown in Fig. 1 and Fig. 2 to allow subjective evaluation.
Detection results are given in terms of ROC in Fig. 3 and Fig. 4 for small and large targets, respectively. In these figures, as in the SNR results, the best detection results for small target scenes are achieved by SR and IS. A high true positive rate against a low false positive rate in the ROC graph indicates a high target detection performance. AUC scores are given in Tab. 6 and Tab. 7 for small and large targets, respectively. The average AUC is given as AUC_µ in the last rows of the tables. AUC measures target detection performance, with a highest value of 1. The higher the AUC is, the better the detection result is. Its relationship with ROC is that if the true positive rate increases, so does the AUC, since it is measured by the area under the ROC curve. Therefore, a high AUC value indicates a more accurate target detection result. In parallel to the SNR results, IS and SR give the best AUC results for the small target scenes and most of the large target scenes, as shown in Tab. 6 and Tab. 7.
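The relationship between ROC and AUC described above can be sketched by sweeping a threshold over a normalized saliency map and integrating the resulting curve with the trapezoidal rule. The names `roc_curve` and `auc`, the number of thresholds, and the boolean ground-truth `target_mask` are assumptions of this sketch; it also assumes that both target and background pixels are present in the mask.

```python
import numpy as np

def roc_curve(saliency, target_mask, num_thresholds=256):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold from 1 to 0."""
    mask = target_mask.astype(bool)
    s = saliency.astype(float)
    span = s.max() - s.min()
    s_n = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    positives = mask.sum()
    negatives = mask.size - positives
    fpr, tpr = [0.0], [0.0]   # start the curve at the origin
    for th in np.linspace(1.0, 0.0, num_thresholds):
        detected = s_n >= th
        tpr.append((detected & mask).sum() / positives)
        fpr.append((detected & ~mask).sum() / negatives)
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A saliency map that is bright exactly on the target pixels yields an AUC of 1 in this sketch, while a map that is bright exactly on the background yields 0.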
The numbers of blobs located at the target or the background region for IR images with four small and four large targets are shown in Tab. 8 and Tab. 9, respectively. The letters T and B represent "at the target region" and "at the background region", respectively. The symbol - denotes that no blob is detected at this region. A higher blob number at the target and a lower blob number at the background mean better detection results. Furthermore, a single blob at the target is preferable to piecewise blobs, since all blobs at the target belong to a single target.
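Blob counting of this kind can be sketched with a simple connected-component labelling. The sketch makes two assumptions that are not stated in the text: blobs are 4-connected, and a blob counts as "at the target region" if it overlaps a ground-truth target mask by at least one pixel. The names `label_blobs` and `count_blobs` are hypothetical.

```python
from collections import deque

import numpy as np

def label_blobs(binary):
    """Label 4-connected blobs by breadth-first search; return (labels, count)."""
    height, width = binary.shape
    labels = np.zeros((height, width), dtype=int)
    count = 0
    for y in range(height):
        for x in range(width):
            if not binary[y, x] or labels[y, x] != 0:
                continue
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny, nx] and labels[ny, nx] == 0):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

def count_blobs(binary, target_mask):
    """Return (blobs at target, blobs at background) for a binarized map."""
    labels, count = label_blobs(binary)
    at_target = sum(
        1 for blob_id in range(1, count + 1)
        if np.any(target_mask[labels == blob_id]))
    return at_target, count - at_target
```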
A comparison of the required execution time and the feasibility for real-time applications is provided in Tab. 10 for each method. Each method is run 100 times on all scenes and the average computation time is calculated. The algorithms are executed on a computer with a 3.50 GHz Intel® Xeon® CPU E3-1270 v3 and 32 GB RAM. A threshold calculated via the method described in the previous section is applied to each enhanced target image, and the detection results are obtained accordingly. Detection results of FT, IS and SR with Set1 and Set2 around the magnified target region are shown in the second row for each scene in Fig. 1 and Fig. 2 for small and large target scenes, respectively. The targets are aligned exactly at the center of the region.
As seen in the figures and the tables, FT is not capable of handling the small target scenes, especially those having a background such as in S1 and S2, for either set. FT with Set1, which has the smaller standard deviation, highlights the target region as well as the largest uniform area in the scene, which is a road in S4. In addition to its poor performance in enhancement, FT usually loses the targets and produces very high false alarm rates (compare the number of blobs detected at the target and at the background in Tab. 8). The performance of FT is also not adequate for large targets located in a cluttered background, such as L1 and L2 in Tab. 9. If a target is larger and nearly uniform, as in L3 and L4, FT may perform better, is capable of detecting blobs at the target region and may produce no false alarm. Moreover, the other methods fail visually, in particular at L3, as shown in the figure of the large target scenes. On the other hand, the blobs detected using Set1 are offset from the center of the region in the second row at S1 and S2 in Fig. 1. This implies that Set1 has a disadvantage in determining the exact location of the targets. For the large targets, Set2 works well in contrast to Set1. Regarding execution time, the requirements for real-time applications are satisfied, since the methods may be performed on more than 150 frames per second.
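The timing procedure described above (repeated runs, averaged) can be sketched with the Python standard library. The names `average_runtime` and `frames_per_second` are hypothetical, and `method` stands for any callable that maps an image to a saliency map.

```python
import time

def average_runtime(method, image, repeats=100):
    """Average wall-clock time in seconds of method(image) over repeats runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        method(image)
    return (time.perf_counter() - start) / repeats

def frames_per_second(seconds_per_frame):
    """Convert an average per-frame time to a frame rate."""
    return 1.0 / seconds_per_frame
```

Under this conversion, a rate above 150 frames per second corresponds to an average time below 1/150 s, that is, roughly 6.7 ms per frame.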
Generally speaking, Set1 with the smaller standard deviation is more suitable for very small targets, whereas Set2 with the larger standard deviation is more suitable for large targets. FT may perform well if a target is nearly uniform and occupies a large region in an image; otherwise, SR and IS with suitable parameter sets outperform FT. The performances of SR and IS are close to each other.
One of the most important goals of automatic target detection algorithms is to find the pixel coordinates of the targets in an image. A lower scaling factor in SR and IS should be employed in order to obtain the target position with high precision, particularly for small targets. Furthermore, SR and IS are also acceptable for real-time applications.

Conclusion
Three different saliency detection algorithms commonly utilized in the visible wavelength are applied to IR images for the first time and compared in this work. Each of them is configured with two different parameter sets to examine the effect of parameter choice on target enhancement and detection under various conditions such as different target size, target contrast, image resolution and background complexity. The size of the target is the foremost limitation when selecting a well-suited parameter set. IS and SR with the right parameter sets give adequate results in all conditions aside from the scenes where the target-image resolution ratio is too high. FT shows better performance in these scenes by highlighting large uniform target areas. This indicates that each method is preferable to the others under different conditions. In the future, effort can be devoted to developing new methods that need no prior information about the scene and are less influenced by varying conditions. Also, an SNR method that is more coherent with the human visual perception system than the existing objective evaluation algorithms can be studied to evaluate the detection results.
Tab. 2. Tunable parameters of the methods.
Tab. 3. Specifications of the representative scenes.
Tab. 5. SNR results of the large target scenes.