Object fusion tracking based on visible and infrared images: A comprehensive review
Introduction
Visual object tracking has received significant attention in recent years due to its wide range of applications in areas such as robotics [1], autonomous vehicles [2], human-computer interaction [3] and video surveillance [4]. According to the type of images used, it can be roughly classified into tracking based on visible images, tracking based on infrared images, and RGB-infrared fusion tracking. Among these, tracking based on visible images is the most popular. Note that in this paper we do not distinguish between visible and RGB (Red-Green-Blue) images, although visible images also include gray-scale images.
Currently, the two main kinds of methods in visual object tracking are deep learning (DL)-based methods [5] and correlation filter (CF)-based approaches [6]. Tracking methods based on deep learning mainly exploit its strong representation ability to extract better features than handcrafted ones, i.e., features designed manually such as the histogram of oriented gradients (HOG) [7] and the scale-invariant feature transform (SIFT) [8]; these approaches can therefore achieve good tracking results in many cases. Early deep learning-based methods suffered severely from slow speed [9]. However, with the application of fully convolutional Siamese networks to tracking [5], recent deep learning-based trackers achieve high tracking performance while maintaining real-time speeds [10], [11], [12]. In CF-based tracking algorithms, the model can be updated in real-time because the correlation operation can be implemented efficiently via the Fast Fourier Transform (FFT). Therefore, CF-based methods that use shallow features can run in real-time. Recently, however, some CF-based trackers have adopted raw deep convolutional features of high dimensionality [13], [14], [15]. These trackers become slower because the computational cost of the correlation filters increases with the feature dimensionality [16].
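To make the FFT efficiency argument concrete, the following toy sketch learns a single-channel, MOSSE-style correlation filter in the Fourier domain. It is an illustration only, not the implementation of any cited tracker; practical CF trackers add cosine windows, multi-channel features, scale estimation and online updates.

```python
import numpy as np

# Desired response: a sharp Gaussian centred on the target position.
size, sigma = 64, 2.0
yy, xx = np.mgrid[0:size, 0:size]
g = np.exp(-((xx - size // 2) ** 2 + (yy - size // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, response, reg=1e-2):
    """Solve for the filter in the Fourier domain (MOSSE-style):
    conj(H) = G * conj(F) / (F * conj(F) + reg), element-wise via the FFT."""
    F, G = np.fft.fft2(patch), np.fft.fft2(response)
    return (G * np.conj(F)) / (F * np.conj(F) + reg)

def detect(patch, H_conj):
    """Correlate a search patch with the filter; the response peak
    gives the target's new position."""
    resp = np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(resp), resp.shape)

template = g                                      # toy target appearance
H = train_filter(template, g)
shifted = np.roll(template, (5, 3), axis=(0, 1))  # target moved down 5, right 3
peak = detect(shifted, H)                         # peak follows the shift
```

Because both training and detection reduce to element-wise operations between FFTs, the per-frame cost is O(n log n) in the number of pixels, which is what makes online model updates affordable.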
However, due to the limitations of the imaging mechanism of visible images, tracking algorithms based on visible images may fail in certain circumstances, for example when the illumination conditions are poor or change significantly. Infrared images capture the thermal radiation of objects and are insensitive to these factors; they can thus provide information complementary to visible images, as shown in Fig. 1a. In recent years, researchers have also explored object tracking with infrared images [18], [19], [20], [21]. However, infrared images typically have low resolution and poor texture, and are also unreliable in certain conditions, as shown in Fig. 1b. Therefore, more researchers have begun to investigate object tracking based on the fusion of visible and infrared images, to overcome the inherent shortcomings of methods based on single-modal images. By fusing complementary information from visible and infrared images, the robustness of tracking algorithms can be greatly enhanced. As a result, object tracking based on RGB and infrared images has become a hot research topic in recent years, and an increasing number of studies have been published in high-quality journals and well-known conferences [22], [23], [24], [25], [26], [27], [28], [29], [30]. Consequently, the well-known Visual Object Tracking (VOT) challenge started a new RGB-infrared subchallenge in 2019, aiming to attract researchers to evaluate their trackers on the provided video sequences. Note that, since its appearance, tracking based on visible and infrared images has not had a consistent name. Many researchers used fusion tracking [31], [32] or tracking by fusion [33], [34], [35]; it was not until 2017 that some researchers started using RGBT tracking [27].
In this review, we refer to object tracking based on the fusion of visible and infrared images as RGB-infrared fusion tracking, because we believe this name covers this class of methods better and is thus more suitable for a comprehensive review. Moreover, this name emphasizes the importance of fusion in these methods.
Research on RGB-infrared fusion tracking began in the 2000s, as indicated by the development timeline of this field given in Fig. 2. RGB-infrared fusion tracking can be categorized along several dimensions. According to the primary modality used during fusion tracking, there are infrared-assisted RGB tracking and RGB-assisted infrared tracking. In infrared-assisted RGB tracking, the visible image is the primary modality, and infrared images are employed to assist RGB tracking, especially when the visible images are unreliable [36], [37]; in these works, the evaluation metrics are computed based on the ground truth of the visible images. In contrast, in RGB-assisted infrared tracking, the infrared image is the primary modality and all evaluation metrics are computed based on the infrared ground truth [38]. In this paper, we broadly divide RGB-infrared fusion tracking methods into five categories according to their underlying theories: traditional methods, sparse representation (SR)-based, graph-based, correlation filter-based and deep learning-based approaches. It is well known that effective and robust feature representation is crucial for tracking algorithms. Before the sparse representation-, graph-, correlation filter- and deep learning-based methods, researchers performed fusion tracking using traditional techniques such as mean shift, CamShift, the Kalman filter and the particle filter; these traditional methods use handcrafted features to represent the target. Sparse representation-based methods rest on the assumption that the target can be represented as a linear combination of bases in an overcomplete dictionary. Graph-based approaches first divide the bounding box around the target into non-overlapping patches, and then model the relationships among these patches to derive a feature representation of the target. CF-based trackers efficiently learn correlation filters online to adapt to variations of the target.
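As an illustration of the sparse-representation idea, the sketch below codes a synthetic "target" signal over an overcomplete dictionary using orthogonal matching pursuit, a greedy stand-in for the l1-minimization solvers used in actual SR-based trackers; the dictionary, signal and sparsity level here are invented for the example.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily pick the atom most correlated
    with the residual, then re-fit least squares on the selected atoms."""
    residual, idx = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in idx:
            idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

# Synthetic example: a "target patch" y built from two dictionary atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)    # unit-norm atoms of an overcomplete dictionary
x_true = np.zeros(64)
x_true[[3, 17]] = [1.5, -2.0]     # 2-sparse ground-truth code
y = D @ x_true
x_hat = omp(D, y, k=2)            # sparse code recovered from y alone
```

In SR-based trackers, the reconstruction error of each candidate region under such a sparse code serves as the (dis)similarity score; in fusion tracking, visible and infrared modalities typically share or jointly regularize the sparse codes.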
Deep learning-based methods leverage the strong representation ability of deep neural networks to learn robust feature representations of the target from large amounts of images. In all these methods, a key to achieving good fusion tracking performance is the effective combination of visible and infrared features.
As can be seen from Fig. 2, RGB-infrared fusion tracking is developing very fast. However, to the best of our knowledge, the literature lacks a review of RGB-infrared fusion tracking that compares and evaluates the performance of these different techniques. This paper tries to fill this gap. The main contributions of this review are as follows. First, to the best of our knowledge, this is the first review of RGB-infrared fusion tracking. It systematically investigates RGB-infrared fusion tracking methods, benchmark datasets, and evaluation metrics. The main RGB-infrared tracking algorithms are grouped into several types according to their underlying theories, and each type is introduced in detail, including its main principles, representative methods, and pros and cons. Second, the main results on public datasets are presented and analyzed to provide an objective comparison of existing approaches. Third, based on this systematic review and the performance comparison of different trackers, we give detailed discussions of future prospects and suggest promising research directions in this field.
The structure of this review is schematically illustrated in Fig. 3. Section 2 gives some background information. In Section 3, RGB-infrared fusion tracking methods are discussed in detail, including key points in implementation and different fusion levels. In Section 4, we summarize the development of RGB-infrared datasets. Section 5 introduces the evaluation metrics. Section 6 presents experimental results and gives an analysis of the performances. Section 7 discusses the future prospects. Finally, Section 8 concludes the paper.
Related work
This section discusses some related works which are helpful for understanding and performing RGB-infrared fusion tracking.
RGB-infrared fusion tracking
In recent years, many RGB-infrared fusion tracking algorithms have been proposed; some examples are listed in Table 1. In this section, we first discuss the key points for achieving good fusion tracking performance. Then we introduce the fusion levels in fusion tracking: according to when the images are fused, methods can be divided into pixel-level, feature-level and decision-level fusion tracking. We then give a comprehensive survey of RGB-infrared fusion tracking methods. These methods
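The three fusion levels can be sketched schematically as follows, with random arrays standing in for images, features and tracker response maps; the fixed weights and the histogram "feature" are purely illustrative, and real methods use far richer fusion rules.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb_img = rng.random((64, 64))   # stand-in for a visible frame
ir_img = rng.random((64, 64))    # stand-in for the aligned infrared frame

# Pixel-level fusion: combine the raw images before any tracking.
# (A fixed weighted average here; real methods use multi-scale transforms etc.)
fused_img = 0.5 * rgb_img + 0.5 * ir_img

# Feature-level fusion: extract features per modality, then combine them.
# (Concatenation is the simplest scheme; weighting or attention are common.)
def toy_features(img):
    # normalized 16-bin intensity histogram as a toy feature vector
    return np.histogram(img, bins=16, range=(0, 1))[0] / img.size

fused_feat = np.concatenate([toy_features(rgb_img), toy_features(ir_img)])

# Decision-level fusion: run a tracker per modality, then merge their
# response maps using per-modality reliability weights.
resp_rgb = rng.random((64, 64))  # stand-in for the RGB tracker response
resp_ir = rng.random((64, 64))   # stand-in for the IR tracker response
w_rgb, w_ir = 0.7, 0.3           # illustrative reliability weights
fused_resp = w_rgb * resp_rgb + w_ir * resp_ir
peak = np.unravel_index(np.argmax(fused_resp), fused_resp.shape)
```

The key design choice is thus where in the pipeline the two modalities meet: earlier fusion preserves more raw information, while later fusion lets each modality be processed by a pipeline suited to it.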
Available RGB-infrared datasets
Large-scale datasets are of vital importance in RGB-infrared fusion tracking, since they are not only beneficial for training algorithms, but are also crucial for testing algorithms and comparing performance among trackers. Before large-scale datasets became available, most RGB-infrared fusion tracking publications employed only several visible and infrared video pairs, or even a single video pair, in their experiments to verify the algorithm. For example, the OTCBVS dataset [112] which
Evaluation metrics
In recent years, several well-recognized evaluation metrics have been proposed to evaluate tracking performance on visible images. These include the precision rate (PR), success rate (SR), accuracy, robustness and Expected Average Overlap (EAO). These evaluation metrics can also be applied to RGB-infrared fusion tracking.
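As a concrete reference for the two most common metrics, the sketch below computes PR and SR for a toy two-frame sequence; boxes are given as (x, y, w, h), and the 20-pixel and 0.5-overlap thresholds are the conventional choices.

```python
import numpy as np

def precision_rate(pred_centers, gt_centers, threshold=20):
    """PR: fraction of frames whose predicted centre lies within
    `threshold` pixels of the ground-truth centre."""
    d = np.linalg.norm(np.asarray(pred_centers, float) -
                       np.asarray(gt_centers, float), axis=1)
    return float(np.mean(d <= threshold))

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """SR at one threshold; sweeping it over [0, 1] yields the success
    plot, whose area under the curve is the usual SR score."""
    return float(np.mean([iou(p, g) > threshold
                          for p, g in zip(pred_boxes, gt_boxes)]))

def centers(boxes):
    return [(x + w / 2, y + h / 2) for x, y, w, h in boxes]

# Two toy frames: the first prediction is good, the second has drifted.
pred = [(10, 10, 40, 40), (100, 100, 40, 40)]
gt = [(12, 12, 40, 40), (160, 160, 40, 40)]
print(precision_rate(centers(pred), centers(gt)))   # 0.5
print(success_rate(pred, gt))                       # 0.5
```

For RGB-infrared fusion tracking the same formulas apply, with the ground truth taken from the primary modality (visible or infrared, as noted above).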
Benchmark results and analysis
In this section, we present results on available public fusion tracking datasets. The results are either collected from the published literature or produced by the authors. The aim is to facilitate research in this direction and make it easier for researchers to compare tracking results with the state of the art. It should be mentioned that many RGB-infrared trackers are not open-source and their results on public datasets have not been reported [22], [23], [24], [38], [79], [80]. As a
Future prospects
Despite the remarkable progress that has been achieved in RGB-infrared fusion tracking, several issues remain for future work. In this section, we give detailed discussions on specific trends of RGB-infrared fusion tracking based on the review of existing approaches.
Conclusion
Fusion tracking based on visible and infrared images (RGB-infrared fusion tracking) has attracted considerable attention and made significant progress in the past few years. In this paper, we comprehensively review existing RGB-infrared fusion tracking methods in the literature. These approaches can be generally divided into five categories: traditional methods, sparse representation-based, graph-based, correlation filter-based, and deep learning-based methods. Each category is introduced and
Declaration of Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Xingchen Zhang: Conceptualization, Investigation, Writing - original draft, Writing - review & editing. Ping Ye: Visualization, Investigation, Data curation. Henry Leung: Writing - review & editing. Ke Gong: Visualization, Investigation. Gang Xiao: Supervision, Writing - review & editing, Funding acquisition, Project administration.
Acknowledgment
This work was sponsored in part by the National Program on Key Basic Research Project of China under Grant 2014CB744903, in part by the National Natural Science Foundation of China under Grant 61973212 and Grant 61673270, in part by the Shanghai Science and Technology Committee Research Project under Grant 17DZ1204304, in part by the Shanghai Industrial Strengthening Project under Grant GYQJ-2017-5-08.
References (142)
- et al., Deep convolutional neural networks for thermal infrared object tracking, Knowl.-Based Syst. (2017)
- et al., Synthetic data generation for end-to-end thermal infrared tracking, IEEE Trans. Image Process. (2018)
- et al., Learning collaborative sparse representation for grayscale-thermal tracking, IEEE Trans. Image Process. (2016)
- et al., Weighted sparse representation regularized graph learning for RGB-T object tracking, Proceedings of the 25th ACM International Conference on Multimedia (2017)
- et al., Fusion tracking in color and infrared images using sequential belief propagation, IEEE International Conference on Robotics and Automation (2008)
- et al., Multi-focus image fusion using PCNN, Pattern Recognit. (2010)
- A review of remote sensing image fusion methods, Inf. Fusion (2016)
- et al., Perceptual quality assessment for multi-exposure image fusion, IEEE Trans. Image Process. (2015)
- Tensor sparse representation for 3-D medical image fusion using weighted average rule, IEEE Trans. Biomed. Eng. (2018)
- et al., A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion (2015)
- An application of compressive sensing for image fusion, Int. J. Comput. Math.
- Pixel-level image fusion: a survey of the state of the art, Inf. Fusion
- Deep learning for pixel-level image fusion: recent advances and future prospects, Inf. Fusion
- Infrared and visible image fusion methods and applications: a survey, Inf. Fusion
- Convolutional neural network-based multimodal image fusion via similarity learning in the shearlet domain, Neural Comput. Appl.
- Infrared and visible image fusion with convolutional neural networks, Int. J. Wavelets Multiresolution Inf. Process.
- Comparison of infrared and visible imagery for object tracking: toward trackers with superior IR performance, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops
- RGB-T object tracking: benchmark and baseline, arXiv preprint arXiv:1805.08982
- Visual tracking using locality-constrained linear coding and saliency map for visible light and infrared image sequences, Signal Processing: Image Communication
- Learning soft-consistent correlation filters for RGB-T object tracking, Chinese Conference on Pattern Recognition and Computer Vision (PRCV)
- Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme, Infrared Phys. Technol.
- Discriminative fusion correlation learning for visible and infrared tracking, Math. Probl. Eng.
- Hand posture recognition using finger geometric feature, Proceedings of the 21st International Conference on Pattern Recognition
- Path-tracking for autonomous vehicles at the limit of friction, 2017 American Control Conference
- Visual object tracking: classical and contemporary approaches, Front. Comput. Sci.
- Fully-convolutional Siamese networks for object tracking, European Conference on Computer Vision
- High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell.
- Histograms of oriented gradients for human detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Scale invariant feature transform, Scholarpedia
- Learning multi-domain convolutional neural networks for visual tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- High performance visual tracking with Siamese region proposal network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- SiamRPN++: evolution of Siamese visual tracking with very deep networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Convolutional features for correlation filter based visual tracking, Proceedings of the IEEE International Conference on Computer Vision Workshops
- ECO: efficient convolution operators for tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Beyond correlation filters: learning continuous convolution operators for visual tracking, European Conference on Computer Vision
- Context-aware deep feature compression for high-speed visual tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Fusion-based background-subtraction using contour saliency, Computer Society Conference on Computer Vision and Pattern Recognition Workshops
- A thermal object tracking benchmark, Proceedings of the 12th IEEE International Conference on Advanced Video and Signal Based Surveillance
- Hierarchical spatial-aware Siamese network for thermal infrared object tracking, Knowl.-Based Syst.
- Learning modality-consistency feature templates: a robust RGB-infrared tracking system, IEEE Trans. Ind. Electron.
- Robust collaborative discriminative learning for RGB-infrared tracking, Thirty-Second AAAI Conference on Artificial Intelligence
- Modality-correlation-aware sparse representation for RGB-infrared object tracking, Pattern Recognit. Lett.
- Fusing two-stream convolutional neural networks for RGB-T object tracking, Neurocomputing
- Cross-modal ranking with soft consistency and noisy labels for robust RGB-T tracking, Proceedings of the European Conference on Computer Vision
- Learning local-global multi-graph descriptors for RGB-T object tracking, IEEE Trans. Circuits Syst. Video Technol.
- Fast RGB-T tracking via cross-modal correlation filters, Neurocomputing
- Fusion tracking in color and infrared images using joint sparse representation, Sci. China Inf. Sci.