Information Fusion

Volume 63, November 2020, Pages 166-187
Object fusion tracking based on visible and infrared images: A comprehensive review

https://doi.org/10.1016/j.inffus.2020.05.002

Highlights

  • A review of fusion tracking methods via visible and infrared images is presented.

  • Main RGB-infrared trackers are summarized and categorized into several groups.

  • Public RGB-infrared datasets are summarized and compared.

  • Main results on public datasets are summarized and analyzed in detail.

  • Future prospects of RGB-infrared fusion tracking are discussed and suggested.

Abstract

Visual object tracking has attracted widespread interest recently. Because visible and infrared images provide complementary information, fusion tracking based on both modalities can boost tracking performance under adverse and challenging conditions. RGB-infrared fusion tracking has therefore become an active research topic, and various algorithms have been proposed in recent years. In this paper, we present a review of RGB-infrared fusion tracking. We summarize all major RGB-infrared trackers in the literature and categorize them into several major groups for better understanding. We also discuss the development of RGB-infrared datasets and analyze the main results on public datasets. We observe that deep learning-based methods achieve state-of-the-art performance, while graph-based and correlation filter-based methods give slightly worse but still competitive results. Finally, we give some suggestions on future research directions for fusion tracking based on our observations. This review can serve as a reference for researchers in RGB-infrared fusion tracking, image fusion, and related fields.

Introduction

Visual object tracking has received significant attention in recent years due to its wide applications in many areas, such as robotics [1], autonomous vehicles [2], human-computer interface [3] and video surveillance [4]. According to the type of images used, it can be roughly classified into tracking based on visible images, tracking based on infrared images, and RGB-infrared fusion tracking. Among these, tracking based on visible images is the most popular. Note that in this paper we do not distinguish between visible and RGB (Red-Green-Blue) images, although visible images also include gray-scale images.

Currently, the two main kinds of methods in visual object tracking are deep learning (DL)-based methods [5] and correlation filter (CF)-based approaches [6]. Deep learning-based tracking methods mainly exploit the strong representation ability of deep networks to extract better features than handcrafted ones, and can therefore achieve good tracking results in many cases. Here, handcrafted features are those designed manually, such as the histogram of oriented gradients (HOG) [7] and the scale-invariant feature transform (SIFT) [8]. Early deep learning-based methods suffered severely from slow speed [9]. However, with the application of fully convolutional Siamese networks to tracking [5], recent deep learning-based trackers achieve high tracking performance while maintaining real-time speeds [10], [11], [12]. In CF-based tracking algorithms, the model can be updated in real time because the correlation operation can be implemented efficiently via the Fast Fourier Transform (FFT). Therefore, CF-based methods using shallow features can run in real time during tracking. However, some recent CF-based trackers use raw deep convolutional features of high dimensionality [13], [14], [15]. These trackers run more slowly because the computational cost of the correlation filters increases with the feature dimensionality [16].
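
To make the FFT trick concrete, the following is a minimal sketch of a single-channel MOSSE-style correlation filter, which underlies many CF-based trackers. It is an illustration only, not the implementation of any specific tracker reviewed here; the function names and the regularization value are our own choices.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peaked at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, response, reg=1e-2):
    """Closed-form MOSSE-style filter in the Fourier domain.

    H* = (G . conj(F)) / (F . conj(F) + reg), where F and G are the FFTs
    of the training patch and the desired response, and reg avoids
    division by near-zero spectral energy.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(response)
    return (G * np.conj(F)) / (F * np.conj(F) + reg)

def locate(h_conj, search_patch):
    """Correlate the learned filter with a search patch; return the peak."""
    S = np.fft.fft2(search_patch)
    resp = np.real(np.fft.ifft2(S * h_conj))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

In practice the patches are pre-processed (e.g. with a cosine window), the filter is updated online with a running average, and RGB-infrared CF trackers additionally learn or fuse filters across the two modalities.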

However, due to the limitations of the imaging mechanism, tracking algorithms based on visible images may fail in certain circumstances, for example when the illumination conditions are poor or change significantly. Infrared images capture the thermal radiation of objects and are insensitive to these factors, so they can provide complementary information to visible images, as shown in Fig. 1a. In recent years, researchers have also explored object tracking with infrared images [18], [19], [20], [21]. However, infrared images typically have low resolution and poor texture, and are also unreliable in certain conditions, as shown in Fig. 1b. Therefore, more researchers have begun to investigate object tracking methods based on the fusion of visible and infrared images to overcome the inherent shortcomings of methods based on single-modal images. By fusing complementary information from visible and infrared images, the robustness of tracking algorithms can be greatly enhanced. As a result, object tracking based on RGB and infrared images has become a hot research topic in recent years, and a growing number of studies have been published in high-quality journals and well-known conferences [22], [23], [24], [25], [26], [27], [28], [29], [30]. As a consequence, the well-known Visual Object Tracking (VOT) challenge started a new RGB-infrared subchallenge in 2019, aiming to attract researchers to evaluate their methods on the provided video sequences. Note that tracking based on visible and infrared images has not had a consistent name since its emergence. Many researchers used fusion tracking [31], [32] or tracking by fusion [33], [34], [35], and it was not until 2017 that some researchers started using RGBT tracking [27]. In this review, we refer to object tracking based on the fusion of visible and infrared images as RGB-infrared fusion tracking, because we believe this name covers this family of methods better and is thus more suitable for a comprehensive review. Moreover, this name emphasizes the importance of fusion in these methods.

Research on RGB-infrared fusion tracking began in the 2000s, as indicated by the development timeline of this field in Fig. 2. RGB-infrared fusion tracking can be categorized in different ways. According to the primary modality used during fusion tracking, there are infrared-assisted RGB tracking and RGB-assisted infrared tracking. In infrared-assisted RGB tracking, the visible image is the primary modality, and infrared images are employed to assist RGB tracking, especially when the visible images are not reliable [36], [37]; in these works, the evaluation metrics are computed based on the ground truth of the visible images. In contrast, in RGB-assisted infrared tracking, the infrared image is the primary modality and all evaluation metrics are computed based on the infrared ground truth [38]. In this paper, we broadly divide RGB-infrared fusion tracking methods into five categories according to their underlying theories, namely traditional methods and sparse representation (SR)-based, graph-based, correlation filter-based and deep learning-based approaches. It is well known that an effective and robust feature representation is crucial for tracking algorithms. Before sparse representation-, graph-, correlation filter- and deep learning-based methods, researchers performed fusion tracking with traditional techniques such as mean shift, Camshift, the Kalman filter and the particle filter; these traditional methods use handcrafted features to represent the target. Sparse representation-based methods rely on representing the target as a linear combination of bases in an overcomplete dictionary. Graph-based approaches first divide the bounding box around the target into non-overlapping patches, and then model the relationships among these patches to derive a feature representation of the target. CF-based trackers learn correlation filters efficiently online to adapt to appearance variations of the target. Deep learning-based methods leverage the strong representation ability of deep neural networks to learn robust feature representations of the target from large amounts of images. In all these methods, a key to achieving good fusion tracking performance is the effective combination of visible and infrared features, as sketched below.
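
As a simple illustration of such a combination, the following sketch fuses per-modality features with adaptive reliability weights. It is a generic example under our own assumptions (the weighting scheme and function names are hypothetical), not the formulation of any particular tracker discussed in this review.

```python
import numpy as np

def fuse_features(feat_rgb, feat_ir, rel_rgb, rel_ir, eps=1e-8):
    """Adaptively weighted feature-level fusion (illustrative sketch).

    feat_rgb, feat_ir: feature vectors extracted from each modality.
    rel_rgb, rel_ir:   non-negative reliability scores per modality, e.g.
                       derived from the quality of each modality's tracking
                       response in the previous frame (hypothetical scheme).
    """
    # Normalise reliabilities into weights that sum to one.
    w_rgb = rel_rgb / (rel_rgb + rel_ir + eps)
    w_ir = 1.0 - w_rgb
    return w_rgb * feat_rgb + w_ir * feat_ir

# Example: the infrared modality is judged twice as reliable (e.g. at night),
# so it dominates the fused representation.
fused = fuse_features(np.ones(4), np.zeros(4), rel_rgb=1.0, rel_ir=2.0)
```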

As can be seen from Fig. 2, RGB-infrared fusion tracking is developing very fast. However, to the best of our knowledge, the literature lacks a review of RGB-infrared fusion tracking that compares and evaluates the performance of these different techniques. This paper tries to fill this gap. The main contributions of this review are as follows. First, to the best of our knowledge, this is the first review of RGB-infrared fusion tracking. It systematically investigates RGB-infrared fusion tracking methods, benchmark datasets, and evaluation metrics. The main RGB-infrared tracking algorithms are grouped into several types according to their underlying theories, and each type is introduced in detail, including its main principles, representative methods, and pros and cons. Second, the main results on public datasets are presented and analyzed to provide an objective comparison of existing approaches. Third, based on this systematic review and the performance comparison of different trackers, we give detailed discussions of future prospects and suggest promising research directions for this field.

The structure of this review is schematically illustrated in Fig. 3. Section 2 gives some background information. In Section 3, RGB-infrared fusion tracking methods are discussed in detail, including key implementation points and different fusion levels. In Section 4, we summarize the development of RGB-infrared datasets. Section 5 introduces the evaluation metrics. Section 6 presents experimental results and analyzes the performances. Section 7 discusses future prospects. Finally, Section 8 concludes the paper.

Section snippets

Related work

This section discusses related work that is helpful for understanding and performing RGB-infrared fusion tracking.

RGB-infrared fusion tracking

In recent years, many RGB-infrared fusion tracking algorithms have been proposed; some examples are listed in Table 1. In this section, we first discuss the key points for achieving good fusion tracking performance. We then introduce the fusion levels in fusion tracking: according to the stage at which the images are fused, methods can be divided into pixel-level, feature-level and decision-level fusion tracking (sketched below). Finally, we give a comprehensive survey of RGB-infrared fusion tracking methods. These methods …
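
To clarify the three fusion levels, the following schematic contrasts them on a single spatially aligned frame pair. The feature extractor and tracker are deliberately trivial placeholders of our own invention, since the point here is only where the fusion happens in the pipeline.

```python
import numpy as np

def extract_features(img):
    # Placeholder feature extractor; real trackers use e.g. HOG or CNN features.
    return img.ravel().astype(np.float64)

def run_tracker(features):
    # Placeholder "tracker" returning a dummy (x, y, w, h) box; a real system
    # would run a CF- or DL-based tracker on these features.
    return (0.0, 0.0, 32.0, 32.0)

def pixel_level(rgb, ir, alpha=0.5):
    # Fuse the raw (aligned, single-channel) images first, then track.
    fused_img = alpha * rgb + (1.0 - alpha) * ir
    return run_tracker(extract_features(fused_img))

def feature_level(rgb, ir):
    # Extract features per modality, fuse the features, then track.
    fused_feat = np.concatenate([extract_features(rgb), extract_features(ir)])
    return run_tracker(fused_feat)

def decision_level(rgb, ir, w_rgb=0.5, w_ir=0.5):
    # Track each modality independently, then fuse the two outputs.
    box_rgb = run_tracker(extract_features(rgb))
    box_ir = run_tracker(extract_features(ir))
    return tuple(w_rgb * a + w_ir * b for a, b in zip(box_rgb, box_ir))
```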

Available RGB-infrared datasets

Large-scale datasets are of vital importance in RGB-infrared fusion tracking: they are not only beneficial for training algorithms, but also crucial for testing algorithms and comparing performance across trackers. Before large-scale datasets became available, the experiments in most RGB-infrared fusion tracking publications employed only a few visible-infrared video pairs, or even a single video pair, to verify the algorithm. For example, the OTCBVS dataset [112], which …

Evaluation metrics

In recent years, several well-recognized evaluation metrics have been proposed to evaluate tracking performance on visible images, including precision rate (PR), success rate (SR), accuracy, robustness, and expected average overlap (EAO). These metrics can also be applied to RGB-infrared fusion tracking.
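
As a reference point for the two most common metrics, the following sketch computes PR and SR from per-frame predicted and ground-truth boxes. It follows the usual conventions we are aware of (PR with a 20-pixel centre-error threshold; SR here at a fixed overlap threshold, though it is often reported as the area under the success plot); the helper names are our own.

```python
import numpy as np

def center_error(box_a, box_b):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return np.hypot(ax - bx, ay - by)

def overlap(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_rate(pred, gt, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    return float(np.mean([center_error(p, g) <= threshold
                          for p, g in zip(pred, gt)]))

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose IoU exceeds `threshold`."""
    return float(np.mean([overlap(p, g) > threshold
                          for p, g in zip(pred, gt)]))
```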

Benchmark results and analysis

In this section, we present results on the available public fusion tracking datasets. The results are either collected from the published literature or produced by the authors. The aim is to facilitate research in this direction and make it easier for researchers to compare tracking results with the state of the art. It should be mentioned that many RGB-infrared trackers are not open-source and their results on public datasets have not been reported [22], [23], [24], [38], [79], [80]. As a …

Future prospects

Despite the remarkable progress that has been achieved in RGB-infrared fusion tracking, several issues remain for future work. In this section, we give detailed discussions on specific trends of RGB-infrared fusion tracking based on the review of existing approaches.

Conclusion

Fusion tracking based on visible and infrared images (RGB-infrared fusion tracking) has attracted considerable attention and made significant progress in the past few years. In this paper, we comprehensively review existing RGB-infrared fusion tracking methods in the literature. These approaches can be generally divided into five categories: traditional methods, sparse representation-based, graph-based, correlation filter-based, and deep learning-based methods. Each category is introduced and …

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Xingchen Zhang: Conceptualization, Investigation, Writing - original draft, Writing - review & editing. Ping Ye: Visualization, Investigation, Data curation. Henry Leung: Writing - review & editing. Ke Gong: Visualization, Investigation. Gang Xiao: Supervision, Writing - review & editing, Funding acquisition, Project administration.

Acknowledgment

This work was sponsored in part by the National Program on Key Basic Research Project of China under Grant 2014CB744903, in part by the National Natural Science Foundation of China under Grants 61973212 and 61673270, in part by the Shanghai Science and Technology Committee Research Project under Grant 17DZ1204304, and in part by the Shanghai Industrial Strengthening Project under Grant GYQJ-2017-5-08.

References (142)

  • T. Wan et al.

    An application of compressive sensing for image fusion

    Int. J. Comput. Math.

    (2011)
  • S. Li et al.

    Pixel-level image fusion: a survey of the state of the art

    Inf. Fusion

    (2017)
  • Y. Liu et al.

    Deep learning for pixel-level image fusion: Recent advances and future prospects

    Inf. Fusion

    (2018)
  • J. Ma et al.

    Infrared and visible image fusion methods and applications: A survey

    Inf. Fusion

    (2019)
  • H. Hermessi et al.

    Convolutional neural network-based multimodal image fusion via similarity learning in the shearlet domain

    Neural Comput. Appl.

    (2018)
  • X. Yan, S.Z. Gilani, H. Qin, A. Mian, Unsupervised deep multi-focus image...
  • Y. Liu et al.

    Infrared and visible image fusion with convolutional neural networks

    Int. J. Wavelets Multiresolut. Inf. Process.

    (2018)
  • E. Gundogdu et al.

    Comparison of infrared and visible imagery for object tracking: Toward trackers with superior IR performance

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

    (2015)
  • C. Li et al.

    RGB-T object tracking: benchmark and baseline

    arXiv Preprint arXiv:1805.08982

    (2018)
  • M. Ding et al.

    Visual tracking using locality-constrained linear coding and saliency map for visible light and infrared image sequences

    Signal Process. Image Commun.

    (2018)
  • Y. Wang et al.

    Learning soft-consistent correlation filters for RGB-T Object tracking

    Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

    (2018)
  • C. Luo et al.

    Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme

    Infrared Phys. Technol.

    (2019)
  • X. Yun et al.

    Discriminative fusion correlation learning for visible and infrared tracking

    Math. Probl. Eng.

    (2019)
  • L. Liu et al.

    Hand posture recognition using finger geometric feature

    Proceedings of the 21st International Conference on Pattern Recognition

    (2012)
  • V.A. Laurense et al.

    Path-tracking for autonomous vehicles at the limit of friction

    2017 American Control Conference

    (2017)
  • J. Severson, Human-digital media interaction tracking, 2017, US Patent...
  • A. Ali et al.

    Visual object tracking classical and contemporary approaches

    Front. Comput. Sci.

    (2016)
  • L. Bertinetto et al.

    Fully-convolutional siamese networks for object tracking

    European Conference on Computer Vision

    (2016)
  • J.F. Henriques et al.

    High-speed tracking with kernelized correlation filters

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • N. Dalal et al.

    Histograms of oriented gradients for human detection

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2005)
  • T. Lindeberg

    Scale invariant feature transform

    Scholarpedia

    (2012)
  • H. Nam et al.

    Learning multi-domain convolutional neural networks for visual tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • B. Li et al.

    High performance visual tracking with siamese region proposal network

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • B. Li et al.

    SiamRPN++: Evolution of siamese visual tracking with very deep networks

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2019)
  • W. Zhou, L. Wen, L. Zhang, D. Du, T. Luo, Y. Wu, SiamMan: Siamese motion-aware network for visual tracking, arXiv...
  • M. Danelljan et al.

    Convolutional features for correlation filter based visual tracking

    Proceedings of the IEEE International Conference on Computer Vision Workshops

    (2015)
  • M. Danelljan et al.

    ECO: efficient convolution operators for tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • M. Danelljan et al.

    Beyond correlation filters: learning continuous convolution operators for visual tracking

    European Conference on Computer Vision

    (2016)
  • J. Choi et al.

    Context-aware deep feature compression for high-speed visual tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • J.W. Davis et al.

    Fusion-based background-subtraction using contour saliency

    Computer Society Conference on Computer Vision and Pattern Recognition Workshops

    (2005)
  • A. Berg et al.

    A thermal object tracking benchmark

    Proceedings of 12th IEEE International Conference on Advanced Video and Signal Based Surveillance

    (2015)
  • X. Li et al.

    Hierarchical spatial-aware siamese network for thermal infrared object tracking

    Knowl.-Based Syst.

    (2019)
  • X. Lan et al.

    Learning modality-consistency feature templates: a robust RGB-infrared tracking system

    IEEE Trans. Ind. Electron.

    (2019)
  • X. Lan et al.

    Robust collaborative discriminative learning for RGB-infrared tracking

    Thirty-Second AAAI Conference on Artificial Intelligence

    (2018)
  • X. Lan et al.

    Modality-correlation-aware sparse representation for RGB-infrared object tracking

    Pattern Recognit. Lett.

    (2020)
  • C. Li et al.

    Fusing two-stream convolutional neural networks for RGB-T object tracking

    Neurocomputing

    (2018)
  • C. Li et al.

    Cross-modal ranking with soft consistency and noisy labels for robust RGB-T tracking

    Proceedings of European Conference on Computer Vision

    (2018)
  • C. Li et al.

    Learning local-global multi-graph descriptors for RGB-T object tracking

    IEEE Trans. Circuits Syst. Video Technol.

    (2018)
  • S. Zhai et al.

    Fast RGB-T tracking via cross-modal correlation filters

    Neurocomputing

    (2019)
  • H. Liu et al.

    Fusion tracking in color and infrared images using joint sparse representation

    Sci. China Inf. Sci.

    (2012)