Object Tracking with Multi-Classifier Fusion Based on Compressive Sensing and Multiple Instance Learning

Object tracking is a critical research topic in computer vision and has attracted significant attention over the past few years. However, traditional object tracking algorithms often suffer from the object drifting problem due to various challenging factors in complex environments, such as object occlusion and background clutter. This paper proposes a robust and effective object tracking algorithm, called MCM, which combines compressive sensing and online multiple instance learning in a multi-classifier fusion framework. In this framework, we integrate different discriminative classifiers by learning varied, compressed feature vectors based on different random projection matrices. An improved online multiple instance learning mechanism, SMILE, is then adopted, which introduces the relative similarity to select and weight the instances in the positive bag. The experiments show that the proposed algorithm improves the performance of object tracking on challenging video sequences.


Introduction
Object tracking [1] is an important research direction in the field of computer vision and has a wide range of applications, for example, video surveillance [2], mobile robots [3,4], and defense reconnaissance [5]. Object tracking methods continuously predict the location and other state information of the object in the subsequent frames of a video sequence, given the initial state in the first frame. At the same time, there are many challenges in object tracking, such as object deformation, out-of-plane rotation, and illumination variation. In such scenes, traditional tracking algorithms are prone to tracking failure [6]. In this paper, we mainly focus on the online object tracking problem, in which only the object location in the first frame of a video sequence is given, without additional training samples or prior knowledge.
These algorithms also have their own advantages and disadvantages. For example, compressive sensing is introduced in the CT algorithm [7] for feature extraction. However, only a fixed random projection matrix is used for feature extraction, which easily loses discriminative features when the appearance of the object changes drastically. Although the MIL paradigm has been successfully applied to object tracking [6], there are still many problems to be solved. The traditional MIL-based object tracking algorithm [6] may be inaccurate in complex scenes, such as fast movements and occlusions, because the model easily introduces erroneous information due to an unsuitable choice of the positive bag.
In this paper, we propose a novel object tracking algorithm with multi-classifier fusion based on compressive sensing and multiple instance learning, termed MCM. In the multi-classifier fusion framework, the MCM algorithm first generates different feature vectors of instances with different random projection matrices based on compressive sensing. Then, an improved online multiple instance learning mechanism, i.e., SMILE (Relative Similarity Based Online Multiple Instance Learning), is adopted, which further selects and weights the instances in the positive bag by introducing the concept of relative similarity. Finally, by fusing the tracking results of multiple strong classifiers, the MCM algorithm obtains a more discriminative classifier to effectively deal with dramatic appearance changes in complex environments. The main contributions of our paper are summarized as follows:
(1) We introduce the multi-classifier fusion framework to object tracking, using multiple single classifiers for ensemble learning, so as to obtain better tracking results that benefit from the diversity of the multiple discriminative classifiers.
(2) We extract different low-dimensional feature vectors by using multiple different random projection matrices based on compressive sensing, in order to obtain more discriminative classifiers in the multi-classifier fusion framework.
(3) The MCM algorithm employs the improved SMILE mechanism to filter and weight the instances in the positive bag, so as to avoid introducing erroneous information into the tracking process.
The rest of this paper is organized as follows. We first review related work on object tracking in Section 2 and then describe the overview of the proposed algorithm in Section 3. Section 4 illustrates the experimental results on challenging video sequences. The conclusion is drawn in Section 5.

Related Work
The object tracking methods can be divided into offline methods [14,15] and online methods [6-13]. The offline object tracking methods [14,15] use an object model built before tracking, and the learned model is not further updated during the tracking process. The shortcoming of the offline methods is that it is difficult for them to adapt to drastic changes of the object appearance in complex environments. On the other hand, the online tracking methods [16-21] keep updating the object appearance model during the tracking process, so that dynamic changes of the object appearance can be dealt with more robustly, and the object drifting problem can be avoided to some extent. Therefore, this paper mainly focuses on the online tracking methods. According to the different modelling patterns, the online tracking algorithms can be further divided into generative learning [8, 12, 17, 18, 22-25, 33-38, 41, 42, 44, 45] and discriminative learning [6, 7, 9-11, 16, 19-21, 26-32, 39, 40, 43, 46-49].

Generative Tracking.
The online tracking algorithm based on generative learning usually establishes an object appearance model through object representation techniques and then searches for the region most similar to the appearance model in a new image frame. For example, Havangi [18] performed MCMC (Markov chain Monte Carlo) motion processing after resampling; however, the required number of probability transfers is large, and convergence is difficult to achieve. In order to avoid noise interference and inaccurate observation values of the Jacobian matrix, Zhou et al. [44] proposed a robust Kalman filter (KF) algorithm with long short-term memory (LSTM) for an image-based visual servo control system. Mei and Ling [23] applied sparse representation to object tracking, using the reconstruction error of the sparse representation as the weight of each candidate object and selecting the candidate with the maximum weight as the tracking result. An adaptive image processing method [34] was suggested to determine the 3D position and velocity of moving objects by using a distributed camera array. Marata et al. [35] proposed the Separate Monte Carlo Mean (SMC-MEAN), which is applied to an autonomous object tracking problem in both Gaussian and non-Gaussian scenarios. Nam et al. [32] proposed a convolutional neural network (CNN) algorithm to represent the appearance of the object, which constructs a tree structure based on multiple CNNs and maintains the reliability of the model by smoothly updating along the tree path. Lu et al. [33] presented a transform-aware attentive tracking framework, which uses a deep attentive network to directly predict the target states via spatial transform parameters. Pan et al. [37] proposed a robust object tracking framework based on a canonical CF tracker; specifically, an adaptive model update strategy is proposed to evaluate the model with a siamese network. Liu et al.
[41] fused spatial and temporal features into a deep residual network model with multi-scale feature vectors, obtaining a deep multi-scale spatiotemporal feature model. Zhou et al. [45] proposed a novel object tracking method fusing the extended Kalman particle filter (EKPF) and least squares support vector regression (LSSVR) to effectively improve accuracy. Zhang et al. [42] put forward a multi-feature integration framework, including gray features, histogram of oriented gradients (HOG), color-naming (CN), and illumination invariant features (IIF), in order to overcome the poor representation of a single feature in a complex image sequence.
Although many online generative tracking methods have been proposed, several issues remain. Firstly, the online appearance model largely depends on the number of training samples, for example, in deep learning-based trackers [32,33,37,41]. If there are only a few training samples at the beginning of tracking, it is hard for the appearance model to stay robust when the appearance of the object changes significantly during tracking. Secondly, when many samples around the current object location are used to build the object model, erroneous background information is potentially introduced. Thirdly, generative algorithms usually use only the object information, without considering the helpful background information.

Discriminative Tracking.
The online tracking algorithm based on discriminative learning learns a classifier to distinguish the object sample from the background. Collins et al. [26] embedded an online multi-feature selection mechanism in the mean shift tracking method, selecting the features with high discrimination. This method easily suffers from object drifting when the object is similar to the background. Huang et al. [43] proposed the MD-JITS (multiple detection joint integrated track splitting) tracker, which is applied to multiple-detection multi-target tracking in cluttered environments. Grabner et al. [28] proposed a well-known online boosting framework for real-time object tracking, which uses only one positive sample in the object region and several negative samples from the background to update the classifier. However, the tracker may not keep tracking the object very accurately, and thus some possibly mispositioned samples are easily introduced to update the appearance model.
To guard against selecting incorrect samples to update the classifier, Grabner et al. [20] proposed an online semi-supervised boosting method, where the samples in the first frame are labelled and the samples from the other frames are unlabelled. Chen et al. [46] proposed a novel visual tracking algorithm via online semi-supervised co-boosting, which has a good ability to recover from drifting. Babenko et al. [6] introduced the multiple instance learning (MIL) paradigm to online object tracking by using positive and negative bags, where some samples near the object location are added to a positive bag, and a large number of samples far from the object location form a negative bag. In order to address the unselective treatment of instances in the positive bag during MIL tracking and the lack of prior knowledge of the object in instance modelling, Chen et al. [47] proposed an online multiple instance boosting algorithm with instance-level semi-supervised learning, termed SemiMILBoost, to achieve more robust object tracking. Zhang and Song [16] proposed an improved online weighted MIL tracker (WMIL), where the instances are weighted by their distance from the object location. Zhou et al. [48] improved the MIL tracking algorithm by optimizing a bag Fisher information function and integrating the co-training criterion, and Zhou et al. [49] also proposed a multiple instance learning (MIL) tracking method based on a semi-supervised learning model with Fisher linear discriminant. However, in the above methods [6,16,47-49], the positive bag may contain some negative samples because the radius of the positive bag is difficult to select very accurately. Therefore, when all the instances in the positive bag are used to update the classifiers, the above algorithms might suffer from the drifting problem in complex environments.

Overview.
The MCM algorithm first uses different random projection matrices to reduce the dimensions of the instances in the positive and negative bags, respectively, thus obtaining different compressed feature vectors. In the MCM algorithm, the SMILE mechanism is presented to improve traditional online multiple instance learning by introducing the relative similarity metric. After the SMILE mechanism, the final strong classifier is obtained by the multi-classifier fusion framework. The whole flow of the MCM algorithm is shown in Figure 1.
During the tracking process, the original multi-scale image feature vectors (i.e., Haar-like features are used here) of the instances in the positive and negative bags are first obtained at the i-th frame. As shown in Figure 1, the feature vector in the green row represents an instance in the positive bag, and the feature vector in the red row represents a sample in the negative bag. To reduce the time complexity and improve the robustness of object tracking, we use different random projection matrices to compress the high-dimensional multi-scale image feature vectors into different low-dimensional feature vectors. Then, by exploiting the improved SMILE mechanism, the compressed Haar-like feature vectors are respectively used to update and select the best weak classifiers, yielding the G different strong classifiers H_1, H_2, …, H_G. These G strong classifiers are then combined into a final strong classifier by the multi-classifier fusion framework. Thereby, the object location with the largest response value is found by the combined final strong classifier in the search area of the (i + 1)-th frame.
In the next sections, we will further elaborate on compressive sensing, the SMILE mechanism, and the multi-classifier fusion framework.

Compressive Sensing.
To avoid high computational complexity and large storage consumption, compressive sensing is employed in object tracking [7,17]. The compressive sensing technique [50-52] requires the measurement matrix to satisfy the RIP (Restricted Isometry Property) condition [53] and uses a random projection matrix to reduce the high-dimensional multi-scale image features to a low-dimensional image domain in real time. A random projection matrix satisfying the Johnson-Lindenstrauss (J-L) lemma is proved to hold for the RIP condition in compressive sensing [54].
Given that R ∈ R^{n×m} (n ≪ m) denotes a random projection matrix, where n and m denote, respectively, the number of rows and columns of R, the high-dimensional image feature vector x ∈ R^m can be compressed into the low-dimensional feature vector v ∈ R^n, expressed as v = Rx. As long as the matrix R satisfies the J-L lemma, x can be reconstructed from v with a low error probability; that is, the compressed feature v still retains most of the original information [7].
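As an illustration, the compression v = Rx with a Gaussian random projection can be sketched as follows; the dimensions and the feature values are synthetic, and the 1/√m scaling follows the construction given in the next subsection:

```python
import numpy as np

def random_projection(x, n, rng):
    """Compress an m-dimensional feature vector x into n dimensions
    (n << m) with a Gaussian random projection matrix R, i.e. v = R x."""
    m = x.shape[0]
    # Entries r_ij ~ N(0, 1); the 1/sqrt(m) scaling follows the text.
    R = rng.standard_normal((n, m)) / np.sqrt(m)
    return R @ x

rng = np.random.default_rng(0)
x = rng.random(10000)             # high-dimensional multi-scale feature vector
v = random_projection(x, 50, rng)
print(v.shape)                    # (50,)
```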
At present, there are four main types of random projection matrices satisfying the J-L lemma, listed as follows; the reduced low-dimensional feature vector basically preserves the information of the original feature vector [7,17,50-52].

(1) Normal distribution random projection, also called Gaussian distribution random projection [17,52]. According to the J-L lemma, the random projection is represented by an n × m matrix R = (1/√m)(r_ij), where each r_ij follows the Gaussian distribution N(0, 1).
(4) The CT algorithm [7] uses the following very sparse random projection matrix:

r_ij = √s × { 1, with probability 1/(2s); 0, with probability 1 − 1/s; −1, with probability 1/(2s) }.   (1)

The verification result in [55] shows that when s is 2 or 3 in equation (1), R satisfies the J-L lemma.
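A sketch of generating such a very sparse measurement matrix; the matrix dimensions are hypothetical, and s = 3 is one of the values sanctioned above:

```python
import numpy as np

def sparse_measurement_matrix(n, m, s, rng):
    """Very sparse random projection matrix:
    r_ij = sqrt(s) w.p. 1/(2s), 0 w.p. 1 - 1/s, -sqrt(s) w.p. 1/(2s)."""
    vals = np.array([np.sqrt(s), 0.0, -np.sqrt(s)])
    probs = [1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)]
    idx = rng.choice(3, size=(n, m), p=probs)
    return vals[idx]

rng = np.random.default_rng(1)
R = sparse_measurement_matrix(50, 10000, 3, rng)
print(R.shape)                        # (50, 10000)
print(float(np.mean(R == 0.0)))       # close to 1 - 1/s, i.e. about 0.67
```

The sparsity (about two-thirds of the entries are zero for s = 3) is what makes the matrix-vector product cheap enough for real-time feature compression.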
The traditional CT algorithm [7] uses only a fixed random projection matrix in the whole tracking process, and thus helpful information might be omitted, especially in complex environments. To avoid this problem, we randomly select a sparse random projection matrix R from the above four main types and thus obtain different compressed feature vectors for each instance, which are input to the following online SMILE mechanism.

Online SMILE Mechanism.
The workflow of the developed online SMILE mechanism is shown in Figure 2. In order to avoid the object drifting problem caused by accumulated errors, we filter out samples that are more likely to be negative in the positive bag by calculating the relative similarity of all instances in the positive bag. Then, the filtered positive bag and the negative bag are used to update all the weak classifiers. At the same time, the relative similarity is also employed to weight the objective function, and then the best weak classifiers are selected from the updated weak classifier pool to be integrated into a strong classifier.
In the multiple instance learning paradigm, the training data are bags {(X_i, y_i)}, i = 1, …, N, where X_i = {x_i1, x_i2, …} is a bag of instances and N is the total number of bags. The bag label is defined as y_i = max_j(y_ij), where y_ij is the label of the j-th instance in the i-th bag, but the instance labels are unknown during MIL training. In MIL, it is known that a positive bag contains at least one positive instance, while all the instances in a negative bag are negative. In this paper, we select a set of image patches X^1 = {x : ‖d(x) − d*_t‖ < c}, c < s, as a positive bag, where ‖·‖ represents the Euclidean distance; c is the radius from the center point (in pixels); x is an image patch; d(x) indicates the location of the image patch x; d*_t represents the location of the object at time t; and s is the search radius of the tracker. For the negative bag, we select the image patches from the annular region X^0 = {x : c < ‖d(x) − d*_t‖ < β}. Because a large set of potential samples is generated, we select a random subset of these samples as the negative bag, as in [1].
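Under the definitions above, sampling the positive bag and a random subset of the annular region as the negative bag could be sketched as follows; the patch locations, radii, and bag sizes are illustrative:

```python
import numpy as np

def sample_bags(d_star, locs, c, beta, n_neg, rng):
    """Positive bag: patches within radius c of the object location d_star.
    Negative bag: a random subset of the annulus c < ||d(x) - d_star|| < beta."""
    dist = np.linalg.norm(locs - d_star, axis=1)
    pos = locs[dist < c]
    ring = locs[(dist > c) & (dist < beta)]
    take = min(n_neg, len(ring))
    neg = ring[rng.choice(len(ring), size=take, replace=False)] if take else ring[:0]
    return pos, neg

rng = np.random.default_rng(0)
# A dense grid of candidate patch locations around the object at (50, 50).
grid = np.stack(np.meshgrid(np.arange(100), np.arange(100)), -1).reshape(-1, 2)
pos, neg = sample_bags(np.array([50, 50]), grid, c=4.0, beta=30.0, n_neg=65, rng=rng)
print(len(pos), len(neg))   # a small positive bag and 65 negatives
```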
In most cases, there are some negative samples in the positive bag. However, the traditional online multiple instance learning algorithms [1] treat all the instances in the positive bag as positive when updating the weak classifiers, which degrades the performance of the tracker. In order to improve the robustness and accuracy of object tracking, the SMILE mechanism introduces the concept of relative similarity [44] to further filter the instances in the positive bag; the relative similarity is then also used to weight the instance probability and thereby optimize the objective function.
First, in the SMILE mechanism, the similarity between two instances x_ij and x_iq is defined as

S(x_ij, x_iq) = (1/2)(NCC(x_ij, x_iq) + 1),   (2)

where NCC is the normalized correlation coefficient. For any given instance x_ij in a bag, we construct an object model E = {x_11, …, x_1N⁺, x_01, …, x_0N⁻} to represent the object and background information observed so far. In this paper, x_11, …, x_1N⁺ are the object samples from the first frame to the previous frame, and x_01, …, x_0N⁻ are the instances of the negative bag in the previous frame. Given an arbitrary instance x_ij and the object model E, the similarity measures for the MIL problem are defined as follows:

① Similarity with the positive nearest neighbour:

S⁺(x_ij, E) = max_{x_1k ∈ E} S(x_ij, x_1k).   (3)

② Similarity with the negative nearest neighbour:

S⁻(x_ij, E) = max_{x_0k ∈ E} S(x_ij, x_0k).   (4)

③ Relative similarity:

S_r(x_ij, E) = S⁺(x_ij, E) / (S⁺(x_ij, E) + S⁻(x_ij, E)).   (5)

The relative similarity varies from 0 to 1, and a larger S_r indicates that the image patch is more likely to be the object. Given the positive relative similarity coefficient θ, when S_r(x_ij, E) < θ, the image patch x_ij is removed from the positive bag; otherwise, it is retained in the positive bag.
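The filtering step might be sketched as follows, under the assumption that S maps NCC into [0, 1] and that S⁺ and S⁻ are nearest-neighbour similarities against the positive and negative parts of the object model:

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation coefficient of two flattened image patches."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def similarity(a, b):
    # Assumed mapping of NCC from [-1, 1] into [0, 1].
    return 0.5 * (ncc(a, b) + 1.0)

def relative_similarity(x, pos_model, neg_model):
    """S_r = S+ / (S+ + S-): similarity to the nearest positive patch of the
    object model relative to the nearest negative patch."""
    s_pos = max(similarity(x, p) for p in pos_model)
    s_neg = max(similarity(x, q) for q in neg_model)
    return s_pos / (s_pos + s_neg + 1e-12)

def filter_positive_bag(bag, pos_model, neg_model, theta):
    """Keep only instances whose relative similarity reaches theta."""
    return [x for x in bag if relative_similarity(x, pos_model, neg_model) >= theta]

rng = np.random.default_rng(0)
obj = rng.random(64)     # patch matching the object model
noise = rng.random(64)   # unrelated patch that slipped into the positive bag
kept = filter_positive_bag([obj, noise], [obj], [-obj], theta=0.7)
print(len(kept))         # 1: the unrelated patch is filtered out
```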
In addition, the SMILE mechanism also uses the relative similarity to weight the instance probability, in order to more accurately estimate the bag probability p(y_i|X_i). The probability of the positive bag being positive is calculated as

p(y_1 = 1 | X^1) = Σ_j w_1j p(y = 1 | x_1j).   (6)

Different from the WMIL algorithm [10], our algorithm uses the relative similarity to weight the instances, and the weight of each instance in the positive bag is calculated as

w_1j = S_r(x_1j, E) / Σ_q S_r(x_1q, E).

The probability of the negative bag being negative is calculated as

p(y_0 = 0 | X^0) = w_0 Σ_j (1 − p(y = 1 | x_0j)).   (7)

Since all negative instances are far from the object, w_0 is set to a constant [10].
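A sketch of the weighted bag probabilities; the exact weighted forms here follow the WMIL-style weighted sums and should be read as assumptions rather than the paper's definitive formulas:

```python
import numpy as np

def positive_bag_probability(p_inst, s_r):
    """p(y=1 | X^1): instance probabilities weighted by their normalized
    relative similarities s_r, so more object-like instances count more."""
    w = np.asarray(s_r, float)
    w = w / w.sum()
    return float(w @ np.asarray(p_inst, float))

def negative_bag_probability(p_inst, w0):
    """p(y=0 | X^0): every negative instance shares the constant weight w0."""
    p = np.asarray(p_inst, float)
    return float(w0 * np.sum(1.0 - p))

# The confident, object-like first instance dominates the bag probability.
print(positive_bag_probability([0.9, 0.2], s_r=[0.8, 0.1]))
print(negative_bag_probability([0.1, 0.3], w0=0.5))
```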
As shown in Figure 1, we compress the original high-dimensional multi-scale Haar-like feature vector to a low-dimensional feature vector by the random projection matrix of Section 3.2, a very sparse measurement matrix, in order to extract a small number of effective Haar-like features [7]. In the SMILE mechanism, each weak classifier h_k, corresponding to a Haar-like feature f_k in the compressed feature vectors, is constructed based on the Bayesian theorem (assuming a uniform prior over the two classes) as follows:

h_k(x) = log( p(f_k(x) | y = 1) / p(f_k(x) | y = 0) ),   (8)

Mathematical Problems in Engineering
where the conditional distributions are modelled as Gaussian functions, i.e., p(f_k(x) | y = 1) ∼ N(μ_1, σ_1) and p(f_k(x) | y = 0) ∼ N(μ_0, σ_0). The four parameters μ_1, σ_1, μ_0, σ_0 are updated with a learning rate parameter by the same schemes as the WMIL algorithm [10]. Then, the MCM algorithm selects the best weak classifiers by maximizing the log-likelihood function of bags L = Σ_i log p(y_i | X_i). After K iterations, the K best weak classifiers are integrated into a strong classifier H_g. In the multi-classifier fusion framework, we obtain G different strong classifiers {H_1, H_2, …, H_G}.
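The Gaussian weak classifier and its online update might be sketched as follows; the log-ratio form and the exponential-blend update rule are assumptions modelled on the CT/WMIL schemes, and the learning rate value is illustrative:

```python
import numpy as np

class GaussianWeakClassifier:
    """h_k(x) = log(p(f_k(x)|y=1) / p(f_k(x)|y=0)) with Gaussian
    class-conditionals N(mu_1, sigma_1) and N(mu_0, sigma_0) over one
    compressed Haar-like feature, updated online with learning rate lr."""

    def __init__(self, lr=0.85):
        self.lr = lr
        self.mu = [0.0, 0.0]      # [mu_0, mu_1]
        self.sigma = [1.0, 1.0]   # [sigma_0, sigma_1]

    def update(self, feats, label):
        """Blend the running Gaussian of class `label` with new samples."""
        m, s = float(np.mean(feats)), float(np.std(feats)) + 1e-6
        self.mu[label] = self.lr * self.mu[label] + (1 - self.lr) * m
        self.sigma[label] = np.sqrt(self.lr * self.sigma[label] ** 2
                                    + (1 - self.lr) * s ** 2)

    def _log_pdf(self, f, label):
        z = (f - self.mu[label]) / self.sigma[label]
        return -0.5 * z * z - np.log(self.sigma[label])

    def response(self, f):
        return self._log_pdf(f, 1) - self._log_pdf(f, 0)

rng = np.random.default_rng(0)
clf = GaussianWeakClassifier()
for _ in range(50):
    clf.update(5.0 + 0.3 * rng.standard_normal(45), label=1)  # object features
    clf.update(0.0 + 0.3 * rng.standard_normal(45), label=0)  # background features
print(clf.response(5.0) > 0, clf.response(0.0) < 0)           # True True
```

Boosting then ranks a pool of such weak classifiers by the bag log-likelihood L and keeps the K best, which are summed into the strong classifier H_g.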

Multi-classifier Fusion Framework.
In order to fuse the G different strong classifiers {H_1, H_2, …, H_G}, the final strong classifier H_final = H_{g*} is calculated as follows:

g* = arg min_{g ∈ {1, …, G}} | H_g − (1/G) Σ_{g=1}^{G} H_g |.   (9)

In equation (9), g represents the g-th strong classifier among the total G strong classifiers, (1/G) Σ_{g=1}^{G} H_g represents the mean response value of all G strong classifiers, and g* is the subscript of the selected strong classifier. Thereby, the final strong classifier H_final is used to find the location of the sample with the largest response value in the search area as the tracking object. The pseudocode of the MCM algorithm is summarized in Algorithm 1.
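One way to realize this consensus selection, under the assumption that g* picks the strong classifier whose responses over the search area are closest to the ensemble mean, is the following sketch:

```python
import numpy as np

def fuse_strong_classifiers(responses):
    """responses: G x S array, where responses[g][j] is H_g's score for the
    j-th candidate sample in the search area. Select the classifier closest
    to the mean response of the ensemble (an assumed reading of the fusion
    rule) and return its index g* and its best candidate sample."""
    H = np.asarray(responses, float)
    mean = H.mean(axis=0)
    g_star = int(np.argmin(np.linalg.norm(H - mean, axis=1)))
    return g_star, int(np.argmax(H[g_star]))

# Two classifiers agree and one is an outlier; a consensus member wins.
g_star, best = fuse_strong_classifiers([[1.0, 2.0, 3.0],
                                        [1.0, 2.0, 3.1],
                                        [10.0, 0.0, 0.0]])
print(g_star, best)   # 0 2
```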

Experimental Results
In this section, we first introduce the implementation details of the experiments. Secondly, we discuss the suitable number of strong classifiers. Finally, we evaluate the proposed algorithm and several competing algorithms on two baseline datasets, i.e., OTB-2013 [56] and OTB-2015 [57], from the perspectives of quantitative evaluation, time complexity analysis, and qualitative evaluation.

Parameter Analysis.
We report the precision at an adaptive threshold [6,13], i.e., 0.5 × min(w, h), when the number of strong classifiers ranges from 30 to 100 on the OTB-2013 and OTB-2015 datasets. Here, w and h denote the width and height of the object, respectively. This adaptive threshold roughly corresponds to the percentage of frames with at least a 50% overlap between the bounding box and the ground truth. Figure 3 shows the precision under different numbers of strong classifiers on four video sequences, i.e., Bird1, Box, Crowds, and Soccer, from the two datasets. From Figure 3, we can see that the tracking precision increases gradually and then begins to drop as the number of strong classifiers increases. When the number of strong classifiers is in the range [55, 85], the tracking precision is the best for most of the video sequences.

Quantitative Evaluation.
The center location error and the precision are employed to evaluate the experimental results of all the comparison algorithms. The center location error (CLE) is defined as the Euclidean distance between the object center location and the ground truth, formally expressed as

CLE = (1/N_f) Σ_{i=1}^{N_f} dist(center(i), gt(i)),

where the function dist(·, ·) calculates the Euclidean distance; N_f is the total number of frames; and center(i) and gt(i) represent the center location given by the tracking algorithm and the ground-truth center location in the i-th frame, respectively. The smaller the value of CLE, the lower the error of the tracking algorithm.
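Both evaluation metrics can be computed directly from the per-frame centers; a minimal sketch with synthetic trajectories:

```python
import numpy as np

def center_location_error(centers, gts):
    """Average Euclidean distance between predicted and ground-truth centers."""
    d = np.linalg.norm(np.asarray(centers, float) - np.asarray(gts, float), axis=1)
    return float(d.mean())

def precision_at(centers, gts, threshold):
    """Fraction of frames whose center location error is below the threshold."""
    d = np.linalg.norm(np.asarray(centers, float) - np.asarray(gts, float), axis=1)
    return float(np.mean(d < threshold))

centers = [(10, 10), (13, 14), (40, 10)]   # predicted centers per frame
gts     = [(10, 10), (10, 10), (10, 10)]   # ground-truth centers
print(center_location_error(centers, gts))       # (0 + 5 + 30) / 3
print(precision_at(centers, gts, threshold=20))  # 2 of 3 frames within 20 px
```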
In addition, we also report the precision at an adaptive threshold, i.e., 0.5 × min(w, h). The precision plots show the percentage of the frames whose center location errors are lower than a certain threshold, defined as follows:

precision = (number of frames whose CLE is below the threshold) / N_f.

Algorithm 1. The MCM tracking algorithm.
Input: t-th video frame.
Initialization: strong classifier H = H_0 = 0.
(1) Select a set of image patches X^s = {x : ‖d(x) − d*_t‖ < s} and calculate their feature vectors;
(2) Use the final strong classifier to estimate p(y|x) for x ∈ X^s;
(3) Update the object location d*_t = d(arg max_{x∈X^s} p(y|x)) of the current frame and the object model E;
(4) Select a positive bag X^1 = {x : ‖d(x) − d*_t‖ < c} and a negative bag X^0 = {x : c < ‖d(x) − d*_t‖ < β};
(5) Initialize the iteration number k = 1;
(6) Randomly choose a random projection matrix R ∈ R^{n×m}, and calculate the low-dimensional feature vector of each instance in the positive and negative bags;
(7) Normalize each instance in the positive bag, and calculate the similarity of any two instances x_1j and x_1q by equation (2);
(8) Calculate S⁺(x_1j, E) by equation (3) and S⁻(x_1j, E) by equation (4) for each instance in the positive bag;
(9) Calculate the relative similarity S_r(x_1j, E) by equation (5);
…
(14) Calculate the weighted bag probability p(y_i|X_i) according to equations (6) and (7);
(15) Calculate the log-likelihood function L = Σ_i log p(y_i|X_i), and select the best weak classifier h*_k = arg max_h L(H_{k−1} + h);
(16) Repeat Steps 13-15 for K iterations; the k-th best weak classifier is integrated into the current strong classifier H_k = H_{k−1} + h*_k;
(17) Obtain the g-th strong classifier H_g = H_K;
(18) Repeat Steps 5-17 for G iterations to generate the G strong classifiers;
(19) Obtain the final strong classifier H_{g*} by equation (9).
Output: the final strong classifier H_final = H_{g*}.

Tables 1 and 2 show, respectively, the average center location error (in pixels) for each challenge factor on the OTB-2013 and OTB-2015 datasets. From the last row of Tables 1 and 2, we can see that the MCM algorithm obtains the lowest average center location error compared with the other competing algorithms. Of the eleven challenge factors in total, the MCM has lower center location errors than the other compared methods in ten challenges. Tables 3 and 4 list, respectively, the average tracking precision at the threshold of 0.5 × min(w, h) of the competing algorithms under the different challenge factors on the OTB-2013 and OTB-2015 datasets. In the last row of Table 3, the average precision of the MCM algorithm is only lower than that of KCF by 0.043 on the OTB-2013 dataset. In Scale Variation (SV), Out-of-View (OV), and Low Resolution (LR), the precision of the MCM algorithm is higher than that of the KCF algorithm on the OTB-2013 dataset. In the last row of Table 4, the average precision of the MCM algorithm is 0.553, the highest among all the comparison algorithms on the OTB-2015 dataset. In Scale Variation (SV), Occlusion (OCC), Deformation (DEF), Out-of-View (OV), and Background Clutters (BC), the precision of the MCM algorithm is higher than that of the other competing algorithms on the OTB-2015 dataset. Overall, by fusing multiple strong classifiers, the precision of the MCM algorithm is higher than that of CT and MIL, which use a single strong classifier, on the two datasets. Therefore, the effectiveness of the multi-classifier fusion of the MCM method is verified. Figures 4 and 5, respectively, show the center location error plots on several representative video sequences from the OTB-2013 and OTB-2015 datasets.
In addition, as shown in Figures 4 and 5, we can intuitively see that the MCM tracker yields much lower center location errors for most of the frames than the other six algorithms on the two benchmarks. Figures 6 and 7, respectively, present the precision plots over thresholds in [0, 50] on eight representative sequences from the two datasets. Our algorithm has higher precision than the other comparison algorithms over the thresholds [0, 50] for most of the video sequences.
This is because compressive sensing and the improved online SMILE mechanism play an important role in the MCM algorithm. As a whole, the proposed algorithm is more effective at avoiding the severe drifting problem than the other competing trackers in complex environments.

Time Complexity Analysis.
We use the number of frames per second (FPS) to evaluate the time complexity of the algorithms. The larger the FPS, the shorter the running time and the lower the time complexity of the algorithm. Table 5 lists the average FPS values of the different algorithms on the 100 challenging video sequences of the OTB-2015 dataset. As shown in Table 5, the tracking speed of the MCM algorithm is around 12 frames per second, and the MCM has a faster tracking speed than the Struck, WMIL, and TLD methods. Although it has a lower tracking speed than the CSK, KCF, and CT algorithms, the MCM algorithm obtains lower center location errors and higher precision than these three algorithms, due to the multi-classifier fusion framework employed in this paper.

Qualitative Evaluation.
Figure 8 summarizes the qualitative evaluation of the proposed algorithm and the six competing trackers on seven representative challenge sequences. In the Basketball video sequence (see Figure 8(a)), the main challenges are fast motion, scale variation, and occlusion. In the #17 frame, the Struck algorithm loses the object, and the KCF and WMIL methods partially deviate from the object when the occlusion occurs. When the object moves fast and the scale variations happen, the KCF algorithm drifts heavily (see the #63 frame), and the CT and TLD algorithms also suffer from drifting (see the #247 frame). The CSK algorithm gradually deviates from the object, and the object is lost from the #489 frame. The CT algorithm also has a severe offset in the subsequent #629 frame. The proposed MCM algorithm performs well in most frames, where the object can be accurately tracked in the face of rapid movement and scale variation. There are three tracking difficulties in the Bird1 sequence (see Figure 8(b)), i.e., fast motion of bird wings, long-term object occlusion, and rapid scale variation.
In the initial fast motion, CSK loses the object firstly (see the #17 frame), and then the KCF and Struck algorithms drift severely (see the #35 frame). After long-term occlusion of cloud, only the MCM algorithm still tracks the object and the other algorithms are severely deviated from the ground truth (see the #187 frame). For the subsequent scale variations, the MCM algorithm still keeps tracking the object well (see the #266 and #403 frames).

In the BlurBody sequence (see Figure 8(c)), because the camera constantly shakes, some frames become blurred, which has a great impact on object tracking. During the tracking process, most algorithms suffer from serious large-scale drifting, but the MCM algorithm adapts better to this situation.
In the Box video sequence (see Figure 8(d)), due to the occlusion during the movement of the box, the CSK, TLD, and Struck algorithms drift successively, and the object is lost (see the #73, #354, and #429 frames). The CT, WMIL, and KCF algorithms miss the target in the subsequent frames with scale variations and rotations (see the #552 and #737 frames). However, the MCM algorithm performs excellently in the face of these challenge factors and can accurately track the target.
In the Panda sequence (see Figure 8(e)), a small panda suffers from fast movements and occlusions. The Struck, CSK, and KCF algorithms gradually lose the object, and the TLD algorithm fails due to scale variations during the tracking process (see the #160 and #609 frames), while the remaining algorithms track the object more accurately and stably.
The singer in the Singer2 sequence (see Figure 8(f)) undergoes large-scale variations and rotational deformations during the tracking process. Under the scale variations of the object, the TLD, Struck, CSK, and CT algorithms perform poorly (see the #40, #107, and #269 frames), while the other algorithms can accurately track the target. During the movements of the singer, the KCF and WMIL algorithms partially drift from the object, and the MCM algorithm tracks the target more accurately throughout the tracking process.
In the Walking2 sequence (see Figure 8(g)), the main challenging factor is the occlusion between two persons. After the first occlusion, the Struck, TLD, CT, and WMIL algorithms drift and track the wrong object (see the #243 frame). After the second occlusion, the CSK method also begins to show a large offset (see the #389 frame). When the target becomes smaller and the scale variation occurs, the MCM algorithm is still able to track the target accurately, whereas the other algorithms partially depart from the object (see the #412 frame).
In summary, the MCM algorithm outperforms the other compared trackers and is more accurate and robust in tracking objects under challenging factors such as rapid motions, scale variations, and object occlusions.

Conclusions
In this paper, we present an effective and robust object tracking algorithm, termed MCM. We employ the multi-classifier fusion framework to integrate the advantages of multiple different discriminative strong classifiers, where different random projection matrices are used to extract the low-dimensional feature vectors based on compressive sensing. Then, an improved SMILE mechanism is presented to further filter and weight the instances in the positive bag by introducing the concept of relative similarity. Experimental results show that the proposed MCM algorithm performs favourably against several representative tracking algorithms on two popular tracking benchmark datasets.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

We are grateful to the Computer Vision Lab, Hanyang University, Seoul, Korea, who provide the dataset publicly (Visual Tracker Benchmark, http://www.visual-tracking.net).