A Novel Object Tracking Algorithm Based on Compressed Sensing and Entropy of Information

Object tracking has always been a hot research topic in the field of computer vision; its purpose is to track objects with specific characteristics or representations and to estimate information about them, such as location, size, and rotation angle, in the current frame. Object tracking in complex scenes usually encounters various challenges, such as location change, dimension change, illumination change, perception change, and occlusion. This paper proposes a novel object tracking algorithm based on compressed sensing and information entropy to address these challenges. First, objects are characterized by Haar-like and ORB features. Second, the dimensions of the computation space of the Haar and ORB features are effectively reduced through compressed sensing. Then the above-mentioned features are fused based on information entropy. Finally, in the particle filter framework, the object location is obtained by selecting candidate object locations in the current frame from the local context neighboring the optimal location in the last frame. Our extensive experimental results demonstrate that this method effectively addresses the challenges of perception change, illumination change, and large-area occlusion, thereby achieving better performance than existing approaches such as MIL and CT.


Introduction
Recognition and tracking of moving objects through computer vision technology have been widely applied in various fields, such as food quality control [1], traffic flow monitoring [2], and surveillance of illegal behavior [3]. At present, methods for tracking moving objects mainly include model-based tracking [4], region-based tracking [5], and contour-based tracking [6], and representative object tracking algorithms have been proposed in recent years by a series of studies [7][8][9][10][11][12][13]. Although object tracking algorithms have been studied for decades, many challenging problems remain unsolved. Many factors affect the performance of object tracking algorithms, including illumination change, occlusion, and complex backgrounds. However, so far, to the best of our knowledge, no robust algorithm has been developed that effectively addresses all the challenges caused by these factors. The tracking algorithm proposed in this research therefore attempts to partially solve the problems caused by these influencing factors.
Characterization of the object is an extremely important component of any object tracking algorithm. Overall object templates are widely used in tracking [14][15][16]; Mei et al. [17] utilized sparse representation to handle object appearance changes. Besides templates, many features have been used in tracking algorithms, including color histograms [8], histograms of oriented gradients (HOG) [18], the region covariance descriptor [19][20][21], Haar-like features [22,23], and ORB [24]. In addition, search strategies are also critical for tracking algorithms; examples include both deterministic and random methods [25]. Furthermore, due to its computational efficiency, the particle filter is widely applied in object tracking algorithms [7,26].
In this paper we propose a novel object tracking algorithm based on compressed sensing and information entropy. The rest of this paper is organized as follows: particle filter, ORB feature, Haar feature, sparsification, local area, and entropy of information are introduced in Section 2. The detailed description of the new object tracking algorithm is presented in Section 3. This is followed by the report of experimental results to validate the proposed algorithm in Section 4. Finally, Section 5 concludes the paper.

Related Work
2.1. Particle Filter. The particle filter [27] simulates the state space by using a certain number of particles. Each particle is given a weight by approximating the probability density function, thereby yielding a minimum-variance estimate of the system state.
The particle filter commonly suffers from the degeneracy problem; that is, after several steps, all but one particle have negligible weight [28]. A number of approaches have been used to avoid particle degeneracy [29]. One solution [30] is to generate several offspring for each particle, with the number of offspring proportional to the particle's weight. Particles with higher weights are thus more likely to be chosen, which avoids degeneracy to a great extent, as sketched below.
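For concreteness, a minimal sketch of this offspring-proportional-to-weight resampling is given below (Python with NumPy; the function name and interface are our own illustration, not part of [30]):

import numpy as np

def resample(particles, weights, rng=np.random.default_rng()):
    # Each particle spawns offspring with probability proportional to
    # its weight, which counters the degeneracy problem.
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)
    new_particles = particles[idx].copy()
    # After resampling, all particles carry equal weight again.
    return new_particles, np.full(n, 1.0 / n)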
When applying the particle filter, the state transition model and observation model are defined first. Suppose $t$ is the frame index of a video, $v_{t-1}^{(i)}$ is the velocity of the $i$th particle in frame $t-1$, $x_t^{(i)}$ is the position of the $i$th particle in frame $t$, $y_t^{(i)}$ is the observation of the $i$th particle in frame $t$, and $w_t^{(i)}$ is the weight of the $i$th particle in frame $t$. The problem is solved according to the process described below.

Step 1. When time $t = 1$, samples are drawn from the prior $p(x_1)$, yielding the particle set $\{x_1^{(i)}\}_{i=1}^{N}$, where $N$ is the number of particles.

Step 2. From $t = 2$ onwards, a new particle set is generated by using the state transition function, $x_t^{(i)} = f(x_{t-1}^{(i)}) + \mathrm{rand}$, where $\mathrm{rand}$ is a random number matrix. The weight of each particle is calculated and normalized by using the observation model $w_t^{(i)} = p(y_t^{(i)} \mid x_t^{(i)})$.

Step 3. The system output is obtained by calculating the weighted average of the particles' positions. The system state at time $t$ is then estimated by $\hat{x}_t = \sum_{i=1}^{N} w_t^{(i)} x_t^{(i)}$, which is taken as the solution of the problem at time $t$.
As a rule of thumb, the particle number in this study is set as 200. The state transition function is set to be translation transformation in affine transformation. The prior probability model is assumed to be normal distribution. The mean and variance are obtained by using the vectors of ORB and Haar features.
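Under these settings (200 particles, a translation-only transition, and a normal prior), one filtering step can be sketched as follows; the observation model is left abstract, and the noise spread sigma is an assumed parameter, not taken from the paper:

import numpy as np

N = 200  # particle number used in this study
rng = np.random.default_rng()

def pf_step(positions, observation_model, sigma=4.0):
    # Step 2: translation-only state transition plus a random perturbation
    # (the 'rand' matrix); sigma is our assumed spread.
    positions = positions + rng.normal(0.0, sigma, size=positions.shape)
    # Weighting and normalization with the observation model p(y | x).
    weights = np.array([observation_model(p) for p in positions])
    weights = weights / weights.sum()
    # Step 3: the state estimate is the weighted average of positions.
    estimate = np.average(positions, axis=0, weights=weights)
    return positions, weights, estimate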

Extraction of ORB and Haar Features
The Haar feature extraction routine takes the following parameters: image, the image from which the Haar feature is extracted; width and height, the width and height of the search window, respectively; minNumRect and maxNumRect, the minimum and maximum numbers of feature rectangles, respectively; and $n$, the number of times Haar features are extracted (as a rule of thumb, $n$ is set to 50), namely, the dimension of the feature vector. In the current study, width and height are the width and height of the search area, and minNumRect and maxNumRect are 2 and 4, respectively.
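A sketch of both feature extractors is given below. The generalized Haar features follow the random-rectangle construction of [31]; the exact rectangle sampling and weighting are our assumptions, and the ORB call uses the standard OpenCV API, as in the prototype described in Section 4:

import cv2
import numpy as np

rng = np.random.default_rng()

def make_haar_templates(width, height, n=50, minNumRect=2, maxNumRect=4):
    # Each of the n features is a weighted sum of 2-4 random rectangles
    # inside a width x height search window (n = 50 in this study).
    templates = []
    for _ in range(n):
        k = int(rng.integers(minNumRect, maxNumRect + 1))
        rects = []
        for _ in range(k):
            x, y = int(rng.integers(0, width - 2)), int(rng.integers(0, height - 2))
            w, h = int(rng.integers(1, width - x)), int(rng.integers(1, height - y))
            rects.append(((x, y, w, h), float(rng.choice([-1.0, 1.0]))))
        templates.append(rects)
    return templates

def haar_vector(patch, templates):
    # Rectangle sums are read off an integral image in constant time.
    ii = cv2.integral(patch)
    v = []
    for rects in templates:
        s = 0.0
        for (x, y, w, h), wt in rects:
            s += wt * (ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])
        v.append(s)
    return np.array(v)

# ORB descriptors come directly from the OpenCV library:
orb = cv2.ORB_create()
# keypoints, descriptors = orb.detectAndCompute(gray_patch, None)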

Sparsification.
In this study, the ORB and Haar features are sparsified by using the following transition equation:

$v = R x$,

where $x$ stands for the vector that needs to be sparsified, $v$ is the vector after sparsification, and $R$ is the sparse matrix. Two methods are available for generating each element $r_{ij}$ of $R$, shown in (4) and (5), respectively. The Haar feature is processed by using the method given in [31], which sparsifies the Haar feature through (4):

$r_{ij} = \sqrt{s} \times \begin{cases} +1 & \text{with probability } 1/(2s) \\ 0 & \text{with probability } 1 - 1/s \\ -1 & \text{with probability } 1/(2s) \end{cases}$, with $s = 2$. (4)

Achlioptas [32] proved that this type of matrix with $s = 2$ or $3$ satisfies the Johnson-Lindenstrauss lemma. Different from (4), the random matrix generated according to (5) has higher sparsity:

$r_{ij} = \sqrt{3} \times \begin{cases} +1 & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -1 & \text{with probability } 1/6 \end{cases}$. (5)

The ORB feature is sparsified by using (5), where $s$ takes 3. When $s = 3$, the matrix is very sparse and two-thirds of the computation can be avoided.

Take Figure 1 as an example: "Image" is the experimental figure, and "ORB of the Image" shows the ORB features extracted from it. The dimension of the ORB feature is high and its nonlinear variation is complex, which is not conducive to processing by the classifier. In the sparsified result labeled "sparse based on (4)", the dimension is reduced by a factor of 40 while the discriminative features extracted from the image are retained, which greatly reduces the processing burden on the classifier. From the 3D map shown in Figure 1, we can see that, compared to the result of (5) ("sparse based on (5)"), the result of (4) has a larger variance; after normalization, a few dimensions account for most of the total information, so (4) is inferior to (5) in describing feature diversity. In comparison, the sparsification method based on (5) achieves dimension reduction while minimizing the loss of differences between features. We therefore choose (5) in this research.
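A minimal sketch of this sparse random projection is shown below (Python; the reduced dimension of 40 follows the discussion of Figure 1, the rest is our illustration):

import numpy as np

def sparse_random_matrix(n_out, n_in, s=3, rng=np.random.default_rng()):
    # Entries are +sqrt(s) with probability 1/(2s), 0 with probability
    # 1 - 1/s, and -sqrt(s) with probability 1/(2s).  With s = 3,
    # two-thirds of the entries are zero, so two-thirds of the
    # multiplications can be skipped.
    vals = np.array([np.sqrt(s), 0.0, -np.sqrt(s)])
    probs = [1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)]
    return vals[rng.choice(3, size=(n_out, n_in), p=probs)]

# Project a high-dimensional feature vector x down to, e.g., 40 dimensions:
# R = sparse_random_matrix(40, x.size, s=3); v = R @ x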

Local Context. The movement of a tracked object is a continuous process from one frame to the next. Thus, the object displacement between adjacent frames is necessarily within a limited range. Take Figure 2 as an example: in the current frame, the object region is enclosed by the red rectangle. In the next frame, the object region will naturally not exceed the limit of the blue rectangle, which is defined as the local context in [34]. When generating particles, the coordinates of the particles must be kept within this limit; particles falling beyond the local context are transferred back into the restricted region.
Reasonably adjusting the size of the local context according to the motion speed of the object can increase the accuracy of object location detection to a certain extent. In this study, the width and height of the local context are twice those of the original object region. In Figure 2, the red rectangle stands for the object location in the previous frame, the green rectangles are the generated particles, and the blue rectangle is the restricted local context.
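The transfer of out-of-range particles back into the local context can be sketched as a simple clamp (our illustration; prev_box stores the center and size of the previous object region):

import numpy as np

def clamp_to_local_context(particles, prev_box):
    # prev_box = (cx, cy, w, h): object location in the previous frame.
    # The local context is twice as wide and tall, centered on (cx, cy).
    cx, cy, w, h = prev_box
    lo = np.array([cx - w, cy - h])
    hi = np.array([cx + w, cy + h])
    return np.clip(particles, lo, hi)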

Self-Adaptive Fusion Based on Entropy of Information.
The concept of information entropy [35] is used to quantitatively measure the amount of information contained in data. The entropy of information is defined as

$H(X) = -k \int p(x) \log p(x)\,dx$, (6a)

where $k$ is a positive constant. If $k = 1$, we get

$H(X) = -\int p(x) \log p(x)\,dx$, (6b)

where $H(X)$ is the entropy of information for the feature $X$ (the Haar or ORB feature), $p(x_i)$ is the probability of $x_i$, and $x_i$ is the $i$th particle. In feature fusion, the weight of a feature can be determined by the amount of information contained in the data, namely, the entropy of information. After discretization, (6a) and (6b) are transformed into

$H(X) = -\sum_{i=1}^{N} p(x_i) \log p(x_i)$. (7)

Then the entropy of the observation probabilities of the Haar feature, which determines the weight of the Haar feature, is expressed as

$H(y_{\mathrm{haar}}) = -\sum_{i=1}^{N} p(y_{\mathrm{haar}} \mid x_i) \log p(y_{\mathrm{haar}} \mid x_i)$, (8)

where $H(y_{\mathrm{haar}})$ is the entropy of information for the Haar feature, $p(y_{\mathrm{haar}} \mid x_i)$ is the conditional probability of the $i$th particle having observation value $y_{\mathrm{haar}}$, and $N$ is the particle number.
The entropy of the observation probabilities of the ORB feature, which determines the weight of the ORB feature, is given by

$H(y_{\mathrm{orb}}) = -\sum_{i=1}^{N} p(y_{\mathrm{orb}} \mid x_i) \log p(y_{\mathrm{orb}} \mid x_i)$, (9)

where $H(y_{\mathrm{orb}})$ is the information entropy of the ORB feature, $p(y_{\mathrm{orb}} \mid x_i)$ is the probability of the $i$th particle having observation value $y_{\mathrm{orb}}$, and $N$ is the particle number.
According to the definition of the entropy of information, the fused weight of the Haar feature is computed as

$\beta_{\mathrm{haar}} = \dfrac{H(y_{\mathrm{haar}})}{H(y_{\mathrm{haar}}) + H(y_{\mathrm{orb}})}$, (10)

and the fused weight of the ORB feature is obtained by

$\beta_{\mathrm{orb}} = \dfrac{H(y_{\mathrm{orb}})}{H(y_{\mathrm{haar}}) + H(y_{\mathrm{orb}})}$. (11)

The fused weights of the ORB and Haar features are illustrated in Figure 3, where the number of squares of each color represents the corresponding fused weight: the small red squares represent the Haar feature, while the small green squares stand for the ORB feature. As illustrated, the fused weight of the Haar feature accounts for a larger proportion when the object is not occluded; when the object is occluded, the ORB feature dominates the fusion.
In multiple feature fusion, additive fusion and multiplicative fusion are the most widely used methods [36]. Additive fusion can relatively reduce system noise, while multiplicative fusion can increase the discrimination ability of the weights but amplifies the system noise. In the current study, the two types of fusion are combined through adaptive adjustment of the weight allocation:

$p(y \mid x) = \beta_{\mathrm{haar}}\, p(y_{\mathrm{haar}} \mid x)\, p(y_{\mathrm{orb}} \mid x) + \beta_{\mathrm{orb}}\, p(y_{\mathrm{orb}} \mid x)$, (12)

where $\beta_{\mathrm{haar}}$ and $\beta_{\mathrm{orb}}$ are the weights of the two features; they are also called adaptive adjustment factors and are determined by the entropies of information of the two features. According to (12), when $\beta_{\mathrm{haar}}$ approaches 0, namely, when the entropy of information of the Haar feature reaches its minimum, (12) reduces to additive fusion in which only the ORB feature plays a role in tracking. Conversely, when $\beta_{\mathrm{orb}}$ approaches 0, namely, when the entropy of information of the ORB feature reaches its minimum, (12) reduces to multiplicative fusion in which the Haar feature plays the decisive role. Finally, $p(y \mid x)$ is taken as the particle weight $w^{(i)}$.
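Taking the reconstruction of (7)-(12) above at face value, a minimal sketch of the adaptive fusion is (our illustration):

import numpy as np

def entropy(p, eps=1e-12):
    # Information entropy of a normalized weight distribution, as in (7).
    p = p / p.sum()
    return -np.sum(p * np.log(p + eps))

def fuse(p_haar, p_orb):
    # Adaptive adjustment factors from (10) and (11).
    H_h, H_o = entropy(p_haar), entropy(p_orb)
    beta_h = H_h / (H_h + H_o)
    beta_o = H_o / (H_h + H_o)
    # Combined multiplicative and additive fusion, following (12).
    fused = beta_h * p_haar * p_orb + beta_o * p_orb
    return fused / fused.sum()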

The Proposed Tracking Algorithm

See Figures 4 and 5. The description of our algorithm is as follows.
The Initial Condition. In the first frame, the location Loc(1) of the tracking object is provided.
Initial Operation. The observation of the first frame is calculated and used as the estimated observation of the second frame. The observation of frame $(t + 1)$ is then updated from the tracking result of frame $t$, as described in Step 7 below.
Step 1. Within the local context of the previous object location Loc($t-1$), particles are sampled. The Haar and ORB feature vectors are extracted from each particle image to form the observation matrix. The prior probability of each particle is computed as its weight according to the normal distribution.
Step 2. The entropies of information of the particle weights, $H_{\mathrm{haar}}$ and $H_{\mathrm{orb}}$, are calculated.
Step 3. The fused weights $\beta_{\mathrm{haar}}$ and $\beta_{\mathrm{orb}}$ are computed.
Step 4. The location of the object is obtained by computing the weighted average of all particle locations, that is, by selecting candidate object locations in the current frame from the local context neighboring the optimal location in the last frame.
Step 5. Resampling is conducted. The particle degeneracy is compensated as described in Section 2.1.
Step 6. Steps 1-5 are repeated 8 times (a rule of thumb). The final estimated object location is saved as the object location of the $t$th frame.
Step 7. Loc($t$) is used to update the estimated observation for the $(t+1)$th frame. The update rate is 0.2.
Step 8. If $t < T$, where $T$ is the total number of frames, set $t = t + 1$ and return to Step 1. If $t = T$, the tracking is finished, and the tracking results and the location of each frame are returned.
The corresponding algorithm flow chart is shown in Figure 6.
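Putting the steps together, the outer loop of the tracker can be sketched as below. sample_particles, observe_haar, observe_orb, and crop are placeholders for the components of Section 2, while fuse and resample are the sketches given earlier; none of these names come from the paper itself:

import numpy as np

def track(frames, init_loc, n_iter=8, update_rate=0.2):
    loc = init_loc
    template = crop(frames[0], loc)      # initial observation (frame 1)
    locations = [loc]
    for frame in frames[1:]:
        for _ in range(n_iter):          # Step 6: repeat Steps 1-5
            particles = sample_particles(loc)                  # Step 1
            p_haar = observe_haar(frame, particles, template)
            p_orb = observe_orb(frame, particles, template)
            weights = fuse(p_haar, p_orb)                      # Steps 2-3
            loc = np.average(particles, axis=0, weights=weights)   # Step 4
            particles, weights = resample(particles, weights)      # Step 5
        locations.append(loc)            # save the t-th object location
        # Step 7: blend the new observation in with update rate 0.2.
        template = (1 - update_rate) * template + update_rate * crop(frame, loc)
    return locations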

Experiments
To verify the performance of the algorithm, an OpenCV- and MATLAB-based object tracking prototype has been developed according to the proposed algorithm. In this prototype, the ORB feature is extracted with the OpenCV library, while the Haar feature and the adaptive fusion based on the entropy of information are implemented as MATLAB scripts. Open video datasets are utilized as experimental data: twelve video sequences (Basketball, David2, dollar, Dudek, FaceOcc2, Freeman1, Mhyang, Sylvester, Gym, Jumping, Jogging, and Trellis) from [33] and EnterPaths1 from [37]. The algorithm is compared with BSBT [38], SBT [39], CT [31], MIL [10], and BT [23] from the following three aspects: average pixel distance error, average overlap area, and success rate (the overlap of the object tracking result with the ground-truth rectangle exceeds 50%). All experiments reported in this research were performed on a computer with an Intel i5 CPU (2.67 GHz base frequency) and 4 GB memory. The parameters are given in Sections 2 and 3.
The quantitative results are reported in Tables 1-3. The data in Table 1 represent the average error, that is, the average over all frames of the distance between the tracking result and the actual center of the tracking object. The data in Table 2 give the average overlap, namely, the ratio of the overlapping area between the object tracking rectangle and the ground-truth rectangle; each ground-truth entry specifies the bounding window of the actual object in that frame as (x, y, window-width, window-height), where (x, y) is the upper-left corner of the window. The data in Table 3 are average success rates: tracking in one frame is counted as successful if the overlap ratio exceeds 0.5, and the success rate is the proportion of successfully tracked frames among all frames. In Table 3, red stands for the optimal results, while blue stands for the suboptimal results.
From a further examination of the tracking results in Tables 1-3, we can see that our algorithm achieves the minimum average error on 8 videos, the maximum average overlap on 7 videos, and the highest success rate on 9 videos. In general, our proposed HOPEF algorithm tracks the location of the object more accurately than BSBT, SBT, BT, CT, and MIL.
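For reference, the three evaluation measures can be computed as follows (our implementation; boxes are (x, y, window-width, window-height) as in the ground-truth files, and the overlap is taken as intersection over union):

import numpy as np

def center_error(a, b):
    # Pixel distance between the centers of two (x, y, w, h) boxes.
    ca = np.array([a[0] + a[2] / 2.0, a[1] + a[3] / 2.0])
    cb = np.array([b[0] + b[2] / 2.0, b[1] + b[3] / 2.0])
    return float(np.linalg.norm(ca - cb))

def overlap(a, b):
    # Ratio of the overlapping area of the tracking and ground-truth boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def success_rate(results, ground_truth, thr=0.5):
    # Proportion of frames whose overlap ratio exceeds 0.5.
    flags = [overlap(r, g) > thr for r, g in zip(results, ground_truth)]
    return sum(flags) / len(flags)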

The Analysis of Algorithm Performance on Key Frames.
In this section, we analyze the performance of the six algorithms based on some key frames. First, the algorithms and their corresponding colors used in this section are listed in Table 4.

Basketball Video.
The Basketball video sequence shows an NBA basketball game in which the tracking object is a basketball player. The foreground of the video contains the players of two teams, while the background is the audience in the stands. Besides the complex background, another tracking difficulty in this video is that players of the same team wear identical uniforms. The performance of the six algorithms is shown in Figure 7. BSBT (black) and SBT (blue) lost the object in the 180th frame. MIL (purple) mistakenly tracked a wrong player from the same team who wore the same uniform. CT (yellow) lost the object in the 508th frame, and BT (green) made the same mistake. However, our proposed algorithm was able to essentially cover the object across all frames, and the error curve of HOPEF (red) lies at the bottom among the compared algorithms.

Figure 7: A performance comparison of the six algorithms in the 2nd, 50th, 129th, 323rd, 473rd, 508th, 640th, and 723rd frames of the Basketball video sequence [33].

David2 Video Sequences. The background in the David2 video sequence is a laboratory bench and walls, which are all fixed. The tracking difficulty in this video is that the dark painting on the wall has a color similar to that of the object, which produces interference. The performance of the six algorithms is shown in Figure 8. In the 61st frame, influenced by the dark painting in the background, a large tracking deviation appeared, and this error accumulated and affected the performance of CT (yellow) and MIL (purple) in the following frames. During the entire tracking process, SBT (blue) and BSBT (black) lost the object frequently. However, BT (green) and HOPEF (red) show a relatively small deviation in each frame, thereby avoiding losing the object.

Dollar Video Sequences.
The tracking object in the dollar video sequence is a pile of dollar bills on the surface of a table. All interference occurs in the foreground and involves two aspects: on one hand, one dollar bill is folded up; on the other hand, one pile of bills is divided into two piles, and this splitting and merging of the two stacks of money repeats throughout the video. The performance of the six algorithms is shown in Figure 9. In general, only the BSBT algorithm (black) is unaffected by the bill being folded up. However, the interleaving movement of the two piles of money made BSBT (black) mistakenly follow the wrong pile, leading to subsequent false tracking. Although HOPEF (red) is influenced by the folding action, it is not affected by the alternating interference and shows the minimum deviation error among all the algorithms. The frames in Figure 9 demonstrate that our HOPEF (red) algorithm achieves the highest accuracy.

Dudek Video Sequences. The object in the Dudek video sequence is the head of a moving person in a laboratory. During shooting, the person took off his glasses and moved around the laboratory; in addition, both the background and the illumination changed. The performance of the six algorithms is shown in Figure 10. SBT (blue) lost the object from time to time. In the 250th frame, CT (yellow) and MIL (purple) show an upper-right deviation while BSBT (black) and BT (green) show a lower-left deviation, which results in the loss of the tracking object in the subsequent frames for all four algorithms. Only our HOPEF (red) algorithm shows a relatively small deviation in these frames.
Figure 8: A performance comparison of the six algorithms in the 2nd, 61st, 138th, 194th, 246th, 319th, 380th, and 535th frames of the David2 video sequence [33].
Figure 9: A performance comparison of the six algorithms in the 1st, 51st, 126th, 131st, 136th, 226th, 241st, and 251st frames of the dollar video [33].
Figure 10: A performance comparison of the six algorithms in the 105th, 250th, 325th, 361st, 424th, 519th, 821st, and 974th frames of the Dudek video sequence [33].

Figure 11: A comparison of the performance of the six algorithms in the 103rd, 150th, 268th, 371st, 424th, 556th, 632nd, and 692nd frames of the FaceOcc2 video [33].
Figure 12: A performance comparison of the six algorithms in the 1st, 60th, 122nd, 203rd, 233rd, 287th, 303rd, and 314th frames of the Freeman video [33].

FaceOcc2 Video Sequences. The task in the FaceOcc2 video is also to track a person's head. In the video, the person's head is repeatedly covered by a book or a cap. The performance of the six algorithms is shown in Figure 11. After the 371st frame, BSBT (black) lost the object permanently. Comparing the detection results in the 371st and 556th frames, the occlusion by the book and the cap has a relatively large effect on SBT (blue) and a small effect on the other algorithms. The performance of CT (yellow), MIL (purple), BT (green), and HOPEF (red) is therefore close in this video.

Freeman Video Sequences.
The Freeman video records a person walking from right to left and from far to near. The person took off his glasses and looked around from time to time. The performance of the six algorithms is shown in Figure 12. Before the 303rd frame, MIL (purple), CT (yellow), and HOPEF (red) did not lose the object and basically covered the person's head. After the 303rd frame, only CT (yellow) could capture the object.

Mhyang Video Sequences.
The person in the Mhyang video moved around in front of the camera; the illumination and the head size changed slightly with the movement. In the 138th frame, CT (yellow) shifted to the left, which influenced the detection in all subsequent frames. In the 482nd frame, MIL (purple) shifted to the lower left. After the 575th frame, BSBT (black) lost the object from time to time, and BT (green) drifted toward the top right-hand corner of the object. However, the proposed HOPEF algorithm (red) continuously captured the object and showed the least deviation during the entire video (see Figure 13).

Figure 13: The performance comparison of the six algorithms in the 2nd, 138th, 259th, 482nd, 575th, 863rd, 1327th, and 1445th frames of the Mhyang video [33].

Figure 14: The performance comparison of the six algorithms in the 91st, 195th, 287th, 401st, 478th, 568th, 783rd, and 835th frames of the Sylvester video [33].

Sylvester Video Sequences. The tracking object in the Sylvester video sequence is an animal doll with many edges, rotating randomly under a lamp. The tracking difficulties of this video are the drastic changes of illumination and the large rotation angles. In the 91st, 195th, 287th, and 401st frames, all algorithms could basically capture the object, since there was only slight rotation of the animal doll and a small illumination change. In the 478th frame, BSBT (black) lost the object and MIL (purple) began to drift. Between the 478th and 835th frames, the performance of CT (yellow), BT (green), and HOPEF (red) is close. After the 835th frame, the performance of HOPEF (red) is inferior to that of CT (yellow) (see Figure 14).

Gym Video Sequences.
The tracking object in the Gym video sequence is an athlete. This video demonstrates that our algorithm is robust to pose and illumination changes (see Figure 15).

Figure 15: The performance comparison of the six algorithms in the 2nd, 148th, 212th, 323rd, 498th, 605th, 674th, and 753rd frames of the Gym video [33].

Trellis Video Sequences.
In the Trellis video, a man walks through a greatly varying environment. The tracking difficulties are the moving vehicles in the background and the many buildings surrounding the target. Moreover, when the target slowly walks out of a dim room and the light becomes stronger, the tracker faces a great challenge. According to the experimental results, the tracking effect of our algorithm is ideal: the features used in this study are not sensitive to these factors (especially illumination change), and hence the tracking performance is good throughout the video (see Figure 16).

Figure 16: The performance comparison of the six algorithms in the 2nd, 33rd, 66th, 97th, 125th, 189th, 217th, and 245th frames of the Trellis video [33].

Jumping Video Sequences.
In the Jumping video, a man jumps in a greatly varying environment, and the main tracking difficulty is motion blur. The performance of the six algorithms is shown in Figure 17. According to the experimental results, the tracking effect of our algorithm is ideal.

Ambiguity in Detection.
In this section we discuss how our approach deals with ambiguity in detection. The ORB feature is robust to object rotation, scaling, and noise. The Haar feature is generated statistically, which makes it resilient to object rotation. The combination of these two features provides a better way of describing objects and can reveal differences between multiple objects. When tracked objects overlap, for instance when the first object is covered by the second, sampling image features from the first object becomes challenging because features from the second object may be sampled instead. To deal with this issue, when updating the observations of the tracking object, we compare the similarity between the tracking object and the other objects. If, in the current frame, there is a high similarity between the tracking object and a particular other object, we conclude that the tracking object is covered by that object. In this way we can address the ambiguity issue to some extent (see Figure 18).
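A hypothetical form of this similarity test is sketched below; the cosine similarity and the threshold value are our choices, not specified in the paper:

import numpy as np

def is_occluded(target_feat, other_feats, threshold=0.8):
    # If the tracked object's feature vector becomes highly similar to
    # that of another object, treat the target as covered by that object
    # and suspend the observation update.
    def cos_sim(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return any(cos_sim(target_feat, f) > threshold for f in other_feats)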

Figure 17: The performance comparison of the six algorithms in the 3rd, 30th, 52nd, 79th, 89th, 108th, 130th, and 196th frames of the Jumping video [33].
Figure 18: Handling of detection ambiguity in (a) the 90th, 97th, 109th, and 209th frames (video source taken from [37]) and (b) the 63rd, 75th, 79th, and 104th frames.

Figure 19 shows the tracking error comparison of the six algorithms on the Basketball, David2, dollar, Dudek, FaceOcc2, Freeman1, Mhyang, Sylvester, Gym, Trellis, Singer1, man, Jumping, EnterPaths1, and Jogging video sequences. In Figure 19, the red line stands for our proposed HOPEF algorithm. It is clear that the red position error curve always stays at the bottom. In addition, from beginning to end, the error fluctuation of HOPEF is relatively small, which demonstrates that its tracking performance is stable. The performance of HOPEF and the other five trackers is compared from three aspects in this research. The results demonstrate that the proposed tracking method, based on the prior probability and the entropy of information of the ORB and Haar features, is effective and robust.

Tracking Two Objects.
Our algorithm can also track two objects, as shown in Figure 20. In Figure 20, the red rectangle shows the location of the first object and the green rectangle shows the location of the second object. Our two-object tracker can handle occlusion, fast motion, and changes of view.

Figure 20: Two-object tracking results: (a) the 1st, 104th, 125th, and 175th frames and (b) the 1st, 101st, 194th, and 350th frames (video source taken from [37]); (c) the 1st, 28th, 93rd, and 188th frames (video source taken from [33]); and the 1st, 223rd, 430th, and 613th frames.

Conclusions
In this research we proposed a novel object tracking method based on compressed sensing and entropy of information. First, this method adopts the Haar and ORB features to characterize the object. Second, the dimensions of the computational space of the Haar and ORB features are effectively reduced through compressed sensing. Then the above-mentioned features are fused based on entropy of information. Finally, in the particle filter framework, the object location is obtained by selecting candidate object locations in the current frame from the local areas neighboring the optimal location in the last frame. Experimental results demonstrated that this method effectively addresses the challenges of perception change, illumination change, and large-area occlusion. However, there is still room for improvement in our algorithm, which will be considered in future work. First, the tracker can still lose a fast-moving object; a self-adaptive method needs to be designed to further improve the tracking performance. Second, according to the experimental results, the detection is not satisfactory when the size of the tracking object changes dramatically, because the identifying window has a fixed size throughout the tracking process. Furthermore, this algorithm cannot handle more than two objects, and we aim to address this issue in the next step of our research.