Optimized UAV object tracking framework based on an integrated particle filter with ego-motion transformation matrix

Vision-based object tracking is still an active and important area of research, especially when the tracking algorithm runs on board an unmanned aerial vehicle (UAV). Tracking from a UAV requires special consideration due to flight maneuvers, environmental conditions, and the moving camera. Ego-motion calculation can compensate for the moving background produced by the moving camera. In this paper, an optimized object tracking framework based on the particle filter is introduced to tackle this problem. It integrates the calculated ego-motion transformation matrix into the dynamic model of the particle filter during the prediction stage. The correction stage is then applied to the particle filter observation model, which is based on two kinds of features: Haar-like rectangle (HR) and edge orientation histogram (EOH) features. The Gentle AdaBoost classifier is used to select the most informative features as a preliminary step. In experiments on different scenarios of the VIVID database, the framework achieved a successful tracking rate of more than 94.6% at real-time tracking speed.


Introduction
Tracking objects in Unmanned Aerial Vehicle (UAV) camera images has been an important area of research for many years. Many applications still require improvements in video tracker performance due to the many challenges in this area. Fast abrupt motion, low resolution, noisy imagery, cluttered background, low contrast, and small target size are common issues that should be taken into consideration [1,2]. Many factors also govern the selection of a suitable object tracker, such as the nature of the environment, the object scale, and the types of object motion [11,12].
Thanks to the "bird's eye view" delivered by the UAV camera, UAV imagery is a vital choice for many critical applications, including traffic planning and surveillance, road condition monitoring and emergency response, search and rescue operations, counter-terrorism, and missions against illegal immigration. Aerial images provide detailed information about the target or targets of interest, which can be considered the first and most important step in implementing these application systems [2,3,5,10].
Detection and tracking algorithms can be divided into single-object and multi-object tracking [1]. Single-target detection and tracking is very important in many critical applications. For instance, in military applications, aligning an enemy target in the center of the camera field of view (FOV) is necessary for engagement purposes. However, tracking becomes more difficult when UAV camera images are used. This is due to the motion of the platform, which produces an ego-motion effect between the moving object and the background. Also, changes in UAV altitude change the object scale, which adds extra challenges for the tracking algorithms.
In most cases, tracking takes place in highly varying environments. Not all feature components extracted by a given feature extraction method can be considered optimal contributors to an object representation model. For this reason, the proposed system performs a preliminary classification step that determines the best feature components using a boosting algorithm [11].
The particle filter is a Bayesian framework that gives an approximate solution to the tracking problem through recursive prediction and correction steps [1,6]. The prediction step computes the prior probability density function (PDF) of the current state from the system dynamics of the target and the PDF of the previous state. The correction step gives the posterior distribution of the object's current state through the likelihood of the measurement (in the new frame) and Bayes' rule. The particle filter uses a set of weighted samples (particles) to represent the PDF of the target object. Each particle represents a potential candidate state of the object. In each time step, the particles are updated based on the dynamics of the target object and re-weighted according to their measurements; a higher particle weight represents a higher-probability object state. The next section surveys work related to the proposed system, which is discussed in Section 3. Section 4 presents the experimental results and discussion, and Section 5 concludes the paper.
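The prediction–correction cycle described above can be sketched in a few lines. The paper's experiments used MATLAB; the following is a minimal, illustrative Python/NumPy sketch of a generic particle filter step on a 1-D toy problem (the function names and the Gaussian toy likelihood are assumptions, not the paper's code):

```python
import numpy as np

def predict(particles, sigma=1.0, rng=None):
    """Prediction step: propagate particles with Gaussian process noise."""
    rng = rng or np.random.default_rng(0)
    return particles + rng.normal(0.0, sigma, particles.shape)

def correct(particles, likelihood):
    """Correction step: re-weight particles by measurement likelihood, normalize."""
    w = np.array([likelihood(p) for p in particles])
    return w / w.sum()

# Toy run: estimate a 1-D position whose measurement likelihood peaks at 5.0
rng = np.random.default_rng(0)
particles = rng.uniform(0, 10, size=(100, 1))
particles = predict(particles, sigma=0.5, rng=rng)
weights = correct(particles, lambda p: np.exp(-0.5 * (p[0] - 5.0) ** 2))
estimate = float(np.sum(weights[:, None] * particles))  # weighted mean state
```

The weighted mean (or, as in the post-tracking stage of this paper, a median) of the particles then serves as the state estimate.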
Related works
M. Josh et al. implemented an algorithm for victim detection and tracking in search and rescue operations using an Unmanned Ground Vehicle (UGV) [3]. They aimed to reduce operator effort by developing a semi-autonomous system that can follow the victim. The system consisted of two main stages: an ego-motion compensation stage and a particle filter and clustering stage. The ego-motion compensation algorithm tracks well-selected features from frame to frame, then constructs a compensation matrix and a compensated image. They improved system performance by adapting the pyramid level according to velocity feedback.
In 2015, M. Abdelwahab et al. proposed a real-time technique for simultaneously detecting, tracking, and counting vehicles in both airborne and stationary camera video [4]. They used the Kanade-Lucas-Tomasi (KLT) feature tracker to detect good features to track in the image frame. Non-stationary background points were removed by measuring changes over time in the histogram of the pixels around each feature point, yielding the foreground features (FGF). These were then clustered and grouped into separately trackable vehicles according to their movement angles and displacement magnitudes. Their algorithm achieved real-time performance for tracking vehicles in airborne videos without any prior knowledge of vehicle locations and independently of their number.
Cao et al. used KLT features and the random sample consensus (RANSAC) method to separate background features from moving objects and to estimate the ego-motion of the moving camera fixed on a UAV [6]. They incorporated the ego-motion transformation matrix into the system model of the particle filter prediction step. The HSV color histogram and Hu moments were weighted and combined to compute the similarity measure used in the observation model of the particle filter correction step. Their algorithm achieved a tracking rate of 95%. However, its average tracking speed of 13.1 frames per second may not satisfy some application requirements.
Saif et al. presented an updated framework to handle six Uncertainty Constraint Factors (UCF) for moving object detection and tracking from UAV aerial images [2]. These six UCFs are illumination change, environment clutter, object type, camera motion, moving object direction, and motion complexity. They also treated feature extraction as a separate unsolved issue because of the computation time incurred when selecting the large feature vector needed for optimal detection performance. They proposed a general framework for object detection from UAV aerial images, employing a combination of frame differencing and segmentation for motion vector estimation and blob detection, respectively. They then suggested using clustering to give physical meaning to the overall detection. Finally, they recommended a proper classification step to distinguish between different types of objects, such as humans and vehicles.
Moti et al. proposed a tracking method based on arbitration between optical flow (OF) and Kalman filter (KF) techniques that can predict a target position efficiently even when the target turns suddenly during its motion [7]. Their attention was drawn to different situations in which either the OF or the KF worked better. They measured the distance to the nearest obstacle using a laser and used infrared camera images to detect the target object, then fused these two types of data with the arbitrated OF-KF filter for real-time tracking of a person in an indoor lab environment. Similarly, Shantaiya et al. proposed an algorithm for tracking multiple objects simultaneously using a Kalman filter and improved optical flow [8]. This combination achieved better tracking accuracy than either technique alone, but the computation time is still not suitable for UAV tracking requirements.

Proposed Framework
In this paper, the tracking framework is divided into four stages: pre-tracking, prediction, correction, and post-tracking, as illustrated in Fig. 1. The following sub-sections give detailed descriptions of these stages.

Pre-Tracking Stage
The pre-tracking stage consists of four steps that aim to select the best features to represent the target in the search area of the first frame, as illustrated in the following.

Target detection
The pre-tracking stage is executed to extract the best-feature index vector before the tracking process starts. Many tracking systems detect the target object in the image sequence either manually or automatically. The proposed framework starts with manual selection of the target object by the user as a rectangular region within the whole image frame. This region of interest (ROI) is considered the target object template and is passed to the feature extraction step, as illustrated in the following subsections.

Search area construction
The proposed framework starts by constructing a padded image IP before constructing the search area, because the target object may lie near the image borders, which would otherwise prevent the search area from keeping the same bounding box size throughout the tracking process for all frames. Given the width w and height h of the ROI, the padding value P is determined as n multiples of w or h (whichever is larger). The search area is then constructed from the ROI bounding box and P.
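As a concrete illustration of this step, the following Python sketch builds a padded search area around an ROI given as (x, y, w, h). The function name, the clamping to frame bounds, and the parameter n are illustrative assumptions, not details taken from the paper:

```python
def search_area(roi, frame_w, frame_h, n=1):
    """Build a padded search area around the ROI (x, y, w, h).

    The padding P is n multiples of max(w, h); here we simply clamp to the
    frame bounds instead of materializing a padded image IP.
    """
    x, y, w, h = roi
    p = n * max(w, h)
    x0, y0 = max(0, x - p), max(0, y - p)
    x1 = min(frame_w, x + w + p)
    y1 = min(frame_h, y + h + p)
    return (x0, y0, x1 - x0, y1 - y0)

# A 40x30 ROI at (100, 80) inside a 640x480 frame, padded by max(w, h) = 40
area = search_area((100, 80, 40, 30), 640, 480, n=1)
```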

Best feature vector discovery
In this step, the proposed system applies the Gentle AdaBoost classifier to a pre-constructed training dataset built from foregrounds and backgrounds in the search area. Given the target ROI, the f foregrounds are extracted as different random views or positions around the ROI center, and the b backgrounds are extracted from different locations in the rest of the search area. All foregrounds and backgrounds have the same size as the ROI (w × h). The feature extraction step, based on Haar-like rectangles (HR) and the edge orientation histogram (EOH), is applied to all foregrounds and backgrounds to extract all candidate features. These candidates are concatenated to form an input matrix of size (f + b) × T, where T is the number of candidate feature components.
The Gentle AdaBoost classifier trains T weak classifiers (the same number as the candidate feature components) and combines them in a linear fashion. Each classifier h(x) is a simple threshold function trained on one feature only. They are weak in the sense that they can only decide whether a value is below or above a given threshold. Although one weak classifier alone cannot accurately describe a whole dataset, a combination of them leads to a strong classifier. A simple weak classifier used in this paper has the form

    h(x) = +1 if f > θ, and −1 otherwise,

where f is the selected feature and θ is the learned threshold. The output of the selection is an index vector F_idx of size T_str × 1, where T_str is a preselected number of the strongest weak classifiers (i.e., candidate feature components).
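A decision stump of this form can be trained by scanning candidate thresholds and polarities, as in the following illustrative Python sketch. This is a generic weighted stump fit, not the paper's Gentle AdaBoost implementation; the toy data and names are assumptions:

```python
import numpy as np

def train_stump(feature_values, labels, weights):
    """Fit h(x) = +1 if polarity*(f - theta) > 0 else -1 by threshold scan.

    Returns (weighted error, threshold, polarity) of the best stump.
    """
    best = (np.inf, 0.0, 1)
    for theta in np.unique(feature_values):
        for polarity in (1, -1):
            pred = np.where(polarity * (feature_values - theta) > 0, 1, -1)
            err = weights[pred != labels].sum()
            if err < best[0]:
                best = (err, theta, polarity)
    return best

# Toy data: one feature component that separates the classes around 0.5
vals = np.array([0.1, 0.2, 0.8, 0.9])
labels = np.array([-1, -1, 1, 1])
w = np.full(4, 0.25)                      # uniform boosting weights
err, theta, pol = train_stump(vals, labels, w)
```

In boosting, this fit is repeated per feature component with re-weighted samples, and the components whose stumps achieve the lowest weighted error form the F_idx vector.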

Target representation model
As illustrated above, the target is represented by a rectangular region containing w × h pixels, which is difficult to deal with directly. A feature extraction process is therefore applied, specifically on the features whose index numbers are contained in the F_idx vector. Many types of features can describe the target object. Due to the real-time requirements of the UAV tracking system, this paper uses the Haar-like rectangles (HR) and the edge orientation histogram (EOH) because of their simplicity and low computational cost [13,14].

Haar-like Rectangle (HR) and Edge Orientation Histogram (EOH) features
The HR features describe the object's color and much of its spatial information. They can be defined as a filter that computes the gray-level difference between two defined white and black areas, summing the channel values over the white part E_w(ROI) and subtracting the sum over the black part E_b(ROI). The EOH features are invariant to color and local spatial information, so they are good for tracking targets in low light, where colors can be hard to distinguish [14]. To compute the EOH, the search area is first converted to a grayscale image and convolved with the horizontal and vertical Sobel kernels to detect edges. The magnitude and direction are then calculated for all pixels. Only pixels with magnitude greater than a pre-defined threshold (T) are considered; the rest are treated as noise. All edge directions in the range 0 to 2π are then quantized into M bins. Using the integral image technique on the matrix of each bin, the system can compute all possible EOH feature components as the sum of the magnitudes in each bin relative to the ROI.
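The integral-image trick underlying both the HR features and the per-bin EOH sums can be sketched as follows; any rectangle sum reduces to four array accesses. This is an illustrative Python/NumPy sketch (the paper's implementation was in MATLAB):

```python
import numpy as np

def integral_image(img):
    """Cumulative 2-D sum; ii[y, x] holds the sum of img[:y+1, :x+1]."""
    return img.cumsum(0).cumsum(1)

def rect_sum(ii, x, y, w, h):
    """Sum of img[y:y+h, x:x+w] using at most four integral-image accesses."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
s = rect_sum(ii, 1, 1, 2, 2)   # sum of the inner 2x2 block: 5+6+9+10
```

An HR feature is then just `rect_sum` over the white area minus `rect_sum` over the black area; an EOH component is `rect_sum` over the magnitude matrix of one orientation bin.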

Prediction stage
The prediction stage is the first step of the particle filter algorithm, and it estimates the new position of each particle using a state transition model (dynamic model). In many cases, a first-order auto-regressive model is used and achieves good results in tracking systems [6,7,8]. However, when tracking objects from a moving platform (UAV), this model should be optimized to compensate for the ego-motion produced by the moving UAV camera. The proposed system uses the optical flow technique and the k-means clustering method to calculate the ego-motion transformation matrix and increase the tracking rate. The transformation matrix is then combined with the dynamic model to produce the proposed state transition model for the particle filter, as discussed below.

Harris Corner (HC) detection
To calculate the ego-motion, the system starts by dividing the current frame into sub-frame regions (blocks) and detecting Harris corner (HC) points [15] in them. These HC points may belong to fixed objects (background) or moving objects (foreground); initially, we assume that most of them belong to the background. The optical flow method is used to estimate the new positions of the HC points in the next frame. Fig. 3(a, b) shows the HC points and their correspondences in all frame blocks.

Ego Motion Estimation
After obtaining the new HC points and their correspondences, the Euclidean distance (ED) and angle (θ) are calculated for each HC pair ((x_i, y_i) and (x_i′, y_i′)) according to equations 3 and 4. The proposed system then applies the k-means method to all HC points belonging to each block, aiming to cluster the HC data (θ and ED) into two groups representing the foreground (moving objects) and the background. The group containing the most HC points is considered the background, and the foreground points are removed, as shown in Fig. 3(c).
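A minimal sketch of the (ED, θ) features and the two-group split might look as follows. The crude center initialization and the toy data are assumptions; the paper does not specify its k-means details:

```python
import numpy as np

def motion_features(pts, pts_next):
    """Per-pair Euclidean distance and angle between matched corner points."""
    d = pts_next - pts
    ed = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])
    return np.column_stack([ed, theta])

def split_background(feats, iters=10):
    """2-means on (ED, theta); the larger cluster is taken as background."""
    c = feats[[0, -1]].astype(float)            # crude initial centers
    for _ in range(iters):
        lab = np.argmin(((feats[:, None] - c) ** 2).sum(-1), axis=1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = feats[lab == k].mean(0)
    bg = 0 if (lab == 0).sum() >= (lab == 1).sum() else 1
    return lab == bg                            # True where point is background

# Five corners drift uniformly (camera motion); one belongs to a mover
pts = np.zeros((6, 2))
nxt = np.array([[1, 0]] * 5 + [[8, 8]], dtype=float)
mask = split_background(motion_features(pts, nxt))
```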
The Maximum Likelihood Estimation SAmple Consensus (MLESAC) method is applied to the background HC pairs to estimate the affine geometric transform for each block. MLESAC is a generalization of the RANSAC estimator that maximizes the likelihood rather than just the number of inliers [16]. Due to possible image rotation, skewing, and warping, not all blocks have identical affine transformations, so the median of all affine transformations is computed and taken as the ego-motion transformation E. The transformation E is a 3×3 affine transformation matrix of the form

    E = [ a  b  c
          d  e  f
          0  0  1 ]

where a, b, c, d, e, and f are the matrix components that determine the translation, scale, shear, and rotation.
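Taking the median over the per-block transforms is an element-wise operation; the sketch below is illustrative, assuming each block yields a 3×3 homogeneous affine matrix (MLESAC fitting itself is omitted):

```python
import numpy as np

def median_affine(transforms):
    """Element-wise median of per-block 3x3 affine matrices -> ego-motion E."""
    return np.median(np.stack(transforms), axis=0)

# Two consistent blocks plus one outlier block (e.g. a block dominated by a mover)
blocks = [np.array([[1.0, 0.0, tx],
                    [0.0, 1.0, ty],
                    [0.0, 0.0, 1.0]])
          for tx, ty in [(2.0, 1.0), (2.2, 0.9), (50.0, 40.0)]]
E = median_affine(blocks)
```

The median keeps E close to the consensus translation (here ~2.2, 1.0) despite the outlier block, which is the motivation for using it over the mean.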

Dynamic model update
As mentioned above, the calculated ego-motion transformation matrix is incorporated into the auto-regression model to formulate the state transition model for the N particles. Each particle represents a candidate object state X:

    X = [x, y, ẋ, ẏ, w, h, 1]^T

where x and y are the top-left coordinates of the ROI rectangle, and w and h are its width and height, respectively. The velocity information is represented by the ẋ and ẏ components, and the constant 1 is inserted for convenient matrix operations. The proposed first-order auto-regression model can be expressed by the following equations:

    x_t = x_{t−1} + ẋ_{t−1} + G_x(0, σ_x)      (7)
    y_t = y_{t−1} + ẏ_{t−1} + G_y(0, σ_y)      (8)
    ẋ_t = ẋ_{t−1} + G_ẋ(0, σ_ẋ)               (9)
    ẏ_t = ẏ_{t−1} + G_ẏ(0, σ_ẏ)               (10)
    w_t = w_{t−1} + G_w(0, σ_w)                (11)
    h_t = h_{t−1} + G_h(0, σ_h)                (12)

where G_*(0, σ_*) is a zero-mean white Gaussian noise component with variance σ_* that accounts for unmodeled dynamic factors that may occur during tracking (sudden random motion, slight acceleration, etc.). After calculating the ego-motion transformation matrix E_{3×3}, the system takes only the first two rows of E and updates the two position components of the state X (i.e., x_t and y_t) as follows:

    [xe_t, ye_t]^T = E_{1:2} [x_t, y_t, 1]^T

where xe_t and ye_t are the two updated position components of the target object in the current frame t.
Equations (11) and (12) are updated by inserting the effect of scale change, represented by the a and d components of the ego-motion transformation matrix E:

    w_t = a (w_{t−1} + G_w(0, σ_w))            (13)
    h_t = d (h_{t−1} + G_h(0, σ_h))            (14)

Empirically, the nominal values of the a and d components were between 0.97 and 1.3, denoting a decrease or increase of the object scale.
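One prediction step per particle under this model could be sketched as below. This is an illustrative Python sketch, not the paper's MATLAB code; the noise parameters and the pure-translation E used in the example are assumptions:

```python
import numpy as np

def propagate(state, E, sigma, rng):
    """One prediction step: AR noise model plus ego-motion position/scale update.

    state = [x, y, xdot, ydot, w, h, 1]; x, y get the affine update from the
    first two rows of E, and w, h are rescaled by the a and d components of E.
    """
    x, y, xd, yd, w, h, _ = state
    noise = rng.normal(0.0, sigma, 4)
    xd, yd = xd + noise[0], yd + noise[1]          # velocity AR step
    w, h = w + noise[2], h + noise[3]              # size AR step
    x, y = x + xd, y + yd                          # position AR step
    xe, ye, _ = E @ np.array([x, y, 1.0])          # ego-motion position update
    w, h = E[0, 0] * w, E[1, 1] * h                # scale update via a and d
    return np.array([xe, ye, xd, yd, w, h, 1.0])

rng = np.random.default_rng(0)
E = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])                    # pure camera translation
s = propagate(np.array([100.0, 100.0, 0.0, 0.0, 40.0, 30.0, 1.0]), E, 0.0, rng)
```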

Correction Stage
The main objective of the correction stage is to calculate the weights corresponding to the new samples at each time step. It consists of three steps, the first two being the HR and EOH feature extraction steps discussed before. In the observation modeling step, the system assigns a weight to each particle according to a similarity score between the target object and the candidate particle ROI.
Based on the new weights of the particles (samples), a systematic resampling step is applied to reject samples with very low weights and concentrate on particles with large weights. At the beginning, the system assigns equal weights to the particles. Then, by calculating the observation likelihoods in subsequent frames, each particle that represents a good candidate for the object is assigned a high weight.
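Systematic resampling can be implemented with a single uniform draw and N evenly spaced pointers into the cumulative weight distribution; heavy particles are duplicated and light ones dropped. An illustrative sketch:

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Return indices of the particles to keep (with repetition).

    One uniform offset plus N evenly spaced pointers; a particle with weight
    w is selected about w*N times.
    """
    rng = rng or np.random.default_rng(0)
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n   # n pointers in [0, 1)
    return np.searchsorted(np.cumsum(weights), positions)

w = np.array([0.1, 0.1, 0.7, 0.1])   # particle 2 dominates
idx = systematic_resample(w)
```

Compared with plain multinomial resampling, the evenly spaced pointers give lower variance in the particle counts for the same weights.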

Post-tracking stage
In this stage, the proposed system predicts the new state of the target by taking the median of all candidate states represented by the particles. The median is used because some particles may have positions relatively far from the object center; it achieves better results than the average due to its minimal sensitivity to noise. Based on the estimated new position of the target object, the new search area is determined for tracking in the next frame.
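The median-based estimate is a one-liner; the sketch below illustrates its robustness to a stray particle (toy 2-D states and names are illustrative):

```python
import numpy as np

def estimate_state(particles):
    """Component-wise median over particle states; robust to stray particles."""
    return np.median(particles, axis=0)

# Three particles near the object at ~(10, 10) and one far-off outlier
parts = np.array([[10.0, 10.0],
                  [11.0, 9.0],
                  [10.5, 10.5],
                  [60.0, 70.0]])
est = estimate_state(parts)   # stays near (10, 10); the mean would not
```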

Experimental results
To evaluate the proposed object tracking framework in different scenarios, the VIVID database [17] is used with a small modification to simulate more difficult environments. The modification adds drifting noise to each new frame; this drift can be considered UAV vibration or camera shaking. VIVID includes civilian and military vehicles moving on a runway with different kinds of motion (straight and looping) and changing speed. Their scale also changes in the image due to changes in UAV altitude. In some situations, the tracked vehicle looks very similar to another vehicle as it passes.
The algorithms were implemented in MATLAB, and all experiments were executed on a 2.6 GHz computer with 4 GB RAM. The framework achieved a 94.6% successful tracking rate at more than 48 frames per second (fps) for 640×480 pixel images.
In the first frame, the system randomly selects 20 different sub-images around the object as foregrounds and 200 sub-images from the background. A pool of 106 extracted features (90 HR and 26 EOH) is then created as input to the AdaBoost classifier, which selects the best F features to represent the object. For the ego-motion calculation, the system divides the image frame into 4×4 sub-images and chooses only the 20 strongest HC points in each. Finally, the particle filter uses 500 particles to represent the candidate object locations.
To evaluate the system, the recall and precision are calculated in the same manner as in [6]. Fig. 4 and Fig. 5 show the average accuracy and the speed of the system for different values of F. As mentioned before, incorporating the ego-motion into the dynamic model improves the accuracy of the framework. Fig. 6 illustrates the accuracy of the system for the first image sequence of the VIVID database (egtest01), which contains vehicles that loop around on a runway and then drive straight. Starting from frame 900, the tracked vehicle speeds up and passes very close to a similar vehicle. When the vehicle speeds up, the particles lag behind its center, which means the dynamic model no longer guides the particles and predicts the new position as well as before, so the accuracy values decrease (red curve). By incorporating the ego-motion transformation matrix into the dynamic model, the particles are directed closer to the vehicle center, improving the accuracy (green curve). As mentioned before, by updating the dynamic model again using equations (13) and (14), the width and height of each candidate particle bounding box get closer to the object ground truth, and the recall and precision accuracies increase, as shown in Fig. 8.
Table 1 compares the proposed framework with the tracking method in [6] in both speed and accuracy to illustrate the performance improvement, especially with respect to the real-time requirement at satisfactory accuracy. The tracking speed of the proposed framework is approximately four times that of the other method. The accuracy is not reduced by a large amount and can be considered the same for both methods given the probabilistic nature of the experiments. Several factors contribute to the speed improvement. The use of integral images for calculating the HR and EOH features reduces the execution time significantly; however, this execution time is directly proportional to the image size. In single-object tracking with UAVs, the object size is small, so it is intuitive to keep the search area as small as possible to reduce computation, yet large enough to include the candidate object positions. In this paper, the search area was chosen to be three times the object size, since the object's motion per frame does not exceed this value.

Conclusion
In this paper, an optimized object tracking framework is introduced to meet the requirements of UAV vision-based object tracking. The tracking process starts by selecting the best features from a pool of Haar-like rectangle (HR) and edge orientation histogram (EOH) features. This is done by applying the AdaBoost classifier to all features extracted from a pre-created set of foregrounds and backgrounds in the first frame. The selected features form the observation model of the target object, which is used in the particle filter correction stage to compare against the candidate position denoted by each particle. Before the correction step, the new position of the object is predicted using a modified dynamic model. This modification is obtained by calculating the ego-motion of the UAV from fixed background points and extracting the affine transformation matrix, which is incorporated into the object dynamic model. Using this framework, the tracking speed and accuracy achieve better results than most recent UAV tracking methods.

Fig. 1. The schematic diagram of the proposed framework.
Here x and y are the coordinates of the top-left corner of any candidate ROI, and w and h are its width and height, respectively; type is one of the Haar-like rectangle patterns listed in Fig. 2; the color space channel is represented by the term C; and E_w(ROI) and E_b(ROI) are the white and black parts of the ROI. The pool of features consists of various HR features over patterns, sizes, and color channels. Thanks to the integral image (II) technique, the HR features can be extracted in very low execution time: using II, the summed values of a certain ROI can be computed efficiently with only four II accesses.

Fig. 4. Accuracy of the proposed system for different values of number of features.

Fig. 5. Speed of the proposed system for different values of number of features.

Fig. 6. Accuracy for VIVID (egtest01): (red) the accuracy without ego-motion incorporation, (green) the accuracy with ego-motion incorporation. The effect of scale change appears clearly when the recall and precision measures are examined separately, as shown in Fig. 7. Approximately after frame 1300, the UAV altitude decreases slightly, increasing the object scale, which leads to a decrease in recall.

Fig. 7. Recall and Precision Accuracies without compensating the scale change effect.

Fig. 8. Recall and Precision Accuracies with compensating the scale change effect.

Table 1. Comparison between the proposed tracking framework and the classical particle filter.