Data-driven XGBoost-based filter for target tracking

Abstract: In recent years, the data-driven approach has been introduced to the field of target tracking as a powerful tool for developing end-to-end mapping relationships between input features and outputs. Typically, in data-driven methods, neural networks serve as a supplement to traditional Bayesian filters for improved estimation accuracy. However, these hybrid methods are somewhat complicated to realise. In this study, inspired by the idea of direct mapping from measurements to states, a simpler method, the data-driven XGBoost-based filter (DXGBF), is proposed. The DXGBF consists of four components, namely a data generator, a sliding window, a centralisation strategy and an XGBoost learner (XL). The data generator generates simulated data from the probabilistic model in the training phase. By intercepting the measurements, the sliding window enables DXGBF to track targets online. The centralisation strategy extracts the relevant kinematic information from different tracks, which enables DXGBF to track randomly initialised targets. The XL is responsible for learning a function that maps the processed features to the estimated states. Simulation results show that the estimation accuracy of DXGBF is higher than those of the Kalman filter, the sampling importance resampling particle filter and the data-driven random-forest-based filter.


Introduction
The data-driven approach has become increasingly popular in recent years since it is a powerful and universal tool for learning an end-to-end mapping from input features to desired outputs. It has been applied in many fields, such as medicine, transportation and security. One of the most important applications is target tracking, where a lot of meaningful work has been done. In [1][2][3][4], neural networks (NNs) are used to predict the error between the estimate of the Kalman filter (KF) [5] and the true state. The output values of the NN are added to the results of the KF as the final estimate of the target state. The studies in [6, 7] investigate NN-aided KF methods for GPS positioning: the innovation produced by the KF is used as the input of the NN, which estimates the measurement noise covariance or the gain matrix. Besides, [8, 9] propose using NNs to approximate the uncertainty of the system model caused by mismodelling, extreme non-linearities, etc.
Although the hybrid NN methods above can improve tracking accuracy in certain scenarios, they are complicated to realise. In [10], the idea of a learner learning a simple end-to-end mapping directly from the measurements to the states is investigated. It uses the probabilistic model of the tracking problem to simulate a large number of state and measurement sequences. Both are fed into a random forest [11] regression algorithm that directly learns the desired mapping function. We call this method the data-driven random-forest-based filter (DRFF) in this paper. In the low process noise condition, DRFF shows better estimation results than the Kalman smoother [12] and the particle filter [13]. Although the results are impressive, DRFF has two significant drawbacks. First, DRFF is an offline filter: it needs to collect all the measurements of a track before filtering the states together, and it needs to train different filters for different frames. Second, DRFF uses the original measurements directly as input features; when the tracks are randomly initialised, the performance of DRFF decreases dramatically.
To solve these two problems, we develop a data-driven XGBoost-based filter (DXGBF) which also adopts the idea of direct mapping. This filter consists of four components, namely a data generator, a sliding window, a centralisation strategy and an XGBoost [14] learner. The data generator is used to generate simulated tracks in the training phase. The sliding window intercepts the measurements, which enables DXGBF to track targets online and to use one learner for different frames. The centralisation strategy aims to extract the relevant kinematic information from different tracks, which enables DXGBF to track randomly initialised targets. The XGBoost learner (XL) plays two roles: first, it is responsible for processing the missing values when the measurement sequence has an insufficient length; second, it is a learner that learns a mapping from the processed features to the desired outputs. The simulation results demonstrate that DXGBF is capable of online filtering, and its estimation accuracy is higher than those of the KF, the sampling importance resampling (SIR) [15] filter and DRFF when the tracks are randomly initialised.

Problem formulation
We investigate a standard tracking problem in this paper. The track evolves according to a constant velocity model [16]

$$
x_{k+1} = \begin{bmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \Delta t^2/2 & 0 \\ \Delta t & 0 \\ 0 & \Delta t^2/2 \\ 0 & \Delta t \end{bmatrix} v_k \tag{1}
$$

where $\{x_k, k \in \mathbb{N}\}$ is the state sequence with $x_k = [x_k, \dot{x}_k, y_k, \dot{y}_k]^T$, $[x_k, y_k]^T$ and $[\dot{x}_k, \dot{y}_k]^T$ are the target position and velocity in Cartesian coordinates, $\{v_k, k \in \mathbb{N}\}$ is an i.i.d. process noise sequence with $v_k = [v_{x,k}, v_{y,k}]^T$, $v_{x,k}$ and $v_{y,k}$ are the acceleration noises in the $x$- and $y$-directions, respectively, and $\Delta t$ is the sensor scanning cycle.
The sensor is located at $r = [r_x, r_y]^T$. Its coordinate system is polar in 2D with range $\rho$ and bearing $\alpha$. The measurement model is denoted by

$$
z_k = \begin{bmatrix} \rho_k \\ \alpha_k \end{bmatrix} = \begin{bmatrix} \sqrt{(x_k - r_x)^2 + (y_k - r_y)^2} \\ \arctan\big((y_k - r_y)/(x_k - r_x)\big) \end{bmatrix} + \begin{bmatrix} n_{\rho,k} \\ n_{\alpha,k} \end{bmatrix} \tag{2}
$$

where $\{z_k, k \in \mathbb{N}\}$ is the measurement sequence, $\{n_k, k \in \mathbb{N}\}$ is an i.i.d. measurement noise sequence, and $n_{\rho,k}$ and $n_{\alpha,k}$ are the measurement noises in the range and bearing directions, respectively. In particular, we seek filtered estimates of $x_k$ based on the set of all available measurements $z_{1:k} = \{z_i, i = 1, \ldots, k\}$ up to time $k$. From the mapping perspective, the tracking problem is to design a function

$$
X = f(Z) \tag{3}
$$

where $X = \hat{x}_k$ is the output variable, $\hat{x}_k$ is the filtered state at time $k$, and $Z = z_{1:k}$ is the input variable. We focus on the design of this mapping function in the following section.

Data-driven XGBoost-based filter
In this section, the DXGBF, which can track randomly initialised targets online, is developed. Fig. 1 shows the diagram of DXGBF. It contains four components, namely the data generator, the sliding window, the centralisation strategy and the XL. In the training phase, the data generator uses the probabilistic model to generate the training samples. After training, the red dotted lines are removed, and the trained DXGBF can be used to track new targets that have never been seen before. After passing through the three modules 'sliding window', 'centralisation strategy' and 'XL', the measurement sequence $z_{1:k}$ is mapped to the estimated state $\hat{x}_k$. These four components will be discussed sequentially.

Data generator
The data generator (DG) is used to generate simulated tracks in the training phase. It employs the system model (1) and the measurement model (2) to generate sufficient training samples. Suppose we generate $N$ tracks, each containing a sequence of measurements and a sequence of states from time $k = 1$ to time $k = T$. The training set can be expressed as

$$
D = \big\{ (z_{1:k}^{(n)}, x_k^{(n)}) \mid n = 1, \ldots, N;\ k = 1, \ldots, T \big\} \tag{4}
$$

which means we can obtain $NT$ training samples from the tracks. After training, the structure and the weights of the ensemble tree model of the XL are determined; hence, the DG can be removed from DXGBF.
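As an illustration, such a data generator can be sketched with numpy as below. This is a minimal sketch under the constant velocity model (1) and the polar measurement model (2), not the authors' implementation; the function name `generate_tracks`, the default initial state and the noise levels are illustrative assumptions.

```python
import numpy as np

def generate_tracks(N, T, dt=1.0, sigma_a=0.05, sigma_rho=10.0,
                    sigma_alpha=0.01, sensor=(0.0, 0.0), x0=None, rng=None):
    """Simulate N constant-velocity tracks of length T and their polar
    (range, bearing) measurements.  State is [x, vx, y, vy]; all numeric
    defaults are illustrative, not the paper's exact settings."""
    rng = np.random.default_rng(rng)
    F = np.array([[1, dt, 0, 0],
                  [0, 1,  0, 0],
                  [0, 0,  1, dt],
                  [0, 0,  0, 1]], float)
    G = np.array([[dt**2 / 2, 0],
                  [dt,        0],
                  [0, dt**2 / 2],
                  [0, dt       ]], float)
    rx, ry = sensor
    x = np.tile(np.array([1000.0, 50.0, 1000.0, 50.0]) if x0 is None
                else np.asarray(x0, float), (N, 1))
    states = np.empty((N, T, 4))
    meas = np.empty((N, T, 2))
    for k in range(T):
        if k > 0:                                    # propagate all tracks
            v = rng.normal(0.0, sigma_a, size=(N, 2))  # acceleration noise
            x = x @ F.T + v @ G.T
        states[:, k] = x
        dx, dy = x[:, 0] - rx, x[:, 2] - ry
        meas[:, k, 0] = np.hypot(dx, dy) + rng.normal(0.0, sigma_rho, N)
        meas[:, k, 1] = np.arctan2(dy, dx) + rng.normal(0.0, sigma_alpha, N)
    return states, meas
```

Pairing each measurement prefix with the state of the same frame then yields the NT training samples described above.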

Sliding window
The sliding window (SW) intercepts the $K$ nearest measurements from the historical measurement sequence

$$
z_{k-K+1:k} = \{ z_i, i = k-K+1, \ldots, k \} \tag{5}
$$

where $K$ is the sliding window length. The case $k < K$ will be discussed in Section 3.4. Compared with DRFF, which uses all the measurements of a track to estimate the states, SW uses only the measurements up to the current time, enabling online filtering. Besides, SW intercepts a fixed-length measurement sequence, so a fixed-length feature vector is fed to the XL and only one learner is needed for all frames. The idea behind SW is that historical measurements received long ago contain little information. We can analyse the information loss of this discarding operation in a linear Gaussian system. In such a system, the KF is the optimal filter, whose state update formula is

$$
\hat{x}_k = (I - K_k H_k) F_{k-1} \hat{x}_{k-1} + K_k z_k \tag{6}
$$

where $\hat{x}_k$ is the filtered state at time $k$, $F_k$ is the state transition matrix, $H_k$ is the observation matrix and $K_k$ is the gain matrix. From (6), we can recursively derive that

$$
\hat{x}_k = \sum_{i=k-K+1}^{k} w_i z_i + w_{k-K} \hat{x}_{k-K} \tag{7}
$$

where $w_{k-K}$ is the weight matrix of $\hat{x}_{k-K}$, which summarises all the measurements before time $k-K$, and

$$
w_{k-K} = \prod_{i=k-K+1}^{k} (I - K_i H_i) F_{i-1} \tag{8}
$$

From (7), we can see that in the linear Gaussian system the information loss is governed by $w_{k-K}$ if we discard the measurements before the nearest $K$ frames. If the state transition matrix $F$, the observation matrix $H$, the process noise covariance $C_v$ and the measurement noise covariance $C_n$ are all known, $w_{k-K}$ can be calculated directly. As $K$ increases, the information loss decreases. However, if $K$ is too long, the input features introduce more noise than information, so the choice of $K$ needs to be moderate.
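This analysis can be checked numerically. The following sketch, assuming a known linear Gaussian system, runs the KF covariance recursion and accumulates the product of the $(I - K_i H_i)F$ factors, i.e. the weight the current estimate still places on the estimate $K$ frames ago; the function name `info_loss_weight` and the 1D test system are illustrative assumptions.

```python
import numpy as np

def info_loss_weight(F, H, Cv, Cn, K, P0=None):
    """Norm of the accumulated weight matrix w_{k-K}: the product of
    (I - K_i H) F over K Kalman-filter steps.  A small norm means the
    discarded old measurements carry little remaining information."""
    n = F.shape[0]
    P = np.eye(n) if P0 is None else P0
    W = np.eye(n)
    for _ in range(K):
        P = F @ P @ F.T + Cv                  # covariance prediction
        S = H @ P @ H.T + Cn                  # innovation covariance
        Kk = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        P = (np.eye(n) - Kk @ H) @ P          # covariance update
        W = (np.eye(n) - Kk @ H) @ F @ W      # accumulate weight product
    return np.linalg.norm(W)
```

On a 1D constant-velocity system (position-only measurements), the norm shrinks rapidly with the window length, which is why a moderate K already captures nearly all the usable history.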

Centralisation strategy
The centralisation strategy (CS) transforms the measurement sequence $z_{k-K+1:k}$ into a new sequence $\tilde{z}_{k-K+1:k}$, where

$$
\tilde{z}_i = [\tilde{x}_i - \tilde{x}_k, \tilde{y}_i - \tilde{y}_k]^T, \quad i \neq k \tag{9}
$$

is the relative position between the $i$th measurement $z_i$ and the $k$th measurement $z_k$ in Cartesian coordinates,

$$
[\tilde{x}, \tilde{y}]^T = [\rho \cos\alpha + r_x, \rho \sin\alpha + r_y]^T \tag{10}
$$

is the sensor-centred position of the measurement $z$ in Cartesian coordinates, and

$$
\tilde{z}_k = [\tilde{x}_k, \tilde{y}_k]^T \tag{11}
$$

which means $z_k$ is transformed from polar coordinates to Cartesian coordinates.
The idea of CS is to extract the relative movement information from different tracks through the first $K-1$ elements of $\tilde{z}_{k-K+1:k}$, and to retain the absolute location information of the target through the measurement of the current frame. In this way, the common information of the tracks is extracted, and the learner is trained effectively so that it can track randomly initialised targets. It is noteworthy that, after transforming the measurements from polar coordinates to Cartesian coordinates, the measurement model (2) is equivalent to

$$
\tilde{z}_k = [x_k, y_k]^T + \tilde{n}_k \tag{12}
$$

where $\tilde{n}_k$ is the measurement noise in Cartesian coordinates. The probability density function (pdf) of $\tilde{n}_k$ changes as the range and bearing of the target change. It is difficult for a learner to distinguish different obscure noise distributions, so in this paper we only discuss the tracking problem in a constrained surveillance area, where the pdf of $\tilde{n}_k$ can be considered the same.
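A minimal numpy sketch of the centralisation strategy described above: the first K − 1 measurements become positions relative to the current frame, while the last row keeps the absolute Cartesian position. The function name `centralise` is an assumption.

```python
import numpy as np

def centralise(window, sensor=(0.0, 0.0)):
    """Apply the centralisation strategy to one window of polar measurements.

    window: (K, 2) array of [range, bearing] rows.  Returns a (K, 2) array
    whose first K-1 rows are positions relative to the current (last)
    measurement, and whose last row is the absolute Cartesian position.
    """
    rx, ry = sensor
    rho, alpha = window[:, 0], window[:, 1]
    cart = np.column_stack((rho * np.cos(alpha) + rx,     # x-tilde
                            rho * np.sin(alpha) + ry))    # y-tilde
    out = cart.copy()
    out[:-1] -= cart[-1]     # relative positions w.r.t. the current frame
    return out
```

For a stationary target the relative rows collapse to zero, which is exactly the track-independent signal the learner is meant to exploit.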

XGBoost learner
XGBoost is a tree-ensemble algorithm. In the training phase, it generates a new tree using the first- and second-order gradients of the loss function. In the prediction phase, a sample is mapped to a leaf of each tree by decision rules, and the sum of the weights of all the corresponding leaves is the predicted result. The XL plays two roles. First, it compensates for the deficiency of SW in the track initiation phase. SW requires a measurement sequence of length $K$; however, in the track initiation phase the length of the sequence is less than $K$, which makes SW unavailable. Fortunately, XGBoost can process missing values automatically. When the length of the measurement sequence is less than $K$, the missing part can be treated as missing values, and we pad the sequence to length $K$.
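The padding step can be sketched as follows; `pad_window` is a hypothetical helper, and NaN is used because XGBoost's sparsity-aware tree learner routes missing values down a learned default branch.

```python
import numpy as np

def pad_window(meas_so_far, K):
    """Build the fixed-length input during track initiation.

    meas_so_far: (k, d) measurements available so far.  When k >= K the
    last K rows are returned; when k < K the leading rows are filled with
    NaN, which XGBoost treats as missing values.
    """
    k, d = meas_so_far.shape
    if k >= K:
        return meas_so_far[-K:]
    pad = np.full((K - k, d), np.nan)      # missing frames before time 1
    return np.vstack((pad, meas_so_far))
```

The same function therefore serves both the initiation phase (k < K) and steady-state sliding-window operation (k ≥ K).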
Second, the XL is a learner that learns a mapping function

$$
\hat{x}_k = f(\tilde{z}_{k-K+1:k}) \tag{13}
$$

In the training phase, the processed features $\tilde{z}_{k-K+1:k}$ are fed into the XL as input variables. The XL tries to minimise the objective $\mathcal{L}$, which is a function of $\hat{x}_k$ and $x_k$. In this paper, the objective is the root mean square error (RMSE) between $\hat{x}_k^i$ and $x_k^i$, plus the default regularisation term of the XGBoost regressor. After training, the structure of the trees and the weights of the leaves of the XL are determined, and DXGBF can be used to track new measurements that have never been seen before. The hyperparameters of the XL have a significant effect on the filtering accuracy of DXGBF, which will be discussed in the next section.
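For intuition on how each tree is fitted from first- and second-order gradients, the following toy numpy sketch performs Newton boosting with depth-1 trees (stumps) under a squared loss, using XGBoost's split-gain and leaf-weight formulas. It is a didactic simplification of the XGBoost algorithm, not the library itself; all names are illustrative.

```python
import numpy as np

def fit_stump(x, g, h, lam=1.0):
    """Fit one depth-1 tree from gradients g and Hessians h, maximising the
    XGBoost split gain; returns (threshold, left_weight, right_weight)."""
    order = np.argsort(x)
    xs, gs, hs = x[order], g[order], h[order]
    G, H = gs.sum(), hs.sum()
    best, Gl, Hl = None, 0.0, 0.0
    for i in range(len(xs) - 1):
        Gl += gs[i]
        Hl += hs[i]
        if xs[i] == xs[i + 1]:
            continue                      # cannot split between equal values
        Gr, Hr = G - Gl, H - Hl
        gain = Gl**2 / (Hl + lam) + Gr**2 / (Hr + lam) - G**2 / (H + lam)
        if best is None or gain > best[0]:
            thr = 0.5 * (xs[i] + xs[i + 1])
            best = (gain, thr, -Gl / (Hl + lam), -Gr / (Hr + lam))
    _, thr, wl, wr = best
    return thr, wl, wr

def boost(x, y, rounds=60, eta=0.1, lam=1.0):
    """Newton boosting with squared loss: g = pred - y, h = 1."""
    pred = np.zeros_like(y, dtype=float)
    trees = []
    for _ in range(rounds):
        g = pred - y                       # first-order gradient
        h = np.ones_like(y, dtype=float)   # second-order gradient (L2 loss)
        thr, wl, wr = fit_stump(x, g, h, lam)
        pred = pred + eta * np.where(x <= thr, wl, wr)
        trees.append((thr, wl, wr))
    return trees, pred
```

The leaf weight −G/(H + λ) is exactly the Newton step regularised by λ, which is the mechanism the XL relies on when minimising the RMSE objective.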

Simulation
In this section, the performance of DXGBF is investigated and compared with those of KF, SIR and DRFF. First, we study the effect of hyperparameters on the performance of DXGBF. Then, two scenarios, tracking fixed-state initialised targets and tracking randomly initialised targets, are investigated. In the following analysis, the results are obtained by averaging over 1000 Monte Carlo realisations.
Here, the sensor is located at $r = [0, 0]^T$. The sensor scanning cycle is $\Delta t = 1$ s. The sequence length of the states is $T = 16$ s. The process noise and the measurement noise are Gaussian, $v_k \sim N(0, C_{v_k})$ and $n_k \sim N(0, C_{n_k})$, respectively, where $C_{v_k} = \sigma_v^2 I_2$ with $\sigma_v = 0.05$ m/s², and $C_{n_k} = \mathrm{diag}(\sigma_\rho^2, \sigma_\alpha^2)$, where $\sigma_\rho$ and $\sigma_\alpha$ are the standard deviations of the range and bearing measurement errors, respectively. To apply the KF, the measurement is converted into Cartesian coordinates, and the measurement noise covariance in polar coordinates is converted into Cartesian coordinates through a non-linear transformation. The number of particles of SIR is set to 3000, and resampling is performed in each step. Both KF and SIR use the first two frames of measurements to initialise a track. For DRFF, we collect all the measurements of a track and then estimate all the states offline. Both DRFF and DXGBF use 20,000 tracks as training samples. Following the optimal result of [10], the number of trees in each random forest is set to 150. The sliding window length of DXGBF is K = 13, the number of trees is 200, the learning rate is 0.05, and the maximum depth of the trees is 8.
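The polar-to-Cartesian covariance conversion needed for the KF can be sketched via a first-order (Jacobian) linearisation of the coordinate transform; this is the standard linearised conversion, and the function name is an assumption.

```python
import numpy as np

def polar_cov_to_cartesian(rho, alpha, sigma_rho, sigma_alpha):
    """First-order conversion of the polar measurement noise covariance
    diag(sigma_rho^2, sigma_alpha^2) into Cartesian coordinates, via the
    Jacobian of (rho, alpha) -> (rho*cos(alpha), rho*sin(alpha))."""
    J = np.array([[np.cos(alpha), -rho * np.sin(alpha)],
                  [np.sin(alpha),  rho * np.cos(alpha)]])
    return J @ np.diag([sigma_rho**2, sigma_alpha**2]) @ J.T
```

Note that the resulting Cartesian covariance depends on the range and bearing of the measurement, which is the same effect that motivates restricting DXGBF to a constrained surveillance area.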

Effect of hyperparameters
The hyperparameters have a significant influence on the filtering accuracy of DXGBF. Here we study the three most important ones: the number of training tracks, the number of trees in the XL, and the length of the sliding window. In Fig. 2a, the RMSE decreases monotonically as the number of training tracks increases; even when the number of training tracks becomes very large, the performance can still be slightly improved. In Fig. 2b, the RMSE quickly converges to a constant value as the number of trees in the XL increases. It is noteworthy that when the trees are insufficient, the performance of DXGBF is poor due to underfitting. In Fig. 2c, the RMSE also tends to converge as the length of the sliding window increases. However, we should remember that the performance can be worse if K is too long, because the input features would introduce more noise than information.

Tracking fixed state initialised targets
In this scenario, we study the tracking performance on fixed-state initialised targets, following [10], in order to compare with that on randomly initialised tracks. The tracks are initialised from a fixed state [C_x, 50 m/s, C_y, 50 m/s], where C = [C_x, C_y] is the initial location of all the targets. Fig. 3 shows the filtering results under different surveillance areas and measurement noise covariance conditions. The parameter configurations are shown in Table 1.
From these figures, we can see that when the tracks are generated from a fixed state, the estimation accuracy of DRFF is the best. That is because the measurements of each frame of all the tracks fluctuate over a small range, so it is appropriate to use the measurements directly as input features. Besides, DRFF adopts an offline filtering method that uses all the measurements to estimate the states. The performance of DXGBF is nearly the same as that of DRFF, which demonstrates the effectiveness of the sliding window method: there is little information loss caused by the discarding operation. Moreover, DXGBF realises online filtering and only needs one learner for all the frames thanks to the sliding window method. The performance of both mapping methods, DRFF and DXGBF, is clearly better than that of the Bayesian methods, KF and SIR. This will be discussed in the next subsection.

Tracking randomly initialised targets
In this scenario, we study the tracking performance on randomly initialised targets. The initial states of the tracks are uniformly generated in the interval [C_x ± 500 m, ± 50 m/s, C_y ± 500 m, ± 50 m/s], where C = [C_x, C_y] is the centre of the surveillance area. Fig. 4 shows the filtering results under different surveillance areas and measurement noise covariance conditions. The parameter configurations of the centre of the surveillance area and the measurement noise covariance are the same as those in Section 4.2.
From these figures, we can see that the performance of DRFF decreases dramatically. That is because when the tracks are randomly initialised, the measurements fluctuate over a large range, and directly using the raw measurements as inputs is no longer appropriate. Interestingly, the RMSE curve of DRFF is bowl-shaped. That is because the closer a frame is to the centre of the track, the more information can be inferred from the forward and backward measurements.
The filtering results of KF and SIR are nearly unchanged compared with those in Fig. 3. The reason is that the tracking performance of these Bayesian methods is not affected by the initial states of the targets.
The filtering accuracy of DXGBF also decreases due to the random initialisation of the tracks. Nevertheless, its RMSE is still the lowest among these filters, which demonstrates the effectiveness of the centralisation method: it extracts the relative movement information shared among different tracks. The results of Figs. 3 and 4 show that simple mapping methods based on simulated data generated from the models can achieve better filtering accuracy than the model-based Bayesian methods in certain scenarios. Nevertheless, it is difficult to give a theoretical proof of why the performance of the mapping method is better than that of the Bayesian methods in these scenarios. The upper bound of the performance of the data-driven approach needs to be further studied.

Conclusion
In this paper, we considered the target tracking problem from the data-driven perspective. Specifically, a data-driven XGBoost-based filter, which maps measurements directly to states, is developed. It consists of four components, namely the data generator, the sliding window, the centralisation strategy and the XL. The data generator is used for generating simulated data from the probabilistic model in the training phase. By intercepting the measurements, the sliding window enables DXGBF to track targets online and to use one learner for different frames. The centralisation strategy extracts the relevant kinematic information from different tracks, which enables DXGBF to track randomly initialised targets. In addition to processing missing values, the XL is responsible for learning a function that maps the processed features to the estimated states. Simulation results showed that DXGBF is capable of online filtering, and its estimation accuracy is better than those of KF, SIR and DRFF when the tracks are randomly initialised.

Acknowledgements
This work was supported in part by the Chang Jiang Scholars Program, in part by the 111 Project No. B17008, in part by the National Natural Science Foundation of China under Grant