Multisensor Estimation Fusion with Gaussian Process for Nonlinear Dynamic Systems

The Gaussian process is gaining increasing importance in different areas such as signal processing, machine learning, robotics, control and aerospace and electronic systems, since it can represent unknown system functions by posterior probability. This paper investigates multisensor fusion in the setting of Gaussian process estimation for nonlinear dynamic systems. In order to overcome the difficulty caused by the unknown nonlinear system models, we associate the transition and measurement functions with the Gaussian process regression models, then the advantages of the non-parametric feature of the Gaussian process can be fully extracted for state estimation. Next, based on the Gaussian process filters, we propose two different fusion methods, centralized estimation fusion and distributed estimation fusion, to utilize the multisensor measurement information. Furthermore, the equivalence of the two proposed fusion methods is established by rigorous analysis. Finally, numerical examples for nonlinear target tracking systems demonstrate the equivalence and show that the multisensor estimation fusion performs better than the single sensor. Meanwhile, the proposed fusion methods outperform the convex combination method and the relaxed Chebyshev center covariance intersection fusion algorithm.


Introduction
Estimation in nonlinear systems is extremely important because almost all practical systems involve nonlinearities of one kind or another [1,2], such as target tracking, vehicle navigation, automatic control and robotics [3,4]. In the case of nonlinearities, estimation cannot be obtained analytically in general. Some methods based on exact parametric models and ideas of the Kalman filter (KF) have been developed. The Extended Kalman filter (EKF) [5] was the most common application to nonlinear systems, which simply linearizes all nonlinear functions via Taylor-series expansion and substitutes Jacobian matrices for the linear transformations into the KF equations. The unscented transformation was introduced to address the deficiencies of linearization, namely unscented Kalman filters (UKF) [1], which provided a more direct and explicit mechanism for transforming mean and covariance matrices. Under the Bayesian framework, particle filter (PF) was presented by constructing the posterior probability density function of the state based on all available information in Reference [6]. However, for nonlinear systems, these methods were usually assumed that the nonlinear relationships are known. Lack of modeling accuracy, including the identification of the noise and the model the exact dynamic models or known nonlinear functions. For the cases in which accurate parametric models are difficult to obtain, it is worth considering integrating Gaussian processes with multisensor estimation fusion to improve the system's performance.
In this paper, we focus on multisensor estimation fusion including the centralized and distributed fusion methods with Gaussian process for nonlinear dynamic systems. Firstly, given the training data, we learn the dynamic models with the Gaussian process and derive multisensor estimation fusion methods based on the Gaussian process models for nonlinear dynamic systems, which can avoid inappropriate parametric models and improve predictive ability. Combining with the single senor GP-ADF and GP-UKF, respectively, the prediction step and update step of the multisensor estimation fusion are provided. In general, it is hard to analyze the performance of the non-parametric fusion methods. Since the Gaussian process fusion methods have analytic mean and covariance, we show that the distributed estimation fusion is equivalent to the centralized estimation fusion with the single sensor cross terms being full column rank. Numerical examples show that the equivalence is satisfied under the condition and the multisensor estimation fusion performs better than the single sensor. We also compare the proposed fusion methods with the RCC-CI [37] algorithm and the convex combination method [43]. In the Gaussian process model setting, the simulation results show that the multisensor estimation fusion methods outperform the RCC-CI algorithm and the convex combination method, as far as the estimation accuracy is concerned.
This article also extends our earlier work [48]. Compared with the conference paper, the main differences are as follows: • Detailed proofs are provided. • The enhancement of the multisensor fusion algorithm with GP-UKF is presented. • Additional set of extensive experiments are carried out. • The equivalence condition of Proposition 1 is analyzed. • Comparison between GP-ADF fusion and GP-UKF is given.
The rest of the paper is organized as follows. In Section 2, the problem formulation and the Gaussian process are introduced. In Section 3, the centralized estimation fusion and the distributed estimation fusion methods are presented. In Section 4, simulations are provided to confirm our analysis. Some conclusions are drawn in Section 5.

Problem Formulation
In this paper, we consider the state estimation problem of a nonlinear dynamic system with additive noise and N (N ≥ 2) sensor measurements. The multisensor nonlinear dynamic system with state equation and N measurement equations (see Figure 1) is described as follows: where x k ∈ R D is the state of the dynamic system at time k and y m k ∈ R l m is the measurement of the m th sensor at time k, m = 1, . . . , N. h(x k ) is the transition function of the state x k and g m (x k ) is the nonlinear measurement function of x k at the m th sensor. w k ∼ N(0, Q k ) is a Gaussian system noise and v m k ∼ N(0, R m k ) is a Gaussian measurement noise, which is independent of each other. Our goal is to estimate the state from all the available sensor measurement information. Gaussian processes are regarded as the prior of the transition function h(x k−1 ) and the sensor measurement function g m (x k ), m = 1, . . . , N, respectively. Then we make inference about the posterior distribution of the transition function and the sensor measurement function. The Gaussian process model represents a powerful tool to make Bayesian inference about functions [49].

Gaussian Processes
A Gaussian process is defined over functions, which is a generalization of the Gaussian probability distribution [8]. It means that we should consider inference directly in function space. Also, it gives a formal definition, "the Gaussian process is defined a collection of random variables, any finite number of which have a joint Gaussian distribution," in [8].
Similar to a Gaussian distribution, knowledge of both mean and covariance function can specify a Gaussian process. The mean function m(x) and covariance function (also called a kernel) k(x, x ) of a Gaussian process f (x) are defined as follows, where E[·] denotes the expectation. Thus, we write the Gaussian process as Unless stated otherwise, a prior mean function is usually assumed to be 0. The choice of the covariance functions depends on the application [14]. In this paper, we employ the most commonly-used squared exponential (SE) kernel in machine learning, which is the prototypical stationary covariance function and useful for modeling particularly smooth functions. It is defined as where parameter α 2 represents the variance of the function f that controls the uncertainty of predictions in areas of less training sample density and parameter Λ is a diagonal matrix of the characteristic length-scales of the SE kernel. Other commonly employed kernel functions can be seen in Reference [8].
A Gaussian process implies a distribution over functions based upon the obtained training data. Assume that we have obtained a set of training data X = {X, y}. X and y are made up of multiple samples drawn from the following standard Gaussian process regression model where f : R D → R and f (x) ∼ GP, N denotes the normalized Gaussian probability density function. Note that Gaussian process regression uses the fact that any finite set of training data and testing data of a Gaussian process are jointly Gaussian distributed. Let θ = {α 2 , Λ, σ 2 ε }, called the hyper-parameters of the Gaussian process. Using the evidence maximization method [8,50], we obtain a point estimatê from the known training data [51]. We can solve the optimization problem with numerical optimization techniques such as conjugate gradient ascent [8,14]. Next, we infer the posterior distribution over function value f (x * ) from training data for each inputs x * ∈ R D . In addition, the test input x * can be categorized into two classes, relying on whether it is uncertain or not. The corresponding predictive distribution over f (x * ) will be presented as follows. (1)

Predictive Distribution over Univariate Function
Suppose that the training data is X = (X, y)={(x i , y i ) : i = 1, . . . , n} , where x i is a D-dimensional input vector, y i is a scalar output, and n is the number of training data. Next, two cases, deterministic inputs and uncertain inputs, are considered.

Deterministic Inputs
Conditioned on the training data and the deterministic test input x * , the predictive distribution over f (x * ) is a Gaussian with the mean and the variance where Var f [·] represents the variance with respect to f , and K is the kernel matrix with each element satisfying K ij = k(x i , x j ). The uncertainty of prediction is characterized by the variance σ 2 f (x * ). More details can be seen in Reference [8]. •

Uncertain Inputs
When the test input is uncertain, namely x * has a probability distribution, the prediction problem is relatively difficult. Let us consider the predictive distribution of f (x * ) for the uncertain input x * ∼ N (µ, Σ). According to the results in Reference [7] (or reviewing References [52,53]), we introduce the predictive distribution over the function value as where the mean and variance of the distribution p( f (x * )|x * ) are provided with Equations (5) and (6), respectively. Based on the conditional expectation and variance formulae, it yields the mean µ * and variance σ 2 * of the distribution p( f (x * )|µ, Σ) in closed form. In particular, we have and Note that l in Equation (8) is from l = [l 1 , . . . , l n ] T , tr(·) represents the trace in Equation (9) and we havẽ In general, the predictive distribution in Equation (7) cannot be analytically calculated since it leads to a non-Gaussian distribution, if a Gaussian distribution is mapped through a nonlinear function. Using moment-matching method, the distribution p( f (x * )|µ, Σ) can be approximated by the Gaussian distribution N (µ * , σ 2 * ). (2)

Predictive Distribution over Multivariate Function
For the model (4), let us turn to the multiple output case that f : R D → R E , f ∼ GP. Following the results in References [7,51], we need to train E Gaussian process regression models independently. The ath model is learned from the training data [X, y a ], y a = [y a 1 , . . . , y a n ] T , a = 1, . . . , E, where y a j is the ath element of y j . It implies that any two targeted dimensions are conditionally independent given the input.
For any deterministic input x * , the mean and variance of each target dimension are obtained by Equations (5) and (6). The predictive mean of f (x * ) is a vector stacked by E targeted dimension mean, and the corresponding covariance is a diagonal matrix whose diagonal element is the variance of each targeted dimension.
With the uncertain input x * ∼ N (µ, Σ), the predictive mean µ * of f (x * ) also is the stacked E predictive mean µ a * given by Equation (8), a = 1, . . . , E. Unlike predicting at deterministic inputs, targeted dimensions are dependent due to the uncertain input. Denote f * a = f a (x * ) and the corresponding predictive covariance matrix is given by It is obvious that the predictive covariance is no longer a diagonal matrix. Using Equation (9), we can obtain the diagonal element of covariance matrix (10). For each a, b ∈ {1, . . . , E}, the off-diagonal elements satisfy Given where β a , β b can be similarly obtained in Equation (5). For notational simplicity, define Thus, we also approximate the distribution p( f (x * )|µ, Σ) with the Gaussian distribution N (µ * , Σ * ). More details can be found in Reference [51].

Multisensor Estimation Fusion
In this section, an essential prerequisite is that the transition function (1) and the measurement functions (2) are either not known or no longer accessible. Thus, we model the latent functions with Gaussian processes. For the N-sensor dynamic systems (1) and (2), suppose that we have obtained some training data of the target state and sensor measurement. In the following, we discuss the centralized and distributed estimation fusion methods to estimate the state from a sequence of noisy sensor measurements.

Centralized Estimation Fusion
First, we consider the centralized estimation fusion method. To facilitate fusion, we stack the multisensor measurement information as follows Thus the dynamic system (1) and (2) can be rewritten as where the covariance of the noise v k is given by Considering the Gaussian process dynamic system setup, two Gaussian process models can be trained using evidence maximization. GP h models the mapping As we know, the key of the estimation is to recursively infer the posterior distribution over the state x k of the dynamic system (13)- (14), which are based on all the sensor measurements y 1:k = {y τ } k τ=1 , including the prediction step and the update step. In particular, the prediction step uses the posterior distribution from the previous time step to produce a predictive distribution of the state at the current time step. In the update step, the current predictive distribution is combined with current measurement information to refine the posterior distribution at the current time step. Next, centralized estimation fusion methods with GP-ADF and GP-UKF are presented, respectively. (1)

Fusion with GP-ADF
Note that some approximation methods are required to obtain an analytic posterior distribution for the nonlinear dynamic systems. In particular, for the Gaussian process dynamic system, it is easy to make some Gaussian approximation to the posterior distribution. GP-ADF exploits the fact that the true moments of the Gaussian process predictive distribution can be computed in closed form. The predictive distribution is approximated by a Gaussian with the exact predictive mean and covariance [7]. Based on the following lemma, we present the centralized estimation fusion method with GP-ADF.

Lemma 1.
For the Gaussian process dynamic system (13)-(14), the posterior distribution of the state x k can be approximated by the Gaussian distribution N (µ e k , C e k ), in which the mean µ e k and covariance C e k are given as [7]: where Remark 1. Since GP-ADF algorithm [7] is a state estimation method in the single sensor Gaussian process dynamic system, it is applicable to the multisensor system with all the sensor measurements stacked into a vector. Therefore, Equation (15) and Equation (16) are called the centralized estimation fusion method with GP-ADF here.
Then, let us present the prediction step and update step of the above fusion method in detail.

• Prediction Step
First, this step needs to use the previous posterior distribution p(x k−1 |y 1:k−1 ) ≈ N (µ e k−1 , C e k−1 ) to produce a current predictive distribution p(x k |y 1:k−1 ). We write the predictive distribution as . From this, an analogy with Equation (7) yields the mean µ p k and the variance C p k of the state x k conditioned on the measurement y 1:k−1 . In particular, µ p k is a D-dimension mean vector and the computation of every target dimension can be seen in Equation (8). The corresponding covariance matrix where Σ w k is the covariance matrix of transition noise obtained by the evidence maximization method, and the computation of Σ h can be seen in Equation (10). •

Update Step
Next, we approximate the joint distribution p(x k , y k |y 1: Note that we are not aiming at approximating the distribution p(x k |y 1:k ) directly. To obtain the joint approximation Gaussian distribution, we use the Gaussian distribution N (µ y k , C y k ) to approximate the distribution p(y k |y 1:k−1 ), which can be done in a same way to the prediction step due to p(y k |y 1:k−1 ) = p(y k |x k )p(x k |y 1:k−1 )dx k .
On the other hand, the cross term satisfies is the mean of a new Gaussian distribution that is the product of two Gaussian density functions and c −1 2 is the normalization constant of the new Gaussian distribution. Consequently, from the joint distribution p(x k , y k |y 1:k−1 ), we obtain the posterior distribution p(x k |y 1:k ) ≈ N (µ e k , C e k ) with the mean and variance as Equations (15) and (16), respectively. (15) and (16), we can see that the centralized fusion method with GP-ADF is similar to the unified optimal linear estimation fusion [40,54]. The difference is that the computational way of µ p k , C p k , µ y k ,C y k and C x k y k and they are mainly based on the training data due to the nonparametric dynamic system model.

Prediction Step
At time k − 1, based on the unscented transform [1], we obtain X k−1 containing 2D + 1 points through the mean µ e k−1 and covariance C e k−1 . Using the GP prediction model, the transformed set is given byX and the process noise is computed as where GP [µ] h and GP [Σ] h are the mean and covariance models with respect to the Gaussian process GP h . The predictive mean and covariance are where W [i] is the weight generated in the unscented transform. Using the predictive mean µ p k and covariance C p k , we obtainX k through the unscented transform. Thus, based on the GP observation model,Ŷ and the observation noise matrix is determined by Then the predicted observation and innovation covariance are calculated bŷ Meanwhile, the cross covariance is given by •

Update Step
Finally, the update can be obtained by the following equations: Remark 3. Note that GP-ADF propagates the full Gaussian density by exploiting specific properties of Gaussian process models [7]. GP-UKF maps the sigma points through the Gaussian process models instead of the parametric functions and the distributions are described by finite-sample approximations. In contrast to GP-UKF, GP-ADF is consistent and moment preserving [7]. In addition GP-EKF requires linearization of the nonlinear prediction and observation models in order to propagate the state and observation, respectively [14]. GP-PF needs to perform one Gaussian process mean and variance computation per particle and has very high computational cost. For more details about these single sensor filtering methods, it can be seen in [7,14], and so forth. Due to the linearization loss of GP-EKF and high computational cost of GP-PF, we do not consider them and only introduce the multisensor estimation fusion with GP-ADF and GP-UKF in this paper.

Distributed Estimation Fusion
In this subsection, we mainly present the distributed estimation fusion with GP-ADF. For the dynamic systems (1) and (2), assume that there is a prior on initial state x 0 and each local sensor sends its estimation to the fusion center. Thus, the posterior distribution of m th local state estimation is a Gaussian distribution N (µ em k , C em k ), where µ em k and C em k are calculated as follows with Then, the local posterior mean and covariance are transmitted to the fusion center to yield the posterior distribution of global state estimation.
In general, the centralized estimation fusion method with directly fusing raw data has better fusion performance than the distributed estimation fusion method based on the processed data. However, in some cases, the distributed estimation fusion is equivalent to the centralized estimation fusion. In particular, in what follows, we propose a distributed estimation fusion method and prove that the distributed estimation fusion method is equivalent to the above centralized estimation fusion method when all cross terms C x k y m k , m = 1, . . . , N, have full column rank. Proposition 1. Assume that all cross terms C x k y m k , m = 1, . . . , N are full column rank, the distributed estimation fusion is equivalent to the centralized estimation fusion as follows Proof. According to Equation (15), the mean of centralized fused state is given by Based on the assumption that all C x k y m k have full column rank, we obtain Thus, combining Equations (37) and (38) and Equation (39) yields In order to obtain the centralized estimation mean, that is, fuse the mean of local sensor state estimation at the fusion center, we use Equations (32) and (40) to eliminate y k from Equation (36). From Equation (32), we have Thus, substituting Equation (41) into Equation (40) and recalling Equations (36)-(38), we have Therefore, we obtain the state mean of the distributed estimation fusion.

Numerical Examples
In this section, we assess the performance of our fusion methods with two different examples.

1D Example
We consider the nonlinear dynamic system with two sensor measurements as follows: where z m , m = 1, 2 are given by z 1 = 0 and z 2 = π 4 , respectively. This nonlinear system is popularly used in many papers (see References [7,10]) to measure the performance of nonlinear filters. We regard the first 200 samples as the training data and use the rest 50 samples called testing data to test the fusion methods. Assume that the x −200 is a Gaussian distribution with prior mean µ −200 = 0 and variance σ 2 −200 = 1.5 2 . The estimation performance of the single sensor and multisensor fusion is evaluated by the average Mahalanobis distances that are also used in Reference [7]. The Mahalanobis distance, which is defined between the ground truth and the filtered mean, is as follows: where C e k is the estimated covariance. For the Mahalanobis distance, lower values indicate better performance [7]. Figures 2 and 3 show the average Mahalanobis distances of the single filter with GP-ADF and GP-UKF, and the fusion methods after 1000 independent runs. Figure 4 shows the average Mahalanobis distances for different system noises.
According to the average Mahalanobis distances in Figures 2 and 3, we can see that the estimation performance of Sensor 2 is better than that of Sensor 1. Furthermore, the multisensor estimation fusion methods perform better than the single sensor filters with GP-ADF and GP-UKF, respectively. Thus the effectiveness and advantages of the multisensor estimation fusion with Gaussian process can be observed. In addition, GP-ADF fusion performs better than GP-UKF fusion for the system noise σ 2 = 1 and GP-UKF fusion performs better than GP-ADF fusion for the system noise σ 2 = 2. It means that GP-ADF fusion and GP-UKF fusion have their comparative advantages for different system noises. It suggests that GP-ADF fusion may be a better choice for the small system noise and GP-UKF fusion may be worth considering for the relatively large system noise. From Figure 4, we can see that the proposed fusion methods enjoy better performance for different system noises. It demonstrates the consistence of our fusion methods for different system noises.

2D Nonlinear Dynamic System
In this subsection, we elaborate the performance of the fusion methods with GP-ADF and GP-UKF, and we consider the example with constant turn motion model in target tracking. The root mean square error (RMSE) is used as the estimation performance measure. It is defined as follows: where M is the total number of simulation runs, x j k is the true simulated state for the jth simulation, andx j k is the estimated state value for the jth simulation at time k, j = 1, . . . , M. The lower the value of the RMSE is, the better the performance of corresponding method is.

Experimental Setup
We consider the constant turn motion model with two sensors as follows: where the state transition matrix F is given by with angular velocity Ω, the covariance matrix of transition noise satisfies Q k = 5 0 0 5 , and the covariance matrix of measurement noise is The We use the transition function (46) and measurement functions (47) to simulate a group of data. The values of angular velocity Ω are given by Ω = 2π 90 rad/s and Ω = 0.5 rad/s, respectively, which correspond to the small turn motion model and large turn motion model. The first κ samples are regarded as the training data and the rest 50 samples called testing data are used to test the fusion methods. The {x τ , y 1 τ , y 2 τ } −1 τ=−κ are the true simulated state and two-sensor observations used to train the models. Assume that the x −κ is a Gaussian distribution with the prior mean µ −κ = [0, 0] T and the identity matrix variance S −κ = I. The initial value of filtering for each sensor is set as the Cartesian coordinate transformation value based on the Polar coordinate observation with a random perturbation for each dimension. We use the random perturbation N(10, 1) for the κ = 300 training data case and N(20, 1) for the κ = 30, 60 training data cases, respectively. The initial value of fusion filtering is the mean of the initial values of all sensors. Based on the available observation information, the outputs of GP-ADF and GP-UKF for each single sensor are the mean and covariance terms µ The distributed fusion methods, the RCC-CI algorithm [37] and the convex combination method [43], are also used as a comparison with our distributed fusion method. The RCC-CI algorithm and the convex combination method directly use the estimated mean and covariance matrix to fuse the estimated results. Our methods take full advantages of the results of the single sensor Gaussian process filters such as the cross term C x k y m k which is easily obtained. Note that all of the fusion methods are based on the local estimates that are distilled from the measurements. Thus, the comparison is fair from the available observation information. The simulation results of the RCC-CI algorithm are based on the YALMIP toolbox [55]. We compare the fusion methods with the RCC-CI algorithm and the convex combination method by RMSE after M = 1000 independent simulation runs. Meanwhile, we also compare the computation time of the multisensor estimation fusion methods with the RCC-CI algorithm and the convex combination method. The ratio of full rank, defined as the percentage of full column rank of the cross term C x k y m k for the single sensor filtering at every time step after M = 1000 independent runs, is used to test the condition of equivalence. If the ratio of full rank is less than 100%, the condition is broken. The simulations are done under Matlab (MathWorks, Inc., Natick, MA, USA) R2018b with ThinkPad W540. Figure 5 describes the ratios of full column rank in the single sensor case for testing data with κ = 300 training data and Ω = 2π 90 , 0.5 rad/s. The RMSEs for Ω = 2π 90 , 0.5 rad/s in the case of κ = 300 training data are depicted in Figures 6 and 7, respectively. The average computation time of these fusion methods for Ω = 2π 90 , 0.5 rad/s in the case of κ = 300 training data after 1000 independent simulation runs are depicted in Figures 8 and 9, respectively. The diamond line represents the RMSE of the centralized estimation fusion with GP-ADF (CMF-GP-ADF) and the star line represents the distributed estimation fusion with GP-ADF (DMF-GP-ADF). The circle line represents the RMSE of the centralized estimation fusion with GP-UKF (CMF-GP-UKF) and the x line represents the distributed estimation fusion with GP-UKF (DMF-GP-UKF). The upper triangle line represents the RMSE of the RCC-CI fusion algorithm with GP-ADF (RCC-CI-GP-ADF) and the lower triangle line represents the RMSE of the RCC-CI fusion algorithm with GP-UKF (RCC-CI-GP-UKF). The square line represents the RMSE of the convex combination method, that is, covariance weight, with GP-ADF (CW-GP-ADF) and the + line represents the RMSE of the convex combination method with GP-UKF (CW-GP-UKF). At the same time, solid line is used for the single sensor GP-ADF filter and dash-dotted line is used for the single GP-UKF filter. Similarly, the ratios of full column rank in the single sensor case for testing data with κ = 30, 60 training data and Ω = 2π 90 rad/s are depicted in Figures 10 and 11, respectively. The RMSEs for Ω = 2π 90 rad/s in the case of κ = 30, 60 training data are depicted in Figures 12 and 13, respectively. The average computation time of these fusion methods for testing data with κ = 30, 60 training data and Ω = 2π 90 rad/s after 1000 independent simulation runs are depicted in Figures 14  and 15, respectively.

Experimental Analysis
We divide the experimental analysis into two cases, that is, the equivalence condition is satisfied or not. Some phenomena and analysis are described as follows: Case 1: the equivalence condition is satisfied.
• From Figure 5, we can see that the ratios of full rank are all equal to 100%. It means that the cross terms of the single sensor filters are all full column rank for the κ = 300 training data case and thus the equivalence condition of centralized fusion and distributed fusion is satisfied in Proposition 1. Meanwhile, from Figures 6 and 7, the RMSE of the distributed estimation fusion is the same as that of the centralized estimation fusion based on GP-ADF and GP-UKF, respectively. It demonstrates the equivalence between the centralized estimation fusion and the distributed estimation fusion in Proposition 1 under the condition of full column rank. • From Figures 6 and 7, it can be seen that the RMSE of the multisensor estimation fusion method is lower than that of the single sensor filtering. It shows that the multisensor fusion improves the estimation accuracy. • In addition, the RMSE of multisensor estimation fusion method is lower than that of the RCC-CI algorithm and the convex combination method. It implies the effectiveness of the fusion methods based on Gaussian processes. The possible reason is that our estimation fusion methods extract more extra correlation information, and the RCC-CI algorithm and the convex combination method only use the local estimates with mean and covariance. • Compared Figure 6 with Figure 7, we can find that our methods do well in the different angular velocity cases, i.e., the small turn motion model and large turn motion model. It confirms that our fusion methods have fine applicability with better performance. • From Figures 8 and 9, we can find that the computation time of the distributed estimation fusion is less than that of the centralized estimation fusion. It demonstrates the superiority of the distributed estimation fusion under the same fusion performance. The computation time of the proposed fusion methods is much less than that of the RCC-CI algorithm. The possible reason is due to solve a optimization problem for the RCC-CI algorithm. The convex combination method takes the least computation time, since it directly uses the weight combination with covariance.  • From Figures 10 and 11, it can be seen that the ratios of full rank are both less than 100% for the single sensor GP-UKF case with κ = 30 and κ = 60 training data. Thus, the equivalence between the centralized and distributed estimation fusion with GP-UKF is broken in Figures 12 and 13, respectively. The reason may be that the Gaussian models are relatively inaccurate with less training data, which can be known from the comparison with Figures 6, 12 and 13 for the same methods. At the same time, the finite-sample approximation of GP-UKF seriously depends on the Gaussian process models and the computation way of the cross terms is the sum about rank-one matrices for GP-UKF. However, the equivalence is still satisfied for the GP-ADF fusion. It implies GP-ADF is more stable than GP-UKF, which is also referred to in Reference [7]. • We can also see that the performance of GP-UKF fusion is better than that of GP-ADF fusion with κ = 300 training data from Figure 6 and a little worse with κ = 30, 60 training data from Figures 12 and 13. Meanwhile, from Figures 8, 14 and 15, the average computation time of GP-UKF fusion is less than that of GP-ADF fusion with κ = 300 training data and is contrary with κ = 30, 60 training data. It may inspire us that the GP-UKF fusion is suitable for the enough training data case and GP-ADF fusion does well in the small number of training data case for the turn motion systems.
In a word, distributed estimation fusion with GP-ADF is more stable, and has a better performance in the case of the small number of training data. If we have enough training data, GP-UKF fusion may be the better choice.

Conclusions
In the context of this paper, the property matters if the real system does not closely follow idealized models or a parametric model cannot easily be determined for nonlinear systems. A non-parametric method, the Gaussian process, is introduced to learn models from training data, since it takes both model uncertainty and sensor measurement noise into account. In order to estimate the state of the multisensor nonlinear dynamic systems, we have used the Gaussian process models as the prior of the transition and measurement function of the dynamic system. Then the transition function and measurement function have been trained with Gaussian processes, respectively. Based on the Gaussian process models for all available local sensor measurements, we have developed two fusion methods, centralized estimation fusion and distributed estimation fusion, with GP-ADF and GP-UKF, respectively. Taking full advantage of the nature of the Gaussian process, the equivalence between centralized estimation fusion and distributed estimation fusion has been derived under mild conditions. Simulations show that the equivalence is satisfied under given conditions and the multisensor estimation fusion performs better than the single sensor filters. Compared with the RCC-CI algorithm, the multisensor estimation fusion methods not only have higher accuracy, but also require less computation time. The estimation performance of multisensor estimation fusion methods is also better than that of the convex combination method. Future work may involve state initialization, Gaussian process latent variable models without ground truth states, multiple model fusion with different Gaussian processes for maneuvering target tracking and validate the methods with the real sensor data.