Kullback–Leibler Divergence Based Distributed Cubature Kalman Filter and Its Application in Cooperative Space Object Tracking

In this paper, a distributed Bayesian filter design was studied for nonlinear dynamics and measurement mapping based on Kullback–Leibler divergence. In a distributed structure, the nonlinear filter becomes a challenging problem, since each sensor cannot access the global measurement likelihood function over the whole network, and some sensors have weak observability of the state. To solve the problem in a sensor network, the distributed Bayesian filter problem was converted into an optimization problem by maximizing a posterior method. The global cost function over the whole network was decomposed into the sum of the local cost function, where the local cost function can be solved by each sensor. With the help of the Kullback–Leibler divergence, the global estimate was approximated in each sensor by communicating with its neighbors. Based on the proposed distributed Bayesian filter structure, a distributed cubature Kalman filter (DCKF) was proposed. Finally, a cooperative space object tracking problem was studied for illustration. The simulation results demonstrated that the proposed algorithm can solve the issues of varying communication topology and weak observability of some sensors.


Introduction
Recently, space situational awareness (SSA) [1][2][3] has attracted more and more attention, because of its broad applications in space surveillance, tracking of objects in Earth orbit, monitoring the conditions of Earth's magnetosphere, ionosphere and thermosphere, etc. Among the varieties of SSA systems, the space-based sensor network [4] (e.g., distributed satellite system) has many advantages compared with ground-based ones, since there is no atmosphere and weather problems for space-based sensor networks. Object tracking is a key problem in SSA, because many space missions highly depend on the results of object tracking, i.e., threat detection, cooperative search and collision avoidance. Kalman filter-based estimation [5,6] plays a key role among tracking methods, due to its ability of real-time estimation and non-stationary process tracking.
The main challenging problem in tracking objects in Earth orbit is the strong nonlinear dynamics of objects. Existing approximate nonlinear filters can be roughly classified into two categories: linearization and sampling approaches. The linearization approach is based on linearizing the nonlinear dynamics and measurement map and then employing the classical Kalman filter equations. For example, the extended Kalman filter (EKF) [7,8] puts the Jacobians of dynamics and measurement maps into the structure of the Kalman filter to estimate the state and corresponding covariance. It should be noted that the EKF is highly effective and has a broad range of applications [9,10].

1.
A distributed Bayesian filter is developed, which can be treated as an extension of the traditional Bayesian filter [7] and an extension of distributed linear filters [18,19,31] to a nonlinear case. By maximizing a posterior estimation method, we show that the global posterior estimation can be achieved by consensus of each local posterior distribution, where the consensus of PDFs is obtained by an information-theoretic approach.

2.
Based on the developed distributed Bayesian filter structure, a distributed cubature Kalman filter (DCKF) is proposed, which can improve the effectiveness and practicality for applications. Different from the design in [26], the only global information we required is the number of sensor, which is more suitable for applications. 3.
The cooperative space object tracking problem is studied. Different from [26], we focus on the scenario in which the communication topology may change due to the blockage of the Earth.
Moreover, we also consider the case that measurement mapping of each sensor may differ, which will lead to the problem of weak observability for some sensors. The issues of weak observability and blockage are handled by the proposed DCKF.
The remainder of paper is organized as follows. The problem formulation is given in Section 2. Then, the distributed Bayesian filter is discussed, while a fully-distributed cubature Kalman filter is proposed in Section 3. Following that, numerical simulations on space object tracking are shown in Section 4. The discussions are provided in Section 5. Finally, the conclusion of this paper is provided in Section 6.
Notations: The superscript " " represents the transpose. E{x} denotes the mathematical expectation of the stochastic variable x. diag{·} represent the diagonalization scalar elements. tr(P) is the trace of the matrix P, and var(x) is the variance of x. p(·) denotes the probability density function (PDF), and N(0, U) is the Gaussian distribution with mean zero and variance matrix U.

Problem Formulation
In this section, we first give the formulating of distributed Bayesian filtering and describe the consensus of PDFs.

Distributed Bayesian Filter Formulation
Consider the following discrete-time stochastic non-linear dynamics, where x k ∈ R n is the state that needs to be estimated and w k is zero mean Gaussian noise with E{w k w k } = Q k . Dynamics (1) describes the state transition p(x k |x k−1 ). Assume that the state x k is observed by a network of N sensors, whose measurement model is given as: where z i,k ∈ R m i is the measurement by sensor i at time k. v i,k ∼ N(0, R i,k ) is the measurement noise of sensor i, where 0 ∈ R m i denotes the zero vector. Measurement Equation (2) describes the measurement likelihood function p(z i,k |x k ), i = 1, 2, . . . , N. In this paper, we assume that w k and v i,k ∀i ∈ V are independent, and all z i,k , ∀i ∈ V are conditionally independent of x k .
The communication of the network is modeled by an undirected graph G k = (V, E k , A k ), which consists of the set of sensors V = {1, 2, . . . , N}, the set of edges E k ⊆ V × V and the weighted adjacent matrix A k = [a ij,k ]. In the weighted adjacent matrix A k , all the elements are nonnegative, row stochastic, and the diagonal elements are all positive, i.e., a ii,k > 0, a ij,k ≥ 0, ∑ j∈V a ij,k = 1. If a ij,k > 0, j = i, there is an edge (i, j) ∈ E k , which means nodes i and j can directly communicate, and node j is called the neighbor of node i. The degree matrix is defined as where the diagonal element d i is the number of nodes connected to node i. All the neighbors of node i including itself can be represented by the set {j ∈ V|(i, j) ∈ E k } {i} N i,k , whose size is denoted as |N i,k |. In this paper, we assume that the undirected graph G k is connected for all k.
The adjacent matrix A k represents the weights of nodes. Note that for a certain graph, there exists an infinite number of associated adjacency matrices. To ensure the double stochastic nature of adjacent matrix A k , a possible choice of the weights [32] is: Denote the global measurement as z k = [z 1,k , z 2,k , . . . , z N,k ] . The relationship between z k and x k can be given as global likelihood function p(z k |x k ), and the relationship between x k and measurement z i,k can be described as local likelihood function p(z i,k |x k ). Under the assumption that the agents are independent of each other, the global likelihood function can be expressed as a product of local likelihood functions, In this paper, we assume that the state x k is conditional independent with all past measurements z 1:k−1 , i.e., p(x k |z 1:k ) = p(x k |z k ). Sensor i knows the dynamics (1) and local likelihood function p(z i,k |x k ) and does not know the global likelihood function p(z k |x k ). Sensor i can only communicate with its neighbors.
The aim of the Bayesian filter is to compute posterior distribution p(x k |z k ). The recursive solution to compute p(x k |z k ) consists of prediction and update steps. The predictive distribution of state x k can be given by the Chapman-Kolmogorov equation, The posterior can be given as: wherec = p(x k |z k−1 )p(z k |x k )dx k denotes the normalization. However, for the Bayesian filter (6) and (7), the computational complexity of state estimation is usually intractable. A computationally-feasible approximation is provided by the cubature Kalman filter [13,14], which uses cubature rules to compute numerical integration for multi-dimensional integrals. It has been shown that the CKF has better performance compared with EKF and UKF [13]. It can be seen from (6) and (7) that if one can access the global likelihood p(z k |x k ), the global estimatex k can be obtained. However, in our paper, each sensor only knows local likelihood function p(z i,k |x k ), and therefore, we have to propose a distributed approach to estimate x k . In Section 3, we proposed a distributed Bayesian filter based on the consensus of PDFs, which is obtained by the Kullback-Leibler divergence described in Section 2.2.

Consensus of Probability Densities
In this section, we will describe the consensus of probability density functions, which will be used to solve the distributed Bayesian filter problem in Section 3.
The traditional average consensus problem is defined under Euclidean space [32]. However, the measure under Euclidean space is not suitable for the probability distribution. For example, two normal distributions N(0, 10,000) and N(10, 10,000) are almost indistinguishable, the Euclidean distance between the parameter is 10. In contrast, the distributions N(0, 0.01) and N(0.1, 0.01) barely overlap, but this is not reflected in the Euclidean distance, which is only 0.1. A more natural measure between two densities is Kullback-Leibler divergence rather than Euclidean distance.
K-L divergence between two PDFs p(· ) and q(· ) is defined as: Following [19,33], the centroid depending on K-L divergence (CKL) is considered, which describes the centroid form initial PDFs,p where a i ≥ 0, ∀i are weights and should satisfy ∑ i∈N a i = 1. The centroid in (9) turns out to be [19], It is worth noting that the CKL can be seen as an example of Bregman information as the mean of Bregman divergence [34,35]. An important feature of (10) is that it is suitable for distributed computation. Namely, the CKL can be achieved by some consensus algorithms, which requires that the data are only transmitted between agents and their neighbors at each step. Thus, CKL can be computed by the consensus algorithm as follows, where t = 1, 2, . . . refer to the t-th step and a ij,k is the weights between agent i and j.
When the distribution is given, the consensus of PDFs will be achieved by manipulating the corresponding parameters. The following lemma shows how to compute consensus on the exponential family iteratively, which can be found in [36].
i ), ∀i ∈ N be exponential distribution families, where λ is a natural parameter. Then, iteratively update (11) given by: Remark 1. An exponential family can be expressed in the following form [37], where λ is a natural parameter, A g (λ) is a log-normalizer and h(θ) is a carrier measure [38]. The exponential families include many of the most common distributions, e.g., Gaussian, Poisson, Bernoulli, Wishart, and many other. Namely, those distributions can be written in the form of exponential families (13).

Remark 2.
It should be noted that K-L divergence defined in (8) is not symmetric. In [33], the authors discussed sided Bregman centroids, i.e., right-sided As shown in [33] (Theorem 3.1 in [33]),p R can be expressed as a convex combination of PFDs, i.e.,p R = ∑ N i=1 a i p i , which always is the center of mass. Notice thatp R is hard to obtain, if p i and p j are correlated. However, for the distributed estimation problem, p i from different nodes need to be fused at each time k, and consequently, they are correlated (see [17]). In this paper, we only consider the left-sided centroid (9), which is easy to compute as shown in (12).

Distributed Cubature Kalman Filter
In this section, we first discuss the distributed Bayesian filter based on the maximum a posterior method and propose a distributed cubature Kalman filter based on K-L divergence.

Distributed Bayesian Filter
The global posterior distribution can be given as: Notice that the predictive distribution is p(x k |z k−1 ) = N(x k ,P k ), and the likelihood function is . Therefore, under the assumption of Gaussian noises, we can obtain the log posterior distribution as follows, Rearranging the items of Equation (17), we obtain: whereC is a constant term that does not effect the estimate of x k . By the maximum a posteriori method, our problem becomes: where . Although the global cost function F k (x k ) over the whole network can be decomposed, we cannot independently minimize the local cost function f i,k (x k ) at each node to reach a global optimum. A key point is that the global cost function −F k (x k ) of the full measurements over the whole network is definitely larger than or equal to the average local cost function f i,k (x k ) over all nodes. Namely, where x * k is the optimal distribution minimizing −F(x k ) and x * k is the one minimizing f i,k (x k ). The equality in the second line holds if and only if x * k is also the optimal solution for all the local cost functions f i,k (x k ), which is not always the case. Therefore, we cannot find the optimal solution by individually minimizing the local cost function at each sensor. However, from (20), we can see that the global optimal solution over the whole network can be approximated by the average local optimal solution of each sensor. Therefore, we can construct a distributed approach to solve the problem based on average consensus.
Taking the derivative of f i,k with respect to x k , we have: Here, we use the fact as the optimal solution with respect to problem min x k f i,k . The estimatex i,k by sensor i can be obtained by letting x k f i,k be equal to zero, and we get: By the matrix inverse lemma, we have: Substituting (25) into (23), we obtain: The estimate error covariance can be computed as follows, where: Remark 3. It should be noted that P i,zz,k and P i,xz,k are a little different with the standard extended Kalman filter, even though they can be obtained by each sensor individually. With N = 1 in (26) and (27), it will reduce to the standard Kalman filter.
Up to now, we obtain the optimal solution of min x k f i,k asx i,k andP i,k , which follows the Gaussian distribution N(x i,k ,P i,k ). As discussed in (20), the global optimal solution can be approximated by averaging local estimate N(x i,k ,P i,k ). However, the traditional average consensus algorithm in Euclidean space may not be suitable to compute the average of PDFs. Therefore, we use the consensus of PDFs described in Section 2.2 to compute the global solution to Problem (19).
. Then, the global posterior distribution can be approximated by the consensus of probability densities (12) as follows, where s = 1, . . . , S is the step of the consensus. Then, the estimates of each node can be achieved by

Remark 4.
In [19], each sensor performed a standard Kalman filter, then the fusion estimation was obtained based on consensus of PDFs. It should be noted that, if N = 1, (26)- (27) will reduce to the measurement update of the standard Kalman filter. We derive an optimal solution of each sensor for Problem (19), which may achieve better performance compared to [19]. However, we should highlight that [19] provided meaningful information the theoretical expression for the distributed filter.
Equations (26)-(31) provide a general framework of the distributed Bayesian filter for posterior estimation. Based on the measurement update Equations (26)-(31), we can construct distributed nonlinear filtering by combining the existing method for state propagation. For example, in [29], the ensemble Kalman filter (EnKF) was used for state propagation, which uses the Monte Carlo technique for integral operation in the Bayesian filtering. In this paper, we use cubature rules for state propagation and measurement update, which we will discuss in the following.

Distributed Cubature Kalman Filter
Suppose that the state x k−1 is approximated by sensor i at time k − 1 as follows, where N(x, P) denotes the Gaussian distribution with mean x and covariance P. The predictive distribution p(x k |z i,k−1 ) = N(x i,k ,P i,k ) can be obtained by the prediction step of CKF. To be specific, a set of cubature points [13] can be provided by: where the basic cubature point is given by ξ t = m 2 × [1] t , t = 1, . . . , m, m = 2n x and [1] t denotes the t-th element of set [1]. For example, let [1] ∈ R 2 , then it represents the set 1 0 , Then, the predicted state and covariance are given by: DenoteP i,k =S i,kS i,k ; under the assumption that these errors can be well approximated by the Gaussian, the prediction measurement can be obtained as follows, where the set of cubature pointsZ i,t,k , t = 1, . . . , m is given by: In the Bayesian framework, these prediction means and covariances will be incorporated in the procedure as prior information of the state to propel the measurement update. Based on Equations (26) and (27), the local posterior can be given as follows, Different from the standard cubature Kalman filter, the innovation covariance matrix P i,zz,k and cross-covariance matrix P i,xz,k of node i can be given according to (28) and (29) as follows, By the consensus of Gaussian distributions (30), the global estimate can be approximated as: With the iterations (44) and (45), we can get the final estimation of state in each sensor. Meanwhile, the iterative estimation will approximate to the global solution of Problem (20) because of the convergence of the average consensus of PDFs as S → ∞. In practice, the convergence will not be achieved fully, because the total number of iterations S is finite. Therefore, the distributed implementation will not perform as well as the centralized one. We summarize the distributed cubature Kalman filter in Algorithm 1.
In (36), Q k can be chosen as a sufficiently small constant matrix. Notice that the initialization of all the local estimates is exactly the same mean of the initial state. However, in practice, it is not easy to let every sensor know the prior knowledge. A more suitable setting is x i,0 = 0, ∀i ∈ V and P i,0 = 0, ∀i ∈ V, which means that there does not exist prior knowledge.
In [26], the authors proposed DCIF for cooperative space object tracking. For comparison, we briefly summarize the main steps of DCIF in Table 1, where the prediction step is omitted due to it being the same as the prediction step of DCKF. In DCIF,z i,k and P i,xz,k can be obtained by (39) and (43), and 0 < < 1 ∆ max , ∆ max = max i {d i }. Local estimation: Algorithm 1 DCKF at node i at time k Ensure: At time k, a prior information P i,k−1 = S i,k−1 S i,k−1 andx i,k−1 ; • Prediction -Local estimate and estimate error covariance

end for •
Compute the estimatex i,k and covariance matrix P i,k ,

Remark 5.
The algorithm in [26] can approach the centralized solution, which is achieved by performing a consensus on information pairs, i.e.,H i,k R −1 i,k (ž i,k +H i,kxi,k ) andH i,k R −1 i,kH i,k , whereH i,k ≈P −1 i,k P i,xz,k is the pseudo measurement matrix andž i,k = z i,k −z i,k . The main limitation of [26] is that it needs a sufficiently large number of consensus steps at each time step, so that the local information pairs can spread to the whole network. It should be noted that we do not limit the range of S. In [19], the convergence properties were proven for such a fusion principle, even S = 1 in the linear dynamics case, which used the fact that the whole posterior PDFs are combined rather than the state or the information pairs. A distributed extended Kalman filter was discussed in [20], which needs the linearization of nonlinear dynamics and measurement mapping. We proposed a distributed cubature Kalman filter, which does not need linearization and can achieve better performance. On the other hand, we provide a structure for the distributed Bayesian filter with the help of K-L divergence, which could approach the centralized solution more efficiently compared with the one in Euclidean space.

Remark 6.
An important feature of proposed algorithm is that the only used global information is the number of sensors N, which is suitable in application. In [26], the author proposed a distributed cubature information filter (DCIF) for the cooperative tracking space object, where the number of sensor N and the maximum degree ∆ max of the network are needed. However, in practice, ∆ max may change and is not easy to obtain in time.

Numerical Simulations
In this section, we illustrate the effectiveness of the proposed DCKF for the space object tracking problem, where the scenario is shown in Figure 1. A distributed satellite system is observing a non-cooperative object, where the bearing-only measurement information is considered. The number of observation satellites is N = 6. In what follows, we first give the dynamics of the space object and measurement mapping by a distributed satellite system, then we solve the cooperative space object tracking problem by the proposed DCKF.

Dynamics of Space Target
The dynamics of space object can be described as follows, where r = [rx ry rz] represents the position of the object in the Earth-centered inertial (ECI) coordinate frame, µ is the gravitational constant, J 2 stands for perturbations and w is Gaussian noise with zero mean. Denote x = [rx, ry, rz,ṙ x,ṙ y,ṙz] as the state variables; we can rewrite (46) in state-space description as follows,ẋ where w = [w 1 , w 2 , w 3 ] is process noise, and the perturbation has the following form, where E r is the Earth radius, a J ≈ 0.00108263. Dynamics (47) is a continuous time model, thus (47) should be discretized in order to apply the EKF algorithm. Let T = t k+1 − t k be the sampling period, then the discrete model of (47) is described in the following, When T = t k+1 − t k is sufficient small, the Taylor expansion of f (x(t)) is: where A(x k ) has the following form: Combine (49)-(51), the discrete model can be given as: where x k ∈ R n x is the state that needs to be estimated. The discretized dynamics (52) is used in the prediction step of the EKF. It can be seen that the higher order terms are ignored in (50), which may enlarge the estimate errors. The state of the object is completely represented by x(t) ∈ R 6 , which includes its position and velocity. When we describe the satellite moving along an orbit, it is often represented in the form of six orbital parameters, i.e., the semi-major axis a h , the eccentricity e, the inclination u, the argument of perigee γ, the longitude of the ascending node Γ and the mean anomaly m h . The nonlinear transformations that converts position and velocity into orbital elements can be found in [39].

Measurement Model
The dynamics (47) is measured by a distributed satellites system. In this example, we consider a satellite equipped with an optical sensor that can obtain the azimuth α or elevation β of the object. Measurement mapping of azimuth α and elevation β can be expressed as follows, where [řx k ,řy k ,řz k ] is the position of the satellite at time k and va k and vb k are measurement noise at time k.
In [26], the authors assume that the measurement equation of different satellites is the same. However, in practice, this does not always hold. In this example, we assume that a satellite can obtain either or both of the azimuth α and elevation β, which has a broader range of application. The measurement equation of the i-th satellite can be written as follows, where N is the number of satellites. Due to blockage of the Earth, it is more suitable to model communication topology as time-varying. Our aim is to estimate state x k by a network of satellites. It should be noted that, if one satellite can only obtain azimuth α, then the estimates obtained only by this satellite will be very large or even divergent. Figures 2 and 3 show the mean square error (MSE) of EKF for space object tacking in a single satellite, where only the azimuth α can be measured. It can be seen from Figures 2 and 3 that both the MSE of position and velocity are very large. More importantly, since the Earth may block the communication between satellites, the communication topology may change. Therefore, it is not reasonable to know the information of the global communication topology for each satellite in real time.

Simulation Results
The trajectories of observation satellites and the true object are generated by the six orbital parameters as shown in Table 2. We consider the dynamics (47); the state of the object is x = [rx, ry, rz,ṙ x,ṙ y,ṙz] . The noise variance of the process is Q k = diag{10 −4 , 10 −4 , 10 −4 , 10 −6 , 10 −6 , 10 −6 }. The initial state of each agent was chosen randomly from N(x 0 , P 0 ), where x 0 = [x p0 x v0 ] is the true initial state of the object, P 0 = diag[(10 10 10 10  Numerical simulations are conducted through Monte Carlo experiment, in which 50 Monte Carlo trials are done for each tracking algorithm. The total of the mean square estimation errors (MSE) is considered, which is widely used to indicate the performance of estimates, which is defined as: The performance of mean square estimation for each satellite is defined as: The Runge-Kutta method is used to generate predictionx i,k in DCKF. Namely, the solution of nonlinear propagation X * i,t,k = f (X i,t,k−1 ) is computed by the Runge-Kutta method.  Figure 4. We assume each satellite can obtain both azimuth α and elevation β, where the measurement mapping is defined in (53) and (54). The measurement noise variances are generated by R i = i· diag{0.001 2 0.001 2 }, i = 1, . . . , N. The number of consensus is S = 1. For Simulation Case 1, the comparison of MSE curves of different satellites is shown in Figure 5. The filtering precision and stability of the proposed DCKF for different satellites can be seen. It also illustrates that the estimations of different satellite almost reach a consensus, which can increase the robustness of tracking. More importantly, the estimate of each sensor is stable and converges even if the number of consensus is S = 1, which can reduce the communication rate compared with DCIF in [26].  Simulation Case 2: In this case, we test the performance of the proposed DCKF for the switching topology, i.e., the communication is time-varying. To be specific, the communication topology is switching among the given topologies as shown in Figure 6. The setting of initialization and noise variances is the same as Simulation Case 1.
As shown in Figure 7, the filtering precision and stability of the proposed DCKF for different satellites are also demonstrated. It should be noted that, for the switching case, the estimate of each node only needs the information of its neighbors. However, the algorithm in [16] needs global information ∆ max (∆ max = 3, 2, 3 in Figure 6), where ∆ max = max i d i , d i is the degree of node i, which is not easy to obtain in time.
Simulation Case 3: In this case, we compare the performance of DCKF with the distributed extend Kalman filter (DEKF) in [40] and the distributed cubature information filter (DCIF) in [26]. The discrete model (52) is adopted for the time update in the DEKF algorithm.
We assume that each satellite can only obtain either azimuth α or elevation β for the tested filters. To be specific, Satellites 1, 3 and 5 can only obtain the azimuth α, and Satellites 2, 4 and 6 can only obtain the elevation β of the object. In this case, we assume the measurement noises of satellite i to be v i,k~N (0, 10 −4 ).    Figure 8 shows the comparison between DEKF and DCKF for different numbers of consensus. It can be seen that, for the cases S = 1, S = 3 and S = 10, the DCKF performs more accurately than the DEKF, since the DEKF suffers form linearization errors due to numerically-linearizing the nonlinear dynamics and measurement map, which will enlarge the estimate errors. As S increases, the DEKF is more accurate. This is due to the fact that the local information will spread to the whole network as S → ∞. Despite the weak observability, the proposed DCKF algorithm provides reasonable performance even for S = 1, which indicates that the proposed DCKF is more robust and suitable for real-time applications with weak observability of some sensors than the DEKF.   Figure 9 gives the comparison between DCIF and DCKF with different consensus numbers S under the weak observability condition. We also assume that Satellites 1, 3 and 5 can only obtain the azimuth α and Satellites 2, 4 and 6 can only obtain the elevation β of the object. It can be seen that, under the weak observability condition, all test filters can successfully track, and the estimation errors of the position of DCKF and DCIF are almost the same. However, there is a large velocity overshoot in DCIF, which indicates that the DCKF enjoys a stronger stability property than the DCIF.  The computational time of a single satellite of different filters is given in Table 3. All tests are operated on a notebook with an Intel central processing unit (i7 4510U) and MATLAB. Note that the computational time of the DCKF is longer than that of the DEKF, and the computational time of the DCIF is twice as long as the DCKF.

Discussion
The distributed Bayesian filter design has been researched, and a distributed cubature Kalman filter was proposed to deal with the time-varying topology and weak observability of sensors. It can be seen from Figures 2 and 3 that the standard EKF cannot provide good results for weak observability of a single sensor. When it comes to the distributed setting, for the nodes with weak observability, both DCKF and DEKF can obtain stable estimation, and DCKF performs better than DEKF. From Figure 7, it also can be seen that DCKF is suitable for the case of switching topology. Namely, the proposed DCKF can handle the problem of blockage of the communication channel in time. Figure 9 illustrates that the proposed DCKF enjoys stronger stability properties than DCIF for the case of weak observability of some sensors, by noting that large velocity overshoot in the DCIF. A possible reason for such satisfactory performance of DCKF is that the global posterior PDF is considered for distributed estimation rather than just the innovations, as in DCIF. Moreover, the number of consensus in the DCKF could be one, which will conserve communication resources. However, we should highlight that the DCIF in [26] can approach the centralized solution as the number of consensus tends to infinity, and the DCKF in our paper cannot.
The K-L divergence for average consensus can be treated as a convex combination of the information matrices and vectors. This convex combination is well known as covariance intersection (CI) in the literature [41,42]. It is well known that the CI scheme provides an information fusion that is robust with respect to the unknown correlations among the information sources. The stability of such a fusion strategy for distributed estimation has been proven in [19] for linear time-invariant dynamics, and the results were extended to distributed EKF in [40].
A key point in our paper is that the global cost function (19) has a form of "sum-of-cost", which is amenable to distributed implementations [32,43,44]. Information geometric optimization approaches can be used to construct the formulation, in which the natural gradient descent method is used to seek the optimal estimation. For example, in [29,30], the natural gradient descent method was used to construct the Bayesian nonlinear filter. In the distributed setting, a global optimal estimate can be obtained under the structure of the distributed convex optimization ( [43]) by natural gradient descent.
In summary, the proposed DCKF has the advantages of strong stability and being more suitable for the cooperative object tracking problem compared to DEKF and DCIF. Future research issues mainly include the problems of measurement and communication delay, which will broaden the application of DCKF.

Conclusions
In this paper, we investigated the distributed Bayesian filter and proposed a distributed cubature Kalman filter. In order to solve the problems of weak observability and time-varying communication topology, we introduced Kullback-Leibler (K-L) divergence to measure the difference of local estimates, and the consensus estimate is achieved under the K-L average of local estimates. The simulation results indicate that, for the distributed space object tracking problem, the proposed DCKF has better results than DEKF and DCIF. Moreover, the proposed algorithm does not rely on the same measurement mapping of each sensor and can successfully track a space object for the time-varying communication topology and weak observability of some sensors.