Distributed Consensus Multi-Distribution Filter for Heavy-Tailed Noise

Abstract: Distributed state estimation is one of the critical technologies in target tracking, where both the process noise and the measurement noise may follow a heavy-tailed distribution. Traditionally, heavy-tailed distributions such as the student-t distribution are employed, but our observation reveals that Gaussian noise predominates in many instances, with only occasional outliers. This sporadic reliance on heavy-tailed distributions can degrade performance or necessitate frequent parameter adjustments. To overcome this, we introduce a novel distributed consensus multi-distribution state estimation method that combines Gaussian and student-t filters. Our approach establishes a system model using both Gaussian and student-t distributions. We derive a multi-distribution filter for a single sensor, assigning probabilities to the Gaussian and student-t noise models. Parallel estimation under both distributions, using Gaussian and student-t filters, allows us to calculate the likelihood of each distribution. The fusion of these results yields a mixed-state estimate and the corresponding error matrix. Recognizing that the degrees of freedom of the student-t distribution increase over time, we provide an effective approximation. An information consensus strategy for multi-distribution filters is introduced, achieving global estimation through consensus on the fused local filter results via interaction with neighboring nodes. This methodology is extended to the distributed case, and the recursive process of the distributed multi-distribution consensus state estimation method is presented. Simulation results demonstrate that the estimation accuracy of the proposed algorithm improves by at least 20% compared with traditional algorithms in scenarios involving both Gaussian and heavy-tailed distributions.


Introduction

Background
Sensor networks have garnered significant attention in recent years in distributed applications such as environmental monitoring, power grid systems and traffic management [1][2][3]. Distributed target tracking, or state estimation, is one of the classical applications; its core is the use of local communication and measurement, through information exchange, to obtain a state estimate of the entire system. However, interference may lead to heavy-tailed measurement noise, as shown in Figure 1, which can degrade estimation performance. Thus, improving estimation performance in distributed scenarios with heavy-tailed noise remains a challenging problem.

Related Work
There are three main types of distributed state estimation methods: consensus methods, gossip methods and diffusion methods. In gossip-based methods [4,5], each sensor randomly selects one or several connected neighbors to send information to. Although they have a low communication cost, gossip-based methods converge slowly. Diffusion methods use a one-step convex combination of the received data [6,7], so they cannot converge to a consensus result. In contrast, consensus methods can ensure global convergence in many cases [8,9]; thus, they are the most widely used. Combining consensus with Kalman filtering is the most direct approach, and there are four main variants: the consensus on estimation (CE) method, the consensus on measurement (CM) method, the consensus on information (CI) method and the hybrid CM-CI (HCMCI) method [10,11]. Since the CE method does not consider the error covariance matrix, its estimation results are conservative [12]. To solve this problem, the CM method was proposed. However, the bounded estimation error of the CM method can only be achieved when the number of consensus steps is large enough [8]. The CI method reaches consensus on the information vector and the information matrix, which ensures a bounded estimation error for any number of consensus steps, although new information is inevitably under-weighted [13][14][15]. The HCMCI method can handle both prior and likelihood consensus, combining the advantages of the CM and CI methods. In addition, there are consensus algorithms [16] and consensus algorithms based on the Luenberger observer [17].
However, the above methods assume that the system noise obeys a Gaussian distribution, an assumption that is easily violated in practice. Due to unmodeled anomalies, sudden environmental disturbances, unreliable sensors, target maneuvers, system failures or attacks on the system, the system will suffer from outliers [18]. In such scenarios, relying solely on the Gaussian noise assumption for state estimation leads to severe performance degradation and potential divergence [19]. Outlier-contaminated distributions typically exhibit heavy-tailed characteristics, prompting the use of heavy-tailed distributions to accurately characterize system states and measurements [20]. The student-t distribution serves as an effective model for heavy-tailed distributions, especially for handling non-Gaussian outliers, rendering it a widely adopted solution in this context [21]. In [22], the sensor measurement noise was modeled as a multivariate student-t distribution using the CI consensus strategy, where the state and the noise parameters are estimated simultaneously. Combined with the variational Bayesian (VB) method, the joint posterior distribution is processed.
However, the above method can only handle the situation where the measurement noise follows a heavy-tailed distribution. To deal with the case where both the process noise and the measurement noise are heavy-tailed, ref. [23] modeled both noises as student-t distributions and proposed a distributed state estimation method based on a student-t filter and the CI consensus strategy. In practical settings, noise predominantly adheres to a Gaussian distribution, with outliers occurring only sporadically. Given this, an intuitive approach is to model the system with a mixture of Gaussian and student-t distributions that captures the dynamic nature of the noise. This approach capitalizes on the strengths of both distributions, aiming to attain precise and consistent state estimates. Grounded in this framework, a resilient filtering algorithm using a blend of Gaussian and student-t distributions was introduced for single-sensor state estimation [24]. Nevertheless, the iterative calculation of its parameters poses a challenge, particularly in distributed systems characterized by lightweight architectures and limited communication bandwidth.
Many studies have thus focused on robustness. However, in practical scenarios, heavy-tailed noise is a low-probability event, and the model noise still presents as a Gaussian distribution most of the time. Although the robust filters mentioned above perform well in the presence of outliers, they lose estimation accuracy when the noise is normal. Therefore, how to balance robustness and estimation accuracy while obtaining consensus results is a significant problem.
Motivated by these, this paper proposes a distributed consensus state estimation method based on the Gaussian distribution and the student-t distribution.A comparison of related studies is shown in Table 1.
The fusion of the results from these distributions provides a mixed-state estimate and the corresponding error matrix. (2) Addressing the challenge of the increasing degrees of freedom of the student-t distribution over time by providing an effective approximate solution, ensuring stability and accuracy in the estimation process. (3) The introduction of an information consensus strategy for multi-distribution filters, enabling global estimation by achieving consensus on the fused local filter results through interaction with neighboring nodes. The results are extended to the distributed case, and the recursive process presented further validates its efficacy, as supported by simulation results showcasing its performance in scenarios involving both Gaussian and heavy-tailed distributions.
The organization of this paper is shown in Figure 2. Section 2 presents the problem, Section 3 presents the proposed distributed multi-distribution consensus state estimation algorithm, the simulation results are presented in Section 4, and the conclusions are given in Section 5.

Preliminaries and Problem Formulation
We consider a sensor network with node set N and edge set A. The network can then be represented as the two-tuple (N, A). The set of neighbor nodes of node i ∈ N (including node i itself) is N_i. The process equation of the system is assumed to be discrete-time linear:

x_{k+1} = F_k x_k + w_k,

where x_k is the n_x-dimensional state vector at time k, F_k is the state transition matrix, and w_k is the process noise at time k.
The measurement equation of each node i ∈ N is also linear:

z^i_k = H^i_k x_k + v^i_k,

where z^i_k is the n^i_z-dimensional measurement vector of node i ∈ N at time k, H^i_k is the measurement matrix of node i ∈ N, and v^i_k is the measurement noise of node i ∈ N at time k.
In general, the noises can be assumed to follow a Gaussian distribution. However, unknown perturbations may lead to outliers characterized by heavy-tailed noise. To deal with this problem, many studies have used distributions that are insensitive to outliers to model the noise.
The student-t distribution is one of them. A random variable x obeys the student-t distribution if its probability density satisfies the following [20]:

St(x; m, P, η) = Γ((η + n_x)/2) / [Γ(η/2) (ηπ)^{n_x/2} |P|^{1/2}] · [1 + (1/η)(x − m)^T P^{−1} (x − m)]^{−(η + n_x)/2},

where m is the mean, η is the degrees of freedom (dof), P is a positive definite symmetric scale matrix, Γ denotes the Gamma function, and n_x is the dimension of x. The variables of the system model are summarized in Table 2. Note that the variance of the student-t distribution is (η/(η − 2))P for dof η greater than 2; otherwise, the variance is undefined. When η = 1, the distribution reduces to the Cauchy distribution, and as η tends to infinity, it becomes the Gaussian distribution. Compared with the Gaussian distribution, the student-t distribution shows heavy-tailed characteristics: when moving away from the mean, the density does not decrease as quickly as the Gaussian density does. Figure 3 presents logarithmic plots of the Gaussian and student-t distributions. The log-density of the Gaussian distribution quickly drops below −16, while that of the student-t distribution does not drop as fast. Due to this heavy-tailed property, the student-t distribution is effective in dealing with heavy-tailed noise. However, it may not describe normal noise well. In practical scenarios, Gaussian noise fits most situations, while heavy-tailed noise is a low-probability event. Thus, modeling with only the student-t distribution improves the robustness of the algorithm but loses estimation accuracy.
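To make the heavy-tail contrast concrete, the following sketch (our own illustration, not part of the paper; univariate case, function names are ours) compares the Gaussian and student-t log-densities. Far from the mean, the Gaussian log-density falls off quadratically while the student-t log-density falls off only logarithmically:

```python
import math

def gaussian_logpdf(x, m, p):
    """Log-density of a univariate Gaussian N(x; m, p) with variance p."""
    return -0.5 * (math.log(2 * math.pi * p) + (x - m) ** 2 / p)

def student_t_logpdf(x, m, p, eta):
    """Log-density of a univariate student-t St(x; m, p, eta)."""
    const = (math.lgamma((eta + 1) / 2) - math.lgamma(eta / 2)
             - 0.5 * math.log(eta * math.pi * p))
    return const - (eta + 1) / 2 * math.log(1 + (x - m) ** 2 / (eta * p))

# Near the mean the two densities are comparable; far from the mean the
# Gaussian log-density collapses while the student-t tail stays much heavier.
for x in (0.0, 3.0, 8.0):
    print(x, gaussian_logpdf(x, 0.0, 1.0), student_t_logpdf(x, 0.0, 1.0, 3.0))
```

Running this reproduces the qualitative behavior of Figure 3: at x = 8 the Gaussian log-density is already far below the student-t one.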
The problem is thus how to balance robustness and estimation accuracy in the consensus estimation results of sensor networks in the presence of possible heavy-tailed noise.

Main Results
The proposed multi-distribution filter, based on Gaussian and student-t distributions for a single local sensor operating without information interaction, is introduced first, together with its inherent limitations, and an approximate method is provided to mitigate these shortcomings. Subsequently, a CI-based consensus strategy is proposed for scenarios involving mixed Gaussian and student-t distributions. Finally, building on this consensus strategy and the single-sensor multi-distribution filter, an algorithm for distributed multi-distribution filtering is proposed.

The Multi-Distribution Filter Based on Gaussian Distribution and Student-t Distribution
In this subsection, we present a multi-distribution filter for a single sensor exposed to heavy-tailed process and measurement noises. Exclusive reliance on a student-t filter often results in prolonged performance degradation or necessitates frequent parameter readjustments during normal system operation. Conversely, relying solely on a Gaussian filter tends to cause divergence when outliers occur. To harness the strengths of both filters, we present the following two hypotheses for a single sensor node (omitting the superscript representing the sensor node in this subsection):

H_0: the process and measurement noises obey Gaussian distributions:

p(w_k) = N(w_k; 0, Q_k), p(v_k) = N(v_k; 0, R_k),

where N(x; m, P) denotes that x obeys the Gaussian distribution with mean m and covariance P.

H_1: the process and measurement noises obey student-t distributions:

p(w_k) = St(w_k; 0, Q_k, η_{k−1}), p(v_k) = St(v_k; 0, R_k, η_{k−1}),

where the probability of H_0 is μ^0_k, the probability of H_1 is μ^1_k, and H = {0, 1}. Now, we need to assign a filter to each distribution. For the Gaussian hypothesis, since the system model is linear, the standard Kalman filter can be used: given the initial values x^0_0 and P^0_0, when the time step k > 1, the following recursion is performed: (1) time update x^0_{k|k−1} = F_{k−1} x^0_{k−1}, P^0_{k|k−1} = F_{k−1} P^0_{k−1} F^T_{k−1} + Q_{k−1}; (2) measurement update K_k = P^0_{k|k−1} H^T_k (H_k P^0_{k|k−1} H^T_k + R_k)^{−1}, x^0_k = x^0_{k|k−1} + K_k (z_k − H_k x^0_{k|k−1}), P^0_k = (I − K_k H_k) P^0_{k|k−1}. For the student-t hypothesis, we use the student-t filter: given the initial values x^1_0, P^1_0 and η_0, when the time step k > 1, an analogous recursion of (1) time update and (2) measurement update is performed, in which the predicted and updated densities are student-t and the dof η_k is updated alongside the scale matrix. The detailed derivation of the student-t filter can be found in [25].
Assume that the state posterior obeys a mixture of the Gaussian and student-t distributions with probabilities μ^0_k and μ^1_k, respectively. According to the total probability theorem,

p(x_k | Z_k) = Σ_{r∈H} μ^r_k p_r(x_k | Z_k),

where Z_k = {z_1, z_2, ..., z_k} is the measurement set up to time k. To obtain the posterior probability density function (PDF), the required parameters are the probability μ^r_k of each distribution, the state estimate x^r_k and the matrix P^r_k; for the student-t distribution, the dof η_k is also required. Apart from the distribution probability μ^r_k, all of these are produced by the two parallel filters. Given the distribution probabilities μ^r_{k−1} of the previous time step, the probability that hypothesis H_r is correct is

μ^r_k = μ^r_{k−1} Λ^r_k / Σ_{s∈H} μ^s_{k−1} Λ^s_k,

where Λ^r_k is the measurement likelihood under H_r: for the Gaussian filter, Λ^0_k = N(z_k; H_k x^0_{k|k−1}, S^0_k), with S^0_k the innovation covariance, and for the student-t filter, Λ^1_k = St(z_k; H_k x^1_{k|k−1}, S^1_k, η_{k−1}). The mean of the mixed posterior distribution is

x_k = Σ_{r∈H} μ^r_k x^r_k.

For the Gaussian distribution, the variance is P̃^0_k = P^0_k, and for the student-t distribution, the variance is P̃^1_k = (η_k/(η_k − 2)) P^1_k. The covariance of the mixed posterior distribution is therefore

P_k = Σ_{r∈H} μ^r_k [P̃^r_k + (x^r_k − x_k)(x^r_k − x_k)^T].

Thus, we obtain a complete recursive step of the multi-distribution filter.
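The mixing step above can be sketched as follows (our own illustration; the helper name and the 1-D example are assumptions, and the student-t covariance is assumed to be passed in already converted to its variance η/(η−2)·P):

```python
import numpy as np

def mix_posteriors(mu_prev, likelihoods, means, covs):
    """Fuse the Gaussian (r = 0) and student-t (r = 1) filter outputs.

    mu_prev:     prior model probabilities [mu0, mu1]
    likelihoods: measurement likelihoods  [L0, L1] of the two filters
    means:       posterior means          [x0, x1]  (n-vectors)
    covs:        posterior *variances*    [P0, P1]  (student-t already
                 converted via eta/(eta-2) * P)
    """
    mu = np.asarray(mu_prev, dtype=float) * np.asarray(likelihoods, dtype=float)
    mu = mu / mu.sum()                        # Bayes update of the model probabilities
    x = mu[0] * means[0] + mu[1] * means[1]   # mixed posterior mean
    # Mixed covariance: per-model variance plus the spread-of-the-means term.
    P = sum(mu[r] * (covs[r] + np.outer(means[r] - x, means[r] - x))
            for r in range(2))
    return mu, x, P
```

For instance, with equal prior probabilities and a Gaussian likelihood nine times larger than the student-t one, the updated probabilities become [0.9, 0.1] and the mixed mean leans correspondingly toward the Gaussian estimate.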
With each measurement update, the dof increases according to Equation (24). In turn, this requires an increase in the dof of the noise, making the model more and more Gaussian. In fact, the algorithm converges to the Kalman filter after several time steps. Therefore, an approximate method is necessary.
One of the simplest remedies is to enforce η_k = η̄ in Equation (24), where η̄ is a constant, so that the dof does not grow indefinitely. However, the actual posterior density is a student-t density with the grown dof (we omit the conditioning here). We therefore need a posterior density q(x_k) = St(x_k; x^1_k, P̃^1_k, η̄) that approximates p(x_k). The qualitative characteristics should be retained, so the adjusted matrix parameter should be a scaled version of the original matrix. The general problem is to find a density q(x) = St(x; x̄, cP, η̄) that approximates p(x) = St(x; x̄, P, η) as closely as possible, where the density is controlled by the scaling factor c > 0. Our problem is thus to find c so that the two probability densities are close in a certain sense. A suggested method is moment matching, i.e., making the variances of p(x) and q(x) equal. Its advantages are that it is simple to apply and requires no tuning parameters. The matching condition is

c (η̄/(η̄ − 2)) P = (η/(η − 2)) P,

where η > 2 and η̄ > 2. The scale factor is then

c = (η/(η − 2)) · ((η̄ − 2)/η̄).

The PDFs of the process and measurement noises can be approximated in the same way.
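The scale factor follows directly from the matching condition; a small sketch (the function name is ours):

```python
def moment_match_scale(eta, eta_bar):
    """Scale factor c such that St(x; m, c*P, eta_bar) has the same
    covariance as St(x; m, P, eta).  Requires eta > 2 and eta_bar > 2,
    since the student-t variance eta/(eta-2)*P is otherwise undefined."""
    if eta <= 2 or eta_bar <= 2:
        raise ValueError("both dofs must exceed 2 for the variance to exist")
    return (eta / (eta - 2)) * ((eta_bar - 2) / eta_bar)
```

For example, capping a grown dof of η = 3 back to η̄ = 10 gives c = (3/1)·(8/10) = 2.4, i.e., the scale matrix must be inflated to keep the variance unchanged; as η̄ → ∞ the factor tends to η/(η − 2), recovering the Gaussian approximation used later in the consensus section.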

Consensus on Mixed Density
The basic idea of consensus is to compute a network-wide aggregate by iteratively performing the same type of regional computation at each node, involving only that node's set of adjacent nodes. Here, consensus is used to average the state PDF at each node with the state PDFs received from its neighbors. Given PDFs p(·) and q(·) and a weight π, define the information fusion and weighting operations

(p ⊕ q)(x) = p(x) q(x) / ∫ p(x) q(x) dx, (π ⊙ p)(x) = p(x)^π / ∫ p(x)^π dx.

The Kullback-Leibler average (KLA), which depends on relative entropy, is an averaged-information definition for PDFs. The weighted KLA of the PDFs {p_i(·)} is defined as

p̄ = arg min_p Σ_{i∈N} π_i KL(p ∥ p_i),

where π_i > 0 and Σ_{i∈N} π_i = 1 are the relative weights, and KL(p ∥ p_i) is the Kullback-Leibler divergence (KLD) between p(·) and p_i(·). The consensus problem can be described as finding a way to make lim_{ℓ→∞} p^ℓ_i = p̄ at every node, where the asymptotic PDF p̄(·) is the KLA with equal weights. The solution to this problem is the collective KLA: the weighted KLA is given by the normalized weighted geometric mean of the PDFs,

p̄(x) = Π_{i∈N} p_i(x)^{π_i} / ∫ Π_{i∈N} p_i(x)^{π_i} dx.

It can be computed in a distributed manner by updating the local PDFs with a convex combination of the neighbors' data,

p^{ℓ+1}_i = ⊕_{j∈N_i} (π_{i,j} ⊙ p^ℓ_j),

where ℓ is the number of consensus iteration steps and π_{i,j} is the consensus weight satisfying π_{i,j} ≥ 0 and Σ_{j∈N_i} π_{i,j} = 1; π_{i,j} is also the (i, j) component of the consensus matrix Π (with π_{i,j} = 0 if j ∉ N_i). The initial value of the iteration is p^0_i = p_i. Let π^ℓ_{i,j} be the (i, j) component of the matrix Π^ℓ. Then,

p^ℓ_i = ⊕_{j∈N} (π^ℓ_{i,j} ⊙ p_j).

When π_{i,j} is chosen so that the matrix Π is primitive and doubly stochastic, lim_{ℓ→∞} π^ℓ_{i,j} = 1/|N|. Therefore, as the number of consensus steps increases, each local PDF tends to the unweighted KLA.
For Gaussian PDFs p_i(x) = N(x; m_i, P_i), it can be proved that the PDF consensus algorithm simplifies to algebraic recursions involving only the information vector q_i = (P_i)^{−1} m_i and the information matrix Ω_i = (P_i)^{−1}:

q^{ℓ+1}_i = Σ_{j∈N_i} π_{i,j} q^ℓ_j, Ω^{ℓ+1}_i = Σ_{j∈N_i} π_{i,j} Ω^ℓ_j.

This is the so-called CI consensus method.
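The CI recursion can be sketched as follows (our own illustration; function and variable names are ours). The example below uses scalar information quantities for readability, but the same code works unchanged with NumPy matrices and vectors:

```python
def ci_consensus(omegas, qs, Pi, steps):
    """L-step CI consensus on information pairs (Omega_i, q_i).

    omegas, qs: per-node information matrices and information vectors
    Pi:         consensus weight matrix, Pi[i][j] = 0 unless j is a
                neighbour of i (node i included); rows sum to 1
    steps:      number of consensus iterations L
    """
    omegas, qs = list(omegas), list(qs)
    n = len(omegas)
    for _ in range(steps):
        # Each node replaces its pair with the weighted sum of its
        # neighbours' pairs (zero weights make non-neighbours drop out).
        omegas = [sum(Pi[i][j] * omegas[j] for j in range(n)) for i in range(n)]
        qs = [sum(Pi[i][j] * qs[j] for j in range(n)) for i in range(n)]
    return omegas, qs
```

After the iterations, node i recovers its estimate as m_i = Ω_i^{-1} q_i. On a fully connected three-node network with uniform weights, a single step already drives every node to the network average.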
In the section above, the multi-distribution filter for a single sensor was given. To extend it to the distributed case, two important problems must be solved: (1) for the mixed posterior distribution, which information should be transferred between adjacent nodes, and what strategy should be adopted to realize the collective KLA; and (2) since the mixed distribution is non-Gaussian, whether the CI consensus strategy for the Gaussian distribution can be applied directly to the mixture of Gaussian and student-t distributions.
For the first problem, the following consensus strategy is given: consensus is first run on the distribution probabilities μ^r_k to obtain consensus distribution probabilities; the fused PDF p_i(x) = Σ^1_{r=0} μ^r_k p_{i,r}(x) is then taken as the initial value, and consensus is run on the fused PDF. This strategy only requires transferring the distribution probabilities and the fused PDF.
It should be noted that the above consensus method applies to continuous PDFs. The distribution probability, however, is discrete, so its distribution is a probability mass function (PMF). Given PMFs μ_i (i ∈ N), the weighted KLA is defined as

μ̄ = arg min_μ Σ_{i∈N} π_i KL(μ ∥ μ_i).

For PMFs μ, ν and a weight π > 0, the corresponding information fusion and weighting operations are

(μ ⊕ ν)(r) = μ(r) ν(r) / Σ_{s∈H} μ(s) ν(s), (π ⊙ μ)(r) = μ(r)^π / Σ_{s∈H} μ(s)^π.

The KLA probability mass function can then be expressed as

μ̄(r) = Π_{i∈N} μ_i(r)^{π_i} / Σ_{s∈H} Π_{i∈N} μ_i(s)^{π_i},

and the collective fusion of the PMFs can be obtained in a distributed manner:

μ^{ℓ+1}_i = ⊕_{j∈N_i} (π_{i,j} ⊙ μ^ℓ_j),

where π_{i,j} > 0 is the consensus weight and Σ_{j∈N_i} π_{i,j} = 1.

Before answering the second question, let us first consider how to approximate a student-t distribution with a Gaussian one, i.e., how to find a scalar c that minimizes the difference between q(x) = N(x; x̄, cP) and p(x) = St(x; x̄, P, η) under a certain criterion. We know that as the dof of the student-t distribution tends to infinity, the distribution tends to the Gaussian distribution, so q(x) = N(x; x̄, cP) ≈ St(x; x̄, cP, η → ∞). The problem thus becomes an approximation between two student-t distributions, and the moment matching method yields

c = η/(η − 2).

Therefore, q(x) = N(x; x̄, (η/(η − 2))P) can be used to approximate p(x) = St(x; x̄, P, η). The fusion process can then be seen as the fusion of Gaussian distributions, so we can directly use the Gaussian-based CI consensus method.
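One consensus step on the discrete distribution probabilities, i.e., the normalized weighted geometric mean above, can be sketched as follows (our own illustration; the function name is ours):

```python
def pmf_consensus_step(mus, Pi):
    """One consensus step on the model-probability PMFs: each node takes
    the normalized weighted geometric mean of its neighbours' PMFs.

    mus: list of PMFs, one per node, e.g. [[mu0, mu1], ...]
    Pi:  consensus weight matrix with zero weight for non-neighbours
    """
    n, H = len(mus), len(mus[0])
    out = []
    for i in range(n):
        fused = [1.0] * H
        for j in range(n):
            if Pi[i][j] > 0:                 # only neighbours contribute
                for h in range(H):
                    fused[h] *= mus[j][h] ** Pi[i][j]
        total = sum(fused)                   # renormalize to a valid PMF
        out.append([f / total for f in fused])
    return out
```

For two connected nodes holding [0.9, 0.1] and [0.5, 0.5] with equal weights, one step yields [0.75, 0.25] at both nodes: the geometric mean pulls the uninformative node toward the confident one while keeping a valid PMF.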

The Distributed Multi-Distribution Filter
We have previously obtained the algorithm for a single sensor and the consensus strategy for multi-sensor mixed density.Now, we need to extend the results to the distributed case.
For local node i, the initial values x^{i,r}_0, P^{i,r}_0 and μ^r_0 are given for r ∈ H. For the student-t filter, the common dof η̄ is also given. When k > 0, the following recursive process is started.

Step 1 Local filtering: each node runs the Gaussian filter and the student-t filter in parallel to obtain x^{i,r}_k and P^{i,r}_k for r ∈ H.

Step 2 Calculate distribution probability
Update the probability

μ^{i,r}_k = μ^{i,r}_{k−1} Λ^{i,r}_k / Σ_{s∈H} μ^{i,s}_{k−1} Λ^{i,s}_k,

where the likelihoods Λ^{i,0}_k and Λ^{i,1}_k of the Gaussian and student-t filters are obtained from Equations (27) and (28).

Step 3 Consensus on distribution probability
For L-step consensus,

μ^{i,ℓ}_k = ⊕_{j∈N_i} (π_{i,j} ⊙ μ^{j,ℓ−1}_k), ℓ = 1, 2, ..., L,

where ℓ is the number of consensus steps and π_{i,j} is the consensus weight, with π_{i,j} ≥ 0 and Σ_{j∈N_i} π_{i,j} = 1.

Step 4 Fuse the mixed PDF. Using the consensus distribution probabilities, each node fuses its two posteriors into p_i(x) = Σ^1_{r=0} μ^r_k p_{i,r}(x), approximating the student-t component by a Gaussian via moment matching.

Step 5 Consensus on the fused PDF. For L-step consensus, the CI recursion is run on the information pairs of the fused Gaussian approximations.

Step 6 Reinitialization. The consensus result is used to reinitialize the local filters for the next time step.

The workflow of the proposed algorithm is shown in Figure 4, and the pseudocode is summarized in Algorithm 1.

Numerical Simulation
Consider a linear dynamic system whose state is x = [p_x, ṗ_x, p_y, ṗ_y]^T:

x_{k+1} = F x_k + w_k, F = [1 T 0 0; 0 1 0 0; 0 0 1 T; 0 0 0 1],

where the sampling time is T = 1 s and the process noise covariance is built from ∆ = diag([w^2_x, w^2_y]) with w^2_x = w^2_y = 0.1. There are 20 sensor nodes in the sensor network, whose topology is shown in Figure 5. The measurement model is

z^i_k = H x_k + v^i_k, H = [1 0 0 0; 0 0 1 0].

In the simulation, the initial state of the filter is randomly selected from N(x_0, P_0). Process noise and measurement noise with outliers are generated by the following widely used model [20,23]: each noise sample is drawn from the nominal Gaussian distribution with probability 1 − p_o and from a Gaussian with an inflated covariance with probability p_o, where p_o is the probability of measurement outliers.
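The contaminated-Gaussian outlier model can be sketched as follows (our own illustration; the `scale` parameter is a hypothetical choice, since the exact covariance inflation used in the paper's simulations is not restated in this section):

```python
import numpy as np

def heavy_tailed_noise(cov, p_o, scale=100.0, rng=None):
    """Draw one sample from the contaminated-Gaussian model:
    N(0, cov) with probability 1 - p_o, and N(0, scale * cov) with
    probability p_o.  `scale` is an assumed inflation factor."""
    rng = rng or np.random.default_rng()
    c = cov if rng.random() >= p_o else scale * cov
    return rng.multivariate_normal(np.zeros(len(c)), c)
```

With p_o = 0 the samples are purely Gaussian; as p_o grows, the empirical variance is pulled up by the occasional large-covariance draws, which is exactly the heavy-tailed behavior the student-t hypothesis is meant to capture.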
In this section, the following four methods are compared: (1) the distributed consensus Kalman filter [14], referred to as DCKF; (2) the distributed consensus student-t filter proposed in [23], referred to as DCSTF; (3) the proposed algorithm, referred to as DCMDF; and (4) the multiple-model method with two Gaussian distributions, referred to as DCKFIMM.
The number of consensus steps was set to L = 3, and the consensus weight of the sensor node was set to π_{i,j} = 1/|N_i| if j ∈ N_i and π_{i,j} = 0 if j ∉ N_i. The dof of the student-t distribution in DCSTF and DCMDF was η = 10. One model of DCKFIMM is the same as DCKF, and the other model has the following parameters: the measurement noise variance is 100R, and the process noise variance is 100Q. The simulation results were obtained through 100 Monte Carlo simulations and evaluated using the root mean square errors (RMSEs) of the position and velocity.
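The uniform consensus weights used in the simulation can be generated directly from an adjacency list (a small sketch; the function name is ours, and N_i is assumed to include node i itself, as defined in the problem formulation):

```python
def uniform_weights(neighbors):
    """Consensus weights pi[i][j] = 1/|N_i| for j in N_i (node i included),
    and 0 otherwise, as in the simulation setup."""
    n = len(neighbors)
    Pi = [[0.0] * n for _ in range(n)]
    for i, Ni in enumerate(neighbors):
        for j in Ni:
            Pi[i][j] = 1.0 / len(Ni)
    return Pi
```

Note that these rows sum to 1 (row-stochastic), but the matrix is doubly stochastic only when all neighborhoods have equal size; for irregular topologies, weights such as Metropolis weights would be needed to guarantee the doubly stochastic property assumed in the consensus analysis.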
The occurrence probability of outliers was first set to p_o = 0, which means that the noises are Gaussian. The RMSEs of the position and velocity of the four algorithms are shown in Figures 6 and 7. Under the Gaussian condition, the performance of the DCMDF algorithm is almost the same as that of DCKF, while the error of DCSTF is relatively large. The performance of DCKFIMM is better than that of DCSTF and slightly worse than that of DCMDF. This result shows that Gaussian models handle estimation problems in normal noise environments better than a single student-t model. It is also worth noting that DCMDF adopts a mixed model of Gaussian and student-t distributions and can therefore obtain higher estimation accuracy through model adaptation. The occurrence probability of outliers was then set to p_o = 0.3. Figure 8 presents the true and estimated trajectories for one simulation. Figures 9 and 10 show the RMSEs of the position and velocity of the four algorithms. DCKF has the worst performance, which is caused by the inherent limitations of the Gaussian distribution: although it performs well in a normal noise environment, it struggles to deal with estimation in the presence of outliers. DCKFIMM performs better than DCKF; however, it is worse than DCSTF and DCMDF because the latter two adopt the student-t distribution, which better matches the noise distribution when outliers occur. DCMDF has the best performance because it uses a hybrid distribution model, which adapts the model during the estimation process, making the algorithm both more robust and more accurate. Tables 3 and 4 show the simulation results of the four algorithms under different outlier probabilities. The RMSEs of the four algorithms increase with the increase in outlier
probabilities. However, DCMDF consistently shows better estimation performance, and the estimation accuracy of the proposed algorithm improves by at least 20% compared with the other algorithms when p_o ≠ 0, which further proves that DCMDF performs well in both robustness and estimation accuracy. To further test the algorithm, we reduced the number of nodes and the density of the network. The topology of a sensor network with 15 nodes is shown in Figure 11. Since the number of nodes and the network density were reduced, we increased the number of consensus steps to 6; the RMSEs of the position and velocity are shown in Tables 5 and 6. The proposed algorithm still achieves better estimation accuracy.
To handle the estimation problem with heavy-tailed noise, we made two hypotheses for the model noise in this study: a Gaussian distribution and a student-t distribution. The reason the student-t distribution was chosen to represent the heavy-tailed noise distribution is discussed in detail in [23] and briefly reviewed here: by adjusting the dof, the student-t distribution can tend toward the Gaussian distribution or the Cauchy distribution, giving it good flexibility. Similar to a traditional multiple-model algorithm, we calculate state estimates under the two hypothesized noise models and then fuse the two results with different weights. The proposed algorithm can therefore balance robustness and estimation accuracy.
The computation and communication burdens are important issues in distributed state estimation. Compared with the traditional CI algorithm, the extra computation of the proposed algorithm lies mainly in the local estimation step. The amount of computation is doubled, but a parallel computing strategy can be adopted. The other additional step is the consensus on the distribution probability, which adds a small amount of computation; the increase in communication also comes from this step. It should be noted that the quantities of interest are the state and its error matrix, so this step can be omitted in applications. The increases in computation and communication from the distribution probability calculation and consensus are then eliminated, further improving computational efficiency and reducing the communication burden.

Conclusions and Outlook
For the problem of distributed state estimation in which both the process noise and the measurement noise are heavy-tailed, a distributed consensus multi-distribution state estimation algorithm based on parallel Gaussian and student-t filters is proposed in this paper, with the following steps: (1) The system model is established as two models based on the Gaussian distribution and the student-t distribution, and each model is assigned a filter matching its distribution. An algorithm based on single-sensor observations is derived, and a combined posterior based on the mixed posterior distribution is presented. (2) To solve the problem of the filter converging quickly to the Gaussian distribution, the moment matching method is used to approximate the filter and retain the heavy-tailed characteristics.
(3) For the consensus problem of the mixed probability density, a hybrid strategy based on CI consensus is proposed: first, consensus on the discrete distribution probability PMF is carried out to obtain the KLA PMF; then, the posteriors of the multiple distributions are combined using the distribution probabilities to obtain the combined posterior; finally, the CI method is used to reach consensus on the combined posterior. Based on this hybrid probability consensus strategy and the single-sensor algorithm, the recursive steps of the distributed consensus multi-distribution filter are presented. (4) The simulation results show that the proposed algorithm achieves good results in both Gaussian noise and heavy-tailed noise scenarios.

Figure 1 .
Figure 1. Research background and problem.

Figure 2 .
Figure 2. The organization of this paper.

Figure 3 .
Figure 3. Logarithmic heat maps of Gaussian distribution (left) and student-t distribution (right).

Figure 4 .
Figure 4. The workflow of the proposed algorithm.

Figure 6 .
Figure 6. Position root mean square errors of different algorithms.

Figure 7 .
Figure 7. Velocity root mean square errors of different algorithms.

Figure 8 .
Figure 8. The true and estimated trajectories for one simulation.

Figure 9 .
Figure 9. Position root mean square errors of different algorithms.

Figure 10 .
Figure 10. Velocity root mean square errors of different algorithms.

Figure 11 .
Figure 11. Topology of sensor network with 15 nodes.

Table 1 .
Comparison with related studies.

Table 2 .
Variables of system model.

Table 3 .
Root mean square errors of position under different outlier probabilities.

Table 4 .
Root mean square errors of velocity under different outlier probabilities.

Table 5 .
Root mean square errors of position under different outlier probabilities.

Table 6 .
Root mean square errors of velocity under different outlier probabilities.