A flexible robust Student's t-based multimodel approach with maximum Versoria criterion

The performance of state estimation for Gaussian state-space models can degrade when the models are affected by non-Gaussian process and measurement noises with an uncertain degree of non-Gaussianity. In this paper, we propose a flexible robust Student's t-based multimodel approach. More specifically, the degrees of freedom parameter of the Student's t-distribution is assumed unknown and is modelled by a Markov chain of state values. In order to capture more of the information in the Student's t-distributions propagated through the multiple models, we establish a model-based Versoria cost function in the form of a weighted mixture rather than the original form, and maximize this function to interact and fuse the multiple models. Simulation results demonstrate the flexible robustness of the proposed Student's t-based multimodel approach when the existence probability of outliers is uncertain.


Introduction
As a classical state estimation method, the Kalman filter [1] has been widely applied in areas such as positioning, navigation, brain imaging, and traffic control, together with its numerous nonlinear or off-line extensions [2][3][4][5][6]. These estimators assume that the model parameters and the noise statistics are exactly known, an assumption rarely satisfied in real applications, which results in reduced state estimation accuracy.
Accurate knowledge of the process and measurement noise statistics is a key factor affecting state estimation performance. However, uncertainties are practically inevitable in most state estimation problems. Many sources of uncertainty stem from abnormal changes in the state evolution or sensor observation process, which invalidate the Gaussian assumption on the process and measurement noises. For state estimation with measurement outliers, the Masreliez-Martin method has been used to develop a robust extended Kalman filter [7] and a robust unscented Kalman filter that is also adaptive to the process and measurement covariances [8]. In [9], a nonlinear unscented filter that is robust to measurement outliers and adaptive to a time-variant measurement covariance matrix was designed using Huber's M-estimation and the variational Bayesian approach. The variational Bayesian approach has proven efficient in suppressing measurement outliers when the measurement noise is represented by a Student's t-distribution in hierarchical form [10][11][12]. The aforementioned Huber's M-estimation approach and the hierarchical Student's t-distribution have also been used for state estimation with non-Gaussian process noise, in the context of inertial and Global Positioning System navigation [13][14].
In most real-time systems, non-Gaussian process and measurement noises must be considered simultaneously. Using a modelling idea similar to that in [14], a robust Student's t-based Kalman filter using the variational Bayesian approach was presented in [15] to cope with unknown non-Gaussian process and measurement noises. In [16], a generalized Gaussian scale mixture model was proposed to model heavy-tailed and skewed non-Gaussian process and measurement noises for the Kalman filter. Instead of the minimum mean-square error criterion, the maximum correntropy criterion has been introduced to establish a new state estimation framework [17][18] that deals with outliers in both the process and measurement noises. In recent years, interest in the Student's t-filter methodology [19] has increased significantly, thanks especially to its ability to represent heavy-tailed distributions in many applications at a light computational cost. Several advanced nonlinear Student's t-filters with efficient computation of integrals involving nonlinear functions have been developed in [20][21][22][23].
Although much effort has been devoted to developing robust state estimation filters, research on state estimation with non-Gaussian noises of a time-variant nature still receives far less attention than its Gaussian counterpart. In particular, there is a lack of work targeting the flexibility of the robustness; that is, most currently available filters are only suitable for non-Gaussian noise with a fixed or specific degree of non-Gaussianity, e.g. heavy-tailedness. These estimators are vulnerable to outliers with an uncertain or time-variant existence probability in the system and measurement processes. Although an adaptive approach to changing noise covariances was developed in [9] and parameter-learning approaches were proposed in [10][11][12] for non-Gaussian noises, these works do not take into account an uncertain degree of heavy-tailedness, which leaves the robustness of these filters inflexible. Pioneering works on similar topics were proposed in [24][25][26] by constructing several noise models. Nevertheless, the resulting multimodel particle filters [24][25] are usually computationally expensive, and these works focus mainly on the measurement noise. In addition, the flexibility of the robustness has not been validated with noise of different levels of heavy-tailedness. The Bayesian model averaging (BMA) approach has been successful in handling multiple noise models in [24][25][26], but the model fusion involves only second-order information; higher-order information, which is crucial for non-Gaussian data, has not been considered.
In this paper, we aim to enhance the robustness of the Student's t-filter against noises with an uncertain degree of heavy-tailedness, and propose a flexible robust Student's t-based multimodel approach. The main contribution of our work is twofold. Firstly, the unknown degrees of freedom (dof) parameter is modelled as a Markov process with interacting dof modes. The flexibility problem of robustness can thus be naturally translated into an interacting multiple model (IMM) framework [27]. This modelling method differs from the one in [24], where the noise models are only combined directly at the final stage. Secondly, we introduce the maximum Versoria criterion, an effective optimization criterion recently used in the kernel adaptive filtering area [28][29], within the multiple Student's t-based models, and design a novel and effective model fusion strategy for the established dof multiple models. Using synthetic data, we demonstrate that the proposed fusion strategy offers better model fusion for state estimation in the presence of non-Gaussian noises with an uncertain degree of heavy-tailedness.
The outline of this paper is as follows. The maximum Versoria criterion is briefly introduced in Section 2. Section 3 presents the proposed flexible robust Student's t-based multimodel algorithm. The computational complexity and convergence stability are briefly discussed in Section 4. Simulation results are presented in Section 5, followed by concluding remarks in Section 6.

Maximum Versoria criterion
The original Versoria function, also named the Agnesi function, is defined as [30]

f(x) = 8a^3 / (x^2 + 4a^2) = 2a / (1 + τx^2), (1)

where x denotes the variable, A = 2a is the diameter of the generating adjoined circle of the Versoria function, meaning that the centroid of the circle x^2 + (f(x) − a)^2 = a^2 is located at (0, a), and τ = (1/(2a))^2 is the Versoria shape parameter. In Fig. 1, the Versoria function with a = 0.5 is plotted and compared with appropriate Gaussian and Student's t probability density functions. It can be seen that the Versoria function has the heaviest tail of all; its tail is even thicker than that of a Student's t density function (υ = 3), which is acknowledged as suitable for characterizing data with a heavy-tailed non-Gaussian distribution.
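As a quick numerical check of this tail comparison, the sketch below evaluates the Versoria function f(x) = 2a/(1 + τx^2) against Gaussian and Student's t (υ = 3) densities far from the origin; the helper names are our own, and the comparison mirrors Fig. 1 only qualitatively.

```python
import math

def versoria(x, a=0.5):
    # Versoria (witch of Agnesi): f(x) = 8 a^3 / (x^2 + 4 a^2) = 2a / (1 + tau x^2)
    tau = (1.0 / (2.0 * a)) ** 2
    return 2.0 * a / (1.0 + tau * x * x)

def gauss_pdf(x, sigma=1.0):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def student_t_pdf(x, v=3.0):
    # standard univariate Student's t density with v degrees of freedom
    c = math.gamma((v + 1.0) / 2.0) / (math.gamma(v / 2.0) * math.sqrt(v * math.pi))
    return c * (1.0 + x * x / v) ** (-(v + 1.0) / 2.0)

# far in the tail, the Versoria value dominates both densities
x = 10.0
print(versoria(x), student_t_pdf(x), gauss_pdf(x))
```

At x = 10 the Versoria decays only as x^(-2), versus x^(-4) for the t density and an exponential rate for the Gaussian, which is the heavy-tail ordering the figure illustrates.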
In the literature on kernel adaptive filtering, the objective of using the Versoria function is to estimate the unknown weight vector ω from the output signal defined in [28] as

D_k = Y_k^T ω + γ_k, (2)

where γ_k is noise with impulsive interference and Y_k is the input vector of the unknown system. The output error e_k is defined as

e_k = D_k − Y_k^T ω_{k−1}, (3)

where ω_{k−1} represents the estimate of ω at time k − 1. The optimization problem under the maximum Versoria criterion (MVC) is to compute f(e_k) using (1), or another relevant form, as the cost function and maximize it to achieve the minimum error.
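To make the criterion concrete, the following sketch runs stochastic gradient ascent on the Versoria cost for the model (2)-(3). The function name, step size and data generation are illustrative assumptions, not the exact algorithms of [28][29]; the point is that the gradient factor vanishes for large errors, so impulsive samples barely move the weights.

```python
import random

def mvc_lms(samples, dim, a=1.0, mu=0.1):
    """Stochastic gradient ascent on the Versoria cost f(e) = 2a / (1 + tau e^2)."""
    tau = (1.0 / (2.0 * a)) ** 2
    w = [0.0] * dim
    for y, d in samples:                                  # y: input vector, d: output
        e = d - sum(wi * yi for wi, yi in zip(w, y))      # output error (3)
        g = 4.0 * a * tau * e / (1.0 + tau * e * e) ** 2  # df/de, vanishes for large e
        w = [wi + mu * g * yi for wi, yi in zip(w, y)]
    return w

random.seed(0)
w_true = [1.0, -2.0]
samples = []
for _ in range(3000):
    y = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    noise = random.gauss(0.0, 0.1)
    if random.random() < 0.05:                            # 5% impulsive outliers
        noise += random.gauss(0.0, 20.0)
    samples.append((y, w_true[0] * y[0] + w_true[1] * y[1] + noise))
w_hat = mvc_lms(samples, 2)
print(w_hat)
```

Because the update weight 4aτe/(1 + τe^2)^2 decays for large |e|, the occasional 20-sigma impulses are effectively ignored, which is the robustness property exploited in the sequel.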
MVC has shown excellent robustness against non-Gaussian interference such as impulsive noise in kernel algorithms [28][29][30][31][32], where it provides superior performance to algorithms using a Gaussian-based kernel. Indeed, kernel-based functions have not only been used in the adaptive signal processing and communication community, but have also been extended to develop optimal state estimation rules that capture the higher-order information present in stochastic dynamical systems [17][18]. Given the heavy-tailedness of the specific Versoria function shown in Fig. 1, the general Versoria function can be expected to fully exploit non-Gaussian characteristics and to handle higher-order information better than the Gaussian function for a stochastic dynamical system. Inspired by the above, we introduce the MVC into Student's t-filter based model fusion, where second-order moments are not accurate enough and the higher-order information in the multiple models is crucial to the estimation accuracy. In the next section, the novel MVC-based model fusion is derived.

Maximum Versoria criterion based model fusion
The following stochastic dynamical system is considered for the Student's t-filter:

x_k = F_k x_{k−1} + w_{k−1},  z_k = H_k x_k + v_k, (4)

where x_k and z_k denote the m × 1 state variable and the d × 1 measurement, respectively, F_k is the state evolution matrix and H_k is the measurement matrix. The process noise w_{k−1} and the measurement noise v_k are non-Gaussian noises with nominal covariance matrices Q_{k−1} and R_k, respectively. In the standard Student's t-filter framework, the state x_k and the noises w_{k−1} and v_k are assumed to be marginally Student's t-distributed [19].
In order to develop a Student's t-filter able to estimate the system state under non-Gaussian noises with time-variant heavy-tailedness, M appropriate models are introduced, each using a different value for the dof, υ_k^{s_k} ∈ {υ_k^1, υ_k^2, ···, υ_k^M}. These dof values can be chosen according to prior knowledge about the non-Gaussianity, more specifically, the intensity of the heavy-tailedness. Here s_k indicates the system mode, which is represented by an M-state Markov chain with state transition probability matrix Π = {π_ij} satisfying Σ_{j=1}^M π_ij = 1 for any i ∈ {1, 2, ···, M}. Each element π_ij denotes the transition probability from mode i to mode j.
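The Markov switching of dof modes can be sketched as follows. The two-mode configuration, the transition matrix values and the function name are illustrative assumptions; the paper only requires that the rows of Π sum to one.

```python
import random

def simulate_dof_modes(pi, dofs, steps, seed=1):
    """Simulate the Markov chain s_k over M dof modes with transition matrix pi."""
    random.seed(seed)
    s = 0
    path = []
    for _ in range(steps):
        u, acc = random.random(), 0.0
        for j, p in enumerate(pi[s]):      # sample the next mode from row s of pi
            acc += p
            if u < acc:
                s = j
                break
        path.append(dofs[s])
    return path

pi = [[0.95, 0.05],    # mode 1: high dof (near-Gaussian), sticky
      [0.05, 0.95]]    # mode 2: low dof (heavy-tailed)
print(simulate_dof_modes(pi, [100, 3], 10))
```

A sticky chain like this produces sustained stretches of near-Gaussian and heavy-tailed behaviour, which is exactly the time-variant heavy-tailedness the multimodel filter is designed to track.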
This modelling idea is originally taken from work in the area of wireless localization [33][34], where the condition of the communication channel in cellular networks switches randomly between LOS and NLOS due to obstructed radio transmissions. In those works, a two-state Markov process is adopted to represent the mode state for the NLOS and LOS conditions. In our work, for each model i ∈ {1, 2, ···, M}, suppose that at the end of discrete time instant k − 1 we have obtained the Student's t-distributed filtering density

p(x_{k−1} | z_{1:k−1}, s_{k−1} = i) = St(x_{k−1}; x̂_{k−1|k−1}^i, P_{k−1|k−1}^i, υ_{k−1}^i), (5)

where St(x; x̂, P, υ) denotes the Student's t density with mean x̂, scale matrix P and dof υ, υ_{k−1}^i is the dof parameter of model i, and Γ(·) is the gamma function appearing in its normalizing constant.

Step 1) Mixing under maximum Versoria criterion
At the mixing stage, we are interested in calculating the mixed density by marginalization,

p(x_{k−1} | z_{1:k−1}, s_k = j) = Σ_{i=1}^M μ_{k−1}^{i|j} p(x_{k−1} | z_{1:k−1}, s_{k−1} = i), (6)

where μ_{k−1}^{i|j} is known as the mixing probability. As can be seen from (6), the mixture must be approximated by a single Student's t-distribution for the filtering cycle to proceed. Recall that in the conventional IMM framework [27], where the above Student's t-distribution is replaced by a Gaussian, the moments matching technique has been widely used in most Gaussian mixture based methods: a single approximate Gaussian distribution is found by matching the first two moments of the mixture of Gaussians. However, this assumption is often not valid, since a mixture of probability distributions does not necessarily have a Gaussian distribution, particularly for jump Markov systems with non-Gaussian noises. In our formulated problem, each component of the mixture is assumed to be Student's t-distributed. For such a heavy-tailed distribution, the moments matching technique faces challenges at the information fusion step due to the loss of higher-order information.
To cope with this problem, we introduce the MVC as the information measure for the model interaction of the multimodel state estimates. Instead of using the error (3) defined for adaptive filtering, we define the error function for the model fusion as

e_{k−1}^i = || x_{k−1|k−1} − x̂_{k−1|k−1}^i ||_{P_{k−1|k−1}^i}, (7)

where the weighted l2 norm || · || is used and x_{k−1|k−1} is the unknown state variable. Since our focus is on fusing M models, the Versoria cost function here is re-defined in the form of a weighted mixture,

J(x_{k−1|k−1}) = Σ_{i=1}^M μ_{k−1}^{i|j} · 2a / (1 + τ (e_{k−1}^i)^2), (8)

rather than the original form of the Versoria function. In (8), the mixing probability μ_{k−1}^{i|j} is chosen as the corresponding weight.
To maximize the Versoria cost function (8), we compute its first derivative. After some mathematical rearrangement, setting the derivative to zero yields the stationarity condition

Σ_{i=1}^M μ_{k−1}^{i|j} (4aτ / (1 + τ (e_{k−1}^i)^2)^2) (P_{k−1|k−1}^i)^{−1} (x_{k−1|k−1} − x̂_{k−1|k−1}^i) = 0. (9)

There is no closed-form solution to (9), and a fixed-point iteration algorithm is required to solve it:

x̂_{k−1|k−1}^{(l+1)} = [ Σ_{i=1}^M λ_i^{(l)} (P_{k−1|k−1}^i)^{−1} ]^{−1} Σ_{i=1}^M λ_i^{(l)} (P_{k−1|k−1}^i)^{−1} x̂_{k−1|k−1}^i,  λ_i^{(l)} = μ_{k−1}^{i|j} / (1 + τ (e_{k−1}^{i,(l)})^2)^2. (10)

Let L (L ≥ 1) be the maximum iteration number; the value from the Lth iteration is assigned to the mixing state vector x̂_{k−1|k−1}^{0j}. Although a few iterations are involved, the algorithm converges very quickly to a local optimum due to the properties of the Versoria function, especially when the error is large. Thus a small number of iterations L suffices to find a locally optimal solution while providing satisfactory performance in practice. In the simulation section, the fast convergence of this fixed-point iteration will be validated.
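A minimal scalar sketch of this fixed-point iteration is given below. It assumes the stationary point takes the re-weighted-average form with weighting λ_i = μ^{i|j}/(1 + τ e_i^2)^2 derived from the Versoria cost; the function name, initialization and toy numbers are our own.

```python
def mvc_fuse(means, weights, variances, a=1.0, L=3):
    """Fixed-point MVC fusion of M scalar model estimates (sketch).
    Each iteration re-weights the models by mu_i / (1 + tau e_i^2)^2,
    so models far from the current estimate are down-weighted."""
    tau = (1.0 / (2.0 * a)) ** 2
    x = sum(w * m for w, m in zip(weights, means))   # start at the weighted mean
    for _ in range(L):
        lam = []
        for w, m, p in zip(weights, means, variances):
            e2 = (x - m) ** 2 / p                    # weighted squared error (7)
            lam.append(w / (p * (1.0 + tau * e2) ** 2))
        x = sum(l * m for l, m in zip(lam, means)) / sum(lam)
    return x

# two modes: a trusted estimate near 0 and an outlier-corrupted one at 10
print(mvc_fuse([0.0, 10.0], [0.7, 0.3], [1.0, 1.0]))
```

After three iterations the fused estimate sits near 0, whereas plain moments matching of the same mixture would return the weighted mean 3.0; this is the sense in which the MVC preserves information that second-order fusion discards.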
After obtaining the mixing state vector x̂_{k−1|k−1}^{0j}, the corresponding mixed scale matrix P_{k−1|k−1}^{0j} is required as an essential part of the whole IMM cycle. Hence we merge the scale matrices from the multiple dof models using the efficient Kullback-Leibler divergence based method [35].

Step 2) Mode-conditioned Student's t-filter

At this stage, the standard Student's t-filter is run for each dof model. The prediction stage propagates the mixed state and scale matrix through the state evolution model (4) to obtain the predicted quantities x̂_{k|k−1}^j and P_{k|k−1}^j. The objective of the subsequent update is to compute the filtering density p(x_k^j | z_{1:k}) after receiving the measurement vector z_k. The joint density of the state and measurement is Student's t-distributed, from which the conditional density of the state is updated using the standard Student's t-filter [19], with υ_k^j = υ_{k−1}^j. As stated in the Student's t-filter literature [19][20], a moments matching technique (distinct from the moments matching in the IMM framework) is often used to obtain (20) and (21), preventing the loss of robustness after a few time steps, such that x̂_{k|k}^{j*} and P_{k|k}^{j*} are substituted by x̂_{k|k}^j and P_{k|k}^j, respectively. The results (20)-(21) are kept within our algorithm so that information from all the dof models, each using a different dof value, can be distinguished and preserved for dof model interaction and fusion by the MVC. Otherwise, all the Student's t-filters would reduce to Kalman filters, resulting in the loss of robustness, not to mention the flexibility of the robustness.
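A scalar sketch of the mode-conditioned update illustrates the mechanism by which the Student's t-filter retains robustness: the squared innovation inflates the posterior scale. The update rule follows the standard filter of [19] specialized to one dimension; the function name and test values are our own.

```python
def student_t_update(x, P, z, H, R, v):
    """One scalar measurement update of a Student's t filter (sketch after [19])."""
    S = H * P * H + R                  # innovation scale
    K = P * H / S                      # gain
    d2 = (z - H * x) ** 2 / S          # squared (Mahalanobis) innovation
    x_new = x + K * (z - H * x)
    # posterior scale is inflated by (v + d2)/(v + d) with measurement dim d = 1
    P_new = (v + d2) / (v + 1.0) * (P - K * S * K)
    return x_new, P_new

# an outlying measurement (z = 8) inflates the scale far more than a nominal one
x1, P1 = student_t_update(0.0, 1.0, 8.0, 1.0, 1.0, 3.0)
x2, P2 = student_t_update(0.0, 1.0, 0.5, 1.0, 1.0, 3.0)
print(x1, P1, x2, P2)
```

Moments matching back to a fixed dof after this step (as in (20)-(21)) keeps the filter from degenerating to a Kalman filter over time, while the inflated scale after an outlier feeds directly into the subsequent MVC fusion.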

Step 3) Mode probabilities update
The mode probability is updated by Bayes' rule as μ_k^j ∝ Λ_k^j Σ_{i=1}^M π_ij μ_{k−1}^i, normalized over j, where the measurement likelihood function Λ_k^j is in the form of the standard Student's t-distribution (see (5)) instead of the Gaussian.

Step 4) Estimation fusion under maximum Versoria criterion
Similar to the mixing stage, the MVC replaces the moments matching technique to find the combination of all the models with the updated mode probabilities. This time, the error function is defined as

e_k^j = || x_{k|k} − x̂_{k|k}^j ||_{Σ_{k|k}^j}, (24)

where the covariance matrix Σ_{k|k}^j = υ_k^j / (υ_k^j − 2) P_{k|k}^j is used instead of the scale matrix in (7). The definition (24) reflects the fact that this fusion stage immediately follows the mode-conditioned Student's t-filter, in which the state and scale matrix have been updated using different dof values. The covariance matrix is therefore more accurate than the scale matrix alone, since the dof parameter is included.
The Versoria cost function at this stage can be written as

J(x_{k|k}) = Σ_{j=1}^M μ_k^j · 2a / (1 + τ (e_k^j)^2). (25)

Taking the first derivative of (25) and setting it to zero gives the fused state vector x̂_{k|k} in the iterative fixed-point form

x̂_{k|k}^{(l+1)} = [ Σ_{j=1}^M λ_j^{(l)} (Σ_{k|k}^j)^{−1} ]^{−1} Σ_{j=1}^M λ_j^{(l)} (Σ_{k|k}^j)^{−1} x̂_{k|k}^j,  λ_j^{(l)} = μ_k^j / (1 + τ (e_k^{j,(l)})^2)^2, (26)

where L denotes the maximum iteration number. The fused scale matrix can then be obtained by again applying the Kullback-Leibler divergence based method [35]. It is known that a single fixed value of the dof parameter limits the flexible robustness of the Student's t-filter when the degree of heavy-tailedness is uncertain. Our method copes with this issue from two perspectives. First, establishing multiple models for the dof parameter, with values that may switch, makes the robustness of the Student's t-filter adjustable; with the Markovian transition of dof modes, the problem translates directly into multimodel estimation under the IMM framework. Second, to incorporate the robust Student's t-filter effectively, the celebrated IMM framework is altered by introducing the Versoria function to combine all models, with higher-order moments processed for more robust fused results. The improvement will be demonstrated shortly in the simulation.
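The covariance used in (24) follows from the standard second moment of a Student's t-distribution, which a short helper makes explicit; the guard against υ ≤ 2 is our own defensive assumption.

```python
def t_covariance(scale, dof):
    """Covariance implied by a Student's t scale entry: Sigma = dof / (dof - 2) * P."""
    if dof <= 2.0:
        raise ValueError("covariance undefined for dof <= 2")
    return dof / (dof - 2.0) * scale

# a low dof inflates the covariance relative to the scale; a high dof barely does
print(t_covariance(1.0, 3.0), t_covariance(1.0, 100.0))
```

This is why the fusion stage distinguishes the dof models even when their scale matrices are similar: a υ = 3 model carries three times the uncertainty of its scale entry, while a υ = 100 model is essentially Gaussian.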

Further discussion
In this section, we briefly discuss the proposed algorithm in terms of computational complexity and stability of convergence, respectively.

Computational complexity
The proposed algorithm operates within the IMM framework. It is known that the IMM framework has the best computational complexity among the popular multimodel approaches that achieve acceptable estimation errors [36]. As seen in the previous section, Step 2 and Step 4 are the main parts of the proposed algorithm under the IMM framework. The computational cost of (10), counted in terms of matrix inversions (using O(·) notation) and matrix/vector multiplications with respect to the state dimension m and the number of models M, is dominated per iteration by M + 1 inversions of m × m matrices and O(M) matrix/vector multiplications, and therefore scales linearly with the iteration number L. Since (10) shares the same structure as (26), the same computational complexity also applies to (26).
For a clear comparison, moments matching can also be introduced into our Student's t-based problem for model fusion within the same IMM framework. The resulting algorithm has a similar form to ours except for (10), (11), (26) and (27). In that case, the mixing mean and scale matrix obtained by replacing the MVC with moments matching are approximated by the probability-weighted mean of the model estimates and the correspondingly weighted combination of the model scale matrices plus the spread-of-means terms, respectively. The major computational cost then comes from the scale-matrix combination and amounts to O(M) matrix/vector multiplications, with no matrix inversion and no iteration. One may conclude that the overall computation cost of the proposed algorithm is evidently more expensive than that of the moments matching based algorithm as the system dimension m grows. The primary cost arises from the repeated matrix inversions and multiplications, and also depends on the iteration number L. Despite this predictably higher computational cost, our method is expected to give better robustness than the moments matching based approach and other related competing estimation approaches. Fortunately, a large iteration number is usually unnecessary in practice owing to the MVC convergence property, so heavy computational loads can be avoided. In the simulation, we compare the proposed method with competing methods in terms of computational cost and estimation accuracy.
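For reference, the non-iterative moments matching combination used as the baseline in this comparison can be sketched in scalar form (the function name and numbers are illustrative):

```python
def moments_match(means, weights, covs):
    """Standard IMM moments matching: a single (mean, covariance) pair
    matched to the first two moments of the mixture."""
    m = sum(w * x for w, x in zip(weights, means))
    P = sum(w * (c + (x - m) ** 2) for w, x, c in zip(weights, means, covs))
    return m, P

print(moments_match([0.0, 10.0], [0.7, 0.3], [1.0, 1.0]))
```

One pass of O(M) multiplications and no inversion, versus L iterations of matrix inversions for the MVC fixed point: this is the cost gap discussed above, paid in exchange for preserving higher-order information.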

Stability of convergence
In this section, we briefly discuss the stability of the MVC-based iterative algorithm. The original MVC arises in the area of information-theoretic learning, and its convergence stability has been proved via the least mean square algorithm. In contrast, we use the MVC to fuse multiple models for state estimation, which is a quite different setting from those in the existing literature.
Hence, unlike what has been done in [28], it is hard to obtain recursive update equations for the state estimation error from (10) and (26), which makes direct convergence analysis difficult. Fortunately, (10) and (26) are iterative fixed-point solutions under the MVC, and [37] offers a complete convergence analysis for fixed-point algorithms of a similar form. For example, (26) can be rewritten in the general iterative form of [37], x̂_{k|k}^{(l+1)} = f(x̂_{k|k}^{(l)}), where l denotes the lth iteration and f(·) is the fixed-point map expressed in terms of the weighted norm. The proof in [37] can therefore be similarly applied to our iterative fixed-point equations (10) and (26), and a sufficient condition therein guarantees that the iterative solutions converge stably to a unique fixed point. However, this argument may not be fully rigorous here, because a Gaussian kernel, which is always less than one, is used in their work, whereas the Versoria function, which depends on the radius a, is considered in ours. In addition, the relationship between the radius and the convergence has not been revealed. A rigorous and in-depth theoretical exploration of these issues therefore remains future work. Nevertheless, we will demonstrate the convergence property of the proposed algorithm as a complement, and test how the radius influences convergence in the following simulation.

Simulated results
In this section, we adopt an example that estimates the location of a single target to assess the performance of the proposed method, namely the maximum Versoria Student's t-based multimodel approach (MVMMStd).

Table 1. Effects of the number of iterations.

It is worth pointing out that M need not be very large in practice: a large M does not guarantee improved final estimates but certainly causes extra computational burden. Too many dof models would fail to yield satisfactory fusion results as a consequence of excessive model competition. Similar remarks have been drawn in the area of maneuvering target tracking [38], where multiple models are established for uncertain target dynamics. Thus the M dof models should be chosen to have distinguishable estimation performance. If there is no accurate prior knowledge about the heavy-tailedness of the noises, it is advisable to select at least two values, for instance a very high value and a low value, for the dof models.
In this simulation, the following two setups for the uncertain process and measurement noises are considered. Case 1. Noises with abrupt changes: the process noise is purely Gaussian in the first 50 s and contaminated only by 5% outliers N(w_{k−1} | 0, 25Q_{k−1}) in the remaining 150 s. The measurement outliers N(v_k | 0, 50R_k) occur with probabilities of 0%, 5%, 1.5% and a severe percentage over successive 50 s intervals. Case 2. Noises with gradually increasing contamination: both the process and measurement noises are contaminated by outliers with the time-varying probability 0.15(k − 1)/200.

First of all, we study the convergence of the proposed algorithm and test the influence of the radius. Given five predefined values for the iteration number L and assuming a = 1, Table 1 gives the average root mean square errors (ARMSEs) and standard deviations (SDs) of the estimated position over 2000 Monte Carlo repetitions for both cases. A notable decrease in the ARMSEs and SDs is observed when the iteration number changes from 1 to 2 in both Case 1 and Case 2. The ARMSEs and corresponding SDs change only slightly in both cases when the number increases from 2 to 3, and remain unchanged for L ≥ 3, which means the algorithm has almost converged at L = 3. This accords with the statement in the literature that the Versoria function converges very quickly to a local optimum.
Next, we turn to the impact of the radius a on the estimation accuracy of MVMMStd. The iteration number is chosen as L = 2 in this part. In Fig. 2, letting the radius range from 0.001 to 100 over selected values, we show the ARMSEs of the estimated position as a function of the radius a for both cases. The standard errors of the ARMSEs, represented by the upper and lower red chain-dotted lines, are also provided in the figure. Note that a logarithmic scale is employed for the horizontal axis for clearer presentation. From Fig. 2, it is observed that the ARMSE varies remarkably when a is chosen in the interval [2, 20], while the ARMSE curves for both cases flatten outside this interval, which means the algorithm is quite insensitive to a < 0.1 and a > 20. The estimation error can be diminished when 0.1 < a < 2, but not as greatly as for 2 < a < 20. To examine this further, we select the first 50 Monte Carlo runs from the whole 2000 runs and compute the ARMSEs of the position over 200 s for each run. As displayed in Figs. 3 and 4, the MVMMStd with a = 0.1 has a clear estimation advantage over the one with a = 20 but performs only slightly better than the one with a = 2.
Even though the influence of the radius on the estimation errors has been studied, it remains difficult to investigate the impact of the radius on the convergence directly. Instead, we combine it with the iteration number and study how the ARMSEs change for different iteration numbers as the radius varies. We repeat the experiment of Fig. 2 with different iteration numbers and obtain the results shown in Fig. 5. Note that only the results in the interval [1, 10] are presented, because [1, 10] is the interval that most actively affects our algorithm, and the difference between the curves is quite small outside it. In addition to the results in Table 1, Fig. 5 visually describes the convergence behavior of the proposed algorithm. A larger iteration number generally leads to a lower estimation error, and the performance of the MVMMStd using L ≥ 3 is very close, especially when the radius lies in [1, 2] or [5, 10]. Thus radius values from these intervals do not significantly influence the convergence. On the other hand, the four curves present the largest gaps between each other when the radius falls in the interval [2, 5]. Moreover, Fig. 5 shows that to achieve the same estimation error, the MVMMStd with a larger radius requires more iteration steps; equivalently, a smaller radius in [2, 5] enables faster convergence. Therefore, when convergence speed and estimation precision are considered simultaneously, a small value in [2, 5] is suggested for the radius.
Secondly, we compare the MVMMStd with several competing robust estimation methods in terms of estimation errors. These methods include the Student's t-filters [19] Std3 and Std100 with fixed dof values of 3 and 100, respectively; the robust Bayesian filter using the Bayesian model averaging approach (RBFBMA) [24], which is essentially a multimodel approach combining BMA and restricted variational Bayes; the Student's t-based multimodel approach that shares our Student's t-based modelling method but uses moments matching (MMMMStd) [27] rather than the MVC; the Student's t-based robust Kalman filter using variational Bayes (RStdKF) with unknown process and measurement noises [15]; the generalized hyperbolic skew Student's t Kalman filter (GHSStdKF) [16]; the generalized hyperbolic variance Gamma Kalman filter (GHVGKF) [16]; the maximum correntropy Kalman filter (MCKF) [17]; and the robust Kalman filter using Huber's M-estimation (HMKF) [13,39]. All the multimodel algorithms are initialized with the same predefined model parameters for fair comparison. For the MVMMStd, L = 2 and a = 1 are used.
The number of iterations in GHSStdKF, GHVGKF, and RStdKF is chosen as 10, following the literature, to ensure convergence. MCKF uses threshold detection to terminate its iterative computations instead of a fixed number of iterations, so its iteration count is dynamic; the actual average numbers of iterations of MCKF in Case 1 and Case 2 are 3.01 and 3.07, respectively.
Plotted in Fig. 6 are the position RMSEs of all the compared methods in Case 1. In the first 50 s, since only Gaussian process and measurement noises are present, Std100, MCKF and RStdKF have the best estimation precision, while the performance of MVMMStd remains comparable to theirs. MMMMStd, HMKF and RBFBMA have an advantage over Std3. MMMMStd even outperforms MVMMStd in this period, which means the loss of higher-order information from the dof models is insignificant and the moments matching technique is preferable to the MVC in a purely Gaussian environment. In the next 150 s, with moderate, mild and then severe outliers in the noises, MVMMStd always performs best, owing to its ability to find the most probable combination of the multiple models. MMMMStd and RBFBMA offer worse performance than MVMMStd even though both use a multimodel approach: MMMMStd uses moments matching at the mixing and combination stages, which ignores the higher-order information contained in the Student's t-filter based models, and RBFBMA is unable to handle non-Gaussian process noise, apart from also using moments matching at its overall approximation stage. Though RStdKF occasionally gives the lowest estimation errors and is able to learn unknown noise parameters, it is sensitive to changes in heavy-tailedness and hence its estimation results fluctuate widely. GHSStdKF and GHVGKF have shown their effectiveness in the literature when the noise is heavy-tailed and skewed; however, they perform worse than the proposed MVMMStd in this case. It is interesting to note that in the last 150 s, the Student's t-filters with fixed dof (3 or 100) at times perform better than the multimodel approaches (MMMMStd and RBFBMA) and the other robust methods. Nonetheless, it is difficult to select a correct dof value for the Student's t-filter in practice, particularly when the heavy-tailedness is time-variant.
It also implies that solutions to the non-Gaussian noise with uncertain heavy-tailedness are necessary, and appropriate processing of the model fusion is required if the multimodel approach is considered.
Likewise, the position RMSEs for Case 2 are given in Fig. 7. All the algorithms demonstrate a trend of performance deterioration as a result of the increasing contamination probability 0.15(k − 1)/200 for both the process and measurement noises. Although all of them exhibit close estimation performance at the beginning, when the outlier contamination is very low, the disparity among the algorithms grows progressively with the increasing outlier occurrence probability. The major reason that the estimation error of RBFBMA grows significantly after 100 s is its inability to handle the time-variant non-Gaussian process noise. For the same reason given for Case 1, RStdKF has much larger errors and fluctuations in the last 100 s than in the first 100 s. The proposed MVMMStd provides the best estimation results throughout, except in the first 10 s, when the outlier percentage is very low and the actual noise distributions are approximately Gaussian.
As the ARMSEs of MVMMStd using different iteration numbers have already been displayed in Table 1, Table 2 gives the ARMSEs of all the remaining competing algorithms for Case 1 and Case 2, complementing Figs. 6 and 7. Note that the numbers following GHSStdKF, GHVGKF and RStdKF represent the iteration steps they use. As seen from Tables 1 and 2, the proposed MVMMStd has the best estimation results, in line with what is observed in Figs. 6 and 7.
Furthermore, as MVMMStd and MMMMStd are structurally similar and differ only in the fusion method under the IMM framework, we show their mode probabilities in Figs. 8 and 9 for Case 1 and Case 2, respectively. As expected, for the MVMMStd in both cases, the probability of the model using a large dof value (Mode 1) tends to decline as the outlier probability rises (at 50 s and 150 s in Case 1, and throughout Case 2) and to increase as the outlier probability decreases (at 100 s in Case 1), while the model using a smaller value (Mode 2) shows the opposite trend. In Case 1, the mode probability estimates of MMMMStd are less sensitive to the sudden changes of outlier percentage at 50 s and 100 s, and are clearly incorrect when the outlier percentage is high in the last 50 s. Even though MMMMStd provides mode probability estimates relatively close to those of MVMMStd in [50, 150] s, its position estimation errors in these periods are still larger than those of MVMMStd, as shown in Fig. 6, because the higher-order information is not appropriately processed. In Case 2, combining the results of Fig. 7, we find that MMMMStd underestimates the probability of Mode 1 when the degree of heavy-tailedness is mild or moderate, except for the first 10 s, and overestimates it when the degree escalates, with the opposite behaviour for Mode 2.
Finally, the computational complexity of MVMMStd is further compared to that of the previous competing robust methods. The computational complexity of each algorithm is measured using the cputime function in Matlab. Letting Std3 have a computational complexity of one, the complexities of the rest are normalized relative to Std3. The normalized computational complexities obtained from Case 1 are given in Table 3.
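The normalization procedure can be mimicked outside Matlab. The sketch below uses Python's `time.process_time` as a CPU-time analogue of `cputime`; the algorithm callables are dummy workloads standing in for the actual filters, so the specific ratios are illustrative only.

```python
import time

def normalized_runtimes(algorithms, baseline, reps=10):
    """CPU-time each algorithm (name -> zero-arg callable), average over
    reps runs, and normalize against the baseline algorithm, mimicking
    the cputime-based procedure described in the text."""
    raw = {}
    for name, fn in algorithms.items():
        t0 = time.process_time()
        for _ in range(reps):
            fn()
        raw[name] = (time.process_time() - t0) / reps
    return {name: t / raw[baseline] for name, t in raw.items()}

# Dummy workloads in place of the filters; a real comparison would run the algorithms.
algos = {
    "Std3": lambda: sum(i * i for i in range(200_000)),
    "MVMMStd-2": lambda: sum(i * i for i in range(800_000)),
}
ratios = normalized_runtimes(algos, baseline="Std3")  # baseline maps to 1.0
```

Note that `process_time` excludes sleep and other processes' activity, which keeps the ratios comparable across runs on a loaded machine.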
As can be observed from Table 3, Std3 has the lowest computational complexity among all the competitors. RBFBMA, HMKF, MMMMStd and MCKF are more computationally efficient than the remaining methods apart from Std3. As expected, the computational complexity of MVMMStd increases as the number of iterations grows from 2 to 5 (the numbers following MVMMStd are iteration steps). MVMMStd-2 has almost the same computational complexity as RStdKF-10, and is computationally much cheaper than GHSStdKF-10 and GHVGKF-10. Among all the iteration-based algorithms, MVMMStd has the highest computational complexity per iteration step, because each iteration of MVMMStd involves not only weighted model aggregation but also repeated matrix inversions and multiplications. However, taking into account the estimation errors given previously in Table 1 and Figs. 6 and 7, we may select appropriate iteration steps for MVMMStd and avoid spending unnecessary computational resources. MVMMStd with 2 iteration steps already provides satisfactory results in both cases, so L = 2 and L = 3 are preferable choices in our simulation. Using L = 4, L = 5 or a larger iteration number does not provide a distinct improvement but leads to 3 to 4 times the computational complexity of RBFBMA, HMKF, MMMMStd and MCKF. Even though RBFBMA, HMKF, MMMMStd and MCKF have notably lighter computational complexities than MVMMStd, the robustness of these algorithms is undermined when the process noise and measurement noise are of time-variant heavy-tailedness.

Conclusion
In this paper, a flexible robust Student's t-based multimodel approach has been proposed for state estimation. The key feature of the proposed approach is that the unknown degrees of freedom parameter is modelled by a Markov chain and the MVC is used to fuse multiple Student's t-based models. Results from the simulated data show the superiority of the proposed approach, despite its higher computational complexity, when dealing with non-Gaussian noises of uncertain degree of non-Gaussianity, ranging from mild to severe heavy-tailedness. In practice, this approach is expected to offer better estimation performance if prior knowledge about the noises is available, by either optimizing the transition probabilities or selecting more accurate dof models. The proposed approach can also be used to develop nonlinear counterparts with light computational loads by appropriately employing existing Student's t-based numerical methods.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.