A Single-pass Noise Covariance Estimation Algorithm in Adaptive Kalman Filtering for Non-stationary Systems

Abstract—Estimation of unknown noise covariances in a Kalman filter is a problem of significant practical interest in a wide array of applications. This paper presents a single-pass stochastic gradient descent (SGD) algorithm for noise covariance estimation for use in adaptive Kalman filters applied to non-stationary systems where the noise covariances can occasionally jump up or down by an unknown magnitude. Unlike our previous batch method or our multi-pass decision-directed algorithm, the proposed streaming algorithm reads measurement data exactly once and has similar root mean square error (RMSE). The computational efficiency of the new algorithm stems from its one-pass nature, recursive fading memory estimation of the sample cross-correlations of the innovations, and the RMSprop accelerated SGD algorithm. The comparative evaluation of the proposed method on a number of test cases demonstrates its computational efficiency and accuracy.

The key to process and measurement noise covariance estimation is a relationship linking the covariance of the state estimation error and that of the innovations in any suboptimal filter. This relationship serves as a fundamental building block for the correlation-based approaches. Pioneering contributions using this approach were made by [5], [16]–[18].
Sarkka and Nummenmaa [19] proposed a variational Bayesian method for the joint recursive estimation of the dynamic state and measurement noise parameters in linear state space models. The method is implemented by forming separable variance approximations to the joint posterior distribution of state and noise parameters at each time step. However, this method does not account for changes in process noise. The variational approaches generally require tuning parameters to converge to the correct parameters and they often converge to local minima.
In [12], we improved the computational efficiency and accuracy of the batch estimation algorithm in [23] by a sequential mini-batch estimation method with adaptive step size rules. When updating the filter gain, we applied sequential fading-memory mini-batch estimates of the innovation correlations. We also applied dynamic convergence thresholds to enhance the computational efficiency of the sequential estimation algorithm. Application of this method to non-stationary systems requires the change-point detection algorithm described in [11] to extract the time points of abrupt changes in the unknown noise covariances based on the innovation sequence.
The multiple-model estimation algorithm assumes that the system obeys one of a finite number of models, and each model has its own non-switching dynamics [3]. The overall estimate of the multiple-model algorithm is a convex combination of the estimates from multiple parallel filters, each based on one of the individual models, where the weights of the convex combination are the concomitant posterior model probabilities.
For noise covariance estimation in non-stationary systems, the limitations of our previous research [12] are the following. First, the previous methods require multiple passes through the observation data and thus are computationally expensive. Second, the accuracy of the decision-directed noise covariance estimation method depends on the accuracy of the change-point detection algorithm, because the sequential estimation algorithm is invoked for the samples between two consecutive change points. Third, the previous algorithms assumed that the structure of the dynamic model is known. In this paper, we seek to overcome these three limitations by developing a streaming algorithm that reads measurements exactly once and extends the approach to multiple models.

B. Contribution and organization of the paper
We present a single-pass, sequential mini-batch noise covariance estimation algorithm as an extension of the work in [12], [13] for non-stationary systems. Our proposed method enables estimation of the measurement and process noise covariances without the use of a change-point detection algorithm. We enhance the computational efficiency of the method via a single pass through the observation data. The only caveat is that jumps are assumed to occur occasionally and after the filter has reached a steady state, that is, the jumps are infrequent. More significantly, the structure of the dynamic model may be unknown, but is assumed to belong to a known set of candidate models. We validate the proposed method on several non-stationary and multiple-model system test cases.
The paper is organized as follows. In Section II, we provide an overview of the sequential mini-batch stochastic gradient descent (SGD) method in multiple-model systems, including identifiability conditions for estimating the unknown noise covariances in each individual model, the fading memory filter-based innovation correlation estimation, and the SGD update of the Kalman gain. Section III shows evidence that our method can track unknown noise covariances in non-stationary systems, as well as systems exhibiting dynamics from a known set of models, and that the single-pass algorithm is computationally efficient. Lastly, we conclude the paper and discuss potential avenues for future work in Section IV.

II. SEQUENTIAL MINI-BATCH GRADIENT DESCENT METHOD FOR ESTIMATING PROCESS AND MEASUREMENT NOISE COVARIANCE PARAMETERS IN MULTIPLE-MODEL SYSTEMS
The multiple-model approach assumes that the system obeys one of a finite number of fixed models. Formally, the approach assumes that the linear discrete-time stochastic dynamic system can assume one of J models, j = 1, 2, …, J, where x(k) is the n_x-dimensional state vector, z(k) is the n_z-dimensional measurement vector, and j is the candidate model. F_j and H_j are the n_x × n_x state transition matrix and the n_z × n_x measurement matrix of the system, respectively, and Γ_j is the noise gain matrix. Here, the process noise v_j(k) and the measurement noise w_j(k) are sequences of zero-mean white Gaussian noise with unknown process noise covariance Q_j(k) and unknown measurement noise covariance R_j(k), respectively. Note that the initial state error and the two noise processes are assumed to be mutually independent. We assume that Q_j(k) and R_j(k) are piece-wise constant, that the filter reaches a steady state between any two jumps, and that the jumps are of unknown magnitude. Given Q_j(k) and R_j(k), the multiple-model adaptive Kalman filter involves the consecutive processes of prediction and update [3], [9], [10], [15]. The Kalman filter predicts the next state estimate at time index (k + 1), given the observations up to time index k, in (3), and the concomitant predicted state estimation error covariance in (6), using the model-specific system dynamics, the updated state error covariance P_j(k|k) at time index k, and the process noise covariance Q_j(k). The updated state estimate at time (k + 1) in (5) incorporates the measurement at time (k + 1) via the Kalman gain matrix in (8), which depends on the innovation covariance S_j(k + 1) (which in turn depends on the measurement noise covariance R_j(k) and the predicted state error covariance P_j(k + 1|k)). The updated state error covariance P_j(k + 1|k + 1) is computed via (9). This corresponds to the Joseph form, which guarantees that the updated state covariance matrix remains positive definite.
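To make the prediction–update recursion concrete, here is a minimal NumPy sketch of one candidate model's filter, including the Joseph-form covariance update described above. The function names and the scalar test system are our own illustrative choices, not part of the paper's algorithm.

```python
import numpy as np

def kf_predict(x, P, F, Gamma, Q):
    """Predict step: propagate the state estimate and its covariance one step."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Gamma @ Q @ Gamma.T
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update step with the Joseph-form covariance update,
    which keeps the updated covariance positive definite."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    W = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    nu = z - H @ x_pred                   # innovation
    x_upd = x_pred + W @ nu
    A = np.eye(P_pred.shape[0]) - W @ H
    P_upd = A @ P_pred @ A.T + W @ R @ W.T  # Joseph form
    return x_upd, P_upd, nu, S
```

One such filter runs per candidate model j, each with its own F_j, H_j, Γ_j, Q_j, and R_j.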
The mode likelihood function Λ_j(k) is computed via (10), which depends on the innovation sequence ν_j(k) and the innovation covariance S_j(k). The mode probability p_j(k) corresponding to each candidate model at time index k is computed via (11). Without loss of generality, we assume the initial mode probability p_j(0) = 1/J. The identifiability conditions of the multiple-model approach depend on each candidate model, since the corresponding Kalman filters are non-interacting.
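A short sketch of the mode likelihood and mode probability update follows. It assumes the standard Gaussian innovation likelihood and a Bayes normalization for non-interacting filters; the function names are ours.

```python
import numpy as np

def mode_likelihood(nu, S):
    """Gaussian likelihood of the innovation nu (column vector)
    under one candidate model with innovation covariance S."""
    nz = nu.shape[0]
    expo = -0.5 * float(nu.T @ np.linalg.inv(S) @ nu)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** nz * np.linalg.det(S))
    return norm * np.exp(expo)

def update_mode_probs(p_prev, likelihoods):
    """Bayes update of the mode probabilities; the filters are
    non-interacting, so this is a simple reweighting."""
    unnorm = p_prev * likelihoods
    return unnorm / unnorm.sum()
```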
A. Identifiability conditions for Q and R

Consider model j. Assume that Q_j and R_j are unknown, but are piece-wise constant such that the filter reaches the steady state. Now, consider the innovations corresponding to a stable, suboptimal closed-loop filter matrix F̄_j = F_j(I_{n_x} − W_j H_j), given by (12) [20], [23], where the relevant term is the predicted error at time (k − m). Given the innovation sequence (12), let us define a weighted sum of innovations, ξ_j(k) = Σ_{i=0}^{m} a_i^j ν_j(k − i), where the weights are the coefficients of the minimal polynomial of the closed-loop filter matrix F̄_j, that is, Σ_{i=0}^{m} a_i^j (F̄_j)^{m−i} = 0 with a_0^j = 1. It is easy to see that ξ_j(k) is the sum of two moving-average processes driven by the process noise and the measurement noise, respectively [20], [23], where the matrices B_l^j and G_l^j are given in [20], [23]. Then, if we define the cross-covariance between ξ_j(k) and ξ_j(k − ℓ) as L_ℓ^j, we obtain an expression for L_ℓ^j in terms of the noise covariances. The noise covariance matrices of dimension n_z × n_z are positive definite and symmetric. By vectorizing the noise covariance matrices and the L_ℓ^j matrices, Zhang et al. [23] show that they are related by the noise covariance identifiability matrix I_j as in (17).
As shown in [23], if the matrix I_j has full column rank, then the unknown noise covariance matrices Q_j and R_j are uniquely identifiable.

B. Recursive fading memory-based Innovation Correlation Estimation
We compute the sample correlation matrix Ĉ_seq^{j,k}(i) at sample k for model j and time lag i as a weighted combination of the correlation matrix Ĉ_seq^{j,k−1}(i) at the previous sample (k − 1) for model j and time lag i, and the outer product of the innovation samples ν_j(k) and ν_j(k − i). The tuning parameter λ, a positive constant between 0 and 1, is the weight associated with the previous sample correlation matrix. The recursive nature of the proposed algorithm makes it amenable to estimating Q_j and R_j for non-stationary systems.
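The recursion described above can be sketched in a few lines; this is an assumed convex-combination form of the fading-memory update (the exact weighting in the paper may differ), with the function name our own.

```python
import numpy as np

def update_sample_correlation(C_prev, nu_k, nu_k_minus_i, lam=0.95):
    """Recursive fading-memory estimate of the lag-i innovation correlation:
    C_k(i) = lam * C_{k-1}(i) + (1 - lam) * nu(k) nu(k-i)^T.
    lam in (0, 1) weights the previous estimate (longer memory for lam near 1)."""
    return lam * C_prev + (1.0 - lam) * np.outer(nu_k, nu_k_minus_i)
```

Because old samples are discounted geometrically, the estimate adapts after a jump in the noise covariances without any change-point detector.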
The current M sample correlation matrices at time k are used as the initial values for the next pairs of samples for recursive computation. Let us define the number of samples as N.

C. Objective function and the Gradient
The ensemble cross-correlations of a steady-state suboptimal Kalman filter are related to the closed-loop filter matrix F̄_j = F_j(I_{n_x} − W_j H_j), the matrix F_j, the measurement matrix H_j, the steady-state predicted covariance matrix P̄_j, the filter gain W_j, and the innovation covariance C_j(0) via (20) [5], [16]. The objective function Ψ_j, formulated in [23], involves minimization of the sum of the C_j(i), for i > 0, normalized with respect to the corresponding diagonal elements of C_j(0). Formally, we define the objective function Ψ_j to be minimized with respect to W_j as in (21), where diag(C_j) denotes the diagonal matrix of C_j, or equivalently the Hadamard product of an identity matrix with C_j. We can rewrite the objective function by substituting (20) into (21) as in (22). The gradient of the objective function, ∇_W Ψ_j, can be computed as in [23]. The Z_j term in (26) is computed via a Lyapunov equation.
In computing the objective function and the gradient, we replace C_j(i) by their sample estimates Ĉ_seq^{j,k}(i). Evidently, the covariance estimation is a stochastic optimization problem because the cost function and the gradient depend on realized sample paths.
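A minimal sketch of the objective evaluation follows, assuming the normalization takes the common form of dividing each lag-i correlation by the lag-0 diagonal (standard deviations) and summing squared entries; the exact normalization in (21) may differ, and the function name is ours.

```python
import numpy as np

def objective(C_hats):
    """Sketch of the correlation-minimization objective: sum of squared
    lag-i sample correlations, each normalized by the diagonal of the
    lag-0 correlation matrix (assumed form of the normalization).
    C_hats = [C(0), C(1), ..., C(M)]."""
    C0 = C_hats[0]
    d_inv = np.diag(1.0 / np.sqrt(np.diag(C0)))  # inverse per-channel std devs
    total = 0.0
    for Ci in C_hats[1:]:                        # lags i = 1..M only
        Cn = d_inv @ Ci @ d_inv                  # normalized cross-correlation
        total += np.sum(Cn ** 2)                 # squared Frobenius norm
    return total
```

For a well-tuned gain, the innovations become uncorrelated in time and the objective approaches zero.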

D. Updating gain W sequentially
Let B be the mini-batch size and let K = N/B be the number of mini-batches (we assume that N is divisible by B for simplicity). While the mini-batch gradient descent sequentially updates the M sample covariance matrices at every sample, we update the Kalman filter gain W only when the sample index k is divisible by the mini-batch size B, using the gradient of the objective function at sample k. Sequential mini-batch gradient descent allows more opportunities to converge to a better local minimum via more frequent updates of the gain than the batch algorithm, and is much less noisy than a single-sample stochastic gradient algorithm [12]. Let r denote the updating index, starting with r = 0. The generic form of the gain update is given in (28). The incremental gradient algorithm in (28) can be sped up by adaptively selecting the step size (α_j)_r. Our results in [12] showed that Adam [14] and RMSProp [21] have the best accuracy and most rapid convergence among all the accelerated SGD algorithms studied (e.g., bold driver [4], constant, subgradient [7], Adadelta [22]). Here, we show the performance results of our algorithm using the RMSProp update.
RMSProp keeps track of the moving average of the squared incremental gradients for each gain element and adapts the step size element-wise.
where c > 0 is a constant and K is the number of mini-batches. Here, γ = 0.9 is the default value and ϵ = 10⁻⁸ prevents division by zero. When N is unknown, as with streaming data, K is absorbed into the constant c. This is not a restriction, as the mini-batch size B is all we need to implement the stochastic gradient descent algorithm.
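The element-wise RMSProp step on the gain can be sketched as follows, with γ = 0.9 and ϵ = 10⁻⁸ as above; the base step size stands in for the c/K term, and the function name is our own.

```python
import numpy as np

def rmsprop_gain_update(W, grad, v, base_step, gamma=0.9, eps=1e-8):
    """One RMSProp step on the filter gain W.
    v accumulates a moving average of squared gradients, element-wise;
    each gain element gets its own effective step size."""
    v_new = gamma * v + (1.0 - gamma) * grad ** 2
    W_new = W - base_step * grad / (np.sqrt(v_new) + eps)
    return W_new, v_new
```

In the streaming setting, this update fires once per mini-batch, i.e., whenever the sample index is divisible by B.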

E. Estimation of Q and R
Assuming that the necessary and sufficient conditions for the identifiability of the covariances are satisfied [23], here we explore noise covariance estimation using a single-pass SGD algorithm and validate it with standard examples described in the literature. Unlike the algorithm in [23], this algorithm is applicable to non-stationary and multiple-model systems.
1. Estimation of R

Let us define µ_j(k) as the post-fit residual sequence of the Kalman filter, which is related to the innovations. From the joint covariance of the innovation sequence ν_j(k) and the post-fit residual sequence µ_j(k), and the Schur determinant identity [6], [8], one can show that (32) holds in the steady state (assuming a constant gain W_j, and constant Q_j and R_j over time intervals long enough for the filter to reach steady state) [23], where S_j is the steady-state innovation covariance. Because (32) can be interpreted as a simultaneous diagonalization problem in linear algebra [8], or as a continuous-time algebraic Riccati equation, the measurement covariance R_j can be estimated by solving the simultaneous diagonalization problem via Cholesky and eigendecomposition, or by solving a continuous-time Riccati equation as in [1], [23].
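As a sanity check on the steady-state relations, here is a sketch of one simplified special case: when the steady-state gain is (approximately) optimal, R = (I − H W) S. The general suboptimal case requires the simultaneous diagonalization of [23], which is not shown; this is only the simplified limit, and the function name is ours.

```python
import numpy as np

def estimate_R_optimal_gain(W, H, S):
    """Special-case sketch: with an (approximately) optimal steady-state
    gain W = P H^T S^{-1}, the measurement covariance satisfies
    R = (I - H W) S. Not the general suboptimal-gain estimator of [23]."""
    nz = H.shape[0]
    return (np.eye(nz) - H @ W) @ S
```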

2. Estimation of Q
Given the estimated R_j, we can compute the process noise covariance Q_j and the steady-state updated state covariance P_j. This requires an iterative process because Q_j and P_j are coupled in the general case [23]. Let t and l denote the iteration indices, starting with t = 0 and l = 0. Using the initial value (Q_j)_0 = W_j S_j W_j′ (the exact solution in the Wiener process case [23]), we initialize the steady-state updated covariance matrix P_j as the solution of the Lyapunov equation in (33), where F̄_j = (I_{n_x} − W_j H_j)F_j. We iteratively update P_j as in (34) until convergence. Given the converged P_j, Q_j is updated in the t-loop until the estimate of Q_j converges.
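The inner l-loop (the fixed-point iteration for the steady-state updated covariance P) can be sketched as below. The constant term is assumed to take the standard form for a fixed-gain filter; the outer t-loop that re-solves for Q is not shown, and the function name is ours.

```python
import numpy as np

def iterate_P(F, H, W, Gamma, Q, R, n_iter=200):
    """Fixed-point iteration of the steady-state Lyapunov equation
    P = Fbar P Fbar^T + const for a fixed-gain filter, where
    Fbar = (I - W H) F. Converges when Fbar is stable (|eig| < 1).
    Sketch of the inner l-loop only; the Q update of [23] is omitted."""
    nx = F.shape[0]
    A = np.eye(nx) - W @ H
    Fbar = A @ F
    const = A @ Gamma @ Q @ Gamma.T @ A.T + W @ R @ W.T
    P = np.zeros((nx, nx))
    for _ in range(n_iter):
        P = Fbar @ P @ Fbar.T + const
    return P
```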
The pseudocode for the sequential mini-batch SGD estimation algorithm for noise covariance parameters in multiple-model non-stationary systems is included as Algorithm 1.

III. NUMERICAL EXAMPLES
We apply our proposed method to the system used in [17], but modified to have non-stationary noise covariance matrices.The system matrices are assumed to be as follows.
The non-stationarity is assumed to arise from occasional abrupt changes of unknown magnitude in the process and measurement noise covariances. Here, "occasionally" implies that the jumps are infrequent enough that the Kalman filter is in the steady state prior to a jump in the noise covariance. In this paper, we call a subset of samples over which the noise covariances do not change a subgroup. Let N_sg be the number of subgroups and let N be the total number of samples. Here, each subgroup has N/N_sg samples during which the noise covariances remain constant.
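For reproducing this setup, the piecewise-constant noise schedule can be generated as follows; this is our own illustrative helper, not code from the paper.

```python
import numpy as np

def piecewise_constant_schedule(values, n_samples):
    """Assign each of n_samples a noise variance from its subgroup.
    values holds one variance per subgroup; each of the len(values)
    subgroups spans n_samples // len(values) consecutive samples."""
    values = np.asarray(values, dtype=float)
    per_subgroup = n_samples // len(values)
    return np.repeat(values, per_subgroup)
```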
Algorithm 1: Pseudocode of the single-pass estimation algorithm in multiple-model adaptive Kalman filtering for a non-stationary system
1: input: (W_j)_0, (Q_j)_0, (R_j)_0, (α_j)_0, B, p_j(0) ▷ (W_j)_0: initial gain, (Q_j)_0: initial Q_j, (R_j)_0: initial R_j, (α_j)_0: initial step size, B: batch size, p_j(0): initial mode probability
2: r = 0 ▷ initialize the updating index r
⋮
12: update the step size (α_j)_r
13: update the gain
14: update (R_j)_{r+1} and (Q_j)_{r+1}
15: r = r + 1
⋮
compute the mode probability p_j(k)
mix the state estimates
21: end for

First, we consider three non-stationary scenarios for this system with N up to 50,000 measurement samples and N_sg = 5 subgroups: 1) both Q and R change every N/N_sg samples (see Section A); 2) only Q changes every N/N_sg samples; and 3) only R changes every N/N_sg samples (see Section B). Next, we consider a scenario using 4,000 measurement samples in which the measurement noise covariance changes continuously, and compare our algorithm with the variational Bayesian method and the interacting multiple-model approach (see Section C). We also explore the problem of tracking the position and velocity of an aircraft in an air traffic control (ATC) system (see Section D). Finally, we explore a multiple-model scenario with a set of single-pass adaptive Kalman filters, one per mode. The multiple-model method estimates the noise covariance parameters in parallel, and then finds the most probable model via the concomitant mode probability (see Section E).
Note that we present the performance of the proposed method using a single model (j = 1) in Sections A to D, and then consider the multiple-model approach (j = 1, 2) in Section E. In the estimation procedure, we set the number of burn-in samples N_b = 50 and the number of lags M = 5. All computational simulations were run on a computer with an Intel Core i7-8665U processor and 16 GB of RAM.
A. Scenarios when both Q and R vary

We show that the proposed algorithm works with varying mini-batch sizes. We also simulate 100 MC runs with a varying number of observation samples to establish the minimum number of samples required for accurately estimating the unknown variance parameters.

Varying the mini-batch sizes
Table I shows Monte Carlo simulation results when Q and R are varied. The noise covariance estimation algorithm was run for varying mini-batch sizes. The minimum mini-batch size for acceptable RMSE for this problem is 64, since the accuracy of estimating P_{11} is relatively low when the mini-batch size is ≤ 32 (shown highlighted). For all subsequent results, we use the RMSProp update with a mini-batch size of 64 in our estimation algorithm.

Varying the number of samples observed
In order to find the minimum number of observation samples for tracking time-varying Q and R, we compared the estimated Q and R values with their concomitant true values, while varying the number of observation samples N = 5,000, 10,000, 20,000, 30,000, and 50,000, and keeping the number of subgroups constant at N_sg = 5. Fig. 1 shows the results of 100 Monte Carlo simulations in estimating Q and R when the number of observed samples is varied. The five dotted lines indicate the true values corresponding to each subgroup. Note that the true Q values by subgroup are [0.04, 0.64, 0.25, 1.00, 0.09], and the true R values by subgroup are [0.42, 0.81, 0.49, 0.16, 0.64] for all observation samples given. The solid lines show the estimated noise covariance parameters based on the single-pass SGD algorithm. The estimated Q and R are close to the corresponding true values when N ≥ 30,000. Fig. 1 indicates that the accuracy of the estimates increases with an increase in the number of samples, as one should expect.

Comparison of different algorithms for noise covariance estimation
Table II shows the performance comparison of noise covariance estimation algorithms in non-stationary systems. As shown in Table IIa, the batch estimation algorithms presented in [17], [23] are not suitable for non-stationary systems because they assume the covariances to be constant and require the entire observation sequence to compute the objective function and the gradient. As shown in Table IIb, the multi-pass sequential mini-batch algorithm presented in [12] can accurately estimate the noise parameters, but the algorithm requires an anomaly detection algorithm to detect the change points and is computationally expensive.
Compared to the batch and multi-pass algorithms, the proposed single-pass method can estimate the noise covariance parameters accurately, as shown in Table IIc. The multi-pass algorithm performed better than the single-pass algorithm in terms of RMSE, but the single-pass algorithm also estimates the noise parameters close to the corresponding truth. More significantly, the proposed method exhibits a speedup of 36 over the multi-pass method. In addition, the single-pass algorithm works with streaming data and does not require the measurement data to be stored. Unlike the batch algorithm, which is only applicable to stationary systems, the proposed algorithm and the multi-pass algorithm can track Q and R accurately, as shown in Fig. 2a and Fig. 2b. The trajectories of the Q and R estimates can be smoothed by a simple first-order fading memory filter with a smoothing weight of 0.9. Fig. 2e shows the averaged NIS of the SGD algorithm (RMSProp; mini-batch size of 64) when Q and R are varied. The RMSProp SGD algorithm-based Kalman filter is consistent, although there are transients when the noise covariances change abruptly.

In the varying-Q scenario, the process noise covariance Q changes every 10,000 samples, while the measurement noise covariance R is constant over all 50,000 samples, as shown in Fig. 3. The sequential mini-batch algorithm can track Q and R quite well and provides a consistent state estimator. In the varying-R scenario, the measurement noise covariance R changes every 10,000 observation samples, but the process noise covariance Q is kept constant over the 50,000 samples. Fig. 4 shows that the sequential mini-batch gradient descent algorithm can track Q and R correctly. Table III shows the 100 MC simulation results when either Q or R is constant. The trajectories of the estimated process noise covariance and the estimated measurement noise covariance produced by the sequential algorithm are very close to their true trajectories.

C. Scenario where R changes continuously
We explore the scenario where R changes continuously (but Q is constant), as in the example used in [19]. The system matrices are given below. The sampling interval h is 0.1 seconds. A total of 4,000 measurement samples were collected (400 seconds of data). The measurement noise covariance R is 0.2, except that it increases rapidly to 1 around sample index k = 1,000 (100 seconds) and then decreases again rapidly around k = 2,000 (200 seconds). In our data generation process, we model the measurement noise variance R(k) as a continuous function that approximates this change. The Variational Bayesian Adaptive Kalman Filter (VB-AKF) algorithm described in [19] models this change by simply "spreading" its previous approximate posteriors via a heuristic factor ρ. Our approach does not require such heuristic approximations.
Table IV shows the RMSE of the estimated R by VB-AKF, interacting multiple-model (IMM) filters, and the proposed method for various values of ρ (for VB-AKF) and numbers of noise models (for IMM). The performance of VB-AKF is very sensitive to the selection of ρ, as shown in Table IV; the RMSE can change by a large factor when the wrong value is selected for ρ. For estimating R based on the IMM filter, the range of noise levels is selected to be between 0.1 and 1.2, to be consistent with [19], and the quantized levels for the various numbers of noise models (between 2 and 111) are selected uniformly from this range. The RMSE of the estimated R is the least when IMM uses 111 noise models. Our experiments show that even 64 noise models would have provided better RMSE than the VB-AKF method. However, IMM with a large number of models is computationally very expensive. The computational efficiency of the proposed method (RMSProp update; mini-batch size of 64) is almost the same as that of VB-AKF, but the proposed method performs slightly better than VB-AKF in terms of RMSE. More significantly, our approach does not require knowledge of the hyper-parameter ρ, which is rarely known in practice. If the computational cost of tuning the hyper-parameter ρ is taken into account, our algorithm is much faster and more accurate than the VB-AKF. Also, unlike VB-AKF, our approach allows for the estimation of both Q and R.
Fig. 5 shows the trajectory of estimated R by VB-AKF (for ρ = 1 − exp(−4)), IMM (for 111 noise models) and the proposed method.The proposed method and IMM provide measurement noise variance estimates that can track the true variance more closely than the VB-AKF.The correct choice of ρ is critical to the success of VB-AKF.

D. An Air Traffic Control (ATC) Scenario
We consider an air traffic control (ATC) scenario used in [3]. The ground truth is a target moving with a constant speed of 250 m/s, with the initial state specified in Cartesian coordinates.
The sampling interval is T = 1 second. A total of 500 measurement samples were collected (500 seconds of data). The target position measurements are generated starting from k = 0; they are in polar coordinates (range r and azimuth θ), taken by a radar located at [ξ_0, η_0] = [−10⁴, 0], with additive white Gaussian noise. In this example, we used a Kalman filter based on a second-order linear kinematic (white noise acceleration, WNA) model with process noise of standard deviation 1 m/s², described in (41).
An IMM estimator with one WNA model (a constant velocity model with process noise standard deviation 1 m/s²) for the uniform motion (UM) and a nearly coordinated turn (CT) model, described in (42) and (43), is used. The process noise standard deviations used in the CT model were 3 m/s² and 0.1 °/s² for the UM part and the turn rate of the state, respectively.
The mode transition probability matrix π in (44) is used for the IMM estimator.
Fig. 6 shows the averaged tracking results of the target motion over 100 MC runs by the KF, IMM, and the proposed method. For the single-pass SGD estimation algorithm, we considered two models, using either the UM or the CT model. The proposed approach can track the target close to its true trajectory when compared to both the KF and IMM. As shown in Fig. 7a, the proposed approach based on the CT model (and even the UM model) has a peak root mean square (RMS) position error of about 200 m in the scenario considered. The proposed method reduces the RMS position error by a factor of nine compared to the Kalman filter, and by a factor of four compared to the IMM estimator, when the aircraft is maneuvering. The proposed approach shows an acceptable RMS velocity estimation error, as shown in Fig. 7b, and can also track the target velocity close to its true value, as shown in Fig. 7c.

E. Application to Multiple Model Cases
We consider a scenario with an unknown motion model using a multiple-model approach. The multiple-model algorithm estimates the noise covariance parameters using two Kalman filters, each tuned to a model, and the mode probabilities are used to infer which model is active. Here, we assume that a model is correct if its mode probability is greater than 0.66, and use the corresponding state estimate. We mix the state estimates by their mode probabilities if the mode probability of a model is between 0.33 and 0.66.

1. Scenario when both Q and R vary

In this example, we assume that the observation samples are generated by Model 2 as in (46). The two candidate models are specified by their respective system equations. Fig. 8 shows the trajectory of the estimated parameters. The mode probability of the second model is higher than that of the first model, as expected. We find that the single-pass multiple-model approach can track Q and R accurately.
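The decision rule described above (accept one model above the 0.66 threshold, mix by mode probability otherwise) can be sketched directly for the two-model case; the function name is our own.

```python
import numpy as np

def select_or_mix(x_hats, p, accept=0.66):
    """Two-model decision rule from the text: return one model's state
    estimate when its mode probability exceeds the accept threshold,
    otherwise mix the estimates weighted by the mode probabilities."""
    if p[0] > accept:
        return x_hats[0]
    if p[1] > accept:
        return x_hats[1]
    return p[0] * x_hats[0] + p[1] * x_hats[1]
```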

2. Scenario where R changes continuously
We consider the same scenario as in (37), with two dynamic models. Table V shows the RMSE of the estimated R over 100 MC runs by the multiple-model approach. The multiple-model method estimates the parameters in parallel, and then selects a suitable model based on the mode probabilities. Note that the measurements are generated using Model 1 as in (47). The VB-AKF provides the best estimate when the hyper-parameter ρ = 1 − exp(−4), but the RMSE value is quite sensitive to the selection of ρ. When the computation time needed for tuning the hyper-parameter ρ is considered, our algorithm is superior to VB-AKF in both RMSE and computational efficiency. For estimating R by the IMM filter with a multiple-model approach, each model used 111 noise models that varied uniformly between 0.1 and 1.2. The IMM filter with a large number of models (111 noise models) shows a 10% lower RMSE than our proposed method, but the IMM filter is computationally more expensive by as much as a factor of 197. Even the IMM filter with 64 noise models shows slightly worse RMSE than our proposed method, while our method has better computational efficiency by a factor of 66 over the IMM.

IV. CONCLUSION AND FUTURE WORK
In this paper, we presented a single-pass stochastic gradient descent (SGD) algorithm for noise covariance estimation in adaptive Kalman filters that is an order of magnitude faster than the batch method or the multi-pass sequential algorithm and has acceptable state estimation root mean square error (RMSE). This algorithm is valid for streaming data from non-stationary systems, where the noise covariances can occasionally exhibit abrupt but finite changes, as well as for multiple models. The computational efficiency of the new algorithm stems from recursive fading memory estimation of the sample cross-correlations of the innovations, accelerated SGD algorithms, and single-pass computations. Evaluation of the proposed method on a number of test cases demonstrated its computational efficiency, accuracy, and filter consistency relative to extant approaches.
In the future, a number of research avenues can be pursued, including 1) estimating Q and R using one-step-lag smoothed residuals; 2) automatic model selection from a library of dynamic models for model adaptation; and 3) exploring the utility of the covariance estimation algorithm as an alternative to IMMs.

Algorithm 1 (steps 3–7):
3: for k = 1 to N do ▷ N: number of samples
4: for j = 1 to J do ▷ J: number of models
5: compute the innovation correlations ν_j(k)
6: compute the mode likelihood function Λ_j(k)
7: if k > N_b + M then ▷ N_b: number of burn-in samples

Fig. 1: Estimation of noise parameters with varying the number of observation samples for varying Q and R

Fig. 2: Comparison of optimization algorithms in varying Q and R system

Fig. 3: Trajectory of noise parameters in varying Q system

Fig. 4: Trajectory of noise parameters in varying R system

Fig. 8: Trajectory of Q and R based on the multiple-model approach

Fig. 9 shows the averaged estimated trajectory of R over 100 MC runs by VB-AKF (for ρ = 1 − exp(−4)), IMM (for 111 noise models), and our single-pass SGD (RMSProp update with a mini-batch size of 16) in the multiple-model scenario. All methods can track R correctly, but VB-AKF requires knowledge of the heuristic factor ρ, and the computational cost of IMM is substantially higher. Here, the estimated trajectory of R from the single-pass algorithm was smoothed with a smoothing weight of 0.7.

TABLE I: Monte Carlo Simulation when Q and R are varied (100 Runs; 50,000 samples; RMSProp update)

TABLE II: Performance comparison of optimization algorithms in non-stationary systems. (a) Batch estimation (100 Monte Carlo Runs; 50,000 samples; M = 100)

H.-S. Kim et al.: A Single-pass Noise Covariance Estimation Algorithm in Multiple-model Adaptive Kalman Filtering

B. Scenarios where either Q or R is constant

1. Varying Q

TABLE III: Monte Carlo Simulation when either Q or R is constant (100 Runs; 50,000 samples; RMSProp update). (a) The case of varying Q

TABLE V: RMSE of estimated R by multiple-model approach (100 MC Runs; 4,000 samples)