Variational Bayesian Iterative Estimation Algorithm for Linear Difference Equation Systems

Many basic laws of physics and chemistry can be written in the form of differential equations. With the development of digital signal processing and computer technology, research on discrete models has received increasing attention. Estimates of the unknown coefficients in the discretized difference equation can be obtained by optimizing certain criterion functions. In modern control theory, the state-space model transforms a high-order differential equation into a set of first-order differential equations by introducing intermediate state variables. In this paper, the parameter estimation problem for linear difference equation systems with uncertain noise is developed. By transforming the system equations into state-space models, and on the basis of the assumed priors of the noise and parameters, a variational Bayesian iterative estimation algorithm is derived to obtain the parameter estimates from the observation data. The unknown states involved in the variational Bayesian algorithm are updated by the Kalman filter. A numerical simulation example is given to validate the effectiveness of the proposed algorithm.


Introduction
Differential equations arise in many instances when mathematical models are used to describe phenomena in economics, engineering, biology, etc. [1,2]. Consider the following nth-order linear differential equation with zero initial conditions:
$$y_k^{(n)} + \alpha_1 y_k^{(n-1)} + \alpha_2 y_k^{(n-2)} + \cdots + \alpha_{n-1} y_k^{(1)} + \alpha_n y_k = \beta_1 u_k^{(n-1)} + \beta_2 u_k^{(n-2)} + \cdots + \beta_n u_k,$$
where $y_k$ is the system output, $u_k$ is the system input, $k$ is the time variable, and $\alpha_i$ and $\beta_i$ are the unknown real parameters; the superscript $(i)$ denotes the $i$th derivative with respect to time. Although the laws of the systems evolve continuously over time, the sampled values of the systems are always discretized. In order to obtain the estimates of the unknown parameters in the system mathematical models, there are usually two approaches to further process the differential equation models. One is the frequency-domain analysis methods, and the other is the time-domain analysis methods [3][4][5][6].
In the frequency-domain analysis methods, the transfer function, obtained by means of the Laplace transform, is one of the most important mathematical models in control theory. Under zero initial conditions, the transfer function is defined as the Laplace transform of the system output divided by the Laplace transform of the system input, so the output equation can be expressed as Y(s) = G(s)U(s). The system is excited by designing different input signals to obtain different output responses, and the parameters of the system are then obtained from the analysis of these output responses [7][8][9]. However, the transfer function is only suitable for describing linear constant-coefficient differential equation systems.
In the time-domain analysis methods, the system is represented in discrete time and the sampling interval is assumed to be one time unit. In this case, discrete-time operator equations or difference equations are necessary [10][11][12][13]. A typical controlled autoregressive difference equation is written as
$$y_k + a_1 y_{k-1} + a_2 y_{k-2} + \cdots + a_n y_{k-n} = b_1 u_{k-1} + b_2 u_{k-2} + \cdots + b_n u_{k-n}. \quad (1)$$
The output $y_k$ can then be written in a form that determines the next value from the previous observations:
$$y_k = \varphi_k^T \vartheta, \quad (2)$$
where $\varphi_k := [-y_{k-1}, \cdots, -y_{k-n}, u_{k-1}, \cdots, u_{k-n}]^T$ and $\vartheta := [a_1, \cdots, a_n, b_1, \cdots, b_n]^T$. From Equation (2), it is easy to see that the calculation of $y_k$ from past data depends on the parameters in $\vartheta$. Call the calculated value $\hat{y}_{k|\vartheta}$ and write $\hat{y}_{k|\vartheta} = \varphi_k^T \vartheta$. The estimates of the parameters are obtained by optimizing a criterion function: effective estimates make the prediction error $\varepsilon(k, \vartheta) = y_k - \hat{y}_{k|\vartheta}$ as small as possible. Given a batch of inputs and outputs over a time interval $1 \leq k \leq N$, the quadratic criterion function is defined as [14]
$$V_N(\vartheta) = \frac{1}{N} \sum_{k=1}^{N} \big[y_k - \hat{y}_{k|\vartheta}\big]^2.$$
The process of computing the estimate $\hat{\vartheta}_N$ is the process of solving the following optimization problem:
$$\hat{\vartheta}_N = \arg\min_{\vartheta} V_N(\vartheta),$$
where $\arg\min_{\vartheta}$ represents the value of the considered parameter vector at which the objective function takes its minimum. Classical least squares-based and gradient descent-based numerical methods have been applied to search for the parameter estimates by minimizing the criterion function [15][16][17][18]. For example, Xu et al. developed a least squares-based iterative parameter estimation algorithm for dynamic systems based on parameter decomposition by using the hierarchical identification principle [19].
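The least squares route described above can be sketched numerically. The example below is a minimal illustration, not the paper's implementation: a first-order ARX system with assumed coefficients (a1 = 0.5, b1 = 0.97) is simulated without noise, and the quadratic criterion is minimized via the normal equations using `numpy.linalg.lstsq`.

```python
import numpy as np

# Hedged sketch: least squares estimation of the ARX difference equation
# y_k + a1*y_{k-1} = b1*u_{k-1}, i.e. y_k = -a1*y_{k-1} + b1*u_{k-1}.
# The system order and coefficient values are illustrative assumptions.
rng = np.random.default_rng(0)
a1, b1 = 0.5, 0.97
N = 500
u = rng.standard_normal(N)          # persistently exciting input
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a1 * y[k - 1] + b1 * u[k - 1]

# Stack the regressors phi_k = [-y_{k-1}, u_{k-1}] and minimize
# V_N(theta) = sum_k (y_k - phi_k^T theta)^2 via the normal equations.
Phi = np.column_stack([-y[:-1], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta_hat)   # close to [0.5, 0.97] in the noise-free case
```

In the noise-free case the criterion has a unique minimizer at the true parameters, so the recovered estimates match them up to numerical precision.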
However, the gradient-based and least squares-based iterative algorithms only obtain point estimates of the parameters and assume that the noise variance is known. They minimize a cost function of the parameter vector and then obtain the estimate of the parameter vector. Different from the stochastic gradient and least squares algorithms, the variational Bayesian (VB) algorithm requires priors on the parameters and then iteratively updates the posterior expressions of the parameters by maximizing a lower bound on the marginal likelihood function [20,21]. The VB algorithm can obtain the distributions of the unknown parameters rather than point estimates. For example, Yang and Yin applied VB inference to estimate the parameters of finite impulse response (FIR) systems with randomly missing output data [22].
In this paper, the difference equation is further transformed into a state-space model. In modern control theory, the state-space model is widely used to describe the dynamics of systems, because it not only reflects the internal states of systems but also reveals the relationship between the intermediate states and the external input and output variables [23][24][25][26][27][28]. A general linear state-space model can be written as
$$x_{k+1} = A x_k + B u_k, \quad y_k = C x_k + v_k,$$
where $y_k$ is the system output, $u_k$ is the system input, $v_k$ is the random noise, and $x_k = [x_{1,k}, \cdots, x_{n,k}]^T$ collects the unmeasurable internal states, which establish the link between the system's internal information and the external measurements. However, state estimation brings difficulty to the parameter estimation [29]. For linear Gaussian systems, the Kalman filter can achieve optimal estimation performance with a small computational load, and it is usually the preferred state estimation method for state-space models [30][31][32].
Based on the previous iterative identification schemes [33][34][35][36], this paper develops a VB-based iterative estimation algorithm for difference equation models with unknown measurement noise. Different from the work in [37], which used an open-loop state observer to roughly calculate the system states without considering the effect of measurement noise in the state estimation procedure, this paper addresses the state estimation problem by using the Kalman filter. The main contributions of this paper are as follows.

• By defining internal state variables, the difference equation model is rewritten in the form of a state-space model.
• A Kalman filter-based VB (VB-KF) estimation algorithm is provided to estimate both the model states and the parameters interactively. Based on the estimated states, the VB algorithm computes the system parameters and the noise variance; then the states are updated by the Kalman filter with the newly obtained parameters and noise variance.
• Different from existing point estimation methods, the proposed algorithm gives the distributions of the system parameters and the noise variance, reflecting the uncertainties of their estimates.
The remainder of this paper is organized as follows. Section 2 introduces the state-space model. Section 3 derives a VB iterative algorithm to estimate the unknown parameters of the systems by using the Kalman filter. The simulation is presented in Section 4 to verify the proposed method. Section 5 gives some conclusions.

Transformation of the Difference Equation
Reconsidering the general nth-order difference equation in Equation (2), n state variables are needed in order to transform the difference equation into a state-space model. The choices of the state variables are not unique; they can be measurable or unmeasurable. Usually, the state variables that help convert the state equation into a standard form are selected as the intermediate states. Here, we introduce the n state variables
$$x_{1,k} := y_k, \quad x_{m,k} := \sum_{j=m}^{n} \big(b_j u_{k-j+m-1} - a_j y_{k-j+m-1}\big), \quad m = 2, \cdots, n. \quad (3)$$
Replacing the time variable $k$ with $k+1$ on both sides of Equation (3) gives the following first-order recursions:
$$x_{i,k+1} = x_{i+1,k} - a_i y_k + b_i u_k, \quad i = 1, \cdots, n-1, \qquad x_{n,k+1} = -a_n y_k + b_n u_k. \quad (4)$$
Define the state vector $x_k := [x_{1,k}, x_{2,k}, \cdots, x_{n,k}]^T \in \mathbb{R}^n$. Rewriting the state equation for each state variable in Equation (4) in matrix form, the observability canonical form state-space model of the difference Equation (2) can be described as
$$x_{k+1} = \begin{bmatrix} -a_1 & 1 & 0 & \cdots & 0 \\ -a_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -a_{n-1} & 0 & 0 & \cdots & 1 \\ -a_n & 0 & 0 & \cdots & 0 \end{bmatrix} x_k + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} u_k, \quad (5)$$
$$y_k = [1, 0, \cdots, 0]\, x_k. \quad (6)$$
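The construction of the canonical matrices can be sketched in code. The realization below (coefficients in the first column of A, a shifted identity above the diagonal, output taken as the first state) is one common canonical form that reproduces the difference equation; the exact form used in the paper may differ, and the coefficient values are illustrative assumptions.

```python
import numpy as np

def canonical_abc(a, b):
    """Build one common canonical realization (A, B, C) of
    y_k + a1*y_{k-1} + ... + an*y_{k-n} = b1*u_{k-1} + ... + bn*u_{k-n}.
    This is a sketch of a standard construction, not necessarily the
    paper's exact canonical form."""
    n = len(a)
    A = np.zeros((n, n))
    A[:, 0] = -np.asarray(a)        # first column holds -a1, ..., -an
    A[:n - 1, 1:] = np.eye(n - 1)   # shifted identity above the diagonal
    B = np.asarray(b, dtype=float).reshape(n, 1)
    C = np.zeros((1, n)); C[0, 0] = 1.0   # output is the first state
    return A, B, C

A, B, C = canonical_abc([0.5, 0.32, 0.18], [0.97, 0.55, 1.0])
```

Simulating the state-space model from zero initial conditions and comparing against the original recursion confirms that this realization reproduces the difference equation exactly.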
Further, we rewrite (5) and (6) as
$$x_{k+1} = A x_k + B u_k, \quad (7)$$
$$y_k = C x_k, \quad (8)$$
where the matrix $A \in \mathbb{R}^{n \times n}$ is called the system matrix, the vector $B \in \mathbb{R}^{n \times 1}$ is the system input vector, and the vector $C \in \mathbb{R}^{1 \times n}$ is the system output vector. Equation (7) is called the state equation, and Equation (8) is called the output equation. In particular, for multi-input multi-output systems, the input and output vectors are extended to matrices. It is easy to see that the unknown parameters appear in the system matrix and the input vector. In order to estimate the unknown parameters of the system, the output equation needs to be redescribed so that it contains all the unknown parameters. Transforming the state equation, the output equation in Equation (8) can be further written as
$$y_k = -a_1 y_{k-1} - \cdots - a_n y_{k-n} + b_1 u_{k-1} + \cdots + b_n u_{k-n}. \quad (9)$$
Define the parameter vector $\theta$ and the information vector $\psi_k$ as
$$\theta := [a_n, a_{n-1}, \cdots, a_1, b_n, b_{n-1}, \cdots, b_1]^T \in \mathbb{R}^{2n},$$
$$\psi_k := [-y_{k-n}, -y_{k-n+1}, \cdots, -y_{k-1}, u_{k-n}, u_{k-n+1}, \cdots, u_{k-1}]^T \in \mathbb{R}^{2n}.$$
Considering that there will be some disturbance in actual systems, introduce a noise term $v_k$ into the output equation. Then the output in Equation (9) can be written in the vector form
$$y_k = \psi_k^T \theta + v_k, \quad (10)$$
where $v_k$ is the random disturbance, which can be white noise or colored noise. For practical systems, the effects of noise should be considered when dealing with the problem of parameter estimation.
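The information vector can be built directly from past data. The sketch below assumes an ordering of $\psi_k$ consistent with $\theta = [a_n, \cdots, a_1, b_n, \cdots, b_1]^T$; the ordering within $\psi_k$ is an assumption reconstructed from that parameter ordering.

```python
import numpy as np

# Hedged sketch: the output equation y_k = psi_k^T theta + v_k, with
# theta = [a_n,...,a_1, b_n,...,b_1]^T and
# psi_k = [-y_{k-n},...,-y_{k-1}, u_{k-n},...,u_{k-1}]^T
# (an assumed ordering consistent with the parameter vector above).
def info_vector(y, u, k, n):
    past_y = [-y[k - n + i] for i in range(n)]   # -y_{k-n}, ..., -y_{k-1}
    past_u = [u[k - n + i] for i in range(n)]    #  u_{k-n}, ...,  u_{k-1}
    return np.array(past_y + past_u)

# illustrative 3rd-order parameter vector [a3, a2, a1, b3, b2, b1]
theta = np.array([0.18, 0.32, 0.50, 1.00, 0.55, 0.97])
```

With this ordering, the inner product $\psi_k^T \theta$ reproduces the right-hand side of Equation (9) term by term.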

Remark 1.
By introducing internal state variables, higher-order difference equations are transformed into first-order difference equations, which simplifies the form of the output equations of the models.
This paper develops a combined estimation algorithm for the parameters, intermediate states, and noise variance of linear difference equation systems. On the basis of the model shown in Equation (10), the next section derives a VB-KF iterative algorithm to solve the parameter estimation problem.

The Variational Bayesian Parameter Estimation Algorithm
The VB algorithm is based on variational approximation theory. The integrations involved in the exact posterior probability computation are generally difficult to handle. The VB algorithm introduces a family of free factorized distributions to approximate the complicated posterior probability, which avoids the direct multiple integrations of the marginal distribution [38,39]. In this paper, the parameter vector, together with the corresponding estimation error covariance matrix and the noise variance, is inferred by using the VB approach.
The unknown quantities that need to be estimated include the system model parameter vector $\theta$, the precision $\zeta$ of the parameter prior (with covariance $\zeta^{-1} I$), and the noise precision $\sigma$ (with noise variance $\sigma^{-1}$). Their priors are assumed to follow Gaussian, Gamma, and Gamma distributions, respectively:
$$p(\theta \mid \zeta) = \mathcal{N}(\theta \mid 0, \zeta^{-1} I), \quad p(\zeta) = \mathrm{Gamma}(\zeta \mid d_0, e_0), \quad p(\sigma) = \mathrm{Gamma}(\sigma \mid f_0, h_0).$$
The VB algorithm introduces a free distribution $q(\cdot)$ as an approximation of the posterior distribution to simplify the calculation of the logarithmic likelihood function of the measurement data. The log-probability of the observations can be expressed as
$$\log p(y_{1:k}) = \log p(\theta, \zeta, \sigma, y_{1:k}) - \log p(\theta, \zeta, \sigma \mid y_{1:k}),$$
and taking expectations with respect to $q(\theta, \zeta, \sigma)$ gives the decomposition
$$\log p(y_{1:k}) = F\big(q(\theta, \zeta, \sigma)\big) + \mathrm{KL}\big(q(\theta, \zeta, \sigma) \,\|\, p(\theta, \zeta, \sigma \mid y_{1:k})\big),$$
where
$$F\big(q(\theta, \zeta, \sigma)\big) = \int q(\theta, \zeta, \sigma) \log \frac{p(\theta, \zeta, \sigma, y_{1:k})}{q(\theta, \zeta, \sigma)} \, d\theta \, d\zeta \, d\sigma$$
is the lower bound of $\log p(y_{1:k})$ (by Jensen's inequality [38]), and $\mathrm{KL}\big(q(\theta, \zeta, \sigma) \,\|\, p(\theta, \zeta, \sigma \mid y_{1:k})\big)$ is the Kullback-Leibler divergence between the factorized approximation $q(\theta, \zeta, \sigma)$ and the true posterior distribution $p(\theta, \zeta, \sigma \mid y_{1:k})$, which quantifies the similarity between the two probability distributions.
According to Equation (14), the variational distribution of $\zeta$ at iteration $s+1$ takes the Gamma form
$$q^{(s+1)}(\zeta) = \mathrm{Gamma}\big(\zeta \mid d^{(s+1)}, e^{(s+1)}\big),$$
where $d^{(s+1)}$ and $e^{(s+1)}$ are the two parameters of the Gamma distribution, given by the standard conjugate update
$$d^{(s+1)} = d_0 + n, \quad e^{(s+1)} = e_0 + \frac{1}{2} \mathbb{E}\big[\theta^T \theta\big].$$
Then, update the expectation of $\zeta$ at iteration $s+1$ by
$$\bar{\zeta}^{(s+1)} = \frac{d^{(s+1)}}{e^{(s+1)}}.$$
Similarly, the variational form of $q^{(s+1)}(\sigma)$ obeys
$$q^{(s+1)}(\sigma) = \mathrm{Gamma}\big(\sigma \mid f^{(s+1)}, h^{(s+1)}\big),$$
where $f^{(s+1)}$ and $h^{(s+1)}$ are given by
$$f^{(s+1)} = f_0 + \frac{k}{2}, \quad h^{(s+1)} = h_0 + \frac{1}{2} \sum_{i=1}^{k} \mathbb{E}\big[(y_i - \psi_i^T \theta)^2\big].$$
Then $\bar{\sigma}$ can be computed as
$$\bar{\sigma}^{(s+1)} = \frac{f^{(s+1)}}{h^{(s+1)}}.$$
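The coupled VB updates for a linear-in-parameters model of the form $y_k = \psi_k^T \theta + v_k$ can be sketched as below. This is a standard mean-field VB routine for Bayesian linear regression under the factorization $q(\theta)q(\zeta)q(\sigma)$; the hyperparameter values, the function name `vb_linear`, and the exact update formulas are assumptions and may differ in detail from the paper's derivation.

```python
import numpy as np

# Hedged sketch of mean-field VB for y = Phi @ theta + v with priors
# theta ~ N(0, zeta^{-1} I), zeta ~ Gamma(d0, e0), sigma ~ Gamma(f0, h0),
# where sigma is the noise precision. Standard conjugate updates assumed.
def vb_linear(Phi, y, iters=50, d0=1e-2, e0=1e-2, f0=1e-2, h0=1e-2):
    N, p = Phi.shape
    zeta_bar, sigma_bar = d0 / e0, f0 / h0          # initial Gamma means
    for _ in range(iters):
        # q(theta) = N(m, S): Gaussian update given the current precisions
        S = np.linalg.inv(sigma_bar * Phi.T @ Phi + zeta_bar * np.eye(p))
        m = sigma_bar * S @ Phi.T @ y
        # q(zeta) = Gamma(d, e): conjugate update for the weight precision
        d = d0 + p / 2
        e = e0 + 0.5 * (m @ m + np.trace(S))        # E[theta^T theta]
        zeta_bar = d / e
        # q(sigma) = Gamma(f, h): conjugate update for the noise precision
        resid = y - Phi @ m
        f = f0 + N / 2
        h = h0 + 0.5 * (resid @ resid + np.trace(Phi @ S @ Phi.T))
        sigma_bar = f / h
    return m, S, zeta_bar, sigma_bar
```

Each sweep alternates the three factor updates; the posterior mean `m` plays the role of the parameter estimate $\hat{\theta}^{(s)}$, and `sigma_bar` estimates the noise precision.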

The State Estimation Algorithm
So far, we have obtained the posterior probability distributions, i.e., the optimal solutions, of the system parameters and the noise variance. However, the state-space model contains both unknown system parameters and unavailable system states, which makes it difficult to obtain the parameter estimates. The Kalman filter is usually used to estimate the intermediate states of linear state-space systems with Gaussian noise. The state observer is a deterministic state estimation algorithm, which is simpler in structure but only yields an approximate estimate of the system states. To improve the estimation accuracy, this paper adopts a modified Kalman filter to obtain the state estimates under the Bayesian framework. According to the considered state-space model, the following Kalman filter algorithm [40] is designed to generate the state estimates:
$$\hat{x}_{k|k-1} = \hat{A} \hat{x}_{k-1} + \hat{B} u_{k-1}, \quad P_{k|k-1} = \hat{A} P_{k-1} \hat{A}^T, \quad (23)$$
$$L_k = P_{k|k-1} C^T \big(C P_{k|k-1} C^T + \bar{\sigma}^{-1}\big)^{-1}, \quad \hat{x}_k = \hat{x}_{k|k-1} + L_k \big(y_k - C \hat{x}_{k|k-1}\big), \quad (24)$$
$$P_k = (I - L_k C) P_{k|k-1}. \quad (25)$$
Since the system parameters are unknown, we have to utilize the estimated parameters in $\hat{\theta}^{(s+1)}$ to construct the system matrix $\hat{A}$ and the input vector $\hat{B}$. The flowchart for executing the proposed VB-KF iterative algorithm is shown in Figure 1.
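One filter step can be sketched as follows. The function name, the scalar-output shapes, and the small process-noise term `Q` (added for numerical robustness, since the model above has no process noise) are assumptions of this sketch; the paper's modified filter may handle these details differently.

```python
import numpy as np

# Hedged sketch of one Kalman filter step for
# x_{k+1} = A x_k + B u_k,  y_k = C x_k + v_k,
# with measurement noise variance R (= 1/sigma_bar in the VB setting).
def kalman_step(x, P, u_prev, y_k, A, B, C, R, Q=1e-8):
    n = len(x)
    # time update (prediction)
    x_pred = A @ x + B * u_prev
    P_pred = A @ P @ A.T + Q * np.eye(n)
    # measurement update (correction)
    S = C @ P_pred @ C.T + R                 # innovation variance (scalar output)
    L = P_pred @ C.T / S                     # Kalman gain
    x_new = x_pred + L * (y_k - (C @ x_pred).item())
    P_new = (np.eye(n) - L @ C) @ P_pred
    return x_new, P_new
```

Run over the data with the current $\hat{A}$, $\hat{B}$, and noise variance, this step produces the state estimates $\hat{x}_k$ that the VB updates then consume.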
The iterative steps for calculating the parameter and state estimates are summarized as follows:

5. Use the newly obtained parameter estimates and noise variance to update the states with the Kalman filter algorithm in Equations (23)-(25).
6. Evaluate the relative change between $\hat{\theta}^{(s+1)}$ and $\hat{\theta}^{(s)}$: if $\|\hat{\theta}^{(s+1)} - \hat{\theta}^{(s)}\| / \|\hat{\theta}^{(s)}\| \leq \varepsilon$ for a very small positive number $\varepsilon$, stop the iteration procedure and obtain the parameter estimate $\hat{\theta}^{(s+1)}$; otherwise, let $s = s + 1$ and go to Step 3.
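The stopping rule of the iteration can be sketched as a small driver loop. Here `update` is a placeholder standing in for one full VB + Kalman filter sweep, and the function name and tolerance defaults are assumptions of this sketch.

```python
import numpy as np

# Hedged skeleton of the iteration's stopping rule: iterate until the
# relative change of the parameter estimate falls below a tolerance eps.
# `update` is a placeholder for one full VB-KF sweep, not the paper's
# actual routine.
def iterate_until_converged(theta0, update, eps=1e-6, max_iter=100):
    theta = theta0
    for s in range(max_iter):
        theta_new = update(theta)
        if np.linalg.norm(theta_new - theta) / np.linalg.norm(theta) <= eps:
            return theta_new, s + 1
        theta = theta_new
    return theta, max_iter
```

Any contraction toward a fixed point terminates under this rule, which is the behavior expected of the VB-KF sweep once the estimates stabilize.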

Simulation Example
Consider the following 3rd-order difference equation:
$$y_k + 0.50 y_{k-1} + 0.32 y_{k-2} + 0.18 y_{k-3} = 0.97 u_{k-1} + 0.55 u_{k-2} + 1.00 u_{k-3}.$$
Taking into account the interference of noise, the difference equation is transformed into the state-space model in Equations (7), (8), and (10). The parameter vector to be estimated is
$$\theta = [0.18, 0.32, 0.50, 1.00, 0.55, 0.97]^T.$$
In this simulation example, the input is selected as a persistently exciting signal sequence with zero mean and unit variance, and $v_k$ is a Gaussian noise sequence with zero mean and variance $\sigma^2$. Of the collected data of length N = 2000, the first 1000 samples ($N_1$) are used as the training set and the remaining ($N_2$) samples are used as the testing set. Different noise variances are designed to simulate the noise interference on the output signal. The proposed VB-KF iterative estimation algorithm is used to estimate the parameter vector.
Taking the computed mean values of the Gaussian distributions of the parameters as the estimates of the unknown parameters, the VB-KF parameter estimates and the estimation error $\delta = \|\hat{\theta}^{(s)} - \theta\| / \|\theta\|$ as the iteration $s$ increases under different noise variances are presented in Table 1 and Figure 2. The curves of the parameter estimates against $s$ are plotted in Figure 3. The estimated noise variances are shown in Table 2. When the noise variance is $\sigma^2 = 0.10^2$, the estimated parameters are used to construct the approximate difference equation. The computed predictions $\hat{y}_k$ and the true outputs $y_k$ are compared in Figure 4, from which we can see that the predictions are very close to the true outputs.
For comparison, we also use the open-loop state observer to generate the estimates of the unknown states [37]. Here, we combine the proposed VB parameter estimation method with the state observer to form the state observer-based variational Bayesian (VB-SO) iterative estimation algorithm. In [57], a KF-based least squares iterative algorithm is used to identify the state-space system, where the unknown process noise variance is set to one (R = 1.00) in the Kalman filter. Here, we use this KF-based variational Bayesian algorithm (VB-KF 2) and the proposed method (VB-KF 1), in which the noise variance is accurately computed, to estimate the system parameters and states under different noises. Furthermore, the parameter estimation errors of the three different algorithms are compared in Figure 5 with $\sigma^2 = 0.50^2$. In addition, we adopt another batch of data for model validation. The parameter estimates with the smallest estimation error are chosen for the verification test. The root mean square errors (RMSE) of the three estimated models are computed; for example, the RMSE of the VB-SO model is 0.3007. From the calculated results, it can be seen that the proposed VB-KF algorithm has the smallest root mean square error.
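The validation metric used above is the standard root mean square error between the measured test outputs and each model's predictions, which can be sketched as:

```python
import numpy as np

# Hedged sketch of the validation metric: root mean square error between
# the measured outputs and a model's predictions on the test set.
def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```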

Conclusions
In this paper, a Kalman filter-based variational Bayesian iterative algorithm is proposed to estimate the unknown parameters of linear difference equation systems. The main advantage of the presented method is that it gives the distribution of each parameter rather than a point estimate. In addition, the proposed algorithm also provides estimates of the noise statistics and the states. From the simulation results shown in Tables 1 and 2 and Figures 2-5, the following conclusions can be drawn.

• The parameter estimation errors generally become smaller as the iteration s increases and as the noise variance decreases. The estimated parameters are very close to the real values.

• The estimated noise variances in Table 2 show that the variance estimates are relatively large in the early iterations because of the inaccurate parameters and states, but as the iteration s increases, the noise variance estimates converge to their true values.

• Under the same batch of data and the same noise variance, the estimation accuracy of the proposed VB-KF algorithm is higher than that of the VB-SO and VB-KF 2 algorithms.

• The predicted outputs fit the true outputs well, which indicates the effectiveness of the proposed algorithm.
The variational Bayesian iterative estimation algorithm proposed in this paper for state-space models from observation data can be combined with some mathematical strategies [58][59][60][61] to study parameter estimation algorithms for other stochastic systems with colored noise, and it can be applied to other fields [62][63][64][65][66].
Author Contributions: J.M., Q.F., F.G., and W.X. conceived of and worked together to achieve this work; J.M. and F.G. discussed the method and the derivation process of the algorithm; Q.F. and W.X. implemented the simulation program in Matlab and analyzed the data; J.M. and Q.F. wrote the original draft. Finally, all the authors have read and approved the final manuscript.