Dual extended Kalman filtering in recurrent neural networks
Introduction
The best-known training approach for recurrent neural networks (RNNs) is real-time recurrent learning (RTRL) (Schmidhuber, 1992, Williams and Zipser, 1989, Zipser, 1990). However, it is a first-order stochastic gradient descent method (Robbins & Monro, 1951), so its learning speed can be very slow. Recently, extended Kalman filtering (EKF) (Haykin, 1991) based algorithms have been introduced to train feedforward neural networks (FNNs) (Scalero and Tepedelenlioglu, 1992, Shah et al., 1992, Singhal and Wu, 1989) and RNNs (Puskorius and Feldkamp, 1994, Williams, 1992). The EKF approach improves the learning speed. In some real-time applications, such as neural network controllers, the number of training iterations required for convergence is critical, and the EKF approach is particularly useful there. From the analysis in Williams (1992), the per-iteration computational complexity of the EKF algorithm is similar to that of RTRL; however, the EKF algorithm converges in fewer iterations than RTRL.
Another issue in neural networks is removing unimportant weights from a trained network. Hessian-based approaches, such as the optimal brain damage (OBD) method (Le Cun et al., 1989, Reed, 1993), are among the most efficient ways to do so. To identify unimportant weights, we must estimate the Hessian matrix, and hence the importance of each weight, by feeding the training set into the trained network. The computational complexity of obtaining the importance of every weight is O(M²p) (Le Cun et al., 1989, Pearlmutter, 1994), where M is the number of weights and p is the number of training patterns. To avoid serious overfitting, the number of training patterns is usually much greater than the number of weights, i.e. p≫M. In the online situation, the Hessian matrix is usually unavailable because training patterns are not retained after training. In Leung et al., 1996, Leung et al., 2001, Leung proposed a joint-learning–pruning algorithm for FNNs based on the error covariance matrix of the recursive least squares (RLS) approach. The computational complexity of pruning is O(M³), which is much smaller than O(M²p) when p≫M. The stopping criterion of pruning is based on the estimated training error and the estimated change in the training error. The advantage of this RLS pruning approach is that the training set is not required during pruning; hence, it is suitable for the online situation.
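To make the Hessian-based idea concrete, the following is a minimal sketch of OBD-style saliency ranking under the diagonal Hessian approximation used by OBD; the function names are ours, and the diagonal entries would in practice be accumulated over the p training patterns (which is where the O(M²p) cost arises).

```python
import numpy as np

def obd_saliencies(weights, hessian_diag):
    """OBD saliency s_k = h_kk * w_k^2 / 2 under the diagonal
    Hessian approximation; a smaller saliency marks a less
    important weight."""
    return 0.5 * hessian_diag * weights ** 2

def pruning_order(weights, hessian_diag):
    """Indices of the weights sorted from least to most important."""
    return np.argsort(obd_saliencies(weights, hessian_diag))
```

For example, with a unit diagonal Hessian, `pruning_order` ranks the weights purely by magnitude, so the smallest weight is pruned first.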
Williams (1992) proposed a global EKF algorithm to estimate the weights and the hidden state of RNNs. In this approach, the goal is to maximize the a posteriori probability, rather than to minimize the training error. In Sum, Leung, Chan, Kan, and Young (1999), we used the error covariance matrix of Williams's algorithm to estimate the a posteriori probability of each weight, from which a pruning order of the weights can be obtained. However, during pruning, we still need the training set (or test set) to determine the change in the training error (or test error). In this method, the stopping criterion of pruning is based on the actual training error (or test error) and its actual change, because the error covariance matrix of the EKF algorithm is related to the a posteriori probability, rather than to the training error (or test error).
Wan and Nelson, 1997a, Wan and Nelson, 1997b used the dual EKF (DEKF) approach for processing noisy time series. However, they did not consider external inputs, recurrent connections, or pruning. In their approach, the error covariance matrix of the weight estimation EKF algorithm is related to a posteriori probability maximization rather than to the training error.
In this paper, we consider the operation, training and pruning of a stochastic RNN model with external input. During operation, an EKF algorithm estimates the hidden state of the RNN model; hence, the performance of this approach is better than that of the classical Elman model. During training, we use the DEKF approach to train and operate an RNN, wherein an EKF algorithm estimates the state of the hidden nodes and an RLS algorithm estimates the weights. Since the objective of the RLS algorithm is to minimize the training error, we can use its error covariance matrix to estimate the importance of each weight and to prune unimportant weights. The advantage of the proposed pruning approach is that the training set is not needed to prune a trained network.
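The two coupled updates can be sketched as follows. This is an illustrative toy, not the paper's exact formulation: a scalar hidden state with hypothetical weights a, b, c for the model x' = tanh(a·x + b·u), y = c·x'. The state EKF and weight RLS steps run concurrently over the data stream, as described above.

```python
import numpy as np

def ekf_state_step(x, P, u, y, theta, Q=0.01, R=0.1):
    """One EKF step for the hidden state of the toy RNN
    x' = tanh(a*x + b*u), y = c*x', with weights theta held fixed."""
    a, b, c = theta
    x_pred = np.tanh(a * x + b * u)           # predicted state
    F = a * (1.0 - x_pred ** 2)               # state Jacobian
    P_pred = F * P * F + Q
    S = c * P_pred * c + R                    # innovation variance
    K = P_pred * c / S                        # Kalman gain
    x_new = x_pred + K * (y - c * x_pred)     # corrected state
    P_new = (1.0 - K * c) * P_pred
    return x_new, P_new

def rls_weight_step(theta, Pw, phi, y):
    """One RLS step: update the weights to minimize the accumulated
    squared training error, given regressor phi and target y."""
    k = Pw @ phi / (1.0 + phi @ Pw @ phi)     # gain vector
    theta = theta + k * (y - phi @ theta)     # weight update
    Pw = Pw - np.outer(k, phi @ Pw)           # error covariance update
    return theta, Pw
```

During training the two updates alternate over the data; because the RLS objective is the training error, its error covariance `Pw` is what is later reused for pruning.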
In the rest of the paper, the stochastic RNN model is discussed in Section 2. Section 3 presents our DEKF approach for stochastic RNNs. The pruning method is discussed in Section 4. Section 5 discusses the complexities of the proposed approach. Five simulation examples are then given in Section 6 to illustrate the effectiveness of our joint-learning–pruning scheme. Section 7 closes with concluding remarks.
Section snippets
EKF for nonlinear systems
Without loss of generality, we consider a stochastic nonlinear system given by
x(t+1) = κ(x(t), u(t)) + ω(t),
y(t) = ℏ(x(t)) + ν(t),
where κ(·,·) is a function of the L_h-dimensional hidden state x(t) and the L_i-dimensional input u(t), ℏ(·) is a function of the hidden state, and y(t) is the L_o-dimensional observation. If we define κ_t(·) = κ(·, u(t)), then κ_t becomes a time-varying function. In the above system, ℏ(x(t)) is the actual system output; ν(t) is the measurement noise; and ω(t) is the process noise, which …
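For reference, the standard EKF recursion for this class of systems is sketched below in our notation, consistent with the definitions above; F_t and H_{t+1} denote the Jacobians of κ and ℏ evaluated at the current estimates, and Q and R the covariances of the process and measurement noise.

```latex
\begin{aligned}
\text{Prediction:}\quad
\hat{x}_{t+1|t} &= \kappa(\hat{x}_{t|t}, u_t), &
P_{t+1|t} &= F_t P_{t|t} F_t^{\top} + Q,\\
\text{Correction:}\quad
K_{t+1} &= P_{t+1|t} H_{t+1}^{\top}
  \bigl(H_{t+1} P_{t+1|t} H_{t+1}^{\top} + R\bigr)^{-1}, &&\\
\hat{x}_{t+1|t+1} &= \hat{x}_{t+1|t}
  + K_{t+1}\bigl(y_{t+1} - \hbar(\hat{x}_{t+1|t})\bigr), &
P_{t+1|t+1} &= (I - K_{t+1} H_{t+1})\, P_{t+1|t}.
\end{aligned}
```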
DEKF based training for RNNs
In this section, we first review Williams's global EKF approach (Williams, 1992) for RNNs and Wan's DEKF approach (Wan & Nelson, 1997a) for FNNs. Afterwards, we point out the weakness of these two approaches for pruning RNNs. To overcome this weakness, a new DEKF approach for training RNNs is introduced.
Pruning RNN
This section discusses the connection between pruning and the weight estimation RLS algorithm, that is, the relationship between the error covariance matrix and the Hessian matrix of the training error. The pruning method for a trained RNN is also presented.
From Eqs. … and …, the Hessian matrix of the energy function is … . Define the training error as … . Since the difference between … and … is …
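Because the inverse of the RLS error covariance matrix accumulates the regressor outer products, it approximates the Hessian of the training error, so a weight's importance can be read off without revisiting the training set. A minimal sketch of this idea follows (our function names; an OBS-style saliency under this approximation, not necessarily the paper's exact formula):

```python
import numpy as np

def rls_saliencies(theta, P):
    """Estimated increase in training error if weight q is deleted,
    using only the RLS error covariance P (no training data needed):
    s_q = theta_q^2 / (2 * P[q, q]), since P^{-1} approximates the
    Hessian of the training error."""
    return theta ** 2 / (2.0 * np.diag(P))

def prune_least_important(theta, P):
    """Zero out the weight with the smallest saliency (a hypothetical
    single prune step; repeated pruning would re-check the estimated
    training error before removing the next weight)."""
    q = int(np.argmin(rls_saliencies(theta, P)))
    theta = theta.copy()
    theta[q] = 0.0
    return theta, q
```

The key practical point matches the text: both functions consume only the weight vector and the covariance matrix maintained by RLS, which is what makes online pruning possible.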
Training phase
For standard EKF or RLS algorithms, it is well known that both the computational and space complexities are equal to O(K²), where K is the dimension of the state vector.
For the state estimation EKF algorithm of the DEKF approach, the dimension of the state vector equals the number of hidden nodes, i.e. K = L_h. Therefore, both the computational and space complexities of the state estimation EKF algorithm are O(L_h²). For the weight estimation RLS algorithm of the DEKF approach, the …
Simulations
In this section, we demonstrate the effectiveness of the proposed joint-learning–pruning scheme through five examples. The first two are system identification problems (Tsoi & Tan, 1997); their purpose is to demonstrate that our approach (without the training set and test set) can produce a good pruning order and a good estimate of the training error of a pruned RNN. The last three are time series prediction problems (Weigend et al., 1991, Mackey …
Conclusion
In this paper, we have introduced a joint-learning–pruning scheme for online learning and pruning of RNNs. The DEKF approach consists of two algorithms, namely, the state estimation EKF algorithm and the weight estimation RLS algorithm. During training, they run concurrently over the data until convergence. The EKF algorithm uses the last estimate of the weight vector to estimate the hidden state and to predict the system output. The RLS algorithm uses the current estimated hidden state …
Acknowledgements
The work described in this paper was supported by the Strategic Grant, City University of Hong Kong, Hong Kong (Project No. 7001218).
References (27)
- Leung et al. (2001). A pruning method for recursive least square algorithm. Neural Networks.
- Shah et al. (1992). Optimal filtering algorithm for fast learning in feedforward neural networks. Neural Networks.
- et al. (1997). Recurrent neural networks: a constructive algorithm, and its properties. Neurocomputing.
- et al. (1979). Optimal filtering.
- Haykin (1991). Adaptive filter theory.
- et al. (1992). A real-time learning algorithm for a multilayered neural network based on the extended Kalman filter. IEEE Transactions on Signal Processing.
- Larsen, J. (1993). Design of neural network filters. PhD Thesis, Electronics Institute, Technical University of...
- Le Cun et al. (1989). Optimal brain damage.
- Leung et al. (1996). On-line training and pruning for RLS algorithms. Electronics Letters.
- Mackey and Glass (1977). Oscillation and chaos in physiological control systems. Science.
- Pearlmutter (1994). Fast exact multiplication by the Hessian. Neural Computation.
- Puskorius and Feldkamp (1994). Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks. IEEE Transactions on Neural Networks.