Neurocomputing
Volume 73, Issues 4–6, January 2010, Pages 1024–1030

Letters

Efficient online recurrent connectionist learning with the ensemble Kalman filter

https://doi.org/10.1016/j.neucom.2009.12.003

Abstract

One of the main drawbacks of online learning for recurrent neural networks (RNNs) is the high computational cost of training. Much effort has been spent on reducing the computational complexity of online learning algorithms, usually focusing on the real time recurrent learning (RTRL) algorithm. Significant reductions in the complexity of RTRL have been achieved, but with a tradeoff: degradation of model performance. We take a different approach to complexity reduction in online learning of RNNs through a sequential Bayesian filtering framework and propose the ensemble Kalman filter (EnKF) for derivative-free parameter estimation. The EnKF provides an online training solution that, under certain assumptions, can reduce the computational complexity by two orders of magnitude relative to the original RTRL algorithm without sacrificing the modeling potential of the network. Through forecasting experiments on observed data from nonlinear systems, it is shown that the EnKF-trained RNN outperforms RNNs trained with other algorithms in terms of real computational time and also produces better forecasts.

Introduction

Recurrent neural networks (RNNs) are nonlinear dynamic regression models which have found widespread use in various nonlinear modeling tasks such as ocean wave forecasting [14], geomagnetic storm prediction [29], adaptive control [19], [23], system identification [25], and signal processing [12], [15], [28]. However, online RNN applications have been hindered by the high computational cost of training. The standard approach to online training of RNNs entails computation of the gradient of the error function with respect to the weights, usually with the real time recurrent learning (RTRL) algorithm [30]. RTRL takes O(H⁴) computations per time step (where H is the number of neurons), which is impractical for training large networks [30]. Moreover, algorithms relying on first-order gradient descent may take multiple epochs to reach convergence, which makes gradient descent learning unrealistic for online tasks.

Various methods have been proposed to speed up online learning of RNNs, usually with a tradeoff. One of the first approaches to complexity reduction was through sub-grouping each output neuron together with an arbitrary number of neurons in the hidden layer [31]. Through this strategy, non-overlapping sub-networks were effectively created, leading to drastic savings in computation, on the order of O(H⁴/g³), where g denotes the number of sub-groups. However, as g increases a tradeoff is incurred: less training information crosses over between the sub-groups, resulting in a significant degradation of the network's capabilities. Other variants of the algorithm were proposed, such as dynamic sub-grouping [6], but again the algorithm relies on an arbitrary reduction of the sensitivity matrix, which also degrades the network's modeling power as sub-grouping increases. The most prominent limitation of the sub-grouping strategy is the assumption of multiple output neurons. When applied to single-output systems, such as time series modeling, the sub-grouping strategy is not valid for online applications, and a postprocessing stage is necessary. A hybrid back propagation through time (BPTT)/RTRL scheme has been put forward by [26], which reduced the computational complexity to O(H³). The method relies on segmenting the training set, running BPTT on each segment, and then using RTRL to forward propagate the gradient history before the start of the next time step. Other methods that similarly reduce computational complexity to O(H³) have been proposed by [27], which make use of Green's functions.

Sequential Bayesian filters have also been applied to RNNs, including the extended Kalman filter (EKF) [24] and the sigma-point (unscented and central difference) Kalman filters (SPKF) [2], [9], [22]. Although the filtering techniques offer superior convergence properties to gradient descent, the computational complexity per time step for the EKF is equivalent to that of RTRL (and it also depends on RTRL derivatives). The SPKFs are derivative-free filters, but their complexity is approximately O(H⁶), which is much higher than that of RTRL; this is why they are not discussed further in this paper.

Here we propose the ensemble Kalman filter (EnKF) [7] for online training of RNNs, which reduces the computational complexity to O(H²). The EnKF is a Monte Carlo method for estimating the time evolution of the state distribution, along with an efficient algorithm for updating the state ensemble whenever a new measurement arrives. The distinguishing feature of the EnKF is that it avoids the computation of derivatives altogether. The EnKF approximates the integrals of the distributions of interest by discrete summation, thus efficiently computing their moments, which allows for the reduction in computational complexity. Through simulations, we demonstrate that the proposed algorithm has superior convergence properties to gradient descent learning and EKF filtering. To the best of our knowledge, this is the first study of the EnKF for reducing the computational complexity of RNNs in online time series applications.
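
For context, sequential Bayesian training treats the network weights as the state of a dynamical system and the network output as the measurement. A commonly used state-space formulation for this setting (written here in our own notation as an illustration; the paper's exact model is given in Section 3) is:

```latex
% Weights as a random-walk state; the network output is the noisy measurement.
\begin{aligned}
  \mathbf{w}_k &= \mathbf{w}_{k-1} + \mathbf{q}_{k-1},
    \qquad \mathbf{q}_{k-1} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}), \\
  y_k &= h(\mathbf{w}_k, \mathbf{u}_k) + r_k,
    \qquad r_k \sim \mathcal{N}(0, R),
\end{aligned}
```

where h(·,·) denotes the RNN mapping from the input u_k and weights w_k to the output, and Q, R are the assumed process- and observation-noise covariances. The EKF, SPKF and EnKF all estimate w_k recursively under a model of this form; they differ in how the required moments are computed.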

The paper is organized as follows. An introduction to recurrent neural networks is given in Section 2. The EnKF algorithm for RNN parameter estimation is provided in Section 3. In Section 4 the performance of the EnKF on various nonlinear time series is reported, and finally, in Section 5, concluding remarks are given.

Section snippets

Recurrent neural networks

RNNs are dynamic nonlinear models trainable by specialized weight adaptation algorithms [24], [30]. The unique feature that distinguishes RNNs from all other nonparametric regression models is the presence of feedback loops in the topology of the network, which enable the network to retain a memory of previous states during operation of the model. This implicit temporal dimension makes RNNs particularly suitable for modeling time-varying phenomena [5].
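
As an illustration of this feedback mechanism, the following is a minimal sketch of a simple recurrent (Elman-style) step; the layer sizes, names, and initialization are our own assumptions, not the specific architecture used in the paper.

```python
import numpy as np

# Minimal sketch of an Elman-style recurrent step (hypothetical layer sizes and
# names; the exact architecture used in the paper is defined in Section 2).
class SimpleRNN:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in  = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
        self.W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (feedback)
        self.W_out = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output
        self.h = np.zeros(n_hidden)                                    # retained internal state

    def step(self, x):
        # The recurrent term W_rec @ h is the feedback loop: the previous hidden
        # state re-enters the update, giving the network memory of past inputs.
        self.h = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        return self.W_out @ self.h
```

Each call to step() reuses the hidden state left by the previous call, which is exactly the memory of previous states referred to above.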

In this study, the recurrent architecture known as the

The EnKF approach

The EnKF approach approximates the distributions of interest by discrete summation over sampled weights. The algorithm starts by randomly generating an ensemble of plausible weight vectors within a predefined interval. The initial ensemble contains a number of independent weight samples from the distribution of the state, which are all equally important. Compared to other sampling filters, the EnKF does not resample the ensemble; rather, it only updates the members of the ensemble (the neural…
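
To make one such update concrete, here is a minimal sketch of a stochastic (perturbed-observation) EnKF step applied to an ensemble of RNN weight vectors. The interface, ensemble handling, and noise settings are our own assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def enkf_weight_update(W, predict, y, R=1e-2, rng=None):
    """One stochastic (perturbed-observation) EnKF update of a weight ensemble.

    W       : (M, d) array, M candidate weight vectors of the RNN (the "state")
    predict : callable w -> scalar network output for the current input
              (hypothetical interface: assumed to load w into the RNN and run
              one forward pass)
    y       : scalar observation (target) at the current time step
    R       : assumed observation-noise variance
    """
    rng = np.random.default_rng() if rng is None else rng
    M, d = W.shape

    # Each ensemble member produces its own forecast of the measurement.
    Y = np.array([predict(w) for w in W])          # shape (M,)

    # Sample moments replace the explicit Jacobians used by RTRL/EKF.
    Wd = W - W.mean(axis=0)                        # weight anomalies, (M, d)
    Yd = Y - Y.mean()                              # output anomalies, (M,)
    P_wy = Wd.T @ Yd / (M - 1)                     # cross-covariance, (d,)
    P_yy = Yd @ Yd / (M - 1) + R                   # innovation variance, scalar

    K = P_wy / P_yy                                # Kalman gain, (d,)

    # Every member is nudged toward its own perturbed copy of the observation;
    # no resampling is performed, so the same members persist through time.
    y_pert = y + rng.normal(scale=np.sqrt(R), size=M)
    return W + np.outer(y_pert - Y, K)
```

For a fixed ensemble size M, the cost per step is dominated by the M forward passes and the O(M·d) moment computations over the d = O(H²) weights of a fully recurrent network, which is consistent with the O(H²) complexity figure quoted in the introduction.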

Experimental results

The objective of the presented work is to develop a computationally efficient method for sequential training of RNNs for use in time series modeling. To provide a comparison of model performance in terms of computational cost and accuracy, we have implemented several popular learning algorithms for recurrent networks: the RTRL algorithm [30] and the extended Kalman filter for RNNs (RNN-EKF) [4].

Studies into time series modeling and forecasting were carried out using two real-world data sets;

Conclusion

This paper investigated the applicability of the EnKF to on-line learning of RNNs. Recurrent neural networks are known to be highly powerful forecasting tools, but one of the largest obstacles to their practical use is the severe computational complexity of their on-line training algorithms. The research community has proposed algorithms that significantly reduce the computational complexity for on-line applications, but all algorithms so far have encountered two main disadvantages: the

References (31)

  • G. Evensen, The ensemble Kalman filter: theoretical formulation and practical implementation, Ocean Dyn. (2003)
  • G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast...
  • L.A. Feldkamp, T.M. Feldkamp, D.V. Prokhorov, Recurrent neural network training by nprKF joint estimation, in:...
  • S. Haykin, Kalman Filtering and Neural Networks (2001)
  • P.L. Houtekamer et al., Data assimilation using an ensemble Kalman filter technique, Mon. Weather Rev. (1998)

Derrick Mirikitani was born in Hawaii in 1979. He received the B.A. degree from Villanova University, Pennsylvania, USA, in 2000, a master's degree in eBusiness Management from Kokusai Daigaku, Niigata, Japan, in 2002, and an M.Sc. in Evolutionary and Adaptive Systems in 2003 from the University of Sussex, Falmer, UK. He is currently working toward a Ph.D. in Computer Science at Goldsmiths College, University of London, UK. His research interests fall under the umbrella of complex adaptive systems. He is currently developing neural adaptive systems for inverse modeling of nonlinear dynamical systems. He is a member of the International Neural Network Society.

Nikolay Nikolaev received the Ph.D. degree in computer science and engineering in 1992 from the Technical University in Sofia, Bulgaria. From 1992 to 1993 he conducted postdoctoral research in machine learning at the University of Wales, Cardiff, United Kingdom. From the fall of 1993 he was an Assistant Professor in Computer Science at the American University in Bulgaria. In the fall of 2000 he joined the Department of Mathematical and Computing Sciences at Goldsmiths College, University of London, as a Lecturer in Computing. In 2000 and 2001 he was invited to work on research projects in evolutionary computation with colleagues from The University of Tokyo. His theoretical research interests include evolutionary computation (genetic algorithms and inductive genetic programming), neural networks (recurrent neural networks, polynomial networks, and immune networks), and machine learning (version spaces and Bayesian kernel classifiers). His application interests include financial time-series prediction, pattern recognition, and data mining.
