Fading memory echo state networks are universal

Echo state networks (ESNs) have recently been proved to be universal approximants for input/output systems with respect to various $L^p$-type criteria. When $1 \leq p < \infty$, only $p$-integrability hypotheses need to be imposed, while in the case $p = \infty$ a uniform boundedness hypothesis on the inputs is required. This note shows that, in the latter case, a universal family of ESNs can be constructed that contains exclusively elements with the echo state and the fading memory properties. This conclusion could not be drawn with the results and methods available so far in the literature.

These findings have been an important motivation for the in-depth study of the approximation capabilities of these machine learning paradigms. The first results in this direction were obtained in the context of systems theory for input/output systems with either finite or approximately finite memory [Sand 91, Perr 96, Stub 97] in a forward-in-time framework. More recently, those universality statements were extended to ESNs with semi-infinite inputs from the past. Various $L^p$-type criteria have been used to measure the approximation error. The case $1 \leq p < \infty$ was considered in [Gono 20b], where the universality of families of ESNs with a prescribed activation function and stochastic inputs was established with respect to the $L^p$ norm determined by the law of a fixed discrete-time input process defined for all negative times. In this case, universality is formulated in the category of all causal and time-invariant input/output systems with $p$-integrable outputs. Extensions of some of these results in the particularly relevant case $p = 2$ for randomly generated ESNs and, more importantly, corresponding approximation and generalization error bounds for regular filters/functionals have been derived in [Gono 19, Gono 20a].
Universality with respect to uniform approximation, that is, the case $p = \infty$, has been studied in [Grig 18b, Grig 18a] for (almost surely) uniformly bounded inputs in the fading memory category. The universality of ESNs in this setting was established in [Grig 18a] using an internal approximation property [Grig 18a, Theorem 3.1] that allows one to conclude the uniform proximity of the input/output systems generated by two state-space systems from the uniform closeness of the corresponding state maps. Using this observation, it can be shown that ESNs inherit their universality from the universality of neural networks [Cybe 89, Horn 89].
The result that we just described, stated in Theorem 4.1 of [Grig 18a], does not guarantee, though, that the approximating ESNs have two important properties that we now briefly recall. The first one is the echo state property (ESP), which holds when every semi-infinite input has one and only one semi-infinite output associated to it. The ESP makes it possible, in particular, to associate a filter to the ESN. The second one is the fading memory property (FMP), which, in the presence of uniformly bounded inputs, amounts to the continuity of the associated filter when the spaces of inputs and outputs are endowed with the product topologies. The main theorem in this note shows that, when the activation function in the universal family of ESNs is Lipschitz-continuous, this family can be chosen so that all its elements have the echo state and the fading memory properties. This fact was not established in the original paper, and proving it requires a different strategy from that of [Grig 18a].
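As a minimal numerical illustration of the fading memory property (a toy example of our own, not part of the paper's argument): for a filter induced by a contractive scalar state map, perturbing the input arbitrarily in the distant past moves the present output only by a geometrically small amount, which is exactly the continuity with respect to the product topology described above.

```python
import numpy as np

# Toy causal, time-invariant filter with a contractive state map:
# x_t = tanh(a * x_{t-1} + z_t) with |a| < 1 (illustrative choice).
a = 0.5

def filter_out(zs):
    """Feed a finite stretch of the past input and return the present state."""
    x = 0.0
    for z in zs:
        x = np.tanh(a * x + z)
    return x

rng = np.random.default_rng(2)
z = rng.uniform(-1, 1, size=60)
z_pert = z.copy()
z_pert[:30] = rng.uniform(-1, 1, size=30)   # change only the distant past

# fading memory: the two outputs differ by at most 2 * a**30 (about 2e-9),
# no matter how large the perturbation of the remote inputs is
assert abs(filter_out(z) - filter_out(z_pert)) < 1e-6
```

The bound follows from the states being confined to $[-1, 1]$ and each step contracting state discrepancies by the factor $a$.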

The result
The statement and proof of the main theorem use a notation similar to that in [Grig 18a]. In particular, for any $M > 0$, we denote by $B_{\|\cdot\|}(0, M)$ the Euclidean ball of radius $M$ and by $\overline{B_{\|\cdot\|}}(0, M)$ its closure. We shall also use the spaces $K_M$ and $K_L$ of uniformly bounded semi-infinite input and output sequences (see [Grig 18a] for definitions), with respect to which we shall prove the universality of the echo state family with the properties described above. We recall that an echo state network is determined by the state-space system

$$x_t = \sigma(A x_{t-1} + C z_t + \zeta), \qquad y_t = W x_t.$$

The values $z_t \in \mathbb{R}^d$ (respectively, $y_t \in \mathbb{R}^m$) are the components of the input sequence $z \in K_M$ (respectively, the output sequence $y \in K_L$). The map $\sigma : \mathbb{R}^N \longrightarrow \mathbb{R}^N$ is obtained by the componentwise application of an activation function $\sigma : \mathbb{R} \longrightarrow \mathbb{R}$ that we assume $L_\sigma$-Lipschitz-continuous, bounded, and non-constant.
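The state-space recursion just recalled can be sketched as follows (a toy implementation with illustrative dimensions; tanh stands in for the Lipschitz activation, and the spectral rescaling of $A$ is a standard sufficient condition for the ESP and FMP, not the construction used in the proof of the main theorem):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, m = 2, 10, 1        # input, state, and output dimensions (illustrative)
sigma = np.tanh           # 1-Lipschitz, bounded, non-constant activation

A = rng.standard_normal((N, N))
A *= 0.5 / np.linalg.norm(A, 2)   # ||A||_2 * L_sigma < 1: sufficient for ESP/FMP
C = rng.standard_normal((N, d))
zeta = rng.standard_normal(N)
W = rng.standard_normal((m, N))

def esn_run(zs, x0):
    """Iterate x_t = sigma(A x_{t-1} + C z_t + zeta); return state and readout W x_t."""
    x = x0
    for z in zs:
        x = sigma(A @ x + C @ z + zeta)
    return x, W @ x

zs = [rng.standard_normal(d) for _ in range(50)]
x_a, y_a = esn_run(zs, np.zeros(N))
x_b, y_b = esn_run(zs, np.ones(N))   # different initial condition
# the contraction washes out the initial condition, so each semi-infinite
# input determines a unique semi-infinite output (echo state property)
assert np.allclose(x_a, x_b, atol=1e-8)
```

The washout of the initial condition visible in the final assertion is precisely what allows a filter to be associated to the network.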
Again by the universal approximation theorem [Horn 91], for each $j = 1, \ldots, K$ there exist $N_j \in \mathbb{N}$, $W_j \in \mathbb{M}_{d, N_j}$, $A_j \in \mathbb{M}_{N_j, d}$, and $\zeta_j \in \mathbb{R}^{N_j}$ such that the neural network

$$I_j(z) := W_j \sigma(A_j z + \zeta_j), \quad z \in \mathbb{R}^d,$$

satisfies the approximation bound (2.4). Define $J_j = I_j \circ \cdots \circ I_1$ and $J_0(z) = z$. We now prove inductively that for each $j = 1, \ldots, K$ the uniform estimate (2.5) holds on $\overline{B_{\|\cdot\|}}(0, M)$. For $j = 1$ this follows from (2.4). For the induction step, we assume that (2.5) holds for indices up to $j - 1$ and aim to prove it for $j$. Firstly, the induction hypothesis yields, for $i = 1, \ldots, j - 1$, a uniform bound for $J_i$ on $\overline{B_{\|\cdot\|}}(0, M)$, which proves (2.5). The Lipschitz continuity of $\sigma$ and (2.5) thus allow us to obtain the estimate (2.6). By the triangle inequality, (2.2), (2.3), and (2.6), the claimed approximation follows.

In order to conclude the proof, it remains to be shown that $H_{\mathrm{ESN}}$ is indeed the functional associated to an echo state network. Let $\widetilde{N} = N_1 + \cdots + N_K + N$ and build the connectivity matrix $A \in \mathbb{M}_{\widetilde{N}, \widetilde{N}}$, the input mask $C \in \mathbb{M}_{\widetilde{N}, d}$, and the bias $\zeta \in \mathbb{R}^{\widetilde{N}}$ blockwise out of the matrices $A_j$, $W_j$, and the vectors $\zeta_j$ introduced above. Consider now the state vectors $x_t \in \mathbb{R}^{\widetilde{N}}$, with blocks $x_t^{(j)} \in \mathbb{R}^{N_j}$, and the state equation in $\mathbb{R}^{\widetilde{N}}$ determined by

$$x_t = \sigma(A x_{t-1} + C z_t + \zeta). \qquad (2.7)$$

By our choice of matrices, the solutions $x$ of (2.7) (if they exist) satisfy $x_t^{(1)} = \sigma(A_1 z_t + \zeta_1)$ and

$$x_t^{(j+1)} = \sigma\big(A_{j+1} W_j x_{t-1}^{(j)} + \zeta_{j+1}\big) \in \mathbb{R}^{N_{j+1}}, \quad \text{with } j \in \{1, \ldots, K - 1\}. \qquad (2.8)$$

Iterating the expression (2.8) we obtain that this solution indeed exists, that it is unique, and that it is given by

$$x_t^{(1)} = \sigma(A_1 z_t + \zeta_1), \quad x_t^{(2)} = \sigma(A_2 I_1(z_{t-1}) + \zeta_2), \quad \ldots, \quad x_t^{(K)} = \sigma(A_K J_{K-1}(z_{t-(K-1)}) + \zeta_K).$$

Consequently, $H_{\mathrm{ESN}}$ is the functional associated to the echo state network (2.7), as required.
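As a sanity check on the mechanism behind (2.7) and (2.8), the following sketch (with illustrative widths and randomly drawn matrices; tanh stands in for $\sigma$) runs the block state recursion and verifies that, after a transient, each block reproduces the chained networks $J_j$ applied to correspondingly delayed inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 2, 3                  # input dimension and number of chained networks
N = [4, 5, 3]                # hidden widths N_1, ..., N_K (illustrative)
sigma = np.tanh              # stands in for the Lipschitz activation

# one-hidden-layer maps I_j(z) = W_j sigma(A_j z + zeta_j) from R^d to R^d
A = [rng.standard_normal((N[j], d)) for j in range(K)]
W = [rng.standard_normal((d, N[j])) for j in range(K)]
zeta = [rng.standard_normal(N[j]) for j in range(K)]
I = [lambda z, j=j: W[j] @ sigma(A[j] @ z + zeta[j]) for j in range(K)]

def J(j, z):
    """J_j = I_j o ... o I_1, with J_0 the identity."""
    for i in range(j):
        z = I[i](z)
    return z

T = 10
zs = [rng.standard_normal(d) for _ in range(T + 1)]   # inputs z_0, ..., z_T

# block state recursion: the first block is driven by the current input and
# every further block reads the delayed previous block, as in (2.8)
x = [np.zeros(N[j]) for j in range(K)]
for t in range(T + 1):
    x_new = [sigma(A[0] @ zs[t] + zeta[0])]
    for j in range(1, K):
        x_new.append(sigma(A[j] @ (W[j - 1] @ x[j - 1]) + zeta[j]))
    x = x_new

# after the transient, block j holds the hidden layer of the chained network
# J_j applied to the input delayed by j steps
for j in range(K):
    assert np.allclose(x[j], sigma(A[j] @ J(j, zs[T - j]) + zeta[j]))
```

The zero initialization is washed out exactly after $K$ steps here, since each block depends only on finitely many past inputs; this mirrors how the block structure stores the delayed inputs needed by the approximating functional.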