Low-Dimensional Galerkin Approximations of Nonlinear Delay Differential Equations

This article revisits the approximation problem of systems of nonlinear delay differential equations (DDEs) by a set of ordinary differential equations (ODEs). We work in Hilbert spaces endowed with a natural inner product including a point mass, and introduce polynomials orthogonal with respect to such an inner product that live in the domain of the linear operator associated with the underlying DDE. These polynomials are then used to design a general Galerkin scheme for which we derive rigorous convergence results and show that it can be numerically implemented via simple analytic formulas. The scheme so obtained is applied to three nonlinear DDEs, two autonomous and one forced: (i) a simple DDE with distributed delays whose solutions recall Brownian motion; (ii) a DDE with a discrete delay that exhibits bimodal and chaotic dynamics; and (iii) a periodically forced DDE with two discrete delays arising in climate dynamics. In all three cases, the Galerkin scheme introduced in this article provides a good approximation by low-dimensional ODE systems of the DDE's strange attractor, as well as of the statistical features that characterize its nonlinear dynamics.

In contrast to ordinary differential equations (ODEs), the state space associated even with a scalar DDE is infinite-dimensional, due to the presence of time-delay terms, which require providing initial data over an interval [−τ, 0], where τ > 0 is the delay. It is often desirable, though, to have low-dimensional ODE systems that capture qualitative features, as well as approximating certain quantitative aspects of the DDE dynamics.
Within the Hilbert space setting, different basis functions have been proposed to decompose the state space; these include, among others, step functions [BB78,KS78], splines [BK79,BRI84], and orthogonal polynomial functions, such as Legendre polynomials [Kap86,IT86]. Compared with step functions or splines, the use of orthogonal polynomials leads typically to ODE approximations with lower dimensions, for a given precision [BK79,IT86]. On the other hand, classical polynomial basis functions do not live in the domain of the linear operator underlying the DDE, which leads to technical complications in establishing convergence results [Kap86,IT86]; see Remark 2.1(iii) below.
In the present article, we propose to avoid these technical difficulties in approximating DDEs as systems of ODEs by using an alternative polynomial basis: the elements of this basis belong naturally to the domain of the underlying linear operator, but they have not been used in the DDE literature so far. The polynomials we shall use are named after Koornwinder [Koo84], who investigated polynomials that are orthogonal with respect to weight functions adjoining point masses, as discussed in Section 3 below. This polynomial basis turns out to be particularly useful not only for the rigorous analysis of polynomial-based Galerkin approximations of nonlinear systems of DDEs, as shown in Section 4, but also for their numerical treatment, cf. Section 6.
Useful new properties of the Koornwinder polynomials are identified in Lemma 3.1 for the scalar case, and a generalization of these polynomials to the vector case is given in Section 3.3; the latter includes the multi-dimensional extension of Lemma 3.1, namely Lemma 3.2. We show that these properties are essential for checking key stability and convergence conditions in Lemmas 4.1 and 4.3. Standard Galerkin approximation results for abstract nonlinear ODEs in Hilbert spaces are recalled in Theorem 4.1 and the rest of Section 4.1. They are then applied, with the help of Lemmas 4.1 and 4.3, to nonlinear systems of DDEs in Section 4.2.
Finite-time uniform convergence results are then derived for the proposed Galerkin approximations of nonlinear systems of DDEs, subject to simple and checkable conditions on the nonlinear term. These conditions are identified in Section 4.2; see Corollaries 4.2 and 4.3. The results apply to a broad class of nonlinear systems of DDEs, as discussed in Section 4.3.
The proposed framework yields a simple numerical calculation of the corresponding Galerkin approximations. Their coefficients are easily computable from the original system of DDEs by relying on simple recurrence formulas, cf. Proposition 3.1, and by solving upper triangular systems of linear equations, cf. Proposition 5.1; see Section 5 and Appendix C.
Finally, we outline here a useful interpretation of our proposed scheme regarding the finite-dimensional approximation of the linear part A of general systems of DDEs, when considered in the framework of Hilbert spaces, cf. (4.24). This interpretation relies on a formulation of the evolution in time of the initial state {x(θ) : θ ∈ [−τ, 0]} as the solution of a partial differential equation (PDE); see also Remark 4.1.
To do so, we first distinguish between the historic part of the evolving state, {x(t + θ) : θ ∈ [−τ, 0)}, and the state part, {x(t)}. Denoting by u(t, θ) the historic part, one can then rewrite, for instance, the simple linear DDE (1.1)ẋ = x(t − τ ), τ > 0, as the linear PDE with the boundary condition The key point is that, roughly speaking, the local differential operator v → ∂ θ v -obtained as the history component of A, and written out explicitly in Eq. (2.8), for instance -is approximated here by the nonlocal differential operator and that b N (θ) -expressed by means of Koornwinder polynomials, cf. (4.46) -is a bounded oscillatory coefficient that vanishes in L 2 as N → ∞; see Lemma 3.1. This nonlocal operator is the PDE representation for the history component of our Koornwinder-based Galerkin approximation A N given in (4.29).
Note that terms such as v(−τ )−∂ θ v θ=0 , which is responsible for the nonlocal aspect of (1.4), play an important role in the theory of numerical approximation of DDEs; see, for instance, [GZT08, Appendix A] and references therein. In particular, this term provides the exact value of the jump associated with the boundary condition (1.3). The first such jump occurs at t = 0 in the derivative of solutions of Eq. (1.1) that emanate from a constant history 1 ; this discontinuity propagates to higher-order derivatives at subsequent, integer multiples of the delay, t = kτ , k ∈ Z + .
The fact that this jump term is weighted by a vanishing term suggests that, for a given degree of accuracy, good approximation can be expected when using relatively low-dimensional Koornwinderbased approximations, as long as b N vanishes sufficiently quickly. We do not address such numerical considerations here; see, however, Table 1 in Remark 4.1 for results in the case of Eq. (1.1).
Instead, in Section 6, we provide several applications that show the proposed approximation to be not only rigorously justified, but very effective in nonlinear cases that yield quasi-periodic and chaotic, as well as nearly Brownian dynamics. In each case, low-dimensional ODE systems succeed in approximating important topological as well as statistical features of the corresponding DDE's nonlinear dynamics.
The article is organized as follows. In Section 2, we introduce the functional framework that will be adopted in Section 4.2 to recast a system of nonlinear DDEs into an abstract ODE. This framework relies on Hilbert spaces endowed with a natural inner product with a point mass. Koornwinder polynomials are then introduced in Section 3. The convergence of the Galerkin ODE systems built by projecting onto these polynomials to the original DDEs is proven in Section 4.
We provide explicit expressions of the Galerkin approximation in Section 5 for the scalar case, and in Appendix C for nonlinear systems of DDEs. Finally, numerical applications to three nonlinear DDEs are provided in Section 6. These applications involve: (i) a simple DDE with distributed delays whose solutions recall Brownian motion [Spr07]; (ii) a DDE with a discrete delay that illustrates bimodal, as well as chaotic dynamics [Spr07]; and (iii) a periodically forced DDE with two discrete delays as a highly idealized model of the El Niño-Southern Oscillation (ENSO: [GZT08, and references therein]). In all three examples, it is shown that our Galerkin scheme provides a good approximation by low-dimensional ODE systems of the DDE's strange attractor, as well as the statistical features that characterize the associated nonlinear dynamics.

Background and motivation
We introduce in this section the functional framework that will be adopted in Section 4.2 for the derivation of Galerkin approximations of a given nonlinear system of DDEs. Several function spaces can be used as a state space for the reformulation of a system of DDEs into an abstract ODE where among the most standard ones, those built-up out of the space of continuous functions on the interval [−τ, 0] play an important role in the DDE theory; e.g. [DvGVLW95,HVL93].
In this article, we adopt instead the use of Hilbert spaces which are more classically used in control or approximation theory of DDEs; see e.g., [BB78,BHS83,CZ95,Kap86,KZ86,KS87,NY89]. For a didactic expository of the associated theory of semigroups for (linear) systems of DDEs in this functional setting we refer to [CZ95,Sect. 2.4]; see also [BHS83].
will serve as our state space, and will be endowed with the inner product defined for any (f 1 , γ 1 ), (f 2 , γ 2 ) ∈ H, as: where ·, · denotes the Euclidean inner product of R d . We will also make use of the following subspace of H: where H 1 ([−τ, 0); R d ) denotes the standard Sobolev subspace of L 2 ([−τ, 0); R d ); see, e.g. [Bré10,Chap. 8]. This space consists of functions that are square integrable and whose first-order weak derivatives exist in a distributional sense and are also square integrable. Instead of presenting the general nonlinear systems of DDEs considered in this article (see Section 4.2), we introduce below a class of scalar DDEs that will serve us to identify within a simple context, the issues inherent to the Galerkin approximation of DDEs; see Remark 2.1 hereafter.
Example 2.1. In this example, we recall how a scalar DDE can be recast into an abstract ODE. For simplicity, we will focus on the following autonomous scalar DDE (d = 1): where a, b and c are real numbers, τ > 0 is the delay parameter, and F is a given scalar nonlinear function. The case of nonlinear systems of DDEs will be dealt with in Section 4.2. The reformulation of Eq. (2.4) into an abstract ODE is classical and proceeds as follows. Let us denote by x t the time evolution of the history segments of a solution to Eq. (2.4), namely

Now, by introducing the new variable
(2.6) u(t) := (x t , x(t)) = (x t , x t (0)), Eq. (2.4) can be rewritten as the following abstract ODE: where the linear operator A : D(A) ⊂ V → H is defined by and with the nonlinearity F : H → H defined by With D(A) such as given in (2.9), the operator A generates a linear C 0 -semigroup on H so that the Cauchy problem associated with the linear equationu = Au is well-posed in the Hadamard's sense; see e.g [CZ95,Thm. 2.4.6]. The well-posedness problem for the nonlinear equation depends obviously on the nonlinear term F and we refer to Section 4.2 for a solution to this problem within our functional framework; see also [Web76].
(i) It is important to note that when instead of L 2 ([−τ, 0); R d ), the space of continuous functions is typically used to obtain finite-time uniform approximation results of the semigroup generated by A. In the cases of step functions and splines, the conditions required in the Trotter-Kato theorem (see e.g. Conditions (A1) and (A2) in Theorem 4.1 below) have been analyzed in [BB78] and [BK79].
For the case of Legendre polynomials, technical complications have been encountered to check these conditions either in the setting of Galerkin approximation [Kap86] or in the setting of tau-method [IT86] largely due to the fact that the basis functions do not live in the domain of A. As noted in [Kap86,p. 168] or in [IT86,Sect. 5], either X N ⊂ D(A) or Π N is not orthogonal for the polynomial functions considered in [Kap86] and [IT86], respectively.
On the other hand, at a given precision, the use of polynomial basis leads typically to ODE approximations with lower dimensions when compared with those built out of step functions or splines [BK79,IT86].
The problems discussed in (iii) of the above remark already encountered in the linear case, have limited the applications of polynomial basis for the approximation of nonlinear systems of DDEs. The above discussion leads naturally to the question whether there exists an orthogonal polynomial basis 2 for the purpose of low-dimensional approximations. 3 See however [CZ95, Thm. 2.5.10] for a sufficient condition for the set of (generalized) eigenfunctions to be dense in H still for the case when A does not contain distributed delay terms.
4 It is also worth mentioning the more recent works [Vya12,WC05], in which interesting approximation schemes based on linear and sine functions have been proposed for the case of state dependent delays, and for which successful numerical performances have been reported although rigorous convergence results seem still to be lacking, within this approach.
for which standard approximation results for abstract nonlinear systems such as recalled in Theorem 4.1 below, could be applied to the case of nonlinear systems of DDEs.
The next section introduces orthogonal polynomials that will allow us to answer this question by the affirmative, leading to direct and explicit formulas for the rigorous Galerkin approximations of a broad class of nonlinear systems of DDEs; see Sections 4.2 and 5. As explained next, the key is to seek for polynomials that live in the domain of A, which is achieved here by seeking for polynomials to be orthogonal for the inner product (2.2) with a point mass.

Orthogonal polynomials for inner products with a point mass
The inner product given in (2.2) is naturally associated with the measure where δ 0 denotes the Dirac measure concentrated at θ = 0. Orthogonal polynomials with respect to the Lebesgue measure dθ or measures having a smooth density with respect to it, has a long history [Sze75]. The study of orthogonal polynomials with respect to a measure including a point mass such as given by (3.1) has been studied only lately [Ism05, Chap. 2.9]. It was in particular noticed that orthogonal polynomials with respect to such a measure can be expressed in terms of polynomials orthogonal with respect to the smooth part of the measure; see [Uva69] for an early contribution on the topic.
Koornwinder in [Koo84] dealt with the case of orthogonal polynomials on [−1, 1] associated with measures given by i.e. associated with measures having a Jacobi weight on [−1, 1] with two point-masses added to the extremities of the interval. Although many properties -such as three-term recurrence relationships or differential equations satisfied by such polynomials -remain valid in the case of a measure with a point mass, subtle but important qualitative and quantitative differences arise. For instance, [AMR97, Thm. 3 c)] shows that the zeroes closest to 1 of polynomials orthogonal with respect to the measure ν given in (3.2) converge to 1 faster than those associated with the standard Jacobi polynomials.
It is our goal to show that orthogonal polynomials with respect to the measure ν in (3.2) allows us to work within a simple and more direct framework than those found in the literature, for building Galerkin approximations of DDEs. Indeed, the approximation of DDEs by systems of ODEs built from orthogonal polynomials were not, so far, relying on classical Galerkin schemes as noted in [Kap86,p. 168] The results given below correspond to the case α = β = M = 0, and N = 1 considered in [Koo84].  As recalled above, this polynomial sequence is known to be orthogonal when a Dirac point-mass at the right endpoint, δ 1 , is adjoined to the Lebesgue measure [Koo84], in other words This orthogonality property and the main properties satisfied by {K n } on which we will rely on, are summarized from [Koo84] in the proposition below.
Moreover, the sequence given by (3.7) {K n := (K n , K n (1)) : n ∈ N} forms an orthogonal basis of the product space where E is endowed with the following inner product:

Moreover
Kn Kn E forms a Hilbert basis of E where the norm K n E of K n induced by ·, · E possesses the following analytic expression: (3.10) K n E = (n 2 + 1)((n + 1) 2 + 1) 2n + 1 , n ∈ N.
Proof. Based on (3.3), the proof consists essentially of algebraic manipulations relying on the following standard properties of the Legendre polynomials [STW11, Sect. 3.3]: • Orthogonality: where δ mn denotes the Kronecker delta.
• Three-term recurrence relation: where the first two Legendre polynomials are given by (3.14) L 0 ≡ 1 and L 1 (s) = s.
• First order derivative recurrence relation: Indeed, for K n given in (3.5), let us define the polynomial K τ n by (3.18) Since the sequence {K n = (K n , K n (1)) : n ∈ N} forms an orthogonal basis for E (cf. Proposition 3.1), it follows then that the polynomial sequence forms an orthogonal basis for the space H = L 2 ([−τ, 0); R) × R endowed with the inner product ·, · H given in (2.2) for d = 1. Since K n (1) = 1 from (3.6), we have Moreover, by applying the transformation T , we get trivially that We have then the following fundamental lemma.
Lemma 3.1. The rescaled Koornwinder polynomials {K τ j } j≥0 satisfy the following properties: Moreover, each function in L 2 ([−τ, 0]; R) enjoys the following decomposition in terms of the Koornwinder polynomials K τ j : Proof. For any Ψ = (Ψ D , Ψ S ) ∈ H, we have 5 Now, let Ψ D to be the zero-function on [−τ, 0] and Ψ S to be 1. For such a Ψ, by equalizing respectively the D-components and S-components of the RHS and LHS of (3.25), one then obtains from (3.20) that Remark 3.1.
(i) Note that the continuity condition, lim θ→0 − Ψ D (θ) = Ψ S , required in (2.9) in order that Ψ ∈ D(A), is here naturally satisfied by the Koornwinder basis function K τ n = (K τ n , K τ n (0)). It constitutes thus, for the inner product (2.2), an orthogonal polynomial basis whose elements live in the domain of the linear operator A given in (2.8).
As explained at the beginning of Section 3, the key element for such a construction relies on the inclusion of a point mass adjoined to the continuous part of the measure. When this point mass is absent, the corresponding orthogonal polynomials are (rescaled) Legendre polynomials. The associated basis in this latter case is given by (cf. Moreover, technical complications such as pointed out in Remark 2.1 iii) do not take place for the case of Koornwinder basis. Indeed, it will be shown in Section 4.2 that the properties of the Koornwinder polynomials such as summarized in Corollary 3.1 as well as the vectorized version given by Corollary 3.2 below, turn out to be sufficient to obtain finite-time uniform approximation results of the semigroup generated by the linear operator A; see Lemmas 4.1 and 4.3. (iii) It is also worth mentioning that Corollary 3.1 and Corollary 3.2 as well as the rigorous convergence results presented in Section 4.2 are not limited to the case of Koornwinder basis constructed here. Given any polynomial basis on [−τ, 0] that are orthogonal with respect to a measure of the form ν(dθ) = dρ(θ) + δ 0 with ρ being a positive non-decreasing function on [−τ, 0], the aforementioned results would still hold. We refer to [Uva69] for the construction of such polynomials when orthogonal polynomials with respect to ν(dθ) = dρ(θ) are known.
3.3. Vectorization of Koornwinder polynomials. We introduce here a generalization of the Koornwinder polynomials that will turn out to be useful for the approximation of nonlinear DDE systems. The purpose is here to build from the Koornwinder polynomials introduced above, linear subspaces Each function Ψ in H has here d-components that can be, each, approximated by a series of Koornwinder polynomials as in (3.25). If we restrict such an approximation to the first N Koornwinder polynomials, H N becomes then an N × d-dimensional subspace; see (4.26) below.
Our goal is also here to introduce a vectorization of Koornwinder polynomials which allows for a natural extension of Lemma 3.1 in the case d > 1. This extension of Lemma 3.1 will be particularly useful to provide finite-dimensional approximation of the linear part of systems of DDEs; see Lemma 4.1.
To do so, given j ∈ {1, · · · , N d}, we can associate a Koornwinder polynomial of degree j q ∈ {0, · · · , N − 1}, as follows where j r ∈ {1, · · · , d} is given by The vector K τ j (θ) is then nothing else than a d-dimensional canonical vector whose j th r -entry is given by the value at θ of the (rescaled) Koornwinder polynomial of degree j q ; the integers j q and j r being related to j according to (3.29)-(3.30). Based on these vectorized (rescaled) Koornwinder polynomials K τ j , we also introduce (3.32) In the remaining part of this section, we summarize some key properties of K τ j and K τ j for later usage. Hereafter, we use H 1 to denote the space H defined in (2.1) for the case d = 1, i.e., which is again endowed with the inner product given in (2.2) (still with d = 1).
Since the sequence {K τ j = (K τ j , K τ j (0)) : j ∈ N} forms an orthogonal basis for H 1 (cf. Section 3.2), one can readily check that {K τ j : j ∈ N * } forms an orthogonal basis for the space H. Note also that Given this vectorization of Koornwinder polynomials, we can now formulate the following extension of Lemma 3.1 that summarizes the key properties of the K j 's which will be used for the rigorous approximation of nonlinear systems of DDEs such as described in Section 4.2.
Lemma 3.2. The vectorized rescaled Koornwinder polynomials {K τ j } j≥1 satisfy the following properties: Moreover, each function in L 2 ([−τ, 0]; R d ) enjoys the following decomposition in terms of the vectorized Koornwinder polynomials K τ j : and the following identity holds: Proof. The above identities can be obtained by using the same type of reasoning as given in the proof of Lemma 3.1 for the scalar case. Indeed, by noting that {K τ j : j ∈ N * } forms an orthogonal basis of H, any Ψ ∈ H admits the following decomposition: where we have used the identity (3.34) in the last equality above. Now, let Ψ D ∈ L 2 ([−τ, 0]; R d ) to be the zero-function and Ψ S to be an arbitrary vector α ∈ R d . For such a Ψ, by equalizing respectively the D-components and S-components of the RHS and LHS of (3.39), we obtain respectively (3.35) and (3.36).

Galerkin approximation: Rigorous results
In this section, we establish the convergence of the Galerkin scheme based on the rescaled and vectorized Koornwinder polynomials of Section 3.3. These convergence results apply, as we shall see, to a broad class of nonlinear systems of DDEs.
As mentioned in the Introduction and in Remark 2.1-(iii), the advantage of the Koornwinder basis relies on the facts that the constitutive basis functions are orthogonal and belong each to the domain of the linear operator associated with a given DDE. In particular, there is no discontinuity at the right end point for each basis element, by construction; see Section 3. Thanks to these properties of the basis functions, convergence results for the associated Galerkin systems can be derived in a straightforward fashion (see Corollary 4.1) and under useful criteria on the nonlinear terms (see Corollaries 4.2 and 4.3), compared to other Galerkin schemes built from other bases; see, e.g., [Kap86,KS87,Vya12,WC05] and references therein.
In the following, we first present in Section 4. 4.1. Galerkin approximations of nonlinear ODEs in Hilbert spaces. We first present a general convergence result for Galerkin approximations of abstract nonlinear differential equations in a Hilbert space X, endowed with a norm · X . The mathematical setting is somewhat classical but we recall it below for the reader's convenience and later use.
In that respect, we assume in this Section the linear operator L to be the infinitesimal generator of a C 0 -semigroup of bounded linear operators T (t) on X. Recall that in that case the domain D(L) of L is dense in X and that L is a closed operator; see [Paz83,Cor. 2 where · denotes the operator norm subordinated to · X .
We are concerned with finite-dimensional approximations of the following initial-value problem: A mild solution of (4.2) over [0, T ], will be any function u ∈ C([0, T ], X) such that for u 0 ∈ X, Let {X N : N ∈ N} be a sequence of subspaces of X associated with orthogonal projectors The corresponding Galerkin approximation of (4.2) associated with X N is then given by: where (4.8) In particular, the domain D(L N ) of L N is X, because of (4.6). As we will see in Section 4.2, the choice of vectorized Koornwinder polynomials as a basis function will allow us to define subspaces X N naturally associated with orthogonal projectors Π N that satisfy the above properties in contrast to other polynomial functions used for (non-standard) Galerkin approximation or other Legendre-tau approximations of systems of DDEs used so far; see e.g. [IT86,Kap86]. See also Remark 2.1-iii).
These nice properties will allow us also to rely on the following general convergence result regarding standard Galerkin schemes, for the case of nonlinear systems of DDEs; see Section 4.2 below.
Theorem 4.1. Let L and {X N } N ≥0 be as described above. Assume furthermore the following set of assumptions: Furthermore the following uniform bound is satisfied by the family The following convergence holds (4.10) lim (A3) G is globally Lipschitz.
Then for any u 0 ∈ X, there exists a unique mild solution of (4.2) and such a solution can be approximated uniformly on each bounded interval [0, T ] by the sequence {t → u N (t; Π N u 0 )} N ≥0 of mild solutions obtained from (4.7), i.e.: (4.11) lim Proof. Recall that the existence and uniqueness of solutions to Eq. (4.3) emanating from any initial data u 0 ∈ X can be proved by a fixed point argument in C([0, T ], X) as in the proof of e.g. [CH98,Prop. 4.3.3], by relaxing the semigroup of contractions requirement therein to the C 0 -semigroup setting adopted here; see also [Lun04, Thm. 6.1.1]. Given u 0 ∈ X, let u be thus the unique mild solution of Eq. (4.2). By the variation-of-constants formula applied to Eq. (4.7) we have on the other hand, for 0 ≤ t ≤ T , Then Let us introduce (4.14) We obtain then from (4.13) that where we have used the global Lipschitz condition on G and (4.9) to derive the second inequality. It follows then from Gronwall's inequality that We are thus left with the estimation of N (u 0 ) and uniformly for t in bounded intervals. It follows that and that d N converges point-wisely to zero on [0, T ], i.e.
On the other hand, from (4.1), (4.9), and (A3), we get , and the Lebesgue's dominated convergence theorem allows us then to conclude from (4.19) and (4.21) that The desired uniform convergence estimate (4.11) is then trivially obtained from (4.16).

4.2.
Galerkin approximations of nonlinear systems of DDEs. In this section, given the Hilbert product space , endowed with the inner product (2.2), we restrict our attention to the following abstract ODE: where F is a nonlinear operator that will be specified later on, and where -given L D , a bounded linear operator from H 1 ([−τ, 0); R d ) to R d and, L S , a bounded linear operator from R d to R d -the linear operator A is given by Such an abstract setting arises naturally in the reformulation of a broad class of nonlinear systems of DDEs as an abstract ODE in H; see e.g. [BHS83,CZ95]. Examples of operators L D depending explicitly on the delay τ are given below; see (4.47).
λ > ω, the equation (λI −L)x = f possesses a unique solution x ∈ D(L), which implies in particular that (λI −L)D(L) is dense in X as required by the version of the Trotter-Kato theorem used here. This explains why this density requirement, consequence of our working assumptions, is omitted in the formulation of Theorem 4.1.
It is well-known that under these assumptions, the operator A defines a C 0 -semigroup on H [BHS83, Thm. 2.3], and in particular A is dense in H and is a closed operator.
We turn now to the definition of the subspaces X N and Π N of the previous section. For each positive integer N , we define the N d-dimensional subspace H N ⊂ H to be spanned by the first N d vectorized Koornwinder polynomials introduced in (3.32), namely (4.26) As noted in Section 3.3, these polynomials are orthogonal for the inner product with a point mass such as defined in (2.2).
The subspace H N is thus naturally associated with an orthogonal projector Π N , as required in the previous section. The approximation property (4.5) is satisfied due to the density arguments outlined in Appendix A.
Recall finally that by construction K τ j ∈ D(A) for any j ∈ N * , and therefore The corresponding N -dimensional Galerkin approximation of Eq. (4.23) reads then: which is therefore well defined on H because of (4.27).
We are now in position to check Conditions (A1) and (A2) of Theorem 4.1. To check Condition (A1), we will make usage of the following extension of the linear flow e A N t : Such an extension leads naturally to a C 0 -semigroup on H. The stability condition (4.9), will require however some specifications of the operator L D that will be made clear later. Condition (A2) can be however checked in the general setting by making an appropriate use of the properties of the vectorized Koornwinder polynomials summarized in Lemma 3.2. More precisely, Proof. By construction K τ j ∈ D(A) for each j ∈ N * . Since We recall from (3.34) that K τ j H = K τ jq H 1 for all j ∈ N. It follows then that the orthogonal projector Π N associated with the subspace H N takes the following explicit form: (4.33) where the operators p N , p N , q N , q N are defined as following: In the following, we arbitrarily fix Φ ∈ H k for some integer k > 0. Now let us choose N such that N d ≥ k, then Π N d Φ = Φ, and we get for each such N (4.35) We obtain then where I D and I S denote the identity maps on L 2 ([−τ, 0]; R d ) and R d , respectively. We show below that the RHS of (4.36) converges to zero. Let us begin with the term (I D −p N ) d + dθ Φ D . Note that by comparing the definition of p N give by (4.34a) and the decomposition of L 2 ([−τ, 0]; R d ) functions given by (3.37), we see that for each f ∈ L 2 ([−τ, 0]; R d ), the term p N f is the partial sum of the first N terms in the corresponding decomposition. It follows then that Since Φ ∈ H k ⊂ D(A), it holds that d + dθ Φ D ∈ L 2 ([−τ, 0]; R d ). We obtain then from (4.37) that We turn now to the estimates for q N (L S Φ S + L D Φ D ). By the definition of q N in (4.34c), we get It follows then from the identity (3.35) that it follows from the definition of p N given in (4.34b) and the identity (3.38) that where | · | denotes the Euclidean norm of R d . By using the identity (3.36), we also get Next, we use the expressions of A N , p N and q N -given, respectively, in (4.35), (4.34a) and (4.34c) -to note that the D-component v of any solution u of du dt = A N u, which emanates from initial data taken 7 in H N , satisfies the following nonlocal linear PDE: , and the rescaled Koornwinder polynomials K τ n are given by (3.18), since d = 1. We see therewith that the Galerkin scheme used herein introduces a nonlocal perturbation term with respect to the 7 For such initial data, the solution stays in HN , by the definition of AN . D-component of A given in (4.24). This perturbation term results from the difference between the S-component of A and the derivative at 0, when applied to functions in H N .
From Lemma 4.1 above and Lemmas 4.2 and 4.3 below, one can infer by the Trotter-Kato theorem that the effects on the solutions of Eq. (4.45) of this nonlocal perturbation -which vanishes as N → ∞, due to (3.22) -do not lead to a degenerate situation and that actually these solutions converge over finite intervals to those of the local PDE, ∂ t v = ∂ θ v. This nice convergence is due to the key properties of the Koornwinder polynomials, as summarized in Lemma 3.1 for d = 1, and in Lemma 3.2 for the multidimensional case; see also Fig. 1 for the nature of the approximation at θ = 0.
To conclude this remark, we return now to the discussion in the Introduction concerning the approximation of discontinuities that arise, for instance, in the first-order derivative of a DDE's solution, cf. [GZT08, Appendix A and references therein].
In Table 1, we report the corresponding differences over the interval [0, 2] between the analytic solution of Eq. (1.1) with τ = 1 and history x(t) ≡ 1, on the one hand, and low-dimensional Galerkin approximations obtained by application of the formulas derived hereafter in Section 5, on the other. The second column of this table is obviously consistent with the rigorous convergence result of Corollary 4.1 proved below. The third column shows that, furthermore, the aforementioned discontinuities present in the derivative of the DDE's solutions are well captured by the proposed methodology as well.
In the following, we restrict the linear operator A defined in (4.24) to be such that L S : R d → R d is a bounded linear operator, and L D is defined by (4.47) with B : R d → R d being a bounded linear operator 8 , and C(·) ∈ L 2 ([−τ, 0); R d×d ).
In the following preparatory lemma, Lemma 4.2, we recall by means of basic estimates, that the existence of ω > 0 such that A − ωI is dissipative in H, can be guaranteed. This fact will be used to establish a stability condition of type (4.9) (with M = 1) for the semigroups T (t) and T N (t), generated respectively by A and its finite-dimensional approximation A N ; see Lemma 4.3. The proofs of these Lemmas are quite straightforward but are reproduced below for the sake of completeness.  (4.49) ω = 1 + 1 2τ Proof. Let Ψ ∈ D(A), then by the definition of A given in (4.24), we have (4.50) Note that (4.51) Using the definition of L D given in (4.47), we obtain (4.52)
Lemma 4.3. Let A be defined such as in (4.24) with L D such as specified in (4.47) and L S : R d → R d to be a bounded linear operator. Then, the linear semigroups T (t) and T N (t) generated respectively by A and A N defined in (4.29), satisfy (4.56) T (t) ≤ e ωt and T N (t) ≤ e ωt , t ≥ 0, with ω given by (4.49).
9 Throughout this article, we will denote indistinguishably by | · |, either the Euclidean norm of a vector in R d , or its subordinated (operator) norm, in the case of a d × d matrix. It should be clear from the context which norm is used.
Proof. Since T (t) is a C 0 -semigroup with infinitesimal generator A, we have that T (t)u 0 ∈ D(A) for all u 0 ∈ D(A), and that We obtain thus where we have used Lemma 4.2 to obtain the last inequality above with ω given by (4.49). It follows then from Gronwall's inequality that (4.59) Since D(A) is dense in H and T (t) are bounded operators on H, the estimate (4.59) still holds for general initial data in H, leading in turn to (4.60) The estimate for T N is also trivial and proceeds as follows. First note that by the definition of T N given by (4.30), we have (4.61) Note also that (4.63) We obtain then from (4.62) that The desired estimate for T N (t) can be derived now from (4.64) by using Gronwall's inequality.

Remark 4.2.
Note that the estimate about T N in (4.56) shows in particular that solutions of (4.45) grow at most exponentially with a rate independent of N , and stay uniformly bounded over finite intervals.
With these preparatory lemmas, we are now in position to obtain as corollaries of Theorem 4.1, the convergence results for the Galerkin approximation (4.28) of d-dimensional nonlinear systems of DDEs of the form where F : R d × R d → R d , and L S , B, and C are as given in (4.47). We first sate the result for the case of global Lipschitz nonlinearity, keeping in mind that already for the case of scalar DDEs (d = 1), chaotic dynamics can take place under such a simple nonlinear setting; see Section 6.1 for a numerical illustration.
is globally Lipschitz. Then, for each u 0 ∈ H, the mild solution of (4.28) emanating from Π N u 0 converges uniformly to the mild solution of (4.23) emanating from u 0 on each bounded interval [0, T ], i.e.: In the next two corollaries, we relax the global Lipschitz condition assumed in Corollary 4.1 to a local Lipschitz condition in addition to either a sublinear growth for F (see Corollary 4.2) or an energy inequality satisfied by F; see Corollary 4.3.
Corollary 4.2. Let A be defined in (4.24) with L D specified in (4.47) and L S : R d → R d to be a bounded linear operator.
Assume that the nonlinearity F given by (4.66) is locally Lipschitz in the sense that for all r > 0 there exists L(r) > 0 such that for any Ψ 1 and Ψ 2 in H, we have Assume also that F satisfies the following sublinear growth: where γ 1 > 0 and γ 2 ≥ 0. Then, for each u 0 ∈ H, the mild solution u N (t; Π N u 0 ) of (4.28) emanating from Π N u 0 and, the mild solution u(t; u 0 ) of (4.23) emanating from u 0 , do not blow up in any finite time. Moreover, u N (t; Π N u 0 ) converges uniformly to u(t; u 0 ) on each bounded interval [0, T ], i.e.: where the positive constant ω is given by (4.49). A simple multiplication by e −ωt to both sides of (4.71), leads then trivially to An application of the Gronwall's inequality to v(t) := e −ωt u(t) H , gives then preventing thus the blow up of a mild solution in finite time.
Similarly, for mild solutions u N of (4.28), we have  Assume that the nonlinearity F given by (4.66) is locally Lipschitz in the sense of (4.68). Assume also that the following energy inequality holds for F (4.76) where γ 1 ∈ R and γ 2 ≥ 0. Then, for each u 0 ∈ D(A), the strong solution u N (t; Π N u 0 ) of (4.28) emanating from Π N u 0 , and the strong solution 11 u(t; u 0 ) of (4.23) emanating from u 0 , do not blow up in any finite time. Moreover, u N (t; Π N u 0 ) converges uniformly to u(t; u 0 ) on each bounded interval [0, T ], i.e.: Furthermore, any strong solutions v = u of (4.23) or v = u N of (4.28), emanating respectively from v(0) = u 0 ∈ D(A) or v(0) = Π N u 0 , have their H-norm controlled as follows: where κ = 2(ω + γ 1 ), with ω given in (4.49).
By taking the H-inner product on both sides of (4.79a) with the solution u ∈ D(A), and using the energy inequality (4.76) and the stability property AΨ, Ψ H ≤ ω Ψ 2 H from Lemma 4.2, we obtain (4.80) 1 2 where the positive constant ω is given by (4.49). It follows then from Gronwall's inequality that Similarly, by noting that We have thus shown that for any initial data u 0 ∈ D(A), the strong solutions u(t; u 0 ) and u N (t; Π N u 0 ) do not blow up in any finite time. The convergence result (4.77) can then be deduced as in the proof of Corollary 4.2.
11 By strong solutions of (4.23), we mean a solution in C([0, T ], D(A)) ∩ C 1 ([0, T ], H) of (4.23). 12 The regularity result [CH98,Prop. 4.3.9] is stated for the case of contraction semigroups. However, the proof can be adapted to the case of C0-semigroups for which T (t) ≤ e ωt (i.e. with M = 1) such as encountered here when LD is as specified in (4.47).

Remark 4.3.
It is worth mentioning that the conclusions of Corollaries 4.1, 4.2 and 4.3 still hold when the underlying system of DDEs (4.65) is perturbed by a suitable time-dependent forcing, g(t). For instance, it suffices to assume that g(t) ∈ L 2 loc ([0, ∞); R d ) to still get the convergence results. 4.3. Examples. In this section we provide some class of nonlinear scalar DDEs of the form (2.4) that fit with the assumptions of Corollary 4.3. In that respect we restrict our attention to the case of A such as defined in (2.8) for d = 1. We discuss below some classes of nonlinearities that verify the local Lipschitz condition (4.68) and the energy inequality (4.76). Extension to systems can be easily built out of these examples and are thus left to the reader. 4.3.1. Delay equations with a global Lipschitz nonlinearity. Let F be given such as in (2.10), and for which F is assumed to be of the form with g and h to be global Lipschitz maps from R to R, of constants L 1 and L 2 , respectively. Then, we have with γ 1 > 0 and γ 2 ≥ 0, and thus F satisfies the energy inequality (4.76). The local Lipschitz condition (4.68) for F is trivially satisfied under the assumptions on F .
Note that such nonlinear equations arise in many applications where a delayed monotone feedback mechanism is naturally involved in the description of the system's evolution; see [GZT08,Kri08,MPS96]. It is also interesting to mention that such seemingly simple scalar DDEs with global Lipschitz nonlinearity can also support chaotic dynamics as illustrated in Section 6.1 below.

4.3.2.
Delay equations with locally Lipschitz nonlinearity. We relax now the global Lipschitz requirement. In that respect, we consider and assume that (i) g 1 : R → R + is locally Lipschitz; (ii) g 2 : R → R is locally Lipschitz and verifies the condition These assumptions allow us to consider a broad class of nonlinear effects that are not necessarily bounded or polynomial. We check below that the abstract nonlinear map F defined in (2.10) with F given by (4.84) and that satisfy (i) and (ii), satisfies also the conditions of Corollary 4.3. We first check the local Lipschitz condition.
Trivially, let us first remark that (4.86) Let us introduce the notations α i := Ψ S i and β i := 0 −τ Ψ D i (θ)dθ, i = 1, 2. Let Ψ i be chosen such that for i = 1, 2, Ψ i H ≤ R for some R > 0. It follows that and thus by definition of the H-inner product (2.2) which leads to (4.89) where L 1 (r) (resp. L 2 (r)) denotes the local Lipschitz constant associated with g 1 (x) (resp. g 2 (x)) for |x| < r.
On the other hand, Note also that |α 1 − α 2 | ≤ Ψ 1 − Ψ 2 H , and that We obtain then from (4.89) that which gives the local Lipschitz property of F as a map from H to H. The energy inequality (4.76) is here readily satisfied since because of (4.85). with ω ∈ L ∞ (R, R + ) and f ∈ L 1 (R, R + ) satisfying the inequality for some γ > 0 and for almost every x, y ∈ R, are still covered by Corollary 4.3. (ii) Note also that many other nonlinear effects could have been considered in (2.4) for which the convergence result of Corollary 4.3 would hold. For instance we could have considered where each b j is a local Lipschitz function in L ∞ (R) and b 2p−1 (·) ≤ β < 0 for some constant β.
Under these assumptions, the nonlinearity (4.93) satisfies the energy inequality (4.76) with γ 1 = 0 and with γ 2 sufficiently large, depending on the L ∞ norm of the functions b j 's. This is because the term with the largest power (Ψ S ) 2p−1 is strictly negative by assumption on b 2p−1 , and the other terms with a lower degree can be controlled by using Young's inequality.

Galerkin approximation: Analytic formulas for scalar DDEs
This section is devoted to the derivation of explicit expressions of the Galerkin approximation (4.28) associated with nonlinear DDEs. For simplicity, we focus on the case of a scalar DDE taking the form given by (2.4). The more general case of nonlinear systems of DDEs is dealt with in Appendix C; see also Appendix C.2 for the case where the linear part of (2.4) involves a distributed-delay term as in (4.65).
As a preparation for the forthcoming analytic derivations, we need to express the derivative of the Koornwinder polynomials in terms of the polynomials themselves. This is the content of the following proposition.
By using (5.16) in (5.15), we get Now, by using (5.14) and (5.17) and recalling that K τ n H = K n E , we obtain from Eq. (5.7) the following explicit form of the N -dimensional Galerkin system (4.28): a + bK n (−1) + cτ (2δ n,0 − 1) + 2 τ n−1 k=0 a n,k δ j,k K j 2 E − 1 y n (t) For later usage, we rewrite the above Galerkin system into the following compact form: where Ay denotes the linear part of Eq. (5.18), and G(y) the nonlinear part. Namely, A is the N × N matrix whose elements are given by (5.20) where j, n = 0, · · · , N − 1, and the nonlinear vector field G : R N → R N , is given component-wisely by Remark 5.1. When a time-dependent forcing g(t) is added to the RHS of the DDE (2.4), the only change in the corresponding Galerkin system is to add a term K j −2 E g(t) to each G j in (5.21). Recall also from Remark 4.3 that it is sufficient to require that g ∈ L 2 loc (R; R) in order to ensure the convergence result of the Galerkin system over any finite interval.

Approximation of chaotic dynamics: Numerical results
In this section, we report on the performance of our Galerkin schemein approximating quasi-periodic and chaotic dynamics. In the case of the latter, it is well known that any finite-time uniform convergence result -such as the one given by (4.77) and obtained in Corollary 4.3 -becomes less useful, due to sensitivity to initial data.
In this case, when individual solutions diverge exponentially, it is natural to consider instead the approximation of the strange attractor or of the statistics of the dynamics. More generally, the approximation of meaningful invariant probability measures supported by the strange attractor is of primary interest in chaotic dynamics. Nonlinear DDEs, like those considered here, are known to support such probability measures once a global attractor is known to exist [CGH12].
If, in addition to the finite-time uniform convergence (4.77), a uniform dissipativity assumption is satisfied, then any sequence of invariant measures associated with the Galerkin approximation converges weakly to an invariant measure of the full system; see [Wan09,Thm. 2.2]. The uniform dissipativity assumption referred to in [Wan09] is satisfied if one can establish, uniformly in N , the existence of an absorbing ball for the Galerkin reduced systems in another separable Hilbert space V, which is compactly imbedded in H, in the case of strongly dissipative systems.
We show in Sections 6.1 and 6.2 below that, for two simple nonlinear scalar DDEs that -even in instances in which the uniform dissipativity assumption of [Wan09, Thm. 2.2] is not guaranteed -our Galerkin scheme is still able to approximate significant statistical properties of the chaotic dynamics, or the strange attractor itself. Finally, we illustrate in Section 6.3, for a highly idealized ENSO model from climate dynamics that our approach also works in the case of periodically forced DDEs with multiple delays. 6.1. "Nearly-Brownian" chaotic dynamics. We consider here a modified version of the DDE analyzed in [Spr07] with a sinusoidal nonlinearity. The modification consists of replacing the discrete delay by distributed delays. As we will see, this modification does not affect the main dynamical properties identified in [Spr07] in the case of a discrete delay, once the proper parameter values are chosen.
More precisely, we consider the following DDE (6.1)ẋ = a sin t t−τ x(s)ds , with a = 0.5 and τ = 5.5. This example fits within the general class discussed in Section 4.3.1, to which the rigorous convergence results described in Sections 4.2 in an abstract setting, and in Section 5 more concretely, do apply.
As pointed out in the introduction of this section, such finite-time uniform convergence results are essential in general, but they are not the ones we are necessarily looking for in approximating chaotic dynamics. We rely, therefore, on careful numerical simulations to explore the performance of our Koornwinder-polynomial-based Galerkin systems to approximate the statistical features of the chaotic dynamics in these examples.
A sample trajectory of Eq. (6.1) is shown in black in Fig. 2. While the governing DDE is perfectly deterministic, the visual resemblance of this trajectory to a sample path of Brownian motion is obvious. A trajectory with the same constant initial history over the interval [−τ, 0), but obtained by solving a 10D-approximation by our Galerkin scheme of the DDE, is shown as the red curve in the figure. It is clear that individual trajectories do have the same overall behavior, which is nearly Brownian, but the pointwise approximation of the exact trajectory by the approximate one is not good.
To study instead the statistics of the solutions, we have estimated -for a collection of n = 10 4 initial histories of constant value drawn uniformly in [−1, 1] -the empirical probability density function (PDF) of the corresponding x(t)-values at a given time t. Figure 3 reports on the numerical results at t = 2680 when x(t) is simulated from Eq. (6.1) by forward Euler integration (black curve) with a time step of ∆t = τ /2 10 , and when x(t) is approximated by a 10D-Galerkin approximation (red curve) obtained from the analytic formulas (5.19)-(5.21) applied to Eq. (6.1).
The Galerkin system of ODEs is integrated using a semi-implicit Euler method that still uses ∆t = τ /2 10 , but in which the linear part Ay is treated implicitly, while the nonlinear term G(y) is treated explicitly. The approximate solution x N (t), provided by an N -dimensional Galerkin system of the form (5.19), is obtained by using the expansion (5.5) into Koornwinder polynomials. More precisely, x N (t) is obtained as the state part of u N given by (5.5) which, thanks to the normalization property K τ n (0) = 1 given in (3.20), reduces to where y := (y 0 , · · · , y N −1 ) is the vector solution of (5.19). Also shown in blue in panels (a, c) of Fig. 3, is a Gaussian distribution with the same mean and standard deviation, both of which are estimated from the x(t)-values at t = 2680 of the simulated DDE solutions. In both cases, the empirical distributions, as obtained from the simulated DDE solution or as obtained by using a Galerkin approximation with N = 10, closely follow a Gaussian law.
For both x(t), as simulated by numerically solving Eq. (6.1), and x N (t), as obtained by using a Galerkin approximation with N = 10, the standard deviations of the corresponding collection of trajectories are shown versus time in panels (b) and (d) of Fig. 3. The best linear regression fit of log(σ) versus log(t) gives σ = 0.195t 0.479 , in the case of Eq. (6.1), and σ = 0.190t 0.480 , in the case of the 10D-Galerkin approximation.
In both cases, the slopes of the fitted lines indicate that the deterministic dynamics of Eq. (6.1) mimics that of Brownian motion, for which the slope would be 0.5, with a diffusion coefficient D = σ 2 /t. In particular, the solutions do not stay within any bounded subset; the dynamics thus violates any dissipation criterion, while still possessing an attractor with strange behavior (not shown). To the best of our knowledge, the 10D-Galerkin approximation computed here thus provides the first example of a chaotic system of ODEs whose solutions exhibit a statistical behavior close to that of Brownian motion.
6.2. Bimodal chaotic dynamics with low-frequency variability. The model studied in this subsection, is also based on [Spr07]; the parameter values used here are a = 0.5, b = 20 and τ = 3.35. Remark 6.1 at the end of this subsection discusses similarities between the dynamics of the model above and important aspects of ENSO dynamics.
In contrast to the DDE considered in the previous subsection, Eq. (6.3) does not fit directly within the general framework of Section 4.2, for which rigorous convergence results are available. The discrete lag present in the cubic term leads to complications for a rigorous analysis. Replacing this lag effect by a distributed one, as in Eq. (6.1), would place the DDE of Eq. (6.3) into the class considered in Section 4.3.2, for which finite-time uniform convergence results are guaranteed. But even then, we cannot be assured a priori of an effective approximation of statistical features of the dynamics, as discussed above.
The purpose of this subsection is to show that, even in such a borderline case with respect to the theory presented in this article, statistical and even topological features can still be remarkably well approximated by the Galerkin systems of Section 5, when appropriately modified to handle the case of discrete delay in the nonlinear terms. 13 13 This modification consists just of noting that, by replacing the nonlinear term with distributed delays in (2.4) by F (x(t), x(t − τ )), the identity (5.15) becomes estimated from the simulations of the DDE (6.1) (black curve and black circles, in the left panels and the right panels, respectively), and its 10D-Galerkin approximation (red curve and red circles) by the method described in Section 5, respectively. In each of the panels (a) and (b), the results are compared with a Gaussian distribution estimated by standard analytic formulas (blue curves), and in each of the panels (c) and (d), the results are compared with the best linear regression fit (blue curve) of log(σ) versus log(t), providing the corresponding slope and the exponent reported therein.
As shown in the left panel of Figure 4 in natural delayed coordinates, the strange attractor associated with Eq. (6.3) (black) has a nearly symmetric topological shape and is constituted by two fairly highdensity"islands" connected by a foliation of heteroclinic-like orbits. These properties of the attractor are very well captured by a 6D-Galerkin approximation (right panel, red).
These two high-density islands, along with the lower-density areas of heteroclinic-like connections between the two, give rise to a bimodal chaotic behavior, which gives rise, in turn, to an interesting time variability. Figure 5 plots the results of a standard numerical estimation of the spectral density -also known in the engineering literature as the power spectrum -associated with x(t), as directly simulated using Eq. (6.3) (black curve) and as approximated by a 6D-Galerkin approximation (red curve). In both cases, the power spectra are estimated from the corresponding autocorrelation functions [ER85, GAD + 02].
The numerical results show that the spectrum of x(t) contains two broadband peaks at low frequencies that stand out above an exponentially decaying background. As shown in Fig. 5, these  two peaks, as well as the exponential background, are strikingly well approximated by a 6D-Galerkin approximation, in both amplitude and frequency, as well as in the rate of decay for high frequencies.
The approximation by a truly low-dimensional, 6D-Galerkin model of these key features of the power spectrum of the solutions to the DDE (6.3) has deep dynamical implications in terms of the Ruelle-Pollicott resonances and mixing properties of the dynamics on the attractor [CNK + 14]. These implications are beyond the scope of this article but we intend to discuss them elsewhere.  a highly idealized ENSO model with memory effects; see [BH89, CMZ90, Dij05, GCS15, GT00, GZT08, KS14, MCZ91, TSCJ94, ZG10, vR13] and references therein. Indeed, for the given parameter values, the solution of Eq. (6.3) admits two metastable states, as can be seen from the two islands in the attractor given in Figure 4. These two metastable states are analogous to the warm, El Niño and the cold, La Niña states in ENSO dynamics. Moreover, the two broadband peaks at low frequencies in the spectral density of the solution, as shown in Figure 5, are also reminiscent of the quasi-quadrennial and the quasi-biennial mode in ENSO [GR00, GAD + 02, JNG95] dynamics. The important role of such low-frequency variability in the understanding and prediction of climate dynamics on various time scales was emphasized in [CKG11, DG05, Ghi01, GR00, GR02] and references therein.
This equation is a slightly modified version of the model used in [TSCJ94] for the study of the interaction between the seasonal forcing and the intrinsic ENSO variability. 14 We also refer to [BH89, CMZ90, Dij05, GCS15, GT00, GZT08, MCZ91, TSCJ94, ZG10, vR13, GZ15] and references therein for other models with retarded arguments used in this context.
The purpose of this subsection is to show that the Galerkin scheme developed in this article can also be easily adapted to deal with DDEs with multiple delays, as well as with non-autonomous, forced DDEs. For this purpose, Eq. (6.5) is placed in a quasi-periodic regime by choosing the parameter values α = 2.1, β = 1.05, γ = 3, κ = 10, τ 1 = 0.95, and τ 2 = 5.13.
We outline now the necessary modifications to the nonlinear term G(y) in the Galerkin system (5.19) for the case of multiple discrete delays. As a direct generalization of the case with a single discrete delay, given in footnote 13, the nonlinearity F in (2.4) takes the form F (x(t), x(t − τ 1 ), · · · , x(t − τ p )), where 0 < τ 1 < · · · < τ p =: τ . In this more general situation, the identity (5.15) becomes (6.6) which leads to the corresponding change in the formula (5.21) for the nonlinear vector field in Eq. (5.19). Note also that the forcing term is dealt with in Remark 5.1. Again, we compare the DDE's attractor in the left panel of Fig. 6 with the attractor obtained from an associated Galerkin system, in the right panel. Despite the complexity of the DDE's attractor, a 40dimensional Galerkin system can already provide a very accurate reconstruction of this attractor. The need for a higher dimensionality of the Galerkin approximation in this case arises from the presence of incommensurable frequencies in the periodically forced model, namely the seasonal cycle and the internal frequencies [Dij05, JNG94, GR00, GR02, TSCJ94]. Attractor from Galerkin approximation x(t) Figure 6. The attractor associated with Eq. (6.5) (left panel) and its approximation obtained from a 40D-Galerkin approximation (right panel).
Appendix A. Proof of Theorem 3.1 The identity (3.5) follows from the definition of K n given in (3.3) and the properties of the Legendre polynomials L n given by (3.13) and (3.15). The normalization K n (1) = 1 results directly from the identity (3.5) and the normalization L n (1) = 1 as recalled in (3.12).
The orthogonality property of (K n , K n (1)) is proved in [Koo84, Theorem 3.1]. The formula (3.10) about the norm can be checked directly by using (3.5) and the orthogonality property of the Legendre polynomials recalled in (3.11).
Finally, it remains to show that Kn Kn E forms a Hilbert basis of E which due to the orthonormality property boils down to show that the linear space spanned by the K n 's is dense in E. In that respect, by recalling that K n = (K n , K n (1)) and that K n are polynomials of degree n orthogonal under the inner product ·, · E , it follows that any function in the subspace S n := {(f, f (1)) : f is a polynomial of degree n} admits a unique expansion in terms of {(K 0 , K 0 (1)), · · · , (K n , K n (1))}. Note also that ∞ n=0 S n is dense in the subspace Since the last subspace C is dense in E, the desired result follows.
Appendix B. Proof of Proposition 5.1 Note that a n := (a n,0 , · · · , a n,n−1 ) tr given in (5.1) can be obtained by rewriting both sides in terms of the Legendre polynomials and comparing the coefficients, which leads to the algebraic equation Ta n = b n to be satisfied by a n . We provide below some details about the derivation of the matrix T and the vector b given in (5.3).
First note that by the definition of K n given in (3.3), we get (B.1) dK n ds (s) = d ds −(1 + s) d ds L n (s) + (n 2 + n + 1) dL n (s) ds .
The system of algebraic equations given in (5.2)-(5.3) can then be obtained by equating the coefficients for L n−1 , · · · , L 0 on both sides of the last equation above. The proof is now complete.
Appendix C. Analytic formulas for nonlinear systems of DDEs In this section, we extend the analytic formulas of Section 5 to nonlinear systems of delay equations of the form (4.65) based on the vectorized Koornwinder polynomials introduced in subsection 3.3.
Recall that (4.65) is given by where F : R d × R d → R d , and L S , B, and C are as given in (4.47).
Recall also that the unknown function u N in the corresponding Galerkin system (4.28) takes values in the N d-dimensional subspace H N defined in (4.26). We can thus rewrite u N in terms of the first N d vectorized Koornwinder polynomials (see (3.32)) as follows: with j q determined by (3.29) and H 1 being the product space L 2 ([−τ, 0); R) × R equipped with the inner product given by (2.2). In the following, we first deal with a special case in subsection C.1 by assuming that the linear operator C in the system (C.1) is time-independent. Necessary changes for the time-dependent case is then outlined in subsection C.2.
To summarize, when C is time-dependent, the N d-dimensional Galerkin approximation associated with (C.1) still takes the form (C.5) with the nonlinear part G(y) given by (C.7), and the elements of the N d × N d matrix A, given by (C.12) A j,n = 1 K jq 2 E 2 τ nq−1 k=0 a nq,k δ jq,k K jq 2 E − 1 δ nr,jr + (L S ) jr,nr + K nq (−1)B jr,nr + C jr,nr , K τ nq L 2 .