Rigorous numerics for nonlinear operators with tridiagonal dominant linear part

We present a method designed for computing solutions of infinite dimensional nonlinear operator equations $f(x) = 0$ with a tridiagonal dominant linear part. We recast the operator equation into an equivalent Newton-like fixed point equation $x = T(x) = x - Af(x)$, where $A$ is an approximate inverse of the derivative $Df(\overline x)$ at an approximate solution $\overline x$. We present rigorous computer-assisted calculations showing that $T$ is a contraction near $\overline x$, thus yielding the existence of a solution. Since $Df(\overline x)$ does not have an asymptotically diagonal dominant structure, the computation of $A$ is not straightforward. This paper provides ideas for computing $A$, and proposes a new rigorous method for proving existence of solutions of nonlinear operators with tridiagonal dominant linear part.


Introduction
Tridiagonal operators naturally arise in the theory of orthogonal polynomials, ordinary differential equations (ODEs), continued fractions, numerical analysis of partial differential equations (PDEs), integrable systems, quantum mechanics and solid state physics. Some differential operators can be represented by infinite tridiagonal matrices acting in sequence spaces, as is the case, for instance, for differentiation in frequency space of the Hermite functions. Other examples come from the study of ODEs like the Mathieu equation, the spheroidal wave equation, the Whittaker-Hill equation and the Lamé equation.
While many well-developed methods and efficient algorithms already exist in the literature for solving linear tridiagonal matrix equations and computing their inverses, our own method has a different flavour. We aim at developing a computational method in order to prove, in a mathematically rigorous and constructive sense, existence of solutions to infinite dimensional nonlinear equations of the form
$$f(x) \stackrel{\mathrm{def}}{=} L(x) + N(x) = 0, \tag{1}$$
where $L$ is a tridiagonal linear operator and $N$ is a nonlinear operator. The domain of the operator $f$ is the space of algebraically decaying sequences
$$\Omega^s \stackrel{\mathrm{def}}{=} \Big\{ x = (x_k)_{k \ge 0} : \|x\|_s \stackrel{\mathrm{def}}{=} \sup_{k \ge 0} \omega_k^s |x_k| < \infty \Big\}, \quad \text{where} \quad \omega_k^s \stackrel{\mathrm{def}}{=} \begin{cases} 1, & k = 0, \\ k^s, & k \ge 1. \end{cases}$$
The assumptions on the linear and nonlinear parts of (1) are that $L : \Omega^s \to \Omega^{s-s_L}$ and $N : \Omega^s \to \Omega^{s-s_N}$, for some $s_L > s_N$. Intuitively, this means that the linear part dominates the nonlinear part. Since $\Omega^{s_1} \subset \Omega^{s_2}$ for $s_1 > s_2$, one can see that $f$ maps $\Omega^s$ into $\Omega^{s-s_L}$.
General nonlinear operator equations of the form $f(x) = 0$ defined on the Banach space $\Omega^s$ arise in the study of bounded solutions of finite and infinite dimensional dynamical systems. For instance, $x = (x_k)_{k \ge 0}$ may be the infinite sequence of Fourier coefficients of a periodic solution of an ODE, a periodic solution of a delay differential equation (DDE) or an equilibrium solution of a PDE with Dirichlet, periodic or Neumann boundary conditions. The unknown $x$ may also be the infinite sequence of Chebyshev coefficients of a solution of a boundary value problem (BVP), the Hermite coefficients of a solution of an ODE defined on an unbounded domain, or the Taylor coefficients of the solution of a Cauchy problem. In the case when the differential equation is smooth, the decay rate of the coefficients of $x$ will be algebraic or even exponential [1]. In the present paper, we chose to solve (1) in the weighted $\ell^\infty$ Banach space $\Omega^s$, which corresponds to $C^k$ solutions. In order to exploit the analyticity of the solutions, we could follow the idea of [2] and solve (1) in weighted $\ell^1$ Banach spaces. This choice of space is not considered in the present paper.
Recently, several attempts to solve f (x) = 0 in Ω s have been successful. They belong to a field now called rigorous numerics. This field aims at constructing algorithms that provide approximate solutions to a given problem, together with precise bounds implying the existence of an exact solution in the mathematically rigorous sense. Equilibria of PDEs [3,4,5], periodic solutions of DDEs [6], fixed points of infinite dimensional maps [7] and periodic solutions of ODEs [8,9] have been computed using such methods.
One popular idea in rigorous numerics is to recast the problem $f(x) = 0$ as the problem of finding a fixed point of a Newton-like operator of the form $T(x) = x - Af(x)$, where $A$ is an approximate inverse of $Df(\bar x)$, and $\bar x$ is a numerical approximation obtained by computing a finite dimensional projection of $f$. In [3,4,6,7,9,5], the nonlinear equations under study have asymptotically diagonal or block-diagonal dominant linear parts, which greatly simplifies the computation of approximate inverses. In contrast, the present work considers problems with tridiagonal dominant linear part. To the best of our knowledge, this is the first attempt to rigorously compute solutions of such problems. While our proposed approach is designed for a specific class of operators (see assumptions (4) and (5)), we believe that it can be seen as a first step toward rigorously solving more complicated nonlinear operators with tridiagonal dominant linear part.
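As a toy illustration of this recasting, the sketch below (a hypothetical two-dimensional example, not one of the infinite dimensional problems considered in this paper) iterates the Newton-like operator $T(x) = x - Af(x)$ with $A$ a fixed approximate inverse of $Df(\bar x)$:

```python
# Hypothetical 2D example (not one of the infinite dimensional problems of
# the paper): iterate the Newton-like operator T(x) = x - A f(x), with A a
# FIXED approximate inverse of Df(xbar), starting near a numerical zero xbar.

def f(x):
    # f(x1, x2) = (x1^2 + x2 - 1, x1 - x2); its zero has x1 = x2 = (sqrt(5)-1)/2
    return [x[0] ** 2 + x[1] - 1.0, x[0] - x[1]]

def Df(x):
    # Jacobian of f
    return [[2.0 * x[0], 1.0], [1.0, -1.0]]

def inv2(M):
    # exact inverse of a 2x2 matrix
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def T(x, A):
    fx = f(x)
    return [x[i] - A[i][0] * fx[0] - A[i][1] * fx[1] for i in (0, 1)]

xbar = [0.6, 0.6]        # rough numerical approximation of the zero
A = inv2(Df(xbar))       # approximate inverse, frozen at xbar
x = xbar
for _ in range(30):      # T is a contraction near xbar, so this converges
    x = T(x, A)
residual = max(abs(v) for v in f(x))
```

Since $A$ is invertible, fixed points of $T$ are exactly zeros of $f$; freezing $A$ at $\bar x$ (instead of recomputing the Jacobian at every step) is what makes $T$ amenable to the contraction argument used throughout the paper.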
The paper is organized as follows. In Section 2, we present a method enabling us to compute (with the help of the computer) pseudo-inverses of tridiagonal operators of a certain class. In Section 3, we recast the problem $f(x) = 0$ as a fixed point problem $x = T(x) = x - Af(x)$, where $A$ is a pseudo-inverse, and we present the rigorous computational method to prove existence of fixed points of $T$. In Section 4, we present an application and finally, in Section 5, we conclude by presenting some interesting future directions.

Computing pseudo-inverses of tridiagonal operators
This section is devoted to the construction of a pseudo-inverse of a linear operator with tridiagonal tail (see (6)). We begin by specifying the assumptions that we make on the growth of the tridiagonal terms. Then we use an LU-decomposition to formally obtain a formula for the pseudo-inverse. Finally, we check that the (formally defined) pseudo-inverse has good mapping properties (see Proposition 2.3).
Given three sequences $(\lambda_k)_{k\ge0}$, $(\mu_k)_{k\ge0}$, $(\beta_k)_{k\ge0}$ and $x \in \Omega^s$, we define the tridiagonal linear operator (acting on $x$) $L(x) = (L_k(x))_{k\ge0}$ of (1) by
$$L_k(x) \stackrel{\mathrm{def}}{=} \lambda_k x_{k-1} + \mu_k x_k + \beta_k x_{k+1}, \quad k \ge 1, \tag{3}$$
and $L_0(x) = \mu_0 x_0 + \beta_0 x_1$. Assume that there exist real numbers $s_L > 0$, $0 < C_1 \le C_2$ and an integer $k_0$ such that
$$C_1 k^{s_L} \le |\mu_k| \le C_2 k^{s_L}, \quad \text{for all } k \ge k_0. \tag{4}$$
Assume further the existence of $\delta \in \left(0, \frac12\right)$ and $k_0 \ge 0$ such that
$$\left| \frac{\lambda_k}{\mu_k} \right| \le \delta \quad \text{and} \quad \left| \frac{\beta_k}{\mu_k} \right| \le \delta, \quad \text{for all } k \ge k_0. \tag{5}$$
Then, under assumptions (4) and (5), $L$ defined by (3) is a tridiagonal operator which maps $\Omega^s$ into $\Omega^{s-s_L}$. Indeed, if $x \in \Omega^s$, then (4) and (5) give, for $k \ge \max(k_0, 2)$, $|L_k(x)| \le |\mu_k| (1 + 2\delta)\, \omega_{k-1}^{-s} \|x\|_s \le C_2 (1 + 2\delta)\, 2^s\, k^{s_L - s} \|x\|_s$, so that $L(x) \in \Omega^{s-s_L}$.

From now on, assume for the sake of simplicity that $s_N = 0$, that is, the nonlinear part $N$ of (1) maps $\Omega^s$ into $\Omega^s$. Since $\Omega^s$ is an algebra under discrete convolutions when $s > 1$ (e.g. see [5,10]), any $N$ which is a combination of such convolutions maps $\Omega^s$ into $\Omega^s$. Assume that, using a finite dimensional projection $f^{(m)} : \mathbb{R}^m \to \mathbb{R}^m$ of (1), we have computed a numerical approximation $\bar x$ such that $f^{(m)}(\bar x) \approx 0$. We identify $\bar x \in \mathbb{R}^m$ with $\bar x = (\bar x, 0, 0, 0, 0, \ldots) \in \Omega^s$. We then try to construct a ball centered at $\bar x$ and containing a unique solution of (1), by showing that a specific Newton-like operator $T(x) = x - Af(x)$ is a contraction on $B_{\bar x}(r)$. This requires the construction of an approximate inverse $A$ of $Df(\bar x) = L + DN(\bar x)$. In order to do so, the structures of $L$ and $DN(\bar x)$ need to be understood. From (3) and (4), $L$ is a tridiagonal operator with entries growing to infinity at the rate $k^{s_L}$. Moreover, since $DN(\bar x)$ maps $\Omega^s$ into $\Omega^s$, it is a bounded linear operator. As mentioned above, the expectation is that the coefficients of $\bar x$ decay fast to zero. This implies that a reasonable approximation $A^\dagger$ of $Df(\bar x)$ is given by (6), with $D \stackrel{\mathrm{def}}{=} Df^{(m)}(\bar x)$ for $m$ large enough: $A^\dagger$ acts as the finite matrix $D$ on the first $m$ components and as the tridiagonal tail of $L$ on the remaining ones. We wish to find the inverse of $A^\dagger$ in terms of $D$, $(\beta_k)_{k \ge m-1}$, $(\mu_k)_{k \ge m}$ and $(\lambda_k)_{k \ge m}$. We therefore consider the linear system $A^\dagger x = y$ (7), where $x$ and $y$ are infinite vectors, split into their finite parts $x_F$, $y_F$ (the first $m$ components) and their infinite parts $x_I$, $y_I$. The infinite part of (7) writes (8). We introduce the notations of the book of P.G. Ciarlet (see Theorem 4.3-2 on page 142 in [11]): the tail has diagonal entries $(b_n)$, subdiagonal entries $(a_n)$ and superdiagonal entries $(c_n)$, and $(\delta_n)_{n \in \mathbb{N}}$ is defined by the induction formula
$$\delta_0 = 1, \quad \delta_1 = b_1, \quad \delta_n = b_n \delta_{n-1} - a_n c_{n-1} \delta_{n-2}, \quad n \ge 2.$$
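The three-term recursion for the $\delta_n$ can be implemented directly; the following sketch (with made-up tridiagonal entries) cross-checks the recursion against the leading principal minors of the corresponding finite tridiagonal matrix, computed by exact Gaussian elimination over the rationals:

```python
# Sketch (made-up tridiagonal entries): the recursion
# delta_n = b_n*delta_{n-1} - a_n*c_{n-1}*delta_{n-2} produces the leading
# principal minors of the tridiagonal matrix with diagonal (b_n),
# subdiagonal (a_n) and superdiagonal (c_n).
from fractions import Fraction

def principal_minors(a, b, c, n):
    # lists are indexed from 1 to match the text; unused slots hold None
    delta = [Fraction(1), Fraction(b[1])]
    for k in range(2, n + 1):
        delta.append(b[k] * delta[k - 1] - a[k] * c[k - 1] * delta[k - 2])
    return delta

def det_gauss(M):
    # exact determinant via Gaussian elimination over the rationals
    M = [[Fraction(v) for v in row] for row in M]
    d = Fraction(1)
    for i in range(len(M)):
        d *= M[i][i]
        for r in range(i + 1, len(M)):
            fac = M[r][i] / M[i][i]
            for col in range(i, len(M)):
                M[r][col] -= fac * M[i][col]
    return d

a = [None, None, 1, 2, 3]   # subdiagonal  a_2..a_4
b = [None, 4, 5, 6, 7]      # diagonal     b_1..b_4
c = [None, 1, 1, 2]         # superdiagonal c_1..c_3
delta = principal_minors(a, b, c, 4)
M = [[4, 1, 0, 0], [1, 5, 1, 0], [0, 2, 6, 2], [0, 0, 3, 7]]
```

The recursion is simply the cofactor expansion of the $n$-th leading principal minor along its last row, which is why $\delta_n$ also appears as the product of the first $n$ pivots of the LU-decomposition below.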
Note that only the $\delta_n$ are really useful. Let us define the tridiagonal operator $T$ by (9). For any infinite vector $x = (x_0, \ldots, x_k, \ldots)^T$, we introduce the notation $x_I \stackrel{\mathrm{def}}{=} (x_m, \ldots, x_{m+k}, \ldots)^T$.
Using the notation $e_1 = (1, 0, 0, 0, 0, \cdots)^T$, the system (8) becomes (10). From Theorem 4.3-2 in [11], we compute an $LU$-decomposition of the tridiagonal operator defined in (9) as $T = L_I U_I$, where $L_I$ is lower bidiagonal and $U_I$ is upper bidiagonal. Hence, the system (8) becomes $L_I z_I = y_I - \lambda_m x_{m-1} e_1$ combined with $U_I x_I = z_I$, that is, the pair of systems (11) and (12). Both infinite systems (11) and (12) can be explicitly solved. System (11) can be solved by forward substitution, giving $z_m$ and then $z_{m+k}$ for any $k \ge 1$, which we rewrite with infinite matrix/vector notations as (13), $z_I = L_I^{-1}\left(y_I - \lambda_m x_{m-1} e_1\right)$. The second system (12) leads, by backward substitution, to an infinite sum for $x_{m+k}$ (for any $k \ge 0$), which we also rewrite with infinite matrix/vector notations as (14), $x_I = U_I^{-1} z_I$. Coupling (13) and (14), we end up with (15). Denoting by $\left(U_I^{-1} L_I^{-1}\right)^{r_0}$ the first row of the infinite matrix $U_I^{-1} L_I^{-1}$ and by $(w_I)_0$ the first element of $w_I$, we can rewrite the first line of (15) as (16). We now investigate the finite part of the linear system (7), which is given by (17); according to (16), it can be solved by means of a finite matrix $K$, and we consider its inverse $K^{-1}$. We denote the last column of $K^{-1}$ by $(K^{-1})^{c_{m-1}}$, its last row by $(K^{-1})^{r_{m-1}}$, and its last ("south-east") element by $(K^{-1})_{m-1,m-1}$. Then we obtain the identity (18), using the tensor product notation. The last line of this identity reads (19). Coming back to (15) and using (18), we obtain an expression of $x_I$ in terms of $y$. Putting together (17) and (19), we end up with (20). In order to get an approximate (pseudo-)inverse of $A^\dagger$, we would like to get a numerical approximation of $K^{-1}$. However, the definition of $K$ involves $(w_I)_0$, which cannot be explicitly computed. By definition, $w_I = U_I^{-1} L_I^{-1} e_1$, so, using again the computations made in this section, we get a series representation of $(w_I)_0$. Given a computational parameter $L$, we define $\tilde w$ as its truncation, and the corresponding finite matrix $\tilde K$. We can now consider $A^{(m)}$, a numerically computed inverse of $\tilde K$, and then define the approximate (pseudo-)inverse $A$ of $A^\dagger$ as in (21).

Proof (of Lemma 2.1). Let $z_I \in \Omega^s$ and $x_I = U_I^{-1} z_I$. Using (14) and the formula above, we get an explicit expression for $x_{m+k}$. Now remember that, for all $k \ge 2$, the coefficients satisfy the inductive inequality defining the sequence $(u_k)$. The study of the inductive sequence defined as above, but with $\ge$ replaced by $=$, yields the announced estimate (see Figure 1). We can then rewrite (22) in order to get (23).

Proof (of Lemma 2.2). Let $y_I \in \Omega^s$ and $z_I = L_I^{-1} y_I$.
Using (13) and the formula above (without the last term, since we do not consider $L_I^{-1} e_1$ here), where we use the sequence $(u_k)$ introduced in the previous proof, we get the announced estimates. For any $k \ge m$, the resulting quantity is bounded uniformly in $k$, since the last term goes to $0$ as $k$ goes to $\infty$, and the proof is complete.
Proposition 2.3. The operator $A$ defined in (21) maps $\Omega^s$ into $\Omega^{s+s_L}$.
Proof. Let $y \in \Omega^s$ and write $x = Ay$, split into its finite part $x_F$ and its infinite part $x_I$. Then, by definition of the operator $A$ in (21) and by the previous lemmas, $U_I^{-1} L_I^{-1} y_I \in \Omega^{s+s_L}$; in particular, $\left(U_I^{-1} L_I^{-1}\right)^{r_0} y_I = \left(U_I^{-1} L_I^{-1} y_I\right)_0$ is well defined, and so is $x_F$. Using (21) again, and remembering that $w_I = U_I^{-1} L_I^{-1} e_1$, so that $w_I \in \Omega^s$ for any $s$, we see, according to the previous lemmas and the definition of $\Lambda$ (see (21)), that $x_I \in \Omega^{s+s_L}$.
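On a finite truncation of the tail, the forward sweep ($L_I z_I = y_I$) followed by the backward sweep ($U_I x_I = z_I$) reduces to the classical tridiagonal solve; a minimal finite-section sketch, with made-up coefficients mimicking a diagonal growing like $k^{s_L}$ (here $s_L = 2$) and off-diagonal ratios below $\delta = \frac14 < \frac12$:

```python
# Finite-section sketch of the tail solve: forward sweep (L_I z = y), then
# backward sweep (U_I x = z).  Coefficients are made up for illustration.

def solve_tridiag(a, b, c, y):
    # Solve a[k]*x[k-1] + b[k]*x[k] + c[k]*x[k+1] = y[k]; a[0], c[-1] unused.
    n = len(y)
    beta = [0.0] * n          # diagonal of U after elimination
    z = [0.0] * n
    beta[0], z[0] = b[0], y[0]
    for k in range(1, n):     # forward sweep: L z = y
        m = a[k] / beta[k - 1]
        beta[k] = b[k] - m * c[k - 1]
        z[k] = y[k] - m * z[k - 1]
    x = [0.0] * n
    x[-1] = z[-1] / beta[-1]
    for k in range(n - 2, -1, -1):   # backward sweep: U x = z
        x[k] = (z[k] - c[k] * x[k + 1]) / beta[k]
    return x

n = 50
b = [2.5 * (k + 1) ** 2 for k in range(n)]       # diagonal ~ k^2
a = [0.25 * bk for bk in b]                      # off-diagonal ratio 1/4
c = [0.25 * bk for bk in b]
x_true = [1.0 / (k + 1) ** 2 for k in range(n)]  # algebraically decaying target
y = [(a[k] * x_true[k - 1] if k > 0 else 0.0) + b[k] * x_true[k]
     + (c[k] * x_true[k + 1] if k < n - 1 else 0.0) for k in range(n)]
x = solve_tridiag(a, b, c, y)
```

The diagonal dominance guaranteed by (5) with $\delta < \frac12$ is what keeps this elimination stable, and it mirrors the role the sequence $(u_k)$ plays in the infinite dimensional estimates above.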

Computation of fixed points of the operator T
Our main motivation for computing approximate inverses is to prove existence, in a mathematically rigorous sense, of a fixed point of the Newton-like operator $T$ in a set centered at a numerical approximation $\bar x$. The Newton-like operator has the form
$$T(x) = x - Af(x), \tag{25}$$
where $A$ is the approximate inverse (21) of $Df(\bar x)$ computed using the theory of Section 2.
Since $f$ maps $\Omega^s$ into $\Omega^{s-s_L}$ and $A$ maps $\Omega^s$ into $\Omega^{s+s_L}$ (thanks to Proposition 2.3), we see that $T$ maps the Banach space $\Omega^s$ into itself. Our goal is to obtain explicit bounds allowing us to show that a given $T$ is a contraction on the ball $B_{\bar x}(r)$, which yields the existence of a fixed point of $T$ (and thus of a zero of $f$). The fixed point theorem that we use (see Theorem 3.1) requires bounds on $T$ and its derivative. We derive formulas for these bounds in Sections 3.2 and 3.3, and then explain in Section 3.4 how to use the so-called radii polynomials in order to find a radius $r > 0$ such that $T(B_{\bar x}(r)) \subset B_{\bar x}(r)$, and such that $T$ is a contraction on $B_{\bar x}(r)$.
Before proceeding further, we endow $\Omega^s$ with the operation of discrete convolution. More precisely, given $x = (x_k)_{k\ge0}, y = (y_k)_{k\ge0} \in \Omega^s$, we extend $x$ and $y$ symmetrically by $\tilde x = (\tilde x_k)_{k\in\mathbb{Z}}$, $\tilde y = (\tilde y_k)_{k\in\mathbb{Z}}$, where $\tilde x_{-k} = \tilde x_k = x_k$ and $\tilde y_{-k} = \tilde y_k = y_k$, for $k \ge 0$. The discrete convolution of $x$ and $y$ is then denoted by $x * y$, and defined by the (infinite) sum
$$(x * y)_k \stackrel{\mathrm{def}}{=} \sum_{\substack{k_1 + k_2 = k \\ k_1, k_2 \in \mathbb{Z}}} \tilde x_{k_1} \tilde y_{k_2}, \quad k \ge 0.$$
It is known that for $s > 1$, $(\Omega^s, *)$ is an algebra (e.g. see [10]), that is, if $x, y \in \Omega^s$, then $x * y \in \Omega^s$. This will be useful when we look for a bound such as (27) below. We start with a classical theorem, whose proof is standard (e.g. see the proof of Lemma 3.3 in [5]) and is a direct consequence of the contraction mapping theorem.
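The symmetric-extension convolution is straightforward to implement for finitely supported sequences; the sketch below (made-up coefficients) checks it against the product of the associated cosine series $\sum_{k\in\mathbb{Z}} \tilde x_k \cos(k\xi) = x_0 + 2\sum_{k\ge1} x_k \cos(k\xi)$:

```python
# Sketch (made-up coefficients): discrete convolution with symmetric
# extension, checked against the product of the associated cosine series.
import math

def get(v, k):
    # symmetric extension: v_{-k} = v_k, zero beyond the stored coefficients
    k = abs(k)
    return v[k] if k < len(v) else 0.0

def sconv(x, y):
    # (x * y)_k = sum_{k1 + k2 = k, k1, k2 in Z} x_{k1} y_{k2}, for k >= 0
    n = len(x) + len(y) - 1
    return [sum(get(x, k1) * get(y, k - k1)
                for k1 in range(-(len(x) - 1), len(x))) for k in range(n)]

def series(v, xi):
    # evaluate sum_{k in Z} v_k cos(k xi) = v_0 + 2 sum_{k>=1} v_k cos(k xi)
    return v[0] + 2.0 * sum(v[k] * math.cos(k * xi) for k in range(1, len(v)))

x = [1.0, 0.5, 0.2]
y = [0.3, 0.1]
w = sconv(x, y)
```

The identity $\mathrm{series}(x * y) = \mathrm{series}(x) \cdot \mathrm{series}(y)$ is exactly the Banach algebra property exploited below for $s > 1$.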
Theorem 3.1. Assume that there exist a point $\bar x \in \Omega^s$ and vectors $Y = (Y_k)_{k\ge0}$ and $Z(r) = (Z_k(r))_{k\ge0}$ with nonnegative entries such that
$$|T(\bar x) - \bar x| \le Y \tag{26}$$
and
$$\sup_{b_1, b_2 \in B_0(r)} |DT(\bar x + b_1)\, b_2| \le Z(r). \tag{27}$$
If there exists $r > 0$ such that $\|Y + Z(r)\|_s < r$, then the operator $T$ is a contraction on $B_{\bar x}(r)$ and there exists a unique $\hat x \in B_{\bar x}(r)$ such that $T(\hat x) = \hat x$.
We shall see how to get the bounds $Y$ (Section 3.2) and the bounds $Z(r)$ (Section 3.3), and we shall provide an efficient way of finding a radius $r > 0$ such that $\|Y + Z(r)\|_s < r$ (Section 3.4). The first step, however, consists in looking for bounds on $A$. More precisely, we need some estimates in order to control the action of $U_I^{-1} L_I^{-1}$. This is the goal of the following subsection.

Some preliminary computations
We first introduce some notations that will be used throughout the rest of this section.
Lemma 3.2. Let $y_I = (y_m, y_{m+1}, \ldots)^T$ be an infinite vector and $x_I = U_I^{-1} L_I^{-1} y_I$. Assume that $m \ge k_0$ and $\delta < \frac12$. Then, for all $k \ge 0$, the following estimate on $|x_{m+k}|$ holds.
Proof. We again introduce $z_I = L_I^{-1} y_I$. Combining (23) from Lemma 2.1 and (24) from Lemma 2.2, we get the result.
In particular, we immediately obtain the two following corollaries (always under the assumptions of Lemma 3.2), which will be useful in the sequel.
Corollary 3.4. If $y$ is such that $y_{m+k} = 0$ for any $k \ge n$, then the bounds (30) and (31) hold.
More generally, we will also need in the next two subsections a uniform bound on $|x_{m+k}| (m+k)^{s+s_L}$ for $k$ large enough. We assume here that $m \ge 2$ (which will always be the case in practice), and define, for any integer $M$, the quantity (32). Then one bound holds for all $k < M$, and another for all $k \ge M$.
Proof. Thanks to Lemma 3.2, for $k \ge M$ we split the remaining sum into pieces that can be estimated separately. The justification of the last inequality is contained in the following three lemmas, each asserting that an auxiliary function $\varphi_i$ is eventually nonincreasing, so that $\varphi_i(k) \le \varphi_i(M)$ for all $k \ge M$.
Proof. For $x > 0$, let $\varphi_1(x) \stackrel{\mathrm{def}}{=} \theta^{\frac{x}{2}} x (m + x)^{s+s_L}$, and study the sign of its derivative. For $0 < \theta < 1$, this sign is governed by the quadratic $\ln\sqrt{\theta}\, x^2 + \left(m \ln\sqrt{\theta} + s + s_L + 1\right) x + m$ and its discriminant. Hence, for $x \ge \frac{4}{(\ln \theta)^2}$, $\varphi_2'(x) \le 0$, and so $\varphi_2(k) \le \varphi_2(M)$ for all $k \ge M$. Similarly, for $x \ge m$, $\varphi_3'(x) \le 0$, and $\varphi_3(k) \le \varphi_3(M)$ for all $k \ge M$.
Finally, we will need to bound the error made by using $\tilde w$ instead of $(w_I)_0$ in the definition (21) of $A$.
Lemma 3.9. Assume that $L \ge k_0$ and $\delta < \frac12$. Then the bound (35) holds.
Proof. Using (5) together with the sequence $(u_l)$ introduced in the proof of Lemma 2.1, we get the result.

Computation of the Y bounds
From now on, we shall assume for the sake of clarity that the nonlinearity $N$ of $f$ in (1) is a polynomial of degree two. The generalization to a polynomial nonlinearity of higher degree could be obtained thanks to the estimates developed in [5] in order to bound terms like $\left(x^1 * \cdots * x^p\right)_n$, where $x^1, \ldots, x^p \in B_0(r)$. Moreover, as long as one is interested in problems with nonlinearities built from elementary functions of mathematical physics (powers, exponential, trigonometric functions, rational functions, Bessel functions, elliptic integrals, etc.), our method is applicable. Indeed, since these nonlinearities are themselves solutions of low order linear or polynomial ODEs, they can be appended to the original problem of interest in order to obtain polynomial nonlinearities, albeit in a higher number of variables. This standard trick is explained in more detail in [12], and is used in [18] to prove existence of periodic solutions in the planar circular restricted three body problem. With this in mind, we are ready to compute the bound $Y$ appearing in Theorem 3.1. In everything that follows, $|\cdot|$, when applied to vectors or matrices (even infinite dimensional), must be understood component-wise.
The main estimate of this subsection, that is, the bound on $Y$, is presented in the following proposition.
Proposition 3.10. Define $Y = (Y_k)_{k\ge0}$ component-wise by (37)-(40). Then $|T(\bar x) - \bar x| \le Y$.
Proof. By definition of $T$, $T(\bar x) - \bar x = -A f(\bar x)$. Note that since we suppose that $f$ is at most quadratic, and since $\bar x$ is constructed in such a way that $\bar x_k = 0$ for all $k \ge m$, we get the identity $\left(f(\bar x)\right)_{m+k} = 0$ for all $k \ge m - 1$. Thanks to (21), and using (30) with $n = m - 1$ and $k = 0$, we get the bound (37). Using (21) again, together with (29), (30) and (31) (again with $n = m - 1$), we get the bounds (38) and (39). Finally, by (36), $\theta^k (m + k)^s \le \theta^M (m + M)^s$ for all $k > M$, and we obtain the bound (40).
We present in Section 3.4 the rationale behind the definition of $Y_{m+k}$ for $k > M$.

Computation of the Z bounds
In order to compute the $Z$ bounds from Theorem 3.1, we need to estimate the quantity
$$DT(\bar x + y)\, z = \left(I - A A^\dagger\right) z - A \left( Df(\bar x + y) - A^\dagger \right) z \tag{41}$$
for all $y, z \in B_0(r)$. We are going to bound each of the two terms of (41) separately in the next two sub-subsections.

Estimates for $(I - AA^\dagger)z$
In this sub-subsection, we present the bound on $(I - AA^\dagger)z$, which constitutes the first part of a bound for $Z$.

Estimates for $A\left(Df(\bar x + y) - A^\dagger\right)z$
This sub-subsection is devoted to the exposition of a bound for $A\left(Df(\bar x + y) - A^\dagger\right)z$, which constitutes the second (and last) part of a bound for $Z$. This bound is detailed in Proposition 3.18.
Recall the assumption that the nonlinear part $N$ is polynomial of degree 2. Hence, $Df(\bar x + y)$ can be written as the finite Taylor expansion
$$Df(\bar x + y) = Df(\bar x) + D^2 f(\bar x)(y, \cdot). \tag{45}$$
We are going to bound the two terms of (45) separately. Let us denote by $\sigma$ the coefficient of degree 2 of $f$, that is, $D^2 f(\bar x)(y, z) = 2\sigma (y * z)$. We bound this convolution product thanks to the following result.
Lemma 3.12. Let $s \ge 2$ be an algebraic decay rate and let $n \ge 6$ and $L \ge 1$ be computational parameters. For $x, y \in \Omega^s$ and for any $k \ge 0$, the convolution bound involving the constants $\alpha_k^s(n)$ holds.
Proof. See [13] for a proof of this bound and [10] for a similar bound for $1 < s < 2$.
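For a quadratic nonlinearity, the identity $D^2 f(\bar x)(y, z) = 2\sigma(y * z)$ can be sanity-checked on a finite truncation by finite differences; in the sketch below, $N(x) = \sigma\,(x * x)$ is an assumed model of the degree-2 part, and all coefficients are made up:

```python
# Finite-truncation sanity check (model nonlinearity N(x) = sigma * (x*x),
# made-up coefficients): the directional derivative of N at xbar in the
# direction z is 2 sigma (xbar * z).

def get(v, k):
    # symmetric extension: v_{-k} = v_k, zero beyond the stored coefficients
    k = abs(k)
    return v[k] if k < len(v) else 0.0

def sconv(x, y, n):
    # symmetric-extension discrete convolution, truncated to n coefficients
    return [sum(get(x, k1) * get(y, k - k1)
                for k1 in range(-(len(x) - 1), len(x))) for k in range(n)]

sigma, n = 0.7, 5
xbar = [1.0, 0.4, 0.1, 0.05, 0.0]
z = [0.2, -0.3, 0.15, 0.0, 0.1]

def N(x):
    return [sigma * c for c in sconv(x, x, n)]

h = 1e-6    # central difference; exact for a quadratic map up to rounding
xp = [a + h * b for a, b in zip(xbar, z)]
xm = [a - h * b for a, b in zip(xbar, z)]
fd = [(p - q) / (2 * h) for p, q in zip(N(xp), N(xm))]
exact = [2 * sigma * c for c in sconv(xbar, z, n)]
```

Because $N$ is quadratic, the central difference agrees with $2\sigma(\bar x * z)$ up to rounding error only, which makes the check sharp.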
Remark 3.13. It is important to notice here that $\alpha_k^s(n) = \alpha_n^s(n)$ for all $k \ge n$. From now on, we assume that $m$ is taken larger than or equal to 6, which allows us to use Lemma 3.12 with $n = m$. Note that this condition is not stringent, since in practice more than 6 modes are usually needed in order to get a good numerical solution $\bar x$.
We begin by bounding the first term of (45).
Then, for all $z \in B_0(r)$, the corresponding bound holds.
Proof. According to the definition of $A^\dagger$ in (6), in the convolution product, $z_F$ must be understood as the infinite vector $(z_F, 0, \ldots, 0, \ldots)^T$. Therefore, $\left( \left( Df(\bar x) - A^\dagger \right) z \right)_0 = 0$, and for all $z \in B_0(r)$, the finite part can be bounded directly. Then, remembering that $Df(\bar x) = L + DN(\bar x)$ and (6), we see that $\left( \left( Df(\bar x) - A^\dagger \right) z \right)_I = \left( DN(\bar x) z \right)_I = 2\sigma \left( \bar x * z \right)_I$, so that using Lemma 3.12, for all $z \in B_0(r)$, we end up with the announced bound. We now bound the second term of (45).
We get similar results for the second order term, with a bound defined component-wise in Proposition 3.17. Finally, we can sum up all the computations of this sub-subsection and state the following result.
Proposition 3.18. Let $M$ be an integer satisfying (32) and (36). Define $D_1$ (resp. $D_2$) as in Proposition 3.16 (resp. Proposition 3.17). Then, for all $y, z \in B_0(r)$, the resulting bound on $\left| A \left( Df(\bar x + y) - A^\dagger \right) z \right|$ holds. Putting this together with Proposition 3.11, we end up with the following result.
Proposition 3.19. Let $M$ be an integer satisfying (32) and (36). Let $Z(r)$ be the bound obtained by combining Propositions 3.11 and 3.18. Then, for all $y, z \in B_0(r)$, $|DT(\bar x + y)\, z| \le Z(r)$.

The radii polynomials and interval arithmetic
All the work done up to now in Sections 2 and 3 can be summarized in the following statement: let $A$ be defined as in (21), take $M$ satisfying (32) and (36) and let $L \ge 0$ be a computational parameter; then the bound $Y$ defined in Proposition 3.10 satisfies (26) and, for all $r > 0$, the bound $Z(r)$ defined in Proposition 3.19 satisfies (27). Now that we have found bounds $Y$ and $Z(r)$ that satisfy (26) and (27), we must find a radius $r > 0$ such that $\|Y + Z(r)\|_s < r$ in order to apply Theorem 3.1. By definition of the norm $\|\cdot\|_s$, this amounts to finding an $r > 0$ such that, for every $k \ge 0$, the radii polynomial
$$P_k(r) \stackrel{\mathrm{def}}{=} Y_k + Z_k(r) - \frac{r}{\omega_k^s}$$
satisfies $P_k(r) < 0$. Note that since we constructed $Y$ and $Z$ in such a way that the negativity of $P_{m+M}(r)$ implies that of $P_{m+k}(r)$ for every $k \ge M$, it is enough to find an $r > 0$ such that, for all $0 \le k \le m + M$, $P_k(r) < 0$. In order to do so, we numerically compute, for each $0 \le k \le m + M$, the set $I_k \stackrel{\mathrm{def}}{=} \{ r > 0 : P_k(r) < 0 \}$, and set $I \stackrel{\mathrm{def}}{=} \bigcap_{0 \le k \le m+M} I_k$. If $I$ is empty, then the proof fails, and we should try again with some larger parameters $m$ and $M$. If $I$ is non empty, we pick an $r \in I$ and check rigorously, using the interval arithmetic package INTLAB [14], that for all $0 \le k \le m + M$, $P_k(r) < 0$, which, according to Theorem 3.1, proves that $T$ defined in (25) is a contraction on $B_{\bar x}(r)$, thus yielding the existence of a unique solution of $f(x) = 0$ in $B_{\bar x}(r)$.
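The search for the set $I$ can be sketched as follows, assuming (as for a quadratic nonlinearity) that $Z_k(r)$ is quadratic in $r$; all numerical coefficients below are made up for illustration and are not those of the application in Section 4, and the chosen $r$ would then have to be verified rigorously with interval arithmetic, which this floating-point sketch does not do:

```python
# Sketch of the search for I (all coefficients made up): with Z_k(r) assumed
# quadratic, Z_k(r) = Z2[k]*r^2 + Z1[k]*r, each radii polynomial is
# P_k(r) = Z2[k]*r^2 + (Z1[k] - 1/omega_k)*r + Y[k].
import math

def neg_interval(Y, Z1, Z2, omega):
    # intersect over k the sets {r > 0 : P_k(r) < 0}; None if empty
    lo, hi = 0.0, math.inf
    for Yk, Z1k, Z2k, wk in zip(Y, Z1, Z2, omega):
        a, b, c = Z2k, Z1k - 1.0 / wk, Yk
        disc = b * b - 4.0 * a * c
        if a <= 0.0 or b >= 0.0 or disc <= 0.0:
            return None           # P_k admits no interval of negativity
        r1 = (-b - math.sqrt(disc)) / (2.0 * a)
        r2 = (-b + math.sqrt(disc)) / (2.0 * a)
        lo, hi = max(lo, r1), min(hi, r2)
    return (lo, hi) if lo < hi else None

Y = [1e-8] * 4                    # small residual bounds
Z1 = [0.3, 0.2, 0.1, 0.05]        # linear-in-r coefficients, all < 1/omega_k
Z2 = [0.5] * 4                    # quadratic-in-r coefficients
omega = [1.0, 1.0, 4.0, 9.0]      # weights omega_k^s for s = 2
I = neg_interval(Y, Z1, Z2, omega)
```

A tiny residual bound $Y_k$ and a linear coefficient safely below $1/\omega_k^s$ yield a wide interval $I$, which is the situation reported in the application below.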

An example of application
We present in this section an example of an equation to which the method developed in this paper can be applied. We first explain the link between the equation that we study (cf. (50) below) and the tridiagonal operator defined in Section 2. Then, we explain the values taken in this example by the various constants and parameters of our method.
Equations of the form (50), where $g$ is a $2\pi$-periodic even smooth function, fall into the framework developed in Section 2. Consider indeed the cosine Fourier expansions of $u$ and $g$:
$$u(\xi) = \sum_{k \in \mathbb{Z}} x_k \cos(k\xi), \qquad g(\xi) = \sum_{k \in \mathbb{Z}} g_k \cos(k\xi).$$
Then, (50) can be rewritten as $f(x) = 0$, where the components $f_k$ of $f$ are given by (51), for $k = 0$ and for all $k \ge 1$. We see that the linear part of (51) is, as in (3), given by tridiagonal coefficients $\mu_0$, $\beta_0$ and, for all $k \ge 1$, $\lambda_k$, $\mu_k$ and $\beta_k$. Let us fix some $m \ge 2$. With $C_1 = 2$, $C_2 = 3$ and $\delta = \frac14$, assumptions (4) and (5) are satisfied. We now focus on the example in which $g$ is chosen so that $u(\xi) = \cos(\xi)$ is a trivial solution for $\sigma = 0$. We are going to use rigorous computations in order to prove the existence of solutions for $\sigma \neq 0$, and to compute these solutions. Starting from $\sigma = 0$, we first use standard pseudo-arclength continuation techniques to numerically obtain some nontrivial approximate solutions for $\sigma \neq 0$. We computed 1350 different solutions (675 for $\sigma > 0$ and 675 for $\sigma < 0$). See Figure 2 for a diagram summing up those computations, where each point represents a solution of (50). Then we use the rigorous computation method described in this paper to prove, for each numerical solution, the existence of a true solution in a small neighbourhood of the numerical approximation. We keep $m = 20$ Fourier coefficients for the numerical computation, and use $M = 20$ and the decay rate $s = 2$ for the proof. The bounds of Lemma 3.12 as well as the error on $\tilde w$ (35) are computed with $L = 100$. For each numerical solution, the proof is successful. The set $I$ defined in Section 3.4, on which all radii polynomials must be negative, always contains $[4 \times 10^{-11}, 10^{-4}]$, and we rigorously prove using interval arithmetic that they are indeed all negative for $r = 10^{-10}$. Hence the assumptions of Theorem 3.1 hold and, as a consequence, within the ball of radius $r = 10^{-10}$ in $\Omega^s$ centered at the numerical approximation, there exists a unique solution of (50). Therefore the existence of the solutions represented in Figure 2 is rigorously proven, within a margin of error that is too small to be depicted. The code used to perform the proofs can be found in [17].
Notice that existence of solutions of (50) could certainly have been obtained in different and more classical ways, for example using perturbative methods when $\sigma$ is close to 0, or using a variational approach (that is, considering (50) as the Euler-Lagrange equation for the critical points of a functional), or even using topological tools such as Leray-Schauder theory. The advantage of our method is that it provides more quantitative information than those approaches: indeed, it enables us to exhibit more than one solution for some values of $\sigma$, and, maybe more importantly, it gives a very precise localization of this (or these) solution(s) in terms of Fourier coefficients (something that looks very hard to obtain with qualitative PDE methods).

Conclusion and Perspectives
A first interesting future direction of research would consist in adapting our approach to the rigorous computation of connecting orbits of ODEs (using spectral methods). For instance, we would like to investigate the possibility of combining Hermite spectral methods with our approach to compute homoclinic orbits (e.g. see [15,16]). Since the differentiation operator in frequency space of the Hermite functions is tridiagonal, adapting our method to this class of operators could lead to a new rigorous numerical method for computing connecting orbits.
It would also be interesting to adapt our method to the case of solutions belonging to the sequence space $\ell^1_\nu \stackrel{\mathrm{def}}{=} \big\{ x = (x_k)_{k\ge0} : \sum_{k \ge 0} |x_k| \nu^k < \infty \big\}$ for some $\nu \ge 1$. With this choice of Banach space, we could use the fact that $\ell^1_\nu$ is naturally a Banach algebra under discrete convolutions. This could greatly simplify the nonlinear analysis.
Note that assumption (5) requires the tridiagonal operator to have symmetric ratios between the diagonal terms and the upper and lower diagonal terms. This is a restriction that we hope can be relaxed. Since many interesting problems involve tridiagonal operators with nonsymmetric ratios (as in the case of differentiation in frequency space of the Hermite functions), we believe that this is a promising route to follow.
Finally, generalizing our approach to problems with block-tridiagonal structures could also be a valuable project.