Spectral and norm estimates for matrix sequences arising from a finite difference approximation of elliptic operators

When approximating elliptic problems by using specialized approximation techniques, we obtain large structured matrices whose analysis provides information on the stability of the method. Here we provide spectral and norm estimates for matrix sequences arising from the approximation of the Laplacian via ad hoc finite differences. The analysis involves several tools from matrix theory and in particular from the setting of Toeplitz operators and Generalized Locally Toeplitz matrix sequences. Several numerical experiments are conducted, which confirm the correctness of the theoretical findings.


Introduction
In the numerical approximation of elliptic differential equations by specialized approximation techniques, we obtain large structured matrices whose analysis provides information on the stability of the method. Here we provide spectral and norm estimates for matrix sequences arising from the approximation of the Laplacian via ad hoc finite differences, that is, from the Coco-Russo method [5].
The analysis involves several tools from matrix theory and in particular from the setting of Toeplitz operators and Generalized Locally Toeplitz (GLT) matrix sequences. Several numerical experiments are conducted, which confirm the theoretical findings.
The paper is organized as follows. Subsection 1.1 contains a motivation and a description of the Coco-Russo method, together with a brief account of the related literature. Subsection 1.2 contains the necessary tools from the Toeplitz technology, while Section 2 contains the matrix formulation in 1D in the language of Toeplitz structures, the analysis of the norm estimates in 1D, together with related numerical experiments and a preliminary discussion of the spectral features of the involved matrix-sequences. Section 3 contains more details on the 2D method, on its matrix formulation, on the spectral results in 1D and in 2D, and the basic tools taken from the GLT theory. A discussion of the more challenging case of the norm estimates in 2D is also provided. A conclusion section ends the paper with a mention of a few open problems.

Method description and motivation
The design of numerical methods to solve Partial Differential Equations (PDE) on complex-shaped domains is attracting increasing interest in the scientific community. One of the bottlenecks of modern computer simulations is the modelling of physical processes around 3D complex-shaped objects through PDE. Finite Element Methods (FEM) are well-established approaches to solve PDE, supported by a rigorous theoretical analysis developed over the last decades to prove the convergence and accuracy order of the method when the grid size approaches zero.
However, some critical limitations are commonly associated in the literature with FEM, especially when applied to curved boundaries. In particular, the generation of elements conforming to highly varying curvatures of the boundary may become cumbersome, especially if the domain changes its shape over time. Also, the design of a balanced partition of the mesh for parallel FEM is unhandy. For these reasons, approaches based on Finite Difference Methods (FDM), where the domain is immersed into a fixed grid, are gaining popularity in the literature, since they do not require any mesh generation effort and at the same time allow for a natural design of parallel solvers.
On the other hand, FDM are commonly based on heuristic approaches, and convergence and stability analyses are not sufficiently developed in the literature, especially in the case of curved boundaries.
The Immersed Boundary Method proposed by Peskin in [15] and further developed by LeVeque and Li in [12] is a pioneering FDM-based approach for general domains immersed in fixed grids.
A more recent approach is the Ghost-Fluid Method proposed by Fedkiw et al. in [6] and further extended to higher accuracy by Gibou et al. in [10,9], where the values on grid nodes just outside the domain (ghost points) are obtained by accurate extrapolations of the boundary condition from inside values.
In [5], the authors present a highly efficient and accurate ghost-point method to solve a Poisson equation on a complex-shaped domain, modelled by a level-set function. Several numerical tests were presented to confirm the accuracy order and the efficiency of the multigrid solver. However, a theoretical analysis was missing. The method has been extended to several applications, such as compressible fluids in moving domains [3] or volcanology [4].
In this paper we present a technique to prove the stability of the Coco-Russo method [5] and the convergence to the predicted order of accuracy.
We start from the 1D problem. Consider the elliptic boundary-value problem

−u''(x) = f(x), x ∈ (a, b), u(a) = g_a, u(b) = g_b, (1)

and a one-dimensional uniform grid G_h = {x_0, x_1, ..., x_{n+1}} with a constant spatial step h = x_i − x_{i−1}, for i = 1, ..., n+1. Then x_i = x_0 + i h. Let x_0 < a < x_1 and x_{n+1} = b (see Fig. 1).

Figure 1: Discretization of the 1D domain. Full black circles are the interior grid points, while the full square is the ghost point. Boundary values are indicated with red stars. Linear extrapolation is used to define the ghost value u_0 from u_1 and the left boundary value g_a.

The elliptic equation −∆u = f is discretized by central differences at x_i for i = 1, ..., n, and the boundary condition at x = b = x_{n+1} is included in the internal discretization (this is the so-called eliminated boundary condition approach):

(−u_{i−1} + 2u_i − u_{i+1})/h² = f(x_i), i = 1, ..., n, with u_{n+1} = g_b.

The boundary condition at x = a is approximated by q(a) = g_a, where q(x) is the polynomial of degree s − 1 that interpolates the values u_0, u_1, ..., u_{s−1} at the grid points x_0, x_1, ..., x_{s−1}. We call s the stencil size for the boundary condition at x = a.
The discretization of the boundary condition can be represented as an equation on the first s grid values. For s = 2 we have

ϑu_0 + (1 − ϑ)u_1 = g_a, (4)

where ϑ = (x_1 − a)/h. The grid point x_0 is called the ghost point and u_0 is the ghost value.
Although we could follow, for the boundary condition at x = a, a technique similar to the one adopted for x = b (i.e. we could solve (4) for u_0 and substitute its value into the internal equation for x_1), we keep a non-eliminated boundary condition approach in order to develop a theoretical analysis that can be straightforwardly extended to higher-dimensional cases, where the eliminated approach is impractical.

The discretized problem is then a linear system

A_h u_h = b_h, (5)

where h = (b − x_0)/(n + 1).

Toeplitz structures and related tools
Let T_n be a banded Toeplitz matrix of order n with bandwidth ω < n, ω a positive integer, whose entries along the diagonals are the complex coefficients a_k, k = −ω, ..., ω. Let f ∈ L¹(−π, π) and let T_n(f) be the Toeplitz matrix generated by f, i.e. (T_n(f))_{s,t} = a_{s−t}(f), s, t = 1, ..., n, with f called the generating function of {T_n(f)} and with a_k(f) being the k-th Fourier coefficient of f, that is

a_k(f) = (1/(2π)) ∫_{−π}^{π} f(θ) e^{−ikθ} dθ, k ∈ Z. (7)

With these notations, the matrix reported in (6) can be written as T_n = T_n(f), where the generating function is f(θ) = Σ_{j=−ω}^{ω} a_j e^{ijθ}. It is worth noticing that the study of the generating function gives plenty of information on the spectrum of T_n(f) for any fixed n, and also asymptotically as the matrix size n diverges to infinity (see [7] and [8] for the multilevel setting). For instance, if f is real-valued almost everywhere (a.e.), then T_n(f) is Hermitian for all n. Furthermore, when f is real-valued and even a.e., the matrix T_n(f) is (real) symmetric for all n, while f real-valued and nonnegative a.e., but not identically zero a.e., implies that T_n(f) is Hermitian positive definite for all n. In such a setting the considered matrix-sequence can be ill-conditioned: indeed, if f is nonnegative and bounded with essential supremum equal to M > 0 and a unique zero of order α > 0, then the maximal eigenvalue converges monotonically from below to M, whereas the minimal eigenvalue converges to zero monotonically from above with a speed dictated by α, that is, the minimal eigenvalue is asymptotic to n^{−α}. We recall that in many practical applications it is required to solve numerically Toeplitz linear systems of (very) large dimensions, and hence several specialized techniques of iterative type, such as preconditioned Krylov methods and ad hoc multigrid procedures, have been designed; we refer the interested reader to the books [14,2] and to the references therein. Such large Toeplitz linear systems emerge from specific applications involving e.g. the numerical solution of (integro-)differential equations and of problems with Markov chains.
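To make these facts concrete, the following numpy sketch (the helper `toeplitz_from_coeffs` is ours, introduced for illustration) builds T_n(f) for the symbol f(θ) = 2 − 2cos(θ), which is the one relevant to this paper, and checks the stated spectral properties: the spectrum lies inside (0, max f) = (0, 4), and the minimal eigenvalue decays like n^{−α} with α = 2, the order of the zero of f at θ = 0.

```python
import numpy as np

def toeplitz_from_coeffs(n, coeffs):
    """Build T_n(f) from a dict {k: a_k(f)} of Fourier coefficients."""
    T = np.zeros((n, n))
    for k, a in coeffs.items():
        if abs(k) < n:
            T += a * np.eye(n, k=-k)   # entry (s, t) receives a_{s-t}(f)
    return T

# f(theta) = 2 - 2*cos(theta): a_0 = 2, a_1 = a_{-1} = -1; f is real, even,
# nonnegative, with a unique zero of order alpha = 2 at theta = 0
n = 100
T = toeplitz_from_coeffs(n, {0: 2.0, 1: -1.0, -1: -1.0})
eigs = np.linalg.eigvalsh(T)

assert eigs.min() > 0 and eigs.max() < 4      # spectrum inside (0, max f) = (0, 4)
expected = 2 - 2*np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
assert np.allclose(np.sort(eigs), expected)   # known closed form for this T_n(f)
assert eigs.min() < 2 * (np.pi / (n + 1))**2  # minimal eigenvalue = O(n^{-2})
```

The closed-form eigenvalues 2 − 2cos(jπ/(n+1)) are specific to this tridiagonal symbol; for general f only the localization and distribution results quoted above are available.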

Matrix formulation and notation in 1D
The linear system to solve is (5), and we can decompose the matrix A_h ∈ R^{(n+1)×(n+1)} as in (8), where T_{n+1}(f) is the Toeplitz matrix generated by f according to (7), with f(θ) = 2 − 2cos(θ), so that, in the matrix in (6), we have ω = 1, a_0 = 2, a_1 = a_{−1} = −1. Furthermore, we have defined S_{n+1} in (9) as S_{n+1} = h^{−2} T_{n+1}(f). For this matrix everything is known: in fact S_{n+1} = Q D Q with Q real symmetric and orthogonal, Q_{j,k} = (2/(n+2))^{1/2} sin(jkπ/(n+2)), and D = h^{−2} diag(2 − 2cos(jπ/(n+2)), j = 1, ..., n+1).
Hence its conditioning κ_2(·) in spectral norm (the norm induced by the Euclidean vector norm) is exactly known and equals

κ_2(S_{n+1}) = (2 − 2cos((n+1)π/(n+2))) / (2 − 2cos(π/(n+2))) ∼ 4(n+2)²/π²,

and, in our setting, an even more precise relation can be derived. Since everything is known regarding the term S_{n+1}, our idea is to reduce the analysis as much as possible to information concerning the matrix S_{n+1} and its inverse; to this end, the application of the Sherman-Morrison-Woodbury formula is appropriate.
The Sherman-Morrison-Woodbury formula states that, for an invertible square matrix A and column vectors u and v with 1 + v^T A^{−1} u ≠ 0,

(A + uv^T)^{−1} = A^{−1} − (A^{−1} u v^T A^{−1}) / (1 + v^T A^{−1} u), (10)

and thus we can obtain A_h^{−1} in our setting with A = S_{n+1} and u = e_1. Our goal is to estimate quite accurately ‖A_h^{−1}‖_p with p ∈ [1, ∞], ‖·‖_p being the matrix norm induced by the vector ℓ^p norm. We concentrate our efforts on the cases p = 1, 2, ∞, since the other estimates can be obtained via classical interpolation techniques.
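The rank-one version of the formula is easy to verify numerically. In the following sketch the matrix A and the vectors u, v are arbitrary illustrative choices (with v = u and A positive definite, so that the denominator 1 + v^T A^{−1} u cannot vanish):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = np.diag(np.linspace(1.0, 2.0, n))   # invertible (here SPD for simplicity)
u = rng.standard_normal((n, 1))
v = u.copy()                            # v = u keeps 1 + v^T A^{-1} u > 1

Ainv = np.linalg.inv(A)
# (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
smw = Ainv - (Ainv @ u) @ (v.T @ Ainv) / (1.0 + float(v.T @ Ainv @ u))
assert np.allclose(smw, np.linalg.inv(A + u @ v.T))
```

The practical value of the identity here is that it expresses A_h^{−1} through S_{n+1}^{−1}, whose entries are known in closed form.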
We start by estimating ‖S_{n+1}^{−1}‖_1 and ‖S_{n+1}^{−1}‖_∞. The latter are used for giving quite precise bounds on ‖S_{n+1}^{−1}‖_2, which can also be obtained by a direct check. The components of the inverse T_{n+1}^{−1} admit an explicit expression: for a fixed column c,

(T_{n+1}^{−1})_{r,c} = r(n + 2 − c)/(n + 2), r ≤ c,

and symmetrically for a fixed row r. All entries of S_{n+1}^{−1} (and of T_{n+1}^{−1}) are positive and real, and the matrices are symmetric. Hence, by using the explicit expressions of the considered norms, we can compute the row sums directly. Numerically it is obvious that the highest row sum for the matrices T_{n+1}^{−1} with n + 1 even is attained for row index r = (n+1)/2 (or r = (n+1)/2 + 1, the two being equal). For n + 1 odd, the highest row sum is attained for row index r = (n+2)/2.
For a generic row r, the row sum of T_{n+1}^{−1} equals r(n + 2 − r)/2. Thus, for n + 1 even the maximal row sum is (n+1)(n+3)/8, while for n + 1 odd it is (n+2)²/8. Consequently, in both cases ‖T_{n+1}^{−1}‖_1 = ‖T_{n+1}^{−1}‖_∞ ∼ (n+2)²/8 and, using the symmetry and (17) and (18), we obtain the corresponding estimates for S_{n+1}^{−1} and their limit as the matrix size tends to infinity (see (19) and (20)). Moreover, we have from (13) that the components of t^{(1)} are as in (21), and from (8) we obtain the corresponding relation for S_{n+1}.
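The entries of the inverse of the tridiagonal Toeplitz matrix T_m(2 − 2cos(θ)) have the standard closed form (T_m^{−1})_{i,j} = min(i,j)(m + 1 − max(i,j))/(m + 1), and the row sum of row r is r(m + 1 − r)/2. The following numpy sketch (with an illustrative even order m) verifies both facts, together with the location of the maximal row sum at the middle row(s):

```python
import numpy as np

m = 40  # matrix order (n + 1 in the paper's notation), chosen even here
T = 2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)   # T_m(2 - 2*cos(theta))
Tinv = np.linalg.inv(T)

# closed form for the entries of the inverse (1-based indices)
i = np.arange(1, m + 1)[:, None]
j = np.arange(1, m + 1)[None, :]
closed = np.minimum(i, j) * (m + 1 - np.maximum(i, j)) / (m + 1)
assert np.allclose(Tinv, closed)

# row sum of row r is r*(m + 1 - r)/2, maximal at the middle row(s)
row_sums = Tinv.sum(axis=1)
assert np.allclose(row_sums, i.ravel() * (m + 1 - i.ravel()) / 2)
r_star = int(np.argmax(row_sums)) + 1
assert r_star in (m // 2, m // 2 + 1)
# hence the infinity norm of the inverse grows like m^2 / 8
assert np.isclose(row_sums.max(), (m // 2) * (m + 1 - m // 2) / 2)
```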
From these we obtain the components of the row vector v_h^T S_{n+1}^{−1} and the components of the matrix R_{n+1}, where t_r^{(1)} is defined in (21) and the remaining quantities are defined in (14) and (15), distinguishing the cases c = 1 and c > 1. We can then write the components (R_{n+1})_{r,c} explicitly, since we have (22), (26), and (27): the case c = 1 gives (28) and the case c > 1 gives (29). Numerically it is obvious that ‖R_{n+1}‖_1 and ‖R_{n+1}‖_∞ are always attained at the first column and at the first row, respectively.
We now compute ‖R_{n+1}‖_∞, by taking into account that all coefficients are positive except the first one in the first column; the result follows from (28) and (29).

Estimating ‖A_h^{−1}‖_1

Since the entries of S_{n+1}^{−1} and of R_{n+1} are positive and negative, respectively, we can compute the norm directly for A_h^{−1}. The sum of the positive elements of the first column of T_{n+1}^{−1} is equal to (n+1)/2, and thus the corresponding sum for the first column of S_{n+1}^{−1} is h²(n+1)/2, while the sum of the components −(R_{n+1})_{r,1} is given in (30). From (15) and by using (31) we obtain (33), and by (32), since the quantity in (35) is always negative, we obtain (36). As a conclusion, we deduce the estimate for ‖A_h^{−1}‖_1 from (33) and (36). In order to make a comparison, we recall that we know the exact asymptotic behaviour of ‖S_{n+1}^{−1}‖_2, with S_{n+1} being the pure Toeplitz counterpart of A_h, as reported below.

Spectral results: comments
Here we give a short discussion on a few items that, in some respects, will be considered in more detail in Section 3 and, in other respects, will be listed as open problems in the concluding Section 4.
• The estimates for ‖A_h^{−1}‖_p are tight and the growth is like n^{1/p}; however, the numerical growth of the error seems to be bounded by a constant independently of p. The reason lies in the vectors for which the norm is attained: such vectors would have to be concentrated on the first component, which is quite unphysical and is not observed in practice.
• Even if A_h and its inverse are not symmetric, we can prove that the spectrum of the related matrix-sequence is clustered along a real positive interval, using the results of the GLT technology reported in Subsection 3.2 (see also [1,11]): we refer to Subsection 3.4, where the analysis is performed both in 1D and in 2D.
• Regarding the estimates of ‖A_h^{−1}‖_p, the 2D case (and generally the dD case) is more difficult, but we can take advantage of the one-dimensional case and of a clever tensor structure of the problem when the domain is rectangular (hyper-rectangular in the dD case).
• When the domain is generic a possibility is given by embedding techniques already exploited in the distributional setting via the GLT approach (see [16,17]).

Numerical tests in 1D
We consider the 1D problem (1) with a = 0 and b = π. We choose f = −sin(x), g_a = 0, and g_b = 0, so that u(x) = −sin(x) is the exact solution, sampled at the grid points x_i.
We perform several tests varying the value of ϑ ∈ [0, 1], in order to establish whether the convergence of the method depends on the choice of ϑ. In practice, we choose ϑ and n and we compute h and x_0 accordingly. The numerical error e_h = u − u_h satisfies the equation A_h e_h = τ_h, where τ_h is the consistency error. Considering the ℓ^p norm, Fig. 2 shows that both ‖e_h‖_p and ‖τ_h‖_p decay as O(h²), confirming that the method is second-order consistent and accurate.
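The experiment can be reproduced with a short script. The following is an illustrative reimplementation, not the authors' code: the function name `solve_1d` and the assembly details reflect our reading of the scheme (ghost point, interior central differences, eliminated condition at b, linear extrapolation at a).

```python
import numpy as np

def solve_1d(n, theta, a=0.0, b=np.pi):
    """Illustrative sketch of the 1D scheme: ghost point x_0, interior points
    x_1..x_n, eliminated Dirichlet condition at x_{n+1} = b, and linear
    extrapolation (s = 2) for the condition at x = a."""
    h = (b - a) / (n + theta)               # so that x_1 = a + theta*h, x_{n+1} = b
    x = a + (theta - 1.0) * h + h * np.arange(n + 2)
    f = lambda t: -np.sin(t)                # -u'' = f with exact solution u = -sin
    g_a, g_b = 0.0, 0.0

    A = np.zeros((n + 1, n + 1))            # unknowns u_0 (ghost), u_1, ..., u_n
    rhs = np.zeros(n + 1)
    A[0, 0], A[0, 1] = theta, 1.0 - theta   # ghost row: theta*u_0 + (1-theta)*u_1 = g_a
    rhs[0] = g_a
    for i in range(1, n + 1):               # central differences at x_1, ..., x_n
        A[i, i - 1], A[i, i] = -1.0, 2.0
        if i < n:
            A[i, i + 1] = -1.0
        rhs[i] = h * h * f(x[i])
    rhs[n] += g_b                           # eliminated condition u_{n+1} = g_b
    u = np.linalg.solve(A, rhs)
    return np.max(np.abs(u[1:] - (-np.sin(x[1:n + 1]))))

# halving h should divide the max error by about 4 (second order), for any theta
for theta in (0.0, 0.5, 1.0):
    e1, e2 = solve_1d(100, theta), solve_1d(200, theta)
    assert 3.0 < e1 / e2 < 5.0
```

The error-ratio check mirrors what Fig. 2 displays graphically: second-order decay, essentially independent of ϑ.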
We complete the analysis by showing the behaviour of the spectral radius of the matrix A_h^{−1}. Fig. 3 shows how the smallest eigenvalue (in absolute value) of the matrix A_h changes in relation to n (left panel) and in relation to ϑ (right panel): it essentially does not depend on the value of ϑ and approaches a constant value as n goes to infinity.

Hence the error norms remain bounded and do not inherit the n^{1/p} growth of ‖A_h^{−1}‖_p, as predicted in the first item of Subsection 2.4.

Problem formulation in 2D and related analysis
The section is organized into four parts: first we introduce the d-level notation and the d-level Toeplitz matrices in Subsection 3.1; secondly we define the notion of spectral and singular value distribution and the *-algebra of Generalized Locally Toeplitz matrix-sequences in Subsection 3.2; then we describe the matrices arising in the approximation of a Dirichlet problem by the Coco-Russo method in Subsection 3.3; finally, we give a spectral analysis of the resulting matrix-sequences in Subsection 3.4.

Multilevel notation: the case of multilevel Toeplitz and diagonal sampling matrices
We start by introducing the multi-index notation, which is useful in our context. A multi-index i ∈ Z^d, also called a d-index, is simply a (row) vector in Z^d; its components are denoted by i_1, ..., i_d.
• 0, 1, 2, ... are the vectors of all zeros, all ones, all twos, ... (their size will be clear from the context).
• For any d-index m, we set N (m) = d j=1 m j and we write m → ∞ to indicate that min(m) → ∞.
• If h, k are d-indices, h ≤ k means that h r ≤ k r for all r = 1, . . ., d.
• The standard lexicographic ordering is assumed uniformly. For instance, in the case d = 2 the ordering is the following:

(1,1), (1,2), ..., (1,n_2), (2,1), (2,2), ..., (2,n_2), ..., (n_1,1), (n_1,2), ..., (n_1,n_2). (41)

We now briefly summarize the definition and a few relevant properties of multilevel Toeplitz matrices, which we will employ in the analysis of the 2D setting. Given n ∈ N^d, a matrix of the form [a_{i−j}]_{i,j=e}^{n}, with e the vector of all ones and entries a_k ∈ C, k = −(n − e), ..., n − e, is called a multilevel Toeplitz matrix or, more precisely, a d-level Toeplitz matrix. Let φ : [−π, π]^d → C^{r×r} be a matrix-valued function in which each entry belongs to L¹([−π, π]^d). We denote the Fourier coefficients of the generating function φ by

a_k(φ) = (1/(2π)^d) ∫_{[−π,π]^d} φ(θ) e^{−i(k,θ)} dθ, k ∈ Z^d,

where the integrals are computed component-wise and (k, θ) = k_1θ_1 + ... + k_dθ_d. For every n ∈ N^d, the n-th Toeplitz matrix associated with φ is defined as T_n(φ) = [a_{i−j}(φ)]_{i,j=e}^{n} or, equivalently, as

T_n(φ) = Σ_{|l_1|<n_1} ... Σ_{|l_d|<n_d} J_{n_1}^{(l_1)} ⊗ ... ⊗ J_{n_d}^{(l_d)} ⊗ a_l(φ),

where ⊗ denotes the (Kronecker) tensor product of matrices, while J_m^{(l)} is the matrix of order m whose (i, j) entry equals 1 if i − j = l and zero otherwise. We call {T_n(φ)}_{n∈N^d} the family of (multilevel block) Toeplitz matrices associated with φ, which, in turn, is called the generating function of {T_n(φ)}_{n∈N^d}.
Multilevel Diagonal Sampling Matrices. For n ∈ N and a : [0, 1] → C, we define the diagonal sampling matrix D_n(a) as the diagonal matrix diag(a(i/n), i = 1, ..., n). For n ∈ N^d and a : [0, 1]^d → C, we define the multilevel diagonal sampling matrix D_n(a) as the diagonal matrix diag(a(i_1/n_1, ..., i_d/n_d)), with the lexicographic ordering (41) as discussed at the beginning of the subsection.
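For the scalar symbol f(θ_1, θ_2) = 4 − 2cos(θ_1) − 2cos(θ_2), which appears later in the 2D analysis, the 2-level Toeplitz matrix reduces under the lexicographic ordering to a Kronecker sum of 1-level matrices. The following numpy sketch checks this structure and the resulting eigenvalues (the helper name `lap1` is ours):

```python
import numpy as np

def lap1(m):
    """1-level Toeplitz matrix T_m(2 - 2*cos(theta))."""
    return 2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

# 2-level Toeplitz matrix generated by f(t1, t2) = 4 - 2cos(t1) - 2cos(t2):
# under the lexicographic ordering it is kron(T1, I) + kron(I, T1)
n1, n2 = 7, 5
T2 = np.kron(lap1(n1), np.eye(n2)) + np.kron(np.eye(n1), lap1(n2))
assert T2.shape == (n1 * n2, n1 * n2)

# its eigenvalues are all sums lambda_j + mu_k of the 1-level eigenvalues,
# i.e. samples of f on a tensor grid
lam = 2 - 2*np.cos(np.arange(1, n1 + 1) * np.pi / (n1 + 1))
mu = 2 - 2*np.cos(np.arange(1, n2 + 1) * np.pi / (n2 + 1))
expected = np.sort(np.add.outer(lam, mu).ravel())
assert np.allclose(np.sort(np.linalg.eigvalsh(T2)), expected)
```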

GLT matrix-sequences: operative features
We start with the definition of distribution in the sense of the eigenvalues (spectral distribution) and in the sense of the singular values (singular value distribution) for a given matrix-sequence.
Then we give the operative features of the *-algebra of matrix-sequences.
Definition 1.Let {A n } n be a sequence of matrices, with A n of size d n , and let f : D ⊂ R t → C be a measurable function defined on a set D with 0 < µ t (D) < ∞.
• We say that {A_n}_n has an (asymptotic) singular value distribution described by f, and we write {A_n}_n ∼_σ f, if

lim_n (1/d_n) Σ_{j=1}^{d_n} F(σ_j(A_n)) = (1/µ_t(D)) ∫_D F(|f(x)|) dx, for every F ∈ C_c(R). (43)

• We say that {A_n}_n has an (asymptotic) spectral (or eigenvalue) distribution described by f, and we write {A_n}_n ∼_λ f, if

lim_n (1/d_n) Σ_{j=1}^{d_n} F(λ_j(A_n)) = (1/µ_t(D)) ∫_D F(f(x)) dx, for every F ∈ C_c(C). (44)

If {A_n}_n has both a singular value and an eigenvalue distribution described by f, then we write {A_n}_n ∼_{σ,λ} f. The symbol f contains spectral/singular value information, briefly and informally described as follows. With reference to (44), assuming that d_n is large enough and f is at least Riemann integrable, except possibly for a small number of outliers, the eigenvalues of A_n are approximately the samples of f over a uniform grid in D, so that the range of f is a (weak) cluster for the eigenvalues of {A_n}_n. It is then clear that the symbol f provides a compact and quite accurate description of the spectrum of the matrices A_n for n large enough. Relation (43) has the same meaning when talking of the singular values of A_n and replacing f with |f|.
A d-level (d ≥ 1 integer) GLT matrix-sequence {A_n}_n is nothing more than a matrix-sequence endowed with a measurable function κ : [0, 1]^d × [−π, π]^d → C, called symbol, characterizing the distributional properties of its singular values and, under certain hypotheses, of its spectrum. For a complete overview of the theory we refer to the books [7,8], while here we recall only the operative features we need in our restricted setting. Since we have already introduced the multilevel Toeplitz and diagonal matrix-sequences, the only other class we need is that of zero-distributed matrix-sequences, whose definition depends on Definition 1.
Definition 2 (Zero-distributed sequence). A matrix-sequence {Z_n}_n such that {Z_n}_n ∼_σ 0 is referred to as a zero-distributed sequence. In other words, {Z_n}_n is zero-distributed if and only if, for every ε > 0, the number of singular values of Z_n exceeding ε is o(d_n) as n → ∞. In a different language, more common in the context of preconditioning and of the convergence analysis of (preconditioned) Krylov methods, a zero-distributed matrix-sequence is a sequence of matrices showing a (weak) clustering at zero in the sense of the singular values (see e.g. [7,19] and references therein).
Indicating by ‖·‖ the spectral norm (i.e. the maximal singular value or, equivalently, the norm induced by the Euclidean vector norm) and by ‖·‖_1 the trace norm (i.e. the sum of all singular values), the following result holds true [7].
Theorem 3. The following properties hold.
GLT 1. {T_n(f)}_n ∼_GLT f for every f ∈ L¹([−π, π]^d) and {D_n(a)}_n ∼_GLT a for every Riemann-integrable a : [0, 1]^d → C; moreover, if {A_n}_n ∼_GLT κ, then {A_n}_n ∼_σ κ and, if in addition every A_n is Hermitian, then {A_n}_n ∼_λ κ.
GLT 2. If {A_n}_n ∼_GLT κ and A_n = X_n + Y_n, where
• every X_n is Hermitian,
• ‖X_n‖ and ‖Y_n‖ are uniformly bounded with respect to n,
• ‖Y_n‖_1 = o(N(n)),
then {A_n}_n ∼_λ κ.
GLT 3. We have {Z_n}_n ∼_GLT 0 if and only if {Z_n}_n ∼_σ 0, i.e. if and only if {Z_n}_n is zero-distributed.
GLT 4. The set of GLT matrix-sequences is a *-algebra: in particular, if {A_n}_n ∼_GLT κ and {B_n}_n ∼_GLT ξ, then {A_n + B_n}_n ∼_GLT κ + ξ and {A_n B_n}_n ∼_GLT κξ.
A more general and more advanced result regarding item GLT 2 can be found in [1,11], even if item GLT 2 as stated above is sufficient in our setting.
The square [0, 1]² is discretized through a uniform Cartesian grid with (n + 2)² grid points (x_i, y_j) = (i h, j h), for i, j = 0, ..., n + 1, where h = 1/(n + 1). As in the 1D case, let 0 < a < h and call ϑ_S = (y_1 − a)/h (see Fig. 4). The subscript S stands for south, since the boundary y = a is the bottom side of the domain. A similar approach can be followed in the other cases.
Figure 4: Discretization of the 2D domain.Full circles are the n 2 inside grid points, while full squares are the n ghost points.Linear extrapolation is used to define the ghost values u i,0 from u i,1 and the boundary values g(x i , 0), for i = 1, . . ., n.
The elliptic equation −∆u = f of problem (45) is discretized by central finite differences at the internal grid points, with eliminated boundary conditions on the boundaries x = 0, x = 1 and y = 1. Then, for 2 ≤ i ≤ n − 1 and 1 ≤ j ≤ n − 1 we have

(4u_{i,j} − u_{i−1,j} − u_{i+1,j} − u_{i,j−1} − u_{i,j+1})/h² = f(x_i, y_j),

while for i = 1 and j = 1, ..., n − 1 we eliminate the boundary condition on x = 0. Similarly, we eliminate the boundary conditions on x = 1 and y = 1. The boundary condition on y = a is discretized by linear interpolation:

ϑ_S u_{i,0} + (1 − ϑ_S) u_{i,1} = g(x_i, a), i = 1, ..., n.

Overall, there are n² inside grid points (x_i, y_j) for i, j = 1, ..., n, and n ghost points (x_i, y_0) for i = 1, ..., n.
Using a total lexicographic order, the matrix of coefficients that we obtain is a 2-level matrix in which the n ghost equations are appended to the n² internal ones, so that A_h ∈ R^{n(n+1)×n(n+1)}.

Spectral analysis in 1D and in 2D
Having in mind the notation of Subsection 3.1, the matrix A_h can be decomposed as

h² A_h = T_n(f) + X_n, n = (n + 1, n), (48)

where T_k(2 − 2cos(θ)) is the Toeplitz matrix already used in the 1D case in Section 2 and X_n collects the correction coming from the ghost equations. Of course, taking into account relation (42) with d = 2 and (47), the function f is bivariate and can be written as f(θ_1, θ_2) = 4 − 2cos(θ_1) − 2cos(θ_2).
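The decomposition can be checked numerically. The following sketch is our illustrative reading of the 2D scheme (the ordering and the scaling of the ghost rows are assumptions made for the illustration): it assembles the matrix and verifies that h²A_h differs from the 2-level Toeplitz matrix generated by 4 − 2cos(θ_1) − 2cos(θ_2) only on the n ghost rows, so that the correction X_n has rank at most n.

```python
import numpy as np

def lap1(m):
    """Tridiagonal Toeplitz matrix T_m(2 - 2*cos(theta))."""
    return 2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

def assemble_2d(n, theta_S, h):
    """Unknowns u_{i,j}, i = 1..n, j = 0..n (j = 0 is the ghost row below
    y = a), ordered lexicographically in (j, i)."""
    N = n * (n + 1)
    A = np.zeros((N, N))
    idx = lambda i, j: j * n + (i - 1)
    for i in range(1, n + 1):   # ghost rows: theta_S*u_{i,0} + (1-theta_S)*u_{i,1}
        A[idx(i, 0), idx(i, 0)] = theta_S
        A[idx(i, 0), idx(i, 1)] = 1.0 - theta_S
    for j in range(1, n + 1):   # interior rows: 5-point stencil, eliminated BCs
        for i in range(1, n + 1):
            r = idx(i, j)
            A[r, r] = 4.0 / h**2
            A[r, idx(i, j - 1)] = -1.0 / h**2
            if j < n:
                A[r, idx(i, j + 1)] = -1.0 / h**2
            if i > 1:
                A[r, idx(i - 1, j)] = -1.0 / h**2
            if i < n:
                A[r, idx(i + 1, j)] = -1.0 / h**2
    return A

n = 8
h = 1.0 / (n + 1)
A = assemble_2d(n, theta_S=0.3, h=h)

# h^2 A_h = T_{(n+1,n)}(4 - 2cos(t1) - 2cos(t2)) + X_n, with the correction
# X_n supported on the n ghost rows, hence rank(X_n) <= n = o(n(n+1))
T = np.kron(lap1(n + 1), np.eye(n)) + np.kron(np.eye(n + 1), lap1(n))
X = h**2 * A - T
assert np.count_nonzero(np.abs(X).sum(axis=1) > 1e-10) == n
assert np.linalg.matrix_rank(X) <= n
```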
Therefore, by using item GLT1 in Theorem 3, we have {T_n(f)}_n ∼_GLT f in the sense of Subsection 3.2, so that {T_n(f)}_n ∼_σ f according to Definition 1. Furthermore, since T_n(f) is Hermitian (in fact real symmetric) for any choice of the partial sizes, thanks to item GLT1 we deduce {T_n(f)}_n ∼_λ f as well. Now, taking into account Definition 2, it is easy to see that {X_n}_n is a zero-distributed matrix-sequence, simply because its rank is bounded by n and hence the number of nonzero singular values is at most n = o(n(n + 1)), with N(n) = n(n + 1) being the size of X_n. Therefore, by item GLT3, {X_n}_n ∼_GLT 0, so that {h²A_h}_n ∼_GLT f by item GLT4, since both {T_n(f)}_n and {X_n}_n are GLT matrix-sequences and h²A_h = T_n(f) + X_n for any choice of the partial sizes. Then, again by item GLT1, we deduce {h²A_h}_n ∼_σ f. However, X_n is non-Hermitian and therefore we cannot apply item GLT1 to conclude {h²A_h}_n ∼_λ f. Nevertheless, this can be done by using item GLT2, as proven in the following lines both in 1D and in 2D.
Proof. In 1D we recall the identity h²A_h = T_{n+1}(2 − 2cos(θ)) + e_1 v_h^T. Since e_1 v_h^T is a rank-one matrix, it has a unique nonzero singular value, so that its trace norm is o(n + 1). Therefore, by item GLT2, we infer that the GLT matrix-sequences {h²A_h} and {T_n(2 − 2cos(θ))} share the same eigenvalue distribution function 2 − 2cos(θ), which is the GLT symbol, so that (49) is proven.
Consequently, again by item GLT2, we deduce that the GLT matrix-sequences {h²A_h} and {T_n(4 − 2cos(θ_1) − 2cos(θ_2))} share the same eigenvalue distribution function 4 − 2cos(θ_1) − 2cos(θ_2), which is the GLT symbol, and hence (50) is proven. • The previous result shows that the spectral distributions are described by nonnegative functions both in 1D and in 2D. More precisely, looking at the range of the spectral symbols, we deduce that [0, 4] is a cluster for the eigenvalues of {h²A_h} in 1D, while [0, 8] is a cluster for the eigenvalues of {h²A_h} in 2D. This is nontrivial (and somewhat unexpected), given the fact that the related corrections are non-Hermitian and possess only strictly negative eigenvalues and zero eigenvalues.
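As a cheap numerical sanity check of the rank-one structure and of the clustering, consider the following numpy sketch for the 1D case (grid parameters chosen as in the 1D tests; this checks the singular values, which are easier to bound rigorously than the eigenvalues of the non-Hermitian matrix): since a rank-one perturbation displaces at most one singular value outside the range [0, 4] of |f|, all remaining singular values of h²A_h must lie in (0, 4).

```python
import numpy as np

n, theta = 200, 0.3
h = np.pi / (n + theta)                      # a = 0, b = pi, as in the 1D tests
T = 2*np.eye(n + 1) - np.eye(n + 1, k=1) - np.eye(n + 1, k=-1)
h2A = T.copy()                               # h^2 A_h = T_{n+1}(2 - 2cos) + e_1 v^T
h2A[0, :] = 0.0                              # ghost row replaces the first Toeplitz row
h2A[0, 0], h2A[0, 1] = h**2 * theta, h**2 * (1 - theta)
assert np.linalg.matrix_rank(h2A - T) == 1   # the correction is rank one

# all but (at most) one singular value of h^2 A_h lies in (0, 4),
# consistently with the weak clustering at [0, 4]
s = np.linalg.svd(h2A, compute_uv=False)
assert np.sum((s > 0) & (s < 4)) >= n
```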

Conclusions
We have provided spectral and norm estimates for matrix sequences arising from the approximation of the Laplacian via the Coco-Russo method, and we have validated them with a few numerical experiments. The analysis has involved several tools from matrix theory and in particular from the setting of Toeplitz operators and Generalized Locally Toeplitz matrix sequences. Open problems remain, involving variable coefficients and non-square domains: both cases can be handled from a spectral viewpoint using the GLT machinery. In particular, when considering variable coefficients, the use of the diagonal sampling matrix-sequences allows one to remain within the GLT *-algebra, while the case of non-square domains can be treated using the reduced GLT theory (see pages 398-399 in [16] and Subsection 3.1.4 in [17]).
More involved is the case of the norm estimates of the inverse even in the case of a square in 2D.Below we present an idea in this direction.
Actually, the decomposition (48) suggests, as in the 1D setting, the use of the Sherman-Morrison-Woodbury formula: we can set A = T_n(f) and X_n = UCV with n = (n + 1, n), so that A_h^{−1} can be expressed through T_n(f)^{−1} and a low-rank correction.

Figure 2: The dots represent the ℓ^p-norm of the numerical error e_h (top) and of the consistency error τ_h (bottom) for different values of n (horizontal axis) and ϑ: ϑ = 0 (left), ϑ = 0.5 (middle), ϑ = 1 (right). The solid line is a reference for second-order decay.

Figure 3: Smallest eigenvalue in absolute value (vertical axis) for different values of n (horizontal axis, left plot) or for different values of ϑ (horizontal axis, right plot).