Computers and Mathematics with Applications

A convergence result is stated for the numerical solution of space-fractional diffusion problems. For the spatial discretization, an arbitrary family of finite elements can be used, combined with the matrix transformation technique. The analysis covers the application of the implicit Euler method for time integration, which ensures unconditional stability. The spatial convergence rate does not depend on the fractional power of the Laplacian operator. An efficient numerical implementation is developed that avoids the direct computation of matrix powers.


Introduction
Many observations have confirmed the presence of fractional diffusion in the natural sciences. In early studies tracking the motion of food-seeking animals, the presence of ''anomalous'' diffusion was reported [1,2]. In concrete terms, assuming normal diffusion, the flight x(t) of the individuals during time t obeys a normal distribution. Consequently, the flight-length mean ⟨|x(t)|⟩ of the individuals should be proportional to √t. Due to accurate measurements in the last decade, the dynamics of individual molecules could also be observed: in many cases, it exhibits a subdiffusive behavior such that instead of the above proportionality, the relation ⟨|x(t)|⟩ ∼ √(t^α) with α < 1 was detected. See a detailed overview of these measurements in [3]. A possible explanation of this dynamics can be found in [4].
A number of different mathematical models have been suggested to simulate this dynamics. Among the PDE models, in the case of homogeneous Dirichlet boundary conditions, a possible choice is taking the fractional Laplacian on the entire space R^d (see [5]) and restricting it to functions which are identically zero outside of the computational domain. The numerical analysis and implementation for the corresponding elliptic problems can be found in [6] and [7], respectively. In the one-dimensional case, an interesting, physically motivated approach is analyzed in [8].
Another conventional choice, which is used in the present work, is the spatial differential operator −(−∆ D ) α , where ∆ D denotes the Dirichlet Laplacian.
For a systematic comparison of the different approaches, we refer to [9,10] and [11].
For the numerical solution of problems with −(−∆_D)^α, the fractional power can also be applied at the discrete level, approximating (−∆_D)^α with the power of the discretization of −∆_D. This was first observed in [12] and analyzed in [13] for the corresponding elliptic problems. A similar analysis, based on a general integral form of operator powers, was performed in [14] using an efficient numerical integration technique.
Regarding the full discretization with implicit time stepping and efficient implementation issues, three main approaches were proposed in the literature. In [15], the authors propose a direct approximation of the solution operator (which is a matrix-power exponential) based on the elliptic theory in [14]. The corresponding contour integral is recast as a real improper integral and approximated using an exponentially convergent quadrature.
In the works [16] and [17], explicit and implicit time discretizations are used. In [17], again using an integral representation, an efficient simple quadrature is proposed to approximate time steps in an implicit method, which is shown to be unconditionally stable.
A related approach was recently developed in [18], where the best uniform rational approximation (BURA) of the implicit time-step operators (including matrix powers) is constructed.
Dealing with the full matrices arising from the discretization of non-local operators is a challenging topic. One can also exploit their structure, which makes it possible to develop an efficient solution of the corresponding linear problems, see, e.g., [19] and [20].
The aim of the present contribution is to propose an alternative approach, which has the following advantages.
• The analysis for the full discretization error is simple.
• The spatial convergence order is independent of the power α and can ensure a higher-order accuracy.
• The numerical method is simple and applicable without any extra computations for all exponents α.
To compare with the earlier achievements, note that in the literature, a spatial error of order h^α appears in the error estimates for any initial conditions, see, e.g., Theorem 3.1 in [15]. We point out that, assuming a smooth initial condition, this can be improved to h^k, where k is the polynomial order in the finite element approximation. The smoothness condition will be discussed after Theorem 2. Also, in the computationally most efficient BURA approach (see [18]), the determination of the coefficients for an arbitrary rational function r_α requires extra effort.
The main ideas of our work are the following:
• For the analysis of the spatial discretization error, we use a weak form, which is sufficient for the FE discretization.
• The computational algorithm is based on the matrix power-vector product in [21] combined with a conjugate gradient method such that the entire algorithm is composed of sparse matrix-vector products.

Mathematical preliminaries
We investigate the numerical solution of space-fractional diffusion problems. Recall that the (negative) Dirichlet Laplacian operator −∆_D : L^2(Ω) → L^2(Ω) is positive and has a compact inverse. The complete orthonormal system of its eigenfunctions and the corresponding eigenvalues will be denoted by {φ_j}_{j∈N} and {λ_j}_{j∈N}, respectively. Then the fractional Dirichlet Laplacian, which is investigated here, is defined with

(−∆_D)^α u = Σ_{j=1}^∞ λ_j^α (u, φ_j) φ_j.

For more details, we refer to [22]. With this, the space-fractional diffusion equation reads as

∂_t u(t, x) = −(−∆_D)^α u(t, x), u(0, x) = u_0(x), (1)

where Ω ⊂ R^d (d = 1, 2, 3) is a Lipschitz domain. Note that the definition of the Dirichlet Laplacian involves the homogeneous Dirichlet boundary condition. We use the notion of Sobolev spaces H^k(Ω) and H_0^k(Ω) with a non-negative index k; the corresponding inner product is denoted with (·, ·)_k and the notation ∥·∥_k will be used for the corresponding Sobolev norm. In the case of k = 0, we usually omit the subscript. To depict clearly the matrix-vector operations in the practical computations, the Euclidean scalar product in R^N will be denoted with (·, ·).
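To illustrate the spectral definition above, consider a minimal numerical sketch (our own, not part of the original formulation): on Ω = (0, 1) the eigenpairs of −∆_D are known in closed form, φ_j(x) = √2 sin(jπx) and λ_j = (jπ)², so (−∆_D)^α acts diagonally on the eigen-coefficients. The function name below is hypothetical.

```python
import numpy as np

def frac_laplacian_1d(coeffs, alpha):
    """Apply (-Delta_D)^alpha spectrally on Omega = (0, 1):
    the j-th eigen-coefficient is multiplied by lambda_j^alpha = (j*pi)^(2*alpha)."""
    j = np.arange(1, len(coeffs) + 1)
    return ((j * np.pi) ** (2 * alpha)) * coeffs

# u = phi_1, a single eigenmode: (-Delta_D)^(1/2) u = pi * phi_1
c = np.zeros(4)
c[0] = 1.0
out = frac_laplacian_1d(c, 0.5)
print(out[0])  # -> pi, i.e. lambda_1^(1/2)
```

For α = 1 this reduces to the usual spectral representation of −∆_D itself.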
In the estimates, the relation r 1 ≲ r 2 means that r 1 ≤ cr 2 is valid with a positive mesh-independent constant c.
The spatial discretization is performed using a generic finite element space V_h^k ⊂ H_0^k(Ω) and the corresponding elliptic projection P_h, defined by

(∇P_h u, ∇v_h) = (∇u, ∇v_h) for all v_h ∈ V_h^k. (2)

We assume that V_h^k is chosen so that

∥u − P_h u∥ ≲ h^k ∥u∥_k. (3)

For the corresponding requirements, we refer to [23]. In the discrete setting, we use the following expansion of u_h ∈ V_h:

u_h = Σ_{j=1}^N u_j ψ_j,

where {ψ_j} denotes the basis of V_h^k, which defines a natural linear bijection π_h between R^N and V_h^k. For the approximation of the differential operator, we apply the so-called matrix transformation technique. According to this approach, the operator (−∆_D)^α is approximated with D_h^α, the α-th power of the matrix D_h discretizing −∆_D. In the case of a finite difference discretization, this is straightforward [24], but for finite element methods combined with implicit time stepping the definition of D_h needs special care.
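The matrix transformation technique can be sketched as follows for 1-D piecewise-linear elements on a uniform grid (an illustrative construction of ours; the paper's setting is more general): assemble the mass and stiffness matrices M_h and S_h, set D_h = M_h^{-1} S_h, and take its fractional power through the generalized eigenproblem S_h x = λ M_h x, which is affordable only for small N.

```python
import numpy as np
from scipy.linalg import eigh

N, h = 9, 0.1                # interior nodes of a uniform grid on (0, 1)
# P1 mass and stiffness matrices with homogeneous Dirichlet BC (tridiagonal)
M = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
S = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
D = np.linalg.solve(M, S)    # D_h = M_h^{-1} S_h

def frac_power(M, S, alpha):
    """D_h^alpha via the generalized eigenproblem S x = lambda M x;
    D_h is similar to a symmetric matrix, so its real powers are well defined."""
    lam, X = eigh(S, M)      # X^T M X = I, hence X^{-1} = X^T M
    return X @ np.diag(lam ** alpha) @ X.T @ M

D_half = frac_power(M, S, 0.5)
# sanity check: (D_h^{1/2})^2 reproduces D_h
print(np.allclose(D_half @ D_half, D))  # -> True
```

The dense eigendecomposition here is only for illustration; the algorithm proposed later avoids computing D_h^α altogether.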

Results
Using the variational principle together with the backward Euler time stepping for α = 1, the full discretization of (1) can be given as

((u_h^{n+1} − u_h^n)/δ, v_h) + (∇u_h^{n+1}, ∇v_h) = 0 for all v_h ∈ V_h^k,

where δ > 0 is the time step and u_h^n ∈ V_h^k is the approximation of u(nδ, ·) : Ω → R. According to (4), this can be recast into the matrix-vector form

M_h u^{n+1} + δ S_h u^{n+1} = M_h u^n, i.e. (I + δ D_h) u^{n+1} = u^n with D_h = M_h^{−1} S_h,

where M_h and S_h denote the mass and stiffness matrices, respectively. An important observation here is that, for using the matrix transformation method, we need to take the power of D_h instead of the stiffness matrix S_h. With this, the matrix transformation method for a general α ∈ R^+, combined with the time discretization above, can be given as

(I + δ D_h^α) u^{n+1} = u^n. (7)

We perform an error analysis for this approach.
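A compact sketch of one run of this scheme (our own toy setup reusing the 1-D P1 matrices; the values of δ and α are arbitrary choices) illustrates that the implicit step damps the solution for any time step, in line with the unconditional stability claimed above:

```python
import numpy as np
from scipy.linalg import eigh

N, h = 9, 0.1
delta, alpha = 0.5, 0.5      # deliberately large time step
M = h / 6 * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
S = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
lam, X = eigh(S, M)                       # generalized eigenpairs of (S, M)
D_alpha = X @ np.diag(lam ** alpha) @ X.T @ M   # D_h^alpha (dense, small N only)

u0 = np.random.default_rng(0).standard_normal(N)
u = u0.copy()
for _ in range(10):          # backward Euler: (I + delta * D_h^alpha) u^{n+1} = u^n
    u = np.linalg.solve(np.eye(N) + delta * D_alpha, u)
print(np.linalg.norm(u) < np.linalg.norm(u0))  # -> True: decay despite large delta
```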

Spatial discretization
For the error analysis of the spatial discretization, we need the following identities.

Proof. Using the definition of π_h and the expansion in (5), we have the first equality as stated. The derivation of the second equality can be performed in a similar way. □

For brevity, we also use the notation A_h^α, which gives an approximation of (−∆_D)^α on V_h^k. To obtain a sharp estimate of this term, we use a weak formulation of Balakrishnan's representation [25] in the case of Hilbert space operators. Henceforth, in the article, we assume that α ∈ (0, 1).

Theorem 1. Let A denote a positive self-adjoint operator on a Hilbert space H and α ∈ (0, 1) an arbitrary exponent. Then for all u ∈ Dom A and v ∈ H we have the following equality:

(A^α u, v) = (sin(απ)/π) ∫_0^∞ s^{α−1} (A(sI + A)^{−1} u, v) ds. (11)

Henceforth, we analyze the case A = −∆_D and A_h = −∆_{D,h}. The basis of the spatial error estimation is the following statement.

Proposition 1.
Using the above notations, we have the following inequality for each eigenfunction φ_j of −∆_D:

Proof. Using Balakrishnan's representation in (11), we obtain an integral formula for the left-hand side. Using the definition of P_h in (2), then (9), (8) and (10), we obtain that for an arbitrary w ∈ H_0^k(Ω) a corresponding identity is valid. Using this with w = (sI + A)^{−1} φ_j, we can rewrite the scalar product on the right-hand side of (12). Inserting the identity (13), using the self-adjointness of (sI + A_h)^{−1} and (12) again, we obtain an intermediate identity. Applying the approximation property in (3) with (13) and (14), together with the Cauchy-Schwarz inequality, we arrive at an estimate. Since A_h is a positive operator on V_h, we also have a uniform bound for the discrete resolvent. Inserting this estimate with (15) into the right-hand side of (12), we get the inequality stated in the proposition. □

Using the orthonormal system {φ_j}_{j∈N}, we have the corresponding expansions in (1); in the second case, the t-dependence of the coefficients u_j is not displayed.
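Balakrishnan's representation used in the proof above can be sanity-checked on a single eigenvalue, where it reduces to the scalar identity λ^α = (sin(απ)/π) ∫_0^∞ s^{α−1} λ/(s + λ) ds. The following quadrature sketch is our own illustration:

```python
import numpy as np
from scipy.integrate import quad

def balakrishnan_scalar(lam, alpha):
    """lam**alpha via Balakrishnan's formula applied to one eigenvalue:
    sin(alpha*pi)/pi * int_0^inf s^(alpha-1) * lam/(s + lam) ds."""
    val, _ = quad(lambda s: s ** (alpha - 1) * lam / (s + lam), 0, np.inf)
    return np.sin(alpha * np.pi) / np.pi * val

print(balakrishnan_scalar(2.0, 0.5), np.sqrt(2.0))  # both approx. 1.41421
```

The integrand has an integrable singularity at s = 0 for α ∈ (0, 1), which the adaptive quadrature handles.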

Theorem 2. Assume that
Then the following estimate holds for a general u = u(t, ·) in (1):

Proof.
The solution of (1) can be given as

u(t, ·) = Σ_{j=1}^∞ e^{−λ_j^α t} u_{0j} φ_j,

see [26], such that |u_j| < |u_{0j}|, and the assumption implies that the corresponding bound holds uniformly for all t ∈ R^+.
We also recall a growth condition for the H^k(Ω)-norm of the Dirichlet Laplacian eigenfunctions. Note also that, by the assumption, u_j λ_j → 0. Hence, there is an index j_0 such that for all j ≥ j_0, the corresponding estimate is valid. In other words, (19) means for α = 1 that the limit is also valid in the H^1(Ω)-norm, so that using the H^1(Ω)-orthogonality of the projection P_h and (16), we finally obtain the desired identity. Applying the continuous linear operator A^α to (20), using (19), the estimate in Proposition 1, the estimate in (18), and finally the assumption in (17), we obtain the stated bound, which proves the statement. □

Time discretization
Now we can prove the convergence of the full discretization. Since for the solution u of the problem in (1) we have u(t, ·) ∈ C^∞(Ω) for every t ∈ (0, T] (see Proposition 1 in [9]), we only need the assumption in Theorem 2 for the initial data. Note that, regarding stability, a related result was established in [16], including also second-order time discretizations with a possible source term. Here, we also give the convergence rate explicitly.
Theorem 3. Using the assumption in Theorem 2 for u(0, ·) in (1), the full discretization in (7) is convergent of order O(δ + h^k), where k is the approximation order of the finite element discretization.
Proof. Rewriting (7) in V_h^k gives the scheme in which u_h^j is the numerical solution at t = jδ. To analyze this scheme, we use the notation u^j = u(jδ, ·), with u the analytic solution of (1), and the elliptic projection P_h introduced in (2) to obtain an equality in which z^j can be recognized as a consistency error, which we estimate termwise. Obviously, using the general mean value theorem and the smoothness of the analytic solution, we obtain a first-order bound in δ. Similarly, the general mean value theorem and the approximation property in (3) imply a bound of order h^k. Combining these estimates with Theorem 2, we get the consistency estimate. Using the notation y_h^j = P_h u^j − u_h^j with (21) and (22), we obtain an error recursion. Taking here v_h = y_h^{j+1} and rearranging the equality, then using the positivity of A_h^α and (23), we get an estimate whose consecutive application gives a bound, which, together with (3), can be used to get the final error estimate as stated in the theorem. □

Numerical method
For the numerical solution of problem (1), we need an efficient method avoiding the direct computation of the α-th power of the matrices. Such an approach was first proposed in [21], which can be applied immediately only for explicit time stepping. Here we propose an algorithm to approximate the implicit time stepping u^{j+1} = (I + δA_h^α)^{−1} u^j, i.e. u^n = (I + δA_h^α)^{−n} u^0, without computing matrix powers or solving linear systems with a dense matrix. We also assume here that A_h is symmetric, which will be satisfied for the finite element space in the numerical experiments. In a general situation, it is also satisfied if the finite element basis functions are translations of each other. In practice, we can use this if the domain is approximated using a square grid.
In a general situation, for arbitrary finite elements, an L^2-orthonormal basis implies M_h = I such that A_h = M_h^{−1} S_h = S_h becomes symmetric. This, however, leads to a full matrix A_h, which would slow down our algorithm.
For this, we will combine the algorithm in [21] with a conjugate gradient method.

The algorithm
The proposed method consists of the following steps.
(i) Compute the k̃ smallest eigenvalues λ_1, . . . , λ_k̃ of A_h and the corresponding eigenvectors x_1, . . . , x_k̃.
(ii) Let X = [x_1, x_2, . . . , x_k̃] ∈ R^{N×k̃} denote the matrix composed of these eigenvectors, and Q_k̃ = XX^T the orthogonal projection matrix to the subspace span{x_1, x_2, . . . , x_k̃}. In this case, I − Q_k̃ is the orthogonal projection matrix to the complementary subspace.
(iii) The problem is divided then into two parts:
(a) We can directly compute the first part, since we already know the corresponding eigenvalues.
(b) On the complementary subspace, the implicit time step is approximated with the conjugate gradient method.
In the steps of the conjugate gradient algorithm, we use an approximation of the matrix-vector products (I + δA_h^α)w without computing A_h^α. This is performed using the Taylor series approach

A_h^α w ≈ σ(A_h)^α Σ_{m=0}^{K} (α over m) (−1)^m (I − A_h/σ(A_h))^m w, (25)

where σ(A) denotes the spectral radius of A.
Remark. The conjugate gradient algorithm is suitable here since I + δA_h^α is a symmetric positive definite matrix. The stopping criterion (or tolerance) for this procedure is specified when discussing the numerical experiments.
An important parameter in the approximation (25) is the truncation index K. An estimate for its choice is given in Section 2.3 in [21], which motivated our choice in Section 4.3.
The operations of the conjugate gradient method leave the subspace ran(I − Q_k̃) invariant; thus the Taylor-series method converges quickly in every time step. Also, in the Taylor series approach, we use only sparse matrix-vector products, so that beyond the eigenvalue approximation, the entire algorithm involves only these inexpensive operations.
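Steps (i)-(iii) can be sketched as follows (our own minimal implementation, using a 1-D finite difference Laplacian as A_h for brevity; the values of k̃, K and the tolerances are illustrative, and the SciPy routines are stand-ins for the solvers used in practice):

```python
import numpy as np
from scipy.special import binom
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, eigsh, cg

alpha, delta, K, k_defl = 0.5, 0.01, 80, 4
N = 50
h = 1.0 / (N + 1)
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N), format="csr") / h**2

# steps (i)-(ii): deflate the smallest eigenpairs, where the series converges slowly
lam, X = eigsh(A, k=k_defl, which="SM")
sigma = eigsh(A, k=1, which="LM", return_eigenvectors=False)[0]  # spectral radius

def A_alpha(w):
    """A^alpha w: exact on the deflated subspace; truncated binomial series
    A^alpha = sigma^alpha (I - B)^alpha with B = I - A/sigma on the complement."""
    low = X @ (lam ** alpha * (X.T @ w))   # step (iii)(a): known eigenvalues
    r = w - X @ (X.T @ w)                  # complement component
    out, term = np.zeros_like(r), r.copy()
    for m in range(K + 1):                 # sum_m binom(alpha, m) (-1)^m B^m r
        out += binom(alpha, m) * (-1) ** m * term
        term = term - A @ term / sigma     # term <- B term (sparse matvec only)
    return low + sigma ** alpha * out

# step (iii)(b): one implicit Euler step solved by CG, only sparse matvecs inside
op = LinearOperator((N, N), matvec=lambda w: w + delta * A_alpha(w))
u0 = np.sin(np.pi * h * np.arange(1, N + 1))   # first eigenmode as initial datum
u1, info = cg(op, u0)                          # info == 0 on convergence
```

Deflation matters because the series in (25) converges slowest on the eigenmodes with λ/σ close to zero; removing them first keeps K moderate.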

Error analysis of the algorithm
Recall that in Theorem 3, we estimated the difference between the analytic solution and the numerical solution based on the implicit Euler time stepping. In practice, however, according to (iii)(b) in the above algorithm, we do not apply the implicit time steps directly. In concrete terms, we compute CG^n w^0 instead of (I + δA_h^α)^{−n} w^0. To estimate the extra error term arising from this approximation, we also use the notation T(A_h, α) for the Taylor series approximation of A_h^α. We assume that T(A_h, α) also gives a positive definite matrix with minimal eigenvalue λ_{1,T}, which is satisfied for any reasonable approximation. Since A_h^α is also positive definite, for all exponents n ∈ N, we have the corresponding uniform bounds. With these, a triangle inequality gives a two-term estimate, where the first term, using (26), can be estimated further. Note that a computable upper bound for the Taylor remainder term ∥A_h^α − T(A_h, α)∥ was developed in [21].
To estimate the second term in (27), we recall that a time step in (iii)(b) has the form of a CG approximation, so that we define the computational error e^{j+1} of the CG method accordingly. Note that this error term, using again (26), can be controlled easily, where the right-hand side of the corresponding estimate can be computed using a sparse matrix-vector product. Using (29) in the second term of (27), and then using (26), we obtain a further estimate. In summary, inserting (28) and (30) into (27) gives an upper bound for the computational error in which, as mentioned, all of the terms can be controlled; moreover, the estimate is independent of the number of time steps.

Numerical experiments
We investigate the model problem with a known analytic solution. The computational domain was tessellated uniformly, and Q1 elements were used to constitute the finite element space. Since they contain all piecewise linear functions, by means of Corollary 1.109 in [23], the spatial approximation order is k = 2 in (3). Therefore, according to Theorem 3, we expect second-order convergence in space. Also, this choice ensures that A_h is symmetric.
In the conjugate gradient algorithm, we have used the tolerance max_{j=1,2,...,N} ∥e^j∥ = 10^{−6} and, for a fine spatial grid, K = 100, 200 and 300 terms in the Taylor expansion. To accelerate its convergence, we have computed 10 eigenvalues in step (i) of our algorithm. Corresponding to (31), these are the main parameters in our estimation.
In each case, the computational error was computed in a discrete L^2 norm, and we have displayed the computing time in seconds. After the consecutive refinements, we have estimated the convergence rate, which is called simply rate in the tables.
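The rate reported in the tables is presumably obtained from consecutive error ratios under mesh halving; a small helper of ours with hypothetical second-order error values:

```python
import numpy as np

def observed_rates(errors):
    """Empirical convergence rate between consecutive uniform refinements
    (mesh or time-step halving): rate_i = log2(e_i / e_{i+1})."""
    e = np.asarray(errors, dtype=float)
    return np.log2(e[:-1] / e[1:])

# hypothetical errors behaving like C * h^2 with h halved at each refinement
print(observed_rates([4.0e-2, 1.0e-2, 2.5e-3]))  # -> [2. 2.]
```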
In the first series of experiments, we investigated the temporal convergence rate. In a second series of experiments, we investigated the spatial convergence rate for several spatial discretization parameters h by fixing δ = 0.01, so that, practically, the accuracy depends mainly on the spatial discretization parameter h. To depict the efficiency of the algorithm, we have also displayed the number of unknowns (DOF) in the corresponding linear system and the effect of choosing a different number K of terms in the Taylor expansion. An extra step in our algorithm compared to other approaches is the computation of some eigenvalues of the stiffness matrix; therefore, the corresponding computing time is also shown in the tables. One can observe that in this case, an optimal choice of K is a few hundred, which already ensures the desired convergence rate. It is also clear that we really need ''long'' Taylor approximations; an error analysis for this can be found in [21]. It is also clear that the computation of the eigenvalues requires minimal extra effort.
Also, the convergence results in the tables are in good accordance with our theoretical results: the spatial convergence order is above two and the temporal order is about one. Finally, the computational time remains proportional to the number of unknowns. This indicates that only operations with sparse matrices were used in the algorithm.