The Galerkin Finite Element Method for A Multi-term Time-Fractional Diffusion equation

We consider the initial/boundary value problem for a diffusion equation involving multiple time-fractional derivatives on a bounded convex polyhedral domain. We analyze a space semidiscrete scheme based on the standard Galerkin finite element method using continuous piecewise linear functions. Nearly optimal error estimates for both cases of initial data and inhomogeneous term are derived, which cover both smooth and nonsmooth data. Further we develop a fully discrete scheme based on a finite difference discretization of the time-fractional derivatives, and discuss its stability and error estimate. Extensive numerical experiments for one and two-dimension problems confirm the convergence rates of the theoretical results.


Introduction
We consider the following initial/boundary value problem for a multi-term time fractional diffusion equation in u(x, t): (1.1) where Ω denotes a bounded convex polygonal domain in R d (d = 1, 2, 3) with a boundary ∂Ω, f is the source term, and the initial data v is a given function on Ω and T > 0 is a fixed value. Here the differential operator P (∂ t ) is defined by where 0 < α m ≤ ... ≤ α 1 < α < 1 are the orders of the fractional derivatives, b i > 0, i = 1, 2, ..., m, with the left-sided Caputo fractional derivative ∂ β t u, 0 < β < 1, being defined by (cf. [17, p.   (t − s) −β u (s)ds, (1.2) where Γ (·) denotes the Gamma function.
In the case of m = 0, the model (1.1) reduces to its single-term counterpart ∂ α t u − u = f in Ω × (0, T ]. (1.3) This model has been studied extensively from different aspects due to its extraordinary capability of modeling anomalous diffusion phenomena in highly heterogeneous aquifers and complex viscoelastic materials [1,31]. It is the fractional analogue of the classical diffusion equation: with α = 1, it recovers the latter, and thus inherits some of its analytical properties.
However, it differs considerably from the latter in the sense that, due to the presence of the nonlocal fractional derivative term, it has limited smoothing property in space and slow asymptotic decay in time [32], which in turn also impacts related numerical analysis [12] and inverse problems [14,32]. The model (1.1) was developed to improve the modeling accuracy of the single-term model (1.3) for describing anomalous diffusion. For example, in [33], a two-term fractional-order diffusion model was proposed for the total concentration in solute transport, in order to distinguish explicitly the mobile and immobile status of the solute using fractional dynamics. The kinetic equation with two fractional derivatives of different orders appears also quite naturally when describing subdiffusive motion in velocity fields [28]; see also [16] for discussions on the model for wave-type phenomena.
There are very few mathematical studies on the model (1.1). Luchko [25] established a maximum principle for problem (1.1), and constructed a generalized solution for the case f ≡ 0 using the multinomial Mittag-Leffler function. Jiang et al. [9] derived formal analytical solutions for the diffusion equation with fractional derivatives in both time and space. Li and Yamamoto [22] established the existence, uniqueness, and the Hölder regularity of the solution using a fixed point argument for problem (1.1) with variable coefficients {b i }. Very recently, Li et al. [21] showed the uniqueness and continuous dependence of the solution on the initial value v and the source term f , by exploiting refined properties of the multinomial Mittag-Leffler function.
The potential applications of the model (1.1) motivate the design and analysis of numerical schemes that have optimal (with respect to data regularity) convergence rates. Such schemes are especially valuable for problems where the solution has low regularity. The case m = 0, i.e., the single-term model (1.3), has been extensively studied, and stability and error estimates were provided; see [23,34,37] for the finite difference method, [19,20,36] for the spectral method, [10][11][12]18,27,29,30] for the finite element method, and [4,7] for meshfree methods based on radial basis functions, to name a few. In particular, in [10][11][12], the authors established almost optimal error estimates with respect to the regularity of the initial data v and the right hand side f for a semidiscrete Galerkin scheme. These studies include the interesting case of very weak data, i.e., v ∈Ḣ q (Ω) and f ∈ L ∞ (0, T ; Ḣ q (Ω)) for −1 < q ≤ 0.
Numerical methods for the multi-term ordinary differential equation were considered in [6,15]. In [38], a scheme based on the finite element method in space and a specialized finite difference method in time was proposed for (1.1), and error estimates were derived. We also refer to [24] for a numerical scheme based on a fractional predictor-corrector method for the multi-term time fractional wave-diffusion equation. The error analysis in these works is done under the assumption that the solution is sufficiently smooth and therefore it excludes the case of low regularity solutions. This is the main goal of the present study. However, the derivation of optimal with respect to the regularity error estimates requires additional analysis of the properties of problem (1.1), e.g., stability, asymptotic behavior for t → 0 + . Relevant results of this type have recently been obtained in [21], which, however, are not enough for the analysis of the semidiscrete Galerkin scheme, and hence in Section 2, we make the necessary extensions. Now we describe the semidiscrete Galerkin scheme. Let {T h } 0<h<1 be a family of shape regular and quasi-uniform partitions of the domain Ω into d-simplexes, called finite elements, with a maximum diameter h. The approximate solution u h is sought in the finite element space X h of continuous piecewise linear functions over the triangulation T h where a(u, w) = (∇u, ∇ w) for u, w ∈ H 1 0 (Ω), and v h ∈ X h is an approximation of the initial data v. The choice of v h will depend on the smoothness of the initial data v. We shall study the convergence of the semidiscrete scheme (1.4) for the case of initial data v ∈Ḣ q (Ω), −1 < q ≤ 2, and right hand side f ∈ L ∞ (0, T ; Ḣ q (Ω)), −1 < q < 1. The case of nonsmooth data, i.e., −1 < q ≤ 0, is very common in inverse problems and optimal control [14,32]; see also [5,13] for the parabolic counterpart.
The goal of this work is to develop a numerical scheme based on the finite element approximation for the model (1.1), and provide a complete error analysis. We derive error estimates optimal with respect to the data regularity for the semidiscrete scheme, and a convergence rate O (h 2 + τ 2−α ) for the fully discrete scheme in case of a smooth solution.
Specifically, our essential contributions are as follows. First, we obtain an improved regularity result for the inhomogeneous problem, by allowing less regular source term, cf. Theorem 2.3. This is achieved by exploiting the complete monotonicity of the multinomial Mittag-Leffler function, cf. Lemma 2.4. Second, we derive nearly optimal error estimates for a semidiscrete Galerkin scheme for both homogeneous and inhomogeneous problems, cf. Theorems 3.1-3.4, which cover both smooth and nonsmooth data. Third, we develop a fully discrete scheme based on a finite difference method in time, and establish its stability and error estimates, cf. Theorem 4.1. We note that the derived error estimate for the fully discrete scheme holds only for smooth solutions.
The rest of the paper is organized as follows. In Section 2, we recall the solution theory for the model (1.1) for both homogeneous and inhomogeneous problems, using properties of the multinomial Mittag-Leffler function. The readers not interested in the analysis may proceed directly to Section 3. Almost optimal error estimates for their semidiscrete Galerkin finite element approximations are given in Section 3. Then a fully discrete scheme based on a finite difference approximation of the Caputo fractional derivatives is given in Section 4, and an error analysis is also provided. Finally, extensive numerical experiments are presented to illustrate the accuracy and efficiency of the Galerkin scheme, and to verify the convergence theory. Throughout, we denote by C a generic constant, which may differ at different occurrences, but always independent of the mesh size h and time step size τ .

Solution theory
In this part, we recall the solution theory for problem (1.1). We shall describe the solution representation using the multinomial Mittag-Leffler function, and derive optimal solution regularity for the homogeneous and inhomogeneous problems.

Solution representation
For s ≥ −1, we denote by Ḣ s (Ω) ⊂ H −1 (Ω) the Hilbert space induced by the norm:  [35,Lemma 3.1]. Note that Ḣ s (Ω), s ≥ −1 form a Hilbert scale of interpolation spaces. Hence, we denote · H s (Ω) to be the norm on the interpolation scale between H 1 0 (Ω) and L 2 (Ω) for s ∈ [0, 1] and · H s (Ω) to be the norm on the interpolation scale between L 2 (Ω) and H −1 (Ω) for s ∈ [−1, 0]. Then, · H s (Ω) and · Ḣs (Ω) are equivalent for s ∈ [−1, 1]. Further, for a Banach space B and any r ≥ 1, we define the space L r (0, T ; B) = u(t) ∈ B for a.e. t ∈ (0, T ) and u L r (0,T ;B) < ∞ , and the norm · L r (0,T ;B) is defined by Upon denoting α = (α, α − α 1 , ..., α − α m ), we introduce the following solution operator This operator is motivated by a separation of variables [25,26]. Then for problem (1.1) with a homogeneous right hand side, i.e., f ≡ 0, we have u(x, t) = E(t)v. However, the representation (2.1) is not always very convenient for analyzing its smoothing property. We derive an alternative representation of the solution operator E using Lemma 2.2: Besides, we define the following operator Ē for χ ∈ L 2 (Ω) bȳ The operators E(t) and Ē (t) can be used to represent the solution u of (1.1) as: The operator Ē has the following smoothing property.
Proof. The definition of the operator Ē in (2.3) and Lemma 2.1 yield where the last line follows by the inequality sup j∈N

Solution regularity
First we recall known regularity results. In [22], Li and Yamamoto showed that in the case of variable coefficients 1). These results were recently refined in [21] for the case of constant coefficients, i.e., problem (1.1). In particular, it was shown that for v ∈Ḣ q (Ω), 0 ≤ q ≤ 1, and for some γ ∈ (0, 1]. Here we follow the approach in [21], and extend these results to a slightly more general setting of The nonsmooth case, i.e., −1 < q ≤ 0, arises commonly in related inverse problems and optimal control problems. We shall derive the solution regularity to the homogeneous problem, i.e., f ≡ 0, and the inhomogeneous problem, i.e., v ≡ 0, separately. These results will be essential for the error analysis of the space semidiscrete Galerkin scheme (1.4) in Section 3. First we consider the homogeneous problem with initial data v ∈Ḣ q (Ω), −1 < q ≤ 2.
Proof. We show that (2.2) represents indeed the weak solution to problem (1.1) with f ≡ 0 and further it satisfies the desired estimate. We first discuss the case = 0. By Lemma 2.1 and (2.
where the last line follows from the inequality sup j∈N The estimate for the case = 1 follows from the identity (Ω) . It remains to show that (2.2) satisfies also the initial condition in the sense that lim t→0 + E(t)v − v Ḣq (Ω) = 0. By identity (2.1) and Lemma 2.1, we deduce Upon noting the identity lim t→0 Hence, the desired assertion follows by Lebesgue's dominated convergence theorem. 2 Now we turn to the inhomogeneous problem with a nonsmooth right hand side, i.e., f ∈ L ∞ (0, T ; Ḣ q (Ω)), −1 < q ≤ 1, and a zero initial data v ≡ 0.
Proof. By construction, it satisfies the governing equation. By Lemma 2.3, we have which shows the desired estimate. Further, it satisfies the initial condition u(0) = 0, i.e., for any > 0, lim t→0 + u(·, t) Ḣq+2− (Ω) = 0, and thus it is indeed a solution of (1.1). 2 Next we extend Theorem 2.2 to allow less regular right hand sides f ∈ L 2 (0, T ; Ḣ q (Ω)), −1 < q ≤ 1. Then the function u(x, t) satisfies also the differential equation as an element in the space L 2 (0, T ; Ḣ q+2 (Ω)). However, it may not satisfy the homogeneous initial condition u(x, 0) = 0. In Remark 2.1 below, we argue that a weaker class of source term that produces a legitimate weak solution of (1.

Error estimates for semidiscrete Galerkin scheme
Now we derive and analyze the space semidiscrete Galerkin FEM scheme (1.4). First we describe the semidiscrete scheme, and then derive almost optimal error estimates for the homogeneous and inhomogeneous problems separately. In the analysis we essentially use the technique developed in [12] and improved in [10,11].

Semidiscrete scheme
To describe the scheme, we need the L 2 (Ω) projection P h : L 2 (Ω) → X h and Ritz projection R h : The operators R h and P h satisfy the following approximation property [35].

By interpolation, the operator P h is also bounded on
Now we can describe the semidiscrete Galerkin scheme. Upon introducing the discrete Laplacian h : where v h ∈ X h is an approximation to the initial data v. Next we give a solution representation of (3. Then the solution u h of the discrete problem (3.1) can be expressed by: On the finite element space X h , we introduce the discrete norm | · |Ḣ p (Ω) defined by The norm | · |Ḣ p (Ω) is well defined for all real number p. Clearly, |ψ |Ḣ1 (Ω) = ψ Ḣ1 (Ω) and |ψ |Ḣ0 (Ω) = ψ L 2 (Ω) for any ψ ∈ X h . Further, the following inverse inequality holds [12]: if the mesh T h is quasi-uniform, then for any l > s where the last inequality follows from sup 1≤ j≤N The next result is a discrete analogue to Lemma 2.3.

Lemma 3.3. Let Ē h be defined by (3.3) and χ ∈ X h . Then for all t
Proof. The proof for the case 0 ≤ p − q ≤ 2 is similar to Lemma 2.3. The other assertion follows from the fact that {λ h j } N j=1 are bounded from zero independent of h. 2

Error estimates for the homogeneous problem
To derive error estimates, first we consider the case of smooth initial data, i.e., v ∈Ḣ 2 (Ω). To this end, we split the error u h (t) − u(t) into two terms: By Lemma 3.1 and Theorem 2.1, we have for any t > 0 (3.6) So it suffices to get proper estimates for ϑ(t), which is given below.
Proof. Using the identity h R h = P h , we note that ϑ satisfies with ϑ(0) = 0. By the representation (3.4), Then by Lemmas 3.3 and 3.1, and Theorem 2.1 we have for p = 0, 1 which is the desired result. 2 Using (3.6), Lemma 3.4 and the triangle inequality, we arrive at our first estimate, which is formulated in the following theorem: Now we turn to the nonsmooth case, i.e., v ∈Ḣ q (Ω) with −1 < q ≤ 1. Since the Ritz projection R h is not well-defined for nonsmooth data, we use instead the L 2 (Ω)-projection v h = P h v and split the error u h − u into: Thus, we only need to estimate the term ϑ(t), which is stated in the following lemma.

Error estimates for the inhomogeneous problem
Now we derive error estimates for the semidiscrete Galerkin approximation of the inhomogeneous problem with f ∈ L ∞ (0, T ; Ḣ q (Ω)), −1 < q ≤ 0, and v ≡ 0, in both L 2 -and L ∞ -norm in time. The case f ∈ L ∞ (0, T ; Ḣ q (Ω)), 0 < q ≤ 1, is simpler and can be treated analogously. To this end, we appeal again to the splitting (3.7). By Theorem 2.2 and Lemma 3.1, the following estimate holds for : where the last inequality follows from the fact t ≤ T , and t α is bounded. Now the choice h = | ln h|, (3.10) Thus, it suffices to bound the term ϑ ; see the lemma below. Lemma 3.6. Let ϑ(t) be defined by (3.9), and f ∈ L ∞ (0, T ; Ḣ q (Ω)), −1 < q ≤ 0. Then with h = | ln h|, there holds Proof. By (3.4) and Lemma 3.3, we deduce that for p = 0, 1 Further, using (3.5) and Lemma 3.1, we deduce for p = 0, 1 Now by (2.5) and the choice = 1/ h we get for p = 0, 1: where the first inequality follows directly from (2.5) and the second inequality follows from This completes the proof of the lemma. 2 An inspection of the proof of Lemma 3.6 indicates that for 0 < q < 1, one can get rid of one factor h . Now we can state an error estimate in L ∞ -norm in time.
Last, we derive an error estimate in L 2 -norm in time. To this end, we need a discrete analogue of Theorem 2.3, whose proof follows identically and hence is omitted.

Lemma 3.7. Let u h be the solution of (1.4) with v h = 0. Then for arbitrary p
Proof. We use the splitting (3.7). By Theorem 2.3 and Lemma 3.1 By (3.4), (3.8) and Lemmas 3.7 and 3.1, we have for p = 0, 1: Combing the preceding two estimates yields the desired assertion. 2

A fully discrete scheme
Now we describe a fully discrete scheme for problem (1.1) based on the finite difference method introduced in [23]. To discretize the time-fractional derivatives, we divide the interval [0, T ] uniformly with a time step size τ = T /K , K ∈ N. We use the following discretization: 1, 2, ..., n and r n+1 α,τ denotes the local truncation error. Lin and Xu [23, Lemma 3.1] (see also [34,Lemma 4.1]) showed that the truncation error r n+1 α,τ can be bounded by Then the multi-term fractional derivative P (∂ t )u(t) at t = t n+1 in (1.1) can be discretized by where the discrete differential operator P τ (∂ t ) is defined by (4.4) where the coefficients {P j } are defined by Then by (4.2) the local truncation error R n+1 τ of the approximation P τ (∂ t )u(t n+1 ) is bounded by (4.5) By the monotonicity and convergence of {d α, j } [23, Eq. (13)], we know that P 0 > P 1 > ... > 0 and P j → 0 for j → ∞. (4.6) Now we arrive at the following fully discrete scheme: find U n+1 ∈ X h such that where F n+1 = f (x, t n+1 ). Upon setting γ = Γ (2 − α)τ α , the fully discrete scheme (4.7) is equivalent to finding U n+1 ∈ X h such that for all χ ∈ X h The next result gives the stability of the fully discrete scheme.  (4.9) where the constant c depends only on α and T .
Proof. The case n = 1 is trivial. Then the proof proceeds by mathematical induction. By noting the monotone decreasing property of the sequence {P n } from (4.6) and choosing χ = U n+1 in (4.8), we deduce Using the monotonicity of {P n } again gives It suffices to choose a constant c such that c P N − γ > 0. By taking τ = T /N and noting the concavity of the function Then by choosing c = C −1 The desired result follows by dividing both sides by P 0 . 2 Next we state an error estimate for the fully discrete scheme. In order to analyze the temporal discretization error, we assume that the solution is sufficiently smooth.
Then there holds Proof. We split the error e n = u(t n ) − U n into e n = u(t n ) − R h u(t n ) + R h u(t n ) − U n =: n + θ n .
Here n is a special case of (t) : It suffices to bound the term θ n . By comparing (1.1) and (4.7), we have the error equation P τ (∂ t )θ n , χ + ∇θ n , ∇χ = ω n , χ , (4.10) where the right hand side ω n is given by Table 1 Numerical results for the case with a smooth solution at t = 1 with β = 0.2 and α = 0.25, 0.5, 0.95, discretized on a uniform mesh with h = 2 −10 and τ = 0.2 × 2 −k . α τ Meanwhile, the second term ω n 2 can be bounded using (4.5). Then by the stability from Lemma 4.1 for the error equation (4.10), we obtain Remark 4.1. The error estimate in Theorem 4.1 holds only if the solution u is sufficiently smooth. There seems no known error estimate expressed in terms of the initial data (and right hand side) only for fully discrete schemes for nonsmooth initial data even for the single-term time-fractional diffusion equation with a Caputo fractional derivative.

Numerical experiments
In this part we present one-and two-dimensional numerical experiments to verify the error estimates in Sections 3 and 4. We shall discuss the cases of a homogeneous problem and an inhomogeneous problem separately.

The case of a smooth solution
Here we consider the following one-dimensional problem on the unit interval Ω = (0, 1) with 0 < β < α < 1 In order to verify the estimate in Theorem 4.1, we first check the case that the solution u is sufficiently smooth. To this end, . Then the exact solution u is given by u(x, t) = (1 + t 2 )(−x 2 + x), which is very smooth. In our computation, we divide the unit interval Ω into N equally spaced subintervals, with a mesh size h = 1/N. Similarly, we fix the time step size at τ = 1/K . Here we choose N large enough so that the space discretization error is negligible, and the time discretization error dominates. We measure the accuracy of the numerical approximation U n by the normalized L 2 error U n − u(t n ) L 2 (Ω) / v L 2 (Ω) . In Table 1, we show the temporal convergence rates, indicated in the column rate (the number in bracket is the theoretical rate), for three different α values, which fully confirm the theoretical result, cf. also Fig. 1 for the plot of the convergence rates.

Homogeneous problems
In this part we present numerical results to illustrate the spatial convergence rates in Section 3. We performed numerical tests on the following three different initial data:
(2c) Very weak data: v(x) = δ 1/2 (x), a Dirac δ 1/2 (x)-function concentrated at x = 1/2, which belongs to the space Ḣ − (Ω) for any ∈ (1/2, 1]. In order to check the convergence rate of the semidiscrete scheme, we discretize the fractional derivatives with a small time step τ so that the temporal discretization error is negligible. In view of the possibly singular behavior as t → 0, we set the time step τ to τ = t/(5 × 10 4 ), with t being the terminal time. For each example, we measure the error e(t) = u(t) − u h (t) by the normalized errors e(t) L 2 (Ω) / v L 2 (Ω) and ∇e(t) L 2 (Ω) / v L 2 (Ω) . The normalization enables us to observe the behavior of the error with respect to time in case of nonsmooth initial data.

Numerical results for example (2a): smooth initial data
The numerical results show O (h 2 ) and O (h) convergence rates for the L 2 -and H 1 -norms of the error, respectively, for all three different α values, cf. Fig. 2. As the value of α increases from 0.25 to 0.95, the error at t = 1 decreases accordingly, which resembles that for the single-term time-fractional diffusion equation [12].

Numerical results for example (2b): nonsmooth initial data
For nonsmooth initial data, we are particularly interested in errors for t close to zero, and thus we also present the errors at t = 0.01 and t = 0.001; see Table 2. The numerical results fully confirm the theoretically predicted rates for nonsmooth initial data. Further, in Table 3 we show the L 2 -norm of the error for fixed h = 2 −6 and t → 0. We observe that the error deteriorates as t → 0. Upon noting v ∈Ḣ 1/2− (Ω), it follows from Theorem 3.2 that the error grows like O (t −3α/4 ), which agrees well with the results in Table 3.

Numerical results for example (2c): very weak initial data
The numerical results show a superconvergence with a rate of O (h 2 ) in the L 2 -norm and O (h) in the H 1 -norm, cf. Fig. 3(a). This is attributed to the fact that in one-dimension the solution with the Dirac δ-function as the initial data is Table 3 L 2 -error with α = 0.5 and h = 2 −6 for t → 0 for nonsmooth initial data (2b).   Table 4 Numerical results for example (3a) with α = 0.5 and β = 0.2 at t = 1, 0.01, 0.001, discretized on a uniform mesh h = 2 −k and τ = t/(5 × 10 4 ). smooth from both sides of the support point and the finite element spaces X h have good approximation property. When the singularity point x = 1/2 is not aligned with the grid, Fig. 3(b) indicates an O (h 3/2 ) and O (h 1/2 ) convergence rate for the L 2 -and H 1 -norm of the error, respectively, which agrees with our theory.

Numerical results for example (3b)
In Table 5 we show convergence rates at three different times, i.e., t = 0.1, 0.01, and 0.001. Here the mesh size h is chosen to be h = 1/(2 k + 1), and thus the support of the Dirac δ-function does not align with the grid. The results indicate an O (h 1/2 ) and O (h 3/2 ) convergence rate for the H 1 -and L 2 -norm of the error, respectively, which agrees well with the theoretical prediction. If the Dirac δ-function is supported at a grid point, both L 2 -and H 1 -norms of the error exhibit a superconvergence phenomenon, i.e., orders O (h 2 ) and O (h), respectively, which are one half order higher than theoretical ones, cf. Table 6. This superconvergence phenomenon still awaits theoretical justifications.

Examples in two-dimension
In this part, we present three two-dimensional examples on the unit square Ω = (0, 1) 2 .
To discretize the problem, we divide each direction into N = 2 k equally spaced subintervals, with a mesh size h = 1/N so that the domain [0, 1] 2 is divided into N 2 small squares. We get a symmetric mesh by connecting the diagonal of each small square.
The numerical results for example (4a) are shown in Table 7, which agree well with Theorem 3.2, with a rate O (h 2 ) and O (h), respectively, for the L 2 -and H 1 -norm of the error. Interestingly, for example (4b), both the L 2 -norm and H 1 -norm of the error exhibit superconvergence, cf. Table 8. The numerical results for example (4c) confirm the theoretical results; see Table 9. The solution profiles for examples (4b) and (4c) at t = 0.1 are shown in Fig. 4, from which the nonsmooth region of the solution can be clearly observed.

Concluding remarks
In this work, we have developed a simple numerical scheme based on the Galerkin finite element method for a multiterm time fractional diffusion equation which involves multiple Caputo fractional derivatives in time. A complete error analysis of the space semidiscrete Galerkin scheme is provided. The theory covers the practically very important case of nonsmooth initial data and right hand side. The analysis relies essentially on some new regularity results of the multiterm time fractional diffusion equation. Further, we have developed a fully discrete scheme based on a finite difference discretization of the Caputo fractional derivatives. The stability and error estimate of the fully discrete scheme were established, provided that the solution is smooth. The extensive numerical experiments in one-and two-dimension fully confirmed our convergence analysis: the empirical convergence rates agree well with the theoretical predictions for both smooth and nonsmooth data.