Fast permutation preconditioning for fractional diffusion equations

In this paper, an implicit finite difference scheme with the shifted Grünwald formula, which is unconditionally stable, is used to discretize the fractional diffusion equations with constant diffusion coefficients. The coefficient matrix possesses the Toeplitz structure and the fast Toeplitz matrix-vector product can be utilized to reduce the computational complexity from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {O}}{(N^{3})}$$\end{document}O(N3) to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {O}}{(N \log N)}$$\end{document}O(NlogN), where N is the number of grid points. Two preconditioned iterative methods, named bi-conjugate gradient method for Toeplitz matrix and bi-conjugate residual method for Toeplitz matrix, are proposed to solve the relevant discretized systems. Finally, numerical experiments are reported to show the effectiveness of our preconditioners.

fractional derivative and the FDEs with two-sided fractional derivatives, respectively, and their method has been proved to be unconditionally stable. However, it is worth noting that most of the numerical methods for FDEs tend to generate full coefficient matrices, which require computational cost of O(N 3 ) and storage of O(N 2 ), there is no doubt that it will certainly increase the computational work; see, e.g., Wang et al. (2010) for a discussion of these issues.
Recently, there is some progress on fast numerical solutions of FDEs. Wang et al. (2010) discovered that the full coefficient matrix generated by Meerschaert and Tadjeran (2006) method has a good feature, i.e., it can be written as a sum of diagonal-multiply-Toeplitz matrices. Thus the storage requirement is reduced from O(N 2 ) to O(N ). As we know, the Toeplitz matrix-vector product (MVP) can be computed in O(N log N ) operations by the fast Fourier transforms (FFTs) (Chan and Ng 1996). Fast methods Lei and Sun (2013), Popolizio (2015),  have been developed to solve FDEs with the shifted Grünwald formula. Wang and Wang (2011) proposed a conjugate gradient normal residual (CGNR) to solve the discretized system by Meerschact and Tadjeran's method with the computational cost of O(N log N ). The preconditioned CGNR with a circulant preconditioner is proposed by Lei and Sun (2013), to solve FDEs by Meerschact and Tadjeran's method with constant diffusion coefficients.
In this paper, we employ the implicit finite difference method to discretize the FDEs and the problem is transformed to solve a linear nonsymmetric Toeplitz system in each time step. Since, the Bi-Conjugate Gradient (BiCG) (Saad 2003, pp. 234-236) and Bi-Conjugate Residual (BiCR) (Sogabe et al. 2009) can be regarded as two effective methods for solving the nonsymmetric system. There is no doubt that the two methods with Toeplitz fast MVP can be used to solve such discretized Toeplitz linear systems. However, from Sogabe et al. (2005), Pestana and Wathen (2015), if the two iterative methods are employed directly, then we indeed fail to make full use of Toeplitz structure of the discretized system, it also means that their computational cost fail to attain optimality. Hence, it is still worth finding more effective methods to reduce the computational complexity. Recently, in Sogabe et al. (2005), Pestana and Wathen (2015), a permutation matrix P was introduced to transform the nonsymmetric matrix into a symmetric one so as to improve the performance of iterative methods. In view of this point, we re-explain the ideas in Sogabe et al. (2005), Pestana and Wathen (2015) as a kind of preconditioning techniques for solving the discretized system of the FDEs by the method of Meerschaet and Tadjeran. More precisely, we do equivalent transformation for the original discretized system, left multiplying by a permutation matrix (Pestana and Wathen 2015) at the same time, then we obtain a new symmetric linear system with the coefficient matrix being a Hankle matrix, which has the same solution with the original discretized system. As we know, the symmetric linear systems are usually simpler to be solved than the nonsymmetric cases. Conjugate Gradient (CG) and Conjugate Residual (CR) are two effective methods for solving symmetric linear system. In this paper, we extend CG and CR to BiCGT and BiCRT, respectively, which are proposed to solve the equivalent equation. The numerical results show that both BiCGT and BiCRT are more competitive than CGNR.
The paper is organized as follows. in Sect. 2, we briefly introduce the discretization of FDEs by finite difference method. In Sect. 3, we construct the permutation preconditioner and propose BiCGT and BiCRT to solve the equivalent system of linear equations. In Sect. 4, numerical results are reported to illustrate the efficiency of the proposed methods. Concluding remarks are given in Sect. 5.

Discretization of FDEs by finite difference method
In this section, we are interested in solving an initial-boundary value problem of the following FDEs, where α ∈ (1, 2) is the order of the space fractional derivative, f(x, t) is the source term, and the diffusion coefficients satisfying d 1 ≥ 0, d 2 ≥ 0, and d 1 + d 2 � = 0. In this paper, we use the Grünwald-Letnikov form Podlubny (1999) to define the left-sided and the right-sided fractional derivatives ∂ α u(x,t) where ⌊·⌋ denotes the floor function, and the Grünwald coefficients g (α) k are defined as follows which can be evaluated by the recurrence relation In order to derive the proposed scheme, let h = x R −x L N +1 and ∆t = T /M be the sizes of spatial grid and time step, respectively (N, M are positive integers). Define x i = x L + ih (i = 0, 1, . . . , N + 1) and the temporal partition t m = m∆t (m = 0, 1, . . . , M). Let . We employ the shifted Grünwald approximation Tadjeran 2004, 2006): k is defined in Eq.
(2). Then the corresponding finite difference scheme is unconditionally stable, see Tadjeran (2004, 2006) for details. (1) We can note that G α is a nonsymmetric Toeplitz matrix, thus it can be stored with N + 1 entries (Wang et al. 2010). The Toeplitz matrix-vector product (MVP) can be computed in O(N log N ) complexities with the aid of FFTs (Pang and Sun 2012). Define which is related to the number of time steps and grid points. The above linear system (4) can be rewritten in the following matrix form where and the right hand vector In order to illustrate the convergence and stability of the implicit difference scheme (3), we note that g (α) k satisfy the following proposition.

The BiCGT method and the BiCRT method
We will show how to construct the permutation preconditioners for accelerating the iterative solver and describe the derivation process of BiCGT and BiCRT. Furthermore, an analysis of computational cost for each iteration step is also proposed. In the linear system (5), M (m) is a nonsymmetric Toeplitz matrix. As previously mentioned, we cannot exploit BiCG and BiCR for resulting linear systems (5) directly, otherwise it is not possible to take advantage of the Toeplitz structure of coefficient matrix. So we need to modify and improve the BiCG and BiCR methods particularly for solving (5). Recently, Sogabe et al. (2005), proposed a preconditioner of permutation matrix for improving the performance of the Krylov subspace method, which is used to solve a nonsymmetric Toeplitz linear system. Later, Pestana and Wathen rigorously establish a circulant preconditioned MINRES method (Paige and Saunders 1975) for nonsymmetric Toeplitz systems. Inspired by their pioneer work, we construct a preconditioner, which is a permutation matrix P (Sogabe et al. 2005) with the form of We would like to solve the Toeplitz system (5) by the CG-like method. This goal can be achieved with little additional computing cost, since we can get the equivalent system: which can be regarded as a left preconditioning technique (Saad 2003) and also has the same solution with (5). Define M (m) = PM (m) , b(m−1) = Pb (m−1) , then (6) can be rewritten into an equivalent statement is that M (m) is self-adjoint with respect to the bilinear form defined by P (Paige and Saunders 1975). P is symmetric positive definite, a nonsymmetric Toeplitz matrix is exactly changed into a symmetric matrix M (m) , so that (7) can be solved by the modified BiCG, where the additional operations for dual systems have been eliminated.
Firstly, we consider CG and BiCG (Saad 2003). Then we followed the philosophy behind the derivation of iterative method in Sogabe et al. (2005 Algorithm1), i.e., in the CG Algorithm, we replace A and b with Ã = PM and f = Pf , respectively. Then we get the following new algorithm: In BiCGT, we only need one MVP, i.e. M (m) p n , and two inner products, three vector additions/subtractions per iteration. The rewritten algorithm is more effective than CGNR, because P multiply an arbitrary vector is to reorder the vector in its reversed order (Sogabe et al. 2005). Therefore, it can greatly reduce the required number of MVPs.
In a similar way, we can get the algorithm of BiCRT for symmetric linear system (7), BiCRT reduces to CR if we get rid of the permutation preconditioner P, and the algorithm of BiCRT is presented as blew.
If we employ BiCG or BiCR directly, it is impossible to minimize the residual vector in some special conditions. However, BiCRT could realize this goal to some extent. The approximation u n+1 is generated from u n by moving from u n in a certain direction p n to a minimum point of the residual function E(u) = �M (m) u −b (m−1) � 2 , u ∈ R N . In other words, for u n+1 = u n + α n p n , α n is chosen to minimize E(u).
It is useful to consider the computational cost. We give a table to illustrate the computational cost of BiCG, BiCR, BiCGT, BiCRT and CGNR. "AXPY" denotes addition of scaled vectors, and "1 + 1" denotes 1 product with the matrix and 1 with its transpose. From Table 1, it is remarkable that BiCG and BiCR, BiCGT and BiCRT require almost the same memory and computational cost in each iteration step. More precisely, BiCGT is the best method to solve the above system in terms of computational cost (i.e, AXPYs and MVPs). For BiCRT, the number of AXPYs is one more than BiCGT, and for CGNR, the number of MVPs is one more than BiCGT. As we know, the computational complexity of one Toeplitz MVP is O(N log N ) by FFTs, but one AXPY can be computed in O(N ) complexity. So BiCRT is more efficient than CGNR from this perspective.

Numerical results
We solve the FDEs (1) numerically by the implicit finite difference method given in Sect. 2. After the finite difference discretization and the equivalent transformation, the symmetric linear system (7) is solved by BiCGT (Algorithm 1), BiCRT (Algorithm 2), and CGNR (Wang and Wang 2011), respectively. All the MVPs M (m) u (m) are done by FFTs in O(N log N ) operations (Lei and Sun 2013) and the initial guess is chosen to be zero vector at each time step. The stopping criterion of BiCGT, BiCRT and CGNR is set to be where r (k) is the residual vector of the linear system after k iterations and r (0) is the initial residual vector.
In the following tables, "N" denotes the number of spatial grid points, "M" denotes the number of time steps, CPU(s) denotes the total CPU time (in seconds) for solving the whole discretized system. "Error" denotes the infinity norm of the difference between the true solution and the approximation at the last time step. "Iter" denotes the average number of iterations over all time discretized level for solving the FDEs, i.e., Iter = 1 M M m=1 Iter (m), where Iter(m) is the number of iterations required for solving the linear system (7) in the mth time discretized level. All experiments are run in MATLAB R2010a on a PC with the following configuration: Windows 7 (32 bit), Iter(R) Core(TM) i3-2130 CPU 3.40 GHz and 4 GB RAM. The exact solution is u(x, t) = sin(t + 1)x 3 (1 − x) 3 . For the finite difference discretization, the space step and time step are taken to be h = 1/(N + 1), ∆t = 2h, i.e., N + 1 = 2M.

Example 1 We consider FDEs (1) on space interval
The numerical results are listed in Table 2, as for comparisons, we also carry out CGNR without preconditioner. From Table 2, it is remarkable that the error is improved for CGNR, BiCGT and BiCRT as the increasing of α. However, BiCGT and BiCRT are more effective than CGNR in terms of CPU time. More precisely, the performance of BiCGT is the best in terms of the CPU time except the cases of α = 1.4, N = 255, α = 1.4, N = 511, α = 1.4, N = 1023, and α = 1.8, N = 1023. In addition, the average number of iterations of BiCGT is less than that by CGNR and BiCRT sometimes. For instance, look at these cases in the numerical results at discretized size N = 127, M = 64 , N = 255, M = 128, and N = 511, M = 256 for α = 1.8. BiCGT and BiCRT have faster convergence speed, less computational time expenditure than CGNR. Meanwhile, we can see that the CPU time increases as the order α of the time derivative increases. To explain this phenomenon, we list the spectra of the original matrix M (m) with different α in Fig. 1. As we can see from the figure, most of eigenvalues is close to zero with the increasing of the valve of α, it means that the coefficient matrix become increasingly ill-conditioned, it also implies that the linear systems will be difficult to solve. The exact solution of this example is u(x, t) = e −t x 2 (1 − x) 2 . For the finite difference discretization, the space step and time step are taken to be h = 1/(N + 1), ∆t = 2h and N + 1 = 2M, respectively. Table 3 shows the numerical results for solving Example 2 by different methods. The error is decreased for those methods as the increasing of α, and the accuracy is almost the same as Example 1. Similar to Example 1, BiCGT and BiCRT are more effective than CGNR in terms of CPU time elapsed. Besides, the CPU time increases as the order α of the time derivative increases is similar to Example 1, and the reason is the same as Example 1. We also list the spectra of the matrix M (m) with different α in Fig. 2.

Concluding remarks
Two new iterative methods, named BiCGT and BiCRT, are presented to solve the resulting linear system of the FDEs (1), which are discretized by the implicit finite difference method. Namely, with the help of the permutation matrix P, we transform the difficult nonsymmetric linear systems into the symmetric cases, which are often simpler to be  In our future work, we will apply BiCGT and BiCRT to solve other (two dimensional) fractional differential equations (Wang and Basu 2012), such as fractional advection-diffusion equations; and we will investigate and develop some suitable preconditioning, see e.g. Lei and Sun (2013),  to further accelerate our proposed methods.