Fast collocation method for a two-dimensional variable-coefficient linear nonlocal diffusion model

In this paper, a fast collocation method is developed for a two-dimensional variable-coefficient linear nonlocal diffusion model. By carefully dealing with the variable coefficient in the integral operator and then analyzing the structure of the coefficient matrix, we can reduce the computational operations in each Krylov subspace iteration from O(N2)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(N^{2})$\end{document} to O(NlogN)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(N\log N)$\end{document} and the memory requirement for the coefficient matrix from O(N2)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(N^{2})$\end{document} to O(N)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$O(N)$\end{document}. Numerical experiments are carried out to show the utility of the fast collocation method.


Introduction
Recently, nonlocal models such as fractional partial differential equations (FPDEs) and nonlocal diffusion and peridynamic models have been applied in many research fields. These models can be regarded as a generalization of classical PDEs [1][2][3][4] and have shown the utility in modeling some challenging phenomena including anomalous diffusion and long-range spatial interactions [5][6][7][8][9][10][11][12][13]. These phenomena cannot be modeled properly by classical integer-order PDEs. Especially speaking, nonlocal diffusion and peridynamic models provide a more natural method to describe physical problems especially near singularities and discontinuities [14]. However, because of the nonlocality, the numerical algorithms usually yield dense or full coefficient matrices, which can cause significantly increased computational complexity and memory storage. For example, widely used direct solvers need O(N 3 ) operations to solve the linear system and O(N 2 ) computer memory space to store the coefficient matrix in which N is the number of unknowns.
To date, there have been many papers aimed to develop fast numerical schemes for nonlocal models to reduce the computational complexity and the memory requirement [13,[15][16][17][18][19][20][21]. Among these papers, in [18], the authors developed a fast collocation method for a two-dimensional linear nonlocal diffusion model. We developed a fast-method in [19] and a preconditioned fast collocation method in [20] for two-dimensional linear bondbased peridynamic model, respectively. All these papers are based on the Toeplitz-like structure of the coefficient matrix. This structure can help us to reduce the computational work from O(N 2 ) to O(N log N) in each Krylov subspace iteration and the memory storage from O(N 2 ) to O (N). Numerical examples in these papers show the utility of the fast methods.
In [22], a variable-coefficient peridynamic model was developed to account for the heterogeneity of the elastic material. However, because the variable coefficient occurs inside the integral operator, the coefficient matrix resulting from the numerical discretization does not maintain the Toeplitz-like structure as in [18,19]. Therefore the fast methods developed before do not apply to the variable-coefficient peridynamic model. To overcome this difficulty, [23] developed a fast collocation method to a one-dimensional variablecoefficient nonlocal diffusion model based on a piecewise-constant approximation to the variable coefficient.
In this paper, a fast collocation method is developed for the two-dimensional variablecoefficient linear nonlocal diffusion model based on Taylor expansion of the variable coefficient. The rest of the paper is organized as follows:. In Sect. 2, we present the bilinear collocation scheme for the two-dimensional variable-coefficient linear nonlocal diffusion model. In Sect. 3, we study the structure of the coefficient matrix and prove that the coefficient matrix can be approximated by a sum of the products of diagonal matrix and Toeplitz matrix. Then fast matrix-vector multiplication and the reduced memory requirement can be proved, which can be used in the Krylov subspace iteration method. In Sect. 4, we do numerical experiments to investigate the computational benefits of the fast collocation method. In Sect. 5, we make the concluding remarks.

The bilinear collocation approximation for variable-coefficient linear nonlocal diffusion model
To begin with, we consider the following two-dimensional variable-coefficient static linear nonlocal diffusion model [22]: Here Ω = (0, x R ) × (0, y R ); δ represents a parameter of the model which determines the range of interactions; Ω c denotes a boundary zone surrounding Ω with width δ; B δ (x, y) is an open disk in which the center is located at (x, y) and the radius is δ. σ (x, y) denotes the integral kernel; f (x, y) and g(x, y) represent the source term and prescribed nonlocal boundary data, respectively; α(x, y) is the elasticity coefficient with lower and upper bounds.
Let N x and N y represent the number of mesh grids in the x and y directions, respectively. Let h x := x R /N x be the mesh size in the x direction. Let h y := y R /N y be the mesh size in the y direction. We define a spatial partition onΩ by x i := ih x for i = 0, 1, . . . , N x and y j := jh y for j = 0, 1, . . . , N y . To account for the nodes on Ω c , we need to extend the partition to (x i , y j ) for i = -K + 1, . . . , -1, 0, 1, . . . , N x , N x + 1, . . . , N x + K -1 and j = -L + 1, . . . , -1, 0, 1, . . . , N y , N y + 1, . . . , N y + L -1. Here, Let ψ(ξ ) be equal to 1 -|ξ | when ξ ∈ [-1, 1] and equal to zero otherwise. Then the pyramid function φ ij (x, y) centered at (x i , y j ) can be expressed as Then the trial function u is Because for the spatial nodes (x i , y j ) ∈ Ω c , u(x i , y j ) = g(x i , y j ) are known, we choose the spatial nodes (x i , y j ) ∈ Ω as the collocation points. Then we obtain a collocation scheme as follows: We substitute (4) into (5) and rewrite (5) as follows: where f i,j = f (x i , y j ) and u ij is the collocation approximation to the true solution u(x i , y j ). Let N := (N x -1) × (N y -1) be the number of collocation points. Let u and f be Ndimensional vectors defined by and respectively. Then the collocation approximation (6) can be expressed in the following matrix form: Here, A 1 ∈ R N×N and A 2 ∈ R N×N are associated with the first term and the second term on the left-hand side of (6), respectively.
For the matrix A 1 , each entry A m,n 1 for m = 1, . . . , N and n = 1, . . . , N is defined as For the matrix A 2 , each entry A m,n 2 for m = 1, . . . , N and n = 1, . . . , N is given by In (10) and (11), δ m,n is equal to 1 for m = n and equal to zero otherwise. The elements {f m , m = 1, . . . , N} of f on the right hand side in (9) are given by In (10), (11), and (12), the global indices (m, n) and the local indices (i, j), (i , j ) have the following relationship:

Structure of the coefficient matrices A 1 and A 2 in (9)
To construct a fast collocation method, we are now in a position to analyse the structure of the coefficient matrices A 1 and A 2 in (9).

Theorem 1 The matrix A 1 can be expressed as
Here D 1 is a diagonal matrix and the entries on the main diagonal are given by In (16), Proof By a straightforward calculation and analysis of A 1 , we can find that The entries in B are given by where the local indices (i, j), (i , j ) and the global indices m, n are related by (13). Following the proof of Theorem 1 in [24], we see that B is a BTTB matrix and can be expressed as B 1 .
Now we consider the structure of matrix A 2 . From (11), we observe that the variable coefficient occurs inside the integral, which brings a global impact to the numerical discretization. If we calculate the matrix A 2 straightforwardly, we find A 2 is also a 2K + 1 block-banded 2L + 1 banded-block matrix but destroys the BTTB structure. Therefore if we use this structure to solve the linear system, the computational complexity and memory requirement will not decrease.
In this paper, we consider the following approximate way. That is, although the original matrix A 2 does not have the BTTB structure, it can be approximated by a sum of products of diagonal matrix and BTTB matrix if the variable coefficient α(x, y) is approximated by Taylor expansion. We denote the sum by A a 2 .

Theorem 2 The matrix A 2 can be approximately decomposed as
Here D 1 and B 1 are given in the Theorem 1. D 2 and D 3 are diagonal matrices where the main diagonal entries are defined as and D m,m respectively. B 2 and B 3 have the same structure as B 1 . The definition of B 2 and B 3 can be easily obtained by replacing the subscript 1 by 2 and 3 in (16) and (17), respectively.
Proof Using Taylor expansion, the variable coefficient α(x , y ) can be expanded to the following form at the point (x i , y j ): Substituting (23) into (11), each entry of A 1 can be rewritten as where I m,n We first consider the term I m,n 1 . Because I m,n 1 is the same as A m,n 1 , the matrix form of I m,n Here the global indices (m, n) and the local indices (i, j), (i , j ) are related by (13).
Next, we only prove the structure of matrix generated by I m,n 2 . The structure of matrix generated by I m,n 3 is similar. Let the matrix form of I m,n 2 , with 1 ≤ i ≤ N x -1, 1 ≤ j ≤ N y -1, be denoted by A. By a direct analysis, we have where D 2 is defined in (21) and each entry of M is defined as Furthermore, M is of the form Here each matrix block M j,j , 1 ≤ j ≤ N y -1, 1 ≤ j ≤ N y -1 is of order N x -1 and can be expressed as We now analyze the structure of M. From (13) From (28) and (30), each matrix block M j,j satisfying |jj | > L becomes zero matrix.
Thus M has a block-banded structure given by From (29) and (30), we also observe that all the entries M j,j i,i with |ii | > K in M j,j vanish. That is, M j,j satisfying |jj | ≤ L has a banded structure expressed as By the transformations the pyramid function given in (2) can be written as Then from Eq. (27) can be deduced Let j 1j 1 = j 2j 2 = l, -L ≤ l ≤ L. That is, the matrix blocks M j 1 ,j 1 and M j 2 ,j 2 are on the same diagonal. Let Then we have According to (35), we have proved M j 1 ,j 1 = M j 2 ,j 2 .
Then we observe that Combining (35) and (37), we conclude that the matrix defined in (26) is a BTTB matrix.
Using the Taylor expansion (23), the collocation scheme (6) can be changed to an approximate form, whereû i,j is the alternative collocation approximation to the true solution u(x i , y j ). The matrix form of (38) can be written aŝ wherê andf The elements inf are given bŷ (42)

Corollary 1 The coefficient matricesÂ can be stored in O(N) memories.
Proof From (39), to store the matricesÂ, we need to store D 1 , D 2 , D 3 , B 1 , B 2 , and B 3 . Because D 1 , D 2 , and D 3 are all diagonal matrices, the storage of these three matrices requires O(N) memories.
According to the fact that matrices B 1 , B 2 , and B 3 have the same structure, we only need to analyse the storage requirement of B 1 . Because of the BTTB structure of B 1 , to store B 1 , we need to store only (2K + 1)-by-(2L + 1) entries which can be arranged as follows: Hence, the memory requirement of B 1 is O(N).

Corollary 2 The matrix-vector multiplicationÂu can be calculated in O(N log N) operations for any vector u ∈ R N .
Proof From (39), we havê We only consider the first term on the right-hand side in (44). The other two terms are calculated similarly. Because the matrix B 1 is a BTTB matrix, the matrix-vector multiplication G 1 = B 1 u can be finished in O (N log N) operations [25]. Subsequently, D 1 G 1 can be calculated in O(N) operations due to the fact that D 1 is a diagonal matrix. In summary, In the well-known Krylov subspace iteration method, such as conjugate gradient method (CG) and conjugate gradient squared method (CGS), each iteration contains several matrix-vector multiplications and vector operations. Because of the nonlocal property of the nonlocal diffusion model, the coefficient matrices A 1 and A 2 are usually dense. Hence, the matrix-vector multiplications in each iteration need O(N 2 ) computational operations. All other computations in each iteration still need O(N) computational operations. Thus, a fast Krylov subspace iteration method can be developed by calculating the matrix-vector multiplications in each iteration using Corollary 2. Moreover, the computer memory storage can be reduced for the coefficient matrixÂ using Corollary 1.

Numerical experiments
In this section, several numerical experiments are done to investigate the performance of the fast collocation method for the two-dimensional variable-coefficient nonlocal diffusion model. In the numerical runs as follows:, the variable-coefficient nonlocal diffusion model (1) with kernel function [26] σ (x, y) = 1 (x 2 + y 2 ) 1+s , s < is considered.
In the following numerical experiments, the domain Ω in (1) is chosen to be Ω = (0, 1) × (0, 1). We divide Ω by uniform square meshes, i.e., N x = N y . The horizon parameter is fixed to δ = 1/8. The true solution is chosen to be x(1x)y(1y). g(x, y) = u(x, y) on the boundary zone Ω c . The variable coefficient α(x, y) = 1 + x 2 + y 2 . We use MATLAB to run the following numerical examples on a 8G-memory laptop. Table 1 The L 2 errors of the collocation schemes (6) and (38) We run this numerical experiment with the original collocation method (6) and the approximate collocation method (38), respectively. In (6), there are not any approximations to the variable coefficient. However, in (38), a Taylor expansion is used to approximate the variable coefficient. We solve (6) and (38) by CGS and present the L 2 errors and the number of iterations of these two collocation methods in Table 1.
From Table 2, we can observe that the L 2 errors of the approximate collocation method are comparable to that of the original collocation method. Furthermore, the L 2 errors of the approximate collocation method are slightly better than that of the original collocation method. This is because the formation of matrix A 2 requires lots of numerical integrations (precisely speaking, we need 4(2K + 1) 2 numerical integrations to calculate each non-zero entry), but the formation of matrixÂ needs only a total of 3(2K + 1) 2 numerical integrations. The accumulation of truncation errors of the latter is better than that of the former. When N x = N y = 2 6 , because of the massive numerical integrations, the formation of matrix A 2 is very time-consuming(precisely speaking, it took us more than 2 days). Hence, when N x = N y is larger than 2 6 , we terminated the numerical experiment.

Conclusions
In this paper, a fast and faithful collocation method is developed for a two-dimensional variable-coefficient linear nonlocal diffusion model. Using Taylor expansion to the variable coefficient α(x, y), we see that the coefficient matrix resulting from the bilinear collocation method can be approximated by a sum of the products of diagonal matrix and BTTB matrix, which result in an approximate collocation method to the original physical problem. This fast method can reduce the computational operations in each Krylov subspace iteration from O(N 2 ) to O(N log N) and the memory requirement of the coefficient matrix from O(N 2 ) to O(N). The numerical experiments show the credibility and utility of this fast collocation method.