A preconditioned fast collocation method for a linear bond-based peridynamic model

We develop a fast collocation method for a static bond-based peridynamic model. Based on the analysis of the structure of the stiffness matrix, a fast matrix-vector multiplication technique was found, which can be used in the Krylov subspace iteration method. In this paper, we also present an effective preconditioner to accelerate the convergence of the Krylov subspace iteration method. Using the block-Toeplitz–Toeplitz-block (BTTB)-type structure of the stiffness matrix, we give a block-circulant-circulant-block (BCCB)-type preconditioner. The numerical experiments show the utility of the preconditioned fast collocation method.


Introduction
In the last decades, the peridynamic model has been applied in many research fields, such as failure and damage in composite laminates, crack propagation and branching, crack nucleation, phase transformations in solids, impact damage, damage in concrete, and so on [1][2][3][4][5][6][7]. In contrast to the classical theory of solid or fluid mechanics, which are usually modeled by partial differential equations [8][9][10], the peridynamic model does not explicitly involve any spatial derivatives of the displacement. Thus it provides a more natural description for problems with spontaneous formation of discontinuities or other singularities.
To date, several numerical methods have been developed and analyzed for the nonlocal diffusion model and peridynamic model as well as other related nonlocal models, such as finite element discretizations, finite difference method, finite volume method, collocation method, and the meshfree method [1,[11][12][13][14][15]. However, mathematically speaking, because of the nonlocality of these models, the stiffness matrices resulting from finite element method or collocation method are usually full or dense diagonal. The widely used Krylov subspace iteration method requires O(N 2 ) of memory to store the stiffness matrix and O(N 2 ) computational work in each iteration to solve the associated linear system where N is the number of spatial nodes. Therefore, the simulation of peridynamic model is usually time-consuming especially for large-scale problems (e.g.,multi-dimensional peridynamic models).
Several research works have been devoted to reducing the memory requirement and computations of the corresponding numerical schemes for peridynamic model [15][16][17][18]. These works are based on the analysis of Toeplitz or Toeplitz-type structure of the stiffness matrices and on the fast matrix-vector multiplication technique which can be used in the Krylov subspace iteration method. In [18], a fast collocation method for a two-dimensional bond-based linear peridynamic model was developed by carefully exploring the BTTBtype structure of the stiffness matrix A 2N . However, from the numerical experiments we can find that the convergence is slow, especially when the singularity of the kernel function is stronger.
In this paper, we follow the work started in [18] and develop a preconditioned fast collocation method for a linear peridynamic model. We provide an effective preconditioner to accelerate the convergence of the Krylov subspace iteration method. The rest of the paper is organized as follows. In Sect. 2, we recall the bilinear collocation method developed in [18]. However, the stiffness matrix is rewritten by reordering the unknowns differently from the unknowns' arrangement in [18]. In Sect. 3, we analyze that each block matrix of the stiffness matrix is a BTTB matrix. Based on this analysis, the operations in each Krylov subspace iteration and the storage of the stiffness matrix can be reduced. In Sect. 4, to accelerate the convergence of the Krylov subspace iteration method, a BCCB-type preconditioner is provided. In Sect. 5, we carry out several numerical experiments to investigate the performance of this preconditioner. In Sect. 6, we draw concluding remarks.

The bilinear collocation method
We consider the following two-dimensional bond-based linear peridynamic model: Here C is the micromodulus function Ω := (0, x R ) × (0, y R ) represents a rectangular domain; x := (x, y) T and x := (x , y ) T are positions of the particles in the reference configuration; u(   (2) is defined as For the convenience of the following chapter, we also give the definition of tensor product in the matrix case. Given two matrices A = (a i,j ) ∈ R p,q and B ∈ R r,s , the tensor product of A and B is defined as follows: Let N x and N y be positive integers. We define a spatial partition onΩ by x i := ih x for i = 0, 1, . . . , N x and y j := jh y for j = 0, 1, . . . , N y , where h x := x R /N x and h y := y R /N y are the mesh sizes in the x and y directions, respectively. To handle the discretization on the boundary zone Ω c , we extend the partition to (x i , y j ) for i = -K + 1, . . . , -1, 0, 1, . . . , N x , N x + 1, . . . , N x + K -1 and j = -L + 1, . . . , -1, 0, 1, . . . , N y , N y + 1, . . . , N y + L -1. Here, are the ceilings of δ/h x and δ/h y , respectively. Let ψ(ξ ) = 1-|ξ | for ξ ∈ [-1, 1] and zero otherwise. The two-dimensional pyramid functions φ i,j (x, y) centered at (x i , y j ) can be expressed as Then the trial functions v and w in the displacement vector u can be written as We choose (x i , y j ) for i = 1, . . . , N x -1 and j = 1, . . . , N y -1 as our collocation points. By substituting (5) into (1), we obtain the following collocation scheme: Let N = (N x -1)(N y -1) be the number of unknowns. If we define the vector of unknowns u 2N and the vector of right-hand side as follows: then the collocation scheme can be written as the following matrix form: where the 2N -by-2N stiffness matrix A 2N can be expressed as the following form in which each matrix block is of N order: In (9), the entries in the submatrix A v,v N are defined as Similarly, the entries in the submatrices A v,w N and A w,w N are given by and y) denotes the two-dimensional bilinear basis function at the grid point (x i , y j ). The global indices m and n are related to the indices (i, j) and (i , j ) by In (7) f v i,j and f w i,j are defined as follows: 3 The structure of stiffness matrix and A w,w N has a BTTB structure. More precisely, each matrix can be expressed as a (N y -1)-by-(N y -1) block-banded Toeplitz matrix with a block bandwidth 2L + 1, where the superscript Proof We only investigate the structure of A v,v N . The analysis of A v,w N and A w,w N is similar. By expanding the matrix A v,v N , we can easily find that A v,v N can be written as the following form: Here, each block matrix B j,j of order N x -1 denotes the interaction of row j and row j in the discrete system for 1 ≤ j ≤ N y -1 and 1 ≤ j ≤ N y -1. From (11) and (13), we can find that the entry A v,v m,n = 0 if and only if Therefore all the matrix blocks B j,j with |jj | > L in (17) vanish. Then A v,v N is a 2L + 1 block-banded matrix and can be expressed as the following form: Furthermore, in each matrix block B j,j , we have A v,v m,n = 0 for |ii | > K . That is, B j,j is a 2K + 1 banded matrix expressed as By introducing the following translations: the entries A v,v m,n given in (11) can be reduced to Let j 1j 1 = j 2j 2 = l, -L ≤ l ≤ L, and let Then we observe that, According to (24), we have proved B j 1 ,j 1 = B j 2 ,j 2 if the block matrices B j 1 ,j 1 and B j 2 ,j 2 are on the same diagonal.
Then we observe that According to (26), we conclude that each block matrix B j,j is a banded Toeplitz matrix. Combining (24) and (26), we complete the proof.

Corollary 1 The stiffness matrix A 2N can be stored in O(N) memories.
Proof From the structure of A 2N , we only prove that the block matrix A v,v N can be stored in O(N) memories. From Theorem 1 we can find that the matrix A v,v N can be stored only by storing the following (2K + 1)-by-(2L + 1) entries: Hence, A 2N can be stored in O(4 * (2K + 1) * (2L + 1)) = O(N) memories.

Corollary 2 For any vector u ∈ R 2N , the matrix-vector multiplication A 2N u can be carried out in O(N log N) operations.
Proof We divide the vector u in half. That is, in which u 1 , u 1 ∈ R N . Therefore Because of the BTTB structure of the matrices A v,v N , A v,w N , and A w,w N , the matrix-vector multiplications A v,v N u 1 , A v,w N u 2 , A v,w N u 1 , and A w,w N u 2 can be computed in O(N log N) operations [19]. Accordingly, the total matrix-vector multiplication A 2N u can be evaluated in For a BTTB-type matrix A 2N , a straightforward preconditioner can be chosen as

A preconditioned fast Krylov subspace method
where C v,w F,1 = C w,v F,1 and each matrix C I F,1 ∈ R N×N , with I = (v, v), (v, w) or (w, w), is written as [19][20][21] in which the matrix block C I l , with -L ≤ l ≤ L, is the Chan's circulant matrix of the Toeplitz matrix T I l : In (32), the diagonals of C I l are given by However, this preconditioner is not easy to invert. So we consider the following preconditioner called BCCB-type preconditioner in this paper: where for I = (v, v), (v, w),or (w, w), C I F,2 represents the BCCB matrix of A I N , and it can be generated by C I F,1 defined in (31). More precisely, C I F,2 can be expressed as where each circulant matrix block is defined bỹ Let F n be the n-order discrete Fourier transform matrix and I 2 be the 2-order identity matrix Let F 2 = F N y -1 ⊗ F N x -1 be the two-dimensional discrete Fourier transform matrix and F * 2 = (F N y -1 ⊗ F N x -1 ) * be the two-dimensional inverse Fourier transform matrix. Then the matrix C F,2 can be decomposed into the following form: Here Λ I is a diagonal matrix in which the entries on the main diagonal are the eigenvalues of the matrix C I F,2 for I = (v, v), (v, w), or (w, w). That is, Λ I can be written as To invert the matrix C F,2 fast, we consider the computational cost of solving the linear equation for some d ∈ R 2N .

Theorem 2 For any vector d ∈ R 2N , linear system (39) can be solved in O(N log N) operations.
Proof Recall that the eigenvalues of C I F,2 can be computed in O (N log N) operations by two-dimensional fast Fourier transform (2DFFT). Thus, the computation of Λ requires O (N log N) operations. From (37), to inverse C F,2 efficiently, we need to compute the inverse of Λ efficiently.
We introduce a permutation matrix P such that where for 1 ≤ k ≤ N , Therefore, each matrix blockΛ k,k with 1 ≤ k ≤ N is a 2-by-2 matrix. It requires O(1) operations to inverseΛ k,k . Accordingly, the overall computational work to inverse the matrix Λ is O(N) operations. To inverse the matrix C F,2 , we also have to compute (I 2 ⊗ F 2 )u and (I 2 ⊗ F * 2 )u for some u ∈ R 2N . It requires O(N log N) operations by using 2DFFT and twodimensional inverse fast Fourier transform (2DIFFM).
More precisely, we give the following procedures to solve the linear system (39) where w 1 (i) and w 2 (i) are the ith elements of the vectors w 1 and w 2 , respectively. It

Numerical experiments
In this section, we carry out some numerical experiments to investigate the performance of the preconditioned fast collocation method with the BCCB-type preconditioner discussed in Sect. 3. In the numerical experiments, we consider the peridynamic model (1) with the kernel function [15] σ (x, y) = 1 (x 2 + y 2 ) 1+s .
For simplicity, we choose h x = h y = h. We solve linear equation (8) by the conjugate gradient squared method (CGS), the fast conjugate gradient squared method (FCGS) without any preconditioners, and the fast conjugate gradient squared method with BCCB-type preconditioner (BCCB-FCGS). The following numerical experiments are performed on a computer with 8-GB memory. The accuracy requirement for iteration termination is 10 -8 .
Example 1 In this numerical example, we choose s = 3/8 in (44). We present the L 2 errors of the numerical solutions generated by the CGS solver, the FCGS solver, and the BCCB-FCGS solver for the mesh size from 1/2 4 to 1/2 9 in Table 1. We also present the CPU time consumed by these solvers and the number of iterations in the iterative methods in Table 1. In addition to the advantages of fast methods (FCGS and BCCB-FCGS) over traditional method (CGS) in memory requirement and CPU time, we observe from Table 1 that the BCCB-type preconditioner can significantly reduce the number of iterations and CPU time in contrast to FCGS without any preconditioners. For example, while h is equal to 2 9 , the FCGS solver takes 455 iterations and nearly 3 hours of CPU time, but the BCCB solver needs only 58 iterations and nearly 8 minutes of CPU time without any loss of accuracy. However, while we carried out the numerical test with h = 1/2 10 , we found a small increase in L 2 error. This small increase may be caused by singular integral when we calculate the main-diagonal entries of A v,v , A v,w , and A w,w .
In this numerical experiment, we also use a linear regression to fit the convergence rate α and the associated constant C α in the error estimate We also present these results in Table 1. We see that the convergence rate α is sublinear.
Example 2 We choose s = 0 in (44). We present the corresponding numerical results in Table 2 as in Example 1. We have the similar observations as in Example 1. However, in comparison with the results showed in Table 1, an improved accuracy of the numerical solutions and a reduced number of iterations in these three iteration methods are observed in Table 2.