The surrogate matrix methodology: A reference implementation for low-cost assembly in isogeometric analysis

Graphical abstract


Method details
In [1] , we applied the surrogate matrix methodology to isogeometric analysis (IGA) in order to avoid over-assembling mass, stiffness, and divergence matrices. While doing so, we also showed how this methodology could be applied to other system matrices arising in IGA computations. In that article, large performance gains were demonstrated for a number of important PDE scenarios while still maintaining solution accuracy. In the present article, we provide an accompanying reference implementation, together with explanations and examples. This should be considered an extension to the main theoretical article [1] , which we strongly recommend to read beforehand.
Recall that [1] focuses on the Galerkin form of IGA [2,3] . The resulting surrogate matrix methods perform quadrature for only a small fraction of the NURBS basis function interactions during assembly and then approximate the rest by interpolation . This leads to large sparse linear system matrices where the majority of entries have been computed by interpolation. Such interpolated matrices will generally not coincide with those which would otherwise be generated by performing quadrature for the complete basis, or on every element, but they can be interpreted as cost-effective surrogates for them.
This idea was first introduced in the context of first-order finite elements and massively parallel simulations by Bauer et al. in [4] . There, the computational run time improvement results from a two-scale strategy after which the method was named. Thereafter, several subsequent investigations were initiated [1,[5][6][7] , of which this paper may be seen as a product. In particular, massively parallel applications in geodynamical simulations are presented in [5,6] and a theoretical analysis for first order finite element methods is given in [7] . For moderately sized problems a two-scale strategy will possibly not result in an optimal performance gain. Therefore, we exploit a mesh-size dependent macro-element approach where the macro-mesh is closer related to the fine mesh.
In this article, we restrict our attention to Poisson's boundary value problem on a single patch geometry = ϕ ( ˆ ) , where ˆ = (0 , 1) n , ϕ : ˆ → R n is a fixed diffeomorphism of sufficient regularity, and n = 2 , 3 . Represented on the reference domain ˆ , the bilinear form appearing in the standard weak form of this problem is defined This definition preserves the symmetry and the kernel of the true IGA stiffness matrix A ∈ R N×N . The reference implementation described in this paper is built on the GeoPDEs package for Isogeometric Analysis in Matlab and Octave [8,9] . This package provides a framework for implementing and testing new isogeometric methods for PDEs. Our reference implementation is available in a git repository 1 , which is itself a fork of the GeoPDEs repository. It is important to note that similar modifications can be made to other present-day software libraries and so the implementation presented in this article should foremost serve as a template. For this reason, we tried to find a balance between comprehensibility and performance. Most notably, the performance of our implementation may be greatly improved by performing quadrature for individual basis function interactions instead of element-wise quadrature. However, for most IGA software libraries, this feature would require significant software restructuring.
Before continuing, let us establish some more mathematical notation which appears in the article, cf. [1] . Let n be the space dimension, p be the maximum degree of the discrete IGA B-spline spaces, and h be the mesh size. By M , we denote the skip parameter which defines the sampling length H = M · h on which we perform interpolation of the stencil functions. Let q be the spline degree of the interpolation space. We assume that the number of elements in each spatial dimension is always equal and we denote its value by N el . Finally, recall that B-spline and NURBS bases are built from tensor products of univariate B-splines. For convenience, we assume that the associated number of knots and the B-spline degrees do not depend on the Cartesian coordinates.

Implementation
Our implementation preserves the local element quadrature approach present in most standard IGA and finite element software. This is not optimally efficient, but performance advantages can still be easily achieved because quadrature is usually not required on every element. In order to avoid performing quadrature on specific elements, we made some minor modifications to the following core functions in GeoPDEs: msh_evaluate_col.m 2 , @sp_scalar/sp_evaluate_col.m 3 , and @sp_scalar/sp_evaluate_col_param.m 4 . 5 In addition to these minor modifications, we added two functions which handle the assembly of the surrogate stiffness matrices in 2D and 3D, respectively. Namely, we added op_gradu_gradv_surrogate_2d 6 and op_gradu_gradv_surrogate_3d 7 . These functions are based on @sp_scalar/op_gradu_gradv_tp 8 wherein the global matrix is assembled in a column-wise fashion [9] . In this section, we outline the modifications made to these core routines and describe the new 2D function in detail by breaking it down into coherent pieces of code.

Code modifications
In GeoPDEs, column-wise assembly which exploits the tensor product structure in IGA is used for performance [9] . For instance, in 2D, the functions listed above are called for each column of  Some typical patterns of active elements are depicted in Fig. 1 . Here, we have illustrated two different element masks corresponding to the cases p = 2 , 3 , M = 10 , and N el = 39 . Let us focus on the case p = 2 . Here, the first and last four columns correspond to near-boundary elements. That is, each whole column of elements is made up of active elements; cf. the red column in Fig. 1 . In this scenario, we should employ unmodified function calls to msh_evaluate_col and sp_evaluate_col .
In the third and final scenario, both elements in the interior and near the top and bottom boundary are active; cf. the blue elements in Fig. 1 . Each interior active element patch consists of p + 1 elements and we start sampling at the p + 1 element with an increment of M . Since we want to avoid extrapolation in the subsequent spline interpolation, we also include the active elements near the interior boundary. With K = N el − 3 p − 1 , the interior indices are thus given by The final element mask is obtained by also including the near-boundary elements, i.e., I ∪ B. In our case p = 2 , the element mask is described by the set {1, 2, 3, 4, 5, 13, 14, 15, 23, 24, 25, 33, 34, 35, 36, 37, 38, 39}.

Code extensions
Below is the signature of op_gradu_gradv_surrogate_2d . This function has the following input arguments: a space of type sp_scalar , a mesh of type msh_cartesian , a function handle coeff of the coefficient, the skip paramater M and the surrogate interpolation degree q . The final surrogate matrix K_surr (in sparse format) is the sole output.
At the beginning of the function, some self-explanatory sanity checks are performed which verify that the correct input parameters have been passed.
In the next few lines, some helper variables holding the number of degrees of freedom (dofs) in the univariate B-spline basis are defined. The total number may be expressed as N el + p. Removing 2 p dofs from both the left and right boundary yields the number of interior dofs N el − 3 p present in one Cartesian direction in the stencil function domain.
The following lines compute the indices of the rows in the global stiffness matrix corresponding to the stencil function domain. This is done by collecting all available indices and removing the 4 p indices nearest the boundary in each dimension. In the 3D case, the array row_indices has 3 dimensions.
The next step is computing the sample point coordinates in the interior of the stencil function domain ˜ described in [1] . For practical reasons, we map this domain to the unit domain [0, 1] n and take every M th point in each direction. To avoid extrapolating any values later, we reinsert the sample points on boundary of [0, 1] n which may have been skipped.
Prior to computing the element mask, we save the indices of the matrix rows corresponding to the sample points row_indices_subset . These indices are required in order to determine the active elements with the GeoPDEs function sp_get_cells . In order to make the coming call to sp_get_cells faster, we filter the row_indices_subset to only include the rows up to index (2 p + 1) (N el + p) . The last line filtering the rows is optional and may be omitted.
The following snippet sets up an element mask which skips the quadrature at all non-active elements. Here, we employ the sp_get_cells function which returns the indices of all elements within the support of the basis functions corresponding to the rows filtered in the previous snippet. Some additional filtering enforces that only the indices up to N el are included. Since quadrature needs to be performed on all of the remaining near-boundary elements, we ensure that the elements with first or second indices in B are also active.
Additionally, a second mask with only the near-boundary element indices is required. Clearly, this mask only includes the indices in B.
For performance reasons, we pre-allocate memory for the sparse matrix with an estimated number of non-zero entries per row.
Below is the first main loop which only performs quadrature on the active elements; i.e., those required for the subsequent stencil interpolation. It is very similar to the original column-wise loop in op_gradu_gradv_tp , but it employs the modified msh_evaluate_col and sp_evaluate_col (see previous subsection) with the previously determined masks.
We define three vectors, which will hold the columns, rows, and values of the surrogate matrix. This format allows for faster construction of a sparse matrix in a compressed sparse column format.
The following snippet contains the main loop of the surrogate assembly algorithm. In practice, there are | D| = (2 p + 1) n surrogate stencil functions ˜ δ (·) , so the statements in the inner-most loop are reached (2 p + 1) n times. First, the position (i.e., shift ) of the stencil function relative to the diagonal entries in the matrix is computed. This "shift" is in one-to-one correspondence with the translations δ ∈ D described in [1] . Due to symmetry, we only need to compute the surrogates for the upper-diagonal matrix, thus we skip all cases where the shift is smaller or equal to zero by simply continuing to the next iteration. If the shift is larger than zero, we extract all the sample points from the associated rows of the partly assembled stiffness matrix. Having all the sample points ˜ x s i at hand, we can easily perform the interpolation of the remaining matrix entries ˜ δ ( ˜ x i ) . In order to remove any dependence on external software, the built-in Matlab function interp2 is used here.
Note that this function only supports the spline interpolation orders q = 1 and q = 3 . In [1] , we used the RectBivariateSpline function provided by the SciPy Python package [10] , which supports any spline interpolation up to order 5. In the last three lines, the rows, columns, and values of the surrogate matrix are added to the vectors required for the sparse matrix creation at the end of the routine. Note that the values of the upper-diagonal are added to the lower-diagonal in order to enforce symmetry.
For performance reasons, we pre-allocate the diagonal of the surrogate stiffness matrix with dummy values which are overwritten later.
In the penultimate step, we combine the matrix components obtained through interpolation with those obtained through standard quadrature. This is done by creating a matrix K_interp with only the interpolated components. We then zero all of the entries in K_surr which have non-zero values in K_interp and finally sum the two matrices. This may seem complicated at first, but it yields good performance because the sum of both sparse matrices may be computed efficiently.
In the final step, the zero row-sum property is enforced by setting the diagonal value to the negative sum of the off-diagonal values in each row.

Remark 1.
The op_gradu_gradv_surrogate_3d function is in many ways similar to its 2D counterpart. For instance, clearly some of the 2D arrays need to be changed to 3D arrays, the 3D interpolation function interp3 needs to be used, and the number of stencil functions increases.
However, performing quadrature only on the active elements is more difficult, since in the columnwise iteration GeoPDEs uses, each column is now made up of a matrix of elements (instead of a vector). Therefore, a much more complicated set of element masks need to be used; see lines 98 to 115 in op_gradu_gradv_surrogate_3d.m 10 .

Verification
To facilitate easy verification, we added the demo script geopdes_surrogate_examples.m 11 , which provides the ability to run the problems considered in [1, Section 7.3] . The code for the 2D and 3D examples may be found in ex_surrogate_poisson_2d.m 12 and ex_surrogate_poisson_3d.m 13 , respectively. Again, recall Poisson's problem on a domain ⊂ R n . The 2D and 3D domains we considered are depicted in Fig. 2 and their respective NURBS/B-spline descriptions are in GeomQuarterAnnulusWithBumps.m 14 and GeomBentTwistedBox.m 15 . In order to test the quality of the discrete solutions, we employ the following manufactured solution and load in 2D f (x, y ) = 800 π 2 sin (20 π x ) sin (20 π y ) , and the following in 3D u (x, y, z) = sin (20 π x ) sin (20 π y ) sin (20 π z) , f (x, y, z) = 1200 π 2 sin (20 π x ) sin (20 π y ) sin (20 π z) .
The Dirichlet datum g is chosen as the restriction of the manufactured solution to the boundary ∂ .
Each example script assembles the standard IGA matrix A first and solves the corresponding system. Afterwards, the surrogate matrix ˜ A is assembled and the surrogate system is solved. Finally, the relative L 2 and H 1 errors for each case are computed and written to the console. In addition, both the maximum absolute value of the difference in the two stiffness matrices (i.e., A −˜ A max ) and the achieved speed-up are displayed.
In the following, we present the output of running the ex_surrogate_poisson_2d script in Matlab 2018b. The script ran on a workstation equipped with an Intel R Xeon R E5-1630 v3 processor with a nominal frequency of 3.7 GHz.

Remark 2.
The default parameters in the demo script are p = 2 , q = 3 , and M = 10 . The number of knots in each dimension is 160 in 2D and 40 in 3D. These parameters may be easily modified by resetting the appropriate variables at the beginning of each script.