A p-multigrid method enhanced with an ILUT smoother and its comparison to h-multigrid methods within Isogeometric Analysis

Over the years, Isogeometric Analysis has shown to be a successful alternative to the Finite Element Method (FEM). However, solving the resulting linear systems of equations efficiently remains a challenging task. In this paper, we consider a p-multigrid method, in which coarsening is applied in the approximation order p instead of the mesh width h. Since the use of classical smoothers (e.g. Gauss-Seidel) results in a p-multigrid method with deteriorating performance for higher values of p, the use of an ILUT smoother is investigated. Numerical results and a spectral analysis indicate that the resulting p-multigrid method exhibits convergence rates independent of h and p. In particular, we compare both coarsening strategies (e.g. coarsening in h or p) adopting both smoothers for a variety of two and threedimensional benchmarks.


Introduction
Isogeometric Analysis (IgA) [1] has become widely accepted over the years as an alternative to the Finite Element Method (FEM). The use of B-spline basis functions or Non-Uniform Rational B-splines (NURBS) allows for a highly accurate representation of complex geometries and establishes the link between computer-aided design (CAD) and computer-aided engineering (CAE) tools. Furthermore, the C p−1 continuity of the basis functions offers a higher accuracy per degree of freedom compared to standard FEM [2]. IgA has been applied with success in a wide range of engineering fields, such as structural mechanics [3], fluid dynamics [4] and shape optimization [5]. Solving the resulting linear systems efficiently is, however, still a challenging task. The condition numbers of the mass and stiffness matrices increase exponentially with the approximation order p, making the use of (standard) iterative solvers inefficient. On the other hand, the use of (sparse) direct solvers is not straightforward due to the increasing stencil of the basis functions and increasing bandwidth of matrices for higher values of p. Furthermore, direct solvers may not be practical for large problem sizes due to memory constraints, which is a common problem in high-order methods in general. Recently, various solution techniques have been developed for discretizations arising in Isogeometric Analysis. For example, preconditioners have been developed based on fast solvers for the Sylvester equation [13] and overlapping Schwarz methods [14]. As an alternative, geometric multigrid methods have been investigated, as they are considered among the most efficient solvers in Finite Element Methods for elliptic problems. However, the use of standard smoothers like (damped) Jacobi or Gauss-Seidel leads to convergence rates which deteriorate for increa-sing values of p [15]. It has been noted in [16] that very small eigenvalues associated with high-frequency eigenvectors cause this behaviour. This has lead to the development of non-classical smoothers, such as smoothers based on mass smoothing [17,18,40] or overlapping multiplicative Schwarz methods [20], showing convergence rates independent of both h and p. p-Multigrid methods can be adopted as an alternative solution strategy. In contrast to h-multigrid methods, a hierarchy is constructed where each level represents a different approximation order. Throughout this paper, the coarse grid correction is obtained at level p = 1. Here, B-spline functions coincide with linear Lagrange basis functions, thereby enabling the use of well known solution techniques for standard Lagrangian Finite Elements. Furthermore, the stencil of the basis functions and bandwidth of the matrix is significantly smaller at level p = 1, reducing both assembly and factorization costs of the multigrid method. p-Multigrid methods have mostly been used for solving linear systems arising within the Discontinuous Galerkin method [6,7,8,9], where a hierarchy was constructed until level p = 0. However, some research has been performed for continuous Galerkin methods [10] as well, where the coarse grid correction was obtained at level p = 1. Recently, the authors applied a p-multigrid method, using a Gauss-Seidel smoother, in the context of IgA [11]. As with h-multigrid methods, a dependence of the convergence rate on p was reported. In this paper, a p-multigrid method is presented that makes use of an Incomplete LU Factorization based on a dual threshold strategy (ILUT) [12] for smoothing. The spectral properties of the resulting p-multigrid method are analyzed adopting both smoothers. Numerical results are presented for Poisson's equation on a quarter annulus, an L-shaped (multipatch) geometry and the unit cube. Furthermore, the convection-diffusion-reaction (CDR) equation is considered on the unit square. The use of ILUT as a smoother improves the performance of the p-multigrid method significantly and leads to convergence rates which are independent of h and p. Compared to standard h-multigrid methods, both the coarsening strategy and smoother are adjusted in the proposed p-multigrid method. Therefore, a comparison study is performed in terms of convergence rates and CPU times between p-multigrid and h-multigrid methods using both smoothers. Furthermore, the p-multigrid method with ILUT as a smoother is compared to an h-multigrid method adopting a smoother based on stable splittings of spline spaces [18]. Finally, to show the versatility of the proposed p-multigrid method, it is applied to solve linear systems of equations resulting from THBspline discretizations [41]. This paper is organised as follows. In Section 2 the model problem, the basics of IgA and the spatial discretization are considered. Section 3 presents the p-multigrid method in detail, together with the proposed ILUT smoother. A spectral analysis is performed with both smoothers and coarsening strategies and is discussed in Section 4. In Section 5, numerical results for the considered benchmarks are presented. Finally, conclusions are drawn in Section 6.

Model problem and IgA discretization
To assess the quality of the p-multigrid method, the convection-diffusionreaction (CDR) equation is considered as a model problem: where D denotes the diffusion tensor, v a divergence-free velocity field and R a source term. Here, Ω ⊂ R 2 is a connected, Lipschitz domain, f ∈ L 2 (Ω) and u = 0 on the boundary ∂Ω. Let V = H 1 0 (Ω) denote the space of functions in the Sobolev space H 1 (Ω) that vanish on ∂Ω. The variational form of (1) is then obtained by multiplication with an arbitrary test function v ∈ V and application of integration by parts: where and The physical domain Ω is then parameterized by a geometry function The geometry function F describes an invertible mapping connecting the parameter domain Ω 0 = (0, 1) 2 with the physical domain Ω. In case Ω cannot be described by a single geometry function, the physical domain is divided into a collection of K non-overlapping subdomains Ω (k) such that A family of geometry functions F (k) is then defined to parameterize each subdomain Ω (k) separately: In this case, we refer to Ω as a multipatch geometry consisting of K patches.

B-spline basis functions
Throughout this paper, the tensor product of univariate B-spline basis functions of order p is used for spatial discretization, unless stated otherwise. Univariate B-spline basis functions are defined on the parameter domain Ω = (0, 1) and are uniquely determined by their underlying knot vector consisting of a sequence of non-decreasing knots ξ i ∈Ω. Here, N denotes the number of univariate basis functions of order p defined by this knot vector. B-spline basis functions are defined recursively by the Cox-de Boor formula [21]. The resulting B-spline basis functions φ i,p are non-zero on the interval [ξ i , ξ i+p+1 ), implying a compact support that increases with p. Furthermore, at every knot ξ i the basis functions are C p−m i -continuous, where m i denotes the mutiplicity of knot ξ i . Finally, the basis functions possess the partition of unity property: Throughout this paper, B-spline basis functions are considered based on an open uniform knot vector with knot span size h, implying that the first and last knots are repeated p + 1 times. As a consequence, the basis functions considered are C p−1 continuous and interpolatory only at the two end points. For the two-dimensional case, the tensor product of univariate B-spline basis functions φ ix,p (ξ) and φ iy,q (η) of order p and q, respectively, with maximum continuity is adopted for the spatial discretization: Here, i and p are multi indices, with i x = 1, . . . , n x and i y = 1, . . . , n y denoting the univariate basis functions in the x and y-dimension, respectively. Furthermore, i = i x n x + (i y − 1)n y assigns a unique index to each pair of univariate basis functions, where i = 1, . . . N dof . Here, N dof denotes the number of degrees of freedom, or equivalently, the number of tensor-product basis functions and depends on both h and p. In this paper, all univariate B-spline basis functions are assumed to be of the same order (i.e. p = q).
The spline space V h,p can then be written, using the inverse of the geometry mapping F −1 as pull-back operator, as follows: The Galerkin formulation of (2) becomes: The discretized problem can be written as a linear system where A h,p denotes the system matrix resulting from this discretization with B-spline basis functions of order p and mesh width h. For a more detailed description of the spatial discretization in Isogeometric Analysis, we refer to [1]. Throughout this paper four benchmarks are considered, to investigate the influence of the geometric factor, the considered coefficients in the CDR-equation, the number of patches and the dimension on the proposed p-multigrid solver.

Benchmark 1.
Let Ω be the quarter annulus with an inner and outer radius of 1 and 2, respectively. The coefficients are chosen as follows: Furthermore, homogeneous Dirichlet boundary conditions are applied and the right-hand side is chosen such that the exact solution u is given by: u(x, y) = −(x 2 + y 2 − 1)(x 2 + y 2 − 4)xy 2 .
Benchmark 2. Here, the unit square is adopted as domain, i.e. Ω = [0, 1] 2 , and the coefficients are chosen as follows: Homogeneous Dirichlet boundary conditions are applied and the righthand side is chosen such that the exact solution u is given by: u(x, y) = sin(πx)sin(πy).
A multipatch geometry is created, by splitting the single patch in each direction uniformly. The coefficients are chosen as follows: The exact solution is given by: , and the right-hand side is chosen accordingly. Inhomogeneous Dirichlet boundary conditions are prescribed for this benchmark.
Benchmark 4. Here, the unit cube is adopted as domain, i.e. Ω = [0, 1] 3 , and the coefficients are chosen as follows: Homogeneous Dirichlet boundary conditions are applied and the righthand side is chosen such that the exact solution u is given by: u(x, y) = sin(πx)sin(πy)sin(πz).

p-Multigrid method
Multigrid methods [22,23] aim to solve linear systems of equations by defining a hierarchy of discretizations. At each level of the multigrid hierarchy a smoother is applied, whereas on the coarsest level a correction is determined by means of a direct solver. Starting from V h,1 , a sequence of spaces V h,1 , . . . , V h,p is obtained by applying k-refinement to solve Equation (13). Note that, since basis functions with maximal continuity are considered, the spaces are not nested. A single step of the two-grid correction scheme for the p-multigrid method consists of the following steps [11]: 1. Starting from an initial guess u (0,0) h,p , apply a fixed number ν 1 of presmoothing steps: where S h,p is a smoothing operator applied to the high-order problem. 2. Determine the residual at level p and project it onto the space V h,p−1 using the restriction operator I p−1 p : 3. Solve the residual equation at level p − 1 to determine the coarse grid error: 4. Project the error e h,p−1 onto the space V h,p using the prolongation operator I p p−1 and update u 5. Apply ν 2 postsmoothing steps of (18) to obtain u h,p . In the literature, steps (2)-(4) are referred to as 'coarse grid correction'. Recursive application of this scheme on Equation (20) until level p = 1 is reached, results in a V-cycle. In contrast to h-multigrid methods, the coarsest problem in p-multigrid can still be large for small values of h. However, since we restrict to level p = 1, the coarse grid problem corresponds to a standard low-order Lagrange FEM discretization of the problem at hand. Therefore, we use a standard h-multigrid method to solve the coarse grid problem in our p-multigrid scheme, which is known to be optimal (in particular h-independent) in this case. As a smoother, Gauss-Seidel is applied within the h-multigrid method, as it's both cheap and effective for low degree problems. Applying two V-cycles using canonical prolongation, weighted restriction and a single smoothing step turned out to be sufficient and has therefore been adopted throughout this paper as coarse grid solver. Note that, for the p-multigrid method, the residual can be projected directly to level p = 1. It was shown in [24] that the preformance of the p-multigrid method is hardly affected, while the set-up costs decrease significantly. In Appendix A, numerical results are presented for the first benchmark confirming this observation. Throughout this paper, a direct projection to level p = 1 is adopted for the p-multigrid method, see Figure 1. Results are compared to an h-multigrid method, which is shown as well in Figure 1.

Prolongation and restriction
To transfer both coarse grid corrections and residuals between different levels of the multigrid hierarchy, prolongation and restriction operators are defined. The prolongation and restriction operator adopted in this paper are based on an L 2 projection and have been used extensively in the literature [25,26,27]. At level p = 1, the coarse grid correction is prolongated to level p h-multigrid IgA Figure 1: Illustration of the considered p-multigrid (top) and h-multigrid (bottom) method. At p = 1, Gauss-Seidel is always adopted as a smoother (•), whereas at the high order level Gauss-Seidel or ILUT can be applied ( ). At the coarsest level, a direct solver is applied to solve the residual equation ( ).
by projection onto the space V h,p . The prolongation operator I p 1 : V h,1 → V h,p is given by where the mass matrix M p and transfer matrix P p 1 are defined, respectively, as follows: The residuals are restricted from level p to 1 by projection onto the space To prevent the explicit solution of a linear system of equations for each projection step, the consistent mass matrix M in both transfer operators is replaced by its lumped counterpart M L by applying row-sum lumping: Numerical experiments, presented in Appendix B, show that lumping the mass matrix hardly influences the convergence behaviour of the resulting pmultigrid method. Neither does it affect the overall accuracy obtained with the p-multigrid method. Alternatively, one could invert the mass matrix efficiently by exploiting the tensor product structure, see [28]. Note that this choice of prolongation and restriction operators yields a nonsymmetric coarse grid correction and, hence, a non-symmetric multigrid solver. As a consequence, the multigrid solver can only be applied as a preconditioner for a Krylov method suited for non-symmetric matrices, like BiCGSTAB.
Remark 1: Choosing the prolongation and restriction operator tranpose to each other would restore the symmetry of the multigrid method. However, numerical experiments, not presented in this paper, show that this leads to a less robust p-multigrid method. Therefore, the prolongation and restriction operator are adopted as defined in Equation (22) and (24), respectively.

Smoother
Within multigrid methods, a basic iterative method is typically used as a smoother. However, in IgA the performance of classical smoothers such as (damped) Jacobi and Gauss-Seidel decreases significantly for higher values of p. Therefore, an Incomplete LU Factorization is adopted with a dual threshold strategy (ILUT) [12] to approximate the operator A h,p : The ILUT factorization is determined completely by a tolerance τ and fillfactor m. Two dropping rules are applied during factorization: 1. All elements smaller (in absolute value) than the dropping tolerance are dropped. The dropping tolerance is obtained by multiplying the tolerance τ with the average magnitude of all elements in the current row.
2. Apart from the diagonal element, only the M largest elements are kept in each row. Here, M is determined by multiplying the fillfactor m with the average number of non-zeros in each row of the original operator A h,p .
The ILUT factorization considered in this paper is closely related to an ILU(0) factorization. This factorization has been applied in the context of IgA as a preconditioner, showing good convergence behaviour [34]. An efficient implementation of ILUT is available in the Eigen library [29] based on [30]. Once the factorization is obtained, a single smoothing step is applied as follows: where the two matrix inversions in Equation (28) amount to forward and backward substitution. Throughout this paper, the fillfactor m = 1 is used (unless stated otherwise) and the dropping tolerance equals τ = 10 −12 .
Hence, the number of non-zero elements of L h,p + U h,p is similar to the number of non-zero elements of A h,p . Figure 2 shows the sparsity pattern of the stiffness matrix A h,3 and L h,3 + U h,3 for the first benchmark and h = 2 −5 .
Since a fill-reducing permutation is performed during the ILUT factorization, sparsity patterns differ significantly. However, the number of non-zero entries is comparable.

Coarse grid operator
At each level of the multigrid hierarchy, the operator A h,p is needed to apply smoothing or compute the residual. The operators at the coarser levels can be obtained by rediscretizing the bilinear form in (2) with low-order basis functions. Alternatively, a Galerkin projection can be adopted: However, since the condition number when using the Galerkin projection is significantly higher compared to the condition number of the rediscretized operator, the coarse grid operators in this paper are obtained by using the rediscretization approach.

Computational costs
To investigate the costs of the proposed p-multigrid method, the assembly costs, factorization costs and the costs of a single V-cycle are analyzed. Assuming a direct projection to level p = 1, both A h,p and A h,1 have to be assembled. Furthermore, the (variationally lumped) mass matrices M L 1 , M L p and transfer matrix P 1 p have to be assembled. Assuming an element-based assembly loop with standard Gauss-quadrature, assembling the stiffness matrix or transfer matrix at level k costs O(N dof k 3d ) flops. More efficient assembly techniques like weighted quadrature [31], sum factorizations [32] and low tensor rank approximations [33] exist, but have not yet been explored. However, since assembly costs might dominate the overall computational costs, assembly costs will be presented separately in Section 5. Assembling the (variationally lumped) mass matrices M L 1 and M L p costs O(N dof ) flops. At the high order level an ILUT factorization of A h,p needs to be determined, costing O(N dof p 2d ) flops [34] in case m = 1 and τ = 10 −12 . Alternatively to ILUT, Gauss-Seidel can be applied as a smoother, without any set-up costs. At the high order level of the V-cycle both pre-and postsmoothing is applied. Given the ILUT factorization, applying a single smoothing step costs O(N dof p d ) flops. Applying Gauss-Seidel as a smoother at level p = 1, costs O(N dof 1 d ) flops [34]. For both prolongation and restriction, a sparse matrixvector multiplication has to be performed, costing O(N dof p d ) flops for each application.
Finally, the residual equation (20) is solved by applying a single V-cycle of an h-multigrid method, which uses Gauss-Seidel as a smoother. Prolongation and restriction operators of the h-multigrid method are based on canonical interpolation and weighted restriction, respectively. Table 1 provides an overview of the computational costs of the proposed p-multigrid method.
The memory requirements of the proposed p-multigrid method is strongly related to the number of nonzero entries of each operator. For the stiffness matrix in d dimensions, the number of nonzero entries at level k equals O(N dof k d ). Table 2 shows the number of nonzero entries for all operators in the p-multigrid method for each level.

Number of nonzero entries
Note that, compared to h-multigrid methods, the p-multigrid method consists of one extra level. Since k-refinement is applied, the dimensions of the matrix remain of O(N dof ). However, at level p = 1, coarsening in h is applied which leads to a reduction of the number of degrees of freedom with a factor of 2 d from one level to the other, as with h-multigrid. Furthermore, the number of nonzero entries significantly reduces due to the smaller support of the piecewise linear B-spline basis functions. A more detailed comparison between h-multigrid and p-multigrid methods, also in terms of CPU times, can be found in Section 4 and 5.

Spectral Analysis
In this section, the performance of the proposed p-multigrid method is analyzed and compared with h-multigrid methods in different ways. First, a spectral analysis is performed to investigate the interplay between the coarse grid correction and the smoother. In particular, we compare both smoothers (Gauss-Seidel and ILUT) and coarsening strategies (in h or p). Then, the spectral radius of the iteration matrix is determined to obtain the asymptotic convergence factors of the p-multigrid and h-multigrid methods. Throughout this section, the first two benchmarks presented in Section 2 are considered for the analysis.

Reduction factors
To investigate the effect of a single smoothing step or coarse grid correction, a spectral analysis [35] is carried out for different values of p. For this analysis, we consider −∆u = 0 with homogeneous Dirichlet boundary conditions and, hence, u = 0 as its exact solution. Let us define the error reduction factors as follows: where S h,p (·) denotes a single smoothing step and CGC(·) an exact coarse grid correction. We denote a coarse grid correction obtained by coarsening in p and h by CGC p and CGC h , respectively. For CGC p , a direct projection to p = 1 is considered. As an initial guess, the generalized eigenvectors v i are chosen which satisfy Here, M h,p is the consistent mass matrix as defined in (23). The error reduction factors for the first benchmark for both smoothers and coarsening strategies are shown in Figure 3 for h = 2 −5 and different values of p. The reduction factors obtained with both smoothers are shown in the left column, while the plots in the right column show the reduction factors for both coarsening strategies. In general, the coarse grid corrections reduce the coefficients of the eigenvector expansion corresponding to the low-frequency modes, while the smoother reduces the coefficients associated with high-frequency modes. However, for increasing values of p, the reduction factors of the Gauss-Seidel smoother increase for the high-frequency modes, implying that the smoother becomes less effective. On the other hand, the use of ILUT as a smoother leads to decreasing reduction factors for all modes when the value of p is increased. The coarse grid correction obtained by coarsening in h (e.g. CGC h ) is more effective compared to a correction obtained by coarsening p. Note that, for higher values of p, both types of coarse grid correction remain effective in reducing the coefficients of the eigenvector expansion corresponding to the low-frequency modes. Figure 4 shows the error reduction factors obtained for the second benchmark, showing similar, but less oscillatory, behaviour.
These results indicate that the use of ILUT as a smoother (with ν 1 = ν 2 = 1) could significantly improve the convergence properties of the p-multigrid and h-multigrid method compared to the use of Gauss-Seidel as a smoother.

Iteration Matrix
For any multigrid method, the asymptotic convergence rate is determined by the spectral radius of the iteration matrix. To obtain this matrix explicitly, consider, again, −∆u = 0 with homogeneous Dirichlet boundary conditions. By applying a single iteration of the p-multigrid or h-multigrid method using the unit vector e i h,p as initial guess, one obtains the i th column of the iteration matrix [36]. Figure 5 shows the spectra for the first benchmark for h = 2 −5 and different values of p obtained with both multigrid methods using Gauss-Seidel and ILUT as a smoother. For reference, the unit circle has been added in all plots. The spectral radius of the iteration matrix, defined as the maximum eigenvalue in absolute value, is then given by the eigenvalue that is the furthest away from the origin. Clearly, the spectral radius significantly increases for higher values of p when adopting Gauss-Seidel as a smoother. The use of ILUT as a smoother results in spectra clustered around the origin, implying fast convergence of the resulting p-multigrid or h-multigrid method. The spectra of the iteration matrices for the second benchmark are presented in Figure 6. Although the eigenvalues are more clustered with Gauss-Seidel compared to the first benchmark, the same behaviour can be observed. The spectral radii for both benchmarks, where ν 1 = ν 2 = 1, are presented in Table 3 until Table 6. For Gauss-Seidel, the spectral radius of the iteration matrix is independent of the mesh width h and coarsening strategy, but depends strongly on the approximation order p. The use of ILUT leads to a spectral radius which is significantly lower for all values of h and p. Although ILUT exhibits a small h-dependence, the spectral radius remains low for all  values of h and both coarsening strategies. As a consequence, the p-multigrid and h-multigrid method are expected to show both h-and p-independence convergence behaviour when ILUT is adopted as a smoother.

h-multigrid (p=4)
Gauss-Seidel ILUT Figure 6: Spectra of the iteration matrix for the second benchmark obtained with pmultigrid (left) and h-multigrid (right).

Numerical Results
In the previous Section, a spectral analysis showed that the use of ILUT as a smoother within a p-multigrid or h-multigrid method significantly improves the asymptotic convergence rate compared to the use of Gauss-Seidel as a smoother. In this Section, p-multigrid and h-multigrid are both applied as a stand-alone solver and as a preconditioner within a stabilized BiConjugate Gradient (BiCGSTAB) to verify this analysis. Results in terms of iteration numbers and CPU times are obtained using Gauss-Seidel and ILUT as a smoother. Furthermore, the proposed p-multigrid method is compared to an h-multigrid method using a non-standard smoother. Finally, the p-multigrid method is adopted for discretizations using THB-splines. For all numerical experiments, the initial guess u (0) h,p is chosen randomly, where each entry is sampled from a uniform distribution on the interval [−1, 1] using the same seed. Furthermore, we choose ν 1 = ν 2 = 1 for consistency. Application of multiple smoothing steps, which is in particular common for Gauss-Seidel, decreases the number of iterations until convergence, but does not qualitatively or quantitatively change the p-dependence. Furthermore, since the smoother costs dominate when solving the linear systems, CPU times are only mildly affected. The stopping criterion is based on a relative reduction of the initial residual, where a tolerance of = 10 −8 is adopted. Boundary conditions are imposed by eliminating the degrees of freedom associated to the boundary. Since the use of a V-cycle or W-cycle led to the same number of iterations and the computational costs per cycle is lower for V-cycles, they are considered throughout the remainder of this paper. Table 7 shows the number of V-cycles needed to achieve convergence using different smoothers for all benchmarks. For the first three benchmarks, the number of V-cycles needed with Gauss-Seidel is in general independent of the mesh width h, but strongly depends on the approximation order p. For some configurations, however, the use of Gauss-Seidel leads to a method that diverges, indicated with (−). The p-multigrid method was said to be diverged in case the relative residual exceeded 10 10 at the end of a V-cycle. In general, the use of ILUT as a smoother leads to a p-multigrid which converges for all configurations and exhibits both independence of h and p. Only for the third benchmark, a logarithmic dependence in h is visible for p = 2. Furthermore, the number of iterations needed for convergence is significantly lower.

p-Multigrid as stand-alone solver
For Poisson's equation on the unit cube, the p-dependence when Gauss-Seidel is adopted as smoother is stronger compared to the dependence for the twodimensional benchmarks. Furthermore, the number of iterations slightly decreases for smaller values of the meshwidth h. The number of iterations needed with ILUT as a smoother, is effectively independent of the approximation order p.   h-Multigrid as stand-alone solver Table 8 shows the number of V-cycles needed to achieve convergence using a h-multigrid method. As expected from the spectral analysis, the number of V-cycles needed with Gauss-Seidel is in general independent of the mesh width h, but strongly depends on the approximation order p. The use of ILUT as a smoother leads to a h-multigrid which converges for all configurations and exhibits both independence of h and p. Furthermore, the number of iterations needed for convergence is significantly lower. Compared to the use of p-multigrid as a method, the results are very similar. For benchmark 3, however, the number of iterations needed with h-multigrid using ILUT as a smoother is slightly lower compared to the p-multigrid method. For some configurations, the h-multigrid does not converge when applied to the threedimensional benchmark, which is denoted by (−).  p-Multigrid as a preconditioner As an alternative, the p-multigrid method can be applied as a preconditioner within a BiCGSTAB method. In the preconditioning phase of each iteration, a single V-cycle is applied. The number of iterations needed to achieve convergence can be found in Table 9. When applying Gauss-Seidel as a p = 2  smoother, the number of iterations needed with BiCGSTAB is significantly lower compared to the number of p-multigrid V-cycles and even restores stability for higher values of p (see Table 7). However, a dependence of the iteration numbers on p is still present. When adopting ILUT as a smoother, the number of iterations needed for convergence slightly decreases compared to the number of p-multigrid V-cycles for all configurations and benchmarks. Furthermore, the number of iterations is independent of both h and p. h-Multigrid as a preconditioner The number of iterations needed to achieve convergence with h-multigrid can be found in Table 10. Note that, since the h-multigrid method is symmetric,  Table 9: Number of iterations needed to achieve convergence with BiCGSTAB, using p-multigrid as preconditioner.
a Conjugate Gradient (CG) method can be applied as a Krylov solver. In general, a single iteration performed with a BiCGSTAB method is twice as expensive compared to a single CG iteration. Results with a CG method have been added between brackets in Table 10.

CPU times
Besides iteration numbers, computational times have been determined when adopting p-multigrid and h-multigrid as a stand-alone solver. A serial implementation in the C++ library G+Smo [39] is considered on a Intel(R) Core(TM) i7-8650U CPU (1.90GHz). Figure 7 illustrates the CPU times obtained for the p-multigrid and h-multigrid method for the first benchmark. Tables with the detailed CPU times can be found in Appendix C and Appendix D. The assembly times denote the CPU time needed to assemble all operators, including the prolongation and restriction operators. Note that, for the pmultigrid method more operators have to be assembled. However, most of the operators in the p-multigrid method are assembled at level p = 1, where the number of nonzero entries is significantly lower compared to the matrices resulting from high order discretizations. As a consequence the total assembly costs are lower with p-multigrid compared to h-multigrid for higher values of the approximation order p. With respect to the setup costs of the smoother, similar observations can be made: For higher values of p, the ILUT factorization costs are significantly higher for the h-multigrid method. The time needed to solve linear systems is slightly lower for the h-multigrid methods, since the costs of a single V-cycle is lower compared to the p-multigrid method. When adopting Gauss-Seidel as a smoother, the time needed to solve the linear systems is significantly higher compared to the use of ILUT. However, since the factorizations costs are relatively high, the p-multigrid/h-multigrid methods using ILUT as a smoother are faster for only a limited amount of configurations.
Remark 2: For all numerical experiments, the 'coarse grid' operators of the multigrid hierarchy have been obtained by rediscretizing the bilinear form in Equation (2)). Alternatively, all operators of the h-multigrid hierarchy could be obtained by applying the Galerkin projection. Furtermore, alternative (and more efficient) assembly strategies exist, as mentioned in Section 3. Therefore, the assembly, smoother setup and solving costs are presented separately in this Section. Comparison with an alternative smoother Throughout this paper, the use of ILUT and Gauss-Seidel as a smoother has been investigated within a p-multigrid and h-multigrid method. However, alternative smoothers have been developed for h-multigrid methods in recent years. For example, a smoother based on stable splittings of spline spaces [18]. In this section, we compare the p-multigrid (adopting ILUT as a smoother) with an h-multigrid method using this smoother. Note that this smoother has been extended to multipatch domains as well [40]. For this comparison, we consider the CDR-equation on the unit square with: Homogeneous Neumann boundary conditions are applied and the right-hand side is given by: f (x, y) = 2π 2 sin(π(x + 1 2 ))sin(π(y + 1 2 )).  Table 11: Comparison of a p-multigrid using ILUT as a smoother with a h-multigrid method based on stable splittings of spline spaces [18].
CPU times for assembly, setting up the smoother and solving the linear system are presented in Figure 8. Again, a serial implementation in the C++ library G+Smo [39] is considered on a Intel(R) Core(TM) i7-8650U CPU (1.90GHz). Detailed CPU times can be found in Appendix E. The time needed to assembly the operators is comparable for the p-multigrid and hmultigrid method. However, setting up the ILUT smoother is significantly more expensive compared to the smoother from [18]. On the other hand, the CPU needed to solve the problem is lower when adopting the p-multigrid method. The total solver costs are lower for all configurations when adopting the smoother based on stable splitting of subspaces. However, we would like to emphasize that the proposed p-multigrid method can easily be implemented and applied for a wide variety of problems (multipatch, variable coefficients) without the need of tuning a parameter or development of a specific smoother.

Truncated Hierarchical B-splines (THB-splines)
Finally, to illustrate the versatility of the proposed p-multigrid method, we consider discretizations obtained with THB-splines [41]. THB-splines are the result of a local refinement strategy, in which a subset of the basis functions on the fine level are truncated. As a result, not only linear independence and non-negativity are preserved (as with HB-splines [42,43]), but also the partition of unity property. In the literature, the use of multigrid methods for THB-spline discretizations is limited and an ongoing topic of research [44,45,46]. We consider Poisson's equation on the unit square, where the exact solution is the same as for the second benchmark. Starting from a tensor product B-spline basis with meshwidth h and order p, two and three levels of refinement are added as shown in Figure 9, leading to a THB-spline basis consisting of, respectively, three and four levels.
(a) (b) Figure 9: Two hierarchical mesh adopted for THB-Spline basis with the second (green), third (orange) and fourth (red) refinement levels coloured. Figure 10 shows the sparsity pattern of the stiffness matrix and the ILUT factorization for p = 4 and h = 2 −5 for configuration (b). Compared to the (standard) tensor-product B-spline basis the bandwith of the stiffness matrix significantly increases. Table 12 shows the results obtained with p-multigrid applied as a stand-alone solver. The number of iterations needed with pmultigrid (and ILUT as a smoother) depends only mildly on p. Furthermore, the number of iterations are significantly lower compared to the use of Gauss-Seidel as a smoother. For the configurations denoted in bold, a fillfactor of 2 was adopted, to prevent the p-multigrid from diverging. Figure 11 illustrates  does not reduce the norm of the (generalized) eigenvectors, while a fillfactor of 2 reduces the eigenvectors over the entire spectrum. In general, a higher fillfactor was necessary for only a limited amount of configurations. For all numerical experiments, smoothing is performed globally at each level of the multigrid hierarchy. In general, local smoothing is often adopted to ensure optimal order of the complexity. Results presented in this Section should be considered as a first step towards the use of p-multigrid methods for THB-spline discretizations. Future research should focus on more efficient applications of p-multigrid solvers for THB-spline discretizations.

Conclusions
In this paper, we presented a p-multigrid method that uses ILUT factorization as a smoother and compared this with different smoothers and coarsening strategy (e.g. h-multigrid). In contrast to classical smoothers, (i.e. Gauss-Seidel), the reduction factors of the general eigenvectors associated with high-frequency modes do not increase when adopting ILUT as a smoother for higher values of p. This results in asymptotic convergence factors which are independent of both the mesh width h and approximation order p for both p-multigrid and h-multigrid methods adopting this smoother. Furthermore, we observed that, assuming an exact coarse grid correction, coarsening in h leads to a more effective coarse grid correction compared to a correction obtained by coarsening in p. Numerical results, obtained for Poisson's equation on a variety of domains and the CDR equation on the unit square have been presented when using p-multigrid and h-multigrid as stand-alone solver or as a preconditioner within a BiCGSTAB or CG method. For all configurations, the number of iterations needed when using ILUT as a smoother are significantly lower compared to the use of Gauss-Seidel, while the number of iterations needed with p-multigrid are very similar to those needed with an h-multigrid method. Hence, the smoother determines to a great extent the resulting convergence rate of the multigrid method. CPU times have been presented for the pmultigrid and h-multigrid method using both smoothers. For low values of p, the use of h-multigrid combined with Gauss-Seidel as a smoother lead to the lowest CPU times. For higher values of p, however, the use of p-multigrid adopting ILUT becomes more efficient, due to the lower assembly and factorizations costs. Note that this is the result of the smaller stencil of the B-spline functions at level p = 1 compared to high order B-spline functions. The p-multigrid method using ILUT as a smoother has been compared as well to an h-multigrid method with a non-standard smoother [18]. Result show that the total solving costs are lower when adopting h-multigrid with this smoother due to the lower setup costs of the smoother. Finally, the p-multigrid method has been succesfully applied to solve linear systems of equations arising from THB-spline discretizations. In general, a significantly lower number of iterations was needed compared to the use of Gauss-Seidel as a smoother. For a limited number of configurations, a higher fillfactor of 2 (instead of 1) was necessary to achieve convergence. Future research will focus on the application of p-multigrid methods for higher-order partial differential equations (i.e. biharmonic equation), where the use of basis functions with high continuity is necessary. Furthermore, local smoothing within the p-multigrid method should be considered to make it more efficient for THB-spline discretizations. Finally, the use of block ILUT as a smoother in case of a multipatch geometry will be investigated.

Acknowledgements
The authors would like to thank Prof. Kees Oosterlee from TU Delft for fruitful discussions with respect to p-multigrid methods.
Although lumping the mass matrix in Equation (22) and (24) slightly influences the number of iterations needed to achieve convergence with the p-multigrid method, the overall accuracy of the p-multigrid method is not affected. To illustrate this, the L 2 error is presented in Table B.16 for all configurations when adopting a lumped or consistent mass matrix in the prolongation and restriction operator.
Appendix E. CPU times compared to an alternative smoother