High-order multigrid strategies for hybrid high-order discretizations of elliptic equations

This study compares various multigrid strategies for the fast solution of elliptic equations discretized by the hybrid high-order (HHO) method. Combinations of h-, p-, and hp-coarsening strategies are considered, together with diverse intergrid transfer operators. Comparisons are carried out experimentally on 2D and 3D test cases, with structured and unstructured meshes, and with nested and non-nested hierarchies. Advantages and drawbacks of each strategy are discussed for each case to establish simplified guidelines for the optimization of the time to solution.

In this article, we propose a comparative study of various multigrid strategies for high-order HHO systems. Three main strategies are explored, characterized by the type of coarsening they implement: (i) h-coarsening only, where the mesh is coarsened and the same high polynomial degree is preserved at every level; (ii) p-coarsening to lower the degree, followed by h-coarsening once the desired low order is reached; (iii) hp-coarsening to simultaneously lower the degree and coarsen the mesh, followed by h-coarsening once the desired low order is reached. In all cases, we assume that the h-coarsening phase can carry on until a small enough system is reached. Similar studies have been conducted for the continuous finite element method (Reference 15), the discontinuous Galerkin method (Reference 16), and isogeometric analysis (Reference 17). This work follows up on References 9 and 10, where an efficient h-multigrid method was designed. The convergence of that algorithm, which preserves the polynomial degree at every multigrid level, is shown to be independent of the mesh size (and weakly dependent on the polynomial degree). The intergrid operators defined in those papers shall serve as a basis for the h-, p-, and hp-strategies of the present work. In particular, the prolongation operator leverages, in an intermediate step, the higher-order potential reconstruction defined by the HHO discretization. This operator gains one additional degree in the approximation of the error during the intergrid transfer. However, the degree is lowered back to its original value at the end of the prolongation operation in order to preserve the same value at all multigrid levels. Although this trick brings a significant improvement of the convergence rate of the multigrid method (see Reference 9, Section 4.4.1), lowering back the degree at the end of the prolongation seems to be a waste of possibly valuable higher-order information.
One can then legitimately wonder whether the reconstruction of higher degree could be better exploited in the context of p-multigrid, where the extra degree could be preserved after prolongation. This idea is at the origin of the alternative strategies discussed in this work. They consist in using the prolongation operator designed for h-coarsening also when p- and hp-coarsening are employed, that is, when two successive levels do not share the same polynomial degree.
The article is organized as follows. Section 2 first introduces the diffusion equation with piecewise constant permeability tensor as a model problem, then recalls its HHO discretization and the assembly of the discrete problem. Section 3 defines the multigrid ingredients, especially the interlevel transfer operators used to build the method. The term "interlevel transfer operators" is used to cover both the h- and p-multigrid cases. In Section 4, we define the high-order strategies we consider. In short, they correspond to the combination of interlevel transfer operators and a global coarsening strategy defining problem reductions with respect to h, p, and hp. In the numerical tests of Section 5, the various strategies are compared on different 2D and 3D test cases with practical, moderate orders of approximation. Structured and unstructured meshes are used, and the multigrid method is applied to hierarchies of nested and non-nested grids. The multigrid method is employed both as a solver and as a preconditioner to a Krylov iteration. Finally, the concluding Section 6 presents guidelines for the best strategy for each test case. In particular, if the setup is neglected, we find that, in the majority of cases, the most efficient one is found among the most intuitive strategies, namely, h-multigrid only and p-multigrid on top of h-multigrid at low order. Strategies designed to take advantage of the high-order reconstruction without subsequent degree reduction seem unrewarding in practice. Even though they can sometimes prevail, the gap measured with the other strategies is in general not large enough to be significant. Nonetheless, the strategy involving simultaneous hp-coarsening can prove to be useful if the setup cost cannot be neglected. Besides these conclusions, the provided numerical results also offer important insight about the practical computational cost required to solve large-scale problems.
Indeed, since all strategies yield an optimal solver, the test results can be extrapolated to problems of larger size.

The diffusion problem
Let Ω ⊂ ℝ^d, d ∈ {2, 3}, be an open, connected, bounded polyhedral domain with boundary ∂Ω. We consider the following boundary value problem: Find u : Ω → ℝ such that

  −∇⋅(K∇u) = f in Ω,  u = 0 on ∂Ω,  (1)

where f : Ω → ℝ denotes the source function and the permeability tensor K : Ω → ℝ^{d×d} is assumed to be symmetric, uniformly positive definite, and piecewise constant over a fixed partition of Ω into polyhedra.

Mesh definition and notation
Let the couple  h ∶= ( h ,  h ) define a mesh of the domain Ω, that is,  h is a set of open, polyhedral elements such that ⋃ T∈ h T = Ω,  h is the set of element faces, and h ∶= max T∈ h h T with h T denoting the diameter of T. The mesh is assumed to match the geometrical requirements of Definition 1.4 in Reference 1.  h is split into two disjoint subsets:  B h , collecting the boundary faces lying on Ω, and  I h , collecting the faces located in Ω. For all T ∈  h , we collect into  T the faces of T. Reciprocally, for all F ∈  h ,  F collects the elements that admit F as a face. We also denote by n TF the unit vector normal to F pointing out of T. In the context of problem (1), we further assume that the diffusion tensor K is constant over each element, and we let K T ∶= K| T for all T ∈  h , as well as K TF ∶= K T n TF ⋅ n TF for all F ∈  T . Finally, for all X ∈  h ∪  h , we denote by (⋅, ⋅) X the standard inner product of L 2 (X) or L 2 (X) d .

Local and broken polynomial spaces
The HHO method hinges on discrete unknowns representing polynomial functions local to elements and faces. So, for all m ∈ ℕ_0 and all T ∈ 𝒯_h (resp. F ∈ ℱ_h), we denote by ℙ^m(T) (resp. ℙ^m(F)) the space spanned by the restriction to T (resp. F) of d-variate polynomials of total degree ≤ m. From these local polynomial spaces, we can construct the following broken polynomial spaces, respectively supported by the mesh and its skeleton:

  ℙ^m(𝒯_h) := {v_{𝒯_h} : v_{𝒯_h}|_T ∈ ℙ^m(T) for all T ∈ 𝒯_h},  ℙ^m(ℱ_h) := {v_{ℱ_h} : v_{ℱ_h}|_F ∈ ℙ^m(F) for all F ∈ ℱ_h}.  (2)

Typically, the latter space will be used with m = k, the polynomial degree of the scheme introduced in the next section.

Discrete formulation
Let k ∈ ℕ_0 be an integer corresponding to the polynomial degree of the scheme. The global and local spaces of hybrid variables are, respectively, defined as

  U^k_h := ℙ^k(𝒯_h) × ℙ^k(ℱ_h),  U^k_T := ℙ^k(T) × {(v_F)_{F∈ℱ_T} : v_F ∈ ℙ^k(F) for all F ∈ ℱ_T}.

For any v_h ∈ U^k_h, we denote by v_T = (v_T, (v_F)_{F∈ℱ_T}) ∈ U^k_T its restriction to T ∈ 𝒯_h (with the usual slight abuse, v_T also denotes the element component). The homogeneous Dirichlet boundary condition is strongly accounted for in the following subspaces:

  U^k_{h,0} := {v_h ∈ U^k_h : v_F ≡ 0 for all F ∈ ℱ^B_h},

and, locally, U^k_{T,0} is defined analogously for the faces in ℱ_T ∩ ℱ^B_h. The global HHO bilinear form associated to the variational formulation of problem (1) is

  a_h(u_h, v_h) := Σ_{T∈𝒯_h} [ (K_T ∇p^{k+1}_T u_T, ∇p^{k+1}_T v_T)_T + s_T(u_T, v_T) ].

In this expression, the first term is responsible for consistency while the second is required to ensure stability of the scheme. The consistency term involves the local potential reconstruction p^{k+1}_T : U^k_T → ℙ^{k+1}(T), such that, for all v_T ∈ U^k_T and all w ∈ ℙ^{k+1}(T),

  (K_T ∇p^{k+1}_T v_T, ∇w)_T = −(v_T, ∇⋅(K_T ∇w))_T + Σ_{F∈ℱ_T} (v_F, K_T ∇w ⋅ n_{TF})_F,  with (p^{k+1}_T v_T − v_T, 1)_T = 0.  (3)

Given the local interpolate v_T ∈ U^k_T of a function v ∈ L²(Ω), p^{k+1}_T reconstructs an approximation of v of degree k + 1, that is, one degree higher than the element or face components. The stabilization bilinear form s_T depends on its arguments only through the difference operators

  δ^k_T v_T := π^k_T(p^{k+1}_T v_T) − v_T,  δ^k_{TF} v_T := π^k_F(p^{k+1}_T v_T) − v_F for all F ∈ ℱ_T,

where π^k_T and π^k_F denote the L²-orthogonal projectors onto ℙ^k(T) and ℙ^k(F). These operators capture the higher-order correction that the reconstruction p^{k+1}_T adds to the element and face unknowns, respectively. A classical expression for s_T is

  s_T(u_T, v_T) := Σ_{F∈ℱ_T} (K_{TF}/h_F) (δ^k_{TF} u_T − δ^k_T u_T, δ^k_{TF} v_T − δ^k_T v_T)_F.

The global discrete problem then reads: Find u_h ∈ U^k_{h,0} such that

  a_h(u_h, v_h) = (f, v_{𝒯_h})_Ω for all v_h ∈ U^k_{h,0}.  (4)

Assembly
The choice of basis functions for the local polynomial spaces defines the local contributions for the assembly of the algebraic problem corresponding to (4). For all T ∈ 𝒯_h, those contributions are, respectively for the left- and right-hand side, the matrix A_T and the vector B_T, partitioned as

  A_T = [ A_{TT}  A_{Tℱ_T} ; A_{ℱ_T T}  A_{ℱ_T ℱ_T} ],  B_T = [ b_T ; 0 ],  (5)

in which the unknowns have been numbered so that element unknowns come first and face unknowns follow. We refer to Appendix B in Reference 1 for further details. Assembly of the local contributions and enforcement of the Dirichlet condition by elimination of the boundary unknowns yield the global linear system

  [ A_{𝒯_h𝒯_h}  A_{𝒯_hℱ^I_h} ; A_{ℱ^I_h𝒯_h}  A_{ℱ^I_hℱ^I_h} ] [ v_{𝒯_h} ; v_{ℱ^I_h} ] = [ b_{𝒯_h} ; b_{ℱ^I_h} ].  (6)

Note that A_{𝒯_h𝒯_h} is block-diagonal, with the blocks (A_{TT})_{T∈𝒯_h}. It is therefore inexpensive to invert, which is advantageously used in the so-called process of static condensation. Indeed, the element unknowns can be expressed in terms of the face unknowns in the first equation of (6), yielding

  v_{𝒯_h} = A_{𝒯_h𝒯_h}^{−1} (b_{𝒯_h} − A_{𝒯_hℱ^I_h} v_{ℱ^I_h}),  (7)

that is, locally,

  v_T = A_{TT}^{−1} (b_T − A_{Tℱ_T} v_{ℱ_T}) for all T ∈ 𝒯_h,  (8)

where v_T denotes the restriction to T of v_{𝒯_h}, and v_{ℱ_T} the restriction of v_{ℱ^I_h} to the faces of T interior to the domain, completed by zeros for boundary faces. Finally, replacing v_{𝒯_h} with its expression (7) in the second equation of (6) gives the so-called trace or condensed system

  (A_{ℱ^I_hℱ^I_h} − A_{ℱ^I_h𝒯_h} A_{𝒯_h𝒯_h}^{−1} A_{𝒯_hℱ^I_h}) v_{ℱ^I_h} = b_{ℱ^I_h} − A_{ℱ^I_h𝒯_h} A_{𝒯_h𝒯_h}^{−1} b_{𝒯_h}.  (9)

Solving (9) for the face unknowns is the costliest operation and the purpose of our multigrid methods, after which the element unknowns can be locally and independently recovered via (8).
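As an illustration only (not the authors' implementation), the condensation and recovery steps corresponding to (6)-(9) can be sketched on a dense toy system with NumPy; in the actual HHO setting, the element block is block-diagonal and inverted element by element, whereas a global dense solve is used here purely for readability.

```python
import numpy as np

def condense(A_TT, A_TF, A_FT, A_FF, b_T, b_F):
    """Static condensation: Schur complement of the element block.

    In practice A_TT is block-diagonal (one block per element) and
    therefore cheap to invert; a dense solve is used here for brevity.
    """
    A_TT_inv_ATF = np.linalg.solve(A_TT, A_TF)
    A_TT_inv_bT = np.linalg.solve(A_TT, b_T)
    S = A_FF - A_FT @ A_TT_inv_ATF   # condensed (trace) matrix
    g = b_F - A_FT @ A_TT_inv_bT     # condensed right-hand side
    return S, g

def recover_elements(A_TT, A_TF, b_T, v_F):
    """Local recovery of the element unknowns from the face solution."""
    return np.linalg.solve(A_TT, b_T - A_TF @ v_F)

# Toy SPD system: condensation + recovery must match the monolithic solve.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)        # SPD 6x6 stand-in for the assembled system
b = rng.standard_normal(6)
A_TT, A_TF, A_FT, A_FF = A[:4, :4], A[:4, 4:], A[4:, :4], A[4:, 4:]
b_T, b_F = b[:4], b[4:]

S, g = condense(A_TT, A_TF, A_FT, A_FF, b_T, b_F)
v_F = np.linalg.solve(S, g)          # condensed system
v_T = recover_elements(A_TT, A_TF, b_T, v_F)
assert np.allclose(np.concatenate([v_T, v_F]), np.linalg.solve(A, b))
```

The assertion checks that eliminating the element unknowns and recovering them afterwards reproduces the monolithic solution exactly (up to round-off).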

Multilevel hierarchy
Assume a multigrid hierarchy of L levels indexed by ℓ ∈ {1, …, L} and sorted so that L refers to the finest level and 1 to the coarsest one. For each level ℓ ∈ {1, …, L}, denote by h_ℓ the corresponding mesh size. For the sake of simplicity, the quantities so far subscripted by h are now simply subscripted by ℓ instead of h_ℓ, so we denote by ℳ_ℓ := (𝒯_ℓ, ℱ_ℓ) instead of ℳ_{h_ℓ} := (𝒯_{h_ℓ}, ℱ_{h_ℓ}) the mesh at level ℓ. Finally, denote by k_ℓ the HHO polynomial degree at level ℓ. Relative to those levels, consider a hierarchy of (not necessarily nested) polyhedral meshes (ℳ_ℓ)_{ℓ∈{1, …, L}} satisfying the requirements of Section 2.2. Considering two successive levels ℓ (fine) and ℓ − 1 (coarse), the coarsening principles impose that k_{ℓ−1} ≤ k_ℓ and h_{ℓ−1} ≥ h_ℓ. Specifically, we distinguish three coarsening strategies:
• k_{ℓ−1} = k_ℓ and ℳ_{ℓ−1} ≠ ℳ_ℓ (h-coarsening);
• k_{ℓ−1} < k_ℓ and ℳ_{ℓ−1} = ℳ_ℓ (p-coarsening);
• k_{ℓ−1} < k_ℓ and ℳ_{ℓ−1} ≠ ℳ_ℓ (hp-coarsening).
We further assume that mesh coarsening not only involves the coarsening of the elements, but also that of the faces; see Reference 9, Section 4.4.3 for details and justifications.

Interlevel transfer operators
In this section, we consider two successive levels ℓ (fine) and ℓ − 1 (coarse), and define various prolongation and restriction operators.

Prolongation by decondensation (for h-, p-, and hp-coarsening)
The prolongation operator employed in our previous works 9,10 in the context of h-multigrid is here extended to the cases where k_ℓ > k_{ℓ−1} and/or 𝒯_{ℓ−1} = 𝒯_ℓ, in order to handle p- and hp-coarsening as well. Its philosophy and construction remain unchanged, and hinge on a sequence of operations: decondensation of the cell unknowns to reconstruct a potential in the cells from values on the coarse faces; increase of the polynomial degree via the higher-order reconstruction operator; transfer to the fine mesh; trace on the fine faces. This sequence is formalized by the definition of the following operators:
• Θ^k_ℓ : ℙ^k(ℱ_ℓ) → ℙ^{k+1}(𝒯_ℓ) for all ℓ and k: Let v_{ℱ_ℓ} ∈ ℙ^k(ℱ_ℓ) denote the argument of Θ^k_ℓ. For all T ∈ 𝒯_ℓ, let us denote by v_{ℱ_T} := (v_F)_{F∈ℱ_T} the restriction of v_{ℱ_ℓ} to T, and by 𝗏_{ℱ_T} its algebraic representation as vectors of coefficients in the chosen polynomial bases. With these notations, we define the algebraic volumic potential 𝗏_T by reversing the static condensation, that is, via (8) without the constant term:

  𝗏_T := −A_{TT}^{−1} A_{Tℱ_T} 𝗏_{ℱ_T}.  (10)

Once the potential v_T ∈ ℙ^k(T) is retrieved from (10) via its algebraic representation, we improve its accuracy through the higher-order reconstruction operator p^{k+1}_T defined by (3). The local contribution to Θ^k_ℓ is therefore given by

  (Θ^k_ℓ v_{ℱ_ℓ})|_T := p^{k+1}_T (v_T, v_{ℱ_T}).  (11)

• J^k_{ℓ−1,ℓ} : ℙ^k(𝒯_{ℓ−1}) → ℙ^k(𝒯_ℓ) for all ℓ and k: This operator handles the transfer from the coarse to the fine cells without altering the polynomial degree. If the meshes are nested, then J^k_{ℓ−1,ℓ} is merely defined as the natural injection operator, induced by the embedding of the function spaces. In particular, if both meshes are identical (p-coarsening), it reduces to the identity. If the meshes are non-nested, we define J^k_{ℓ−1,ℓ} as the L²-orthogonal projector onto ℙ^k(𝒯_ℓ).
• Π^{k,k′}_ℓ : ℙ^k(𝒯_ℓ) → ℙ^{k′}(ℱ_ℓ) for all ℓ, k, and k′: Let v := (v_T)_{T∈𝒯_ℓ} ∈ ℙ^k(𝒯_ℓ) denote the argument of Π^{k,k′}_ℓ. Let F ∈ ℱ_ℓ and, if F ∈ ℱ^I_ℓ, let T_1, T_2 denote the distinct elements in 𝒯_F ⊂ 𝒯_ℓ. The local contribution of F corresponds to a weighted average of the traces of degree k′ computed on both sides of F:

  (Π^{k,k′}_ℓ v)|_F := w_{T_1F} π^{k′}_F(v_{T_1}|_F) + w_{T_2F} π^{k′}_F(v_{T_2}|_F),  (12)

with weighting values w_{T_1F}, w_{T_2F} chosen relative to the diffusion coefficients K_{T_1F}, K_{T_2F} (for F ∈ ℱ^B_ℓ, the trace from the single element in 𝒯_F is taken). We refer the reader to References 9 and 10 for more details about these operators. The prolongation operator by decondensation P_ℓ : ℙ^{k_{ℓ−1}}_0(ℱ_{ℓ−1}) → ℙ^{k_ℓ}_0(ℱ_ℓ) is defined as

  P_ℓ := Π^{k_{ℓ−1}+1,k_ℓ}_ℓ ∘ J^{k_{ℓ−1}+1}_{ℓ−1,ℓ} ∘ Θ^{k_{ℓ−1}}_{ℓ−1}.  (13)

The definition of P_ℓ is represented in Figure 1. In this composition, we start at the coarse level ℓ − 1, from face-defined polynomials of degree k_{ℓ−1}. The application of Θ^{k_{ℓ−1}}_{ℓ−1} reconstructs cell polynomials of degree k_{ℓ−1} + 1 on the coarse mesh, and J^{k_{ℓ−1}+1}_{ℓ−1,ℓ} transfers that broken polynomial to the fine mesh 𝒯_ℓ (possibly equal to 𝒯_{ℓ−1}, in which case J is the identity).
Finally, Π^{k_{ℓ−1}+1,k_ℓ}_ℓ defines polynomials of degree k_ℓ on the fine faces. Notice that, in the case of p- or hp-coarsening, k_{ℓ−1} + 1 ≤ k_ℓ, so no degree reduction is performed in (12). The use of this operator with p- and hp-coarsening then gives rise to novel strategies in which, unlike with h-coarsening, no information is lost in the degree reduction. These strategies are discussed in Section 4.

Prolongation by natural injection (for p-coarsening only)
In p-coarsening, we work with the same mesh at both levels so that, in particular, ℱ_{ℓ−1} = ℱ_ℓ. Then, k_{ℓ−1} < k_ℓ implies ℙ^{k_{ℓ−1}}(F) ⊂ ℙ^{k_ℓ}(F) for all F ∈ ℱ_{ℓ−1}, which justifies the natural injection operators

  I^{k_{ℓ−1},k_ℓ}_F : ℙ^{k_{ℓ−1}}(F) → ℙ^{k_ℓ}(F), v ↦ v, for all F ∈ ℱ_ℓ.

The global prolongation operator by natural injection I^{k_{ℓ−1},k_ℓ} : ℙ^{k_{ℓ−1}}(ℱ_{ℓ−1}) → ℙ^{k_ℓ}(ℱ_ℓ) is then defined facewise from these local contributions.

Restriction by adjoint (for h-, p-, and hp-coarsening)
The restriction operator is defined as the adjoint of the prolongation operator with respect to the coarse and fine skeleton-defined inner products. Consequently, the matrix representation of the restriction is the transpose of the matrix representation of the prolongation.
Restriction by L²-orthogonal projection (for p-coarsening only)

p-coarsening implies ℱ_{ℓ−1} = ℱ_ℓ, and we denote by Π^{k_ℓ,k_{ℓ−1}} the L²-orthogonal projector from ℙ^{k_ℓ}(ℱ_ℓ) onto ℙ^{k_{ℓ−1}}(ℱ_ℓ), the latter space verifying (2). It is interesting to notice that Π^{k_ℓ,k_{ℓ−1}} is the adjoint operator of the natural injection I^{k_{ℓ−1},k_ℓ} w.r.t. the L²-inner product on the mesh skeleton. Indeed, for all F ∈ ℱ_ℓ, all v ∈ ℙ^{k_{ℓ−1}}(F), and all w ∈ ℙ^{k_ℓ}(F),

  (I^{k_{ℓ−1},k_ℓ}_F v, w)_F = (v, Π^{k_ℓ,k_{ℓ−1}}_F w)_F.

Smoothers
In order to relax together the sets of unknowns related to the same local polynomial, block versions of standard fixed-point methods are used. Block Jacobi, block SOR, and their derivatives are typical choices. In our numerical tests, we use block Gauss-Seidel. The block size corresponds to the number of degrees of freedom per face, which is given, for each level ℓ, by

  n_{k_ℓ} := dim ℙ^{k_ℓ}(F) = C(k_ℓ + d − 1, d − 1),

C(⋅,⋅) denoting the binomial coefficient, since faces are (d − 1)-dimensional objects. Notice that the block size depends on the multigrid level and the coarsening method. If level ℓ − 1 is obtained from level ℓ by p-coarsening, then the block size at level ℓ is relative to k_ℓ and the block size at level ℓ − 1 is relative to k_{ℓ−1} < k_ℓ. Considering a 3D example with k_ℓ = 3 and k_{ℓ−1} = 1, the block sizes shall be 10 at level ℓ and 3 at level ℓ − 1. On the contrary, if only h-coarsening is used, then k_ℓ remains constant across the multigrid levels, and so does the block size.
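The face block size is simply the dimension of the space of polynomials of total degree at most k on a (d − 1)-dimensional face. A short check of the values quoted above (the helper name is illustrative):

```python
from math import comb

def face_block_size(k: int, d: int) -> int:
    """Number of DOFs per face: dimension of the space of polynomials of
    total degree <= k on a (d-1)-dimensional face, i.e. C(k + d - 1, d - 1)."""
    return comb(k + d - 1, d - 1)

assert face_block_size(3, 3) == 10   # 3D example from the text, k = 3
assert face_block_size(1, 3) == 3    # 3D, k = 1
assert face_block_size(5, 2) == 6    # 2D: faces are edges, k + 1 DOFs
```

In 2D the formula reduces to k + 1 DOFs per edge, in 3D to (k + 1)(k + 2)/2 DOFs per face.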

Efficient implementation with hierarchical, L 2 -orthogonal bases
As pointed out in Reference 11, the implementation of the p-multigrid part of the algorithm can be optimized by using modal, hierarchical, and L²-orthogonal local polynomial bases on the faces. Assume k_ℓ > k_{ℓ−1} and ℱ_{ℓ−1} = ℱ_ℓ. Let F ∈ ℱ_ℓ, and let ℬ^{k_ℓ}_F be a hierarchical, orthogonal basis of ℙ^{k_ℓ}(F). The hierarchical structure allows us to extract from ℬ^{k_ℓ}_F the basis functions of degree at most k_{ℓ−1} to obtain a basis ℬ^{k_{ℓ−1}}_F of ℙ^{k_{ℓ−1}}(F). The following efficient implementations are then possible:
1. Natural injection: Given v ∈ ℙ^{k_{ℓ−1}}(F) and 𝗏 ∈ ℝ^{n_{k_{ℓ−1}}} the vector of coefficients in the decomposition of v on the modal basis ℬ^{k_{ℓ−1}}_F, the algebraic representation of I^{k_{ℓ−1},k_ℓ}_F v is obtained by simply extending 𝗏 to ℝ^{n_{k_ℓ}}, via the padding of (n_{k_ℓ} − n_{k_{ℓ−1}}) zero coefficients. This operation can then be applied dynamically, without any storage or preliminary construction in a setup phase. Note that this implementation only hinges on the basis being hierarchical; orthogonality is not required.
2. L²-orthogonal projection: Reciprocally, given v ∈ ℙ^{k_ℓ}(F) and 𝗏 ∈ ℝ^{n_{k_ℓ}} the vector of coefficients in the decomposition of v on the modal basis ℬ^{k_ℓ}_F, the algebraic representation of Π^{k_ℓ,k_{ℓ−1}}_F v is obtained by shrinking 𝗏 to a size of n_{k_{ℓ−1}}, that is, by discarding the coefficients corresponding to the degrees > k_{ℓ−1}. Like the preceding operator, the projection can then also be applied on the fly, without setup. This implementation hinges on the orthogonality of the basis functions, and is easy to prove via the orthogonal direct sum ℙ^{k_ℓ}(F) = ℙ^{k_{ℓ−1}}(F) ⊕ span(ℬ^{k_ℓ}_F \ ℬ^{k_{ℓ−1}}_F).
3. Assembly of the coarse matrix: Instead of resorting to the rediscretization of the equation at the order k_{ℓ−1}, for the same reasons as for the L²-orthogonal projection, it suffices to discard the high-order coefficients in the matrix of level ℓ. This corresponds to extracting the top-left corner of size n_{k_{ℓ−1}} × n_{k_{ℓ−1}} from each original block of size n_{k_ℓ} × n_{k_ℓ}.
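With hierarchical, orthogonal modal bases, the three operations above reduce to index manipulations on coefficient vectors and matrix blocks. A minimal NumPy sketch (helper names are illustrative, not from the authors' code):

```python
import numpy as np

def inject(v, n_fine):
    """Natural injection on a hierarchical modal basis: pad with zeros."""
    return np.concatenate([v, np.zeros(n_fine - v.size)])

def project(v, n_coarse):
    """L2-orthogonal projection on a hierarchical, orthogonal basis:
    discard the coefficients of the degrees above the coarse one."""
    return v[:n_coarse]

def coarsen_block(A_block, n_coarse):
    """Coarse-level face block: top-left corner of the fine-level block."""
    return A_block[:n_coarse, :n_coarse]

# 3D faces with fine degree 3 (10 DOFs) and coarse degree 1 (3 DOFs).
n_fine, n_coarse = 10, 3
v = np.arange(1.0, 1.0 + n_coarse)
assert np.allclose(project(inject(v, n_fine), n_coarse), v)  # round trip
A_block = np.arange(float(n_fine * n_fine)).reshape(n_fine, n_fine)
assert coarsen_block(A_block, n_coarse).shape == (n_coarse, n_coarse)
```

The round-trip assertion mirrors the adjoint pair of Section 3.2: projecting an injected coarse function returns it unchanged.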
One can remark that the matrices of both the natural injection and the L 2 -orthogonal projection are rectangular identity matrices (one being the transpose of the other), so, in the case where they are both used as transfer operators, the coarse matrix also corresponds to the Galerkin operator. The two-grid scheme is then in the configuration of the variational framework, ensuring convergence.

Practical implementation of suitable polynomial bases
First of all, recall that the above optimizations require only the face polynomial bases to be hierarchical and orthogonal. In principle, the bases for the elements do not have to be. However, in the presence of distorted elements, nonorthogonal bases may dramatically affect the condition number of the matrix. In this case, or if high precision is required from the approximation, using orthogonal bases for the elements is also recommended. An orthogonal basis can be locally constructed by applying the modified Gram-Schmidt (MGS) algorithm (w.r.t. the local L²-inner product) to an initial hierarchical basis composed of Cartesian products of local monomials. Note that MGS also performs normalization, which means that the basis functions are scaled according to the size of the element in which they are defined. This can, in fact, raise numerical issues in the simultaneous presence of large and comparatively small elements, which typically happens when the mesh is locally refined. Although we find that the normalization has little effect on the condition number, the scaling of the basis functions strongly affects the convergence properties of our multigrid method in the presence of aggressive local refinement. To maintain a scaling of the basis functions that is suited for the multigrid algorithm, we employ MGS without the normalization step.
The local orthogonalization of the bases induces additional computational cost. Besides the actual orthogonalization process, the computation of the integrals during assembly requires more work. Indeed, the basis functions are defined as linear combinations of the nonorthogonal, initial polynomials. Consequently, in an implementation with quadrature, the evaluation of a basis function at a quadrature point implies the evaluation of all the initial polynomials occurring in the linear combination. That said, the cost can be reduced significantly with more efficient quadrature-free methods; we especially refer to Section 5.3 in Reference 18. Nonetheless, to avoid extra cost, one should use an already orthogonal basis whenever possible. In particular, the Legendre basis is L²-orthogonal on the unit interval. It can then be used on the faces of 2D problems (edges) and, in the form of tensor products, on shapes that can be mapped to Cartesian reference shapes. This typically includes, in 2D, quadrilateral elements, and in 3D, quadrilateral faces and hexahedral elements.
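The orthogonality of the Legendre basis, stated above on the unit interval (equivalently, after an affine map, on the reference interval [−1, 1]), can be checked numerically with NumPy's legendre module and a Gauss quadrature exact for the polynomial products involved:

```python
import numpy as np
from numpy.polynomial import legendre as leg

# Gauss-Legendre rule with 8 points: exact for degree <= 15, enough for
# products of two Legendre polynomials of degree <= 4.
x, w = leg.leggauss(8)
for i in range(5):
    for j in range(5):
        ci = [0.0] * i + [1.0]   # coefficient vector selecting P_i
        cj = [0.0] * j + [1.0]
        inner = np.sum(w * leg.legval(x, ci) * leg.legval(x, cj))
        if i != j:
            assert abs(inner) < 1e-12   # orthogonality on [-1, 1]
```

No MGS pass is needed for such bases, which is precisely why they are preferred whenever the geometry allows it.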

Multigrid algorithms under consideration
Suppose a fixed hierarchy of M meshes, which will serve for the implementation of (M − 1) h-coarsening operations. Note that the term h-coarsening does not refer to how the mesh hierarchy is actually built. It refers in an abstract way to the relation linking a fine mesh to its immediately coarser one in the hierarchy. Consequently, we speak of h-coarsening between two meshes even if the fine one was obtained by refinement of the coarse one. We assume that the problem assembled on the coarsest mesh is small enough for the linear system to be solved directly, regardless of the polynomial degree. We then consider three types of strategies according to the successive coarsening operations they perform to build the coarse multigrid levels, starting from the fine problem: • "h only": One multigrid level is built for each mesh in the hierarchy, meaning that M levels are built, linked by (M − 1) h-coarsening operations. The original polynomial degree is preserved at every level.
• "p then h": First, p-coarsening is successively performed to construct a first set of levels. Once the polynomial degree has been reduced to k = 1, the mesh hierarchy is used to build further coarse levels.
• "hp then h": First, hp-coarsening is used to simultaneously lower the degree and coarsen the mesh. Then, assuming that k = 1 is reached before exhausting the mesh hierarchy, the remaining meshes are used to construct further coarse levels. This assumption is usually verified in practice for large problems and moderate values of the original polynomial degree.
In what follows, h-coarsening is realized by successively doubling the mesh size. Concerning p-coarsening, multiple p-multigrid algorithms have shown that it is not necessary to lower the degree by only one at each step. Decreasing the degree more aggressively, 11,15,16,19,20 or even going directly to the low order in one single p-coarsening step, 21 often proves to be more efficient. In the present multigrid method, we have empirically identified that the best performance with p-multigrid seems to be achieved when lowering the degree by two. In case of "p then h" coarsening, we then enforce k_{ℓ−1} := max{k_ℓ − 2, 1} for the p-part. However, in case of the naturally more aggressive hp-coarsening, we only decrement the degree by one at each step, that is, k_{ℓ−1} := k_ℓ − 1.

Figure 2 (caption): The multigrid strategies of Table 1, given a fine problem with k = 3 and a hierarchy of four meshes. The black-connected dots correspond to the multigrid levels, the uppermost rightmost one representing the finest level. The most relevant prolongation operators are also plotted. As such, the operator P is decomposed according to the suboperators in (13), where the subscripts related to levels have been omitted for readability. Superscripts, related to degrees, are however conserved. Blue arrows raise the degree, red ones decrease it, and dotted black ones have no effect on the degree.

Table 1 (caption): The column "Coarsening" refers to the coarsening algorithm followed to successively construct the coarse multigrid levels from the fine one. The third column quantifies the "Degree reduction" between two successive levels ℓ and ℓ − 1 linked by p- or hp-coarsening: the value corresponds to m in the enforced relation k_{ℓ−1} := max{k_ℓ − m, 1}. The remaining columns on the right indicate the prolongation and restriction operators used in case of p-coarsening on one hand, and h- or hp-coarsening on the other hand.
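The resulting level schedules can be enumerated with the rules k_{ℓ−1} := max{k_ℓ − 2, 1} ("p then h") and k_{ℓ−1} := k_ℓ − 1 ("hp then h") quoted above; the helper functions below are illustrative sketches, not the authors' code:

```python
def p_then_h_levels(k_fine, n_meshes, drop=2):
    """Levels (mesh index, degree), fine to coarse, for 'p then h'.
    Mesh index 0 is the finest; the p phase enforces k <- max(k - drop, 1)."""
    levels = [(0, k_fine)]
    k = k_fine
    while k > 1:                      # p-coarsening on the fine mesh
        k = max(k - drop, 1)
        levels.append((0, k))
    for m in range(1, n_meshes):      # then h-coarsening at k = 1
        levels.append((m, 1))
    return levels

def hp_then_h_levels(k_fine, n_meshes):
    """'hp then h': degree lowered by one per mesh coarsening, then pure
    h-coarsening at k = 1 on the remaining meshes."""
    levels = [(0, k_fine)]
    k, m = k_fine, 0
    while k > 1 and m + 1 < n_meshes:
        k, m = k - 1, m + 1
        levels.append((m, k))
    while m + 1 < n_meshes:
        m += 1
        levels.append((m, 1))
    return levels

# Example of Figure 2: fine degree k = 3 and a hierarchy of four meshes.
assert p_then_h_levels(3, 4) == [(0, 3), (0, 1), (1, 1), (2, 1), (3, 1)]
assert hp_then_h_levels(3, 4) == [(0, 3), (1, 2), (2, 1), (3, 1)]
```

As the assertions show, "hp then h" reaches the low order with fewer levels than "p then h", which is the aggressiveness referred to in the text.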
When two successive levels do not share the same mesh, that is, in case of h- or hp-coarsening, we use the prolongation P constructed by decondensation (cf. Section 3.2.1). The associated restriction operator, in this case, is its adjoint (cf. Section 3.2.3). Otherwise, when the levels are linked by p-coarsening, we explore two possibilities: (i) the prolongation by natural injection I^{k_{ℓ−1},k_ℓ} (cf. Section 3.2.2) along with the restriction by L²-orthogonal projection Π^{k_ℓ,k_{ℓ−1}} (cf. Section 3.2.4); (ii) the prolongation by decondensation P along with its adjoint. These settings define four global strategies, summarized in Table 1. Note that not all possible combinations are considered. We have retained only those that can ensure the symmetry of the multigrid operator. The strategies of Table 1 are represented in Figure 2 on an example with k = 3 and a hierarchy of four meshes.

Interpretation
While h-only and p-h (see Table 1) correspond to the most classical and intuitive options, p-h* and hp-h are more original. Both these alternatives are designed to circumvent the loss of information induced by the use of the higher-order reconstruction in the operator P in the original h-only strategy. Indeed, if the reconstruction of polynomials in the cells (performed by Θ^{k_{ℓ−1}}_{ℓ−1}) raises the degree to k_{ℓ−1} + 1, it is lowered back to k_{ℓ−1} in the trace operation on the fine faces (performed by Π^{k_{ℓ−1}+1,k_ℓ}, where k_ℓ = k_{ℓ−1}); see Figure 2a. Although this trick ultimately improves the accuracy of the trace of degree k_{ℓ−1}, one might wonder whether preserving the higher degree k_{ℓ−1} + 1 on the trace could be more efficient. This supposes that the fine level be associated with a degree greater than or equal to k_{ℓ−1} + 1. If so, either the fine level also has a finer mesh, yielding the strategy hp-h (see Figure 2d), or it has the same mesh (i.e., p-coarsening only is enforced), yielding p-h* (see Figure 2c). We can also interpret p-h* with respect to p-h. In the p part of p-h, each local polynomial of degree k_{ℓ−1} is simply injected identically into a larger polynomial space, namely, of degree k_{ℓ−1} + 2 (recall that in our configuration, a reduction of 2 degrees is performed between level ℓ and level ℓ − 1); see Figure 2b. This injection leaves to the fine-grid smoother the burden of solving for the two degrees separating the levels. On the other hand, the operator P used in p-h* instead of the natural injection actually provides one additional degree of approximation. Thus, here, only one polynomial degree needs to be handled by the fine smoother. However, P does not deliver on the fine level an identical representation of the coarse function, which may have negative consequences. The numerical experiments of Section 5 indeed show that p-h* is usually less efficient than p-h.
hp-h can be similarly interpreted with respect to h-only. While in h-only, the reconstructed higher degree is lowered back, it can be conserved in hp-h, which may counterbalance the elevated aggressiveness of the hp-coarsening. The numerical tests of Section 5 show that hp-h sometimes prevails over h-only.

Experimental setup and implementation
The multigrid strategies described in Section 4 and summarized in Table 1 are used to solve the condensed linear system (9), either as a solver or as a preconditioner for a Krylov method. The smoother used at every level is the block Gauss-Seidel method, with block size corresponding to the number of unknowns per face (cf. Section 3.3). Following the recommendations of Section 4.2.2 in Reference 9, we use V-cycles with post-smoothing only; see also References 22 and 23. This choice has been found to optimize the performance of the multigrid methods (in terms of computational work and execution time) for all considered strategies. In particular, we use V(0, 3) in 2D and V(0, 6) in 3D. These unsymmetrical cycles prevent the use of the standard conjugate gradient (CG) method as outer iteration. Instead, such a nonsymmetric preconditioner may be used in a flexible conjugate gradient (FCG) method, as justified in Reference 24. We stress that we obtain significantly better performance with our unsymmetrical cycles and FCG than with symmetrical cycles and CG. We therefore choose FCG (see Reference 25 for the detailed algorithm) as the outer Krylov method when our multigrid method is used as a preconditioner. Denoting by r_i the algebraic residual of (9) at iteration i and by ‖⋅‖₂ the Euclidean norm on the space of coefficients, the solver stops when ‖r_i‖₂ / ‖b‖₂ falls below a given tolerance. Our software uses shared-memory parallelism with a maximum of 24 parallel threads. Parallel execution is employed in the assembly and the setup phase of the algorithm, since only local computations are performed there.
Specifically, this applies to the following operations: (i) the assembly of the coarse matrices obtained by rediscretization of the problem on the coarse meshes in the context of h-and hp-coarsening; (ii) the extraction of the coarse matrices from the fine one when performing p-coarsening using orthogonal bases; (iii) the assembly of the prolongation operators P (13); (iv) the factorization of the diagonal blocks of the operators for future use in the block Gauss-Seidel iterations. Note that, assuming orthogonal bases, the prolongation by natural injection and the restriction by L 2 -orthogonal projection do not need any setup (cf. Section 3.4).
The block Gauss-Seidel smoother, however, is an inherently sequential algorithm. We have therefore chosen to adhere to the sequential ordering, so that the convergence rates cannot depend on a specific parallelization strategy. This is necessary, since the main purpose of the current study is to compare high-order solution strategies independently of parallelization issues. To this end, we have defined metrics of computational cost that do not depend on the specific implementation and parallelism: theoretical computational work and number of iterations. As a complement, we will also cite CPU times.
The theoretical computational work is expressed in work units (WUs), where one WU amounts to the number of flops involved in one application of the fine grid matrix (namely, twice the number of nonzeros in the matrix). The number of WUs can be interpreted, for instance, as the cost corresponding to the same number of Richardson iterations. This notion of a WU is commonly used in the multigrid literature to express the performance of an algorithm in light of the textbook multigrid efficiency. 26,27 In particular, the textbook paradigm quantifies the unspecified constant in the optimality property. According to Brandt,26 an ideal multigrid algorithm should solve a system with an algebraic error below the discretization error for a computational cost no higher than 10 WUs.
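Since one WU equals twice the number of nonzeros of the fine matrix, converting a flop count into WUs is a one-liner; the helper below is purely illustrative:

```python
def work_units(flops: float, nnz_fine: int) -> float:
    """Convert a flop count into work units: one WU equals the cost of one
    application of the fine-grid matrix, i.e. 2 * nnz(A_fine) flops."""
    return flops / (2.0 * nnz_fine)

nnz = 1_000_000                      # hypothetical fine-matrix nonzero count
one_matvec = 2.0 * nnz               # flops of one fine-grid matrix application
assert work_units(one_matvec, nnz) == 1.0
assert work_units(50 * one_matvec, nnz) == 50.0
```

A solve reported at 50 WUs thus costs as much as 50 applications of the fine-grid matrix, to be compared with the 10 WUs of the textbook goal.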

Square domain, Cartesian mesh
To establish a baseline, let us start with the square domain Ω := [0, 1]², discretized by a uniform, Cartesian mesh. The tensor K is the identity matrix. The source function f is computed with respect to the exact solution u : [0, 1]² ∋ (x, y) ↦ sin(4πx) sin(4πy). The problem is discretized by the HHO method with k = 5. At this high degree, a mesh of 128 × 128 square elements is enough to achieve an L²-error lower than 10⁻¹², close to machine precision. The mesh hierarchy is composed of four embedded Cartesian meshes. Orthogonal Legendre polynomials are chosen as local polynomial bases for both cells and faces, which allows the optimizations described in Section 3.4. The multigrid method is employed as a solver for the condensed linear system (9), using each of the strategies of Table 1. The tolerance defining the stopping criterion is set to 10⁻¹², in accordance with the discretization error. Figure 3 presents bar plots that compare the performance of the various strategies. In (a), the computational work of the iteration phase is expressed in equivalent number of WUs. In (b), this work is measured in terms of CPU time. In (c), the convergence is evaluated in terms of the number of iterations to reach convergence. In (d), the setup cost is measured in CPU time. Table (e) gives additional information about the multigrid method, namely, the number of levels built and the asymptotic convergence rate ρ, defined as the geometric mean of the residual convergence ratios of the last five iterations:

  ρ := ( ∏_{i=n−4}^{n} ‖r_i‖₂ / ‖r_{i−1}‖₂ )^{1/5} = ( ‖r_n‖₂ / ‖r_{n−5}‖₂ )^{1/5},

n denoting the last iteration. Finally, plot (f) stresses the robustness of the convergence of the methods with respect to the mesh size. First, one can see that the results of the iteration phase in computational work and CPU time are consistent, yielding the same ranking. In particular, p-h (in red) is found to be the most computationally efficient strategy, with about half the theoretical cost of the h-only strategy (in blue).
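The asymptotic convergence rate described above, a geometric mean of residual reduction ratios over the last five iterations, telescopes into a single ratio and is straightforward to compute from a stored residual history (illustrative helper, assuming at least m + 1 stored norms):

```python
import math

def asymptotic_rate(residual_norms, m=5):
    """Geometric mean of the residual reduction ratios over the last m
    iterations; the product telescopes to (||r_n|| / ||r_{n-m}||)**(1/m)."""
    r = residual_norms
    return (r[-1] / r[-1 - m]) ** (1.0 / m)

# A solver reducing the residual by a steady factor 0.1 per iteration:
history = [0.1 ** i for i in range(10)]
assert math.isclose(asymptotic_rate(history), 0.1)
```

For a steadily converging method, this estimate is insensitive to the transient behavior of the first iterations, which is why only the tail of the history is used.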
Clearly, this is due to its better convergence rate (see plot (c)). Second, plot (d) shows that p-h also has the cheapest setup, owing to the facts that (i) the levels issued from the p-coarsening operations do not need rediscretization (the coarse matrices are extracted from the fine one), (ii) the three rediscretizations performed after the h-coarsenings are made at the low order k = 1, and (iii) the transfer operators between levels related by p-coarsening do not require any setup. In comparison, the h-only strategy rediscretizes the problem three times on the coarse meshes at the high order k = 5, which is costlier, and the respective prolongation operators P also have to be built at that high order. Next, one can remark that the strategy p-h* does not seem to pay off. One could have expected the operator P, here used for p-multigrid, to have a greater impact. Indeed, while the natural injection leaves to the fine smoother the burden of resolving the two polynomial degrees separating the levels, the operator P provides one additional degree of approximation, leaving only one remaining degree to be handled by the fine smoother.
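The asymptotic convergence rate reported in table (e), that is, the geometric mean of the residual convergence ratios over the last five iterations, can be computed from the residual history as follows (a minimal sketch; `asymptotic_rate` is a hypothetical helper name):

```python
def asymptotic_rate(residual_norms, m=5):
    """Geometric mean of the last m residual convergence ratios, which
    telescopes to (||r_n|| / ||r_{n-m}||)**(1/m)."""
    if len(residual_norms) <= m:
        raise ValueError("need more than m residual norms")
    return (residual_norms[-1] / residual_norms[-1 - m]) ** (1.0 / m)

# A solver contracting the residual by a constant factor 0.27 per iteration:
history = [0.27 ** i for i in range(12)]
print(round(asymptotic_rate(history), 10))  # -> 0.27
```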
Although the lowest possible discretization error is already achieved, the mesh is refined multiple times to increase the problem size and exhibit the asymptotic behavior of the solver. The flat curves of Figure 3f clearly show the optimality of each method, whose convergence rates are independent of the number of unknowns. Figure 4 presents the same information as Figure 3, this time using the multigrid method as a preconditioner for FCG. Note that the number of iterations displayed here corresponds to the iterations of the preconditioned FCG solver. One can see that the strategy rankings of Figure 3 are preserved and that the overall computational cost is reduced. In particular, the linear system is solved to machine precision with a computational work corresponding to about 65 WUs. This is significantly higher than what is expected for classical textbook multigrid efficiency (TME), which suggests that 10 WUs should suffice to solve the linear system. However, Reference 27, p. 318, for example, states that in many cases solvers require an order of magnitude more than the ideal goal of 10 WUs. In our case, additional difficulties arise because of the high order of the discretization, the face-based nature of the system, and the correspondingly more complex matrix structure. This makes it unclear whether TME with 10 WUs could be achieved. Note also that 10 WUs are only claimed for multigrid used in a full multigrid setting (i.e., a nested iteration), where the initial guess is recursively computed by multigrid employed on a coarser level. In our current experiments, we only use V-cycles, which are insufficient to reach TME. We point out, however, that TME has been attained, for example, in Reference 28 for conforming 3D discretizations.
When the exact solution is sufficiently smooth, the use of high polynomial degrees reduces the cost of achieving a desired precision. Figure 5a presents, for each value of k ∈ {2, 3, 4, 5}, how finely the unit square must be decomposed into Cartesian elements (and how large the resulting linear system must be) to achieve an L²-error lower than 10⁻¹². Clearly, the higher the polynomial degree, the coarser the mesh can be, and the smaller the linear system becomes. Figure 5b shows the time spent solving each of those problems, including initial assembly and setup, using the FCG solver preconditioned by the multigrid method with strategy p-h. It shows that, for approximations resulting in equivalent L²-errors, high orders have a clear advantage over low ones in terms of global time to solution. As a consequence, if the solution is smooth, raising the order of approximation is more efficient for improving the solution quality than refining the mesh, but of course this requires that a suitable and efficient solver be available. If the solution is nonsmooth, the higher orders cannot be fully exploited and their effect is mitigated (cf. Section 5.4 and Figure 6). A classical remedy consists in resorting to a posteriori error-driven adaptivity.
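To make the size comparison concrete, the number of unknowns of the condensed (face-based) system can be estimated by counting edge polynomials: on a segment, the space P_k has dimension k + 1. The sketch below assumes a uniform n × n Cartesian mesh with Dirichlet edges eliminated; it is a back-of-the-envelope count, not the paper's reported figures:

```python
def hho_condensed_size_2d(n, k, eliminate_boundary=True):
    """Unknowns of the condensed HHO system on an n-by-n Cartesian mesh:
    one polynomial of degree k (k + 1 coefficients) per edge."""
    edges = 2 * n * (n + 1)      # horizontal + vertical edges
    if eliminate_boundary:
        edges -= 4 * n           # Dirichlet edges statically eliminated
    return edges * (k + 1)

# The coarser the mesh allowed by a higher k, the smaller the system:
for k in (2, 3, 4, 5):
    print(k, hho_condensed_size_2d(128, k))
```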

Square domain, unstructured triangular mesh
Using the problem settings of the preceding test case, the square domain is now discretized by a coarse Delaunay triangulation. The initial triangulation is refined until the mesh is fine enough for the approximate solution to achieve an L²-error lower than 10⁻¹². Each refinement step consists in subdividing each coarse triangle into four fine triangles connecting the edge midpoints. Setting k = 5, the fine mesh is composed of about 11,000 elements, for a system size of about 100,000 rows. The mesh hierarchy obtained (three meshes) is used to define the h-coarsening operations in the multigrid method. The polynomial bases are chosen as the orthogonal Legendre basis on the faces (here, edges) and, in the elements, local monomial bases orthogonalized element-wise (cf. Section 3.4.1). Figure 7 presents the numerical results of the different strategies when the multigrid method is used as a solver. The ranking here differs from that of the preceding test case with Cartesian meshes: the change of element type seems to favor the strategies prioritizing mesh coarsening, namely, h-only and hp-h. Note that they are also the ones with the costliest setup, due to the rediscretizations at high orders on the coarse meshes. This makes them competitive only when the system needs to be solved many times. If the system needs to be solved with one or few right-hand sides, then strategies leveraging p-coarsening have a clear advantage. Moreover, one can remark the higher cost of the setup in comparison with Cartesian elements, owing to the dynamic orthogonalization of the local polynomial bases. Figure 8 presents the results of the multigrid method as a preconditioner. Note that the rankings given by the computational work and the CPU time ((a) and (b), respectively) do not concur. For this reason, we avoid drawing conclusions and recall the biases these results are subject to.
First, the intergrid transfers of the p-multigrid part of p-h perform no theoretical flops, although their execution does consume CPU time, which makes the computational work of p-h more optimistic than its CPU time. This remark is nonetheless to be mitigated by the fact that, in p-h, the intergrid transfers consume less than 2% of the CPU time. The second bias is the dependency of the CPU time on the implementation, contrary to the theoretical computational work, which is objective. On the other hand, the CPU time usually reflects the practical time to solution more accurately. Following this criterion, the ranking of the strategies follows that of the multigrid used as a solver, that is, a preference for h-only and hp-h.

[Figure 7: Unit square, unstructured triangular mesh, k = 5. The multigrid method is used as a solver with the strategies of Table 1. Convergence is assumed achieved once the backward error reaches the tolerance.]

Kellogg heterogeneous problem
The Kellogg problem is a well-known heterogeneous diffusion benchmark. The square domain is partitioned into four quadrants (Ω_i)_{i=1,…,4} as in Figure 9a, and the diffusion coefficient K is such that K|_{Ω₁} = K|_{Ω₃} = κ₁ I and K|_{Ω₂} = K|_{Ω₄} = κ₂ I, where I is the identity matrix and κ₁, κ₂ are positive scalar values. The source function is f = 0. The discontinuities in the coefficient reduce the regularity of the solution, which is H^{1+ε}, 0 < ε ≤ 1. This low regularity makes it a difficult problem, often used as a benchmark for adaptive finite element methods. The exact solution is known analytically; we follow Example 5.4 in Reference 29 and refer to it for the solution's formula. In particular, the heterogeneity ratio is κ₁ ∕ κ₂ ≈ 161, inducing ε = 0.1 and leading to the solution plotted in Figure 9b. Nonhomogeneous Dirichlet conditions are enforced on the boundary of the domain. While the preceding test cases could be solved at machine precision with very few unknowns, the low regularity of the Kellogg solution requires a finer mesh, corresponding to a larger problem size. In particular, we use a hierarchy built by successive refinements of an unstructured mesh locally refined around the central singularity (see Figure 9c), and set k = 5. The fine mesh has over 350,000 elements, producing over 3 million unknowns. The L²-error reached with this setting is about 3 × 10⁻³. Figure 10 presents the numerical results of the various high-order multigrid strategies to solve the system (cf. Table 1). Given the discretization error, a tolerance of 10⁻³ would be sufficient; however, to stress the convergence of the algorithm, we set it to 10⁻¹². The rankings match those of the preceding test case (unit square with unstructured triangular mesh), presenting h-only and, to a lesser extent, hp-h as the most suitable alternatives if the setup can be neglected.
This tends to strengthen the hypothesis that unstructured triangular meshes favor strategies that prioritize h-coarsening. In particular, h-only clearly has the best asymptotic convergence rate (0.27 vs. about 0.40 or more for the other strategies; see (e)). Figure 11 presents the results of the multigrid methods as preconditioners to FCG. As in the preceding test case, the rankings provided by the computational work (a) and the iteration CPU time (b) are not consistent, so we avoid drawing conclusions. That said, following the CPU time criterion, our implementation again favors h-only and hp-h for use as preconditioners. Recall also that, surprisingly, while p-h* was more efficient than p-h as a solver, it is the other way around when these strategies are used as preconditioners. Figure 5 has shown the clear advantage of high order on the overall time to solution when the exact solution is smooth. For the sake of comparison, Figure 6 reproduces the experiment on the Kellogg problem, that is, when the solution is nonsmooth. Figure 6a presents, for each polynomial degree k ∈ {1, 2, 3, 4, 5}, the coarsest mesh required (still obtained from successive refinements of an initial coarse mesh as in Figure 9c), as well as the corresponding number of unknowns, to achieve an L²-error smaller than 5 × 10⁻³. As there is no distorted or stretched element, we simply use the monomial bases in cells and on faces to save the cost of the orthogonalization in this experiment. Figure 6b presents the overall time to solution (including initial assembly and multigrid setup) for those problems. The system is solved using the h-only strategy as a preconditioner (found to be the most efficient solver in computational time, setup excluded, for this problem; see the discussion above). The tolerance is set to 10⁻³, in accordance with the discretization error. As a consequence, the solver converges in fewer than four iterations, which makes this step almost negligible.
The results here are more difficult to interpret than in the smooth test case because the L²-error is hard to reduce, which means that a small decrease of the error may actually demand a lot of additional work. Consequently, as the problems of Figure 6a do not generate exactly the same L²-errors, comparing their computational costs under the assumption that they achieve equivalent L²-errors is somewhat flawed. However, this setting is enough to roughly illustrate the limitations of high orders. Considering the approximate solutions yielded by all problems to be of equivalent quality, the problem with k = 3 is a sweet spot, and a better option than k = 4 on the same mesh or k = 5 on a mesh with twice the mesh size. Furthermore, noticing that the problems with k = 1 and k = 4 in fact generate the same discretization error, one can see that it is more efficient to choose k = 1 on a fine mesh than k = 4 on a coarser one.

Cubic domain, Cartesian mesh
Figure 12 (resp. Figure 13) presents the results of the multigrid method used as a solver (resp. as a preconditioner), with the various high-order strategies of Table 1. Clearly, as in 2D, the p-multigrid approach prevails with Cartesian meshes.

Cubic domain, structured tetrahedral mesh
This test case follows the settings of the preceding one, except for the mesh type, which is now tetrahedral. The elements are arranged in a structured manner: the domain is uniformly subdivided in a Cartesian way, and each cube is itself subdivided into six tetrahedra through the Kuhn triangulation (see Reference 30, Figures 8 and 9). This decomposition allows an easy and practical mesh coarsening, in the sense that halving the number of cubes in each direction of the Cartesian decomposition results in a coarse mesh that embeds the fine one and doubles its mesh size. Additionally, both meshes comprise tetrahedra of the same shapes, therefore ensuring the same aspect ratios. We build in this way a hierarchy of five nested meshes. When the multigrid method is used as a solver, the results of Figure 14 favor the h-only strategy, agreeing with the similar test case in 2D (cf. Section 5.3). As a preconditioner (Figure 15), h-only is still favored by the CPU times of the iterations, although to a lesser extent.
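The Kuhn triangulation mentioned above can be written down explicitly: each permutation of the three axes yields one tetrahedron following a monotone vertex path through the cube. The sketch below (our own minimal implementation, not the code of Reference 30) builds the six tetrahedra of the unit cube and checks that they fill it:

```python
import itertools
import numpy as np

def kuhn_tetrahedra():
    """Six tetrahedra of the Kuhn triangulation of the unit cube: for each
    permutation (a, b, c) of the axes, take the vertex path
    0 -> e_a -> e_a + e_b -> (1, 1, 1)."""
    e = np.eye(3)
    tets = []
    for a, b, c in itertools.permutations(range(3)):
        tets.append(np.array([np.zeros(3), e[a], e[a] + e[b], np.ones(3)]))
    return tets

def volume(tet):
    # Signed volume formula |det(v1 - v0, v2 - v0, v3 - v0)| / 6
    return abs(np.linalg.det(tet[1:] - tet[0])) / 6.0

tets = kuhn_tetrahedra()
print(len(tets), sum(volume(t) for t in tets))  # 6 tetrahedra, total volume 1
```

All six tetrahedra have volume 1/6 and are congruent up to reflection, which is consistent with uniform coarsening preserving the aspect ratios.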

3D complex domain, non-nested meshes
This test case considers the complex geometry of a Geneva wheel with an unstructured tetrahedral mesh, represented in Figure 16. The source f is a piecewise polynomial function. The main difficulty of this test case lies in the construction of the mesh hierarchy. [Figure 16: Tetrahedral mesh of a Geneva wheel.] Indeed, proceeding by tetrahedral refinement of an initial coarse mesh degrades the aspect ratio of the elements, causing poor performance of our multigrid method, which is highly sensitive to mesh quality (cf. Reference 9, Section 4.5). The use of non-nested meshes circumvents the problem. We therefore use independent Delaunay triangulations of the domain, choosing the mesh sizes so that each pair of successive meshes enforces a geometric coarsening ratio close to two. In order to avoid prohibitive computational costs, the L²-orthogonal projection J_{ℓ−1,ℓ}^k, introduced for all k ≥ 0 in Section 3.2.1, is computed approximately via the method described in Reference 10, Section 3.3. In particular, the strict definition of J_{ℓ−1,ℓ}^k involves the evaluation of the geometric intersections between all overlapping fine and coarse elements, which induces complex and costly computations. Instead, the intersection of a coarse element T_c ∈ 𝒯_{ℓ−1} and an overlapping fine one T_f ∈ 𝒯_ℓ is approximated in the following manner: T_c ∩ T_f ≈ ⋃ { t ∈ Sub(T_f) : barycenter(t) ∈ T_c }, where Sub(T_f) is a fixed subdivision of T_f, obtained in practice by tetrahedral refinement. With this method, the L²-inner product between a coarse basis function and a fine one attached to the overlapping element T_f is decomposed into a sum of integrals over tetrahedra, namely, those whose barycenter is included in T_c. The numerical tests in Reference 10 showed that one Bey refinement step is enough to construct an approximate L²-orthogonal projection operator accurate enough to be successfully used in our multigrid method up to k = 2. From k = 3, two refinement steps are necessary.
However, given that Bey's method decomposes a tetrahedron into eight smaller ones, the computational work required to assemble the operator is multiplied by eight for each additional refinement. As we find two refinements too costly in practice when integrals are computed by quadrature rules, we limit ourselves to k = 2 for the strategies h-only and hp-h. For higher orders, strategies lowering the degree before dealing with the non-nested meshes are recommended. Information regarding the hierarchy of non-nested meshes is provided in Figure 17f. The rest of Figure 17 presents the results of the various strategies of the multigrid method used as a solver. Interestingly, hp-h is found to be the most efficient. As a preconditioner, following Figure 18, hp-h is still favored by the CPU times of the iteration step. One can also remark the high computational cost compared to the other problems. This can be explained by (i) the very use of the L²-orthogonal projection, which is an approximation in itself (even when computed exactly) and therefore implies a loss of precision with respect to the function it is applied to; and (ii) the enlarged stencil of the prolongation compared to nested meshes, which increases the cost of the grid transfers: indeed, while in the nested case each fine element is included in exactly one coarse element, a non-nested fine element usually overlaps multiple coarse elements, thus multiplying the size of the corresponding stencil.
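The barycenter-based selection rule described above can be sketched as follows (a minimal illustration with hypothetical helper names; the actual operator assembly of Reference 10 additionally performs quadrature over the selected sub-tetrahedra):

```python
import numpy as np

def point_in_tet(p, tet, tol=1e-12):
    """Membership test via barycentric coordinates; `tet` is a 4x3 array."""
    T = np.column_stack([tet[1] - tet[0], tet[2] - tet[0], tet[3] - tet[0]])
    lam = np.linalg.solve(T, p - tet[0])
    return lam.min() >= -tol and lam.sum() <= 1.0 + tol

def approx_intersection(coarse_tet, sub_fine):
    """Approximate T_c ∩ T_f by the sub-tetrahedra t in Sub(T_f) whose
    barycenter lies in T_c."""
    return [t for t in sub_fine if point_in_tet(t.mean(axis=0), coarse_tet)]

# Example: reference tetrahedron as T_c; two sub-tetrahedra of a fine
# element, only the first of which lies inside T_c.
Tc = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
t1 = 0.1 * Tc          # small tetrahedron near the origin (inside T_c)
t2 = 0.1 * Tc + 2.0    # shifted far outside T_c
print(len(approx_intersection(Tc, [t1, t2])))  # -> 1
```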

CONCLUSION
In this article, we defined and compared several high-order multigrid strategies (cf. Table 1) for the fast solution of elliptic equations discretized by the HHO method. Assuming that the integrals are computed with quadrature rules, the setup phase can become time-consuming at high order and, depending on the strategy, even take a significant part of the computations, especially if the bases have to be locally orthogonalized. Consequently, if the arising linear system needs to be solved with only one right-hand side, then the usual method consisting in plugging a p-multigrid on top of an h-multigrid (strategy p-h) clearly stands out, owing to its very cheap setup. Otherwise, if the system is to be solved with many right-hand sides, the best option may differ according to the space dimension and the type of mesh. The most important takeaway of this study is that the best choice may significantly reduce the time to solution as compared to other strategies. This remark should encourage users to put different strategies to the test in order to find the most efficient one for their specific case study. Based on our numerical experiments, we can nonetheless attempt to draw simplified conclusions to help in this process. In particular, the strategy p-h*, specially designed to exploit the higher-order reconstruction without loss of information, yields disappointing results while also being the least intuitive strategy. It can therefore be discarded in practice. hp-h, also designed to exploit the higher-order reconstruction, often shows performance close to h-only while having a much cheaper setup. In 2D, while Cartesian meshes see p-h prevailing, h-only and hp-h are favored in more general cases. In 3D, on the other hand, the results are not so clear-cut.
Nonetheless, it is to be noted that in complex cases, where non-nested meshes and approximate L²-orthogonal projections are employed, setup computations are considerably more demanding at high order, rendering h-only hardly practical. This conclusion could be altered by the use of more efficient strategies for the numerical computation of integrals, such as the technique of Reference 31. The best strategy for the method used as a solver is also often the one prevailing when the method is used as a preconditioner. Even though the multigrid method is already optimal as a solver, the preconditioning technique saves, on average, about 25% of the computational work.
Another meaningful remark is that the computational work required exhibits large variations according to the type of mesh. Typically, a solution on an unstructured mesh may demand twice the work of that on a Cartesian mesh, mainly due to an increased number of iterations.
The methods of this article will have to be significantly extended when applied to other settings, such as strongly anisotropic equations or Stokes flow. We refer to the literature for the general adaptations needed in multigrid solvers for these problem classes. Studying such extensions specifically for the h-/p-multigrid methods of this article will be the topic of future research.

CONFLICT OF INTEREST
The authors have no conflicts of interest to disclose.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

ACKNOWLEDGMENT
Open access funding enabled and organized by Projekt DEAL.