Non-modal analysis of linear multigrid schemes for the high-order Flux Reconstruction method

We present a numerical analysis of linear multigrid operators for the high-order Flux Reconstruction method. The non-modal analysis is used to assess the short-term numerical dissipation in the context of 1D and 2D linear convection-diﬀusion. The eﬀect of several parameters, namely the number of coarse-level iterations, the polynomial order and the combination of h - and p -multigrid is explored in an eﬀort to identify the most eﬃcient conﬁgurations. V-cycle p -multigrid is shown to have increased eﬃciency at higher polynomial orders, and the use of W-cycles and/or hp -multigrid appear to oﬀer additional advantages. The eﬀect of high P´eclet numbers and high aspect-ratio cells is also explored in 2D, and both factors are shown to decrease the error dissipation. Finally, we relate the non-modal dissipation to the convergence rate of the multigrid in a series of manufactured solutions.

In Computational Fluid Dynamics (CFD), the use of low-order methods is widespread due to their 34 well-known robustness, flexibility and reliability. Having progressed considerably over the last decades, they nowadays provide satisfactory solutions for many flow problems at the industrial level. As scale- 36 resolving simulations gain strategic importance to the CFD industry [1], present-generation methods face known challenges in which high-order methods (HOM) become the subject of growing interest. These While the convergence properties have been the main focus of the aforementioned publications, little attention has been paid to the influence of the multigrid parameters themselves, i.e. the number of coarse-136 level smoothing steps, the type of cycle (V/W), or the number of h-multigrid levels added to the p-multigrid in a combined hp-hierarchy. Aside from [97], analyses on the behaviour of the multigrid operator as a 138 whole are scarce for more than two p-levels. Likewise, there is not enough information on the hp-multigrid performance in the presence of convection, nor in high aspect-ratio meshes. High-order methods for industrial applications require robust and efficient algorithms, for which numerical analysis can provide relevant a priori estimations. In this work we propose to apply the non-modal 142 analysis introduced by Fernández et al. [40] to several p-and hp-multigrid operators for Flux Reconstruction discretizations of the linear convection-diffusion equation. Our aim is to find those realistic -as opposed 144 to simplified -multigrid configurations that exploit the potential of the method and at the same time are robust enough to be applicable in a wider range of problems -from pure diffusion to high-Péclet convection. 146 To that end, this manuscript is organized as follows. The Flux Reconstruction method for linear equations is summarized in section 2, together with the local form of the operators required for a stability analysis. In 148 section 3, we explain the p-and hp-multigrid methods for Flux Reconstruction and section 4 describes the analysis tools used in this study. A discussion of the results for 1D follows in section 5.1, and the extension 150 to 2D is made in section 5.3. Finally, the analysis is linked to simulation results in a series of manufactured solutions from [98] in section 6, followed by a summary of the key findings in section 7.

Discretization of the linear convection-diffusion equation
The linear convection-diffusion equation (LCDE) with coefficients a and ν ≥ 0 for a scalar quantity 154 u(x, t) reads: where the convective (inviscid) flux is F cnv = au and the diffusive (viscous) flux is F diff = −ν∇u. In an 156 infinite domain, for a complex wave-like initial condition u(x, 0) = exp(ıκ · x) where ı is the imaginary unit, the exact solution is given by: u(x, t) = e ıκ·x e −(ıa·κ+ν κ 2 ) t . (2) In the presence of a non-zero diffusion coefficient, ν > 0, the system evolves towards a zero-steady-state solution. 160 We now briefly present the FR discretization of the linear convection-diffusion in 1D. For a detailed description of the Flux Reconstruction method, the reader is referred to the literature [11,12]. 162

Domain decomposition
The computational domain is divided in elements or cells denoted C j , where j is a unique integer. The center of the element is located at coordinate x j , and the element width is h j . As is common practice in Finite Element methods, a reference space is defined where most arithmetical operations take place. The reference space, which is local to each element, is parametrized by coordinate −1 ≤ ξ ≤ 1 and the mapping from physical space to reference space is as follows: Inside element C j , the solution is approximated at a set of P + 1 quadrature points (solution points) ξ k , 164 k = 0, ..., P , locally defining a polynomial of order P . The polynomial can be represented by a vector of modal coefficientsû j of a Legendre basis, or a vector of nodal coefficients u j . The latter corresponds to the 166 solution values at the aforementioned quadrature points.

168
In Flux Reconstruction, the divergence of the fluxes is evaluated in 7 steps.
1. Interpolation of the solution at the cell interface points. In 1D, these are the cell boundaries given by ξ = ±1. In essence, we can obtain the interpolated values at the left boundary (L) and the right boundary (R) by means of row vectors l j and r j : 2. Computation of the common solution values. At the interface points, a common value for the solution must be defined in order to account for the interaction between neighbouring elements. In this study, 174 the common solution values are calculated using the Local Discontinuous Galerkin (LDG) approach [99]: 3. Computation of the gradient of the solution, q j , as the sum of two components. The first component is the discontinuous gradient, which results from directly evaluating the derivative of u j . The derivative 178 can be computed by applying a derivative matrix D j , whose shape is not detailed here (see [100]). The second component is a correction term, proportional to the derivative of certain functions g L , g R (see 180 [12] for more details) and to the difference between the interpolated solution and the common solution.
Hence, q j is also called corrected (or continuous) gradient: 4. Interpolation of the fluxes at the interface points. Taking the left boundary as an example, in the case of linear convection-diffusion with constant coefficients this procedure simplifies to: 5. Computation of the common flux values, for both the convective flux and the diffusive flux, using an appropriate Riemann solver. In this study, we use the Roe scheme for the convective flux, and the 186 LDG scheme for the diffusive flux: 6. Divergence of the corrected fluxes. Similar to step 3, we add the corrections and the discontinuous 188 fluxes, which are evaluated directly from the solution and the solution gradient, to obtain the corrected fluxes. In practice, however, we compute directly the divergence of the fluxes by applying the derivative 190 matrix and by using the derivative of the correction functions, g R j and g R j : 7. Transformation to physical space and evaluation of the time derivative. In step 6 we have computed 192 the flux divergence in reference space. To transform it back to physical space, we make use of the Jacobian of the transformation (cf. equation 3): This completes the spatial discretization. Clearly, the above steps can be grouped in a compact matrixoperator form as follows: where A j,i is an operator defined in element C j acting on neighbouring solution vectors u i from cells C i .
In this case, a five-cell stencil is assumed due to the use of the In order to perform a stability analysis, we need to express the spatial discretization as an operator acting only on the local values u j , as opposed to the above formulations in which we need information from 212 neighbouring points. To that end, we require that u i be expressed as a linear function of u j , something that we will achieve by using some shift matrices T i,j : The form of these matrices depends upon the initial condition for the solution, the elements' widths and the polynomial order. If the initial condition is given by u(x, 0) = exp(ıκx), it follows that, for two adjacent 216 cells C i and C j , the shift matrix is diagonal with terms: Note that the above expression is valid for cells with different widths, and reduces to the well-known 218 expression (T i,j ) kk = exp(ıκh) when the mesh is uniform (see for instance [38]). We can now express the local discretization operator as follows:

Time discretization
As per equation 16, we have computed the time derivative of the solution. To advance in time we may 222 use, for instance, an explicit Runge-Kutta scheme. In such case, if we define the fully discrete operator as the one containing the spatial and temporal discretizations: where the time-step is ∆t = t n+1 − t n , then A and Z are related by a Runge-Kutta polynomial P N S (x) with S stages that is N -th order accurate in time:

Multigrid operators
Let us consider the following linear system where u is the vector of unknowns: For simplicity, we will drop the subscript j from the solution vector since we will deal with fully-local operations only. An inexact guess of the solution vector, i.e. a guess with a certain amount of error will 230 produce a residual given by r = f − A u. In order to damp the residual, one could use an explicit smoother and iterate until the solution converges, or equivalently, march towards the steady-state. If the right-hand side of equation 19 is zero, the iterative process -henceforth referred to as fully-explicit marching -is: This procedure is inefficient at eliminating the low-wavenumber errors; hence the multigrid is considered 234 as an alternative. The iterative process applying a multigrid operator G can be written in the same form: where G contains all grid-transfer, smoothing and correction operations. We now give some indications on 236 how this matrix is computed.

238
The linear multigrid algorithm, also called Correction Scheme [50], solves the full system at the fine level and the residual equation at the coarse levels. The nonlinear Full Approximation Scheme, commonly used 240 in the Navier-Stokes equations, reduces to the Correction Scheme when applied to linear systems. Using subscript p = 0, ..., P to denote the level, where P is the finest level and 0 the coarsest one, the equations of 242 the Correction Scheme are: where R p p+1 is the restriction matrix from level p + 1 to level p. We further introduce m p as the number 244 of smoothing steps (sweeps) at level p, the superscripts "do" and "up" for the solutions obtained in the down-cycle and the up-cycle and the subscript 0 or m p to denote the solution vector before smoothing and 246 after smoothing, respectively. The set of values for m p at all levels will be henceforth called sweep pattern.
In such way, starting from an initial guess (u P ) do 0 , the fine-level pre-smoothing after applying m P smoothing 248 steps is: For an explicit marching with a Runge-Kutta method, the smoothing operators Z m,p andZ m,p are: whereP(x) = (1 − P N S (x))/x is obtained from the Runge-Kutta polynomial. Note that these operators directly apply m p smoothing steps instead of just one. The reader may identify A p and Z p with the FR 252 spatial discretization and the fully-discrete operators, as presented in equations 16 and 18, for a given polynomial order p. The Correction Scheme requires the initial guess for the coarse-level variables to be zero, so the smoothing steps in the down-cycle are expressed as follows: Combining the above equations and denoting I p the (p + 1) × (p + 1) identity matrix, we obtain that the 256 solutions at the coarse levels are linear in terms of the fine-level residual: On the other hand, the up-cycle consists of a prolongation of the correction and a post-smoothing: otherwise expressed in a single step as: From the above we can infer that the up-cycle solutions are expressed in terms of the down-cycle solutions 260 according to: Equation 29 is only valid for p < P , since at the fine level the factor (Z m,P + I P ) does not apply to 262 (u P ) do m P , but only Z m,P . However, combining equation 29 with equation 26 we can express the solution vector (u P ) up m P in terms of the initial guess (u P ) do 0 and f P . For instance, further considering f P = 0 and 264 for P = 1, we obtain the multigrid matrix for a two-level V-cycle -the simplest possible: This is in agreement with existing results in the literature [101,93]. At this point we can express the 266 multigrid operator as a single matrix for whichever polynomial order and number of smoothing steps. This allows for a numerical analysis of the operator as a whole, without need to consider separately the behaviour 268 of the smoothers, the restriction matrices and the prolongation matrices. Likewise, the eigenvalues of the operator can be computed without simplifications based on the wavenumber.

Matrix formulation of the W-cycle operator
We only consider W-cycles where the coarse-level loops, henceforth referred to as w-sweeps, take place 272 between levels p = 1 and p = 0. One may easily obtain the matrix operator for a W-cycle multigrid by following the same procedure as in section 3.1 and adding the corresponding extra smoothing operations.

274
The matrix obtained in equation 30 is readily available for this purpose. smoothing operator Z m,p would rather be: This applies to all operators, including the restriction and prolongation matrices between p-levels. As 282 for the inter-grid projection matrices, we use a simple averaging for the restriction to a coarser mesh, and pure injection for the prolongation to finer meshes. Remember that the h-part of the multigrid happens at 284 constant P = 0, therefore there is only one solution point per cell. In a 1D, two-grid case where h 0 is the coarse mesh and h 1 is the fine mesh, this yields: At this point, we have the possibility to express in compact matrix form a variety of multigrid operators, with changing number of H and P levels, types of cycle (V and W) and sweep patterns. Moreover, for a fixed 288 multigrid configuration, a number of parameters can still influence the numerical behaviour: the time-step at each level, the mesh characteristics and the ratio of convection to diffusion, among others. The procedure for performing a non-modal analysis is similar to a von Neumann analysis in that we seek to write the numerical scheme in matrix form: However, instead of analyzing the eigenvalues of A j , Fernández et al. [40] define a short-term diffusion where * ≤ 0 is expected. This quantifies the change in the magnitude of u j right after the simulation starts, with τ * = t a/h * a non-dimensional convection time per DOF. Note that h * j = h j /(P + 1), is the 298 smallest length scale of the discretization. The above can be rewritten as: where u j,0 is the initial condition, and † denotes the complex conjugate. The former equation provides an solution is an eigenvector of the discretization operator A. Likewise, the dispersion curves are symmetric as in the von Neumann analysis; however, they are not periodic.

306
In this study, we are interested in solving the steady-state problem A j u j = 0 with a good convergence rate. Knowing that the errors satisfy the same equation as the numerical solution, we relate the efficiency 308 of a multigrid configuration to its non-modal dissipation curve. A multigrid operator with higher shortterm diffusion (more negative values of * ) will dissipate the error-waves faster, therefore having a higher 310 convergence rate.

Non-modal analysis of a FR multigrid operator 312
For the purpose of this paper, it is imperative to work with fully-discrete schemes, both in space and time. The short-term diffusion curve is obtained from a spatial discretization operator; one can nevertheless 314 analyze a fully-discrete operator by taking the following approximation: We have substituted the spatial operator A j by the fully discrete operator Z j . This definition is con-316 sistent with the authors' original equation (35), meaning that in the time-continuous limit ∆t → 0, as time-discretization errors become negligible, one recovers the dissipation curve of the spatial discretization 318 operator. By using G j instead of Z j , the analysis can also be extended to multigrid operators.

Comparison against von Neumann analysis 326
In figure 1, a comparison is made between non-modal analysis and von Neumann analysis in order to illustrate their similarities and differences. The modal (von Neumann) dissipation curves are computed from 328 the eigenvalues of the operator -see for instance [38], and the so-called physical eigenmode is the one that approaches a zero-dissipation in the zero-frequency limit. As expected, the physical eigenmode approaches 330 the non-modal dissipation curve in the low-wavenumber region. Following reasoning made in [38] and [40], the contrary happens in the high-wavenumber region, where all eigenmodes contribute in similar proportions. 332 This is especially clear at P = 1, where the non-modal dissipation curve approaches that of the spurious eigenmode at high-frequencies. We note that the use of terms "physical" or "spurious" are probably not 334 adequate in this context, as we are using a multigrid approach that does not try to accurately follow a physically meaningful transient behaviour. Rather, what we call the "physical eigenmode" in this context 336 is the one responsible for the coarsest-level smoothing. Its range of influence in the frequency spectrum is therefore reduced at increasing polynomial orders, as it can be seen at P = 2.
Figure 2: Non-modal dissipation curves of the p-multigrid operator (red) and fully-explicit marching (black line) at higher polynomial orders, in the time-continuous limit. Throughout this work, the time-continuous approximation is made by choosing ∆t < ∆tmax · 10 −4 , where ∆tmax is the stability limit of the fine-level smoother. Below this threshold, the dissipation curves are virtually independent of the time-step.

Effect of the polynomial order
V-cycle p-multigrid is considered with only one smoothing step at each level. Figures 1 2 compare, for 340 polynomial orders P = 1, .., 6, the non-modal dissipation curve of p-multigrid and the corresponding explicit operator in the limit ∆t → 0 so that time-discretization errors are negligible. The limit ∆t → 0 results in 342 a "pseudo-time-continuous" p-multigrid operator G P which can be compared to the corresponding spatial operator A P at the finest level. The mesh is uniform in all cases, h = 1. It is clearly seen how dissipation is 344 increased in the low wavenumber range for all orders, the difference becoming greater for higher polynomial orders. This suggests that the p-multigrid effectiveness would increase with the polynomial order. 346

Effect of the sweep pattern
Coarse-level sweeps are computationally inexpensive with respect to fine-level sweeps, therefore multi-348 grid implementations usually perform more smoothing steps at coarser levels to increase low-wavenumber dissipation. We denote the sweep pattern (SP) as the number of smoothing steps used for each multigrid 350 level. Let us consider three possible choices: • m p ≡ 1, constant ("all-ones") SP. One pre-smoothing step and one post-smoothing step are performed at all levels. This is the choice made in the previous figure 2.

354
• m p = 2 P −p , geometric SP. 2 P sweeps are performed at p = 0. The limited efficiency of level p = 0 has also been reported in the literature [95] in the context of a 370 discontinuous Galerkin discretization.

V-cycle and W-cycles
372 Figure 5 shows the non-modal dissipation by means of V-cycles and W-cycles for P = 2 and P = 3, always using a geometric sweep pattern. In the W-cycle there is an extra parameter, the number of w-sweeps, that 374 corresponds to the number of extra loops between p = 1 and p = 0 with respect to a V-cycle. For instance, at P = 2 with a geometric sweep pattern:
• The sequence in a W-cycle with 1 w-sweep is:

378
• The sequence in a W-cycle with 2 w-sweeps is: At P = 2, the difference between V-cycles and W-cycles is relevant. For the same amount of fine-level it is only expected that the relative importance of p = 1 and p = 0 decreases, and therefore the advantage of 384 using a two-level loop in the W-cycle is limited. Similar to increasing the number of coarse-level smoothing steps as in figure 4, the behaviour near the high-wavenumber limit remains largely unchanged. as it has already shown to have the highest dissipation rate. Figure 6 plots the dissipation curves of the combined hp-multigrid operators compared to the p-multigrid. It can be seen that the numerical behaviour 392 is not substantially different for higher polynomial orders (P ≥ 3), even with the addition of only one mesh.
One may draw two observations:

394
• Most of the frequency spectrum does not seem affected by the use of coarser meshes, which implies that the p-multigrid dominates the dissipation behaviour over the h-multigrid. Raising the number 396 of h-levels (or coarser meshes) barely seems to increase the dissipation rate. Note that, by using a geometric sweep pattern and especially at high polynomial orders, the number of sweeps at the coarsest 398 mesh becomes very high. The computational cost of doing such a number of smoothing steps should be considered against the marginal increase in the dissipation rate, especially for P = 4, where even 400 the sole addition of more meshes is barely effective.
• However, in the limit φ → 0, there are significant differences in the dissipation rate, especially at 402 low and moderate polynomial orders. Take for instance φ = π/16, which at P = 3 represents an error wave of wavelength λ = 8. This means that a full length spans 8 cells, which seems reasonable   for a low-frequency error in an actual simulation. Without h-multigrid, the damping factor of the V-cycle is * p (π/16) = −3.3; the same V-cycle with an additional mesh (hp-multigrid) has a damping 406 * hp2 (π/16) = −6.6, and with two more meshes it decreases down to * hp4 (π/16) = −12.6 (see figure  7). This ratio is approximately maintained at P > 3, being even higher at P ≤ 2, which is expected to 408 translate into an improved convergence rate when the solution has a strong content in low-frequency errors. Therefore, in spite of the curves being visually similar, the use of hp-multigrid would still be 410 advantageous.

1D convection-diffusion
Following what has been shown in the literature, high degree of convection can degrade the performance 418 of the multigrid [91], and the same applies to high aspect-ratio meshes [93]. Therefore, we look for those multigrid configurations that damp the expected reduction in the dissipation rate.

420
The dissipation of V-cycle p-multigrid with geometric sweep pattern for P = 1, .., 4 is shown in figure 8 for different values of the Péclet number. The low-and mid-wavenumber dissipation is most affected by the 422 presence of convection, whereas in the high-wavenumber limit the change is not as drastic, especially at low polynomial orders. Overall, the dissipation decreases as the Péclet number increases, and is greatly reduced 424 at already moderate convection (Pe = 10). The case Pe = ∞ has also been plotted, even though a pureconvection problem has no steady solution, to illustrate the similarity between Pe = 10 and pure-convection.

426
The multigrid dissipation curve of mixed convection-diffusion at Pe ≥ 100 is already indistinguishable from Pe = ∞. Of particular interest is the apparent anti-dissipative behaviour shown by the latter at P ≥ 3, 428 which would imply an unstable scheme. In practice, the dissipation introduced by the temporal discretization prevents this from happening. Note also that the fully-explicit marching in the Pe → ∞ limit shows a much 430 smaller dissipation than the multigrid at low polynomial order. From previous results it had been shown that W-cycles or hp-multigrid increases the dissipation, so 432 applying them in mixed convection-diffusion may help in counteracting the flattening of the curves at higher Péclet numbers. Figure 9 compares several multigrid strategies at P = 2 and P = 3, from where it is 434 seen that (i) W-cycles increase the overall dissipation, especially in the mid-and high-wavenumber region, (ii) hp-multigrid increases the low-wavenumber dissipation and (iii) the combined use of W-cycles and hpmultigrid retains both advantages, yielding the most efficient scheme for mixed convection-diffusion. This trend maintains for higher polynomial orders. In the next section, we will cover how the results shown so 438 far extend to 2D and in the presence of high aspect-ratio cells.

2D convection-diffusion 440
In 2D, two independent wavenumbers φ x and φ y define the initial wave. Therefore, we show dissipation surfaces rather than dissipation curves. The initial condition is a plane wave forming an angle with the 442 x-axis given by θ = tan −1 (κ y /κ x ). This plane wave is parallel to the x-axis when κ y = 0 and parallel to the y-axis when κ x = 0.

2D diffusion on square cells
In figure 10 we can see the dissipation surfaces at P = 2 of a 2D diffusion problem. With respect to 446 the V-cycle, the W-cycle increases the dissipation in the whole frequency spectrum, and the hp-multigrid has a sharper decrease in the damping factor at very low frequencies: all of this is consistent with what has 448 been reported in 1D, and the same trend applies to polynomial orders P = 1, ..., 6 alike. We find that the behaviour of multigrid operators extend naturally from 1D to 2D square meshes in the case of pure diffusion; 450 therefore we avoid unnecessary repetitions and focus on high aspect-ratio cells and mixed convection-diffusion instead.   For simplicity we only consider convection speeds a x = 1 and a y = 0, so convection happens along the x-axis, and a Péclet number of Pe = 10 3 which implies ν = 0.001. For convection-dominated flows, one may 470 approximate that the maximum allowable time-step scales with h −1 and therefore the scaled dissipation is taken as * r −1 h . At P = 2 and using a V-cycle p-multigrid, figure 12 shows the dissipation surfaces at increasing aspect-ratios. In a square cell, the surface is regular and monotonic, and it can be easily connected to the 1D analogous curve shown in figure 8. The behaviour changes drastically as we move to an aspect-ratio 474 r h = 10, seeing an overall decrease in dissipation especially for the low-frequency, y-component of the errors which remains largely undamped. On the other hand, as the aspect ratio increaser further, the damping 476 factors of the high-frequency, y-component of the errors decrease comparatively while the overall shape of the surface becomes similar to that of figure 11. This implies that very high aspect-ratios dominate the 478 multigrid behaviour over the relative importance of convection or diffusion. To mitigate the dissipation reduction in high aspect-ratio, one may propose to use W-cycles and hp-480 multigrid. It was shown in figure 9 that this strategy has a certain efficiency in mixed convection-diffusion, and we obtain similar results when applied to high aspect-ratios, see figure 13. The W-cycle effectively dou-482 bles the maximum dissipation and the use of hp-multigrid shows higher damping near the low-wavenumber region in φ y . As in the case of figure 9, the hp-multigrid with W-cycles retains the advantages of both 484 W-cycles and hp-multigrid. The trends observed in this section maintain at higher polynomial orders, as it can be seen in figure 14.

Verification with a Method of Manufactured Solutions
We now solve two simple, linear convection-diffusion problems in 1D using multigrid methods and compare 488 the convergence rate against the results of the non-modal analysis. The test cases are steady manufactured solutions proposed in [98] to assess the numerical properties of Flux Reconstruction. In order to measure 490 the experimental convergence, in each one of the simulations we will compute the convergence factor ρ: where N I is the number of iterations with time-step ∆t needed to achieve a certain residual norm ||r N I ||. damping factor, while * would be a theoretical damping factor. The value of ρ is expected to approach (P +1) * for a given error frequency k. The factor (P +1) must be introduced since the non-modal damping 500 factor represents damping per normalized time (cf. [40] for more details).
Time-marching (or smoothing operations) is done using the Optimal Runge-Kutta (ORK) schemes re-502 cently developed by Vermeire et al. [102], whose coefficients have been optimized for FR-DG discretizations of linear advection in order to maximize the allowable time-step while dropping time-accuracy. As this study 504 focuses on multigrid marching to a steady-state solution, neither is time-accuracy of interest here.

1D periodic wave 506
The computational domain comprises the interval x ∈ [−π, π] with periodic boundary conditions. The linear convection-diffusion in 1D is to be solved with a constant source term s(x) and the corresponding 508 steady-state analytical solution u e (x) being as follows [98]: where we have introduced k = 1, 2, 4, 8, ... as an additional free parameter. The steady solution is a sinusoidal 510 function of amplitude a/ν. In this study, we will use u 0 (x) ≡ 0 as an initial condition, and therefore it can be seen that the initial error is a wave of the same frequency as the steady solution itself. This allows to use 512 the test case to assess the convergence rate of the multigrid for a single wavenumber, which can be related to a single value of the non-modal dissipation * (φ). We will focus on mixed convection-diffusion with a = 1 and ν = 1, and polynomial orders P = 1, ..., 6. The case setup is as follows. For a given polynomial order P , the domain is divided in a number or cells N C , which for the zero-initial condition yields the following 516 error wavenumber φ: We will focus on the behaviour in the low-frequency spectrum by using fine meshes and alternating 518 between k = 1 and k = 2. The normalized wavenumber φ is used to compute the theoretical damping factor (P + 1) * (φ) for the multigrid configuration being tested. The system is solved by marching to steady state 520 until the cell-average L 2 norm of the residuals drops below 10 −6 with respect to the initial residual, and then the convergence factor ρ is computed. The time-step size is chosen between 0.5 × ∆t max and 0.6 × ∆t max , 522 with ∆t max the stability limit of the fine-level smoother.
The results are shown in table 1. A total of 44 simulations have tested, for polynomial orders 1 to 6, the 524 most common multigrid configurations: p-multigrid with V-cycles, p-multigrid with W-cycles, hp-multigrid with V-cycles and hp-multigrid with W-cycles. The geometric sweep pattern is used in all configurations.

526
Two w-sweeps are used in the W-cycles, and one additional mesh is used in hp-multigrid. The main outputs of table 1 are the values of the experimental convergence rate ρ and the theoretical factor (P + 1) * .

528
Very good agreement is shown in most cases. With the exception of some clearly discrepant values (e.g. P = 4, k = 2 with W-cycle p-multigrid), the relative error between experimental and theoretical rates of 530 convergence is below 10%, which seems satisfactory for predictive purposes. Of particular notice is that the non-modal damping factor, being representative of short-term dynamics, can predict long-term behaviour 532 with surprising accuracy in cases in which a large number of iterations is needed to converge (see V-cycle p-multigrid at k = 1 for P = 1, 2, 3). However, one would accordingly expect the non-modal analysis to 534 better estimate the higher convergence rates. This is clearly not the case -see for instance P = 5, k = 2 for W-cycle hp-multigrid -, probably due to the following:

536
• The non-modal dissipation is obtained in the time-continuous limit ∆t → 0. Hence, we are approximating the dissipation behaviour of a multigrid operator in the same way that we would approximate 538 the properties of a discrete scheme -in space and time -by the properties of the spatial discretization operator. The dispersion and dissipation curves are known to be more sensitive to time-discretization 540 errors in the high-wavenumber range -see [30]. This would explain why simulations at higher error wavenumbers yield higher discrepancies between ρ and (P + 1) * .

542
• In the case of a very fast convergence (see W-cycle hp-multigrid at P = 5, 6), the approximation used for computing the convergence factor, equation 38, may also yield higher errors. It can be shown that 544 reducing the time-step makes ρ approach (P + 1) * , which is theoretically consistent.
Finally, it is worth noting the improvement in the convergence rate as the polynomial order increases.
546 Table 1: Convergence data for the 1D periodic problem with a = 1, ν = 1. All multigrid operators use geometric sweep pattern; W-cycles use 2 w-sweeps and hp-multigrid use one additional mesh. The same time-step is used across all levels, and required relative residual drop is 10 −6 .  All simulations using p-multigrid have a comparable number of degrees of freedom (around 30), and as the polynomial order increases, the number of iterations decreases steadily even if the time-step is also lower. This is probably due to the increasing number of multigrid p-levels for higher values of P . For the V-cycle hp-multigrid at P > 2 and even more for the highly efficient W-cycle hp-multigrid, the number of iterations 550 N I required to converge is maintained approximately constant as the number of degrees of freedom increases with P . In all those cases the cell width is the same, h = 0.7854, and thus the results can be considered as 552 a rough indication on the quasi-h-independence on the convergence of hp-multigrid.

554
The next test-case is a wall-bounded manufactured solution which mimics a boundary-layer profile. The simulation domain is the interval 0 ≤ x ≤ 1 with Dirichlet boundary conditions u(0) = 0 and u(1) = 1.

556
Inside the domain, the analytical source term and the exact solution are [98]: x u a/ν = 1 a/ν = 10 a/ν = 100 Figure 15: Analytical solution of the manufactured 1D boundary-layer problem from [98] for several values of the ratio a/ν.
As in [98], we plot the analytical solution for several values of a/ν in figure 15. While previously the error 558 wavenumber was known and unique, that information is not available now and therefore we resort to the frequency-averaged dissipation, * as a simplified representation of the theoretical predictions. Likewise,  higher for all polynomial orders, which is an expected result (see figure 8). Assuming that the frequency content of the errors is richer at the initial stages of the simulation, before high-frequency errors are damped, 572 this factorρ −1 is closer to what the average dissipation (P + 1) * is representing in this case, although we emphasize that they are not mathematically related. (P + 1) * would represent a short-term damping 574 factor assuming an equal distribution for the error wavenumber, and would also not be constant throughout the simulation since, as high-frequency errors are damped out, only low-wavenumber errors are left for which 576 the damping factor is much lower. Overall, the interest in computing (P + 1) * for predictive purposes is debatable, but from an academic point of view, it is interesting to note thatρ −1 and (P + 1) * follow the 578 same trend -decreasing as the ratio a/ν increases. This points to the possibility of finding more elaborated quantities instead of a rough average (P + 1) * to better estimate the actual behaviour with mixed error 580 frequencies, something that remains for future investigation.
The long-term convergence factorρ −10 is orders of magnitude lower, and also more similar to the convergence factors computed for the 1D periodic wave case. For all polynomial orders we see that the convergence factor worsens significantly from a/ν = 1 to a/ν = 10, but improves again for a/ν = 100, contradicting the expectations. In order to explain this, we argue that the manufactured solution at a/ν = 100 has a sharp profile and that, departing from a zero-initial solution, the initial error intuitively has a heavy high-frequency component. Additionally, the analytical solution having the following term: we know that the initial error contains a low-frequency wave with non-dimensional wavenumber φ = πh/(P + 1), with h = 0.05, whose amplitude decreases at higher degrees of convection. This suggests that the improved convergence at higher convection is also case-dependent.

584
The long-term convergence factorρ −10 also becomes more negative as the polynomial order increases, but without translating into faster convergence rates. Higher degrees of convection consistently require a smaller 586 number of iterations, and higher polynomial orders increase the number of iterations. The explanation could lie in what has been argued in the previous paragraph, but also in the allowable time-step. Notice the sharp 588 decrease in the time-step used from one polynomial order to the next (the time-step was chosen following the same criterion as for the 1D periodic wave). Also, higher ratios of a/ν can use higher time-steps, since 590 the viscosity coefficient is decreasing and the convection velocity is kept constant, so the viscous time-step restriction fades away. All in all, the time-step stability limit obviously influences the required number of 592 iterations, and this is not taken into account by the non-modal analysis. Therefore, the interpretation of * onto an expected convergence rate or computational cost should be made with caution. Altogether,

594
after showing in the first test-case how the non-modal analysis can successfully estimate damping rates of single-frequency errors, this test-case illustrates some of its limitations as a tool for quantitative predictions.

Conclusions
In this paper, we have presented an application of the non-modal analysis (NMA) to study the numerical 598 behaviour of multigrid schemes in Flux Reconstruction. The FR discretization of linear convection-diffusion has been briefly explained, and a compact matrix form has been obtained for a generic p-or hp-multigrid 600 operator using explicit Runge-Kutta smoothing. This has allowed to assess the non-modal (short-term) dissipation of a number of multigrid configurations, in the frequency domain, assuming that a stronger 602 dissipation is indicative of a faster convergence rate.
The analysis has first focused on 1D diffusion, where it has been shown that the non-modal dissipation 604 is higher for a p-multigrid operator than for a no-multigrid smoother -i.e. Runge-Kutta marching at order P -, and that the increase in dissipation is more pronounced at higher polynomial orders. Also, increasing 606 the number of coarse-level smoothing steps by means of the so-called geometric sweep pattern -performing 2 P −p sweeps at level p -effectively increases the multigrid damping factors. Similar effects are observed by 608 using W-cycles instead of V-cycles, or by using hp-multigrid with an increasing number of coarse meshes -hmultigrid levels. In particular, the hp-multigrid appears to add significant damping for very low-wavenumber 610 errors.
The non-modal analysis has been extended to mixed convection-diffusion, where it has been observed 612 that higher Péclet numbers decrease the dissipation rates especially at higher polynomial orders. This can be partially counteracted by the use of W-cycles and hp-multigrid. The analysis results extend naturally 614 to 2D cartesian meshes, where again convection-diffusion translated into reduced dissipation and a similar conclusion can be drawn for high aspect-ratio meshes.
The analysis has been compared against simulations results of manufactured solutions proposed in [98]. A steady 1D periodic-wave problem has been considered, the convergence factors predicted by the non-modal 618 analysis showing very good agreement with the experimental ones. This test-case has demonstrated the increase in multigrid efficiency at higher polynomial orders, and same has been confirmed for the use of W-620 cycles and hp-multigrid to increase low-wavenumber damping. Next, we have assessed in a 1D boundary-layer problem the capability of non-modal analysis to predict convergence in cases where the frequency content of 622 the errors is not known a priori. The short-term damping observed in simulations follow the same trend as an averaged damping factor * obtained from non-modal dissipation curves. In this case, higher damping 624 does not translate into an improvement in convergence, highlighting some of the current limitations of this analysis. As used in this paper, the NMA does not take into account time-step restrictions and therefore 626 an increase in the damping factor might be neutralized by a smaller stability limit. Along these lines, an extension of the non-modal analysis for finite time-steps ∆t > 0 is necessary to obtain convergence factors per 628 iteration, which are more pertinent for convergence acceleration. A non-modal analysis of a fully-discrete operator would also improve the damping factor estimations against actual simulations, especially in the 630 high-wavenumber range. Finally, based on the ability of the NMA to predict the convergence behaviour for single-frequency errors, an effort to more tightly relate the dissipation curves to convergence in more complex 632 simulations remains for future work.