An implicit HDG method for linear convection-diffusion with dual time stepping

This work presents, for the ﬁrst time, a dual time stepping (DTS) approach to solve the global system of equations that appears in the hybridisable discontinuous Galerkin (HDG) formulation of convection-diffusion problems. A proof of the existence and uniqueness of the steady state solution of the HDG global problem with DTS is presented. The stability limit of the DTS approach is derived using a von Neumann analysis, leading to a closed form expression for the critical dual time step. An optimal choice for the dual time step, producing the maximum damping for all the frequencies, is also derived. Steady and transient convection-diffusion problems are considered to demonstrate the performance of the proposed DTS approach, with particular emphasis on convection dominated problems. Two simple approaches to accelerate the convergence of the DTS approach are also considered and three different time marching approaches for the dual time are compared. an the

As with other implicit methods, the computational cost and memory requirements of the HDG method can become prohibitive when applied to problems that require a large number of degrees of freedom. This is of major importance for non-linear and/or transient problems in three dimensions [23,18,13]. In such scenarios, the solution of the large sparse global system of equations is typically performed using iterative methods. Although efficient preconditioners have been recently devised for certain problems [24,25], there are many situations where the application of HDG to large-scale problems becomes challenging.
The dual time stepping (DTS) approach was originally proposed, almost three decades ago [26], in the context of compressible fluid flow problems, where implicit time marching algorithms are employed to avoid the severe restrictions of explicit time marching methods. The main idea of the DTS is to add a dual time, or pseudo time, derivative to the discrete equations and to recover the implicit solution as the steady state solution in dual time [27][28][29][30]. The efficiency of the DTS approach is mainly dictated by its ability to obtain a fast convergence to steady state in each physical time step.
The DTS approach has been recently applied to high-order methods such as the standard DG method [31,32] and the flux reconstruction method [33][34][35], including several improvements to accelerate the convergence of the subiterations. This paper presents a DTS approach to solve the global system of equations that appears in the HDG formulation. The proposed approach is presented using a convection-diffusion model problem in one dimension. A proof of the existence and uniqueness of the steady state solution of the HDG global problem with DTS is presented. In addition, the stability limit of the DTS approach is derived using a von Neumann analysis, leading to a closed form expression for the critical dual time step. Finally, an optimal choice for the dual time step is proposed by imposing maximum damping over the whole range of frequencies.
The performance of the proposed DTS approach is analysed for steady and transient problems, by comparing the minimum number of dual time steps required to reach the steady state solution in each case. The study is performed for different spatial and temporal discretisations as well as for different convection-diffusion regimes, with particular emphasis on convection dominated problems. Furthermore, two techniques, previously applied in the context of DTS to improve the convergence of the subiterations, are considered. The first approach involves an extrapolation technique designed to create a better initial condition for the DTS [36]. The second approach involves the use of local DTS [37], which is equivalent to a preconditioning, and it is applied to a problem where the solution exhibits a boundary layer and the mesh contains elements of very different size.
The remainder of the paper is organised as follows. Section 2 summarises the standard HDG formulation of the convection-diffusion model problem. The spatial discretisation with isoparametric high-order elements and the temporal discretisation with implicit backward differentiation formulae (BDF) formula is presented in section 3. The proposed DTS technique for the HDG global problem is presented in section 4, including the proof of existence and uniqueness of the steady state solution, the von Neumann stability analysis of the DTS approach and the optimal choice of the dual time step. Numerical examples involving steady and transient convection-diffusion test cases are presented in section 5. This section also includes a comparison of the performance of the forward Euler method and two multi-stage Runge-Kutta methods for the dual time marching. The examples include cases where extrapolation and local time stepping are used to accelerate the convergence of the DTS approach. Finally, section 6 summarises the conclusions of the work that has been presented. Appendix A presents a brief analysis of the stability and the optimal choice for the dual time step based on the eigenvalues of the global HDG matrix.

The model problem
Let us consider the non-dimensional transient convection-diffusion model problem in an open bounded domain ⊂ R, where a is a smooth velocity field, ν is the positive diffusion coefficient, s is a source term, u 0 is the initial condition and T is the final time. To simplify the presentation, Dirichlet boundary conditions are considered on the boundary of the domain ∂ , but the application of other boundary conditions can be easily considered. The parameter δ can take the value zero for a steady problem or one if a transient problem is of interest.

Mixed formulation in the broken computational domain
A partition of the domain, , in n el disjoint subdomains e is considered and the mesh skeleton, or internal interface, is defined as By introducing the mixed variable, q = −ν ∂u ∂x , the transient convection-diffusion model problem can be written as a first-order system of equations in the broken computational domain where · denotes the jump operator, defined on using the values from the left and right of the interface, e.g. i and j , as = i − j .

HDG formulation
Following previous work on HDG methods [38,3,39,4,5,40], the strong form given by equation (3) is solved in two stages. First, the local problems are defined as for e = 1, . . . , n el . It is worth noting that the local problems can be solved independently if the Dirichlet boundary data given by the so-called hybrid variable, û, is known.
Second, the global problem imposes the continuity of the solution and the normal flux across the interface It is worth noting that the first equation can be omitted due to the unique definition of the hybrid variable û on the interface and the imposition of the Dirichlet boundary condition in the local problems.
The semi-discrete weak formulation of the local problems, given by equation (4), reads: for e = 1, . . . , n el , given û on has been introduced. In this definition, P p ( e ) denotes the space of polynomials of degree at most p ≥ 1 in e .
Following [4,41], the numerical normal trace corresponding to the convective component is defined as and the numerical normal trace corresponding to the diffusive component is defined as In the above expressions, τ a and τ d are stabilisations parameters related to the convective and diffusive component, respectively, and θ takes value -1 on x e 1 and value 1 on x e p+1 . Following previous work on HDG methods [4,41], the stabilisation parameters are selected as where is a characteristic length. Introducing the numerical normal traces from equations (7) and (8) in equation (6) leads to the semi-discrete local problems: for e = 1, . . . , n el , given The discrete formulation of the global problem given by equation (5) becomes: find û h such that Introducing the numerical normal traces from equations (7) and (8) in equation (11) leads to the following semi-discrete

Spatial discretisation
The discretisation of the weak form of the local problem, given by equation (10), leads to a system of equations that may be expressed as Similarly, the discretisation of the weak form of the global problem, given by equation (11), leads to a system with the following structure It is worth noting that A e qu = (A e uq ) T and A ê uq = (A e qû ) T . However, the global problem is not symmetric as A ê uu = (A e uû ) T .

Temporal discretisation
In this work, if δ = 1, the time discretisation is performed using either first or second order accurate backward differentiation formulae (BDF) [42]. The application of other implicit time marching algorithms is straightforward. Introducing this approximation of the first-order time derivative in equation (13) for the second order accurate BDF (BDF2), where t is the time step and the super-index n denotes the evaluation at time t n .

Implementation
The local problem of equation (15)  . Introducing these expressions in the global problem of equation (14) leads to the global system of equations where the global matrix is obtained by assembling the elemental contributions given by Similarly, the global right hand side is obtained by assembling the elemental contributions given by Once the global system is solved, the local problems (15a) and (15b) are used to obtain the approximation of the primal and mixed variables in each element at time t n+1 . Remark 1. If δ = 0 the temporal discretisation using BDF is not required, leading to a global system that can be written as in equation (18) The notation used here for the numbering of the primal, mixed and dual variables is illustrated in Fig. 1 for a uniform mesh with quadratic elements. Inserting the expressions of equation (21) into the global problem leads to the equation for a generic vertex i, not on the boundary, where

Global problem with dual time stepping
The HDG method is known to more efficient than other DG methods [43,44]. This is due to the reduced number of degrees of freedom induced by the introduction of the hybrid variable as an independent unknown. However, its implicit nature leads to a large sparse global system of equations that needs to be solved to obtain the approximation of the trace of the solution on the element faces.
To avoid this requirement, the DTS approach is applied to the global problem of equation (18). The proof of existence and uniqueness of the solution of the modified problem is presented and the stability limit for the dual time step is derived. In addition, an optimal choice for the dual time step is presented.

Formulation
A first-order dual time derivative is added to the HDG global problem, creating the system of ordinary differential equations where t is the so-called dual (non physical) time.
It is clear that, if a steady-state solution of the problem (25) exists, the steady state solution coincides with the solution of the HDG global problem of equation (18).
The objective of the DTS approach is to advance the solution in the dual time until a steady-state is reached. Therefore, in principle, a high order accurate scheme in the dual time is not needed. In this work, the system of ordinary differential equations (25) is discretised in the dual time, t , using the first-order explicit forward Euler method, leading to the scheme (26) to advance the solution in the dual time, where the superscript m denotes the evaluation at the dual time t m .

Remark 2.
There are several alternatives to accelerate the convergence to the steady state. First, it is possible to perform the widely used local DTS approach where, in an HDG context, the nodal values of each face are advanced at a different dual time step. Second, an extrapolation based on previous physical time steps can be devised to obtain an initial condition for the DTS that reduces the overall number of dual time steps. In addition, the dual time discretisation can be performed with other explicit time marching algorithms. Despite time accuracy is not required, it is worth noting that other time integrators, such as multi-stage Runge-Kutta methods, could offer a faster convergence to the steady state. Finally, it is possible to include a preconditioning matrix in the formulation of the global problem with dual time, namely The first three strategies are studied numerically in section 5. The use of preconditioning matrices is beyond of the scope of the current work.

Stability analysis
The stability of the DTS scheme is studied first. Then, some of the expressions derived to obtain the stability limit are used in section 4.3 to develop the proof of existence of the steady state solution and the optimal choice for the dual time step. Lemma 1. Assuming a uniform spatial discretisation with characteristic element size h and degree of approximation p, a constant velocity a ∈ R and a positive constant diffusion coefficient ν ∈ R + , then α := α i = α j , β := β i = β j and γ := γ i = γ j , ∀i, j. The DTS scheme of equation (26) can be written aŝ (28) for a generic vertex x i that is not on the boundary.

Proof.
Assuming that all elements have the same characteristic size h and degree of approximation p, the matrices M e , A e uq , A e qu and A e qû are the same for all the elements. A constant velocity implies that the matrices A e qq are the same for all the elements. A constant diffusion coefficient implies that the matrices A e uu and A e uû are the same for all the elements.
As the stabilisation parameter τ depends upon a and ν, this implies that the matrices R e and S e are the same for all the elements. This completes the proof.
Proof. Combining equations (22a) and (22b) leads to Taking the sum over all the rows and using the definition of the matrices B e uu , A e uq and A e uû , the two equations are obtained, where ϑ = 1/ t for BDF1 and ϑ = 3/(2 t) for BDF2. The positive weights w j for j = 1, . . . , p + 1 correspond to the quadrature weights associated to the nodal distribution x j in an element and are obtained using the partition of unity property of the shape functions, namely Equation (30) was derived using the two properties resulting from the definition Lagrange polynomials and their first derivative. Adding equations (30a) and (30b) leads to which concludes the proof.

Theorem 3.
Under the assumptions of Lemma 1 and assuming that δ = 1, the scheme of equation (36) is Proof. To perform the classical von Neumann stability analysis of the DTS scheme, the homogeneous version of equation (28) is considered. As usual in this context, constant coefficients and a uniform spatial discretisation are considered, leading to the discrete equation for a generic vertex, x i , not on the boundary. Assuming that the initial condition can be written as a convergent Fourier series and using the linearity of the problem, only the evolution of a single Fourier mode, namelŷ needs to be considered, where I = √ −1 and k is the wave number.
Inserting equation (37) into the discrete scheme of equation (36) and, leads to the relation where the numerical amplification factor is given by and ξ = kh is the dimensionless wave number.
For a given dual time step, the extreme values of the square of the modulus of the numerical amplification factor are obtained when ∂|G(ξ, t )| 2 /∂ξ = 0, leading to possible solutions and ξ = π. (40c) The solution of equation (40a) gives Similarly, the solutions of equations (40b) and (40c) give and respectively.
If αγ = 0, the first solution corresponds to an inflection point, so either G 2 or G 3 corresponds to a maximum. If αγ > 0, the first solution is a minimum because ∂ 2 |G(ξ, t )| 2 /∂ξ 2 = 8αγ sin 2 (ξ ) t 2 > 0. If αγ < 0, the first solution is a maximum, provided that Imposing that G 1 < 1 leads to the stability condition . (43) The most restrictive condition given by equation (43) is obtained for ξ = 0 or ξ = π , meaning that either G 1 = G 2 or This means that, from a stability point of view, only the solutions corresponding to ξ = 0 or ξ = π are of interest. Particularising the value of the numerical amplification factor for ξ = 0 and ξ = π , gives Imposing the requirements that G(0, t ) > −1 and G(π , t ) > 1, the critical time step of equation (35) is obtained. Finally, imposing that G(0, t ) < 1 and G(π , t ) < 1 and noting that the dual time step, t , is positive by definition, it is clear that (β + α + γ ) > 0 and (β − α − γ ) > 0, which means that the critical dual time step is strictly positive.
Proof. Following the proof of Theorem 3 it is clear that |G(0, t )| 2 = 1 so it is only necessary to impose that |G(π , t )| < 1, leading to the stability limit The proof is concluded by using Lemma 2.
The value of the critical dual time step depends upon several physical and numerical parameters, so that To illustrate the influence of these parameters, Fig. 2 shows the value of t , with the BDF1 method used for the physical time marching, as a function of the Péclet number P e = ah/(2ν) and the Courant number C = a t/h. Different values of the velocity a, the diffusion coefficient ν and the degree of approximation are considered. The white dashed line denotes the curve where α + γ = 0 that produces a change in definition of the critical time step given by equation (35). Numerical experiments indicate that for an even value of the degree of approximation, p, the value of α + γ changes sign. For an odd value of p the value of α + γ is always negative. It is worth noting that the range of values used in Fig. 2  and a = 1, as shown in Fig. 2(a), the critical dual time step is t ≈ 0.0937 for P e = 0.1 and C = 0.1, whereas t ≈ 1.4814 for P e = 10 and C = 0.46. In all cases, there is a region, for C < 1 and P e > 10 2 , where the critical dual time step is maximum. However, this region is not of great interest when implicit time integrators are employed. For a fixed P e and C , the results show that the critical dual time step is almost independent on the diffusion coefficient ν, whereas it depends on the value of the velocity a. In fact, the results indicate that the critical time step depends on the inverse of the velocity. To illustrate the difference induced by the degree of approximation, Fig. 3 shows the critical dual time step as a function of P e and C in the range [1, 10 6 ], for a = 1, ν = 0.1 and for p = 1, . . . , 6. The results clearly indicate a different pattern for Fig. 3. Value of the critical dual time step, t , given by equation (35), as a function of P e and C for a = 1 and ν = 0.1.
odd and even values of p and show that the critical dual time step is lower for an even value of p compared to an odd value of p. The region where the critical dual time step is maximum is also slightly bigger when the value of p is odd. Further numerical studies reveal that the qualitative behaviour is similar when the BDF2 method is used for the physical time marching, with a slightly higher value of the critical time step for the BDF2 method compared to BDF1.

Existence of the steady state solution in dual time
The proof of the existence of the steady state solution of equation (25) will use the following results.
Proof. The resulting matrix of the HDG global problem can be written as The proof is completed by noticing that K is a Toeplitz matrix, whose eigenvalues are obtained using the Chebyshev recurrence formula [45] and are given by where N = n el + 1 is the size of the HDG global matrix.
(50) Theorem 3 shows that the maximum of |G(ξ )| 2 is achieved at ξ = 0 or ξ = π . Then, it is clear that |G(π /2)| 2 < 1 if the dual time step is selected below the stability limit derived in the previous section, that is if t < t .
Proof. In the proof of Theorem 3, the two inequalities were obtained, where the equality in equation (52a) corresponds to the steady case, with δ = 0.
After some simple algebraic manipulations, the inequality is obtained, which completes the proof.

Theorem 7.
Under the assumptions of Lemma 1, the steady state solution of the global HDG problem with DTS given by equation (25) exists and is unique.
Proof. If αγ ≤ 0, the real part of the eigenvalues of the HDG global matrix is (λ r ) = β for r = 1, . . . , n el + 1, according to Lemma 4, which is positive according to Lemma 5.
If αγ > 0, the eigenvalues are real and bounded as λ r > β − 2 √ αγ for r = 1, . . . , n el + 1. Using the result of Lemma 6, it can be concluded that λ r > 0 for r = 1, . . . , n el + 1. By using the spectral decomposition of K = V V −1 , the global HDG system with DTS can be written as where W := V −1 U, is a diagonal matrix containing the eigenvalues of K and g := V −1 f. The solution of the HDG global problem with DTS given by equation (54) can be written as which, for t → +∞, converges because all the eigenvalues are positive [46]. This completes the proof of existence.
A steady state solution of the HDG global problem with DTS given by equation (25) is, by definition, a solution of the original HDG global problem of equation (18). The uniqueness of the steady state solution with DTS is then a consequence of the unique solution of the standard HDG method.

Optimal choice of the dual time step
i.e. the value that provides the maximum damping of all the frequencies, is given by (57) Proof. The proof of Theorem 3 shows that the extrema of |G(ξ )| 2 correspond to the three solutions in equation (40).
If αγ ≥ 0, the solution given by equation (40a) does not correspond to a maximum. The solutions given by ξ = 0 and ξ = π lead to the expressions of the square of the numerical amplification factor given by equations (41b) and (41c) respectively. Comparing the expressions for G 2 and G 3 it is clear that the dual time step that provides a minimum value of both G 2 and G 3 is t = 1/β. If αγ < 0, the solution given by equation (40a) is a maximum, provided that equation (42) holds. Then, the solutions given by ξ = 0 and ξ = π lead to a minimum, so they are not of interest from the point of view of the optimal dual time step. After some algebraic operations, if equation (41a) holds, then G 1 can be written as Imposing the requirement that dG 1 Finally, it is worth noting that Lemmas 5 and 6 imply that t • > 0.
To illustrate the result of Theorem 8, Fig. 4 shows the numerical amplification factor for a = 1, ν = 0.01, n el = 160, p = 2 and t = 9.375 × 10 −4 for four different values of t . The amplification factor with t = t • corresponds to the optimal value given by Theorem 8. The amplification factors with t = 1/(α + β + γ ) and t = 1/(α − β − γ ) correspond to the values that provide maximum damping for ξ = 0 ξ = π respectively. Finally, the amplification factor given by the critical time step is also included. It can be observed that the optimal choice of t = t • provides an amplification factor with a minimum value of |G(ξ, t )| 2 L ∞ ([0,2π ) . In addition, this figure also suggests that a non-uniform dual time step can provide a faster convergence to the steady state. For frequencies corresponding to kh near zero, t = 1/(β + α + γ ) provides the maximum damping. For frequencies corresponding to kh near π , t = 1/(β − α − γ ) provides the maximum damping. In fact, it is possible to select a dual time step that provides maximum damping for a given frequency ξ . This value for the dual time step can be shown to be

Steady convection-diffusion
The first test case, taken from [46], considers the solution of the steady convection-diffusion equation in = [0, 1] with a = 1. The Dirichlet boundary conditions and the source term are selected such that the analytical solution is given by u(x) = sin(10π x) + 1.
First, a mesh convergence study is performed to numerically validate the implementation of the proposed approach.   rate of convergence of the primal variable is above the optimal rate p + 1, whereas the rate of the mixed variable is slightly below. The dual time step has been selected as t = 0.99 t to ensure stability of the DTS and the tolerance for the steady state residual has been selected as = 10 −12 .
Next, the influence of the spatial discretisation, namely the element size and the degree of approximation, in the number of dual time steps required to reach the steady state solution of the HDG global problem is studied. Fig. 6 shows the number of dual time steps required to reach the steady state as a function of the dual time step size, t , for different mesh refinements and orders of approximation, when ν = 0.1. The results show that, for a given dual time step, the number of steps required to reach the steady state solution of the HDG global problem increases as the mesh is refined. It is interesting to observe that the number of dual time steps seems to be independent of the degree of approximation p. The discontinuous line in Fig. 6 corresponds to the critical dual time step given by equation (35), which for this example coincides with the optimal dual time step of equation (57), represented with a dashed line. The results show that the theoretical DTS limit derived in this work is accurate.
Note that, usually, more dual time steps are required as the mesh is refined or the order of approximation is increased. This behaviour is attributed to the hybridisation process as the size of the HDG global system in one dimension does not depend upon the degree of approximation.  The results in Fig. 6 show a sudden increase of the number of dual time steps for the coarsest mesh and for a value of the DTS slightly over the critical DTS given by equation (35). This is only observed in the coarsest mesh, where the influence of the boundary conditions and the fact that a finite domain is used, implies that the critical DTS derived using the von Neumann analysis is below the exact critical DTS. To illustrate this phenomenon, the exact critical DTS, t λ , is defined as the DTS that leads to a maximum eigenvalue of the amplification matrix, G( t ) := (I − t K), with norm equal to one. The derivation of the exact value of the critical DTS is presented in Appendix A.1. Fig. 7 shows a detailed view of the results of Fig. 6, for the coarsest mesh, in the region containing the critical DTS. In addition to the critical DTS obtained from the von Neumann analysis, the exact value of the critical DTS is also depicted. It can be observed that the critical DTS from the von Neumann analysis is slightly below the exact critical DTS. It is worth noting that, as the mesh is refined, this effect is no longer observed as the influence of the boundary conditions, and the fact that a finite domain is used, is less important.
To study the influence of the convection dominated character of the problem on the proposed dual time approach, Figs. 8 and 9 show the same study for ν = 0.01 and ν = 0.001. Again, the results show that the number of dual time steps is not influenced by the degree of the approximation. More importantly, when comparing Figs. 6, 8 and 9, it can be clearly observed that the number of dual time steps required to achieve the steady state solution of the HDG global problem significantly reduces as convection dominates. In addition, it can be observed that the critical dual time step increases for convection dominated cases and that this limit is only weakly dependent on the number of elements.
These observations indicate that the proposed approach is particularly well suited for convection dominated problems.
For instance, using a mesh of 320 elements, the case with ν = 0.001 requires less than 1,000 dual time steps to reach the steady state, whereas the case with ν = 0.1 requires nearly 100,000 dual time steps. This observation coincides with the behaviour observed when other spatial discretisation schemes are used with DTS. For instance, using a second order finite difference discretisation of the convection-diffusion equation, the critical DTS is proportional to h when convection dominates, whereas the critical DTS is proportional to h 2 when diffusion dominates.  In all the previous examples, the tolerance of the dual time marching algorithm was set to = 10 −6 . Fig. 10 shows the number of dual time steps required to reach the steady state as a function of the tolerance for different spatial discretisations and for different values of the diffusion coefficient. In all cases, the dual time step has been selected according to the critical dual time step given by equation (35) and a quadratic degree of approximation has been selected. The results clearly show that for ν = 0.1 the number of dual time steps required to reach the steady state substantially increases as the tolerance is decreased. In addition, the number of dual time steps increases as the mesh is refined. In contrast for lower values of the diffusion coefficient, the number of dual time steps shows a very weak dependence on the tolerance as well as on the number of elements. Again, this demonstrates that the proposed approach is particularly well suited for convection dominated cases.
Finally, the relation between the element size and the number of dual time steps required to reach the steady state solution of the global HDG problem is studied. Fig. 11 shows the number of dual time steps required to reach the steady state as a function of the number of elements, in logarithmic scale. The results indicate a quadratic dependence when ν = 0.1 and ν = 0.01 whereas a two regions can be differentiated when convection dominates. For relatively coarse meshes, there seems to be a linear dependence between the number of elements and the number of dual time steps required whereas for finer meshes the quadratic dependence is again observed.

Performance comparison of different time marching methods in dual time
As mentioned in Remark 2, high order multi-stage Runge-Kutta methods have shown a superior performance for the integration in the dual time in using different spatial discretisations [34]. This sections compares the performance of the DTS approach using the forward Euler method and two popular Runge-Kutta methods of order two and four in the context of HDG. The second-order Runge-Kutta (RK2) selected is described by the two stages   (1) ).
The example considered in the previous section is used to compare the different three time integrators. For each case, the optimal value of the DTS is numerically found by approximating the maximum eigenvalue of the amplification matrix. The simulation is performed with the optimal DTS and the number of dual time steps required to reach the steady state is recorded. Fig. 12 shows the ratio between the optimal number of DTS using the forward Euler method, m • Euler , and the secondorder Runge-Kutta method, m • RK2 , as a function of the degree of approximation p. Different values of the diffusion are considered and five levels of mesh refinement are used. The results show that, when the diffusion dominates, the number of dual times steps required by the forward Euler method is almost the same as the number of dual times steps required by the RK2 method, for all the spatial discretisations considered. When convection dominates, i.e. for ν = 0.001, the RK2 method requires more dual time steps to reach the steady state for the majority of cases. Only for linear elements and for two of the five meshes employed the RK2 requires less dual time steps than the forward Euler method. However, it is worth noting that the extra cost induced by the two-stage RK2 method means that the forward Euler method is a more efficient alternative. Fig. 13 shows the ratio between the optimal number of DTS using the forward Euler method, m • Euler , and the fourthorder Runge-Kutta method, m • RK4 , as a function of the degree of approximation p. When the diffusion dominates the ratio of dual time steps is, again, independent on the spatial discretisation considered. The RK4 is able to converge to the steady state using approximately a third of the number of dual time steps required by the forward Euler method. In contrast, when convection dominates the forward Euler method shows a better performance in general. It is worth noting that, for higher order elements, the Euler method results in a more efficient alternative. This is particularly relevant in the context of HDG as this method is more competitive when high-order elements are employed.
First, a convergence study is performed to numerically validate the implementation of the physical time marching BDF1 and BDF2 approaches. Fig. 14 shows the evolution of the relative error, in the L 2 ( ) norm, of the primal and mixed variables, u and q respectively, as a function of the characteristic element size, h, for three values of the diffusion coefficient and for different orders of approximation, p. The results show the optimal rate of convergence in all cases. The dual time step has been selected as t = 0.99 t to ensure stability of the DTS and the tolerance for the steady state residual has been selected as = 10 −12 . For the spatial discretisation a fine mesh with 1,000 elements and p = 8 is being used to ensure that the error due to the spatial discretisation is negligible, compared to the error induced by the temporal discretisation. Next, the influence of both the spatial and temporal discretisations in the number of dual time steps required to reach the steady state, in each physical time step, is studied. Fig. 15 shows the mean value of the number of dual time steps required to reach the steady state as a function of the dual time step size, t , for different mesh refinements and orders of approximation, when ν = 0.1. The mean value is computed over the total number of physical time steps n T . The study is restricted to a spatial approximation with quadratic elements as previous numerical experiments have shown that the degree of approximation has little influence on the results. For large values of the physical time step, i.e. n T = 40, the qualitative behaviour in the transient problem, shown in Fig. 15, is similar to that obtained in the previous steady state example. The number of dual time steps required to achieve the steady state solution of the global problem decreases and it is a minimum when the dual time step is close to the critical dual time step. However, it can be observed that the number of required iterations is approximately one order of magnitude lower than in the steady state problem. When the physical time step is reduced, i.e. for n T = 160 and n T = 640, the number of dual time steps required decreases. In addition, it can be observed that the dual time step required to achieve the minimum number of iterations differs from the critical dual time step, especially when coarse meshes are considered. It can be observed that the value of t that provides the minimum number of iterations is close to the optimal value, derived in section 4.4, that provides the maximum damping for all the frequencies. As discussed in section 4.4 the value that provides maximum damping for all the frequencies is not necessarily the one that provides the fastest convergence to steady state as it is not possible to know a priori the frequency content of the analytical solution.  Next, the same study is repeated for lower values of the diffusion parameter ν. Figs. 16 and 17 show the same study for ν = 0.01 and ν = 0.001 respectively. As before, the number of dual time steps required to reach the steady state solution significantly decreases as the value of the diffusion parameter decreases, illustrating again the suitability of the proposed method for convection dominated cases.
It is worth noting that, for ν = 0.01 and ν = 0.001, the difference between the dual time step that provides the minimum number of iterations and the critical dual time step is clearly observed when coarse meshes are used and a large number of physical time steps is considered. However, the difference decreases when the Courant number is higher than one. In addition, it is worth noticing that for the convection dominated case, with ν = 0.001, the steady state solution of the global HDG problem in each time step can be reached with less than 10 iterations, clearly indicating the potential of the proposed methodology. This example clearly shows the benefit of choosing a dual time step t = t • , rather than t = t . The number of dual time steps required by the former is up to two orders of magnitude lower.
The possibility to accelerating the convergence to steady state by devising a better initial condition for the dual time marching is now considered. The first approach is simply to take the initial condition as the hybrid variable at the previous time step, namely U n+1,0 = U n . A second approach consists of performing a first order extrapolation of the hybrid variable from the previous two physical time steps, that is U n+1,0 = 2 U n − U n−1 . A third approach considers a second order extrap-   olation using the hybrid variable from the previous three physical time steps, namely U n+1,0 = 3 U n − 3 U n−1 + U n−2 . Fig. 18 represents the number of dual time steps required to reach the steady state, m, as a function of the physical time step n, for three different values of the diffusion coefficient. The three different extrapolation techniques to devise the initial condition of the DTS are compared. A mesh with 320 quadratic elements is considered and 160 physical time steps are performed in all cases to reach the desired physical time T = 0.6. The optimal dual time step is selected according to equation (57) and the tolerance for reaching the steady state in the dual time is set to = 10 −6 .
It can be observed that for ν = 0.1, where the DTS requires more iterations to reach the steady state, the first and second order extrapolation approaches are able to significantly reduce the number of dual time steps required. When convection dominates, namely when ν = 0.01 and ν = 0.001 the extrapolation always reduces the required number of dual time steps but the computational savings are less significant as the DTS already reaches the steady state with very few dual time steps.
It is worth noting that other second order extrapolation approaches were also proposed in [36] for the solution of the Navier-Stokes equations. However, additional experiments, not reported here for brevity, did not show any advantage over the second order extrapolation reported here.

Boundary layer problem
The last example, taken from [47], considers the solution of the steady convection-diffusion equation in = [0, 1] with a = 1, homogeneous Dirichlet boundary conditions and with a constant source term s = 1, such that the analytical solution is given by where μ = a/ν.
In the convection dominated regime, the solution develops a boundary layer near the right end of the domain, at x = 1.
In such scenarios, the use of a non-uniform spatial discretisation is clearly advantageous. The example is used to show the possibility of using the local time stepping approach within the proposed dual time marching algorithm. Fig. 19 shows the convergence of the residual of the global HDG problem as a function of the number of dual time steps for a mesh with n el = 40 and p = 6 such that the elements are clustered at x = 1. The high interpolation order is chosen to ensure that the boundary layer is resolved even with a coarse mesh with only 40 elements. The ratio between the maximum element size, at x = 0, and the minimum element size, at x = 1, is approximately 35. The results show that using the local time step it is possible to reduce the number of dual time steps required. The saving is less significant when convection dominates. Fig. 20 shows the value of the critical time step in each face of the mesh for the three values of ν. It can be observed that when ν = 0.1 the critical time step of the first and last face differs by more than one order of magnitude, whereas for lower values of ν this difference decreases.

Concluding remarks
This work proposes a DTS approach to solve the global system of equations that appear in the HDG method for a convection-diffusion model problem. The proposed approach has been rigorously studied, including a proof of the existence and uniqueness of the steady state solution of the DTS method and the derivation of the magnitude of the stability limit for the dual time step size. An optimal choice for the dual time step is also provided and it is shown that this value provides the maximum damping for all the frequencies.
Numerical experiments were used to illustrate the dependence of the critical dual time step with respect to the physical (i.e. velocity and diffusion) and the numerical parameters (i.e. element size, degree of approximation and physical time step).
The results showed that the critical dual time step is higher for convection dominated problems and a different behaviour was observed for odd and even values of the degree of approximation.
Three numerical examples were used to test the performance of the proposed DTS technique to solve the global system of equations encountered in the HDG formulation of convection-diffusion problems. The first two examples considered a steady and transient convection-diffusion problem respectively and showed the potential of the proposed approach. For transient convection dominated problems, it was observed that the number of dual time steps required to reach the steady state was frequently below 10. The last example demonstrated the potential benefits of two strategies to further decrease the number of dual time steps required.

Declaration of competing interest
The author declares that he has no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. where N is the size of the global HDG matrix.
If αγ ≥ 0, the maximum eigenvalue of G is such that Similarly, if αγ ≥ 0, the maximum eigenvalue of G is such that Imposing that |μ max | 2 ≤ 1, leads to t ≤ t • λ , with the exact critical dual time step given by It is worth noting that, as the mesh is refined the exact critical dual time step is approximated as (A.5)

A.2. Optimal choice of the dual time step
Using the expression of the eigenvalues given by equation (A.1) it is also possible to derive the exact value of the optimal dual time step, by imposing that the maximum eigenvalue of the amplification matrix is a minimum. This leads to the expression t • λ = β β 2 − 4 cos 2 Nπ N+1 min(αγ , 0) . (A.6) It is worth noting that, as the mesh is refined the exact optimal dual time step is approximated as which coincides with the optimal value derived from the von Neumann analysis in Theorem 8.