Optimal Runge–Kutta schemes for pseudo time-stepping with high-order unstructured methods

Article history: Received 27 February 2018 Received in revised form 22 November 2018 Accepted 3 January 2019 Available online 30 January 2019


Introduction
The Artificial Compressibility Method (ACM) is an alternative to pressure correction based methods for solving the incompressible Navier-Stokes equations [1]. In the ACM, pressure is solved together with other variables from a transformed hyperbolic steady-state system instead of via an elliptic Poisson equation. Combined with dual time stepping [2], the ACM can be applied to incompressible unsteady flows by approximately solving the steady-state problem in fictitious pseudo-time at each physical time step. In practice, the computational cost of this approach is dominated by the rate of convergence of the solution in pseudo-time. There are several popular approaches to performing pseudo-iterations, such as implicit pseudotime stepping with the Lower-Upper Symmetric Gauss Seidel (LU-SGS) method [3,4], or using explicit schemes such as Runge-Kutta (RK) methods [2].
Recent research has led to the development of high-order unstructured spatial discretizations suitable for the ACM, such as the Discontinuous Galerkin (DG) [5,6], Spectral Difference (SD) [7,8], and Flux Reconstruction (FR) approaches [9]. It has been demonstrated previously that these methods are particularly appealing for scale-resolving simulations of unsteady turbulent flows via Large Eddy Simulation (LES), Implicit LES (ILES), and Direct Numerical Simulation (DNS) [10][11][12]. They are able to provide accurate solutions on mixed-element unstructured grids, and have been shown to be more accurate than industry standard second-order tools per degree of freedom [13]. While significant contributions have been made to the application of high-order unstructured methods to incompressible flows [4,[14][15][16], the majority of previous work has focused on their application to the compressible Navier-Stokes equations. Recently, it has been demonstrated that the FR approach can be extended to solve incompressible flow problems using the ACM and dual time stepping. Furthermore, it has been demonstrated that FR with explicit time stepping can be scaled to tens of thousands of GPU compute nodes achieving > 50% of peak performance on modern many-core hardware architectures [17].
The main advantages of using the ACM with explicit pseudo-time stepping are its scale invariance and low memory footprint, making it well-suited for modern many-core hardware platforms. Although explicit schemes are relatively inexpensive per time step, they are restricted by their conditional stability limit and, hence, a greater total number of pseudo time steps must be taken. This issue can be addressed by adopting implicit schemes, allowing significantly larger pseudo-time steps. However, the trade-off with implicit schemes is that they require the computation and storage of Jacobian matrices. This becomes prohibitively expensive for higher-order spatial discretizations and is typically memory bandwidth bound. In addition, many methods for solving the resulting linear system, such as Lower-Upper Symmetric Gauss-Seidel (LU-SGS) or Newton-Krylov methods, are not scale invariant and increasing the amount of parallelism also reduces the numerical effectiveness of the preconditioner. Hence, in the context of high-order spatial discretizations using the ACM on modern hardware architectures, explicit approaches remain relevant even though a relatively large number of explicit iterations are usually required to converge the pseudo time system to pseudo-steady-state [18].
The maximum stable time-step for an explicit discretization is a function of both the temporal scheme, such as the particular explicit RK method being used, and the spatial discretization [19]. In this study we develop optimal explicit RK methods in order to reduce the total number of pseudo time steps and hence total computational cost of an explicit high-order unstructured ACM implementation. The methods are optimal in that they allow the largest possible pseudo time step to be taken for a given number of RK stages. The rest of the manuscript is organized as follows. In Section 2 we discuss the general ACM formulation and its solution using dual time stepping. In Section 3 we summarize general RK methods and in Section 4 present the FR approach. We then derive optimal RK schemes for pseudo time stepping, and demonstrate their application to incompressible turbulent flow over an SD7003 airfoil and an incompressible turbulent jet in Section 5. Finally, we present conclusions in Section 6.

Formulation
The ACM was originally introduced by Chorin [1] in the late 1960s for solving steady incompressible fluid flow problems. Rather than relying on a Poisson equation based projection, a coupling pressure term is introduced to the continuity equation to drive the system towards a solenoidal velocity field in the limit of steady state. The original formulation preserves the hyperbolic nature of the system, but destroys time accuracy. The most common procedure for recovering time accuracy is the dual time stepping approach of Jameson [2]. The artificial compressibility system can be written in a conservation form as ∂u ∂t where ζ is the artificial compressibility relaxation factor, ν is the kinematic viscosity, p is the pressure, and v i is a velocity component for i ∈ {x, y, z}. The ACM formulation has a hyperbolic nature in pseudo time, similar to the linear advection equation, since artificial pressure waves travelling at a finite speed are introduced. The eigenvalues of the inviscid flux Jacobian matrices where c i = v 2 i + ζ is the pseudo speed of sound. The pseudo speed of sound is proportional to the artificial compressibility relaxation factor ζ , which is adjusted to downscale the wave speed to reduce system stiffness.

Dual time stepping
Dual time stepping can be considered as consecutive pseudo steady-state simulations [2]. In this approach, the real time is discretised with a backward difference (BDF) scheme, whose solution is found by marching the governing equation in pseudo time. A conservation law with an additional pseudo time derivative can be marched towards a pseudo steady state following ∂u ∂τ A cancellation matrix I c = diag{0 1 1 1} needs to be added as a coefficient for the real time derivative to eliminate it from the continuity equation. This ensures that the solution is driven towards the divergence free state. As the solution is integrated with respect to pseudo time, the real time derivative can be considered as a source term for the right-hand side that is updated at every real time step. Defining R = R − S t , where R = −∇ · f and S t = I c ∂u ∂t , Equation 8 reads Discretizing the real time derivative with an M-stage BDF [19] scheme and the pseudo time derivative with a s max -stage explicit RK scheme [19], we can write a single pseudo iteration as u n+1,m+1,0 = u n+1,m u n+1,m+1,s = u n+1,m+1,0 + α s τ R n+1,m+1,s−1 , for s = 1 ... s max (10) where the superscripts n and m denote real and pseudo time levels, s is the stage index, and α s is the stage coefficient.
Different real time levels are needed for the BDF expansion in the right hand side computation according to in which B σ are the coefficients of the BDF schemes listed in Table 1. In this study, the leading term in the BDF is expressed at the first RK stage, which is often referred to as point-implicit source term treatment. The treatment also introduces a scaling coefficient α PI = 1 + α s τ B 0 / t which effectively limits τ if τ t. In this instance, however, best performance was achieved with α PI = 1.
The sub-iteration procedure according to Equation 9 is repeated until the solution has reached a specified convergence tolerance. Subsequently, the pseudo steady-state solution is stored as the real solution and assigned as the initial value for the next time level u n+1 = u n+2,0 . Furthermore, the rest of the terms in S t are updated accordingly, u n−(σ +1) = u n−σ for σ = 1...M. Algorithm 1 illustrates the dual time stepping methodology with BDF2. Return u n+1 u n−1 = u n u n = u n+1

Explicit schemes
A general s-stage explicit RK scheme can be represented in matrix-vector form by the Butcher tableau [19] c A b (12) where and which ensures A is strictly lower triangular.
When applied to ODE's of the form du/dt = ωu, where ω ∈ C, a pth-order explicit RK method can always be expressed as [20] u n+1 = P s,p (z)u n , where P s,p (z) is the schemes stability polynomial and z = ω t. The resulting scheme will be linearly stable provided |P s,p | ≤ 1, where | · | is the complex modulus. The stability polynomial can be determined directly from the Butcher tableau according to where e is a vector of ones. The region of absolute stability S of the RK method is the set in the complex plane where |P s,p | ≤ 1 holds, that is [20], and the linear stability condition is The eigenvalues ω δ of a general spatial discretization can be obtained via von Neumann analysis for each permissible wavenumber θ δ . Therefore, to ensure linear stability of any fully-discrete scheme consisting of a spatial discretization and RK pair we require

Optimizing Runge-Kutta methods
In general, the stability polynomial P s,p of an explicit RK method is a polynomial of degree s defined by polynomial coefficients γ j according to [20] P s,p (z) = s j=0 γ j z j . (21) For the method to be accurate to order p, we require the stability polynomial coefficients to be identical to the exponential function up to terms of at least order p, that is For any explicit scheme we can determine the largest time-step t for which Equation 20 remains satisfied. Our objective is to determine the set of free parameters {γ 0 , γ 1 , . . . , γ s } that yield the largest possible value of t for a given spatial discretization to accelerate the convergence of Equation 9. This can be cast as an optimization problem [20] maximize We use t opt to denote the solution to this optimization problem (the optimal time-step size), and P opt to denote the corresponding optimal stability polynomial. Finding the global optimum in Equation 23 can be particularly challenging, as it is nonconvex for s > 2 and there are typically several sub-optimal local minima.
Ketcheson and Ahmadia [20] proposed an alternate optimization approach, which seeks to first determine how small the maximum modulus of P s,p ( tω δ ) can be for a given t, which can be cast as The solution to Equation 24 is denoted r s,p ( t, ω ω ω δ ) and we note that |P s,p (z)| is linear in γ j . Therefore, Equation 24 is convex [20]. Ketcheson and Ahmadia [20] then reformulated the problem defined by Equation 23 as This is an optimization problem in a single variable, and can be solved using the bisection method outlined in Algorithm 2. The solution of Algorithm 2 yields the maximum possible linearly stable time-step t opt , as well as the corresponding optimal stability polynomial P opt .
Finally, to generate Butcher tableaus based on the optimal stability polynomial P opt we used RK-Opt, a package for optimization of Runge-Kutta methods [21,22]. RK-opt has also been used in related work, such as Parsani et al. [22] who found optimal RK schemes for the SD method, and Kubatko et al. [23] who used it to determine Strong Stability Preserving (SSP) RK schemes for DG methods. Our FRDG schemes differ from those of Kubatko et al. [23] in that we relax the order of accuracy of the identified RK schemes to first-order, since we do not require accuracy in pseudo time. Also, due to the smooth nature of incompressible flows, we focus on optimizing linear stability limits rather that SSP properties. Applying these accuracy constraints to the stability polynomial we obtain which has s − 1 free parameters {γ 2 , γ 3 , . . . , γ s }. In the current study we converge the bisection method in Algorithm 2 to 10 −12 using Matlab 2016a [24]. To solve the convex minimization problem in Equation 24 we used CVX, which is a Matlab-based package for specifying and solving convex programs, using the "best" precision setting [25,26].

The flux reconstruction approach
In the current study we use the FR approach for spatial discretization, noting that the RK optimization procedure is also applicable to other general spatial discretizations such as the finite difference, finite volume, finite element, or DG methods. The FR approach, first introduced by Huynh [9], later described using our notation in [27][28][29] and summarized here for completeness, is used to spatially discretize a 1D hyperbolic conservation law of the form where n is one of N e elements in the domain. For simplicity, each element n is mapped to a reference element I via the linear mapping where x n is the mesh node corresponding to the left face of n , n is the mapping function, and ξ is the location in reference space. This mapping can be inverted according to which is also linear. This allows us to construct the transformed solution in computational space according tô and a transformed flux in computational space according tô The system of equations in computational space can then be written as Following Huynh [9], the solution within each element is represented by a degree k polynomial, which is allowed to be discontinuous at the interface between elements. This polynomial is supported by nodal basis functions generated at k + 1 solution points. Therefore, the solution within each element in computational space can be approximated aŝ where û δ n =û δ n (ξ, t) is the polynomial representation of the solution within an element, û δ n,l =û δ n,l (t) is the value of the solution at solution point l, and φ l =φ l (ξ ) is its corresponding nodal basis function in reference space. For the one-dimensional case these basis functions are the Lagrange polynomialŝ A polynomial representation of the discontinuous flux function f δ D n =f δ D n (x, t) can be constructed using the same polynomial basis as the solution according tô where the superscript D denotes that this flux, like the solution, is allowed to be discontinuous at the interface between elements and f δ n,l =f δ n,l (t) is the flux evaluated at the solution points. This discontinuous flux is then corrected to be C 0 continuous bŷ where g L = g L (ξ ) and g R = g R (ξ ) are correction functions, f δ D n,L =f δ D n (−1, t), and f δ D n,R =f δ D n (1, t).
The gradient of the continuous flux function can then be found according to which is of degree k and in the same polynomial space as û δ n . The behaviour of an FR scheme depends on the choice of correction functions g L and g R for the left and right hand boundaries, respectively. Huynh [9] originally introduced several correction functions including one that recovered a collocation-based nodal DG scheme (FRDG), one that recovered an energy-stable SD scheme (FRSD), and one that recovered a so-called g 2 scheme. Vincent et al. [27,30] later introduced a single-parameter family of energy-stable flux reconstruction schemes, whose semi-discrete dispersion and dissipation behaviour was studied.

Von Neumann analysis
To determine the stability constraints of an FR scheme we perform Von Neumann analysis following Huynh [9], Vincent et al. [27], and Vermeire et al. [28,29]. For this study, we consider the linear advection equation with fully upwind fluxes at element interfaces, since the ACM requires the solution of a non-linear advection problem.
We follow the notation of Vincent et al. [27] for Von Neumann analysis of a general linear advection equation which admits plane wave solutions of the form where û δ is the vector of solution point values, g ξ L is the gradient of the correction function evaluated at the solution points, is the boundary extrapolation operator, and D is the nodal gradient operator. We seek Bloch wave type solution to Equation 43 of the form Values of t opt for optimal RK schemes for FRDG.
Stages where θ δ is a prescribed baseline wavenumber within the range −π ≤ θ ≤ π and ω δ is the resulting temporal frequency of the scheme. The upwind interface flux can then be written aŝ where r[l] =φ n,l (1).
By substituting Equation (45) and Equation (46) into Equation (43) we obtain Equation (48) is a classical eigenvalue problem, where v is one of k + 1 valid eigenvectors with an associated eigenvalue ω δ .
The operator Q is a function of θ δ , and it follows that so too are v δ and ω δ . In order for a fully discrete scheme to remain stable, all of the eigenvalues ω δ from the spatial discretization must be contained within the region of absolute stability of the temporal discretization for all permissible values of θ δ for a given t.

Optimal methods for DG
By using the FRDG correction function described by Huynh [9] and Vincent et al. [30] in Equation 40 we recover the DG method. Values of t opt obtained using Algorithm 2 are summarized in Table 2 for FRDG schemes with polynomial degrees k = 1 to k = 8 using between s = 2 and s = 7 RK stages. These results are also plotted in Fig. 1. Tabulated stability polynomial coefficients and corresponding Butcher tableau coefficients are provided as Electronic Supplementary Material with this article. We observe that, for a given polynomial degree, optimal RK schemes with more stages have a larger stability limit. Also, for a given number of stages, optimal RK schemes for higher degree FRDG schemes have a smaller stability limit.
Although optimal RK schemes with more stages permit larger time-steps, they also require an additional residual evaluation per stage. Therefore, the effective computational cost is obtained by normalizing t opt of a given scheme by the total number of stages as t opt /s as shown in Fig. 1. In Table 3 we present the effective speedup of the optimal RK schemes obtained in this study relative to the classical R K 4,4 scheme. We observe speedups in excess of 1.80× depending on the polynomial degree, suggesting that the optimal RK schemes developed in the current study could be nearly twice as fast as conventional methods for converging the ACM in pseudo time when using FRDG. Furthermore, as shown in Fig. 1, the benefit of increasing the number of stages beyond s = 7 is only marginal.
We are also interested in the computational cost of converging the pseudo time problem for a fixed spatial resolution. This is proportional to (k + 1) t opt /s, which normalizes t opt by both the number of stages and the number of degrees of freedom per element. Referred to here as the effective resolution, this measure accounts for the fact that for a fixed spatial resolution fewer relatively large high-order elements can be used. Results for this, shown in Fig. 1, demonstrate that we can expect higher-order FRDG schemes to be more expensive than lower-order schemes per degree of freedom. Interestingly, the additional cost of moving to subsequently higher-order schemes reduces as the order is increased. This suggests that the additional computational cost of using, for example, a k = 8 FRDG scheme compared to a k = 7 scheme with an equivalent number of degrees of freedom could be marginal when using the current optimized RK schemes.
Finally, we have plotted a sampling of the eigenspectra for the k = 4 FRDG scheme scaled by the maximum stable time step (ω ω ω δ t opt ) alongside the region of absolute stability (S) and contours of |P s,p (z)| in Fig. 2. It can be seen that large portions of the region of absolute stability for optimal schemes with a low number of stages are effectively unused. For example, for s = 2 there are large stable regions above and below the eigenspectra that are not utilized when the maximum stability condition t opt is met. However, as the number of stages, and hence degrees of freedom available to be optimized

Table 3
Speedup factor in terms of maximum effective time-step size t opt /s relative to R K 4,4 for FRDG.
Stages are increased, the scaled eigenspectra and boundary of the region of absolute stability are nearly identical. This suggests that, given an RK scheme with enough stages, the optimizer will generate a scheme whose stability boundary mimics the shape of the spatial discretization's eigenspectra.

Optimal methods for SD
It should be noted that the optimization strategy used in this study is not limited to the FRDG correction function, and can be generalized to any valid FR correction function or to other types of spatial discretizations. For example, by using the FRSD correction function described by Huynh [9] and Vincent et al. [30] in Equation 40 we recover a SD scheme. We can then generate optimal RK schemes, with values of t opt obtained using Algorithm 2 summarized in Table 4 for FRSD schemes with polynomial degrees k = 1 to k = 8 using between s = 2 and s = 7 RK stages. These re- sults are also plotted in Fig. 3. Tabulated stability polynomial coefficients and corresponding Butcher tableau coefficients are provided as Electronic Supplementary Material. We observe that, for a given polynomial degree, optimal RK schemes with more stages have a larger stability limit. Also, for a given number of stages, optimal RK schemes for higher degree FRSD schemes have a smaller stability limit. Furthermore, we note the similarity of the results with those obtained for FRDG schemes of the same polynomial degree. This is because c S D → 0 = c DG for large values of k where c S D and c DG are the coefficients that recover the FRSD and FRDG schemes, respectively, following Vincent et al. [30]. Therefore, the higher-order FRSD schemes have similar eigenspectra to their FRDG counterparts and recover similar optimal RK schemes. Also similar to the previous results using the FRDG scheme, we observe speedup factors in excess of 1.80× for all solution polynomial degrees, as shown in Table 5. This suggests that the optimal RK schemes developed in the current study could be nearly twice as fast as conventional methods for converging the ACM in pseudo time when using FRSD. Furthermore, we have demonstrated that optimal RK schemes can be readily generated for various FR correction functions. Table 4 Values of t opt for optimal RK schemes for FRSD.
Stages  Table 5 Speedup factor in terms of maximum effective time-step size t opt /s relative to R K 4,4 for FRSD.
Stages  Fig. 3. Optimal time-step t opt (top), time-step per stage (middle), and time-step per stage normalized by effective resolution (bottom) for optimal RK schemes using FRSD.

SD7003
We investigate transitional and turbulent flow over an SD7003 airfoil [31] to assess the performance benefits of the optimal RK schemes for ILES of the incompressible Navier-Stokes equations. We use k = 1, 2, and 3 FRDG schemes with their correspondingly optimal R K 7,1 schemes and the classical R K 4,4 scheme to solve the pseudo time problem in the ACM. Each simulation is run using a modified version of PyFR 1.6.0 [32] with full details supplied as Electronic Supplementary Material.
Each case is run at an angle of attack α = 8 • and Reynolds number Re = V ∞ c ν = 60, 000, where V ∞ is the freestream velocity and c is the chord length. We used the same quadratically curved hexahedral mesh as Vermeire et al. [13] that has 138,024 elements and a span of 0.2c. This test case is commonly used to examine the suitability of numerical schemes for predicting separation, transition, and turbulent flow. It has been studied previously by, for example, Visbal and collaborators including Visbal et al. [33], Garmann et al. [34], and Beck et al. [35] using a DG Spectral Element Method (DGSEM). The characteristic features of the flow include laminar separation on the upper surface of the airfoil, which then reattaches further downstream forming a laminar separation bubble. The flow transitions to turbulence partway along this separation bubble, creating a turbulent wake behind the airfoil.
We start each simulation using solution polynomials of degree k = 1. These are run for a total of 30t c , where t c = c V ∞ is a flow convective time. After 30t c each set of simulations is then restarted and run for an additional 30t c using k = 2 and, in the case of the optimal RK scheme simulation, switched to the correspondingly optimal R K 7,1 scheme for that polynomial degree. This process is then repeated for each polynomial degree until all simulations have been run for a total of 30t c . Statistics are collected over the final 20t c of each simulation at each polynomial degree, allowing any transient effects from startup or increasing the polynomial degree to be damped. We use BDF2 for physical time integration with a fixed time-step of t = 10 −3 t c and the velocity residuals of the pseudo time problem are converged to 5 × 10 −4 at each physical time step.
Each scheme was run using its maximum possible stable pseudo time-step, which was acquired via bisection with the constraint that the velocity residuals did not diverge. Results in Fig. 4 show isosurfaces of q-criterion for each solution polynomial degree using R K 4,4 and their corresponding optimal R K 7,1 schemes in pseudo time. As expected, as the solution polynomial degree is increased with a fixed number of elements the qualitative fidelity of the scale-resolving ILES simulations increases. All three polynomial degrees exhibit a short laminar separation bubble on the leading edge of the airfoil, transition to turbulent flow, and a turbulent wake that propagates downstream. This behaviour is consistent with previous studies using high-order compressible solvers at low Mach numbers. Qualitatively, the results obtained using R K 4,4 and the optimal R K 7,1 schemes are similar. The measured lift and drag coefficients for each simulation using R K 4,4 and the optimal R K 7,1 schemes are shown in Table 6 for each solution polynomial degree. These values are generally consistent with those available from similar studies using low Mach number compressible solvers [33][34][35]. Importantly, we note that the measured lift and drag coefficients are not sensitive to our choice of pseudo time stepping scheme, with only minor variations observed between the R K 4,4 and optimal R K 7,1 scheme results. These variations are within the margin of error expected using the finite 20t c averaging period.
To assess the relative performance of each scheme we have plotted samples of their convergence history in Fig. 5 towards the end of each simulation. It is clear from these figures that the R K 7,1 schemes converge significantly faster per iteration than the R K 4,4 scheme. In fact, the optimal schemes can complete four to five time-steps in the same number of iterations that it takes the conventional scheme to complete one. Furthermore, we note that simulations using the R K 7,1 schemes were between 1.89× and 2.11× faster in terms of wall-clock time than those using the classical R K 4,4 scheme as shown in Table 7. These results are consistent with results of our theoretical analysis presented in Table 3 for the FRDG scheme. Therefore, we observe that the proposed R K 7,1 schemes are able to achieve significant speedup over conventional explicit pseudo time integrators, without any adverse impact on simulation accuracy.

Turbulent jet
To further demonstrate the suitability of the optimal RK schemes for ILES we consider an incompressible turbulent round jet at Reynolds number Re = V 0 d jet ν = 10, 000 where d jet is the jet diameter, and V 0 is the midline velocity of the inflow. Experimental data for this test case is summarized by Lipari et al. [36] and previous numerical studies include DNS at Re = 5, 000 by Boersma [37] and explicitly filtered LES at Re = 11, 000 by Bogey and Bailly [38] performed with compressible solvers at Mach numbers Ma = 0.6 and 0.9, respectively. Fig. 6 depicts a portion of the mesh and the computational domain. A circular 2D mesh of diameter D = 48d jet was extruded to L x = 100d jet in 250 equal increments. The element count of the cylinderical 3D mesh is 247, 250 hexahedra and 596, 500 prisms. Additionally, in Fig. 6, x 0 refers to the streamwise location of the virtual origin which is the starting point of the self-similar region, associated with a linear velocity decay and spreading rate [36]. The self-similar region is defined in terms of the self-similarity coordinate where r = y 2 + z 2 . The inflow profile of the jet is imposed via where r jet = 0.5d jet and δ = 40d jet . Additionally, a sponge layer given by a source term is added to R to dissipate transient waves before the jet reaches the outlet boundary. The distance from the outlet to the point of inflexion of the hyperbolic tangent that gradually dissipates the flow is L spg = 10d jet , and the target outflow state variables are u out = {17 0 0 0} T . A no-slip boundary condition is imposed at the vertical front interface outside the jet inflow zone and a pressure outlet boundary condition is used for the outlet. The lateral far-field wall is specified as a non-entraining slip wall boundary. Hence, the jet receives entrainment by drawing fluid from the edges of the domain, which is evident by a small backflow at large r.
A single simulation was performed with a solution polynomial order k = 4, and ζ = 2.5. The BDF2 scheme was used for physical time and the optimal R K 7,1 scheme for k = 4 was used in pseudo-time. Constant time steps of t = 0.008t c , were used throughout the simulation. The jet was initially developed up to 1, 040t c to

Table 7
Speedup for the SD7003 airfoil test cases using optimal RK schemes with s = 7 relative to the classical R K 4,4 scheme.
damp initial transients using 15 pseudo-iterations within each physical time step. The simulation was then restarted with a convergence criterion of 5 × 10 −5 for the velocity residuals and statistics were collected up to 2440t c . The real time steps converged within 18 to 24 pseudo-iterations, resulting in the L 2 -norm of divergence ∇ · u = 1 ζ ∂ p ∂τ being approximately 6 × 10 −4 . Fig. 7 shows a volume rendering of the instantaneous velocity field to visualise the shape of the jet. The experimental study by Panchapakesan and Lumley [39] at Ma ≈ 0.01 − 0.02 and Re = 11, 000 are used as a reference for all flow statistics.
To find the location of the virtual origin, the time-averaged midline axial velocity decay was shifted in x to fit a linear constant decay rate through the origin. The midline velocity decay shifted by x 0 = 3.2d jet is shown in Fig. 8a together with the experimental rate observed by Panchapakesan and Lumley [39]. We observe our predicted linear velocity decay region   matches with the reference before the sponge regions starts gradually dissipating the velocity. Fig. 8b shows the average axial velocity with respect to the self-similarity coordinate η. In addition to averaging in time, the results were spatially averaged along conical surfaces defined by η(R, x) = R x−x 0 in the self-similar region 24d jet ≤ x ≤ 60d jet and normalised by the mean midline velocity v c . From Fig. 8b, we can see that our mean axial velocities agree with the reference data across the entire self-similarity region. Figs. 9a and 9b show the mean axial and radial velocity fluctuations in the self-similar region. Both graphs indicate that the simulation is able to accurately capture velocity fluctuations in agreement with the experimental data. These results demonstrate that the optimal RK schemes combined with a BDF scheme are suitable for simulating turbulent flows, and that this approach can correctly predict mean flow and turbulent quantities.

Conclusions
In this study we have generated optimal RK schemes for convergence of the ACM using dual time-stepping. Noting that our approach is generalizable to a variety of different spatial discretizations, we have demonstrated its utility for the FR approach. Specifically, we have generated optimal schemes for the FRDG and FRSD correction functions, recovering the DG and SD methods respectively, for spatial polynomial degrees of k = 1 to k = 8 and for RK schemes with s = 2 to s = 7 stages. These schemes were optimized in the context of linear advection, predicting speedup factors in excess of 1.80× relative to the classical R K 4,4 scheme. In practice, we were able to demonstrate speedup factors for incompressible ILES of flow over an SD7003 airfoil between 1.89× and 2.11×. Furthermore, we have demonstrated the utility of the schemes for incompressible ILES of a turbulent jet, achieving good agreement with experimental data. In summary, these new schemes yield a significant speedup when solving the ACM via dual time-stepping. Furthermore, they are easy to implement and only require modification of a small number of Butcher tableau coefficients for the pseudo time solver. Future work should investigate the utility of these optimized schemes in the context of multigrid and implicit preconditioning strategies for convergence acceleration. Furthermore, the influence of diffusion, boundary conditions, and mesh topology on the optimization procedure could be explored.