Optimised prefactored compact schemes for linear wave propagation phenomena

A family of space- and time-optimised prefactored compact schemes are developed that minimise the computational cost for given levels of numerical error in wave propagation phenomena, with special reference to aerodynamic sound. This work extends the approach of Pirozzoli [1] to the MacCormack type prefactored compact high-order schemes developed by Hixon [2], in which their shorter Padé stencil from the pre- factorisation leads to a simpler enforcement of numerical boundary conditions. An explicit low-storage multi-step Runge–Kutta integration advances the states in time. Theoretical predictions for spatial and temporal error bounds are derived for the cost-optimised schemes and compared against benchmark schemes of current use in computational aeroacoustic applications in terms of computational cost for a given relative numerical error value. One- and two-dimensional test cases are presented to examine the effective- ness of the cost-optimised schemes for practical ﬂow computations. An effectiveness up to about 50% higher than the standard schemes is veriﬁed for the linear one-dimensional advection solver, which is a popular baseline solver kernel for computational physics problems. A substantial error reduction for a given cost is also obtained in the more complex case of a two-dimensional acoustic pulse propagation, provided the optimised schemes are made to operate close to their nominal design points.


Challenges in modelling wave generation and propagation phenomena
Models for the propagation of waves in a continuum are developed across the full spectrum of physical sciences, including aeroacoustics, where increasingly stringent aircraft noise regulations [3,4] promote the development of accurate and affordable methods for predicting aerodynamically generated noise. Enhanced Computational Aeroacoustic (CAA) schemes are sought that can model sound generation and its propagation as part of the industrial design process, where predic-tions of accuracy compatible with the design specifications are required at an affordable cost, produced within specific time-constraints, earlier in the design process, using multi-processor computer clusters.
Computational aeroacoustic schemes that model sound generation and propagation with one simulation typically face a number of challenges. In aeroacoustics, modelling trailing edge noise is an example of where this approach is used. Small-scale flow structures, of the size of the boundary layer thickness, cross a trailing edge, where they experience largeamplitude fluctuations in momentum. The interaction generates small-amplitude acoustic pressure perturbations of long wavelength, compared to the boundary layer thickness. These radiate at the speed of sound, which is much larger than the flow structure mean convection speed.
In a direct computational aeroacoustic simulation of the trailing edge problem, the computational domain size scales with the long wavelength of the acoustic pressure perturbation whereas the spatial discretisation scales with the size of the small flow structures. This separation of length scales results in large sized computational meshes. Furthermore, supporting large-amplitude localised momentum fluctuations and small-amplitude acoustic perturbations in the same solution requires numerical schemes with low dispersion and low dissipation characteristics, to prevent the low-amplitude acoustic signal from being distorted and/or suppressed by the flow solver numerical viscosity.

Prefactored compact finite-difference schemes
Different aeroacoustic problems exhibit different flow physics. Therefore no single algorithm is available to model all problems with adequate resolution and accuracy. Low Mach number acoustically active flows typically involve capturing complex features that are nevertheless computationally smooth, that is, these features do not involve any sharp change in the flow state, such as from a shock. In these circumstances, it is computationally advantageous to use higher than second-order (high-order) schemes that can achieve exponential (e −aN ) convergence rates by increasing the order of the scheme as opposed to algebraic (N −b ) convergence rates by increasing the spatial mesh refinement, where N is the number of degrees of freedom and a and b are positive real numbers. Several numerical methods for aeroacoustics have emerged in the last two decades [5][6][7] with attendant applications documented in four proceedings of the Computational Aeroacoustic (CAA) workshops on benchmark problems [8][9][10]. Lele [11] pioneered the use of Padé type compact and explicit optimised schemes in aeroacoustics. This work highlighted the requirement for special near-boundary treatment, driven by the longer finite-difference stencils used in these high-order methods. Hixon [2] introduced in 1996 a prefactorisation method to reduce the non-dissipative central-difference stencil length of the compact schemes to two lower-order biased stencils which have easily solvabe matrices of reduced size. The advantages of these schemes over traditional compact schemes arise from their reduced stencil length and the independent nature of the resultant factorised matrices. By reducing the stencil length of the compact schemes, the prefactorisation method reduces the depth of the boundary frame over the perimeter of the computational domain where ad-hoc boundary stencils are required. This simplifies the specification of the boundary conditions [12]. Ashcroft and Zhang [13] extended the prefactorisation method to a broader class of compact schemes using a more general derivation strategy, which combines Fourier analysis with the notion of a numerical wavenumber. This class of optimised prefactored schemes exhibits better wave propagation characteristics than the standard prefactored compact ones.

Optimisation of the numerical methods
Several finite-difference explicit and compact methods are now available for solving wave propagation problems in a low Mach number flow. It is typical to involve a high performance computer cluster for such a computation, where the cost of the computation is important. It is therefore of interest to develop and implement an optimisation strategy for the computational cost, based on an acceptable level of numerical accuracy of the results. The issue of computational efficiency of finite-difference schemes has been investigated in detail by Colonius and Lele [7] and by Spisso and Rona [14]. These authors considered the dispersive and dissipative characteristics of several types of spatial discretisation. The error associated with the approximate time integration is usually considered separately from the spatial error. Spatial and temporal errors combine in finite-difference time-marching schemes to determine the overall numerical accuracy of the solution. Pirozzoli [1] developed in 2007 a general strategy for the analysis of finite-difference schemes for wave propagation problems, trying to involve time integration in the analysis in a natural way. The analysis of the global discretisation error has shown the occurrence of two approximately independent sources of error, associated with the discretisation in space and in time. The performance improvement of the global scheme can be achieved by trying to separately minimise the two contributions. The analysis leads to logical and simple criteria for deriving optimised space-and time-discretisation schemes, based on the concepts of spatial and temporal resolving efficiencies. Such efficiencies are achievable by a careful design of the spaceand time-discretisation schemes, as well as by an appropriate choice of the grid spacing and of the time step. This analysis points towards substantial savings in computational resources.

Obtaining and testing optimised prefactored compact schemes
The objectives of the present work are to extend the computational cost optimisation method of Pirozzoli [1] to prefactored compact schemes, assess by numerical experiment the actual computational cost of the newly developed optimised schemes, and verify that the desired level of numerical accuracy of the solution is achieved. This work advances the state of the art by presenting a stencil optimisation procedure for prefactored compact time-marching schemes that minimises the computational cost for a given numerical error level and spatial and temporal resolving efficiencies [15]. Furthermore, this work gives a first confirmation of the performance of spatial and temporal error estimators in prefactored compact finitedifference schemes coupled with the Runge-Kutta time integration by numerical experiment. Finally, this work provides a first verification of the predictive performance of a computational cost estimator of simple formulation, showing that this linearly relates to the actual computational time recorded in numerical experiments on a high performance computer. The impact of these three contributions is substantial. It enables tailoring the spatial and the temporal coefficients of the prefactored compact finite-difference time-marching schemes to obtain aeroacoustic predictions within computational cost and numerical error bounds. This contributes towards decreasing the uncertainty in computational aeroacoustic predictions for practical engineering and physical sciences applications, leading to a more optimal and reliable design solution.

Paper outline
The paper is organised as follows: in Section 2, the theory of the cost-optimisation of Pirozzoli [1], based on the optimisation of the computational cost for a given error level, is reviewed and the approximate decoupling of the spatial and temporal errors is introduced. The de-coupled errors are used in Section 3 to obtain a new family of prefactored cost-optimised schemes, with three different levels of numerical accuracy. In this section, the algebraic symmetry properties of the prefactorised stencils from Hixon [2] are used to define a new spatial differentiation for the cost-optimised compact schemes that is prefactored. The spatial differentiation is coupled with a Runge-Kutta time integration that is costoptimised to a matching level of accuracy by the cost-optimisation process of Bernardini and Pirozzoli [16]. The theoretical performance of the optimised coupled schemes is then investigated for a range of cost levels. Numerical tests in both one and two spatial dimensions are reported in Section 4, to compare the numerical accuracy achieved by the computational schemes against the design values from Section 3. Section 4 also addresses whether the analytic cost function used for the cost-optimisation is a good representation of CPU time recorded during the actual computations. Section 4 investigates the performance gain achieved by the optimised schemes versus their non-optimised counterparts. A discussion of the extension of the proposed schemes to nonlinear and higher dimensional problems, as well as to those with non-periodic boundary conditions, is given in Section 5 and, finally, conclusions from the current work and future perspectives are presented in Section 6.

Compact finite-difference schemes
By performing a Taylor's expansion of a smooth function f : R → R, its derivatives can be shown to satisfy the following where P , Q , R and S are non-negative integers, such that (R + S) ≥ 1, P ≤ R, Q ≤ S and h > 0. Consider now a set of uniformly spaced nodes indexed by i along the R axis, as shown in Fig. 1, where (2) Selecting P = Q = 0 gives an explicit scheme, while other choices yield implicit schemes, otherwise known as Padé or compact schemes. Formally, from Eq. (1), the approximations f i have a truncation error of O h n and standard techniques seek to choose coefficients α j and a j that give the largest possible n for a given stencil width. For brevity, CP Q R S is the notation used to denote compact schemes. Of particular interest in CAA is the numerical error in the wave-like propagation of a single Fourier component of f (x). A monochromatic wave of wavelength λ and wavenumber k has the scaled wavenumber κ = kh with support −π ≤ κ ≤ π . Upon expanding f as a Fourier series, each Fourier coefficient satisfies Eq. (1) and, ignoring the contributions from the higher order terms, the scaled pseudo-wavenumber of the scheme Eq. (2) can be defined byκ If κ = κ could be achieved over the entire wavenumber range, then the finite difference estimate of f i by Eq. (2) would have spectral accuracy. In practice, the discrepancies between κ and κ increase as κ → π , driven by the analytical form of the Padé approximation by which κ = 0 at κ = π . κ is, in general, complex valued and its real part corresponds to a dispersive error, while its imaginary part is related to a dissipative error. In computational aeroacoustics, where disturbances are transmitted over large distances, it is important to minimise the dissipative error. Selecting R = S, Q = P and letting a j be antisymmetric and α k be symmetric produces a scheme with zero dissipation. These central schemes are hereinafter denoted by CR Q . This desirable property is included in the current work by restricting the stencil optimisation to tridiagonal compact schemes ( P = Q = 1) with a five point stencil (R = S = 2) i.e., C12 schemes. In this case, the scaled pseudo-wavenumber is a real value given bȳ This work shows the optimisation of the prefactored compact stencils on the one parameter family of C12 schemes given by The choice α 1 = 1/3 yields the sixth-order C12 scheme. The analysis in this article is, in principle, extensible to more general C P Q R S compact schemes.

Runge-Kutta time stepping
The semi-discretisation of a general conservation law by means of finite differences gives rise to a non-autonomous system of ordinary differential equations of the form dU dt = F(U(t), t), where U represents the vector containing the solution values at spatial nodes. The solution can be time-marched from t = t n to t = t n + t by means of an explicit p-stage, two level Runge-Kutta scheme, which is formulated as where the coefficients β l can be determined to achieve a given formal order of accuracy by means of a Taylor expansion, or, for example, to minimise dissipation and phase errors [17,18]. Assuming F(U) = AU, where A is linear, the scheme can be rewritten as where γ j = p l=p− j+1 β l . This work uses the 4-stage Runge-Kutta scheme with γ 1 = 1, γ 2 = 1/2 and γ 3 and γ 4 as free parameters. Setting γ 3 = 1/3! and γ 4 = 1/4! yields a formally 4th order accurate scheme.

The one dimensional single-scale problem
The one dimensional linear advection equation (LAE) ∂u ∂t where c is the wave speed, is a simple platform for developing and testing appropriate schemes for more complicated, multidimensional, multi-variable problems. Imposing the sinusoidal monochromatic initial condition u 0 (x) =û 0 e ikx on infinite and periodic domains gives rise to the solution u (x, t) =û 0 e i(−ωt+kx) , where the angular frequency ω and the wavenumber k are related by the dispersion relation ω = ck.
For the p-stage Runge-Kutta scheme defined in Eq. (11) and the compact scheme from Eq. (1), the amplification factor r (κ, σ ) from one time level to the next can be obtained [19] as where σ = c t/h is the Courant number and j = 1, . . . , p. In the case of null spatial error, i.e., if κ = κ, the amplification factor is where z = σ κ and γ = γ j p j=1 . The stability limit z s is then given by the following condition: Let u h (·, ·) denote an approximation to u (·, ·). Following the work of Pirozzoli [1], the relative error in the L 2 -norm across one wavelength is defined as where the norm of any scalar function w is defined as , and λ = 2π /k is the wavelength. It can then be shown [1] that, to a good approximation,

Multi-dimensional single-scale problem
For a problem in d space dimensions, the cost C d of its numerical solution, in terms of floating point operations, can be shown by dimensional arguments [7] to scale as (18) where (L/h) d is the number of points in the domain, N op is the required number of operations per mesh node that depends on the scheme used to approximate the spatial derivatives, p is the number of Runge-Kutta stages and T / t is the number of time steps.
Consider, for instance, the cost of solving Eq. (12) in d dimensions using the spatial discretisation of  [7]. Reducing R = S to 1 lowers the FLOPS required to six [7]. The aggregate number of operations to compute one Runge-Kutta l stage per mesh node is N op = (d + 1) + d (9 + 1) with the C12 scheme. Therefore, the complexity of solving the linear system of equations, Eq. (12), in d dimensions by the C12 finitedifference scheme coupled to a p-stage Runge-Kutta integration is p (11d + 1) T t −1 N d FLOPS, as represented by Eq. (18).
The prefactorisation of C12 in Section 3.2 reduces N op but the complexity remains of the same order of N.
The low-storage Runge-Kutta method of Section 2.2, at each stage l, requires the storage of O 3N d floating point numbers, which determines the baseline computer memory requirement. Additional memory storage is typically used in the parallel implementation of compact schemes, for instance by slab decomposition [20], compared to their equivalent explicit schemes. By introducing the non-dimensional groups ckT and kL, which are, respectively, measures of the number of wavelengths travelled by a wave in time T and the number of wavelengths contained in the computational domain L, the normalised error e and the normalised cost c d are obtained The normalised error represents the numerical error incurred in advecting the wave u (x, t) over one period. Similarly, the normalised cost represents the cost of advecting u (x, t) in one dimension over the same period.

Multi-dimensional multi-scale problems
In practice, aeroacoustic signals are composed of a number of waves of differing wavelengths. It is therefore of interest to extend the error and cost definitions from Eq. (19) and Eq. (20) so they apply to broadband signals, supposing these have a spectrum of finite width |k| ≤k and varying propagation velocities |c| ≤ĉ. Let κ =kh, σ =ĉ t/h, then the formulae for the normalised global error and the normalised global cost metric for a broadband signal can be defined aŝ The definition of the global error given in Eq. (21) is an analytical estimator of the numerical error in a given simulation, based on the premise that all spectral scales in the wave propagation problem make equal contributions to the signal power. In real-world fluid mechanic applications, state variables have non-flat spectra and the smaller scales are usually less energetic than the larger scales, with the implication that the error estimator defined in Eq. (21) models a 'worst case scenario'.

Optimal performance for multi-scale problems
In this work, optimising the performance of a scheme is taken as minimising the computational cost for a given level of error, for given values of p and N op and a given problem, i.e., for specific values of the non-dimensional groups ckT and kL. To do this, a target level for the relative error is specified in the non-dimensional form e κ,σ = ckT ≡˜ , (23) and the optimisation consists of finding a pair of values κ * ˜ , σ * ˜ that minimises the cost metric and satisfies the stability condition Eq. (15).
An illustration of this optimisation process is depicted in Fig. 2(a), which shows iso-lines of the normalised global error ê κ,σ and normalised cost ĉ 1 κ,σ in the wavenumber-Courant number κ,σ -plane, where the support is given by 0 ≤σ ≤σ max , 0 ≤κ ≤κ max . σ max and κ max are the numerical stability limits of the specific scheme. Results are shown for the sixth-order compact spatial discretisation scheme coupled with the four-stage, fourth-order RK method, denoted by C12RK4, for which σ max = 1.42 and κ max = π . For the C12RK4 scheme, the iso-cost curves are convex whereas the iso-error curves are concave [1]. Under such a topology, the cost minima occur at the single point of tangency between the iso-lines and the circles in Fig. 2(a) show the 'optimal' working conditions for a number of relative error values. Letê

Table 1
Comparison between the 'optimal' pair κ + ,σ + based on the global error ê κ,σ estimated by Eq. (21) and the optimised pair κ * ,σ * based on its approximation ê a κ,σ from Eq. (24), for the classical C12RK4 scheme. be an approximation [1] of ê κ,σ from Eq. (21) where ê 0 κ is the spatial error (assuming no temporal error) and ê t ẑ is the temporal error (assuming no spatial error). ê 0 κ and ê t ẑ are given bŷ where ẑ =σκ. ê a (·, ·) enables a decoupling of the error into spatial and temporal contributions, which will lead to a simplification of the computations that follow. Fig. 2(b) shows a comparison of the optimised working points for the C12RK4 scheme for the same levels of error as in Fig. 2(a), when ê a κ,σ is used in place of ê (κ, σ ). The dashed lines show the maximum of the spatial and temporal errors. The vertical segments are where the spatial error ê 0 κ is dominant and the dashed curves are where the temporal error ê t ẑ is dominant. The circles correspond to the error ê κ,σ estimated by (21) and the squares to its approximation ê a κ,σ . Table 1 shows the optimised values κ + ,σ + and κ * ,σ * , corresponding to κ,σ from Eq. (21) and Eq. (24), respectively, and the relative distance between them defined by For the errors considered, there is a maximum of 17% difference in the location of the 'optimal' working conditions between the two error definitions. Whereas such a discrepancy has an appreciable magnitude, it still enables operating a finite difference time-marching scheme in the neighbourhood of its optimal design point, as verified in the propagation of a monochromatic wave [20]. This allows the splitting of the derivation of the cost-optimised coefficients into a spatial component and a temporal component.

Optimised compact schemes
This section describes the optimisation of compact finite-difference schemes, by which κ is maximised for a given normalised error level ˜ , assuming that the time integration is exact. This is equivalent to finding the maximum resolving

Table 2
Coefficients and resolving efficiencies of optimised spatial C12-n schemes and temporal RK4-n schemes. α = 1/3, κ max = 1.98 for the standard sixth-order C12 scheme; γ R K 4 efficiency κ opt so that all wavenumbers κ ≤κ opt are resolved with an error less than ˜ . For this purpose, the error ê 0 κ from Eq. (25) is used. The baseline spatial scheme is the compact C12 scheme of Eq. (5) with α 1 = 1/3. Fig. 4 gives a graphical representation of how ê 0 κ varies with κ for this scheme by displaying the difference between the exact value of κ, shown by the dotted line, and its modelled counterpart Re (κ (κ)), shown by the dashed line. As the C12 scheme is compact, the Padé stencil imposes the constraint Re (κ (κ)) = 0 at κ = π . The C12 scheme therefore exhibits a progressive departure from the exact relationship Re (κ (κ)) = κ as κ → π that generates the error ê 0 κ .
The baseline compact C12 scheme of Eq. (5) is optimised by treating α 1 as a free parameter. The new class of optimised schemes will be denoted as C12-n, where n represents the exponent in the target ˜ = 10 −n . Optimisation in this setting is a non-trivial matter. As it can be seen in Fig. 3, the optimal value α n 1 ,κ n opt of each 10 −n error contour occurs at a corner point. The corner point for n = 5 is shown magnified in the inset of Fig. 3. Near this corner point, for each α 1 , there can be multiple values of κ which lie on the desired error contour. As such, a standard local minimisation routine will not necessarily locate the optimum. The procedure used for estimating α n Fig. 3 and are presented in Table 2. Table 2 shows that the optimised values for α 1 are above the 'standard' value of 1/3 for all error levels, asymptoting toward α 1 = 1/3 as the desired error level is reduced. As ˜ → 0 the asymptotic error analysis holds true and the formal order of accuracy of the scheme controls the error. Hence, the optimised coefficient tends to the one which maximises the formal order of accuracy, i.e., α 1 = 1/3, which gives a sixth-order accurate scheme. The same argument can be extended to more general C P Q R S compact schemes. Table 2 shows the maximum resolving efficiency κ n for a normalised error level ˜ = 10 −n , when the non-optimised C12 scheme is used. Using the cost-optimised coefficients increases the spatial resolving efficiency by κ n ∼ 50% over the range 10 −6 ≤ ≤ 10 −4 , indicating a gain in spatial resolving efficiency from the spatial cost-optimisation.

Spatial differentiation by prefactored compact finite-difference schemes
In the C12 scheme of Eq. (5), the approximation of the derivatives at the nodal points requires the solution of a tridiagonal system of equations. In this section, following the methodology of Hixon [2], the tridiagonal system of equations is recast as two (upper and lower) bidiagonal systems. This gives a prefactored compact scheme. Let the derivative f i be split into a backward component f B i and a forward component f F i so that Suppose that f B i and f F are found by using, respectively, the following backward and forward finite difference formulae: The coefficients (α F , α B ) and (a F , a B ) in Eqs. (28) and (29) are determined so that the real (dispersive) components of the scaled pseudo-wavenumbers of the forward and backward stencils are equal to the scaled pseudo-wavenumbers of the original central scheme and the imaginary (dissipative) components of the scaled pseudo-wavenumbers are equal in magnitude and opposite in sign. Such schemes are termed MacCormack schemes [21]. The scaled pseudo-wavenumber of the generic C12 scheme of Eq. (4) is given bȳ In a similar manner, the scaled pseudo-wavenumber of the generic forward operator f F i is and for the backward operator f B i is: Imposing the constraints ensures that the imaginary components of the forward and backward operators are equal in magnitude and opposite in sign and that the real components are equal to one another. Further, imposing The quadratic terms mean that Eq. (35) has two solutions. The lower value solution for α F 1 is selected to minimise the ratio α F 1 /α F 0 , so that the influence of errors at the boundaries on the interior scheme is minimised, in problems that require closures at the computational domain boundaries. Substituting α n 1 from Section 3.1, Table 2, for α 1 in Eq. (35), the prefactored coefficients for the C12-n schemes are obtained. The resultant prefactored coefficients are listed in Table 3. It is noticeable that the standard C12 scheme appears to exhibit a greater central dominance in the stencil, as confirmed by the higher valued a F 0 coefficient. The coefficients in Table 3 30), that is Re(κ(κ)) = Re(κ F (κ)) = Re(κ B (κ)). Hence, each line with symbols represents Re(κ(κ)) = Re(κ F (κ)) = Re(κ B (κ)) in Fig. 4(a). Fig. 4(b) shows the spatial error ê 0 for the classical C12 and cost-optimised C12-n schemes versus the discrete number of points per wavelength N λ = 2π /κ. The classical scheme C12 displays a straight line (constant logarithmic roll-off rate) along the whole wavenumber spectrum. The cost-optimised schemes C12-n feature local ê 0 minima of 10 −n for selected wavenumbers at which the spatial error ê 0 is reduced approximately by one order of magnitude compared to the C12 scheme. Similar spatial error-number of points per wavelengths e 0 − N λ plots with single and multiple ê 0 minima are reported in the literature [5,11,13,22] of optimised finite-difference schemes.
The two imaginary components of scaled pseudo-wavenumber of the prefactored forward and backward stencils, respectively from Eqs. (32) and (34), are equal in magnitude and opposite in sign and they cancel each other out by the summation in Eq. (27), to give a dissipation-free scheme.

Cost-optimised Runge-Kutta time integration
The optimisation approach of Section 3.1 is herein extended to the Runge-Kutta time integration schemes of Eqs. (8)- (10).
Assuming a zero spatial differentiation error, values for the coefficients γ j are sought that maximise the temporal resolving efficiency ẑ opt (˜ ) for a given value of the normalised error ˜ ẑ opt = max{z :ê t (z, γ ) ≤˜ }, under the following stability constraint where z s is the stability limit defined in Eq. (15). The factor ζ ≥ 1.1 has been introduced to guarantee a greater stability margin beyond the range of well resolved angular frequencies z. The four-stage, fourth-order RK scheme is selected as the baseline temporal solver. To preserve formal second-order time integration accuracy, let γ 1 = 1 and γ 2 = 1/2. This leaves two free parameters γ 3 , γ 4 to be determined from ζẑ opt .
To find optimal values for differing levels of target error ˜ = 1 × 10 −n , a global pattern search algorithm similar to the one in Section 3.1 is used, in which an additional nonlinear constraint is incorporated to satisfy Eq. (36). In this case, ζ = 1.1 was found heuristically to provide a good balance between performance and stability. The results are plotted in shows that the RK4-n schemes consistently achieve a higher temporal resolving efficiency ẑ n opt than the resolving efficiency ẑ RK4 = max z :ẽ t z, γ ≤˜ , with γ j = 1/ j!, achieved by the classical RK4 scheme for the same target error ˜ = 1 × 10 −n . Fig. 5(b) shows the stability footprints from the Von-Neumann analysis [19] for the classical RK4 and the temporal cost-optimised RK4-n, n = 4, 5, 6 schemes. The stability footprints are the locus of points in the C complex plane where the amplification factor from Eq. (14) is equal to unity |r t = 1|, in case of null spatial error. It is shown that the stability footprints for the temporal cost-optimised schemes do not differ too much from the classical non-optimised RK4 scheme. The temporal cost-optimised schemes have a slightly larger footprint than the classical one in the left half-side of the complex plane. The larger increment is obtained for n = 4.
The coefficients of the optimised RK4-n schemes γ n 3 , γ n 4 , their respective temporal resolving efficiencies ẑ n and the stability limits z n s from Eq. (15) are listed in the Table 2. The stability limits z n s of the temporal cost-optimised schemes, which are the intercepts of the stability footprints with the Im(z)-axis of the complex plane in Fig. 5(b), are marginally lower than the corresponding limit of the classical RK4 scheme, which is z s = 2 √ 2. Using the cost-optimised coefficients increases the temporal resolving efficiencies between approximately 30% and 50% over the range 10 −6 ≤˜ ≤ 10 −4 , indicating a gain in temporal resolving efficiency from the temporal cost-optimisation.

Theoretical performance of the cost-optimised schemes
Sections 3.2 and 3.3 considered the potential benefits of cost optimisation in solving ordinary differential equations in space and in time, respectively. This section explores combining C12-n and RK-n schemes in a single numerical procedure, denoted C12RK4-n, to solve the partial differential equation Eq. (12). The limitation on the maximum value of the Courant number σ max for the class of cost-optimised C12RK4-n schemes is given by which depends on both the spatial and temporal discretisation. Table 4 reports the σ max values for n = 4, 5, 6. Fig. 2(a) has shown that, given a target level of normalised error ê(κ, σ ), it is possible to determine the optimal working conditions (κ + , σ + ) that give the smallest normalised cost ĉ 1 (κ, σ ) and Fig. 2(b) has shown that the normalised approximate error ê a (κ, σ ) can be used in place of ê(κ, σ ) for computing the estimations (κ * , σ * ) of (κ + , σ + ), which are listed in   Table 1. Following the same procedure, the normalised cost ĉ 1 (κ * , σ * ) of operating the C12RK4 and the C12RK4-n schemes for different levels of normalised approximate error ê a (κ, σ ) were determined and the results reported in Fig. 6. For each C12RK4-n scheme, the symbols denote the normalised cost ĉ 1 (κ * , σ * ) of operating this scheme at the normalised approximate error ê a (κ * , σ * ) = 10 −n , which is the design error level of the C12RK4-n scheme. Fig. 6 confirms that, for each C12RK4-n scheme, the design level of error 10 −n of its spatial differentiation C12-n and temporal integration RK4-n constituents is retained once the spatial differentiation and temporal integration are coupled together in one partial differential equation solver. Whereas it is encouraging to notice that the C12RK4-n schemes retain their design error level based on the approximation ê a (κ * , σ * ) at their design point, it is of interest to discuss the error trend away from the ê a = 10 −n level, in comparison with the conventional C12RK4 scheme. All C12RK4-n schemes exhibit a rate of convergence higher than the C12RK4 scheme up to their respective design error levels. Thereafter, the C12RK4-n schemes exhibit a ê a plateau with increasing ĉ 1 , followed by a convergence rate that is lower than that of C12RK4. The ê a trend is monotonically decreasing for all schemes. The variable error roll-off of the C12RK4-n schemes offers some advantages and limitations in computational physics applications. The monotonicity of the numerical error indicates that the scheme can be operated with confidence in wavenumber bandwidth limited problems knowing the design error will not be exceeded. This indicates that each optimised scheme could be expected to perform well when operated close to its design point. The variable error roll-off rate, however, prevents the implementation of classical roll-off rate checks for mesh convergence computations, as constant error roll-off rates exhibit only at a normalised cost ĉ 1 about two orders of magnitude higher than the scheme's optimal operating point. This cost difference is likely to amount to an onerous mesh resolution in a practical computation, the rate of convergence of which requires testing by alternative means. Table 4 shows the computational cost ĉ * 1 required by each of the C12RK4-n schemes to achieve their design level of error ê a = 10 −n . This is then compared with the corresponding cost ĉ * 1 (C12RK4) required by the standard C12RK4 scheme to obtain the same level of error; the percentage improvement ĉ 1 , based on the cost ĉ * 1 (C12RK4) of the classical scheme, is between 48% and 56% and increases with decreasing error level. In addition, the smallest error ê a (C12RK4) achieved by the standard scheme for the estimated computational costs ĉ * 1 is reported in Table 4 and the percentage numerical error reduction ê a , based on the error ê a (C12RK4) of the classical scheme, is seen to be in the region of 80% for all the optimised schemes.
Let space-only optimised schemes be denoted by C12RK4-Sn, where the standard RK4 scheme is used with the space optimised schemes of Section 3.1, and time-only optimised schemes be denoted by C12RK4-Tn, where the standard C12 spatial discretisation is combined with the temporal optimised schemes of Section 3.3; consistently, n represents the exponent of the error level. Fig. 7 shows a comparison between the fully-optimised C12RK4-5 scheme and the C12RK4-S5 and  C12RK4-S5 schemes, respectively. Similar results hold for other error levels n, but these are omitted from Fig. 7 for clarity. These results highlight the advantages that can be achieved by using a scheme fully optimised in both space and time.

Numerical experiments
This section presents a series of numerical examples to highlight the improvement in computational efficiency that is attainable by using cost-optimised prefactored schemes. The classical C12RK4 and the cost-optimised C12RK4-n scheme of Secs. 3.2 and 3.3 are used to solve the governing equations. Periodic boundary conditions are used at the computational domain boundaries. The numerical experiments reported in this work were performed on the High Performance Computer cluster Alice at the University of Leicester. The cluster is an assembly of 208 pairs of 2.6 GHz CPUs with 64 GB RAM per node pair. The tests were run as scalar computations on one dedicated computer core with its own dedicated memory, isolated from the remainder of the hardware to prevent other jobs running concurrently on the cluster from affecting the assessment of the computational cost.

Polychromatic wave
In the first numerical example, the one-dimensional LAE (Eq. (12)) is solved on the interval (0, 1) with c = 1, periodic boundary conditions u(0, t) = u(1, t) at t > 0, and the initial condition u(x, 0) = 4 j=1 sin(2 ( j+1) π x). The solution is a polychromatic wave with k = 32π . For a given theoretical cost level ĉ 1 , a simulation is run with the optimal Courant numbers σ * and the optimal pseudo-wavenumbers κ * found in Section 3.4. Once an approximate solution u h (x, t) has been computed, the normalised relative numerical error ē is calculated. Here, ē is given bȳ where N is the number of nodes in the computational domain. ē represents an approximation to E/(cκ T ) due to there being only a finite number of points at which the solution is available. However, tests where ē was replaced by an error computed using a coarse grained solution yielded negligible changes to the results, except where very few points per wavelength were utilised. These results indicate that ē is not affected by spurious numerical integration effects. Fig. 8(a) and Fig. 8(b) show, respectively, the iso-lines of the theoretical global approximate error function ê a (·, ·) and the iso-lines of the numerical error ē(·, ·) for the C12RK4-4 scheme with (κ, σ ) ∈ [0, 1.5] × [0, 1]. The iso-lines of the numerical error ē(·, ·) reported in Fig. 8(b) were calculated according to Eq. (38), with c = 1, k = 32π , T = 1 and the non-uniform (κ, σ ) discretisation used by Spisso [20]. The locus of optimal points (κ * , σ * ) and the design point (κ 4 opt , σ 4 opt ) for the C12RK4-4 scheme are overlaid on both figures by using, respectively, the thick (magenta in the web version) line and the filled (magenta in the web version) circle. The a posteriori computed numerical error ē(κ, σ ) of Eq. (38) is dependent on the wavenumber spectrum of the problem being solved. Still, the comparison between Fig. 8(a) and Fig. 8(b) shows that there is a strong correlation between the theoretical error ê a (κ, σ ) and the computed numerical error ē(κ, σ ) for this polychromatic wave propagation problem.
There are two branches of the locus of optimal points (κ * , σ * ). The design point (κ 4 opt , σ 4 opt ) lies on the right hand branch. The discontinuity in the locus of optimal points is due to the non-monotonicity of the approximate spatial error function ê a (·, ·) as κ increases from 0.75 to 1.2 for a fixed σ < 0.58, as shown in Fig. 8(a) by the contour error labels over the range 0.75 <κ < 1.2. The non-monotonic error variation is lower in magnitude than the error change outside this region, as indicated by the packing of the contour lines in Fig. 8(a), so that this region can be regarded as featuring an error plateau. In practice, by increasing the cost above that given by the design point, the theoretical error initially plateaus, until the jump from the right to the left branch occurs. Thereafter, it monotonically decays with the reduction of both κ and σ as shown in Fig. 6.  Fig. 9(a) plots the numerical error ē against cost ĉ 1 , using the same optimal conditions κ * ,σ * as in Fig. 6, for the optimised schemes C12RK4-n (lines with symbols) and the standard C12RK4 scheme (solid lines), at a non-dimensional time T = 1 and overlays the theoretical results (dotted lines) from Fig. 8(a). The normalised numerical errors ē κ * ,σ * computed a posteriori compare very favourably with the approximate error estimates ê a κ * ,σ * . Each of the optimised schemes gives a significant computational cost reduction over the C12RK4 scheme close to its design level of error, as indicated by the open symbols lying left of the dashed line, although the maximum cost reduction is seen at error levels slightly below the design level of error. The approximate error ê a κ * ,σ * in Fig. 9(a) is shown to be consistently of the same order of magnitude as the a posteriori numerical error ē κ * ,σ * over the normalised cost range 2 × 10 1 ≤ĉ 1 ≤ 4 × 10 3 . More specifically ê a ≥ē. This is a desirable feature, as it indicates that ê a can be used as an upper bound indicator for the normalised error for the application of a cost optimised prefactored compact scheme to multi-scale linear advection problems. It implies that the numerical solution will not exceed an acceptable level of numerical error ē = 10 −n that is arbitrarily set by the numerical modeller selecting a specific C12RK4-n scheme. By construction, from Eq. (21), ê and its approximation ê a represent the maximum normalised error over the wavenumber and Courant number range 0,κ × 0,σ . This implicitly attributes the same importance to all the (κ, σ ) scales over this range [1], whereas the most energetic scales in a wave propagation problem may differ from the ones associated to the normalised error maximum ê from Eq. (21), which typically lie towards the upper limit κ of the wavenumber range, as shown in Fig. 4(a). For instance, in the propagation of broadband noise, the sound intensity spectrum decays at higher wavenumbers and frequencies so that propagation errors that occur at these scales have a modest impact on the overall sound pressure level prediction. ê and its approximation ê a therefore represent the worst case scenario of the most energetic scale in a multi-scale wave propagation problem being also the scale of maximum normalised propagation error ê in Eq. (21), from which ê a ≥ē. Fig. 9(a) also shows that, for each of the C12RK4-n schemes, there is a discrepancy between the numerical errors and the theoretical predictions in Fig. 9(a) near its design level of error ˜ = 10 −n . In this region, the theoretical error ê a is larger than the numerical error ē. This trend is explainable by considering the right-hand branch of the (κ * , σ * ) locus in Fig. 8(b), moving from right to left. Using the C12RK4-4 scheme near to its design point, the locus of optimal points begins to enter the valley of low error, thus ē diverges from ê a due to this valley not being present in Fig. 8(a). As the locus of points 'climbs' back out of the valley, ē increases with cost until once again ê a ≈ē. A similar discussion applies to both the C12RK4-5 and C12RK4-6 curves in Fig. 9(a). Table 5 reports the performance of the C12RK4-n schemes benchmarked against the C12RK4 scheme at their respective design points and confirms quantitatively that ē ê a . For each optimised scheme operating at its design point, the numerical error ē is slightly below the corresponding design level of error ˜ . Table 5 also reports the percentage cost reduction ĉ 1 over the standard C12RK4 scheme in achieving the same numerical error level ē attained by each optimised scheme operated at its design point. This cost reduction varies from 50% to 60% and it increases with n. Similarly, the percentage error reduction ē achieved by each optimised scheme over the C12RK4 scheme for the same cost ĉ * (c) Normalised CPU time t CPU compared with theoretical cost, in one-and two-space dimensions.

Table 5
Polychromatic wave: performance of cost-optimised schemes at their design points (κ n opt , σ n opt ) and comparison with the standard C12RK4 scheme. ĉ 1 (%) and e(%) indicate the percentage cost and error reduction with respect to the C12RK4 scheme.  Fig. 9(b) shows the normalised numerical error ē against the normalised computational time t CPU , at the same (κ * , σ * ) as in Fig. 9(a). Here, to be consistent with Eq. (20), t CPU = T CPU /((ckT )(kL)), where T CPU is the CPU time used by the simulation in seconds. Computations were carried out on a machine with an Intel Xeon Ivy Bridge CPU running at 2.6 GHz and the computational time was averaged over 100 runs per (κ * , σ * ) pair to ensure the consistency of the results. Qualitatively, Fig. 9(b) is very similar to Fig. 9(a). This indicates that ĉ 1 is a good pseudo-variable for the normalised CPU time; this is confirmed in Fig. 9(c), where the normalised CPU time, t CPU , is plotted against ĉ 1 for the C12RK4-5 scheme and shows ĉ 1 ∝ t CPU . The computations were repeated for the C12RK4-4 and C12RK4-5 schemes and the constant of proportionality was found to be the same for all three schemes. Using the t CPU ∝ĉ 1 relationship from Fig. 9(c), each optimised scheme offers the same (percentage) reduction in CPU time over the C12RK4 scheme as the ĉ(%) reported in Table 5.

One-dimensional Gaussian pulse
The second sample application of the C12RK4-n schemes is derived from Test case B from [16]. A numerical solution is sought for the one-dimensional LAE equation Eq. (12) over the domain [−100, 100], with c = 1, periodic boundary condi-  Table 6 One-dimensional Gaussian pulse: performance of cost-optimised schemes at their operating points (κ n opt , σ n opt ) and comparison with standard C12RK4 scheme. ĉ 1 (%) and ē(%) indicate the percentage cost and error reduction with respect to the C12RK4 scheme. tions, and the initial condition u(x, 0) = 1 2 e −(x/3) 2 . The solution has a broadband Fourier decomposition and provides a good test of the C12RK4-n schemes applied to a broadband signal. The characteristic length scale for this problem is λ = 6, which corresponds to k = π/3. For a given cost level ĉ 1 , the optimal Courant number σ * and the optimal pseudo-wavenumber κ * have been selected from Fig. 6 for ê a = 10 −4 , 10 −5 and 10 −6 . These set the spatial and temporal discretisations (h and t) for the application of the cost optimised schemes. Fig. 10 plots the normalised numerical error ē, computed using Eq. (38), against cost ĉ 1 for each of the optimised schemes C12RK4-n and the standard C12RK4 scheme at the final time T = 100. In Fig. 10, all optimal operating points of the C12RK4-n schemes, denoted by the open symbols, lie to the left of the dashed line, therefore, numerical solutions with the same level of normalised numerical error ē can be obtained for a lower computational cost than by using the C12RK4 scheme. In this case, the reduction in computational cost ĉ 1 obtained by using the optimised schemes is not as large as in the polychromatic wave test case, but there is still a clear advantage in using the optimised schemes at their respective design points. This is better appreciated in the enlargement of Fig. 10(b) in the vicinity of ē = 1 × 10 −6 and ē = 1 × 10 −5 . Table 6 shows a comparison between the optimised schemes C12RK4-n, n = 4, 5, 6, operating at their design points and the standard C12RK4 scheme. Table 6 confirms that, when operating at their design points, all optimised schemes require a 15% to 25% lower cost than the C12RK4 scheme to achieve the same normalised numerical error ē. Similarly, when the optimised schemes are operated at their design point, they offer between a 30% and 50% reduction in normalised numerical error for the same normalised cost ĉ * 1 when compared with the C12RK4 scheme.
Notwithstanding the positive outcome from this broadband wave propagation test, which points to a consistent performance advantage of the C12RK4-n schemes over the C12RK4 scheme, this performance advantage is significantly lower than that predicted by the theory in Table 4. The background of this lower performance gain is explored by reporting in Fig. 10, by the dotted lines, the approximate normalised error ê a (κ, σ ) versus the normalised cost ĉ 1 for the C12RK4-6 and C12RK4 schemes. Plots of the approximate normalised error for the other optimised schemes are omitted for clarity. Recall that Fig. 4(a) indicates an increase in dispersion with increasing wavenumber. Given that the Gaussian pulse has a definite roll-off at high wavenumbers, ê a (κ, σ ) overestimates ē, as shown by the dotted curves lying above the continuous and discontinuous line in Fig. 10. The approximate errors can be considered to be upper bound estimates for ē because ê a assumes the error is generated at all wavenumbers ≤κ whereas, in a physical problem, some wavenumbers may have comparatively negligible energy levels and make a negligible contribution to the actual error ē. Given that the baseline a posteriori normalised numerical error is lower than ê a , a more modest reduction in cost ĉ 1 to achieve ē = 10 −n appears to be a reasonable outcome by the use of a C12RK4-n scheme.

Linearised Euler Equations
The last test case is the application of the same optimised schemes from Section 2 to a broadband two-dimensional linear wave propagation problem, which is governed by the Linearised Euler Equations (LEE) in two-dimensions. The twodimensional normalised LEE equations, in non-conservative form, are ∂U ∂t that represent an initially quiescent medium perturbed by a two-dimensional Gaussian pulse. M x = M y = 0 is used for consistency with the numerical example presented in [1]. Periodic boundary conditions are applied at the computational domain perimeter and the solution is time-advanced to the non-dimensional time T = 40 at which the pulse, which travels at the speed of sound, has yet to reach the perimeter, which therefore remains unperturbed. The initial condition of Eq. (41) has characteristic wavenumber k = π (ln 2) 1/2 /3.
The procedure of Section 2 is used to estimate the optimal values (κ * , σ * ) which give the smallest error estimate ê, from Eq. (21), for a given cost estimate ĉ 2 , from Eq. (22), for the C12RK4 and C12RK4-n, n = 4, 5, 6 schemes. Fig. 11(a) shows the resulting cost-error trends, using the same notation as Fig. 6. The open symbols represent the estimated optimal operating points (κ opt , σ opt ) of each scheme and indicate a good cost-advantage potential where the C12RK4-n schemes work at these spatial and temporal discretisation level. A quantitative comparison in the predicted performance between the standard C12RK4 scheme and each of the cost-optimised C12RK4-n schemes is presented in Table 7, for the optimal values (κ opt , σ opt ). All optimised schemes are predicted to provide more than 60% reduction in cost ĉ 2 to achieve the same magnitude of normalised numerical error as the C12RK4 scheme. Furthermore, all optimised schemes are predicted to give about an 80% reduction in normalised numerical error ê a compared to the C12RK4 scheme for the same computational cost estimate ĉ * 2 . Comparing ĉ d and ê a values between Tables 4 and 7 shows that these error and cost reductions from using C12RK4-n schemes at their respective optimal operating points are predicted to increase in two-dimensional applications.
These cost and error reduction estimates are verified a posteriori by solving numerically the two-dimensional Gaussian pulse propagation problem of Eqns. (39), (41). The baseline solution is provided by the C12RK4 scheme used on a uniform  87.86 Table 8 Two-dimensional Gaussian pulse: performance of cost-optimised schemes at their design point (κ n opt , σ n opt ) and comparison with the standard C12RK4 scheme. ĉ 2 (%) and ē(%) indicate the percentage cost and error reduction with respect to the C12RK4 scheme.  Equivalently, solutions are obtained using the C12RK4-n schemes with spatial and temporal discretisation corresponding to their (κ n opt , σ n opt ) design points. From these solutions, the a posteriori normalised numerical error ē at T = 40 is computed where N is the number of nodes in each coordinate direction and each (x i , y j ) is a coordinate of a node in the mesh. The normalised numerical error computed a posteriori is shown in Fig. 11(b). Each component of the solution is a broadband signal that spans the full wavenumber spectrum. Thus, similarly to the one-dimensional Gaussian pulse propagation problem of Section 4.2, the numerical results from the cost-optimised schemes do not show the same reduction in cost over the C12RK4 scheme as the theoretical results. Nonetheless, qualitatively, the numerical errors show a similar trend to the theoretical results in Fig. 11(a). At its design point (κ n opt , σ n opt ), each C12RK4-n scheme shows a reduction in cost compared with the C12RK4 scheme and with the other C12RK4-n schemes to achieve the same level of normalised numerical error ē. Indeed, Table 8 shows that each cost-optimised scheme gives a reduction in normalised cost ĉ 2 of over 30%, when operating at its design point (κ n opt , σ n opt ), compared to the standard C12RK4 scheme. In addition, the optimised schemes operated at their design points give approximately a 50% reduction in error ē over the standard C12RK4 scheme for the same cost ĉ * 2 (C12RK4).
Finally, Fig. 9(c) confirms that in two-dimensions the cost function ĉ 2 is a good pseudo-variable for the normalised CPU-time t CPU = T CPU /((ckT )(kL) 2 ).

Applicability of cost-optimised schemes to three-dimensional and to non-linear problems
The present work has dealt only with the optimisation of interior compact finite difference schemes and the understanding of their wave propagation properties in unbounded, periodic, one-and two-dimensional domains. Most real flow physics problems are further complicated by boundary condition effects, three-dimensionality, and non-linearity.
A preliminary assessment of the impact on the stability and accuracy of the prefactored cost-optimised schemes when coupled with high-order boundary closures is given in [20]. The baseline solver coupled with high-order boundary closures is shown to be stable [20]. The L 2 -norm error between the analytical reference solution and the numerical prediction obtained from the C12RK4 scheme using a CFL number of O 10 −2 was shown to match the sixth-order roll-off predicted by the truncation error of Eq. (1). Future efforts will focus on designing suitable boundary closures for the optimised schemes.
Preliminary solutions of three-dimensional linear wave propagation problems have been obtained using the C12RK4 scheme. The results show that the C12RK4 scheme coupled with a buffer zone at the through-flow boundaries is able to model the rotor-stator interaction in a three-dimensional turbomachinery geometry, producing engineering accurate aeroacoustic predictions [23].
The proposed method is expected to encounter some difficulties when applied to multi-dimensional systems of equations with differing wave speeds. In such cases, the overall solution error, in a linear setting, can be estimated from a weighted average of the errors associated with each wavenumber.
It is expected that the present analysis may also give some benefits for weakly non-linear problems, because the leading order effects due to variable wave speed and multiple flow scales are naturally accounted for. A rigorous assessment of this statement is made difficult by the lack of appropriate explicit closed-form solutions and further efforts are required in these directions [1]. Non-linear problems can lead to the wavenumber spectrum of the solution varying with time. As such, it is worth considering an adaptive algorithm that can detect the changes in wavenumber and modify the mesh spacing and time steps accordingly. In conclusion, for real flow physics problems, the cost-optimised schemes are expected to retain some advantage over the non-optimised ones, even though the actual improvement may be less than the theoretical prediction reported in Section 4.

Conclusion
A class of optimised prefactored schemes have been developed based on fully spatio-temporal a priori error estimates. The following procedure can be outlined in order to achieve the best performance for a given physical problem in which a typical maximum wavelength (and associated wavenumber k ), a typical maximum propagation speed ĉ, and final time T can be identified: i) given the allowed error tolerance , the corresponding normalised error level is determined ˜ = /(ĉk T ); ii) the most efficient prefactored space-and time-discretisation scheme is selected among the optimised ones herein proposed upon inspection of the respective cost vs. error plots, as given for instance in Fig. 2; iii) for the selected scheme, optimal values of the scaled wavenumber (κ * (˜ )) and of the Courant number (σ * (˜ )) are determined; iv) optimal values of the grid spacing and of the time step are finally determined according to h * (˜ ) =κ * (˜ )/k, t * (˜ ) =σ * (˜ ) h * (˜ )/ĉ.
A theoretical performance analysis shows the clear superiority of optimised schemes over classical CAA schemes, especially if high-fidelity approximations are sought in multiple space dimensions. The proposed cost reduction is, according to Eq. (22), more effective in higher dimensions, due to the exponent of κ being d + 1. The theory suggests that a cost reduction of over 75% is possible in three dimensions when using the C12RK4-n schemes, with n = 4, 5, 6, compared with a maximum of 56% and 70% in one-and two-dimensions, respectively. Numerical experiments convincingly support the above statements, showing a cost reduction for a given error level of O(50%) for one-dimensional monochromatic wave propagation and of about 20% for a two-dimensional acoustic pulse propagation. This feature, coupled with simplicity in the enforcement of numerical boundary conditions imparted by pre-factorisation, makes the present class of schemes especially promising for carrying out computations of long-range acoustic propagation. It is important to note that the favourable properties of space-and-time optimised schemes depend sensitively on the choice of the mesh size and time step, which should be sufficiently close to the nominal optimal point of operation. It is an open issue whether the same features also carry on to the fully non-linear regime (i.e. turbulent flow) and to flow cases in which no a priori estimate is given for the bulk flow properties (i.e. length and time scales).