An exponential integrator/WENO discretization for sonic-boom simulation on modern computer hardware

Recently a splitting approach has been presented for the simulation of sonic-boom propagation. Splitting methods allow one to divide complicated partial differential equations into simpler parts that are solved by specifically tailored numerical schemes. The present work proposes a second order exponential integrator for the numerical solution of sonic-boom propagation modelled through a dispersive equation with Burgers' nonlinearity. The linear terms are efficiently solved in frequency space through FFT, while the nonlinear terms are efficiently solved by a WENO scheme. The numerical method is designed to be highly parallelisable and therefore takes full advantage of modern computer hardware. The new approach also improves the accuracy compared to the splitting method and it reduces oscillations. The enclosed numerical results illustrate that parallelisation on a CPU results in a speedup of 22 times faster than the straightforward sequential version. The GPU implementation further accelerates the runtime by a factor 3, which improves to 5 when single precision is used instead of double precision.


Introduction
Sonic-booms are acoustic waves generated by supersonic planes when they fly faster than the speed of sound. Sonic-booms are heard as two loud bangs that are close together.
These bangs not only annoy the population, but they can also potentially damage building facades. For these reasons supersonic planes are limited to military use and commercial supersonic flights are still not possible. Starting from the 60s many studies have been conducted in order to design the shape of the aircraft so that the generation of sonic-booms is minimized or eliminated; we refer the reader to [2,16,22,23,24]. The theory finds partial confirmation in physical experiments [21]. However, conducting real experiments turns out to be extremely expensive, as the shape of the aircraft cannot be changed easily. Therefore, the need arises to perform numerical simulations that aim to model the propagation of acoustic waves generated by supersonic planes.
It is well known that the airflow over a supersonic aircraft generates a pressure disturbance. The acoustic wave originating from the pressure disturbance will evolve in a N shaped wave when it propagates "far" enough from the aircraft. Typically, N -waves appear after that acoustic waves generated by the pressure disturbance have propagated ten body lengths away from the aircraft. This distance is usually referred to as the mid field. The N -wave is a mathematical model to the loud bangs perceived by humans. A first issue is then to solve an inverse problem in order to predict the formation of N -waves in the mid field starting by a pressure disturbance generated by the aircraft geometry in the near field (i.e. the acoustic wave generated shortly after the plane passed), see [1,2]. A second problem is to study the propagation of the resulting N -wave from the mid field to the ground (far field). In this paper we consider only the second problem, which is of key importance in any optimization algorithm as the final goal is to model the shape of the aircraft in such a way that N -waves do not appear or are mitigated by the time that sonic-booms reach the ground. This turns into modelling the sonic-boom propagation through a partial differential equation and solving it several times. We focus our attention on the evolution of acoustic waves from the mid field to the far field. To do so, we take as initial value the N -wave and we simulate its propagation into the far field.
In our study, we follow the mathematical model proposed in [4] in order to simulate nonlinear and diffraction effects in the propagation of N -waves. This consists in solving numerically a Khokhlov-Zabolotskaya-Kuznetsov (KZK)-type equation, which is done in [4] by splitting methods. In this paper we introduce a new approach based on exponential integrators to solve efficiently the aforementioned partial differential equation. Exponential integrators have been successfully employed to solve various partial differential equations; we refer the interested reader to [8,9,14,15]. The main idea is to solve the linear part of the KZK-equation in frequency space with the help of the fast Fourier transform (FFT) and to discretise the nonlinear terms with a weighted essentially non-oscillatory (WENO) scheme. The latter scheme is very well known in the literature and is widely used in computational fluid dynamics for the numerical solution of hyperbolic conservation laws, see [17,26,27]. The new approach brings multiple advantages such as the reduction of oscillations, less number of operations (asymptotically speaking) and acceleration due to parallelisation.
The goal of this work is to fully exploit modern computer hardware such as GPUs to drastically reduce the computational cost of the numerical simulations. There is a flourishing literature about use of GPUs in order to accelerate scientific computations, e.g. [6,10,12,13,28]. Reducing the simulation time is of great importance when we aim to model physical phenomena. Efficient implementations allow us to consider, for example, more grid points to obtain numerical solutions that are closer to reality. To provide numerical algorithms that take full advantage of parallel architectures is therefore of great practical interest. To achieve good performance it is very important to design the numerical scheme keeping the parallel paradigm in mind. Indeed, a sequential algorithm will probably run much faster on a single, powerful, single-core CPU rather than on a GPU. This is due to the fact that the number of operations per seconds on each core is much lower on a GPU compared to a CPU, see [20]. Increase in performance is only possible when the parallel architecture of a GPU is fully used. As mentioned above, the approach proposed in this paper is based on exponential integrators. Examples of their implementation on GPUs can be found e.g. in [3,7,11,25].
The paper is structured as follows: in section 2 we describe the mathematical model of N -waves propagation in randomly inhomogeneous media. In section 3 we briefly recall the approach based on splitting methods and introduce a new algorithm based on exponential integrators. The two methods are discussed and compared. In section 4 we present some numerical experiments that support the theoretical derivations and illustrate the performance of the proposed numerical scheme on parallel architectures.

Mathematical model
In this work we consider a mathematical model for the description of sound propagation. The model is based on a nonlinear dispersive partial differential equation that takes into account effects of the turbulent velocity field and describes the evolution of the acoustic pressure. For more details we refer the reader to [4,5]. Henceforth, we consider the following partial differential equation in the unknown V = V (σ, ρ, θ): (1) is a KZK-type equation in dimensionless form that models the acoustic wave propagation in inhomogeneous medium. The unknown V is the acoustic pressure normalized with respect to the initial pulse amplitude. The variables σ and ρ are the propagation distance and the transverse coordinate, both normalized with respect to the initial pulse length. The variable θ is the time normalized with respect to the initial pulse duration.
Further information about the model and parameters are given in [4]. The variable coefficients U and U ⊥ are the first and second component of a two dimen-sional isotropic random velocity field where c 0 ∼ 343 m/s denotes the ambient sound speed. The velocity field U is computed by following the approach given in [4]. At a given point r = λ(σ, ρ) the velocity field is given by the sum of N random modes through the formula where "·" denotes the scalar product in R 2 . The angle φ n is the phase of the nth mode and K n is the wave vector given by where θ n is the angle between K n and the σ-axis. Both φ n and θ n are elements of two independent random sequences uniformly distributed in [0, 2π]. The wavenumbers |K n | are equispaced in an interval [K min , K max ]. The amplitude | U (K n )| is related to the Gaussian energy spectrum In this work we set [K min , K max ] = [0.1/L, 9.0/L], where L = 4λ is the length scale. The parameter σ u is set to 3 m/s and λ = T 0 c 0 , where T 0 = 2·10 −2 s is the initial pulse duration. A similar setting is adopted in [4]. This gives fluctuations of the variable coefficients U , The information will be later useful to estimate the CFL conditions of the proposed numerical schemes. A pseudo-code for the generation of the inhomogeneous velocity fields U and U ⊥ is given in Algorithm 1.

Numerical approach
In this section we describe two different numerical approaches for the solution of (1). For the sake of comparison we present first a splitting approach following the one given in [4].
we device a new approach based on exponential integrators and WENO schemes. This new approach requires (asymptotically) a smaller number of machine operations. Moreover, the method is of second order in the variable σ. This improves the convergence rate with respect to the splitting approach, which was of first order only. Finally, we observe numerically that the second approach has smaller oscillations (in amplitude) compared to the splitting approach for long propagation distances. In the following the two methods are described in detail and compared in terms of computational cost.

Splitting method
A possible numerical approach is given by the Lie-Trotter splitting. This method consists in dividing (1) in sub-problems each of them modelling a single physical effect. Before we proceed to describe the numerical scheme, we transform (1) in order to obtain an evolution equation in the variable σ. To do so, we simply integrate both sides of (1) from θ min to θ. Therefore, we obtain In (2) we assumed ∂ σ V (σ, ρ, θ min ) = 0 for every σ and every ρ. This assumption holds true if the domain is chosen large enough with respect to θ. Indeed, in this case the initial data do not evolve at the boundaries (or the effects at the boundaries are negligible). Therefore, boundary conditions do not play a significant role in the numerical simulations. We assume homogeneous Neumann boundary conditions for ρ and periodic boundary conditions for θ, in the same spirit as in [4]. We set σ as the "marching" direction, also known as artificial time. Let 0 ≤ σ ≤ Σ, N σ ∈ N, ∆σ = Σ/N σ and σ n = n∆σ, n = 0, 1, . . . , N σ be the uniform discretization of the variable σ. We split up (1) into five equations as follows: Then, starting from an approximation V n (ρ, θ) to the the solution of (2) at σ = σ n , the solution V (σ, ρ, θ) at σ = σ n + ∆σ is approximated by ∆σ , i = 1, . . . , 5 are the flows of the initial value problems associated to (3)- (7), respectively. The solution of each sub-problem is approximated by different numerical schemes that are tailored to the considered sub-problem. We remark that the Lie-Trotter splitting is a method of first order in σ. This might be insufficient for certain applications. However, if needed the scheme could be generalized to second order which increases the computational cost by approximately a factor of two.

Full discretization of single flows
The numerical schemes for the single sub-problems are described in the following. We adopt a uniform discretization both in the variables ρ min ≤ ρ ≤ ρ max and θ min ≤ θ ≤ θ max and denote by V n j,k the numerical approximation of V (σ n , ρ j , θ k ). Let be the grid sizes for ρ, θ, respectively. Then, the sub-problems are discretized as follows.
Diffraction. Equation (3) is a diffraction equation and will be solved by a Crank-Nicolson finite difference scheme combined with the trapezoidal rule in θ: Nonlinearity. Equation (4) is a Burgers' equation, responsible for the nonlinear effects. We employ the Godunov method, see [19], which is conservative. The discretization is given by with Axial convection and Absorption. Equations (5) and (6) model the axial convection and acoustic absorption, respectively. The solutions are computed in frequency space. Let us represent V by its Fourier series in the variable θ: Then, the solution at σ + ∆σ is given by wherê (11) Transverse convection. The last equation (7) is the transverse convection. The solution is obtained by a Lax-Wendroff method, which is conservative and of second order both in σ and ρ. The numerical scheme is obtain as follows. We compute a Taylor expansion of V in the variable σ: and note that Inserting ∂ σ V and ∂ 2 σ V in the Taylor expansion and approximating the derivatives in ρ by centred finite differences gives the Lax-Wendroff scheme: The global numerical scheme is of first order in σ and of second order in (ρ, θ). We remark that (4) and (7) are solved by explicit conservative schemes. Therefore, a CFL condition has to be satisfied. In particular, (4) and (7) give respectively. These CFL conditions are not too restrictive, indeed, we have The second bound is obtained numerically. The absorption coefficient A in sonic-boom simulation is typically of size 7·10 −6 . For the above numerical scheme a stronger absorption coefficient is needed in order to ensure stability of the solution. Therefore, we set A = 3.4 · 10 −4 . The algorithm proceeds sequentially to solve (3)-(7). In the following we discuss the computational cost of the single steps.

•
Step (3) is the most expensive one and requires O N ρ N 2 θ operations.
• Steps (5) and (6) Advancing the numerical solution from V n to V n+1 requires O N ρ N 2 θ operations. One of the main disadvantages of the proposed splitting approach is that it is not suitable for parallelisation. The main obstacle is given by the numerical scheme (8). Indeed, this numerical scheme approximates the integral by a sum which is a non-local operator. This means that the numerical solution at stage k cannot be computed before all the solutions till stage k − 1 are computed. Therefore, parallelising this process is not possible. Further, the absorption parameter A has to be set higher than the one resulting from physical measurements in order to ensure stable numerical solutions. On the other hand, the implementation of the scheme is very easy. Moreover, each of the employed numerical methods is very well known, meaning that possible issues and restrictions concerning the numerical schemes are already fully studied. Therefore, this full discretization provides reliable numerical solutions that can be used as benchmark to test the correctness of other numerical methods.
We remark that (3) could be solved more efficiently in spectral space via fast cosine transform. Indeed, the cosine transform automatically imposes homogeneous Neumann boundary conditions. Moreover, spectral methods allow us to choose a relative low number of grid points and obtain high spatial accuracy, provided smooth data. Another possibility would be to assume periodic boundary conditions on ρ and use the discrete Fourier transform also in the transverse coordinate. This would not affect heavily the numerical solution because the analysed region of the variable ρ is much smaller than its total domain. Therefore, effects due to boundary conditions are negligible. In this work, we used a finite difference Crank-Nicolson scheme for its simplicity of implementation and a better comparison with the numerical solutions provided in [4].
In the next session, we propose a different approach to (1) with the final goal to obtain an highly parallelisable scheme. Moreover, the presented numerical scheme mitigates the stiffness inherent in (3) and (6), and it is able to reproduce solutions with sharp gradients maintaining a reasonable size grid for (ρ, θ).

Exponential integrators
Similarly to the splitting approach, instead of (1) we consider the evolution equation Here we treat ∂ −1 θ in frequency space. In particular, the antiderivative ∂ −1 θ corresponds to the multiplication with −i/k in frequency space, more details are given in section 3.4. We distinguish two parts on the right-hand side of (14): linear terms with constant coefficients given by 1 4π and linear terms with variable coefficient together with the nonlinear term This partition motivates the use of exponential integrators. The advantage of these integrators lies in the fact that they integrate the terms in (15) exactly. Therefore, the stiffness given by second derivatives vanishes. Let us define a linear operator L and a nonlinear operator b by setting Then, (1) is rewritten fir short as Notice that we make the σ-dependence in b explicit since (16) depends on U ⊥ and U that are σ-dependent. The exact solution is obtained by using the variation of constants formula and it reads Notice that in (18) is given in an implicit form only and the analytical solution is not available, in general. The basic idea of exponential integrators is to obtain numerical solutions by approximating the integral in (18) with the available information, see [15]. The simplest (reasonable) approximation of the integral is done by replacing b σ + s, V (σ + s, ρ, θ) with b σ, V (σ, ρ, θ) . Then, the s-dependence in the function b is removed and we integrate e (∆σ−s)L exactly. This gives the exponential Euler method: where ϕ 1 (z) = (e z − 1)/z in an entire function. Notice that for brevity we suppressed the (ρ, θ) dependence of the variable V . The numerical scheme is of first order in σ, i.e. it has the same order of convergence of the splitting scheme described in section 3.1. Higher order exponential integrators can be constructed systematically. We refer the reader to [15] for an exhaustive discussion about exponential integrators. In this work we use a two-stage second order exponential integrator (ExpRK22) similarly to [8,9]. The scheme is given by where ϕ 2 (z) = (e z − 1 − z)/z 2 .

Full discretization with FFT and WENO
In this section we describe the spatial discretisation which makes use of the fast Fourier transform (FFT) and the weighted essentially non-oscillatory (WENO) scheme. As usual, we consider a uniform discretization of the propagation distance The discretization in (ρ, θ) is done by pseudo-spectral methods. We adopt the same uniform discretization as in section 3.2. As it is well known, linear operators with constant coefficients can be treated efficiently in frequency space. On the other hand, variable coefficients and nonlinear terms should be computed in the physical space. In contrast to section 3.1 we assume periodic boundary conditions both in ρ and θ. Assuming periodic boundary conditions for ρ does not affect the numerical simulations. Indeed, the domain is chosen large enough so that effects of the boundary conditions are negligible for the investigated region.
The numerical scheme (19) is considered in frequency space to facilitate the computation of the operator e ∆σL . We apply the Fourier transform to both equations in (19) and obtain where The symbol ⊙ denotes the component-wise product between two matrices, e.g.
The value ǫ in z jk corresponds to the machine epsilon, e.g. ǫ = 2 −53 for double precision floating point or ǫ = 2 −24 for single precision floating point. The definition of z jk is in accordance to the so called regularized Fourier multiplier. Roughly speaking, we can think of the regularized Fourier multiplier as a numerical trick in order to avoid treating the 0th k-frequency separately. A similar idea is used for example in [10,18]. Notice that the computation of the terms involving b is done in the physical space. This means that, starting from F(V n ) four discrete Fourier transforms are computed in order to obtain the numerical solution F(V n+1 ). Namely, one inverse and one forward Fourier transform to compute F(b(σ n , V n )) in the first step and one inverse and forward Fourier transform to compute F(b(σ n+1 , V n, * )) in the second step.
What is left is the grid discretization of b(σ n , V n ) and b(σ n+1 , V n, * ) in the variables (ρ, θ), which is performed by the weighted essentially non-oscillatory scheme of order 5 (WENO5). This nonlinear numerical scheme has the advantage to limit oscillations in region where the solution is not regular, i.e., where the solution has sharp gradients, or it is even discontinuous. Moreover, in the region where the solution is smooth the WENO5 scheme reaches high order accuracy, in this case order 5. For an exhaustive introduction to ENO and WENO schemes we refer the reader to [26].
The WENO5 scheme is tailored to discretise gradients-like operators. Then, the solution is advanced in time by a chosen time integrator, in our case the ExpRK22 scheme. Notice that (16) is not in a gradient-like form, indeed we have In (21) we used the fact that U is θ-independent. However, we cannot write U ⊥ ∂ ρ V in gradient form because U ⊥ = U ⊥ (σ, ρ) is ρ-dependent. We rewrite the last term as with Then, we discretize ∇ · f (V ) by the WENO5 scheme. The extra term −∂ ρ U ⊥ V does not require any approximation. The ρ-derivative of U ⊥ can be analytically computed (or numerically approximated) before starting the σ-evolution because the coefficient U ⊥ is known a priori for every σ and ρ. Therefore, the extra term −∂ ρ U ⊥ V reduces to the point-wise multiplication −∂ ρ U n ⊥,j V jk , where ∂ ρ U n ⊥,j approximates∂ ρ U ⊥ (σ n , ρ j ) and V jk approximates V (ρ j , θ k ).
The WENO5 scheme in combination with the ExpRK22 scheme must fulfil a CFL condition induced by the terms collected in b(σ, V ) in order to provide stable solutions. We remark that B and the variable coefficients U , U ⊥ are relatively small (≤ 5 · 10 −2 ). This gives a mild CFL condition and we observe in numerical simulations a similar CFL condition as the one for the numerical scheme presented in section 3.1.
The algorithm proceeds in two stages. In the first stage the quantity F(V n, * ) is computed, given F(V n ), see (20).
• An inverse Fourier transform is performed via IFFT in order to get V n and compute b(σ n , V n ). IFFT works with O(N ρ log N ρ N θ log N θ ) operations and b(σ n , V n ) is computed via WENO5 in O(N ρ N θ ) operations; • Then, a FFT is performed to obtain F(b(σ n , V n )).
This means that the first stage requires O(N ρ log N ρ N θ log N θ ) operations. Similar considerations apply to the second stage leading to the same asymptotic estimate. This already gives an indication that the proposed numerical method outperforms (at least asymptotically) the one given in section 3.1. Moreover, WENO schemes together with FFT and IFFT are very suitable to parallelization. This results in a remarkable boost in terms of performance, as the numerical experiments in section 4 show.

Numerical results
Numerical simulations are performed for the following parameters: The intervals [ρ min , ρ max ] and [θ min , θ max ] are chosen large enough in order to mitigate the influence of the boundary conditions in the simulation. The variable ρ is examined only in the interval [133,267], while θ is examined in [0, θ max ]. A similar setting is also adopted in [4]. The variable coefficients U , U ⊥ are generated before starting the simulation by using the procedure described in section 2. An instance of U , U ⊥ is displayed in Fig 1. For reasons of comparison we generate just one set of data [U , U ⊥ ] T for all numerical simulations. We use four different sets of values for N σ , N ρ and N θ collected in Table 1. The inhomogeneous velocity fields are generated for N σ = 2400 and N ρ = 10000. For each of the remaining values in Table 1 the velocity fields U , U ⊥ are sampled accordingly.
The initial data is chosen as an N -wave pulse modeled by: The absorption parameter A will be discussed later in this section. In Fig. 2 we display the initial pulse V 0 and the final solution V Nσ obtained by the five way splitting scheme and the exponential integrator/WENO5 scheme. Here we employ N σ = 1200, N ρ = 2500 and N θ = 7 · 2 9 for both schemes and display V Nσ for (ρ, θ) ∈ [133, 267] × [0, θ max ]. The two solutions have a similar shape, but different amplitudes. The difference in amplitude is justified by the employed numerical methods. Indeed, the numerical schemes used in the splitting approach introduce numerical diffusion that is significantly higher than the numerical diffusion introduced by the WENO scheme. Therefore, the amplitude of the solution decreases much faster for the splitting approach than for the new approach presented in section 3.3. The similar shape of the final solutions in Fig. 2

Splitting approach vs. exponential integrator approach
We compare the two approaches by performing numerical simulations for the different values in Table 1. These values are chosen so that the CFL conditions imposed by the splitting approach and the exponential integrator approach are fulfilled. In the splitting approach the CFL conditions imposed by the Godunov and the Lax-Wendroff scheme, used to solve (4), (7), respectively, read where we used (13) with the estimate V ∞ ≤ 5. A similar CFL condition applies to the exponential integrator approach. We stress that these conditions are not restrictive, in particular the one imposed on N ρ . This allows us to perform numerical simulations choosing reasonable grid sizes. In Fig. 3 we display the evolution of the numerical solutions obtained by the splitting approach and the exponential integrator approach at different propagation distances σ along the fixed transverse coordinate ρ = 144. Notice that for σ = 41 the solutions exhibit a Ushape profile, similarly to [4]. The profile of the solutions get flatter as σ increases. This effect is due to the fact that the initial wave travels through the inhomogeneous medium following many different paths. This result in a scattering of the original pulse. Additionally, dissipative effects introduced by the numerical schemes in the splitting approach accentuate the flattening of the original pulse.
Finally, in Table 2 we compare the total time needed to compute the numerical solution V at the final propagation Σ = 120 for the two approaches and for the different values in Table 1. We observe that the splitting approach is faster for a small number of grid points, but is outperformed by a significant margin (up to a factor of 4 for large resolution) by the ExpRK22 scheme as the grid gets more refined. This is in accordance with the theoretical estimates provided in section 3. Indeed, we have that the splitting scheme requires O(N ρ N 2 θ ) operations, while the exponential integrator requires O(N ρ log N ρ N θ log N θ ) operations. In addition, the exponential integrator/WENO scheme introduces less numerical diffusion and avoids oscillations as will be discussed in more detail below.   is to prevent instabilities and/or oscillations of the numerical solutions, see the discussion in [4], section II-A. We test the new approach presented in section 3.3 for both values of A = 3.4 · 10 −4 and A = 7 · 10 −6 . In Fig. 4 we compare the numerical solution V as a function of θ obtained by the two methods at (σ, ρ) = (115, 144). The discretisation is done by employing N σ = 1200, N ρ = 2500 and N θ = 7 · 2 9 . Notice that oscillations are significantly bigger for the splitting approach, while they are negligible for the exponential integrator approach. The smaller oscillations in the exponential approach are due to use of the WENO scheme. Indeed, WENO schemes are able to capture shocks (i.e. where the solution has less regularity) by reducing the accuracy. However, in smooth regions the scheme recovers its precision, in the specific case the WENO scheme converges with order five. This allows us to choose the coefficient A closer to the parameter given by the physical measurements.
Convergence. We test the convergence of the new approach, which is of second order in σ. To do so, we consider a reference solution V ref obtained by using N σ = 2400 points in the propagation direction and a final distance Σ = 30. Then, the reference solution is compared with numerical solutions V num obtained using a smaller number of points in the propagation direction. Both V ref and V num are computed with the same number of grid points in the coordinate ρ and θ. In particular, we set N ρ = 5000 and N θ = 7 · 2 8 .
We compute the relative error at the final propagation distance Σ read off from The convergence rate is given by the quotient of two consecutive error values. More specifically, let N 1 < N 2 be two different number of σ-points for the computation of two numerical solutions and err 1 , err 2 the associated relative errors. Then, the convergence rate β is given by In Table 3 we collect the relative errors with the convergence rate β. Notice that β ≈ 2 indicates that the employed method is of second order.

High performance computing
We perform numerical simulations by using the exponential integrator method in combination with the WENO5 scheme presented in section 3.3.
For the four different sets of values in Table 1 we measure the total time in seconds needed in order to compute the numerical solution at the final propagation distance Σ = 120. The computer system used is an Intel Xeon scalable CPU Gold 6130 and a Titan V GPU. On the CPU the numerical simulations are performed by using 32 cores.
We carry out four tests. First, we test the sequential code. Second, we parallelize with the application programming interface OpenMP. Third, we parallelize with graphic processing units by using CUDA. In particular, for the GPU implementation we present two versions of the code: one in double precision floating point and one in single precision floating point. Performance results for the four different cases are reported in Table 4. Numerical tests show a drastic speed up achieved using parallelisation with OpenMP compared to the sequential code, which is up to 22 times faster. The simulations are further accelerated when GPUs are involved. In particular, we observe that the single precision floating point implementations on GPUs run more than five times faster than the corresponding CPU implementations. The single precision implementation offers a speed up by a factor roughly two with respect to the double precision implementation, as expected. To trade accuracy for precision might not always be a good choice. However, for this work single precision simulations still offer good results that give insight of the physical phenomena. We stress the fact that performance improvements of the simulations can be obtained only if the algorithm has a high rate of parallelisation. Differently, the simulation times on GPUs might result even inferior to the sequential code.
An interesting aspect in HPC is to compare problems that are compute bound versus problems that are memory bound. Compute bound problems are of the kind that memory access is negligible with respect to the number of arithmetic operations, while the vice-versa holds for memory bound problems. This two aspects find place in the example treated in this work. More specifically the cost of the linear part are essentially memory bound, while the cost of the non-linear part are compute bound. We compare the time required to solve the linear parts against the WENO5 scheme. To illustrate this, let us consider the pseudo-code given in Algorithm 2. STEP 1 and STEP 3 are responsible for the non-linear effects, while STEP 2 and STEP 4 for the linear ones. In Table 5 we report the required average time to compute the different steps in one iteration. To do so, we measure the total time to the completion of the simulation and divide it by N σ . The results show how the computational cost of the linear effects is lower than the non-linear ones for the sequential simulation. When the code is parallelised the situation is reversed. This is due to the fact that the WENO5 scheme is computationally bound, while the FFT is memory bound.

Conclusions
This work is devoted to the study of the propagation of sonic-booms from the mid field into the far field. The mathematical model is given by a KZK-type equation which is a dispersive nonlinear partial differential equation. In the literature a numerical approach based on splitting methods has been proposed. In this work an adaptation of the algorithm is presented and discussed in details. One of the main disadvantages of the aforementioned approach consists in a difficult parallelisation of the algorithm. Therefore, we present a different approach based on exponential integrators in combination with WENO schemes. The new algorithm is highly parallelisable resulting in a tremendous acceleration (almost up to 185 times faster) with respect to its sequential version. Other than a reduced time of simulations, the exponential integrator approach brings additional benefits. The proposed algorithm achieves an higher accuracy with respect to the splitting approach, moreover the number of operations is (asymptotically) smaller. Finally, we observe numerically a significant reduction of oscillations (in amplitude) of the numerical solutions when the exponential integrator approach is used. The new approach allows us to choose more grid-points and obtain numerical solutions that better describe the physical phenomena of N -wave propagation maintaining the time of simulations relatively low.   Table 5: Time required to simulate the non-linear (STEP 1+3) and linear (STEP 2+4) effects in a single iteration for the different code versions: sequential, Open MP, CUDA double precision and CUDA single precision. These results are obtained by using Set 4 in Table 1. Notice how the computational cost of linear and non-linear effects scale equally with the parallelisation.