A recursive system-free single-step temporal discretization method for finite difference methods

Single-stage or single-step high-order temporal discretizations of partial differential equations (PDEs) have shown great promise in delivering high-order accuracy in time with efficient use of computational resources. There has been much success in developing such methods for finite volume method (FVM) discretizations of PDEs. The Picard Integral formulation (PIF) has recently made such single-stage temporal methods accessible for finite difference method (FDM) discretizations. PIF methods rely on the so-called Lax-Wendroff procedures to tightly couple spatial and temporal derivatives through the governing PDE system to construct high-order Taylor series expansions in time. Going to higher than third order in time requires the calculation of Jacobian-like derivative tensor-vector contractions of an increasingly larger degree, greatly adding to the complexity of such schemes. To that end, we present in this paper a method for calculating these tensor contractions through a recursive application of a discrete Jacobian operator that readily and efficiently computes the needed contractions entirely agnostic of the system of partial differential equations (PDEs) being solved.


Introduction
In past decades, high-order discrete methods for hyperbolic conservation laws have become one of the central themes in computational fluid dynamics (CFD) due to their potential in achieving highly accurate predictions in balance with trends in current high-performance computing (HPC) architectures. Using high-order methods allows better solution accuracy with a fixed memory profile at the cost of increased floating point operations; in line with next generation HPC systems that are seeing increased computing power with saturation in the memory per compute core.
Practitioners have focused on designing high-arithmetic-intensity data reconstruction and interpolation models in the context of finite volume methods (FVM) and finite difference methods (FDM). Under the dual computational need for accuracy and stability, the CFD community has developed high-order reconstruction and interpolation methods that can produce highly accurate solutions while avoiding unphysical oscillations near the discontinuities. Examples include the early success of the piecewise parabolic method (PPM) by Colella and Woodward [1,2], which has been still actively adopted as a shock-capturing partial differential equation (PDE) solver by many CFD users after about four decades since its introduction. In 1987, Harten et al. [3] proposed an essentially non-oscillatory (ENO) scheme that chooses the appropriate stencil adaptively to prevent spurious oscillations near strong gradients. Liu et al. [4] improved this idea by introducing the weighted essentially non-oscillatory (WENO) scheme, which takes a convex combination of all possible stencils, reducing the number of logical calculations needed for the original ENO method (WENO-JS). Assuming that the temporal integration is discretized with a qth order scheme and the spatial with a pth order scheme (often with q ≤ p), the overall solution accuracy is determined by the leading errors of the two discretizations, namely, O (∆t q , ∆s p ), where ∆t and ∆s represent the temporal and spatial length scales. Practically speaking, the spatial errors usually dominate the temporal errors; thus, the high-order methods with q ≤ p are justifiable. Nonetheless, we observe that the temporal errors gradually dominate the spatial errors as we increase the grid resolution progressively, which leads us to find a higher than a third-order temporal scheme for maintaining overall solution accuracy even in the finer grid resolutions.
In Lax-Wendroff type time discretization methods, like PIF and SF-PIF3 methods, rank-4 Jacobian-like tensors are needed for the fourth-order approximation. Although possible, the need for such analytical handling becomes prohibitive in the Lax-Wendroff type schemes when extending their accuracy beyond third-order due to the drastic growth in complexity.
In this regard, we propose a new fourth-order extension of the SF-PIF3 method. Since SF-PIF3 bypasses all the Jacobian-like calculations, our SF approach is further rewarded in a fourth-order extension, which demands a leap in calculation counts for conventional Lax-Wendroff type schemes. Moreover, we present a new improved version of the previous SF approach [27], which applies a recursive strategy to obtain the higher-order derivative tensor-vector contractions, promising a more compact code structure and faster performance than the original SF method. With such modification, our fourth-order SF-PIF4 method shows nearly twice faster performance than the optimal fourth-order five-stage SSP-RK method.
We organize the paper as follows. In Section 2 we briefly review the general discretization strategy of the original PIF method. We present the fourth-order extension of the original PIF method in Section 3, and we apply the original SF approach to the fourth-order PIF method in Section 4. The improved recursive SF approach will be introduced in Section 5, which ensures faster performance and a more straightforward code structure. The results of various 2D and 3D benchmark problems are presented in Section 6. We conclude our paper with a summary in Section 7.

Picard integral formulation
We are interested in solving the general conservation system of equations in 3D, where U is the vector of conservative variables and F, G, and H are the flux functions in x-, y-and z-direction, respectively. In the Euler equations, the conservative variables and the flux functions are defined as, ( Applying the Picard integral formulation (PIF) [33], we take a time average of Eq. (1) within a single time step ∆t over an interval [t n , t n + ∆t] = [t n , t n+1 ], where F avg , G avg , and H avg are the time-averaged fluxes in each direction, for x = (x, y, z) ∈ R 3 . We wish to express the spatial derivatives of the time-averaged fluxes in Eq. (3) using highly approximated numerical fluxesf,ĝ, andĥ at cell interfaces. Taking x-directional flux F, for example, we aim to express The y-and z-directional derivatives are approximated in a similar fashion. Finding approximated solutions for the numerical fluxes in the PIF method is nearly identical to the conventional finite difference method (FDM). Following the standard convention in FDM, we treat pointwise x-directional flux F(x, y j , z k ) as a 1D cell average of an auxiliary functionF in 1D, Then, by taking an analytical derivative of Eq. (6) with respect to x, we have By comparing Eq. (5) and Eq. (7), we can identify the auxiliary functionF as a numerical fluxf. We can follow similar procedures to identify the rest of numerical fluxesĝ andĥ. Therefore, determining spatially approximated solutions for the numerical fluxes is the inverse problem of Eq. (6), viz. finding pointwise values at cell interfaces, given the volume-averaged quantities at cell centers, which is exactly the same way as the high-order reconstruction procedure used in 1D finite volume method (FVM). Namely, we can use the conventional high-order reconstruction procedures used in FVM for approximating numerical fluxesf,ĝ and h, at the desired spatially pth-order accuracy by using the cell-centered pointwise analytic fluxes, F i jk , G i jk , and H i jk . One strategic difference in the PIF method is that we use the time-averaged fluxes, F avg , G avg , and H avg , as the inputs of the high-order reconstruction method instead of the pointwise flux values in the conventional FDM in order to assure the anticipated qth-order temporal accuracy for the numerical fluxesf,ĝ, andĥ at cell interfaces.
The time-averaged fluxes are obtained through the Taylor expansion of the pointwise flux around t n . In the qthorder PIF method, the time-averaged x-directional flux F avg is approximated as, We will use the temporally qth-order approximated fluxes F appx,q as the inputs of the pth-order reconstruction scheme R(·) that is combined with a characteristic flux splitting method F S(·) to apply the pth-order spatial approximation to the numerical fluxf at cell interfaces, where r represents the stencil radius required for the pth-order reconstruction method, R(·). This study uses the conventional fifth-order WENO-JS method [5] and the global Lax-Friedrichs flux splitting scheme taking the maximum wave speed over the entire domain and all characteristic fields. The choice of this global Lax-Friedrichs scheme is particularly more diffusive than other possible forms of splitting, although we have found that the added numerical dissipation becomes less significant for designing our fourth-order SF-PIF scheme without sacrificing accuracy.
Recall that the numerical flux in Eq. (9) is a high-order approximated solution both in time and space since we used temporally approximated inputs, F appx,q , instead of the temporally pointwise flux values. Therefore, the x-directional numerical flux derived from Eq. (8) and Eq. (9) allows us to update the solution in a single step while maintaining high order accuracy both in space and time. The y-and z-directional numerical fluxes,ĝ andĥ, are obtained in similar fashions.
Finally, the fully discretized form of the governing equation is given as,

The fourth-order extension of the PIF method
The primary goal in this section is to extend the PIF method [33] to fourth-order in time. We consider a fourthorder approximated time-averaged flux in x-direction, F appx,4 , from the Taylor expansion of the pointwise flux around t n . As expressed in Eq. (8) we have, The other y-and z-directional approximated fluxes, G appx,4 and H appx,4 , are defined in similarly. Our main interest is transforming all the time derivatives in Eq. (11) to the corresponding spatial derivatives; thereby we could express Eq. (11) in a fully explicit form.
For simplicity, we begin to adopt a compact subscript notation of partial derivatives and omit the temporal expression of t = t n . Thus, we rewrite Eq. (1) in a compact form as, In [27], we introduced the so-called flux equation -an evolution equation of fluxes -by applying a chain rule to Eq. (12). For example, the flux equation of the x-flux is The above flux equation allows us to convert time derivatives of the fluxes to the spatial derivatives via the LW/CK procedure. For instance, the first-order time derivative of F is easily converted to the spatial derivatives as, The higher-order time derivatives could be achieved by taking partial derivatives to Eq. (14) recursively. As an example, the second-order term is written as where Following the same procedure, we are able to obtain an explicit form of the third-order time derivative of the flux as, where and and similarly for ∇ f ty and ∇ f tz . Collecting Eqs. (14) to (19), we can express the fourth-order approximation of the timeaveraged flux F appx,4 explicitly, as the spatial derivatives are readily approximated through the conventional central differencing schemes.
We should note that we adopt the conventional five-points, fourth-order in space central differencing schemes to evaluate the spatial derivative terms, following the original PIF method in [33,28]. Moreover, for reducing the code complexity and improving the code performance, we reuse the divergence of the flux, ∇ f , for calculating high order spatial derivatives, e.g., ∇ f x , ∇ f xx , and ∇ f xy . This approach requires an additional guard cell layer (resulting in two more guard cells for the five-points derivatives). However, the overall code performance is better than evaluating high-order derivatives in each direction. Also, we have observed that it does not affect the accuracy and the stability of the PIF scheme.

The original non-recursive system-free (SF) approach
As firstly proposed in [27], the system-free (SF) method provides a capability to bypass all the analytical derivations of Jacobian-like terms (F U , F UU , · · ·) in the PIF method by considering a central differencing with a small perturbation ε for the input space of the flux function. For example, a dot product between the Jacobian matrix F U and an arbitrary vector V can be approximated as, We can extend the above idea to the Hessian tensor contraction with the two same vectors V, The contraction with two different vectors of V and W is achieved by a linear combination of Eq. (21) by following the simple vector calculus, Theoretically speaking, the original system-free procedure in the above can be applied to any arbitrary order of derivatives of the flux function F with respect to the conservative variable U. For instance, the fourth-order extension of the PIF method requires the third-order derivative of F, i.e., F UUU . Following the same mathematical basis of Eq. (20) and Eq. (21), we obtain We can further extend the procedure to compute the contraction with three different vectors, V, W, and X, and only to see that the number of terms to be computed rapidly increases in high-order tensor contraction terms. As such, the original SF method in Eq. (24) becomes less attractive for any PIF method higher than third-order accuracy, as it demands increasing complexity in code implementation, which results in a significant loss in the overall performance of the code. For example, it requires 28 times flux function calls for just getting a single tensor contraction, F UUU · V · W · X. We should note that the major bottleneck of the original system-free method stems from Eqs. (22) and (24) that require to perform the Jacobian-like approximations multiple times.
To mitigation the computational bottleneck, in the following section, we introduce a newly improved version of the system-free method, which does not require any further modifications, even for the case of the tensor contractions of different vectors.

A newly improved recursive system-free (SF) approach
In this chapter, we will improve the original system-free method [27] to be applied in a recursive manner for higher-order derivatives of F, ensuing much simpler code complexity and faster performance. Primarily, we define a functional D u that represents the Jacobian-free method denoted in Eq. (20), where ε v is the appropriately calculated ε corresponding to the vector V by following the original idea of the systemfree method in [27], The study in this paper uses ε op = 4.8062 × 10 −6 that is the optimal ε value for the second-order Jacobian-free approximation in the 64-bit machine. This choice is also justifiable for the recursive scheme considered below, where the functional D u itself is defined as the Jacobian-free method fundamentally. More detailed discussions about ε calculations could be found in [27,34] and [31]. We continue to apply D u in the following successive fashion to calculate the tensor contractions between higherorder derivatives for the flux function F and arbitrary vectors. Thus, the Hessian approximation is, Again, following Eq. (26), ε v and ε w are the optimal ε values normalized by its corresponding vectors V and W, respectively. Note that the improved version of the Hessian-free method in Eq. (27) is applicable regardless the tensor contraction is with two identical vectors (e.g., F UU · V · V) or with two distinct vectors (e.g., F UU · V · W), hence it does not require separate formulations as in Eqs. (22) and (24).
The simplicity gain from the improved version of the system-free method is further rewarded when we apply it to the higher-order derivatives of F. Following the equivalent strategy, the tensor contraction of the third-order derivative of the flux function, F UUU with three distinct vectors, V, W, and X is written compactly as, Let us consider the number of flux function calls of SF approximations to compare the performance gain from the recursive SF method. Comparing Eq. (24) and (28), for example, the numerical flux function needed to be called 28 times in the original SF method. However, the recursive SF method only requires eight evaluations. This is a huge improvement in both performance and compactness.
With the modified SF method in Eqs. (25) to (28), we can approximate all the tensor contractions needed for calculating temporal flux derivatives in Eqs. (14) to (19) without analytical derivations for Jacobian-like terms, giving the system independence of the high-order scheme. We should note that the recursive modifications of the SF method presented in this section do not affect the solution's accuracy and stability compared to the original SF method.

Numerical Test Results
This section presents numerical results of well-known test problems for benchmarking the modified SF-PIF methods' capabilities. We mainly compare the results from the third-order and the fourth-order temporal methods of SSP-RK and SF-PIF. We used the well-known three-stages, third-order SSP-RK method [35] and the five-stages, fourth-order SSP-RK method [36] (RK3 and RK4 hereafter) for the comparisons of the third-order and fourth-order SF-PIF methods (SF-PIF3 and SF-PIF4 hereafter), respectively.
The main purpose of this section is to demonstrate that the proposed recursive SF-PIF methods produce the same quality of solutions, with less CPU time in the light of the single-stage time update, as compared to RK3 and RK4 methods applied to FDM discretizations. We use the conventional fifth-order WENO-JS [5] spatial reconstruction method for all simulations; thus, we expect fifth-order spatial accuracy O(∆x 5 ) combined with third O(∆t 3 ) or fourthorder O(∆t 4 ) temporal accuracy for all numerical results. The fixed Courant numbers, C cfl = 0.4 and C cfl = 0.3, are used for all 2D and 3D simulations, respectively.
6.1. 2D Euler equations 6.1.1. The Sod's shock tube test (rotated 45°) We start with the classical shock tube problem of Sod [37], testing a numerical scheme's ability to capture the three wave families of the 1D Riemann problem: a shock, contact discontinuity, and rarefaction fan. Although Sod's problem is a 1D shock tube problem originally, we implement the test in a 2D domain by tilting the shock wave direction by the angle of θ = 45°. We adopt the idea of Kawai [38], where the initial conditions are repeated multiple times along the direction of the wave propagation so that the problem may be executed with periodic boundary conditions. Explicitly, the initial condition is given as,  Fig. 1. We used the grid resolution of N x = N y = 1024; thus, the figure shows an N = 256 number of data points along with the diagonal axis, x . As shown in Fig. 1, SF-PIF methods produce fairly comparable results to those of RK3 and RK4, except for the small oscillation in the shock front of x-velocity. This issue was already discussed in our previous study [27], where the oscillations are originated from the central differencing formulae that we used for getting spatial derivatives. We can minimize the oscillation by applying WENO-like differencing introduced in the appendix of [27]. However, we choose to use the conventional central differencing formulae for this study as we did not observe any unphysical oscillations for the rest of the test problems.

The Shu-Osher problem (rotated 45°)
Our next choice of test problem is the Shu-Osher problem [39] that describes the interactions between a Mach 3 shock and a smooth density profile. Initially, a Mach 3 shock wave travels to the right through a sinusoidally perturbed density profile. As the shock wave propagates along the perturbed region, the profile gets compressed, resulting in a frequency-doubled region behind the shock. As the shock wave moves further to the right, the doubled-frequency region returns to its original frequency, at which point it becomes a sequence of sharp profile instead of smooth sine wave due to the shock-steepening.
We performed the Shu-Osher problem in 2D by inclining the shock wave direction by an angle of θ = 45°, similar to what we did in the Section 6.   The density profiles along the x direction are given in Fig. 2. The four different temporal method choices (RK3, RK4, SF-PIF3, and SF-PIF4) produce reasonably acceptable solution profiles capturing the high-frequency amplitudes fairly well in the frequency-doubled region. We see that the results of RK3 and RK4 are nearly identical, while SF-PIF3 and SF-PIF4 show slight differences near the highest amplitudes. Generally, RK methods capture the amplitudes marginally better, but in the left-most part of the double-frequency region, x ≈ 5.8, SF-PIF methods capture the highest peak of the amplitude better than the RK methods near the transition between the frequency doubling and the shock steepened perturbations.

Isentropic vortex advection
The isentropic vortex advection problem [40] is one of the most popular test choices to measure the simulation code's accuracy and performance. Although the problem is fully nonlinear, the exact solution is always existent in the form of its initial condition, from which an isentropic vortex is advected through periodic boundaries. We can evaluate the accuracy of a method on nonlinear problems by comparing the final result with its initial condition. Here, we adopt the idea from Spiegel [41], where the simulation domain size is doubled up as [0, 20] × [0, 20] to prevent vortex-vortex couplings near the periodic boundaries.
The results of L 1 errors on six different grid resolutions, N x = N y = 120, 200, 400, 800, 1600, and 3200, are plotted in the left panel of Fig. 3. As expected, all temporal methods follow the convergence line of order ∼ O(∆x 4.6 ), which is nearly the same as WENO's fifth-order spatial accuracy. However, at the critical grid resolution, N x = N y = 1600, the third-order temporal methods of RK3 and SF-PIF3 start to degrade the whole solution accuracy. This behavior can be explained that the errors from the fifth-order spatial WENO solver are dominant on the grid resolutions up to N x = N y = 1600, after which the truncation errors associated with the third-order temporal integrators become dominant over the error of the fifth-order spatial solver. This result tells us the significance of high-order temporal updates in fine grid resolutions: a high-order spatial method does require a comparably high-order temporal method to retain the overall quality of the solutions, particularly when we are motivated to add more grid resolutions to resolve finer scales more accurately. Otherwise, the temporal solver's lower order accuracy can potentially degrade the solution accuracy, contradicting the intended motivation. A direct consequence of this observation applies to CFD simulations on an adaptive mesh refinement (AMR) configuration. A mediocre low-order temporal method is integrated with a high-order spatial solver. This combination of such two solvers creates a computational dilemma of not making any further enhancement in solution accuracy as the computational grids are progressively refined to improve the quality of the AMR solutions. Instead, the solution accuracy is to be bounded by the lower temporal accuracy.
To compare the computational performance over the four different temporal integrators, we present L 1 errors as a function of CPU time data on the right panel of Fig. 3. The drop of the convergence rates is again observed in the two third-order solutions of RK3 and SF-PIF3 at the critical resolution N x = N y = 1600. On the contrary, the two fourth-order solutions of RK4 and SF-PIF4 continue at fourth-order without any change. Up to this resolution, SF-PIF3 is the fastest in reaching any given target L 1 error threshold in CPU time. However, on any grid resolutions finer than the critical resolution, SF-PIF3's L 1 error drops to the third-order convergence, which ultimately crosses the SF-PIF4's fourth-order straight convergence line at N x = N y = 10 4 , beyond which its error will remain larger than the ones from the fourth-order temporal schemes as long as the convergence rate follows the pattern at the high-resolution tail. Table 1: The L 1 errors, the rates of convergence, and the computation times for the vortex advection test solved using RK3 and SF-PIF3 methods (top); using RK4 and SF-PIF4 methods (bottom). All simulation runs are performed on the four 20-cores Cascade Lake Intel Xeon processors, utilized 64 parallel threads. CPU times are measured in seconds, averaged over 10 individual runs. On the other hand, it is distinctively superior to see that SF-PIF4's solution reaches any fixed target error in a faster CPU time than the third-order RK3's solution while keeping the numerical errors as low as RK4 results at each grid resolution. The detailed simulation data from the vortex advection tests are presented in Table 1. All performance results are measured in second, averaged over 10 simulation runs conducted on two nodes of the UC Santa Cruz's high-performance computer, lux. Each node has two 20-core Cascade Lake Intel Xeon processors, and we utilized 64 parallel threads for each simulation run.

2D Riemann problem: Configuration 3
To test the methods' ability to capture the complex fluid structures in 2D, we performed a two-dimensional Riemann problem called Configuration 3. This specific setup and other types of configurations have been extensively studied in [42,43,44] and have been adopted as popular benchmark test problems. To set up Configuration 3, we follow the initial condition from [45,26,27]. We conducted numerical experiments on a 1600 × 1600 grid resolution, which is generally considered as a very high resolution in 2D. This grid resolution choice is made based on our observation in Section 6.1.3 to make sure that the temporal errors are comparable to or dominant over the spatial errors; thereby, we can anticipate temporal error dominant results that allow us to focus on the effects of the four different temporal solvers.
The results at t = 0.8 are shown in Fig. 4. The pseudo-colors represent the density map ranging between [0.1, 1.8], and 40 contour lines within the same range are over-plotted as solid black lines. We see that all four different temporal schemes produce well-known, acceptable results, keeping the assumed diagonal symmetry exceptionally well on this high resolution. This problem is highly nonlinear, involving formations of the upward-moving jet, the downwardmoving mushroom-jet, secondary Kelvin-Helmholtz instabilities exhibited as the small-scale vortical rollups along the slip lines and along the stems of the two jets. As such, it is a non-trivial task to address if a method under consideration is better or worse based on the number of such rollups in the simulations (see our discussion in [27]). At best, such quantification can only provide proof of intrinsic information about the amount of numerical dissipation of each method. From this perspective, we conclude that the two SF-PIF solutions produce the equivalent amount of vortical rollups compared with the corresponding RK solutions, confirming the SF-PIF methods' validity.

Double Mach reflection
Our final 2D test case is the double Mach reflection test that launches a strong Mach 10 shock with 60°of an incident angle between the shock plane and the bottom reflecting wedge. The initial condition is the same as the original setup introduced by Woodward and Colella [2], except that we doubled the y-domain size following [46] to prevent numerical artifacts from the top boundary interaction with the secondary shock wave and the slip line.
The density results are presented in Fig. 5. The density color map ranges between [1.3, 22.6], and the solid black lines represent 40 levels of the density contour lines with the same range. The figures are zoomed-in near the vicinity of the jet and the primary triple point, which is widely considered the main area of interest in the Double Mach Reflection test. All simulation runs are performed on a 4096 × 2048 grid resolution.
The results from the third-and fourth-order SF-PIF methods produce well-acceptable results compared to the corresponding RK methods. Except that there are minor differences in the shape of Kelvin-Helmholtz instabilities along the primary slip line and the bottom jet, the overall dynamics of the two SF-PIF solutions match well with the RK solutions, validating the fidelity of the proposed SF-PIF methods in the presence of a strong shock.

3D Sedov test
To test the code's ability to maintain the spherical symmetry in all spatial directions, we consider the Sedov blast test [47] in 3D. Initially, a point-source of a highly pressurized perturbation is given at the domain center, which leads to a strong spherical shock wave propagating outward from the source. The simulation domain is a 3D square box of [−1.2, 1.2] × [−1.2, 1.2] × [−1.2, 1.2] resolved on a 128 × 128 × 128 grid resolution. Outflow boundary conditions are imposed in all directions. The initial conditions are based on the setup found in [48], but we choose the deposited energy, E tot = 0.851072, according to the setup in [49]. Fig. 6 shows the density profiles at t = 1 along the diagonal (x = y = z) and the x-axis (y = z = 0). The results on the left panel are solved with the RK4 method and the right panel with SF-PIF4. The exact self-similar solution is plotted as a reference solution in solid black curves on both figures. As shown, both the RK4 and SF-PIF4 methods show nearly identical results. Despite the visible differences between the diagonal and the x-axis profiles, especially at the highest peak and the shock front location, both fourth-order temporal solvers show well-maintained reflectional symmetry along the normal axis at the origin. x x=y=z Figure 6: The density profiles of the 3D Sedov explosion test at t = 1. The red triangles represent the density profiles along the x-axis from the origin, while the blue circles represent the density profiles along the diagonal axis. The solid black line is the well-known exact self-similar solution. All simulations are performed on a 128 × 128 × 128 grid resolution, solved with RK4 (left) and SF-PIF4 (right) temporal solvers.

3D Sod's blast explosion test
Next, we consider the 3D explosion test problem found in [49]. This test is a three-dimensional extension of the 1D Sod's shock tube problem [37], which we already solved in a rotated 2D shock-tube configuration in Section 6. The result of the density profiles along the diagonal (x = y = z) and x-axis (y = z = 0) at t = 0.25 are presented in Fig. 7. The solid black curve represents the reference solution using the 1D Euler equations with the appropriate geometric source term according to [49]. The results show that the SF-PIF4 solver captures the shock profile and the spherical symmetry exceedingly well compared to the RK4 result and the reference 1D solution. There is no noticeable difference between the two fourth-order temporal solvers.

3D Riemann problem
Finally, we performed the 3D Riemann problem presented in [50]. We follow the same initial conditions, con- resolved on a 256 × 256 × 256 grid resolution. The setup imposes outflow boundary conditions at all boundaries. The initial condition will carry out 2D Riemann problems at each octant interface, including the diagonal plane of the 3D computational cubic. The resulting density profiles at t = 0.53 are given in Fig. 8. The pseudo-color map ranges between [0.5, 2.65], and we over-plot 40 levels of contour lines using the same range. We observe three different 2D Riemann problems on the left, top, and the diagonal planes in the left panel. The diagonal planes are separately shown in the right panel. SF-PIF4 method is able to capture all the important features as much as the RK4 result, confirming the validity of the SF-PIF4 method in comparison.  Table 2 shows the performance results for the 3D Riemann problem test on four grid resolutions. As shown in the table, the SF-PIF4 method demonstrates nearly the same performance as the third-order RK method, especially in high-resolution cases. We should note that the performance gains from the SF-PIF methods are more compensated on the high-resolution grids, which are indispensable for high fidelity physical simulation studies.

Conclusion
In this study, we have extended the original third-order (non-recursive) SF-PIF method [27] to a fourth-order temporal scheme with the improved recursive version of the system-free (SF) approach. The newly proposed recursive SF approach enables the fourth-order extension of the SF-PIF method (SF-PIF4), increasing computational performance gain with a simplistic code structure.
The original SF approach's critical design purpose [27] is to bypass all the analytical derivations of Jacobians, Hessians, and even higher-order derivatives tensor terms. With the SF approach, approximating the tensor contractions of Jacobian-like terms in the Lax-Wendroff type time discretization becomes simplified.
In this paper, the SF method's advantage is further enhanced by introducing a new recursive procedure. The recursive SF method requires only a small amount of calculations for approximating the tensor contractions, thereby empowering the fourth-order extension of the SF-PIF method to ease and faster performance.
We have tested our fourth-order and third-order SF-PIF methods with a wide range of test problems in two and three spatial dimensions. The results show the SF-PIF methods have significantly faster computational time performance than the corresponding SSP-RK methods at the same temporal order while maintaining the equivalent solution accuracy. In two-dimensional cases, SF-PIF methods show more than two times faster performance results than the SSP-RK counterparts. Moreover, we demonstrate that the fourth-order SF-PIF's performance results are nearly the same as the third-order SSP-RK method. This enhanced performance gain in our SF-PIF solvers can provide a big leap in large-scale parallel computing to save computational costs and reach highly accurate numerical predictions in practice.
We believe the recursive SF method has a broad potential to be relevant in various fields of study. It is neither designed particularly for the PIF method nor any specific numerical methods in computational fluid dynamics. Instead, it is solely intended for approximating the Jacobian-like tensor contractions. We presume that our recursive SF schemes are applicable in other numerical algorithms to enhance the calculation speed and ease the code implementations wherever the Jacobians, Hessians, or higher derivative tensors are required.
A further extension is to design the SF-PIF method in arbitrary order. As reported in Section 6.1.3, the temporal errors dominate the spatial errors in high-resolution simulations; thus, it is noteworthy to use the same accuracy in both the spatial and temporal solvers. We realize that a naive extension of the SF-PIF method to the fifth or higher-order could potentially be less attractive due to the drastic increase of complexity in Taylor expansion. We aim to reduce such complexity and make the SF-PIF method an arbitrary order of accuracy, which will be further investigated in our future studies.

Acknowledgements
We acknowledge the use of the lux supercomputer at UC Santa Cruz, funded by NSF MRI grant AST 1828315.