Accuracy and Performance Evaluation of Low Density Internal and External Flow Predictions using CFD and DSMC

The Direct Simulation Monte Carlo (DSMC) method was widely used to simulate low density gas flows with large Knudsen numbers. However, DSMC encounters limitations in the regime of lower Knudsen numbers ( Kn < 0 . 1 ). In such cases, approaches from classical computational fluid dynamics (CFD) relying on the continuum assumption are preferred, offering accurate solutions at acceptable computational costs. In experiments aimed at imaging aerosolized nanoparticles in vacuo a wide range of Knudsen numbers occur, which motivated the present study on the analysis of the advantages and drawbacks of DSMC and CFD simulations of rarefied flows in terms of accuracy and computational effort. Furthermore, the potential of hybrid methods is evaluated. For this purpose, DSMC and CFD simulations of the flow inside a convergent-divergent nozzle (internal expanding flow) and the flow around a conical body (external shock generating flow) were carried out. CFD simulations utilize the software OpenFOAM and the DSMC solution is obtained using the software SPARTA. The results of these simulation techniques are evaluated by comparing them with experimental data (1), evaluating the time-to-solution (2) and the energy consumption (3), and ∗


Introduction
The Boltzmann equation is valid at any Knudsen number Kn, i.e., from very low Kn (hydrodynamic flows) up to very high Kn (extremely rarefied gas flows).While flows at low Kn are traditionally simulated by methods from computational fluid dynamics (CFD) solving the Navier-Stokes equations, they fail to provide sufficiently accurate results for rarefied gases since the continuum mechanics assumption is violated.However, in the rarefied gas flow regime, where the mean free path of the molecules λ approaches or surpasses the characteristic length scale of the flow, micro-scale effects invalidate the continuum hypothesis.Consequently, particle-based methods, such as the direct simulation Monte Carlo (DSMC) method, often emerges as the preferred choice.The DSMC method as introduced by Bird [1,2], tackles the Boltzmann equation through Monte Carlo simulations.Unlike molecular dynamics approaches which trace the trajectory of every particle in the fluid based on Newton's equations of motion, DSMC stochastically generates collisions using scattering rates and post-collision velocity distributions derived from the kinetic theory of dilute gases [3].Notably, DSMC exhibits superior accuracy in high Knudsen number regimes [1,4].
However, none of these two methods -CFD and DSMC -is entirely optimal for simulating gas flows in the intermediate-Knudsen-number regime Kn ≈ 0.01 . . .10, especially when Kn changes drastically over the flow field.While CFD methods may suffer from significant inaccuracies stemming from the neglect of molecular effects, the computational demands for accurate DSMC simulations increases with Kn −4 [5] rendering them prohibitively expensive for Kn < 0.05.Consequently, there has been an increasing interest in using hybrid methods, especially DSMC/CFD methods to simulate rarefied gas flows with high Mach and intermediate Knudsen numbers [6][7][8][9][10][11][12][13][14].
Furthermore, to decrease the time-to-solution for the arising pure or hybrid simulations, high-performance computing techniques were often applied [15][16][17][18][19].While numerous works study the accuracy of the DSMC and CFD methods and how different parametrizations, e.g., the choice and parameters for the collision model or the number of simulation particles, impact this accuracy [3,5,8,[20][21][22][23][24], they do not take into account the cost and efforts of the methods and how much resources in terms of energy they consume.On the other hand, in studies on the computational performance of the DSMC method [15][16][17][18][19]25] the influence of the different parametrizations on the simulation results was not investigated.
There is a long list of publications assessing the DSMC and CFD conundrum, c.f. [3,5,8,[15][16][17][18][19][20][21][22][23][24][25][26][27].However, a detailed explanation of the impact of the great variety of parametrization possibilities of the solvers, in particular for DSMC, is often incomplete or tailored to very specific setups that are not easily generalizable.Furthermore, the precision of the results and the performance of the DSMC method are not considered simultaneously to judging if the accuracy is worth the amount of time and energy that the simulation consumes.
Here, we briefly revise the CFD and DSMC methods and compare them in terms of accuracy and computational effort in different flow regimes.Considering both an external flow around a conical body and an internal nozzle flow (2D and 3D), parametrizations are detailed and their impact on the solution is analyzed.The software packages OpenFOAM (CFD) [28] and SPARTA (DSMC) [29] were used.Besides accuracy, the computational costs (run-time and energy consumption) are analyzed.
After summarizing related work in section 2, the details of the test cases and the parameters for both DSMC and CFD simulations are described in section 3.Sections 4.1 and 4.2 explain the CFD and DSMC solvers employed in this work, respectively, as well as the fundamental physical modeling of these approaches.The simulation approaches described in Sections 4 to 5 are underpinned by validation with experimental results, c.f. section 6.A conclusion and an outlook to future work is provided in section 7.
The main goals of this work are: • Presenting a comprehensive study of rarefied gas flows for internal and external configurations • Using a combination of 2D/3D configurations of DSMC, CFD and hybrid DSMC-CFD simulation • Highlighting the effect of different parameters having major impact on the simulation such as the mesh size, number of particles, time step, collision model, boundary conditions, and computational speed • Validating and evaluating the simulations against experiments • Reporting the performances and computational efforts of the employed approaches in terms of scalability and energy consumption

State of the art
Numerous studies were performed to analyze gas flow fields, including rarefied and continuum flows, most of them using DSMC or CFD methods.We provide a brief summary of the literature, which cannot be comprehensive but primarily presents studies that depict specific aspects relevant to our study yet collectively demonstrate remaining questions in the field.
The majority focused on the accuracy and parameter studies of DSMC simulations such as the choice of the mesh or the collision model [30][31][32][33][34][35] and validated the results partially against experimental results.Some studies were related to more practical cases for engineering purposes [5-8, 23, 24, 36].Since the DSMC method can be very time consuming, many studies targeted the performance of the approach: While the computational performance of the DSMC method on different computer architectures in terms of runtime is discussed in [15,16,18,26,37], algorithms and optimizations to speed up the DSMC method can be found in [17,19,25,27].To the authors' knowledge, energy consumption has not been addressed so far.Discussions on the impact of the mesh dependency, collision model and boundary conditions are not included in the aforementioned works.
Flows through micro-nozzles were studied based on DSMC and a compressible Navier-Stokes solver applying slip and no-slip conditions [5].The computational results are compared and the most important outcome is that a better agreement between DSMC and CFD is observed when a slip-wall boundary condition is implemented.It is also shown that the CFD and DSMC results differ in the divergent part of the nozzle, especially close to the outflow, where the breakdown parameter Kn is relatively high or, in other words, strong rarefied effects start to appear.No evaluation of the results with regard to a comparison with experimental findings or computational expenses are included.The effect of different parameters such as inlet and wall boundary conditions and the Reynolds number on DSMC and Navier-Stokes approaches for a micro-nozzle flow with a relatively small Knudsen numbers was studied [23].Furthermore, it was investigated in which part of the micro-nozzle DSMC and CFD provide the best results.It is shown that the CFD results exhibit obvious deviations from the DSMC results as Kn exceeds 0.045.The computational performance of large-scale parallel DSMC on homogeneous (CPU) and heterogeneous (CPU+GPU) systems was studied and different programming approaches (MPI, hybrid MPI+OpenMP and OpenACC) were discussed [17].
Extensive research on the development of hybrid DSMC-CFD methods was carried out, c.f. [6][7][8][9][10][11][12][13][14].With regard to their application, simulation cases were mostly very specific, and the comparison was focused typically either on numerical accuracy [6-8, 12-14, 38] or on performance [9].For example, the advantages of a hybrid DSMC-CFD approach over pure DSMC is indicated in [6].For this purpose, the authors simulate a hypersonic flow over a two-dimensional wedge.A comparison of the flow field predicted by a 3D CFD-DSMC simulation considering a space capsule geometry is given in [7], including a validation against experimental results from wind tunnel tests.The study also investigates the effect of the mesh dependency on CFD and DSMC methods.A comparison study [8] explores the performance difference between coupled CFD-DSMC and pure CFD methodologies in simulating a gas centrifuge handling 235 U F 6 gas.Pure DSMC results around the Mars pathfinder and Mars micro-probe capsules are studied in [24], however, without a validation against experimental results.Computational efficiency of massively parallel (stand-alone) DSMC for different cases is studied, amongst others, in [16] and [37].Another comparison study between DSMC and CFD results of a low-density nozzle flow and the experimental evaluation is discussed in [21].
The accuracy of a DSMC simulation depends on a number of numerical parameters such as the time step size, the cell size, and the number of samples.Furthermore, the choice of the collision model plays an important role.An analysis of (statistical and deterministic) numerical errors corresponding to numerical parameters in the DSMC method is provided in [20] based on a simple heat transfer problem between two parallel plates in 1D and 2D.The statistical error analysis of the DSMC method applied to hypersonic and nozzle flows is provided in [39].A further error analysis considering various numerical parameters (sampling cells, sampling time step and sample sizes) is provided in [40].
Table 1 summarizes which related studies covered the different areas of DSMC/CFD simulations in terms of accuracy and computational performance and classifies the current paper accordingly.It should provide recommendations and suggestions on DSMC, CFD and hybrid methods as well as a calibration of those DSMC parameters having a major impact on the simulations to achieve sufficiently accurate results in an acceptable amount of time.

Internal flow -Low density nozzle flow
Figure 1 indicates the 2D simulation domain of the low density nozzle.The test case is based on the experiment by Rothe [45] where the low density  flow properties are measured inside the nozzle using the electron beam fluorescence technique.The inflow boundary is located on the extreme left (line ab), i.e., at the inlet of the pressure chamber.An outflow boundary condition is assumed along the extreme right of the geometry (line fghij).The segment (aj) in Figure 1 is the axis of symmetry.For 2D simulations, an axisymmetric boundary condition was imposed.3D simulations were carried using the complete geometry.The test gas used in the simulation is nitrogen with a stagnation temperature of 300 K. Three different stagnation pressure configurations are tested according to the setup described in Table 2.That leads to the three internal flow cases i.I to i.III.The nozzle Reynolds number Re N is calculated using Re N = ρ o û r * /µ o where ρ o is the stagnation density, û is the adiabatic speed, r * is the radius of the throat and µ o is the viscosity based on the stagnation condition.The Knudsen number is calculated based on the stagnation condition and the throat diameter.

External flow -Blunt cone & sharp cone
Figure 2 demonstrates the 2D computational domains of the external flow simulations considering two geometries.The first geometry is a cone-shaped half body with a blunted nose and the second one possesses a sharp nose.Both geometries are assumed to be infinitely long in positive x-direction.The symmetry line of the half body is aligned with the free-stream.For the first half body, only the first 0.05 m of the blunted nose with a radius of 6.35 × 10 −3 m (0.25 in) is considered For the second half body, the first 0.09 m of the sharp nose is taken into account.The computational domains around these bodies are depicted in Fig. 2.
The fluid considered in both cases was pure nitrogen and the ambient conditions at different altitudes are listed in Table 3.Other parameters needed for CFD and DSMC simulations such as the pressure and number density were calculated correspondingly.The flow direction was in x-direction such that the left and right surfaces of the domain represent the inlet and outlet, respectively (see Fig. 2).For the 2D external flow, only half of the computational domain in y-direction was considered, i.e., a cut through the computational domain (x-y plane) as visible in Figure 2

Numerical Approach
4.1.Continuum Flow Solver: Navier-Stokes For this study, CFD simulations were performed using OpenFOAM, version v2112.OpenFOAM is a finite-volume based code which has a collection of libraries dedicated to the solution of partial differential equations (Navier-Stokes equations).Among the standard solvers available in the OpenFoam library, the rhoCentralFoam solver is used for investigating the present test cases.rhoCentralFoam is a density-based transient solver used for transonic and supersonic flow regimes of a compressible gas [46].
First/second-order schemes are used for the discretization of the governing equations.For gradient and divergence terms, a second-order Gauss linear scheme is used.For the diffusive terms in the governing equations, the Gauss scheme is the only choice of discretization and requires an interpolation scheme for the diffusion coefficient (linear).For the transient cases, the temporal derivatives are discretized using either the first-order bounded implicit Euler scheme or the backward scheme.The Kurganov scheme [47,48] is used to compute fluxes at the cell interfaces which prevents spurious oscillations around shocks.
In OpenFOAM, the computational domain is generally split up into a set of patches and the boundary conditions are then assigned as attributes to the patches and to the field variables on a patch.Various kinds of boundary conditions are available in the OpenFOAM library.The boundary conditions assigned to the test cases are listed in Table 4 and Table 5 for the internal and external flow cases, respectively.For both internal-and external-flow cases, the CFD calculations rely on structured grids specified and depicted in Appendix C.

Kinetic Approach: DSMC Method
The Direct Simulation Monte Carlo method is a discrete particle simulation technique that provides a numerical approximation of the solution of the Boltzmann equation.In this method each particle represents a large number of real molecules [2,4]  distribution.The inter-molecular and the molecule-surface collisions are calculated using probabilistic models.The momentum term and the collision term in the Boltzmann equation are solved in a decoupled manner [3].The DSMC method is a widely used methods for simulating rarefied gas flows.Its accuracy depends on the number of particles per grid cell, the size of the grid cells, the choice of the time steps and the collision models, i.e., for particle-particle and particle-surface collisions.It is general practice to limit the maximum cell size to one third of the mean free path and the time step to below one fourth of the mean collision time [4].In the subsequent analysis process, the microscopic-flow properties are sampled by averaging the particle properties per grid cell to obtain the macroscopic flow quantities.These sampled values are stored for the geometric center of the DSMC grid cells.We used the SPARTA (Stochastic PArallel Rarefied-gas Time-accurate Analyzer [49]) DSMC code to simulate transitional and rarefied flows.SPARTA is a highly benchmarked tool [50] that can simulate systems with a few to millions or billions of particles.It exhibits a good scalability and memory usage [37,49].Presently, the no-time-counter (NTC) method is used as collision-sampling technique [2] along with the variable-hard-sphere (VHS) and variable-soft-spheres (VSS) molecular models [1,4] as well as the Larsen and Borgnakke (L-B) model [34,35] to handle internal energy exchange.In the L-B model, only a fraction of the collisions are assumed to be inelastic which is defined by an average probability of the internal energy exchange ϕ.This parameter is used to determine the rate of the relaxation process of the energy which can also be given as the inverse of the relaxation collision number Z (ϕ = 1/Z) [4,21].SPARTA uses the aforementioned Larsen and Borgnakke model with constant and variable relaxation [1].In the current study, the vibrational mode and chemical reactions are assumed to be frozen.The collision parameters used are given in Table 6.
For the variable relaxation model, the parameter ϕ rot in SPARTA is calculated using equation (A.5) in [1].The unknowns in the equation were 4.07 × 10 −10 m 0.74 273.15K 1.36 2 0.2 Table 6: Collision model parameters of simulated gas molecules [2,32].ω is the power exponent of the temperature in the viscosity law, T ref is the reference temperature, α is the scattering coefficient and ξrot is the rotational degree of freedom.
inferred from experiments [51,52].Both models can be used for temperature ranges from 0 to 1200 K.
The boundary conditions in DSMC can be classified in two types: Freestream boundary conditions (inlet and outlet) and wall boundary conditions, where the gas-surface interactions are predominant.At the inlet, the flux and the thermal state of the molecules are defined according to the flow conditions and the particles are introduced into the simulation domain [1].The outlet removes the simulation particles leaving the simulation domain (see Appendix A).

Gas-surface interactions
The gas-surface interaction model plays a dominant role for the accuracy of the DSMC simulation and choosing an accurate model is subjected to many factors.The most widely used gas-surface interaction models in DSMC are the Maxwell model and the Cercignani-Lampi-Lord (CLL) model which include several unknown parameters.In the following, these models are briefly discussed to establish some guidelines in choosing the appropriate model along with the corresponding parameters.In the Maxwell model (Figure 3 (a)), it is assumed that the molecules are either reflected diffusively with complete energy accommodation (α E = 100 %) or reflected specularly with no energy exchange.The Maxwell model takes a fraction ϵ of incident molecules to be scattered diffusely and the remaining fraction of molecules (1 − ϵ) are scattered specularly.This model is useful for describing the thermodynamic behavior of the gas, however, it does not describe the molecular behavior of the gas which is frequently observed in fundamental gas-surface scattering experiments [53].
The Cercignani-Lampis-Lord model (CLL) [54,55] is based on the assumption that there is no coupling of the normal and tangential velocity components during gas-molecule reflections from a surface.Therefore, this model uses two coefficients α n and α t , which represent energy-accommodation coefficients associated with normal and tangential components of the velocity, respectively.The scattering distribution of the molecules is centered around an average scattering angle θ r which is a function of the two accommodation coefficients (Figure 3 (b)).This scattering distribution has a lobular shape similar to the one observed in experiments [53].This model also accounts for the internal energy exchange by introducing accommodation coefficients for rotational and vibrational modes.Furthermore, CLL has the capability to produce diffuse scattering with incomplete energy accommodation (α E = α n = α t < 100 %) by changing the scattering distribution.The details of the implementation of this model are described in [4].

Hybrid CFD/DSMC
The one-way coupling of CFD and DSMC is implemented as follows: The fluid flow in the entire computational domain is simulated based on the continuum solver (OpenFoam).An interface position (plane) is chosen in the computational domain and the fluid flow data are extracted at this position.This position is determined according to a continuum-breakdown parameter calculated from the flow field.The computational domain of the DSMC method is generated by splitting the former computational domain from the interface position toward the outflow exit.The extracted fluid flow data are introduced as an inflow boundary condition to the DSMC simulation, which is then carried out to resolve the transitional rarefied region.
In order to quantify the continuum-breakdown parameter, different definitions of the Knudsen number are used such as (a) a global Knudsen number Kn = λ/L and (b) a local Knudsen number or Boyd's Gradient-Length-Local Knudsen number Kn GLL,Q = λ|∇Q|/Q [41]; λ is the mean free path, L is the characteristic length, Q represents a macroscopic flow property such as the density ρ, the velocity v or the temperature T .A breakdown Knudsen number Kn B is calculated based on the maximum of the local Knudsen numbers and the global Knudsen number in the computational domain [5,43], i.e, Kn B = max(Kn, Kn GLL,ρ , Kn GLL,T , Kn GLL,|v| ). ( When Kn B > 0.05, the continuum breakdown is assumed [5,11].

Results and Discussion
6.1.Internal Flow: DSMC, CFD, Hybrid Simulations were performed for the test cases presented in Table 2 using the continuum approach (OpenFoam), the DSMC method and the hybrid CFD/DSMC.The full DSMC simulations are inefficient in the low Knudsen regime (Kn < 0.05) and demand higher computational effort.Therefore, the hybrid DSMC method was used in order to accelerate the simulations.The results obtained by the continuum approach were used to estimate the breakdown Knudsen number Kn B (eq. ( 1)).By determining Kn B throughout the simulation domain, it was observed that the continuum breakdown occurred in the simulation domain right after the throat.Hence, the interface between the continuum and DSMC domain was positioned at the throat using the approach described in Section 5.The continuum simulations were performed for a 3-D simulation domain and the hybrid DSMC using both the 2D axisymmetric and 3D configuration.The simulation results were validated against experiments conducted by Rothe [45] and the deviations are presented in the following figures.The sensitivity studies concerning different simulation parameters are described in Section 6.3.Due to the limited availability of experimental data, the cases i.I and i.III were studied in more detail than case i.II, where only the centerline temperature data is available.
Figure 4 to 7 show various flow parameters obtained using the continuum approach and the hybrid DSMC approach.For the purpose of validation, the corresponding experimental data [45] are included.Figure 4 (a) and (b) show the centerline density profiles in the nozzle for cases i.I and i.III, respectively.Here, the densities ρ were normalized by the stagnation density ρ o and this ratio was plotted against the non-dimensional axial distance, i.e., the ratio of the axial position from the throat x and the radius of the throat R t .The radial variation of densities normalized by the maximum cross-sectional density ρ c found at the axis is studied at the cross-sectional position x/R t = 13.7 from the throat for case i.I in Figure 5 (a) and for case i.III in Figure 5 (b).The radial density profiles are also studied for other cross-sectional positions depicted in Figure 14 in Section 6.3.The density profiles obtained by the hybrid DSMC method show good agreement with the experimental data compared to the continuum method.It should be noted that the error limits in the measured experimental densities reported by Rothe [45] are ±10 % along the centerline and ±5 % for the relative densities along the cross-sections.
Figure 6 shows the centerline temperature profiles of cases i.I and i.III, respectively.The equilibrium temperature was calculated by the continuum method and the translational and rotational temperatures were predicted by the hybrid DSMC method.Similar to the densities, all temperatures were normalized by the stagnation temperature T o and plotted against x/R t .For case i.I, where Re N > 500, the computed temperatures from all three models agree well with the experimental data, while for Re N < 300 there are significant differences: Unlike the case i.I (Re N > 500) where the temperatures decrease monotonically from the throat to the exit of the nozzle (Figure 6 (a)), in the cases i.II and i.III (Re N < 300) the temperatures reduce to a minimum at x/R t ≈ 6 and increase toward the exit of the nozzle (see Figure 6 (b) and Figure 9.This is a result of stronger rarefaction effects where the flow gets thermalized due to viscous dissipation, i.e., due to more molecule-surface collisions than molecule-molecule collisions.This effect also has an influence on the Mach number profiles shown in Figure A.21.However, it can be observed that the rotational temperatures predicted by DSMC match best with the measurements as the experimental data [45] are rotational temperatures.Figure 7 shows the normalized temperature variation against the radial distance normalized by the cross-sectional radius at x/R t = 13.7 from the throat.The rotational temperatures are again in good agreement with the experimental data.The rotational temperature is always greater than the translational temperature due to flow expansion [45].The equilibrium temperature obtained by the continuum method can be compared with the translational temperatures from the hybrid DSMC method.The differences between these two temperatures, particularly for Re N < 300, and the differences in densities observed above are due to the inaccuracy of the continuum method in the rarefied flow regimes.

External Flow: DSMC, CFD
External flow simulations were carried out for the test cases listed in Table 3 using pure CFD and DSMC methods.The flow is assumed to be laminar, which is in line with the fact that the Reynolds number is relatively small.As described earlier, the results for the continuum approach were used to estimate the breakdown Knudsen number Kn B (Eq. ( 1)).By determining Kn B throughout the simulation domain, it is found that near the surface of the external bodies, Kn B is relatively small (Kn B < 0.004).This is due to a shock wave formed around the bodies which eventually increases the pressure and the density inside the shock layer.The pressure coefficient C p is used as a quantity for validating the external flow cases.In Figure 2 the dotted red lines indicate the measurement line, along which the results are evaluated.
Figure 8 shows the variation of C p on the surface plotted against the axial distance x for cases e.I and e.II, respectively.Due to the near-continuum flow in the vicinity of the surface, both continuum and DSMC methods show similar trends in the results and reasonably match the experimental data [6,56,57].However, the DSMC results show a smoother variation in C p .Furthermore, there is a significant difference in the results of the methods near the stagnation region (x ≈ 0 cm to 0.5 cm).This is because the DSMC method predicts a much thicker shock layer around the body compared to the continuum method particularly near the stagnation region.This phenomenon is subject to further investigations.Sensitivity studies of different simulation parameters are described in Section 6.3.

Impact of DSMC simulation parameters
The results presented in the last two subsections show the level of accuracy of the DSMC method in resolving rarefied gas flows.However, as mentioned in Section 4.2, several simulation parameters are responsible for attaining these accurate results.In this section, the sensitivity of the results with respect to different DSMC simulation parameters is presented.Although the simulations for the internal flow cases were carried out using the hybrid method, only the DSMC domain, i.e., the region where Kn B > 0.05, is considered for this parameter study.

Effect of simulation particles and DSMC grid size
In order to accurately describe the rarefied gas flow, it is important to have a sufficient number of simulation particles in the simulation domain and per grid cell.These represent the distribution of the actual gas molecules and are necessary to preserve the statistical accuracy and resolution of molecular collisions in the simulation.In SPARTA this property is controlled by the keyword fnum.The parameter fnum sets the ratio of real molecules to the simulation particles.Therefore, the smaller the value of fnum, the greater the number of simulation particles and the simulation accuracy.Once the number of simulation particles crosses a certain threshold, the accuracy of the simulation reaches convergence in results.Using much smaller values of fnum compared to this threshold value increases the computational cost as in DSMC the computational cost scales linearly with the number of simulation particles [50].Therefore, it is very important to choose a trade-off value of fnum in order to optimize the computational costs while ensuring the accuracy of the simulation.
The simulation domain was discretized with regular grids and the grid cell size, e.g., ∆x, of the simulation is chosen according to the criterion mentioned in Section 4.2, i.e., ∆x ≤ 1 3 λ min with the minimum value of the mean free path in the simulation domain λ min .
The present convergence study varying fnum has been performed for a uniform grid of size ∆x.The simulation particles were created using the fnum parameter and were distributed such that each cell has roughly the same number of particles.Figure 9 (a) shows the effect of the parameter fnum on the predicted rotational temperature for the 2D internal flow case i.II.A value of fnum ≤ 5×10 15 is required to reach convergence and agreement with the experimental data for the 2D axisymmetric configuration.Likewise, Figure 10 (a) shows the effect of fnum on the predicted pressure coefficient C p for the 2D external flow case e.I.Here a value of fnum ≤ 1 × 10 17 is required for convergence and a reasonable agreement with the experimental data.Using smaller values of fnum increases the number of simulation particles and thereby the computational cost.Hence, the above mentioned values of fnum are chosen as a trade-off for the simulation of the flow cases.A convergence study has also been performed with respect to the grid resolution of the simulation.The uniform regular grid is coarsened until the maximum grid size ∆x ≤ λ min is reached keeping the fnum value constant (trade-off fnum).Figure 11 shows the results achieved by the different grid sizes for case i.II and e.I, respectively.In the limit ∆x ≤ λ min there are no significant deviations between the calculations.Based on these cases the optimal value of the grid size is ∆x ≈ λ min with fnum ≤ 5 × 10 15 for the internal and fnum ≤ 1 × 10 17 for the external 2D axisymmetric configurations.Compared to the previous configuration ∆x ≈ 1  3 λ min these parameters increased the simulation efficiency roughly by a factor of 3.5 for the internal case i.II and by a factor of 4.25 for the external case e.I.The 3D simulations are also carried out with ∆x ≈ λ min and the corresponding trade-off value of fnum is ≤ 5 × 10 20 for the case i.II and ≤ 1 × 10 11 for the case e.I, see Figure 9 (b) and Figure 10 (b).
The number of simulation particles and the computational effort could be further reduced without compromising the computational accuracy by using an adaptive mesh refinement technique.However, this technique is not considered in the current paper.

Effect of time step
The choice of the time step is another parameter which can significantly affect the solution of the DSMC method.In a DSMC simulation the time step ∆t must be chosen in relation to the mean collision time t mct [4].The time step is estimated according to the following equation: Here, v represents the average thermal speed of the molecules which is determined from the kinetic theory of gases, v = (8 K B T )/(π m).In DSMC, a smaller time step requires a larger number of time steps needed to achieve a steady-state solution, corresponding to increased CPU time.Using much larger values of the time step may also increase the CPU time [58] since in DSMC the probability of collisions between two particles increases with the size of the time step.Therefore, a trade-off value of the time step ∆t has to be estimated similar to the previous subsection.The convergence study is performed by varying the time step ∆t using the trade-off values of fnum and ∆x.For the calculation of the mean collision time t mct in Eq. ( 2), the mean-free-path value λ = λ min is chosen.The corresponding values of λ min and t mct estimated for different test cases are tabulated in Table B.7 in Appendix B. Figure 12 (a) shows the effect of the time step on the centerline rotational temperature of case i.II.For this case, it can be seen that in the limit of ∆x ≤ t mct there is no significant change observed in the results which supports the assumption of Eq. ( 2). Figure 12 (b) depicts for case i.I the radial variation of the density near the throat region, where the density is higher since it is near to the continuum region and also due to presence of compression waves near the throat.Here, it is obvious that a time-step value of ∆t ≤ 0.7 t mct is required to attain converged results.Figure 12 (c) shows a similar trend in the calculation of C p for case e.I particularly near the vertex region, where the density increases drastically due to the presence of a shock wave.Therefore, an optimal time-step value of ∆t = 0.7 t mct was chosen.This value increases the simulation efficiency by a factor of 1.1 for case i.I and by a factor of 1.2 for case e.I compared to the recommended value, i.e., ∆t = 0.25 t mct .

Gas-surface interaction
For the internal flow cases, the simulations are performed using various gas-surface interaction models described in Section 4.3.The surface collisions are treated as either fully specular, fully diffusive or a combination of both.Figure 13 (a) and (b) show the effect of these surface collision models on the centerline densities for cases i.I and i.III, respectively.In both cases the simulation with a completely diffusive gas-surface interaction matches well with the experimental results.The simulations performed with a specular gas-surface interaction model yield a faster expansion of the flows compared with the experiment and the simulation relying on the diffuse surface interaction.For simulations with the surface interactions modeled by the combination of 50 % diffusive and 50 % specular collisions, the centerline density curve lies in between the completely diffusive and completely specu-lar simulations.As the proportion of specular collisions increases, the curve shifts toward the completely specular simulation curve and vice-versa.The effect of the gas-surface interactions is also studied regarding radial variations of the density at three different cross-sections of the nozzle.Figure 14 shows these distributions at the non-dimensional distances of x/R t = 3.7 and x/R t = 6.2 from the throat for cases i.I and i.III, respectively.Here, the densities are normalized by the corresponding density value at the axis ρ c .The gray shaded regions in the plots represent the error margin in the experiments [45].Similar trends as observed for the centerline data are visible in the results, i.e., the simulations with the diffusive-surface-interaction model matches well with the experiment.For case i.I at the cross-section x/R t = 3.7 which is close to the nozzle throat, there is a slight deviation of the simulation results in comparison with the experimental density trend shown in Figure 14 (a).The density first increases until a radial distance of r/R w = 0.4 and then reduces with the distance toward the wall.This density hump is due to the presence of a weaker compression wave near the throat [21] which is also captured well by the continuum simulation.Furthermore, the density values do not coincide with the experiment near the nozzle wall.This deviation could be due to the collisional quenching effects of the electron beam technique used by Rothe [45] which reduces the quality of density measurements at higher pressure levels.As the flow of case i.I progresses in the downstream direction, the simulations match well with the experiments due to low pressure levels.Furthermore, for case i.III the quenching effects are reported to be negligible [45] which explains the good agreement of the simulation results with the experiments shown in Figure 14 (b) and (d).For the external flow cases shown in Figure 15, the gas-surface interaction models with fully diffusive and the interaction models consisting of fractions of specular collisions lead to a closer agreement with the experiments.Although the interaction models which are biased toward specular (e.g., 10 % diffusive and 90 % specular collisions) showed the best agreement with the experiments, the values of these specular to diffuse fractions can be case-specific and difficult to estimate.Therefore, it can be a safe option to assume the completely diffuse interaction model.Nevertheless, these two particular cases must the studied in more detail in the future.
Another important parameter in modeling the gas-surface interaction is the thermal accommodation coefficient.As described in Section 4.3, this parameter quantifies the energy exchange between the surface and the gas.For gas-surface interactions which are fully diffusive, this parameter has a negligible effect on the density.However, it has an influence on the temperature.Figure 16 (a) and (b) show the effect of the thermal accommodation coefficient on the rotational temperature at the cross-section x/R t = 13.7 for cases i.I and i.III, respectively.The thermal accommodation coefficient only mildly affects the centerline temperature.However, near the wall it has a significant influence.It can be seen that the fully diffusive gas-surface interaction model with an accommodation coefficient of 0 % (adiabatic wall) is in close agreement with the experiment.As the value of the thermal accommodation coefficient increases, the temperatures near the wall increase and diverge from experiments.Although not shown, the effect of grid refinement near the surface on the results (Figure 13 -16) was also studied.The grid near the wall is refined in the range of ∆x = λ min to 1  3 λ min and within this limit the grid refinement has negligible effect on the results.

Collision and energy exchange models
More sensitivity studies have been conducted considering the choice of the molecular models (i.e., VHS and VSS) and the energy exchange models (constant and variable relaxation).According to the evaluation in Section 6.3.3, the fully diffusive gas-surface interaction model with 0 % thermal accommodation is used in this study for the internal flow cases.The Maxwellian gas-surface interaction model with 10 % diffusive and 90 % specular is used for external flow cases.Figure 17 shows the comparison of the various parameters obtained using different molecular and energy exchange models with the experimental data for cases i.III and e.I.It can be seen that the results predicted by these different models lie with in the experimental error estimates.However, in order to rank the best DSMC simulation configuration for internal flows, the accuracy is quantified on the basis of the error estimation given by: where ψ can be any flow quantity such as the rotational temperature.Again, the collision and energy exchange models have a negligible effect on the density.
The simulation configuration with the variable hard-sphere model with a variable rotational relaxation (obtained using parameters from [51]) yield the best result for internal flow cases.For external flow cases the best result is achieved with the variable soft-sphere model with variable relaxation (obtained using parameters from [51]).However, in terms of performance the variable hard-sphere model with a constant rotational collision number (i.e., ϕ rot = 0.2) is the best choice for both internal and external cases as the simulation speed is roughly 1.5 times higher than the latter.More details on the simulation performance are discussed in Section 6.4.

DSMC sampling
As mentioned in Section 4.2 the macroscopic flow properties shown above are obtained in the DSMC method by averaging the microscopic properties per grid cell.However, DSMC being a probabilistic method is very much prone to statistical fluctuations.The statistical noise in the flow field is typically filtered out by time averaging of the cell properties to obtain mean macroscopic properties.This procedure is performed after a steady state of the flow is established.In Figure 9 and Figure 10, it was evaluated that a sufficient number of particles (defined in terms of fnum) is required to obtain reliable macroscopic values.Furthermore, it can be observed from these figures that using less particles than the trade-off values also induces statistical noise.In this section, the statistical error associated with DSMC according to the effect of the number of particles per cell N c and the number of time steps N T used for averaging is studied.Here, the root-mean-square (rms) error is used as an indicator to quantify the statistical fluctuations.The relative root-mean-square error ϵ ψ in dimensionless form is given by: Again, ψ can be any macroscopic property where ψ max is its maximum value in the flow field and N cell is the total number of grid cells in the simulation domain.ψ i and ψ ref,i represent the computed and the reference solutions in the grid cell i.A sample size S is defined as the product of N c and N T through which a macroscopic property is sampled.In the study, it was observed that the translational temperature had a higher sensitivity to the statistical fluctuations and hence it is chosen to describe the results.The rms error ϵ T based on the translational temperature is determined for different values of S obtained by varying N c and N T , see Figure 18.The reference solution used to estimate the rms error has a sample size of S = 2700 k (N c = 270, N T = 10, 000) for the internal flow and S = 1800 k (N c = 180, N T = 10, 000) for the external flow case.Peak temperatures of 300 K and 4100 K were used for normalizing the rms errors for internal and external cases, respectively.Figure 18 shows that increasing N c has a major effect in reducing the statistical errors.For a fixed value of N c the rms error reduces until it reaches an asymptotic value.This trend was also observed in the study by Chen and Boyd [39].The computational cost for a particular sample size S is observed to be constant for small to large values of N c .Therefore, it is advantageous to use greater values of N c for a specific S leading to the smallest rms error at given CPU-time resources.

Performance study
The last part deals with the performance and computational costs of the DSMC simulations running on the high-performance computer HSUper at the Helmut-Schmidt University.HSUper consists of 581 nodes in total, 571 compute nodes with 256 GB RAM per node, 5 nodes with 1 TB RAM per node, 5 nodes with 2 NVIDIA A100 GPUs.Each node contains two sockets equipped with Intel Xeon Platinum 8360Y (36 cores, 2.4 GHz) processors, hence 72 physical cores or 144 virtual CPUs via hyperthreading.The memory is provided by 16 × 16 GB DDR4 RDIMM 3200 MHz ECC-registered modules.HSUper utilizes InfiniBand HDR, 100 Gb/s, non-blocking fat tree networking.The test cases with the optimized simulation setup (i.III and e.I) were used to study the performance.
The energy consumption of the CPU and DRAM reported in the following are based on the running average power limit [59] as well as the CPU time of the job reported by the SLURM workload manager.In order to keep the result comparable for internal and external flow cases with different configurations, the run-time is normalized in all cases to 1 core/node.SPARTA uses a hierarchical Cartesian grid, which is not a body-fitted grid.Since a big portion of the grid points is therefore outside the computational domain and without simulation particles, the usage of load balancing methods is necessary to optimize computational efforts.
Figure 19 compares the strong scaling results on 1 core up to 144 cores equivalent to 2 nodes on HSUper using static and dynamic load balancing for the internal and external cases.The dynamic load balancing had a huge impact on the speedup of the simulation.The effect of the dynamic load balancing is stronger for the internal case than for the external flow because the external case is a hypersonic flow and due to the very high flow velocity the computational domain gets saturated very quickly with sufficient simulation particles.Thus, the static load balancing at the beginning of the simulation already leads to good results contrary to the internal case where the flow velocity is much smaller than in the external case and the simulation needs therefore a much longer time to reach the steady-state solution.In Figure 19 (c) and (d), a similar behavior is also observed in the energy consumption results, which is strongly related to the run-time.
Weak scaling is achieved by varying the parameter fnum, e.g., the number of particles is increased by a factor of three when running on three cores, compared to the single-core case.The results also shown in Figure 19 are in full agreement with the argumentation given for the strong scaling results.Note that the fnum parameter decreases correspondingly as the number of cores increases, e.g., the fnum value for 1 core is decreased by a factor of 3 for 3 cores.So the number of simulation particles increases for 3 cores and the number of simulation particles per core remains the same as for 1 core.The fnum parameter is set for 9, 18, 36 and 72 cores, correspondingly.Note that in Figure 19 (d) for the internal test case, the weak scaling configuration starts from a different number of particles on 1 core than the strong scaling configuration (fnum = 5e15) in Figure 19 (c).
Figure 20 demonstrates the strong scaling results on 1 to 16 nodes (i.e. up to 1152 cores).It has to be mentioned that all 72 cores of each HSUper node are occupied for the simulation.Unlike in Figure 19 (c) and (d), the energy consumption on more than one node increases constantly in Figure 20 (c) and (d).By using a few processors of a node, the other processors of the node are in the idle mode and still consume energy.Therefore, using more cores of one node which leads to a smaller run-time may require less energy, because the reduction of the computation time may compensate the effect of using more cores.Figure 20 (c) shows that this is not the case when using more than one node.Considering simulations using 1 or 2 nodes, if the speedup of the simulation from 1 to 2 nodes would be exactly 2, then the energy consumption would be the same, since the speedup of the simulation would fully compensate the additional energy consumption of more nodes.But as can be seen in Figure 20 (a), the speedup of the simulation from 1 to 2 nodes is less than two.So the computational time reduction can not compensate the energy consumption of the additional node and therefore the total energy consumption increases (by a factor of 2).It can be summarized that since an entire node is always allocated exclusively, the faster the simulation is finalized on it, the less energy is used and this rationalizes the decrease in energy consumption up to 72 cores.For more cores the energy consumption increases, because more nodes are allocated.Based on this explanation, in the SPARTA cases it is necessary in terms of energy consumption to always fully use all cores of a node as long as runtime decreases are observed.The weak scaling results in Figure 20 are in full agreement with the argumentation given for the strong scaling.

Conclusions
The rarefied nitrogen gas flow through a convergent-divergent nozzle (internal flow) and over a conical body (external flow) were numerically investigated using a continuum-based Navier-Stokes solver and a stochastic Boltzmann solver (DSMC) for different Knudsen-and Reynolds-number regimes.The numerical results were validated against the available experimental data.In the higher Knudsen number range (Kn > 0.1), the DSMC method demonstrated superior accuracy in predicting state properties such as density, pressure, and rotational temperature.However, in the continuum regime (Kn < 0.1) the Navier-Stokes solver exhibited higher accuracy and computational efficiency than the DSMC method.Consequently, for internal-flow cases encompassing variable Knudsen number regimes, a one-way-coupled hybrid approach combining the Navier-Stokes and the DSMC solver was preferred to accurately resolve the flow in the entire domain while mitigating computational costs.
The sensitivity study of the DSMC simulations unveiled several essential insights, particularly regarding parameters such as the ratio of real molecules to simulation particles fnum, the grid size, the time step, the sampling and the surface and molecular collision models: • Threshold values for fnum or the number of particles to approach optimal simulation accuracy were investigated and are reported for both internal and external flow cases in 2D and 3D configurations.
• Our study revealed that varying grid sizes ∆x of the DSMC domain in the range of the minimum mean free path λ min have negligible effects on the results.
• We found that the value of the time step ∆t must definitely be a fraction of the mean collision time t mct .For resolving near-continuum flow regions in DSMC domains, ∆t values of 0.7 t mct or smaller are required.However, for highly rarefied regions the time-step size can be relaxed to ∆t = t mct .
• Gas-surface interactions play a major role in obtaining accurate results.For all cases the results obtained with the fully diffusive interaction model (isotropic scattering) showed good agreement with the experiments.Furthermore, the adiabatic nature of the wall in the internal-flow scenarios necessitates the utilization of a diffusive model with incomplete thermal accommodation, accomplished through the CLL model.
• The sensitivity study on molecular and energy exchange models did not yield major significance on the results and all predictions lie within the experimental margins.However, from the performance point of view, the VHS collision model with constant relaxation is found to have a higher computational speed for all cases.
• The statistical error analysis of the DSMC method showed that the translational temperature had the highest sensitivity to statistical fluctuations.The study favored the use of more particles per cell N c compared to the number of sampling time steps N T for a specific sample size S = N c ×N T .This approach significantly reduces statistical errors while maintaining the same computational costs for the sample size.
In addition, performance studies concerning both strong and weak scaling were conducted to analyze the computational speedup and the energy consumption of DSMC simulations.This investigation revealed that the dynamic load balancing feature of SPARTA provided the most efficient solution.The iterative study presented here exploits an optimal parametric setup for hybrid DSMC/CFD simulations without compromising the computational efficiency.
This benchmarked hybrid methodology holds huge potential for simulating molecular beam experiments, particularly those involving gas flows spanning a wide Knudsen number range.At Deutsches Elektronen-Synchrotron (DESY), molecular beams are generated using an aerosol injector or a cryogenic buffer gas cell for nanoparticle imaging experiments [60,61].The integration of the hybrid CFD/DSMC methodology into the simulation framework currently underway at DESY aims to accurately simulate such molecular beam experiments.Furthermore, this simulation framework can facilitate the study of gas-particle interactions within the experiments and contribute to optimizing experimental designs.

Figure 1 :
Figure 1: Computational domain of the low density nozzle flow (cases i.I to i.III).

Figure 2 :
Figure 2: Computational domains for the external flow test cases.(a) Half body cone with blunted nose, 2D setup.(b) cone with sharp nose, 2D setup.

Figure 3 :
Figure 3: Schematic diagrams of the (a) Maxwell model and (b) CLL models.

Figure 8 :
Figure 8: Comparison of the pressure coefficient Cp over the surface of the body for the cases (a) e.I and (b) e.II.

Figure 11 :
Figure 11: Effect of grid resolution on (a) centerline rotational temperatures for case i.II and (b) pressure coefficient Cp for case e.I.

Figure 12 :
Figure 12: Effect of time step on (a) centerline rotational temperatures for case i.II, (b) radial density variation at cross-section x/Rt = 3.7 for case i.I and (c) pressure coefficient Cp for case e.I.

Figure 13 :
Figure 13: Effect of gas-surface interactions on densities along the nozzle axis for the cases (a) i.I and (b) i.III.

Figure 14 :
Figure 14: Effect of gas-surface interactions on densities at different cross-sections for case i.I (left column) and case i.III (right column).

Figure 15 :
Figure 15: Effect of gas-surface interactions on Cp for the cases (a) e.I. and (b) e.II.

Figure 16 :
Figure 16: Effect of thermal accommodation coefficient on the rotational temperature at the cross-section x/Rw = 13.7:(a) case i.I and (b) case i.III.

Figure 17 :
Figure 17: (a) Centerline rotational temperature profiles predicted with different collision models for case i.III.(b) Comparison of Cp distribution for various collision models for case e.I.

Figure 18 :
Figure 18: RMS error ϵT based on the translational temperature for (a) case i.III and (b) case e.I.

Figure 19 :
Figure 19: Scaling results on up to 144 cores.Left column: strong scaling.Right column: weak scaling.(a) & (b) normalized run-time.(c) & (d) energy consumption for one time step.

Figure C. 22 :
Figure C.22: Structured O-grid of the nozzle geometry.

Figure C. 23 :
Figure C.23: Structured grid (clipped) of a typical conical body viewed from different directions.

Table 1 :
Overview of related work on DSMC/CFD simulations and classification of the present study.Here Kn refers to studies related to the continuum breakdown using global and/or local Knudsen numbers ( a Collision Model, b Gas-Surface-Interaction, c Energy Consumption).

Table 2 :
Flow-condition parameters of the low-density-nozzle cases i.I to i.III.
. The axisymmetric boundary condition was imposed on the line a-d.

Table 3 :
Flow conditions for external test cases.

Table 4 :
Boundary conditions for the low-density-nozzle flow (internal flow).

Table 5 :
while maintaining the overall phase space Boundary conditions for the flow around a conical body (external flow).