FleCSPH: The Next Generation FleCSIble Parallel Computational Infrastructure for Smoothed Particle Hydrodynamics

FleCSPH is a smoothed particle hydrodynamics simulation tool, based on the compile-time configurable framework FleCSI. The asynchronous distributed tree topology combined with a fast multipole method allows FleCSPH to efficiently compute hydrodynamics and long range particle-particle interactions. FleCSPH provides initial data generators, particle relaxation techniques, and standard evolution drivers, which can be easily modified and extended to user-specific setups. Data input/output uses the H5part format, compatible with modern visualization software.


ture (FleCSI) [1] developed at Los Alamos National Laboratory (LANL).
FleCSI is a task-based runtime abstraction layer that provides a seamless programming model for distributed-memory tasks (Legion [2]), and finegrained data-parallel kernels (Kokkos [3]), with several core topology types that can be statically specialized to support a variety of applied methods. When using FleCSI, FleCSPH has the potential to separate the application implementation from the details of machine architecture, although currently only MPI back end is implemented.
SPH is an explicit mesh-free Lagrangian method that solves the partial differential equations of hydrodynamics by discretizing the flow with a set of fluid elements called particles [4,5]. The main SPH formula to interpolate some quantity A( r), specified by its values over a set of particles A b ≡ A( r b ), is given by where W is a smoothing kernel, h is the smoothing length (hydro interaction range) at a position r, and V b is a volume element, usually V b = m b /ρ b . SPH has several advantages including handling complex geometries and support for true vacuum conditions. Conservation of mass is included by construction, and conservation of linear momentum, angular momentum, and energy can be implemented up to machine precision. Since the quantities are stored in the moving particles, SPH has the advantage of exact and automatic advection. Furthermore, the same tree structure used for determining particle neighbors can be employed for computing gravitational forces. SPH particles can carry the stress history of the material to determine damage, model fracture, and fragmentation of solids [6]. Complicated processes in reacting flows are easily incorporated into an SPH model [7]. In this paper, we outline the main features of FleCSPH. The initial version of this software is described in [8], while subsequent extensions are presented here.

Software description
FleCSPH is written in C++ for UNIX computing and supercomputing platforms taking advantage of modern features of C++ and the Standard Template Library (STL) [9]. CMake [10] provides the build system and FleCSPH can be integrated with Spack [11] to construct build-and run-time environments. Particle data is output in the H5part format [12], compatible with modern visualization software, such as SPLASH [13], Paraview [14] or VisIt [15]. Simulation checkpoint and restart is supported.

Software Architecture
Drivers and initial data generators. The main set of user "apps" are drivers, which implement evolution equations, and initial data generators, which produce initial particle configurations. Two drivers are provided to evolve hydrodynamics with and without gravity. Users can also iterate on the set of existing drivers for more advanced physics models with different numerical evolution schemes and create custom particle system generators. A suite of initial data generators is provided including: five standard Sod shock tubes, Sedov blast wave, Noh implosion, as well as Rayleigh-Taylor and Kelvin-Helmholtz instability.

Physics
Functionalities. An SPH "particle" is implemented as a class body u, templated on the problem dimension and basic floating-point type. 'body' refers to a single body in N-body system and ' u' stands for 'unspecialized.' Elements of this class consist of various particle physical properties, e.g., density and velocity, with standard mutators and accessors to allow customized access control. FleCSPH offers a selection of kernels, two SPH formulations, several viscosity prescriptions, particle relaxation mechanism, external conservative forces, and multiple equations of state (EOSs). The list of options for physics functionalities is described in the developer's notes on the project wiki page. These functionalities are sorted into different C++ header files.
Parameter files. The choice of physical and numerical methods is made at runtime based on the options supplied by an ASCII parameter file with a key-value syntax. The parameter file specifies options such as the number of particles, SPH kernels, and boundary conditions. The complete list of options is located on the wiki page. FleCSPH parameter files are concise and human-readable records of simulation conditions, allowing for simple reproducibility.
Tree topology. FleCSPH uses a hashed tree [16,17], also known as a binary tree, quadtree, or octree in 1, 2, or 3 dimensions, respectively. Space-filling curves are used for the domain decomposition and the hash-table construction. This allows for finding the parent or children of a node in O(1) on average. Both Morton [18] (Z-order) and Hilbert space filling curves are implemented, which show faster computation of keys or better particle locality, respectively. Figure 1 diagrams the tree topology. The central data structure of the tree is a hash table. It stores cells, denoted hcells, which can represent both a particle and a node. An hcell is identified by a binary-string key determined from the space-filling curve and the type of cell they represent:  for particles, the key is computed directly from their coordinates; for nodes, it is computed based on the position of the node center of mass (CofM). To resolve any key conflicts, each particle is assigned a unique ID. The hashing function to distribute the hcell takes the N last bits of the keys, offering a perfect distribution at the bottom and contiguous storage at the top of the tree, which is accessed more often. This representation allows for the parent or children of a branch to be easily located by adding or removing a digit in their key and accessing the hash table, an order O(1) on average operation. Creating the distributed tree requires several steps. First, the particles are added individually to the tree, and the necessary nodes are refined until the particles are isolated each in their own node. The node is refined into 2 D sub-entities, where D is the dimensionality. After the local trees have been created, information about the top part is exchanged across the MPI ranks. Using a hypercube communication pattern, all the ranks share the useful data resulting in a tree where each unknown node belongs to only one other MPI rank.
FleCSPH implements hydrodynamics with SPH, and gravity with the fast multipole method (FMM). While SPH only computes short-range forces, gravity is a long-range interaction, requiring global communication. Both cases of short-and long-range forces are handled by the same particle tree, but two different types of tree traversals are used: SPH and FMM.
The tree traversal is performed on a group of particles in the same node using the CofM boundary information to prune empty areas of the tree. A list of neighbors per particle is built, and the physics functions are called during the traversal. When encountering a non-local node, the ranks use asynchronous MPI to request missing information from the owner, while iv continuing the traversal on other particles. When the data arrives, it is added to the tree and used to complete the neighbors list.

Software Functionalities 2.2.1. SPH Formulation in FleCSPH
FleCSPH numerically solves Euler equations of ideal fluid in their Lagrangian formulation, expressing conservation of mass, momentum and energy: where ρ is density, d/dt = ∂ t + v · ∇ is convective derivative, v is fluid velocity, u is specific thermal energy, P is pressure, and g is an external acceleration. The gravitational acceleration is determined by the fluid selfgravity, an external gravitational field, or both.
For SPH discretization, we use one of the simplest formulations [19]. The Euler equations are discretized with the volume element V b = m b /ρ b , and an artificial viscosity term Π ab is added: where W ab = W (| r a − r b |, h ab ) and h ab = (h a + h b )/2. The density is not evolved but reconstructed from the particle positions: For the viscous stress tensor Π ab , we follow the standard prescription [20,19]. Alternatively, FleCSPH features an implementation of the so-called thermokinetic formulation [19], in which the total particle energy is evolved: Corresponding discretized version of the energy equation reads, v An adaptive smoothing length h a allows placing particles where more resolution is needed [21,22,19]. The smoothing length is adapted according to the expression: where D is dimensionality, C is a kernel-dependent normalization constant, and η is a user-specified number of neighbors [19]. In this method, the number of neighbors for hydrodynamic interactions remains approximately constant for all particles and times.

Computing Gravitational Force via FMM
FleCSPH uses the FMM approximation [23] to treat gravitational interactions, following [24], up to the first order in the Taylor expansion. Without approximation, pairwise N-body interactions have O(N 2 ) computational complexity. FMM replaces gravitational forces between individual particles in distant nodes with symmetrized node-node interactions [24]. Nodes are accepted for node-node interaction, if they satisfy the so-called "Minimal Acceptance Criterion" (MAC, see Figure 2).
Attraction of a distant node is approximated by a series of multipoles. Because of the symmetry in nodal interactions, FMM conserves linear momentum. In the implemented first order of the multipole expansion, angular momentum is conserved exactly. The gravitational acceleration g a of a particle a is computed as the sum of attractions from all cells passed during the tree traversal while satisfying vi the MAC angle with respect to the particle: where G is the Newton's gravitational constant, A is the cell containing particle a, a ∈ A, and B is the other cell, a ∈ B. R A and R B are the centers of mass for A and B, respectively. R AB = R A − R B is the distance vector, and M B is the mass of B.
When θ MAC = 0, eq. (10) reduces to the exact expression for Newtonian interactions between individual particles.

Illustrative Examples
In this section, we provide several test cases to demonstrate validation and functionalities. Detailed instructions for all test cases can be found in the wiki page.

Basic Hydrodynamics Problems: Sod Shock Tube Test
The Sod shock tube is a standard test with a classical Riemann problem with the following initial parameters: This leads to the development of a shock front, which propagates from highdensity into low-density regions, and followed by a contact discontinuity. A density rarefaction wave propagates into the high-density region. Figure 3 shows a 1D Sod shock tube with 10, 000 particles. Density(ρ), pressure(P ), specific internal energy(U ), and velocity(v r ) are plotted at four different times. The dashed black lines are analytic solutions. Despite small deviations appearing near shock and contact discontinuities, our results agree well with the analytic solution.

Testing Gravity: Stellar Oscillations
As a demonstration of the numerical methods for self-gravitating fluids, we present evolution of a stable isolated star in equilibrium. Truncation error in the initial configuration triggers small oscillations of the star, most notably its fundamental radial mode and the first few overtones [25]. These oscillations are damped by the viscosity during the evolution. This test checks consistency and conservation properties for the coupled hydrodynamics and gravity. We compare conservation of energy, momentum, and angular momentum, computed using the FMM approximation, and using the exact pairwise N-body particle interactions with O(N 2 ) complexity. For the initial data, we solve Lane-Emden equation [26] for polytrope with Γ = 5/3, K = 10 12 (in CGS units), and central density ρ c = 5.2 × 10 6 g cm −3 . This results in a polytrope resembling a white dwarf, with mass 0.2 M and radius 4790 km. and period of adiabatic oscillations in the fundamental mode for this kind of polytrope is 7.75 s (see e.g. [25], Table 8.1). Figure 4 shows a star discretized with 14,993 equal-mass particles evolved using the thermokinetic formulation. It demonstrates perfect conservation of linear and angular momenta to machine precision. The top right panel shows the relaxed particle configuration. The top left panel displays the evolution of the gravitational energy (normalized to its initial value), for the exact N-body gravity with pairwise interactions, and FMM implementation with different values of tan θ MAC . Gravitational energy oscillates with the fundamental frequency. Evolution of the gravitational energy differs for the different values of MAC, but the results approach the N-body case when the MAC angle  decreases. Since the O(N 2 ) algorithm computes gravitational forces exactly, this provides validation of the FMM implementation. The FMM algorithm could be extended to higher orders in Taylor expansion [24,27] to provide better agreement with the exact N-body scheme. However, conservation of angular momentum breaks down for higher orders, and special techniques are needed to recover it [28].

Integrate Example: White Dwarf Binary
A prominent application for SPH is stellar mergers [29,30,31,32]. Here, we present a binary white dwarf(WD) simulation. To set up the system spherical particle distributions are generated for the individual WDs, similar to the setup in Section 3.2. The configurations are then placed on a Keplerian orbit.
ix Figure 5: The density of particles in a binary white dwarf merger. Figure 5 shows four frames of the merging double WD binary. The corotating system is composed of 21,295 particles using the zero temperature WD EOS [33] and has an initial period of 40.1s. In order to limit the simulation time for this example the particle number has been lowered and the orbit is chosen such that the stars immediately being transferring mass and merge within a single orbital period. In nature the stars would begin to slowly transfer mass at a wider orbit and this process would evolve over many orbital periods, but end a qualitatively similar state. Figure 6 demonstrates strong and weak scaling of FleCSPH on LANL supercomputing clusters Grizzly and Capulin. Grizzly is an 8-SU cluster running RHEL Linux v.7.7, it has dual socket 2.1 GHz 18 core Intel Broadwell E5 2695v4 processor with 45MB of cache and 128GB of RAM on each node. Capulin is an Advanced RISC Machine cluster (ARM) by Cray, Inc., with 56 cores in simultaneous multi-threading (SMT) mode with four threads per core and 256GB of RAM per node. Scaling runs on Capulin were capped at 32 cores/node. The oscillating star setup with different numbers of particles was used for both tests (see Section 3.2). For jobs with number of ranks from 1 to 32, we used a single node, and for larger jobs -multiple nodes with 32 ranks per node. This manifested in a drop in strong scaling efficiency on Grizzly between 32 and 64 ranks is due to the inter-node data transfer. On Capulin, efficiency degrades due to increased context switching between SMT threads. For the weak scaling, efficiency drops between 1 and 2 ranks due to parallel overhead, but then remains relatively flat. Overall both SPH and FMM tree traversals show comparable scaling.

Scalability
Unlike the exact N-body with quadratic complexity, FMM algorithm has linear (or even sublinear) complexity [27]. For FleCSPH, this is shown in the bottom right panel of Fig. 6. The SPH tree traversal is expected to have O(N log N ) complexity, which is satisfied in FleCSPH for the highest number of particles.
The timing of FMM algorithm strongly depends on the MAC angle. Top left panel in Fig. 6 shows the timing of a single FMM tree traversal as a xi function of MAC. In the extremes, MAC=0 reduces to the exact N-body, while MAC=1 corresponds to nodes in contact, i.e., accepting any nodes that do not intersect. The speedup between the extremes for 38,000 particles is almost three orders of magnitude, which will be even higher for larger number of particles. This comes at a moderate price, because using higher MAC reduces code accuracy, as shown in Fig. 4.

Impact & Conclusion
FleCSPH is designed to be a performance portable particle hydrodynamics simulation tool, oriented to explore modern heterogeneous architectures and massive parallelism. It opens an easy avenue for researchers to write efficient applications that will perform at scale, for those areas that can benefit from meshfree methods. Its modular design allows users to extend the initial hydrodynamics + gravity suite with a variety of other multi-physics applications. In this work, we demonstrate the structure and capabilities of FleCSPH using several examples of standard hydrodynamic tests and a few astrophysical applications.
FleCSPH was designed from conception to use the compile-time configurable framework FleCSI, with its functional programming model for execution, control, and data abstractions that are consistent both with MPI and with task-based runtime systems such as Legion (distributed) or Kokkos (node-level). However, the current version of FleCSPH does not take advantage of all features provided in FleCSI: it is limited to only having the support for the MPI backend. Future development plans include incorporating more FleCSI functionalities into FleCSPH.
As its nearest goal, FleCSPH will address problems in astrophysics that involve highly irregular morphologies and are sensitive to conservation properties. Examples of such problems include mergers of binary white dwarfs or neutron stars, tidal disruptions of stars by black holes, fallback accretion, planetary impacts, and more. FleCSPH is currently being used by scientist at LANL to address these research topics and is also being used as an educational tool for participants in LANL's computational science and physics student programs.
With the recent advances in SPH techniques, new codes have been developed [34,35,36], bringing the fidelity needed to resolve some of these difficult research problems. FleCSPH will feature performance portability in an open-source environment, which will allow its users to study the big open questions in astrophysics and hydrodynamics at scale. Several natural avenues for future development include new multi-physics applications, such as Lagrangian magnetohydrodynamics [37], radiation transport in the fluxlimited diffusion approximation [38], and coupling to other methods such as Monte Carlo method [39].