Towards real-time fluid dynamics simulation: a data-driven NN-MPS method and its implementation

ABSTRACT In this work, we construct a data-driven model to address the computational performance problem of the moving particle semi-implicit method by combining the physical intuition of the method with a machine-learning algorithm. A fully connected artificial neural network is implemented to solve the pressure Poisson equation, which is reformulated as a regression problem. We design context-based feature vectors for each particle based on the Poisson equation. The neural network maintains the accuracy and stability of the original particle method while drastically accelerating the pressure calculation. It is well suited to GPU parallelization, edge computing scenarios and real-time simulation.


Introduction
The moving particle semi-implicit (MPS) method, initially proposed by Koshizuka and Oka [1], was developed to solve the Navier-Stokes equations in the Lagrangian framework for incompressible viscous flows. It is essentially a fractional step method that splits each time step into two stages, a prediction and a correction. The MPS method improves the accuracy and stability of the pressure calculation compared to many other particle methods that calculate pressure explicitly, such as smoothed particle hydrodynamics (SPH) [2]. Given the stability of the pressure calculation and the convenience of dealing with free-surface boundary conditions, MPS is widely implemented to address problems such as haemodynamics simulations [3], large-scale simulations of geographical phenomena [4] and the analysis of material processing [5]. However, accuracy comes at the cost of computational resources and time. In the MPS scheme, the pressure of the next frame is derived by solving the pressure Poisson equation. The preconditioned conjugate gradient (PCG) method or the incomplete Cholesky conjugate gradient (ICCG) method is usually used to solve the discrete pressure Poisson equation, and this requires enormous computational resources when the number of particles is large. This is also discussed in [6]. Generally speaking, these iterative methods are not suitable for GPU parallelization. Hence, large-scale simulations using MPS are difficult without high-performance CPU clusters, which prohibits MPS from being widely employed as a desktop tool for engineers. A modified method, the explicit MPS method [7], has been proposed to make the pressure solving explicit.

Governing equations
The governing equations are the continuity equation and the Navier-Stokes equations for incompressible viscous flows:

$$ \frac{D\rho}{Dt} = 0, $$

$$ \frac{D\mathbf{u}}{Dt} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{g}, $$

where D/Dt is the material derivative in time, u is the velocity, p is the pressure, g is the external body force per unit mass, and ρ and ν are the density and kinematic viscosity, respectively.

Kernel function and particle number density
The kernel function describes how a particle interacts with others in its vicinity. In this work, we adopt the same kernel function as the original MPS method, see [1]. The radius of the interaction area is determined by the parameter r_e:

$$ w(r) = \begin{cases} \dfrac{r_e}{r} - 1, & 0 \le r < r_e, \\[4pt] 0, & r_e \le r. \end{cases} $$

The particle number density at coordinate r_i, where particle i is located, is defined by

$$ N_i = \sum_{j \ne i} w\big(|\mathbf{r}_j - \mathbf{r}_i|\big). $$

If the incompressibility condition is satisfied, the particle number density should equal a constant N^0.
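As a concrete illustration, a minimal NumPy sketch of the kernel and the particle number density defined above (function names are ours, not from the paper):

```python
import numpy as np

def kernel(r, r_e):
    """MPS kernel: w(r) = r_e / r - 1 for 0 < r < r_e, and 0 otherwise."""
    r = np.asarray(r, dtype=float)
    w = np.zeros_like(r)
    mask = (r > 0) & (r < r_e)
    w[mask] = r_e / r[mask] - 1.0
    return w

def number_density(pos, i, r_e):
    """Particle number density N_i = sum over j != i of w(|r_j - r_i|).
    The self term has r = 0 and is excluded by the kernel's mask."""
    dist = np.linalg.norm(pos - pos[i], axis=1)
    return kernel(dist, r_e).sum()
```

Note that w(r) diverges as r → 0, which penalizes particle clustering; the incompressibility condition then amounts to keeping each N_i near N^0.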

Discrete differential operators
A gradient vector between two particles i and j having scalar quantities φ_i and φ_j at coordinates r_i and r_j is defined [1] as

$$ \langle \nabla \phi \rangle_i = \frac{d}{N^0} \sum_{j \ne i} \frac{\phi_j - \min(\phi_i)}{|\mathbf{r}_j - \mathbf{r}_i|^2} \big(\mathbf{r}_j - \mathbf{r}_i\big)\, w\big(|\mathbf{r}_j - \mathbf{r}_i|\big), $$

where d is the dimension of the space and min(φ_i) denotes the minimum value of the scalar quantity φ among the neighbouring particles of particle i.

Similarly, a divergence between two particles i and j having vector quantities φ_i and φ_j at coordinates r_i and r_j can be defined as

$$ \langle \nabla \cdot \boldsymbol{\varphi} \rangle_i = \frac{d}{N^0} \sum_{j \ne i} \frac{(\boldsymbol{\varphi}_j - \boldsymbol{\varphi}_i) \cdot (\mathbf{r}_j - \mathbf{r}_i)}{|\mathbf{r}_j - \mathbf{r}_i|^2}\, w\big(|\mathbf{r}_j - \mathbf{r}_i|\big). $$

The Laplacian model is derived from solving a diffusion equation, i.e.

$$ \langle \nabla^2 \phi \rangle_i = \frac{2d}{\lambda N^0} \sum_{j \ne i} \big(\phi_j - \phi_i\big)\, w\big(|\mathbf{r}_j - \mathbf{r}_i|\big), $$

where the normalization parameter λ is given by

$$ \lambda = \frac{\sum_{j \in V} |\mathbf{r}_j - \mathbf{r}_i|^2\, w\big(|\mathbf{r}_j - \mathbf{r}_i|\big)}{\sum_{j \in V} w\big(|\mathbf{r}_j - \mathbf{r}_i|\big)}, $$

where V denotes the neighbourhood of particle i.
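A minimal sketch of the Laplacian model with its λ normalization, assuming the kernel of Section 2 (the function name and signature are ours):

```python
import numpy as np

def mps_laplacian(phi, pos, i, r_e, n0, d=2):
    """Discrete Laplacian model: <lap phi>_i = 2d/(lambda*N0) *
    sum_j (phi_j - phi_i) * w_ij, with the normalization
    lambda = sum_j |r_j - r_i|^2 w_ij / sum_j w_ij."""
    r = np.linalg.norm(pos - pos[i], axis=1)
    w = np.zeros_like(r)
    mask = (r > 0) & (r < r_e)
    w[mask] = r_e / r[mask] - 1.0
    lam = (r ** 2 * w).sum() / w.sum()
    return 2.0 * d / (lam * n0) * ((phi - phi[i]) * w).sum()
```

On a symmetric particle arrangement this operator annihilates constant and linear fields and reproduces the exact value 2 for φ = x², consistent with the continuum Laplacian.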

Boundary condition and searching neighbour particles
In the MPS method, dynamic boundary conditions are imposed to deal with the free surface. The particle number density is smaller for particles in the vicinity of the free surface because their neighbourhoods include empty air regions. Particles satisfying the following criterion are considered to be on the free surface:

$$ N^*_i < \beta N^0, $$

where β in this work is selected to be β = 0.97. In the pressure calculation, the dynamic free-surface condition is satisfied by setting the pressure of free-surface particles to zero, which equals the atmospheric pressure.
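The criterion and the pressure condition above can be sketched in a few lines (a schematic helper of our own, not the paper's code):

```python
import numpy as np

def apply_surface_condition(N_star, p, n0, beta=0.97):
    """Flag particles with N*_i < beta * N0 as free-surface particles and
    impose the dynamic condition p = 0 (atmospheric pressure) on them."""
    surface = N_star < beta * n0
    p = p.copy()
    p[surface] = 0.0
    return surface, p
```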
For the wall boundary condition, as shown in Figure 1, two layers of wall particles are placed along the solid boundary, surrounded by two layers of ghost particles. For the pressure calculation, the wall particles are used; they interact with nearby fluid particles while their positions remain stationary. The ghost particles are only taken into account when calculating the particle number density, to avoid wall particles being misidentified as free-surface particles.
Many steps of the MPS method require searching for neighbour particles. Three searching techniques are commonly used in particle methods: linear scanning, linked-list searching [7], and tree searching [19]. In this work, we use the linked-list searching technique. Linked-list searching places a temporary grid over the calculation region, and particles are assigned to different cells. Only particles in the same or adjacent cells are considered as candidate neighbours.
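A compact sketch of the linked-list (cell-list) search, assuming the cell side is at least r_e so only the 3×3 (or 3×3×3) block of cells needs scanning; helper names are ours:

```python
import numpy as np
from collections import defaultdict

def build_cell_list(pos, cell):
    """Assign each particle to a grid cell of side `cell` (the temporary grid)."""
    table = defaultdict(list)
    for idx, p in enumerate(pos):
        table[tuple(np.floor(p / cell).astype(int))].append(idx)
    return table

def find_neighbours(pos, i, r_e, table, cell):
    """Scan only the particle's own cell and the adjacent cells (cell >= r_e)."""
    ci = np.floor(pos[i] / cell).astype(int)
    found = []
    for off in np.ndindex(*(3,) * pos.shape[1]):   # all adjacent cell offsets
        key = tuple(ci + np.array(off) - 1)
        for j in table.get(key, []):
            if j != i and np.linalg.norm(pos[j] - pos[i]) < r_e:
                found.append(j)
    return sorted(found)
```

Compared with linear scanning, this reduces the average cost per particle from O(n) to roughly O(1) for uniform particle distributions.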

Velocity update scheme
The MPS method is a predictor-corrector method for the velocity, which includes three main steps [20].

Firstly, it advances particles using the viscosity and body forces, and then applies collision detection.

Secondly, it calculates the source term and the coefficient matrix using the temporary location r*_i and the temporary velocity u*_i, and solves the pressure Poisson equation. We use the modified source term from Lee et al. [20],

$$ \nabla^2 p^{n+1}_i = -(1-\gamma)\,\frac{\rho}{\Delta t}\,\nabla \cdot \mathbf{u}^*_i \;-\; \gamma\,\frac{\rho}{\Delta t^2}\,\frac{N^*_i - N^0}{N^0}, $$

where u*_i denotes the temporary velocity of particle i, N*_i is the temporary particle number density of particle i, and γ is a blending parameter ranging from 0.0 to 1.0. Based on our numerical tests, we set γ = 0.2.

Finally, it uses the predicted pressure p^{n+1}_i to update the temporary velocity u*_i and the temporary location r*_i, i.e.

$$ \mathbf{u}^{n+1}_i = \mathbf{u}^*_i - \frac{\Delta t}{\rho}\,\langle \nabla p^{n+1} \rangle_i, \qquad \mathbf{r}^{n+1}_i = \mathbf{r}^*_i + \Delta t\,\big(\mathbf{u}^{n+1}_i - \mathbf{u}^*_i\big). $$
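The three steps above can be sketched as one schematic time step. The callables `visc_accel`, `solve_pressure` and `pressure_accel` are hypothetical stand-ins for the discrete operators; they are not the paper's API:

```python
import numpy as np

def mps_step(pos, vel, dt, g, visc_accel, solve_pressure, pressure_accel):
    """One predictor-corrector MPS step (schematic)."""
    # 1) prediction: advance with viscosity and body forces only
    vel_star = vel + dt * (g + visc_accel(pos, vel))
    pos_star = pos + dt * vel_star
    # 2) implicit step: solve the pressure Poisson equation at (pos*, vel*)
    p = solve_pressure(pos_star, vel_star)
    # 3) correction: project the velocity with the pressure gradient
    vel_new = vel_star - dt * pressure_accel(pos_star, p)
    pos_new = pos_star + dt * (vel_new - vel_star)
    return pos_new, vel_new, p
```

Step 2 is the expensive global solve that the data-driven model of the next section replaces.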
A data-driven pressure model

Modelling of incompressibility using a data-driven pressure projection
In the MPS method, the calculation of pressure uses the continuity equation, and the method keeps the particle number density N* constant at N^0. The discrete Poisson equation of pressure for particle i is

$$ \frac{2d}{\lambda N^0} \sum_{j \ne i} \big(p^{n+1}_j - p^{n+1}_i\big)\, w_{ij} = b_i, $$

where w_{ij} = w(|r_j − r_i|) and b_i is the source term defined above. Since it is based on the interaction between the particle and its neighbour particles, it has to be solved globally (simultaneously for all particles), which brings challenges to parallelization. Also, for large-scale problems, solving large sparse systems can be expensive. To address this problem, we design a data-driven model that carries out the pressure calculation efficiently while maintaining robustness and stability. We reformulate the pressure projection step as a data-driven regression problem. The machine learning model used in this work is the neural network. It relies on the initialization of weight parameters, and many of its mechanisms are still not well understood. Yet, its capability of extracting and transforming low-level features and of fitting nonlinear functions makes it an excellent choice to serve as a regressor.
To ensure the robustness and accuracy of the neural network, choosing an appropriate representation for the model's input is very important. In traditional data-driven modelling problems, techniques like principal component analysis (PCA) are implemented to derive representative features of the input data. Since the issue addressed here is physics-informed, the construction of the feature vector can refer to the original MPS method.
In the pressure Poisson equation, we have

$$ \frac{2d}{\lambda N^0} \sum_{j \ne i} \big(p^{n+1}_j - p^{n+1}_i\big)\, w_{ij} = b_i, $$

and it can be rearranged as

$$ p^{n+1}_i = \frac{1}{\sum_{j \ne i} w_{ij}} \Big( \sum_{j \ne i} w_{ij}\, p^{n+1}_j - \frac{\lambda N^0}{2d}\, b_i \Big). \qquad (1) $$

From equation (1), the pressure of the centre particle i relies on its temporary velocity and particle number density (through the source term b_i), and on the weighted pressure of the other particles in its neighbourhood. Based on this implicit relationship, we consider a regression problem of the form

$$ p^{n+1}_i = H\big(\phi^{pres}_i,\, \phi^{pres}_1, \ldots, \phi^{pres}_k,\, N^*_i,\, \nabla \cdot \mathbf{u}^*_i\big), $$

where H is the neural network regressor that projects the feature vector into the pressure solution space, and φ^{pres}_{(·)} denotes the integral pressure feature of the centre particle or of a neighbour particle.
For a centre particle i and the particles in its neighbourhood, as shown in Figure 2, the integral pressure feature is defined as

$$ \phi^{pres}_j = w\big(|\mathbf{r}_j - \mathbf{r}_i|\big)\, p^n_j, $$

where p^n_j denotes the pressure of particle j in the current frame. This integral pressure feature implies the magnitude of the interaction between the centre particle and its neighbour particles in the current frame (the nth frame). In the MPS method, the actual pressure of the centre particle in the next frame (the (n+1)st frame) is a linear combination of the integral pressure features of the neighbour particles (also of the next frame), the temporary particle number density, and the divergence of velocity. For the pressure regression problem, we expect the neural network to learn a more abstract combination such that, given the integral features of the nth frame, it can predict the pressure of the (n+1)st frame. The velocity update scheme is shown in Figure 3.
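A sketch of assembling the regressor's input for one particle. The feature form w_ij · p_j and the layout (centre feature, up to k distance-ordered neighbour features with zero padding, then N*_i and the velocity divergence) are our reading of the construction above; names are ours:

```python
import numpy as np

def feature_vector(i, p, pos, N_star, div_u, r_e, k=8):
    """Build the context feature vector for particle i:
    [p_i, phi_1..phi_k (nearest neighbour first, zero-padded), N*_i, div u*_i]."""
    r = np.linalg.norm(pos - pos[i], axis=1)
    w = np.zeros_like(r)
    mask = (r > 0) & (r < r_e)
    w[mask] = r_e / r[mask] - 1.0
    order = [j for j in np.argsort(r) if w[j] > 0][:k]  # nearest first
    phi = np.zeros(k)
    phi[:len(order)] = w[order] * p[order]               # assumed feature form
    return np.concatenate(([p[i]], phi, [N_star[i], div_u[i]]))
```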

The training process
The basic structure of the neural network is shown in Figure 4. The depth of the hidden layers is set to three. We have found that, although deeper neural networks can reach smaller losses, they perform worse than the network with three hidden layers. The neural network is fully connected and consists of three rectified linear unit (ReLU) layers [21] and a linear activation layer. The forward propagation of the proposed neural network can be summarized as

$$ \hat{p}^{n+1} = W^{(4)} f\Big(W^{(3)} f\big(W^{(2)} f(W^{(1)} \theta + b^{(1)}) + b^{(2)}\big) + b^{(3)}\Big) + b^{(4)}, \qquad (2) $$

where, given the feature vector θ, p̂^{n+1} is the pressure of the next frame generated by the neural network, f is the nonlinear activation function ReLU, and W^(l) and b^(l) are the parameters of the lth layer, l ∈ {1, 2, 3, 4}. The integral pressure feature of each neighbour particle is numbered according to its distance from the centre particle. For example, in Figure 4, φ^{pres}_1 is the integral pressure feature of the nearest particle. This numbering helps the neural network learn the underlying invariance of the system. The radius of the neighbourhood is set to 1.1 × l_0, where l_0 is the initial distance between particles. The reason for this choice is that particles beyond this radius barely contribute to the system; see Figure 5. Then, based on this radius, we set the number of input integral pressure features to eight. If there are fewer than eight neighbour particles within the radius, the vacant positions are filled with 0.
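Equation (2) can be sketched directly in NumPy. The hidden sizes 8-8-4 follow the Figure 4 caption; the input width of 11 (one centre feature, eight neighbour features, N*_i and the velocity divergence) is our assumption:

```python
import numpy as np

def forward(theta, params):
    """Forward propagation of Eq. (2): three ReLU hidden layers followed by
    a linear output layer. params is [(W1, b1), ..., (W4, b4)]."""
    h = theta
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)  # f = ReLU
    W, b = params[-1]
    return W @ h + b                    # linear output activation

# hypothetical layer sizes: 11 inputs -> 8 -> 8 -> 4 -> 1 (cf. Figure 4)
rng = np.random.default_rng(0)
sizes = [11, 8, 8, 4, 1]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
p_hat = forward(np.ones(11), params)
```

Because the network is this small, a single forward pass per particle is cheap and embarrassingly parallel, which is what enables the GPU and edge-device speedups reported later.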
For the learning of the pressure regressor, we measure the loss with the mean squared L2 norm, i.e.

$$ L = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{p}_i - p_i\big)^2, $$

where m is the number of training samples and p̂_i is the outcome of the neural network. Here, we use the result of the CG method as the ground-truth label to train our neural network. The training was done on the four simple scenes shown in Table 1 and Figure 6 with a total of a billion training samples. We apply back propagation [22] to update the parameters of the neural network and use Adam [23], a mini-batch stochastic optimization algorithm, to optimize the loss function. We also use the dropout [24] method to avoid overfitting. The values of W^(l) and b^(l), l ∈ {1, 2, 3, 4}, in (2) after training are shown in Figure 5. It can be seen that ∇·u* is negatively correlated with the results while φ_j is positively correlated with the results, and the correlation decreases as the distance increases. The training process takes about 3-4 hours in total on our platform.
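The loss and one Adam update (Kingma and Ba [23]) can be written compactly; this is a generic sketch of the optimizer, not the paper's TensorFlow code:

```python
import numpy as np

def mse_loss(pred, label):
    """Mean squared L2 loss over the m training samples."""
    return np.mean((np.asarray(pred) - np.asarray(label)) ** 2)

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates, bias
    correction, then the adaptive parameter step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

At t = 1 the bias correction makes the step size approximately lr regardless of the gradient magnitude, which is part of what makes Adam robust to the scale of the pressure labels.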

Validation on workstation
The dam collapse is a benchmark problem in the simulation of free-surface flow. To illustrate the performance of the NN-MPS, we implement both the original MPS and the NN-MPS on the dam collapse problem and compare their results with those of an experiment. The parameters of this test are listed in Table 2. Figure 7 shows that the particles of the two methods move in a similar pattern. The pressure in NN-MPS is distributed more smoothly, resulting in less drastic velocity oscillation.
The pressure distributions at 0.3 s, 0.9 s and 1.5 s are shown in Figure 8. The pressure distributions of the NN-MPS method are more uniform than those of the original MPS method. From the interior to the surface, the particle pressure in both methods decreases due to the decreasing particle number density. However, in the original MPS method the pressure boundary is not obvious, while in NN-MPS it is more distinct.
The comparison of the front position and the pressure at the left-bottom corner is shown in Figure 9. In MPS and NN-MPS, the aspect ratio is 2.25; in the experiment of J.C. Martin and W.J. Moyce [18], the aspect ratios are 1.125 and 2.25. At the beginning stage, the particles in NN-MPS move more slowly (closer to the experimental results) than in MPS. After a while, the MPS results become closer to the experimental data. Overall, the changing patterns of pressure in both methods are similar.

We also applied NN-MPS to two dam collapse problems with obstacles. In these problems, particles beyond the right-side boundary (x = 36 l_0) are no longer calculated. The physics parameters (except those of the obstacles) are set to be the same as in the dam collapse problem. The radius of the cylinder obstacle is 4 l_0. As shown in Figures 11-12, there are visually few differences between the results of NN-MPS and MPS. The pressure and the z component of the pressure gradient are compared in Figure 10. In both cases, the NN-MPS results show a slightly larger oscillation in the early stage, but gradually approach the MPS results.

Table 1. Parameters of the different scenes in the training set. The time step is set to 0.003 s and the simulation time of all scenes is 1.5 s (500 steps). The initial distance is set to 0.075. The size implies the number of particles, where r is the radius, h is the height, l is the length and w is the width.

Efficiency comparison
The profile of the devices used for this test is as follows: Intel Core i7-6820HK @ 2.70 GHz, 16.0 GB memory, Nvidia GTX 1070. The algorithms are mainly implemented in Python. The regression model is constructed using the deep-learning module TensorFlow [25] and is trained on the GPU. The calculations are conducted on both CPU and GPU. The GPU implementation of NN-MPS uses CUDA. The parameters used for these tests are shown in Table 3.
Results are shown in Figure 13. As the scale of the problem increases, the pressure calculation time in MPS grows at a relatively large rate. In contrast, the pressure prediction time in NN-MPS grows almost linearly with the total particle number. Moreover, the pressure calculation in NN-MPS is independent for each particle. Thus, the algorithm can be parallelized efficiently on the GPU. Since a GPU has far more processing units and threads than a CPU, it can outperform the CPU in computing efficiency by far. The efficiency of the GPU implementation of NN-MPS is nearly 50 times that of the CPU implementation.

Edge computing
Our final goal is to achieve real-time simulation. One alternative is to use hardware to speed up the computation. In the previous section, we used the GPU to accelerate the computation. In this part, we choose an edge computing device, which is cheaper and requires less electrical power, to achieve a more efficient computation.

Introduction to edge computing and DaVinci architecture
Edge computing is a distributed computing framework that processes data at its source, offering quicker response and lower power consumption. The Atlas 200DK Developer Kit (or, shortly, Atlas 200DK) is a high-performance AI application development board for edge computing scenarios. In this part, the Atlas 200DK is taken as an example to demonstrate how to apply NN-MPS to edge computing and how well it works. The Ascend 310 is the main computing chip of the Atlas 200DK; it consists of two AI Cores (DaVinci architecture) and eight AI CPUs. The DaVinci architecture has three kinds of basic computing units, i.e. cube, vector and scalar units, which provide high-speed matrix, vector and scalar computing. These AI Cores are extremely efficient for neural networks, so it is possible to speed up the computation of NN-MPS.
To run NN-MPS more effectively on the Ascend 310, two important adjustments must be made: using the AI Cores to accelerate the neural network computation, and working with a high-efficiency parallel algorithm to make full use of all the AI CPUs.

Multi-precision and parallel strategy
One of the ways the AI Core accelerates computation is by reducing the floating-point precision, which reduces the floating-point computation time. Therefore, the floating-point precision on the AI Core is reduced to 16 bits. A 16-bit floating-point number consists of one sign bit, five exponent bits and ten fraction bits; it can represent values with an accuracy of up to about four significant decimal digits.
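The effect of the ten fraction bits can be demonstrated directly with NumPy's half-precision type:

```python
import numpy as np

# float16: 1 sign bit, 5 exponent bits, 10 fraction bits
# machine epsilon near 1.0 is 2**-10, roughly 3-4 significant decimal digits
assert np.finfo(np.float16).eps == 2.0 ** -10

# an increment below the fp16 resolution near 1.0 is rounded away on cast
x32 = np.float32(1.0) + np.float32(1.0 / 2048.0)  # exactly representable in fp32
x16 = np.float16(x32)                              # rounds (ties-to-even) to 1.0
assert x16 == np.float16(1.0)
```

This is exactly the kind of rounding the simulation inherits when the network runs on the AI Core, which is why the accuracy impact has to be checked empirically as done below.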
Lower precision can adversely affect the simulation results. However, in our numerical tests this influence is limited, as will be shown in Section 5.3. Therefore, the accuracy loss of running the neural network on the AI Cores is acceptable. Figure 14 shows the multi-precision strategy used in this paper.
The parallelization is as follows. Firstly, as shown in the left diagram of Figure 15, the flow field is evenly divided into rectangular grids based on the search strategy, i.e. the linked-list method. Then, these grids are divided into several parts along the x-axis. Each part is computed by a single process. As shown in the right diagram of Figure 15, since each particle's pressure depends only on nearby particles, each process only needs to store some buffer particles of its neighbour processes. For each time step, the only communication between processes is the exchange of these buffer particles.
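A minimal sketch of the slab decomposition with buffer particles described above (function and variable names are ours; real inter-process exchange, e.g. via multiprocessing or MPI, is omitted):

```python
import numpy as np

def partition_with_buffers(pos, n_proc, r_e):
    """Assign particles to n_proc slabs along x; each slab also records
    buffer copies of neighbour-slab particles within r_e of its borders."""
    x = pos[:, 0]
    edges = np.linspace(x.min(), x.max() + 1e-12, n_proc + 1)
    owned, buffers = [], []
    for k in range(n_proc):
        lo, hi = edges[k], edges[k + 1]
        owned.append(np.where((x >= lo) & (x < hi))[0])
        # particles of neighbour slabs within r_e of this slab's borders
        buf = np.where(((x >= lo - r_e) & (x < lo)) |
                       ((x >= hi) & (x < hi + r_e)))[0]
        buffers.append(buf)
    return owned, buffers
```

Because only the buffer indices change between steps, the per-step communication volume scales with the slab surface, not with the total particle count.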

Numerical results
The three testing models shown in Figure 16 are used to demonstrate the reliability and superiority of NN-MPS on the edge computing device.Details are shown in Table 4.
The running times in the blue box in Figure 14 are measured and displayed in Table 4. The AI Cores lead to a significant improvement in the pressure-solving time thanks to their high efficiency for neural networks. The results of the free-surface motion are also shown to demonstrate that the increase in efficiency does not cause a decrease in accuracy. As shown in Figure 17(a), X is the motion of the fluid front, and H is the height of the fluid on the left edge. Results are shown in Figures 17(b), 17(c) and 17(d). It is easy to see that the results of the free-surface motion have very low error. The pressure results at the left-bottom corner are also shown in Figure 18. There is an error in the results, but it is limited. And as shown in Figures 19-21, the results on the two different devices are visually the same.
Results of the parallel speedup and efficiency are shown in Figure 22.The efficiency is as high as 87.5% with four computing processes, which shows the superiority of NN-MPS in parallel computing.

Summary and discussion
A novel machine learning method to accelerate the pressure calculation of the MPS method is proposed in this work. According to our tests, the neural network regressor we built can successfully learn the underlying pressure variation pattern and maintain robustness. Since the projection through a neural network does not require iterations, it can significantly improve the simulation efficiency. Moreover, our work can attain a similar effect with a relatively small neural network, which further helps save computational resources. The proposed NN-MPS shows a clear computational cost improvement for free-surface flows and does not sacrifice accuracy in the pressure computation.
Like many other machine learning techniques and data-driven approaches, a weakness of the proposed method is that it cannot be generalized to cases that are too different from the training data (e.g. flows with different physics parameters such as particle mass, density or viscosity). Another problem is that, in every frame, the continuity equation cannot be entirely satisfied; the average particle number density sometimes deviates from the initial constant N^0. However, this error does not accumulate, and NN-MPS maintains convergence even after hundreds of time steps. Since the neural network is used to solve a pressure Poisson equation with specific spatial and temporal resolutions, different values of these two parameters can lead to totally different solutions. As a result, NN-MPS cannot run with training sets of different spatial and temporal resolutions. However, as a particle method, we can run many different cases with the same resolution and different geometries, which extends the application scenarios of this method.
The introduction of the neural network gives MPS the opportunity to use advanced deep-learning hardware, and the feature of not relying on all particles in the computing domain makes it easier to parallelize. By running on dedicated AI hardware, the computation time of the pressure Poisson equation is further reduced. The results also show the possibility of reaching a high speed with a good parallel strategy. The discussion of running NN-MPS on an edge computing device shows the potential for the NN-MPS method to realize real-time simulation in the future.
Future work on this research includes the implementation of geographical phenomena simulations and realistic graphics rendering to demonstrate its functionality on practical problems with limited computational resources. The time interval in the simulation, which has a significant influence on the simulation speed, requires further study; in this work we only adopt the constant time interval criterion from the MPS method. The pressure oscillation is also an important aspect for further improvement. We will consider some methods (e.g. those in [26-28]) to reduce the pressure oscillation.

Figure 1. The arrangement of particles near the wall.

Figure 2. A description of a particle and its neighbourhood.

Figure 3. The velocity update scheme in the proposed NN-MPS method.

Figure 4. A description of the neural network. For each neuron in the hidden layers, the nonlinear activation function is the ReLU. Each of the first two hidden layers has eight neurons, and the last hidden layer has four neurons.

Figure 5. A visualization of W^(l) and b^(l), l ∈ {1, 2, 3, 4}, in (2) after training. The vectors at the top are b and the matrices at the bottom are W. From left to right: 1st layer, 2nd layer, 3rd layer, 4th layer.

Figure 6. A visualization of the training set. From left to right: scene 1, scene 2, scene 3, scene 4.

Figure 9. A comparison of the position of the fluid front and the pressure at the bottom of the left wall.

Figure 10. The pressure and the pressure gradient of the flow field. Left: with a one-cylinder obstacle. Right: with a two-cylinder obstacle. Upper: pressure at the bottom of the left wall. Lower: z component of the pressure gradient at the bottom of the left wall.

Figure 11. A comparison of the two algorithms on the problem with a one-cylinder obstacle. Top row: NN-MPS. Bottom row: MPS.

Figure 12. A comparison of the two algorithms on the problem with a two-cylinder obstacle. Top row: NN-MPS. Bottom row: MPS.

Figure 13. The ratio of pressure calculation time.

Figure 14. An illustration of the multi-precision strategy used on the Ascend 310.

Figure 15. A description of dividing the domain into grids and distributing the grids to different processes.

Figure 16. A visualization of the three testing models.

Figure 17. Numerical results of the free-surface movement. (a) shows the definition of X and H. (b), (c) and (d) respectively show the results of the models with index numbers 1, 2 and 3.

Figure 18. Pressure results at the left corner. (a), (b) and (c) respectively show the results of the models with index numbers 1, 2 and 3.

Figure 19. A comparison of two different devices on case 1. Top row: CPU. Bottom row: AI Core.

Figure 20. A comparison of two different devices on case 2. Top row: CPU. Bottom row: AI Core.

Figure 21. A comparison of two different devices on case 3. Top row: CPU. Bottom row: AI Core.

Figure 22. The parallel efficiency and speedup.

Table 2. The parameter settings for the dam collapse problem.

Table 3. The parameter settings for the dam collapse problem.

Table 4. The model sizes and speedup.