DNN-MG: A Hybrid Neural Network/Finite Element Method with Applications to 3D Simulations of the Navier-Stokes Equations

We extend and analyze the deep neural network multigrid solver (DNN-MG) for the Navier-Stokes equations in three dimensions. The idea of the method is to augment a finite element simulation on coarse grids with fine scale information obtained using deep neural networks. The neural network operates locally on small patches of grid elements. The local approach proves to be highly efficient, since the network can be kept (relatively) small and since it can be applied in parallel on all grid patches. However, the main advantage of the local approach is the inherent generalizability of the method. Since the network only processes data of small sub-areas, it never ``sees'' the global problem and thus does not learn false biases. We describe the method with a focus on the interplay between the finite element method and deep neural networks. Further, we demonstrate with numerical examples the excellent efficiency of the hybrid approach, which allows us to achieve very high accuracy with a coarse grid and thus reduce the computation time by orders of magnitude.


Introduction
Accurate flow simulations remain a challenging task. The success of deep neural networks in machine translation, computer vision, and many other fields, on the other hand, has led to a growing (and renewed) interest in applying neural networks to problems in computational science and engineering. The combination of classical finite element approximation techniques with deep neural networks adds new aspects to the purely numerics-oriented approaches. At higher Reynolds numbers, accurate simulations, especially in 3D, become increasingly difficult, and the classical methods reach their limits. Although the finite element method is highly efficient and established for the discretization of the Navier-Stokes equations, fundamental problems, such as the resolution of fine structures or a correct information transport between scales, are still not sufficiently solved. Even if linear solvers with optimal complexity (such as multigrid methods) are used, nonlinear equations such as the Navier-Stokes equations show more and more local features under mesh refinement, and these turn out to be a hurdle for the approximation.

Figure 1: The idea of DNN-MG is to use a multigrid solver to solve the problem on the first $L$ levels of the mesh hierarchy. The obtained solution is then prolongated to a finer level $L + k$, where a neural network predicts corrections to the solution. These corrections are incorporated into the time evolution through the time stepping and enter through the right-hand side.
We usually consider finite elements as a hierarchical method, where natural hierarchies of finite element meshes and spaces can be constructed (adaptively) to yield approximations of increasing accuracy. This hierarchical setup is then used within multigrid methods for the efficient solution of the algebraic systems. Here, we connect such a hierarchical finite element approach with neural networks in a hybrid setting: while coarse meshes are handled in the traditional finite element multigrid way, updates on finer meshes are learned by the neural network. They are hence used whenever a full resolution of the effects does not seem possible or efficient.
We label this approach the Deep Neural Network Multigrid Solver (DNN-MG), as it is based on hierarchies of meshes and function spaces and combines the tools of multigrid solvers with deep neural networks (cf. Figure 1). The neural networks are used to predict updates to the nonlinear problem; hence, the approach can be used for upsampling any kind of finite element (also finite volume or finite difference) solution towards a representation on a finer mesh.
We demonstrate the efficiency, generalizability, and scalability of our proposed approach using 3D simulations. We first train the neural network on data from the classical benchmark flow with one circular obstacle. To analyze the generalization capability of DNN-MG, we test the network on channel flows with one or two obstacles at different Reynolds numbers. The obstacles have elliptical cross-sections with varying eccentricities. The obtained solutions as well as the lift and drag functionals demonstrate that DNN-MG achieves considerably better accuracy than a coarse multigrid solution on mesh level $L$ while taking less than 3% of the computation time required by a full solution on $L + 2$ levels. DNN-MG hence offers a speedup by a factor of 35. Its efficiency is evident, as it requires only double the time of a coarse mesh solution on level $L$, yet offers substantial improvements in the overall solution quality.
The paper is organized as follows. In Section 2, we review related work on using neural networks for the simulation of partial differential equations. Section 3 provides a recap of the solution of the Navier-Stokes equations with the geometric multigrid method, while Section 4 introduces the neural networks used later in our numerical experiments. In Section 5, we present the deep neural network multigrid solver, discussing its design in a general form that makes it applicable to other problems. Finally, we present our numerical results in Section 6.

Related works
We discuss different approaches to the simulation of physical systems using deep neural networks.
The most direct approach to utilizing (deep) neural networks for the solution of partial differential equations is to represent the solution of the PDE as such a network. Already in [35], a least squares formulation was employed to minimize the residual in the training process. The idea has recently been taken up again in the Deep Ritz Method [18], and a variety of similar approaches have emerged in the last years, coined Physics-Informed Neural Networks (PINNs) in the literature [38]. For a comprehensive review of PINNs and related approaches, we refer to [17]. Such approaches use an existing partial differential equation and can be considered data-free, since no explicit training data is required. They are, however, static in that the solution to one specific problem instance is learned. PINNs show their strengths, for instance, in very high-dimensional problems [40], but they do not reach the efficiency of established methods when standard problems, e.g. from elasticity or fluid dynamics, are considered (cf. [24] for case studies on the Poisson, Allen-Cahn, and Schrödinger equations). The idea of PINNs has also been extended to learning solution operators [39,14,34]. The main advantage of this approach is that once the solution operator is trained, it can be applied to other settings. This enables the application to inverse problems [50,30,13] or optimal control [44]. Inspired by classical time stepping schemes, another approach to a neural network-based simulation of (evolutionary) partial differential equations is to consider these as time series and predict the next state of the system based on the previous ones. With this approach, network architectures for sequential data can be used, in particular recurrent neural networks and their extensions such as Long Short-Term Memory units (LSTMs) [28] and Gated Recurrent Units (GRUs) [16], which we also used for the DNN-MG Navier-Stokes solver [45,46].
At the other end of the spectrum are approaches that are purely data-based. For example, in [19] flow fields are learned by convolutional neural networks based on the flow geometry and using techniques from image processing. Recent work [53,9,36] uses purely data-driven neural network models trained on historical weather measurements, and these perform on par with one of the most sophisticated weather forecasting models based on partial differential equations. The authors of [22] use transformer models for the prediction of dynamical systems. In the context of dynamical systems, the combination of data assimilation, uncertainty quantification, and machine learning techniques has gained significant interest; we refer to [15] for an in-depth review. For a recent overview of different neural network based approaches to approximate partial differential equations and related inverse problems we refer to [51].
One of the central open questions in the current literature is to what extent and how physical constraints or existing knowledge about the dynamics should be used in the training of a neural network; see also [23]. The Deep Neural Network Multigrid Solver (DNN-MG) is a hybrid approach that combines the model-based finite element method with deep neural networks that also integrate model knowledge. DNN-MG is not a PINN, since we do not represent the solution by a network but only aim for fine scale updates. Also, the general setup of DNN-MG aims for generalization, and it is closer to learning a numerical solver than to learning a specific solution.
The close incorporation of the deep neural network into the finite element method, e.g. the fact that we learn fine-scale solution coefficients, gives rise to analytical tools that can be used in first steps towards error estimation [48].
In our first work on DNN-MG [46], we introduced a method which combines a geometric multigrid method with ANNs but only considered 2D simulations. We also adapted the method in 2D to ensure divergence-freedom by construction through a network that predicts the stream function [45]. In later studies we showed that the corrections added by the neural network improve the initial guesses in the Newton method, which leads to a reduction in wall time compared to the coarse grid solutions [47]. Similar to the ideas of DNN-MG, the authors of [42,43] use numerical solutions of higher order to train a neural network which generates corrective forcing terms. Similarly, in [20], low-order simulations of wave equations are enhanced using neural networks to provide corrective forcing. This approach includes improvements in the temporal resolution of the simulations. Learning corrective forcing terms shares similarities with the approach of DNN-MG, as both methods introduce corrections to the solution through the right-hand side, which can be seen as corrective forcing terms. However, DNN-MG goes further by directly correcting the solution, potentially providing advantages in computing goal quantities like drag and lift. All these methods have a resemblance to reduced order models (ROMs). Along these lines, in [4], a ROM based on adaptive finite elements and correction terms combined with artificial neural networks (ANNs) is presented. In [32], the authors developed a mesh-based approach to ANN-based solvers, leveraging existing FEM theory. The authors of [12] propose a neural framework for optimizing weak formulations in FEM. An ANN acting as control variable is included in the weak form, and the minimization of a cost function yields desirable approximations where known data or stabilization mechanisms can easily be incorporated. Ainsworth and Dong propose a framework for adaptively generating basis functions, each represented by a neural network with a single hidden layer, which may be used in a standard Galerkin method [1]. In [49], the authors augment PDEs by ANNs to represent unknown terms. The hybrid model is then discretized using FEM.
The MgNet framework introduced in [25] draws connections from convolutional neural networks (CNNs) to multigrid methods and thereby improves the design and understanding of CNNs. The authors of [41] propose a framework for unsupervised learning of algebraic multigrid (AMG) prolongation operators for linear systems which are defined directly on graphs. In the multigrid framework, selecting an appropriate smoother is crucial to ensure convergence and efficiency, but the optimal choice of a smoother is problem dependent and in general a major challenge for many applications. In [29], the authors optimize the smoothers in multigrid methods, which they form by CNNs.

Numerical Methods
We solve model problems of incompressible viscous flow around a cylinder in 2D and 3D domains. This leads to the solution of the Navier-Stokes equations. Our approach involves a geometric multigrid method with a cell-based Vanka smoother as a preconditioner to a Newton-Krylov method.
We consider a domain $\Omega \subset \mathbb{R}^d$ with $d \in \{2, 3\}$ and Lipschitz-continuous boundary $\Gamma$ and a bounded time interval $[0, T]$. For solving the instationary Navier-Stokes equations we seek the velocity $v : [0, T] \times \Omega \to \mathbb{R}^d$ and pressure $p : [0, T] \times \Omega \to \mathbb{R}$ such that

$$\partial_t v - \nu \Delta v + (v \cdot \nabla) v + \nabla p = f, \qquad \nabla \cdot v = 0 \quad \text{in } (0, T] \times \Omega, \tag{1}$$

where $\nu > 0$ is the kinematic viscosity and $f$ an external force. The initial and boundary conditions are given by

$$v(0, \cdot) = v_0 \text{ in } \Omega, \qquad v = v^D \text{ on } [0, T] \times \Gamma^D, \qquad \nu \partial_n v - p n = 0 \text{ on } [0, T] \times \Gamma^N, \tag{2}$$

where $n$ denotes the outward facing unit normal vector on the boundary $\partial\Omega$ of the domain. The boundary $\Gamma = \Gamma^D \cup \Gamma^N$ is the union of $\Gamma^D$ with Dirichlet boundary conditions and $\Gamma^N$ with Neumann type conditions.

Notation and variational formulation of the Navier-Stokes equations
By $L^2(\Omega)$ we denote the space of square integrable functions on the domain $\Omega \subset \mathbb{R}^d$ with scalar product $(\cdot, \cdot)$, and by $H^1(\Omega)$ those $L^2(\Omega)$ functions with weak first derivatives in $L^2(\Omega)$. The function spaces for the velocity and pressure are then

$$\mathcal{V} := v^D + \{\phi \in H^1(\Omega)^d : \phi = 0 \text{ on } \Gamma^D\}, \qquad \mathcal{L} := L^2(\Omega), \tag{3}$$

where $v^D \in H^1(\Omega)^d$ is an extension of the Dirichlet data on $\Gamma^D$ into the domain. We normalize the pressure to yield uniqueness if Dirichlet data is given on the whole boundary. With these spaces, the variational formulation of (1) is given by: find $v \in \mathcal{V}$ and $p \in \mathcal{L}$ such that

$$(\partial_t v, \phi) + \nu (\nabla v, \nabla \phi) + ((v \cdot \nabla) v, \phi) - (p, \nabla \cdot \phi) + (\nabla \cdot v, \xi) = (f, \phi) \tag{4}$$

for all admissible test functions $\phi$ and $\xi$. Let $\Omega_h$ be a quadrilateral or hexahedral finite element mesh of the domain $\Omega$ satisfying the usual requirements on the structural and form regularity such that the standard interpolation results hold; compare [54, Section 4.2]. By $h_T$ we denote the diameter of an element $T \in \Omega_h$ and by $h$ the maximum diameter of all $T \in \Omega_h$.

Semidiscretization in space
For the finite element discretization of (4) we choose equal-order continuous finite elements of degree two for velocity and pressure on hexahedral meshes. By $Q_h^{(r)}$ we denote the space of continuous functions which are polynomials of maximum directional degree $r$ on each element $T \in \Omega_h$. Then we define the discrete trial and test spaces for the discretization of (4) as $v_h, \phi_h \in V_h = [Q_h^{(2)}]^d$ and $p_h, \xi_h \in L_h = Q_h^{(2)}$. Since the resulting equal order finite element pair $V_h \times L_h$ does not fulfill the inf-sup condition, we add stabilization terms of local projection type [5]. We further add convection stabilization, also based on local projections [6]. The resulting semidiscrete variational problem then reads: find $v_h(t) \in V_h$ and $p_h(t) \in L_h$ such that

$$(\partial_t v_h, \phi_h) + \nu (\nabla v_h, \nabla \phi_h) + ((v_h \cdot \nabla) v_h, \phi_h) - (p_h, \nabla \cdot \phi_h) + s_\delta(v_h, \phi_h) = (f, \phi_h), \qquad (\nabla \cdot v_h, \xi_h) + s_\alpha(p_h, \xi_h) = 0 \tag{5}$$

for all discrete test functions $(\phi_h, \xi_h) \in V_h \times L_h$, with the local projection stabilization terms

$$s_\alpha(p_h, \xi_h) = \sum_{T \in \Omega_h} \alpha_T \big(\nabla (p_h - i_h p_h), \nabla (\xi_h - i_h \xi_h)\big)_T, \qquad s_\delta(v_h, \phi_h) = \sum_{T \in \Omega_h} \delta_T \big((v_h \cdot \nabla)(v_h - i_h v_h), (v_h \cdot \nabla)(\phi_h - i_h \phi_h)\big)_T.$$

Here, we denote by $i_h : Q_h^{(2)} \to Q_h^{(1)}$ the interpolation into the space of linear finite elements and by $\alpha_T, \delta_T$ two local stabilization parameters specified in (7).

Time discretization
For temporal discretization, the time interval $[0, T]$ is split into discrete time steps of uniform size

$$k = t_n - t_{n-1}, \qquad 0 = t_0 < t_1 < \dots < t_N = T.$$

The generalization to a non-equidistant time discretization is straightforward and only omitted for ease of presentation. We define $v_n := v_h(t_n)$ and $p_n := p_h(t_n)$ for the fully discrete approximation of velocity and pressure at time $t_n$ and apply the second order Crank-Nicolson method to (5):

$$(v_n, \phi_h) + \frac{k}{2} a(v_n, \phi_h) + k\, s_\delta(v_n, \phi_h) - k (p_n, \nabla \cdot \phi_h) = (v_{n-1}, \phi_h) - \frac{k}{2} a(v_{n-1}, \phi_h) + \frac{k}{2} (f_n + f_{n-1}, \phi_h), \qquad (\nabla \cdot v_n, \xi_h) + s_\alpha(p_n, \xi_h) = 0, \tag{6}$$

where $a(v, \phi) := \nu (\nabla v, \nabla \phi) + ((v \cdot \nabla) v, \phi)$ and the pressure and the stabilization terms from (5) are treated implicitly. The right-hand side only depends on the velocity $v_{n-1}$ at the last time step $t_{n-1}$, and we will denote it as $b_{n-1}$ in the following.
The stabilization parameters $\alpha_T$ and $\delta_T$ depend on the mesh Peclet number, and with two parameters $\alpha_0 > 0$ and $\delta_0 \ge 0$ we define them as

$$\alpha_T = \alpha_0 \Big(\frac{6\nu}{h_T^2} + \frac{\|v_n\|_{\infty, T}}{h_T}\Big)^{-1}, \qquad \delta_T = \delta_0 \Big(\frac{6\nu}{h_T^2} + \frac{\|v_n\|_{\infty, T}}{h_T}\Big)^{-1}; \tag{7}$$

see [11] for details. Introducing the unknown $x_n = (v_n, p_n)$, we write a short form of the equations (6) as

$$\mathcal{A}_h(x_n) = b_{n-1}, \tag{8}$$

where $[\mathcal{A}_h(x_n)]_i$ and $[b_{n-1}]_i$ are the left- and right-hand sides of (6) for all test functions $(\phi_h^i, \xi_h^i)$. In the presented form the Crank-Nicolson time discretization is sufficiently robust for smooth initial data $v_0$; we refer to [26] for small modifications with improved robustness and stability.

Solution of the algebraic systems
Discretization in space and time (6) leads to a nonlinear system of equations with saddle-point character. The nonlinear problem is solved by a Newton iteration, where we usually omit the dependency of the stabilization parameters (7) on the velocity when setting up the Jacobian. The Jacobian matrix within the Newton iteration is kept for several iterations and also for consecutive time steps and is only re-assembled when the nonlinear convergence rate deteriorates above a certain threshold, usually set to 0.1 or 0.05. As outer solver for the linear problems, a GMRES iteration [31] is employed, which is preconditioned with a geometric multigrid solver [7].
To cope with the saddle-point structure of the incompressible Navier-Stokes equations and to allow for efficient shared memory parallelization, a Vanka smoother is used within the geometric multigrid method. To this end, small matrices describing the local problems on each mesh element are inverted in a parallel loop with Jacobi coupling. Using quadratic finite elements on hexahedral meshes, these local matrices are of size 108 × 108. The complete layout of the algebraic solver and its parallelization is described in [21].
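The following is a minimal sketch of one such cell-based Vanka sweep with Jacobi coupling; the interfaces (a global sparse matrix, per-element DoF index sets, a damping factor) are illustrative assumptions, not the actual Gascoigne3D implementation:

```python
import numpy as np

def vanka_sweep(A, x, b, cell_dofs, omega=0.8):
    """One cell-based Vanka sweep with Jacobi coupling (schematic sketch).

    A         -- global sparse system matrix (e.g. scipy CSR)
    x, b      -- current iterate and right-hand side
    cell_dofs -- list of index arrays, one per element; for Q2/Q2 elements
                 on hexahedra each array holds the 108 local DoFs
    omega     -- damping factor (hypothetical choice)
    """
    r = b - A @ x                    # one global residual => Jacobi-type coupling
    dx = np.zeros_like(x)
    for dofs in cell_dofs:           # independent local solves, parallelizable
        A_loc = A[dofs][:, dofs].toarray()   # local 108 x 108 block
        dx[dofs] += np.linalg.solve(A_loc, r[dofs])
    x += omega * dx                  # overlapping contributions are damped
    return x
```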

Neural Networks
Various neural network architectures are in principle suitable for our approach. Having previously worked with recurrent neural networks designed for sequence prediction, we settled on feedforward neural networks for this work. Their simple structure makes them faster to train and to evaluate. Furthermore, we found that with proper regularization, the feedforward neural network described below performs more consistently than the alternatives we tried.
A general feedforward neural network is a composition of multiple parameterized nonlinear functions,

$$\mathcal{N}(x; \theta) = W_N\, \sigma\big(\cdots \sigma(W_1\, \sigma(W_0 x))\big),$$

where $\theta$ denotes the set of optimizable (learnable) parameters. The $i$-th layer of the prototypical ANN we consider here consists of a weight matrix $W_i \in \mathbb{R}^{n_i \times n_{i+1}}$ and a nonlinear function $\sigma : \mathbb{R} \to \mathbb{R}$, called the activation function, which is applied component-wise, $\sigma(x)_j = \sigma(x_j)$. Furthermore, we employ two techniques common in deep learning to accelerate training and to improve generalization. Batch normalization $b_i : \mathbb{R}^{n_{i+1}} \to \mathbb{R}^{n_{i+1}}$, applied before each activation, re-scales each component by

$$b_i(x) = \gamma_i \odot \frac{x - \mu_i}{\sigma_i} + \beta_i,$$

where $\odot$ is the Hadamard product of two matrices (or tensors) and $\mu_i \in \mathbb{R}^{n_{i+1}}$ and $\sigma_i \in \mathbb{R}^{n_{i+1}}$ are, respectively, the mean and standard deviation of samples estimated during training. The additional parameters $\gamma_i \in \mathbb{R}^{n_{i+1}}$ and $\beta_i \in \mathbb{R}^{n_{i+1}}$ are a learnable, component-wise affine transformation. Skip connections are inserted where input and output dimensions match, by adding the inputs of a layer to its outputs, leading to layers $l_i$ of the structure

$$l_i(x) = x + \sigma\big(b_i(W_i x)\big).$$
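As a concrete illustration, a minimal PyTorch sketch of such a network with batch normalization before each activation and additive skip connections between equal-width layers could look as follows; the widths, depth, and the choice of ReLU as activation are illustrative assumptions, not the exact configuration of our experiments:

```python
import torch
import torch.nn as nn

class FeedforwardNet(nn.Module):
    """MLP with batch normalization before each activation and skip
    connections wherever input and output dimensions match (sketch)."""

    def __init__(self, n_in: int, n_out: int, n_hidden: int = 750, depth: int = 8):
        super().__init__()
        # input layer: dimensions differ, hence no skip connection here
        self.inp = nn.Sequential(nn.Linear(n_in, n_hidden),
                                 nn.BatchNorm1d(n_hidden), nn.ReLU())
        # hidden layers realize l_i(x) = x + sigma(b_i(W_i x))
        self.hidden = nn.ModuleList(
            nn.Sequential(nn.Linear(n_hidden, n_hidden),
                          nn.BatchNorm1d(n_hidden), nn.ReLU())
            for _ in range(depth))
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, n_in); in DNN-MG one row corresponds to one patch
        y = self.inp(x)
        for layer in self.hidden:
            y = y + layer(y)          # skip connection
        return self.out(y)
```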

The Deep Neural Network Multigrid Method
In this section, we review the Deep Neural Network Multigrid (DNN-MG) solver that we introduced in [46]. DNN-MG leverages a deep neural network to predict the correction of a coarse mesh solution that has been prolongated onto one or multiple finer mesh levels. The objective is to obtain solutions that are more accurate than with the coarse mesh alone, while increasing computational efficiency compared to performing full multigrid computations on the fine mesh levels. We develop the DNN-MG solver in a general formulation, while maintaining the connection to the Navier-Stokes equations, for which we developed DNN-MG. We begin by providing an overview of the modifications made in DNN-MG compared to the MG method during a time step of the Navier-Stokes simulation. We continue by describing the structure, inputs and outputs of the neural network in DNN-MG. In particular, the network is designed to make localized predictions over specific parts of the simulation domain.
Notation. In the context of the geometric multigrid we always have a hierarchy of meshes $\Omega_h^0, \dots, \Omega_h^{L+k}$, obtained by successive refinement of the initial mesh $\Omega_h^0$. We denote the finite element spaces $V_h$ and $L_h$ introduced in Section 3.2 on level $l \in \{0, \dots, L+k\}$ by $V_h^l$ and $L_h^l$. With their nested structure, these spaces reflect the grid hierarchy. Then, let $\Omega_h^l$ be the quadrilateral or hexahedral finite element mesh on level $l \in \{0, \dots, L+k\}$ with $N_\text{cells}^l = \operatorname{card}(\Omega_h^l)$ cells and the index set $I_l$ of all global degrees of freedom with cardinality $N_\text{dof}^l := \operatorname{card}(I_l)$. This notation and the description in Section 5.2 are inspired by the description of the Vanka smoother in [2].
Algorithm 1: DNN-MG for the solution of the Navier-Stokes equations. Lines 6-9 (blue) provide the modifications of the DNN-MG method compared to a classical Newton-Krylov simulation with geometric multigrid preconditioning.

Time stepping using DNN-MG
We present a detailed description of one time step in the simulation of the Navier-Stokes equations using the DNN-MG solver. The computations involved are summarized in Algorithm 1.
At the beginning of time step $n$, we solve for the unknown velocity $v_n$ and pressure $p_n$ on the coarse level $L$ using the classical Newton-Krylov method (Algorithm 1, lines 4-6) as described in Section 3. Subsequently, we prolongate $x_n$ to $\tilde{x}_n$ on the next finer level $L + k$ (Algorithm 1, line 7), where a richer function space $V_h^{L+k} \times L_h^{L+k}$ is available. Using the prolongated solution, we calculate the residual on level $L + k$ according to (8) as input to the neural network (Algorithm 1, line 8). We then use the neural network of the DNN-MG solver to predict the velocity update $d_n^{L+k}$, which represents the difference between $\tilde{x}_n$ and the unknown ground truth solution $x_n^{L+k}$ on level $L + k$ (Algorithm 1, line 9). The prediction is based on $\tilde{x}_n$ and further utilizes information about the local mesh structure $g \in \mathbb{R}^{n_\text{Geo}}$. This information is extracted from $\Omega_h^{L+k}$. To conclude the $n$-th time step, we compute the right-hand side $b_{n+1}^{L+k}$ of Eq. (6) on level $L + k$ using the corrected velocity $\tilde{v}_n^{L+k} + d_n^{L+k}$ (Algorithm 1, line 10), and then restrict it to level $L$ (Algorithm 1, line 11). This right-hand side, incorporating the neural network based correction, becomes part of the multigrid computations in the next time step (Algorithm 1, line 4), thereby influencing the overall time evolution of the flow. This approach of constructing the right-hand side $b_{n+1}^{L+k}$ on level $L + k$ and subsequently restricting it is a crucial aspect of the DNN-MG solver.
We note that in the case of the Navier-Stokes equations the pressure is handled implicitly on level $L$ in the multigrid solve and does not directly receive a correction by the neural network. However, the pressure is included in the network's input through the prolongated solution and the residual, as it is an integral part of the solution, and it is indirectly influenced through the corrections to the velocity.
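A minimal sketch of one such DNN-MG time step is given below; all interfaces (the solver object, prolongation and restriction routines, the patch operators, and the geometry features) are hypothetical placeholders for the actual Gascoigne3D/libtorch coupling:

```python
import torch

def dnn_mg_time_step(b_prev, solver, net, geo):
    """One DNN-MG time step (schematic, cf. Algorithm 1)."""
    x_n = solver.newton_krylov(rhs=b_prev, level="L")        # coarse solve (lines 4-6)
    x_tilde = solver.prolongate(x_n, to_level="L+k")         # line 7
    r_tilde = solver.residual(x_tilde, level="L+k")          # Eq. (8), line 8
    # patch-wise prediction: one row per patch, cf. Section 5.2
    inputs = torch.cat([solver.restrict_to_patches(x_tilde),
                        solver.restrict_to_patches(r_tilde), geo], dim=1)
    d_n = solver.extend_from_patches(net(inputs))            # line 9
    v_corr = x_tilde.velocity + d_n                          # corrected velocity
    b_fine = solver.assemble_rhs(v_corr, level="L+k")        # line 10
    return solver.restrict(b_fine, to_level="L")             # line 11: rhs of next step
```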

The Neural Network of DNN-MG
The neural network component forms the core of the DNN-MG solver. It plays a crucial role in enhancing computational efficiency, facilitating fast training procedures, and ensuring robust generalization across diverse flow regimes beyond the training set. We achieve this through the careful design of a compact neural network architecture with a moderate number of parameters and a localized, patch-based structure. Figure 2 provides an overview of the network component, illustrating the local approach, which we introduce next.
A patch-based neural network. To ensure computational efficiency, the DNN-MG approach adopts a neural network that operates on localized patches of the mesh. The network's input and output are constrained to a patch, which reduces the computational cost compared to a prediction over the entire domain. This local approach leads to more compact neural networks with a smaller number of parameters.
The neural network predicts a local velocity update $d_{n,P}^{L+k}$ based on local data which is also restricted to the patch $P_{L,k}$. This prediction is repeated independently for each patch in the domain, covering the entire domain $\Omega_h^{L+k}$. The predictions of shared degrees of freedom in adjacent patches are averaged to ensure consistency. We now formulate the process we just sketched in more mathematical terms.
We define a patch $P_{L,k}$ as the collection of cells on level $L + k$ that are formed by $k$ successive refinements of a coarse mesh cell $K \in \Omega_h^L$. In other words, $P_{L,k}$ comprises the $(2^k)^d$ cells obtained by successively refining the coarse mesh cell $K \in \Omega_h^L$ for $k$ levels. In order to define the restriction $\mathcal{R}$, we need the set of degrees of freedom belonging to each patch. We then define $\mathcal{R}$ such that it maps a global vector $x \in \mathbb{R}^{N_\text{dof}^{L+k}}$ to a matrix containing the local restrictions to the patches of all $K_0, \dots, K_{N_\text{cells}^L - 1} \in \Omega_h^L$, stacked on top of each other. After applying $\mathcal{R}$ to the residual $\tilde{r}_n^{L+k}$ and to $\tilde{x}_n^{L+k}$, we obtain the patch-wise, local data required as input to the DNN. The local data is stacked along the first dimension, forming a single batch. The network is then evaluated independently for each patch, with each row of the input representing a different patch. The output of the DNN then consists of a batch of the patch-wise defects $d_{n,P}^{L+k}$. To obtain the global defect vector $d_n^{L+k}$, we use the extension operator $\mathcal{E}$ to transfer this batch of local data back to the global domain. To this end, we first introduce a $P_{L,k}$-local extension operator $\mathcal{E}_P$.
We further need a scaling vector $s \in \mathbb{R}^{N_\text{dof}^{L+k}}$ which contains the reciprocal of the valence of each degree of freedom, i.e. of the number of patches a degree of freedom is contained in. Then we define $\mathcal{E}$ such that it maps the $N_\text{cells}^L$ local vectors to one global vector by summing the patch-local contributions and averaging shared degrees of freedom,

$$\mathcal{E}(d) = s \odot \sum_{K \in \Omega_h^L} \mathcal{E}_P(d_K), \tag{15}$$

where, for the sake of brevity, we use $K$ not only as an element of $\Omega_h^L$ in (15) but also to index the rows of the collection of local vectors.
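In code, the pair $(\mathcal{R}, \mathcal{E})$ amounts to gathering and scattering patch-local degrees of freedom with a final averaging by valence. A minimal NumPy sketch, under the assumption that each patch is described by an index array of its DoFs:

```python
import numpy as np

def make_patch_operators(patch_dofs, n_global):
    """Build R (gather) and E (scatter + valence averaging) from per-patch
    DoF index arrays. A schematic sketch, not the actual implementation."""
    valence = np.zeros(n_global)
    for dofs in patch_dofs:
        valence[dofs] += 1.0
    s = 1.0 / valence                # the scaling vector s (patches cover all DoFs)

    def restrict(x):
        # R: global vector -> one row of local data per patch
        # (all patches have the same structure, so the rows have equal length)
        return np.stack([x[dofs] for dofs in patch_dofs])

    def extend(d_local):
        # E: sum patch-local predictions, then average shared DoFs
        d = np.zeros(n_global)
        for dofs, d_p in zip(patch_dofs, d_local):
            d[dofs] += d_p
        return s * d

    return restrict, extend
```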
This concludes the description of all operations necessary to integrate the neural network with its local approach into the multigrid solver. In particular, the composition of restriction, patch-wise network evaluation, and extension, $d_n^{L+k} = \mathcal{E}\big(\mathcal{N}(\mathcal{R}(\tilde{x}_n^{L+k}), \mathcal{R}(\tilde{r}_n^{L+k}), g)\big)$, is a well-defined operation (cf. Algorithm 1, line 9). The network operates on each patch independently, which allows us to take advantage of parallelization through the localized nature of the problem. The local approach is thus a key feature of our method; it enables the network to handle a wide range of flow scenarios while maintaining computational efficiency, as we will show later in Section 6. The rest of this section consists of a detailed description of the inputs to the neural network and the training methodology.
Neural network inputs. The effectiveness of the neural network prediction hinges upon the selection of appropriate network inputs that provide enough information regarding the coarse flow characteristics at level $L$ and the local patch geometry, akin to the significance of mesh cell geometry in classical finite element simulations. A well-informed choice of these inputs holds the potential to keep the number of parameters in the neural network low, thereby reducing evaluation time during runtime and minimizing the data and time prerequisites for training.
In our implementation, we use the following inputs to the neural network:

• the prolongated solution $\tilde{x}_n^{L+k}$ and the nonlinear residual $\tilde{r}_n^{L+k}$, restricted to the patch, each of size $n_\text{dof}$ on mesh level $L + k$;
• the geometry of the cells, which for the 3D case consists of the edge lengths $h_e \in \mathbb{R}^{12}$; the lengths of the diagonals connecting the vertices which do not share the same face, $h_d \in \mathbb{R}^4$; and, for each vertex, the average of the 3 angles between the faces, $\alpha \in \mathbb{R}^8$.
The nonlinear residual plays a crucial role as it quantifies the local error or defect at each degree of freedom on level $L + k$ and therefore carries essential information about the underlying partial differential equation. Furthermore, the velocity and pressure fields contain details about the current solution and provide knowledge about the flow within the current patch. It is important to note that the geometric information of the cells also plays a significant role in our approach, as we impose no specific restrictions on the cells, except for the standard shape regularity requirements commonly employed in finite element methods.
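Putting these pieces together, the per-patch input is simply a concatenation of the local solution, the local residual, and the 24 geometric features; the following sketch assumes the geometric quantities have already been computed from the coarse cell:

```python
import numpy as np

def assemble_patch_input(x_loc, r_loc, h_e, h_d, a_v):
    """Network input for one patch (sketch): local solution and residual plus
    the 12 + 4 + 8 = 24 geometric features of the coarse cell.

    h_e -- 12 edge lengths
    h_d -- 4 diagonals between vertices that do not share a face
    a_v -- 8 values: per-vertex average of the 3 adjacent face angles
    """
    assert len(h_e) == 12 and len(h_d) == 4 and len(a_v) == 8
    return np.concatenate([x_loc, r_loc, h_e, h_d, a_v])  # size 2*n_dof + 24
```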

Numerical examples
In the following paragraphs we document different numerical simulations. The test cases are based on a well-established 3D Navier-Stokes benchmark problem [55,10]. While the neural network is trained on the original benchmark configuration, we study its performance and generalization on a set of modified problems. As perturbations we consider increasing or decreasing the Reynolds number and also a substantial change to the problem geometry by introducing a second obstacle. It is important to stress that the network is trained only once and no retraining is applied for the further test cases.

Implementation aspects
The DNN-MG method is implemented mainly using two libraries: the FEM library Gascoigne3D [8] and the machine learning library PyTorch [52]. The Newton-Krylov geometric multigrid method implemented in Gascoigne3D has been shown to be efficient in [21].
We implement the deep neural networks with libtorch, the C++ interface of PyTorch. Using C++ for the implementation allows us to couple the numerical methods and neural networks in a performant manner without unnecessary overhead. PyTorch also supports parallelization with MPI, which is used here to distribute training on multiple GPUs.

The 3D Benchmark Flow Around a Cylinder
We examine a variant of the three-dimensional flow benchmark presented in [55] in a setting similar to the 3D-2Z case, with different Reynolds numbers $Re \in \{200, 240\}$. We show the geometry in Figure 3. While the original benchmark description considers an obstacle with circular cross-section, we allow elliptical ones for testing the generalizability of the approach. We define the Reynolds number as

$$Re = \frac{\bar{v} D}{\nu},$$

where $\bar{v}$ is the average inflow velocity and $D$ the length of the major axis of the obstacle. When the cross-section of the cylinder is circular, $D$ represents its diameter. If the cross-section is elliptical, $D$ corresponds to the height of the obstacle, as the major axis is always parallel to the $y$-axis. The time step size is chosen as $k = 0.008$ on the interval $I = (0, 8]$. The initial velocity is $v(0) = 0$, and the inflow boundary condition is prescribed for $t > 0$ with a smooth startup process. The flow is driven by the parabolic inflow profile

$$v_\text{in}(t, 0, y, z) = \omega(t)\, \frac{16 \bar{v}\, y z (H - y)(H - z)}{H^4}, \tag{18}$$

where $H = 0.41$ is the height and width of the channel and $\omega(t)$ acts as a regularization of the startup phase.

Table 1: Spatial meshes with increasing number of multigrid levels and degrees of freedom (DoF). For the DNN-MG setups we indicate the size of the patches (counted as number of fine mesh elements) where the network is applied.
On the wall boundary $\Gamma_\text{wall}$ and on the obstacle we use no-slip boundary conditions. On the outflow boundary $\Gamma_\text{out}$ we use a do-nothing outflow condition [27]. The average flow rate is $\bar{v} = 1$ and the viscosity is $\nu = 5 \times 10^{-4}$. The training data is generated considering the circular obstacle, i.e. $D = 0.1$, which results in the Reynolds number $Re = 200$. For testing, the elliptical cross-section with $D = 0.12$ is considered, leading to the slightly increased Reynolds number $Re = 240$. See Section 5.2 for details. As in the classical benchmark description, we analyze the drag and lift coefficients of the obstacle. To compute the drag and lift coefficients, we use the Babuška-Miller trick, cf. [10,3], and rewrite the surface integrals as volume integrals to obtain superconvergence of fourth order.
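For reference, the Reynolds numbers of all configurations used below follow directly from $Re = \bar{v} D / \nu$; a quick check:

```python
# Re = v_bar * D / nu for the configurations used in Section 6
v_bar = 1.0                       # average inflow velocity
configs = [(0.10, 5.00e-4),       # circular obstacle (training)
           (0.12, 5.00e-4),       # elliptical obstacle (test)
           (0.12, 4.00e-4),
           (0.12, 6.67e-4)]
for D, nu in configs:
    print(f"D = {D:.2f}, nu = {nu:.2e}:  Re = {v_bar * D / nu:.0f}")
# prints Re = 200, 240, 300, 180
```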

Neural Network parameters
DNN-MG uses a neural network that operates on patches. In our experiments, a patch is a once or twice refined cell on level $L$. In Table 2 we list the dimensions and the number of trainable parameters of the resulting neural networks. The hidden size is arbitrary, but sizes of 512 or 750 have been shown to give good results while keeping the network relatively small. We generally use 8 hidden layers of the same size. Unless mentioned otherwise, the results presented here were obtained with a hidden size of 750. Networks with a width of 512 are considered only for comparison.

Neural Network training
Training data are generated on grids with three successive levels of refinement, see Table 1. Here, we denote by MG(3) the coarse grid. The grid levels MG(4) and MG(5) provide the training data for a network prediction over one, respectively two, grid levels. A single simulation is sufficient since, by the patch-wise application of the network, a single simulation provides $N_P \times N_T$ training items, where $N_P$ is the number of patches and $N_T$ is the length of the time series. Based on the experience gained in the 2D case [46], we consider a small subinterval of $I = [0, 8]$ for training. The networks were trained using 2 GPU nodes, each with 2 Nvidia A100 GPUs and 2 Intel Xeon Platinum 8360Y CPUs. The number of MPI processes was chosen equal to the number of GPUs such that each MPI process uses one GPU and CPU. Using this setup, the training of the feedforward network takes one day, regardless of the network size. Although the data loading process is implemented efficiently and the average load of the GPUs ranges from 80% for small networks to 90% for large networks, the limiting factor in terms of performance is the memory bandwidth.
The parameter $k$ is the number of levels we skip with the prediction by the DNN. We use a Tikhonov regularization with a scaling factor $\lambda = 10^{-5}$. To optimize our model, we utilize the AdamW optimizer [37] with a maximum of 1000 epochs. Finally, we select the model with the lowest validation loss. The convergence of the training for the different network configurations is reported in Figure 4.
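A compressed sketch of this training procedure is given below; the data loaders, the mean-squared loss on the patch-wise velocity updates, and the explicit form of the Tikhonov term are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train(net, train_loader, val_loader, epochs=1000, lam=1e-5):
    """Training sketch: AdamW, Tikhonov (L2) regularization with lam = 1e-5,
    and model selection by the lowest validation loss."""
    opt = torch.optim.AdamW(net.parameters())
    best_val = float("inf")
    for epoch in range(epochs):
        net.train()
        for x, d_true in train_loader:            # one row per patch
            opt.zero_grad()
            loss = F.mse_loss(net(x), d_true)
            loss = loss + lam * sum(p.pow(2).sum() for p in net.parameters())
            loss.backward()
            opt.step()
        net.eval()
        with torch.no_grad():
            val = sum(F.mse_loss(net(x), d) for x, d in val_loader).item()
        if val < best_val:                        # keep the best model
            best_val = val
            torch.save(net.state_dict(), "best_model.pt")
```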

Flow prediction
For testing we use the flow around an elliptic obstacle with an increased height of 0.12 compared to the training data (see Figure 3). The resulting finite element meshes have the same structure as in the training case, but the elements are distorted. We observe that DNN-MG is indeed able to predict high frequency fluctuations that are not visible in the coarse mesh solution. In particular, in the vicinity of the obstacle the quality of the solution is strongly enhanced, with distinct features in the wake being apparent in the DNN-MG simulation.
In addition to the viscosity $\nu = 5 \times 10^{-4}$ and $Re = 240$, we consider fluids with $\nu = 4 \times 10^{-4}$, corresponding to a Reynolds number of $Re = 300$, and $\nu = 6.67 \times 10^{-4}$, which corresponds to a Reynolds number of $Re = 180$. Both of these Reynolds numbers are calculated for the elliptic obstacle with a height of 0.12, which is our typical test case. The network is trained only for $Re = 200$ using the obstacle with circular cross-section.
Channel flow at Re = 180. In Figure 6 and Table 3, the drag and lift forces at Reynolds number 180, computed using DNN-MG and the classical MG method on different levels, are presented. The results show the predictive capacity of the DNN-MG approach in reproducing the dynamic behavior of the flow. In terms of the drag coefficient, DNN-MG yields a slight improvement in terms of the magnitude. The lift coefficients, on the other hand, improve greatly compared to the coarse solution, as the overall flow dynamics are well reconstructed, albeit with temporal deviations. We observe this in Figure 5 too, where different flow patterns emerge in the wake of the flow. Furthermore, with respect to the minimum and maximum lift values, our approach is in good agreement with the reference solution. Experience with finite element simulations usually shows that the lift is by far the harder functional to approximate [10].
Channel flow at Re = 240. Similarly, Figure 7 and Table 4 depict the drag and lift forces at Reynolds number 240, obtained through DNN-MG and the multigrid solution on the coarse and reference levels. Figure 5 shows a comparison of the velocity fields obtained using DNN-MG with the coarse mesh and reference solutions. The local corrections improve the solution at a global level, and DNN-MG is clearly able to capture and improve the flow characteristics. In particular, it has the capacity to significantly enhance the flow resolution, yielding improved solutions and reproducing intricate features that were previously unobservable at lower levels. This underscores the method's efficacy in capturing fine-scale details. We see similar behavior as at Re = 180, indicating good generalization of DNN-MG to different Reynolds numbers, considering the network was trained at Re = 200.
Channel flow at Re = 300. For Reynolds number 300, Figure 8 and Table 5 present the drag and lift forces predicted by DNN-MG. Remarkably, the good results in terms of drag and lift as well as the improvement of the velocity field we found at Re = 180 and Re = 240 extend to significantly higher Reynolds numbers. DNN-MG shows good robustness with respect to generalization to higher and lower Reynolds numbers than it was trained on. At Re = 300, the improvements in terms of the minima and maxima of the drag and lift coefficients are even better than the results for lower Reynolds numbers and are often the closest to the reference solution. Analyzing Tables 3, 4 and 5, we see that the results of DNN-MG mostly outperform even the MG(4) solution in terms of the drag and lift coefficients.
In Figure 9 we present the velocity and pressure errors. The results show a good improvement in the accuracy of DNN-MG over the coarse mesh solutions. Further, DNN-MG even outperforms the geometric multigrid solution on the intermediate level 4, between the coarse and the reference level. We see that DNN-MG has less than 50% overhead compared to a coarse mesh simulation. On the right we see the contributions of different parts of the program to the wall clock time in percent. We observe that the evaluation of the ANN takes less than 3% of the time.

Performance measurements
For the DNN-MG method to be advantageous, it must achieve two goals: first, it must enhance the accuracy compared to the solution on a coarse mesh; second, it must reduce the computation time compared to a solution on a fine mesh. However, the Newton-Krylov geometric multigrid method already is of optimal complexity [33,21]. In [46] this raised the question of what advantage DNN-MG offers, and we addressed it there for the 2D case. Here we adapt the theoretical findings to the 3D case.

Complexity analysis
The Newton-Krylov geometric multigrid method has linear complexity $O(N)$ in the number of DoFs $N$. With each global mesh refinement the number of DoFs increases by a factor of 8, i.e. $N_{l+1} \approx 8 N_l$. The constant hidden in $O(N)$, however, can be very significant, since on average 5 Newton steps are required in each time step, and within each Newton step one has to compute on average 12 GMRES steps with one sweep of the geometric multigrid solver as preconditioner. In addition, the geometric multigrid method usually involves six pre-smoothing and six post-smoothing steps. Thus, on each mesh level $L, L-1, \dots, 0$, a total of approximately 700 Vanka smoothing steps must be performed. One Vanka step thereby requires the inversion of about $N_l / 8$ small matrices of size $108 \times 108$, one on each element of the mesh, each resulting in approximately $108^3 \approx 1\,259\,712$ operations. Since the complexity over all mesh levels sums up to about $N_L + N_{L-1} + \dots + N_0 \approx N_L (1 + 8^{-1} + \dots + 8^{-L}) \approx \frac{8}{7} N_L$, we can estimate the effort for the complete solution process on level $L$ as $5 \cdot 12 \cdot (6 + 6) \cdot 108^3 \cdot \frac{8}{7} \approx 10^9 N_L$. We thereby only counted the effort for smoothing and neglected all further matrix, vector and scalar products. If we were to solve the problem on mesh level $L + 1$, the required effort would increase by a factor of 8 to approximately $8 \cdot 10^9 N_L$ operations.
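The constant in this estimate can be reproduced in a few lines; the factors follow exactly the counts given above:

```python
# operation-count estimate for one coarse-grid solve, cf. the text above
newton_steps  = 5            # average Newton steps per time step
gmres_steps   = 12           # average GMRES steps per Newton step
smoothing     = 6 + 6        # pre- and post-smoothing sweeps
vanka_ops     = 108**3       # cost of inverting one local 108 x 108 matrix
geometric_sum = 8 / 7        # sum over levels: 1 + 1/8 + 1/64 + ...

constant = newton_steps * gmres_steps * smoothing * vanka_ops * geometric_sum
print(f"level L:   ~{constant:.2e} * N_L operations")     # ~1.04e+09 * N_L
print(f"level L+1: ~{8 * constant:.2e} * N_L operations") # ~8.29e+09 * N_L
```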
On the other hand, the DNN-MG method only requires the prolongation of the solution to the next finer mesh and the evaluation of the neural network on each patch. If we again consider patches of minimal size (one patch is one element of mesh level $L + 1$), about $N_{L+1} / 8 = N_L$ patches must be processed. The effort for the evaluation of the network can be estimated by its number of trainable parameters. The DNN-MG approach requires only one evaluation of the network in each time step, and the number of trainable parameters is for all network models of the order of single-digit millions. Hence, the effort for correcting the level-$L$ Newton-Krylov solution by the neural network on level $L + 1$ can be estimated by $10^7 N_L$, which is almost a factor of a thousand lower than the number of FLOPs needed to solve on level $L + 1$ ($\approx 8 \cdot 10^9 N_L$).
For the prediction of two levels, the estimate for the computational cost of DNN-MG remains largely unchanged. While the size of the patches and the inputs and outputs of the network increase, the number of patches and network evaluations does not change. Surprisingly, we do not need to significantly increase the network size and can therefore bound the cost by $2 \cdot 10^7 N_L$ operations. This compares even more favorably than the prediction of a single mesh level, as the numerical solution on level $L + 2$ incurs a much higher cost of approximately $64 \cdot 10^9 N_L$.
Computational performance of DNN-MG. In Figure 10 (a) the runtimes of DNN-MG and a purely numerical MG solution are plotted. While for the MG solution the wall time increases by a factor of 8 with each added mesh level, the wall time of DNN-MG increases at most by a factor of 1.5. Note that the wall times of DNN-MG(3 + 2) and MG(5) thereby differ by a factor of 35. Although this has to be weighed against the loss in accuracy, DNN-MG is clearly capable of increasing the computational efficiency. Moreover, DNN-MG(3 + 2) consistently outperformed MG(4) in terms of accuracy in the previous tests. Remarkably, it still has an advantage in terms of the runtime, which once more underlines the computational efficiency of DNN-MG.
As can be seen in Figure 10 (b), the time spent in the solver in DNN-MG is constant, which shrinks the share of the solver in the wall time while other parts of the program gain weight. The main increase in computation time is due to the increased cost of the assembly of the right-hand side and the evaluation of the functionals, since these calculations are done on the higher levels. Note that these operations are also of order $O(N)$, but with a much smaller constant than the other operations in the program, e.g. the operations mentioned above in our theoretical considerations. This makes the DNN-MG method highly efficient for 1- or 2-level predictions. More levels are not feasible at this point due to limitations in the underlying FEM library. Further investigation of the scalability of the method when predicting more mesh levels is part of future work. Predicting weights of higher order polynomials instead of the weights of basis functions on refined meshes could further reduce the computational burden and increase the computational efficiency.
The evaluation of the DNN takes less than 3% of the wall time, which is less than in our previous 2D results [46]. This is due to substantial improvements in our implementation: collecting the input for the neural network and processing the output is parallelized, and the evaluation of the network is done on a GPU. Therefore, DNN-MG not only reduces the cost of accurate simulations, but it is also well-suited for heterogeneous computing environments.

Generalizability
In order to ensure the practicality of the DNN-MG method, it must generalize well to similar flows beyond those seen during training. The previous results on the 3D benchmark demonstrate the network's ability to do so under small geometric perturbations and varying Reynolds numbers. Here, we evaluate the network's performance under more substantial changes, considering a channel with two obstacles shown in Figure 11. As for the single obstacle case, we test the performance under varying Reynolds numbers with viscosities $\nu = 5 \times 10^{-4}$ ($Re = 240$), $\nu = 4 \times 10^{-4}$ ($Re = 300$) and $\nu = 6.67 \times 10^{-4}$ ($Re = 180$). We adopt several settings from the single obstacle case: the boundary conditions, in particular the inflow at the left boundary $\Gamma_\text{in}$ given by (18), the initial velocity $v(0) = 0$, and the time step size $k = 0.008$ on the interval $I = (0, 8]$. We reuse the network trained for the single-obstacle channel at Reynolds number $Re = 200$. The neural network component of DNN-MG operates on level 5 as fine resolution and level 3 as coarse resolution in this section. In order not to overburden this paper, we leave out the detailed results of a single level prediction. The complexity of the finite element meshes is given in Table 6.
Channel with two obstacles at Re = 180. In Figure 13 and Table 7, the drag and lift coefficients obtained using DNN-MG at Reynolds number 180 are presented. The results demonstrate the method's ability to accurately predict the dynamics of the flow. DNN-MG greatly improves the magnitude of the drag and closely follows the trajectory of the reference solution up to the small-scale dynamics. The lift is very similar in that we are able to reconstruct the overall dynamics of the flow, although we do not obtain the same temporal behavior. We also observe this in Figure 12, through the different flow patterns. In terms of the minimum and maximum of the lift we are very close to the reference solution.
Channel with two obstacles at Re = 240. Likewise, Figure 14 and Table 8 show the drag and lift forces at Reynolds number 240, further substantiating the accuracy of DNN-MG and its ability to improve the solution. Figure 12 further demonstrates the effectiveness of DNN-MG in accurately capturing the velocity magnitudes for scenarios on which the DNN was not trained. The figure shows the capability of the method to effectively increase the resolution of the flow, to improve the solution, and to add features that could not be observed on lower levels but are present on higher levels. This is a good example of the efficacy of the localized predictions of DNN-MG, which enable this generalization. Analogous to the case with one obstacle, we observe good generalization of DNN-MG to different Reynolds numbers. The good results we found for the drag and lift coefficients and the velocity field at Re = 180 translate similarly to the Re = 240 case.

Channel with two obstacles at Re = 300. For Reynolds number 300, Figure 15 and Table 9 present the drag and lift forces predicted by DNN-MG. The results demonstrate the method's ability to capture the complex flow characteristics at higher Reynolds numbers. We see that DNN-MG generalizes well to different geometries and Reynolds numbers. Considering the scenarios tested so far, we conclude that DNN-MG indeed offers great potential in terms of computational efficiency. Figure 16 demonstrates the good performance of DNN-MG in terms of the velocity and pressure errors. The figure shows a large improvement in accuracy over the MG(3) solution and remarkably even outperforms the MG(4) solution. This further substantiates the efficiency in terms of runtime we demonstrated in Section 6.6. Overall, DNN-MG is robust with respect to changes of the geometry and material parameters and is able to improve solutions to PDEs in terms of characteristic quantities of interest and error measures.

Conclusion
We have presented the deep neural network multigrid solver (DNN-MG), which uses a deep neural network to improve the efficiency of a geometric multigrid solver, e.g. for the simulation of the Navier-Stokes equations. Previously demonstrated for 2D simulations, we extended DNN-MG to 3D and reformulated it in a rigorous manner. Despite the increased complexity of 3D flows, the algorithm remains applicable and delivers even larger speed-ups than in 2D, while retaining efficiency, generalizability, and scalability.
We established the efficacy of DNN-MG for large-scale simulations in regimes where direct solvers are no longer feasible. The overhead is small, and in 3D we can accelerate high fidelity simulations by a factor of 35. This comes at a trade-off in quality compared to the high fidelity solution. However, we were able to consistently outperform classical solutions on the intermediate levels, both in terms of solution quality and wall time. This efficiency-performance trade-off establishes DNN-MG as a highly promising approach for accelerating numerical solution methods for PDEs.
We ascertained a great generalization capability of DNN-MG due to the local approach. We trained the network on only a single flow scenario at Re = 200 and obtained good results for both lower and higher Reynolds numbers. Our results showed that DNN-MG substantially improves the solution accuracy, as measured by the $L^2$-errors, and reduces the errors of the drag and lift functionals across all tested scenarios. The experiments with a two-obstacle flow at different Reynolds numbers were particularly successful.
Especially noteworthy is that DNN-MG is able to predict flow profiles of completely different dynamics. The mesh resolutions MG(3) and MG(5) considered here produce flows with very different character. In particular in the test problem with two obstacles, there are hardly any oscillations on the coarse grid; these oscillations are nevertheless correctly reproduced by DNN-MG. The local approach is thus able to identify global structures of the Navier-Stokes solution and to correct the solution in both its local and temporal dynamics.
Although DNN-MG generalizes well, there are limits to this approach. In future work we want to develop an online learning approach to retrain the network adaptively based on the uncertainty of the predictions. Further, we want to incorporate physical information by including the residual of the PDE in the loss function. We further want to investigate the application of DNN-MG to other PDEs.

The spatial distribution of the error for the case of Re = 240, both for pressure and velocity, is shown in Figures 17 and 18. Analogously, the case with two obstacles at the same Reynolds number is shown in Figures 19 and 20.
We have seen great improvement in terms of the drag and lift coefficients in Section 6. In accordance with this, we see that the error around the obstacles is greatly reduced by DNN-MG. Moreover, we observe a significant error reduction over the entire domain. The main error contributions are due to the different flow patterns evolving in the wake of the channel, which we can also observe in Figures 5 and 12. Towards the end of the channel these differences seem to dissipate. Overall, the spatial distribution of the error shows that DNN-MG is capable of globally improving the errors via the local approach. Through the localized approach, the method could be extended to use localized error estimators to train the network on the critical regions in combination with an online learning approach.

Figure 2: DNN-MG adopts a local approach where the neural network works on small neighborhoods of the simulation domain called patches, and independently provides a velocity correction for each one.

Figure 3: Geometry of the training and test scenario with a parabolic inflow profile at $\Gamma_\text{in}$, do-nothing boundary conditions at the outflow boundary $\Gamma_\text{out}$ and no-slip conditions on the walls $\Gamma_\text{wall}$. The center of the obstacle is at (0.5, 0.2).

Figure 4: The average training and validation loss of the feedforward network architecture with hidden size (a) 512 and (b) 750. Due to the data efficiency of DNN-MG and the large number of batches (≈ 900 000) relative to the network size, we were able to achieve a good training loss after a single epoch, and there is only little improvement in the training loss beyond that point.

Figure 5: Comparison of the magnitude of the velocity field for the channel with a cylindrical obstacle with an elliptical cross-section at Re = 240 and time t = 7.5. The results are shown for DNN-MG(3 + 2), along with the reference solution obtained with 5 levels (MG(5)), as well as the coarse mesh solution at level 3 (MG(3)). The DNN-MG(3 + 2) method utilizes a network trained on simulation data from a round obstacle. The first three images show a cross-section of the domain in the xy-plane and the others show a cross-section of the domain in the xz-plane.

Figure 6: The drag (left) and lift (right) functionals for the channel with one obstacle with an elliptical cross-section at Re = 180 are shown. The results are presented for three discretizations: the coarse mesh MG(3), the reference MG(5), and the DNN-MG(3 + 2) improved by a neural network.

Figure 7: The drag (left) and lift (right) functionals for the channel with one obstacle with an elliptical cross-section at Re = 240 are shown. The results are presented for three discretizations: the coarse mesh MG(3), the reference MG(5), and the DNN-MG(3 + 2) improved by a neural network.

Figure 8: The drag (left) and lift (right) functionals for the channel with one obstacle with an elliptical cross-section at Re = 300 are shown. The results are presented for three discretizations: the coarse mesh MG(3), the reference MG(5), and the DNN-MG(3 + 2) improved by a neural network.

Figure 9: Relative velocity errors (first row) and pressure errors (second row) for coarse solutions with and without ANN correction (MG(3) and DNN-MG(3 + 2) − 750) compared to the reference solution on level L + 2 for different Reynolds numbers. As further evidence of the innovative capacity of DNN-MG we show the errors of the MG(4) solution as well, which we consistently outperform.

Figure 10: Wall clock times of DNN-MG compared to purely numerical MG solutions (a), and the contributions of the different parts of the program to the wall clock time (b).

Figure 11: Geometry of the training and test scenario with a parabolic inflow profile at $\Gamma_\text{in}$, do-nothing boundary conditions at the outflow boundary $\Gamma_\text{out}$ and no-slip conditions on the walls $\Gamma_\text{wall}$. The center of the obstacle is at (0.5, 0.2).

Figure 12: Comparison of the magnitude of the velocity field for the channel with two cylindrical obstacles at time t = 7.5. The first obstacle has a circular cross-section and the second obstacle has an elliptical cross-section, which results in a Reynolds number of Re = 240. The results are shown for DNN-MG(3 + 2), along with the reference solution obtained with 5 levels (MG(5)), as well as the coarse mesh solution at level 3 (MG(3)). The DNN-MG(3 + 2) method utilizes a network trained on simulation data from a round obstacle. The first three images show a cross-section of the domain in the xy-plane and the others show a cross-section of the domain in the xz-plane.

Figure 13: The drag (left) and lift (right) functionals for the channel with two obstacles at Re = 180 are shown. The results are presented for three discretizations: the coarse mesh MG(3), the reference MG(5), and the DNN-MG(3 + 2) improved by a neural network.

Figure 14: The drag (left) and lift (right) functionals for the channel with two obstacles at Re = 240 are shown. The results are presented for three discretizations: the coarse mesh MG(3), the reference MG(5), and the DNN-MG(3 + 2) improved by a neural network.

Figure 15: The drag (left) and lift (right) functionals for the channel with two obstacles at Re = 300 are shown. The results are presented for three discretizations: the coarse mesh MG(3), the reference MG(5), and the DNN-MG(3 + 2) improved by a neural network.

Figure 16: Relative velocity errors (first row) and pressure errors (second row) for coarse solutions with and without ANN correction (MG(3) and DNN-MG(3 + 2) − 750) compared to the reference solution on level L + 2 for different Reynolds numbers. As in the case with a single obstacle, we show the errors of the MG(4) solution as well, which we again outperform.

Figure 17: Comparison of the error of the pressure for the channel with one cylindrical obstacle at time t = 7.5 at Reynolds number Re = 240 (cf. Figure 5). The results are shown for DNN-MG(3 + 2), as well as the coarse mesh solution at level 3 (MG(3)), with respect to the reference with 5 levels (MG(5)). The first two images show a cross-section in the xy-plane and the others show a cross-section in the xz-plane.


Figure 18: Comparison of the error of the velocity field for the channel with one cylindrical obstacle at time t = 7.5 at Reynolds number Re = 240 (cf. Figure 5). The results are shown for DNN-MG(3 + 2), as well as the coarse mesh solution at level 3 (MG(3)), with respect to the reference with 5 levels (MG(5)). The first two images show a cross-section in the xy-plane and the others show a cross-section in the xz-plane.

Figure 19: Comparison of the error of the pressure for the channel with two cylindrical obstacles at time t = 7.5 at Reynolds number Re = 240 (cf. Figure 12). The results are shown for DNN-MG(3 + 2), as well as the coarse mesh solution at level 3 (MG(3)), with respect to the reference with 5 levels (MG(5)). The first two images show a cross-section in the xy-plane and the others show a cross-section in the xz-plane.

Figure 20: Comparison of the error of the velocity field for the channel with two cylindrical obstacles at time t = 7.5 at Reynolds number Re = 240 (cf. Figure 12). The results are shown for DNN-MG(3 + 2), as well as the coarse mesh solution at level 3 (MG(3)), with respect to the reference with 5 levels (MG(5)). The first two images show a cross-section in the xy-plane and the others show a cross-section in the xz-plane.

The neural network operates on a single patch $P_{L,k}$ and predicts the velocity update $d_{n,P}^{L+k}$ specifically for the degrees of freedom within that patch. Since a single neural network is used in parallel on each patch, all patches must have the same structure and consist of the same number of fine mesh elements.
1. Local restriction by $\mathcal{R} : \mathbb{R}^{N_\text{dof}^{L+k}} \to \mathbb{R}^{N_\text{cells}^L \times n_\text{dof}}$: Using a local restriction operator, the globally defined data of size $N_\text{dof}^{L+k}$ is transformed into local, patch-wise defined input of size $n_\text{dof}$ suitable for the DNN. This is done for each patch, resulting in a matrix of size $N_\text{cells}^L \times n_\text{dof}$.
2. Patch-wise prediction by the neural network $\mathcal{N} : \mathbb{R}^{2 n_\text{dof} + n_\text{Geo}} \to \mathbb{R}^{n_\text{dof}}$: For each patch, the neural network $\mathcal{N}$ maps the local input data of size $2 n_\text{dof} + n_\text{Geo}$ to a local output of size $n_\text{dof}$, predicting the velocity update for the degrees of freedom within the patch $P_{L,k}$.
3. Global extension by $\mathcal{E} : \mathbb{R}^{N_\text{cells}^L \times n_\text{dof}} \to \mathbb{R}^{N_\text{dof}^{L+k}}$: The locally defined output of the DNN is processed by the extension operator $\mathcal{E}$, which maps the local data of size $N_\text{cells}^L \times n_\text{dof}$ to a single globally defined defect vector of size $N_\text{dof}^{L+k}$.

Table 2: The input size $n_\text{in}$, hidden size $n_\text{hidden}$, output size $n_\text{out}$ and the number of trainable parameters of the feedforward network, for different numbers of predicted levels.

Table 3: The min, max, mean and amplitude of the drag and lift functionals in the same setting as in Figure 6. The best results in comparison to the MG(5) solution are highlighted with color coding: 1st best, 2nd best, 3rd best.

Table 4: The min, max, mean and amplitude of the drag and lift functionals in the same setting as in Figure 7. The best results in comparison to the MG(5) solution are highlighted with color coding: 1st best, 2nd best, 3rd best.

Table 5: The min, max, mean and amplitude of the drag and lift functionals in the same setting as in Figure 8. The best results in comparison to the MG(5) solution are highlighted with color coding: 1st best, 2nd best, 3rd best.

Table 6: Spatial meshes with increasing number of multigrid levels and unknowns.

Table 7: The min, max, mean and amplitude of the drag and lift functionals in the same setting as in Figure 13. The best results in comparison to the MG(5) solution are highlighted with color coding: 1st best, 2nd best, 3rd best.

Table 8: The min, max, mean and amplitude of the drag and lift functionals in the same setting as in Figure 14. The best results in comparison to the MG(5) solution are highlighted with color coding: 1st best, 2nd best, 3rd best.

Table 9: The min, max, mean and amplitude of the drag and lift functionals in the same setting as in Figure 15. The best results in comparison to the MG(5) solution are highlighted with color coding: 1st best, 2nd best, 3rd best.