End-to-end differentiable learning of turbulence models from indirect observations

The emergence of the differentiable programming paradigm in scientific computing makes it feasible to train deep learning turbulence models using indirect observations. This paper demonstrates the viability of this approach and presents an end-to-end differentiable framework for training deep neural networks to learn eddy viscosity models from indirect observations derived from the velocity and pressure fields. The framework consists of a Reynolds-averaged Navier-Stokes (RANS) solver and a neural-network-represented turbulence model, each accompanied by its derivative computations. For computing the sensitivities of the indirect observations to the Reynolds stress field, we use the continuous adjoint equations for the RANS equations, while the gradient of the neural network is obtained via its built-in automatic differentiation capability. We demonstrate the ability of this approach to learn the true underlying turbulence closure when one exists by training models using synthetic velocity data from linear and nonlinear closures. We also train a linear eddy viscosity model using synthetic velocity measurements from direct numerical simulations of the Navier-Stokes equations, for which no true underlying linear closure exists. The trained deep-neural-network turbulence model showed predictive capability on similar flows.


Introduction
There is still a practical need for improved closure models for the Reynolds-averaged Navier-Stokes (RANS) equations. Currently, the most widely used turbulence models are linear eddy viscosity models (LEVM), which presume that the Reynolds stress anisotropy is proportional to the mean strain rate. Although widely used, LEVM do not provide accurate predictions in many flows of practical interest; for instance, they are unable to predict the secondary flows in non-circular ducts (Speziale 1982). Alternatively, nonlinear eddy viscosity models (NLEVM) capture nonlinear relations involving both the strain and rotation rate tensors. NLEVM, however, do not yield consistent improvement over LEVM and can suffer from poor convergence (Gatski & Speziale 1993). Data-driven turbulence models are an emerging alternative to traditional single-point closures. Data-driven NLEVM use the integrity basis representation (Pope 1975; Ling et al. 2016) to learn the mapping from the velocity gradient field to the normalised Reynolds stress anisotropy field, and retain the transport equations for turbulence quantities from a traditional model. It has been natural to train such models using Reynolds stress data (e.g. Ling et al. 2016; Weatheritt & Sandberg 2017). However, Reynolds stress data from high-fidelity simulations, i.e. from direct numerical simulations (DNS) of the Navier-Stokes equations, are mostly limited to simple geometries and low Reynolds numbers. It is therefore desirable to train with indirect observations, such as quantities based on the velocity or pressure fields, for which experimental data are available for a much wider range of complex flows. Such measurements include full-field data, e.g. from particle image velocimetry, sparse point measurements, e.g. from pressure probes, and integral quantities such as lift and drag.
Training with indirect observations has the additional advantage of circumventing the need to extract turbulence scales that are consistent with the RANS modelled scales from the high fidelity data (e.g. Weatheritt & Sandberg 2017).
Recently, Zhao et al. (2020) learned a zonal turbulence model for the wake-mixing regions in turbomachines in symbolic form (e.g. polynomials and logarithms) from indirect observation data by using genetic algorithms. However, while symbolic models are easier to interpret, they may have limited expressive power as compared to, for instance, deep neural networks (Raghu et al. 2017), which are successive compositions of nonlinear functions. Symbolic models may therefore not generalise well and may be limited to zonal approaches. More importantly, gradient-free optimisation methods such as genetic programming may not be as efficient as gradient-descent methods, and the latter should be used whenever available (Audet & Kokkolaras 2016). In particular, deep learning methods (LeCun et al. 2015) have achieved remarkable success in many fields and represent a promising approach for data-driven NLEVM (e.g. Ling et al. 2016).
A major obstacle for gradient-based learning from indirect observations is that a RANS solver must be involved in the training, and the RANS sensitivities are required to learn the model. While such sensitivities can be obtained by using adjoint equations, which have long been used in aerodynamic shape optimisation (Jameson 1988), adjoint solvers are not generally readily available or straightforward to implement. The emerging interest in differentiable programming is producing efficient methods for developing adjoints accompanying physical models, including modern programming languages with built-in automatic differentiation (Bezanson et al. 2019) and neural-network-based solutions of partial differential equations (Raissi et al. 2019). Recently, Holland et al. (2019) used the discrete adjoint to learn a corrective scalar multiplicative field in the production term of the Spalart-Allmaras transport model. This is based on an alternative approach to data-driven turbulence modelling (Parish & Duraisamy 2016) in which empirical correction terms for the turbulence transport equations are learned while retaining a traditional linear closure (LEVM).
In this work we demonstrate the viability of training deep neural networks to learn general eddy viscosity closures (NLEVM) using indirect observations. We embed a neural-network-represented turbulence model into a RANS solver using the integrity basis representation, and as a proof of concept we use the continuous adjoint equations to obtain the required RANS sensitivities. This leads to an end-to-end differentiable framework that provides the gradient information needed to learn turbulence models from indirect observations.

Differentiable framework for learning turbulence models
In this proposed framework a neural network is trained by optimising an objective function that depends on quantities derived from the network's outputs rather than on those outputs directly. The training framework is illustrated schematically in figure 1 and consists of two components: the turbulence model and the objective function. Each of these two components has a forward model that propagates inputs to outputs and a backwards model that provides the derivatives of the outputs with respect to the inputs or parameters. The gradient of the objective function J with respect to the network's trainable parameters w is obtained by combining the derivative information from the two components through the chain rule as

∂J/∂w = (∂J/∂τ)(∂τ/∂w), (2.1)

where τ is the Reynolds stress tensor predicted by the turbulence model.
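In discrete form the chain rule of equation (2.1) is a contraction of two Jacobians. A minimal numpy sketch, with hypothetical dimensions and random placeholder values standing in for the two derivative computations:

```python
import numpy as np

# Hypothetical sizes: n_w trainable parameters, n_tau discrete Reynolds
# stress degrees of freedom. The two Jacobians below are placeholders for
# the outputs of the adjoint RANS solve and of network backpropagation.
rng = np.random.default_rng(0)
n_w, n_tau = 5, 8

dJ_dtau = rng.normal(size=n_tau)          # sensitivity from the adjoint solve
dtau_dw = rng.normal(size=(n_tau, n_w))   # Jacobian from backpropagation

# Chain rule (2.1): dJ/dw = (dJ/dtau) . (dtau/dw)
dJ_dw = dJ_dtau @ dtau_dw
```

The contraction is over the Reynolds stress degrees of freedom, leaving one gradient entry per trainable parameter.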

Forward model
For given values of the trainable parameters w, the forward model evaluates the cost function J, which is the discrepancy between predicted and observed quantities. The forward evaluation consists of two main components: (i) evaluating the neural network turbulence model and (ii) mapping the network's outputs to observation space by first solving the incompressible RANS equations.
The turbulence model, shown in the left box in figure 1, maps the velocity gradient field to the Reynolds stress field. The integrity basis representation for a general eddy viscosity model (Pope 1975) is given as

τ = a + (2/3)k I,   a = 2k b,   b = Σ_n g^(n)(θ) T^(n), (2.2)

where a is the anisotropic (deviatoric) component of the Reynolds stress, b is the normalised anisotropic Reynolds stress, T^(n) and θ are the basis tensor functions and scalar invariants of the input tensors, g^(n) are the scalar coefficient functions to be learned, and I is the second rank identity tensor. The input tensors are the symmetric and antisymmetric components of the normalised velocity gradient: S = (t_τ/2)(∇u + ∇u^T) and R = (t_τ/2)(∇u − ∇u^T), where t_τ is the turbulence time scale and u is the mean velocity. The linear and quadratic terms in the integrity basis representation are given as

T^(1) = S,   T^(2) = SR − RS,   T^(3) = S^2 − (1/3){S^2}I,   T^(4) = R^2 − (1/3){R^2}I,
θ_1 = {S^2},   θ_2 = {R^2}, (2.3)

where curly braces {·} indicate the tensor trace. Different eddy viscosity models differ in the form of the scalar coefficient functions θ → g and in the models for the two turbulence scale quantities k and t_τ. We represent the scalar coefficient functions using a deep neural network with 10 hidden layers of 10 neurons each and rectified linear unit (ReLU) activation functions. The turbulence scaling parameters are modelled using traditional transport equations T(k, t_τ) = 0, with the turbulent kinetic energy production term P modified to account for the expanded formulation of the Reynolds stress: P = τ : ∇u, where : denotes double contraction of tensors.
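The basis construction above is straightforward to transcribe. A sketch, assuming the standard linear and quadratic Pope basis of equation (2.3) (function and variable names are illustrative):

```python
import numpy as np

def basis_tensors(grad_u, t_tau):
    """Normalised input tensors, the linear and quadratic basis tensors
    T^(1)..T^(4), and the invariants theta_1, theta_2 (Pope 1975).
    grad_u is the 3x3 mean velocity gradient, t_tau the turbulence time scale."""
    S = 0.5 * t_tau * (grad_u + grad_u.T)   # normalised strain rate
    R = 0.5 * t_tau * (grad_u - grad_u.T)   # normalised rotation rate
    I = np.eye(3)
    T = [S,
         S @ R - R @ S,
         S @ S - np.trace(S @ S) / 3.0 * I,
         R @ R - np.trace(R @ R) / 3.0 * I]
    theta = (np.trace(S @ S), np.trace(R @ R))
    return T, theta

# example: a simple shear velocity gradient
T, theta = basis_tensors(np.array([[0.0, 1.0, 0.0],
                                   [0.0, 0.0, 0.0],
                                   [0.0, 0.0, 0.0]]), 2.0)
```

Note that θ_1 = {S^2} is non-negative and θ_2 = {R^2} is non-positive, since R is antisymmetric; this is used later when evaluating the Shih model coefficients.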
The RANS solver, along with its post-processing, serves as an observation operator that maps the turbulence model's output (the Reynolds stress τ) to the observation quantities (e.g. sparse velocity measurements, drag). This is shown in the right box in figure 1. The first step in this operation is to solve the RANS equations with the given Reynolds stress closure to obtain the mean velocity u and pressure p fields. This is followed by post-processing to obtain the observation quantities (e.g. sampling velocities at certain locations or integrating the surface pressure to obtain drag). When solving the RANS equations, explicit treatment of the divergence of the Reynolds stress can make the RANS equations ill-conditioned (Wu et al. 2019; Brener et al. 2021). We treat part of the linear term implicitly by use of an effective viscosity ν_eff, which is easily obtained since, with the integrity basis representation, the linear term is learned independently. The incompressible RANS equations are then given as

u · ∇u − ∇ · (ν_eff ∇u) + ∇p* + ∇ · a_NL − s = 0,   ∇ · u = 0, (2.4)

where the term ∇ · (ν_eff ∇u) is treated implicitly. Here a_NL represents the nonlinear component of the Reynolds stress anisotropy, the isotropic component of the Reynolds stress is incorporated into the pressure term p*, and s is a momentum source term representing any external body forces. The coefficients g^(i) are outputs of the neural-network-based turbulence model, which takes the fields θ as inputs.
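The implicit/explicit split can be sketched as follows. Since the linear term of equation (2.2) is 2k g^(1) S = 2 g^(1) k t_τ × (dimensional strain rate), it is equivalent to an eddy viscosity ν_t = −g^(1) k t_τ, and the remainder a_NL is treated explicitly. A sketch under those assumptions (names illustrative):

```python
import numpy as np

def split_stress(grad_u, k, t_tau, g, nu):
    """Split the modelled anisotropy into an implicit eddy viscosity nu_t
    (absorbed into nu_eff = nu + nu_t) and an explicit nonlinear part a_NL."""
    S = 0.5 * t_tau * (grad_u + grad_u.T)
    R = 0.5 * t_tau * (grad_u - grad_u.T)
    I = np.eye(3)
    T = [S,
         S @ R - R @ S,
         S @ S - np.trace(S @ S) / 3.0 * I,
         R @ R - np.trace(R @ R) / 3.0 * I]
    a = 2.0 * k * sum(gi * Ti for gi, Ti in zip(g, T))   # full anisotropy
    nu_t = -g[0] * k * t_tau                             # from the linear term
    strain = 0.5 * (grad_u + grad_u.T)                   # dimensional strain rate
    a_nl = a + 2.0 * nu_t * strain                       # a minus its linear part
    return nu + nu_t, a_nl

# for a purely linear closure (g^(1) = -0.09) the explicit remainder vanishes
nu_eff, a_nl = split_stress(np.array([[0.0, 1.0, 0.0],
                                      [0.0, 0.0, 0.0],
                                      [0.0, 0.0, 0.0]]),
                            k=1.0, t_tau=0.5, g=[-0.09, 0.0, 0.0, 0.0], nu=1e-3)
```

With a purely linear closure, the whole anisotropy is absorbed into ν_eff and a_NL is identically zero, which is why this split conditions the equations well.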

Adjoint model
For a proposed value of the trainable parameters w the backwards model (represented by left-pointing arrows in figure 1) provides the gradient ∂J/∂w of the cost function with respect to the trainable parameters. This is done by separately obtaining the two terms on the right hand side of equation (2.1): (i) the gradient of the Reynolds stress ∂τ /∂w from the neural network turbulence model and (ii) the sensitivity ∂J/∂τ of the cost function to the Reynolds stress, using their respective adjoint models. Combining the two adjoint models results in an end-to-end differentiable framework, whereby the gradient of observation quantities (e.g. sparse velocity) with respect to the neural network's parameters can be obtained.
The gradient of the Reynolds stress with respect to the neural network's parameters w is obtained in two parts using the chain rule. The gradient of the neural network's outputs with respect to its parameters, ∂g/∂w, is efficiently obtained by backpropagation, which is a reverse accumulation automatic differentiation algorithm for deep neural networks that applies the chain rule on a per-layer basis. The sensitivities of the Reynolds stress to the coefficient functions are obtained as ∂τ /∂g (i) = 2kT (i) from differentiation of equation 2.2, which is a linear tensor equation.
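Because equation (2.2) is linear in the coefficients, the sensitivity ∂τ/∂g^(i) = 2kT^(i) can be confirmed by a finite-difference check. A sketch with arbitrary symmetric placeholder tensors standing in for the basis tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 0.4
# placeholder symmetric tensors standing in for the basis tensors T^(i)
T = [0.5 * (M + M.T) for M in rng.normal(size=(4, 3, 3))]

def tau(g):
    """Reynolds stress from equation (2.2): tau = 2k sum_i g_i T_i + (2/3)k I."""
    return 2.0 * k * sum(gi * Ti for gi, Ti in zip(g, T)) + (2.0 / 3.0) * k * np.eye(3)

g0, eps = rng.normal(size=4), 1e-6
for i in range(4):
    dg = np.zeros(4)
    dg[i] = eps
    fd = (tau(g0 + dg) - tau(g0)) / eps   # finite-difference derivative
    assert np.allclose(fd, 2.0 * k * T[i], atol=1e-5)
```

The finite-difference derivative matches 2kT^(i) to rounding error, as expected for a linear map.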
For the sensitivity of the objective function with respect to the Reynolds stress we derived the appropriate continuous adjoint equations. Since the Reynolds stress must satisfy the RANS equations, this is a constrained optimisation problem. The problem is reformulated as the optimisation of an unconstrained Lagrangian function with the Lagrange multipliers described by the adjoint equations. The resulting adjoint equations are

−∇û · u − u · ∇û + ∇p̂ − ∇ · (ν_eff(∇û + ∇û^T)) = −∂J_Ω/∂u,   ∇ · û = ∂J_Ω/∂p, (2.5)

where û and p̂ are the Lagrange multipliers, referred to as the adjoint velocity and adjoint pressure, respectively, and J_Ω is the scalar field such that the objective function is given as the integral J = ∫ J_Ω dΩ over the solution domain Ω. When the cost function is instead given as an integral over the domain boundary (e.g. drag), the cost function affects the boundary conditions of the adjoint equations instead (Othmer 2008). When solving the adjoint equations using a segregated approach, such as the SIMPLE algorithm used here, the adjoint transpose convection term ∇û · u is treated explicitly and can result in instabilities (Oriani & Pierrot 2016). For this reason it is common to dampen or eliminate this term (Othmer 2014), and here we eliminate it. After solving the adjoint equations, the sensitivity of the objective function J to the Reynolds stress is given by the gradient of the adjoint velocity:

∂J/∂τ = −∇û. (2.6)

The details of the derivation and use of the continuous adjoint equations are given in Michelén Ströfer (2021) for interested readers.
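The efficiency of the adjoint approach can be illustrated with a discrete linear analogue (a sketch only; the matrices A and B and weights c below are hypothetical stand-ins, not the paper's continuous operators). One solve with the transposed operator yields the sensitivity of J to every component of τ at once, whereas forward finite differences would require one solve per component:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.normal(size=(n, n)) + n * np.eye(n)   # stand-in for the linearised solver operator
B = rng.normal(size=(n, n))                   # how tau enters the momentum equation
c = rng.normal(size=n)                        # observation weights: J = c . u

def J(tau):
    """Forward model: solve A u = B tau, then observe."""
    return c @ np.linalg.solve(A, B @ tau)

lam = np.linalg.solve(A.T, c)                 # single adjoint solve
dJ_dtau = B.T @ lam                           # full sensitivity vector

# finite-difference check of the first component
tau0, eps = rng.normal(size=n), 1e-6
e = np.zeros(n)
e[0] = eps
assert np.isclose((J(tau0 + e) - J(tau0)) / eps, dJ_dtau[0], rtol=1e-4)
```

This mirrors the structure of the continuous adjoint: the adjoint variable plays the role of û, and the final matrix-vector product plays the role of equation (2.6).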

Gradient descent procedure
The training is done using the Adam algorithm, a gradient descent algorithm with momentum and adaptive gradients that is commonly used in training deep neural networks. The default values of the Adam hyperparameters are used, including a learning rate of 0.001. The training requires solving the RANS equations at each training step. In a given training step the inputs θ are updated based on the previous RANS solution and scaled to the range 0-1, and the RANS equations are then solved with fixed values of the coefficient functions g. The inputs θ and their scaling parameters are thus fixed within a given training step and converge alongside the main optimisation of the trainable parameters w.
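The Adam update itself is compact. The following sketch shows the update rule with its default hyperparameters, applied to a toy quadratic objective standing in for the RANS-based misfit (all names illustrative; the actual training loop wraps a RANS solve):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with the default hyperparameters, including the
    learning rate of 0.001 used here."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (adaptive) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# toy objective J(w) = |w - w_star|^2 with gradient 2(w - w_star)
w_star = np.array([0.3, -0.2])
w = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - w_star), m, v, t)
```

In the framework, `grad` would be the assembled gradient ∂J/∂w from the chain rule, so each Adam step costs one RANS solve plus one adjoint solve.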
Initialisation of the neural network's parameters requires special consideration. The usual practice of random initialisation of the weights is not suitable in this case, since it leads to divergence of the RANS solution. We use existing closures (e.g. a laminar model with g^(i) = 0 or a linear model with g^(1) = −0.09) to generate data for pre-training the neural network and thus provide a suitable initialisation. This has the additional benefit of embedding existing insight into the training by choosing an informed initial point in the parameter space. When pre-training to constant values (e.g. g^(1) = −0.09) we add noise to the pre-training data, since starting from very accurate constant values can make the network difficult to train.


Test cases
The viability of the proposed framework is demonstrated on three test cases. The first two cases use synthetic velocity data obtained from a linear and a nonlinear closure, respectively, to train the neural network. The use of synthetic data allows us to evaluate the ability of the training framework to learn the true underlying turbulence closure when one exists. In the final test case, realistic velocity measurements, obtained from a DNS solution and for which no true underlying closure exists, are used to learn a linear eddy viscosity model. The trained LEVM is then used to predict similar flows, and the predictions are compared to those from a traditional LEVM.

Learning a synthetic LEVM from channel flow
As a first test case we use a synthetic velocity measurement at a single point in turbulent channel flow to learn the underlying linear model. The flow has a Reynolds number of 10,000 based on bulk velocity u_b and half-channel height h. The turbulence transport equations used are those of the k-ω model of Wilcox (1998), and the synthetic model corresponds to a constant g^(1) = −0.09. For the channel flow there is only one independent scalar invariant, and T^(1) is the only linear tensor function in the basis. We therefore use a neural network with one input and one output which maps θ_1 → g^(1). The network has 1021 trainable parameters and is pre-trained to the laminar model g^(1) = 0. The sensitivity of the predicted point velocity to the Reynolds stress is obtained by solving the adjoint equations with J_Ω equal to the velocity field times a radial basis function. Figure 2 shows the results of the training. The trained model not only results in the correct velocity field, but also recovers the correct underlying model.

Learning a synthetic NLEVM from flow through a square duct
As a second test case we use a synthetic full-field velocity measurement from flow in a square duct to learn the underlying nonlinear model. The flow has a Reynolds number of 3,500 based on bulk axial velocity u_b and half duct side length h. This flow contains a secondary in-plane flow that is not captured by LEVM (Speziale 1982). For the objective function, J_Ω is the difference between the measured and predicted fields, with the discrepancy of the in-plane velocity scaled by a factor of 1,000 so as to have a weight similar to that of the axial velocity discrepancy. The NLEVM is the Shih quadratic model (Shih et al. 1993) which, using the basis in equation (2.3), can be written as

g^(1)(θ_1, θ_2) = −(2/3) / (1.25 + √(2θ_1) + 0.9√(−2θ_2)),
g^(2)(θ_1, θ_2) = 7.5 / (1000 + (√(2θ_1))^3),
g^(3)(θ_1, θ_2) = 1.5 / (1000 + (√(2θ_1))^3),
g^(4)(θ_1, θ_2) = −9.5 / (1000 + (√(2θ_1))^3). (3.1)

For the flow in a square duct only four combinations of the Reynolds stress components affect the predicted velocity (Speziale 1982): τ_xy and τ_xz in the axial equation, and τ_yz and (τ_zz − τ_yy) in the in-plane equation. In this flow the in-plane velocity gradients are orders of magnitude smaller than the gradients of the axial velocity u_x. For these reasons only two combinations of coefficient functions can be learned, g^(1) and the combination g^(2) − 0.5g^(3) + 0.5g^(4), and there is only one independent scalar invariant, with θ_1 ≈ −θ_2. The neural network has two inputs and four outputs and was pre-trained to the LEVM with g^(1) = −0.09. The turbulence transport equations used are those from the Shih quadratic k-ε model. Figure 3 shows the learned model, which shows improved agreement with the truth. The combination g^(2) − 0.5g^(3) + 0.5g^(4) shows good agreement only for the higher range of the scalar invariant θ_1. This is because the smaller scalar invariants correspond to smaller velocity gradients and smaller magnitudes of the tensors T.
The velocity field is therefore expected to be less sensitive to the value of the Reynolds stress in these regions. It was observed that the smaller range of the invariant, where the learned model fails to capture the truth, occurs mostly in the centre of the duct. Figure 4 shows the ability of the learned model to capture the correct velocity, including the in-plane velocities, and the Reynolds stress. The trained model fails to predict the correct τ_yz in the centre of the duct, but this does not propagate to the predicted velocities. Additionally, it was observed that obtaining significant improvement in the velocity field requires only a few tens of training steps and only requires the coefficients to have roughly the correct order of magnitude. On the other hand, obtaining better agreement of the scalar coefficients took 1-2 orders of magnitude more training steps, with diminishing returns in velocity improvement. This shows the importance of using synthetic data to evaluate the ability of a training framework to learn the true underlying model when one exists, rather than only comparing the quantities of interest.
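The Shih coefficient functions of equation (3.1) can be transcribed directly (a sketch; the function name is illustrative). Recall that θ_1 ≥ 0 and θ_2 ≤ 0, so both square roots are real:

```python
import numpy as np

def shih_quadratic(theta1, theta2):
    """Scalar coefficient functions of the Shih et al. (1993) quadratic model
    written in the basis of equation (2.3), per equation (3.1).
    theta1 = {S^2} >= 0 and theta2 = {R^2} <= 0."""
    denom = 1000.0 + np.sqrt(2.0 * theta1) ** 3
    g1 = -(2.0 / 3.0) / (1.25 + np.sqrt(2.0 * theta1) + 0.9 * np.sqrt(-2.0 * theta2))
    return g1, 7.5 / denom, 1.5 / denom, -9.5 / denom

g = shih_quadratic(0.0, 0.0)   # vanishing invariants: g1 = -(2/3)/1.25
```

At vanishing invariants the model reduces to a linear closure with g^(1) ≈ −0.53 and small quadratic coefficients, which is the synthetic truth the network is compared against in figure 3.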

Learning a LEVM from realistic data of flow over periodic hills
As a final test case, a LEVM is trained using sparse velocity measurements from DNS of flow over periodic hills. The DNS data come from Xiao et al. (2020), who performed DNS of flow over periodic hills of varying slopes. This flow is characterised by a recirculation region on the leeward side of the hill, and scaling the hill width (scale factor α) modifies the slope and the characteristics of the recirculation region (e.g. from mild separation for α = 1.5 to massive separation for α = 0.5). For all flows, the Reynolds number based on hill height h and bulk velocity u_b through the vertical profile at the hill top is Re = 5,600. The training data consist of four point measurements of both velocity components in the flow over periodic hills with the baseline α = 1 geometry. The two components of velocity are weighted equally in the objective function. The training data and training results are shown in figure 5. The neural network in this case has one input and one output and is pre-trained to laminar flow, i.e. g^(1) = 0. The trained model is a spatially varying LEVM, g^(1) = g^(1)(θ_1), that closely predicts the true velocity in most of the flow, with the exception of the free shear layer on the leeward side of the hills. To test the extrapolation performance of the trained LEVM, we use it to predict the flow over the other periodic hill geometries, α ∈ {0.5, 0.8, 1.2, 1.5}, and compare the results to those with the k-ω model (g^(1) = −0.09). The results for α = 0.5 and α = 1.5 are shown in figure 6. For the α > 1.0 cases the trained linear model outperforms the k-ω model in the entire flow. For the α < 1.0 cases the trained model results in better velocity predictions in some regions, particularly the upper channel, while the k-ω model results in better velocities in the lower channel.

Conclusions
In this paper we present a framework to train deep learning turbulence models using quantities derived from velocity and pressure that are readily available for a wide range of flows. The method was first tested using synthetic data obtained from two traditional closure models: the linear k-ω model and the Shih quadratic k-ε model. These two cases demonstrate the ability to learn the true underlying turbulence closure from measurement data when one exists. The method was then used to learn a linear eddy viscosity model from sparse velocity data derived from DNS of flow over periodic hills. The trained model was then used to predict flow over periodic hills of different geometries.
This work demonstrates that deep learning turbulence models can be trained from indirect observations when the relevant sensitivities of the RANS equations are available. With the growing interest in differentiable programming for scientific simulations, it is expected that the availability of derivative information will become more commonplace in scientific and engineering computations, making it more seamless to couple scientific computations with novel deep learning methods.