Data-driven RANS closures for three-dimensional flows around bluff bodies

In this short note we apply the recently proposed data-driven RANS closure modelling framework of Schmelzer et al. (2020) to fully three-dimensional, high Reynolds number flows: namely wall-mounted cubes and cuboids at Re=40,000, and a cylinder at Re=140,000. For each flow, a new RANS closure is generated using sparse symbolic regression based on LES or DES reference data. This new model is implemented in a CFD solver, and subsequently applied to prediction of the other flows. We see consistent improvements compared to the baseline $k-\omega$ SST model in predictions of mean-velocity in the complete flow domain.


Introduction
Reynolds averaged Navier-Stokes (RANS) models are notoriously inaccurate in the presence of massive flow separation, for example in the wake of bluff bodies. The steady RANS paradigm -of modelling fluctuations at all scales -is fundamentally ill-suited to representing the complex unsteady dynamics in the wake of e.g. a cylinder or wall-mounted cube. Nonetheless, in a wide variety of industrial applications (notably the automotive industry) it would be extremely valuable to have access to RANS closures that give reasonably accurate predictions in wakes.
Data-driven turbulence modelling uses data from high-fidelity simulations (LES, DNS) or experiments to semi-automatically derive new closure models, see the surveys [1,2]. The methods of supervised machine-learning are used to represent and fit models, e.g. neural networks [3], random-forests [4,5,6], geneexpression programming [7], and sparse symbolic regression [8,9]. The latter two methods have the advantage of generating concise expressions for closure models, which can be inspected, analysed and implemented easily.
In this work we apply the Sparse Regression of Turbulence Anisotropy (SpaRTA) framework first developed by Schmelzer et al. [8]. The models produced are explicit algebraic Reynolds-stress models, based on k − or k − ω closures, but correcting both the turbulence anisotropy, and the t.k.e. production. The required correction fields are solved for by injecting DES, LES or DNS data into the RANS equations; and a model is obtained by regressing these corrections against mean-flow quantities available to RANS. This model can then be applied to predict a flow for which no reference data is available.
Though the objective of predicting bluff body flows with RANS might seem optimistic, SpaRTA has already been demonstrated with success for flows with significant separation [10]. Also note: our intention here is not to derive new general purpose closure models. At a minimum that would require a more diverse set of training flows. Rather, we wish to demonstrate that our framework has the capability of constructing RANS closures that generalize acceptably for massively detached flows. As such they may be useful as components of larger, general purpose models.
While using the framework of [8] and [10], this work extends those articles in several ways. We consider flows at significantly higher Reynolds numbers (Re = 140, 000 vs Re 10, 000 in [8]); in 3D vs only 2D previously; and we use DES data for the first time. The 3D LES data source means that optimization of the symbolic regression for large data-sets is required, and we introduce a practical technique for library reduction.

Methodology: SpaRTA
The objective is to generate a RANS closure from reference data, that improves predictions for some class of flows. For extended details of our methodology, see [8]. In brief: the incompressible RANS k − ω SST equations are modified with correction terms b ∆ and R to: with ρ the fluid density, U , P the mean velocity and generalized pressure, ν, ν T = ν T (k, ω) the molecular-and turbulence viscosity, k, P k = −2k(b ij + b ∆ ij )∂ j U i the turbulence kinetic energy and its production rate, and ω the specific dissipation rate. See [11] for details of remaining terms and coefficients. The baseline SST model uses the Boussinesq assumption as a fundamental modelling premise, namely where b ij is the normalized turbulence anisotropy. The purpose of b ∆ is to relax this assumption by allowing for deviations from Boussinesq. Similarly the SST k-equation is a model for the true t.k.e. equation; and the new term R is placed to allow for modelling errors here. In practice R allows for control of turbulence intensity, and b ∆ for control of turbulence anisotropy.
Solving for corrective fields. Given full-field LES or DES data for flow A, our preliminary objective is to find corrective fields R(x), b ∆ (x) -i.e. as functions of the spatial coordinates x := (x, y, z) -such that when the system (1) is solved for the same flow A, the resulting U and k correspond well to LES/DES mean values. This is achieved by injecting frozen LES/DES quantities into (1), and solving for the remaining unknowns R, b ∆ and ω. This procedure is named k-corrective-frozen-RANS [8] and can be seen as a generalization of the "frozen" method for estimating t.k.e. dissipation rate from LES data [12]. We use DES/LES estimates of the Reynolds stress tensor and t.k.e. which include contributions from both modelled and resolved stresses. Note that we deliberately conflate the k of the k − ω model, and the true t.k.e. at this step. The objective is to obtain a system of equations which is predictive of true t.k.e., at the cost of perhaps larger correction terms.
Model Regression. To make predictions it is necessary to generalize these corrective fields by building a closure model. We achieve this by now finding expressions for R(S, Ω), b ∆ (S, Ω) as functions of the mean strain-rate tensor S and rotation rate tensor Ω ij := 1 2 (∂ j U i − ∂ i U j ); quantities available to the RANS solver. In particular we follow Pope's integrity basis formulation [13], which specifies that the most general functional form (under modest assumptions) for b ∆ (S ij , Ω ij ) can be written: where T (1) := S, T (2) := SΩ−ΩS, etc. are basis tensors, λ l are the five invariants of S and Ω [14], and α k : R 5 → R are ten, arbitrary scalar-valued functions. By this construction b ∆ is always symmetric and traceless, the map b ∆ (·) is invariant under rotations, and Galilean invariant by virtue of the dependence only on ∂ j U i . In this work we use only T (1) , . . . , T (4) and λ 1 , λ 2 , thereby restricting our search to quadratic nonlinear eddy-viscosity models, following e.g. [15,16]. Aiming for simple algebraic expressions, we represent each α k (λ 1 , λ 2 ) as a linear combination of a large library of basis functions L = (φ 1 , . . . , φ L ): where the library is generated from products and powers of the inputs L = (1, λ 1 , λ 2 1 , . . . , λ 2 , λ 2 2 , . . . , λ 1 λ 2 , . . . ), see for example [17]. To avoid models with large numbers of terms, as well as overfitting, we apply elastic net regression [18], which encourages sparsity (most coefficients c l k = 0), to find the model form. Let c ∈ R 4×L represent the vector of all model coefficients, then we solve where in practice the first norm is estimated using the points of the mesh used to obtain the corrective field b ∆ (x). The term c 1 := |c i | encourages sparsity of c and the term c 2 2 := c 2 i controls the magnitude of nonzero coefficients. Both are blended by ρ ∈ [0, 1] and θ ∈ R + ultimately controls the extent of regularization. Using path elastic net [19], a large number of candidate models for various ρ, θ are obtained, and a second ridge-regression step is used to select coefficients. A similar procedure is applied to model R(S, Ω): we assume the form R = 2k∂ j U iRij (S, Ω), and use the base-tensor series to modelR ij .
With the extension from 2D cases in [8] to 3D cases here the size of the data-set in (4) has increased significantly, and symbolic regression becomes a significant memory bottleneck due to storage of the library L evaluated on the full data-set. We introduce a cliqueing procedure motivated by the high multicollinearity observed in L. Specifically we compute the correlation coefficient between all pairs of library functions φ i , and sort them into cliques whose correlation within a clique always exceeds a cut-off (of 0.99). Efficiently finding cliques is an established problem in graph theory [20]. We then select the algebraically simplest member of the clique to represent the clique, and discard the remainder. Although φ i are nonlinear in the inputs, a linear measure of correlation is adequate, as they are combined linearly in (3). This method is reminiscent of elite basis regression [21], except that we are not concerned with correlation with the target, only with basis functions amongst themselves.

Results
We examine three flows: a wall-mounted cube in a channel (Flow A), a wall-mounted cuboid (length:width:height ratio 3:2:2) also in a channel (Flow B), and an infinite circular cylinder (Flow C). Flows A, B are at Re = 40, 000 based on bulk velocity and cube/cuboid height h, with the channel of height 2h, and are well-used experimental [22] and numerical test-cases [23]. Flow C is at Re = 140, 000 based on cylinder diameter [24]. All three flows include massive separation, resulting in unsteady wakes with a wide range of time-and length-scales.
Ground-truth reference fields for Flows A, B are obtained with DES [25] simulating a domain of size 14.5h × 9h × 2h, the obstacle centred at x = 4h, with periodic boundary-conditions in the cross-channel direction, and synthetic channel turbulence at the inflow plane [26], and an averaging time of 12.4 flow-throughs. The RANS part of the DES is based on the Spalart-Allmaras one-equation model [27], in contrast to our use of k − ω SST in our enhanced RANS.
For Flow C ground-truth comes from a wall-resolved LES. The cylinder has diameter d and simulated length π · d with periodic boundaries. The flow is fully turbulent, with synthetic turbulence at the inlet plane. For all flows, velocity profiles were compared with existing published results, and found to be sufficiently accurate for the application.
The "Cube" model. We first apply the SpaRTA methodology to Flow A. Mean velocity profiles for the DES reference and baseline RANS are shown in Figure 1. The baseline significantly overestimates the size of the recirculation region on the top and sides of the cube, and the wake recovery is very significantly delayed. Frozen corrective fields are obtained, and symbolic regression is used to reduce these to a closure model, giving the following correction to k − ω SST: These corrections are implemented in our solver, a modified version of simpleFoam in OpenFOAM, as a new turbulence model -this is straightforward and can be completely automated. The RANS solver is run with the new model for Flow A as a verification check. The resulting mean-flow is also shown in Figure 1, where the flow in the entire domain is seen to better represent the reference.
In particular the flow near the stagnation point, the separation on the top and sides, and the wake recovery are all closer to the reference -although wake recovery is still somewhat under-predicted. Note that to obtain these results it was necessary to relax b ∆ Cube by 50% to gain stability. In general we observed that the R correction term acted to increase the production of k, while the contribution of b ∆ to the production term was generally negative in the wake. In the frozen corrective fields these effects cancelled to some extent, but the same cancellation was less apparent in the discovered models. As it is, the relaxation of b ∆ by 0.5 will tend to increase the eddy-viscosity beyond that recommended by the frozen corrective fields, and thereby stabilize the simulation.
To make a prediction, the "Cube" model is applied to Flow B. The mean velocity profiles are shown in Figure 2. The flow is topologically similar to Flow A, as such it is reasonable to hope that the model generalizes well. This is indeed the case, with significant improvements over the RANS baseline visible    everywhere in the domain. Notably, on the longer top and sides of the obstacle, the reproduction of the ground-truth is good -while the flow separation is quite different. A weakness can be seen in the near wake, where in Figure 2b a much narrower shear-layer is visible in the reference compared to the corrected model, however the wake recovery is good.
As a second prediction, the Cube model is applied to Flow C, see Figure 5. In this flow the baseline RANS predicts the flow upstream of the obstacle very well, and the correction model does not modify the flow here. Once again however the baseline predicts much too slow wake recovery, and this is corrected almost completely by our model based only on data from Flow A.
The "Cylinder" model. To investigate the sensitivity of the data-source on the resulting model and its performance, we apply the SpaRTA methodology using R Cylinder = T (1) (−2.823λ 1 + 33.82λ 2 + 2.586) In this case it was possible to find a significantly sparser model that nonetheless reduced the regression error to an acceptable level. This is likely at least partially thanks to the two-dimensionality of the mean-flow. Again the model was implemented in our OpenFOAM-based solver, and Flow C was predicted as a verification exercise -see Figure 5. Once more the flow is significantly more accurate than the baseline (and especially in terms of wake recovery) except in a small region in the near wake. Notably the "Cylinder" model performs worse for this verification exercise, than the "Cube" model does in prediction. We suspect this is partially a consequence of the relatively diverse flow content of the cube data-set -the choice of training data will be a subject of future work. Finally, in Figure 4 k-profiles for Flow C are compared. Again both correction models improve dramatically on the baseline -which massively underpredicts k in the wake -however neither are particularly accurate. A likely cause is the large-scale dynamics of the Kármán vortex street, which contribute to the LES estimate of t.k.e. but are missing in RANS. Once more however the free-stream is unaltered.

Conclusions
We've demonstrated the ability to construct custom explicit algebraic Reynolds stress models for bluff-body flows from LES and DES data. We've shown the models have a generalization capability, with respect to different geometries and different flows, while consistently outperforming baseline models. This study shows the possibility of using RANS for massively separated flows using suitable machine-learning techniques, though many questions remain. Specifically: Is the method robust to training-data choice? Is symbolic regression general enough to capture the necessary corrections? How do the b ∆ and R corrections interact in k-production? Further work will focus on investigating the stability of automatically generated models, and scaling up training data from one flow, to a large database of flows.