A variational data assimilation approach for sparse velocity reference data in coarse RANS simulations through a corrective forcing term

The Reynolds-averaged Navier-Stokes (RANS) equations provide a computationally efficient method for solving fluid flow problems in engineering applications. However, the use of closure models to represent turbulence effects can reduce their accuracy. To address this issue, recent research has explored data-driven techniques such as data assimilation and machine learning. An efficient variational data assimilation (DA) approach is presented to enhance steady-state eddy viscosity based RANS simulations. To account for model deficiencies, a corrective force term is introduced in the momentum equation. In the case of only velocity reference data, this term can be represented by a potential field and is divergence-free. The DA implementation relies on the discrete adjoint method and approximations for efficient gradient evaluation. The implementation is based on a two-dimensional coupled RANS solver in OpenFOAM, which is extended to allow the computation of the adjoint velocity and pressure as well as the adjoint gradient. A gradient-based optimizer is used to minimize the difference between the simulation results and the reference data. To evaluate this approach, it is compared with alternative data assimilation methods for canonical stationary two-dimensional turbulent flow problems. For the data assimilation, sparsely distributed reference data from averaged high-fidelity simulation results are used. The results suggest that the proposed method achieves the optimization goal more efficiently compared to applying data assimilation for obtaining the eddy viscosity, or a field modifying the eddy viscosity, directly. The method works well for different reference data configurations and runs efficiently by leveraging coarse meshes.


Introduction
There are numerous relevant fluid flow problems in science and engineering, ranging from the smallest scales, e. g. in biological flows, to atmospheric flows at a planetary scale.Efficient and accurate analysis of fluid flow problems is critical in many engineering applications, including power generation with gas turbines or coal combustion, renewable power generation such as hydro and wind, and energy storage such as pumped-storage hydropower.Efficient propulsion is essential in transportation, utilizing jet engines, propellers, and internal combustion engines, alongside streamlined aircraft, ships, and land vehicles to achieve low drag and minimize noise (e. g. [1,2,3,4,5]).
Analysis of these flows often relies on simulation, since experiments might not be feasible for technical, financial, or safety reasons.Compared to experimental investigations, flow simulations have further advantages such as their flexibility and their speed, but they never will exactly represent reality.Many of the aforementioned technical applications involve turbulent flows, which are challenging to simulate because of the large number of spatial and temporal scales involved.Direct numerical simulation or high fidelity methods such as large eddy simulation (LES) are prohibitively expensive for many industrially relevant problems, while Reynolds-averaged Navier-Stokes (RANS) based simulations are widely used.RANS computations only provide (ensemble) averaged results and rely upon more modeling and approximations than high-fidelity methods, but they are significantly less expensive.
The RANS approach in particular is prone to a number of errors related to averaging and the closure models, which often rely on the Boussinesq approximation that introduces the eddy viscosity.Two equation models for the eddy viscosity, such as the k-ε and k-ω models (cf.[6] or [7]) or later the SST model [8], or the one equation Spalart relies on existing closure models.The objective is to establish a data assimilation framework that enhances a low-cost simulation, allowing for precise full field simulation results at reasonable computational cost.This is achieved by relying on the RANS equations and coarse meshes for the forward problem, on a single scalar parameter field for tuning, and an efficient data assimilation implementation in OpenFOAM based on the discrete adjoint method.In particular, a parameterized forcing term is introduced in the Reynolds stress closure term to correct for eddy viscosity model errors, thus not limiting the approach by the eddy viscosity assumption.This is similar to the approach by Patel et al. [31] that use a divergence free force term with the continuous adjoint method.The performance of this approach is assessed by comparing it to alternative data assimilation approaches that act directly on the eddy viscosity.All these approaches aim at improving a particular simulation setup and not the underlying closure models.
This paper is organized as follows.Section 2 introduces the proposed data assimilation approach with all the necessary components, as well as two alternative approaches.One of the alternative approaches is based on modifying the eddy viscosity with a multiplicative field.A recap of this method is presented in appendix B together with a more detailed discussion of the modifications made over the original approach.In section 3 a flow over a backward facing step is used to compare the proposed data assimilation method with its alternatives.This is followed by results with the corrective forcing approach at flows over a periodic hill and a bump setup.Additional results for the periodic hill setup with the modified eddy viscosity and the corrective forcing term are presented in appendices D and E, respectively.Finally, section 4 discusses conclusions and potential areas for future research.

Simulation of turbulent flows 2.1.1 Reynolds-averaged Navier-Stokes equations
Reynolds-averaging separates a quantity θ into its average θ and fluctuation θ ′ .Applying this to the continuity and momentum equations of a steady incompressible Newtonian fluid results in and ∂ ūi ūj with the unclosed Reynolds stresses u ′ i u ′ j and the averaged rate-of-strain tensor Sij = 1 2 Alternatively, eqs.( 1) and ( 2) can be formulated as a system of momentum and pressure equations.The latter is obtained by applying a divergence operator on the momentum equation, followed by a simplification step based on the continuity equation.

Turbulence modeling
To close the RANS eq. ( 2), the Boussinesq hypothesis with the turbulent kinetic energy k and the Kronecker delta δ ij is used.This introduces the eddy viscosity ν t that relates the Reynolds stress tensor to the averaged rate-of-strain tensor Sij .For the steady momentum equation, this yields ∂ ūi ūj with the modified pressure The corresponding pressure equation is derived from this momentum equation and the continuity eq.(1) as with the effective viscosity The k-ε model is used in this work to obtain the eddy viscosity based on the solution of transport equations for the turbulent kinetic energy k and the turbulent rate of dissipation ε.In particular, the default coupledKEpsilon model of foam-extend-5.0 is used.This implementation is based on a coupled solution of these two extra equations.

Data assimilation with a corrective forcing term
Closure models based on the Boussinesq hypothesis (cf.eq. ( 4)) are often successful in engineering applications, but they still suffer from shortcomings.In this work, we introduce a forcing term F i to correct for discrepancies in the divergence of the modeled Reynolds stress tensor as and subject it to a Stokes-Helmholtz decomposition, i. e., with a scalar potential ϕ, a vector potential ψ k , and the Levi-Civita symbol ϵ.In this work, only velocity reference data is assimilated, which only affects the ψ k parameter, whereas the assimilation of pressure reference data would affect the ϕ parameter.This naturally leads to the forcing term being divergence free.
For the residual of the steady momentum equation this yields where the averaged pressure p, the isotropic part of the Reynolds stress tensor, and the scalar potential ϕ (cf.eq. ( 10)) are absorbed in the modified pressure p † as This pressure definition effectively eliminates parameter ϕ from the equations, only leaving parameter ψ k for data assimilation.Furthermore, due to the divergence-free property of the ψ k term, the resulting pressure equation is identical to the one in the basic RANS approach in eq. ( 7), i. e., parameter ψ k is not present in the pressure equation.Compared to the approach by Foures et al. [29] only ψ k appears explicitly in the momentum equations, and the discrete adjoint approach is used.Also, the Reynolds stress tensor still is primarily based on the eddy viscosity model and is only corrected by data assimilation where the model is not sufficient.Further, data assimilation is performed for ψ k , such that no additional equations need to be solved for ϕ and ψ k , as in [30].For the optimization, the parameter field ψ k is thus initialized with a uniform zero value, corresponding to no correction of the eddy viscosity model.In the case of two-dimensional RANS only the ψ z component is relevant, i. e., a scalar field is sufficient to influence both velocity components, which reduces complexity over directly tuning two components of a force F i .This is different than in the work by Patel et al. [31], who also use a divergence-free forcing term.Further differences to their work include the regularization and the parameter mapping.

Alternative data assimilation parameters
Two further parameter options for data assimilation are presented below.They serve to use as benchmark for the corrective forcing term approach.The first choice is a slightly modified version of the method by Brenner et al. [33] which introduces a spatially variable parameter field to locally tune the frozen eddy viscosity field obtained from the k-ε closure model.The second choice relies on tuning the eddy viscosity directly.

Modified eddy viscosity
More concretely, an initial solution of the RANS eq. ( 1) is computed, which serves as baseline solution with corresponding eddy viscosity ν t .A spatially variable parameter field α is then introduced in the RANS equation as with the frozen baseline eddy viscosity ν b t and the modified pressure p ‡ .In this case, the pressure equation will deviate from the baseline expression in eq. ( 7) with parameter α in the viscous term.To simplify notation p is used for any of the above modified pressures, i. e., p * in eqs.( 6) and ( 13), p † in eq. ( 12), and p ‡ in eq. ( 13), in the remainder of the manuscript.
For the optimization, the parameter field α is uniformly initialized with a value of unity across the entire domain.A more detailed description of the method and the modifications made over the previous work by Brenner et al. [33] are presented in appendix B.

Tuned eddy viscosity
A more direct approach is to not use a closure model and directly tune the eddy viscosity ν t in eqs.( 5) and (7).To obtain a converged initial flow solution, a suitable initial value for this eddy viscosity field needs to be selected.In the absence of any prior knowledge a positive uniform field is assumed, which corresponds to an increase in viscosity or a reduction of the Reynolds number.It is thus important to choose an initial value for the eddy viscosity field that is as small as possible, but sufficiently large to obtain a converged solution, i. e., 0 < ν (0) t ≪ 1.

Forward problem formulation
To obtain the numerical solutions of eqs.( 1) and (2), the momentum equations for ūi and the pressure equation for p are solved.Consequently, the solution vector is defined as In each iteration the forward system of equations is linearized and solved in a coupled manner as where ξ represents a generic parameter used for data assimilation.The system matrix of the coupled linear system of equations A U is composed of sub-matrices that describe the implicit contributions of the velocity (A ūū ) and pressure (A ū p) in the momentum equations, and the contributions of the velocity (A pū ) and pressure (A p p) in the pressure equation, respectively.The discretization, i. e., construction of the linear system of equations, and the solution of the same equations is based on the finite volume code OpenFOAM.

Inverse problem formulation
A scalar cost functional f is introduced as a measure for the agreement between model output and reference data.In this work, the cost function consists of two contributions, namely a regularization function f ξ that only depends on the parameter field ξ, and a discrepancy contribution f U that is a measure for the difference between the forward solution U and the reference, respectively, i. e., Based on this cost function, the data assimilation procedure is formulated as an optimization problem with the objective of reducing the discrepancy between the forward problem solution and the reference data by tuning the parameter field ξ.This is an inverse problem as the input parameter ξ for the forward problem is sought based on the results U obtained from the forward problem.

Cost function and regularization
The discrepancy part of the cost function measures agreement of the forward problem solution U with the reference data U ref .
In the presented application, only averaged velocity data is assimilated, i. e., where R is the set of reference cells j, V j the volume of cell j, and the total volume of all reference cells.A regularization function is introduced to reduce ambiguity of the inverse problem.Since the flow setups investigated in this work are only two-dimensional and in the x-y plane, only the z-component of vector potential ψ k has a non-zero value.Thus, regularization only is applied to the ψ z component of the parameter field.This is done without loss of generality, i. e., the other components would be subject to the same regularization in the general three-dimensional case.In particular, following the method by Brenner et al. [33], a function of the form with regularization weight w reg is used to guide the optimization toward smooth parameter fields.Here, index i loops over all cells in the simulation domain Ω, and index k loops over B i , the set of neighboring cells of cell i, with |B i | denoting the number of neighboring cells of cell i.

Test function
Apart from the reference data utilized in the cost function and the consequent optimization procedure, a second set of reference data is used.This collection, called the test dataset, of reference values that are not used for data assimilation, but instead serve for the assessment of the test function f test (U ).The form of this function is equivalent to the discrepancy part of the cost function, i. e., of f U (U ) in eq.(18).
Conceptually, this is similar to the idea of using training, validation, and test data sets in the training of machine learning models.The regularization in eq. ( 20) features a hyper parameter w reg that requires tuning and that introduces a trade-off.Larger values for w reg promote smooth parameter fields, which results in more accurate solution results between the sparse reference data locations, while smaller values result in lower values of the discrepancy part of the cost function f U .
The test function helps to balance this trade-off by measuring the quality of the optimized solution in-between reference locations.A good regularization weight is found, when both cost and the test functions are reduced during the optimization.

Inverse problem solution 2.6.1 Discrete adjoint method
To derive an expression for the cost function gradient, a Lagrangian with Lagrangian multiplier is introduced.The corresponding gradient with respect to the parameters ξ is derived as For a more detailed derivation and description, the reader is referred to appendix A and [33].
Rearranging the terms and considering that the derivative of the linearized forward problem residual R with respect to the linearized forward problem solution U corresponds to the respective system matrix A U (cf. eq. ( 15)) yields The right-hand side of this equation is analytically derived from the cost function and thus comes at low computational cost.Solving this system of linear equations for λ has an associated computational cost that is comparable to solving the forward system for U .It is important to note that this cost is independent of the number of parameters ξ.Applied to the RANS forward problem with residual eq. ( 15) and cost function eq. ( 16) with eq. ( 18), the coupled adjoint system of equations, i. e. eq. ( 24), reads The derivatives ∂R ∂ξ of the forward problem residual with respect to the parameters are evaluated using the approximate and efficient approach introduced by Brenner et al. [33].In particular, the forward problem R is numerically linearized with respect to parameter ξ in OpenFOAM as For the derivative with respect to parameter ξ, that is needed for the evaluation of the adjoint gradient in eq. ( 23), this yields Combining these numerically approximated terms with eq. ( 23) and the solution of eq. ( 24) and expanding the terms for the discrete adjoint gradient.As a simplification, Brenner et al. [33] did not consider the last term in eq. ( 28), i. e., the pressure contribution.While this is justified in their periodic hill case, where this contribution is negligible, it is considered in this work.
In the case of the corrective forcing, the parameter ψ k only appears in a single term in the momentum equation R ū (cf.eq. ( 11)), i. e., where A ū,ψ is the system matrix and b ψ the right-hand side vector of the implicit curl operator, respectively.Due to the divergence-free property of the forcing term, there is no contribution of parameter ψ k to the pressure equation (cf.section 2.2).The derivative of the pressure residual with respect to the parameter, i. e., ∂Rū ∂ψ , is thus zero and there is no contribution of the adjoint pressure λ p to the adjoint gradient.The implicit curl operator required in eq. ( 29) is not available by default in any version of OpenFOAM and was thus specifically implemented and validated for this work (cf.appendix C).
For the modified eddy viscosity approach with parameter α, the corresponding gradient evaluation yields where matrix A ū,α is thus given by the system matrix of the implicit divergence operator of OpenFOAM.In contrast to the corrective forcing term, the pressure contribution is non-zero here.In particular, it is derived as Matrix A p,α is thus built from the implicit Laplacian and divergence operators in OpenFOAM.
In the case where the eddy viscosity is directly adjusted within the data assimilation process, the derivations in eqs.( 30) and ( 31) are analogous.

Parameter map
Depending on the choice of optimization parameter, the range of parameter values might be limited by physical or other constraints.In this work, only positive values are allowed for the DA parameters α and ν t to prevent numerical instabilities, while no such constraint is necessary for parameter ψ z .
This can be addressed by introducing a regularization function f ξ that punishes values outside the permitted limits with large values.However, this approach can not strictly guarantee that these limits are not violated.
Following Brenner et al. [33], a problem dependent mapping function is thus introduced to neatly integrate these limits into the discrete adjoint framework.In particular, the physical parameter field ξ, to which the limits apply, is mapped to the optimization parameter field ζ ∈ R as ξ = ξ (ζ).The gradient based parameter updates are thus performed for ζ and then mapped back to ξ.This update step requires the cost function gradient with respect to the optimization parameter ζ, which is obtained by mapping the adjoint gradient with respect to ξ using the product rule.Evaluating these forward and inverse parameter maps, as well as the gradient mapping, is trivial and comes at very low computational cost.It only involves the numerical evaluation of the corresponding expressions which are derived analytically from the expression of the parameter map.
Here, the exponential-linear map proposed by Brenner et al. [33] is applied for DA parameters α and ν t , while no map (i.e. a linear map) is applied for ψ z .

Optimization
The gradient based Demon-Adam optimization algorithm [39] is employed to perform updates of optimization parameter ζ from iteration step (n) to (n + 1) as The parameter update ∆ζ is a function of the adjoint gradient value and is determined based on the optimization algorithm.
Convergence of the optimizer can be defined based on the cost function value, the gradient norm, the number of iterations, or other measures.In this work, the maximum number of iterations is selected as a-priori convergence criterion for better comparisons between the different approaches and setups.

Reference data
Literature reference data available from public online resources was used for assimilation.Accordingly, the test case setups are replicating the reference case geometries and boundary conditions.In particular, spanwise and temporally averaged LES or DNS velocity reference data was used.
The griddata interpolation function for unstructured data provided by the SciPy library for Python was used for interpolation of the reference data to the cell center locations of the OpenFOAM meshes.In particular, the nearest neighbor method was employed and the full fields were interpolated.To select the reference and test data sets from these full fields, two indicator fields are introduced, to mark test and reference cells, respectively.

Implementation
The implementation is based on foam-extend-5.0 [40], a version of the open-source CFD framework OpenFOAM.This specific version was selected because it offers coupled solvers, e. g. for the RANS equations.In particular, the pUCoupledFoam solver was modified and extended for the different DA parameters in this work.The extension, which was introduced in [33], includes a periodic forcing term to drive stationary flows in periodic domains.Three solvers were implemented, one for each DA parameter.In the case of parameter ψ k , this warranted the implementation of an implicit curl operator (cf.appendix C).These solvers were modified to solve the RANS forward equations and the adjoint equations, as well as to evaluate the cost function and the adjoint gradient.Considering the two-dimensional equations considered in this work, all solvers are dedicated two-dimensional implementations.This reduces the size of the linear systems of equations and thus the memory usage as well as the linear solver cost.
Figure 1 gives an overview of the complete data assimilation process.The process is controlled by a Python script that interacts with the OpenFOAM solvers and performs the optimization loop including the mapping of parameters and gradients.
To create the meshes, the OpenFOAM-8 [41] utility blockMesh was used.This version of OpenFOAM provides a more extensive implementation of the utility, allowing more control over the mesh generation process.

End Yes
No Figure 1: Flowchart of the optimization based data assimilation procedure.Starting from an initial RANS setup, the parameter field ξ is updated iteratively based on the cost function gradient df /dξ, until convergence is reached.An OpenFOAM solver is used to evaluate the forward and adjoint problems, as well as the cost function and its gradient (gray shading).To enforce bounds for ξ, parameter updates are performed for optimization parameter ζ which is connected to ξ by a mapping function.The parameter updates, the parameter mapping, and the optimization are implemented in a Python script.

Results and discussion
In this section, results for the three data assimilation parameter choices that were introduced above are presented.The results for a backward facing step setup with two different reference and test data distributions are compared to discuss the effectiveness of the different methods.Then, further results for the corrective forcing term method in a periodic hill and in a bump setup are presented.Further results can be found in appendices D and E.

Comparison of data assimilation approaches for a backward facing step setup
The simulation domain of the backward facing step setup used as a test case, and presented in fig.2, is based on the work by Pont-Vílchez et al. [42].Time-averaged DNS velocity results of their work serve as reference data here.The setup features an inlet channel of height H, followed by a backward facing step with expansion ratio 2. Considering the viscosity ν and the inlet bulk velocity ūb , the Reynolds number is The domain is spatially discretized to obtain a coarse mesh of 4800 cells.The boundary conditions for the boundaries indicated in fig. 2 are based on the reference data case.All walls feature no-slip boundary conditions for velocity ū, Neumann boundary conditions for pressure p, and wall models for turbulent kinetic energy k and the turbulence dissipation ε.The inlet velocity profile is interpolated from the reference data and velocity fluctuations at the inlet are assumed to have a magnitude of 5 % of the average inlet velocity, and the turbulent length scale is assumed to be 10 % of the inlet channel height H, to compute values for k and ε, respectively.Zero gradient boundary conditions at the outlet are chosen for all fields except pressure, where a fixed value is chosen for pressure.
The baseline RANS solution with a turbulence model results in velocity profiles ūini x that deviate from the reference ūref x , especially in the wake of the step, as depicted in fig. 3. Much better agreement with the reference is found in the upstream channel, which is attributed to the velocity inlet boundary condition that is interpolated from the reference data.Further downstream of the step, where its effects fade off, the agreement with the reference improves again. of the direct eddy viscosity tuning approach.All length scales are normalized by the inlet channel or step height H and all velocities by the inlet bulk velocity ūb , respectively.Figure 4 depicts the two reference and test data distributions considered in this section, that is, a dense distribution with 210 reference and 178 test points, and a sparse distribution with 55 reference and 40 test points, respectively.These reference points are concentrated in the section just downstream of the step, where most of the relevant flow dynamics occurs and where turbulence modelling is most challenging.The corresponding test points are placed inbetween the reference points to best measure the effects of data assimilation and regularization on the flow field where no information is assimilated.
Next, data assimilation results for this backward facing step setup with both dense and sparse reference distributions are presented for all three approaches.

Data assimilation for eddy viscosity
For data assimilation parameter ν t , no RANS turbulence model is solved during the forward solution process.To initialize the eddy viscosity field, a value of ν regularization weight of w reg = 1 × 10 −3 is used in this case.
As seen in the cost function plots of fig. 5, the test and cost functions get continuously but slowly reduced for both reference configurations during the optimization.The velocity profile plots show that both optimized results strongly deviate from the reference.Further, it can be observed that the regularization works well here, as the test function continuously decreases and smooth optimized velocity profiles are obtained.The agreement with the reference, however, is very poor.To conclude, this approach proved to be unsuccessful for data assimilation in the case of this setup.

Data assimilation for eddy viscosity correction
Next, the performance of the modified eddy viscosity for the backward facing step setup is investigated, again for the two configurations presented in fig. 4. The baseline eddy viscosity field ν b t results from solving the forward problem with the k-ε model in a precursor step.Not surprisingly, the resulting initial velocity field ūini is much closer to the reference data than in the previous case, as depicted in fig. 7. The eddy viscosity field is then frozen, i. e., not updated, when solving the forward problem during the optimization.An initially uniform parameter field of α (0) = 1 is chosen, and the optimization is performed with a learning rate of η = 1 × 10 −3 , a number of total iteration steps of N opt = 25, and a regularization weight of w reg = 5 × 10 −5 .The resulting cost and test function plots in fig.7 indicate a continuous improvement throughout the domain, but it is evident from the velocity profiles that this is mainly happening downstream of the recirculation zone.In the recirculation zone, the baseline solution deviates most from the reference, and data assimilation hardly improves the results there, in particular in the case of the sparse reference data distribution.
The final results are much closer to the reference, if the eddy viscosity correction factor is tuned instead of the eddy viscosity directly.Nevertheless, this approach falls short of producing a very close match.While data assimilation for α was reported to provide good results for a well-resolved periodic hill case, it seems to perform much worse in this case.Like the eddy viscosity assumption in general, the influence of parameter α is limited to flow regions of large mean shear rates.

Data assimilation for corrective forcing
Finally, the corrective forcing approach is applied to the backward facing step setups.Starting from an initial uniform parameter field ψ (0) z = 0, the same initial RANS solution as in the previous case (cf.section 3.1.2) is seen, as depicted in fig. 7. Data assimilation is subsequently performed with a learning rate of η = 1 × 10 −2 , a number of total iteration steps of N opt = 40, and a regularization weight of w reg = 4 × 10 −5 .Contrary to the other approaches, the modeled eddy viscosity is updated in every optimization step.
The cost and test function plots in fig.7 get drastically reduced during the optimization procedure, clearly surpassing the previously demonstrated results.From the corresponding velocity profile plots it is clear that the dense reference data lead to excellent agreement with the reference data, while there still are slight deviations present in the recirculation zone for the sparse case.

Discussion of the backward facing step results
This section demonstrates the differences in data assimilation performance between the three approaches.Directly working on the eddy viscosity that is initialized with a uniform field leads to large deviations in the initial solution, which are not significantly reduced through data assimilation.
Tuning the eddy viscosity obtained from a proven physics-based closure model is more promising.The initial solution is much closer to the reference and is further improved through data assimilation.However, for the present setup, the improvement is not satisfactory.What makes this setup difficult for RANS solvers is the coarse mesh resolution and the sharp edge at the step with the associated recirculation.
The corrective forcing proves to be the most powerful approach.It combines a good initial condition of the closure model with a source term that can influence the flow beyond the limitations of an eddy viscosity model.The coarse mesh resolution is no limitation and thus allows for very good data assimilation results at low cost.

Further results for data assimilation with corrective forcing
In this section, results for data assimilation with the corrective forcing approach for two further flow problems are presented, namely the periodic hill and the bump setups.As for the backward facing step above, directly tuning ν t starting with an initially uniform parameter field does not yield meaningful results for any of the two setups.For the periodic hill setup, correcting the eddy-viscosity with parameter α can produce convincing results, as shown by Brenner et al. [33].However, for the coarse mesh used here, the improvements are less convincing, which is further discussed in appendix E. It will be shown that for the bump setup, the α approach fails to produce accurate results.

Periodic hill
The periodic hill geometry with its corresponding sparse reference data distribution is depicted in fig.8.A coarse mesh of 74 cells in x-direction by 60 cells in y-direction (total cell count is 4440) is used with 48 reference and 42 test points that are sparsely distributed in the domain.Velocity data from LES simulations by Gloerfelt and Cinnella [43], averaged in time and in z-direction, are considered as reference.This setup features a flow at a Reynolds number of Re = ūb H ν = 10 595 , (34) based on the bulk velocity ūb over the hill crest of height H and the kinetic viscosity ν.At the upper and lower walls, no-slip condition is set for the velocity, and Neumann condition for the pressure.For the eddy viscosity the low Reynolds number boundary condition is chosen (i.e., ν t at the wall face is zero).
The resulting baseline RANS solution (cf.fig.9) is already close to the reference, except in the wake region of the hill.This recirculation zone and the position of the reattachment point are not well captured, however.The optimization is performed with a learning rate of η = 8×10 −6 , a number of total iteration steps of N opt = 100, and a regularization weight of w reg = 9 × 10 −4 .Figure 10 shows a clear reduction in both the cost and test function values, which can also be observed in the obtained velocity profiles.Across the domain the optimized velocity field matches the reference very well with a particular improvement in the wake region.Considering the low number of reference data points, this is a convincing improvement.The reattachment point of the optimized result is at x/H = 3.901 which is considerably closer to the reference value of x/H = 4.268 than the baseline RANS result at x/H = 3.069.
Results for denser reference data configurations are presented in appendix D; similar levels of improvement can be observed.Again, the same learning rate and regularization weights were employed for all configurations.
In appendix E equivalent results are presented for the tuning of parameter α.Although convincing results for this setup and approach have been presented in the literature using a more refined mesh (cf.[33]), the results of this work suggest that this does not apply to the rather coarse meshes used here.The corrective forcing term optimizations perform significantly better.

Bump
Another similar setup featuring a distinct but small separation bubble is the bump case by Webster et al. [44], for which Matai and Durbin [45] provide averaged LES reference data.In this analysis we focus on the specific bump configuration with height H = 42 mm as used in the original publication.The Reynolds number of the flow is with viscosity ν and nominal free stream velocity ū∞ .The geometry with boundary conditions and the reference data distribution are shown in fig.11.For the upper boundary, a slip condition is chosen for all fields, while a no-slip condition is chosen for the velocity of the lower wall with wall functions for k, ε, and ν t and a Neumann condition for pressure.The velocity profile at the inlet is interpolated from the reference data, and the k, ε, and ν t fields are assigned fixed values based on the free stream turbulence intensity of 0.2 % as reported in [44].For the outlet, a Neumann boundary condition is applied for all properties except pressure, for which a Dirichlet condition is used.A coarse mesh of 15 cells in x-direction by 50 cells in y-direction (total cell count is 7500) is used with approximately uniformly spaced, interlaced patterns of 200 reference and 171 test points.Results for data assimilation in the bump case with parameter ψ k are presented in fig.12. From the velocity profile plots it is evident that the baseline RANS solution agrees very well with the reference data with only small deviations present around and downstream of the bump.Data assimilation was performed with a learning rate of η = 1 × 10 −3 , a number of total iteration steps of N opt = 50, and a regularization weight of w reg = 2 × 10 −4 .The improvement of cost and test functions is clearly less than one order of magnitude.Regarding the velocity profile plot, which only represent a small sample of the domain, no clear improvement can be seen.
Equivalent studies were performed for the tuning of α and ν t , respectively.Neither of these two methods yielded an improvement in the cost function across a wide range of optimization and regularization parameter values.However, while not achieving the same level of improvement observed in other cases, the corrective forcing approach stood out as the only method that showed any improvement in this apparently challenging setup.Compared to the the periodic hill and backward facing step setups, the initial RANS solution is already very close to the reference and leaves only little room for improvement.

Conclusions and outlook
This paper presents an efficient data assimilation approach for RANS simulations based on coarse meshes and a corrective forcing term.The latter is computed from a vector potential when only velocity reference data are used, as in this work, resulting in its divergence-free property.While applicable in a general context, this approach shows particular advantages in two-dimensional setups where the parameter field simplifies to a scalar field.The effect of the data assimilation parameter extends beyond the constraints of the eddy viscosity and is observed to consistently produce better optimized outcomes in comparison to alternative methods that rely on adjusting the eddy viscosity.Even in the challenging case of flow over a bump, where the other methods failed entirely, the divergence-free forcing approach delivers a modest improvement.The efficacy of the approach was established across three setups, namely flow over a backward facing step, periodic hills, and a bump, respectively.In the backward facing step and periodic hill setups, different reference data distributions were considered.It has been observed that the regularization weight parameters As anticipated, more tightly packed reference distributions yielded superior outcomes, but regularization allowed for accurate results with sparse distributions.It has been observed that the same values for the regularization weight could be used for different reference data distributions of a specific setup.While a number of flow setups and reference data distributions were considered in this work, the optimal number and placement of reference and test data points for this data assimilation approach still needs to be further investigated.Ideally, only very little few well placed reference data points need to be sourced in an experimental context.This could be combined with a more in-depth analysis of parameter field regularization and fine-tuning of the regularization weight parameter.
The required number of optimization steps in the presented cases is reasonable, as the computational cost of the incremental flow solutions is relatively low.A more detailed study of alternative optimization methods could be performed to further improve the overall performance of the approach.

A Discrete adjoint method
To derive an expression for the cost function gradient, we introduce a Lagrangian L as in eq. ( 21) with Lagrangian multiplier λ (cf.eq.( 22)).The corresponding gradient with respect to the parameters ξ is derived as To simplify the second to last line above, the Lagrangian multiplier λ is chosen such that the expression in brackets vanishes.Rearranging the terms and considering that the derivative of the linear forward problem residual R with respect to the linear forward problem solution U corresponds to the respective system matrix A U yields The right-hand side of this equation is analytically derived from the cost function and thus comes at low computational cost.Solving this system of linear equations is comparable in cost to solving the forward system for U and independent of the number of parameters ξ.
The derivatives of the forward problem residual with respect to the parameters ∂R ∂ξ are evaluated using the approximate and efficient approach introduced by Brenner et al. [33].In particular, the forward problem R is linearized with respect to ξ, denoted as R ξ , in OpenFOAM as The derivatives thus correspond to the system matrix A ξ .Combining these approximate terms with eq. ( 36) and the solution of eq. ( 37) and expanding the terms yields for the discrete adjoint gradient in RANS.

B modified eddy-viscosity approach
The RANS data assimilation approach by Brenner et al. [33] is based on the momentum equation with modified pressure and baseline eddy viscosity ν b t , which is kept constant (i.e. frozen) for the data assimilation procedure.Instead, the spatially averaged parameter field α is tuned in the process, starting from an initially uniform field of α (0) = 1.Brenner et al. [33] showed good performance of this method for the periodic hill setup, despite its limitations to only influence the flow in regions with sufficiently large eddy viscosity and velocity gradient values.
The linearized version of eq.(40 with linearized velocity φj based on the last iteration step; terms considered implicitly in the discretized system of equations are grouped on the left-hand side of the equation, while explicitly treated terms are grouped on the right-hand side.
In the present work, the solver was modified to provide periodic solutions λ of the adjoint system of equations.This mandated additional steps to improve solver convergence and stability.The parameter updates based on the periodic adjoint gradients, and thus the parameter fields, are consequently periodic as well.Accordingly, the TV regularization implementation was extended to consider cell boundaries across periodic domains.
A minor change was introduced to the cost and test functions, which are now normalized by the volume and not the number of reference and test cells, respectively.In addition, the TV regularization is now considered without the extra smoothing step that was part of the original approach and consisted of the solution of an additional Helmholtz equation.This is sufficient to provide smooth results and further reduces the computational cost.

C Implicit curl operator: Implementation and validation
The foam-extend-5.0 [40] package does not provide an implicit curl operator that is required to evaluate the adjoint gradient for DA parameter ψ k (cf.eq. ( 29)).As part of this work, this relevant part of the missing operator was successfully implemented, validated, and employed in the DA procedure.The implementation of the operator is presented next, followed by a description of the validation approach, and a summary of the validation result.

C.1 Implementation
The implementation does not feature a full-fledged implicit OpenFOAM operator for the curl of a vector field, but only the part relevant to this work, i. e., the corresponding system matrix.Applying the curl operator on a vector field v yields where the last step represents the discrete representation.This representation bears some similarities with the gradient operator applied to a scalar field s as Specifically, the operator matrix entries in eq. ( 43) correspond to the entries in the operator in eq. ( 44).In the implementation this is leveraged by assembling the implicit curl operator system matrix with the corresponding elements of the implicit gradient operator according to eq. (43).

C.2 Validation
The method of manufactured solution is employed to validate the implementation.To this end, an analytical reference solution v ref is defined and evaluated based on explicit operators to obtain a value for the right-hand side of the system of equations.The operator implementation is then validated by solving this system of equations for v and comparing the result to the reference field v ref The implicit curl operator produces only off-diagonal matrix entries, so a system of equations consisting of only this implicit operator alone cannot be solved numerically.To obtain a better suited matrix, an additional term av with parameter a is introduced to give more weight to the matrix diagonal by where the implicit left-hand side corresponds to matrix A of the system of linear equations and the explicit right-hand side to the source term r.In discrete form, this reads For validation, an analytical reference solution v ref was selected and evaluated at the cell center locations of the test case meshes, using the implicit source and curl operators provided by OpenFOAM.Accordingly, the analytical expression for the right-hand side was derived and evaluated.A total of seven three-dimensional test cases were set up, which differ in boundary conditions, domain shape and size, mesh resolution, and mesh cell properties.In particular, periodic and Dirichlet boundary conditions were considered, and the mesh cell shapes ranged from uniformly hexahedral to very elongated and skewed.Equation (46) was then solved numerically for all cases, where relatively large values had to be chosen for a to reach convergence.
From the results of all test cases it was found, that there are no boundary effects and that the results agree very well with the reference.The implementation of the implicit curl operator is thus concluded to be correct.

D Additional periodic hill results for the divergence-free source term approach
In this section, additional results for the periodic hill case from section 3.3.1 are presented.In particular, two additional sets of reference and test data distributions are introduced, as depicted in fig.13.These setups are referred to as dense and intermediate reference density cases, respectively.They feature the same mesh and boundary conditions as the sparse case in section 3.3.1,which has 48 reference and 42 test points.In comparison, the intermediate case features 120 reference and 130 test points, while the corresponding numbers of the dense case are 270 reference and 285 test points, respectively.To facilitate the analysis of the influence of the number of reference data points on the optimization result, the optimization and regularization parameters are chosen identically for all three cases.The learning rate η was set to 8 × 10 −6 , a number of total iteration steps of N opt = 100, and the regularization weight w reg to 9 × 10 −4 .
Figure 14 depicts the resulting velocity profiles and the cost function plots for the dense, intermediate, and sparse reference data.Qualitatively comparing the optimized velocity profiles shows clear improvements in the recirculation region, where the difference between baseline RANS ūini and reference ūref is the greatest in the hill wake for all setups.The main deviations between the optimized results and the reference are observed near the lower wall in the inlet plane, with other smaller deviations just below the centerline.These are attributed to the reference data distribution, as there is a gap between the first plane of reference locations near the inlet and outlet planes, respectively.Similarly, the density of reference data points is closer in the near wall region than in the bulk region.
The reattachment length of the optimized results are closer to the reference values, as reported in table 1.This is consistent with observations made in the velocity profile plots.

E Periodic hill results for the modified eddy viscosity approach
For the same three periodic hill setups described in appendix D, data assimilation was performed for parameter α.These cases are comparable to the ones presented in [33], albeit with all the modifications discussed in appendix B. Specifically, the periodicity of the Lagrangian multipliers λ is considered, which requires an under-relaxation of the corresponding system of linear equations for a stable solution process.This is also true for the periodic hill results with the divergence-free forcing term presented in appendix D. Further, the mesh in this work has a much lower resolution of only 4400 cells, instead of 23 400 in [33]; the Demon-Adam optimizer is used over the Gauss-Newton method, and no additional smoothing step is employed.Data assimilation was performed with a learning rate of η = 1 × 10 −3 , a number of total iteration steps of N opt = 100, and a regularization weight of w reg = 5 × 10 −8 .Figure 15 shows the resulting cost function and velocity profile plots.An improvement in cost function values and velocity profiles is observed for all three configurations, while the results are better for higher numbers of reference data points.However, both the reduction in cost and test functions and the improvement in velocity profiles are less than in the results for the divergence-free forcing approach.Compared with the literature results in [33], the performance here is worse.This is attributed to the vastly reduced mesh resolution and the lack of an additional smoothing step which requires the usage of stronger regularization.For the sparse configuration, the regularization almost completely blocks any improvement.
As reported for the ψ k parameter, the reattachment lengths of the optimized results are also closer to the reference values for the α optimization parameter (cf.table 2).

Figure 2 :
Figure 2: The backward facing step domain as presented by Pont-Vílchez et al. [42] with labels for the different boundaries.All length scales are normalized by step height H.

Figure 3 :
Figure 3: Stream wise velocity components of the reference ūref , and the baseline RANS results ūini for the backward facing step setup at Re = 6932.The left plot shows the baseline RANS solution, while the right plot shows the solution for the initially uniform eddy viscosity ν (0) t

= 5 ×Figure 4 :Figure 5 :
Figure 4: Distribution of reference (dots) and test (crosses) data points for the backward facing step in the dense (left) and sparse (right) setups.

Figure 6 :
Figure 6: Data assimilation results with parameter α for the backward facing step (cf.fig.2) with dense (top) and sparse (bottom) reference data distributions (cf.fig.4).Shown are the cost function components and the test function (left), as well as the streamwise velocity components of the reference ūref , the baseline RANS results ūini , and the optimized results ū (right).

Figure 7 :
Figure 7: Data assimilation results for parameter ψ k with the backward facing step (cf.fig.2) with dense (top) and sparse (bottom) reference data distributions (cf.fig.4).Shown are the cost function components and the test function (left), as well as the stream wise velocity components of the reference ūref , the baseline RANS results ūini , and the results ū (right).

Figure 8 :
Figure 8: Simulation domain of the periodic hill setup with indicated boundary types (left).Mean flow is in x direction.All length scales are normalized by the hill height H. Distributions of reference (dots) and test (crosses) data points for the sparse configuration are shown on the right.

Figure 9 :
Figure 9: Stream wise velocity components of the reference ūref , and the baseline RANS results ūini for the periodic hill setup at Re = 10 595.All length scales are normalized by the hill height H and all velocities by the bulk velocity at the hill crest ūb , respectively.

Figure 10 :
Figure10: Data assimilation results with parameter ψ k for the periodic hill setup with the sparse reference data distribution (cf.fig.8).Shown are the cost function components and the test function (left), as well as the stream wise velocity components of the reference ūref , the baseline RANS results ūini , and the optimized results ū (right).

Figure 11 :
Figure 11: The simulation domain of the bump setup with indicated boundary types is shown in the left plot.Mean flow is in x direction and all length scales are normalized by the bump height H.The distribution of reference (dots) and test (crosses) data points are shown in the right plot.

Figure 12 :
Figure12: Data assimilation results with ψ k for the bump setup (cf.fig.11).Shown are the cost function components and the test function (left), as well as the stream wise velocity components of the reference ūref , the baseline RANS results ūini , and the optimized results ū (right).All length scales are normalized by the bump height H and all velocities by the free stream velocity ū∞ , respectively.

Figure 13 :
Figure 13: Distribution of reference (dots) and test (crosses) data points for the dense (left) and intermediate (right) configurations of the periodic hill setup (cf.fig.8).

Figure 14 :
Figure 14: Cost function (left) and mean stream wise velocity profiles of the mean-flow direction component ūx (right) for data assimilation parameter ψ k in the periodic hill geometry.The rows correspond to the dense (top), intermediate (center), and sparse (bottom) reference patterns, respectively (cf.figs.8 and 13).Plots for the sparse configuration are identical to those presented in fig.10.

Table 1 :
This table compares the reattachment lengths of the baseline RANS solution and the data assimilation results for parameter ψ k with different reference data configurations.The reattachment lengths are reported as the distance from the left boundary normalized by the hill crest height H.The relative errors are reported with respect to the reference value.

Table 2 :
This table compares the reattachment lengths of the baseline RANS solution and the data assimilation results for parameter α with different reference data configurations.The reattachment lengths are reported as the distance from the left boundary normalized by the hill crest height H.The relative errors are reported with respect to the reference value.