Machine Learning for RANS Turbulence Modelling of Variable Property Flows

This paper presents a machine learning methodology to improve the predictions of traditional RANS turbulence models in channel flows subject to strong variations in their thermophysical properties. The developed formulation contains several improvements over the existing Field Inversion Machine Learning (FIML) frameworks described in the literature, as well as the derivation of a new modelling technique. We first showcase the use of efficient optimization routines to automatize the process of field inversion in the context of CFD, combined with the use of symbolic algebra solvers to generate sparse-efficient algebraic formulas to comply with the discrete adjoint method. The proposed neural network architecture is characterized by the use of an initial layer of logarithmic neurons followed by hyperbolic tangent neurons, which proves numerically stable. The machine learning predictions are then corrected using a novel weighted relaxation factor methodology, that recovers valuable information from otherwise spurious predictions. The study uses the K-fold cross-validation technique, which is beneficial for small datasets. The results show that the machine learning model acts as an excellent non-linear interpolator for DNS cases well-represented in the training set, and that moderate improvement margins are obtained for sparser DNS cases. It is concluded that the developed machine learning methodology corresponds to a valid alternative to improve RANS turbulence models in flows with strong variations in their thermophysical properties without introducing prior modeling assumptions into the system.


Turbulence modelling
The governing equations of fluid flow have long been established, yet modelling turbulence remains one of the biggest challenges in engineering and physics. While it is possible to resolve the smallest scales of turbulent flows using direct numerical simulations (DNS), DNS is still unfeasible for real-world engineering applications. Due to this reason, engineers must rely on RANS turbulence models to describe turbulent flows. However, most of the development of turbulence models has focused on isothermal incompressible fluids. Therefore, these models can be inaccurate when applied to flows with strong variations in their thermophysical properties [1,2], such as supercritical fluids or hypersonic flows. Understanding the behaviour of flows subject to strong property gradients is critical for several engineering applications, such as heat exchangers, supersonic aircraft, turbomachinery, and various applications in the chemical industry [3,4,5,6,7]. Even incompressible fluids, such as water, can present large changes in viscosity when subjected to temperature variations.
For incompressible constant-property flows, the governing parameter in the description of turbulent boundary layers is the Reynolds number. For compressible flows, the Mach number and the associated changes in properties become additional parameters that characterize turbulent wall-bounded flows. From past studies, it is known that differences between a supersonic and a constant-property flow can be explained by simply accounting for the mean fluid property variations, as long as the Mach number remains small [8]. This result is known as Morkovin's hypothesis [9]. DNS of compressible channel flows [10] also suggest that in the near-wall region most of the density and temperature fluctuations are the result of solenoidal 'passive mixing' by turbulence. Previous work by Patel et al. [11] has provided a mathematical basis for the use of the semi-local scaling as proposed by Huang et al. [12]. It was concluded that under the limit of small property fluctuations in highly turbulent flows, a change in turbulence is governed by wall-normal gradients of the semi-local Reynolds number, defined as: where ρ is the density, µ dynamic viscosity, the bar denotes Reynolds averaging, the subscript w indicates the value at the wall, and Re τ is the friction Reynolds number based on wall quantities and the half channel height, h. Thus, Re ⋆ τ provides a scaling parameter which accounts for the influence of variable properties on turbulent flows. With the semi-local scaling framework and the fact that variable property turbulent flows can be successfully characterized by Re ⋆ τ , two main developments followed. First, in Patel et al. [13], a velocity transformation was proposed which allows to collapse mean velocity profiles of turbulent channel flows for a range of different density and viscosity distributions. Although following a different approach, this transformation is equivalent to the one proposed by Trettel and Larsson [14]. Second, this insight has later been used in Pecnik and Patel [3] to extend the semi-local scaling framework to derive an alternative form of the turbulent kinetic energy (TKE) equation. It was shown that the individual budget terms of this semi-locally scaled (TKE) equation can be characterized by the semilocal Reynolds number and that effects, such as solenoidal dissipation, pressure work, pressure diffusion and pressure dilatation, are indeed small for the flows investigated. Based on the semi-locally scaled TKE equation, Rodriguez et al. [1] derived a novel methodology to improve a range of eddy viscosity models. The major difference of the new methodology, compared to conventional turbulence models, is the formulation of the diffusion term in the turbulence scalar equations.
While these corrections improve the results of RANS turbulence models significantly, they can still be subject to further improvements. Due to these reasons, the present investigation will focus on building ML models to improve the performance of existing RANS turbulence models.

Machine Learning
In recent years, machine learning has been successfully applied in fluid mechanics and heat transfer due to its inherent ability to learn from complex data, see for instance Chang et al. [15]. While different ML methods are available, deep neural networks have emerged as one of the most promising alternatives to improve turbulence modelling [16]. These systems are able to approximate complex non-linear functions by using nested layers of non-linear transformations, which can be adapted to the context of every application to optimize the usage of computational resources and to mitigate over-fitting. Different types of neural networks currently hold the state-of-the-art accuracy record in challenging domains, such as computer-vision or natural-language processing [17]. During the last decade, one of the main reasons behind the success of deep learning has been the ability of neural networks to both approximate general non-linear functions while providing multiple alternatives to optimize their design.
Significant works in the context of deep learning applied to CFD can be found in the studies of Ling et al. [18], who developed deep neural networks to model turbulence with embedded Galilean invariance, or in the work of Parish and Duraisamy [19], where field inversion machine learning (FIML) is proposed in the context of CFD. Despite the abundance of recent works, significant research is still required regarding the application of ML in the context of CFD, and rich datasets to study turbulence in complex conditions must still be outlined. The future availability of datasets to study turbulence in complex geometries is particularly promising, as this could yield new models with strong applications to industrial and environmental problems.
The methodology for the present study is based on the FIML framework proposed by Parish and Duraisamy [19]. This methodology focuses on building corrections for existing RANS turbulence models instead of attempting to rebuild existing knowledge entirely. In the FIML framework, the process of building machine learning models is split into two stages. In the first stage, a data gathering process known as field inversion is performed, where the objective is to identify an ideal set of corrections for the RANS turbulence model under study. Then, in the second stage, a machine learning system is trained in order to replicate the corrections identified. The main advantage of this procedure is that the training process of a neural network is effectively decoupled from the CFD solver, thereby improving the efficiency of the procedure by several orders of magnitude.
For the present work, several modifications are proposed with respect to the study made by Parish and Duraisamy [19] and the subsequent publications of Singh et al. [20,21,22]. The modifications considered cover different stages of the problem; such as the optimization methods employed in field inversion, the generation of automatic formulas to compute the gradients of the CFD system, the possibility to automate the process of generating feature groups for the ML system, and novel methods to improve the stability of the FIML methodology while making predictions.

Fully developed turbulent channel flows
In this work we consider fully developed turbulent channel flows for which a large number of available DNS studies exist, and for which the time and space averaged conservation equations can be substantially simplified.

DNS database
The DNS database of turbulent planar channel flows consists of three different sets of simulations. The first set represents variable property low-Mach number channel flows with isothermal walls, heated by a uniform volumetric source to induce an increase of temperature within the channel [3,13,23]. Using different constitutive relations for viscosity µ, density ρ and thermal conductivity λ as a function of temperature, different DNS cases are used to study the effect of varying local Reynolds and Prandtl number on near wall turbulence. The cases with their respective relations for the transport properties and their corresponding wall-friction velocity based Reynolds number and local Prandtl number are summarized in table 1 (low-Mach number cases). Most of the cases have a friction based Reynolds number at the wall of Re τ =395. Depending on the distribution of density, viscosity, and conductivity, the semi-local Reynolds number Re ⋆ τ and the local Prandtl number are either constant, increasing or decreasing from the walls to the channel center. More details on the cases can be found in Refs. [13,23,3]. The second set of DNS consists of high-Mach number compressible channel flow simulations with air modeled as a calorically perfect gas [14] (high-Mach number cases). The Mach number ranges from 0.7 to 4 and the corresponding constitutive laws for the transport properties, Re τ and Prandtl number Pr are summarized in table 1 as well. The third set of simulations contains incompressible channel flows [24] (incompressible cases). These cases been added as an additional set to train the FIML framework to account for a large range in Reynolds numbers for incompressible flows.
For all of the variable property DNS cases, it is possible to show that Morkovin's hypothesis applies [11]. This hypothesis establishes that only the averaged values in molecular properties can be used to characterize the changes in turbulence, and that any higher-order correlations of turbulent fluctuations observed in these properties have a negligible impact in the mean balances [11,10].

RANS equations
To model the turbulent channel flows described above, we use the Reynolds/Favre averaged Navier-Stokes equations. For a fully developed turbulent channel flow, the only in-homogeneous direction of the averaged flow corresponds to the wall-normal coordinate, leading to a set of one-dimensional partial differential equations for the mean momentum, mean energy and any additional transport equations for the turbulence quantities used to close the RANS equations. The Reynolds/Favre averaged streamwise momentum and energy equations for a fully developed turbulent channel flow read with the variables u and T referring to the Favre-averaged streamwise velocity and the cross-sectional temperature profiles, respectively. The variables c p , Pr t and S φ refer to the isobaric heat capacity, the turbulent Prandtl number, and an arbitrary volumetric heat source term. The coordinates x and y further refer to the streamwise and the wall-normal IC.Re4200 ---4200 --- Table 1: DNS database of turbulent channel flows with variable properties (low-Mach) [13,3], with ideal gases at high-Mach numbers [14], and with constant properties (incompressible) [24].
directions for the channel flow. The wall based friction Reynolds number, the Prandtl number and the friction based Eckert number are defined as with u τ,w = τ w /ρ w the friction velocity, h the channel half width, γ the ratio of specific heats and Ma τ,w = u τ,w /a w , where τ w is shear stress and a w the speed of sound at the wall. Given these non-dimensional groups, the nondimensional density, temperature, viscosity, thermal conductivity are one at the wall, while the non-dimensional isobaric heat capacity is c p = 1 in the whole domain.
The mean momentum and energy equations make use of the Boussinesq approximation and the strong Reynolds analogy to model the turbulent shear stress, the turbulent heat transfer, and the turbulent dissipation in the energy equation (first term on the right-hand-side). As such, a turbulent eddy viscosity µ t appears in eqs. (2) and (3), which is commonly provided by an eddy viscosity model. While many eddy viscosity models exist in literature, in this work we choose a simple k − ε turbulence model [25], which has also been used in our previous studies to model turbulence in variable property turbulent channel flows [1]. The equations for the turbulent kinetic energy k and turbulent dissipation ε read with the supporting damping functions and the definition of the turbulent Reynolds number, the semi-locally scaled wall distance and the eddy viscosity, respectively, The constants take the following values: C ε1 = 1.4, C ε2 = 1.8, C µ = 0.09, σ k = 1.4 and σ ε = 1.3. Note, the original model uses the wall distance based on viscous wall units y + in the damping functions. Here, we replaced y + with y ⋆ to account for the changes in viscous length scales due to changes in density and viscosity close to the wall [1]. A preliminary study revealed that the approximation of Pr t ≈ 1 yields accurate temperature profiles for the DNS cases that we will consider. The Python and the Matlab source codes to solve the set of RANS equations with the associated boundary conditions can be found on Github [26]. The velocity profiles for a few selected cases with the original MK turbulence model are shown in Fig. 1. Large deviations occur in flows subject to strong variable-property gradients. The largest deviations found in such regimes can be found in the DNS case JF M.CRe ⋆ τ from Table 1. Here, it can be noted that the maximum error margin reaches a magnitude of 30.4% at the channel center (y = H). Based on these results, it can be noted that the MK turbulence model corresponds to an interesting target for ML optimization, since there exist large deficits to be mitigated.

Improved field inversion machine learning for variable property turbulence
In this section we present an improved methodology of the FIML as proposed by Parish and Duraisamy [19], which is also suitable to account for turbulence in variable-property flows.

Field inversion
In order to minimize the difference between the DNS and the modeled velocity obtained with the RANS approach, the original k − ε equations are modified by introducing field inversion multipliers β. The turbulent kinetic energy k and the turbulent dissipation ε, eqs. (5) and (6), can then be written as Contrary to Parish and Duraisamy [19], we multiply the dissipation rather than the production terms, in order to adhere to energy conservation, i.e. turbulent kinetic production also appears in the mean kinetic energy equation with an opposite sign [27]. On the other hand, introducing β k in the k-equation can lead to an imbalance of turbulent production and dissipation in the log-layer region. Therefore, we also present results where only β ε is used to perform the field inversion, since the ε-equation contains the largest amount of empiricism. Another reason to modify the dissipation instead of the production terms is because the production is active in a smaller region of turbulent boundary layers. This implies that field inversion optimizers modifying the dissipation term have a larger capacity to build corrections in regions where other budget terms of RANS turbulence models are still active, such as the diffusion terms.
It is important to note that field inversion optimizers build corrections, which are ideal with respect to the cost function formulated. As a result, the cost function for the field inversion process must be carefully designed, to minimize not only the differences between the velocity profiles, but also the shape of the corrections that will be applied to the turbulence model. A suitable cost function J is defined as with individual weights I U , I k and I ε for each term in the cost function. The first term represents the difference between the RANS velocity profiles (u) and the DNS data (u * ), whereas the subsequent terms are equivalent to source/sink terms in the turbulence modelling equations. It can easily be shown that δ is related to β as Finally, S U , S k and S ε are used to normalize the variations in the cost function, and they are defined as δ k and δ ε are normalized such that the importance factors, I, are easier to interpret among all DNS cases considered in this study. As it can be seen in the present formulation, the final field inversion study must include an hyper-parameter optimization analysis for the values of I U , I k and I ε . The selection method is based on the elbow method [28], since it was found that each field inversion shows clear inflection points (discussed in detail later).

Optimization algorithm
To solve the previously defined field inversion problems, we will use gradient-descent (GD) algorithms. In general, GD algorithms are preferred over Hessian methods to solve complex non-linear optimization problems across different fields. Moreover, for our specific application, it can be shown that the Hessian matrix is non-invertible at the channel center due to the vanishing gradients near the symmetry plane. Accordingly, the β multipliers at the channel center have a negligible effect on the solution, and thus their influence on the cost function J is nearly zero. Another favorable property of GD algorithms is that their results yield continuous spatial distributions due to the smoothness of the gradients associated with the turbulence model. Furthermore, GD algorithms tend to leave the β multipliers near the channel center at their initial values (β = 1), since these algorithms do not modify parameters which are not relevant to the cost function J.
β n ← β n−1 10: The GD algorithm used in the present study is based on the traditional bold drive method [29]. However, we also introduced gradient inertia to increase the convergence speed. The final approach for the optimizer is shown in algorithm 1. In this algorithm, the optimizer starts by taking a traditional step using gradient-descent with added momentum. The values generated for the gradient inertia and the optimization parameters are stored using the auxiliary variables m ′ and β ′ respectively. If the updated value for the cost function J (β ′ ) is lower than before, the temporary values for m ′ and β ′ are accepted as the new state of the system. Additionally, the learning rate α is increased according to the expansion ratio k + . This allows the optimizer to dynamically search for a learning rate schedule that maximizes the convergence speed. If divergence is detected (J (β ′ ) > J n−1 ), the optimizer retains the β parameters from the previous iteration (β n−1 ), resets the gradient inertia (m) to the current Jacobian, and decreases the learning rate according to the ratio k − . These simple steps allow the optimizer to perform a line-search process, seeking optimal values for the learning rate α. Gradient inertia must be necessarily removed from the system during the line-search process, since otherwise it cannot be guaranteed that the algorithm will converge to an optimized β distribution.
The recommended values from literature for the parameters k + and k − are 1.1 and 0.5, respectively. However, in the present study, we employ an aggressive expansion value of k + = 1.2. The constant c corresponds to the gradient inertia hyper-parameter. For this variable, a recommended value of c = 0.9 can be found across a wide variety of algorithms described in the literature [30,31]. It was found that the introduction of the gradient inertia decreased the running times by a factor three with respect to the original bold drive method. The proposed algorithm allows to fully automatize the process of field inversion, and to subsequently run over 600 optimization cases in total.

Jacobian matrix calculation
The Jacobian associated with the field inversion process is computed using the discrete adjoint method. In this method, the discretized RANS equations are written as a residual vector R(W(β), β) = 0, that contains one entry per every discretized cell and scalar equation. The variable W(β) corresponds to the vector of dicretized physical degrees of freedom present in the RANS equations, such as pressures or velocities. According to the discrete adjoint method, the Jacobian ∇ β J can be calculated as where the vector Ψ can be obtained from the following system of linear equations The main advantage of the discrete adjoint method is that only the vector Ψ must be calculated from eq. (18), whereas a direct calculation method based on chain-rule differentiation would require the computation of the rank 2 sensitivity matrix ∂W/∂β. Since the latter matrix is orders of magnitude larger than the vector Ψ, the discrete adjoint method constitutes a better alternative. In order to generate explicit formulas for all the entries present in the matrices ∂R/∂W and ∂R/∂β, we utilize symbolic algebra packages, such as Sympy [32]. As a result, the coefficients of these matrices are described by long arithmetic formulas, which can be inserted into the source code of a function written in any programming language. The use of explicit formulas increases the speed of our optimizer as any zero coefficients are immediately cancelled by the algebraic package. Moreover, commonly repeated algebraic sub-terms, such as the eddy viscosity µ t = C µ f µ ρ k 2 /ε, can be replaced by auxiliary variables to avoid redundant calculations.

Neural networks
In order to complete the FIML methodology, we need to construct a predictive system using neural networks. These networks use hyperbolic tangent neurons in their deeper layers, due to their inherent ability to produce smooth output distributions and since they fulfill the universal approximation theorem [33,34]. An early reference to the use of hyperbolic tangent neurons in the context of fluid mechanics can be found in the work of Milano and Koumoutsakos [35]. In the first layer of the neural network we use logarithmic neurons. These neurons, first proposed by Hines [36], are capable of automatically discovering input parameter groups, such as the Reynolds number (Re), the Prandtl number (Pr) or any feature present in boundary layers. The ability to create parameter groups is important, since it is not fully known which features (X) are best suited to model the non-linear relationship δ = f (X). As a result, the introduction of logarithmic neurons into ML systems allows the optimizer to determine which feature groups are optimal; even in the absence of previous modelling knowledge.

K-fold validation
Due to the relatively small size of our database, in combination with the diversity of cases, it proved difficult to split the data between training, cross-validation (CV) and test sets. Picking relevant CV sets that were unbiased by prior turbulence modelling knowledge corresponded to a difficult challenge, since the uniqueness of many DNS samples contained in our database implied that the cases picked for cross-validation could greatly underestimate the error margins found in the test set. As a result, employing CV sets proved to be ineffective.  As a result, the study was performed using the K-fold validation method [37]. This method assesses the robustness of machine learning models by picking "K" random training sets, and subsequently evaluating the results with the remaining test set. If a large variance is detected, it may indicate that more data is required to train the ML system effectively, or that a different ML architecture is required. The K-fold validation method corresponds to one of the best alternatives available to assess the performance of ML systems trained with small datasets [38].
The test sets for the different K-fold combinations are listed in table 2. The DNS cases are picked randomly, except for the K-fold set (K-1), which corresponds to a hand-picked selection of challenging test cases yielding large modelling errors when using the MK turbulence model. Among the test cases of the K-fold set (K-1), DNS cases are included that are subject to extreme conditions; such as the cases JF M.CRe ⋆ τ and M4.0R200. As a result, the K-fold set (K-1) represents a scenario where challenging predictions are required, despite the absence of adequate training samples. The incompressible DNS cases extracted from the work of Jiménez and Hoyas [24] are added to the test sets of the K-fold validation trials (K-2 to K-10) in order to assess the response of the ML system for different Reynolds numbers. However, these DNS cases were omitted for the K-fold set (K-1), since the intent of this validation run was to test the ML system only under challenging variable-property flow cases.
The selection procedure to determine the final machine model for the study is based on finding the smallest neural network architecture which is capable of fitting the training data available for the K-fold combination (K-1) listed in Table 2. According to the principles of the elbow method, the smallest system which can fit the training data is less likely to produce over-fitting than larger machine learning models. The K-fold set (K-1) is chosen, since this combination represents a realistic scenario where a selection of the most challenging CFD cases remain hidden from the training data. In summary, the final neural network architecture is not pre-conditioned to perform well under the most complex test conditions available. Fig. 2 shows a comparison between the δ k predictions, made by a fictitious neural network (δ ini ), and the groundtruth labels δ * obtained through field inversion for the DNS case JF M.CRe ⋆ τ . The values of δ ini in Fig. 2 contain the explicit perturbation term ∆ = 0.6 y 3 sin (8πy), which was added to test the robustness of the present methodology. The corrections present in the distribution δ ini are physically plausible up to y ⋆ < 100, while spurious oscillatory corrections exist at the channel center. These oscillations also produce unstable behavior in CFD solvers, since all budget terms in the RANS equations are inactive near the symmetry plane at the channel center. To mitigate the previous issues, we introduce a weighted relaxation factor methodology, α, which is defined as the fraction of the original ML corrections δ ini to keep.

Weighted relaxation factor
The derivation of the weighted relaxation factor starts by noting that the magnitude of the final corrections that will be applied to the RANS model, δ f , only corresponds to a fraction, α, of the original corrections predicted by a neural network, δ ini . This relation can be stated as or alternatively, In order to assess the true compatibility of the final δ f corrections with a given RANS turbulence model, we express δ f in eq. (20) as β times the production term P, namely in the corresponding turbulence modelling equations. They key idea to build a robust methodology is to recognize that spurious machine learning corrections, such as δ ini in Fig. 2, create large oscillations in the β multipliers defined by eq. (21). As a result, the introduction of a L2-regularization hyper-parameter, λ, for these β multipliers would immediately penalize the presence of large oscillations in regions where RANS turbulence models are inactive. The cost function associated with this problem is Eq. (22) states that the final β multipliers must produce the greatest degree of similarity between δ f and δ ini , while minimizing the magnitude of β 2 according to a regularization hyper-parameter λ. In order to minimize the cost function defined in eq. (22), its Jacobian can be forced to form a null vector: Replacing eq. (22) into the previous condition, yields the following element-wise array equation: Re-arranging the terms of eq. (24) further reveals that Note, eq. (25) is evaluated element-wise. Replacing eq. (25) back into eq. (21) gives a direct residual equation for λ Since eq. (26) only contains one unknown (λ), a simple root-finding algorithm can be used to solve this optimization problem, such as the Newton-Raphson method. For reference, the gradient of the previous residual equation (R λ ) is given by the following formula: After obtaining the regularization hyper-parameter, λ, the final ML corrections (δ f ) are given by: Eqs. (26)(27)(28) constitute the only required components to implement our weighted relaxation factor methodology in a computer environment. The results depicted in Fig. 2, show how the relaxation factor removes 40% of the worst part of the initial corrections, δ ini , based on the budget for the production term of the k-equation. The final distribution obtained, effectively resembles the ground-truth labels, δ * , which were hidden from the system. Finally, it is important to note that the current weighted relaxation factor methodology is applicable to either 1D, 2D or 3D RANS turbulence models. The systems of equations obtained in each case are identical, since the formulation employed does not impose constraints on the spatial distribution of the data points considered.

Final framework
The final machine learning framework developed can be found in Fig. 3, which is split into two stages. In the first stage, shown in Fig. (3.a), a baseline RANS turbulence model is compared with existing DNS data in order to produce δ k field inversion corrections. These corrections are subsequently used as ground-truth labels to train a deep learning system, which is able to replicate the trends observed based on a stack of relevant features X f . One of the main differences between the framework described in Fig. (3.a) and the original approach proposed by Parish & Duraisamy [19] is that our field inversion corrections δ k are subject to L2-regularization based on their absolute magnitude as a fourth budget-term in the RANS equations, instead of their values as relative β multipliers with respect to existing RANS terms. This enables the field inversion optimizer to build corrections that follow patterns which are not captured by baseline RANS models. Additionally, the framework described in Fig. 3(a) has been adapted to account for the changes observed in flows subject to strong variations in their thermophysical properties, namely, ρ, µ and λ. The approach chosen effectively decouples the analysis of the RANS momentum equations from the energy equation or any associated equations-of-state for fluids. This is achieved by passing the DNS profiles for the molecular properties of the fluids to the baseline RANS turbulence models during the field inversion process. However, one challenge introduced by this procedure is that the stack of features X f used to predict the field inversion corrections δ k must be based on accurate estimations of the profiles for the molecular properties of fluids. This challenge was solved by using the machine learning framework presented in Fig. 3(b), which corresponds to the system employed to create predictions for unknown CFD cases. This algorithm is based on an iterative feedback loop, where the improved velocity profiles generated by the predicted δ ML corrections are passed back to the viscous heating term of the energy equation in order to generate better estimates for the profiles of the molecular properties µ and ρ. The algorithm is initialized by using the profiles generated by the baseline RANS model without δ corrections, and it iterates until convergence is achieved. The weighted relaxation factor methodology described in section 3.2.2 is applied immediately once neural network predictions are obtained.
The final neural network architecture is depicted in Fig. 4. Here, it can be seen that only three logarithmic neurons are used in the initial layer. The initial stack of features, presented in Fig. 4 [19]. The diagram on the left (a) presents the methodology employed to obtain field inversion corrections and train deep learning systems, whereas the scheme on the right (b) corresponds to the feedback loop used to used to perform predictions at runtime.

Field inversion results
This section will describe the results of the FIML study for the MK turbulence model. First, the different hyperparameter combinations for the field inversion study will be analyzed. Then, the observed trends in the final machine learning predictions will be presented, followed by a brief discussion of the results.
The field inversion study of the MK turbulence model focuses on determining the values of the hyper-parameters I U , I k and I ε in eq. (11). In the first combination, an equal importance is assigned to the corrections used in each scalar equation (k and ε) by setting I k = I ε = 1. The value of I U was calibrated by applying the elbow method to the system. The results obtained can be found in Fig. 5, where it can be seen that clear inflection points exist for each case. For any subsequent ML analysis, it is possible to either choose the I U values located at the inflection point of each DNS case, or to pick a common value of I U for all cases. It was decided to pick a common value of I U = 100 for all DNS cases, since this creates smooth trends across the whole dataset. Additionally, selecting a unique value for I U can simplify the creation of deep learning models, since all the target δ * corrections correspond to the solution of a single optimization problem. If different I U values were picked for each DNS case, additional training data might be required to allow the deep learning system to approximate the selection criterion employed.  The second stage of the hyper-parameter optimization study consists in analyzing the effect of the individual values of I k and I ε in the field inversion results. The effects of varying these hyper-parameters are depicted in Fig. 6 for the DNS case CRe ⋆ τ , which corresponds to the case with the highest modelling errors according to the MK turbulence model. The results show that varying the values of I k and I ε has a minor influence in the shape of the corrections, since only the magnitude of the peaks tends to change. After a detailed analysis, it can be proven that the trends observed in Fig. 6 are also present across different DNS cases, and that further tuning the values of I k and I ε is not relevant in the context of the present study.
After analyzing the effect of the different hyper-parameters found in the field inversion formulation, it was decided to study the effect of building independent sets of δ k and δ ε corrections. Employing a unique set of corrections can simplify the subsequent ML study, since the need to produce two-dimensional output pairs (δ k , δ ε ) is avoided. By applying the elbow method to calibrate the values of I U for each set of independent predictions, it is found that I U = 100 corresponds to a reasonable approximation as well. The individual corrections obtained for δ k and δ ε can be found in Fig. 7. While both distributions appear similar, it can be proven that the corrections for δ ε tend to present sharper transitions in the buffer layer near y ⋆ = 30 than their counterparts given by δ k , especially for the DNS cases requiring the largest corrections. The previous fact is found to have a large influence on the quality of the final machine learning predictions obtained for each case, since the presence of sharp transitions for δ ε tends to create larger spikes in the predictions made by NNs. As a result, it is decided to employ only δ k corrections in the final study.

Machine learning predictions
The final ML predictions were obtained by training the deep learning architecture described in Fig. 4, and subsequently applying a weighted relaxation factor of α = 0.5 in eq. (26) to smooth all the predictions generated by neural networks. The hyper-parameter α = 0.5 was found to yield favourable results, despite substantially reducing the magnitude of the final δ k corrections. A selection of the results can be found in Fig. 8. The K-fold validation trials present in this figure correspond to the three cases with the highest modelling errors, and to the K-fold trial with the lowest error margins (K-6). The DNS cases used in the comparison also correspond to a selection of the cases with the highest and the lowest test errors found in each validation set. The results show that the ML predictions are able to retain or improve the initial predictions of the MK turbulence model in nearly every case tested. The results found for the DNS case CRe ⋆ τ are analyzed in greater detail and presented Fig. 9. Here, the average modelling error and corresponding standard deviation are plotted. It can be seen that the different K-fold validation trials achieve an average improvement of 13.3%. While this improvement margin may seem modest, it is important to remember that the DNS case JF M.CRe ⋆ τ corresponds to an atypical sample with respect to the remaining dataset. The extent of the modelling errors found in the DNS case JF M.CRe ⋆ τ with the original MK turbulence model is unmatched by any other DNS case available. As a result, it can be seen that the ML architecture is able to achieve a robust behavior even in the presence of adverse conditions. Moreover, the ML system also manages to yield accurate predictions in DNS cases, which are closely related to the available training data, such as samples from the supersonic flow cases simulated by Trettel and Larsson [14]. Based on the previous success, it can be concluded that the ML system is able to act as an effective non-linear interpolator, and that the predictions made for sparse DNS cases only yield moderate improvements.
Regarding the feature groups created by the NN architecture, different trends were observed across each K-fold validation run. Obtaining conclusions regarding the feature groups was complex, since the subsequent layers of hyperbolic tangent neurons in the ML system also created non-linear combinations between the different parameter groups assembled. Furthermore, the stack of features X f passed to the ML system is also likely to present complex non-linear correlations. As a result, the NN itself was able to tweak the scope and influence of each parameter group chosen during the K-fold validation runs. Despite the previous variability, it is important to note that the system achieved similar predictions across different K-fold validation trials. Reducing the size of the NN architecture was deemed unfeasible, since the ML model became incapable of fitting the training data appropriately when such attempts were made. As a result, it can only be concluded that more DNS data should be used in subsequent studies. However, it can still be noted that the ML architecture developed was able to yield positive contributions across a wide variety of unique/sparse DNS cases. Therefore, it can be concluded that the ML architecture developed is well-posed to make predictions on sparse DNS samples which may not be well-represented in the training set.   Figure 8: Selection of the results obtained for the K-fold validation trials at the extreme ends of the test error ranking generated during the prediction of the independent sets of δ k corrections for the MK turbulence model. The upper curves represent the initial RANS velocity profiles (red dashed lines), the data-augmented velocity predictions (blue solid lines) and the reference DNS data (black dotted lines). The lower curves present the initial predictions made by the NN system (red solid lines), the results obtained after applying a weighted relaxation factor of 0.5 (blue solid lines) and the ground-truth labels for the field inversion values (black dotted lines).

Conclusions
In this paper we used machine learning to improve the predictions of RANS turbulence modelling in channel flows subject to strong variations in their thermophysical properties. The study was based on a technique known as FIML proposed by Parish and Duraisamy [19]. Several adaptations were introduced into the formulation proposed by the original authors, which can be recommended for future studies. The first recommendation is the use of the bold drive method with added momentum to drive field inversion optimization processes. This method proved to be unconditionally stable and numerically efficient in over 600 optimization cases run during the present work. As a result, this method can operate automatically requiring minimal attention from the user. The use of symbolic algebra solvers to generate expressions for the entries present in the matrices required by the discrete adjoint method in CFD also proved to be a valuable alternative, since the closed-form expressions generated are sparse-efficient. Furthermore, the addition of β multipliers to specific terms in the RANS equations proved to yield numerically stable formulations. The overall shape of the corrections obtained can be controlled by employing cost functions containing adequate conversion terms (e.g., β vs. δ). Regarding this context, the use of a direct non-linear optimization approach is also recommended in the context of RANS turbulence modelling in CFD, since DNS data contains negligible levels of numerical noise or discretization errors.
Regarding the machine learning study, the use of an initial layer of logarithmic neurons followed by layers of hyperbolic tangent neurons proved to be a robust architecture, which was able to yield positive contributions in nearly every case tested. The use of the weighted relaxation factor methodology proved to be an important asset within this context, since it was able to recover valuable trends present in otherwise spurious predictions. The high variability observed among the different K-fold validation trials did not manage to destabilize the final results. The overall behavior of the ML models indicates that the system is able to act as an excellent non-linear interpolator between DNS cases well-represented in the training set, whereas the predictions obtained for sparse DNS samples only show moderate improvement margins. Due to the previous reasons, in future studies it is recommended to search for parameter groups which display low levels of variability across different DNS cases, such that the process of building ML predictions can be simplified. The substantial variations among the K-fold validation trials also suggest that more training data should be incorporated into the study. Despite the challenges encountered, the FIML methodology developed to account for strong variations in the thermophysical properties of fluids was able to yield positive improvements across a wide variety of unique DNS cases, which were not well-represented in the training set, such as the DNS case JF M.Re ⋆ τ .