Development of a data-driven model for turbulent heat transfer in turbomachinery

. Machine Learning (ML) algorithms have become popular in many fields, including applications related to turbomachinery and heat transfer. The key properties of ML are the capability to partially tackle the problem of slowing down of Moore’s law and to dig-out correlations within large datasets like those available on turbomachinery. Data come from experiments and simulations with different degree of accuracy, according to the test-rig or the CFD approach. When dealing with modelling of turbulent flows in turbomachinery there is a constant trade-off between accuracy and computational costs, but starting from the large amount of data on turbomachinery performance, with ML it is possible to train a learner to correct and improve CFD. The aim of this work is to investigate an innovative data-driven approach that could lead to a significant improvement in the analysis of heat transfer in turbulent flows. The effects of Reynolds number and wall temperature on heat transfer for a double forward-facing step with two squared obstacles were investigated by numerical simulations carried out in OpenFOAM. Then a machine-learnt model was derived using a regression algorithm. The results of regressor showed that a data-driven approach can effectively predict results of the RANS model.


Introduction
Modelling of heat transfer is a key issue in many engineering applications, especially when dealing with Computational Fluid Dynamics (CFD) of turbulent flows with heat transfer. Among many open issues, most comes from the Boussinesq approximation and in particular to the limit of having Reynolds fluxes aligned to Reynolds stresses [1]. This condition is nonphysical, especially in strongly non-isotropic turbulence regions and in particular along solid walls. Moreover, these regions are also those which pose the most challenging conditions when dealing with grid refinement for CFD due to the resolution needed to integrate Navier-Stokes to the wall [2], and unfortunately most industrial CFD design and optimization loops still rely on the use of wall functions to model the near-wall flow and thermal fields. Since wall functions are usually calibrated on simple attached-flows conditions, this eventually leads to increase of errors when solving more complex flows with impingement, recirculation, reattachment, transition and other phenomena.
In the following we discuss the possibility of deriving the distributions of turbulent viscosity and diffusivity in a low-Reynolds RANS approach using machine learning, and in particular to apply this strategy to a complex flow configuration with multiple flow features, such as geometry-induced separation, impingement, reattachment, wakes using data from multiple Reynolds numbers. In particular, the case here reproduced is the double forwarding facing step with squared obstacles previously investigated in [3]. The case here considered is the one with the configuration: L= 1.6m, a=1m, b=0.2m, c=0.4m, H=0.1m, h1=h2=0.02m, w=t=0.01m, Figure 1. Combinations of Reynolds number and boundary conditions for temperature will be discussed in section 2.2 and is summarized in Table 1. This is a preliminary work to understand the feasibility of such approach in a simplified 2D geometry with multiple flow regimens at different Reynolds.

CFD method
Computations were carried out using the OpenFOAM v1812 [4] using the buoyantBoussinesqSimpleFoam solver for incompressible flows with heat transfer. In this case heat transfer is calculated assuming temperature to be a transported scalar with Prandtl number Pr=0.71 [5] and turbulence closure relied on the Launder-Sharma k- model [6]. The linearized system of equations was solved using conjugate gradient solver for all quantities. Tolerance was set to 10 -5 for pressure and 10 -8 for all the other equations. Numerical schemes used entailed central differencing for gradients, QUICK for divergence terms and upwind for Laplacian terms. Boundary conditions entailed mass flow rate at the inlet with TI=5%, /=60, T=293K. On solid surfaces no-slip conditions were used with integration to the wall approach and an average value of y + equal to 0.88 for the highest Reynolds number considered (maximum y+ = 1.1 in this case). Fixed temperature was imposed on heated walls according to the conditions investigated in Table 1. At the outlet convective conditions were imposed. The computational domain is 2D and the final mesh entails 622,400 cells. Convergence of grid was investigated using Nusselt number on the heated walls as convergence parameters using 300,200, 622,400 and 1,120,000 cells. The difference in average Nusselt is below 1% between 622,400 and 1,120,000 cells. Results of the 9 computations were used as training and testing data for the ML approach.

Dataset
In the present work, the regression algorithm is designed to implement a data-driven approach for heat-transfer in turbulent flows, in particular the final aim is to predict the turbulent thermal diffusivity (αt) and turbulent viscosity (νt) fields.
The overall training dataset of the regression algorithm consists of ∼95M elements and is the set of nine different smaller datasets each of ∼10.5M elements covering three different Reynolds numbers (Re1=30000, Re2=80000 and Re3=100000) and three different temperature values (T1=313K, T2=423K and T3=626K). Table 1 gives a summary of the nine different cases.

Feature creation and normalization
The training datasets initially includes information on seventeen features (Table 2). To avoid problems related to the frame of references all the selected features are scalars. In [7], Goodfellow et al. stated that ML algorithms are negatively affected by features that exhibit large variations in magnitude, in fact the predictor could overestimate the influence of features with the smallest values. To avoid this, features were normalized following the findings of [8] that pointed out how this method is more suitable than usual normalization techniques used in ML and was positively tested in [9]- [11]. According to [8], most features were normalized as: where ' is the normalized feature, the original feature, and a normalization factor summarized in Table 2, where Uref=mag(U) and Lref=4•mag(U)/Sii are respectively the reference velocity and length. This normalization, unlike most ML techniques, is based on local flow and thermal field properties. All the features that don't show a value for N in Table  2 were normalized with minMax scaler [7] as a local normalization would result in a constant value (like for velocity) or would not be suitable (like for angles).

Feature selection
A valid predictive model must guarantee that there is no obvious predominance between the input features and the output. Moreover, ML algorithms do not satisfactory work with data that present a strong direct correlation between each other since this would be transferred directly to the final model. In so doing, an Exploratory Data Analysis (EDA) is performed to verify the accuracy of the former feature selection and a correlation analysis of the data is reported in Figure 2.
From the correlation matrix it is possible to observe that the strain tensor (Sii) and the rotation tensor (Wii) are highly correlated and that β, θ and ζ are not correlated to t or αt. Thus, we decide to not include Wii, β, θ and ζ in the training data.

Regression algorithm and training
XGBoost [12] was selected as regression algorithm due to its resilience in working with continuous data with a high number of outliers [12]. Objective of the training was set to linear regression, with subsampling of columns set to 30%, learning rate to 0.1, maximum depth of each tree to 50, number of trees to 200. These hyperparameters were optimized using a random search algorithm [13].
The general framework is made up of two phases: i) training and test and ii) prediction. During training, the individual datasets relating to the nine cases are grouped and crossed based on the Reynolds Number of references to build three different training datasets (Table  3). Each of these three datasets is used to train a regression model for αt and νt. A logarithmic transformation is performed on αt and νt which are strongly inclined variables in a more normalized dataset. The use of the logarithm of these two variables improves the fit of the regression model by transforming the distribution of the features into a bell curve of a more normal shape. Figure 3 shows the values of αt and νt predicted from the models with respect to the training data. For each of the six training phases root-mean-square error (RMSE) remains below of 1%, this means that the training is successfully achieved and verifies the initial assumptions on feature selection and the framework architecture. As an example, Figure 4 shows the field of αt relating to case 3, which perfectly reproduces the real values.

Results
Prediction follows training of the algorithm: to test the generalization of the models the regression model is forwarded to the three cases not included in the training datasets. Cross validation is required to evaluate the generality of the models out of the training data, the target is to find out how the model is performing based on training data. By means of an interpolation process from Dataset B it is possible to predict both αt and νt fields of the three missing cases (case4-6) corresponding to Re 80000; similarly, with two extrapolation processes from Dataset A it is possible to predict both αt and νt fields of the three missing cases (case7-9) corresponding to Re 100000 and from Dataset C it is possible to predict that fields of the three missing cases (case1-3) corresponding to Re 30000. The results of the interpolation process are highlighted in in Figure 5 and in Figure 6, where it is shown the good agreement of the predicted values of αt and νt with their respective real values. For what concerns the extrapolation processes Figure 7 shows the prediction of νt relating to case 9 with a not perfect match with respect real values, Figure 8 depicts the prediction of αt relating to case 3 with a match with respect real values even worse than the previous one.     Figure 9 shows the results of the regression models for both features αt and νt. It is possible to state that the models have good agreement with the CFD values in particular for the interpolation process, but when dealing with the extrapolation processes the predicted values deviate more from the real ones. Table 4 gives a summary of the root-mean-square error of predictions.

Conclusions
We investigated the possibility of predicting a more accurate distribution of eddy viscosity and eddy diffusivity in a RANS framework to possibly extend the characterization of a turbulent and thermal boundary layer to a more accurate wall function.
A dataset of 9 cases was created using low-Reynolds computations of a complex 2D test case characterized by many flow regimens, varying two parameters: Reynolds number and wall temperature.
Then, the complete set of data was characterized using correlation matrix and other EDA techniques to select the appropriate features to be used for regression. From a number of 17 original features only 13 were used.
A XGBoost regressor was then trained and tested. In so doing a series of sub-dataset were selected aiming at investigating the capability of the model of interpolating and extrapolating the correct fields of eddy viscosity and diffusivity.
Interpolation lead to good results, while extrapolation showed the limits of this approach, suggesting that maybe a more complex ML approach should be used.
From the response of the algorithm we can infer that interpolation could increase its accuracy increasing the number of Reynolds numbers used for the training of the algorithm as the regressor has a linear trend whilst in the considered range of Reynolds the flow characteristics have increasing turbulent behaviour without reaching a stable range of operations.
Nevertheless since this is also a preliminary work to assess the possibility of using low-Reynolds approach to better tune an high-Reynolds wall function, we can see from Figures 4-8 that in the boundary layer the regressor has a high accuracy and therefore it is possible to use these data to tune a dynamic wall function that takes into account the local properties of the flow and thermal fields.