Applying Bayesian Optimization with Gaussian Process Regression to Computational Fluid Dynamics Problems

Bayesian optimization (BO) based on Gaussian process regression (GPR) is applied to different CFD (computational fluid dynamics) problems which can be of practical relevance. The problems are i) shape optimization in a lid-driven cavity to minimize or maximize the energy dissipation, ii) shape optimization of the wall of a channel flow in order to obtain a desired pressure-gradient distribution along the edge of the turbulent boundary layer formed on the other wall, and finally, iii) optimization of the controlling parameters of a spoiler-ice model to attain the aerodynamic characteristics of the airfoil with an actual surface ice. The diversity of the optimization problems, independence of the optimization approach from any adjoint information, the ease of employing different CFD solvers in the optimization loop, and more importantly, the relatively small number of the required flow simulations reveal the flexibility, efficiency, and versatility of the BO-GPR approach in CFD applications. It is shown that to ensure finding the global optimum of the design parameters of the size up to 8, less than 90 executions of the CFD solvers are needed. Furthermore, it is observed that the number of flow simulations does not significantly increase with the number of design parameters. The associated computational cost of these simulations can be affordable for many optimization cases with practical relevance.


Introduction
In computational fluid dynamics (CFD), the Navier-Stokes equations or variations of them are numerically solved in order to simulate fluid flows. At most of the Reynolds numbers associated to environmental flows and relevant to engineering applications, flows are turbulent, see e.g. Ref. [14]. Various approaches have been developed to numerically simulate turbulent flows, ranging from low-fidelity Reynolds-averaged Navier-Stokes (RANS) simulation to high-fidelity approaches including large eddy simulation (LES) and direct numerical simulation (DNS), see Ref. [63]. Moving from RANS toward DNS, the influence of modeling is reduced, the role of numerics becomes more dominant, and above all, the computational cost increases [68]. In many applications where the fluid flows are involved, we need to optimize different parameters in order to meet (within a margin) a certain set of objectives while satisfying a set of constraints, see Refs. [76,17] and the references therein. The aim of the present paper is to illustrate how Bayesian optimization (BO) [67,21], which has been mostly utilized in the field of machine learning and data sciences, can be applied to different types of optimization problems arising in CFD. Previous application of Bayesian optimization to CFD problems is limited to a few studies, see Refs. [73,44,38]; therefore, the diversity of the applications considered in the present study can reveal the flexibility of the method as well as shed light on its pros and cons.
To put the work in context, we shortly review the main approaches which have been employed for the purpose of optimization in CFD. Such approaches are iterative and can, in general, be divided into gradient-based, gradient-free, and gradient-enhanced, see Refs. [46,76]. The gradient-based methods are the appropriate choice when the evaluation of the derivatives of the cost function with respect to the design parameters is computationally efficient. At each iteration, the first-order gradient is needed to determine the searching direction and the second-order gradients (i.e. Hessian matrix) or the approximations of them are required to obtain the optimal step size, see e.g. Ref. [46]. Therefore, depending on the approach, a minimum order of smoothness for the objective function is necessary. The gradients can be computed by automatic-differentiation (finite differences), or more efficiently by solving forward or adjoint sensitivity equations. The use of the latter is recommended when the number of design parameters is more than the number of objectives, see [46,28,76]. As its main advantage, the gradient-based optimizations exhibit fast convergence and can handle large numbers of design parameters. However, it is always likely to be trapped by local optima. Besides this, in the cases of unsteady Navier-Stokes equations the need for saving the forward solution fields which are required to solve the adjoint equations may lead to memory issues. To rectify these issues, extra treatments are needed, see [3,29]. Various types of application of gradientbased methods for optimization in CFD can be found in the literature, including shape optimization [41], optimization of initial conditions for fast transition to turbulence [32], and topology optimization in turbulent flows using the RANS approach [17].
In gradient-free approaches, the simulator (here, the CFD solver) is treated as a blackbox and run to evaluate realizations of the response at the samples of the design parameters. This approach which goes back to Box and Draper [7] can be implemented using different methods, see Refs. [33,20]. The simplest method is the grid search, where a set of manually selected samples are tested to find the optimum. Evolutionary algorithms [2], which mimic the nature's survival by iteratively simulating "selection and mutation" of the design parameters, are also considered as an effective approach; see e.g. Ref. [82] where the algorithm is applied to modify the RANS stress-strain relationship. However, the most frequently-used method of this type in CFD is called the response-surface method (RSM), in which a metamodel or surrogate is constructed using a finite set of training data comprised of parameter samples and corresponding responses. The size of the training data set is determined as a balance between the required accuracy of the surrogate and the cost involved in running the simulator. In many CFD applications, Gaussian process regression (GPR), or the Kriging method [57,27], has been employed to construct the surrogate. As detailed in Section 2, GPR is among the non-parametric surrogates, naturally allows for noisy (uncertain) data, and more importantly, can estimate the uncertainties involved in their predictions. Examples of RSM-based optimization can be found, for instance, in Viana et al. [78]. Contrary to the gradient-based methods, RSM is suitable for finding the global optima, but it suffers from the curse of dimensionality. To reduce the required number of samples for highdimensional parameters yet constructing an accurate response surface, gradient information can be added to the training data, see the review by Laurent et al. [35]. In particular, the gradient-enhanced Kriging (GEK) has been extensively used in the last two decades for optimization in CFD, see for instance, [11,34]. The study by Laurenceau et al. [34] showed that at large number of parameters (=45), GEK outperforms RSM-GPR, while at small number of parameters (=6), inclusion of the gradients makes no significant improvement in the computational performance of the optimization.
In the present study, the Bayesian optimization based on Gaussian process regression, hereafter referred to as BO-GPR, is employed, see Refs. [21,67]. This approach can be classified among the gradient-free approaches, although a gradientenhanced version of it has also been proposed, see Ref. [84]. In BO, a sequence of samples for the design parameters is taken which converges to the global optimum. Therefore, as a main difference with the RSM approach, the surrogate in BO is sequentially updated. That is, in fact, the clear connection of BO to the Bayesian formalism, see e.g. Ref. [69]. As detailed in Section 2, at each iteration, the decision about the next sample is taken based on combining the exploitation and exploration, where the latter directly takes into account the predictive uncertainty of the surrogate. This active involvement of the uncertainties in the algorithm is another distinguishing characteristic of the BO approach over RSM methods. In general, the predictive uncertainties are a combination of the uncertain CFD data and the constructed surrogate. The use of the BO is a timely choice considering the application of the uncertainty quantification (UQ) [69] in different aspects of CFD and numerical simulation of turbulence over the last decade, see e.g. Refs. [4,48,85,58,59,61]. In fact, most of those studies are classified as UQ forward problems, the outcomes of which can be nicely employed in BO, which is, in fact, an inverse UQ problem. Having the design parameters defined over an admissible space, the BO can be non-intrusively linked to any CFD solver. This provides a great flexibility noting that a CFD solver is not necessarily equipped with adjoint solvers to compute gradients, and in any case, computing the adjoint sensitivities could be expensive.
The application of BO in CFD has been very limited. Talnikar et al. [73] developed a parallel version of BO to minimize the drag in a turbulent channel flow simulated by LES. Nabae et al. [44] used BO to optimize the phase-speed of streamwise traveling wave-like wall deformation for drag reduction in a turbulent channel flow simulated by DNS. More recently, BO was employed by Mahfoze et al. [38] to optimize the blowing amplitude and coverage in order to reduce the friction drag to achieve a certain net-power saving in a zero-pressure-gradient turbulent boundary layer (TBL) simulated by DNS. To study different aspects of BO, a diverse set of optimization problems are considered in the present paper. The objectives, number of design parameters, and CFD solver are different in these problems. This paper is structured as follows. In Section 2, a detailed review is given to the theoretical aspects of the adopted Bayesian optimization methodology. In Section 3, BO is applied to optimize the initial conditions of a set of coupled nonlinear ordinary differential equations so that an energy norm at some later time is maximized. As the second example, shape optimization of a laminar lid-driven cavity flow is studied in Section 4, where the objective is to either minimize or maximize the energy dissipation. Another shape optimization which has practical applications, is considered in Section 5. The objective is to optimize the shape of a wall boundary of a channel, so that the streamwise pressure gradient of the turbulent boundary layer (TBL) over the other wall matches a given profile. Section 6 is devoted to the optimization of a spoiler-ice airfoil profile. The parameters controlling the configuration of two spoilers which are located on the pressure and suction sides of the airfoil are optimized so that the distribution of the pressure on the airfoil surface becomes the same as if a real ice was formed on the front part of the airfoil. In the end, concluding remarks are provided in Section 7.

Bayesian optimization
Consider the set of design parameters q = {q 1 , q 2 , · · · , q d } which are allowed to vary over the admissible space Q ⊂ R d (note that Q is not the set of rational numbers in this paper). Running the simulator, i.e. the CFD code, realizations for the quantities of interest (QoIs) or responses are obtained, which in general can be noisy. The aim of the optimization is to minimize (or maximize) an objective function r(q) which depends on the simulator outputs over space Q, and find optimal parameter q opt , q opt = arg min q∈Q r(q) . (1) Ideally, a functional dependency between the observed values r and parameters q could be established as [57,27], where, ε denotes the observation noise, and without loss of generality it is assumed to be identically distributed for all samples as N (0, σ 2 ). However, the complexities in the flow solver make acquiring a closed form for f (q) infeasible. Alternatively, a surrogate f (q) can be constructed adopting a non-intrusive approach in which the simulator is treated as a blackbox. To this end, a finite set of training data D = {(Q (i) , R (i) )} n i=1 is employed, where Q and R are respectively samples of q and corresponding realizations of r. In general, the number of the training data, n, is limited due to the cost of running the simulator.
The two main components of the Bayesian optimization, i.e. the choice of method for constructingf (q), and drawing the sequence of training samples, are now shortly reviewed. The Gaussian process regression (GPR) [57,27] is employed to buildf (q). In this approach,f (q) is assumed to be random for which a prior distribution in the form of a Gaussian process is assumed. A Gaussian process GP is a multi-variate Gaussian distribution defined as, in which the mean and covariance are respectively given by, In these definitions, E[·] denotes the expected value, ε specify the parameters building the noise structure, and β are the hyperparameters in the kernel function which represents the covariance off (q). Given the training data D along with the associated observation uncertainty, the posterior and posterior predictive off (q) (conditioned on D and ε ) can be inferred. Simultaneously, the hyperparameters β are optimized. The posterior-predictive off (q) at a set of test samples Here, without loss of generality, the mean of f (q) in Eq. (3) is assumed to be zero, R = [r (1) , r (2) , · · · , r (n) ], C(Q, Q ) is a n ×n matrix where [C(Q, Q )] ij = c(q (i) , q ( j) ) and c(·, ·) is a kernel function dependent on hyperparameters β. Moreover, C N is the covariance matrix of the noise in the training data. In this derivation, the noise samples are assumed to be independent and identically distributed (iid) as ε ∼ N (0, σ 2 ). A more general case in which the noise is allowed to be observationdependent has been developed by Goldberg et al. [24], and recently applied for the purpose of uncertainty quantification in CFD by Rezaeiravesh et al. [59]. In practice, the observational uncertainties can, for instance, be due to the convergence limits imposed when solving discretized equations and also finite time-averaging in unsteady flows. Including these uncertainties in the BO-GPR requires considering two points. 1. The computed optimum is, in general, uncertain and it may not be possible to obtain a noise-free value. In practice, when maximizing the acquisition function, the expected value of the posterior predictive for the noisy response can be used, see Ref. [26]. 2. The standard acquisition functions, such as expected improvement, Eq. (8), may not provide adequate exploration over the parameter space, therefore, the global optima may not be found or the convergence rate is very slow [77]. Recently, Letham et al. [36] has proposed a modified version of the expected improvement function for handling the noisy data (there are other acquisition functions for noisy data which have been reviewed in Ref. [36]). Implementing such modified acquisition functions for BO-GPR in CFD will be considered in a future extension of the present work. Now, a short overview is given on how the sequence of samples for q can be drawn from Q so that they converge to q opt . At the k-th iteration of optimization the training data set is D 1: . The decision about the next sample q (k+1) is made through maximizing an acquisition function (ACF) α(q; D 1:k ), see e.g. Ref. [21]. The most popular ACF is the expected improvement (EI) which was first suggested by Močkus [40] and then utilized in BO by Jones et al. [31]. The EI for the minimization problem is defined as, where q † = arg min q∈D 1:k r(q), denotes the best estimate for optimum among the k observations (iterations). Note that the improvement I(q) is random since r(q) is so, see Eq. (2). In the cases of using a standard GPR forf (q), Jones et al. [31] derived a closed-form expression for the EI which reads as, where s(q) = √ v(q), and m(q) and v(q) are the mean and standard deviation of the posterior predictive of the GPR at any q, as given by Eqs. (6) and (7), respectively. Further, (ζ ) and ϕ(ζ ) respectively represent the CDF (cumulative distribution function) and PDF (probability density function) of the standard Gaussian distribution for ζ defined by, In Eq. (9), the first term in the summation specifies exploitation, the use of the best value so far, and the second term identifies exploration parts of the Q where the surrogate has highest uncertainty. The ad-hoc parameter ξ can be used in practice as the controller of the exploitation in comparison with exploration.
In the present study, the Bayesian optimization algorithm based on GPR is implemented using the Python libraries GPy [25] and GPyOpt [75]. The developed Python codes are non-intrusively linked to the CFD solvers, Nek5000 [19] and OpenFOAM [83] through appropriate bash drivers. As a result, the whole optimization loop is fully automated. All optimizations started from a random sample for q taken from Q. For constructing the kernel in the GPR surrogates, the Matern52 function is used, see e.g. [57,27]. Note that in general, the choice of the kernel function can affect the convergence of the BO-GPR [62]. Our tests showed that Matern52 kernel could lead to a faster convergence for a set of tested problems. To optimize the GPR hyperparameters, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is utilized, see e.g. [46]. In Appendix B, a short discussion is made on the computational time required for BO-GPR and CFD.
Recognizing the convergence of the BO and hence imposing a stopping criterion for that can be, in general, not so straightforward, see e.g. [9,66]. In fact, this is a common characteristic of all gradient-free optimizations. In some cases, mon- 2 at the k-th iteration can help. But noting that the exploration guides the samples to different parts of the admissible space Q, obtaining a small difference between the successive samples, especially for large-dimensional parameters, is difficult to achieve in practice. We have observed that a more reliable measure is tracking the best value of the objective, i.e. r(q † ) where q † = arg min q∈D 1:k r(q). If this measure remains unchanged after a sufficiently large n, then we can consider the optimization to be converged (this is called the identification step, according to Ref. [30]). However, the value of n is not known a priori and it is basically limited by the computational cost of the simulations and available computational budget. Despite this, examining the response surface constructed by the surrogate can help identifying parts of Q with insufficient samples at which the predictive uncertainty of the surrogate is high. Guiding the sequence of samples toward those parts can be, for instance, done by adjusting ξ in Eq. (9). Despite this, the cases studied in Sections 5 and 6 are not directly affected by the above discussion, since the optimum value for the objective is the deviation of a computed QoI from a target value, and is therefore a priori known (albeit potentially not exactly achievable given the design space). In Section 4, where this is not the case, monitoring the best value found when sufficiently many samples are drawn will help.

An introductory example
In many applications in fluid dynamics, in particular hydrodynamic stability, we are interested in finding the optimum initial condition which leads to fastest growth of the unstable modes and hence fastest transition from laminar state to represented by hollow markers. The predicted mean by the constructed GPR is close to the true gain obtained by solving Eq. (11) by the fourth-order Runge-Kutta scheme given the optimum θ . The normalized error between the GPR and true gains measured in L 2 -norm is equal to 0.0102, 0.0142, 0.1360, and 0.4267 for a equals to 10 −4 , 0.9, 0.9999, and 1.0001, respectively. In constructing the GPR, the σ in the observational noise in Eq. (2) was considered to be zero. The shaded areas specify the 95% confidence intervals built around the posterior predictive mean.
turbulence. Kerswell et al. [32] have considered this problem through using adjoint-based optimization. As a "toy" problem to examine their approach, Kerswell et al. [32] introduced the following dynamical system comprised of two ordinary differential equations (ODE), with the initial condition x(0) = x 0 . The objective is to find the initial data x 0 parameterized as x 0 = a(cos πθ, sin πθ) (with fixed a) which maximizes x(T ) 2 2 /a 2 , the normalized energy gain at time T . Therefore, the design parameter is q = θ which is allowed to vary over the admissible range Q = [0, 2). Based the above parameterization, all solutions x(t) start from the same distance x 0 2 = a. Depending on the value of a, the trajectories for x(t) will end up at different attractors of the dynamical system at time T . In the gradient-based optimization adopted in [32], the adjoint equation of Eq. (11) is derived using the Lagrange multiplier approach and is then solved backward from t = T to t = 0 in each iteration of the optimization. Here, we apply the BO algorithm explained in Section 2 to find the optimum θ for different values of a adopted in [32]. For all values of a, the BO starts from the same initial samples θ = 0.1 and 1.5. Our further study showed that the initial samples have no impact on the final optimal values, but they can slightly affect the sequence of samples and hence convergence of the BO. As shown in Fig. 1, the computed optima by the BO-GPR approach are accurate (compared to the true values) and the same as those in Ref. [32] where an adjoint method was employed. This is a significant advantage of the BO-GPR method, noting that solving the same problem with the adjoint-based method could be more involved and for some values of a the convergence of the algorithm would be a bit difficult to reach due to numerical instabilities. On the other hand, the convergence in all cases shown in Fig. 1 is achieved with less than 15 samples. The history of the optimization is presented in Figs. A.15 and A.16 in Appendix A.

Shape optimization of a lid-driven cavity
We now move to problems involving the Navier-Stokes equations. Consider an incompressible two-dimensional cavity flow, where all walls are at rest except the upper wall (lid) which moves with the constant velocity U 0 in the positive horizontal direction, x. The aim is to optimize the shape of the left and right walls so that the energy dissipation is either minimized or maximized with the constraint of keeping the total volume of the fluid confined in the cavity fixed and also retain the length and position of the upper and lower walls. Therefore, the objective is to find the wall shape which either minimizes or maximizes, keeping dx = m fixed. In Eq. (12), S is the strain rate tensor, where S ij = (∂u i /∂x j + ∂u j /∂x i )/2 for i, j = 1, 2 and u i denotes the i-th component of the velocity vector. In order to apply the Bayesian optimization of Section 2, we parameterize the geometry using a third-order polynomial, x = a 0 +a 1ȳ +a 2ȳ 2 +a 3ȳ 3 , wherex = x/l, ȳ = y/h, l is the length of the upper and lower walls, h denotes the height of the side walls, and, x and y specify horizontal and vertical coordinates, respectively.
Considering the origin of the coordinates at (x = 0, y = 0), the constraint of keeping l fixed leads to, a 0,L = 0 and a 3,L = −(a 1,L + a 2,L ) for the left wall, and, a 0,R = 1 and a 3,R = −(a 1,R + a 2,R ) for the right wall. As a result, four design parameters are left to optimize, q = (a 1,L , a 2,L , a 1,R , a 2,R ). The admissible space of q is the tensor-product of Q a 1,L = Q a 1,R = [−0.7, 0.7] and Q a 2,L = Q a 2,R = [−0.5, 0.5] which are arbitrarily chosen to allow for different wall shapes to occur. For any sample taken from Q, first the shape of side walls is determined. At this stage, the height of the walls, h, is adjusted to satisfy the constant-volume constraint. Then, the mesh is generated using gmsh [23]. The simulations are performed using Nek5000, an open-source spectral-element-based flow solver developed by Fischer et al. [19]. In Nek5000, the weak form of the incompressible Navier-Stokes equations are integrated in time on Gauss-Lobatto-Legendre (GLL) points through expressing the velocity and pressure fields in terms of Lagrange interpolants of the Legendre polynomials, see the details in Deville et al. [16]. To avoid the discontinuity in the velocity at the top corners where the moving lid and the side walls meet, the smoothing suggested in Ref. [47] is applied. In all simulations, 30 elements in both x and y directions are considered with 10 GLL points in each spatial directions per element. Simulations start at t = 0 from a uniform velocity field and continue until t = 50l/U 0 , which is chosen to ensure the steady-state value for dissipation (12) has been reached. For both minimization and maximization problems, the BO starts from same four initial samples for q that are Then, decision about next sample of the design parameters is taken. This algorithm is repeated until optimal design parameters are obtained. For this example we only consider steady laminar flow, however an extension to turbulent flow is possible in a straightforward way.
For a cavity flow at Reynolds number Re = U 0 l/ν = 2000 (ν is the kinematic viscosity), Fig. 2 shows the obtained shapes of the cavity with minimum and maximum energy dissipation. The obtained shapes are consistent with the flow physics, for instance, in case of the maximum dissipation two vortices adjacent to the lid are generated which prevent the fluid underneath to receive a high velocity. The obtained optimal shapes can be compared to results reported by Nakazawa [45] using an adjoint method for optimization, although the setups in the two studies are slightly different. In Ref. [45] the lid velocity was assumed to vary with x (it might be hard to interpret it physically) and the Reynolds number Re was higher than the critical value of about 8000 [6] (although no treatment for turbulence modeling was mentioned). For the case of minimizing the dissipation, the shape in Fig. 2 (top) agrees well with [45]. But we found the shape in Fig. 2 (bottom) has a slightly higher dissipation compared to the maximum-dissipation case reported by Nakazawa [45] in which both side walls were deformed towards the inside of the cavity. This can be an indication for the ability of the Bayesian optimization in exploring different possible geometries and finding the global optimum without quickly getting stuck in a local optimum. Following the discussion in Section 2 regarding the convergence of the BO, the reported optimal shapes in Fig. 2 are based on the parameters which were found to be the best values within a sample size of n = 50. In fact, the parameters for minimizing and maximizing the dissipation were already found at the 20-th and 15-th iterations, respectively. However, the extra samples were taken to ensure the parameter space is sufficiently explored. It is recalled that the definition of the convergence criterion in Bayesian optimization is not straightforward, see [9,66]. Therefore, in order to impose an appropriate stopping criterion, the whole history of the samples has to be considered. The history of the BO-GPR for the optimization cases of the cavity flow is plotted in Figs. A.17 and A.18 in Appendix A.

Motivation
Studying in a controlled fashion pressure-gradient (PG) turbulent boundary layers (TBLs) is important since they are present in a wide range of industrial applications, for instance, the flow around the curved boundaries such as car bodies, airplanes, airfoils, etc. The so-called Clauser pressure-gradient parameter β [12,13] is widely accepted as one of the most relevant non-dimensional parameters for PG TBLs [64], which is defined as, where δ * , τ w and dP e /dx are, respectively, the displacement thickness, the magnitude of the wall-shear stress and the pressure gradient along the boundary-layer edge in the streamwise direction. Since the history of the PG β(x) has a significant impact on the characteristic of the TBLs, flows with a constant β in the streamwise direction are very relevant for the study of PG TBLs [5,18]. Despite their importance, there are few studies in the literature on flows with constant pressure gradient, mainly due to the difficulty of setting up this configuration with sufficient accuracy. In experiments, a desired β distribution at the edge of a TBL is usually achieved by changing the wall shape of wind tunnels through a trial-and-error process, see e.g. Sanmiguel Vila et al. [64]. This is, in fact, very time consuming, since β is needed to be measured in the streamwise direction at each iteration. Consequently, it is very difficult to achieve a long constant-β region in the streamwise direction. If a proper shape of the wall could be known prior to an experiment, not only time and cost could be saved, but also a more accurate constant-β distribution would be eventually obtained. The objective of this section is to efficiently obtain the optimal shape of the upper wall of a channel using the Bayesian optimization described in Section 2 so that a target β along the edge of the TBL over the lower wall is achieved. We consider both zero-pressure-gradient (ZPG) and adverse-pressure-gradient (APG) TBLs with constant values for the target β. Besides these, an optimization is considered where the target β distribution varies in the streamwise direction. Such distribution is taken from the flow around a NACA0012 airfoil. This demonstrates the applicability of the BO method to a wide variety of target distribution for the pressure gradient β.

Problem setup
The considered flows have high Reynolds number and are hence turbulent. Therefore, we perform incompressible Reynolds-averaged Navier-Stokes (RANS) simulations using the open-source software OpenFOAM [83]. In particular, the simpleFoam solver is used which is based on the SIMPLE scheme [55] for velocity-pressure coupling. The k − ω shearstress transport (SST) turbulence model introduced by Menter et al. [39] is used, since for TBLs this model has shown better agreement with experimental data compared to other models, see e.g. Vinuesa et al. [79]. Note that the default values are used for the RANS model coefficients.
A two-dimensional domain is considered to mimic the conditions that may exist in a wide wind tunnel, see the schematic in Fig. 3. The domain has a fixed-shape flat-plate lower wall and an upper wall whose shape is subject to optimization. The domain length in the streamwise direction and the domain height at the inlet are denoted by L x and L in y , respectively. In particular, we consider L x /δ in 99 = 1000 and L in y /δ in 99 = 40, where δ in 99 is the 99% boundary layer thickness at the inlet. In general, δ 99 is defined as the distance from the wall at which the mean streamwise velocity becomes 99% of the local edge velocity, U e . In the present study, U e represents the local maximum velocity of the lower-wall's TBL. Note that the diagnostic-plot scaling method proposed by Vinuesa et al. [80] is not applicable to RANS due to the lack of information about the instantaneous velocity. Note also that δ in 99 pertaining to the upper and lower TBLs are the same since the inflow condition for the ZPG-TBLs over the upper and lower walls are taken to be the same. The inflow profiles are assigned based on the DNS data of ZPG-TBL from Schlatter and Örlü [65], see the details in Ref. [43]. The upper wall is constructed by using a spline function to smoothly connect the intersection of the channel inlet and the upper wall, i.e. x = 0, y = L in y , and d  control points. The control points are equally spaced in the streamwise direction. The heights y of these points are subject to optimization, therefore the (x, y) coordinates of the i-th control point are defined as (iL x /d, q i ) where i = 1, 2, · · · , d, see Fig. 3. When applying the BO of Section 2, for each sample of q a new geometry for the domain is obtained for which a structured hexahedral mesh is generated using OpenFOAM standard meshing function, blockMesh. In order to resolve the near-wall region of the TBL, a fine mesh resolution is needed near the wall. The mesh is constructed in such a way that y + 1 = u τ y 1 /ν remains below unity at the walls. Here, y 1 is the distance of the first computational cell center from the lower wall, ν is the kinematic viscosity, u τ = √ τ w /ρ is the wall-friction velocity with τ w and ρ respectively denoting the wall-shear stress and fluid density.
The aim of the optimization is to obtain a pressure-gradient distribution sufficiently close to a given target β t . Therefore, we can formulate a minimization problem whose objective function is defined as an error between the computed β distribution and the target distribution, where, · L 2 denotes a standard L 2 -Lebesgue norm evaluated over the domain of x. Note that regions close to the inlet and outlet of the domain have to be excluded when evaluating r(q) to avoid any inlet or outlet effects on the β distributions, see Fig. 3. The remaining range of x over which the norm in Eq. (14) is computed is referred to as optimization range which may be chosen adequately for each optimization case as explained in Section 5.3. For additional details about the problem setup, refer to Ref. [43].

Optimization
As explained in the Section 5.1, optimization is conducted for three different distributions of β: constant values of zero (ZPG) and one (APG), and a non-constant distribution corresponding to the flow around a NACA0012 airfoil. These optimization cases are named as Constant-0, Constant-1 and Wing, respectively, and are discussed in the following subsections. The conditions of these cases are summarized in Table 1. Prior to the optimization, numerical experiments are conducted to decide the number of control points, admissible range of parameters, optimization range, and the total number of iterations.

Constant-β distribution
For the Constant-0 case, essentially a correction for the displacement effect of the growing boundary layers is expected, which yields a comparably simple shape of the upper wall. Thus we chose to only use 2 points along the length of the  14), the validity of the computed optimal parameters can be examined. The results of the optimization are shown in Fig. 4. It is observed that the global minimum of r is found in the considered admissible space and the associated β distribution is highly accurate (within ±0.005 of the target value of zero). The 95% confidence interval of r(q), which is shown in Fig. 4(d), is smaller close to the optimum because of the higher number of samples.
Similar to the Constant-0 case, optimization of the Constant-1 case, i.e. for β t = 1 is conducted. Since the target distribution of β is constant but larger than zero, the optimum shape of the upper wall is expected to be only moderately complex; thus, 4 control points are used. The optimum is found at the 40-th iteration, whose results are shown in Fig. 5. The optimal shape of the upper wall is steeper than that of the Constant-0 case, since the target β t is higher. In the Constant-1 case, the computed optimal β distribution remains almost constant in the optimization range and deviates from the target β t by less than ±0.02. It is noteworthy that since the inflow is taken from a ZPG-TBL, a development length is needed to reach β = 1.
This development length is chosen to be 300δ in 99 . We observed that forcing a faster development of the β through imposing a shorter development length, could lead to the separation of the TBL over the upper wall and hence make it more difficult to achieve a constant β for the lower-wall's TBL. As separation is typically unwanted in a wind-tunnel setting, we prefer to have a longer development length avoiding this issue.
To validate the results of the constant-β TBLs obtained from the optimization, the streamwise development of the skinfriction coefficient c f and the shape factor H are compared to the reference data, see Fig. 6. The skin-friction coefficient and the shape factor are defined as c f = 2(u τ /U e ) 2 and H = δ * /θ , where δ * and θ respectively denote the displacement and momentum thickness. As the reference data for the constant-β TBLs, the correlations based on the data compiled by Vinuesa et al. [81] and the experimental data from Sanmiguel Vila et al. [64] are adopted. The Reynolds number based on the momentum thickness, Re θ = U e θ/ν, is used as the horizontal axis of the plots in Fig. 6, so that the direct comparison is possible. Clearly, the optimized cases show excellent agreement with the reference data in the region where β is approximately constant (Re θ 3000). The Constant-1 case shows a transition from β = 0 to β = 1 profile because of the development of β in the streamwise direction, see Fig. 5(b). Note that the experimental β distribution of Sanmiguel Vila Note that the range of the vertical axis in Fig. 4(a) and Fig. 5(a) are set differently for illustration purposes. et al. [64] has a variation in the streamwise direction (about ±15%) because of the difficulties to set the constant-β distributions in the experiments, as mentioned in Section 5.1. As a conclusion, not only the applicability of the optimization method to obtain constant-β distributions is confirmed but also the validity of the resulting constant-β TBLs is approved.

Non-constant-β distribution
The same optimization procedure can be applied when β t is not constant in the streamwise direction. The target β distribution is taken from the flow around a NACA0012 airfoil. Note that due to the geometrical symmetry with respect to the airfoil chord, β distributions of the suction and pressure sides are identical for NACA0012 at zero angle of attack. The distribution of the target β is taken from the numerical simulation by Tanarro et al. [74]. Since the target β is originally defined based on the normalized coordinate x/c, where c denotes the chord length, it has to be mapped to the channel normalized coordinate in the streamwise direction, i.e. x/δ in 99 . Here, the original data are mapped to the first 95% of the channel length, i.e. 0 < x/c < 1 → 0 < x/δ in 99 < 950, to avoid any effect of the outlet boundary. Furthermore, the first 15% of the channel is excluded from the optimization range because the reference data close to the leading edge is not available.
According to Ref. [74], this is due to the difficulty of calculating δ 99 with the diagnostic-plot scaling method near the leading edge since the flow is not fully turbulent. To be able to achieve the relatively complex β distribution along the edge of the TBL at the lower wall, the parameterization of the upper wall of the channel is done with 8 control points, see Table 1.
The results of the optimization are shown in Fig. 7. The optimum is found at the 52-nd iteration with associated β being found very close to the target distribution. In fact, the largest deviation which occurs around x/δ in 99 ≈ 850 is less than 0.54.
This promising result proves that the Bayesian optimization method can also be used for more complex pressure-gradient distributions. This is, in fact, very useful for future wind-tunnel experiments to study wall-bounded turbulent flows with arbitrary β distributions, noting that the common approach in practice is based on trial-and-error, which can be inaccurate and time consuming. The results of the present section have been obtained by RANS, which may have its own deficiency in fidelity compared to e.g. large eddy simulations or even direct numerical simulations. However, the underlying flow simulation method does  not affect the performance of the optimization procedure. Therefore, the current results could be even achieved using higher-fidelity methods, or even -if properly implemented -in an experimental setup.

Optimization of a spoiler-ice airfoil model
Finally, in this section we present a more applied flow case which shows the potential of the BO framework also for industrial flow cases where optimal designs are sought.

Motivation
When operating in cold climate, the performance of the wind turbines can be reduced by icing [8,72]. In an extreme condition, heavy icing can even lead to a complete stop of the turbine. Icing is also a risky condition for airplane wings since it may induce flow separation at a small angles of attack, which then might lead to stall and consequently loss of lift. Ice accretion is a complex process which depends on both aerodynamics and thermodynamics. The process is affected by many parameters, for instance, ambient temperature, surface properties, relative speed of the airfoil, and the time span over which the icing event takes place. As a result, numerous possibilities for an ice profile exist. Bragg et al. [8] categorized the leading-edge ice on the airfoils into four types: roughness, horn, streamwise, and spanwise-ridge ice. In this study, we focus on the horn and streamwise ice, see Fig. 8.
The main characteristic of the horn ice is the presence of the large protuberances which induce a large flow separation downstream of the ice, and hence dramatically affect the aerodynamic performance, including the lift, drag and moment coefficients. On the other hand, streamwise ice is smoothly formed along the streamlines and is less dangerous in the sense of having adverse effects on the aerodynamic performance. Fig. 9. Original iced airfoil used in the present study. The ice is formed on a clean NACA64618 airfoil, which is shown with a black solid line. The original ice shape is illustrated with a red solid line and a filled gray region. Coordinates are non-dimensionalized by the chord length of the clean airfoil, c. Note that the origin of the x coordinate is set as the leading edge of the clean airfoil, and this is the convention throughout Section 6.
Modeling an iced airfoil is a challenging task in both CFD and experiments since the ice profile essentially has infinite degrees of freedom, noting that every ice shape is unique and consequently the investigations would be case-dependent. In order to generalize the icing assessments, one needs to find specific characteristics to parameterize the geometry such that various ice profiles can be described with the reduced-order model. The simplified "simulated-ice" concept has been considered as a representative of the actual ice profile. In a simple representation, Papadakis et al. [50][51][52][53][54] approximates the horn-shaped ice from glaze ice conditions as spoilers. As a result, the profile of the horn is reduced to a thin plate and its effect on the aerodynamic properties is parameterized by just its height, angle, and location. In particular, the thin plates which are called the "spoilers" and have the same height as the original ice can be used to reproduce the aerodynamic performance of the ice, see Fig. 8. The resulting model is called the "spoiler-ice" model. Although the height, angle, and location of the spoilers have been considered as important parameters which affect the airfoil aerodynamic performance, see e.g. Refs. [52,51], Tabatabaei et al. [70,71] suggested that the thickness of the spoilers should be also taken into account as an effective parameter.
Although parametric studies of the spoiler-ice model have been conducted to investigate associated effects on the aerodynamic performance, see e.g. Refs. [52,51], it is not yet clearly known which configuration of the spoiler is appropriate to accurately represent the aerodynamic performance of an arbitrary given real ice profile. The BO method introduced in Section 2 can be utilized to find the optimal parameters of the spoiler configuration. Although the spoiler-ice concept has been previously used mainly for the horn ice, see e.g. Ref. [53], here we also consider to apply the idea to the streamwise ice. Therefore, an original arbitrary ice shape is chosen so that it has both horn and streamwise ice on the suction and pressure side of the airfoil, respectively. Two spoilers on the suction and pressure sides (the upper and lower spoilers, respectively, shown in Fig. 10(a)) are used to obtain a similar effect of the original iced airfoil. Since a spoiler-ice with the same height as the original ice shape has been used in the previous studies [51][52][53], here the optimization is first conducted with the constraint of keeping the heights of the spoilers fixed. However, it is shown in Fig. 14 that the results of the optimization without such a constraint are in better agreement with those of the original ice. It will thus be shown that a more complete parameter optimization may indeed be relevant for accurate simulations.

Problem setup
In this section, we describe the setup and implementation of the icing flow case, including the basic simulation, the meshing and optimization loop.

Original ice profiles and the spoiler ice
As clean airfoil, a NACA64618 profile is considered, which has been used in a 5MW NREL (National Renewable Energy Laboratory) wind-turbine blade. For the purpose of the optimization, an arbitrary original ice shape is required as a reference which is taken from the ice formed on the leading edge of the airfoil studied in Ref. [22], see Fig. 9. The ice on the upper (suction) and lower (pressure) sides of the airfoil is of the horn and streamwise types, respectively.
As mentioned in Section 6.1, the flow around the original iced airfoil can be modeled by two spoilers each located on one side of the airfoil. For the purpose of optimization, the configuration of the spoilers is controlled by seven parameters: heights, angles and widths of the upper and lower spoilers, h u , h l , θ u , θ l , w u and w l , respectively, and the displacement of the lower spoiler, (d l ), see Fig. 10(a). The displacement of the upper spoiler is not taken into account since the original ice shape on the suction side is horn ice, which has a clear connection between the position of the horn of the original ice and the position of the spoiler-ice. On the other hand, the pressure side of the original ice is of streamwise type, hence it is not, in general, clear where the spoiler-ice should be located. To rectify this, the displacement parameter d l is needed for the lower spoiler to optimize its position as well as the other parameters.
Prior to the optimization, several numerical experiments are conducted to decide the admissible range of the parameters and the total number of iterations. Fig. 10(b) shows the configuration of the spoilers when the design parameters are at either end of their admissible range. Note that the upper limit of the admissible range of the heights, h u and h l , is the same as the highest point of the original ice profile.

Numerical simulations
Numerical simulations are conducted considering a cross section of an iced-airfoil inside a (virtual) wind tunnel. Twodimensional steady and incompressible RANS are performed using OpenFOAM with the same solver settings and turbulence model as mentioned in Section 5.2. The computational domain, see Fig. 11, is made based on the test section of the Minimum-Turbulence-Level (MTL) wind tunnel, which is a closed-loop circuit tunnel housed at KTH. Additional information about the MTL wind tunnel can be found in e.g. Ref. [37]. The commercial software Ansys ICEM CFD 18.2 [1] is used for generating high-quality meshes in the relatively complex geometries involved in the problem. The mesh is constructed to be sufficiently fine, and in particular y + 1 is kept below 1 all around the airfoil. The resulting total number of computational cells is about 1 million. Note that the number of computational cells slightly differs for the simulations associated with a particular configuration of the spoiler ice. To put the computational cost in perspective, each flow simulation takes about one hour running in parallel on 24 cores of Intel Xeon E5-2690v3 Haswell processors.
To measure the similarity of the aerodynamic performance of a spoiler-ice model and the actual ice profile, the pressure coefficient (c p ) distribution around the airfoil is considered, since it is directly related to the aerodynamic characteristics such as the lift and drag coefficients. The objective function in the optimization is thus defined as the L 2 -norm of the error between the computed c p distributions on the suction and pressure sides of the airfoil, and the corresponding distribution where 2 f =1 specifies the summation over the suction and pressure sides of the airfoil and c p,t denotes the c p distribution of the original iced airfoil. Note that the objective function is evaluated only downstream of the original ice, meaning that the optimization range is set to 0.1 < x/c < 1 for both suction and pressure sides of the airfoil.
Prior to the optimization, a simulation of the original iced airfoil is conducted to obtain the target pressure-coefficient distribution c p,t . The resulting velocity contours and pressure-coefficient distribution are shown in Fig. 12. Because of the ice formed on the leading edge, there are separation bubbles right after the ice on both suction and pressure sides. Moreover, the c p near the leading edge exhibits sudden changes in the streamwise direction. Note that x/c 0.1 is still on the ice surface (in fact, the ice ends at x/c = 0.086 and 0.0975 on the suction and pressure sides, respectively). Therefore, the optimization range is set to 0.1 < x/c < 1, as represented in Fig. 12.

Optimization
Optimization of the design parameters is conducted with two configurations: spoilers with fixed height and flexible height. As mentioned in Section 6.1, glaze ice has been simulated by spoiler ice whose height is the same as the highest point of the original ice [51][52][53]. As a result, one optimization is conducted with the constraints of having the heights of the spoilers fixed, and the case is referred to as "fixed-height optimization". Imposed by these constraints, the number of design parameters is reduced to d = 5. Since the height of the spoiler ice is a relevant parameter on the aerodynamic performance [72,15], the second optimization is conducted without the above constraints, and the case is called "flexibleheight optimization". Allowing the height of the spoilers to change, the number of design parameters becomes d = 7. The In Fig. 13, the separation bubbles and vortex structures are illustrated as they appear behind the ice profile and the spoiler ice on both pressure and suction sides. The general structure is in agreement with the previous studies [51,71], where the primary and secondary vortices were observed around the spoiler-ice through numerical simulations. In fact, it is not even necessary that the flows downstream of the original ice and the spoiler ice have similar vortex structures since the primary purpose of using the spoiler-ice is to model the aerodynamic characteristics on an iced airfoil. The flexibleheight optimization successfully yields a reattachment point similar to the original iced-airfoil case, while the reattachment point is pushed further downstream for the fixed-height optimization. This has a direct influence on the c p distribution, see Fig. 14. In fact, the c p distribution of the fixed-height case has four times larger deviation from the target distribution, i.e. r in Eq. (15), compared to the flexible-height case. It is also observed that for the fixed-height optimization, the optimal  Fig. 12), and the optimized spoilerice models. The corresponding flow fields are represented in Fig. 13. The solid and dotted lines specify the suction and pressure sides of the airfoil, respectively. The black lines represent the target distribution, c p,t , which is also shown in Fig. 12(b). Note that the optimization range is 0.1 < x/c < 1, which specifies a region downstream of the original ice. The distribution of the clean airfoil is also presented for reference. parameters q opt are located at the borders of the considered admissible space. This means that the global optimum could change value if the admissible space would be different. Recall that the admissible range of the parameters are limited due to the geometrical and meshing constraints. In contrast, for the flexible-height optimization, the optimum q opt is found within the admissible space and leads to a better agreement with the result of the original iced airfoil. Therefore, despite being used in the literature, keeping the heights in the spoiler-ice model fixed is not a sufficiently accurate assumption, and considering also the height as an design parameter leads to higher fidelity in the results.
The obtained c p distribution from the flexible-height optimization is almost identical to the target distribution within the optimization range except on the pressure side around x/c = 0.2. The slight deviations are due to the fact that the original ice on the pressure side is of the streamwise type, which is basically more difficult to be represented by a spoiler-ice model. The BO has also shown to be an efficient optimizer even for this applied CFD case. The coupling to the CFD code and meshing software are quite straightforward, and can be done non-intrusively. Note that no derivatives of the target function are needed, which makes the present approach suitable for general CFD applications.

Summary and conclusions
The Bayesian optimization based on Gaussian process regression (BO-GPR) has been applied to different CFD problems ranging from purely academic to industrially relevant setups, using state-of-the-art simulation methods. The diversity of the examples with different numbers of design parameters helps to better understand the applicability and performance of the BO-GPR approach which has not been frequently used in CFD and turbulent flows despite being a primary choice in other fields, e.g. data sciences. The use of the BO-GPR is straightforward noting that the Bayesian optimization is among the gradient-free approaches and hence only requires forward evaluation of the quantities of interest (QoIs), i.e. running a CFD code. In the BO, a sequentially-updating set of samples is taken from the space of the design parameters, such that the sequence converges to the global optimum after a finite number of iterations. A main advantage of the BO-GPR is its versatility, meaning the algorithm is non-intrusive and can be utilized with any CFD code and for any number of design parameters. The use of a GPR surrogate provides the possibility of estimating the confidence in the values of the cost function over the admissible space of the parameters. What is needed to create the optimization loop is to automatize the whole process through appropriate script-driven interfaces between the BO and CFD codes. The software developed in the present work is based on GPy [25] and GPyOpt [75] libraries and is available online. 1 Although the optimization problems studied in the present study contain as many as 8 design parameters, the maximum number of iterations to ensure obtaining a reliable global optimum is no more than 90. This number should be compared with the number of function evaluations necessary e.g. for adjoint-based optimizations. More interestingly, by changing the number of design parameters from 2 to 8, the number of required flow simulations does not significantly increase. Therefore, the computational effort for BO-GPR is observed to be affordable for many practical applications, especially if the RANS (Reynolds-averaged Navier-Stokes) approach is employed for the simulation of turbulence. However, similar to other gradient-free approaches, the BO-GPR may become inefficient for a large number of design parameters. As shown for instance by Wu et al. [84], the efficiency of BO-GPR can be improved by adding sensitivity information from the gradients of the cost function computed by adjoint methods. Application and analysis of such modifications will be considered in the future extensions of the present study.
We demonstrate the BO-GPR approach on two comparably simple problems in Sections 3 and 4 for an analytical test problem and a lid-driven cavity. In both cases, our approach could find the global optimum, as compared to previous liter-ature results of the same cases. The optimization required about 20 iterations for the four-dimensional problem, exploring the complete parameter space in an efficient manner. It can be observed that the Bayesian technique quickly focuses on the regions where the potential optima are located. In addition, the uncertainty information about the optima can be retrieved from the approach.
A more physically relevant problem is considered on the example of the shape optimization of the upper wall of a two-dimensional channel. Here, the goal is to obtain a given target pressure-gradient (PG) distributions in a turbulent boundary layer developing over a flat wall, which is achieved by adjusting the location of the upper wall in a variableheight channel flow. The flow is simulated using the open-source finite-volume-based solver OpenFOAM [83] adopting the RANS approach to model the turbulence. Accurate results are obtained for the target pressure gradient regardless of being constant or varying in the streamwise direction. Different numbers of design parameters 2, 4, and 8 are considered for which 25, 50, and 80 simulations are respectively needed to find the corresponding global optimum. The results of the optimum configuration for the constant-PG flows show an excellent agreement with the desired pressure-gradient distribution, and as consequence also with the correlations and experimental data reported in the literature for integral quantities of pressure-gradient boundary layers. In addition, promising results for the non-constant target PG distribution suggests that the BO-GPR approach can be applied to various practical applications including the design of inserts in wind tunnels. This is of special interest noting that the current approach for such designs is based on trial and errors, with limited accuracy and robustness.
Finally, we also apply the Bayesian optimization to an industrially-relevant case, in an effort to show that the approach can readily be used in an applied context. The goal is to optimize the configuration of the spoiler ice which is a simplified yet useful model for the actual ice profiles formed on the airfoils in cold conditions. The objective is to match, within a small margin of error, the resulting pressure distribution of an airfoil with modeled ice with that of the same airfoil but with actual ice profile. The RANS simulations using OpenFOAM for design parameters of dimension 5 and 7 are examined for which the maximum number of iterations in the optimization is found to be equal to 50 and 90, respectively. The encouraging results of the optimization prove the applicability of the BO-GPR framework to a relatively complex parameter optimization and also re-confirm the validity of the spoiler-ice models. It is also observed that keeping the heights in the spoiler-ice model fixed, as it is convenient in the literature, would lead to inaccurate flow simulations. Moreover, although the spoiler-ice concept has been mainly used in the literature for modeling the horn ice, see Refs. [51][52][53], the present study reveals that the model can also be applicable to the streamwise ice, conditioned on adopting the optimal parameters.
The present study can be extended in several ways. Our future attempts will include the use of scale-resolving approaches for simulation of the turbulent flows, instead of RANS which has been adopted here. In this way, even the effect of time-averaging on transient data can be considered. In addition, larger numbers of design parameters could be inquired, to identify the impact of dimensionality on the complete optimization problem. To make the BO more affordable, at least two strategies can be considered. As it is suggested for instance in Ref. [42], linear or non-linear mappings (feature mappings) can be introduced to map the high-dimensional space of the design parameters to a reduced-dimension space where the optimization of the acquisition function can be performed. As the second remedy, multifidelity modeling (MFM), see Refs. [56,49,60], can be combined with BO. In MFM, an accurate surrogate of the objective function can be constructed combining a large number of low-fidelity CFD simulations with only a few high-fidelity ones.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.   These plots are associated with the optimization problems represented in Fig. 2.  Fig. A.19. Impact of increasing the number of samples on the surrogate of the objective function for the Constant-0 case in Section 5. The plots are associated with those presented in Fig. 4.   Fig. A.20. Convergence of the samples in the BO-GPR for the Constant-0 case in Section 5. The plot is corresponding to the optimizations shown in Fig. 4.