Efficient Design Optimization Assisted by Sequential Surrogate Models

. The paper proposes a global optimization algorithm employing surrogate modeling and adaptive in ﬁ ll criteria. The surrogates are exploited to screen the design space and provide lower-ﬁ delity predictions across it; on the other hand, speci ﬁ c criteria are designed to suggest new points for high-ﬁ delity evaluation so as to enrich the optimizer database. Both Kriging and radial basis function network are used as surrogates with di ﬀ erent training strategies. Sequential design is achieved by introducing several in ﬁ ll criteria according to the realization of the exploration-exploitation trade-o ﬀ . Optimization results are provided both for scalable and analytical test functions and for a practical aerodynamic shape optimization problem.


Introduction
The trade-off between high fidelity and short response time is an essential part of today's real-world engineering design applications.On the one hand, the industrial request to shift the focus and part of the costs from experimental to numerical design and analysis leads to the introduction of more and more physics modeling into numerical simulation codes.On the other hand, the price to pay is related to increasing computational time of single analyses and design cycles which is detrimental for the purpose of reducing the time to market.This trade-off is even more evident when computational fluid dynamics (CFD) is involved in the design loop as the need to accurately evaluate very complex configurations and the required high number of CFD simulations represents a further issue.The engineering computational design process may be speeded up by either accelerating the high-fidelity evaluation or reducing/accelerating the steps of the numerical algorithm used for exploring the design space.In the first group, High-Performance Computing (HPC) methods are numbered to increase the global performance of a single call to the CFD solver (through code parallelization, vectorization, and profiling).With reference to the second group, several research trends are focused on reducing the problem dimensionality (hence, the iterations needed to find the solution), improving the search algorithms for achieving faster and better solutions, and tuning the algorithm parameters to automatically and efficiently adapt to the response.An interesting alternative branch is represented by surrogatebased or metamodel-assisted optimization (SBO) which was originally conceived to relieve the computational loading associated with the usage of black box response functions.SBO consists in replacing the high-fidelity model (or "truth" model, e.g., the CFD simulation) with a fast, lower-fidelity model which has preliminarily "learned" from high-fidelity data.Since the pioneering work by Jones et al. [1], several theoretical studies [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18] have been published on the topic.The proposed methods differ for one of the following items: the employed surrogate model (e.g., model type and single or multiple models), the training approach (e.g., optimizing the prediction error, the cross-validation error, the generalized cross-validation error, and the likelihood function), the model updating strategy (e.g., usage of surrogate minimizers, infill criteria, and random criteria), and the optimization method adopted to find the model parameters and to explore the surrogate (e.g., heuristic, gradient-free or gradient-based, and global or local).SBO has been successfully applied in the aerospace engineering field [19][20][21][22][23][24][25][26][27].In continuity with other works by the author [28][29][30][31], the present paper proposes an adaptive SBO framework for design optimization with different updating strategies and optimization algorithms.A sketch of the main building blocks is provided in Figure 1.A choice of surrogate models is also available for selection.The main novelty of the paper is in the proposal of two groups of diverse infill strategies and in the capability to apply many of them during the adaptive sampling cycle by defining activation probabilities.Moreover, the usage of two different optimization libraries (public domain NLopt and in-house ADGLIB) allows performing both hyperparameter optimization for surrogate model training and objective function minimization with a variety of approaches.Finally, while previous investigations focused only on aerodynamic optimization cases, here an extensive study is carried out on analytic test functions in order to quickly assess the performance of the algorithm on known data.The paper is structured as follows: the first section is devoted to the surrogate model definition and to the training methods; then, the sequential design by means of various infill criteria is discussed and some examples on a basic test function are proposed; furthermore, having introduced all the computational pieces, the whole surrogate-based sequential optimization algorithm is described in detail and interactions between subphases are highlighted; finally, the experimental test campaign on multidimensional scalable test functions is discussed as well as the results obtained by using different setups; an example of real-world application is given in the very final section where a benchmark aerodynamic shape optimization case is faced and results are compared with a previous work by the same author [31].

Surrogate Models
2.1.Kriging Model.The Kriging model assumes that the function value at each point in the domain is represented by a separate random variable correlated with all the other points.Given that f x is the function response of interest, a Kriging surrogate is defined as a realization of a regression model h and a stochastic process z having zero mean, process variance σ 2  k , and covariance model R θ, x 1 , x 2 between z x 1 and z x 2 with parameter vector θ [32]: where β is the regression coefficient vector and h is the regression vector.The correlation between training sites x j j=1,⋯,M is condensed within the covariance matrix and given by K ij = R θ, x i , x j .In multidimensional cases, the covariance is obtained as a tensor product of onedimensional covariance functions: where D is the dimension of the problem, θ p is the length scale in the p-th dimension, x ip is the p-th component of the vector x i , and Kr is the one-dimensional Matern function: In practice, the covariance matrix may become illconditioned or regression features are required to handle noisy functions.As a consequence, "nugget" or noise terms are plugged in the covariance matrix diagonally which becomes K ij = R θ, x i , x j + λδ ij , where the Kronecker convention has been used and λ is the noise ratio.Predictions at a generic location x are obtained by using the following expression: where H is the linear regression matrix and β is the generalized least square estimate of β, K is the covariance matrix, k is the covariance vector between the generic design site x and the training sites, and f = f 1 , f 2 , ⋯, f M T is the vector of function values.The stochastic nature of the Kriging model allows the obtainment of an estimate of the prediction variance in the form: where σ 2 k is the estimated process variance, u = H T K −1 k − h, and h = h 1 , h 2 , ⋯, h M T .The prediction and its variance are both functions of the hyperparameters, i.e., the length scales θ p , the process variance σ 2 k , and the noise magnitude λ.The hyperparameters have to be tuned in order to make the model output more "likely" against a given set of training data.In other words, the aim is to maximize the probability that the observed data follows the Gaussian process assumed with a specific set of hyperparameters.This is typically achieved by optimizing the maximum likelihood estimator (MLE) [33].In the following, two optimization strategies are proposed and hereinafter referred to as "full optimization" and "partial optimization."The first is required when the function noise has to be fitted, while the second is required when ill conditioning of the covariance matrix is likely to occur.In both cases, the optimization algorithms are called from the NLopt library (available online at http://ab-initio.mit.edu/nlopt) in a loosely coupled globallocal approach: first, a global exploration is performed with the evolutionary strategy ESCH algorithm [34]; then, the best solution is taken as the starting point of a local refinement by using a reviewed version of the Nelder-Mead simplex algorithm [35].
2.1.1.Full Optimization.In this approach, the regression parameters are found by imposing an optimality condition and given by All the hyperparameters (length scales, process variance, and noise level) are obtained through the maximization of the likelihood function given by where M is the number of training points, S is the number of terms in the regression, and the regression matrix A is defined as 2.1.2.Partial Optimization.In this formulation, the process variance and regression parameters are both computed thanks to the optimality condition; hence, the optimization process involves only the covariance length scales θ p .The optimal process variance is set to the value which cancels the partial derivative of the likelihood function with respect to σ k , given by The likelihood function to be maximized reduces to This final formula is a function of the length scales θ p and the noise level ratio λ.The optimization is performed only over the length scales, and the noise level ratio is fixed throughout the optimization.A typical choice is to set the noise level λ to a small fraction of the process variance σ k to avoid ill conditioning.

Radial Basis Function
Network.Radial basis functions (RBF) are a powerful tool for data interpolation and regression.The response depends on the location of centres and on the Euclidean distance from the centres.By appropriately defining M centres and weighting the contribution of each RBF, a RBF network is obtained as Here, the centre of each RBF is x i and the weights are k i (which, in turn, depend on a regularization parameter λ).The amplitude of the radial basis functions is controlled by means of the width parameters θ i .The type of the RBF kernel r may assume various mathematical forms; here, the following types are considered (d = x − x i is the Euclidean distance from the centre): Great emphasis has been put in the RBF literature on the choice of the width parameters.Gutmann [36] showed that both the interpolation error and the solution matrix conditioning are highly sensible to the value of θ: in particular, theta should be low enough to improve stability; however, the highest approximation accuracy is often found for large θ value which may lay in the unstable region.As a consequence, a conflict arises between accuracy and stability, sometimes referred to as the "trade-off principle" [37].This is solved by using state-of-the-art optimization techniques and considering a unique scalar width θ for each RBF centre.The optimization algorithm autonomously chooses the kernel function type and optimizes the width parameters taking the leave-one-out cross-validation error as the objective function to be minimized.Similar to Tenne and Armfield [23], the procedure works as follows: (1) All the kernel functions are trained on the current training set (2) The leave-one-out (LOO) error norm is computed as

17
where f j is the value of the function at the j th training site x j and f −j is the RBF prediction at x j when the model is trained without x j and f j .The computation of the M terms f −j does not require the training of M RBF models; indeed, it can be computed effortlessly thanks to Rippa's formula [38] (3) The combination of the RBF kernel and width parameter which gives the lowest LOO error norm is sought by solving, for each kernel, the optimization problem: Two options are available: (i) Grid search: a grid of discrete couples θ, λ is generated, all the possible combinations of parameter values are evaluated, and the best combination (in terms of minimum ε LOO ) is retained (ii) Numerical optimization: the minimization problem (18) is solved considering the hyperparameters as continuous variables and using the same algorithms for searching the Kriging hyperparameters Once the optimal width parameters were found, the weights k i have to be computed.A regularization parameter λ (also known as ridge regression parameter in the RBF literature) is introduced to avoid overfitting and improve International Journal of Aerospace Engineering the interpolation matrix conditioning.By imposing the interpolation condition [37] on the training set, the weights are the solution of the linear system: where where

Sequential Design and Infill Criteria
This section will give details about the sequential enrichment of a surrogate model assisting the optimization task.Indeed, once a surrogate model is trained, a basic and simplistic approach would be to optimize the surrogate and find the model minimizers.A further step could be to reevaluate the suggested points with the high-fidelity model, update the training dataset, build a new surrogate, and then optimize again in an iterative manner to drive to true optimality quickly.However, feeding the surrogate back with no information about its error measure and no tendency to exploration would easily lead the updating process to get trapped in local minima.The weak point lies in considering the model prediction as the only source of "knowledge" for increasing the quality of the approximation ("exploitation").An advanced solution is obtained by mixing the information coming from the available data, the surrogate prediction, and the estimation of the predictive behaviour away from the training set ("exploration"): this could drive to a "smarter" selection of new points and, thus, improve the surrogate.The aim is to obtain a model that cleverly supports the optimization path, being ideally very accurate near the global/local optimal location and acceptably rough elsewhere.Of course, the updating strategy has to take into account the specific surrogate model at hand, so that the adaptive sampling criteria should be designed upon its features (e.g., vector or scalar data, availability of an estimated prediction error, and interpolation/regression character).Adaptivity is another key point in view of inserting new points depending on the samples and response function values collected so far.The exploration/exploitation trade-off usually drives the adaptation by mixing the contribution of high prediction error areas (exploration) with potentially promising regions (exploitation).Indeed, explorative search is a cornerstone for global optimization; however, it may lead to the continuous unveiling of poor regions of the design space; on the other hand, exploitation induces to trust the surrogate prediction, which surely improves the local accuracy but may also lead to being stuck in local minima.Only a proper and balanced combination of both components will be effective in leading to an efficient global optimization.
By way of example, Figure 2 illustrates the addition of a new point obtained, respectively, by employing pure exploitation, pure exploration, and balanced approaches.The picture shows the set of training points (black-filled circles), the true function (solid black line), and the surrogate prediction (dashed black line) built on the training points.Starting from this, a pure exploitation criterion places a new point at the surrogate global minimum, i.e., very close to one of the training points (triangle mark); by pursuing pure exploration, instead, the new point is located very far from the training set (circle mark), thus sampling is performed where the prediction uncertainty is highest; finally, the new point predicted with a balanced exploration/exploitation approach (square mark) is located in an interesting position very close to the true optimum: a new surrogate model trained on the old training set plus the new point will surely lead to the detection of the global optimum quickly.
Infill criteria are here referred to as means for adding new samples to the training set by designing auxiliary functions for generic surrogate models.Let us say x is the design vector which defines a generic location in the design space, f x is the objective function to be minimized, X n = x 1 , x 2 , ⋯, x n is the set of n available training points, F X n = f x 1 , f x 2 , ⋯, f x n is the corresponding set of true objective function values, and f x is the response at x of the surrogate model built on X n , F X n .[30]).International Journal of Aerospace Engineering An infill criterion is defined as finding a new sample x n+1 which maximizes an auxiliary function v called the potential of improvement: Hereinafter, the maximization of v is not achieved by using numerical optimization techniques, but rather by hugely sampling the design space (e.g., five hundred times its dimension) with a Latin hypercube sampling method and selecting the point at which the maximum value of v is met.Despite the size of the test dataset, the evaluation of ν requires limited computational effort as the function evaluation involves the surrogate prediction, which is fast to compute, and the true objective function values (already collected).An important point is related to the fact that, as the test dataset is generated many times along with multiple updating iterations, duplication of the selected sample may occur.In order to avoid this, the seed of the Latin hypercube is changed unambiguously at each iteration.
As concerns the type and nature of the potential of improvement function v, previous investigations [30] showed that error-driven infill criteria may lead to the intensive exploration of the design space in order to reduce the prediction error, but, conversely, this resulted in a lack of efficiency of the whole optimization process at fixed total computational budget.Hence, in the following section, the discussion will focus on approaches which proved to be more suitable to global optimization.In particular, two sets of criteria will be proposed: the first is based on the factorization of the potential of improvement in order to explicitly realize the trade-off between exploration and exploitation and the second is defined according to the expected improvement concept.
3.1.Factorized Infill Criteria.The first proposal of adaptive infill criterion is aimed at combining exploration and exploitation by means of a generic factorization as follows: The functions g and h measure the exploration and model trust contribution, respectively.In particular, the exploration function g should estimate how strong the influence of the set of already collected samples X n on a generic candidate x is.One of the preferred approaches is to make the function dependent on the Euclidean distance d x, x i between the generic design space location x and the i-th element of the training set X n : On the other hand, the exploitation function h should take into account how the surrogate prediction f compares with the available set of true objective function values F X n .In particular, this contribution should put emphasis on trusting the model prediction; hence, the h function should exhibit its maxima in correspondence with some identified features of f (e.g., minima, discontinuities, or local strong nonlinearities).Of course, different infill criteria can be designed by properly defining the functions g and h.A set of choices is presented here.
3.1.1.Leave-One-Out Error Criterion.The leave-one-out error (LOOE) criterion is aimed at individuating the regions where the surrogate model lacks accuracy and is much more sensible to the insertion of new designs.The factorization functions are as follows: where, given a generic location x, x i is the location of the nearest training sample.Of course, as this criterion will tend to select new candidates around training samples exhibiting the highest LOO error, the clustering of points around specific regions will be avoided.
3.1.2.Weighted Leave-One-Out Error Criterion.As the LOOE criterion may be too much exploratory and ignore the information given by the surrogate model, an alternative is given by weighting the function with a term which measures the trust in the surrogate prediction.The weighted leave-one-out error (WLOOE) criterion modifies the h function as follows: where σ is a tuning parameter, f min = min f x 1 , ⋯, f x n , and f max = max f x 1 , ⋯, f x n .This choice of the h function provides two main features: (1) The value of h approaches the LOOE prediction when f x approaches f min (2) For f x < f min , the value of the h function is higher than the LOOE With respect to the LOOE criterion, the multiplicative exponential term augments the error-minimizing term with a goal-oriented exploitation term: hence, "bad" candidates (according to the surrogate prediction) will be filtered out, while "good" candidates will be recognized and rewarded with higher rank.
Such a constant is usually employed to bound the nonlinear character of the function f : for instance, it gives an upper bound on the number of oscillations of a given amplitude or it limits the maximum and minimum value a function can assume in a given range.Of course, it has a local character; hence, a function may exhibit subregions with either small or large values of the constant.In the present context, the Lipschitz constant has to be estimated at every possible location within the design space.This is done by computing the K-means clusters of X n and considering the variation of f between the nearest training sample and the set of all nodes belonging to the cluster.Algorithm 1 details the estimation of the Lipschitz constant.
The effective number of clusters r is the result of an iterative cluster solution: indeed, it is required to have at least two candidates in each cluster in order to be able to compute the Lipschitz constant and, in some cases, this may not occur naturally (e.g., strong aggregation of training samples).As a consequence, r is initialized to int n/d , but it is downgraded every time a single-component cluster is found.According to the algorithm, a Lipschitz constant value is associated with each training sample: at a generic location x, it is assumed that the Lipschitz constant is the same as the nearest training sample, i.e., L x = L x nn , where x nn = arg min The functions g and h for the LC criterion are defined as g x, X n = min where the division by f max − f min has been introduced for normalization purposes and the exponential term provides for a tuned decay weight (through parameter γ) of the h function.In fact, when the design candidate x is near to a training point, the term min x i ∈X n d x, x i is small and the weight of the local Lipschitz constant is accordingly decreased with respect to a design candidate belonging to the same cluster and located farther.
3.1.4.Weighted Distance Criterion.The weighted distance (WD) criterion is based on the following definitions for functions g and h: where the nomenclature is the same as described above.
Again, the exploitation is given by the exponential term which gives confidence in the surrogate prediction, while the exploration element is represented by how far the candidate x is from the nearest training sample.Analogous to the WLOOE criterion, candidates with a surrogate prediction lower than the current function minimum will be more likely to be selected.However, if they are too close to samples stored in X n , they will be penalized by the g function.Hence, a trade-off is realized between surrogate prediction and location in the design space.
3.2.Expected Improvement-Based Infill Criteria.Another set of infill criteria is defined to accomplish the explorationexploitation trade-off from a different point of view.The expected improvement algorithm by Schonlau et al. [39] and Jones et al. [1] represents a standard design strategy to add one new sample aimed at achieving the global optimum of the response surface.In the following, the original criterion and some variants of it are presented.

Expected Improvement (EI) Criterion.
The sequential algorithm is based on the notion of "improvement" defined by end for 7: Set L x i = max j L ij 8: end for Algorithm 1. Lipschitz constant estimation.7 International Journal of Aerospace Engineering where the function F x is here supposed to be a random variable as for the Kriging model (see equation ( 1)).After integration with respect to the conditional distribution where f x is the Kriging predictor (equation ( 7)), ŝ2 x is the corresponding prediction variance (equation ( 8)), and Φ x and ϕ x are, respectively, the N 0, 1 cumulative distribution and probability density functions.The search for the global minimum is enriched by finding the point x that maximized the EI function.Two additive contributions are clearly evident: (i) The first term gives importance to sample points having predicted values much less than f min (thus exploiting the surrogate model) (ii) The second term enhances samples with high uncertainty about the prediction (thus fostering global search) Both terms are weighted by the probability measure of the standard normal distribution defined in the modified variable f min − f x /ŝ x : hence, depending on this ratio, the EI landscape is usually featured with many sharp peaks and wide regions where its value is almost zero.Equation ( 30) may be used even if the prediction model is not stochastic in nature: for example, the prediction function and the associated variance coming from a RBF network (equations ( 15) and ( 21)) may be plugged in it and adopted as EI infill criterion for choosing new points.

EI-Like Criterion. This criterion has been designed
trying to mimic the same rationale of the expected improvement criterion, which was originally conceived for a Gaussian process surrogate.The present approach, referred to as "EI-like" hereinafter, represents a generalization of that algorithm: indeed, for a generic surrogate model, the information about the prediction uncertainty may not be available as in the case of Kriging or RBF network model.The idea is to define a model for the prediction error which is theoretically applicable to any function and any surrogate and to link it to the local complexity of the true function.The potential of improvement function is designed to have the same form of the expected improvement function (equation ( 30)), but here the prediction error is estimated as follows: where γ is a tuning parameter and L x is the local Lipschitz constant estimated as in Section 3.1.3.The ŝ function is related to the order of magnitude of the local maximum difference of f : indeed, the Lipschitz constant is multiplied by a distance so as to make the ŝ quantity similar to Δf from a dimensional point of view.The prediction variance has been designed in order to increase with increasing distance from an available sample.The negative exponential term and the γ parameter in it allow for adjusting the rate of change of ŝ while moving away from known points: for low values of γ, the variance quickly increases and plateau-like regions are generated between samples, while for high values of γ, the rise is milder and a series of hills (midsamples) and valleys (near-samples) are generated.An example will be provided at the end of the section to better explain the landscape of the EI-like function.

Expected Improvement for Global Fit (EIGF) Criterion.
A modified version of the expected improvement criterion is here considered as proposed by Lam [10].Instead of locating the global optimum, the aim is to add new points which are located in regions with significant variation of the response function and to improve the global fit of the model.In other words, the rationale is close to the Lipschitz criterion.Similar to the EI criterion and adopting the same notation, the improvement function is here defined as where x nn = arg min x i ∈X n x i − x is the training site closest (in distance) to x.After taking the expected value of equation ( 32) and recalling that F x ∼ N f x , ŝ2 x , the expected improvement for global fit function is derived as Again, this potential of improvement function consists of two components, one local and one global.The first (local) is large where the predicted response varies significantly with respect to the nearest sample.The second (global) increases when the uncertainty in the prediction is high, i.e., far from the sampled points.
The main difference with respect to equation (33) is that now the change in response f x − f x nn and the prediction uncertainty ŝ2 x are not separated as they interact with each other.This should drive the search to regions where both terms are important as their combined effect would be amplified.International Journal of Aerospace Engineering F X n = −0 656577,−0 015577, 0 909297,−4 605754, 5 711950 has been extracted by uniformly sampling the design space D = 0, 1 : Figure 2 shows the function f x , the surrogate prediction f x (here, a Kriging model is used), and the training samples.Figure 3 shows the potential of improvement functions obtained by applying the criteria described so far for two values of the tuning parameters.It can be observed that, apart from the LOOE criterion where no tuning parameter is introduced, the levels of v are globally reduced with increasing σ or γ; indeed, a different scale of the y-axis is used.
The effect of the σ parameter is clearly observable by comparing the WLOOE curves: indeed, for σ = 1, the maximum value of the WLOOE criterion is located at x = 0 8 and the v function is quite peaky, while for σ = 10, the peak shifts to x = 0 65 and the function is null almost everywhere.This occurs because the relative importance of the exploitation term (equation ( 25)) decreases rapidly with increasing σ, thus raising the exploration contribution within the factorization (equation ( 22)).The same behaviour is observed also for the WD criterion.The LC criterion, instead, suffers only a global and uniform damping of the peaks; hence, the location of high-ranking candidates is not altered by changing the tuning parameter.
Figure 4(a) shows the potential of improvement functions for EI-based criteria.The test function, the number   8)) and by the EI-like criterion (equation ( 31)) for γ = 0 1.In fact, the EI-like prediction error is smaller where the variation in f is limited, as for 0 < x < 0 6, and larger where significant gradients are present, while equation (8) depends only on the correlation between points, i.e., on distance and spatial distribution of points.EIGF and GEIGF criteria privilege the point at x = 0 8 (as LC and LOOE criteria) and show 5 more local maxima (corresponding to the interval end points and to the midpoints between the sample data) that may be selected in successive iterations of the method.This characteristic of placing samples "close to the midpoint" has been highlighted also by Lam who suggests to start with a small number of points from an LHS sampling in order to feed the EIGF function with a smooth predicted response.11 International Journal of Aerospace Engineering Figure 5 shows the result of 5 repeated updates of the surrogate model with each of the proposed criteria.The new 5 points are depicted with black circles, while the initial training points (the same set for every criterion) are represented by light gray circles.Each plot shows the test function and the surrogate prediction after the infill process.All proposed criteria manage to capture the location of the global optimum and to minimize the prediction error in the surrounding region.A strong clustering is observed for WLOOE and WD criteria, a clear effect of the exponential weight built on the surrogate prediction.On the other hand, LC and LOOE criteria are naturally more explorative as they place new points where the prediction error is high or where the function derivative is large.
Figure 6 shows the results of 5 iterations with each of the EI-based criteria.The EI-like criterion rapidly detects the global minimum and tends to cluster points around it, thus showing an "optimizer" behaviour.Similar results are obtained with the EI criterion, even if the final location of the global optimum is approximated.As predicted, EIGF and GEIGF provide a rather dispersed distribution of samples; however, a global optimization would take advantage of this as the region around the minimum is well captured.

Surrogate-Based Sequential Optimization
The workflow of the surrogate-based optimization is depicted in Figure 7.The method is built around the training database which is progressively fed and updated throughout the surrogate enrichment.Three major stages are conceived and designed, answering different needs in surrogate training and optimization: the space-filling initialization, the adaptive infill, and the sequential optimization infill.Each stage will be discussed and detailed in the following sections.
4.1.Space-Filling Initialization Stage.As a first step, the training database has to be initialized in order to build the first instance of the surrogate model.This stage is aimed at providing basic information about the objective space and selecting n apr samples to maximize the informative level.This stage is usually referred to as a priori sampling because it does not require any detail about the response function.A spacefilling design of the experiment technique is deemed appropriate to this aim, e.g., Latin hypercube sampling or Latinized central Voronoi tessellation techniques.Typically, according to literature results and the author's experience, the number of initial samples n apr produced in this stage should not exceed one-third of the total computational budget.Moreover, as multiple and explorative infill criteria may be applied in the second stage, the number of initial samples has to be kept as lowest as possible.Generating multiple training samples all together has the great advantage that they can be evaluated simultaneously; thus, this stage can be executed in parallel to speed up the simulation.Once the evaluation process has finished, the selected surrogate model can be built as described in Section 2.     The training of the initial surrogate is not driven by optimization purposes neither by considerations concerning the prediction error.Being a pure space-filling exercise, the aim is to evenly distribute the samples across the design space: as a consequence, the metamodel cannot be as accurate as required by the optimization task as (1) no control over the prediction errors has been introduced and (2) the proper identification of global minima is not investigated.The second stage of the SBO process (here referred to as "smart exploration" or sequential adaptive sampling stage) reflects the need to provide the optimizer with an improved and reliable surrogate model.The cycle iterates to update the training database with n adpt new samples suggested by infill criteria as described in Section 3. The iterative procedure is structured as follows: (1) Initialization step: the number n par of new samples to be inserted at each infill iteration is defined and the infill criterion is chosen among the available ones.Two options are implemented: (1) the single predefined criterion or (2) the criterion is chosen randomly at each iteration according to a given activation probability for each criterion.International Journal of Aerospace Engineering (2) Testing step: a huge space-filling testing dataset X t is generated.The number of test samples is n t = 1000d, with d being the dimension of the design space.The Latin hypercube is used with different seeds each time in order to avoid sample duplication issues.According to the selected criterion, the potential of improvement function v is evaluated at each point of the testing dataset and a vector V t of size n t is obtained (3) Infill step: a single-point n par = 1 or multipoint 1 < n par < n adpt + n apr − n selection is available-in the first case, each infill iteration produces a single candidate to be evaluated; hence, a complete sequential infill approach is realized and the surrogate has to be built n adpt times; in the second one, multiple samples are simultaneously selected, so that the true function can be evaluated in parallel before refitting the surrogate model.In the latter case, a batch sequential selection is performed.However, the two approaches pose different issues: the single-point approach gives the possibility to update the surrogate many times and to select more criteria during the sequential process, but of course it is much slower; on the other hand, by selecting the   multipoint approach, the best scoring candidates will probably lead to cluster points in a specific area with no diversity and hamper the surrogate adaptation.To overcome this issue, the following procedure is followed.The V t vector is ranked according to the score represented by the value of the v function.If n par = 1, the highest scoring candidate x h = arg max x i ∈X t v x i , f x i , X n , F X n is selected; if n par > 1 and a multipoint selection is requested, the highest scoring candidate x h is again selected and included in the batch selection set X h = x h .The scores of the remaining n par − 1 are reweighted according to the distance penalization introduced by Maljovec et al. [40].For x i ∈ X t \ X h , the distance from the nearest training sample x * i is d x i , x * i ; the mean value of those distances over X t \ X h is where n h is the dimension of the batch selection set (equal to 1 at this point).A reweighted score

16
International Journal of Aerospace Engineering function v * is then assigned to each test sample x i and defined by using a penalization term ρ: Finally, the new candidate is chosen by selecting The batch selection set is now increased by the new sample X h = x h , x h+1 .By repeating the reweighting process and the highest score selection n par − 2 more times, a set X h = x h , x h+1 , ⋯, x h+n par −1 of n par new candidates is obtained and passed to the parallel evaluation with the true model.Note that the new samples are selected by using the same surrogate prediction f and the same training set X n , F X n ; hence, the associated computational cost is negligible (4) New candidate evaluation step: the parallel evaluation of the candidates in X h is performed and the corresponding set of response values F X h is created (5)      18 International Journal of Aerospace Engineering exploitation of the effort spent so far in fitting an accurate and improved metamodel.The training database of design solutions is here enriched with n opt samples suggested by many sequential optimizations on the metamodel: at the end of each optimization, the best candidate is evaluated with the high-fidelity model and appended to the database for a new model fitting.This leads to the generation of a sequence of suboptimal candidates which continuously improves the objective function and approaches the design space region where the "true" optimum resides.The iterative procedure is structured as follows: (1) Surrogate minimization step: given f x the current metamodel built over the training set X n , F X n , the new suboptimal candidate is found by solving the optimization problem x opt = arg min x f x .The optimizer may be chosen among a wide range of options: the in-house genetic algorithm library ADGLIB [41], the CMA-ES algorithm [42] .This means that 50% of the total computational budget is devoted to the adaptive exploration: this is not surprising as goal-oriented infill criteria are used in combination with error-driven infill criteria to realize a high-level exploration-exploitation trade-off.The test functions are described in the following sections.

Ackley Function. The Ackley function is defined as
where m = 10.The function is featured with alternating steep valleys and ridges and has d local minima.The function is continuous, nonconvex, multimodal, and scalable on the d-dimensional space.The input domain is x i ∈ 0, π for all i = 1, ⋯, d.As the global minimum and its location vary with the input dimension, Table 1 reports the corresponding values for d = 2, 5, 10, 20 .

Rastrigin Function.
The Rastrigin function is defined as The function is continuous, convex, and scalable on the d-dimensional space, has several local minima, and is highly multimodal.However, the locations of the minima are regularly distributed.There is one global minimum f x target = 0 at x target = 0, ⋯, 0 .The function is evaluated on the hypercube x i ∈ −5 12, 5 12 for all i = 1, ⋯, d.

Schwefel Function. The Schwefel function is defined as
The function is continuous, nonconvex, multimodal, and scalable on the d-dimensional space.Many local minima are irregularly distributed in the input space, and one global minimum is present.Moreover, with respect to the global minimum, the nearest local minimum in the objective space is the farthest in the input space.These features make the Schwefel function very hard to solve by an approximated approach.The function is evaluated in the hypercube x i ∈ −500, 500 for all i = 1, ⋯, d, and the global minimum is f x target = 0 at x target = 420 9687, ⋯, 420 9687 .5.1.5.Details of Surrogate Models under Testing.Despite the fact that the surrogate models are two (RBFN and Kriging model), some differences may arise according to the method chosen for training, i.e., for selecting good values for the hyperparameters.Table 2 shows four types of surrogates that have been used for testing purposes.The search for the hyperparameters must have a global character as often the associated cost function may have more than one extremum and being trapped in a local minimum/maximum may significantly impact the model accuracy.The hyperparameter bounds are fixed as follows: (i) For RBFN models: θ, λ ∈ 0 01r min , 1 5r max × 10 −5 , 10 with r min and r max being the minimum and maximum Euclidean distance, respectively, between training samples (ii) For Kriging models:

Results and Discussion
. As the surrogate-based optimization package discussed so far offers several choices, this section is dedicated to the investigation of the effect of selecting among the available alternatives.In particular, the influence of the type of surrogate model, infill criteria, and computational budget allocation will be studied and discussed.

Influence of Surrogate Model Selection.
As mentioned in Section 5.1.5,four types of surrogates can be trained according to the mathematical nature (Gaussian or RBF) and optimal hyperparameter selection.All models have undergone a first experimental campaign to underline differences and draw up a ranking.The EI, EI-like, and WLOOE criteria are activated with probabilities of 50%, 30%, and 20%, respectively.Figures 8-11 show the boxplot distribution of Δf min over 5 repetitions for each surrogate and dimension.Outliers are highlighted by grey-filled circles, and all repeated points are plotted as small grey circles.The Gaussian-based models clearly outperform in average the RBFN except for the Ackley function.Table 3 reports a summary of optimization results, highlighting in bold character the best performing model for each case.By taking into account also the standard deviation of Δf min for ranking purposes, Gaussian-based approaches seem to be more robust in that they offer better results with less variability.For RBFN, optimizing the hyperparameters instead of grid searching is by far the preferred approach.For Kriging, there is no clear evidence about the best training strategy as krig-hp and krig-all show similar performances and trends.
In order to provide a more comprehensive insight into the optimization data, Figures 12-15 depict the evolution of the Pearson correlation coefficient R for all cases and rbf and krig-hp models.In the present case, it is used to compute the correlation between n samples of the "true" objective function f i , i = 1, ⋯, n and the leave-one-out prediction by the surrogate Each plot does not start from zero because the first training of the surrogate is done after those n apr samples have been computed.It can be observed that all simulations start with low correlation between data and surrogate prediction.This is much more evident for high-dimensional function cases, where in some cases negative correlation is even found.As a general consideration, all methods achieve to reach very high correlation at the end of the optimization process, but with different paths.In particular, Gaussian-based approaches present a constant and continuous increase of R throughout both the adaptive sampling phase and the sequential optimization process.On the other hand, RBF methods tend to stagnate during the infill phase while the correlation gets better with the addition of the last n opt points.This would seemingly suggest that infill criteria are not effective in reducing the prediction error of RBF methods.However, it should be also considered that trends are quite the opposite if considering only the Ackley function; hence, the test function characteristics play a major role.Hence, it derives that a more extended suite of test functions should be considered for further investigation.to provide an improved surrogate function from the optimizer's point of view.In order to ease the understanding of the results, two models (krig-hp and rbf) and a single test function (Schwefel function with d = 5) have been used.16 infill strategies have been defined, and Table 4 summarizes all of them.The first 9 strategies exploit each single criterion without combination.On the other hand, the last 7 strategies have been designed by coupling a factorization-based criterion with an expected improvement-based one.10 repetitions have been carried out for each infill strategy in order to extract reliable mean values.The optimization setup is identical to that in the previous section.Results are shown in Figure 16 as boxplot distribution and reported in Table 5 in terms of mean and standard deviation of Δf min (over 10 repetitions).By considering both mean values and outliers, the best strategies are 3, 5, 6, 9, 10, and 11 for the Kriging model and 0, 2, 6, 8, 9, and 13 for the RBFN model.Thus, as a general consideration, Gaussian-based models benefit from a blend of expected improvement-based criteria (EI, EI-like, EIGF) and error-driven ones (mainly LOOE and WLOOE).Conversely, RBFN models work well with global fit-oriented criteria (EIGF, GEIGF, and LC) and pure EI search.

5.2.3.
Influence of the Computational Budget Allocation.In this section, the influence of the optimization setup is investigated in terms of computational budget allocation to the infill and sequential optimization phases, having fixed the total budget n run = 50d .Table 6 reports the base setup used so far as setup no. 1, while setup no. 2 refers to the new one to be studied.The main difference is found in a more extended search with adaptive sampling to the detriment of the optimization iterations which are reduced.On the other hand, a more pronounced emphasis is put on infill criteria (EI and EI-like) specifically devoted to global optimization purposes.Results obtained with the new setup are depicted in Figures 17-20.All test functions have been considered, but only the best models (namely, krig-hp and rbf) from previous investigations have been used.The pictures are the analog of  and must be compared with them in order to draw conclusions.In particular, a general worsening of the algorithm performance is highlighted by employing the new setup.Table 7 shows the new results and compares them with setup 1.The bold character is used to underline improvements with respect to setup 1. Setup 2 is beneficial in reducing the standard deviation in several cases and in average offers meaningful improvement only for the Rastrigin function.This observation confirms that results may vary significantly depending on the selection of the set of test functions as well as on the methodology and setup.

Aerodynamic Shape Optimization Problem
In order to test the present approach in a real-world and representative case, a benchmark problem has been selected from those proposed within the AIAA Design Optimization Discussion Group (ADODG), namely, the RAE 2822 airfoil optimization case.This section proposes a critical analysis of the results obtained and represents a sort of prosecution of a previous work [31] which will be taken here as a reference.where x is the generic design vector representing an airfoil shape, A x is the total area enclosed by the airfoil, and A base is the corresponding value for the baseline RAE 2822 airfoil.The lift constraint is explicitly satisfied by performing the flow simulation at fixed lift by varying the angle of attack.
The pitching moment and the geometric constraint are treated by using a penalization approach; hence, the objective function is defined as with C d,0 being the drag coefficient of the RAE 2822 airfoil.According to this position, the baseline airfoil has a unit objective function value f x RAE = 1 , while feasible design solutions "better" than the baseline shape will be rewarded with f x < 1.A unit airfoil chord is assumed, the pitching moment is evaluated at the quarter-chord, the Mach number is 0.734, and the Reynolds number is 6 5 × 10 6 .
6.1.1.Parameterization.In the reference paper [31], the Class-Shape Transformation (CST) by Kulfan [43] was used to make the RAE 2822 shape parametric using 14 design variables.In the present work, the open-source SU2 code capabilities [44][45][46] are exploited and the FFD (free-form deformation) approach is used to arbitrarily change the geometry and, consequently, the volume mesh.
In particular, 20 FFD control points (CPs) are defined around the RAE 2822 airfoil, as depicted in Figure 21, and the vertical displacements of the control points are employed as design variables.The two CPs on the leading edge and on the trailing edge are constrained to have the same displacement, so the total number of design variables is reduced to 18.
6.1.2.Optimization Studies.Three optimization studies have already been performed employing different methods

28
International Journal of Aerospace Engineering and computational loads [31].Details are reported in Table 8, together with information about the present optimization studies.In all cases, RANS simulations are launched to evaluate the objective function when high fidelity is required, e.g., to validate the samples suggested by infill criteria or by optimizing on the surrogate.Practically, the CFD approach is different: reference cases employed the in-house multiblock structured ZEN code [47], the k − ω TNT turbulence model, and a fine mesh selected after a detailed mesh convergence analysis; as already mentioned, new cases take advantage of the unstructured SU2 suite running with the Spalart-Allmaras turbulence model on a coarser mesh.Hence, any comparison between cases should take into account such differences.SBOSA (Surrogate Based Optimization with Sequential Adaptation) was similar to the present one, but considered a lower dimensional space (14 instead of 18 design variables, CST parameterization instead of FFD) and a surrogate model based on proper orthogonal decomposition (POD) and RBF interpolation of coefficients.EGO (Efficient Global Optimization) refers to the classical Krigingbased approach made popular by Jones et al. [1] and implemented within the Dakota package [48].PGA (Plain Genetic Algorithm) consisted in a pure, intensive evolutionary optimization where no surrogate model was used and all evaluations were performed by using the highfidelity CFD approach.Here, two further approaches are added for benchmarking: SU2AO and SU2SBO.SU2AO uses the adjoint gradient-based optimization method embedded within the SU2 code which includes the flow solution, continuous adjoint flow solution, geometry modification, mesh deformation, and gradient optimization (SLSQP method).SU2SBO, instead, involves the presented surrogate-based approach and the SU2 tools for flow solution, airfoil shape modification, and mesh deformation.Table 9 reports the setup used for this simulation which reflects the experimental knowledge gained with the test functions in the previous sections.
As for Table 8, the number of CFD evaluations needed by the SU2AO approach comprises the calls to the adjoint flow solver (one for each objective/constraint, 3 in total according to the problem statement in Section 6.1), as each additional adjoint evaluation has approximately the same cost of a single CFD solution.However, the number of gradient evaluations (and, hence, the calls to the adjoint flow solver) is smaller than CFD ones (30 out of 70) as the gradient vector is not updated at each iteration.
Figure 22 shows the progress of the minimum value of the objective function found through the sequential optimizations SU2SBO and SU2AO.A log scale for the x-axis is used to make the picture clearer.The local approach (SU2AO) is much faster to reach feasible and optimal regions of the design space; indeed, the objective function drops down to approximately 30% with respect to the baseline after only 30 design cycles.The global  29 International Journal of Aerospace Engineering approach (SU2SBO) takes more CFD effort to accurately train the surrogate model: after the initial design space sampling by LHS (180 samples), the objective function has been decreased to just 8%.From that point on, the global approximation starts to predict the objective function landscape and the adaptive sampling provides new insight into unknown regions: their combined effect causes a significant performance improvement at around 300 design solutions.The last 100 solutions are sequentially introduced in the database by optimizing the surrogate with an in-house genetic algorithm [41].A further drop is observable in the very last stages which pushes the obtained performance beyond the SU2AO best result.
Figure 23 depicts a geometric and aerodynamic comparison of baseline and optimized airfoil shapes.The shock wave intensity reduction and the consequent improvement of the boundary layer behaviour past the flow discontinuity are highlighted by the redesign of the front airfoil region (to increase the flow expansion peak) of the whole pressure side (to recover lift and pitching moment) and of the rear part of the suction side (to control the shock).Similar results have been found in the reference paper.Figure 24 depicts the Mach number contour map and shows how the supersonic region has been greatly reduced and moved upstream.Finally, Table 10  provides quantitative figures about the optimal solutions from the reference paper and present simulations.The baseline airfoil drag coefficients obtained with both ZEN and SU2 codes have been reported as they represent the reference performance for corresponding cases: indeed, the percentage ΔC d change is also shown in the rightmost column.The present results are fully in line with previous studies as the optimized airfoils satisfy all constraints and feature drag reductions of the order of 35%-40% with respect to the baseline shape.

Conclusions
A framework for surrogate-assisted optimization has been presented, featuring design exploration, adaptive sampling, and sequential global search.A series of infill criteria have been proposed to adaptively choose new points to be added to the surrogate database: most of them pursue the exploration-exploitation balance and are aimed at providing an improved version of the surrogate to the final optimization stage.A new feature has been added which allows triggering several criteria during the sampling phase according to activation probabilities.Gaussian-based and radial basis function network models are used as surrogates of the objective function.Two experimental test campaigns have been performed: the first has involved analytic test functions with multidimensional and scalable character in order to quickly analyze the performance of the method and the second campaign is a prosecution of a previous work as it deals with the aerodynamic shape optimization of an airfoil profile in transonic viscous flow conditions.The open-source SU2 package has been exploited in most of its functionalities related to aerodesign.Results have been compared to the reference and to an adjoint gradient-based optimization.The work confirmed that approximate solutions, close to the global optima, can be found with the proposed surrogate-assisted optimization.Simulations with test functions highlighted that the proper combination of surrogate and infill criteria can be quite sensible to the function type and suggested that, by having multiple surrogates and criteria available, probably the best option is to exploit them in an ensemble approach.This will be the main topic of a future research study.Concerning the aeroshape optimization case, the achievements are in line with previous investigations as all the basic (geometric and aerodynamic) features of the optimal shape have been captured by the present approach.In the context of global and real-world optimization, the adoption of surrogate models is essential to reduce the computational burden.The proposed example clearly indicates the benefits associated with their usage in terms of cost/performance trade-off.Further application will deal with more complex cases, e.g., three-dimensional benchmark problems defined within the AIAA Aerodynamic Design Optimization Discussion Group and interference drag reduction of wing/ pylon/nacelle configuration.
the unknown RBF weights, and f = f 1 , f 2 , ⋯, f M T are the function values at the training sites.The prediction variance is estimated as

3. 1 . 3 .
Lipschitz Constant Criterion.This criterion (LC) is aimed at selecting new design samples where the local complexity of the function is high.The Lipschitz constant is considered here as an indicator of the local complexity.Given a 6 International Journal of Aerospace Engineering domain D and a function f defined in D, the Lipschitz constant denotes the smallest constant L > 0 in the Lipschitz condition, namely, the nonnegative number: 3.2.4.Generalized EIGF Criterion.Starting from equation (32) and taking the expected value of I 2 x , the 8 International Journal of Aerospace Engineering generalized expected improvement for global fit function is obtained as

Figure 8 :
Figure 8: Boxplot of Δf for the Ackley test function.
The current training sample set is X n of size n = n apr

Figure 9 :
Figure 9: Boxplot of Δf for the Michalewicz test function.

Figure 11 :
Figure 11: Boxplot of Δf for the Schwefel test function.

6. 1 .
RAE 2822 Airfoil Shape Optimization.The shape optimization problem is formulated as a drag (total drag coefficient C d ) reduction task while keeping the lift level (lift coefficient C l ), trim control of the pitching moment coefficient C m , and minimum airfoil area constant.It reads as follows: minimize x C d x subject to C l x = C l,base = 0 824 C m x ≥ C m,base = −0 092 A x ≥ A base = 0 7787m 2 43

Figure 17 :
Figure 17: Boxplot of Δf for the Ackley test function with setup 2.

Figure 18 : 5
Figure 18: Boxplot of Δf for the Michalewicz test function with setup 2.

Figure 19 :
Figure 19: Boxplot of Δf for the Rastrigin test function with setup 2.

Figure 20 :
Figure 20: Boxplot of Δf for the Schwefel test function with setup 2.

Figure 22 :
Figure 22: History of minimum objective function values for SU2SBO and SU2AO optimizations.

Table 1 :
Global optimum of the Michalewicz function.

Table 2 :
Type of surrogate models.
Surrogate update step: the surrogate model is refitted over the updated training set X n ∪ X h , F X n ∪ F X h , and the training set size is updated n = n + n par (6)Check step: if the infill computational budget is not over n < n adpt + n apr , a new iteration is started by going back to point (2); otherwise, the sequential infill process is terminated 4.3.Sequential Metamodel Optimization.The last phase of the surrogate optimization process is devoted to the

Table 3 :
Summary of optimization results.The best performance for each case is highlighted in bold character.
5.2.2.Influence of the InfillCriterion.This section is devoted to testing different infill criteria and combination of them with the aim of assessing which infill solution is more capable

Table 5 :
Summary of infill screening results.The best 5 results for mean and standard deviation of Δf min are highlighted in bold character.

Table 7 :
Summary of optimization results with setup 2. The best performance for each case is highlighted in bold character.

Table 8 :
Summary of optimization studies for the RAE 2822 case.

Table 10 :
Comparison of the best solutions.Run Total CFD evals.C d (ZEN, fine mesh) C d (SU2, coarse mesh) ΔC d wrt ref.