Surrogate-based optimisation using adaptively scaled radial basis functions

Aerodynamic shape optimisation is widely used in several applications, such as road vehicles, aircraft and trains. This paper investigates the performance of two surrogate-based optimisation methods; a Proper Orthogonal Decomposition-based method and a force-based surrogate model. The generic passenger vehicle DrivAer is used as a test case where the predictive capability of the surrogate in terms of aerodynamic drag is presented. The Proper Orthogonal Decomposition-based method uses simulation results from topologically different meshes by interpolating all solutions to a common mesh for which the decomposition is calculated. Both the Proper Orthogonal Decomposition-and force-based approaches make use of Radial Basis Function interpolation. The Radial Basis Function hyperparameters are optimised using differential evolution. Additionally, the axis scaling is treated as a hyperparameter, which reduces the interpolation error by more than 50% for the investigated test case. It is shown that the force-based approach performs better than the Proper Orthogonal Decomposition method, especially at low sample counts, both with and without adaptive scaling. The sample points, from which the surrogate model is built, are determined using an optimised Latin Hypercube sampling plan. The Latin Hypercube sampling plan is extended to include both continuous and categorical values, which further improve the surrogate’s predictive capability when categorical design parameters, such as on/off parameters, are included in the design space. The performance of the force-based surrogate model is compared with four other gradient-free optimisation techniques: Random Sample, Differential Evolution, Nelder–Mead and Bayesian Optimisation. The surrogate model performed as good as, or better than these algorithms, for 17 out of the 18 investigated benchmark problems. © 2020TheAuthors.PublishedbyElsevierB.V.ThisisanopenaccessarticleundertheCCBYlicense (http://creativecommons.org/licenses/by/4.0/).


Introduction
Optimisation is an ongoing field of research and aims to find the best solution to a specific problem while obeying some constraints.Given that resources and time are a limitation, it is often not feasible to try all possible design combinations.As a consequence, it is practically not possible to determine if a given solution is the global optimum.
The field of optimisation is often split into two categories, gradient-based and gradient-free techniques.Gradient-based optimisation relies on the gradient information to determine in which direction to search for a better design.Gradient-based methods perform well for unimodal problems; however, they risk converging to a local optimum if the problem exhibits several minima [1].Gradient-based methods require gradient information to be available, something which is not possible when dealing with categorical, or discrete, values.When working with gradient-based optimisation of numerical aerodynamic simulations, it is essential to use robust mesh-deforming strategies so that each simulation produces meaningful results.Without such strategies, the optimisation method may exploit mesh discretisation errors leading to significant discrepancies between simulated and actual performance [2].
Gradient-free optimisation can be adopted with ease into existing practices; however, a major limitation is the often high number of objective function evaluations needed.The cost of one objective function evaluation, for example, a Computational Fluid Dynamics (CFD) simulation, can be expensive.Joly et al. [3] used low-cost Reynolds-averaged Navier-Stokes (RANS) CFD simulations to reduce the computational effort when optimising the vane-rotor shock interaction with the Evolutionary Algorithm (EA) Differential Evolution (DE).Massaro and Benini [4] used an Artificial Neural Network as a global surrogate in a surrogateassisted evolutionary framework for optimisation to improve the convergence speed of the optimisation problem.
In this work, the expensive cost of running a multitude of CFD simulations is mitigated by the use of a surrogate model https://doi.org/10.1016/j.asoc.2019 Scaled design variable c j Dimension scaling factor (SM) when searching for new designs.The surrogate model is comparatively cheap to evaluate and create; however, the new designs found by exploring the surrogate model is subject to the accuracy of the model.Strategies need to be employed that increase the accuracy of the surrogate while exploring promising regions of the design space.These strategies typically involve balancing exploration and exploitation.Exploration involves sampling locations that improve the surrogate model's accuracy by exploring the design space, for example, by sampling sparse areas, while pure exploitation is achieved by sampling the bestpredicted design location.When new designs are validated, they are added to the initial set and the surrogate model is built again.One such strategy was developed by Goel et al. [5], who used an ensemble of surrogates to determine regions where the standard deviation between the surrogates was large to find regions of high uncertainty to explore the design space.However, they noted that regions of low standard deviation do not necessarily correlate with low uncertainty.Another approach developed by Sun et al. [6] uses a combination of two metaheuristic optimisation algorithms, one focusing on exploration and the other on local refinement of the solution.The combined approach yielded high performance for problems of large dimensionality, in this case, up to 200 dimensions.
Two surrogate models are compared in this work, a Proper Orthogonal Decomposition (POD)-based surrogate and a forcebased surrogate.Several authors have investigated POD-based optimisation [7][8][9][10][11].As an example, Miretti et al. [9] found that the use of a POD-based surrogate model reduced the number of computations needed by 30% to 40% compared with other gradient-free methods when optimising the drag of a road vehicle.In the POD-based approach, the entire flow field is used for each input sample and an entire flow field is given as the output.The input to the force-based surrogate model is a scalar and the output is an estimate of the same scalar in a new design point.Note that the scalar of interest can be any quantity, not only a force.
The focus of this paper is on the accuracy and performance of the surrogate model.The accuracy of the method is evaluated using CFD simulations of the generic vehicle DrivAer, developed by the Technical University of Munich [12].The surrogate accuracy is investigated using three design variables; roof angle, diffuser height and front-wheel deflector on/off.The surrogate model's performance, when used in optimisation, is benchmarked with nine test functions with and without added noise.
The main contributions of this work are: (1) A new axis scaling parameter has been implemented in the Radial Basis Function (RBF) based surrogate model which improved the surrogate model fit.The improvement is pronounced when there is a significant difference in sensitivity between the different input dimensions in regards to the function output.(2) A new method of creating optimised Latin Hypercube sampling plans, including categorical design spaces, was developed which halved the interpolation error when combined with adaptively scaling RBFs.(3) It was shown that the inclusions of a POD-based method did not improve the interpolation accuracy over a forcebased surrogate model for the investigated aerodynamic test cases.

Proper orthogonal decomposition
Proper Orthogonal Decomposition is a modal decomposition technique that extracts orthogonal modes from the field of interest by optimising the mean square field variable captured by each mode.If the field variable of interest is velocity, this is analogous to capturing the most amount of kinetic energy using the least amount of modes.POD is also known as the Karhunen-Loève procedure or Principle Component Analysis (PCA) and has been used in many applications such as statistics, turbulent flows, reduced-order modelling and optimisation.POD was first introduced as a data processing technique in fluid flows by Lumley [13].A brief introduction to the POD technique is given below.More information can be found in the works by Taira et al. [14] and Muld et al. [15].
It is assumed that the flow can be decomposed as where φ i (r) is the ith POD mode and a i (d) are the designdependent mode coefficients containing the design information.X are the space and design-dependent flow variables arranged as with n data points and m snapshots, where one snapshot corresponds to one CFD simulation.The variable matrix X is built from the pressure and velocity fields in the entire computational domain.As mentioned, the POD method seeks to find the orthogonal basis vectors that optimally represent the data in a mean squared, or L 2 sense.It is possible to find these basis vectors by solving the eigenvalue problem where the modes are calculated as which is often referred to as classical POD.Since XX T ∈ R n×n , this can become prohibitively expensive when n is large, as is the case for flow fields.In fluid mechanics, where n is often much larger than m, the method of snapshots is used which solves the eigenvalue problem which is of size m × m.This formulation assumes the data to be sampled on an equidistant mesh, which is seldom true for CFD simulations.The cell volume is included in order to use POD on a non-equidistant mesh by replacing X T X with X T WX, where W is a diagonal matrix containing the cell volumes [15].The design coefficients a i (d) are constructed as In practice, this can also be solved using a Singular Value Decomposition (SVD) The size of the problem can be significantly reduced by truncating the number of POD-modes if a large portion of the total energy is contained in a few modes.This was done in the work by Iuliano and Quagliarella [11], where the modes were truncated to contain 95 % of the total energy.In this work, all modes are used to remove any decency on the truncation level.

Latin Hypercube sampling
The surrogate model was created from an initial number of CFD simulations in a Design Of Experiments (DOE) sampling plan.There exist several DOE techniques and, in this work, the Latin Hypercube (LHC) sampling plan was used.The LHC plan divides each design parameter into N equally sized intervals where the same value of a parameter can only occur once.An example of a randomly generated LHC plan can be seen in Fig. 1.
Randomised LHC sampling plans can suffer from clustering, leaving areas of the design space less explored.The separation between designs was increased by optimising the LHC plan using the Audze-Eglais objective function where L 2 pq is the distance between two sample points.The optimisation of the LHC sampling plan is based on the work by Bates et al. [16], where the sampling plan is improved by minimising Eq. ( 7) using a permutation genetic algorithm.Fig. 2 shows the optimised sampling plan with 101 sample points that were used in this work for the two-dimensional test case.Throughout this work, an optimised LHC plan was used except for the onedimensional case, where the samples were spaced equidistantly.

Categorical Latin Hypercube sampling
LHC sampling plans require an equal number of points in each dimension; this typically means that each dimension needs to be continuously variable.It can be of interest to include dimensions which are not continuously variable in the design plan.For example, the choice of vehicle tyre is not continuously variable; rather, there exists a pre-determined number of tyre options from the tyre manufacturers.To be able to mix continuous and categorical, or discrete, design parameters, a categorical dimension is added to the LHC plan where the same value is allowed to appear multiple times to fill up the categorical dimension.Note that this is no longer a strict LHC since the same design parameter appears multiple times in the categorical dimensions.
When optimising the categorical LHC, the objective function is altered to promote separation of the design points within each categorical plane.This is done by calculating the Audze-Eglais function within each category, in addition to calculating it between all points.Both objectives are then summed together using weights for each objective, where the weights are specified by the user.Promoting separation within each categorical plane ensures that the points within a categorical dimension are not clustered in one area.The effect of different weightings can be seen in Figs.3a and 3b.Note that putting a large emphasis on the categorical separation is similar to creating two separate sampling plans.In this work, the categorical LHC uses the weights 0.9975 between all points and 0.0025 for the categorical separation.

Radial basis function interpolation
New flow fields are created by finding new design coefficients in Eq. ( 5).These are found through interpolation between existing coefficients to new, untested design locations.In this work, Radial Basis Function interpolation is used to estimate the design coefficients.The RBF interpolation is calculated as where w i are the weights and ξ i is the Radial Basis Function, and ∥x − x i ∥ 2 denotes the Euclidean distance between the new point x and a sample point x i .The RBFs, ξ i , used in this work is the gaussian basis functions where ε is the width factor determining the size of the RBF.An example of the influence of the width factor can be seen in Fig. 4.
The weights w i are found by solving the linear system where and u = u(x i ) are the known function values at the sample points.The chosen RBFs will result in a positive-definite matrix A, thereby guaranteeing a unique solution to Eq. ( 12) [17].

Radial basis function hyperparameter optimisation
The tuning of the hyperparameters ξ i and ε i is done with the Differential Evolution algorithm de/rand/1/bin/radiuslimited with a population size of 50, crossover probability of 0.7 and differential weighting factor of 0.6.More information on Differential Evolution can be found in the book by Price et al. [18].The objective function used to improve the interpolant is the Leave-One-Out (LOO) cross-validation error.The interpolant is created from n − 1 points where the prediction error, e i , for the removed point is stored.This process is repeated for all n points, and the LOO RMS error is used to quantify the performance of the interpolant.The LOO error for all n points is expensive to calculate, but thanks to Rippas algorithm [19], the LOO error can be estimated without computing the solution to Eq. ( 12) n times.The total computational cost for Eqs. ( 12) and ( 13) is on the order of O(n 4 ) without Rippas algorithm and on the order of O(n 3 ) with Rippas algorithm.

Dimension scaling
The design parameters used to construct the surrogate model are in the form of engineering quantities.These are typically used directly as input to the surrogate model or scaled nondimensionless, for example from 0 to 1 as . ( This can pose a problem since RBFs are radial.To illustrate this, the three-hump camel function is used in the interval [−5, 5] for both dimensions.The function is shown in Fig. 5.When optimising the interpolant, all dimensions are considered equally important.This leads to a compromise when selecting the RBFs, ξ i , and the widths, ε i .An example of this compromise can be seen in Fig. 6a.The resulting interpolation is not representative of the overall shape of the original function.The sharp peaks in Fig. 6b occur due to the interpolants inability to fit a surface which minimises the LOO-error for all points.When it is not possible to fit a smooth surface through all points, the RBF width factor, ε i , becomes large for some points to reduce their influence on the total LOO-error.To reduce the flexibility of the interpolants which can be fitted to a set of observations, it is possible to constrain the problem.The interpolant is constrained to use the same width and RBF for all points in Fig. 6b.While the smoothness of the function is improved, the overall fit is still poor compared to the ground truth, Fig. 5.The cause of the poor fit is the large differences in scale of each dimension when mapping from function input to output.The inputs for each design dimension need to be scaled independently to solve this.In the work by Giannakoglou [20], the design parameters are scaled based on gradient information from the response surface.In this work, the design parameters are scaled by using an additional scaling factor c j , which is introduced to Eq. ( 14) as where c j = [c 1 , c 2 , . . ., c d ] and d is the number of design dimensions.Note that the resulting RBFs are radial in the scaled space which, when translated back to the design space, results in the RBFs being ellipsoidal.The benefits of using adaptive scaling Radial Basis Functions on the same three camel hump function are evident when considering Fig. 6c, where a much better prediction of the global shape is obtained.
Adaptively scaling the design dimensions allows the designer to use engineering quantities for each design dimension without considering the appropriate scaling of each quantity.Furthermore, adaptively scaling each dimension reduces the penalty of including design parameters which have little or no influence on the output, i.e. constant response.

Kriging interpolation
Another commonly used interpolation technique, which was considered in this work, is Kriging.The authors compared the performance between Kriging and RBF for 5, 10 and 40 sample points using the one-dimensional data presented in Section 4.1.1.The RBF based interpolant outperformed Kriging over ten runs for 10 and 40 points while both methods performed similarly for 5 points.Chandrashekarappa and Duvigneau [21] compared the accuracy of Kriging-and Radial Basis Function-interpolation and found that the methods performed comparably.As concluded in the literature review by Goel et al. [5], no single surrogate performed best for all problems.It should be noted that Kriging offers the benefit of providing error estimates which can be used to explore the design space efficiently.
Adaptive scaling can be used with Kriging interpolation; however, the computational complexity of Kriging is O(n 4 ) while for RBF interpolation, it is O(n 3 ).For larger n, Kriging can become limited by the number of affordable iterations compared to RBF interpolation when optimising the scaling factor, c j .

Practical considerations
The condition number of the matrix A in Eq. ( 12) can become large when optimising the RBFs and width factors.Due to finite numerical precision when solving the linear system, the condition number of matrix A needs to be considered to keep the desired number of significant digits.Similarly, the numerical precision of the dimension scaling needs to be considered when scaling between small and large numbers.
When dealing with data containing noise, the RBF interpolation exhibits high-frequency oscillations when the number of interpolation points increases.Which is due to the interpolant creating a fit containing the noise as well as the global function shape.Fasshauer [22] suggests using ridge regression when dealing with data containing noise.A regression term, λ, is added to the diagonal of A, Eq. ( 12).This term relaxes the requirements of the surrogate model being directly interpolating, which reduces high-frequency oscillations in the surrogate model when modelling a problem with noise.The ridge regression term is treated as an additional hyperparameter and is optimised together with the other RBF hyperparameters.
The presented methodology is stochastic, and therefore, does not yield the same solution each time.To illustrate this, an interpolant based on the Forrester function is recreated 1000 times based on five equidistantly spaced sample points.
As mentioned, the interpolants hyperparameters are optimised using DE where the LOO-error is used as the objective function to tune the RBFs, ξ i , and the widths, ε i .In this example, the optimisation of the interpolant is stopped after 1 × 10 4 function calls.The average interpolant, along with the standard deviation based on the 1000 interpolants, can be seen in Fig. 7. Optimising the width factor and RBF for each point allows for flexibility when fitting the interpolant to the sample points.However, this flexibility might lead to overfitting, where the number of free parameters is large compared to the problem [23].A demonstration of the possible effects of a large number of free parameters is given by Mayer et al. [24].Reducing the degrees of freedom by using the same width and RBF for all points can alleviate the problem of overfitting; however, it is not clear which degree of freedom is suitable when constructing the interpolant.The impact of the degrees of freedom on the resulting surrogate model will be investigated in the results.
In the work by Abedinia and Amjady [25], the trade-off between under-and over-fitting was improved by optimising the number of RBF centres and their placement.Srivastava et al. [26] proposed a method where the neurons of a neural net are randomly turned off at training time.This helps regularise the resulting network to act like an average of an ensemble of networks.

Surrogate model
As mentioned, new flow fields are created by interpolating the coefficients in Eq. ( 5).The prediction accuracy of each new design is evaluated using the integration planes in Fig. 11.The entire procedure is outlined in Algorithm 1.
Algorithm 1: POD-based optimisation.The POD-based approach is compared to the force-based method, where the objective value is used directly to create the surrogate.Using the objective value to create the surrogate model is often termed Response Surface Modelling (RSM) while the POD-based method is commonly termed reduced-order Modelling (ROM).The asymptotic cost for the POD-based method is O(n 4 ), while being O(n 3 ) for the force-based method.The complexity for creating one surrogate model is dominated by the cost of solving the linear equation system, equation (12), which is an O(n 3 ) operation.The POD-based cost of O(n 4 ) is due to the number of interpolants needed for the POD-based method is equal to the number of modes which is equal to the number of samples.For the force-based method, the number of interpolants needed is equal to the number of objectives.The computational cost of the POD-based method can be reduced by using a truncated set of modes so that the cost final cost scales with O(n 3 ); however, in this work, all POD modes are retained.In practice, the POD-based method will be the number of modes, m, times more expensive than the force-based method.The storage cost is another factor to consider when using the POD-based method as the flow field needs to be saved for each design, whereas for the force-based method, the storage is negligible.

Software
The CFD simulations were performed in the commercial flow solver Star-CCM+.All the surrogate modelling and design plan construction was performed in the Julia programming language [27] using the packages LatinHypercubeSampling.jl [28] for the sampling plans and ProperOrthogonalDecomposition.jl [29] for the POD construction.The RBF interpolation was done with Scat-teredInterpolation.jl [30] and the Differential Evolution algorithm was provided by BlackBoxOptim.jl [31].
The packages to perform the optimisation are bundled into one package, SurrogateModelOptim.jl [32], with functionality to   perform the entire optimisation routine for convenience.All benchmark results for the surrogate model benchmark can be found in [32] as well as the script used to run the benchmark.

Aerodynamic surrogate accuracy test
The surrogate model accuracy was tested on the generic vehicle model DrivAer [12], developed by the Technical University of Munich, in the estate configuration.The variant, shown in Fig. 8, features a smooth underbody, closed rims and no cooling flow.The method performance is investigated using three design parameters, for one dimension, roof angle; two dimensions, roof and diffuser angle; and three dimensions, roof angle, diffuser height and front-wheel deflector on/off.The roof angle and diffuser height extreme points can be seen in Fig. 9.The front-wheel deflector is 175 mm wide and 40 mm tall and is located 100 mm inboard of the front-wheels outer face and 85 mm upstream of the front-wheels most forward point, see Fig. 10.Each simulation was run steady-state for half the vehicle model and a symmetry plane boundary condition was used to account for the other half of the vehicle.The meshes consisted of approximately 22 × 10 6 cells and the realisable k − ε RANS turbulence model was used.Ridge regression was not used for the aerodynamic accuracy comparison to retain an exactly interpolating surrogate model.

Mesh
Proper Orthogonal Decomposition requires the dataset to be ordered consistently between snapshots.This is typically achieved through volume mesh deformation [10].In this work, each CFD simulation is run on topologically different, unstructured, meshes and is later interpolated to a common mesh.The common mesh does not include the vehicle body since it changes between each design; however, the wheel geometry is constant and kept in the common mesh.The results are interpolated to the common mesh using a weighted nearest neighbour interpolation where the distances to the nearest neighbour and its immediate neighbours in the original mesh are used to compute the weights.
Since each simulation is interpolated to a mesh without a vehicle body, it is no longer possible to use the pressure and shear forces acting on the vehicle surface to compute the drag and lift.To solve this, the drag and lift area, C D A and C L A respectively, are evaluated in the far-field as where the integration planes, S, O, R and G, are defined in Fig. 11.
The mesh interpolation and subsequent evaluation of forces in the wake introduce errors in the force calculation.Based on 101 simulations of the DrivAer with varying roof angle, the errors were (0.005 ± 0.002) C D and (0.000 ± 0.004) C L .The constant offset of 0.005 C D is due to the vehicle and ground interaction.This can be accounted for by including the drag on the ground plane located between the inlet and plane S. Due to the offset being near-constant, the ground plane drag was not considered in this work.

Benchmark test cases
The surrogate model performance, when used for optimisation, was tested with nine benchmarking functions both with and without noise resulting in a total of 18 cases.Each benchmark was started from five sample points in an optimised LHC and run for 95 iterations where the surrogate model was recreated between each iteration.The small initial sample size was chosen to verify the methods convergence capability when starting with a sparse sampling plan.
When performing aerodynamic vehicle optimisation in a wind tunnel or using numerical simulations, the results from each test

Problem
Definition )) 2 2 )] contain some level of noise.The repeatability, when performing wind tunnel tests, without removing the vehicle, is on the order of ±0.001 C D [33] and typical changes to the vehicle drag vary from 0.010 C D to 0.050 C D depending on the development stage.
Interpreting the repeatability as noise, this translates to a noise level of 2 % to 10 %.The optimisation methods are benchmarked with 10 % noise to cover noise levels seen in aerodynamic vehicle optimisation.The noise level is determined by the maximum and minimum function value and the noise is uniformly distributed pseudorandom white noise.The ridge regression coefficient λ was treated as a hyperparameter for both the smooth and noisy test cases as to not make any underlying assumption of the noise level to verify the methods ability to interpret the noise level automatically.
The nine benchmark functions are available in Table 1.Both multimodal and unimodal benchmark problems were selected as well as multidimensional problems to verify the surrogate models ability to handle cases similar to those seen in aerodynamic optimisation.Note that the output of these functions is a single scalar, and it is only the force-based method which is benchmarked.
Each function was evaluated in the search space presented in Table 2.The function values for each benchmark problem are scaled from 0 to 1 based on the global maximum and minimum function value.
The results of each benchmark problem are presented in comparison with four other gradient-free optimisation algorithms, Random Sample (RS), Differential Evolution (DE), Nelder-Mead (NM) and Bayesian Optimisation (BO).Random Sample is the conceptually simplest of the methods and works by randomly sampling the search space and returning the solution with the

Problem
Search space f 1 Styblinski-Tang 2D x i ∈ (−5.0, 5.0) for all i f 2 Rastrigin 2D x i ∈ (−5.12, 5.12) for all i f 3 Rosenbrock 2D x i ∈ (−5.0, 5.0) for all i f 4 Beale 2D x i ∈ (−4.5, 4.5) for all i f 5 Sphere 2D x i ∈ (−5.12, 5.12) for all i f 6 Perm d, β 2D x i ∈ (−2.0, 2.0) for all i f 7 Goldstein-Price 2D x i ∈ (−2.0, 2.0) for all i f 8 Hartmann 6D x i ∈ (0.0, 1.0) for all i f 9 Rosenbrock 12D x i ∈ (−5.0, 5.0) for all i currently found best fitness after each iteration.The Nelder-Mead, or simplex search, algorithm constructs a simplex from d+1 function calls, the simplex is then either reflected, expanded, contracted, or shrunk.The Nelder-Mead optimisation algorithm was supplied by Optim.jl[34].More information about the simplex algorithm can be found in the works by Nelder and Mead [35] and Gao and Han [36].The parameters used in the NM algorithm are those suggested by Gao and Han [36] α Bayesian Optimisation is suited for expensive functions and constructs a surrogate model of the problem and quantifies the underlying uncertainty using Gaussian process regression.The Bayesian optimisation algorithm, called Dragonfly, was supplied by Kandasamy et al. [37].More information on Bayesian Optimisation can be found in the work by Frazier [38].The parameters used in the DE algorithm are the same as used for optimising the RBF hyperparameters except for the population size, which was reduced to 10, to balance the trade-off between population size and function evaluations.Note that this population size is small; however, the total number of function evaluations in this comparison was also small.Populations sizes of 5 and 50 were also tested; however, both performed worse compared to a population size of 10.
The surrogate model hyperparameters were optimised for 5000 iterations where the first 2500 iterations were optimised with an additional constraint of using the same RBF and width.Optimising the hyperparameters with the constraint improved the accuracy of the surrogate model as suitable dimension scaling, and ridge regression factors are found faster.This is especially true when the number of samples is large.The LHC sampling plan was optimised for 1000 iterations.

Infill
The surrogate model was used to determine the next sample location after each function evaluation.Infill strategies which explore the design space help the optimisation process to stay on target, finding the global optimum, without getting stuck in local minima.Exploitive strategies pick the minimum location predicted by the surrogate model.There also exist variations which try to take advantage of both, such as Expected Improvement which is commonly used with Bayesian optimisation.In this work, an ensemble of 10 surrogates was created and used to determine new infill locations, inspired by Goel et al. [5].The surrogate prediction was calculated from the median of the 10 surrogate models.
A mix of two infill criteria was used to determine new sampling locations in the surrogate model, minimum prediction and maximum standard deviation of the surrogate models.An example of optimising the Forrester function, Eq. ( 17), can be seen in Fig. 12.The first surrogate model estimate is a poor representation of the overall function due to the small number of sample points.Notice that the first surrogate model prediction is not interpolating, i.e. it is not able to fit a good RBF interpolant without increasing the ridge regression factor.As the number of sample points increases, the function is correctly identified as noise-free.
It is believed that the algorithm will converge to the global optimum as the number of points tend to infinity due to the sampling of the standard deviation.The standard deviation is low close to sampled areas which will increase the coverage in space as samples are added.However, the intent is to use the surrogate model with expensive functions where the rate of improvement is the most important aspect of the optimisation algorithm.In the example, the criteria were alternated between pure exploitation, using the median prediction, and pure exploration, using the standard deviation of the surrogate ensemble.For the benchmark tests, every odd sample was selected based on pure exploitation and every even sample by a weighted sum of the surrogates median and standard deviation as A weight of 0.0 results in an explorative search, while a weight of 1.0 is purely exploitative.For every even sample, the weight was cycled through 0.0, 0.3, 0.6, 0.9.The performance was increased by using the weighted formulation compared to using pure exploration for every even sample.This is due to more samples being spent in regions of predicted low function value, particularly when the number of samples increase and the standard deviation in the ensemble reduces.
The infill objective is optimised each iteration using the previously presented DE algorithm for 25 000 iterations.The package bundle SurrogateModelOptim.jl [32] includes convenience functions to create an optimised surrogate model.This is useful if the user wants to try other metaheuristic optimisation algorithms such as ant colony-or particle swarm-optimisation for example, to find promising new infill locations.

Aerodynamic surrogate accuracy-test
The accuracy of the interpolants is investigated in four cases, listed in Table 3.The constrained RBF and width refers to the use of the same RBF and width factor for all points, while for the variable RBF and width, is optimised per RBF centre, which in this case is per sample point.Note that the interpolant is optimised in each case.For cases A and B, the engineering quantity, such as measured distance or angle, is used directly as input to the surrogate while for cases C and D, the engineering quantity is adaptively scaled.This is done for both the POD-based and force-based surrogate model.
The results are presented for drag only since lift follows a similar trend.The results are presented successively for one dimension, two dimensions and three dimensions.Each dimension investigation is performed by running a DOE of 101 CFD simulations, where a subset of the simulations are used to create the surrogate model, and the remaining points are used as a validation set.The performance indicator used is the Mean Absolute Error (MAE) ith CFD result, or ground truth, and fi is the ith surrogate pre- diction.The MAE performance indicator was used as it does not bias the analysis toward outliers like other indicators such as Root Mean Squared Error does.This indicator was chosen since the surrogate model is recreated several times when used for optimisation and it is thus more important to have good performance on average.In the case of the two-and threedimensional test cases, the sample points are chosen as a subset of an LHC plan containing all 101 simulations.The subset is chosen by optimising the Audze-Eglais error as a subset from the original plan.Note that extracting a subset from an existing plan will result in an Audze-Eglais objective which is worse than creating an entirely new plan; however, this significantly reduces the computational expense since one dataset can be used to investigate the performance for several subsets.Creating the surrogate from an optimal subset of the entire optimised LHC plan ensures separation between the evaluation points and the points from which the surrogate is created, which can be seen in Fig. 13.The performance of the surrogates is investigated in subsets of 10, 20 and 40 points for each case and for one, two and three design dimensions.
Due to the method being stochastic, each interpolant is created 25 times.The presented performance data indicates the median performance with ±95% confidence interval for the median unless stated otherwise.The median confidence interval for the lower interval is calculated as and for the upper interval is calculated as where ⌊x⌉ denotes rounding to the nearest integer.The samples from which the median is created are shown; however, for illustration purposes, the y-axis is limited with some outliers not visible.An example can be seen in Fig. 14a.

One dimension
The roof angle is used to investigate the surrogate model performance for one dimension.In one dimension no relative scaling is used, i.e. it is only cases A and B which are investigated.The design parameter is varied from 0 • to 22 • in steps of 0. The method performance for the POD-based and force-based surrogate can be seen in Figs.14a and 14b.Both surrogate models perform similarly for 20 and 40 samples while the POD-method performs worse for 10 sample points.It is interesting to note that the spread between samples is larger for the POD-based surrogate.Case B, being a surrogate model created using different RBFs and widths, has a larger spread between values compared to case A, particularly for 10 sample points.This can be explained by the degrees of freedom being too large for the number of sample points, leading to overfitting.
The surrogate performance increases as the number of points increase; however, the improvement is gradual.The prediction error for each sample is investigated further for the surrogate with median performance, Fig. 15.The performance is largely within 0.005 C D absolute error except for around 16 • roof angle, where the error is large.Fig. 16 shows all 101 CFD simulations as well as the sample locations for the 10 samples.At around 16 • , the simulation results are discontinuous.This is due to the flow separating from the roof, causing a sudden increase in drag.It is  believed that the limited increase in surrogate performance, as the number of samples increase, is due to the discontinuity in drag.It is a known fact that the performance of RBF interpolation suffers in the neighbourhood of discontinuity or strong gradients [17,39].Jakobsson et al. [17] developed the rational Radial Basis Function interpolation, which better approximates steep gradients; however, rational RBFs were not tested in this work.The samples after separation are in a design region which is not desirable, removing these samples and containing the problem could improve the surrogate performance.Additionally, weighting the LOO-error to favour the surrogate performance of points with low drag could improve the interpolant near the minimum.
The POD-based method's relatively poor performance at 10 sample points is investigated further by looking at the convergence of POD modes.The mode convergence can be used to determine if additional entries to the snapshot matrix X would increase the information gained.Muld et al. [15] used two metrics to determine if the POD modes were converged, the orthogonality between modes and the L 2 -norm.It was concluded that both methods give good results, here the L 2 -norm is used, defined as for the ith modes.Each mode is normalised to length 1 before the comparison.The mode φ i,end is the ith mode used for comparison, containing the snapshots for all 101 designs.The minimum is used since the sign of each mode is not known.If the L 2 -norm approaches 0 before the last snapshot it is considered converged, that is, adding additional snapshots will not add more information.The snapshot convergence for the first 10 modes is presented in Fig. 17.The first POD mode shows signs of convergence; however, the remaining 9 modes do not.This is thought to be the reason why the POD-based surrogate performance is particularly poor if the number of samples is small.The energy contained in each mode indicates how many modes are needed to capture the variance in the dataset.Fig. 18 shows the energy contained in each POD mode based on all 101 simulations.The first mode contains a large portion of the overall energy (83 %); however, 77 modes are needed until 99 % of the energy is captured.Since a large number of snapshots are needed,  it indicates that there is not much underlying generality in the dataset which can be extracted using POD.It is a known fact that POD is not capable of decomposing travelling wave problems which might explain the slow convergence and the large number of POD modes needed to reconstruct the dataset [14].

Two dimensions
The two-dimensional design parameters are the roof angle and the diffuser height.The roof angle is varied from 0 • to 22 • and the diffuser height from 0 mm to 350 mm, with 0 mm being defined as a flat floor, Fig.The performance for cases A through can be seen in Fig. 19.The performance between the POD-and force-based methods is larger for the two-dimensional case to the one-dimensional test, with the force-based method being the more accurate surrogate model.It should be noted that the confidence interval for the median is generally larger for the POD-based method due to the larger spread.Even though the confidence interval of the median is large, the force-based surrogate consistently outperformed the POD-based surrogate.
The best performing interpolant is case C, i.e. using the same RBF and width factor with adaptive scaling.The adaptive scaling results in better performance overall for both surrogate models.The surrogate performance for cases A and C, which use the same RBF and width factor, generally results in less spread between the surrogates' performance, compared to B and D. The median performing force-based surrogate is investigated to increase the understanding of the surrogate model's shortcomings.The absolute error for the median surrogate and the associated training locations can be seen in Fig. 20 with the associated drag values in Fig. 21.At around 16 • roof angle the flow separates in the twodimensional case, similar to the one-dimensional case; however, there is an interaction between the diffuser height and angle at which the roof flow separates since the diffuser angle influences the separation point.In the region of large differences in drag, the predictive performance of the surrogates tends to decrease.
Due to the stochastic nature of the surrogate model construction, the Mean Absolute Error for the 25 surrogates is investigated in Fig. 22a.Note that the error for all points is shown since the subset is stochastic, resulting in different subsets being used for the surrogate construction.
The region of largest MAE is located close to the point of separation, at around 16 • roof angle, which is an expected result.
As mentioned in the introduction, Goel et al. [5] used an ensemble of surrogates as an indicator of the local error.This indicator can be used to add infill points to the surrogate to increase its accuracy.The use of the standard deviation as an error indicator is also investigated here.The standard deviation for predicted drag for the 25 surrogates can be seen in Fig. 22b.
Comparing Figs.22a and 22b, the areas of high standard deviation are located close to regions where the MAE is larger, around 16 deg roof angle.Note that the standard deviation for the surrogate is only shown at the sample locations since this is the only locations which can be used to verify the local error.It is, of course, possible to determine the standard deviation of surrogates at any location in the design space.This is a promising result for the use of the standard deviation of several surrogates as an indicator of potential infill locations.

Three dimensions
The third design parameter added to the test case is the frontwheel deflector on/off.The influence of the front-wheel deflector was tested for one of the sample points.The tested configuration has a roof angle of 4.6 deg and a diffuser height of 74 mm.The front-wheel deflector reduced drag by 0.009 C D and reduced lift by 0.007 C L for this configuration.
The performance for each case, number of sample points, and the comparison between the POD-based and force-based surrogate are similar to the two-dimensional test case and therefore are not shown here.The results for three design parameters show an overall larger error, as expected.
In this work, the Latin Hypercube Sampling plan was extended to include categorical, or discrete, design parameters.The performance difference between using separate LHC plans per category or using one categorical LHC is presented here.In this comparison, a subset of 20 sample points is used.Out of these 20 points, half is used to create a surrogate for front-wheel deflector on, while to other half is used to create a surrogate for frontwheel deflector off.These two separate surrogates are compared with a surrogate using all 20 points.This process is repeated 25 times for cases A and C to gather statistics due to the stochastic nature of the surrogate creation.
The results can be seen in Fig. 23, where the performance for the unscaled surrogate model, case A, shows a small benefit of using the categorical LHC plan.The performance when using adaptive scaling improves for the separate LHC plans as seen previously; however, when building the surrogate from the categorical LHC plan, the overall predictive capability of the surrogate model is greatly improved.Note that the error bars for the median are small and not visible in Fig. 23.The negligible performance gain for case A when using the categorical LHC is thought to be limited by the compromise of choosing RBF and width factor when included design parameters are of largely differing scales.This is similar to the interpolation performance of the three camel hump function presented in Section 2.3.

Benchmark test cases
Based on the accuracy of the aerodynamic test case, the surrogate models C and D, which both use adaptive dimensional scaling, are selected for further benchmarking.Surrogate models C and D are compared with test functions f 3 and f 8 , Rosenbrock 2D and Hartmann 6D, in Fig. 24.The increased flexibility of the surrogate model D in the two-dimensional case increases the convergence speed after approximately 20 function evaluations.The six-dimensional test function indicates slower convergence when using surrogate model D; this is thought to be due to overfitting when the number of free parameters is too large compared to the problem being solved.It is not clear when there are enough samples for surrogate model D to outperform C. The benchmark results are only presented for surrogate model D from here on.
Each benchmark function was run 100 times for the surrogate model to gather statistics where the confidence intervals are calculated using bootstrapping.The Bayesian Optimisation algorithm was run 20 times for each function due to the high computational cost.The other optimisation algorithms are run a minimum of 100 times; however, sometimes more runs were required to reduce the confidence intervals and give meaningful comparisons of the mean performance.The surrogate model shows good initial convergence speed which is important when optimising with a fixed time budget.It is important to note that the results are presented on a logarithmic scale and that the change in the objective function value for the lower levels of convergence can be small relative to the problem being solved.
The Bayesian Optimisation algorithm has overall good performance and tends to perform better for multimodal functions such as the Rastrigin or Styblinski Tang function.The Bayesian Optimisation featured here, called Dragonfly [37], uses several acquisition functions and initially picks any acquisition function with equal probability.As the optimisation continues, acquisition functions that perform well are picked with a larger probability.
The benchmark results for 10 % noise are presented in Fig. 26.Overall the convergence speed of all algorithms is reduced when noise is introduced to the problem.The surrogate model performs better or as good as the other methods, in all nine benchmark functions.The NM algorithm, which showed good performance for some of the noise-free benchmark problems, tends to converge prematurely due to the added noise and even performs worse than the RS algorithm for all tested problems.
The Bayesian method tends to perform well for noisy functions, and it is theorised that this is in part due to the stochastic acquisition function selection.The surrogate model performance might be able to be improved on noisy and multimodal functions by using a similar approach or even a hybrid of the two methods while keeping the current performance for smooth functions.
It should be noted that the Bayesian Optimisation algorithm is the most expensive optimisation algorithm used in the benchmark.The Bayesian algorithm is approximately an order of magnitude more expensive to compute compared to the surrogate model for the investigated test cases.To create the surrogate model, containing 100, 200 and 400 samples, required approximately 3, 12 and 66 s respectively on an Intel(R) Xeon(R) E5-2640 v4 CPU utilising 10 cores.The break-even point between the cost of function calls and the surrogate model creation is quickly outweighed by the reduced number of function calls for expensive function.Considering aerodynamic optimisation, where the function call is a wind tunnel test or a numerical simulation, the wall clock time is large enough to outweigh the surrogate creation cost.
The surrogate model tended to perform particularly well compared to the other methods for functions where the difference between the design dimension is large, such as the Rosenbrock problems.Remaining questions of interest for RBF based optimisation is increasing the performance for non-smooth functions such as the Rastrigin function.Kandasamy et al. [37] used an approach where successful objective functions evaluations were selected out of a set of objective functions with a higher probability.Another weakness of the RBF based surrogate model is poor performance around discontinuities in the objective function, which requires further attention.The computational cost of creating the surrogate model is negligible for most aerodynamic optimisation.However, to increase the application range to cheaper objective functions, the computational cost needs to be reduced.Anitescu et al. [40] used a coarse grid to train their neural network initially and stated that the approach increased the robustness as well as significantly reducing the computational effort.It is possible that a similar approach could be adapted to the RBF based technique to reduce the cost of the method without sacrificing accuracy.
The results indicate that the SM optimisation algorithm is robust, high performing, capable of handling a low number of initial samples and can optimise functions with noise.

Fig. 3 .
Fig. 3. Categorical LHC sampling plans with varying weights.The two categories are represented by grey and black dots respectively.

Fig. 7 .
Fig. 7. Forrester function and interpolant created from five points.The standard deviation and average based on 1000 runs.

Fig. 11 .
Fig. 11.Planes used to evaluate lift and drag.Wake plane S extends to the domain edges.O, R and G are the domains outlet, roof and ground respectively.

Fig. 13 .
Fig. 13.Optimised subset LHC plan in two dimensions for a subset of 20 points, also seen are the points used to evaluate the surrogate performance.
22 • , resulting in 101 simulations.The maximum/minimum drag and lift coefficient varied from 0.231 C D to 0.272 C D and −0.208 C L to 0.128 C L for the investigated samples.It should be noted that the investigated range for the design parameter is larger than what is typically considered when working with external vehicle aerodynamics.

Fig. 14 .
Fig. 14. 1D, surrogate performance.Bar of the median performance, the error bars indicate the ±95% confidence interval for the median.The samples are shown as circles.

Fig. 15 .
Fig. 15.Performance of the median force-based surrogate for 10 sample points, case C.

Fig. 16 .
Fig. 16.Simulated drag values of all 101 simulations for varying roof angle.

Fig. 19 .
Fig. 19.2D, surrogate performance.Bar of the median performance, error bars indicate the ±95% confidence interval for the median.The samples are shown as circles.

9 .
Based on the LHC plan with 101 simulations, the drag varied from 0.227 C D to 0.284 C D and the lift from −0.214 C to 0.211 C L

Fig. 20 .
Fig. 20.Performance of the median force-based surrogate for 20 sample points, case C.

Fig. 21 .
Fig. 21.Simulated drag values for all 101 simulations.Sample points of the median force-based surrogate for 20 sample points, case C.

Fig. 22 .
Fig. 22. Mean absolute error and standard deviation of 25 surrogates created from 20 sample points, case C.

Fig. 23 .
Fig. 23.Separate LHC plans compared with one categorical plan for cases A and C using 20 sample points in total.

Fig. 24 .
Fig. 24.Surrogate model optimisation performance on two test functions using surrogate model versions C and D.

Fig. 25
Fig.25shows the performance history for all nine benchmark functions without noise for a maximum of 500 function calls.The surrogate model outperforms the other methods in eight out of the nine tested functions.The surrogate model shows good initial convergence speed which is important when optimising with a fixed time budget.It is important to note that the results are presented on a logarithmic scale and that the change in the objective function value for the lower levels of convergence can be small relative to the problem being solved.The Bayesian Optimisation algorithm has overall good performance and tends to perform better for multimodal functions such as the Rastrigin or Styblinski Tang function.The Bayesian Optimisation featured here, called Dragonfly[37], uses several acquisition functions and initially picks any acquisition function with equal probability.As the optimisation continues, acquisition functions that perform well are picked with a larger probability.The benchmark results for 10 % noise are presented in Fig.26.Overall the convergence speed of all algorithms is reduced when noise is introduced to the problem.The surrogate model performs better or as good as the other methods, in all nine benchmark functions.The NM algorithm, which showed good performance for some of the noise-free benchmark problems, tends to converge prematurely due to the added noise and even performs worse than the RS algorithm for all tested problems.The Bayesian method tends to perform well for noisy functions, and it is theorised that this is in part due to the stochastic

Fig. 25 .
Fig. 25.Optimisation benchmark performance, functions f 1−9 .Results are normalised by the test function maximum and minimum value.

Fig. 26 .
Fig. 26.Optimisation benchmark performance, functions f 1−9 with 10% white noise.Results are normalised by the test function maximum and minimum value.
RBFinterp i (d new ) 10 for i ← 1 to m do 11 a i,new = i (d new )φ i (r); 14 objective = integrationPlanes(X new (r, d new )); 15 end 16 Add design(s) found by EA to sampling plan; 17 Simulate the new designs in the sampling plan; 18 Interpolate X(r, d) to common mesh; 19 end

Table 1
Benchmark test functions.

Table 2
Benchmark test function search space.

Table 3
Surrogate performance configurations.