Abstract

In this paper, we investigate two ideas in the context of the interpolation-based optimization paradigm tailored to derivative-free black-box optimization problems. The proposed architecture maintains a radial basis function interpolation model of the actual objective that is managed according to a trust-region globalization scheme. We focus on two distinctive ideas. Firstly, we explore an original sampling strategy to adapt the interpolation set to the new trust region. A better-than-linear interpolation model is guaranteed by maintaining a well-poised supporting subset that pursues a near regular simplex geometry of n + 1 points plus the trust-region center. This strategy improves the geometric distribution of the interpolation points whilst also optimally exploiting the existing interpolation set. On account of the associated minimal interpolation set size, the better-than-linear interpolation model will exhibit curvature, which is a necessary condition for the second idea. We therefore explore the generalization of the classic spherical trust-region geometry to an ellipsoidal one by matching the contour ellipses with the inverse of the local problem Hessian. This strategy is enabled by the certainty of a curved interpolation model and is introduced to account for the local output anisotropy of the objective function when generating new interpolation points. Instead of adapting the sampling strategy to an ellipsoid, we carry out the sampling in an affine transformed space. The combination of both methods is validated on a set of multivariate benchmark problems and compared with ORBIT.

1. Introduction

We consider the following optimization problem with objective function f : R^n → R: minimize f(x) subject to x ∈ Ω ⊂ R^n (1). We further assume that f is continuously differentiable, smooth, and bounded from below and that ∇f is Lipschitz continuous on the subset Ω. We restrict the minimization to a convex subset Ω; however, we assume that none of these constraints are active at the minimum so that (1) can be considered unconstrained in practice (the subset Ω is mainly introduced to emphasize that the search is limited to a range of practical interest; it may also be a means to delimit the search space to the definition domain of the function).

Such problems are common in engineering, and consequently, a large family of search algorithms has been tailored to solve such problems. To motivate the development of derivative-free optimization methods, we premise additional assumptions on the computational tractability of the objective.

As is the case in many engineering design applications, we assume that any evaluation of the objective is associated with the execution of a complex simulation or legacy code. Such evaluations are time-demanding and usually render local derivative information inaccessible. These assumptions rule out the use of the family of gradient-based optimization algorithms that exploit first- and second-order derivative information to search for points that satisfy the first- and second-order optimality conditions. Under these conditions, it is also intractable to approximate derivative information indirectly using, for example, a finite difference scheme, given the tight evaluation budget that is available. In short, these assumptions imply that the search algorithm has to rely exclusively on direct evaluations of the objective function to probe the local geometry of the problem. The term derivative-free optimization (DFO) refers to any search algorithm that uses only direct evaluations of the objective function.

1.1. Overview of DFO Methods

Within this family, two main classes can be recognized: stochastic search methods and deterministic search methods. The former can be further subdivided into population-based methods and evolutionary strategies, the latter into direct search methods and interpolation-based search methods. The distinction between stochastic and deterministic here refers to the property that, excluding the random initialization of the algorithm, the execution and convergence history of a deterministic algorithm is reproducible, whilst that of a stochastic search algorithm is subject to variation from run to run.

Generally speaking, stochastic search algorithms maintain a belief over the solution space representing possible candidate solutions. This belief is either represented with an actual population or with a parametric distribution, e.g., the multivariate normal distribution. Members of the latter class are denoted as evolutionary strategies (ESs). Stochastic search methods proceed with the evaluation of any new members of the population and use stochastic operations to generate a new population. In the past few decades, a large variety of population-based algorithms has been invented, inspired by the swarm intelligence exhibited by animal populations. Examples are monarch butterfly optimization [1], earthworm optimization algorithm [2], elephant herding optimization [3], moth search algorithm [4], slime mould algorithm [5], Harris hawks optimization [6], artificial bee colony optimization [7], and grey wolf optimization [8], amongst others, which, in some way or another, are all related to the classic particle swarm optimization algorithm [9]. These algorithms have mostly found applications in structural design optimization and are particularly appealing for finding global solutions at the cost of sampling the objective function abundantly. Instead of a population, the class of evolutionary search algorithms maintains a parametric distribution, which is used to spawn a sample set, which, in turn, is evaluated and assessed to infer the next parametric belief. The benefit of these evolutionary strategies is that they may be derived from much more principled arguments. Again, their main appeal is that they solve for a global solution and are particularly well-equipped to treat noisy or multi-global objective functions, where other algorithms, even population-based algorithms, are prone to getting stuck in local minima. Examples popular in the machine learning and reinforcement learning communities are the covariance matrix adaptation evolutionary strategy [10–13], natural evolutionary strategies [14–17], and information geometric optimization [18]. On account of their inherent stochasticity, the execution of stochastic search methods may be subject to excessive variance, and their actual performance may differ for each individual run. Basically, they battle noise with noise.

Deterministic search methods eliminate that problem. The subclass of direct search methods encompasses the historical Nelder-Mead simplex method [19] and modern pattern search methods [20], such as mesh adaptive direct search (MADS) methods and related variants [21–24]. These direct methods are characterized by the fact that they try to satisfy the zeroth-order optimality criterion by comparing the ordering of evaluated points but usually do not exploit any directional preferences implied by them. The advantage here is that they will not be misled by false positives or false negatives (e.g., a strong objective slope that suddenly flattens) but, consequently, can also not exploit, or at least to a lesser extent, true negatives and true positives (e.g., a strong objective slope that actually points in the direction of the minimum). The other subclass of deterministic DFO methods can be described as interpolation-based optimization (IBO) methods. The basic idea here is to approximate the objective function by fitting an interpolation model to the available objective function evaluations. The hypothesis is that the underlying objective function is subject to tendencies that are picked up by the interpolation model, which, in turn, can be exploited to explore potential regions of interest. Within this scope, there are basically two main branches that can be classified based on whether they maintain a global or a local interpolation function. The global branch was founded when the efficient global optimization (EGO) algorithm was first published [25]. Members of this class maintain a so-called acquisition function that can be optimized to determine the next best point to be evaluated and added to the interpolation set. A large family of methods has been developed based on this idea. The other branch of strictly deterministic interpolation-based methods is based on the trust-region framework and will be discussed in the following subsection.

In conclusion, we note that the principal idea of interpolation-based optimization, i.e., replacing the objective function with an interpolation-based proxy, has been subject to cross-fertilization with ideas from the population-based stochastic search family; the resulting methods are commonly referred to as surrogate-assisted stochastic search methods. As opposed to the acquisition function used in EGO and related methods, here, a so-called batch infill criterion is used that can generate a whole set of points at once instead of a single point. Examples are TLRBF [26], RBFBS [27], SATLBO [28], SAGWO [29], and DSCPSO-EMM [30], amongst others. A further exposition is beyond the scope of this work.

1.2. Trust-Region IBO and Contributions

This article focuses on the particular deterministic DFO subclass of trust-region-based IBO methods, or TRIBO for short [31–34]. As opposed to the global IBO methods surveyed above, TRIBOs target a compact neighborhood in each iteration. Such a strategy limits the required amount of sample points to obtain a reliable surrogate in the compact neighborhood considered during that iteration [35]. A standard gradient-based optimizer can then be used to solve the local subproblem confined to the compact neighborhood.

This analysis allows one to identify three key components. Firstly, a framework is required to maintain a sequence of compact neighborhoods. Secondly, a strategy should be present to populate the new compact neighborhood with interpolation points. Thirdly, this interpolation set can, in turn, be used to determine a new local interpolation model that is sufficiently accurate in the compact neighborhood. In this work, we will make use of the radial basis function (RBF) interpolation framework [36–38]. The first two components are discussed next.

The trust-region globalization scheme offers a framework to generate a sequence of compact neighborhoods that will be referred to as trust-regions henceforth. When the surrogate satisfies Taylor-like error bounds on its function value and gradient for every iteration, convergence to first-order optimality can be guaranteed [39]. In the context of interpolation, such error bounds can be established, depending on the geometry of the interpolation set used [40]. The geometry of an interpolation set can be understood as the distribution of points with respect to a certain region. The quality of geometry is quantified by a set-dependent geometry metric. An arbitrary set of points is then said to be well-poised for interpolation within the specified region if the corresponding geometry measure is sufficiently large.

To guarantee a valid interpolation model suited for optimization within a specified trust region, IBO methods maintain a subset of the available interpolation set for which the geometry constant is maximized. Consequently, the well-poisedness of the subset geometry is assured, and so is the model validity. In every iteration, this constant is maximized by removing points from the interpolation set and generating new ones according to the current size and location of the trust region whilst limiting the overall accumulation of objective evaluations. This delicate mechanism is referred to as interpolation set management.

In [32, 41, 42], such a geometry measure was described using a basis of polynomials. Other authors have described this constant as a function of the QR pivots of the Vandermonde matrix determined by the interpolation points [31, 40]. The mentioned methods consider a minimal subset of interpolation points to guarantee a valid model. In any of these cases, the interpolation set management pursues orthogonality between the interpolation points relative to the current trust-region center. The center itself then serves as an additional interpolation point.

The contribution of this article is twofold. Firstly, in our algorithm, we pursue a subset of n + 2 instead of n + 1 interpolation points, as seen in [31, 32]. An original sampling strategy is devised that is inspired by the volumetric extremity properties of the regular n-simplex. This strategy has two benefits. For one, we achieve an improved interpolation set distribution, which is heuristically justified for low-dimensional problems. Then, considering that we use the RBF framework to construct a local interpolation model, it is guaranteed that the surrogate will exhibit curvature.

Secondly, we propose to leverage this curvature information. We, therefore, generalize the traditional spherical trust-regions to ellipsoidal trust regions. The motivation to use an ellipsoidal geometry is again twofold. Firstly, an ellipsoidal trust-region will adapt to the geometry of the objective function and will, therefore, allow larger steps toward more promising regions compared to a spherical geometry, i.e., the trust-region boundaries will align with the contours of the objective function and, in the transformed space, the objective will locally resemble a spherical problem. This idea is already leveraged in the context of evolutionary computation [10, 43]. Secondly, our interpolation set management is input-space oriented, meaning that it only takes into account the local distribution of the interpolation points and does not consider the associated output values. The introduction of an ellipsoidal trust-region, whose shape reflects the objective’s response behavior, makes it possible to account for the output space when executing the interpolation set management and will, therefore, benefit the interpolation-based metamodeling.

1.3. Outline

To keep this article self-contained, we provide a brief review of the general trust-region framework for surrogate management in Section 2. In Section 3, we discuss the radial basis function interpolation framework and provide details on the implementation. Section 4 motivates our choice for using n + 2 interpolation points and introduces our interpolation set management. In Section 5, we discuss the transition toward ellipsoidal trust regions. In Section 6, we validate our methods empirically on a number of benchmark problems.

2. Trust-Region Methods

In this section, we give a brief review of the traditional trust-region globalization scheme and its application in IBO. To that end, we introduce the so-called class of fully linear surrogate models, which generalizes the characteristics of the surrogate model used by the trust-region methodology from gradient-based models (such as second-order Taylor models) to more general model types (such as radial basis interpolation models).

2.1. Generic Trust-Region Method

The trust-region framework is a widely used technique to attain the global convergence of gradient-based optimization algorithms. At every iteration k, a surrogate model, m_k, is considered that approximates the actual objective f within a neighborhood of the iterate x_k. Traditionally, this neighborhood, i.e., the trust region, is modeled as a sphere, B_k = {x ∈ R^n : ‖x − x_k‖ ≤ Δ_k}, where x_k is the current iterate and Δ_k is referred to as the trust-region radius.

A trial step, s_k, is obtained by minimizing the surrogate model, m_k, in the trust-region, B_k, i.e., by solving the subproblem (3): minimize m_k(x_k + s) subject to x_k + s ∈ B_k. This minimization problem is referred to as the trust-region subproblem. The resulting trial step, s_k, gives rise to the candidate iterate, x_k + s_k.

In every iteration, the size and position of the trust-region are updated in such a way that the generated sequence of iterates converges. The configuration of the trust-region, which is determined by the couple (x_k, Δ_k), is updated according to a fundamental scheme based on the ratio of actual to predicted decrease, ρ_k = (f(x_k) − f(x_k + s_k)) / (m_k(x_k) − m_k(x_k + s_k)) (4). The ratio ρ_k constitutes a metric for the predictive quality of the surrogate with respect to the current trust-radius [35].

Based on the value of ρ_k, the trust region is updated according to the following general rules. If the prediction of the surrogate is good, the radius is increased. In case of a bad prediction, the radius is reduced. For other cases, the current radius is simply maintained.

To regulate the expansion and contraction of the trust-region, one takes contraction and expansion factors 0 < γ_dec < 1 < γ_inc, together with bounds Δ_min and Δ_max, where Δ_min and Δ_max represent the minimal and maximal radii, respectively. The value of Δ_{k+1} is then updated as follows: Δ_{k+1} = min(γ_inc Δ_k, Δ_max) in case of a good prediction, Δ_{k+1} = γ_dec Δ_k in case of a bad prediction, and Δ_{k+1} = Δ_k otherwise.

To decide whether to accept or reject a trial step, s_k, we adopt the rules from [35]. If the candidate iterate, x_k + s_k, is not accepted, the trust-region radius shrinks, given that ρ_k was then necessarily below the acceptance threshold.

The generic trust-region algorithm is presented in Algorithm 1.

(1) choose x_0, Δ_0, and the remaining algorithm constants
(2) for k = 0, 1, 2, … until convergence do
(3)  obtain model m_k
(4)  solve sub-problem (3) to obtain s_k
(5)  evaluate f(x_k + s_k) and compute ρ_k through (4)
(6)  update Δ_{k+1} and x_{k+1} according to (5) and (6)
(7) end for
(8) return x_k
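For illustration, the generic loop can be sketched compactly in code. The sketch below is ours: the surrogate construction is left abstract (in this paper it is the RBF interpolation model of Section 3), the subproblem is solved crudely with a Cauchy step, and all thresholds and factors are illustrative placeholders rather than the settings used in our implementation.

```python
import numpy as np

def cauchy_step(g, H, delta):
    # Cauchy point of the quadratic model m(s) = g's + 0.5 s'Hs within ||s|| <= delta
    gn = np.linalg.norm(g)
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0 else min(1.0, gn**3 / (delta * gHg))
    return -tau * (delta / gn) * g

def trust_region(f, build_model, x0, delta0=1.0, delta_max=10.0,
                 eta1=0.25, eta2=0.75, gamma_dec=0.5, gamma_inc=2.0,
                 max_iter=200, tol=1e-8):
    """Generic loop of Algorithm 1.  build_model(x, delta) must return (g, H),
    the gradient and Hessian of the local surrogate m_k at the centre x."""
    x, delta = np.asarray(x0, dtype=float), float(delta0)
    for _ in range(max_iter):
        g, H = build_model(x, delta)
        if np.linalg.norm(g) < tol:
            break
        s = cauchy_step(g, H, delta)                 # crude solution of subproblem (3)
        pred = -(g @ s + 0.5 * s @ H @ s)            # predicted decrease of the surrogate
        rho = (f(x) - f(x + s)) / pred               # ratio of actual to predicted decrease (4)
        if rho >= eta1:                              # accept the trial step
            x = x + s
        if rho >= eta2:                              # good prediction: expand the radius
            delta = min(gamma_inc * delta, delta_max)
        elif rho < eta1:                             # poor prediction: contract the radius
            delta = gamma_dec * delta
    return x
```

With build_model returning exact derivatives, this reduces to a classical gradient-based trust-region method; substituting the interpolation model of Section 3 together with the set management of Section 4 yields the derivative-free variant studied here.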
2.2. Trust-Regions and Fully Linear Models

In the gradient-based trust-region method, the surrogate, m_k, is provided by a quadratic model of the following form (7): m_k(x_k + s) = f(x_k) + ∇f(x_k)^T s + (1/2) s^T ∇²f(x_k) s, where ∇f(x_k) and ∇²f(x_k) are the evaluations of the first- and second-order derivatives of the objective function at the iterate x_k, respectively. According to Taylor’s theorem, a region exists within the vicinity of x_k where the assumption that model m_k is a reliable approximation of f is true. A region of reliable approximation can be defined as a region where the surrogate modeling approximation error, reflected by |f(x) − m_k(x)|, is small enough such that a descent in the surrogate model m_k restricted to this region corresponds to a descent of the objective function f. It is apparent that the strength of the trust-region method thus depends on the capacity of the updating rules to maintain a region where the surrogate modeling approximation error is sufficiently bounded, so that m_k is suited for optimization purposes [44, 45].

From that perspective, the trust-region method can be decomposed into two crucial mechanisms that interact to generate the sequence of iterates. A procedure must be provided that can generate a quadratic approximation model of the objective function for any given iterate x_k. The updating rules will then maintain a series of regions where the corresponding and consecutive models, m_k, are suited to generate the descending sequence of iterates as solutions of subproblem (3). The latter mechanism is in that respect generic and can be employed to maintain a series of trust-regions given any type of model. Therefore, the mechanism extends from quadratic models to the interpolation-based, and thus derivative-free, models that will be used here.

In this work, we are interested in the class of so-called fully linear models. The definition of a fully linear model establishes Taylor-like bounds on arbitrary models, and therefore, it generalizes the notion of a reliable approximation. As a consequence, the trust-region framework can be generalized to include such fully linear models.

We adopt the following definition from [39, 40].

Definition 1. For a given Δ > 0 and a given x_k, defining B_k = {x : ‖x − x_k‖ ≤ Δ}, a surrogate model m_k is said to be fully linear on B_k if there exist constants κ_f > 0 and κ_g > 0 such that, for all x ∈ B_k, |f(x) − m_k(x)| ≤ κ_f Δ² and ‖∇f(x) − ∇m_k(x)‖ ≤ κ_g Δ (8).

Let us assume two different procedures: one that can generate a fully linear surrogate model on B_k and one that is able to verify whether a given surrogate model, m_k, is fully linear on B_k, for any given x_k and Δ. If two such procedures are available, Algorithm 1 can be modified to construct a trust-region algorithm with fully linear models that will converge to a first-order critical point of the function f [39]. Algorithm 2 presents such a trust-region algorithm using fully linear models.
The main modification is the introduction of the termination criterion on the surrogate gradient, ‖∇m_k(x_k)‖ ≤ ε. This condition is a reformulation of the first-order optimality condition on the objective function, ∇f(x_k) = 0. Assuming that m_k is fully linear on B_k, we have the following bound (9): ‖∇f(x_k)‖ ≤ ‖∇f(x_k) − ∇m_k(x_k)‖ + ‖∇m_k(x_k)‖ ≤ κ_g Δ_k + ‖∇m_k(x_k)‖. Algorithm 2 assures that, upon termination, model m_k will be fully linear on a trust region with a sufficiently small radius. Substituting the termination criterion in (9) reveals that this is equivalent to satisfying the first-order optimality criterion, ‖∇f(x_k)‖ ≈ 0, up to a controlled tolerance. We remark that it is thus fundamental to provide procedures that can guarantee the tightest possible bounds on κ_f and κ_g. In Section 4, such procedures are developed, considering that model m_k is an interpolation model.

(1) choose x_0, Δ_0, and ε > 0
(2) for k = 0, 1, 2, … do
(3)  construct fully linear model m_k valid in B_k
(4) if ‖∇m_k(x_k)‖ ≤ ε then
(5)   if m_k is fully linear in B_k then
(6)    stop
(7)   else
(8)    set x_{k+1} = x_k and reduce Δ_{k+1} < Δ_k
(9)  end if
(10) else
(11)  solve sub-problem (3) to obtain s_k
(12)  evaluate f(x_k + s_k) and compute ρ_k through (4)
(13)  update Δ_{k+1} and x_{k+1} according to (5) and (6)
(14) end if
(15) end for
(16) return x_k
2.3. Interpolation Models

When first-order derivative information, ∇f(x_k), is not directly available, it is possible to construct a surrogate model based on strategically generated data, i.e., direct evaluations of the objective function. A model m_k is denoted an interpolation model if it satisfies the interpolation conditions in (10), m_k(y_i) = f(y_i), for a given input set Y = {y_1, …, y_m} of interpolation points and the corresponding output set {f(y_1), …, f(y_m)}.

A quadratic model, as in (7), is capable of modeling the curvature of the underlying function and therefore makes it possible to prove global convergence to second-order critical points [39]. Correspondingly, several DFO methods have been developed that operate with quadratic interpolation models [34, 46, 47]. To determine a quadratic model, the number of interpolation points, m, must equal or exceed (n + 1)(n + 2)/2, thus requiring at least that number of objective function evaluations. This requirement conflicts with our intention to limit the number of objective function evaluations.

A linear model can be constructed with as few as n + 1 interpolation points. Algorithms that operate with minimal interpolation sets of n + 1 points are described in [31, 32, 48]. However, the drawback, in this case, is that a linear model is not capable of representing any curvature present in the objective.

In this work, we impose the additional requirement that the minimal number of interpolation points must equal n + 2 because it is then guaranteed that the resulting model can represent curvature, given a proper interpolation framework (see Section 3). We will refer to such models as better-than-linear models, noting that these models classify as fully linear models but are constructed with at least one more interpolation point than the minimum of n + 1 required for exact linear models.

2.4. Interpolation-Based Optimization

Anticipating that a procedure is available that can generate an interpolation model, m_k, that classifies as a fully linear model, an interpolation framework is trivially embedded in Algorithm 2.

In Section 3, we will discuss a multivariate interpolation method that can model curvature and is flexible in the sense that it does not impose strict requirements on the number of interpolation points. Within the context of this framework, it will be shown that an interpolation model can be classified as a fully linear model if the interpolation set Y satisfies conditions on the geometry of the points with respect to the trust-region [40]. The geometry can be understood as the scattering of interpolation points inside a specified region. In Section 4, the geometry of an interpolation set is quantified using a novel geometry metric.

The procedures of verifying and generating fully linear interpolation models thus boil down to verifying whether the geometric conditions for a fully linear model are satisfied for a given interpolation set and to complementing an existing, or possibly empty, set so that the condition is satisfied by construction. In the context of IBO, these two mechanisms are provided by a single procedure that is referred to as the interpolation set management or the sampling strategy. A sampling strategy for minimal interpolation sets of n + 2 points is proposed in Section 4.

3. Radial Basis Functions

Algorithm 2 makes it possible to engage the trust-region globalization scheme whilst substituting interpolation models (10) for the surrogate model, m_k. It requires a flexible, multivariate interpolation modeling framework that can capture curvature and does not impose strict conditions on the number of interpolation points, since we will recycle interpolation points from previous iterations. The radial basis function (RBF) interpolation framework has readily shown its utility within IBO [31–33]. The traditional RBF framework decides arbitrarily on the shape of the modeling RBFs by fixing a shape parameter a priori. Alternatively, one can determine the shape parameter based on the interpolation data. In that case, the RBF framework coincides with universal kriging [36], where the shape parameter is referred to as a hyperparameter. The modeling assumptions of the framework and the determination of the model parameters and hyperparameters are discussed next.

3.1. Interpolation Framework

Consider a given set of interpolation points Y = {y_1, …, y_m} ⊂ R^n and corresponding objective function evaluations f(y_1), …, f(y_m). The RBF multivariate interpolation framework then presumes a function model of the following form (11): m(x) = Σ_{i=1}^{m} λ_i φ(‖x − y_i‖) + p(x), where φ is an element of the class of RBFs and p is an element of an n-variate polynomial space of at most degree d, referred to as the polynomial tail. Let {π_1, …, π_q} be an n-variate basis for this space, with q being the size of this basis. Then, the polynomial tail can be represented by a linear combination of this basis, i.e., p(x) = Σ_{j=1}^{q} c_j π_j(x), with c_j being the expansion coefficients. Some popular choices for the RBF φ are listed in Table 1. Notice that the function φ(‖x − y_i‖) has isosurfaces coinciding with hyperspheres in R^n, i.e., the functions radiate out isotropically.

By introducing the vectors λ = (λ_1, …, λ_m)^T and c = (c_1, …, c_q)^T, together with the vectors Φ(x) and π(x), whose entries are given by φ(‖x − y_i‖) and π_j(x), respectively, we can reformulate the RBF model as m(x) = λ^T Φ(x) + c^T π(x) (12).
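As a concrete illustration of (11) and (12), the sketch below evaluates such a model with a Gaussian kernel (one of the choices of Table 1) and a linear polynomial tail. The variable names lam (λ), c, and the shape parameter eps are ours, not the paper's notation.

```python
import numpy as np

def phi(r, eps=1.0):
    # Gaussian RBF, one of the kernels listed in Table 1
    return np.exp(-(eps * r) ** 2)

def rbf_model(x, centers, lam, c, eps=1.0):
    """Evaluate m(x) = sum_i lam_i * phi(||x - y_i||) + c[0] + c[1:]'x."""
    r = np.linalg.norm(centers - x, axis=1)   # distances to the interpolation points y_i
    return lam @ phi(r, eps) + c[0] + c[1:] @ x
```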

3.2. Determination of Model Parameters

The RBF function model (12) is uniquely defined by the parameters λ and c. These parameters can be determined as follows:

Let the Vandermonde matrices Φ and P be defined as Φ_ij = φ(‖y_i − y_j‖) and P_ij = π_j(y_i). The set of interpolation conditions in (10) can then be written as Φλ + Pc = f (13), where the entries of the vector f are defined as the function evaluations contained in the output set.

These conditions do not render λ and c unique. Therefore, additional conditions are required to complement (13) so that the assembled system matrix is nonsingular. To this end, we require that the model is the smoothest model of the form described by (12) that complies with the interpolation condition (10). For RBFs, function smoothness can be defined in relation to the concept of conditional positive definiteness (CPD) [37]. RBFs have the specific property that to any RBF φ one can associate a value d_φ such that, if the maximum degree of the polynomial tail satisfies d ≥ d_φ, the matrix Φ will be positive definite with respect to all vectors λ satisfying P^T λ = 0. Equivalently, we have that if P^T λ = 0 and λ ≠ 0, it holds that λ^T Φ λ > 0. In conclusion, when CPD is satisfied, i.e., d ≥ d_φ, it can be shown that the corresponding interpolator is the smoothest function of form (12) satisfying (10). For an elaborate proof and additional details on CPD, we refer to [37].

Formally, the matrix representation of the interpolation conditions (13) can be complemented by the smoothness condition P^T λ = 0 to obtain the linear system of equations (14), a saddle-point system in the unknowns λ and c.

The model parameters are determined uniquely if the system matrix of (14) is nonsingular. It is easily verified that, given the CPD property, it is then sufficient that P has full column rank. If so, the interpolation set is said to be well-poised for interpolation. In Section 4, it is discussed how this well-poisedness corresponds with the geometrical properties of the points in Y.

Based on the previous observation, the explicit solution of the model parameters is as follows: c = (P^T Φ^{-1} P)^{-1} P^T Φ^{-1} f and λ = Φ^{-1}(f − Pc). This result may also be obtained using the universal kriging (UK) framework. Here, a predictor that is linear in the observed function values is considered. The objective function is modeled as the realization of a polynomial tail and a random process. The process’ stochasticity is quantified by a spatial correlation function that models the output correlation as a function of the spatial distance between the input values. The correlation is thus expressed in terms of the RBF φ. It can be shown, e.g., [36, 38], that the optimal linear unbiased predictor is then determined by a system of equations with the same saddle-point structure as (14), in which the tail coefficients can be interpreted as Lagrangian multipliers.
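A minimal sketch of this parameter determination, assuming a Gaussian kernel and a linear tail, is given below: the interpolation conditions (13) are stacked with the orthogonality condition on the RBF weights into the saddle-point system (14) and solved directly. Variable names are illustrative; the paper's implementation obtains the equivalent universal-kriging solution through the DACE toolbox.

```python
import numpy as np

def fit_rbf(Y, fvals, eps=1.0):
    """Y: m x n array of interpolation points (one per row); fvals: length-m outputs.
    Returns the RBF weights lam and the linear-tail coefficients c."""
    m, n = Y.shape
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)    # pairwise distances
    Phi = np.exp(-(eps * D) ** 2)                                # kernel matrix
    P = np.hstack([np.ones((m, 1)), Y])                          # linear tail basis [1, x]
    A = np.block([[Phi, P], [P.T, np.zeros((n + 1, n + 1))]])    # saddle-point system (14)
    rhs = np.concatenate([fvals, np.zeros(n + 1)])
    sol = np.linalg.solve(A, rhs)                                # requires P of full column rank
    return sol[:m], sol[m:]                                      # lam, c
```

The returned coefficients can be used with the model evaluation sketched in Section 3.1; the well-poisedness of the interpolation set discussed in Section 4 is what guarantees that the system is nonsingular.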

For additional details, we refer to appendix A.

3.3. Hyperparameters

The value of the shape parameter (see Table 1) is arbitrary and affects the receding radial effect of the RBF contributions in (12). Heuristics may be provided; however, they usually lack a proper theoretical foundation. A suitable value is mostly determined by trial and error. Furthermore, we emphasize that each component of the interpolation points contributes equally to the distance metric ‖x − y_i‖, i.e., the model presumes an isotropic dependence of the objective function on all of its variables. Such isotropic behavior can be achieved by normalizing the optimization space by performing a linear transformation of the input space. Nonetheless, such a measure cannot account for any local deviations from the global trend. Since we are to construct local interpolation models whose validity is not to stretch beyond the limits of the trust-region boundary, we could redefine this linear transformation locally.

To account for local anisotropy in the principal directions of the input space, one may define the distance metric as in (17), assigning a separate weight to each input direction. The RBFs in (12) are then redefined in terms of this weighted distance. It is clear that, in this case, the univariate hyperparameter has become redundant. In practical implementations of the UK framework, the determination of the hyperparameters is automated using a maximum log-likelihood estimate based on the observations.

We refer to appendix A and [49] for elaborate details.
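A small illustration of such an anisotropic kernel follows; the weighted Gaussian form and the name theta are our assumptions, the exact metric being the one defined in (17).

```python
import numpy as np

def phi_aniso(x, y, theta):
    # Gaussian RBF with a per-direction weight theta_j instead of a single shape parameter
    d = np.asarray(x) - np.asarray(y)
    return np.exp(-np.sum(theta * d ** 2))
```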

4. Interpolation Set Management

In Section 2, we described a trust-region method with fully linear models. It was demonstrated how a sequence of fully linear models can be utilized to converge to a first-order critical point of the objective function f. When considering interpolation models satisfying (10), the question whether m_k is a fully linear model boils down to a proper management of the interpolation set used to construct it. This so-called sampling strategy should maintain a proper geometry of the interpolation set.

In [40], the geometry of an arbitrary interpolation set was quantified by a single set-dependent constant directly related to the conditions in (8) that define a fully linear model. The lemma in [40], however, only applies to interpolation models for which the set size equals n + 1 exactly. On this basis, the authors in [31] devised an IBO algorithm, employing RBF interpolation models, given at least n + 1 interpolation points to construct a fully linear model. By applying a QR pivoting strategy directly related to the measure described by [40], their sampling strategy assures that a subset of n + 1 interpolation points (amongst the entire available data set) exhibits the required geometry to guarantee a fully linear interpolation model. Possibly, additional samples need to be generated to allow the verification of the full linearity of the interpolation model. We henceforth refer to these points, used to verify that m_k is fully linear, as the affine subset. Additional points from the entire data set are subsequently recycled to enhance the modeling capabilities of the RBF model as long as the well-poisedness of (14) is not corrupted.

Having an interpolation model that represents curvature requires an affine subset of n + 2 points (see Section 3.2) that is adequately embedded in an interpolation-based trust-region algorithm. In other words, we aim to ensure the convergence of Algorithm 2 to second-order critical points whilst only adding a single point to the affine subset with respect to [31, 32]. Since an estimate of the curvature is now always available, the remark on the RBF hyperparameters in Section 3.3 can be generalized to arbitrary directions. The latter is investigated in Section 5.

In Section 4.1, we introduce a generalized version of the theorem in [40] for interpolation sets with a size exceeding n + 1. On that basis, we identify the optimal affine subset geometry that corresponds with n + 2 interpolation points in Section 4.2. Finally, in Section 4.3, we propose a procedure capable of maintaining an interpolation set with an affine subset of n + 2 points so that the interpolation model, m_k, is fully linear and exhibits curvature.

4.1. Better-Than-Linear Models

The uniqueness of the RBF interpolation model parameters required well-poisedness, i.e., full column rank, of the matrix P. Thus, the size of the interpolation set must equal or exceed the size of the polynomial basis. Given that we address a minimal interpolation set size of n + 2, the polynomial tail will be linear, or equivalently, d = 1. A convenient choice for the basis is then clearly {1, x_1, …, x_n}. Since we will embed the interpolation framework in a trust-region framework, the current iterate x_k is always available for interpolation, since f(x_k) is evaluated for calculating the predictive quality of the interpolation model (4). Henceforth, we will consider a trust-region centered at the origin, x_k = 0, by performing a shift of coordinates each time. Furthermore, we require that the origin is always an element of the interpolation set Y.

Introducing the interpolation set Vandermonde matrix Y, whose rows are the interpolation points, matrix P can be written as P = [1 Y], where 1 denotes a column of ones.

As mentioned in [40], the conditions (8) were related to the interpolation set geometry, and more specifically, to the interpolation matrix introduced above. Here, a generalized lemma is presented, establishing similar Taylor-like conditions for better-than-linear models, i.e., when the interpolation set contains more than n + 1 points. A derivation is included in appendix B.

Lemma 1. Suppose that m is an RBF model with a linear tail and an RBF contribution that satisfies the interpolation conditions (10) for the interpolation set Y, where the origin is an element of Y and the set size is at least n + 2. Furthermore, suppose that f and the RBF contribution are continuously differentiable in the hypersphere B = {x : ‖x‖ ≤ Δ} and that their gradients are Lipschitz continuous in B, with respective Lipschitz constants. Then, two inequalities of the form (8), bounding |f(x) − m(x)| and ‖∇f(x) − ∇m(x)‖ in proportion to Δ² and Δ, respectively, are satisfied for any x ∈ B.

These conditions are similar to (8). The lemma provides, therefore, an admittedly conservative estimate of the constants κ_f and κ_g; however, foremost, it reveals the crucial role of the interpolation set geometry, which is reflected by the value of the associated geometry constant.

It follows that our interpolation set management must ensure that this geometry constant remains bounded from below by an a priori fixed value at each iteration.

4.2. Optimal Critical Set Geometry

The value of this geometry constant is understood as a measure of the algebraic well-poisedness of a given set within the trust region [40]. For any given number of interpolation points, we can define the optimal critical set geometry as the one that corresponds with the largest value of the constant and, hence, the tightest bounds on κ_f and κ_g. Assuring the set’s well-poisedness then reduces to generating an affine subset that resembles the optimal set geometry as closely as possible whilst recycling as many points as possible. This strategy balances the requirement to minimize the total amount of function evaluations without corrupting the model quality. Figure 1 depicts optimal geometries for n = 2 dimensions, maximizing the measure for affine subsets of size n, n + 1, and 2n, respectively. The optimal set geometry for n points complies with the normalized set being orthonormal with respect to the origin, or equivalently, with the set coinciding with a standard simplex. This set geometry is pursued by QR pivoting strategies as elaborated in [31] and Lagrange or Newton polynomials-based strategies as elaborated in [32, 40]. The optimal set geometry for n + 1 points corresponds with the regular simplex geometry (plus the origin) [19, 50] and that for 2n points with that of an orthoplex (plus the origin) [51]. Orthoplex geometry has been recommended by Powell in [34]. This geometry also determines the fixed search directions in many pattern search optimization methods.

In this context, it is relevant to inspect the spatial distribution of the 2D geometries in Figure 1. The optimal set geometries for n + 1 and 2n points are unoriented: the centers of mass of the corresponding sets coincide with the origin. This does not apply for n points, where the set is biased. Note that the above is valid for arbitrary n. To quantify this more rigorously, we introduce a neighborhood representation metric, defined as follows.

Its value is equal to the maximum mean distance by which any test point in the trust region can be removed from the modelling set (including the origin). The value gives an indication of the worst-case contribution that the radial basis term (which is directly related to the distance between the test point and the interpolation points) has compared to the linear tail. In Table 2, we compare the values of this metric for the different optimal set geometries. It can be observed that the value can be improved by adding only a single point, referring to the shift from the standard to the regular simplex geometry. The contours of the mean distance are presented in Figure 1, allowing one to visualize the regions that are not represented well by the optimal set geometries. For n → ∞, the metric tends to the same limit for all three given configurations, implying that all are equally good, or equally bad, for infinite-dimensional problems.

The discussion based on this metric provides an additional argument to choose the regular simplex over the configuration in Figure 1(a), certainly for low-dimensional problems.
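As a numerical illustration of the neighborhood representation metric, read here as the largest mean distance between a test point in the unit trust region and the modelling set per the description above (the formal definition is the one given in this subsection), a crude Monte-Carlo estimate can be computed as follows.

```python
import numpy as np

def neighborhood_metric(Y, n_samples=20000, seed=0):
    """Y: modelling set including the centre, one point per row, scaled to the unit ball.
    Returns an estimate of max_{x in unit ball} mean_i ||x - y_i|| by random sampling."""
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    X = rng.normal(size=(n_samples, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)                  # directions on the sphere
    X *= rng.uniform(size=(n_samples, 1)) ** (1.0 / n)             # uniform radii in the ball
    dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # test points vs. set
    return dists.mean(axis=1).max()
```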

4.3. Regular Simplex Affine Subsets

According to the previous subsection, the optimal affine subset (without the origin) for n + 1 points is a regular simplex. The origin remains an element of the affine subset, which affects the definition of a regular simplex affine subset based on the remaining interpolation points.

Formally, the set of all regular simplices lying on the trust-region surface is given by (21), in which the scaling factor (see appendix C) depends on the radius of the n-sphere (here 1).

The geometry of the regular simplex can be related to the norm of the interpolation matrix by examining the hypervolume spanned by the convex hull of the affine subset. Consider, therefore, the hypervolume of an arbitrary set, defined in (22) in terms of its spanning vectors. As outlined in [50, 52], the regular simplex has volumetric extremity properties and maximizes the set well-poisedness metric.

Hence, the algebraic problem of maximizing the well-poisedness metric coincides with the geometric problem of maximizing the volume of the convex hull of the affine subset. Approaching the problem from this point of view helps to devise an intuitive interpolation set management, resulting in a critical subset that is (close to) an element of the set of regular simplices, whilst recycling interpolation points that were evaluated in previous iterations.

Based on the hypervolume of the affine subset, a metric can be construed to decide whether the subset matches a regular simplex sufficiently closely. The principal idea is that its hypervolume should be close to the theoretical maximal hypervolume of the regular simplex. In particular, we require a hypervolume that is not smaller than the maximal hypervolume multiplied by some predefined relative error factor. If this condition is not satisfied, new interpolation points need to be generated. This procedure guarantees a lower bound on the geometry constant.

Before we elaborate the particularities of our set management, we mention that each subset of the points of a regular simplex is a lower-order regular simplex itself, where the dimension of the regular subsimplex is determined by the number of points retained; the corresponding set is defined in (23). This statement holds since any subset of a regular simplex simply inherits the property that the distance between every two points is the same.
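The two geometric ingredients used below, a regular n-simplex inscribed in the unit sphere and the hypervolume test against a relative tolerance, can be sketched as follows. The construction, the helper names, and the tolerance name rel_tol are our own; the hypervolume here is taken relative to the first vertex, whereas (22) may use a different but equivalent choice of spanning vectors, and the exact scaling factors are those of appendix C.

```python
import numpy as np
from math import factorial

def regular_simplex(n):
    """Vertices of a regular n-simplex inscribed in the unit sphere of R^n ((n+1) x n array)."""
    V = np.eye(n + 1) - 1.0 / (n + 1)                   # centred canonical vertices in R^(n+1)
    U = np.linalg.svd(np.ones((n + 1, 1)), full_matrices=True)[0]
    W = V @ U[:, 1:]                                    # coordinates in the hyperplane sum(x) = 0
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def simplex_volume(V):
    """Hypervolume of the simplex with vertex rows V: |det([v_2-v_1, ..., v_{n+1}-v_1])| / n!."""
    E = V[1:] - V[0]
    return abs(np.linalg.det(E)) / factorial(E.shape[0])

def is_well_poised(Y_affine, rel_tol=0.5):
    """Volume-based acceptance test: accept the affine subset (scaled to the unit trust region)
    if its hypervolume is at least rel_tol times that of the ideal regular simplex."""
    n = Y_affine.shape[1]
    return simplex_volume(Y_affine) >= rel_tol * simplex_volume(regular_simplex(n))
```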

4.3.1. Algorithm Expand2volume

Based on the previous idea, Algorithm 3 constructs an affine subset in a systematic manner so that the resulting model classifies as a fully linear model for a given trust region. The goal is to attain an affine subset with a sufficient hypervolume whilst adding as few points as possible to the available set of interpolation points. The input is the set of all interpolation points available, scaled with respect to the trust radius and unbiased with respect to the center, i.e., shifted so that the current iterate coincides with the origin. This set incorporates every point evaluated so far. The current origin is excluded such that we remain with only the candidate interpolation points to form the affine subset.

The algorithm embarks on removing all points that are outside a certain periphery, parametrized by a periphery variable. Note that we artificially enlarge the trust-region since we allow this parameter to be greater than 1. We can always assure that the interpolation model is fully linear by redefining the radius and the associated error constants accordingly. This ensures that, in practice, the number of interpolation points actually accepted to be part of the affine subset is slightly increased, improving the overall convergence. Otherwise, the algorithm might end up expanding an empty set too regularly (certainly for larger periphery values). If this operation yields a subset that is still larger than the required affine subset size of n + 1 (which, in practice, will almost never occur), the points that are furthest away from the trust-surface are additionally removed.

If the size of the resulting subset is smaller than n + 1, it is expanded by calling Algorithm 4. The procedure exploits the idea that, if the set were a lower-order regular simplex, we could expand it with another lower-order regular simplex so that their union constitutes a full regular simplex.

Our numerical implementation proceeds as follows: a lower-order regular simplex of the appropriate dimension is determined, embedded in the n-dimensional workspace. A direction is determined as the orthogonal projection direction of the origin onto the subspace spanned by the vertices of the existing subset. Subsequently, a rotation matrix is determined so that the basis spanned by the rotated element is orthogonal to the union of the basis spanned by the existing subset and this projection direction. In a final step, the rotated element is scaled and translated along the projection direction over an appropriate distance. The scaling and translation factors are explained in appendix C and will only maximize the volume in the occasional coincidence where the recycled points themselves already form a lower-order regular simplex on the trust-region surface. Nonetheless, this same reasoning can be applied when they do not. In that case, an additional subproblem must be solved to identify the optimal translation distance that maximizes the volume of the union, with the scaling factor adjusted accordingly. In the description of the algorithm, we make no distinction between a set and its matrix column vector representation.

We also use the following operators: a span operator that constructs a matrix containing a set of column vectors spanning the space determined by the vertices of its input set; a simplex function that outputs a set of vertices constituting a regular simplex of prescribed dimension in a coordinate space of matching dimension; and a matrix function that constructs orthonormal bases for the span and the kernel determined by the columns of the input matrix, respectively. The mechanics of Algorithm 4 are illustrated in Figure 2. Each subfigure depicts a set that differs in size from the previous one, the corresponding expansion set, the two orthonormal bases, and the projection direction.

If the union does not satisfy the criticality condition on the hypervolume, an iterative procedure is initiated, where points are removed from the set systematically. The contracted set can then be expanded by applying the expansion procedure again. To avoid that expansion points are generated too close to a point that was removed in an earlier iteration, information about a removed point is stored by transferring it to a memory set. The mean of this memory set is used to complement the points that remain before expanding the set with a new regular subsimplex. In this fashion, the added simplex is generated away from the mass center of the original set.

The algorithm ends after a finite number of iterations, at most the size of the original set, i.e., when the entire original set has been transferred to the memory set. We remark that, in this case, the mass center is expanded with a regular simplex that is, however, not necessarily an element of the set of regular simplices on the trust-region surface. As a consequence, the eventual affine subset need not be an element of that set. As this does not ensure that the sufficiency condition on the hypervolume is satisfied, the algorithm closes with a final verification of the hypervolume and a possible expansion of the set.

Input:
Output:
(1)obtain with elements such that
(2)define
(3)if then
(4) order such that
(5) obtain
(6)else if then
(7) get
(8)end if
(9)find initial memory
(10)define memory set
(11)while or do
(12) find
(13) contract
(14) redefine memory set
(15) expand
(16)end while
(17)if then
(18)
(19)end if
(20)return
Input:
Output:
(1)
(2)
(3), ,
(4)
(5)
(6)
(7)
(8)return
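The core mechanic of Algorithm 4, completing a partial point set with a lower-order regular simplex placed in the orthogonal complement of its span, can be sketched as follows. This is a heavily simplified illustration of our own: the scaling and translation factors of appendix C are replaced by a plain unit-radius placement, the memory-set steering is omitted, and no claim is made that the result maximizes the hypervolume criterion.

```python
import numpy as np

def regular_simplex(n):
    # regular n-simplex on the unit sphere, as in the sketch of Section 4.3
    V = np.eye(n + 1) - 1.0 / (n + 1)
    U = np.linalg.svd(np.ones((n + 1, 1)), full_matrices=True)[0]
    W = V @ U[:, 1:]
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def expand_affine_subset(Y):
    """Y: k x n array of recycled points (origin excluded, unit trust region, 1 <= k <= n,
    assumed linearly independent).  Returns n+1 points spanning the full space."""
    k, n = Y.shape
    U = np.linalg.svd(Y.T, full_matrices=True)[0]      # orthonormal basis of R^n
    kernel = U[:, k:]                                  # directions not yet covered by span(Y)
    if k == n:                                         # one point missing: step away from the set
        d = -Y.mean(axis=0)
        new = (d / np.linalg.norm(d))[None, :]
    else:
        S = regular_simplex(n - k)                     # (n-k+1) vertices in R^(n-k)
        new = S @ kernel.T                             # embed them in the kernel directions
    return np.vstack([Y, new])                         # k + (n-k+1) = n+1 points in total
```

In the actual algorithm, the added simplex is additionally translated along the projection direction of the origin and rescaled, and the memory set steers it away from previously discarded points.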
4.4. Remarks

With respect to Algorithms 3 and 4 described above, we add the following: during the initialization, all points that are furthest away from the trust-surface are removed. This is a heuristic. Alternatively, one might run through a more complex scheme, where every possible subset of the required size is verified with respect to its hypervolume and the largest is selected. These points could also be added to the memory set. However, this is a situation that will almost never occur in practice, partially because the optimization algorithm will guide the iterations to unexplored regions of the parameter space, and partially because the trust region will contract when the sequence of iterates starts to converge. Hence, by a proper choice of the trust-region and periphery parameters, it can be avoided that the complete interpolation set of the previous iteration lies within the current periphery.

Furthermore, we remark that the alignment of the kernel associated with the space spanned by the union and the space spanned by the regular simplex is underdetermined. Here, this additional degree of freedom remains unexploited. Alternatively, this degree of freedom could be employed by orienting the added simplex so that the generated points are located furthest away from the available set of interpolation points or such that the added points correspond to a mean minimal predicted objective function value.

5. Ellipsoidal Trust-Regions

The procedures described in the previous two sections can be employed to verify and construct a model, m_k, that satisfies Definition 1 for a fully linear model in a given trust region. Hence, these procedures can be used to embed an interpolation framework in the trust-region algorithm with fully linear models described by Algorithm 2, factually realizing the IBO algorithm.

So far, the geometry of the interpolation points has been considered only with regard to the input space and with respect to the trust-region defined in it. Given a spherical trust-region, our input space-oriented sampling strategy will treat every direction equally in favor of a well-conditioned system matrix (14). However, from the perspective of optimization, it might be of interest to account for the geometry of the objective function as well.

The gradient of the objective function will be more sensitive to certain search directions than to others, as determined by the local curvature of the objective function. Currently, however, the set management treats all directions equally (as a result of the spherical shape of the trust-region) and will try to homogenize the distances from the trust-region center and between the interpolation points. Now, as a result of the local curvature, the corresponding function values may vary either faster or slower than can be expected when solely considering the local objective gradient. Hence, the geometry of the objective function with respect to the arbitrary choice of the optimization space coordinates is reflected by its local curvature.

Therefore, we propose to execute the set management in a transformed space by introducing an auxiliary optimization parameter u. The goal of this transformation is to obtain a normalized curvature of the objective function in the transformed optimization space. We propose to use an adaptive affine mapping from the transformed space to the original space that changes with the trust-region iterations. Notice that the origin in the auxiliary space corresponds with the trust-region center in the original space.

The trust-region will be stretched along those directions where the local gradient varies slowly and shrunk in those directions where the local gradient varies rapidly. As a consequence, the objective function values evaluated along the trust-region boundary will increase or decrease in equal amounts with respect to the objective function value at the trust-region center. This will have a positive effect on the hyperparameter tuning described in Section 3.3, since the principal directions are now homogenized.

Another consequence is that a spherical trust-region in the auxiliary space will now have an ellipsoidal shape in the original optimization space, as illustrated by Figures 3(a) and 3(b). As a result, a step of length 1 taken in the auxiliary space may, when represented in the original space, exceed the original spherical trust-region radius. Two situations can be distinguished when inspecting the directional curvature. When the directional curvature is large, on the one hand, the ellipsoidal geometry will have a stabilizing effect, considering that the set of admissible steps (i.e., the trust-region) becomes less aggressive in terms of large jumps in the objective function, which will also have an indirect effect on the modelling procedure. When the directional curvature is small, on the other hand, the ellipsoidal trust region will allow the optimization procedure to take large steps in the direction of a slowly varying gradient. In this manner, the original optimization space will be explored more rapidly. For instance, in Figure 3(b), the trust-region in the auxiliary space encloses a larger part of the inner contour that contains the actual minimum. Similar ideas to exploit curvature information have been used in the context of pattern search algorithms; however, to the best of our knowledge, not to reshape the trust-region geometry in the context of IBO.

5.1. Curvature Normalization

The goal of the coordinate transformation is to normalize the local curvature of the problem in the new coordinate space. This condition can be expressed mathematically as in (24). For reasons that will become apparent, we introduce an additional scaling factor γ.

Assume that we now have an estimate H for the Hessian of the objective. We can formulate an expression for the transformation by considering the singular value decomposition (SVD) of this matrix. The SVD of the positive definite matrix H is given by H = U Σ U^T. Here, U is an orthonormal matrix and Σ is a diagonal matrix with the singular values σ_i on its main diagonal. It is then easily verified that the combination U Σ^{−1/2}, up to the scaling factor γ, normalizes the curvature as required by (24).

Finally, we will utilize the scaling factor γ to keep subproblem (3), as it manifests in the auxiliary coordinate space, proportional to its manifestation in the original coordinate space. Mathematically, this proportionality can be realized by determining a measure that remains invariant under the transformation. Recall that, as a result of the affine coordinate transformation, a spherical trust-region in the auxiliary optimization space will adopt an ellipsoidal geometry in the original optimization space. We propose the following measure of proportionality: the value of γ is determined such that the spherical and ellipsoidal trust-region representations remain isovolumetric.

Consider that the volume of an n-dimensional hypersphere with radius Δ is equal to C_n Δ^n and that the volume of an n-dimensional ellipsoid with half principal axis lengths a_i is equal to C_n Π_i a_i, where C_n is a mutual scaling factor. We also have that, by construction of the transformation, the half principal axis lengths can be written in function of the singular values as a_i = γ Δ / √σ_i. An isovolumetric relation between the trust-region representations in both the original and the auxiliary coordinate space can be maintained by a proper choice of the scaling factor, i.e., γ = (Π_i σ_i)^{1/(2n)}.

The exact coordinate transformation of (24) can then be written explicitly as x = x_k + γ U Σ^{−1/2} u.
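A sketch of the resulting transformation follows, assuming a symmetric positive-definite Hessian estimate H (indefinite estimates would have to be regularized first, which is not shown): the columns of U are scaled by 1/√σ_i to normalize the curvature, and the common factor γ keeps the ellipsoidal image of the unit ball isovolumetric with the sphere.

```python
import numpy as np

def ellipsoid_transform(H):
    """Return T such that x = x_k + T u: T' H T is proportional to the identity and
    |det(T)| = 1, so the spherical trust region in u-space maps to an ellipsoid of
    equal volume in x-space."""
    U, sig, _ = np.linalg.svd(H)                        # H = U diag(sig) U'
    gamma = np.prod(sig) ** (1.0 / (2 * len(sig)))      # isovolumetric scaling factor
    return gamma * U / np.sqrt(sig)                     # columns scaled by gamma / sqrt(sigma_i)
```

The interpolation set management of Section 4 then operates on the u coordinates, while objective evaluations are performed at the mapped points x_k + T u.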

5.2. Exponential Smoothing

Several methods can be devised to provide an estimate of the Hessian H. Since our sampling strategy ensures that the local interpolation model is curved, we can calculate an approximation of the Hessian at the next iterate in the auxiliary coordinate space directly from the second-order derivatives of the interpolation model.

This estimate can be transformed back to the original coordinate space to provide an estimate for the actual Hessian H. To get a smooth approximation, we propose to use exponential smoothing of the successive estimates, as expressed in (29). Note that if the smoothing parameter in (29) is equal to 1, this framework reduces to the original spherical trust-region framework.
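A minimal sketch of the smoothing step, with an assumed weighting convention (the precise form is the one given by (29)): starting from the identity and keeping the smoothing parameter at 1 never incorporates curvature information and thus recovers the spherical framework.

```python
import numpy as np

def smooth_hessian(H_prev, H_model, theta):
    # exponential smoothing of successive Hessian estimates; theta = 1 keeps the previous one
    return theta * H_prev + (1.0 - theta) * H_model

# H = np.eye(n)                                   # spherical initialisation
# H = smooth_hessian(H, model_hessian, theta)     # after each iteration, cf. (29)
```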

6. Numerical Results

In this section, we present the numerical results of the presented derivative-free optimization algorithm applied to a set of unconstrained problems from the CUTEst test environment [53]. As indicated by the neighborhood representation metric of Section 4.2, the method should be especially suited for low-dimensional problems and was developed from this point of view. We have applied the algorithm to a set of test functions of limited problem dimension. All tested algorithms were initialized from the traditional starting point [54] and were run once, given that their execution is otherwise entirely deterministic. The following benchmark procedure has been adopted, and the results are presented in Table 3. The results obtained with 4 different versions of the present algorithm have been compared with those obtained with its closest competitors in the context of derivative-free optimization, being ORBIT [31] and NMSMAX. Since ORBIT outperforms the NEWUOA algorithm (at least for small-scale problems [31]), we did not compare our algorithm with NEWUOA [34]. The details of ORBIT are elaborated in the associated paper and are essentially similar to those of the presented algorithm when the spherical trust-region framework is used, apart from the novel regular simplex-based sampling strategy. We used the implementation made available at [55]. The NMSMAX algorithm is a frequently used implementation of the Nelder-Mead simplex method [19] and is made available in the matrix computation toolbox [56]. The NMSMAX algorithm was initialized with a regular simplex in accordance with the present algorithm and is included as a benchmark.

Each algorithm was given a budget of 1000 function evaluations. Since every test function has a known global minimum, an algorithm was considered to have converged once the difference between its best objective function value and the known minimum dropped below a fixed tolerance. The internal stopping conditions of each of the algorithms were set to 0 so that an algorithm only stops when this benchmark condition is satisfied or when it exceeds its evaluation budget. These internal stopping conditions include the gradient condition (9) and the minimal simplex size for NMSMAX. Only the total number of function evaluations and the difference between the final objective function value and the known minimum are recorded. We did not consider CPU time, nor the performance and data profiles that are gaining currency in the optimization community [57, 58]. Since the focus of the paper is primarily on the novelty of the sampling strategy and the ellipsoidal trust-region framework, we reason that the adopted approach is best suited to illustrate the elaborated methods.

Each algorithm was applied using its standard settings to mimic realistic practical usage. The only parameter that was varied was the initial trust-region radius or, in the case of the NMSMAX algorithm, the size of the initial simplex. This corresponds with an arbitrary uniform scaling of the original optimization space, which is standard practice when performing optimization routines. The standard parameter settings of the presented algorithm are given in Table 4; those of each of the competing algorithms can be found in the respective references. The parameters presented in Table 4 were chosen via trial and error on a limited benchmark set. Those corresponding to the ORBIT algorithm were adopted from that reference. For future development, a short description of the influence of each parameter is included in the table as well.

We have considered 4 different versions of the present algorithm. Details about the numerical implementation are given in the next subsection. These 4 versions allow us to study the separate effects of the two presented mechanisms, namely, the novel regular simplex-based sampling strategy and the ellipsoidal trust-region mechanism. The different versions are defined in Table 5 and are related to the values of the sampling parameter from Algorithm 3 and the smoothing parameter from (29), respectively. To study the effects of the regular simplex-based sampling strategy apart from the ellipsoidal trust-region framework, version V1 can be compared with the ORBIT and NMSMAX algorithms. Note that, in this case, the interpolation set management is purely input space-oriented. To study the effect of the ellipsoidal trust-region framework, version V2 can be compared with V1. Recall that, in this case, the shape of the trust-region is adapted to the output behavior of the objective function. To study the effect of the novel sampling strategy in combination with the ellipsoidal trust-region, versions V2 to V4 can be compared. Note that when the parameter in (29) is equal to 1, the framework corresponds with a spherical trust-region framework. It is not possible to study the behavior of the ellipsoidal trust-region framework in combination with the sampling strategy of ORBIT, given that this strategy does not assure that a better-than-linear model is available at every iteration.

6.1. Numerical Implementation

All described procedures were implemented in the MATLAB R2017 computational environment and run on a Dell laptop with a 4-core Intel i7 CPU. All code is available at the GitHub repository in [59]. Some of the subproblems were solved using available routine implementations in MATLAB. These are documented next.

As mentioned in Section 4, once an affine subset was determined using Algorithm 3, the available set of interpolation points was reconsidered to enhance the modelling capabilities of the surrogate model. Additional points were added as long as they did not corrupt the well-poisedness of (14) and as long as the total number of interpolation points did not exceed the limit . A similar procedure is performed by ORBIT and was adopted from that source code; a minimal sketch is given below.
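The following MATLAB-style sketch illustrates one way such a screening loop can be organized; the helper interpolation_matrix, the threshold kappa, and the function name are hypothetical stand-ins rather than the actual ORBIT-derived implementation.

% Sketch of the point-recycling loop (hypothetical helper names).
function Y = augment_interpolation_set(Y, Z, pmax, kappa)
% Y     - current well-poised interpolation set (one point per column)
% Z     - previously evaluated points available for recycling
% pmax  - maximum number of interpolation points
% kappa - conditioning threshold (assumed tuning parameter)
for j = 1:size(Z, 2)
    if size(Y, 2) >= pmax, break; end
    Ytrial = [Y, Z(:, j)];
    A = interpolation_matrix(Ytrial);   % placeholder for the system matrix in (14)
    if rcond(A) > 1/kappa                % accept the point only if (14) stays well-conditioned
        Y = Ytrial;
    end
end
end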

The interpolation framework used here is that of universal kriging. We employed the DACE toolbox to obtain the modelling parameters as described in Section 3. This framework requires the user to define the hyperparameter together with its lower and upper limits, and , respectively. More specifically, we used Gaussian RBFs because they generally perform well and exhibit favorable interpolation properties that result in a reasonably smooth optimization surface. Other kernels could be considered as well; however, their influence is expected to be limited.
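As an illustration, a DACE toolbox call with a Gaussian correlation model and a linear regression tail takes the following form; the initial guess and the hyperparameter bounds shown here are placeholder values, not the settings used for the reported results.

% S: m-by-n matrix of interpolation points (one per row); fS: m-by-1 vector of objective values.
n      = size(S, 2);                % problem dimension
theta0 = ones(1, n);                % initial guess for the hyperparameter (placeholder)
lob    = 1e-3 * ones(1, n);         % lower bound on the hyperparameter (placeholder)
upb    = 1e+3 * ones(1, n);         % upper bound on the hyperparameter (placeholder)
dmodel = dacefit(S, fS, @regpoly1, @corrgauss, theta0, lob, upb);
mhat   = predictor(xtrial, dmodel); % evaluate the surrogate model at a trial point xtrial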

Subproblem (3) was solved using the MATLAB function , an extension of the function made available on the MathWorks File Exchange [60]. Since absolute CPU time was not considered, the solution procedure was executed until fully converged.

6.2. Discussion of Results
6.2.1. Benchmark Problems

By assessing the results displayed in Table 3, one can observe that, on the whole, the proposed algorithm outperforms both NMSMAX and ORBIT, disregarding some coincidental draws.

Considering that V1 outperformed ORBIT on all but one occasion, we can conclude that the regular simplex-based sampling strategy offers a significant advantage. One might, however, argue that the increased implementational and computational complexity of Algorithm 3 is not justified when less challenging optimization problems are considered. In none of the benchmark problems did V1 outperform V2, V3, or V4, indicating that engaging an ellipsoidal trust-region framework is clearly beneficial regardless, given its low implementational cost. However, the downside is that to obtain a curved model, a minimum of points is required, making the advanced regular simplex sampling strategy indispensable.

Which of V2, V3, and V4 performs best depends on the objective function at hand. However, on the whole, we can state that V2 is the better candidate for lower-dimensional problems, whilst V3 is the better candidate for higher-dimensional problems, as will be confirmed in the next section. Also, considering the convergence history of the different algorithms, it was noted that the ORBIT algorithm occasionally stagnated on a local plateau of the objective function. Since this behavior was also exhibited by V1 (e.g., GULF, TRID), albeit with V1 recovering each time, it demonstrates that compensating for the output behavior of the objective function, both through the input space-oriented sampling strategy and by altering the shape of the trust regions, is essential to recover from such apparent plateaus.

6.2.2. Extended Powell Singular and Extended Rosenbrock

To demonstrate and comment on the mechanics of the elaborated methods in greater detail, convergence results on the 8-dimensional extended Powell singular and extended Rosenbrock functions are presented in Figure 4 and discussed below.

Consider the convergence history for the extended Powell singular function in Figure 4(a). We will initially discuss the histories of the algorithms V1, ORBIT, and NMSMAX. All exhibit approximately the same convergence behavior during the initial stage of the optimization (up to 100 function evaluations). After 100 function evaluations, the NMSMAX algorithm clearly starts to diverge from this trend. This is because the NMSMAX algorithm is bound to operate on its simplex vertices. At the common point of approximately 100 function evaluations, it reaches a region where the simplex must shrink to a size small enough to accurately capture the local curvature of the objective function. A similar phenomenon is observed for the ORBIT algorithm, which starts to diverge from the global trend after approximately 200 function evaluations. The phenomenon occurs later for the ORBIT algorithm since it is not bound to operate on the vertices of its standard simplex, and the solution of (3) is only limited to the convex set, . Therefore, it can maintain a larger trust region throughout the iterations than NMSMAX. That algorithm V1 can maintain a larger trust region longer than ORBIT and NMSMAX is a result of the improved geometry of the interpolation points. Nevertheless, algorithm V1 also fails to reach the benchmark condition within the admissible budget. Now, consider the behavior of algorithm V2. The sampling strategy is equivalent to that of V1; however, the modelling space is now normalized with respect to the estimated curvature of the objective function. This corresponds to an ellipsoidal trust-region geometry in the original optimization space. The effect is hardly observable during the initial stage but clearly demonstrates its usefulness after approximately 500 function evaluations, where V2 diverges from V1 and successfully reaches the benchmark condition. It is argued that, because of the transformation, the modelling and optimization steps of the algorithm are now executed in a space where the objective function behaves more like a quadratic function. The effect of the parameter can be assessed by comparing V2, V3, and V4. The parameter determines how strictly the volume of an exact regular simplex has to be respected. Stricter settings result in a larger number of interpolation points being added in every iteration. Therefore, algorithms V3 and V4 diverge from the common trend at about the same moment V1 and ORBIT do. In this region of the optimization space, the size of the trust region (simplex) starts to shrink for all algorithms. This means that fewer function evaluations can be recycled from one trust region to another. Algorithms V3 and V4 set a stricter limit on the volume of the simplex and are, therefore, forced to add more points to complement the recycled set. As a result, they consume more function evaluations than V1 and V2 to cross the same region. Nonetheless, this increase in function evaluations benefits the overall modelling procedure, and V2 is eventually surpassed by V3.

These observations are confirmed by the convergence history for the extended Rosenbrock function, see Figure 4(b). Here, the results are directly affected by the local behavior of the objective function, and the histories of the algorithms diverge from one another right from the starting point. In this case, V1 does reach the benchmark condition, again illustrating the improved scattering of points with respect to the state of the art. We argue this is a result of the regular simplex-based sampling strategy. The beneficial effect of the ellipsoidal trust-region framework is again clearly present, as V2 to V4 converge faster than V1. In conclusion, one can observe that the same effect is present for V3 as was the case for the extended Powell singular function. Function evaluations are consumed at a higher rate in the initial stage. This phenomenon ultimately benefits the modelling procedure in the final optimization stage, resulting in a steeper convergence history, with V3 even surpassing V2 in terms of convergence speed.

7. Conclusion

This paper addresses the reduction of the number of objective function evaluations in derivative-free numerical optimization with surrogate models. Here, radial basis functions are used as surrogate models, with the interpolation framework provided by universal kriging. In each iteration of an interpolation-based optimization algorithm, a trust region is used, within which an optimization is performed on the surrogate model. The interpolation framework relies on an interpolation set, of which a subset of interpolation points needs to be well-poised so that the quality of the surrogate model is maximal. Interpolation set management aims at maximizing this range of model validity.

Currently, at least interpolation points are considered to guarantee a valid surrogate model. We advanced on interpolation-based optimization by relying on a subset of at least interpolation points within the interpolation set management to improve the scattering of the interpolation points and, more importantly, to obtain surrogate models that exhibit curvature. For that purpose, we defined fully linear surrogate models in Definition 1 that can be incorporated in the trust-region Algorithm 2. We implemented RBF models using a flexible multivariate interpolation framework for which universal kriging was used (see Section 3). In Lemma 1, we provide, for a given RBF model with a linear tail, bounds on the deviation between the RBF model and the objective function and between their gradients, i.e., the constants and in Definition 1. In this manner, we are able to predict whether an interpolation model can be classified as fully linear. Since we consider interpolation points that represent a regular simplex (see Section 4.3), we propose in Algorithms 3 and 4 how the interpolation set management needs to be adapted, i.e., how to choose the input points at which the objective function needs to be evaluated.

By leveraging the curvature of the surrogate model, which is guaranteed under the presented interpolation set management, we introduced an ellipsoidal trust-region framework that allows the trust regions to morph and adapt to the geometry of the objective function. This adaptation is implemented by means of a curvature normalization using the coordinate transformation (24) and the filtered Hessian approximation (29). In this manner, the output behavior of the objective function is incorporated in the shape of the trust region.

The proposed interpolation set management, with input interpolation points that follow a well-poised geometry, together with the incorporation of the output behavior of the objective function in the ellipsoidal trust region, has shown its benefits with respect to reducing the number of objective function evaluations when performing elaborate numerical optimizations on various test functions with . To show the impact of the internal parameters of the algorithm, i.e., in Algorithm 3, which is related to the input set, and in (29), which is related to the ellipsoidal trust-region shape, the results of 4 different versions of the presented algorithm are shown in Figure 4 and Table 3. The results clearly illustrate the impact of the chosen input interpolation points and of the ellipsoidal trust region. Moreover, the convergence histories show faster convergence compared to algorithms incorporating at least interpolation points and compared to those that use a spherical trust region [61].

Appendix

A. Universal Kriging

The universal kriging (UK) framework obtains an interpolating approximation model for the function as the most likely unbiased linear predictor, , based on a set of interpolation points and a set of function evaluations, . The framework presumes that the modelled function can be decomposed as follows:

The function is assumed to be an element of a linear function space, . In the UK framework, one is interested in the specific case where is the -variate polynomial space, , of at most degree , spanned by the arbitrary basis, . We further introduce the vector, , whose entries are determined by , so that we can write , where is a unique coefficient vector. It is further assumed that is an unbiased random process and that the output values and are correlated. Their correlation is modeled by a function from the class of radial basis functions, i.e., . One looks for an unbiased linear predictor , where the elements of determine the entries of .
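For concreteness, and with symbol names chosen here because the original displayed equations are not reproduced above, the decomposition and the sought predictor take the standard universal kriging form

\[
  f(x) \;=\; c^{\top}\pi(x) + Z(x),
  \qquad
  \operatorname{Corr}\!\left[Z(x),\,Z(y)\right] \;=\; \phi\!\left(\lVert x - y\rVert\right),
  \qquad
  \hat{f}(x) \;=\; \lambda(x)^{\top} f_{\mathcal{Y}},
\]

where \(\pi(x)\) collects the polynomial basis functions, \(c\) the tail coefficients, \(\phi\) the chosen radial basis function, and \(f_{\mathcal{Y}}\) the vector of function evaluations at the interpolation points.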

The model decomposition can be applied to this vector so that one obtains . Here, the entries of are defined as and the matrix as . Substitution in the linear predictor yields . Hence, we can define an error function between the linear predictor and the true function as follows:

We can now impose two additional properties on the linear predictor that affect the error function and that will render the predictor unique. Firstly, the linear predictor should be unbiased, which can be expressed as . It follows that , and consequently, that . Secondly, we wish to determine so that the expected squared error is minimized. Given that , the expected squared error can be defined as , and after substitution of the error function, we obtain . Evaluation of the expectation operator shows that the error is proportional to the functional , where the matrix is defined as , and the entries of are determined by . The most likely unbiased linear predictor is thus the solution of the equality-constrained variational problem.

The expression of the first-order optimality condition yields the system of linear equations given in (16).
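Written out for the predictor weights in notation chosen here (the equivalent coefficient form may be what is actually displayed in (16)), these optimality conditions read

\[
  \begin{pmatrix} R & P \\ P^{\top} & 0 \end{pmatrix}
  \begin{pmatrix} \lambda(x) \\ \mu(x) \end{pmatrix}
  =
  \begin{pmatrix} r(x) \\ \pi(x) \end{pmatrix},
  \qquad
  R_{ij} = \phi\!\left(\lVert y_i - y_j\rVert\right),\quad
  P_{ij} = \pi_j(y_i),\quad
  r_i(x) = \phi\!\left(\lVert x - y_i\rVert\right),
\]

where \(\mu(x)\) collects the Lagrange multipliers associated with the unbiasedness constraint.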

According to the probabilistic interpretation of the random process , the function values in can be associated with the outcome of a stochastic experiment. Consequently, it is possible to associate a probability value to the outcome of this experiment. If we make the additional assumption that the random process itself behaves as a Gaussian with the expectation and covariance , the probability of the experiment is given by,

A maximum log-likelihood estimation for the parameter results in the following estimate: . Now, assume that the correlation model still depends on an additional hyperparameter, say (e.g., ). Then, the optimal value of this parameter can also be determined by maximizing the likelihood.
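Under the same assumed notation, the resulting estimate of the process variance and the profile log-likelihood used to select the hyperparameter take the standard kriging form

\[
  \hat{\sigma}^{2} \;=\; \frac{1}{m}\,\bigl(f_{\mathcal{Y}} - P c\bigr)^{\top} R^{-1} \bigl(f_{\mathcal{Y}} - P c\bigr),
  \qquad
  \theta^{\ast} \;=\; \arg\max_{\theta}\; -\tfrac{1}{2}\Bigl( m \ln \hat{\sigma}^{2}(\theta) + \ln \det R(\theta) \Bigr),
\]

with \(m\) the number of interpolation points and \(c\) the generalized least-squares estimate of the tail coefficients.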

B. Error Bounds on Better-than-Linear RBF Models

Using the Einstein summation convention, we can write an RBF model with a linear polynomial tail as presented in (B.1). Here, we have implicitly defined the error functions and . We are interested in bounds on the values these functions can take, given that the interpolation set, , contains the origin and that both and are contained within the ball .

Subtracting the left- and right-hand side expressions for from the left- and right-hand side expressions of each of the interpolation conditions, , one obtains the following set of equalities, of which the latter holds for all .

These equalities can be manipulated so as to relate the error on the function value, , to the error on the derivative value, . For that purpose, we shall first rewrite the functions and using the integral representation . After substituting for , it follows that . Now, to make the derivative error function emerge, one can add to each side of the expressions. Together with the left-hand side of (B.2), this expression combines into . On the right-hand side, these expressions can be included in the integrand, since their value is constant with respect to the integration variable and the integral is taken over the unit interval.

These manipulations allow us to reorganize the equalities in (B.2) and to rewrite them in terms of the operators and , which are defined as follows: and in terms of the error functions and , as given by the following, of which the latter holds for all .

Introducing the operator, , and subtracting the first equation from the others yields the final expressions,

Considering that , it can be shown that the right-hand side is bounded by (see [40, 62]), where and are the Lipschitz constants of the functions and , given that .

The combination of the equations in (B.5) yields a bound on the derivative error function as a function of the interpolation points contained in .

Since we already established a relation between both error functions in (B.5), the bound above allows one to determine a bound on as well, which is given by the following:

Now, upon introducing the scaled interpolation matrix , we have established Taylor-like bounds on both error functions, which are related to the inverse of . These bounds are, in fact, solely determined by the interpolation set geometry.
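For reference, such Taylor-like bounds have the generic fully linear form (the constant names below are chosen here and need not coincide with those of Definition 1)

\[
  \bigl|\,f(y) - m(y)\,\bigr| \;\le\; \kappa_{ef}\,\Delta^{2},
  \qquad
  \bigl\lVert \nabla f(y) - \nabla m(y) \bigr\rVert \;\le\; \kappa_{eg}\,\Delta,
  \qquad \forall\, y \in B(x_k;\Delta),
\]

where the constants depend on the Lipschitz constants above and on the geometry of the interpolation set through the scaled interpolation matrix.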

C. Simplex Geometry and Properties

Consider a set of points associated with a set of coordinates in an -dimensional space. Assume that these points are affinely independent so that the associated column vectors are linearly independent. A simplex is, therefore, the -dimensional generalization of a triangle. In simplex terminology, one has that each point from determines a vertex, each pair of vertices determines an edge, and each subset of vertices determines a face. Thus, an -simplex has vertices, edges, and faces. Conveniently, it follows that each face of an -simplex constitutes an -simplex itself.

According to the general convention, when speaking of the -simplex that is determined by , one refers to the convex set determined by .

Although it may be inconsistent with this general convention, in this work, we refer to the set of vertices when speaking of the simplex. However, when we speak of the hypervolume of that simplex, we do, in fact, refer to the hypervolume of the convex set, .

The volume of the simplex determined by is defined as the determinant of the associated column vector matrix normalized by the factor , where the operator is defined as .
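As a concrete check of this definition, the following MATLAB-style sketch computes the hypervolume of a simplex from its vertex coordinates; the convention that the vertices are stored as columns of Y is assumed here.

% Hypervolume of the simplex with vertices y0, ..., yn given as the columns of Y.
function v = simplex_volume(Y)
n = size(Y, 1);                    % dimension of the space
D = Y(:, 2:end) - Y(:, 1);         % edge vectors emanating from the first vertex
v = abs(det(D)) / factorial(n);    % determinant normalized by the factor n!
end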

In the following subsection, we mention some relevant simplex geometries and properties.

Orthocentric Simplex. A simplex for which the following holds is called an orthocentric simplex: each edge is perpendicular to all the edges it does not meet [63]. Orthocentric simplices possess a unique orthocenter. In the special cases where , every set of points that determines a simplex is orthocentric as well.

In [50], it is shown that an orthocentric simplex maximizes the volume subject to a predetermined orthocenter, , and predetermined distances from the simplex vertices to the orthocenter, . These conditions determine the simplex, , up to an isometry. It is this very property that we exploit to maximize the volume of the union of an existing set and an expansion set generated by the procedure , Algorithm 4. The expansion set is constructed exactly so that all of its edges are perpendicular to the edges determined by the existing set. Given the freedom in the edge lengths, albeit with the union of the existing and expansion sets restricted to an -sphere, these lengths are determined so as to maximize the volume.

Standard Simplex. A standard -simplex is obtained when of its edges are orthogonal and have the same length, . A convenient coordinate representation of a standard -simplex is given by the origin, , and the scaled coordinates of the basis vectors, . It is easily verified that the nonorthogonal edges have length .
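The construction and the resulting edge lengths can be verified with a short MATLAB-style sketch; the orthogonal edge length c and the dimension used below are illustrative values.

% Vertices of a standard n-simplex: the origin and the scaled basis vectors.
n = 4; c = 1;                        % illustrative values
V = [zeros(n, 1), c * eye(n)];       % one vertex per column: 0, c*e_1, ..., c*e_n
len = norm(V(:, 2) - V(:, 3));       % equals c*sqrt(2), the nonorthogonal edge length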

We also note that the standard simplex is not an orthocentric simplex.

Regular Simplex. A regular -simplex with side length is obtained when all edges have the same length. From this definition, it follows immediately that any subset of points from that simplex determines a regular -simplex.

If we consider all simplices whose vertices are contained within an -hypersphere with radius , it can be shown that a simplex of maximal volume is an element of the class of regular simplices [63]. Moreover, it can be shown that the regular simplex is an orthocentric simplex.

It is worth mentioning that a regular -simplex is embedded in a standard -simplex as the face determined by the nonorthogonal edges of the standard -simplex. The edge length of a regular -simplex that is embedded in a standard -simplex with an orthogonal edge length, , equals the nonorthogonal edge length of that standard simplex, as determined above.

Volume of the Regular -Simplex. Let be the hypervolume occupied by the standard -simplex with an orthogonal edge length . Following the definition given in (C.4), it is easily verified that . Now, consider that the face determined by all the nonorthogonal edges of the standard -simplex constitutes a regular -simplex. The distance from the intersection of the orthogonal edges, i.e., the origin according to the coordinate representation in C.2, to the face that determines the regular -simplex is given by . Since the hypervolume of a standard -simplex can also be determined using the definition of the hyperpyramid, i.e., by multiplying the hypervolume of one of its faces by the corresponding altitude divided by , the hypervolume of the regular -simplex can be determined. Recall that the edge length of the regular -simplex determined by the nonorthogonal edges is and that, therefore, the result must be scaled by a factor .

Radius of the Circumscribed -Sphere. The radius, , of the -sphere circumscribing a regular -simplex with side length can be determined by reconsidering the geometric relations between a standard -simplex and a regular -simplex described above to determine the hypervolume. In a regular -simplex, the distance between the unique orthocenter and a vertex is equal to the radius of the circumscribed -sphere. The altitude of the face determined by the nonorthogonal edges and one of the orthogonal edges of the standard -simplex constitute a right triangle with the orthocenter of the associated regular -simplex and a common vertex. Note that, again, the edge length of this -simplex is and that, therefore, the result must be scaled by a factor .

Hence, the side length, , of the regular -simplex inscribed in an -sphere with radius is given by .
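In explicit form, these standard relations read as follows (with symbols chosen here, since the original expressions are not reproduced above): a regular n-simplex with side length c has circumradius and hypervolume

\[
  R \;=\; c\,\sqrt{\frac{n}{2(n+1)}},
  \qquad
  V \;=\; \frac{c^{\,n}}{n!}\,\sqrt{\frac{n+1}{2^{\,n}}},
\]

and, conversely, the side length of the regular n-simplex inscribed in an n-sphere of radius R is \(c = R\,\sqrt{2(n+1)/n}\).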

Other Useful Relations. From these results, we can derive some other equalities that are of interest for the procedure described by Algorithm 4.

The radius of the circumscribed -sphere of a regular -simplex that is part of a regular -simplex circumscribed by an -sphere with radius is given by the following:

Hence, the distance, , from the orthocenter of the regular -simplex to a subset of vertices determining such a regular -simplex can be determined by considering that,

This relation can be generalized. Consider, therefore, a regular -simplex embedded in an -dimensional space and inscribed in an -sphere with radius . Its edge length must then be . Now, if one were to translate this -simplex over a distance along any direction perpendicular to the -dimensional subspace spanned by its edges, the -simplex remains on the surface of the -sphere as long as it is scaled by a factor .
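A short derivation of this scaling factor, under the assumption that the simplex circumcenter initially coincides with the sphere center and is translated along with the simplex: if the circumradius before translation is R and the translation distance is d, the vertices of the translated and rescaled simplex lie at distance \(\sqrt{r^{2} + d^{2}}\) from the sphere center, where r is the circumradius after scaling. Remaining on the sphere therefore requires

\[
  \sqrt{r^{2} + d^{2}} \;=\; R
  \;\Longrightarrow\;
  r \;=\; \sqrt{R^{2} - d^{2}},
  \qquad
  \text{i.e., a scaling factor }\; \frac{r}{R} \;=\; \sqrt{1 - \left(\frac{d}{R}\right)^{2}}.
\]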

Data Availability

The numerical data used to support the findings of this study are available from the corresponding author upon request and are also available on the GitHub page associated with this article.

Disclosure

An earlier version of this research has been presented as a preprint.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge the support of the FWO Research Project G.0D93.16N and the Flanders Make Project EMODO.