Machine Learning based optimization for interval uncertainty propagation

Two non-intrusive uncertainty propagation approaches are proposed for the performance analysis of engineering systems described by expensive-to-evaluate deterministic computer models with parameters defined as interval variables. These approaches employ a machine learning based optimization strategy, the so-called Bayesian optimization, for evaluating the upper and lower bounds of a generic response variable over the set of possible responses obtained when each interval variable varies independently over its range. The lack of knowledge caused by not evaluating the response function for all the possible combinations of the interval variables is accounted for by developing a probabilistic description of the response variable itself by using a Gaussian Process regression model. An iterative procedure is developed that uses well-established acquisition functions to select a small number of simulations to be evaluated, which are then used to update this statistical model and to assess the response bounds. In both approaches, an initial training dataset is defined. While one approach builds iteratively two distinct training datasets for evaluating separately the upper and lower bounds of the response variable, the other builds iteratively a single training dataset. Consequently, the two approaches will produce different bound estimates at each iteration. The upper and lower bound responses are expressed as point estimates obtained from the mean function of the posterior distribution. Moreover, a confidence interval on each estimate is provided for effectively communicating to engineers when these estimates are obtained for a combination of the interval variables for which no deterministic simulation has been run. Finally, two metrics are proposed to define conditions for assessing whether the predicted bound estimates can be considered satisfactory.


Introduction
It is often the case in engineering problems that little information is available concerning the actual value and/or the inherent variability of the key input parameters of an expensive-to-run model required for a system performance assessment with respect to safety, quality, design or cost constraints, especially when the performance is assessed in terms of noise and vibration [1,2].
While the epistemic uncertainty is related to a lack of knowledge and can potentially be reduced by gathering more information, the aleatoric uncertainty is irreducible since it is related to an inherent variability of the input parameters. The performance assessment of such designs should ideally consider these uncertainties to select robust design solutions [1,2]. This can be achieved by performing the so-called forward uncertainty quantification analysis [1]. Within this analysis, the underlying uncertainties of the input parameters (for example, uncertainties in material properties, loading conditions, boundary conditions, and fabrication details) are described with an uncertainty model, and the goal is to obtain the corresponding uncertainty description of the response variables. In particular, depending on the available information, the input parameters can be described with a probabilistic model [1,3], with a non-probabilistic model [4,5] (by setting upper and lower bounds [6], using an admissible convex region [7] or by means of fuzzy numbers [8-10]), or within the framework of info-gap decision theory [11]. Alternatively, non-parametric uncertainty models can be implemented to obviate the need for a detailed description of the underlying uncertain model parameters by using some form of Random Matrix theory [1,12-14].
Broadly speaking, forward uncertainty propagation approaches can be grouped into intrusive and non-intrusive approaches. Non-intrusive approaches do not require direct modifications of the expensive-to-evaluate model, which can be treated as a black-box function. The model is evaluated for each realization of the uncertain variables to assess the effect of the input uncertainty on the response variables. A very well-known example, when dealing with probabilistic uncertainty descriptions, is the Monte Carlo sampling strategy [3], which can be used to yield the probability density function and other statistics of a response variable of interest. On the other hand, intrusive approaches require modifications of the governing equations of the problem to build models that inherently embed certain uncertainty descriptions. Very well-known examples are some of the expansion-based methods [15]. These approaches enable a drastic reduction of the computational cost of the uncertainty propagation and are valid under certain assumptions on the uncertainty levels.
Although many approaches have been developed in the last decades, there are still unresolved challenges to be tackled for assessing the noise and vibration performance of an engineering design, especially when uncertain input parameters are described by interval variables [5]. Intervals are used for describing parameters that are not known precisely and for which only the maximum and minimum values they could take, also known as bounds, are known. These bounds often represent "tolerance values" on the parameters associated with the properties of a component of a system. Bounds can also be used to describe known limits on the values that an input variable of a model might take. This type of bounded description is commonly used in engineering practice, especially at early design stages when a limited amount of data and/or information may be available to select the form of the distribution of the uncertain variable and/or to specify the parameters of the distribution. In fact, the results of a system performance assessment with respect to safety, quality, design or cost constraints are strongly dependent on the probability density function assumed [16,17]. Often it is assumed that the interval description is equivalent to a uniform distribution, by exploiting the Maximum Entropy Principle [18], so that well-established probabilistic uncertainty quantification approaches can be implemented. However, converting an interval description into a uniform probability distribution means assigning equal probability of occurrence to each value within the interval (see for example [19]). Moreover, if the upper bound of the response is obtained at the upper bound of an interval variable, there is a very low probability of a random sampling strategy selecting this value precisely [19]. This paper is focused on engineering problems characterized by an expensive-to-run deterministic model whose input parameters are described by interval variables, with the aim of yielding the bounds on a response variable of interest. No assumption on the monotonicity of the response with respect to the input uncertainties is made, since this assumption is often not met when dealing with noise and vibration problems because of the presence of resonances [19]. Furthermore, the attention is focused on non-intrusive interval uncertainty propagation strategies. Among the state-of-the-art approaches, the Interval Vertex Method (VM) [20] would not provide an accurate estimate of the response bounds because of the aforementioned non-monotonic behaviour. This is because, as the name suggests, the VM requires evaluating the deterministic model only for the combinations of the vertices of the interval input variables. Alternatively, the Subinterval Method (SM) [19] can be used. With this approach, the domain of each interval variable is decomposed into subintervals within which the generic quantity of interest is assumed to be a monotonic function of the uncertain parameters. The response bounds are then obtained as the minimum and maximum across all the possible combinations of the endpoints of the subintervals. Since it is not known a priori for which values of the interval variables the bounds of the generic response quantity of interest are attained, enough subintervals should be considered. Global Search Algorithms [5] can also be employed to deal with these problems.
These algorithms are usually combined with a surrogate model, which approximates the relationship between the structural response and the input parameters and is computationally cheaper to run [5,21]. Such surrogate models can be obtained by using high-order polynomial functions or machine learning techniques [5,21]. For example, Radial Basis Functions [22], Kriging interpolation schemes [23] and Artificial Neural Networks [24,25] have been implemented to address the interval uncertainty propagation problem.
Nonetheless, these approaches require a large number of model runs to construct an accurate surrogate model and to fine-tune the surrogate model parameters. Therefore, they can entail a considerable computational effort if a large number of uncertain parameters needs to be considered. De Munck et al. [26] proposed an adaptive response surface method based on a Kriging model to yield an accurate meta-model with a limited computational cost. In [27], the evaluation of the upper and lower bounds of a black-box response function was formulated as two separate optimization problems, and the computational cost was reduced by applying the Efficient Global Optimization (EGO) algorithm [28]. Recently, a similar approach has been used to evaluate the response bounds for train-bridge systems [29].
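To make the difference between the two interval propagation baselines recalled above concrete, the following minimal sketch contrasts the Vertex Method, which samples only the interval vertices, with the Subinterval Method, which samples the endpoints of q subintervals per variable. The response function and the intervals are hypothetical placeholders for an expensive deterministic model; they are not taken from the paper.

```python
import numpy as np
from itertools import product

def response(b):
    # Hypothetical non-monotonic black-box response (stand-in for an
    # expensive deterministic model); b is a vector of input parameters.
    return np.sin(3.0 * b[0]) + 0.5 * np.cos(5.0 * b[1]) + 0.1 * b[0] * b[1]

# Interval description of two uncertain inputs: [lower, upper] bounds.
intervals = np.array([[0.0, 2.0],
                      [1.0, 3.0]])

# Interval Vertex Method: evaluate the model at the 2^r vertices only.
vertices = list(product(*intervals))
vm_vals = [response(np.array(v)) for v in vertices]
print("VM bounds:", min(vm_vals), max(vm_vals))

# Subinterval Method: split each interval into q subintervals and evaluate
# the model at all combinations of the (q+1) endpoints per variable.
q = 10
grids = [np.linspace(lo, hi, q + 1) for lo, hi in intervals]
sm_vals = [response(np.array(p)) for p in product(*grids)]
print("SM bounds:", min(sm_vals), max(sm_vals))
```

For a non-monotonic response, the SM bounds enclose the VM bounds, but the cost of the SM grows as (q+1)^r with the number of interval variables, which motivates the surrogate-based strategies discussed above.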
In this paper, two non-intrusive interval uncertainty propagation approaches are proposed for the analysis of a generic system subject to interval uncertainties. These approaches employ a machine learning based optimization strategy for evaluating the Upper Bound (UB) and Lower Bound (LB) of a generic response variable over the set of the possible responses by iteratively selecting a limited number of simulations to be evaluated. The response is considered as a black-box function of the interval variables for which no analytical solution or derivative information is available. No assumptions on the linearity or monotonicity of the response variable with respect to the interval uncertainties are required. It is assumed that the response function is smooth and that it can be obtained for discrete values of the interval variables by running an expensive-to-evaluate deterministic model. A Gaussian Process (GP) regression model of the response is built from an initial set of simulations and is iteratively updated by selecting new simulations through Acquisition Functions (AFs), until the computational budget is exhausted or a threshold on the absolute maximum value of the AF is reached. The response bound estimates are then expressed in terms of the maximum and minimum of the posterior mean function of the corresponding GP. Moreover, the confidence bounds at these points are used to quantify the remaining epistemic uncertainty. Two metrics are then introduced to assess if the response bounds obtained are satisfactory or if additional simulations need to be carried out.
The paper is structured as follows: the two proposed approaches are described in section 2. The algorithms summarizing the steps of Approach A and Approach B are described in section 3 and section 4, respectively. The two approaches are illustrated by investigating a single degree of freedom system and a plate-acoustic cavity subject to interval uncertainties in section 5. Finally, section 6 and section 7 provide discussions and conclusions, respectively.

Problem description
Let us consider a generic engineering system represented by a deterministic model (obtained with Finite Element [40], Boundary Element [41], Wave Finite Element [42] or any other deterministic modelling strategy), subject to time-varying inputs and characterised by r independent uncertain input parameters described as intervals. These uncertain input parameters are collected into the interval vector $\mathbf{b}^{\mathrm{int}} = [b_1^{\mathrm{int}}, b_2^{\mathrm{int}}, \ldots, b_r^{\mathrm{int}}]^{\mathrm{T}}$, where $\underline{\mathbf{b}}$ and $\overline{\mathbf{b}}$ denote the vectors collecting the lower bounds $\underline{b}_i$ and the upper bounds $\overline{b}_i$ of the interval variables $b_i^{\mathrm{int}}$, $i = 1, 2, \ldots, r$. Denoting by $w(\mathbf{b})$ the generic response variable evaluated at a combination $\mathbf{b}$ of the interval variables, the goal of the interval uncertainty propagation is to evaluate the response bounds

$$\underline{w} = \min_{\mathbf{b}} w(\mathbf{b}), \qquad \overline{w} = \max_{\mathbf{b}} w(\mathbf{b}), \qquad \underline{b}_i \le b_i \le \overline{b}_i, \; i = 1, 2, \ldots, r \quad (1)$$

In other words, $\underline{w}$ and $\overline{w}$ are, respectively, the minimum and the maximum of all the possible response evaluations obtained when each uncertain parameter $b_i^{\mathrm{int}}$, $i = 1, 2, \ldots, r$, varies independently over its range. The direct solution of the optimization problems in Eq. (1) can become challenging to the point of being unfeasible for large complex models of engineering systems. This is because the analytical expression of $w(\mathbf{b})$ and of its derivatives with respect to specific combinations $\mathbf{b}_j$ of the interval parameters are often not available. Consequently, the resolution of the optimization problem would require multiple runs of the underlying model or of a surrogate model [5,21]. However, as the number of uncertain parameters and the mathematical model complexity increase, the evaluation of the bounds on $w(\mathbf{b})$ can become computationally expensive and prone to errors [5,21]. These problems are particularly relevant when dealing with large numerical models for the vibro-acoustic performance assessment of engineering systems. Moreover, vibro-acoustic models display a non-monotonic behaviour of the response due to resonant peaks that might occur in certain frequency bands for different combinations of the interval input parameters.

Interval Uncertainty Quantification as a Bayesian optimization problem
The proposed approaches frame the uncertainty quantification under interval variables within a Bayesian Optimization (BO) framework [31,32]. BO is an approach to maximising an objective function. It is based on building a probabilistic surrogate of the objective function by using a Bayesian machine learning technique, Gaussian Process regression [30]. The so-called Acquisition Functions are then applied to this probabilistic model to efficiently select the objective function samples to be evaluated [31,32]. The value of the response function at a generic combination $\mathbf{b}_j$ of the interval variables, $w(\mathbf{b}_j)$, can be modelled as a Gaussian random variable. The collection of the Gaussian random variables $w(\mathbf{b}_j)$ represents a Gaussian Process (GP) of $w(\mathbf{b})$. The GP represents a probability distribution over the function $w(\mathbf{b})$, that is the prior distribution $p(w(\mathbf{b}))$. GP regression is a non-parametric approach that finds a distribution for $w(\mathbf{b})$ over the possible interpolating functions that are consistent with the observed data $D$ [30]. This distribution is the so-called posterior distribution and is indicated with $p(w(\mathbf{b}) \mid D)$. Here, $D$ is obtained by running some deterministic simulations of the model for fixed combinations of the interval variables. The posterior distribution is obtained by means of Bayesian inference: it is given by the joint distribution of $w(\mathbf{b})$ and $D$, which can be expressed as the product of $p(D \mid w(\mathbf{b}))$, the so-called likelihood model, and the prior $p(w(\mathbf{b}))$, divided by $p(D)$, the marginal likelihood (or evidence). $p(D)$ is independent of $w(\mathbf{b})$ and is therefore a normalizing constant. Within the GP regression model [30], we are dealing with Gaussian distributions, which are closed under conditioning. This means that the resulting predictive posterior distribution is also Gaussian, and the expressions of its mean and variance are readily available [30].
The goal of the interval uncertainty quantification is to obtain the maximum and the minimum of $w(\mathbf{b})$, which are attained at two distinct combinations of the interval variables. The GP regression model is used to yield an estimate of the bounds of $w(\mathbf{b})$ in terms of the maximum or minimum of the predictive posterior mean function, and to quantify the uncertainty associated with each estimate in the form of confidence bounds. As in BO, this can be achieved by using the Acquisition Functions (AFs). These ad-hoc functions of the GP regression model are inexpensive to evaluate compared to the objective function. The AFs are used to select which combination $\mathbf{b}_j$ of the interval variables has to be evaluated next in order to improve the most the assessment of each response bound.
Based on this framework, two approaches are proposed; they are schematically shown in Figure 1 and Figure 2. Both approaches require the selection of an initial number of simulations to construct the so-called initial training dataset. This dataset is used to build a GP regression model of the response variable. Subsequently, the GP is used in combination with the AFs to identify two combinations of the interval variables at which additional simulations must be evaluated to improve the accuracy of the predicted response bounds. These simulations are then used to enhance the training dataset, and to identify through an iterative procedure the next set of simulations to be performed to improve the accuracy of the predicted response bounds. Stopping criteria are used to limit the number of simulations to be evaluated. The difference between the two approaches lies in the use of the simulation results obtained at each iteration. Approach A considers two separate enhanced training datasets (one for the evaluation of the UB and one for the evaluation of the LB) and therefore addresses the two optimization problems in parallel. Approach B uses a single enhanced training dataset. As a result, at the end of each iteration, different predictive posterior distributions will be obtained with the two approaches, which consequently will yield different bound estimates. Finally, two metrics are introduced to define conditions for assessing whether the predicted response bounds are satisfactory or additional simulations need to be evaluated.
The two approaches share the same building blocks that are described in what follows.

Gaussian Process regression model
The n Gaussian random variables representing each $w(\mathbf{b}_j)$ are collected into a vector $\mathbf{w}$. This collection of variables has a jointly Gaussian distribution. Therefore, the vector $\mathbf{w}$ represents a GP of the response function $w(\mathbf{b})$. Let us consider a training dataset $\mathbf{w}_0$ of noise-free observations, and a test dataset of function realisations $\mathbf{w}_*$ that we want to predict.
Because of the properties of the multivariate Gaussian distribution, the joint distribution of $\mathbf{w}_0$ and $\mathbf{w}_*$ is itself Gaussian [30], and it can be expressed as:

$$\begin{bmatrix} \mathbf{w}_0 \\ \mathbf{w}_* \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \boldsymbol{\mu}_0 \\ \boldsymbol{\mu}_* \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{00} & \boldsymbol{\Sigma}_{0*} \\ \boldsymbol{\Sigma}_{*0} & \boldsymbol{\Sigma}_{**} \end{bmatrix} \right)$$

where $\boldsymbol{\Sigma}_{00} = \mathrm{Cov}(\mathbf{w}_0, \mathbf{w}_0)$ is the covariance matrix of the training observations. The other covariance matrices are defined in a similar fashion: $\boldsymbol{\Sigma}_{0*} = \mathrm{Cov}(\mathbf{w}_0, \mathbf{w}_*)$, $\boldsymbol{\Sigma}_{*0} = \boldsymbol{\Sigma}_{0*}^{\mathrm{T}}$ and $\boldsymbol{\Sigma}_{**} = \mathrm{Cov}(\mathbf{w}_*, \mathbf{w}_*)$. The covariance matrix defines the similarity between the i-th and j-th entries of the random variable. Therefore, it encodes the prior assumption on the smoothness of the black-box response function. The covariance matrix is defined in terms of a positive definite kernel function $\kappa(\mathbf{b}_i, \mathbf{b}_j)$ [30,43], which for a pair of points (e.g. two training points) returns the similarity between those points as a scalar.
Because of the properties of the multivariate Gaussian distribution, it can easily be shown that the posterior predictive distribution of $\mathbf{w}_*$, given the training dataset, is also Gaussian [30], with mean $\mathbf{m}_* = \boldsymbol{\mu}_* + \boldsymbol{\Sigma}_{*0}\boldsymbol{\Sigma}_{00}^{-1}(\mathbf{w}_0 - \boldsymbol{\mu}_0)$ and covariance $\boldsymbol{\Sigma}_* = \boldsymbol{\Sigma}_{**} - \boldsymbol{\Sigma}_{*0}\boldsymbol{\Sigma}_{00}^{-1}\boldsymbol{\Sigma}_{0*}$. Often only the marginal variances are needed to measure the predictive uncertainty at each test point. Therefore, the predictive marginal mean and variance for a single test point $\mathbf{b}_j$ are expressed, respectively, as:

$$m(\mathbf{b}_j) = \mu(\mathbf{b}_j) + \boldsymbol{\Sigma}_{0j}^{\mathrm{T}}\boldsymbol{\Sigma}_{00}^{-1}(\mathbf{w}_0 - \boldsymbol{\mu}_0) \quad (9)$$
$$\sigma^2(\mathbf{b}_j) = \kappa(\mathbf{b}_j, \mathbf{b}_j) - \boldsymbol{\Sigma}_{0j}^{\mathrm{T}}\boldsymbol{\Sigma}_{00}^{-1}\boldsymbol{\Sigma}_{0j} \quad (10)$$

where $\boldsymbol{\Sigma}_{0j}$ is a matrix of size $n_0 \times 1$. From Eqs. (9)-(10), it is possible to observe that while each element of the mean vector is shifted by the conditioned variable, the covariance matrix is independent of the conditioned variable. The posterior distribution obtained will be such that the GP has zero variance and a mean corresponding to the actual response at each point of the training dataset.
Without loss of generality, it can be assumed that $\boldsymbol{\mu}_* = \mathbf{0}$ and $\boldsymbol{\mu}_0 = \mathbf{0}$. The kernel function is the crucial element of the GP, since it determines the characteristic shape of the function being approximated [30,43]. The selection of a type of kernel function encodes the user's belief about the function being predicted [30]. Broadly speaking, kernel functions can be subdivided into stationary and non-stationary [30,43]. While stationary kernels are invariant to translation, so that the covariance depends only on the relative position of $\mathbf{b}_i$ and $\mathbf{b}_j$, this is not the case for non-stationary kernels. The simplest non-stationary kernel is the linear kernel [30,43]. In this paper, the kernel chosen is stationary and is defined as the exponential of a weighted distance function (also known as squared exponential), which is preferred when little is known about the black-box function to be approximated [21,28]:

$$\kappa(\mathbf{b}_i, \mathbf{b}_j) = \alpha \exp\left(-d(\mathbf{b}_i, \mathbf{b}_j)\right) \quad (11)$$

where the weighted distance function is defined as [21,28]:

$$d(\mathbf{b}_i, \mathbf{b}_j) = \sum_{k=1}^{r} \theta_k \left| b_{i,k} - b_{j,k} \right|^{p_k} \quad (12)$$

with $\theta_k \ge 0$ and $p_k \in [1, 2]$ [21,28].
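As an illustration of the GP machinery described above, the sketch below implements the posterior conditioning of Eqs. (9)-(10) with a zero prior mean and the exponential-of-weighted-distance kernel of Eqs. (11)-(12), for fixed hyperparameters. The function names, the toy response and the hyperparameter values are assumptions made for the example only.

```python
import numpy as np

def kernel(B1, B2, alpha=1.0, theta=None, p=2.0):
    # kappa(b, b') = alpha * exp(-sum_k theta_k |b_k - b'_k|^p), Eqs. (11)-(12).
    theta = np.ones(B1.shape[1]) if theta is None else np.asarray(theta)
    d = np.abs(B1[:, None, :] - B2[None, :, :]) ** p        # pairwise |b_ik - b_jk|^p
    return alpha * np.exp(-(d * theta).sum(axis=-1))         # shape (n1, n2)

def gp_posterior(B_train, w_train, B_test, **hyp):
    # Posterior mean and marginal variance at the test points, assuming a zero
    # prior mean and noise-free observations (Eqs. (9)-(10)).
    K00 = kernel(B_train, B_train, **hyp) + 1e-10 * np.eye(len(B_train))  # jitter
    K0s = kernel(B_train, B_test, **hyp)
    Kss = kernel(B_test, B_test, **hyp)
    L = np.linalg.cholesky(K00)
    v = np.linalg.solve(L, K0s)
    mean = v.T @ np.linalg.solve(L, w_train)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

# Toy usage: one interval variable, three noise-free training simulations.
B_tr = np.array([[0.0], [0.5], [1.0]])
w_tr = np.sin(3.0 * B_tr[:, 0])
B_te = np.linspace(0.0, 1.0, 50)[:, None]
m, s2 = gp_posterior(B_tr, w_tr, B_te, alpha=1.0, theta=[10.0], p=2.0)
```

Note that the variance returned at the training points is zero, consistently with the noise-free conditioning discussed above.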
These hyperparameters $(\alpha, \boldsymbol{\theta}, \mathbf{p})$ can be estimated by maximising the log marginal likelihood (or, equivalently, minimising the negative log marginal likelihood) of the observed data [30,43,44].
Independently of the kernel chosen, the log marginal likelihood of a Gaussian distribution and its partial derivatives with respect to the hyperparameters can be expressed in closed form as [30]:

$$\log p(\mathbf{w}_0 \mid \alpha, \boldsymbol{\theta}, \mathbf{p}) = -\frac{1}{2}\mathbf{w}_0^{\mathrm{T}}\boldsymbol{\Sigma}_{00}^{-1}\mathbf{w}_0 - \frac{1}{2}\log\left|\boldsymbol{\Sigma}_{00}\right| - \frac{n_0}{2}\log 2\pi \quad (13)$$

$$\frac{\partial}{\partial \gamma_k}\log p(\mathbf{w}_0 \mid \alpha, \boldsymbol{\theta}, \mathbf{p}) = \frac{1}{2}\mathbf{w}_0^{\mathrm{T}}\boldsymbol{\Sigma}_{00}^{-1}\frac{\partial \boldsymbol{\Sigma}_{00}}{\partial \gamma_k}\boldsymbol{\Sigma}_{00}^{-1}\mathbf{w}_0 - \frac{1}{2}\mathrm{Tr}\!\left(\boldsymbol{\Sigma}_{00}^{-1}\frac{\partial \boldsymbol{\Sigma}_{00}}{\partial \gamma_k}\right) \quad (14)$$

where $\left|\cdot\right|$ indicates the determinant, $\mathrm{Tr}(\cdot)$ is the trace operator and $\gamma_k$ denotes a generic hyperparameter. Setting to zero the closed-form expression of the partial derivatives, it is then possible to evaluate numerically the estimates of the hyperparameters. It is well known that the maximization of the log marginal likelihood is a non-convex optimization problem that can be characterised by several local optima [31,32,43-45]. The selection of a point different from the global optimum can considerably affect the accuracy of the hyperparameters, and consequently of the GP regression model. Therefore, it is recommended to consider multi-start optimization strategies to evaluate the global optimum [43,44]. Alternative approaches for hyperparameter estimation include maximum a-posteriori estimation [43] and stochastic-based inference methods [45,46]. Moreover, each evaluation of the log marginal likelihood has a time complexity that grows cubically with the number of training points, because of the computation of the determinant and of the inverse of the covariance matrix of the training data [30].
Reducing the computational cost while preserving the accuracy of the GP is an active area of research in computer science [47,48]. Since the response variable of a large-scale model might be high-dimensional (meaning that it is defined as a function of a large number of interval variables), the size of both the training and test datasets could be quite large. Therefore, the computational cost of each optimization step could hinder the application of the GP regression model. In this paper, this problem is addressed by selecting the minimum number of elements required for the initial training dataset, and subsequently for the enhanced training dataset, to improve the GP capability to predict the response bounds while containing the computational costs, as described in what follows.

Initial training dataset
The initial training dataset collects the combinations of the interval variables, and thus the corresponding simulations to be evaluated, used to build a first GP regression model of the response. In this paper, it is assumed that no prior information about the interactions between the interval parameters and their effect on the response is available for selecting these combinations. These interactions are then "learned" during the iterative steps of the proposed approaches. The Taguchi method [35,36] is implemented here to (i) cover as much as possible of the r-dimensional hypercube represented by r > 2 interval variables and (ii) limit the initial computational effort required for building a generic initial GP regression model of the response function.
The Taguchi method [35,36] is a well-established Design of Experiments technique which selects the so-called design points, that is the simulations that are going to be run, by using the properties of the so-called orthogonal arrays. This approach requires specifying a number of levels q, which results in q-1 subintervals within each interval variable. The minimum number of design points s (neglecting the interactions between the interval parameters) corresponds to $s = r(q-1) + 1$. However, it is desired that the Taguchi matrix $L_s(q^r)$ is balanced, which results in s being a multiple of q. Interested readers can find more details about the Taguchi method and approaches for deriving the Taguchi matrix in references [35,36]. The selection of the number of levels q, and of the resulting s simulations, is strongly dependent on the total number of simulations available for the problem under investigation. As shown in the second numerical application, for a limited total number of simulations, increasing q does not necessarily improve the bounds prediction. As a rule-of-thumb, it is recommended not to use more than a quarter of the total number of simulations available for the initial training dataset.
The set of the s combinations of the interval variables selected with the Taguchi method, together with the corresponding response evaluations obtained by running the deterministic model, forms the initial training dataset $D_t$. Therefore, the GP is initially built considering $D = D_t$. Alternative sampling methods [21,35], e.g. Latin Hypercube sampling, could also be implemented to obtain the initial training dataset, as in the sketch below.
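A minimal sketch of this step using Latin Hypercube sampling from scipy (the alternative mentioned above); Taguchi orthogonal arrays are instead usually taken from published tables. The model used here is a hypothetical placeholder for the expensive simulation.

```python
import numpy as np
from scipy.stats import qmc

def initial_training_dataset(intervals, s, model, seed=0):
    # intervals: (r, 2) array of lower/upper bounds; s: number of initial runs;
    # model: callable returning the scalar response for one input combination.
    sampler = qmc.LatinHypercube(d=intervals.shape[0], seed=seed)
    unit = sampler.random(n=s)                               # points in [0, 1]^r
    B0 = qmc.scale(unit, intervals[:, 0], intervals[:, 1])   # map to the interval hypercube
    w0 = np.array([model(b) for b in B0])                    # run the deterministic model
    return B0, w0

# Hypothetical usage with a toy model in place of the expensive simulation.
intervals = np.array([[0.0, 2.0], [1.0, 3.0]])
B0, w0 = initial_training_dataset(intervals, s=9, model=lambda b: np.sin(3 * b[0]) + b[1])
```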

Selecting the elements of the Enhanced Training Dataset
Having defined the GP regression model of the objective function $w(\mathbf{b})$, the Bayesian optimization framework [31,32] can be readily employed for selecting the next point $\hat{\mathbf{b}}_j$ at which $w(\mathbf{b})$ has to be evaluated in order to improve the most the prediction of the optimum of the objective function. In particular, the next point $\hat{\mathbf{b}}_j$ is evaluated by formulating an optimization problem of the form [28,31,32]:

$$\hat{\mathbf{b}}_j = \arg\max_{\mathbf{b}} \mathrm{AF}(\mathbf{b}) \quad (17)$$

where $\mathrm{AF}(\mathbf{b})$ represents the so-called Acquisition Function (AF), that is an inexpensive function of the GP regression model which quantifies how desirable it is to evaluate the objective function at $\mathbf{b}_j$ [28,31,32]. The AF is chosen to respect two criteria [28,31,32]: (i) Exploitation, which suggests that $\hat{\mathbf{b}}_j$ should be located in regions with low variance values close to the current optimum value; (ii) Exploration, which suggests that $\hat{\mathbf{b}}_j$ should be located in unexplored regions with high variance. Eq. (17) can be particularized for tackling the evaluation of the next point when predicting the LB and the UB of the response variable of interest, $\hat{\mathbf{b}}_j^{\mathrm{LB}}$ and $\hat{\mathbf{b}}_j^{\mathrm{UB}}$:

$$\hat{\mathbf{b}}_j^{\mathrm{LB}} = \arg\max_{\mathbf{b}} \mathrm{AF}^{\mathrm{LB}}(\mathbf{b}) \quad (18)$$
$$\hat{\mathbf{b}}_j^{\mathrm{UB}} = \arg\max_{\mathbf{b}} \mathrm{AF}^{\mathrm{UB}}(\mathbf{b}) \quad (19)$$

where the superscript notation is used to distinguish the formulation of the AF to be implemented in the two optimization problems in Eq. (1).
Many AFs have been proposed in the literature to balance Exploitation and Exploration. An exhaustive review and comparison of the most known and used AFs can be found in references [32,49-52]. Three AFs are explored in this paper: Probability of Improvement [21,33,37], Expected Improvement [28,31,32,34,37] and Confidence Bounds [38,39]. The AFs are often multimodal, and for complex problems they might be non-convex and high-dimensional.
Therefore, their maximization can be a challenging task [32]. Since the AF is relatively inexpensive to evaluate compared to the black-box response function, the resolution of the optimization problems in Eq. (18) and Eq. (19) is usually addressed with alternative optimization strategies, including gradient-based methods and variations of random search [31,32]. In reference [29], closed-form expressions of the derivatives of the Expected Improvement AF with respect to the parameters $\mathbf{b}$ were used to tackle the AF optimization. This approach might be prone to numerical errors related to the calculation of the inverse of the training points covariance matrix and of the derivatives of the training and test points covariance matrices, especially in high dimensions.

Probability of Improvement (PI)
The Probability of Improvement is based on the definition of the so-called improvement function [21,33,37]. The improvement function with respect to the minimum value of the vector of observed responses, $w_{\min} = \min(\mathbf{w}_0)$, is defined as [21,33,37]:

$$I^{\mathrm{LB}}(\mathbf{b}) = \max\left(w_{\min} - w(\mathbf{b}),\, 0\right) \quad (20)$$

so that values of $w(\mathbf{b})$ higher than the current observed minimum yield zero improvement.
Similarly, the improvement function with respect to $w_{\max} = \max(\mathbf{w}_0)$ is written as [21,33,37]:

$$I^{\mathrm{UB}}(\mathbf{b}) = \max\left(w(\mathbf{b}) - w_{\max},\, 0\right) \quad (21)$$

For each value of $\mathbf{b}$, $w(\mathbf{b})$ is a Gaussian random variable whose mean and variance are given by the GP regression model (Eqs. (9) and (10), respectively). Therefore, the probability density function of the improvement can be expressed in closed form, and the so-called Probability of Improvement (PI) for the minimization and maximization problems reads:

$$\mathrm{PI}^{\mathrm{LB}}(\mathbf{b}) = \Phi\!\left(\frac{w_{\min} - m(\mathbf{b})}{\sigma(\mathbf{b})}\right), \qquad \mathrm{PI}^{\mathrm{UB}}(\mathbf{b}) = \Phi\!\left(\frac{m(\mathbf{b}) - w_{\max}}{\sigma(\mathbf{b})}\right)$$

with $\Phi(\cdot)$ being the cumulative distribution function of the standard normal distribution. The PI AF is more prone to select points favouring Exploitation [21,33,37]. Indeed, it favours sampling at locations with less uncertainty to yield an improvement over the current best observation [32,33]. This behaviour can be observed in the results of the first numerical application shown in Figure 8(a).
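A minimal implementation of the two closed-form PI expressions, assuming the GP posterior mean and standard deviation have already been evaluated at a set of candidate points, could read:

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mean, std, w_min, w_max):
    # Closed-form PI for the lower-bound (minimisation) and upper-bound
    # (maximisation) problems, given the GP posterior mean and standard
    # deviation at the candidate points.
    std = np.maximum(std, 1e-12)              # avoid division by zero at training points
    pi_lb = norm.cdf((w_min - mean) / std)    # probability of improving on the observed minimum
    pi_ub = norm.cdf((mean - w_max) / std)    # probability of improving on the observed maximum
    return pi_lb, pi_ub
```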

Expected Improvement (EI)
The Expected Improvement (EI) was introduced by Močkus [34] and is used as the AF in the Efficient Global Optimization algorithm developed by Jones et al. [28]. The EI AF for the two problems is defined as the expectation of the corresponding improvement function [28,34]:

$$\mathrm{EI}^{\mathrm{LB}}(\mathbf{b}) = \mathbb{E}\!\left[I^{\mathrm{LB}}(\mathbf{b})\right], \qquad \mathrm{EI}^{\mathrm{UB}}(\mathbf{b}) = \mathbb{E}\!\left[I^{\mathrm{UB}}(\mathbf{b})\right] \quad (26)$$

The integrals in Eq. (26) can be solved by parts, leading to the following closed-form expressions [21]:

$$\mathrm{EI}^{\mathrm{LB}}(\mathbf{b}) = \left(w_{\min} - m(\mathbf{b})\right)\Phi\!\left(\frac{w_{\min} - m(\mathbf{b})}{\sigma(\mathbf{b})}\right) + \sigma(\mathbf{b})\,\phi\!\left(\frac{w_{\min} - m(\mathbf{b})}{\sigma(\mathbf{b})}\right)$$

$$\mathrm{EI}^{\mathrm{UB}}(\mathbf{b}) = \left(m(\mathbf{b}) - w_{\max}\right)\Phi\!\left(\frac{m(\mathbf{b}) - w_{\max}}{\sigma(\mathbf{b})}\right) + \sigma(\mathbf{b})\,\phi\!\left(\frac{m(\mathbf{b}) - w_{\max}}{\sigma(\mathbf{b})}\right)$$

where $\phi(\cdot)$ is the probability density function of the standard normal distribution. The EI AF is a non-negative quantity that takes the value of zero at the observed responses. This is an important property, which reflects the fact that no improvement can be expected at points at which the model has already been run [28,34]. In [28,34] it was shown that the EI yields a good balance between Exploitation and Exploration. This behaviour can be observed in the results of the first numerical application shown in Figure 8(b). The performance of the EI AF has been investigated in several references [37,52,53]. Generalizations of this AF can be found in [54]. This AF has been used for assessing response bounds in engineering applications [26,27,29].
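The corresponding closed-form EI expressions for the two bound problems can be sketched as follows (same assumptions as for the PI example above):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, w_min, w_max):
    # Closed-form EI for the lower-bound and upper-bound problems.
    std = np.maximum(std, 1e-12)
    z_lb = (w_min - mean) / std
    z_ub = (mean - w_max) / std
    ei_lb = (w_min - mean) * norm.cdf(z_lb) + std * norm.pdf(z_lb)
    ei_ub = (mean - w_max) * norm.cdf(z_ub) + std * norm.pdf(z_ub)
    return np.maximum(ei_lb, 0.0), np.maximum(ei_ub, 0.0)   # EI is non-negative
```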

Confidence Bounds (CBs)
The Confidence Bounds (CBs) are another type of AF [36,37]. The Lower Confidence Bound (LCB) and the Upper Confidence Bound (UCB) are expressed as [38,39]:

$$\mathrm{LCB}(\mathbf{b}) = m(\mathbf{b}) - \beta\,\sigma(\mathbf{b}), \qquad \mathrm{UCB}(\mathbf{b}) = m(\mathbf{b}) + \beta\,\sigma(\mathbf{b})$$

with $\beta \ge 0$. The parameter $\beta$ balances the trade-off between Exploitation and Exploration. In particular, the lower the value of $\beta$, the more Exploitation is obtained. According to [38], $\beta = 2$ yields a good compromise. More advanced strategies can be implemented to tune $\beta$, as described in [39]. The results obtained for the first numerical application, shown in Figure 8(c) with $\beta = 2$, indeed display a good balance between Exploration and Exploitation.
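A sketch of the CB acquisition functions, with the trade-off parameter denoted here by beta:

```python
import numpy as np

def confidence_bounds(mean, std, beta=2.0):
    # Lower and Upper Confidence Bounds; a small beta favours exploitation,
    # a large beta favours exploration.
    lcb = mean - beta * std
    ucb = mean + beta * std
    return lcb, ucb

# The next candidate for the LB problem is the point minimising the LCB,
# while for the UB problem it is the point maximising the UCB, e.g.:
# i_lb, i_ub = np.argmin(lcb), np.argmax(ucb)
```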

Enhanced training dataset
The new points $\hat{\mathbf{b}}^{\mathrm{LB}}$ and $\hat{\mathbf{b}}^{\mathrm{UB}}$ identified through Eqs. (18)-(19) are used to run the corresponding simulations, and the results are added to the training data to form the so-called enhanced training dataset $D_E$. In Approach A, two separate enhanced training datasets are generated at each iteration: one collecting $\hat{\mathbf{b}}^{\mathrm{LB}}$ and the corresponding response evaluation, and one collecting $\hat{\mathbf{b}}^{\mathrm{UB}}$ and the corresponding response evaluation, so that for the next step of the iterative procedure $D \leftarrow D_E$ separately for each bound. In Approach B, at each iteration a single GP is used for estimating both the LB and the UB. Therefore, both $\hat{\mathbf{b}}^{\mathrm{LB}}$ and $\hat{\mathbf{b}}^{\mathrm{UB}}$, together with the corresponding response evaluations, are added to a single enhanced training dataset.

Stopping Criteria
The stopping criteria define the conditions under which the iterations are stopped. The most obvious stopping criterion is the computational budget. This is because in engineering applications, the number of simulations that it is possible to run is usually limited by time and/or cost constraints. However, a pre-fixed computational budget may lead to stopping before the actual response bounds have been correctly identified. Alternatively, late stopping might occur, which would result in the unnecessary allocation of computational resources.
Within Bayesian optimization, it is common to use alternative stopping criteria [31,32]. One popular stopping criterion [27-29,37,38] is expressed in terms of the maximum of the AF being smaller than a specified tolerance value $\varepsilon$ [37,38]:

$$\max_{\mathbf{b}} \mathrm{AF}(\mathbf{b}) < \varepsilon \quad (33)$$

For the EI AF, it is suggested to set $\varepsilon$ to $10^{-2}$ [37]. For the CBs, the improvement is measured with respect to either the minimum or the maximum of the observed responses ($w_{\min}$ and $w_{\max}$, respectively) [38]. Both the computational budget and the criterion in Eq. (33) are considered and verified at the end of each iteration step. It is worth noting that the satisfaction of the stopping criterion in Eq. (33) does not necessarily guarantee a certain accuracy of the response bound estimates, as discussed in what follows.
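Putting the pieces together, the following sketch outlines an Approach-B-style iterative loop: a single GP (here scikit-learn's GaussianProcessRegressor with an RBF kernel, used as an off-the-shelf stand-in for the GP regression model described above) is updated with the points selected by the EI acquisition for the LB and UB problems, until either the budget is exhausted or the AF stopping criterion of Eq. (33) is met for both bounds. The AF is maximised by brute force over random candidate points, which is a pragmatic simplification; all function names and default values are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def interval_bounds_bo(model, intervals, B0, w0, budget=30, eps=1e-2, seed=0):
    # Approach-B-style sketch: one GP, EI acquisition, AF-threshold stopping.
    rng = np.random.default_rng(seed)
    B, w = np.array(B0, dtype=float), np.array(w0, dtype=float)
    r = intervals.shape[0]
    kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(r))
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                  n_restarts_optimizer=5, random_state=seed)
    for _ in range(budget):
        gp.fit(B, w)
        # Brute-force AF maximisation over random candidates (pragmatic choice).
        cand = rng.uniform(intervals[:, 0], intervals[:, 1], size=(2000, r))
        m, s = gp.predict(cand, return_std=True)
        s = np.maximum(s, 1e-12)
        z_lb, z_ub = (w.min() - m) / s, (m - w.max()) / s
        ei_lb = (w.min() - m) * norm.cdf(z_lb) + s * norm.pdf(z_lb)
        ei_ub = (m - w.max()) * norm.cdf(z_ub) + s * norm.pdf(z_ub)
        if ei_lb.max() < eps and ei_ub.max() < eps:   # stopping criterion on the AF
            break
        # Run the expensive model at the two selected points and enhance the
        # single training dataset (Approach B).
        new_pts = cand[[int(np.argmax(ei_lb)), int(np.argmax(ei_ub))]]
        B = np.vstack([B, new_pts])
        w = np.append(w, [model(b) for b in new_pts])
    gp.fit(B, w)
    # Bound estimates: extrema of the posterior mean over a dense random grid,
    # together with the corresponding 95% confidence bounds.
    grid = rng.uniform(intervals[:, 0], intervals[:, 1], size=(5000, r))
    m, s = gp.predict(grid, return_std=True)
    i_lb, i_ub = int(np.argmin(m)), int(np.argmax(m))
    return {"LB": (m[i_lb], m[i_lb] - 2 * s[i_lb]),
            "UB": (m[i_ub], m[i_ub] + 2 * s[i_ub]),
            "observed": (w.min(), w.max()), "n_runs": len(w)}
```

Approach A would differ only in maintaining two datasets and two GPs, one per bound, and adding to each only the point selected for that bound.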

Response bound estimates
Once the stopping criteria are met, the latest GP regression model can be used to obtain the prediction of the response LB and UB in terms of:

(i) the minimum and/or maximum of the observed responses, $w_{\min}$ and $w_{\max}$ (34);

(ii) the minimum and/or maximum of the mean function of the posterior distribution, $\min_{\mathbf{b}} m(\mathbf{b})$ and $\max_{\mathbf{b}} m(\mathbf{b})$ (35).

These are standard ways of expressing the results obtained with Bayesian Optimization [31,32], and (i) is usually preferred in engineering applications [26,27,29]. Nonetheless, the mean $m(\mathbf{b})$ can be interpreted as a point estimate of $w(\mathbf{b})$ for a fixed $\mathbf{b}$, and, using the frequentist interpretation, the credible interval $m(\mathbf{b}) \pm 2\sigma(\mathbf{b})$ can be interpreted as the 95% confidence interval, that is the interval in which there is a 95% probability of finding the true function $w(\mathbf{b})$.
The confidence bounds returned at the minimum and/or maximum of the mean function can be used to quantify the remaining epistemic uncertainty caused by not having evaluated a simulation at that point. Therefore, each bound obtained with Approach A or Approach B can be expressed as an interval (36). The lower bound of the LB response, $\underline{w}_{\mathrm{LB}}$, and the upper bound of the UB response, $\overline{w}_{\mathrm{UB}}$, are, respectively:

$$\underline{w}_{\mathrm{LB}} = m(\mathbf{b}^{*}_{\mathrm{LB}}) - 2\sigma(\mathbf{b}^{*}_{\mathrm{LB}}), \qquad \overline{w}_{\mathrm{UB}} = m(\mathbf{b}^{*}_{\mathrm{UB}}) + 2\sigma(\mathbf{b}^{*}_{\mathrm{UB}}) \quad (37)$$

where $\mathbf{b}^{*}_{\mathrm{LB}}$ and $\mathbf{b}^{*}_{\mathrm{UB}}$ denote the combinations of the interval variables at which the minimum and the maximum of the posterior mean function are attained. These combinations, together with the combinations $\mathbf{b}_{\min}$ and $\mathbf{b}_{\max}$ at which the observed responses $w_{\min}$ and $w_{\max}$ have been obtained, can be considered as position vectors. The Euclidean distance between each pair of vectors, evaluated along each interval variable, can then be considered:

$$d_i^{\mathrm{LB}} = \left| (\mathbf{b}^{*}_{\mathrm{LB}} - \mathbf{b}_{\min}) \cdot \mathbf{e}_i \right| \quad (38)$$
$$d_i^{\mathrm{UB}} = \left| (\mathbf{b}^{*}_{\mathrm{UB}} - \mathbf{b}_{\max}) \cdot \mathbf{e}_i \right| \quad (39)$$

where $\mathbf{e}_i$ is the unit vector along the length of each interval variable, $i = 1, 2, \ldots, r$, and the dot indicates the scalar product. If the distance in Eq. (38) or Eq. (39) is large, the bound estimate obtained might be inaccurate, since the true response bound might occur in a part of the input interval variables domain that has not been sufficiently explored. As a rule-of-thumb, a distance greater than 2% of the length of the corresponding input interval variable should be investigated.
Therefore, it should be verified that:

$$d_i^{\mathrm{LB}} \le 0.02\,(\overline{b}_i - \underline{b}_i), \quad i = 1, 2, \ldots, r \quad (40)$$
$$d_i^{\mathrm{UB}} \le 0.02\,(\overline{b}_i - \underline{b}_i), \quad i = 1, 2, \ldots, r \quad (41)$$

Moreover, the confidence interval obtained with the GP at the point identified by the AF should not extend below $\underline{w}_{\mathrm{LB}}$ (42) or above $\overline{w}_{\mathrm{UB}}$ (43), where $\underline{w}_{\mathrm{LB}}$ and $\overline{w}_{\mathrm{UB}}$ are defined in Eq. (37). A warning flag should be raised if Eq. (40) and Eq. (42) and/or Eq. (41) and Eq. (43) are not simultaneously met. It is important to note that these conditions might not be satisfied for Approach A, even when the correct bound has been identified.
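A possible implementation of these checks is sketched below, combining the 2% distance rule above with the 5% magnitude rule discussed next. The exact inequalities of Eqs. (40)-(47) could not be fully recovered from the source, so the function reflects one plausible reading of them and its names are assumptions.

```python
import numpy as np

def bound_estimate_checks(b_mean_opt, b_obs_opt, m_at_mean_opt, w_obs_opt,
                          intervals, tol_dist=0.02, tol_mag=0.05):
    # b_mean_opt: location of the extremum of the posterior mean function.
    # b_obs_opt:  location of the corresponding best observed response.
    # m_at_mean_opt, w_obs_opt: posterior mean value there and best observed response.
    lengths = intervals[:, 1] - intervals[:, 0]
    # Distance check: within 2% of each interval length (cf. Eqs. (40)-(41)).
    dist_ok = np.all(np.abs(np.asarray(b_mean_opt) - np.asarray(b_obs_opt))
                     <= tol_dist * lengths)
    # Magnitude check: posterior mean within 5% of the best observation (cf. Eqs. (46)-(47)).
    mag_ok = abs(m_at_mean_opt - w_obs_opt) <= tol_mag * abs(w_obs_opt)
    # A warning flag should be raised if either check fails.
    return dist_ok, mag_ok
```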
After finding a local optimum, the AF might suggest exploring regions of the objective function which are yet unexplored, to identify either the true maximum or the true minimum of the objective function. For example, it is possible to observe this effect in the results obtained in the first numerical application at iteration 5 of Figure 8. The conditions above are important to ensure that the bound estimates are obtained for values of the interval variables near those at which $w_{\min}$ and $w_{\max}$ have been observed. However, it is also important to compare the magnitude of the mean function of the posterior distribution obtained at these points.
As a rule-of-thumb, a variation in magnitude of the mean function of the posterior distribution higher than 5% might indicate a highly non-linear response function. Therefore, it should be simultaneously verified that the posterior mean evaluated at the bound-estimate location and the corresponding best observed response differ by less than 5% (Eqs. (46) and (47)). It is worth noting that the conditions expressed in Eq. (46) and Eq. (47) […]

Numerical applications

When applying Approach A and Approach B, the optimization problems were addressed with robust strategies which are less efficient than advanced optimization solvers. In particular, the Matlab function "patternsearch" [56] was used for the log-likelihood minimization (by using Eq. (13)) in order to determine the parameters of the kernel function, and the following ranges were considered: […]

Single Degree of Freedom with a single interval uncertainty
Let us consider a mass-spring-damper system characterized by a mass, a damper and a spring whose stiffness is described by the interval variable $k^{\mathrm{int}}$; the system is excited over a time window of […] seconds and is subject to zero initial conditions. The dynamic response is governed by the well-known second-order differential equation [57]. The response variable of interest is the maximum of the absolute acceleration time history, $w(k) = \max_t |\ddot{x}(t;k)|$, obtained for a fixed value of k. The Transition Matrix Method integration scheme [58] was used, and a time-step of $\Delta t = 10^{-3}$ s was considered.
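Since the numerical values of the mass, damping, excitation and time window could not be recovered from the source, the sketch below uses placeholder values and scipy's general-purpose integrator in place of the Transition Matrix Method, purely to illustrate how the deterministic response w(k) is evaluated for a fixed stiffness.

```python
import numpy as np
from scipy.integrate import solve_ivp

def max_abs_acceleration(k, m=1.0, c=0.5, t_end=5.0, dt=1e-3):
    # Deterministic SDOF model: m*x'' + c*x' + k*x = f(t), zero initial
    # conditions; returns max_t |x''(t)| for a fixed stiffness k.
    # All numerical values here are hypothetical placeholders.
    def f(t):
        return 1.0                                   # assumed unit step force
    def rhs(t, y):
        x, v = y
        return [v, (f(t) - c * v - k * x) / m]
    t_eval = np.arange(0.0, t_end, dt)
    sol = solve_ivp(rhs, (0.0, t_end), [0.0, 0.0], t_eval=t_eval, max_step=dt)
    acc = (f(sol.t) - c * sol.y[1] - k * sol.y[0]) / m
    return np.max(np.abs(acc))

# w(k) can then be evaluated for any k drawn from the interval k_int, e.g.:
# w = max_abs_acceleration(k=40.0)
```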
The SM was applied by subdividing $k^{\mathrm{int}}$ into n = 300 uniform subintervals. The target response function $w(k)$, shown in Figure 3, was evaluated by performing 301 simulations. From Figure 3, it is possible to observe that the SM yields very accurate predictions of the response UB and LB. The response bounds yielded by the VM (obtained by setting k to its maximum and minimum values) instead underestimate the UB and overestimate the LB. This is because $w(k)$ is not a monotonic function of the interval variable k, as assumed by the VM. In the next subsection, the results yielded by the SM and the VM approaches are compared when considering six different deviations of the uncertain variable with respect to its nominal value, to investigate the effects of non-monotonicity on the response bounds.

Non-monotonicity effects on the response bounds
Let us indicate with $\Delta b$ the deviation of the interval variable $k^{\mathrm{int}}$ with respect to its nominal value. For the smallest deviations considered, the response function $w(k)$ is a monotonic function of k; therefore, the results obtained with the SM and the VM coincide. This is not the case for the larger deviations: the response bounds predicted by the VM become inaccurate because of the non-monotonicity of $w(k)$ with respect to k. In particular, for $\Delta b = 0.3$ the VM overestimates the LB by 27.78% and underestimates the UB by 13.60%.

Initial training dataset
The interval is partitioned into two parts, leading to three values of the interval variable (the two endpoints and the midpoint) at which the response is evaluated to build the initial training dataset.

Approach A with fixed computational budget: investigation on AFs choice
The computational budget is fixed to 33 simulations, 3 of which were used for building the initial training dataset. The rest of the budget is equally shared between the estimation of the LB and of the UB. The results obtained with Approach A for different AFs are shown in Figures 6, 7 and 8. Figure 6 shows the maximum value of the AF and the percentage error of the maximum/minimum of the posterior mean function predicted at each iteration with respect to the exact bounds obtained with the SM. Figure 7 and Figure 8 show the GP mean function and 95% confidence interval obtained at several iterations for predicting the LB and UB, respectively. The blue and red vertical solid lines represent the location of the next candidate point identified by the chosen AF for minimizing (blue) and maximizing (red) the objective function. In conclusion, for the case under investigation, the stopping criterion on the AF was successfully used to reduce the number of simulations to be run without compromising the accuracy of the bound estimates when considering Approach A. Moreover, the EI and the CBs showed similar performance.

Results comparison for the last iteration
The results obtained with Approach A and Approach B when considering the stopping criterion on the AF are summarised in Table 1.

Effect of initial training dataset on Approach A and Approach B
Let us consider n equal partitions of the interval variable, leading to n+1 values of the interval variable at which the response has to be evaluated in order to define the initial training dataset. These results can be explained by considering that, by evaluating simulations which are not specifically selected to improve the response bound estimates, the regression model of the overall response would improve globally, but this will not necessarily speed up the search of the response bounds. While in Approach A this problem is related only to the size of the initial training dataset, in Approach B this effect is also caused by the fact that a single GP is updated to solve for both the upper and lower bounds. In Figure 17, the results expressed in terms of the intervals on the bound estimates and the minimum and maximum of the observations when considering different initial training datasets are shown. It can be observed that increasing the size of the initial training dataset does not necessarily lead to an improvement of the bound estimates. No warning was raised; indeed, the results reported in Figure 17 show a good accuracy in the prediction of the bounds.

Plate-acoustic-cavity system with four interval variables
Enclosures with one flexible wall excited by an external loading are commonly used to provide insights into a variety of structural-acoustic problems [55]. In this section, we will investigate the vibro-acoustic response of a flexible simply-supported rectangular plate coupled to a rigid-wall acoustic cavity, driven by a single unit harmonic force applied to the plate. This problem is of interest since the plate vibration will produce an acoustic pressure in the cavity, which needs to be assessed and kept below specific thresholds to satisfy noise level requirements. The cavity contains air, and therefore the plate and cavity are weakly coupled; the modal-interaction approach is used to evaluate the responses of interest. Let us consider the flexible simply supported rectangular aluminium plate coupled with an acoustic cavity with 5 rigid and perfectly reflecting walls, as described in [55], whose model parameters are summarised in Table 2.

Table 2. Plate-acoustic-cavity system model properties, as reported in [55].
The plate thickness, plate Young's modulus, fluid density and fluid speed of sound are assumed to be described by interval variables and are summarised in Table 3 [55]. For the present investigation, the response variable of interest is the Sound Pressure Level (SPL) at the microphone location calculated at 109 Hz. This frequency is close to the first resonance of the plate when the nominal values are considered. Therefore, it is anticipated that the uncertainties in the plate thickness will strongly affect the SPL at the frequency of interest.

Reference [55] provides the Ansys FE model of this system for the nominal properties indicated in Table 3, considering a frequency range of 1-400 Hz. In particular, FLUID20 elements were used for the acoustic domain, while SHELL181 elements were used for the plate [55]. The element size was set to 0.025 m [55]. More details on the FE model can be found in [55]. For a given set of the interval variables (e.g. considering the nominal values), this relatively simple model can take several hours to run [55]. In the same reference [55], this FE model is verified against a much faster numerical model. To reduce the computational cost of the current investigations, the fast numerical model available in [55] is used. Damping is included in the calculation by means of the correspondence principle of linear viscoelasticity [59], which allows solving the damped problem by replacing the stiffness matrix of the uncoupled plate and of the uncoupled acoustic cavity with the corresponding complex stiffness $\mathbf{K}(1 + i\eta)$, where $\eta$ is the loss factor and $\mathbf{K}$ is the stiffness matrix. The loss factors of the plate and of the cavity are set to 0.03. The results obtained by applying the SM (40 subintervals for each interval variable) and the VM are shown in Table 4. Each input interval variable was scaled so that all the intervals were of a similar magnitude, and the objective function was modified accordingly. It is possible to observe that the LB occurs at one of the input bounds, and thus it is correctly evaluated with the VM, while the estimate of the UB obtained with the VM is 57% lower than that obtained with the SM. This is because of the resonance effect. It is also worth noting that the results obtained at the nominal values […]

Approach A and B with fixed computational budget and EI as AF
Approach A and Approach B are applied with the EI as the AF. A total computational budget of 100 simulations is considered in addition to the simulations evaluated for building the initial training dataset. In particular, four initial training datasets were considered, built by running 8, 9, 16 and 25 simulations. These simulations were selected by applying the Taguchi method, and the sets of input variables considered for each initial training dataset are specified in the appendix. The 100 simulations were divided equally between the LB and the UB, unless the stopping criterion on the AF was met for one bound. For this numerical application, only for $D_{16}$ was the stopping criterion on the EI AF met at the last iteration, as shown in Figure 18.

The number of simulations considered for each analysis is summarised in Table 5. Only for Approach B with $D_{16}$ is the stopping criterion on the EI AF met for both the LB and the UB. The intervals on the bound estimates (minimum and maximum of the mean function) and the minimum and maximum of the observations for the four initial training datasets are shown in Figure 19. It can be observed that, as expected, the bound estimates consistently over-predict the maximum of the deterministic observations, and consistently under-predict the minimum of the deterministic observations. Therefore, the bound estimates provide more conservative results.
Moreover, as for the previous numerical application, increasing the number of elements within the initial training dataset does not necessarily lead to an improvement of the bound estimates. It can also be observed that Approach B might largely under-predict the LB. However, a different pattern is observed for the UB. The most interesting results are obtained for $D_9$, where the maximum of the mean function obtained with both Approach A and Approach B under-predicts the upper bound, and the confidence interval misleadingly suggests trusting those estimates. As a result, the condition expressed in Eq. (47) would be met. For Approach B with $D_{16}$, a relatively good agreement between the prediction and the exact bounds can be observed in Figure 19.
A closer look at the absolute percentage error on the bound estimates obtained at the last iteration of Approach A and Approach B, when compared to the SM results, is shown in Figure 20. It can be observed that the percentage error can be very large, especially for the lower bound, and that it does not necessarily decrease for an increasing number of initial simulations or total number of simulations. The results obtained for Approach B with $D_{16}$ satisfied the stopping criterion on the EI, yet they display non-negligible percentage errors on the bound estimates. Since this information on the percentage error is considered not available (that is, assuming the full SM analysis was not carried out), the conditions described in Section 2.8 were assessed at the last iteration for each interval variable (plate thickness, plate Young's modulus, fluid density, fluid speed of sound). The results are summarized in Tables 6 and 7, where the letter Y is used to indicate when the condition was met; otherwise, the letter N was used. […] It is worth noting that when the conditions expressed in Eq. (46) and Eq. (47) are met […] (Figure 20b), the two approaches nonetheless yield different bound estimates. This can also be observed in Tables 8 and 9, where the positions of the identified LB and UB obtained with Approaches A and B are reported. From these tables, it is possible to observe errors smaller than 3% for all the input interval variables, which nonetheless lead to large variations in the response bounds.

Table 9. Error on LB and UB locations for four initial training datasets (Approach B, EI).

Approach A and Approach B with fixed computational budget, CBs AF
Also for the case of the CBs as AF, 100 simulations in addition to each initial training dataset were considered. The stopping criterion on the AF for the LB was met for both approaches, while the stopping criterion on the AF was not met for the UB, as shown in Figure 21. The total number of simulations used in Approach A and Approach B is reported in Table 10. The conditions described in Section 2.9 were considered also in this case. The results are summarized in Tables 11 and 12, where the letter Y is used to indicate when the condition was met; otherwise, the letter N was used.

Investigation on the AF stopping criteria on CBs
The results obtained when considering $D_8$ and the stopping criterion on the CBs AF are reported in Table 15.

Table 15. Results obtained for the initial training dataset $D_8$ using the stopping criterion on the CBs.

Discussion
The results presented in Section 5 indicate that:
i. The proposed approaches led to a drastic reduction of the number of simulations to be carried out in order to obtain accurate response bound estimates compared to the Subinterval Method.
ii. Increasing the number of simulations in the initial training dataset does not necessarily lead to a reduction of the errors of the response bound estimates when considering a fixed computational budget, nor would it necessarily reduce the total number of simulations to be carried out to meet the stopping criterion on the AF. This is because these simulations are chosen a priori, when no knowledge about the relationships between the input and output interval variables is available. Therefore, these initial simulations would potentially improve the initial GP regression model overall, but not necessarily greatly help with the selection of the points needed to improve the prediction of the response bounds. As a result, it is recommended to consider a limited number of simulations for the initial training dataset, using for example the Taguchi method or other Design of Experiments techniques, and to let the algorithm automatically select the points to be included in the enhanced training dataset using an AF that has a good balance between exploration and exploitation.
iii. The choice of the AF can have a large impact on the total number of simulations to be carried out and on the accuracy of the bound estimates, even when many simulations are included in the initial training dataset. As expected, it was shown that the Probability of Improvement should be avoided because it favours exploitation rather than exploration. The Expected Improvement (EI) and the Confidence Bounds (CBs) Acquisition Functions have a good balance between exploitation and exploration and performed well. Interestingly, the CBs outperformed the EI in most of the cases investigated. Therefore, the CBs should be preferred.
iv. Using the stopping criterion in terms of the AF is not enough to ensure accurate estimates of the response bounds. It was shown that the conditions based on the two metrics proposed were able to flag the need for further simulations to increase the accuracy of the response bounds.

Conclusions
Two non-intrusive uncertainty propagation approaches have been proposed for the analysis of a generic engineering system subject to interval uncertainties. One of the main advantages of the proposed approaches is interpretability, since the selection of the next combinations of interval variables to be investigated is justified in terms of the trade-off between exploration and exploitation considering a probabilistic model that embeds both user knowledge and noise-free observations obtained with a physics-based model. Another advantage relates to the fact that the results are explainable and can be used to justify the decision of running additional simulations.
Moreover, the bounds predicted with the proposed approaches are conservative with respect to the current observations available, since the predicted lower bound lies below the minimum of the observed responses and the predicted upper bound lies above their maximum. The proposed approaches are generally applicable to the analysis of engineering systems subject to interval uncertainties for which a large-scale model is available. However, it is well known that GP approximations are best suited for dealing with a continuous domain with a small number of dimensions, usually fewer than 20 [31]. This is because the larger the dimension of the continuous domain, the more training points are needed to fully explore the input-output relation. As a result, this might restrict the applicability of the approaches to a certain number of uncertainties. Current work is focusing on investigating strategies for addressing this dimensionality issue.