ASAMS: An Adaptive Sequential Sampling and Automatic Model Selection for Artificial Intelligence Surrogate Modeling

Surrogate Modeling (SM) is often used to reduce the computational burden of time-consuming system simulations. However, continuous advances in Artificial Intelligence (AI) and the spread of embedded sensors have led to the creation of Digital Twins (DT), Design Mining (DM), and Soft Sensors (SS). These methodologies represent a new challenge for the generation of surrogate models, since they require elaborate artificial intelligence algorithms while minimizing the number of physical experiments performed. To reduce the number of assessments of a physical system, several adaptive sequential sampling methodologies have been developed; however, they are for the most part limited to Kriging models and Kriging-based Monte Carlo simulation. In this paper, we integrate a distinct adaptive sampling methodology into an automated machine learning (AutoML) methodology to assist in the process of model selection while minimizing the number of system evaluations and maximizing the performance of surrogate models based on artificial intelligence algorithms. In each iteration, this framework uses a grid search algorithm to determine the best candidate models and performs leave-one-out cross-validation to calculate the performance at each sampled point. A Voronoi diagram is applied to partition the sampling region into local cells, and the Voronoi vertices are considered as new candidate points. The performance at the sampled points is used to estimate the accuracy of the model at a set of candidate points and to select those that will improve the model's accuracy the most. Then, the number of candidate models is reduced. Finally, the performance of the framework is tested on two examples to demonstrate the applicability of the proposed method.


Introduction
Many science and engineering fields rely on computer simulations to replace expensive physical experimentation when analyzing and improving the quality of different designs, methodologies, or products. Continuous research on numerical simulation has narrowed the gap between a physical system and its model. Nevertheless, this improvement comes at a cost in time due to the complexity of such numerical models. Surrogate modeling has become a solution for approximating the expensive numerical simulations of complex systems used to solve heavily iterative problems, such as optimization problems, achieving acceptable accuracy at a low computational cost.
Surrogate modeling has been incorporated in multiple fields. In [1], the authors develop a multi-fidelity surrogate model for a microwave component. In [2] the authors use a surrogate Kriging

Related Work
All surrogate modeling techniques share the same objective: representing the target system as accurately as possible. This objective can be achieved by increasing the size of the training dataset to obtain a better understanding of the complete system, as models constructed using bigger datasets record better accuracy [17]. However, this approach can become very expensive due to the computational or economic cost of sampling the target plant. In these cases, a second objective is introduced: minimizing the cost associated with measuring the target system. These surrogate modeling cases are stated as multiobjective optimization problems whose goals are to build a model that is as accurate as possible with as few points as possible. The search for the best compromise has been studied from different angles. The first approach, meta-modeling, focuses on tailoring the model that will be used to emulate the target system. This is done through careful model selection and hyperparameter tuning. The performance of the most prominent methods for surrogate creation depends strongly on the correct setting of many internal hyperparameters [18], and the correct selection of the modeling method is critical for achieving good performance [19]. The second approach, sampling, focuses on a sampling strategy that determines the best data points at which to measure the target phenomenon. It has been proven that the choice of observed points is crucial to the prediction quality of the model.
The meta-modeling approach addresses the strong sensitivity to design decisions made during the construction of machine learning surrogates. The main problems are model selection and hyperparameter tuning. The field of AutoML aims to make these decisions in an automated way [20]. Every machine learning system has hyperparameters, and the most basic task of AutoML is to optimize these parameters automatically; this is referred to as automatic hyperparameter optimization (HPO). The traditional way of performing HPO is the grid search algorithm [21], which exhaustively evaluates every combination in a grid of parameter values and compares them via a performance metric. Even though this is an old algorithm, it is relatively simple and is still one of the most used nowadays [22][23][24].
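A minimal HPO sketch of the kind of exhaustive grid search discussed above, using scikit-learn's GridSearchCV; the grid values here are illustrative assumptions, not any specific configuration from the paper.

```python
# Exhaustive grid search for hyperparameter optimization (HPO):
# every combination in the grid is fitted and cross-validated.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_regression(n_samples=60, n_features=4, noise=0.1, random_state=0)

grid = {"C": [0.1, 1.0, 10.0],          # 3 x 2 x 2 = 12 candidate configurations
        "epsilon": [0.01, 0.1],
        "kernel": ["rbf", "linear"]}
search = GridSearchCV(SVR(), grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

Each configuration is scored by cross-validation, and the best-scoring one is retained.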
The sampling approach is a crucial process in constructing an accurate surrogate model. In [25], the authors classified the main sampling approaches into two categories: one-shot sampling and adaptive sequential sampling. One-shot sampling methods generate sampling points through different design of experiments (DoE) methods. Their objective is to allocate the sampling points as uniformly as possible in the design space. Classic DoE methods include Factorial Designs [26], Central Composite Design (CCD) [27], Monte Carlo Sampling (MCS) [28], and Latin Hypercube Design (LHD) [29]. Adaptive sampling distributes more points in the regions of interest by analyzing the performance of the surrogate model on previous data. In comparison, adaptive sampling performs better than one-shot sampling, having great potential to build accurate meta-models with fewer points [30]. For complex systems, sampling more points where the surrogate model has large prediction errors increases the accuracy with fewer points than would be needed if the points were sampled evenly; that is, focusing on regions with large prediction errors (regions of interest) allows the sampling process to adapt to the target function. To improve surrogate model accuracy, the selection of the regions of interest must balance two conflicting parts: (1) local exploitation, using the model to find regions with large prediction errors, and (2) global exploration, to discover interesting regions that have not been sampled before [31].
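As an illustration of one-shot sampling, the sketch below draws a Latin Hypercube design scaled to a two-dimensional design space using SciPy's quasi-Monte Carlo module (the library choice and bounds are assumptions for illustration).

```python
# One-shot sampling: a Latin Hypercube design scaled to the design space.
import numpy as np
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=2, seed=0)
unit_points = sampler.random(n=8)                           # 8 points in [0, 1)^2
points = qmc.scale(unit_points, [0.0, -1.0], [10.0, 1.0])   # per-dimension bounds
print(points.shape)
```

Unlike adaptive sampling, all points are chosen up front, with no feedback from a surrogate model.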
The current adaptive sampling approaches are classified into four categories based on how prediction errors are represented. Variance-based adaptive sampling [31][32][33][34] uses the estimations of statistical models, mainly the Kriging model, to detect the regions of interest of regression models. The query-by-committee (QBC) strategy [31,35,36] uses the predictions of multiple competing surrogate models as a committee to predict the response at a candidate point. Cross-validation (CV) based adaptive sampling [37][38][39] estimates the prediction error at a candidate point using a cross-validation process. Finally, gradient-based adaptive sampling [40][41][42] uses the local gradient information of the model to represent the prediction errors. All adaptive sampling methodologies use a particular local exploitation method to search for new critical samples according to the meta-model performance. The variance-based and gradient-based approaches use model-specific information, such as the model gradient or the model hyperparameters, which creates a model dependence. This limits the models that can be used for surrogating, mainly to Kriging models, while query-by-committee and cross-validation can be applied to different types of surrogate models.
The main limitation of the current adaptive sampling methods is that they seek to identify the points that most favor a predetermined meta-model based on its performance [37][38][39][40][41][42]. However, it is not easy to determine whether the points were selected due to the target system's behavior or due to an intrinsic limitation of the selected meta-model. Ideally, all the selected samples are critical because of the complex behavior of the target system; however, when they are evaluated through their performance in the surrogate model, it is not possible to differentiate whether they are complex in themselves or only complex for the selected model. This issue raises the following question: is the selected meta-model, with its current hyperparameter selection, suitable for representing the target system, or can a different meta-model with different tuning improve the performance of the surrogate? It also raises the question of whether it is possible to generate new sample points through an adaptive sampling methodology that is independent of the meta-model used to evaluate their performance.
To solve this problem, we propose an adaptive sampling methodology that combines the CV and QBC approaches with a grid search algorithm to generate new sampling points while simultaneously performing model selection and hyperparameter tuning.

Proposed Method (ASAMS)
The proposed methodology is named Adaptive Sampling and Automatic Model Selection (ASAMS). It consists of an AutoML methodology to adjust the hyperparameters of the AI algorithms, and an adaptive sampling method that combines the two model-independent methods, CV and QBC. The AutoML method is complemented with a reduction process based on the elitism mechanism of evolutionary methods. The flow diagram of the methodology consists of the modules shown in Figure 1, explained in detail below.
1. Parameter Selection and Constraints. In this stage, the problem statement is carried out, determining the design parameters and constraining the design space. See Section 3.2 for more details.
2. Design of Experiments. To initialize the construction of the surrogate model, a small number of initial training points are generated using a one-shot sampling method or DoE. In this work, we use full factorial sampling, but any other method can be applied.
3. Plant Evaluation. In this stage, the response of the plant to each training point is measured and assigned as a target for the surrogate training process. The plant evaluation can be online, as in mathematical models, computational simulations, or physical online measurements (as in DT), or offline, as in DM. In this paper, we analyze two problems that use an online implementation: one mathematical model and one multiphysics computational simulation.
4. Model Selection and Hyperparameter Tuning. In this step, an AutoML algorithm performs algorithm selection and hyperparameter tuning for each candidate algorithm. See Section 3.3 for more details.
5. Reduced Model Selection and Hyperparameter Tuning. This step is similar to Model Selection and Hyperparameter Tuning with one difference: after the first iteration, the number of candidate models is reduced through an elitism mechanism; this step performs the hyperparameter tuning and model selection on the reduced set of candidates.
6. Cross-Validation. This process returns the best candidates and the validation score for each candidate model obtained by cross-validation.
7. Stop Learning Conditions. In this step, the methodology checks whether the algorithm has met any stop criterion. In this proposal, we consider three different stop conditions. See Section 3.4 for more details.
8. Reduced Model Selection. Selecting a suitable model and the correct hyperparametrization can be seen as an exploration-exploitation problem. During the Model Selection and Hyperparameter Tuning step, the design space is explored, and in the Reduced Model Selection phase, we propose an exploitation mechanism to reduce the search space. See Section 3.5 for a detailed explanation.
9. Adaptive Sampling. In this step, a novel mechanism of adaptive sampling that combines CV and QBC generates new training points through a Voronoi approach. See Section 3.7 for a detailed explanation of the contribution.
We developed the ASAMS algorithms mainly in Python [43], with the idea of integrating them with computer-aided engineering (CAE) and computer-aided design (CAD) software. We performed the integration through Matlab® [44]. As CAE software, we selected COMSOL Multiphysics® [45], and as CAD software we used SolidWorks® [46]. The software and experiments are available on GitHub [47]. Some references are made to the code implementation; however, the methodology is independent of the software selection.

Formal Problem Statement
As mentioned in Section 2, the surrogate model creation problem can be stated as an optimization problem that consists of minimizing the error between the meta-model prediction and the real measurement, Equation (1), and minimizing the number of experiments M, Equation (2), subject to the algebraic and inequality restrictions on the design parameters, Equations (3) to (12):

Hp_(v,t) = {hp_(1,t), hp_(2,t), hp_(3,t), ..., hp_(k_t,t)},  k_t = 1, 2, 3, ..., K_t    (7)

The target error, Equation (1), is calculated using the outputs of the surrogate model Ȳ_(i,t) and the corresponding targets of the testing dataset Y_i of N elements, in which every element is a set of S outputs, as shown in Equation (9). The i-th output of the surrogate model Ȳ_(i,t) is a function of the selected model F_t performance, Equation (3), which depends on the training dataset T_d used to fit the model, the hyperparameters Hp_(v,t) selected for the model, and the i-th input vector X_i of the testing dataset, Equation (8). The other target is the minimization of the number of points M of the training dataset T_d, shown in Equation (10).
The design variables selected for this optimization problem are the surrogate model algorithm F_t, chosen from a set F̂T of T possible models, stated in Equation (4). The hyperparameter sets Hp_(v,t), shown in Equation (7), are used to configure the F_t model. Each hyperparameter set can have K_t different hyperparameters, and each hyperparameter hp_(k,t) is selected from a set P_(k_t,t) of possible hyperparameter values, Equation (11).
Finally, we must consider that all the model inputs are constrained, as represented by Equation (12). The nomenclature used in this section is shown at the end of this paper.

Parameter Selection and Constraints
A correct selection of the design parameters and a constrained search space are crucial factors for developing an accurate surrogate model. In this step, we define the size M and the data points of the initial dataset T_d, as well as the size N and the input X_i and output Y_i vectors of the testing dataset. Finally, we define the number J, the type, and the constraints of each input parameter using a JSON notation built as a list of objects representing each design parameter. In this proposal, we consider three different types of input variables: continuous, discrete, and categorical. Continuous and discrete variables are constrained by a minimum and a maximum value, while categorical ones are constrained to a set of values. Categorical variables are defined by two properties: set and table. Set is described by a list of categories, and table has a list of objects associated with their corresponding lists of values. Listing 1 shows an example of these parameters.
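Listing 1 is not reproduced here; the sketch below shows what such a JSON parameter description might look like for the three variable types, parsed from Python. The parameter names, values, and exact field layout are illustrative assumptions, not the paper's actual listing.

```python
# Hypothetical JSON parameter description with the three variable types:
# continuous/discrete bounded by min/max, categorical with "set" and "table".
import json

params = json.loads("""
[
  {"name": "current", "type": "continuous", "min": 0.1, "max": 2.0},
  {"name": "turns",   "type": "discrete",   "min": 10,  "max": 200},
  {"name": "coreId",  "type": "categorical",
   "set": ["E25", "E30"],
   "table": {"E25": [25.0, 12.5, 6.25], "E30": [30.0, 15.0, 7.5]}}
]
""")
print([p["type"] for p in params])
```

Here the categorical `table` maps each category to its associated list of values (e.g., a core geometry triple), mirroring the set/table description above.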

Model Selection and Hyperparameter Tuning
The main drawback of grid search is its high dimensionality and time cost. Still, its required execution time is relatively short compared with complex computational simulations or physical experimentation. This motivated us to incorporate an exhaustive grid search for hyperparameter tuning over multiple algorithms, which allows the algorithm to perform hyperparameter tuning and model selection simultaneously. With this, we seek to give the surrogate model the greatest possibility of obtaining a suitable performance with the minimum number of plant evaluations. In our implementation, we selected a set F̂T of three machine learning algorithms: a support vector machine regressor (SVM) [48], a random forest regressor (RF) [49], and a Bayesian ridge model (BR) [50], as shown in Equation (13), although this methodology can be generalized to any set of machine learning algorithms.
We also propose hyperparameter values for each algorithm in Listing 2. The formal statement of the hyperparameters is shown in Equations (14) to (28).
The combinations of all the algorithms are represented in Ĥp, shown in Equations (29)-(32).
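A sketch of this simultaneous model selection and hyperparameter tuning: one exhaustive grid per candidate algorithm, with the best cross-validation score deciding the winner. The grids are illustrative assumptions, not the paper's exact Listing 2.

```python
# Model selection + HPO over the three candidate algorithm types
# (SVM, RF, BR), each with its own hyperparameter grid.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_regression(n_samples=80, n_features=5, noise=0.1, random_state=0)

candidates = {
    "SVM": (SVR(), {"C": [1.0, 10.0], "kernel": ["rbf", "linear"]}),
    "RF": (RandomForestRegressor(random_state=0), {"n_estimators": [50, 100]}),
    "BR": (BayesianRidge(), {"alpha_1": [1e-6, 1e-4]}),
}
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X, y)
    results[name] = search.best_score_      # best CV score for this model type
best = max(results, key=results.get)        # model type with the best score
print(best)
```

Running the grids side by side yields both the best model type and its best hyperparameter set from the same cross-validation pass.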

Stop Conditions of Learning
We stop the surrogate optimization process using three different criteria. The first criterion is the accuracy assessment (acc): it checks whether the surrogate has achieved the desired performance through an error comparison using Equation (33),
where ε_m = [ε_m1, ε_m2, ..., ε_mi] is the vector of errors ε_mi for each of the i models and ε_t is the target error. The second criterion limits the computational power expended by the search algorithms by constraining the maximum number of iterations (max_I). The third criterion considers the cost in time or money of each plant evaluation, limiting the maximum number of points (max_P) that can be evaluated.
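The three criteria can be sketched as a single predicate; the function and argument names are illustrative, and the accuracy check assumes the best (minimum) model error is compared against the target, which is one plausible reading of Equation (33).

```python
# Sketch of the three stop criteria: accuracy, iteration budget, point budget.
def should_stop(model_errors, n_points, iteration,
                target_error=1e-3, max_points=730, max_iters=100):
    """Return True if any stop criterion is met."""
    accuracy_met = min(model_errors) <= target_error   # best model good enough
    return accuracy_met or iteration >= max_iters or n_points >= max_points

print(should_stop([0.5, 0.2], n_points=64, iteration=3))
```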

Reduced Model Selection
The proposed sequential methodology changes the number of training points in each iteration, which gives more information to the machine learning models and enables changing the model that best fits the target problem. However, the grid search algorithm suffers from the curse of dimensionality, and it is necessary to discard the models that are too far from representing the intended phenomenon. This characteristic can be interpreted as an exploration-exploitation problem in which we want to keep exploring the models that still have a chance to fit the system while focusing on exploiting the ones with better performance. The exploration process through Model Selection and Hyperparameter Tuning was discussed in Section 3.3, in which the proposed structure of models and hyperparameters represents a total of 501 models. However, testing all the models in each iteration of ASAMS is too computationally expensive. This motivated us to include a mechanism for reducing the number of candidate models in each iteration.
We developed an exploitation mechanism based on elitism, named Reduced Model Selection, whose main objective is to reduce the number of models explored via grid search in each iteration. As a first step, we rank the models in F̂T for all ĤP_t hyperparameter combinations, taking advantage of the CV score obtained by the grid search method. Then, after each hyperparameter set ĤP_t is sorted, we remove the W_t hyperparameter combinations of each model F_t that had the worst performance. The number of combinations W_t to be removed in each iteration is a function of the keepRate hyperparameter of ASAMS, shown in Equation (34).
As an example, we propose keepRate = 0.6, which means that we keep 60% of the hyperparameter combinations, using the method described in Section 3.3. We estimate the W_t values for three subsequent iterations in Equations (35) to (43).
In the first iteration, we assume that we have the 501 initial models and reduce them according to the proposed keep ratio, as indicated in Equations (35)-(37).
In the second iteration, we remove the models discarded in the first iteration, which leaves 36 SVM, 216 RF, and 49 BR models. We then reduce them further according to the proposed keep ratio, as indicated in Equations (38)-(40).
In the third iteration, we remove the models discarded in the first and second iterations, which leaves 22 SVM, 158 RF, and 29 BR models. We then reduce them further according to the proposed keep ratio, as indicated in Equations (41)-(43).
This process is repeated for all iterations.
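The elitism reduction can be sketched as follows. The initial split of 60 SVM, 360 RF, and 81 BR hyperparameter sets is an assumption consistent with the 501-model total and the first-iteration counts above; the rounding rule (keeping the ceiling of keepRate times the current count) is also an assumption about Equation (34), so later iterations may differ slightly from the paper's exact numbers.

```python
# Keep-rate reduction per model type: keep ceil(keepRate * current) sets.
import math

def reduce_counts(counts, keep_rate):
    """Return the number of surviving hyperparameter sets per model type."""
    return {m: math.ceil(keep_rate * n) for m, n in counts.items()}

counts = {"SVM": 60, "RF": 360, "BR": 81}   # assumed split of the 501 models
for it in range(1, 4):
    counts = reduce_counts(counts, keep_rate=0.6)
    print(it, counts)
```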

Adaptive Sampling
The main challenge is to build a suitable surrogate model with the minimum number of training examples, which means we cannot waste samples or system evaluations. This is particularly important when assessments come from a physical system that must be constructed and measured. We propose a framework that partitions the sampling space into regions (Section 3.6.1), decides which regions are best via a combined CV and QBC evaluation (Section 3.6.2), and takes a few candidate points from them.

Partition of the Sampling Space and Candidate Points Selection
We define the sampling region as the set of input values that satisfy all the input constraints. At the start, the sampling region must be reduced from an infinite set of points to a finite one. Therefore, the selected sampling points, named candidate points, should be far enough from each other and spread over the whole search space. In [51], the authors generate a candidate-point population using Monte Carlo sampling (MCS); other alternatives are translational propagation Latin Hypercube Design (TPLHD) [52], Uniform Design (UD) [53], and Voronoi sampling [39]. Among these emerging methodologies, we selected the Voronoi sampling approach because the partition of the design space is done according to the current samples, which allows us to focus on the regions of interest instead of the whole design space.
In our proposal, we create Voronoi regions from the training samples, based on the work of [39], and incorporate a constraint-handling technique to select a subset of Voronoi vertices that meet the parameters' constraints. For constraint handling, the methodology considers two approaches: to any Voronoi vertex that falls outside the minimum or maximum value of any parameter, we apply the death penalty constraint-handling technique [54]; to any other point, we apply the bounce-back constraint-handling technique [55]. The set of data points selected by this technique is denoted the Voronoi set (VoP) and is represented in Equation (44).
The points of the Voronoi set are the Voronoi vertices estimated from the points of the training dataset T_d. A graphical example of two-dimensional Voronoi regions is shown in Figure 2.
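The candidate-point generation can be sketched with SciPy's Voronoi implementation. This minimal version handles only the out-of-bounds case by discarding vertices (the death-penalty case); the bounce-back case is omitted, and the sample data is synthetic.

```python
# Voronoi vertices of the current training samples as candidate points,
# discarding vertices that violate the box constraints.
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(12, 2))    # current training inputs
vor = Voronoi(samples)

lo, hi = 0.0, 1.0                                # parameter bounds
in_bounds = np.all((vor.vertices >= lo) & (vor.vertices <= hi), axis=1)
vo_p = vor.vertices[in_bounds]                   # the Voronoi set VoP
print(len(vor.vertices), len(vo_p))
```

The surviving vertices are exactly the far-from-sample locations where new measurements are most informative.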

Region Assessment and Candidate Selection
The assessment process consists of estimating the precision of the surrogate model at each point of the Voronoi set VoP. In this proposal, we combine the two adaptive sampling approaches that have no model dependencies: the CV and QBC approaches. Voronoi regions can be evaluated using the cross-validation approach proposed in [39]. However, as a result of the grid search methodology, we have more than one model trained with the same data, and each model gives a different assessment of the regions of interest due to its particular characteristics. For this reason, instead of using a single model to estimate the critical regions, we form a committee from the best of each model type to determine the accuracy of each region. With this approach, we can use leave-one-out cross-validation (LOOCV) for each training point on each of the different models to estimate the global performance of all Voronoi regions.
In the first step, every Voronoi region a is rated by the LOOCV score at the central point of the region, computed independently for the best hyperparameter set Hp_B of each model F_t. As a result, every Voronoi region has as many ratings as there are model types trained by the grid search, as shown by Equations (45)-(46).
RR_(a,t) = (Y_a − Ȳ_(a,t))²,  t = 1, 2, 3, ..., T    (45)
where X_a is the central point of the a-th Voronoi region; Hp_B is the best hyperparameter set for the t-th model; T_d \ {X_a} is the training dataset excluding the point X_a; RR_(a,t) is the prediction error of the t-th model at the central point of the a-th region; RR_(a) is the mean prediction error at the central point of the a-th region; RR is the set of mean prediction errors of all the Voronoi regions; Ȳ_(a,t) is the output vector of the t-th surrogate model for the a-th input vector, trained excluding the point X_a; and A is the number of Voronoi regions.
To consider the information of each model in the committee, the final score for each region is the mean value of the scores, see Equation (47).
Finally, it is necessary to rate each candidate point. As stated before, the candidate points are a subset of the Voronoi vertices VoP. By definition, a Voronoi vertex is a point where multiple Voronoi regions meet [56], as shown in Figure 3. The subset of regions adjacent to the g-th Voronoi vertex, RRz_g, is shown in Equation (50). With this consideration, we define the performance of a candidate point, VPR_g, as the mean performance of the adjacent regions (Equation (49)). After assessing the candidate points (VPR), we sort them and select the N_p points with the worst performance. N_p is a hyperparameter of ASAMS.
RRz_g = {RR_(g,1), RR_(g,2), RR_(g,3), ..., RR_(g,Z_g)},  g = 1, 2, 3, ..., G,  RRz_g ⊆ RR    (50)-(51)
where VPR_g is the assessment of the g-th Voronoi point; RRz_g is the set of assessments of the regions adjacent to the g-th candidate Voronoi point; VPR is the set of assessments of the candidate Voronoi points; G is the number of candidate Voronoi points; and Z_g is the number of regions adjacent to the g-th candidate Voronoi point.

Step by Step Algorithm
In this section, we present a step-by-step algorithm detailing the inputs, outputs, and requirements of each step of the ASAMS algorithm (see Section 3.7). The ASAMS algorithm takes as inputs the problem or plant evaluation function, the parameter declaration, the candidate models Mdls, and the DoE algorithm. It has five hyperparameters for adjusting its behavior: a keep rate keepRate that indicates how many models must be carried on to the next iteration, the number of new points nExp generated in each iteration, and three stop conditions: the maximum number of points maxExp, the maximum number of iterations maxIter, and the target MSE Error.
The experiment design function creates the starting sample using a DoE and the parameter description. The plant evaluation function takes the sample and evaluates the problem. Then, the model selection and hyperparameter tuning function (MdlSeletHyParm) performs the grid search and the cross-validation of the candidate models. The stop condition function validates all the stop conditions and returns True if any condition has been met, and False otherwise. The reduce model function returns the remaining models after applying the keep rate. The adaptive sampling function returns the nExp new points generated, and the joint function unites the new points and their evaluations with the training set.
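The control flow of the steps above can be sketched as a runnable toy loop. The helper bodies are drastically simplified stand-ins (random sampling instead of Voronoi, a trivial mean predictor instead of the grid-searched surrogates) meant only to show how the stop conditions and sampling budget interact, not the real ASAMS API.

```python
# Toy sketch of the ASAMS main loop: DoE, evaluate, fit, check stops, resample.
import random

def asams_loop(plant, bounds, n_init=4, n_exp=2,
               max_exp=12, max_iter=5, target_mse=1e-6):
    rng = random.Random(0)
    sample = lambda: [rng.uniform(lo, hi) for lo, hi in bounds]
    X = [sample() for _ in range(n_init)]              # DoE stand-in
    y = [plant(x) for x in X]                          # plant evaluation
    mse = float("inf")
    for _ in range(max_iter):
        mean = sum(y) / len(y)                         # trivial "surrogate"
        mse = sum((t - mean) ** 2 for t in y) / len(y) # CV-score stand-in
        if mse <= target_mse or len(X) >= max_exp:     # stop conditions
            break
        new_X = [sample() for _ in range(n_exp)]       # adaptive-sampling stand-in
        X += new_X
        y += [plant(x) for x in new_X]                 # evaluate and join
    return len(X), mse

print(asams_loop(lambda x: x[0] + x[1], [(0, 1), (0, 1)]))
```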

Highly Nonlinear Oscillator
The first example is a highly nonlinear oscillator proposed in [57], as shown in Figure 4a, subjected to a rectangular load pulse with random duration and amplitude (Figure 4b). This is a benchmark problem widely used in many adaptive sampling works [34,58,59]. The limit state is defined by Equation (53), in which z_max is the maximum displacement of the system and r is the displacement at which one of the springs yields. The maximum displacement z_max, Equation (54), is determined by the magnitude of the force F_1, the duration of the pulse t_1, the mass of the system m, and the oscillation frequency w_0. The oscillation frequency w_0 is defined by the spring constants c_1 and c_2. The performance function is defined by Equations (53)-(55) in Equation (56). The surrogate modeling problem is stated as finding a model m_ai capable of representing the nonlinear oscillator performance (Equation (56)). The inputs and outputs of the proposed surrogate are presented in Figure 5 and Equation (57):
m_ai(c_1, c_2, m, r, t_1, F_1) = g    (57)
Parameter selection and constraints: this problem consists of six input variables whose constraints are defined in Listing 3 and is considered a moderate-dimensional problem.
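The performance function can be written down directly in its standard benchmark form from the reliability literature, which this sketch assumes matches Equations (53)-(56): g = 3r − |z_max| with z_max = (2F₁/(m·w₀²))·|sin(w₀t₁/2)| and w₀ = √((c₁+c₂)/m). The evaluation point uses nominal mean values commonly quoted for this benchmark.

```python
# Standard form of the nonlinear-oscillator performance function
# (assumed to match Eqs. (53)-(56); g > 0 means the system is safe).
import math

def oscillator_g(c1, c2, m, r, t1, F1):
    w0 = math.sqrt((c1 + c2) / m)                       # oscillation frequency
    z_max = 2 * F1 / (m * w0 ** 2) * abs(math.sin(w0 * t1 / 2))
    return 3 * r - abs(z_max)                           # limit state

print(oscillator_g(c1=1.0, c2=0.1, m=1.0, r=0.5, t1=1.0, F1=1.0))
```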
Design of experiments: we select the full factorial method for two levels with a total of 64 samples as starting training points.
Plant Evaluation: the plant evaluation will be performed using Equation (56). Model and Hyperparameter constraints: the search of the surrogate model will be constrained to the algorithms and parameters proposed in Listing 2.
Selection of the stop learning conditions: the objective of this study is to compare the performance of the proposed algorithm with a baseline static design of experiments. As a baseline, we select a full factorial design with three levels, which yields a total of 730 experiments. This motivates us to use as a stop criterion a maximum number of points equal to the number of points of the factorial design (730). We select a maximum number of iterations of 100 and a target error of 0.

Magnetic Circuit
A magnetic circuit is a path in which a magnetic field can be enclosed, as shown in Figure 6a. These circuits can be modeled by computer simulation, which has been used to optimize circuit performance [60]. The magnetic circuit is described by a CAE simulation in COMSOL Multiphysics®, shown in Figure 6b. In this figure, the magnetic field of the whole circuit is estimated using the CAE model. The mean magnetic field in the central tube is used as the surrogate target. The physical parameters that determine the characteristics of the magnetic circuit are the number of turns of the coil N_t, the core wire diameter D_w, and the core geometry, defined by the core base B_c, the core height h_c, the core width w_c, and the core depth P_c. This problem consists of five input variables defined in Listing 3 and is considered a moderate-dimensional problem. In addition to the geometrical parameters, it is necessary to include the electric current I that passes through the coil. Figure 7 shows the geometric parameters. To state the model inputs, we take into account some considerations about the physical construction of the magnetic circuit: the core is built using laminated silicon steel, which is commercially available only in some specific geometries. For this reason, we select as a design parameter a categorical parameter, the core Id, that is related to the core geometry (B_c, h_c, w_c). The core wire diameter D_w is limited to the commercially available gauges and is considered a categorical parameter. The number of turns of the coil N_t is considered a discrete integer parameter, and the core depth is a discrete parameter with increments of 0.5.
The objective is to find a surrogate capable of predicting the mean magnetic field B inside the tube at the center of the magnetic circuit. The proposed surrogate model is shown in Figure 8. Design of experiments: we select the full factorial method for two levels with a total of 33 samples as starting training points.
Plant Evaluation: the plant evaluation will be performed using the CAE simulation in COMSOL Multiphysics®.
Model and Hyperparameter constraints: the search of the surrogate model will be constrained to the algorithms and parameters proposed in Listing 2.
Selection of the stop learning conditions: the objective of this study is to compare the performance of the proposed algorithm with a baseline static design of experiments. As a baseline, we select a full factorial design with three levels, which yields a total of 244 experiments. This motivates us to use as a stop criterion a maximum number of points of 300. We selected a maximum number of iterations of 34 and a target error of 0.
ASAMS parametrization: we selected the following parameters for the ASAMS algorithm in this case study: a keep rate of 0.5, which means that only 50% of the candidate models are kept in each iteration, and a number of experiments nExp of 14, which means that the algorithm generates 14 new experiments in each iteration.

Highly Nonlinear Oscillator
Since the first problem is a benchmark problem, the performance of the algorithm can be analyzed by different means. First, we aim to compare the generalization capability of the algorithm trained using ASAMS against a one-shot approach as a baseline. We select the full factorial method for two and three levels, which yields a total of 64 and 730 sampling points, respectively. To validate model performance, we use a data set of 200 random points that are different from those used to train any algorithm. This testing set was used to compare the performance of both baselines with the accuracy of ASAMS at different numbers of sample points. We select two metrics for this comparison: the RScore metric [61] and the mean squared error (MSE) stated in Equation (58). Figure 9 shows the comparison of the baselines with the ASAMS algorithm. It can be appreciated that, with 537 samples, the algorithm performs better than the 730-point one-shot sampling approach. We noticed some fluctuations in the performance during the testing phase between 150 and 250 samples; however, these variations were not present in the training cross-validation mean squared error (MSE) shown in Figure 10. We found that the variability was due to the performance of the model in some particular zones. For this reason, we decided to focus our test on the zones of interest by generating additional testing points using the ASAMS methodology, without training the model on them. This new testing set of fifty points was used to compare the ASAMS models at different numbers of samples with the baseline models. Results are shown in Figure 11. In this figure, we can also appreciate that the type of model is the same in each iteration, which means that the ASAMS algorithm does not take advantage of its capability to change the model while generating new training points.
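The two validation metrics can be written out explicitly. This minimal plain-Python sketch (with toy data, not the paper's datasets) shows the MSE and an R²-style score of the kind used for the comparison:

```python
# Explicit implementations of the two comparison metrics: mean squared
# error (MSE) and the coefficient of determination (R^2-style score).

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred))       # 0.025
print(r2_score(y_true, y_pred))  # 0.98
```

An R² score close to 1 and an MSE close to 0 both indicate a surrogate that closely reproduces the held-out targets.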
The critical-region experiment shows that the ASAMS model performs better than any baseline in the regions that are hardest for the model to capture. Another important observation is that the baseline model with more points has the worst performance in the critical regions.
One advantage of the ASAMS methodology is that it is capable of changing the type and hyperparameters of the selected surrogate model in each iteration. We hypothesized that this feature would help to improve performance and that the selected model would change as the size of the training dataset increases. To analyze this hypothesis, we recorded the number of changes between the models. Figure 12 shows the changes between the model types. It is clear that the selected model type remained fixed from the starting dataset onward. However, we also want to analyze whether the algorithm at least changes the hyperparameters to fine-tune the model. Figure 13a shows a histogram of the selected hyperparameters, and Figure 13b shows a table of the selected hyperparameters. In this analysis, we can appreciate that the algorithm adjusts the hyperparameters a couple of times before it converges to a specific set. The final model is an SVM with the following hyperparameters: C = 100, gamma = auto, and kernel = rbf.
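Assuming the paper's SVM regressor corresponds to scikit-learn's SVR (which accepts exactly these hyperparameter names), the reported final model can be instantiated as follows; the fitted data here is purely illustrative:

```python
# Sketch (assuming scikit-learn) of the reported final model for the
# oscillator case: an SVM regressor with C=100, gamma='auto', kernel='rbf'.
from sklearn.svm import SVR

final_model = SVR(C=100, gamma="auto", kernel="rbf")

# Tiny illustrative fit on synthetic data (not the paper's dataset).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
final_model.fit(X, y)
print(final_model.get_params()["kernel"])  # rbf
```

Here gamma='auto' sets the RBF kernel width to 1/n_features, and C=100 is a relatively weak regularization that lets the surrogate track the training data closely.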

Magnetic Circuit
This second problem is harder to evaluate because of the integration with COMSOL Multiphysics® and SolidWorks®. In Figure 14, we compare the generalization capability of the algorithm trained using ASAMS against a one-shot approach as a baseline, using the full factorial method for two and three levels, which yields a total of 33 and 244 sampling points, respectively. We compare both baselines with the accuracy of ASAMS at different numbers of sample points using the MSE on an independent 35-point test set. Note that, with 75 samples, the algorithm performs better than the 244-point one-shot sampling approach. As in the previous case, we can observe some oscillations in the ASAMS results. For this reason, we also generated an independent testing set of 64 points from the critical regions using the ASAMS mechanisms. Figure 15 shows a comparison of both baselines with the accuracy of ASAMS on this critical testing set. In this experiment, it can be seen that the 244-point baseline does not improve in the critical regions, while the ASAMS algorithm clearly improves without losing generalization. Correct model selection and fine hyperparameter tuning are crucial for a complex task. We hypothesized that the AutoML feature of ASAMS would become increasingly important for improving performance on harder problems. To analyze this hypothesis, we recorded the number of changes between the models. Figure 16 shows the changes between the model types. Observe that, in this case, the algorithm does change between model types. We are also interested in analyzing whether the algorithm changes the hyperparameters to achieve a fine-tuning of the model. Figure 17a shows a histogram of the selected hyperparameters, and Figure 17b shows a table of the selected hyperparameters. In this analysis, we can appreciate that the algorithm keeps adjusting the hyperparameters and does not converge to a specific set.
The final model is an RF with the following hyperparameters: max_depth = NaN, max_features = auto, min_samples_leaf = 2, min_samples_split = 5, and n_estimators = 200.
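Assuming the RF model corresponds to scikit-learn's RandomForestRegressor, the reported hyperparameters map onto the constructor as sketched below. The reported max_depth = NaN corresponds to max_depth=None (unbounded tree depth), and max_features = auto historically meant using all features for regressors, which is max_features=1.0 in current scikit-learn:

```python
# Sketch (assuming scikit-learn) of the reported final model for the
# magnetic-circuit case.
from sklearn.ensemble import RandomForestRegressor

final_model = RandomForestRegressor(
    max_depth=None,       # reported as NaN: trees grow to unbounded depth
    max_features=1.0,     # the reported 'auto' meant all features for regressors
    min_samples_leaf=2,
    min_samples_split=5,
    n_estimators=200,
)
print(final_model.get_params()["n_estimators"])  # 200
```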

Discussion
We hypothesized that the capability of changing the surrogate model's algorithm during the iterative adaptive sampling that generates additional data points would become crucial for improving performance. While this was true in the magnetic circuit problem, in the nonlinear oscillator the model selection was not important after four iterations, and we can even argue that the changes between the selected models were minimal. From this behavior, we can infer that the additional computational cost is only justified in complex problems, because in simpler problems the initial sampling provides enough information to select a suitable model for the task.
We decided to perform two different kinds of validation. The first one was performed using random sampling of the valid input space. In this experiment, the baseline performed as expected: when a bigger dataset was used, the predictions of the model improved. As expected, the performance of the model trained using ASAMS was better than the baseline with fewer points in both cases. The second experiment was performed using a sampling of critical points from the model trained by ASAMS, which by definition are the points on which our model performs worst. This dataset became interesting because we found that the baseline models, contrary to expectation, do not improve their performance when the size of the dataset is increased. However, the model trained by ASAMS improves on this testing set as the dataset size increases. This characteristic must be studied in future work.
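The critical-point idea discussed above can be sketched as follows: rank the training points by their leave-one-out (LOO) prediction error and treat the worst-predicted ones as critical. The fit/predict callables and the toy mean-predictor model below are hypothetical stand-ins for any surrogate algorithm:

```python
# Hedged sketch: find "critical" points via leave-one-out errors.
# fit_fn/predict_fn are placeholders, not the paper's implementation.

def loo_errors(X, y, fit_fn, predict_fn):
    """Absolute LOO prediction error of each training point."""
    errors = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit_fn(X_train, y_train)
        errors.append(abs(predict_fn(model, X[i]) - y[i]))
    return errors

# Toy surrogate: always predict the mean of the remaining targets.
fit_mean = lambda X, y: sum(y) / len(y)
predict_mean = lambda model, x: model

X = [[0.0], [1.0], [2.0], [10.0]]
y = [0.0, 1.0, 2.0, 10.0]
errs = loo_errors(X, y, fit_mean, predict_mean)
critical_index = max(range(len(errs)), key=errs.__getitem__)
print(critical_index)  # 3: the outlying point is hardest to predict
```

In an adaptive scheme, new samples (or, as here, additional testing points) would then be concentrated around such critical points.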
After this analysis, we note that the use of ASAMS should be limited to complex problems, since the improvement is marginal in simple problems and does not justify the additional computational cost.
The proposed methodology can be generalized to any surrogate modeling problem, provided there is some means of evaluating the performance of the proposed points. The evaluation can be performed online using any type of computational model, or offline by an experimental method. In the case of experimental measurements, the algorithm must be stopped in every iteration to perform the experimental evaluations. From the target system, the designer must identify the inputs and outputs of the surrogate model. The inputs of the surrogate model are selected as parameters, and they must be constrained. Then, the algorithm must start with a one-shot sampling method.
The ASAMS algorithm can be tuned by changing the number and type of candidate AI algorithms for surrogating the system; the list of hyperparameters can also be changed. Finally, the stop conditions, keep rate, maximum number of experiments, and number of experiments per iteration must be set.
With all these hyperparameters, the ASAMS algorithm can be tuned for many surrogation tasks.
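As an illustration of these tuning knobs, a hypothetical encoding of the candidate-algorithm and hyperparameter lists (in the spirit of Listing 2, but with invented values) and their grid expansion might look like:

```python
# Hypothetical search-space encoding and grid expansion. All algorithm
# names and candidate values below are illustrative, not the paper's.
from itertools import product

search_space = {
    "SVM": {"C": [1, 10, 100], "kernel": ["rbf", "linear"], "gamma": ["auto", "scale"]},
    "RF":  {"n_estimators": [100, 200], "min_samples_leaf": [1, 2]},
}

def expand_grid(params):
    """All hyperparameter combinations (Cartesian product) for one model."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*(params[k] for k in keys))]

for model_name, params in search_space.items():
    print(model_name, len(expand_grid(params)))  # SVM 12, RF 4
```

Each combination would then be scored (e.g., via leave-one-out cross-validation) and the keep-rate elitism applied over the resulting candidate models.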

Conclusions and Future Work
Adaptive sampling algorithms have proven useful in reliability analysis for traditional surrogate models such as the Kriging method, but the growing demand for algorithms that translate surrogate models into applications such as digital twinning, design mining, or soft sensors has generated the need to transfer adaptive sampling methods to machine learning models. In this paper, we have proposed a new adaptive sequential sampling approach that combines a meta-learning algorithm with two adaptive sampling methods for machine learning models. Constraint-handling techniques were introduced to consider different types of discrete design parameters, allowing the algorithm to handle more types of problems. We proposed an elitism mechanism for reducing the number of candidate models as part of the meta-learning approach.
We selected two different case studies for testing the ASAMS performance: a benchmark and a real problem. In these tests, we compared the performance of a surrogate model trained with the ASAMS algorithm against a model trained with a baseline one-shot sampling. Two different testing sets were used for the comparison: the first is a random sampling of 200 points, and the other is a focused sampling of the critical points of the case study. These experiments show that the model generated by the ASAMS algorithm can obtain better performance than the baseline approaches with fewer sampled points. Additionally, we verified that the surrogate models were more robust in the critical parts of the target system without losing generalization.
Another important factor is the ability of the ASAMS methodology to select suitable algorithms for surrogating the problem and to fine-tune the selected algorithm. In the tests, we observed that this feature is relevant for complex models.
While the methodology was tested using a couple of well-known algorithms, it can be easily used for almost any machine learning algorithm and can be applied for generating a surrogate model for any field of study.
The ASAMS methodology is an initial approach to combine AutoML with adaptive sampling techniques. As future work, we suggest improving the proposed methodology with more complex AutoML algorithms. Additionally, we recommend studying further the differences between the critical points for a model and a system to improve the performance of adaptive sampling techniques.

Conflicts of Interest:
The authors declare no conflict of interest.

Nomenclature
Y_i: Target vector of the i-th element of the testing dataset
X_i: Input vector of the i-th element of the testing dataset
Y_(i,t): Output vector of the t-th surrogate model with the i-th input vector of the testing dataset
F_t: Algorithm selected from the FT algorithm set
H_p: Set of all hyperparameter combinations for the T candidate solutions
Ĥp_t: Set of valid hyperparameter combinations for the t-th candidate model
H_p(v,t): Hyperparameters selected for the F_t model
hp_(k_t,t): Value of the k_t-th hyperparameter of the t-th model
T_d: Training dataset
FT: Set of candidate algorithms
P̂_(k_t,t): Set of candidate values for the k_t-th hyperparameter of the t-th model
x_(j,i): j-th element of the input vector X_i
y_(s,i): s-th element of the target vector Y_i
d_m: m-th data point of the training dataset
p_(q_(k,t),k_t,t): q-th candidate value for the k_t-th hyperparameter of the t-th model