Sequential Experiment Design for Parameter Estimation of Nonlinear Systems using a Neural Network Approximator

Abstract—We consider the problem of sequential parameter estimation of a nonlinear function under the Bayesian setting. The designer can choose inputs for a sequence of experiments to obtain an accurate estimate of the system parameters based on observed outputs, while complying with a constraint on the expected outputs of the system. We quantify the accuracy of the obtained estimate in terms of the ℓ2 norm. We propose to solve the problem by casting it as the minimization of the Bayesian mean-squared error (BMSE) of the parameter estimate subject to a constraint on the expected deviation of the output from the desired target value. We develop a greedy policy to solve the problem in the sequential setting, and we characterize the solution structure based on analytical results for the Gaussian case. For a computationally tractable update of the posterior, we propose the use of a surrogate model combined with approximate Bayesian computation. We evaluate the proposed approach on the use case of smart road compaction, where the goal is to estimate asphalt parameters while reaching the desired compaction level by choosing the value of the loading pressure. Simulation results on a synthetic road compaction dataset show the efficacy of the proposed solution scheme in both parameter estimation and effective compaction of the road.


I. INTRODUCTION
An accurate estimate of the model parameters of equipment and materials has a variety of applications in chemistry, manufacturing and construction [1], [2]. In the absence of non-destructive testing methodologies that would be applicable post-production, parameter estimates often have to be obtained during production in real time, i.e., under time constraints. This is often the case, for example, during road compaction and metalworking. Importantly, the estimation of the parameters during the production phase should not be detrimental to production quality, and interaction with the material or equipment is limited to what the production process allows: providing a sequence of inputs and observing the outputs.
The problem of designing inputs to a system with the purpose of estimating its parameters has been studied in the framework of experimental design (ED) [3]. In ED, an optimal set of inputs is designed for a system described by a parameterized model, so as to obtain an accurate estimate of the model parameters. A commonly used measure of accuracy is the variance of the parameter estimate, and the task is to reach a target variance through as few experiments as possible, because experiments are considered costly [4], [5]. In many applications, however, even evaluating a model of the system can be costly, e.g., if it involves finite element (FE) method simulations of material behavior [2].

(This work was partly funded by the Vinnova Competence Center for Trustworthy Edge Computing Systems and Applications (TECoSA) at KTH and by KTH Digital Futures.)
The computational cost of evaluating the system model becomes of crucial importance for sequential ED, i.e., when the observations from previous experiments are to be used for the design of future experiments [6]. As the problem is essentially a sequential decision-making problem under uncertainty, it can be cast as a dynamic programming problem, but existing solution approaches rely on Monte Carlo simulation methods, which are computationally prohibitive if the evaluation of the system model is computationally intensive [6], [7]. In this case even greedy approaches are computationally challenging due to the need for updating the posterior after each experiment, which typically involves many evaluations of the system model for different parameters [8], [9].
In this paper we formulate the problem of estimating model parameters through sequential experiments, subject to the constraint of reaching a target output value, by sequentially designing inputs to the system. We develop a greedy approach for average optimal experiment design, i.e., minimizing the variance of the estimate in a Bayesian setting, and propose a computationally lightweight approach for updating the posterior through a surrogate system model, based on a neural network approximator used for the posterior update via approximate Bayesian computation. A key novelty of our approach is its applicability to problems where the evaluation of the system model is computationally demanding. In addition, the use of the Bayesian mean-squared error as a cost is novel in the sequential optimal experiment design (OED) setting. We show the potential of the proposed scheme by estimating the material parameters during road compaction, while keeping the error between target and actual compaction bounded in expectation.
The rest of the paper is organized as follows. In Section II we review related work, in Section III we introduce the sequential parameter estimation problem in a Bayesian setting, and in Section IV we formulate the input design problem as Bayesian mean-squared error (BMSE) minimization. In Section V we design a computationally lightweight greedy algorithm for sequential ED, and we show its efficacy on the use case of asphalt compaction in Section VI. Section VII concludes the paper.

II. RELATED WORK
As mentioned before, our work is closely related to the field of optimal experiment design in statistics. In system identification, the problem is also known as optimal input design [10], wherein the parameters of the system model to be estimated are treated as deterministic quantities and the input to the system is shaped, or designed, so that the estimation error of the parameters is minimized [11]. In linear models, the inverse of the error covariance matrix of the parameters is related to the spectrum of the input, which can be parameterized and therefore used for shaping the input to minimize the trace (A-optimality) or the determinant (D-optimality) of the error covariance matrix. The sequential version of the problem has also been studied, with applications in trajectory planning. More recently, theoretical guarantees for nonlinear system identification have been provided in [12].
The Bayesian version of OED is reviewed in [13]. The most commonly used utility function in the Bayesian setting is the KL divergence between the prior and posterior parameter distributions, averaged over the observations [13]. Since this utility is intractable in general, a variational approach to approximating the posterior distribution is undertaken in [14]. As another approach, the authors in [15] introduce a lower bound on the utility and maximize it with respect to design criteria parameterized by a neural network. For linear models, i.e., Gaussian priors and posteriors in the Bayesian model, it is possible to obtain closed-form expressions for the optimal experiment design, as shown in [16], which leads to the well-known criterion of D-optimality in experiments. For Bayesian nonlinear state-space models, the authors in [17] minimize the posterior Cramér-Rao lower bound (PCRB) in lieu of maximizing a utility function based on the Bayesian mean-squared error (BMSE).
The sequential setting of OED was studied in [18], where a single experiment is performed at a time, the posterior of the parameters is updated, and the process continues until the allowed number of experiments has been performed. The utility considered is the KL divergence between the prior and the posterior of the parameter distribution. The overall problem is formulated as a dynamic program and accounts for a dynamic environment. The authors in [6] consider the same problem and address computational and numerical issues through approximate dynamic programming. More recent papers utilize deep learning for input design; e.g., a neural network was trained to represent the input design policy in a Bayesian setting in [19].

III. PROBLEM STATEMENT
We consider a sequential parameter estimation problem, with outputs denoted by y_t ∈ Y ⊂ R^N and inputs denoted by x_t ∈ X ⊂ R^M. The generative model for the output at time instant t is of the form

y_t = ϕ(x_t; θ) + η_t, (1)

where ϕ(·; θ) is a function with parameter θ ∈ R^K. The term η_t models both model uncertainty, due to the approximation of the generative model, and measurement noise due to environmental factors. It is assumed that η_t follows a zero-mean Gaussian distribution with known variance σ², η_t ∼ N(0, σ²I). We consider that the family of functions that ϕ(·; θ) belongs to is known, but the true value θ* of the parameter is unknown. In addition, there is a desired target output y that the designer intends to meet with the appropriate value of the input x and parameter θ.
The designer is allowed to perform at most T experiments, by providing inputs and observing the outputs. Let us denote the inputs and outputs up to and including experiment t as X_t ≜ (x_1, …, x_t) and Y_t ≜ (y_1, …, y_t), respectively. We adopt a Bayesian approach, and hence assume that the true parameter is a realization of a random variable θ. The belief about the distribution of θ right before experiment t is referred to as the prior and is denoted by p_t(θ), while the belief about the distribution of θ after experiment t is referred to as the posterior distribution and is denoted by p(θ|Y_t; X_t). Given the posterior distribution, it is well known that the estimator of θ that minimizes the mean-squared error is the mean of the posterior distribution, E[θ|Y_t; X_t], and the designer wants this mean to be as close to the true value θ* as possible [20] after T experiments. Furthermore, the designer also wants the expected deviation from the target output y to be below a certain threshold δ. Formally, the designer wants to solve the optimization problem

min_{X_T} ∥E[θ|Y_T; X_T] − θ*∥² (3)

subject to

E_{p_t(θ)}[∥ϕ(x_t; θ) − y∥²] ≤ δ, 1 ≤ t ≤ T. (4)

In constraint (4), the expectation is taken with respect to θ drawn from the prior distribution p_t(θ). In an offline setup, where X_T is decided for all experiments at once, the prior distribution is not updated based on the observed outputs, i.e., p_t(θ) = p_0(θ) ∀t. In a sequential setup, where x_t for experiment t is decided based on past observations Y_{t−1} and inputs X_{t−1}, the prior distribution becomes the most recent posterior, i.e., p_t(θ) = p(θ|Y_{t−1}; X_{t−1}). The motivation behind the formulation stems from the goal of minimizing the mean-squared error. Note that the objective (3) is expressed as the norm of the difference between the best estimate E[θ|Y_T; X_T] and the true parameter θ*. Nonetheless, the true parameter is a realization of the random variable θ, and is not observable.
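As a concrete illustration of the generative model (1) and the constraint (4), the following sketch simulates one experiment and estimates the expected deviation from the target by Monte Carlo sampling from the prior. The scalar linear model ϕ(x; θ) = θx and all numerical values are illustrative assumptions, not the paper's FE model:

```python
import random

def phi(x, theta):
    """Illustrative stand-in for the system model; in the paper, phi may be
    an expensive black box (e.g., an FE simulation)."""
    return theta * x

def observe(x, theta_star, sigma=0.05, rng=random):
    """One experiment: output = phi(x; theta*) + zero-mean Gaussian noise,
    as in the generative model (1)."""
    return phi(x, theta_star) + rng.gauss(0.0, sigma)

def expected_sq_deviation(x, y_target, prior_samples):
    """Monte Carlo estimate of the left-hand side of constraint (4):
    the expected squared deviation of the output from the target."""
    return sum((phi(x, th) - y_target) ** 2 for th in prior_samples) / len(prior_samples)

random.seed(0)
prior_samples = [random.gauss(1.0, 0.2) for _ in range(2000)]  # theta ~ N(1, 0.2^2)
dev = expected_sq_deviation(1.5, 1.5, prior_samples)  # approx (x*mu - y)^2 + x^2 * var
```

Here `dev` approximates (xμ − y)² + x²σ²_θ, which is the quantity the constraint bounds by δ.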

IV. BAYESIAN MEAN-SQUARED ERROR FORMULATION
In order to cope with the unknown parameter, we propose to adopt a Bayesian approach. Following the Bayesian approach, the estimate E[θ|Y_T; X_T] of the parameter is based on the observed outputs Y_T, and minimizes the mean-squared error in the Bayesian sense, i.e., the Bayesian MSE (BMSE). This formulation is similar to other works in OED where the goal is to minimize a metric of parameter uncertainty. Here, that metric is the BMSE, i.e., the variance of the posterior distribution.

A. BMSE Problem Formulation
In order to express the BMSE with respect to an estimator θ̂, considering the average error over the observations, let us denote by Y the random vector corresponding to the observations, and use p(Y; X_T) to denote the distribution of Y parameterized by the inputs X_T. We can then express the BMSE of θ̂ by marginalizing with respect to the observations Y and the parameter θ, considering the following factorization of their joint distribution,

p(Y, θ; X_T) = p(Y|θ; X_T) p(θ). (5)

Then, the BMSE for estimator θ̂ is given as

BMSE(θ̂) = ∫∫ ∥θ̂(Y) − θ∥² p(Y|θ; X_T) p(θ) dθ dY. (6)

Now, we can define the minimum BMSE with respect to the estimator θ̂ as

C(X_T) ≜ min_{θ̂} BMSE(θ̂) = ∫ Tr(C_{θ|Y}) p(Y; X_T) dY, (7)

where Tr(C_{θ|Y}) refers to the trace of the conditional covariance matrix of θ given Y. Note that the minimum BMSE is expressed with respect to the deterministic parameters of the distribution of Y and the parameters of the prior distribution of θ. In the case of Gaussian noise as in (1), the minimum BMSE is determined by the input X_T, the noise variance σ², and the parameters of p_t(θ).
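For the scalar linear-Gaussian case treated in Section IV-B, the minimum BMSE C(X_T) in (7) has a closed form, which makes it a convenient sanity check for a Monte Carlo evaluation of the BMSE of the posterior-mean estimator. The sketch below is illustrative; the model and all values are assumptions:

```python
import math
import random

def mmse_cost_closed_form(x, sigma2, sigma2_theta):
    """Closed-form minimum BMSE for the scalar linear-Gaussian model:
    C(x) = sigma^2 * sigma_theta^2 / (sigma^2 + x^2 * sigma_theta^2)."""
    return sigma2 * sigma2_theta / (sigma2 + x * x * sigma2_theta)

def mmse_cost_monte_carlo(x, mu0, sigma2, sigma2_theta, n=20000, seed=0):
    """Estimate the BMSE of the posterior-mean estimator by simulation:
    draw theta from the prior, simulate y, form E[theta|y], average the
    squared error over many (theta, y) draws, as in (6)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(mu0, math.sqrt(sigma2_theta))
        y = x * theta + rng.gauss(0.0, math.sqrt(sigma2))
        # Posterior mean for the conjugate Gaussian model
        post_mean = (sigma2 * mu0 + sigma2_theta * x * y) / (sigma2 + x * x * sigma2_theta)
        total += (post_mean - theta) ** 2
    return total / n
```

The Monte Carlo estimate should agree with the closed form up to sampling noise.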
A strategy to determine the input X_T is to minimize the quantity C(X_T) with respect to X_T, subject to the constraint pertaining to the target output y in (4). Overall, the problem can be formulated from (7) as

min_{X_T} C(X_T) (8)
subject to E_{p_t(θ)}[∥ϕ(x_t; θ) − y∥²] ≤ δ, x_t ∈ X, 1 ≤ t ≤ T.

B. Analytical Example
As an illustration, let us consider a linear measurement model with scalar parameter, y = xθ + η, a single experiment T = 1, and a Gaussian prior for the parameter, θ ∼ N(μ_θ, σ²_θ). Under these assumptions, it is well known that the posterior distribution of θ is Gaussian with mean and variance given by [20]

μ_{θ|y} = (σ² μ_θ + σ²_θ x y)/(σ² + x² σ²_θ), σ²_{θ|y} = σ² σ²_θ/(σ² + x² σ²_θ).

The cost (7) can thus be expressed as

C(x) = σ² σ²_θ/(σ² + x² σ²_θ), (13)

and it does not depend on y. Also, the constraint pertaining to the target output (4) can be written as

E[(xθ − y)²] = x²(μ²_θ + σ²_θ) − 2xyμ_θ + y² ≤ δ. (15)

Thus, the input x has to lie in the set

X_δ = {x ∈ X : x²(μ²_θ + σ²_θ) − 2xyμ_θ + y² − δ ≤ 0}. (16)

Note that the expectation is taken with respect to θ using the prior distribution. The overall problem (8) thus becomes

min_x σ² σ²_θ/(σ² + x² σ²_θ) (17)
subject to x²(μ²_θ + σ²_θ) − 2xyμ_θ + y² − δ ≤ 0. (18)

Observe that the cost minimization translates to maximizing x² subject to the quadratic inequality (18). If the discriminant of the quadratic is positive or zero, then we have a solution to (17). If the discriminant is negative, then the parabola in x lies entirely above the x-axis, rendering the problem (17) infeasible.
For the discriminant to be nonnegative, the following inequality must hold:

δ ≥ y² σ²_θ/(μ²_θ + σ²_θ). (19)

If the condition above is satisfied, then the solution is

x* = (yμ_θ + sign(yμ_θ) √((yμ_θ)² − (μ²_θ + σ²_θ)(y² − δ)))/(μ²_θ + σ²_θ). (20)

Fig. 1 shows the error curve, the threshold, and the feasible set of inputs for a numerical example. Remark 1: Observe that the optimal solution mainly depends on the parameters of the prior distribution of θ, the target output, and the accuracy budget δ. Also note that the optimal solution x* lies at the boundary of the feasible set, which we will make use of in the next section.
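Under the quadratic-constraint reading of (17)-(20) above, the boundary solution can be computed directly. The helper below is a sketch under that reading; it returns None when the discriminant is negative, i.e., when the feasible set is empty, and otherwise picks the boundary point maximizing x²:

```python
import math

def optimal_input(mu_theta, var_theta, y, delta):
    """Maximize x^2 subject to E[(x*theta - y)^2] <= delta, with
    theta ~ N(mu_theta, var_theta).  Returns None if infeasible.
    This mirrors the discriminant condition (19) and solution (20)."""
    a = mu_theta ** 2 + var_theta
    disc = (y * mu_theta) ** 2 - a * (y ** 2 - delta)
    if disc < 0:
        return None  # negative discriminant: empty feasible set
    root = math.sqrt(disc)
    # The two boundary points of the feasible interval
    x_lo = (y * mu_theta - root) / a
    x_hi = (y * mu_theta + root) / a
    return x_lo if abs(x_lo) > abs(x_hi) else x_hi  # maximize x^2
```

The returned input attains the constraint with equality, consistent with the boundary observation in Remark 1.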

V. OPTIMAL INPUT DESIGN
Minimizing the BMSE with respect to the input x provides the optimal solution, but it is not always possible to express the cost in closed form as a function of x. In what follows, we discuss alternative solution approaches that do not assume that the BMSE can be expressed in closed form, but still respect the constraints on the input at every experiment.
Generalizing (16), let us define the set of feasible values of x given the prior distribution p_t(θ) as

X^t_δ ≜ {x ∈ X : E_{p_t(θ)}[∥ϕ(x; θ) − y∥²] ≤ δ}. (21)

In order to satisfy the constraint, we have to ensure that x ∈ X^t_δ at every experiment. Nonetheless, X^t_δ depends on the most recent prior p_t(θ), hence in what follows we distinguish between offline and sequential input design.

A. Offline Input Design
We start by considering the offline case, i.e., observations become available only after all experiments are done, and thus the prior cannot be updated, i.e., p_t(θ) = p(θ), and hence the decision regarding all inputs can be made at once. As stated before, the cost in (8) is in general not analytically tractable, and as we will show in Section V-A.2, the solution may be to use the same input T times. Hence we explore computationally efficient alternatives.
1) Feasible Set Sampling: Feasible set sampling focuses on the feasibility of the solutions with respect to (4). A feasible set of inputs X_T can be obtained by sampling from the set X_δ, but it is unclear how best to sample from X_δ. A straightforward approach would be to sample uniformly at random from X_δ. To obtain a better sampling strategy, recall that the solution in the Gaussian example of Section IV-B was to choose the input from the boundary of the feasible set. We thus consider sampling from a distribution that assigns larger probability mass to the boundary of the feasible set, i.e., we use the distribution

p(x) = E_{p(θ)}[∥ϕ(x; θ) − y∥²]/C_{X_δ}, x ∈ X_δ, (22)

where C_{X_δ} is the normalization constant. This strategy tends to reduce the entropy of the posterior distribution since, on average, conditioning reduces entropy. We use this offline strategy as a baseline in our evaluation.
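A minimal sketch of the boundary-weighted sampling idea, assuming a discretized input grid and a generic expected-deviation function g(x) standing in for the left-hand side of (4); the weighting proportional to g, which is largest at the boundary of the feasible set, is our reading of the distribution (22):

```python
import random

def sample_feasible(g, xs, delta, rng, n):
    """Sample n inputs from a discretized feasible set with probability
    proportional to the expected squared deviation g(x), which is largest
    at the boundary (sketch of the distribution (22)).
    g: expected-deviation function; xs: candidate grid; delta: threshold."""
    feasible = [x for x in xs if g(x) <= delta]
    weights = [g(x) for x in feasible]
    return rng.choices(feasible, weights=weights, k=n)
```

With g(x) = (x − 1)² and δ = 0.25, samples concentrate near the endpoints of the feasible interval [0.5, 1.5], mimicking the boundary solution of the Gaussian example.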
2) Analytical Example: We continue with the example from Section IV-B, focusing now on the case of T experiments. We can concatenate the observations, y ≜ [y_1 y_2 … y_T]^⊤, to arrive at the observation model

y = θx + η, (23)

where the inputs x ≜ [x_1 x_2 … x_T]^⊤ need to be determined in an offline manner, i.e., all the inputs are decided at once before making any observations. As before, the posterior distribution of θ is Gaussian with mean and variance

μ_{θ|y} = (σ² μ_θ + σ²_θ x^⊤y)/(σ² + ∥x∥² σ²_θ), σ²_{θ|y} = σ² σ²_θ/(σ² + ∥x∥² σ²_θ).

The cost for minimization is C(x) = σ²_{θ|y}. The constraints are decoupled with respect to each observation y_i, thus we have the overall optimization problem

min_x σ² σ²_θ/(σ² + ∥x∥² σ²_θ) subject to x_t ∈ X_δ, 1 ≤ t ≤ T, (25)

which, as before, translates to maximizing ∥x∥² = Σ_t x²_t with respect to x. (26)

Note that the objective (26) can be decomposed with respect to each input x_i. Due to the absence of constraints that couple the inputs of different experiments, we get the same solution x_t = x* ∀t, where x* is given in (20). Thus, a sufficiently diverse set of outputs cannot be obtained to change the posterior distribution significantly. In addition, observe that x*, as in Section IV-B for T = 1, lies at the boundary of the feasible set.
Feasible Set Sampling: The expected error is given by the left-hand side (LHS) of (15), and the feasible set of x in (16) can be rewritten as

X_δ = [2yμ_θ/(μ²_θ + σ²_θ) − x*, x*],

with x* as in (20), and the upper and lower limits switched if yμ_θ < 0. The sampling distribution can be written as

p(x) = (x²(μ²_θ + σ²_θ) − 2xyμ_θ + y²)/C_{X_δ}, x ∈ X_δ,

with C_{X_δ} evaluated as in (22).

B. Sequential Input Design
We now turn to the sequential case, where the prior distribution of the parameters can be updated based on the outcomes of previous experiments before the next input is chosen, i.e., the posterior parameter distribution obtained after t − 1 experiments serves as the prior distribution for choosing x_t,

p_t(θ) = p(θ|Y_{t−1}; X_{t−1}). (30)

Ideally, one would solve (8) in the sequential setting by formulating it as a dynamic programming problem [18]. This involves solving the problem by backward induction to obtain a policy that determines the set of inputs X*_T. The following result follows from basic principles of dynamic programming.
Proposition 1: Consider experiment T with prior p_T(θ) = p(θ|Y_{T−1}; X_{T−1}). The optimal input x*_T is the solution to

min_{x_T ∈ X^T_δ} ∫ Tr(C_{θ|Y_T}) p(Y_T; X_T) dY_T.

Solving by backward induction, at experiment t < T, with prior p(θ|Y_{t−1}; X_{t−1}) and optimal inputs x*_{t+1}, …, x*_T, the cost C(X_T) has to be evaluated by computing multidimensional integrals with respect to Y_T. Therefore, to address the multi-stage problem, in what follows we propose to adopt a greedy approach.
The greedy approach to sequential input design consists of finding the optimal solution for experiment t given the prior p_t(θ), ignoring the terms in the cost comprising future observations and inputs. In particular, at experiment t we consider the cost

C(x_t) = ∫ Tr(C_{θ|y_t}) p(y_t; x_t) dy_t. (35)

Thus, following the greedy approach, the input x^G_t at experiment t is the solution to

min_{x_t} C(x_t) subject to x_t ∈ X^t_δ. (36)

1) Analytical Example: Continuing with the analytical example in the sequential setting, observe that the initial prior is Gaussian and that the observation y and θ are jointly Gaussian. Thus, at experiment t, the prior p_t(θ) is Gaussian with mean μ_{θ,t−1} and variance σ²_{θ,t−1} given by the recursions

μ_{θ,t} = (σ² μ_{θ,t−1} + σ²_{θ,t−1} x_t y_t)/(σ² + x²_t σ²_{θ,t−1}), (37)
σ²_{θ,t} = σ² σ²_{θ,t−1}/(σ² + x²_t σ²_{θ,t−1}). (38)

The cost for experiment t is the same as in the single-experiment case (eq. (13)) and does not depend on the observation y_t. Thus, the optimal input x*_t is given by solving (17) with μ_θ = μ_{θ,t−1} and σ²_θ = σ²_{θ,t−1}, i.e., by (20) evaluated at the current prior parameters. (39)
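The greedy recursion for the scalar linear-Gaussian example can be sketched as follows; all numerical values are illustrative, the boundary-input step reflects our reading of (39), and the posterior update is the standard conjugate-Gaussian recursion of (37)-(38):

```python
import math
import random

def greedy_sequential(theta_star, mu0, var0, noise_var, y_target, delta, T, rng):
    """Greedy sequential design for the scalar linear-Gaussian example:
    at each step pick the boundary input under the current prior, observe,
    then update the prior mean/variance recursively (conjugate update)."""
    mu, var = mu0, var0
    for _ in range(T):
        a = mu * mu + var
        disc = (y_target * mu) ** 2 - a * (y_target ** 2 - delta)
        if disc < 0:
            break  # constraint infeasible under the current prior
        x = (y_target * mu + math.sqrt(disc)) / a  # boundary input
        y = theta_star * x + rng.gauss(0.0, math.sqrt(noise_var))
        # Recursive conjugate-Gaussian posterior update
        mu = (noise_var * mu + var * x * y) / (noise_var + x * x * var)
        var = noise_var * var / (noise_var + x * x * var)
    return mu, var
```

After T experiments the posterior variance shrinks and the posterior mean approaches the true parameter, illustrating the variance-reduction behavior of the greedy policy.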
For experiment t + 1, the posterior of θ is updated using the recursive formulae for the mean (37) and the variance (38), and the optimal input is again obtained from (39).

C. Training of Surrogate Model for Posterior Update
To efficiently update the posterior distribution after observing the outputs, a closed-form expression of the prior distribution, evaluation of the likelihood p(y_t|x_t, θ), and computation of the function ϕ(x; θ) are needed. In the absence of these, the computation of the posterior distribution has to be based on a numerical method. To address this problem, we propose to use a neural network approximator of the function ϕ(x; θ) as a surrogate model, combined with approximate Bayesian computation [21], to update the posterior distribution. For illustration, consider the sequential setting, where the update of the posterior distribution after observing the output y_t of experiment t is

p(θ|Y_t; X_t) ∝ p(y_t|x_t, θ) p(θ|Y_{t−1}; X_{t−1}). (40)

In many applications the prior p(θ|Y_{t−1}; X_{t−1}) is not given in closed form, and the distribution p(y_t|x_t, θ) is difficult to evaluate, especially if the function ϕ(x; θ) takes a long time to compute. The surrogate model is denoted by ϕ_ω(x; θ), where ω are the parameters of the neural network. The surrogate model is trained offline, prior to performing the experiments, using data generated by the original model ϕ(x; θ). The surrogate ϕ_ω(x; θ) can be used for posterior updates in both the offline and the sequential input design scenarios.
Given the surrogate, we perform the update of the posterior distribution, when a closed-form expression does not exist, using approximate Bayesian computation [21], which is a rejection sampling approach. Algorithm 1 details the steps to produce samples of θ from the posterior distribution p(θ|Y_t; X_t), given the prior p_t(θ), the input x_t, and the corresponding observation y_t. The approach involves repeatedly generating a sample θ′ from the prior distribution p_t(θ) and using the sampled parameter to generate a sample z_t using the surrogate model, z_t = ϕ_ω(x_t; θ′), in lieu of simulating from the actual function ϕ(x_t; θ′). Then, the distance between the generated sample z_t and the observation y_t is calculated as ρ(y_t, z_t) = ∥y_t − z_t∥²₂. If ρ(y_t, z_t) ≤ ϵ, the sample θ′ is accepted and added to the set of samples, Θ, that will constitute the posterior. The procedure is repeated until there are enough samples (N_s) to form an empirical posterior distribution.
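Algorithm 1 can be sketched as a rejection sampler; `surrogate` stands in for the trained network ϕ_ω, and the scalar prior and linear surrogate used in the example are illustrative assumptions:

```python
import random

def abc_posterior(y_obs, x, prior_sample, surrogate, eps, n_samples, rng,
                  max_tries=200000):
    """Approximate Bayesian computation (Algorithm 1): repeatedly draw
    theta' from the prior, simulate z = surrogate(x; theta'), and accept
    theta' when the squared distance ||y_obs - z||^2 is at most eps."""
    accepted = []
    for _ in range(max_tries):
        theta = prior_sample(rng)
        z = surrogate(x, theta)
        if (y_obs - z) ** 2 <= eps:
            accepted.append(theta)
            if len(accepted) == n_samples:
                break
    return accepted
```

The accepted set forms an empirical posterior; a smaller ϵ sharpens it at the price of a lower acceptance rate.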
For the offline input design case, the prior is the same for all experiments, i.e., p_t(θ) = p(θ) ∀t. Therefore, given observations Y_T and inputs X_T, samples z_1, …, z_T are generated as before using θ′ ∼ p(θ). Then, we use the aggregate distance

Σ_{t=1}^T ∥y_t − z_t∥² ≤ ε

to accept or reject θ′, with a new tolerance value ε, which we set to ε = ϵT so as to have the same tolerance per experiment as in the sequential design case.

VI. APPLICATION TO INTELLIGENT ROAD COMPACTION
In what follows we show how the considered problem applies to the use case of intelligent road compaction [22].
During road compaction, compaction actions (speed, vibration) are applied to asphalt, modeled as an elastoplastic material with uncertain parameters, so as to achieve a desired deformation, called permanent strain. Considering a contiguous stretch of asphalt of homogeneous material as a sequence of T sections, compaction can be modeled as T subsequent experiments. Learning the parameters of the compacted asphalt, which is the aim of the experiment, in turn allows one to optimize predictive maintenance, thereby reducing maintenance costs.

A. Mathematical model of compaction simulation
The compaction models of asphalt in the literature [2] are quite complex and require knowledge of several parameters in order to simulate the effect of compaction. In this paper, we use a bilinear elastoplastic material model together with the site geometry and boundary conditions. The process of compaction can be modeled as a loading pressure (stress) x that leads to a strain ε [23]. Due to the plastic nature of the material, part of the strain is plastic, which leads to permanent deformation. For simplicity, we consider the input in the experimental design to be the stress x. For a bilinear elastoplastic material, the relationship between the stress x (in MPa) and the permanent deformation (in m) can be described using three parameters: the yield point σ_yield, the linear elastic modulus E_e, and the isotropic tangent modulus E_Tiso. Thus, θ = [σ_yield, E_e, E_Tiso]^⊤. In the elastic region, there is no permanent deformation, since the removal of the load reverses the strain. However, after the stress reaches the yield point σ_yield, the rate of increase in strain changes according to E_Tiso.
In one dimension, we can write the relationship between the input stress x_t and the expected resulting output, i.e., the plastic deformation, as

y_t = −y_0 (x_t − σ_yield)(1/E_Tiso − 1/E_e) for x_t > σ_yield, and y_t = 0 otherwise, (41)

where y_0 is the original depth of the material before compaction.
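The one-dimensional deformation model can be sketched as follows; the piecewise form and the nominal depth y_0 = 1 m are illustrative assumptions based on the standard bilinear elastoplastic relation:

```python
def plastic_deformation(x, sigma_yield, e_elastic, e_tiso, y0=1.0):
    """Sketch of the 1-D bilinear elastoplastic response: below the yield
    stress there is no permanent strain; above it, the plastic strain grows
    as (x - sigma_yield) * (1/E_Tiso - 1/E_e).  Stresses and moduli in MPa,
    deformation in meters; y0 is an assumed nominal depth."""
    if x <= sigma_yield:
        return 0.0  # purely elastic: strain is recovered on unloading
    eps_plastic = (x - sigma_yield) * (1.0 / e_tiso - 1.0 / e_elastic)
    return -y0 * eps_plastic  # negative sign: compaction reduces depth
```

Note that the constraint E_Tiso < E_e imposed on the prior (see Fig. 4) guarantees a nonpositive (compacting) deformation.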
In two dimensions, the displacement field in the material can only be obtained using finite element (FE) methods, which require one to numerically solve a system of differential equations that govern the spatial variation in stress, and to compute the strain, which leads to displacement or deformation in 2D space. An example of a displacement field generated using COMSOL Multiphysics [24] is shown in Fig. 2, assuming that the measured permanent deformation corresponds to the displacement at the mid-point of the compactor in the direction perpendicular to the compactor wheel.

Fig. 2: Compaction simulation plastic displacement field generated by COMSOL [24] with parameters σ_yield = 0.2, E_e = 3, E_Tiso = 2 and loading pressure x = 2 (all values in MPa).

Fig. 3: Geometry and load by compactor.

B. Generation of Synthetic Data
We simulate the transverse direction of the road that the compactor passes along. We assume a contact length of 2 m, a typical width of a roller, and a contact width of 0.1 m in the longitudinal direction. We simulated a cross-section of the road using 2D solid mechanics with a plane strain setting. For simplicity, there are only two layers: a stiff underlying soil layer and the compaction material layer. The left and bottom boundaries are fixed, and the right boundary is a symmetry boundary, which constrains the movement along the X-axis. The geometry is shown in Fig. 3. For simplicity, we assume that the stiff soil layer is linear elastic with Young's modulus E = 300 MPa and Poisson's ratio ν = 0.35. The compaction material layer is bilinear elastoplastic with a constant Poisson's ratio of ν = 0.15, and the parameters to be estimated are θ = [σ_yield, E_e, E_Tiso]^⊤. Since the constitutive model we use does not depend on loading time, we focus on varying the loading pressure magnitude alone. We used COMSOL Multiphysics [24] for the FE computations; the response contains the plastic deformation of the top surface of the compaction material. Treating the result of the FE method as the function ϕ, for a given loading pressure x and parameters θ we obtain the observed deformation according to the generative model (1).

C. Training the Surrogate
In order to obtain a surrogate model ϕ_ω(x_t; θ), we created a data set using uninformative priors for the parameters, p(θ), i.e., uniform and independent distributions, with σ_yield drawn from a uniform interval and (E_Tiso, E_e) ∈ E, as detailed in Fig. 4. We used the data set for training a feed-forward neural network with three hidden layers of 10 neurons each. The activation function used is the rectified linear unit (ReLU). The parameters ω of the surrogate model ϕ_ω were learned using mini-batch stochastic gradient descent with RMSprop [25], i.e., the learning rate is divided by a running average of recent gradient magnitudes. The learning rate was 0.001, the batch size 50, and training ran for 1500 epochs. The test accuracy (RMSE) was 10⁻³.

D. Numerical Results
For the evaluation we used the same prior parameter distribution as for creating the training data set. A histogram of samples drawn from the prior distribution is shown in Fig. 4. The input x_t (in MPa) can take values in X, x_t ∈ X = {1, 1.05, …, 2.05}. The observation y_t is obtained by adding Gaussian measurement noise with standard deviation 1 × 10⁻³ to the output of the COMSOL model. The target deformation is y = −0.03, where the sign indicates compaction of the material. For the constraint, we used δ = 5 × 10⁻⁴, which is lower than the test accuracy of the surrogate model. For each evaluation, the true value of the parameter θ* was chosen from the prior distribution. We ran the algorithm 10 times with different sets of true parameters. For each set of true parameters, T = 20 compaction experiments were performed. In the offline setting, the value of the input x_t is chosen at random from the feasible set X_δ. We consider three different sampling strategies for x_t. First, we sample uniformly at random from the feasible set. Second, we sample from the set of points that lie on the boundary of the feasible set; in this problem, this translates to sampling from a set of two values where the average (with respect to the prior p(θ)) deviation from the target y is maximal. Third, we sample the input x_t from the distribution in (22). In the sequential setting, the cost in (36) is evaluated numerically for an input x_t using Monte Carlo integration [26] with 2000 evaluation points, due to the absence of closed-form expressions. In both the sequential and the offline methods, approximate Bayesian computation (ABC) is used for the posterior update. For both types of methods, the estimator is the posterior mean, E[θ|Y_T; X_T].

Fig. 5 shows the estimation performance of the offline and the sequential greedy input design methods. The mean-squared error, ∥E[θ|Y_T; X_T] − θ*∥², is plotted for different values of T. As expected, sequential input design outperforms all the offline methods, due to the feedback in the form of sequential observations used to improve the estimator. The behaviour of the MSE of the online greedy method appears erratic due to the limited number of trials. For the sequential greedy input design, the posterior distribution after T experiments is plotted in Fig. 6. We see that the mean of the posterior distribution is close to the actual values of E_e and E_Tiso, the posterior distribution is sharper than the prior, and its weight is concentrated closer to the actual value, thereby effectively reducing the uncertainty of the parameter estimate. Fig. 7 shows the input load x_t obtained by solving the greedy problem (36) and the corresponding observed permanent deformation y_t. The deformation stays in close proximity to the desired target compaction y, which shows the efficacy of sequential input design. The variations in the values of the input and the corresponding observations result from the competing objectives of reducing the parameter uncertainty and reaching the target value; a variety of input values helps improve the parameter estimates, whereas a stricter constraint, i.e., a low δ, leads to the same input value at every experiment.

VII. CONCLUSIONS AND FUTURE WORK
We investigated the problem of sequential parameter estimation of a nonlinear function under the Bayesian setting while also meeting the system output constraints, by optimally choosing the inputs to the system. We formulated this problem as a minimization of the Bayesian mean-squared error (BMSE) under constraints, which is a novelty of the paper, and put forward a greedy solution for the sequential estimation problem. The proposed use of a neural network surrogate provides computational ease compared to other methods in the literature for the posterior update. In the road compaction case, the greedy approach is found to be effective in accurate estimation of the parameters as well as in compaction of the road. Future work involves the analytical development of input design methods using approximate dynamic programming, and the investigation of deep learning based decision models for sequential input design under output constraints.

Algorithm 1: Approximate Bayesian Computation
Data: Observation y_t, input x_t, tolerance level ϵ, number of samples N_s
Result: Posterior sample set Θ
1: Θ ← ∅
2: while |Θ| < N_s do
3:   Sample θ′ ∼ p_t(θ)
4:   z_t ← ϕ_ω(x_t; θ′)
5:   if ρ(y_t, z_t) = ∥y_t − z_t∥² ≤ ϵ then Θ ← Θ ∪ {θ′}
6: return Θ

Fig. 4: Prior distribution histogram. σ_yield ∼ U[0.1, 0.8]; E_e and E_Tiso are jointly distributed as in (45) with E_e ∈ [1, 8], E_Tiso ∈ [1, 4].

The prior of the yield point σ_yield is uniform, σ_yield ∼ U[σ̲, σ̄], and is independent of the other parameters. The parameters E_e ∈ [E̲_e, Ē_e] and E_Tiso ∈ [E̲_Tiso, Ē_Tiso] have to satisfy E_Tiso < E_e. This is incorporated into the prior distribution by having

p(E_e, E_Tiso) = 1/C for (E_Tiso, E_e) ∈ E, and 0 otherwise, (43)

where C is the area of the region E in which E_Tiso < E_e.

Fig. 5: Estimation performance for the online and offline approaches. The estimator is the posterior mean E[θ|Y_T; X_T] after T experiments.

Fig. 6: Posterior distribution after T experiments for the sequential greedy input design.

Fig. 7: Input loading pressure x_t obtained by solving the greedy problem (36) in the online setting, and the corresponding observed plastic deformation y_t, which is close to the desired target value y.
From (30), note that the prior distribution p_t(θ) for experiment t was obtained based on the previous observations and inputs. This means that the posterior p(θ|Y_t; X_t) and its mean E[θ|Y_t; X_t] treat the previous observations Y_{t−1} and inputs X_{t−1} as constants. Then

p(Y_t; X_t) = p(y_t; x_t) ∏_{i=1}^{t−1} δ(y_i − ỹ_i),

where ỹ_i are the observed values and

p(y_t; x_t) = ∫ p(y_t|θ; x_t) p_t(θ) dθ. (33)

Also, since observations are independent of past observations given the parameter θ, the cost C(x_t) only depends on the input at experiment t, i.e.,

C(x_t) = ∫ Tr(C_{θ|y_t}) p(y_t; x_t) dy_t.