Multielement polynomial chaos Kriging-based metamodelling for Bayesian inference of non-smooth systems

This paper presents a surrogate modelling technique based on domain partitioning for Bayesian parameter inference of highly nonlinear engineering models. In order to alleviate the computational burden typically involved in Bayesian inference applications, a multielement Polynomial Chaos Expansion based Kriging metamodel is proposed. The developed surrogate model combines in a piecewise function an array of local Polynomial Chaos based Kriging metamodels constructed on a finite set of non-overlapping subdomains of the stochastic input space. Therewith, the presence of non-smoothness in the response of the forward model (e.g.~ nonlinearities and sparseness) can be reproduced by the proposed metamodel with minimum computational costs owing to its local adaptation capabilities. The model parameter inference is conducted through a Markov chain Monte Carlo approach comprising adaptive exploration and delayed rejection. The efficiency and accuracy of the proposed approach are validated through two case studies, including an analytical benchmark and a numerical case study. The latter relates the partial differential equation governing the hydrogen diffusion phenomenon of metallic materials in Thermal Desorption Spectroscopy tests.


Introduction
The widespread use of high performance computing (HPC) technologies has enabled the increasingly frequent adoption of computationally intensive numerical models in a myriad of disciplines in academia and industry.Such high fidelity models allow conducting virtual testing of engineering systems, avoiding the technical limitations and minimizing the costs associated with traditional experimental testing.Nevertheless, formidable challenges still arise when implementing these models into computationally demanding studies such as optimization [1] , sensitivity analysis [2] , model identification and calibration [3] , reliability analysis [4] , or robust design [5] .As a solution, a variety of surrogate models or metamodels have been proposed in the literature in recent years.Nevertheless, despite the considerable advances in the field, there remain open research challenges for their extensive use such as handling highly nonlinear model responses, limited training datasets and, in general, the online implementation of surrogate models to enable decision-making in engineering systems [6,7] .In particular, the development of real time surrogate model-based parameter estimation approaches draws high interest in frontier research fields such as digital twins development [8] and smart maintenance of engineering systems [9] .
In their broadest sense, surrogate models are computationally light black box representations of resource-intensive models.Surrogate modelling methods can be generally classified into three categories [10] : (i) projection-based methods or reduced-order models (ROMs) [11] ; (ii) multi-fidelity methods [12] ; and (iii) data-driven or responsive surface methods (RSMs).Projection-based techniques project the governing equations of the original model onto a low-dimensional subspace.Hence, although ROMs have the advantage of retaining the physics underlying the model, these are limited to situations where access to the governing equations is granted, which is not the case in many practical applications when using commercial software.Multi-fidelity methods are built by simplifying the underlying physics or reducing the numerical resolution.These methods may also render some difficulties, being highly case-dependent and requiring specialized expertise to find a suitable trade-off between prediction accuracy and computational burden.Such difficulties have fostered rapid developments of RSMs in recent years as a non-intrusive technique with great flexibility in a wide range of applications.The key advantage of these methods relies in the fact that the forward model does not need to be modified and, therefore, it can be essentially treated as a black box [13] .Among the broad variety of RSMs available in the literature, some of the most popular ones are Kriging [14] , radial basis functions (RBF) [15] , support vector regression (SVR) [16] , artificial neural networks (ANN) [17] , Gaussian process (GP) regression [18] , polynomial chaos expansions (PCE) [2] and Polynomial Chaos Expansion based Kriging (PCK) [19] .These models are trained by exploiting a set of realizations of parameters of interest of the model, called the experimental design (ED), and the corresponding model evaluations, also called quantities of interest (QoI).Therefore, the accuracy of data-driven surrogate models is highly determined by the dimensions of the design space and the number and distribution of the training samples in the ED [20] .In engineering practice, obtaining the ED constitutes the most time-consuming part since it requires the evaluation of the computationally intensive forward model at each sample point.Choosing high quality EDs is thus critical to achieve high accuracy in the surrogate model construction with the least possible number of training samples.Sampling approaches can generally be divided into static (one-shot) and sequential methods.One-shot sampling generates the training sample points in one single step, and common approaches include fractional designs and orthogonal arrays [21] .While these techniques offer easy implementation and minimal computational cost, the determination of the optimal sample size may be troublesome when the behaviour of the forward model is unknown.To minimize such difficulties, a number of sequential sampling strategies have been introduced, including adaptive and space-filling sequential methods [22] .On one hand, space-filling sequential designs such as Latin Hypercube Sampling (LHS), Sobol, Hammersley and Halton sampling [23] generate samples iteratively to attain good coverage of the parametric domain.On the other hand, adaptive sampling techniques iteratively draw new samples in regions of the parametric space with large prediction errors, enabling to account for local refinements in the ED (refer to references [22,24] for a thorough state-of-the-art review).
A second major challenge of non-intrusive surrogate models regards the difficulties involved in the fitting of non-smooth models exhibiting unsteadiness, sparseness or large perturbations.In these cases, adaptive solutions accommodating local relevant refinements of the response surface are required.A large volume of research has been conducted in the last decade to address this issue, giving origin to a number of advanced surrogate modelling techniques such as multi-resolution generalized PCE [25] , domain partitioning [26] , sparse grid collocation [27] , Voronoi tesselations [28] , local search at trust regions [29] , clustering-based partitioning [30] , ensembles of surrogate models [31] , multi-element generalized polynomial chaos (ME-gPC) method [27] and multi-element probabilistic collocation methods (ME-PCM) [32] .In this light, different partitioning techniques can be found in the literature in the context of the pure GPs [33,34] , PCE [35] , and PCE-based mapping of likelihood functions in parameter inference applications [25,36] .In those works, different domain partition criteria were proposed based on dissimilarity measurements between regions [34] , data density [33] , or maximum residual differences [35,36] .In general, these approaches usually generate series of subdomains where the non-smooth response surface can be assumed locally smooth, thus enabling the definition of local surrogate models contributing to the global response in a piecewise fashion.
The development of cost-efficient surrogate models opens vast new opportunities for real-time parameter estimation applications.Probabilistic Bayesian approaches are particularly attractive owing to their ability to assess the effects of uncertainties on the model parameters and the derived response predictions, as well as their robustness to noise pollution and efficiency to deal with ill-conditioning and ill-posedness.Such excellent features have fostered their implementation in multiple fields such as structural identification [37] , geotechnical problems [38] , material characterization [39] , and bioengineering [40] , just to mention a few.In general, Bayesian parameter estimation approaches exploit experimental data to infer the posterior probability distribution functions (PDFs) of certain unknown model parameters through the Bayes' theorem.Nonetheless, the direct evaluation of the posterior PDFs requires solving the possibly high-dimensional integral related to the evidence of the model.Therefore, except for some trivial cases, posterior PDFs often need to be approximated numerically.Markov chain methods constitute the most widespread set of techniques to extract series of samples to estimate the posterior PDFs, allowing to sample from a large class of high-dimensional distributions.Popular procedures for Markov chain Monte Carlo (MCMC) sampling are the Metropolis-Hastings [41] and Gibbs algorithms [42] .The basic idea of these techniques is to construct a Markov chain with a stationary distribution resembling the posterior distribution, in such a way that a sample of the joint PDF of the model parameters can be obtained by collecting the states of the chain.These methods guarantee asymptotic convergence to the exact PDF, although a considerably large number of iterations are typically required to achieve convergence, which compromises the computational efficiency of the inference [43] .To alleviate the computational burden in classical MCMC techniques, a variety of more efficient sampling algorithms have been proposed in recent years, including Sequential Monte Carlo (SMC) [44] , Transitional MCMC [45] , and Bayesian broad learning [46] .Notwithstanding these advances, their elevated computational cost remains a critical limitation when high-fidelity models are considered in the inference problem.Herein is where surrogate models offer an efficient solution to conduct cost-efficient Bayesian inference while retaining the accuracy of high-fidelity models.The enormous potentials of this approach are evidenced by the increasing number of research studies reported in recent years.It is worth noting the work by Schneider et al. [47] who proposed a Bayesian procedure using rational PCE metamodels of the response of dynamic systems in the frequency domain, and demonstrated its effectiveness for the identification of a cross-laminated timber plate.Xing et al. [48] developed an additive GP model for multi-fidelity surrogate modelling inserted into a non-parametric Bayesian approach with a closed-form solution for the predictive posterior PDF.The accuracy and flexibility of their approach were validated with several benchmark problems, including the identification of a solid oxide fuel cell model and an elbow-shape pipe under turbulent mixing flow conditions.Ierimonti and co-authors [49] proposed a Kriging-based conjugate Bayesian identification methodology for online damage identification of an instrumented monumental building, the Consoli Palace in Gubbio (Italy).Del Val et al. [50] implemented an MCMC approach to infer the catalytic recombination parameters of reusable thermal protection materials exploiting plasma wind tunnel measurements.Interestingly, instead of bypassing a high-fidelity model, those authors approximated the likelihood function in the Bayesian inference using a GP surrogate model to accelerate the parameters estimation.
In light of the literature review above, it is apparent that the fields of surrogate modelling and its applications for fast parameter estimation have experienced considerable advances in recent years.Nonetheless, light metamodels capable of bypassing highly nonlinear models for online parameter estimation are yet to be fully developed.In this context, the present work presents a novel multielement surrogate model extending the PCK model proposed in Schobi et al. [19] for online Bayesian parameter inference of non-smooth models exhibiting highly nonlinear spatial variability.To do so, a simple but efficient and flexible regular block partitioning approach is adopted.On this basis, the space domain is partitioned into distinct subregions with splitting directions defined after a preliminary sensitivity analysis.The algorithm prioritizes the partition of those variables providing highest sensitivities in terms of the Sobol indices, which can be readily extracted from the coefficients of a global PCE [51] .Regarding the construction of the local PCK surrogate models, the optimal order of the polynomials in the PCE is automatically identified by a model selection technique for sparse linear models, the least-angle regression (LAR) algorithm set out by Efron et al. [52] .Then, the optimal PCE is inserted into a Kriging predictor as the trend term, while the stochastic term is fitted through a genetic algorithm (GA) global optimization approach.The global model response is obtained by combining the local surrogate models in a piecewise fashion.In order to minimize the computational burden in the construction of the surrogate models, the ED is obtained with the adaptive Monte Carlo-Intersite-proj-th (MIPT) approach [22] .Finally, the surrogate model is used for Bayesian parameter estimation using a cost-efficient adaptive MCMC with delayed rejection (DRAM) algorithm developed by Haario et al. [53] .The effectiveness of the proposed approach is validated through two benchmark case studies: (i) an analytical benchmark; (ii) and a partial differential equation (PDE) for thermal desorption spectroscopy (TDS) experiments of hydrogen in metals.The latter represents a formidable example of a model exhibiting a non-smooth behaviour in the shape of nonlinearities and unsteadiness.The presented results and discussion demonstrate the robustness of the proposed approach to the presence of measurement noise and multimodality in the posterior distributions.Overall, the developed surrogate model assisted Bayesian inference approach represents a comprehensive solution for fast parameter inference of computationally intensive mathematical models, offering wide applicability across disciplines such as the identification of structural systems, geotechnical problems, Bioengineering, or Astrophysics, among others.
The remainder of this paper is organized as follows.Section 2 outlines the theoretical formulation of the proposed approach.Section 3 presents the numerical results and discussion.In particular, two case studies are investigated, namely a benchmark analytical model and a numerical model of the TDS analysis of hydrogen desorption in metals.Finally, Section 4 discusses the contributions of this work and presents the main concluding remarks.

Theoretical formulation
The main purpose of this section is to present the theoretical fundamentals of the proposed surrogate model-based Bayesian inference approach.The general methodology is sketched in Fig. 1 and comprises four main steps, namely (i) domain partitioning; (ii) construction of the local surrogate models; (iii) surrogate models assemblage; and (iv) validation.

Surrogate modelling: polynomial chaos expansion based Kriging
Let ( , , μ) be a probability space, with ⊂ R M denoting the event space equipped with a σ -algebra of subsets of and a probability measure μ such that μ( ) = 1 .Let M : ⊂ R M → R be a computational model mapping a vector of T into the output variable or quantity of interest y ∈ R .The goal of surrogate modelling is to approximate the forward model M , which is typically computationally intensive, by a computationally inexpensive function ˆ M .In this paper, PCK metamodels combining PCE and Kriging are adopted.In this light, PCE is used to approximate the global behaviour of the computational model M while the Kriging metamodel captures its local behaviour.

Polynomial chaos expansion
Assume that the input vector x ∈ is constituted by M independent random variables components { x i } M i =1 with probability density functions μ X i .Then, PCE represents the output response y ∈ R as the infinite expansion of M over an orthonormal basis of multivariate polynomials α as: where a α , α = (α 1 , . . ., α M ) , α i ∈ N are the coefficients of the expansion.Orthonormal basis families associated with a variety of standard distributions can be found in reference [54] .Multivariate polynomials α can be expressed in terms of a family of univariate polynomials j are also orthonormal with respect to the marginal distribution, that is: with δ j,k being the Dirac Delta function.Random variable x induces the probability measure μ The PDF of x is given by the probability measure μ x as μ x ( u ) = μ( x ≤ u ) , u ∈ R M .Note that, since the components of x are independent, the joint probability function μ x ( u ) is given by the product of the marginal PDFs of x i , i.e. μ x ( u ) Therefore, Eq. ( 2) can be written in a more compact form as: Although the expression in Eq. ( 1) is exact for an infinite number of terms, in practice only a finite number can be computed and a certain truncation scheme needs to be adopted.One of the simplest approach consists in selecting all the polynomials whose total degree | α| = M i =1 α i belongs to the set: For high-dimensional and non-linear problems, this truncation procedure usually leads to large numbers of polynomial coefficients and considerable computational burdens.Nevertheless, it is often observed in many practical applications that coefficients corresponding to high interaction terms between the input variables are close to zero, a phenomenon that is also known as the sparsity-of-effect principle [51] .To alleviate this, an hyperbolic truncation scheme can be adopted.
This approach selects all multi-indices with q -norm α q = M i =1 α q i 1 q less than or equal to a certain model order p, i.e.A M,p,q = α ∈ N M : α q ≤ p .Note that the expansion tends to only maintain univariate polynomials as the q -norm decreases, thus achieving a reduction in the computational cost of the metamodel.Nonetheless, the number of terms may remain elevated since the potential sparseness in the coefficients is not being actually assessed.Following the work by Blatman and Sudret [51] , the LAR algorithm [52] is adopted to further reduce the number of polynomial coefficients.In the context of PCE, LAR constructs a set of expansions incorporating an increasing number of basis polynomials α , from 1 to P = card A M,p,q .The resulting sequence of index sets is used to construct a family of expansions with decreasing sparseness.Finally, a cross-validation procedure can be implemented to select the best metamodel among the obtained family of expansions.In particular, the Bayesian Information Criterion (BIC) is adopted in this work.Once the optimal index set is selected, the expansion coefficients a = { a α , α ∈ A M,p ⊂ N M } are obtained by minimizing the expectation of the least squared error: In practice, Eq. ( 5) is calibrated on a set of N realizations = x (1) , . . ., x (N) of the input variable x forming the ED.In order to obtain a representative ED, the adaptive MIPT sampling method [22] is adopted in this work.Then, the expectation operator in Eq. ( 5) is replaced by its discretized version using the ED realizations as follows: Denoting the realizations of the output variable y by y = { y (1) = M ( x (1) ) , . . ., y (N) = M ( x (N) ) } T , the solution of the optimization problem in Eq. ( 6) reads: where denotes the information matrix calculated from the evaluation of the basis polynomials on .For the least-square minimization problem in Eq. ( 6) to be well posed, the size of the ED is usually selected according to the heuristic rule N ≈ 2 • P or 3 • P [51] .Once Eq. ( 6) has been solved, the predictions of the PCE surrogate model can be obtained as:

Polynomial chaos expansion based Kriging (PCK)
The Kriging method assumes that the response of a computational model M ( x ) is modelled by the sum of a stochastic random process Z ( x ) and a regression model T ( x ) , also called trend, in the form Fuhg et al. [22] : The stochastic component in Eq. ( 9) is fully determined by the covariance function [55] : with σ 2 being the process variance, and R x − x ; θ an auto-correlation function [56] between two input sample points x and x ' that depends on certain hyper-parameters θ to be computed.In this work, the Gaussian correlation function is adopted as: The trend term of the Kriging model in Eq. ( 9) interpolates the forward model evaluations at the ED, while the local variability is captured by the stochastic process.Depending on the form of the trend, three different versions of Kriging are typically referred to in the literature [22] , including simple, ordinary and universal Kriging, which respectively correspond to polynomials of degrees 0, 1 and N.In this work, with the aim of combining the excellent global approximation capabilities of the PCE previously introduced in Section 2.1.1 , the sparse PC expansion obtained by LAR is introduced in the shape of the trend term in Eq. ( 9) .The resulting PCK metamodel reads: The construction of the PCK metamodel in Eq. ( 12) consists in two steps.Firstly, the optimal set of orthonormal polynomials α (for α ∈ A the truncation set) is obtained by LAR as indicated in Section 2.1.1 .Secondly, the calculation of hyperparameters ˆ θ and the polynomial coefficients and the process variance { a ( ˆ θ) , σ 2 ( ˆ θ) } are obtained.The optimal correlation parameters ˆ θ can be determined by the Maximum-Likelihood-Estimation (ML) through the following minimization problem [57] : In order to solve the optimization problem in Eq. ( 13) , local optimization algorithms such as gradient-based methods are often used.Nonetheless, a major drawback of these techniques relates the troublesome identification of global maxima/minima, being possible to get stuck in local maxima/minima.To avoid this, a global genetic algorithm optimization procedure is used in this work.Since the correlation matrix is symmetric and positive definite, its inverse in Eq. ( 13) is computed by Cholesky decomposition.Then, once ˆ θ is computed, the polynomial coefficients and the process variance { a ( ˆ θ) , σ 2 ( ˆ θ) } are calculated using the Empirical Best Linear Unbiased Estimator (BLUE) as [58] : where θ is the correlation matrix and i j = ψ j x (i ) the information matrix evaluated at all the samples of the ED.

Effective explorative sampling
With the aim of generating representative EDs, the MIPT algorithm is adopted as a computationally efficient and easily implementable adaptive sampling technique.The main advantage of this technique compared to space-filling techniques such as LHS regards its ability to avoid local clustering of points which may consequently lead to numerical instabilities in the inverse of the Kriging correlation matrix in Eq. ( 14) .This exploration distance-based sampling method iteratively augments the ED by adding new sampling points with maximum distance with respect to the data population in the ED among a large set of N c random Monte-Carlo candidates.Specifically, among the candidates set C = { ξ (1) , ξ (2) , . . ., ξ (N c ) } , a new sample x (N+1) is chosen by solving the following optimization problem: with • 2 the euclidean norm, i.e.

Multi-element surrogate model approach
The previously presented PCK metamodel suffers from low convergence rates when the forward model M exhibits nonsmoothness [59] .Thus, considerably large ED sizes are often required to achieve accurate predictions.This aspect undermines the computational efficiency of the PCE-based Kriging model, which is dominated by the O(N 3 ) complexity of the Kriging predictor.In turn, this implies long construction times or even memory overflow issues when solving the optimization problem in Eq. (13) .Moreover, the larger the size of the ED, the slower the evaluation of the corresponding metamodel, which reduces or vanishes the advantages of the surrogate approach.To address this issue, a multi-element PCK model inspired by the ME-gPC method by Wan and Karniadakis [60] is proposed in this work.This approach consists in the partitioning of the random input space into a finite set of non-overlapping subdomains, the construction of a local PCK surrogate model in each one following the formulation in Section 2.1.2and, finally, assembling them into a piecewise function to obtain a global metamodel, as sketched in Fig. 1 .
To systematize the partition of the domain, a simple but efficient regular block partitioning approach has been adopted.To do so, the domain partitions are conducted one variable at a time based on a preliminary sensitivity analysis.To this aim, the algorithm prioritizes the partition of those variables providing highest sensitivities in terms of the Sobol indices, which can be readily extracted from the coefficients of a global PCE [51] .In this way, priority in the partitioning is given to the direction of those parameters with highest sensitivity, i.e., those with the greatest effect on the variability of the quantity of interest y .Over a set a surrogate models considering an increasing number of domain partitions, the optimal number of Table 1 Error metrics for the accuracy assessment of surrogate models over a validation set (VS) of size K (Ref.[61] ).

Normalized mean-square error (NRMSE)
Normalized average absolute error (NAAE) Coefficient of Determination ( R 2 ) Normalized maximum absolute error (NMAE) partitions is determined in terms of the quality metrics reported hereafter in Table 1 .In general, for the random variable x : → D x ⊂ R M , a decomposition is defined as: where χ D j : → R denotes the indicator random variable: In this way, the global model is defined in a piecewise fashion as: ˆ As aforementioned, the number of partitions in this work is defined after a parametric analysis.Nevertheless, the previous formulation may be readily automated as follows.The splitting criterion of the domain is determined by a certain user-defined accuracy goal and a minimum number of samples N per region.Afterwards, the splitting process is performed iteratively from a PCK model built over the full parameter space .In case the target accuracy has not been reached, the space is split into two regions and the ED is enriched in each of these subdomains by the MIPT algorithm until there are N samples in each one.Note that, given the sequential nature of MIPT, the information of the previously extracted samples is not lost.If the accuracy goal is not reached yet, a new division of the space and a new enrichment of the ED are performed.

Surrogate model accuracy. Complexity analysis of the algorithm
To evaluate accuracy of the developed metamodel, both local and global error metrics are considered.These metrics are computed by considering a validation set (VS) = { ξ (1) , . . ., ξ (K) } , K ∈ N , of the parameters space (independent of the ED).Denote by ϒ = { υ (1) = M ( ξ (1) ) , . . ., υ (K) = M ( ξ (K) ) } and ˆ ϒ = { ˆ υ (1) = ˆ M PCK ( ξ (1) ) , . . ., ˆ υ (K) = ˆ M PCK ( ξ (K) ) } the outputs of the VS estimated by the forward model and the metamodel, respectively.Then, the accuracy of the surrogate model can be assessed by using the error metrics like those collected in Table 1 .In this table, Ῡ and σ ϒ = In addition to the error metrics shown in Table 1 , and to verify the whole rate of convergence of the proposed model to the unknown function on untried points, we are interested in bounding the maximum PCK-predictive error over the domain where ˆ is the best linear unbiased predictor (BLUP) of the model M response at any untried point T the vector of correlations between the design sites j = x (1) , . . ., x (N j ) ⊂ D j and x , and R j the selected correlation function particularized in the j th subregion.Note that the uniform bound in Eq. ( 18) covers the worst case for the prediction error of the PCK model.
It has been reported in the literature [62,63] that the prediction error of the universal Kriging converges to zero under uniform metric.Adapting Theorem 2 in Wang et al. [63] to the multielement PCE-Kriging model proposed in this work, the prediction error can be stated to satisfy: where J is the number of subdomains, P j = card A M,p,q , and A is a constant depending on the eigenvalues of j .Term P j ( x ) denotes the power function given by P 2 j ( x ) := 1 − r j T ( x ) R j −1 r j ( x ) , and P := sup x ∈D x P j ( x ) is the supremum of the pointwise predictive standard deviation.It is thus reasonable to look for EDs minimizing P j .Note that the rate of convergence in Eq. ( 19) is a deterministic function dependent on the experimental design j and decreasing with P j .In fact, when N j = card ( j ) increases, P j tends to zero and so does the multielement PCK prediction error under the uniform metric in Eq. ( 19) .
On the other hand, the algorithm for finding an optimizer of Eq. ( 13) is an iterative process involving the calculation of the inverse and determinant of a large N × N covariance matrix R i j = R x (i ) − x ( j) ; ˆ θ .Thus, the computational effort to obtain the solution may become impractical for large numbers N of training data points in .Note that the PCK model requires O(N 3 ) operations and has a memory complexity of the order of N 2 [34] .In this light, the splitting technique presented in Section 2.1.3leads to substantial reductions in the computational effort.Specifically, taking N j = card ( j ) , with N j N, J N j , the algorithm effort and the memory storage reduces to , respectively.On the other hand, the optimal order of the polynomials in the PCE is automatically identified by the LAR algorithm.It is reported in reference [52] that the LAR algorithm with M variables requires O(M 3 + N j M 2 ) computations in any subdomain D j .Therefore, in our case where M N j , it follows that M 3 < N j M 2 and, thus, O(N j M 2 ) ∼ O(N j ) .Hence, the computational complexity of PCE when inserted as the trend term is marginal with respect to the overall construction of the Kriging model, thereby we can deduce that the efficiency of the proposed PCE-Kriging metamodel is

Bayesian parameter inference via MCMC
In the Bayesian inference framework, model parameters θ are conceived as a random variable with a certain posterior PDF π described by Bayes' theorem: where p( y | θ) = L ( θ) denotes the likelihood function, p( θ) the prior distribution of the model parameters, and p( y |M ) a normalizing constant, also called evidence.In the context of this work, y and θ represent a set of n experimental observations and the model parameters of the metamodel to be calibrated, respectively.Errors ε between the experiment and the predictions of the surrogate model are assumed to be normally distributed with zero mean and standard deviation σ ε , that is y = M ( θ) + ε with ε ∼ N (0 , σ ε I ) .Then, the likelihood function L ( θ) can be expressed as: Obtaining π from Eq. ( 20) in analytical closed-form is infeasible in most practical applications, being MCMC methods the most popular approach to numerically characterize the PDF of the model parameters.This approach allows one to draw samples from π without computing the model evidence, which is independent from the model parameters θ.In this work, the DRAM algorithm developed by Haario et al. [53] is implemented.This approach combines delayed rejection (DR) [64] and adaptive Metropolis (AM) [65] , which enhances the sampling efficiency of the sampling and enables the identification of multi-modal PDFs.Given a set of observed data samples in vector d , the working principle of the DRAM approach can be outlined as follows: 1. Initialize the parameter set θ c = θ 0 and the number T of desired samples.Set an initial point from the parameter space and the covariance of the proposal distribution p = 0 .The proposal distribution is chosen as a multivariate Gaussian distribution with mean θ c and covariance matrix p .Select the initial non-adaptation period n o and set i = 1 .2. Propose a new parameter value θ p, 1 by sampling from a proposal PDF S 1 ( θ, θ c ) .Accept θ p, 1 with probability: and go to step (4).If rejected, propose a second stage move in step (3).
3. Propose a second stage move θ p, 2 sampling from S 2 ( θ, θ p, 1 , θ c ) .This second stage proposal depends not only on the current position of the chain but also on the candidate that has just been proposed and rejected.Accept or reject θ p, 2 by setting: with 4. Update the covariance matrix p as: with s d a scaling parameter.Following [66] , s d = 2 .4 2 /d with d being the number of fitting parameters is recommended as a good default value in most applications.5. Go to step 2, until the desired number of samples T are obtained.

Numerical results and discussion
This section presents two application case studies to demonstrate the effectiveness of the proposed surrogate modelbased Bayesian parameter estimation.These include a two-dimensional benchmark function and the PDE for TDS testing of hydrogen desorption in metals.The previous formulation has been implemented in Matlab environment, and all the numerical tests have carried out in a computer Intel(R) Core(TM) i9-10900X CPU @ 3.70 GHz with 64 GB of RAM memory.In the remainder of this section, for simplicity in the notation, the predictions of the PCK metamodels ˆ M PCK are noted as ˆ M .A q -norm value of 0.95 and Legendre polynomials of orders ranging from 2 to 6 are selected to build the PCEs in all the analyses hereafter.For the generation of the EDs, the number of random Monte Carlo candidate samples in the MIPT algorithm introduced in Section 2.1.2is set to 25,0 0 0.

Two-dimensional Drop-Wave function
This first case study investigates the Drop-Wave function, also known as the Salomon's function [67] , given by f : This function is commonly used for benchmarking optimization algorithms.Owing to its highly non-linear character, the Drop-Wave function represents an ideal case study to validate the proposed multi-element PCK metamodel.Note that the surrogate modelling of this function using conventional techniques is extremely challenging given its fast-varying gradients and irregular response as observed in Fig. 2 (a).Following Section 2.1.3, four experimental design sets ED i ⊂ D x , i = 1 , . . ., 4 containing 360, 720, 1440 and 2880 samples have been defined.In addition, three different number of domain partition schemes P j , j = 1 , . . ., 3 , have been considered.These include and P 3 = −10 , −10 3 ∪ −10 3 , 10 3 ∪ 10 3 , 10 2 , leading to a total of two, four and nine sub-domains, respectively.The number of samples has been chosen with the aim of obtaining a wide range of errors to correctly identify the convergence of the prediction error.For instance, if one takes the R 2 error metric, note that the constructed metamodels exhibit a wide range of R 2 values from 0.011 to 0.999.Additionally, the predictions by a previously reported multielement approach, the Stochastic Spectral Embedding (SSE) model proposed by Sudret and Marelli [35] , are also presented as a reference solution.
The SSE model is a PCE-based technique consisting of constructing a sequence of residual spectral expansions of the target model in subdomains of the input space.The implementation included in the UQLab software [68] has been used to carry out the analyses.Four different surrogate models have been built, one for each considered ED.As parameters, a q -norm value of 0.95, polynomials ranging from degree 2 to 10, and a minimum size of points per region equal to the size(ED)/120 have been selected.To sample the ED, the sequential experimental design based on the LHS implemented in UQLab has been chosen.Hence, a total of sixteen surrogate models have been constructed.For ease in the discussion, the PCK surrogate models are specified with sub-and super-indexes denoting the size of the ED and the number of partitions, respectively, as reported in Table 2 .All the surrogate models have been validated using a reasonabily large VS of 20,0 0 0 samples, and the accuracy of the models has been evaluated through the accuracy metrics reported in Table 3 .The computational times involved in the construction t c of the surrogate models, as well as their evaluation times t e and t e (VS) for a single point and the full VS

Table 2
PCK surrogate-models constructed for the Drop-Wave function considering increasing EDs ( ED i ) with varying numbers of domain partitions ( P j ) (VS of 20,0 0 0 samples).
No. of sub-domains ED 1 (360 samples) ED 2 (720 samples) ED 3 (1440 samples) ED 4 (2880 samples)  The first two metamodels consider the whole design space, while the last two account for partition approaches.It is noted in this figure that the best approximations are found for ˆ M 1 4 (no partitions) and ˆ M 3 4 (9 partitions with EDs of 2880 samples).The predictions by these models exhibit low scatter around the diagonal line (perfect metamodel) with coefficients of determination R 2 very close to 1.The limited efficiency of the SSE model in this case study is evidenced by the large scatter of its predictions along the diagonal in Fig. 3 (d).Interestingly, note that the predictions by ˆ M 1 4 slightly  outperform those obtained with ˆ M 3 4 , while higher numbers of partitions do not seem not to systematically improve the prediction accuracy.Nevertheless, the computational times involved in the construction and evaluation of ˆ M 1 4 are, respectively, about 10 and 100 times those required by ˆ M 3 4 (see Table 3 ).It is extracted from this analysis that the selection of the optimal surrogate model must be conducted by balancing the computational burden and the fitting accuracy.In this regard, Fig. 4 investigates the computational efficiency in terms of t e versus prediction accuracy (NRMSE) for all the considered surrogate models.In this figure, it is trivially observed that as the size of the ED increases, both computational time and accuracy of the metamodels increase.It is important to highlight that the consideration of higher number of subdomains leads to lower evaluation times for all the considered EDs.This is explained by the implementation of the LAR algorithm to extract optimal sets of polynomials in the PCE, and in particular, thanks to the reductions in the computational cost involved in the construction of the Kriging predictor (see Section 2.1.4).When inspected in a partition-wise fashion, the forward model exhibits a smoother behaviour, in such a way that the PCE requires less high-order polynomials to reproduce its behaviour.This results in more compact expansions, which also decreases the cost in the computation of the correlation matrix in the Kriging metamodel.On the other hand, note that the higher the order in the PCE, the larger the number of samples that are required in the ED to fit the expansion with accuracy.In this light, to reach a comparable accuracy to the one achieved by the Multi-Element PCK ( R 2 > 0 .99 ) through SSE, it is necessary to increase the degree of the polynomial expansion up to 14 and to sample a ED with more than 45,0 0 0 training points.These results strengthen the comparatively superior convergence rate of the proposed approach for this class of problems with highly non-linear spatial variability.As previously detailed in Section 2.1.4, the computational complexity of the proposed PCK model is O(N 3 j ) .This is due to the Cholesky decomposition of the correlation matrix R in Eq. (13) .Therefore, as the ED increases, the computational cost involved in the determination of the stochastic hyper-parameters of the Kriging model and its evaluation rises dramatically.On the other hand, the dependence of the accuracy of the metamodel with the number of partitions is not so clear.It is noted that the accuracy of metamodels trained with a larger number of subdomains is higher compared to those with no partitions for limited to moderate EDs.Nevertheless, when the size of the ED goes from moderate to large, the accuracy diminishes.

Thermal desorption spectroscopy (TDS) of hydrogen in metals
This last section reports the use of the proposed surrogate model for the Bayesian identification of hydrogen desorption and trapping characteristics in metallic materials.Hydrogen embrittlement (HE) refers to the loss of ductility and toughness of metallic alloys induced by hydrogen atoms deposited at lattice sites and micro-structural defects such as dislocations, grain boundaries or vacancies [69,70] .Although this phenomenon has been extensively documented since the 19th century [71] , the growing trend towards a hydrogen-based economy as a means of mitigating CO 2 emissions and fossil fuel dependency has generated unprecedented interest on HE research.Micro-structural defects in metals act as 'trap' sites, which sequester hydrogen and govern the susceptibility to HE [72,73] .Their characterization is thus of pivotal importance for the understanding of HE and the design of HE-resistant alloys, and this is generally achieved using TDS experiments [74] .The TDS test involves several stages [75] : charging a sample with hydrogen, heating the sample at a fixed rate, and detecting the flux of desorbing hydrogen as a function of temperature by using a mass spectrometer.The hydrogen flow curve of desorbing hydrogen as a function of temperature defines the TDS spectrum, whose peaks can be associated with the presence of diverse micro-structural defects.Nonetheless, the formation of peaks in the TDS spectrum may be induced by the combined action of manifold hydrogen traps, being necessary to use simulation models and inverse calibration for their identification.Previous investigations on the modelling of the TDS test evidenced the existence of non-smooth relationships between the flux curves and the parameters characterizing micro-structural defects (see e.g.[76] ), making this application a formidable benchmark case study for the formulation presented in this work.In the remainder of this section, the PDE governing the hydrogen diffusion in materials tested by TDS is introduced in Section 3.2.1 .The construction of the surrogate model and its performance evaluation is reported in Section 3.2.2and, finally, Section 3.2.3presents the Bayesian parameter identification results.

TDS governing diffusion equation
Consider a one-dimensional specimen of length L as sketched in Fig. 5 (a).The specimen is subjected to increasing temperatures T , starting from T o and increasing at a constant heating rate φ.Hydrogen atoms occupy normal intersticial lattice sites (NILS) and additionally can reside at trapping sites such as interfaces or dislocations.The kinetics of hydrogen trapping and detrapping in metals is commonly described with a two-level system as sketched in Fig. 5 (b) for the case of a single trap.The potential landscape in this figure describes the diffusion path of hydrogen in metals, the trap binding energy H being the difference between detrapping and trapping energies.Let us assume that the number of hydrogen traps in the specimen amounts to N t .Then, let C L (x, t ) and C T,i (x, t ) , i = 1 , . . ., N t , denote the hydrogen concentration in the lattice and in the i -th trap, respectively, with x ∈ [ −L/ 2 , L/ 2 ] and t respectively denoting space and time.On this basis, the Fickian diffusion equation needs to be enriched with source and sink terms as [76] : with D L = D o exp ( −Q/RT ) being the lattice diffusion coefficient, which is expressed in terms of the lattice activation energy Q, diffusion pre-exponential factor D o , and the universal gas constant R .It is convenient to introduce the lattice and Trap binding energy Trap density , respectively, by rewriting the corresponding concentrations in the form Here, β is the number of NILS per unit volume, α is the number of atoms sites per trap, N L is the number of lattice atoms per unit volume, and N T,i is the number of trap sites per unit volume.Therefore, Eq. ( 27) can be rewritten as: The PDE in Eq. ( 28) needs to be complemented with trap kinetic equations.To this aim, the formulations by MacNabb and Foster [77] and Oriani [78] are commonly adopted.The latter represents a simplification of the former by assuming that a local equilibrium exists between the hydrogen atoms at the lattice sites and the i -th trap such that, for θ L 1 , with K i being the local equilibrium constant for the i -th trap: Introducing Eq. ( 29) into (28) , and following the non-dimensional formulation developed by Raina et al. [76] , the governing PDE describing hydrogen diffusion in the TDS test can be recast in a compact form as: with θ 0 L being the initial lattice occupancy.The non-dimensional variables employed are listed in Table 4 .The initial and boundary conditions of the PDE in Eq. ( 31) are schematically presented in Fig. 5 (a).At t = 0 , it is assumed an initial uniform lattice occupancy θ L x , t = 0 = 1 .Thereafter, the hydrogen lattice occupancy is assumed zero at the boundaries, that is θ L x = ±1 / 2 , t > 0 = 0 .As temperature raises, the lattice occupancy evolves spatially and temporally as sketched in Fig. 5 (c), and the flux of hydrogen atoms J(t ) diffusing out at boundaries is measured as presented in Fig. 5 (d).This flux can be obtained in non-dimensional terms after solving Eq. ( 31) as [76] : Generally, the magnitudes of Q, D 0 and θ 0 L are known, and the heating rate φ is an input to the TDS system.Therefore, the TDS spectrum can be used to map the microstructural hydrogen traps, as characterised by their trap densities ( N i ) and binding energies ( H i ) .These can be obtained for a given flux curve J by the inverse calibration of the PDE in Eq. (31) .
The surrogate modelling of the flux curves obtained after solving Eq. ( 31) represents a formidable problem due to the strong nonlinearities of these curves.Specifically, depending upon the hydrogen trap configuration, several different regimes can be observed, as previously discussed by Raina et al .[76] .Specifically, their results for the case of metals containing a single trap showed that no peak flux is attained for low trap densities and binding energies.Alternatively, when a peak flux is found, those authors identified two distinct regimes (I and II) originated by two types of microstructural defects, referred to as shallow and deep traps.Shallow traps are characterized by large trap densities, and give origin to peak fluxes that are highly sensitive to both N and H. On the other hand, deep traps are characterized by low trap densities, resulting in peak fluxes that are insensitive to the trap binding energy.The existence of these different regimes turns the construction of a surrogate model covering the whole domain of the traps into an notably challenging task.Note that a large number of high-order polynomials and a dense ED need to be included in the PCE to accurately represent the whole global behaviour of the hydrogen flux.Such large EDs may severely compromise the computational efficiency of the surrogate model since, as indicated above, the complexity of the Cholesky decomposition of the correlation matrix R in Eq. ( 13) is O N 3 .The metamodeling of TDS experiments thus represents an exceptional case study to justify the use of the domain partitioning approach presented in Section 2.1.3 .

Surrogate modelling of TDS flux curves for metals with two traps
The PDE in Eq. ( 31) is solved numerically by using the pdepe solver in MATLAB.A space discretization of 201 elements along x was found to provide mesh-independent results.To illustrate the behaviour of a ferritic steel sample, representatitive model parameters from reference [76] have been adopted herein, including a lattice activation energy Q = 6 .7 kJmol −1 , diffusion pre-exponential factor D o = 2 × 10 −7 m 2 s −1 , heating rate φ = 0 . 1 and lattice density N L = 8 .46 × 10 28 atoms m −3 , with α = β = 1 .The initial temperature and occupancy fraction are chosen as T o = 293 K and θ 0 L = 10 −6 , respectively, and the thickness of the specimen is chosen as L = 5 mm.The variation of trap binding energies and trap densities are selected as the physically meaningful ranges −40 ≤ H i ≤ −10 and 10 −7 ≤ N i ≤ 10 −2 .In the present study, we limit to the modelling of metals with two hydrogen traps, i.e.N t = 2 .Therefore, in the surrogate modelling, temperature and the trap densities and binding energies are considered as input variables, which amounts to 5 design variables, i.e.M ( x ) = J with After some preliminary sensitivity analyses, two partitions of D x have been considered, namely P 1 , P 2 .Partition P 1 has been defined by splitting the temperature T and activation energy domains ( H i , i = 1 , 2 ) in two, while three segments were considered for the partition of the domain of the trap densities ( log ( N i ) , i = 1 , 2 ).On the other hand, the partitions in P 2 remain identical except for the temperature domain which is divided in three sub-domains.In order to define the optimal surrogate model, EDs of 10 0 0 and 150 0 sampling points per-subdomain have been considered for P 1 , while EDs of 667 and 10 0 0 points per sub-domain have been defined for P 2 .This amounts to four different surrogate models labelled with ˆ M i , i = 1 , . . ., 4 .In order make a fair comparison between the different proposals, models ˆ M 1 and ˆ M 3 are trained with EDs of 72,0 0 0 and 72,036 samples respectively, while ˆ M 2 and ˆ M 4 are trained with 108,0 0 0 samples.The comparison of the metamodels in terms of accuracy and computational efficiency is reported in Table 5 over a VS of 36,0 0 0 samples.Similarly to the results in the previous case study, the consideration of domain partitioning leads to considerable computational time reductions and moderate reductions in prediction accuracy.Note that the evaluation time of the forward model is about 280 ms, so all the metamodels achieve reductions between 98.3%-99.7%.The computation time of the metamodel depends upon the size of the ED in each region, which explains why models ˆ M 3 and ˆ M 2 are the fastest and slowest ones, respectively.On the other hand, the accuracy of the metamodel increases as so does the size of the ED.Indeed, models ˆ M 2 and ˆ M 4 exhibit significantly lower errors compared to models ˆ M 1 and ˆ M 3 .Therefore, in view of these results, ˆ M 4 provides a good trade-off between computational efficiency and accuracy, and it is selected in the subsequent Bayesian model parameter inference.To illustrate the effectiveness of the surrogate model in representing the different stages observed in the TDS test, Fig. 7 shows the comparison of the forward model and the predictions by ˆ M 4 for a variety of combinations of traps, including the case of fluxes without peak, one single peak, and two peaks.It is observed that the proposed PCK model can accurately reproduce all the different regimes observable in the TDS test.Only some minor errors are observed in the no flux regime, given the imposed limitation on the order of the polynomials in the PCE for the sake of computational efficiency.Finally, in order to highlight the superior performance of the proposed multi-element PCK metamodel, Fig. 6 furnishes the comparison of the predictions by standard LAR-PCE (trained with 76,0 0 0 samples) and ˆ M 4 .These results clearly evidence the superior performance of the proposed approach with respect to LAR-PCE, whose predictions versus the forward model exhibits a large scatter around the diagonal line with a low coefficient of determination of R 2 = 0 .46 .

Bayesian inference of the trapping sites from a TDS experiment
In this last subsection, the previous surrogate model ˆ M 4 is used to conduct Bayesian parameter inference following the MCMC algorithm in Section 2.2 .The trap binding energies and densities of the two trap system are chosen as the inference parameters θ = H 1 , H 2 , log ( N 1 ) , log ( N 2 ) in Eq. (20) .With the purpose of assessing the performance of the implemented DRAM MCMC approach to infer the parameters of hydrogen traps covering the two different regions described in reference [76] , two different trap configurations are considered to generate synthetic experimental data from the forward model.A two-trap system (EI) with properties θ = ( −25 , −35 , −3 , −2 .5 ) is considered first.The second one (EII) instead is defined by θ = ( −15 , −30 , −6 , −3 ) .The flux curves obtained in EI and EII correspond to those previously shown in Fig. 7 (f) and (g), respectively.In addition, to evaluate the sensitivity of the model parameter inference to the presence of noise pollution in the experiment, a second analysis of the EII experiment was performed after affecting the flux curve with a zero-mean Gaussian white noise with a standard deviation equal to 0.4 times the mean value of the unpolluted flux curve   (note later in Fig. 11 that such a noise level represents a considerably low signal-to-noise ratio).The experiment EI was defined to illustrate the potentials of the implemented DRAM MCMC to draw samples from a multi-modal distribution.Note that the PDE in Eq. ( 31) does not differentiate the order of the traps, thereby the problem is ill-posed and the posterior distribution is expected to exhibit two modes corresponding to two symmetric solutions.Instead, the experiment EII was designed to account for a trap ( H 1 = −15 , log ( N 1 ) = −6 ) in the regime with no flux as identified by Raina and et al. [76] , while the second trap ( H 2 = −30 , log ( N 2 ) = −3 ) represents a deep trap.Therefore, the PDF in this case should be unimodal.
In the inference analyses, uninformative uniform priors U (−40 , −10) and U (−7 , −2) are selected for H i and log ( N i ) ( i = 1 , 2 ), respectively.A total number of 200 000 samples with a burning time of 50 000 samples were drawn by the previously introduced Bayesian inference approach for EI.The sampling of the posterior PDF in Experiment EII was more challenging given its uni-modal nature with large regions of low probability, requiring up to 480,0 0 0 samples with a burning period of 160,0 0 0 samples to achieve convergence.Interestingly, this phenomenon attenuates when the flux curve is affected by noise, only requiring a chain of 120,0 0 0 samples with a burning period of 30,0 0 0 to attain convergence.This is expectable since the noise-induced lower probability concentration around the exact true solutions makes it easier for the chain to span from one solution to the symmetric one.The initial location state was defined as θ 0 = ( −25 , −25 , −4 .5 , −4 .5 ) , while the prediction error was set to σ ε = 1 E − 9 and 1 .6 E − 5 for the noise unpolluted and polluted cases, respectively.After some initial calibration by visual inspection of the chain traces, a diagonal covariance matrix with entries equal 0 .05 • θ 0 2 was initially defined for the Gaussian proposal.In the AM step the proposal distribution was scaled by a factor s d = 2 .4 2 /d and the non-adaptation period n 0 was set to 500 and 4000 for the EI and EII experiments, respectively.On the other hand, in the DR step the proposal is scaled down by a factor of 0.2.The Markov chain and the joint posterior PDF obtained for Experiment EI are presented in Figs. 8 and 9 , respectively.
As anticipated, the problem is ill-posed and there exist two potential solutions, namely θ = ( −25 , −35 , −3 , −2 .5 ) and θ = ( −35 , −25 , −2 . 5 ,−3 ) .This manifests in the marginal PDFs in Fig. 9 .Specifically, the PDFs corresponding to parameters H 1 and H 2 have two identical modes at −35 and −25 , and parameters log ( N 1 ) and log ( N 2 ) have two modes at −3 and −2 .5 .It is observed in Fig. 9 that, indeed, the implemented DRAM algorithm is capable of exploring the two modes in the distribution, without getting stuck around one of them as it is usually the case when implementing standard MCMC methods.For validation purposes, the posterior PDF has been also computed by direct integration of the forward solution.
To do so, the evidence of the model has been computed over a mesh of 60 4 elements.This required forty five hours of parallel computation on ten cores, while the MCMC approach only required about four hours on a single core.The Highest Density Regions (HDRs) at the 80% and 50% level of both distributions are reported in Table 6 The close fittings between the exact marginal PDFs and those predicted by the surrogate model-based Bayesian inference in Fig. 9 demonstrate the accuracy of the developed approach, as it is also evident from the computed HDRs in Table 6 .Finally, the Markov chain, and the posterior PDF obtained for the TDS experiment EII are reported in Figs. 10 and 11 , respectively, and the posterior HDR values are reported in Table 7 .In this case, the PDFs exhibit one single mode as previously anticipated.This corresponds to the shallow trap ( H 1 = −30 , log ( N 1 ) = −3 ), while the trap in the no-flux regime goes unnoticed.From a Bayesian perspective, this represents an observability limitation of the experiment, being the model of one single trap more likely to represent the material given the experimental evidence.Furthermore, it is noted that the presence of measurement noise does not substantially alter the inference outcome.The modes of the posteriors for the trap densities parameters H 1 and

Conclusions
This work presents the development of a multi-element PCK meta-model for surrogate model-based Bayesian parameter inference of highly nonlinear engineering models.The proposed metamodel combines adaptive sparse PCE and Kriging metamodelling to attain both global and local prediction capabilities.The optimal order of the polynomials in the PCE is automatically identified by the LAR algorithm.Then, the optimal PCE is inserted into a Kriging predictor as the trend term, while the stochastic term is fitted through GA optimization.With the aim of tackling non-smoothness in the forward model, a simple regular block partitioning approach has been implemented.On this basis, the space domain is split into a discrete number of subsets where local surrogate models are constructed.Then, the global model response is obtained by combining the local metamodels in a piecewise fashion.Finally, the surrogate model is used for Bayesian parameter estimation using a cost-efficient DRAM MCMC with DR and AM capabilities.The effectiveness of the proposed approach has been validated through two benchmark case studies: (i) the analytical Drop-Wave function; (ii) and a PDE for TDS tests.Key findings and contributions of this work include: • Optimal surrogate models ought to be defined by preliminary parametric analyses accounting for prediction accuracy and computational cost.The latter is particularly critical when performing computationally intense applications such as Bayesian parameter estimation.To this aim, this work has presented a set of error metrics and a methodological discussion through two validation case studies.• The analysis of the Salomon function has shown that the proposed multi-element PCK model with regular block partitioning provides similar accuracy ( R 2 > 0 .99 , NMAE < 10 −4 ) as the (classical) PCK approach, while achieving 100 and 10 times shorter evaluation and construction times, respectively.Moreover, the presented results have shown that the proposed method outperforms the SSE technique for the analysis of such a highly nonlinear surface, requiring 20 times fewer samples to achieve a comparable accuracy.• The size of the ED and the number of domain partitions critically determine the computational cost of the developed sparse PCE-Kriging metamodel.Specifically, the partition of non-smooth problems into a finite set of sub-domains allows the sparse adaptive PCE to eliminate a considerable number of high-order components through LAR, so achieving important savings in the construction of the Kriging model and the evaluation of the resulting metamodel.
• The developed surrogate model-based DRAM MCMC approach allows to conduct fast Bayesian parameter inference.In particular, the proposed approach has been applied to the identification of micro-structural traps in metallic alloys subject to TDS.The hydrogen fluxes obtained in TDS test represent a considerable challenge in surrogate modelling due to the presence of diverse regimes depending on the configuration of the hydrogen traps.In terms of R 2 , the proposed approach is capable of reproducing more than 99.9% of the hydrogen diffusion TDS model with computational time savings of 99.3% with respect to the forward numerical model.• The presented analyses evidence the potential of the developed approach for conducting inverse characterisation of hydrogen-metal interactions.The accuracy of the proposed PCK surrogate model in conjunction with DRAM MCMC opens vast possibilities for future applications in model selection, and information gain analysis of TDS hydrogen desorption tests.
Despite its simplicity, the adopted regular block partitioning model has demonstrated significant performance in terms of computational savings.In this respect, future research will involve the development of more efficient partitioning algorithms that would allow the sampling effort to be localised where the forward model presents greater non-linearities, thus achieving similar accuracies with smaller sample sizes.Another interesting goal for future work consists in the development of multielement surrogate PCK-based models capable of dealing with discontinuities in the response surface.

Fig. 5 .
Fig. 5. (a) A schematic illustration of initial and boundary conditions in a TDS test.(b) Schematic definition of binding energy in a one-dimensional diffusion path.(c) Transient solution curves of the normalised lattice occupancy fraction θ L /θ 0 L at different times t along the specimen's thickness.(d) A schematic of typical hydrogen desorption flux versus temperature curves obtained in a TDS test.

Fig. 6 .
Fig. 6.Scatter plots of Hydrogen flux curves obtained by the forward solution of the PDF of the TDS test versus the predictions by standard LAR-PCE (a) and by the proposed multi-element PCK metamodel ˆ M 4 (b) (VS of 36,0 0 0 samples).

Fig. 7 .
Fig. 7. Surrogate modelling of the Hydrogen flux curves obtained by TDS of metals with different values of trap binding energies and concentrations.Quantities in parenthesis represent the parameters of the traps H 1 , H 1 , log ( N 1 ) , log ( N 2 ) .

Fig. 9 .
Fig. 9. Bayesian identification results of the trap parameters θ = H 1 , H 2 , log ( N 1 ) , log ( N 2 ) of TDS Experiment EI.The surface plo t in the top right corner corresponds to the marginal PDF over H 1 , H 2 obtained by numerical integration.
1 ) denote the arithmetic mean and the quasi standard deviation of ϒ, respectively.Term σ ϒ ˆ ϒ represents the covariance of ( ϒ, ˆ ϒ) , and σ 2 ϒ and σ 2 ˆ ϒ indicate the variance of ϒ and ˆ ϒ, respectively.Note that the error metric NMAE in Table 1 provides a local estimation of accuracy, while NRMSE, NAAE, and R 2 represent global accuracy measures.

Table 3
Accuracy and computational efficiency analysis of proposed surrogate models applied to the Drop-Wave function.Terms t c , t e and t e (VS) denote the time of construction, average point evaluation, and evaluation on the full validation set, respectively.

Table 4
Non-dimensional variables used in the hydrogen diffusion PDE employed for the TDS tests.

Table 5
Accuracy and computational efficiency analysis of PCK metamodels developed for the surrogate modelling of hydrogen diffusion flux curves obtained by TDS (VS = 36,0 0 0).