Polynomial chaos expansion for sensitivity analysis of model output with dependent inputs

In this paper, we discuss the sensitivity analysis of model response when the uncertain model inputs are not independent of one another. In this case, two different kinds of sensitivity indices can be evaluated: (i) the sensitivity indices that account for the dependence/correlation of an input or group of inputs with the remainder and (ii) the sensitivity indices that do not account for this dependence. We argue that this distinction applies to any global sensitivity measure. In the present work, we focus on the estimation of variance-based sensitivity indices, which are based on the second-order moment of the model response of interest. In particular, we derive new strategies and new computationally efficient methods to assess them, which rely on the polynomial chaos expansion. Several numerical exercises are carried out to demonstrate the performance of the new methods, including a sensitivity analysis of a drainage model posterior to its statistical calibration.


Introduction
Uncertainty and sensitivity analysis are essential ingredients of modelling [1]. In a decision-making process, uncertainty analysis aims to verify whether the decision made (according to the model responses) is robust to different sources of uncertainty. If the results are not robust, sensitivity analysis can be used to identify which sources of uncertainty are primarily responsible for the model output uncertainty [2]. Then, where feasible, further research can be conducted to improve knowledge (and reduce the uncertainty) of the most influential inputs. This refinement of knowledge is intended to yield a clearer decision. The importance of uncertainty and sensitivity analysis in the decision-making process is recognised by governmental organisations such as the European Commission, which has included them in their ''Better Regulation Guidelines'' [3]. These guidelines are the basis for all policy making at the European level.
Sensitivity analysis has received much attention during the last two decades in all disciplines that use and develop computer models, in particular since the seminal work of Ilya M. Sobol' [4] (who established the theoretical basis of the widely-used variance-based sensitivity indices, otherwise known as the Sobol' indices), and the promotion of variance-based sensitivity analysis by the European Commission's Joint Research Centre [1,5,6]. Sobol' proved that, provided that the model inputs (i.e. the sources of uncertainty in computer models) are independent of each other, there is a unique functional ANOVA (ANalysis Of VAriance) decomposition of the model response of interest with respect to the uncertain inputs. This functional representation decomposes the model response variance into a sum of partial variances which are contributions of each input individually, and further contributions due to interactions with other inputs. With such a decomposition, it is straightforward to measure the relative importance of the input variables in terms of their contributions to the model output variance.
Following the work of Sobol', many numerical methods have been proposed to estimate the Sobol' indices when model inputs are independent of each other. Monte Carlo methods are amongst the first estimators proposed in the literature [4,7,8]. They employ several (quasi) Monte Carlo samples to estimate Sobol' indices by following the pick-freeze strategy [9]. Another very popular class of methods is based on the so-called Fourier Amplitude Sensitivity Test [10][11][12][13]. Introduced in the 70s, these methods rely on the Parseval-Plancherel theorem to estimate first- and total-order Sobol' indices. The most efficient methods (in most circumstances) are those based on surrogate modelling of the computer model responses [14][15][16][17]. A surrogate model (otherwise known as an 'emulator' or 'metamodel') is a statistical model that mimics the input-output relationship of the original model, but at a greatly reduced computational cost. It is constructed by running the original model a modest number of times for different input values and obtaining corresponding output values. The surrogate model is then constructed to fit this input-output data as well as possible by using a variety of possible approaches. Global sensitivity analysis (which typically requires a large number of model runs) can then be performed using Monte Carlo estimators on the emulator rather than the real model, resulting in significant computational savings. In the present article, we consider the Polynomial Chaos Expansion (PCE) method. PCE is classified amongst the spectral approaches ([15,16,18] among others), and can be used as an approximation of the input-output relationship (inferred from a given input-output sample) by casting the model response onto orthogonal polynomials. The Parseval-Plancherel theorem is then invoked to compute Sobol' indices, which means that the variance decomposition and the Sobol' indices are analytically obtained from the PCE coefficients without running the PCE as a surrogate model.
The large majority of sensitivity analysis approaches and estimation procedures assume that model inputs are independent from one another. However, in reality, input variables may be correlated with one another (i.e. linear dependence), or more generally dependent on each other, which can also imply nonlinear relationships. Performing global sensitivity analysis of computer model responses with dependent inputs is challenging. The difficulty stems from the non-uniqueness of the ANOVA decomposition and the definition of useful importance measures (see [19]). In the last decade, several mathematical frameworks and associated numerical methods have been proposed to address this issue [19][20][21][22][23][24][25][26][27][28][29][30][31]. Li et al. (2010) introduced the covariance-based sensitivity indices [21]. These are split into two kinds of sensitivity indices, namely, the structural and correlative sensitivity indices. They can be inferred after a functional input-output relationship (i.e. metamodel) is obtained. Caniou and Sudret (2013) use PCE as a metamodel, identified by first assuming that the input variables are independent of each other [31][32][33]. Then, the covariance analysis (which the authors call ANCOVA) is performed by considering the possible correlation between the variables. Typically, ANCOVA is a two-step approach. Chastaing et al. (2012) attempt to define an ANOVA representation (like the one of Sobol', in which the summands are orthogonal) of the functional input-output relationship in the case of correlated input variables [25]. With such a representation, it is straightforward to infer variance-based sensitivity indices. They find that this is possible only for a few joint probability distribution functions. Therefore, the application of this approach is limited. In [22], a Monte Carlo estimation procedure was proposed for estimating sensitivity indices of correlated variables, although this requires knowledge of the joint distribution, and the computational cost is quite high. This was later extended to a metamodelling approach in [34], but in both cases there is no decomposition of sensitivity into contributions due to correlations, and independent contributions. Owen and Prieur (2017) propose to use the Shapley measure [35] as a sensitivity index (called the Shapley effect) when the input variables are dependent on each other [28] (see also [36]). The nice properties of the Shapley effect are that: (i) there is only one sensitivity index for each input, (ii) it takes into account the contribution of the input to the output variance in terms of correlation and interactions, (iii) mutual contributions are equally shared over the input variables that interact and that are correlated, and (iv) the overall Shapley effects sum to one. The main obstacles to the use of the Shapley effects for sensitivity analysis are the computational burden and their accurate estimation.
The authors in [20] observe that, when the input variables are correlated, the model response variance might be captured by only a few inputs; the remainder forming a subset of spurious inputs whose contribution to the response variance is embedded in the contribution of the former. This led the authors to introduce two types of variance-based sensitivity indices: one that accounts for the correlation of each input variable with the others (called full sensitivity indices in [23]) and one that does not (called independent sensitivity indices). The importance measures mentioned in the previous paragraph do not decompose the contribution of each variable into correlated and non-correlated parts, with the exception of the ANCOVA approach. Indeed, it is shown in [37] that it is possible to infer the correlated and non-correlated variance-based sensitivity indices with the ANCOVA approach. In our opinion, this decomposition is important in understanding whether an input is contributing to the model directly or through a correlation with another variable. Therefore, this paper will discuss approaches to computing the full and independent indices.
We note that this distinction is not completely new in statistics: for example, the partial correlation coefficient is essentially a version of the Pearson correlation coefficient which does not account for the mutual effect due to correlations with the other variables [38]. However, neither the partial correlation coefficient nor the sensitivity indices of [20] are ''model free'', because they assume linearity of the model response, and a linear dependence structure (i.e. correlated inputs). A first step beyond these limitations was made in [23], in which a polynomial chaos expansion method was used to extend the approach of [20] to nonlinear model responses, but still with limitations on the dependence structure between the input variables. Subsequently, a second step was made in [19] to handle any (given) dependence structure: sampling-based strategies (i.e. Monte Carlo estimators) were introduced to evaluate variance-based sensitivity indices of any individual input or group of inputs. In [30], the FAST method was adapted to compute the (full and independent) first-order and total sensitivity indices. The concept of full and independent sensitivity measures has also been extended to the elementary effects method of Morris in [29].
In the present paper, we make a further advance in global sensitivity analysis with dependent inputs, by extending the PCE-based method of [23] to a broader class of dependence structures (as in [19] and [30]). We focus on the estimation of the first- and total-order Sobol' indices, although any Sobol' index can be estimated in theory (at no extra cost). Incidentally, the number of model runs needed for accurate identification of sparse PCEs with the algorithm of [18] has been shown to be only weakly dependent on the dimensionality of the input space. This is not the case with the aforementioned Monte Carlo methods and the FAST method. Additionally, we show that the concept of distinguishing between full and independent sensitivity measures can be extended to any sensitivity index, although a full exploration of these new indices is left for future work.
The paper is organised as follows: in Section 2 we recall the definitions of the full and independent first-order and total sensitivity indices, and discuss the usage and implications of these indices in some detail. We also define the full and independent moment-independent sensitivity measures. Then, in Section 3 we recall three possible sampling techniques to generate dependent random samples from independent ones (and vice versa). In Section 4, we describe the algorithm to compute the sensitivity indices of interest with PCE. Section 5 is devoted to numerical test cases and applications. Section 6 concludes.
Following this idea, the resulting sensitivity indices can be interpreted as follows: the sensitivity index of $x_{u_1}$ is the full sensitivity index of $x_{u_1}$ (taking into account its dependence with the other input variables within $x_{u_2}$), whereas the sensitivity index of $\bar{x}_{u_2}$ is the independent sensitivity index of $x_{u_2}$, which does not account for dependence with the other input variables (see [19,23,30]).
Within the context of variance-based sensitivity analysis, the following sensitivity indices can be defined (see for instance [30]):

$S_{u_1} = \dfrac{V\left[E\left[y \mid x_{u_1}\right]\right]}{V[y]}$,   (1)

$S^{T}_{u_1} = \dfrac{E\left[V\left[y \mid \bar{x}_{u_2}\right]\right]}{V[y]}$,   (2)

$S^{ind}_{u_1} = \dfrac{V\left[E\left[y \mid \bar{x}_{u_1}\right]\right]}{V[y]}$,   (3)

$S^{T,ind}_{u_1} = \dfrac{E\left[V\left[y \mid x_{u_2}\right]\right]}{V[y]}$,   (4)

which satisfy $S_{u_1} \le S^{T}_{u_1}$ and $S^{ind}_{u_1} \le S^{T,ind}_{u_1}$. The full first-order sensitivity index $S_{u_1}$ measures the amount of variance of $y$ due to $x_{u_1}$ and its dependence with $x_{u_2}$, but does not include the interactions of $x_{u_1}$ with $\bar{x}_{u_2}$. The full total sensitivity index $S^{T}_{u_1}$ takes into account both dependence and interaction. The independent first-order sensitivity index $S^{ind}_{u_1}$ measures the contribution of $x_{u_1}$ ignoring both its correlations and its interactions with $x_{u_2}$, while $S^{T,ind}_{u_1}$ accounts for interactions and ignores correlations. Consider an input $x_i$ which contributes to the model response variance only because of its strong dependence on the other inputs. In that case, we shall have $S_i \ge 0$ while $S^{T,ind}_i = 0$. It is worth underlining here that interactions in the case of dependent inputs (following the partitioning of variables used in this paper) do not have the same interpretation as in the case of independent ones. While in the dependent case interactions concern $(x_{u_1}, \bar{x}_{u_2})$ and $(\bar{x}_{u_1}, x_{u_2})$, in the independent case they are related to the original variables $(x_{u_1}, x_{u_2})$. This has the following consequence: a non-interacting input/output relationship with respect to $(x_{u_1}, x_{u_2})$ can turn out to have an interacting relationship with respect to $(x_{u_1}, \bar{x}_{u_2})$ or $(\bar{x}_{u_1}, x_{u_2})$. This complicates the analysis somewhat, but recall that we use the pairs $(x_{u_1}, \bar{x}_{u_2})$ and $(\bar{x}_{u_1}, x_{u_2})$ because they are independent variables, which allows the use of the variance decomposition of $y$ in the Sobol' sense [4]. However, in the dependent case, note that such a decomposition is not unique, as one can derive it with respect to either $(x_{u_1}, \bar{x}_{u_2})$ or $(\bar{x}_{u_1}, x_{u_2})$.
To illustrate, let us consider a model response which is supposedly a function of two dependent random vectors $(x_{u_1}, x_{u_2})$, but is actually only a function of $x_{u_1}$, say, $y = g(x_{u_1})$. By computing the sensitivity indices w.r.t. $(x_{u_1}, \bar{x}_{u_2})$, one will find $S_{u_1} = S^{T}_{u_1} = 1$ and $S^{ind}_{u_2} = S^{T,ind}_{u_2} = 0$. This leads to the conclusion that, because of the dependence structure between $(x_{u_1}, x_{u_2})$, all the information (i.e. variance) of $y$ is explained by the subset $x_{u_1}$. By further computing the sensitivity indices w.r.t. $(\bar{x}_{u_1}, x_{u_2})$, one will get $S^{T,ind}_{u_1} \ge S^{ind}_{u_1} > 0$ and $S^{T}_{u_2} \ge S_{u_2} > 0$, which indicates that, because of its dependence with $x_{u_1}$, $x_{u_2}$ also has an influence on the model response. It can be inferred that the value of $x_{u_2}$ cannot be fixed at an arbitrary value because this has an impact on the conditional pdf of $x_{u_1}$ (to do so, one should have $S_{u_2} = S^{T}_{u_2} = 0$, according to [29]). Finally, it can be concluded that $x_{u_2}$ is a spurious vector of inputs, only relevant because of its dependence with $x_{u_1}$ (see [20]). However, it cannot be concluded that the original model structure (w.r.t. $(x_{u_1}, x_{u_2})$) is of the form $y = g(x_{u_1})$. Indeed, it is only when $(x_{u_1}, x_{u_2})$ is a vector of independent random variables that the original model structure can be revealed, because the variance decomposition in the Sobol' sense w.r.t. $(x_{u_1}, x_{u_2})$ is unique. Nevertheless, with this result one can conclude that a surrogate model depending only on $x_{u_1}$ (sampled from its marginal distribution) can be built that could be very accurate as compared to the original model, thus contributing to a model reduction objective.
The authors of [23] propose to estimate Eqs. (1)-(4) with polynomial chaos expansions after decorrelating the input sample. The advantage of the proposed approach is its computational efficiency. Indeed, only one input sample of a relatively modest size (albeit a function of the model complexity) is sufficient to estimate all the first-order and total sensitivity indices of each individual input variable. The problem is that decorrelation does not necessarily mean independence. Actually, with the decorrelation procedure of [23] recalled in Section 3.3, independence is ensured only for a particular form of the dependence structure, which is explained later in Section 3.3. In Section 3.2 we extend the method to correlation structures represented by a Gaussian copula [22,39] and in Section 3.1 to the general dependence structure represented by the Rosenblatt transform. In Section 4, we then extend the PCE method to the case of dependent variables represented by one of these dependence structures.

Moment-independent sensitivity indices
In the following, we show how the concept of full and independent sensitivity indices can also be applied to moment-independent measures. At the end of this section, we also comment on how this can apply in the more general case.
In the case of independent inputs, variance-based sensitivity indices indicate whether a model output is a function of an input (or subset of inputs) in the form of a functional relationship like $y = g(x_{u_1}, x_{u_2})$. Specifically, they examine the contribution to the variance of the model output $y$. If a relationship of the form $y = g(x_{u_2})$ is discovered, then $x_{u_1}$ is deemed irrelevant. Arguably, a more natural way to infer that $y$ does not depend on $x_{u_1}$ is to prove that $p_{y|x_{u_1}} = p_y$. This has led to the definition of the so-called moment-independent sensitivity measures defined as [40],

$\delta_{u_1} = \frac{1}{2}\, E_{x_{u_1}}\!\left[\int \left|p_y(y) - p_{y|x_{u_1}}(y)\right| \mathrm{d}y\right].$   (5)

This measure is equal to half of the average of the differences between the unconditional pdf of $y$, denoted $p_y$, and the conditional pdf of $y$ given $x_{u_1}$, denoted $p_{y|x_{u_1}}$. Finding $\delta_{u_1} = 0$ allows us to conclude that $y$ is not dependent on $x_{u_1}$, therefore $x_{u_1}$ can be regarded as non-influential. We note that to evaluate $\delta_{u_1}$, $x_{u_1}$ is drawn from $p_{u_1}$ while the complementary input subset $x_{u_2}$ is sampled from $p_{u_2|u_1}$. In the independent case discussed here we naturally have $p_{u_2|u_1} = p_{u_2}$. In the case of dependent input variables, the functional relationship between $y$ and $x$ can take one of the following forms, $y = g(x_{u_1}, \bar{x}_{u_2})$ or $y = g(\bar{x}_{u_1}, x_{u_2})$. If one shows that $y = g(x_{u_1})$ (i.e. $S_{u_1} = 1 \Leftrightarrow S^{T,ind}_{u_2} = 0$), then it can be concluded that $x_{u_2}$ is a subset of spurious inputs. Additionally, if $\delta_{u_2} = 0$ then it can be inferred that $x_{u_2}$ can be fixed without any impact on the model response. However, such a distinction cannot be made with the $\delta$-importance measure as defined in Eq. (5) because the latter actually assesses the importance of $x_{u_1}$ by accounting for its dependence with $x_{u_2}$. This remark leads us to introduce the following complementary $\delta$-importance measure that does not account for the dependence of $x_{u_1}$ on $x_{u_2}$,

$\delta^{ind}_{u_1} = \frac{1}{2}\, E_{\bar{x}_{u_1}}\!\left[\int \left|p_y(y) - p_{y|\bar{x}_{u_1}}(y)\right| \mathrm{d}y\right],$   (6)

where $\bar{x}_{u_1} \sim p_{u_1|u_2}$ while $x_{u_2} \sim p_{u_2}$. Finding $\delta^{ind}_{u_1} = 0$ indicates that $x_{u_1}$ is a spurious subset of inputs which impacts the model response only via its dependence with $x_{u_2}$ (if any exists), while if $\delta_{u_1} = \delta^{ind}_{u_1} = 0$, it can additionally be concluded that $x_{u_1}$ can be fixed without affecting the model response.
The brute-force estimates of $\delta_{u_1}$ and $\delta^{ind}_{u_1}$ subtly differ. In both cases, $p_y$ is obtained after sampling $x \sim p_x$ with a (quasi) Monte Carlo technique, running the model for each draw and collecting the model responses $y = g(x)$. Subsequently, $p_y$ can be estimated with an appropriate method (such as the kernel density estimator of [41]). To estimate $p_{y|x_{u_1}}$ one has to draw one random value of $x_{u_1} \sim p_{u_1}$, then generate a (quasi) Monte Carlo sample of $\bar{x}_{u_2} \sim p_{u_2|u_1}$ and infer from $\bar{x}_{u_2}$ the sample of $x_{u_2}$, run the model and get the model responses before finally estimating $p_{y|x_{u_1}}$ for the drawn value of $x_{u_1}$. This process is repeated several times for different values of $x_{u_1} \sim p_{u_1}$. In the end, one obtains one single estimate of $p_y$ and several estimates of $p_{y|x_{u_1}}$ for different values of $x_{u_1}$ sampled from $p_{u_1}$. These can be used to compute $\delta_{u_1}$. On the contrary, to compute $\delta^{ind}_{u_1}$, one has to get several estimates of $p_{y|\bar{x}_{u_1}}$ for different values of $\bar{x}_{u_1}$ sampled from $p_{u_1|u_2}$. We proceed by first generating a (quasi) Monte Carlo sample of $x_{u_2} \sim p_{u_2}$, then we deduce the draws of $\bar{x}_{u_1}$ sampled from $p_{u_1|u_2}$, and therefore the sample of $x_{u_1}$. Finally, $p_{y|\bar{x}_{u_1}}$ is deduced after running the model and getting the response vector. This process is repeated several times for different values of $\bar{x}_{u_1}$. This tricky sampling procedure is further explained in Section 3.
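To make the two sampling schemes concrete, the following sketch estimates $\delta_{u_1}$ and $\delta^{ind}_{u_1}$ by the brute-force double loop for a toy bivariate normal dependence structure and an invented test function; the model, the correlation value, the sample sizes and the kernel density estimator are illustrative choices, not those used in the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
rho = 0.7                                   # illustrative correlation between x1 and x2

def model(x1, x2):                          # invented test function (stands in for g)
    return x1 + 0.5 * x2**2

def sample_x2_given_x1(x1, size):           # x2 | x1 ~ N(rho*x1, 1 - rho^2) (bivariate normal)
    return rng.normal(rho * x1, np.sqrt(1 - rho**2), size)

# unconditional response sample and pdf estimate p_y
x1 = rng.standard_normal(5000)
x2 = sample_x2_given_x1(x1, 5000)
y = model(x1, x2)
p_y = gaussian_kde(y)
grid = np.linspace(y.min(), y.max(), 400)
dy = grid[1] - grid[0]

def half_l1_distance(y_cond):
    p_yc = gaussian_kde(y_cond)
    return 0.5 * np.sum(np.abs(p_y(grid) - p_yc(grid))) * dy

# full delta of x1: fix x1, sample x2 from its conditional distribution
delta_full = np.mean([
    half_l1_distance(model(x1_star, sample_x2_given_x1(x1_star, 2000)))
    for x1_star in rng.standard_normal(50)])

# independent delta of x1: fix the residual bar-x1 (part of x1 independent of x2),
# sample x2 from its marginal and deduce x1 = rho*x2 + sqrt(1-rho^2)*bar-x1
def conditional_response_ind(e1_star, size):
    x2_s = rng.standard_normal(size)
    return model(rho * x2_s + np.sqrt(1 - rho**2) * e1_star, x2_s)

delta_ind = np.mean([half_l1_distance(conditional_response_ind(e, 2000))
                     for e in rng.standard_normal(50)])

print(delta_full, delta_ind)
```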
The present extension of Borgonovo's moment-independent measure is given here to show how the underlying approaches in this paper can be generalised outside of variance-based measures. Indeed, we could conceivably apply the same principles to any global sensitivity measure defined in the following general way in [42],

$\xi_{u_1} = E_{x_{u_1}}\!\left[ d\!\left( y,\, y \mid x_{u_1} \right) \right],$   (7)

where $d(\cdot, \cdot)$ is a dissimilarity measure between two random variables [43]. For example, in the case of dependent inputs one can introduce the complementary sensitivity index that does not account for the effect of $x_{u_1}$ due to its dependence on $x_{u_2}$, that is,

$\xi^{ind}_{u_1} = E_{\bar{x}_{u_1}}\!\left[ d\!\left( y,\, y \mid \bar{x}_{u_1} \right) \right].$   (8)

In the remainder of this paper, however, we focus on variance-based sensitivity analysis and leave investigation of other measures to future work.

Generating dependent and independent samples
In the previous section it was explained that a sensitivity analysis of the system of dependent variables $(x_{u_1}, x_{u_2})$ can be achieved by instead considering the independent pair $(x_{u_1}, \bar{x}_{u_2})$. Modellers need samples of $(x_{u_1}, x_{u_2})$ to run their model, but they also need samples of $\bar{x}_{u_1}$ (resp. $\bar{x}_{u_2}$) to compute the independent sensitivity indices. The key challenge in the present approach to estimate the set of variance-based sensitivity indices is thus to obtain samples of the conditional and unconditional variables (e.g. $\bar{x}_{u_1}$ and $x_{u_1}$) respectively. In this section, we present several possibilities to do so, with different properties and requirements.

The Rosenblatt transformation
Let $v = (v_{u_1}, v_{u_2})$ be the vector of independent random variables stemming from the isoprobabilistic transformation

$v_{u_1} = F_{u_1}\!\left(x_{u_1}\right), \qquad v_{u_2} = F_{u_2|u_1}\!\left(x_{u_2} \mid x_{u_1}\right),$   (9)

where $F_{u_1}$ is the cumulative distribution function (cdf) of $x_{u_1}$ and $F_{u_2|u_1}$ is the conditional cdf of $x_{u_2}$. We note that $v$ is uniformly distributed over the unit hypercube $[0,1]^n$. We also notice that Eq. (9) is not unique, as one can swap the subscripts 1 and 2. Eq. (9) is called the Rosenblatt transform (RT, [44]) of $x$ and allows independent random variables to be obtained from dependent ones, provided that the conditional cdfs are known.
Propagating the input uncertainty through the model response requires the generation of (quasi or pseudo) random draws of $x$. This can be achieved with one of the inverse RTs as follows,

$x_{u_1} = F^{-1}_{u_1}\!\left(v_{u_1}\right), \qquad x_{u_2} = F^{-1}_{u_2|u_1}\!\left(v_{u_2} \mid x_{u_1}\right),$   (10)

which requires that the cdfs be invertible. This is performed in practice by first generating a sample of $v$ with a (quasi) random generator such as the LP sequences of [45]. Then, this sample is transformed into a sample of $x$ using Eq. (10). From Eq. (9), $\bar{x}_{u_2} = F^{-1}_{u_2}(v_{u_2})$; therefore it is straightforward to generate $\bar{x}_{u_2}$ independently of $x_{u_1}$, as the former only depends on $v_{u_2}$. Therefore, fixing the value of $v_{u_2}$ is equivalent to fixing $\bar{x}_{u_2}$.
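As an illustration of Eqs. (9)-(10), the sketch below generates a dependent sample from a low-discrepancy sample of $v$ for a simple two-variable case with a known conditional cdf ($x_1$ uniform on $[0,1]$, $x_2 \mid x_1$ uniform on $[0, 1 - x_1]$); the joint distribution, the sample size and the sequence are illustrative only.

```python
import numpy as np
from scipy.stats import qmc

# Illustrative joint: x1 ~ U(0,1) and x2 | x1 ~ U(0, 1 - x1), so that x1 + x2 < 1.
def inverse_rosenblatt(v):
    x1 = v[:, 0]                        # x1 = F1^{-1}(v1); F1 is the identity on [0,1]
    x2 = v[:, 1] * (1.0 - x1)           # x2 = F^{-1}_{2|1}(v2 | x1)
    return np.column_stack([x1, x2])

v = qmc.Sobol(d=2, scramble=True, seed=1).random(1024)   # quasi-random sample of v
x = inverse_rosenblatt(v)

# Fixing v2 amounts to fixing bar-x2: x2 still varies with x1,
# but its conditional quantile (the part independent of x1) is frozen.
v_frozen = v.copy()
v_frozen[:, 1] = 0.3
x_frozen = inverse_rosenblatt(v_frozen)
```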

The Nataf transformation
The Rosenblatt transformation requires knowledge of the conditional cdfs, which might not always be available in practice. When the uncertainty of $x$ is characterised by a set of marginal cdfs $\{F_1, \ldots, F_n\}$ and a product-moment correlation matrix $\mathbf{C}$, the Nataf transformation (NT, [46]) is more suitable to generate random samples of $x$. We note however that the Nataf transformation is equivalent to the Rosenblatt transform with a Gaussian copula [47]. Given $\mathbf{R}$ and $z \sim \mathcal{N}(\boldsymbol{0}, \mathbf{I}_n)$, the steps of the NT are

$\zeta = z\,\mathbf{L}, \ \text{with } \mathbf{L} \text{ upper triangular such that } \mathbf{L}^{\top}\mathbf{L} = \mathbf{R}; \qquad x_i = F_i^{-1}\!\left(\Phi(\zeta_i)\right), \ i = 1, \ldots, n,$   (11)

where $\Phi$ is the standard normal cdf. The crux of this algorithm is to choose $\mathbf{R}$ so that the product-moment correlation matrix of $x$ is, in fine, the one desired (i.e. $\mathbf{C}$). For some special distributions, the link between $\mathbf{R}$ and $\mathbf{C}$ is well known (see [48]). Generally, an iterative procedure can be required to tune $\mathbf{R}$. We note that Eq. (11) is equivalent to the procedure of Iman & Conover [49] when $\mathbf{C}$ is a rank correlation matrix. In that case, there is no need to tune $\mathbf{R}$ as the latter exactly equals $\mathbf{C}$. In the NT, two complementary independent subsets $z_{u_1}$ and $z_{u_2}$, such that $z = (z_{u_1}, z_{u_2})$, produce two correlated subsets $(x_{u_1}, x_{u_2})$. It is worth noticing that, because $\mathbf{L}$ is an upper triangular matrix, the information in $z_{u_2}$ propagates only through $x_{u_2}$. Therefore, $\bar{x}_{u_2}$ is related to $z_{u_2}$, which means that to fix $\bar{x}_{u_2}$ one simply needs to fix $z_{u_2}$. As for RT, NT is not unique, as the elements in the subsets $u_1$ and $u_2$ can be chosen arbitrarily. Freedom in the definition of the subsets implies column and row permutations of $\mathbf{C}$. Of particular interest are subsets of the form $(x_i, x_{\sim i})$, where $x_{\sim i}$ denotes all the inputs but $x_i$, in order to compute individual sensitivity indices.
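A minimal sketch of this construction for two variables with non-Gaussian marginals is given below; the marginals and the copula correlation are illustrative, and the iterative tuning of $\mathbf{R}$ towards the target $\mathbf{C}$ is deliberately skipped ($\mathbf{R} \approx \mathbf{C}$ is assumed).

```python
import numpy as np
from scipy.stats import norm, gamma, expon

rng = np.random.default_rng(2)
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])                       # copula correlation (tuning towards C skipped)
U = np.linalg.cholesky(R).T                      # upper-triangular factor, so that R = U^T U

z = rng.standard_normal((1024, 2))               # independent standard normal variables
zeta = z @ U                                     # correlated standard normals, Corr(zeta) = R
x1 = gamma(a=2.0).ppf(norm.cdf(zeta[:, 0]))      # arbitrary non-Gaussian marginals
x2 = expon(scale=1.0).ppf(norm.cdf(zeta[:, 1]))
x = np.column_stack([x1, x2])

# With this ordering, z[:, 1] only feeds x2, so freezing z[:, 1] freezes bar-x2
# (the part of x2 independent of x1), while z[:, 0] carries x1 and its correlation effect.
```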

The transformation of Mara and Tarantola
In [23], a non-linear Gram-Schmidt-like transformation is introduced to obtain independent variables $w_1, w_2, \ldots, w_n$ from dependent ones. This transformation (hereafter referred to as ''MT'') reads as follows,

$w_1 = x_1, \qquad w_i = x_i - E\!\left[x_i \mid x_1, \ldots, x_{i-1}\right], \quad i = 2, \ldots, n,$   (12)

and assumes an additive dependence structure between the $x$-variables of the form

$x_i = w_i + \sum_{j=1}^{i-1} g_{i,j}\!\left(x_j\right), \quad i = 2, \ldots, n,$   (13)

where $\int_{\mathbb{R}} g_{i,j}(x_j)\, p_{x_j}(x_j)\, \mathrm{d}x_j = 0$, which ensures that the set of functions $\{g_{i,1}, \ldots, g_{i,i-1}\}$ in the $i$th row of Eq. (13) are orthogonal (according to [4]). We note that each equation in Eq. (13) is related to the generalised additive model representation [50]. We also note that $x = (x_{u_1}, x_{u_2})$ produces $w = (w_{u_1}, w_{u_2})$ with the information in $w_{u_2}$ only found in $x_{u_2}$ and not in $x_{u_1}$. Therefore, fixing $w_{u_2}$ implies fixing $\bar{x}_{u_2}$. This transformation was introduced to overcome the limitation of NT, which only deals with linear pairwise dependencies between variables. In effect, MT tackles nonlinear pairwise dependencies. The transformation of [23] in Eq. (12) is particularly suited to dealing with given data. In that case, the orthogonal functions in Eq. (13) can be estimated with an iterative univariate regression procedure ([50][51][52][53], among others). Note that a less restrictive assumption than (13) can be adopted (see [23]), but then a more sophisticated approach is required to infer the independent vector $w$. Importantly, the MT transformation requires no knowledge of cdfs, either marginal or conditional. In that sense, it is the most generally applicable transformation of the three described here (RT being the most general in theory), albeit containing the strong assumption in (13). The transformation is also sensitive to the ordering of the variables; for this reason, variables are circularly reordered when calculating sensitivity indices (see Section 4.2). Specifically, circular permutation is chosen to make sure that one obtains a single estimate of the first- and total-order sensitivity indices of each variable, both accounting for and ignoring the mutual dependent variance contributions. Finally, we note that, as with any estimation procedure, its accuracy will be sensitive to the sample size; this may be checked by monitoring convergence as the sample size is incrementally increased, and/or by bootstrapping over the full sample.
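The following sketch illustrates the spirit of MT on a given data sample, assuming the additive structure of Eq. (13) and using plain polynomial least-squares fits as a stand-in for the iterative univariate regression procedure of [23]; it only shows how residual variables $w$ can be extracted column by column.

```python
import numpy as np

def mt_decorrelate(X, degree=3):
    """Sequentially remove from each column the part explained by the
    previous columns (polynomial regression stand-in for Eq. (12))."""
    X = np.asarray(X, dtype=float)
    N, n = X.shape
    W = np.empty_like(X)
    W[:, 0] = X[:, 0]                           # w1 = x1
    for i in range(1, n):
        # additive fit  x_i ~ sum_j g_{i,j}(x_j), j < i  (assumption of Eq. (13))
        basis = [np.ones(N)]
        for j in range(i):
            for d in range(1, degree + 1):
                basis.append(X[:, j] ** d)
        B = np.column_stack(basis)
        coef, *_ = np.linalg.lstsq(B, X[:, i], rcond=None)
        W[:, i] = X[:, i] - B @ coef            # residual, plays the role of w_i
    return W

# toy dependent sample (nonlinear dependence of x2 on x1)
rng = np.random.default_rng(3)
x1 = rng.uniform(-1, 1, 2000)
x2 = x1**2 + 0.1 * rng.standard_normal(2000)
W = mt_decorrelate(np.column_stack([x1, x2]))
print(np.corrcoef(W, rowvar=False))             # near-identity correlation matrix
```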

The case of independent variables
In [15], the author demonstrated that it is straightforward to estimate the variance-based sensitivity indices from a polynomial chaos expansion. Since then, PCE for global sensitivity analysis has received much attention in the community of modellers [16,[54][55][56][57][58]. PCE is a spectral representation of any square-integrable function $g(x)$ when the inputs are independent random variables. A PCE representation is written

$g(x) = \sum_{\boldsymbol{\alpha} \in \mathbb{N}^n} a_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(x),$   (14)

where $\boldsymbol{\alpha} = \alpha_1 \alpha_2 \ldots \alpha_n \in \mathbb{N}^n$ is an $n$-dimensional multi-index and the $a_{\boldsymbol{\alpha}}$ are the PCE coefficients. The $\Psi_{\boldsymbol{\alpha}}$ are the multidimensional polynomial basis elements of degree $|\boldsymbol{\alpha}| = \sum_{i=1}^{n} \alpha_i$. Because $x$ is a vector of independent random variables, $\Psi_{\boldsymbol{\alpha}}$ is obtained by a tensor product of univariate polynomials $\psi_{\alpha_i}(x_i)$. Depending on the marginal probability distribution function of $x_i$, $\psi_{\alpha_i}$ belongs to a different family of orthonormal polynomials (such as Legendre polynomials for the uniform distribution, Hermite polynomials for the normal distribution, Laguerre polynomials for the exponential distribution, and so on; see [59]).
Given the PCE coefficients, and knowing that the inputs are independent, the variance-based sensitivity indices can be computed as follows [15]:

$S_i = \dfrac{\sum_{\boldsymbol{\alpha} \in \mathcal{A}_i} a_{\boldsymbol{\alpha}}^2}{\sum_{\boldsymbol{\alpha} \neq \boldsymbol{0}} a_{\boldsymbol{\alpha}}^2}, \qquad S^{T}_i = \dfrac{\sum_{\boldsymbol{\alpha} \in \mathcal{A}^{T}_i} a_{\boldsymbol{\alpha}}^2}{\sum_{\boldsymbol{\alpha} \neq \boldsymbol{0}} a_{\boldsymbol{\alpha}}^2},$   (15)

where the basis is orthonormal, $\mathcal{A}_i = \{\boldsymbol{\alpha}: \alpha_i > 0,\ \alpha_j = 0\ \forall j \neq i\}$ and $\mathcal{A}^{T}_i = \{\boldsymbol{\alpha}: \alpha_i > 0\}$. To estimate the coefficients $a_{\boldsymbol{\alpha}}$, we use the Bayesian sparse polynomial chaos expansion approach of [18], which builds sparse PCEs in a Bayesian framework with the help of the Kashyap information criterion for model selection [60]. This means that any statistic computed with this approach is assigned a credible interval, which is crucial to assess its significance level. There are other alternatives proposed in the literature to build sparse PCEs, notably [16].
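A minimal sketch of Eq. (15) is given below, assuming that the coefficients of an orthonormal PCE are available as a mapping from multi-indices to values; the toy coefficient set is invented purely for illustration.

```python
import numpy as np

def pce_sobol_indices(coeffs, n):
    """First-order and total Sobol' indices from orthonormal-PCE coefficients.

    coeffs: dict mapping multi-index tuples (alpha_1, ..., alpha_n) to a_alpha.
    """
    var = sum(a**2 for alpha, a in coeffs.items() if any(alpha))
    S = np.zeros(n)
    ST = np.zeros(n)
    for alpha, a in coeffs.items():
        if not any(alpha):
            continue                       # skip the mean term
        for i in range(n):
            if alpha[i] > 0:
                ST[i] += a**2              # any term involving x_i
                if all(alpha[j] == 0 for j in range(n) if j != i):
                    S[i] += a**2           # terms involving x_i alone
    return S / var, ST / var

# toy sparse PCE in 3 variables (coefficient values are purely illustrative)
coeffs = {(0, 0, 0): 1.2, (1, 0, 0): 0.9, (0, 2, 0): 0.5,
          (0, 0, 1): 0.3, (1, 0, 2): 0.4}
S, ST = pce_sobol_indices(coeffs, n=3)
print(S, ST)   # S sums to <= 1 and ST >= S componentwise
```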

The case of dependent variables
In the case of dependent inputs, PCEs must be constructed with respect to one of the vectors of independent variables discussed in Section 3. If RT is used, then we consider $v$ in conjunction with the shifted-Legendre polynomials; if NT is employed, then we consider the vector $z$ and Hermite polynomials; while with MT, $w$ is considered in association with the orthogonal polynomials derived with the Gram-Schmidt procedure. In the sequel, we denote by $\bar{x}$ the vector of independent variables obtained by one of the aforementioned transformations. Now, suppose that $\bar{x}$ is obtained by transforming $(x_1, \ldots, x_n)$. Then, the sensitivity indices of $\bar{x}_1$ represent the full sensitivity indices of $x_1$, which account for its dependencies with the other variables (see Section 2). The sensitivity indices of $\bar{x}_2$ represent the sensitivity indices of $x_2$ that account for its dependencies with the other variables except $x_1$ (see [23]). The sensitivity indices of $\bar{x}_n$ are interpreted as the independent sensitivity indices of $x_n$, and the full sensitivity indices of $(x_1, x_2)$ are those of $(\bar{x}_1, \bar{x}_2)$. In the same way, the independent sensitivity indices of $(x_{n-1}, x_n)$ are those of $(\bar{x}_{n-1}, \bar{x}_n)$.
However, as underlined previously, the different transformations used to generate $\bar{x}$ are not unique. The order of the variables in the vector to be transformed determines which sensitivity indices can be computed with the identified PCE. To compute the overall set of full sensitivity indices $(S_i, S^{T}_i)$ and independent sensitivity indices $(S^{ind}_i, S^{T,ind}_i)$, for all $i = 1, \ldots, n$, all the transformations obtained by circularly reordering the input vector $(x_1, \ldots, x_n)$ must be considered. This implies $n$ transformations and subsequently the identification of $n$ sets of PCE coefficients. Nonetheless, with PCE, only one Monte Carlo sample of the input variables is needed to assess any variance-based sensitivity index. Hence, the number of model calls can still be reasonable. In many cases, $N < 1\,000$ model runs are sufficient to obtain an accurate estimation of the sensitivity indices, but this depends on the effective dimension of the model and the degree of smoothness of $g(\bar{x})$. This makes PCE-based sensitivity analysis a compelling approach compared with existing methods. The algorithm to compute the overall set $(S_i, S^{T}_i, S^{ind}_i, S^{T,ind}_i)$, for all $i = 1, \ldots, n$, is given in the Appendix.
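The bookkeeping behind this circular-reordering strategy can be sketched as follows; `transform` and `fit_pce` are placeholders for one of the transformations of Section 3 and for the sparse-PCE estimation of Eq. (15), and the stubs at the bottom exist only so that the skeleton runs.

```python
import numpy as np

def full_and_independent_indices(X, transform, fit_pce, n):
    """Driver for the circular-reordering strategy.

    transform(X_perm) -> independent sample (RT, NT or MT of Section 3);
    fit_pce(Xbar)     -> (S, ST): first-order/total indices of the transformed
                         variables, e.g. obtained from Eq. (15).
    """
    S_full = np.zeros(n)
    ST_full = np.zeros(n)
    S_ind = np.zeros(n)
    ST_ind = np.zeros(n)
    for k in range(n):                                # circular permutation (k, k+1, ..., k-1)
        order = [(k + j) % n for j in range(n)]
        Xbar = transform(X[:, order])
        S, ST = fit_pce(Xbar)
        S_full[order[0]], ST_full[order[0]] = S[0], ST[0]     # first position: full indices
        S_ind[order[-1]], ST_ind[order[-1]] = S[-1], ST[-1]   # last position: independent indices
    return S_full, ST_full, S_ind, ST_ind

# illustrative stubs only -- replace with a real transformation and PCE fit
rng = np.random.default_rng(4)
X = rng.normal(size=(256, 3))
dummy_transform = lambda Xp: Xp
dummy_fit = lambda Xb: (np.full(Xb.shape[1], 1 / Xb.shape[1]),
                        np.full(Xb.shape[1], 1 / Xb.shape[1]))
print(full_and_independent_indices(X, dummy_transform, dummy_fit, n=3))
```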
We also note that if the objective is to estimate higher-order interaction effects, circularly reordering the variables is not sufficient, and other permutations would have to be considered. However, for many cases the estimation of first and total-order sensitivity indices is sufficient, and higher-order effects are left to future work.
Depending on the transformation used, PCEs are built upon $\bar{x}$, which is either $v$, $z$ or $w$. If $v$ is chosen, then PCEs are built with the shifted-Legendre polynomials; if it is $z$, then Hermite polynomials will be used; while if the random variables are arbitrarily distributed (as in the case of $w$), then a Gram-Schmidt orthogonalisation procedure is used to obtain the set of orthogonal polynomials. But, as highlighted in [61], working with the transformed independent variables might pose problems of convergence of the PCEs. In that case, a subsequent individual transformation of the $\bar{x}$-variables can help to overcome this issue. In the numerical exercises of Section 5, we indicate which further transformation has been used, if any.

Computational issues
Eq. (15) indicates that the Sobol' indices are computed directly from the PCE coefficients. Thus, there is no need to run the PCE as a surrogate model to estimate variance-based sensitivity indices. This is the reason why variance-based sensitivity analysis with PCE can be classified among the spectral methods, along with the Fourier amplitude sensitivity test [10,13]. Therefore, the challenge with PCE-based sensitivity analysis is the estimation of the PCE coefficients. In practice, it is expected that a sparse PCE often suffices to be a good proxy of the input-output relationship, that is,

$y = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} a_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\bar{x}) + \varepsilon,$   (16)

where $\mathcal{A} \subset \mathbb{N}^n$ is a subset of multi-indices of maximal polynomial degree $d = \max_{\boldsymbol{\alpha} \in \mathcal{A}} |\boldsymbol{\alpha}|$ and level of interaction $\ell$, $\bar{x}$ is the vector of independent variables and $\varepsilon$ is the approximation error. We employ the stepwise regression algorithm of [18], derived in a Bayesian framework, to identify the best PCE. Let $\mathbf{X}$ be an input sample of $x \sim p_x$ of size $N \times n$ and $\boldsymbol{y}$ the associated vector of model responses. Bayesian sparse PCE starts by obtaining the standardised response vector from the original vector $\boldsymbol{y}$. We denote by $\bar{\mathbf{X}}$ one of the possible independent input samples obtained from the transformation of $\mathbf{X}$ with either RT, NT or MT. From the dataset $(\bar{\mathbf{X}}, \boldsymbol{y})$, the best sparse PCE is obtained with an iterative procedure that alternates the enrichment of the candidate basis (step 4) with a stepwise regression and model selection (step 3). The PCE coefficients are estimated at step 3 by assuming $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, yielding $\hat{\boldsymbol{a}} = \left(\boldsymbol{\Psi}^{\top}\boldsymbol{\Psi}\right)^{-1}\boldsymbol{\Psi}^{\top}\boldsymbol{y}$, where $\boldsymbol{\Psi}$ is the matrix of the retained basis elements evaluated at the sample points; the posterior pdf of the error variance is an inverse Gamma distribution, whose mode provides the estimate $(\hat{\sigma})^2$. The estimated vector of coefficients $\hat{\boldsymbol{a}}$ is known as the maximum likelihood estimate (MLE), which differs from [18], who identified the maximum a posteriori estimate since a Gaussian prior was assigned to the PCE coefficients. The error vector $\boldsymbol{e} = \boldsymbol{y} - \boldsymbol{\Psi}\hat{\boldsymbol{a}}$ can be compared a posteriori with $\mathcal{N}(0, (\hat{\sigma})^2)$ to ensure that the initial assumption (Gaussian likelihood) is reasonable. It is worth mentioning here that assuming a Gaussian likelihood might provide non-robust estimates for some problems (e.g. when $\boldsymbol{y}$ contains outliers).
In step 3, the current sparse PCE (associated with the multi-index set $\mathcal{A}_k$) is judged better than the previous one provided that its model selection criterion (namely, $KIC_k$) is smaller than the previous one (i.e. $KIC_{k-1}$). In the expression of the Kashyap information criterion (KIC) at the $k$th iteration, $P = \mathrm{Card}(\mathcal{A}_k)$ is the number of retained basis elements and $|\cdot|$ stands for the determinant. With the previous algorithm, the final maximal polynomial degree $d$ and level of interaction $\ell$ are automatically identified. Suppose that, at iteration $k-1$, elements of polynomial degree within $[d, d+2]$ and level of interaction within $[\ell, \ell+1]$ have been added to the subset of multi-indices (step 4), but at iteration $k$ none of these elements has been kept during the stepwise regression. Then, at the end of the $k$th iteration, one comes up with a subset of maximal polynomial degree $d$ and level of interaction $\ell$. In this case, the iterations stop, because it is unlikely that higher-order elements would provide a better sparse PCE. Otherwise, the enrichment procedure goes further.
As an illustration, suppose that at step 4 the current subset of multi-indices for a problem with $n = 4$ variables is of maximal polynomial degree $d = 4$ and level of interaction $\ell = 3$. Hence, at step 4, this subset is enriched with elements of degrees 5 and 6 (multi-indices such as 3110, 1310, 1130, 4010, 2210, 2030, 3020, 1220, 1040, 4020, 2220 and 2040). To circumvent the curse of dimensionality, the candidate set is enriched only gradually in this manner (a few degrees and one interaction level at a time). Another feature of Bayesian sparse PCE is that the joint probability density function of all estimated quantities is obtained (see Eqs. (17)-(21)). Therefore, the sensitivity indices (15) can be computed with an uncertainty range. This is useful to assess significant differences between PCE-based computed statistics.
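To fix ideas, the sketch below implements a simplified forward stepwise selection of a sparse Legendre PCE scored with a BIC-type criterion, used here as a stand-in for the KIC of [18,60]; the basis handling, the enrichment rule and the stopping test are deliberately simplified, and the resulting basis is orthogonal but not normalised (it would have to be normalised before applying Eq. (15)).

```python
import numpy as np
from itertools import product
from scipy.special import eval_legendre

def design_matrix(U, alphas):
    """Tensor-product shifted-Legendre basis evaluated at U in [0, 1]^n."""
    cols = []
    for alpha in alphas:
        col = np.ones(U.shape[0])
        for xi, ai in zip(U.T, alpha):
            if ai:
                col = col * eval_legendre(ai, 2.0 * xi - 1.0)
        cols.append(col)
    return np.column_stack(cols)

def stepwise_sparse_pce(U, y, max_degree=6, max_interaction=2):
    """Greedy forward selection scored by a BIC-like criterion (KIC stand-in)."""
    N, n = U.shape
    candidates = [a for a in product(range(max_degree + 1), repeat=n)
                  if 0 < sum(a) <= max_degree and np.count_nonzero(a) <= max_interaction]
    selected = [(0,) * n]                      # start from the mean term
    best_score = np.inf
    improved = True
    while improved and candidates:
        improved = False
        for a in candidates:
            Psi = design_matrix(U, selected + [a])
            coef, *_ = np.linalg.lstsq(Psi, y, rcond=None)
            sse = np.sum((y - Psi @ coef) ** 2)
            score = N * np.log(sse / N) + Psi.shape[1] * np.log(N)   # BIC-like criterion
            if score < best_score:
                best_score, best_a, improved = score, a, True
        if improved:
            selected.append(best_a)
            candidates.remove(best_a)
    Psi = design_matrix(U, selected)
    coef, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return dict(zip(selected, coef))

# illustrative usage on an invented smooth response
rng = np.random.default_rng(5)
U = rng.uniform(size=(300, 2))
y = np.sin(2 * np.pi * U[:, 0]) + 0.5 * U[:, 1] ** 2 + 0.05 * rng.standard_normal(300)
pce = stepwise_sparse_pce(U, y)
print(len(pce), "terms retained")
```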

PCE-RT on non-rectangular domain
We illustrate the use of polynomial chaos expansion in conjunction with the Rosenblatt transformation (method denoted PCE-RT) by considering a ten-dimensional problem with an inequality constraint. The input vector $x = (x_1, \ldots, x_n) \in [0,1]^n$ is uniformly distributed within the non-rectangular domain defined by the constraint $\sum_{i=1}^{n} x_i < 1$. Sampling a joint pdf of this nature can be a challenging issue if one uses the basic acceptance-rejection sampling proposed in [62]. This is because the joint pdf of $x$ is $p(x) = n!$ if $\sum_{i=1}^{n} x_i < 1$, and $p(x) = 0$ otherwise. This means that the probability of drawing a good candidate (that satisfies the constraint) by randomly drawing from the unit hypercube is $1/n!$, which is very low for $n = 10$ as considered in the present example. On the contrary, knowing the conditional cdfs, one can efficiently generate random samples w.r.t. the joint pdf by using Eq. (10) (inverse RT). It can be shown that the inverse conditional cdfs in this case are

$x_i = \left(1 - \sum_{j=1}^{i-1} x_j\right)\left(1 - \left(1 - v_i\right)^{1/(n-i+1)}\right), \quad i = 1, \ldots, n.$   (23)

In the present work, this sample is generated with the LP sequences of [45]. A sample size of $N = 1\,024$ is chosen. Now, let us consider the following model response: $y = \sum_{i=1}^{n} a_i \log(x_i)$. We set $n = 10$, $a_i = 0$ for all $i < 7$ and $a_i = 1$ for all $i \ge 7$. This implies that the model structurally only depends on $(x_7, \ldots, x_{10})$. It is straightforward to see that, as a function of the $v$-variables, the model response is additive. Indeed, by replacing Eq. (23) in the model response $y$, each $\log(x_i)$ can be expressed as a sum of univariate functions of the $v_j$'s ($j \le i$). Because the $v$-variables are independent of each other, it can be concluded that the model is additive w.r.t. $v$. To verify this, the Sobol' indices of the $v$-variables are estimated with the PCE approach. PCEs are built upon the $v$-variables with the shifted-Legendre orthogonal polynomials. The results are gathered in Table 1 for two different RT transformations. They have been estimated with fair accuracy. Examining Table 1, despite the estimation error, one can easily verify that $\sum_{i=1}^{10} \hat{S}_{v_i} = 1$, whatever the transformation ordering. Recall that the interpretation of the sensitivity indices of the $v$-variables as those of the $x$-variables depends on how the former have been obtained from the RT (Eq. (9)). In Table 1, we have reported the first-order sensitivity indices of the $v$-variables for two different transformations. In the first one (row #2), the sample of $v$ has been computed after transforming the sample of $x$ set in the canonical order $(x_1, \ldots, x_{10})$. Consequently, $\hat{S}_{v_1} = \hat{S}_1$, which is the full first-order effect of $x_1$ that accounts for its dependence with all the other variables. We note that $\hat{S}_1$ only represents 3% of the total variance. The sensitivity index of $v_2$ is that of $x_2 \mid x_1$, which is the first-order effect of $x_2$ that accounts for its dependence with all the other variables except $x_1$ (in [23] the authors denote it $\hat{S}_{2-1}$). The interpretation of the other indices follows the same reasoning until the interpretation of $\hat{S}_{v_{10}}$, where the latter is simply the independent first-order effect of $x_{10}$, namely $\hat{S}^{ind}_{10}$. We note that the independent contribution of $x_{10}$ to the total variance is much higher than the full contribution of $x_1$. This is explained by the structural independence of $y$ from $x_1$. Although the model does not structurally depend on $x_1$, because of its dependence with the other variables, $x_1$ still has an impact on the model response. By fixing the value of $x_1$, the uncertainty of the other variables would be impacted, which would marginally affect the variance of the model response (a small reduction of 3%).
However, because the model is additive, fixing $(x_1, \ldots, x_6)$ would lead to a significant reduction of the response variance (i.e. $\sum_{i=1}^{6} \hat{S}_{v_i} = 40\%$). The second RT is applied to $(x_7, \ldots, x_{10}, x_1, \ldots, x_6)$ (Table 1, row #3). Because $y$ has a structural dependence only on the first four variables of this ordering, $(x_7, x_8, x_9, x_{10})$, we should find $S_{(v_1, \ldots, v_4)} = S_{(x_7, \ldots, x_{10})} = 1$. This is confirmed by our results. Indeed, we first note that $\hat{S}_{v_i} = 0$ for $i > 4$. This is explained by the fact that, with the current RT, the sensitivity indices of these $v$-variables (for $i > 4$) are those of $(x_1, \ldots, x_6)$ without accounting for their dependencies with $(x_7, \ldots, x_{10})$. Since the former only impact $y$ via their dependencies with the latter, their independent contributions are null. We also note that the first-order sensitivity indices of $(x_7, \ldots, x_{10})$ sum up to one, as expected.
Finally, the fact that we have found $S_{(x_7, \ldots, x_{10})} = 1$ and $S^{ind}_{(x_1, \ldots, x_6)} = 0$ does not allow us to infer that the model response is structurally independent of $(x_1, \ldots, x_6)$. As previously discussed in Section 2, the non-uniqueness of the ANOVA decomposition (in the Sobol' sense) precludes us from making this conclusion. However, this finding is still useful in that it gives information relevant to a model reduction setting. In effect, we can conclude that all the information in the model response is contained in the input subset $(x_7, x_8, x_9, x_{10})$. Therefore, prioritisation should be given to the accurate assessment of these variables in order to better estimate the model response.
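For completeness, the sketch below reproduces the sampling scheme of this example: Eq. (23) maps a quasi-random sample of $v$ onto the constrained domain, and the test response is then evaluated; the sample size and the sequence settings are illustrative.

```python
import numpy as np
from scipy.stats import qmc

n = 10

def inverse_rt_simplex(V):
    """Eq. (23): map V in [0,1]^n to X uniform on {x in [0,1]^n : sum(x) < 1}."""
    X = np.empty_like(V)
    remaining = np.ones(V.shape[0])               # 1 - sum of the x_j already drawn
    for i in range(n):                            # exponent is 1/(n - i) with 0-based i
        X[:, i] = remaining * (1.0 - (1.0 - V[:, i]) ** (1.0 / (n - i)))
        remaining = remaining - X[:, i]
    return X

V = qmc.Sobol(d=n, scramble=True, seed=6).random(1024)
V = np.clip(V, 1e-12, 1 - 1e-12)                  # guard against boundary values
X = inverse_rt_simplex(V)
assert np.all(X.sum(axis=1) < 1.0)                # the constraint holds by construction

a = np.array([0.0] * 6 + [1.0] * 4)               # y depends structurally on x7..x10 only
y = (a * np.log(X)).sum(axis=1)
```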

PCE-NT on the Ishigami function
As a second example, and to demonstrate the PCE-NT approach, let us consider the Ishigami function, which is defined as follows,

$g(x) = \sin x_1 + 7 \sin^2 x_2 + 0.1\, x_3^4 \sin x_1,$   (24)

where the input variables are uniformly distributed over $[-\pi, \pi]$, with a linear correlation between $x_1$ and $x_3$. We denote $\rho_{13}$ the product-moment correlation coefficient. The Sobol' indices of this problem for linear correlations varying over $(-1, 1)$ were investigated in [22] and [30]. In [22], only $S_i$ and $S^{T}_i$ were estimated with a Monte Carlo approach. The overall first-order and total-order sensitivity indices were assessed in [30] with the Fourier amplitude sensitivity test (the approach was called EFAST-INT). For the sake of comparison, PCE-NT is employed to estimate $(S_i, S^{T}_i, S^{ind}_i, S^{T,ind}_i)$, for all $i = 1, 2, 3$. PCE-NT relies on the transformation of Eq. (11), which requires a vector of independent standard normal variables $z$, due to the choice of a Gaussian copula. The latter is obtained from the isoprobabilistic transformation of the vector $v$ uniformly distributed over $[0,1]^3$, i.e. $z = \Phi^{-1}(v)$. A sample of $v$ is drawn with the LP sequences. PCEs are built upon the $v$-variables with shifted-Legendre polynomials because PCEs built upon the $z$-variables with Hermite polynomials faced convergence issues. This is due to the fact that $v$ is uniformly distributed like the original model inputs (albeit without the correlation structure), while the $z$-variables are non-linear transformations of the $v$-variables. Such a non-linear transformation can complicate the relationship between $y$ and $z$ as compared with $v$. This issue has been highlighted by other authors [63].
The estimated sensitivity indices are depicted in Fig. 1. Note that we do not report the estimation errors, as the indices are estimated accurately. They are in good accordance with those obtained in [30] with the EFAST-INT method. Nevertheless, while with EFAST-INT an overall number of $3 \times 8\,192$ function calls was necessary to obtain accurate estimates of the Sobol' indices, with PCE-NT only $N = 1\,024$ function calls were performed. This demonstrates the efficiency of the Bayesian sparse polynomial chaos expansion proposed by [18].
The results indicate that $x_2$ is not correlated to the other two inputs, as its full and independent sensitivity indices are equal. We can also guess that $x_2$ does not interact with the other two inputs, because its total and first-order indices are also the same. As far as $x_1$ and $x_3$ are concerned, we note that, when the correlation coefficient is zero, the full and independent sensitivity indices are equal. When the correlation coefficient is close to $\pm 1$, the independent sensitivity indices (both first and total order) of $x_1$ and $x_3$ are null because all the information in $g(x)$ is captured by only one of the pairs $(x_1, x_2)$ or $(x_2, x_3)$. This finding informs the analyst that the output uncertainty is explained by one of these two pairs only, thus allowing some kind of dimensionality reduction.
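The input generation for this example can be sketched as follows: a Gaussian copula imposes the correlation between $x_1$ and $x_3$ on uniform marginals over $[-\pi, \pi]$; the copula correlation is used directly as the product-moment correlation (the tuning of $\mathbf{R}$ is skipped) and all settings are illustrative.

```python
import numpy as np
from scipy.stats import norm, qmc

rho13 = 0.5                                        # illustrative copula correlation x1-x3
R = np.array([[1.0, 0.0, rho13],
              [0.0, 1.0, 0.0],
              [rho13, 0.0, 1.0]])
U_fac = np.linalg.cholesky(R).T                    # upper-triangular factor, R = U^T U

v = qmc.Sobol(d=3, scramble=True, seed=7).random(1024)
v = np.clip(v, 1e-12, 1 - 1e-12)                   # guard against boundary values
z = norm.ppf(v)                                    # independent standard normals
zeta = z @ U_fac                                   # correlated standard normals
x = -np.pi + 2.0 * np.pi * norm.cdf(zeta)          # uniform marginals on [-pi, pi]

# Ishigami response of Eq. (24); PCEs are then fitted on the independent v-variables
# (shifted-Legendre basis) and the full/independent indices follow from Section 4.
y = np.sin(x[:, 0]) + 7.0 * np.sin(x[:, 1]) ** 2 + 0.1 * x[:, 2] ** 4 * np.sin(x[:, 0])
```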

Motivation
We consider now a case in which the model input distributions are inferred from statistical calibration. Statistical calibration of computer models enhances their reliability by demonstrating that they can match observations. If the model at hand is not able to match the observations, statistical calibration can, if done properly, help in identifying the sources of discrepancy. Model calibration is usually undertaken in a Bayesian framework, which requires the definition of a likelihood function (characterising the discrepancies between model predictions and observations) and prior uncertainties of the model inputs (independence is usually assumed). Model inputs might also include hyperparameters of discrepancy models [14,[64][65][66]. This yields the so-called joint posterior distribution of the model inputs. Sampling random draws from the posterior distribution is not straightforward, and Markov Chain Monte Carlo (MCMC) methods are typically used for this task [67][68][69][70].
We consider the sensitivity analysis of a drainage experiment model posterior to its calibration. This model has been studied in several papers. First, [71] used this model to compare the performances of two Bayesian approaches for statistical calibration. Then, [72] performed the global sensitivity analysis of the model responses prior to its calibration. Recently, [73] used this model to illustrate the extension of the calibration method developed in [74] to the generation of stochastic random samples from a joint pdf defined in a Bayesian framework. In the present exercise, we consider the data obtained in [73]. We stress that this is a purely numerical exercise, meaning that the observations were generated by simulations corrupted with random noise. Hereafter, we recall the setting of the statistical calibration problem.

The case study
In the modelled experiment, a vertical column filled with water-saturated soil is drained by imposing multistep pressure heads on either side of the column. This is achieved by moving vertically (downward), at different timesteps, a reservoir tank connected to the bottom of the column. The soil drainage experiment is modelled by several equations and initial/boundary conditions. The flow through the porous medium is governed by a partial differential equation for variably saturated flow [75,76], in which $K_s$ [L][T]$^{-1}$ is the saturated hydraulic conductivity and $S_e$ (-) is the effective saturation, defined by the Mualem-van Genuchten (MvG) model. The calibration exercise of this model was performed in [71]. For the sake of completeness, we recall that the target joint posterior distribution (Eq. (28)) is the likelihood function (using independent uniform priors for the calibrated parameters), where $SS_h$ and $SS_\theta$ are the sums of squared errors of the pressure head and the water content respectively, while $\sigma^2_h$ and $\sigma^2_\theta$ are their error variances. We recall that the data sets are simulated model outputs corrupted with Gaussian noise. With true experimental data sets, the target joint posterior distribution must be chosen with care. The random vector $x$ contains the six soil hydraulic parameters to be calibrated. Independent uniform priors were assumed within the ranges shown in Table 2. These uncertainty ranges were considered in the sensitivity analysis of the observed model responses prior to the calibration exercise [72]. The observations are the hydraulic heads at 3 cm and the average water content at the same location. For both variables, 481 observations at different timesteps were considered.
A random sample of size $N = 512$, drawn from the joint posterior distribution Eq. (28), was generated with the numerical approach described in [73]. This allows sampling of the posterior without an explicit analytical representation of the distribution. It is worth mentioning that an MCMC sample could have been used. However, the sample obtained in [73] is preferred because the method used to generate it relies on the assumption defined by Eq. (13). Therefore, this sample fits the requirements of the PCE-MT approach described in Section 3.3. The random sample is depicted in Fig. 2.

Table 2 Prior uncertainty ranges of hydraulic parameters; cm and minutes are used following the convention in hydrology.
The main diagonal of Fig. 2 shows the posterior marginal distributions of the inputs. We note that, while the distributions of three of the parameters appear to be close to normal, the distributions of the three others are skewed. The posterior ranges of some of the parameters are remarkably narrow, proving that they are satisfactorily identified. The plots below the main diagonal show the pairwise scatter plots of the generated random sample (see [73] for more details). It is clear that there are strong correlations in the sample, with some parameter pairs being positively correlated and others negatively correlated. This correlation structure is dictated by the model, which is based on the physics of the system. This suggests that, to fit the observed data, one must account for the complex interplay between the hydraulic parameters.

Posterior sensitivity analysis
The quantity of interest in the present study is the predicted cumulative outflow at $t = 240$ min, which is related to the quantity of water removed from the soil; its expression is given by Eq. (29). To perform the posterior uncertainty and sensitivity analysis of this quantity, we propagate the previously discussed random sample through Eq. (29). Because the joint posterior pdf in Eq. (28), from which the input sample has been drawn, is not tractable, one cannot infer the conditional cdfs and get sets of independent samples with RT. Hence, PCEs are built upon the $w$-variables obtained after orthogonalisation of the posterior sample with Eq. (12). The associated orthogonal polynomials are inferred from Gram-Schmidt orthogonalisation. We recall that the $w$-variables are not completely independent (see the previous discussion about Fig. 2). Therefore, the sensitivity analysis has to be conducted with care: a visual inspection can help to gauge whether any residual dependence still exists. This limitation is, however, expected with any decorrelation procedure. An important remark is that PCE identification with the method of [18] is computationally cheap here, because the strong correlation between the input variables yields very sparse PCEs.
The total-order and first-order variance-based sensitivity indices are gathered in Table 3. The first-order effects turn out to be equal to the total-order effects for all input variables. The full sensitivity indices are all greater than 0.81, meaning that none of the inputs can be fixed to its best estimate without impacting the prediction of the cumulative flow. We also observe that all the independent sensitivity indices are close to zero (the highest one is $\simeq 0.04$). This means that all the hydraulic parameters mainly contribute to the model response variance through their mutual correlations. These results are related to the strong correlations observed in Fig. 2 (lower diagonal). It is then expected that only a few parameters suffice to explain the predicted flow variance.
The MvG parameter $\alpha$ is the one with the highest full first-order sensitivity index ($S_\alpha \simeq 0.94$, Table 3, in bold font), although another parameter has a full first-order effect of almost the same value ($\simeq 0.93$). The difference between the two is likely to be insignificant if one recalls that the corresponding $w$-samples are not completely independent (see the discussion in the previous subsection). Anyway, if we assume this difference does not matter, we can conclude that the mutual contribution of the other parameters to the model response variance is $1 - S_\alpha \simeq 0.06$. This implies that if one were able to find the true value of $\alpha$, and sample the remaining parameters conditionally on that value, then propagating the sample through Eq. (29) would yield a reduction of 94% of the variance of the cumulative flow (because there are no interactions in the model, $S_\alpha = S^{T}_\alpha$). However, given the high accuracy with which $\alpha$ has been estimated during the calibration process ($\alpha \in [0.008, 0.011]$ cm$^{-1}$, see Fig. 2, row #4, column #4), it is unlikely that its uncertainty can be further reduced.
What is the smallest subset of input parameters that explains the uncertainty in the predicted flow? The answer to this question requires the calculation of the first-order sensitivity indices of groups of parameters until one of them is equal to one. These indices are called ''closed'' sensitivity indices because they are closed within a subset of inputs [77]. They are usually denoted with a superscript $c$; here, they are simply denoted $S_{(\ldots)}$ to be consistent with our notation in Eq. (1). The sensitivity to pairs of variables (Table 4) indicates that the couplet with the highest contribution to the response variance reaches $\simeq 0.97$ (in bold font), although there are several other pairs with similarly high values ($\simeq 0.96$). By looking at triplets of parameters, it is found that the total response variance is almost entirely explained by a single triplet, whose closed sensitivity index is $\simeq 1$ (bold font in Table 5). We note that five other triplets also capture most of the variance of the predicted cumulative flow, as their sensitivity index is higher than 0.99. These results confirm our preliminary guess that, because of the strong correlation between the input parameters, only a few of them explain the total variance of the predicted flow.

Table 3 Sensitivity analysis of the model of drainage experiment. Estimated individual first-order sensitivity indices of the hydraulic parameters for the predicted cumulative flow. In parentheses, the 95% credible intervals. The highest value is highlighted in bold font.

Table 4 Sensitivity analysis of the model of drainage experiment. Estimated first-order sensitivity indices of couplets of hydraulic parameters (95% credible intervals). The highest value is highlighted in bold font.

Table 5 Sensitivity analysis of the model of drainage experiment. Estimated first-order sensitivity indices of triplets of hydraulic parameters. The highest value is highlighted in bold font.

Conclusion
Global sensitivity analysis of models with dependent inputs is a challenging issue. The reason is that, even if a model response is structurally independent of a given input (by ''structurally'' we mean in its mathematical definition), it can appear to be sensitive to that input if the latter is strongly correlated with others. Computing the independent sensitivity indices helps to identify the input variables that are structurally linked to the model response of interest. This is the case when the independent total-order sensitivity index of an input is not null. However, if its independent total-order sensitivity index is null, it cannot be inferred that the model response does not structurally depend on that input. To reach this conclusion, one has to perform the ANOVA decomposition by assuming independence, because in that case the ANOVA decomposition is unique.
In the present work, we have used the Polynomial Chaos Expansion (PCE) approach to estimate variance-based sensitivity indices. For this purpose, three transformations have been used: namely, the Rosenblatt transformation [44], the Nataf transformation [46] and the transformation of [23]. The decision of which transformation to use is problem-dependent. For example, the Rosenblatt transformation should be preferred if the conditional distributions are known. The Nataf transformation is adequate when the input uncertainty is defined by the marginal distributions and the product-moment correlation matrix. With a given input sample, the transformation of [23] is appropriate, provided that Eq. (13) holds.
The case studies here have demonstrated the efficiency of the proposed methodology, allowing the estimation of any variance-based sensitivity indices from one single sample. Finally, the sensitivity analysis of a drainage model posterior to its calibration highlights the importance of defining the objective of the sensitivity analysis, and thereby computing the sensitivity indices that are appropriate for the task, as recommended in [77]. For the calibrated drainage model, the