Sensitivity analysis for multidimensional and functional outputs

Let $X:=(X_1, \ldots, X_p)$ be random objects (the inputs), defined on some probability space $(\Omega,{\mathcal{F}}, \mathbb P)$ and valued in some measurable space $E=E_1\times\ldots \times E_p$. Further, let $Y := f(X_1, \ldots, X_p)$ be the output. Here, $f$ is a measurable function from $E$ to some Hilbert space $\mathbb{H}$ ($\mathbb{H}$ may be of finite or infinite dimension). In this work, we give a natural generalization of the Sobol indices (classically defined when $Y\in\mathbb R$) to outputs valued in $\mathbb{H}$. These indices have very nice properties. First, they are invariant under isometry and scaling. Further, as in dimension $1$, they can be easily estimated using the so-called Pick and Freeze method. We investigate the asymptotic behaviour of this estimation scheme.


Introduction
Many mathematical models encountered in applied sciences involve a large number of poorly-known parameters as inputs. It is important for the practitioner to assess the impact of this uncertainty on the model output. An aspect of this assessment is sensitivity analysis, which aims to identify the most sensitive parameters, that is, the parameters having the largest influence on the output. In global stochastic sensitivity analysis, the input variables are assumed to be independent random variables. Their probability distributions account for the practitioner's belief about the input uncertainty. This turns the model output into a random variable. Using the so-called Hoeffding decomposition [14], the total variance of a scalar output can be split into partial variances. Each of these partial variances measures the uncertainty on the output induced by the corresponding input variable. By considering the ratio of each partial variance to the total variance, we obtain a measure of importance for each input variable called the Sobol index or sensitivity index of the variable [13]; the most sensitive parameters can then be identified and ranked as the parameters with the largest Sobol indices. A clever way to estimate the Sobol indices is to use the so-called Pick and Freeze sampling scheme (see [13] and more recently [6]). This sampling scheme transforms the complex initial statistical problem of estimation into a simple linear regression problem. Widely used by practitioners in many applied fields (see for example [12] and the complete bibliography given therein), the Pick and Freeze scheme has recently been analyzed mathematically in [6] and [5] (see also [15], where some preliminary mathematical ideas are given). In the last decade, many authors have proposed generalizations of Sobol indices for scalar outputs (see for example [1], [11], [10] and [3]). The aim of the present paper is twofold.
First, we wish to build some extensions of Sobol indices in the case of vectorial or functional output. Secondly, we aim to construct some Pick and Freeze estimators of such extensions and to study their asymptotic and non-asymptotic properties. Generalization of the Sobol index for multivariate or functional outputs has been considered in an empirical way in [2] and [7]. In this paper, we consider and study a new generalization of Sobol indices for vector or functional outputs. This generalization was implicitly considered in the pioneering work of Lamboni et al ( [7]). The starting point of the construction of these new indices relies on the multidimensional Hoeffding decomposition of the vectorial output.
Further, due to non-commutativity, many choices for an extension of Sobol indices are possible. To restrict the choice, we require both that the indices satisfy natural invariance properties and that they remain easy to estimate using a Pick and Freeze sampling scheme. The paper is organized as follows. We start in the next section by developing and discussing two examples. These examples illustrate the difficulty of extending scalar Sobol indices directly to a multidimensional context. The generalized Sobol indices and their main properties are given in Section 3. In a nutshell, these new Sobol indices are the same as the classical ones of the unidimensional context, up to the trace operation taken on both terms of the ratio. We show that these quantities are well-tailored for sensitivity analysis, as they are invariant under isometry and scaling of the output. In Section 4, we introduce another general family of Sobol matricial indices. They are also compatible with the Hoeffding decomposition and they also satisfy the natural invariance properties. Each element of this family depends on a probability measure on the group of signed permutation matrices. The main drawback of these quantities is that they are not as easy to estimate as the indices introduced in Section 3. In Section 5, we revisit the Pick and Freeze sampling scheme and study the asymptotic and non-asymptotic properties of the Pick and Freeze estimators of the new indices. These properties are numerically illustrated on two relevant examples in Section 5.4. Finally, in Section 6, we extend our results to the case of functional outputs.

Motivation
We begin by considering two examples that enlighten the need for a proper definition of sensitivity indices for multivariate outputs.
Example 2.1. Let us consider the following nonlinear model, where $X_1$ and $X_2$ are assumed to be i.i.d. standard Gaussian random variables (r.v.s).
First, we compute the one-dimensional Sobol indices $S^j(f^i_{a,b})$ of $f^i_{a,b}$ with respect to $X_j$ ($i, j = 1, 2$). The resulting ratios do not depend on $b$. Moreover, for $|a| > 1$, this ratio is greater than $1$, so $X_1$ seems to have more influence on the output. Now let us perform a sensitivity analysis on $Y_2$. Straightforward calculations show that, for the quantity $Y_2$, the region where $X_1$ is the most influential variable depends on the value of $b$. This region is not very intuitive. We plot in Figure 1 the region where $X_1$ is the most influential variable.
Figure 1: Plot of $(a-1)(a^3+a^2+5a$
Example 2.2. Here, we study the following two-dimensional model. An obvious computation shows that $X_1$ seems to have more influence on the output than $X_2$.
If we consider $Y_2$, we straightforwardly get $Y_2 = X_1^2$, which does not depend on $X_2$.
A last motivation for introducing new Sobol indices is related to the statistical problem of their estimation. As the dimension increases, the statistical estimation of the whole vector of scalar Sobol indices becomes more and more expensive. Moreover, the interpretation of such a large vector is not easy. This strengthens the case for introducing Sobol indices of small dimension, which condense all the information contained in a large collection of scalars. In the next section, we define new Sobol indices generalizing the scalar ones and summarizing all the information.

Definition of the new indices
We denote by $X := (X_1, \ldots, X_p)$ the random input, defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and valued in some measurable space $E = E_1 \times \ldots \times E_p$. We denote by $Y = f(X)$ the output, where $f : E \to \mathbb{R}^k$ is an unknown measurable function ($p$ and $k$ are positive integers). We assume that $X_1, \ldots, X_p$ are independent and that $Y$ is square integrable (i.e. $\mathbb{E}\|Y\|^2 < \infty$). We also assume, without loss of generality, that the covariance matrix of $Y$ is positive definite. Let $u$ be a subset of $\{1, \ldots, p\}$ and denote by $\sim u$ its complement in $\{1, \ldots, p\}$. Further, we set $X_u = (X_i,\, i \in u)$ and $E_u = \prod_{i \in u} E_i$.
As the inputs $X_1, \ldots, X_p$ are independent, $f$ may be decomposed through the so-called Hoeffding decomposition [14]
$$f(X) = c + f_u(X_u) + f_{\sim u}(X_{\sim u}) + f_{u,\sim u}(X_u, X_{\sim u}), \qquad (1)$$
where $c \in \mathbb{R}^k$, $f_u : E_u \to \mathbb{R}^k$, $f_{\sim u} : E_{\sim u} \to \mathbb{R}^k$ and $f_{u,\sim u} : E \to \mathbb{R}^k$ are given by
$$c = \mathbb{E}(Y), \quad f_u = \mathbb{E}(Y|X_u) - c, \quad f_{\sim u} = \mathbb{E}(Y|X_{\sim u}) - c, \quad f_{u,\sim u} = Y - c - f_u - f_{\sim u}.$$
Thanks to $L^2$-orthogonality, computing the covariance matrix of both sides of (1) leads to
$$\Sigma = C_u + C_{\sim u} + C_{u,\sim u}. \qquad (2)$$
Here $\Sigma$, $C_u$, $C_{\sim u}$ and $C_{u,\sim u}$ denote respectively the covariance matrices of $Y$, $f_u(X_u)$, $f_{\sim u}(X_{\sim u})$ and $f_{u,\sim u}(X_u, X_{\sim u})$.
Remark 3.1. Notice that for scalar outputs (i.e. when $k = 1$), the covariance matrices are scalar (variances), so that (2) may be interpreted as the decomposition of the total variance of $Y$. The summands reflect the fluctuations induced by the input factors $X_u$ and $X_{\sim u}$, and the interactions between them. The (univariate) Sobol index $S_u(f) = \operatorname{Var}(\mathbb{E}(Y|X_u))/\operatorname{Var}(Y)$ is then interpreted as the sensitivity of $Y$ with respect to $X_u$. Due to the non-commutativity of the matrix product, a direct generalization of this index is not straightforward.
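As a sanity check, the covariance decomposition (2) can be verified numerically on a toy model. Everything below (the model, the sample size, the closed-form conditional expectations) is an illustrative assumption, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400_000

# hypothetical toy model with p = 2 inputs and k = 2 outputs
X1 = rng.standard_normal(N)
X2 = rng.standard_normal(N)
Y = np.stack([X1 + X2 + X1 * X2, 2.0 * X1], axis=1)

c = Y.mean(axis=0)                                   # estimates E(Y) = (0, 0)
# for this model the conditional expectations are known in closed form:
f_u   = np.stack([X1, 2.0 * X1], axis=1)             # E(Y | X1) - E(Y), u = {1}
f_cu  = np.stack([X2, np.zeros(N)], axis=1)          # E(Y | X2) - E(Y)
f_int = Y - c - f_u - f_cu                           # interaction term

Sigma = np.cov(Y, rowvar=False)
C_u, C_cu, C_int = (np.cov(g, rowvar=False) for g in (f_u, f_cu, f_int))

# L2-orthogonality of the summands gives Sigma = C_u + C_~u + C_{u,~u}
assert np.allclose(Sigma, C_u + C_cu + C_int, atol=0.05)
```

The assertion holds up to Monte-Carlo error because the empirical cross-covariances between the three (population-orthogonal) summands vanish at rate $1/\sqrt{N}$.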
In the general case ($k \geq 2$), for any square matrix $M$ of size $k$, equation (2) can be scalarized in the following way:
$$\operatorname{Tr}(M\Sigma) = \operatorname{Tr}(MC_u) + \operatorname{Tr}(MC_{\sim u}) + \operatorname{Tr}(MC_{u,\sim u}),$$
which suggests defining, whenever $\operatorname{Tr}(M\Sigma) \neq 0$,
$$S_u(M; f) = \frac{\operatorname{Tr}(MC_u)}{\operatorname{Tr}(M\Sigma)}.$$
Of course, we can analogously define $S_{\sim u}(M; f)$ and $S_{u,\sim u}(M; f)$. The following lemma is obvious.
1. The generalized sensitivity measures sum up to $1$:
$$S_u(M; f) + S_{\sim u}(M; f) + S_{u,\sim u}(M; f) = 1.$$
3. Left-composing $f$ by a linear operator $O$ of $\mathbb{R}^k$ changes the sensitivity measure accordingly to $S_u(M; Of) = S_u(O^t M O; f)$.
4. For $k = 1$ and for any $M \neq 0$, we have $S_u(M; f) = S_u(f)$.

The important identity case
We now consider the special case M = Id k (the identity matrix of dimension k).
Notice that in this case the sensitivity indices are the same as the ones considered through principal component analysis in [7]. We set $S_u(f) = S_u(\mathrm{Id}_k; f)$. The index $S_u(f)$ has the following obvious properties: for any $\lambda \in \mathbb{R}^*$, $S_u(\lambda f) = S_u(f)$. Remark 3.2. The properties in this proposition are natural requirements for a sensitivity measure. In the next section, we will show that these requirements can be fulfilled by $S_u(M; f)$ only when $M = \lambda \mathrm{Id}_k$ ($\lambda \in \mathbb{R}^*$). Hence, the canonical choice among indices of the form $S_u(M; f)$ is the sensitivity index $S_u(f)$.
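These invariances are easy to check numerically. The sketch below uses arbitrary illustrative covariance matrices (not from the paper) together with the trace-ratio form of the index:

```python
import numpy as np

def S_u(C_u, Sigma):
    """Generalized Sobol index: S_u(f) = Tr(C_u) / Tr(Sigma)."""
    return np.trace(C_u) / np.trace(Sigma)

# illustrative covariance matrices (k = 2), chosen arbitrarily
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
C_u   = np.array([[1.0, 0.2], [0.2, 0.3]])
s = S_u(C_u, Sigma)

# isometry invariance: under Y -> O Y, covariances become O C O^t,
# and both traces are unchanged
theta = 0.7
O = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.isclose(s, S_u(O @ C_u @ O.T, O @ Sigma @ O.T))

# scaling invariance: Y -> lambda * Y multiplies both traces by lambda^2
assert np.isclose(s, S_u(4.0 * C_u, 4.0 * Sigma))
```

Both assertions hold because $\operatorname{Tr}(OCO^t) = \operatorname{Tr}(C)$ for any orthogonal $O$, and a common scalar factor cancels in the ratio.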

Identity is the only good choice
The following proposition can be seen as a kind of converse of Proposition 3.1. Among its assumptions: $M$ has full rank (assumption 2), and $S_u(M; \cdot)$ is invariant under isometries of the output (assumption 3). Proof. Note first that $S_u(M; f) = S_u\big((M + M^t)/2; f\big)$ ($C_u$ and $\Sigma$ being symmetric matrices). Thus we assume, without loss of generality, that $M$ is symmetric.
We diagonalize $M$ in an orthonormal basis: $M = PDP^t$, where $P^tP = \mathrm{Id}_k$ and $D$ is diagonal. By assumptions 1 and 3, $M$ can then be assumed to be diagonal.
Now we want to show that $M = \lambda \mathrm{Id}_k$ for some $\lambda \in \mathbb{R}^*$. Suppose, by contradiction, that $M$ has two distinct diagonal coefficients $\lambda_1 \neq \lambda_2$. It is clearly sufficient to consider the case $k = 2$. Choose $f = \mathrm{Id}_2$ (hence, $p = 2$), and $u = \{1\}$.
We have $\Sigma = \mathrm{Id}_2$ and $C_u = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. Hence, on one hand, $S_u(M; f) = \frac{\lambda_1}{\lambda_1+\lambda_2}$. On the other hand, let $O$ be the isometry which exchanges the two vectors of the canonical basis of $\mathbb{R}^2$. We have $S_u(M; Of) = \frac{\lambda_2}{\lambda_1+\lambda_2}$. Thus assumption 3 is contradicted if $\lambda_1 \neq \lambda_2$. The case $\lambda = 0$ is forbidden by assumption 2. Finally, it is easy to check that, for any $\lambda \in \mathbb{R}^*$, $S_u(\lambda \mathrm{Id}_k; \cdot) = S_u(\mathrm{Id}_k; \cdot) = S_u(\cdot)$.
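The counterexample in the proof can be replayed numerically; the trace-ratio formula for $S_u(M; f)$ below is our reading of the scalarized definition of Section 3:

```python
import numpy as np

def S_u_M(M, C_u, Sigma):
    # scalarized index: Tr(M C_u) / Tr(M Sigma)
    return np.trace(M @ C_u) / np.trace(M @ Sigma)

lam1, lam2 = 1.0, 3.0
M = np.diag([lam1, lam2])           # two distinct diagonal coefficients
Sigma = np.eye(2)                   # f = Id_2, standard Gaussian inputs
C_u = np.diag([1.0, 0.0])           # u = {1}
O = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # swaps the canonical basis vectors

s_before = S_u_M(M, C_u, Sigma)                      # lam1 / (lam1 + lam2)
s_after  = S_u_M(M, O @ C_u @ O.T, O @ Sigma @ O.T)  # lam2 / (lam1 + lam2)
assert not np.isclose(s_before, s_after)  # isometry invariance fails
```

With $\lambda_1 = 1$ and $\lambda_2 = 3$, the index jumps from $1/4$ to $3/4$ under the basis swap, confirming that $M \neq \lambda\,\mathrm{Id}_k$ breaks invariance.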

Remark 3.3. (Variational formulation)
We assume here that $\mathbb{E}(Y) = 0$; if this is not the case, one considers the centered variable $Y - \mathbb{E}(Y)$. As in dimension $1$, this new index can be seen as the solution of a least-squares problem (see [6]). As a consequence, $S_u(f)Y$ can be seen as the projection of $Y^u$ on $\{aY,\ a \in \mathbb{R}\}$.
Remark 3.4. Notice that the condition $\operatorname{Tr}(\Sigma) \neq 0$ (necessary for the indices to be well-defined) is fulfilled as soon as $Y$ is not constant.
We now give two toy examples to illustrate our definition.
Example 3.1. We consider as a first example a model with $X_1$ and $X_2$ i.i.d. standard Gaussian random variables. A direct computation gives the indices. This result has the natural interpretation that, as $X_1$ is scaled by $a$, it has more influence if and only if this scaling enlarges the support of $X_1$, i.e. $|a| > 1$.

About uniqueness
In this section, we show that it is possible to build other indices having the same invariance properties as S u (f ).

Another index
Here we use (2) in a different way to get a natural definition of a Sobol matricial index with respect to the variable $X_u$. Indeed, we may choose as Sobol matricial index
$$A C_u B \qquad (5)$$
for any matrices $A$ and $B$ such that $AB = \Sigma^{-1}$. First, note that this index is a square matrix of size $k$. Second, any convex combination of Sobol matricial indices of the form (5) is still a good candidate for the Sobol matricial index with respect to $X_u$.
Here $P$ is a matrix such that $P^t\Sigma P$ is diagonal, and $\mathcal{M}_k$ is the set of all square matrices of size $k$. The solution of this minimization problem is a good candidate for a Sobol matricial index (with $A = \Sigma^{-1}P$ and $B = P^t$). Note now that the symmetric version of this Sobol matricial index is also a good candidate (here $A = P$ and $B = P^t\Sigma^{-1}$).
In order to guarantee that a Sobol matricial index fulfills a consistent definition, we should require a little more. First, a reasonable condition is that the Sobol matricial index be a symmetric matrix: the influence of the input $X_i$ on the coordinates $k$ and $l$ of the output $Y$ should be the same as the influence of $X_i$ on the coordinates $l$ and $k$ of $Y$. Secondly, the Sobol matricial index should share the properties of the scalar index $S_u(f)$. That is, it should be invariant under any isometry, scaling and translation. This leads to the definition of a family of matricial indices.
For the sake of simplicity, we assume that the eigenvalues of $\Sigma$ are simple. Let $0 < \lambda_1 < \ldots < \lambda_k$ be the ordered eigenvalues and let $(O_i)_{i=1,\ldots,k}$ be such that $O_i$ is the unit eigenvector associated with $\lambda_i$ whose first nonzero coordinate is positive. Let $O$ be the (orthogonal) matrix whose $i$-th column is $O_i$. Let $H_k$ be the group of signed permutation matrices of size $k$. That is, $P \in H_k$ if and only if each row and each column of $P$ has exactly one nonzero element, which belongs to $\{-1, 1\}$.
Notice that any orthogonal matrix that diagonalizes $\Sigma$ can be written as $OP$, where $P \in H_k$. Suppose that $\mu$ is a probability measure on $H_k$; we define the index $T_{u,\mu}$ by averaging over $H_k$ with respect to $\mu$. We then have the following proposition.
Proposition 4.1. T u,µ is invariant by any isometry, scaling and translation.
Proof. Let $U$ be an isometry of $\mathbb{R}^k$. By integrating the corresponding equalities with respect to $\mu$, we obtain the invariance under isometry. The other invariances are obvious.
At first look, one may also consider matricial indices based on $\Sigma^{-\alpha} C_u \Sigma^{-\beta}$. Nevertheless, these matricial indices are not admissible, even if $\Sigma^{-\alpha}\Sigma^{-\beta} = \Sigma^{-1}$ (see (5)), since they are not invariant under isometry.

Remark 4.3.
For the sake of simplicity, we have restricted ourselves to the generic case where all eigenvalues of $\Sigma$ are simple. When there are only $l$ ($l < k$) distinct eigenvalues, the group $H_k$ has to be replaced by the much more complicated set of all isomorphisms $P$ of $\mathbb{R}^k$ that can be written in terms of a permutation $\Pi$ on $\{1, \ldots, k\}$ and isometries $O_i$ ($i = 1, \ldots, l$) on $\mathbb{R}^k$ leaving invariant the orthogonal complement of the eigenspace associated with the $i$-th eigenvalue.
Let µ be the uniform probability measure on the finite set H k .
Proof. One can see that
$$H_k = \{D_\varepsilon P_\sigma \,;\ \varepsilon \in \{-1, 1\}^k \text{ and } \sigma \text{ a permutation of } \{1, \ldots, k\}\},$$
where $D_\varepsilon = \operatorname{diag}(\varepsilon)$ and $P_\sigma$ is the permutation matrix associated with $\sigma$. Set $P_{\varepsilon,\sigma} = D_\varepsilon P_\sigma$, the element of $H_k$ associated with $\varepsilon$ and $\sigma$. Using the previous lemma in conjunction with (7), we obtain the resulting Sobol matricial index. Notice that this matricial index only depends on the real number $\operatorname{Tr}(\Sigma^{-1}C_u)/\operatorname{Tr}(\mathrm{Id}_k)$, which is easy to interpret.
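The averaging identity underlying this computation, namely that the uniform average of $P^t A P$ over $H_k$ equals $(\operatorname{Tr}(A)/k)\,\mathrm{Id}_k$, can be checked by brute force. The matrix $A$ below is an arbitrary stand-in for the matrix built from $\Sigma^{-1}$ and $C_u$:

```python
import itertools
import math

import numpy as np

def signed_permutations(k):
    """Enumerate H_k, the group of signed permutation matrices of size k."""
    for perm in itertools.permutations(range(k)):
        P_sigma = np.eye(k)[:, list(perm)]           # permutation matrix
        for eps in itertools.product([-1.0, 1.0], repeat=k):
            yield np.diag(eps) @ P_sigma             # D_eps @ P_sigma

k = 3
A = np.random.default_rng(2).standard_normal((k, k))
avg = sum(P.T @ A @ P for P in signed_permutations(k))
avg /= 2**k * math.factorial(k)                      # |H_k| = 2^k * k!

# averaging over the signs kills the off-diagonal entries, averaging over
# the permutations equalizes the diagonal: only Tr(A)/k survives
assert np.allclose(avg, (np.trace(A) / k) * np.eye(k))
```

This is exactly why the uniform-measure index only depends on the scalar $\operatorname{Tr}(\Sigma^{-1}C_u)/k$.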

Comparison between S u (f ) and T u
We have defined two nice candidates to generalize the scalar Sobol index in dimension $k$. A natural question is: which one should be preferred? There is a priori no universal answer. Nevertheless, from a statistical point of view, $T_u$ presents a major drawback: its estimation may require the estimation of an inverse covariance matrix $\Sigma^{-1}$, which may be tricky, while the estimation of $S_u(f)$ only uses estimations of traces of covariance matrices. Besides, the following example shows that $T_u$ may be useless in some models: whereas we previously obtained the more intuitive result, $T_u$ is not informative since, for $a = 1$, the indices $T_1$ and $T_2$ do not depend on $b$.
Thus, it seems to us that S u (f ) is a more relevant sensitivity measure, and, in the sequel, we will focus our study on S u (f ).

The Pick and Freeze estimator
In practice, the covariance matrices $C_u$ and $\Sigma$ are not analytically available. In the scalar case ($k = 1$), it is customary to estimate $S_u(f)$ using a Monte-Carlo Pick and Freeze method [13, 6], which uses a finite sample of evaluations of $f$.
In this section, we propose a Pick and Freeze estimator for the vectorial case which generalizes the $T_N$ estimator studied in [6]. We set $Y^u = f(X_u, X'_{\sim u})$, where $X'_{\sim u}$ is an independent copy of $X_{\sim u}$ which is still independent of $X_u$. Let $N$ be an integer. We take $N$ independent copies $Y_1, \ldots, Y_N$ (resp. $Y^u_1, \ldots, Y^u_N$) of $Y$ (resp. $Y^u$). For $l = 1, \ldots, k$ and $i = 1, \ldots, N$, we denote by $Y_{i,l}$ (resp. $Y^u_{i,l}$) the $l$-th coordinate of $Y_i$ (resp. $Y^u_i$). We then define the estimator $S_{u,N}$ of $S_u(f)$. Remark 5.1. Note that this estimator can be written as $\operatorname{Tr}(C_{u,N})/\operatorname{Tr}(\Sigma_N)$, where $C_{u,N}$ and $\Sigma_N$ are the empirical estimators of $C_u = \operatorname{Cov}(Y, Y^u)$ and $\Sigma = \operatorname{Var}(Y)$.
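A minimal sketch of such an estimator, assuming it takes the form of the scalar $T_N$ estimator of [6] with numerator and denominator summed over coordinates; the toy model used below is hypothetical:

```python
import numpy as np

def pick_freeze_index(Y, Yu):
    """Pick-and-Freeze estimate of S_u(f) for vector outputs.
    Y, Yu: arrays of shape (N, k), row i holding (Y_i, Y_i^u)."""
    m = 0.5 * (Y + Yu).mean(axis=0)                  # pooled empirical mean
    num = (Y * Yu).mean(axis=0) - m**2               # per-coordinate covariance
    den = 0.5 * (Y**2 + Yu**2).mean(axis=0) - m**2   # per-coordinate variance
    return num.sum() / den.sum()                     # trace ratio

# toy model Y = (X1 + X2, 2*X1), u = {1}: Tr(C_u) = 5, Tr(Sigma) = 6
rng = np.random.default_rng(1)
N = 200_000
X1, X2, X2p = rng.standard_normal((3, N))            # X2p: independent copy of X2
Y  = np.stack([X1 + X2,  2 * X1], axis=1)
Yu = np.stack([X1 + X2p, 2 * X1], axis=1)            # X1 frozen, X2 re-drawn
s_hat = pick_freeze_index(Y, Yu)                     # close to S_1 = 5/6
```

For this linear toy model the true index is $\operatorname{Tr}(C_u)/\operatorname{Tr}(\Sigma) = 5/6$, and the estimate converges to it at the usual $1/\sqrt{N}$ rate.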

Asymptotic properties
A straightforward application of the Strong Law of Large Numbers leads to Proposition 5.1 (Consistency): $S_{u,N}$ converges almost surely to $S_u(f)$ when $N \to +\infty$.
We now study the asymptotic normality of $(S_{u,N})_N$.
Remark 5.2. Following the same idea, it is possible, for $v \subset \{1, \ldots, p\}$, to derive a (multivariate) central limit theorem, and then a (scalar) central limit theorem for $S_{u\cup v,N} - S_{u,N} - S_{v,N}$, a natural estimator of $S_{u\cup v} - S_u - S_v$, which quantifies (for $u \cap v = \emptyset$) the influence of the interaction between the variables of $u$ and $v$.
Proposition 5.3. Assume $\mathbb{E}(Y_l^4) < \infty$ for $l = 1, \ldots, k$. Then $(S_{u,N})_N$ is asymptotically efficient for estimating $S_u(f)$ among regular estimator sequences that are functions of the exchangeable pair $(Y, Y^u)$.

Concentration inequality
In this section, we apply Corollary 1.17 of Ledoux [8] to derive a concentration inequality for $S_{u,N}$. In order to be self-contained, we recall Ledoux's result.
Corollary 5.1 (Corollary 1.17 of [8]). Let $P = \mu_1 \otimes \ldots \otimes \mu_n$ be a product probability measure on the cartesian product $X = X_1 \times \ldots \times X_n$ of metric spaces $(X_i, d_i)$ with finite diameters $D_i$, $i = 1, \ldots, n$, endowed with the $\ell^1$ metric $d = \sum_{i=1}^n d_i$. Let $F$ be a $1$-Lipschitz function on $(X, d)$. Then, for every $r \geq 0$,
$$P\left(F \geq \int F\, dP + r\right) \leq \exp\left(-\frac{r^2}{2\sum_{i=1}^n D_i^2}\right).$$
Our concentration inequality is the following. Proposition 5.4. Assume that $Y$ is bounded almost surely in $\mathbb{R}^k$, that is, there exists $\rho > 0$ such that $\|Y\|_2 < \rho$. Let $v_l := \Sigma_{l,l}/\rho^2$ for $l = 1, \ldots, k$.
Then, for all $t \geq 0$, we have the deviation bound (13). Remark 5.3. Note that the use of Corollary 1.17 of Ledoux [8] leads to bounds improving the ones obtained in [5].
Proof. A simple computation expresses $S_{u,N}$ as a function $G$ of the sample. Applying the triangular inequality several times, together with $\|w\|_2 \leq 1$, we deduce that $G$ is $L$-Lipschitz. Now we apply Corollary 1.17 of Ledoux [8] on the spaces $X_i$, whose elements $z = (x, y)$ and $z' = (x', y')$ satisfy $x, x', y, y' \in B_k(0, 1)$. We then get the upper deviation bound of (13). To get the second bound, we repeat the procedure, replacing $G$ (respectively $t$) by $-G$ (resp. $-t$). Note that in this case the chosen radius is non-negative thanks to the lower-bound hypothesis on $t$.
The bounds in Proposition 5.4 depend on the unknown quantity $S_u(f)$, which cannot be computed in practice. To address this problem, we use the bound $0 \leq S_u(f) \leq 1$ to obtain the following corollary. Proof.

Proof of (14): by Proposition 5.4 we have
and so, in this case, we have

hence in this case
Finally, in all cases, we have (14).
2. Proof of (15): the hypothesis $t \geq 9/(8N)$ ensures that the second part of Proposition 5.4 can be applied. As necessarily $S_u(f) - t - 1 \leq 0$ and $t \leq 1$, we obtain the bound, which completes the proof.

Numerical illustrations
In this section, we provide numerical simulations for the sensitivity indices S u (f ) defined in Section 3.

Toy example
We consider again Example 3.2 with $k = p = 2$, $a = 2$ and $b = 3$, which leads to the following model. In the "Gaussian case" (respectively the "Uniform case"), we take $X_1$ and $X_2$ independent standard Gaussian random variables (resp. independent uniform random variables on $[0; 1]$). In these two cases, a simple analytic calculation yields the true values of the sensitivity indices $S_1(f)$ and $S_2(f)$.
Table 1: Estimated coverages of the 95% confidence intervals for $S_1(f)$ and $S_2(f)$.
Asymptotic confidence interval We perform 100 simulations of the estimated Pick and Freeze confidence interval given by Proposition 5.2 for N = 100, 200 and 10000. In each case, we estimate the coverage of the 95% confidence interval procedure by counting the proportion of estimated intervals containing the true value.
The results are gathered in Table 1. The estimated coverages are close to the theoretical level of 95%, with a coverage higher than the theoretical one in the Uniform case and lower in the Gaussian case.
Concentration inequality We notice that the concentration inequality (Proposition 5.4) cannot be applied to the Gaussian case, since $\|Y\|_2$ is not bounded. Hence we only study the Uniform case.
For different values of $t$, we compute the (estimated) smallest $N$ such that the upper bound on $P(|S_{u,N} - S_u(f)| \geq t)$ of Corollary 5.2 achieves 5% (i.e., the sum of the right-hand sides of (14) and (15) is less than $0.05$). The constant $V$ is estimated empirically. The results of these computations are displayed in Figure 2. The set $u$ is $\{1\}$ or $\{2\}$.
These results clearly show that the confidence intervals produced by the concentration inequality on $S_u(f)$ require a large sample size. As a consequence, their use is only possible when many evaluations of the output function $f$ are available.

Mass-spring model
In this section, we consider the displacement $x(t)$ of a mass connected to a spring, for $t \in [0; 40]$. This displacement is given by a second-order differential equation together with the initial conditions $x(0) = l$, $x'(0) = 0$. We use the readily-available analytical closed-form expression of the solution $x(t)$ of this initial-value problem.
The input parameters are X = (m, c, k, l) (so that p = 4) whose interpretations and distributions are given in Table 2.
The output vector is defined by $Y = (x(t_1), \ldots, x(t_{800}))$, for $t_i = 0.05i$ and $k = 800$.
Table 3: Results of the estimation of the first-order generalized Sobol indices in the spring-mass model.
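For reproducibility, here is a sketch of how such an output vector can be generated, assuming the standard damped oscillator $m x'' + c x' + k x = 0$ in its underdamped regime; the parameter values below are illustrative, not those of Table 2:

```python
import numpy as np

def displacement(m, c, k, l, t):
    """Closed-form solution of m x'' + c x' + k x = 0 with
    x(0) = l, x'(0) = 0, in the underdamped case k/m > (c/(2m))^2."""
    gamma = c / (2.0 * m)
    wd = np.sqrt(k / m - gamma**2)                   # damped angular frequency
    return l * np.exp(-gamma * t) * (np.cos(wd * t)
                                     + (gamma / wd) * np.sin(wd * t))

t = 0.05 * np.arange(1, 801)                         # t_i = 0.05 i, so k = 800
Y = displacement(m=1.0, c=0.1, k=2.0, l=0.5, t=t)    # one output vector
```

Feeding i.i.d. draws of $(m, c, k, l)$ through this map yields the paired samples needed by the Pick and Freeze estimator of Section 5.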
We first compute the classical one-dimensional Sobol indices of $Y(t_i)$ for $i = 1, \ldots, 800$ and each input variable. This gives the plot of Figure 3. This plot seems difficult to interpret, since the indices for $l$, $k$ and $m$ oscillate rapidly, leading to frequent changes of their respective rankings as time evolves. This is an additional motivation for using the generalized Sobol indices considered in this paper, which are easier to interpret. Note that, for large values of $t$, the first-order indices do not sum up to $1$, meaning that interactions between the variables have a large influence for such $t$'s. The estimated generalized indices are reported in Table 3. This computation makes clear the ranking of the first-order influence indices of the input parameters.

Case of functional outputs
In many practical situations, the output $Y$ is functional. It is then useful to extend the vectorial indices to functional outputs. This is the aim of this section.

Definition
Let $H$ be a separable Hilbert space endowed with the scalar product $\langle \cdot, \cdot \rangle$ and the norm $\|\cdot\|$. Let $f$ be an $H$-valued function, i.e. $Y$ and $Y^u$ are $H$-valued random variables. We assume that $\mathbb{E}\|Y\|^2 < \infty$. Recall that $\mathbb{E}(Y)$ is defined by duality as the unique member of $H$ satisfying $\langle \mathbb{E}(Y), h \rangle = \mathbb{E}\langle Y, h \rangle$ for all $h \in H$. Recall also that the covariance operator $\Gamma$ associated with $Y$ is the endomorphism of $H$ defined by $\Gamma h = \mathbb{E}\big(\langle Y - \mathbb{E}(Y), h \rangle (Y - \mathbb{E}(Y))\big)$. It is well known that $\mathbb{E}\|Y\|^2 < \infty$ implies that $\Gamma$ is a trace-class operator, so that its trace is well defined. We generalize the definition of $S_u(f)$ introduced in Section 3 to functional outputs. In the next lemma, we give the so-called polar decomposition of the traces of $\Gamma$ and $\Gamma^u$. Lemma 6.1. Let $(\varphi_l)_{l \geq 1}$ be an orthonormal basis of $H$. In view of estimation, we truncate the corresponding sum at some level $m$. Remark 6.1. This amounts to truncating the expansion of $Y$ at level $m$. Let $Y_m$ be the truncated approximation of $Y$, seen as a vector of dimension $m$; the results of Section 5 can then be applied to $Y_m$. Notice that $Y_m$ is then the projection of $Y$ onto $\operatorname{Span}(\varphi_1, \ldots, \varphi_m)$.
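In practice $Y$ is observed on a finite grid, and the truncation amounts to computing the first $m$ basis coefficients. A sketch under illustrative assumptions (uniform grid weights, cosine basis; none of these choices come from the paper):

```python
import numpy as np

def truncate(Y_vals, basis, m):
    """Project discretized functional outputs Y_vals (N, T) onto the
    first m columns of a basis sampled on the grid; the quadrature
    weight 1/T assumes a uniform grid."""
    T = Y_vals.shape[1]
    return (Y_vals @ basis[:, :m]) / T               # coefficients <Y, phi_l>

# cosine basis, orthonormal in L2([0,1]) (exactly so on a midpoint grid)
T = 200
grid = (np.arange(T) + 0.5) / T
basis = np.column_stack(
    [np.ones(T)] + [np.sqrt(2) * np.cos(np.pi * l * grid) for l in range(1, 5)])

# hypothetical output built from known coefficients (2.0, 0.0, 0.7, ...)
Y_vals = np.tile(2.0 * basis[:, 0] + 0.7 * basis[:, 2], (3, 1))
coef = truncate(Y_vals, basis, m=3)                  # recovers the coefficients
```

The resulting $N \times m$ array of coefficients is exactly the vector output $Y_m$ to which the estimator of Section 5 applies.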

Estimation of S u,∞ (f )
As in Section 5, we define the following estimator of $S_{u,\infty}(f)$. Let $T$ be an $H$-valued random variable. For any sequence $(T_i)_{i\in\mathbb{N}^*}$ of i.i.d. variables distributed as $T$, we define $D_{N,m}(T)$. In the spirit of [4], we decompose $D_{N,m}(T)$ and give asymptotics for each term of the decomposition. Proposition 6.1.
1. $D_{N,m}(T)$ can be rewritten as the sum of a totally degenerate U-statistic of order 2, a centered linear term and a deterministic term. 2. Assume that there exist $\delta > 1$ and $\delta' > 1$ such that the corresponding moment conditions hold. Then, for any $m = m(N)$ satisfying the stated growth conditions, we have the announced asymptotics. Proof. In order to simplify the notation, we set $U_NK := U_NK(T)$, $P_NL := P_NL(T)$, $B_m := B_m(T)$ and $P_NL' := P_NL'(T)$.

a) Term B m
Since $\sum_{l=1}^\infty v_l < +\infty$, the sequence $(v_l)_l$ is bounded; let $K$ be such that $\sup_{l\geq 1} v_l \leq K$. We have, for sufficiently large $m(N)$ (hence sufficiently large $N$), that both terms go to zero as $N \to +\infty$ by (19).
b) Term $U_NK$. Bounding the terms $\mathbb{E}(Z_{i_1,l} Z_{i_1,k} Z_{j_1,l} Z_{j_1,k})$, we get, for sufficiently large $m(N)$ (hence sufficiently large $N$), that $E_1 = o(1/N)$ by (19), and we obtain $\mathbb{E}\big((U_NK)^2\big) = o(1/N)$.
c) Term $P_NL$. By Markov's inequality, it is sufficient to prove that $N\,\mathbb{E}|P_NL - P_NL'|^2 \to 0$ as $N \to +\infty$, where $A_{m,i} := \sum_{l>m}(W_{i,l} - 2e_l Z_{i,l})$ and the quantity to control is $\operatorname{Var}(A_{m,1})$.
The last inequalities come from the fact that the $(A_{m,i})_{i=1,\ldots,N}$ are centered and i.i.d. r.v.s. Indeed, by assumption (17), we can apply Tonelli's theorem to show that $\sum_{l>m}\mathbb{E}(|W_{i,l}|)$ and $\sum_{l>m}\mathbb{E}(|Z_{i,l}|)$ are finite. Hence, by Fubini's theorem and the fact that each variable $W_{i,l} - 2e_l Z_{i,l}$ is centered, we get that $A_{m,i}$ is centered.
It remains now to upper-bound Var(A m,1 ). Before starting the proof of the theorem we state an auxiliary lemma:

We conclude using the central limit theorem for random variables valued in an Hilbert space (see e.g. [9]).
Proof. First we note that $S_{u,m,N}$ can be rewritten in a convenient form. The proof is decomposed into three steps.
• Step 1. By Proposition 6.1(2), it is enough to prove a CLT for $P_NL'$. This is the case since it is an empirical sum of i.i.d. centered random vectors.
• Step 2. Using Lemma 6.2 and the Delta method, we derive an intermediate CLT.
• Step 3. We conclude using the Delta method.