Sensitivity-enhanced generalized polynomial chaos for efficient uncertainty quantification

We present an enriched formulation of the Least Squares (LSQ) regression method for Uncertainty Quantification (UQ) using generalised polynomial chaos (gPC). More specifically, we enrich the linear system with additional equations for the gradient (or sensitivity) of the Quantity of Interest with respect to the stochastic variables. This sensitivity is computed very efficiently for all variables by solving an adjoint system of equations at each sampling point of the stochastic space. The associated computational cost is similar to one solution of the direct problem. For the selection of the sampling points, we apply a greedy algorithm which is based on the pivoted QR decomposition of the measurement matrix. We call the new approach sensitivity-enhanced generalised polynomial chaos, or se-gPC. We apply the method to several test cases to assess accuracy and convergence with increasing chaos order, including an aerodynamic case with $40$ stochastic parameters. The method is found to produce accurate estimates of the statistical moments using the minimum number of sampling points. The computational cost scales as $\sim m^{p-1}$, instead of $\sim m^p$ for the standard LSQ formulation, where $m$ is the number of stochastic variables and $p$ the chaos order. The solution of the adjoint system of equations is implemented in many computational mechanics packages, thus the infrastructure exists for the application of the method to a wide variety of engineering problems.


Introduction
In all practical applications, the performance of an engineering system is affected by stochastic uncertainties in the system parameters, boundary conditions etc. Significant research has been directed towards quantifying the effect of such uncertainties on the performance of engineering systems. The performance is evaluated by a metric, called Quantity of Interest (QoI), that depends on the system under investigation. For example, in the area of aerodynamics it can be the lift or drag coefficient of an airfoil, and the uncertainty may arise due to stochastic variations in the angle of attack, approaching velocity etc. This area of research is known as Uncertainty Quantification (UQ).
The simplest UQ approaches are random sampling techniques, such as the Monte Carlo (MC) method [1,2]. Such approaches, however, require a large number of samples for the statistics of the QoI to converge and are therefore computationally expensive. On the other hand, methods based on Polynomial Chaos Expansion (PCE) have much reduced computational cost. In these approaches, the uncertain QoI is expanded in terms of multivariate polynomial bases that are orthogonal with respect to the joint probability density function (PDF) of the uncertain parameters. The method was first introduced in [3] for Gaussian stochastic variables and later extended to other distributions in [4]. These methods have been very successful in predicting the statistics of the QoI in a wide range of applications, such as applied mechanics, fluid dynamics, space, medicine, etc. [5,6,7,8,9,10].
In the standard Galerkin-projection formulation of these methods, the cost of computing the spectral coefficients of the QoI expansion scales exponentially with the dimension of the stochastic space. For the non-intrusive projection methods, this is due to the large number of sampling points when extending one-dimensional quadrature integration schemes to multiple dimensions using tensor products; for details see the book [11]. At each sampling point, the evaluation of the model describing the system is by far the most time-consuming part of the method. In sparse grid approaches, such as Smolyak grids [12,13,14,15], the spectral coefficients are computed from evaluations at a significantly smaller number of sample points. In [16], an adaptive method to build a sparse polynomial chaos (PC) expansion basis using least-angle regression (LARS) is introduced that requires a reduced number of evaluations.
Another approach to compute the spectral coefficients is based on least-squares (LSQ) regression, which minimises the expectation of the square of the difference between the QoI evaluation and the PCE prediction at different sampling points [17,16,11]. A review of the techniques used to select the evaluation sampling points is provided in [18]. Several optimality criteria are discussed, such as the alphabetic D-, A-, E-, K- and I-criteria, and a hybrid sampling method, called alphabetic-coherence-optimal, is proposed.
For the least squares regression to provide accurate results, the model has to be evaluated at a number of sampling points, say $q$, which scales with the number of unknown PCE coefficients. The oversampling ratio is usually between 2 and 3 [16], or larger [18]. For $m$ stochastic variables and polynomial chaos order $p$, the number of PCE coefficients is $\frac{(p+m)!}{m!\,p!}$, see also equation (4) below, which behaves asymptotically as $\sim \frac{1}{p!} m^p$ when $m \to \infty$ [16]. Thus, the number of evaluations grows rapidly with increasing $p$ and $m$, and the computational cost quickly becomes intractable. Similarly, for Smolyak quadrature the computational cost scales as $\sim \frac{2^p}{p!} m^p$ [16]. For the smallest chaos order $p = 1$, the cost grows linearly with $m$ in either approach.
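As an illustrative aside, the growth of the number of PCE coefficients with $m$ and $p$ can be checked with a short calculation; the sketch below is our own (the function name is illustrative, not from any UQ library):

```python
from math import comb

def n_pce_terms(m: int, p: int) -> int:
    """Number of PCE coefficients P + 1 = (p + m)! / (m! p!) for m
    stochastic variables and chaos order p (total-degree truncation)."""
    return comb(m + p, p)

# Growth with the stochastic dimension m for fixed chaos order p.
for p in (1, 2, 3):
    print(p, [n_pce_terms(m, p) for m in (1, 5, 10, 40)])
```

For $p = 1$ the count grows only linearly in $m$ (41 terms for $m = 40$), while for $p = 3$ it already exceeds $10^4$, reflecting the $\sim m^p / p!$ behaviour quoted above.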
In the present paper, we aim to reduce the computational cost of the LSQ method. In the standard formulation, each model evaluation provides one equation for the linear system of the PCE coefficients. We reformulate the method by enriching the system with additional equations that can be obtained very efficiently. More specifically, if the underlying system is described by a set of linear or non-linear Partial Differential Equations (PDE), the solution of the adjoint system allows for a very efficient evaluation of the sensitivity of the QoI with respect to all m stochastic variables at each sampling point, [19,20,21,22].
The cost of the solution of the (linear) adjoint system is comparable to that of the governing PDE. Therefore at each sampling point we can obtain $1+m$ equations at the cost of two evaluations, one for the direct and one for the adjoint system. Thus for $q$ sampling points, we have a linear system of $q(1+m)$ equations, and if we make this number of equations commensurate with the number of unknown PCE coefficients, we get $q(1+m) \sim \frac{(p+m)!}{m!\,p!}$, which for large $m$ becomes $q(1+m) \sim m^p$, or $q \sim m^{p-1}$. Thus the computational cost now scales as $\sim m^{p-1}$, as opposed to $\sim m^p$ for the standard formulation. For example, for $p = 1$ the cost becomes independent of $m$. This drastic reduction of the computational cost, by a factor of $m$, is the central contribution of this paper.
The solution of the adjoint equations is implemented in many computational mechanics packages (commercial or open source). This sensitivity information is mainly used for optimisation purposes (using gradient descent for example), but it can be also employed for UQ as described in the rest of the paper. In order to minimise the number of sampling points, and make the oversampling ratio close to 1, we employ the pivoted QR decomposition of the measurement matrix [23,24,25]. We call the new approach sensitivity-enhanced gPC, or se-gPC.
The paper is organized as follows. In section 2, an overview of UQ using gPC with LSQ is outlined. Section 3 presents the formulation of sensitivity-enhanced gPC, followed by section 4 that reviews the adjoint method for efficient computation of sensitivities. Section 5 discusses the sampling of the input stochastic space. Results for several test cases are presented in section 6 and we conclude in section 7.

Uncertainty Quantification with gPC
Let $P = (\Omega, \Sigma, dP)$ denote a complete probability space, where $\Omega$ is the sample set of random events and $dP$ is a probability measure defined on the $\sigma$-algebra $\Sigma$. An uncertain input is modelled as a stochastic vector $\xi$ of $m$ independent stochastic variables $\xi_i$, i.e. $\xi = [\xi_1, \dots, \xi_m] : \Omega \to \mathbb{R}^m$ with $m \in \mathbb{N}$. We use subscripts to denote random variables, and superscripts for realisations; for example, $\xi^{(i)}$ denotes the $i$-th sample of $\xi$. The stochastic variable $\xi_i$ follows a probability density function (PDF) $w_i(\xi_i)$, defined in the domain $E_i$. Since the variables are independent, the joint PDF of the $m$-dimensional vector $\xi$ is $w(\xi) = \prod_{i=1}^m w_i(\xi_i)$, and for each $w_i$ a family of univariate polynomials $\psi$, orthogonal with respect to $w_i$, can be constructed. The polynomials are normalised so that $\langle \Psi_j, \Psi_j \rangle = 1$. Note that unless specified otherwise, repeated indices do not imply summation. When $m > 1$, the multidimensional basis functions $\Psi_j$ are defined as tensor products of the univariate polynomials $\psi$; for details see [11].
A Quantity of Interest (QoI) $M : \xi \to M \in \mathbb{R}$ with finite variance can be written in terms of $\Psi_i$ as
$$M(\xi) = \sum_{i=0}^{\infty} c_i \, \Psi_i(\xi). \quad (1)$$
In practice, the above expansion is truncated by limiting the maximum order of the polynomials $\Psi_i(\xi)$ to a value $p$, known as the chaos order. In this case, we get
$$M(\xi) = \sum_{i=0}^{P} c_i \, \Psi_i(\xi) + \epsilon(\xi) \quad (2) \qquad \approx \sum_{i=0}^{P} c_i \, \Psi_i(\xi), \quad (3)$$
where $\epsilon(\xi)$ is the truncation error, and the number of retained $P+1$ terms is given by
$$P + 1 = \frac{(p+m)!}{p!\,m!}. \quad (4)$$
The statistical moments of $M$ can be computed algebraically from the spectral coefficients; in particular, the mean and variance are
$$E[M] = c_0, \qquad \mathrm{Var}[M] = \sum_{i=1}^{P} c_i^2. \quad (5)$$
In this work, we use the Weighted Least Squares (WLSQ) approach to compute the coefficients $c_i$. A set of $q$ samples of $\xi$, $\xi^{(i)}$ $(i = 1, \dots, q)$, is defined and the value of the QoI at each sample is computed, forming the vector $Q = \left[ M(\xi^{(1)}), \dots, M(\xi^{(q)}) \right]^\top \in \mathbb{R}^q$. The set of the unknown spectral coefficients $c_i$ forms the vector $c = [c_0, \dots, c_P]^\top \in \mathbb{R}^{P+1}$, and using this notation, the truncated expansion (2) can be written as
$$Q = \psi \, c, \quad (6)$$
where $\psi \in \mathbb{R}^{q \times (P+1)}$ is the measurement matrix with elements $\psi_{ij} = \Psi_j\!\left(\xi^{(i)}\right)$, i.e. the $i$-th row contains the values of the orthogonal polynomial bases $\Psi_j$ evaluated at the $i$-th sample point, $\xi^{(i)}$. In expanded form, $\psi$ is written as
$$\psi = \begin{bmatrix} \Psi_0(\xi^{(1)}) & \dots & \Psi_P(\xi^{(1)}) \\ \vdots & \ddots & \vdots \\ \Psi_0(\xi^{(q)}) & \dots & \Psi_P(\xi^{(q)}) \end{bmatrix}. \quad (7)$$
The coefficient vector $c$ is computed as the solution of the following weighted least-squares minimisation problem,
$$\hat{c} = \arg\min_{c} \left\| W^{1/2} \left( \psi c - Q \right) \right\|_2^2, \quad (8)$$
where $\|\cdot\|_2$ is the Euclidean norm and $W \in \mathbb{R}^{q \times q}$ is a diagonal matrix of positive weights. The minimiser satisfies the normal equations
$$\psi^\top W \psi \, \hat{c} = \psi^\top W Q. \quad (9)$$
The computational cost of forming and solving the above system is determined by the number of samples $q$. When random Monte Carlo sampling that follows the PDF is employed to obtain $\xi^{(i)}$, $q \gg P+1$ is required for the system (9) to be well-conditioned. For the close connection between the LSQ approach as $q \to \infty$ and the non-intrusive Galerkin projection with Gaussian quadrature, see chapter 3 of the book [11].
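A minimal sketch of the WLSQ procedure in Python with NumPy ($W = I$ for simplicity; the one-variable toy QoI $M(\xi) = \xi^2 + \xi$, the sample count and all variable names are our own illustrative choices):

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

rng = np.random.default_rng(0)

p = 2                      # chaos order (m = 1 stochastic variable)
q = 20                     # number of sampling points (oversampled)
xi = rng.standard_normal(q)

# Quantity of interest evaluated at the samples (a toy model).
Q = xi**2 + xi

# Measurement matrix: psi[i, j] = orthonormal Hermite polynomial j at xi[i].
psi = np.column_stack([
    hermeval(xi, [0]*j + [1]) / np.sqrt(factorial(j)) for j in range(p + 1)
])

# Ordinary least squares (W = I for this sketch).
c, *_ = np.linalg.lstsq(psi, Q, rcond=None)

mean = c[0]                # first coefficient is the mean
var = np.sum(c[1:]**2)     # variance is the sum of squares of the rest
print(mean, var)           # ~1.0 and ~3.0 for this QoI
```

Because $\xi^2 + \xi = \Psi_0 + \Psi_1 + \sqrt{2}\,\Psi_2$ in the orthonormal probabilists' Hermite basis, the mean and variance recovered from the coefficients ($c_0$ and $\sum_{i \ge 1} c_i^2$) are 1 and 3.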
We use weighting functions appropriate for Gaussian and uniform input distributions, for which the associated polynomial families are Hermite and Legendre respectively (denoted by the subscripts 'H' and 'L'). The analytic expressions for these weights are derived via an approach known as asymptotic sampling [18,17,26].
System (9) will be augmented with information on the sensitivity of $M(\xi^{(i)})$ with respect to $\xi^{(i)}$; this is detailed in the next section. The sensitivities with respect to all the elements of $\xi^{(i)}$, i.e. all random variables, can be evaluated very efficiently with a single adjoint computation, as explained in section 4.

Sensitivity-enhanced gPC
Taking the gradient of the truncated expansion (3) with respect to the $k$-th random variable at the $j$-th sample point, we get
$$\frac{\partial M}{\partial \xi_k}\!\left(\xi^{(j)}\right) = \sum_{i=0}^{P} c_i \, \frac{\partial \Psi_i}{\partial \xi_k}\!\left(\xi^{(j)}\right).$$
For each $k$, we have a block of $q$ equations that can be written in matrix form as
$$\frac{\partial Q}{\partial \xi_k} = \psi^{(k)} c, \quad (12)$$
where $\left[ \partial Q / \partial \xi_k \right]_j = \partial M / \partial \xi_k \left( \xi^{(j)} \right)$ and $\psi^{(k)}_{ji} = \partial \Psi_i / \partial \xi_k \left( \xi^{(j)} \right)$. We now stack together one block of $q$ equations of the form (6) (arising from the evaluation of $M\!\left(\xi^{(j)}\right)$) and $m$ blocks, of $q$ equations each, of the form (12) (arising from the evaluation of the sensitivity of $M\!\left(\xi^{(j)}\right)$ at all sample points with respect to each uncertain variable). In total, there are $(1+m) \times q$ equations that can be put in matrix form as
$$\hat{Q} = \hat{\psi} \, c, \quad (13)$$
where
$$\hat{Q} = \begin{bmatrix} Q \\ \partial Q / \partial \xi_1 \\ \vdots \\ \partial Q / \partial \xi_m \end{bmatrix}$$
is a block column vector, and
$$\hat{\psi} = \begin{bmatrix} \psi \\ \psi^{(1)} \\ \vdots \\ \psi^{(m)} \end{bmatrix}$$
is a block matrix.
We now define the following weighted least squares minimisation problem,
$$\hat{c} = \arg\min_{c} \left\| \hat{W}^{1/2} \left( \hat{\psi} c - \hat{Q} \right) \right\|_2^2,$$
where $\hat{W}$ is a block diagonal weighting matrix, consisting of $1+m$ diagonal blocks of size $q \times q$. This is a generalisation of the minimisation problem (8).
The vector $\hat{c}$ that minimises the above function is obtained from the solution of the normal set of equations
$$\hat{\psi}^\top \hat{W} \hat{\psi} \, \hat{c} = \hat{\psi}^\top \hat{W} \hat{Q},$$
which is analogous to system (9).
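The stacked system can be illustrated with a one-variable sketch ($m = 1$, $p = 2$, so $P + 1 = 3$ unknowns recovered from only two sampling points); this is our own toy example, with the gradient supplied analytically in place of an adjoint solve:

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval, hermeder

p = 2                                  # chaos order; m = 1, so P + 1 = 3 unknowns
xi = np.array([-1.0, 0.5])             # only q = 2 sampling points

def model(x):      return x**2 + x     # toy QoI
def model_grad(x): return 2*x + 1      # its sensitivity (an adjoint would supply this)

def basis_row(x, deriv=False):
    row = []
    for j in range(p + 1):
        coeff = [0]*j + [1]                       # He_j in the hermite_e basis
        if deriv:
            coeff = hermeder(coeff)               # d He_j / d xi
        row.append(hermeval(x, coeff) / np.sqrt(factorial(j)))
    return row

# Stack the value block and the sensitivity block: (1 + m) * q = 4 equations.
A = np.array([basis_row(x) for x in xi] + [basis_row(x, deriv=True) for x in xi])
b = np.concatenate([model(xi), model_grad(xi)])

c, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(c, 6))     # recovers [1, 1, sqrt(2)] from 2 sampling points
```

The three coefficients are recovered exactly because the two value equations and the two gradient equations together over-determine the quadratic expansion.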

Efficient computation of sensitivities with an adjoint formulation
We now consider the computation of the sensitivities $dM/d\xi_k$ $(k = 1, \dots, m)$ at the $j$-th sampling point. We assume systems that are described by a set of partial differential equations that after spatial discretisation take the form
$$R(u, \xi) = 0, \quad (20)$$
where $u \in \mathbb{R}^{N_u}$ is the state vector and $R : \mathbb{R}^{N_u} \times \mathbb{R}^m \to \mathbb{R}^{N_u}$ varies smoothly with $u$ and $\xi$. For simplicity, we suppose that (20) describes mathematically a steady problem, i.e. $u$ does not depend on time, but the analysis can be easily extended to unsteady problems as well. For example, in the context of Computational Fluid Dynamics (CFD), equation (20) represents the discretised steady-state Navier-Stokes and continuity equations, the state vector $u$ contains the velocity and pressure at discrete points in the computational domain, and the stochastic vector $\xi$ describes uncertainties in the geometry, boundary conditions etc. The QoI can therefore be written as
$$M = M(u(\xi), \xi),$$
where $u(\xi)$ is the solution of (20) for one realisation of $\xi$.
Using the adjoint formulation, the sensitivities $dM/d\xi_k$ at sampling point $j$ can be computed at a cost which is independent of the dimension $m$ of the vector $\xi$. Below we outline the basic steps of the discrete adjoint approach; more details can be found in [27] and the review paper [20]. It is also possible to derive a continuous adjoint formulation, but this requires the continuous form of the equations from which the discrete system (20) is derived.
The discrete formulation suffices for the purposes of this section. We employ the continuous formulation in one of the test cases of section 6.
We start by defining the Lagrangian $L : \mathbb{R}^{N_u} \times \mathbb{R}^{N_u} \times \mathbb{R}^m \to \mathbb{R}$,
$$L(u, \lambda, \xi) = M(u, \xi) + \lambda^\top R(u, \xi),$$
where $\lambda \in \mathbb{R}^{N_u}$ are known as Lagrange multipliers (or adjoint variables), and $u(\xi)$ is the solution of (20) at the $j$-th sampling point. Since $R(u(\xi), \xi) = 0$, we have $L = M$, and differentiating with respect to $\xi$,
$$\frac{dL}{d\xi} = \frac{\partial M}{\partial \xi} + \lambda^\top \frac{\partial R}{\partial \xi} + \left[ \frac{\partial M}{\partial u} + \lambda^\top \frac{\partial R}{\partial u} \right] \frac{du}{d\xi},$$
where $\partial R / \partial u$ is the Jacobian matrix. The Lagrange multipliers are chosen so that $dL/d\xi$ becomes independent of variations of $u$ with respect to $\xi$. This is achieved by setting the term within square brackets equal to 0, resulting in the Field Adjoint Equation (FAE)
$$\left( \frac{\partial R}{\partial u} \right)^\top \lambda = - \left( \frac{\partial M}{\partial u} \right)^\top. \quad (25)$$
Solving (25), the Lagrange multipliers $\lambda$ are obtained and the sensitivities can be computed from
$$\frac{dM}{d\xi} = \frac{\partial M}{\partial \xi} + \lambda^\top \frac{\partial R}{\partial \xi}. \quad (26)$$
The state and adjoint variables, $u$ and $\lambda$ respectively, have the same dimension (equal to $N_u$) and the solution algorithms for the two sets of equations, (20) and (25), are similar. In the following, we will assume that the computational cost is also the same. Note that in practice the cost of solving (25) is lower compared to that of (20), because the former set is linear, so no outer iterations to deal with the non-linearity are necessary. The cost of evaluating (26) is considered negligible.
To summarise, the solution of equation (25) at each sampling point provides a set of m extra conditions at the cost of one additional evaluation. It is exactly this property of the adjoint formulation that is exploited in the se-gPC method to enrich efficiently the least squares system (9).
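The adjoint recipe above can be sketched on a toy discrete problem where $R(u, \xi) = A u - b - B\xi$ is linear and the QoI is $M = g^\top u$; the matrices are random illustrative data, not taken from any of the paper's test cases:

```python
import numpy as np

rng = np.random.default_rng(1)
n_u, m = 5, 3                      # state dimension and number of stochastic variables

A = rng.standard_normal((n_u, n_u)) + 5*np.eye(n_u)   # keep A well-conditioned
B = rng.standard_normal((n_u, m))
b = rng.standard_normal(n_u)
g = rng.standard_normal(n_u)

# Direct problem R(u, xi) = A u - b - B xi = 0 and QoI M = g . u.
xi = rng.standard_normal(m)
u = np.linalg.solve(A, b + B @ xi)         # one direct solve

# Field adjoint equation (dR/du)^T lambda = -(dM/du)^T: one adjoint solve.
lam = np.linalg.solve(A.T, -g)

# Sensitivities dM/dxi = dM/dxi|_explicit + lambda^T dR/dxi, all m at once.
grad = lam @ (-B)

# Finite-difference check of the first component.
eps = 1e-6
u_eps = np.linalg.solve(A, b + B @ (xi + eps*np.eye(m)[0]))
fd = (g @ u_eps - g @ u) / eps
print(grad[0], fd)    # the two values agree
```

One solve with $A$ and one with $A^\top$ yield all $m$ sensitivities; this is precisely the property exploited by se-gPC to enrich system (9).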
For the lowest chaos order approximation with $p = 1$, the number of coefficients, see equation (4), is equal to $P + 1 = \frac{(1+m)!}{1!\,m!} = 1 + m$. One direct and one adjoint evaluation can yield $1+m$ equations at a single sampling point, thus making the computational cost of the se-gPC independent of the dimension of the stochastic space $m$.
For the standard least squares method, if q sampling points are needed for a well conditioned least-squares system (9), q equations are obtained at a cost of q evaluations. For the same number of sampling points, the se-gPC will yield (1 + m)q equations, at a cost of 2q evaluations. If the number of equations is kept constant and equal to q, then se-gPC allows one to use a smaller number of sampling points, equal to q/(1 + m), with cost 2q/(1 + m) evaluations. For cases with a large number of uncertain parameters, this is a very significant reduction in computational cost.
If the computational cost is kept constant (for example due to a fixed computational budget) and equal to q evaluations, the number of sampling points with se-gPC will be q/2 and the number of equations (1 + m)q/2.
In all the above scalings, we have assumed that for each sampling point, both the direct and the adjoint problems are solved. This need not be the case; for example, at some points only the direct problem may be solved, while at others both problems are solved (note that the adjoint problem requires the solution of the direct problem).
The minimum number of equations needed to compute the spectral coefficients is equal to P + 1 (this corresponds to oversampling ratio equal to 1), thus P + 1 and (P + 1)/(1 + m) sampling points are required for the standard gPC and se-gPC respectively. Of course, one can use larger oversampling ratios. In the present paper we are using the smallest number of sampling points.
In figure 1, the number of sampling points required by the standard gPC and the se-gPC is compared. Since in the paper we are using the smallest number of sampling points, their selection needs to be done carefully; this is examined in the next section.

Sampling of the Stochastic Space
There are many different algorithms to sample the stochastic space for least squares PCE; for an overview see [18]. In this paper we apply the pivoted QR decomposition algorithm, which has been used for sparse sensor placement for flow or image reconstruction [24], and to find optimal evaluation points and weights for integration [23]. This is a greedy algorithm that maximises the determinant of an appropriate matrix; in this sense it is a D-optimal design method. For the application of this algorithm to sampling the stochastic space, see [25,24]. Below we summarise the central idea and implementation.
Consider again the standard least-squares problem, equation (6), $Q = \psi c$, and assume that the elements of the residual vector $\epsilon = Q - \psi c$ are independent, zero-mean Gaussian random variables with the same standard deviation $\sigma$, i.e. for the $i$-th element $E(\epsilon_i) = 0$ and $E(\epsilon_i \epsilon_j) = \sigma^2 \delta_{ij}$. Assuming for the time being that the weighting matrix is the identity, $W = I_q$, the solution $\hat{c}$ of the least squares system satisfies $\psi^\top \psi \, \hat{c} = \psi^\top Q$, see (9). The covariance matrix of the difference $\hat{c} - c$ is given by
$$E\!\left[ (\hat{c} - c)(\hat{c} - c)^\top \right] = \sigma^2 \left( \psi^\top \psi \right)^{-1},$$
where $E(\cdot)$ is the expectation operator; this can be easily proved. If we include the weighting matrix and follow the same steps, the covariance matrix becomes $\sigma^2 \left( \psi^\top W \psi \right)^{-1}$ (for $W$ consistent with the noise covariance). The measurement matrix $\psi$ contains information for all $q$ sampling points, see (7). The objective of the D-optimal design is to select the sampling points that minimise the determinant of the matrix $\left( \psi^\top \psi \right)^{-1}$ or $\left( \psi^\top W \psi \right)^{-1}$. Here we follow a greedy approach, i.e. from a large pool of $q$ candidate sample points, we select the required number of points (equal to $P+1$ for the standard least squares method) that maximise the determinant. The process is as follows.
To select only a subset of the $q$ points, we multiply equation (6) with the row-selection matrix $P \in \mathbb{R}^{(P+1) \times q}$. In each row of $P$ all elements are 0, except the element at the column that corresponds to the selected sampling point, which takes the value 1. Thus (6) becomes
$$Q_P = (P \psi) \, c,$$
where $Q_P = P Q$ now contains only the values of the QoI $M$ at the selected $P+1$ sampling points. The covariance matrix is
$$\sigma^2 \left[ (P\psi)^\top (P\psi) \right]^{-1},$$
and if we include the weighting matrix, it becomes $\sigma^2 \left[ (P\psi)^\top W_P (P\psi) \right]^{-1}$, where $W_P$ contains the weights of the selected points. The aim now is to choose the row-selection matrix $P$ that minimises the determinant of this covariance matrix, or equivalently maximises the determinant of its inverse; this can be achieved with the pivoted QR decomposition.
There are very robust subroutines to perform this decomposition; in this paper we have used the 'qr' command of MATLAB. The permutation matrix $P$ is chosen so that the diagonal elements $R_{ii}$ of the upper triangular matrix $R$ are ordered in descending absolute value, i.e. $|R_{11}| \ge |R_{22}| \ge \dots$
The absolute value of the determinant is then the product of the diagonal entries, $\prod_i |R_{ii}|$ [24]. In the present paper, we apply the pivoted QR decomposition to the weighted measurement matrix; we identify the $P+1$ points required by the standard LSQ approach, and then select the most important $(P+1)/(m+1)$ points (i.e. the ones with the largest diagonal elements $|R_{ii}|$) and apply to those the se-gPC.
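A sketch of the greedy selection in Python (an explicit greedy orthogonalisation equivalent to column-pivoted QR of $\psi^\top$; the function, pool size and basis choice below are our own illustrations):

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

def qr_pivot_select(psi, n_points):
    """Greedily pick rows of psi (candidate sampling points) by the column
    pivoting rule: at each step select the point whose basis row has the
    largest norm orthogonal to the span of the rows picked so far."""
    A = psi.T.astype(float).copy()          # columns <-> candidate points
    selected = []
    for _ in range(n_points):
        norms = np.linalg.norm(A, axis=0)
        norms[selected] = -1.0              # never pick a point twice
        j = int(np.argmax(norms))
        selected.append(j)
        v = A[:, j] / np.linalg.norm(A[:, j])
        A -= np.outer(v, v @ A)             # project out the new direction
    return np.array(selected)

# Demo: degree-4 orthonormal Hermite basis over a pool of candidate points.
rng = np.random.default_rng(2)
pool = rng.standard_normal(1000)
p = 4
psi = np.column_stack([hermeval(pool, [0]*j + [1]) / np.sqrt(factorial(j))
                       for j in range(p + 1)])
sel = qr_pivot_select(psi, p + 1)
print(np.sort(pool[sel]))   # a spread-out subset of the candidate pool
```

The selected square submatrix $\psi[\text{sel}]$ has a much larger determinant magnitude (and hence a smaller D-optimality objective) than an arbitrary subset of the pool.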

An example with two stochastic variables
We have applied the QR decomposition algorithm to the case of $m = 2$ uncertain input variables, $\xi_1, \xi_2$, both following the standard normal distribution, $N(0, 1)$. We selected chaos order $p = 4$, which results in $P + 1 = 15$ unknowns. (Figure 2: results for chaos order $p = 4$; the stochastic input is $\xi \in \mathbb{R}^2$ with $\xi_1, \xi_2 \sim N(0,1)$, sampled at 10000 random points.)
We repeated the process for chaos order $p = 2$ and the same input variables; now there are $P + 1 = 6$ unknowns. The joint PDF and the selected QR sampling points are shown in figure 3. Again the first point is placed at the peak of the PDF, i.e. $\xi = 0$, and 5 points are symmetrically placed at a radius $\sim 1.75\sigma$.
We now proceed to compute the condition number (CN) of the matrix $W^{1/2}\psi$. Note that the matrix appearing in (9) has a CN which is the square of that of $W^{1/2}\psi$. In the following section, we apply standard WLSQ, se-gPC and Smolyak quadrature to several test cases and compare the three methods in terms of accuracy and computational cost.

Stochastic Linear ODE
A common benchmark for PCE methods is the linear ODE [4,28],
$$\frac{du}{dt} = -k u, \qquad u(0) = 1, \quad (40)$$
where $k$ is a single stochastic variable (so $m = 1$), uniformly distributed in the interval $[0, 1]$, i.e. the PDF is $f_k(k) = 1$, $0 \le k \le 1$, or $k \sim U(0, 1)$. For each realisation of $k$, the solution of (40) is
$$u(t; k) = e^{-kt}, \quad (41)$$
and the stochastic mean and variance can be computed analytically as
$$\bar{u}(t) = \frac{1 - e^{-t}}{t}, \qquad \sigma_u^2(t) = \frac{1 - e^{-2t}}{2t} - \left( \frac{1 - e^{-t}}{t} \right)^2. \quad (42)$$
The quantity of interest is $u(t; k)$, and its derivative (sensitivity) with respect to $k$ is
$$\frac{\partial u}{\partial k} = -t \, e^{-kt}.$$
For this test case, gPC provides accurate results for the mean and variance for small $t$, but as the time horizon $t$ grows, the results increasingly deviate from the analytical solution (42); for more details and the underlying fundamental physical reason, refer to [28].
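The analytic expressions in (42) can be cross-checked numerically with a few lines of Python (the midpoint-rule integration over $k$ is our own sanity check, not part of the benchmark):

```python
import numpy as np

# Analytic moments of u(t; k) = exp(-k t) with k ~ U(0, 1).
def exact_mean(t):
    return (1 - np.exp(-t)) / t

def exact_var(t):
    return (1 - np.exp(-2*t)) / (2*t) - exact_mean(t)**2

# Sanity check by midpoint-rule integration over k in [0, 1].
t, N = 5.0, 100000
k = (np.arange(N) + 0.5) / N
u = np.exp(-k*t)
mean_num = u.mean()
var_num = (u**2).mean() - mean_num**2
print(exact_mean(t), mean_num)   # both ~0.198652
```

The quadrature values agree with the closed-form expressions to numerical precision, confirming the reference solution used below.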
We compute the spectral coefficients $c_i$ using Smolyak quadrature, as well as WLSQ and se-gPC at the QR sampling points. We consider a high chaos order $p = 6$, and since $m = 1$, there are in total $P + 1 = p + 1 = 7$ coefficients. The accuracy of se-gPC was found to be comparable with the other UQ approaches. Notice that for a small time window, say $t < 10$, the variance is predicted with high accuracy, but for larger $t$ the accuracy deteriorates, as mentioned earlier.
The computational advantages of the method are not evident in this case because there is only one uncertain variable. Next, we consider cases with more uncertain variables.

Ishigami Function
The Ishigami function [29],
$$Y = \sin X_1 + \alpha \sin^2 X_2 + \beta X_3^4 \sin X_1,$$
is a commonly used benchmark case for UQ and Global Sensitivity Analysis [30,31,32,33]. It has three stochastic variables $X_1$, $X_2$ and $X_3$ (so $m = 3$) that follow a uniform distribution in the interval $[-\pi, \pi]$. This is a challenging case because the QoI, $Y$, is a strongly non-linear and non-monotonic function and thus requires a high spectral order for accurate results; it is therefore an excellent case to assess the performance of se-gPC.
The statistical moments and sensitivity measures of $Y$ can be computed analytically. For $\alpha = 7$ and $\beta = 0.1$, the reference values for the first four moments are $\mu_Y = 3.5000$, $\sigma_Y = 3.7208$, $\delta_Y = 0$ and $\kappa_Y = 3.5072$ [34]. We compare the moments computed by the UQ methods against these reference values. In the Ishigami function, where the number of uncertain variables is relatively small ($m = 3$) and the required chaos order is high, the computational advantages of the se-gPC are not maximised, even though the method still outperforms the sparse quadrature. In the following, we consider cases with larger values of $m$.
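The reference values quoted above follow from the standard analytic moment formulas for the Ishigami function, which can be verified directly (the Monte Carlo cross-check, seed and sample size are our own choices):

```python
import numpy as np

alpha, beta = 7.0, 0.1

# Standard analytic mean and variance of
# Y = sin(X1) + alpha*sin(X2)^2 + beta*X3^4*sin(X1), Xi ~ U(-pi, pi).
mean_Y = alpha / 2
var_Y = alpha**2/8 + beta*np.pi**4/5 + beta**2*np.pi**8/18 + 0.5

# Monte Carlo cross-check.
rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(3, 200_000))
Y = np.sin(X[0]) + alpha*np.sin(X[1])**2 + beta*X[2]**4*np.sin(X[0])

print(mean_Y, np.sqrt(var_Y))      # 3.5 and ~3.7208
print(Y.mean(), Y.std())           # close to the analytic values
```

Both the analytic formulas and the sampling estimate reproduce $\mu_Y = 3.5$ and $\sigma_Y \approx 3.7208$, matching the reference values of [34].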

Two-dimensional viscous Burgers' equations
We consider the two-dimensional viscous Burgers' equations,
$$u \frac{\partial u}{\partial x} + v \frac{\partial u}{\partial y} = \nu \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right), \qquad u \frac{\partial v}{\partial x} + v \frac{\partial v}{\partial y} = \nu \left( \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} \right).$$
The equations for the adjoint variables $u^+$ and $v^+$ can be easily derived and take a linear advection-diffusion form, with the advection direction reversed. The boundary conditions for $u^+$ are $u^+(0, y) = 0$, $u^+(x, 0) = 0$ and $u^+(x, 1) = 0$.
Similarly for $v^+$, we get $v^+(0, y) = 0$, $v^+(x, 0) = 0$ and $v^+(x, 1) = 0$. The Neumann conditions at the exit, together with the form of the QoI, lead to Robin boundary conditions for the adjoint variables. The sensitivities of the QoI with respect to $s_i$ $(i = 1, \dots, m)$ are then evaluated from the adjoint solution.

To shed more light on this behaviour, a contour plot of $C_L$ in the $(a_\infty, M_\infty)$ plane is shown in figure 11; the examined range is $\pm 3\sigma$ around the nominal values for both stochastic variables. In the plot, we also superimpose the QR sampling points for $p = 3$, with the labels indicating the ranking order. As expected, the highest ranked points are closer to the mean. However, contrary to figures 2b and 3, the mean value is not one of the QR points. We have empirically observed that for Gaussian uncertainties the QR points include the mean value only for even $p$, while for odd $p$ the points are placed on either side of the mean.
As can be seen, the lift coefficient depends strongly on both parameters. It is well known that the variation of $C_L$ with $M_\infty$ in the transonic regime is highly non-linear [36]. The sensitivities $dC_L/dM_\infty$ and $dC_L/da_\infty$ are computed with one direct and one adjoint evaluation, thus we need only 4 QR points (which provide 12 equations). As can be seen from the table, the mean and standard deviation are computed with less than 1% error, and even for the kurtosis the error is less than 10%.

Geometric and free-stream uncertainties
In the second scenario, we consider uncertainties in both the free-stream conditions as well as the geometry. Uncertainties in the geometric shape are present in physical settings and can affect the airfoil performance. To model them, we employ Hicks-Henne bump functions, which have been used to describe shape deformation in aerodynamic optimisation and wing design problems [37,38]. The parameterised geometry is given by
$$y(x) = y_{\mathrm{base}}(x) + \sum_{i=1}^{m_G} a_i \sin^{t_i}\!\left( \pi x^{\ln 0.5 / \ln h_i} \right),$$
where $m_G$ is the number of bump functions, $a_i$ is the bump amplitude, $t_i$ controls the width of the bump, and $h_i$ determines the location of the bump maximum.
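A sketch of one Hicks-Henne bump in Python (we use the common form $a \sin^{t}(\pi x^{\ln 0.5 / \ln h})$, which peaks at $x = h$; the paper's exact convention and the numerical values below are assumptions for illustration):

```python
import numpy as np

def hicks_henne_bump(x, a, t, h):
    """One Hicks-Henne bump on x in [0, 1]: amplitude a, width exponent t,
    peak located at x = h. Common form; exact conventions vary."""
    m = np.log(0.5) / np.log(h)
    return a * np.sin(np.pi * x**m)**t

# Perturbed surface: baseline plus a sum of bumps (illustrative values).
x = np.linspace(1e-6, 1.0, 501)
amps, widths, peaks = [0.01, -0.005], [2.0, 2.0], [0.3, 0.7]
dy = sum(hicks_henne_bump(x, a, t, h) for a, t, h in zip(amps, widths, peaks))
print(x[np.argmax(hicks_henne_bump(x, 0.01, 2.0, 0.3))])   # peak near x = 0.3
```

Each bump vanishes at both endpoints ($x = 0$ and $x = 1$) and attains its maximum $a_i$ at $x = h_i$, so the leading and trailing edges of the airfoil are preserved.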
We consider $m_G = 38$ stochastic coefficients $a_i$ that follow a normal distribution. The standard deviation is selected to be small; larger values can result in non-physical airfoil shapes. In total, we have $m = m_G + 2 = 40$ stochastic parameters, with large variance in the free-stream conditions and small variance for the parameters that describe the geometry. This case mirrors some of the features that are found in robust aerodynamic shape optimisation.
Note that the number of uncertain parameters is not an exaggeration; similar cases in aerodynamic shape optimisation use dozens of shape parameters [39,40,41]. The open-source CFD software SU2 [35] was used again to simulate the flow and calculate the sensitivities of $C_L$ with respect to the geometric parameters using the adjoint approach.
The PDF of the lift coefficient C L , obtained from 5000 Monte Carlo simulations, is shown in figure 13. The PDF remains bi-modal as in the first scenario, but has different statistical properties. The performance of the airfoil is significantly affected by the geometric uncertainties. For example, the mean value of the lift is reduced by 8.3% from 0.687 to 0.630. The higher order moments are also affected.
The computational cost (number of evaluations) of the three methods considered is compared in table 4. As mentioned earlier, for $p = 1$ the cost of the se-gPC is independent of the number of uncertain variables. In the context of aerodynamic design the number of control variables is usually large, and it is useful to be able to get fast estimates of the mean and standard deviation. Larger values of $p$ allow accurate approximations of higher order moments, again at a much reduced computational cost. For example, the se-gPC produced the most accurate prediction of the skewness and was the only method able to produce an estimate of the kurtosis.

Conclusions
We propose a method to enrich the Least Squares PCE approach for uncertainty quantification with sensitivity information that is computed efficiently with an adjoint formulation. The resulting se-gPC method was compared against standard WLSQ and Smolyak sparse quadrature, and the results were found to be of comparable accuracy for the same chaos order, but were obtained at a much reduced computational cost, between one and two orders of magnitude smaller for the cases examined.
The se-gPC aims to speed up non-intrusive UQ approaches when adjoint sensitivity information is available. The latter is also useful for optimisation purposes and many open source and commercial packages provide solvers for the adjoint equations. This functionality can be directly exploited for efficient UQ calculations for complex engineering problems with many stochastic variables.