Nested polynomial trends for the improvement of Gaussian process-based predictors
Introduction
The numerical cost of many codes that simulate complex physical systems is still very high. To perform sensitivity analyses, uncertainty quantification or reliability studies, these computer models must therefore be replaced by surrogate models, that is to say by fast and inexpensive mathematical functions. Within the computational science community, when the maximal available information is a finite set of code evaluations, the most widely used surrogate models are the generalized polynomial chaos expansion (PCE) [11], [10], [32], [7], [21], [1], [27] and the Gaussian process regression (GPR), or kriging (see [30], [24], [34]).
On the one hand, the main idea of PCE is to expand the code output, which is denoted by g in the following, onto an appropriate basis made of orthonormal multivariate polynomials, which are related to the distribution of the code input variables. As the number of unknown expansion coefficients usually grows exponentially with the number of input parameters, the relevance of these approaches strongly depends on their ability to select the most relevant basis functions. To this end, several penalization techniques, such as the $\ell_1$-minimization [33], [15] and the Least Angle Regression (LAR) [14], [9], [5] methods, have been introduced to select polynomial basis sets that lead to more accurate PCE than would be obtained with an a priori fixed basis. Taking advantage of the tensor-product structure of the multivariate polynomial basis, separated representations, such as low-rank approximations [23], [19], have alternatively been proposed to develop surrogate models with polynomial functions in highly compressed formats.
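As a minimal sketch of this expansion idea (not the implementation of any cited reference), the following snippet regresses a toy scalar output onto an orthonormal univariate Hermite basis; the test function g(x) = x³, the sample size and the truncation order are all illustrative assumptions:

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

# Toy 1-D illustration: expand g(x) = x**3 onto the probabilists' Hermite
# polynomials He_k, orthogonal w.r.t. the standard normal density;
# He_k / sqrt(k!) is the orthonormal basis.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)          # design drawn from the input distribution
y = x**3                              # "code" evaluations g(x)

p = 5                                 # truncation order (illustrative)
# Regression matrix: column k holds He_k(x) / sqrt(k!)
Psi = np.column_stack([
    hermeval(x, [0] * k + [1]) / np.sqrt(factorial(k)) for k in range(p + 1)
])
coef, *_ = np.linalg.lstsq(Psi, y, rcond=None)

# Since x**3 = He_3(x) + 3*He_1(x), the orthonormal coefficients are
# 3 for k = 1 and sqrt(6) for k = 3, all others vanishing.
print(np.round(coef, 3))
```

With more input dimensions the basis is the tensor product of such univariate families, which is exactly why the number of columns, and hence of unknown coefficients, blows up and basis selection becomes necessary.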
On the other hand, the GPR is based on the assumption that the code output is a particular realization of a Gaussian stochastic process, Y. This hypothesis, which was first introduced in time series analysis [26] and in optimization [20], is widely used, as it allows dealing with conditional probabilities and expectations, while leading to very interesting results in terms of computer code prediction. Hence, contrary to the PCE, the GPR is not associated with an a priori projection basis, but requires the introduction of the mean and covariance functions of Y. In practice, we observe that the influence of the mean function of Y on the prediction decreases as the number of code evaluations increases. This explains why, in applications where many code evaluations are available, good GPR-based surrogate models can be obtained using constant or linear trends for the mean function. On the contrary, when the number of code evaluations is low compared to the complexity of g, it can be very useful to optimize this mean function. In that case, searching for the mean function of Y as a well-chosen sum of polynomial functions can indeed strongly improve the relevance of the associated GPR. In particular, the authors refer to [16] and [18] for an illustration of the interest of using variable selection techniques to optimize this polynomial representation of the mean function of Y.
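A minimal numpy sketch of a GPR predictor with a polynomial (here linear) trend, i.e. universal kriging, may help fix ideas; the Gaussian covariance, the fixed length-scale, the nugget and the test function below are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

# Universal-kriging mean predictor: GLS fit of the trend coefficients,
# then the usual conditional-mean correction through the covariance.
def uk_predict(X, y, Xs, ell=0.5, nugget=1e-8):
    def kern(A, B):
        # Gaussian (squared-exponential) covariance in 1-D
        return np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)

    K = kern(X, X) + nugget * np.eye(len(X))
    F = np.column_stack([np.ones_like(X), X])        # linear trend regressors
    Fs = np.column_stack([np.ones_like(Xs), Xs])
    Ki = np.linalg.inv(K)
    beta = np.linalg.solve(F.T @ Ki @ F, F.T @ Ki @ y)   # GLS trend estimate
    ks = kern(Xs, X)
    return Fs @ beta + ks @ Ki @ (y - F @ beta)          # kriging mean

X = np.linspace(0.0, 1.0, 8)
y = np.sin(4 * X) + X                 # toy "code" output
pred = uk_predict(X, y, np.array([0.35]))
print(float(pred[0]))                 # close to sin(1.4) + 0.35
```

Note that the predictor interpolates the observations (up to the nugget), whatever the trend: the trend only changes how the surrogate behaves between and beyond the design points, which is precisely where its choice matters when observations are scarce.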
Following on from these works, the idea of this paper is to propose an alternative parametrization of the mean function of Y, which is particularly adapted to the case when the number of code evaluations is small compared to the complexity of g. Instead of searching for sparse polynomial approximations, we look for high-dimensional polynomial approximations that are characterized by a small number of parameters. In other words, if we want to model a complex code response with a very limited number of code evaluations, we believe that it can be more efficient to use complex but approximated models than simple but fully optimized models. We thus propose to consider the composition of two polynomials for the mean function of Y. Indeed, the composition of two polynomial functions is still a polynomial function, but of much higher degree. In particular, such a formalism can be used to model separately a transformation of each code input and the dependence structure between them.
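The parameter economy of the composition can be checked directly with numpy polynomial objects (a generic illustration with arbitrary coefficients, not the paper's parametrization):

```python
import numpy as np

# Composing a degree-3 and a degree-4 polynomial (4 + 5 = 9 coefficients)
# spans a degree-12 polynomial, which would otherwise need 13 coefficients.
p = np.polynomial.Polynomial([0.5, -1.0, 0.0, 2.0])       # outer, degree 3
q = np.polynomial.Polynomial([1.0, 3.0, -2.0, 0.0, 1.0])  # inner, degree 4

r = p(q)                       # composition p∘q, itself a Polynomial
print(r.degree())              # 12
```

The gap widens quickly: composing degrees m and n costs m + n + 2 coefficients but reaches degree m·n, whereas a direct representation of that degree costs m·n + 1 coefficients per input dimension, before any tensorization.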
The main difficulty raised by this specific representation is the identification of the parameters of the two composed polynomials. Indeed, by composing two polynomial functions that are linear with respect to their parameters, we get a strongly non-linear representation, which is likely to be very sensitive to small changes in the parameter values. In addition, distinct values of these parameters can lead to the same nested representation, which does not help the identification. To avoid such redundancies, minimal nested parametrizations are introduced, and we show to what extent integrating this nested structure in the Gaussian process formalism can increase the robustness of the results, ease the error control, and limit over-fitting as much as possible.
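This redundancy can be made concrete: composing the inner polynomial with any invertible affine map, and the outer polynomial with its inverse, leaves the nested representation unchanged. A small numpy illustration (arbitrary coefficients, not the minimal parametrization introduced in the paper):

```python
import numpy as np
P = np.polynomial.Polynomial

# Two distinct (p, q) pairs yielding the same nested polynomial p(q(x)).
p1 = P([1.0, 2.0, 3.0])            # p1(t) = 1 + 2t + 3t^2
q1 = P([0.0, 1.0, 1.0])            # q1(x) = x + x^2

a, b = 2.0, -1.0                   # any invertible affine map t -> a*t + b
q2 = a * q1 + b                    # rescaled/shifted inner polynomial
p2 = p1(P([-b / a, 1.0 / a]))      # p2(t) = p1((t - b) / a)

x = np.linspace(-1.0, 1.0, 5)
print(np.allclose(p1(q1)(x), p2(q2)(x)))   # True: same composition
```

Since a whole continuum of parameter values maps to the same trend, a naive least-squares identification is ill-posed, which motivates quotienting out these affine degrees of freedom in a minimal parametrization.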
The outline of this work is as follows. First, Section 2 presents the theoretical framework for the definition of a Gaussian process regression with a linear polynomial trend. Then, the nested polynomial trends we propose are detailed in Section 3. Finally, the efficiency of the method is illustrated on a series of analytic examples in Section 4.
General framework
For $d \geq 1$, let $L^2(\mathbb{X}, \mathbb{R})$ be the space of square integrable functions on any compact subset $\mathbb{X}$ of $\mathbb{R}^d$, with values in $\mathbb{R}$, equipped with the inner product $\langle \cdot, \cdot \rangle$ and the associated norm $\| \cdot \|$, such that for all $u$ and $v$ in $L^2(\mathbb{X}, \mathbb{R})$, $\langle u, v \rangle = \int_{\mathbb{X}} u(x)\, v(x)\, dx$ and $\| u \| = \langle u, u \rangle^{1/2}$.
Consider a physical system whose response depends on a $d$-dimensional input vector $x$, and whose performance can be evaluated from the computation of a quantity of interest, $g(x)$. Function $g$ is a deterministic mapping that is…
Nested polynomial trends for Gaussian process predictors
As presented in the Introduction, we are interested in identifying the best predictor of g at any unobserved point $x$, when the maximal information is a fixed number of code evaluations. Instead of considering sparse representations for the parametrization of the mean function in the GPR formalism, this section proposes to focus on nested polynomial representations. First, the notations and the motivations for this new parametrization are presented. Then, it is explained why and how it is…
Applications
To illustrate the advantages of the nested structure presented in Section 3 for the modeling of the quantity of interest g, this section introduces a series of analytic examples, which are sorted with respect to the input set dimension, d. In each case, the proposed approach is compared to the “LAR + UK” approach, which has been described in Section 2. For each function g, the best approximations of g we can get from the available information are compared, when considering a nested…
Conclusions
One of the main objectives of this paper is to propose an alternative parametrization of the polynomial trends for the Gaussian process regression. This parametrization, which is based on the composition of two polynomials, allows us to span high-dimensional polynomial spaces with a reduced number of parameters. Hence, it has been shown on a series of examples that this approach can be very useful, especially when confronted with the modeling of complex functions from very little information.
References (34)
- et al., Identification of Bayesian posteriors for coefficients of chaos expansions, J. Comput. Phys. (2010)
- et al., Polynomial chaos representation of spatio-temporal random field from experimental measurements, J. Comput. Phys. (2009)
- et al., Enhancing $\ell_1$-minimization estimates of polynomial chaos expansions using basis selection, J. Comput. Phys. (2015)
- et al., A new surrogate modeling technique combining kriging and polynomial chaos expansions – application to uncertainty analysis in computational dosimetry, J. Comput. Phys. (2015)
- et al., Polynomial meta-models with canonical low-rank approximations: numerical insights and comparison to sparse polynomial chaos expansions, J. Comput. Phys. (2016)
- et al., Sequential design of computer experiments for the estimation of a probability of failure, Stat. Comput. (2012)
- et al., Efficient global reliability analysis for non-linear implicit performance functions, AIAA J. (2008)
- et al., Multi-output separable Gaussian process: towards an efficient, fully Bayesian paradigm for uncertainty quantification, J. Comput. Phys. (2013)
- et al., Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys. (2011)
- Algorithms for Minimization Without Derivatives (1973)