Nested polynomial trends for the improvement of Gaussian process-based predictors
Introduction
The numerical cost of many codes that simulate complex physical systems is still very high. To perform sensitivity analyses, uncertainty quantification or reliability studies, these computer models must therefore be replaced by surrogate models, that is to say by fast and inexpensive mathematical functions. Within the computational science community, when the maximal available information is a finite set of code evaluations, the most widely used surrogate models are the generalized polynomial chaos expansion (PCE) [11], [10], [32], [7], [21], [1], [27] and the Gaussian process regression (GPR), or kriging (see [30], [24], [34]).
On the one hand, the main idea of PCE is to expand the code output, which is denoted by g in the following, onto an appropriate basis made of orthonormal multivariate polynomials, which are related to the distribution of the code input variables. As the number of unknown expansion coefficients usually grows exponentially with the number of input parameters, the relevance of these approaches strongly depends on their ability to select the most relevant basis functions. To this end, several penalization techniques, such as the $\ell_1$-minimization [33], [15] and the Least Angle Regression (LAR) [14], [9], [5] methods, have been introduced to select polynomial basis sets that lead to more accurate PCE than would be obtained with an a priori fixed basis. Taking advantage of the tensor-product structure of the multivariate polynomial basis, separated representations, such as low-rank approximations [23], [19], have alternatively been proposed to develop surrogate models with polynomial functions in highly compressed formats.
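As a minimal sketch of this expansion idea (not the implementation of any cited reference), the following snippet regresses a toy scalar output onto an orthonormal univariate Hermite basis; the test function g(x) = x³, the sample size and the truncation order are all illustrative assumptions:

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

# Toy 1-D illustration: expand g(x) = x**3 onto the probabilists' Hermite
# polynomials He_k, orthogonal w.r.t. the standard normal density;
# He_k / sqrt(k!) is the orthonormal basis.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)          # design drawn from the input distribution
y = x**3                              # "code" evaluations g(x)

p = 5                                 # truncation order (illustrative)
# Regression matrix: column k holds He_k(x) / sqrt(k!)
Psi = np.column_stack([
    hermeval(x, [0] * k + [1]) / np.sqrt(factorial(k)) for k in range(p + 1)
])
coef, *_ = np.linalg.lstsq(Psi, y, rcond=None)

# Since x**3 = He_3(x) + 3*He_1(x), the orthonormal coefficients are
# 3 for k = 1 and sqrt(6) for k = 3, all others vanishing.
print(np.round(coef, 3))
```

With more input dimensions the basis is the tensor product of such univariate families, which is exactly why the number of columns, and hence of unknown coefficients, blows up and basis selection becomes necessary.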
On the other hand, the GPR is based on the assumption that the code output is a particular realization of a Gaussian stochastic process, Y. This hypothesis, which was first introduced in time series analysis [26] and in optimization [20], is widely used, as it allows dealing with conditional probabilities and expectations, while leading to very interesting results in terms of computer code prediction. Hence, contrary to the PCE, the GPR is not associated with an a priori projection basis, but requires the introduction of the mean and covariance functions of Y. In practice, we observe that the influence of the mean function of Y on the prediction decreases as the number of code evaluations increases. This explains why, in applications where many code evaluations are available, good GPR-based surrogate models can be obtained using constant or linear trends for the mean function. On the contrary, when the number of code evaluations is low compared to the complexity of g, it can be very useful to optimize this mean function. In that case, searching for the mean function of Y as a well-chosen sum of polynomial functions can indeed strongly improve the relevance of the associated GPR. In particular, the authors refer to [16] and [18] for an illustration of the interest of using variable selection techniques to optimize this polynomial representation of the mean function of Y.
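A minimal numpy sketch of a GPR predictor with a polynomial (here linear) trend, i.e. universal kriging, may help fix ideas; the Gaussian covariance, the fixed length-scale, the nugget and the test function below are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

# Universal-kriging mean predictor: GLS fit of the trend coefficients,
# then the usual conditional-mean correction through the covariance.
def uk_predict(X, y, Xs, ell=0.5, nugget=1e-8):
    def kern(A, B):
        # Gaussian (squared-exponential) covariance in 1-D
        return np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)

    K = kern(X, X) + nugget * np.eye(len(X))
    F = np.column_stack([np.ones_like(X), X])        # linear trend regressors
    Fs = np.column_stack([np.ones_like(Xs), Xs])
    Ki = np.linalg.inv(K)
    beta = np.linalg.solve(F.T @ Ki @ F, F.T @ Ki @ y)   # GLS trend estimate
    ks = kern(Xs, X)
    return Fs @ beta + ks @ Ki @ (y - F @ beta)          # kriging mean

X = np.linspace(0.0, 1.0, 8)
y = np.sin(4 * X) + X                 # toy "code" output
pred = uk_predict(X, y, np.array([0.35]))
print(float(pred[0]))                 # close to sin(1.4) + 0.35
```

Note that the predictor interpolates the observations (up to the nugget), whatever the trend: the trend only changes how the surrogate behaves between and beyond the design points, which is precisely where its choice matters when observations are scarce.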
Following on from these works, the idea of this paper is to propose an alternative parametrization of the mean function of Y, which is particularly adapted to the case when the number of code evaluations is small compared to the complexity of g. Instead of searching for sparse polynomial approximations, we look for high-dimensional polynomial approximations that are characterized by a small number of parameters. In other words, if we want to model a complex code response with a very limited number of code evaluations, we believe that it can be more efficient to use complex but approximated models than simple but fully optimized models. We thus propose to consider the composition of two polynomials for the mean function of Y. Indeed, the composition of two polynomial functions is still a polynomial function, but of much higher degree. In particular, such a formalism can be used to model separately a transformation of each code input and the dependence structure between them.
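The parameter economy of the composition can be checked directly with numpy polynomial objects (a generic illustration with arbitrary coefficients, not the paper's parametrization):

```python
import numpy as np

# Composing a degree-3 and a degree-4 polynomial (4 + 5 = 9 coefficients)
# spans a degree-12 polynomial, which would otherwise need 13 coefficients.
p = np.polynomial.Polynomial([0.5, -1.0, 0.0, 2.0])       # outer, degree 3
q = np.polynomial.Polynomial([1.0, 3.0, -2.0, 0.0, 1.0])  # inner, degree 4

r = p(q)                       # composition p∘q, itself a Polynomial
print(r.degree())              # 12
```

The gap widens quickly: composing degrees m and n costs m + n + 2 coefficients but reaches degree m·n, whereas a direct representation of that degree costs m·n + 1 coefficients per input dimension, before any tensorization.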
The main difficulty raised by this specific representation is the identification of the parameters of the two composed polynomials. Indeed, by composing two polynomial functions that are linear with respect to their parameters, we get a strongly non-linear representation, which is likely to be very sensitive to small changes in the parameter values. In addition, distinct values of these parameters can lead to the same nested representation, which does not help the identification. To avoid such redundancies, minimal nested parametrizations are introduced, and we show to what extent integrating this nested structure in the Gaussian process formalism can increase the robustness of the results, ease the error control, and limit over-fitting as much as possible.
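This redundancy can be made concrete: composing the inner polynomial with any invertible affine map, and the outer polynomial with its inverse, leaves the nested representation unchanged. A small numpy illustration (arbitrary coefficients, not the minimal parametrization introduced in the paper):

```python
import numpy as np
P = np.polynomial.Polynomial

# Two distinct (p, q) pairs yielding the same nested polynomial p(q(x)).
p1 = P([1.0, 2.0, 3.0])            # p1(t) = 1 + 2t + 3t^2
q1 = P([0.0, 1.0, 1.0])            # q1(x) = x + x^2

a, b = 2.0, -1.0                   # any invertible affine map t -> a*t + b
q2 = a * q1 + b                    # rescaled/shifted inner polynomial
p2 = p1(P([-b / a, 1.0 / a]))      # p2(t) = p1((t - b) / a)

x = np.linspace(-1.0, 1.0, 5)
print(np.allclose(p1(q1)(x), p2(q2)(x)))   # True: same composition
```

Since a whole continuum of parameter values maps to the same trend, a naive least-squares identification is ill-posed, which motivates quotienting out these affine degrees of freedom in a minimal parametrization.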
The outline of this work is as follows. First, Section 2 presents the theoretical framework for the definition of a Gaussian process regression with a linear polynomial trend. Then, the nested polynomial trends we propose are detailed in Section 3. Finally, the efficiency of the method is illustrated on a series of analytic examples in Section 4.
General framework
For $d \geq 1$, let $L^2(\mathbb{X}, \mathbb{R})$ be the space of square integrable functions on any compact subset $\mathbb{X}$ of $\mathbb{R}^d$, with values in $\mathbb{R}$, equipped with the inner product $\langle \cdot, \cdot \rangle$ and the associated norm $\| \cdot \|$, such that for all $u$ and $v$ in $L^2(\mathbb{X}, \mathbb{R})$, $\langle u, v \rangle = \int_{\mathbb{X}} u(x)\, v(x)\, dx$ and $\| u \| = \langle u, u \rangle^{1/2}$.
Consider a physical system whose response depends on a $d$-dimensional input vector $x$, and whose performance can be evaluated from the computation of a quantity of interest, $g(x)$. Function $g$ is a deterministic mapping that is…
Nested polynomial trends for Gaussian process predictors
As presented in the Introduction, we are interested in identifying the best predictor of g at any unobserved point $x$, when the maximal information is a fixed number of code evaluations. Instead of considering sparse representations for the parametrization of the mean function in the GPR formalism, this section proposes to focus on nested polynomial representations. First, the notations and the motivations for this new parametrization are presented. Then, it is explained why and how it is…
Applications
To illustrate the advantages of the nested structure presented in Section 3 for the modeling of the quantity of interest g, this section introduces a series of analytic examples, which are sorted with respect to the input set dimension, d. In each case, the proposed approach is compared to the “LAR + UK” approach, which has been described in Section 2. For each function g, the best approximations of g we can get from the available information are compared, when considering a nested…
Conclusions
One of the main objectives of this paper is to propose an alternative parametrization of the polynomial trends for the Gaussian process regression. This parametrization, which is based on the composition of two polynomials, allows us to span high-dimensional polynomial spaces with a reduced number of parameters. Hence, it has been shown on a series of examples that this approach can be very useful, especially when confronted with the modeling of complex functions from very little information.
References (34)
- et al., Identification of Bayesian posteriors for coefficients of chaos expansions, J. Comput. Phys. (2010)
- et al., Polynomial chaos representation of spatio-temporal random field from experimental measurements, J. Comput. Phys. (2009)
- et al., Enhancing $\ell_1$-minimization estimates of polynomial chaos expansions using basis selection, J. Comput. Phys. (2015)
- et al., A new surrogate modeling technique combining kriging and polynomial chaos expansions – application to uncertainty analysis in computational dosimetry, J. Comput. Phys. (2015)
- et al., Polynomial meta-models with canonical low-rank approximations: numerical insights and comparison to sparse polynomial chaos expansions, J. Comput. Phys. (2016)
- et al., Sequential design of computer experiments for the estimation of a probability of failure, Stat. Comput. (2012)
- et al., Efficient global reliability analysis for non-linear implicit performance functions, AIAA J. (2008)
- et al., Multi-output separable Gaussian process: towards an efficient, fully Bayesian paradigm for uncertainty quantification, J. Comput. Phys. (2013)
- et al., Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys. (2011)
- Algorithms for Minimization Without Derivatives (1973)