
Variable Fidelity Regression Using Low Fidelity Function Blackbox and Sparsification

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9653)

Abstract

We consider the construction of surrogate models based on variable fidelity samples generated by a high fidelity function (an exact representation of some physical phenomenon) and by a low fidelity function (a coarse approximation of the exact representation). A surrogate model is constructed to replace the computationally expensive high fidelity function. Gaussian processes are commonly used for such tasks. However, once the sample size reaches a few thousand points, a direct application of Gaussian process regression becomes impractical due to its high computational cost. We propose two approaches to circumvent this difficulty. The first approach approximates the sample covariance matrices using the Nyström method. The second approach relies on the fact that engineers can often evaluate the low fidelity function on the fly at any point using some blackbox; thus, each time we compute a prediction of the high fidelity function at a point, we can update the surrogate model with the low fidelity function value at that point. In this way we avoid the issues related to the inversion of large covariance matrices, as the model is constructed from only a moderate-size low fidelity sample. We apply the developed methods to a real problem: optimization of the shape of a rotating disk.
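
As a minimal illustration of the second approach (not the exact formulation used in the paper), one can fit a Gaussian process to the discrepancy between fidelities on the high fidelity sample and, at prediction time, query the low fidelity blackbox at the requested point itself. The class name, the fixed scaling coefficient rho, and the use of scikit-learn are illustrative assumptions made for this sketch.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

class BlackboxAssistedSurrogate:
    """Hypothetical sketch: correct an on-the-fly low fidelity evaluation
    with a GP model of the discrepancy between fidelities."""

    def __init__(self, low_fidelity_blackbox, rho=1.0):
        self.low = low_fidelity_blackbox    # cheap solver, callable at any point
        self.rho = rho                      # assumed fixed scaling between fidelities
        self.gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                                           alpha=1e-6, normalize_y=True)

    def fit(self, X_high, y_high):
        # Fit the discrepancy y_high - rho * y_low on the (small) high fidelity sample.
        y_low = np.array([self.low(x) for x in X_high])
        self.gp.fit(X_high, y_high - self.rho * y_low)
        return self

    def predict(self, x):
        # Evaluate the low fidelity blackbox at the prediction point itself,
        # then add the GP estimate of the discrepancy.
        return self.rho * self.low(x) + self.gp.predict(np.atleast_2d(x))[0]
```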



Acknowledgments

We thank Dmitry Khominich from DATADVANCE llc for making the solvers for the rotating disk problem available, and Tatyana Alenkaya from MIPT for proofreading the article. The research was conducted at IITP RAS and supported solely by the Russian Science Foundation grant (project 14-50-00150).

Author information

Corresponding author

Correspondence to A. Zaytsev.


Appendices


A Proof of Technical Statements

In this section we provide the proofs of the statements of Sect. 4.

Proof (Proof of Statement 1)

For the posterior mean we get:

$$\begin{aligned} \hat{\mathbf {y}}_h(\mathbf {x}^*)&\approx \mathbf {K}_1^* \mathbf {K}_{11}^{-1} \mathbf {K}_1^T (\mathbf {K}_1 \mathbf {K}_{11}^{-1} \mathbf {K}_1^T + \mathbf {R}^{-2})^{-1} \mathbf {y}= \mathbf {K}_1^* \mathbf {K}_{11}^{-1} \mathbf {K}_1^T \mathbf {R}(\mathbf {R}\mathbf {K}_1 \mathbf {K}_{11}^{-1} \mathbf {K}_1^T \mathbf {R}+ \mathbf {I}_{n})^{-1} \mathbf {R}\mathbf {y}=\\&= \mathbf {K}_1^* \mathbf {K}_{11}^{-1} \mathbf {C}_1^T (\mathbf {C}_1 \mathbf {K}_{11}^{-1} \mathbf {C}_1^T + \mathbf {I}_{n})^{-1} \mathbf {R}\mathbf {y}= \mathbf {K}_1^* \mathbf {K}_{11}^{-1} (\mathbf {C}_1^T \mathbf {C}_1 \mathbf {K}_{11}^{-1} + \mathbf {I}_{n_1})^{-1} \mathbf {C}_1^T \mathbf {R}\mathbf {y}= \\&= \mathbf {K}_1^* (\mathbf {C}_1^T \mathbf {C}_1 + \mathbf {K}_{11})^{-1} \mathbf {C}_1^T \mathbf {R}\mathbf {y}= \mathbf {K}_1^* (\mathbf {C}_1^T \mathbf {C}_1 + \mathbf {V}^T_{11} \mathbf {V}_{11})^{-1} \mathbf {C}_1^T \mathbf {R}\mathbf {y}=\\&= \mathbf {K}_1^* \mathbf {V}^{-1}_{11} (\mathbf {V}^{-T}_{11} \mathbf {C}_1^T \mathbf {C}_1 \mathbf {V}_{11}^{-1} + \mathbf {I}_{n_1})^{-1} \mathbf {V}^{-T}_{11} \mathbf {C}_1^T \mathbf {R}\mathbf {y}= \mathbf {K}_1^* \mathbf {V}^{-1}_{11} (\mathbf {I}_{n_1} + \mathbf {V}^T \mathbf {V})^{-1} \mathbf {V}^T \mathbf {R}\mathbf {y}. \end{aligned}$$

We use the same approach to derive an equation for the posterior variance:

$$\begin{aligned} \mathbb {V} \left( X^* \right) - (\rho ^2 \sigma _l^2 + \sigma _d^2) \mathbf {I}_{n^*}&\approx \mathbf {K}_1^* \mathbf {K}_{11}^{-1} \mathbf {K}_1^{*T} - \mathbf {K}_1^* \mathbf {K}_{11}^{-1} \mathbf {K}_1^T (\mathbf {R}^{-2} + \mathbf {K}_1 \mathbf {K}_{11}^{-1} \mathbf {K}_1^T)^{-1} \mathbf {K}_1 \mathbf {K}_{11}^{-1} \mathbf {K}_1^{*T}=\\&= \mathbf {K}_1^* (\mathbf {K}_{11}^{-1} - \mathbf {K}_{11}^{-1} \mathbf {K}_1^T (\mathbf {R}^{-2} + \mathbf {K}_1 \mathbf {K}_{11}^{-1} \mathbf {K}_1^T)^{-1} \mathbf {K}_1 \mathbf {K}_{11}^{-1}) \mathbf {K}_1^{*T}=\\&= \mathbf {K}_1^* (\mathbf {K}_{11} + \mathbf {K}_1^T \mathbf {R}^2 \mathbf {K}_1)^{-1} \mathbf {K}_1^{*T} = \mathbf {K}_1^* (\mathbf {V}^T_{11} \mathbf {V}_{11} + \mathbf {C}_1^T \mathbf {C}_1)^{-1} \mathbf {K}_1^{*T}= \\&= \mathbf {K}_1^* \mathbf {V}^{-1}_{11} (\mathbf {I}_{n_1} + \mathbf {V}^T \mathbf {V})^{-1} \mathbf {V}^{-T}_{11} \mathbf {K}_1^{*T}. \end{aligned}$$
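
For illustration, the final expressions of Statement 1 can be evaluated directly with a short NumPy sketch. The notation follows Sect. 4 of the paper (not reproduced in this excerpt), so the shapes below are assumptions: \(\mathbf {K}_1\) is \(n \times n_1\) (training points versus base points), \(\mathbf {K}_{11}\) is \(n_1 \times n_1\), \(\mathbf {K}_1^*\) is \(n^* \times n_1\), \(\mathbf {R}\) is diagonal, and the nugget term \(\rho ^2 \sigma _l^2 + \sigma _d^2\) is passed as a scalar.

```python
import numpy as np

def nystrom_posterior(K1_star, K1, K11, R_diag, y, nugget=0.0):
    """Sketch of the Statement 1 formulas; shapes and notation are assumed."""
    n1 = K11.shape[0]
    V11 = np.linalg.cholesky(K11).T          # K11 = V11^T V11 (upper-triangular V11)
    C1 = R_diag[:, None] * K1                # C1 = R K1
    V = np.linalg.solve(V11.T, C1.T).T       # V = C1 V11^{-1}, shape (n, n1)
    A = np.eye(n1) + V.T @ V                 # I_{n1} + V^T V
    # posterior mean: K1* V11^{-1} (I + V^T V)^{-1} V^T R y
    mean = K1_star @ np.linalg.solve(V11, np.linalg.solve(A, V.T @ (R_diag * y)))
    # posterior variance: nugget + diag(K1* V11^{-1} (I + V^T V)^{-1} V11^{-T} K1*^T)
    B = np.linalg.solve(V11, np.linalg.solve(A, np.linalg.solve(V11.T, K1_star.T)))
    var = nugget + np.einsum('ij,ji->i', K1_star, B)
    return mean, var
```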

Proof (Proof of Statement 2)

First of all, we have to calculate the matrices \(\mathbf {V}_{11}\) and \(\mathbf {V}= \mathbf {R}\mathbf {K}_1 \mathbf {V}_{11}^{-T}\). The matrix \(\mathbf {V}_{11}\) is of size \(n_1 \times n_1\), so we need \(O(n_1^3)\) operations to invert it. To calculate \(\mathbf {K}_1 \mathbf {V}_{11}^{-T}\) we need \(O(n_1^2 n)\) operations. Finally, as \(\mathbf {R}\) is a diagonal matrix, we use \(O(n_1 n)\) operations to get \(\mathbf {V}\).

In the case \(n^* = 1\), to obtain the posterior mean we have to calculate \(\mathbf {V}_{11} (\mathbf {I}_{n_1} + \mathbf {V}^T \mathbf {V})^{-1} \mathbf {V}^T \mathbf {y}\). We use \(O(n_1^2 n)\) operations to calculate \(\mathbf {V}^T \mathbf {V}\), inverting \(\mathbf {I}_{n_1} + \mathbf {V}^T \mathbf {V}\) takes \(O(n_1^3)\) operations, calculating \(\mathbf {V}_{11} (\mathbf {I}_{n_1} + \mathbf {V}^T \mathbf {V})^{-1} \mathbf {V}^T\) requires an extra \(O(n_1^2 n)\) operations, and finally the posterior mean itself needs an additional \(O(n_1 n)\) operations. Consequently, to calculate the posterior mean we use \(O(n_1^2 n)\) operations.

In the same way, in order to calculate \(\mathbf {V}_{11} (\mathbf {I}_{n_1} + \mathbf {V}^T \mathbf {V})^{-1} \mathbf {V}_{11}^{-1}\) we need \(O(n_1^2 n)\) operations to compute \((\mathbf {I}_{n_1} + \mathbf {V}^T \mathbf {V})^{-1}\) and an additional \(O(n_1^3)\) operations to obtain the final matrix. Consequently, in order to calculate the posterior variance we use \(O(n_1^2 n)\) operations.

Finally, we need \(O(n_1^2 n)\) operations to compute the required matrices and \(O(n_1^2 n)\) operations to obtain the posterior mean and the posterior variance from these precomputed matrices. So, the total computational complexity is \(O(n_1^2 n)\).
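
As a sketch of this accounting, the expensive steps can be performed once, after which each posterior mean evaluation at a single new point reduces to an \(O(n_1)\) dot product with the precomputed weight vector; variable names and shapes are assumed as in the previous sketch, and the weights follow the posterior mean expression derived in Statement 1.

```python
import numpy as np

def precompute_mean_weights(K1, K11, R_diag, y):
    n1 = K11.shape[0]
    V11 = np.linalg.cholesky(K11).T                           # O(n1^3): K11 = V11^T V11
    V = np.linalg.solve(V11.T, (R_diag[:, None] * K1).T).T    # O(n1^2 n): V = R K1 V11^{-1}
    A = np.eye(n1) + V.T @ V                                  # O(n1^2 n)
    # O(n1^3) + O(n1 n): weight vector for the posterior mean
    return np.linalg.solve(V11, np.linalg.solve(A, V.T @ (R_diag * y)))

# After precomputation, the posterior mean at a new point x costs only O(n1):
#   mean_at_x = k1_star_x @ w, where k1_star_x holds covariances of x with the n1 base points.
```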

B Comparison of Low and High Fidelity Models for the Rotating Disk

Two solvers are available for calculating \(u_\mathrm{max}\) and \(s_\mathrm{max}\). The low fidelity function is computed by an Ordinary Differential Equations (ODE) solver based on a simple Runge–Kutta method. The high fidelity function is computed by a Finite Element Model (FEM) solver from ANSYS.

To compare the solvers, we draw scatter plots of the low and high fidelity values and also plot slices of the corresponding functions. We generate a random sample of points in a specified design space box, calculate the low and high fidelity function values, and plot the low fidelity values against the high fidelity values at the same points. The scatter plots are shown in Fig. 3: the difference between the values grows significantly as the values increase.

Fig. 3. Comparison of the high and the low fidelity solvers via scatter plots
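
A minimal sketch of this comparison procedure is given below. The design box bounds are placeholders (only the central point is specified in this appendix), and low_fidelity_solver / high_fidelity_solver are hypothetical callables standing in for the ODE and ANSYS FEM blackboxes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder bounds for the design space box; the true bounds are not given in this excerpt.
BOUNDS = {"r1": (0.04, 0.08), "r2": (0.10, 0.16), "r3": (0.14, 0.18),
          "r4": (0.17, 0.20), "t1": (0.02, 0.034), "t3": (0.02, 0.034)}

def scatter_compare(low_fidelity_solver, high_fidelity_solver, n_points=100, seed=0):
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in BOUNDS.values()])
    hi = np.array([b[1] for b in BOUNDS.values()])
    X = lo + (hi - lo) * rng.random((n_points, len(BOUNDS)))   # uniform sample in the box
    y_low = np.array([low_fidelity_solver(x) for x in X])
    y_high = np.array([high_fidelity_solver(x) for x in X])
    plt.scatter(y_high, y_low, s=10)
    plt.plot([y_high.min(), y_high.max()], [y_high.min(), y_high.max()], "k--")  # y = x line
    plt.xlabel("high fidelity value")
    plt.ylabel("low fidelity value")
    plt.show()
```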

For the central point of the design space box, with \(r_1 = 0.06, r_2 = 0.13, r_3 = 0.16, r_4 = 0.185, t_1 = 0.027, t_3 = 0.027\), we construct one-dimensional slices by varying a single input variable within specified bounds. Slices along different input variables for \(u_\mathrm{max}\) and for \(s_\mathrm{max}\) are given in Fig. 4. In the case of \(u_\mathrm{max}\) the high and the low fidelity functions exhibit the same behaviour, and the low fidelity function models the high fidelity function accurately. For \(s_\mathrm{max}\) the high and the low fidelity functions sometimes differ: their behaviour differs for the slice along the \(r_1\) input, and the locations of the local maxima differ for the slice along the \(t_3\) input.

Fig. 4. Comparison of the high and the low fidelity solvers via outputs’ slices
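
A similar sketch for the one-dimensional slices: one input is varied over an interval while the remaining inputs are kept at the central point listed above; the solver callables and the slice bounds are again placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

CENTER = {"r1": 0.06, "r2": 0.13, "r3": 0.16, "r4": 0.185, "t1": 0.027, "t3": 0.027}

def plot_slice(low_fidelity_solver, high_fidelity_solver, varied, lower, upper, n=50):
    names = list(CENTER)
    grid = np.linspace(lower, upper, n)
    X = np.tile([CENTER[k] for k in names], (n, 1))   # all inputs fixed at the central point
    X[:, names.index(varied)] = grid                  # except the one being varied
    plt.plot(grid, [low_fidelity_solver(x) for x in X], label="low fidelity (ODE)")
    plt.plot(grid, [high_fidelity_solver(x) for x in X], label="high fidelity (FEM)")
    plt.xlabel(varied)
    plt.legend()
    plt.show()
```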


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zaytsev, A. (2016). Variable Fidelity Regression Using Low Fidelity Function Blackbox and Sparsification. In: Gammerman, A., Luo, Z., Vega, J., Vovk, V. (eds) Conformal and Probabilistic Prediction with Applications. COPA 2016. Lecture Notes in Computer Science, vol. 9653. Springer, Cham. https://doi.org/10.1007/978-3-319-33395-3_11

  • DOI: https://doi.org/10.1007/978-3-319-33395-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33394-6

  • Online ISBN: 978-3-319-33395-3
