
Theory of Bayesian Optimization

Bayesian Optimization for Materials Science

Part of the book series: SpringerBriefs in the Mathematics of Materials (volume 3)

Abstract

In this chapter, we introduce the theory of the Bayesian optimization procedure and illustrate its application to a simple problem. A more involved application of Bayesian optimization will be presented in Chap. 3.


References

  1. Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge, MA, USA: The MIT Press; 2006 (Chapter 4).


  2. Kresse G, Furthmüller J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys Rev B. 1996;54:11169–86.


  3. R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; 2017. https://www.R-project.org/.

  4. Frazier P, Wang J. Bayesian optimization for materials design. In: Lookman T, Alexander FJ, Rajan K, editors. Information science for materials discovery and design. Springer Series in Materials Science, vol 225. Switzerland: Springer International Publishing; 2016.


  5. Miller I, Miller M. John E. Freund’s mathematical statistics with applications. 7th ed. Upper Saddle River, NJ, USA: Pearson Prentice-Hall; 2014.


  6. Petersen KB, Pedersen MS. The matrix cookbook. Version of 15 Nov 2012. http://matrixcookbook.com (Section 9.1.3).


Author information


Corresponding author

Correspondence to Daniel Packwood.

Appendices

Appendix 2.1

To prove Eqs. (2.12)–(2.14), we first substitute Eq. (2.10) into Eq. (2.11) to obtain

$$ f\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } \left| {u_{i} ,u_{j} , \ldots ,u_{k} } \right.} \right) \propto g\left( {u_{i} ,u_{j} , \ldots ,u_{k} \left| {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } } \right.} \right)g\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } } \right) $$
(2.27)

Because the likelihood function and the prior density are Gaussian probability densities, Eq. (2.27) simplifies, by the basic properties of conditional densities [5], to

$$ f\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } \left| {u_{i} ,u_{j} , \ldots ,u_{k} } \right.} \right) \propto g\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } ,u_{i} ,u_{j} , \ldots ,u_{k} } \right), $$
(2.28)

or, by using Eq. (2.8),

$$ \begin{aligned} & f\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } \left| {u_{i} ,u_{j} , \ldots ,u_{k} } \right.} \right) \\ & \quad \propto \exp \left( { - \frac{1}{2}\left[ {\left( {\begin{array}{*{20}c} {{\mathbf{v}}_{\alpha :\gamma } } \\ {{\mathbf{v}}_{i:k} } \\ \end{array} } \right) - \left( {\begin{array}{*{20}c} {{\varvec{\upmu}}_{\alpha :\gamma } } \\ {{\varvec{\upmu}}_{i:k} } \\ \end{array} } \right)} \right]^{T} \left[ {\begin{array}{*{20}c} {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } } & {{\mathbf{K}}_{\alpha :\gamma ,i:k} } \\ {{\mathbf{K}}_{i:k,\alpha :\gamma } } & {{\mathbf{K}}_{i:k,i:k} } \\ \end{array} } \right]^{ - 1} \left[ {\left( {\begin{array}{*{20}c} {{\mathbf{v}}_{\alpha :\gamma } } \\ {{\mathbf{v}}_{i:k} } \\ \end{array} } \right) - \left( {\begin{array}{*{20}c} {{\varvec{\upmu}}_{\alpha :\gamma } } \\ {{\varvec{\upmu}}_{i:k} } \\ \end{array} } \right)} \right]} \right) \\ \end{aligned} $$
(2.29)

This expression can be simplified using an identity for the inverse of a block matrix (see Ref. [6]):

$$ \begin{aligned} & \left[ {\begin{array}{*{20}c} {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } } & {{\mathbf{K}}_{\alpha :\gamma ,i:k} } \\ {{\mathbf{K}}_{i:k,\alpha :\gamma } } & {{\mathbf{K}}_{i:k,i:k} } \\ \end{array} } \right]^{ - 1} \\ & \quad = \left[ {\begin{array}{*{20}c} {\left( {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } - {\mathbf{K}}_{\alpha :\gamma ,i:k} {\mathbf{K}}_{i:k,i:k}^{ - 1} {\mathbf{K}}_{i:k,\alpha :\gamma } } \right)^{ - 1} } & { - \left( {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } - {\mathbf{K}}_{\alpha :\gamma ,i:k} {\mathbf{K}}_{i:k,i:k}^{ - 1} {\mathbf{K}}_{i:k,\alpha :\gamma } } \right)^{ - 1} {\mathbf{K}}_{\alpha :\gamma ,i:k} {\mathbf{K}}_{i:k,i:k}^{ - 1} } \\ { - \left( {{\mathbf{K}}_{i:k,i:k} - {\mathbf{K}}_{i:k,\alpha :\gamma } {\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma }^{ - 1} {\mathbf{K}}_{\alpha :\gamma ,i:k} } \right)^{ - 1} {\mathbf{K}}_{i:k,\alpha :\gamma } {\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma }^{ - 1} } & {\left( {{\mathbf{K}}_{i:k,i:k} - {\mathbf{K}}_{i:k,\alpha :\gamma } {\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma }^{ - 1} {\mathbf{K}}_{\alpha :\gamma ,i:k} } \right)^{ - 1} } \\ \end{array} } \right] \\ \end{aligned} $$
(2.30)

Substituting Eq. (2.30) into Eq. (2.29) and performing some tedious but straightforward algebraic manipulations yields Eqs. (2.12)–(2.14).
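The conditioning step above can be verified numerically. The following sketch (assuming NumPy is available; the dimensions, index split, and variable names are illustrative stand-ins, not taken from the text) checks that the Schur-complement expressions emerging from Eqs. (2.29)–(2.30) agree with an independent route through the blocks of the inverted joint covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint covariance K and mean mu for 5 jointly Gaussian variables: the
# first 2 play the role of the unobserved v_alpha..v_gamma, the last 3
# the role of the observed u_i..u_k.
n, n_obs = 5, 3
M = rng.normal(size=(n, n))
K = M @ M.T + n * np.eye(n)      # symmetric positive definite
mu = rng.normal(size=n)
u_obs = rng.normal(size=n_obs)   # hypothetical observed values

A = slice(0, n - n_obs)          # indices alpha..gamma
B = slice(n - n_obs, n)          # indices i..k

# Conditional mean and covariance via the Schur complement, i.e. the
# closed forms that drop out of Eqs. (2.29)-(2.30).
G = K[A, B] @ np.linalg.inv(K[B, B])
mu_cond = mu[A] + G @ (u_obs - mu[B])
K_cond = K[A, A] - G @ K[B, A]

# Independent route: invert the full joint covariance and read the
# conditional distribution off the precision-matrix blocks.
P = np.linalg.inv(K)
K_cond_alt = np.linalg.inv(P[A, A])
mu_cond_alt = mu[A] - K_cond_alt @ P[A, B] @ (u_obs - mu[B])

assert np.allclose(K_cond, K_cond_alt)
assert np.allclose(mu_cond, mu_cond_alt)
```

Both routes agree to machine precision, which is exactly the content of the block-inverse identity combined with completing the square in Eq. (2.29).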

Appendix 2.2

To prove Eq. (2.19), we write

$$ \begin{aligned} EI\left( {r_{\alpha } } \right) & = E_{f} \left[ {\max \left( {u_{\min } - V\left( {r_{\alpha } } \right),0} \right)} \right] \\ & = \int\limits_{ - \infty }^{{u_{\min } }} {\left( {u_{\min } - z} \right)\frac{1}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} dz} \\ & = \underbrace {{u_{\min } \int\limits_{ - \infty }^{{u_{\min } }} {\frac{1}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} dz} }}_{A} - \underbrace {{\int\limits_{ - \infty }^{{u_{\min } }} {\frac{z}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} dz} }}_{B}. \\ \end{aligned} $$
(2.31)

The term A on the right-hand side of Eq. (2.31) is equal to

$$ A = u_{\hbox{min} } {\Phi} \left( {\frac{{u_{\hbox{min} } - \mu_{\alpha }^{*} }}{{\sqrt {K_{\alpha \alpha }^{*} } }}} \right) , $$
(2.32)

by the definition of the standard normal cumulative distribution function Φ. As for the term B in Eq. (2.31), we write

$$ \begin{aligned} B & = \int\limits_{ - \infty }^{{u_{\hbox{min} } }} {\frac{{\mu_{\alpha }^{*} + \left( {z - \mu_{\alpha }^{*} } \right)}}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } \mathord{\left/ {\vphantom {{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } {2K_{\alpha \alpha }^{*} }}} \right. \kern-0pt} {2K_{\alpha \alpha }^{*} }}}} dz} . \\ & = \mu_{\alpha }^{*} {\Phi} \left( {\frac{{u_{\hbox{min} } - \mu_{\alpha }^{*} }}{{\sqrt {K_{\alpha \alpha }^{*} } }}} \right) + \underbrace {{\int\limits_{ - \infty }^{{u_{\hbox{min} } }} {\frac{{\left( {z - \mu_{\alpha }^{*} } \right)}}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } \mathord{\left/ {\vphantom {{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } {2K_{\alpha \alpha }^{*} }}} \right. \kern-0pt} {2K_{\alpha \alpha }^{*} }}}} dz.} }}_{C} \\ \end{aligned} $$
(2.33)

By substituting the variable

$$ h = \frac{{z - \mu_{\alpha }^{*} }}{{\sqrt {2K_{\alpha \alpha }^{*} } }} $$
(2.34)

into the term C in Eq. (2.33) and performing the integration, we obtain

$$ \begin{aligned} C & = \left( {\frac{{2K_{\alpha \alpha }^{*} }}{\pi }} \right)^{1/2} \int\limits_{ - \infty }^{{\left( {u_{\min } - \mu_{\alpha }^{*} } \right)/\sqrt {2K_{\alpha \alpha }^{*} } }} {he^{{ - h^{2} }} dh} \\ & = - \left( {\frac{{K_{\alpha \alpha }^{*} }}{2\pi }} \right)^{1/2} e^{{ - \left( {u_{\min } - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} \\ & = - \sqrt {K_{\alpha \alpha }^{*} } \phi \left( {\frac{{u_{\min } - \mu_{\alpha }^{*} }}{{\sqrt {K_{\alpha \alpha }^{*} } }}} \right) \\ \end{aligned} $$
(2.35)

where the definition of the standard normal probability density φ was used. Combining Eqs. (2.31), (2.32), (2.33) and (2.35) then gives the result, Eq. (2.19).
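Combining the terms above gives the closed form EI(r_α) = (u_min − μ*_α)Φ(t) + √(K*_αα) φ(t), with t = (u_min − μ*_α)/√(K*_αα). The following standard-library Python sketch (the posterior mean and variance values are hypothetical) checks this closed form against direct numerical quadrature of the integral in Eq. (2.31):

```python
import math

def ei_closed_form(u_min, mu, var):
    """(u_min - mu) * Phi(t) + sigma * phi(t), with t = (u_min - mu)/sigma,
    as implied by combining Eqs. (2.31)-(2.35)."""
    sigma = math.sqrt(var)
    t = (u_min - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return (u_min - mu) * Phi + sigma * phi

def ei_by_quadrature(u_min, mu, var, n=200000, tail=12.0):
    """Trapezoidal approximation of the integral in Eq. (2.31)."""
    sigma = math.sqrt(var)
    lo = min(mu, u_min) - tail * sigma   # far enough out that the tail is negligible
    h = (u_min - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        dens = math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += w * (u_min - z) * dens
    return total * h

# Hypothetical posterior mean mu* = 0.5, variance K* = 0.04, best value 0.3
assert abs(ei_closed_form(0.3, 0.5, 0.04) - ei_by_quadrature(0.3, 0.5, 0.04)) < 1e-6
```

Note that the closed form is strictly positive for any finite posterior variance, which is what makes it usable as an acquisition function even at points whose predicted value lies above the current best.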

Appendix 2.3

To prove Eqs. (2.20) and (2.21), we take the logarithm of the prior probability density in Eq. (2.5) to obtain

$$ \begin{aligned} \log g\left( {u_{i} ,u_{j} , \ldots ,u_{k} } \right) & = - \frac{1}{2}\left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} \left( {a{\mathbf{R}}} \right)^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right) - \frac{1}{2}\log \left| {a{\mathbf{R}}} \right| - \frac{s}{2}\log 2\pi \\ & = \underbrace {{ - \frac{1}{2}\left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} \left( {a{\mathbf{R}}} \right)^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)}}_{A} \underbrace {{ - \frac{s}{2}\log \left( {a\left| {\mathbf{R}} \right|^{1/s} } \right)}}_{B} \underbrace {{ - \frac{s}{2}\log 2\pi }}_{C}. \\ \end{aligned} $$
(2.36)

In the first line of Eq. (2.36), we used the definition of the matrix R in Eq. (2.22) and the fact that aR = Σ. In the second line, we used the fact that |aR| = a^s |R|, which follows from the basic properties of determinants. Solving the equation ∂ log g(u_i, u_j, …, u_k)/∂a = 0 gives

$$ a = \frac{1}{s}\left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} {\mathbf{R}}^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right), $$
(2.37)

which is simply Eq. (2.21). To obtain an expression for L, first note that the term A reduces to

$$ A = - \frac{s}{2} $$
(2.38)

upon substituting Eq. (2.37). Substituting Eq. (2.37) into the term marked B gives

$$ B = - \frac{s}{2}\log \left( {\frac{1}{s}\left| {\mathbf{R}} \right|^{1/s} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} {\mathbf{R}}^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)} \right). $$
(2.39)

Substituting Eqs. (2.38) and (2.39) into Eq. (2.36), and noting that the terms A and C are independent of a and L, gives Eq. (2.20). Finally, by noting that maximizing the logarithm of the prior probability density is equivalent to maximizing the prior probability density itself, we arrive at the result.
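Equation (2.37) can also be checked numerically: for any fixed R, the value a = (1/s)(u − μ)ᵀR⁻¹(u − μ) should score at least as high under the Gaussian log-density as any other scale. A short NumPy sketch (the matrix R and the vectors are randomly generated stand-ins for the quantities in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_gaussian(u, mu, cov):
    """Log of the multivariate normal density of u with mean mu and covariance cov."""
    s = len(u)
    d = u - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * d @ np.linalg.solve(cov, d) - 0.5 * logdet - 0.5 * s * np.log(2.0 * np.pi)

s = 6
M = rng.normal(size=(s, s))
R = M @ M.T + s * np.eye(s)   # stands in for the correlation matrix R of the text
mu = rng.normal(size=s)
u = rng.normal(size=s)

# Closed-form maximizer from Eq. (2.37): a_hat = (1/s) (u - mu)^T R^{-1} (u - mu)
d = u - mu
a_hat = (d @ np.linalg.solve(R, d)) / s

# No other scale a yields a higher log-density than a_hat
best = log_gaussian(u, mu, a_hat * R)
for a in (0.5 * a_hat, 0.9 * a_hat, 1.1 * a_hat, 2.0 * a_hat):
    assert log_gaussian(u, mu, a * R) < best
```

The check reflects the one-dimensional calculus in the derivation: the log-density in a is −Q/(2a) − (s/2) log a plus constants (with Q = (u − μ)ᵀR⁻¹(u − μ)), whose unique stationary point a = Q/s is a maximum.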


Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Packwood, D. (2017). Theory of Bayesian Optimization. In: Bayesian Optimization for Materials Science. SpringerBriefs in the Mathematics of Materials, vol 3. Springer, Singapore. https://doi.org/10.1007/978-981-10-6781-5_2
