
Theory of Bayesian Optimization

Bayesian Optimization for Materials Science

Part of the book series: SpringerBriefs in the Mathematics of Materials (volume 3)

Abstract

In this chapter, we introduce the theory of the Bayesian optimization procedure and illustrate its application to a simple problem. A more involved application of Bayesian optimization will be presented in Chap. 3.


References

  1. Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge, MA, USA: The MIT Press; 2006 (Chapter 4).


  2. Kresse G, Furthmüller J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys Rev B. 1996;54:11169–86.


  3. R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; 2017. https://www.R-project.org/.

  4. Frazier P, Wang J. Bayesian optimization for materials design. In: Lookman T, Alexander FJ, Rajan K, editors. Information science for materials discovery and design. Springer Series in Materials Science, vol 225. Switzerland: Springer International Publishing; 2016.


  5. Miller I, Miller M. John E. Freund’s mathematical statistics with applications. 7th ed. Upper Saddle River, NJ, USA: Pearson Prentice-Hall; 2014.


  6. Petersen KB, Pedersen MS. The matrix cookbook. Version of 15 Nov 2012. http://matrixcookbook.com (Section 9.1.3).


Author information


Corresponding author

Correspondence to Daniel Packwood.

Appendices

Appendix 2.1

To prove Eqs. (2.12)–(2.14), we first substitute Eq. (2.10) into Eq. (2.11) to obtain

$$ f\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } \left| {u_{i} ,u_{j} , \ldots ,u_{k} } \right.} \right) \propto g\left( {u_{i} ,u_{j} , \ldots ,u_{k} \left| {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } } \right.} \right)g\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } } \right) $$
(2.27)

Because the likelihood function and the prior density are Gaussian probability densities, Eq. (2.27) simplifies, by the basic properties of conditional densities [5], to

$$ f\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } \left| {u_{i} ,u_{j} , \ldots ,u_{k} } \right.} \right) \propto g\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } ,u_{i} ,u_{j} , \ldots ,u_{k} } \right), $$
(2.28)

or, by using Eq. (2.8),

$$ \begin{aligned} & f\left( {v_{\alpha } ,v_{\beta } , \ldots ,v_{\gamma } \left| {u_{i} ,u_{j} , \ldots ,u_{k} } \right.} \right) \\ & \quad \propto \exp \left( { - \frac{1}{2}\left[ {\left( {\begin{array}{*{20}c} {{\mathbf{v}}_{\alpha :\gamma } } \\ {{\mathbf{v}}_{i:k} } \\ \end{array} } \right) - \left( {\begin{array}{*{20}c} {{\varvec{\upmu}}_{\alpha :\gamma } } \\ {{\varvec{\upmu}}_{i:k} } \\ \end{array} } \right)} \right]^{T} \left[ {\begin{array}{*{20}c} {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } } & {{\mathbf{K}}_{\alpha :\gamma ,i:k} } \\ {{\mathbf{K}}_{i:k,\alpha :\gamma } } & {{\mathbf{K}}_{i:k,i:k} } \\ \end{array} } \right]^{ - 1} \left[ {\left( {\begin{array}{*{20}c} {{\mathbf{v}}_{\alpha :\gamma } } \\ {{\mathbf{v}}_{i:k} } \\ \end{array} } \right) - \left( {\begin{array}{*{20}c} {{\varvec{\upmu}}_{\alpha :\gamma } } \\ {{\varvec{\upmu}}_{i:k} } \\ \end{array} } \right)} \right]} \right) \\ \end{aligned} $$
(2.29)

This expression can be simplified using an identity for the inverse of a block matrix (see Ref. [6]):

$$ \begin{aligned} & \left[ {\begin{array}{*{20}c} {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } } & {{\mathbf{K}}_{\alpha :\gamma ,i:k} } \\ {{\mathbf{K}}_{i:k,\alpha :\gamma } } & {{\mathbf{K}}_{i:k,i:k} } \\ \end{array} } \right]^{ - 1} \\ & \quad = \left[ {\begin{array}{*{20}c} {\left( {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } - {\mathbf{K}}_{\alpha :\gamma ,i:k} {\mathbf{K}}_{i:k,i:k}^{ - 1} {\mathbf{K}}_{i:k,\alpha :\gamma } } \right)^{ - 1} } & { - \left( {{\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma } - {\mathbf{K}}_{\alpha :\gamma ,i:k} {\mathbf{K}}_{i:k,i:k}^{ - 1} {\mathbf{K}}_{i:k,\alpha :\gamma } } \right)^{ - 1} {\mathbf{K}}_{\alpha :\gamma ,i:k} {\mathbf{K}}_{i:k,i:k}^{ - 1} } \\ { - \left( {{\mathbf{K}}_{i:k,i:k} - {\mathbf{K}}_{i:k,\alpha :\gamma } {\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma }^{ - 1} {\mathbf{K}}_{\alpha :\gamma ,i:k} } \right)^{ - 1} {\mathbf{K}}_{i:k,\alpha :\gamma } {\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma }^{ - 1} } & {\left( {{\mathbf{K}}_{i:k,i:k} - {\mathbf{K}}_{i:k,\alpha :\gamma } {\mathbf{K}}_{\alpha :\gamma ,\alpha :\gamma }^{ - 1} {\mathbf{K}}_{\alpha :\gamma ,i:k} } \right)^{ - 1} } \\ \end{array} } \right] \\ \end{aligned} $$
(2.30)

Substituting Eq. (2.30) into Eq. (2.29) and performing some tedious but straightforward algebraic manipulations yields Eqs. (2.12)–(2.14).
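The conditioning step above can be verified numerically. The following sketch (assuming NumPy is available; the dimensions, index split, and variable names are illustrative stand-ins, not taken from the text) checks that the Schur-complement expressions emerging from Eqs. (2.29)–(2.30) agree with an independent route through the blocks of the inverted joint covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint covariance K and mean mu for 5 jointly Gaussian variables: the
# first 2 play the role of the unobserved v_alpha..v_gamma, the last 3
# the role of the observed u_i..u_k.
n, n_obs = 5, 3
M = rng.normal(size=(n, n))
K = M @ M.T + n * np.eye(n)      # symmetric positive definite
mu = rng.normal(size=n)
u_obs = rng.normal(size=n_obs)   # hypothetical observed values

A = slice(0, n - n_obs)          # indices alpha..gamma
B = slice(n - n_obs, n)          # indices i..k

# Conditional mean and covariance via the Schur complement, i.e. the
# closed forms that drop out of Eqs. (2.29)-(2.30).
G = K[A, B] @ np.linalg.inv(K[B, B])
mu_cond = mu[A] + G @ (u_obs - mu[B])
K_cond = K[A, A] - G @ K[B, A]

# Independent route: invert the full joint covariance and read the
# conditional distribution off the precision-matrix blocks.
P = np.linalg.inv(K)
K_cond_alt = np.linalg.inv(P[A, A])
mu_cond_alt = mu[A] - K_cond_alt @ P[A, B] @ (u_obs - mu[B])

assert np.allclose(K_cond, K_cond_alt)
assert np.allclose(mu_cond, mu_cond_alt)
```

Both routes agree to machine precision, which is exactly the content of the block-inverse identity combined with completing the square in Eq. (2.29).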

Appendix 2.2

To prove Eq. (2.19), we write

$$ \begin{aligned} EI\left( {r_{\alpha } } \right) & = E_{f} \left[ {\max \left( {u_{\min } - V\left( {r_{\alpha } } \right),0} \right)} \right] \\ & = \int\limits_{ - \infty }^{{u_{\min } }} {\left( {u_{\min } - z} \right)\frac{1}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} dz} \\ & = \underbrace {{u_{\min } \int\limits_{ - \infty }^{{u_{\min } }} {\frac{1}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} dz} }}_{A} - \underbrace {{\int\limits_{ - \infty }^{{u_{\min } }} {\frac{z}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} dz} }}_{B}. \\ \end{aligned} $$
(2.31)

The term A on the right-hand side of Eq. (2.31) is equal to

$$ A = u_{\hbox{min} } {\Phi} \left( {\frac{{u_{\hbox{min} } - \mu_{\alpha }^{*} }}{{\sqrt {K_{\alpha \alpha }^{*} } }}} \right) , $$
(2.32)

by the definition of the standard normal cumulative distribution function Φ. As for the term B in Eq. (2.31), we write

$$ \begin{aligned} B & = \int\limits_{ - \infty }^{{u_{\hbox{min} } }} {\frac{{\mu_{\alpha }^{*} + \left( {z - \mu_{\alpha }^{*} } \right)}}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } \mathord{\left/ {\vphantom {{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } {2K_{\alpha \alpha }^{*} }}} \right. \kern-0pt} {2K_{\alpha \alpha }^{*} }}}} dz} . \\ & = \mu_{\alpha }^{*} {\Phi} \left( {\frac{{u_{\hbox{min} } - \mu_{\alpha }^{*} }}{{\sqrt {K_{\alpha \alpha }^{*} } }}} \right) + \underbrace {{\int\limits_{ - \infty }^{{u_{\hbox{min} } }} {\frac{{\left( {z - \mu_{\alpha }^{*} } \right)}}{{\sqrt {2\pi K_{\alpha \alpha }^{*} } }}e^{{{{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } \mathord{\left/ {\vphantom {{ - \left( {z - \mu_{\alpha }^{*} } \right)^{2} } {2K_{\alpha \alpha }^{*} }}} \right. \kern-0pt} {2K_{\alpha \alpha }^{*} }}}} dz.} }}_{C} \\ \end{aligned} $$
(2.33)

By substituting the variable

$$ h = \frac{{z - \mu_{\alpha }^{*} }}{{\sqrt {2K_{\alpha \alpha }^{*} } }} $$
(2.34)

into the term C in Eq. (2.33) and performing the integration, we obtain

$$ \begin{aligned} C & = \left( {\frac{{2K_{\alpha \alpha }^{*} }}{\pi }} \right)^{1/2} \int\limits_{ - \infty }^{{\left( {u_{\min } - \mu_{\alpha }^{*} } \right)/\sqrt {2K_{\alpha \alpha }^{*} } }} {he^{{ - h^{2} }} dh} \\ & = - \left( {\frac{{K_{\alpha \alpha }^{*} }}{2\pi }} \right)^{1/2} e^{{ - \left( {u_{\min } - \mu_{\alpha }^{*} } \right)^{2} /2K_{\alpha \alpha }^{*} }} \\ & = - \sqrt {K_{\alpha \alpha }^{*} } \phi \left( {\frac{{u_{\min } - \mu_{\alpha }^{*} }}{{\sqrt {K_{\alpha \alpha }^{*} } }}} \right) \\ \end{aligned} $$
(2.35)

where the definition of the standard normal probability density φ was used. Combining Eqs. (2.31), (2.32), (2.33) and (2.35) then gives the result, Eq. (2.19).
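Combining the terms above gives the closed form EI(r_α) = (u_min − μ*_α)Φ(t) + √(K*_αα) φ(t), with t = (u_min − μ*_α)/√(K*_αα). The following standard-library Python sketch (the posterior mean and variance values are hypothetical) checks this closed form against direct numerical quadrature of the integral in Eq. (2.31):

```python
import math

def ei_closed_form(u_min, mu, var):
    """(u_min - mu) * Phi(t) + sigma * phi(t), with t = (u_min - mu)/sigma,
    as implied by combining Eqs. (2.31)-(2.35)."""
    sigma = math.sqrt(var)
    t = (u_min - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return (u_min - mu) * Phi + sigma * phi

def ei_by_quadrature(u_min, mu, var, n=200000, tail=12.0):
    """Trapezoidal approximation of the integral in Eq. (2.31)."""
    sigma = math.sqrt(var)
    lo = min(mu, u_min) - tail * sigma   # far enough out that the tail is negligible
    h = (u_min - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        dens = math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += w * (u_min - z) * dens
    return total * h

# Hypothetical posterior mean mu* = 0.5, variance K* = 0.04, best value 0.3
assert abs(ei_closed_form(0.3, 0.5, 0.04) - ei_by_quadrature(0.3, 0.5, 0.04)) < 1e-6
```

Note that the closed form is strictly positive for any finite posterior variance, which is what makes it usable as an acquisition function even at points whose predicted value lies above the current best.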

Appendix 2.3

To prove Eqs. (2.20) and (2.21), we take the logarithm of the prior probability density in Eq. (2.5) to obtain

$$ \begin{aligned} \log g\left( {u_{i} ,u_{j} , \ldots ,u_{k} } \right) & = - \frac{1}{2}\left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} \left( {a{\mathbf{R}}} \right)^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right) - \frac{1}{2}\log \left| {a{\mathbf{R}}} \right| - \frac{s}{2}\log 2\pi \\ & = \underbrace {{ - \frac{1}{2}\left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} \left( {a{\mathbf{R}}} \right)^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)}}_{A} \underbrace {{ - \frac{s}{2}\log \left( {a\left| {\mathbf{R}} \right|^{1/s} } \right)}}_{B} \underbrace {{ - \frac{s}{2}\log 2\pi }}_{C}. \\ \end{aligned} $$
(2.36)

In the first line of Eq. (2.36), we used the definition of the matrix R in Eq. (2.22) and the fact that aR = Σ. In the second line, we used the fact that |aR| = a^s |R|, which follows from the basic properties of determinants. Solving the equation ∂ log g(u_i, u_j, …, u_k)/∂a = 0 gives

$$ a = \frac{1}{s}\left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} {\mathbf{R}}^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right), $$
(2.37)

which is simply Eq. (2.21). To obtain an expression for L, first note that the term A reduces to

$$ A = - \frac{s}{2} $$
(2.38)

upon substituting Eq. (2.37). Substituting Eq. (2.37) into the term marked B gives

$$ B = - \frac{s}{2}\log \left( {\frac{1}{s}\left| {\mathbf{R}} \right|^{1/s} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)^{T} {\mathbf{R}}^{ - 1} \left( {{\mathbf{u}} - {\varvec{\upmu}}} \right)} \right). $$
(2.39)

Substituting Eqs. (2.38) and (2.39) into Eq. (2.36), and noting that the terms A and C are independent of a and L, gives Eq. (2.20). Finally, by noting that maximizing the logarithm of the prior probability density is equivalent to maximizing the prior probability density itself, we arrive at the result.
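Equation (2.37) can also be checked numerically: for any fixed R, the value a = (1/s)(u − μ)ᵀR⁻¹(u − μ) should score at least as high under the Gaussian log-density as any other scale. A short NumPy sketch (the matrix R and the vectors are randomly generated stand-ins for the quantities in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_gaussian(u, mu, cov):
    """Log of the multivariate normal density of u with mean mu and covariance cov."""
    s = len(u)
    d = u - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * d @ np.linalg.solve(cov, d) - 0.5 * logdet - 0.5 * s * np.log(2.0 * np.pi)

s = 6
M = rng.normal(size=(s, s))
R = M @ M.T + s * np.eye(s)   # stands in for the correlation matrix R of the text
mu = rng.normal(size=s)
u = rng.normal(size=s)

# Closed-form maximizer from Eq. (2.37): a_hat = (1/s) (u - mu)^T R^{-1} (u - mu)
d = u - mu
a_hat = (d @ np.linalg.solve(R, d)) / s

# No other scale a yields a higher log-density than a_hat
best = log_gaussian(u, mu, a_hat * R)
for a in (0.5 * a_hat, 0.9 * a_hat, 1.1 * a_hat, 2.0 * a_hat):
    assert log_gaussian(u, mu, a * R) < best
```

The check reflects the one-dimensional calculus in the derivation: the log-density in a is −Q/(2a) − (s/2) log a plus constants (with Q = (u − μ)ᵀR⁻¹(u − μ)), whose unique stationary point a = Q/s is a maximum.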


Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Packwood, D. (2017). Theory of Bayesian Optimization. In: Bayesian Optimization for Materials Science. SpringerBriefs in the Mathematics of Materials, vol 3. Springer, Singapore. https://doi.org/10.1007/978-981-10-6781-5_2
