
Precomputing strategy for Hamiltonian Monte Carlo method based on regularity in parameter space

Abstract

Markov Chain Monte Carlo (MCMC) algorithms play an important role in statistical inference problems dealing with intractable probability distributions. Recently, many MCMC algorithms such as Hamiltonian Monte Carlo (HMC) and Riemannian Manifold HMC have been proposed to provide distant proposals with high acceptance rates. These algorithms, however, tend to be computationally intensive, which can limit their usefulness, especially for big data problems, due to repetitive evaluations of functions and statistical quantities that depend on the data. This issue occurs in many statistical computing problems. In this paper, we propose a novel strategy that exploits smoothness (regularity) in parameter space to improve the computational efficiency of MCMC algorithms. When evaluation of functions or statistical quantities is needed at a point in parameter space, interpolation from precomputed or previously computed values is used. More specifically, we focus on HMC algorithms that use geometric information for faster exploration of probability distributions. Our proposed method precomputes the required geometric information on a set of grid points before running the sampling algorithm and, at each iteration of HMC, approximates the geometric information at the current location of the sampler by interpolating from the precomputed values at nearby grid points. A sparse grid interpolation method is used for high-dimensional problems. Computational examples are presented to illustrate the advantages of our method.
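As a rough illustration of the precompute-then-interpolate idea (a minimal sketch, not the authors' implementation), the following Python snippet tabulates the gradient of the potential energy on a regular grid and then drives the leapfrog integrator with the interpolated gradient. The toy gradient, grid ranges, and helper names are illustrative assumptions; SciPy's RegularGridInterpolator stands in for the interpolation scheme, and in the paper's setting the tabulated quantities would be the data-dependent geometric terms, with a sparse grid replacing the full grid in higher dimensions.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical expensive gradient of the potential energy U(q); in the paper's
# setting this would involve a pass over the full data set.
def grad_U_exact(q):
    return q + 0.1 * np.sin(q)          # placeholder 2-D target, roughly standard normal

# --- Precomputing phase: tabulate the gradient on a regular grid ------------
axes = [np.linspace(-4.0, 4.0, 33)] * 2                   # grid per dimension
Q1, Q2 = np.meshgrid(*axes, indexing="ij")
grid_pts = np.stack([Q1, Q2], axis=-1)                    # shape (33, 33, 2)
grad_table = np.apply_along_axis(grad_U_exact, -1, grid_pts)

# One interpolant per gradient component
interps = [RegularGridInterpolator(axes, grad_table[..., k]) for k in range(2)]

def grad_U_approx(q):
    """Approximate gradient from precomputed grid values (used inside leapfrog)."""
    return np.array([itp(q)[0] for itp in interps])

# --- Leapfrog step driven by the interpolated gradient ----------------------
def leapfrog(q, p, eps, n_steps):
    p = p - 0.5 * eps * grad_U_approx(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p - eps * grad_U_approx(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U_approx(q)
    return q, p
```

The Metropolis correction would still use the exact Hamiltonian, so the approximation affects only the quality of the proposals, not the target distribution (see Appendix 1).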




References

  • Ahmadian Y, Pillow JW, Paninski L (2011) Efficient Markov chain Monte Carlo methods for decoding neural spike trains. Neural Comput 23:46–96

  • Ahn S, Chen Y, Welling M (2013) Distributed and adaptive darting Monte Carlo through regenerations. In: International conference on artificial intelligence and statistics

  • Ahn S, Shahbaba B, Welling M (2014) Distributed stochastic gradient MCMC. In: International conference on machine learning

  • Alder BJ, Wainwright TE (1959) Studies in molecular dynamics. I. General method. J Chem Phys 31:459–466

  • Barthelmann V, Novak E, Ritter K (2000) High dimensional polynomial interpolation on sparse grids. Adv Comput Math 12:273–288

  • Beskos A, Pinski FJ, Sanz-Serna JM, Stuart AM (2011) Hybrid Monte Carlo on Hilbert spaces. Stoch Process Appl 121:2201–2230

  • Bungartz HJ (1998) Finite elements of higher order on sparse grids. Shaker Verlag

  • Bungartz HJ, Griebel M (2004) Sparse grids. Acta Numer 13:147–169

  • Calderhead B, Sustik M (2012) Sparse approximate manifolds for differential geometric MCMC. In: Bartlett P, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25, pp 2888–2896

  • Chen T, Fox EB, Guestrin C (2014) Stochastic gradient Hamiltonian Monte Carlo. Preprint

  • Conrad PR, Marzouk YM, Pillai NS, Smith A (2014) Asymptotically exact MCMC algorithms via local approximations of computationally intensive models. pp 1–38, arXiv:1402.1694v1

  • Dashti M, Stuart AM (2011) Uncertainty quantification and weak approximation of an elliptic inverse problem. SIAM J Numer Anal 6:2524–2542

  • Duane S, Kennedy AD, Pendleton BJ, Roweth D (1987) Hybrid Monte Carlo. Phys Lett B 195:216–222

  • Geyer CJ (1992) Practical Markov Chain Monte Carlo. Stat Sci 7:473–483

  • Girolami M, Calderhead B (2011) Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J R Stat Soc Ser B 73:123–214

  • Hoffman M, Gelman A (2011) The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. arXiv:1111.4246

  • Hoffman M, Blei DM, Bach FR (2010) Online learning for latent Dirichlet allocation. In: Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A (eds) NIPS. Curran Associates, Inc., NY, pp 856–864

  • Klimke A, Wohlmuth B (2005) Algorithm 847: spinterp: piecewise multilinear hierarchical sparse grid interpolation in MATLAB. ACM Trans Math Softw 31:561–579

  • Lan S, Zhou B, Shahbaba B (2014) Spherical Hamiltonian Monte Carlo for constrained target distributions. In: 31st International conference on machine learning (to appear)

  • Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092

  • Neal RM (1998) Regression and classification using Gaussian process priors. Bayesian Stat 6:471–501

  • Neal RM (2011) MCMC using Hamiltonian dynamics. In: Brooks S, Gelman A, Jones G, Meng XL (eds) Handbook of Markov Chain Monte Carlo, Chapman and Hall/CRC, pp 113–162

  • Rasmussen CE (1996) Evaluation of Gaussian processes and other methods for non-linear regression. Ph.D. thesis, Department of Computer Science, University of Toronto

  • Rasmussen CE (2003) Gaussian processes to speed up hybrid Monte Carlo for expensive Bayesian integrals. In: Bayesian statistics, pp 651–659

  • Shahbaba B, Lan S, Johnson WO, Neal RM (2014) Split Hamiltonian Monte Carlo. Stat Comput 24:339–349

  • Wasilkowski GW, Woźniakowski H (1995) Explicit cost bounds of algorithms for multivariate tensor product problems. J Complex 11:1–56

  • Welling M, Teh YW (2011) Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th international conference on machine learning (ICML), pp 681–688


Acknowledgments

We thank Wataru Sakamoto for his careful reading of the manuscript and useful feedback. We also thank the anonymous referees for suggestions that improved the exposition of this work. In addition, we appreciate the discussions with S. Lan and T. Chen. C. Zhang and H. Zhao were partially supported by NSF Grant DMS-1418422. B. Shahbaba and H. Zhao were partially supported by NSF Grant DMS-1622490.

Author information

Corresponding author

Correspondence to Cheng Zhang.

Appendices

Appendix 1: Convergence to the correct distribution

In order to prove that the equilibrium distribution remains the same, it suffices to show that the detailed balance condition still holds. Note that the alternative Hamiltonian \(\tilde{H}(q,p)\) defines a surrogate-induced Hamiltonian flow, parameterized by the trajectory length t, which is a map \(\tilde{\phi }_t:\;(q,p)\rightarrow (q^*,p^{*})\). Here, \((q^*,p^*)\) is the end-point of the trajectory governed by the following equations

$$\begin{aligned} \frac{dq}{dt} = \frac{\partial \tilde{H}}{\partial p} = M^{-1}p,\quad \frac{dp}{dt} = -\frac{\partial \tilde{H}}{\partial q} = -\frac{\partial \tilde{U}}{\partial q} = -\tilde{F} \end{aligned}$$

Denote \(\theta =(q,p),\;\theta '=(q^*,p^*)=\tilde{\phi }_t(\theta )\). In the Metropolis–Hastings step, we use the original Hamiltonian to compute the acceptance probability

$$\begin{aligned} \alpha (\theta ,\theta ') = \min (1,\exp [-H(\theta ')+H(\theta )]) \end{aligned}$$

therefore,

$$\begin{aligned}&\alpha (\theta ,\theta '){\mathbb {P}}(d\theta ) = \alpha (\theta ,\theta ')\exp [-H(\theta )]d\theta \\&\quad \mathop {=}\limits ^{\theta =\tilde{\phi }_t^{-1}(\theta ')} \min (\exp [-H(\theta )],\exp [-H(\theta ')])\left| \frac{d\theta }{d\theta '}\right| d\theta '\\&\quad = \alpha (\theta ',\theta )\exp [-H(\theta ')]d\theta '\\&\quad = \alpha (\theta ',\theta ) {\mathbb {P}}(d\theta ') \end{aligned}$$

since \( \left| \frac{d\theta }{d\theta '}\right| = 1\) due to the volume-conservation property of the surrogate-induced Hamiltonian flow \(\tilde{\phi }_t\). Since the detailed balance condition is satisfied and the surrogate-induced Hamiltonian flow is reversible, the modified Markov chain converges to the correct target distribution.
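The argument above can be made concrete with a short sketch of one HMC update in which the trajectory is simulated under the surrogate potential \(\tilde{U}\) while the accept/reject decision uses the original potential \(U\). This is a minimal illustration assuming an identity mass matrix and user-supplied functions U and grad_U_tilde; the names are ours, not from the paper.

```python
import numpy as np

def surrogate_hmc_step(q, U, grad_U_tilde, eps, L, rng):
    """One HMC update: the trajectory follows the surrogate gradient grad_U_tilde,
    but the accept/reject step uses the original potential U, as in Appendix 1.
    (U and grad_U_tilde are assumed to be supplied by the user; M = I here.)"""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = q.copy(), p0.copy()

    # Leapfrog under the surrogate-induced Hamiltonian flow \tilde{\phi}_t
    p_new -= 0.5 * eps * grad_U_tilde(q_new)
    for _ in range(L - 1):
        q_new += eps * p_new
        p_new -= eps * grad_U_tilde(q_new)
    q_new += eps * p_new
    p_new -= 0.5 * eps * grad_U_tilde(q_new)

    # Metropolis-Hastings correction with the *original* Hamiltonian H = U + K
    H0 = U(q) + 0.5 * p0 @ p0
    H1 = U(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < H0 - H1:
        return q_new                      # accept
    return q                              # reject
```

Because the acceptance test uses the original Hamiltonian, the chain targets the correct distribution even when \(\tilde{U}\) only approximates \(U\); the surrogate influences efficiency, not correctness.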

Appendix 2: More on sparse grid

1.1 Construction of sparse grid and basis functions

The Clenshaw–Curtis type sparse grid \(H^{CC}\) introduced in Sect. 3.4.2 is constructed from the following formulas. Here, the \(x_j^i\) are defined as

$$\begin{aligned} x_j^i =\left\{ \begin{array}{ll}(j-1)/(m_i-1),\; &{}j = 1,\ldots ,m_i, m_i > 1\\ 0.5, &{} j=1, m_i =1 \end{array}\right. \end{aligned}$$

In order to obtain nested sets of points, the number of nodes is given by

$$\begin{aligned} m_1 = 1\quad \text { and }\quad m_i = 2^{i-1} + 1 \text { for } i > 1 \end{aligned}$$

Piecewise linear basis functions \(a_j^i\) can be used for the univariate interpolation formulas \(U^i(f)\):

$$\begin{aligned} a_1^1(x) = 1,\quad a_j^i(x) = \left\{ \begin{array}{ll} 1-(m_i-1)\cdot |x-x_j^i|,\;&{} |x-x_j^i| < 1/(m_i-1),\\ 0, &{} \text {otherwise} \end{array}\right. \end{aligned}$$

for \(i>1\) and \(j =1,\ldots ,m_i\).
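For concreteness, here is a small Python sketch of the node and basis-function formulas above; the helper names cc_nodes and hat are ours, and the level sizes \(m_i\) follow the definitions in this section.

```python
import numpy as np

def cc_nodes(i):
    """Equidistant Clenshaw-Curtis-type nodes x_j^i on [0, 1] (nested across levels)."""
    m = 1 if i == 1 else 2 ** (i - 1) + 1
    return np.array([0.5]) if m == 1 else np.arange(m) / (m - 1)

def hat(i, j, x):
    """Piecewise linear basis a_j^i(x) with support of width 1/(m_i - 1) around x_j^i."""
    m = 1 if i == 1 else 2 ** (i - 1) + 1
    if m == 1:
        return np.ones_like(x)                        # a_1^1 = 1
    xj = (j - 1) / (m - 1)                            # j = 1, ..., m_i
    return np.maximum(0.0, 1.0 - (m - 1) * np.abs(x - xj))
```

For example, cc_nodes(3) returns the five nested nodes 0, 0.25, 0.5, 0.75 and 1.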

Fig. 15: Piecewise linear nodal basis functions (a) and hierarchical basis functions (b) with support nodes \(x_j^i\in X_{\Delta }^i,\;i=1,2,3\), for the Clenshaw–Curtis grid

Fig. 16: Piecewise linear interpolation with a nodal basis (a) versus a hierarchical basis (b)

1.2 Derivation of the hierarchical form

With the selection of nested sets of points, we can easily transform the univariate nodal basis into the hierarchical one. By definition, we have

$$\begin{aligned} \Delta ^i(f)&= U^i(f) - U^{i-1}(f) \\&= \sum _{j=1}^{m_i}f(x_j^i)\cdot a_j^i - \sum _{j=1}^{m_i} U^{i-1}(f)(x_j^i)\cdot a_j^i\\&= \sum _{j=1}^{m_i}\big (f(x_j^i) - U^{i-1}(f)(x_j^i)\big ) \cdot a_j^i \end{aligned}$$

since \(f(x_j^i) - U^{i-1}(f)(x_j^i) = 0,\; \forall \; x_j^i \in X^{i-1}\),

$$\begin{aligned} \Delta ^i(f) = \sum _{x_j^i\in X^i_{\Delta }}\big (f(x_j^i)-U^{i-1}(f)(x_j^i)\big )\cdot a_j^i \end{aligned}$$
(6.1)

From (6.1) we see that \(\Delta ^i(f)\) involves only the basis functions belonging to grid points that have not occurred in any previous set \(X^{i-k},\;1\le k \le i-1\). Figure 15 compares the nodal and the hierarchical basis functions, and Fig. 16 shows the construction of the interpolation formula for a univariate function f using nodal basis functions and function values versus using hierarchical basis functions and hierarchical surpluses. Both figures are reproductions based on Klimke and Wohlmuth (2005).
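A rough one-dimensional sketch of (6.1) is given below: hierarchical surpluses are computed only at the points that are new at each level, using the interpolant built from the lower levels. It reuses the hypothetical cc_nodes and hat helpers from the earlier sketch and is an illustration rather than the spinterp implementation of Klimke and Wohlmuth (2005).

```python
def hierarchical_surpluses(f, max_level):
    """1-D hierarchical surpluses w_j^i = f(x_j^i) - U^{i-1}(f)(x_j^i), Eq. (6.1).
    Returns (level, node, surplus, node index) tuples; a rough sketch only."""
    surpluses = []

    def interp_prev(x, upto_level):
        # Evaluate the hierarchical interpolant built from levels 1..upto_level
        return sum(w * hat(i, j_idx, x) for (i, xj, w, j_idx) in surpluses
                   if i <= upto_level)

    seen = set()
    for i in range(1, max_level + 1):
        for j_idx, xj in enumerate(cc_nodes(i), start=1):
            if round(xj, 12) in seen:                 # keep only points new at level i
                continue
            seen.add(round(xj, 12))
            w = f(xj) - (interp_prev(xj, i - 1) if i > 1 else 0.0)
            surpluses.append((i, xj, w, j_idx))
    return surpluses
```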

Applying the tensor product formula (3.7) with \(\Delta ^i\) given in (6.1), the hierarchical update in the Smolyak algorithm (3.8) can now be rewritten as

$$\begin{aligned} \Delta A_{q,d}(f) =&\sum _{|{\varvec{i}}|=q}(\Delta ^{i_1}\otimes \cdots \otimes \Delta ^{i_d})(f)\\ =&\sum _{|{\varvec{i}}|=q}\sum _{x_{j_1}^{i_1}\in X_{\Delta }^{i_1}}\cdots \sum _{x_{j_d}^{i_d}\in X_{\Delta }^{i_d}}\big (f(x_{j_1}^{i_1},\ldots ,x_{j_d}^{i_d})-A_{q-1,d}(f)(x_{j_1}^{i_1},\ldots ,x_{j_d}^{i_d})\big )\\&\cdot (a_{j_1}^{i_1}\otimes \cdots \otimes a_{j_d}^{i_d}). \end{aligned}$$
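As a structural illustration of this update (not a full Smolyak implementation), the sketch below enumerates the multi-indices with \(|{\varvec{i}}|=q\) and the new grid points contributed by the tensor products of the one-dimensional difference sets \(X_{\Delta }^{i_k}\); it reuses the hypothetical cc_nodes helper from above.

```python
import itertools
import numpy as np

def multi_indices(q, d):
    """All multi-indices i = (i_1, ..., i_d) with i_k >= 1 and |i| = q."""
    return [idx for idx in itertools.product(range(1, q - d + 2), repeat=d)
            if sum(idx) == q]

def new_sparse_grid_points(q, d):
    """Grid points added by the hierarchical update Delta A_{q,d}: tensor products
    of the 1-D difference sets X_Delta^{i_k} over all multi-indices with |i| = q."""
    def delta_nodes(i):                       # X_Delta^i: nodes new at level i
        if i == 1:
            return cc_nodes(1)
        prev = set(np.round(cc_nodes(i - 1), 12))
        return np.array([x for x in cc_nodes(i) if round(x, 12) not in prev])

    pts = []
    for idx in multi_indices(q, d):
        for combo in itertools.product(*(delta_nodes(i) for i in idx)):
            pts.append(combo)
    return np.array(pts)
```

For \(q=3\) and \(d=2\) this yields the multi-indices (1, 2) and (2, 1), i.e. the new points (0.5, 0), (0.5, 1), (0, 0.5) and (1, 0.5).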

Cite this article

Zhang, C., Shahbaba, B. & Zhao, H. Precomputing strategy for Hamiltonian Monte Carlo method based on regularity in parameter space. Comput Stat 32, 253–279 (2017). https://doi.org/10.1007/s00180-016-0683-1
