
Modified Hamiltonian Monte Carlo for Bayesian inference


Abstract

The Hamiltonian Monte Carlo (HMC) method has been recognized as a powerful sampling tool in computational statistics. We show that the performance of HMC can be significantly improved by incorporating importance sampling and an irreversible part of the dynamics into the chain. This is achieved by replacing Hamiltonians in the Metropolis test with modified Hamiltonians and the complete momentum update with a partial momentum refreshment. We call the resulting generalized HMC importance sampler Mix & Match Hamiltonian Monte Carlo (MMHMC). The method is irreversible by construction and further benefits from (i) efficient algorithms for computation of modified Hamiltonians; (ii) an implicit momentum update procedure and (iii) multistage splitting integrators specially derived for methods sampling with modified Hamiltonians. MMHMC has been implemented, tested on popular statistical models and compared in sampling efficiency with HMC, Riemann Manifold Hamiltonian Monte Carlo, Generalized Hybrid Monte Carlo, Generalized Shadow Hybrid Monte Carlo, the Metropolis Adjusted Langevin Algorithm and Random Walk Metropolis–Hastings. To make a fair comparison, we propose a metric that accounts for correlations among samples and weights and can be readily used for all methods that generate such samples. The experiments reveal the superiority of MMHMC over popular sampling techniques, especially for high-dimensional problems.




Acknowledgements

The authors gratefully acknowledge financial support from MTM2016-76329-R (AEI/FEDER, EU), funded by the Spanish Ministry of Economy and Competitiveness (MINECO), and from the Basque Government ELKARTEK Program, Grant KK-2018/00054. This work has been possible thanks to the support of the computing infrastructure of the i2BASQUE academic network and the technical and human support provided by IZO-SGI SGIker of UPV/EHU and European funding (ERDF and ESF). This research is also supported by the Basque Government through the BERC 2018-2021 program and by MINECO: BCAM Severo Ochoa accreditation SEV-2017-0718. This work was also part of the Agile BioFoundry (http://agilebiofoundry.org) supported by the US Department of Energy, Energy Efficiency and Renewable Energy, Bioenergy Technologies Office, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof. Neither the US Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness or usefulness of any information, apparatus, product or process disclosed, or represents that its use would not infringe privately owned rights. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Author information

Correspondence to Tijana Radivojević.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Invariance of the PMMC step

The Partial Momentum Monte Carlo (PMMC) step of the MMHMC method leaves the importance target distribution \(\tilde{\pi }\) (Eq. 12) invariant if its transition kernel \(T(\cdot |\cdot )\) satisfies the condition

$$\begin{aligned} \tilde{\pi }(\varvec{\theta }',\mathbf {p}') =\int \tilde{\pi }(\varvec{\theta },\mathbf {p})T \left( (\varvec{\theta }',\mathbf {p}')|(\varvec{\theta }, \mathbf {p})\right) \mathrm{d}\varvec{\theta } \mathrm{d}\mathbf {p} \end{aligned}$$

for all \((\varvec{\theta }',\mathbf {p}')\).

The PMMC step samples on a space augmented with a noise vector \(\mathbf {u} \sim \mathcal {N}(0,M)\), with the extended density \(\hat{\pi }\) (defined in Eq. 17), for which

$$\begin{aligned} \tilde{\pi }(\varvec{\theta },\mathbf {p})=\int \hat{\pi } (\varvec{\theta },\mathbf {p},\mathbf {u})\mathrm{d}\mathbf {u} . \end{aligned}$$

Therefore, we want to show that

$$\begin{aligned}&\hat{\pi }(\varvec{\theta }',\mathbf {p}',\mathbf {u}')\nonumber \\&\quad =\int \hat{\pi }(\varvec{\theta },\mathbf {p},\mathbf {u}) T\left( (\varvec{\theta }',\mathbf {p}',\mathbf {u}')| (\varvec{\theta },\mathbf {p},\mathbf {u})\right) \mathrm{d} \varvec{\theta } \mathrm{d}\mathbf {p} \mathrm{d}\mathbf {u}, \end{aligned}$$
(28)

for the transition kernel defined as

$$\begin{aligned}&T\left( (\varvec{\theta }',\mathbf {p}',\mathbf {u}')| (\varvec{\theta },\mathbf {p},\mathbf {u})\right) \\&\quad =\mathcal {P}\cdot \delta \left( (\varvec{\theta }', \mathbf {p}',\mathbf {u}')-\mathcal {R}(\varvec{\theta }, \mathbf {p},\mathbf {u}) \right) \\&\qquad +\,(1-\mathcal {P})\cdot \delta \left( (\varvec{\theta }', \mathbf {p}',\mathbf {u}')-\hat{\mathcal {F}}(\varvec{\theta }, \mathbf {p},\mathbf {u})\right) , \end{aligned}$$

where \(\mathcal {P}=\min \left\{ 1,\hat{\pi }(\mathcal {R} (\varvec{\theta },\mathbf {p},\mathbf {u}))/\hat{\pi } (\varvec{\theta },\mathbf {p},\mathbf {u})\right\} \) is the Metropolis probability, \(\delta \) is the Dirac delta function, \(\mathcal {R}\) is the proposal function (Eq. 16) and \(\hat{\mathcal {F}}(\varvec{\theta },\mathbf {p},\mathbf {u}) =(\varvec{\theta },\mathbf {p},-\mathbf {u})\) is the flipping function. Note that the map \(\mathcal {R}\) is volume preserving; hence, the Metropolis probability \(\mathcal {P}\) does not include a Jacobian factor. For the sake of clarity, we denote \(\mathbf {x}=(\varvec{\theta },\mathbf {p},\mathbf {u})\) and write the right-hand side of expression (28) as

$$\begin{aligned}&\int T(\mathbf {x}'|\mathbf {x})\hat{\pi }(\mathbf {x}) \mathrm{d}\mathbf {x}\\&\quad =\underbrace{\int \min \left\{ \hat{\pi }(\mathbf {x}),\hat{\pi } (\mathcal {R}(\mathbf {x})) \right\} \cdot \delta \left( \mathbf {x}' - \mathcal {R}(\mathbf {x}) \right) \mathrm{d}\mathbf {x}}_\text {1st term}\\&\qquad \underbrace{+\int \hat{\pi }(\mathbf {x}) \cdot \delta \left( \mathbf {x}' - \hat{\mathcal {F}}(\mathbf {x}) \right) \mathrm{d}\mathbf {x}}_\text {2nd term}\\&\qquad \underbrace{-\int \min \left\{ \hat{\pi }(\mathbf {x}),\hat{\pi }(\mathcal {R} (\mathbf {x})) \right\} \cdot \delta \left( \mathbf {x}'- \hat{\mathcal {F}} (\mathbf {x}) \right) \mathrm{d}\mathbf {x}}_\text {3rd term}. \end{aligned}$$

Applying the change of variables \(\mathbf {x}=\hat{\mathcal {F}} \circ \mathcal {R}(\bar{\mathbf {x}})\), which is volume preserving, to the first term in the sum, omitting the bars, and using the fact that \(\hat{\mathcal {F}} =\mathcal {R}\circ \hat{\mathcal {F}}\circ \mathcal {R}\), one obtains

$$\begin{aligned} \text {1st term}= & {} \int \min \left\{ \hat{\pi }(\hat{\mathcal {F}} \circ \mathcal {R}(\mathbf {x})),\hat{\pi }(\hat{\mathcal {F}} (\mathbf {x})) \right\} \\&\cdot \delta \left( \mathbf {x}' -\hat{\mathcal {F}}(\mathbf {x}) \right) \mathrm{d}\mathbf {x}. \end{aligned}$$

Since \(\hat{\pi }\circ \hat{\mathcal {F}}=\hat{\pi }\), the first and third terms cancel out. Employing the change of variables \(\mathbf {x}=\hat{\mathcal {F}}(\bar{\mathbf {x}})\) in the second term and again omitting the bars leads to

$$\begin{aligned} \text {2nd term}=\int \hat{\pi }(\mathbf {x}) \cdot \delta \left( \mathbf {x}'- \mathbf {x} \right) \mathrm{d}\mathbf {x} =\hat{\pi }(\mathbf {x}'), \end{aligned}$$

which proves the equality (28).
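
The two identities used in this proof are easy to verify numerically. The sketch below assumes the rotation form of the refreshment map \(\mathcal {R}\) (our reading of Eq. 16, with \(M=I\) and mixing parameter \(\varphi \)); all names are illustrative, not the authors' code. It checks that \(\hat{\mathcal {F}} =\mathcal {R}\circ \hat{\mathcal {F}}\circ \mathcal {R}\) and that \(\mathcal {R}\) acts on \((\mathbf {p},\mathbf {u})\) as an orthogonal (hence volume-preserving) rotation.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = 0.3
c, s = np.sqrt(1.0 - phi), np.sqrt(phi)

def R(x):
    # proposal map: rotate (p, u) by the angle defined by phi (theta unchanged)
    theta, p, u = x
    return (theta, c * p + s * u, -s * p + c * u)

def F(x):
    # flipping map: negate the noise vector
    theta, p, u = x
    return (theta, p, -u)

x = tuple(rng.standard_normal(5) for _ in range(3))

# F = R o F o R, the identity used to cancel the first and third terms
assert all(np.allclose(a, b) for a, b in zip(F(x), R(F(R(x)))))

# R is an orthogonal map on (p, u), hence volume preserving (|det| = 1)
rot = np.array([[c, s], [-s, c]])
assert np.isclose(abs(np.linalg.det(rot)), 1.0)
```

The remaining ingredient, \(\hat{\pi }\circ \hat{\mathcal {F}}=\hat{\pi }\), holds because \(\hat{\pi }\) is even in \(\mathbf {u}\).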

Appendix B: Modified Hamiltonians for splitting integrators

The coefficients for the two-stage integrator family (14) and modified Hamiltonians (22)–(24) are the following:

$$\begin{aligned} c_{21}=&\frac{1}{24}\Big (6b-1\Big ) \nonumber \\ c_{22}=&\frac{1}{12}\Big (6b^2-6b+1\Big )\nonumber \\ c_{41} =&\frac{1}{5760}\Big (7-30b\Big ) \nonumber \\ c_{42} =&\frac{1}{240}\Big (-10b^2+15b-3\Big ) \nonumber \\ c_{43} =&\frac{1}{120}\Big (-30b^3+35b^2-15b+2\Big ) \nonumber \\ c_{44} =&\frac{1}{240}(20b^2-1). \end{aligned}$$
(29)

Using (29), one can also obtain the modified Hamiltonian for the Verlet integrator, since two Verlet steps of size \(h\) are equivalent to one step of the two-stage integrator with \(b=1/4\) and step size \(2h\); accordingly, the \(h^2\) coefficients in (29) evaluated at \(b=1/4\) are multiplied by \(2^2\) and the \(h^4\) coefficients by \(2^4\). The coefficients are therefore

$$\begin{aligned} c_{21}= & {} \frac{1}{12}, \quad c_{22}=- \frac{1}{24} \\ c_{41}= & {} -\frac{1}{720}, \quad c_{42} = \frac{1}{120}, \quad c_{43} = -\frac{1}{240}, \quad c_{44} =\frac{1}{60}. \end{aligned}$$
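
As a sanity check, the Verlet coefficients above can be reproduced from (29) programmatically. The following sketch (exact rational arithmetic; the names are ours) confirms that scaling the \(b=1/4\) coefficients by \(2^2\) for the \(h^2\) terms and by \(2^4\) for the \(h^4\) terms, as dictated by the doubled step size, recovers the values listed.

```python
from fractions import Fraction as Fr

def two_stage_coeffs(b):
    """Modified-Hamiltonian coefficients (29) of the two-stage family."""
    c21 = (6 * b - 1) / 24
    c22 = (6 * b**2 - 6 * b + 1) / 12
    c41 = (7 - 30 * b) / 5760
    c42 = (-10 * b**2 + 15 * b - 3) / 240
    c43 = (-30 * b**3 + 35 * b**2 - 15 * b + 2) / 120
    c44 = (20 * b**2 - 1) / 240
    return c21, c22, c41, c42, c43, c44

# two Verlet steps of size h = one two-stage step (b = 1/4) of size 2h,
# so the h^2 coefficients scale by 2**2 and the h^4 coefficients by 2**4
c = two_stage_coeffs(Fr(1, 4))
verlet = tuple(4 * x for x in c[:2]) + tuple(16 * x for x in c[2:])
assert verlet == (Fr(1, 12), Fr(-1, 24),
                  Fr(-1, 720), Fr(1, 120), Fr(-1, 240), Fr(1, 60))
```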

For three-stage integrators (15) (a two-parameter family), the coefficients are

$$\begin{aligned} c_{21}= & {} \frac{1}{12} \Big (1-6a(1-a)(1-2b)\Big )\\ c_{22}= & {} \frac{1}{24} \Big (6a(1-2b)^2-1\Big )\\ c_{41}= & {} \frac{1}{720}\Big ( 1 + 2 (a-1) a (8 + 31 (a-1) a) (1 - 2 b) - 4 b\Big )\\ c_{42}= & {} \frac{1}{240} \Big (6 a^3 (1 - 2 b)^2 -a^2 (19 - 116 b + 36 b^2 + 240 b^3)\\&+\,a (27 - 208 b + 308 b^2) - 48 b^2 + 48 b -7\Big ) \\ c_{43}= & {} \frac{1}{180}\Big (1 + 15 a (1- 2 b) (-1 + 2 a (2 - 3 b + a (4 b-2)))\Big ) \\ c_{44}= & {} \frac{1}{240} \Big (-1 + 20 a (1 - 2 b) (b + a (1 + 6 (b-1) b))\Big ). \end{aligned}$$

The coefficients for the modified Hamiltonians (25)–(26) are calculated as

$$\begin{aligned} k_{21}= & {} c_{21}, \quad k_{22}=c_{22}, \\ k_{41}= & {} c_{41}, \quad k_{42}=3c_{41} +c_{42}, \\ k_{43}= & {} c_{41} +c_{44}, \quad k_{44}=3c_{41}+c_{42}+c_{43}. \end{aligned}$$
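
In code, this mapping is a direct transcription of the displayed relations (a sketch; the function name is ours):

```python
def k_coeffs(c21, c22, c41, c42, c43, c44):
    """k-coefficients of the modified Hamiltonians (25)-(26) from the c's."""
    return (c21, c22,
            c41, 3 * c41 + c42,
            c41 + c44, 3 * c41 + c42 + c43)
```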

For the fourth-order modified Hamiltonian (25), we use the second-order centered finite difference approximations of time derivatives of the gradient of the potential function

$$\begin{aligned} \mathbf {U}^{(1)}=\frac{\mathbf {U}(t_{n+1}) -\mathbf {U}(t_{n-1})}{2\varepsilon }, \end{aligned}$$
(30)

with \(\varepsilon =h\) for the Verlet integrator, \(\varepsilon =h/2\) for two-stage and \(\varepsilon =ah\) for three-stage integrators, where \(a\) is the integrator coefficient advancing the position variables. The sixth-order modified Hamiltonian (26), here considered only for the Verlet and two-stage integrators, is calculated using a fourth-order approximation for the first derivative and second-order approximations for the second and third derivatives

$$\begin{aligned} \mathbf {U}^{(1)}= & {} \frac{\mathbf {U}(t_{n-2})-8\mathbf {U}(t_{n-1}) +8\mathbf {U}(t_{n+1})-\mathbf {U}(t_{n+2})}{12\varepsilon } \\ \mathbf {U}^{(2)}= & {} \frac{\mathbf {U}(t_{n-1})-2\mathbf {U}(t_n) +\mathbf {U}(t_{n+1})}{\varepsilon ^2}\\ \mathbf {U}^{(3)}= & {} \frac{-\mathbf {U}(t_{n-2})+2\mathbf {U}(t_{n-1}) -2\mathbf {U}(t_{n+1})+\mathbf {U}(t_{n+2})}{2\varepsilon ^3}, \end{aligned}$$

where \(\varepsilon \) depends on the integrator as before. The interpolating polynomial in terms of the gradient of the potential function \(\mathbf {U}(t_i)=U_{\varvec{\theta }} (\varvec{\theta }^i), \; i=n-k,\dots ,n,\dots ,n+k, \; n \in \{0, L\}\), is constructed from a numerical trajectory \(\{U_{\varvec{\theta }}(\varvec{\theta }^i)\}_{i=-k}^{L+k}\), where \(k=1\) and \(k=2\) for the fourth- and sixth-order modified Hamiltonians, respectively.
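
These stencils are straightforward to implement from stored gradients. The sketch below (our names; \(\varepsilon \) is chosen per integrator as stated above) implements the approximation (30) and the three sixth-order stencils, and checks them against the exact derivatives of a cubic trajectory.

```python
import numpy as np

def grad_time_derivative_o2(G, eps):
    """Second-order centered first derivative (30), used for the fourth-order
    modified Hamiltonian; G = [U(t_{n-1}), U(t_n), U(t_{n+1})]."""
    return (G[2] - G[0]) / (2.0 * eps)

def grad_time_derivatives_o6(G, eps):
    """Stencils for the sixth-order modified Hamiltonian; five stored
    gradients G[i] = U(t_{n+i-2}), i = 0..4."""
    U1 = (G[0] - 8.0 * G[1] + 8.0 * G[3] - G[4]) / (12.0 * eps)     # 4th-order
    U2 = (G[1] - 2.0 * G[2] + G[3]) / eps**2                        # 2nd-order
    U3 = (-G[0] + 2.0 * G[1] - 2.0 * G[3] + G[4]) / (2.0 * eps**3)  # 2nd-order
    return U1, U2, U3

# sanity check on the scalar trajectory U(t) = t**3 around t = 0:
# the exact derivatives there are 3t^2 = 0, 6t = 0 and 6
eps = 0.1
G = np.array([(-2 * eps)**3, (-eps)**3, 0.0, eps**3, (2 * eps)**3])
assert np.allclose(grad_time_derivatives_o6(G, eps), (0.0, 0.0, 6.0))
```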

Appendix C: Modified PMMC step

In the modified PMMC step proposed for MMHMC, the partial momentum update is integrated into the modified Metropolis test; i.e., it is implicitly present in the algorithm. This reduces the frequency of derivative calculations in the Metropolis test. To implement this idea, we recall that the momentum update probability

$$\begin{aligned} \mathcal {P}=\min \left\{ 1,\frac{\exp \big (-\hat{H}(\varvec{\theta }, \mathbf {p}^{*},\mathbf {u}^{*})\big )}{\exp \big (-\hat{H}(\varvec{\theta }, \mathbf {p},\mathbf {u})\big )}\right\} \end{aligned}$$
(31)

depends on the error in the extended Hamiltonian (18). Let us first consider the fourth-order modified Hamiltonian (22) with analytical derivatives of the potential function. It is easy to show that the difference in the extended Hamiltonian (18) between a current state and a state with partially updated momentum is

$$\begin{aligned} \varDelta {\hat{H}}= & {} {U(\varvec{\theta })} + \frac{1}{2} (\mathbf {p}^{*})^\mathrm{T} M^{-1} \mathbf {p}^{*}\nonumber \\&+\, h^2c_{21}(\mathbf {p}^{*})^\mathrm{T} M^{-1}U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) M^{-1} \mathbf {p}^{*}\nonumber \\&+\, {h^2c_{22}U_{\varvec{\theta } }(\varvec{\theta } ) M^{-1} U_{\varvec{\theta }}(\varvec{\theta })} + \frac{1}{2} (\mathbf {u}^{*})^\mathrm{T} M^{-1}\mathbf {u}^{*} \nonumber \\&-\, {U(\varvec{\theta })} - \frac{1}{2} \mathbf {p}^\mathrm{T} M^{-1} \mathbf {p} - h^2c_{21}\mathbf {p}^\mathrm{T} M^{-1}U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) M^{-1} \mathbf {p} \nonumber \\&-\, {h^2c_{22}U_{\varvec{\theta }}(\varvec{\theta }) M^{-1} U_{\varvec{\theta }}(\varvec{\theta })} - \frac{1}{2} \mathbf {u}^\mathrm{T} M^{-1} \mathbf {u} \nonumber \\= & {} {h^2}c_{21}\Big (\varphi A + 2\sqrt{\varphi (1-\varphi )}B \Big ) \end{aligned}$$
(32)

with

$$\begin{aligned} A= & {} (\mathbf {u}-\mathbf {p})^\mathrm{T} U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) (\mathbf {u}+\mathbf {p}) \nonumber \\ B= & {} \mathbf {u}^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}. \end{aligned}$$
(33)

For the sixth-order modified Hamiltonian (24) for Gaussian problems, the error in the extended Hamiltonian (18) can be calculated in a similar manner

$$\begin{aligned} \varDelta {\hat{H}}= & {} {h^2}c_{21}\Big (\varphi (A-B) + 2\sqrt{\varphi (1-\varphi )}C \Big )\nonumber \\&+\,{h^4}c_{44}\Big (\varphi (D-E) + 2\sqrt{\varphi (1-\varphi )}F \Big ), \end{aligned}$$
(34)

with

$$\begin{aligned} A&=\mathbf {u}^\mathrm{T} U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } )\mathbf {u} \\ B&=\mathbf {p}^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}\\ C&=\mathbf {u}^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}\\ D&=(U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) \mathbf {u})^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {u}\\ E&=(U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) \mathbf {p})^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}\\ F&=(U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) \mathbf {u})^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}. \end{aligned}$$

Therefore, if the modified Hamiltonians (22)–(24) with analytical derivatives are used, a new momentum can be determined as

$$\begin{aligned} \bar{\mathbf {p}}=\left\{ \begin{array}{l l} \sqrt{1-{\varphi }}\mathbf {p}+\sqrt{{\varphi }}\mathbf {u} &{} \text{ with } \text{ probability } \\ &{} \qquad \mathcal {P}= \min \{1,\exp (-\varDelta \hat{H})\}\\ \mathbf {p} &{} \text{ otherwise, } \end{array}\right. \end{aligned}$$
(35)

where \(\mathbf {u} \sim \mathcal {N}(0,M)\) is the noise vector, \(\varphi \in \left( 0,1\right] \) and \(\varDelta \hat{H}\) is defined as in (32) or (34).

Consequently, for models with no hierarchical structure there is no need to calculate gradients within the PMMC step; the second derivatives can be taken from the previous Metropolis test within the HDMC step, and there is no need to generate \(\mathbf {u}^{*}\).
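
As an illustration, here is a minimal sketch of the implicit momentum update (35) with \(\varDelta \hat{H}\) computed from (32)–(33), under the simplifying assumption of a unit mass matrix \(M=I\); the function and argument names (e.g., hess for the Hessian \(U_{\varvec{\theta }\varvec{\theta }}\)) are ours, not the authors' implementation.

```python
import numpy as np

def pmmc_update(theta, p, hess, h, c21, phi, rng):
    """Implicit partial momentum refreshment: accept the rotated momentum
    with probability (35), where Delta-H follows from (32)-(33) with M = I."""
    u = rng.standard_normal(p.shape)          # noise vector u ~ N(0, I)
    Hq = hess(theta)                          # Hessian of the potential
    A = (u - p) @ Hq @ (u + p)                # Eq. (33)
    B = u @ Hq @ p
    dH = h**2 * c21 * (phi * A + 2.0 * np.sqrt(phi * (1.0 - phi)) * B)
    if rng.random() < min(1.0, np.exp(-dH)):  # Metropolis probability (31)
        return np.sqrt(1.0 - phi) * p + np.sqrt(phi) * u
    return p                                  # otherwise keep the momentum
```

Note that the Hessian evaluated here can be reused from the preceding HDMC Metropolis test, which is precisely the saving discussed above.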

If the modified Hamiltonians are calculated using numerical time derivatives of the gradient of the potential function (as in (25)–(26) for the Verlet, two- and three-stage integrators), the difference in the fourth-order extended Hamiltonian becomes

$$\begin{aligned} \varDelta {\hat{H}}= {h}k_{21}\Big ((\mathbf {p}^{*})^\mathrm{T} P^{*}_1 -\mathbf {p}^\mathrm{T}P_1 \Big ), \end{aligned}$$
(36)

whereas for the sixth-order extended Hamiltonian it is

$$\begin{aligned} \varDelta {\hat{H}}= & {} hk_{21}\Big ((\mathbf {p}^{*})^\mathrm{T} P^{*}_1 -\mathbf {p}^\mathrm{T}P_1 \Big )\\&+\,hk_{41}\Big ((\mathbf {p}^{*})^\mathrm{T} P^{*}_3 -\mathbf {p}^\mathrm{T}P_3 \Big )\\&+\,h^2k_{42}\Big (U_{\varvec{\theta }}^\mathrm{T} P^{*}_2 - U_{\varvec{\theta }}^\mathrm{T}P_2 \Big )\\&+\,h^2k_{43}\Big ((P^{*}_1)^\mathrm{T} P^{*}_1 -P_1^\mathrm{T}P_1 \Big ). \end{aligned}$$

Here, \(P^{*}_1, P^{*}_2, P^{*}_3\) are the first-, second- and third-order scaled time derivatives of the gradient, respectively (see Sect. 2.3.3), calculated from the trajectory with updated momentum \(\mathbf {p}^{*}\). The computational gain of the new PMMC step in this case results from skipping the calculation of the terms multiplying \(k_{22}\) in (25) and \(k_{44}\) in (26). Admittedly, the term multiplying \(k_{22}\) in (25) is of negligible cost, so the gain from the new momentum update is not as significant as in the case of modified Hamiltonians with analytical derivatives. In contrast, the saving arising from the absence of the term multiplying \(k_{44}\) in the sixth-order modified Hamiltonian (26) is essential.

In summary, in the case of the sixth-order modified Hamiltonian, with derivatives calculated either analytically or numerically, the proposed momentum refreshment enhances the computational performance of MMHMC. The same holds when the fourth-order modified Hamiltonian with analytical derivatives is used. In this situation, however, if the Hessian matrix of the potential function is dense, we recommend the modified Hamiltonian with numerical derivatives instead of analytical ones, even though the saving from the new momentum update is then negligible. On the other hand, if the Hessian matrix is not very costly to compute (e.g., if it is block diagonal, sparse or close to constant), it might be more efficient to use analytical derivatives, for which the new formulation of the Metropolis test leads to computational savings.

Appendix D: Algorithmic summary

We provide the algorithmic summary for HMC (Algorithm 1) and two alternative algorithms for the MMHMC method. One (Algorithm 2) uses the modified Hamiltonians defined through analytical derivatives of the potential function and is recommended for problems with sparse Hessian matrices. The other (Algorithm 3) relies on the modified Hamiltonians expressed through numerical time derivatives of the gradient of the potential function; although it includes an additional integration step, it is beneficial when higher order derivatives are computationally demanding. A compact sketch of the MMHMC iteration follows below.
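
To make the summaries concrete, here is a minimal end-to-end sketch of the MMHMC iteration in the spirit of Algorithm 2, under simplifying assumptions of ours: unit mass matrix, Verlet integrator with its fourth-order modified Hamiltonian (analytical derivatives), momentum flip on rejection, and the pmmc_update sketch from Appendix C. U, grad and hess are user-supplied callables; this is not the authors' reference code.

```python
import numpy as np

C21, C22 = 1.0 / 12.0, -1.0 / 24.0   # Verlet coefficients (Appendix B)

def leapfrog(theta, p, grad, h, L):
    """L steps of velocity Verlet with step size h."""
    p = p - 0.5 * h * grad(theta)
    for _ in range(L - 1):
        theta = theta + h * p
        p = p - h * grad(theta)
    theta = theta + h * p
    p = p - 0.5 * h * grad(theta)
    return theta, p

def h_true(theta, p, U):
    """True Hamiltonian H = U(theta) + p'p/2 (M = I)."""
    return U(theta) + 0.5 * (p @ p)

def h_mod(theta, p, U, grad, hess, h):
    """Fourth-order modified Hamiltonian for Verlet, cf. (22) and (32)."""
    g = grad(theta)
    return (h_true(theta, p, U)
            + h**2 * C21 * (p @ hess(theta) @ p)
            + h**2 * C22 * (g @ g))

def mmhmc(theta0, U, grad, hess, h, L, phi, n_iter, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    p = rng.standard_normal(theta.shape)
    samples, logw = [], []
    for _ in range(n_iter):
        # PMMC step (Appendix C): implicit partial momentum refreshment
        p = pmmc_update(theta, p, hess, h, C21, phi, rng)
        # HDMC step: Verlet trajectory + Metropolis test on H-modified
        th_new, p_new = leapfrog(theta, p, grad, h, L)
        dH = (h_mod(th_new, p_new, U, grad, hess, h)
              - h_mod(theta, p, U, grad, hess, h))
        if rng.random() < min(1.0, np.exp(-dH)):
            theta, p = th_new, p_new
        else:
            p = -p                     # momentum flip on rejection
        samples.append(theta.copy())
        # weight w ~ exp(H_mod - H) re-weights samples from pi-tilde to pi
        logw.append(h_mod(theta, p, U, grad, hess, h) - h_true(theta, p, U))
    w = np.exp(np.asarray(logw) - np.max(logw))
    return np.asarray(samples), w / w.sum()
```

Posterior expectations are then weighted averages over the returned chain, e.g. (weights[:, None] * samples).sum(axis=0) for the posterior mean.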

Appendix E: Experimental setup

See Tables 5, 6 and 7.

Table 5 Parameter values used for the multivariate Gaussian model experiments
Table 6 Parameter values used for the Bayesian logistic regression model experiments
Table 7 Parameter values used for the stochastic volatility model experiments


About this article


Cite this article

Radivojević, T., Akhmatskaya, E. Modified Hamiltonian Monte Carlo for Bayesian inference. Stat Comput 30, 377–404 (2020). https://doi.org/10.1007/s11222-019-09885-x
