
Modified Hamiltonian Monte Carlo for Bayesian inference


Abstract

The Hamiltonian Monte Carlo (HMC) method has been recognized as a powerful sampling tool in computational statistics. We show that the performance of HMC can be significantly improved by incorporating importance sampling and an irreversible part of the dynamics into the chain. This is achieved by replacing Hamiltonians in the Metropolis test with modified Hamiltonians and the complete momentum update with a partial momentum refreshment. We call the resulting generalized HMC importance sampler Mix & Match Hamiltonian Monte Carlo (MMHMC). The method is irreversible by construction and further benefits from (i) efficient algorithms for computation of modified Hamiltonians; (ii) an implicit momentum update procedure and (iii) multistage splitting integrators specially derived for methods sampling with modified Hamiltonians. MMHMC has been implemented, tested on popular statistical models and compared in sampling efficiency with HMC, Riemann Manifold Hamiltonian Monte Carlo, Generalized Hybrid Monte Carlo, Generalized Shadow Hybrid Monte Carlo, the Metropolis Adjusted Langevin Algorithm and Random Walk Metropolis–Hastings. To make a fair comparison, we propose a metric that accounts for correlations among samples and weights and can be readily used for all methods that generate such samples. The experiments reveal the superiority of MMHMC over popular sampling techniques, especially for high-dimensional problems.




Acknowledgements

The authors gratefully acknowledge financial support from MTM2016-76329-R (AEI/FEDER, EU), funded by the Spanish Ministry of Economy and Competitiveness (MINECO), and from the Basque Government ELKARTEK Program, Grant KK-2018/00054. This work has been possible thanks to the support of the computing infrastructure of the i2BASQUE academic network and the technical and human support provided by IZO-SGI SGIker of UPV/EHU and European funding (ERDF and ESF). This research is also supported by the Basque Government through the BERC 2018-2021 program and by MINECO: BCAM Severo Ochoa accreditation SEV-2017-0718. This work was also part of the Agile BioFoundry (http://agilebiofoundry.org) supported by the US Department of Energy, Energy Efficiency and Renewable Energy, Bioenergy Technologies Office, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof. Neither the US Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness or usefulness of any information, apparatus, product or process disclosed, or represents that its use would not infringe privately owned rights. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Author information

Correspondence to Tijana Radivojević.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Invariance of the PMMC step

The Partial Momentum Monte Carlo (PMMC) step of the MMHMC method leaves the importance target distribution \(\tilde{\pi }\) (Eq. 12) invariant if its transition kernel \(T(\cdot |\cdot )\) satisfies the condition

$$\begin{aligned} \tilde{\pi }(\varvec{\theta }',\mathbf {p}') =\int \tilde{\pi }(\varvec{\theta },\mathbf {p})T \left( (\varvec{\theta }',\mathbf {p}')|(\varvec{\theta }, \mathbf {p})\right) \mathrm{d}\varvec{\theta } \mathrm{d}\mathbf {p} \end{aligned}$$

for all \((\varvec{\theta }',\mathbf {p}')\).

The PMMC step samples on a space augmented with a noise vector \(\mathbf {u} \sim \mathcal {N}(0,M)\), with the extended density \(\hat{\pi }\) (defined in Eq. 17), for which

$$\begin{aligned} \tilde{\pi }(\varvec{\theta },\mathbf {p})=\int \hat{\pi } (\varvec{\theta },\mathbf {p},\mathbf {u})\mathrm{d}\mathbf {u} . \end{aligned}$$

Therefore, we want to show that

$$\begin{aligned}&\hat{\pi }(\varvec{\theta }',\mathbf {p}',\mathbf {u}')\nonumber \\&\quad =\int \hat{\pi }(\varvec{\theta },\mathbf {p},\mathbf {u}) T\left( (\varvec{\theta }',\mathbf {p}',\mathbf {u}')| (\varvec{\theta },\mathbf {p},\mathbf {u})\right) \mathrm{d} \varvec{\theta } \mathrm{d}\mathbf {p} \mathrm{d}\mathbf {u}, \end{aligned}$$
(28)

for the transition kernel defined as

$$\begin{aligned}&T\left( (\varvec{\theta }',\mathbf {p}',\mathbf {u}')| (\varvec{\theta },\mathbf {p},\mathbf {u})\right) \\&\quad =\mathcal {P}\cdot \delta \left( (\varvec{\theta }', \mathbf {p}',\mathbf {u}')-\mathcal {R}(\varvec{\theta }, \mathbf {p},\mathbf {u}) \right) \\&\qquad +\,(1-\mathcal {P})\cdot \delta \left( (\varvec{\theta }', \mathbf {p}',\mathbf {u}')-\hat{\mathcal {F}}(\varvec{\theta }, \mathbf {p},\mathbf {u})\right) , \end{aligned}$$

where \(\mathcal {P}=\min \left\{ 1,\hat{\pi }(\mathcal {R} (\varvec{\theta },\mathbf {p},\mathbf {u}))/\hat{\pi } (\varvec{\theta },\mathbf {p},\mathbf {u})\right\} \) is the Metropolis probability, \(\delta \) is the Dirac delta function, \(\mathcal {R}\) is the proposal function (Eq. 16) and \(\hat{\mathcal {F}}(\varvec{\theta },\mathbf {p},\mathbf {u}) =(\varvec{\theta },\mathbf {p},-\mathbf {u})\) is the flipping function. Note that the map \(\mathcal {R}\) is volume preserving; hence, the Metropolis probability \(\mathcal {P}\) does not include a Jacobian factor. For the sake of clarity, we denote \(\mathbf {x}=(\varvec{\theta },\mathbf {p},\mathbf {u})\) and write the right-hand side of expression (28) as

$$\begin{aligned}&\int T(\mathbf {x}'|\mathbf {x})\hat{\pi }(\mathbf {x}) \mathrm{d}\mathbf {x}\\&\quad =\underbrace{\int \min \left\{ \hat{\pi }(\mathbf {x}),\hat{\pi } (\mathcal {R}(\mathbf {x})) \right\} \cdot \delta \left( \mathbf {x}' - \mathcal {R}(\mathbf {x}) \right) \mathrm{d}\mathbf {x}}_\text {1st term}\\&\qquad \underbrace{+\int \hat{\pi }(\mathbf {x}) \cdot \delta \left( \mathbf {x}' - \hat{\mathcal {F}}(\mathbf {x}) \right) \mathrm{d}\mathbf {x}}_\text {2nd term}\\&\qquad \underbrace{-\int \min \left\{ \hat{\pi }(\mathbf {x}),\hat{\pi }(\mathcal {R} (\mathbf {x})) \right\} \cdot \delta \left( \mathbf {x}'- \hat{\mathcal {F}} (\mathbf {x}) \right) \mathrm{d}\mathbf {x}}_\text {3rd term}. \end{aligned}$$

Applying the change of variables \(\mathbf {x}=\hat{\mathcal {F}} \circ \mathcal {R}(\bar{\mathbf {x}})\), which is volume preserving, to the first term in the sum, omitting the bars, and using the fact that \(\hat{\mathcal {F}} =\mathcal {R}\circ \hat{\mathcal {F}}\circ \mathcal {R}\), one obtains

$$\begin{aligned} \text {1st term}= & {} \int \min \left\{ \hat{\pi }(\hat{\mathcal {F}} \circ \mathcal {R}(\mathbf {x})),\hat{\pi }(\hat{\mathcal {F}} (\mathbf {x})) \right\} \\&\cdot \delta \left( \mathbf {x}' -\hat{\mathcal {F}}(\mathbf {x}) \right) \mathrm{d}\mathbf {x}. \end{aligned}$$

Since \(\hat{\pi }\circ \hat{\mathcal {F}}=\hat{\pi }\), the first and third terms cancel out. Employing the change of variables \(\mathbf {x}=\hat{\mathcal {F}}(\bar{\mathbf {x}})\) in the second term and again omitting the bars leads to

$$\begin{aligned} \text {2nd term}=\int \hat{\pi }(\mathbf {x}) \cdot \delta \left( \mathbf {x}'- \mathbf {x} \right) \mathrm{d}\mathbf {x} =\hat{\pi }(\mathbf {x}'), \end{aligned}$$

which proves the equality (28).
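
The two identities used in this proof are easy to verify numerically. The sketch below assumes the rotation form of the refreshment map \(\mathcal {R}\) (our reading of Eq. 16, with \(M=I\) and mixing parameter \(\varphi \)); all names are illustrative, not the authors' code. It checks that \(\hat{\mathcal {F}} =\mathcal {R}\circ \hat{\mathcal {F}}\circ \mathcal {R}\) and that \(\mathcal {R}\) acts on \((\mathbf {p},\mathbf {u})\) as an orthogonal (hence volume-preserving) rotation.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = 0.3
c, s = np.sqrt(1.0 - phi), np.sqrt(phi)

def R(x):
    # proposal map: rotate (p, u) by the angle defined by phi (theta unchanged)
    theta, p, u = x
    return (theta, c * p + s * u, -s * p + c * u)

def F(x):
    # flipping map: negate the noise vector
    theta, p, u = x
    return (theta, p, -u)

x = tuple(rng.standard_normal(5) for _ in range(3))

# F = R o F o R, the identity used to cancel the first and third terms
assert all(np.allclose(a, b) for a, b in zip(F(x), R(F(R(x)))))

# R is an orthogonal map on (p, u), hence volume preserving (|det| = 1)
rot = np.array([[c, s], [-s, c]])
assert np.isclose(abs(np.linalg.det(rot)), 1.0)
```

The remaining ingredient, \(\hat{\pi }\circ \hat{\mathcal {F}}=\hat{\pi }\), holds because \(\hat{\pi }\) is even in \(\mathbf {u}\).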

Appendix B: Modified Hamiltonians for splitting integrators

The coefficients for the two-stage integrator family (14) and modified Hamiltonians (22)–(24) are the following:

$$\begin{aligned} c_{21}=&\frac{1}{24}\Big (6b-1\Big ) \nonumber \\ c_{22}=&\frac{1}{12}\Big (6b^2-6b+1\Big )\nonumber \\ c_{41} =&\frac{1}{5760}\Big (7-30b\Big ) \nonumber \\ c_{42} =&\frac{1}{240}\Big (-10b^2+15b-3\Big ) \nonumber \\ c_{43} =&\frac{1}{120}\Big (-30b^3+35b^2-15b+2\Big ) \nonumber \\ c_{44} =&\frac{1}{240}(20b^2-1). \end{aligned}$$
(29)

Using (29), one can also obtain the modified Hamiltonian for the Verlet integrator, since two Verlet steps of size \(h\) are equivalent to one step of the two-stage integrator with \(b=1/4\) and step size \(2h\); accordingly, the \(h^2\) coefficients in (29) evaluated at \(b=1/4\) are multiplied by \(2^2\) and the \(h^4\) coefficients by \(2^4\). The coefficients are therefore

$$\begin{aligned} c_{21}= & {} \frac{1}{12}, \quad c_{22}=- \frac{1}{24} \\ c_{41}= & {} -\frac{1}{720}, \quad c_{42} = \frac{1}{120}, \quad c_{43} = -\frac{1}{240}, \quad c_{44} =\frac{1}{60}. \end{aligned}$$
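
As a sanity check, the Verlet coefficients above can be reproduced from (29) programmatically. The following sketch (exact rational arithmetic; the names are ours) confirms that scaling the \(b=1/4\) coefficients by \(2^2\) for the \(h^2\) terms and by \(2^4\) for the \(h^4\) terms, as dictated by the doubled step size, recovers the values listed.

```python
from fractions import Fraction as Fr

def two_stage_coeffs(b):
    """Modified-Hamiltonian coefficients (29) of the two-stage family."""
    c21 = (6 * b - 1) / 24
    c22 = (6 * b**2 - 6 * b + 1) / 12
    c41 = (7 - 30 * b) / 5760
    c42 = (-10 * b**2 + 15 * b - 3) / 240
    c43 = (-30 * b**3 + 35 * b**2 - 15 * b + 2) / 120
    c44 = (20 * b**2 - 1) / 240
    return c21, c22, c41, c42, c43, c44

# two Verlet steps of size h = one two-stage step (b = 1/4) of size 2h,
# so the h^2 coefficients scale by 2**2 and the h^4 coefficients by 2**4
c = two_stage_coeffs(Fr(1, 4))
verlet = tuple(4 * x for x in c[:2]) + tuple(16 * x for x in c[2:])
assert verlet == (Fr(1, 12), Fr(-1, 24),
                  Fr(-1, 720), Fr(1, 120), Fr(-1, 240), Fr(1, 60))
```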

For three-stage integrators (15) (a two-parameter family), the coefficients are

$$\begin{aligned} c_{21}= & {} \frac{1}{12} \Big (1-6a(1-a)(1-2b)\Big )\\ c_{22}= & {} \frac{1}{24} \Big (6a(1-2b)^2-1\Big )\\ c_{41}= & {} \frac{1}{720}\Big ( 1 + 2 (a-1) a (8 + 31 (a-1) a) (1 - 2 b) - 4 b\Big )\\ c_{42}= & {} \frac{1}{240} \Big (6 a^3 (1 - 2 b)^2 -a^2 (19 - 116 b + 36 b^2 + 240 b^3)\\&+\,a (27 - 208 b + 308 b^2) - 48 b^2 + 48 b -7\Big ) \\ c_{43}= & {} \frac{1}{180}\Big (1 + 15 a (1- 2 b) (-1 + 2 a (2 - 3 b + a (4 b-2)))\Big ) \\ c_{44}= & {} \frac{1}{240} \Big (-1 + 20 a (1 - 2 b) (b + a (1 + 6 (b-1) b))\Big ). \end{aligned}$$

The coefficients for the modified Hamiltonians (25)–(26) are calculated as

$$\begin{aligned} k_{21}= & {} c_{21}, \quad k_{22}=c_{22}, \\ k_{41}= & {} c_{41}, \quad k_{42}=3c_{41} +c_{42}, \\ k_{43}= & {} c_{41} +c_{44}, \quad k_{44}=3c_{41}+c_{42}+c_{43}. \end{aligned}$$
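
In code, this mapping is a direct transcription of the displayed relations (a sketch; the function name is ours):

```python
def k_coeffs(c21, c22, c41, c42, c43, c44):
    """k-coefficients of the modified Hamiltonians (25)-(26) from the c's."""
    return (c21, c22,
            c41, 3 * c41 + c42,
            c41 + c44, 3 * c41 + c42 + c43)
```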

For the fourth-order modified Hamiltonian (25), we use the second-order centered finite difference approximations of time derivatives of the gradient of the potential function

$$\begin{aligned} \mathbf {U}^{(1)}=\frac{\mathbf {U}(t_{n+1}) -\mathbf {U}(t_{n-1})}{2\varepsilon }, \end{aligned}$$
(30)

with \(\varepsilon =h\) for the Verlet integrator, \(\varepsilon =h/2\) for two-stage and \(\varepsilon =ah\) for three-stage integrators, where \(a\) is the integrator coefficient advancing the position variables. The sixth-order modified Hamiltonian (26), here considered only for the Verlet and two-stage integrators, is calculated using a fourth-order approximation for the first derivative and second-order approximations for the second and third derivatives

$$\begin{aligned} \mathbf {U}^{(1)}= & {} \frac{\mathbf {U}(t_{n-2})-8\mathbf {U}(t_{n-1}) +8\mathbf {U}(t_{n+1})-\mathbf {U}(t_{n+2})}{12\varepsilon } \\ \mathbf {U}^{(2)}= & {} \frac{\mathbf {U}(t_{n-1})-2\mathbf {U}(t_n) +\mathbf {U}(t_{n+1})}{\varepsilon ^2}\\ \mathbf {U}^{(3)}= & {} \frac{-\mathbf {U}(t_{n-2})+2\mathbf {U}(t_{n-1}) -2\mathbf {U}(t_{n+1})+\mathbf {U}(t_{n+2})}{2\varepsilon ^3}, \end{aligned}$$

where \(\varepsilon \) depends on the integrator as before. The interpolating polynomial in terms of the gradient of the potential function \(\mathbf {U}(t_i)=U_{\varvec{\theta }} (\varvec{\theta }^i), \; i=n-k,\dots ,n,\dots ,n+k, \; n \in \{0, L\}\), is constructed from a numerical trajectory \(\{U_{\varvec{\theta }}(\varvec{\theta }^i)\}_{i=-k}^{L+k}\), where \(k=1\) and \(k=2\) for the fourth- and sixth-order modified Hamiltonians, respectively.
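
These stencils are straightforward to implement from stored gradients. The sketch below (our names; \(\varepsilon \) is chosen per integrator as stated above) implements the approximation (30) and the three sixth-order stencils, and checks them against the exact derivatives of a cubic trajectory.

```python
import numpy as np

def grad_time_derivative_o2(G, eps):
    """Second-order centered first derivative (30), used for the fourth-order
    modified Hamiltonian; G = [U(t_{n-1}), U(t_n), U(t_{n+1})]."""
    return (G[2] - G[0]) / (2.0 * eps)

def grad_time_derivatives_o6(G, eps):
    """Stencils for the sixth-order modified Hamiltonian; five stored
    gradients G[i] = U(t_{n+i-2}), i = 0..4."""
    U1 = (G[0] - 8.0 * G[1] + 8.0 * G[3] - G[4]) / (12.0 * eps)     # 4th-order
    U2 = (G[1] - 2.0 * G[2] + G[3]) / eps**2                        # 2nd-order
    U3 = (-G[0] + 2.0 * G[1] - 2.0 * G[3] + G[4]) / (2.0 * eps**3)  # 2nd-order
    return U1, U2, U3

# sanity check on the scalar trajectory U(t) = t**3 around t = 0:
# the exact derivatives there are 3t^2 = 0, 6t = 0 and 6
eps = 0.1
G = np.array([(-2 * eps)**3, (-eps)**3, 0.0, eps**3, (2 * eps)**3])
assert np.allclose(grad_time_derivatives_o6(G, eps), (0.0, 0.0, 6.0))
```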

Appendix C: Modified PMMC step

In the modified PMMC step proposed for MMHMC, the partial momentum update is integrated into the modified Metropolis test; i.e., it is implicitly present in the algorithm. This reduces the frequency of derivative calculations in the Metropolis test. To implement this idea, we recall that the momentum update probability

$$\begin{aligned} \mathcal {P}=\min \left\{ 1,\frac{\exp \big (-\hat{H}(\varvec{\theta }, \mathbf {p}^{*},\mathbf {u}^{*})\big )}{\exp \big (-\hat{H}(\varvec{\theta }, \mathbf {p},\mathbf {u})\big )}\right\} \end{aligned}$$
(31)

depends on the error in the extended Hamiltonian (18). Let us first consider the fourth-order modified Hamiltonian (22) with analytical derivatives of the potential function. It is easy to show that the difference in the extended Hamiltonian (18) between a current state and a state with partially updated momentum is

$$\begin{aligned} \varDelta {\hat{H}}= & {} {U(\varvec{\theta })} + \frac{1}{2} (\mathbf {p}^{*})^\mathrm{T} M^{-1} \mathbf {p}^{*}\nonumber \\&+\, h^2c_{21}(\mathbf {p}^{*})^\mathrm{T} M^{-1}U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) M^{-1} \mathbf {p}^{*}\nonumber \\&+\, {h^2c_{22}U_{\varvec{\theta } }(\varvec{\theta } ) M^{-1} U_{\varvec{\theta }}(\varvec{\theta })} + \frac{1}{2} (\mathbf {u}^{*})^\mathrm{T} M^{-1}\mathbf {u}^{*} \nonumber \\&-\, {U(\varvec{\theta })} - \frac{1}{2} \mathbf {p}^\mathrm{T} M^{-1} \mathbf {p} - h^2c_{21}\mathbf {p}^\mathrm{T} M^{-1}U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) M^{-1} \mathbf {p} \nonumber \\&-\, {h^2c_{22}U_{\varvec{\theta }}(\varvec{\theta }) M^{-1} U_{\varvec{\theta }}(\varvec{\theta })} - \frac{1}{2} \mathbf {u}^\mathrm{T} M^{-1} \mathbf {u} \nonumber \\= & {} {h^2}c_{21}\Big (\varphi A + 2\sqrt{\varphi (1-\varphi )}B \Big ) \end{aligned}$$
(32)

with

$$\begin{aligned} A= & {} (\mathbf {u}-\mathbf {p})^\mathrm{T} U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) (\mathbf {u}+\mathbf {p}) \nonumber \\ B= & {} \mathbf {u}^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}. \end{aligned}$$
(33)

For the sixth-order modified Hamiltonian (24) for Gaussian problems, the error in the extended Hamiltonian (18) can be calculated in a similar manner

$$\begin{aligned} \varDelta {\hat{H}}= & {} {h^2}c_{21}\Big (\varphi (A-B) + 2\sqrt{\varphi (1-\varphi )}C \Big )\nonumber \\&+\,{h^4}c_{44}\Big (\varphi (D-E) + 2\sqrt{\varphi (1-\varphi )}F \Big ), \end{aligned}$$
(34)

with

$$\begin{aligned} A&=\mathbf {u}^\mathrm{T} U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } )\mathbf {u} \\ B&=\mathbf {p}^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}\\ C&=\mathbf {u}^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}\\ D&=(U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) \mathbf {u})^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {u}\\ E&=(U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) \mathbf {p})^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}\\ F&=(U_{\varvec{\theta } \varvec{\theta } }(\varvec{\theta } ) \mathbf {u})^\mathrm{T}U_{\varvec{\theta } \varvec{\theta } } (\varvec{\theta } ) \mathbf {p}. \end{aligned}$$

Therefore, if the modified Hamiltonians (22)–(24) with analytical derivatives are used, a new momentum can be determined as

$$\begin{aligned} \bar{\mathbf {p}}=\left\{ \begin{array}{l l} \sqrt{1-{\varphi }}\mathbf {p}+\sqrt{{\varphi }}\mathbf {u} &{} \text{ with } \text{ probability } \\ &{} \qquad \mathcal {P}= \min \{1,\exp (-\varDelta \hat{H})\}\\ \mathbf {p} &{} \text{ otherwise, } \end{array}\right. \end{aligned}$$
(35)

where \(\mathbf {u} \sim \mathcal {N}(0,M)\) is the noise vector, \(\varphi \in \left( 0,1\right] \) and \(\varDelta \hat{H}\) is defined as in (32) or (34).

Consequently, for models with no hierarchical structure there is no need to calculate gradients within the PMMC step; the second derivatives can be taken from the previous Metropolis test within the HDMC step, and there is no need to generate \(\mathbf {u}^{*}\).
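
As an illustration, here is a minimal sketch of the implicit momentum update (35) with \(\varDelta \hat{H}\) computed from (32)–(33), under the simplifying assumption of a unit mass matrix \(M=I\); the function and argument names (e.g., hess for the Hessian \(U_{\varvec{\theta }\varvec{\theta }}\)) are ours, not the authors' implementation.

```python
import numpy as np

def pmmc_update(theta, p, hess, h, c21, phi, rng):
    """Implicit partial momentum refreshment: accept the rotated momentum
    with probability (35), where Delta-H follows from (32)-(33) with M = I."""
    u = rng.standard_normal(p.shape)          # noise vector u ~ N(0, I)
    Hq = hess(theta)                          # Hessian of the potential
    A = (u - p) @ Hq @ (u + p)                # Eq. (33)
    B = u @ Hq @ p
    dH = h**2 * c21 * (phi * A + 2.0 * np.sqrt(phi * (1.0 - phi)) * B)
    if rng.random() < min(1.0, np.exp(-dH)):  # Metropolis probability (31)
        return np.sqrt(1.0 - phi) * p + np.sqrt(phi) * u
    return p                                  # otherwise keep the momentum
```

Note that the Hessian evaluated here can be reused from the preceding HDMC Metropolis test, which is precisely the saving discussed above.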

If the modified Hamiltonians are calculated using numerical time derivatives of the gradient of the potential function (as in (25)–(26) for the Verlet, two- and three-stage integrators), the difference in the fourth-order extended Hamiltonian becomes

$$\begin{aligned} \varDelta {\hat{H}}= {h}k_{21}\Big ((\mathbf {p}^{*})^\mathrm{T} P^{*}_1 -\mathbf {p}^\mathrm{T}P_1 \Big ), \end{aligned}$$
(36)

whereas for the sixth-order extended Hamiltonian it is

$$\begin{aligned} \varDelta {\hat{H}}= & {} hk_{21}\Big ((\mathbf {p}^{*})^\mathrm{T} P^{*}_1 -\mathbf {p}^\mathrm{T}P_1 \Big )\\&+\,hk_{41}\Big ((\mathbf {p}^{*})^\mathrm{T} P^{*}_3 -\mathbf {p}^\mathrm{T}P_3 \Big )\\&+\,h^2k_{42}\Big (U_{\varvec{\theta }}^\mathrm{T} P^{*}_2 - U_{\varvec{\theta }}^\mathrm{T}P_2 \Big )\\&+\,h^2k_{43}\Big ((P^{*}_1)^\mathrm{T} P^{*}_1 -P_1^\mathrm{T}P_1 \Big ). \end{aligned}$$

Here, \(P^{*}_1, P^{*}_2, P^{*}_3\) are the first-, second- and third-order scaled time derivatives of the gradient, respectively (see Sect. 2.3.3), calculated from the trajectory with updated momentum \(\mathbf {p}^{*}\). The computational gain of the new PMMC step in this case results from skipping the calculation of the terms multiplying \(k_{22}\) in (25) and \(k_{44}\) in (26). Admittedly, the term multiplying \(k_{22}\) in (25) is of negligible cost, so the gain from the new momentum update is not as significant as in the case of modified Hamiltonians with analytical derivatives. In contrast, the saving arising from the absence of the term multiplying \(k_{44}\) in the sixth-order modified Hamiltonian (26) is essential.

In summary, in the case of the sixth-order modified Hamiltonian, with derivatives calculated either analytically or numerically, the proposed momentum refreshment enhances the computational performance of MMHMC. The same holds when the fourth-order modified Hamiltonian with analytical derivatives is used. In this situation, however, if the Hessian matrix of the potential function is dense, we recommend the modified Hamiltonian with numerical derivatives instead of analytical ones, even though the saving from the new momentum update is then negligible. On the other hand, if the Hessian matrix is not very costly to compute (e.g., if it is block diagonal, sparse or close to constant), it might be more efficient to use analytical derivatives, for which the new formulation of the Metropolis test leads to computational savings.

Appendix D: Algorithmic summary

We provide the algorithmic summary for HMC (Algorithm 1) and two alternative algorithms for the MMHMC method. One (Algorithm 2) uses the modified Hamiltonians defined through analytical derivatives of the potential function and is recommended for problems with sparse Hessian matrices. The other (Algorithm 3) relies on the modified Hamiltonians expressed through numerical time derivatives of the gradient of the potential function; although it includes an additional integration step, it is beneficial when higher order derivatives are computationally demanding. A compact sketch of the MMHMC iteration follows below.
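
To make the summaries concrete, here is a minimal end-to-end sketch of the MMHMC iteration in the spirit of Algorithm 2, under simplifying assumptions of ours: unit mass matrix, Verlet integrator with its fourth-order modified Hamiltonian (analytical derivatives), momentum flip on rejection, and the pmmc_update sketch from Appendix C. U, grad and hess are user-supplied callables; this is not the authors' reference code.

```python
import numpy as np

C21, C22 = 1.0 / 12.0, -1.0 / 24.0   # Verlet coefficients (Appendix B)

def leapfrog(theta, p, grad, h, L):
    """L steps of velocity Verlet with step size h."""
    p = p - 0.5 * h * grad(theta)
    for _ in range(L - 1):
        theta = theta + h * p
        p = p - h * grad(theta)
    theta = theta + h * p
    p = p - 0.5 * h * grad(theta)
    return theta, p

def h_true(theta, p, U):
    """True Hamiltonian H = U(theta) + p'p/2 (M = I)."""
    return U(theta) + 0.5 * (p @ p)

def h_mod(theta, p, U, grad, hess, h):
    """Fourth-order modified Hamiltonian for Verlet, cf. (22) and (32)."""
    g = grad(theta)
    return (h_true(theta, p, U)
            + h**2 * C21 * (p @ hess(theta) @ p)
            + h**2 * C22 * (g @ g))

def mmhmc(theta0, U, grad, hess, h, L, phi, n_iter, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    p = rng.standard_normal(theta.shape)
    samples, logw = [], []
    for _ in range(n_iter):
        # PMMC step (Appendix C): implicit partial momentum refreshment
        p = pmmc_update(theta, p, hess, h, C21, phi, rng)
        # HDMC step: Verlet trajectory + Metropolis test on H-modified
        th_new, p_new = leapfrog(theta, p, grad, h, L)
        dH = (h_mod(th_new, p_new, U, grad, hess, h)
              - h_mod(theta, p, U, grad, hess, h))
        if rng.random() < min(1.0, np.exp(-dH)):
            theta, p = th_new, p_new
        else:
            p = -p                     # momentum flip on rejection
        samples.append(theta.copy())
        # weight w ~ exp(H_mod - H) re-weights samples from pi-tilde to pi
        logw.append(h_mod(theta, p, U, grad, hess, h) - h_true(theta, p, U))
    w = np.exp(np.asarray(logw) - np.max(logw))
    return np.asarray(samples), w / w.sum()
```

Posterior expectations are then weighted averages over the returned chain, e.g. (weights[:, None] * samples).sum(axis=0) for the posterior mean.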

Appendix E: Experimental setup

See Tables 5, 6 and 7.

Table 5 Parameter values used for the multivariate Gaussian model experiments
Table 6 Parameter values used for the Bayesian logistic regression model experiments
Table 7 Parameter values used for the stochastic volatility model experiments


About this article


Cite this article

Radivojević, T., Akhmatskaya, E. Modified Hamiltonian Monte Carlo for Bayesian inference. Stat Comput 30, 377–404 (2020). https://doi.org/10.1007/s11222-019-09885-x
