Non-reversible Parallel Tempering for Deep Posterior Approximation

Authors

  • Wei Deng, Purdue University; Morgan Stanley
  • Qian Zhang, Purdue University
  • Qi Feng, University of Michigan, Ann Arbor
  • Faming Liang, Purdue University
  • Guang Lin, Purdue University

DOI:

https://doi.org/10.1609/aaai.v37i6.25893

Keywords:

ML: Bayesian Learning, ML: Probabilistic Methods, RU: Stochastic Models & Probabilistic Inference, RU: Stochastic Optimization, RU: Uncertainty Representations

Abstract

Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is the adoption of efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from quadratic to linear given sufficiently many chains. However, such an innovation largely disappears in big data settings due to the limited number of chains and the scarcity of bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain a nearly linear communication cost based on the optimal window size. In addition, we adopt stochastic gradient descent (SGD) with large and constant learning rates as exploration kernels. This user-friendly design enables us to conduct approximation tasks for complex posteriors without much tuning cost.
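
For readers unfamiliar with the DEO scheme referenced in the abstract, below is a minimal Python sketch of one DEO swap round between adjacent chains. The function name, the toy quadratic energy, and the temperature ladder are illustrative assumptions, not the authors' implementation, which additionally incorporates the generalized window size and bias corrections described above.

    import numpy as np

    def deo_swap_round(states, energies, betas, round_idx, rng):
        # Even rounds attempt swaps on pairs (0,1), (2,3), ...;
        # odd rounds on pairs (1,2), (3,4), .... The deterministic
        # alternation (instead of random pair selection) is what
        # makes the scheme non-reversible.
        parity = round_idx % 2
        for i in range(parity, len(betas) - 1, 2):
            # Standard tempering swap: accept with probability
            # min(1, exp((beta_i - beta_j) * (U_i - U_j))).
            log_acc = (betas[i] - betas[i + 1]) * (energies[i] - energies[i + 1])
            if np.log(rng.uniform()) < log_acc:
                states[i], states[i + 1] = states[i + 1], states[i]
                energies[i], energies[i + 1] = energies[i + 1], energies[i]
        return states, energies

    # Toy usage with energy U(x) = ||x||^2 / 2; in the paper's setting,
    # each chain's exploration kernel would instead be SGD with a large,
    # constant learning rate on the (tempered) loss.
    rng = np.random.default_rng(0)
    betas = [1.0, 0.5, 0.25, 0.125]                # inverse temperatures
    states = [rng.normal(size=2) for _ in betas]   # one state per chain
    energies = [0.5 * float(s @ s) for s in states]
    for t in range(100):
        # ... exploration steps for every chain would go here ...
        states, energies = deo_swap_round(states, energies, betas, t, rng)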

Published

2023-06-26

How to Cite

Deng, W., Zhang, Q., Feng, Q., Liang, F., & Lin, G. (2023). Non-reversible Parallel Tempering for Deep Posterior Approximation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7332-7339. https://doi.org/10.1609/aaai.v37i6.25893

Issue

Vol. 37 No. 6 (2023)

Section

AAAI Technical Track on Machine Learning I