
Hypothesis testing for Markov chain Monte Carlo

Statistics and Computing

Abstract

Testing between hypotheses when independent sampling is possible is a well-developed subject. In this paper, we propose hypothesis tests that are applicable when the samples are obtained using Markov chain Monte Carlo. These tests are useful when one is interested in deciding whether the expected value of a certain quantity is above or below a given threshold. We show non-asymptotic error bounds and bounds on the expected number of samples for three types of tests: a fixed sample size test, a sequential test with an indifference region, and a sequential test without an indifference region. Our tests can lead to significant savings in sample size. We illustrate our results on an example of Bayesian parameter inference involving an ODE model of a biochemical pathway.



References

  • Bardenet, R., Doucet, A., Holmes, C.: Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14) (2014)

  • Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.): Markov Chain Monte Carlo in Practice. Interdisciplinary Statistics. Chapman & Hall, London (1996)


  • Gyori, B., Paulin, D.: Non-asymptotic confidence intervals for MCMC in practice. arXiv preprint (2014)

  • Kato, T.: Perturbation Theory for Linear Operators, vol. 132. Springer Science & Business Media, Heidelberg (1976)


  • Korattikara, A., Chen, Y., Welling, M.: Austerity in MCMC land: cutting the Metropolis–Hastings budget. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14) (2014)

  • Klipp, E., Herwig, R., Kowald, A., Wierling, C., Lehrach, H.: Systems Biology in Practice: Concepts, Implementation and Application. Wiley-VCH, Weinheim (2005)


  • Lai, T.L.: Optimal stopping and sequential tests which minimize the maximum expected sample size. Ann. Stat. 1, 659–673 (1973)

  • Lai, T.L.: Nearly optimal sequential tests of composite hypotheses. Ann. Stat. 16(2), 856–886 (1988). doi:10.1214/aos/1176350840


  • Lawrence, N.D., Girolami, M., Rattray, M., Sanguinetti, G.: Learning and Inference in Computational Systems Biology. MIT Press, Cambridge (2009)


  • Legay, A., Delahaye, B., Bensalem, S.: Statistical model checking: an overview. In: Barringer, H., Falcone, Y., Finkbeiner, B., Havelund, K., Lee, I., Pace, G., Rosu, G., Sokolsky, O., Tillmann, N. (eds.) Runtime Verification, pp. 122–135. Springer, Heidelberg (2010)


  • Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses, Springer Texts in Statistics, 3rd edn. Springer, New York (2005)


  • León, C.A., Perron, F.: Optimal Hoeffding bounds for discrete reversible Markov chains. Ann. Appl. Probab. 14(2), 958–970 (2004). doi:10.1214/105051604000000170


  • Miasojedow, B.: Hoeffding’s inequalities for geometrically ergodic Markov chains on general state space. Stat. Probab. Lett. 87, 115–120 (2014). doi:10.1016/j.spl.2014.01.013


  • Paulin, D.: Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron. J. Probab. (to appear)

  • Roberts, G.O., Rosenthal, J.S.: General state space Markov chains and MCMC algorithms. Probab. Surv. 1, 20–71 (2004). doi:10.1214/154957804100000024


  • Swameye, I., Müller, T., Timmer, J., Sandra, O., Klingmüller, U.: Identification of nucleocytoplasmic cycling as a remote sensor in cellular signaling by data-based modeling. Proc. Natl. Acad. Sci. 100(3), 1028–1033 (2003)


  • Wald, A.: Sequential tests of statistical hypotheses. Ann. Math. Stat. 16(2), 117–186 (1945)



Acknowledgments

The authors thank the anonymous referees for their insightful comments. DP was supported by an MOE Singapore Academic Research Fund Tier 2 Grant “Approximate Computational Methods for High-Dimensional Systems”.


Corresponding author

Correspondence to Benjamin M. Gyori.

Additional information

Source code available at github.com/bgyori/mcmchyp.

Appendix

1.1 Estimating the spectral gap

In this section, we propose a procedure to estimate the spectral gap of Markov chains, motivated by the following lemma. In the statement of the lemma, we use the scalar product

$$\begin{aligned} \left<f,g\right>_{\pi }:=\sum _{x\in \varOmega } f(x)g(x)\pi (x), \end{aligned}$$

where \(\pi \) is the stationary distribution of a finite state Markov chain.

Lemma 1

Suppose that \(X_1,X_2,\ldots \) is a finite state Markov chain with absolute spectral gap \(\gamma ^*\) and transition kernel P. Suppose that \(f:\varOmega \rightarrow \mathbb {R}\) satisfies \(\mathbb {E}_{\pi }(f)=0\), and that f is not orthogonal (with respect to the scalar product \(\left<\cdot ,\cdot \right>_{\pi }\)) to the eigenspace corresponding to the eigenvalue of P of second largest absolute value. Let \(\rho _\eta (f):=\mathbb {E}_{\pi }(f(X_0)f(X_\eta ))\); then

$$\begin{aligned} \lim _{\eta \rightarrow \infty }(|\rho _{\eta }(f)|/\mathrm {Var}_{\pi }(f))^{1/\eta }=1-\gamma ^*.\end{aligned}$$
(22)

Remark 1

A similar result can be shown for chains with general state spaces, but for notational simplicity we only consider finite state spaces. In practice, the condition of non-orthogonality is almost always satisfied.

Proof

We can write the eigenvalues of the operator P as \(1={\uplambda }_1\ge {\uplambda }_2\ge \ldots \ge {\uplambda }_n\), with \(n=|\varOmega |\), and corresponding right eigenvectors \(f_1,\ldots ,f_n\), which are orthonormal with respect to the scalar product \(\left<\cdot ,\cdot \right>_{\pi }\). It is clear that \(f_1\equiv 1\) (the constant function), and using the assumption \(\mathbb {E}_{\pi }(f)=0\), f can be decomposed as

$$\begin{aligned} f=\sum _{i=2}^{n}c_i\cdot f_i, \end{aligned}$$

for some constants \(\{c_i\}_{i=2}^{n}\). Using this decomposition, we have

$$\begin{aligned} \frac{\mathbb {E}_{\pi }(f(X_0) f(X_{\eta }))}{\mathrm {Var}_{\pi }(f)}=\frac{\left<f,P^{\eta } f\right>_{\pi }}{\left<f,f\right>_{\pi }}=\frac{\sum _{i=2}^{n}c_i^2 {\uplambda }_i^{\eta }}{\sum _{i=2}^{n}c_i^2}, \end{aligned}$$

and using the condition of non-orthogonality, we have

$$\begin{aligned} \lim _{\eta \rightarrow \infty }\left| \frac{\sum _{i=2}^{n}c_i^2 {\uplambda }_i^{\eta }}{\sum _{i=2}^{n}c_i^2}\right| ^{1/\eta }=\max _{2\le i\le n}|{\uplambda }_i|=1-\gamma ^*. \end{aligned}$$
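As a numerical sanity check of Lemma 1, the limit in (22) can be verified on a small example. The following Python sketch uses a hypothetical two-state chain (the transition matrix, the function f, and the lag \(\eta \) are illustrative choices, not from the paper), computes \(\rho _{\eta }(f)\) exactly from the transition matrix, and compares \((|\rho _{\eta }(f)|/\mathrm {Var}_{\pi }(f))^{1/\eta }\) with \(1-\gamma ^*\).

```python
import numpy as np

# Hypothetical two-state chain (illustrative): the eigenvalues of P
# are 1 and 1 - a - b, so the absolute spectral gap is
# gamma* = 1 - |1 - a - b|.
a, b = 0.3, 0.1
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b, a]) / (a + b)           # stationary distribution

f = np.array([1.0, -1.0])
f = f - pi @ f                            # center so that E_pi(f) = 0

eta = 20
Pf = np.linalg.matrix_power(P, eta) @ f   # (P^eta f)(x)
rho_eta = pi @ (f * Pf)                   # rho_eta(f) = E_pi[f(X_0) f(X_eta)]
var_pi = pi @ (f * f)                     # Var_pi(f)

est = (abs(rho_eta) / var_pi) ** (1.0 / eta)
gamma_star = 1 - abs(1 - a - b)
# est and 1 - gamma_star both equal 0.6 here (up to rounding)
```

In a two-state chain, any centered f is proportional to the second eigenvector, so the non-orthogonality condition holds automatically and the agreement is exact rather than merely asymptotic.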

In practice, \(\eta \) cannot be taken to infinity; in particular, the variance of the estimator of \(\rho _{\eta }(f)\) needs to be taken into account. Given a burn-in time \(t_0\), and \(f(X_{t_{0}+1})\), \(\ldots , f(X_{t_0+n})\), we define the empirical mean as

$$\begin{aligned}\overline{f}=\frac{f(X_{t_{0}+1})+\cdots + f(X_{t_0+n})}{n}, \end{aligned}$$

and use the estimator

$$\begin{aligned} \hat{\rho }_{\eta }(f):=\frac{\sum _{j=t_0+1}^{n+t_0-\eta }(f(X_j)-\overline{f})(f(X_{j+\eta })-\overline{f})}{n-\eta }. \end{aligned}$$
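In code, this estimator is short; the sketch below (the helper name `autocov_hat` is ours, and `fx` is assumed to hold the recorded values \(f(X_{t_0+1}),\ldots ,f(X_{t_0+n})\) after burn-in) computes \(\hat{\rho }_{\eta }(f)\) as defined above.

```python
import numpy as np

def autocov_hat(fx, eta):
    """Empirical lag-eta autocovariance, matching the estimator
    rho_hat_eta(f) above; fx holds f(X_{t0+1}), ..., f(X_{t0+n})
    recorded after the burn-in t0."""
    fx = np.asarray(fx, dtype=float)
    n = len(fx)
    d = fx - fx.mean()                        # subtract the empirical mean
    return np.dot(d[:n - eta], d[eta:]) / (n - eta)
```

For \(\eta =0\) this reduces to the empirical variance, which can serve as an estimate of \(\mathrm {Var}_{\pi }(f)\).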

The typical range of dependence of the elements of the Markov chain is of the order of \(1/\gamma ^*\), so we expect the standard deviation of \(\hat{\rho }_{\eta }(f)/\mathrm {Var}_{\pi }(f)\) to be of the order of \(\frac{1}{\sqrt{n \gamma ^*}}\). Now if we want to use the estimator

$$\begin{aligned} 1-\gamma ^*\approx (\hat{\rho }_{\eta }(f)/\mathrm {Var}_{\pi }(f))^{1/\eta }, \end{aligned}$$

the standard deviation of \(\hat{\rho }_{\eta }(f)/\mathrm {Var}_{\pi }(f)\) needs to be much smaller than \((1-\gamma ^*)^{\eta }\).

Solving the equation \(\frac{1}{\sqrt{n \gamma ^*}}=(1-\gamma ^*)^{\eta }\) leads to \(\eta =\frac{\log (n\gamma ^*)}{2\log (1/(1-\gamma ^*))}\), so we propose the choice

$$\begin{aligned} \eta =\frac{\log (n\gamma ^*)}{4\log (1/(1-\gamma ^*))}, \end{aligned}$$
(23)

for which the standard deviation of \(\hat{\rho }_{\eta }(f)/\mathrm {Var}_{\pi }(f)\) is much smaller than \((1-\gamma ^*)^{\eta }\). A slight inconvenience of this choice is that it depends on the unknown parameter \(\gamma ^*\); this can be overcome by computing the value of \(\eta \) iteratively. Based on Lemma 1 and the above observation, we propose the following procedure.

  1. Choose functions \(f_1,\ldots , f_m: \varOmega \rightarrow \mathbb {R}\) that together determine the location in the state space (for example, if \(\varOmega \subset \mathbb {R}^m\), then \(f_1,\ldots , f_m\) can be chosen as the coordinates).

  2. Run an initial batch of simulations \(X_1,\ldots , X_{t_0+n}\), and in every step save the values \(f_1(X_i),\ldots , f_m(X_i)\).

  3. Compute \(\hat{\gamma }^*\) based on \(\eta =1\) for each of \(f=f_1, \ldots , f_m\). Denote the minimum of these values by \(\gamma ^*_{\min }(1)\), and compute \(\eta (1):=\frac{\log (n\gamma ^*_{\min }(1))}{4\log (1/(1-\gamma ^*_{\min }(1)))}\).

  4. Inductively, assume that \(\eta (i)\) has been computed. Compute \(\gamma ^*_{\min }(i+1)\) based on \(\eta (i)\). If \(\gamma ^*_{\min }(i+1)\ge \gamma ^*_{\min }(i)\), stop and let \(\hat{\gamma }^*:=\gamma ^*_{\min }(i)\); otherwise compute \(\eta (i+1)\) and repeat this step.

  5. If the initial number of simulations n satisfies \(n>100/\hat{\gamma }^*\), accept the estimate; otherwise choose \(n=200/\hat{\gamma }^*\) and restart from Step 2.

The motivation for choosing \(f_1,\ldots , f_m\) in this way is that we use all the available information about the Markov chain, so the estimator is expected to be more accurate than one based on only a subset of this information. The motivation for Step 5 is to ensure that we have sufficient initial data for the estimate.
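The iterative part of the procedure (Steps 3–4) can be sketched as follows. This is a minimal Python implementation under stated assumptions: the helper names and the `max_iter` safeguard are ours, \(\hat{\gamma }^*(\eta )\) is obtained from empirical autocovariances via (22), and `traces` holds the recorded values of \(f_1,\ldots ,f_m\) along the chain after burn-in.

```python
import numpy as np

def gamma_hat(fx, eta):
    # Estimate gamma* from one trace via (22):
    # 1 - (|rho_hat_eta| / rho_hat_0)^(1/eta)
    fx = np.asarray(fx, dtype=float)
    n = len(fx)
    d = fx - fx.mean()
    rho0 = np.dot(d, d) / n
    rho = np.dot(d[:n - eta], d[eta:]) / (n - eta)
    return 1.0 - (abs(rho) / rho0) ** (1.0 / eta)

def estimate_spectral_gap(traces, max_iter=50):
    """Steps 3-4 of the procedure: traces is a list of arrays
    holding f_1(X_i), ..., f_m(X_i) recorded after burn-in."""
    n = len(traces[0])
    g_min = min(gamma_hat(t, 1) for t in traces)      # Step 3, eta = 1
    for _ in range(max_iter):                         # safeguard (ours)
        eta = max(1, int(round(np.log(n * g_min) /
                               (4 * np.log(1 / (1 - g_min))))))  # Eq. (23)
        g_new = min(gamma_hat(t, eta) for t in traces)
        if g_new >= g_min:                            # Step 4 stopping rule
            return g_min
        g_min = g_new
    return g_min
```

For a chain whose autocovariances decay like \((1-\gamma ^*)^{\eta }\), the iteration typically stabilizes after one or two recomputations of \(\eta \).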

An illustration of this iterative procedure based on our case study (see Sect. 4) is shown in Fig. 2. Here we use the components of the Markov chain’s state (the value of the model parameters \(k_1\) to \(k_4\)) for the estimation.

Fig. 2

Estimated \(\hat{\gamma }^*\) for different values of \(\eta \), based on the Markov chain states in the space of model parameters \(k_1\), \(k_2\), \(k_3\), \(k_4\). The final value of \(\eta \), found by the proposed iterative procedure, is indicated

A histogram of the estimated spectral gap values for the 1000 independent chains used in our case study is shown in Fig. 3.

Fig. 3

Histogram of estimated \(\hat{\gamma }^*\) for 1000 independent chains

1.2 Simulation details

Here we provide additional details on the JAK-STAT pathway model case study. The species in the model are listed in Table 1.

Table 1 Species in the JAK-STAT pathway model

The ODEs governing the model dynamics are as follows.

$$\begin{aligned}&\frac{\mathrm{d}[\mathrm {STAT}]}{\mathrm{d}t} = -k_1[\mathrm {STAT}][\mathrm {Epo}] + 2k_4[X_K]\\&\frac{\mathrm{d}[\mathrm {STATp}]}{\mathrm{d}t} = k_1[\mathrm {STAT}][\mathrm {Epo}] - k_2[\mathrm {STATp}]^2\\&\frac{\mathrm{d}[\mathrm {STATpd}]}{\mathrm{d}t} = -k_3[\mathrm {STATpd}] + 0.5k_2[\mathrm {STATp}]^2\\&\frac{\mathrm{d}[\mathrm {X_1}]}{\mathrm{d}t} = k_3[\mathrm {STATpd}] -k_4[\mathrm {X}_1] \end{aligned}$$
$$\begin{aligned}&\frac{\mathrm{d}[\mathrm {X}_j]}{\mathrm{d}t} = k_4[\mathrm {X}_{j-1}]-k_4[\mathrm {X}_j], \qquad j=2,\ldots ,K \\&\frac{\mathrm{d}[\mathrm {STATn}]}{\mathrm{d}t} = k_3[\mathrm {STATpd}]-k_4[\mathrm {X}_K]. \end{aligned}$$

Here K is the number of delay variables, set to \(K=10\).
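For concreteness, the right-hand side above can be coded directly. The sketch below is illustrative, not the solver used in the paper: the parameter values are hypothetical, \([\mathrm {Epo}]\) is treated as a constant input, and a classical RK4 step stands in for whatever ODE integrator was actually used.

```python
import numpy as np

K = 10  # number of delay variables, as in the text

def jak_stat_rhs(y, k, epo):
    """Right-hand side of the ODE system above. State layout:
    y = [STAT, STATp, STATpd, X_1, ..., X_K, STATn]; epo is
    treated as a constant input in this sketch."""
    k1, k2, k3, k4 = k
    S, Sp, Spd = y[0], y[1], y[2]
    X = y[3:3 + K]
    dy = np.empty_like(y)
    dy[0] = -k1 * S * epo + 2 * k4 * X[K - 1]
    dy[1] = k1 * S * epo - k2 * Sp ** 2
    dy[2] = -k3 * Spd + 0.5 * k2 * Sp ** 2
    dy[3] = k3 * Spd - k4 * X[0]
    for j in range(1, K):                        # X_2, ..., X_K
        dy[3 + j] = k4 * (X[j - 1] - X[j])
    dy[3 + K] = k3 * Spd - k4 * X[K - 1]         # STATn
    return dy

def rk4_step(y, dt, k, epo):
    # One classical fourth-order Runge-Kutta step
    f1 = jak_stat_rhs(y, k, epo)
    f2 = jak_stat_rhs(y + 0.5 * dt * f1, k, epo)
    f3 = jak_stat_rhs(y + 0.5 * dt * f2, k, epo)
    f4 = jak_stat_rhs(y + dt * f3, k, epo)
    return y + (dt / 6.0) * (f1 + 2 * f2 + 2 * f3 + f4)
```

Summing the equations shows that \([\mathrm {STAT}]+[\mathrm {STATp}]+2[\mathrm {STATpd}]+2\sum _j[\mathrm {X}_j]\) has zero derivative, which provides a useful conservation check on any implementation.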

The observation model is defined as follows.

$$\begin{aligned} y_1&= [\mathrm {STATp}]+2[\mathrm {STATpd}] \\ y_2&= [\mathrm {STAT}]+[\mathrm {STATp}]+2[\mathrm {STATpd}]. \end{aligned}$$

Here \(y_1\) represents total phosphorylated STAT, and \(y_2\) represents total STAT in the cytoplasm. Observations are available at 18 discrete time points up to 60 min. The data points, as well as the standard deviations for each point, are available in Swameye et al. (2003).

The parameter vector of the model is \(\varvec{\theta }=(k_1,k_2,k_3,k_4)\). We assume a uniform prior over a hypercube defined by a bounded interval for each parameter. The parameter ranges and the diagonal entries (\(\sigma _{\mathrm {MH}}\)) of the covariance matrix of the MCMC proposal distribution are given in Table 2.

Table 2 Parameter ranges and entries in the proposal covariance matrix


Gyori, B.M., Paulin, D. Hypothesis testing for Markov chain Monte Carlo. Stat Comput 26, 1281–1292 (2016). https://doi.org/10.1007/s11222-015-9594-1
