Markov Chain Monte Carlo Estimation of Quantiles

We consider quantile estimation using Markov chain Monte Carlo and establish conditions under which the sampling distribution of the Monte Carlo error is approximately Normal. Further, we investigate techniques to estimate the associated asymptotic variance, which enables construction of an asymptotically valid interval estimator. Finally, we explore the finite sample properties of these methods through examples and provide some recommendations to practitioners.


Introduction
Let π denote a probability distribution having support X ⊆ R^d, d ≥ 1. If W ∼ π and g : X → R is measurable, set V = g(W). We consider estimation of quantiles of the distribution of V. Specifically, if 0 < q < 1 and F_V denotes the distribution function of V, then our goal is to obtain ξ_q := F_V^{-1}(q) = inf{v : F_V(v) ≥ q}.
We will assume throughout that F_V is absolutely continuous and has continuous density function f_V such that 0 < f_V(ξ_q) < ∞. Notice that this means ξ_q is the unique solution y of F_V(y−) ≤ q ≤ F_V(y).
Typically, it is not possible to calculate ξ_q directly. For example, a common goal in Bayesian inference is calculating the quantiles of marginal posterior distributions in order to construct posterior credible intervals. In these settings, quantile estimates are based upon simulation and are almost always reported without including any notion of the simulation error. Our work enables practitioners to assess this error, and hence increase confidence in their inferences.
Our focus is on using Markov chain Monte Carlo (MCMC) methods to approximate ξ_q. The basic MCMC method entails simulating a Markov chain X = {X_0, X_1, . . .} having invariant distribution π. Define Y = {Y_0, Y_1, . . .} = {g(X_0), g(X_1), . . .}. If we observe a realization of X of length n and let Y_{n(j)} denote the jth order statistic of {Y_0, . . . , Y_{n−1}}, then we estimate ξ_q with

ξ̂_{n,q} := Y_{n(j)} where j − 1 < nq ≤ j .    (1)

We will see that ξ̂_{n,q} is strongly consistent for ξ_q. While this justifies the use of ξ̂_{n,q}, it will be more valuable if we can also assess the unknown Monte Carlo error, ξ̂_{n,q} − ξ_q. We address this in two ways. The first is by finding a function b : N × (0, ∞) → [0, ∞) such that for all ε > 0

Pr(|ξ̂_{n,q} − ξ_q| > ε) ≤ b(n, ε) .    (2)

We also assess the Monte Carlo error through its approximate sampling distribution. We will show that under a weak mixing condition on X a quantile central limit theorem (CLT) will obtain. For now, assume there exists a constant γ²(ξ_q) > 0 such that as n → ∞

√n (ξ̂_{n,q} − ξ_q) →d N(0, γ²(ξ_q)) .    (3)
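As a concrete illustration, the order-statistic estimator at (1) can be computed in a few lines; the following is our own Python sketch, not code from the paper:

```python
import math

def mcmc_quantile(y, q):
    """Estimate the q-th quantile from MCMC output y = (Y_0, ..., Y_{n-1})
    using the order statistic Y_(j) with j - 1 < nq <= j, i.e. j = ceil(nq)."""
    n = len(y)
    j = math.ceil(n * q)       # 1-based index of the order statistic
    return sorted(y)[j - 1]    # convert to 0-based indexing
```

For instance, with n = 10 and q = 0.5 the rule picks j = 5, the fifth smallest value of the output.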
Note that γ²(ξ_q) must account for the serial dependence present in a non-trivial Markov chain and hence is more difficult to estimate well than when X is a random sample. However, if we can estimate γ²(ξ_q) with, say, γ̂²_n, then an interval estimator of ξ_q is

ξ̂_{n,q} ± t* γ̂_n / √n

where t* is an appropriate Student's t quantile. Such intervals, or at least the Monte Carlo standard error (MCSE), γ̂_n/√n, are useful in assessing the reliability of the simulation results as they explicitly describe the level of confidence we have in the reported number of significant figures in ξ̂_{n,q}. For more on this approach see Flegal and Gong (2013), Flegal et al. (2008), Flegal and Jones (2011), Geyer (2011), Jones et al. (2006) and Jones and Hobert (2001).
We consider three methods for implementing this recipe, all of which produce effective interval estimates of ξ q . The first two are based on the CLT at (3) where we consider using the method of batch means (BM) and the subsampling bootstrap method (SBM) to estimate γ 2 (ξ q ). Regenerative simulation (RS) is the third method, but it requires a slightly different quantile CLT than that in (3). Along the way we show that significantly weaker conditions are available for the RS-based expectation estimation case previously studied in Hobert et al. (2002) and Mykland et al. (1995).
The remainder is organized as follows. We begin in Section 2 with a brief introduction to some required Markov chain theory. In Section 3 we consider estimation of ξ q withξ n,q , establish a CLT for the Monte Carlo error, and consider how to obtain MCSEs using BM and SBM. In Section 4, we consider RS, establish an alternative CLT and show how an MCSE can be obtained. In Section 5, we illustrate the use of the methods presented here and investigate their finite-sample properties in two examples. Finally, in Section 6 we summarize our results and conclude with some practical recommendations.

Markov chain background
In this section we give some essential preliminary material. Recall that π has support X and let B(X) be the Borel σ-algebra. For n ∈ N = {1, 2, 3, . . .}, let the n-step Markov kernel associated with X be P^n(x, dy). Then if A ∈ B(X) and k ∈ {0, 1, 2, . . .}, P^n(x, A) = Pr(X_{k+n} ∈ A | X_k = x). Throughout we assume X is Harris ergodic (π-irreducible, aperiodic, and positive Harris recurrent) and has invariant distribution π.
Let ‖ · ‖ denote the total variation norm. Further, let M : X → R_+ with E_π M < ∞ and ψ : N → R_+ be decreasing such that

‖P^n(x, ·) − π(·)‖ ≤ M(x) ψ(n) .    (4)

Polynomial ergodicity of order m where m > 0 means (4) holds with ψ(n) = n^{−m}. Geometric ergodicity means (4) holds with ψ(n) = t^n for some 0 < t < 1. Uniform ergodicity means (4) holds with M bounded and ψ(n) = t^n for some 0 < t < 1.
An equivalent characterization of uniform ergodicity is often more convenient for applications. The Markov chain X is uniformly ergodic if and only if there exists a probability measure φ on X, λ > 0, and an integer n_0 ≥ 1 such that

P^{n_0}(x, A) ≥ λ φ(A) for all x ∈ X and A ∈ B(X) .    (5)

When (5) holds we have that

‖P^n(x, ·) − π(·)‖ ≤ (1 − λ)^{⌊n/n_0⌋} .    (6)


Quantile estimation for Markov chains

Recall Y = {Y_0, Y_1, . . .} and set F_n(y) = n^{−1} Σ_{i=0}^{n−1} I(Y_i ≤ y). By the Markov chain version of the strong law of large numbers (see e.g. Meyn and Tweedie, 2009), for each y, F_n(y) → F_V(y) with probability 1 as n → ∞. Using this, the proof of the following result is similar to the proof for when Y is composed of independent and identically distributed random variables (see e.g. Serfling, 1981) and hence is omitted.
While this result justifies the use of ξ̂_{n,q} as an estimator of ξ_q, it does not allow one to assess the unknown Monte Carlo error ξ̂_{n,q} − ξ_q for any finite n. In Section 3.1 we establish conditions under which (2) holds, while in Section 3.2 we examine the approximate sampling distribution of the Monte Carlo error.

Monte Carlo error under stationarity
We will consider (in this subsection only) a best-case scenario where X 0 ∼ π, that is, the Markov chain X is stationary. We begin with a refinement of a result due to Wang et al. (2011) to obtain a useful description of how the Monte Carlo error decreases with simulation sample size and the convergence rate of the Markov chain. The proof is given in Appendix B.1.
For the rest of this section we consider finite sample properties of the Monte Carlo error in the sense that our goal is to find an explicit function b : N × (0, ∞) → [0, ∞) such that (2) holds. There has been some research on this in the context of estimating expectations using MCMC (e.g. Latuszyński et al., 2012; Latuszyński and Niemiro, 2011; Rudolf, 2012), but to our knowledge this has not been considered in the quantile case. The proofs of the remaining results in this section can be found in Appendix B.2.
Theorem 2. If X satisfies (4), then for any integer a ∈ [1, n/2], each ε > 0 and 0 < δ < 1, the probability in (2) can be bounded explicitly in terms of ψ(n) and E_π M.

To be useful Theorem 2 requires bounding ψ(n) E_π M. There has been a substantial amount of work in this area (see e.g. Baxendale, 2005; Fort and Moulines, 2003; Rosenthal, 1995), but these methods have been applied in only a few practically relevant settings (see e.g. Jones and Hobert, 2001, 2004). However, in the uniformly ergodic case we have the following easy corollary.
Corollary 1. Under the assumptions of Theorem 2 and that X satisfies (5), an explicit bound holds for any a ∈ [1, n/2], any ε > 0 and any 0 < δ < 1.

Example 1. Consider the joint target density

π(x, y) = f_X(x) f_{Y|X}(y|x) ,    (7)

where marginally X ∼ t(4), Student's t with 4 degrees of freedom, and Y | X = x ∼ Gamma(5/2, 2 + x²/2). Consider an MCMC algorithm which first updates X with a Metropolis-Hastings independence sampler having the marginal of X as the invariant distribution using a t(3) proposal distribution, then updates Y with a draw from the conditional of Y | X. Letting P denote the Markov kernel for this algorithm, we show in Appendix B.3 that for any measurable set A

P((x, y), A) ≥ (√9375 / (32π)) ∫_A π(x′, y′) dx′ dy′

and hence the Markov chain satisfies (5) with n_0 = 1 and λ = √9375/(32π). Set δ = .99999, a = n/16 and consider estimating the median of the marginal of X, i.e. of t(4). Then q = 1/2 and ξ_{1/2} = 0 so that γ = 0.037422. Suppose we want to find the Monte Carlo sample size required to ensure that the probability the final estimate ξ̂_{n,1/2} is within .10 of the truth is approximately 0.9. Then Corollary 1 gives the required sample size.

We can improve upon the conclusion of Corollary 1.
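The two-step sampler of Example 1 is straightforward to simulate. The following is our own sketch (the function name, starting value, and run length are illustrative, not the authors' code): a Metropolis-Hastings independence sampler for the t(4) marginal with a t(3) proposal, followed by the conditional Gamma draw.

```python
import numpy as np

def indep_sampler_t4(n, seed=1):
    """Two-step update of Example 1: an independence sampler targeting the
    t(4) marginal with a t(3) proposal, then Y | X = x drawn from
    Gamma(shape 5/2, rate 2 + x^2/2). Illustrative sketch only."""
    rng = np.random.default_rng(seed)

    def w(x):
        # importance ratio target/proposal, both unnormalized:
        # t(4) density ~ (1 + x^2/4)^(-5/2), t(3) density ~ (1 + x^2/3)^(-2)
        return (1 + x**2 / 4) ** (-2.5) / (1 + x**2 / 3) ** (-2.0)

    x = 0.0
    xs, ys = np.empty(n), np.empty(n)
    for i in range(n):
        prop = rng.standard_t(3)
        if rng.random() < min(1.0, w(prop) / w(x)):
            x = prop
        xs[i] = x
        # numpy's gamma uses (shape, scale); here scale = 1/rate
        ys[i] = rng.gamma(2.5, 1.0 / (2 + x**2 / 2))
    return xs, ys
```

The empirical median of the X draws then approximates ξ_{1/2} = 0, with Monte Carlo error shrinking as n grows.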
Example 2 (Continuation of Example 1). Theorem 3 yields the bound (8), which clearly establishes that the bound given in Corollary 1 is conservative.
We will compare the bound in (8) to the results of a simulation experiment. We performed 500 independent replications of this MCMC sampler for each of 3 simulation lengths and recorded the number of estimated medians for each that were more than .1 in absolute value away from the median of a t(4) distribution, i.e. 0. The results are presented in Table 1 and Figure 1. The results in Table 1 show that the estimated probability in (8) is somewhat conservative. On the other hand, from Figure 1 it is clear that the estimation procedure is not all that stable until n = 4700.

Central limit theorem
We consider the limiting distribution of the Monte Carlo error ξ̂_{n,q} − ξ_q. Let

σ²(y) = Var_π(I(Y_0 ≤ y)) + 2 Σ_{k=1}^{∞} Cov_π(I(Y_0 ≤ y), I(Y_k ≤ y)) .    (9)

The proof of the following result is in Appendix B.4.
Theorem 4. If X is polynomially ergodic of order m > 11 and if σ²(ξ_q) > 0, then as n → ∞

√n (ξ̂_{n,q} − ξ_q) →d N(0, σ²(ξ_q)/[f_V(ξ_q)]²) .

To obtain an MCSE we need to estimate γ²(ξ_q) := σ²(ξ_q)/[f_V(ξ_q)]². We consider two methods for doing this: in Section 3.2.1 we consider the method of batch means, while in Section 3.2.2 we consider subsampling.

Batch Means
First, we substitute ξ̂_{n,q} for ξ_q and separately consider estimating f_V(ξ̂_{n,q}) and σ²(ξ̂_{n,q}).
Consider estimating f_V(ξ̂_{n,q}). Consistently estimating a density at a point has been studied extensively in the context of stationary time-series analysis (see e.g. Robinson, 1983) and many existing results are applicable since the Markov chains in MCMC are special cases of strong mixing processes. In our examples we use kernel density estimators with a Gaussian kernel to obtain f̂_V(ξ̂_{n,q}), an estimator of f_V(ξ̂_{n,q}).
The quantity σ²(y), y ∈ R, is familiar. Notice that by the usual Markov chain CLT for sample means (Jones, 2004), as n → ∞,

√n (F_n(y) − F_V(y)) →d N(0, σ²(y)) .

Moreover, we show in Corollary 4 that σ²(y) is continuous at ξ_q. In this context, estimating σ²(y) consistently is a well-studied problem and there are an array of methods for doing so; see Flegal et al. (2008), Flegal and Jones (2010), Flegal and Jones (2011) and Jones et al. (2006). Here we focus on the method of batch means for estimating σ²(ξ̂_{n,q}). For BM the output is split into batches of equal size. Suppose we obtain n = a_n b_n iterations {X_0, . . . , X_{n−1}} and for k = 0, . . . , a_n − 1 define Ū_k(ξ̂_{n,q}) = b_n^{−1} Σ_{i=0}^{b_n−1} I(Y_{k b_n + i} ≤ ξ̂_{n,q}). Then the BM estimate of σ²(ξ̂_{n,q}) is

σ̂²_n = (b_n / (a_n − 1)) Σ_{k=0}^{a_n−1} (Ū_k(ξ̂_{n,q}) − F_n(ξ̂_{n,q}))² .    (10)

Putting these two pieces together we estimate γ²(ξ_q) with

γ̂²_n = σ̂²_n / [f̂_V(ξ̂_{n,q})]²    (11)

and we can obtain an approximate 100(1 − α)% confidence interval for ξ_q by

ξ̂_{n,q} ± z_{α/2} γ̂_n / √n    (12)

where z_{α/2} is an appropriate standard Normal quantile.
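The BM-based interval can be sketched end to end as follows. This is our illustration, not the paper's code: the batch size b_n = ⌊√n⌋, the Silverman rule-of-thumb bandwidth, and the hard-coded z_{0.025} are illustrative choices.

```python
import numpy as np

def bm_interval(y, q, alpha=0.05):
    """Batch-means interval for the q-th quantile: sigma^2 is estimated by
    batch means of the indicators I(Y_i <= xi_hat), and f_V(xi_hat) by a
    Gaussian kernel density estimate. Sketch with illustrative tuning."""
    y = np.asarray(y)
    n = len(y)
    xi = np.sort(y)[int(np.ceil(n * q)) - 1]      # order-statistic estimate
    b = int(np.floor(np.sqrt(n)))                 # batch size b_n
    a = n // b                                    # number of batches a_n
    ind = (y[: a * b] <= xi).astype(float)
    means = ind.reshape(a, b).mean(axis=1)        # batch means U_k
    sigma2 = b * np.sum((means - means.mean()) ** 2) / (a - 1)
    # Gaussian KDE of f_V at xi, Silverman's rule-of-thumb bandwidth
    iqr = np.quantile(y, 0.75) - np.quantile(y, 0.25)
    h = 0.9 * min(y.std(ddof=1), iqr / 1.34) * n ** (-0.2)
    f_hat = np.mean(np.exp(-0.5 * ((xi - y) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    gamma2 = sigma2 / f_hat ** 2
    half = 1.959964 * np.sqrt(gamma2 / n)         # z_{0.025}; 95% interval
    return xi - half, xi + half
```

On independent draws the interval behaves like the classical quantile interval; serial dependence in the chain inflates σ̂² through the batch means.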

Subsampling
It is natural to consider the utility of bootstrap methods for estimating quantiles and the Monte Carlo error. Indeed, there has been a substantial amount of work on using bootstrap methods for stationary time-series (e.g. Bertail and Clémençon, 2006; Bühlmann, 2002; Carlstein, 1986; Datta and McCormick, 1993; Politis, 2003). However, in our experience, MCMC simulations are typically sufficiently long that standard bootstrap methods are prohibitively computationally expensive.
We focus on the subsampling bootstrap method (SBM), described in general by Politis et al. (1999) and in the context of MCMC by Flegal (2012) and Flegal and Jones (2011). The basic idea is to split X into n − b + 1 overlapping blocks of length b. We then estimate ξ_q over each block, resulting in n − b + 1 estimates. To this end, consider the ith subsample {X_i, . . . , X_{i+b−1}}, i = 0, . . . , n − b, and let ξ̂*_{i,b,q} denote the quantile estimate, as at (1), based on this subsample. Then the SBM estimate of γ²(ξ_q) is given by

γ̂²_{SBM} = (b / (n − b + 1)) Σ_{i=0}^{n−b} (ξ̂*_{i,b,q} − ξ̂_{n,q})² .

Note the SBM avoids having to estimate a univariate density as in the implementation of BM and RS. An approximate 100(1 − α)% confidence interval for ξ_q is given by

ξ̂_{n,q} ± z_{α/2} γ̂_{SBM} / √n    (13)

where z_{α/2} is an appropriate standard Normal quantile.
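A sketch of the SBM variance estimate (ours, not the paper's code; the use of `sliding_window_view` and the default block size b = ⌊√n⌋ are implementation choices):

```python
import numpy as np

def sbm_variance(y, q, b=None):
    """Subsampling estimate of gamma^2(xi_q): compute the quantile estimate
    on each of the n - b + 1 overlapping blocks of length b and scale the
    squared deviations from the full-sample estimate by b. Sketch only."""
    y = np.asarray(y)
    n = len(y)
    if b is None:
        b = int(np.floor(np.sqrt(n)))
    xi = np.sort(y)[int(np.ceil(n * q)) - 1]   # full-sample estimate
    j = int(np.ceil(b * q)) - 1                # order-statistic index per block
    # all n - b + 1 overlapping blocks of length b, as a 2-D view
    blocks = np.lib.stride_tricks.sliding_window_view(y, b)
    xi_star = np.sort(blocks, axis=1)[:, j]    # quantile estimate per block
    return b * np.mean((xi_star - xi) ** 2)
```

Sorting every block is what makes SBM expensive relative to BM, a point taken up in the Discussion.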

Quantile estimation for regenerative Markov chains
Regenerative simulation (RS) provides an alternative estimation method for Markov chain simulations. RS is based on simulating an augmented Markov chain and so Theorem 4 does not apply. In this section, we derive an alternative CLT based on RS and consider a natural estimator of the variance in the asymptotic Normal distribution.
Recall X has n-step Markov kernel P^n(x, dy) and suppose there exists a function s : X → [0, 1] with E_π s > 0 and a probability measure Q such that

P(x, dy) ≥ s(x) Q(dy) .    (14)

We call s the small function and Q the small measure. In this case we can write

P(x, dy) = s(x) Q(dy) + (1 − s(x)) R(x, dy) ,    (15)

where R is the residual measure, given by

R(x, dy) = [P(x, dy) − s(x) Q(dy)] / (1 − s(x)) for s(x) < 1 .    (16)

We now have the ingredients for constructing the split chain {(X_0, δ_0), (X_1, δ_1), . . .} on X × {0, 1}: given X_n = x, draw δ_n ∼ Bernoulli(s(x)); if δ_n = 1, draw X_{n+1} ∼ Q(·), otherwise draw X_{n+1} ∼ R(x, ·). Two things are apparent from this construction. First, by (15) the marginal sequence {X_n} has Markov transition kernel given by P. Second, the set of n for which δ_{n−1} = 1, called regeneration times, represent times at which the chain probabilistically restarts itself in the sense that X_n ∼ Q(·) doesn't depend on X_{n−1}.
The main practical impediment to the use of regenerative simulation would appear to be the need to simulate from the residual kernel R(·, ·), defined at (16). Interestingly, as shown by Mykland et al. (1995), this is essentially a non-issue, as there is an equivalent update rule for the split chain which does not depend on R: given X_k = x, draw X_{k+1} ∼ P(x, ·) as usual and then, given X_{k+1}, set δ_k = 1 with the appropriate conditional probability of a regeneration having occurred.

RS has received considerable attention in the case where either a Gibbs sampler or a full-dimensional Metropolis-Hastings sampler is employed. In particular, Mykland et al. (1995) give recipes for establishing minorization conditions as in (14), which have been implemented in several practically relevant statistical models; see e.g. Doss and Tan (2013).

Suppose we start X with X_0 ∼ Q; one can always discard the draws preceding the first regeneration to guarantee this, but it is frequently easy to draw directly from Q (Hobert et al., 2002; Mykland et al., 1995). We will write E_Q to denote expectation when the split chain is started with X_0 ∼ Q. Let 0 = τ_0 < τ_1 < τ_2 < . . . be the regeneration times, so that τ_{t+1} = min{i > τ_t : δ_{i−1} = 1}. Assume X is run for R tours, so that the simulation is terminated the Rth time that a δ_i = 1. Let τ_R be the total length of the simulation and, for t = 1, . . . , R and a function h with E_π |h| < ∞, set N_t = τ_t − τ_{t−1} and S_t = Σ_{i=τ_{t−1}}^{τ_t − 1} h(X_i). The split chain construction ensures that the pairs (N_t, S_t) are independent and identically distributed. If E_Q N_t² < ∞ and E_Q S_t² < ∞, then, as R → ∞, with probability 1,

Σ_{t=1}^{R} S_t / Σ_{t=1}^{R} N_t → E_π h    (17)

and

√R ( Σ_{t=1}^{R} S_t / Σ_{t=1}^{R} N_t − E_π h ) →d N(0, Γ) .    (18)

Moreover, there is an easily calculated consistent estimator of Γ; see Hobert et al. (2002). However, the required moment conditions, E_Q N_t² < ∞ and E_Q S_t² < ∞, are unwieldy and difficult to check in practice. Hobert et al. (2002) showed that these moment conditions will hold if the Markov chain X is geometrically ergodic and there exists δ > 0 such that E_π |h|^{2+δ} < ∞. Our next result significantly weakens the required mixing conditions. The proof can be found in Appendix B.5.
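Given the regeneration indicators δ_i, forming the tour statistics (N_t, S_t) is mechanical. The sketch below is ours, with an assumed convention (δ_i = 1 marks a regeneration at time i + 1, the chain starts at a regeneration, and h is an indicator I(Y_i ≤ threshold) as used for quantiles later):

```python
import numpy as np

def tour_statistics(y, delta, threshold):
    """Split MCMC output into regeneration tours and return (N_t, S_t),
    where N_t is the length of tour t and S_t counts the indicators
    I(Y_i <= threshold) within the tour."""
    y = np.asarray(y)
    # tour t covers times tau_{t-1}, ..., tau_t - 1
    regen_times = np.flatnonzero(delta) + 1
    starts = np.concatenate(([0], regen_times[:-1]))
    ends = regen_times
    N = ends - starts
    S = np.array([np.sum(y[a:b] <= threshold) for a, b in zip(starts, ends)])
    return N, S
```

Because the (N_t, S_t) pairs are i.i.d. across tours, downstream variance estimation reduces to ordinary sample moments of these pairs.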
In the sequel we use Theorem 5 to develop an RS-based CLT for quantiles.

Quantile estimation
For y ∈ R let N_1 = τ_1, S_1(y) = Σ_{i=0}^{τ_1 − 1} I(Y_i ≤ y), and define

Γ(y) = E_Q[(S_1(y) − F_V(y) N_1)²] / (E_Q N_1)² ,

which exists under the conditions of Theorem 5.
The proof of the following CLT is given in Appendix B.6.
Theorem 6 requires slightly weaker mixing conditions than Theorem 4, but stronger conditions on f_V. Since ξ̂_{τ_R,q} requires j such that 0 ≤ j − τ_R q < 1, we have the following corollary.
To obtain an MCSE we need to estimate γ²_R(ξ_q) := Γ(ξ_q)/f_V²(ξ_q). We substitute ξ̂_{τ_R,q} for ξ_q and separately consider estimating Γ(ξ̂_{τ_R,q}) and f_V(ξ̂_{τ_R,q}). Of course, we can handle estimating f_V(ξ̂_{τ_R,q}) exactly as before, so all we need to concern ourselves with is estimation of Γ(ξ̂_{τ_R,q}).
We can recognize Γ(y) as the variance of an asymptotic Normal distribution. Let F̄_R(y) = Σ_{t=1}^{R} S_t(y) / Σ_{t=1}^{R} N_t. Then, using (17), we have that with probability 1 as R → ∞, F̄_R(y) → F_V(y) for each fixed y. Moreover, using (18), we have for each y ∈ R, as R → ∞,

√R (F̄_R(y) − F_V(y)) →d N(0, Γ(y)) .

We can consistently estimate Γ(y) for each y with

Γ̂(y) = R^{−1} Σ_{t=1}^{R} (S_t(y) − F̄_R(y) N_t)² / N̄² ,

where N̄ = τ_R / R. Finally, if t_{R−1,α/2} is a quantile from a Student's t distribution with R − 1 degrees of freedom, a 100(1 − α)% confidence interval for ξ_q is

ξ̂_{τ_R,q} ± t_{R−1,α/2} √( Γ̂(ξ̂_{τ_R,q}) / [f̂_V(ξ̂_{τ_R,q})]² ) / √R .    (19)
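The RS-based interval can then be assembled from the tour statistics. The sketch below is ours: `f_hat` (a density estimate at the quantile estimate) and the Student's t quantile `t_star` are supplied by the user, and the Γ̂ formula follows the tour-based variance estimate described above.

```python
import numpy as np

def rs_interval(N, S_at_xi, xi, f_hat, t_star=2.0):
    """RS-based interval for xi_q from tour statistics: estimate Gamma(xi)
    from Z_t = S_t - F_bar * N_t scaled by N_bar^2, then form
    xi +/- t_star * sqrt(Gamma_hat / f_hat^2) / sqrt(R). Sketch only."""
    N = np.asarray(N, dtype=float)
    S = np.asarray(S_at_xi, dtype=float)
    R = len(N)
    F_bar = S.sum() / N.sum()                 # RS estimate of F_V(xi)
    N_bar = N.mean()
    Z = S - F_bar * N
    Gamma_hat = np.mean(Z ** 2) / N_bar ** 2  # estimate of Gamma(xi)
    half = t_star * np.sqrt(Gamma_hat / f_hat ** 2) / np.sqrt(R)
    return xi - half, xi + half
```

When every tour has the same (N_t, S_t), the Z_t all vanish and the interval collapses to a point, as it should.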

Examples
In this section, we investigate the finite-sample performance of the confidence intervals for ξ_q defined at (12), (13), and (19), corresponding to BM, SBM and RS, respectively. While our two examples are quite different, the simulation studies were conducted using a common methodology. In each case we perform many independent replications of the MCMC sampler. Each replication was performed for a fixed number of regenerations, then confidence intervals were constructed on the same MCMC output. For the BM-based and SBM-based intervals we always used b_n = n^{1/2}, which has been found to work well in other settings (Flegal, 2012; Flegal and Jones, 2010; Jones et al., 2006). In order to estimate coverage probabilities we require the true values of the quantiles of interest. These are available in only one of our examples. In the other example we estimate the truth with an independent long run of the MCMC sampler. The details are described in the following sections.

Polynomial target distribution
Jarner and Roberts (2007) studied MCMC for heavy-tailed target distributions. A target distribution is said to be polynomial of order r if its density satisfies f(x) = l(|x|)/|x|^{1+r}, where r > 0 and l is a normalized slowly varying function; a particular example is Student's t-distribution. We consider estimating quantiles of Student's t-distribution t(v) for degrees of freedom v = 3, 6, and 30; the t(v) distribution is polynomial of order v. We use a Metropolis random walk algorithm with jump proposals drawn from a N(0, σ²) distribution. By Proposition 3 of Jarner and Roberts (2007), a Metropolis random walk for a t(v) target distribution using any proposal kernel with finite variance is polynomially ergodic of order v/2. Thus the conditions of Theorem 4 are met if v > 22, while the conditions of Corollary 2 are satisfied for v > 2; see the first row of Table 2.
We tuned the scale parameter σ² in the proposal distribution in order to minimize autocorrelation in the resulting chain (second row of Table 2); the resulting acceptance rates varied from about 25% for t(3) with σ = 5.5, the heaviest-tailed target distribution, to about 40% for t(30) with σ = 2.5. Regeneration times were identified using the retrospective method of Mykland et al. (1995); see Appendix C for implementation details, and the bottom rows of Table 2 for regeneration performance statistics (mean and SD of tour lengths). For each of the 10^4 replications and using each of (12), (13), and (19), we computed a 95% confidence interval for ξ_q for q = 0.50, 0.75, 0.90, and 0.95.
Empirical coverage rates (percentage of the 10^4 intervals that indeed contain the true quantile ξ_q) are shown in Table 3. We first note that, as might be expected, agreement with the nominal coverage rate is closer for estimation of the median than for the tail quantiles ξ_{.90} and ξ_{.95}. As for comparing the three approaches to MCSE estimation, we find that agreement with the nominal coverage rate is closest for SBM on average, but SBM also shows the greatest variability between cases considered, including a couple of instances (ξ_{.90} and ξ_{.95} for the t(3) target distribution) where the method appears overly conservative. Results for BM and RS show less variability than those of SBM, with agreement with the nominal rate being slightly better for RS. It is interesting to note that while the conditions of Theorem 4 do not hold for the t(6) or t(3) target distributions, BM and SBM still appear to be producing consistent estimators of γ²(ξ_q), as agreement with the nominal rate improves from R = 500 to R = 2000. This suggests the possibility that our theoretical results might be improved upon, i.e. a Markov chain CLT for quantiles may well hold under weaker conditions than those required for our Theorem 4.

Target distribution    t(30)   t(6)    t(3)
MCSE estimation        Gen     RS      RS
Tuning parameter σ     2.5     3.5     5.5
Mean tour length       3.58    4.21    5.60
SD of tour lengths     3.14    3.80    5.23

Table 2: Metropolis random walk on t(v) target distribution with N(0, σ²) jump proposals, example of Section 5.1. In the first row of the table, "Gen" indicates polynomial ergodicity of order m > 11, guaranteeing the conditions of both Theorem 4 and Corollary 2; "RS" indicates m > 1, guaranteeing the conditions of Corollary 2.

Table 4 shows the mean and standard deviation of interval half-widths for the three cases (defined by the quantile q and number of regenerations R) in which all empirical coverage rates were at least 0.935. The most striking result here is the huge variability in the standard errors as computed by SBM, particularly for the heaviest-tailed target distribution. Results for BM and RS are comparable, with RS intervals being slightly wider and having slightly less variability. The SBM intervals are generally as wide or wider, demonstrating again the apparent conservatism of the method.

Probit regression
van Dyk and Meng (2001) report data concerned with the occurrence of latent membranous lupus nephritis. Let y_i be an indicator of the disease (1 for present), x_{i1} be the difference between IgG3 and IgG4 (immunoglobulin G), and x_{i2} be IgA (immunoglobulin A), where i = 1, . . . , 55. Suppose

Pr(y_i = 1 | β) = Φ(β_0 + β_1 x_{i1} + β_2 x_{i2}) ,

where Φ is the standard Normal distribution function, and take the prior on β := (β_0, β_1, β_2) to be Lebesgue measure on R³. Roy and Hobert (2007) show that the posterior π(β|y) is proper. Our goal is to report a median and an 80% Bayesian credible region for each of the three marginal posterior distributions. Denote the qth quantile associated with the marginal for β_j as ξ_q^{(j)} for j = 0, 1, 2. Then the vector of parameters to be estimated is

Ξ = (ξ_{.1}^{(0)}, ξ_{.5}^{(0)}, ξ_{.9}^{(0)}, ξ_{.1}^{(1)}, ξ_{.5}^{(1)}, ξ_{.9}^{(1)}, ξ_{.1}^{(2)}, ξ_{.5}^{(2)}, ξ_{.9}^{(2)}) .
We will sample from the posterior using the PX-DA algorithm of Liu and Wu (1999), which Roy and Hobert (2007) prove is geometrically ergodic. For a full description of this algorithm in the context of this example see Flegal and Jones (2010) or Roy and Hobert (2007).

We now turn our attention to comparing coverage probabilities for estimating elements of Ξ based on the confidence intervals at (12), (13), and (19). We calculated a precise estimate from a long simulation of the PX-DA chain and declared the observed quantiles to be the truth; see Table 5. Roy and Hobert (2007) implement RS for this example and we use their settings exactly with 25 regenerations. This procedure was repeated for 1000 independent replications, resulting in a mean simulation effort of 3.89E5 (2400). The resulting coverage probabilities can be found in Table 6. Notice that for the BM and SBM intervals all the coverage probabilities are within two MCSEs of the nominal 0.95 level. However, for RS only 7 of the 9 investigated settings are within two MCSEs of the nominal level. In addition, all of the results using RS are below the nominal 0.95 level. Table 6 also gives the empirical mean and standard deviation of the half-width of the BM-based, RS-based, and SBM-based confidence intervals. Notice the interval lengths are similar across the three methods, but the RS-based interval lengths are more variable. Further, the RS-based intervals are uniformly wider on average than the BM-based intervals even though they have uniformly lower empirical coverage probabilities.

Discussion
We have focused on assessing the Monte Carlo error for estimating quantiles in MCMC settings. In particular, we established quantile CLTs and considered using batch means, subsampling and regenerative simulation to estimate the variance of the asymptotic Normal distributions. We also studied the finite-sample properties of the resulting confidence intervals in the context of two examples.
The mixing conditions required in the CLT in Theorem 4 are slightly stronger than those of the CLT of Theorem 6, which is based on RS. However, RS requires stronger conditions on the density f_V and it requires the user to establish a useful minorization condition (14). Although minorization conditions are often nearly trivial to establish, they are seen as a substantial barrier to practitioners because they require a problem-specific approach. Alternatively, it is straightforward to implement the BM-based and SBM-based approaches in general software; see the recent mcmcse R package (Flegal and Hughes, 2012), which implements the methods of this paper.
Overall, the finite sample properties were comparable across the three variance estimation techniques considered. However, SBM required substantially more computational effort because it orders each of the n − b + 1 overlapping blocks to obtain the quantile estimates. For example, we ran a three-dimensional probit regression Markov chain (Section 5.2) for 2 × 10^5 iterations and calculated an MCSE for the median of the three marginals. The BM calculation took 0.37 seconds while the SBM calculation took 84.04 seconds, or 227 times longer.
The techniques developed here are applicable for a wide range of target quantiles. Hence, our work allows Bayesian practitioners to evaluate the uncertainty of the end points of commonly reported Bayesian credible regions. In other applications the goal may be estimation of extreme quantiles. However, the techniques developed in the current paper should be used with caution since many extreme quantile estimators are based on sample statistics other than order statistics.

A Preliminaries: Markov chains as mixing processes
Let S = {S_n} be a strictly stationary stochastic process on a probability space (Ω, F, P) and set F_k^l = σ(S_k, . . . , S_l). Define the α-mixing coefficients for n = 1, 2, 3, . . . as

α(n) = sup_{k ≥ 1} sup { |P(A ∩ B) − P(A)P(B)| : A ∈ F_1^k, B ∈ F_{k+n}^∞ } .

Let f : Ω → R be Borel. Set T = {f(S_n)} and let α_T and α_S be the α-mixing coefficients for T and S, respectively. Then by elementary properties of sigma-algebras (cf. Chow and Teicher, 1978, p. 16) σ(T_k, . . . , T_l) ⊆ σ(S_k, . . . , S_l) = F_k^l and hence α_T(n) ≤ α_S(n) for all n. Define the β-mixing coefficients for n = 1, 2, 3, . . . as

β(n) = sup_{k ≥ 1} E [ sup { |P(B | F_1^k) − P(B)| : B ∈ F_{k+n}^∞ } ] .

If β(n) → 0 as n → ∞, we say that S is β-mixing, while if α(n) → 0 as n → ∞, we say that S is α-mixing. It is easy to prove that 2α(n) ≤ β(n) for all n (see Bradley, 1986, for discussion of this and other inequalities), so that β-mixing implies α-mixing.
Let X be a stationary Harris ergodic Markov chain on (X, B(X)), which has invariant distribution π. In this case the expressions for the α- and β-mixing coefficients can be simplified; in particular, Davydov (1973) showed that

β(n) = ∫_X ‖P^n(x, ·) − π(·)‖ π(dx) .
Proof. The first part is Theorem 4.3 of Bradley (1986) while the second part can be found in the proof of Theorem 2 in Chan and Geyer (1994).
Since 2α(n) ≤ β(n), we observe that Theorem 7 ensures that if p ≥ 0 and X is polynomially ergodic of order m > p + 1, then Σ_n n^p α(n) < ∞.


B Proofs

B.1 Proof of Proposition 1
We begin by showing that we can weaken the conditions of Lemma 3.3 in Wang et al. (2011).
A similar argument shows that the corresponding bound holds for sufficiently large n. The remainder exactly follows the proof of Lemma 3.3 in Wang et al. (2011) and hence is omitted.
The proof of Proposition 1 will follow directly from the following Corollary.
Corollary 3. Suppose the stationary Markov chain X is polynomially ergodic of order m > 1. Then ξ̂_{n,q} ∈ D_n with probability 1 for all sufficiently large n.
Proof. Let α Y (n) be the strong mixing coefficients for Y = {g(X n )} and note that α Y (n) ≤ n −m E π M by Theorem 7. The remainder follows from Lemma 1 and our basic assumptions on F V and f V .

B.2 Proof of Theorems 2 and 3
We begin with some preliminary results.

B.3 Proof for Example 1
Let q(x) denote the density of a t(3) distribution, f_X(x) the density of a t(4) distribution, f_{Y|X}(y|x) the density of a Gamma(5/2, 2 + x²/2) distribution, and π(x, y) the density at (7). Then the Markov chain has a Markov transition density, and bounding the independence-sampler acceptance ratio shows that, for all x, y, x′, y′, this transition density is at least (√9375/(32π)) π(x′, y′); our claim follows immediately.

B.4 Proof of Theorem 4
Lemma 5. Suppose the stationary Markov chain X is polynomially ergodic of order m > 5 and let D_n = [ξ_q − C_0 n^{−1/2+δ} √(log n), ξ_q + C_0 n^{−1/2+δ} √(log n)] for a positive finite constant C_0. Then for any δ ∈ (3/(m + 1), 1/2) there is a positive finite constant C such that, with probability 1, for sufficiently large n,

sup_{y ∈ D_n} | [F_n(y) − F_n(ξ_q)] − [F_V(y) − F_V(ξ_q)] | ≤ C n^{−3/4+δ} √(log n) .

Proof. Let α_Y(n) be the strong mixing coefficients for Y = {g(X_n)} and note that α_Y(n) ≤ n^{−m} E_π M by Theorem 7. The claim now follows directly from Theorem 2.1 in Wang et al. (2011) and our basic assumptions on F_V and f_V.
Proof of Theorem 4. The proof follows a technique introduced in Sen (1972). Assume that X_0 ∼ π and hence the Markov chain is stationary. From Lemma 5 we have a bound holding for all y ∈ D_n with probability 1 for sufficiently large n. Since m > 11 we see that −1/4 + δ < 0, and hence the bound vanishes for all y ∈ D_n with probability 1 as n → ∞. Recall the definition of σ²(y) from (9). Theorem 9 in Jones (2004) gives a CLT for F_n(y), and hence by Slutsky's theorem the corresponding limit holds for all y ∈ D_n as n → ∞. By Corollary 3, ξ̂_{n,q} ∈ D_n with probability 1 for sufficiently large n. Notice that by definition F_n(ξ̂_{n,q}) = F_V(ξ_q) + O(n^{−1}) as n → ∞, and by Taylor's expansion there exists 0 < h < 1 such that

Now by Corollary 3, with probability 1 as n → ∞ the remainder term is negligible, and we conclude that the stated limit holds as n → ∞. That the same conclusion holds for any initial distribution follows from the same argument as in Theorem 17.1.6 of Meyn and Tweedie (2009).

B.6 Proof of Theorem 6
We require a preliminary result before proceeding with the rest of the proof.
Lemma 6. If X is polynomially ergodic of order m > 1, then Γ(y) is continuous at ξ q .
Proof. Denote the limits from the right and left as lim_{y→x+} and lim_{y→x−}, respectively. From the assumption on F_V it is clear that F_V is continuous at ξ_q. Let Z_1(y) = S_1(y) − F_V(y) N_1 and note E_Q[Z_1(y)] = 0, as Hobert et al. (2002) show. Equations (24) and (25) yield E_Q[lim_{y→ξ_q+} S_1(y)] = E_Q[lim_{y→ξ_q−} S_1(y)]. The composition limit law and (24) yield the corresponding statement for Z_1. What remains is to show that the limit of the expectation is the expectation of the limit. Notice that 0 < S_1(y) ≤ N_1 for all y ∈ R and |Z_1(y)| = |S_1(y) − F_V(y) N_1| ≤ S_1(y) + N_1 ≤ 2N_1, which implies E_Q Z_1(y)² ≤ 4 E_Q N_1². By Theorem 5, E_Q N_1² < ∞, and the dominated convergence theorem allows the interchange of limit and expectation for any finite x. Finally, from the above fact and (26) we have lim_{y→ξ_q+} E_Q Z_1(y)² = lim_{y→ξ_q−} E_Q Z_1(y)², and hence E_Q Z_1(y)² is continuous at ξ_q, implying the desired result. Hobert et al. (2002) show that Γ(y) = σ²(y) E_π s, where s is defined at (14), which yields the following corollary.
Proof of Theorem 6. First, consider the s_R sequence. A Taylor series expansion yields an expression in which ζ is between ξ_q and ξ_q + y/√R. Let h : R_+ → R_+ satisfy lim_{R→∞} h(τ_R)/√τ_R = 0 and set j = τ_R q + h(τ_R). From (27) we have lim_{R→∞} s_R = −y f_V(ξ_q) by assumptions on F_V and the fact that N̄ → E_Q(N_1) with probability 1, where 1 ≤ E_Q N_1 < ∞ by Kac's theorem.
Lemma 6 and the continuous mapping theorem yield the convergence of the variance term. Using (28), (29), and Slutsky's Theorem, we can conclude the result as R → ∞.


C Regenerative simulation in example of Section 5.1

The minorization condition necessary for RS is, at least in principle, quite straightforward for a Metropolis-Hastings algorithm. Let q(x, y) denote the proposal kernel density, and α(x, y) the acceptance probability. Then P(x, dy) ≥ q(x, y) α(x, y) dy, since the right hand side only accounts for accepted jump proposals, and the minorization condition is established by finding s′ and ν′ such that q(x, y) α(x, y) ≥ s′(x) ν′(y). By Theorem 2 of Mykland et al. (1995), the probability of regeneration on an accepted jump from x to y is then given by

r_A(x, y) = s′(x) ν′(y) / (q(x, y) α(x, y)) .
Letting π denote the (possibly unnormalized) target density, we have for a Metropolis random walk

α(x, y) = min{π(y)/π(x), 1} ≥ min{c/π(x), 1} · min{π(y)/c, 1}

for any positive constant c. Further, for any point x̃ and any set D we have

q(x, y) ≥ [ inf_{y∈D} q(x, y)/q(x̃, y) ] q(x̃, y) I_D(y) .
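Putting the pieces of this appendix together for a Gaussian random-walk proposal, the regeneration probability on an accepted jump can be evaluated as follows. This is our own sketch: the constant c, the point x̃ (`x0` below) and the interval D are user choices, and the endpoint trick for the infimum relies on the Gaussian proposal ratio being monotone over an interval.

```python
import numpy as np

def regen_prob(x, y, pi, sigma, c, x0, D):
    """Probability of regeneration on an accepted Metropolis random-walk
    jump from x to y, with s'(x) nu'(y) built from a constant c, a fixed
    point x0 and an interval D = (d_lo, d_hi). pi is the (possibly
    unnormalized) target density. Illustrative sketch."""
    d_lo, d_hi = D
    if not (d_lo <= y <= d_hi):
        return 0.0  # nu' is supported on D
    q = lambda a, b: np.exp(-0.5 * ((b - a) / sigma) ** 2)  # N(0, sigma^2) walk
    # inf_{z in D} q(x, z)/q(x0, z): the ratio is exp(linear in z),
    # hence monotone, so the infimum over an interval sits at an endpoint
    inf_ratio = min(q(x, d_lo) / q(x0, d_lo), q(x, d_hi) / q(x0, d_hi))
    s = min(c / pi(x), 1.0) * inf_ratio          # s'(x)
    nu = min(pi(y) / c, 1.0) * q(x0, y)          # nu'(y)
    alpha = min(pi(y) / pi(x), 1.0)              # random-walk acceptance prob
    return min(s * nu / (q(x, y) * alpha), 1.0)
```

At x = y = x0 with c = π(x0), every factor equals one and the jump regenerates with probability 1, which is a useful sanity check on an implementation.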