Hitting time and mixing time bounds of Stein's factors

For any discrete target distribution, we exploit the connection between Markov chains and Stein's method via the generator approach and express the solution of Stein's equation in terms of expected hitting times. This yields new upper bounds on Stein's factors in terms of parameters of the Markov chain, such as the mixing time and the gradient of the expected hitting time. We compare the performance of these bounds with those in the literature, and in particular we consider Stein's method for the discrete uniform, binomial, geometric and hypergeometric distributions. As another application, the same methodology applies to bounding expected hitting times via Stein's factors. This article highlights the interplay between Stein's method, modern Markov chain theory and classical fluctuation theory.


INTRODUCTION AND MAIN RESULTS
Stein's method is well known to be a powerful method for bounding the error rates of various distributional approximations; see e.g. Barbour and Chen (2005); Diaconis and Holmes (2004); Ley et al. (2017); Ross (2011) and the references therein. At its heart lies Stein's equation

    L f_h(x) = E h(Z) − h(x), x ∈ X,    (1.1)

where Z ∼ π is the target distribution with support on X, L is the Stein operator associated with π, h belongs to a rich function class such as the class of indicator functions or Lipschitz continuous functions, and f_h is the solution of Stein's equation. One popular approach to identifying the Stein operator L is the generator approach introduced by Barbour (1990); Götze (1991), where L is the generator of a Markov process X = (X_t)_{t≥0} with transition semigroup (P_t)_{t≥0} on the state space X and stationary distribution π. Writing π(f) = ∫ f dπ, the solution f_h can then be related to X via

    f_h(x) = ∫_0^∞ (P_t h(x) − π(h)) dt,    (1.2)

whenever the above integral exists. The obvious advantage of this approach is the connection with Markov processes; see e.g. Brown and Xia (2001); Eichelsbacher and Reinert (2008). In addition, the solution form of f_h naturally invites spectral techniques for Stein's method, which has been the subject of investigation in Schoutens (2001). We also remark that Döbler et al. (2017) propose an interesting iterative procedure for bounding f_h and its derivatives.
Suppose now X = ⟦0, N⟧ with N ∈ N_0 ∪ {∞} is a countable set, where we write ⟦a, b⟧ = {a, a + 1, . . . , b − 1, b} for a, b ∈ Z. In Markov chain theory, the operator appearing on the right of (1.2) is commonly known as the deviation kernel D = (D(i, j))_{i,j∈X} as in Coolen-Schrijner and van Doorn (2002); Mao (2004) (also known as the fundamental matrix Kemeny et al. (1976), ergodic potential Syski (1978) or centered resolvent Miclo (2016)). In this paper, we further exploit this intimate connection between Markov chain theory and Stein's method, which allows us to express f_h in terms of hitting times of an associated birth-death chain for any discrete target distribution π as in Eichelsbacher and Reinert (2008); from there we connect Stein's method with the modern Markov chain literature and offer universal bounds on Stein's factors in terms of quantities such as the mixing time and the eigentime.
Before we discuss our main results, we fix our notation and revisit various parameters of countable Markov chains. We refer readers to Aldous and Fill (2002); Levin et al. (2009); Montenegro and Tetali (2006) for an in-depth account of these topics. For any probability measures µ, ν with support on X, the total variation distance between µ and ν is

    ||µ − ν||_TV := sup_{A⊆X} |µ(A) − ν(A)|.

For f on X, we write ||f||_∞ := sup_x |f(x)|, the sup-norm of f. In this paper, we are primarily interested in the following parameters associated with an ergodic countable Markov chain X = (X_t)_{t≥0}:
• Worst-case mixing time: for any ε > 0,

    t_mix(ε) := inf{t ≥ 0; sup_{i∈X} ||P_t(i, ·) − π||_TV ≤ ε}.

• Average hitting time and relaxation time:

    t_av := Σ_{j∈X} π(j) E_π(τ_j),

where τ_A := inf{t; X_t ∈ A} is the first hitting time of A by X, and as usual we write τ_j = τ_{{j}}. Note that for a uniformly ergodic Markov chain, the eigentime identity Aldous and Fill (2002); Cui and Mao (2010); Mao (2004) is given by

    t_av = Σ_{k≥1} 1/λ_k,

where λ_0 = 0 < λ_1 ≤ λ_2 ≤ . . . are the eigenvalues of −L. A closely related parameter is the relaxation time t_rel := 1/λ_1, and for finite reversible Markov chains we have t_mix(ε) ≤ t_rel log(1/(ε π_min)) with π_min := min_i π(i), see e.g. (Levin et al., 2009, Theorem 20.6).
• Worst-case expected strong stationary time Aldous (1982):

    t_sst := sup_{i∈X} E_i(T_i),

where T_i is a mean-optimal strong stationary time for the chain started at i (that is, X_{T_i} ∼ π and X_{T_i} is independent of T_i).
• Worst-case expected hitting time of large sets Oliveira (2012); Peres and Sousi (2015): for 0 < α < 1/2,

    t_hit(α) := sup{E_i(τ_A); i ∈ X, A ⊆ X with π(A) ≥ α}.

• Worst-case expected deviation of hitting time to a single state, t_dev, Aldous (1982).
Note that Oliveira (2012); Peres and Sousi (2015) show that for finite Markov chains t_mix(1/4) and t_hit(α) are equivalent up to a constant depending on α, while Aldous (1982) proves the equivalence (up to universal constants) between t_mix(1/4), t_sst and t_dev for reversible finite Markov chains. With the above notation in mind, we are now ready to state the main result of this paper:
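To make these parameters concrete, the following sketch computes t_av in two ways, from hitting times and from the eigentime identity, and checks that Σ_j π(j) E_i(τ_j) does not depend on the starting state i (the random target lemma). The birth-death rates below are hypothetical and chosen only for illustration; numpy is assumed.

```python
import numpy as np

# Hypothetical small reversible birth-death generator on {0,...,5};
# the rates are arbitrary and only serve to illustrate the definitions.
n = 6
b = np.array([2.0, 1.5, 1.0, 0.8, 0.5])  # birth rates b_0,...,b_4
d = np.array([1.0, 1.2, 0.9, 1.1, 1.0])  # death rates d_1,...,d_5

L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i + 1] = b[i]      # i -> i+1
    L[i + 1, i] = d[i]      # i+1 -> i
np.fill_diagonal(L, -L.sum(axis=1))

# Stationary distribution via detailed balance: pi(i+1)/pi(i) = b_i/d_{i+1}.
pi = np.ones(n)
for i in range(n - 1):
    pi[i + 1] = pi[i] * b[i] / d[i]
pi /= pi.sum()

def hitting_times(j):
    """E_i(tau_j) for all i: solve (L f)(k) = -1 for k != j, with f(j) = 0."""
    A, rhs = L.copy(), -np.ones(n)
    A[j, :], rhs[j] = 0.0, 0.0
    A[j, j] = 1.0
    return np.linalg.solve(A, rhs)

E = np.column_stack([hitting_times(j) for j in range(n)])  # E[i, j] = E_i(tau_j)

# Random target lemma: sum_j pi(j) E_i(tau_j) is the same for every i.
t_av_by_start = E @ pi

# Eigentime identity: t_av = sum_{k >= 1} 1/lambda_k, with lambda_k the nonzero
# eigenvalues of -L, computed via the symmetrized generator.
S = np.diag(np.sqrt(pi)) @ L @ np.diag(1.0 / np.sqrt(pi))
lam = np.sort(np.linalg.eigvalsh(-(S + S.T) / 2))
t_av_spectral = (1.0 / lam[1:]).sum()
```

Both computations of t_av agree, which is exactly the eigentime identity quoted above.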

Theorem 1.1. The deviation kernel D associated with L and π exists and is finite if and only if

    E_π(τ_j) = Σ_{i∈X} π(i) E_i(τ_j) < ∞

for some (equivalently, for all) j ∈ X.
In such case, if L is reversible, then for i, j ∈ X and for any h such that π(|h|) < ∞,

    D(i, j) = π(j)(E_π(τ_j) − E_i(τ_j)),  D(j, j) = π(j) E_π(τ_j),
    f_h(i) = Σ_{j∈X} D(i, j) h(j).

In particular, if h = δ_j, the Dirac mass at j, then f_{δ_j}(i) = D(i, j) and the sup-norms of the Stein factors are

    ||f_{δ_j}||_∞ = π(j) sup_{i∈X} |E_π(τ_j) − E_i(τ_j)|,    (1.5)
    ||∇f_{δ_j}||_∞ = π(j) sup_{i∈X} |E_i(τ_j) − E_{i+1}(τ_j)|.    (1.6)

Note that we can always pick L to be a birth-death process, in which case the expected hitting time E_i(τ_j) is readily computable and is expressed solely in terms of π; see Remark 1.1.

Remark 1.1. For any discrete distribution π on X, it is shown in Eichelsbacher and Reinert (2008) that we can always pick L to be the generator of a birth-death process with birth rates b_i := L(i, i + 1) = (i + 1)π(i + 1)/π(i) for i ∈ ⟦0, N − 1⟧ and death rates d_i := L(i, i − 1) = i for i ∈ ⟦1, N⟧. According to Coolen-Schrijner and van Doorn (2002); Kijima (1997); Mao (2004), for i, j ∈ X = ⟦0, N⟧,

    E_i(τ_j) = Σ_{k=i}^{j−1} (1/((k + 1)π(k + 1))) Σ_{l=0}^{k} π(l),  i < j,
    E_i(τ_j) = Σ_{k=j}^{i−1} (1/((k + 1)π(k + 1))) Σ_{l=k+1}^{N} π(l),  i > j,

and so the discrete forward gradient ∇f_{δ_j} can be written as

    ∇f_{δ_j}(i) = f_{δ_j}(i + 1) − f_{δ_j}(i) = π(j)(E_i(τ_j) − E_{i+1}(τ_j)).

Another formula for E_i(τ_j) involves differences of sums of reciprocals of eigenvalues; see e.g. Gong et al. (2012) and the references therein. In any case, all these expressions are written solely in terms of the given target π.
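As a sanity check on the hitting-time formulas of Remark 1.1, the sketch below builds the Eichelsbacher-Reinert birth-death chain for a hypothetical Binomial(8, 0.3) target (any π with full support on ⟦0, N⟧ would do; numpy is assumed) and compares the closed-form E_i(τ_j) with a direct linear-system solve.

```python
import math
import numpy as np

# Hypothetical target on {0,...,N}: Binomial(N, p).
N, p = 8, 0.3
pi = np.array([math.comb(N, i) * p**i * (1 - p)**(N - i) for i in range(N + 1)])

# Birth-death chain of Remark 1.1: b_i = (i+1) pi(i+1)/pi(i), d_i = i.
b = np.array([(i + 1) * pi[i + 1] / pi[i] for i in range(N)])
d = np.arange(1.0, N + 1)  # d_i = i for i = 1,...,N

def E_tau(i, j):
    """Closed-form E_i(tau_j); note b_k pi(k) = d_{k+1} pi(k+1) = (k+1) pi(k+1)."""
    if i < j:   # upward: E_k(tau_{k+1}) = pi({0,...,k}) / ((k+1) pi(k+1))
        return sum(pi[: k + 1].sum() / ((k + 1) * pi[k + 1]) for k in range(i, j))
    if i > j:   # downward: E_{k+1}(tau_k) = pi({k+1,...,N}) / ((k+1) pi(k+1))
        return sum(pi[k + 1:].sum() / ((k + 1) * pi[k + 1]) for k in range(j, i))
    return 0.0

# Direct computation from the generator, for comparison.
L = np.zeros((N + 1, N + 1))
for i in range(N):
    L[i, i + 1] = b[i]
    L[i + 1, i] = d[i]
np.fill_diagonal(L, -L.sum(axis=1))

def E_tau_solve(j):
    """E_i(tau_j) for all i, from the linear system L f = -1 off j, f(j) = 0."""
    A, rhs = L.copy(), -np.ones(N + 1)
    A[j, :], rhs[j] = 0.0, 0.0
    A[j, j] = 1.0
    return np.linalg.solve(A, rhs)
```

The two computations agree to machine precision, illustrating that the hitting times, and hence f_{δ_j} and its gradient, are expressed solely in terms of π.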
Remark 1.2. It is tempting to think that ||f_{δ_j}||_∞ equals D(j, j). Yet, while D(i, j) ≤ D(j, j) for any i, j, it is unclear to the author whether |D(i, j)| is less than or equal to D(j, j).
Theorem 1.1 reveals that hitting times and the other Markov chain parameters described above are closely related to the structure and properties of f_h. In particular, ||∇f_{δ_j}||_∞ ≤ t_dev. This upper bound allows us to bound the Stein factors using various parameters:

Corollary 1.1 (Bounding Stein's factors via hitting and mixing time). Suppose that π is a discrete distribution with finite support on X = ⟦0, N⟧ with N < ∞, and L is a reversible generator. Let H = {h : X → [0, 1]} and i ∈ X. Then

    sup_{h∈H} ||f_h||_∞ ≤ 2 t_av,
    sup_{h∈H} ||∇f_h||_∞ ≤ 2 t_dev,

and t_dev is in turn bounded above by constant multiples of t_sst, t_mix(1/4), t_rel log(4/π_min) and C_α t_hit(α), where 0 < α < 1/2 and C_α > 0 is a universal constant depending only on α.
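The first bound of the corollary is easy to probe numerically. Since every row of D sums to zero, the worst h ∈ H = {h : X → [0, 1]} picks out the positive (or negative) part of a row of D, so sup_{h∈H} |f_h(i)| = (1/2) Σ_j |D(i, j)|. The sketch below (hypothetical birth-death rates, numpy assumed) compares this quantity with 2 t_av.

```python
import numpy as np

# Hypothetical reversible birth-death chain on {0,...,6}.
n = 7
b = np.array([1.5, 1.0, 2.0, 0.7, 1.2, 0.9])
d = np.ones(6)

L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i + 1] = b[i]
    L[i + 1, i] = d[i]
np.fill_diagonal(L, -L.sum(axis=1))

pi = np.ones(n)
for i in range(n - 1):
    pi[i + 1] = pi[i] * b[i] / d[i]
pi /= pi.sum()
Pi = np.outer(np.ones(n), pi)
D = np.linalg.inv(Pi - L) - Pi      # f_{delta_j}(i) = D(i, j)

# Rows of D sum to zero, so the worst h in H = {h: X -> [0,1]} achieves
# half the absolute row sum of D.
sup_f = 0.5 * np.abs(D).sum(axis=1).max()

# t_av via the random target lemma: sum_j pi(j) E_0(tau_j).
def hitting_times(j):
    A, rhs = L.copy(), -np.ones(n)
    A[j, :], rhs[j] = 0.0, 0.0
    A[j, j] = 1.0
    return np.linalg.solve(A, rhs)

t_av = sum(pi[j] * hitting_times(j)[0] for j in range(n))
```

In this example sup_{h∈H} ||f_h||_∞ sits comfortably below 2 t_av, as the corollary predicts.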
Remark 1.3. In practice, we argue that the relaxation time bound is perhaps the most useful of all, as exact expressions or bounds for the spectral gap λ_1 (equivalently, t_rel = 1/λ_1) are readily available for many models. As a side note, we can also offer upper bounds involving the log-Sobolev constant, by bounding the mixing time in terms of it; see Diaconis and Saloff-Coste (1996). This may yield a tighter upper bound, owing to the double logarithm in such mixing time estimates.
The rest of this paper is organized as follows. In Section 2, we present the proofs of Theorem 1.1 and Corollary 1.1. In Section 3, we illustrate our main results by detailing a few examples involving common distributions. As another application, we demonstrate a way to bound expected hitting time via Stein's factor in Section 4.

PROOFS OF THE MAIN RESULTS
2.1. Proof of Theorem 1.1. Coolen-Schrijner and van Doorn (2002); Mao (2004) show that D exists and is finite if and only if Σ_{i∈X} π(i) E_i(τ_j) < ∞
for some j, and hence for all j by irreducibility. The expressions for D(i, j) and D(j, j) follow readily from (Coolen-Schrijner and van Doorn, 2002, equations 5.5 and 5.7). Define the α-potential kernel D_α = (D_α(i, j))_{i,j∈X} by

    D_α(i, j) := ∫_0^∞ e^{−αt} (P_t(i, j) − π(j)) dt.

Note that by the dominated convergence theorem,

    lim_{α→0+} D_α(i, j) = D(i, j).

Under the proposed assumptions on h and the reversibility of L, we use (Coolen-Schrijner and van Doorn, 2002, Lemma 5.1) to arrive at the expression of f_h in Theorem 1.1. To prove (1.6), we see that

    ∇f_{δ_j}(i) = D(i + 1, j) − D(i, j) = π(j)(E_i(τ_j) − E_{i+1}(τ_j)).

Now, we take h = δ_j, so that f_{δ_j}(i) = D(i, j), and (1.5) follows from the expression of D(i, j). Using the triangle inequality, we have

    Σ_{j∈X} |D(i, j)| ≤ Σ_{j∈X} D(j, j) + Σ_{j∈X} π(j) E_i(τ_j) = 2 t_av,

where we use the fact that Σ_{j∈X} D(j, j) = t_av and the random target lemma (Levin et al., 2009, Lemma 10.1) for the second term. As for the gradient of f_{δ_j}, it is straightforward from (1.6) that ||∇f_{δ_j}||_∞ ≤ t_dev.

2.2. Proof of Corollary 1.1. To arrive at the first inequality, we make use of (1.5), ||h||_∞ ≤ 1 and the triangle inequality to obtain

    |f_h(i)| ≤ Σ_{j∈X} |D(i, j)| ≤ Σ_{j∈X} D(j, j) + Σ_{j∈X} π(j) E_i(τ_j).

Note that by the random target lemma (Levin et al., 2009, Lemma 10.1), the second term is independent of i, which implies Σ_{j∈X} π(j) E_i(τ_j) = t_av, and the desired result follows. For the second inequality, we apply (1.6) and ||h||_∞ ≤ 1 to yield sup_{h∈H} ||∇f_h||_∞ ≤ 2 t_dev. The bound t_dev ≤ 10 t_sst follows from (Aldous, 1982, Lemma 16), and combining this with (Aldous, 1982, Lemma 12) yields the corresponding bound in terms of t_mix(1/4).
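The objects in the proof are easy to check numerically. The sketch below (hypothetical rates, numpy assumed) computes the deviation kernel of a small reversible chain via the matrix identity D = (Π − L)^{-1} − Π with Π = 1π^T (Coolen-Schrijner and van Doorn (2002)), and verifies both that the α-potential kernel D_α = (αI − L)^{-1} − Π/α converges to D as α → 0 and that D(i, j) = π(j)(E_π(τ_j) − E_i(τ_j)).

```python
import numpy as np

# Hypothetical reversible birth-death generator on {0,...,4}.
n = 5
b = np.array([1.0, 2.0, 0.5, 1.5])
d = np.array([1.0, 1.0, 1.0, 1.0])

L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i + 1] = b[i]
    L[i + 1, i] = d[i]
np.fill_diagonal(L, -L.sum(axis=1))

pi = np.ones(n)
for i in range(n - 1):
    pi[i + 1] = pi[i] * b[i] / d[i]
pi /= pi.sum()
Pi = np.outer(np.ones(n), pi)  # rank-one projector onto stationarity

# Deviation kernel and alpha-potential kernel.
D = np.linalg.inv(Pi - L) - Pi

def D_alpha(a):
    return np.linalg.inv(a * np.eye(n) - L) - Pi / a

# Hitting times E_i(tau_j): solve L f = -1 off j, with f(j) = 0.
def hitting_times(j):
    A, rhs = L.copy(), -np.ones(n)
    A[j, :], rhs[j] = 0.0, 0.0
    A[j, j] = 1.0
    return np.linalg.solve(A, rhs)

E = np.column_stack([hitting_times(j) for j in range(n)])  # E[i, j] = E_i(tau_j)
D_from_hitting = pi * (pi @ E - E)  # D(i,j) = pi(j) (E_pi(tau_j) - E_i(tau_j))
```

Both characterizations of D coincide, and D is centered in the sense that its rows sum to zero and π D = 0.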

BOUNDING STEIN'S FACTORS VIA HITTING AND MIXING TIME -EXAMPLES
In this section, we discuss in detail several examples to illustrate both Theorem 1.1 and Corollary 1.1, and compare with existing bounds in the literature. Our primary point of comparison is the bounds in Eichelsbacher and Reinert (2008). Notable results of this section are the O(log n) bound in Example 3.2 and the bounds in Example 3.3.
Example 3.1 (discrete uniform on ⟦0, n−1⟧ with n < ∞). In our first example, we look at X = ⟦0, n−1⟧ and π(i) = 1/n for i ∈ X, and we take L to be the uniform chain with L(i, j) = 1 for all i ≠ j as in (Aldous, 1982, Example 48), which is reversible. The nice feature of this chain is that its hitting times are explicit: E_i(τ_j) takes the same value for all i ≠ j. Also, note that the eigenvalues of −L are 0 with multiplicity 1 and 1/n with multiplicity n − 1. As a result, Theorem 1.1 gives explicit expressions for f_{δ_j} and its gradient for any i, j ∈ X. As for Corollary 1.1, since t_av = O(1/n),

    sup_{h∈H} ||f_h||_∞ ≤ 2 t_av = O(1/n).

In addition, note that the relaxation time bound yields a crude upper bound of size O(n log n) for the gradient of Stein's solution:

    sup_{h∈H} |∇f_h(i)| ≤ 2(n − 1)/n ≤ (5/λ_1) log(4/π_min) = 5n log(4n).
Note that Stein's method for the discrete uniform distribution was first considered in Diaconis and Holmes (2004, Chapter 2). In particular, in the proof of Theorem 2.2.1 of Diaconis and Holmes (2004) it is shown that, in our notation, for odd n and h = δ_S, ||f_h||_∞ ≤ (n − 1)/2, so our uniform bound of size O(1/n) seems to be tighter in this setting.
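For this example everything can be verified numerically. The sketch below (numpy assumed) builds the uniform chain with jump rate L(i, j) = 1 to every other state and checks that f_{δ_j}(i) = D(i, j) has diagonal entries (n − 1)/n² and off-diagonal entries −1/n², so that ||f_{δ_j}||_∞ = (n − 1)/n² = O(1/n), consistent with the comparison above under this normalization of L.

```python
import numpy as np

n = 10
pi = np.full(n, 1.0 / n)

# Uniform chain: jump to every other state at rate 1.
L = np.ones((n, n))
np.fill_diagonal(L, -(n - 1))

# Deviation kernel D = (Pi - L)^{-1} - Pi; f_{delta_j}(i) = D(i, j).
Pi = np.outer(np.ones(n), pi)
D = np.linalg.inv(Pi - L) - Pi
```

A short computation confirms the closed form D = I/n − J/n² (J the all-ones matrix), matching the hitting-time description E_i(τ_j) = 1 for i ≠ j.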
Example 3.2 (Binomial distribution on ⟦0, n⟧ with parameters n and 0 < p < 1). In the second example, we consider π(i) = C(n, i) p^i (1 − p)^{n−i} for i ∈ X = ⟦0, n⟧. Stein's method for the binomial distribution has also been considered in Ehm (1991). We take L to be a birth-death process with birth rates b_i = p(n − i) and death rates d_i = (1 − p)i. The non-zero eigenvalues of −L are λ_k = k for k ∈ ⟦1, n⟧ (see e.g. Schoutens (2000)), and it follows from Corollary 1.1 that

    sup_{h∈H} ||f_h||_∞ ≤ 2 t_av = 2 Σ_{k=1}^{n} 1/k = O(log n).

As for the relaxation time bound, we have (5/λ_1) log(4/π_min) = O(n), since λ_1 = 1 and π_min = (min{p, 1 − p})^n. This O(n) bound does not seem to be useful at all when compared with the O(1) uniform bound of sup_{h∈H} ||∇f_h||_∞ ≤ min{1/(1 − p), 1/p} as in Eichelsbacher and Reinert (2008).
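The eigenvalue claim and the resulting O(log n) bound can be checked directly. The sketch below (numpy assumed) builds the binomial birth-death generator and verifies that the eigenvalues of −L are exactly 0, 1, . . . , n, so that the eigentime identity gives t_av = Σ_{k=1}^n 1/k, the n-th harmonic number.

```python
from math import comb
import numpy as np

n, p = 12, 0.3
b = p * (n - np.arange(n))          # b_i = p (n - i), i = 0,...,n-1
d = (1 - p) * np.arange(1, n + 1)   # d_i = (1 - p) i,  i = 1,...,n

L = np.zeros((n + 1, n + 1))
for i in range(n):
    L[i, i + 1] = b[i]
    L[i + 1, i] = d[i]
np.fill_diagonal(L, -L.sum(axis=1))

# Symmetrize with respect to pi = Binomial(n, p) to use a Hermitian eigensolver.
pi = np.array([comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)])
S = np.diag(np.sqrt(pi)) @ L @ np.diag(1.0 / np.sqrt(pi))
lam = np.sort(np.linalg.eigvalsh(-(S + S.T) / 2))

t_av = (1.0 / lam[1:]).sum()        # eigentime identity
harmonic = (1.0 / np.arange(1, n + 1)).sum()
```

The linear spectrum reflects the fact that this chain is the projection of n independent two-state chains, which is why t_av grows only logarithmically in n.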
Example 3.3 (Hypergeometric distribution on ⟦0, r⟧ with parameters n, r and 0 < 2r ≤ n). In this example, we study the hypergeometric distribution

    π(i) = C(r, i) C(n − r, r − i)/C(n, r), i ∈ ⟦0, r⟧,

and pick L to be the generator of the Bernoulli-Laplace model, that is, a birth-death chain with the birth rates b_i and death rates d_i of the Bernoulli-Laplace urn. Using the eigenvalues of −L, Corollary 1.1 then yields bounds on the Stein factors in terms of the eigentime t_av, whose explicit evaluation follows from (Saloff-Coste, 2006, Page 2114). Existing work on Stein's method for the hypergeometric distribution includes Reinert and Schoutens (1998), Reinert (2005, Section 4) and Schoutens (2000, Section 4, Example 5); however, bounds for the Stein factors cannot be found in these works. We also adopt a different Stein equation, namely the one based on the generator of the Bernoulli-Laplace model, than the ones in the existing literature.
Example 3.4 (Geometric distribution with success probability 0 < p < 1). In this example, we use only estimates and information on hitting times to bound the Stein factors for the geometric distribution. More specifically, we look at π(i) = (1 − p)^i p for i ∈ X = N_0, and choose L with unit per capita death rate d_i = i and birth rates b_i = (i + 1)(1 − p). It follows from Remark 1.1 that for any fixed j ∈ X, if i < j then

    E_i(τ_j) − E_{i+1}(τ_j) = E_i(τ_{i+1}) = (1 − (1 − p)^{i+1})/((i + 1) p (1 − p)^{i+1}),

while for i ≥ j,

    E_i(τ_j) − E_{i+1}(τ_j) = −E_{i+1}(τ_i) = −1/((i + 1) p).

For any i, j ∈ X, using Remark 1.1 we have ∇f_{δ_j}(i) = π(j)(E_i(τ_j) − E_{i+1}(τ_j)). Specializing to the geometric distribution leads to

    |∇f_{δ_j}(i)| = (1 − p)^{j−i−1} (1 − (1 − p)^{i+1})/(i + 1) for i < j,
    |∇f_{δ_j}(i)| = (1 − p)^{j}/(i + 1) for i ≥ j.

Now, using Corollary 1.1 and summing over all possible j gives

    sup_{h∈H} |∇f_h(i)| ≤ Σ_{j∈X} |∇f_{δ_j}(i)| ≤ 2/((i + 1) p),

and so sup_{h∈H} ||∇f_h||_∞ ≤ 2/p. We refer to Eichelsbacher and Reinert (2008) for a comparison with their bounds in the geometric case.

BOUNDING EXPECTED HITTING TIME VIA STEIN'S FACTORS

In the previous section, we illustrated how information on hitting and mixing times can be used to give bounds on Stein's factors. We aim at achieving the opposite in this section and illustrate how to obtain estimates on the expected hitting time to 0 of a Galton-Watson with immigration (GWI) process.
Next, we bound the above expression using the Stein factor bound in Barbour et al. (2015) to arrive at

    (1 − p)^r |E_i(τ_0) − E_{i+1}(τ_0)| ≤ 1/(1 − p) =: C,

and the above yields a bound on E_i(τ_0) that grows linearly in i. This bound can perhaps be refined by adapting non-uniform bounds on the Stein factor.