Sharp edge, vertex, and mixed Cheeger type inequalities for finite Markov kernels

We show how the evolving set methodology of Morris and Peres can be used to show Cheeger inequalities for bounding the spectral gap of a finite Markov kernel. This leads to sharp versions of several previous Cheeger inequalities, including ones involving edge-expansion, vertex-expansion, and mixtures of both. A bound on the smallest eigenvalue also follows.


Introduction
Given a finite, irreducible, reversible Markov kernel $P$, the Perron-Frobenius theorem guarantees that the matrix $P$ has a real-valued eigenbasis with eigenvalues $1 = \lambda_0(P) \ge \lambda_1(P) \ge \cdots \ge \lambda_{n-1}(P) \ge -1$. The spectral gap $\lambda = 1 - \lambda_1(P)$ between the largest and second largest eigenvalues, or in the non-reversible case the gap $\lambda = 1 - \lambda_1\left(\frac{P+P^*}{2}\right)$ of the additive symmetrization, governs key properties of the Markov chain. Alon [1], Lawler and Sokal [5], and Jerrum and Sinclair [4] showed lower bounds on the spectral gap in terms of geometric quantities on the underlying state space $V$, known as Cheeger inequalities. Similarly, in the reversible case Diaconis and Stroock [3] showed lower bounds on $1 + \lambda_{n-1}$, known as Poincaré inequalities, which also have a geometric flavor.
Inequalities of both types have played an important role in the study of the mixing times of Markov chains. Conversely, the authors of [7] used their Evolving set bounds on mixing times to show a Cheeger inequality, although this was later removed as it was weaker than previously known bounds. We improve on their idea and find that our resulting Theorem 3.2 can be used to show sharp Cheeger-like lower bounds on $\lambda$ and $1 + \lambda_{n-1}$, in the edge-expansion sense of Jerrum and Sinclair, in the vertex-expansion sense of Alon, and in a mixture of the two. The bounds on $\lambda$ typically improve on previous bounds by a factor of two, which is essentially all that can be hoped for as most of our bounds are sharp; the notion of edge-expansion used in our Cheeger inequality for $1 + \lambda_{n-1}$ is entirely new.
The paper is organized as follows. In the preliminaries we review some mixing time and Evolving set results. This is followed in Section 3 by our main result, an Evolving set generalization of Cheeger's inequality. In Section 4 this is used to show a sharp version of the edge-expansion Cheeger inequality and to improve on vertex-expansion bounds of Alon and of Stoyanov. Similar bounds on $\lambda_{n-1}$, and more generally the second largest magnitude eigenvalue, are found in Section 5.

Preliminaries
Consider a finite ergodic Markov kernel $P$ (i.e. transition probability matrix) on state space $V$ with stationary distribution $\pi$. This is called lazy if $P(x,x) \ge 1/2$ for every $x \in V$, and reversible if $P^* = P$, where the time-reversal is $P^*(x,y) = \frac{\pi(y)P(y,x)}{\pi(x)}$. The ergodic flow from $A \subset V$ to $B \subset V$ is $Q(A,B) = \sum_{x \in A, y \in B} \pi(x)P(x,y)$. The total variation distance between distributions $\sigma$ and $\pi$ is $\|\sigma - \pi\|_{TV} = \frac12 \sum_{x \in V} |\sigma(x) - \pi(x)|$. The rate of convergence of a reversible walk is related to the spectral gap [4,3] by equation (2.1), where $p_n^x(y) = P^n(x,y)$ and $\lambda_{\max} = \max\{\lambda_1(P), |\lambda_{n-1}(P)|\}$.
Morris and Peres [7] introduced a new tool for studying the rate of convergence:
Definition 2.1. Given a set $A \subset V$, a step of the evolving set process is given by choosing $u \in [0,1]$ uniformly at random, and transitioning to the set $A_u = \{y \in V : Q(A,y) \ge u\,\pi(y)\}$. The walk is denoted by $S_0, S_1, S_2, \ldots, S_n$, with transition kernel $K^n(A,S) = \mathrm{Prob}(S_n = S \mid S_0 = A)$.
The main result of [7] is a bound on the rate of convergence in terms of Evolving sets. Two easy lemmas of theirs will be required for our work, both of which the interested reader should have little trouble in showing. First, a Martingale relation: $\int_0^1 \pi(A_u)\,du = \pi(A)$.
Note from the definition that for a lazy walk $A_u \subset A$ if $u > 1/2$, while $A_u \supset A$ if $u \le 1/2$. The gaps between $A$ and $A_u$ are actually related to ergodic flow:
Lemma 2.4. For a lazy walk, $Q(A,A^c) = \int_0^{1/2} \left(\pi(A_u) - \pi(A)\right) du = \int_{1/2}^1 \left(\pi(A) - \pi(A_u)\right) du$.
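The evolving set step and the two lemmas can be checked by hand on a small chain. The following is a minimal sketch, assuming the standard Morris and Peres form $A_u = \{y : Q(A,y) \ge u\,\pi(y)\}$ of the evolving set step; the lazy walk on a 4-cycle and all function names are illustrative choices, not taken from the paper.

```python
from fractions import Fraction as F


def ergodic_flow(P, pi, A, B):
    """Q(A, B) = sum of pi(x) * P(x, y) over x in A, y in B."""
    return sum(pi[x] * P[x][y] for x in A for y in B)


def thresholds(P, pi, A):
    """t(y) = Q(A, y) / pi(y); y lies in A_u exactly when u <= t(y)."""
    return [ergodic_flow(P, pi, A, [y]) / pi[y] for y in range(len(pi))]


def evolving_set(P, pi, A, u):
    """One evolving set step for a given u (assumed form of Definition 2.1)."""
    t = thresholds(P, pi, A)
    return frozenset(y for y in range(len(pi)) if t[y] >= u)


def integrate_size(P, pi, A, lo=F(0), hi=F(1)):
    """Exact integral of pi(A_u) over u in (lo, hi]."""
    t = thresholds(P, pi, A)
    cuts = sorted({lo, hi} | {s for s in t if lo < s < hi})
    total = F(0)
    for a, b in zip(cuts, cuts[1:]):
        # A_u is constant on (a, b], so evaluate it at u = b.
        total += (b - a) * sum(pi[y] for y in range(len(pi)) if t[y] >= b)
    return total


# Lazy simple random walk on a 4-cycle (hypothetical example chain).
n = 4
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    P[x][x] = F(1, 2)
    P[x][(x + 1) % n] = F(1, 4)
    P[x][(x - 1) % n] = F(1, 4)
pi = [F(1, n)] * n
A = [0, 1]
pi_A = sum(pi[x] for x in A)
complement = [y for y in range(n) if y not in A]

martingale = integrate_size(P, pi, A)      # should equal pi(A)
flow = ergodic_flow(P, pi, A, complement)  # Q(A, A^c)
# Area between pi(A_u) and pi(A) on each half of the unit interval.
grow = integrate_size(P, pi, A, F(0), F(1, 2)) - pi_A * F(1, 2)
shrink = pi_A * F(1, 2) - integrate_size(P, pi, A, F(1, 2), F(1))
```

Exact rational arithmetic is used so that the martingale identity and the two area identities hold as equalities rather than up to rounding.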

A Generalized Cheeger Inequality
Recall that a Cheeger inequality is used to bound eigenvalues of the transition kernel in terms of some geometric quantity. "The Cheeger Inequality" in the finite Markov setting generally refers to a lower bound on the spectral gap $\lambda$ in terms of a quantity $h$. The quantity $h$ is known as the Cheeger constant, or conductance, and measures how quickly the walk expands from a set. We now show a generalization of the Cheeger inequality which is expressed in terms of Evolving sets, with the Cheeger constant replaced by the $f$-congestion $C_f$.
Small $f$-congestion corresponds to a rapid change in set size of the Evolving set process. In [6] it is found that to each of many measures of convergence rate (total variation, relative entropy, chi-square, etc.) there corresponds an appropriate choice of $C_f$. The $f$-congestion is thus closely related to convergence of Markov chains, which in part explains why our main result holds:
Theorem 3.2. Given a finite, irreducible, reversible Markov chain, and $f$ :
The final inequality followed from $C_g(S_{n-1}) \le C_g$, induction, and $S_0 = \{x\}$. But then, by equation (2.1),
Remark 3.3. For a non-reversible walk Theorem 3.2 holds with $1 - \lambda_{\max}$ replaced by $1 - \lambda_*$, where $\lambda_* = \max_{i>0} |\lambda_i|$ is the second largest magnitude (complex-valued) eigenvalue of $P$. This follows from the related lower bound (see e.g. [6]). While intriguing, it is unclear if lower bounds on $1 - \lambda_*$ have any practical application.
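To make the $f$-congestion concrete, here is a minimal sketch under two assumptions that are not confirmed by the text above: that $C_f(A) = \int_0^1 f(\pi(A_u))\,du \,/\, f(\pi(A))$ with $A_u = \{y : Q(A,y) \ge u\,\pi(y)\}$, and that $C_f$ is the maximum of $C_f(A)$ over proper non-empty subsets. The function names and the lazy 4-cycle example are illustrative only. The lazy walk on an $n$-cycle has eigenvalues $(1 + \cos(2\pi k/n))/2$, so for $n = 4$ its spectral gap is $1/2$, which can be compared with $1 - C_f$ for $f(a) = \sin(\pi a)$.

```python
import math
from fractions import Fraction as F
from itertools import combinations


def congestion_of_set(P, pi, A, f):
    """Assumed form: C_f(A) = (integral of f(pi(A_u)) du) / f(pi(A))."""
    n = len(pi)
    # y lies in A_u exactly when u <= t[y] = Q(A, y) / pi(y).
    t = [sum(pi[x] * P[x][y] for x in A) / pi[y] for y in range(n)]
    cuts = sorted({F(0), F(1)} | set(t))
    integral = 0.0
    for a, b in zip(cuts, cuts[1:]):
        # pi(A_u) is constant for u in (a, b].
        size = sum(pi[y] for y in range(n) if t[y] >= b)
        integral += float(b - a) * f(float(size))
    return integral / f(float(sum(pi[x] for x in A)))


def congestion(P, pi, f):
    """Maximum of C_f(A) over all proper non-empty subsets (brute force)."""
    n = len(pi)
    return max(congestion_of_set(P, pi, A, f)
               for k in range(1, n)
               for A in combinations(range(n), k))


def sine(a):
    return math.sin(math.pi * a)


# Lazy simple random walk on a 4-cycle (hypothetical example chain).
n = 4
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    P[x][x] = F(1, 2)
    P[x][(x + 1) % n] = F(1, 4)
    P[x][(x - 1) % n] = F(1, 4)
pi = [F(1, n)] * n

C = congestion(P, pi, sine)
gap = (1 - math.cos(2 * math.pi / n)) / 2  # known gap of the lazy cycle walk
```

The brute-force maximum is exponential in the number of states, so this is only meant for checking small examples.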
Remark 3.4. An anonymous reader notes that rather than using lower bounds on variation distance we could instead use the well-known relation $\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$ for spectral radius in terms of a consistent matrix norm satisfying $\|Av\| \le C \|A\| \|v\|$. In this case take $A = P - E$ where $E$ is the matrix with rows all equal to $\pi$, and total variation norm has $C = 2$.

Cheeger Inequalities
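Special cases of Theorem 3.2 include bounds of the vertex type as in Alon [1], the edge type as in Jerrum and Sinclair [4], and mixtures of both. The key to the reduction will be the following lemma:
Lemma 4.1. Given a concave function $f : [0,1] \to \mathbb{R}$ and two non-increasing functions $g, \hat g : [0,1] \to [0,1]$ such that $\int_0^1 g(u)\,du = \int_0^1 \hat g(u)\,du$ and $\forall t \in [0,1] : \int_0^t g(u)\,du \ge \int_0^t \hat g(u)\,du$, then $\int_0^1 f \circ g(u)\,du \le \int_0^1 f \circ \hat g(u)\,du$.
Proof. Concavity of $f$ implies that if $x \ge y$ and $\delta \ge 0$ then $f(x) + f(y) \ge f(x+\delta) + f(y-\delta)$. (4.3)
The inequality (4.3) shows that if a bigger value ($x$) is increased by some $\delta$, while a smaller value ($y$) is decreased by $\delta$, then the sum $f(x) + f(y)$ decreases. In our setting, the condition that $\forall t \in [0,1] : \int_0^t g(u)\,du \ge \int_0^t \hat g(u)\,du$ shows that changing from $\hat g$ to $g$ increased the already large values of $\hat g(u)$ when $u$ is small, while the equality $\int_0^1 g(u)\,du = \int_0^1 \hat g(u)\,du$ assures that this is canceled out by an equal decrease in the already small values when $u$ is big. The lemma then follows from (4.3).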
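The spreading argument can be illustrated numerically. The sketch below assumes the verbal description of (4.3) given above, namely that for concave $f$, $x \ge y$ and $\delta \ge 0$ the sum $f(x) + f(y)$ does not increase when $x$ grows by $\delta$ and $y$ shrinks by $\delta$. The step functions `g` and `g_hat` and the helper names are hypothetical examples chosen to satisfy the two integral conditions of the lemma.

```python
import math


def f(a):
    """A concave test function on [0, 1]."""
    return math.sqrt(a * (1 - a))


def spread_loss(x, y, delta):
    """f(x) + f(y) - f(x + delta) - f(y - delta); expected >= 0 when x >= y."""
    return f(x) + f(y) - f(x + delta) - f(y - delta)


def step_integral(steps):
    """Integral of f(g(u)) du for a step function g given as (width, value) pairs."""
    return sum(width * f(value) for width, value in steps)


def plain_integral(steps):
    """Integral of g(u) du for the same representation."""
    return sum(width * value for width, value in steps)


# g_hat: the set keeps measure 1/2 for every u.
# g: the set grows to measure 1 for u <= 1/2 and vanishes for u > 1/2.
g_hat = [(1.0, 0.5)]
g = [(0.5, 1.0), (0.5, 0.0)]

loss = spread_loss(0.6, 0.4, 0.2)
spread_out = step_integral(g)      # integral of f(g(u))
concentrated = step_integral(g_hat)  # integral of f(g_hat(u))
```

Here `g` and `g_hat` have the same total integral, and the partial integrals of `g` dominate those of `g_hat`, so the more spread-out function gives the smaller value of the integral of $f$.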
It remains to relate the f -congestion to the edge or vertex notions of Cheeger constant, and then choose the optimal function f .

Edge expansion
We first consider edge-expansion, i.e. ergodic flow, and in particular derive a bound in terms of the symmetrized Cheeger constant $\tilde h = \min_{\emptyset \ne A \subsetneq V} \tilde h(A)$, where $\tilde h(A) = \frac{Q(A,A^c)}{\pi(A)\pi(A^c)}$. To do this a somewhat stronger bound will be shown, and then a few special cases will be considered, including that of $\tilde h$.
Without the condition $f(a) \le f(1-a)$ the result holds with the minimum taken over all proper subsets of $V$.
Proof. First consider the reversible, lazy case. By Lemma 2.4 and the remarks before it, $Q(A,A^c)$ is the area below $\pi(A_u)$ and above $\pi(A)$, and also the area above $\pi(A_u)$ and below $\pi(A)$. By Lemma 4.1 the value $C_f(A)$ is maximized when $\pi(A_u) = m(u)$, where $m(u)$ is as in the first diagram of Figure 1; this gives equation (4.4). In the general case consider the lazy, reversible Markov chain $P' = \frac12\left(I + \frac{P+P^*}{2}\right)$. Then apply equation (4.4) to $P'$ and observe that $\lambda = 2\lambda_{P'}$ and $Q(A,A^c) = 2Q_{P'}(A,A^c)$, to derive the relation (4.5).
Note that (4.5) holds even if $f''$ is not concave.
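The reduction from a general chain to the lazy, reversible chain $P' = \frac12\left(I + \frac{P+P^*}{2}\right)$ can be checked directly. The following is a minimal sketch, assuming the usual time-reversal $P^*(x,y) = \pi(y)P(y,x)/\pi(x)$; the biased walk on a 3-cycle and the function names are illustrative choices, not from the paper.

```python
from fractions import Fraction as F


def time_reversal(P, pi):
    """P*(x, y) = pi(y) * P(y, x) / pi(x)."""
    n = len(pi)
    return [[pi[y] * P[y][x] / pi[x] for y in range(n)] for x in range(n)]


def lazy_symmetrization(P, pi):
    """P' = (I + (P + P*) / 2) / 2."""
    n = len(pi)
    R = time_reversal(P, pi)
    return [[(F(int(x == y)) + (P[x][y] + R[x][y]) / 2) / 2
             for y in range(n)] for x in range(n)]


def ergodic_flow(P, pi, A, B):
    """Q(A, B) = sum of pi(x) * P(x, y) over x in A, y in B."""
    return sum(pi[x] * P[x][y] for x in A for y in B)


# Non-reversible biased walk on a 3-cycle; doubly stochastic, so pi is uniform.
P = [[F(0), F(2, 3), F(1, 3)],
     [F(1, 3), F(0), F(2, 3)],
     [F(2, 3), F(1, 3), F(0)]]
pi = [F(1, 3)] * 3
P_lazy = lazy_symmetrization(P, pi)

A, A_c = [0], [1, 2]
flow = ergodic_flow(P, pi, A, A_c)
flow_lazy = ergodic_flow(P_lazy, pi, A, A_c)
is_lazy = all(P_lazy[x][x] >= F(1, 2) for x in range(3))
is_reversible = all(pi[x] * P_lazy[x][y] == pi[y] * P_lazy[y][x]
                    for x in range(3) for y in range(3))
```

The flow out of a set is halved by the laziness and unchanged by the symmetrization, which is the relation $Q(A,A^c) = 2Q_{P'}(A,A^c)$ used in the proof.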
Proof. The upper bound is classical. The second lower bound follows from the first because $\sqrt{1-x} \le 1 - x/2$. For the first lower bound, consider $C_{\sqrt{a(1-a)}}$ and apply equation (4.5) to obtain a bound in terms of $\tilde h(A)$ for some $A \subset V$. To simplify this let $X = \frac12\left(1 + \tilde h(A)\,\pi(A^c)\right)$ and $Y = \frac12\left(1 - \tilde h(A)\,\pi(A)\right)$ in Lemma 5.5.
The Corollary is sharp on the two-point space $u - v$ with $P(u,v) = P(v,u) = 1$ and $\tilde h = 2$. Bounds in terms of $h$ typically show at best $\lambda \ge 1$ and so cannot be sharp for the two-point space.
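The two-point example can be verified by brute force. This is a minimal sketch, assuming $\tilde h(A) = Q(A,A^c)/(\pi(A)\pi(A^c))$ minimized over proper non-empty subsets, and the classical conductance $h = \min_{0 < \pi(A) \le 1/2} Q(A,A^c)/\pi(A)$; both definitions and the function name `cheeger_constants` are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import combinations


def cheeger_constants(P, pi):
    """Return (h, h_sym) by enumerating all proper non-empty subsets."""
    n = len(pi)
    h_values, h_sym_values = [], []
    for k in range(1, n):
        for A in combinations(range(n), k):
            complement = [y for y in range(n) if y not in A]
            q = sum(pi[x] * P[x][y] for x in A for y in complement)
            a = sum(pi[x] for x in A)
            h_sym_values.append(q / (a * (1 - a)))
            if a <= F(1, 2):
                h_values.append(q / a)
    return min(h_values), min(h_sym_values)


# Two-point space u - v with P(u, v) = P(v, u) = 1.
P = [[F(0), F(1)],
     [F(1), F(0)]]
pi = [F(1, 2), F(1, 2)]

h, h_sym = cheeger_constants(P, pi)
```

On this space the eigenvalues of $P$ are $1$ and $-1$, so $\lambda = 2$, which matches the value of the symmetrized constant while $h$ only reaches $1$.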
Different choices of $f(a)$ work better if more is known about the dependence of $\tilde h(A)$ on set size. For example, for a walk on a cycle it is better to choose $f(a) = \sin(\pi a)$.
A more refined argument can be used to determine $\lambda$ exactly. As before, consider $P' = \frac{I+P}{2}$. The resulting bound is the correct value of $\lambda$.

Vertex-expansion
The Generalized Cheeger inequality can also be used to show Cheeger-like inequalities in terms of vertex-expansion (the number of boundary vertices), leading to sharp versions of bounds due to Alon [1], Bobkov, Houdré and Tetali [2] and Stoyanov [8].
Two notions of vertex-expansion are required, $h_{in}$ and $h_{out}$. Quantities $\tilde h_{in}$ and $\tilde h_{in}(A)$ are defined similarly, but with $\pi(A)\pi(A^c)$ in the denominator. The minimum transition probability $P_0 = \min_{x \ne y \in V}\{P(x,y) : P(x,y) > 0\}$ will also be required.
Theorem 4.6 gives lower bounds on the spectral gap of a finite, reversible Markov kernel in terms of $h_{out}$, $h_{in}$, $\tilde h_{in}$ and $P_0$. For the non-reversible case replace $P_0$ by $P_0/2$.
The proof finishes by applying Lemma 4.1 to upper bound $C_{\sqrt{a}}$ in terms of $h_{in}$ or $C_{\sqrt{a(1-a)}}$ in terms of $\tilde h_{in}$. For $h_{out}$, upper bound $C_{\sqrt{a}}$ using the second diagram of Figure 2. As before, the non-lazy case is reduced to the lazy case by the relation $\lambda_P = 2\lambda_{P'}$ where $P' = \frac{I+P}{2}$, while the non-reversible case is reduced to the reversible case by the relation $\lambda_P = \lambda_{P''}$ where $P'' = \frac{P+P^*}{2}$.
To compare this to previous bounds we note that Stoyanov [8], improving on results of Alon [1] and Bobkov, Houdré and Tetali [2], showed a bound of this type for reversible Markov chains. Our Theorem 4.6, and the approximations $\sqrt{1 - h_{out}P_0} \le 1 - h_{out}P_0/2$ and $\sqrt{1 + h_{in}P_0} \le 1 + h_{in}P_0/2$, give a stronger bound for reversible chains.
Remark 4.7. The $h_{in}$ and $h_{out}$ bounds in this section were not sharp, despite our having promised sharp bounds. This is because $C_{\sqrt{a(1-a)}} \le C_{\sqrt{a}}$ is a better quantity to consider. If $C_{\sqrt{a(1-a)}}$ were used instead of $C_{\sqrt{a}}$ then we would obtain sharp, although quite complicated, bounds; these bounds simplify in the $\tilde h$ and $\tilde h_{in}$ cases, which is why we have used $C_{\sqrt{a(1-a)}}$ for those two cases. Bounds based on $C_{\sqrt{a(1-a)}}$ are sharp on the two-point space $u - v$ with $P(u,v) = P(v,u) = 1$.

Mixing edge and vertex expansion
We can easily combine edge and vertex-expansion quantities, and maximize at the set level rather than at a global level. For instance, in the reversible case λ ≥ min Alternatively, we can apply Lemma 4.1 directly:
Theorem 4.8. The spectral gap of a finite, reversible Markov kernel satisfies λ ≥ min For the non-reversible case replace $P_0$ by $P_0/2$.
The proofs are no different from those of the cases already dealt with, other than that the worst cases $m(u)$ are somewhat more complicated, and so we omit the proofs. As in Remark 4.7 the first bound can be made sharp (and even more complicated) by working with $1 - C_{\sqrt{a(1-a)}}$, while the second bound is already sharp on the two-point space.

Bounding the smallest eigenvalue
The generalized Cheeger inequality can also be used to bound $1 - \lambda_{\max}$ for a reversible walk, by examining $P$ directly instead of the lazy walk $P' = \frac{I+P}{2}$ as before. Techniques of the previous sections carry through if modified expansion quantities are used. Apply Lemma 4.1 with the second figure of Figure 1, and to finish let X = ℘ A +˜ (A) π(A c ) and Y = ℘ A −˜ (A) π(A) in Lemma 5.5.
Hence, to bound the spectral gap $\lambda$ consider the worst-case ergodic flow from a set $A$ to its complement $A^c$, whereas to bound $\lambda_*$ use the worst-case ergodic flow from a set $A$ to a set the same size as its complement $A^c$.
To improve on this, note that a bound similar to the lower bound of Corollary 4.2 holds for $\Psi(A)$ as well. Since $\Psi(A) \ge \frac{1}{2n}$ for all $A \subset V$, this again suggests taking $f(a) = \sin(\pi a)$, and so if $n$ is odd then $1 - \lambda_{\max} \ge 1 - C_{\sin(\pi a)} = 1 - \cos\left(2\pi\Psi(A_{\frac{n-1}{2n}})\right) = 1 - \cos\frac{\pi}{n}$. This is again an equality.
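The claimed equality can be checked numerically, assuming the walk in question is the simple random walk on a cycle of odd length $n$ (the text above does not restate this). The sketch relies on the standard fact that this walk has eigenvalues $\cos(2\pi k/n)$ for $k = 0, \ldots, n-1$, and verifies that fact on the cosine eigenvectors before using it; the function names are illustrative.

```python
import math


def cycle_eigenvalues(n):
    """Eigenvalues cos(2*pi*k/n) of the simple random walk on an n-cycle."""
    return sorted((math.cos(2 * math.pi * k / n) for k in range(n)),
                  reverse=True)


def is_eigenpair(n, k, tol=1e-12):
    """Check P v = cos(2*pi*k/n) v for v(x) = cos(2*pi*k*x/n)."""
    lam = math.cos(2 * math.pi * k / n)
    v = [math.cos(2 * math.pi * k * x / n) for x in range(n)]
    # One step of the walk averages the two neighbours on the cycle.
    Pv = [(v[(x - 1) % n] + v[(x + 1) % n]) / 2 for x in range(n)]
    return all(abs(Pv[x] - lam * v[x]) < tol for x in range(n))


def one_minus_lambda_max(n):
    """1 - max(lambda_1, |lambda_{n-1}|) for the cycle walk."""
    eig = cycle_eigenvalues(n)
    return 1 - max(eig[1], abs(eig[-1]))


odd_lengths = [3, 5, 7, 11]
gaps = {n: one_minus_lambda_max(n) for n in odd_lengths}
targets = {n: 1 - math.cos(math.pi / n) for n in odd_lengths}
```

For odd $n$ the smallest eigenvalue is $-\cos(\pi/n)$, which is larger in magnitude than $\lambda_1 = \cos(2\pi/n)$, so the smallest eigenvalue determines $\lambda_{\max}$.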
Vertex-expansion lower bounds for $1 - \lambda_*$ (and hence also $1 - \lambda_{\max}$) hold as well. For instance, if $P_0 = \min_{x,y \in V}\{P(x,y) : P(x,y) > 0\}$ (note that $x = y$ is permitted) and out = min
Example 5.4. A vertex-expander is a lazy walk where $h_{out} \ge \epsilon > 0$. Analogously, we might define a non-lazy vertex-expander to be a walk where out ≥ $\epsilon$ > 0. If the expander is regular of degree $d$ then which (up to a small constant factor) generalizes the relation $1 - \lambda_{\max} \ge \epsilon^2/4d$ for the lazy walk.