Sampling 3-colourings of regular bipartite graphs

We show that if $\gS=(V,E)$ is a regular bipartite graph for which the expansion of subsets of a single parity of $V$ is reasonably good and which satisfies a certain local condition (that the union of the neighbourhoods of adjacent vertices does not contain too many pairwise non-adjacent vertices), and if $\cM$ is a Markov chain on the set of proper 3-colourings of $\gS$ which updates the colour of at most $\rho|V|$ vertices at each step and whose stationary distribution is uniform, then for $\rho \approx .22$ and $d$ sufficiently large the convergence to stationarity of $\cM$ is (essentially) exponential in $|V|$. In particular, if $\gS$ is the $d$-dimensional hypercube $Q_d$ (the graph on vertex set $\{0,1\}^d$ in which two strings are adjacent if they differ on exactly one coordinate) then the convergence to stationarity of the well-known Glauber (single-site update) dynamics is exponentially slow in $2^d/(\sqrt{d}\log d)$. A combinatorial corollary of our main result is that in a uniform 3-colouring of $Q_d$ there is an exponentially small probability (in $2^d$) that there is a colour $i$ such that the proportion of vertices of the even subcube coloured $i$ differs from the proportion of the odd subcube coloured $i$ by at most $.22$. Our proof combines a conductance argument with combinatorial enumeration methods.


Introduction and statement of the result
Markov chain Monte Carlo algorithms (MCMC's) occur frequently in computer science in algorithms designed to sample from or estimate the size of large combinatorially defined structures; they are also used in statistical physics and the study of networks to help understand the behavior of models of physical systems and networks in equilibrium. In this paper we study a class of natural MCMC's that sample from proper 3-colourings of a regular bipartite graph.
Let $\Sigma = (V, E)$ be a simple, loopless, finite graph on vertex set $V$ and edge set $E$. (For graph theory basics, see e.g. [4], [9].) For a positive integer $q$ write $C_q = C_q(\Sigma)$ for the set of proper $q$-colourings of $\Sigma$; that is,
$$C_q = \{\chi : V \to \{0, \ldots, q-1\} \;:\; uv \in E \Rightarrow \chi(u) \neq \chi(v)\}.$$
Let $\pi_q = \pi_q(\Sigma)$ be the uniform probability distribution on $C_q$.
The notion of q-colouring is fundamental in graph theory; see e.g. [3, Chapter 5] for a survey. The notion also occurs in statistical physics; the pair (C q , π q ) is the zero-temperature limit of the q-state antiferromagnetic Potts model (see e.g. [27,28]).
Glauber dynamics for proper $q$-colourings is the single-site update Markov chain $M_q = M_q(\Sigma)$ on state space $C_q$ with transition probabilities $P_q(\chi_1, \chi_2)$, $\chi_1, \chi_2 \in C_q$, given by
$$P_q(\chi_1, \chi_2) = \begin{cases} 0 & \text{if } |\{v \in V : \chi_1(v) \neq \chi_2(v)\}| > 1, \\ \frac{1}{q|V|} & \text{if } |\{v \in V : \chi_1(v) \neq \chi_2(v)\}| = 1, \\ 1 - \sum_{\chi \neq \chi_1} P_q(\chi_1, \chi) & \text{if } \chi_1 = \chi_2. \end{cases}$$
We may think of $M_q$ dynamically as follows. From a $q$-colouring $\chi$, choose a vertex $v$ uniformly from $V$ and a colour $j$ uniformly from $\{0, \ldots, q-1\}$. Then define a function $\chi' : V \to \{0, \ldots, q-1\}$ by
$$\chi'(w) = \begin{cases} \chi(w) & \text{if } w \neq v, \\ j & \text{if } w = v. \end{cases}$$
Finally, move to $\chi'$ if $\chi'$ is a proper $q$-colouring, and stay at $\chi$ otherwise. (A variant of Glauber dynamics chooses $j$ uniformly from $\{0, \ldots, q-1\} \setminus \{\chi(w) : w \sim v\}$, ensuring that $\chi'$ is always a proper colouring. This changes the transition probabilities, but does not significantly change the qualitative behavior of the chain.)

For all $\Sigma$ the chain $M_q$ is aperiodic, but it is not in general irreducible (consider, for example, $\Sigma = K_q$, the complete graph on $q$ vertices), and so not ergodic. In the case when $M_q$ is ergodic (e.g., when $\Sigma$ has maximum degree $\Delta$ and $q \geq \Delta + 2$; see [15]) it is readily checked that it has (unique) stationary distribution $\pi_q$. (One only has to check that $M_q$ is reversible with respect to $\pi_q$; that is, that it satisfies the detailed balance equations $\pi_q(\chi_1)P_q(\chi_1, \chi_2) = \pi_q(\chi_2)P_q(\chi_2, \chi_1)$ for all $\chi_1, \chi_2 \in C_q$.)

A natural and important question to ask about $M_q$ in this case is how quickly it converges to its stationary distribution. We define the mixing time $\tau_{M_q}$ of $M_q$ by
$$\tau_{M_q} = \min\left\{t : d_{TV}(P_q^t, \pi_q) \leq \frac{1}{e}\right\},$$
where $P_q^t(\chi, \chi')$ is the probability of moving from $\chi$ to $\chi'$ in $t$ steps and
$$d_{TV}(P_q^t, \pi_q) = \max_{\chi \in C_q} \frac{1}{2} \sum_{\chi' \in C_q} |P_q^t(\chi, \chi') - \pi_q(\chi')|$$
is total variation distance. The mixing time of $M_q$ captures the speed at which the chain converges to its stationary distribution: for every $\varepsilon > 0$, in order to get a sample from $C_q$ which is within $\varepsilon$ of $\pi_q$ (in total variation distance), it is necessary and sufficient to run the chain from some arbitrarily chosen distribution for some multiple (depending on $\varepsilon$) of the mixing time.
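The dynamical description of $M_q$ above can be illustrated with a short simulation. The following is a minimal sketch, not code from the paper: it assumes the underlying graph is the hypercube $Q_d$ with vertices encoded as integers in $[0, 2^d)$, and the function names (`hypercube_neighbours`, `glauber_step`, `run_demo`) are hypothetical.

```python
import random


def hypercube_neighbours(v, d):
    """Neighbours of vertex v (an int in [0, 2**d)) in Q_d: flip one bit."""
    return [v ^ (1 << i) for i in range(d)]


def is_proper(chi, d):
    """True if no edge of Q_d has both endpoints coloured alike."""
    return all(chi[v] != chi[w]
               for v in range(len(chi))
               for w in hypercube_neighbours(v, d))


def glauber_step(chi, d, q, rng):
    """One step of single-site Glauber dynamics on proper q-colourings of Q_d.

    chi is a list of length 2**d with entries in {0, ..., q-1}.
    Returns the next colouring as a new list; chi itself is not modified.
    """
    v = rng.randrange(len(chi))   # vertex chosen uniformly from V
    j = rng.randrange(q)          # colour chosen uniformly from {0, ..., q-1}
    if any(chi[w] == j for w in hypercube_neighbours(v, d)):
        return list(chi)          # proposal would be improper: stay put
    new = list(chi)
    new[v] = j
    return new


def run_demo(d=4, q=3, steps=1000, seed=0):
    """Run the chain from the 2-colouring (even vertices 0, odd vertices 1)."""
    rng = random.Random(seed)
    chi = [bin(v).count("1") % 2 for v in range(2 ** d)]
    for _ in range(steps):
        nxt = glauber_step(chi, d, q, rng)
        # each step changes the colour of at most one vertex
        assert sum(a != b for a, b in zip(chi, nxt)) <= 1
        chi = nxt
    return chi
```

Each proposal $(v, j)$ is made with probability $1/(q|V|)$, matching the transition probabilities given above; rejected proposals account for the holding probability.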
For surveys of issues related to the mixing time of a Markov chain, see e.g. [1,21,22]. Jerrum [15] and Salas and Sokal [24] independently showed that if Σ has maximum degree ∆ and q > 2∆ then there is rapid mixing of the Glauber dynamics; i.e., τ Mq (Σ) is polynomial in |V |. In fact, they showed that the mixing time is optimal (O(|V | log |V |)). Bubley and Dyer [7] showed that there is rapid mixing for q = 2∆ and Molloy [20] improved this to optimal mixing. In a breakthrough result Vigoda [29] showed rapid mixing for q ≥ (11/6)∆. More recently Dyer, Greenhill and Molloy [11] exhibited optimal mixing for q ≥ (2 − ε)∆ for a small positive constant ε.
In this paper our aim is to explore the limitations of Glauber dynamics as a sampling tool by exhibiting a class of graphs for which the mixing time is essentially as far from optimal as possible. In this direction, Luczak and Vigoda [19] have exhibited families of planar graphs for which M q is not rapidly mixing for each fixed q ≥ 3 and families of bipartite graphs with maximum degree ∆ for which M q is not rapidly mixing for any 3 ≤ q ≤ O(∆/ log ∆). A drawback of these negative results is that the families exhibited consist of random graphs. Here, we attempt to remedy this by constructing explicit families of graphs for which Glauber dynamics is inefficient. We focus exclusively on the case q = 3 (we cannot see at the moment how to apply our techniques to any q > 3) and Σ regular bipartite. Specifically, we establish certain local and expansion conditions in a regular bipartite graph Σ that force τ M 3 (Σ) to be (almost) exponential in |V |. The discrete hypercube is among the families of graphs which satisfy our conditions.
Our techniques actually apply to the class of $\rho$-local chains (considered in [6] and also in [10], where the terminology $\rho|V|$-cautious is employed) for suitably small $\rho$. A Markov chain $\mathcal{M}$ on state space $C_q$ with transition probabilities $P$ is $\rho$-local if in each step of the chain at most $\rho|V|$ vertices have their colour changed; that is, if
$$P(\chi_1, \chi_2) \neq 0 \;\Rightarrow\; |\{v \in V : \chi_1(v) \neq \chi_2(v)\}| \leq \rho|V|.$$

Before stating our main result, we establish some notation. From now on, $\Sigma = (V, E)$ will be a $d$-regular bipartite graph with partition classes $E$ and $O$. For $u, v \in V$ we write $u \sim v$ if there is an edge in $\Sigma$ joining $u$ and $v$. Set $N(u) = \{w \in V : w \sim u\}$ and, for $A \subseteq V$, $N(A) = \bigcup_{u \in A} N(u)$. For $A \subseteq E$ (or $O$) set
$$[A] = \{x \in V : N(x) \subseteq N(A)\}$$
(we think of $[A]$ as an external closure of $A$) and say that such an $A$ is small if $|[A]| \leq |V|/4$. Define the bipartite expansion of $\Sigma$ by
$$\delta(\Sigma) = \min\left\{\frac{|N(A)| - |[A]|}{|N(A)|} \;:\; A \subseteq E \text{ or } A \subseteq O,\ A \text{ small},\ A \neq \emptyset\right\};$$
note that $0 \leq \delta < 1$. The second inequality is clear. To see the first, note that since $\Sigma$ is regular and bipartite it has a perfect matching, and so satisfies
$$|A| \leq |N(A)| \quad \text{for all } A \subseteq E \text{ (or } O\text{)}. \qquad (1)$$
The bipartite expansion constant is a measure of the proportion by which the neighbourhood size of a small set exceeds the size of the set itself, in the worst case. Finally, define the locality $\ell(\Sigma)$ of $\Sigma$ to be the largest $\ell \geq 0$ such that for all $x \sim y \in V$ and for all independent sets $I$ (sets of vertices spanning no edges) in the subgraph of $\Sigma$ induced by $N(x) \cup N(y)$ we have $|I| \leq 2d - \ell$. (So, for example, if $\Sigma$ is the $d$-regular tree then $\ell(\Sigma) = 2$ since the subgraph induced by the neighbourhoods of adjacent vertices contains an independent set of size $2d - 2$; whereas if $\Sigma$ is the complete $d$-regular bipartite graph then $\ell(\Sigma) = d$.)

Our main result is the following. Recall that $H(x) = -x \log x - (1-x)\log(1-x)$ is the usual binary entropy function.

Theorem 1.1 Fix $\rho > 0$ satisfying $H(\rho) + \rho < 1$. There are constants $d_0, C_1, C'_1, C_2 > 0$ all depending on $\rho$ such that if $\Sigma$ is a $d$-regular bipartite graph on $N$ vertices with bipartite expansion $\delta$ and locality $\ell > 0$ satisfying […] (2) and with $d \geq d_0$ and if $\mathcal{M}(\Sigma)$ is an ergodic $\rho$-local Markov chain on state space $C_3(\Sigma)$ with stationary distribution $\pi_3(\Sigma)$ then […]

Note that for all $\rho \leq .22$ we have $H(\rho) + \rho < 1$. Here and throughout we use "log" for $\log_2$ and write $\exp_2 x$ for $2^x$.
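The numerical condition on $\rho$ in Theorem 1.1 is easy to check directly. The sketch below evaluates the binary entropy $H(x) = -x\log_2 x - (1-x)\log_2(1-x)$ and tests whether $H(\rho) + \rho < 1$; the helper names `binary_entropy` and `admissible` are hypothetical and chosen only for illustration.

```python
import math


def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0 by convention."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def admissible(rho):
    """True if rho > 0 and H(rho) + rho < 1, the hypothesis on rho."""
    return rho > 0 and binary_entropy(rho) + rho < 1


if __name__ == "__main__":
    for rho in (0.20, 0.22, 0.23):
        print(rho, binary_entropy(rho) + rho, admissible(rho))
```

With these definitions $\rho = 0.22$ satisfies the condition while $\rho = 0.23$ does not, so the threshold lies between the two values, in line with the "$\rho \approx .22$" of the abstract.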

Remark 1.2
The second inequality in (2) implies ℓ ≥ Ω(log d/δ). This condition appears in the derivation of (10), where it is only used in the weaker form ℓ = ω(1) (which follows since δ ≤ 1). It is used in a more essential way in the derivation of (16) where it serves to limit, somewhat artificially, the number of 3-colourings of a bipartite graph with a given pre-image of 0. We expect that Theorem 1.1 should remain true with the second inequality in (2) removed.
We now return to Glauber dynamics. This changes the colour of at most one vertex at each step, and so (as long as the underlying graph has at least five vertices) it is a ρ-local chain for ρ = .2. Fixing ρ to this value, all of the constants in Theorem 1.1 become absolute, and we have the following corollary.
In particular this result applies to the Glauber dynamics chain, although in this case it is not necessary to hypothesize ergodicity.
Proof: In the presence of Corollary 1.4, it suffices to show that the chain $M_3(Q_d)$ is ergodic. We will show that if $\chi_1$ is a 3-colouring of $Q_d$ with $\chi_1(v_0) = 0$ for some $v_0 \in E$ then there is a sequence of steps in the Glauber dynamics chain that takes $\chi_1$ to a 2-colouring $\chi_2$ of $Q_d$ with $\chi_2(v) = 0$ for all $v \in E$. This suffices, since it is clear that any one of the six 2-colourings of $Q_d$ can be reached from any other via steps in the chain.

We make use of a correspondence between proper 3-colourings of $Q_d$ and homomorphisms from $Q_d$ to $\mathbb{Z}$ that send $v_0$ to 0. Formally, set
$$F_{v_0} = \{f : V \to \mathbb{Z} \;:\; f(v_0) = 0 \text{ and } u \sim v \Rightarrow |f(u) - f(v)| = 1\}.$$
(This set was introduced in [2] and further studied in [12,17].) Then, as observed by Randall [23], there is a bijection $\Phi$ from $F_{v_0}$ to $C_3^{v_0} := \{\chi \in C_3 : \chi(v_0) = 0\}$ given by $\Phi(f)(v) = f(v) \bmod 3$. Before verifying that this is indeed a bijection, we use the correspondence to establish the corollary.

For $f \in F_{v_0}$ write $R(f)$ for the range of $f$. If $|R(\Phi^{-1}(\chi_1))| = 2$, then we may take $\chi_2 = \chi_1$ and we are done. If $|R(\Phi^{-1}(\chi_1))| = k > 2$, then it suffices to exhibit a sequence of steps in the chain that takes $\chi_1$ to some $\chi_3$ with $|R(\Phi^{-1}(\chi_3))| = k - 1$. Without loss of generality we may assume that $\Phi^{-1}(\chi_1)$ takes on some strictly positive values. Let $\ell$ be the largest such value, and let $v \in V$ be any vertex satisfying […] and that the Glauber dynamics chain permits a move from $\chi_1$ to $\Phi(f)$. But we also have that […], so that by repeating the above described procedure $m$ more times (where $m = |\{v \in V : f(v) = \ell\}|$) we arrive at the desired $\chi_3$.
It remains to verify that $\Phi$ is a bijection. That it is injective is clear. To see that it is surjective, consider $\chi' \in C_3^{v_0}$. We shall construct from $\chi'$ an $f \in F_{v_0}$ with $\Phi(f) = \chi'$ by setting $f(v_0) = 0$ and then extending $f$ level by level, where the $k$th level of $Q_d$ ($k = 0, \ldots, d$) is $L_k := \{v \in V : \mathrm{dist}(v, v_0) = k\}$ (here we are using $\mathrm{dist}(\cdot, \cdot)$ for the usual graph distance). Note that for $v \in L_k$, $N(v) \subseteq L_{k-1} \cup L_{k+1}$ and that for $f \in F_{v_0}$ the values that $f$ takes on $L_k$ must all have the same parity.
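The correspondence between 3-colourings fixing $\chi(v_0) = 0$ and homomorphisms to $\mathbb{Z}$ fixing $f(v_0) = 0$ can be exercised by brute force on a small cube. The sketch below is an illustration under stated assumptions rather than the paper's construction: it takes the map to be reduction mod 3, builds the lift by breadth-first search from $v_0 = 0$ (which visits the levels $L_k$ in order), and uses hypothetical names (`lift`, `phi`, `proper_colourings_rooted`).

```python
from collections import deque
from itertools import product


def neighbours(v, d):
    """Neighbours of vertex v (an int in [0, 2**d)) in Q_d."""
    return [v ^ (1 << i) for i in range(d)]


def proper_colourings_rooted(d):
    """Yield all proper 3-colourings chi of Q_d (as tuples) with chi[0] == 0."""
    n = 2 ** d
    for rest in product(range(3), repeat=n - 1):
        chi = (0,) + rest
        if all(chi[v] != chi[w] for v in range(n) for w in neighbours(v, d)):
            yield chi


def lift(chi, d, v0=0):
    """Build f: V -> Z with f(v0) = 0 and f(v) congruent to chi(v) mod 3.

    Moving along an edge, f goes up by 1 if the colour goes up by 1 (mod 3)
    and down by 1 otherwise. Raises ValueError if two routes disagree.
    """
    f = {v0: 0}
    queue = deque([v0])
    while queue:
        v = queue.popleft()
        for w in neighbours(v, d):
            step = 1 if (chi[w] - chi[v]) % 3 == 1 else -1
            if w not in f:
                f[w] = f[v] + step
                queue.append(w)
            elif f[w] != f[v] + step:
                raise ValueError("colouring does not lift consistently")
    return f


def phi(f, d):
    """Reduce a Z-valued function on Q_d mod 3 to obtain a colouring."""
    return tuple(f[v] % 3 for v in range(2 ** d))
```

On $Q_3$ every rooted proper 3-colouring lifts without conflict, the lift is a homomorphism (adjacent values differ by exactly 1), and reducing mod 3 returns the original colouring.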
So suppose we have specified $f$ up to $L_k$ for some $0 \leq k < d$ […] If $f$ is not constant on $N(v) \cap L_k$, then we claim that there is some $\ell \in \mathbb{Z}$ such that $f$ takes on only the values $\ell$ and $\ell + 2$ on $N(v) \cap L_k$. For if not, then we have […] This contradiction establishes the two-value claim. We now set $f(v) = \ell + 1$, allowing the construction to continue. Since $L_{k+1}$ is an independent set in $Q_d$, we may repeat the above-described procedure on each vertex of $L_{k+1}$ independently, thus extending the construction of $f$ to all of $L_{k+1}$. ✷

Remark 1.6 Glauber dynamics for $q$-colourings of $Q_d$ is not in general ergodic for $3 < q < \Delta(Q_d) + 1$. Indeed, it is straightforward to construct a 4-colouring $\chi$ of $Q_3$ which is frozen in the sense that $P_4(\chi, \chi') = 0$ for all $\chi' \neq \chi$; one simply assigns the colours 0, 1, 2 and 3 to a particular vertex and its three neighbours and then extends to a colouring of the whole of $Q_3$ according to the rule that on each face (4-cycle) of $Q_3$ all of the colours 0, 1, 2 and 3 must appear.
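A frozen 4-colouring of $Q_3$ of the kind described in Remark 1.6 can be written down and checked mechanically. The sketch below uses one concrete choice, which is an assumption of this illustration and not necessarily the colouring the remark has in mind: vertices are the integers 0 to 7 and antipodal vertices $v$ and $7 - v$ receive the same colour $\min(v, 7 - v)$. The function names are hypothetical.

```python
def neighbours(v, d=3):
    """Neighbours of vertex v (an int in [0, 2**d)) in Q_d."""
    return [v ^ (1 << i) for i in range(d)]


def antipodal_colouring():
    """4-colouring of Q_3 in which antipodal vertices share a colour."""
    return [min(v, 7 - v) for v in range(8)]


def is_proper(chi):
    """True if adjacent vertices always receive different colours."""
    return all(chi[v] != chi[w] for v in range(len(chi)) for w in neighbours(v))


def is_frozen(chi, q):
    """True if every colour other than chi[v] already appears on a neighbour
    of v, for every v, so that no single-site update can change chi."""
    return all(
        set(range(q)) - {chi[v]} <= {chi[w] for w in neighbours(v)}
        for v in range(len(chi))
    )
```

Here each vertex sees all three other colours on its three neighbours, so every single-site proposal with a new colour is rejected. The same colouring is not frozen once a fifth colour is available.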
Remark 1.7 While this paper was under review, Galvin and Randall [13] used methods different from those of the present work to extend Corollary 1.5 to the discrete torus $T_{L,d}$, the graph on vertex set $\{0, \ldots, L-1\}^d$ in which two strings are adjacent if they differ on exactly one coordinate and differ by 1 (mod $L$) on that coordinate. The main result of [13] is that for $L \geq 4$ even and $d$ large, the Glauber dynamics chain $M_3$ on […]

We prove Theorem 1.1 via a well-known conductance argument (introduced in [16]). A particularly useful form of the argument was given by Dyer, Frieze and Jerrum [10]. Let $\mathcal{M}$ be an ergodic Markov chain on state space $\Omega$ with transition probabilities $P$ and stationary distribution $\pi$. Let $A \subseteq \Omega$ and $M \subseteq \Omega \setminus A$ satisfy $\pi(A) \leq 1/2$ and $\omega_1 \in A$, $\omega_2 \in \Omega \setminus (A \cup M) \Rightarrow P(\omega_1, \omega_2) = 0$. Then from [10] we have […] We may think of $M$ as a bottleneck set through which any run of the chain must pass in order to mix; if the bottleneck has small measure, then the mixing time is high.

Now let us return to the setup of Theorem 1.1. Set […] (the set of 3-colourings that are balanced with respect to 0) and […] We may assume without loss of generality that $\pi_3(C_3^{E,\rho,0}) \leq 1/2$. Notice that since $\mathcal{M}$ changes the colour of at most $\rho N$ vertices in each step, we have that if $\chi_1 \in C_3^{E,\rho,0}$ […] the second inequality coming from the trivial lower bound $|C_3^{E,\rho,0}| \geq 2^{N/2}$ (consider those $\chi$ with $\chi(v) = 0$ for all $v \in E$). Theorem 1.1 thus follows from the following theorem, whose proof will be the main business of this paper.

Theorem 1.8 Fix $\rho > 0$ satisfying $H(\rho) + \rho < 1$. There are constants $d_0, C_1, C'_1, C_2 > 0$ all depending on $\rho$ such that if $\Sigma$ is a $d$-regular bipartite graph on $N$ vertices with bipartite expansion $\delta$ and locality $\ell$ satisfying (2) and with $d \geq d_0$ then […]

Theorem 1.8 says more about the structure of $C_3$ than just that the dynamics mixes slowly.
From it, we can infer that for Σ and ρ satisfying the conditions of the theorem, C 3 breaks naturally into six sets in such a way that once a ρ-local chain enters one of these dominant sets, it tends to remain there for an exponential time. These sets are characterized by a predominance of one (of three) colours on one (of two) partition classes. Indeed, defining C b,ρ,1 3 and C b,ρ,2 3 by analogy with C b,ρ,0 , we may partition R 3 into six pieces by (for any (x, y, z)) it must enter ∪ 2 i=0 C b,ρ,i 3 which, by Theorem 1.1 and a union bound, has exponentially small measure.
Before turning to the proof of Theorem 1.8 we pause to give a pleasing combinatorial corollary in the special case Σ = Q d .
In other words, the typical 3-colouring of Q d exhibits strong E/O imbalance on all colours.
Proof of Corollary 1.9: As previously observed, $\ell(Q_d) = d$ and $\delta(Q_d) \geq \Omega(1/\sqrt{d})$, so (2) is satisfied for large enough $d$. It follows that there is a $C'(\rho)$ such that for large enough $d$ and for each $i = 0, 1, 2$, […] Using $2^{2^{d-1}}$ as a lower bound on $|C_3(Q_d)|$ (consider those colourings for which $\chi^{-1}(0) = E$) we obtain […] for sufficiently large $d = d(\rho)$ and suitable $c = c(\rho)$.

[…] The proof is completed by invoking (4) and summing over all choices of $a$, $g$, et cetera. The proof of (4) involves the idea of approximation. To bound $|\mathcal{H}|$, we produce a small set $\mathcal{U}$ with the properties that each $(E, O) \in \mathcal{H}$ is approximated (in an appropriate sense) by some $U \in \mathcal{U}$, and for each $U \in \mathcal{U}$, the number of $(E, O) \in \mathcal{H}$ that could possibly be approximated by $U$ is small. (Each $U \in \mathcal{U}$ will consist of six parts; one each approximating $E$, $N(E)$, $I(E)$, $N(I(E))$, $I(O)$ and $N(I(O))$.) The product of the bound on $|\mathcal{U}|$ and the bound on the number of those $(E, O) \in \mathcal{H}$ that may be approximated by any $U$ is then a bound on $|\mathcal{H}|$.
The main inspiration for our approximation scheme is the work of A. Sapozhenko, who, in [26], gave a relatively simple derivation for the asymptotics of the number of independent sets in $Q_d$, earlier derived in a more involved way in [18]. We produce the set $\mathcal{U}$ by appealing to a lemma from [14] where a similar approximation scheme was used to show that the mixing time of Glauber dynamics for the hard-core model on $Q_d$ with activity $\lambda$ is (essentially) exponential in $2^d$ for large enough $\lambda$. The proof that each $U \in \mathcal{U}$ approximates only a small number of $(E, O) \in \mathcal{H}$ is a modification of a similar proof from [12] in which it is shown that a uniformly chosen homomorphism from $Q_d$ to $\mathbb{Z}$ almost surely takes on at most 5 values, and also that the number of proper 3-colourings of $Q_d$ is asymptotic to $2e\,2^{2^{d-1}}$ as $d$ goes to infinity.

The proof
We begin by establishing some more notation. From now on, we write $M$ for $N/2$. […] and […] We assume the convention that whenever $E$ and $O$ have been specified, $I$, $J$ and $R$ will be used as shorthand for $I(E)$, […] To see this, note that once we have specified that the set of vertices coloured 0 is $E \cup O$, we have a free choice between 1 and 2 for the colour at $x \in I \cup J$, with each choice independent. This accounts for the factor $2^{|I|+|J|}$. The subgraph induced by $R$ breaks into $\mathrm{comp}(R)$ components, each of which is bipartite and may be coloured in exactly two ways using the colours 1 and 2. This accounts for the factor $2^{\mathrm{comp}(R)}$. We therefore have $|C_3^{b,\rho,0}|$ […] A key observation is the following.
is an independent set), and so |C| ≥ ℓ. The result follows. ✷ We now decompose C b,ρ,0 3 into four pieces. Set Since H(ρ) + ρ < 1, α is a strictly positive constant depending on ρ. Set (nt, O) similarly. Since Σ has a perfect matching, it is easy to see that for χ ∈ C 3 at least one of |E| ≤ M/2, |O| ≤ M/2 holds; moreover, it is straightforward to check that at least one of |[E]| ≤ M/2, |[O]| ≤ M/2 holds also; that is, that at least one of E, O is small, and so In what follows we make extensive use of a result concerning the sums of binomial coefficients which follows from the Chernoff bounds [8] (see also [5], p.11): Also, since H(x) ≤ 2x log 1/x for x ≤ e −1 , We begin by bounding |C b,ρ,0 3 (triv, E)|. Noting that |I| ≤ |E| and |J| ≤ |O| always (this follows from (1)) we have for sufficiently large d = d(ρ). In (8) we have used Proposition 2.1. In (9) we use (6) while in (10) we use (5) to obtain for some ε = ε(α), and then use the second inequality in (2) (in the weak form that ℓ = ω(1)) to obtain for suitable d.
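The two entropy facts used in this section can be sanity-checked numerically. The sketch below assumes the standard form of the binomial-sum estimate, $\sum_{i \leq cn} \binom{n}{i} \leq 2^{H(c)n}$ for $0 < c \leq 1/2$ (the exact display in the source is not reproduced here), together with the stated inequality $H(x) \leq 2x \log(1/x)$ for $x \leq e^{-1}$, with logarithms to base 2. All function names are hypothetical.

```python
import math


def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0 by convention."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def binomial_prefix_sum(n, k):
    """Sum of C(n, i) for 0 <= i <= k, computed exactly."""
    return sum(math.comb(n, i) for i in range(k + 1))


def entropy_bound_holds(n, c):
    """Check sum_{i <= c n} C(n, i) <= 2^{H(c) n} for 0 < c <= 1/2.

    A tiny multiplicative slack absorbs floating-point rounding.
    """
    k = math.floor(c * n)
    return binomial_prefix_sum(n, k) <= 2.0 ** (binary_entropy(c) * n) * (1 + 1e-9)


def entropy_upper_bound_holds(x):
    """Check H(x) <= 2 x log2(1/x) for 0 < x <= 1/e."""
    return binary_entropy(x) <= 2 * x * math.log2(1 / x) + 1e-12
```

A finite check of this kind is only a plausibility test; it does not replace the Chernoff-bound argument cited in the text.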
For integers a, g, b, h, b ′ and h ′ , set Our main lemma is the following (cf. [ For G = Σ we may take β 1 = 2 (say). Note that for each (E, O) ∈ C b,ρ,0 for some a, g, h and h ′ with αM ≤ a ≤ M/2. With the steps justified below, we therefore have verifying (12) and completing the proof of Theorem 1.8. The main point, (13), is an application of Lemma 2.2. Here the constant c depends on α and therefore on ρ. In (14) we use that M ≤ exp{M log 2 d/d} for all d. In (15) we have chosen g = αM to maximize the exponent. Finally in (16) we may (for example) take C 1 = 43/(αc) and C ′ 1 = 4/(αc), and use both inequalities in (2). The final constant c ′ depends only on c and α and therefore only on ρ, as claimed.
To prove Lemma 2.2, we use a notion of approximation introduced in [25]. An approximation for A ⊆ E is a pair (F, S) ⊆ O × E satisfying For A ⊆ O we make the analogous definition.
The following lemma is from [14] (a combination of Lemmata 3.2 and 3.3). We use the shorthand $\binom{t}{\leq k}$ for $\sum_{0 \leq i \leq k} \binom{t}{i}$. […] There is a family $W = W(a, g)$ such that every $A \in A(a, g)$ has an approximation in $W$. The analogous result holds with $O$ replacing $E$ in the definition of $A(a, g)$.

Remark 2.4
If a and g satisfy g − a ≥ d −β g for some constant β then using (7) the bound on |W| from Lemma 2.3 may be rewritten as as long as d is sufficiently large (as a function of β).
Say that a sextuple (F, S, P, Q, P ′ , is an approximation for E, (P, Q) is an approximation for I and (P ′ , Q ′ ) is an approximation for J. Lemma 2.5 Let G, a, g, b, h, b ′ and h ′ be as in Lemma 2.2. There are constants c 1 = c 1 (β 1 ) > 0 and c 2 = c 2 (β 1 , β 2 ) > 0 and a family X = X (a, Bearing this and the fact that g − a ≥ δg in mind, Lemma 2.2 is implied by Lemma 2.5 and the following reconstruction lemma. Lemma 2.6 Let G, a, g, b, h, b ′ and h ′ be as in Lemma 2.2. There are constants (17) there are at most Proof: For notational convenience, write t for g − a, s for h − b and s ′ for h ′ − b ′ . Say that S is tight if |S| < g − c ′ 1 t/ log d and slack otherwise, that Q is tight if |Q| < b + c ′ 1 s/ log d, and slack otherwise, and that Q ′ is tight if |Q ′ | < b ′ + c ′ 2 s ′ / log d, and slack otherwise, where c ′ 1 = c ′ 1 (β 1 ) > 0 and c ′ 2 = c ′ 2 (β 1 , β 2 ) > 0 are constants that will be specified presently.
We now describe a procedure which, for input (F, S, P, Q, P ′ , Q ′ ) satisfying (17), produces an output (E, O) which satisfies (18). The procedure involves a sequence of choices, the nature of the choices depending on whether S, Q and Q ′ are tight or slack.
We begin by identifying a subset D of E which can be specified relatively cheaply: if Q is tight, we pick I ⊆ Q with |I| = b and take D = N(I); if Q is slack, we simply take D = P (recalling that P ⊆ N(I) ⊆ E).
If S is tight, we complete the specification of E by choosing E \ D ⊆ S \ D. If S is slack, we first complete the specification of N(E) by choosing N(E) \ F ⊆ N(S) \ F . We then complete the specification of E by choosing E \ D ⊆ [E] \ D (noting that we do know [E] \ D at this point).
Next we turn to the specification of O. As with E, we begin by identifying a subset D ′ of O: if Q ′ is tight, we pick J ⊆ Q ′ with |J| = b ′ and take D ′ = N(J); if Q ′ is slack, we simply take D ′ = P ′ . From here, we complete the specification of O by This procedure produces all pairs (E, O) satisfying (18). Before bounding the number of outputs, we gather together some useful observations.
First note that as established in the proof of Lemma 2.5 we have (The first term in the exponent on the left-hand side corresponds to the choice of D (using (21)), and the second to the choice of E \ D ⊆ S \ D (note that since S and Q are both tight, |S \ D| ≤ g − c ′ 1 t/ log d − h). If S is tight and Q is slack then the total is at most (Here there is no choice for D, and the exponent corresponds to the choice of E \ D ⊆ S \ D (using (21)).) If Q is tight then |[E] \ D| = a − h, so that if S is slack (and Q tight) then the number of possibilities for E is at most (The first term in the exponent on the left-hand side corresponds to the choice of D (using (21)), the second to the choice of N(E) \ F (using (22)) and the third to the choice of E \ D.) If Q is slack then |[E] \ D| ≤ a − b − c ′ 1 s/2 log d (see (21)), so that if S and Q are both slack the number of possibilities for E is at most