A Counterexample to rapid mixing of the Ge-Stefankovic Process

Ge and Stefankovic have recently introduced a novel two-variable graph polynomial. When specialised to a bipartite graph G and evaluated at the point (1/2, 1), this polynomial gives the number of independent sets in the graph. Inspired by this polynomial, they also introduced a Markov chain which, if rapidly mixing, would provide an efficient sampling procedure for independent sets in G. This sampling procedure in turn would imply the existence of efficient approximation algorithms for a number of significant counting problems whose complexity is so far unresolved. The proposed Markov chain is promising, in the sense that it overcomes the most obvious barrier to mixing. However, we show here, by exhibiting a sequence of counterexamples, that the mixing time of their Markov chain is exponential in the size of the input when the input is chosen from a particular infinite family of bipartite graphs.


Overview
Consider the following basic computational problem.
Name: #BIS.
Instance: A bipartite graph G.
Output: The number of independent sets in G.
It has long been known that #BIS is #P-complete [9], and hence presumably intractable, if we insist on an exact solution. However, the computational complexity of approximating #BIS remains a fascinating open problem. The standard notion of efficient approximation in the context of counting problems is the "fully polynomial randomised approximation scheme" or FPRAS. Roughly speaking, an FPRAS is a polynomial-time randomised algorithm that produces an estimate that is close in relative error to the true solution with high probability. (See [8,Defn 11.2] for a precise definition.) The most satisfactory situation would be either to have an FPRAS for #BIS, or a proof that #BIS is NP-hard to approximate. However, neither of these situations is known to occur.
Dyer et al. [3] noted that a number of counting problems are equivalent to #BIS under approximation-preserving reducibility, and further #BIS-equivalent problems have been presented in subsequent work [1,5]. Since no FPRAS has been found for any of the counting problems in this equivalence class, it is becoming standard to progress under the assumption that #BIS (and hence each of the #BIS-equivalent problems) does not admit an FPRAS. So finding an FPRAS for #BIS at this stage would be a significant development. Not only would it imply the existence of an FPRAS for several natural counting problems (such as counting downsets in a partial order, or evaluating the partition function of the ferromagnetic Ising model with local fields), but it would also resolve the complexity of approximating #BIS in the opposite direction to the one many people expect.
The most fruitful approach to designing efficient approximation algorithms for counting problems has been Markov chain Monte Carlo (MCMC). A direct application of MCMC to #BIS would work as follows. Given a bipartite graph $G$ with $n$ vertices, consider the Markov chain whose state space, $\Omega$, is the set of all independent sets in $G$, and whose transition probabilities $P(\cdot, \cdot)$ are as follows, where $\oplus$ denotes symmetric difference and $H(I) = \{I' \in \Omega \mid |I \oplus I'| = 1\}$:
$$P(I, I') = \begin{cases} 1/(2n), & \text{if } I' \in H(I); \\ 1 - |H(I)|/(2n), & \text{if } I' = I; \\ 0, & \text{otherwise.} \end{cases}$$
It is easy to check that this Markov chain has the uniform distribution on independent sets as its unique stationary distribution. So, simulating the Markov chain for sufficiently many steps would enable us to sample independent sets nearly uniformly. From there it is a short step to estimating the number of independent sets [6, §3.2].
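As a concrete illustration, the following is a minimal sketch of a single-site flip chain on independent sets. It assumes the usual lazy update rule (hold with probability 1/2, otherwise pick a vertex uniformly at random and flip it if the result is still independent); the function names and the three-vertex path used as input are hypothetical and serve only as an example.

```python
import random

def is_independent(edges, subset):
    """True if no edge of the graph has both endpoints in `subset`."""
    return not any(u in subset and v in subset for u, v in edges)

def single_site_step(vertices, edges, current, rng):
    """One step of a lazy single-site flip chain (assumed update rule)."""
    if rng.random() < 0.5:          # lazy step: stay put with probability 1/2
        return current
    v = rng.choice(vertices)        # pick a vertex uniformly at random
    proposal = current ^ {v}        # symmetric difference flips membership of v
    return proposal if is_independent(edges, proposal) else current

# Hypothetical example graph: the path a - b - c, which is bipartite.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
rng = random.Random(0)
state = frozenset()
visited = set()
for _ in range(2000):
    state = frozenset(single_site_step(vertices, edges, state, rng))
    visited.add(state)
```

Under this rule every move between two independent sets that differ in one vertex has the same probability in both directions, which is why the uniform distribution is stationary.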
To obtain an FPRAS from this approach, one requires that the Markov chain on independent sets is rapidly mixing, i.e., that it is close to the stationary distribution after a number of steps that is polynomial in $n$. Unfortunately, it is clear that the proposed Markov chain does not have this property. Consider the complete bipartite graph with equal numbers of vertices in the left and right blocks of the bipartition. There are $2^{n/2} - 1$ independent sets that have non-empty intersection with the left block, and the same number with non-empty intersection with the right. Any sequence of transitions which starts in a left-oriented independent set and ends in a right-oriented one must necessarily pass through the empty independent set. Informally, the empty independent set presents an obstruction to rapid mixing by forming a constriction in the state space. This intuition can be made rigorous by noting that the "conductance" of the Markov chain is exponentially small, which implies exponential (in $n$) mixing time [2,Claim 2.3]. In fact, it is not even necessary to have a dense graph in order to obtain such a constriction: degree 6 will do [2, Thm 2.1].
Ge and Štefankovič [4] have recently introduced an intriguing graph polynomial $R'_2(G; \lambda, \mu)$, in two indeterminates $\lambda$ and $\mu$, that is associated with a bipartite graph $G$. At the point $(\lambda, \mu) = (\frac{1}{2}, 1)$ it counts independent sets in $G$; specifically, the number of independent sets in $G$ is given by $2^{n-m} R'_2(G; \frac{1}{2}, 1)$, where $n$ is the number of vertices, and $m$ the number of edges in $G$ [4, Thm 4]. This polynomial inspired them to propose a new Markov chain [4,Defn 6] that potentially could be used to sample independent sets from a bipartite graph and hence provide an approximation algorithm for #BIS. The Markov chain, which is described below, is very different from the one discussed earlier. In particular, its states are subsets of the edge set of $G$ rather than subsets of the vertex set. Thus, sampling an independent set of $G$ is a two-stage procedure: (a) sample an edge subset $A$ of $G$ from the appropriate distribution, and then (b) sample an independent set from a distribution conditioned on $A$. Details will be given below.
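The constriction in the complete bipartite graph can be checked by brute force on a small instance. The sketch below enumerates the independent sets of $K_{k,k}$ (so $n = 2k$) and confirms that no left-oriented set is a single flip away from a right-oriented one; the vertex labels and helper name are hypothetical.

```python
from itertools import chain, combinations

def independent_sets_complete_bipartite(k):
    """Enumerate the independent sets of K_{k,k} by brute force."""
    left = [("L", i) for i in range(k)]
    right = [("R", i) for i in range(k)]
    vertices = left + right
    subsets = chain.from_iterable(
        combinations(vertices, r) for r in range(2 * k + 1))
    # In K_{k,k} a set is independent iff it does not meet both sides.
    return [frozenset(s) for s in subsets
            if not (any(v[0] == "L" for v in s) and any(v[0] == "R" for v in s))]

k = 4  # n = 2k = 8 vertices
all_sets = independent_sets_complete_bipartite(k)
left_oriented = [s for s in all_sets if any(v[0] == "L" for v in s)]
right_oriented = [s for s in all_sets if any(v[0] == "R" for v in s)]
# Pairs (left-oriented, right-oriented) that differ in exactly one vertex.
crossings = [(s, t) for s in left_oriented for t in right_oriented
             if len(s ^ t) == 1]
```

The list `crossings` is empty, so every path between the two families has to visit the empty set.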
The encouraging aspect of this new Markov chain, which we call the Ge-Štefankovič Process, or GS Process for short, is that it is immune to the obvious counterexamples, such as the complete bipartite graph. Unfortunately, with a certain amount of effort it is possible to find a counterexample to rapid mixing. In the following section we describe the GS Process and construct a sequence of graphs on which its mixing time is exponential (in the number of vertices of the graph). Although this counterexample rules out their Markov chain as an approach to constructing a general FPRAS for #BIS, we may still hope that it provides an efficient algorithm for some restricted class of graphs. For example, [4,Theorem 7] shows that it provides an efficient algorithm on trees.

The Ge-Štefankovič Process
Before stating our result, we need to formalise what we mean by mixing, rapid or otherwise. Let $(X_t)$ be an ergodic Markov chain with state space $\Omega$, distribution $p_t$ at time $t$, and unique stationary distribution $\pi$. Let $x_0 \in \Omega$ be the initial state of the chain, so that $p_0$ assigns unit mass to state $x_0$. Define the mixing time $\tau(x_0)$ with initial state $x_0 \in \Omega$ as the first time $t$ at which $\frac{1}{2}\|p_t - \pi\|_1 \le e^{-1}$, i.e., at which the distance between the $t$-step and stationary distributions is at most $e^{-1}$ in total variation; then define the mixing time $\tau$ as the maximum of $\tau(x_0)$ over all choices of initial state $x_0$.
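For a chain small enough to write down explicitly, this definition can be evaluated directly. The sketch below is only illustrative: it iterates the distribution step by step and reports the worst case over starting states. The two-state chain and all function names are hypothetical.

```python
import math

def step(dist, P):
    """Advance a distribution (row vector) one step under transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv_distance(p, q):
    """Total variation distance, i.e. half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, pi, max_steps=10_000):
    """Maximum over start states x0 of the first t with TV(p_t, pi) <= 1/e."""
    worst = 0
    for x0 in range(len(P)):
        dist = [1.0 if i == x0 else 0.0 for i in range(len(P))]
        t = 0
        while tv_distance(dist, pi) > math.exp(-1):
            dist = step(dist, P)
            t += 1
            if t > max_steps:
                raise RuntimeError("did not mix within max_steps")
        worst = max(worst, t)
    return worst

# Hypothetical two-state lazy chain with uniform stationary distribution.
P = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
tau = mixing_time(P, pi)
```

For this chain the total variation distance after $t$ steps is $\frac{1}{2}(0.8)^t$, which first drops below $e^{-1}$ at $t = 2$.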
Suppose $G = (U \cup V, E)$ is a bipartite graph, where $U, V$ are disjoint sets forming the vertex bipartition, and $E$ is the edge set. We are interested in two probability spaces, $(\Omega, \pi_\Omega)$ and $(\Sigma, \pi_\Sigma)$, where $\Omega = 2^E$ and $\Sigma = 2^U$. We construct the probability distributions $\pi_\Omega : \Omega \to [0, 1]$ and $\pi_\Sigma : \Sigma \to [0, 1]$ with the help of a certain consistency relation $\chi$ on $\Sigma \times \Omega$, which is defined as follows. For a pair $(I, A) \in \Sigma \times \Omega$, consider the subgraph of $(U \cup V, A)$ induced by the vertex set $I \cup V$. We say that the relation $\chi(I, A)$ holds iff every vertex of $V$ has even degree in this subgraph. Start with the probability space of consistent pairs $\{(I, A) \in \Sigma \times \Omega \mid \chi(I, A)\}$ with the uniform distribution. Then $\pi_\Omega$ (respectively $\pi_\Sigma$) is the induced marginal distribution on $\Omega$ (respectively $\Sigma$). We call $\pi_\Sigma$ the marginal BIS distribution on $\Sigma$. It is shown in [4, Lemma 10] that $\pi_\Sigma$ is also the distribution induced on $U$ by a uniform random independent set in $G$, justifying the name.
In [4], $\pi_\Omega(A)$, for $A \in \Omega$, is defined in terms of the rank of $A$, viewed as a bipartite adjacency matrix over $\mathbb{F}_2$; this definition is equivalent to the one given here.
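The claim that the marginal on Σ agrees with the distribution induced on U by a uniform random independent set can be checked exhaustively on a small graph. The sketch below does this with exact rational arithmetic; the graph and all helper names are hypothetical, and edges are written as pairs (u, v) with u in U and v in V.

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def consistent(I, A, V):
    """chi(I, A): every v in V has an even number of A-edges into I."""
    return all(sum(1 for (u, w) in A if w == v and u in I) % 2 == 0 for v in V)

def marginal_on_sigma(U, V, E):
    """pi_Sigma, obtained from the uniform distribution on consistent pairs."""
    counts = {frozenset(I): sum(consistent(set(I), A, V) for A in powerset(E))
              for I in powerset(U)}
    total = sum(counts.values())
    return {I: Fraction(c, total) for I, c in counts.items()}

def independent_set_marginal(U, V, E):
    """Distribution of S ∩ U for a uniform random independent set S."""
    counts = {frozenset(I): 0 for I in powerset(U)}
    for S in powerset(U + V):
        S = set(S)
        if not any(u in S and v in S for (u, v) in E):
            counts[frozenset(S & set(U))] += 1
    total = sum(counts.values())
    return {I: Fraction(c, total) for I, c in counts.items()}

# Hypothetical small bipartite graph.
U = ["u1", "u2", "u3"]
V = ["v1", "v2"]
E = [("u1", "v1"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
pi_sigma = marginal_on_sigma(U, V, E)
bis = independent_set_marginal(U, V, E)
```

The two dictionaries coincide. The reason is that both counts are proportional to $2^{-|N(I)|}$, where $N(I)$ is the set of vertices of $V$ adjacent to $I$: each such vertex halves the number of consistent edge sets and also halves the number of ways to extend $I$ to an independent set.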
The GS Process is an ergodic "single bond flip" Markov chain on state space $\Omega$ which has stationary distribution $\pi_\Omega$. The exact definition of this Markov chain is not important to us, as our counterexample applies to any Markov chain on state space $\Omega$ with stationary distribution $\pi_\Omega$ that does not change too many edges in one step. In order to formalise this last requirement, say that a Markov chain with transition probabilities $P : \Omega^2 \to [0, 1]$ is $d$-cautious if $P(A, A') > 0$ implies $|A \oplus A'| \le d$, i.e., if no single transition changes more than $d$ edges. The GS Process is a 1-cautious Markov chain. Our negative result applies to all $d$-cautious chains, where $d$ depends at most linearly on the number of vertices of $G$.

A counterexample to rapid mixing
The following lemma (taken from [2, Claim 2.3]) packages the conductance argument in a convenient way for us to obtain explicit lower bounds on mixing time.

Lemma 1.
Let $(X_t)$ be a Markov chain with state space $\Omega$, transition matrix $P$ and stationary distribution $\pi$. Let $\{S, T\}$ be a partition of $\Omega$ such that $\pi(S) \le \frac{1}{2}$, and let $C \subset \Omega$ be a set of states that form a "barrier" in the sense that $P(s, t) = 0$ whenever $s \in S \setminus C$ and $t \in T \setminus C$. Then the mixing time of the Markov chain is at least $\pi(S)/(8\pi(C))$.
Let $n, m$ be positive integers such that $(3/2)^m \le 2^n - 1 < (3/2)^{m+1}$. Note that for every $n \ge 2$ there is a unique $m$ satisfying the inequalities, and that $m$ depends linearly on $n$, asymptotically (indeed $m/n \to \ln 2 / \ln(3/2) \approx 1.7095$). The counterexample graph (actually sequence of graphs indexed by $n$) $G_n = (U' \cup V \cup U'', E)$ has vertex set $U' \cup V \cup U''$ where $|U'| = n$ and $|V| = |U''| = m$. The edge set is $E = (U' \times V) \cup M$, where $M$ is a perfect matching of the vertices in $V$ and $U''$. Thus, (a) $G_n$ has bipartition $(U, V)$ where $U = U' \cup U''$, (b) $U'$, $V$ and $U''$ are all independent sets, (c) the edges between $U'$ and $V$ form a complete bipartite graph, and (d) the edges between $V$ and $U''$ form a matching.
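The construction is easy to make explicit. The following sketch computes $m$ from $n$ with exact rational arithmetic and builds the vertex blocks and edge set of $G_n$; the function names and vertex labels are hypothetical.

```python
from fractions import Fraction

def matching_size(n):
    """The m with (3/2)^m <= 2^n - 1 < (3/2)^(m+1); returns 0 when n = 1."""
    target = 2 ** n - 1
    m = 0
    while Fraction(3, 2) ** (m + 1) <= target:
        m += 1
    return m

def build_counterexample(n):
    """Vertex blocks U', V, U'' and the edge set of G_n."""
    m = matching_size(n)
    U1 = [("U'", i) for i in range(n)]
    V = [("V", j) for j in range(m)]
    U2 = [("U''", j) for j in range(m)]
    complete = [(u, v) for u in U1 for v in V]   # all of U' x V
    matching = list(zip(U2, V))                  # perfect matching M
    return U1, V, U2, complete + matching

m10 = matching_size(10)
U1, V, U2, E = build_counterexample(2)
```

For $n = 10$ this gives $m = 17$, and the smallest instance $G_2$ has $m = 2$ and six edges.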
Partition $\Sigma$ as $\Sigma = \Sigma_0 \cup \Sigma_1$, where $\Sigma_0 = \{I \in \Sigma \mid I \cap U' = \emptyset\}$ and $\Sigma_1 = \Sigma \setminus \Sigma_0$. Observe there are $3^m$ independent sets in $G_n$ that exclude all vertices in $U'$, and $(2^n - 1)2^m$ that include some vertex. Since $\pi_\Sigma$ is the marginal distribution of independent sets in $G_n$,
$$\pi_\Sigma(\Sigma_0) = \frac{3^m}{3^m + (2^n - 1)2^m} = \alpha,$$
where $\frac{2}{5} < \alpha \le \frac{1}{2}$, by choice of $n, m$. So $\Sigma_0 \cup \Sigma_1$ is a nearly balanced partition of the state space $\Sigma$. Also it is easy to check that the cut defined by this partition is a witness to the conductance of the "single site flip" Markov chain of §1 being exponentially small in $n$. This implies that the mixing time of the single site flip Markov chain is exponential in $n$ (which, of course, was never in doubt). Next we identify a partition $\Omega_0 \cup \Omega_1$ that mirrors the partition $\Sigma_0 \cup \Sigma_1$, and itself witnesses exponentially small conductance of the GS Process.
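The bounds on $\alpha$ follow in one line from the choice of $n$ and $m$. Here $\alpha$ is taken to be $\pi_\Sigma(\Sigma_0)$, the fraction of independent sets of $G_n$ that avoid $U'$, and $r$ is a symbol introduced only for this derivation:

```latex
\alpha \;=\; \frac{3^m}{3^m + (2^n-1)\,2^m} \;=\; \frac{1}{1+r},
\qquad r \;=\; \frac{2^n-1}{(3/2)^m}.
% The defining inequalities (3/2)^m \le 2^n - 1 < (3/2)^{m+1} say exactly that
1 \;\le\; r \;<\; \tfrac{3}{2}
\quad\Longrightarrow\quad
\tfrac{2}{5} \;<\; \alpha \;\le\; \tfrac{1}{2}.
```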
Define the weight $w(A)$ of $A \in \Omega$ to be $w(A) = |A \cap M|$. Partition $\Omega$ as $\Omega = \Omega_0 \cup \Omega_1$, where $\Omega_0 = \{A \in \Omega \mid w(A) \le \frac{5}{12}m\}$ and $\Omega_1 = \Omega \setminus \Omega_0$. We aim to show that the weights of states in $\Omega$ are concentrated around $\frac{1}{3}m$ and $\frac{1}{2}m$, and there are exponentially few states near the boundary of $\Omega_0$ and $\Omega_1$. With a view to applying Lemma 1, define a "barrier set" (of states) by $C = \{A \in \Omega \mid \frac{9}{24}m \le w(A) \le \frac{11}{24}m\}$. It is not clear how to sample a state $A$ from the distribution $(\Omega, \pi_\Omega)$ directly, so instead we sample a state $I$ from $(\Sigma, \pi_\Sigma)$ and then sample u.a.r. a state $A$ consistent with $I$, i.e., satisfying $\chi(I, A)$. This amounts to the same thing.
Suppose we start with a state $I$ sampled from $(\Sigma, \pi_\Sigma)$, conditional on $I \in \Sigma_0$. The set $I \cap U''$ is determined by a Bernoulli process with success probability $\frac{1}{3}$. (For each edge $e$ in $M$ there are three possibilities for the restriction of the independent set to $e$, and only one of them includes a vertex from $U''$. These choices are independent for each $e \in M$.) When we come to select a random consistent edge set $A$, we must exclude all edges in $M$ that are incident to a vertex in $I \cap U''$. The other edges in $M$ are free to be included or excluded. So each edge of $M$ lies in $A$ with probability $\frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}$, and the set of edges $A \cap M$ is determined by a Bernoulli process with success probability $\frac{1}{3}$. Thus $E(w(A)) = \frac{1}{3}m$ and, by a Chernoff bound, $\Pr(w(A) \ge \frac{9}{24}m)$ is exponentially small in $m$. Specifically,
$$\Pr(A \in C) \le \Pr\bigl(w(A) \ge \tfrac{9}{24}m\bigr) \le \exp(-m/576) \tag{1}$$
by [7,Thm 4.4(2)], setting $\delta = \frac{1}{8}$ and $\mu = \frac{1}{3}m$.
Now suppose $I$ is sampled from $(\Sigma, \pi_\Sigma)$, conditional on $I \in \Sigma_1$, and select a uniform random $A$, conditional on the event $\chi(I, A)$. We argue that the probability that a given edge $e = (v, u)$ of $M$ is included in $A$ is $\frac{1}{2}$, independent of all the other edges of $M$. Suppose $v \in V$ and $(v, u) \in M$. Imagine we are deciding which edges incident to $v$ are to be included in $A$. First we decide whether to include the edge $(v, u)$ itself. In selecting the remaining edges for $A$ from the $n$ available, we just have to make sure that the parity of $A \cap (\{v\} \times (I \cap U'))$ is odd, if $(v, u) \in A$ and $u \in I$, and even otherwise. Since $I \cap U' \ne \emptyset$, the number of ways to do this is $2^{n-1}$, independent of whether we included edge $e$ in the first place. It follows that the set of edges $A \cap M$ is determined by a Bernoulli process with success probability $\frac{1}{2}$. Thus $E(w(A)) = \frac{1}{2}m$ and, by a Chernoff bound,
$$\Pr(A \in C) \le \Pr\bigl(w(A) \le \tfrac{11}{24}m\bigr) \le \exp(-m/576) \tag{2}$$
by [7,Thm 4.5(2)], setting $\delta = \frac{1}{12}$ and $\mu = \frac{1}{2}m$. We see now that the partition $\Omega_0 \cup \Omega_1 = \Omega$ is balanced, since $\pi_\Omega(\Omega_0) = \alpha \pm o(1)$ and $\frac{2}{5} < \alpha \le \frac{1}{2}$.
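The two Chernoff estimates can be compared with exact binomial tails for a concrete $m$. The sketch below is only a numerical sanity check: the value $m = 240$ and the function names are illustrative choices, with $m$ picked so that $\frac{9}{24}m$ and $\frac{11}{24}m$ are integers.

```python
import math
from fractions import Fraction

def binomial_pmf(m, p, k):
    """Exact Pr[Bin(m, p) = k] for a Fraction p."""
    return math.comb(m, k) * p ** k * (1 - p) ** (m - k)

def upper_tail(m, p, threshold):
    """Exact Pr[Bin(m, p) >= threshold]."""
    return sum(binomial_pmf(m, p, k) for k in range(m + 1) if k >= threshold)

def lower_tail(m, p, threshold):
    """Exact Pr[Bin(m, p) <= threshold]."""
    return sum(binomial_pmf(m, p, k) for k in range(m + 1) if k <= threshold)

m = 240  # illustrative size only
bound = math.exp(-m / 576)
# Conditional on Sigma_0: w(A) ~ Bin(m, 1/3), upper tail at 9m/24.
tail_sigma0 = upper_tail(m, Fraction(1, 3), Fraction(9 * m, 24))
# Conditional on Sigma_1: w(A) ~ Bin(m, 1/2), lower tail at 11m/24.
tail_sigma1 = lower_tail(m, Fraction(1, 2), Fraction(11 * m, 24))
```

Both exact tails lie below $\exp(-m/576)$, as the Chernoff bounds guarantee. With $\delta = \frac{1}{8}$, $\mu = \frac{1}{3}m$ the upper-tail exponent is $\mu\delta^2/3 = m/576$, and with $\delta = \frac{1}{12}$, $\mu = \frac{1}{2}m$ the lower-tail exponent is $\mu\delta^2/2 = m/576$.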
Moreover, from (1) and (2), $\Pr(\frac{9}{24}m \le w(A) \le \frac{11}{24}m)$ is exponentially small when $A$ is selected from the distribution $(\Omega, \pi_\Omega)$; specifically, $\pi_\Omega(C) \le \exp(-m/576)$. Thus the cut $(\Omega_0, \Omega_1)$ is witness to the conductance of the single bond flip Markov chain being exponentially small. Suppose $d \le m/12$. Observe that no $d$-cautious chain can make a transition from $\Omega_0 \setminus C$ to $\Omega_1 \setminus C$, since the weights of states in these two sets differ by more than $m/12$. Applying Lemma 1, we therefore obtain the following.
Theorem 2. Suppose that $n$, $m$, $G_n$, $\Omega$ and $\pi_\Omega$ are as above, and that $d \le m/12$. Any ergodic Markov chain on state space $\Omega$ with stationary distribution $\pi_\Omega$ that is $d$-cautious has mixing time $\Omega(\exp(m/576))$. In particular, the GS Process, which is 1-cautious, has mixing time exponential in the number of vertices in $G_n$.
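The structure behind the theorem, namely that the weight $|A \cap M|$ under the stationary distribution on edge sets is a mixture of a binomial with parameter $\frac{1}{3}$ and one with parameter $\frac{1}{2}$, can be verified exhaustively on the smallest instance $G_2$ (where $n = m = 2$). The sketch below is a brute-force check with hypothetical names; the mixing weight `alpha` is $3^m/(3^m + (2^n - 1)2^m)$, the fraction of independent sets avoiding $U'$.

```python
from fractions import Fraction
from itertools import chain, combinations
from math import comb

def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

n, m = 2, 2                        # (3/2)^2 <= 2^2 - 1 < (3/2)^3
U1 = [f"a{i}" for i in range(n)]   # U'
V = [f"v{j}" for j in range(m)]
U2 = [f"b{j}" for j in range(m)]   # U''
M = [(U2[j], V[j]) for j in range(m)]
E = [(u, v) for u in U1 for v in V] + M

def consistent(I, A):
    """chi(I, A): every v in V has an even number of A-edges into I."""
    return all(sum(1 for (u, w) in A if w == v and u in I) % 2 == 0 for v in V)

# pi_Omega(A) is proportional to the number of I consistent with A.
weight_counts = [0] * (m + 1)
for A in powerset(E):
    w = sum(1 for e in A if e in M)
    weight_counts[w] += sum(consistent(set(I), A) for I in powerset(U1 + U2))
total = sum(weight_counts)
exact = [Fraction(c, total) for c in weight_counts]

alpha = Fraction(3 ** m, 3 ** m + (2 ** n - 1) * 2 ** m)

def binom(p, k):
    return comb(m, k) * p ** k * (1 - p) ** (m - k)

mixture = [alpha * binom(Fraction(1, 3), k) + (1 - alpha) * binom(Fraction(1, 2), k)
           for k in range(m + 1)]
```

On this instance `exact` and `mixture` agree term by term. For large $m$ the two binomial components separate, and the barrier set between them carries exponentially little mass.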
It is also natural to consider a "Swendsen-Wang-style" Markov chain for sampling from $(\Sigma, \pi_\Sigma)$. Let $I \in \Sigma$ be the current state. Choose $A$ u.a.r. from the set $\{A \in \Omega \mid \chi(I, A)\}$. Then choose $I'$ u.a.r. from the set $\{I' \in \Sigma \mid \chi(I', A)\}$. The new state is $I'$. We can think of this process as a Markov chain on state space $\Sigma \cup \Omega$ with stationary distribution $\frac{1}{2}\pi_\Sigma$ on $\Sigma$ and $\frac{1}{2}\pi_\Omega$ on $\Omega$. (Assume a continuous time process to avoid the obvious periodicity.) It follows from the earlier analysis that the cut $(\Sigma_0 \cup \Omega_0, \Sigma_1 \cup \Omega_1)$ witnesses exponentially small conductance. To see this, we calculate the probability in stationarity of observing a transition from $\Sigma_0 \cup \Omega_0$ to $\Sigma_1 \cup \Omega_1$. There are two possibilities: a transition from $\Sigma_0$ to $\Omega_1$, or one from $\Omega_0$ to $\Sigma_1$. The probability of the former, we have seen, is $\frac{1}{2}\pi_\Sigma(\Sigma_0)$ times a quantity that is exponentially small in $n$. The latter is, by time reversibility, the same as observing, in stationarity, a transition from $\Sigma_1$ to $\Omega_0$. This probability is again exponentially small in $n$. Hence the conductance is exponentially small, so the mixing time of the Swendsen-Wang-style Markov chain is exponential in $n$.
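One step of this two-stage process can be sketched as follows. This is an illustrative implementation only: the edge set is sampled by flipping free coins and then fixing the parity at each vertex of V, while the vertex set is sampled by brute-force enumeration, which is feasible only for small U. The graph and all names are hypothetical, and edges are pairs (u, v) with u in U and v in V.

```python
import random
from itertools import chain, combinations

def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def consistent(I, A, V):
    """chi(I, A): every v in V has an even number of A-edges into I."""
    return all(sum(1 for (u, w) in A if w == v and u in I) % 2 == 0 for v in V)

def sample_edges(I, V, E, rng):
    """Uniform A with chi(I, A): free coins, then fix the parity at each v."""
    A = set()
    for v in V:
        into_I = [e for e in E if e[1] == v and e[0] in I]
        others = [e for e in E if e[1] == v and e[0] not in I]
        A.update(e for e in others if rng.random() < 0.5)
        chosen = [e for e in into_I[:-1] if rng.random() < 0.5]
        if into_I and len(chosen) % 2 == 1:
            chosen.append(into_I[-1])   # last edge restores even parity
        A.update(chosen)
    return A

def sample_vertices(A, U, V, rng):
    """Uniform I' with chi(I', A), by brute-force enumeration (small U only)."""
    return set(rng.choice([I for I in powerset(U) if consistent(set(I), A, V)]))

def sw_style_step(I, U, V, E, rng):
    A = sample_edges(I, V, E, rng)
    return sample_vertices(A, U, V, rng), A

# Hypothetical small bipartite graph and a short run.
U = ["u1", "u2", "u3"]
V = ["v1", "v2"]
E = [("u1", "v1"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
rng = random.Random(1)
I, history = set(), []
for _ in range(20):
    I_new, A = sw_style_step(I, U, V, E, rng)
    history.append((I, A, I_new))
    I = I_new
```

In place of the enumeration in `sample_vertices`, one could solve the parity constraints as a linear system over $\mathbb{F}_2$ and sample uniformly from its solution space.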
We can also look a little closer, to see what is going on in more detail. Sample a state $I$ at random from $(\Sigma, \pi_\Sigma)$, conditioned on the event $I \in \Sigma_0$, and apply a "half-step" of the SW-like process to obtain a state $A \in \Omega$. We know that $A \cap M$ is described by a Bernoulli process with success probability $\frac{1}{3}$. Moreover, it is easy to see the remaining edges of $A$ are Bernoulli with success probability $\frac{1}{2}$. Now consider the transition from $A$ to $I'$. As in [4], view the set $I' \cap U'$ as an $n$-vector $u'$ over $\mathbb{F}_2$. Each of the vertices in $V$ that is not incident to an edge of $A \cap M$ generates a linear equation, with constant term zero, constraining $u'$. These $\frac{2}{3}m \approx 1.1397n$ random linear equations constrain just $n$ variables; so with high probability the only solution is to set all $n$ variables to 0. (Equivalently, a random $n \times \frac{2}{3}m$ matrix over $\mathbb{F}_2$ has rank $n$ with high probability.) In other words, $I' \cap U' = \emptyset$, except with exponentially small probability, and we find ourselves back in $\Sigma_0$ again.
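The rank claim can be quantified with the standard product formula for the probability that a uniformly random matrix over $\mathbb{F}_2$ has full column rank. The sketch below makes two simplifying assumptions that are not in the text: the number of equations is fixed at $\lfloor \frac{2}{3}m \rfloor$ rather than random, and the coefficients are taken to be independent fair bits. The values of $n$ and $m$ and the function name are illustrative.

```python
from fractions import Fraction

def prob_full_column_rank(k, n):
    """Exact probability that a uniform random k x n matrix over F_2 has rank n."""
    if k < n:
        return Fraction(0)
    prob = Fraction(1)
    for i in range(n):
        # Column i must avoid the span of the previous i columns,
        # which contains 2^i of the 2^k possible vectors.
        prob *= 1 - Fraction(2 ** i, 2 ** k)
    return prob

n = 300
m = 512             # satisfies (3/2)^m <= 2^n - 1 < (3/2)^(m+1)
k = (2 * m) // 3    # assumed fixed number of equations, about 2m/3
p_full = prob_full_column_rank(k, n)
failure = 1 - p_full
```

By a union bound the failure probability is below $2^{n-k}$, which here is $2^{-41}$, and it decays exponentially as $n$ grows because $k - n \approx 0.14n$.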