Mixing under monotone censoring

We initiate the study of mixing times of Markov chains under monotone censoring. Suppose we have some Markov chain $M$ on a state space $\Omega$ with stationary distribution $\pi$ and a monotone set $A \subset \Omega$. We consider the chain $M'$ which is the same as the chain $M$ started at some $x \in A$ except that moves of $M$ of the form $x \to y$ where $x \in A$ and $y \notin A$ are {\em censored} and replaced by the move $x \to x$. If $M$ is ergodic and $A$ is connected, the new chain converges to $\pi$ conditional on $A$. In this paper we are interested in the mixing time of the chain $M'$ in terms of properties of $M$ and $A$. Our results are based on new connections with the field of property testing. A number of open problems are presented.


A motivating example
Consider critical percolation on a square in the hexagonal lattice. Formally this is given by the probability space $\{0,1\}^{H_n}$ with the uniform distribution, where we denote by $H_n$ the sites in the hexagonal lattice in the square. It is trivial to sample a configuration from this model by sampling each hexagon independently. Let $A$ be the event of a left to right crossing (by 1's). It is well known, by duality, that $P[A] = 0.5$. Suppose we want to sample a configuration of $A$. One natural way to do so is by rejection sampling: sample a random configuration and accept it if and only if it is in $A$. A different natural way to sample is to start with a particular left to right crossing configuration and then repeatedly re-sample a single site, keeping the update only if the resulting configuration is in $A$. It is not hard to see that the second procedure will also converge to the uniform distribution on $A$. However, how long would it take to converge?
We will study a more general question. Consider the partial order on $\{0,1\}^n$ where $x \preceq y$ if and only if $x_i \le y_i$ for all $i \in [1,n]$. We say a set $A \subset \{0,1\}^n$ is monotone if $x \in A$ and $x \preceq y$ imply that $y \in A$. For a monotone set $A$ and $x_0 \in A$, let $M_A^{x_0}$ denote the following Markov chain started at $x_0$.
• Given the current state $x$, pick a coordinate $i$ uniformly at random and re-randomize $x_i$ to obtain $y$.
• Let the next state of the chain be y if y ∈ A. Otherwise let it be x.
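The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the paper: the function names, the membership oracle `in_A`, and the example set `at_least_two_ones` are hypothetical choices made here.

```python
import random

def censored_step(x, in_A, rng):
    """One step of the censored chain: re-randomize a uniformly chosen
    coordinate and keep the proposal only if it stays inside A."""
    i = rng.randrange(len(x))          # coordinate chosen uniformly at random
    y = list(x)
    y[i] = rng.randint(0, 1)           # re-randomize x_i (may leave it unchanged)
    y = tuple(y)
    return y if in_A(y) else x         # censor moves that leave A

def run_chain(x0, in_A, steps, seed=0):
    """Run the chain for a fixed number of steps from x0 (which must lie in A)."""
    rng = random.Random(seed)
    x = tuple(x0)
    assert in_A(x), "the chain must start inside A"
    for _ in range(steps):
        x = censored_step(x, in_A, rng)
    return x

# Hypothetical example: the monotone set A = {x : x has at least two 1's} in {0,1}^4.
def at_least_two_ones(x):
    return sum(x) >= 2

final_state = run_chain((1, 1, 1, 1), at_least_two_ones, steps=1000)
```

By construction the chain never leaves A, so `final_state` is always a member of A. How close its law is to uniform on A after a given number of steps is exactly the mixing-time question studied here.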
It is trivial to verify that the chain converges to the uniform distribution on $A$ (it is clear that the chain is irreducible by monotonicity of $A$). We aim to analyze the mixing time (see, e.g., [1,10] for definition) for the chain $M_A^{x_0}$. To this end, we will use a standard geometric bound on the mixing time given by the conductance of the underlying graph for a Markov chain. Given a graph $G = G(V, E)$, the conductance $\phi(G)$ is defined to be
$$\phi(G) = \min_{S \subset V:\ 0 < \mathrm{vol}(S) \le \mathrm{vol}(V)/2} \frac{|\partial_E(S)|}{\mathrm{vol}(S)},$$
where $\mathrm{vol}(S)$ is the sum of degrees over vertices in $S$ and $\partial_E(S) = \{(x, y) \in E : x \in S, y \notin S\}$ denotes the edge boundary set of $S$. In light of this, we will view $A$ as the underlying graph for the Markov chain $M_A$. Alternatively, $A$ can be seen as the induced subgraph of the hypercube $\{0,1\}^n$ with a suitable number of self-loops added to each vertex so that the degree is $n$ for every $x \in A$.
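For very small $n$ the conductance of $A$ can be computed by brute force, which may help in building intuition. The sketch below assumes the standard definition (minimum of edge boundary over volume, taken over nonempty sets with at most half the total volume) and the self-loop convention just described, so that every vertex of $A$ has degree $n$. The function name `conductance` and the example set are hypothetical, and the running time is exponential in $|A|$.

```python
from itertools import combinations, product

def conductance(n, in_A):
    """Brute-force conductance of the graph on A, a subset of {0,1}^n, in which
    every vertex has degree n (hypercube edges inside A plus self-loops).
    Returns None if no nonempty S with vol(S) <= vol(A)/2 exists."""
    A = [x for x in product((0, 1), repeat=n) if in_A(x)]
    A_set = set(A)
    total_vol = n * len(A)
    best = None
    for r in range(1, len(A)):
        vol = n * r                          # every vertex has degree n
        if 2 * vol > total_vol:              # only sets with at most half the volume
            break
        for S in combinations(A, r):
            S_set = set(S)
            boundary = 0
            for x in S:
                for i in range(n):
                    y = x[:i] + (1 - x[i],) + x[i + 1:]   # flip coordinate i
                    if y in A_set and y not in S_set:     # edge leaving S inside A
                        boundary += 1
            ratio = boundary / vol
            if best is None or ratio < best:
                best = ratio
    return best

# Hypothetical example: A = {x in {0,1}^3 : x is not the all-zero vector}.
phi = conductance(3, lambda x: sum(x) >= 1)
```

Self-loops contribute to the degree but never to the edge boundary, which is why the volume of a set of size $r$ is simply $nr$ in this sketch.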
In what follows, we denote by $P$ the uniform probability measure on $\{0,1\}^n$.
Combined with standard results in the theory of Markov chains [8,9] (see also [10,Theorem 13.14]), Theorem 1.1 yields the following corollary on the mixing time of $M_A$.
Note that this implies that the mixing time is polynomial in $n$ as long as $A$ is large (of measure at least inverse polynomial in $n$). In particular, the chain in our motivating example, which samples a critical percolation configuration with a left to right crossing, has mixing time at most $O(n^3)$. Our result is tight up to polynomial factors in $n$ as the following example shows: Similarly, starting from the point $(1, \ldots, 1, 0, \ldots, 0)$ it is easy to see that the mixing time is lower bounded by the time to hit $(1, \ldots, 1)$ with probability at least $1/4$, which is lower bounded by $2^{m-4}$.
Our proof uses a new ingredient in the context of mixing of Markov chains, namely a result from the theory of property testing. Property testing, explicitly defined in [12], plays a central role in probabilistically checkable proofs. It has since been extended and extensively studied in its own right for checking properties such as graph properties, with fascinating connections to many areas of combinatorics, including in particular regularity lemmas. It turns out that the natural algorithm, which samples a number of random neighboring pairs and rejects monotonicity if a violating pair is seen, works well for monotonicity testing [6]. The key to the success of this natural testing algorithm, which is also the key to our proof of Theorem 1.1, is the following structural theorem on approximately monotone sets. where $\oplus$ denotes the symmetric difference of two sets. Then we have $\delta(S) \ge \varepsilon(S)/n$.
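The natural tester described above can be sketched as follows. This is an illustrative version only: the function name `edge_tester`, the membership oracle `in_S`, and the number of sampled pairs are assumptions made here, and no attempt is made to match the sample complexity analyzed in [6].

```python
import random

def edge_tester(n, in_S, num_pairs, seed=0):
    """Sample random neighboring pairs (lo, hi) of the hypercube, where hi is
    lo with one coordinate raised from 0 to 1. Return False (reject) as soon
    as a pair with lo in S and hi not in S is seen, and True otherwise."""
    rng = random.Random(seed)
    for _ in range(num_pairs):
        x = [rng.randint(0, 1) for _ in range(n)]   # uniform point of {0,1}^n
        i = rng.randrange(n)                        # uniform coordinate
        lo = tuple(x[:i] + [0] + x[i + 1:])
        hi = tuple(x[:i] + [1] + x[i + 1:])
        if in_S(lo) and not in_S(hi):               # violating pair found
            return False
    return True

# A monotone set is never rejected; a set with many violating pairs is
# rejected with high probability once enough pairs are sampled.
accepts_monotone = edge_tester(5, lambda x: sum(x) >= 2, num_pairs=200)
accepts_non_monotone = edge_tester(5, lambda x: x[0] == 0, num_pairs=200)
```

The tester has one-sided error: it can only reject when it has found an explicit violation. The inequality $\delta(S) \ge \varepsilon(S)/n$ is what guarantees that sets far from monotone have enough violating pairs to be caught.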
Combined with (4), this yields a conclusion contradicting the fact that $B \cap C = \emptyset$. Thus, we have completed the verification of (3). Without loss of generality we assume now that $P(B \oplus F) \ge P(A)P(B)/16$ for all $F \in \Omega$ (if the same holds for $C$, we just apply the following analysis to $C$ in the same manner, with the observation that $\partial_E B = \partial_E C$). By Theorem 1.4, we get that $|\Psi(B)| \ge 2^n P(A)P(B)/16$. For $(x, y) \in \Psi(B)$, we have $x \in B$ and $x \preceq y$, and thus $y \in A$ since $A$ is a monotone set. Therefore, we get $(x, y) \in \partial_E B$, yielding that $\Psi(B) \subseteq \partial_E B$. This implies that $|\partial_E B| \ge 2^n P(A)P(B)/16$. Combined with the fact that $\mathrm{vol}(B) = n 2^n P(B)$, it completes the proof of (2) and thus the proof of the theorem.
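For readers who want the last step spelled out, the display below divides the edge-boundary bound by the volume. It assumes that (2) is the conductance lower bound for $B$, which is how the surrounding argument reads; the statement of (2) itself is not reproduced here.

```latex
\[
  \frac{|\partial_E B|}{\mathrm{vol}(B)}
  \;\ge\; \frac{2^n P(A) P(B) / 16}{n\, 2^n P(B)}
  \;=\; \frac{P(A)}{16\, n}.
\]
```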

Discussions and open problems
It seems plausible that the bound on the mixing time obtained in Corollary 1.2 is not sharp. A case of particular interest is when $P(A) \ge 1/2$. Indeed, we ask the following open question. Question 1.1. Suppose that there exists a constant $c > 0$ such that a monotone subset $A \subset \{0,1\}^n$ has measure $P(A) \ge c$. Is it true that $\tau_{\mathrm{mix}}(M_A) \le C n \log n$, where $C > 0$ is a constant depending only on $c$?
In a different direction our results suggest testing non-product measures. For example, suppose we wish to reproduce Theorem 1.1 for the Ising model on some graph $G$, where we denote by $\mu$ the stationary measure. For this to work we will need an analogue of the testing result. In this setup it is natural to define, for a set $S \subset \{0,1\}^n$ (identifying 0 with $-$ and 1 with $+$), the analogues of $\varepsilon(S)$ and $\delta(S)$ with respect to $\mu$. We then ask whether an analogue of the testing result holds. The following example suggests that some assumptions are needed. Consider the Curie-Weiss model (Ising model on the complete graph) at low temperature (so the stationary measure admits double wells, see [2,3]) with $n$ sites. For convenience, suppose that $n$ is even. Let $A = \{x : \sum_{i=1}^n x_i \le n/2\}$. We claim that $\varepsilon(A) \ge 1/6$. In order to see this, let $A_k = \{x : \sum_i x_i = n - k\}$ for $k \ge n/2$. For $x \in A_k$ and $y \in A^c$, define $a(x, y) = \mathbf{1}_{y \in A'_k,\, y \succeq x}\, \mu(y) / |\{y : y \in A'_k,\, y \succeq x\}|$ and so $a(x, y) = \mathbf{1}_{y \in A'_k,\, y \succeq x}\, \mu(x) / |\{y : y \in A'_k,\, y \succeq x\}|$.
Thus, $\sum_{y \in A^c} a(x, y) = \mu(x)$ for all $x \in A$. In addition, by symmetry for every $y \in A^c$ we have $\sum_{x \preceq y,\, x \in A} a(x, y) = \mu(y)$ (so $a(\cdot, \cdot)$ is a mass transportation from $A$ to $A^c$ with respect to measure $\mu$). Therefore, for any monotone set $B$ we have which is exponentially small in $n$ at low temperature [2,3]. It would be interesting to further study testing other properties for various non-product distributions.
Finally, we note that the influence of censoring on the mixing time was studied in [11], where it was shown that, for Glauber dynamics on monotone spin systems, censoring some updates can only delay mixing (the censoring is prescribed without information on what the proposed update is). In [7], an example was given to demonstrate that censoring can indeed speed up the mixing for proper coloring. This question was then studied in [4] in much more general settings, which introduced a certain partial order on the class of stochastically monotone Markov kernels and proved that the monotonicity of Markov chains implies monotonicity of mixing times. These results differ from ours in at least the following two respects: (1) They focus on Markov chains with the same stationary measure, while our censoring even changes the state space of the Markov chain; (2) They aim at qualitative results which ensure monotonicity for mixing times of the Markov chains under consideration, while ours aims to give a quantitative bound on the mixing time for the censored Markov chain.