Attracting Random Walks

This paper introduces the Attracting Random Walks model, which describes the dynamics of a system of particles on a graph with certain attraction properties. In the model, particles move between adjacent vertices of a graph $\mathcal{G}$, with transition probabilities that depend positively on particle counts at neighboring vertices. From an applied standpoint, the model captures the rich get richer phenomenon. We show that the Markov chain underlying the dynamics exhibits a phase transition in mixing time, as the parameter governing the attraction is varied. Namely, mixing is fast in the high-temperature regime, and slow in the low-temperature regime. When $\mathcal{G}$ is the complete graph, the model is a projection of the Potts model, whose phase transition is known. On the other hand, when the graph is incomplete, the model is non-reversible, and the stationary distribution is unknown. We demonstrate the existence of phase transition in mixing time for general graphs.


Introduction
In this paper, we introduce the Attracting Random Walks (ARW) model. The motivation of the model is to understand the formation of wealth disparities in an economic network. Consider a network of economic agents, each with a certain number of coins representing their wealth. At each time step, one coin is selected uniformly at random, and moves to a neighbor of its owner with a probability that depends on how wealthy the neighbors are. Those who are well-connected and initially wealthy will tend to accumulate more wealth. We refer to particles instead of coins in what follows.
This is a flexible model based on a few principles: There are a fixed number of particles moving around on a graph. Movements are asynchronous, and particles make choices about where to move based on their local environment. The model can encompass a variety of situations. Further, the model can be extended by allowing for multiple particle types, with intra-and inter-group attraction parameters, though we do not consider this extension in this paper. There are many more applications beyond the economic application. As an interacting particle system, it could be relevant for physics or chemistry applications. This paper analyzes the Attracting Random Walks model and establishes phase transition properties. The difficulty in bounding mixing times, particularly in finding lower bounds, is due to the fact that the stationary distribution cannot be simply formulated. Additionally, the model is not reversible unless the graph is complete (Theorem 2.3), meaning that many techniques do not apply.
We establish the existence of phase transition in mixing time as the attraction parameter, β, is varied. Slow mixing for β large enough is established by relating the mixing time to a suitable hitting time. Fast mixing for β small enough is proven by a path coupling approach that relates the Attracting Random Walks chain to the simple random walk on the same graph (i.e. with β = 0). An alternative prove of fast mixing is to use a variable-length path coupling, as introduced in [4]. The alternative prove is omitted. We emphasize that even though the stationary distribution is not known analytically for general graphs, we have shown that it undergoes phase transition by arguing through mixing times.
The rest of the paper is structured as follows. We describe the dynamics of the model in Section 2, along with some possible applications. The remainder of the paper is focused on properties of the Markov chain governing the dynamics. In Section 2.1 we discuss a link to the Potts model. Section 3 proves the existence of phase transition in mixing time for general graphs, and is the main theoretical contribution of this work. In Section 4, we collect partial results on the version of the model in which particles repel each other instead of attracting, a model we call "Repelling Random Walks."

The Model
The model is a discrete time process on a simple graph G = (V, E), where V is the set of vertices and E is the set of undirected edges. We assume throughout that G is connected. We write i ∼ j if (i, j) ∈ E. Let k = |V|. Initially, n particles are placed on the vertices of G in some configuration. Let x(i) be the number of particles at vertex i. The particle configuration is updated in two stages, according to a fixed parameter β: 1. Choose a particle uniformly at random. Let i be the location of that particle. 2. Move the particle to a vertex j ∼ i, j = i, with probability which is proportional to e β n x(j) . Keep the particle at vertex i with probability proportional to e β n (x(i)−1) . Let P be the transition probability matrix of the resulting Markov chain. Let e i denote the ith standard basis vector in R k . Then for two configurations x and y such that y = x − e i + e j for i ∼ j or i = j, we have if i = j .
The probabilities are a function of the numbers of particles at each vertex, excluding the particle that is to move. This modeling choice means that the moving particle is neutral toward itself, and relates the ARW model to the Potts model, as will be explained below.
When β is positive (ferromagnetic dynamics), the particle is more likely to travel to a vertex that has more particles. Greater β encourages stronger aggregation of the particles. On the other hand, taking β < 0 (antiferromagnetic dynamics) encourages particles to spread. Note that β = 0 corresponds to the case of independent random walks.
For an application with β < 0, consider an ensemble of identical gas particles in a container. We can discretize the container into blocks. Each block becomes a vertex in our graph. Vertices are connected by an edge whenever the corresponding blocks share a face. Since gas particles primarily repel each other, it makes sense to consider β < 0 in this scenario. Taking β 0 discourages particles from occupying the same block.
To get an idea of the effect of β, Figure 1 displays some instances of the Attracting Random Walks model run for 10 6 steps for different values of β. The graph is the 8 × 8 grid graph, with n = 320, for an average of 5 particles per vertex. We now state our main results regarding the phase transition in mixing time. We let P − Q TV denote the total variation distance between two discrete probability measures P and Q, and let d(X, t) max x∈X P t (x, ·) − π TV be the worst-case (with respect to the initial state) total variation distance for a chain {X t } with stationary distribution π. Let t mix (X, ) min {t : d(X, t) ≤ } denote the mixing time of a chain {X t }.
Theorem 2.1. For any graph G, there exists β 0 > 0 such that if β > β 0 , the mixing time of the ARW model is exponential in n.
Theorem 2.2. For any graph G, there exists a β 0 > 0 such that if 0 ≤ β < β 0 , the mixing time of the ARW model is O(n log n).

Connection to the Potts Model
In the case where G is the complete graph, the Attracting Random Walks model is a projection of Glauber dynamics of the Curie-Weiss Potts model. The Potts model is a multicolor generalization of the Ising model, and the Curie-Weiss version considers a complete graph. In the Curie-Weiss Potts model, the vertices of a complete graph are assigned a color from [q] = {1, . . . , q}. Setting q = 2 corresponds to the Ising model.
Let s(i) be the color of vertex i for each 1 ≤ i ≤ n. Define The stationary distribution of the Potts model, with no external field, is The Glauber dynamics for the Curie-Weiss Potts model are as follows: 1. Choose a vertex i uniformly at random. 2. Update the color of vertex i to color k ∈ [q] with probability proportional to e β n j =i δ(k,s(j)) .
Observe that the sum j =i δ (k, s(j)) is equal to the number of vertices, apart from vertex i, that have color k. Therefore if each vertex in the Potts model corresponds to a particle in the ARW model, and each color in the Potts model corresponds to a vertex in the ARW model, then the ARW model is a projection of the Glauber dynamics for the Potts model. The correspondence is illustrated in Figure 2. Under the correspondence, the ARW chain is exactly the "vector of proportions" chain in the Potts model. Let v(i) be the vertex location of the ith particle in the ARW model, for 1 ≤ i ≤ n. By the correspondence, we show that the stationary distribution of the ARW model is Observe that the e β 2n i x(i) 2 factor encourages particle aggregation, while the multinomial encourages particle spread.
The reader is encouraged to refer to [3] for a detailed study of the mixing time of the Curie-Weiss Potts model, for different values of β. For instance, [3] show that there exists a β s (q) such that if β < β s (q), the mixing time is Θ(n log n), and if β > β s (q), the mixing time is exponential in n. In the ARW context, these results hold with q replaced by k.
On the other hand, when G is not the complete graph, the correspondence to the Potts model is lost. In fact, the following can be shown: Theorem 2.3. For n ≥ 3, the ARW Markov chain is reversible for all β if and only if the graph G is complete.
The non-reversibility can be shown by applying Kolmogorov's cycle criterion, demonstrating a cycle of states (configurations) that violates the criterion.
Proof of Theorem 2.3. First, if the graph is complete, then the chain is a projection of Glauber dynamics, which is automatically reversible. Now suppose G is not complete. The proof of nonreversibility relies on Kolmogorov's criterion, a necessary and sufficient condition for reversibility. Lemma 2.1 (Kolmogorov's criterion). A finite state space Markov chain associated with the transition probability matrix P is reversible if and only if for all cyclic sequences of states i 1 , i 2 , . . . , i l−1 , i l , i 1 it holds that In other words, the forward product of transition probabilities must equal the reverse product, for all cycles of states.
In the ARW model, a state is a particle configuration. A cycle of states is then a sequence of particle configurations such that 1. Subsequent configurations differ by the movement of a single particle.
2. The first and last configurations are the same.
If G is not a complete graph, then it is straightforward to show that there exist three vertices u ∼ v ∼ w such that u w. Now we demonstrate a cycle of states that breaks Kolmogorov's criterion. We have the following situation.
The values d u , d v , and d w indicate the degrees of the vertices, excluding the named vertices. Place n − 2 particles at u and 2 particles at v. The particle movements are as follows: For clarity, let f (z) = e βz . The forward transition probabilities are: The reverse transition probabilities are: Canceling factors that appear in both products, we are left comparing (f (n − 1) .
Observe that f (z 1 )f (z 2 ) = f (z 1 + z 2 ). Taking leading terms, the first product is therefore a degree-(2n − 2) polynomial in e β . Since n − 2 ≥ 1, the second is a degree-(2n − 4) polynomial in e β . These polynomials have a finite number of solutions for e β , and therefore β itself. Therefore the Markov chain is not reversible.

Mixing Time on General Graphs
In this section, we show the existence of phase transition in mixing time in the ARW model when β is varied, for a general fixed graph. First, we show exponentially slow mixing for β suitably large, namely prove Theorem 2.1 by relating mixing times to hitting times. Next, we show polynomial time mixing for small values of β. The proof is by an adaptation of path coupling. For a reference to standard definitions around Markov chains, please see [5].

Slow Mixing
Proof of Theorem 2.1. The idea is to show that with substantial probability, the chain takes an exponential time to access a constant portion of the state space. First we state and prove a helper lemma.
k . In other words, the states where v has the greatest number of particles contribute at least 1 k to the stationary probability mass. Proof. By the Union Bound, By Lemma 3.1, there exists a vertex v such that π(S v ) ≥ 1 k . Choose any other vertex u. Whenever x(u) > 1 2 n, we can be sure that v is not the maximizing vertex, and therefore that at least 1 k of the stationary probability mass has not been accessed. Therefore until that hitting time, the total variation distance of the chain to its stationary distribution is at least 1 k . Let T x inf{t : X t (u) ≤ 1 2 n|X 0 = x}. If the probability that {X t } has reached the set {x ∈ Ω : x(0) ≤ 1 2 n} by time t is less than some p, then the total variation distance at time t is at least (1 − p) 1 k . Therefore we get the following relationship between the mixing time and hitting time: The problem now reduces to lower bounding this hitting time. The idea is that when particles leave vertex u, there is a strong drift back to u. However, controlling the hitting times of a multidimensional Markov chain is challenging, and direct comparison is difficult to establish. We instead reason by comparison to another Markov chain, Z, which lower bounds the particle occupancy at vertex u.
Let l(w) be the length of the shortest path connecting vertex u to vertex w. LetX t be a projection of the X t chain defined byX t (d) = w:l(w)=d X t (w), and letΩ be its state space. In other words, the dth coordinate of the projected chain counts the number of particles that are a distance d away from vertex u. Note thatX t (0) = X t (u). We let F denote this projection, writing,X = F (X). For any 0 < δ < 1 2 , define For some δ > 0 to be determined, let S = {x ∈Ω : x(0) > (1 − δ)n} and let S c =Ω \ S. We now build a chain Z onΩ coupled toX such that as long as The remainder the proof of slow mixing is as follows.
1. Construct a lower-bounding comparison chain Z satisfying Z t (0) and use a concentration bound to show that Z(0) ∼ π Z (0) places exponentially little mass on the set S c . 3. Comparing the chain X to Z, show that X takes exponential time to achieve X(u) ≤ (1 − δ)n. The result is complete by 1 − δ > 1 2 . We now define the lower-bounding comparison chain Z, which is a chain on n independent particles. These particles move on the discrete line with points {0, 1, . . . , D}, where D = diam(G). Since the comparison needs to hold only whenX t (0) ≥ (1 − δ)n, we assume thatX t (0) ≥ (1 − δ)n. The idea is to identify a uniform constant lower bound on the probability of a particle moving closer to u under this assumption, which tells us that once the particle is at u, there is a high probability of remaining there.
In the X chain, when a particle is at a vertex w / ∈ {u} ∪ N (u), its probability of moving to any one of its neighbors is at least where ∆ is the maximum degree of the graph. This is because the lowest probability when β is large corresponds to placing all δn movable particles at some other neighbor of w. When a particle is at a vertex w ∈ N (u), it moves to u with probability at least q e β(1−δ) e β(1−δ) + e βδ + ∆ − 1 .
When a particle is at a vertex u, it stays there with probability at least which is only slightly smaller than q. For the purpose of clean analysis, we say that when a particle is at vertex u, it stays there with probability at least u. The transitions of the Z chain are chosen in order to maintain comparison. At each time step, a particle is selected uniformly at random. When the chosen particle located at d / ∈ {0, 1}, the particle moves to d − 1 with probability p and moves to min{d + 1, D} with probability (1 − p). When the chosen particle is located at d ∈ {0, 1}, it moves to 0 with probability q, and moves to d + 1 with probability 1 − q. The transition probabilities for single particle movements are depicted in Figure 3. Lemma 3.2 establishes the comparison.
. . , D} and t < T x (δ). Since Z 0 =X 0 , we can pair up the particles at time t = 0 and design a synchronous coupling, i.e. when a certain particle is chosen in theX process, its copy is chosen in the Z chain. We design the coupling so that for each particle, theX-copy is at least as close to 0 as the Z-copy, for all t < T x (δ). Note that this implies ( ) for all d ∈ {0, 1, . . . , D} and t < T x (δ). The uniformity of p and q over all configurations in S ensures that the coupling will maintain the requirement ( ), which is established by induction on t.
The base case (t = 0) holds since Z 0 =X 0 . Suppose that at time t < T x (δ), each particle in theX chain is at least as close to 0 as its copy in the Z chain. We will show that the same property holds for time t + 1. First consider a particle located at 0 in the Z chain. By the inductive hypothesis, its copy must be located at 0 in theX chain also, and the corresponding particle in the X chain must be at u. The probability of the particle staying at 0 in the Z chain is smaller than the probability of the corresponding particle staying at u in the X chain, since q is a uniform lower bound on the probability of staying at u. Therefore in this case, the property is maintained.
Next consider a particle located at vertex d = 0 in the Z chain. The uniformity of q (if d = 1) or p (if d > 1) means that the probability of moving in the direction of 0 in the Z chain is smaller than the probability of the corresponding particle in the X chain moving closer to u. We conclude that the coupling can be extended to time t + 1. Now that we have established the lower-bounding property of Z, we compute its expected particle occupancy at 0 (corresponding to vertex u) at stationarity, and the concentration of that occupancy. Let π Z denote the stationary distribution of the Z chain, and let λ(w) be the probability according to π Z of a particular particle being located at vertex w. Then The following lemma bounds the stationary probability away from below and establishes concentration of the stationary measure.
Proof. To compute the stationary probabilities λ(r), r ∈ {0, 1, . . . , D}, note that we can disregard the initial uniform particle choice, and simply consider a Markov chain on a graph with (D + 1) nodes as in Figure 3. By solving the equations for the stationary distribution, it can be shown that Substitute p and q into (3.1): First, the limit of the numerator as β → ∞ is equal to 1. Therefore, for β large enough, the numerator is greater than 1 − δ + 2 < 1. Next, the expression yields an upper bound that is valid for β large enough. Finally, With these replacements, for β large enough, Since lim β→∞ Next we show concentration. Label all the particles, and define U i = 1 if particle i is at vertex 0, and U i = 0 otherwise. Then Z(0) = i U i , and U i is independent of U j for all i = j. Applying Hoeffding's inequality, for c > 0. Let c = . Then the above implies Applying Proposition 3.1 with p = 1 2 , The last equality is due to the fact that P(T Additionally define Finally, from Lemma 3.3 we know that π Z (S c ) ≤ 2 exp −2 2 n . Supposing that Z 0 is distributed according to π Z , the hitting time T Z πz is a geometric random variable with success probability at most 2 exp −2 2 n , which means , which proves Theorem 2.1.

Fast Mixing
The proof is by a modification of path coupling, which is a method to find an upper bound on mixing time through contraction of the Wasserstein distance.
The following definition can be found in [5], pp. 189.
Definition 3.1 (Transportation metric). Given a metric ρ on a state space Ω, the associated transportation metric ρ T for two probability distributions µ and ν is defined as where the infimum is over all couplings of µ and ν on Ω × Ω.
Definition 3.2 (Wasserstein distance). Let P be the transition probability matrix of a Markov chain on a state space Ω, and let ρ be a metric on Ω. The Wasserstein distance W P ρ (x, y) of two states x, y ∈ Ω with respect to P and ρ is defined as follows: In other words, the Wasserstein distance is the transportation metric distance between the next state distributions from initial states x and y.
The following lemma is the path coupling result which can be found in [2] and [5]. Given a Markov chain on state space Ω with transition probability matrix P , consider a connected graph H = (Ω, E H ), i.e. the vertices of H are the states in Ω and the edges are E H . Let l be a "length function" for the edges of H, which is an arbitrary function l : E H → [1, ∞). For x, y ∈ Ω, define ρ(x, y) to be the path metric, i.e. ρ(x, y) is the length of the shortest path from x to y in terms of l and H.

Lemma 3.4 (Path Coupling).
Under the above construction, if there exists δ > 0 such that for all x, y that are connected by an edge in H it holds that where diam(Ω) = max x,y∈Ω ρ(x, y) is the diameter of the graph H with respect to ρ.
Our proof of rapid mixing for small enough β relies on rapid mixing of a single random walk. The following lemma demonstrates the existence of a contracting metric for a single random walk. It is possible that such a result appears elsewhere, but we are not aware of a published proof.
Lemma 3.5. Consider a random walk on G which makes a uniform choice among staying or moving to any of the neighbors and denote by Q its transition matrix. Let d(x, y) be the expected meeting time of two independent copies of a random walk on a graph started from states x and y. Then d(x, y) is a metric and Q contracts the respective Wasserstein distance. In particular, Remark 3.1. In fact, the same proof shows a stronger result (i.e. with a smaller value in the place of d max ): we can allow arbitrary Markovian coupling between two copies of the random walk and define d(x, y) to be the meeting time under that coupling.
Proof of Lemma 3.5. First we verify that d(x, y) is a metric. It holds that d(x, y) = d(y, x), and d(x, y) ≥ 0 with equality if and only if x = y. To show the triangle inequality, start three random walks from vertices x, y, z and let τ (x, y) be the meeting time of the walks started from x and y. The three random walks are advanced according to the independent coupling, and if a pair of walks collides, they are advanced identically starting from that time. Under this coupling, observe that and take expectations. Next we show that W Q ρ (x, y) ≤ d(x, y)−1. We can choose any coupling of X 1 ∼ P (x, ·) and Y 1 ∼ P (y, ·) to show an upper bound. Letting These two equations imply Proof of Theorem 2.2. Suppose d(i, j) ≥ 1{i = j} is a metric on G such that a single-particle random walk's kernel Q satisfies Note that the existence of such a metric d(·, ·) was established in Lemma 3.5 with an estimate of δ = 1 dmax . We let H = (Ω, E H ) be a graph on particle configurations, where (x, y) ∈ E H whenever y = x−e i +e j for some pair of distinct vertices i and j in G. In other words, x and y differ by the position of a single particle. Note that i and j need not be neighboring vertices in G. For such a pair of neighboring configurations (x, y), let l(x, y) = d(i, j).
Clearly, l(x, y) ≥ 1{x = y}. Now for any two configurations x, y ∈ Ω, let ρ(x, y) denote the path metric induced by H and l(·, ·). We show that ρ(x, y) = l(x, y) for neighboring configurations. Indeed, let I = {i r : 0 ≤ r ≤ m − 1} and J = {j r : 0 ≤ r ≤ m − 1} be the multisets that collect the "outbound" and "inbound" particle transfers, respectively. The value i must appear one more time in I than in J . Similarly, the value j must appear one more time in J than in I. All other values appear an equal number of times in I and J . By choosing terms d(i r , j r ) in order, beginning with d(i, l 1 ), it is possible to rearrange the sum in the given form. By the triangle inequality for d(·, ·), Therefore, the shortest distance between x and y is along the edge connecting them, and we conclude that ρ(x, y) = l(x, y) for neighboring configurations. Now we wish to bound W P ρ (x, y) for all neighboring particle configurations x and y, related by y = x − e i + e j . We may choose any coupling in order to obtain an upper bound. The coupling will be synchronous: the choice of particle to be moved will be coordinated between the chains. Namely, if the "extra" particle is chosen in configuration x, then so too will the "extra" particle be chosen in configuration y. Similarly, if some other particle is chosen in x, than a particle at the same vertex will be chosen in y. For an illustration, see Figure 4. Let X 1 ∼ P (x, ·) and Y 1 ∼ P (y, ·) denote the coupled random variables corresponding to the next configurations. Let P denote the set of particles, and let p be the "extra" particle. Letp be a random variable that denotes the uniformly selected particle. Since our coupling gives an upper bound, we can write First, suppose the "extra" particle, p , is chosen in both chains. This happens with probability 1 n . Let P x (i, ·) be the probability distribution of the next location of the selected particle, when it is initially located at vertex i in configuration x. Recall that Q(i, ·) is the probability distribution of the next location of a simple random walk on G, initially located at vertex i. Note that when β = 0, it holds that P x (i, ·) = Q(i, ·). When β is small, P x (i, ·) ≈ Q(i, ·). The following lemma quantifies this statement.
x ∈ Ω} parametrized by the configuration x is contained within the convex set Proof. To show this claim, we compute the ratio Px(i,j1) Px(i,j2) when j 1 , j 2 ∈ N (i)∪{i}, and show that it is upper bounded by e β . There are three cases to consider. .
The first inequality is due to the fact that {P x (i, ·) : x ∈ Ω} ⊂ P β and the second is due to the fact that the maximum of a convex function over a closed and bounded convex set is achieved at an extreme point, namely e β d+e β , 1 d+e β , . . . , 1 d+e β . To maximize the right hand side of (3.5), let Setting f (d) = 0 we get the solutions d = ±e Therefore, we can couple the distributions P x (i, ·) and P y (j, ·) to Q(i, ·) and Q(j, ·) with probability at least 1 − e . In that case, we get contraction by a factor of (1 − δ). With the remaining probability, we assume the worstcase distance of d max . Therefore, the conditional Wasserstein distance is upper bounded as follows: Next, suppose some other particle (located at v) is chosen in both chains. This happens with probability n−1 n . Because only the position of one particle is different between the two configurations, P v (x, ·) ≈ P v (y, ·). Lemma 3.7. Recall that ∆ is the maximum degree of the vertices in V. The following holds: The proof of Lemma 3.7 is deferred to the appendix. By Lemma 3.7, we claim Indeed, ρ(X 1 , Y 1 ) = ρ(x, y) if particlep moves to the same vertex in both chains. Otherwise, an additional distance of at most 2d max is incurred.
Finally, we substitute the bounds (3.6) and (3.7) into (3.8), recalling that ρ(x, y) = d(i, j) for y = x − e i + e j : where the last inequality is due to ρ(x, y) ≥ 1 and n−1 n ≤ 1. In order to show contraction, it is sufficient that the expression multiplying 1 n be positive: For an example of a satisfying β, choose β so that Therefore, we can choose .
Substituting β = β 0 into (3.8), we obtain for some δ > 0 Applying the path coupling lemma (Lemma 3.4), we obtain Setting the right hand side to be less than > 0 in order to bound t mix (X, ), Therefore, t mix (X, ) = O(n log n), which completes the proof of Theorem 2.2.
Remark 3.2. Arguably, a more natural approach to show fast mixing would be through a more traditional path coupling approach: Let H have an edge between configurations x and y = x − e i + e j if i and j are adjacent vertices in G. Set l(x, y) = 1 for adjacent configurations. However, this approach does not yield contraction in the Wasserstein distance. The impossibility of contraction can be shown by considering a linear program describing the optimal coupling, and applying linear programming duality. This is done in Section 3.2.1.

Non-contraction in one-step path coupling
We now show that the approach for proving Theorem 2.2 based on the natural one-step path coupling does not yield the required contraction. Proof. Let G be the 4-vertex path graph. Label the vertices 1, 2, 3, 4 in order along the path, and consider x, y related by y = x − e 2 + e 3 so that the two configurations differ by a transfer from one middle vertex to the other. When β = 0, the transition probabilities are simple: given that a particle is chosen at vertex v, it moves to vertex w ∈ N (v) ∪ {v} with probability 1 deg(v)+1 . The optimal coupling of P (x, ·) and P (y, ·) may be expressed as an optimal solution of a linear program, as follows. Write x ∼ x if x is adjacent to x in H or x = x. For each x ∼ x and y ∼ y, let z(x , y ) be a variable representing the probability of the next states being x and y in a coupling. The constraints require the collection of z variables to be a valid coupling, and the objective function calculates the expected distance under the coupling. min x ∼x,y ∼y z(x , y )ρ(x , y ) This linear program is known as a Kantorovich problem. Our goal is to show that the optimal objective value is at least 1. We will first write down the dual problem. By weak duality, any feasible solution to the dual problem gives a lower bound to the optimal value of the primal problem. Next we will construct a primal solution with objective value equal to 1, and apply the complimentary slackness condition to help us construct a dual solution whose objective value is also equal to 1. Finally we will conclude that the optimal solution to the primal problem is equal to 1, by strong duality. For a reference to linear programming duality, see e.g. Chapter 4 of [1]. First we take the dual of the linear program, introducing dual variables u(x ) for x ∼ x and v(y ) for y ∼ y: This linear program is a Kantorovich dual problem. By weak duality, if there exists a dual solution with objective value Z, then the optimal solution of the primal is at least Z. Therefore our goal is to find a dual solution with objective value at least 1.  Other values of z(x , y ) are set to zero. In other words, z describes a synchronous coupling according to the pairing in Figure 4, with particles moving in the same direction always. Now supposing this is an optimal solution, we apply complementary slackness to identify candidate dual optimal solutions. The complementary slackness condition states that if z and (u, v) are optimal primal and dual solutions, then it holds that for all x ∼ x, y ∼ y, If our primal solution z is optimal, then whenever z(x , y ) = 0, we need u(x ) + v(y ) = ρ(x , y ). These additional constraints help us construct the following dual feasible solution: We find that the objective value of this solution is equal to 1. By strong duality, we conclude that the optimal value of the primal problem is equal to 1, and therefore there does not exist a contractive coupling.
Remark 3.3. The argument in the proof of Theorem 3.1 should apply to all graphs G that contain the a four-vertex path graph as a subgraph, and possibly to other graphs as well.

Repelling Random Walks
Throughout our analysis, we have only considered β ≥ 0. However, the case β < 0 ("Repelling Random Walks") is theoretically and practically interesting to study also. Simulations confirm the intuition that the particles behave like independent random walks when β is close to zero, and spread evenly when β is very negative (see Figure 5). We conjecture that there are not any hard-toescape subsets of the state space for all β < 0.  Proof. When β = −∞, the dynamics are simplified. Suppose a particle is chosen at vertex i. Let A be the set of vertices corresponding to the minimal value(s) of {x(i) − 1, x(j) : j ∼ i}. The chosen particle moves to a vertex among those in A, uniformly at random.
Our goal is to show that the set satisfies the following three properties: (1) It is absorbing, meaning that once the chain enters C, it cannot escape C; (2) The chain enters C in polynomial time; (3) Within C, the chain mixes in constant time with respect to n. We claim that the maximum particle occupancy cannot increase, and the minimum particle occupancy cannot decrease. We now show that the maximum particle occupancy, M t max v X t (v) is monotonically non-increasing over time.
Suppose that at time t, a particle at vertex i is selected and moves to vertex j. There are five cases: The maximum does not change. 2. i = j, and both are maximizers. This case is not possible, since x(j) > x(i) − 1. 3. i = j, i is a maximizer, and j is not. The new maximum value is at most M t , in the case that X t (j) = X t (i) − 1. 4. i = j, i is not a maximizer, and j is. This case is not possible, since x(j) > x(i) − 1. 5. i = j, i and j are not maximizers. The new maximum value is at most M t , in the case that X t (j) = X t (i) − 1.
Therefore M t+1 ≤ M t . A similar argument shows that the minimum particle occupancy is monotonically non-decreasing over time. Together, they imply Property (1). Next, we show Property (2). Assume X t / ∈ C. Let M t be the set of maximizing vertices at time t. We claim there exists at least one vertex u ∈ M t such that there exists a path of distinct vertices . In other words, there is a walkable path from u = i 1 to i p . The maximum length of the path is k − 1. The probability that a particle is transferred along this path before any other events happen is therefore lower bounded by . Therefore the probability that such a transfer happens within T 1 trials is at least p . If there had been at least two maximizing vertices to start, the number of maximizing vertices would have fallen by 1. If there had been only one maximizing vertex to start, the maximum value itself would have fallen by 1. We see that there are two types of "good" events: reducing the number of maximizing vertices while the maximum value stays the same, or reducing the maximum value. We claim that the number of "good" events that happen before the chain enters the set C is upper bounded by n 2 . Indeed, imagine that the particles at each vertex are stacked vertically. A particle movement from vertex i to vertex j is interpreted as a particle moving from the top of the stack at vertex i to the top of the stack at vertex j. Observe that the height of a particle cannot increase. Further, each particle's height can fall by at most n − 1 units over time, and can therefore drop at most n − 1 times. Since all good events require a particle's height to drop, the number of good events is at most n(n − 1) < n 2 . Let T 2 = 2n 2 1 p be the number of trials of length T 1 each. Let N be the number of successes during the T 2 trials. By the Hoeffding inequality, Therefore the probability that the chain is in C after For an example, we can even set T 1 = 1.
within O(n 2 ) steps, the chain is in C with high probability. Finally, we show Property (3). Once the chain is in C, there are two types of vertices: those that have n k particles, and those that have n k + 1 particles. Note that there are alwaysk n − k n k vertices with the higher number of particles. Therefore it is equivalent to study an exclusion process with justk particles on the graph G. With probability n k · k−k n , an unoccupied vertex is selected, and the chain stays in place. With the remaining probability, an occupied vertex is chosen uniformly at random. Its particle then moves to a neigboring empty vertex or stays where it is, uniformly at random. Equivalently, the chain is lazy with probability n k · k−k n , and otherwise one of thek particles is chosen, and either stays or moves to a neighbor. Since the number of particles k can be upper and lower bounded by constants (0 ≤k ≤ k), the mixing time within C is independent of n. Therefore, we conclude that the overall mixing time is O(n 2 ).

The complete graph case
Note that the complete graph case for β < 0 is equivalent to the vector of proportions chain in the antiferromagnetic Curie-Weiss Potts model.  Let (X t , t ≥ 0) be the ARW chain for any β < 0 and let (Y t , t ≥ 0) be a chain of independent particles (β = 0). Set X 0 = Y 0 . For every vertex v and time t, Proof. We claim that there exists a coupling of {X t , Y t } such that for all v and t, and defineỸ t (v) similarly. We claim that for all configurations x and vertices v, if x(v) = n k , then and and In other words, the inequalities (4.1)-(4.4) state that the X chain is less likely to move in the absolute value-increasing direction, and more likely to move in the absolute value-decreasing direction. These inequalities, along with the fact that X 0 = Y 0 , suffice to prove the lemma.
There are two cases to analyze when x(v) = n k : n k−1 k , because vertex v is a more likely than average destination. In other words, it is harder to lose a particle from vertex v that has fewer than the average number of particles when β < 0, compared to when β = 0. Mathematically, For the same reason, the probability that X t+1 (v) = X t (v) + 1 is lower bounded by 1 − Xt(v) n 1 k . Therefore, inequalities (4.1) and (4.2) hold in this case. 2. X t (v) > n k . This time, v is a less likely than average destination. The probability that X t+1 (v) = X t (v) − 1 is lower bounded by Xt(v) n k−1 k . The probability that X t+1 (v) = X t (v) + 1 is upper bounded by 1 − Xt(v) n 1 k . Therefore, inequalities (4.1) and (4.2) hold in this case also.
Finally, suppose x(v) = n k . Then the probability of losing a particle is upper bounded by 1 k k−1 k , and the probability of gaining a particle is upper bounded by k−1 k 1 k . Therefore, inequalities (4.3) and (4.4) hold.
We conclude that such a coupling exists, and therefore the stochastic dominance holds.
Proof of Theorem 4.2. Let {Y (v), v ∈ V} be a random variable distributed according to the stationary distribution of the {Y t (v), v ∈ V, t ≥ 0} chain at stationarity. At stationarity, the vertex occupancies are strongly concentrated around their means. By the Hoeffding Inequality, for every λ > 0, for every vertex v. Fix > 0. We wish to upper bound t mix (X, ). Now, for all , T 1 t mix (Y, ) = O(n log n). Therefore at time T 1 , for every λ > 0, for every vertex v. By Lemma 4.1, it also holds that for every λ > 0, Then by the Union Bound, for every λ and v. We observe that for n large enough, there is always an small enough so that k 2e −2λ 2 n + ≤ 2 . Then with probability at least 1 − 2 , X T1 belongs to C(λ). Next, we establish that for every β < 0, there exists λ β such that (1) Once the chain enters C(λ β ), it takes exponential time to leave C(2λ β ), with high probability; (2) We can applying path coupling within C(2λ β ). The first claim is due to comparison with the β = 0 chain, as established above.
We now demonstrate the required contraction for path coupling within C(2λ). Recall that we need to define the edges of the graph H = (Ω, E H ) and choose a length function on the edges. Let (x, y) ∈ E H if y = x − e i + e j for some i = j, and let l(x, y) = 1. Consider any pair of neighboring configurations x and y. We employ a synchronous coupling, as in Figure 4. Namely, the "extra" particle at vertex i in configuration x is paired to the "extra" particle at vertex j in configuration y. All other particles are paired by vertex location. When a particle is selected to be moved in the x configuration, the particle that it is paired to in the y configuration is also selected to be moved.
With probability n−1 n , one of the (n − 1) pairs that has the same vertex location is chosen. Suppose it is located at vertex v. We couple the transitions in the two chains according to the coupling achieving the total variation distance P x (v, ·) − P y (v, ·) TV . Lemma 4.2. On the complete graph, if y = x − e i + e j and x, y ∈ C(2λ), then The proof of Lemma 4.2 is rather involved because it requires considering many cases. It is deferred to the appendix.
By Lemma 4.2, when one of the (n − 1) particles paired by vertex location is chosen, we can couple them so that they move to the same vertex with probability at least 1 − −β n 1+(k−1)e 4λβ . With the remaining probability, the distance increases by at most 2.
With the remaining 1 n probability, the "extra" particle is chosen in both chains. The chains can then equalize with probability 1 because P x (i, ·) = P y (j, ·) on the complete graph. Therefore, we can bound the Wasserstein distance as follows: Therefore, in order to achieve contraction, it suffices that For any 0 < δ < 1, let λ β = 1 4β log(1 − δ) > 0. Then substituting λ = λ β , the right hand side of (4.5) becomes 1 + (k − 1)(1 − δ) < k. Since δ can be arbitrarily close to zero, this quantity is arbitrarily close to k. Therefore, contraction holds for − k 4 < β ≤ 0. To summarize the argument, we have shown that in time O(n log n), the chain enters C(λ β ). After that, the chain leaves the larger set, C(2λ β ), with exponentially small probability, which can be disregarded. Within C(2λ β ), the Wasserstein distance with respect to the chosen H and ρ contracts by a factor of 1 − θ 1 n , so an additional O (n log n) steps are sufficient. Therefore, the overall mixing time is O (n log n).

Conclusion
In this paper we have introduced a new interacting particle system model. We have shown that for any fixed graph, the mixing time of the Attracting Random Walks Markov chain exhibits phase transition. We have also partially investigated the Repelling Random Walks model, and we conjecture that model is always fast mixing. Beyond theoretical results, it is our hope that the model will find practical use.
Polyanskiy. I appreciate the careful editing by D. Gamarnik. The work benefitted in a pivotal way from discussions with Eyal Lubetzky and Reza Gheissari, especially in the proof of slow mixing. The idea of using a lower-bounding comparison chain is due to R. Gheissari. I am grateful to E. Lubetzky for kindly hosting me at NYU. I acknowledge Yuval Peres for several helpful discussions. I am supported by a Microsoft Research PhD Fellowship.

Appendix
Proof of Lemma 3.7. First, We will show that each term is upper bounded by 2β n . Since there are at most ∆ + 1 terms, the bound follows.
We compute max x,y:x∼y |P x (v, w) − P y (v, w)|. Since x and y are interchangeable, we can drop the absolute value. max x,y:x∼y First consider the case that v = w. Then max x,y:x∼y = max By similar reasoning to the case v = w, the expression in parentheses is upper bounded by 2β n . Since e − β n ≤ 1, the whole expression is upper bounded by 2β n .
Proof of Lemma 4.2. To compute this total variation distance, write We have if v = w.
Writing P y (v, w) in terms of x, there are three cases: Let us compute P x (i, i) − P y (i, i). Since e β n − 1 < 0 and e 2 β n − 1 < 0, we find that P x (i, i) − P y (i, i) < 0. Let us now compute P x (i, w) − P y (i, w) when w / ∈ {i, j}. For the remainder of the analysis, we assume without loss of generality that x(i) ≥ y(j). In fact, we can assume that x(i) > y(j) because when x(i) = y(j), it must be that x = y, and the total variation distance is zero. Analyzing the numerator, The first factor is positive.
The assumption x(i) > y(j) means that x(i) ≥ x(j) + 2. Therefore, the second factor is nonpositive. Therefore, P x (i, w) − P y (i, w) ≤ 0 when w / ∈ {i, j}. The last remaining possibility for w is w = j. Since we have also shown that P x (i, i)−P y (i, i) < 0, it must be that P x (i, j)−P y (i, j) > 0. We conclude that when if w / ∈ {i, j}.
One way to see this is to consider the computation for the above case v = i with one less particle at vertex j and one more particle at vertex i in both configurations, to adjust for the fact that v = j. After this adjustment, it is still true that P x (v, i) − P y (v, i) < 0, by examining the computation for P x (i, i) − P y (i, i) in the case above. Similarly, it holds that for w / ∈ {i, j}, P x (v, w) − P y (v, w) < 0: the second factor in Equation (5.1) is replaced by e β n (x(i)−1) − e β n (x(j)−1) . Since x(i) > y(j) > x(j), this factor is negative. 3. v / ∈ {i, j}. if w / ∈ {i, j, v}.
Using the same reasoning as the case v = j, we can imagine placing one more particle at vertex i and one less particle at vertex v. Then P x (v, w) − P y (v, w) < 0 for w = j, and we conclude that P x (v, ·) − P y (v, ·) TV = P x (v, j) − P y (v, j) By considering all cases for v, we conclude that P x (v, ·) − P y (v, ·) TV = P x (v, j)− P y (v, j). To compute an upper bound, we consider two cases for v. This last quantity is decreasing in x(j) and increasing in x(u) for u = j. Therefore, we can obtain an upper bound by simply setting x(j) to its lower bound and x(u) to its upper bound (recall that x ∈ C(2λ)). Finally, we obtain where the second last inequality is due to the fact that e −z − 1 ≤ − z 2 for z ≤ 1. .
As in the case above, set x(j) to its lower bound and x(u) to its upper bound, for u = j.