LOOP-ERASED RANDOM WALKS, SPANNING TREES AND HAMILTONIAN CYCLES

We establish a formula for the distribution of loop-erased random walks at certain random times. Several classical results on spanning trees, including Wilson’s algorithm, follow easily, as well as a method to construct random Hamiltonian cycles.


Introduction
Associated to a random walk X on a finite or denumerable state space V is the loop-erased random walk Y, constructed as follows. Start X at some point u ∈ V and, for each time n, consider the whole path from u to X_n. Then Y_n is the self-avoiding path obtained by removing the cycles in the order they appear (a formal definition is given in the sequel). Loop-erasing is a quite natural method to construct self-avoiding paths and is therefore an object of interest in the study of self-avoiding random walks [6]. Moreover, in recent years, random spanning trees and forests have received much attention in probability (see e.g. [2]), and Wilson's algorithm [8], which uses the loop-erasing procedure, has proved to be a useful tool. One of the main interests in that context is the planar case, where the problem of conformal invariance plays a major role. Some recent results in that domain can be found in Kenyon [5], although the methods there are different, relying on domino tilings.

We determine in this note the distribution of Y at certain random times: the first hitting time by X of some subset U ⊂ V, or an independent time with geometric distribution. Our main formula (Theorem 1), which refines a result by Lawler, provides an easy proof of Wilson's algorithm as well as of classical results on spanning trees. We shall concentrate on the case when the state space V is finite. However, our results can be expressed in terms of the eigenvalues of the discrete Laplacian; see Section 2.3 (by discrete Laplacian we mean an approximation of the second derivative, not the graph-theoretical Laplacian). This suggests a possible generalization to a denumerable state space, although we shall not deal with that question here.

The remainder of this note is organized as follows. The law of Y stopped at certain random times is determined in Section 2, and the application to spanning trees is the topic of Section 3. Using the same methods, we treat the case of Hamiltonian cycles in Section 4.

Definitions and notations
Consider a Markov chain X on a finite set V, and fix a point u ∈ V. Let X start at u. The loop-erasure Y_n of the path (X_0, ..., X_n) is constructed as follows: put a_0 = u. Then define by recurrence l_k to be the last time X visits a_k before time n, and a_{k+1} = X_{l_k + 1}. This yields a finite sequence that stops at some integer m when l_m = n, and we put Y_n = (a_0, a_1, ..., a_m).

One easily checks that the previous definition is equivalent to the following: put a_0 = u and Y_0 = (a_0). By recurrence, if Y_n = (a_0, a_1, ..., a_m) and X_{n+1} does not belong to the support of Y_n, put Y_{n+1} = (a_0, a_1, ..., a_m, X_{n+1}); otherwise, if X_{n+1} = a_j for some j ≤ m, put Y_{n+1} = (a_0, a_1, ..., a_j). That is, the cycles are erased in the order they appear.
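The cycle-erasing recursion above can be sketched in a few lines of code (a minimal illustration; the function name `loop_erase` and the list representation of paths are ours, not the paper's):

```python
def loop_erase(path):
    """Erase cycles from `path` in the order they appear.

    `path` is a list of states (X_0, ..., X_n); the result is the
    self-avoiding path Y_n = (a_0, ..., a_m) described above.
    """
    erased = []
    for x in path:
        if x in erased:
            # A cycle has just closed at x: cut the path back to the
            # first occurrence of x, erasing that cycle.
            erased = erased[: erased.index(x) + 1]
        else:
            erased.append(x)
    return erased

# Example: the walk u -> a -> b -> a -> c revisits a, so the cycle
# (a, b, a) is erased.
print(loop_erase(["u", "a", "b", "a", "c"]))  # ['u', 'a', 'c']
```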
We adopt the following notations. The weight of a subset S = {e_1, ..., e_n} ⊂ V × V of directed edges is w(S) = ∏_{i=1}^n p_{e_i}, where p_{(x,y)} denotes the transition probability from x to y. In particular, the weight of a path γ = (a_0, ..., a_m) is w(γ) = ∏_{i=1}^m p_{a_{i−1}, a_i}. Note that the edges need not be adjacent, and we shall consider the case when S is a tree in Section 3.
Let M be the transition matrix of the chain X. For every subset U ⊂ V, we denote by T_U the hitting time (for X) of U, by G(t, x, x; U) the Green function of the random walk killed at T_U,

G(t, x, x; U) = Σ_{n≥0} t^n P_x(X_n = x, n < T_U),

and by M_U the matrix M restricted to the coefficients in (V − U) × (V − U). In this setting, if γ is a path, we shall write γ instead of supp(γ) to simplify the notations.

The results
Let U ⊂ V with u ∉ U. We first determine the law of Y at T_U. Let Γ_U be the set of self-avoiding paths γ of the form γ = (a_0, a_1, ..., a_m), where a_0 = u, a_m ∈ U and a_k ∉ U for every k < m. For such a path, write γ_k = {a_0, ..., a_{k−1}} (with γ_0 = ∅). A last-exit decomposition easily leads to the following formula, whose proof can be found in Section 3.1 of [6]:

E(t^{T_U} ; Y_{T_U} = γ) = t^m w(γ) ∏_{k=0}^{m−1} G(t, a_k, a_k; γ_k ∪ U).   (1)

The key remark is the following: for every k ≤ m − 1,

G(t, a_k, a_k; γ_k ∪ U) = det(Id − tM_{γ_{k+1} ∪ U}) / det(Id − tM_{γ_k ∪ U}).   (2)

This is a straightforward application of the Cramer formula for the inversion of matrices. Together with (1), this implies the following:

Theorem 1 Let γ ∈ Γ_U be a self-avoiding path starting at u. Then

E(t^{T_U} ; Y_{T_U} = γ) = t^m w(γ) det(Id − tM_{γ∪U}) / det(Id − tM_U).

In particular, taking t = 1, P(Y_{T_U} = γ) = w(γ) det(Id − M_{γ∪U}) / det(Id − M_U). A similar result appears in [6] for t = 1, but the correcting factor det(Id − M_{γ∪U}) / det(Id − M_U) is not computed explicitly there. We shall express this factor in terms of the eigenvalues of the discrete Laplacian in the next subsection. Our formula also enables us to compute the expected run time of Wilson's algorithm, which plays a key role in computer science (see Section 3).
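The determinant identity behind the key remark — that the Green function of the killed walk is a ratio of determinants — can be checked exactly on a small chain. The following sketch uses a three-state chain of our own choosing and exact rational arithmetic; the helper names are ours:

```python
from fractions import Fraction as F

def det(A):
    """Determinant by cofactor expansion (exact; fine for tiny matrices)."""
    if not A:
        return F(1)
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    A = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                A[r] = [v - A[r][c] * w for v, w in zip(A[r], A[c])]
    return [A[r][n] for r in range(n)]

# A 3-state chain on V = {0, 1, 2}; kill on U = {2}, look at x = 0.
M = [[F(1, 4), F(1, 4), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 2), F(1, 2), F(0)]]

# Id - M_U on V - U = {0, 1}.
IMU = [[F(1) - M[0][0], -M[0][1]],
       [-M[1][0], F(1) - M[1][1]]]

# G(1, 0, 0; U) is the (0, 0) entry of (Id - M_U)^{-1}.
g = solve(IMU, [F(1), F(0)])[0]

# Cramer: it equals det(Id - M_{U ∪ {0}}) / det(Id - M_U),
# where Id - M_{U ∪ {0}} is the 1x1 block on the remaining vertex 1.
ratio = det([[F(1) - M[1][1]]]) / det(IMU)
print(g == ratio)  # True
```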
We can also determine the probability that Y_T = γ for an independent random time T with geometric distribution with parameter p, i.e. P(T = n) = (1 − p)p^n. It suffices to add a cemetery point ∂ to the set of vertices V and to consider the random walk where the transition probabilities p_{i,j} are replaced by p·p_{i,j} for i, j ∈ V, while the transition probability from any vertex of V to ∂ equals 1 − p. We then look at the loop-erased path at the first hitting time of ∂. Theorem 1 then becomes:

Corollary 1 With the same notations as in Theorem 1, we have, for an independent time T with geometric distribution of parameter p and every self-avoiding path γ = (a_0, ..., a_m) starting at u,

E(t^T ; Y_T = γ) = (1 − p) (pt)^m w(γ) det(Id − ptM_γ) / det(Id − ptM).

Note that for t = 1, when p → 1, the limit distribution is the invariant measure on the set of self-avoiding paths starting at u for the chain Y. According to Corollary 1, this invariant measure is proportional to the measure that assigns to each path γ the weight w(γ) det(Id − M_γ). In particular, the restriction of the invariant measure to Γ_U, for any subset U ⊂ V such that u ∉ U, is proportional to the probability measure computed in Theorem 1. For an explanation of this fact, see the proof of the Markov chain tree theorem in [7].

The correcting factor
The quantities det(Id − M_{γ∪U}) and det(Id − M_U) can be expressed in terms of the eigenvalues of the discrete Laplacian, an operator defined as follows: for a function f : V → R, we put

Δf(x) = Σ_{y∈V} p_{x,y} f(y) − f(x).

We say that λ is an eigenvalue of the Laplacian with Dirichlet boundary condition on U ⊂ V if there exists a nonzero function f satisfying Δf(x) = λf(x) for every x ∉ U and f(x) = 0 for every x ∈ U. Moreover, the multiplicity of λ is the dimension of the vector space constituted by the functions f such that f = 0 on U and, for every x ∉ U and for some integer n, (Δ − λ)^n f(x) = 0. Let λ_1, ..., λ_k be the eigenvalues of the Laplacian with Dirichlet boundary condition on U, and a_1, ..., a_k their respective multiplicities. Then we have

det(Id − M_U) = ∏_{i=1}^k (−λ_i)^{a_i}.   (3)

We can define likewise the eigenvalues μ_1, ..., μ_l of the Laplacian with Dirichlet boundary condition on γ ∪ U and their respective multiplicities b_1, ..., b_l. Then

det(Id − M_{γ∪U}) / det(Id − M_U) = ∏_{j=1}^l (−μ_j)^{b_j} / ∏_{i=1}^k (−λ_i)^{a_i}.   (4)

Alternatively, the ratio can be expressed in the following way. Say that ν is a semi-eigenvalue if there exists a nonzero function f satisfying: f(x) = 0 for x ∈ U, Δf(x) = νf(x) for x ∈ γ − U, and Δf(x) = 0 for x ∉ γ ∪ U. Define the multiplicity of ν as the dimension of the vector space constituted by the functions f satisfying these conditions. Let ν_1, ..., ν_p be the semi-eigenvalues and c_1, ..., c_p their respective multiplicities. Then

det(Id − M_U) / det(Id − M_{γ∪U}) = ∏_{i=1}^p (−ν_i)^{c_i}.   (5)

The proof of (5) is an easy exercise of linear algebra. Write the matrix M_U in block form, where the last coefficients correspond to the vertices of the path γ, i.e.

M_U = ( A  B ; C  D ),

with A = M_{γ∪U} the restriction to the vertices outside γ ∪ U, and D the restriction to the vertices of γ − U.
Define the polynomial

P(λ) = det( λ Id − (D − Id) − C (Id − A)^{−1} B ).

Then the product of the roots of P, counted with multiplicity, is

(−1)^{deg P} P(0) = det( (Id − D) − C (Id − A)^{−1} B ) = det(Id − M_U) / det(Id − M_{γ∪U}),

the last equality being the Schur complement formula for the determinant of the block matrix Id − M_U. On the other hand, the roots of P are the complex numbers λ such that there exists a nonzero vector f_2, indexed by the vertices of γ − U, with ((D − Id) + C(Id − A)^{−1}B) f_2 = λ f_2. Setting f_1 = (Id − A)^{−1} B f_2 on the vertices outside γ ∪ U and f = 0 on U, one obtains a nonzero function f with Δf = 0 outside γ ∪ U and Δf = λf on γ − U. That is, the roots of P are the semi-eigenvalues, which proves (5).
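The Schur complement step can be checked numerically. In the sketch below (a three-state chain of our own; U = {2} and γ = (0, 2), so that γ − U is a single vertex and P has degree one), the unique semi-eigenvalue is computed directly and compared with the determinant ratio of (5):

```python
from fractions import Fraction as F

# Chain on V = {0, 1, 2}; U = {2}; path gamma = (0, 2), so
# gamma - U = {0} and V - (gamma ∪ U) = {1}.
M = [[F(1, 4), F(1, 4), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 2), F(1, 2), F(0)]]

# Blocks of M_U with the gamma-vertex last: A acts on {1}, D on {0}.
A, B, C, D = M[1][1], M[1][0], M[0][1], M[0][0]

# The unique root of P, i.e. the semi-eigenvalue:
# nu = (D - 1) + C (1 - A)^{-1} B.
nu = D - 1 + C * B / (1 - A)

# Formula (5): det(Id - M_U) / det(Id - M_{gamma ∪ U}) = -nu.
det_IMU = (1 - M[0][0]) * (1 - M[1][1]) - M[0][1] * M[1][0]
det_IMgU = 1 - M[1][1]
print(det_IMU / det_IMgU == -nu)  # True
```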

Application to spanning trees
A spanning tree of a graph is defined as a connected, acyclic subgraph containing all the vertices. Spanning trees are a classical topic in computer science; see the references in [8].
Their study in probability has been the subject of interesting developments in recent years; see for instance [2] and the references therein. A rooted spanning tree Γ on V can be viewed as a collection of directed edges in which each edge points towards the root. Hence we can define its weight w(Γ) with respect to the random walk X, as in Section 2.1. Wilson's algorithm is a method for constructing a random spanning tree with probability proportional to this weight [8]. Fix the root R and choose any ordering on V − {R}.
We define by recurrence a growing sequence of trees T(n), starting from T(0) = {R}, by: • If T(n) spans V, we are done.
• Else, let v be the minimal vertex (in the ordering) not in T(n), and run the loop-erased random walk Y starting at v up to the first hitting time of T(n). Add this loop-erased path to T(n) to obtain T(n + 1).
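The two steps above can be sketched compactly. A standard implementation device (not spelled out in the paper) is to record, for each vertex, only the *last* exit taken by the walk: following these pointers is exactly loop-erasure. The graph below, a simple random walk on the complete graph K4, is our own example:

```python
import random

def wilson(vertices, step, root, seed=0):
    """Wilson's algorithm: returns a dict mapping each vertex to its
    parent in a spanning tree rooted at `root` (the root maps to None).

    `step(v, rng)` samples the next state of the chain from v.
    """
    rng = random.Random(seed)
    parent = {root: None}            # T(0) is the root alone
    for v in vertices:               # any fixed ordering works
        if v in parent:
            continue
        # Run the walk from v until it hits the current tree, keeping
        # only the last exit from each vertex: this is loop-erasure.
        nxt, u = {}, v
        while u not in parent:
            nxt[u] = step(u, rng)
            u = nxt[u]
        # Add the loop-erased path from v to the tree.
        u = v
        while u not in parent:
            parent[u] = nxt[u]
            u = nxt[u]
    return parent

# Simple random walk on the complete graph K4.
V = [0, 1, 2, 3]
step = lambda v, rng: rng.choice([w for w in V if w != v])
tree = wilson(V, step, root=0)
print(sorted(tree))  # [0, 1, 2, 3]
```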
If the chain is irreducible, it is clear that the sequence terminates almost surely and that the final tree is a spanning tree. Furthermore we have:

Theorem 2 (Wilson) Wilson's algorithm constructs a random spanning tree rooted at R with probability proportional to the weight w. In particular, its law is independent of the ordering chosen on V − {R}.
We want to derive Wilson's theorem from Theorem 1. Let Γ be a spanning tree rooted at R, and choose an ordering on V − {R}. We begin by remarking that any sequence of paths, obtained by running Wilson's algorithm, that constructs Γ will in fact construct the same growing sequence of (deterministic) subtrees (U(0), U(1), ..., U(n)) of Γ. Denote by S_k the vertex set of U(k). Wilson's theorem is a consequence of the following lemma:

Lemma 2 For every k ≤ n,

P(T(k) = U(k)) = w(U(k)) det(Id − M_{S_k}) / det(Id − M_{{R}}).

Wilson's result follows by taking k = n, since then S_n = V and det(Id − M_{S_n}) = 1.

Proof
We proceed by induction on k. For k = 1, as U(1) is a self-avoiding path from the first vertex in V − {R} to the root, this is nothing else than Theorem 1. Suppose the result is established for k − 1. Let v_k be the minimal vertex not in S_{k−1} and γ_k the path from v_k to S_{k−1} in the tree U(k). Then the conditional probability P(T(k) = U(k) | T(k − 1) = U(k − 1)) is the probability that the random walk started at v_k, run up to the hitting time of S_{k−1}, has loop-erasure γ_k. Using Theorem 1, we see that this probability equals

w(γ_k) det(Id − M_{S_k}) / det(Id − M_{S_{k−1}}),

which, multiplied by the induction hypothesis, completes the proof. □

An easy generalization of Lemma 2 is the following:

Corollary 2 Let Γ be a tree rooted at R, not necessarily spanning, and let S be its vertex set. Then the probability that a random spanning tree rooted at R contains Γ is

w(Γ) det(Id − M_S) / det(Id − M_{{R}}).

Proof. The probability that a random spanning tree rooted at R contains Γ does not depend on the ordering on V − {R}. Let L_1, ..., L_n be the leaves of Γ and choose an ordering on V − {R} in which the first n vertices are L_1, ..., L_n. Then the probability we are considering is exactly the probability that T(n) = Γ, which is given by Lemma 2.

□
Remark. If the random walk is reversible, it is well known that it is associated to an electric network. In that case, given any set B of directed edges, the probability that all the edges in B belong to the random spanning tree can be computed by the transfer current theorem; see [3]. Our result only enables us to handle the case when B is a tree rooted at R; on the other hand, it also holds for non-reversible Markov chains. Another connection between spanning trees and Markov chains is the Markov chain tree theorem [7], which can also be easily derived by our method:

Corollary 3 (Markov chain tree theorem) The stationary distribution of the Markov chain X is proportional to the measure that assigns to every vertex u the mass

Σ_{Γ ∈ T_u} w(Γ),

where T_u is the set of spanning trees rooted at u.

Proof
As Wilson's theorem constructs with probability 1 a spanning tree rooted at u, taking k = n in Lemma 2 yields: But it is well-known (see [9], chapter 1) that the measure Remark. Formula (6) can be seen as a generalization of Khirchoff's formula counting the number of spanning trees in a given graph [4].
As a last application of our method, let us compute the expected number of steps in Wilson's algorithm, which plays a key role in computer science; see [8]. Denote by T the number of steps of the random walk up to the complete construction of the spanning tree rooted at R. Then we have:

Proposition 1 Let n be the number of points in V − {R}. Then:

(i) The distribution of T is given by

E(t^T) = t^n det(Id − M_{{R}}) / det(Id − tM_{{R}}).

(ii) The expected run time is

E(T) = Σ_{i ∈ V−{R}} G(1, i, i; {R}).   (7)

Proof As a straightforward generalization of Lemma 2, we have, for every spanning tree Γ rooted at R,

E(t^T ; the tree constructed is Γ) = t^n w(Γ) / det(Id − tM_{{R}}),

so that, summing over T_R, the set of spanning trees rooted at R, E(t^T) = t^n Σ_{Z∈T_R} w(Z) / det(Id − tM_{{R}}). According to (6), Σ_{Z∈T_R} w(Z) = det(Id − M_{{R}}), which yields (i).
The computation of E(T) = g′(1), where g(t) = E(t^T), leads to (ii), since

g′(1) = n + tr((Id − M_{{R}})^{−1} M_{{R}}) = Σ_{i ∈ V−{R}} G(1, i, i; {R}).   □

Remark. The expected number of steps is expressed in [8] as

E(T) = Σ_{x ∈ V} π(x) (t_{x,R} + t_{R,x}),   (8)

where t_{x,y} is the expected time for the Markov chain to go from x to y, and π is the invariant probability measure. To see that (7) and (8) agree, write G_R(i, i) = G(1, i, i; {R}). It is classical that G_R(i, i) divided by the commute time t_{i,R} + t_{R,i} equals the stationary probability π(i) (see [1]). Therefore G_R(i, i) = π(i)(t_{i,R} + t_{R,i}), and summing this over all the states i ∈ V identifies (8) with (7). This remark is due to Yuval Peres.

Hamiltonian cycles
The loop-erasing procedure also enables us to construct Hamiltonian cycles, i.e. simple cycles spanning all the vertices of V. Indeed, let the random walk start at some vertex u ∈ V, denote by 0 = T_0 < T_1 < T_2 < ... the successive return times of the random walk to u and put

N = inf{ k ≥ 1 : the loop-erasure of (X_0, ..., X_{T_k − 1}) visits every vertex of V },   T = T_N,

and let H be the Hamiltonian cycle obtained by appending u to the loop-erasure of (X_0, ..., X_{T − 1}). Denoting by C the set of Hamiltonian cycles starting and ending at u, we have:

Theorem 3 For every γ ∈ C,

E(t^T ; H = γ) = t^{|V|} w(γ) / ( det(Id − tM) + t^{|V|} Σ_{γ′∈C} w(γ′) ).   (9)

As a consequence (taking t = 1 and noting that det(Id − M) = 0), the algorithm yields a random Hamiltonian cycle with probability proportional to the weight.
Note that the joint law of (T, H) does not depend on the starting point u, which is not obvious from simple path-transformation arguments.
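The procedure can be simulated directly on a small complete graph (a sketch; the graph, the seed and the function names are ours). At each return time to u we test whether the loop-erasure of the path so far, minus the final u, spans V, and stop at the first return time at which it does:

```python
import random

def loop_erase(path):
    """Erase cycles in the order they appear."""
    out = []
    for x in path:
        if x in out:
            out = out[: out.index(x) + 1]
        else:
            out.append(x)
    return out

def random_hamiltonian_cycle(V, step, u, rng):
    """Run the walk from u; at each return time T_k to u, check whether
    the loop-erasure of (X_0, ..., X_{T_k - 1}) spans V. If so, close
    it into a cycle and return it."""
    path = [u]
    while True:
        path.append(step(path[-1], rng))
        if path[-1] == u:                    # a return time T_k
            Y = loop_erase(path[:-1])        # drop the final u first
            if len(Y) == len(V):             # self-avoiding and spans V
                return Y + [u]

# Simple random walk on the complete graph K4.
V = [0, 1, 2, 3]
step = lambda v, rng: rng.choice([w for w in V if w != v])
cycle = random_hamiltonian_cycle(V, step, 0, random.Random(1))
print(cycle[0] == cycle[-1] == 0)  # True
```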

Proof
To prove this result, let us introduce some definitions. An excursion will refer to a path (a_0 = u, ..., a_n = u) such that a_j ≠ u for 0 < j < n. An excursion is Hamiltonian if the loop-erasure of the path (a_0, ..., a_{n−1}) contains all the vertices of V; in that case, by convention, we say that the loop-erasure of the excursion is the loop-erasure of (a_0, ..., a_{n−1}). We denote by E the set of Hamiltonian excursions and by Ē the set of non-Hamiltonian excursions, and we write |e| for the length of an excursion e.

Consider a Hamiltonian cycle γ = (a_0 = u, a_1, ..., a_{|V|} = u) and denote by E_γ the set of Hamiltonian excursions whose loop-erasure is (a_0 = u, a_1, ..., a_{|V|−1}). Of course, E_γ ⊂ E. We want to compute E(t^T ; H = γ). Since the walk is a sequence of independent excursions and the algorithm stops at the first Hamiltonian one,

E(t^T ; H = γ) = Σ_{e ∈ E_γ} t^{|e|} w(e) / ( 1 − Σ_{e ∈ Ē} t^{|e|} w(e) ).

If the random walk X starts at u, conditioning on X_1 and applying Theorem 1 we get:

Σ_{e ∈ E_γ} t^{|e|} w(e) = t^{|V|} w(γ) / det(Id − tM_{{u}}).

The generating function of the excursions from u to u is given by:

Σ_{e} t^{|e|} w(e) = 1 − det(Id − tM) / det(Id − tM_{{u}}),

since any path from u to u is a sequence of excursions, so that the Green function G(t, u, u; ∅) = det(Id − tM_{{u}}) / det(Id − tM) equals (1 − Σ_e t^{|e|} w(e))^{−1}. Hence

Σ_{e ∈ Ē} t^{|e|} w(e) = 1 − det(Id − tM)/det(Id − tM_{{u}}) − t^{|V|} Σ_{γ′∈C} w(γ′)/det(Id − tM_{{u}}).

Putting the pieces together, we recover (9), and the second assertion follows by taking t = 1. □

Note that one can alternatively run Wilson's algorithm rooted at u several times, until the spanning tree constructed is a linear one, i.e. is constituted by a single path (a_1, ..., a_n = u), and then take the Hamiltonian cycle (a_0 = u, a_1, ..., a_n = u). In that case one constructs a random Hamiltonian cycle with probability proportional to ∏_{i=2}^{|V|} p_{a_{i−1}, a_i}, but this is different from the weight of the cycle, which is p_{u, a_1} ∏_{i=2}^{|V|} p_{a_{i−1}, a_i}. The expected run time of the algorithm of Theorem 3 is given by the following:

Corollary 4
The expected time to construct a random Hamiltonian cycle by the algorithm of Theorem 3 is:

E(T) = Σ_{Z ∈ T} w(Z) / Σ_{γ ∈ C} w(γ),

where T is the set of rooted spanning trees of V (rooted at any vertex) and C is the set of Hamiltonian cycles.

Proof
We want to compute f′(1), f being the function defined by

f(t) = E(t^T) = t^{|V|} Σ_{γ∈C} w(γ) / ( det(Id − tM) + t^{|V|} Σ_{γ∈C} w(γ) ),

which is obtained by summing (9) over γ ∈ C. Since det(Id − M) = 0 and −(d/dt) det(Id − tM)|_{t=1} = Σ_{u∈V} det(Id − M_{{u}}) = Σ_{Z∈T} w(Z) by (6), differentiating at t = 1 yields the result. □

Remark. A naive alternative is the following rejection algorithm. Let L be the first return time of X to u. Stop X at L and keep the whole path if it is a Hamiltonian cycle (in which case L = |V|); else restart X at u, and so on. It is clear that this procedure also yields a random Hamiltonian cycle with probability proportional to the weight. The number of steps T′ of this algorithm has the following distribution:

E_u(t^{T′}) = P_u((X_0, ..., X_L) ∈ C) t^{|V|} / ( 1 − Σ_{n≥1} P_u(L = n, (X_0, ..., X_L) ∉ C) t^n ).

Remark that P_u((X_0, ..., X_L) ∈ C) = Σ_{γ∈C} w(γ). Thus

E_u(T′) = E_u(L) / Σ_{γ∈C} w(γ).

In particular, the expected run time of the naive algorithm depends on the starting point. Also remark that E_u(T′) ≥ E(T), since E_u(L) ≥ 1 ≥ Σ_{Z∈T} w(Z).