Scaling limit of a self-avoiding walk interacting with spatial random permutations

We consider nearest neighbour spatial random permutations on $\mathbb{Z}^d$. In this case, the energy of the system is proportional to the sum of all cycle lengths, and the system can be interpreted as an ensemble of edge-weighted, mutually self-avoiding loops. The constant of proportionality, $\alpha$, is the order parameter of the model. Our first result is that in a parameter regime of edge weights where it is known that a single self-avoiding loop is weakly space filling, long cycles of spatial random permutations are still exponentially unlikely. For our second result, we embed a self-avoiding walk into a background of spatial random permutations, and condition it to cover a macroscopic distance. For large values of $\alpha$ (where long cycles are very unlikely) we show that this walk collapses to a straight line in the scaling limit, and give bounds on the fluctuations that are almost sufficient for diffusive scaling. For proving our results, we develop the concepts of a spatial strong Markov property and of iterative sampling for spatial random permutations, which may be of independent interest. Among other things, we use them to show exponential decay of correlations for large values of $\alpha$ in great generality.


Introduction
Self-avoiding random walks are by now a classical topic of modern probability theory, although many questions still remain to be answered; we refer to the classic book [28] and the more recent survey [32]. A variant of self-avoiding walks are self-avoiding polygons (see e.g. [23]), where the last step of the self-avoiding walk has to return to the point of origin.
Spatial random permutations, on the other hand, are a relatively recent concept. They were originally introduced due to their relevance for the theory of Bose-Einstein condensation [5,7,34], but are of independent mathematical interest. The purpose of the present paper is to view spatial random permutations as systems of mutually self-avoiding polygons and to compare the behaviour of a selected cycle of a spatial random permutation to that of a self-avoiding walk or polygon. Put differently, a selected cycle of a spatial random permutation can be viewed as a self-avoiding polygon embedded into a background of other self-avoiding polygons, and we are interested in the effect that this embedding has. For this purpose, we restrict to nearest neighbour spatial random permutations, as they are most closely related to self-avoiding walks. For a finite subset Λ of Z^d, the model is defined on the set S_Λ consisting of all bijective maps π on Λ with the property that |π(x) − x| ∈ {0, 1} for all x ∈ Λ. The probability measure (with order parameter α) is given by assigning to the element π ∈ S_Λ the energy H(π) := Σ_{x∈Λ} |π(x) − x| and the probability P(π) = e^{−αH(π)}/Z(Λ), where Z(Λ) is the partition function (normalising constant).
The most important questions in spatial random permutations concern the length of their cycles, in particular the existence of macroscopic cycles. It is known [5] that the probability that the origin is in a cycle of length larger than k decays exponentially with k, uniformly in the volume Λ. It is expected that for dimensions d = 3 and higher, there exists a critical value α_c of the order parameter so that for α < α_c, the cycle containing the origin (or any selected point) is of macroscopic length with positive probability. In d = 2, on the other hand, only a Kosterlitz-Thouless phase transition is expected, meaning that the probability of the cycle being larger than k decays exponentially if α > α_c but algebraically otherwise. While there is good numerical evidence for the existence of long cycles in d ≥ 3 [20,21] and the Kosterlitz-Thouless transition [2] in d = 2, actually proving any positive statement about the existence of long cycles is the great unsolved problem of the theory of spatial random permutations. The only case where such a statement is known is for an annealed version of the model (with a slightly different energy) [4,6]. The argument there relies on explicit calculations using Fourier transforms; all attempts to get away from this exactly solvable situation have so far failed. In [7] a non-rigorous argument is made that the study of models of non-spatial permutations with cycle weights may be useful, and such models have received some attention recently [3,8,10].
In the context of self-avoiding walks, the concept of a phase transition from short to long loops is present in the following results. Consider a single step-weighted self-avoiding walk: fix a sequence of growing subsets Λ_n (e.g. cubes) of Z^d, for each n two points a and z at opposite ends of Λ_n, and consider the set of all self-avoiding walks starting in a and ending in z. Let α ∈ R, and assign to each such self-avoiding walk X the weight exp(−α|X|), where |X| is the number of steps that X takes. Write μ for the connective constant of the d-dimensional cubic lattice. When α > log μ, it is known that the shape of X converges to a straight line as n → ∞, when scaled by 1/n. In fact, when scaled by 1/n in the direction of z − a and by n^{−1/2} in the directions perpendicular to this vector, X converges to a Brownian bridge. These results are implicit in the works [9,11] and have been worked out by Y. Kovchegov in his thesis [26].
For α < log μ, on the other hand, the results are entirely different. As Duminil-Copin, Kozma and Yadin have recently shown [15], in this case the rescaled self-avoiding walk becomes weakly space filling, meaning that it only leaves holes of logarithmic size in the graph. Their results also hold for the self-avoiding loop, i.e. in the case where a and z are chosen to be the same site. Note that for α > log μ, the self-avoiding loop converges to a point in the scaling limit.
It is therefore a natural question what happens when the self-avoiding loop is embedded into a background of other self-avoiding loops, i.e. in the case of spatial random permutations. A proof of the existence of weakly space filling cycles would be particularly interesting, as it would imply that the expected length of the cycle is infinite, and thus the existence of a phase transition. Unfortunately, this is not what we can show. Instead, we give a somewhat negative result: we show that if there is a regime of space filling cycles, it must start at lower α than in the case of the single self-avoiding polygon. More precisely, in the case where G is a subgraph of a vertex-transitive graph, and when μ is the cyclic connective constant of that graph, we identify in Theorem 2.1 an α_0 < log μ so that for all α > α_0, and uniformly in the size of G, the length of a cycle through a given point has exponential tails. Thus in the interval (α_0, log μ) the single self-avoiding loop is weakly space filling, while the self-avoiding loop embedded into an ensemble of other such loops is very short.
For our second result, we restrict to the case where Λ_n = ([0, n] × (−n/2, n/2]^{d−1}) ∩ Z^d and impose periodic boundary conditions on all except the first coordinate. We embed into the spatial random permutation a self-avoiding path starting in 0 and conditioned to end in a point z on the opposite side of Λ_n, see Figure 1. In this situation, we (almost) recover the results of [26]: we show that for sufficiently large α, the self-avoiding walk starting in 0 collapses to a straight line in the scaling limit as n → ∞. Unlike in the case of the single self-avoiding walk, we do not have a good quantitative estimate on the threshold above which this behaviour holds, and we cannot quite control the fluctuations well enough to prove convergence to a Brownian bridge. The reason is that the background of cycles introduces additional correlations that are very hard to control. This is best explained by investigating the strategy of proof for the case of the self-avoiding walk, and discussing where it fails in our case.
The basic idea in the case of the self-avoiding walk (which we indeed adapt and extend) is to introduce regeneration points. The walk is forced to connect 0 to the other side of the box; let us assume that it is the other side with respect to the first coordinate. A regeneration point is a point of the walk such that the 'past' of the walk lies entirely to the left of it (has smaller first coordinate), while the 'future' lies entirely to the right. It is easy to see that if there are no regeneration points in an area of a given (horizontal) width w, then the self-avoiding path has to take at least 3w steps in this area. Since every step that the walk takes introduces an additional factor of e^{−α} to its weight, and since α is large, it is possible to show that there must be many regeneration points; more precisely, the probability of not finding any regeneration point in a vertical strip of width K decays exponentially in K, uniformly in n. On the other hand, regeneration points provide fresh starts for the walk (hence the name): when conditioned to start at a regeneration point, the resulting model is again a self-avoiding walk. The regeneration points themselves thus form a random walk with iid steps, and by the exponential bound discussed above we have excellent control on the step size of that process. All scaling results now follow from standard limit theorems for random walks.
Figure 1: Representation of a bijection with a forced open cycle between a and z, when Λ ⊂ Z^2 is a box with cylinder boundary conditions. If π(x) = x, then a circle is drawn at x, while if π(x) is a neighbour of x, then an arrow is directed from x to π(x).

This argument breaks down at several places when going to spatial random permutations. First of
all, the energy of the system is an extensive quantity for all values of α, i.e. it grows proportionally to the volume. Since the energy of the embedded random walk only grows proportionally to its length, in the case where it collapses to a straight line its energy will be subdominant compared to the energy provided by the environment. Therefore, it is much harder to argue that a broad vertical strip with no regeneration points is unlikely just because the walk would need to take many steps there. Secondly, even if we do have a regeneration point, just considering the area to the right of it will not decouple the past from the future. The reason is that with probability one (in the limit n → ∞) a short cycle from the background will cross the vertical hyperplane containing the regeneration point, and introduce correlations. We solve both problems by developing a method for estimating the decay of correlations for spatial random permutations in the case of large α. Since we expect this method to be of independent interest, and since it does not complicate matters much, we develop it for general graphs. The method is based on a strong version of the spatial Markov property and on iterative sampling, and is strong enough to provide exponential decay of correlations in a rather general context. We then extend the concept of regeneration points and introduce (random) regeneration surfaces, which serve the same purpose as the (deterministic) hyperplanes that separate regions in the self-avoiding walk case. See Figure 1 b) for an illustration. While our estimates are strong enough to give us the correct scaling limit, they are (just) not strong enough to get down to diffusive scaling. The main reason is that we can only show that consecutive regeneration points (and surfaces) have a distance of order log n with high probability, rather than of order one as in the case of the self-avoiding walk.
There are other models where the scaling limit of lower dimensional structures interacting with an environment is studied. These include the Ising model [13], Bernoulli percolation [14], self-avoiding walks [9,11] and random cluster models [12]. All of these approaches rely on some variant of the Ornstein-Zernike theory [29], and they all need correlation inequalities of some kind. For example, in the Ising model, the Edwards-Sokal coupling [19] can be used. No correlation inequalities are known for spatial random permutations, and indeed we expect that finding such inequalities would have a significant impact on the subject area. Our method shares some features of the Ornstein-Zernike approaches above, but does not require correlation inequalities in order to apply. Instead, we obtain decay of correlations by iterative sampling, although this method is not (yet?) quite strong enough for us to obtain diffusive scaling.
The significance of the model of spatial random permutations with a forced cycle goes far beyond the situation that we describe here. In [34] it is shown that (in a suitable variant of spatial random permutations) the ratio of the partition functions of a system with a forced cycle and one without can be used to detect Bose-Einstein condensation: if this ratio stays positive uniformly in the volume and the spatial separation of the two endpoints of the forced cycle, this is equivalent to the presence of off-diagonal long range order [30], which itself is equivalent to Bose-Einstein condensation. Some progress has been made in understanding related models, e.g. the Heisenberg model and its connection to ensembles of mutually self-avoiding walks [33], or the random stirring models introduced by Harris [24] and further analysed in [1,31,25]. The big question about the existence of long loops, however, remains open.
On the probability side, our model is closely related to the loop O(n) model, which was introduced in [17]. In this model a loop configuration ω is an undirected spanning subgraph of a graph G such that every vertex of this subgraph has degree zero or two. The weight of a loop configuration is proportional to e^{−α o(ω)} n^{L(ω)}, where o(ω) is the number of edges in ω, L(ω) is the number of loops and n is a positive real. The case n = 0 corresponds formally to the self-avoiding walk if one forces a path in the system in addition to the loops. The case n = 1 corresponds to the low-temperature representation of the Ising model. If viewed as an ensemble of cycles, spatial random permutations are intimately related to the loop O(n) model with n = 2, since each cycle of the permutation admits two possible orientations. The two models would be equivalent if cycles of length two were forbidden in spatial random permutations. On the hexagonal lattice, the loop O(n) model has been conjectured to undergo a Kosterlitz-Thouless phase transition at the critical threshold log √(2 + √(2 − n)) when n ≤ 2 [27]. This is compatible with our general finding that on every vertex-transitive graph the critical threshold of spatial random permutations, which corresponds more or less to the n = 2 case, is strictly less than the critical threshold for the self-avoiding walk, corresponding to the n = 0 case. Furthermore, it has been conjectured that only short cycles are observed at all values of α when n > 2. This has been rigorously proved only for n large enough by Duminil-Copin, Peled, Samotij, and Spinka in [16], who also provide details on the structure of the typical configurations and provide evidence for the occurrence of a phase transition. Exploring the properties of the model at low values of n is of great interest and entirely open.
Most of the proofs that we present in this paper can be reproduced for the loop O(n) model for all values of n and α large enough, without the restriction to the hexagonal lattice (see Remark 3.2 below for further comments on this).
Our paper is organized as follows. We give precise definitions and state our results in Section 2. In Section 3 we prove our result on non-existence of long cycles, and provide various estimates on the partition functions over different domains which we will need later. In Section 4 we discuss the spatial Markov property and iterative sampling, and derive our results about exponential decay of correlations. In Section 5, we give the proof of our main result, Theorem 2.2, using the results of the previous sections.

Definitions and main results
We consider a finite simple graph G = (V, E). A permutation on G is a bijective map π : V → V so that for all x ∈ V , either π(x) = x or {x, π(x)} ∈ E. We write S_V for the set of all permutations on G, omitting the dependence on the edge set in the notation. Also, when U ⊂ V , we write S_U for the set of permutations on the subgraph (U, E_U) generated by U, where E_U := {{x, y} ∈ E : x ∈ U, y ∈ U}.
For a given π ∈ S_V, we define its energy by H(π) := Σ_{x∈V} 1{π(x) ≠ x}, (2.1) where the indicator above is 1 if π(x) ≠ x and 0 if π(x) = x, and its probability as P(π) := e^{−αH(π)}/Z(V). (2.2) Above, α ∈ R controls the preference of random permutations for fixed points, and Z(V) is called the partition function. The expression (2.1) is less general than it could be. By replacing the indicator function in that formula with edge weights d : E → R, {x, y} ↦ d(x, y), we can generalize the definition of random permutations on graphs sufficiently so that classical cases including the quadratic jump penalization (see e.g. [2,5]) are covered. While some of our results below (most notably the iterative sampling procedure) can be adapted to hold in this general situation, our main results compare the behavior of random permutations to the behavior of self-avoiding paths. Therefore, we prefer to stick with the narrower definition (2.1).
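As a concrete illustration (our own, not part of the paper), the measure (2.2) can be computed by brute-force enumeration on a tiny graph; here we take a 4-cycle and α = 1, both of which are arbitrary illustrative choices.

```python
import itertools
import math

def nn_permutations(vertices, edges):
    """Enumerate all permutations pi of `vertices` with pi(x) = x or
    {x, pi(x)} an edge, i.e. the set S_V of the text."""
    E = {frozenset(e) for e in edges}
    for images in itertools.permutations(vertices):
        pi = dict(zip(vertices, images))
        if all(pi[x] == x or frozenset((x, pi[x])) in E for x in vertices):
            yield pi

def gibbs_measure(vertices, edges, alpha):
    """P(pi) = exp(-alpha * H(pi)) / Z(V), H(pi) = #{x : pi(x) != x}."""
    weights = {}
    for pi in nn_permutations(vertices, edges):
        H = sum(pi[x] != x for x in vertices)
        weights[tuple(pi[x] for x in vertices)] = math.exp(-alpha * H)
    Z = sum(weights.values())
    return {k: w / Z for k, w in weights.items()}

# A 4-cycle: 0-1-2-3-0.  Its nearest-neighbour permutations are the
# identity, 4 single transpositions along edges, 2 double transpositions,
# and the 2 orientations of the full cycle: 9 permutations in total.
V = (0, 1, 2, 3)
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
P = gibbs_measure(V, E, alpha=1.0)
print(len(P), "permutations; identity has mass", P[(0, 1, 2, 3)])
```

For α > 0 the identity carries the largest mass, reflecting the preference for fixed points mentioned above.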
The most interesting objects in spatial random permutations are their cycles. For π ∈ S_V and z ∈ V, the cycle of π containing z is the directed graph with vertex set {π^k(z) : k ∈ N} and edge set {(x, π(x)) : x ∈ {π^k(z) : k ∈ N}, π(x) ≠ x}. We will denote it by γ_z(π). We will regularly abuse the notation and use the symbol γ_z(π) also to denote the vertex set of the cycle, viewed as a subset of V. We denote the total number of edges of γ_z(π) by ‖γ_z‖ and refer to it as the length of γ_z. In the special case where π(z) = z, γ_z(π) has vertex set {z} and empty edge set, and therefore length 0. It is known [5] that there exists some α_0 > 0 so that for all α > α_0, long cycles are exponentially unlikely. The first result of the present paper is to sharpen this statement by providing some information about the value of α_0. To state it, let us recall the definitions of the connective constant and of the cyclic connective constant.
A self-avoiding path in a graph G is a finite directed subgraph (U, E′) of G such that there is an enumeration (x_1, x_2, . . . , x_n) of U with the property that E′ = {(x_i, x_{i+1}) : 1 ≤ i < n}. Note that by considering the graph (U, E′) as directed, we give the path an orientation. For an infinite, vertex-transitive graph (Z^d with the nearest neighbour edge structure being the most important example), we single out a vertex 0 ∈ V and call it the origin. We write SAW_n (respectively SAP_n) for the set of all self-avoiding paths (respectively cycles) starting from 0, with n edges. Thus, we have that SAW_0 = SAP_0 = {0}. Then the limits μ°_G := lim_{n→∞} |SAP_n|^{1/n} (2.3) and μ_G := lim_{n→∞} |SAW_n|^{1/n} (2.4) exist. For (2.4) this follows from a sub-additivity argument [22], while for (2.3) it follows from the fact that |SAP_n| ≤ |SAW_n|. The latter also immediately shows μ°_G ≤ μ_G for all vertex-transitive graphs G.
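To illustrate the definition (2.4), one can count self-avoiding walks on Z^2 exhaustively for small n (our own sketch; the counts match the known enumeration, while |SAW_n|^{1/n} approaches μ_{Z^2} ≈ 2.638 only slowly).

```python
def count_saws(n):
    """Count n-step self-avoiding walks from the origin in Z^2 by
    depth-first enumeration (exponential time, so small n only)."""
    def rec(visited, last, steps):
        if steps == 0:
            return 1
        x, y = last
        total = 0
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in visited:
                visited.add(nxt)
                total += rec(visited, nxt, steps - 1)
                visited.remove(nxt)
        return total
    return rec({(0, 0)}, (0, 0), n)

counts = [count_saws(n) for n in range(1, 7)]
print(counts)                    # [4, 12, 36, 100, 284, 780]
print(counts[-1] / counts[-2])   # a crude estimate of the connective constant
```

The ratio 780/284 ≈ 2.75 overshoots the connective constant; much larger n (and smarter enumeration) would be needed for a sharp estimate.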
Hammersley [23] proved the remarkable fact that μ°_{Z^d} = μ_{Z^d}. We are now ready to state our first main result.

Theorem 2.1. Let G be an infinite vertex-transitive simple graph of bounded degree, and let α_0 be the unique solution of equation (2.5). Then for all α > α_0 there exist constants C_0(α), c_0(α) > 0 such that for any finite subgraph of G generated by U ⊂ V, for all z ∈ U, and for all ℓ ∈ N, P_U(‖γ_z‖ ≥ ℓ) ≤ C_0(α) e^{−c_0(α)ℓ}. (2.6) Moreover, we can choose C_0(α) and c_0(α) so that lim_{α→∞} c_0(α)/α = 1 and lim sup_{α→∞} C_0(α) < ∞.
It is interesting to compare the above result with the findings of Duminil-Copin, Kozma and Yadin [15], who study self-avoiding walks on Z^d with an energy proportional to their length. To connect to the present paper, it is best to view them as random permutations conditioned on not having any cycles except the one through the origin. In [15] it is shown that when α < log μ_{Z^d}, the resulting self-avoiding cycle is weakly space filling, in particular its expected length is infinite. Theorem 2.1 shows that without the conditioning on having just one cycle, the situation is drastically different: since μ°_{Z^d} = μ_{Z^d} and since the solution α_0 of (2.5) is strictly smaller than log μ_{Z^d}, in the interval (α_0, log μ_{Z^d}) the lonely self-avoiding cycle has infinite expected length (as the relevant subgraph becomes large), while the length of the cycle through the origin in ordinary spatial random permutations has exponential tails.
For our second main result, we restrict our attention to cylindrical subgraphs of Z^d. For n ∈ N, let Λ_n := ([0, n] × (−n/2, n/2]^{d−1}) ∩ Z^d. Elements of Λ_n will be written in the form x = (x̄, x̂) with x̄ ∈ Z and x̂ ∈ Z^{d−1}. We impose cylindrical boundary conditions on (−n/2, n/2]^{d−1} ∩ Z^{d−1} (but not on [0, n] ∩ Z), and edges are then between nearest neighbours in Λ_n. We will denote the resulting graph by the same symbol Λ_n if no confusion can arise. For a subset A ⊂ Λ_n, we will also write A for the subgraph of Λ_n that retains all of the edges where both endpoints lie in A. For a, z ∈ A, we define S^{a→z}_A as the set of maps π : A → A that agree with a permutation of A except that the cycle containing a is opened up into a self-avoiding path from a to z; in particular, π(x) = x or {x, π(x)} is an edge of A for all x ∈ A. It is easy to see that S^{a→z}_A = ∅ if the vertex a is disconnected from z in the graph A, and that z = lim_{k→∞} π^k(a) otherwise. For given π ∈ S^{a→z}_A, we will always write γ(π) = Orb_π({a}). We have π(A) = A \ {a}, and γ(π) is the trace of a self-avoiding walk starting at a and ending in z that is embedded in π. Thus the probability measure P^{a→z}_A defined through P^{a→z}_A(π) := e^{−αH(π)}/Z^{a→z}(A) (2.7) describes a step-weighted self-avoiding walk interacting with a background of spatial random permutations. We also define S^{a→}_{Λ_n} as the union of the S^{a→z}_{Λ_n} over all z with z̄ = n. The probability measure on S^{a→}_{Λ_n} will again be (2.7), except that the normalisation is now given by Z^{a→}(Λ_n) = Σ_{z: z̄ = n} Z^{a→z}(Λ_n). We can now state the main result of this paper. Theorem 2.2. There exist α_0 > 0, D < ∞ and N ∈ N so that for all M > 0, sup_{n≥N, α>α_0} P^{0→}_{Λ_n}( max{|ŷ| : y ∈ γ} > M √(n log n) ) < D/M.
Thus for large α, the self-avoiding walk embedded into π converges to a straight horizontal line, and its vertical fluctuations can be proved to be at most just a bit larger than √n. Indeed, we expect that the true vertical fluctuation is exactly of order √n and that γ converges to a Brownian motion under diffusive scaling. This is known in the case of a self-avoiding walk without a background of spatial random permutations [26]. In our situation, the strong correlations prevent us from obtaining the presumably sharp upper bound on the fluctuations, and indeed also prevent a useful lower bound. We will comment on the places where we lose the necessary accuracy for diffusive behaviour at the end of the proof of Theorem 2.2.

Cycle length and partition function
In this section we prove Theorem 2.1 and provide further estimates comparing partition functions for different subsets of V. Our Theorem 2.1 treats only the case of vertex-transitive graphs, since we want quantitative estimates involving the connective constant; but it is not difficult to modify our proof to treat general graphs, including those with edge weights. In the latter case, some modifications are necessary, as the graph distance is no longer a good quantity to measure the distance between sets. We do not pursue this further in the present paper.
Recall that for a self-avoiding path or cycle γ, ‖γ‖ denotes the number of its edges and |γ| denotes the number of its vertices. Our first comparison is

Proposition 3.1. For any finite simple graph G = (V, E) and any self-avoiding path or cycle γ ⊂ G, we have that Z(V \ γ)/Z(V) ≤ (1 + e^{−2α})^{−‖γ‖/2}. (3.1)

Proof. Let γ ⊂ V be a self-avoiding path or a cycle. Let us denote by {x_0, x_1, . . . , x_{|γ|−1}} the sequence of sites of γ, ordered such that (x_i, x_{i+1}) is an edge of γ. We let M be the largest even number such that M ≤ |γ|. We claim that Z(γ) ≥ (1 + e^{−2α})^{M/2}, (3.2) where we recall our abuse of notation: γ denotes the sites occupied by the cycle γ, and Z(γ) is the partition function of the subgraph generated by those sites. To see (3.2), we partition γ into the pairs (x_0, x_1), (x_2, x_3), . . . , (x_{M−2}, x_{M−1}). Let γ̄ be the graph obtained from γ by keeping only edges connecting vertices of the pair (x_i, x_{i+1}), for even i ∈ [0, M − 1], and removing all the other edges. Clearly, Z(γ) ≥ Z(γ̄), since γ̄ is a subgraph of γ. Since the graph γ̄ is composed of M/2 disjoint subgraphs, each containing two vertices connected by an edge, and since the contribution of each of these subgraphs is (1 + e^{−2α}), (3.2) follows.

Now note that, by Proposition 4.1 (i), Z(V) ≥ Z(γ)Z(V \ γ). Note also that if γ is a cycle, then |γ| = ‖γ‖ is even and we can set M = ‖γ‖, while if γ is a self-avoiding path, then M ≥ |γ| − 1 = ‖γ‖. Thus, we have that if γ is a self-avoiding path or a cycle, then Z(V \ γ)/Z(V) ≤ Z(γ)^{−1} ≤ (1 + e^{−2α})^{−‖γ‖/2}, which proves (3.1). The estimate of Proposition 3.1 is logarithmically sharp if the cycle γ is "stretched out" in the sense that the subgraph generated by the sites of γ contains no further edges beyond those of γ. If γ is "curly", meaning that many points in the relevant subgraph are connected by more than two edges, one could use these edges to get better lower bounds on Z(γ) and thus better upper bounds on the ratio Z(V \ γ)/Z(V). The associated combinatorics do not look easy even in the case of V ⊂ Z^2, though.
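The counting behind (3.2) can be checked numerically on a small example (our own sketch; here γ is a path on six vertices, so M = 6, and on a path graph the nearest-neighbour permutations are exactly the matchings, since longer cycles cannot close up on a tree).

```python
import itertools
import math

def Z(vertices, edges, alpha):
    """Partition function of nearest-neighbour permutations:
    sum of exp(-alpha * #{x : pi(x) != x}) over all admissible pi."""
    E = {frozenset(e) for e in edges}
    total = 0.0
    for images in itertools.permutations(vertices):
        pi = dict(zip(vertices, images))
        if all(pi[x] == x or frozenset((x, pi[x])) in E for x in vertices):
            total += math.exp(-alpha * sum(pi[x] != x for x in vertices))
    return total

alpha = 0.5
# gamma: a self-avoiding path on 6 vertices, so M = 6 and the pairing
# bound (3.2) reads Z(gamma) >= (1 + e^{-2 alpha})^{M/2}.
verts = list(range(6))
edges = [(i, i + 1) for i in range(5)]
lower = (1 + math.exp(-2 * alpha)) ** (len(verts) // 2)
assert Z(verts, edges, alpha) >= lower
print(Z(verts, edges, alpha), ">=", lower)
```

For the six-vertex path the exact value is the matching polynomial 1 + 5e^{−2α} + 6e^{−4α} + e^{−6α}, visibly larger than the pairing bound, consistent with the remark that "curly" configurations only improve the estimate.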
We are now ready to prove our first main theorem.
Proof of Theorem 2.1. Let G be an infinite vertex-transitive simple graph of bounded degree, let U be a finite subset of V. For π ∈ S_U, x ∈ U, and a cycle γ̃ ⊂ U, we have P_U(γ_x = γ̃) = e^{−α‖γ̃‖} Z(U \ γ̃)/Z(U). Let μ°_G be the cyclic connective constant of G. The definition of the cyclic connective constant (2.3) implies that for every δ > 0 there exists C_δ > 0 such that for any n ∈ N, |SAP_n| ≤ C_δ (μ°_G + δ)^n. By vertex transitivity, the same bound holds for SAP_n(x), the set of self-avoiding cycles of length n starting in x. By Proposition 3.1, we then have that for all ℓ ∈ N_+, P_U(‖γ_x‖ ≥ ℓ) ≤ Σ_{n≥ℓ} C_δ (μ°_G + δ)^n e^{−αn} (1 + e^{−2α})^{−n/2}. Recall that α_0 is defined as the unique solution of (2.5) and that therefore it satisfies 0 < α_0 < log(μ°_G). For each α > α_0, we can find δ(α, G) > 0 small enough such that q(α) := (μ°_G + δ(α, G)) e^{−α} (1 + e^{−2α})^{−1/2} < 1. Thus P_U(‖γ_x‖ ≥ ℓ) ≤ C_{δ(α,G)} q(α)^ℓ /(1 − q(α)), which is (2.6) with c_0(α) = α + (1/2) log(1 + e^{−2α}) − log(μ°_G + δ(α, G)). It is not difficult to see that for all large enough α, we can choose δ(α, G) = 1, and that then lim_{α→∞} c_0(α)/α = 1. This concludes the proof of the theorem.

Remark 3.2. For the loop O(2) model, corresponding to random permutations without cycles of length 2, the exponential bound (2.6) of Theorem 2.1 would still hold true, but only for α > log μ. Exponential decay of the cycle length for the loop O(n) model has been proved in the paper [16] for any value of α when n is large.
The next proposition uses similar ideas as in the previous proof in order to give a complementary bound to the one of Proposition 3.1.
Proposition 3.3. Let G be an infinite vertex-transitive simple graph, and let α_0 be the unique solution of equation (2.5). For any α > α_0 there exists c_1(α) > 0, with lim_{α→∞} c_1(α) = 1, such that for any finite U ⊂ V and for all A ⊂ U, Z(U \ A)/Z(U) ≥ c_1(α)^{|A|}.

Proof. Fix x ∈ U. Using the notation as in the proof of Proposition 3.1, we get Z(U) = Z(U \ {x}) + Σ_{n≥2} Σ_{γ̃ ∈ SAP_n(x), γ̃ ⊂ U} e^{−αn} Z(U \ γ̃). With γ̃ ∈ SAP_n(x), γ̃ ⊂ U, for n ≥ 2, we apply Proposition 3.1 to the self-avoiding path γ̃ \ {x} (of length n − 2) and the graph U \ {x}, giving Z(U \ γ̃)/Z(U \ {x}) ≤ (1 + e^{−2α})^{−(n−2)/2}. Thus, Z(U)/Z(U \ {x}) ≤ 1 + Σ_{n≥2} C_{δ(α,G)} (μ°_G + δ(α, G))^n e^{−αn} (1 + e^{−2α})^{−(n−2)/2} =: c_1(α)^{−1}. Above, the constants are the same as in Theorem 2.1. By our knowledge of C_{δ(α,G)} and c_0(α), clearly the right hand side of the above equation converges to one as α → ∞. Thus we have shown the claim for sets A = {x} consisting of a single point. For general A = {x_1, . . . , x_m}, the claim follows from the telescopic product Z(U \ A)/Z(U) = Π_{i=1}^{m} Z(U \ {x_1, . . . , x_i})/Z(U \ {x_1, . . . , x_{i−1}}) ≥ c_1(α)^m.

Iterative sampling and the spatial Markov property
Spatial random permutations have a fundamental spatial Markov property, which we will first discuss in a relatively simple form. Then we introduce a sort of strong Markov property, and finally we present iterative sampling, which is a basic new technique that enables many of our proofs. Let G = (V, E) be a finite graph. For B ⊂ V, we define the sigma algebra F_B := σ(π(x), π^{−1}(x) : x ∈ B). Put differently, the value f(π) of an F_B-measurable function is determined by the set of values π(x) and π^{−1}(x) for x ∈ B. We let Inv(π) := {A ⊂ V : π(A) = A} be the family of π-invariant sets. The Markov property, statement (iii) in Proposition 4.1 below, says that conditional on A ∈ Inv, an F_A-measurable random variable and an F_{A^c}-measurable random variable are independent. In order to state it correctly, let us recall that P_A denotes the probability measure (2.2) on the subgraph generated by A, and write E_A for the corresponding expectation. For π ∈ S_A we write π ⊕ id for the permutation in S_V that is obtained by setting π(x) = x for all x ∉ A.
Proof. By the additivity of H_V, we have for F_A-measurable f and F_{A^c}-measurable g that Inserting f = 1 and g = 1 into (4.2) and dividing by Z(V) gives (i). Setting g = 1 in (4.2) and dividing by , and thus (ii). For (iii), note that dividing on the left hand side by using (i), while on the right hand side it gives )). The remaining equality follows from (ii). For (iv), it suffices to use

Following ideas from the theory of Markov chains, we can also formulate a strong Markov property for random permutations. A set-valued random variable Q : S_V → P(V) will be called an admissible random invariant set if Q(π) ∈ Inv(π) for all π ∈ S_V, and if the event Moreover, we define the sigma algebra We then have

Proof. Since Q is admissible, it is F_Q-measurable, and so the right hand side above is also F_Q-measurable. Moreover, for an F_Q-measurable random variable g, we have Above, the second equality is true since Q = A implies A ∈ Inv, and the third equality holds by Proposition 4.1 (ii) and the fact that 1{Q = A}g is F_A-measurable for each F_Q-measurable g. The fourth equality is due to Proposition 4.1 (iii), which is applicable since f is F_B-measurable and thus in particular F_{A^c}-measurable as A ∩ B = ∅. The claim is thus shown.
We demonstrate the usefulness of the strong Markov property by applying it to the question of decay of dependence on the boundary conditions of spatial permutations.
Consider subsets B ⊂ U ⊂ V, where we think of U as being large and differing from V only "near the boundary", and of B as being "near the center" of V. We are interested in the difference between E_V(f) and E_U(f) in cases where the graph distance between U^c and B becomes large and f is F_B-measurable. In other words, we are interested in how much the precise shape of the graph at the boundary influences the expectation of local random variables. This is a classical question in any theory related to Gibbs measures. Our answer to this question is the crude but useful estimate that is provided in the next proposition.
Let G̃ be the disjoint union of two copies of G. Let Ṽ = V_1 ∪ V_2 be its vertex set, where V_1 and V_2 are disjoint copies of V. Let φ : Ṽ → Ṽ be the natural symmetry on Ṽ, i.e. the map such that each x ∈ V_1 is mapped to the vertex in V_2 that corresponds to x when V_1 and V_2 are identified with V. For π̃ ∈ S_Ṽ, let Q_A(π̃) be the minimal π̃-invariant set that contains A and is compatible with φ. The latter means that φ(Q_A) = Q_A. See also Figure 2. Since the intersection of two π̃-invariant, φ-compatible sets containing A retains these properties, and since Ṽ has them, it is clear that such a minimal set exists.

Figure 2: The black set in V_1 corresponds to B, the dark set in V_1 corresponds to U_1, the dark dotted line in V_1 and V_2 represents the boundary of the (random) minimal φ-compatible invariant set containing A ⊂ V_1.
Proof. As before, the superscript ˜ will refer to objects that are defined in G̃, the disjoint union of two copies of G. Any graph permutation π̃ on Ṽ can be written as Then and (ii): every strict subset of D fails to have at least one of the three properties. gives, where B is regarded as a subset of V_1. The first term of the right hand side is equal to . Thus we can do the same calculation with f_2 instead of f_1, and subtract the results. If in addition we divide by E_Ṽ(1{π̃|_A = id}) we arrive at which implies (4.3) and concludes the proof of the proposition.
When we want to compare two different subsets U 1 and U 2 of V , we could slightly adapt the definition of A above to get a similar estimate. In most cases, simply using the triangle inequality on (4.3) is enough.
In order to estimate the right-hand side of (4.3), we now need a tool to show the existence of invariant subsets with certain prescribed symmetries, containing certain prescribed subsets of V. This tool is iterative sampling. Iterative sampling is a procedure to build a P_V-distributed random variable step by step, reminiscent of the way that the full path of a Markov chain can be sampled step by step. In words, what we do is the following: we first sample a random permutation from P_V, but keep only the cycles intersecting a (possibly random) set K_1. The only restriction is that K_1 may not depend on the permutation we just sampled; it has to be chosen independently. We end up with a set D_1 where we have kept the cycles, and a set B_1 where we have discarded them, where D_1 and B_1 are disjoint and their union is V. We then resample the permutation, independently, inside the subgraph generated by B_1. Again, we only keep the cycles intersecting some set K_2 that may depend on what has happened before and on some external randomness, but not on the permutation inside B_1. We carry on until we exhaust V. Formally, the definition is as follows. Definition 1. A sampling strategy for spatial random permutations on a finite graph G = (V, E) consists of two families of random variables (σ_A)_{A⊂V} and (K_A)_{A⊂V} such that (i): for all A, σ_A is a P_A-distributed random variable, in particular has values in S_A, and (σ_A)_{A⊂V} is a family of independent random variables.
(ii): K_A takes values in the power set P(A) for all A, and P(K_A = ∅) = 0 when A ≠ ∅.
Note in particular that we require that K_A is independent of σ_A. Given a sampling strategy, we define a recursive sampling procedure as follows: we set B_1 = V, and for i ≥ 1 we define D_i and B_{i+1} recursively as in (4.7). Here Or_π(A) = {π^j(x) : x ∈ A, j ∈ N} denotes the orbit of A under π. This way we end up with a sequence (B_i, D_i, π_i), where D_i ∈ Inv(π_i) and where the D_i form a partition of V. It follows that the resulting map π is a random element of S_V.
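The elementary operation in each step of this procedure is extracting Or_π(K), the union of all cycles of the current permutation that meet K, and splitting off the remainder. A minimal sketch, assuming a permutation encoded as a Python dict (an encoding chosen here purely for illustration, not used in the paper):

```python
def orbit(perm, K):
    """Or_pi(K): union of all cycles of perm that intersect K."""
    D = set()
    for x in K:
        if x in D:
            continue
        y = x
        while True:  # walk the cycle through x until it closes
            D.add(y)
            y = perm[y]
            if y == x:
                break
    return D

def sampling_step(V, perm, K):
    """One step of iterative sampling: keep the cycles meeting K,
    return the kept set D and the remainder B = V \\ D on which
    the permutation is resampled in the next step."""
    D = orbit(perm, K)
    return D, set(V) - D
```

For the permutation with cycles (0 1) and (2 3 4), `sampling_step(range(5), perm, {0})` keeps D = {0, 1} and leaves B = {2, 3, 4} for resampling.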
Lemma 4.4. For any sampling strategy, the distribution of π is P V .
Proof. We prove the lemma by induction on the number of sets K_i. Let us call (K_A)_{A⊆V} an n-step sampling strategy if K_A(ω) = A for all ω in the underlying probability space such that A = B_{n+1}(ω).
Since the event {A = B_{n+1}} is independent of (σ_Ã)_{Ã⊆A}, any sampling strategy can be turned into an n-step sampling strategy, and the union over the n-step sampling strategies for all n gives all sampling strategies. For a one-step sampling strategy, we have (4.9). For fixed A, all three events in the probability above are independent. Since H_{D_A(π)}(π) + H_{V\D_A(π)}(π) = H_V(π), the product of the two terms above does not depend on A and is equal to P_V(π), and summing (4.9) over A gives the result in the case of one-step sampling strategies. Now assume that the claim holds for all n-step sampling strategies up to some n ∈ N. Let (σ_A, K_A)_{A⊂V} be an (n+1)-step sampling strategy. For fixed π̃ ∈ S_V and K̃ ⊂ V with P(K_V = K̃, σ_V = π̃) > 0, we define the conditional measure P_{π̃,K̃}. We write D̃ = Or_π̃(K̃) and B̃ = V \ D̃, and claim that under the measure P_{π̃,K̃}, the families (σ_A)_{A⊂B̃} and (K_A)_{A⊂B̃} form a sampling strategy for random permutations on the graph generated by B̃. To check this claim, note that the event {K_V = K̃, σ_V = π̃} is independent of the family (σ_A)_{A⊂B̃}, and thus for all choices of π_A ∈ S_A, we have P_{π̃,K̃}(σ_A = π_A ∀A ⊂ B̃) = P(σ_A = π_A ∀A ⊂ B̃). This shows independence of the family (σ_A)_{A⊂B̃}. Furthermore, for A ⊂ B̃, κ ⊂ A, and A_1, . . . , A_m ⊆ A, we have (4.10). Thus, using (4.10) from right to left, we see that K_A is independent of the family (σ_Ã)_{Ã⊂A} under P_{π̃,K̃}, and thus we have indeed a sampling strategy, which in addition is clearly an n-step sampling strategy.
In the last line we used the induction hypothesis and the independence of K_V and σ_V. For fixed K̃, the sum over π̃ can now be carried out. The result now follows by summing over K̃.
Let us now come back to the task of constructing invariant sets with prescribed symmetries. We will be slightly more general than in the discussion leading up to (4.3), as this generality will be needed further on. Let G = (V, E) be a finite graph. A symmetry of G is a bijective map φ on V such that for all x, y ∈ V, {φ(x), φ(y)} ∈ E if and only if {x, y} ∈ E. For a group Φ of symmetries on G and an element π ∈ S_V, we say that U ⊂ V is a Φ-compatible π-invariant set if U is invariant under π as well as under all φ ∈ Φ.
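Concretely, a set U is Φ-compatible and π-invariant exactly when it is closed under π and under every φ ∈ Φ. A small sketch of this check, with the permutation and the symmetries encoded as dicts (an encoding chosen here for illustration):

```python
def is_phi_compatible_invariant(U, perm, symmetries):
    """True iff U is invariant under perm and under every phi in symmetries."""
    U = set(U)
    pi_invariant = all(perm[x] in U for x in U)
    phi_compatible = all(phi[x] in U for phi in symmetries for x in U)
    return pi_invariant and phi_compatible
```

For example, with π = (0 1)(2 3) and Φ generated by the swap φ: 0↔2, 1↔3, the set {0, 1} is π-invariant but not Φ-compatible, while the full vertex set is both.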
Let us first describe our sampling procedure in words. We start with the full graph (V, E) and A ⊂ V, and choose the symmetrization Φ(A) := ∪_{φ∈Φ} φ(A) of A as the first set whose cycles we want to keep. We draw a sample σ_V and look at the cycles intersecting K_1 := Φ(A). If none of the cycles of σ_V intersecting K_1 leaves K_1, then K_1 is already a Φ-compatible, π-invariant subset of V, and we are done. Otherwise, we take D_1 = Or(K_1) and B_2 = V \ D_1. For a point x ∈ D_1 \ K_1, some of its images under maps φ ∈ Φ might be elements of B_2; since x is an element of B_2^c, this means that B_2^c is not Φ-compatible in this case. We define K̃_2 = B_2 ∩ Φ(B_2^c) (see also Figure 3). K̃_2 may still be empty, e.g. if we are lucky enough that all cycles leaving K_1 jointly cover a Φ-compatible set. In that case, we are done, and B_2^c is the desired set, as it is Φ-compatible, π-invariant and contains A. Otherwise, we put K_2 = K̃_2. By the considerations we have just made, it follows that the cardinality of K_2 is bounded by (|Φ| − 1)|D_1 \ K_1|. This estimate will be useful later on. We now sample a random permutation σ_{B_2}, keep all the cycles intersecting K_2, and arrive at a set D_2. Again, if D_2 = K_2, then B_3^c = D_1 ∪ D_2 is a Φ-compatible, π-invariant subset of V, and we are done. Otherwise, we repeat the procedure. As before, we have |K_3| ≤ (|Φ| − 1)|D_2 \ K_2|. We continue in this way until we exhaust V (in which case we have found no non-trivial Φ-compatible subset), or until at some point D_j = K_j, in which case B_{j+1}^c = ∪_{i≤j} D_i is a Φ-compatible, π-invariant set containing A.
Formally, the sampling strategy is the following. The family (σ_B)_{B⊂V} consists of independent random variables, where σ_B is P_B-distributed. For a given subset A, the maps K_B are defined by (4.11). It is obvious that this is indeed a sampling strategy. For any configuration ω we put Â(ω) as in (4.12). Then Â(ω) is a π(ω)-invariant subset, compatible with Φ, and containing A. Recall from Lemma 4.4 that π is distributed according to P_V. We say that U separates A from B if U contains A and U ∩ B = ∅. Thus, if in addition Â(ω) ∩ B = ∅ for some subset B, we have found a Φ-compatible, π(ω)-invariant subset separating A from B. We summarize: Proposition 4.5. For A, B ⊂ V and Â defined as in (4.12), we have the corresponding separation estimate. The previous proposition will only be useful if we have a way to control the probability that the random set Â intersects B. At this point, we have no better tool than the following very crude estimate: if we write d for the graph distance on G, then (4.13). Inequality (4.13) holds true since by construction, from every x ∈ Â a path in G that is entirely contained in Â leads to Φ(A). Fortunately, this crude estimate will be sufficient for our purposes. We therefore now consider bounds on the cardinality of Â. We next introduce the notion of a random permutation with bounded cycle length. Although so far in this section we have considered only finite graphs, we introduce this notion for graphs that may be finite or infinite. Recall that a random variable X is stochastically dominated by another random variable Y if P(X ≥ k) ≤ P(Y ≥ k) for all k. Let ξ be a random variable on the positive integers, and consider a graph (V, E) that may be finite or infinite. We say that our random permutation model has cycle length bounded by ξ in (V, E) if, uniformly in all finite sub-volumes U ⊂ V and all points x ∈ U, the length of the cycle γ_x containing x is stochastically dominated by ξ. In other words, for all ℓ ≥ 1 we assume (4.14). We use this notion in the next proposition.
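The domination condition (4.14) is a family of tail inequalities, which is straightforward to verify numerically for finitely supported laws. A sketch, assuming both laws are given as dicts mapping integer values to probabilities (a representation chosen here for illustration):

```python
def stochastically_dominated(law_x, law_y, tol=1e-12):
    """True iff P(X >= k) <= P(Y >= k) for all k, where X ~ law_x
    and Y ~ law_y are finitely supported integer laws."""
    for k in set(law_x) | set(law_y):
        tail_x = sum(p for v, p in law_x.items() if v >= k)
        tail_y = sum(p for v, p in law_y.items() if v >= k)
        if tail_x > tail_y + tol:
            return False
    return True
```

For instance, a cycle-length law concentrated on short cycles, {1: 0.8, 2: 0.2}, is dominated by the heavier law {1: 0.5, 2: 0.3, 3: 0.2}, but not conversely.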
Proposition 4.6. Let ξ be an integer-valued random variable, and assume that a random permutation has cycle length bounded by ξ on the finite graph (V, E). Let (ξ_n)_{n∈N} be a sequence of independent copies of ξ. Then for each A ⊂ V and each ℓ ∈ N, we have the corresponding domination bound. Proof. We consider the following sampling strategy: let |A| = n and let x_1, . . . , x_n be the elements of A. We sample the cycles intersecting A one by one, in the order of the x_j. In other words, for B with B ∩ A ≠ ∅ we set j_B = min{i ≤ n : x_i ∈ B}. This way, we get a sampling procedure with sets (B_i)_{i∈N}. Let L_i = |γ_{x(B_i)}| be the number of vertices in the i-th cycle that is sampled. When A is exhausted after τ steps, we set L_i = 0 for all i > τ. Note that τ(ω) ≤ n for any ω, where ω denotes the realization in the probability space of the sampling procedure. The sequence (L_i)_{i∈N} of random variables is definitely not a Markov chain, and usually L_i will not even be measurable with respect to the σ-algebra generated by all the other L_k. However, it has the property that for given B̃ ⊂ V and given cycle γ̃, the conditional law of the next cycle length is dominated by that of ξ. Since L_k = |γ_k|, we conclude that when F_k is the σ-algebra generated by L_1, . . . , L_k, the domination holds for all ω. We have just checked Assumption 2 of our comparison Lemma, which we give in the Appendix as Lemma 5.6. In the notation of that lemma, (M^1_j)_{j≥1} = (L_j)_{j≥1} and (M^2_j)_{j≥0} = (ξ_j)_{j≥0}. It is easy to see that Assumptions 1 and 3 also hold. The second statement of Lemma 5.6 then implies the claimed bound for any k ∈ N. The statement of our proposition now follows when observing that |Or(A)| = Σ_{j=1}^{|A|} L_j. Proposition 4.7. Consider a sampling strategy with the property that there exists M ∈ N such that under the recursive sampling procedure (4.7), the inequality |K_{B_{i+1}}| ≤ M|D_i \ K_{B_i}| holds almost surely in the case D_i \ K_{B_i} ≠ ∅, and K_{B_{i+1}} = B_{i+1} otherwise. Let N be the random number of steps such that K_{B_N} = B_N, and let Â = B^c_N be the set sampled before N (see also (4.12)).
Assume further that the random permutation has cycle length bounded by an integer-valued random variable ξ on (V, E), and let (Z_j)_{j≥0} be a Galton-Watson process whose offspring is distributed according to M(ξ − 1), with initial population M|K_V|. Let W be the (possibly infinite) total population of the Galton-Watson process. Then for each ℓ ∈ N, the corresponding tail bound holds. Write q_i(ω) = |D_i(ω) \ K_{B_i(ω)}(ω)| for brevity. Then for given sets B̃ and K̃, we obtain the corresponding estimate, where in the last step we used Proposition 4.6. Let F_j be the σ-algebra generated by (q_k)_{k≤j}. Since |K_{B_j(ω)}(ω)| ≤ M q_{j−1}(ω) for any ω, we conclude that (4.16) holds for any ℓ, m ∈ N, any integer j ≥ 1, and any ω such that q_{j−1}(ω) = m. Now we can again apply our comparison Lemma 5.6. In the notation of that lemma, (M^1_j)_{j≥0} = (q_j)_{j≥0} and (M^2_j)_{j≥0} = (Z_j/M)_{j≥0}, the Galton-Watson process rescaled by the constant M. Note that by Lemma 5.7, the Markov chain (M^2_j)_{j≥0} satisfies Assumption 1 of our comparison Lemma, and by (4.16), Assumption 2 is fulfilled. Furthermore, since (Z_j/M)_{j∈N} has initial population Z_0/M = q_0 = |K_V|, Assumption 3 of the comparison Lemma is also fulfilled. Conclusion (b) of the comparison Lemma then gives the claimed estimate. The proof is finished.
Galton-Watson processes where the expected offspring per individual is strictly less than one have exponentially bounded total population. We can use this for Proposition 4.8. Consider the sampling strategy described before Proposition 4.5. Assume that the random permutation has cycle length bounded by ξ on the finite graph (V, E), and that (|Φ|−1)(E(ξ)−1) < 1. Then there exist constants C_0 > 0 and κ_0 > 0, depending on ξ and |A| but not on V, so that the tail of |Â| is exponentially bounded, where Â has been defined in (4.12). Proof. Observe that our sampling strategy (4.11) fulfils |K_{B_{i+1}}| ≤ (|Φ| − 1)|D_i \ K_{B_i}| almost surely, so Proposition 4.7 can be applied with M = |Φ| − 1. Thus, the first statement follows from subcriticality of the Galton-Watson process. For the second statement, we use (4.13). We will be interested in cases where the set A itself is large. Then the above estimate is too weak, since the constant C_0 there depends on |A|, and we have too little control on its growth with |A|. We therefore need the following refinement, which we state only in the context of comparing expectations of local functions under different boundary conditions. An adaptation to the existence of Φ-compatible sets is straightforward. Proposition 4.9. Let (V, E) be a finite graph, and B ⊂ U ⊂ V. Assume that a random permutation on (V, E) has cycle length bounded by a random variable ξ with E(ξ) < 2. Let C > 0 and κ > 0 be two constants such that the total offspring W of a Galton-Watson process with initial population 1 and offspring distribution ξ − 1 fulfils P(W > n) ≤ C e^{−2κn}. Then, for any F_B-measurable function f we have the corresponding bound. Above, d is the graph distance on G.
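The subcriticality mechanism used here is easy to check by simulation: when the mean offspring m is strictly below one, the total population W started from z_0 individuals is almost surely finite with mean z_0/(1 − m). A sketch with a toy offspring law (the law below is illustrative, not the ξ of the paper):

```python
import random

def total_population(z0, offspring, rng, cap=10**6):
    """Total population of a Galton-Watson process started from z0
    individuals; offspring(rng) draws one offspring count."""
    alive, total = z0, z0
    while alive > 0 and total < cap:
        alive = sum(offspring(rng) for _ in range(alive))
        total += alive
    return total

rng = random.Random(0)
# Toy subcritical offspring: 1 with prob. 0.3, else 0, so m = 0.3.
offspring = lambda r: 1 if r.random() < 0.3 else 0
samples = [total_population(1, offspring, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
# Theory predicts E[W] = z0 / (1 - m) = 1 / 0.7.
```

With 20000 samples the empirical mean lands close to the theoretical value 1/0.7 ≈ 1.43.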
Proof. Let V_1 and V_2 be two disjoint copies of V and put Ṽ = V_1 ∪ V_2 and Ṽ_0 = U_1 ∪ V_2, where U_1 is U regarded as a subset of V_1, and A = Ṽ \ Ṽ_0. Let φ be the natural graph isomorphism taking V_1 to V_2. For π ∈ S_Ṽ, let Q_A(π) be the minimal π-invariant, φ-compatible set containing A. In equation (4.3) we have seen that for B ⊂ U and F_B-measurable f, the corresponding identity holds. On the right-hand side above, we interpret B as a subset of U_1 ⊂ V_1. See also Figure 2.
For estimating the right hand side, we first note that when A, A ⊂Ṽ , then Indeed, the implication ⊃ holds since A ⊂ B implies Q A (π) ⊂ Q B (π) for any π. For the reverse implication, note that Q A (π) ∪ Q A (π) is π-invariant and contains A ∪ A , and since φ is bijective we also by the minimality of the latter set. We conclude (4.20) For estimating the latter probabilities, define φ 0 (y) so that φ 0 (y) = φ(y) for y ∈ U 1 ∪ U 2 , and φ 0 (y) = y if y ∈ U c 2 . Here, U 2 denotes U regarded as a subset of V 2 . Let Q 0 {x} be the minimal π-invariant set containing x so that φ 0 (Q 0 {x} (π)) = Q 0 {x} (π). First note that PṼ (π| A = id) = Z(Ṽ 0 )/Z(Ṽ ), and thus

The relevant sampling strategy is then the analogue of (4.11) with φ replaced by φ_0. As before, we let N = inf{j ∈ N : K_{B_j} = B_j} and Â = B^c_N. Then Â(π) is φ_0-compatible and π-invariant, so Q^0_{x}(π) ⊂ Â(π) for all π. The sampling strategy fulfils the assumptions of Proposition 4.7 with M = 1, and therefore the relevant probability is bounded by C e^{−κ(d(x,B)+1)}, where C and κ are the constants defined in the statement of the proposition. Inserting this estimate into (4.20) and the result into (4.19) shows the claim.
Consider a vertex-transitive graph G = (V, E) that may be finite or infinite. Theorem 2.1 guarantees that there exists α_0 < ∞ (which may be significantly larger than the α_0 given in that theorem) such that for all α > α_0, the corresponding random permutation has cycle length bounded by a random variable ξ that has finite exponential moments and satisfies E(ξ) < 2. Proposition 4.9 then gives a bound on |P_{V_1}(π|_A = id) − P_{V_0}(π|_A = id)| for finite subsets A ⊂ V_1 ⊂ V_0 of V. However, this bound becomes meaningless when A is allowed to become large, since both of the involved probabilities then become very small. The following proposition improves the estimate of Proposition 4.9 to the strength that will be needed in the next section. We formulate it in terms of partition functions. Proposition 4.10. Let (V, E) be finite or infinite. Consider finite subsets A ⊂ V_1 ⊂ V_0 of V. Let α_0 be large enough so that for all α > α_0, the random permutation has cycle length bounded by a random variable ξ with E(ξ) < 2. Let C > 0 and κ > 0 be two constants such that the total offspring W of a Galton-Watson process with initial population 1 and offspring distribution ξ − 1 fulfils P(W > n) ≤ C e^{−2κn}. Let c_1(α) be defined as in (3.9). Define the constant D accordingly, where d(x, y) denotes the graph distance between x and y. Then the claimed inequalities hold. Proof. Let us put n = |A|, write A_j = {x_j, . . . , x_n}, and set A_{n+1} = ∅. Then, for j = 0, 1, we obtain a telescoping decomposition, and the first sum on its right-hand side is equal to log(Z(V_0 \ A)/Z(V_0)). We use the inequality log(1 + |x|) ≤ |x| on each term in the final sum above, and find

Now we apply Proposition 3.3.
We now apply Proposition 4.9 with f = 1{π(x_i) = x_i}, and obtain a bound of the form 2C e^{−κ d(y,x_i)}.
(4.22) We combine these estimates; exponentiating the resulting inequality gives the bound on the ratio of partition functions, and thus the first claimed inequality. For the second one, notice that the only place where we used V_1 ⊂ V_0 is inequality (4.22). That estimate still holds when interchanging the roles of V_0 and V_1, and when we do this, we get the remaining inequality.
We will only use the following weaker form of the previous proposition.

Proof of Theorem 2.2
Recall that for y = (y_1, . . . , y_d) ∈ Z^d we write ȳ = y_1 and ŷ = (y_2, . . . , y_d). In the proof we will need the following notions. For y ∈ Λ_n, we write C_y for the forward cone starting in y that is broadened to full width after a logarithmic length (see also Figure 4a). A set A ⊂ Λ_n will be called weakly y-admissible if {x ∈ Λ_n : x̄ ≥ ȳ + log n} ⊂ A, y-admissible if in addition {x ∈ Λ_n : x̄ ≥ ȳ, x̂ = ŷ} ⊂ A, and strictly y-admissible if in addition C_y ⊂ A.
We write A^w_y, A_y and A^s_y for the sets of weakly admissible, admissible and strictly admissible sets, respectively. For y, z ∈ Λ_n, A ∈ A_y and π ∈ S^{y→z}_A, recall that γ(π) = Orb_π({y}) denotes the trace of the self-avoiding walk that is embedded in π, starting from y and ending in z. We order the elements of γ(π) by the order of their appearance in Orb_π({y}), i.e. x ≤ y if y ∈ Orb_π({x}). Together with this order, the set γ(π) uniquely characterises a self-avoiding path from y to z, and we will sometimes write L^{y→z}_A for the set of all ordered subsets of A whose order makes them a self-avoiding nearest-neighbour walk in A starting in y and ending in z. As for permutations, we define L^{y→ℓ_n}_A = ∪_{z∈ℓ_n} L^{y→z}_A. Given γ ∈ L^{y→ℓ_n}_A, a point x ∈ γ will be called a pre-regeneration point of γ if γ ∩ ℓ_x = {x}, z ∈ C_x for z > x, and z̄ < x̄ for z < x.
In other words, the self-avoiding path hits the hyperplane ℓ_x precisely at x and stays in C_x thereafter (see also Figure 4b). Note that the latter requirement is more than what is usually required for regeneration points of self-avoiding walks. We will need it below in order to decouple the future of γ from the part of the background consisting of loops in π. Let π ∈ S^{y→ℓ_n}_A, and let π_0 be the element of S_A with π_0(x) = x whenever x ∈ γ(π), and π_0(x) = π(x) otherwise. A pre-regeneration point x ∈ γ(π) will be called a regeneration point if there exists a subset R ⊂ A with the following four properties: (R2) R is strictly x-admissible. If x is a regeneration point of π, the largest set with properties (R1)-(R4) is called the regeneration set of π through x, and is denoted by R(x, π). See also Figure 4c. Since properties (R1)-(R4) are preserved under unions of sets, the existence of any regeneration set implies the existence and uniqueness of R(x, π). By convention, the point 0 ∈ Λ_n will be a regeneration point with R(0, π) = Λ_n for all π, even though 0 might not be a pre-regeneration point of γ(π). Also, z = max{x : x ∈ γ(π)} is a regeneration point by convention, with empty regeneration set. The set of all regeneration points of some π ∈ S^{y→ℓ_n}_A is denoted by R(π) and is non-empty by the above conventions. We order its elements by the order in which they appear in γ.
In words, we pick a sequence of regeneration points that have mutual horizontal distance at least log n; when in this way we find a regeneration point with horizontal distance less than log n to ℓ_n, we jump to max γ and stay there forever. In addition, we define R_i(π) = R(X_i(π), π). We claim that the sequence (X_i, R(X_i)) is a Λ_n × P(Λ_n)-valued Markov chain, with transition matrix p((x, R), (y, ∅)) = P^{x→ℓ_n}_R(y = max γ) if x̄ + log n ≥ n, and the analogous expression in all other cases. This will follow from a variant of the spatial Markov property that we now state. For any set A ⊂ V = Λ_n, we define F_A to be the σ-algebra over S^{0→ℓ_n}_V generated by the forward evaluations π → π(z) for z ∈ A. Note that, in contrast to the situation in Proposition 4.1, we do not consider inverse images, since these may not be defined in the situation with an open cycle. For any π ∈ S^{0→ℓ_n}_V, we define the family of π-almost invariant sets Inv_0(π) := {A ⊂ V : π(A) ⊂ A}.
Note that by this definition, the open cycle γ can enter an almost-invariant set A but not leave it, and if A ∩ γ ≠ ∅ then the image of A is A without the unique entrance point of γ into A. The reader should be warned that in the current situation, A ∈ Inv_0(π) does not imply A^c ∈ Inv_0(π). Let A ⊂ V, x ∈ A, a ∈ A^c, and π ∈ S^{a→ℓ_n}_V such that A ∈ Inv_0(π) and x = min γ(π) ∩ A. Then there exist unique π|_A ∈ S^{x→ℓ_n}_A and π|_{A^c} ∈ S^{a→x}_{A^c∪{x}} such that π|_A(z) = π(z) for all z ∈ A and π|_{A^c}(z) = π(z) for all z ∈ A^c. In this situation, for F_{A^c}-measurable f : S^{0→ℓ_n}_V → R, we use the same symbol to denote the induced function of π|_{A^c}. In the same way, we get g(π) = g(π|_A) for F_A-measurable g.
(ii): For F_{A^c}-measurable f and F_A-measurable g, the corresponding factorization holds. (iv): For F_A-measurable g and Q ∈ F_{A^c}, the corresponding identity holds. Proof. Let H_B(π) := Σ_{x∈B} |π(x) − x|. Then for all A ⊂ V, all x ∈ A and all π ∈ S^{a→ℓ_n}_V such that A ∈ Inv_0(π) and x = min γ ∩ A, we have

This holds since π decomposes into π|_A and π|_{A^c}.
The proof now follows the same steps as the proof of Proposition 4.1.
For proving equation (5.1), we fix x_1, . . . , x_{k+1} ∈ Λ_n and subsets Q_1, . . . , Q_{k+1} of Λ_n which satisfy the deterministic conditions for regeneration points, in particular for all i ≤ k + 1. Let us write J_i = {X_i = x_i, R_i = Q_i} for brevity, and write C(x, y) = {π : y = min{w ∈ R(π) : w̄ ≥ x̄ + log n}}. On the other hand, the relevant event is F_{Q^c_k \ {x_k}}-measurable, and contains the set {Q_k ∈ Inv_{x_k}} ∩ {x_k = min γ ∩ Q_k}. We can therefore apply part (iv) of Proposition 5.1 with A = Q_k to both numerator and denominator above. By the symmetry condition (R3) of regeneration sets, the process (X_i), while not being a Markov process, is a Z^{d−1}-valued martingale under P^{0→ℓ_n}_{Λ_n}, and Doob's L^2-inequality applies. The main technical part of the proof consists in showing the following result (Proposition 5.2): there exist α_0 < ∞, C < ∞ and N ∈ N so that for all y ∈ Λ_n, n > N, α > α_0, A ∈ A^s_y and k ∈ R, the claimed bound holds. The proof is long, so we postpone it to the end of the section. Using Proposition 5.2 with p > 2, we find that for suitable α_0, C and N the estimate holds uniformly in n > N, α > α_0 and R ∈ A^s_y. By integrating over possible y and R we conclude E(|X_j − X_{j−1}|^2) ≤ (log n)^2 C_p. Since X̄_{i+1} ≥ X̄_i + log n by construction, the martingale takes at most n/ log n steps before hitting the point lim_{n→∞} π_n(0), after which it no longer moves. This, Chebyshev's inequality and (5.4) give the result. In conclusion, we now know that with high probability none of the special regeneration points (X_i) is further than √n log n away from the line {x̂ = 0}. In order to control the parts of γ in between the X_i, we first apply Proposition 5.2 with k = n^{1/4} and p = 8, and find P^{0→ℓ_n}_{Λ_n}(|X_{i+1} − X_i| > n^{1/4} log n | X_i = y, R_i = A) ≤ C_8 n^{−2} for all y ∈ Λ_n and all A ∈ A^s_y.
Since there are at most n/ log n regeneration points, the union bound then applies for n large enough. By definition, a regeneration point is the last time that γ crosses a certain vertical hyperplane. Proposition 5.3 below deals with the probability that the length of γ is large before it crosses such a hyperplane for the last time. Applying it with δ = 1, B = Λ_n and L = n^{1/3}, we see that the probability that the piece of γ between X_i and X_{i+1} is longer than 2n^{1/3} is less than C e^{−cn^{1/3}} for each i. On the other hand, a path would need at least 2M√n log n steps to exceed level 2M√n log n when starting (and ending) below level M√n log n. Thus the second line in (5.6) is bounded by n e^{−cn^{1/3}}. Taking n so large that n > M and n e^{−cn^{1/3}} < 1/M, Theorem 2.2 is proved, provided we can prove the two propositions that we used. We start with stating and proving Proposition 5.3, which itself is a fundamental ingredient of the proof of Proposition 5.2. For its statement and also for later use, we introduce the following notation. For L ∈ N and π ∈ S^{y→ℓ_n}_A, set x_L(π) := max{x ∈ γ(π) : x̄ = ȳ + L}, and ρ_L(π) := γ(π) \ Orb_π(x_L).
x_L is the last point at which the hyperplane {z ∈ A : z̄ = ȳ + L} is crossed by γ, and ρ_L is the piece of γ that lies before that point. |A| denotes the cardinality of a set A. Recall from above that L^{x→y}_A denotes the set of all self-avoiding walks that start in x, end in y and are contained in A. We write L^{x⇒y}_A for the subset of those γ ∈ L^{x→y}_A with γ ∩ ℓ_x = {x}, i.e. for the walks that never visit the hyperplane containing their starting point again after their first step. We set L^{x→ℓ_n}_A := ∪_{y∈ℓ_n} L^{x→y}_A, and the same for L^{x⇒ℓ_n}_A. Proposition 5.3. For each δ > 0, there exist α_0 < ∞, N ∈ N, c > 0 and C < ∞ such that for all n > N, α > α_0, L ∈ N with L > 2 log n, y ∈ Λ_n with ȳ ≤ n − L, and A, B ∈ A_y with B ⊂ A, the claimed bound holds. Proof. Let A ∈ A_y. We claim that there exist α_0 < ∞ and N ∈ N such that for all α > α_0, n > N, L > log n, h ∈ A with h̄ = ȳ + L, and γ ∈ L^{h⇒ℓ_n}_A, inequality (5.7) holds. This follows from Corollary 4.11: since A is admissible and γ stays to the right of ℓ_h, we have d(γ, Λ_n \ A) > log n, and clearly |γ||Λ_n \ A| ≤ n^{2d}. By Theorem 2.1, we can choose α_0 and N ∈ N so large that for α > α_0 the cycle length of spatial random permutations (in an arbitrary domain) is bounded by a random variable ξ with the property that the total population W of the associated Galton-Watson process has exponential tails. Consider the set of permutations whose path stays strictly to the right of the starting hyperplane, {π : x̄ > h̄ for all x ∈ γ(π) \ {h}}.
Then, since ρ ⊂ B, the corresponding bound holds. For estimating the denominator, let ρ_0 = {x ∈ A : ȳ ≤ x̄ ≤ ȳ + L, x̂ = ŷ} be the straight line from y to the hyperplane ℓ_h, and let h_0 = ρ_0 ∩ ℓ_h. Since B is admissible, ρ_0 ⊂ B. We now use Proposition 3.3 with c_1(α) as in (3.9) as well as (5.7). Going back into (5.8), and using (5.7) again, we obtain a further estimate. The final quotient is equal to one by translation invariance. Since c_1(α) converges to 1 as α → ∞, by making α_0 even bigger if necessary we achieve the required inequality for all α > α_0. Now we sum over all allowed lengths of ρ and find a bound involving the geometric series Σ_{m=0}^∞ e^{−m(α−log(2d))}.
(The crude bound (2d)^m on the number of all self-avoiding walks of length m could of course be improved using the connective constant, but in the present situation we do not gain any insight from that.) Now, by further increasing α_0 if necessary, we see that the claim holds with c = α_0 δ/2 − (1 + δ) log(2d) and C = 1/(1 − e^{−α_0 + log(2d)}).
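The constant C above is just the sum of the geometric series with ratio e^{−(α_0 − log(2d))}, finite precisely when α_0 > log(2d). A quick numeric check (the values of α and d below are chosen only for illustration):

```python
import math

def geometric_tail(alpha, d, m0=0):
    """Sum over m >= m0 of e^{-m(alpha - log(2d))};
    finite iff alpha > log(2d)."""
    r = math.exp(-(alpha - math.log(2 * d)))
    if r >= 1:
        raise ValueError("series diverges: need alpha > log(2d)")
    return r ** m0 / (1 - r)
```

For d = 3 and α = 3, the ratio is r = 6 e^{-3} ≈ 0.299, so the full sum is about 1.43, and increasing the starting index m0 only shrinks it.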
The next statement, which is purely deterministic, shows that Proposition 5.3 is useful for obtaining lower bounds on the number of pre-regeneration points.
Proof. We recursively determine sequences of points in γ. We set x_0 = x. For i ≥ 1, we define y_i := min{y ∈ γ : y ≥ x_{i−1}, y is not a pre-regeneration point}, ỹ_i := min{y ∈ γ : y > y_i, y ∉ C_{y_i}}, and x_{i+1} := min{x ∈ γ : x̄ > m_i} (see also Figure 5). When following the path γ, we thus have the following situation: at each x_i, the path crosses the hyperplane ℓ_{x_i} for the first time. Thereafter, we only see pre-regeneration points until we hit y_{i+1}, although of course y_{i+1} = x_i is possible. ỹ_{i+1} is then the point where we actually notice that y_{i+1} was not a pre-regeneration point (see also the following paragraph). We then follow γ further until we hit a vertical hyperplane that we have not seen before ỹ_{i+1}, and x_{i+1} is the place where we cross that hyperplane.
The crucial observation is that since x_i is the first time a certain hyperplane is crossed, and since all points between x_i and y_i are pre-regeneration points, the reason that y_i is not a pre-regeneration point can only be that the path leaves the enhanced forward cone; the other possibility, namely that some previous part of the path has already crossed the hyperplane containing y_i, is ruled out. This means that ỹ_i is indeed the place where we notice that y_i was not a pre-regeneration point. It also means that between ỹ_i and x_{i+1}, the path stays strictly to the right of ℓ_{ỹ_i} and strictly to the left of ℓ_{x_{i+1}}. It is easy to see that in such a situation, at most a third of all the steps of the path can be 'to the right', i.e. between vertices z, w with z̄ = w̄ + 1. Since γ takes at least L steps to the right, there are at most δL steps available for the other directions. Thus the union of all pieces of γ that lie between some ỹ_i and the corresponding x_{i+1} has less than 3δL elements. Since all of the remaining at least (1 − 3δ)L points are pre-regeneration points by construction, the claim is proved.
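The counting argument can be illustrated on a concrete path. The sketch below uses a simplified notion of cut point — the first coordinate of every earlier vertex is strictly smaller and of every later vertex strictly larger — which captures the hyperplane-crossing part of the pre-regeneration condition but ignores the cone C_x:

```python
def cut_points(path):
    """Simplified pre-regeneration points of a nearest-neighbour path:
    vertices whose first coordinate strictly separates the past of the
    path from its future."""
    firsts = [p[0] for p in path]
    pts = []
    for i, p in enumerate(path):
        if all(b < p[0] for b in firsts[:i]) and \
           all(a > p[0] for a in firsts[i + 1:]):
            pts.append(p)
    return pts
```

On the path (0,0), (1,0), (2,0), (2,1), (3,1) the hyperplane x̄ = 2 is visited twice, so neither visit qualifies, while on a straight line every vertex is a cut point.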
The next step is to show that for sufficiently large α, a pre-regeneration point is an actual regeneration point with uniformly positive probability. Recall that R(π) denotes the set of all regeneration points of π.
Theorem 2.1 and Propositions 4.7 and 4.8 guarantee that we can choose α_0 large enough so that for all α > α_0, all z with z̄ = x̄ − 1 and all k ≥ 1, the corresponding estimates hold. This shows that for sufficiently large n_0 and all n > n_0, as well as for all α > α_0, we have P_{A\γ_0}(R(x, ·) ≠ ∅) ≥ (1 − c̃)/2 > 0.
We now have all the preparations in place to give the proof of Proposition 5.2, and thus to conclude the proof of Theorem 2.2. Clearly, it suffices to prove the statement of Proposition 5.2 for large enough k. We fix δ < 1/4, and for k ≥ 1 and L = k log n we decompose P^{y→ℓ_n}_A(|X_1 − y| > k log n | γ ∈ C_y) ≤ P^{y→ℓ_n}_A(|ρ_L| ≥ (1 + δ)L | ρ_L ∈ C_y) + P^{y→ℓ_n}_A(|X_1 − y| > k log n, |ρ_L| < (1 + δ)L | ρ_L ∈ C_y). (5.10) Here we used that ρ_L ∈ C_y is equivalent to γ ∈ C_y by the definitions of C_y and ρ_L and the fact that L ≥ log n. Since C_y ∈ A_y, we can apply Proposition 5.3 and find the required bound uniformly in k for all large enough α and n.
It remains to estimate the second term in (5.10). For this, we use the trivial equality in order to see that the claim will be shown once we prove that there exists C̃ < ∞ such that, uniformly in ε, n and ρ ⊂ A such that |ρ| < (1 + δ)L and such that ρ is the trace of some self-avoiding walk from y to some element of ℓ_{ȳ+L}, the required bound holds. Lemma 5.4 tells us that such a ρ must have at least (1 − 3δ)k log n ≥ (1/4)k log n pre-regeneration points. For k > 8, at least (k/4 − 1) log n ≥ (k/8) log n of them satisfy x̄ − ȳ ≥ log n. If any of these is a regeneration point, then X_1 is the minimal such point, and thus |X_1 − y| ≤ k log n.
Let therefore ρ be the trace of some self-avoiding walk from y to an element of ℓ_{ȳ+L}, assume that it has M > log n pre-regeneration points with horizontal distance larger than log n from y, and denote these points by (x_i)_{i≤M}. By Proposition 5.5, each x_i is a regeneration point with uniformly positive probability. If the events {π : x_i ∈ R(π)} were independent, this would already conclude the proof, as we have many of these points. Unfortunately, they are not independent, and so we will have to work harder. (Although we cannot prove it, it is actually reasonable to conjecture that the events {x_i ∉ R} are positively correlated: the fact that x_i is a pre-regeneration point but not a regeneration point means that there is some long cycle of π that prevents the existence of a regeneration set, and such a long cycle may still impact the next pre-regeneration point as well.) The solution is to separate pre-regeneration points by invariant sets. Let r_1 = x_1, and r_j = min{x_i : x̄_i > r̄_{j−1} + 2 log n}.
Since we started with at least (1/8)k log n pre-regeneration points (x_i), and since at most 2 log n of them lie between two consecutive r_i, we still retain at least k̃ ≥ k/16 points (r_i)_{i≤k̃}. For π ∈ S_{A\γ}, denote again by R(x, π) the set of all subsets of A satisfying properties (R1) to (R4). For any γ ⊂ A with ρ_L(γ) = ρ, we then have the corresponding identity. We further define w_i := r̄_i + log n and ℓ^−_{w_i} := {z ∈ A : z̄ < w_i}. Let m ≤ k̃. We decompose as follows. By the same reasoning that led to the bound on the second term in (5.9), the first term above is bounded by n^{d−1−ν}, where we can choose ν as large as needed by making α large. For the second term we use the strong Markov property of spatial random permutations. Let f(π) = 1{π ∈ N_m}, and note that ∩_{i=1}^{m−1} N_i ∈ F_{Q_{m−1}} by the definition of the Q_i and the strict admissibility of regeneration sets. Thus

and Proposition 4.2 gives
almost surely. By Proposition 5.5 we find P_{Q_{m−1}(π)}(N_m) ≤ 1 − c uniformly in all allowed Q_{m−1}(π) (note that by definition, Q_{m−1}(π) is weakly admissible for all π such that Q_{m−1}(π) ∩ W_{m−1} = ∅), and thus we conclude the bound for all m ≤ k̃, and therefore, by induction, the claimed estimate. It remains to choose α_0 so large that for α > α_0 we have d − 1 − ν < −p. Then for sufficiently large n_0 and all n > n_0, the claimed bound holds. For k ≤ n, this proves the claim of Proposition 5.2; the last step in the proof consists in observing that for k > n, the required probability is trivially equal to zero. Thus Proposition 5.2, and with it Theorem 2.2, is proved. Let us finish this paper by looking back over the proof and identifying the place where our estimates are not good enough to provide diffusive scaling. The most basic place where this happens is Proposition 5.3: there already, we can only prove a bound on the length of ρ_L (and thus obtain many regeneration points) when L is of the order of log n. This in turn is due to inequality (5.7), which needs the part of γ to the right of ℓ_h to be separated from the set Λ_n \ A by a distance of at least log n. This is because (5.7) relies on Proposition 4.10, and in that result the sum that leads to the constant D needs to be controlled by sufficiently large distances between A and V_0 \ V_1, especially when the number of points in either of them diverges. The root cause of the problem is that while we do have exponential decay of correlations, the decay is not uniform in the size of the sets between which the correlations are measured. See also the discussion in the paragraph before Proposition 4.10. The latter proposition already goes some way towards solving this problem, but as it turns out it is not quite enough for obtaining optimal results.
Proof. Let us denote by $P(\,\cdot\,) = P_1(\,\cdot\,) \times P_2(\,\cdot\,)$ the product measure and by $E(\,\cdot\,)$ the expectation with respect to $P(\,\cdot\,)$. We start by proving statement (a) of the lemma. Fix a finite sequence of non-negative integers $n_0, n_1, \ldots, n_k$. We replace steps of process 1 by steps of process 2 one by one, showing that the probability of the event $\{M^1_k > n_k, M^1_{k-1} > n_{k-1}, \ldots, M^1_0 > n_0\}$ cannot decrease at each replacement. Here is our first replacement: the inner conditional expectation can be interpreted as step $n = 1$ of the Markov chain started from an initial state distributed according to $M^1_{k-1}$ conditioned on $\mathcal F^1_{k-2}$ (the outer expectation then integrates over $\mathcal F^1_{k-2}$). The inequality holds because we replace this initial state of the Markov chain by a new state which, by our second assumption, is stochastically larger, namely $M^2_1$ conditioned on $M^2_0 = M^1_{k-2}$. Our first assumption in the statement of the lemma then guarantees that the whole Markov chain (not only step 1, but also the following steps) is stochastically larger. After $k-1$ iterations we obtain the first inequality in the expression below. For the last inequality we used our Assumptions 1 and 3. Indeed, on the left-hand side of the second inequality we have the Markov chain $(M^2_n)_{n \in \mathbb N}$ starting from initial state $M^1_0$, while on the right-hand side we have the same Markov chain starting from initial state $M^{2,\lambda^*}_0$, which is stochastically larger than $M^1_0$ by our Assumption 3.
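The monotonicity mechanism driving the replacement argument — a stochastically monotone Markov chain started from a stochastically larger state remains stochastically larger at all later times — can be illustrated by a small coupling simulation. The chain, its update rule, and all names below are purely illustrative choices of ours and do not appear in the paper:

```python
import random

def step(x, u):
    # Monotone update rule: for a fixed uniform u, x <= y implies
    # step(x, u) <= step(y, u), so driving two chains with the same
    # sequence of uniforms preserves their order pathwise.
    return x + 1 if u < 0.5 else max(0, x - 1)

def run(x0, us):
    # Run the chain from x0 using the common randomness in us.
    path = [x0]
    for u in us:
        path.append(step(path[-1], u))
    return path

random.seed(0)
us = [random.random() for _ in range(1000)]
lo = run(3, us)   # chain started from the smaller initial state
hi = run(7, us)   # chain started from the larger initial state

# Pathwise domination under the coupling: lo[n] <= hi[n] for every n,
# which in particular implies stochastic domination of the marginals.
assert all(a <= b for a, b in zip(lo, hi))
```

Under this coupling the ordering of the initial states propagates deterministically through every step, which is exactly the content of Assumption 1 exploited above.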
The proof of part (b) of the lemma follows the same replacement procedure as part (a).
Let us define $W^i_k = \sum_{n=0}^{k} M^i_n$, $i \in \{1, 2\}$. We just illustrate our first and second replacements below. The proof of statement (b) of our Comparison Lemma follows by iteration. For the last step, we use our Assumption 3, just as in the proof of statement (a).
The last result that we present in this section is that the simple Galton-Watson process is a Markov chain satisfying Assumption 1 on the process $(M^2_j)_{j \in \mathbb N}$ in Lemma 5.6.
Lemma 5.7. If $\Xi$ and $\Xi'$ are two independent random variables such that $\Xi \le_d \Xi'$, then $(Z^{\Xi}_j)_{j \in \mathbb N} \le_d (Z^{\Xi'}_j)_{j \in \mathbb N}$, where $(Z^{\Xi}_j)_{j \in \mathbb N}$ is a simple Galton-Watson process with offspring distribution $\xi$ and initial state $\Xi$. We now prove this lemma.
Proof of Lemma 5.7. We couple the two Galton-Watson processes on the same probability space. In this probability space we define an infinite array of independent random variables $\xi = (\xi_{j,k})_{j \ge 1, k \ge 1}$, each having the same distribution as $\xi$. For any $m \in \mathbb N$ and any realization of $\xi$, we define the deterministic sequence $(Z_j(m, \xi))_{j \in \mathbb N}$, which represents a Galton-Watson process starting from initial condition $m$, where the $k$-th individual of the $j$-th generation generates an offspring of size $\xi_{j,k}$. Let us define the sequence precisely. By definition, we set $Z_0(m, \xi) = m$. We label the individuals of the first generation in an arbitrary order using the integers $1, 2, \ldots, m$. After the individuals of the first generation have been labelled, each individual $k$ generates an offspring of size $\xi_{1,k}$. We set $Z_1(m, \xi)$ to be the number of individuals of generation $j = 1$. We label the individuals of the second generation by consecutive integers $1, 2, 3, \ldots$ in such a way that, for any two individuals $A$ and $B$, if the label of the parent of $A$ is smaller than the label of the parent of $B$, then the label of $A$ is smaller than the label of $B$. Once labels are assigned, each individual $k$ of the second generation generates an offspring of size $\xi_{2,k}$, and the variables $\xi_{2,Z_1+1}, \xi_{2,Z_1+2}, \ldots$ remain unused. We set $Z_2(m, \xi)$ to be the number of individuals of generation $j = 2$.
At every step $j$, we assign labels according to this rule and use the variables $\xi_{j,k}$ for the children of each individual $k$. If $Z_j(m, \xi) = 0$ for some $j$, then by definition $Z_{j+1}(m, \xi) = Z_{j+2}(m, \xi) = \ldots = 0$. The process can be seen as a tree where each individual is connected to its children by a directed edge. By construction, for any realization of the array $\xi$ we have, for all $j \in \mathbb N$,
$$Z_j(m, \xi) \le Z_j(m', \xi) \qquad (5.17)$$
whenever $m' \ge m$, as the tree corresponding to the Galton-Watson process with initial condition $m$ is a subgraph of the tree corresponding to the Galton-Watson process with initial condition $m'$. Let us now fix $k \in \mathbb N$ and a sequence of non-negative integers $n_0, n_1, \ldots, n_k$. For any fixed $\xi$, we let $M(\xi, n_0, \ldots, n_k)$ be the smallest $m$ such that $Z_j(m, \xi) > n_j$ for all $j$ between $0$ and $k$. For the first inequality we used that $\Xi'$ is stochastically larger than $\Xi$ by definition. As the two processes that we constructed on this common probability space have the same distribution as the Galton-Watson processes in the statement of the lemma, the previous inequality concludes the proof of the claim.
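The monotone coupling behind (5.17) is easy to check numerically: feeding the same offspring array $\xi$ to two copies of the process with different initial conditions yields pathwise ordered population sizes. The following sketch is purely illustrative; the function name and the particular offspring distribution are our own choices, not from the paper:

```python
import random

def gw_coupled(m, xi):
    # Population sizes Z_j(m, xi): generation j+1 is the sum of the
    # offspring xi[j][k] of the first Z_j individuals, so the same array
    # xi drives both coupled processes, as in the proof of Lemma 5.7.
    z = [m]
    for row in xi:
        z.append(sum(row[:z[-1]]))
        if z[-1] == 0:  # extinction: all later generations are empty
            break
    return z

random.seed(1)
# Illustrative offspring distribution on {0, 1, 2} with mean 3/4.
xi = [[random.choice([0, 0, 1, 2]) for _ in range(10_000)] for _ in range(20)]

small = gw_coupled(3, xi)
large = gw_coupled(8, xi)

# Pathwise domination as in (5.17): Z_j(3, xi) <= Z_j(8, xi) for all j,
# since the smaller tree is a subgraph of the larger one.
assert all(a <= b for a, b in zip(small, large))
```

The assertion holds for every realization of the array, not just on average, which is precisely why stochastic domination of the initial states lifts to the whole processes.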