Local limit of the fixed point forest

Consider the following partial "sorting algorithm" on permutations: take the first entry of the permutation in one-line notation and insert it into the position of its own value. Continue until the first entry is 1. This process imposes a forest structure on the set of all permutations of size $n$, where the roots are the permutations starting with 1 and the leaves are derangements. Viewing the process in the opposite direction, towards the leaves, one picks a fixed point and moves it to the beginning. Despite its simplicity, this "fixed point forest" exhibits a rich structure. In this paper, we consider the fixed point forest in the limit $n\to \infty$ and show, using Stein's method, that the local structure at a random permutation converges weakly to a tree defined in terms of independent Poisson point processes. We also show that the distribution of the length of the longest path to a leaf converges to the geometric distribution with mean $e-1$, and the length of the shortest path converges to the Poisson distribution with mean 1. In addition, the higher moments are bounded, and hence the expectations converge as well.


Fixed point forest
Consider a deck of n cards labeled 1, 2, . . . , n, given in an arbitrary order. Take the top card and reinsert it into the pile at the position of its value. This gives rise to a partial sorting algorithm, where the algorithm stops when card 1 is on the top. This can be formulated in terms of the set S_n of permutations of size n. Many questions about the resulting forest remain unanswered; we discuss some of them in more detail in Section 6. Both locally and globally, the structure of the fixed point forest seems quite rich.
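In code, one step of this procedure takes the first entry v and reinserts it so that it sits in position v. A minimal sketch (the function names are ours, not the paper's):

```python
def bump_first(pi):
    """One sorting step: reinsert the first entry at the position of its value."""
    v = pi[0]
    rest = pi[1:]
    rest.insert(v - 1, v)  # index v-1 is position v in one-line notation
    return rest

def sort_to_root(pi):
    """Apply bump_first until the first entry is 1, returning the whole trace.

    The final permutation is the root (a permutation starting with 1) of the
    tree containing pi.
    """
    trace = [pi]
    while pi[0] != 1:
        pi = bump_first(pi)
        trace.append(pi)
    return trace

# 42135 reaches a permutation starting with 1 in two steps.
trace = sort_to_root([4, 2, 1, 3, 5])
```

Iterating this step is exactly the walk from a permutation towards its root in the forest.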
In the remainder of the introduction, we provide a non-technical discussion of the local structure of the forest F_n in the limit n → ∞. Let π_n be a uniformly random permutation in F_n. The essential information for determining the neighborhood of π_n is the location of its fixed and near-fixed points. Our idea for understanding the limiting structure of F_n is to construct a sort of limit of the fixed and near-fixed points of π_n. After rescaling by a factor of 1/n, these are represented as point processes on [0, 1], which become independent Poisson point processes in the limit.
Then, we define a tree from these point processes, which will turn out to be the local limit of F n (see Section 3).

Moving towards leaves in the tree
Suppose that π is a permutation and that we would like to enumerate its descendants up to three levels in the forest; that is, we want to determine all permutations obtained from π by bumping fixed points to the beginning no more than three times. What information about π do we need?
The answer is that we must know all i such that π(i) = i, π(i) = i + 1, or π(i) = i + 2. This is best seen by example. To unify our terminology, say that the letter π(i) or the position i is k-separated if π(i) = i + k. A 0-separated letter is simply a fixed point.

Example 1.1. Suppose that π has 0-separated letters at positions 7 and 27, that it has a 1-separated letter at 18, and that it has a 2-separated letter at 13. Then π has two children in the forest, given by bumping the letters in either position 7 or 27.

Figure 3: The descendants of π from Example 1.1, encoded by their words. Only three levels are given here, since we are using the information only from the 0-, 1-, and 2-separated letters. To construct a tree like this given only the encoding of π by its word, when we bump the 0-separated letter at a given position, we remove the 0 from this position and subtract 1 from all letters before the bumped position. A value of −k is indicated here by k̄, that is, barred and colored red.
If the letter in position 7 is bumped, then the resulting permutation still has a 0-separated letter at 27, a 1-separated letter at 18, and a 2-separated letter at 13. From here, one child is given by bumping the letter in position 27. Then, this turns the 1-separated letter at position 18 into a 0-separated letter at 19, and it turns the 2-separated letter at 13 into a 1-separated letter at 14. From here, there is a child given by bumping the letter in position 19 (as well as another child after that).
If the letter in position 27 is bumped in the first step, then the 0-separated letter at 7 is destroyed, the 1-separated letter at 18 becomes a 0-separated letter at 19, and the 2-separated letter becomes a 1-separated letter at 14. Now there is a child given by bumping the letter in position 19, which turns the 1-separated letter at 14 into a 0-separated letter at 15, which can then be bumped.
This example suggests a way to encode a permutation according to its k-separated letters for k = 0, . . . , K. We create a word containing a k for each k-separated position in the permutation, in the order that they appear. For example, with K = 2, the word corresponding to π in Example 1.1 is 0210. When we bump the 0-separated letter at a given position, we remove the 0 from this position, subtract 1 from all letters before the bumped position, and leave alone all letters after the bumped position. We write a letter −k as k̄, that is, barred and colored red. These negative letters are irrelevant for determining the descendants of a permutation, but we leave them in the word for consistency with the next section. Thus bumping the first 0 from 0210 yields 210, and bumping the second 0 yields 1̄10. This gives us the compact depiction of the descendants of π in Figure 3.
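The manipulation of words just described is easy to make precise. The following sketch (our encoding: a plain list of integers, with negative entries playing the role of barred letters) reproduces the two bumps of 0210 computed above:

```python
def bump_word(word, idx):
    """Bump the 0 at index idx: delete it and subtract 1 from every letter
    before the bumped position; letters after it are left alone."""
    assert word[idx] == 0
    return [w - 1 for w in word[:idx]] + word[idx + 1:]

def children(word):
    """Words of all children, one for each 0 (negative letters are inert)."""
    return [bump_word(word, i) for i, w in enumerate(word) if w == 0]

first = bump_word([0, 2, 1, 0], 0)   # the word 210
second = bump_word([0, 2, 1, 0], 3)  # the word -1,1,0, i.e. barred-1,1,0
```

Iterating `children` generates the finite tree of descendants depicted in Figure 3.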
We warn the reader that this picture is incomplete in one way. When a 0-separated letter at position i is bumped, it creates a new (i − 1)-separated letter at position 1. In Example 1.1, this makes no difference, but in the following example it does.

Example 1.2.
Suppose that π is the permutation 42135, in one-line notation. Then π has 0-separated letters at locations 2 and 5, and it has no 1- or 2-separated letters. If 5 is bumped, the resulting permutation has no 0-separated letters and hence no children. If 2 is bumped, then we still have a 0-separated letter at 5. We also have a new 1-separated letter at position 1, namely the 2 that was just bumped. Thus, after bumping 2 and then 5, we can bump the 2 again.
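The reentry in Example 1.2 can be checked directly by enumerating children in the forest (moving towards the leaves). This sketch uses our own helper names:

```python
def true_fixed_points(pi):
    """Fixed points other than position 1, which may be bumped."""
    return [i for i in range(2, len(pi) + 1) if pi[i - 1] == i]

def bump_to_front(pi, i):
    """Move the fixed point at position i to the beginning."""
    assert pi[i - 1] == i
    return [i] + pi[:i - 1] + pi[i:]

pi = [4, 2, 1, 3, 5]               # 42135, fixed points at positions 2 and 5
leaf = bump_to_front(pi, 5)        # 54213: no fixed points left, a leaf
child = bump_to_front(pi, 2)       # 24135: 5 is still a fixed point
reentry = bump_to_front(child, 5)  # 52413: the bumped 2 is a fixed point again
```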
If we start with the word 00 corresponding to 42135 and then follow the rules laid out before this example for manipulating these words, we miss this last descendant (see Figure 4).

Figure 4: On the left, the descendants of 42135, with fixed points in bold. On the right, we show the apparent forest computed only using the words giving the order of the k-separated letters. This misses a descendant created by the "reentry" of 2 as a fixed point.

Figure 5: The point processes ξ^π_0, ξ^π_1, and ξ^π_2 on the interval [0, 1], representing the 0-, 1-, and 2-separated letters in the abstracted permutation π. The word associated with this permutation for K = 2 is 0120, and the tree of its descendants up to three levels is the same as in Figure 3.
This possible "reentry" of a bumped letter as a fixed point complicates the picture.
When we determine the descendants of a permutation π up to level K, this reentry can only occur if there is a 0-separated letter at one of the positions 1, . . . , K − 1. For fixed K, this is vanishingly unlikely as n → ∞, and so in constructing the limit tree, we ignore it. The essential idea of this construction is to take a random word on the alphabet 0, . . . , K and make a tree by the procedure described in Figure 3. In constructing the limit tree, one complication is that we want K to be infinite. To address this, we represent the locations of the k-separated letters for each k in an abstracted permutation π as a point process ξ^π_k, representing a set of locations on the interval [0, 1]. These will be independent Poisson point processes with intensity one in the limit. For any fixed K, we can then obtain a word by writing a k for each point of ξ^π_k for 0 ≤ k ≤ K, sorted by the positions of the points in [0, 1]. This word then determines a tree up to level K, following the procedure sketched out after Example 1.1.

Example 1.3. Suppose ξ^π_0, ξ^π_1, and ξ^π_2 contain points as depicted in Figure 5. Then the word associated with the abstracted permutation π for K = 2 is 0120, and the tree generated by it up to three levels is the same one as in Example 1.1 and Figure 3.
One can also construct the tree directly from the point processes, without the intermediate step of converting to a word. The abstracted permutation π with associated point processes ξ^π_0, ξ^π_1, . . . has |ξ^π_0| children, with |ξ^π_0| denoting the number of points in ξ^π_0. The point processes obtained by bumping a point x ∈ [0, 1] of ξ^π_0 are given by removing the point x from ξ^π_0 and by "shifting down" each point process on [0, x), as in Figure 6.

Figure 6: If π is an abstracted permutation and π′ is its child given by bumping the middle point x in ξ^π_0, then the point process ξ^{π′}_k is equal to ξ^π_{k+1} on [0, x) and is equal to ξ^π_k on (x, 1], as depicted above.
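Representing each ξ^π_k as a sorted list of points, the child construction of Figure 6 can be sketched as follows (a toy finite version with our own names; in the limit object every k ∈ Z carries a process):

```python
def forward(xi, x):
    """Bump the atom x of xi[0]: on [0, x) each process is replaced by the one
    above it (the child's level k is the parent's level k+1), on (x, 1] it is
    unchanged, and x itself disappears.  `xi` maps k to a sorted point list."""
    assert x in xi.get(0, [])
    ks = set(xi) | {k - 1 for k in xi}
    child = {}
    for k in ks:
        left = [p for p in xi.get(k + 1, []) if p < x]
        right = [p for p in xi.get(k, []) if p > x]
        child[k] = left + right
    return child

# A 1-separated point left of x becomes 0-separated in the child.
xi = {0: [0.3, 0.6], 1: [0.2], 2: [0.5]}
child = forward(xi, 0.6)
```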

Moving towards the base in the tree
In the previous section, we gave a loose account of how to define the descendants of an abstracted permutation π in the limit tree. We need to define the entire tree, however, which includes the ancestors of π and their descendants.
Suppose π is a (non-abstracted) permutation, and we would like to determine both how many children and how many siblings it has in the forest. Again, we ask the question of what information about π we need to find this out.
As before, we need to know the locations of 0-separated letters in π. We also need to know the locations of −1-separated letters, which can become 0-separated letters in the parent of π. Finally, we need to know the value of π(1), as this determines the ancestor of π in the forest.

Example 1.4. Let π be the permutation from Example 1.1, which has 0-separated letters at positions 7 and 27. Suppose that it has −1-separated letters at 15 and 36, and suppose that π(1) = 20.
As before, π has two children in the forest, given by bumping positions 7 and 27. When we move towards the base in the tree to the parent of π, the 0-separated letter at position 7 becomes a 1-separated letter at position 6, and the −1-separated letter at position 15 becomes a 0-separated letter at position 14, while the separated letters after position 20 remain the same. The permutation also has a new 0-separated letter at position 20, which if bumped leads to π. Thus the parent of π has three children total, and π has two siblings.
Again, we can view this in terms of words encoding the k-separated letters. We can view the permutation π of Example 1.4 as the word 01̄|01̄, where a −1-separated letter is written as 1̄ and the | symbol marks the position of the value π(1). When moving towards the base in the tree, a 0 is inserted at the position of the | symbol, and all values to the left of the | are incremented. See Figure 7 for a depiction of Example 1.4 in these terms. As before, this picture is incomplete: if the first character in the word corresponds to a separated letter at position 1, this character is deleted rather than incremented when moving towards the base in the tree.
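In code (our encoding: a barred letter k̄ stored as −k, the | marker given by its index in the word, and the position-1 boundary case just mentioned ignored), the move towards the base reads:

```python
def parent_word(word, bar_index):
    """Move towards the base: insert a 0 at the position of the | marker and
    increment every letter to its left."""
    return [w + 1 for w in word[:bar_index]] + [0] + word[bar_index:]

# Example 1.4: separated letters at positions 7 (a 0), 15 (a -1), 27 (a 0),
# and 36 (a -1), with pi(1) = 20 placing the marker after the first two.
word = [0, -1, 0, -1]
parent = parent_word(word, 2)  # [1, 0, 0, 0, -1]: three 0s, three children
```

The three 0s in the parent word match Example 1.4: the parent of π has three children, so π has two siblings.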
This will be irrelevant in the limit, since for fixed K and r, a random permutation is vanishingly unlikely to have a k-separated letter with |k| ≤ K occurring in the first r positions.
The extra ingredient in moving towards the base in the tree rather than towards the leaves is knowledge of π(1). Looking back at Figure 6, to go backward from π′ to π in the limit tree, we need the location of the dotted line, which cannot be determined from the processes ξ^{π′}_k. In the limit case, these locations will be uniform over [0, 1] and independent of the point processes.

Comparison to other known processes
It is a natural question to compare the fixed point forest or its subtrees to other known random trees, such as the Galton-Watson tree [18]. The tree component of the fixed point forest containing the identity permutation has approximately (n − 1)! vertices. As we discussed earlier, the height of this component is 2^{n−1} − 1, as shown in [13]. The Galton-Watson tree, on the other hand, has height of order √N if the tree has N vertices [1], which is much larger than 2^{n−1} when N = (n − 1)!. This is consistent with the fact that the fixed point tree has offspring sizes that are correlated across generations, whereas the Galton-Watson tree has independent siblings. Locally, too, the fixed point forest is quite different from a Galton-Watson tree. For example, no leaf in the fixed point forest has a sibling that is also a leaf.
The process of picking a fixed point at random and moving it to the front (which corresponds to a walk to a leaf) has some resemblance to the Tsetlin library [17,8,9], which is a model for the evolution of an arrangement of books in a library shelf over time. It is a Markov chain on permutations, where the entry in the i-th position is moved to the front with probability p i . However, in the fixed point forest this process eventually stops when a derangement is reached, whereas in the Tsetlin library the process can go on arbitrarily long.

The limiting objects
In this section, we provide the precise definition of the limiting tree.

Local weak convergence in general
The main result of this paper is the local weak convergence of the fixed point forest to a certain limiting tree. This mode of convergence is sometimes called Benjamini-Schramm convergence after the paper [4]. We will give a short introduction to local weak convergence now, but see [2, Section 2] for a more in-depth discussion.
Let G, G_1, G_2, . . . be a sequence of random rooted graphs. For any rooted graph H, let H(r) denote the r-ball around the root of H; that is, H(r) is the subgraph of H induced by all vertices at distance r or less from the root. We write H ≅ H′ to signify that H and H′ are isomorphic as rooted graphs. We say that G is the local weak limit of G_n if for every r ≥ 0 and every finite rooted graph H,

P[G_n(r) ≅ H] → P[G(r) ≅ H]

as n → ∞. Roughly speaking, this says that the view from the root of G_n resembles the view from the root of G in distribution more and more as n → ∞. Frequently, G_n is a finite, deterministic graph with its root chosen uniformly at random, as will be the case in this paper.

Construction of the limit tree
The ingredients of our construction are a collection of Poisson point processes (ξ^ρ_k)_{k∈Z} of unit intensity on [0, 1] and a sequence U_1, U_2, . . . of independent Unif[0, 1] random variables. Formally, a point process is an integer-valued random measure on the Borel sets of R. One should think of it as a random collection of points, represented as the atoms of the measure. A Poisson point process ξ of unit intensity on [0, 1] is characterized by two properties: first, for any interval of length x, the number of points of ξ in the interval is distributed as Poi(x); second, the numbers of points of ξ in disjoint intervals are independent. We use the terminology point process configuration to mean a deterministic collection of points, also represented formally as a measure.
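A unit-intensity Poisson point process on [0, 1] is easy to simulate from the standard fact that its gaps are independent Exp(1) variables; the following sketch (our names) also sanity-checks the Poi(1) count property:

```python
import random

def poisson_process_unit(rng):
    """Sample a unit-intensity Poisson point process on [0, 1] by taking
    cumulative Exp(1) arrival times until they exceed 1."""
    points, t = [], rng.expovariate(1.0)
    while t < 1.0:
        points.append(t)
        t += rng.expovariate(1.0)
    return points

rng = random.Random(0)
samples = [poisson_process_unit(rng) for _ in range(20000)]
mean_count = sum(len(s) for s in samples) / len(samples)
# mean_count is close to 1, matching Poi(1) counts on an interval of length 1
```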
The point process ξ^ρ_k represents the k-separated letters in the abstracted permutation ρ. Let ρ_1 be the parent of ρ_0 = ρ, let ρ_2 be the parent of ρ_1, and so on. The random variable U_i represents the location of the 0-separated letter in ρ_i that was bumped to create ρ_{i−1}.
To construct the tree, we first define maps corresponding to moving forwards and backwards from a given vertex.
Definition 2.1. Let ξ = (ξ_k)_{k∈Z} be a point process configuration and let x be an atom of ξ_0. Define the forward map f(ξ, x) = ξ′ by letting ξ′_k agree with ξ_{k+1} on [0, x) and with ξ_k on (x, 1] for every k. This is the down-shift operation depicted in Figure 6, corresponding to moving forwards towards a leaf in the tree from a permutation to one of its children by bumping the abstracted fixed point x.

Definition 2.2. Let ξ = (ξ_k)_{k∈Z} be a point process configuration and let x ∈ [0, 1]. Define the backward map b(ξ, x) = ξ′ by letting ξ′_k agree with ξ_{k−1} on [0, x) and with ξ_k on (x, 1] for every k, together with an additional atom at x in ξ′_k for k = 0.

This is the reverse of the forward map, in the following sense: if x is an atom of ξ_0 and f(ξ, x) = ξ′, then b(ξ′, x) = ξ.
Next, we define a tree by applying f and b to map out the abstracted permutations.

Definition 2.3. Given point process configurations ξ^ρ = (ξ^ρ_k)_{k∈Z} and a sequence u = (u_1, u_2, . . .) of elements of [0, 1], we construct a rooted tree ϕ(u, ξ^ρ) as follows. We think of each vertex v of this tree as an abstracted permutation, represented by a collection of point processes ξ^v = (ξ^v_k)_{k∈Z}. Let ρ = ρ_0 be the root of the tree. First, we give ρ_0 an infinite chain of ancestors ρ_1, ρ_2, . . .. Starting with ξ^{ρ_0}, which is given to us, we inductively define ξ^{ρ_i} = b(ξ^{ρ_{i−1}}, u_i). Next, we construct descendants of each ρ_i. For every atom x in ξ^{ρ_i}_0 other than u_i, give ρ_i a child ρ_i(x) and define ξ^{ρ_i(x)} = f(ξ^{ρ_i}, x). (We avoid doing this with x = u_i, since this would just recreate ρ_{i−1}.) From here on, we proceed inductively, continuing to extend the tree forwards. Suppose that ξ^v has already been defined. For each atom x in ξ^v_0, extend the tree by creating a child v(x) of v, and define ξ^{v(x)} = f(ξ^v, x). We define ϕ(u, ξ^ρ) as the resulting tree. Also, observing that the r-neighborhood of the root of the tree depends only on ξ^ρ_{−r+1}, . . . , ξ^ρ_{r−1} and on u_1, . . . , u_r, define the map ϕ_r by setting ϕ_r(u_1, . . . , u_r; ξ^ρ_{−r+1}, . . . , ξ^ρ_{r−1}) to be the r-neighborhood of the root of ϕ(u, ξ^ρ).

Finally, we construct the limit tree T by applying ϕ to the random ingredients described above: T = ϕ(U, ξ^ρ), where U = (U_1, U_2, . . .) is the sequence of independent Unif[0, 1] random variables and the ξ^ρ_k, k ∈ Z, are independent Poisson point processes of unit intensity on [0, 1].
In the following section, we will prove that T is the local limit of F n as n → ∞.

Local weak convergence of the fixed point tree
Recall that i is a k-separated position in a permutation π if π(i) = i + k. The main idea in this section is that the neighborhood of a permutation π in F n up to distance r can typically be reassembled from two pieces of information: the k-separated positions in π for −r < k < r, and the values of π(1), . . . , π(r). The first piece lets us work out the tree forwards (towards leaves) from π. When we rescale by 1/n, these locations converge to independent Poisson point processes on [0, 1]. The second piece of information lets us move backwards in the tree. With the same rescaling, these random variables converge to independent points sampled uniformly from [0, 1]. The two pieces of information converge jointly, as shown in Proposition 3.5, and the weak local convergence of the fixed point tree F n follows easily from this in Theorem 3.6.
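As a quick empirical check of the first piece of information (a plain Monte Carlo sketch with our own helper names), the number of k-separated positions in a uniform permutation has mean (n − |k|)/n ≈ 1 for each small k, in line with the Poisson limit:

```python
import random

def separated_counts(pi, ks):
    """Number of k-separated positions i (pi[i-1] == i + k) for each k."""
    n = len(pi)
    return {k: sum(1 for i in range(1, n + 1)
                   if 1 <= i + k <= n and pi[i - 1] == i + k)
            for k in ks}

rng = random.Random(1)
n, trials, ks = 1000, 1000, (-1, 0, 1)
totals = {k: 0 for k in ks}
for _ in range(trials):
    pi = list(range(1, n + 1))
    rng.shuffle(pi)
    counts = separated_counts(pi, ks)
    for k in ks:
        totals[k] += counts[k]
means = {k: totals[k] / trials for k in ks}
# each mean is close to 1, as for independent Poi(1) counts
```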
While this convergence is what one would expect from well-known Poisson approximations of fixed points of random permutations (see [6, Theorem 11], for example), it will take some technical work to prove our precise statement. We will use Stein's method via size-bias couplings using the framework from [3], which we introduce now. See also [14, Section 4.3] for a more detailed introduction to size-bias couplings. The general idea for our purposes is that we have a collection of 0-1 random variables, and we would like to show that they are well approximated by independent Poisson random variables. If there exist certain couplings described below, Stein's method gives a quantitative version of this approximation. The bound is given in terms of the covariances of the random variables and does not depend on the couplings, once they are shown to exist.

Condition 3.1. Let I = (I_α)_{α∈I} be a collection of 0-1 random variables. For each α ∈ I, there is a random vector J_{•α} = (J_{βα})_{β∈I} coupled with I such that
• J_{•α} is distributed as I conditioned on I_α = 1;
• we can partition I \ {α} into disjoint sets I⁻_α and I⁺_α such that with probability one,
J_{βα} ≤ I_β for all β ∈ I⁻_α, (3.1)
J_{βα} ≥ I_β for all β ∈ I⁺_α. (3.2)

When this condition holds, one can estimate the distance between I and a vector of independent Poisson random variables in terms of the covariances of the components of I, with no mention of the coupling. Here, as usual, the covariance between two random variables X and Y is Cov(X, Y) = E[XY] − (EX)(EY). The bound is on the total variation distance between the laws of the random vectors. For random variables X and Y taking values in some space S, this distance is defined as
d_TV(X, Y) = sup_{B⊆S} |P[X ∈ B] − P[Y ∈ B]|.
Another characterization of the total variation distance between the laws of X and Y is as the minimum of P[X ≠ Y] over all couplings of X and Y; see [3, Section A.1].
Proposition 3.2 (Corollary 10.J.1 in [3]). Assume that Condition 3.1 holds. Let Y = (Y_α)_{α∈I} be a vector of independent Poisson random variables with EY_α = EI_α. Then
d_TV(I, Y) ≤ Σ_{α∈I} (EI_α)² + Σ_{α∈I} Σ_{β∈I⁻_α} (EI_α EI_β + E[I_α I_β]) + Σ_{α∈I} Σ_{β∈I⁺_α} Cov(I_α, I_β). (3.3)

We will also need the following technical lemma.

Lemma 3.3. Let Y = (Y_α)_{α∈I} and Z = (Z_α)_{α∈I} be vectors of independent Poisson random variables. Then d_TV(Y, Z) ≤ Σ_{α∈I} |EY_α − EZ_α|.
Proof. Suppose that U and V are Poisson with means a ≥ b. Then U and V can be coupled by letting W be Poisson with mean a − b, independent of V, and setting U = V + W; under this coupling, P[U ≠ V] = P[W ≥ 1] = 1 − e^{−(a−b)} ≤ a − b. Applying this coupling to Y_α and Z_α for each α, we obtain a coupling of Y and Z in which they differ with probability at most Σ_{α∈I} |EY_α − EZ_α|.
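Numerically, the coupling bound in this proof can be checked directly from the Poisson probability mass functions (a small sketch; the cutoff is our choice, taken large enough that the neglected tail is negligible):

```python
import math

def poisson_pmf(lam, cutoff):
    """pmf of Poi(lam) on 0..cutoff-1, computed iteratively to avoid overflow."""
    p, out = math.exp(-lam), []
    for k in range(cutoff):
        out.append(p)
        p *= lam / (k + 1)
    return out

def tv_poisson(a, b, cutoff=100):
    """Total variation distance between Poi(a) and Poi(b)."""
    pa, pb = poisson_pmf(a, cutoff), poisson_pmf(b, cutoff)
    return 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))

a, b = 1.3, 1.0
tv = tv_poisson(a, b)
# The coupling U = V + W with W ~ Poi(a - b) gives
# tv <= 1 - exp(-(a - b)) <= a - b.
```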
Next, we apply Proposition 3.2 to some indicators derived from a random permutation π on [n] := {1, . . . , n} with distribution to be specified. Let I(i, k) be the indicator of position i being k-separated in π. Fix r and n, and let
I = {(i, k) : k ∈ {−r + 1, . . . , r − 1}, i ∈ {r + 1, . . . , n}, and i + k ∈ {1, . . . , n}},
which is the set of (i, k) such that i > r and i might possibly be k-separated in π, for |k| ≤ r − 1.
Proof. The proof proceeds in three steps. First, we construct a coupling satisfying Condition 3.1. Next, we bound the expression on the right hand side of (3.3) to obtain a total variation bound between I and a random vector Y = (Y(i, k))_{(i,k)∈I} whose components are independent Poisson random variables with EY(i, k) = EI(i, k). Last, we bound the total variation distance between Y and Z.
One thing to observe before we start is that if i + k ∈ A, then I(i, k) = 0 deterministically for (i, k) ∈ I. The pairs (i, k) where this holds are irrelevant when we bound the distance between I and Y, as the corresponding component of Y is also deterministically zero. Thus we can ignore these terms in Steps 1 and 2 by removing them from I and Y. Let I′ = {(i, k) ∈ I : i + k ∉ A}. In a slight abuse of notation, we take I and Y to be indexed by I′ rather than by I in Steps 1 and 2 only.
Fix some (i_0, k_0) ∈ I′. Our goal is to construct (J(i, k))_{(i,k)∈I′} distributed as I conditioned on I(i_0, k_0) = 1, and to partition I′ so that (3.1)-(3.2) hold. (In the notation used in Condition 3.1, J(i, k) would be written J_{(i,k)(i_0,k_0)}. We omit mention of (i_0, k_0) to simplify notation.) Let τ be the transposition of the values π(i_0) and i_0 + k_0, and let π′ = τ ◦ π. This forces π′ to map i_0 to i_0 + k_0, making i_0 a k_0-separated position for π′. As π(i_0) cannot be an element of A (because i_0 > r) and i_0 + k_0 is not in A by the definition of I′, the permutation π′ also satisfies (3.4).
We show now that π′ is distributed as π conditioned on mapping i_0 to i_0 + k_0. Let Π be the set of permutations on n elements satisfying the conditions specified in (3.4), and let Π′ ⊆ Π be the set of permutations that also map i_0 to i_0 + k_0. One can easily check that for any σ′ ∈ Π′, there are exactly n − r permutations σ ∈ Π such that swapping σ(i_0) and i_0 + k_0 yields σ′. As π is distributed uniformly over Π, this implies that π′ is distributed uniformly over Π′, which shows that π′ is distributed as π conditioned on mapping i_0 to i_0 + k_0. Now set J(i, k) = 1{π′(i) = i + k} for (i, k) ∈ I′.

Let I⁻_{i_0,k_0} consist of all (i, k) ∈ I′ other than (i_0, k_0) such that either (i) i = i_0, or (ii) i + k = i_0 + k_0. Define I⁺_{i_0,k_0} to be the rest of I′ except for (i_0, k_0). For any (i, k) ∈ I⁻_{i_0,k_0}, it is impossible that π′(i) = i + k: If (i) holds, then we already know that π′(i) = i + k_0 ≠ i + k, since we have conditioned π′ to make this so. If (ii) holds, then we already know that π′(i_0) = i_0 + k_0, and so it cannot be that π′(i) = i + k = i_0 + k_0 with i ≠ i_0. Thus J(i, k) = 0 ≤ I(i, k) on I⁻_{i_0,k_0}, and (3.1) is satisfied.

To see that (3.2) is satisfied, suppose that J(i, k) = 0 and I(i, k) = 1 for some (i, k) ∈ I′. We will show that (i, k) ∈ I⁻_{i_0,k_0}. By our assumption, π(i) = i + k and π′(i) ≠ i + k. Thus τ swaps i + k with some other value. By the definition of τ, either i + k = π(i_0) or i + k = i_0 + k_0. In the first case, we have π(i_0) = π(i), implying that i_0 = i; thus (i, k) satisfies (i). In the second case, we have i = i_0 + k_0 − k, satisfying (ii). Thus our coupling satisfies (3.2).
The conditions of Proposition 3.2 are now satisfied, and we just need to bound the three terms on the right hand side of (3.3). We start with the observation that π can be thought of as a uniformly random bijection from {r + 1, . . . , n} to {1, . . . , n} \ A. Thus, for any (i, k) ∈ I′, the probability that π maps i to i + k is 1/(n − r); equivalently, EI(i, k) = 1/(n − r). Now, we bound the first term in (3.3). As |I′| ≤ 2r(n − r), we have
Σ_{(i,k)∈I′} (EI(i, k))² ≤ 2r(n − r)/(n − r)² = 2r/(n − r).
For the next term, observe that if (i, k) ∈ I⁻_{i_0,k_0}, then I(i, k) and I(i_0, k_0) cannot simultaneously be 1, so that E[I(i, k)I(i_0, k_0)] = 0. For any (i_0, k_0), the number of pairs (i, k) satisfying (i) is at most 2r, and the number satisfying (ii) is at most 2r, so |I⁻_{i_0,k_0}| ≤ 4r. Thus
Σ_{(i_0,k_0)∈I′} Σ_{(i,k)∈I⁻_{i_0,k_0}} EI(i, k) EI(i_0, k_0) ≤ 2r(n − r) · 4r/(n − r)² = 8r²/(n − r).
For the final term, suppose that (i, k) ∈ I⁺_{i_0,k_0}. As π conditioned on I(i_0, k_0) = 1 is a uniformly random element of the set Π′ from Step 1, we have E[I(i, k) | I(i_0, k_0) = 1] = 1/(n − r − 1), so that
Cov(I(i, k), I(i_0, k_0)) = 1/((n − r)(n − r − 1)) − 1/(n − r)² = 1/((n − r)²(n − r − 1)).
Summing over the at most |I′|² ≤ 4r²(n − r)² such pairs, the third term of (3.3) is at most 4r²/(n − r − 1). Combining the three bounds, (3.3) yields
d_TV(I, Y) ≤ 2r/(n − r) + 8r²/(n − r) + 4r²/(n − r − 1). (3.8)
Applying Lemma 3.3 and using the bounds |I \ I′| ≤ 2r² and |I′| ≤ 2r(n − r), we obtain a bound on the total variation distance between Y and Z. Summing this bound and the one in (3.8) proves the theorem.
Proof. As a first step, let ξ̃^(n)_k be the point process obtained by restricting ξ^(n)_k to the interval [(r + 1)/n, 1]. For any i and k, the probability that position i is k-separated in π_n is at most 1/n. By a union bound, the probability that any of the positions 1, . . . , r is k-separated in π_n for some −r < k < r is at most 2r²/n. Thus
(π_n(1)/n, . . . , π_n(r)/n, ξ̃^(n)_{−r+1}, . . . , ξ̃^(n)_{r−1}) = (π_n(1)/n, . . . , π_n(r)/n, ξ^(n)_{−r+1}, . . . , ξ^(n)_{r−1})
except with probability at most 2r²/n. Since this probability vanishes as n → ∞, it suffices to show the convergence in distribution of the right hand side of the above equation to the limit in (3.9). Now, observe by direct calculation that the limit
lim_{n→∞} P[(π_n(1)/n, . . . , π_n(r)/n) ∈ E_1 × · · · × E_r]
is the product of the lengths of the intervals E_1, . . . , E_r ⊆ [0, 1]. To finish the proof, we will show that the law of (ξ̃^(n)_{−r+1}, . . . , ξ̃^(n)_{r−1}) conditional on π_n(1), . . . , π_n(r) converges weakly to the law of (ξ_{−r+1}, . . . , ξ_{r−1}).
Theorem 3.6. As n → ∞, the fixed point forest F_n, rooted at a uniformly random permutation π_n, converges locally weakly to the tree T.

Proof. Fix some integer r ≥ 1, and assume throughout that n > r. We need to show that the r-ball around π_n in F_n converges in distribution to the r-ball around the root in T, which is ϕ_r(U_1, . . . , U_r; ξ_{−r+1}, . . . , ξ_{r−1}).
The idea will be to show that this tree T n is almost the same thing as the r-neighborhood of π n in F n , and then to apply Proposition 3.5 to show that T n converges in distribution to the r-ball around the root in T .
Claim 3.7. If the point processes ξ^(n)_{−r+1}, . . . , ξ^(n)_{r−1} contain no points in the interval [0, r/n], then T_n is identical to the r-ball around π_n in F_n.
Proof. The algorithm defining T_n differs from the true r-ball around π_n in F_n in two ways. First, when moving forward in the tree by bumping a fixed point at position i to position 1, no point at 1/n is inserted into the point process ξ^(n)_{i−1}. For this to cause T_n to lack a vertex in the r-ball around π_n, it must occur that after s ≥ 0 steps backward in the tree from π_n, there is a fixed point at i; after bumping it, one can move forward in the tree another i − 1 times to return the fixed point to i; and then one can bump the fixed point again, all while remaining within the r-ball around π_n. Thus, it is necessary that s + i < r. Under the conditions of this claim, π_n(1), . . . , π_n(r) > r, since otherwise π_n would have a k-separated point at position i for some −r + 1 ≤ k ≤ r − 1 and 1 ≤ i ≤ r. Hence, one must move backwards from π_n at least r − i + 1 times to create a fixed point at i, showing that this circumstance can never occur.
The second difference in the algorithm is that points in the point processes are not shifted by 1/n at each step. One could equally well define the algorithm giving T_n only in terms of the relative order of the points in ξ^(n)_{−r+1}, . . . , ξ^(n)_{r−1} and of the values π_n(1)/n, . . . , π_n(r)/n, and this order is unaffected by the shifts.

As n → ∞, the condition in this claim holds with probability approaching 1, as we showed with a union bound in the proof of Proposition 3.5. Thus we only need to show that T_n converges in distribution to the r-ball around the root in T, that is, that
ϕ_r(π_n(1)/n, . . . , π_n(r)/n; ξ^(n)_{−r+1}, . . . , ξ^(n)_{r−1}) d−→ ϕ_r(U_1, . . . , U_r; ξ_{−r+1}, . . . , ξ_{r−1}).
Once we show that ϕ_r is continuous on a set that almost surely contains (U_1, . . . , U_r; ξ_{−r+1}, . . . , ξ_{r−1}), this follows from Proposition 3.5 and the continuous mapping theorem.

We claim that ϕ_r is continuous at any point (u_1, . . . , u_r; ζ_{−r+1}, . . . , ζ_{r−1}) where all atoms of the point process configurations ζ_{−r+1}, . . . , ζ_{r−1} are distinct and all u_1, . . . , u_r ∈ [0, 1] are distinct from each other and from any points in the configurations. In fact, ϕ_r is constant on a neighborhood of (u_1, . . . , u_r; ζ_{−r+1}, . . . , ζ_{r−1}), as a slight perturbation that does not change the order of any of the points u_1, . . . , u_r or the points in ζ_{−r+1}, . . . , ζ_{r−1} does not change the resulting tree. As (U_1, . . . , U_r; ξ_{−r+1}, . . . , ξ_{r−1}) has these properties almost surely, this proves the continuity property of ϕ_r that we needed.
As a consequence of Theorem 3.6, any statistic of π_n determined by its r-neighborhood in F_n converges in distribution to the corresponding statistic of the limit tree T. This excludes many statistics, like the distance from π_n to the nearest leaf of F_n, which are nearly local but cannot be deduced from the r-neighborhood of π_n for any fixed value of r. The following corollary addresses this by a truncation argument.

Corollary 3.8. Let f be a function defined on rooted trees, and suppose that min(f(T_0), r) is determined by the r-neighborhood of the root of T_0, for any rooted tree T_0 and any r ≥ 0. Then f(F_n) converges in distribution to f(T).
Proof. By assumption, we have min(f(F_n), r) = γ(F_n(r)), where F_n(r) denotes the r-ball around the root of F_n and γ is some deterministic function on rooted trees. By Theorem 3.6, as n → ∞ we have F_n(r) d−→ T(r) with respect to the discrete topology on rooted trees. Hence, by the continuous mapping theorem, γ(F_n(r)) d−→ γ(T(r)), and in our original notation, min(f(F_n), r) d−→ min(f(T), r). As P[X ≤ x] = P[min(X, r) ≤ x] for x < r, it holds for any x at which the distribution function x ↦ P[f(T) ≤ x] is continuous that P[f(F_n) ≤ x] → P[f(T) ≤ x] as n → ∞, which proves the corollary.

Combinatorics of paths to leaves
In the next section, we will be interested in the limiting distributions of the shortest and longest distance of a random permutation to a leaf. In preparation for this, we consider the combinatorics related to the shortest and longest path to a leaf in this section.
For a permutation π ∈ S_n, let T(π) be the fixed point tree with π as root. We call i a true fixed point of π if π(i) = i and i ≠ 1.

Proposition 4.1. Given π ∈ S_n, a shortest path from π to a leaf in T(π) is obtained by bumping the rightmost true fixed point at each step.
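The rule of Proposition 4.1 is easy to run; the following sketch (our helper names) also includes a brute-force minimum over all paths for comparison:

```python
def true_fixed_points(pi):
    return [i for i in range(2, len(pi) + 1) if pi[i - 1] == i]

def bump_to_front(pi, i):
    return [i] + pi[:i - 1] + pi[i:]

def shortest_path_length(pi):
    """Distance to a leaf: always bump the rightmost true fixed point."""
    steps = 0
    fps = true_fixed_points(pi)
    while fps:
        pi = bump_to_front(pi, fps[-1])
        fps = true_fixed_points(pi)
        steps += 1
    return steps

def min_leaf_distance(pi):
    """Brute-force minimum over all paths to a leaf, for comparison."""
    fps = true_fixed_points(pi)
    if not fps:
        return 0
    return 1 + min(min_leaf_distance(bump_to_front(pi, i)) for i in fps)
```

For π = 32415 the rightmost rule bumps 5, then 4, then 3, reaching a leaf in 3 steps, which matches the brute-force minimum.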

Remark 4.2.
Note that the procedure of picking the rightmost true fixed point in π ∈ S_n and then bumping that letter can be reformulated in the following way. Replace each letter in the one-line notation for π by its separation k. Then scan the word from right to left and successively pick the first 0, then the first 1 to its left, then the first 2 to the left of that, and so on, until no further letter can be picked. Picking a 0 in position 1 is not allowed. The letters in the picked positions of the original π are exactly the letters that are bumped.
For example, take π = 32415. The k-separation word is given by 2013̄0 (where, as before, a negative separation −k is written as k̄). Scanning from the right, we pick the 0 in position 5, then the 1 in position 3, then the 2 in position 1, so the bumped letters are 5, 4, 3.

Then the sequence R_1 L_1 r_1 R_2 L_2 r_2 · · ·, where each r_i is a single letter, is a bumpable sequence on v_{h+1}, due to the relative shift of letters in the leftmost f_{i−1} + 1 positions. This indeed shows that v′ is strictly longer than v.
Hence the path given by always bumping the leftmost true fixed point is the unique longest path to a leaf.
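The leftmost rule makes the longest path directly computable; a quick sketch (our names), checked against the value 2^{n−1} − 1 attained by the identity permutation:

```python
def longest_path_length(pi):
    """Length of the longest path to a leaf: always bump the leftmost
    true fixed point."""
    steps = 0
    while True:
        fps = [i for i in range(2, len(pi) + 1) if pi[i - 1] == i]
        if not fps:
            return steps
        i = fps[0]
        pi = [i] + pi[:i - 1] + pi[i:]
        steps += 1

# Identity permutations of sizes 2..5 give lengths 2**(n-1) - 1.
lengths = [longest_path_length(list(range(1, n + 1))) for n in (2, 3, 4, 5)]
```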
Note that unlike for the longest path to a leaf, the shortest path is not unique.
The shortest path to a leaf given by always bumping the rightmost fixed point is given in (4.1); other shortest paths exist as well.

Denote by B := B(π) the set of bumped values in the longest path from π to a leaf. For example, B(32415) = {2, 3, 4, 5}, as can be seen from Example 4.4. We now provide a characterization of the set B that enables us to determine whether π(i) ∈ B given only knowledge of π(i + 1), . . . , π(n).
Lemma 4.5. For π ∈ S_n and 1 ≤ i ≤ n,
π(i) ∈ B ⇐⇒ π(i) ≠ 1 and 0 ≤ π(i) − i ≤ #{j > i | π(j) ∈ B}. (4.2)

Proof. Let π(i) ∈ B. First we show that the conditions on the right of (4.2) then hold. Since only true fixed points can be bumped, we certainly have π(i) ≠ 1. Note that π(i) − i is the separation of the letter π(i) at position i. In the longest path to a leaf, a letter in one-line notation either stays in its position or moves to the right, unless it is bumped. Hence the separation of a letter either remains the same or becomes smaller (unless the letter is bumped). This implies that a letter with a negative separation can never be bumped, which shows that 0 ≤ π(i) − i. Now suppose that π(i) − i = k, meaning that π(i) is k-separated. Since π(i) ∈ B and hence must be bumped in the path to the leaf, there must be at least k letters to the right of π(i) that are bumped, to move π(i) k positions to the right and make it 0-separated. This is precisely the condition π(i) − i ≤ #{j > i | π(j) ∈ B}.
Conversely, suppose that π(i) satisfies the conditions on the right of (4.2), and set π(i) − i = k. By Proposition 4.3, the longest path is obtained by always bumping the leftmost true fixed point, and hence any point that becomes 0-separated will eventually be bumped. Since by assumption 0 ≤ k ≤ #{j > i | π(j) ∈ B}, there are at least k points to the right of π(i) that get bumped, which makes π(i) eventually 0-separated, and hence π(i) ∈ B.
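The right-to-left scan implicit in Lemma 4.5 is straightforward to implement. A minimal sketch (our own function name), revealing π(n), …, π(1) and testing condition (4.2) at each position:

```python
def bumped_values(p):
    # Lemma 4.5: v = p(i) belongs to B iff v != 1 and
    # 0 <= v - i <= #{j > i : p(j) in B}
    n = len(p)
    B = set()
    for i in range(n, 0, -1):          # scan positions right to left
        v = p[i - 1]
        right = sum(1 for j in range(i + 1, n + 1) if p[j - 1] in B)
        if v != 1 and 0 <= v - i <= right:
            B.add(v)
    return B
```

For π = 32415 this returns {2, 3, 4, 5}, matching the example above; a derangement such as 21 bumps nothing, and the identity bumps every letter except 1.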
Using the above ideas, we can give a bound on the length of the longest path to a leaf, which will be useful in Section 5. For π ∈ S_n, we denote by ℓ(π) the length of the longest path from π to a descendent leaf.

Lemma 4.6. Fix π ∈ S_n and let B = B(π) = {b_1 < b_2 < ⋯ < b_k}.

Proof. Let N_i denote the number of times b_i is bumped in the longest path in T(π), and let M_i = Σ_{j ≥ i} N_j. With this notation, ℓ(π) = M_1. Every time the letter b_j is bumped to the beginning, it must be moved to the right one position at a time by b_j − 1 bumps of letters to its right. The position of b_j can be increased only once by bumps of each smaller letter b_i < b_j, so at least b_j − j letters greater than b_j must be bumped for the value b_j to return to position b_j and become eligible to be bumped again. Therefore

N_j ≤ 1 + M_{j+1}/(b_j − j), and hence M_j = N_j + M_{j+1} ≤ 1 + (1 + 1/(b_j − j)) M_{j+1}.

By inductively applying this inequality starting from M_k ≤ 1, we obtain (4.3). The first bound in (4.3) is sharp, giving the correct length 2^{n−1} − 1 for the identity permutation of length n.
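The recursion in the proof can be checked numerically by instrumenting the leftmost-bumping walk; the sketch below (our own helper names) assumes it takes the division-free form (b_j − j)(N_j − 1) ≤ M_{j+1}, and also checks the sharpness claim ℓ(id_n) = 2^{n−1} − 1.

```python
from collections import Counter
import random

def leftmost_walk_bumps(p):
    # run the longest path (always bump the leftmost true fixed point),
    # returning the sequence of bumped values
    bumps = []
    while True:
        fps = [i for i in range(2, len(p) + 1) if p[i - 1] == i]
        if not fps:
            return bumps
        i = fps[0]
        bumps.append(p[i - 1])
        p = (p[i - 1],) + p[:i - 1] + p[i:]

def check_recursion(p):
    # verify (b_j - j) * (N_j - 1) <= M_{j+1} for every bumped value b_j
    bumps = leftmost_walk_bumps(p)
    N = Counter(bumps)
    b = sorted(N)                       # b_1 < b_2 < ... < b_k
    for j, bj in enumerate(b, start=1):
        M_next = sum(N[v] for v in b[j:])   # bumps of values larger than b_j
        assert (bj - j) * (N[bj] - 1) <= M_next
    return len(bumps)
```

On the identity the inequality holds with equality at every step, consistent with the sharpness of the bound.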

Limiting distributions and higher moments
In this section we compute the limiting distributions of two statistics of the fixed point forest F_n using the local weak limit we have constructed in Section 3. As usual, π_n is a uniformly random permutation of length n. We study two statistics related to leaves:
1. the distance M_n from π_n to the nearest leaf descending from it;
2. the distance L_n from π_n to the farthest leaf descending from it.
We use Geo(q) to refer to the geometric distribution on {0, 1, …} with parameter q: the number of failures before the first success in independent trials that are successful with probability q. This distribution places probability (1 − q)^k q on k and has mean (1 − q)/q.
(i) Let M ∼ Poi(1). As n → ∞, we have M_n → M in distribution and EM_n^p → EM^p for all p > 0.
(ii) Let L ∼ Geo(e^{−1}). As n → ∞, we have L_n → L in distribution and EL_n^p → EL^p for all p > 0.
Proof. We will prove that M_n → M and L_n → L in distribution in Propositions 5.3 and 5.5, respectively. In Proposition 5.11, we will show that sup_n EM_n^p < ∞ and sup_n EL_n^p < ∞ for any p > 0. It is a standard result that this proves the convergence of all moments (see [7, Theorem 4.5.2], for instance).
An interesting open problem is to determine the limiting distribution of the number of steps from π n to a leaf when moving randomly towards a leaf.
Sections 5.1 and 5.2 are devoted to proving the convergence of M_n and L_n to their limiting distributions. Both M_n and L_n are functionals of F_n satisfying the criteria of Corollary 3.8. This corollary then proves that M_n and L_n converge in distribution to the corresponding functionals of the limit tree. Thus, our task in these sections is to work out the distributions of the distances in T from the root to the nearest and farthest leaves descending from it. Section 5.3 gives our proof that M_n and L_n are bounded in L^p, establishing the convergence of moments. We emphasize that this result is not just a technicality. The reentry phenomenon mentioned on Page 4 can cause L_n to be extremely large. For example, if π_n begins with the string 1 ⋯ k, then L_n ≥ 2^{k−1}, with the same small letters bumped repeatedly. The probability of such reentry is vanishingly small, making it irrelevant to the distributional convergence of L_n. But these unlikely events nonetheless contribute to the moments of L_n, and a priori it is plausible that even the expectation of L_n tends to infinity. To prove that this is not the case, we are forced to develop several bounds that should be useful in future work on the fixed point forest.

Shortest distance to a leaf
Our first lemma is the limiting analogue of Proposition 4.1.

Lemma 5.2. The shortest path from the root of T to a descendent leaf is obtained by bumping the rightmost abstracted fixed point at each step.

Proposition 5.3. As n → ∞, we have M_n → M in distribution, where M ∼ Poi(1).

Proof. Let M be the distance in the random tree T from the root to its nearest descendent leaf. As we mentioned in our summary of Section 5, Corollary 3.8 shows that M_n → M in distribution, and thus we need only show that M ∼ Poi(1). By Lemma 5.2, M is the number of steps taken if we start at the root of T and bump the rightmost fixed point until no fixed points remain.
Walking towards a leaf in the tree in this way can be viewed as follows. Recall that ρ is the root of T. If ξ_0^ρ has no points, then the walk is over, and M = 0. Otherwise, let X_1 be the rightmost point of ξ_0^ρ, and let v_1 be the child of ρ corresponding to bumping it. Now, if ξ_0^{v_1} has no points, then the walk is over and M = 1. Otherwise, let X_2 be the rightmost point of ξ_0^{v_1}, which is necessarily to the left of X_1, since ξ_0^{v_1} has no points in (X_1, 1]. Continue in this way, producing a sequence of vertices ρ, v_1, …, v_M and a corresponding sequence of points X_1, …, X_M.
In this procedure, X_1 is the rightmost point of ξ_0^ρ, then X_2 is the rightmost point of ξ_1^ρ restricted to [0, X_1), then X_3 is the rightmost point of ξ_2^ρ restricted to [0, X_2), and so on. We can interpret this as follows: we start at 1, moving backwards through a Poisson point process until we encounter a point, which takes Exp(1) time to arrive. Then, looking at a different Poisson process, we move backwards until we encounter a point, which again arrives in Exp(1) time, independent of the first arrival. Continuing in this way, M is the total number of points encountered before we make it back to time 0. This is the same as the number of arrivals of a single unit-intensity Poisson process between times 0 and 1, which is distributed as Poi(1).
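The backwards-scanning argument can be simulated directly: each successive rightmost point lies an independent Exp(1) distance to the left, and M counts the points found before passing time 0. A seeded sketch (our own function name):

```python
import math
import random

def sample_M(rng):
    # scan backwards from time 1; each gap to the next rightmost point is Exp(1)
    t, steps = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t >= 1.0:
            return steps
        steps += 1

rng = random.Random(0)
samples = [sample_M(rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
p_zero = samples.count(0) / len(samples)
# Poi(1) has mean 1 and P(M = 0) = 1/e
```

With the fixed seed, the empirical mean is close to 1 and the empirical probability of M = 0 is close to e^{−1} ≈ 0.368.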

Farthest distance to a leaf
The next lemma is the limiting analogue of Proposition 4.3.
Lemma 5.4. The unique longest path from the root of T to a descendent leaf is obtained by bumping the leftmost abstracted fixed point at each step.
Proof. Suppose we have some sequence of points bumped in a path from the root of T to a descendent leaf in which some point y was not the leftmost when it was bumped. We will show the existence of a strictly longer path to a leaf. Let x < y be a point that could have been bumped instead of y. Decompose the original sequence of bumped points as

P y L_1 a_1 L_2 a_2 ⋯ a_{r−1} L_r. (5.1)

Capital letters in (5.1) denote (possibly empty) strings of points and lowercase letters denote single points. The string P consists of all bumped points prior to y. Next, L_1 is made up of points smaller than x, and a_1 is the first point larger than x. Then L_2 consists of points smaller than x, and a_2 is the next subsequent point larger than x, and so on. Note that almost surely, no point of ξ_0^ρ, ξ_1^ρ, … occurs more than once, so we never need to worry about repeated values in these sequences. We claim that we can instead bump the following sequence of points:

P x L_1 y L_2 a_1 L_3 a_2 ⋯ a_{r−2} L_r a_{r−1}. (5.2)
As this sequence is one longer than (5.1), this claim completes the proof. Thus, we just need to show that this sequence is in fact bumpable, by which we mean that when each point is bumped in turn, the next one is an abstracted fixed point. Before we do this, we make a key observation: suppose that z_1 ⋯ z_k and z'_1 ⋯ z'_k are two bumpable sequences, each of which contains the same number of points larger than z. Then z_1 ⋯ z_k z is bumpable if and only if z'_1 ⋯ z'_k z is bumpable. Clearly, P x is bumpable. Since all points in L_1 are smaller than x and x < y, as each point of L_1 is encountered in (5.1) and (5.2), the same number of larger points has already been encountered in each sequence. By our observation, the bumpability of P y L_1 implies the bumpability of P x L_1. Next, since P y is bumpable and all points in x L_1 are smaller than y, the sequence P x L_1 y is also bumpable. Repeating this reasoning, bumpability of P y L_1 a_1 L_2 implies bumpability of P x L_1 y L_2, and bumpability of P y L_1 a_1 implies bumpability of P x L_1 y L_2 a_1. Continuing in this way, we arrive at the bumpability of (5.2).
The following proof involves a continuous-time Markov chain known as a Yule process. At state k, it jumps to k + 1 at rate k, with no other transitions allowed. It is well known that if (Y_t)_{t ≥ 0} is a Yule process starting at 1, then Y_t − 1 ∼ Geo(e^{−t}); see [11, Section 4.1.D], for example.

Proposition 5.5. As n → ∞, we have L_n → L in distribution, where L ∼ Geo(e^{−1}).

Proof. Let L be the length of the longest path in the limit tree T from the root to a descendent leaf. By Corollary 3.8, we have L_n → L in distribution. By Lemma 5.4, the random variable L is the length of the path down T given by bumping the leftmost fixed point until none remain.
First, we sketch one way to compute the distribution of L. Suppose we start at the root ρ of T. With probability e^{−1}, there are no points in ξ_0^ρ, and L = 0. If not, we bump the leftmost point X of ξ_0^ρ to move to a vertex v. The next set of abstracted fixed points is ξ_0^v, which is equal to ξ_1^ρ on [0, X) and to ξ_0^ρ on (X, 1]. This is a new Poisson point process independent of the past, and so again we stop with probability e^{−1}, in which case L = 1. Continuing on in this way, L is the number of failures before the first success in independent trials that succeed with probability e^{−1}.
The above argument works and is the most direct, but we give a different proof now whose ideas will be useful in Section 5.3. Let B ⊆ [0, 1] be the set of points bumped in the longest path in T from the root to a descendent leaf. Lemma 4.5 can be restated in this setting: a point x of ξ_k^ρ is an element of B if and only if 0 ≤ k ≤ |B ∩ (x, 1]|. Thus, we can progressively build our set B by the following procedure. Start with B empty. Scan ξ_0^ρ from right to left starting at time 1 until a point is encountered, and add it to B. Now, scan ξ_0^ρ and ξ_1^ρ from right to left from this point until a point is encountered, and add it to B. Now scan ξ_0^ρ, ξ_1^ρ, and ξ_2^ρ, and so on, stopping when we reach time 0. In this procedure, the first point arrives at rate 1, since it is the first arrival time of a unit-intensity Poisson process. The next point arrives at rate 2, since it is the first arrival time of two independent unit-intensity Poisson processes, and the next at rate 3, and so on. Thus, the size of B is the number of increases of a Yule process in time 1, which is distributed as Geo(e^{−1}).
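The Yule description can be simulated as a check: starting from state 1, jumps arrive at rate equal to the current state, and the number of jumps completed by time 1 should be Geo(e^{−1}), with mean e − 1. A seeded sketch (our own function name):

```python
import math
import random

def yule_jumps(rng, horizon=1.0):
    # Yule process from state 1: in state k, the next jump arrives at rate k
    t, k = 0.0, 1
    while True:
        t += rng.expovariate(k)
        if t >= horizon:
            return k - 1          # number of jumps completed by the horizon
        k += 1

rng = random.Random(1)
samples = [yule_jumps(rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
p_zero = samples.count(0) / len(samples)
# Geo(1/e) has mean e - 1 and places probability 1/e on 0
```

With the fixed seed, the empirical mean is close to e − 1 ≈ 1.718 and the empirical probability of no jumps is close to e^{−1}.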
It follows from this proposition that the root of T has finitely many descendants almost surely. In forthcoming work, this will be investigated further, and it will be shown that the expected number of descendants is infinite.

Higher moments
To bound the moments of L_n and M_n, we must leave behind the limit tree T and work directly with the finite fixed point forest. Since M_n ≤ L_n and we are only looking for upper bounds, we will deal exclusively with L_n. Our first result gives an upper bound on L_n in terms of B = B(π_n) = {b_1, b_2, …, b_k}, the set of letters bumped in the longest path from π_n to a descendent leaf. Since b_1 < b_2 < ⋯ < b_k, the differences b_i − i are weakly increasing. Fix some x > 0 and denote by B_x := {b_i ∈ B | b_i − i < x}.

Lemma 5.6. Fix x > 0. Then the bound in (5.3) holds.

Proof. With B as above, the bound in (5.3) follows directly from Lemma 4.6.
We will bound EL p n using (5.3) by showing that |B x | and |B| are unlikely to be large.
Our first result in this direction, interesting in its own right, is an exponential tail bound on |B|. Proposition 5.5 exactly computes the distribution of the corresponding quantity in the limit case as Geo(e^{−1}). Our proof here follows the same intuition, but it will be considerably more difficult.
Proposition 5.7. For some constant C < 1, it holds for all k, n ≥ 0 that P[|B| ≥ k] ≤ C^k.
We will need several preliminary lemmas first. Recall from Lemma 4.5 that B can be constructed by moving leftward through the random permutation π_n, successively revealing π_n(n), π_n(n − 1), …, π_n(1) and tracking the set B as we go. Reversing the indexing so that we can count up instead of down, define

X_i = #{j > n − i | π_n(j) ∈ B}.

This yields a pure birth process (that is, one that either stays the same or increases by one at each step) starting at X_0 = 0 and ending at X_n = |B|. Our goal is to prove an exponential tail bound for X_n that does not depend on n. In Proposition 5.5, we showed that the analogous process for the limit tree is a Yule process. The process (X_i)_{0 ≤ i ≤ n} is not as straightforward, but the following lemma shows that it approximates a Yule process in the sense that it increases with probability approximately proportional to its current value.
Lemma 5.8. For k < n/2,

P[X_{i+1} = X_i + 1 | X_i = k] ≤ (k + 1)/(n − 2k). (5.4)

Proof. For a given permutation σ ∈ S_n, let x_i(σ) = #{j > n − i | σ(j) ∈ B}, so that X_i = x_i(π_n). Taking i and k to be fixed, for each 0 ≤ ℓ ≤ k define

T = {σ ∈ S_n | x_i(σ) = k and σ(n − i) − (n − i) = ℓ},    U = {σ ∈ S_n | x_i(σ) = k},

suppressing the dependence of T on ℓ. By Lemma 4.5, the process x_i(σ) increases at its next step if and only if 0 ≤ σ(n − i) − (n − i) ≤ x_i(σ) and σ(n − i) ≠ 1, showing that

P[X_{i+1} = X_i + 1 | X_i = k] ≤ Σ_{ℓ=0}^{k} |T|/|U|. (5.5)

Now, we compare the sizes of T and U by a combinatorial switching argument. Suppose that σ ∈ T. For any j ∈ [n], let σ_j = (σ(n − i), σ(j)) ∘ σ, the permutation given by swapping the values at positions n − i and j in σ. Given σ_j, i, ℓ, and n but without σ or j, we can recover σ by the formula σ = (n − i + ℓ, σ_j(n − i)) ∘ σ_j and j by j = (σ_j)^{−1}(n − i + ℓ). This shows that the map from T × [n] → S_n given by (σ, j) ↦ σ_j is injective. We claim that for any σ ∈ T, the permutation σ_j falls in U if either of the following holds: (i) j ≤ n − i; (ii) j > n − i + ℓ and σ(j) ∉ B(σ).
Indeed, in the first case, σ and σ_j differ only at locations n − i and smaller. Since x_i(σ) is determined by σ(n − i + 1), …, σ(n), we have x_i(σ_j) = x_i(σ) = k, and hence σ_j ∈ U. In the second case, the only way for x_i(σ_j) to be different from x_i(σ) is if σ_j(j) ∈ B. But this cannot occur, since σ_j(j) = σ(n − i) = n − i + ℓ and j > n − i + ℓ, so σ_j(j) has negative separation at position j. For σ ∈ T, there are n − i choices of j satisfying (i). There are another i − ℓ choices of j satisfying j > n − i + ℓ; at most x_{i−ℓ}(σ) ≤ x_i(σ) = k of these have σ(j) ∈ B(σ), giving us at least i − ℓ − k choices of j satisfying (ii). Thus, for each σ ∈ T, there are at least n − ℓ − k choices of j for which σ_j ∈ U. By injectivity of the map (σ, j) ↦ σ_j, we get |U| ≥ (n − ℓ − k)|T| ≥ (n − 2k)|T| for ℓ ≤ k. Substituting this into (5.5) gives (5.4). Now, we couple X_i with a true Yule process, whose marginal distributions we know to be exactly geometric. The only complication is that we lose control of (X_i) if it becomes too large, which we deal with by considering it only up to a stopping time.
Lemma 5.9. Let (Y_t) be a Yule process starting from 1. Let S = min{i | X_i ≥ n/4}, taking S = n if the minimum is over the empty set. The processes (X_i)_{0 ≤ i ≤ n} and (Y_t)_{0 ≤ t ≤ 4} can be coupled so that 1 + X_i ≤ Y_{4i/n} for all i ≤ S.

Proof. Essentially, we just need to confirm that X_i is less likely to increase from time i to i + 1 than Y is from time 4i/n to 4(i + 1)/n. Formally, we start with (X_i, i ∈ [n]) and then build up the Yule process inductively on the same probability space. Since a Yule process is Markov, we can construct (Y_t) up to some time t_0 and then extend it by attaching to its end a new Yule process starting at Y_{t_0}.
Assume for some j ∈ {0, …, n} that we have already constructed a Yule process (Y_t)_{0 ≤ t ≤ 4j/n} so that 1 + X_i ≤ Y_{4i/n} for i ≤ min(j, S). Note that this is trivial in the starting case j = 0. We want to show that if S > j, then we can extend (Y_t) up to time 4(j + 1)/n so that 1 + X_{j+1} ≤ Y_{4(j+1)/n}. (If S ≤ j, then we can just extend (Y_t) up to time 4(j + 1)/n independently of X_{j+1}, since the relationship between the two is then irrelevant.) So, we assume S > j, which implies that X_j < n/4. According to Lemma 5.8,

P[X_{j+1} = X_j + 1 | X_j] ≤ (X_j + 1)/(n − 2X_j) ≤ 2(X_j + 1)/n

under this assumption. The next increase of a Yule process at state Y_{4j/n} arrives at rate Y_{4j/n}. Thus, conditional on Y_{4j/n}, the probability that a Yule process increases from Y_{4j/n} to Y_{4j/n} + 1 within time 4/n is

1 − e^{−4Y_{4j/n}/n} ≥ 1 − e^{−4(1 + X_j)/n} ≥ 2(1 + X_j)/n.

The first inequality uses the inductive hypothesis that 1 + X_j ≤ Y_{4j/n}, and the second uses the inequality 1 − e^{−x} ≥ x/2, which holds for 0 ≤ x ≤ 1. Thus, conditional on X_j and Y_{4j/n}, a Yule process starting at Y_{4j/n} is more likely to increase in time 4/n than is the process (X_i) from time j to j + 1. Thus, conditionally on X_j and Y_{4j/n}, we can couple a Yule process starting at Y_{4j/n} of duration 4/n with (X_i) so that it increases whenever (X_i) does from j to j + 1. Tacking this Yule process onto the end of (Y_t), we have extended our coupling as desired.
Proof of Proposition 5.7. Recall that X_n = |B|. Couple (X_i) and (Y_t) according to Lemma 5.9. Fix k ≤ n/4, and let S' = min{i | X_i ≥ k}, taking S' = n if the minimum is over the empty set. Since S' ≤ S, we have 1 + X_{S'} ≤ Y_{4S'/n}. We claim that if X_n ≥ k, then Y_4 ≥ 1 + k. To see this, observe that if X_n ≥ k, then X_{S'} ≥ k. Thus Y_4 ≥ Y_{4S'/n} ≥ 1 + X_{S'} ≥ 1 + k. Since Y_4 − 1 ∼ Geo(e^{−4}), it follows that for k ≤ n/4,

P[|B| ≥ k] ≤ P[Y_4 − 1 ≥ k] = (1 − e^{−4})^k. (5.6)

Setting C = (1 − e^{−4})^{1/4} < 1, the claimed bound reads

P[|B| ≥ k] ≤ C^k. (5.7)

The right-hand side of (5.7) is larger than that of (5.6). As X_n does not take values larger than n, this shows that (5.7) holds for all k ≥ 0.
Next, we prove a subexponential tail bound on the size of B x for any fixed x.
Lemma 5.10. For x > 0 and t ≤ n − x,

P[|B_x| ≥ t] ≤ x^t/t! ≤ (ex/t)^t. (5.8)

Proof. In order for each of the first t values in B to satisfy b_i − i < x, all of these values must be less than t + x. For a specific choice of letters 1 < c_1 < ⋯ < c_t, let E(c_1, …, c_t) denote the event that the first t values of B are c_1, …, c_t. If |B_x| ≥ t, then E(c_1, …, c_t) holds for one of at most (t + x choose t) possible choices of (c_1, …, c_t). For a particular choice of letters c_1 < ⋯ < c_t < t + x,

P[E(c_1, …, c_t)] ≤ ∏_{i=1}^{t} x/(n − i + 1).

To prove the final bound in (5.8), we apply the union bound and evaluate

(t + x choose t) ∏_{i=1}^{t} x/(n − i + 1) = x^t (x + t) ⋯ (x + 1) / (t! · n ⋯ (n − t + 1)) ≤ x^t/t!,

since x + t ≤ n under the assumptions of the lemma, and then we apply the bound t! ≥ (t/e)^t. Proposition 5.7 and Lemma 5.10 now combine to bound EL_n^p.
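The chain of bounds in the proof of Lemma 5.10 can be sanity-checked numerically for integer x; a sketch, assuming the computation takes the form binom(t + x, t) · ∏_{i=1}^{t} x/(n − i + 1) ≤ x^t/t! ≤ (ex/t)^t whenever x + t ≤ n:

```python
import math

def bound_chain(n, x, t):
    # the three quantities compared in the proof of Lemma 5.10 (integer x)
    prod = 1.0
    for i in range(1, t + 1):
        prod *= x / (n - i + 1)
    lhs = math.comb(t + x, t) * prod      # union bound before simplification
    mid = x ** t / math.factorial(t)      # after cancelling, using x + t <= n
    rhs = (math.e * x / t) ** t           # via t! >= (t/e)^t
    return lhs, mid, rhs
```

Checking a grid of small parameters confirms the two inequalities hold termwise, as the cancellation argument predicts.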
Proposition 5.11. For any p > 0, it holds that sup n EL p n < ∞.
Proof. Fix p > 0, and choose x large enough that (1 + 1/x)^{2p} < C^{−1} for the constant C of Proposition 5.7; the resulting geometric series is then summable. Similarly, applying Lemma 5.10 and using |B_x| ≤ n, the first term is finite, and the last term vanishes as n → ∞. Together with (5.10), this yields an upper bound on (5.9) with no dependence on n.

Further Questions
Many of the open questions in [13] about the global properties of the fixed point forest F_n remain unanswered. Recall that each base of F_n is a permutation with 1 as a fixed point. Let T_n denote the tree in F_n with the identity permutation as its base. In fact, T_n is the largest tree in F_n, which can be seen as follows. Let π be any base other than the identity, and let i be the largest index such that π(i) ≠ i. Then π(i) is never bumped, so switching π(i) and π(1) creates a new permutation π' with subtree at least as big as the tree with base π. Note that π' ∈ T_n, since all letters after the letter 1 are fixed points. Hence an isomorphic copy of the tree starting from π is also in T_n.
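The claim that T_n is the largest tree can be verified by brute force for small n by building the forest from the forward (sorting) map. A sketch with our own helper names:

```python
from itertools import permutations

def sort_step(p):
    # one sorting step: take the first letter a and insert it at position a
    a = p[0]
    rest = p[1:]
    return rest[:a - 1] + (a,) + rest[a - 1:]

def base_of(p):
    # follow the sorting process until the first letter is 1
    while p[0] != 1:
        p = sort_step(p)
    return p

n = 5
sizes = {}
for p in permutations(range(1, n + 1)):
    b = base_of(p)
    sizes[b] = sizes.get(b, 0) + 1
identity = tuple(range(1, n + 1))
```

For n = 5 the tree based at the identity is indeed the largest component of the forest.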
In [13, Propositions 24 and 25], it is shown that for a uniformly random permutation π_n, 1/n ≤ P[π_n ∈ T_n] ≤ e/n, and it is conjectured that P[π_n ∈ T_n] ∼ 1/n. We highlight this question here and tack on a few more of our own:

Question 6.1. Prove that nP[π_n ∈ T_n] converges as n → ∞ and determine its limit. Characterize all permutations in T_n. How do the next largest components compare in size to T_n?
Another question posed in [13] is: Question 6.2. Let R n be the distance from π n to the base of its tree in the fixed point forest. What are the limiting asymptotics of ER n ?
One could also ask about the limiting fluctuations of R n from its mean.
Though some of our work in Section 4 could be helpful in addressing these questions, our limit tree has nothing to say about them. For example, the limiting tree has no base at all, reflecting that questions about the distance from π n to a base are not local. One could instead look for a different limit of the fixed point forest where edges are scaled so that the diameter of each component of the forest stays bounded, along the lines of the continuum random tree (see [1]).
The present work has created other avenues to explore. Continuing in the same theme as Section 5, what can we say about paths from root to leaf in the limiting tree besides the longest and shortest ones? In particular, Question 6.3. Walk from the root towards the leaves in the limiting tree by choosing randomly among all children at each step. What is the distribution of the number of steps before reaching a leaf?
Other properties of the tree are interesting as well: Question 6.4. Is simple random walk on the limiting tree transient or recurrent? What is the branching number (see [12,Section 1.2]) of the tree?
A random walk on the finite fixed point forest itself can be interpreted as a stochastic version of the bumping process in which we randomly bump and unbump letters of a permutation. A solution to the above problem would likely give information on how quickly this process moves away from its starting permutation.
Another problem is to determine what happens when the root of the fixed point forest is chosen nonuniformly: Question 6.5. Determine the local limit of the fixed point forest when the root is sampled from the Ewens or Mallows distributions.
The fixed points of a Mallows-distributed permutation are not distributed evenly across positions, so we would expect convergence to a limiting tree defined by Poisson point processes of nonuniform intensity.