Almost-sure asymptotics for the number of heaps inside a random sequence

We study the minimum number of heaps required to sort a random sequence using a generalization of Istrate and Bonchis’s algorithm (2015). In a previous paper, the authors proved that the expected number of heaps grows logarithmically. In this note, we improve on the previous result by establishing the almost-sure and $L^1$ convergence.


Introduction
The so-called Ulam's problem consists in estimating the length of the longest decreasing subsequence in a uniform random permutation σ of {1, . . . , n}. By duality, this question is equivalent to computing the minimal number of disjoint increasing subsequences of σ required to partition {1, . . . , n}. In [2], Byers et al. proposed variations on this problem in which the question of finding increasing subsequences in a permutation is replaced by that of finding heapable subsequences. Subsequently, Istrate and Bonchis [3] introduced a modification of the classical patience sorting algorithm, called the heap sorting algorithm, which computes the minimal number of binary heaps required to partition {1, . . . , n}.
In [1], we studied a generalization of the algorithm which also allows the heaps to be random. More precisely, let µ be a fixed offspring distribution on {1, 2, . . . , }. Let $(U_i, \nu_i)$ be an i.i.d. sequence where $U_i$ and $\nu_i$ are independent, $U_i$ is uniform on [0, 1] and $\nu_i$ is distributed as µ. We use the following streaming algorithm to sort this sequence into Galton-Watson heaps, i.e. labeled Galton-Watson trees in which the label of each vertex is larger than those of its ancestors.
Heap sorting algorithm for $(U_i, \nu_i)$.
• We start at time 1 with a single tree containing a unique vertex $(U_1, \nu_1)$ and set $R(1) = 1$.
• At time n, we have $R(n)$ trees. Each vertex of these trees carries a pair $(U, \nu)$: the variable U is the label of the vertex, whereas ν prescribes the maximum number of children that the vertex may have. A vertex $(U, \nu)$ is said to be alive if it has strictly fewer than ν children.
• At time n + 1, we add $(U_{n+1}, \nu_{n+1})$ as a child of the alive vertex with the largest label smaller than $U_{n+1}$. If no such vertex exists, we create a new tree with root $(U_{n+1}, \nu_{n+1})$.
This algorithm sorts the sequence $(U_i, \nu_i)$, in order of arrival, in such a way that:
1. All the trees have the heap property.
2. The trees are asymptotically Galton-Watson distributed with offspring distribution µ and, at all times, the vertex with label $U_i$ has at most $\nu_i$ children.
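To make the procedure concrete, here is a minimal Python sketch of the greedy rule on a finite input. It is our own illustration, not code from [1] or [3]: the names `count_heaps`, `ratio_to_log` and `sample_nu` are hypothetical, labels are assumed pairwise distinct (which holds almost surely for uniform $U_i$), and `sample_nu` is a user-supplied sampler standing in for µ.

```python
import bisect
import math
import random
from typing import Callable, Iterable, Tuple


def count_heaps(sequence: Iterable[Tuple[float, int]]) -> int:
    """Run the greedy heap sorting rule on (label, nu) pairs, taken in order
    of arrival, and return the final number of trees R(n).
    Assumes pairwise distinct labels."""
    alive = []   # labels of alive vertices, kept sorted
    lives = {}   # label -> number of children the vertex may still receive
    trees = 0
    for u, nu in sequence:
        i = bisect.bisect_left(alive, u)   # alive[:i] are the labels < u
        if i == 0:
            trees += 1                     # no alive vertex below u: new root
        else:
            parent = alive[i - 1]          # alive vertex with largest label < u
            lives[parent] -= 1
            if lives[parent] == 0:         # parent now has nu children: not alive
                alive.pop(i - 1)
                del lives[parent]
        bisect.insort(alive, u)            # the new vertex has no child yet
        lives[u] = nu
    return trees


def ratio_to_log(n: int, sample_nu: Callable[[random.Random], int],
                 seed: int = 0) -> float:
    """One sample of R(n) / log n for i.i.d. uniform labels and nu ~ sample_nu."""
    rng = random.Random(seed)
    seq = ((rng.random(), sample_nu(rng)) for _ in range(n))
    return count_heaps(seq) / math.log(n)
```

For instance, `ratio_to_log(10**4, lambda rng: 2)` returns one sample of $R(n)/\log n$ for binary heaps (µ the Dirac mass at 2). With `lambda rng: 1` every vertex has at most one child, the trees are increasing chains, and $R(n)$ is then the length of the longest decreasing subsequence. Insertion into a Python list costs time linear in the number of alive vertices, so a balanced search tree would be preferable for very large n.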
See Figure 1 for an illustration of the procedure. It turns out that, remarkably, this greedy algorithm is optimal: it minimizes the number of trees at all times. In [1], we proved that, for any offspring distribution µ other than the Dirac mass at 1 (i.e. excluding Ulam's problem), the expected number of trees grows logarithmically, as predicted in [3]: there exists a finite constant $c_\mu$ such that
$$\lim_{n \to \infty} \frac{\mathbb{E}[R(n)]}{\log n} = c_\mu.$$
The aim of this note is to bootstrap the result above by proving that the convergence of $R(n)/\log n$ to $c_\mu$ also holds almost surely and in $L^1$ (Theorem 1.1).
As explained in [1] (and briefly recalled in the next section), we can associate to the heap sorting algorithm a particle system which plays the same role as Hammersley's line particle system for Ulam's problem. One of the main results of [1] states that this particle system, while initially defined on compact intervals, can be extended to an infinite particle system on the whole line. Thus, the strategy to prove Theorem 1.1 is to first establish the almost sure convergence for an analog of R(n) associated with this infinite system on R and then transfer the result back to the discrete case. In this study, the key ingredients are the remarkable scaling properties of the infinite volume system together with monotonicity arguments.

Almost-sure convergence for the process on the half-plane
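We start by recalling the construction of Hammersley's tree process associated with the heap sorting algorithm, introduced in [1]. Let $\mathbb{H} = \mathbb{R} \times (0, \infty)$ denote the upper half-plane and let Ξ be a Poisson point process of atoms $(u, t, \nu)$, with $(u, t) \in \mathbb{H}$, with intensity $du\, dt\, \mu(d\nu)$. For any $a < b$, we consider the following particle system H on $(a, b) \times \mathbb{N}$ constructed from the atoms of Ξ inside the strip $(a, b) \times (0, \infty)$.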
• There is no particle at time t = 0.
• Given $H(t^-)$, an atom $(u, t, \nu)$ of Ξ with $u \in (a, b)$ creates in $H(t)$ a new particle at position u with ν lives. Furthermore, the particle in $H(t^-)$ with the largest label smaller than u loses one life (if such a particle exists) and is removed from the system if this was its last life.
We can represent the genealogy of the particles using a set of vertical and horizontal lines. Here, vertical lines represent the positions of particles through time, and horizontal lines connect particles to their father on their left (or to the left boundary $\{a\} \times (0, \infty)$ if they have no father). We denote by $G_{a,b}$ this graphical representation of the process. See Figure 2 for an illustration.
For a = 0 and b = 1, this particle system may be seen as a continuous-time embedding of the heap sorting algorithm, where new labels arrive at the times of a Poisson process instead of at integer times. Therefore, the heaps created by the algorithm are exactly the trees "drawn" by the graphical representation. In particular, the number of trees (equivalently, heaps) created between times s and t is equal to the number of horizontal lines in $G_{0,1}$ intersecting the vertical segment $\{0\} \times [s, t]$.
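The particle system on (a, b) can be simulated directly from a finite list of atoms. The Python sketch below is our own illustration, not code from [1]: the names `run_particle_system`, `count_lines` and `poisson_atoms` are hypothetical, atoms are $(u, t, \nu)$ triples with distinct positions and times, and the Poisson sampler assumes intensity $du\, dt$ on the strip, realized by exponential inter-arrival times of rate $b - a$ with uniform positions.

```python
import bisect
import random
from typing import Callable, Dict, List, Tuple

# An atom (u, t, nu): position u, arrival time t, number of lives nu.
Atom = Tuple[float, float, int]


def run_particle_system(atoms: List[Atom], a: float,
                        b: float) -> Tuple[List[float], Dict[float, float]]:
    """Run the particle system on (a, b) driven by the atoms inside the strip.

    Returns (root_times, death_times): the arrival times of particles with no
    particle on their left (their horizontal line reaches the left boundary),
    and a dict mapping each removed particle's position to its removal time.
    """
    alive: List[float] = []       # positions of particles present, sorted
    lives: Dict[float, int] = {}  # position -> remaining lives
    root_times: List[float] = []
    death_times: Dict[float, float] = {}
    for u, t, nu in sorted(atoms, key=lambda atom: atom[1]):  # time order
        if not a < u < b:
            continue                      # atom outside the strip
        i = bisect.bisect_left(alive, u)  # alive[:i] are the positions < u
        if i == 0:
            root_times.append(t)          # no father: line attached to {a}
        else:
            left = alive[i - 1]           # largest position smaller than u
            lives[left] -= 1
            if lives[left] == 0:          # last life lost: remove the particle
                alive.pop(i - 1)
                del lives[left]
                death_times[left] = t
        bisect.insort(alive, u)
        lives[u] = nu
    return root_times, death_times


def count_lines(root_times: List[float], s: float, t: float) -> int:
    """Number of horizontal lines hitting the left boundary at heights in [s, t]."""
    return sum(1 for r in root_times if s <= r <= t)


def poisson_atoms(a: float, b: float, horizon: float,
                  sample_nu: Callable[[random.Random], int],
                  rng: random.Random) -> List[Atom]:
    """Atoms with intensity du dt on (a, b) x (0, horizon), each marked by nu."""
    atoms: List[Atom] = []
    t = rng.expovariate(b - a)            # total arrival rate is b - a
    while t < horizon:
        atoms.append((rng.uniform(a, b), t, sample_nu(rng)))
        t += rng.expovariate(b - a)
    return atoms
```

With a = 0 and b = 1, `count_lines(root_times, s, t)` is the number of trees started between times s and t. Since the rule only compares positions with positions and times with times, the output is unchanged (up to rescaling of the times) under the map $(u, t, \nu) \mapsto (\lambda u, t/\lambda, \nu)$, which also preserves $du\, dt$. The same function shows that moving the left boundary from 0 to −1 leaves the removal times of the particles in (0, 1) unchanged, whereas moving the right boundary gives no such guarantee.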
Since incoming particles do not affect particles already present on their right, the graphical representations $G_{a,b}$ are compatible for different values of the left boundary. Thus, there is no problem in defining $G_{-\infty,b}$. Clearly, this compatibility relation no longer holds when the right boundary is extended, since new particles may "kill" their left neighbour. However, Theorem 4.4 of [1] states that the graphical representation $G_{-\infty,b}$ still converges locally, almost surely, as b tends to infinity, to a random graphical representation $G_\infty$ on the whole half-plane. For $0 < s < t$, we denote by $R_\infty[s, t]$ the number of horizontal lines of $G_\infty$ crossing the vertical segment $\{0\} \times [s, t]$.
The following result is the counterpart of Theorem 1.1 for the infinite volume system.
Proposition 2.1. Almost surely,
$$\lim_{n \to \infty} \frac{R_\infty[1, e^n]}{n} = \mathbb{E}(R_\infty[1, e]).$$
Let us point out that this result does not assert the finiteness of $\mathbb{E}(R_\infty[1, e])$ (if this expectation is infinite, the limit above is simply infinite). However, $\mathbb{E}(R_\infty[1, e])$ is indeed always finite, as we shall see later.
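Proof. We decompose the number of horizontal lines crossing the vertical axis during the time interval $[1, e^n]$ in the following way:
$$R_\infty[1, e^n] = \sum_{i=0}^{n-1} R_\infty[e^i, e^{i+1}].$$
For any $i > 0$, the invariance of the Poisson measure under the mapping $(u, t, \nu) \mapsto (e^{i} u, e^{-i} t, \nu)$ implies that $R_\infty[e^i, e^{i+1}]$ has the same law as $R_\infty[1, e]$. Hence the sequence $(X_i := R_\infty[e^i, e^{i+1}], i \geq 0)$ is stationary, and Birkhoff's ergodic theorem yields the result as soon as it is ergodic. Thus, it remains to prove that this sequence is mixing (which implies ergodicity), i.e. that for any n, m and any bounded functions $f : \mathbb{R}^{n+1} \to \mathbb{R}$ and $g : \mathbb{R}^{m+1} \to \mathbb{R}$,
$$\lim_{k \to \infty} \mathbb{E}[f(X_0, \dots, X_n)\, g(X_k, \dots, X_{k+m})] = \mathbb{E}[f(X_0, \dots, X_n)]\, \mathbb{E}[g(X_0, \dots, X_m)]. \qquad (2.2)$$
Fix $n, m \geq 0$ and $k > n + 1$. For $i > n + 1$, let $\tilde X_i$ denote the number of horizontal lines crossing the segment $\{0\} \times [e^i, e^{i+1}]$ when we remove all the atoms of Ξ below height $e^{n+1}$. By construction, $G_\infty \cap (\mathbb{R} \times (0, t))$ is determined by the atoms of Ξ below height t. In particular, this implies that $(X_0, \dots, X_n)$ is independent of $(\tilde X_k, \dots, \tilde X_{k+m})$. Moreover, up to a translation, the graphical representation obtained by removing all atoms below a given height has the same law as $G_\infty$. Thus, the vector $(\tilde X_k, \dots, \tilde X_{k+m})$ has the same distribution as the vector
$$(R_\infty[e^k - e^{n+1}, e^{k+1} - e^{n+1}], \dots, R_\infty[e^{k+m} - e^{n+1}, e^{k+m+1} - e^{n+1}]),$$
whose law is also, using the scaling property, that of
$$(R_\infty[1 - e^{n+1-k}, e - e^{n+1-k}], \dots, R_\infty[e^m - e^{n+1-k}, e^{m+1} - e^{n+1-k}]).$$
Therefore, we obtain the limit in law
$$\lim_{k \to \infty} (\tilde X_k, \dots, \tilde X_{k+m}) \overset{\mathcal{L}}{=} (X_0, \dots, X_m). \qquad (2.3)$$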
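On the other hand, adding atoms below a given height s can only decrease the number of horizontal lines crossing the segment {0} × [s, t] (see for instance Equation (12) of [1] for more details). This monotonicity result implies that, for any $k > n + 1$,
$$X_k \leq \tilde X_k.$$
We can now write
$$\mathbb{E}[f(X_0, \dots, X_n)\, g(X_k, \dots, X_{k+m})] = \mathbb{E}[f(X_0, \dots, X_n)\, g(\tilde X_k, \dots, \tilde X_{k+m})] + \mathbb{E}[f(X_0, \dots, X_n)(g(X_k, \dots, X_{k+m}) - g(\tilde X_k, \dots, \tilde X_{k+m}))].$$
By independence, the first term of the r.h.s. of the last equality equals $\mathbb{E}[f(X_0, \dots, X_n)]\, \mathbb{E}[g(\tilde X_k, \dots, \tilde X_{k+m})]$, which tends to $\mathbb{E}[f(X_0, \dots, X_n)]\, \mathbb{E}[g(X_0, \dots, X_m)]$ according to (2.3). Concerning the second term, we write
$$\left| \mathbb{E}[f(X_0, \dots, X_n)(g(\tilde X_k, \dots, \tilde X_{k+m}) - g(X_k, \dots, X_{k+m}))] \right| \leq 2 \|f\|_\infty \|g\|_\infty \sum_{i=k}^{k+m} \mathbb{P}\{X_i \neq \tilde X_i\} \leq 2(m+1) \|f\|_\infty \|g\|_\infty \sup_{i \geq k} \mathbb{P}\{X_i \neq \tilde X_i\}.$$
Finally, the following easy lemma, applied with $U_k = X_k$ and $V_k = \tilde X_k$, ascertains that $\sup_{i \geq k} \mathbb{P}\{X_i \neq \tilde X_i\}$ tends to 0, which concludes the proof of (2.2).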
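Lemma 2.2. Let $(U_k)$ and $(V_k)$ be two sequences of non-negative integer-valued random variables such that
(i) $U_k \leq V_k$ for all k;
(ii) the sequence $(U_k)$ is tight;
(iii) for all $i \geq 0$, $\mathbb{P}(U_k = i) - \mathbb{P}(V_k = i) \to 0$ as $k \to \infty$.
Then, $\mathbb{P}(U_k \neq V_k) \to 0$ as $k \to \infty$.
Proof. We first show by induction on i that
$$\lim_{k \to \infty} \mathbb{P}(U_k = i, V_k \neq i) = 0. \qquad (2.5)$$
Indeed, for $i = 0$, we find, using (i), that
$$\mathbb{P}(U_k = 0, V_k \neq 0) = \mathbb{P}(U_k = 0) - \mathbb{P}(V_k = 0),$$
which, according to (iii), tends to 0 as k tends to infinity. Now, for $i \geq 1$, we write
$$\mathbb{P}(U_k = i, V_k \neq i) = \mathbb{P}(U_k = i) - \mathbb{P}(V_k = i) + \mathbb{P}(U_k < i, V_k = i) \leq \mathbb{P}(U_k = i) - \mathbb{P}(V_k = i) + \sum_{j=0}^{i-1} \mathbb{P}(U_k = j, V_k \neq j).$$
The induction hypothesis combined with (iii) implies that the r.h.s. of the last equation tends to 0 as k tends to infinity. Hence, (2.5) holds for all i. Finally, writing that, for any integer $A > 0$,
$$\mathbb{P}(U_k \neq V_k) \leq \mathbb{P}(U_k > A) + \sum_{i=0}^{A} \mathbb{P}(U_k = i, V_k \neq i),$$
and using the tightness of the sequence $(U_k)$, we deduce that $\mathbb{P}(V_k \neq U_k)$ tends to 0 as k tends to infinity.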
] be the number of horizontal lines attached to the y-axis between heights $e^i$ and $e^{i+1}$ when we consider only the atoms of Ξ with abscissa in the interval $[0, e^{N-i}]$. Using the same monotonicity argument as above, we have, for any $i \geq N$, Using again the invariance of the law of Ξ under the mappings The argument is the same as in the previous section. Indeed, consider, for $k > N + n$, the number $\tilde X^N_k$ of horizontal lines crossing the y-axis between heights $e^k$ and $e^{k+1}$ when we only take into account the atoms of Ξ in the domain $[0, e^{N-k}] \times [e^{N+n+1}, \infty)$. It is easily checked that the following holds
Remark 3.2. In a previous paper [1], it was shown that the infinite graphical representation exists, which is the same as saying that $R_\infty[s, t]$ is finite for any $0 < s < t$. However, it was not proved that the expectation of $R_\infty[s, t]$ is also finite. This is now a consequence of the previous proposition combined with the main result of [1] stating that $c_\mu$ is always finite. Still, we point out that the arguments presented here do not, by themselves, allow us to recover that $c_\mu$ is finite.
We now have all the tools needed to prove Theorem 1.1.