Height and contour processes of Crump-Mode-Jagers forests (I): general distribution and scaling limits in the case of short edges

Crump-Mode-Jagers (CMJ) trees generalize Galton-Watson trees by allowing individuals to live for an arbitrary duration and give birth at arbitrary times during their life-time. In this paper, we are interested in the height and contour processes encoding a general CMJ tree. We show that the one-dimensional distribution of the height process can be expressed in terms of a random transformation of the ladder height process associated with the underlying Lukasiewicz path. As an application of this result, when edges of the tree are "short" we show that, asymptotically, (1) the height process is obtained by stretching by a constant factor the height process of the associated genealogical Galton-Watson tree, (2) the contour process is obtained from the height process by a constant time change and (3) the CMJ trees converge in the sense of finite-dimensional distributions.


1.1. Galton-Watson forests and their scaling limits.
A planar discrete rooted tree is a rooted tree where edges have unit length and which is endowed with an ordering on siblings, in such a way that it can be naturally embedded in the plane. Since the seminal work of Aldous, Neveu, Pitman and others [2,3,4,17,22,23], it is well known that such a tree is conveniently encoded by its height and contour processes. To generate these processes, one can envision a particle starting from the root and traveling along the edges of the tree at unit speed, from left to right. The contour process is simply constructed by recording the distance of the particle from the root of the tree. To generate the height process, we start by labeling the vertices of the tree according to their order of visit by the exploration particle (i.e., from left to right): the height process evaluated at k is then given by the distance from the root of the kth vertex.
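As an illustration of the two encodings described above, here is a minimal sketch for a finite planar tree. The representation of the tree as a dictionary `children` (vertex to ordered list of children) and the function name `height_and_contour` are assumptions made for this example only; the contour is sampled at integer times, where the exploration particle sits on a vertex.

```python
def height_and_contour(children, root=0):
    """Return (height process, contour process at integer times).

    `children` maps a vertex to the ordered list of its children,
    listed from left to right (hypothetical representation).
    """
    heights = []      # heights[k] = distance to the root of the kth visited vertex
    contour = [0]     # position of the exploration particle at times 0, 1, 2, ...

    def visit(vertex, depth):
        heights.append(depth)
        for child in children.get(vertex, []):
            contour.append(depth + 1)   # the particle goes up the edge to the child
            visit(child, depth + 1)
            contour.append(depth)       # and comes back down once the subtree is explored

    visit(root, 0)
    return heights, contour


# Root 0 with children 1 and 3; vertex 1 has a single child 2.
heights, contour = height_and_contour({0: [1, 3], 1: [2]})
print(heights)   # [0, 1, 2, 1]
print(contour)   # [0, 1, 2, 1, 0, 1, 0]
```

Each edge is crossed twice, so a tree with n vertices yields a contour of 2(n − 1) unit steps, which is the origin of the factor 2 relating the time scales of the two processes.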
From a probabilistic standpoint, a particularly interesting case is the Galton-Watson case where each individual u in the tree begets a random number of offspring ξ_u, these random variables being i.i.d. with common distribution ξ. In the critical and subcritical cases (i.e., when E(ξ) ≤ 1) the tree is almost surely finite. Considering an infinite sequence of such i.i.d. random rooted planar trees, we can generate a random (planar) forest with its corresponding contour and height processes, respectively denoted by C and H, obtained by pasting sequentially the height and contour processes of the trees composing the forest.
When E(ξ^2) < ∞, Aldous [4] proved that the large time behavior of those processes (properly normalized in time and space) can be described in terms of a reflected Brownian motion. More precisely, in the critical case E(ξ) = 1 and if 0 < σ^2 = Var(ξ) < ∞, we have
\[ \frac{1}{\sqrt{p}} \big( H([pt]), C(pt) \big)_{t \geq 0} \Rightarrow \frac{2}{\sigma} \big( |w(t)|, |w(t/2)| \big)_{t \geq 0} \quad \text{as } p \to \infty, \]
with w a standard Brownian motion and the convergence holds weakly (in the functional sense). When the second moment of the offspring distribution is infinite and the offspring distribution is in the domain of attraction of an α-stable law with α ∈ (1, 2), Le Gall and Le Jan [19] and then Duquesne and Le Gall [9] proved the existence of a scaling sequence (ε_p, p ∈ N) and a limiting continuous path H_∞ such that
\[ \varepsilon_p \big( H([pt]), C(pt) \big)_{t \geq 0} \Rightarrow \big( H_\infty(t), H_\infty(t/2) \big)_{t \geq 0}, \]
where H_∞ can be expressed as a functional of a spectrally positive Lévy process. As in the finite second moment case alluded to above, we note that the height and contour processes are asymptotically related by a simple deterministic and constant time change.
1.2. Crump-Mode-Jagers forests. The subject of the present paper is the study of the height and contour processes of planar Crump-Mode-Jagers (CMJ) forests, which are random instances of chronological forests. Chronological trees generalize discrete trees in the following way: each individual u is endowed with a pair (V_u, P_u) such that: (1) V_u ∈ (0, ∞) represents the life-length of u; (2) P_u is a point measure which represents the ages of u at childbearing. In particular, we enforce Supp(P_u) ⊂ (0, V_u], so that individuals produce their offspring during their lifetime. Note that |P_u| = P_u((0, V_u]) is the number of children of u. As noted by Lambert in [14], a chronological tree can be regarded as a tree satisfying the rule "edges always grow to the right". This is illustrated in Figure 1 where we present a sequential construction of a planar chronological forest from a sequence of "sticks" ω = (ω_n, n ≥ 0), where ω_n = (V_n, P_n).

FIGURE 1. Sequential construction of a chronological forest, shown at steps n = 0, 1, 2, 3, 4, 5, 6 and n = 10. We start at n = 0 with nothing, then add ω_0 at time n = 1. At this time, there are two stubs and so the next stick ω_1 is grafted to the highest stub, and we repeat until time n = 10, at which time no stub is left and the tree is built. The next step then starts the construction of the second tree of the forest, and so on.

At time n = 0 we start with the empty forest and we add the stick ω_0 at time n = 1. In the case considered in Figure 1, P_0 has two atoms which correspond to birth times of individuals, but these two atoms are not yet matched with the sticks corresponding to these individuals. These unmatched atoms are called stubs, and when there is at least one stub we apply the following rule:
Rule #1: if there is at least one stub, we graft the next stick to the highest stub.
Thus, we iteratively apply this rule until no stub is left, at which point we have built a complete chronological tree with a natural planar embedding. Figure 1 illustrates a particular case where at time 10 no stub is left, in which case we apply the following rule:
Rule #2: if there is no stub, we start a new tree with the next stick.
Thus, starting at time n = 0 from the empty forest and iterating these two rules, we build in this way a forest F_∞, possibly consisting of infinitely many chronological trees. By definition, a CMJ forest is obtained when the initial sticks are i.i.d., and throughout the paper we will denote their common distribution by (V*, P*).
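The two rules can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: a stick is represented as a pair `(life, atoms)` where `atoms` lists the ages at childbearing, and the function name `build_forest` as well as the returned lists are hypothetical. It relies on the observation that, since atoms lie in (0, V], the atoms of a freshly grafted stick are always strictly higher than the remaining stubs, so that a stack suffices to find the highest stub.

```python
def build_forest(sticks):
    """Sequential construction of a chronological forest from sticks.

    `sticks` is a list of pairs (life, atoms): `life` is the life-length V_n
    and `atoms` the ages at childbearing (atoms of P_n), all in (0, life].
    Returns the birth heights and the parent indices (None for roots).
    """
    stubs = []                  # stack of (height, parent index); the top is the highest stub
    births, parents = [], []
    for n, (life, atoms) in enumerate(sticks):
        assert all(0 < a <= life for a in atoms)
        if stubs:               # Rule #1: graft the stick to the highest stub
            height, parent = stubs.pop()
        else:                   # Rule #2: no stub left, start a new tree
            height, parent = 0.0, None
        births.append(height)
        parents.append(parent)
        # New stubs are pushed in increasing order, so the stack stays sorted.
        for age in sorted(atoms):
            stubs.append((height + age, n))
    return births, parents


sticks = [(3.0, [1.0, 2.5]), (1.0, []), (2.0, [0.5]), (1.0, []), (1.0, [])]
births, parents = build_forest(sticks)
print(births)    # [0.0, 2.5, 1.0, 1.5, 0.0]
print(parents)   # [None, 0, 0, 2, None]
```

In this toy run the first tree consists of individuals 0 to 3, and individual 4 starts the second tree because no stub is left when it is added.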

1.3. Chronological height and contour processes of CMJ forests.
As for discrete trees, the contour process of a CMJ forest is obtained by recording the position of an exploration particle traveling at unit speed along the edges of the forest from left to right, moving, when a chronological tree is represented as in Figure 1, at infinite speed along dashed lines. This process will be referred to as the chronological contour process associated to the CMJ forest, and the chronological height of the nth individual is defined as its date of birth. We define the genealogical contour and height processes as the contour and height processes associated to the discrete forest encoding the genealogy of F_∞.
Contour processes of CMJ forests have been considered by Lambert in [14] in the particular setting where birth events are distributed in a Poissonian way along the sticks independently of the life-length (the so-called binary, homogeneous case). Under this assumption, the author showed that the (jumping) contour process is a spectrally positive Lévy process. See also [8,10,15,16,24,25] for related works.
To our knowledge, little is known in the general case and in the present study, we determine in full generality: (1) the distribution of the contour/height process of a CMJ forest; (2) the correlation between the height/contour process of a CMJ forest and the height/contour process of its underlying genealogy.
One of our first results is a description of the one-dimensional marginal of the height process of a CMJ forest in terms of a bivariate renewal process. This two-dimensional process is constructed as a random functional of the weak ascending ladder height process associated to the dual Lukasiewicz path starting from n. This is the subject of Section 3.

1.4. Scaling limits.
In the near-critical case it is well known that, properly scaled in time and space, the genealogical height and contour processes associated to Galton-Watson trees converge toward a continuous process. To the best of our knowledge, little is known for CMJ forests outside the binary, homogeneous case: we claim that our results on the distribution of the chronological height process can be used to deal with a broad class of CMJ forests.
To support this claim, we treat in detail in the present paper the case of short edges where the genealogical and chronological structures become deterministically proportional to one another. Moreover, current work in progress [32] suggests that our techniques can be extended to a broader class of CMJ forests including cases where the genealogical and chronological structures are not deterministically obtained from one another, see Section 1.6 below for more details.
To explain our results in the short edge case, let Y* be the random number obtained by first size-biasing the random variable |P*| (i.e., the number of atoms in the point measure P*) and then by recording the age of the individual when giving birth to a randomly chosen child. The mean of Y* has a simple expression, namely E(Y*) = E(∫ u P*(du)).
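To make the definition of Y* concrete, the following sketch computes its mean exactly when the stick distribution has finitely many possible values. It assumes the size-biased reading of the definition, namely that E f(Y*) equals the expected sum of f over the atoms of P* divided by E(|P*|); in the critical case E(|P*|) = 1 this reduces to the expected sum of the atoms. The representation of the law as a list of `(probability, atoms)` pairs and the function name are hypothetical.

```python
from fractions import Fraction


def mean_age_at_childbearing(law):
    """Mean of Y* for a finitely supported law of the point measure P*.

    `law` is a list of pairs (probability, atoms), where `atoms` lists the
    ages at childbearing (hypothetical representation). The point measure is
    size-biased by its number of atoms, then one atom is chosen uniformly.
    """
    total_age = sum(p * sum(atoms) for p, atoms in law)       # E of the sum of the atoms
    mean_children = sum(p * len(atoms) for p, atoms in law)   # E(|P*|)
    return total_age / mean_children


# Critical example: no child with probability 1/2, two children born at
# ages 1 and 3 with probability 1/2, so that E(|P*|) = 1.
law = [(Fraction(1, 2), []), (Fraction(1, 2), [1, 3])]
print(mean_age_at_childbearing(law))   # 2
```

In this example the size-biased individual always has two children, and a uniformly chosen child is born at age 1 or 3, which gives the mean 2 directly.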
As noticed by Nerman [21], this random variable describes the age of an ancestor of a typical individual u when giving birth to the next ancestor of u. For this reason, Y* and in particular the condition E(Y*) < ∞, which is one way to formalize the "short edge" condition, plays a major role in previous works on CMJ processes, see for instance [26,27,28,29,30,31]. In the present paper we prove that if E(Y*) < ∞, then in the near-critical regime the asymptotic behavior of the chronological height process is obtained by stretching the genealogical height process by the deterministic factor E(Y*). This result is stated and proved in Section 4. The analysis of the contour process is more delicate (see Section 1.5 below for more details). Our main result shows that when E(V*) < ∞, which is another way to formalize the "short edge" condition, the chronological contour process is obtained from the chronological height process by rescaling time by the deterministic factor 1/(2E(V*)). Hence, again provided that edges are short enough, this result provides a relation between the height and contour processes which is analogous to the discrete case. This result is stated in Section 5 where the general structure of the proof is given, and details are provided in Section 7.
Finally, we prove that when both Y* and V* have finite means, the minimum of the chronological contour process is obtained by scaling the minimum of the genealogical height process, in space by E(Y*) and in time by 1/E(V*). This shows that the genealogical and chronological trees, and not only the height/contour processes, are asymptotically close to one another. In particular, under these assumptions the CMJ trees themselves converge in the sense of finite-dimensional distributions.
1.5. Technical challenges. As already discussed, Duquesne and Le Gall [9] showed under rather mild conditions that the contour and height processes of Galton-Watson trees converge weakly to a continuous function. In the CMJ framework, we establish convergence in the sense of finite-dimensional distributions to a limiting object provided that edges are short enough. In Section 8 we present simple examples where finite-dimensional distributions of the scaled contour and height processes converge, but the processes themselves fail to converge in a functional sense. To be more precise, in this example the contour process becomes unbounded on any finite time-interval. This gap between convergence of finite-dimensional distributions and weak convergence also exists in the Galton-Watson case; however, we argue in Section 8 that it is more significant in the CMJ case.
The main steps of the proof of our result on the relation between the contour and height processes in the case of short edges (i.e., when E(V * ) < ∞) are highlighted in Section 5.2. Due to the potential existence of pathological times when the contour/height process becomes degenerate (as illustrated by the example in Section 8), the convergence of the contour process raises new technical challenges that are absent in the discrete setting. In order to overcome those difficulties, we develop new tools presented in Sections 6 and 7.
1.6. Perspectives. The present paper aims at initiating the systematic study of scaling limits of CMJ forests. Most of the present paper is devoted to developing fundamental tools which, we believe, have the potential to tackle a broad class of CMJ forests and which will be the basis of subsequent papers.
The cornerstone of our approach is Proposition 3.4 below, which indicates how to recover a CMJ forest from its underlying genealogy by a random stretching. At the discrete level, this stretching operation is correlated with the genealogical structure in intricate ways but it suggests three possible universality classes: First class: the random stretching becomes asymptotically deterministic (this is the class to which Galton-Watson forests belong); Second class: the stretching remains random in the limit, but uncorrelated with the genealogy; Third class: the stretching remains random and correlated with the underlying genealogical structure.
To show the potential of our techniques, we deal in the present paper with the first class, which corresponds to the "short edge" condition discussed earlier.
In current work in progress [32] we are dealing with the second class. Starting from the limiting genealogical structure, encoded by a continuous path, the chronological height process is obtained by marking the branches of the forest with a Poisson point process: each mark carries a random number encoding the chronological contribution of the vertex under consideration. We conjecture that the limiting object should be related to the Poisson snake (see e.g., [1] and [5]).
Finally, studying the third class will presumably require new ideas given that the correlation structure may be quite involved: this will be the subject of further study.

SPINE, HEIGHT AND CONTOUR PROCESSES
In this section, we introduce the spine process, which can be thought of as a generalization of the exploration process first defined by Le Gall and Le Jan in [19].
The idea underlying the definition relies on the decomposition of the "spine" (or "ancestral line") lying below the point of the tree corresponding to the birth of the nth individual. In the nth step of the sequential construction presented in Figure 1, this corresponds to the path in the forest starting from the root and reaching up to n (which also corresponds to the right-most path in the planar forest constructed at step n). As can be seen from the figure, this path is naturally decomposed into finitely many segments that correspond to each ancestor's contribution to the spine.
The spine process at n is then defined as a sequence of measures that encodes this decomposition. More precisely, we start by labeling ancestors from highest to lowest. Then, the kth element of the spine process (evaluated at n) is simply the measure that records the location of the stubs on the kth segment (crosses on Figure 2).

FIGURE 2. Same construction as in Figure 1, but now with the spine highlighted in thick line. This makes it possible to distinguish three kinds of atoms: Cross: represents a stub and corresponds to an atom on the spine whose subtree has not been explored yet; Circle: represents an atom on the spine whose subtree is being explored; Square: represents an atom whose subtree has been explored and that is no longer on the spine.

2.1. Notation. Let Z denote the set of integers and N the set of non-negative integers. For x ∈ R let [x] = max{n ∈ Z : n ≤ x} and x^+ = max(x, 0) be its integer and positive parts, respectively. If A ⊂ R is a finite set we denote by |A| its cardinality. Throughout we adopt the convention max ∅ = sup ∅ = −∞, min ∅ = inf ∅ = +∞ and Σ_{k=a}^{b} u_k = 0 if b < a, with (u_k) any real-valued sequence.

Finite sequences of measures.
We let M* = ∪_{n∈N} (M \ {z})^n be the set of finite sequences of non-zero measures in M. For Y ∈ M* we denote by Len(Y) the only integer n ∈ N such that Y ∈ (M \ {z})^n, which we call the length of Y, and identify z with the only sequence of length 0. For two sequences Y_1 = (Y_1(1), . . . , Y_1(H_1)) and Y_2 = (Y_2(1), . . . , Y_2(H_2)) in M* with lengths H_1, H_2 ≥ 1, we define [Y_1, Y_2] ∈ M* as their concatenation:
\[ [Y_1, Y_2] = \big( Y_1(1), \ldots, Y_1(H_1), Y_2(1), \ldots, Y_2(H_2) \big). \]
Further, by convention we set [z, Y] = [Y, z] = Y for any Y ∈ M* and we then define inductively
\[ [Y_1, \ldots, Y_N] = \big[ [Y_1, \ldots, Y_{N-1}], Y_N \big]. \]
Note that, with these definitions, we have Len([Y_1, . . . , Y_N]) = Len(Y_1) + · · · + Len(Y_N). Identifying a measure ν ∈ M \ {z} with the sequence of length one (ν) ∈ M*, the above definitions give sense to, say, [Y, ν] with Y ∈ M* and ν ∈ M \ {z}. The operator π defined on M is extended to M* through the relation
\[ \pi(Y) = \sum_{k=1}^{\mathrm{Len}(Y)} \pi(Y(k)), \quad Y \in M^*. \]
Recalling the convention Σ_{k=1}^{0} = 0, we see that π(z) = 0 and further, it follows directly from the above relation that π([Y_1, . . . , Y_N]) = π(Y_1) + · · · + π(Y_N).
We say that a mapping Γ : Ω → X is a genealogical mapping if it is invariant by the genealogical operator, i.e., if Γ • G = Γ. The shift and dual operators are related by the following relations: and for any random time Γ : Ω → Z we have 2.2. Spine, height and contour processes. We now proceed to a formal definition of the various processes which will be studied.
where H = max{k ≥ 1 : |Y(k)| ≥ 2}. Note that by definition, we have Φ(Y, ν) ∈ M* for Y ∈ M* and ν ∈ M and that further, if ν ≠ z then Φ(Y, ν) ≠ z. Next, we consider the M*-valued sequence S_0 = (S_0^n, n ≥ 0) (the subscript 0 will be justified below, see (3.8)) defined recursively by
(2.4) S_0^0 = z and S_0^{n+1} = Φ(S_0^n, P_n), n ≥ 0.
This dynamic is illustrated on Figure 2. As already discussed in the introduction, the kth element of S_0^n (ordered from top to bottom) records (1) the location of the stubs on the kth segment in the spine decomposition illustrated in Figure 2, and (2) the age of the kth ancestor (of n) when begetting the (k − 1)st ancestor (identifying, for k = 1, the individual with its 0th ancestor). In words, the recursive relation (2.4) encodes the fact that the birth event corresponding to the (n + 1)st individual coincides with the next available stub after grafting the nth stick on top of S_0^n. In particular, if no stub is available, a new spine is started from scratch (third relation).
We note that when S_0^n ≠ z, any element of the sequence S_0^n contains at least one atom: the one corresponding to the birth of an ancestor, which is not counted as a stub. In particular, the condition H = max{k ≥ 1 : |Y(k)| ≥ 2} in (2.3) reads "look for the first available segment with a stub".
Remark 2.1. The definition of the spine process is similar, but not completely analogous to the exploration process of Le Gall and Le Jan in [19]. Therein, the authors only consider the stubs attached to the spine. However, in the chronological case, not only do we need to keep track of the number of available stubs, but one needs to also record the length of the segments carrying those stubs (in the discrete case, this is always equal to 1). This is done by adding the additional atom corresponding to the birth of the "previous" ancestor (when ancestors are labelled from top to bottom), and whose location coincides with the length of the corresponding segment.

2.2.2.
Chronological height and contour processes. We define the chronological height process H = (H(n), n ≥ 0) by the relation H(n) = π(S_0^n), n ≥ 0. Informally, H(n) is the birth time of the nth individual. We consider the associated chronological contour process, which is the continuous-time process C = (C(t), t ≥ 0) with continuous sample paths defined inductively as follows. In the sequel, we define
\[ K_n = 2 \sum_{k=0}^{n-1} V_k - H(n), \quad n \geq 0. \]
Note that the sequence (K_n, n ≥ 0) is non-decreasing, and we will assume that its terminal value is infinite. (This assumption will hold a.s. for (sub)critical CMJ forests.) We start the initialization by setting C(K_0) = 0. Assume that C has been built on [0, K_n]; we extend the construction to [0, K_{n+1}] in the following way: C first increases at rate +1 up to H(n) + V_n and then decreases at rate −1 to H(n+1). Since H(n+1) ≤ H(n) + V_n, this is well-defined and this extends the construction up to the time K_n + 2V_n + H(n) − H(n+1) = K_{n+1} as desired.
It is not hard to prove that C is the usual contour process associated with the forest F_∞ seen as a forest of continuous trees. Indeed, our definition coincides with the usual definition of C(t) as the distance to the origin of a particle going up along the left side of an edge and going down along the right side, see for instance Le Gall [18] for a formal and general definition in the realm of real trees.
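The inductive construction of C can be sketched as follows: starting from the birth heights H(0), . . . , H(N) and the life-lengths V_0, . . . , V_{N−1}, the code returns the breakpoints of the piecewise linear path, using only the rule "up at rate +1 to H(n) + V_n, then down at rate −1 to H(n + 1)". The function name `contour_breakpoints` and the list-based inputs are assumptions made for this illustration.

```python
def contour_breakpoints(births, lives):
    """Breakpoints (t, C(t)) of the chronological contour process.

    `births` lists H(0), ..., H(N) and `lives` lists V_0, ..., V_{N-1}
    (hypothetical inputs). Between consecutive breakpoints the path is
    linear with slope +1 or -1.
    """
    assert len(births) == len(lives) + 1
    t = 0.0
    points = [(t, births[0])]
    for n, life in enumerate(lives):
        top = births[n] + life
        assert births[n + 1] <= top        # H(n+1) <= H(n) + V_n
        t += life                          # climb along the nth stick
        points.append((t, top))
        t += top - births[n + 1]           # descend to the birth height of n + 1
        points.append((t, births[n + 1]))
    return points


births = [0.0, 2.5, 1.0, 1.5, 0.0]
lives = [3.0, 1.0, 2.0, 1.0]
points = contour_breakpoints(births, lives)
print(points[-1])   # (14.0, 0.0)
```

In this example the individuals 0 to 3 form a complete tree, and the final time 14 equals twice the total length 3 + 1 + 2 + 1 of its edges, in agreement with the increment K_{n+1} − K_n = 2V_n + H(n) − H(n+1).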

2.2.3.
Genealogical height and contour processes and exploration process. We define ℋ = H ∘ G and 𝒞 = C ∘ G which we call genealogical height and contour processes, respectively, and ρ_0^n = S_0^n ∘ G the exploration process. As explained in Remark 2.1, it is closely related to the classical exploration process introduced by Le Gall and Le Jan [19].

THE SPINE PROCESS AND THE LUKASIEWICZ PATH.
In this section, we relate the spine process to the well-known Lukasiewicz path. More precisely, the spine process is expressed in terms of a random functional of the weak ascending ladder height process associated to the dual Lukasiewicz path. This is the content of Proposition 3.5 below. In a forthcoming section, this result will allow us to express the one-dimensional marginal of the spine process in terms of a bivariate renewal process, and will be instrumental in proving our main scaling limit results for height and contour processes in the short edges case.
3.1. Lukasiewicz path. We define the Lukasiewicz path S = (S(n), n ∈ Z) by S(0) = 0 and, for n ≥ 1, Note that if Γ is a random time, the dual operator acts as follows: It is well known that in the discrete case, the height process is directly related to the sequence of weak ascending ladder times. As we shall see, in the chronological case, more structure of the ladder height process is needed. In particular, the height (and spine) process will be expressed not only in terms of the ladder height times, but also in terms of the undershoot upon reaching the successive records of S (through the quantity Q defined below). In order to make this more precise, we consider the following functionals associated to S, which will be used repeatedly in the rest of the paper: • the sequence of weak ascending ladder height times: T (0) = 0 and for k ≥ 0, • the hitting times upward and downward: so that ζ ℓ is the undershoot upon reaching level ℓ; • and the backward maximum We will pay special attention to the following functionals of the ladder height process: • the following two inverses associated to the sequence (T (k), k ≥ 0): The fact that µ 0 = z implies that Q(k) = z whenever it is well-defined, a simple fact that will be used later on. If n is a weak ascending ladder height time, then T −1 (n) = T −1 (n) with T ( T −1 (n)) = n = T (T −1 (n)), while if n is not a weak ascending ladder height time, It is well-known that A (n) ∩ R + is the set of n's ancestors, see for instance Duquesne and Le Gall [9]. This property relates the height process and the weak ascending ladder height times T through the following identity: The genealogical height is also given by the length of S n 0 as we show now. Proof. 
As highlighted in Remark 2.1, the exploration process ρ n 0 = S n 0 • G slightly differs from the classical definition of the exploration process in Le Gall and Le Jan [19]: however, this slight difference does not alter the length of the sequence, which remains unchanged between the two definitions.
Since the length of the sequence in the classical exploration process coincides with the height process, this implies that Len ρ n 0 = H (n). Thus, Len S n 0 = Len S n 0 • G = Len S n 0 • G , which proves the desired result. Define Then m∧n ∈ Z and m and n have an ancestor in common (i.e., belong to the same tree) if and only if m∧n ≥ 0 in which case m∧n is the lexicographic index of their most recent common ancestor -see for instance [9]. We end this section by listing the following identities, which are proved in the Appendix A. The second identity involves the condition L(n − m) • ϑ m > 0: it is readily checked that

Lemma 3.3. For any n
3.2. Fundamental formula for H(n). As mentioned earlier, A (n) ∩ R + is the set of ancestors of n. More precisely, n − T (k) • ϑ n is the index of the kth ancestor of n, assuming that ancestors are ordered from highest to lowest date of birth (or height). Further, interpreting Y(k)•ϑ n as the age of the kth ancestor when giving birth to the (k−1)st ancestor motivates the following result.

Proposition 3.4.
For every n ≥ 0, we have Since H(n) = π(S n 0 ) = Len S n 0 k=1 π(S n 0 (k)), Proposition 3.4 is an immediate corollary of the following result which proves a more general relation between the spine process and the Lukasiewicz path. S n 0 = Q( T −1 (n)), . . . , Q(1) • ϑ n , n ≥ 0. The rest of this section is devoted to proving Proposition 3.5. We prove it through several lemmas, several of which will be used in the sequel. To prove these results, for m ≥ 0 and k ∈ {0, . . . , |P m |} we introduce corresponds to the index of (k + 1)st child of the mth individual (with the convention that children are ranked from youngest to oldest); whereas χ(m) is the index of the highest stub on S m 0 (i.e., right before attaching the mth individual). In particular, any individual n ∈ {m + 1, . . . , χ(m) − 1} belongs to a subtree attached to m. In view of this interpretation, the two following lemmas seem quite natural. (On Figure 2, and χ(1) = 4). For the proof of Lemma 3.9 we will need the following identity, whose proof is defered to Appendix B. This inequality implies that, since n ∈ {m + 1, . . . , χ(m) − 1}, there is at least one more ladder height time for the dual Lukasiewicz process seen from n as compared to the dual Lukasiewicz process seen from m. In view of the relation (3.2) which expresses H (n) = T −1 (n) • ϑ n as the number of weak ascending ladder height times of the dual Lukasiewicz process, this means precisely that H (n) > H (m). We now prove that S n 0 (ℓ) = S m 0 (ℓ). Since n ∈ {m + 1, . . . , χ(m) − 1}, in order to prove this it is enough to prove that χ ′ ≥ χ(m) where we define . In view of the definition (2.3) of Φ and the dynamic (2.4), we see that the ℓth element of the spine between m and n is modified only if the length of the spine goes below ℓ between m and n. Since the length of the spine coincides with H , this implies H (χ ′ ) = ℓ ≤ H (m). Finally, since H (m) < min {m+1,...,χ(m)−1} H , this implies that χ ′ ≥ χ(m) and concludes the proof.
Proof. By definition of χ(m, k) and the fact that S only makes jumps of negative size −1, we have A similar argument as in the proof of the previous lemma then leads to the conclusion H (χ(m, k)) = H (m) + 1 (i.e., by showing that there is exactly one extra ladder height time for the dual walk seen from χ(m, k)).
We now prove that S For k = 0 this is seen to be true by looking at the dynamic (2.4). We now prove that this is true by induction: so assume this is true for k ∈ {0, . . . , |P m | − 2} and let us prove that this continues to hold for k + 1. In order to do so, it is sufficient to combine the induction hypothesis with the following claim: In order to prove this identity, we first note that (again, this is seen by comparing the number of ladder height times of the dual processes seen from the two times) Finally, we already know that H (χ(m, k + 1)) = H (m) + 1. From the dynamic (2.4), this implies that the (H (m) + 1)st element of S n 0 remains unchanged for n = χ(m, k) + 1, . . . , χ(m, k + 1) − 1, but that one stub is removed at time χ(m, k + 1), i.e., This proves the claim made earlier and ends the proof of Lemma 3.8.
Proof of Lemma 3.9. Let us first prove the result for k = 1, so we consider n ≥ 0 with τ 0 • ϑ n ≤ n and we prove that Combining the two previous lemmas, we see that for any m ≥ 0 and any i ∈ {0, . . . , |P m | − 1}. In particular, Lemma 3.6 shows that we can apply this to m = n − τ 0 • ϑ n and i = ζ 0 • ϑ n , which gives On the one hand, we have χ(m, i ) = n (again by Lemma 3.6) and so in particular S Combining the above arguments concludes the proof for k = 1. The general case follows by induction left to the reader.
We can now prove Proposition 3.5.

Right decomposition of the spine.
In the case of i.i.d. life descriptors, the spine process is easily seen to be a Markov process. In the forthcoming Section 3.4, Proposition 3.5 will allow us to express the one-dimensional marginal of this process in terms of a bivariate renewal process. The present section can be seen as a description of the transition probabilities of the spine process: we show that for m ≤ n, the spine at n is deduced from the spine at m by truncating S m 0 and then by concatenating a spine that is independent of the past up to m, a construction reminiscent of the snake property -see Duquesne and Le Gall [9]. As we shall now see, the independent "increment" will be given by which, when life descriptors are i.i.d., is distributed as the original spine at time n − m.
• θ m , we note that an immediate consequence of (2.1) and (3.6) is that In order to prove Proposition 3.10, we will need the following lemma.
Lemma 3.11. For any n ≥ m ≥ 0 we have If in addition m∧n ≥ 0, then Proof. By definition we have S n m = S n−m 0 • θ m and so Proposition 3.5 implies that The first relation (3.11) thus follows from the identity ϑ n−m • θ m = ϑ n of (2.1). To prove the other relation (3.12), we use (3.11) with m random, which in this case reads as follows: for any random time Γ, the relation (3.4). Then we always have Γ ≤ n and so under the assumption m∧n ≥ 0, we obtain Since T −1 (T (k)) = k for any k ≥ 0, we obtain the result.
Remark 3.12. Let us comment on (3.14) as similar identities will be used in the sequel. To see how it follows from (3.13), write (3.13) in the form S n m = (U • ϑ n )(m) for some mapping U with domain Ω and values in the space of M * -valued sequence, so that (U •ϑ n )(m) is the mth element of the dual sequence. With this notation, we can directly plug in a random time, i.e., if m = Γ is random then we have S n Γ = (U • ϑ n )(Γ) and in particular, Proof of Proposition 3.10. By (3.4), m∧n ≥ 0 implies that T (T −1 (n − m)) • ϑ n ≤ n and so Lemma 3.9 with k = T −1 (n − m) gives Combining (3.4), which shows that S n−T (T −1 (n−m))•ϑ n 0 = S m∧n 0 , and the expression for S n m∧n given in (3.12) under the assumption m∧n ≥ 0 gives the first part of the result, namely that S n 0 = S m∧n 0 , S n m∧n . In order to show (3.10) and thus complete the proof, we distinguish between the two cases L(n − m) • ϑ m = 0 and L(n − m) • ϑ m > 0.
If L(n − m) • ϑ n = 0, then m∧n = m according to Lemma 3.2 which proves (3.10). Assume now that L(n − m) • ϑ n > 0: in view of (3.3), this means that n − m is not a weak ascending ladder height time of S •ϑ n and so We then obtain by Lemma 3.11 the relation S n • ϑ m in this case by (3.5), we obtain the result.
3.4. Probabilistic description of the spine. In this paper we are interested in the chronological height and contour processes associated to CMJ forests, which corresponds to the case where the planar forest is constructed from an i.i.d. sequence of sticks. Formally, let (V*, P*) be a random variable with values in 𝕃, and let P be the probability distribution on Ω such that ω under P is i.i.d. with common distribution (V*, P*). In this paper we consider the subcritical and critical cases, i.e., we assume that E(|P*|) ≤ 1. Under this (sub)critical assumption, S under P is a random walk with step distribution |P*| − 1, which therefore does not drift to +∞. In particular, all the trees considered in the informal sequential construction of the Introduction are finite and the sequence K_n almost surely grows to ∞.
In this case, for any n ∈ Z the dual operator ϑ^n leaves P invariant, i.e., P = P ∘ (ϑ^n)^{−1}. In the rest of the paper, this property will be called duality; it implies for instance that S and S ∘ ϑ^n under P are equal in distribution, and the same goes for ℋ(m) and T^{−1}(m) ∘ ϑ^n for any m, n ≥ 0.
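The link between the genealogical height and the weak ascending ladder height times of the dual walk can be checked numerically on a given sequence of offspring numbers. The sketch below is an illustration under explicit assumptions: the path is taken to be S(n) = Σ_{k<n} (|P_k| − 1), the dual walk seen from n is taken to be k ↦ S(n) − S(n − k) for 0 ≤ k ≤ n, and the function names are hypothetical. The first function computes heights from the stub rule, the second counts weak records of the dual walk.

```python
def heights_from_stubs(offspring):
    """Genealogical heights obtained by always attaching to the latest stub."""
    pending, heights = [], []            # pending[-1] is the depth of the next child
    for k in offspring:
        depth = pending.pop() if pending else 0   # a new tree starts when no stub is left
        heights.append(depth)
        pending.extend([depth + 1] * k)
    return heights


def heights_from_ladder_times(offspring):
    """Genealogical heights as numbers of weak ascending ladder times.

    Assumes S(n) = sum_{k<n} (offspring[k] - 1) and the dual walk
    k -> S(n) - S(n - k), 0 <= k <= n (labeled assumptions).
    """
    path = [0]
    for k in offspring:
        path.append(path[-1] + k - 1)
    heights = []
    for n in range(len(offspring)):
        record, count = 0, 0             # the dual walk starts from 0
        for j in range(1, n + 1):
            value = path[n] - path[n - j]
            if value >= record:          # weak record: a new ladder time
                record, count = value, count + 1
        heights.append(count)
    return heights


offspring = [2, 0, 1, 0, 1, 3, 0, 0, 2, 0, 0, 0]
print(heights_from_stubs(offspring))          # [0, 1, 1, 2, 0, 1, 2, 2, 2, 3, 3, 0]
print(heights_from_ladder_times(offspring))   # [0, 1, 1, 2, 0, 1, 2, 2, 2, 3, 3, 0]
```

The second function is quadratic in the number of individuals; it is meant as a transparent check of the identity rather than as an efficient algorithm.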
The fundamental result which makes it possible to study the asymptotic behavior of the height process is the following lemma. It entails in particular that under P is a bivariate renewal process stopped at some independent geometric random variable, which thus describes the law of (H (n), H(n)) in view of (3.6).
for all bounded and measurable functions f : M → R + and g : Z + → R + , and G * is an independent geometric random variable with parameter 1 − E(|P * |).
By duality, this result describes the law of ((T (k) − T (k − 1), Q(k)), k < G) • ϑ n under P and justifies the claim made before the statement of the lemma. By combining this result with the spine decomposition of Proposition 3.4, we thus get that the genealogical height process at a fixed time can be expressed as a functional of an explicit bivariate renewal process.
Moreover, we note that the random variable Y * = π(Q * ) admits a natural interpretation. Indeed, the previous result implies that Identifying (kP(|P * | = k)/E(|P * |), k ≥ 0) as the size-biased distribution of |P * |, we see that if we bias the life descriptor P * by its number of children, then Y * is the age of the individual when it begets a randomly chosen child. As mentioned in the introduction, in the critical case E(Y * ) = 1, the random variable Y * and its genealogical interpretation can already be found in Nerman [21].
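The size-biasing interpretation can be illustrated on a toy, finitely supported life descriptor. The following Python sketch is only an illustration under stated assumptions: the function names and the representation of the law as a list of (probability, birth ages) pairs are hypothetical, and the mean of Y * is computed as the expected sum of birth ages divided by the expected number of children, which is what "size-bias by the number of children, then pick a child uniformly" amounts to.

```python
from fractions import Fraction


def size_biased(pmf):
    """Return the size-biased law k * P(k) / mean of an offspring law.

    `pmf` maps each integer k >= 0 to its probability P(k).
    """
    mean = sum(k * p for k, p in pmf.items())
    return {k: k * p / mean for k, p in pmf.items()}


def mean_birth_age(descriptors):
    """Mean age at the birth of a uniformly chosen child under size-biasing.

    `descriptors` is a list of (probability, birth_ages) pairs, a toy
    stand-in for the law of the life descriptor.
    """
    expected_children = sum(p * len(ages) for p, ages in descriptors)
    expected_total_age = sum(p * sum(ages) for p, ages in descriptors)
    return expected_total_age / expected_children


# Critical toy offspring law: 0, 1 or 2 children with probabilities 1/4, 1/2, 1/4.
toy_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
toy_biased = size_biased(toy_pmf)

# Toy life descriptor: no child with probability 1/2, otherwise two children
# born at ages 1 and 3.
toy_descriptors = [(Fraction(1, 2), []), (Fraction(1, 2), [1, 3])]
toy_mean_age = mean_birth_age(toy_descriptors)
```

Exact rational arithmetic is used so that the size-biased weights can be compared without rounding issues.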
Proof of Lemma 3.13. The strong Markov property implies that G is a geometric random variable with parameter P(τ 0 = T (1) = ∞) and that conditionally on G, the random variables (1)) conditioned on {τ 0 < ∞}. Thus in order to prove Lemma 3.13, we only have to show that (τ 0 , Q(1)) under P( · | τ 0 < ∞) is equal in distribution to (T * , Q * ). Recalling that Q(1) = Υ ζ 0 (P τ 0 −1 ), we will actually show a more complete result and characterize the joint distribution of ( Fix in the rest of the proof x, t ∈ N with t ≥ 1 and h : M → [0, ∞) measurable: we will prove that By standard arguments, this characterizes the law of (P τ 0 −1 , τ 0 , ζ 0 ) and implies for instance that for any bounded measurable function F : Since τ − x is P-almost surely finite, the above relation for F (ν, x, t ) = 1 entails the relation P(τ 0 < ∞) = E(|P * |) which implies in turn the desired result by taking F (ν, x, t ) = f (Υ x (ν))g (t ). Thus we only have to prove (3.17), which we do now. First of all, note that if B = S(t − 1) = −x and S(k) < 0 for k = 1, . . . , t − 1 , then the two events {ζ 0 = x, τ 0 = t } and B ∩ {|P t −1 | ≥ x + 1} are equal. It follows from this observation that and since P t −1 and the indicator function of the event B are independent and P t −1 under P is equal in distribution to P * , we obtain Since P(B) = P(τ − x = t − 1) by duality, this proves Lemma 3.13.
Let Y * p be the random variable with distribution prescribed by (3.16) with P * = P * p , and P p be the probability distribution on Ω under which ω is an i.i.d. sequence with common distribution (V * p , P * p ). We let ⇒ denote weak convergence under P p and fdd ⇒ denote convergence in the sense of finite-dimensional distributions under P p . For instance, B p fdd ⇒ B ∞ if and only if (B p (t ), t ∈ I ) under P p converges weakly to (B ∞ (t ), t ∈ I ) for any finite set I ⊂ [0, ∞).

4.2. Convergence of the height process. We now state our main results concerning the convergence of the chronological height process: we fix a sequence ε p → 0 and consider the rescaled processes Our results will involve the following condition. Except for the first integrability condition, it is automatically satisfied in the non-triangular case where the law of Y * p does not depend on p.

Condition T-H. For every
Proof. First of all, note that H ([p t ]) ⇒ ∞ since H (n) and T −1 (n) are equal in distribution by duality. Further, the fundamental formula (3.6) gives In the sequel, let W p (n) = Ȳ p (1)+· · · +Ȳ p (n) and W (n) = Ȳ(1)+· · · +Ȳ(n), where the two sequences (Ȳ p (k), k ≥ 1) and (Ȳ(k), k ≥ 1) are i.i.d. with common distribution Y * p −E(Y * p ) and Ȳ introduced in Condition T-H, respectively. Fix η > 0 and M, N ≥ 1: by duality, it follows from Lemma 3.13 and standard manipulations that Letting first p → ∞, then N → ∞ and finally M → ∞ makes the first two terms of the above upper bound vanish: the first one because the sequence (H p (t ), p ≥ 1) is tight and the second one because H ([p t ]) ⇒ ∞, and so we end up with We omit the limsup M→∞ because, as we now show, the previous limit is equal to 0 for each fixed M > 0. In the non-triangular case where the law of Y * p (and thus W p ) does not depend on p, this follows from the strong law of large numbers, and we now extend this to the triangular setting under Condition T-H. Writing and using that (W p (n) − W p (N ), n ≥ N ) is equal in distribution to W p , we get By the Portmanteau Theorem, we have As for the second term, if we define W ± p (n) = W p (n) ± η ′ n and W ± (n) = W (n) ± η ′ n, then simple manipulations lead to Under Condition T-H, we have supW − p ⇒ supW − and infW + p ⇒ infW + , see for instance Theorem 22 in Borovkov [7]. The result thus follows from the fact that, since W + (resp. W − ) is a random walk drifting to +∞ (resp. −∞), its infimum (resp. supremum) is finite.
Remark 4.2. We leave it to the reader to convince herself that, by the exact same argument, if t p is a deterministic sequence such that t p /p → 0, then ε p H(t p ) ⇒ 0. This fact will be used later in proving the convergence of the contour process.
We now state an immediate corollary of this result: under mild conditions on the Y * p 's, the paths H p and H p converge jointly in the sense of finite-dimensional distributions.

Corollary 4.3. Assume that Condition T-H holds and that:
Condition (H1) is essentially a non-degeneracy condition: when |P | = 1 a.s. it is not satisfied. Theorem 2.3.1 in Duquesne and Le Gall [9] provides explicit conditions for Condition (H3) to hold. Namely, the following three conditions together imply (H3): (H3a) S p ⇒ S ∞ for some Lévy process S ∞ with infinite variation; (H3b) the Laplace exponent ψ of S ∞ satisfies
In this section, we study the contour process when this assumption is not enforced, which allows the chronological and genealogical processes to scale in different ways. We thus consider two sequences ε p andε p , both converging to 0, rescale the genealogical processes usingε p as , and the chronological processes using ε p as
Remark 5.1. When E(V * ) < ∞, Theorem 4.1 ensures that the difference of scaling between the genealogical and the chronological height processes can only occur when E(Y * ) = +∞. For instance, this will occur in the (non-triangular) case of Poissonian birth events along the edges (as in [14]) and when E((V * ) 2 ) = ∞.
In the Galton-Watson case, it is well-known that C p is essentially obtained from H p by a deterministic time-change under rather mild assumptions (essentially conditions (C2)-(C3) below). We now show that a similar statement holds at the chronological level.
Let V > 0 be some random variable and G be the additive subgroup generated by the support of its distribution. In the sequel we say that V is non-arithmetic if G is dense in R; otherwise, we say that V is arithmetic and in this case, there exists a unique h > 0, called the span of V , such that G = hZ. For a random variable V > 0 with finite mean, we defineV as follows:
• if V is non-arithmetic, we define
• if V is arithmetic and h is its span, we define
Condition T-C2. We haveV * p ⇒V * ∞ with V * ∞ as in Condition T-C1, and moreover:
• if V * ∞ is non-arithmetic, then V * p for each p is non-arithmetic;
• if V * ∞ is arithmetic, then V * p for each p is arithmetic.
In the sequel, we will refer to the first case as the non-arithmetic case and to the second case as the arithmetic case. Note that, except for the integrability condition E(V * ∞ ) < ∞, Conditions T-C1 and T-C2 as well as condition (C1) below are automatically satisfied in the non-triangular case where the law of (V * p , P * p ) does not depend on p.
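For a finitely supported, rational-valued V, the span can be computed explicitly: the additive subgroup generated by finitely many rationals is hZ with h their greatest common divisor. The following Python sketch illustrates this under that assumption; the function name `span` is hypothetical, and inputs should be integers or `Fraction` objects, since a float such as 0.1 is converted to its exact binary value.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def span(support):
    """Span h of the lattice hZ generated by finitely many positive rationals."""
    values = [Fraction(v) for v in support]
    if not values or any(v <= 0 for v in values):
        raise ValueError("support must be a non-empty collection of positive rationals")
    # Least common multiple of the denominators.
    common_den = reduce(lambda a, b: a * b // gcd(a, b), (v.denominator for v in values))
    # After scaling by the common denominator, all values are integers.
    numerators = [int(v * common_den) for v in values]
    return Fraction(reduce(gcd, numerators), common_den)


span_even = span([2, 4, 6])
span_mixed = span([Fraction(1, 2), Fraction(3, 4)])
span_coprime = span([3, 5])
```

A support such as {1, √2} generates a dense subgroup of R and is therefore non-arithmetic; this case cannot be represented exactly with rationals and is outside the scope of the sketch.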
Combining Theorems 4.1 and 5.2, we obtain the following joint convergence.

Corollary 5.3. Assume that except for (C5), the conditions of Theorems 4.1 and 5.2 hold withε p = ε p : then
We finally complement these results by showing that the trees themselves converge in the sense of finite-dimensional distributions. To do so, we only need to consider the minimum of the contour process, see for instance Le Gall [18] for more details.
Remark 5.5. In [30], Sagitov investigated (in the non-triangular setting) the size of a CMJ process conditioned to survive at large time under the short edge assumption, i.e., when E(V * 1 ) < ∞ and E(Y * 1 ) < ∞ (see also Section 8 and Green [12]). The population size is described in the limit in terms of a continuous state branching process where space and time are scaled analogously as in Corollary 5.3. As a consequence, the previous corollary can be seen as a genealogical version of [30]. We also note that in [30], the results are obtained through an entirely different approach, namely analytic computations involving some non-trivial extension of the renewal theorem.
In the rest of this section we discuss the proof of Theorem 5.2: the proof of Theorem 5.4, provided in Section 7.4, uses essentially the same arguments, together with the additional result of Corollary 6.4. In order to prove (5.1) and in view of the assumption (C5), we only need to prove that To show this result, it is tempting to draw inspiration from the proof of Theorem 2.4.1 in Duquesne and Le Gall [9], where it is proved that sup 0≤s≤t |C p (s) − H p (s/2)| ⇒ 0 for each fixed t ≥ 0. The proof of this result relies heavily on the assumption that the discrete height process converges weakly (i.e., in a functional sense) to its continuum counterpart. At the genealogical level, assuming weak convergence is not much stronger than assuming convergence of the finite-dimensional distributions, see [9, Theorem 2.3.1]. At the chronological level however, the simple example presented in Section 8 illustrates that the gap between these two modes of convergence is more significant. In Section 5.2 we give an overview of the main steps for proving (5.2), thereby highlighting key differences with the Galton-Watson case.

5.2. Overview of the proof of Theorem 5.2. Except in Section 8, we assume in the rest of the paper that Conditions T-C1 and T-C2 and Conditions (C1)-(C5) of Theorem 5.2 hold. The two conditions
imply that the sequence (V * p ) is uniformly integrable (see for instance [6, Theorem 3.6]), which implies the following triangular weak law of large numbers. It can be directly checked by computing Laplace transforms or by invoking §22 in Gnedenko and Kolmogorov [11]. In view of the construction of the chronological contour process C in Section 2.2.2, we have Let ϕ be the left-continuous inverse of (K [t ] , t ≥ 0), defined by Then defining the inequality (5.3) translates after scaling to and so going back to (5.2), we obtain for any t ≥ 0 The proofs of H p (ϕ p (t ) + 1/p) − H p (ϕ ∞ (t )) ⇒ 0 and of H p (ϕ p (t )) − H p (ϕ ∞ (t )) ⇒ 0 proceed along similar lines, and so in the sequel we only focus on the latter convergence. The above relation shows that, asymptotically, the correct time-change should be the limit of ϕ p , and we now explain why this is indeed ϕ ∞ . Plugging in the definition K n = 2V (n − 1) − H(n) into the definition of ϕ, we obtain For large p, the triangular law of large numbers of Lemma 5.6 suggests the approximation V (p) ≈ β * p; while under assumptions (C2) and (C5) , H(p) for large p is of the order of 1/ε p ≪ p. These two observations thus give a rationale for the following result.
In view of this result, a natural idea to prove H p (ϕ p (t )) − H p (ϕ ∞ (t )) ⇒ 0 is to use a uniform control of the kind However, the example considered in Section 8 strongly suggests that even for η p precisely of the order of |ϕ p (t ) − ϕ ∞ (t )|, the supremum of the previous upper bound may blow up. Such a control is therefore too rough and more care is needed.
One of the main obstacles for a finer control is the convoluted relation between H p and ϕ p (t ), whereby H p appears in the definition of ϕ p (t ); this is also the reason why it is not straightforward to prove the apparently innocuous convergence ε p V ϕ(pt ) ⇒ 0 which is required in order to deal with the first term in the upper bound of (5.6).
In order to circumvent this difficulty, we introduce a random timeφ p (t ) close to ϕ p (t ) and which will be easier to control. More precisely, we consider the first passage time of the renewal process 2V above level t . Note that, since V n and H(n) are non-negative, we haveφ(t ) ≤ ϕ(t ) for every t ≥ 0. For fixed p, the renewal theorem provides an asymptotic description as t → ∞ of the process 2V shifted at timeφ(t ). In Section 6.2 we will prove a triangular version of this result, and Condition T-C2 is here to ensure that this extension of the renewal theorem to a triangular setting holds. We will for instance prove the following result. This result illustrates the fact thatφ p (t ) is more convenient to work with compared to ϕ p (t ). Besides, ϕ p (t ) andφ p (t ) are close: the triangular law of large numbers of Lemma 5.6 implies similarly as in the proof of Lemma 5.7 thatφ p (t ) ⇒ ϕ ∞ (t ) and, to be more precise, the next result implies that their difference is at most of the order of 1/ε p . This result is a consequence of Proposition 7.1 which will be proved in Section 7.2.
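The first passage time of the renewal process 2V above a level can be sketched as follows. This is a minimal Python illustration, not the paper's definition: the indexing V(n) = V_1 + ... + V_n, the use of a non-strict inequality (consistent with the bound 2V(φ(pt)) ≥ pt used later in the proofs), and the function name `first_passage` are assumptions of the sketch.

```python
def first_passage(durations, level):
    """Return (n, overshoot) for the smallest n >= 0 with 2 * V(n) >= level.

    Here V(n) = durations[0] + ... + durations[n - 1] (so V(0) = 0); this
    indexing is an assumption of the sketch. The overshoot is 2 * V(n) - level.
    """
    total = 0
    for n in range(len(durations) + 1):
        if 2 * total >= level:
            return n, 2 * total - level
        if n < len(durations):
            total += durations[n]
    raise ValueError("level not reached with the given durations")


passage_example = first_passage([1, 2, 3], 5)
passage_at_zero = first_passage([1, 2, 3], 0)
passage_exact = first_passage([1, 2, 3], 12)
```

The overshoot returned alongside the passage time is the quantity whose asymptotic behavior is classically described by the renewal theorem.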

Lemma 5.9. For any t ≥ 0, the sequence of random variables
Proof. See forthcoming Proposition 7.1.
Lemmas 5.8 and 5.9 make it possible to get rid of the first term in the upper bound (5.6), as we now show.
Lemmas 5.8 and 5.9 imply that the two first terms vanish, while for the third term, we write where the first inequality follows from the fact that the (Vφ (pt )+k , k ≥ 1) under P p are i.i.d. with common distribution V * p . Since the (V * p ) are uniformly integrable, this last bound vanishes as p → ∞, which completes the proof.
In order to show that H p (ϕ p (t )) − H p (ϕ ∞ (t )) ⇒ 0, we introduce H p (φ p (t )) and write We will then study each term of this upper bound. We will control the first term H p (φ p (t ))−H p (ϕ ∞ (t )) by showing that the spine originated from the random timeφ(p t ) asymptotically looks like the spine originated from a deterministic time. To do so we prove an extension of the renewal theorem to a triangular setting and a macroscopic horizon in Section 6.2, thereby extending results of Miller [20].
To control the second term H p (φ p (t )) − H p (ϕ p (t )), we introduce the shifted process The key idea is that H ′ turns out to be close in distribution to H, and so elaborating on Lemma 5.9 which states that ∆ is small macroscopically (since p ≫ 1/ε p by condition (C2)) will give the desired result.

5.3. Organization of the rest of the paper. The rest of the paper is organized as follows. In Section 6 we prove some preliminary results, namely some formulas on the height process which extend the right decomposition of the spine introduced in Section 3.3, as well as some renewal type results: in particular, these results make it possible to prove Lemma 5.8. Section 7 contains the remaining proofs, namely the proof of Lemma 5.9, the proof that each term in the upper bound of (5.7) vanishes and finally the proof of Theorem 5.4.
In the sequel, we consider the measurable function D ℓ : M * → R + that satisfies D 0 ≡ 0 and for ℓ ∈ N \ {0}: The fact that the right hand side is measurable with respect to S n 0 (and thus can be written as a function of S n 0 ) is a consequence of Proposition 3.5 and the fact that the random variables appearing in the formula are related to the dual Lukasiewicz path S • ϑ n .
Moreover, we leave it to the reader to check that for any Y ∈ M * the sequence (D ℓ (Y ), ℓ ∈ N) is increasing. Actually, this comes from a more general fact, namely that D ℓ (Y ) for Y ∈ M * gives the distance between π(Y ) and the ℓ-th stub of Y .
The following result relates the two shifts which play a key role in this paper: on the one hand, the canonical shift θ which acts on the initial sequence of sticks ((V n , P n ), n ∈ Z) through the term π(S n m ) = π(S n−m 0 ) • θ m , and on the other hand, the shift in time through the term H(n) − H(m).

Proposition 6.2. For every 0 ≤ m ≤ n we have
Proof. Applying (6.1) to the random ℓ = L(n − m) • ϑ m , we obtain (see Remark 3.12) To prove (6.2) we distinguish the two cases m∧n < 0 and m∧n ≥ 0.
Case 1: m∧n < 0. By (3.4) this condition is equivalent to τ L(n−m) • ϑ m > m: in view of (6.3), we thus need to show that Using the expression for H(n), H(m) and π(S n m ) provided by Proposition 3.4 and (3.9), we see that in order to show the above relation we only have to show that T −1 (n − m) • ϑ n = T −1 (n) • ϑ n . This in turn follows from the fact that the condition m∧n < 0 implies that (again by (3.4)), which is equivalent to saying that the sets {T (i ) : i ∈ N} • ϑ n and {n − m, . . . , n} do not intersect and gives T −1 (n − m) • ϑ n = T −1 (n) • ϑ n . The proof in this case is thus complete.
Taking the difference between these two expressions yields the result in view of (6.3) (recall that m∧n ≥ 0 is equivalent to τ L(n−m) • ϑ m ≤ m).
The following lemma relates the shifted spine to the Skorohod reflection.
Proof. It follows from (3.9) that Next, we have from Proposition 3.5 that Comparing the two expressions for S n 0 , we see that We let the reader convince herself that H n − T ( T −1 (n − m)) • ϑ n = min {m,...,n} H (again by comparing the number of ladder height times at n − T ( T −1 (n − m)) • ϑ n and k ∈ {m, . . . , n}), so that gathering the previous relations we finally obtain the desired result.
Local minima of C are by construction attained on the set {K n : n ∈ N} and since H(k) = C(K k ) for any k ∈ N, this implies I n m = min k=m,...,n H(k). The result then follows from Lemma 6.3.

6.2. Triangular renewal theorem on a macroscopic horizon.
By construction, π(S n m ) only depends on the finite vector P n m = (P k , k = m, . . . , n − 1), and we can thus for instance write π(S n m ) = Ξ n−m (P n m ) for some measurable mapping Ξ n−m : M n−m → [0, ∞). With this notation, Condition (C5) on the convergence of the chronological height process precisely means that if we take a vector ν p ∈ M [pδ] of [pδ] i.i.d. random measures with common distribution P * p , then Ξ [pδ] (ν p ) converges weakly to H ∞ (δ). For instance, [ϕ ∞ (pt )]−[pδ] ⇒ H ∞ (δ) and we want to extend this result by replacing the deterministic time [ϕ ∞ (p t )] by the random oneφ(p t ).
Of course, the random variables (P k , k =φ(p t ) − [pδ], . . . ,φ(p t ) − 1) are not i.i.d. and so we cannot directly invoke the same argument. However, the renewal theorem suggests that these random variables become asymptotically i.i.d. as p → ∞, which gives a rationale for, e.g., the convergence Results with a similar flavor, i.e., renewal theorems on a macroscopic horizon, can be found in Miller [20].
Two technical difficulties prevent us from using Miller's or other standard results: (1) we are in a triangular setting and (2) we need to consider a growing number of terms (of the order of p). In addition, Miller [20] typically assumes the almost sure convergence of Ξ [pδ] P [pδ] 0 whereas we only have weak convergence. In order to overcome these difficulties, we exploit the coupling between two random walks with the same step distribution but possibly different initial distributions constructed in the proof of Lemma 9.21 in Kallenberg [13]. This coupling leads to the following results, proved in Appendix C.
Proposition 6.5. Let (V * ∞ ,P * ∞ ) have the following size-biased distribution: for every measurable function f : Then Vφ (pt ) , Pφ (pt ) ⇒ V * ∞ ,P * ∞ for every t > 0.
Proposition 6.6. For each p ≥ 1 let Ξ p : M p → R be a measurable mapping such that Recall the exploration process ρ n 0 = S n 0 • G , which similarly as (3.8) is extended by The following corollary to Propositions 6.5 and 6.6 gathers the results needed in the sequel.
Corollary 6.7. For t ≥ 0, the three sequences ε p Vφ (pt ) , ε p π(Pφ (pt ) ) and ε p |Pφ (pt ) | converge weakly to 0 as p → ∞. If in addition 0 < δ < t /(2β * ), then
Proof. The convergence of the three sequences ε p Vφ (pt ) , ε p π(Pφ (pt ) ) and ε p |Pφ (pt ) | is a direct consequence of Proposition 6.5 (note that, for point processes, the functionals π and |·| are continuous for the weak topology).
Let us now discuss the remaining convergence of ε p π Sφ does, in which case they have the same limit. This means that we are brought back to the convergence of H p (δ), H p (δ) and sup [0,δ] S p and since each of these three terms converges by assumptions (C3), (C4) and (C5), the result follows.

PROOF OF THEOREMS 5.2 AND 5.4
We now complete the proof of Theorems 5.2 and 5.4: Theorem 5.2 is proved in Sections 7.1-7.3 and Theorem 5.4 in Section 7.4. For Theorem 5.2, recall from the discussion in Section 5.2 that it remains to prove Lemma 5.9 as well as the fact that both terms in the upper bound of (5.7) vanish, i.e., that Using the results of the previous section, we will first prove (7.1) in Section 7.1. Then, we will use (7.1) to prove the following result in Section 7.2.

Proposition 7.1. For any t > 0 and any
Combining (7.1) and Condition (C5) implies that H(φ(p t )) is of the order of 1/ε p , and so Proposition 7.1 directly implies Lemma 5.9. Finally, we will use Proposition 7.1 to prove (7.2) in Section 7.3, which will complete the proof of Theorem 5.2.
7.1. Proof of (7.1). We start with the following simple lemma.
For simplicity, let m p =φ(p t )∧[ϕ ∞ (p t )]. Since we have H(m p ) ≤ H(φ(p t )) as well as H(m p ) ≤ H([ϕ ∞ (p t )]), the triangle inequality reads and since m p ≤ min(ϕ(p t ), [ϕ ∞ (p t )]), (6.2) gives by neglecting the terms D ≥ 0 In particular, we only need to show that ε p π(S φ p m p ) ⇒ 0 for φ p =φ(p t ) or [ϕ ∞ (p t )]. Using the monotonicity of π(S n m ) in m given by Lemma 7.2, we obtain for any 0 < δ < ϕ ∞ (t ) this is a consequence of (C5), and for φ p =φ(p t ) this was proved in Corollary 6.7 for δ small enough. Since this inequality holds for every δ small enough and since H ∞ is almost surely continuous at 0 by Condition (C5), in order to conclude the proof it remains to show that P p (m p ≤ φ p − [pδ]) → 0 as p → ∞ for each fixed 0 < δ < ϕ ∞ (t ), which we do now.
By Assumption (C4), the genealogical contour process C p converges weakly to a continuous process C ∞ . Since φ p /p ⇒ ϕ ∞ (t ), this implies that C p (t p ) − inf I p C p ⇒ 0 with t p = φ p /p or t p = ϕ ∞ (t ) and I p = [min(φ p /p, ϕ ∞ (t )), max(φ p /p, ϕ ∞ (t ))]. By classical arguments on discrete trees, this implies that the genealogical distance rescaled byε p between φ p and m p converges to 0, i.e.,ε p (H (φ p ) − H (m p )) ⇒ 0. Therefore, for any η > 0 we obtain Since this term converges to P(H ∞ (δ) ≤ η) (for φ p =φ(p t ) this comes from Corollary 6.7 and for φ p = [ϕ ∞ (p t )] this is the convergence of the genealogical height process assumed in (C4)) we finally obtain Letting η → 0 in the last display therefore concludes the proof thanks to Condition (C4).

7.2. Proof of Proposition 7.1.
In order to prove this result, we introduce two intermediate height processes. We enrich the probability space with a random variable P which under P p is equal in distribution to P 1 and independent from the sequence (Pφ (pt )+k , k ≥ 1), and we consider S (p) = ( S n (p) , n ≥ 0) the spine process defined from the sequence ( P , Pφ (pt )+1 , · · · ). For k ≥ 0 we then let and H p (k) = π S k (p) .
We now turn to the proof of Proposition 7.1. In the rest of the proof, let ∆ p = ϕ(p t ) − φ(p t ). Since by definition it follows that and so according to Proposition 6.2, Since D k (ν) ≥ 0 and 2V (φ(p t )) ≥ p t , we obtain by definition of H p that In particular, if σ p = [ηH(φ(p t ))] then in order to prove the result it is enough to show that Since for any γ > 0, we have the desired convergence is implied by the following two relations: Let us begin by proving the first relation ε p H p (σ p ) ⇒ 0. Corollary 4.3 combined with (7.1) shows that ε p σ p ⇒ ηH ∞ (ϕ ∞ (t )), and since pε p → ∞ by (C2), it follows that σ p /p ⇒ 0. Since H p is equal in distribution to H by Lemma 7.3 and σ p is independent of H p , we obtain in view of Remark 4.2 that ε p H p (σ p ) ⇒ 0. The second part of Lemma 7.3 finally entails the desired result ε p H p (σ p ) ⇒ 0.
We now prove the second convergence in (7.5). By construction,V p is a renewal process independent of H(φ(p t )), and thus independent of σ p : Lemma 5.6 thus implies thatV p (σ p − 1)/σ p ⇒ β * and since, as already mentioned, ε p σ p ⇒ ηH ∞ (ϕ ∞ (t )), we get Since (2β * η−1) > 0 and H ∞ (ϕ ∞ (t )) > 0 a.s. by (C5), the result follows by letting γ → 0.
7.3. Proof of (7.2). As in the previous subsection, let ∆ p = ϕ(p t )−φ(p t ). Proposition 6.2 gives (pt ) . We now show that each term of the right-hand side of (7.6) vanishes, and we start with the second one, i.e., we show that Since it is not hard to prove that D L ′ p (k) (Sφ (pt ) 0 ) is non-decreasing in k and since the sequence (ε p ∆ p , p ≥ 1) is tight, it is enough to show that (7.8) ε p D L(t p )•ϑφ (pt) Sφ (pt ) 0 ⇒ 0 for some deterministic integer-valued sequence (t p ) with ε p t p → ∞: we will consider t p = [(p/ε p )^{1/2} ], which satisfies in addition t p /p → 0. In order to prove (7.8), we fix until further notice γ, γ ′ > 0 and two integer-valued sequences (γ p ), (γ ′ p ) such that γ p /p → γ (in particular t p /γ p → 0) and γ ′ p /(pε p ) → γ ′ . Since both D k (Sφ (pt ) 0 ) and L(k) • ϑφ (pt ) are non-decreasing with k, it follows that for p large enough such that t p ≤ γ p , we have By definition of L and S, the first term is equal to Isolating the term |Pφ (pt ) | − 1 and using that the P k 's for k ≥φ(p t ) + 1 are i.i.d., we further get The first term vanishes by (C2) and Corollary 6.7, and so rescaling the second term by pε p and using (C3), we obtain By letting first p → ∞ and then γ ↓ 0, we thus have at this point Fix now some 0 < δ < t /(2β * ): by definition (6.1) of D, (pt ) and so in the event {τ γ ′ p • ϑφ (pt ) ≤ [pδ]}, we get where we have used (3.9) to derive the last equality. In particular, Letting first γ ′ → 0 and then δ → 0 concludes the proof of (7.8), and so also of (7.7).
We now show that the first term in the right-hand side of (7.6) also vanishes. In view of (7.4) and using 2V (φ(p t )) − p t ≤ 2Vφ (pt ) , we obtain We have just proved that the second term vanishes (in law), and since the third term also vanishes by Lemma 5.8 it only remains to control the first term. SinceV is an increasing sequence, for any γ, η > 0 we have Choose now η > 1/(2β * ), so that the first term vanishes by Proposition 7.1. For the second term, we note thatV is independent from H(φ(p t )) to obtain with similar arguments as in the proof of Proposition 7.1 Since P(H ∞ (ϕ ∞ (t )) > 0) = 1, letting η → 1/(2β * ) concludes the proof.
) and R p = ½(0 < τ ℓ(p) ≤ [p t ])π(µ ℓ(p) ), so that by definition (6.1) of D we have Using the various facts that D ℓ(p) (S p and τ ℓ(p) are genealogical quantities and finally that ϑ [pt ] and G commute, composing on the right with G in the previous display gives By duality, we therefore only have to show that the three quantities converge weakly to 0. The second one obviously does since ε p → 0. For the third one we proceed similarly as in the proof of Theorem 4.1: indeed, ε p T −1 p is tight (because it is smaller than ε p T −1 ([p t ]) by monotonicity of T −1 , which is equal in distribution to H p (t )), which is the only assumption necessary for the proof of Theorem 4.1 to go through.
Next, we fix some N ≥ 0, consider N p = [N /ε p ] and use the previous inequality to write (7.9 where G = inf{k ≥ 0 : T (k) = ∞} and η p = η/ε p . For the first term of the right-hand side, we note that T −1 ([p t ]) is by duality equal in distribution to H (p t ) to get It remains to control the second term in the right-hand side of (7.9): since the (Y(k), k = 1, . . . ,G − 1) are i.i.d. by Lemma 3.13, we have This last bound vanishes because N p P(Y * p ≥ η p ) → 0 as a direct consequence of the uniform integrability of the Y * p together with the following bound: The proof is complete.
In the sequel for 0 ≤ u ≤ v we define Corollary 7.5. For any 0 < a < b we have Proof. First of all, we note that Indeed, this follows from rewriting M (2β * p a, 2β * pb) = inf C p (t ) : 2a ≤ t ≤ 2b and together with the following two facts: 1) C p ⇒ C ∞ with C ∞ continuous and 2) p −1 K [pa] • G ⇒ 2a. Therefore, in order to prove the result we only have to prove that To prove this, we define and apply Corollary 6.4 to write The first term on the right-hand side vanishes by Theorem 4.1, so we are left with the second term. Since L p is a genealogical quantity, this term is equal to and we can now invoke Lemma 7.4 to conclude that this term vanishes, as L p is independent of S [pa] 0 and converges weakly to ∞. This proves the result.
Proof of Theorem 5.4. In order to prove Theorem 5.4 we have to prove that Since for any t ∈ R + we have Thus in the sequel, for any 0 < γ < s < t we can assume that the event E p (s, γ) ∩ E p (t , γ) holds. By monotonicity, on this event we have and continuing with the triangle inequality, we obtain Multiplying by ε p , the two terms of the second line vanish as p → ∞ by Corollary 7.5; letting then γ → 0 makes the terms of the third line disappear by virtue of the convergence C p ⇒ C ∞ with C ∞ continuous. The proof of Theorem 5.4 is complete.

SOME EXAMPLES WHERE TIGHTNESS FAILS
In the Galton-Watson case, if the height process converges in the sense of finite-dimensional distributions toward a càdlàg process, then one actually only needs mild additional assumptions in order to get weak convergence in a functional sense of both the height and contour processes, essentially assumption (H3c) discussed after Corollary 4.3. For instance, we automatically get weak convergence in the non-triangular case where the offspring distribution does not depend on p.
In this section we consider simple examples where the genealogical height and contour processes of the corresponding CMJ trees converge in the sense of finite-dimensional distributions but not necessarily in a functional sense. In contrast to the Galton-Watson case, we show that this can happen even in the non-triangular case. For these examples, all the assumptions of the main results of the present paper (namely Corollary 4.3 and Theorem 5.2) hold, which shows that further conditions are called upon in order to strengthen these results to functional convergence.
Throughout this section, we assume that (V * p , P * p ) is equal in distribution to (V * , P * ), independent of p. We let ξ = |P * | and assume that its distribution is a critical offspring distribution in the domain of attraction of an α-stable law with α ∈ (1, 2). Then, it is known that for the choice ε p =ε p = p^{−(1−1/α)} , assumptions (H3a)-(H3c) and (C2)-(C4) hold. In particular, S p has jumps of the order of one, which means that, typically, some nodes have of the order of pε p = p^{1/α} children: these nodes are called macroscopic. In particular, assumptions (H2) and (C1) hold. The corresponding CMJ tree is then almost a Galton-Watson tree with offspring distribution the distribution of ξ, except that each edge is extended by a length equal to the number of children of the corresponding individual. Since H only depends on the P p but not on the V p , we have H = H and so H p converges weakly. On the other hand, macroscopic nodes have, by construction, edges with length of the order of p^{1/α} . When the particle traveling along the edges meets such an edge, this makes C go up and then down at rate ±1 for a duration p^{1/α} , so that during this time interval C has variation of the order of p^{1/α} . Because of the scaling C p (t ) = p^{−(1−1/α)} C(p t ), such a time interval corresponds for C p to a time interval of size p^{1/α} × (1/p) = p^{−(1−1/α)} , during which C p has variation of the order of p^{1/α} × p^{−(1−1/α)} = p^{2/α−1} . Since α ∈ (1, 2), in the limit we see that each macroscopic node should induce an infinite jump of C p . Since macroscopic nodes are dense, this strongly suggests that C p cannot be tight.
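The exponent bookkeeping behind this heuristic can be written out explicitly. The display below is only a restatement of the computation in the text, followed by a numerical instance with the illustrative value α = 3/2, which is an arbitrary choice in (1, 2).

```latex
\[
  \underbrace{p^{1/\alpha}}_{\text{edge length}} \times \frac{1}{p}
  \;=\; p^{-(1-1/\alpha)} \xrightarrow[p\to\infty]{} 0,
  \qquad
  \underbrace{p^{1/\alpha}}_{\text{variation of } C} \times
  \underbrace{p^{-(1-1/\alpha)}}_{\text{space scaling}}
  \;=\; p^{2/\alpha-1} \xrightarrow[p\to\infty]{} \infty ,
\]
since $2/\alpha - 1 > 0$ if and only if $\alpha < 2$.
For instance, with $\alpha = 3/2$ one has $1/\alpha = 2/3$, so the rescaled
time interval has length $p^{-1/3}$ while the rescaled variation is of order
$p^{4/3-1} = p^{1/3}$.
```

In other words, the rescaled contour process would have to move by a diverging amount over a vanishing time interval, which is incompatible with tightness in a space of càdlàg functions.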

Second family of examples.
Let us now consider a variation of the above example, where both $H_p$ and $C_p$ fail to converge weakly: here we consider $(V^*, P^*) = (1 + \xi, (\xi - 1)\delta_1 + \delta_\xi)$, so that $E(V^*) = 2$ and $E(Y^*) = E(2\xi - 1) = 1$. Again, the corresponding CMJ tree is almost a Galton-Watson tree, with the difference that all but one child are born at time 1, and one child is born at a time equal to the number of children. The crucial difference with the first family of examples is that now, a macroscopic node also induces an infinite jump of $H_p$ for the exact same reason as before.
Let us now push this example a little further, and discuss the claim made in Section 5.2 that a uniform control of the kind (8.1), for some $\eta_p \to 0$ such that $P_p(|\phi_p(t) - \phi_\infty(t)| \le \eta_p) \to 1$, is too rough. Actually, we will discuss this with $\varphi_p(t)$ instead of $\phi_p(t)$, but since these two quantities are close (recall Lemma 5.9), this discussion is equally insightful. In this case, classical results show that $\phi(pt) - \phi_\infty(pt)$ is of the order of $p^{1/\alpha}$. Undoing the scaling, we see that we want to understand the order of magnitude of the variations of $H$ on time scales of the order of $p^{1/\alpha}$, and in particular to see how these variations compare to the space scale $p^{1-1/\alpha}$. Since $S_p$ converges to a stable process, it follows from the previous discussion that on the time scale $p^{1/\alpha}$, $S$ makes jumps of size $(p^{1/\alpha})^{1/\alpha} = p^{1/\alpha^2}$. As before, these jumps correspond to "mesoscopic" individuals with of the order of $p^{1/\alpha^2}$ children, which also have edge lengths of the same order. In particular, if the space scale $p^{1-1/\alpha}$ is negligible compared to $p^{1/\alpha^2}$, i.e., if $1 - 1/\alpha < 1/\alpha^2$, which for $\alpha \in (1, 2)$ amounts to $\alpha < (1 + \sqrt{5})/2$, then it is reasonable to expect the right-hand side of (8.1) to blow up, although we have proved that the left-hand side vanishes.
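The comparison between the space scale $p^{1-1/\alpha}$ and the mesoscopic scale $p^{1/\alpha^2}$ reduces to the sign of $1/\alpha^2 - (1 - 1/\alpha) = (1 + \alpha - \alpha^2)/\alpha^2$, which is positive exactly when $\alpha$ lies below the golden ratio. The sketch below checks this numerically; the names `GOLDEN_RATIO` and `mesoscopic_exponent_gap` are hypothetical and only serve the illustration.

```python
from fractions import Fraction
import math

# Positive root of x^2 - x - 1 = 0.
GOLDEN_RATIO = (1 + math.sqrt(5)) / 2


def mesoscopic_exponent_gap(alpha):
    """Return 1/alpha^2 - (1 - 1/alpha).

    The result is positive exactly when p^(1/alpha^2) dominates p^(1 - 1/alpha).
    Hypothetical helper: it only compares the two exponents discussed in the text.
    """
    alpha = Fraction(alpha)
    return 1 / alpha**2 - (1 - 1 / alpha)


# alpha = 3/2 lies below the golden ratio: mesoscopic edges dominate the space scale.
gap_below = mesoscopic_exponent_gap(Fraction(3, 2))
# alpha = 7/4 lies above the golden ratio: the space scale dominates.
gap_above = mesoscopic_exponent_gap(Fraction(7, 4))
```

With these two sample values the gap is positive for $\alpha = 3/2$ and negative for $\alpha = 7/4$, consistent with a threshold at $(1 + \sqrt{5})/2 \approx 1.618$.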
To conclude, we mention that such examples could be generalized by considering $(V^*, P^*) = (1 + \xi, (\xi - 1)\delta_1 + \delta_{f(\xi)})$ for some function $f : [0, \infty) \to [0, \infty)$ such that $E(f(\xi)) < \infty$. This extended family of examples then makes it possible to lower the above threshold involving the golden ratio, and also to show that even if $\alpha = 2$, i.e., the offspring distribution has finite variance, $H_p$ may fail to be tight even though its finite-dimensional distributions converge.
APPENDIX A. PROOF OF LEMMA 3.3
In this section we prove Lemma 3.3: first consider the following lemma.