Fringe trees, Crump-Mode-Jagers branching processes and $m$-ary search trees

This survey studies asymptotics of random fringe trees and extended fringe trees in random trees that can be constructed as family trees of a Crump-Mode-Jagers branching process, stopped at a suitable time. This includes random recursive trees, preferential attachment trees, fragmentation trees, binary search trees and (more generally) $m$-ary search trees, as well as some other classes of random trees. We begin with general results, mainly due to Aldous (1991) and Jagers and Nerman (1984). The general results are applied to fringe trees and extended fringe trees for several particular types of random trees, where the theory is developed in detail. In particular, we consider fringe trees of $m$-ary search trees in detail; this seems to be new. Various applications are given, including degree distribution, protected nodes and maximal clades for various types of random trees. Again, we emphasise results for $m$-ary search trees, and give for example new results on protected nodes in $m$-ary search trees. A separate section surveys results on height, saturation level, typical depth and total path length, due to Devroye (1986), Biggins (1995, 1997) and others. This survey contains well-known basic results together with some additional general results as well as many new examples and applications for various classes of random trees.


Introduction
Aldous [1] introduced the concept of a random fringe subtree of a random tree. (See Section 4 below for definitions.) This is a useful concept since many properties of a tree can be formulated in terms of fringe trees, and thus results on the asymptotic distribution of fringe trees can imply various other asymptotic results; a simple example is the degree distribution (considered already in [1]) and some other examples are given in Section 10 (protected nodes and rank). (See also Devroye and Janson [41] and Holmgren and Janson [66] for some recent related applications of fringe trees.) Moreover, Aldous [1] also introduced the extended fringe tree that allows for consideration of e.g. parents and siblings of a chosen node; see Section 11 for some applications (e.g. maximal clades).
It is thus of interest to describe the asymptotic distribution of random fringe trees and extended fringe trees for various classes of (random) trees. Aldous [1] gave several examples of asymptotic fringe trees, including the case of random binary search trees; he also, more briefly, gave examples of asymptotic extended fringe trees. One of the purposes of the present paper is to extend these examples. In particular, we describe asymptotic fringe trees and extended fringe trees for m-ary search trees (see Section 3 for a definition and Section 7 for results). We give some applications of these results for m-ary search trees in Sections 10 and 11.
Our characterization uses some of the ideas in Aldous [1], in particular the reduction to results for continuous-time Crump-Mode-Jagers branching processes by Jagers and Nerman [72], [105]. (The m-ary search trees have earlier been studied by similar methods by Pittel [112]; however there the focus was on the height of the trees and not on the fringe trees.) In a sense, the results are implicit in [1], and partly in [72; 105], but the details are not completely trivial so we give a detailed explicit treatment.
We therefore begin with a survey of fringe trees and extended fringe trees for family trees of Crump-Mode-Jagers branching processes, including many other examples besides the m-ary search trees. The general theory is described in Sections 4 and 5. In Section 6, several examples are studied in detail, in particular various versions of preferential attachment trees, which earlier have been studied by these methods by Oliveira and Spencer [108], Rudas, Tóth and Valkó [118] and Rudas and Tóth [117]. We then specialise on m-ary search trees; explicit results for them are given in Section 7. In Section 8 we consider the random median-of-(2ℓ + 1) binary search tree as yet another example.
Furthermore, as another novel example, we consider in Section 9 the class of fragmentation trees; these too can be constructed using family trees of Crump-Mode-Jagers branching processes, but in a slightly different way from the preceding examples. We extend the results for the asymptotic distribution of random (extended) fringe trees to this case too.
In Sections 10 and 11, as mentioned above, we give some applications of the results on asymptotic fringe trees and extended fringe trees to protected nodes and maximal clades (and related properties). This serves partly to illustrate the general theory and its uses and some results are old, but we also give a number of new results for m-ary search trees. In particular, we give a recursion that yields the asymptotic probability that a random node in an m-ary search tree is k-protected, for general m and k, and a closed formula for the case k = 2, together with asymptotics as m → ∞ of this probability for k = 2.
In the main part of the paper, we consider the fringe tree or other properties of a uniformly random node in the tree. In Section 12 we consider variations for a random node with a non-uniform distribution. We study first restricted sampling, where we sample only nodes with some given property, for example a random leaf. For m-ary search trees, we study also the node containing a random key.
In Sections 4-12, we study (more or less) local properties of the tree that are related to (extended) fringe trees. Branching process methods have also long been used, beginning with Devroye [32], to study global properties of random trees, such as the height and other properties related to the distances from the nodes to the root. As a complement to the previous sections, we give in Section 13 a survey of such results for the height, saturation level, profile, typical depth and total path length. This uses the same setup as the preceding sections with random trees constructed as family trees of Crump-Mode-Jagers branching processes, but the methods are different and based on results on branching random walks by Biggins [11; 13; 14; 15]. This section is thus essentially independent of the previous sections, except for definitions and some basic results. The main results are well-known, but we believe that some results are new.
In this paper, we concentrate on results obtained by general branching process methods, in particular results on the asymptotic distribution of (extended) fringe trees and applications of such results. Typical results can be expressed as convergence in probability or almost surely (a.s.) of the fraction of fringe trees that are isomorphic to some given tree, see for example (5.22); see also (4.7)-(4.8) and Remark 4.1. Such results can be seen as a law of large numbers for fringe trees, and typical applications yield first-order results for the proportion or number of nodes that have a certain property. In some special cases, for example for some properties of the binary search tree, much more precise results have been derived by other methods. We give some references to such results, but we do not attempt completeness. A natural next step would be to show a general central limit theorem, i.e., asymptotic normality of the number of fringe trees of a given type, under suitable conditions. This will not be attempted in the present paper, but we give some comments and references in Section 14; in particular we note that such results have been proved, by other methods, for some special cases (the binary search tree and random recursive tree), but that they do not hold in other cases (m-ary search trees with m ≥ 27).
The appendices contain some results that are used in the main part of the paper.

Remark 1.1. In the present paper we consider random trees that are generated by stopping a supercritical branching process at a suitable (random) time, for example when its size (the number of individuals) is a given number.
Note that the results are quite different from the results for fringe trees of conditioned Galton-Watson trees, where we also start with a branching process but instead of stopping it, we let it run until extinction and condition on its total size being a given finite number, see [1; 7; 76; 77].

Some notation
The trees considered here are rooted and finite, unless otherwise indicated. (The infinite sin-trees, that arise as limits in Section 4, are important exceptions.) Furthermore, the trees are ordered, again unless otherwise indicated; unordered trees may be considered by giving them an arbitrary (e.g. random) ordering of the children of each node.
Moreover, there may be further information on the children of each node. In a binary tree, each child is labelled as left or right (with at most one child of each type at any node); the tree is ordered, with a left child before a right child, but also a single child is labelled left or right. More generally, in an m-ary tree, see Section 3, a node has m slots for children and the children are labelled with distinct numbers in {1, . . . , m}; these numbers determine the order of the children, but not conversely, since a node may have fewer than m children and thus only use a subset of these labels. (In an extended m-ary tree, each node has either m children or 0, so these labels are determined by the order and are therefore redundant.)

We write T_1 ≈ T_2 when T_1 and T_2 are isomorphic rooted trees. We often identify trees that are isomorphic. We may regard all finite rooted trees as subtrees of the infinite Ulam-Harris tree with node set V_∞ := ⋃_{n=0}^∞ N^n consisting of all finite strings of natural numbers, where ∅ is the root and the mother of i_1 · · · i_k is i_1 · · · i_{k−1}, see e.g. [61, § VI.2] and [106].
Let |T| be the number of nodes in a tree T. We regard the edges in a tree as directed from the root. Thus the outdegree d^+(v) = d^+_T(v) of a node v in a tree T is its number of children. The depth h(v) of a node v is its distance from the root. Given a tree T and a node v ∈ T, let T_v denote the subtree rooted at v, i.e., the subtree consisting of v and its descendants.
If T and S are trees, let n_S(T) be the number of nodes v in T such that T_v ≈ S. Similarly, given a property P of nodes in a tree, let n_P(T) be the number of nodes v in T that have the property P. (Thus n_S(T) = n_{P_S}(T) if P_S is the property of v that T_v ≈ S.)

For a random rooted tree T and a fixed tree S, let p_S(T) = P(T ≈ S). Furthermore, if P is a property of nodes, let p_P(T) be the probability that the root of T has the property P. (Note that p_S(T) = p_{P_S}(T) with P_S as in the preceding paragraph, so the notation is consistent.)

Note that when we talk about a property P of nodes, it is implicit that the property depends also on the tree containing the node, so it is really a property of pairs (v, T) with v ∈ T. We will frequently consider properties of a node v that depend only on v and its descendants, i.e., on the subtree T_v. In this case (but not in general), we may also regard the property P as a property of rooted trees: we say that a tree T has P if the root of T has P. In this case we also use P for the set of rooted trees that have the property P; thus a node v in a tree T has P ⇐⇒ T_v ∈ P.
If P is a property of nodes, we sometimes write v ∈ P for the event that v has P.
We let x^(k) and (x)_k denote the rising and falling factorials: x^(k) := x(x + 1) · · · (x + k − 1) and (x)_k := x(x − 1) · · · (x − k + 1). We say that a function f(x) is decreasing if x < y implies f(x) ≥ f(y); note that we allow equality. (This is sometimes called weakly decreasing.) If x < y implies f(x) > f(y), we may say strictly decreasing. Increasing and strictly increasing are defined similarly.
We consider asymptotics of various random trees when some parameter n (for example the number of nodes, or number of keys in an m-ary search tree) tends to infinity. Similarly, for the continuous-time branching processes, we consider limits as the time t tends to infinity. As usual, w.h.p. (with high probability) means with probability tending to 1.

m-ary search trees
An m-ary search tree, where m ≥ 2 is a fixed number, is an m-ary tree constructed recursively from a sequence of distinct keys (real numbers) as follows, see e.g. [93] or [44]. (In the case m = 2, we say binary search tree.) The m-ary search trees were first introduced in [103].
Each node may store up to m − 1 keys. We start with a tree containing just an empty root. The first m − 1 keys are stored in the root. When the (m − 1):th key is placed in the root, so the root becomes full, we add m new nodes, initially empty, as children of the root. Furthermore, the m − 1 keys in the root divide the set of real numbers into m intervals J 1 , . . . , J m . Each further key is passed to one of the children of the root depending on which interval it belongs to; a key in J i is passed to the i:th child.
This construction yields the extended m-ary search tree. Nodes containing at least one key are called internal and empty nodes are called external. Usually one eliminates all external nodes and considers the tree consisting of the internal nodes only; this is the m-ary search tree.
For both versions, we often wish to keep track of the number of keys in each node, so we regard the trees as labelled trees where each node has a label in {0, . . . , m − 1} indicating the number of keys. (Thus external nodes have label 0 while internal nodes have labels in {1, . . . , m − 1}.)

We assume that the keys are i.i.d. random variables with a continuous distribution, for example U[0, 1]. With a given number n of keys, this gives a random m-ary search tree T_n. (As is customary, we usually omit the word "random" for convenience. Also, we regard m as fixed, and omit it from the notation.) Note that only the order of the keys matters; hence we obtain the same random m-ary search tree T_n also if we instead let the keys be a uniformly random permutation of {1, . . . , n}.
Note that in T_n we have fixed the number of keys, not the number of nodes. A node may contain 1, . . . , m − 1 keys, and the total number of nodes will be random when m ≥ 3. (The binary case m = 2 is an exception; each internal node contains exactly one key, so the number of (internal) nodes equals the number n of keys, and the number of external nodes is n + 1.)

In an extended m-ary search tree, say that a node with i ≤ m − 2 keys has i + 1 gaps, while a full node has no gaps. It is easily seen that an extended m-ary search tree with n keys has n + 1 gaps; the gaps correspond to the intervals of real numbers between the keys (and ±∞), and a new key has the same probability 1/(n + 1) of belonging to any of the gaps. Thus the evolution of the extended m-ary search tree may be described by choosing a gap uniformly at random at each step. Equivalently, the probability that the next key is added to a node is proportional to the number of gaps at that node. For the m-ary search tree (with only internal nodes) the same holds with minor modifications; a full node now has one gap for each external node in the extended version, i.e., m − d gaps if there are d children, and a key added to one of its gaps now starts a new node.
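The recursive construction above can be made concrete with a short sketch (class and function names are our own, illustrative choices): each node stores up to m − 1 sorted keys, and when it becomes full it creates m empty children, one per interval J_i.

```python
class Node:
    """Node of an extended m-ary search tree: up to m-1 keys, m child slots."""
    def __init__(self, m):
        self.m = m
        self.keys = []        # sorted keys stored in this node (at most m-1)
        self.children = None  # None until the node becomes full

    def insert(self, key):
        if self.children is None:
            self.keys.append(key)
            self.keys.sort()
            if len(self.keys) == self.m - 1:  # node is now full:
                self.children = [Node(self.m) for _ in range(self.m)]
        else:
            # pass the key to child i, where key lies in the i-th interval J_i
            i = sum(1 for k in self.keys if k < key)
            self.children[i].insert(key)

def m_ary_search_tree(keys, m):
    root = Node(m)
    for key in keys:
        root.insert(key)
    return root

def internal_nodes(node):
    """Count internal (non-empty) nodes, i.e. the size of the m-ary search tree."""
    if not node.keys:
        return 0
    n = 1
    if node.children is not None:
        n += sum(internal_nodes(c) for c in node.children)
    return n

def total_keys(node):
    """Total number of keys stored in the (extended) tree."""
    n = len(node.keys)
    if node.children is not None:
        n += sum(total_keys(c) for c in node.children)
    return n
```

For m = 2 each internal node holds exactly one key, so `internal_nodes` equals the number of keys inserted, matching the remark above; for m ≥ 3 only `total_keys` is fixed while the node count is random.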

Fringe trees and extended fringe trees
Given a (finite, rooted) tree T , the random fringe tree of T is the random tree obtained by taking the subtree T v with v chosen uniformly at random from the nodes of T ; we denote the random fringe tree of T by T * .
Consider a sequence T_n of (possibly random) trees such that the random fringe tree T*_n converges in distribution to some random tree F:
T*_n →d F, (4.1)
which simply means (since the set of finite trees is countable)
P(T*_n ≈ S) → P(F ≈ S) (4.2)
for every finite rooted tree S. We then say, following Aldous [1], that F (or rather its distribution) is the asymptotic fringe distribution of T_n. If the trees T_n are deterministic, then (4.2) can be written
n_S(T_n)/|T_n| → p_S(F) (4.3)
for every tree S; this is equivalent to the seemingly more general
n_P(T_n)/|T_n| → p_P(F), (4.4)
for every property P of a node v that depends only on the subtree T_v, i.e., on v and its descendants.
In the more general case when T_n are random (which is the case we are interested in), (4.2) instead can be written
E[n_S(T_n)/|T_n|] → p_S(F) (4.5)
or, more generally but equivalently,
E[n_P(T_n)/|T_n|] → p_P(F) (4.6)
for properties P as above. In interesting cases, we may typically strengthen (4.5)-(4.6) to convergence in probability:
n_P(T_n)/|T_n| →p p_P(F); (4.7)
Aldous [1, Proposition 7] gives a general criterion for this (the distribution of F is extremal in the set of fringe distributions), but we will instead prove (4.7) directly in the cases considered here; moreover, we will in our cases prove convergence almost surely:
n_P(T_n)/|T_n| →a.s. p_P(F). (4.8)

Remark 4.1. Note that
n_S(T_n)/|T_n| = P(T*_n ≈ S | T_n) (4.9)
and, more generally, for a property P as above,
n_P(T_n)/|T_n| = P(T*_n ∈ P | T_n). (4.10)
It follows from (4.9) that (4.7) and (4.8) (for all properties P considered there) are equivalent to conditional versions of (4.1):
L(T*_n | T_n) →p L(F) (4.11)
and
L(T*_n | T_n) →a.s. L(F), (4.12)
respectively, with convergence in probability or a.s. of the conditional distribution, in the space of probability distributions on trees. (Note that any such property P corresponds to a set of finite rooted trees T, and conversely.) Results such as (4.11) and (4.12), where we fix a realization T_n of a random tree and then study the distribution of its fringe tree (or something else), as a random variable depending on T_n, are usually called quenched, while results such as (4.1), where we consider the random fringe tree of a random tree as a combined random event, are called annealed. See further e.g. [41] and [77].
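The identity n_S(T)/|T| = P(T* ≈ S | T) in Remark 4.1 can be verified exactly on a small tree by enumeration. The sketch below (nested-tuple tree encoding, our own illustrative convention) computes the conditional law of the fringe tree T* of a fixed tree T:

```python
from collections import Counter
from fractions import Fraction

# Toy encoding: a rooted ordered tree is a nested tuple; () is a leaf.
def subtrees(t):
    """Yield T_v for every node v of t."""
    yield t
    for child in t:
        yield from subtrees(child)

def fringe_distribution(t):
    """Exact law of the random fringe tree T*: pick a node v uniformly
    at random and return T_v; the result maps shapes to probabilities."""
    counts = Counter(subtrees(t))
    n = sum(counts.values())  # = |T|, the number of nodes
    return {shape: Fraction(k, n) for shape, k in counts.items()}
```

For the complete binary tree of height 2 (7 nodes), the fringe tree is a leaf with probability 4/7, a 3-node cherry with probability 2/7, and the whole tree with probability 1/7.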

4.1. Extended fringe trees.

The fringe tree T* considers only the descendants of a random node. Aldous [1] introduced also the extended fringe trees that include the nearest ancestors and other close relatives. If k ≥ 0 and v ∈ T with h(v) ≥ k, let v^(k) be the ancestor of v that is k generations earlier (i.e., with h(v^(k)) = h(v) − k), and let T_{v,−k} be the subtree rooted at v^(k), with the node v marked. (Or, equivalently, with the path from the root v^(k) to v marked.) Thus T_{v,−k} is a rooted tree with a distinguished node of depth k. (Note that T_{v,−0} = T_v.) We define the random extended fringe tree T^{*,−k} as T_{v,−k} for a uniformly random node v ∈ T; this is really not defined when h(v) < k, but we may define T_{v,−k} in this case too by some supplementary definition, for example as a path of length k − h(v) with a copy of T attached, with v marked. We are only interested in asymptotics of the random extended fringe trees for sequences of trees T such that
h(v) →p ∞ (4.13)
for a random node v, i.e., P(h(v) < k) → 0 for every fixed k, and thus each T^{*,−k} is well-defined w.h.p., and then the supplementary definition does not matter.
Aldous [1] showed that if T_n is a sequence of (possibly random) trees such that (4.13) holds and an asymptotic fringe distribution exists, i.e., (4.1) holds, then, more generally, each T^{*,−k}_n converges in distribution to some random tree F^{−k} with a distinguished node o of depth k. Note that the trees T^{v,−k}_n are consistent in an obvious way, with T^{v,−(k−1)}_n a subtree of T^{v,−k}_n, and thus the same holds for the limits F^{−k} (after a suitable coupling). Hence it is possible to regard the trees F^{−k} as subtrees of a (random) infinite tree F̂ with a distinguished node o and an infinite line o, o^(1), o^(2), . . . of ancestors of o, such that F^{−k} = F̂_{o,−k} = F̂_{o^(k)}. Furthermore, every node in F̂ has a finite number of descendants; thus there are no other infinite paths from o. (Aldous [1] calls such a tree a sin-tree, for single infinite path.) We may then say that the extended fringe trees converge to the random sin-tree F̂, in the sense that T^{*,−k}_n →d F̂_{o^(k)} for each k, or, equivalently, using the product topology on the set of sequences of (finite) trees,
(T^{*,−k}_n)_{k≥0} →d (F̂_{o^(k)})_{k≥0}. (4.14)
For a random sin-tree F̂ and a property P of nodes, let p_P(F̂) be the probability that the distinguished node o has the property P. Then, cf. (4.6) (which is the case k = 0), (4.14) implies, and is equivalent to,
E[n_P(T_n)/|T_n|] → p_P(F̂), (4.15)
for every property P that depends only on T_{v,−k} for some k, i.e., on v and its descendants and the descendants of its ancestors at most a fixed number of generations back. Again, we may typically strengthen (4.15) to convergence in probability, and in our cases we shall prove convergence a.s.:
n_P(T_n)/|T_n| →a.s. p_P(F̂). (4.16)
By standard truncation arguments, it may be possible to extend (4.15) or (4.16) also to some more general properties P, depending on an unlimited number of ancestors, see Sections 5.1 and 11 for some examples.
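The subtree T_{v,−k} is easy to compute in the Ulam-Harris encoding of Section 2, where the ancestor v^(k) is obtained simply by truncating the label of v. A minimal sketch (our own, assuming trees are given as sets of Ulam-Harris labels):

```python
# Nodes are Ulam-Harris labels: tuples of child indices, with () the root.
# A tree is a finite set of such tuples closed under taking prefixes.

def extended_fringe(tree, v, k):
    """T_{v,-k}: the subtree rooted at the ancestor v^(k) of v that is
    k generations earlier, together with the marked node v.
    Requires depth h(v) >= k (otherwise a supplementary definition is
    needed, as in the text)."""
    assert len(v) >= k, "h(v) < k: supplementary definition needed"
    root = v[:len(v) - k]                       # the ancestor v^(k)
    subtree = {u for u in tree if u[:len(root)] == root}
    return subtree, v                           # (tree rooted at v^(k), mark)
```

For k = 0 this returns the ordinary fringe subtree T_v, in accordance with T_{v,−0} = T_v.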

Family trees of general branching processes
A Crump-Mode-Jagers process is a general branching process defined as follows, see e.g. [71] for further details and for basic facts used below.
The branching process starts with a single individual born at time 0. This individual has a random number N of children, born at random times (ξ_i)_{i=1}^N; here 0 ≤ N ≤ ∞, and we assume 0 ≤ ξ_1 ≤ ξ_2 ≤ · · ·. It is convenient to describe the birth times {ξ_i}_{i=1}^N as a point process Ξ on [0, ∞). Every child that is born evolves in the same way, i.e., every individual x has its own copy Ξ_x of Ξ (where now ξ_i means the age of the mother when child i is born); these copies are assumed to be independent and identically distributed. Denote the time an individual x is born by σ_x.
Usually one also assumes that each individual has a random lifetime λ ∈ [0, ∞]; for our purposes this plays no role, so we ignore it. (Formally, we may assume that λ = ∞.) There may also be other random variables associated to the individuals. Formally, we give each possible individual x its own copy (Ω_x, F_x, µ_x) of some probability space (Ω, F, µ) on which there are defined some given functions defining N, ξ_i (and thus Ξ), and possibly other random variables describing the life history such as the marks ν_i or label ℓ(t) in Remarks 5.1 and 5.2 below; the branching process then is defined on the product ∏_x (Ω_x, F_x, µ_x) of these probability spaces. (The individuals may be labelled in a natural way by strings in V_∞ := ⋃_{n=0}^∞ N^n; hence the set of individuals that are realized in the branching process is a random subset of V_∞, and we may extend the product over x ∈ V_∞.)

Let Z_t be the number of individuals at time t ≥ 0; since we assume no deaths, this equals the number of individuals born in [0, t]. (We follow standard custom and let all processes be right-continuous; thus an individual born at t exists at t and is included.) We say that the process is finite (or dies out) if Z_∞ < ∞, i.e., only a finite number of individuals are ever born.
Let T_∞ be the family tree of the branching process. This is a (generally infinite) tree obtained from the branching process by ignoring the time structure; in other words, it has the individuals as nodes, with the initial individual as the root, and the children of a node in the tree are the same as the children in the branching process. Let T_t be the subtree consisting of all individuals born up to time t. Note that the number of nodes |T_t| = Z_t. (We are mainly interested in cases where Z_t < ∞ for every finite t, but Z_∞ = ∞.)

Remark 5.1. This defines the family tree T_t as an unordered tree. Sometimes we want an ordered tree, so we have to add an ordering of the children of each individual. This can be done by taking the children in order of birth (which is the standard custom), but in our examples we rather want a random order. In general, we can obtain ordered family trees by assuming that each individual has a marked point process Ξ* (augmenting the plain Ξ above), where each point ξ_i has a mark ν_i ∈ {1, . . . , i} telling at which position the new child is inserted among the existing ones. (This includes both the birth order case, with ν_i = i, and the random order case, with ν_i uniform and independent of everything else.) For the m-ary search trees in Section 7, we want further information; this is obtained by instead giving each of the m children a distinct mark ν_i ∈ {1, . . . , m} telling the position of the child among all (existing and future) children. (Equivalently, we may equip each individual with a random permutation of {1, . . . , m} giving the order of birth of the children.)

Remark 5.2. We may also have labels on the nodes of T_t; this is important for our application to m-ary search trees, since they have nodes labelled with the number of keys, see Section 3. In general, we may assume that each individual has a label given by some random function ℓ(t) of its age.
We assume that the set of possible labels is countable (with the discrete topology); we may assume that the labels are integers. We also assume that the function ℓ(t) ∈ D[0, ∞); thus ℓ(t) is constant on some intervals [t_i, t_{i+1}). (As everything else in the branching process, the label may depend on Ξ and other properties of the same individual, but not on other individuals, and they have the same distribution for all individuals; this is also a consequence of the formalism with probability spaces (Ω_x, F_x, µ_x) above.)

A characteristic of an individual, see e.g. [71; 72; 104; 105], is a random function φ(t) of the age t ≥ 0; we assume that φ(t) ≥ 0 and that φ belongs to the space D[0, ∞) of right-continuous functions with left limits. (Note that we consider only t ≥ 0. We may extend φ to (−∞, ∞) by setting φ(t) = 0 for t < 0.) We assume that each individual has its own copy φ_x, and we at first for simplicity assume that the pairs (Ξ_x, φ_x) for all individuals are independent and identically distributed; this assumption can (and will) be relaxed, see Remark 5.10 below.
Given a characteristic φ, let
Z^φ_t := Σ_{x: σ_x ≤ t} φ_x(t − σ_x) (5.1)
be the total characteristic at time t of all individuals that have been born so far. (Recall that x is born at time σ_x, and thus has age t − σ_x at time t.)

The random tree T_t has a random size. We are usually interested in random trees with a given number of nodes, or trees where something else is given, for example the number of keys in an m-ary search tree. We can obtain such random trees by stopping the branching process as follows. Fix a characteristic ψ(t), which we shall call weight, and let τ(n) := inf{t : Z^ψ_t ≥ n}, i.e., the first time the total weight is at least n. (As usual, we define inf ∅ = ∞.) We exclude the trivial case when ψ(t) = 0 for all t ≥ 0 a.s. (which would give τ(n) = ∞ a.s.). Define T_n := T_{τ(n)}, the family tree at the time the total weight reaches n (provided this ever happens).
Random trees T_n defined in this way, for some Crump-Mode-Jagers branching process and some weight ψ(t), are the focus of the present paper. We shall always denote the weight by ψ and the random tree, stopped as above, by T_n, omitting ψ from the notation for simplicity. (In all our examples, ψ is integer-valued, so it is natural to let n be an integer. This is not necessary, however, and all our results are valid for arbitrary real n → ∞.)

Example 5.3. If ψ(t) = 1, t ≥ 0, then Z^ψ_t = Z_t, and T_n is the family tree of the branching process stopped when there are n nodes or more; if the birth times have continuous distributions and there are no twins, then a.s. no two nodes are born simultaneously, and thus we stop when there are exactly n nodes, so |T_n| = n. (This weight is used in all examples in Section 6, but not for the m-ary search trees in Section 7.)

We define the Laplace transform of a function f on [0, ∞) by
f̂(θ) := ∫_0^∞ θ e^{−θt} f(t) dt (5.2)
and the Laplace transform of a measure m on [0, ∞) by
m̂(θ) := ∫_0^∞ e^{−θt} m(dt). (5.3)
(Note that there is a factor θ in (5.2) but not in (5.3). A justification of this difference is that a measure m has the same Laplace transform m̂ as the function m(t) := m([0, t]), as is easily verified by an integration by parts, or by Fubini's theorem for the integral ∬_{s≤t} θ e^{−θt} m(ds) dt.)

Some standing assumptions in this paper are:
(A1) µ{0} = E Ξ{0} < 1. (This rules out a rather trivial case with explosions already at the start. In all our examples, µ{0} = 0.)
(A2) µ is not concentrated on any lattice hZ, h > 0. (The results extend to the lattice case with suitable modifications, but we ignore it.)
(A3) E N > 1. (This is known as the supercritical case.) For simplicity, we further assume that N ≥ 1 a.s., but see Remark 5.5. (In this case, every individual has at least one child, so the process never dies out and Z_∞ = ∞.)
(A4) There exists a real number α (the Malthusian parameter) such that µ̂(α) = 1, i.e.,
∫_0^∞ e^{−αt} µ(dt) = 1. (5.4)
(A6) The random variable sup_t e^{−θt} φ(t) has finite expectation for some θ < α.
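To make the stopping construction concrete, here is a minimal simulation sketch (our own, with assumed names) of the special case in Example 5.3: weight ψ ≡ 1, and each individual gives birth at the points of a rate-1 Poisson process, so T_n is the family tree at τ(n), the first time Z_t = n. The event-driven scheme with a heap of pending birth events is our own implementation choice:

```python
import heapq
import random

def cmj_tree(n, seed=1):
    """Family tree of a CMJ process in which every individual gives birth
    at the points of a rate-1 Poisson process (Exp(1) inter-birth times),
    stopped with weight psi = 1 at tau(n), the first time Z_t >= n.
    Returns a parent list: parent[i] is the mother of individual i."""
    rng = random.Random(seed)
    parent = [None]                    # individual 0 is the ancestor, born at 0
    # heap of (next birth time, mother index)
    events = [(rng.expovariate(1.0), 0)]
    while len(parent) < n:
        t, mother = heapq.heappop(events)
        child = len(parent)            # the newborn gets the next index
        parent.append(mother)
        # schedule the mother's next birth and the newborn's first birth
        heapq.heappush(events, (t + rng.expovariate(1.0), mother))
        heapq.heappush(events, (t + rng.expovariate(1.0), child))
    return parent
```

Since birth times are continuous, a.s. no two individuals are born simultaneously, so the stopped tree has exactly n nodes; individuals are indexed in order of birth, so every parent index precedes its child.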
Nerman [104, Theorem 6.3] (see also Jagers [71, Section 6.10] for related results) shows that under the conditions (A1)-(A6), as t → ∞,
e^{−αt} Z^φ_t →a.s. m_φ W, where m_φ := ∫_0^∞ e^{−αu} E φ(u) du / ∫_0^∞ u e^{−αu} µ(du). (5.5)
The right-hand side of (5.5) is finite by (A6). Thus, if we exclude the trivial case when φ(t) = 0 for all t ≥ 0 a.s., 0 < m_φ < ∞.
Note that (A1)-(A5) are conditions on the branching process, while (A6) is a condition on the characteristic φ (and α), and thus is relevant only when we consider some φ. When discussing trees T_n defined by stopping using a weight ψ as above, we sometimes want (A6) to hold for ψ; we denote this version of the condition by (A6ψ). (However, for most of our results, (A6ψ) is not required. In any case, in Example 5.3 and in all our examples in Sections 6 and 7, ψ(t) is bounded, so (A6ψ) holds trivially.)

Remark 5.4. As a consequence of (A4), µ(t) < ∞ for every t < ∞. (However, µ(∞) = E N may be infinite.) It is a standard result that this implies that Z_t and E Z_t are finite for every t < ∞.
Remark 5.5. We do not really need the assumption N ≥ 1 in (A3); it suffices that E N > 1. In this case, the extinction probability q := P(Z_∞ < ∞) < 1, so there is a positive probability that the process is infinite, and (5.5) and the results below hold conditioned on the event Z_∞ = ∞. (This is the standard setting in [104; 72; 105].)

Remark 5.6. By (5.4), e^{−αt} µ(dt) is a probability measure on [0, ∞). See Remark 5.22 for an interpretation of this distribution.
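The interpretation in Remark 5.6 presupposes that the Malthusian parameter α solving µ̂(α) = 1 has been found. Since µ̂ is decreasing, α can be computed numerically by bisection; the sketch below is our own, and the two intensities used to exercise it are illustrative examples, not taken from the text (for µ(dt) = 2e^{−t} dt we have µ̂(θ) = 2/(1 + θ), and for a rate-1 Poisson birth process µ(dt) = dt we have µ̂(θ) = 1/θ; both give α = 1):

```python
def malthusian(mu_hat, lo=1e-9, hi=100.0, tol=1e-12):
    """Solve mu_hat(alpha) = 1 by bisection.
    mu_hat must be a decreasing function of theta with a root of
    mu_hat(theta) = 1 inside (lo, hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu_hat(mid) > 1.0:
            lo = mid        # mu_hat too large: the root alpha lies above mid
        else:
            hi = mid        # mu_hat too small: the root alpha lies below mid
    return 0.5 * (lo + hi)
```

Bisection is a safe choice here because (A4) only guarantees existence of the root, not smoothness of µ̂ beyond monotonicity.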
Remark 5.9. The results can be extended to multi-type branching processes, see Jagers and Nerman [73].
Remark 5.10. We have for simplicity assumed above that the characteristic φ_x(t) associated to an individual x is independent of the life histories of all other individuals. As shown in [104, Section 7], the results extend to characteristics φ_x(t) that may depend also on the descendants of x; we may let φ_0(t) be any non-negative random function that depends on the entire branching process (and belongs to D[0, ∞) and satisfies (A6)), and then define φ_x(t) as φ_0(t) evaluated for the branching process consisting of x and its descendants (shifting the origin of time to the birth of x). This will be important below.
Remark 5.13. Note that (5.16) does not hold in the lattice case (in this paper excluded by (A2)), since then the population and Z ψ t grow in discrete steps with asymptotically a fixed factor > 1 each time.
We next study the fringe tree T * n . Note that the following theorem (and its proof) applies both if we consider T t as an unordered tree and if we consider it as an ordered (or m-ary) tree as in Remark 5.1; in the latter case T n and the fringe tree T * n are random ordered (or m-ary) trees, and T below should be an ordered (or m-ary) tree. We may also have labels on the nodes, defined by some random function ℓ(t) as in Remark 5.2; then T should be a tree with (arbitrary) labels on the nodes.
Recall from Section 2 that a property of a node v that depends only on v and its descendants may also be regarded as a property of rooted trees (and conversely).
(ii) (Quenched version.) For every finite tree T, as n → ∞,
n_T(T_n)/|T_n| →a.s. p_T(F). (5.22)
More generally, for every property P of a node v that depends only on v and its descendants,
n_P(T_n)/|T_n| →a.s. p_P(F). (5.23)
Furthermore, for a property of this type,
p_P(F) = m_φ/m_1, (5.24)
where m_1 denotes m_φ for the constant characteristic φ(t) = 1. More precisely, the characteristic φ in (5.24) is defined as in Remark 5.10 with φ_0(t) := 1{T_t ∈ P}.
Proof. This is a special case of the main results in Jagers and Nerman [72] and [105], and is one of the main examples in Aldous [1], but we give the simple proof for completeness and in our setting.
Moreover, the characteristic φ_x(t − σ_x) of x at time t is the indicator 1{T^x_t ∈ P} that the subtree T^x_t of T_t rooted at x satisfies P. Thus, the total characteristic Z^φ_t is the number of nodes v ∈ T_t such that T^v_t ∈ P, which by definition holds if and only if v has the property P; hence, Z^φ_t = n_P(T_t). Consequently, (5.5) yields (5.25). By Theorem 5.12, we also have (a.s.) τ(n) < ∞ for every n and τ(n) → ∞ as n → ∞; thus (5.25) implies (5.26) as n → ∞. The result (5.22) follows from (5.26) and (5.24). As said above, (5.22) is a special case, and the annealed version (i) follows by taking the expectation in (5.22), yielding (by dominated convergence) P(T*_n ≈ T) → P(T ≈ T) for every fixed tree T. (Recall that there is only a countable set of finite trees T, so this shows convergence in distribution. Alternatively, one can take the expectation of (5.23).)

Remark 5.15. As said above, (5.22) is a special case of (5.23). Conversely, again because there is only a countable set of finite trees T, (5.22) is equivalent to the a.s. convergence of the distributions in (4.12), and thus to (5.23), cf. Remark 4.1. (In general, for distributions on a countable sample space, convergence of the individual point probabilities is equivalent to convergence in total variation [60, Theorem 5.6.4].) Hence, (5.22) and (5.23) are equivalent. (We state both versions for convenience in later applications.)

Remark 5.16. We have stated the result (5.22) for the stopped trees T_n, but proved it by proving the corresponding result for the full branching process, see (5.25) and (5.26). In fact, the two types of results are equivalent; by choosing the weight ψ = 1 as in Example 5.3, the trees T_t run through the same (countable) set of trees as t → ∞ as T_n does as n → ∞; hence (5.25) and (5.26) are equivalent.
The same holds for (5.23) and for (5.43) and (5.51) in Theorems 5.25 and 5.26 below, where again we state the results for T n , in view of our applications in later sections, but the results also hold for T t .
Remark 5.17. Note that the asymptotics in Theorem 5.14 do not depend on the choice of weight ψ; any weight gives the same asymptotic fringe tree distribution. Of course, this is an immediate consequence of the proof using (5.26) and (5.25), see also Remark 5.16. Note that for this proof, it is essential that we consider convergence almost surely (and not, e.g., in probability).
Remark 5.18. In cases when |T_n| is random, it is often of interest to study the number n_P(T_n) rather than the fraction n_P(T_n)/|T_n| in (5.23). Assuming (A6ψ), we can combine (5.23) and (5.16) and obtain (5.27)–(5.28). This is particularly nice in the common case when the weight ψ(t) = 1, so |T_n| = n deterministically; then (5.28) can be written as (5.29). For other weights ψ, we can (assuming (A6ψ)) use (5.27). If we furthermore have a deterministic bound |T_n| ≤ Cn for some constant C (which, for example, is the case for the m-ary search trees in Sections 7.1 and 7.2), then dominated convergence applies again and yields (5.30).

We give a simple but important corollary to Theorem 5.14, showing that the degree distribution in T_n converges to the distribution of D := Ξ([0, τ]) (with Ξ and τ independent).

Corollary 5.20. Let n_k(T_n) be the number of nodes in T_n with outdegree k. Under the assumptions (A1)–(A5) above, the convergence (5.31) holds.

Proof. Let P be the property of a node that it has outdegree k. Then n_k(T) = n_P(T). Hence, (5.23) shows that n_k(T_n)/|T_n| a.s. converges to the probability that the root of T has (out)degree k. However, the root of T_t has degree Ξ([0, t]), so the degree D of the root of T = T_τ equals Ξ([0, τ]), and (5.31) follows.
See further Remark 5.23 below.
In order to extend Theorem 5.14 to the extended fringe, we first define the limiting random sin-tree T; this is the family tree of the doubly infinite pedigree process in [105] (doubly infinite stable population process in [72]). In this construction, we start with an individual o ("ego") born at time 0, and grow a branching process starting with it as usual. We also give o an infinite line of ancestors o^(1), o^(2), . . . having a modified distribution of their life histories defined below, and let each child x of each ancestor o^(k), except x = o^(k−1), start a new branching process in which all individuals have the original distribution. We denote the (infinite) family tree of this branching process by T_t, −∞ < t < ∞. Finally, we stop the entire process at a random time τ ∼ Exp(α) as before, and let T := T_τ be the resulting sin-tree, with distinguished node o. (Note that the subtree of T rooted at o equals T defined in Theorem 5.14.)

It remains to define the distribution of the life history of an ancestor. This is really a distribution of a life history with a distinguished child, which we call the heir. The heir may be any child, but the probability distribution is weighted by e^{−ατ}, where τ is the time the heir is born. Thus, recalling that the children are born at times (ξ_i)_{i=1}^N, for any event E in the life history,

P(E and the heir is the i:th child) = ∫_E e^{−αξ_i} dP, (5.34)

where for i > N we define ξ_i = ∞, so e^{−αξ_i} = 0. In particular, the probability that the heir is the i:th child is

q_i := E e^{−αξ_i}. (5.35)

Note that (5.34) defines a probability distribution, since by (5.4) the total probability equals

Σ_{i=1}^∞ E e^{−αξ_i} = E ∫_0^∞ e^{−αt} Ξ(dt) = ∫_0^∞ e^{−αt} μ(dt) = 1. (5.36)

We may give the children of the ancestor another order as in Remark 5.1, still using (5.34). Note that then (5.34)–(5.35) hold also if we consider the i:th child in the final order and redefine ξ_i as the birth time of that child; this is seen by summing over all children and combinations of marks ν_j that put a certain child in place i at a given time.
The ancestors o^(k) are given independent copies of this modified life history distribution, and are put together so that the heir of o^(k) is o^(k−1) (with o^(0) = o); this also defines recursively the birth times of all o^(k).
Remark 5.22. Let ξ* denote the age of an ancestor when its heir is born. Then, by (5.34), ξ* has the distribution e^{−αt} μ(dt), i.e., the distribution in Remark 5.6. Its Laplace transform is given by (5.37), cf. (5.34) and (5.6)–(5.7). Assumption (A5) thus says that E e^{εξ*} < ∞ for some ε > 0. In particular, ξ* has a finite expectation β := E ξ*, see (5.38). By (5.37), we also have the formula (5.39), and, directly from (5.34) or by (5.38), the formula (5.40).

Remark 5.23. Let, as in (5.35), q_i be the probability that the heir in the ancestor distribution is child i (in birth order), and let D = Ξ([0, τ]) be the degree of the root in T, which by Corollary 5.20 is the limit in distribution of the outdegree of a random node in T_n. By (5.35) and (5.33), these satisfy (5.41), so the two distributions are closely related. Note also that (5.41) implies that E D = 1, see (5.42); thus the average asymptotic outdegree is always 1. This should not be surprising; it is just an asymptotic version of the fact that in a tree with n nodes, there are altogether n − 1 children, and thus the average outdegree is 1 − 1/n; see also [1, Lemma 1].
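The deterministic fact behind this remark (a tree with n nodes has altogether n − 1 children, so the average outdegree is 1 − 1/n) is easy to check directly; a minimal Python sketch (all names are ours), using random parent assignment merely as a convenient source of test trees:

```python
import random

def random_tree_parents(n, rng):
    """A tree on nodes 0..n-1 (0 is the root): each node v >= 1 gets a
    uniformly random earlier node as its parent."""
    return [None] + [rng.randrange(v) for v in range(1, n)]

rng = random.Random(1)
n = 1000
parent = random_tree_parents(n, rng)
outdeg = [0] * n
for v in range(1, n):
    outdeg[parent[v]] += 1

assert sum(outdeg) == n - 1                        # every non-root node is someone's child
assert abs(sum(outdeg) / n - (1 - 1 / n)) < 1e-12  # average outdegree is 1 - 1/n
```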
Remark 5.24. Recall that we may regard the node set of T as a subset of V_∞, the node set of the infinite Ulam–Harris tree. Let v ∈ V_∞. By the recursive definition of the branching process T_t and the memoryless property of the exponential random variable τ, it follows that, conditioned on v ∈ T = T_τ, the subtree of T rooted at v has the same distribution as T. In particular, conditioned on v ∈ T, the outdegree of v has the same distribution as D.
It follows from this and (5.42), by induction, that for every k ≥ 0, the expected number of nodes in the k:th generation of T is 1. In particular, the expected size E |T| = ∞.
Note also that the outdegrees of two different nodes are not independent, since they both depend on the common stopping time τ; it is easy to see that for any v, w ∈ V_∞, conditioned on v, w ∈ T, the outdegrees deg(v) and deg(w) are (strictly) positively correlated.
In fact, the properties in this remark except the last one hold for any fringe distribution in the sense of Aldous [1], see [1, Section 2.1]. However, the positive correlation of node degrees is not general; in particular, it makes the asymptotic fringe trees T studied in this paper different from the ones obtained from conditioned Galton-Watson trees, since the latter are just unconditioned Galton-Watson trees, where all outdegrees are independent, see [1].
Theorem 5.25 (Jagers, Nerman, Aldous). Under the assumptions (A1)–(A5), as n → ∞, h(v) →p ∞ for a random node v ∈ T_n, and thus each T^{*,−k}_n is well-defined w.h.p.; moreover, the convergence (5.43) holds.

Proof. Again, this is a special case of the main results in Jagers and Nerman [72] and [105], and is at least implicit in Aldous [1], but we give the proof for completeness. First consider the case of ordered trees (possibly with labels) with the children taken in order of birth. Fix a finite tree T with a distinguished node of depth k ≥ 0, and let v_0 · · · v_k be the path in T from the root to the distinguished node; also, let v_i be the j_i:th child of v_{i−1}. Let P = P(T) be the property of a node v that it has depth at least k and that, if w is its k:th ancestor, the subtree T^w, with v as distinguished node, is isomorphic to T. Then n_P(T_n), the number of v ∈ T_n that have this property, equals the number of w ∈ T_n such that T^w_n ≈ T, i.e., n_T(T_n). Thus, by Theorem 5.14, (5.44) holds. Construct T = T_τ as above, and let V be the birth time of the distinguished node v_k. Conditioned on τ ≥ V, τ has the same distribution as V + τ′ with τ′ ∼ Exp(α) and independent of everything else. Thus, by conditioning on V, we obtain (5.45). By shifting the time parameter in T by V, so that the distinguished node v_k becomes born at time 0, and recalling that the subtree T^{o^(k)} has the modified distribution (5.34) for the ancestors of the distinguished node, we see that (5.45) equals (5.46). Consequently, (5.44)–(5.46) show that (5.47) holds for every finite tree with a distinguished node of depth k. More generally, for any fixed k ≥ 0 and any set A of finite trees, each having a distinguished node of depth k, let P = P(A) := ⋃_{T∈A} P(T) be the property of a node v that it has depth at least k and that T^w ∈ A, where w is its k:th ancestor.
Then, as in (5.44), by Theorem 5.14 applied to the property T^w ∈ A, and using again (5.45)–(5.46), we obtain (5.48). In particular, taking A to be the set of all finite trees, P(A) is the property that h(v) ≥ k and p_{P(A)}(T) = 1, so (5.48) shows that for any k, P(h(v) ≥ k) → 1 for a random node v in T_n. Since k is arbitrary, thus h(v) →p ∞. Moreover, every property P in (5.43) is of the form P(A) for some k and A, and thus the result (5.43) follows.
As in Theorem 5.14, the annealed case follows from the quenched case by taking expectations.
The case of unordered trees follows by ignoring the order. Finally, if T_t is an ordered tree with the order of children defined by marks ν_i as in Remark 5.1, we first fix an integer M and consider T_t and T ordered by birth order and with each node labelled with its sequence of marks truncated at M (in addition to existing labels, if any). (We use the cut-off M in order to keep the space of labels countable.) We have just shown that (5.48) holds for any set A of ordered trees with such a truncated label on each node. Since the birth order and the marks define the true order in the trees, it follows immediately that (5.47) holds also with the true order in T_n and T, for any tree T with such marks and with maximum degree at most M. Since M is arbitrary, it holds with the true order for any T, and we may then forget the marks. (For an m-ary tree, we keep the marks.) Then (5.48) and (5.43) follow as above.

5.1. An extension to some more general properties. In Theorem 5.25, we consider only properties of a node v that depend only on v, its ancestors at most a fixed number of generations back, and their descendants. (Theorem 5.14 is even more restrictive.) In this subsection, we show how this result can be extended to some properties that depend on all ancestors of v. A typical example is the property that v has no ancestor with outdegree 1; we consider this and some related examples in Section 11. (This section can be omitted at the first reading.)

Theorem 5.26. Let P_0 and Q be two properties of a node v in a tree, such that both P_0 and Q depend only on v and its descendants. Let P be the property of a node v that v satisfies P_0 but no ancestor of v satisfies Q. Suppose, in addition to (A1)–(A5), that (5.49) holds and that, with Λ := sup{t : T_t ∈ Q}, we have

E e^{δΛ} < ∞ (5.50)

for some δ > 0. Then, as n → ∞, if v is a uniformly random node in T_n, the convergence (5.51) holds. In other words, (5.43) holds also for properties P of this type, although they are not covered by Theorem 5.25.
In the examples in Section 11, the property Q is (or can be taken as) decreasing, in the sense that if it holds for some rooted tree T, then it holds also for every subtree with the same root; hence, if Q holds for T_u with u ≤ t, then it holds for T_t, so (5.52) can be simplified to (5.53). Before the proof, we give a lemma.
Lemma 5.28. Suppose that (A1)–(A5) and (5.49) hold. Let Q be a property of rooted trees such that (5.52) holds for some δ > 0. Then there exist η > 0 and a < ∞ such that (5.54) holds.

Proof. The left-hand side of (5.54) equals Z^φ_t/|T_t|, where φ(t) is the characteristic given as in Remark 5.10 by (5.55). The result thus follows from (5.5), provided we can choose η > 0 such that (A6) holds for this φ.
Proof of Theorem 5.26. For each integer M, let P_M be the truncated property "v satisfies P_0 but no ancestor at most M generations before v satisfies Q". Then P_M is covered by Theorem 5.25, so the convergence (5.43) holds for P_M for each M, as n → ∞. Since P is the intersection of the decreasing sequence of properties P_M, it is clear that p_{P_M}(T) → p_P(T) as M → ∞. Furthermore, n_P(T_t) ≤ n_{P_M}(T_t), and for any η > 0, writing w ≺ v when w is an ancestor of v, we obtain the estimate (5.61). Consequently, using also (5.59), we obtain a.s. an upper bound on the difference whose right-hand side tends to 0 as M → ∞, and the theorem follows.

Examples with uniform or preferential attachment
We begin with a few standard examples, where we repeat earlier results by other authors, together with some new results on the limiting sin-trees.
In all examples in this section, |T_n| = n, so we stop the branching process using the weight ψ(t) = 1 as in Example 5.3. Since this weight is bounded, (A6ψ) holds trivially.

Example 6.1 (random recursive tree). An important example, considered already by Aldous [1], is the random recursive tree. This tree, usually considered as an unordered rooted tree, is constructed recursively by adding nodes one by one, with each new node attached as a child of a (uniformly) randomly chosen existing node, see [44, Section 1.3.1]. It is easy to see, by the memoryless property of the exponential distribution, that the random recursive tree with n nodes is the tree T_n defined in Section 5 for the branching process where each individual gives birth with constant intensity 1, i.e. with independent Exp(1) waiting times between births, and weight function ψ(t) = 1 as in Example 5.3. In other words, the point process Ξ describing the births of the children of an individual is a Poisson process with intensity 1. This branching process (or just the sizes (|T_t|)_{t≥0}) is often called the Yule process, so the process (T_t)_t of trees is called the Yule tree process [1]. Note that the Yule process formed by the sizes |T_t| is a pure birth process with birth rates λ_n = n, see Example A.3.

We will need some notation. Let X_i := ξ_i − ξ_{i−1} (with ξ_0 := 0) be the waiting times between the births of the children of a given individual. Thus the X_i are i.i.d. Exp(1), and ξ_i = Σ_{j=1}^i X_j ∼ Γ(i, 1) has a Gamma distribution. The intensity measure μ is Lebesgue measure on [0, ∞), so the Malthusian equation (5.4) gives α = 1.

As shown by Aldous [1], the limiting fringe tree T = T_τ can also be described as a random recursive tree with a random number M of nodes, where

P(M = n) = 1/(n(n + 1)), n ≥ 1. (6.2)

In fact, by symmetry, if M = |T| and we condition T on M = n, we get a random recursive tree on n nodes.
Moreover, if we at some time have n ≥ 1 individuals in the branching process, then a new child is born with intensity n, while the process stops (at τ) with intensity 1, so the probability that the process continues with at least one more individual is n/(n + 1). In other words, P(M ≥ n + 1) = (n/(n + 1)) P(M ≥ n), and thus by induction P(M ≥ n) = 1/n, and (6.2) follows. (For an alternative argument, see Example 6.4 below.)

As noted by [1], various results for the random recursive tree T_n now follow from Theorem 5.14. For example, the asymptotic distribution of the size of a random fringe tree is given by (6.2). Furthermore, the asymptotic distribution of the outdegree of the nodes in T_n equals, by Corollary 5.20, the distribution of the root degree D in T, which is geometric Ge_0(1/2) as an immediate consequence of (5.33). (See (6.3) below and (5.41).) See Section 10.2 for yet another example.
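These limit statements are easy to check by simulation. The following minimal Python sketch (the function names are ours) grows random recursive trees and compares the empirical fractions of fringe subtrees of sizes 1 and 2 with P(M = 1) = 1/2 and P(M = 2) = 1/6 from (6.2):

```python
import random

def random_recursive_tree(n, rng):
    """Parent list of a random recursive tree on nodes 0..n-1 (0 = root)."""
    return [None] + [rng.randrange(v) for v in range(1, n)]

def fringe_sizes(parent):
    """Size of the fringe subtree rooted at each node."""
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):   # children always have larger labels
        size[parent[v]] += size[v]
    return size

rng = random.Random(12345)
n, reps = 2000, 50
count1 = count2 = total = 0
for _ in range(reps):
    sizes = fringe_sizes(random_recursive_tree(n, rng))
    count1 += sum(1 for s in sizes if s == 1)
    count2 += sum(1 for s in sizes if s == 2)
    total += n

assert abs(count1 / total - 1/2) < 0.02   # P(M = 1) = 1/(1*2) = 1/2
assert abs(count2 / total - 1/6) < 0.02   # P(M = 2) = 1/(2*3) = 1/6
```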
In order to construct the random sin-tree T, which enables applications of Theorem 5.25 on the extended fringe, we have to find the distribution of the life history of the ancestors, given by (5.34). Consider an ancestor and denote its successive birth times by ξ̃_i, i ≥ 1, and let X̃_i := ξ̃_i − ξ̃_{i−1} (with ξ̃_0 := 0) be the successive waiting times. Furthermore, let Ξ̃ be the point process of all births of children of this ancestor (thus Ξ̃ = Σ_i δ_{ξ̃_i}) and let J be the number of the heir (in birth order). Then, by (5.35),

P(J = j) = 2^{−j}, j ≥ 1. (6.3)

Thus J has the (shifted) geometric distribution Ge_1(1/2). Moreover, conditioned on J = j, the joint density of (X̃_1, . . . , X̃_m), for any m ≥ j, is given by (6.4). Consequently, conditioned on J = j, the waiting times X̃_i between the births for an ancestor are independent, with X̃_i ∼ Exp(2) for i ≤ J and X̃_i ∼ Exp(1) for i > J. (6.5)

We claim that we can describe Ξ̃ in a simpler way as a Poisson process Ξ with intensity 1, plus an extra point Z ∼ Exp(1), independent of Ξ, with Z the heir. To see this, note that with this description, the first point of Ξ̃ is either the first point of Ξ or Z; these two points are both Exp(1) and independent, so the first waiting time X̃_1, which is the smaller of the two, will be Exp(2). Furthermore, with probability 1/2, this point is the heir Z, so J = 1, and then the rest of the process is Ξ, with independent Exp(1) waiting times. And with probability 1/2, X̃_1 comes from Ξ, and then the whole process repeats from X̃_1, so the next waiting time X̃_2 ∼ Exp(2), and so on. A simple induction shows that this yields both the distribution of J in (6.3) and the right conditional distribution of (X̃_i)_1^∞ given J = j for each j, which proves the claim.
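The simpler description can itself be verified by simulation. This Python sketch (names are ours) draws the heir's birth time Z ∼ Exp(1) independently of a rate-1 Poisson process of ordinary births, and checks that the heir's rank J satisfies P(J = j) = 2^{−j} as in (6.3):

```python
import random

def heir_rank(rng):
    """Rank (in birth order) of the heir among an ancestor's children:
    ordinary children arrive as a Poisson process of rate 1, and the
    heir is born at an independent Exp(1) time Z."""
    z = rng.expovariate(1.0)
    t, rank = 0.0, 1
    while True:
        t += rng.expovariate(1.0)   # next ordinary birth
        if t > z:
            return rank             # the heir came before this birth
        rank += 1

rng = random.Random(7)
reps = 200_000
counts = {}
for _ in range(reps):
    j = heir_rank(rng)
    counts[j] = counts.get(j, 0) + 1

for j in (1, 2, 3):
    assert abs(counts[j] / reps - 2.0 ** (-j)) < 0.01   # P(J = j) = 2^{-j}
```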
We can thus describe the random sin-tree T as follows: First construct an infinite chain of ancestors o^(1), o^(2), . . . of o (backwards in time), with the times between their births i.i.d. Exp(1); in other words, (o^(k))_{k≥1} are born according to a Poisson process with intensity 1 on (−∞, 0). Then grow independent Yule tree processes from all o^(k), k ≥ 0. Finally, stop everything at τ ∼ Exp(1). (Cf. Aldous [1, Section 4], where the description is less explicit.) For an application, see Theorem 11.6.

Example 6.2 (binary search tree). Another important example studied by Aldous [1] is the (random) binary search tree. This is the case m = 2 of the m-ary search tree in Section 3, but it is simpler than the general case, so we treat it separately, using a slightly different but equivalent formulation. (Since each (internal) node has exactly one key, the number of keys equals the number of nodes, and we can ignore the keys completely.) The binary search tree can be grown recursively as follows. (See e.g. [44] for other, equivalent, constructions.) Start with a single node. Since we grow a binary tree, each node may have a left child and a right child. When the tree has n nodes, there are n + 1 empty places for children (these places are the external nodes in the description in Section 3). The tree grows by adding a node at one of these n + 1 places, chosen uniformly at random. Similarly as in Example 6.1, it is easy to see that the binary search tree is the tree T_n produced by the branching process where each individual has two children, labelled left and right and born at ages ξ_L and ξ_R, say, with ξ_L and ξ_R both Exp(1) and independent; furthermore, we use again the weight function ψ(t) = 1 as in Example 5.3. (This continuous-time branching process seems to have been first used to study the binary search tree by Pittel [111], who considered the height and saturation level, see Section 13.) We thus have N = 2.
Since each child is born with the density function e^{−x}, the intensity measure μ of Ξ has density 2e^{−x}. Thus the Malthusian equation (5.4) becomes 2/(α + 1) = 1, so α = 1. Note that if we order the children in order of birth as usual, then ξ_1 = min(ξ_L, ξ_R), and thus ξ_1 ∼ Exp(2), while the waiting time ξ_2 − ξ_1 for the second child is Exp(1) and independent of ξ_1.
We see also that the size |T_t| grows as a pure birth process with birth rates λ_n = n + 1, see Appendix A. Equivalently, |T_t| + 1, which can be interpreted as the number of external nodes, is a pure birth process with rates λ_n = n, i.e., the Yule process in Example A.3 and in Example 6.1, but started at 2 instead of 1.
As shown by Aldous [1], the limiting fringe tree T = T_τ can be described as a binary search tree with a random number M of nodes, where

P(M = n) = 2/((n + 1)(n + 2)), n ≥ 1; (6.6)

cf. the similar result (6.2) for the random recursive tree. To see this we argue as in Example 6.1; the difference is that when there are n individuals, there are now n + 1 places to add a new node, and thus n + 1 independent Exp(1) clocks for these, competing with the random time τ that stops the process; hence the probability of adding another node is (n + 1)/(n + 2), and thus by induction P(M ≥ n) = 2/(n + 1), and (6.6) follows. (For an alternative argument, see Example 6.4 below.)

By Theorem 5.14, the asymptotic distribution of the size of a random fringe tree is given by (6.6). Another simple calculation in [1] shows that the asymptotic distribution of the outdegree of the nodes in T_n, which by Corollary 5.20 equals the distribution of the root degree D in T, is uniform on {0, 1, 2}, see (5.33). This can also be seen without calculation: ξ_L, ξ_R and τ are three i.i.d. Exp(1) random variables, so the three events that τ is the smallest, the middle, or the largest of these by symmetry all have the same probability 1/3. These events equal the events that the root in T has degree 0, 1, 2.
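The symmetry argument is easy to confirm numerically; a minimal Python sketch (names are ours):

```python
import random

rng = random.Random(42)
reps = 300_000
counts = [0, 0, 0]
for _ in range(reps):
    tau = rng.expovariate(1.0)    # stopping time
    xi_l = rng.expovariate(1.0)   # birth time of the left child
    xi_r = rng.expovariate(1.0)   # birth time of the right child
    d = (xi_l < tau) + (xi_r < tau)   # root degree of T = T_tau
    counts[d] += 1

for d in range(3):
    assert abs(counts[d] / reps - 1/3) < 0.01   # uniform on {0, 1, 2}
```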
To find the random sin-tree T, note that by the comments after (5.34)–(5.35), (5.34) holds also when taking the children in the order left–right. For an individual in T, the pair (ξ_L, ξ_R) has the density function e^{−x_L−x_R}. For an ancestor, the probability that the heir is the left child is 1/2 (by symmetry or by (5.35)), and it follows that, conditioned on the heir being the left child, the pair (ξ_L, ξ_R) has the density function 2e^{−x_L} e^{−x_L−x_R} = 2e^{−2x_L} e^{−x_R}. In other words, for an ancestor, given that the heir is the left child, the age ξ* when the heir is born is Exp(2) and the age when the other child is born is Exp(1), and these two ages are independent. The same holds given that the heir is the right child. In particular, ξ* ∼ Exp(2) and thus β = E ξ* = 1/2, cf. Remark 5.22.

Consequently, the random sin-tree T can be described as follows, cf. the case of the random recursive tree in Example 6.1: First construct an infinite chain of ancestors o^(1), o^(2), . . . of o (backwards in time), with the times between their births i.i.d. Exp(2); in other words, (o^(k))_{k≥1} are born according to a Poisson process with intensity 2 on (−∞, 0). Moreover, make a random choice (uniform and independent of everything else) for each ancestor to decide whether its heir is the left or the right child. Then grow independent binary tree processes at all empty places (external nodes), with independent Exp(1) waiting times for all new nodes. Finally, stop everything at τ ∼ Exp(1). (Applications are given in Section 11.)

Example 6.3 (general preferential attachment trees). We can generalise the preceding examples as follows, see Rudas, Tóth and Valkó [118] and Rudas and Tóth [117], where this example is studied using the branching process method described here; see also Bhamidi [10]. (The branching process below was also earlier used by Biggins and Grey [16] to study the height of these trees.)
We thus give only a summary and some complements, in particular on sin-trees. Some special cases are treated in Examples 6.4–6.8 below, see in particular Example 6.6; these cases have been studied by many authors, using various methods. (Further references are given below, but we do not attempt a complete history.) Suppose that we are given a sequence of non-negative weights (w_k)_{k=0}^∞, with w_0 > 0. Grow a random tree T_n (with n nodes) recursively, starting with a single node and adding nodes one by one. Each new node is added as a child of some randomly chosen existing node; when a new node is added to T_{n−1}, the probability of choosing a node v ∈ T_{n−1} as the parent is proportional to w_{d+(v)}, where d+(v) is the outdegree of v in T_{n−1}. (More formally, this is the conditional probability, given T_{n−1} and the previous history. The sequence (T_n)_{n=1}^∞ thus constitutes a Markov process.) If we want the trees T_n to be ordered trees, we also insert the new child of v among the existing d+(v) children in a random position, uniformly chosen among the d+(v) + 1 possibilities.
The random recursive tree in Example 6.1 is the special case w_k = 1, k ≥ 0, and the binary search tree in Example 6.2 is the special case with w_0 = 2, w_1 = 1 and w_k = 0, k ≥ 2 (and, furthermore, each first child randomly assigned to be left or right).
Note that we require w_0 > 0 (and w_1 > 0 will be implicitly assumed, as a consequence of (6.12) below), but we allow w_m = 0 for some larger m, as in the example of the binary search tree. In this case, no individual will ever get more than m children; in fact (provided m is chosen minimal), N = m a.s. In this case, the weights w_{m+1}, w_{m+2}, . . . are irrelevant, so it suffices to prescribe w_k for k ≤ m. (In this case, we interpret 1/w_m = ∞ below, and the infinite sums become finite. We leave such obvious modifications to the reader.) In some important examples, for example Example 6.6 below, w_k is a strictly increasing function of k, which means that nodes with a high degree are more likely to attract a new node than nodes with a low degree; hence the name preferential attachment, which comes from Barabási and Albert [6], where this type of model was introduced (in a more general version, in general yielding graphs and not trees), see Example 6.6. The tree version of their model had been studied earlier under a different name by Szymański [121] and others, see Example 6.5. The model with general w_k was considered by Móri [102].
As in the examples above, the tree T_n can be constructed by a branching process as in Section 5, again with weight ψ(t) = 1 and taking the birth times ξ_i := Σ_{j=1}^i X_j, now with the waiting times between births X_j = ξ_j − ξ_{j−1} ∼ Exp(w_{j−1}) and independent. In other words, the stochastic process Ξ([0, t]), t ≥ 0, (i.e., the number of children of a given individual born up to age t) is a pure birth process, starting at 0 and with birth rate w_k when the state is k.
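In discrete time, the growth rule itself is straightforward to implement. The following Python sketch (the function and variable names are ours) grows T_n for an arbitrary weight sequence, illustrated with w_k = k + 1, the plane oriented recursive tree of Example 6.5, for which the limiting fraction of leaves is 2/3:

```python
import random

def preferential_attachment_tree(n, w, rng):
    """Grow T_n: each new node chooses its parent v with probability
    proportional to w(outdegree(v)); w(0) > 0 is required."""
    parent, outdeg = [None], [0]
    for _ in range(1, n):
        weights = [w(d) for d in outdeg]
        r = rng.random() * sum(weights)
        acc = 0.0
        for v, wt in enumerate(weights):
            acc += wt
            if r <= acc:
                break
        parent.append(v)
        outdeg[v] += 1
        outdeg.append(0)
    return parent, outdeg

rng = random.Random(3)
# w_k = k + 1: the plane oriented recursive tree (Example 6.5)
parent, outdeg = preferential_attachment_tree(2000, lambda k: k + 1, rng)
assert sum(outdeg) == 1999                          # n - 1 children in total
leaf_frac = sum(1 for d in outdeg if d == 0) / 2000
assert abs(leaf_frac - 2/3) < 0.05                  # asymptotically 2/3
```

(The quadratic running time is irrelevant for such a sketch; for large n one would maintain the cumulative weights in a tree structure.)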
We distinguish between two cases, depending on whether the sum Σ_{k=0}^∞ 1/w_k of the expected waiting times is finite or not.
In the explosive case Σ_{k=0}^∞ 1/w_k < ∞, we have E ξ_∞ < ∞, where ξ_∞ := lim_{n→∞} ξ_n; thus ξ_∞ < ∞ a.s., i.e., an individual will have an infinite number of children in a finite time (the point process Ξ explodes). In this case, the branching process will explode in finite time, and several of the assumptions in Section 5 fail. Nevertheless, this case can be treated separately. It turns out that the random fringe tree T*_n is asymptotically degenerate and w.h.p. consists of a single node only, i.e., |T*_n| →p 1, see Theorem 6.11 below. (Equivalently, the proportion of leaves in the tree T_n tends to 1.) The case w_k = (k + 1)^p for some p > 1 is studied by Krapivsky, Redner and Leyvraz [88], Krapivsky and Redner [87] and (rigorously and in detail) by Oliveira and Spencer [108], who show that if p > 2 (but not if 1 < p ≤ 2), the random tree process T_n, n ≥ 1, is even more strongly degenerate: a.s. there exists a (random) n_0 and a node v ∈ T_{n_0} such that all nodes added after time n_0 become children of v (and thus remain leaves forever). See also Athreya [3].
In the sequel we consider the non-explosive case

Σ_{k=0}^∞ 1/w_k = ∞. (6.12)

In this case, by (6.7), E ξ_∞ = ∞; moreover, it is easy to see that ξ_∞ = lim_{n→∞} ξ_n = ∞ a.s., for example by calculating E e^{−λξ_n} = Π_{j=1}^n w_{j−1}/(w_{j−1} + λ) → 0 for any λ > 0, see [3]. Hence, an individual has a.s. only a finite number of children in each finite interval, i.e., Ξ([0, t]) < ∞ for every t < ∞.
The asymptotic degree distribution is, by Corollary 5.20 and (5.33), given by (6.13), and thus (6.14) holds; indeed, the limiting outdegree D = Ξ([0, τ]) satisfies P(D ≥ k) = Π_{j=0}^{k−1} w_j/(w_j + α), since each of the first k births must occur before the independent Exp(α) time τ. This can also be seen as an example of Theorem A.4.
To describe the life of an ancestor, let E_i be the event that the heir of the ancestor is child i. We note first that if we fix M < ∞, then in the point process Ξ, the waiting times X_1, . . . , X_M have the joint density function Π_{j=1}^M w_{j−1} e^{−w_{j−1} x_j}. It follows from (5.34) that for any i and M with 1 ≤ i ≤ M, conditioned on E_i, the waiting times (X̃_j)_{j=1}^M between the M first children of the ancestor have a joint density function proportional to the expression in (6.15). Furthermore, by (5.41) and (6.13) (or by tracking constants in the argument just given), the distribution of the number of the heir is given by (6.16). Consequently, the point process Ξ̃ describing the births of the children of an ancestor can be constructed as follows: Select the number I of the heir at random, with the distribution (6.16). Then, conditioned on I = i, let the waiting times X̃_j be independent exponential variables, with X̃_j ∼ Exp(w_{j−1} + α) for j ≤ i and X̃_j ∼ Exp(w_{j−1}) for j > i. The limiting random sin-tree T is then constructed as in Section 5.
In Examples 6.1 and 6.2, we have seen alternative, simpler, constructions of Ξ. This will be extended to the linear case in Example 6.4 and Theorem 6.9, but it does not seem possible to extend it further. In particular, we show in Theorem 6.10 that the age ξ * when the heir is born has an exponential distribution only in the linear case. See also Example 6.8 for a simple non-linear example.
Example 6.4 (linear preferential attachment). The simplest, and most studied, case of preferential attachment as in Example 6.3 is the linear case

w_k = χk + ρ, k ≥ 0, (6.17)

for some real parameters χ and ρ, with ρ = w_0 > 0. Note that we obtain the same random trees T_n if we multiply all w_k by a positive constant. (In the branching processes, only the time scale changes.) Hence, only the quotient χ/ρ matters, and it suffices to consider χ ∈ {1, 0, −1}.
The case χ = 0 is the (non-preferential) random recursive tree in Example 6.1. (In this case ρ is irrelevant and we take ρ = 1.) The case χ = 1 (the increasing case) is studied in Example 6.6.
In the case χ = −1, so that w_k = ρ − k, the weight w_k is eventually negative. This is impossible, and violates our basic assumptions in Example 6.3. However, this is harmless if (and only if) ρ = m is an integer; then w_m = 0 and, as said above in Example 6.3, the values of w_k for k > m do not matter. This is the m-ary case studied in Example 6.7; the binary search tree in Example 6.2 is the special case χ = −1, ρ = 2.
We continue with some results valid for any linear weight (6.17), and refer to Examples 6.1, 6.6 and 6.7 for further results for the different cases χ = 0, 1, −1.
Since Ξ([0, t]) is a pure birth process with a rate that is a linear function χk + ρ of the current state k, and with initial value 0, it is easy to see, see Theorem A.6, that the expectation E Ξ([0, t]) = μ([0, t]) is given by (6.18). Hence, μ has density ρe^{χt} (also when χ = 0), cf. (A.15), and thus the Laplace transform of μ is given by (6.19). It follows that (6.12) holds, and that (5.4) holds with

α = χ + ρ. (6.20)

(Alternatively, (6.19) can be verified algebraically, see (6.30) and (6.41) below.)

By Remark 5.22 and (6.20), the age ξ* when an heir is born to an ancestor has density e^{−αt} μ(dt) = ρe^{(χ−α)t} dt = ρe^{−ρt} dt; thus ξ* has an exponential distribution Exp(ρ). (This also follows from (6.19) and the formula for the Laplace transform in Remark 5.22.) As a consequence, generalising the values of β found in Examples 6.1 and 6.2, β = E ξ* = 1/ρ. (6.21)

We claim that the life history Ξ̃ of an ancestor can be described as follows (as a simpler alternative to the general construction in Example 6.3), cf. the special cases in Examples 6.1–6.2; we postpone the proof to Theorem 6.9 below: For an ancestor, the ordinary children are born according to a point process Ξ′ which is a pure birth process, with birth rate w_{k+1} = w_k + χ when the state (the number of ordinary children so far) is k, and the heir is born at an age ξ* ∼ Exp(ρ), independent of Ξ′.

Consequently, the limiting random sin-tree T can be constructed as follows, generalising the constructions in Examples 6.1–6.2: First construct an infinite chain of ancestors o^(1), o^(2), . . . of o (backwards in time), with the times between their births i.i.d. Exp(ρ); in other words, (o^(k))_{k≥1} are born according to a Poisson process with intensity ρ on (−∞, 0). Give each ancestor additional children according to independent copies of Ξ′ (where the intensities are shifted from Ξ, as said above). Then, every other individual gets children according to independent copies of Ξ. Finally, stop everything at τ ∼ Exp(α) = Exp(χ + ρ).
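As a numerical sanity check (the helper name below is ours), one can verify the Malthusian equation (5.4) for the linear case directly: the density ρe^{χt} of μ, discounted by e^{−αt}, integrates to 1 exactly when α = χ + ρ, and the discounted density ρe^{−ρt} is the Exp(ρ) density of ξ*, with mean β = 1/ρ:

```python
import math

def trapezoid(f, T, steps):
    """Trapezoid rule for the integral of f over [0, T]."""
    h = T / steps
    s = 0.5 * (f(0.0) + f(T)) + sum(f(i * h) for i in range(1, steps))
    return s * h

for chi, rho in [(0.0, 1.0), (1.0, 1.0), (-1.0, 2.0)]:
    alpha = chi + rho   # claimed Malthusian parameter, cf. (6.20)
    # Malthusian equation (5.4): integral of e^{-alpha t} rho e^{chi t} dt = 1
    malthus = trapezoid(lambda t: rho * math.exp((chi - alpha) * t), 60.0, 60_000)
    assert abs(malthus - 1.0) < 1e-3
    # mean of xi* ~ Exp(rho): integral of t rho e^{-rho t} dt = 1/rho = beta
    beta = trapezoid(lambda t: t * rho * math.exp(-rho * t), 60.0, 60_000)
    assert abs(beta - 1.0 / rho) < 1e-3
```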
The linear case (6.17) treated in this example is simpler than the general case in Example 6.3 in several ways. For example, we have shown that the age ξ* at which the heir of an ancestor is born has an exponential distribution, and (as said earlier) it will be shown in Theorem 6.10 that this holds only in the linear case. An important reason (perhaps the main reason) that the linear case is simpler is that the total weight in a tree depends only on the size of the tree: if |T| = n, then the total weight of the nodes in T, which we may label by 1, . . . , n, is
∑_{i=1}^n w_{d⁺(i)} = χ ∑_{i=1}^n d⁺(i) + ρn = χ(n − 1) + ρn = αn − χ. (6.22)
This property has several important consequences. First, it follows (as remarked for the random recursive tree and the binary search tree above) that if M = |T| and we condition T on M = n, we get the random tree T_n. (This is the property called coherence by Aldous [1, Section 2.6].) The distribution of M can be found by the same argument as for the random recursive tree in Example 6.1, which now, using (6.22) and (6.20), yields, after a short calculation,
P(M = n) = (α/(nα + ρ)) ∏_{k=1}^{n−1} ((k − 1)α + ρ)/(kα + ρ), n ≥ 1. (6.24)
Consequently, T can be described as the random tree T_M with a random size M given by (6.24). An alternative way to see (6.24) is to note that (6.22) implies that the size Z_t = |T_t| of the branching process is a pure birth process with birth rates λ_n = nα − χ = n(χ + ρ) − χ, and thus |T_t| − 1 is a pure birth process with birth rates λ_n = (n + 1)α − χ = nα + ρ. (The special case χ = 0, ρ = 1, when |T_t| is a Yule process, was noted in Remark 6.1.) Theorem A.5 then shows (6.26), which by (B.3) and (B.5), or simpler by (B.10), yields (6.24). Note also that Theorem A.7 shows that the size |T_t| at a fixed time, minus 1, has a negative binomial distribution.
Furthermore, (6.22) implies that if we label the nodes of T_n by 1, . . . , n in the order they are added to the tree, so that T_n becomes an increasing tree (or recursive tree [44, Section 1.3]), then the probability that T_n equals a given ordered increasing tree T (with |T| = n) is given, by the definition and a simple rearrangement, by a product over the nodes of T. Hence T_n has the distribution of a simply generated random increasing tree [44]. Conversely, a simply generated random increasing tree can be generated by a random evolution where nodes are added one by one only when its weight sequence is of this form, for some w_k of the form (6.17) [89], [110]. (Such trees are called very simple increasing trees in [89], [110].) In other words, the random increasing tree generated by a general sequence of weights w_k (as in Example 6.3) is a simply generated increasing tree if and only if the weights are of the linear type (6.17), i.e., if and only if we are in the case of the present example. Finally, (6.22) is very useful when using martingale methods (which we do not do in the present paper).

Example 6.5 (plane oriented recursive tree). A random plane oriented recursive tree, introduced by Szymański [121], is constructed similarly to the random recursive tree in Example 6.1, but we now consider the trees as ordered; an existing node with k children thus has k + 1 positions in which a new node can be added, and we give all possible positions of the new node the same probability. The probability of choosing a node v as the parent is thus proportional to d⁺(v) + 1, so the plane oriented recursive tree is the case w_k = k + 1 of Example 6.3. This is the special case χ = ρ = 1 of Example 6.4, and thus the special case ρ = 1 of the following example (Example 6.6), where some results and further references are given.

Example 6.6 (positive linear preferential attachment). Consider the case χ = 1 of (6.17), i.e.,
w_k = k + ρ, k ≥ 0, (6.28)
where ρ > 0 is a parameter.
Thus, w_k is a strictly increasing function of k, so this is a model with preferential attachment as mentioned in Example 6.3. This is a popular model that has been studied by many authors (often by methods different from the branching processes used here). The original preferential attachment model by Barabási and Albert [6] was the case ρ = 1, so w_k = k + 1; thus the probability of attaching a new node to an existing node v is proportional to d⁺(v) + 1, the total degree of the node (except for the root). As said above, trees of this type had earlier been studied by Szymański [121]. (Barabási and Albert [6] considered a more general model where a new node may be attached to more than one existing node, thus creating graphs that are not trees. We only consider the tree case here.) Bollobás, Riordan, Spencer and Tusnády [18] made a precise formulation of the definition, and found (and proved rigorously) the asymptotic degree distribution (in the general, graph case). See also van der Hofstad [64, Chapter 8], with many details and references. The tree model with a general ρ was studied by Móri [102]. See also Athreya, Ghosh, and Sethuraman [4] for an extension with multiple edges, treated by an extension of the methods used here. Rudas, Tóth and Valkó [118] and Rudas and Tóth [117] also used the branching process method described here.
In the case (6.28), (6.11) becomes a hypergeometric series (6.29), where F is a hypergeometric function, see (B.1) in Appendix B; the series converges for θ > 1, and then (6.29) and (B.2) yield
µ̂(θ) = ρ/(θ − 1), θ > 1, (6.30)
as we have seen by another method in (6.19). Consequently, by (6.30) or by (6.20), the Malthusian parameter is
α = 1 + ρ. (6.31)
The asymptotic degree distribution is by (6.13)-(6.14) and (6.31) given by
P(D = k) = (α/(w_k + α)) ∏_{j=0}^{k−1} w_j/(w_j + α) = ((1 + ρ)/(k + 1 + 2ρ)) ∏_{j=0}^{k−1} (j + ρ)/(j + 1 + 2ρ), k ≥ 0,
and thus
P(D = k) = (1 + ρ) Γ(2ρ + 1) Γ(k + ρ)/(Γ(ρ) Γ(k + 2 + 2ρ)).
This is the hypergeometric distribution HG(ρ, 1; 2ρ + 2), see Definition B.1.
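The degree distribution can be checked numerically. The sketch below (our code, not the survey's; it assumes the description of D as the state of a pure birth process with rates w_k stopped at an independent Exp(α) time) computes P(D = k) for w_k = k + ρ and verifies that the probabilities sum to 1 and have mean E D = 1, cf. (5.42).

```python
# Numerical sanity check (our sketch): with w_k = k + rho and alpha = 1 + rho,
#   P(D = k) = (alpha/(w_k + alpha)) * prod_{j<k} w_j/(w_j + alpha).
# For rho = 1 this specializes to the plane oriented recursive tree, where
# P(D = k) = 4/((k+1)(k+2)(k+3)).
def degree_dist(rho, kmax):
    alpha = 1.0 + rho
    probs, prefix = [], 1.0
    for k in range(kmax):
        w = k + rho
        probs.append(prefix * alpha / (w + alpha))
        prefix *= w / (w + alpha)
    return probs

p = degree_dist(1.0, 200000)          # rho = 1: plane oriented recursive tree
assert abs(sum(p) - 1.0) < 1e-6       # probabilities sum to 1
assert abs(sum(k * pk for k, pk in enumerate(p)) - 1.0) < 1e-3   # E D = 1
assert abs(p[0] - 2 / 3) < 1e-12      # P(D=0) = alpha/(w_0 + alpha) = 2/3
```

The truncation at kmax is harmless here since the tail decays like k^{−(2+ρ)}.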
Example 6.7 (m-ary increasing tree, negative linear preferential attachment). We may generalise the binary case Example 6.2 and grow a random m-ary tree as follows, for any m ≥ 2. This is sometimes called an m-ary increasing tree. Note that for m > 2, this will not give the m-ary search tree defined in Section 3. (One difference is that we here fix the number of nodes to be n, while the m-ary search tree has a random number of nodes, but this is a minor technicality, see Remark 7.1. A more essential difference is seen in the asymptotic degree distribution D, see Theorem 7.14.) Start with a single node. Let each node have m positions for children, labelled 1, . . . , m. Add each new node to an empty child position in the tree, chosen uniformly at random. (We may, as in Section 3, regard the empty child positions as external nodes.) Since a node with outdegree d has m − d empty positions for children, this is an instance of the general preferential attachment in Example 6.4, with
w_k = m − k, 0 ≤ k ≤ m. (6.38)
This is thus the case χ = −1 of the linear case in Example 6.4 (with ρ = m), so all results there hold. In particular, by (6.19)-(6.20), µ has density me^{−t},
µ̂(θ) = m/(θ + 1), (6.39)
and
α = m − 1. (6.40)
Also in the case (6.38), (6.11) becomes a hypergeometric series; in this case we obtain (6.41), cf. (6.29) and (B.1). (This is a case where the hypergeometric series is finite.) Gauss' formula (B.2) yields another proof of (6.39).
The asymptotic degree distribution is by (6.13)-(6.14) and (6.40) given by
P(D = k) = ((m − 1)/(2m − k − 1)) ∏_{j=0}^{k−1} (m − j)/(2m − j − 1), 0 ≤ k ≤ m, (6.43)
(for k = m the first factor is interpreted as 1, since w_m = 0). The point process Ξ contains m points, with successive exponential waiting times with rates m, m − 1, . . . , 1. As is well known, this process can also be constructed by taking ξ̃_1, . . . , ξ̃_m i.i.d. Exp(1) and ordering them as ξ_1 < · · · < ξ_m. Since the construction of the m-ary tree also involves randomly labelling the children, it follows that if ξ̃_i denotes the age when the child at position i is born, then ξ̃_1, . . . , ξ̃_m are i.i.d. Exp(1). The growing tree T_t is thus the subtree of the (rooted) infinite m-ary tree, where each child of each node is born after an Exp(1) waiting time (with all these waiting times independent).
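The equivalence between the sequential description (rates m, m − 1, . . . , 1) and the order statistics of m i.i.d. Exp(1) variables is easy to test by simulation. The following Python sketch (ours; the function names are ad hoc) compares the mean age at which the last child is born, which equals the harmonic number H_m in both descriptions.

```python
import random

# Monte Carlo sketch (ours, not from the survey): the successive birth times
# with rates m, m-1, ..., 1 have the same law as the order statistics of m
# i.i.d. Exp(1) variables; here we compare the expected time of the LAST
# birth, H_m = 1 + 1/2 + ... + 1/m, obtained both ways.
def last_birth_sequential(m, rng):
    t = 0.0
    for rate in range(m, 0, -1):   # waiting times Exp(m), Exp(m-1), ..., Exp(1)
        t += rng.expovariate(rate)
    return t

def last_birth_maximum(m, rng):
    return max(rng.expovariate(1.0) for _ in range(m))

rng = random.Random(0)
m, runs = 5, 100000
h_m = sum(1 / i for i in range(1, m + 1))
mean_seq = sum(last_birth_sequential(m, rng) for _ in range(runs)) / runs
mean_max = sum(last_birth_maximum(m, rng) for _ in range(runs)) / runs
assert abs(mean_seq - h_m) < 0.03
assert abs(mean_max - h_m) < 0.03
```

The identity behind the sequential means is the Rényi representation of exponential order statistics.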
Similarly, for an ancestor, the process Ξ′ of its ordinary children described in Example 6.4 and Theorem 6.9 simply consists of m − 1 i.i.d. Exp(1) points. Furthermore, the age ξ* when the heir is born is Exp(m) and independent of Ξ′. Consequently, the description of the limiting random sin-tree T in Example 6.4 can be simplified as follows, cf. the binary case in Example 6.2: Construct an infinite chain of ancestors o^{(1)}, o^{(2)}, . . . of o (backwards in time), with the times between their births i.i.d. Exp(m); in other words, (o^{(k)})_{k≥1} are born according to a Poisson process with intensity m on (−∞, 0). Moreover, make a random choice (uniform and independent of everything else) for each ancestor to decide which of its m children is its heir. Then grow independent m-ary tree processes at all empty places (external nodes), with independent Exp(1) waiting times for all new nodes. Finally, stop everything at τ ∼ Exp(α) = Exp(m − 1).

The examples above are all cases of Example 6.3, and all except the general Example 6.3 itself are special cases of Example 6.4. We have seen that in the latter cases, the age ξ* when the heir is born to an ancestor has an exponential distribution, and is independent of the births of the other children. We give a simple example showing that this is not always the case.

Example 6.8 (Binary pyramids). Let w_0 = w_1 = 1 and w_k = 0 for k > 1. Thus no node ever gets more than 2 children, and we can regard the result as a binary tree by randomly labelling children as left or right as in Example 6.2; the difference is that we here have w_0 = w_1, so when adding a new node, the parent of the new node is chosen uniformly among all existing nodes with fewer than 2 children. (I.e., as in Example 6.1 but with a cut-off at 2 children.) This random tree was called a binary pyramid by Mahmoud [94], who studied their height. (The name comes from pyramid schemes for chain letters, see Gastwirth and Bhattacharya [57].
As said in [94], the definition can be generalized to an arbitrary cut-off m ≥ 2; we leave this case to the reader.) We have ξ_1 = X_1 ∼ Exp(1) with density e^{−x} and ξ_2 = X_1 + X_2 ∼ Γ(2, 1) with density xe^{−x}. Hence, the intensity µ has density (1 + x)e^{−x} and Laplace transform, by (5.7) or (6.11),
µ̂(θ) = 1/(1 + θ) + 1/(1 + θ)^2.
Hence (5.4) is satisfied with α = (√5 − 1)/2 (the inverse golden ratio). By Remark 5.22, the age ξ* when an heir is born has the density (1 + t)e^{−(1+α)t} = (1 + t)e^{−(√5+1)t/2}, and thus, or by (5.39), β = E ξ* = (3√5 − 5)/2. Furthermore, by (5.35), the probability that the heir is the first child is q_1 = (√5 − 1)/2. Thus, by (5.34), Ξ, describing the life history of an ancestor, can be described as a mixture: with probability q_1 = (√5 − 1)/2, an heir is born at age ξ_1 ∼ Exp(1 + α), and then another child is born after an independent waiting time ξ_2 − ξ_1 ∼ Exp(1); with probability 1 − q_1, first another child is born at age ξ_1 ∼ Exp(1 + α), and then an heir is born after an independent waiting time ξ_2 − ξ_1 ∼ Exp(1 + α). We also obtain, by this or directly from (5.34), the joint density f(x, y) of the ages at which the ordinary child and the heir are born; since this density does not factorize, the two births are not independent (unlike the linear case in Example 6.4). Since ξ* is not exponential, the times of births of the ancestors o^{(1)}, o^{(2)}, . . . do not form a Poisson process on (−∞, 0).
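The golden-ratio value of α can be recovered numerically. The following Python sketch (ours, not from the survey) solves µ̂(θ) = 1 by bisection and also checks the stated value of β against the closed-form moments of the density (1 + t)e^{−(1+α)t}.

```python
import math

# Numerical sketch (ours): the Malthusian parameter of the binary pyramid
# solves muhat(theta) = 1/(1+theta) + 1/(1+theta)^2 = 1. Bisection recovers
# alpha = (sqrt(5)-1)/2, and beta = E xi* = 1/(1+alpha)^2 + 2/(1+alpha)^3
# matches (3*sqrt(5)-5)/2.
def muhat(theta):
    return 1.0 / (1.0 + theta) + 1.0 / (1.0 + theta) ** 2

lo, hi = 0.0, 1.0          # muhat(0) = 2 > 1 > muhat(1) = 0.75
for _ in range(60):
    mid = (lo + hi) / 2
    if muhat(mid) > 1.0:   # muhat is decreasing, so the root lies above mid
        lo = mid
    else:
        hi = mid
alpha = (lo + hi) / 2
assert abs(alpha - (math.sqrt(5) - 1) / 2) < 1e-12

a = 1.0 + alpha
beta = 1.0 / a ** 2 + 2.0 / a ** 3   # moments of the density (1+t)e^{-(1+alpha)t}
assert abs(beta - (3 * math.sqrt(5) - 5) / 2) < 1e-9
```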
The asymptotic degree distribution is by (5.41) given by P(D = 0) = α/(1 + α), P(D = 1) = α/(1 + α)^2 and P(D = 2) = 1/(1 + α)^2; numerically, P(D = 0) = P(D = 2) = (3 − √5)/2 ≈ 0.382 and P(D = 1) = √5 − 2 ≈ 0.236.

We end this section by proving some claims made above. First, we consider the ancestor in the linear case.

Theorem 6.9. For the linear preferential attachment in Example 6.4, with weights w_k = χk + ρ, the life history Ξ of an ancestor consists of an heir born at age ξ* ∼ Exp(ρ) together with ordinary children born according to a pure birth process Ξ′, with rate w_{k+1} when there are k ordinary children, with ξ* and Ξ′ independent.
Proof. Consider an ancestor, let ξ* be its age when the heir is born, and denote its ages at the births of the other children by ξ′_1 < ξ′_2 < . . . . (Also, let E_i be the event that the heir is the i:th child; if this event holds, then ξ* = ξ_i.) Fix i and M > i. For an ordinary individual, the joint distribution of (ξ_1, . . . , ξ_{M+1}) has a density on the set {0 < x_1 < · · · < x_{M+1}}. Hence, for an ancestor, (5.34) shows that, restricted to the event E_i, the joint density of (ξ_1, . . . , ξ_{M+1}) can be computed explicitly, and, using (6.48), so can the joint distribution of (ξ′_1, . . . , ξ′_M, ξ*). This equals the joint density of the first M points of the birth process Ξ′ defined in the statement, together with an independent ξ* ∼ Exp(ρ). The result follows, since M is arbitrary.
We have shown in Example 6.4 that the age ξ* when the heir is born to an ancestor has an exponential distribution in the linear case. We now show the converse: this happens only in the linear case. (Recall that if w_m = 0 for some m, the weights w_k for k > m are irrelevant.)

Theorem 6.10. Consider a general preferential attachment tree defined as in Example 6.3 by a sequence (w_k)_{k≥0} of weights. If the age ξ* when an ancestor gets an heir has an exponential distribution, then w_k = χk + ρ for some χ ∈ R and ρ > 0 (at least until w_k = 0, if that ever happens).
Consider, more generally, the equation (6.54), for some real a and b and all large s. Multiply (6.54) by (w_0 + s)/w_0. This yields (6.55). Now let s → ∞. On the left-hand side, each term except the first decreases to 0, and by dominated convergence, the sum converges to 1 + 0 + · · · = 1; thus, letting s → ∞ in (6.55), we see that (6.54) implies w_0 = a. Use this in (6.55) and subtract 1 to obtain an equation of the same type as (6.54), with the weights (w_k) shifted to (w_{k+1}), and a replaced by w_0 − b. Hence, the argument above yields w_1 = w_0 − b. Thus, in both cases w_1 = w_0 − b. Moreover, if w_1 ≠ 0, we can iterate the argument, and find w_2 = w_1 − b, w_3 = w_2 − b, and so on, as long as the weights are non-zero. Thus w_k = w_0 − kb = χk + ρ, with χ = −b and ρ = w_0.
Finally, we prove the result claimed above in the explosive case (6.8).
Theorem 6.11. Let T_n be a general preferential attachment tree, defined by a sequence w_k, and assume that the explosion condition (6.8) holds. Then the fraction of leaves in T_n tends to 1 a.s. as n → ∞; in other words, the random fringe tree T*_n converges a.s. in distribution to the trivial tree consisting of a single node.

Proof. Let T_∞ := T_{τ(∞)}, the (infinite) tree obtained by stopping when the process explodes. Thus T_n ⊂ T_∞ for every n. Let, for 1 ≤ i ≤ n ≤ ∞, I_{i,n} be the indicator of the event that the i:th node (in order of appearance) v_i has at least one child in T_n.
Fix δ > 0, and let E_{i,δ} be the event that the i:th individual (in order of birth) in the branching process gets at least one child before age δ, i.e., that it has ξ_1 < δ. Further, let J_{i,δ} := 1{E_{i,δ}}. The events E_{i,δ} are independent and have the same probability P(ξ_1 < δ) = P(X_1 < δ). Thus, by the law of large numbers, n^{−1} ∑_{i=1}^n J_{i,δ} → P(X_1 < δ) a.s. Furthermore, a.s. τ(∞) < ∞, and then σ_i > τ(∞) − δ for all but a finite number of i, i.e., all but a finite number of individuals have age less than δ when the process explodes. Hence, I_{i,∞} ≤ J_{i,δ} for all but a finite number of i and, a.s.,
lim sup_{n→∞} n^{−1} ∑_{i=1}^n I_{i,∞} ≤ P(X_1 < δ).
Since δ > 0 is arbitrary and P(X_1 < δ) → 0 as δ → 0, this shows lim_{n→∞} n^{−1} ∑_{i=1}^n I_{i,∞} = 0 a.s. Furthermore, the finite tree T_n is a subtree of T_∞; hence I_{i,n} ≤ I_{i,∞}, and thus, a.s., n^{−1} ∑_{i=1}^n I_{i,n} → 0, which proves the claim.
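The explosive regime is easy to observe empirically. The following Python sketch (ours, not from the survey; the superlinear weights w_k = (k + 1)^2, which satisfy ∑ 1/w_k < ∞, are our illustrative choice) grows a preferential attachment tree and checks that the vast majority of nodes are leaves, as the theorem predicts in the limit.

```python
import random

# Simulation sketch (illustrative, not from the survey): grow a preferential
# attachment tree with the superlinear weights w_k = (k+1)^2, which satisfy
# the explosion condition sum_k 1/w_k < infinity. Almost all nodes should be
# leaves for large n.
def grow_tree(n, rng):
    """Return the list of outdegrees after growing a tree with n nodes."""
    deg = [0]            # outdegree of each node; start with the root alone
    weights = [1.0]      # weight w_{deg[v]} = (deg[v] + 1)^2 of each node
    total = 1.0
    for _ in range(n - 1):
        r = rng.random() * total
        acc = 0.0
        for v, w in enumerate(weights):   # pick a parent with prob w/total
            acc += w
            if acc >= r:
                break
        deg[v] += 1
        new_w = float((deg[v] + 1) ** 2)
        total += new_w - weights[v] + 1.0   # parent's new weight + new leaf
        weights[v] = new_w
        deg.append(0)
        weights.append(1.0)
    return deg

rng = random.Random(1)
deg = grow_tree(2000, rng)
leaf_fraction = sum(1 for d in deg if d == 0) / len(deg)
assert leaf_fraction > 0.6   # loose bound; the fraction tends to 1 as n grows
```

For comparison, the linear case w_k = k + 1 (the plane oriented recursive tree) has leaf fraction tending to 2/3 only, and the recursive tree to 1/2.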

m-ary search trees and branching processes
In this section, as always when we discuss m-ary search trees, m is a fixed integer with m ≥ 2. We apply the general theory in Section 5 to the m-ary search tree in Section 3. Recall from Section 3 that besides the m-ary search tree, we may also consider the extended m-ary search tree (including external nodes). It turns out that both versions can be described by stopped branching processes. It is easy to go between the two versions, but we find it instructive to treat them separately, and describe the two related but different branching processes connected to them. The reader is recommended to compare the two versions, even when we do not explicitly do so.
Remark 7.1. The random m-ary search tree is defined as in Section 3 to have a given number of keys, which makes the number of nodes random (in general). We can also define a random m-ary search tree with a given number of nodes, by adding keys until the desired number of nodes is obtained. This is obtained by the branching processes below, stopping when the number of nodes is a given number n; we thus use the weight ψ(t) = 1 in Example 5.3 (as in Section 6). The asymptotics are the same for this version, see Remark 5.17. We therefore ignore this version in the sequel, and consider only the standard version with a given number of keys.

7.1. Extended m-ary search tree. Recall from Section 3 that we can grow an extended m-ary search tree by starting with an empty tree (a single external node) and then adding keys, each new key added with equal probability to each existing gap. Hence, we can also grow the extended m-ary search tree in continuous time by adding a key to each gap after an exponential Exp(1) waiting time (independent of everything else). By the construction of the extended m-ary search tree in Section 3, this is a Crump-Mode-Jagers branching process, where the life of each individual is as follows (Pittel [112]): An individual is born as an external node with no keys. It acquires m − 1 keys after successive independent waiting times Y_1, . . . , Y_{m−1}, where Y_i ∼ Exp(i) (since the node has i gaps when there are i − 1 keys). When the (m − 1):th key arrives, the individual immediately gets m children.
We let ψ(t) be the number of keys stored at the individual at age t. Thus Z^ψ_t is the total number of keys at time t and τ(n) is the time the n:th key is added. Hence T_n is a random m-ary search tree with n keys, as we want.
Let S_k := ∑_{i=1}^k Y_i, k = 0, . . . , m − 1; for 1 ≤ k ≤ m − 1, this is the time the k:th key arrives. Let further S_m := ∞. Then ψ(t) = k for S_k ≤ t < S_{k+1}. For θ ≥ 0 (in fact, for θ > −1) and k ≤ m − 1,
E e^{−θS_k} = ∏_{i=1}^k i/(i + θ). (7.1)
(See also Theorem C.1, which further gives the distribution of S_k; in the notation used in Appendix C, S_k =d V_{k,k}.) Furthermore, all children are born at the same time with ξ_1 = · · · = ξ_m = S_{m−1}, and thus the random variable Ξ̂(θ) in Remark 5.7 equals me^{−θS_{m−1}}. Hence, see (5.7) and (7.1),
µ̂(θ) = m ∏_{i=1}^{m−1} i/(i + θ). (7.2)
In particular, we see that µ̂(1) = 1, so the Malthusian condition (5.4) is satisfied with α = 1. It is easy to see that all other conditions (A1)-(A5) are satisfied. (Note that in this case, N = m is non-random. Furthermore, ψ is bounded, so (A6ψ) holds too.) Consequently, Theorem 5.14 applies, and shows (in particular) that the random fringe tree T*_n converges in distribution to T, which is obtained by running the branching process above and stopping it after a random time τ ∼ Exp(1).
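The equality µ̂(1) = 1 is a one-line telescoping product; the following Python sketch (ours, not from the survey) verifies it numerically for a range of m.

```python
# Quick numerical check (our sketch): the Laplace transform
#   muhat(theta) = m * prod_{i=1}^{m-1} i/(i + theta)
# from (7.2) satisfies muhat(1) = m * (m-1)!/m! = 1 for every m >= 2,
# so the Malthusian parameter alpha = 1 for all m-ary search trees.
def muhat(m, theta):
    prod = 1.0
    for i in range(1, m):
        prod *= i / (i + theta)
    return m * prod

for m in range(2, 11):
    assert abs(muhat(m, 1.0) - 1.0) < 1e-12
```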
Similarly, Theorem 5.25 applies. In order to find the sin-tree T, note that since all children of an individual are born at the same time, ξ_1 = · · · = ξ_m = S_{m−1}, it does not matter which one is the heir. It thus follows from (5.34) that, if we let Ỹ_1, . . . , Ỹ_{m−1} be the successive waiting times between the arrivals of keys for an ancestor, so that all m children are born at time ξ* = Ỹ_1 + · · · + Ỹ_{m−1}, then Ỹ_i ∼ Exp(i + 1), with all Ỹ_i independent. (Cf. (6.4) and the proof of Theorem 6.9, with similar calculations in different but related situations.) The m children are numbered 1, . . . , m, with the heir chosen uniformly at random among them.
In particular,
β = E ξ* = ∑_{i=1}^{m−1} 1/(i + 1) = H_m − 1, (7.4)
where H_m := ∑_{i=1}^m 1/i denotes the m:th harmonic number. (See also (5.39) and (7.2).) The distribution of ξ* is given by Theorem C.1, see (7.5). In particular, ξ* is not exponentially distributed unless m = 2.
In the construction of T_t above, the number of gaps is always 1 + the number of keys, and we add keys (and thus gaps) with an intensity equal to the number of gaps. Hence, the number of gaps at time t forms a pure birth process with birth rates λ_k = k, starting at 1 (this is again the Yule process in Example 6.1, see Example A.3), and thus the number of keys at time t forms a pure birth process with birth rates λ_k = k + 1, starting at 0. (Note that this is independent of the choice of m.) Since τ has the same distribution Exp(1) here as in Example 6.1, it follows that the number of gaps in T = T_τ has the same distribution as the number of nodes M in T_τ in Example 6.1, given by (6.2). Moreover, by symmetry, conditioned on the number of keys K = k in T_τ = T, T has the same distribution as the random extended m-ary search tree T_k with k keys. Hence, we get the following result: The number K of keys in the asymptotic fringe tree T has the distribution
P(K = k) = 1/((k + 1)(k + 2)), k ≥ 0. (7.6)
Furthermore, T can also be described as an extended m-ary search tree with a random number K of keys, where K has the distribution (7.6).
Remark 7.4. Using the notation in Definition B.1, K ∼ HG(1, 1; 3). (This also follows from Theorem A.5, with χ = ρ = α = 1.) The property in the second part of the theorem, describing the asymptotic fringe tree T as an extended m-ary search tree with a random number K of keys, is called coherence by Aldous [1, Section 2.6], and was seen also in Example 6.4. (In the present case the coherence is with respect to the number of keys; we might call this key-coherence.)

We proceed to derive some properties of the random extended m-ary search tree T_n. Note that, unlike the examples in Section 6, T_n does not have n nodes; n is the number of keys, while the number of nodes is random for m ≥ 3. (For m = 2, the number of nodes is 2n + 1, of which n are internal, see Section 3.) To find the asymptotic number of nodes, we use Theorem 5.12 and obtain the following result.
Theorem 7.5. For the extended m-ary search tree T_n with n keys,
|T_n|/n → 1/(H_m − 1) a.s. as n → ∞. (7.7)
The asymptotic value of the expectation E |T_n|/n was found by Baeza-Yates [5]. We do not know any reference where (7.7) is stated explicitly, but closely related results for the number of internal nodes have been shown in several papers, see Remark 7.12; the result follows also immediately from the main result by Kalpathy and Mahmoud [83].
Proof. This follows from Theorem 5.12(ii), except for the value of m_ψ, which we calculate as follows. Since ψ(t) = ∑_{k=1}^{m−1} 1{S_k ≤ t},
E ψ(τ) = ∑_{k=1}^{m−1} P(S_k ≤ τ) = ∑_{k=1}^{m−1} E e^{−S_k}, (7.8)
where, by (7.1),
E e^{−S_k} = ∏_{i=1}^k i/(i + 1) = 1/(k + 1), (7.9)
and thus,
m_ψ = E ψ(τ) = ∑_{k=1}^{m−1} 1/(k + 1) = H_m − 1, (7.10)
which yields (7.7).

Theorem 7.6. Let N_k(T_n) be the number of nodes in the extended m-ary search tree T_n containing exactly k keys, 0 ≤ k ≤ m − 1. Then, a.s., as n → ∞, N_k(T_n)/|T_n| → 1/((k + 1)(k + 2)) for 0 ≤ k ≤ m − 2, and N_{m−1}(T_n)/|T_n| → 1/m.

Again, we do not know any reference where this is stated explicitly; the asymptotic values of the expectations E N_k(T_n)/n were found by Baeza-Yates [5]; see also the references in Remark 7.12. The result can also easily be shown using Pólya urns, see [74, Example 7.8], [83] and [67].
Proof. We apply Theorem 5.14 with the property P of a node v that it contains k keys. Hence, p_P(T) is the probability that the root of T = T_τ contains k keys, i.e., that ψ(τ) = k or, equivalently, S_k ≤ τ < S_{k+1}; the characteristic φ in Theorem 5.14 is thus φ(t) = 1{ψ(t) = k} = 1{S_k ≤ t < S_{k+1}}. By (5.24), arguing similarly to (7.8)-(7.10) and in particular using (7.9), and recalling that S_m := ∞,
P(S_k ≤ τ < S_{k+1}) = E(e^{−S_k} − e^{−S_{k+1}}) = 1/((k + 1)(k + 2)), 0 ≤ k ≤ m − 2, and P(S_{m−1} ≤ τ) = 1/m. (7.12)
The result follows by Theorem 5.14. (Alternatively, one can use Theorem A.4.)

Remark 7.7. In particular, the fraction of external nodes converges a.s. to 1/2, (7.13) and thus the same holds for the number of internal nodes; the numbers of external and internal nodes are thus asymptotically the same. (Perhaps surprisingly, the asymptotic fractions of external and internal nodes are thus independent of m.)

Remark 7.8. The asymptotic degree distribution D is not very interesting for the extended m-ary search tree, since every internal node has outdegree m and every external node has outdegree 0; thus, as a corollary of (7.13), P(D = 0) = P(D = m) = 1/2.

7.2. m-ary search tree, internal nodes only. Usually, we consider an m-ary search tree as consisting only of the internal nodes. This can be obtained from the tree with external nodes in Section 7.1 by deleting all external nodes, but it may also be constructed directly as follows, using a different Crump-Mode-Jagers process. We now start with a node containing a single key. Thus each individual is born as a node with 1 key. It acquires the keys 2, . . . , m − 1 after successive waiting times Y_2, . . . , Y_{m−1}, where Y_i ∼ Exp(i) (since the node has i gaps when it contains i − 1 keys). Alternatively, taking the children in order of birth, we may say that after the (m − 1):th key, there are m children born after successive waiting times X_1, . . . , X_m, where X_j ∼ Exp(m − j + 1), all waiting times independent. We let again the weight ψ(t) be the number of keys at time t in an individual. It is easy to see that then T_n is a random m-ary search tree with n keys, as defined in Section 3.
The random variable Ξ̂(θ) in Remark 5.7 is now given by Ξ̂(θ) = ∑_{j=1}^m e^{−θξ_j}, where ξ_j = Y_2 + · · · + Y_{m−1} + X_1 + · · · + X_j is the age at which the j:th child is born. Its distribution is not the same as in Section 7.1, but the mean E Ξ̂(θ) = µ̂(θ) is easily seen to be the same as in (7.2), and thus we still have α = 1; similarly, by (5.39), β is the same as in (7.4), i.e., β = H_m − 1. (That α has to be the same for the m-ary search tree with and without external nodes is rather obvious, since the number of internal nodes in T_t is the same for both versions, and grows like e^{αt} by (5.10) and (7.13).) The conditions (A1)-(A5) and (A6ψ) are satisfied, and Theorem 5.14 shows that the random fringe tree T*_n converges in distribution to T, which is obtained by running this branching process and stopping it after a random time τ ∼ Exp(1).
Moreover, the random sin-tree T is constructed by the general procedure in Section 5. To find the distribution of an ancestor, we note that by symmetry, each child has the same probability 1/m of being the heir. Furthermore, using Ỹ_i and X̃_j to denote the waiting times (corresponding to Y_i and X_j above) for an ancestor, it follows from (5.34) that, conditioned on the heir being the child marked k, the joint distribution of Ỹ_2, . . . , Ỹ_{m−1}, X̃_1, . . . , X̃_m has a product density. Consequently, Ỹ_i ∼ Exp(i + 1) and, given that the heir is child k, X̃_j ∼ Exp(m − j + 2) for j ≤ k and X̃_j ∼ Exp(m − j + 1) for j > k, all waiting times independent (conditioned on k).
Remark 7.9. The distributions of the birth times can be obtained from Theorem C.1; in particular, it follows that ξ* has the same distribution as for the extended m-ary search tree, see (7.5). (This is not surprising since we really construct the same trees in two somewhat different ways.)

As for the extended m-ary search tree in Section 7.1, the number of gaps in the process (= 1 + the number of keys, i.e., 1 + Z^ψ_t) forms a Yule process (see Example A.3), but in the present case it starts at 2, while it starts at 1 for the extended m-ary search tree in Section 7.1. (In other words, the number of gaps is the sum of two independent standard Yule processes.) The number of keys in T_t thus evolves in exactly the same way for every m ≥ 2, and hence is the same as for the binary case m = 2 treated in Example 6.2. In particular, since also τ ∼ Exp(1) is the same for all m, the number K of keys in T_τ has the distribution (6.6). Moreover, as for the extended m-ary search tree, if we condition on K = k, then T has the same distribution as the random m-ary search tree T_k with k keys. Hence, we get the following result:

Theorem 7.10. The number K of keys in the asymptotic fringe tree T has the distribution
P(K = k) = 2/((k + 1)(k + 2)), k ≥ 1. (7.18)
Furthermore, T can also be described as an m-ary search tree T_K with a random number K of keys, where K has the distribution (7.18).
Cf. (7.6), the similar result for the extended m-ary search tree, and note that the distribution (7.18) equals the distribution (7.6) conditioned on K ≥ 1. Furthermore, the number of keys thus grows as e^t, so the number of nodes has to grow at the same rate, which again shows that α = 1. Note that the second part of the theorem is another instance of key-coherence.
As for the extended m-ary search tree in Section 7.1, the number of nodes is random for m ≥ 3. We can again find the asymptotics from Theorem 5.12, yielding the following theorem. (Alternatively, we can obtain the result from (7.7) and (7.11) for the extended m-ary search tree.)

Theorem 7.11. For the m-ary search tree T_n with n keys,
|T_n|/n → 1/(2(H_m − 1)) a.s. as n → ∞. (7.19)
In other words, the average number of keys per node converges a.s. to 2(H_m − 1).
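Theorem 7.11 invites a direct experiment. The following Python sketch (ours, not part of the survey; the insertion routine implements the standard m-ary search tree of Section 3, with ad hoc names) inserts n i.i.d. uniform keys for m = 3 and compares the average number of keys per node with the limit 2(H_3 − 1) = 5/3.

```python
import random

# Simulation sketch (ours): build an m-ary search tree from n uniform keys and
# compare the average number of keys per node with 2(H_m - 1).
class Node:
    __slots__ = ("keys", "children")
    def __init__(self):
        self.keys = []          # at most m-1 keys, kept sorted
        self.children = None    # m subtrees, created once the node fills up

def insert(root, key, m):
    node = root
    while True:
        if node.children is None:       # node still has room for keys
            node.keys.append(key)
            node.keys.sort()
            if len(node.keys) == m - 1:
                node.children = [None] * m
            return
        # route to the subtree given by the rank of key among the node's keys
        i = sum(1 for k in node.keys if k < key)
        if node.children[i] is None:
            node.children[i] = Node()
        node = node.children[i]

def count_nodes(node):
    if node is None:
        return 0
    return 1 + sum(count_nodes(c) for c in node.children or [])

rng = random.Random(7)
m, n = 3, 50000
root = Node()
for _ in range(n):
    insert(root, rng.random(), m)
keys_per_node = n / count_nodes(root)
h_m = sum(1 / i for i in range(1, m + 1))
assert abs(keys_per_node - 2 * (h_m - 1)) < 0.05   # limit is 5/3 for m = 3
```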
For the variance and asymptotic distribution (which we do not consider in the present paper), there is an interesting phase transition: the variance is linear in n and the distribution asymptotically normal if m ≤ 26 but not if m ≥ 27, see [95], [90], [30], [29].
Proof. For the present branching process, ψ(t) = k for S_k ≤ t < S_{k+1}, where S_k := Y_2 + · · · + Y_k is the time the k:th key comes to the node (with S_1 := 0 and S_m := ∞). Arguing as in (7.8)-(7.10) we find (omitting some details)
E e^{−S_k} = ∏_{i=2}^k i/(i + 1) = 2/(k + 1), 1 ≤ k ≤ m − 1,
and thus,
m_ψ = E ψ(τ) = ∑_{k=1}^{m−1} P(S_k ≤ τ) = ∑_{k=1}^{m−1} 2/(k + 1) = 2(H_m − 1).
Hence, Theorem 5.12(ii) yields (7.19).

The asymptotic number of nodes with a given number of keys can be found similarly. Note that the tree is constructed so that each node contains at least one key.

Theorem 7.13. Let N_k(T_n), 1 ≤ k ≤ m − 1, be the number of nodes in the m-ary search tree T_n containing exactly k keys. Then, a.s., as n → ∞, N_k(T_n)/|T_n| → 2/((k + 1)(k + 2)) for 1 ≤ k ≤ m − 2, and N_{m−1}(T_n)/|T_n| → 2/m.

(This theorem is also an immediate corollary of results by Kalpathy and Mahmoud [83], shown using a Pólya urn, see also [74, Example 7.8].)

Proof. This follows either from (7.11) for the extended m-ary search tree or by a similar argument as in the proof of Theorem 7.6 (which we omit).
Finally, we give the asymptotic degree distribution D. (This was found, using a Pólya urn, by Kalpathy and Mahmoud [83], generalizing the special case of leaves (k = 0) given in [67].)

Theorem 7.14. Let n_k(T_n) be the number of nodes of outdegree k in T_n. Then, a.s., as n → ∞, n_k(T_n)/|T_n| → P(D = k), where
P(D = 0) = (m − 1)/(m + 1) and P(D = k) = 2/(m(m + 1)), 1 ≤ k ≤ m. (7.24)
The asymptotic degree distribution is thus uniform on {1, . . . , m}, but with a large proportion of the nodes being leaves (outdegree 0). (For m = 2, the distribution is uniform on {0, 1, 2}.) Note that E D = 1, as always, see (5.42).
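As a quick consistency check (our sketch, not the survey's; the formula is the one produced by the doomsday-clock argument in the proof below), the probabilities P(D = 0) = (m − 1)/(m + 1) and P(D = k) = 2/(m(m + 1)) for 1 ≤ k ≤ m sum to 1 and have mean E D = 1, in accordance with (5.42).

```python
# Sanity check (our sketch) of the asymptotic degree distribution:
#   P(D = 0) = (m-1)/(m+1),  P(D = k) = 2/(m(m+1)) for 1 <= k <= m.
def degree_probs(m):
    return [(m - 1) / (m + 1)] + [2 / (m * (m + 1))] * m

for m in range(2, 12):
    p = degree_probs(m)
    assert abs(sum(p) - 1.0) < 1e-12                                  # total mass 1
    assert abs(sum(k * pk for k, pk in enumerate(p)) - 1.0) < 1e-12   # E D = 1
```

For m = 2 this gives the uniform distribution (1/3, 1/3, 1/3) on {0, 1, 2}, as noted above.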
Proof. This follows by straightforward calculations from (5.33) and Remark 7.9, for example using Theorem C.1. However, we find it illuminating to instead give a less computational proof, using the properties of the exponential distributions. Recalling that D is the degree of the root of T, we consider the life of an individual (the root), stopped at τ; we regard τ as an exponential clock (the doomsday clock) that strikes at a random time, and then stops the process.
After the creation (at t = 0, and with a single key), the next thing that happens is either the arrival of the second key, or that the doomsday clock strikes. Since the second key arrives with intensity 2 and the clock strikes with intensity 1, the probability is 2/3 that the second key will arrive before the clock strikes. Conditioned on this event, the same argument shows that the probability that also the third key arrives before the clock strikes is 3/4, and so on. It follows that the probability that the node acquires all m − 1 keys before the clock strikes is
(2/3)(3/4) · · · ((m − 1)/m) = 2/m. (7.25)
(Note that this argument also yields another proof of Theorem 7.13.) After the arrival of all m − 1 keys, assuming that the doomsday clock still has not struck, we wait for the m children. Each child arrives with intensity 1, and the clock strikes with the same intensity, so by symmetry (and independence), the order of the m births and the strike of the clock is uniform among all (m + 1)! possibilities. In particular, the position of the clock strike is uniform among these m + 1 events, i.e., the number of children born before the clock strikes is uniform on {0, . . . , m}. Combining this and (7.25) we obtain, for 1 ≤ k ≤ m,
P(D = k) = (2/m) · 1/(m + 1) = 2/(m(m + 1)),
and, including the cases where fewer than m − 1 keys arrive before the clock strikes,
P(D = 0) = 1 − m · 2/(m(m + 1)) = (m − 1)/(m + 1).
The result follows.
Remark 7.15. Note that the degree distribution in (7.24) differs from the degree distribution (6.43) for the random m-ary tree defined in Example 6.7; as said there, the two different types of random m-ary trees are thus not even asymptotically equivalent.

Median-of-(2ℓ + 1) binary search tree
Let ℓ ≥ 1 be a fixed integer. The random median-of-(2ℓ + 1) binary search tree, see e.g. [36], is a modification of the binary search tree in Example 6.2, where each internal node still contains exactly one key, but each external node can contain up to 2ℓ keys. (We can also include the case ℓ = 0; this is just the extended binary search tree, i.e., the special case m = 2 of Section 7.1.) The tree is grown recursively, starting with a single external node without any keys. The first 2ℓ keys are placed in this node. When the (2ℓ + 1):th key arrives to the node (or to another external node later in the process), the node becomes an internal node with two new external nodes as children, say v_L and v_R; moreover, the median of the 2ℓ + 1 keys now at the node is found and put in the internal node, while the ℓ keys that are smaller than the median are put in the left child v_L and the ℓ keys that are larger than the median are put in the right child v_R.
In order to model this by a branching process, we start the tree with ℓ keys in the root. (This is no restriction, since the first ℓ keys always go there.) Then each external node will contain between ℓ and 2ℓ keys throughout the process, and the median-of-(2ℓ + 1) binary search tree is produced by a branching process with the following life histories: An individual is born as an external node with ℓ keys. It acquires ℓ + 1 additional keys after successive independent waiting times Y_1, . . . , Y_{ℓ+1}, where Y_i ∼ Exp(ℓ + i) (since the node has ℓ + i gaps when it contains ℓ + i − 1 keys). When the (ℓ + 1):th of these keys arrives, so that the node holds 2ℓ + 1 keys, the individual immediately gets 2 children.
We let the weight ψ(t) be the number of keys stored at the individual at age t. Thus Z^ψ_t is the total number of keys at time t and τ(n) is the time the n:th key is added. Hence, assuming n ≥ ℓ, T_n is a random median-of-(2ℓ + 1) binary search tree with n keys.
Note that this construction is very similar to the one for the extended $m$-ary search tree in Section 7.1, and we analyse it in the same way. Let $S_k := \sum_{i=1}^k Y_i$, $k = 1, \ldots, \ell+1$; this is the time the node gets its $(\ell+k)$:th key. (See also Theorem C.1; in the notation used in Appendix C, $S_k \overset{\mathrm d}{=} V_{\ell+k,k}$.) Furthermore, $\xi_1 = \xi_2 = S_{\ell+1}$, and thus the random variable $\Xi(\theta)$ in Remark 5.7 equals $2e^{-\theta S_{\ell+1}}$. Hence, see (5.7) and (7.1), we find in particular that $\mu(1) = 1$, so the Malthusian condition (5.4) is satisfied with $\alpha = 1$. (Again, $\alpha = 1$ has to hold since the number of keys is a Yule process, although now started with $\ell$ keys.) It is easy to see that all other conditions (A1)-(A5) are satisfied. Consequently, Theorem 5.14 applies; the asymptotic random fringe tree $T$ is obtained by running the branching process above and stopping it after a random time $\tau \sim \mathrm{Exp}(1)$. Theorems 7.5 and 7.6 can be adapted with minor modifications as follows; we omit the proofs, which are similar to the ones in Section 7.1, now using (8.2).
Theorem 8.1. For the median-of-$(2\ell+1)$ binary search tree $T_n$ with $n$ keys, the conclusions of Theorems 7.5 and 7.6 hold with the modifications indicated above.
Let $N^e_k(T_n)$ be the number of external nodes in $T_n$ with $k$ keys, for $k = \ell, \ldots, 2\ell$, and let $N^i_k(T_n)$ be the number of internal nodes (all having one key); the corresponding asymptotic fractions for the median-of-$(2\ell+1)$ binary search tree then follow as above.

Chern, Hwang and Tsai [31] consider (using different methods) a more general class of trees, where an external node has up to $r-1$ keys; when the $r$:th key arrives to the node, a pivot is selected among them at random, such that its rank $R$ (i.e., its number if the $r$ keys are ordered) has some fixed distribution on $\{1, \ldots, r\}$. (The case above is thus $r = 2\ell+1$ and $R = \ell+1$; in this case $R$ is deterministic.) The pivot is put in the internal node, and its children get $R-1$ and $r-R$ keys. Translated to the branching process, this means (in general) that the individuals start with different numbers of keys, which would require a multi-type version of the results above (see Remark 5.9). However, it is possible to modify the branching process by including the external nodes in the life of their parent. Thus the individuals now are the internal nodes. (Properties of external nodes can be found using suitable characteristics.) The life of an individual starts with $r$ keys; these are immediately split up with a random $R$ as above, and we regard the individual as carrying two unborn children (fetuses) with initially $R-1$ and $r-R$ keys. The fetuses get new keys, independently of each other and each with rate equal to 1 plus its number of existing keys, and each is born when it has got $r$ keys. We omit the details.
This version can be treated as above; again each individual starts with $\ell$ keys, but now it acquires $(m-1)(\ell+1)$ more keys, after successive independent waiting times $Y_1, \ldots, Y_{(m-1)(\ell+1)}$ with $Y_i \sim \mathrm{Exp}(\ell+i)$; when the last of these keys arrives, $m$ children are born. Note that (7.1)-(7.2) generalize, cf. the special case (8.3), and hence again $\alpha = 1$. It then follows from (8.7) and (5.39) that the analogue of (7.4) holds; cf. the case $\ell = 0$. Results for this model can be derived as above, but we leave this to the reader. More generally, one can similarly make an $m$-ary version of the model with random pivot in Remark 8.3, see [31]; a corresponding Crump-Mode-Jagers branching process (with the internal nodes as individuals) can be constructed as there.

Fragmentation trees
Another type of example is provided by the following fragmentation process, introduced by Kolmogorov [86], see also Bertoin [9, Chapter 1] and Janson and Neininger [80], and the further references given there. Fix $b \ge 2$ and the law for a random vector $V = (V_1, \ldots, V_b)$; this is commonly called the dislocation law. We assume that $0 \le V_j \le 1$, $j = 1, \ldots, b$, and $\sum_{j=1}^b V_j = 1$, (9.1) i.e., that $(V_1, \ldots, V_b)$ belongs to the standard simplex. For simplicity we also assume that each $V_j < 1$ a.s. (We allow $V_j = 0$.) Starting with an object of mass $x_0$, break it into $b$ pieces with masses $V_1 x_0, \ldots, V_b x_0$. For a given threshold $x_1 \in (0, x_0]$, continue recursively with each piece of mass $\ge x_1$, using new (independent) copies of the random vector $(V_1, \ldots, V_b)$ each time. The process terminates a.s. after a finite number of steps, leaving a finite set of fragments of masses $< x_1$. We regard the fragments of mass $\ge x_1$ that occur during this process as the (internal) nodes of a (random) tree, the fragmentation tree; the resulting fragments of mass $< x_1$ can be added as external nodes.
Obviously, the fragmentation tree depends only on the ratio $x_0/x_1$, so we denote it by $T_{x_0/x_1}$. (We may assume either $x_0 = 1$ or $x_1 = 1$ without loss of generality, but we prefer to be more flexible.) We can translate the fragmentation process to a Crump-Mode-Jagers branching process by regarding a fragment of mass $x$ as born at time $\log(x_0/x)$; an individual will have $b$ children, born at ages $\xi_1, \ldots, \xi_b$ with $\xi_i := -\log V_i$. (If some $V_i = 0$, we get $\xi_i = \infty$, meaning that this child is not born at all, so there are fewer than $b$ children. Note also that in this section, we do not require that $\xi_1, \xi_2, \ldots$ are ordered in increasing order.) It is easy to see that the fragmentation tree $T_{x_0/x_1}$ defined above for a threshold $x_1$ is the same as the family tree $T_{\log(x_0/x_1)}$ of this branching process at time $\log(x_0/x_1)$.
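The fragmentation process itself is straightforward to simulate; in the sketch below (function names are ours), the dislocation law is passed as a sampler, here a uniform binary splitting.

```python
import random

def fragmentation_tree(x0, x1, dislocation):
    """Grow the fragmentation tree: split every fragment of mass >= x1.
    Returns the internal nodes (fragments of mass >= x1) and the final
    fragments of mass < x1 (the external nodes)."""
    internal, external = [], []
    stack = [x0]
    while stack:
        mass = stack.pop()
        if mass >= x1:
            internal.append(mass)                     # an internal node
            stack.extend(v * mass for v in dislocation())
        else:
            external.append(mass)                     # a final fragment
    return internal, external

def uniform_binary():
    """Dislocation law V = (V1, 1 - V1) with V1 uniform; each V_j < 1 a.s.,
    so the process terminates a.s."""
    v = random.random()
    return (v, 1.0 - v)

random.seed(0)
internal, external = fragmentation_tree(1.0, 0.01, uniform_binary)
```

Since the splitting is conservative, the final fragments carry the whole initial mass, which gives a convenient consistency check.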
The relation (9.1) can be written as $\sum_{j=1}^b e^{-\xi_j} = 1$. (9.2)
Taking the expectation we find, see (5.7), that $\mu(1) = 1$, so the Malthusian parameter $\alpha = 1$. It is easy to see that the assumptions (A1)-(A5) hold, except possibly (A2); we say that the fragmentation process is non-lattice if (A2) holds, i.e., if not every $V_i$ is concentrated on $\{r, r^2, r^3, \ldots\}$ for some $r \in (0,1)$. (A sufficient condition for (A2) is thus that $V_1$ has a continuous distribution.) Furthermore, (9.2) and (5.6) say that $\Xi(\alpha) = 1$ is non-random. This has the consequence that the random variable $W$ in Remark 5.11 also is deterministic; more precisely, $W$ satisfies (9.5), see [71, Theorem (6.8.1)] and (5.40); this equation together with (5.15) (provided (5.11) holds) determines $W$ uniquely, see [43]. Assuming $\alpha = 1$ (which can be regarded as a normalisation of the time scale), it is easy to see that a constant $W$ satisfies (9.5) if (9.2) holds, which gives an alternative proof of (9.3). Yet another proof of (9.3) is obtained by noting that when (9.2) holds, the martingales $R_n$ and $Y_t$ in [104] are deterministic.

Remark 9.2. Note that unlike the trees studied in the previous sections, we consider the family tree $T_t$ at a fixed time $t = \log(x_0/x_1)$ instead of stopping when some weight $Z^\psi_t$ reaches a given value. However, since $W$ is constant, this makes a very small difference. In fact, by (5.9) and (9.3), $Z_t \sim \beta^{-1} e^t$ a.s., and thus, if we use the characteristic $\psi(t) = 1$ again, the stopping time $\tau(n)$ when the tree has $n$ nodes satisfies a.s. $\tau(n) = \log(\beta n) + o(1) = \log n + \log\beta + o(1)$. (9.6) We may define a fragmentation tree $T_n$ of fixed size $n$ by stopping at $\tau(n)$; in the original formulation this means that we choose the threshold $x_1$ to be the size of the $n$:th largest fragment in the process, so that there will be exactly $n$ fragments of size $\ge x_1$ (unless there is a tie). We see from (9.6) that asymptotically, this is almost the same as taking a constant time $t = \log n + \log\beta$. For more precise results on $|Z_t|$, and thus on $\tau(n)$, see [80].

Theorem 9.3.
Let $T_{x_0/x_1}$ be a random fragmentation tree defined as above, for a non-lattice fragmentation process. Then Theorem 5.14 holds also (as $x_0/x_1 \to \infty$, and with other obvious notational modifications) for the random fringe tree $T^*_{x_0/x_1}$. The limiting random fringe tree $T$ can be constructed by the fragmentation process above, starting at $x_0 = 1$ and with a random threshold $x_1 = U \sim U(0,1)$, with $U$ independent of the fragmentation.
Proof. By the equivalence above of the fragmentation process and the Crump-Mode-Jagers branching process, $T_{x_0/x_1} = T_{\log(x_0/x_1)}$, and the first part follows from Theorem 5.14 (and its proof).
The limiting fringe tree $T$ is obtained by stopping the branching process $T_t$ at a random time $\tau \sim \mathrm{Exp}(1)$; by the equivalence above, this is equivalent to starting the fragmentation process at $x_0 = 1$ and stopping at a threshold $x_1 = \exp(-\tau)$. This completes the proof, since $\exp(-\tau) \sim U(0,1)$.
Similarly, Theorem 5.25 holds, and the random sin-tree $T$ can be defined by a suitable extension of this random fragmentation process; we leave the details of the general case to the reader, and discuss only one case in the example below.

Example 9.4. Let $b = 2$ and $V_1 \sim U(0,1)$, $V_2 = 1 - V_1$. Thus, at each fragmentation event, the object is split into two parts, with uniformly random sizes.
In the corresponding Crump-Mode-Jagers branching process, each individual gets two children, born at ages $\xi_1$ and $\xi_2$, where $\xi_1, \xi_2 \sim \mathrm{Exp}(1)$ and one of them determines the other by $e^{-\xi_1} + e^{-\xi_2} = 1$. (9.7) Note the similarities with the Crump-Mode-Jagers branching process for the binary search tree in Example 6.2; the difference is that there $\xi_1$ and $\xi_2$ are independent, while here they are dependent. For properties that depend only on the individual (marginal) distributions of $\xi_1, \xi_2$ and not on their joint distribution, we thus have the same results for both processes; some examples are the intensity $\mu$, the distribution of $\xi^* \sim \mathrm{Exp}(2)$ and its mean $\beta = 1/2$, and the expected size of the population $\mathbb E\, Z_t = 2e^t - 1$. However, many properties really depend on the joint distribution of the times of birth of the children, and are thus in general different for the two processes. For example, although $\mathbb E\, Z_t$ is the same for both processes, the distributions of $Z_t$ are not: for the present process, there is by (9.7) always one child of the root born before time $\log 2$, so $Z_{\log 2} \ge 2$, while for the process in Example 6.2, $P(Z_t = 1) = e^{-2t} > 0$ for every $t \ge 0$. Also the fringe tree distributions will be different, as is seen below.
Let us first consider the asymptotic outdegree distribution in the fragmentation tree, which equals the distribution of the outdegree $D$ of the root in $T$. We have, using the construction of $T$ in Theorem 9.3, $D = \mathbf 1\{V_1 > U\} + \mathbf 1\{1 - V_1 > U\}$, (9.8) and simple calculations yield $P(D = 0) = 1/4$, (9.9) and similarly $P(D = 1) = 1/2$ and $P(D = 2) = 1/4$. Consequently, $D \sim \mathrm{Bi}(2, 1/2)$ has a binomial distribution. (This differs from Example 6.2, where $D$ has a uniform distribution.) Furthermore, let $X_1 := V_1/U$ and $X_2 := (1 - V_1)/U$ be the masses of the two children of the root, relative to the threshold $U$. Then $X_1, X_2 > 0$ and $X_1 + X_2 \ge 1$, and a calculation of the Jacobian of the mapping $(U, V_1) \to (X_1, X_2)$ shows that in this region, $(X_1, X_2)$ has the density $f(x_1, x_2) = (x_1 + x_2)^{-3}$. This enables us to again compute the distribution of $D$; for example, $D = 0 \iff X_1, X_2 < 1$. Moreover, we can now easily find the distribution of the outdegrees of nodes in the second generation too; we give a few examples.
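The construction in Theorem 9.3 makes such probabilities easy to check by simulation: a child of the root with mass $V_i$ is a node of $T$ precisely when $V_i \ge U$. A Monte Carlo sketch in Python:

```python
import random

random.seed(0)
N = 200_000
counts = [0, 0, 0]
for _ in range(N):
    u = random.random()               # threshold U ~ U(0,1), independent of the split
    v1 = random.random()              # V1 ~ U(0,1), V2 = 1 - V1
    d = (v1 >= u) + (1.0 - v1 >= u)   # children of the root kept in T
    counts[d] += 1
probs = [c / N for c in counts]
print(probs)   # close to (1/4, 1/2, 1/4), i.e. Bi(2, 1/2)
```

Replacing the threshold by $1/X_1$ as described below allows the second-generation outdegrees to be estimated in the same way.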
Denote the children of the root by $v_1$ and $v_2$. Then $T$ contains $v_1$ but not $v_2$ if and only if $X_1 \ge 1 > X_2$; denote this event by $E_1$. Conditioned on $E_1$, the density of $X_1$ is, by a small calculation, $2\big(x^{-2} - (x+1)^{-2}\big)$, $x \ge 1$. Furthermore, the outdegree of $v_1$ is given by (9.8) with $U$ replaced by $1/X_1$ (and $V_1$ by an independent copy $V_{11}$); hence, in analogy to (9.9), we obtain (9.10) and, similarly, (9.11)-(9.12). Recall that if we just condition on $v_1 \in T$, its outdegree distribution equals the unconditional distribution of $D$, i.e., $\mathrm{Bi}(2, 1/2)$; hence, (9.10)-(9.12) illustrate the dependencies between the outdegrees of different nodes (in this case, $o$ and $v_1$), see Remark 5.24. We obtain also $P(|T| = 1) = P(D = 0) = 1/4$ and, from (9.10), we obtain (9.13). Again, this differs from the binary search tree in Example 6.2. We do not know any general formula for the probability distribution of the size $|T|$.
The irrational probabilities in (9.10)-(9.12) and (9.13) seem to exclude any simple combinatorial construction or interpretation of the asymptotic fringe tree T .
To construct the limiting sin-tree $T$ in Theorem 5.25, we note that the birth age $\xi^*$ of the heir of an ancestor has distribution $\mathrm{Exp}(2)$ by (5.34) (as in Example 6.2). Going back to the mass scale, we note that $Y := \exp(\xi^*)$ has the Pareto(2) distribution $P(Y > x) = x^{-2}$, $x > 1$. (9.14) The random sin-tree $T$ can thus be constructed as follows: Start with a root $o$ of mass 1 (as in Theorem 9.3) and give it an infinite sequence of ancestors, where the mass of each ancestor is $Y$ times the mass of its heir, with independent copies of $Y$ distributed as in (9.14); the other child of each ancestor thus has the remaining mass. Grow independent fragmentation trees from these other children and from $o$, using uniformly random binary splittings, and stop at a common threshold $x_1 = U \sim U(0,1)$.
Remark 9.5. We have, for simplicity, assumed that the branching factor $b$ is a constant finite integer. (Although we may allow fewer than $b$ fragments by letting some $V_i = 0$.) We can also allow $b = \infty$, or a random $b$ (which can be reduced to $b = \infty$ by adding variables $V_i$ that are 0). The results above extend, provided (A5) holds.
Remark 9.6. As noted in Section 5, the results extend also to the lattice case, with minor modifications, but for simplicity we ignore that case. Only very special fragmentation processes are lattice; one trivial example is the deterministic symmetric binary splitting $V_1 = V_2 = 1/2$. More generally, the deterministic binary splitting $V_1 = p$, $V_2 = q = 1 - p$ is lattice if and only if $\log p/\log q$ is rational. For a random example, let $r = (\sqrt 5 - 1)/2$, take $b = 3$ and let $(V_1, V_2, V_3)$ be either $(r, r^2, 0)$ or $(r^2, r^2, r^3)$ with probability $1/2$ each.
Remark 9.7. The split trees defined by Devroye [38] are related to fragmentation trees. A split tree is a $b$-ary tree defined using a number of balls that enter the root and are distributed (randomly and recursively) to the subtrees of the root and further down in the tree according to certain rules that are based on a splitting law $V = (V_1, \ldots, V_b)$ satisfying (9.1), see [38] for details. (A splitting law is thus the same as a dislocation law.) Far away from the fringe, where there are many balls and the law of large numbers applies, the numbers of balls in different subtrees are distributed asymptotically as the masses in the corresponding fragmentation tree, so there are many similarities between the two types of random trees. However, at the fringe, the details differ, and the asymptotic fringe distributions are in general not the same. For example, the binary search tree in Example 6.2 can be defined as a split tree, where the splitting law is uniform: $V_1 \sim U(0,1)$ and $V_2 = 1 - V_1$. The corresponding fragmentation tree is thus the tree studied in Example 9.4, and as noted there the asymptotic fringe tree distribution is not the same as for the binary search tree; for example, the degree distributions differ. Fringe distributions of split trees will be studied in another paper.

Rank
Define, following Bóna and Pittel [20], the rank of a node in a rooted tree to be the smallest distance to a descendant that is a leaf. Thus a leaf has rank 0, while a non-leaf has rank $\ge 1$. A node with rank $\ge k$ is also said to be $k$-protected. (For example, 1-protected = non-leaf; 2-protected = non-leaf and no child is a leaf.) The simplest "non-trivial" case is 2-protected, which sometimes is called just protected. In recent years there have been a number of papers on the number of 2-protected nodes in various random trees, or (equivalently) the probability that a random node is 2-protected, and a few papers on $k$-protected nodes for higher $k$; see e.g. Devroye and Janson [41] and the references therein. Such results can equivalently be described as results on the distribution of the rank of a random node.
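The definition translates directly into a short recursion; a minimal Python illustration (the nested-list representation of rooted trees is our own choice):

```python
def rank(tree):
    """Rank of the root: 0 for a leaf, otherwise 1 + the minimal rank of a
    child, i.e. the smallest distance to a descendant that is a leaf."""
    children = tree            # a tree is the list of its root's subtrees
    if not children:
        return 0               # a leaf has rank 0
    return 1 + min(rank(c) for c in children)

# Example: a root with two children, one a leaf and one with a single leaf child.
t = [[], [[]]]
print(rank(t))   # prints 1: the leaf child prevents the root from being 2-protected
```

A node is then $k$-protected exactly when `rank` of its subtree is at least $k$.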
For a tree $T$ (deterministic or random), let $R(T)$ be the rank of a uniformly random node in $T$, and let $R_0(T)$ be the rank of the root of $T$. (Thus $R(T)$ is a random variable, while $R_0(T)$ is deterministic if $T$ is.) Since the rank of $v$ depends only on the subtree $T_v$, $R(T) = R_0(T^*)$, the rank of the root of the random fringe tree $T^*$. This reduces the study of rank and $k$-protected nodes to the study of random fringe trees. (This was the method used by Devroye and Janson [41], there applied to several classes of random trees, including random recursive trees and binary search trees but also conditioned Galton-Watson trees, which are not of the type considered in the present paper.) For the random trees considered here, Theorem 5.14 applies, for any fixed $k \ge 0$, to the property that a node has rank $\ge k$ (i.e., is $k$-protected); we denote this property by $P_k$ (in this section) and deduce the following. (Note that, depending on one's point of view, (10.1) can be seen both as a limit result for the distribution of the rank, and as a limit result for the proportion of $k$-protected nodes.)

Theorem 10.1. Suppose that (A1)-(A5) hold. Then, for any $k \ge 0$, as $n \to \infty$, (10.1) holds; in other words, the conditioned random variables $R(T_n) \mid T_n$ converge in distribution a.s. as $n \to \infty$. In particular, the same holds for the unconditioned random variables.

Proof. An immediate application of Theorem 5.14.
10.1. m-ary search tree. We consider here the rank and $k$-protected nodes in the $m$-ary search tree in Section 7.2. (The binary case $m = 2$ has been studied by Mahmoud and Ward [98], Bóna [19], Devroye and Janson [41], Bóna and Pittel [20], Holmgren and Janson [66]; the case $m = 3$ by Holmgren and Janson [67] and some higher $m$ by Heimbürger [62].) We let as in (10.1) $P_k = P_k(m) := P(R_0(T) \ge k)$. Thus, by (10.1), the fraction of $k$-protected nodes in an $m$-ary search tree $T_n$ converges a.s. to $P_k(m)$. Recall that (for $m \ge 3$) the number of nodes $|T_n|$ is random. Hence it is interesting to study not only the fraction of $k$-protected nodes, but also the (random) number $n_{P_k}(T_n)$ of them in $T_n$. (The results are formulated in this way in some of the references above.) The corresponding a.s. limit follows as an immediate consequence of (10.1) and (7.19), see Remark 5.18.
Note also that this implies asymptotics for the expectation $\mathbb E\, n_{P_k}(T_n)$, see Remark 5.19. We proceed to the calculation of the numbers $P_k(m)$. It will be convenient to use the extended $m$-ary search tree in Section 7.1, but note that we really are interested in the subtree of internal nodes; to emphasize this we say internally $k$-protected for the $k$-protected nodes in the tree of internal nodes. As usual, $m$ is fixed and will often be omitted from the notation.
With this in mind, define, for $k \ge 0$, $h_k(t) := P($the root of $T_t$ is internal and internally $k$-protected$)$. (10.5) The root becomes an internal node at time $S_1 \sim \mathrm{Exp}(1)$, when it receives its first key. Every node is 0-protected, so $h_0(t)$ is the probability that the root is internal; thus $h_0(t) = 1 - e^{-t}$. (10.6) Recall that all $m$ children of the root are born at time $\xi = S_{m-1}$. Let $f(t)$ be the density function of $\xi = S_{m-1}$. For $k \ge 1$, the root of $T_t$ is internal and internally $k$-protected if and only if $t > \xi$ and the $m$ children of the root either are external or internally $(k-1)$-protected, but not all external. Conditioned on $\xi$, with $\xi < t$, the $m$ subtrees of the root of $T_t$ are independent, and distributed as $T_{t-\xi}$. Hence, the conditional probability (given $\xi$) that a given child is internally $(k-1)$-protected or external is $h_{k-1}(t-\xi) + e^{-(t-\xi)}$, while the conditional probability that all $m$ children are external is $e^{-m(t-\xi)}$. This can be written as a convolution: $h_k(t) = \int_0^t f(s)\Big(\big(h_{k-1}(t-s) + e^{-(t-s)}\big)^m - e^{-m(t-s)}\Big)\,ds$. (10.8) This, with the initial value (10.6), makes it possible to calculate any $h_k(t)$ by recursion (preferably using computer algebra). By Theorem 5.14 and (10.5), the fraction of nodes in $T_n$ that are internal and internally $k$-protected converges a.s. to $P($the root of $T$ is internal and internally $k$-protected$) = \int_0^\infty e^{-t} h_k(t)\,dt$. (10.11) Since the root of $T$ is internal with probability $\int_0^\infty e^{-t}(1 - e^{-t})\,dt = 1/2$, it follows that $P_k = 2\int_0^\infty e^{-t} h_k(t)\,dt$. (10.12) In the binary case $m = 2$, these formulas are equivalent to the formulas derived by a very similar argument in Devroye and Janson [41]. (The function $r_k(t)$ in [41] equals $1 - e^{-t} - h_k(t)$.) For a different method to find $P_k(2)$, see Bóna [19] and Bóna and Pittel [20].
Consequently, Theorem 10.1 says that for an $m$-ary search tree the asymptotic distribution of the rank is given by $\lim_{n\to\infty} P(R(T_n) = k) = P_k - P_{k+1}$, $k \ge 0$, where $P_k$ is given by (10.11)-(10.12). Note that (by induction), each $h_k(t)$ is a polynomial in $e^{-t}$ and $t$ with rational coefficients. Hence, each $P_k(m)$ is a rational number.
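Numerically, the recursion for $h_k$ is easy to iterate. The Python sketch below is ours and rests on three assumptions: (i) the convolution form $h_k(t) = \int_0^t f(s)\big((h_{k-1}(t-s)+e^{-(t-s)})^m - e^{-m(t-s)}\big)\,ds$ with $h_0(t) = 1-e^{-t}$, as described above; (ii) $P_k = 2\int_0^\infty e^{-t}h_k(t)\,dt$, cf. (10.11)-(10.12); and (iii) the fact that $S_{m-1} = \sum_{i=1}^{m-1} Y_i$ with $Y_i \sim \mathrm{Exp}(i)$ independent has the same distribution as the maximum of $m-1$ independent $\mathrm{Exp}(1)$ variables, so that its density is $f(s) = (m-1)e^{-s}(1-e^{-s})^{m-2}$.

```python
import numpy as np

def P_k_numeric(m, k, T=30.0, n=6000):
    """Evaluate P_k(m) by discretizing the convolution recursion for h_k."""
    t = np.linspace(0.0, T, n)
    dt = t[1] - t[0]
    # density of S_{m-1}: maximum of m-1 independent Exp(1) variables
    f = (m - 1) * np.exp(-t) * (1.0 - np.exp(-t)) ** (m - 2)
    h = 1.0 - np.exp(-t)                              # h_0(t)
    for _ in range(k):
        # each child external or internally protected, but not all external
        g = (h + np.exp(-t)) ** m - np.exp(-m * t)
        h = np.convolve(f, g)[:n] * dt                # next h = f * g
    return 2.0 * np.sum(np.exp(-t) * h) * dt          # 2 * int e^{-t} h_k(t) dt

print(P_k_numeric(3, 1))   # close to 2/(m+1) = 1/2 for m = 3
```

With these assumptions the routine recovers $P_0(m) = 1$ and $P_1(m) = 2/(m+1)$ to the accuracy of the discretization, providing a sanity check on the recursion.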
Trivially, $P_0 = 1$ for every $m$ by (10.1). For $k = 1$, $h_0(t) + e^{-t} = 1$ by (10.6), and thus by (10.12), $P_1 = P_1(m) = \frac{2}{m+1}$, (10.14) in accordance with (7.24) (recall that the 1-protected nodes are precisely the non-leaves). Also for $P_2$, we may deduce a rather simple formula. To this end, we extend the notation of Section 7.1 and let $S_j := \sum_{i=1}^j Y_i$ for any integer $j \ge 0$, where $Y_i \sim \mathrm{Exp}(i)$ are independent. (We thus change the earlier special definition of $S_m$.) By Theorem C.1, the distribution of $S_j$ is given by (10.15), with the coefficients in (10.16). Since $h_0(t) + e^{-t} = 1$ by (10.6), (10.8) yields an explicit formula for $h_1(t)$, recalling that $f$ is the density function of $S_{m-1}$ and using (10.16).

Theorem 10.3. The asymptotic probability that a random node in an $m$-ary search tree $T_n$ is 2-protected is given by (10.17).

Proof. By (10.12), a binomial expansion, (10.15), the change of variables $x = e^{-t}$ and a standard evaluation of a beta integral, the formula (10.17) follows.

Remark 10.4. We can also prove this result using a more combinatorial proof with balls and boxes. We recall from Theorem 7.10 that the asymptotic fringe tree $T$ can be constructed as an $m$-ary search tree with a random number $K$ of keys, where by (7.18) $P(K = k) = \frac{2}{(k+1)(k+2)} = 1/\binom{k+2}{2}$, $k \ge 1$. We condition on $K = k$ and find the probability that the root of $T_k$ is 2-protected.
Recall that a node is 2-protected if it is not a leaf and has no child that is a leaf. Thus, the root of $T_k$ is 2-protected if and only if it is filled with $m-1$ keys and each of the $m$ subtrees of the root has the property that it is either empty or contains at least $m$ keys, and at least one of the subtrees is nonempty; in particular, we must have $k \ge 2m-1$.
For $k \ge 2m-1$ we order the keys in increasing order and represent the $k-m+1$ keys that are distributed to the $m$ subtrees of the root by 0's and the $m-1$ keys that stay in the root by 1's. We also add two additional 1's first and last. This gives a string of length $k+2$, beginning and ending with 1, and with $m-1$ additional 1's. There are $\binom{k}{m-1}$ such strings, and all occur with the same probability.
Furthermore, the root of the corresponding tree $T_k$ is 2-protected if and only if between every pair of consecutive 1's in the string, there are either no 0's, or at least $m$ 0's. In other words, the 1's appear in clusters, separated by at least $m$ 0's. Let the number of clusters be $r+1$, and note that $1 \le r \le m$. To count the number of strings of length $k+2$ such that these properties are satisfied for a given $r$, we first distribute the $m+1$ 1's into $r+1$ boxes, such that no box is empty. This gives $\binom{m}{r}$ different choices. We then distribute the $k-m+1$ 0's into the $r$ gaps between the clusters; it is required that there should be at least $m$ 0's in each gap, and the remaining $k-m+1-mr$ 0's can be distributed arbitrarily into the $r$ gaps. This can be done in $\binom{k+r-m(r+1)}{r-1}$ ways. Hence, summing over $r$ and $k$ and using (7.18), we obtain a sum formula for $P_2(m)$. The sum over $k$ can be written as a hypergeometric sum (B.1), which using Gauss' formula (B.2) simplifies and yields (10.17) again. We can find the asymptotics of $P_2(m)$ as $m \to \infty$ from (10.17).
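For small $m$ and $k$ the cluster counting can be checked by brute force; the following Python sketch (function names are ours) enumerates the strings directly and compares with the sum over $r$.

```python
from itertools import combinations
from math import comb

def count_direct(m, k):
    """Enumerate the strings: 1's at both ends, m-1 further 1's among the k
    middle positions, and every maximal run of 0's of length 0 or >= m."""
    total = 0
    for ones in combinations(range(k), m - 1):
        cuts = [-1] + list(ones) + [k]                 # positions of all 1's
        runs = (b - a - 1 for a, b in zip(cuts, cuts[1:]))
        if all(r == 0 or r >= m for r in runs):
            total += 1
    return total

def count_by_clusters(m, k):
    """Sum over the number r of gaps: C(m, r) * C(k + r - m(r+1), r - 1)."""
    return sum(comb(m, r) * comb(k + r - m * (r + 1), r - 1)
               for r in range(1, m + 1) if k + r - m * (r + 1) >= 0)
```

For example, both functions give 3 for $m = 2$, $k = 5$; the agreement for a range of $(m, k)$ supports the counting argument above.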
We give some numerical examples for small $m$ and $k$, calculated by Maple using (10.11), (10.12) or (for $k = 2$) (10.17). Recall that $P_0(m) = 1$ and $P_1(m) = 2/(m+1)$ by (10.14). The value $P_2(2)$ was first found by Mahmoud and Ward [98]. Bóna [19] found also $P_3(2)$ and $P_4(2)$ in an equivalent form; in our notation he computed $P(R_0(T) = k)$ for $k \le 4$; this was extended to $k \le 6$ by Bóna and Pittel [20], see also Devroye and Janson [41]. For $m > 2$, $P_2(m)$ was found using Pólya urns for $m = 3$ by Holmgren and Janson [67] and for $m = 4, 5, 6, 7$ by Heimbürger [62]. The values of $P_3(3)$ and $P_3(4)$ are new. (We have also calculated e.g. $P_4(3)$ and $P_3(5)$, but they have too many digits to fit on a line.) The numerators and denominators of these rational numbers evidently grow very rapidly with $k$; Bóna and Pittel [20] note that (in our notation) the denominator of $P_6(2)$ has 274 digits, but the largest prime factor is only 61, and they show that in general, the largest prime factor of the denominator of $P_k(2)$ is at most $2^k + 1$. This can be generalized to arbitrary $m$, using the recursion above; this also gives a new and simpler proof for $m = 2$. (Nothing similar seems to hold for the numerators; they typically have only a few and often large prime factors in these examples. The numerator of $P_2(13)$ happens to be a prime with 41 digits.)

Theorem 10.7. The largest prime factor of the denominator of $P_k(m)$ is at most $m^k + 1$, for any $k \ge 1$ and $m \ge 2$.
Proof. A simple calculation shows that for integers $j, a, b \ge 0$ with $a \ne b$, the convolution $t^j e^{-at} * e^{-bt}$ is of the form $\sum_{i=0}^j c_i t^i e^{-at} + c' e^{-bt}$, with rational coefficients whose denominators are powers of $|a - b|$. It follows by (10.9), (10.15) and induction that for $k \ge 1$, $h_k(t)$ is a polynomial in $e^{-t}$ and $t$ of degree $m^k$ in $e^{-t}$ and of degree (at most) $1 + m + \cdots + m^{k-2} = (m^{k-1}-1)/(m-1)$ in $t$, with rational coefficients whose denominators have all their prime factors $< m^k$. The result then follows from (10.11).
For the binary case $m = 2$, the probabilities $P_k(2)$ were shown to have an exponential decay by Bóna and Pittel [20]. We conjecture that this holds for $m \ge 3$ too, but leave that as an open problem.
10.2. Random recursive tree. Consider the random recursive tree in Example 6.1. This has been studied by Mahmoud and Ward [99] and Devroye and Janson [41]; we follow here [41]. Let $T_t$ be the Yule tree process in Example 6.1 and define $p_k(t) := P(R_0(T_t) \ge k)$, (10.33) the probability that the root of $T_t$ is $k$-protected. By the construction of the fringe tree $T = T_\tau$, with $\tau \sim \mathrm{Exp}(1)$, the limit $P_k$ in (10.1) is given by $P_k = \int_0^\infty e^{-t} p_k(t)\,dt$. (10.34) The functions $p_k(t)$ can, in principle, be found by recursion. The children of the root in $T_t$ arrive according to a Poisson process with intensity 1, and a child that is born at time $s \le t$ is not $(k-1)$-protected at time $t$ with probability $1 - p_{k-1}(t-s)$. Hence, for any $k \ge 1$, the number of children of the root in $T_t$ that are not $(k-1)$-protected at time $t$ is Poisson distributed with mean $\int_0^t \big(1 - p_{k-1}(t-s)\big)\,ds$. Since the root is $k$-protected if and only if there is no such child, but there is at least one child, and the probability that there is no child at all is $e^{-t}$, we obtain the recursion $p_k(t) = \exp\Big(-\int_0^t \big(1 - p_{k-1}(t-s)\big)\,ds\Big) - e^{-t}$, (10.35) in accordance with Mahmoud and Ward [99]. In principle, the recursion (10.35) yields $p_k(t)$ and $P_k$ for larger $k$ too, but we do not know any closed form for $k \ge 3$.
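The recursion is easy to iterate numerically; the sketch below (ours) assumes the form $p_k(t) = \exp\big(-\int_0^t (1 - p_{k-1}(t-s))\,ds\big) - e^{-t}$ described above, together with $P_k = \int_0^\infty e^{-t} p_k(t)\,dt$, and recovers $P_1 = 1/2$, the classical fact that asymptotically half of the nodes of a random recursive tree are leaves.

```python
import numpy as np

def protected_recursive(k, T=40.0, n=20000):
    """P_k for the random recursive tree: iterate the recursion for p_k(t)
    and integrate against the Exp(1) density of tau."""
    t = np.linspace(0.0, T, n)
    dt = t[1] - t[0]
    p = np.ones(n)                       # p_0(t) = 1: every node is 0-protected
    for _ in range(k):
        lam = np.cumsum(1.0 - p) * dt    # mean number of "bad" children
        p = np.exp(-lam) - np.exp(-t)    # no bad child, minus no child at all
    return float(np.sum(np.exp(-t) * p) * dt)

print(protected_recursive(1))   # close to 1/2
```

Under the same assumptions, `protected_recursive(2)` evaluates the 2-protected proportion, which the recursion gives in closed form as $1/2 - e^{-1}$.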

Maximal clades in m-ary search trees
We define a clade in an $m$-ary tree to be a node with fewer than $m$ children. (In the formulation using extended $m$-ary search trees with external nodes, a clade is thus a node with at least one external child.) A maximal clade is a clade such that no ancestor is a clade.
Remark 11.1. The reason for this somewhat strange terminology comes from applications of the binary case $m = 2$ to mathematical biology, where the clade is regarded as a set of external nodes, see e.g. Blum and François [17], Durand, Blum and François [50], Chang and Fuchs [25], Durand and François [51], Drmota, Fuchs and Lee [46] and (for the elementary equivalence with the definition here) Janson [78]. We consider here the natural extension to $m$-ary trees. (As a mathematically interesting example; we do not claim any biological applications.)

The number of clades is thus the number of nodes with outdegree less than $m$, and the fraction of such nodes is by Theorem 5.14 asymptotically given by the probability that the root of the asymptotic fringe tree $T$ has outdegree less than $m$. (This is found to be $1 - \frac{2}{m(m+1)}$ in Theorem 7.14.) The property that a clade is maximal, however, depends also on its ancestors, and therefore we need the extended fringe tree and the random sin-tree $T$; moreover, we have to consider all ancestors, so Theorem 5.25 does not apply and we use Theorem 5.26.

Proof. We apply Theorem 5.26 with $P_0 = Q =$ "the outdegree is $< m$", i.e., the property that a node is a clade. Then $P$ in Theorem 5.26 is the property that a node is a maximal clade. The assumption (5.49) holds trivially, since $\Xi(\alpha) \le m$. The random variable $\Lambda$ is the time the root of $T_t$ gets its final child; by Remark 7.9, this can be written as a sum of a number of exponential variables (with different rates), and thus (5.50) holds for some small $\delta > 0$. (In fact, for all $\delta < 1$, by Remark 7.9 and Theorem C.1.) Hence, Theorem 5.26 applies and the result follows.
The constant $P_{mc}(2)$, i.e., the asymptotic proportion of maximal clades in a binary search tree, was found to be $(1 - e^{-2})/4$ by Durand and François [51], see also [46] and [78]. We give a different proof of this, using the properties of the sin-tree $T$ in Example 6.2.

Proof. Recall the general construction of the sin-tree $T$ in Section 5 and the specific version for the binary search tree in Example 6.2. In the construction, we stop the tree at $\tau \sim \mathrm{Exp}(1)$, but we first consider the tree $T_t$ at a fixed time $t \ge 0$. (Equivalently, we condition on $\tau = t$.) We thus want to compute the probability $P(o$ is a maximal clade in $T_t)$. We first note that $o$ is a clade unless it already has got its two children; each child has appeared with probability $1 - e^{-t}$ and thus $P(o$ is a clade in $T_t) = 1 - (1 - e^{-t})^2$. (11.3) We also require that no ancestor is a clade, i.e., that each ancestor has two children. Note that each ancestor has an heir, so it is a clade if and only if its other child is not yet born. Suppose that the ancestors are born at times $-\eta_1, -\eta_2, \ldots$, and condition on these times. Ancestor $o^{(i)}$ thus has age $\eta_i + t$ at time $t$, so the probability that it is a clade is $e^{-(\eta_i + t)}$.
Consequently, using the independence of different parts of the sin-tree, $P(o$ is a maximal clade in $T_t \mid \eta_1, \eta_2, \ldots) = \big(1 - (1 - e^{-t})^2\big)\prod_{i=1}^\infty \big(1 - e^{-(\eta_i + t)}\big)$. (11.4) The next step is to find the expectation of (11.4) over all $\{\eta_i\}$. In the present case, this is not difficult since, by Example 6.2, $\{-\eta_i\}$ is a Poisson process with intensity 2 on $(-\infty, 0)$, and thus $\{\eta_i\}$ is a Poisson process with intensity 2 on $(0, \infty)$. For any Poisson process $\Xi = \{\xi_i\}$ on some space $S$, with intensity measure $\lambda$, and any function $f$ on $S$ with $0 \le f(x) \le 1$, there is a standard formula $\mathbb E \prod_i f(\xi_i) = \exp\big(-\int_S (1 - f(x))\,d\lambda(x)\big)$. (11.5) Hence the expectation of the product in (11.4) equals $\exp(-2e^{-t})$. Finally, recalling that $T = T_\tau$ with $\tau \sim \mathrm{Exp}(1)$, $P_{mc}(2) = \int_0^\infty e^{-t}\big(2e^{-t} - e^{-2t}\big)e^{-2e^{-t}}\,dt = \frac{1 - e^{-2}}{4}$.

For further, somewhat surprising, results on the number of maximal clades in the binary case (moments and asymptotic distribution), see Drmota, Fuchs and Lee [46] and Janson [78].
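The constant $(1-e^{-2})/4$ can also be checked by direct Monte Carlo over the ingredients of the proof: $\tau \sim \mathrm{Exp}(1)$, the clade factor $1 - (1-e^{-\tau})^2$ for $o$, and ancestors born at the points of a Poisson process with intensity 2, each one being a clade (its second child not yet born) with probability $e^{-(\eta_i + \tau)}$. A Python sketch:

```python
import math
import random

random.seed(1)
N = 100_000
total = 0.0
for _ in range(N):
    t = random.expovariate(1.0)                   # tau ~ Exp(1)
    p_clade = 1.0 - (1.0 - math.exp(-t)) ** 2     # o itself is a clade
    prod = 1.0                                    # require: no ancestor is a clade
    eta = random.expovariate(2.0)                 # Poisson process, intensity 2
    while eta < 20.0:                             # truncate; later factors are ~1
        prod *= 1.0 - math.exp(-(eta + t))
        eta += random.expovariate(2.0)
    total += p_clade * prod
print(total / N)   # close to (1 - e^{-2})/4 = 0.2162...
```

The truncation of the Poisson process at 20 is harmless, since the omitted factors differ from 1 by at most $e^{-20}$.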
Problem 11.4. Unfortunately, we do not know how to compute $P_{mc}(m)$ for $m > 2$, and we leave this as an open problem. Using the description in Section 7.2 of $T$, it is straightforward to modify (11.3)-(11.4) (although the result is more complicated since the birth times do not have exponential distributions, see Remark 7.9), but the birth times of the ancestors do not form a Poisson process, so (11.5) does not apply and we do not know how to compute the expectation.
We can use the same method for other, related, problems. We give two examples. Let us first consider again the binary search tree, but we simplify the property of being a maximal clade studied above by considering only the condition on the ancestors, ignoring the number of children of the node itself. Thus, let $P_x$ be the property of a node that none of its ancestors has only one child, and let $n_x(T)$ be the number of nodes in $T$ with this property.

Theorem 11.5. If $T_n$ is a random binary search tree with $n$ keys, then $n_x(T_n)/n \to (1 - e^{-2})/2$ a.s.

Proof. We apply Theorem 5.26 as in the proof of Theorem 11.2, but with $P_0$ the trivial property "true". This yields convergence almost surely, to the limit $P(o$ has $P_x$ in $T)$. This probability is computed as in the proof of Theorem 11.3, replacing the factor (11.3) by 1, which yields the result $P(o$ has $P_x$ in $T) = \int_0^\infty e^{-t} e^{-2e^{-t}}\,dt = \frac{1 - e^{-2}}{2}$. (11.9)

The property $P_x$, as formulated above, can be studied also in other trees. We consider the random recursive tree as a different simple example.
Theorem 11.6. If $T_n$ is a random recursive tree with $n$ nodes, then $n_x(T_n)/n \to 1 - e^{-1}$ a.s.

Proof. We argue as in the proof of Theorem 11.5, now using the description of the sin-tree $T$ in Example 6.1. In this sin-tree, the ancestors form a Poisson process with intensity 1 on $(-\infty, 0)$, and, as in the binary search tree case, for an ancestor, the time until the birth of the first non-heir is $\mathrm{Exp}(1)$. Hence the limit $P(o$ has $P_x$ in $T)$ can be calculated by the method above, now yielding, cf. (11.9), $P(o$ has $P_x$ in $T) = \int_0^\infty e^{-t} e^{-e^{-t}}\,dt = 1 - e^{-1}$.

Restricted sampling and sampling by a random key
We have so far considered the properties of a random node in the tree $T_n$. As pointed out by Jagers and Nerman [72], one can similarly obtain results for a random node sampled with some restriction. (For example, a random leaf, a random non-leaf, a random node with no sibling, \ldots.) In general, let $Q$ be a property of the type in Theorem 5.14 or 5.25 and sample $v$ uniformly among all nodes in $T_n$ that satisfy $Q$. If $P$ is another such property, then, by Theorem 5.25, the proportion of such sampled nodes that also satisfy $P$ converges a.s. to $P(o$ has $P$ in $T \mid o$ has $Q$ in $T)$. (12.1) If we let $T^Q$ denote $T$ conditioned on $o \in Q$, then we can write (12.1) as $P(o$ has $P$ in $T^Q)$. (12.2) If Theorem 5.14 applies, we can replace the sin-tree by the fringe tree in (12.1)-(12.2).
Example 12.1. We have already seen an example of this in the first suggested proof of Theorem 7.13, where we note that sampling a node uniformly in an m-ary search tree is the same as sampling an internal node uniformly in the corresponding extended m-ary search tree. Thus T_n is the extended m-ary search tree and Q is "internal". Furthermore, P = P_k is the property of having exactly k keys. (In this example, Q is the complement of P_0, so P_0 ∧ Q is the empty property while P_k ∧ Q = P_k for k ≥ 1.)

Let us consider the example of sampling a random leaf v in more detail. Of course, the fringe tree T_v rooted at v is trivial, so the interest is in the extended fringe tree and in properties of the type in Theorem 5.25. For example, Drmota, Gittenberger, Panholzer, Prodinger and Ward [47] study the number of internal nodes (and the number of leaves) in the subtree rooted at the father of a randomly chosen leaf, for a variety of different types of random trees.
We have the following general result.
Theorem 12.2. Suppose that (A1)–(A5) hold, and that P is a property as in Theorem 5.25. If v is a uniformly random leaf in T_n, then the convergence (12.3) holds, where T^leaf is T conditioned on o being a leaf. The random sin-tree T^leaf may be constructed directly from the tree process (T_t) in Section 5 by removing all descendants of o and stopping at a random time τ_o with the density function

P(ξ_1 > t) α e^{−αt} / P(ξ_1 > τ),  t > 0,  (12.4)

where ξ_1 is the time of birth of the first child of an individual in the branching process. In particular, if ξ_1 ∼ Exp(a) for some a > 0, then τ_o ∼ Exp(a + α).
Proof. Let Q be the property of a node that it is a leaf. Then (12.3) is the same as (12.2), with T^leaf = T^Q, i.e., T conditioned on o being a leaf. To see that T^leaf can be constructed as stated, note that in the construction of the tree process (T_t) in Section 5, the descendants of o and the rest of the tree are independent. Since T is obtained by stopping T_t at τ, it follows that if we ignore descendants of o, T^leaf is obtained by stopping T_t at an independent random time τ_o having the distribution of τ conditioned on o being a leaf in T_τ. Moreover, if the first child of o is born at ξ_1, then o is a leaf in T = T_τ if and only if ξ_1 > τ. Since τ has the density function αe^{−αt}, it follows that, conditioned on the event ξ_1 > τ, τ has the density function (12.4).

Consider now the random recursive tree in Example 6.1, where the branching process is a Yule process with α = 1 and ξ_1 ∼ Exp(1), so that τ_o ∼ Exp(2) by Theorem 12.2. Let T^1 be the subtree rooted at the father of the random leaf o, and let X := |T^1|. A simple calculation shows that ξ_* + τ_o, where ξ_* ∼ Exp(1) is the age of the father when o was born, has the density function 2e^{−t}(1 − e^{−t}), t > 0, while Y_t ∼ Ge_1(e^{−t}), see Example A.8. Hence, for any k ≥ 1, with x = e^{−t},

P(X = k) = ∫_0^∞ e^{−t}(1 − e^{−t})^{k−1} · 2e^{−t}(1 − e^{−t}) dt = 2 ∫_0^1 x(1 − x)^k dx = 2/((k+1)(k+2)).  (12.6)
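The last claim of Theorem 12.2 is easy to check by simulation: if τ ∼ Exp(α) and ξ_1 ∼ Exp(a) are independent, then conditioned on ξ_1 > τ the law of τ is Exp(a + α). A Monte Carlo sketch, with the illustrative choice a = 2, α = 1 (the parameters are ours):

```python
import random

# Monte Carlo check of Theorem 12.2: if xi_1 ~ Exp(a), then the stopped
# time tau_o = (tau | xi_1 > tau) is Exp(a + alpha).
# Illustrative parameters (our choice, not from the text): a = 2, alpha = 1.
random.seed(1)
a, alpha = 2.0, 1.0
accepted = []
for _ in range(400_000):
    tau = random.expovariate(alpha)   # tau has density alpha * exp(-alpha*t)
    xi1 = random.expovariate(a)       # birth time of o's first child
    if xi1 > tau:                     # condition "o is still a leaf at time tau"
        accepted.append(tau)

mean = sum(accepted) / len(accepted)
print(len(accepted) / 400_000)  # acceptance prob., near alpha/(a+alpha) = 1/3
print(mean)                     # near 1/(a+alpha) = 1/3, as for Exp(3)
```

The acceptance probability P(ξ_1 > τ) = α/(a + α) also appears as the normalizing constant in (12.4).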
We have here considered only the number of nodes in T^1. However, it is furthermore clear from the symmetry of the Yule tree process that, given X = |T^1|, the random tree T^1 is distributed as a random recursive tree of order X, i.e., T^1 is a random recursive tree with random order X given by (12.6). This describes, at least in principle, any properties of T^1.

For example, we may as in [47] count leaves and non-leaves separately in T^1. It is easy to see by induction that if k ≥ 2, then

P(T_k has i leaves) = P(T_k has i non-leaves) = ⟨k−1, i−1⟩/(k−1)!,

where ⟨n, j⟩ denotes the Eulerian number (see e.g. [59] or [107]); i.e., the number of leaves in T_k is distributed as 1 + the number of ascents in a random permutation of length k − 1. (In fact, both random vectors (#leaves, #non-leaves) and (1 + #ascents, 1 + #descents) evolve, as k is increased, as generalized Pólya urns with balls of two colours where we draw a ball and return it together with a ball of the opposite colour.) Consequently we find, for k ≥ 2 and 1 ≤ i ≤ k,

P(T^1_n has i non-leaves and k − i leaves) −→ P(T^1 has i non-leaves and k − i leaves) a.s.  (12.8)

Let p_i := P(T^1 has i non-leaves). Summing (12.8) over k we find for example, after short calculations (partly assisted by Maple),

p_1 = 6 − 2e ≈ 0.563,  p_2 = 11 − 4e ≈ 0.127,  p_3 = 857/54 − 5e − (1/2)e² + (2/27)e³ ≈ 0.072.

Using [107, (26.14.6)], it is easy to see that p_i is a polynomial in e with rational coefficients, of degree at most i, but we do not know any simple general formula for p_i.

12.1. Sampling a random key. Similarly, in an m-ary search tree, one might sample a key uniformly at random and consider the properties of the node containing that key.
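As a numerical check on the constants p_1 and p_2 just stated, they can be recomputed by summing the mixture of (12.6) with the Eulerian leaf-count distribution over k; the truncation level and the handling of the boundary term k = 1 (counted as one non-leaf, which reproduces the stated constants) are our choices:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def eulerian(n, j):
    """Eulerian number <n, j>: permutations of length n with exactly j ascents."""
    if n == 0:
        return 1 if j == 0 else 0
    if j < 0 or j > n - 1:
        return 0
    return (j + 1) * eulerian(n - 1, j) + (n - j) * eulerian(n - 1, j - 1)

def p_size(k):
    # (12.6): P(X = k) = 2/((k+1)(k+2))
    return 2.0 / ((k + 1) * (k + 2))

def p_nonleaves(i, kmax=60):
    # P(T^1 has i non-leaves), truncating the rapidly convergent series at kmax.
    # Convention (our choice, matching the stated values): k = 1 counts as i = 1.
    total = p_size(1) if i == 1 else 0.0
    for k in range(2, kmax + 1):
        # a random recursive tree of order k has i leaves (equivalently, by
        # symmetry, i non-leaves) with probability <k-1, i-1>/(k-1)!
        total += p_size(k) * eulerian(k - 1, i - 1) / math.factorial(k - 1)
    return total

p1, p2 = p_nonleaves(1), p_nonleaves(2)
print(p1, 6 - 2 * math.e)    # both approximately 0.5634
print(p2, 11 - 4 * math.e)   # both approximately 0.1269
```

The recurrence ⟨n, j⟩ = (j+1)⟨n−1, j⟩ + (n−j)⟨n−1, j−1⟩ is the standard one for Eulerian numbers; memoization keeps the computation linear-time per entry.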
Theorem 12.5. Let T_n be a random m-ary search tree, and let P be a property as in Theorem 5.14. Sample a random key uniformly, and let v be the node containing that key. Then, as n → ∞, the convergence (12.9) holds, where R(T) denotes the number of keys in the root of T.

Proof. Let Q_k be the property of a node v that it contains k keys. Then, by Theorem 5.14, the proportion of nodes satisfying Q_k converges a.s., and the limit equals the second term in (12.9) because T ∈ Q_k ⇐⇒ R(T) = k. Furthermore, for the same reason, (12.11) holds. Since Σ_k k n_{Q_k}(T_n) = n, the total number of keys, (12.11) and Theorem 7.11 imply

E R(T) = 2(H_m − 1),  (12.12)

which completes the proof. (Alternatively, (12.12) follows from Theorem 7.13, noting that the limits in (7.23) are the probabilities P(R = k).)

Example 12.7. Let K′ be the number of keys in the node containing a random key in an m-ary search tree T_n. Theorems 12.5 and 7.13 imply that K′ converges in distribution, with the size-biased limit P(K′ = k) = k P(R = k)/(2(H_m − 1)). For m = 3, for example, this yields the limit distribution (1/5, 4/5) on {1, 2} for K′; similar explicit distributions follow for m = 4 and 5.
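For m = 3, (12.12) gives E R = 2(H_3 − 1) = 5/3, which (since R ∈ {1, 2}) forces P(R = 1) = 1/3 and P(R = 2) = 2/3; size-biasing then gives P(K′ = 1) = (1 · 1/3)/(5/3) = 1/5. The sketch below checks both empirically, using a standard ternary search tree insertion routine (illustrative code of ours, not from the survey): a node absorbs keys until it holds m − 1 = 2 of them, after which further keys are routed to one of its m subtrees.

```python
import bisect, random

random.seed(7)
M = 3                     # m-ary search tree with m = 3 (ternary)
n = 60_000

# Each node is a pair [keys, children]: keys is sorted with at most M-1
# entries; children is None until the node is full.
root = [[], None]
for _ in range(n):
    key = random.random()
    node = root
    while True:
        keys, children = node
        if len(keys) < M - 1:          # a non-full node absorbs the key
            bisect.insort(keys, key)
            if len(keys) == M - 1:     # node becomes full: create M empty subtrees
                node[1] = [[[], None] for _ in range(M)]
            break
        node = children[bisect.bisect(keys, key)]   # route to the right subtree

# Traverse and count nodes by number of keys (skip empty nodes).
counts = {1: 0, 2: 0}
stack = [root]
while stack:
    keys, children = stack.pop()
    if keys:
        counts[len(keys)] += 1
    if children:
        stack.extend(children)

frac_one_key = counts[1] / (counts[1] + counts[2])   # limit 1/3
p_key_in_one = counts[1] / n                         # P(K' = 1); limit 1/5
print(frac_one_key, p_key_in_one)
```

Note that counts[1] + 2·counts[2] = n, so the second quantity is exactly the proportion of keys sitting in one-key nodes.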

Height, profile and typical depth
We consider in this paper fringe properties of random trees. However, the connection with Crump-Mode-Jagers branching processes has also been used very fruitfully to study properties related to the distance to the root, in particular the height of the tree. This was pioneered by Devroye [32], using results by Kingman [84] and Biggins [11; 13] for branching random walks with discrete time (based on Galton-Watson processes); see also Devroye [33], Mahmoud [93], the survey Devroye [37], and Broutin and Devroye [21]. (Partial results for the binary search tree had been proved earlier by Pittel [111], using the same continuous-time branching process as [32] in a somewhat different way.) The method was further developed by Biggins [14; 15] using the continuous-time Crump-Mode-Jagers branching processes used in the present paper. We give in this section a description of the method and some applications and examples; see the papers just mentioned for further details and results. (In particular, note the second order results in [33; 37].)

Recall that the depth h(v) of a node v is its distance from the root. The height H(T) of a tree T is defined as max_{v∈T} h(v), the maximum depth of a node. If we consider m-ary trees, we define the saturation level S(T) (also called fill-up level) to be the last generation that is full, i.e., the largest k such that there are m^k nodes of depth k; this equals the minimum depth of a node with outdegree < m.
The key idea that makes it possible to apply results on branching random walks is to plot the individuals in a branching process in the plane, using two coordinates that we call time and position; time is the usual time of birth in the branching process and position is an additional variable. We assume that for each individual there is defined, besides the sequence (ξ_i)_{i=1}^N of birth times of the children (relative to the birth of the parent), also a sequence (η_i)_{i=1}^N (of the same length N) of random displacements, with −∞ < η_i < ∞; if the parent is born at time and position (σ, y), then child i is born at time and position (σ + ξ_i, y + η_i). (The general results in [14], [15] allow also a further random component, describing a random motion of each individual during its life. For our purposes, we put that motion equal to 0 and let each individual be static.)

Results for branching random walks have been applied to the height (and other properties) of random trees in two different ways. In the original application of Devroye [32], see also [33; 37], the "position" is what we have called time in the Crump-Mode-Jagers branching process, while "time" is the number of the generation, i.e., the depth in the family tree T_t. This means that "time" is discrete and that we consider a Galton-Watson process where each individual has a position that is its time of birth in the Crump-Mode-Jagers process studied elsewhere in the present paper. (Furthermore, in this application, the Galton-Watson process is deterministic; in the original application to binary search trees, we consider an infinite binary tree.) Note that H(T_t) ≥ n if and only if the minimum position of an individual in generation n is ≤ t, which gives the required connection with the theorems on branching random walks.
The alternative approach, described by Biggins [15], reverses the two coordinates and lets "time" be time in the Crump-Mode-Jagers branching process while "position" is the generation number, i.e., the depth in the family tree. The offsets η_i are thus non-random, with η_i = 1. (We sometimes reverse signs and take η_i = −1.) We use this approach in the present section, referring to [15] for further details on branching random walks and to [14] for proofs of the theorems used here. One of the main results of Biggins [14; 15] is the following (valid for general η_i under some conditions that are satisfied in our case, cf. Remark 13.22 below).

Theorem 13.1 (Biggins [14; 15]). As t → ∞, B_t/t → γ almost surely.

In our case H(T_t) = B_t, so this yields the asymptotic height of T_t; this translates to the height of T_n = T_{τ(n)} as follows.

Theorem 13.2. As n → ∞, H(T_n)/log n → γ/α almost surely.

Remark 13.3. The fragmentation trees in Section 9 are of a slightly different type than the trees T_n that are our main object of study, since they appear as the family tree T_t stopped at a fixed time t = log(x_0/x_1) instead of a random time τ(n), see Remark 9.2. This means that asymptotics for the height of fragmentation trees follow directly from Theorem 13.1 rather than from Theorem 13.2. In this section we usually consider only trees of the type T_n, and leave corresponding results for fragmentation trees to the reader.
Remark 13.4. Also the split trees defined by Devroye [38], see Remark 9.7, are in general not exactly of the type of trees studied here, but for the purpose of studying the height, they can be approximated by fragmentation trees and similar results can be obtained, see Broutin and Devroye [21] and Broutin, Devroye and McLeish [22].

In our case, when (13.2) holds, this simplifies to

γ = inf{a > 0 : inf_{θ>0} (θ/a + log µ(θ)) < 0},  (13.8)

and thus γ can be computed from the Laplace transform µ. Geometrically, (13.8) says that −γ^{−1} is the slope of the tangent from the origin to the curve log µ(θ), θ > 0, provided such a tangent exists. (Otherwise, −γ^{−1} is the slope of the asymptote, as follows from Lemma 13.5(ii) and Remark 13.6 below.) Analytically, γ can be found as follows.
On the other hand, if (13.9) has no positive solution, then g(θ) has a fixed sign in I. Since g(α) = α(log µ) ′ (α) < 0, g(θ) < 0 for all θ ∈ I and f (θ)/θ is strictly decreasing. Thus, the infimum in (13.8) is the limit as θ → ∞, which yields the first equality in (13.11). The final equality is a straightforward property of Laplace transforms.
Remark 13.6. The case (ii) in Lemma 13.5 is exceptional. We see from (13.11) that µ has no mass in [0, γ^{−1}), so no child is ever born to a parent of age less than γ^{−1}. Moreover, by (13.8), µ(θ) ≥ e^{−γ^{−1}θ} for all θ > 0, and it follows easily that µ{γ^{−1}} ≥ 1, so µ has a point mass at γ^{−1}. This case is thus exceptional, and does not appear in any of our examples.

Furthermore, (13.23)–(13.26) hold, where the last two cases apply only when x̄ > x̄_+.

Proof. (i): The log-convexity of µ is well known and follows from Hölder's inequality. The remaining statements are also well-known properties of Laplace transforms, and follow easily from the definition (5.3), using monotone and dominated convergence together with (A1) and (A5) (or (A4)) for (13.23)–(13.24), and simple estimates for (13.25)–(13.26); note also that (A2) implies that µ is not concentrated at one point.
Example 13.11. A somewhat more complicated example is the m-ary search tree in Section 7.1 or 7.2. (For this example, it does not matter whether we include external nodes or not, since this only changes the height by 1. Furthermore, µ is the same for both versions, so the calculations are the same.) This was originally treated by Devroye [34], see also Mahmoud [93], Pittel [112], Biggins [15] and Devroye [37].
Example 13.14. For the fragmentation tree in Example 9.4, we have a branching process that differs from the one for the binary search tree in Example 13.9, but the intensity µ is the same, so all calculations in Example 13.9 are valid for this tree too. Thus, see Theorem 13.1 and Remark 13.3, H(T_t)/t → γ a.s., with γ given by (13.43). Furthermore, if we stop at n nodes as in Remark 9.2, H(T_n)/log n → γ a.s., just as for the binary search tree. More precise results for the height of this fragmentation tree, and m-ary generalizations of it, are given by Chauvin and Drmota [26].
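For the binary search tree, and hence also for the fragmentation tree just discussed, the limit in Theorem 13.2 is the well-known height constant c* = 4.3110..., going back to Devroye [32]. Convergence is notoriously slow (the second-order term is of order log log n), so a simulation at moderate n only shows the ratio drifting upwards towards c*. A sketch (standard binary search tree insertion; the code and parameters are ours, for illustration):

```python
import math, random

# Height of a random binary search tree on n keys, built by standard
# insertion of i.i.d. uniform keys (illustrative code).
random.seed(3)
n = 100_000
root = {"key": random.random(), "left": None, "right": None}
height = 0
for _ in range(n - 1):
    key = random.random()
    node, depth = root, 0
    while True:
        depth += 1
        side = "left" if key < node["key"] else "right"
        if node[side] is None:
            node[side] = {"key": key, "left": None, "right": None}
            break
        node = node[side]
    height = max(height, depth)

ratio = height / math.log(n)
print(ratio)   # noticeably below the limit c* = 4.311... at this n,
               # reflecting the log log n second-order correction
```

The slow drift is exactly the phenomenon quantified by the second-order results mentioned above (see [33; 37]).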
13.2. Moment convergence. We can also obtain moment convergence in Theorem 13.2, in particular convergence of the expectation E H(T_n)/log n to γ̄, at least if we assume the following additional condition on the birth times for an individual in the Crump-Mode-Jagers process.

(A*) There exists δ > 0 such that E e^{δξ_2} < ∞.

In other words, each individual gets at least two children (N ≥ 2), and the age at which the second child is born has an exponential moment. (Equivalently, it has exponentially decreasing tails.) The condition (A*) is satisfied in all examples in Sections 6-8, since ξ_2 is the sum of one or several exponential waiting times.
Remark 13.15. We use (A*) in the proof of Lemma 13.17 below. Some extra condition is clearly needed for Lemma 13.17 (at least E e^{δξ_1} < ∞ for some δ > 0, since τ(n) ≥ ξ_1 if n > 1 and, say, ψ(t) = 1). However, we do not know whether (A*) really is needed for Lemmas 13.18 and 13.19 and for Theorem 13.20. In fact, we conjecture that Theorem 13.20 holds assuming only (A1)-(A5) and (A6ψ).
We begin with some lemmas. The first two are stated somewhat more generally than actually needed here. By (13.31), we can choose c_r such that α*(c_r) < −r, and then (13.58) yields log η_t[c_r t, ∞) ≤ −rt for large t. This yields the result (13.57). Moreover, the stopping times τ^{(i)}(n) are independent, and have the same distribution as τ(n); hence (13.60) implies the desired bound.

The next lemma will immediately be improved in Lemma 13.19. Lemma 13.19 is trivially true for n = 1 too; however, we assume n ≥ 2 since, as said in Section 5, in principle we do not require n to be an integer; any real positive n is possible. (We use this in the proof of Lemma 13.19 below, for convenience, when we do not round m to an integer.) This completes the proof, since (13.66) is trivial for small n.

In particular, E H(T_n)^r / log^r n → γ̄^r, r > 0. (13.73)

Proof. Let X_n := H(T_n)/log n. By Lemma 13.19, for n ≥ 2 and t ≥ 1, P(X_n ≥ Ct) ≤ n^{−t} 2^{−t²} ≤ 2^{1−t}, which obviously holds also for t < 1. Hence, for r > 0, each moment E X_n^r is uniformly bounded for n ≥ 2. As is well known, this implies uniform integrability of X_n^r for each r, and thus also of |X_n − γ̄|^r; since X_n → γ̄ a.s., the convergence holds in L^r as well.

In particular, Theorem 13.20 shows that E H(T_n) ∼ γ̄ log n, and similarly for higher moments, in Examples 13.8-13.13. We obtain also corresponding results for the fragmentation tree in Example 13.14, using Lemma 13.16.
Remark 13.21. It follows from (13.72), with r = 2, that the variance E|H(T_n) − E H(T_n)|² = o(log² n). In the case of a binary search tree, Reed [113] showed the much sharper result that E|H(T_n) − E H(T_n)|² = O(1); this was extended to higher central moments and to m-ary search trees by Drmota [45].

13.3. Saturation level. In this subsection we will often assume that the random tree is m-ary; more precisely, that N = m for some (non-random) integer m, i.e., every individual in the branching process gets m children. (There is no risk of confusion between the integer parameter m and the function m in (13.1); they never appear together.) We call this the m-ary case in the present section. (We have previously defined an m-ary tree to be a tree where the children of each node have labels in the set {1, ..., m}. In the present section, such labels are irrelevant, as is the order of the children, so we can use this simpler definition.) In the m-ary case, the infinite complete family tree T_∞ is thus a complete m-ary tree; however, we are interested in the trees T_t for finite t, and in particular in T_n, and there the outdegrees may be smaller than m (but never larger); note that any given node will get m children eventually (i.e., for large t or n). As said above, the saturation level S(T_n) is defined as the last level (generation) k where all possible m^k nodes exist; equivalently, it is the first generation where some node has fewer than m children.
We study the saturation level in basically the same way as the height in the preceding subsection, but now using a feature of Biggins [14, 15] that was not needed above: Let χ be a 0-1 characteristic, i.e. a characteristic that takes the values 0 and 1 only (excluding the trivial case when a.s. χ(t) = 0 for all t ≥ 0), and now let B_t be the maximum of the positions y_x of all individuals x born at time σ_x ≤ t such that χ_x(t − σ_x) = 1. (I.e., only individuals with χ = 1 count.) Then Theorem 13.1 still holds, for general η_i, provided the two conditions (B1) and (B2) in (13.75) are satisfied [14; 15]. (The property (B2) is called well-regulated in [15].)

Remark 13.22. The case considered in Section 13.1 above is the special case when χ(t) = 1 for all t ≥ 0, and further η_i = 1. We noted (B1) in Lemma 13.7(ii), and (B2) is trivial (for this choice of η_i) since α(ζ) ≥ α > 0 when ζ < 0, as also noted in Lemma 13.7(ii).
Consequently, Theorem 13.1 applies to our η_i = −1 and χ in (13.76), which yields (13.85), and the result follows by (5.17) as in the proof of Theorem 13.2.
As a corollary, we obtain moment convergence and convergence in L r as in Theorem 13.20, assuming also (A*).
Theorem 13.26. Assume (A1)-(A5), (A6ψ) and (A*), and N = m. Then the convergence in (13.83) holds also in L^r for every r > 0; in particular, the moments converge.

Proof. Since S(T_n) ≤ H(T_n) + 1, Lemma 13.19 holds also for S(T_n); hence the result follows from Theorem 13.25 by the argument in the proof of Theorem 13.20. (In fact, in all cases of m-ary trees with |T_n| = n, and in many other cases, e.g. for m-ary search trees, S(T_n) ≤ C log n deterministically for some C; then the results follow from Theorem 13.25 by dominated convergence, without using (A*) and Lemma 13.19.)

Remark 13.27. We see that (B2) (or some similar condition) is needed for Theorem 13.1; if we let η_i = −1 as above but take χ(t) = 1 for all t ≥ 0 (as in Section 13.1), then obviously B_t = 0, and B_t/t does not converge to γ (in general), so Theorem 13.1 does not hold.
We give some formulas for γ − , similar to Lemma 13.5.
13.4. Profile. We have in this section so far considered the height and the saturation level, which are the maximum and minimum depths of nodes, in the latter case considering only nodes that are not full, i.e., with fewer than the maximum number of children. The results of Biggins [14, 15] also yield asymptotics for the whole profile, i.e., for the number of nodes at each depth. One of the main results of Biggins [14, 15] is the following, which we for convenience state first in the original form (valid for general η_i, with the corresponding α*).
Theorem 13.36 (Biggins [14, 15]). Suppose that η_i and χ satisfy the conditions of [14, 15]; then the uniform convergence (13.112) holds a.s.

We shall use the following version of Theorem 13.36, for the special cases η_i = ±1 of interest to us.
Remark 13.39. Theorem 13.38 does not hold (in general) for x > γ/α. Indeed, for such x, by Theorem 13.2, a.s. H(T_n) < x log n for large n and thus n_{x log n}(T_n) = 0 and log n_{x log n}(T_n)/log n = −∞, while typically α*(αx) > −∞, see (13.20). Furthermore, Theorem 13.38 fails for x = γ/α too, as is seen for example by the binary search tree, where it follows from Biggins [12] that H(T_n) − (γ/α) log n → −∞ a.s., and thus a.s. n_{(γ/α) log n}(T_n) = 0 for all large n; see [101] and [113] for more precise results.

For the m-ary search tree in Section 7.1, we may similarly consider external nodes (i.e., nodes without a key) only; this yields the external profile.
Moreover, ᾱ*(x) is, by the definition (13.123), a non-negative majorant of α*(−x), and we have just shown that it is concave. It follows easily from (13.123) that it is the least concave non-negative majorant.
We now return to the characteristic χ = 1. By (13.149), the corresponding estimate holds for t ≥ t_0. With t_0 as above, we thus have by (13.157) a.s., for t ≥ t_0 and all s ≥ 0, and consequently for all x ≥ 0, the stated bounds. Hence, a.s. there exists t_1 ≥ t_0 such that these bounds hold for all t ≥ t_1 and all x ≥ 0; consequently, the upper bound holds a.s., uniformly for all x ≥ 0. For a lower bound, consider first x > 0. For a given y ≥ x > 0 and any t > 0, let t′ := tx/y. Then t′ ≤ t, and the corresponding estimate holds. Consequently, letting t → ∞ and using (13.152), we obtain the lower bound a.s.

Remark 13.44. The lower part of the profile is thus described by the function α*(x) and not by ᾱ*(−x). (By Lemma 13.42(v), there is a difference only if A_− < 0, and only for x < γ*.) However, if we count only nodes described by a 0-1 characteristic χ for which (B2) holds, then (13.152) holds. For example, this applies to non-full nodes in an m-ary search tree, using the characteristic (13.76); we use this in the proof of Theorem 13.57 below. Similarly, it applies to external nodes in an m-ary search tree, and to leaves in any tree generated by a branching process with N ≥ 1 (using the argument in the proof of Theorem 13.25 to verify (B2)).

Equivalently, a.s., the corresponding estimate for T_n holds uniformly for k/log n ∈ [δ, γ*/α], for any δ > 0. We can interpret this as saying that, a.s., a "large part" (in a logarithmic sense) of all possible nodes of depth k are filled for k ≤ (γ*/α) log n, but not for (substantially) larger k, as a consequence of (13.130). Compare with Theorem 13.25, which says that all possible nodes are filled up to depth S(T_n) ∼ γ̄_− log n a.s., and note that γ̄_− < γ*/α by (13.128), except in the trivial case γ* = 0. A stronger result will be proved in Theorem 13.57. Note that ᾱ*(x) = α*(x) for 0 ≤ x ≤ β^{−1}, see (13.30) and (13.129).
log n_k(T_t)/t −→ α*(x), (13.178) uniformly for x in any compact subset of (0, γ), and, if A_− ≤ 0, uniformly for x in any compact subset of [0, γ). This implies (13.177) by the usual argument using (5.17). Furthermore, since α*(x) is continuous on [0, γ), it suffices to consider the case when xt is an integer, i.e., to prove (13.178) with k = xt uniformly for k/t in any compact subset of (0, γ) or [0, γ), respectively.

The upper bound is easy. Since n_k(T_t) is bounded by the corresponding counts in Theorems 13.43 and 13.38, we get the upper bound lim sup_{t→∞} (log n_k(T_t)/t − α*(k/t)) ≤ 0, (13.180) or more precisely this follows from (13.162) and (13.120), uniformly for 1 ≤ k ≤ x′t for any x′ < γ.
The general case follows by pruning the branching process and the corresponding tree T_t. Let M be a large integer and let each individual keep at most M children, for example by discarding all children after the first M of every individual. We denote the pruned tree by T_t^{(M)}, and write similarly µ^{(M)} and so on for other quantities for the pruned version. Note that, by monotone convergence and (5.7), µ^{(M)}(θ) → µ(θ) as M → ∞, (13.190) for every θ. It is easy to verify that (A1)-(A5), except possibly (A2), hold for the pruned process too, provided M is large enough. Furthermore, also (A2) holds except in some exceptional cases; in those cases we can modify the pruning by selecting the M surviving children by a suitable random procedure that preserves both (A2) and (13.190); we omit the details. (Alternatively, it seems probable that (A2) is not really needed for the proofs in the present section, but we have not verified this in detail.) Moreover, T_t^{(M)} is a subtree of T_t, and thus the lower bound follows by (13.189) for the special case just treated.

Lemma 13.50. Let α*^{(M)}(x) be α* for the branching process pruned to at most M children for each individual, using a pruning such that (13.190) holds. If 0 < x < γ, then, as M → ∞, α*^{(M)}(x) → α*(x), (13.192) and this holds uniformly for x in any compact subset of (0, γ); if A_− ≤ 0, it holds uniformly in any compact subset of [0, γ).

The profile is asymptotically given by (13.177), with α = ρ + 1, uniformly for x in any compact subset of (0, γ); note that (13.177) does not hold for x = 0, since α*(0) = 1 while n_0(T_n) = 1. However, this is the only exception, and (13.177) extends, uniformly, to all x ≤ x_1 < γ/α satisfying the obvious condition x log n ≥ 1, see Theorem 13.55 and Example 13.56 below. (More precise results, obtained by other methods, in the case ρ = 1, i.e. the plane oriented recursive tree in Example 6.5, are given by [70] and [120].)

The following example shows that if A_− > 0, then the restriction in Theorem 13.49 that x lies in a compact subset of (0, γ), and thus stays away from 0, is necessary; the theorem does not hold in general for x = x(n) → 0, even if we assume k = x log n ≥ 1 or x log n → ∞.
Example 13.54. Let (t_j) be a rapidly increasing sequence with t_1 = 1 and t_{j+1} > t_j + 1, j ≥ 1. Let each individual get ⌊e^{t_j} − e^{t_{j−1}}⌋ children, born at uniformly random times in [t_j − 1, t_j], for each j ≥ 1. (Let here t_0 := −∞, say.) Then µ(θ) = ∞ for θ ≤ 1, while µ(θ) < ∞ for any θ > 1. Hence, A_− = 1. It is easy to verify (A1)-(A5). At time t = t_{j+1} − 1, each individual has at most e^{t_j} children, and thus n_k(T_t) ≤ e^{k t_j} for every k ≥ 0; hence (13.215) follows. For any given function ω(t) with ω(t) = o(t) as t → ∞, we can choose (t_j) such that ω(t_{j+1})/t_{j+1} < 1/(j t_j), and then (13.215) shows that lim inf_{t→∞} log n_k(T_t)/t = 0 (13.216) uniformly for k ≤ Cω(t), for any fixed C. As a consequence, using the weight ψ = 1 in Example 5.3 (and assuming, as we may, that ω(t)/t is decreasing), it follows that lim inf_{n→∞} log n_k(T_n)/log n = 0 (13.217) uniformly for k ≤ ω(log n). In particular, (13.177) does not hold for all x = x(n) → 0, so it does not hold uniformly for x > 0 (even assuming x log n ≥ 1). The same argument applies also to n̄_k and shows that (13.146) does not hold uniformly for all x > 0 such that x log n ≥ 1.
In Example 13.54, the birth times are distributed very irregularly, and (13.177) fails already for depth x log n = 1, the children of the root. In more regular cases, (13.177) holds for depth 1; the following theorem shows that then it holds for all depths x log n ≥ 1 with x ≤ x′ for some x′ < γ/α.

Theorem 13.55. Suppose that (13.218) holds as t → ∞. Then a.s., for any x_1 < γ, (13.219) holds as t → ∞.

Proof. Say that an individual x in the branching process is good if (13.220) holds for the subtree rooted at x (with time measured from the birth of x), and let G(t) be the number of children of the root in T_t that are good. Since the total number of children of the root is Ξ([0, t]), and each is good with probability at least 1/2, and these events are independent of each other, the law of large numbers implies that a.s. G(t) ≥ (1/3)Ξ([0, t]) for large t.
Let 0 < δ < 1. We have seen that a.s., for large t, the number of good children of the root born before time (1 − δ)t is bounded below as in (13.221), using the assumption (13.218). Each of these good children sprouts a tree that at time t has age at least δt and thus, by the definition of good, has at least one node in each of the following ⌈bδt⌉ generations (provided t > t_0/δ). Hence, for large t, if 1 ≤ k ≤ bδt, then n_k(T_t) ≥ G((1 − δ)t), and thus (13.222) follows by (13.221). Let ε > 0. Since α*(x) is continuous, we can find some small δ > 0 such that α*(x) < α*(0) + ε if 0 ≤ x ≤ bδ. We may furthermore assume that δ < ε/(A_− + 1), and then (13.222) implies (13.223), with x_0 = bδ, for 1 ≤ k ≤ x_0 t. For any x_1 < γ, the same inequality (13.223) holds, for large t, also for k/t ∈ [x_0, x_1] by (the proof of) Theorem 13.49, so (13.223) extends, for large t, to 1 ≤ k ≤ x_1 t. Together with the upper bound (13.180), this yields (13.219). The usual argument using the stopping times τ(n) yields (13.177) uniformly for 1 ≤ x log n ≤ x′ log n, for any x′ < γ/α.
Example 13.56. For the linear preferential attachment in Example 13.53, the children are born according to a pure birth process with birth rates λ_k = k + ρ. From the birth of the first child, this process stochastically dominates the Yule process (Y_t) in Example A.3 (with the standard rate α = 1), which is a birth process with rates λ_k = k. Conversely, the process describing the births is dominated by a Yule process started with ⌈ρ⌉ individuals, i.e., the sum of ⌈ρ⌉ independent Yule processes. It follows from (A.3) that a.s. the assumption (13.218) of Theorem 13.55 is satisfied.

In the m-ary case, we can give a sharper result for the lower part of the profile, showing that in the range k < (γ*/α) log n, the estimate m^{k+o(log n)} in (13.170) in Example 13.45 can be improved to m^{k(1+o(1))}, at least in the case A_− < 0 (i.e., when the birth times ξ_i have some exponential moment).

Theorem 13.57. In the m-ary case N = m, assume (A1)-(A5) and (A6ψ), and also A_− < 0. Then 0 < γ_− < γ*, and for every x_1 ∈ (γ_−, γ*], the convergence holds a.s., uniformly for k ≤ (γ*/α − ε) log n.
Example 13.59. For the m-ary search tree, we have α = 1 and γ* = 1/H_{m−1}, where H_{m−1} as usual is a harmonic number. Thus, Corollary 13.58 and, in more detail, Theorem 13.57, show that most possible nodes exist up to depth ≈ γ* log n = H_{m−1}^{−1} log n, but not further. In contrast, by Theorem 13.25, all possible nodes exist only up to depth ≈ γ_− log n, where γ_− < γ*. In the binary case, γ_− ≐ 0.37336 by Example 13.34, while γ* = 1.
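Both binary constants quoted above can be recomputed numerically: by the classical results of Devroye (cf. Examples 13.9 and 13.34), the height constant ≈ 4.311 and the saturation constant γ_− ≈ 0.37336 of the binary search tree are the two positive roots of c log(2e/c) = 1. A bisection sketch (illustrative code of ours):

```python
import math

def f(c):
    # f(c) = c*ln(2e/c) - 1; its two positive roots are the binary search
    # tree height constant and the saturation (fill-up) constant.
    return c * math.log(2 * math.e / c) - 1

def bisect_root(lo, hi, tol=1e-12):
    # Plain bisection; f must have opposite signs at lo and hi.
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (flo > 0):
            lo, flo = mid, f(mid)
        else:
            hi = mid
    return (lo + hi) / 2

gamma_minus = bisect_root(0.01, 1.0)   # f(0.01) < 0 < f(1)
gamma = bisect_root(2.0, 6.0)          # f(2) > 0 > f(6)
print(gamma_minus, gamma)  # approximately 0.37336 and 4.31107
```

The same equation in the form c log(2e/c) = 1 appears throughout the literature on binary search tree heights; the larger root governs the height, the smaller one the fill-up level.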
In (13.247), we consider the conditional distribution and conditional moments given T n . (This is the quenched version, see Remark 4.1.) We can also consider the unconditional distribution and moments (i.e., the annealed version); this means that we first sample a random T n and then a random node v in T n , and consider the depth of v in T n . Remark 13.63. For an m-ary search tree, we can also consider the depth of a uniformly random key. Since the number of keys per node is bounded, and (5.5) (with φ = ψ) holds, it is easy to see that Theorem 13.61 holds in this setting too.
Similarly, one might consider e.g. the depth of a random leaf in T n .
Jensen's inequality with the strictly convex function x log x yields, using (5.40), the inequality αβ ≤ log m, with strict inequality since ξ̄ is not concentrated at a single value by (A2). Thus αβ < log m, and (αβ)^{−1} > 1/log m. Compare Remark 13.60.
13.6. Total path length. The total path length L(T) is defined (for any rooted tree T) as the sum of the depths of all nodes:

L(T) := Σ_{v∈T} h(v).  (13.255)

The total path length is closely connected to the typical depth h*(T) studied in Section 13.5; it follows from the definitions of h*(T) and L(T) that, for any fixed rooted tree T, E h*(T) = L(T)/|T|. For a random tree, we thus obtain the same result for the conditional expectation: E(h*(T_n) | T_n) = L(T_n)/|T_n|.

Example 13.67. For the binary search tree in Example 6.2, we use the weight ψ = 1 and thus m_ψ = 1; furthermore α = 1 and β = 1/2. Hence, L(T_n) ∼ 2n log n a.s. In this case, much more detailed results are known, see e.g. [114], [115] and [116].
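The first-order asymptotics L(T_n) ∼ 2n log n in Example 13.67 can be compared with the classical exact mean E L(T_n) = 2(n + 1)H_n − 4n for the binary search tree (a standard fact, not proved in this survey). The sketch below estimates the mean by simulation; code and parameters are ours, for illustration:

```python
import math, random

def total_path_length(keys):
    """Total path length (sum of node depths, root at depth 0) of the
    binary search tree built by inserting `keys` in order."""
    root = None
    total = 0
    for key in keys:
        if root is None:
            root = {"key": key, "left": None, "right": None}
            continue
        node, depth = root, 0
        while True:
            depth += 1
            side = "left" if key < node["key"] else "right"
            if node[side] is None:
                node[side] = {"key": key, "left": None, "right": None}
                break
            node = node[side]
        total += depth
    return total

random.seed(5)
n, trials = 500, 200
avg = sum(total_path_length([random.random() for _ in range(n)])
          for _ in range(trials)) / trials
H_n = sum(1 / k for k in range(1, n + 1))
exact = 2 * (n + 1) * H_n - 4 * n        # classical formula for E L(T_n)
print(avg, exact)   # the two values agree to within a few percent
```

Note that 2(n + 1)H_n − 4n = 2n log n + O(n), so at moderate n the exact mean is still visibly below 2n log n; this is the same slow first-order convergence seen for the height.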
Remark 13.68. We can similarly study versions of the path length with the summation in (13.255) restricted to a subset of the nodes v, for example just summing over the leaves. We leave the details to the reader. In an extended m-ary search tree, two standard examples are the internal path length and the external path length. Another version for the m-ary search tree is the sum of the depths of all keys, cf. Remark 13.63, see e.g. [5] and [92]. This version is more natural for m-ary search trees, since it is the natural measure of the efficiency of the corresponding sorting algorithm. The total path length summed over all nodes was considered in e.g. [65] for studying cuttings in split trees (there the m-ary search tree was given as one example). Both versions of the path length, i.e., the sum over all keys and the sum over all nodes, respectively, were considered in e.g. [24] (for the analysis of general split trees).
As said in the introduction, it is natural to try to show asymptotic normality of the number of fringe trees of a given type. There are several previous results of this type for special cases. Central limit laws for fringe trees have been shown, by several different methods, for binary search trees in e.g. [35], [39], [54], [55], [25] and [66], and for random recursive trees in e.g. [55] and [66]. For m-ary search trees, the situation is more complicated: no results for general fringe trees have been published (this is work in progress [68]), but some special cases (such as the degree distribution and the number of fringe trees of a given size) and related quantities (the number of internal nodes) have been treated, and it turns out that central limit theorems hold for m ≤ 26 but not for m ≥ 27, see e.g. [95], [93], [90], [30], [69], [29], [53], [74] and [67]. Further examples of asymptotic normality include the degree distribution of plane oriented recursive trees (preferential attachment trees, see Example 6.5) [8], [96], [74], [75], and the number of internal nodes in median-of-(2ℓ + 1) binary search trees for ℓ ≤ 58, but not for ℓ ≥ 59, see [30], [31].
The examples of m-ary search trees and median-of-(2ℓ + 1) binary search trees thus show that central limit theorems do not always hold for fringe trees of the random trees generated by Crump-Mode-Jagers branching processes as in the present paper.
Problem 14.1. Find a characterization of the Crump-Mode-Jagers processes that yield asymptotic normality for the number of fringe trees of a given type.
Using the methods of Section 5, Problem 14.1 can be seen as a special case of the following problem for branching processes:

Problem 14.2. Find a characterization of the Crump-Mode-Jagers branching processes such that, for suitable characteristics φ and ψ, and with τ(n) as in Section 5, Z^φ_{τ(n)} is asymptotically normal as n → ∞.

Problem 14.2 considers a stopped branching process. It is closely related to the following problem for fixed times:

Problem 14.3. Find a characterization of the Crump-Mode-Jagers branching processes such that, for suitable characteristics φ, Z^φ_t (suitably normalized) is asymptotically normal as t → ∞.

This problem has been studied, at least for some branching processes. Asmussen and Hering [2, Theorems VIII.3.1 and VIII.12.1] give a central limit theorem of this type for a somewhat different class of branching processes, viz. multi-type Markov branching processes. In principle, as pointed out in [2], this class includes the Crump-Mode-Jagers branching processes studied here (with the "type" taken as the entire previous history of the individual), but the resulting type space is typically so large that the technical conditions in [2] are not satisfied. (In particular, "Condition (M)".) However, for the Crump-Mode-Jagers branching processes used in the examples above, with life histories that are composed of one or several independent waiting times, the process can be described using a finite-dimensional type space. It seems that the results in [2] then apply and can be translated to conditions for these Crump-Mode-Jagers branching processes. Presumably, the same conditions then apply to Problems 14.1 and 14.2 too, but that remains an open problem.
Moreover, it seems likely that the same type of conditions apply to much more general Crump-Mode-Jagers branching processes. The conditions in [2] are stated in terms of eigenvalues of a certain operator A defined by the process, and the result says (under some technical assumptions) that if λ_1 is the largest eigenvalue of A (this eigenvalue is real), then we have asymptotic normality if every other eigenvalue λ has Re λ < λ_1/2, but (typically, at least) not otherwise. The same condition also appears in the different but closely related context of generalized Pólya urns, see [74]. We conjecture that this condition (in a suitable form) applies to rather general Crump-Mode-Jagers branching processes. This has been proved in the discrete-time case [79], but the continuous-time case relevant here is more challenging.
Remark 14.4. In contrast, for conditioned Galton-Watson trees (see Remark 1.1), asymptotic normality for fringe trees holds in general, see [77]. (Such trees are not treated in the present paper.)

Appendix A. Pure birth processes

Example A.2. As a special case (see Example 6.1), the counting process Ξ[0, t] corresponding to a Poisson process Ξ with intensity 1 is a pure birth process with constant intensity λ_k = 1, started at 0. More generally, a pure birth process with constant birth rate λ_k = λ, started at 0, is a Poisson process with intensity λ. (We have earlier defined a Poisson process as a point process; the corresponding pure birth process considered here is also called a Poisson process. There is an obvious equivalence between the two points of view, and hardly any risk of confusion.)

Example A.3. The Yule process in e.g. Example 6.1 is a pure birth process with birth rates λ_k = k, started at 1. More generally, for a Crump-Mode-Jagers process where each individual gets children according to a Poisson process with intensity α > 0, the total size (the number of individuals Z_t) is a pure birth process with birth rates λ_k = αk, started at 1; we call this a Yule process with rate α > 0. (It evidently differs from the standard case α = 1 only by a simple change of time.) If (Y_t) is a Yule process with rate α, it thus follows from (5.9) that e^{−αt} Y_t → W a.s. for some random variable W. (Note that the intensity measure µ is α dt, so (5.4) holds and the Malthusian parameter α equals the rate α.) It is easy to verify (5.11) and thus W > 0 a.s.; in fact, it follows from (A.23) below that W ∼ Exp(1). (This is one of the few cases with a simple explicit distribution for the limit W.) We state a general result on stopping a pure birth process by an exponential clock τ.
Theorem A.4. Let (X_t) be a pure birth process with birth rates λ_k ≥ 0, started at X_0 = 0. Furthermore, let τ ∼ Exp(α) be independent of the birth process. Then X := X_τ has the distribution

P(X = k) = (α/(α + λ_k)) ∏_{j=0}^{k−1} λ_j/(α + λ_j), k ≥ 0. (A.4)

We give two different proofs (both simple) to illustrate different ways of arguing with exponential random variables; the first proof is more directly probabilistic and the second more analytic. (The second proof is essentially the same as (6.13)-(6.14) given in Example 6.3; it was there given for a special case, but the argument is general, as is shown below.)

First proof of Theorem A.4. Regard τ as an exponential random clock that strikes and stops the process. When X_t = k and τ > t, so the process has not yet stopped, the next event that happens is either that the clock strikes (rate α), and then X = k, or that X_t jumps to k + 1 (rate λ_k), and then X > k. Consequently,

P(X = k | X ≥ k) = α/(α + λ_k), (A.5)
P(X ≥ k + 1 | X ≥ k) = λ_k/(α + λ_k), (A.6)

and (A.4) follows by multiplying these relations.
Second proof of Theorem A.4. With the notation above, we have by (A.2)

P(X ≥ k) = P(X_τ ≥ k) = P(S_k ≤ τ). (A.8)

Conditioning on S_k, we have P(S_k ≤ τ | S_k) = e^{−αS_k}, and taking the expectation we find, using (A.1) and independence of Y_1, . . . , Y_{k−1},

P(X ≥ k) = E e^{−αS_k} = ∏_{j=0}^{k−1} λ_j/(α + λ_j). (A.9)

The result follows by taking the difference P(X ≥ k) − P(X ≥ k + 1).
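As a numeric sanity check of the stopping construction, the following sketch simulates a pure birth process killed by an independent Exp(α) clock and compares the empirical distribution of X_τ with the product formula P(X = k) = (α/(α + λ_k)) ∏_{j&lt;k} λ_j/(α + λ_j) from the first proof. The rates λ_k = k + 1, the value α = 1 and the function names are our own illustrative choices, not from the text.

```python
import random

def stopped_pure_birth(lam, alpha, rng):
    """Run a pure birth process started at 0 with rates lam(k),
    stopped by an independent Exp(alpha) clock; return X_tau."""
    k, t, clock = 0, 0.0, rng.expovariate(alpha)
    while True:
        t += rng.expovariate(lam(k))  # Exp(lam(k)) waiting time in state k
        if t > clock:
            return k
        k += 1

def pmf(k, lam, alpha):
    """P(X = k) from Theorem A.4 (product over the first proof's relations)."""
    p = alpha / (alpha + lam(k))
    for j in range(k):
        p *= lam(j) / (alpha + lam(j))
    return p

rng = random.Random(1)
lam = lambda k: k + 1          # illustrative Yule-type rates
alpha = 1.0
n = 100_000
samples = [stopped_pure_birth(lam, alpha, rng) for _ in range(n)]
for k in range(4):
    print(k, samples.count(k) / n, pmf(k, lam, alpha))
```

For these rates the formula simplifies to P(X = k) = 1/((k + 1)(k + 2)), so the empirical frequencies should be close to 1/2, 1/6, 1/12, . . .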
We consider in particular the linear case, when

λ_k = χk + ρ, (A.10)

for some constants χ and ρ. (As in Example 6.4, only the ratio χ/ρ matters, up to a change of time scale, so we might assume χ ∈ {1, 0, −1}, but we shall not require this.) Note that Examples A.2 and A.3 both are of this type, with (χ, ρ) = (0, ρ) and (1, 0) (or (α, 0)), respectively. Note that ρ = λ_0 > 0, while χ can be any real number. As in Example 6.4, if χ < 0, we have to assume that ρ = m|χ| for a non-negative integer m (and X_0 ≤ m); then λ_m = 0 and the process stops when it reaches m, so the values λ_k, k > m, can be ignored.
In the special case χ = 0 (so λ_k = ρ is constant), we have instead

P(X = k) = (α/(α + ρ)) (ρ/(α + ρ))^k, k ≥ 0.

Thus, in this case X has the geometric distribution Ge_0(α/(α + ρ)).
Proof. An immediate consequence of Theorem A.4.
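The geometric law in the constant-rate case can also be checked from the mixed-Poisson representation: stopping a rate-ρ Poisson process at an independent Exp(α) time means mixing Po(ρt) over t, and the mixture should be Ge_0(α/(α + ρ)). A small numeric sketch; the trapezoid rule, truncation T and parameter values are ad hoc choices of ours.

```python
import math

def mixed_poisson(k, rho, alpha, T=60.0, n=100_000):
    """P(X = k) = integral_0^inf alpha e^{-alpha t} Po(rho t){k} dt,
    approximated by the trapezoid rule on [0, T]."""
    h = T / n
    f = lambda t: (alpha * math.exp(-alpha * t)
                   * math.exp(-rho * t) * (rho * t) ** k / math.factorial(k))
    return h * (f(0) / 2 + sum(f(i * h) for i in range(1, n)) + f(T) / 2)

rho, alpha = 2.0, 3.0
p = alpha / (alpha + rho)
for k in range(5):
    print(k, mixed_poisson(k, rho, alpha), p * (1 - p) ** k)  # Ge_0(p) pmf
```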
In the linear case, it is also easy to find the distribution of X t for a fixed t. We begin with the expectation.
Theorem A.6. Let (X_t) be a pure birth process with birth rates λ_k = χk + ρ as in (A.10), for some χ and ρ, started at X_0 = x_0. Then, for every t ≥ 0,

E X_t = x_0 e^{χt} + ρ (e^{χt} − 1)/χ, (A.13)

interpreted as x_0 + ρt when χ = 0. Consequently, for every χ and every t ≥ 0,

(d/dt) E X_t = (ρ + χx_0) e^{χt}. (A.14)

Proof. Since X_t grows with a rate that is a linear function χk + ρ of the current state k, its expectation E X_t grows at rate χ E X_t + ρ, i.e.,

(d/dt) E X_t = χ E X_t + ρ. (A.15)

This differential equation, with the initial value E X_0 = x_0, has the solution (A.13). (The reader that finds this argument too informal may note that X_t − ∫_0^t (χX_s + ρ) ds is a martingale, and take the expectation to obtain (A.15).)

We give the distribution of X_t only for the case x_0 = 0, leaving the general case to the reader.
For the Poisson process in Example A.2 (the case χ = 0) we thus recover from (A.19) the well-known Poisson distribution Po(ρt).
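The mean of Theorem A.6 can be cross-checked deterministically by solving the Kolmogorov forward equations dp_k/dt = λ_{k−1} p_{k−1} − λ_k p_k for the truncated chain and comparing with the closed form x_0 e^{χt} + ρ(e^{χt} − 1)/χ. The truncation level, step size and parameter values below are arbitrary illustrative choices.

```python
import math

def mean_via_forward_eqs(chi, rho, x0, t, K=200, steps=5_000):
    """Solve the forward equations for a pure birth chain with rates
    lam_k = chi*k + rho, truncated at K states, by explicit Euler;
    return the approximate mean of X_t."""
    lam = [chi * k + rho for k in range(K)]
    p = [0.0] * K
    p[x0] = 1.0
    h = t / steps
    for _ in range(steps):
        q = p[:]
        for k in range(K):
            out = lam[k] * p[k]
            inc = lam[k - 1] * p[k - 1] if k > 0 else 0.0
            q[k] += h * (inc - out)
        p = q
    return sum(k * pk for k, pk in enumerate(p))

chi, rho, x0, t = 1.0, 1.0, 0, 1.0
approx = mean_via_forward_eqs(chi, rho, x0, t)
closed_form = x0 * math.exp(chi * t) + rho * (math.exp(chi * t) - 1) / chi
print(approx, closed_form)   # both close to e - 1
```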

Appendix B. Hypergeometric functions and distributions
Recall that the hypergeometric function F(a, b; c; z) (also denoted by 2F1(a, b; c; z)) is defined by the sum

F(a, b; c; z) := Σ_{n=0}^∞ ((a)_n (b)_n / (c)_n) · z^n/n!, (B.1)

see e.g. [107, §15.2] or [59, §5.5]. In general, the parameters a, b, c can be arbitrary complex numbers (except that c = 0, −1, −2, . . . is allowed only in special cases), and z may be a complex number, but we are here only interested in real a, b, c and z. If a ∈ Z_{≤0} or b ∈ Z_{≤0}, then the terms in (B.1) vanish for n > |a| or n > |b|, respectively, so F(a, b; c; z) is a polynomial; otherwise the series (B.1) converges for |z| < 1 and diverges for |z| > 1. (The hypergeometric function F(a, b; c; z) extends by analytic continuation to z ∈ C \ [1, ∞), but we have no use for this extension here.) The hypergeometric series (B.1) converges for z = 1 if and only if a ∈ Z_{≤0}, b ∈ Z_{≤0} (in these cases the sum is finite, as said above) or Re(c − a − b) > 0, and then its sum is, as shown by Gauss [58], see also [107, (15.4.20), (15.4.24)],

F(a, b; c; 1) = Γ(c)Γ(c − a − b) / (Γ(c − a)Γ(c − b)). (B.2)

We say that a random variable has a (general) hypergeometric distribution if its probability generating function is, up to a normalization constant, a hypergeometric function F(a, b; c; z), for some a, b, c. We denote such a distribution by HG(a, b; c). (There seems to be no standard notation.) Some such distributions appear above in the study of random trees, and we give here some general properties and examples of such distributions, as a background and for easy reference. See further e.g. Johnson, Kemp and Kotz [81, Chapter 6] and the references given there.
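Both the series (B.1) and Gauss's evaluation at z = 1 are easy to verify numerically; a small sketch with partial sums (the parameter choices are illustrative, and the function names are ours):

```python
from math import gamma

def F(a, b, c, z, N=10_000):
    """Partial sum of the hypergeometric series (B.1); the terminating
    (polynomial) cases are exact since all later terms vanish."""
    s, term = 0.0, 1.0
    for n in range(N):
        s += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return s

def gauss(a, b, c):
    """Gauss's closed form for F(a, b; c; 1), valid when Re(c - a - b) > 0."""
    return gamma(c) * gamma(c - a - b) / (gamma(c - a) * gamma(c - b))

print(F(0.5, 0.5, 3, 1), gauss(0.5, 0.5, 3))   # convergent series case
print(F(-3, 2, 4, 1), gauss(-3, 2, 4))         # terminating (polynomial) case
```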
We repeat the definition somewhat more formally:

Definition B.1. The general hypergeometric distribution HG(a, b; c) is the distribution of a non-negative integer-valued random variable X such that

P(X = k) = C (a)_k (b)_k / ((c)_k k!), k = 0, 1, 2, . . . , (B.3)

for some constant C. Equivalently, the probability generating function is

E z^X = F(a, b; c; z) / F(a, b; c; 1), (B.4)

so that the normalizing constant is

C = 1 / F(a, b; c; 1), (B.5)

with a suitable interpretation if P(X = k) = 0.
A hypergeometric distribution HG(a, b; c) does not exist for all real parameters a, b, c. We see from (B.3) and (B.5) that a necessary and sufficient condition for the existence of HG(a, b; c) is that (a)_k (b)_k / (c)_k ≥ 0 for all k ≥ 0 and that F(a, b; c; 1) < ∞. We do not give precise necessary and sufficient conditions for this here, see e.g. [81], but we note the following cases where HG(a, b; c) exists; these comprise all cases of interest to us (and to others as far as we know), if we recall that a and b can be interchanged.
(One example is Example B.4 below. Typically, as there, we have both a, b ∈ Z_{<0} and c > 0; then, if we do not assume b ≤ a, the support is {0, . . . , min(|a|, |b|)}.)

Remark B.3. As said above, the hypergeometric function (B.1) in general does not exist when c ∈ Z_{≤0}. However, it is still possible to define HG(a, b; c) in some cases. We assume that also a ∈ Z_{≤0}. (Of course, the case b ∈ Z_{≤0} is similar, by symmetry.) If c < a (and b > 0, included in (ii)

As said above, a hypergeometric variable X ∼ HG(a, b; c) with a ∈ Z_{≤0} or b ∈ Z_{≤0} is bounded, and thus has moments of all orders. If a, b ∉ Z_{≤0}, the distribution has a power-law tail, and thus only finitely many finite moments.
We give a precise asymptotic formula for P(X = k) and then formulas for (factorial) moments.
Theorem B.9. (i) Let a, b, r > 0 be real numbers. Let X have a distribution that is a mixture of the negative binomial distribution NBin(r, p) with p ∼ B(a, b). Then X ∼ HG(r, b; r + a + b).
(ii) Let a, b, m > 0, with m an integer. Let X have a distribution that is a mixture of the binomial distribution Bin(m, p) with p ∼ B(a, b). Then X ∼ HG(−m, a; 1 − b − m).
Proof. The proofs of both parts are similar: we use the definitions of the negative binomial, binomial and beta distributions, evaluate a beta integral and make some manipulations using (2.1) and (2.2). It is not difficult to keep track of the constant factors during the calculations (and, indeed, this is a useful check, which we leave to the reader), but it is simpler to ignore them, since the final constant is determined by (B.5) and thus does not need to be computed; we thus just write C_1, . . . for various constants (depending on the parameters but not on k).
Remark B.10. In this context the resulting hypergeometric distributions are known as beta-negative binomial distributions and beta-binomial distributions. (Note that we can obtain any distribution of type (i) or (ii) above.) There are also several other names used for various cases of the general hypergeometric distribution, see [81]; in particular, case (ii) is sometimes called negative hypergeometric.
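A numeric sketch of Theorem B.9(ii): the beta-binomial pmf, computed directly from the mixture definition, should coincide with the HG(−m, a; 1 − b − m) pmf obtained by normalizing the weights (a)_k (b)_k / ((c)_k k!). The concrete parameters m = 5, a = 2, b = 3 and the function names are arbitrary choices of ours.

```python
from math import comb, gamma, factorial

def beta_binomial_pmf(k, m, a, b):
    """P(X = k) for a Bin(m, p) mixture with p ~ B(a, b)."""
    B = lambda x, y: gamma(x) * gamma(y) / gamma(x + y)
    return comb(m, k) * B(k + a, m - k + b) / B(a, b)

def rising(x, k):
    """Rising factorial (x)_k = x(x+1)...(x+k-1)."""
    r = 1.0
    for j in range(k):
        r *= x + j
    return r

def hg_pmf(k, a, b, c, support):
    """HG(a, b; c) pmf by normalizing the weights from the definition."""
    w = {j: rising(a, j) * rising(b, j) / (rising(c, j) * factorial(j))
         for j in support}
    return w[k] / sum(w.values())

m, a, b = 5, 2.0, 3.0      # illustrative parameters
for k in range(m + 1):
    print(k, beta_binomial_pmf(k, m, a, b),
          hg_pmf(k, -m, a, 1 - b - m, range(m + 1)))
```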
Example B.11. Yule [122] considered a simple model of evolution, where each existing species creates new species in the same genus with a constant rate λ_s, and also (independently) new species in new genera with another rate λ_g. What is the limiting distribution of the number of species in a random genus? Note that the evolution of all species, ignoring their genus, is a Crump-Mode-Jagers branching process, where each individual gets children according to a Poisson process with intensity λ_s + λ_g. Hence, assuming that we start with a single species, the total number of species forms a Yule process with rate λ_s + λ_g, see Example A.3. Similarly, the number of species in the same genus as the root (the original species) forms a Yule process with rate λ_s.
One way to treat this problem is to consider each genus as an individual in a Crump-Mode-Jagers process, where each individual has an internal Yule process Y_t with rate λ_s (the number of species in the genus), and new births occur with rate λ_g Y_t. Since E Y_t = e^{λ_s t}, see Theorem A.6, the offspring process has intensity µ(dt) = λ_g e^{λ_s t} dt, from which it follows that (5.4) holds with the Malthusian parameter α = λ_s + λ_g. The assumptions (A1)-(A5) hold, and Theorem 5.14 shows that the number of species in a random genus converges in distribution to Y_τ, the number of species in the root at time τ ∼ Exp(λ_s + λ_g). The Yule process Y_t starts at 1, but we may as in Example A.8 (which is the case λ_s = 1) apply Theorem A.5 to Y_t − 1, with χ = ρ = λ_s, and it follows that the asymptotic distribution of the number of species in a given genus is 1 + HG(1, 1; 3 + λ_g/λ_s). (One can, as said above, also use Theorem B.9(i), since Theorem A.7 implies that Y_t − 1 has the geometric distribution NBin(1, e^{−λ_s t}) = Ge_0(e^{−λ_s t}), and e^{−λ_s τ} ∼ B((λ_s + λ_g)/λ_s, 1).) This result was found by Yule [122] (by a different method), and a distribution of the form 1 + HG(1, 1; c) is therefore called a Yule distribution, see further [119] and [81, §6.10.3]. (Here c > 2. Often one writes c = 2 + ρ, with ρ > 0; in the present example, thus ρ = (λ_s + λ_g)/λ_s.) Note that the case c = 3 appears in (6.2) and (shifted to HG(1, 1; 3)) in (7.6).
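As a hedged numeric cross-check (the identification with the classical Yule-Simon pmf ρB(n, ρ + 1) is standard but not derived in the text above, and the value of c is an illustrative choice of ours), the shifted distribution 1 + HG(1, 1; c) can be compared with ρB(n, ρ + 1) for ρ = c − 2, using F(1, 1; c; 1) = (c − 1)/(c − 2) from Gauss's formula to normalize:

```python
from math import gamma, factorial

def rising(x, k):
    """Rising factorial (x)_k."""
    r = 1.0
    for j in range(k):
        r *= x + j
    return r

def yule_from_hg(n, c):
    """P(N = n) for N = 1 + X with X ~ HG(1,1;c); the normalization
    uses F(1,1;c;1) = (c-1)/(c-2)."""
    k = n - 1
    return (c - 2) / (c - 1) * factorial(k) / rising(c, k)

def yule_simon(n, rho):
    """Classical Yule-Simon pmf rho * B(n, rho + 1)."""
    return rho * gamma(n) * gamma(rho + 1) / gamma(n + rho + 1)

c = 3.0 + 0.5              # corresponds to lambda_g / lambda_s = 1/2
for n in range(1, 6):
    print(n, yule_from_hg(n, c), yule_simon(n, c - 2))
```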
An alternative method to treat this example is to consider the Yule process (with rate λ_s + λ_g) of all species. Call the first species in each genus the progenitor of the genus, and give each progenitor a mark; then each species (except the original one) is marked with probability p = λ_g/(λ_s + λ_g), and these marks are independent of each other and of everything else. Hence, we obtain the same asymptotic distribution of fringe trees (except for the mark at the root) if we sample a random progenitor as if we sample a random species, cf. Section 12. Using Theorem 5.14 it follows that the number of species in a random genus converges in distribution to the number of descendants in the same genus of a fixed individual stopped at a random age τ ∼ Exp(λ_s + λ_g). This yields the same result as above.
Note also that if we erase the edges between different genera and only keep the edges between species in the same genus in Yule's model, we obtain a growing forest. If we let F_n be this forest when it has reached n nodes, we obtain a growing forest process which is the same as UGROW defined by Devroye, McDiarmid and Reed [42]. Results for the size of the subtree rooted at a given node in UGROW are given by Devroye, McDiarmid and Reed [42] and Pakes [109].

Example B.13. Pólya's urn contains balls of different colours. We draw a ball uniformly at random and replace it together with c new balls of the same colour. This is repeated n times. Let W be the number of white balls drawn, assuming that the urn initially contains w white and b black (or non-white) balls.
We assume c ≠ 0, to avoid the trivial case c = 0 when W has a binomial distribution, but c < 0 is allowed, meaning that balls are removed. In particular, c = −1 gives drawing without replacement, when W has the classical hypergeometric distribution in Example B.4. (It is natural to let c, b, w be integers, but the model has a natural interpretation also for real values of these parameters, see e.g. [74, Remark 4.2].) In the case c < 0, we assume that b, w and n are such that we never can be required to remove a ball of a colour that is no longer present, or draw a ball when the urn is empty.
It is easy to see that P(X = k) is proportional to

For a connection with Example B.12, suppose instead that the urn starts with one ball each of m colours (including white), and that c = 1. The number W of white balls drawn is the same as if we start with 1 white and m − 1 black balls, and thus W ∼ HG(−n, 1; 2 − n − m). On the other hand, it is easy to see by induction that, for each n, the composition of the urn is uniform over all possible colour combinations. Thus W has the same distribution as X_1 in Example B.12 (with colours corresponding to boxes).
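For the case c = 1, w = 1, b = m − 1 above, the claim W ∼ HG(−n, 1; 2 − n − m) can be checked exactly via the standard exchangeability property of Pólya urns: every draw sequence with k white draws has the same probability. The concrete test values n = 6, m = 4 and the function names are arbitrary choices of ours.

```python
from math import comb, factorial

def polya_white_pmf(k, n, w, b, c):
    """P(W = k) for a Polya urn (w white, b black, replacement c),
    computed from the exchangeability of draw sequences."""
    num = 1.0
    for i in range(k):
        num *= w + i * c              # white draws
    for j in range(n - k):
        num *= b + j * c              # black draws
    den = 1.0
    for l in range(n):
        den *= w + b + l * c          # total balls at each draw
    return comb(n, k) * num / den

def rising(x, k):
    """Rising factorial (x)_k."""
    r = 1.0
    for j in range(k):
        r *= x + j
    return r

def hg_pmf(k, a, b, c, support):
    """HG(a, b; c) pmf by normalizing (a)_k (b)_k / ((c)_k k!)."""
    w = {j: rising(a, j) * rising(b, j) / (rising(c, j) * factorial(j))
         for j in support}
    return w[k] / sum(w.values())

n, m = 6, 4                # urn starts with 1 white, m - 1 black balls, c = 1
for k in range(n + 1):
    print(k, polya_white_pmf(k, n, 1, m - 1, 1),
          hg_pmf(k, -n, 1, 2 - n - m, range(n + 1)))
```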