Moments of the frequency spectrum of a splitting tree with neutral Poissonian mutations

We consider a branching population where individuals live and reproduce independently. Their lifetimes are i.i.d. and they give birth at a constant rate $b$. The genealogical tree spanned by this process is called a splitting tree, and the population counting process is a homogeneous, binary Crump-Mode-Jagers process. We suppose that mutations affect individuals independently at a constant rate $\theta$ during their lifetimes, under the infinite-alleles assumption: each new mutation gives a new type, called an allele, to its carrier. We study the allele frequency spectrum, that is, the numbers $A(k,t)$ of types represented by $k$ alive individuals in the population at time $t$. Thanks to a new construction of the coalescent point process describing the genealogy of individuals in the splitting tree, we are able to compute recursively all joint factorial moments of $(A(k,t))_{k \ge 1}$. These moments allow us to give an elementary proof of the almost sure convergence of the frequency spectrum in a supercritical splitting tree.


Introduction
In this work, we study a branching population in which every individual has a lifetime that is independent of those of the other individuals in the population. Moreover, during their lifetimes, individuals give birth to new individuals at a constant Poisson rate. The genealogical tree underlying the history of the population, the so-called splitting tree, has been widely studied in the past [19,10,9].
In our model, individuals also experience mutations at a constant Poisson rate. Each mutation leads to a totally new type replacing the previous type of the individual, this is the [...] the moments formulas stated in Section 3. We give the key decomposition of the CPP in Subsection 5.1. The rest of Section 5 is dedicated to the proofs of Theorems 3.1 and 3.2. In particular, we provide a computation of the first moment much simpler than the one of [4]. We give the asymptotic behaviour of higher moments in Section 6. Section 7 is dedicated to the proof of the following law of large numbers: [...] where $E$ is an exponential random variable with parameter 1 conditionally on non-extinction, and the constants $c_k$ are explicit.

Splitting trees and the coalescent point process
We study a branching model of population dynamics called splitting tree where individuals live and reproduce independently from each other. Their lifetimes are i.i.d. following a given arbitrary distribution $P_V$ on $(0, \infty]$. During its lifetime, an individual gives birth to new individuals, with binary reproduction (i.e. new individuals appear singly), at independent Poisson times with positive constant rate $b$ until its death. We also suppose that the population starts with a single individual called the root or ancestor. A graphical representation of a splitting tree is shown in Figure 1.
The finite measure $\Lambda := b P_V$ is called the lifespan measure, and plays an important role in the study of the model.
Moreover, we assume that individuals undergo mutations at Poisson times with rate θ during their lifetimes independently from each other and from their reproduction processes. Each new mutation leads to a brand new type replacing the preceding type of the individual (infinitely many alleles model). Parents yield their current type to their children.
A family at a given time $t$ is a set of alive individuals carrying the same type at time $t$. Our purpose is to study the distribution of the sizes of families in the population at time $t$.
Figure 1: Graphical representation of a splitting tree. The vertical axis represents the biological time for the population. The horizontal axis has no biological meaning. The vertical segments represent the lifetimes of the individuals: the lower bounds are their birth-times and the upper bounds their death-times. The dotted lines denote the filiations between individuals. (Image by A. Lambert)
For our study, it is easier to work with the genealogical tree of the population alive at time $t$. Indeed, since mutations are Poissonian, the different types in the population only depend on the coalescence times of the lineages of the alive population. In order to derive the law of that genealogical tree, we need to characterize the law of the times of coalescence between pairs of individuals in the population, which are the times since their lineages have split.
In [19], Lambert introduces a contour process $Y$, which codes for the tree, and hence its genealogy. Suppose we are given a tree $T$, seen as a subset of $\mathbb{R} \times \bigcup_{k \ge 0} \mathbb{N}^k$ with some compatibility conditions (see [19]), where $\mathbb{N}$ refers to the set of non-negative integers. On this object, Lambert constructs a Lebesgue measure $\lambda$ and a total order relation $\preceq$ which can be roughly described as follows: let $x, y$ be in $T$; the point of birth of the lineage of $x$ during the lifetime of the root splits the tree into two connected components, and $y \preceq x$ if $y$ belongs to the same component as $x$ but is not an ancestor of $x$ (see Figure 2).
If we assume that $\lambda(T)$ is finite, then the application
$$\varphi : T \to [0, \lambda(T)], \quad x \mapsto \lambda(\{y \mid y \preceq x\}),$$
is a one-to-one correspondence. In a graphical sense (see Figure 2), $\varphi(x)$ measures the length of the part of the tree which is above the lineage of $x$. The contour process is
$$Y_s := \Pi_{\mathbb{R}}\left(\varphi^{-1}(s)\right), \quad s \in [0, \lambda(T)],$$
where $\Pi_{\mathbb{R}}$ is the projection from $\mathbb{R} \times \bigcup_{k \ge 0} \mathbb{N}^k$ to $\mathbb{R}$. In the case where $\lambda(T)$ is infinite, one has to consider truncations of the tree above fixed levels in order to define contours (see [19] for more details). In a more graphical way, the contour process can be seen as the graph of an exploration process of the tree: it begins at the top of the root and decreases with slope $-1$ while running back along the life of the root until it meets a birth. The contour process then jumps to the top of the life interval of the child born at this time and continues its exploration as before. If the exploration process does not encounter a birth when exploring the life interval of an individual, it goes back to its parent and continues the exploration from the birth-date of the individual it has just left (see Figure 3). It is then readily seen that the intersections of the contour process with the line of ordinate $t$ are in one-to-one correspondence with the individuals in the tree alive at time $t$.
In [19], Lambert shows that the contour process of the splitting tree which has been pruned from every part above $t$ (called the truncated tree above $t$) has the law of a spectrally positive Lévy process reflected below $t$ and killed at 0, with Laplace exponent
$$\psi(x) = x - \int_{(0,\infty]} \left(1 - e^{-rx}\right) \Lambda(dr), \quad x \in \mathbb{R}_+.$$
The largest root of $\psi$, denoted $\alpha$, is called the Malthusian parameter and, as soon as $\alpha > 0$, gives the rate of growth of the population on the survival event.
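As a hedged numerical illustration of the Malthusian parameter, the sketch below assumes exponentially distributed lifetimes with rate `d` (an illustrative choice that the text does not make) and the standard form $\psi(x) = x - \int (1 - e^{-rx}) \Lambda(dr)$ of the Laplace exponent, which then reduces to $x - b\,x/(d+x)$. The function names and the bisection scheme are hypothetical helpers, not part of the paper.

```python
def psi(x, b, d):
    """Laplace exponent for birth rate b and exponential(d) lifetimes (assumption)."""
    return x - b * (1.0 - d / (d + x))


def malthusian_parameter(b, d, tol=1e-12):
    """Largest root of psi, located by bisection; returns 0.0 when b <= d."""
    if b <= d:
        return 0.0  # (sub)critical case: the largest root is 0
    hi = b          # psi(b) = b*d/(b+d) > 0 when d > 0
    lo = b
    while psi(lo, b, d) >= 0.0:
        lo /= 2.0   # psi is negative on (0, alpha), so this loop terminates
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, b, d) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this assumption the root is available in closed form, $\alpha = b - d$, which gives a convenient sanity check for the numerical routine.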
The time of coalescence of two individuals alive at time $t$ corresponds to the amount of time one needs to go back in the past along their lineages to get their first common ancestor. The time of coalescence between an individual alive at time $t$ and the next one visited by the contour is exactly the depth of the excursion of the contour process below $t$ between these two successive individuals (see Figure 3). We are interested in the sequence of coalescence times shown in Figure 3, which contains the minimal information needed to reconstruct the genealogy at time $t$.
Figure 3: Construction of the contour process and link between the excursions of the contour process and the times of coalescence in the tree.
More precisely, it follows from well-known fluctuation properties of spectrally positive Lévy processes (see [18], Theorem 8.1 for spectrally negative Lévy processes) that the law of the depth $H$ of an excursion below $t$ is given by
$$\mathbb{P}(H > s) = \frac{1}{W(s)}, \quad s > 0,$$
where $W$ is the scale function of the Lévy process, characterized by its Laplace transform
$$\int_{(0,\infty)} e^{-rt}\, W(r)\, dr = \frac{1}{\psi(t)}, \quad t > \alpha. \tag{2.1}$$
Since the contour process is strong Markov, the sequence of excursion depths is i.i.d.
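For intuition, the scale function can be computed in closed form in the simplest case. The worked example below is a sketch under assumptions that the text does not make: exponential lifetimes with rate $d < b$, the standard form of $\psi$, and the characterization of $W$ by $\int_0^\infty e^{-rx} W(r)\,dr = 1/\psi(x)$ for $x > \alpha$.

```latex
\begin{align*}
\psi(x) &= x - b\Bigl(1 - \frac{d}{d+x}\Bigr) = \frac{x\,(x-\alpha)}{x+d},
  \qquad \alpha = b - d,\\
\frac{1}{\psi(x)} &= \frac{x+d}{x\,(x-\alpha)}
  = \frac{b}{\alpha}\,\frac{1}{x-\alpha} - \frac{d}{\alpha}\,\frac{1}{x},
  \qquad x > \alpha,\\
W(s) &= \frac{b\,e^{\alpha s} - d}{\alpha}, \qquad W(0) = 1,
  \qquad \psi'(\alpha) = \frac{\alpha}{b}.
\end{align*}
```

In this special case $W(s)$ grows like $(b/\alpha)\,e^{\alpha s} = e^{\alpha s}/\psi'(\alpha)$, and, if the excursion depth satisfies $\mathbb{P}(H > s) = 1/W(s)$, then $\mathbb{P}(H > s) = \alpha / (b\,e^{\alpha s} - d)$.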
To summarize, given that the population is still alive at time $t$, one can forget the splitting tree and code the genealogy of the individuals alive at time $t$ by a new object called the coalescent point process (CPP) at time $t$, shown in Figure 4. Its law is the law of a sequence $(H_i)_{0 \le i \le N_t - 1}$, where the family $(H_i)_{i \ge 1}$ is i.i.d. with the same law as $H$, stopped at its first value $H_{N_t}$ greater than $t$, and $H_0$ is deterministic and equal to $t$ (see Figure 4). The heights $H_1, \dots, H_{N_t - 1}$ are called branch lengths of the CPP.
Remark 2.1. Let $N$ be an integer-valued random variable. In the sequel we say that a random vector with random size $(X_i)_{1 \le i \le N}$ forms an i.i.d. family of random variables independent of $N$ if and only if [...]
From the CPP at time $t$, the genealogical tree of the individuals alive at time $t$ is obtained by considering that the $i$th branch coalesces with the first branch on its left such that $H_j > H_i$ (for $j < i$) (see Figure 4). The number $N_t$ of alive individuals at time $t$ in the splitting tree is then given by
$$N_t = \inf\{ i \ge 1 \mid H_i > t \}.$$
From the comments above, $N_t$ is a geometric random variable given $N_t > 0$. More precisely,
$$\mathbb{P}(N_t = k \mid N_t > 0) = \frac{1}{W(t)} \left(1 - \frac{1}{W(t)}\right)^{k-1}, \quad k \ge 1.$$
Finally, we can define the occurrence of mutations directly on the CPP as the atoms of a random measure. Let $P$ be a Poisson random measure on $(0,t) \times \mathbb{N}$ with intensity measure $\theta \lambda \otimes C$, where $\lambda$ is the Lebesgue measure on $(0,t)$ and $C$ is the counting measure on $\mathbb{N}$. The mutation random measure on the CPP at time $t$ is then defined by
$$N(da, di) = \mathbb{1}_{H_i > t - a}\, \mathbb{1}_{i < N_t}\, P(da, di), \tag{2.2}$$
where an atom at $(a, i)$ means that the $i$th branch of the CPP experiences a mutation at time $t - a$. Note that, when one looks at the allele distribution at time $t$, this construction is equivalent to the construction of Poissonian mutations on the original splitting tree [5]. We assume that each mutation gives a totally new type to its holder (infinitely-many alleles model) and that the types are transmitted to offspring. This rule yields a partition of the population by type at a given time $t$.
The distribution of the frequency of types in the population is called the frequency spectrum and is defined as the sequence $(A(k,t))_{k \ge 1}$, where $A(k,t)$ is the number of types carried by exactly $k$ individuals in the alive population at time $t$ (or, for short, the number of families of size $k$ at this time), excluding the family holding the original type of the root.
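The objects introduced so far (branch lengths, mutations on branches, frequency spectrum) can be illustrated by a small simulation. This is only a sketch under stated assumptions: lifetimes are taken exponential with rate `d < b`, so that $W(s) = (b e^{\alpha s} - d)/\alpha$ with $\alpha = b - d$ and $H$ can be sampled by inversion of $\mathbb{P}(H > s) = 1/W(s)$; mutation positions are stored as depths below time $t$; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def sample_branch(b, d, rng):
    """Draw H with P(H > s) = 1/W(s), W(s) = (b*exp(alpha*s) - d)/alpha (assumption)."""
    alpha = b - d                      # Malthusian parameter, requires b > d
    u = 1.0 - rng.random()             # uniform on (0, 1]
    return math.log((alpha / u + d) / b) / alpha


def sample_cpp(t, b, d, rng):
    """Heights (H_0 = t, H_1, ..., H_{N_t - 1}); stop at the first H_i > t."""
    heights = [t]
    while True:
        h = sample_branch(b, d, rng)
        if h > t:
            return heights
        heights.append(h)


def sample_mutations(height, theta, rng):
    """Increasing depths (below time t) of the rate-theta mutations on one branch."""
    depths, x = [], rng.expovariate(theta)
    while x < height:
        depths.append(x)
        x += rng.expovariate(theta)
    return depths


def allele_types(heights, muts):
    """Label each individual by the most recent mutation on its lineage (None = clonal)."""
    types = []
    for i in range(len(heights)):
        j, s, label = i, 0.0, None
        while label is None:
            later = [m for m in muts[j] if m > s]
            if later:
                label = (j, later[0])  # first mutation met when going back in time
            elif j == 0:
                break                  # root lineage reached: ancestral type
            else:
                # continue on the first branch to the left that is taller
                s, j = heights[j], max(m for m in range(j) if heights[m] > heights[j])
        types.append(label)
    return types


def frequency_spectrum(types):
    """Return ({k: A(k, t)}, Z_0(t)) from the list of type labels."""
    sizes = Counter(types)
    clonal = sizes.pop(None, 0)
    return dict(Counter(sizes.values())), clonal


rng = random.Random(0)
heights = sample_cpp(t=3.0, b=2.0, d=1.0, rng=rng)
muts = [sample_mutations(h, 0.5, rng) for h in heights]
spectrum, clonal = frequency_spectrum(allele_types(heights, muts))
```

By construction, the family sizes together with the clonal family partition the population, so `sum(k * spectrum[k] for k in spectrum) + clonal` equals `len(heights)`.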
In the study of the frequency spectrum, an important role is played by the family carrying the type of the root. The type of the ancestor individual at time 0 is said to be clonal. Moreover, at any time $t$, the set of individuals carrying this type is called the clonal family at time $t$. We denote by $Z_0(t)$ the size of the clonal family at time $t$.
To study this family, it is easier to consider the clonal splitting tree constructed from the original splitting tree by cutting every branch beyond mutations. This clonal splitting tree is a standard splitting tree without mutations where individuals are killed as soon as they die or experience a mutation. The new lifespan law $P_{V_\theta}$ is then the law of the minimum between an exponential random variable of parameter $\theta$ and an independent copy of $V$. As for any splitting tree, one can study its contour process, whose Laplace exponent is given, using simple manipulations on Laplace transforms, by
$$\psi_\theta(x) = \frac{x\, \psi(x + \theta)}{x + \theta}.$$
In the case where $\alpha - \theta > 0$ (resp. $\alpha - \theta < 0$, $\alpha - \theta = 0$) the clonal population is supercritical (resp. sub-critical, critical), and we talk about the clonal supercritical (resp. sub-critical, critical) case.
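The manipulation on Laplace transforms can be sketched as follows. This derivation is an illustration rather than the authors' own computation; it assumes the standard form $\psi(x) = x - \int_{(0,\infty]} (1 - e^{-rx})\,\Lambda(dr)$ and writes $\mathbf{e}$ (a notation introduced here) for an exponential variable of parameter $\theta$ independent of $V$.

```latex
\begin{align*}
\mathbb{E}\bigl[1 - e^{-x (v \wedge \mathbf{e})}\bigr]
 &= \int_0^v \theta e^{-\theta s}\bigl(1 - e^{-xs}\bigr)\,ds
    + e^{-\theta v}\bigl(1 - e^{-xv}\bigr)
  = \frac{x}{x+\theta}\bigl(1 - e^{-(x+\theta)v}\bigr),\\
\psi_\theta(x)
 &= x - b\,\mathbb{E}\bigl[1 - e^{-x (V \wedge \mathbf{e})}\bigr]
  = \frac{x}{x+\theta}\Bigl((x+\theta)
    - \int_{(0,\infty]} \bigl(1 - e^{-(x+\theta)r}\bigr)\,\Lambda(dr)\Bigr)
  = \frac{x\,\psi(x+\theta)}{x+\theta}.
\end{align*}
```

In particular the roots of $\psi_\theta$ are $0$ and $\alpha - \theta$, which is consistent with the classification of the clonal population according to the sign of $\alpha - \theta$.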
We denote by $W_\theta$ the scale function of the Lévy process induced by this new tree, related to $\psi_\theta$ as in (2.1). This leads to [...] which, applied to the clonal splitting tree, allows obtaining after some easy calculations, [...] from which one can deduce [...]
The main idea underlying our study is that the behaviour of any family in the CPP is the same as the clonal one but on a smaller time scale.
For the rest of this paper, unless otherwise stated, the notation $\mathbb{P}_t$ refers to $\mathbb{P}(\cdot \mid N_t > 0)$ and $\mathbb{P}_\infty$ refers to the probability measure conditioned on the non-extinction event, denoted Non-Ex in the sequel.
Finally, we recall the asymptotic behavior of the scale functions $W(t)$ and $W_\theta(t)$, which is widely used in the sequel.
From this lemma, one can obtain that the probability that the clonal family reaches a fixed size at time t decreases exponentially fast with t.

Remark 2.4. Note that Lemma 2.2 implies in particular that, for any positive integer $k$, [...]

Statement of main results
In this section are stated the main results of the paper. In particular, the formulas for the moments of the frequency spectrum are given in Theorems 3.1 and 3.2.
For two positive real numbers $a < t$, we denote by N [...] In the sequel, we use the following notation for multi-indexed sums: let $K, N$ be two positive integers and 1 , . . . , K some non-negative integers, then the notation [...] In order to lighten notation, we also use the convention that, for any integer $n$ and any negative integer $k$, $\binom{n}{k} = 0$.
Recalling that $\mathbb{P}_t$ is the conditional probability on the event $\{N_t > 0\}$ and that $\mathbb{E}_t$ is the corresponding expectation, we now state our main results.
In Subsection 5.3, we also give formulas for the [...] These formulas are explicit in the sense that any moment can be computed recursively from the lower-order moments. As an application of these formulas, we obtain an elementary proof of the following law of large numbers.
[...] where $E$ is an exponential random variable with parameter 1 conditionally on non-extinction, and $c_k$ is given by [...]
Before proving Theorems 3.1 and 3.2, we need to introduce an important tool allowing us to compute expectations of integrals w.r.t. random measures presenting particular independence structures. This is the purpose of the next section.

Expected stochastic integral using Palm theory
In this section, we use notation and vocabulary from [7]. Let $\mathcal{X}$ be a Polish space. We recall that a random measure is a measurable mapping from a probability space to the space $\mathcal{M}_b(\mathcal{X})$ of all boundedly finite measures on $\mathcal{X}$, i.e. measures such that each bounded set has finite mass.
The purpose of this section is to prove an extension of the Campbell formula (see Proposition 13.1.IV in [7]), giving the expectation of an integral with respect to a random measure when the integrand has specific "local" independence properties w.r.t. the measure.
For this purpose, we need to introduce the notion of Palm measure related to a random measure $N$. The presentation is borrowed from [7]. So let $N$ be a random measure on $\mathcal{X}$ with intensity measure $\mu$, and $(X_x, x \in \mathcal{X})$ be a continuous random process with values in $\mathbb{R}_+$. Since this section is devoted to proving relations concerning only the distributions of $N$ and $X$, we can assume without loss of generality that our random elements $X$ and $N$ are defined (in the canonical way) on the space $C(\mathcal{X}) \times \mathcal{M}_b(\mathcal{X})$, where $C(\mathcal{X})$ denotes the space of continuous functions on $\mathcal{X}$. This space is Polish as a product of Polish spaces. We denote by $\mathcal{F}$ the corresponding product Borel σ-field.
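For the random measure $N$, the corresponding Campbell measure $C_N$ is the measure defined on $\sigma(\mathcal{F} \times \mathcal{B}(\mathcal{X}))$ by extension of the following relation on the semi-ring $\mathcal{F} \times \mathcal{B}(\mathcal{X})$:
$$C_N(F \times B) = \mathbb{E}\left[\mathbb{1}_F\, N(B)\right], \quad F \in \mathcal{F},\ B \in \mathcal{B}(\mathcal{X}).$$
It is straightforward to see that $C_N$ is σ-finite and that, for each $F$ in $\mathcal{F}$, the measure $C_N(F \times \cdot)$ is absolutely continuous with respect to $\mu$. Then, from the Radon-Nikodym theorem, for each $F \in \mathcal{F}$, there exists $y \in \mathcal{X} \mapsto P_y(F)$ in $L^1(\mu)$ such that
$$C_N(F \times B) = \int_B P_y(F)\, \mu(dy), \quad B \in \mathcal{B}(\mathcal{X}),$$
uniquely defined up to its values on $\mu$-null sets.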
Since our probability space is Polish, $P$ can be chosen to be a probabilistic kernel, i.e. for all $F$ in $\mathcal{F}$, $y \in \mathcal{X} \mapsto P_y(F)$ is measurable, and for all $y$ in $\mathcal{X}$, $F \in \mathcal{F} \mapsto P_y(F)$ is a probability measure.
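The probability measure $P_y$ is called the Palm measure of $N$ at point $y$. Since $X$ is continuous, it is $\mathcal{B}(\mathcal{X}) \otimes \mathcal{F}$ measurable, and it is easily deduced from this point that
$$\mathbb{E}\left[\int_{\mathcal{X}} X_x\, N(dx)\right] = \int_{\mathcal{X}} \mathbb{E}_{P_x}\left[X_x\right] \mu(dx), \tag{4.1}$$
where $\mathbb{E}_{P_x}$ denotes the expectation w.r.t. $P_x$. Formula (4.1) is the so-called Campbell formula.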
We can now state the main results of this section, which are the aforementioned extensions of the above formula.
Theorem 4.1. Let $X$ be a continuous process from $\mathcal{X}$ to $\mathbb{R}_+$. Let $N$ be a random measure on $\mathcal{X}$ with finite intensity measure $\mu$. Assume that $X$ is locally independent from $N$, that is, for all $x \in \mathcal{X}$, there exists a neighbourhood $V_x$ of $x$ such that $X_x$ is independent from $N(V_x \cap \cdot)$. Suppose moreover that there exists an integrable random variable $Y$ such [...]
However, the continuity condition of the preceding theorem prevents the application of this result to our model. We need a more specific result.
Theorem 4.2. Let $X$ be a process from $[0,T] \times \mathcal{X}$ to $\mathbb{R}_+$ such that $X_{\cdot,x}$ is càdlàg for all $x$ and $X_{s,\cdot}$ is continuous for all $s$. Let $N$ be a random measure on $[0,T] \times \mathcal{X}$ with finite intensity measure $\mu$. Assume that, for each $s$ in $[0,T]$, the family (X s, [...] Then we have [...]
Let $\llbracket 1, n \rrbracket$ denote the set $\mathbb{N} \cap [1, n]$. Before going further, we recall that a dissecting system is a sequence $\{A_{n,j},\ j \in \llbracket 1, K_n \rrbracket\}_{n \ge 0}$ of nested partitions of $\mathcal{X}$, where $(K_n)_{n \ge 0}$ is an increasing sequence of integers, such that [...]
In the spirit of the works of Kallenberg on the approximation of simple point processes, the proof of Theorem 4.1 is based on the following theorem, which can be found in [16] or in [20] (Section WIII.9).
Theorem 4.3 (Kallenberg [16]). Let $\mu$ and $\nu$ be two finite measures on the Polish space $\mathcal{X}$, such that $\mu$ is absolutely continuous with respect to $\nu$. Let $f$ be the Radon-Nikodym derivative of $\mu$ w.r.t. $\nu$. Then, for any dissecting system $\{A_{n,j},\ j \in \llbracket 1, K_n \rrbracket\}_{n \ge 0}$ of $\mathcal{X}$, we [...]
Proof of Theorem 4.1. Let $\{A_{n,j},\ j \in \llbracket 1, K_n \rrbracket\}_{n \ge 0}$ be a dissecting system of $\mathcal{X}$. We denote by $A_n(x)$ the element of the partition $(A_{n,j})_{1 \le j \le K_n}$ which contains $x$. Let also $T$ be a denumerable dense subset of $\mathcal{X}$. We use lower and upper approximations of $X$. More precisely, let, for all positive integers $k$ and for all $a$ in $\mathcal{X}$, [...] Note that the supremum and infimum are taken on $T \cap A_k(a)$ to ensure that $\overline{X}^{(k)}_j$ and $\underline{X}^{(k)}_j$ are measurable, but the set $T$ could be removed by continuity of $X$. We remark that, for any $j, k$, the measure E χ (k) j N (•) is absolutely continuous with respect to $\mu$, and it follows from Campbell's formula (4.1) that the Radon-Nikodym derivative is [...] Thus, it follows from Theorem 4.3 that, $\mu$-a.e., [...]
Then, since $\overline{X}^{(k)}$ and $\underline{X}^{(k)}$ are finite sums of such random variables, [...] outside a $\mu$-null set which can be chosen independent of $k$ by countability. Now, since [...] Now, since $A_{n,j}$ is a dissecting system, there exists an integer $N$ such that, for all $n > N$, [...] Finally, [...] And the conclusion comes from (4.1).
Proof of Theorem 4.2. Clearly, we may assume without loss of generality that T = 1.
Define, for every integer $M$, [...] Since $X_{\cdot,x}$ is càdlàg, this sequence of processes converges pointwise to $(X_{s,x},\ s \in [0,1])$ for all $\omega$. Then, by Lebesgue's theorem, [...] to conclude the proof.

Proofs of the moments formulas
The main goal of this section is to prove Theorems 3.1 and 3.2. Their proofs are given in Subsection 5.2. Subsection 5.3 is devoted to the computation of the joint moments of the frequency spectrum with $\mathbb{1}_{Z_0(t) = \ell}$. Subsection 5.4 shows an application of our theorems to the computation of the covariances of the frequency spectrum. The next subsection gives the key decomposition of the CPP.

Recursive construction of the CPP
Here we describe the general idea of the proof of Theorems 3.1 and 3.2 and give an alternative construction of the CPP. We consider the CPP at some time $t$. Suppose that a mutation occurs on branch $i$ at a time $a$. Then, by construction of the CPP, the future of this family depends only on what happens on the branches $(H_j,\ i \le j < \tau)$ (see Figure 5). In fact, this set of branches is also a CPP with scale function $W$ stopped at $a$ (we talk about a sub-CPP), and the number of individuals carrying the mutation at time $t$ is the number of clonal individuals in this sub-CPP.
To capitalize on this fact, we introduce a construction of the CPP which underlines this independence. Suppose we are given a sequence $(P^{(i)})_{i \ge 1}$ of coalescent point processes stopped at time $a$ with scale function $W$. Then, take an independent CPP $\hat{P}$, where the law of the branches corresponds to the excess over $a$ of a branch with scale function $W$ conditioned to be higher than $a$. As stated in the next proposition, the tree built from the grafting of the $P^{(i)}$ above each branch of $\hat{P}$ is also a CPP with scale function $W$ stopped at time $t$ (see Figure 6).
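The grafting construction can be sketched in a few lines of code. The example below is illustrative only and rests on assumptions not made in the text: exponential lifetimes with rate `d < b`, for which $W(s) = (b e^{\alpha s} - d)/\alpha$ with $\alpha = b - d$, a lower CPP stopped at $t - a$ whose branches satisfy $\mathbb{P}(\hat{H} > s) = W(a)/W(a+s)$, and hypothetical function names.

```python
import math
import random


def scale_w(s, b, d):
    """Scale function W(s) = (b*exp(alpha*s) - d)/alpha for exponential(d) lifetimes."""
    alpha = b - d
    return (b * math.exp(alpha * s) - d) / alpha


def inverse_w(y, b, d):
    """Inverse of scale_w on [1, infinity)."""
    alpha = b - d
    return math.log((alpha * y + d) / b) / alpha


def sub_cpp(a, b, d, rng):
    """Branch lengths (all <= a) of a CPP stopped at a, without its 0th branch."""
    out = []
    while True:
        h = inverse_w(1.0 / (1.0 - rng.random()), b, d)  # P(H > s) = 1/W(s)
        if h > a:
            return out
        out.append(h)


def grafted_cpp(t, a, b, d, rng):
    """Heights of a CPP at time t built by grafting sub-CPPs on a lower CPP."""
    heights = []
    excess = t - a                       # 0th branch of the lower CPP
    while True:
        heights.append(a + excess)       # branch that crosses level t - a
        heights.extend(sub_cpp(a, b, d, rng))
        u = 1.0 - rng.random()           # uniform on (0, 1]
        # excess over a of a branch conditioned to be higher than a
        excess = inverse_w(scale_w(a, b, d) / u, b, d) - a
        if excess > t - a:               # the lower CPP is stopped at t - a
            return heights


cpp = grafted_cpp(t=3.0, a=1.0, b=2.0, d=1.0, rng=random.Random(0))
```

Each block of the output starts with one tall branch $a + \hat{H}$ followed by the short branches of an independent sub-CPP, which is exactly the independence structure exploited in the proofs.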
at time $t - a$, and let $\hat{N}_{t-a}$ denote its population size. Let $S_0 := 0$ and [...] The independence follows from the construction. We detail the computation for the joint law of $(H_l, H_k)$ and leave the easy extension to the general case to the reader. Let $k > l$ be two positive integers, and let also $s_1, s_2$ be two positive real numbers. We denote by $S$ the random set $\{S_i, i \ge 1\}$. Hence,
$$\cdots + \mathbb{P}(a + \hat{H} < s_1)\, \mathbb{P}(H < s_2 \mid H < a)\, \mathbb{P}(l \notin S, k \in S) + \mathbb{P}(H < s_1 \mid H < a)\, \mathbb{P}(a + \hat{H} < s_2)\, \mathbb{P}(l \in S, k \notin S) + \mathbb{P}(a + \hat{H} < s_1)\, \mathbb{P}(a + \hat{H} < s_2)\, \mathbb{P}(l \in S, k \in S),$$
where $\hat{H}$ denotes a random variable with the law of the branches of $\hat{P}$, i.e. such that [...], $\forall s > 0$. Now, since the random variables $S_i$ are sums of geometric random variables, we get [...] with $p = \mathbb{P}(k \in S)$. Moreover we have, [...] Since the $S_i$'s are sums of geometric random variables of parameter $\hat{W}(t-a)^{-1}$, they follow negative binomial distributions with parameters $i$ and $\hat{W}(t-a)^{-1}$. Hence, [...]
A very simple application of this construction is the derivation of the expectation of $A(k,t)$. Recall that this expectation was first calculated in [4], with a much more complicated proof.
Proof. Since $A(k,t)$ is the number of types represented at time $t$ by $k$ individuals, it is equivalent to enumerate all the mutations and ask whether they have exactly $k$ clonal children at time $t$. This remark leads to the following integral representation of $A(k,t)$:
$$A(k,t) = \int_{(0,t) \times \mathbb{N}} \mathbb{1}_{Z^i_0(a) = k}\, N(da, di), \tag{5.1}$$
where $N$ is defined in (2.2), and $Z^i_0(a)$ denotes the number of alive individuals at time $t$ carrying the same type as the type carried at time $t - a$ on the $i$th branch of the CPP of the individuals alive at time $t$ (the notation comes from the fact that $Z^i_0(a)$ corresponds to the size of the clonal family in the sub-CPP induced by the $i$th individual at time $t - a$, see Figure 5). From Proposition 5.1, it follows that $\mathbb{1}_{Z^i_0(a) = k}$ satisfies the conditions of Theorem 4.2, so [...]

Proof of Theorems 3.1 and 3.2
Let $a$ and $t$ be two positive real numbers such that $a < t$, and $n$ a positive integer. We call $k$-mutation a mutation represented by $k$ alive individuals at time $t$ in the splitting tree. Let $(A^{(i)}(k,a))_{k \ge 1}$ be the frequency spectrum in the $i$-th subtree of the construction provided by Proposition 5.1.
To count the number of $n$-tuples in the set of $k$-mutations, we look along the tree and search for mutations in the CPP. For each $k$-mutation encountered, we count the number of $(n-1)$-tuples made of younger $k$-mutations. The $(n-1)$-tuples should be enumerated by decomposition in each subtree in order to exploit the independence property of the subtrees of Proposition 5.1. Suppose that a mutation is encountered at a time $a$; then the number of $(n-1)$-tuples made of younger mutations is given by [...] So the number $A(k,t)_{(n)}$ of $n$-tuples of $k$-mutations is given by [...] Finally, using that $N(da, di) = \mathbb{1}_{H_i > t-a}\, \mathbb{1}_{i < N_t}\, P(da, di)$, where $P$ is independent from the CPP (and, hence, from $N^{(t)}_{t-a}$), it follows that [...] which ends the proof of Theorem 3.1. The proof of Theorem 3.2 follows exactly the same lines, and we leave it to the reader.

Joint moments of the frequency spectrum and $\mathbb{1}_{Z_0(t) = \ell}$
In order to compute the terms of the form [...] we need to ask that the sum of the numbers of clonal individuals in each subtree for which the type at time $t - a$ is the ancestral type is equal to $k$. We begin with the case $\mathbb{E}\left[A(k,t)\, \mathbb{1}_{Z_0(t) = \ell}\right]$ in order to highlight the ideas. In this case, we have the following result.
[...] $N^{(t)}_{t-a}$ refers to the size of the whole population in the lower tree $\hat{P}$ of the construction of Proposition 5.1, we similarly define $Z^{(t)}_0(a)$ as the size of the clonal population in the same tree (with the convention that mutations that occur at time $t - a$, i.e. on the leaves of the tree $\hat{P}$, do not affect $Z^{(t)}_0(a)$). It follows that [...] 0 (a) is not independent from P, but we have that Z [...] which is given, thanks to Proposition 4.1 of [4], by [...] (s, u)e −θs ds.
Proof. Let A 1 and A 2 denote the two terms of the r.h.s. of (5.4). We detail the computations of A 1 . The case A 2 is similar.
Finally, if we define, for all integer j, , and G j : we get These ideas also lead to the following formula, which is proved similarly.
Proof. According to Section 5.2, we have the following integral representation. We − B j . Now, following, as above, we Then, the sum with σ can be removed since there is no term depending on σ. Finally, integrating with respect to C(di) leads to the result.
Together with Theorems 3.1 and 3.2 and using the joint law of N (t) t−a and Z (t) 0 (a) given in (5.6), these formulas give explicit recursion to compute each factorial moment of the frequency spectrum.
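Since the recursion is driven by factorial moments of a geometric population size, a small numerical check may help when implementing it. The sketch below assumes a geometric law on $\{1, 2, \dots\}$ with success probability `p` (for $N^{(t)}_{t-a}$ under $\mathbb{P}_t$ this would be $p = W(a)/W(t)$ under the reading adopted here); the function names are hypothetical.

```python
import math


def geometric_factorial_moment(n, p):
    """E[N(N-1)...(N-n+1)] for P(N = k) = p*(1-p)**(k-1), k >= 1, and n >= 1."""
    return math.factorial(n) * (1.0 - p) ** (n - 1) / p ** n


def factorial_moment_by_series(n, p, terms=2000):
    """Same quantity, computed by brute-force summation of the series."""
    total = 0.0
    for k in range(1, terms + 1):
        falling = 1.0
        for j in range(n):
            falling *= (k - j)          # falling factorial k(k-1)...(k-n+1)
        total += falling * p * (1.0 - p) ** (k - 1)
    return total


def as_polynomial_in_ratio(n, x):
    """Closed form rewritten with x = 1/p: n! * x * (x - 1)**(n - 1)."""
    return math.factorial(n) * x * (x - 1.0) ** (n - 1)
```

Written in terms of $x = 1/p$, the $n$-th factorial moment is the polynomial $n!\,x\,(x-1)^{n-1}$ of degree $n$, which is the kind of polynomial growth in $W(t)/W(a)$ that the asymptotic analysis of the moments relies on.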
Remark 5.6. Although these formulas are quite heavy, an important interest lies in the method used to compute them. Indeed, this method should work to obtain the joint moments of $A(k,t)$ with any quantity which can be expressed, at any time $a$, as the sum of the contributions of each subtree. For instance, since [...] where $N^i_a$ is the number of individuals of the $i$-th subtree at time $a$, we are able to compute the joint moments of $N_t$ and $(A(k,t))_{k \ge 1}$. For example, using the integral representation (5.1) of $A(k,t)$ and following the proof of Theorem 5.2, we have that [...] $\theta\, da$. (5.7)

Application to the computation of the covariances of the frequency spectrum
A quantity of particular interest is the limit covariance between two terms of the frequency spectrum.
Proposition 5.7. Suppose that $\alpha > 0$. Let $k$ and $l$ be two positive integers, then, [...]
Proof. In order to show how quantities in Theorem 3.2 can be manipulated, we detail the proof.
Using Theorem 3.2, we obtain [...] $\left(\mathbb{E}_a\left[A(l,a)\, \mathbb{1}_{Z_0(a) = k}\right] + \mathbb{E}_a\left[A(k,a)\, \mathbb{1}_{Z_0(a) = l}\right]\right) da$.
Recalling, from Proposition 5.1, that $N^{(t)}_{t-a}$ is geometrically distributed with parameter $W(a)/W(t)$, it follows by Lemma 2.2 and Theorem 5.2 that [...] The last equality follows from the identity [...]

Asymptotic behaviour of the moments of the frequency spectrum
In this part, we study the long time behaviour of the moments of the frequency spectrum. From this point and until the end of this work, we suppose that the tree is supercritical, that is α > 0.
where the $c_{k_i}$'s are as defined in Proposition 5.7.
Proof. Step 1: Preliminaries and ideas. The proposition is proved by induction.
Using the symmetry of the formula provided by Theorem 3.2, we may restrict to the study of the term $l = 1$ in (3.1). Hence, we want to study [...] We recall that the terms of the multi-sum in the above formula correspond to the ways of allocating the mutations in the subtrees. The analysis relies on the fact that the growth of each term depends on the allocation of the mutations. In particular, the main term corresponds to the case where all mutations are allocated to different subtrees.
To capitalize on this fact, let M N (t) [...] respectively. To simplify the analysis, we highlight three cases of interest: [...] This set corresponds to the case where all the mutations are taken in different subtrees and are not taken in the tree where a mutation just occurred. In fact, this corresponds to the dominant term of (6.2) because, as $N^{(t)}_{t-a}$ tends to be large, the mutations tend to occur in different subtrees. Let also [...] Finally, let [...]
Step 2: Uniform bound on the number of tuples of mutations in the subtrees.
Assuming that the relation of Lemma 6.1 is true for any multi-integer $n'$ such that $|n'| = |n| - 1$, we have [...] Since there are at most $|n| - 1$ multi-integers $n_m$ such that $|n_m| > 0$ (because of the condition (6.3)), we can assume without loss of generality, up to reordering the indices, that $n^i_m = 0$ for all $m \ge |n|$, and so all the terms with $m > |n|$ in the product of (6.4) are equal to one. Hence, [...] for some constant $C_n$ depending only on the choice of $n$ in $M_{|n|}$. Moreover, since $M_{|n|}$ is finite, then [...] $A(k_i, a)_{(n^i_m)} \le C\, W(a)^{|n|-1}$. (6.6)
Step 3: Analysis of $C_1$.
For $n \in C_1$, and in this case only, the product has only one term different from 1, and it follows from Theorem 5.2 that [...] The corresponding contribution in (6.2) is [...] Finally, [...] where $(x)_{(|n|)}$ is the falling factorial of order $|n|$. Since $N^{(t)}_{t-a}$ is geometrically distributed under $\mathbb{P}_t$ with parameter $W(a)/W(t)$, it follows that [...]
Step 4: Analysis of $C_2$.
We denote [...] Now, since $N^{(t)}_{t-a}$ is geometrically distributed with parameter $W(a)/W(t)$, it follows that there exists a positive real number $\hat{C}$ such that [...] which implies that [...]
Step 5: Analysis of $C_3$.
In the case where there is a positive $n^i$ [...] Then, the expectation of the last quantity gives a polynomial of degree $|n| - 1$ in $W(t)/W(a)$.
Using the same study as for $I_2$ shows that this part is of order $O\left(t\, W(t)^{|n|-1}\right)$. Finally, summing over $l$ ends the proof since the leading term is [...] while the rest is a finite sum of $O\left(t\, W(t)^{|n|-1}\right)$-terms. By Lemma 2.2, [...] where $\gamma$ is equal to $\theta$ (resp. $2\alpha - \theta$) in the clonal critical and subcritical cases (resp. supercritical case). Hence, we deduce (6.1).

Remark 6.2. Taking the behavior of $\mathbb{P}(Z_0(a) = k)$ into account and using the Cauchy-Schwarz inequality for $\mathbb{E}\left[A(k,a)\, \mathbb{1}_{Z_0(a) = \ell}\right]$, one could actually prove that the error term in (6.1) is of order $O\left(W(t)^{|n|-1}\right)$ in the clonal sub-critical and super-critical cases, and $O\left(\log t\; W(t)^{|n|-1}\right)$ in the clonal critical case.
Corollary 6.3. We have, conditionally on non-extinction, [...] where $E$ is an exponential random variable with parameter 1.
Proof. From Lemma 6.1, we have [...] Since the finite-dimensional law of the process $(E\, c_k)_{k \ge 1}$ is fully determined by its moments, it follows from the multidimensional moment problem (see [17]) and from the fact that the events $\{N_t > 0\}$ decrease to the event of non-extinction, that we have the claimed convergence.

An elementary proof of the a.s. convergences of the frequency spectrum and the population counting process
The goal of this section is to prove in Subsection 7.2 the a.s. convergence of the frequency spectrum. We begin by showing the law of large numbers for N t . We recall once again that we are in the supercritical case (α > 0).

Convergence of the population counting process
Assume that $\alpha > 0$, so that $W(t) \sim e^{\alpha t}/\psi'(\alpha)$. The goal of this section is to prove the almost sure convergence of the population counting process. We first show that the convergence holds in probability, using the convergence of the process which counts at time $t$ the number $N^\infty_t$ of individuals having infinite descent. More formally, recalling that a splitting tree is a subset of $\mathbb{R} \times \bigcup_{k \ge 0} \mathbb{N}^k$ (see [19]), an individual $(u, t)$ in the tree is said to have infinite descent at time $t$ if, for any $T > t$, there exists $\tilde{u}$ in $\bigcup_{n \ge 0} \mathbb{N}^n$ such that $(T, u\tilde{u})$ belongs to the tree.
Finally, to obtain the almost sure convergence, we show in Theorem 7.2 that $N_t$ cannot fluctuate faster than a Yule process.
Proposition 7.1. Let $(N^\infty_t, t \in \mathbb{R}_+)$ be the number of individuals alive at time $t$ having alive descendants at infinity. Then, under $\mathbb{P}_\infty$, $N^\infty$ is a Yule process with parameter $\alpha$.
Proof. Let $T, t \in \mathbb{R}_+$. Recalling that, for $t < T$, $N^{(T)}_t$ is the number of individuals at time $t$ who have alive descendants at time $T$, we extend the notation to $t > T$ by setting $N^{(T)}_t = 0$ in this case. Fix a positive real number $S$; we consider the quantity [...] There exists a finite time $T_S$ such that [...] N converges to $N^\infty$ a.s. for the Skorokhod topology of $D[0, \infty)$ and $N^\infty$ is a.s. càdlàg. Now, it remains to derive from $N^{(T)}$ the law of the process $N^\infty$. Let $0 < s_1 < s_2 < \cdots < s_n < T$. By a recursive use of Proposition 5.1, we see that, under $\mathbb{P}_T$, the process N [...] Moreover, we have, by Lemma 2.2, [...] This leads to [...] Since the right-hand side corresponds to the finite-dimensional distribution of a Yule process with parameter $\alpha$, this concludes the proof.
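The one-dimensional consequence of Proposition 7.1 can be checked by hand. The computation below is an illustrative sketch: it uses the standard fact that the marginal at time $t$ of a Yule process with parameter $\alpha$ started from one individual is geometric with parameter $p = e^{-\alpha t}$, and it only yields convergence in distribution of $e^{-\alpha t} N^\infty_t$, not the almost sure convergence.

```latex
\begin{align*}
\mathbb{E}_\infty\bigl[e^{-\lambda p N^\infty_t}\bigr]
 &= \sum_{k \ge 1} p\,(1-p)^{k-1} e^{-\lambda p k}
  = \frac{p\,e^{-\lambda p}}{1 - (1-p)\,e^{-\lambda p}},
  \qquad p = e^{-\alpha t},\\
1 - (1-p)\,e^{-\lambda p} &= p\,(1 + \lambda) + O(p^2) \qquad (p \to 0),\\
\lim_{t \to \infty} \mathbb{E}_\infty\bigl[e^{-\lambda e^{-\alpha t} N^\infty_t}\bigr]
 &= \frac{1}{1 + \lambda}, \qquad \lambda \ge 0.
\end{align*}
```

The limit $1/(1+\lambda)$ is the Laplace transform of an exponential random variable with parameter 1.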
As $N^\infty$ is a Yule process, $e^{-\alpha t} N^\infty_t$ converges a.s. to an exponential random variable of parameter 1, denoted $E$ hereafter, when $t$ goes to infinity (see for instance [2]). [...]
Proof. We first look at the quantity [...] First note that $N^\infty_t$ can always be written as a sum of Bernoulli trials, we have [...] e αt = (p t (W (t) − 1) +p t ) P (N t > 0) P (Non-ex) .
P (Non-ex) = E e −αV , where $V$ is a random variable with law $P_V$ (i.e. the lifetime of a typical individual). It then follows from Lebesgue's theorem that [...] where the second equality comes from the fact that $N_t$ is geometrically distributed with parameter $W(t)^{-1}$ under $\mathbb{P}_t$.
Recalling also that $N^\infty_t$ is geometrically distributed with parameter $e^{-\alpha t}$ under $\mathbb{P}_\infty$, it follows that [...] Hence, it follows from (7.3), (7.4), (7.2) and Lemma 2.2 that
$$\mathbb{E}_t\left[e^{-2\alpha t}\left(N^\infty_t - \psi'(\alpha)\, N_t\right)^2\right] = O\left(e^{-\beta t}\right). \tag{7.5}$$
Let us now define, for every integer $n$, $t_n = \frac{2}{\beta} \log n$. Then, by the previous estimate, it follows from the Borel-Cantelli lemma and a Markov-type inequality that
$$\lim_{n \to \infty} e^{-\alpha t_n} N_{t_n} = \frac{E}{\psi'(\alpha)}, \quad \text{a.s.}, \tag{7.6}$$
on the survival event. From this point, we need to control the fluctuations of $N$ between the times $(t_n)_{n \ge 1}$. The births can be controlled by comparison with a Yule process, but the deaths are harder to control. For this, we use that, by (7.6), $e^{-\alpha t_{n+1}} N_{t_{n+1}} - e^{-\alpha t_n} N_{t_n}$ is small for $n$ large. [...] Y tn+1−tn − Y s−tn > e αtn + P tn e −αtn N tn − e −αtn+1 N tn+1 > ≤ 2 P tn Y tn+1 − Y tn > e αtn + P tn e −αtn N tn − e −αtn+1 N tn+1 > .
Since Markov inequalities are not precise enough to go further, we need to compute exactly the probability [...] Now, for the convergence in $L^2$, we have that [...]
Proof. Using (5.7) and the bound $\mathbb{E}\left[N_a \mathbb{1}_{Z_0(a) = k}\right] \le \mathbb{E}[N_a]$, it follows that [...] Finally, it follows from Lemma 2.2 that [...] where $\gamma$ is equal to $\theta$ (resp. $2\alpha - \theta$) in the clonal critical and sub-critical cases (resp. supercritical case).
From this point we follow the proof of Theorem 7.2, except that the Yule process used in (7.7) must be replaced by another Yule process corresponding to a binary fission every time an individual experiences a birth or a mutation, i.e. the new Yule process has parameter $b + \theta$. Indeed, the process $A(k,t)$ can make a positive jump only in two cases: the first one corresponds to the birth of an individual in a family of size $k - 1$, the other one corresponds to a mutation occurring on an individual in a family of size $k + 1$.