Path-properties of the tree-valued Fleming-Viot process

We consider the tree-valued Fleming-Viot process, $(\mathcal X_t)_{t\geq 0}$, with mutation and selection as studied in Depperschmidt, Greven, Pfaffelhuber (2012). This process models the stochastic evolution of the genealogies and (allelic) types under resampling, mutation and selection in the population currently alive in the limit of infinitely large populations. Genealogies and types are described by (isometry classes of) marked metric measure spaces. The long-time limit of the neutral tree-valued Fleming-Viot dynamics is an equilibrium given via the marked metric measure space associated with the Kingman coalescent. In the present paper we pursue two closely linked goals. First, we show that two well-known properties of the neutral Fleming-Viot genealogies at fixed time $t$ arising from the properties of the dual, namely the Kingman coalescent, hold for the whole path. These properties are related to the geometry of the family tree close to its leaves. In particular we consider the number and the size of subfamilies whose individuals are not further than $\varepsilon$ apart in the limit $\varepsilon\to 0$. Second, we answer two open questions about the sample paths of the tree-valued Fleming-Viot process. We show that for all $t>0$ almost surely the marked metric measure space $\mathcal X_t$ has no atoms and admits a mark function. The latter property means that all individuals in the tree-valued Fleming-Viot process can uniquely be assigned a type. All main results are proven for the neutral case and then carried over to selective cases via Girsanov's formula giving absolute continuity.

Roughly speaking, in the limit ε → 0, approximately 2/ε balls of radius ε are needed to cover the whole tree; see Section 4.2 in D. Aldous' review article [Ald99]. Equivalently, there are approximately 2/ε families whose individuals have a common ancestor not further than ε in the past. Moreover, these 2/ε families have sizes of order ε. More precisely, the size of a typical family is exponentially distributed with parameter 2/ε (see eq. (35) in [Ald99]), and the empirical distribution of the family sizes converges to this exponential distribution. However, these results have been proved only for the genealogy of a population at a fixed time.
In a series of papers by the authors, in part with A. Winter [GPW09, DGP11, GPW13, DGP12], the Kingman coalescent was extended to a tree-valued process $(\mathcal X_t)_{t\geq 0}$, where $\mathcal X_t$ gives the genealogy of an evolving population at time $t$. The resulting process, the tree-valued Fleming-Viot process, is connected to the Fleming-Viot measure-valued diffusion, which describes the evolution of type-frequencies in a large (i.e. infinite) population of constant size. In the simplest case of neutral evolution all individuals have the same chance to produce viable offspring, i.e., the frequency of offspring of any subset of individuals is a martingale. However, biologically most interesting is the selective case where the evolutionary success of an individual depends on its (allelic) type and where also mutation (i.e. random changes in types) may occur. This case including mutation and selection was studied in [DGP12]. We note that rather than studying the full tree-valued process in the infinite population limit, it is possible to obtain limits of its functionals directly as well. For the neutral tree-valued Fleming-Viot process, this has been done for the height [PW06, DDSJ10] and the length [PWW11]. In addition, functionals of other tree-valued processes have been studied, e.g. for the height of the tree in branching processes [ER10] and for the height and length of a population with the Bolthausen-Sznitman coalescent as long-time limit [Sch12].

Goals:
The construction of the tree-valued Fleming-Viot process allows one to ask whether the above-mentioned properties of the geometry of the Kingman coalescent trees are almost sure path properties of the tree-valued Fleming-Viot process. Furthermore, while we gave a construction of the tree-valued Fleming-Viot process under neutrality in [GPW13] and under mutation and selection in [DGP12], some questions about path behavior remained open. We will carry over some (not all) of the geometric properties of the fixed random trees to the evolving paths of trees in Theorems 1-4 of this work.
In the next section, we explain in detail how we model genealogical trees. In order to formulate open questions let us briefly mention here that we use a marked metric measure space (mmm-space), that is, a triple $(U, r, \mu)$ where $(U, r)$ is a complete metric space describing genealogical distances between individuals and $\mu$ is a probability measure on the Borel σ-algebra of $U \times A$, where $A$ is the set of possible (allelic) types. In particular, the tree-valued Fleming-Viot process $(\mathcal X_t)_{t\geq 0}$ takes values in the space of (continuous) paths in the space of mmm-spaces.
To state two open questions from earlier work (see Remark 3.11 in [DGP12]), let $\mathcal X_t = (U_t, r_t, \mu_t)$ be the state of the tree-valued Fleming-Viot process at time $t \geq 0$. First, we ask if the measure $\mu_t$ has atoms for some $t > 0$. To understand what this means, recall that the state of the measure-valued Fleming-Viot process is purely atomic for all $t > 0$, almost surely. However, in the tree-valued case, existence of an atom in the measure $\mu_t \in \mathcal M_1(U_t \times A)$ implies that there exists a set of positive $\mu_t$-mass such that individuals belonging to this set have zero genealogical distance to each other. As we will see in Theorem 5, this is not possible, and the tree-valued Fleming-Viot process is non-atomic for all $t > 0$, almost surely. Second, we ask if every individual in $U_t$ can uniquely be assigned a type, which is of course the case for the Moran model, but does not automatically carry over to the (infinite population) diffusion limit. This is the case iff the support of $\mu_t$ is given by $\{(u, \kappa_t(u)) : u \in U_t\}$ for a function $\kappa_t : U_t \to A$. In Theorem 6, we will see that this is indeed the case and every individual can be assigned a type for all $t > 0$, almost surely.

Methods:
Since the tree-valued Fleming-Viot process was constructed using a well-posed martingale problem, we will frequently use martingale techniques in our proofs. These allow us to study the sample Laplace transform for the distance of two points of the tree as a semi-martingale. In addition, population models have specific features that will also be useful. For example, all individuals have unique ancestors even though not all individuals have descendants, and if an individual has a descendant, she might as well have many. This simple structure can be used to derive properties of the family structure, both for finite population models (e.g. the Moran model) and for the tree-valued Fleming-Viot process, since this infinite model arises as a large-population limit from finite Moran models (for the neutral case see Theorem 2 of [GPW13] and for the selective case Theorem 3 of [DGP12]).
An important point of the proofs is that we can transfer properties from the neutral case: for most forms of selection (which are determined by the interacting fitness functions, which give the dependence of the offspring distribution on the allelic type), the resulting process is absolutely continuous with respect to the neutral case (which comes with no dependency between allelic type and offspring distribution) via a Girsanov transform.
Outline: The paper is organized as follows: In Section 2 we recall the definition of the state space of the tree-valued Fleming-Viot process, its construction by a wellposed martingale problem and some of its properties. In Section 3, we give our main results. Theorem 1 states that the law of large numbers for the number of ancestors of Kingman's coalescent holds along the whole path of the tree-valued Fleming-Viot process. Moreover, we discover a Brownian motion within the tree-valued Fleming-Viot process based on the fluctuations of the number of ancestors; see Theorem 2. Another law of large numbers is obtained for a statistic concerning the family sizes and we make a big step towards this result in Theorem 3. Another Brownian motion is discovered within the tree-valued Fleming-Viot process based on family sizes in Theorem 4. Finally we show the non-atomicity along the path in Theorem 5 and obtain existence of a mark function in Theorem 6.
In Section 4 we prove Theorem 1 and after some preparatory moment computations in Section 5, we give in the subsequent sections the remaining proofs of the main results. We note that various proofs have been carried out using Mathematica and can be reproduced by the reader via the accompanying Mathematica-file.

The tree-valued Fleming-Viot process
In this section, we recall the tree-valued Fleming-Viot process given as the unique solution of a martingale problem on the space of marked metric measure spaces. The material presented here is a condensed version of results from [GPW09,DGP11,GPW13] and [DGP12]. We only recall notions needed to follow our arguments in the present paper. Let us fix some notation first.

Notation 2.1.
For a Polish space $E$ the set of all bounded measurable functions is denoted by $\mathcal B(E)$, its subset containing the bounded and continuous functions by $\mathcal C_b(E)$, the set of càdlàg functions $I \subseteq \mathbb R \to E$ by $\mathcal D_E(I)$ (which is equipped with the Skorohod topology) and the subset of continuous functions by $\mathcal C_E(I)$. The set of probability measures on (the Borel σ-algebra of) $E$ is denoted by $\mathcal M_1(E)$ and $\Rightarrow$ denotes either weak convergence of probability measures or convergence in distribution of random variables. If $\varphi : E \to E'$ for some Polish space $E'$ then the image measure of $\mu \in \mathcal M_1(E)$ under $\varphi$ is denoted by $\varphi_*\mu$. For functions $\lambda \mapsto a_\lambda$ and $\lambda \mapsto b_\lambda$, we write $a_\lambda \lesssim b_\lambda$ if there is $C > 0$ such that $a_\lambda \leq C b_\lambda$ uniformly for all $\lambda$. Furthermore for $\lambda_0 \in \mathbb R \cup \{\pm\infty\}$ we write $a_\lambda \stackrel{\lambda\to\lambda_0}{\approx} b_\lambda$ if $a_\lambda$ and $b_\lambda$ are asymptotically equivalent as $\lambda \to \lambda_0$, i.e. if $a_\lambda / b_\lambda \to 1$ as $\lambda \to \lambda_0$. For product spaces $E_1 \times E_2 \times \cdots$ we denote the projection operators by $\pi_{E_1}, \pi_{E_2}, \dots$. When there is no chance of ambiguity we use the shorter notation $\pi_1, \pi_2, \dots$.

The state space: genealogies as marked metric measure spaces
At any time t ≥ 0 the state of the neutral tree-valued Fleming-Viot process without types is a genealogical tree describing the ancestral relations among individuals alive at time t. Such trees can be encoded by ultrametric spaces and vice versa where the distance of two individuals is given by the time back to their most recent common ancestor. Adding selection and mutation to the process requires that we not only keep track of the genealogical distances between individuals but also of the type of each individual. This leads to the concept of marked metric measure spaces which we recall here. For more details and interpretation of the state space we refer to Section 2.3 in [DGP12] and to Remark 2.2 below.
Throughout, we fix a compact metric space $A$ which we refer to as the (allelic) type space. An $A$-marked (ultra-)metric measure space, abbreviated as $A$-mmm space or just mmm-space in the following, is a triple $(U, r, \mu)$, where $(U, r)$ is an ultra-metric space and $\mu \in \mathcal M_1(U \times A)$ is a probability measure on $U \times A$.
Remark 2.2 (Interpretation of equivalent marked metric measure spaces).
1. In our presentation, only ultra-metric spaces (U, r) will appear. The reason is that we only consider stochastic processes whose state at time t describes the genealogy of the population alive at time t, which makes r an ultra-metric.
2. There are several reasons why we consider equivalence classes of marked metric spaces instead of the marked metric spaces themselves. The most important is that we view a genealogical tree as a metric space on its set of leaves. Since in population genetic models the individuals are regarded as exchangeable (at least among individuals carrying the same allelic type), reordering of leaves does not change (in this view) the tree.
In order to construct a stochastic process with càdlàg paths and state space $\mathbb U_A$, we have to introduce a topology. To this end, we need to introduce test functions with domain $\mathbb U_A$.
Since we use polynomials as the domain of the generator for the tree-valued Fleming-Viot process, we need to restrict this class to smooth functions.

Definition 2.6 (Smooth polynomials).
We denote by
$$\Pi^1 := \Big\{ \Phi^{\varphi} \text{ as in (2.4)} : \varphi \text{ bounded, measurable and for all } a \in A^{\mathbb N},\ \varphi(\cdot, a) \in \mathcal C_b^1\big(\mathbb R^{\binom{\mathbb N}{2}}\big) \Big\} \qquad (2.8)$$
the set of smooth (in the first coordinate) polynomials. Furthermore we denote by $\Pi^1_n$ the subset of $\Pi^1$ consisting of all $\Phi^{\varphi}$ for which $\varphi(r, a)$ depends at most on the first $\binom{n}{2}$ coordinates of $r$ and the first $n$ of $a$ and hence have degree at most $n$.
The marked Gromov-weak topology on $\mathbb U_A$ is the coarsest topology such that all $\Phi^{\varphi} \in \Pi^1$ with (in both variables) continuous $\varphi$ are continuous.
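The following is from Theorems 2 and 5 in [DGP11]:
Proposition 2.8 (Some topological facts about $\mathbb U_A$).
The following properties hold:
1. The space $\mathbb U_A$ equipped with the marked Gromov-weak topology is Polish.
2. The set $\Pi^1$ is a convergence determining algebra of functions, i.e. for random $\mathbb U_A$-valued variables $\mathcal X, \mathcal X_1, \mathcal X_2, \dots$ we have
$$\mathcal X_n \Rightarrow \mathcal X \text{ as } n \to \infty \quad\text{iff}\quad E[\Phi(\mathcal X_n)] \to E[\Phi(\mathcal X)] \text{ for all } \Phi \in \Pi^1. \qquad (2.9)$$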

Construction of the tree-valued FV-process
The tree-valued Fleming-Viot process will be defined via a well-posed martingale problem. Let us briefly recall the concept of a martingale problem.

Definition 2.9 (Martingale problem).
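Let $E$ be a Polish space, $P_0 \in \mathcal M_1(E)$, $\mathcal F \subseteq \mathcal B(E)$ and $\Omega$ a linear operator on $\mathcal B(E)$ with domain $\mathcal F$. The law $P$ of an $E$-valued stochastic process $\mathcal X = (\mathcal X_t)_{t\geq 0}$ is a solution of the $(P_0, \Omega, \mathcal F)$-martingale problem if $\mathcal X_0$ has distribution $P_0$, $\mathcal X$ has paths in the space $\mathcal D_E([0, \infty))$, almost surely, and for all $F \in \mathcal F$,
$$\Big( F(\mathcal X_t) - \int_0^t \Omega F(\mathcal X_s)\, ds \Big)_{t \geq 0}$$
is a $P$-martingale with respect to the canonical filtration. The $(P_0, \Omega, \mathcal F)$-martingale problem is said to be well-posed if there is a unique solution $P$.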
4. For selection, consider the fitness function χ is continuous and continuously differentiable with respect to its third coordinate. Then, with α ≥ 0 (the selection intensity) and we say that selection is additive and conclude that with (2.24)
Remark 2.10 (Interpretation of generator terms).
The growth, resampling, mutation and selection generator terms are interpreted as follows:
1. Growth: The distance of any pair of individuals is given by the time to the most recent common ancestor (MRCA). When time passes this distance grows at speed 1. Note that in [GPW13] and [DGP12] the corresponding distance was twice the time to MRCA. The reason for this change is that it simplifies some terms in the computations that we will see later.
3. Let $\Phi = \Phi^{\varphi} \in \Pi^1_n$ such that $\varphi$ is symmetric under permutations. Then, the quadratic variation of the semi-martingale $\Phi(\mathcal X) := (\Phi(\mathcal X_t))_{t\geq 0}$ is given by (2.26)
4. Let $P_\alpha$ be the distribution of $\mathcal X$ with selection intensity $\alpha$. Then, for all $\alpha, \alpha' \geq 0$, the laws $P_\alpha$ and $P_{\alpha'}$ are absolutely continuous with respect to each other.
5. If either (i) $\alpha = 0$ and the process with generator $\Omega^{\mathrm{mut}}$ has a unique equilibrium or (ii) $\alpha \geq 0$ and mutation has a parent-independent component, then the process $\mathcal X$ is ergodic. That is, there is a $\mathbb U_{A,c}$-valued random variable $\mathcal X_\infty$, depending on the model parameters but not the initial law, such that $\mathcal X_t \xRightarrow{t\to\infty} \mathcal X_\infty$.
Using the same notation as in Proposition 2.11, we call the process $\mathcal X$ the tree-valued Fleming-Viot process and, in the case $\alpha = 0$, its ergodic limit is called the Kingman marked measure tree.
The random variable X ∞ arises from the marked ultrametric measure space which is associated with the partition-valued entrance law of the Kingman coalescent [GPW09].

Results
Our main goal is to establish almost sure properties of the paths of the tree-valued Fleming-Viot process, beyond continuity of paths and the property that the states are compact marked metric measure spaces for every t > 0, almost surely. We start by studying the geometry of the marked metric measure tree at time t of the tree-valued Fleming-Viot process. First we recall in Section 3.1 some well-known facts concerning the geometry of the Kingman coalescent and then extend them in Section 3.2 to the tree-valued Fleming-Viot process. In Section 3.3 we take advantage of our results and techniques and state some further path properties of the tree-valued Fleming-Viot process answering two open questions.

Geometric properties of the Kingman coalescent near the leaves
We focus on the Kingman marked measure tree $\mathcal X_\infty$ introduced in Proposition 2.11.5, but for most assertions in this subsection we can ignore the marks (i.e. think of $A$ consisting of only one element). Since the introduction of the partition-valued Kingman coalescent in [Kin82a], this random tree has been studied extensively, for instance in [Ald99] and [Eva00]; see also [BB09]. In our present formalism (using metric measure spaces), $\mathcal X_\infty$ appeared first in [GPW09]. In this section, we mostly reformulate known results, but also add a new one in Proposition 3.6.
The Kingman measure tree, $\mathcal X_\infty$, arises from the partition-valued Kingman coalescent, but can also be realized as a discrete graph tree using the following construction (see also Figure 1). Let $S_2, S_3, \dots$ be independent exponentially distributed random variables with parameter 1. Starting with two lines from the root the tree stays with these two lines for time $S_2 / \binom{2}{2}$. At time $S_2$ one of the two lines chosen at random splits in two, such that three lines are present. In general after the jump from $k-1$ to $k$ lines the tree stays with those $k$ lines for a period of time $S_k / \binom{k}{2}$ and then one of the $k$ lines chosen at random splits, such that there are $k+1$ lines. The total tree height is thus $T_1$, where
$$T_n := \frac{S_{n+1}}{\binom{n+1}{2}} + \frac{S_{n+2}}{\binom{n+2}{2}} + \cdots,$$
i.e. $T_n$ is the time it takes the coalescent to go from $n$ to infinitely many lines. The time of the root is called the time of the most recent common ancestor (MRCA) and $T$ is the present time of the population. In order to derive the Kingman marked metric measure tree, consider the uniform distribution on the branches and construct a tree-indexed Markov process, by using a collection of independent mutation processes as follows. Start with an equilibrium value of the mutation processes at the root up to the next splitting time where we continue with two independent mutation processes both starting from the type in the vertex, etc.
Running from the root to the leaves and letting time approach $T$ we finally obtain $\mathcal X_\infty$.
Figure 1: A construction of the Kingman measure tree $(X_\infty, r_\infty, \mu_\infty)$ without marks. The time axis carries the labels $T_1, \dots, T_6$ and $\lim_n T_n = 0$; two leaves are labelled $x$ and $y$. In the "dashed region" the tree comes down from infinitely many lines at the treetop (time 0) to six lines at time $T_6$. We have $r_\infty(x, y) = T_3$. The thick grey sub-tree is the closed and open ball of radius $T_3$ around $x$ and around $y$. The balls coincide because $r_\infty$ is an ultrametric.
At time ε (counted from the top of the tree, for $\varepsilon < T_1$), a random number $N_\varepsilon$ of lines are present. Equivalently, $N_\varepsilon$ is the minimal number of ε-balls needed to cover (the leaves of) the random tree $\mathcal X_\infty$. It is a well-known fact, using de Finetti's Theorem, that the frequency of the family descending from each of the $N_\varepsilon$ lines can be defined for all ε > 0. In addition, these frequencies are distributed as the spacings between $N_\varepsilon$ random variables uniformly distributed on [0, 1] [Pyk65].
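The construction above and the definition of $N_\varepsilon$ can be explored numerically. The following is a minimal sketch, not part of the original argument: the function names `coalescent_times` and `num_lines` are hypothetical, and the infinite sum defining $T_n$ is truncated at `k_max` lines, which is an assumption that is only reasonable for ε much larger than `2 / k_max`. Since $\sum_{k>n} 1/\binom{k}{2} = 2/n$, one has $E[T_n] = 2/n$, which matches the heuristic of roughly 2/ε lines at depth ε.

```python
import random


def coalescent_times(k_max, rng):
    """Return a list T with T[n] = sum_{k=n+1}^{k_max} S_k / binom(k, 2).

    T[k_max] = 0 stands in for the treetop; truncating at k_max lines
    is an approximation of the infinite sum defining T_n.
    """
    T = [0.0] * (k_max + 1)  # index 0 is unused
    for k in range(k_max, 1, -1):
        # The tree keeps k lines for an Exp(1) time divided by binom(k, 2).
        T[k - 1] = T[k] + rng.expovariate(1.0) / (k * (k - 1) / 2)
    return T


def num_lines(T, eps):
    """Number of lines at depth eps below the treetop: smallest n with T[n] <= eps."""
    n = 1
    while T[n] > eps:
        n += 1
    return n


rng = random.Random(1)
T = coalescent_times(20_000, rng)
for eps in (0.1, 0.01):
    # eps * N_eps should be close to 2 for small eps
    print(eps, eps * num_lines(T, eps))
```

A single realisation fluctuates around 2 on the scale $\sqrt{\varepsilon}$, so the printed values are only expected to be near 2, not equal to it.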
From these considerations several results on the geometry of $\mathcal X_\infty$ near the leaves can be derived. We briefly recall and extend some of them and reprove them later in our setting. Roughly, we will show that there are $2/\varepsilon \pm O(1/\sqrt{\varepsilon})$ many families in which the genealogical distance between the individuals is at most ε. Furthermore, each of the families has mass of order ε, as ε → 0. More precisely, the distribution of the family sizes rescaled by ε is exponential with rate 2. We split the above picture in two parts. First we study the number of families and then their size, in both cases looking at a LLN and then at a CLT. We begin with a law of large numbers and a central limit theorem for $N_\varepsilon$ (see (35) in [Ald99]). Our proofs are given in Sections 4.2 and 4.3.
(3.1)
Proposition 3.2 (CLT for the number of balls to cover $\mathcal X_\infty$).
With the same notation as in Proposition 3.1 and $Z \sim N(0, 1)$,
We now come to the family structure of $\mathcal X_\infty = (U_\infty, r_\infty, \mu_\infty)$ close to the leaves. For ε > 0, we define $B_\varepsilon(1), \dots, B_\varepsilon(N_\varepsilon) \subseteq U_\infty$ as the disjoint balls of radius ε that cover $U_\infty$ and the corresponding frequencies by
Recall that in an ultrametric space two balls of the same radius are either equal or disjoint (see also Figure 1). Therefore, the vectors $(F_1(\varepsilon), \dots, F_{N_\varepsilon}(\varepsilon))$ above are defined in a unique way. Such a vector can be viewed as the frequency vector of a sequence of exchangeable random variables and we can ask for the law of the empirical distribution of the scaled masses in the limit ε → 0, where the underlying sequence, even if scaled, becomes i.i.d. and we should get the scaled law of a single scaled $F_i$. It turns out that a first step (cf. Remark 3.5) is to see that the following law of large numbers holds, the proof of which (together with the proof of Lemma 3.4) appears in Section 6.

Proposition 3.3 (Asymptotics of ball masses near the leaves).
For F i (ε) as above, almost surely.
The classical proof of Proposition 3.3 uses the fact that the random vector F 1 (ε), . . . , F Nε (ε) has the same distribution as the vector of spacings between N ε random variables uniformly distributed on [0, 1]. This vector in turn has the same distribution as Y 1 /( variables. Then, using a moment computation, (3.5) can be proved. For details we refer to Section 2 in [Eva00]. We will use a different route for which we need the following auxiliary Tauberian result.
This means that the Kingman coalescent at distance ε from the tree top consists of approximately 2/ε families, and the size of a randomly sampled family is exponentially distributed with expectation ε/2; in particular, the rescaled empirical measure of the family sizes converges weakly to the exponential distribution with mean 2, denoted by Exp(1/2).
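The picture of approximately 2/ε families of exponentially distributed size with expectation ε/2 can be checked numerically through the spacings representation recalled above. The sketch below is only an illustration under stated assumptions: the helper names `spacings` and `tail_fraction` are hypothetical, the random number $N_\varepsilon$ is replaced by the deterministic value `round(2 / eps)`, and the family sizes are taken as the `n` intervals cut out of [0, 1] by `n - 1` uniform points, which is one convention for "spacings". If family sizes are approximately exponential with mean ε/2, the fraction of families larger than ε/2 should be close to $e^{-1}$.

```python
import math
import random


def spacings(n, rng):
    """Lengths of the n intervals that n - 1 uniform points cut out of [0, 1]."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def tail_fraction(sizes, x):
    """Empirical fraction of rescaled sizes n * F_i that exceed x."""
    n = len(sizes)
    return sum(1 for f in sizes if n * f > x) / n


eps = 0.001
n = round(2 / eps)  # stand-in for N_eps, assumed deterministic here
sizes = spacings(n, random.Random(2))
# Fraction of families with size above eps / 2, to be compared with exp(-1)
print(tail_fraction(sizes, 1.0), math.exp(-1))
```

Rescaling by `n` rather than by 2/ε is the same thing in this sketch, because `n` was chosen as `round(2 / eps)`.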
In order to show this assertion using moments of (F i (ε)) i=1,...,Nε , it is necessary and sufficient that for k = 1, 2, . . . (3.9) The sufficiency follows since the moment problem for the exponential distribution is well posed, while for the necessity, we assume that (3.8) holds, and then one concludes (recall the notation ≈ from Remark 2.1) as well as, for k ≥ 2, to derive a CLT. Here, (36) in [Ald99] states that where (B 0 t ) 0≤t≤1 is a Brownian bridge. Another, for us more suitable formulation is to consider the sum multiplied by N −1 ε instead of ε/2, so that Z disappears on the right hand side. In this case one would consider the fluctuations of the empirical measure of masses of the B(ε)-balls that cover the Kingman coalescent tree.
We have so far investigated the behavior near the treetop looking at the family sizes with respect to fixed degree ε of kinship for ε → 0. This picture can be refined by obtaining fluctuation results in (3.5) (or (3.7)). We obtain a partial result by considering a degree of kinship ε/t for t varying in R + and letting ε → 0. This gives a profile of the family sizes of varying degrees of kinship and their correlation structure close to the leaves, if we view the scaling limit as a function of t > 0. This profile should be the deterministic flow of distributions {Exp(t/2) : t > 0} which are the limits of (3.13) as ε → 0. Again we consider the Laplace transform given through Ψ 12 λ and obtain the following fluctuation result -proved in Section 6.4.

Proposition 3.6 (Fluctuations of scaled small masses in small balls).
Let $\mathcal X_\infty$ be the Kingman measure tree. Define the process $Z^\lambda := (Z^\lambda_t)_{t\geq 0}$ by (3.14) Then every sequence $(Z^{\lambda_n})_{n\geq 0}$ with $\lambda_n \to \infty$ has a convergent subsequence $(\lambda'_n)_{n\geq 0}$ with $Z^{\lambda'_n} \xRightarrow{n\to\infty} Z$, (3.16)
Remark 3.7 (Is Z Gaussian?).
We conjecture that there is a unique limit process $Z$ in Proposition 3.6. Moreover, we note that $\mathrm{Var}[Z_t]$ and $E[Z_t^4]$ are in the relation if $Z_t \sim N(0, 1/2t)$, which raises the question whether $Z$ is a Gaussian process.

Path properties: the tree-valued Fleming-Viot process near the leaves
Although the Kingman measure tree, $\mathcal X_\infty$, only arises as the long-time limit of the neutral tree-valued Fleming-Viot process, $\mathcal X = (\mathcal X_t)_{t\geq 0}$, the trees $\mathcal X_t$ (for $t > 0$) and $\mathcal X_\infty$ have similar geometry near the leaves. The reason is that the structure near the leaves of $\mathcal X_\infty$ or $\mathcal X_t$ only depends on resampling events in the (very) recent past. Hence, we expect that the properties of $\mathcal X_\infty$ from Propositions 3.1 and 3.3 hold along the paths of $\mathcal X$. This will be shown in Theorem 1 and Theorem 3, respectively. Furthermore we conjecture (but do not have a proof) that the more ambitious refinements described in Remark 3.5 (see (3.9)) also hold along the paths. In addition, in the stationary regime $\mathcal X_0 \stackrel{d}{=} \mathcal X_\infty$, Theorems 2 and 4 give two results on convergence to a Brownian motion along the tree-valued Fleming-Viot process.
The following theorem is proved in Section 4.4.
Theorem 1 (Uniform convergence of $\varepsilon N_\varepsilon$ along paths).
Let $\mathcal X = (\mathcal X_t)_{t\geq 0}$ with $\mathcal X_t = (U_t, r_t, \mu_t)$ be the tree-valued Fleming-Viot process (started in some $\mathcal X_0 \in \mathbb U_A$) with selection coefficient $\alpha \geq 0$. Moreover, let $N^t_\varepsilon$ be the number of ε-balls needed to cover $(U_t, r_t)$. Then,
$$P\Big( \lim_{\varepsilon\to 0} \varepsilon N^t_\varepsilon = 2 \text{ for all } t > 0 \Big) = 1.$$
This program is now carried out along the tree-valued Fleming-Viot process.
In order to obtain a meaningful limit object, we consider time integrals. It is important to understand that the part of the time-t tree X t which is at most ε apart from the treetop is independent of F s := σ(X r : 0 ≤ r ≤ s) as long as s ≤ t − ε. The following is proved in Section 4.5.
Theorem 2 (A Brownian motion in the tree-valued Fleming-Viot process).
Let $\mathcal X = (\mathcal X_t)_{t\geq 0}$ with $\mathcal X_t = (U_t, r_t, \mu_t)$ be the neutral tree-valued Fleming-Viot process (i.e. $\alpha = 0$) started in equilibrium, $\mathcal X_0 \stackrel{d}{=} \mathcal X_\infty$, and $B_\varepsilon = (B_\varepsilon(t))_{t\geq 0}$ given by where $B = (B_t)_{t\geq 0}$ is a Brownian motion started in $B_0 = 0$.
In (3.18) one would rather like to replace $E[N^\infty_\varepsilon]$ by $2/\varepsilon$ to measure the fluctuations around the limit profile, i.e. to consider $\tilde B_\varepsilon := (\tilde B_\varepsilon(t))_{t\geq 0}$ defined by instead of $B_\varepsilon$. We will see that $\tilde B_\varepsilon$ converges as ε → 0 to a Brownian motion $\tilde B$ with drift, but unfortunately we cannot identify the latter. Indeed, from Proposition 3.2, in particular using boundedness of second moments, we see that, approximately, However, this only implies $E[N^\infty_\varepsilon] = 2/\varepsilon + o(1/\varepsilon)$ and the error term can be large. In order to sharpen this expansion to $E[N^\infty_\varepsilon] = 2/\varepsilon + O(1)$, we use results from [Tav84]. His Section 5.4 (with θ = 0 and i = ∞) yields with $\rho_k(\varepsilon) = \exp(-k(k-1)\varepsilon/2)$. From this, writing $\delta := \sqrt{\varepsilon}$ we also see that as ε → 0. This, together with Theorem 2, implies that $\tilde B_\varepsilon$ is of the form that is, $\tilde B$ is a Brownian motion with drift.
Now we come to a generalization of Proposition 3.3 to the tree-valued Fleming-Viot process. Together with Lemma 3.4, we obtain the following result on the Laplace transform of two randomly sampled points. The proof is based on martingale arguments which will also be useful in the proof of Theorem 6. Theorem 3 is proved in Section 6.4.
Based on these computations, we can only claim convergence in probability rather than almost sure convergence.
As an ultimate goal one would want to prove that (compare with (3.8)) This would mean that the assertion that roughly the tree consists of 2/ε families of mean ε/2 exponentially distributed sizes holds at all times. Using our conclusions from Remark 3.5, this goal can be achieved if we show that (3.9) holds for k = 1, 2, . . . uniformly at all times. (While the case k = 1 is trivial, note that a combination of Theorem 3 and Lemma 3.4 gives (3.9) for k = 2.) In principle, the technique of our proof of Proposition 3.3 can be extended in order to obtain (3.9) for a given but arbitrary k which would require controlling higher order moments of Ψ 12 λ . If we could do this for general k then we would obtain a proof of (3.8). But since we are using Mathematica for these calculations the problem remains open.
Again, we can formulate a result on fluctuations. Integrating over time (to get a process rather than white noise) the quantity $(\lambda + 1)\Psi^{12}_\lambda(\mathcal X_t) - 1$, which appears in Theorem 3, and using the right scaling, we again obtain a Brownian motion as the weak limit. The following result is proved in Section 6.5.

Remark 3.11 (A heuristic argument).
Assume that $\lambda$ is large. Then, $(\lambda + 1)\Psi^{12}_\lambda(\mathcal X_s) - 1$ depends approximately only on resampling events which happened within an interval $[s - C/\lambda, s]$ for some large $C$. In particular, on different time intervals (which are at least of order $1/\lambda$ apart), the increments of $W_\lambda$ are approximately independent. Thus, it is reasonable to expect that the limiting process is a local martingale. In fact, using some stochastic calculus we can show that the limiting process is continuous (i.e. the family $\{W_\lambda : \lambda > 0\}$ is tight in the space $\mathcal C_{\mathbb R}([0, \infty))$) and the limiting object of $(W_\lambda^2(t) - t)_{t\geq 0}$ is a local martingale as well. By Lévy's characterization of Brownian motion, $W$ must be a Brownian motion.

Path properties: non-atomicity and mark functions
Using the calculus developed for the statements in Section 3.2 we obtain two further properties of the states of the tree-valued Fleming-Viot process $\mathcal X = (\mathcal X_t)_{t\geq 0}$, $\mathcal X_t = (U_t, r_t, \mu_t)$, namely that the states are atom-free and admit a mark function. More precisely, Theorem 5 says that at no time is it possible to sample two individuals with distribution $\mu_t$ with distance zero; cf. Remark 3.12 below. Furthermore Theorem 6 says that we can assign marks to all individuals in the sense that $\mu_t$ has the form $\mu_t(du, da) = (\pi_{U_t})_*\mu_t(du)\,\delta_{\kappa_t(u)}(da)$ for some measurable function $\kappa_t : U_t \to A$. These two theorems are proved in Section 7.
Theorem 5 (X t never has an atom).
Let $\mathcal X = (\mathcal X_t)_{t\geq 0}$ with $\mathcal X_t = (U_t, r_t, \mu_t)$ be the tree-valued Fleming-Viot process. Then, $P(\mu_t \text{ has no atoms for all } t > 0) = 1$.
1. At first glance the fact that $\mu_t$ is non-atomic for all $t > 0$ might seem to contradict the fact that the measure-valued Fleming-Viot diffusion is purely atomic for every $t > 0$. However, the two properties are of different kinds and the probability measures in question are different objects: $\mu_t$ is a sampling measure and the state of the measure-valued Fleming-Viot diffusion is a probability measure on the type space. The above theorem implies that randomly sampled individuals from the tree-valued Fleming-Viot process have distance of order 1, whereas genealogically the atomicity of the measure-valued Fleming-Viot diffusion expresses the fact that at every time $t > 0$ one can cover the state with a finite number of balls with radius $t$.
2. The proof is based on a simple observation: for a measure µ ∈ M 1 (E), Hence, the proof of (3.29) is based on a detailed analysis of the Laplace transform of the distance of two points, independently sampled with distribution µ t .
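How a Laplace transform of the two-point distance can detect atoms may be illustrated on a finite toy space. The sketch below is not the authors' argument and uses hypothetical names (`two_point_laplace`, `atom_mass`): for a probability vector $p$ on finitely many distinct points with distance matrix $r$, the quantity $\sum_{i,j} p_i p_j e^{-\lambda r(i,j)}$ tends to $\sum_i p_i^2$ as $\lambda \to \infty$, and this limit is positive exactly when the measure has atoms. A finite measure always has atoms, so the non-atomic situation can only be mimicked here by spreading the mass evenly over many points.

```python
import math


def two_point_laplace(weights, dist, lam):
    """Laplace transform of the distance of two independent samples:
    sum over i, j of p_i * p_j * exp(-lam * r(i, j))."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * math.exp(-lam * dist[i][j])
        for i in range(n)
        for j in range(n)
    )


def atom_mass(weights):
    """Probability that two independent samples coincide: sum of p_i squared."""
    return sum(p * p for p in weights)


# Toy ultrametric on three points: r(0, 1) = 0.5, all other distances 1.
weights = [0.5, 0.3, 0.2]
dist = [[0.0, 0.5, 1.0], [0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]
for lam in (1.0, 10.0, 100.0):
    print(lam, two_point_laplace(weights, dist, lam), atom_mass(weights))

# Mass spread evenly over n points: the limiting value 1 / n vanishes as n grows.
print([atom_mass([1.0 / n] * n) for n in (10, 100, 1000)])
```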
The next goal is to establish that at any time there is a mark function. Briefly, the state (U, r, µ) of a tree-valued population dynamics admits a mark function κ iff every individual u ∈ U can be assigned a (unique) type κ(u) ∈ A. This situation occurs in particular in finite population models, e.g. in the Moran model. The question for the tree-valued Fleming-Viot model is whether types in the finite Moran model can change at a fast enough scale so that an individual can have several types in the large population limit. Such a situation can occur, if the cloud of very close relatives (as measured in the metric r) is not close in location (as measured in the type space A).

Definition 3.13 (Mark function).
We say that $(U, r, \mu) \in \mathbb U_A$ admits a mark function if there is a measurable function $\kappa : U \to A$ such that for a random pair (U, A) with values in $U \times A$ and distribution $\mu$, κ(U) = A µ-almost surely. (3.31) (3.32) We set $\mathbb U_A^{\mathrm{mark}} := \{ (U, r, \mu) \in \mathbb U_A : (U, r, \mu) \text{ admits a mark function} \}$. (3.33)
Remark 3.14 (mmm-spaces admitting a mark function are well-defined).
Let us note that admitting a mark function is a property of an equivalence class. Assume In other words, $(U', r', \mu')$ admits the mark function $\kappa' = \kappa \circ \phi$.
Theorem 6 ($\mathcal X_t$ admits a mark function for all $t$).

Remark 3.15 (Mark functions and the lookdown process).
For a series of exchangeable population models it is possible to construct the state of an infinite population via the lookdown construction [DK96, DK99]. This construction immediately allows us to define a mark function on a countable number of individuals, specifying their types at all times, which suggests that (3.35) should hold. However, the metric space read off from the lookdown process is not complete, and the mark function is not continuous. It seems possible to extend the definition of the mark function to the completion of the corresponding metric space by defining a (right-continuous) mark function on the tree from the root to the leaves. However, we do not pursue this direction here. Instead, our proof of Theorem 6 in Section 7.2 again uses martingale arguments and moment computations.

Strategy of proofs
The proofs of our results are of two types. On the one hand, the proofs of Propositions 3.1 and 3.2 and of Theorems 1 and 2 use the fine properties of coalescence times in Kingman's coalescent as their basic tools. This means they are carried out without specific martingale properties of the tree-valued Fleming-Viot process. On the other hand, Propositions 3.3 and 3.6 and Theorems 3, 4, 5 and 6 are proved by calculating expectations (moments) of polynomials, which is possible by using the martingale problem for the tree-valued Fleming-Viot process. The polynomials we have to consider here (see also Remark 2.5) are either $\Psi^{12}_\lambda$ or $\widehat\Psi^{12}_\lambda$, i.e. polynomials based on the test functions $\phi(r, a) = \exp(-\lambda r_{12})$ or $\phi(r, a) = \exp(-\lambda r_{12})\mathbf 1_{\{a_1 = a_2\}}$, and products, powers and linear combinations thereof. For the calculation of moments of this type we develop some methodology, which we explain in Section 5. Propositions 3.1 and 3.2 and Theorems 1 and 2 are proved in Section 4, while Propositions 3.3 and 3.6 and Theorems 3 and 4 are proved in Section 6. The latter results are then used to prove Theorems 5 and 6 in Section 7.

Proof of Propositions 3.1 and 3.2 and of Theorems 1 and 2

Preparation: times in the Kingman coalescent
Recall from Section 3.1 that $T_n = S_{n+1}/\binom{n+1}{2} + S_{n+2}/\binom{n+2}{2} + \cdots$ is the time the Kingman coalescent needs to go down to $n$ lines, where $S_2, S_3, \ldots$ are i.i.d. exponential random variables with rate 1. Before we begin, we prove some simple results on the times $T_n$.
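As a quick sanity check on this representation, the first two moments of $T_n$ can be evaluated directly from the series, since the $S_k$ are independent with mean and variance 1. The sketch below is an illustration only, with hypothetical function names and a truncation level $K$ that is an assumption of the example: the mean telescopes to $2/n - 2/K$, and the variance behaves like $4/(3n^3)$.

```python
from fractions import Fraction

def mean_partial(n, K):
    """Exact value of sum_{k=n+1}^{K} 1 / binom(k, 2), i.e. E[T_n] with the
    series truncated at K (requires n >= 1)."""
    return sum(Fraction(2, k * (k - 1)) for k in range(n + 1, K + 1))

def variance_partial(n, K):
    """Floating-point value of sum_{k=n+1}^{K} 1 / binom(k, 2)^2, i.e. Var[T_n]
    with the series truncated at K."""
    return sum(4.0 / (k * (k - 1)) ** 2 for k in range(n + 1, K + 1))

n, K = 100, 20000
print(mean_partial(n, K), Fraction(2, n) - Fraction(2, K))  # identical fractions
print(n ** 3 * variance_partial(n, K))                      # close to 4/3
```

Letting $K \to \infty$ gives $\mathbf E[T_n] = 2/n$, which is the scaling behind the statement that about $2/\varepsilon$ lines are present at depth $\varepsilon$.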
Lemma 4.1 (Moments and exponential moments of $T_n$). Let $T_n$ be the time the Kingman coalescent needs to go from infinitely many to $n$ lines. Then, Next, For fourth moments, For sixth moments, (4.6) With analogous calculations, the results for the 8th moment follow.
Finally for the exponential moments, we compute for any λ ≥ 0 (4.7)

Proof of Proposition 3.1
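Let $T_n$ be as in the last subsection and recall $N_\varepsilon$ from Proposition 3.1. Then, (3.1) is equivalent to
$$n T_n \xrightarrow{n\to\infty} 2 \quad \text{almost surely}.$$ (4.8)
In order to see this, note that $N_{T_n} = n$ by definition of $T_n$ and (4.9) Since $T_n \downarrow 0$ as $n \to \infty$ (and $N_\varepsilon \uparrow \infty$ as $\varepsilon \to 0$), the equivalence of (3.1) and (4.8) follows.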
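The relation between the number of lines at a small depth and the times $T_n$ can also be explored by simulation. The following sketch is only an illustration under explicit assumptions: the coalescent is started from a large finite number of lines instead of infinitely many, the seed and sample sizes are arbitrary, and the function names are hypothetical. It counts the lines present at depth $\varepsilon$ and averages $\varepsilon N_\varepsilon$ over a few runs.

```python
import random

def lines_at_depth(eps, start_lines, rng):
    """Number of lines of a Kingman coalescent at time eps, when started
    with start_lines lines (a finite stand-in for infinitely many)."""
    t = 0.0
    k = start_lines
    while k > 1:
        # With k lines, the next merger happens at rate binom(k, 2).
        t += rng.expovariate(k * (k - 1) / 2.0)
        if t > eps:
            return k
        k -= 1
    return 1

def average_scaled_count(eps, start_lines=20000, runs=20, seed=1):
    """Average of eps * N_eps over independent runs."""
    rng = random.Random(seed)
    total = sum(eps * lines_at_depth(eps, start_lines, rng) for _ in range(runs))
    return total / runs

print(average_scaled_count(0.01))  # expected to be close to 2
```

The truncation at a finite number of starting lines shifts all times by about $2/\text{start\_lines}$, which is negligible compared with $\varepsilon$ in this setting.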

Proof of Proposition 3.2
By the Lindeberg-Feller central limit theorem, we see from the moment computations of Lemma 4.1 that Recalling (4.9), we set such that for every $x \in \mathbb R$: However, we also need to show convergence of moments up to fourth order. We write To estimate the right hand side of (4.15) we first show that for a suitably chosen $\delta > 0$ where the equality follows because for $x \geq 6/\varepsilon$ the integrand is identically 0. Using the exponential Chebyshev inequality, we obtain for all λ y, now taking the lower bound for $c$, setting $\delta' = 2\delta/3$ and using (4.7) we get (4.17) Now choose λ y,ε = (2−y+1/2) 2 ε 2 and let $\delta = 0.39$ be the solution of $\sqrt{4 - \delta} = 1.9$. For $y \in (1.9, 2]$ we have It is easy to see that on the interval $[1.9, 2]$ the function $y \mapsto -\tfrac{55}{12} + \tfrac{13}{3} y - y^2$ is increasing and hence bounded below by $a = 0.04$ (its value at $y = 1.9$). It follows for a suitable constant $c > 0$, and hence we have shown (4.16).
In order to show convergence of fourth (and second) moments, using (4.15), since for some $c > 0$ and $x \geq 0$. For this, we get, by the Markov inequality and (4.6) (4.23) Since the region $x < 0$ is restricted to $x \leq c/\sqrt{\varepsilon}$ in (4.21), the $O(\cdot)$-term on the right hand side of (4.22) does not have a pole. It is now easy to obtain an integrable function dominating (4.21), leading to (3.3).

Proof of Theorem 1
By Proposition 2.11.5 (see Theorem 2 of [DGP12] for details) the tree-valued Fleming-Viot process with selection has a law which is absolutely continuous with respect to the neutral process. Therefore it suffices to consider the neutral case, $\alpha = 0$. We observe that for $\alpha = 0$ the claim is not affected by mutation. Moreover, it suffices to deal with the case $\mathcal X_0 \overset{d}{=} \mathcal X_\infty$. The reason is that (3.17) is equivalent to the assertion that for all $\delta > 0$ and uniformly for all $t > \delta$ we have $\lim_{\varepsilon\to 0} \varepsilon N^t_\varepsilon = 2$ almost surely. Then one can use the independence of $N^t_\varepsilon$ and $\mathcal X_0$ for $\varepsilon < t$.

Let
$$T^t_n := \inf\{\varepsilon > 0 : (U_t, r_t) \text{ can be covered by } n \text{ balls of radius } \varepsilon\},$$
i.e. $T^t_n$ is the minimal time we have to go back from time $t$ such that we have $n$ ancestral lineages. It suffices to show (see around (4.8)) that
$$\mathbf P\big(n T^t_n \xrightarrow{n\to\infty} 2 \text{ for all } t > 0\big) = 1.$$
To prove this, we need to extend the proof of Proposition 3.1. It suffices to show that (4.25) Therefore considering the process along a discrete grid, we have for any ε > 0, − 2) bounds from above and from below.
Fleming-Viot process. (This property holds in every population model arising as a diffusion limit from an individual-based population where we can define ancestors, since the ancestors at time $t - \varepsilon$ of the population at time $t$ must then be ancestors at time $t - \varepsilon = s - \varepsilon + (t - s)$ of the population at time $s$, for $t - \varepsilon < s < t$.) We can now write, for $\varepsilon > 0$ and $n > 2/\varepsilon$: (4.27) Hence $\limsup_{n\to\infty} \sup_{0\leq t\leq 1} n T^t_n \leq 2$ almost surely, by the Borel-Cantelli lemma. For the other direction of the inequality we (4.28) Hence $\liminf_{n\to\infty} \inf_{0\leq t\leq 1} n T^t_n \geq 2$ almost surely.
Combining the estimates from above and from below, we obtain the assertion of Theorem 1.

Proof of Theorem 2
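We proceed in the following four steps.
• Step 0: Warm up; computation of the first two moments of $B_\varepsilon(t) - B_\varepsilon(s)$.
• Step 1: Computation of the first two conditional moments of $B_\varepsilon(t) - B_\varepsilon(s)$.
• Step 2: The family of the laws of $(B_\varepsilon)_{\varepsilon>0}$ on $C_{\mathbb R}([0, \infty))$ is tight for $\varepsilon \to 0$.
• Step 3: If $(B_t)_{t\geq 0}$ is a limit point, then $(B_t)_{t\geq 0}$ as well as $(B_t^2 - t)_{t\geq 0}$ are martingales.
Throughout we let $(\mathcal F_t)_{t\geq 0}$ be the canonical filtration of $\mathcal X = (\mathcal X_t)_{t\geq 0}$.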
Step 0: Computation of first two moments of $B_\varepsilon(t) - B_\varepsilon(s)$. The first moment of $B_\varepsilon(t) - B_\varepsilon(s)$ equals 0 since by assumption the tree-valued Fleming-Viot process is in equilibrium.
For the second moment, we start by noting that (see Proposition 3.2) for some random variable $Z_s \sim N(0, 1)$. Note that $N^s_\varepsilon$ and $N^t_\delta$ are independent given (The reason is that in this case $N^s_\varepsilon$ depends on resampling events in the time interval $(s - \varepsilon, s]$ while $N^t_\delta$ only depends on resampling events in $(t - \delta, t]$, and these two sets of events are independent.) Without loss of generality, we set $s = 0$ and compute the variance of $B_\varepsilon(t)$ as (4.30) In order to compute the integrand of the last expression, we decompose N 0 The former is the number of lines the tree at time 0 loses between times $\varepsilon - \delta$ and $\varepsilon$ in the past, and therefore only depends on resampling events between times $-\varepsilon$ and $\delta - \varepsilon$. The latter only depends on resampling events between times $\delta - \varepsilon$ and $\delta$. Hence, (4.31) Consider now the dual representation of our equilibrium by the Kingman coalescent.
Let $K^N_i$ be the number of lines of a subtree, starting with $N$ lines, in a tree starting with infinitely many lines, at the time the big tree has $i$ lines. In Lemma 4 of [PWW11] it is shown that (4.32) Hence, for independent $Z, Z' \sim N(0, 1)$, (4.34)

Step 1: Computation of first two conditional moments of $B_\varepsilon(t) - B_\varepsilon(s)$. We can compute the first conditional moment as since we started in equilibrium and $N^r_\varepsilon$ is independent of $\mathcal F_s$ for $r > s + \varepsilon$. So, by (4.37) For the second conditional moment, we extend our calculation from Step 0. Here, by Proposition 3.2. So, combining the last two displays with (4.34), (4.40)

Step 2: The family of the laws of $(B_\varepsilon)_{\varepsilon>0}$ on $C_{\mathbb R}([0, \infty))$ is tight for $\varepsilon \to 0$. We use the Kolmogorov-Chentsov criterion; see e.g. Corollary 16.9 in [Kal02]. We bound the fourth moment of the increment in $B_\varepsilon$. We write $a_\varepsilon := \mathbf E[N^\infty_\varepsilon]$ such that, again making use of the independence of $N^{s_2}_\varepsilon$ and $N^{s_1}_\varepsilon$ if $|s_2 - s_1| > \varepsilon$, as well as of Proposition 3.2, we estimate for fixed $t$ and $\varepsilon \to 0$ (4.41) and the tightness follows.

Step 3: If $(B_t)_{t\geq 0}$ is a limit point, then $(B_t)_{t\geq 0}$ as well as $(B_t^2 - t)_{t\geq 0}$ are martingales. Let $B = (B_t)_{t\geq 0}$ be a weak limit point of $\{(B_\varepsilon(t))_{t\geq 0} : \varepsilon > 0\}$, which has continuous paths by Step 2. We know from Step 0 that $B_t$ is square integrable, which allows for the following calculations. For $0 \leq r_1 < \cdots \leq r_n \leq s$ and some continuous bounded function $f : \mathbb R^n \to \mathbb R$, by (4.37) which shows that $(B_t^2 - t)_{t\geq 0}$ is a martingale. Then, by Lévy's characterization of Brownian motion, $B$ is a Brownian motion.

The key to the proofs of Proposition 3.3 and Theorems 3, 4, 5 and 6 is a proper understanding of the behavior of the functions $\Psi^{12}_\lambda(\mathcal X_t)$ and $\widehat\Psi^{12}_\lambda(\mathcal X_t)$ from Remark 2.5 along the paths of the tree-valued Fleming-Viot dynamics. In this section, we provide useful tools for the analysis of these functions. In particular, we are going to compute moments up to fourth order. Throughout this section we assume that $\mathcal X = (\mathcal X_t)_{t\geq 0}$ is the neutral tree-valued Fleming-Viot equilibrium process. For the mutation operator, we assume throughout this section that $\beta(u, \cdot)$ is non-atomic for all $u \in A$.
This assumption implies that every mutation event leads to a new type. In particular, this will be crucial in Lemma 5.8.

and its increments
The key is to obtain the power at which $\Psi^{12}_\lambda$ vanishes as $\lambda \to \infty$ and, respectively, the order at which the increments of $t \mapsto ((\lambda + 1)\Psi^{12}_\lambda(\mathcal X_t) - 1)$ vanish as $t \to 0$. Proofs of the next two lemmata are given in Section 5.3.

Lemma 5.1 (Convergence for fixed times).
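For all $t \geq 0$, as $\lambda \to \infty$,
$$(\lambda + 1)\Psi^{12}_\lambda(\mathcal X_t) \to 1 \quad\text{and}\quad (\lambda + 2\vartheta + 1)\widehat\Psi^{12}_\lambda(\mathcal X_t) \to 1$$ (5.3)
in $L^2$ (and therefore also in probability).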

Remark 5.3 (Tightness).
If the right hand side of (5.4) were $Ct^{1+\varepsilon}$ for some $\varepsilon > 0$, then Lemma 5.2 would imply the needed tightness for Theorem 3 by the Kolmogorov-Chentsov tightness criterion; see e.g. Corollary 16.9 in [Kal02]. However, since it is only linear in $t$, we will have to use fourth moments to get the desired tightness property.
The main goal of this subsection is therefore to compute the fourth moments of the increments of $t \mapsto (\lambda + 1)\Psi^{12}_\lambda(\mathcal X_t)$ and of $t \mapsto (\lambda + 2\vartheta + 1)\widehat\Psi^{12}_\lambda(\mathcal X_t)$. The proofs of the next two results are given in Sections 5.5 and 5.6.
For all $t \geq 0$, as $\lambda \to \infty$, In particular, the convergences in (5.3) hold in $L^4$ as well.
Lemma 5.5 (Fourth moment of increment of $t \mapsto (\lambda + 1)\Psi^{12}_\lambda(\mathcal X_t)$). There exists $C > 0$ such that (5.8)
We start with second moments and afterwards provide an automated procedure for computing higher order moments using Mathematica.
We start by defining some functions which appear in the first and second moment equations for the polynomials $\Psi^{12}_\lambda$ and $\widehat\Psi^{12}_\lambda$ defined in Example 2.5.
Definition 5.6 (Some functions on U). We define the following functions in Π 1 : , , .

Remark 5.7 (Connections and a first moment).
Our main objects of study are the functions $\Psi^{12}_\lambda$ and $\widehat\Psi^{12}_\lambda$. Note that since $\mathcal X_0$ has the same law as $\mathcal X_\infty$ and since in $\mathcal X_\infty$ the time it takes for two randomly chosen lines to coalesce is an exponential random variable $R$ with parameter 1, we have Moreover, if mutations arise at constant rate $\vartheta > 0$, consider again two randomly chosen lines which coalesce at time $R$, and the first mutation event at an independent, exponentially distributed time $S$ with rate $\vartheta$, using (5.2): Moreover, we write and analogously for $\Psi^{12,34}_\lambda$.
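The neutral first moment referred to in this remark can be checked numerically. The sketch below assumes, as the normalisation $(\lambda + 1)\Psi^{12}_\lambda$ used throughout suggests, that the equilibrium mean of $\Psi^{12}_\lambda$ is the Laplace transform $\mathbf E[e^{-\lambda R}]$ of the coalescence time $R \sim \mathrm{Exp}(1)$; this reading of the missing display, the quadrature parameters and the function name are assumptions of the example, not statements from the paper.

```python
import math

def laplace_of_exponential(lam, rate=1.0, steps=200000):
    """Midpoint-rule approximation of E[exp(-lam * R)] for R exponential with
    the given rate, i.e. of the integral of rate * exp(-(lam + rate) * x)."""
    upper = 40.0 / (lam + rate)  # the neglected tail is of order exp(-40)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += rate * math.exp(-(lam + rate) * x)
    return total * h

for lam in (0.5, 3.0, 50.0):
    value = laplace_of_exponential(lam)
    # Compare with the closed form 1 / (1 + lam); the rescaled value is close to 1.
    print(lam, value, 1.0 / (1.0 + lam), (lam + 1.0) * value)
```

Under this assumption the rescaled quantity $(\lambda + 1)\Psi^{12}_\lambda$ has mean 1 in equilibrium, which is why it is centred at 1 in the lemmata of this section.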
Next we compute the action of the generator of X on the functions from Definition 5.6.
Proof of Lemma 5.1. We only give the full proof for the first assertion in (5.3) since the second follows exactly along the same lines. The proof amounts to computing the variance of $\Psi^{12}_\lambda$. Therefore, we need to understand the expectation of $(\Psi^{12}_\lambda)^2 = \Psi^{12,34}_\lambda$ (for the last equality, see (2.6)). To this end we express the $\Psi_\lambda$'s in terms of the $\Upsilon_\lambda$'s and obtain $\Psi^{12}_\lambda = \Upsilon^{12}_\lambda +$

Proof of Lemma 5.2. Again, we restrict our proof to (5.4) since (5.5) is proved analogously. We compute and (recall that we start in equilibrium) and the result follows.

Moments up to fourth order
For the proofs of Lemma 5.4 and Lemma 5.5 we use moment calculations of the increments of $(\Psi^{12}_\lambda(\mathcal X_t))_{t\geq 0}$ and $(\widehat\Psi^{12}_\lambda(\mathcal X_t))_{t\geq 0}$ up to fourth order. The calculations are presented in an algorithmic way such that higher moments can be computed along the same lines. The fundamental idea is that all computations that need to be done are linear maps on the vector spaces: We point out that one advantage of using matrix algebra is that it is possible to use computer algebra software such as Mathematica for automated computations.
Evidently, bases of the vector spaces $V_\lambda$ and $\widehat V_\lambda$ are given by (recall from Definition 5.6)

The matrix representation of the generator Ω
Here, we give the matrix representation of the generator $\Omega$ in terms of the basis $\Psi_\lambda$, i.e. we find matrices $A$ and $\widehat A$ such that (5.31) The following list contains the action on the first 36 basis vectors from $\Psi_\lambda$ and $\widehat\Psi_\lambda$.

For the matrices $A$ (and $\widehat A$) representing $\Omega$ in terms of the basis $\Psi_\lambda$ (and $\widehat\Psi_\lambda$), we give here only the first 13 (5) rows and columns for $A$ ($\widehat A$), which deal with samples of at most three (two) pairs (i.e. $|I| \leq 3$ ($\leq 2$) in (5.26)). For $A$, we get Note that both matrices are diagonalisable. The reason is that the submatrix containing the same eigenvalue (take $3\lambda + 6$ in $A$ above as an example) is diagonal. Hence, there are three independent eigenvectors for this eigenvalue and so we find a basis of eigenvectors.
In the rest of this section, the calculations for the $\Psi^k_\lambda$'s and $\widehat\Psi^k_\lambda$'s are completely analogous, using the $\widehat\Upsilon^k_\lambda$'s instead of the $\Upsilon^k_\lambda$'s. Hence, we only present the calculations concerning the $\Psi^k_\lambda$'s.

Conditioning as a linear map
During our calculations, we need to be able to compute terms like $\mathbf E\big[\sum_{k=1}^n a_k \Psi^k_\lambda(\mathcal X_t) \,\big|\, \mathcal F_s\big]$ for $s \leq t$. We do this using the following procedure. We are given the following objects:
1. A basis $\Upsilon_\lambda = (\Upsilon^k_\lambda)_k$ of $V_\lambda$ such that $\Omega\Upsilon^k_\lambda = -\lambda_k \Upsilon^k_\lambda$, i.e. $\Upsilon^k_\lambda$ is an eigenvector of the generator $\Omega$ for the eigenvalue $-\lambda_k$ (where $k \in \{\emptyset; 12; 12, 12; 12, 23; 12, 34; \ldots\}$). As argued below, these eigenvectors exist, since $A$ is diagonalisable.
2. A diagonal matrix $D_{t-s}$ with diagonal entries $e^{-\lambda_k(t-s)}$ in the $k$th row. Since the $\lambda_k$'s are given by 1., this matrix is readily obtained.
3. An invertible matrix $M_{\Upsilon_\lambda \Psi_\lambda}$ and its inverse $M_{\Psi_\lambda \Upsilon_\lambda}$. These two matrices accomplish a change of basis from $\Psi_\lambda$ to $\Upsilon_\lambda$ and back.
Since $\Upsilon_\lambda$ is the basis of eigenvectors of the matrix $A$, the matrices $M_{\Upsilon_\lambda \Psi_\lambda}$ and $M_{\Psi_\lambda \Upsilon_\lambda}$ are obtained by standard linear algebra. We stress that both matrices are lower triangular since $A$ is lower triangular.
Then, because $(e^{\lambda_k t}\Upsilon^k_\lambda(t))_{t\geq 0}$ are martingales (compare with Lemma 5.8), Evidently, $Q$ is the matrix for a linear map on $V_\lambda$, with respect to the basis $\Psi_\lambda$.
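The three ingredients (eigenbasis, diagonal decay matrix, change of basis) can be illustrated on the smallest possible case. The sketch below is not the matrix $A$ of the paper: it uses an assumed two-dimensional toy generator acting on the basis $(1, \Psi^{12}_\lambda)$ by $\Omega\Psi^{12}_\lambda = 1 - (\lambda + 1)\Psi^{12}_\lambda$, chosen only because it is consistent with the equilibrium mean $1/(\lambda + 1)$. All function names are hypothetical. The eigenfunctions are then $1$ and $\Psi^{12}_\lambda - 1/(\lambda + 1)$, and conditioning amounts to multiplying by the product of the change of basis, the decay matrix and the inverse change of basis.

```python
import math

def matmul(a, b):
    """Product of two small square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def conditioning_matrix(lam, dt):
    """Matrix Q with E[Psi(X_t) | F_s] = Q Psi(X_s) for the toy basis
    Psi = (1, Psi12) and dt = t - s, under the assumed toy generator."""
    m = 1.0 / (lam + 1.0)
    to_eigen = [[1.0, 0.0], [-m, 1.0]]    # Psi -> Upsilon = (1, Psi12 - m)
    from_eigen = [[1.0, 0.0], [m, 1.0]]   # Upsilon -> Psi
    decay = [[1.0, 0.0], [0.0, math.exp(-(lam + 1.0) * dt)]]
    return matmul(matmul(from_eigen, decay), to_eigen)

def conditional_mean(psi_s, lam, dt):
    """E[Psi12(X_t) | Psi12(X_s) = psi_s] in the toy model."""
    q = conditioning_matrix(lam, dt)
    return q[1][0] * 1.0 + q[1][1] * psi_s

print(conditioning_matrix(3.0, 0.5))   # lower triangular, like the change-of-basis matrices
print(conditional_mean(0.7, 3.0, 0.5)) # relaxes from 0.7 towards 1 / (lam + 1) = 0.25
```

The same three-factor product applies to larger lower-triangular generator matrices; only the eigenvalues and the change-of-basis matrices grow.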

Proof of Lemma 5.5
Now we can compute the fourth moments of the increments of $t \mapsto (\lambda + 1)\Psi^{12}_\lambda(\mathcal X_t)$.
Again, we only give the proof of the first assertion. The second follows by replacing the $\Psi^k_\lambda$'s by the $\widehat\Psi^k_\lambda$'s. Using Minkowski's inequality and the fact that we start in equilibrium we have (5.46) The expression in the last line can easily be integrated in $s$ over $[0, t]$, because only $D_s$ depends on $s$. In addition, $\mathcal X_0$ is in equilibrium and hence the expectation can be evaluated using the equilibrium distribution of $\mathcal X$. All three terms on the right hand side of (5.45) can be computed analogously. Using Mathematica, we see that (5.8) holds.
The fourth moment was already given in (6.1).
Step 3: Tightness of Z λ in path-space. Here, we show that there exists C > 0, which is independent of λ, such that Hence, it suffices to prove the assertion when started in equilibrium, i.e. 1. holds for all assertions which concern properties which depend only on distances below a threshold and in particular all limiting properties close to the leaves.
Step 3: Convergence of finite-dimensional distributions to 0. This follows from Lemma 5.1.

Proof of Theorem 4
We proceed in several steps.
Throughout we let $(\mathcal F_t)_{t\geq 0}$ be the canonical filtration of the process $(\mathcal X_t)_{t\geq 0}$.

Proof of Theorem 6
It turns out that the following simple criterion for existence of a mark function is useful.

Lemma 7.1 (Criterion for mark function).
An mmm-space $(U, r, \mu) \in \mathbb U_A$ admits a mark function if there is a sequence $\varepsilon_n \downarrow 0$ with
$$\lim_{n\to\infty} \mathbf P\big(\mathcal A_1 = \mathcal A_2 \,\big|\, r(\mathcal U_1, \mathcal U_2) < \varepsilon_n\big) = 1.$$
We proceed in three steps to prove Theorem 6.
• Step 1: Proof of Lemma 7.1.
• Step 2: An extension of Theorem 3.
• Step 3: Combination of Steps 1 and 2 gives Theorem 6.
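The criterion of Lemma 7.1 can be made concrete on a finite marked space, where the conditional probability is a finite sum. The sketch below is a toy illustration under explicit assumptions: $(\mathcal U_1, \mathcal A_1)$ and $(\mathcal U_2, \mathcal A_2)$ are taken to be independent pairs with distribution $\mu(du, da) = \mathrm{mass}(u)\,K(u, da)$, the space has two points on the real line, and all names and data are hypothetical.

```python
from fractions import Fraction

def same_type_given_close(mass, kernel, dist, eps):
    """P(A1 = A2 | r(U1, U2) < eps) for two independent draws from
    mu(du, da) = mass[u] * kernel[u][a] on a finite marked space."""
    close = Fraction(0)
    close_and_same = Fraction(0)
    for u, mass_u in mass.items():
        for v, mass_v in mass.items():
            if dist(u, v) < eps:
                pair = mass_u * mass_v
                # Probability that the two independently drawn types agree.
                same = sum(p * kernel[v].get(a, 0) for a, p in kernel[u].items())
                close += pair
                close_and_same += pair * same
    return close_and_same / close

mass = {0: Fraction(1, 2), 1: Fraction(1, 2)}
dist = lambda u, v: abs(u - v)

# Every point carries exactly one type: a mark function exists.
kernel_single = {0: {"x": Fraction(1)}, 1: {"y": Fraction(1)}}
# Point 0 carries two types with equal probability: no mark function.
kernel_mixed = {0: {"x": Fraction(1, 2), "y": Fraction(1, 2)}, 1: {"y": Fraction(1)}}

eps = Fraction(1, 2)  # below the minimal distance, so only identical points are close
print(same_type_given_close(mass, kernel_single, dist, eps))  # 1
print(same_type_given_close(mass, kernel_mixed, dist, eps))   # 3/4
```

In the first example the conditional probability equals 1 for all small $\varepsilon$, in line with the criterion; in the second it stays at $3/4$, reflecting that the kernel at one point is not concentrated on a single type.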
Step 1: Proof of Lemma 7.1. Since $\mu$ is a probability measure on $U \times A$, we can write $\mu(du, da) = (\pi_U)_*\mu(du) \otimes K(u, da)$ for some probability kernel $K$ from $U$ to $A$. We have to show that $K(u, da)$ only has a single atom for $(\pi_U)_*\mu$-almost every $u$. We proceed by contradiction and assume that $K(u, \cdot)$ is not concentrated on a single atom for a set $\widetilde U \subseteq U$ of positive $(\pi_U)_*\mu$-probability, i.e. $\mu(\widetilde U \times A) = \delta > 0$.