Continuum Space Limit of the Genealogies of Interacting Fleming-Viot Processes on $\Z$

We study the evolution of genealogies of a population of individuals, whose type frequencies result in an interacting Fleming-Viot process on $\Z$. We construct and analyze the genealogical structure of the population in this genealogy-valued Fleming-Viot process as a marked metric measure space, with each individual carrying its spatial location as a mark. We then show that its time evolution converges to that of the genealogy of a continuum-sites stepping stone model on $\R$, if space and time are scaled diffusively. We construct the genealogies of the continuum-sites stepping stone model as functionals of the Brownian web, and furthermore, we show that its evolution solves a martingale problem. The generator for the continuum-sites stepping stone model has a singular feature: at each time, the resampling of genealogies only affects a set of individuals of measure $0$. Along the way, we prove some negative correlation inequalities for coalescing Brownian motions, as well as extend the theory of marked metric measure spaces (developed recently by Depperschmidt, Greven and Pfaffelhuber [DGP12]) from the case of probability measures to measures that are finite on bounded sets.


Introduction and main results
In the study of spatial population models on discrete geographic spaces (for example $\Z^d$), such as branching processes, voter models, or interacting Fisher-Wright diffusions (Fleming-Viot models), the technique of passing to the spatial continuum limit has proven useful in gaining insight into the qualitative behaviour of these processes. Key examples are branching random walks on $\Z^d$, leading to the Dawson-Watanabe process [Daw77] on $\R^d$, and Fisher-Wright diffusions, catalytic branching and mutually catalytic branching on $\Z$, leading to SPDEs on $\R$ [KS88, EF96, DP98, DEF+02b, DEF+02a]. The goal of this paper is to carry out this program at the level of genealogies, rather than just type or mass configurations. We focus here on interacting Fleming-Viot models on $\Z$.
1.1. Background and Overview. We summarize below the main results of this paper, recall some historical background, as well as state some open problems.
Summary of results. The purpose of this paper is twofold. On the one hand, we want to understand the formation of large local one-family clusters in Fleming-Viot populations on the geographic space $\Z^1$, by taking a space-time continuum limit of the genealogical configurations equipped with types. On the other hand, we use this example to develop the theory of tree-valued dynamics via martingale problems in some new directions. In particular, this is the first study of a tree-valued dynamics on an unbounded geographical space with infinite sampling measure, which requires us to extend both the notion of marked metric measure spaces in [GPW09, DGP11] and the martingale problem formulations in [GPW13, DGP12] to marked metric measure spaces with infinite sampling measures that are boundedly finite (i.e., finite on bounded sets).
Here is a summary of our main results:
(1) We extend the theory of marked metric measure spaces [GPW09, DGP12] from probability sampling measures to infinite sampling measures that are boundedly finite, which serve as the state space of marked genealogies of spatial population models. See Section 1.2.
(2) We characterize the evolution of the genealogies of interacting Fleming-Viot (IFV) models by well-posed martingale problems on spaces of marked ultrametric measure spaces. See Section 1.3.
(3) We give a graphical construction of the spatial continuum limit of the IFV genealogy process, which is the genealogy process of the so-called continuum-sites stepping stone model (CSSM), taking values in the space of ultrametric measure spaces with spatial marks and an infinite total population. The graphical construction is based on the (dual) Brownian web [FINR04]. The CSSM genealogy process has the peculiar feature that, as soon as t > 0, the process enters a regular subset of the state space that is not closed under the topology. This leads to complications for the associated martingale problem and for the study of continuity of the process at time 0. See Section 1.4.
(4) We prove that under suitable scaling, the IFV genealogy processes converge to the CSSM genealogy process. The proof is based on duality with spatial coalescents, together with a novel approach of controlling the genealogy structure using a weaker convergence result on the corresponding measure-valued processes, with measures on the geographic and type space (with no genealogies). See Section 1.5.
(5) We show that the CSSM genealogy process solves a martingale problem with a singular generator. More precisely, its diffusion term carries coefficients of a singular nature, which act only on a part of the geographical space with zero measure. In particular, the generator is only defined on a regular subset of the state space. See Section 1.6.
(6) We prove some negative correlation inequalities for coalescing Brownian motions, which are of independent interest. See Appendix C.
Besides the description of the genealogies of the current population, we also prepare the ground for the treatment of all individuals ever alive, i.e. fossils, moving from the state space of marked ultrametric measure spaces to the state space of marked measure R-trees, which will be carried out elsewhere.
History of the problem: Why are we particularly interested in one-dimensional geographic spaces for our scaling results? Many interacting spatial systems that model evolving populations, i.e., Markov processes with state spaces $I^G$ ($I = \R, \mathbb{N}, [0, 1]$, etc., and $G = \Z^d$ or the hierarchical group $\Omega_N$) that evolve by a migration mechanism between sites and a stochastic mechanism acting locally at each site, exhibit a dichotomy in their longtime behavior. For example, when $G = \Z^d$ and the migration is induced by the simple symmetric random walk: in dimension d ≤ 2, one observes convergence to laws concentrated on the traps of the dynamic; while in d ≥ 3, nontrivial equilibrium states are approached and the extremal invariant measures are spatially homogeneous ergodic measures characterized by the intensity of the configuration. Typical examples include the voter model, branching random walks, spatial Moran models, or systems of interacting diffusions (e.g., Feller, Fisher-Wright or Anderson diffusions). One obtains universal dimension-dependent scaling limits for these models if an additional continuum spatial limit is taken, resulting in, for example, super Brownian motion (see Liggett [Lig85] or Dawson [Daw93]).
In the low-dimension regime, the cases d = 1 and d = 2 are very different. In d = 2, one observes for example in the voter model the formation of mono-type clusters on spatial scales $t^{\alpha/2}$ with a random $\alpha \in [0, 1]$, a phenomenon called diffusive clustering (see Cox and Griffeath [CG86]). In the one-dimensional voter model, the clusters have an extension of a fixed order of magnitude but exhibit random factors in that scale. More precisely, in space-time scales $(\sqrt{t}, t)$ for $t \to \infty$, we get annihilating Brownian motions. Similar results have been obtained for low-dimensional branching systems (Klenke [Kle00], Winter [Win02]), systems of interacting Fisher-Wright diffusions (Fleischmann and Greven [FG94], [FG96] and subsequently [Zho03], [DEF+00]) and for the Moran model in d = 2 (Greven, Limic, Winter [GLW05]).
In all these models, one can go further and study the complete space-time genealogy structure of the cluster formation and describe this phenomenon asymptotically by the spatial continuum limit. In particular, the description for the one-dimensional voter model can be extended to the complete space-time genealogy structure, obtaining as scaling limit the Brownian web [Arr,TW98,FINR04]. More precisely, the Brownian web is defined by considering instantaneously coalescing one-dimensional Brownian motions starting from every space-time point in R × R. It arises as the diffusive scaling limit of continuous time coalescing simple symmetric random walks starting from every space-time point in Z × R, which represent the space-time genealogies of the voter model. This is analogous to the study of historical process for branching processes, which approximates the ancestral paths of branching random walks by that of super Brownian motion (see e.g. [DP91,FG96,GLW05]).
The basic idea behind all this is that we can identify the genealogical relationship between the individuals of the population living at different times and different locations. This raises the question of whether one can obtain a description of the asymptotic behavior of the complete genealogical structure of the process on large space-time scales, which will in turn allow for asymptotic descriptions of interesting genealogical statistics that are not expressible in a natural way in terms of the configuration process.
These observations on the genealogical structure go back to the graphical construction of the voter model due to Harris, and continue up to the historical process of Dawson and Perkins for branching models [DP91], or representation by contour processes [GJ98, Ald93]. To better describe genealogies, the notions of R-trees, marked R-trees and marked measure R-trees were developed as a framework [Eva97, EPW06]. These objects contain the relevant information abstracted from the labeled genealogy tree, where every individual is coded with its lifespan and its locations at each time. Such a coding means in particular that all members of the population are distinguished, information which is mostly not needed. In the large population limit, it suffices to consider the statistics of the population via sampling.
For this purpose, one equips the population with a metric (genealogical distance), a probability measure (the so-called sampling measure, which allows one to draw typical finite samples from the population), and a mark function (specifying types and locations). This description in terms of random marked metric measure spaces (in fact, the metric defines a tree) is the most natural framework to discuss the asymptotic analysis of population models, since it comprises exactly the information one wants to keep for the analysis in the limits of populations with even locally infinitely many individuals. The evolution of a process with such a state space is described by martingale problems [GPW13, DGP12].
Open problems and conjectures. We show in this paper that the spatial continuum limit of the one-dimensional interacting Fleming-Viot genealogy process solves a martingale problem. However, due to the singular nature of the generator, the uniqueness of this martingale problem is non-standard, and we leave it for a future paper.
Instead of Z, resp. R, as geographic spaces, one could consider the hierarchical group Ω N = ⊕ N Z N , with Z N being the cyclic group of order N , resp. the continuum hierarchical group Ω ∞ N = Z Z N . Brownian motion on R can be replaced by suitable Lévy processes on Ω ∞ N and the program of this paper can then be carried out. The Brownian web would have to be replaced by an object based on Lévy processes as studied in [EF96], [DEF+00]. We conjecture that the analogues of our theorems would hold in this context. A further challenge would be to give a unified treatment of these results on R and Ω ∞ N . Another direction is to consider the genealogy processes of interacting Feller diffusions, catalytic or mutually catalytic diffusions, interacting logistic Feller diffusions, and derive their genealogical continuum limits. A more difficult extension would be to include ancestral paths as marks, which raises new challenges related to topological properties of the state space.
Outline of Section 1. The remainder of the introduction is organized as follows. In Subsection 1.2, we recall and extend the notion of marked metric measure spaces needed to describe the genealogies. In Subsection 1.3, we define the interacting Fleming-Viot (IFV) genealogy process via a martingale problem and give a dual representation in terms of a spatial coalescent. In Subsection 1.4, we give, based on the Brownian web, a graphical construction for the continuum-sites stepping stone model (CSSM) on R and its marked genealogy process, which is the continuum analogue and scaling limit of the interacting Fleming-Viot genealogy process on Z, under diffusive scaling of space and time and normalizing of measure, a fact which we state in Subsection 1.5. In Subsection 1.6, we formulate a martingale problem for the CSSM genealogy process. In Subsection 1.7 we outline the rest of the paper.
1.2. Marked Metric Measure Spaces. To formulate our main results, we first need to introduce the space in which the genealogies of interacting Fleming-Viot processes take their value. We will regard the genealogies of all individuals in the population as a marked metric measure space. Here is the precise definition of marked metric measure spaces, extending the one introduced in [DGP11], which considered probability measures.
Definition 1.1 ($V$-mmm-spaces). Let $(V, r_V)$ be a complete separable metric space with metric $r_V$, and let $o$ be a distinguished point in $V$.
1. We call $(X, r, \mu)$ a $V$-marked metric measure space ($V$-mmm space for short) if: (i) $(X, r)$ is a complete separable metric space, (ii) $\mu$ is a σ-finite measure on the Borel σ-algebra of $X \times V$, with $\mu(X \times B_o(R)) < \infty$ for each ball $B_o(R) \subset V$ of finite radius $R$ centered at $o$.
2. We say two $V$-mmm spaces $(X, r_X, \mu_X)$ and $(Y, r_Y, \mu_Y)$ are equivalent if there exists a measurable map $\varphi : X \to Y$ such that for all $x_1, x_2 \in \mathrm{supp}(\mu_X(\cdot \times V))$,
(1.1) $r_X(x_1, x_2) = r_Y(\varphi(x_1), \varphi(x_2))$,
and, with $\tilde\varphi(x, v) := (\varphi(x), v)$,
(1.2) $\mu_Y = \mu_X \circ \tilde\varphi^{-1}$.
In other words, $\varphi$ is an isometry between $\mathrm{supp}(\mu_X(\cdot \times V))$ and $\mathrm{supp}(\mu_Y(\cdot \times V))$, and the induced map $\tilde\varphi$ is mark and measure preserving. We denote the equivalence class of $(X, r, \mu)$ by
(1.3) $\overline{(X, r, \mu)}$.
3. The space of (equivalence classes of) $V$-mmm spaces is denoted by
(1.4) $\mathbb{M}^V$.
4. The subspace of (equivalence classes of) $V$-mmm spaces which admit a mark function is denoted by
(1.5) $\mathbb{M}^V_{\rm fct}$.
Note that $\mathbb{M}^V$ depends both on the set $V$ and the metric $r_V$, since the latter defines the sets on which the measure must be finite.
Marked metric measure spaces were introduced in [DGP11], which extends the notion of metric measure spaces studied earlier in [GPW09]. Definition 1.1 is exactly the analogue of [DGP11, Def. 2.1], where $\mu$ is a probability measure. The basic interpretation in our context is as follows: $X$ is the space of individuals; $r(x_1, x_2)$ measures the genealogical distance between two individuals $x_1$ and $x_2$ in $X$; the measure $\mu$ is a measure on the individuals and the marks they carry (which can be spatial location as well as type, or even ancestral paths up to now), allowing us to draw samples from individuals with marks in a bounded set.
To define a topology on $\mathbb{M}^V$ that makes it a Polish space, we will make use of the marked Gromov-weak topology introduced in [DGP11, Def. 2.4] for $V$-mmm spaces equipped with probability measures. The basic idea is that, given our assumption on $\mu$ in Definition 1.1.1.(ii), we can localize $\mu$ to finite balls in $V$ to reduce $\mu$ to a finite measure. We can then apply the marked Gromov-weak topology (which also applies to finite measures instead of probability measures) to require convergence for each such localized version of the $V$-mmm spaces. We will call such a topology the $V$-marked Gromov-weak$^\#$ topology, replacing weak by weak$^\#$, following the terminology in [DVJ03, Section A2.6] for the convergence of measures that are bounded on bounded subsets of a complete separable metric space. Note that vague convergence is for measures that are finite on compact rather than bounded subsets. Both notions agree, however, on Heine-Borel spaces (compare [ALW]).
Definition 1.2 ($V$-marked Gromov-weak$^\#$ Topology). Fix a sequence of continuous functions $\psi_k : V \to [0, 1]$, $k \in \mathbb{N}$, such that $\psi_k = 1$ on $B_o(k)$, the ball of radius $k$ centered at $o \in V$, and $\psi_k = 0$ on $B_o^c(k + 1)$. Let $\chi := (X, r, \mu)$ and $\chi_n := (X_n, r_n, \mu_n)$, $n \in \mathbb{N}$, be elements of $\mathbb{M}^V$. Let $\psi_k \cdot \mu$ be the measure on $X \times V$ defined by $(\psi_k \cdot \mu)(d(x, v)) := \psi_k(v)\,\mu(d(x, v))$, and let $\psi_k \cdot \mu_n$ be defined similarly. We say that $\chi_n \to \chi$ as $n \to \infty$ in the $V$-marked Gromov-weak$^\#$ topology if and only if $(X_n, r_n, \psi_k \cdot \mu_n) \Rightarrow (X, r, \psi_k \cdot \mu)$ as $n \to \infty$ in the Gromov-weak topology for each $k \in \mathbb{N}$.
When $V = \R^d$, we may choose $\psi_k$ to be infinitely differentiable.
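The localization step in Definition 1.2 can be illustrated numerically. The following is a minimal sketch, assuming $V = \R$ with $o = 0$, a purely atomic measure, and a piecewise linear choice of $\psi_k$; the function names `psi` and `localize` are hypothetical, and the genealogical component $X$ is ignored since only the marks enter the cutoff.

```python
import numpy as np

def psi(k, v, o=0.0):
    """Continuous cutoff: equals 1 on the ball B_o(k), 0 outside B_o(k+1),
    and interpolates linearly in between (one admissible choice of psi_k)."""
    d = np.abs(np.asarray(v, dtype=float) - o)
    return np.clip(k + 1.0 - d, 0.0, 1.0)

def localize(k, marks, weights):
    """Atom weights of the localized measure psi_k . mu for a purely atomic mu,
    where atom i carries mark marks[i] and mass weights[i]."""
    return psi(k, marks) * np.asarray(weights, dtype=float)

# Toy boundedly finite measure: unit mass at each integer site in [-50, 50],
# a truncation of counting measure on Z (assumption for illustration only).
marks = np.arange(-50, 51)
weights = np.ones(len(marks))

# Total mass of psi_2 . mu: only the sites -2, ..., 2 receive positive weight.
mass_k2 = localize(2, marks, weights).sum()
```

Each localized measure $\psi_k \cdot \mu$ is finite, which is what allows the marked Gromov-weak topology for finite measures to be applied level by level.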
Remark 1.3 (Dependence on $o$ and $(\psi_k)_{k\in\mathbb{N}}$). Note that the $V$-marked Gromov-weak$^\#$ topology does not depend on the choice of $o \in V$ and the sequence $(\psi_k)_{k\in\mathbb{N}}$, as long as $\psi_k$ has bounded support and $A_k := \{v : \psi_k(v) = 1\}$ increases to $V$ as $k \to \infty$.
The proof of the following result combines the corresponding results in [DGP12] and [ALW], and will be deferred to Appendix A.
Points in M V , as well as weak convergence of M V -valued random variables, can be determined by the so-called polynomials on M V , which are defined via sampling a finite subset on the V -mmm space.
(a) Let $\Pi^k_n := \{\Phi^{n,\phi,g} : \phi \in C^k_b(\R_+^{\binom{n}{2}}, \R),\ g \in C_{bb}(V^n, \R)\}$, which we call the space of monomials of order $n$ (with differentiability of order $k$). Let $\Pi^k_0$ be the set of constant functions. We then denote by $\Pi^k := \bigcup_{n\in\mathbb{N}_0} \Pi^k_n$ the set of all monomials (with differentiability of order $k$).
Remark 1.7. Note that the polynomials form an algebra of bounded continuous functions.
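To make the sampling interpretation of monomials concrete, here is a small sketch that evaluates a monomial on a finite, purely atomic $V$-mmm space. It assumes that $\Phi^{n,\phi,g}$ integrates $\phi$ of the pairwise distances times $g$ of the marks against the $n$-fold product of $\mu$, as is standard for polynomials in [DGP11]; the function name `monomial` and the toy data are hypothetical.

```python
import itertools
import math

def monomial(n, phi, g, dist, marks, weights):
    """Evaluate a monomial of order n on a finite atomic marked metric measure space.

    Sums phi(pairwise distances) * g(marks) * product of atom weights over all
    ordered n-tuples of atoms, i.e. integrates against the n-fold product measure.
    """
    total = 0.0
    for idx in itertools.product(range(len(weights)), repeat=n):
        r = [dist[i][j] for i, j in itertools.combinations(idx, 2)]
        v = [marks[i] for i in idx]
        w = math.prod(weights[i] for i in idx)
        total += phi(r) * g(v) * w
    return total

# Toy ultrametric space: atoms 0 and 1 live at site 0, atom 2 lives at site 1.
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 4.0],
        [4.0, 4.0, 0.0]]
marks = [0, 0, 1]
weights = [0.5, 0.5, 1.0]      # each site carries total mass one

phi = lambda r: math.exp(-r[0])                            # bounded function of the one distance
g = lambda v: 1.0 if all(x == 0 for x in v) else 0.0       # boundedly supported: both marks at site 0

value = monomial(2, phi, g, dist, marks, weights)
```

Here `value` is the expected value of $e^{-r(x_1, x_2)}$ for two individuals sampled independently from site 0, which is the kind of sampling statistic the polynomials are designed to capture.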
Theorem 1.8 (Convergence determining class). We have the following properties for Π k , for each k ∈ N ∪ {0, ∞}: . We defer the proof of Theorems 1.5 and 1.8, as well as some additional properties of V -mmm spaces, to Appendix A.
Recall $\mathbb{M}^V_{\rm fct}$ from Definition 1.1, and notice that $\mathbb{M}^V_{\rm fct}$ is not complete, and that we therefore choose the bigger space $\mathbb{M}^V$ as the state space. The space $\mathbb{M}^V$ allows an individual $x \in X$ to carry a set of marks, equipped with the conditional measure of $\mu$ on $V$ given $x \in X$. If each individual carries only a single mark which we can identify via a mark function $\kappa : X \to V$, the corresponding marked metric measure space is an element of $\mathbb{M}^V_{\rm fct}$. This will be the case for the interacting Fleming-Viot process that we will study. It can be shown that every element in $\mathbb{M}^V_{\rm fct}$ is an element of the closed space (1.9)
Remark 1.9. For the models we consider, the genealogies lie in Polish spaces which arise as closed subspaces of $\mathbb{M}^V$. Note that the current population alive corresponds to the leaves of a genealogical tree, and hence the associated $V$-mmm space is ultrametric. We will denote the space of $V$-marked ultrametric measure spaces by $\mathbb{U}^V$. They form a closed subspace of $\mathbb{M}^V$ and hence $\mathbb{U}^V$ is Polish. The same holds when we consider the subset of $\mathbb{M}^V$ whose associated measures all follow a prescribed measure when projected onto the mark space $V$.
1.3. Interacting Fleming-Viot (IFV) Genealogy Processes. We now study the genealogies of the measure-valued interacting Fleming-Viot (IFV) processes on a countable geographic space and with allelic types, typically taken from the type space [0, 1] (see [DGV95] for details on IFV), which is motivated by the following individual-based model, the so-called Moran model. Consider a population of individuals with locations indexed by a countable additive group $V$ (for us this will later be $\Z$). The individuals migrate independently according to rate one continuous time random walks with transition probability kernel $a(\cdot, \cdot)$. We denote the transition kernel of the time-reversed random walks by $\bar a(v_1, v_2) := a(v_2, v_1)$. Individuals furthermore reproduce by resampling, where every pair of individuals at the same site die together at exponential rate $\gamma > 0$, and with equal probability, one of the two individuals is chosen to give birth to two new individuals at the same site with the same type as the parent. This naturally induces a genealogical structure. The genealogical distance of two individuals at time $t$ is $2 \min\{t, T\}$ plus the distance of the ancestors at time 0, where $T$ is the time it takes to go back to the most recent common ancestor. Imposing the counting measure on the population with each individual carrying its location as a mark, we obtain a $V$-mmm space and its equivalence class is the state of the genealogy process.
Letting now the number of individuals per site tend to infinity and normalizing the measure such that each site carries population mass of order one, we obtain a diffusion model, the interacting Fleming-Viot (IFV) genealogy process, whose state space is $\mathbb{U}^V_1$, the space of $V$-marked ultrametric measure spaces with the property that the projection of $\mu$ onto the countable geographic space $V$ is counting measure, the subscript 1 indicating that the measure restricted to each colony is a probability measure. This (see Remark 1.9) is again a Polish space. (For the diffusion limit in the case of a finite geographic space $V$, see [GPW13], [DGP12].)
Remark 1.10. If we introduce as marks (besides locations from a countable geographic space G) also allelic types from some set K (typically taken as a closed subset of [0, 1]), then the type is inherited at reproduction and V = K × G is the product of type space and geographic space. In this case the localization procedure in Definition 1.2 applies to the geographic space, since K is compact.
1.3.1. The genealogical IFV martingale problem. We now define the interacting Fleming-Viot (IFV) genealogy process via a martingale problem for a linear operator $L^{FV}$ on $C_b(\mathbb{U}^V_1, \R)$, acting on polynomials. For simplicity we first leave out allelic types, which we introduce later in Remark 1.12.
The linear operator L FV on C b (U V 1 , R), with domain Π 1,0 as introduced in Definition 1.6 (b), consists of three terms, corresponding respectively to aging, migration, and reproduction by resampling. With X = (X, r, µ) and Φ = Φ n,φ,g ∈ Π 1,0 , encodes the replacement by migration of the j-th sampled individual corresponding to a jump from location v j to v , while θ k,ℓ encodes the replacement of the ℓ-th individual by the k-th individual (both at the same site), more precisely,
The following results cover a particular case of evolving genealogies of interacting Λ-Fleming-Viot diffusions which will be studied in [GKW]. For the sake of completeness we present all proofs in Appendix D. The first result states that there is a unique U V 1 -valued diffusion process associated with this operator.
It is a special case of the martingale problem characterization of interacting Λ-Fleming-Viot genealogy processes which are studied in detail in forthcoming work [GKW]. We therefore defer the proof to Appendix D.
(ii) The solutions (for varying initial conditions) define a strong Markov and Feller process with continuous paths. (iii) If the initial state admits a mark function, then so does the path for all t > 0 almost surely.
Remark 1.12. If we add the type of an individual as an additional mark, i.e., V := G × K with G and K the geographic and type space respectively, then the same result holds if we modify the setting as follows. We require the states to satisfy the constraint that the projection of µ on the geographic space G is still the counting measure. The test functions Φ should be modified so that we multiply g : G n → R, acting on the locations of the n sampled individuals, by another factor g typ : K n → R, acting on the types of the individuals. The generator L FV should be modified accordingly, so that g typ changes at resampling from g typ to g typ • θ k,ℓ , with θ k,ℓ replacing the ℓ-th sampled individual by the k-th one (see [DGP12]).
Remark 1.13. The process X FV t = (X t , r t , µ t ) has the property that the measure-valued process X t given by the collection {µ t (X t × {i} × ·), i ∈ G} is the IFV process on (M 1 (K)) G .
From Section 1.4 onward, we will choose V = Z. However in the subsequent analysis, it is important to observe that we can embed Z into R and view U Z as a closed subspace of U R , and view the IFV genealogy process as U R -valued process.
1.3.2. Duality. The IFV genealogy process X FV can be characterized by a dual process, the spatial coalescent. The formulation given here can also incorporate mutation and selection (see [DGP12]).
The dual process is driven by a system of $n$ coalescing random walks for some $n \in \mathbb{N}$, labeled from 1 to $n$ and starting at $\xi_{0,1}, \dots, \xi_{0,n} \in V$ at time 0. Each walk $i$ is identified with the partition element $\{i\}$ of a partition of the set $\{1, 2, \dots, n\}$. We order partition elements by the smallest element they contain. We also fix two functions: $\phi \in C_b(\R_+^{\binom{n}{2}}, \R)$, which takes as its arguments the genealogical distances between the $n$ coalescing walks, and $g \in C_{bb}(V^n, \R)$, which takes as its arguments the initial positions $\xi_0 := (\xi_{0,1}, \dots, \xi_{0,n})$ of the walks.
The dynamics of the dual process is as follows. Partition elements migrate independently on $V$ according to rate 1 continuous time random walks with transition kernel $\bar a$ until they merge, after which they follow the same walk. Independently, every pair of partition elements at the same location in $V$ merges at rate $\gamma$. At time $t$, we define the genealogical distance $r_t(i, j)$ of two individuals $i$ and $j$ in $\{1, 2, \dots, n\}$ as $2 \min\{t, T_{i,j}\}$, where $T_{i,j}$ is the first time the two walks labeled by $i$ and $j$ coalesce, i.e., $i$ and $j$ are in the same partition element. The positions of the $n$ walks are denoted by $\xi_t := (\xi_{t,1}, \dots, \xi_{t,n})$.
This way we obtain a process $(K_t)_{t\ge 0}$ with states $(\pi, \xi, r) \in S_n$, where $\pi$ is a partition of the set $\{1, \dots, n\}$ with cardinality $|\pi|$, $\xi$ records the positions of the walks at the present time, and $r := (r(i, j))_{1\le i<j\le n}$ is the genealogical distance matrix of the individuals at the present time. We set $S = \bigcup_{n\in\mathbb{N}} S_n$.
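The dynamics of the spatial coalescent described above can be simulated directly. The sketch below is illustrative only: it assumes $V = \Z$ and a symmetric nearest-neighbour kernel (so that the time-reversed kernel coincides with the original one), uses a standard Gillespie scheme, and the function name `spatial_coalescent` is hypothetical.

```python
import random

def spatial_coalescent(start, gamma, t_max, rng):
    """Simulate n coalescing walks on Z up to time t_max.

    Each partition element jumps at rate 1 to a uniformly chosen neighbour
    (assumed kernel), and each pair of partition elements sharing a site merges
    at rate gamma. Returns (partition, per-walk positions, distance matrix).
    """
    n = len(start)
    blocks = [{i} for i in range(n)]             # current partition elements
    pos = list(start)                            # position of each partition element
    T = [[float("inf")] * n for _ in range(n)]   # pairwise coalescence times
    t = 0.0
    while True:
        pairs = [(a, b) for a in range(len(blocks))
                 for b in range(a + 1, len(blocks)) if pos[a] == pos[b]]
        rate = len(blocks) + gamma * len(pairs)  # total event rate
        t += rng.expovariate(rate)
        if t >= t_max:
            break
        if rng.random() < len(blocks) / rate:    # migration event
            a = rng.randrange(len(blocks))
            pos[a] += rng.choice((-1, 1))
        else:                                    # coalescence event at a shared site
            a, b = rng.choice(pairs)
            for i in blocks[a]:
                for j in blocks[b]:
                    T[i][j] = T[j][i] = t
            blocks[a] |= blocks[b]
            del blocks[b], pos[b]
    xi = [0] * n
    for b, block in enumerate(blocks):
        for i in block:
            xi[i] = pos[b]
    # Genealogical distance r_t(i, j) = 2 * min(t, T_ij).
    r = [[0.0 if i == j else 2.0 * min(t_max, T[i][j]) for j in range(n)]
         for i in range(n)]
    return blocks, xi, r

blocks, xi, r = spatial_coalescent([0, 0, 1, 5], gamma=1.0, t_max=10.0,
                                   rng=random.Random(0))
```

The returned matrix `r` is ultrametric by construction, since merge times of a coalescent satisfy the strong triangle inequality, and walks in the same partition element share a position.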
For each n, φ and g as in Definition 1.6, we define a duality function H : where X = (X, r, µ) ∈ U V 1 , K = (π, ξ , r ) ∈ S, from X an independent sample (x, v) ∈ X × V is taken according to µ for each partition element in π, where all individuals in the partition element are assigned the same value (x, v), and r π = (r π(k),π(ℓ) ) 1≤k<ℓ≤n , v π = (v π(i) ) i=1,··· ,n , with π(i) being the partition element of π containing i. In other words, we concatenate X and K (compare Remark 1.20) to obtain a new V -mmm space and then apply φ and g.
Note that {H(·, K) : K ∈ S n , φ ∈ C ∞ b (R ( n 2 ) + , R), g ∈ C bb (V n , R), n ∈ N} is law-determining and convergence-determining on U V 1 . The IFV genealogy process (X FV t ) t≥0 is dual to the coalescent (K t ) t≥0 , and its law and behavior as t → ∞ can be determined as follows.
Theorem 1.16 (Duality and longtime behaviour). The following properties hold for the IFV genealogy process (X FV t ) t≥0 : (a) For every X FV 0 ∈ U V 1 and K 0 ∈ S, we have where Γ is the unique invariant measure of the process X FV on U V 1 .
Remark 1.17. If a is transient, then we can decompose X FV t = (X t , r t , µ t ) in such a way that X t is the countable union of disjoint sets X i
A further consequence of relation (1.19) is that we can specify the finite dimensional distributions of (X FV t ) t≥0 completely in terms of the spatial coalescent as follows. Fix a time horizon t > 0. The finite dimensional distributions are determined by the expectations of Φ, where for some ℓ ∈ N, 0 ≤ t 1 < t 2 < · · · < t ℓ = t, Φ k ∈ Π 1,0 of order n k and defined by φ k and g k for each k = 1, ..., ℓ, we define . The dual is the spatial coalescent with frozen particles ( K s ) s∈[0,t] , for which the time index s ∈ [0, t] runs in the opposite direction from X FV . Namely, looking backward from time t, for each 1 ≤ k ≤ ℓ, we start n k particles at time t−t k at locations ξ 1 t k := (ξ 1 t k ,1 , . . . , ξ 1 t k ,n k ) ∈ V k , each forming its own partition element in the partition π of {1, 2, . . . , n 1 + · · · + n ℓ }. The particles perform the usual dynamics of the spatial coalescent, with the particles starting at time t − t k frozen before then. At time s, the genealogical distance r s (i, j) between two individuals i and j, started respectively at times t − t i , t − t j ≤ s, is defined to be 2 min{s, T i,j } − (t − t i ) − (t − t j ), where T i,j is the first time the two individuals coalesce. If s < t − t i or t − t j , we simply set r s (i, j) = 1.
Denote this new spatial coalescent process with frozen particles by ( K t ) t≥0 . The state space S is the set of all for all n ∈ N, (n k ) 1≤k≤n ∈ N, and 0 ≤ t 1 < · · · < t ℓ , where ξ k records the present positions of the particles starting at time t k . Let us label the individuals from time t 1 to time t ℓ successively by 1, 2, . . . , m := n 1 + · · · + n ℓ , and let ξ := (ξ 1 , . . . , ξ m ) denote present positions. We can then rewrite the state in (1.22) as (1.23) (π, ξ , r , (t k , n k ) 1≤k≤ℓ ).
Let (φ k , g k ) 1≤k≤ℓ be as in the definition of Φ in (1.21). Let X = (X, r, µ) ∈ U V 1 , and let K ∈ S be as in (1.22) and (1.23). We then define the duality function H(X , K) : U V 1 × S → R which determines the finite dimensional distributions of X for varying K by We then have the following space-time duality.
Corollary 1.18 (Space-time duality). Let X FV 0 ∈ U V 1 , and K 0 ∈ S be as in (1.23) with t k ≤ t for all 1 ≤ k ≤ ℓ. Let Φ and H be defined as in (1.21) and (1.24). Then we have
In words, the genealogical distances of n 1 + · · · + n ℓ individuals sampled from (X FV s ) s∈[0,t] , with n k individuals sampled at time t k at specified locations, can be recovered by letting these individuals evolve backward in time as a spatial coalescent until time 0, at which point we sample from X FV 0 according to the location of each partition element in the spatial coalescent.
Remark 1.19. (Strong duality) We can obtain a strong dual representation, which represents the state X FV t in terms of the entrance law of the tree-valued spatial coalescent starting with countably many individuals at each site of V , realised both on the same probability space and its equivalence class w.r.t. isometry. The marked partition-valued case is constructed in [GLW05]. With this object we can associate, just as above, a marked ultrametric measure space K tree t , which, pasted to X 0 , has the same law as X t . We now need the concept of subordination of two marked ultrametric measure spaces, which allows us to concatenate a marked metric measure tree, prescribed as initial state, with that of the time t coalescent starting with countably many individuals.
Remark 1.20. (Pasting) Let V be a countable mark space. We have two random marked ultrametric measure spaces with mark function, U = (U, r, κ, µ) and U realized independently, where U is generated by completing (N×V, r , κ ), with continuous continuation of the mark function and with sampling measure µ derived from the equidistribution and finally assume that it has diameter 2t. Furthermore there exists a map from the set of all open 2t-balls of U into the geographic space V , say χ and a random i.i.d. assigned label on the elements of N × V in [0, 1].
Then we paste U to U (written U U ) by drawing from U an infinite sample at each location according to µ and then consider as distance matrix r U + r sample (ψ(·), ψ(·))(U) evaluated for each pair of indices from N × V and denoted r. Here ψ : N × V −→ sample (U) with κ(ψ(i)) = χ(i), ψ(i) is constant on the open 2t-ball around i. This space (N × V, r, κ ) with local equidistribution as sampling measure we complete and take the equivalence class w.r.t. isomorphy to get the pasting of U to U.
Note that every marked metric measure tree can be obtained via the sampling procedure with subsequent completion. See Appendix A for a more formal definition.
We choose χ now as the location of the partition element at time t of i and ψ as the local rank of i ∈ N × V at χ(i). For that purpose we use the i.i.d. [0, t]-distributed label.
Then we can conclude as in [GPW13] that expectations of polynomials of X t and dual K t subordinated to X 0 agree and by inspection that the law of X equals the law of the tree-valued coalescent entrance law for the coalescent started at time 0 run for time t and started with countably many individuals at each site of the finite geographic space V pasted to the state X 0 if the migration kernel is doubly stochastic: For countably infinite geographic space we can construct the coalescence tree as well using the techniques in [GLW05], which shows that the frequencies of individuals located in a finite set A of the geographic space which are in finitely many partition elements at a positive time t can be made arbitrarily close to one. Hence the entrance law defines still a marked ultrametric measure space, see Section D. As a consequence (1.27) still holds.
1.4. Genealogies of Continuum-sites Stepping Stone Model (CSSM) on R. If we rescale space and time diffusively, the measure-valued interacting Fleming-Viot process on Z converges to a continuum space limit, the so-called continuum-sites stepping stone model (CSSM). Formally, the CSSM is a measure-valued process $\nu := (\nu_t)_{t\ge 0}$ on $\mathbb{R}\times[0,1]$, where $\mathbb{R}$ is the geographical space and $[0,1]$ is the type space. We may think of the individuals in the population as performing independent Brownian motions; whenever two individuals meet, one of the two is chosen with equal probability and switches its type to that of the other. Provided that $\nu_0(\cdot\times[0,1])$ is the Lebesgue measure on $\mathbb{R}$, the CSSM was rigorously constructed in [EF96, Eva97, DEF+00, Zho03] via a moment duality with coalescing Brownian motions. In particular, $\nu_t(\cdot\times[0,1])$ is the Lebesgue measure on $\mathbb{R}$ for all $t > 0$.
In this subsection we will construct the evolving genealogies of the CSSM based on duality to the (dual) Brownian web, and establish properties (Proposition 1.24, Theorem 1.26).
In [FINR04], the Brownian web $\mathcal{W}$ is constructed as a random variable where each realization of $\mathcal{W}$ is a closed subset of
(1.28) $\Pi := \bigcup_{s\in\mathbb{R}} C([s,\infty),\mathbb{R})$,
the space of continuous paths in $\mathbb{R}$ with any starting time $s\in\mathbb{R}$. We equip $\Pi$ with the topology of local uniform convergence. For each $z := (x,t)\in\mathbb{R}^2$, we let $\mathcal{W}(z) := \mathcal{W}(x,t)$ denote the subset of paths in $\mathcal{W}$ with starting position $x$ and starting time $t$. We can construct $\mathcal{W}$ by first fixing a countable dense subset $D\subset\mathbb{R}^2$, and then constructing a collection of coalescing Brownian motions $\{\mathcal{W}(z) : z\in D\}$, with one Brownian motion starting from each $z\in D$. Interpreting coalescing Brownian motions in the (dual) Brownian web as ancestral lines specifying the genealogies, we can then give an almost sure graphical construction of the CSSM, instead of relying on moment duality relations similar to (1.24) as in [EF96, Eva97, DEF+00, Zho03]; these relations are nevertheless obtained as a corollary of the graphical construction. The classical measure-valued CSSM process can be recovered from $(\mathcal{X}^{CS}_t)_{t\ge 0}$ by projecting the sampling measures $(\mu^{CS}_t)_{t\ge 0}$ to the mark space $V$, if $V$ is chosen to be the product of the geographical space $\mathbb{R}$ and the type space $[0,1]$. In what follows, we will take $V$ to be only the geographical space $\mathbb{R}$, since types have no influence on the evolution of genealogies.
We next explicitly construct a version of the CSSM genealogy process (1.31) X CS := (X CS t ) t≥0 , X CS 1 ⊆ U R , the space of R-marked ultrametric measure spaces introduced in Remark ??, where the projection of µ CS t on R is the Lebesgue measure. To avoid a disruption of the flow of presentation, background details on the (dual) Brownian web we will need are collected in Appendix B.
We proceed in three steps: Step 1 (Initial states). Assume that X CS 0 belongs to the following closed subspace of U R : (1.32) U R 1 := {(X, r, µ) ∈ U R : µ(X × ·) is the Lebesgue measure on R}. In other words, U R 1 is the set of R-marked ultrametric measure spaces where the projection of the measure on the mark space (geographic space) R is the Lebesgue measure. This is necessary for the duality between CSSM and coalescing Brownian motions. We will see that almost surely X CS t ∈ U R 1 for all t ≥ 0.
Step 2 (The time-t genealogy as a metric space). To define $(X^{CS}_t, r^{CS}_t)$ for every $t > 0$, let us fix a realization of $(\mathcal{W}, \hat{\mathcal{W}})$ (compare Appendix B). For each $t > 0$, let
(1.33) $A_t := \{x\in\mathbb{R} : \hat{\mathcal{W}}(x,t) \text{ contains a single path } \hat f_{(x,t)}\}$, $E_t := \{x\in\mathbb{R} : \hat{\mathcal{W}}(x,t) \text{ contains more than one path}\}$.
By Lemma B.4 on the classification of points in $\mathbb{R}^2$ w.r.t. $\mathcal{W}$ and $\hat{\mathcal{W}}$, almost surely, $E_t$ is a countable set for each $t > 0$. For each $v\in A_t$, we interpret $\hat f_{(v,t)}$ as the genealogy line of the individual at the space-time coordinate $(v,t)$. Genealogy lines of individuals at different space-time coordinates evolve backward in time and coalesce with each other. At time 0, each genealogy line traces back to exactly one spatial location in the set
(1.34) $E := \{\hat f_{(x,t)}(0) : x\in A_t,\ t > 0\}$,
where we note that $E$ is almost surely a countable set, because by Theorem B.1 and Lemma B.2, paths in $\hat{\mathcal{W}}$ can be approximated in a strong sense by a countable subset of paths in $\hat{\mathcal{W}}$. For each $v\in E$, we then identify a common ancestor $\xi(v)$ for all the individuals whose genealogy lines trace back to spatial location $v$ at time 0, by sampling an individual $\xi(v)\in X^{CS}_0$ according to the conditional distribution of $\mu^{CS}_0$ on $X^{CS}_0$, conditioned on the spatial mark in the product space $X^{CS}_0\times\mathbb{R}$ being equal to $v$.
We next characterize individuals by points in space. Note that there is a natural genealogical distance between points in $A_t$. For individuals $x, y\in A_t$, if $\hat f_{(x,t)}$ and $\hat f_{(y,t)}$ coalesce at time $\hat\tau < t$, then denoting $u := \hat f_{(x,t)}(0)$ and $v := \hat f_{(y,t)}(0)$, we define the distance between $x$ and $y$ by
(1.35) $r_t(x,y) := 2(t-\hat\tau)$ if $\hat\tau \ge 0$, and $r_t(x,y) := 2t + r_0(\xi(u),\xi(v))$ if $\hat\tau < 0$,
where $r_0$ denotes the metric on $X^{CS}_0$. We define $(X^{CS}_t, r^{CS}_t)$ as the closure of $A_t$ w.r.t. the metric $r_t$ defined in (1.35). Note that $(X^{CS}_t, r^{CS}_t)$ is ultrametric and, by construction, Polish.
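To make the genealogical distance of Step 2 concrete, the following is a minimal Python sketch for finitely many individuals ordered along the line. It assumes the reading that the distance is twice the time back to coalescence when two lines merge above time 0 (as stated in the proof of Proposition 1.24 (b)), and $2t$ plus the initial distance of the sampled ancestors otherwise. The function names, the encoding of "no coalescence above time 0" by a negative coalescence time, and the constant initial distance `r0` are illustrative assumptions, not part of the construction above.

```python
from itertools import combinations

def r_t(t, tau, r0=0.0):
    """Distance at time t for two lines coalescing at time tau < t.

    tau >= 0: the lines merge above time 0, distance 2 * (t - tau).
    tau <  0: hypothetical encoding of 'no merge above time 0'; the
              distance is 2 * t plus the initial distance r0 of the
              two sampled ancestors.
    """
    if tau >= t:
        raise ValueError("coalescence time must be strictly below t")
    return 2.0 * (t - tau) if tau >= 0.0 else 2.0 * t + r0

def distance_matrix(t, neighbour_tau, r0=0.0):
    """Distances for ordered points x_0 < ... < x_n on the line.

    neighbour_tau[i] is the coalescence time of the lines from x_i and
    x_{i+1}. Since coalescing paths do not cross, the lines from x_i and
    x_j (i < j) merge at the minimum of the neighbour times in between.
    """
    n = len(neighbour_tau) + 1
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = r_t(t, min(neighbour_tau[i:j]), r0)
    return d

def is_ultrametric(d, tol=1e-12):
    """Check d(i,k) <= max(d(i,j), d(j,k)) for all triples."""
    n = len(d)
    return all(d[i][k] <= max(d[i][j], d[j][k]) + tol
               for i in range(n) for j in range(n) for k in range(n))

# Four individuals at time t = 1; the middle pair does not merge above 0.
example = distance_matrix(1.0, [0.6, -0.2, 0.9], r0=0.5)
```

In this toy example the two outer pairs are close relatives, while any pair separated by the middle gap is at distance $2t + r_0$, and the ultrametric inequality can be checked directly with `is_ultrametric(example)`.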
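Remark 1.21. We may even extend the distance $r_t$ to a distance $r$ between $(x,t)$ with $x\in A_t$ and $t > 0$, and $(y,s)$ with $y\in A_s$ and $s > 0$. More precisely, let
(1.36) $r((x,t),(y,s)) := s + t - 2\hat\tau$ if $\hat\tau \ge 0$, and $r((x,t),(y,s)) := s + t + r_0(\xi(u),\xi(v))$ if $\hat\tau < 0$,
where $\hat\tau$ is the time of coalescence between $\hat f_{(x,t)}$ and $\hat f_{(y,s)}$, and $u := \hat f_{(x,t)}(0)$, $v := \hat f_{(y,s)}(0)$.

Step 3 (Adding the sampling measure). We now define $\mathcal{X}^{CS}$. For that purpose, we will represent $X^{CS}_t$ as an enriched copy of $\mathbb{R}$ (see (1.37) below). By identifying each $x\in A_t$ with the path $\hat f_{(x,t)}\in\hat{\mathcal{W}}$, we can also identify $X^{CS}_t$ with the closure of $\{\hat f_{(x,t)}\in\hat{\mathcal{W}} : x\in A_t\}$ in $\Pi$. Indeed, a sequence $x_n\in A_t$ is a Cauchy sequence w.r.t. the metric $r_t$ if and only if the sequence of paths $\hat f_{(x_n,t)}$ is a Cauchy sequence when the distance between two paths is measured by the time to coalescence, which by Lemma B.2 is also equivalent to $(\hat f_{(x_n,t)})_{n\in\mathbb{N}}$ being a Cauchy sequence in $\Pi$.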
When we take the closure of {f (x,t) ∈ W : x ∈ A t } in Π, only a countable number of paths in W are added, which are precisely the leftmost and rightmost paths in W(x, t), when W(x, t) contains more than one path.
Namely for each x ∈ E t (recall from (1.33)), let x + , x − denote the two copies of x obtained by taking limits of Note in the continuum case the function Φ n,g,ϕ (·, κ Br ) is not a polynomial since we fix the locations we consider. In order to get a polynomial we need to consider a function g on mark space with g ∈ C 2 bb (R, R) over which we integrate w.r.t. the sampling measure.
We collect below some basic properties for the CSSM genealogy processes that we just constructed.
Proposition 1.24 (Regularity of states). Let $\mathcal{X}^{CS} := (\mathcal{X}^{CS}_t)_{t\ge 0}$ be the CSSM genealogy process constructed from the dual Brownian web $\hat{\mathcal{W}}$, with $\mathcal{X}^{CS}_0\in\mathbb{U}^{\mathbb{R}}_1$. Then almost surely, for every $t > 0$: (a) There exists a continuous (mark) function κ t : where $\hat f^+_{(x,t)}$ and $\hat f^-_{(x,t)}$ are respectively the rightmost and leftmost path in

Remark 1.25. Using the duality between the Brownian web $\mathcal{W}$ and $\hat{\mathcal{W}}$, as characterized in Appendix B, it is easily seen that we can also write converges to the unique equilibrium distribution on $\mathbb{U}^{\mathbb{R}}_1$ as $t\to\infty$.

Remark 1.27. Proposition 1.24 shows that even though $\mathcal{X}^{CS}_0$ can be any state in $\mathbb{U}^{\mathbb{R}}_1$, for $t > 0$, $\mathcal{X}^{CS}_t$ can only take values in a small subset of the state space $\mathbb{U}^{\mathbb{R}}_1$. This introduces complications in establishing the continuity of the process at $t = 0$, and it will also be an important point when we discuss the generator of the associated martingale problem.

Remark 1.28. If we allow types as well, we enlarge the mark space from R to R × [0, 1], where each individual carries a type in [0, 1] that is inherited upon resampling. Theorem 1.26 still holds in this case. We will consider such an extended mark space in Theorem 1.30 below.
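1.5. Convergence of Rescaled IFV Genealogies. In this subsection, we establish the convergence of the interacting Fleming-Viot genealogy processes on $\mathbb{Z}$ to that of the CSSM, where we view the states as elements of $\mathbb{U}^{\mathbb{R}}$ (see Remark 1.14). We assume that the transition probability kernel $a(\cdot,\cdot)$ in the definition of the IFV process satisfies
(1.41) $\sum_{x\in\mathbb{Z}} a(0,x)\,x = 0$ and $\sigma^2 := \sum_{x\in\mathbb{Z}} a(0,x)\,x^2 \in (0,\infty)$.
For each $\varepsilon > 0$, we then define a scaling map $S_\varepsilon = S^\sigma_\varepsilon : \mathbb{U}^{\mathbb{R}}\to\mathbb{U}^{\mathbb{R}}$ (depending on $\sigma$) as follows. Let $\mathcal{X} = (X, r, \mu)\in\mathbb{U}^{\mathbb{R}}$. Then
(1.42) $S_\varepsilon\mathcal{X} := (X, S_\varepsilon r, S_\varepsilon\mu)$,
where $(S_\varepsilon r)(x,y) := \varepsilon^2 r(x,y)$ for all $x, y\in X$, and $S_\varepsilon\mu$ is the measure on $X\times\mathbb{R}$ induced by $\mu$ and the map $(x,v)\in X\times\mathbb{R}\mapsto(x,\varepsilon\sigma^{-1}v)$, with the mass then rescaled by a factor of $\varepsilon\sigma^{-1}$. More precisely,
$(S_\varepsilon\mu)(F) := \varepsilon\sigma^{-1}\,\mu\big(\{(x,v) : (x,\varepsilon\sigma^{-1}v)\in F\}\big)$ for each measurable $F\subset X\times\mathbb{R}$.
We have the following convergence result for rescaled IFV genealogy processes.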
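As a small illustration of the scaling map, the following Python sketch applies the three rescalings described above to a finite sample: distances are multiplied by $\varepsilon^2$, spatial marks by $\varepsilon\sigma^{-1}$, and masses by $\varepsilon\sigma^{-1}$. The finite list representation and the function names are assumptions made for illustration only; the actual map acts on equivalence classes of marked metric measure spaces.

```python
def scale_distances(eps, dist):
    """Multiply every genealogical distance by eps**2."""
    return [[eps * eps * d for d in row] for row in dist]

def scale_measure(eps, sigma, marks, weights):
    """Push marks forward under v -> (eps / sigma) * v and rescale
    each atom's mass by the same factor eps / sigma."""
    c = eps / sigma
    return [c * v for v in marks], [c * w for w in weights]

def mass_in(a, b, marks, weights):
    """Total mass carried by atoms with mark in [a, b)."""
    return sum(w for v, w in zip(marks, weights) if a <= v < b)

# Unit mass at every site of a window of Z (the projection of a state
# in U^Z_1 onto the mark space), rescaled with eps = 1/64, sigma = 2.
eps, sigma = 1.0 / 64.0, 2.0
sites = list(range(-256, 257))
marks, weights = scale_measure(eps, sigma, sites, [1.0] * len(sites))
unit_interval_mass = mass_in(0.0, 1.0, marks, weights)

# Two individuals at genealogical distance eps**-2 end up at distance 1.
scaled = scale_distances(eps, [[0.0, 4096.0], [4096.0, 0.0]])
```

With these choices the rescaled atoms sit on the lattice $\varepsilon\sigma^{-1}\mathbb{Z}$, each carrying mass $\varepsilon\sigma^{-1}$, so the mass of a bounded interval approximates its Lebesgue measure, in line with the requirement that the limit lies in the space of states whose mark projection is the Lebesgue measure.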
Theorem 1.29 (Convergence of Rescaled IFV Genealogies). Let $\mathcal{X}^{FV,\varepsilon} := (\mathcal{X}^{FV,\varepsilon}_t)_{t\ge 0}$ be an IFV genealogy process on $\mathbb{Z}$ with initial condition $\mathcal{X}^{FV,\varepsilon}_0\in\mathbb{U}^{\mathbb{Z}}_1$, indexed by $\varepsilon > 0$. Assume that $a(\cdot,\cdot)$ satisfies (1.41), and $S_\varepsilon\mathcal{X}^{FV,\varepsilon}_0\to\mathcal{X}^{CS}_0$ as $\varepsilon\to 0$. Then $(S_\varepsilon\mathcal{X}^{FV,\varepsilon}_{\varepsilon^{-2}t})_{t\ge 0}$ converges weakly to the CSSM genealogy process $\mathcal{X}^{CS} := (\mathcal{X}^{CS}_t)_{t\ge 0}$ as $\varepsilon\to 0$.

To prove Theorem 1.29, we will need an auxiliary result, of independent interest, on the convergence of rescaled measure-valued IFV processes to the measure-valued CSSM. The , and its projection on R is the Lebesgue measure on R. Define X FV and X CS respectively as the projection of the measure component of X FV and X CS , projected onto the mark space R × [0, 1]. We then have converges weakly to the CSSM process X CS := ( X CS t ) t≥0 as $\varepsilon\to 0$. A similar convergence result has previously been established for the voter model in [AS11].

Remark 1.31. As can be seen from the above convergence results and the regularity properties of the limit process in Proposition 1.24, on a macroscopic scale, there are only locally finitely many individuals with descendants surviving for a macroscopic time of δ or more. In the continuum limit, this phenomenon leads to a dynamic driven only by a thin subset of hotspots. For similar effects in other population models, see for example [DEF+02b, DEF+02a].
1.6. Martingale Problem for CSSM Genealogy Processes. In this subsection, we show that the CSSM genealogy process solves a martingale problem with a singular generator. To identify the generator L CS for the CSSM genealogy process (X CS t ) t≥0 , we note that for all t > 0, X CS t satisfies the regularity properties established in Proposition 1.24. We will see that L CS is only well-defined on Φ ∈ Π 1,2 evaluated at points X ∈ U R with suitable regularity properties for Φ and X .
We now formalize the subset of regular points in U R 1 as follows, which satisfy exactly the properties in Proposition 1.24.
Definition 1.32 (Regular class of states U R r ). Let U R r denote the set of X = (X, r, µ) ∈ U R satisfying the following regularity properties: (a) X ∈ U R 1 , i.e., µ(X × ·) is the Lebesgue measure on R; (c) there exists δ > 0 such that for each l ∈ (0, δ), X is the disjoint union of balls (B l i ) i∈Z of radius l. Furthermore, there exists a locally finite set E l : By Proposition 1.24, U R r is a dynamically closed, separable, metric subset of the Polish space U R 1 under the dynamics of X CS . However, it is not complete.
Remark 1.33. Similar to the discussion leading to (1.37), for X ∈ U R r , we can give a representation on an enriched copy of R as follows. Property (c) in Definition 1.32 implies that any two disjoint balls in X are mapped by κ to two intervals, which overlap at at most a single point in E l for some l > 0. Therefore κ −1 (x) must contain a single point for and κ −1 (x) containing two or more points implies that x is in κ(B 1 ) ∩ κ(B 2 ) for two disjoint balls in X. By the same reasoning, for each x ∈ E, κ −1 (x) must contain exactly two points, which we denote by x ± , where x + is a limit point of {κ −1 (w) : w > x} and x − is a limit point of {κ −1 (w) : w < x}. Similar to (1.37), we can then identify X with With this identification, we can simplify our notation (with a slight abuse) and let µ be the measure on (R\E) ∪ E + ∪ E − , which assigns no mass to E ± and is equal to the Lebesgue measure on R\E.
We now introduce a regular subset of Π 1,2 needed to define the generator L CS .
Definition 1.34 (Regular class of functions Π 1,2 r ). Let Π 1,2 r denote the set of regular test functions Φ n,φ,g ∈ Π 1,2 , defined as in (1.7), with the property that: We can now specify the action of the generator L CS on regular Φ evaluated at regular points, namely LΦ(X ) exists for Φ = Φ n,φ,g ∈ Π 1,2 r and X = (X, r, µ) ∈ U R r . By Remark 1.33, we can identify X with (R\E) ∪ E + ∪ E − , while µ is identified with the Lebesgue measure on R\E. The generator L CS is given by , with the component for the massflow (migration) of the population on R given by and the component for aging of individuals given by These operators are linear operators on the space of bounded continuous functions, C b (U R 1 , R), with domain Π 1,2 , and maps polynomials to polynomials of the same order.
The component for resampling is, with θ k,l φ defined as in (1.16), given by with effective resampling measure and mark functions (1.50) where E, and x ± for x ∈ E, are defined as in Remark 1.33.
Remark 1.35. Note that L CS r is singular. First because the effective resampling measure µ * is supported on a countable subset of X and is singular w.r.t. the sampling measure µ on X. Secondly, µ * is locally infinite because E ∩ (a, b) contains infinitely many points for any a < b.
Therefore the r.h.s. in (1.49) is now in principle a sum of countably many monomials of order n − 2.
Indeed, as we partition X = (R\E) ∪ E + ∪ E − into balls of radius l, with l ↓ 0, the balls must correspond to smaller and smaller intervals on R so as not to contradict the fact that each point in X is assigned one value in R. Nevertheless, L CS r Φ(X ) in (1.49) is well-defined at least on U R r : by our assumption that Φ ∈ Π 1,2 r and condition (1.45) on φ, we have θ k,l φ = φ if the resampling is carried out between two individuals x + and x − for some x ∈ E with r(x + , x − ) ≤ δ. Thus only resampling involving x ∈ E with r(x + , x − ) > δ remains in the integration w.r.t. µ * , and such x are contained in the locally finite set E δ introduced in Definition 1.32 (c). Together with the assumption that g has bounded support, this implies that the integral in (1.49) is finite.
The operator L CS r , defined on functions in Π 1,2 r evaluated at regular states in U R r ⊆ U R 1 , is still a linear operator, mapping polynomials to generalized polynomials of degree reduced by two and with domain Π 1,2 r . Here, generalized polynomial means that they are no longer bounded and continuous, and are only measurable functions defined on the subset of points U R r ⊆ U R 1 . Hence L CS r differs significantly from generators associated with Feller semigroups on Polish spaces.
For L CS r Φ(X ) to be well-defined for any Φ ∈ Π 1,2 , instead of only for Φ ∈ Π 1,2 r , we need to place further regularity assumptions on the point X at which we evaluate Φ. These assumptions are satisfied by typical realizations of the CSSM at a fixed time, as we shall see below.
Definition 1.37 (Regular subclass of states U R rr ). Let U R rr be the set of X = (X, r, µ) ∈ U R r with the further property that where we have identified X with (R\E) ∪ E + ∪ E − as in Remark 1.33.
(i) If X 0 ∈ U R 1 , then the (L CS , Π 1,2 , δ X 0 )-martingale problem has a solution, i.e., there exists a process X := (X t ) t≥0 with initial condition X 0 , almost sure continuous sample path with X t ∈ U R r for every t > 0, such that for all Φ ∈ Π 1,2 , and w.r.t. the natural filtration, is a martingale.
(ii) The CSSM genealogy process X CS constructed in Sec.

We conjecture that the martingale problems above are in fact well-posed. A proof could be attempted by using the duality between the CSSM genealogy process and the Brownian web. There are, however, subtle technical complications due to the fact that the generator of the martingale problem is highly singular. We leave this for a future paper.

1.7. Outline. We provide here an outline of the rest of the paper. In Section 2 we construct the CSSM genealogy process, in Section 3 we establish the convergence of the IFV genealogies to those of the CSSM, and in Section 4 we prove the results on the martingale problem for the CSSM genealogy processes. In Appendix A, we collect further facts and proofs concerning marked metric measure spaces. In Appendix B, we recall the construction of the Brownian web and its dual, and collect some basic properties of the Brownian web and coalescing Brownian motions. In Appendix C we prove some results on coalescing Brownian motions needed in our estimates to derive the martingale problem for the CSSM. In Appendix D we prove results on the IFV genealogy processes.

Proofs of the properties of CSSM Genealogy Processes
In this section, we prove Proposition 1.24 and Theorem 1.26 by using properties of the double Brownian web (W, W), which was used to construct the CSSM genealogy process X CS in Section 1.4.
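, then identifying $x_n$ and $x$ with points on $\mathbb{R}$, it follows that $\hat f_{(x_n,t)}\to\hat f_{(x,t)}$ in $\Pi$ for some paths $\hat f_{(x_n,t)}\in\hat{\mathcal{W}}(x_n,t)$ and $\hat f_{(x,t)}\in\hat{\mathcal{W}}(x,t)$, which implies that $x_n\to x$ in $\mathbb{R}$.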
(b): By (1.37), we identify X CS t with R, where a countable subset E t is duplicated. The distance between x, y ∈ X CS t is defined to be twice of the time to coalescence between the dual Brownian web pathsf (x,t) ∈ W(x, t) andf (y,t) ∈ W(y, t), if the two paths coalesce above time 0. Therefore for l ∈ (0, t), each ball B l i of radius l correspond to a maximal interval Proof of Theorem 1.26. Let us fix a realization of the Brownian web W and its dual W, and let X CS be constructed from W as just before Proposition 1.24. By (1.37), for each where A t and E t are defined as in (1.33). To simplify notation, we will drop the superscript CS in the remainder of the proof. (b): Let X 0 ∈ U R 1 . We first prove that (X t ) t≥0 is a.s. continuous in t > 0. To accomplish this, since Π 1,2 is convergence determining in U R as shown in Theorem 1.8, it suffices to show that for any Φ := Φ n,φ,g ∈ Π 1,2 , the evaluated polynomial , and given the identifi- By Lemma B.4, for each t > 0, W(x, t) contains a single path for all but a countable number of x ∈ R. For such x, by Lemma B.2, the time of coalescence betweenf (x,s) and f (x,t) tends to t as s → t, and hence lim s→t r((x, s), (x, t)) = 0, where r((x, s), (x, t)) is defined in (1.36) and extends the definition of r t (x, y) to individuals at different times. Since it follows that when t > 0, for Lebesgue a.e.
We can then apply the dominated convergence theorem in (2.1) to deduce that, for each t > 0, a.s.
This verifies that (X t ) t≥0 is a.s. continuous in t > 0. Proving the a.s. continuity of (X t ) t≥0 at t = 0 poses new difficulties because X 0 can be any state in U R 1 , while for any t > 0, X t is a regular state as shown in Proposition 1.24. We get around this by showing that X admits a càdlàg version. More precisely, we invoke a part of the proof of the convergence Theorem 1.29 that is independent of the current proof. Note that for any X 0 ∈ U R 1 , we can find a sequence X FV, Indeed, we only need to approximate the mark space R by Z in order to construct S X FV, 0 from X 0 . In the proof of Theorem 1.29, it is shown that the corresponding sequence of interacting Fleming-Viot genealogy process (S X FV, the space of càdlàg paths on U R equipped with the Skorohod topology. Furthermore, (S X FV, −2 t ) t≥0 converges in finite-dimensional distribution to the CSSM genealogy process (X t ) t≥0 . Therefore, (X t ) t≥0 must admit a version which is a.s. càdlàg, with X t → X 0 as t ↓ 0. Since we have just shown that the version of (X t ) t≥0 constructed in Sec. 1.4 is a.s. continuous in t > 0, it follows that the same version must also be a.s. càdlàg, and hence continuous at t = 0, which concludes the proof of part (b).
tm ⇒ X t , by Theorem 1.8, it suffices to show We claim that the convergence in In particular, as m → ∞, To prove (2.6), it then suffices to show that We prove (2.7) and (2.8) next.
Proof of (2.7). By the Markov property of X , it suffices to show that as t ↓ 0, Note that we can write where for each x ∈ R, ξ(x) ∈ X 0 is sampled according to the conditional distribution of µ 0 on X 0 , conditioned on the spatial coordinate in X 0 × R being equal to x. On the other hand, into expectation restricted to F (x, t) and F c (x, t) respectively. On the event F (x, t), we can replacef (x i ,t) , 1 ≤ i ≤ n, by independent Brownian motions (x i (s)) s≤t , 1 ≤ i ≤ n, starting respectively at x i at time t and running backward in time. Then Note that for Lebesgue a.e. x 1 , . . . , x n , P(F c (x, t)) = P( F c (x, t)) → 0 as t ↓ 0. Therefore by the bounded convergence theorem, the first and third term in (2.13) converges to 0 as t ↓ 0, uniformly in X 0 . For the second term in (2.13), we can make the change of variable For each y ∈ R n , clearly the quantity inside the expectation converges a.s. to the analogue in (2.10) as t ↓ 0, and the speed of convergence does not depend on X 0 . Therefore the expectation in (2.14) also converges, uniformly in X 0 . Using the fact that g has bounded support, while g and φ are both bounded, we can easily dominate the integrand in (2.14) w.r.t. dy by an integrable function as t ↓ 0; (2.9), and hence (2.7), then follows.
Proof of (2.8). For each where given the realization of (f . Note that Φf ,τ is a polynomial of order k on U R , defined from the bounded continuous functions gf ,τ and φf ,τ , except that gf ,τ does not have bounded support. Nevertheless, gf ,τ is integrable and can be approximated by continuous functions with bounded support.
Therefore from the assumption X for Lebesgue a.e. x 1 , . . . , x n and a.e. realization of (f i (s)) 1≤i≤n,s∈ [τ ,t] . It then follows from the bounded convergence theorem that which concludes the proof of the Feller property. As t → ∞, the probability of this event tends to 0, and therefore (X CS t+s ) s≥0 converges in distribution to the CSSM genealogy process constructed from W as in Section 1.4, with initial condition being at time −∞.

Proof of convergence of Rescaled IFV Genealogies
In this section, we prove Theorem 1.29: under diffusive scaling of space and time, together with a rescaling of the measure, the genealogies of the interacting Fleming-Viot process converge to those of a CSSM. In Section 3.1 we prove f.d.d.-convergence, and in Section 3.2 tightness in path space. In Section 3.3, we prove Theorem 1.30 on the measure-valued process, which is needed to prove tightness in Section 3.2.
As in Theorem 1.29, let (X FV, be the CSSM genealogy process with initial condition where ⇒ denotes weak convergence of (U R ) k -valued random variables. By [EK86,Prop. 3.4.6] on convergence determining class for product spaces, it suffices to show that for any For notational convenience we assume first that the initial tree is the trivial one (all distances are zero) and we shall see at the end of the argument that this easily generalizes. We will first rewrite both sides of this convergence relation in terms of the dual coalescents and then apply the invariance principle for coalescing random walks.
We can by the definition of the polynomial in (2.1) rewrite the left hand side of (3.2) as where for each time t i , we sample n i individuals in X FV, −2 t i at respective spatial positions the conditional distribution of µ FV, −2 t i on X FV, −2 t i conditioned on the spatial mark being equal to x a ∈ Z.
By the space-time duality relation (1.26) for the IFV genealogy processes, every summand on the right-hand side of (3.3) can be calculated in terms of coalescing random walks. Namely, the joint law of the space-time genealogies of the sampled individuals $\xi_{t_i}(x^i_a)\in X^{FV,\varepsilon}_{\varepsilon^{-2}t_i}$, $1\le a\le n_i$ and $1\le i\le k$, is equal to that of a collection of coalescing random walks $(x^i_a(s))_{s\le\varepsilon^{-2}t_i}$ (recall that time $s$ runs backwards here), starting respectively at $x^i_a$ at time $\varepsilon^{-2}t_i$, where each walk evolves backward in time as a rate 1 continuous-time random walk on $\mathbb{Z}$ with transition probability kernel $\bar a$, and two walks at the same location coalesce at rate $\gamma$. In particular, the duality yields the stochastic representation:

We observe next that the continuum population is represented by $A_t\cup E^+_t\cup E^-_t$, which is a version of $\mathbb{R}$ with the points of $E_t$ marked by $+$ and $-$. The geographic marks are the reals and, since the sampling measure is the Lebesgue measure, we can write polynomials based on integration over $\mathbb{R}$ instead of $X\times V$ as:

Using the duality of Corollary 1.22 (see also (1.36)), we can rewrite the right-hand side of (3.2) in the same form as in (3.3): where at each time $t_i$, we sample $n_i$ individuals from $X^{CS}_{t_i}$ according to $\mu^{CS}_{t_i}$ (which is the Lebesgue measure on $\mathbb{R}$) at positions $(y^i_a)_{1\le a\le n_i}$, and their joint space-time genealogy lines are by construction distributed as a collection of coalescing Brownian motions $(y^i_a(s))_{s\le t_i}$, evolving backward in time.
To link (3.3) with (3.7), we note that in (3.3), we can regard as a finite signed sampling measure (recall that g has bounded support), which is easily seen to converge weakly to the finite signed sampling measure appearing in (3.7), namely To prove (3.2), it then suffices to show that (having (3.8)-(3.9) in mind): If for each $1\le a\le n_i$ and $1\le i\le k$, $x^{i,\varepsilon}_a\in\mathbb{Z}$ and $\varepsilon\sigma^{-1}x^{i,\varepsilon}_a\to y^i_a$ as $\varepsilon\to 0$, then (3.10)

Step 2 (Invariance principle for coalescents). We next prove (3.10) by means of an invariance principle for coalescing random walks. This invariance principle reads as follows. Given a collection of backward coalescing random walks starting at $\varepsilon$-dependent positions $x^{i,\varepsilon}_a$ with $\varepsilon\sigma^{-1}x^{i,\varepsilon}_a\to y^i_a$ as $\varepsilon\to 0$, the collection of coalescing random walks $(x^{i,\varepsilon}_a(s))_{s\le\varepsilon^{-2}t_i}$, rescaled diffusively as $(\varepsilon\sigma^{-1}x^{i,\varepsilon}_a(\varepsilon^{-2}s))_{s\le t_i}$, converges in distribution to the collection of coalescing Brownian motions $(y^i_a(s))_{s\le t_i}$ evolving backward in time. Furthermore, the times of coalescence between the coalescing random walks, scaled by $\varepsilon^2$, converge in distribution to the times of coalescence between the corresponding Brownian motions. The proof of such an invariance principle can be easily adapted from [NRS05, Section 5], which considered discrete time random walks with instantaneous coalescence. We will omit the details.
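The dual dynamics entering this invariance principle can be simulated directly. The following Python sketch runs finitely many rate 1 continuous-time random walks on $\mathbb{Z}$ in which every pair of walks at the same site coalesces at rate $\gamma$. It is only a sketch under stated assumptions: the jump kernel is taken to be simple symmetric nearest-neighbour (so $\sigma = 1$), which is one admissible choice rather than the general kernel $\bar a$, and the function name and return format are hypothetical.

```python
import random

def simulate_coalescing_walks(starts, horizon, gamma, rng):
    """Gillespie-type simulation of delayed-coalescing random walks.

    starts  : initial integer positions (one lineage per entry)
    horizon : length of the (backward) time interval to simulate
    gamma   : coalescence rate for each pair of walks at the same site
    Returns (positions, clusters, merge_times); clusters[i] lists the
    indices of the initial lineages carried by the i-th surviving walk.
    """
    pos = list(starts)
    clusters = [[i] for i in range(len(pos))]
    merge_times = []
    if not pos:
        return pos, clusters, merge_times
    t = 0.0
    while True:
        pairs = [(i, j) for i in range(len(pos))
                 for j in range(i + 1, len(pos)) if pos[i] == pos[j]]
        jump_rate = float(len(pos))           # each walk jumps at rate 1
        total_rate = jump_rate + gamma * len(pairs)
        t += rng.expovariate(total_rate)
        if t > horizon:
            break
        if rng.random() * total_rate < jump_rate:
            i = rng.randrange(len(pos))
            pos[i] += rng.choice((-1, 1))     # assumed nearest-neighbour kernel
        else:
            i, j = rng.choice(pairs)          # j > i, so index i stays valid
            clusters[i].extend(clusters[j])
            del pos[j], clusters[j]
            merge_times.append(t)
    return pos, clusters, merge_times

# Diffusive rescaling with eps = 0.1: run for eps**-2 * t with t = 1.
eps, n = 0.1, 8
rng = random.Random(0)
starts = [round(k / eps) for k in range(n)]   # eps * x -> 0, 1, ..., 7
horizon = 1.0 / eps ** 2
pos, clusters, merge_times = simulate_coalescing_walks(starts, horizon, 1.0, rng)
rescaled_positions = [eps * x for x in pos]
rescaled_merge_times = [eps ** 2 * s for s in merge_times]
```

Under the scaling of Step 2, `rescaled_positions` and `rescaled_merge_times` are the quantities whose joint law is asserted to converge to that of coalescing Brownian motions and their coalescence times.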
Let $\delta > 0$ be small. Note that the collection of rescaled coalescing random walks $(\varepsilon\sigma^{-1}x^{i,\varepsilon}_a(\varepsilon^{-2}s))$ restricted to the time interval $s\in[\delta, t_k]$, together with their times of coalescence, converges in joint distribution to the collection of coalescing Brownian motions $(y^i_a(s))$ restricted to the time interval $s\in[\delta, t_k]$, together with their times of coalescence. Using Skorohod's representation theorem (see e.g. [Bil89]), we can couple $(\varepsilon\sigma^{-1}x^{i,\varepsilon}_a(\varepsilon^{-2}s))_{s\in[\delta,t_k]}$ and $(y^i_a(s))_{s\in[\delta,t_k]}$ such that the paths and their times of coalescence converge almost surely. Let us assume such a coupling from now on.
By the same argument as in the proof of Theorem 1.26 (c), we can rewrite the expectations in (3.10) in terms of the backward coalescing random walks $x^{i,\varepsilon}_a\in\mathbb{Z}$ and coalescing Brownian motions $y^i_a$. Furthermore, we can condition on the coalescing random walks $x^{i,\varepsilon}_a(s)$ on the time interval $[\delta\varepsilon^{-2}, \varepsilon^{-2}t_k]$ and condition on the coalescing Brownian motions $y^i_a(s)$ on the time interval $[\delta, t_k]$, coupled as above.
Given the locations $u^\varepsilon_1,\dots,u^\varepsilon_l\in\mathbb{Z}$ of the remaining coalescing random walks at time $\delta\varepsilon^{-2}$, we now make an approximation and replace them by independent random walks on the remaining time interval $[0,\delta\varepsilon^{-2}]$, and make a similar replacement for the coalescing Brownian motions. Note that the error we introduce to the two sides of (3.10) is bounded by a constant (determined only by $|\varphi_i|_\infty$, $1\le i\le k$) times the probability that there is a coalescence among the random walks (resp. Brownian motions) in the time interval $[0,\delta\varepsilon^{-2}]$ (resp. $[0,\delta]$), which tends to 0 as $\delta\downarrow 0$ uniformly in $\varepsilon$ by the properties of Brownian motion and the invariance principle. Therefore to prove (3.10), it suffices to prove its analogue where we make such an approximation for a fixed $\delta > 0$, replacing coalescing random walks (resp. Brownian motions) on the time interval $[0,\delta\varepsilon^{-2}]$ (resp. $[0,\delta]$) by independent ones. Let us fix such a $\delta > 0$ from now on.
By conditioning on the coalescing random walks and the coalescing Brownian motions on the macroscopic time interval $[\delta, t_k]$ and using the a.s. coupling between them, we note that the analogue of (3.10) discussed above follows readily if we show:

Lemma 3.1. If $u^\varepsilon_1,\dots,u^\varepsilon_l\in\mathbb{Z}$ satisfy $\varepsilon\sigma^{-1}u^\varepsilon_i\to u_i$ as $\varepsilon\to 0$, then for any bounded continuous function $\psi : \mathbb{R}^{\binom{l}{2}}\to\mathbb{R}$, we have (3.11) where $g^\varepsilon_\delta(x)$ is the probability mass function of $l$ independent random walks at time $\delta\varepsilon^{-2}$, starting at $u^\varepsilon_1,\dots,u^\varepsilon_l$; while $g_\delta(y)$ is the probability density function of $l$ independent Brownian motions at time $\delta$, starting at $u_1,\dots,u_l$.
Proof. If we can replace $g^\varepsilon_\delta(x_1,\dots,x_l)$ in (3.11) by $(\varepsilon\sigma^{-1})^l g_\delta(\varepsilon\sigma^{-1}x_1,\dots,\varepsilon\sigma^{-1}x_l)$, then (3.11) follows immediately by applying the polynomial $\Phi^{l,\psi,g_\delta}$ to the states $S_\varepsilon\mathcal{X}^{FV,\varepsilon}_0$ and $\mathcal{X}^{CS}_0$, using the assumption $S_\varepsilon\mathcal{X}^{FV,\varepsilon}_0\to\mathcal{X}^{CS}_0$. The only problem is that $g_\delta$ does not have bounded support, as we require for a polynomial. However, it is continuous and integrable, and hence can be approximated by continuous functions with bounded support. Therefore the above reasoning is still valid.
To see why we can replace $g^\varepsilon_\delta(x)$ by $(\varepsilon\sigma^{-1})^l g_\delta(\varepsilon\sigma^{-1}x)$, note that by the local central limit theorem (see e.g. [Spi76]), uniformly in $y\in[-L,L]^l$ for any $L > 0$. Therefore when we restrict the summation in (3.11) to $x\in[-\varepsilon^{-1}L,\varepsilon^{-1}L]^l$, the replacement induces an error that tends to 0 as $\varepsilon\to 0$. By the central limit theorem, the contribution to the sum in (3.11) from $x\notin[-\varepsilon^{-1}L,\varepsilon^{-1}L]^l$ can be made arbitrarily small (uniformly in $\varepsilon$) by choosing $L$ large, and hence can be safely neglected if we first let $\varepsilon\to 0$ and then let $L\to\infty$.
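The replacement used in this proof can be checked numerically in the simplest case $l = 1$. The sketch below compares the rescaled probability mass function $\varepsilon^{-1}\sigma\,g^\varepsilon_\delta(x)$ of a single walk with the Gaussian density $g_\delta(\varepsilon\sigma^{-1}x)$. It assumes a rate 1 continuous-time simple symmetric random walk started at 0 (so $\sigma = 1$), which is an illustrative choice of kernel; the helper names are hypothetical, and the mass function is computed from the Poisson mixture of binomial step distributions.

```python
import math

def rw_pmf(x, t):
    """P(X_t = x) for a rate 1 continuous-time simple symmetric random
    walk started at 0: sum over the Poisson(t) number n of jumps of
    exp(-t) t^n / n! * C(n, (n + x) / 2) / 2^n, with n = x mod 2."""
    x = abs(x)
    n_max = int(t + 12.0 * math.sqrt(t) + 20.0)   # negligible Poisson tail
    total = 0.0
    for n in range(x, n_max + 1, 2):
        k = (n + x) // 2
        log_term = (-t + n * math.log(t / 2.0)
                    - math.lgamma(k + 1) - math.lgamma(n - k + 1))
        total += math.exp(log_term)
    return total

def gaussian_density(y, delta):
    """Density at y of a Brownian motion at time delta started at 0."""
    return math.exp(-y * y / (2.0 * delta)) / math.sqrt(2.0 * math.pi * delta)

eps, delta = 0.05, 1.0
t = delta / eps ** 2                               # microscopic time
errors = [abs(rw_pmf(x, t) / eps - gaussian_density(eps * x, delta))
          for x in (0, 20, 40)]                    # y = 0, 1, 2

# Zeroth and second moments of the walk over a window of 10 standard
# deviations; the second moment should be close to t * sigma^2 = t.
window = range(-200, 201)
mass = sum(rw_pmf(x, t) for x in window)
second_moment = sum(x * x * rw_pmf(x, t) for x in window)
```

For these parameters the entries of `errors` are small compared with the density values themselves, and `second_moment` is close to $t\sigma^2$, the variance identity for the walk that is used again in the tightness estimates.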
3.2. Tightness. In this subsection we prove the tightness of the family of rescaled IFV genealogy processes, $(S_\varepsilon\mathcal{X}^{FV,\varepsilon})_{\varepsilon>0}$, regarded as $C([0,\infty),\mathbb{U}^{\mathbb{R}})$-valued random variables. First we note that it is sufficient to prove the tightness of $(S_\varepsilon\mathcal{X}^{FV,\varepsilon})_{\varepsilon>0}$ as random variables taking values in the Skorohod space $D([0,\infty),\mathbb{U}^{\mathbb{R}})$. Indeed, the tightness of $(S_\varepsilon\mathcal{X}^{FV,\varepsilon})_{\varepsilon>0}$ in the Skorohod space, together with the convergence of $S_\varepsilon\mathcal{X}^{FV,\varepsilon}$ to $\mathcal{X}^{CS}$ in finite-dimensional distributions, implies that $S_\varepsilon\mathcal{X}^{FV,\varepsilon}\Rightarrow\mathcal{X}^{CS}$ as $D([0,\infty),\mathbb{U}^{\mathbb{R}})$-valued random variables. In particular, $(\mathcal{X}^{CS}_t)_{t\ge 0}$ admits a version which is a.s. càdlàg. Together with the fact that $\mathcal{X}^{CS}_t$ is a.s. continuous in $t > 0$, which was established in the proof of Theorem 1.26 (b), it follows that $\mathcal{X}^{CS}_t$ must be a.s. continuous in $t\ge 0$. Note that this concludes the proof of Theorem 1.26 (b).
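Using Skorohod's representation theorem (see e.g. [Bil89]) to couple $(S_\varepsilon\mathcal{X}^{FV,\varepsilon})_{\varepsilon>0}$ and $\mathcal{X}^{CS}$ such that the convergence in $D([0,\infty),\mathbb{U}^{\mathbb{R}})$ is almost sure, and using the a.s. continuity of $(\mathcal{X}^{CS}_t)_{t\ge 0}$, we can then easily conclude that $S_\varepsilon\mathcal{X}^{FV,\varepsilon}\to\mathcal{X}^{CS}$ a.s. in $C([0,\infty),\mathbb{U}^{\mathbb{R}})$, which implies the tightness of $(S_\varepsilon\mathcal{X}^{FV,\varepsilon})_{\varepsilon>0}$ as $C([0,\infty),\mathbb{U}^{\mathbb{R}})$-valued random variables.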
Note that Π 1,2 (recall from (1.8)) separates points in U R by Theorem 1.8, and is closed under addition.
First term in (3.15). This term we bound by bounding Var Φ τ +δ | X FV, −2 τ uniformly in X FV, −2 τ . First note that X FV, is a strong Markov process by Theorem 1.11. Therefore X FV, −2 (τ +δ ) can be seen as the IFV genealogy process X FV, at time −2 δ with initial condition X FV, −2 τ . In particular, it suffices to bound Var(Φ δ ) uniformly in the initial condition X FV, 0 , which we can assume to be deterministic. Let Φ = Φ n,φ,g , and denote x := (x 1 , . . . , x n ) ∈ Z n , y := (y 1 , . . . , y n ) ∈ Z n . Then by the definition of the scaling map S in (1.42), we have where r FV, −2 δ (x) denotes the distance matrix r FV, −2 δ (ξ(x i ), ξ(x j )) 1≤i<j≤n of n individuals ξ(x 1 ), . . . , ξ(x n ) sampled independently from X FV, −2 δ at positions x 1 , . . . , x n respectively. In order to evaluate the r.h.s. of (3.10) we represent the quantity using the duality in terms of a collection of coalescing random walks as follows.
Let (X x i t ) 1≤i≤n and (X y i t ) 1≤i≤n denote a family of rate 1 continuous time random walks on Z with transition kernelā as in (1.12), and every pair of walks at the same site coalesce at rate γ. The coalescence gives a partition of the set of coalescing random walks at time −2 δ , and independently for each partition element, say at position z ∈ Z, we sample an individual from X FV, 0 at position z. Let r FV, 0 (X x −2 δ ) denote the distance matrix of the collection of sampled individuals associated with the walks X x 1 −2 δ , . . . , X xn −2 δ at time −2 δ , and let r FV, 0 (X y −2 δ ) de defined similarly. We can further construct ( X y i t ) 1≤i≤n , a copy of (X y i t ) 1≤i≤n , which coincides with (X y i t ) 1≤i≤n up to time −2 δ on the event G −2 δ (x, y) := {none of (X x i ) 1≤i≤n coalesces with any (X y i ) 1≤i≤n before time −2 δ }, such that ( X y i t ) 1≤i≤n is independent of (X x i t ) 1≤i≤n . Let r FV, 0 ( X y −2 δ ) be the associated distance matrix, which is independent of r FV, 0 (X x −2 δ ). By the duality relation (see Theorem 1.16), we have (3.17) Cov φ( 2 r FV, −2 δ (x)), φ( 2 r FV, −2 δ (y)) = Cov φ( 2 r FV, where τ x i ,y i denotes the time it takes for the two walks X x i · and X y j · to meet. Note that this bound is uniform w.r.t. X FV, 0 . Substituting it into (3.16) then gives We claim that the r.h.s. of (3.18) tends to 0 as ↓ 0. Indeed, the measure converges weakly to the finite measure g(x)g(ỹ)dxdỹ on R 2n as ↓ 0. By Donsker's invariance principle and the fact that δ → 0 as ↓ 0, we note that for any λ > 0, It follows that when restricted tox i andỹ j with |x i −ỹ j | > λ, the inner sum in (3.18) tends to 0 as ↓ 0. On the other hand, when restricted tox i andỹ j with |x i −ỹ j | ≤ λ, the inner sum in (3.18) can be bounded from above by replacing P(·) with 1, which then converges to the integral of the finite measure g(x)g(ỹ)dxdỹ over the subset of R 2n with |x i −ỹ j | ≤ λ, and can be made arbitrarily small by choosing λ > 0 small. 
This proves that $\mathrm{Var}(\Phi_\delta)$ tends to 0 uniformly in $X^{FV,\varepsilon}_0$ as $\varepsilon\downarrow 0$, and hence the first term in (3.15) tends to 0 as $\varepsilon\downarrow 0$.
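The role of Donsker's invariance principle in the argument above can be illustrated in the Brownian limit. The following is a minimal sketch, not part of the proof: it assumes two independent Brownian motions with variance $\sigma^2$ per unit time started at distance $\lambda$, so that their difference has variance $2\sigma^2$ per unit time and the reflection principle gives $P(\tau\le\delta)=\mathrm{erfc}(\lambda/(2\sigma\sqrt{\delta}))$. The function name `meeting_prob` is hypothetical.

```python
import math

def meeting_prob(lam, delta, sigma=1.0):
    """P(two independent Brownian motions started lam apart meet before time delta).

    Assumption: each motion has variance sigma^2 per unit time, so the
    difference is a Brownian motion with variance 2*sigma^2 per unit time,
    and the reflection principle yields erfc(lam / (2*sigma*sqrt(delta))).
    """
    if delta <= 0:
        return 0.0
    return math.erfc(lam / (2.0 * sigma * math.sqrt(delta)))

# For a fixed macroscopic separation lam, the probability vanishes as delta -> 0.
probs = [meeting_prob(1.0, d) for d in (1.0, 0.1, 0.01)]
```

For fixed $\lambda>0$ the values in `probs` decrease rapidly to 0 as $\delta$ shrinks, which is the limiting behaviour used for pairs with $|\tilde x_i-\tilde y_j|>\lambda$.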
Second term in (3.15). By the strong Markov property of X FV, , it suffices to bound t (x) be defined as before (3.17). Let ( X x i t ) 1≤i≤n be a collection of independent random walks, such that ( X x i t ) 1≤i≤n coincides with (X x i t ) 1≤i≤n up to time −2 δ on the event G −2 δ (x) := {no coalescence has taken place among (X x i ) 1≤i≤n before time −2 δ }.

Then we have
Note that the second term in the bound above tends to 0 as $\varepsilon\downarrow 0$ by the same argument as the one showing that the bound for $\mathrm{Var}(\Phi_\delta)$ in (3.18) tends to 0 as $\varepsilon\downarrow 0$. To bound the first term in the bound above, we decompose according to the positions of the random walks and rewrite it as follows, where $p_t(x)$ denotes the transition probability kernel of $X^0_t$: where $C$ is a constant depending only on $n$. In the derivation above, we Taylor expanded $g(\varepsilon\sigma^{-1}x)$ around $\varepsilon\sigma^{-1}y$ when either $\varepsilon\sigma^{-1}x$ or $\varepsilon\sigma^{-1}y$ is not in the support of $g$, $\nabla g$ and $\nabla^2 g$ denote the first and second derivatives of $g$, and $u(x,y)$ is some point on the line segment connecting $x$ and $y$. Lastly, we used the fact that $\sum_{z\in\mathbb{Z}} z\,p_t(z)=0$ and $\sum_{z\in\mathbb{Z}} z^2 p_t(z)=t\sigma^2$. Since $g$ has bounded support, the bound we obtained above is bounded by $C\delta$ for some $C$ depending only on $n$, $\phi$ and $g$, and hence tends to 0 as $\varepsilon\downarrow 0$. This verifies that the second term in (3.15) also tends to 0 as $\varepsilon\downarrow 0$, which concludes the proof of (A2).
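The two moment identities used at the end of the Taylor expansion can be checked numerically. The sketch below assumes, purely for illustration, that the walk is the rate 1 continuous time simple symmetric random walk on $\mathbb{Z}$ (so $\sigma^2=1$); the kernel in the text is more general. The helper `srw_kernel` and the truncation level `kmax` are hypothetical choices.

```python
import math

def srw_kernel(t, kmax=80):
    """Transition kernel p_t(z) of a rate-1 continuous-time simple symmetric walk.

    Assumption: the number of jumps by time t is Poisson(t), and each jump is
    +1 or -1 with probability 1/2; the Poisson sum is truncated at kmax jumps.
    """
    p = {}
    for k in range(kmax + 1):
        w = math.exp(-t) * t**k / math.factorial(k)   # P(exactly k jumps)
        for up in range(k + 1):
            z = 2 * up - k                            # position after k jumps
            p[z] = p.get(z, 0.0) + w * math.comb(k, up) / 2**k
    return p

t = 3.0
p = srw_kernel(t)
mass = sum(p.values())                        # total mass, close to 1
mean = sum(z * q for z, q in p.items())       # first moment, close to 0
second = sum(z * z * q for z, q in p.items()) # second moment, close to t
```

With these identities the first-order term of the expansion vanishes and the second-order term is proportional to $t\sigma^2$, which is what produces a bound of order $\delta$.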
We have verified (J2) above and hence to conclude the proof of tightness of $(S_\varepsilon X^{FV,\varepsilon})_{\varepsilon>0}$ as a family of random variables in the Skorohod space $D([0,\infty),\mathbb{U}^{\mathbb{R}})$, it only remains to verify the compact containment condition (J1). Some technical difficulties arise. Because the geographical space is unbounded, truncation in space is needed. We also need to control how the sizes of different families fluctuate over time, as well as how the population flux across the truncation boundaries affects the family sizes. Our strategy is to enlarge the mark space by assigning types to different families. Using a weaker convergence result, Theorem 1.30, for measure-valued IFV processes with types (but no genealogies), we can control the evolution of family sizes as well as their dispersion in space, which can then be strengthened to control the genealogical structure of the population. As we will point out later, Theorem 1.30 can be proved by adapting what we have done so far, because condition (J1) is trivial in that context. Therefore invoking Theorem 1.30 to prove (J1) for the genealogy processes is justified.
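Proof of (J1). As noted in Remark 1.4, we can regard $\mathbb{U}^{\mathbb{R}}$ as a subset of $(\mathbb{U}^{\mathbb{R}}_f)^{\mathbb{N}}$, endowed with the product $\mathbb{R}$-marked Gromov-weak topology. Therefore to prove (J1), it suffices to show that for each $k\in\mathbb{N}$, the restriction of $(S_\varepsilon X^{FV,\varepsilon}_{\varepsilon^{-2}t})_{t\ge 0,\,\varepsilon>0}$ to the subset of marks $(-k,k)\subset\mathbb{R}$, i.e., $S^{(k)}_\varepsilon X^{FV,\varepsilon}_{\varepsilon^{-2}t} := \big(X^{FV,\varepsilon}_{\varepsilon^{-2}t},\ S_\varepsilon r^{FV,\varepsilon}_{\varepsilon^{-2}t},\ 1_{\{|v|<k\}}\,S_\varepsilon\mu^{FV,\varepsilon}_{\varepsilon^{-2}t}(dx\,dv)\big)$, $t\ge 0$, $\varepsilon>0$, satisfies the compact containment condition (3.13). More precisely, it suffices to show that for each $T>0$ (which we will assume to be 1 for simplicity), and for each $\delta>0$, there exists a compact $K_\delta\subset\mathbb{U}^{\mathbb{R}}_f$, such that for all $\varepsilon>0$ sufficiently small, (3.19) holds, where $k\in\mathbb{N}$ will be fixed for the rest of the proof.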
We will construct K 1 δ , K 2 δ , K 3 δ ⊂ U R f , which satisfy respectively conditions (i)-(iii) in Theorem A.1 for the relative compactness of subsets of U R f . We can then take K δ := K 1 δ ∩ K 2 δ ∩ K 3 δ , which is a compact subset of U R f . To prove (3.19), it then suffices to prove the same inequality but with K δ replaced by K i δ for each 1 ≤ i ≤ 3, which we do in (1)-(3) below.
Later when we construct $K^2_\delta$ and $K^3_\delta$, we will keep track of the mass of different individuals having some specified properties. The way to do this is to introduce additional marks. We will enlarge the mark space from $\mathbb{R}$ to $\mathbb{R}\times[0,1]$, where $[0,1]$ is the space of the additional types, and $X^{FV,\varepsilon}$

(1) First let $K^1_\delta$ be the subset of $\mathbb{U}^{\mathbb{R}}_f$, such that for each $(X,r,\mu)\in K^1_\delta$, $\mu(X\times\cdot)$ is supported on $[-k,k]$, with total mass bounded by $4k$. Since the family of measures on $[-k,k]$, with total mass bounded by $4k$, is relatively compact w.r.t. the weak topology, $K^1_\delta$ satisfies condition (i) in Theorem A.1. We further note that a.s., $S^{(k)}_\varepsilon X^{FV,\varepsilon}_{\varepsilon^{-2}t}\in K^1_\delta$ for all $t\ge 0$, and hence (3.19) holds with $K_\delta$ replaced by $K^1_\delta$.
(2) For each $n\in\mathbb{N}$, we will find below $L(n)$ such that if $K^{2,n}_\delta$ denotes the subset of $\mathbb{U}^{\mathbb{R}}_f$ with (3.20) $\int_{(X\times\mathbb{R})^2} 1_{\{r(x,y)>L(n)\}}\,\mu(dx\,du)\,\mu(dy\,dv) < \frac{1}{n}$ for each $(X,r,\mu)\in K^{2,n}_\delta$, then uniformly in $\varepsilon>0$ sufficiently small, we have (3.21) $P\big(S^{(k)}_\varepsilon X^{FV,\varepsilon}_{\varepsilon^{-2}t}\notin K^{2,n}_\delta \text{ for some } 0\le t\le 1\big)\le \frac{\delta}{2^n} =: \delta_n$. We can then take $K^2_\delta := \cap_{n\in\mathbb{N}} K^{2,n}_\delta$, which clearly satisfies condition (ii) in Theorem A.1, while (3.21) implies that (3.19) holds with $K_\delta$ replaced by $K^2_\delta$. In order to find $L(n)$, we proceed in two steps. First we find an analogue of $L(n)$ for the limiting CSSM genealogies, and then in a second step, we use the convergence of the measure-valued IFV to obtain $L(n)$.
Fix n ∈ N. To find L(n) such that (3.21) holds, we first prove an analogue of (3.21) for the continuum limit X CS by utilizing the types of the individuals. Given η ≥ 0 and γ ∈ [0, 1], let We claim that we can find A sufficiently large, such that if all individuals in X CS 0 with spatial mark outside [−A, A] are assigned type 0, and all other individuals are assigned type 1, then In other words, with probability at least 1 − δ n /4, the following event occurs: for all 0 ≤ t ≤ 1, no individual in X CS t with spatial mark in [−k, k] can trace its genealogy back to some individual at time 0 with spatial mark outside [−A, A]. This will allow us to restrict our attention to descendants of the population in [−A, A] at time 0.
Indeed, by the construction of the CSSM (Section 1.4) using the Brownian web, the measure-valued process $X^{CS}_t$, which is the measure $\mu^{CS}_t$ projected on the geographic and type space, is given by (3.24) which is the analogue of (3.21) for $X^{CS}$.

Next we turn to $X^{FV,\varepsilon}$ and we will deduce (3.21) from (3.26) by exploiting Theorem 1.30 on the convergence of the measure-valued processes $S_\varepsilon X^{FV,\varepsilon}$ to $X^{CS}$. Let $A$ be the same as above. Let individuals in $X^{CS}$ , converges vaguely to $X^{CS}_0$ as $\varepsilon\downarrow 0$. By Theorem 1.30, $(S_\varepsilon X^{FV,\varepsilon}_{\varepsilon^{-2}t})_{0\le t\le 1}$ converges weakly to $(X^{CS}_t)_{0\le t\le 1}$ as random variables in $C([0,1],\mathcal{M}(\mathbb{R}\times[0,1]))$. Applying this weak convergence result to (3.25) then implies that for $\varepsilon>0$ sufficiently small, we have the following analogue of (3.25): Note that on the event $S_\varepsilon X^{FV,\varepsilon}_{\varepsilon^{-2}t}\in G_{\frac{1}{2kn},1}$

(3) Our procedure for constructing $K^3_\delta$ is similar to that of $K^2_\delta$. For each $n\in\mathbb{N}$, we will find $M=M(n)$ such that if $K^{3,n}_\delta$ denotes the subset of $\mathbb{U}^{\mathbb{R}}_f$ with the property that for each $(X,r,\mu)\in K^{3,n}_\delta$, we can find $M$ balls of radius $\frac{1}{n}$ in $X$, say $B_1,\dots$ , n , then uniformly in $\varepsilon>0$ sufficiently small, we have (3.28). We can then take $K^3_\delta := \cap_{n\in\mathbb{N}} K^{3,n}_\delta$, which clearly satisfies condition (iii) in Theorem A.1, while (3.28) implies that (3.19) holds with $K_\delta$ replaced by $K^3_\delta$. To find $M(n)$ such that (3.28) holds, we partition the time interval $[0,1]$ into $[\frac{i-1}{2n},\frac{i}{2n}]$ for $1\le i\le 2n$. It then suffices to show that for each $1\le i\le 2n$, we can find $M_i(n)$ such that if $K^{3,n}_\delta$ is defined using $M_i$, then uniformly in $\varepsilon>0$ small, (3.29) holds. Again we first determine $M(n)$ for the CSSM and then use the convergence of the measure-valued IFV to the measure-valued CSSM. We now prove an analogue of (3.29) for $X^{CS}$. Since $X^{CS}_{\frac{i-1}{2n}}\in\mathbb{U}^{\mathbb{R}\times[0,1]}$ almost surely, we can condition on its realization and partition the population with spatial marks in $[-A,A]$ into disjoint balls of radius $\frac{1}{3n}$, $B_1,B_2,\dots$, as we did in the argument leading to (3.25).
Repeating the same argument there and assigning type 1 j to individuals in B j , we readily obtain the following analogue of (3.25): we can choose M i large enough such that Note that on the event X CS 2n in B 1 ∪ B 2 · · · ∪ B M i , and hence they are contained in M i balls of radius 1 3n + t ≤ 5 6n . Therefore (X CS t , r CS t , 1 {|v|≤k} µ CS t (dxdv)) ∈ K 3,n δ , and the analogue of (3.29) holds for X CS .
To establish (3.29) uniformly in small > 0, we again apply the convergence result in Theorem 1.30. Note that by the f.d.d. convergence established in Section 3.1, S X FV, as ↓ 0. Following the same argument as those leading to (3.27), we can then assign types to individuals in S X FV, such that the associated measure-valued process, (S X FV, , and individuals in S X FV, with spatial marks in [−A, A] and types in [ 1 M i , 1] are contained in M i balls with radius at most 1 2n . Applying this convergence result to (3.30) then implies that for > 0 sufficiently small, we have the following analogue of (3.30): This is then easily seen to imply (3.29).
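The distance-distribution condition (3.20) used in part (2) can be made concrete on a finite space. The following is only an illustrative sketch: it assumes a finite ultrametric measure space with the marks already integrated out, and the names `far_pair_mass` and `tree_dist` are hypothetical. It computes the $\mu\otimes\mu$-mass of pairs at distance larger than $L$, the quantity that (3.20) requires to be smaller than $1/n$.

```python
def far_pair_mass(points, dist, weights, L):
    """Return the mu x mu mass of ordered pairs (x, y) with dist(x, y) > L.

    Assumption: a finite space; weights[i] is the mass of points[i].
    """
    total = 0.0
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            if dist(x, y) > L:
                total += weights[i] * weights[j]
    return total

def tree_dist(a, b):
    """Toy ultrametric on four leaves: siblings at distance 1, cousins at distance 4."""
    if a == b:
        return 0.0
    return 1.0 if a // 2 == b // 2 else 4.0

points = [0, 1, 2, 3]
weights = [0.25] * 4
mass_at_2 = far_pair_mass(points, tree_dist, weights, 2.0)    # only cousin pairs count
mass_at_half = far_pair_mass(points, tree_dist, weights, 0.5) # all distinct pairs count
mass_at_5 = far_pair_mass(points, tree_dist, weights, 5.0)    # no pair is this far apart
```

In this toy space any $L\ge 4$ makes the mass vanish, so a condition of the form (3.20) holds for every $n$; in the proof the difficulty is to find such an $L(n)$ uniformly in time and in $\varepsilon$.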
Combining parts (1)-(3) concludes the proof of (3.19) and hence establishes the compact containment condition (J1).

Proof of convergence of rescaled IFV processes (Theorem 1.30)
In [AS11, Theorem 1.1], a convergence result similar to Theorem 1.30 was proved for the voter model on $\mathbb{Z}$, where the type space consists of only $\{0,1\}$, and a special initial condition was considered, where the population to the left of the origin all have type 1, and the rest of the population have type 0. The proof consists of two parts: proof of tightness, and convergence of finite-dimensional distributions.

In [AS11], the proof of tightness for the voter model does not depend on the initial condition, and is based on the verification of Jakubowski's criterion and Aldous' criterion as we have done in Sections 3.1 and 3.2 for the genealogical process. Because the IFV process ignores the genealogical distances, the verification of the compact containment condition (J1) in Jakubowski's criterion is trivial, as in the case of the voter model. Using the duality between the IFV process and coalescing random walks with delayed coalescence, Aldous' criterion on the tightness of evaluations can be verified by exactly the same calculations as those for the voter model in [AS11], which use the duality between the voter model and coalescing random walks with instantaneous coalescence. Recall here that in the rescaling the difference between instantaneous and delayed coalescence disappears because of recurrence of the difference walk. Lastly, the convergence of the finite-dimensional distributions for the rescaled IFV process follows the same calculations as in Section 3.1, where we can simply enlarge the mark space to $\mathbb{R}\times[0,1]$ and suppress the genealogical distances by choosing $\phi_i\equiv 1$.

Martingale Problem for CSSM Genealogy Processes
In this section, we show that the CSSM genealogy process constructed in Section 1.4 solves the martingale problem formulated in Theorem 1.38. We will first identify the generator action on regular test functions evaluated at regular states, and then extend it to more general test functions and verify the martingale property. Complications arise mainly from the singular nature of the resampling component of the generator, which is only well-defined a priori on regular test functions evaluated at regular states. Fortunately by Proposition 1.24, the CSSM genealogy process enters these regular states as soon as t > 0, even though the initial state may not be regular.

Generator Action on Regular Test Functions.
In this section, we identify the generator of the CSSM genealogy process $X^{CS}$, acting on $\Phi\in\Pi^{1,2}_r$ and evaluated at $X^{CS}_t\in\mathbb{U}^{\mathbb{R}}_r$, where $\Pi^{1,2}_r$ and $\mathbb{U}^{\mathbb{R}}_r$ are introduced in Section 1.6. The advantage of working with such regular $\Phi$ and $X^{CS}_t$ is that the relevant resamplings only occur at the boundary points of balls of radius at least $\delta$, which form a locally finite set. In Section 4.2, we will extend it to the case $\Phi\in\Pi^{1,2}$, where we need to consider the boundaries of all balls in $X^{CS}_t$, which is a locally infinite set. Furthermore, $E[L^{CS}\Phi(X^{CS}_t)]$ is continuous in $t\ge 0$. The proof is fairly long and technical and will be broken into parts, with (4.1) and (4.2) proved respectively in Sections 4.1.1 and 4.1.2.

4.1.1. Proof of (4.1) in Proposition 4.1.
Let $L>0$ be chosen such that the support of $g(x)$ is contained in $(-L,L)^n$. Let $\delta>0$ be determined by $\phi$ as in (1.45), so that $\phi((r_{i,j})_{1\le i<j\le n})$ is constant when any coordinate $r_{i,j}$ varies on the interval $[0,\delta]$.
We proceed in five steps, first giving a suitable representation for Φ(X CS 0 ) and Φ(X CS t ), and then calculating actions that lead to the different parts of the generator.
Step 1. We derive here a representation for $\Phi(X^{CS}_0)$ by partitioning $X^{CS}_0$ into disjoint balls of radius $\delta/4$ and utilizing the fact that $\Phi\in\Pi^{1,2}_r$. Since $X^{CS}_0\in\mathbb{U}^{\mathbb{R}}_r$, by Remark 1.33 and (1.37), we can identify $X^{CS}_t$ for each $t\ge 0$ with where $x^+$ and $x^-$ are duplicates of the point $x$ in where $E^l_t$ is the set of points in $\mathbb{R}$ that lie at the boundary of two disjoint balls of radius $l$ in $X^{CS}_t$, which is consistent with the definition in (1.39). Denote $G(y,k) = \int\cdots\int_{y_{k_i}<x_i<y_{k_i+1},\ 1\le i\le n} g(x)\,dx$.
Step 2. We next write $\Phi(X^{CS}_t)$, for $t>0$, in terms of coalescing Brownian motions running forward in time (in contrast to the spatial genealogies which run backward in time), which determine the evolution of boundaries between disjoint balls in $X^{CS}_t$. Let $\{y_i+B_i\}_{0\le i\le m+1}$ be independent Brownian motions starting from $\{y_i\}_{0\le i\le m+1}$, from which we construct a family of coalescing Brownian motions $\{y_i+\tilde B_i\}_{0\le i\le m+1}$. Namely, let $y_0+\tilde B_0 := y_0+B_0$ for all time, and let $y_1+\tilde B_1(s) := y_1+B_1(s)$ until the first time $y_1+B_1(s)$ hits $y_0+\tilde B_0$. From that time onward, define $y_1+\tilde B_1$ to coincide with $y_0+\tilde B_0$. In the same way, we successively define $\{y_i+\tilde B_i\}_{2\le i\le m+1}$ from $\{y_i+B_i\}_{2\le i\le m+1}$ by adding one path at a time. Without loss of generality, we may assume that $y_i+\tilde B_i$ is the a.s. unique path in the Brownian web $\mathcal{W}$ starting from $(y_i,0)$.
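The inductive construction of coalescing paths from independent ones can be sketched in discrete time. The code below is an illustration under stated assumptions, not the construction used in the proof: Brownian paths are replaced by Gaussian random walks with step variance `dt`, a "hit" is detected when a path reaches or crosses its lower neighbour between two grid times, and the function name `coalesce_paths` is hypothetical. Because the already constructed paths stay ordered, the first previously constructed path that a new path can hit is its immediate lower neighbour, so only that neighbour is checked.

```python
import random

def coalesce_paths(starts, steps, dt, seed=0):
    """Build coalescing paths from independent Gaussian random walks.

    Assumption: `starts` is strictly increasing. Path 0 is kept as is; path i
    follows its own independent walk until it first reaches or crosses the
    coalesced path i-1, and coincides with that path from then on.
    """
    assert all(a < b for a, b in zip(starts, starts[1:]))
    rng = random.Random(seed)
    sd = dt ** 0.5
    indep = []
    for y in starts:
        path = [y]
        for _ in range(steps):
            path.append(path[-1] + rng.gauss(0.0, sd))
        indep.append(path)

    coal = [indep[0][:]]
    for i in range(1, len(starts)):
        path = indep[i][:]
        merged = False
        for s in range(1, steps + 1):
            if not merged and path[s] <= coal[i - 1][s]:
                merged = True                 # discrete-time proxy for the hitting time
            if merged:
                path[s] = coal[i - 1][s]      # follow the lower neighbour afterwards
        coal.append(path)
    return coal

coal = coalesce_paths([-1.0, 0.0, 0.5, 2.0], steps=2000, dt=0.001)
```

By construction the resulting paths remain ordered and never separate after merging, which mirrors the two properties of the forward coalescing motions used in the remainder of Step 2.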
To write Φ(X CS t ) in terms of the forward coalescing Brownian motions B, we observe that since by our construction of X CS Therefore, for all 0 ≤ t < δ 4 , we can write (4.9) Since Step 3. We first identify the aging term L CS a in the generator, defined in (1.48). For each term in the first sum in (4.12), by Taylor expanding φ(d k×k t ) in t, it is easy to see that where we note that ∂φ ∂r ij (d k×k 0 , is less than δ. Summing the above limit over k ∈ {0, · · · , m} n and using the definition of G, we find (4.14) where r := (r CS 0 (x i , x j )) 1≤i,j≤n . This gives the aging term L CS a .
Step 4. We next identify the resampling term $L^{CS}_r$, defined in (1.49). For the second sum in (4.12), we need to compute For each $k\in\{0,\dots,m\}^n$, (4.16) We can rewrite the difference of the product of indicators as (4.17) We can expand the first product above and sort the result into three groups of terms, $(G_1)$, $(G_2)$ and $(G_3)$, depending on whether each term contains one, two, or more factors of the form $1_{[y_i,\,y_i+B_i(t)]}$ for some $0\le i\le m+1$. If $h(x)$ denotes a term in $(G_3)$, then necessarily $\int_{\mathbb{R}^n} g(x)h(x)\,dx\le C_g|B_{i_1}(t)B_{i_2}(t)B_{i_3}(t)|$ for some $C_g$ depending only on $g$ and some $i_1,i_2,i_3\in\{0,\dots,m+1\}$. Since $E|B_{i_1}(t)B_{i_2}(t)B_{i_3}(t)|\le ct^{3/2}$ Each term in $(G_2)$ is of the following form, where $1\le i\ne j\le n$, Denote the four terms in (4.18) respectively by h where we replaced $g(x)$ by $g(x_1,\dots,y_{k_i},\dots,y_{k_j},\dots,x_n)$, with an error of $o(1)$ as $t\downarrow 0$. Therefore (4.20) $\int_{y_{k_\tau}<x_\tau<y_{k_\tau+1},\ \tau\ne i,j} g(x_1,\dots,y_{k_i},\dots,y_{k_j},\dots,x_n)\prod_{1\le\tau\le n,\ \tau\ne i,j} dx_\tau$.
We obtain similar results for h ij in (4.15), we obtain an integral for $\phi(r)g(x)$, where $x_\tau$, $\tau\ne i,j$, are still integrated over $\mathbb{R}^n$, while the integration over $x_i$ and $x_j$ is replaced by summation over $\{y_\sigma\}_{1\le\sigma\le m}$. Contributions only come from $x_i=x_j$, and are positive when $k_i=k_j$, and negative when $k_i=k_j+1$ or $k_j=k_i+1$. Writing everything in terms of $X^{CS}_0$, we easily verify that the contribution of terms in $(G_2)$ to the limit in (4.15) is exactly where $\theta_{ij}\phi$ is defined as in (1.16), and $r := (r^{CS}_0(x_i,x_j))_{1\le i,j\le n}$. This gives the resampling term $L^{CS}_r$ defined in (1.49).
Step 5. Lastly we identify the diffusion (migration) term L CS d , defined in (1.47). We note that each term in group (G 1 ) is of the following form, where 1 ≤ j ≤ n, Denote the two terms respectively by h (1) j (x) and h (2) j (x). Then Therefore,

A similar result holds for $h^{(2)}_j$. Combining the two, we see that Therefore, the contribution from terms in $(G_1)$ to the limit in (4.15) gives precisely the migration term $L^{CS}_d$ defined in (1.47). This establishes the generator formula (4.1).

4.1.2. Proof of (4.2) in Proposition 4.1.
The complications in proving (4.2) arise from trying to prove uniform integrability for various quantities. We proceed in three steps. First we show that, for each $t>0$, By the Markov property of $(X^{CS}_t)_{t\ge 0}$, Since $X^{CS}_t\in\mathbb{U}^{\mathbb{R}}_r$ a.s. by Proposition 1.24, lim h↓0 t ) almost surely. Therefore, the first step is to interchange limit and expectation in (4.27) and to deduce (4.26). We need to show that (4.28) Once the uniform integrability has been verified, in Step 2 we prove that $E[L^{CS}\Phi(X^{CS}_t)]$ is continuous in $t$, and then in Step 3 we put things together and prove (4.2).
Step 1 (Uniform integrability). This step constitutes the bulk of the proof of (4.2). Let us fix X CS t and examine the error terms in our earlier calculation of the generator formula (4.1) with X CS Let E t [·] denote the conditional expectation E[·|X CS t ]. Following the arguments leading to (4.11), where {meet} h is the event that either y 0 + B 0 (s) hits level −L before time h, or y mt+1 + B mt+1 (s) hits level L before time h, or one of the pair (y i + B i (s), y i+1 + B i+1 (s)) coalesces before time h. We now estimate the two terms in (4.30).
(i) We start with the second term in (4.30). Based on the decomposition (4.12), for some function H(h, z 0 , · · · , z mt+1 ) which is continuously differentiable in h and three times continuously differentiable in z 0 , . . . , z mt+1 with uniformly bounded derivatives. Since the generator formula (4.1) is derived by Taylor expanding H(h, B(h)), it is not hard to see that uniformly in h ∈ (0, δ∧t 4 ), (4.31) for some C g,φ depending on g and φ, and m t +3 is the number of variables in H(h, z 0 , . . . , z mt+1 ). Therefore uniformly for h ∈ (0, δ∧t 4 ). From the definition of L CS Φ, we note that (4.33) By the definition of m t in (4.29) and by Lemma B.5, we have E[m t ] ≤ 4L π δ∧t 4 < ∞. This implies the uniform integrability of the second term in (4.30) for h ∈ (0, δ∧t 4 ). (ii) We now consider the first term in (4.30). For 0 ≤ i ≤ m t , let τ i,i+1 be the first hitting time between y i + B i (s) and y i+1 + B i+1 (s). Let τ 0 be the first hitting time of level −L by y 0 + B 0 (s), and τ mt+1 the first hitting time of level L by y mt+1 + B mt+1 (s). Then (4.34) Since y 0 < −2L and y mt+1 > 2L, the probability of the events {τ 0 ≤ h} and {τ mt+1 ≤ h} decay exponentially fast in h −1 and the events are independent of X CS t . Therefore the first term in (4.34) is uniformly bounded in h > 0.
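The moment bound on $m_t$ quoted from Lemma B.5 can be evaluated numerically. The sketch below assumes the reading $E[m_t]\le 4L/\sqrt{\pi(\delta\wedge t)/4}$ of the bound above, which matches the known density $1/\sqrt{\pi s}$ at time $s$ of coalescing Brownian motions started from every point of $\mathbb{R}$, applied on an interval of length $4L$ with $s=(\delta\wedge t)/4$. The function name `boundary_count_bound` is hypothetical.

```python
import math

def boundary_count_bound(L, delta, t):
    """Upper bound 4L / sqrt(pi * s) on E[m_t], with s = min(delta, t) / 4.

    Assumption: points are counted on an interval of length 4L, and the point
    density of the coalescing system after time s is 1 / sqrt(pi * s).
    """
    s = min(delta, t) / 4.0
    return 4.0 * L / math.sqrt(math.pi * s)

bound_late = boundary_count_bound(L=1.0, delta=1.0, t=1.0)
bound_early = boundary_count_bound(L=1.0, delta=1.0, t=0.01)
```

The bound is finite for every fixed $t>0$ but grows like $t^{-1/2}$ as $t\downarrow 0$, which is consistent with the uniform integrability being claimed only for $h\in(0,\frac{\delta\wedge t}{4})$ at fixed $t>0$.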
Bounding the second term in (4.34) is more delicate, because it remains of order 1 as h ↓ 0. We will need to use negative correlation inequalities for coalescing Brownian motions established in Appendix C.
Let us recall the definition of Φ(X CS t+h ) from (4.10), where we replace the pair {0, t} by {t, t + h}. By integrating over the population at time t + h, we note that both Φ(X CS t+h ) and Φ(X CS t+h ) can be written as integrals of g(x)φ(r(x)) integrated over x = (x 1 , · · · , x n ) ∈ (−L, L) n , except that: for a given x, the distance matrix r(x) may be different for Φ and Φ, and for Φ, the same point x may be integrated over several times with different distance matrix r(x) due to the fact that {y i + B i (h)} 0≤i≤mt+1 may not have the same order as {y i } 0≤i≤m+1 . However, for x not in {y j + B j (s)} for some 1 ≤ i ≤ n and 0 ≤ j ≤ m t + 1 , x is integrated over exactly once in Φ, and the associated distance matrix r(x) is the same for both Φ and Φ. Therefore, contributions from x / ∈ D cancel out in |Φ(X CS t+h ) − Φ(X CS t+h )|. Since g has compact support, the contribution from x ∈ D to |Φ(X CS t+h ) − Φ(X CS t+h )|, including multiple integrations over the same x by Φ, is at most where E B denotes expectation w.r.t. the Brownian motions B = (B 0 , . . . , B mt+1 ), and we applied Hölder inequality and the fact that h for some c > 0. It only remains to prove the uniform integrability of the r.h.s. of (4.36) w.r.t. the law of X CS t for 0 < h < δ∧t 4 . Note that the r.h.s. of (4.36) depends on y 0 and y mt+1 , which lie outside [−2L, 2L]. To control the dependence on y 0 and y mt+1 , we enlarge the interval and let {z 1 , · · · , z M +1 } := E δ∧t 4 t ∩ (−2L − 1, 2L + 1) as in (4.29), which contains {y 1 , · · · , y mt } as a subset. Denote we can then replace the r.h.s. of (4.36) by because C n,φ,g F h (X CS t ) dominates the r.h.s. of (4.36) except for possible missing terms (m t + 2), on the event y 0 ≤ −2L − 1, resp. y mt+1 ≥ 2L + 1. Since y 1 ≥ −2L and y mt ≤ 2L by definition, both missing terms are bounded by C 1 m t + C 2 uniformly for h > 0, and is thus uniformly integrable.
It only remains to prove the uniform integrability of F h (X CS t ) uniformly in 0 < h < δ∧t 4 . We achieve this by bounding its second moment. Note that by Cauchy-Schwarz, x 4 <x 5 |x 1 |,|x 2 |,|x 3 |,|x 4 |,|x 5 |<2L+1 where (4.40) K c 4,5 (x 1 , · · · , x 5 ) := lim is the density of finding a point at x i for each 1 ≤ i ≤ 5, with no point in (x 4 , x 5 ). By the definition of E l t in (1.39) and the duality between the forward and dual Brownian web (W, W), we see that ξ is the point process generated on R at time t by coalescing Brownian motions in the Brownian web W starting from every point in R at time t − δ∧t 4 . By the negative correlation inequality in Lemma C.3, we can bound where by Lemma B.5, By Lemma C.6, for x < y, K c (x, y) ≤ C δ,t (y − x) ∧ 1 for some C δ,t > 0 depending only on δ and t. Substituting the above bounds into (4.39), using the definition of ψ, and separating the integration into two regions depending on whether 0 < x 5 − x 4 < √ h or x 5 − x 4 ≥ √ h, it is easily seen that the integral in (4.39) is uniformly bounded for 0 < h < δ∧t 4 . This implies the uniform integrability of F h (X CS t ) for 0 < h < δ∧t 4 , and hence that of Step 2 (Continuity of E[L CS Φ(X CS t )]). Recall from (1.46)-(1.49) that L CS = L CS d + L CS a + L CS r . We will prove the continuity for each component of the generator. By our assumptions on g and φ, |L CS d Φ| ∞ , |L CS a Φ| ∞ ≤ C Φ < ∞. It was shown in (2.4) that for any t > 0, for Lebesgue a.e. x = (x 1 , · · · , x n ) ∈ R n , the distance matrix r CS s (x i , x j ) converges a.s. to r CS t (x i , x j ) as s → t, and this conclusion is also easily seen to hold for t = 0 by the assumption X CS 0 ∈ U R r . Therefore, almost surely w.r.t. (X CS s ) s≥0 , (4.41) lim are continuous in t ≥ 0 by the bounded convergence theorem.
We now turn to the continuity of E[L CS r Φ(X CS t )]. We first prove the continuity at t = 0 and later point out how it extends to t > 0. Let E 0 (s). By our regularity assumption on φ, only resampling at the boundaries of balls of radius δ or more has an effect on Φ(X CS s ). Therefore for s ∈ (0, δ/4), we have (4.42) where θ ij φ is defined as in (1.16), and r := (r CS s (x i , x j )) 1≤i,j≤n . By our assumptions on g and φ, the fact X CS 0 ∈ U R r , and our construction of X CS s in terms of the (dual) Brownian web, it is then easily seen by dominated convergence that To prove the continuity of E[L CS Φ(X CS t )] at t = 0, it only remains to verify the uniform integrability of L CS r Φ(X CS s ) for s close to 0, say s ∈ [0, δ/4]. We will achieve this by showing that L CS r Φ(X CS s ) has uniformly bounded second moments. Note that because g is assumed to be supported on [−L, L] n , we have By Lemma C.2, E δ/4 0 (s) is negatively correlated, and hence by Lemma C.5, we have Thus it suffices to bound E E Note that the first term in (4.46) is finite and independent of s ≥ 0. We now treat the second term. For each i ≥ 2L, let us denote ξ [i,i+1]×{0} s ∩ (−L, L) by ξ i s,L , which is also a point process on (−L, L) satisfying negative correlation. In particular, where we used Lemma C.4 and the observation that, ξ i s,L = ∅ implies that the Brownian motion starting at (i, 0) in the Brownian web must be to the left of L at time s. Since L is fixed and large, 1 √ 2πs ∞ i−L e − x 2 2s dx ≤ α for some α ∈ (0, 1) uniformly in i ≥ 2L and s ∈ [0, δ/4], and hence It is then clear that i≥2L E[|ξ i s,L |] tends to 0 as s ↓ 0 and is uniformly bounded for s ∈ [0, δ/4], which concludes the proof of the uniform integrability of L r Φ(X CS s ) for s (a) We will take the limit l ↓ 0 in the integral equation (4.2). In Step 1, we will show that (4.2) also holds for Φ. In Step 2, we will prove the continuity of E[L CS Φ(X CS t )] in t > 0. 
Note that without assuming Φ ∈ Π 1,2 r or X CS 0 ∈ U R rr , L CS Φ(X CS 0 ) may not be well-defined.
Step 1. First note that |Φ l − Φ| ∞ → 0 as l ↓ 0 by our assumption on g and φ and the fact that sup x≥0 |ρ l (x) − x| → 0 as l ↓ 0. Therefore, are all uniformly bounded by some C Φ < ∞ independent of l. Also, by our assumptions on g, φ and ρ l , for each s > 0 and a.s. every realization of X CS s , we have Therefore, by the bounded convergence theorem, For the resampling generator L CS r , note that for any s > 0 and a.s. every realization of X CS s , (4.52) where E s is defined as in (4.4), which is the subset of R consisting of points of multiplicity in the dual Brownian web $\hat{\mathcal{W}}$, d x = r CS s (x + , x − ) is the distance between the two points in X CS s with the same spatial location as x ∈ E s , M > 0 is chosen such that supp(g) ⊂ (−M, M ) n , and we used the assumption that φ has bounded derivative. Such a bound holds for φ l uniformly in l > 0. If x∈(−M,M )∩Es d x ∧ 1 < ∞, then by dominated convergence, we have To prove the analogue of (4.51) for L CS r Φ, by dominated convergence, it only remains to show where ξ A s denotes the point set on R generated at time s by the collection of paths in the Brownian web W starting from the space time set A ⊂ R 2 , and we used E[ |ξ πu by Lemma B.5. Inequality (4.54) then follows. To summarize, we have thus shown that (4.2) is also valid for a general polynomial Φ ∈ Π 1,2 .
Step 2 To prove the continuity of E[L CS Φ(X CS t )] in t > 0, we again decompose L CS into its three summands. First note that the continuity of To prove the continuity of E[L CS r Φ(X CS t )] in t > 0, we fix t > 0 and a truncation parameter ∈ (0, t). For each s ∈ (t − , t + ), we decompose L CS r Φ(X CS s ) into L < +s−t r Φ(X CS s ) and L ≥ +s−t r Φ(X CS s ), where both L < +s−t r Φ(X CS s ) and L ≥ +s−t r Φ(X CS s ) are defined as in (4.42), except that resampling therein is carried out by summing over (4.56) where E s and E δ s are defined as in (4.4). The same argument as in the proof of the continuity of E[L CS r Φ(X CS t )] in Prop. 4.1 shows that On the other hand, for s ∈ [t − 2 , t + 2 ], by the same calculations as those leading to (4.55), we have Since > 0 can be chosen arbitrarily small, (4.57) and (4.58) together imply that when t > 0.
(b) We now verify the continuity of E[L CS r Φ(X CS t )] (and hence of E[L CS Φ(X CS t )]) at t = 0 under the additional assumption that X CS 0 ∈ U R rr . Together with (4.2), this also implies that the generator equation (4.1) holds for a general polynomial Φ ∈ Π 1,2 .
As before, we separate $L^{CS}_r\Phi(X^{CS}_s)$ into $L^{\ge\varepsilon+s}_r\Phi(X^{CS}_s)$ and $L^{<\varepsilon+s}_r\Phi(X^{CS}_s)$, where the truncation parameter $\varepsilon>0$ is fixed and small. Equation (4.57) also holds with $t=0$ by the same argument as that for the continuity of $E[L^{CS}_r\Phi(X^{CS}_t)]$ at $t=0$, when $\Phi\in\Pi^{1,2}_r$. Indeed, in both cases, only resampling between individuals with sufficiently large genealogical distance contributes to the generator action on $\Phi$.
It remains to show that (4.59) sup For $0<s<\varepsilon/2$, we can separate the resampling terms according to whether the genealogies of the two resampled individuals merge above or below time 0, and write It remains to bound the expectation on the r.h.s. above, which originates from resampling between individuals whose genealogical distance depends on the distance of their ancestors in $X^{CS}$ We can now separate the contribution to the second term in (4.61) into two groups.
The first group consists of contributions from pairs $(z_i,z_{i+1})$ with $-2M\le z_i<z_{i+1}\le 2M$. The total contribution from these terms is uniformly dominated by $\sum_{x\in[-2M,2M]\cap E_0,\ d_x\le 2\varepsilon} d_x$, which tends to 0 as $\varepsilon\downarrow 0$ by the assumption $X^{CS}_0\in\mathbb{U}^{\mathbb{R}}_{rr}$. The second group consists of contributions from pairs $(z_i,z_{i+1})$ with either $z_i\le -2M$ or $z_{i+1}\ge 2M$. We bound these terms by $2\varepsilon$ times the expected cardinality of such pairs. By the duality between $\mathcal{W}$ and $\hat{\mathcal{W}}$, the cardinality of such pairs of $(z_i,z_{i+1})$ is bounded by the cardinality of ξ

Proof of Theorem 1.5. As noted in Remark 1.4, we can identify $\mathbb{M}^V$ as a subspace of $(\mathbb{M}^V_f)^{\mathbb{N}}$, endowed with the product $V$-marked Gromov-weak topology. Furthermore, under this identification, $\mathbb{M}^V$ is a closed subspace of $(\mathbb{M}^V_f)^{\mathbb{N}}$. It was shown in [DGP11, Theorem 2] that $\mathbb{M}^V_1$, the space of $V$-mmm spaces with probability measures, equipped with the $V$-marked Gromov-weak topology, is a Polish space. The same conclusion is easily seen to hold for $\mathbb{M}^V_f$. Therefore $(\mathbb{M}^V_f)^{\mathbb{N}}$ is also Polish, which implies that any closed subspace, including $\mathbb{M}^V$, is also Polish.
Proof of Theorem 1.8. For each k ∈ N ∪ {0, ∞}, let where C k (R ( n 2 ) + ×V n , R) is the space of bounded continuous real-valued functions on R ( n 2 ) + × V n that are k times continuously differentiable in the first n 2 coordinates, and Φ n,φ ((X, r, µ)) := · · · φ(r, v)µ ⊗n (d(x, v)) for each (X, r, µ) ∈ M V 1 , with r := (r(x i , x j )) 1≤i<j≤n and v := (v 1 , . . . , v n ). We recall from [DGP11, Theorem 5] that Π k := ∪ n Π k n is convergence determining in M V 1 and M 1 (M V 1 ), and hence also in M V f and M 1 (M V f ). We now want to argue that this holds as well for n Π k ⊆ n Π k . This follows immediately from [EK86,Prop. 3.4.6] for measures on product spaces. (iii) For each > 0, there exists M ∈ N such that uniformly in (X, r, µ) ∈ Γ, we can find M balls of radius in X, say B ε,1 , . . . , By the tightness criteria formulated in [DGP11,Thm. 4] for M V f -valued random variables, we obtain the following tightness criteria for M V -valued random variables.
} i∈I is a tight family of random variables taking values in the space of finite measures on V (equipped with the weak topology); (ii) {X i , r i , µ i (dx × B k (o))} i∈I is a tight family of random variables taking values in the space of metric measure spaces (equipped with the Gromov-weak topology).
Using the characterization of relatively compact sets in M V given in Theorem A.1, one can also formulate more concrete conditions for the tightness of a family of M V -valued random variables, using concrete conditions for the tightness of a family of random metric measure spaces formulated in [GPW09,Thm. 3].
Pasting Let (K t ) t≥0 be the marked metric measure space generated by the entrance law of the spatial coalescent on V starting with countably many individuals per site, each forming their own partition element and run for time t.
We claim the dual representation theorem (strong duality): where denotes the subordination of the second marked ultrametric measure space to the first w.r.t. the location of partition elements at time t which on the r.h.s. are realized independently.
Here pasting of Y to X means with respect to a function χ : Y → V without reference to graphical devices that we have the following relation.
Then we should have the following relation: Let X = (x, r, µ) and Y = (Y, r , µ ) both in U V . Then we require for the new element of U V for all ϕ, g that: If an ultrametric measure space exists, it is automatically uniquely determined. We can construct a solution by taking a sample sequence and pasting by coagulating roots and leaves for the same sites, if they have the same index as we explained earlier in Remark 1.20.

Appendix B. The Brownian web
In this section, we recall the construction and basic properties of the Brownian web. Recall from Subsection 1.4 the random variable $(\mathcal W, \widehat{\mathcal W})$ as constructed in [FINR04, FINR06], and in particular from (1.28)-(1.30), the state spaces $\Pi$ of $\mathcal W$ and $\widehat\Pi$ of $\widehat{\mathcal W}$. It has been shown in [FINR04, Theorem 2.1] that the Brownian web $\mathcal W$ can be characterized as follows.

Theorem B.1 (Characterization of the Brownian web). The Brownian web $\mathcal W$ is a random closed subset of $\Pi$, whose law is uniquely determined by the following properties: (i) For every $z \in \R^2$, almost surely $\mathcal W(z)$ contains a unique path.
The following result shows that every path in $\mathcal W$ can be approximated by the countable set of paths $\{\mathcal W(z);\ z \in D\}$ in a very strong sense (see e.g. [SS08, Lemma 3.4]). Then there exists an almost surely uniquely determined $\widehat\Pi$-valued random variable $\widehat{\mathcal W}$ defined on the same probability space as $\mathcal W$, called the dual Brownian web, such that: (i) Almost surely, paths in $\mathcal W$ and $\widehat{\mathcal W}$ do not cross, i.e., there exist no $f \in \mathcal W$ and $\hat f \in \widehat{\mathcal W}$ and $s \neq t$ such that $(f(s) - \hat f(s))(f(t) - \hat f(t)) < 0$; (ii) $R\widehat{\mathcal W}$ has the same law as $\mathcal W$, where $R$ denotes the reflection map that maps each $\hat f \in \widehat{\mathcal W}$ to an $f \in \Pi$ such that the graph of $f$ in $\R^2$ is the reflection of the graph of $\hat f$ with respect to the origin.
Below we collect some basic properties of the Brownian web which we will use. For further details, see [FINR04, FINR06, SS08]. The first property concerns the configuration of paths in $\mathcal W$ and $\widehat{\mathcal W}$ entering and leaving a point $z \in \R^2$. For each $z = (x, t) \in \R^2$, let $m_{\rm out}(z)$ denote the cardinality of $\mathcal W(z)$. We will let $m_{\rm in}(z)$ denote the number of equivalence classes of paths in $\mathcal W$ entering $z$, where a path $f \in \mathcal W$ is said to enter $z$ if it starts before time $t$ and $f(t) = x$, while two paths $f, g \in \mathcal W$ entering $z$ are called equivalent if they coalesce before time $t$. Note that $m_{\rm in}(z) = 2$ if and only if $z$ is a point of coalescence between two paths in $\mathcal W$. Similarly, we can define $\hat m_{\rm in}(z)$ and $\hat m_{\rm out}(z)$, based on the configuration of paths in $\widehat{\mathcal W}$. The pair $(m_{\rm in}, m_{\rm out})$ is called the type of $z$ in $\mathcal W$.
(2) For each $t \in \R$, the set of $z \in \R \times \{t\}$ with $\hat m_{\rm out}(z) \ge 2$ is a countable set, with $m_{\rm in}(z) \ge 1$ for each such $z$, i.e., $z$ lies on the graph of some path in $\mathcal W$ starting before time $t$.
Next we cite a result on the decay of the density of coalescing paths started at time 0.
Lemma B.5 (Density for the Brownian web). For $t > 0$, let $\xi^{\R\times\{0\}}_t := \{f(t) : f \in \cup_{x\in\R} \mathcal W(x, 0)\}$ denote the point set on $\R$ generated at time $t$ by the collection of coalescing paths in the Brownian web $\mathcal W$ started at time 0. Then for any $a < b$,
$$\mathbb E\big[\,|\xi^{\R\times\{0\}}_t \cap [a,b]|\,\big] = \frac{b-a}{\sqrt{\pi t}}.$$
This result can be easily derived by using the duality between $\mathcal W$ and $\widehat{\mathcal W}$, namely that $\xi^{\R\times\{0\}}_t \cap (x, x+\varepsilon) \neq \emptyset$ if and only if the two paths in $\widehat{\mathcal W}$ starting from $(x, t)$ and $(x+\varepsilon, t)$ do not collide on the time interval $[0, t]$. See e.g. [SS08, Prop. 1.12], where such a density calculation is carried out for a generalization of the Brownian web known as the Brownian net, which in addition allows branching of paths.
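The duality argument for the density can be spelled out as follows. This is only a sketch, assuming (as is standard for the Brownian web) that the two dual paths evolve, backward in time, as independent standard Brownian motions until they meet. The symbols $D$, $\tau$, $\varepsilon$ and $\rho_t$ are introduced here for illustration and do not appear in the text above.

```latex
% D_s: distance between the dual paths started from (x,t) and (x+\varepsilon,t)
% after running backward for time s; it is a Brownian motion with variance 2s,
% started at \varepsilon and stopped when it hits 0.
\begin{align*}
\mathbb P\big(\xi^{\R\times\{0\}}_t \cap (x,x+\varepsilon) \neq \emptyset\big)
  &= \mathbb P(\tau > t), \qquad \tau := \inf\{s \ge 0 : D_s = 0\}, \quad D_0 = \varepsilon, \\
  &= \mathbb P\big(|N(0,2t)| < \varepsilon\big)
     && \text{(reflection principle)} \\
  &= \frac{2}{\sqrt{4\pi t}} \int_0^{\varepsilon} e^{-y^2/(4t)}\, dy
   = \frac{\varepsilon}{\sqrt{\pi t}} + O(\varepsilon^3), \\
\rho_t := \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\,
  \mathbb P\big(\xi^{\R\times\{0\}}_t \cap (x,x+\varepsilon) \neq \emptyset\big)
  &= \frac{1}{\sqrt{\pi t}} .
\end{align*}
```

Since the point set is almost surely locally finite and translation invariant in law, integrating the constant density $\rho_t$ over $[a,b]$ gives an expected number of points equal to $(b-a)/\sqrt{\pi t}$.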

Appendix C. Correlation Inequalities for Coalescing Brownian motions
In this section, we prove some negative correlation inequalities for a collection of coalescing Brownian motions, which are used in Section 4. One such inequality has been observed in [NRS05,Remark 7.5]. Here we deduce more general negative correlation inequalities from Reimer's inequality applied to coalescing random walks.
In van den Berg and Kesten [vdBK02], Reimer's inequality was applied to continuous time coalescing random walks with a generalized coalescing rule. Since we are interested in coalescing Brownian motions, discrete space-time coalescing random walks with instantaneous coalescing already provide an adequate approximation, and Reimer's inequality can be applied without any complication to the latter.
First we recall Reimer's inequality [Rei00]. For each $i \in I := \{1, \dots, n\}$, let $S_i$ be a finite set with a probability measure $\mu_i$ on $S_i$. Let $\Omega = S_1 \times S_2 \times \cdots \times S_n$ and $\mu = \mu_1 \times \cdots \times \mu_n$. For $K \subset I$ and $\omega = (\omega_i)_{i\in I}$, define the cylinder set $C(K, \omega) := \{\omega' \in \Omega : \omega'_i = \omega_i \ \forall\, i \in K\}$. Given two events $A, B \subset \Omega$, we say $A$ and $B$ occur disjointly for a configuration $\omega \in \Omega$ if there exists $K \subset I$ such that $C(K, \omega) \subset A$ and $C(I\setminus K, \omega) \subset B$. The set of $\omega \in \Omega$ for which $A$ and $B$ occur disjointly, which we call the disjoint intersection of $A$ and $B$, is denoted by $A \,\square\, B$. Then Reimer's inequality asserts that, for any two events $A, B \subset \Omega$,
(C.1) $\mu(A \,\square\, B) \le \mu(A)\,\mu(B)$.

Now we apply this inequality to coalescing random walks. We recall first the construction of discrete space-time coalescing random walks. Let $\Z^2_{\rm even} = \{(x, t) \in \Z^2 : x + t \text{ is even}\}$. Let $\{\omega_z\}_{z\in\Z^2_{\rm even}}$ be i.i.d. random variables taking values in $\{\pm 1\}$. A directed edge is drawn from each $z = (x, t) \in \Z^2_{\rm even}$, which ends at $(x+1, t+1)$ if $\omega_z = 1$, and ends at $(x-1, t+1)$ if $\omega_z = -1$. This provides a graphical construction of a collection of coalescing random walks, where the random walk path starting from each $z \in \Z^2_{\rm even}$ is constructed by following the directed edges in $\Z^2_{\rm even}$ drawn according to $\omega$.

To see how Reimer's inequality is applied, let $(X^i)_{1\le i\le n}$ be a collection of coalescing random walks constructed as above with starting points $(x_i, t_i) \in \Z^2_{\rm even}$, and assume for simplicity The crucial observation is that, if the events $(A_i)_{1\le i\le n}$ occur simultaneously, then they must occur disjointly w.r.t. $(\omega_z)_{z\in\Z^2_{\rm even}}$ because of the coalescence. Namely,
(C.2) $\bigcap_{i=1}^n A_i \subset A_1 \,\square\, A_2 \,\square \cdots \square\, A_n$.
Reimer's inequality (C.1) then gives the negative correlation inequality
(C.3) $\mathbb P\big(\bigcap_{i=1}^n A_i\big) \le \prod_{i=1}^n \mathbb P(A_i)$.
The same reasoning allows us to choose each $A_i$ to be an increasing event of the occupation configuration $\xi_t \cap O_i$, i.e., given $\omega$ and $\omega'$ with respective occupation configurations $\xi_t \cap O_i \subset \xi'_t \cap O_i$, if $\omega \in A_i$, then also $\omega' \in A_i$.
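The graphical construction above can be sketched in a few lines of Python. This is a minimal illustration and not code from the paper: the function names `iid_arrows` and `coalescing_walks`, the seed, and the starting points are all hypothetical choices. Arrows $\omega_z$ are sampled lazily and cached, so that every walk passing through the same space-time point follows the same arrow, which is exactly what forces coalescence.

```python
import random

def iid_arrows(seed):
    """Return omega(x, t) in {-1, +1}, i.i.d. fair signs, sampled lazily and cached."""
    rng = random.Random(seed)
    cache = {}
    def omega(x, t):
        if (x, t) not in cache:
            cache[(x, t)] = rng.choice((-1, 1))
        return cache[(x, t)]
    return omega

def coalescing_walks(starts, t_end, omega):
    """Follow the directed edges from each start (x, t) in Z^2_even up to time t_end.

    Returns one list of positions per start, covering times t, t+1, ..., t_end.
    """
    paths = []
    for x, t in starts:
        assert (x + t) % 2 == 0, "starting point must lie in Z^2_even"
        path = [x]
        while t < t_end:
            x += omega(x, t)   # the arrow at (x, t) decides the next step
            t += 1
            path.append(x)
        paths.append(path)
    return paths

# Three walks started at time 0 (hypothetical starting points).
omega = iid_arrows(seed=0)
paths = coalescing_walks([(-4, 0), (0, 0), (2, 0)], t_end=50, omega=omega)
for path in paths:
    print(path[:10])
```

Because all walks read the same arrow field, two walks that meet at a space-time point agree from then on, and walks started at the same time never cross.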
Using Reimer's inequality as illustrated above, together with the invariance principle for coalescing random walks, we will deduce a host of negative correlation inequalities for coalescing Brownian motions, which we formulate next.
Definition C.1 (Negatively correlated point processes). We say a simple point process ξ on R is negatively correlated, if for any n ∈ N and any disjoint open intervals O 1 , · · · , O n , Proof. By monotone convergence, it suffices to consider the case when A consists of a finite number of points {z 1 , · · · , z k }. The fact that ξ A t is negatively correlated then follows directly from the negative correlation inequality (C.3) for coalescing random walks, and the distributional convergence of coalescing random walks to coalescing Brownian motions in the local uniform topology (see e.g. [NRS05, Section 5]).

Lemma C.3 (Decoupling of correlation functions).
Let ξ A t be as in Lemma C.2. Let a 1 < b 1 ≤ a 2 < b 2 · · · ≤ a n < b n . Then for any 1 ≤ j ≤ n − 1, Proof. As before, this follows from approximation by discrete space-time coalescing random walks and Reimer's inequality. Note that for the discrete analogue of the events in the second line of (C.4), if they all occur, then they must occur disjointly.
Lemma C.4 (Negative correlation for occupation number). Let $\xi^A_t$ be as in Lemma C.2, and let $B \subset \R$ have finite Lebesgue measure. Then for any $k \in \N$,
(C.5) $\mathbb P(|\xi^A_t \cap B| \ge k) \le \mathbb P(|\xi^A_t \cap B| \ge 1)^k$.

Proof. We proceed by discrete approximation. Let $(Z_t)_{t\in\N}$ be the subset of $\Z$ occupied at time $t \in \N$ by a collection of coalescing random walks on $\Z^2_{\rm even}$ starting from $z_1 = (x_1, t_1), \dots, z_n = (x_n, t_n) \in \Z^2_{\rm even}$ with $t_i \le 0$ for all $1 \le i \le n$. Given $O \subset \Z$ and for $k \in \N$, let $A_k = \{\omega \in \Omega : |Z_t \cap O| \ge k\}$, where $\omega = (\omega_z)_{z\in\Z^2_{\rm even}}$ are the i.i.d. $\{\pm 1\}$-valued random variables underlying the graphical construction of the coalescing random walks. We note that $A_k \subset A_1 \,\square\, A_1 \,\square \cdots \square\, A_1$ ($k$ copies of $A_1$). Indeed, if $A_k$ occurs, then we can find $k$ disjoint random walk paths, each of which occupies a distinct site in $O$ at time $t$. Reimer's inequality (C.1) then implies $\mathbb P(A_k) \le \mathbb P(A_1)^k$. Inequality (C.5) then follows by the distributional convergence of coalescing random walks to coalescing Brownian motions in the local uniform topology.
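The discrete inequality $\mathbb P(A_k) \le \mathbb P(A_1)^k$ can be checked exactly on a small system by enumerating all arrow configurations. The following is a sketch under assumptions of our own choosing: three walks started at time 0 from the sites 0, 2, 4, run for three steps, with target set $O = \{1, 3, 5\}$. The function names are hypothetical, and exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction
from itertools import product

def occupied_sites(starts, t_end, omega):
    """Sites occupied at time t_end by the coalescing walks started at time 0 from `starts`."""
    occupied = set()
    for x in starts:
        for t in range(t_end):
            x += omega[(x, t)]   # follow the directed edge out of (x, t)
        occupied.add(x)          # coalesced walks contribute a single site
    return occupied

def tail_probabilities(starts, t_end, O, k_max):
    """Return [P(|Z_t & O| >= k) for k = 0..k_max] by enumerating every arrow configuration."""
    # All space-time points that some walk could visit before time t_end.
    sites = sorted({(x0 + d, t) for x0 in starts
                    for t in range(t_end) for d in range(-t, t + 1, 2)})
    counts = [0] * (k_max + 1)
    total = 0
    for signs in product((-1, 1), repeat=len(sites)):
        omega = dict(zip(sites, signs))
        m = len(occupied_sites(starts, t_end, omega) & O)
        for k in range(k_max + 1):
            if m >= k:
                counts[k] += 1
        total += 1
    return [Fraction(c, total) for c in counts]

p = tail_probabilities(starts=[0, 2, 4], t_end=3, O={1, 3, 5}, k_max=3)
for k in (2, 3):
    print(k, p[k], p[1] ** k, p[k] <= p[1] ** k)
```

The enumeration runs over $2^{12}$ configurations here, so it is only meant as a sanity check of the inequality on a toy example, not as a substitute for the proof.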
We also need the following estimate on the constrained two-point correlation function for the Brownian web.
We then have (C.16)  In this section we present the proof of the results on the evolving genealogies for the interacting Fleming-Viot diffusions. This model is a special case of evolving genealogies for the interacting Λ-Fleming-Viot diffusions which are studied in [GKW].
Proof of Theorem 1.11. We will proceed in five steps: (1) We show the result on the martingale problem and the duality for finite geographic spaces V . (2) To prepare for the general case where V is countable, we define an approximation procedure with specific finite geographic space dynamics. (3) We then show the convergence in path space, as the finite spaces approach V . (4) We verify the claimed properties of the solution for general V by a direct argument based on the duality and an explicit look-down construction. (5) Finally, we show that the process admits a mark function.
We will use several known facts on measure-valued Fleming-Viot diffusions. For that we refer to [Daw93,Chapter 5] in the non-spatial case and to [DGV95] for the spatial case.
Step 1 (V finite) The case where V is finite is very similar to the non-spatial case. We therefore just have to modify the arguments of the proof of Theorem 1 in [GPW13] or Theorem 1 in [DGP12].
As usual we will conclude uniqueness of the solution of the martingale problem from a duality relation. Note that in contrast to the (non-spatial) interacting Fleming-Viot model with mutation considered in [DGP12], in our spatial interacting Fleming-Viot model, resampling takes place only locally, that is, for individuals at the same site. The tree-valued dual will therefore be based on the spatial coalescent considered in [GLW05]. As for existence of a solution of the martingale problem, we consider the martingale problems for the evolving genealogies of the approximating spatial Moran models. By consistency of the spatial coalescent, we get the uniform convergence of generators for free. Thus we only have to show the compact containment condition. Here we can rely on the general criterion for population dynamics given in Proposition 2.22 in [GPW13]. As V is finite, all arguments given in [GPW13] to verify this criterion simply go through here as well.
Step 2 (A coupled family of approximating finite systems) Let now V be countable, and consider a sequence (V n ) n∈N of finite sets with V n ⊆ V , and V n ↑ V . Put for each n ∈ N, and for all v 1 , v 2 ∈ V , (D.1) a n v 1 , v 2 := Denote then by X FV,Vn a solution of the martingale problem associated with the operator (restricted to V n ) (D.2) 1≤k< ≤n ∂ ∂r k, φ(r) 1≤k< ≤n 1 {v k =v } θ k, φ − φ (r), and K Vn the spatial coalescent on V n with migration rateā n (·, ·) rather thanā(·, ·).
Notice that a n is not necessarily doubly stochastic anymore, which turns the duality with the spatial coalescent into a Feynman-Kac duality where the Feynman-Kac term converges to 1, as n → ∞, on every finite time horizon. The Feynman-Kac duality reads as follows (compare, for example, [ This can be immediately verified by explicit calculation (compare [GPW13, Section 4] for the generator calculation for the resampling part, and [Sei14, Proposition 3.11] for the generator calculation for Markov chains, here migration, whose transition matrix is not doubly stochastic). Since, for given n ∈ N, our dynamics consists of independent components outside V n , we can apply Step 1, here with the Feynman-Kac duality, to conclude the well-posedness of the martingale problem with respect to L FV,Vn .
Step 3 (V countable) Fix X 0 ∈ U V 1 . In this step we want to show that the family {X FV,Vn ; n ∈ N} is tight, and that every limit satisfies the (L FV , Π 1,2 , δ X 0 )-martingale problem.
Observe first that the family of laws of the projections of the measures on the mark space is tight, since the localized state w.r.t. a fixed finite subset $A$ of $V$ has only finitely many marks and weight $|A|$ (uniformly in $n$). We will therefore ignore the marks here and show tightness in the Gromov-weak$^\#$ topology. For that we want to apply [EK86, Corollary 4.5.2]. Since $a_n(v_1, v_2) \to a(v_1, v_2)$ for all $v_1, v_2 \in V$, we clearly have that $L^{{\rm FV},V_n}\Phi$ converges to $L^{\rm FV}\Phi$ as $V_n \uparrow V$, i.e.,
$$\sup_{\mathcal X \in U^V} \big|L^{{\rm FV},V_n}\Phi(\mathcal X) - L^{\rm FV}\Phi(\mathcal X)\big| \longrightarrow 0 \quad \text{as } n \to \infty,$$
for $\Phi$ depending only on finitely many sites. It remains to verify the compact containment condition, i.e., to show that for every $T > 0$ and $\varepsilon > 0$ we can find a compact set $K_{T,\varepsilon} \subseteq U^V$ such that for all $n \in \N$,
(D.6) $\mathbb P\big(X^{{\rm FV},V_n}_t \in K_{T,\varepsilon} \text{ for all } t \in [0, T]\big) \ge 1 - \varepsilon$.
For that purpose we will once more rely on the criterion for the compact containment condition which was developed in [GPW13, Proposition 2.22] for population dynamics. To see first that the criterion applies, notice that the evolving genealogies of interacting Fleming-Viot diffusions can be read off as a functional of the look-down construction given in [GLW05]. Thus the countable representation of the look-down defines a population dynamics. In particular, for each t ≥ 0, we can read off a representative (X t , r t , µ t ) of X FV t in the look-down graph such that the ancestor-descendant relationship is well-defined. Denote for all t ≥ 0, s ∈ [0, t] and x ∈ X t by A t (x, s) ∈ X t−s the ancestor of x ∈ X t back at time s, and for J ⊆ X s by D t (J , s) ⊆ X t the set of descendants of a point in J at time t.
In the following we refer for each finite A ⊆ V to (D.7) X FV,·,A as the restriction of X FV,· to marks in A, i.e., obtained by considering the sampling measure µ A (dxdv) := 1 A µ(dxdv). Fix T > 0. We then have to show that the following are true for all A ⊆ V :
• Tightness of number of ancestors. For all t ∈ [0, T ] and ε ∈ (0, t), the family {S Vn 2ε (X t , r t , µ t ); n ∈ N} is tight, where S Vn 2ε (X t , r t , µ t ) denotes the minimal number of balls of radius 2ε needed to cover X t up to a set of µ t -measure ε.
• Bad sets can be controlled. For all $\varepsilon \in (0, T)$, there exists a $\delta = \delta(\varepsilon) > 0$ such that for all $s \in [0, T)$, $n \in \N$ and $\sigma(X^{{\rm FV},V_n}_u;\ u \in [0, s])$-measurable random subsets $J^{V_n} \subseteq X_s \times A$ with $\mu_s(J^{V_n}) \le \delta$,

(i) Fix $t \in [0, T]$. W.l.o.g. we assume that $V_n$ is a subgroup of $V$ with addition $+_n$ for each $n \in \N$. We consider for each $n \in \N$ another spatial coalescent $\widetilde K^{V_n}$ on $V_n$ with migration kernel $\bar{\tilde a}_n(\cdot,\cdot)$ rather than $\bar a_n(\cdot,\cdot)$, where
(D.9) $\tilde a_n(v, v') := \sum_{y \in V;\ y \sim_n v'} a(v, y)$,
and $\sim_n$ denotes equivalence modulo $+_n$. Further, denote by $\widetilde X^{{\rm FV},V_n}$ the evolving genealogies of the interacting Fleming-Viot diffusions whose migration kernel is $\tilde a_n(\cdot,\cdot)$ rather than $a_n(\cdot,\cdot)$. We prefer to work with $\widetilde K^{V_n}$. For this spatial coalescent it was verified in the proof of Proposition 3.4 in [GLW05] that for any time $t > 0$ the total number of partition elements of $\widetilde K^{V_n}_t$ which are located in $A$ is stochastically bounded uniformly in $n \in \N$. As the kernel $\tilde a_n(\cdot,\cdot)$ is doubly stochastic, $\widetilde X^{{\rm FV},V_n}$ and $\widetilde K^{V_n}$ are dual (without a Feynman-Kac potential), and as $d_{{\rm GP}^\#}(\widetilde X^{{\rm FV},V_n}, X^{{\rm FV},V_n}) \to 0$ as $n \to \infty$, the claim follows.
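The reason for passing to the periodised kernel in (D.9) is that wrapping preserves double stochasticity. A minimal sketch, under assumptions that are ours and not the paper's: $V = \Z$, $V_n = \Z/n\Z$, and a translation-invariant kernel $a(v, y) = p(y - v)$ with a hypothetical, deliberately asymmetric step law $p$. The function name `wrapped_kernel` is likewise illustrative.

```python
from fractions import Fraction

def wrapped_kernel(step_law, n):
    """Periodise a(v, y) = step_law[y - v] on Z to Z/nZ.

    a_tilde[v][w] is the sum of a(v, y) over all y congruent to w modulo n.
    """
    a_tilde = [[Fraction(0)] * n for _ in range(n)]
    for v in range(n):
        for step, prob in step_law.items():
            a_tilde[v][(v + step) % n] += prob
    return a_tilde

# Hypothetical step distribution (an assumption for illustration only).
step_law = {-1: Fraction(1, 2), 0: Fraction(1, 5), 2: Fraction(3, 10)}
n = 5
a_tilde = wrapped_kernel(step_law, n)

row_sums = [sum(row) for row in a_tilde]
col_sums = [sum(a_tilde[v][w] for v in range(n)) for w in range(n)]
print(row_sums)   # each row sums to 1: a_tilde is a transition kernel
print(col_sums)   # each column sums to 1 as well: a_tilde is doubly stochastic
```

In this sketch the column sums equal 1 because every site receives the same total step mass by translation invariance, which is the property that lets the duality hold without a Feynman-Kac potential.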
Notice that {µ t (D Vn t (J Vn , s)×A); t ≥ s} is a semi-martingale and given by a martingale with continuous paths due to resampling plus a deterministic flow in and out of the set A due to migration. Therefore we have to control the fluctuation of the martingale part and the maximal flow out of the set A over a time interval of length t − s.
The martingale part is estimated from below with Doob's maximal inequality (the quadratic variation is bounded uniformly in the state and in n by a constant · |A|; details are left to the reader). The deterministic outflow occurs at most at a finite rate c · |A| since the total mass of every site is one. This estimate is uniform in the parameter n (recall the random walk kernel is perturbed by restricting it to V n ). Similarly the flow into A can be bounded by c · |A| independently of n but the flow out of the set A occurs with a maximal rate c independently of n with n ≥ n 0 (A), and (D.10)

Step 4 (Feller property). We next prove that $X^{\rm FV}$ has the Feller property. From here it is standard to conclude that $X^{\rm FV}$ satisfies the strong Markov property (see, for example, [EK86, Theorem 4.2.7]). Consider a sequence $(X^{(n)}_0)_{n\in\N}$ in $U^V_1$ such that $X^{(n)}_0 \to X_0$ as $n \to \infty$, Gromov-weak$^\#$ly, for some $X_0 \in U^V_1$. Denote by $X^{{\rm FV},X^{(n)}_0}$ and $X^{{\rm FV},X_0}$ the evolving genealogies of the interacting Fleming-Viot diffusions started in $X^{(n)}_0$ and $X_0$, respectively, and let $K$ be our tree-valued dual spatial coalescent, and $H$ as in (1.18). Then for each given $t \ge 0$, a.s.
Step 5 (Mark function) Fix T ≥ 0, and X 0 ∈ U V fct . For the proof we will rely once more on the approximation of the solution of the (L FV , Π 1,0 , δ X 0 )-martingale problem by U V fct -valued evolving genealogies of Moran models, X M,ρ , where ρ > 0 is the local intensity of individuals. By the look-down construction given in [GLW05], we can define the family {X M,ρ ; ρ > 0} on one and the same probability space. Moreover, as the solution of the (L FV , Π 1,0 , δ X 0 )-martingale problem has continuous paths, by the Skorohod representation theorem we may assume that X M,ρ t → X FV t , as ρ → ∞, uniformly for all t ∈ [0, T ], almost surely.
Assume first that the geographic space V is finite. For the construction of such a function h ,V = h ∈ H and a random measurable set Y ρ,δ,V = Y ρ,δ ⊆ X ρ t , we can proceed exactly as in the proof of Theorem 4.3 of [KL] where the statement is shown with mutation rather than migration in the non-spatial rather than the finite geographic space.
Let now V be countable, and consider a sequence (V n ) n∈N of finite sets with V n ⊆ V and V n ↑ V . Consider for each n ∈ N a solution, X FV,Vn , of the (L FV,Vn , Π 1,0 , δ X 0 )-martingale problem with L FV,Vn as defined in (D.2). As we have seen above, X FV,Vn t ∈ U Vn fct for all t ≥ 0, almost surely. Moreover, we have shown in Step 3 that each solution X FV of the (L FV , Π 1,0 , δ X 0 )-martingale problem on V can be obtained as the limit of X FV,Vn as n → ∞. To conclude from here that also X FV t ∈ U V fct for all t ≥ 0, almost surely, fix a finite set A ⊆ V . As done before we denote by X FV,Vn,A = (X t , r Vn t , µ Vn t (· ∩ (X t ∩ A))) t≥0 and X FV,A = (X t , r t , µ t (· ∩ (X × A))) t≥0 the restrictions of X FV,Vn and X FV , respectively, to marks in A (compare (D.7)).
For each m > n we couple X FV,Vn and X FV,Vm through the graphical lookdown construction by using the same Poisson point processes and marking every path which leaves V n in the V m dynamics by a 1. Moreover, we impose the rule that the 1 is inherited upon lookdown in the sense that both new particles carry type 1. The sampling measure of types then follows an interacting Fleming-Viot (in fact two-type Fisher-Wright) diffusion with selection. The corresponding Moran models are coupled and converge in the many particle per site limit to a limit evolution, which is the coupling on the finite geographic spaces and the additional types act upon resampling as under selection.
By construction, if $x, x' \in {\rm supp}(\mu_t(\cdot \times A))$, their distance is the same in $X^{{\rm FV},V_n}$ and $X^{{\rm FV},V_m}$ if both carry type 0. Thus for suitably large $n$ (depending on $\varepsilon > 0$) such that $A \subseteq V_n$, at any location in $A$ the relative frequency of type 1 at time $t$ can be made less than any given $\varepsilon > 0$ with probability $\ge 1 - \varepsilon$ by simple random walk estimates. Namely, if $(Z^b_t)_{t\ge 0}$ is an $a(\cdot,\cdot)$-random walk starting in $b$, then for $b \in V_n \subseteq V_m$ and $b' \in \complement V_n$,
(D.14) $\mathbb P\big(Z^b_t \notin V_n \text{ for some } t \in [0,T],\ Z^b_T \in A\big) + \mathbb P\big(Z^{b'}_t \in V_n \text{ for some } t \in [0,T],\ Z^{b'}_T \in A\big) \le \delta_n \to 0$ as $n \to \infty$, for all $m \ge n$.
Then the expected frequency of type 1 in locations in $A$ is bounded by $F(\delta_m)$ with $F(\delta) \to 0$ as $\delta \to 0$, which follows easily via duality from the properties of the Fisher-Wright diffusion with selection.
As a consequence the supremum along the path of the difference in variational norm of the distance-mark distributions for the V n and the V m -evolution for types in the set A can be bounded by a sequence converging to 0 as n, m → ∞.
Therefore also the limit dynamics on countable V has a mark function.
Proof of Theorem 1.16. Let X FV be the evolving genealogies of interacting Fleming-Viot diffusions where we have assumed that the symmetrized migration is recurrent. In order to prove ergodicity we proceed in two steps: (1) We start with constructing the limiting object, which is the tree-valued spatial coalescent.
(2) We then prove convergence of X FV t to this tree-valued spatial coalescent, as t → ∞, for any initial state X 0 ∈ U V 1 . This immediately implies uniqueness of the invariant distribution.
Step 1 (Tree-valued spatial coalescent) Recall from [GLW05] the spatial coalescent started with infinitely many partition elements per site and with migration mechanism $a(v, v')$. If the symmetrized migration is recurrent, we can assign to each realization a marked ultrametric space, $\mathcal K = (K, r)$, which admits a mark function. In order to equip it with a locally finite measure on the leaves, we consider a coupled family of sub-coalescents $\{K^\rho;\ \rho > 0\}$ such that the number of points of a given mark is Poisson with intensity $\rho$. If we now assign each point in $K^\rho$ mass $\rho^{-1}$, then it follows from [GLW05, Theorem 3] that there exists a measure $\mu$ on $K \times V$ such that for each $v \in V$, $\mu(K \times \{v\}) = 1$. This reflects the spatial dust-free property. Thus, we can use the same arguments used in [GPW09, Theorem 4] to show that the family $\big\{\big(K^\rho, \rho^{-1} \sum_{(x,v) \in K^\rho \times V} \delta_{(x,v)}\big);\ \rho > 0\big\}$ is tight, and in fact has exactly one limit point,
(D.15) $\mathcal K^{\downarrow} := (K, r^{\downarrow}, \mu^{\downarrow})$.

Step 2 (Convergence into the tree-valued spatial coalescent) For all $X_0 \in U^V_1$ and $K_0 \in S$, by our duality relation, using the functions $H = H^{n,\phi}$ from (1.18), Once more, as the family $\{H^{n,\phi}(\cdot, K);\ n \in \N,\ \phi \in C_{bb}(\R_+^{\binom{n}{2}}),\ K \in S\}$ is convergence determining by Theorem 1.8(i), we can conclude that for all initial conditions, $X^{\rm FV}_t$ converges Gromov-$\#$-weakly to the tree-valued spatial coalescent.