TIME-SPACE ANALYSIS OF THE CLUSTER-FORMATION IN INTERACTING DIFFUSIONS

A countable system of linearly interacting diffusions on the interval [0, 1], indexed by a hierarchical group, is investigated. A particular choice of the interactions guarantees that we are in the diffusive clustering regime, that is, spatial clusters of components with values all close to 0 or all close to 1 grow on various different scales. We studied this phenomenon in [FG94]. In the present paper we analyze the evolution of single components and of clusters over time. First we focus on the time picture of a single component and find that components close to 0 or close to 1 at a late time have had this property for a large time of random order of magnitude, which is nevertheless small compared with the age of the system. The asymptotic distribution of the suitably scaled duration a component was close to a boundary point is calculated. Second we study the history of spatial 0- or 1-clusters by means of time-scaled block averages and time-space thinning procedures. The scaled age of a cluster is again of a random order of magnitude. Third, we construct a transformed Fisher-Wright tree, which (in the long-time limit) describes the structure of the space-time process associated with our system. All described phenomena are independent of the diffusion coefficient and occur for a large class of initial configurations (universality).


Introduction
The present paper is the second step in our program, started in [FG94], to better understand the long-term behavior of interacting systems with only degenerate equilibria (i.e. steady states concentrated on traps), which typically occurs in weakly interacting (low-dimensional) situations. Examples of this situation are branching models, the linear voter model, linear systems in the sense of Liggett, and genetics models of the type we discuss here.
In the first step [FG94] of our study of infinite interacting systems of diffusions in [0, 1] in the regime of diffusive clustering, we obtained a detailed picture of the growth of clusters in space, observed at single late time points. Furthermore we got a first rough insight into the time behavior of the process observed at a fixed finite collection of components. The aim of this second step of the program is to develop a suitable scheme which enables us to deepen the understanding of the large-scale correlation structure in time and space.
The purpose of the present paper is threefold: (i) A refinement of the analysis of the time structure of the component process, which will reveal that the times spent close to the boundaries of [0, 1] diverge at a random order of magnitude as the observation time point gets large. That is, they are of the form T(t)^α for suitable time scales T(t) and a random variable α whose distribution will be identified. This order α is less than one, i.e. the age of the cluster is small compared with the age of the system.
(ii) To understand the history of spatial clusters. In particular, we want to relate the time a component was close to 0 or 1 to the spatial extension of the related cluster.
(iii) To construct an object which contains the information about the time-space structure of the system on a "macroscopic" scale. We call this object a transformed Fisher-Wright tree.
As in [FG94], another important aspect is that the results are proved for a whole class of models (universality), allowing quite general diffusion coefficients and initial laws. In particular, the role of the transformed Fisher-Wright tree is not restricted to interacting Fisher-Wright diffusions alone. The interaction term considered here corresponds to the d = 2 case in usual lattice models (whereas the equivalent of the d = 1 case behaves differently again, compare Klenke [Kle95]; see also Evans and Fleischmann [EF96]).
The phenomenon of clustering has been addressed for low-dimensional branching systems; see for instance Iscoe [Isc86] (assertions on the finiteness of the total occupation times), Cox and Griffeath [CG85] (random ergodic limit in the critical dimension), and Dawson and Fleischmann [DF88] (scaling limit of time-space clumps in subcritical dimensions). For the voter model, which exhibits behavior qualitatively similar to that of the interacting diffusion, the phenomenon of clustering has been approached in Cox and Griffeath [CG83] by studying occupation times. In the present paper we will follow a different, more direct approach for interacting diffusions. It is actually not too hard to use our results to study similar questions for the voter model on a hierarchical group.
The analysis of clusters and their evolution in time, presented in §§ 1.3-1.5 below, will proceed by viewing single components in suitable time scales (Theorem 1), large spatial averages in various time scales (Theorem 2), and time-space thinned-out systems (Theorem 3). The transformed Fisher-Wright tree is defined in § 1.2.

Model of interacting diffusions
We start by introducing the model under consideration. For a discussion of the population genetics motivation for this model we refer to [FG94] and references therein (see also Remark 1.5 below).

Definition 1.1 (interacting diffusion) Let X = {X_ξ(t); ξ ∈ Ξ, t ≥ 0} denote the interacting diffusion on the hierarchical group Ξ with fixed drift parameters {c_k; k ≥ 1} and diffusion coefficient g. This process is defined as follows: For each specified initial state in [0, 1]^Ξ, consider the unique strong solution X (Shiga and Shimizu [SS80]) of the following system of stochastic differential equations:

(1)  dX_ξ(t) = Σ_{k≥1} (c_k / N^{k−1}) [X_{ξ,k}(t) − X_ξ(t)] dt + √(2 g(X_ξ(t))) dw_ξ(t),  ξ ∈ Ξ.

The ingredients occurring in this equation are the following:

(a) (hierarchical group) Ξ is the set of all sequences ξ = [ξ_1, ξ_2, ...] with coordinates ξ_i in {0, ..., N − 1}, of which only finitely many are different from 0; ‖ξ‖ := max{i; ξ_i ≠ 0} (with ‖0‖ := 0) denotes the "discrete norm" of ξ. Finally, Ξ is an Abelian group by defining the addition coordinate-wise modulo N, and ‖ξ − ζ‖ is the "hierarchical distance" of ξ and ζ.
(b) (ball average) X_{ξ,k} refers to the empirical mean (ball average) in the k-"ball" around ξ:

X_{ξ,k}(t) := N^{−k} Σ_{ζ: ‖ζ−ξ‖ ≤ k} X_ζ(t).

(c) (driving Brownian motions) w = {w_ξ(t); ξ ∈ Ξ, t ≥ 0} is a system of independent standard Brownian motions in R.
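To make the dynamics tangible, here is a minimal simulation sketch (ours, not the paper's): it truncates Ξ to K levels, assumes the drift takes the ball-average form Σ_k (c_k/N^{k−1})(X_{ξ,k} − X_ξ) with c_k ≡ a, uses the Fisher-Wright coefficient g(r) = r(1 − r), and clips an Euler step to [0, 1] as a numerical crutch.

```python
import numpy as np

def g_fw(x):
    # Fisher-Wright diffusion coefficient g(r) = r(1 - r) on [0, 1].
    return x * (1.0 - x)

def simulate(N=3, K=4, a=1.0, theta=0.5, T=5.0, dt=0.01, seed=0):
    """Euler scheme for a K-level truncation of the hierarchical system.

    The configuration is stored as an array of shape (N,)*K with axis i
    carrying level i + 1; the k-ball average X_{xi,k} is then the mean
    over the first k axes.
    """
    rng = np.random.default_rng(seed)
    x = np.full((N,) * K, float(theta))   # constant initial state with density theta
    for _ in range(int(T / dt)):
        drift = np.zeros_like(x)
        for k in range(1, K + 1):
            ball = x.mean(axis=tuple(range(k)), keepdims=True)  # k-ball averages
            drift += (a / N ** (k - 1)) * (ball - x)            # c_k = a assumed
        noise = np.sqrt(2.0 * g_fw(x) * dt) * rng.standard_normal(x.shape)
        x = np.clip(x + drift * dt + noise, 0.0, 1.0)  # keep the scheme in [0, 1]
    return x

x = simulate()
```

None of the limit theorems below depend on such a finite truncation; the sketch is only meant to make the interplay of ball averages and single components concrete.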

Definition 1.2 (initial state) We often use a random initial state X(0). Then X(0) is assumed to be independent of the system w of driving Brownian motions. The law L(X(0)) is denoted by µ. In most cases we assume that µ belongs to the set T_θ of all those distributions on [0, 1]^Ξ which are shift ergodic with density θ ∈ (0, 1), that is, ∫ µ(dx) x_ξ ≡ θ. Write IP_µ := IP^g_µ for the distribution of X if µ is the initial law L(X(0)), and IP_z := IP^g_z in the degenerate case µ = δ_z, z ∈ [0, 1]^Ξ.
In the Fisher-Wright case (5), that is, for the diffusion coefficient g(r) = b r(1 − r) with constant b > 0, we also write, by an abuse of notation, IP^b_µ and IP^b_z for the laws of X. Figure 1 illustrates the hierarchical label structure: ξ = [ξ_1, ξ_2, ...] labels the ξ_1-st member in a family, which is the ξ_2-nd family of a clan, ..., which is the ξ_k-th member of a k-level set; ‖ξ − ζ‖ = k refers to relatives ξ, ζ of degree k.
The system (1) occurs as a diffusion limit of genetics models with resampling and migration (Moran models). Then X_ξ can be interpreted as a gene frequency of the ξ-th component (colony) of the system. (See Sawyer and Felsenstein [SF83] or Chapter 10 in Ethier and Kurtz [EK86].)

The basic features of the model stem from a competition between drift and noise. Namely, set for the moment c_k ≡ 0; then all components X_ξ fluctuate independently according to diffusions with coefficient g. For instance in the Fisher-Wright case (5), X_ξ will be trapped at 0 or 1 in finite time, as indicated in Figure 2. On the other hand, if we set g ≡ 0, then X solves an infinite system of ordinary differential equations which has the property that, for c_k > 0 and initial states in T_θ (Definition 1.2), the solution X_t converges as t → ∞ to the constant state θ, that is, θ_ξ ≡ θ.
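To see the trapping concretely, here is a minimal Euler sketch of a single Fisher-Wright component (ours, not the paper's; it assumes the noise convention √(2g(x)) dw and uses clipping to [0, 1] as a numerical crutch):

```python
import numpy as np

def fw_absorption(x0=0.5, dt=1e-4, t_max=20.0, seed=1):
    # Euler scheme for dX = sqrt(2 X (1 - X)) dW, clipped to [0, 1];
    # returns (first time the discretized path reaches {0, 1}, value there),
    # or (None, current value) if no absorption occurred before t_max.
    rng = np.random.default_rng(seed)
    x, t = x0, 0.0
    while t < t_max:
        x += np.sqrt(max(2.0 * x * (1.0 - x), 0.0) * dt) * rng.standard_normal()
        x = min(max(float(x), 0.0), 1.0)
        t += dt
        if x == 0.0 or x == 1.0:
            return t, x
    return None, x

t_hit, trap = fw_absorption()
```

For diffusion coefficients vanishing at the boundary faster than linearly, absorption in finite time may fail; the sketch only illustrates the Fisher-Wright case.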
Therefore, in the case c_k ≢ 0, g ≢ 0, the drift term is in competition with the tendency of the single components' diffusions to get trapped at {0, 1}, and in fact it prevents the system from getting trapped at all in finite time (except if X already starts in one of the traps 0 or 1).
In the sequel we shall study only the case where c_k ≡ a > 0, which is the prototype displaying a specific "critical" behavior. Namely, this special choice of the drift parameters implies first of all that we are in the regime of clustering (for which Σ_k c_k^{−1} = ∞ would suffice), that is,

L(X(t)) =⇒ (1 − θ) δ_0 + θ δ_1 as t → ∞

(with 0 and 1 denoting the constant configurations). Even more, the whole system X is in the regime of so-called diffusive clustering: the logarithm of the volume of clusters of neighboring components with values all close to 0 or all close to 1 grows at a random linear speed if we observe the process in a suitable, in our case exponential, time scale; see Fleischmann and Greven [FG94]. Such behavior occurs typically if Σ_{k=1}^n c_k^{−1} diverges, but not exponentially fast, as n → ∞, while the case c_k^{−1} = c^{−k} with c < 1 gives a different behavior; see Klenke [Kle95]. In order to keep notation reasonable we focus on c_k ≡ a > 0 rather than putting conditions on Σ_{k=1}^n c_k^{−1}. The above dichotomy is analogous to the d = 2 versus d = 1 dichotomy in usual lattice models such as the voter model, branching random walk, generalized potlatch and smoothing, etc. For the voter model on Z², see Cox and Griffeath [CG86]. Concerning the ergodic theory for general drift parameters c_k we refer to Cox and Greven [CG94]. For a description of the cluster-formation for general c_k, see Dawson and Greven [DG93] (concerning the mean field limit) and Klenke [Kle95]. Much of the scheme we derive to study the cluster-formation in time can be performed as well for the label set Z².

Transformed Fisher-Wright tree
In order to discuss clustering phenomena, we want to introduce some objects which shall play a basic role in the description of the genealogy of clusters in this type of interacting models, as the backward tree in spatial branching theory does. Start with the following basic ingredients.
(b) (fluctuation time) We call the hitting time τ* of the traps {0, 1} the fluctuation time (cf. Figure 2 above).

(c) (transformed Fisher-Wright diffusion Y^θ) Introduce the transformed Fisher-Wright diffusion Y^θ, obtained from the Fisher-Wright diffusion of (a) by the time change β ↦ log(1/β), β ∈ (0, 1], and denote the marginal laws of this time-inhomogeneous Markov process by Q^θ_β, 0 < β ≤ 1.

(d) (holding time of Y^θ) Introduce the holding time of Y^θ: τ := e^{−τ*} ∈ (0, 1).

Consequently, a path of the transformed Fisher-Wright diffusion Y^θ starts at 0 or 1, namely with the law of Y^θ(τ), that is, with (1 − θ)δ_0 + θδ_1, and stays there for the random time τ ∈ (0, 1). After τ, the path fluctuates like a standard Fisher-Wright diffusion, but with time reversed and on a logarithmic scale, and it finally ends up at time β = 1 at the deterministic value θ. (Read Figure 2 backwards.)

Next we compose a whole tree of Fisher-Wright diffusions (see Figure 3):

Definition 1.7 (Fisher-Wright tree Y^θ) Fix θ ∈ (0, 1), k ≥ 1, and (deterministic) time points 0 ≤ s_k < · · · < s_1 < s_0 := ∞.
(a) (trunk) First we introduce the trunk of the tree denoted by Y θ ∞ . It is nothing else than Y θ from Definition 1.6 (a).
Figure 3: Fisher-Wright tree (only one branch trapped so far)

(b) (branches) Next we define the branches of the tree. Given the trunk Y^θ_∞, let a branch Y^θ_{s_k} split away from the trunk at the time s_k. The branch is again assumed to be standard Fisher-Wright, but defined on the time interval [s_k, ∞], that is, starting at time s_k in the trunk's state Y^θ_∞(s_k). Proceed with the other s_i accordingly. The branches Y^θ_{s_i} leave only from the trunk Y^θ_∞ and are constructed independently of each other, given the trunk. Hence, by definition all the branches Y^θ_{s_i}, k ≥ i ≥ 1, are conditionally independent given Y^θ_∞. Note that all the finitely many branches and the trunk end up in the set {0, 1} of traps after finite times.
(c) (fluctuation time of the trunk) As in Definition 1.6 (b), denote by τ* the fluctuation time of the trunk. Of course, given Y^θ_∞(τ*) = ∂ ∈ {0, 1}, all branches Y^θ_{s_i} with s_i ≥ τ* are trapped at ∂.

(d) (law and filtration) For the fixed s_k, ..., s_1, write P^θ for the law of the Fisher-Wright tree Y^θ and {F(t); t ≥ 0} for the related filtration (with F(t) describing the behavior of Y^θ in [0, t]).

Remark 1.8 The somewhat unexpected index ∞ = s_0 (instead of 0 or s_{k+1}) on the symbol Y^θ_∞ for the trunk of the tree will become clear below when we switch to a transformed tree. This also indicates that one could read the trunk in the backward direction, while the branches, starting with Y^θ_{s_1}, then split off in time viewed forward. This is (for good reason) the same as with the backward tree in branching theory; see for instance Chapter 12 in Dawson [Daw93]. Note also that for typographical simplification we do not display the time points s_k, ..., s_1 in the notation of Y^θ or P^θ.
Definition 1.9 (transformed Fisher-Wright tree Y^θ) Set β_i := e^{−s_i}, 0 ≤ i ≤ k, so that β_0 = 0 and 0 < β_1 < · · · < β_k ≤ 1.

(a) (trunk) The trunk of the transformed Fisher-Wright tree is Y^θ_0, obtained from the trunk Y^θ_∞ by the time change α ↦ log(1/α), α ∈ [0, 1]. Since α = 0 is included, the trunk and all branches start from the traps {0, 1} and stay there for a positive time.

(b) (branches) The branches Y^θ_{β_i} are obtained from the branches Y^θ_{s_i} by the same time change, defined for α ∈ [0, β_i]. The branch Y^θ_{β_i} terminates at the (deterministic) time β_i, when it coalesces with the trunk; hence Y^θ_{β_i}(β_i) = Y^θ_0(β_i). Consequently, Y^θ can be considered as a coalescing ensemble of transformed Fisher-Wright diffusions. Note that by Definition 1.7 (b), all the branches Y^θ_{β_i}, 1 ≤ i ≤ k, are conditionally independent given Y^θ_0.
(c) (holding time of the trunk) As in Definition 1.6 (d), denote by τ the holding time of the trunk; i.e., branches with terminal time bounded by τ are trapped together with the trunk.
Remark 1.10 (trees of transformed diffusions) Trees of transformed diffusions turn out to be a basic object entering into the description of the cluster-formation of interacting diffusions. This applies for instance to interacting critical Ornstein-Uhlenbeck diffusions, that is, g(r) = σ² > 0, r ∈ R, where one has a tree of Brownian motions, and to super-random walks, that is, g(r) = σ²r, r ≥ 0, which lead to a tree of more complicated but explicitly known time-inhomogeneous diffusions.

Time structure of components
Expected phenomena

We start by discussing the phenomena to be described. The formal set-up and related results will be contained in the next two subsections.
For the remainder of the introduction we require: Assumption 1.11 (initial state) X starts off with a shift ergodic distribution µ with fixed density θ ∈ (0, 1), that is µ ∈ T θ .

The basic theorem for the interacting diffusion in the regime of clustering is

(14)  L(X(t)) =⇒ (1 − θ) δ_0 + θ δ_1 as t → ∞

(where the symbol =⇒ refers to weak convergence); see Cox and Greven [CG94]. Nevertheless, if we fix a label ξ in the hierarchical group Ξ, we proved in [FG94, Theorem 5] that for the corresponding component process {X_ξ(t); t ≥ 0} in [0, 1],

(15)  lim sup_{t→∞} X_ξ(t) = 1 and lim inf_{t→∞} X_ξ(t) = 0 a.s.
In this sense, as opposed to a system of independent diffusions, each component X_ξ oscillates "between both traps" infinitely often. As a rule, it actually even spends asymptotically a fraction one of the time close to the traps {0, 1} [FG94, Theorem 4]. (Note that the oscillation property (15) can be interpreted as a type of "recurrence" property for clusters.) We now want to know more about the durations for which X_ξ is close to 1 or close to 0 (life times of clusters or, alternatively, the correlation length in time of our system). This should be closely related to the spatial cluster-formation.
We studied the cluster extensions in space in [FG94, Theorem 3]: At time N^{βt} (as t → ∞), the spatial clusters are of "size" αt, where α is a random element of the open interval (0, β) (with β > 0 fixed, which could be set to 1 by scaling). More precisely, we have α = βτ, with τ the holding time of the transformed Fisher-Wright diffusion (Definition 1.6 (d)).
Or, from another point of view, at time scale N^{βt} correlations in space are built within distances of order αt, with the same random α. Or, turned around, clusters of a spatial extension over a ball of radius αt need at least a time N^{(α+ε)t} to be formed with positive probability, for some ε > 0. Combined with (15) this means that smaller clusters keep being overturned or melted together with other smaller ones. This suggests that, in order to describe the sequence of holding times of values close to 1 or close to 0 on a large scale, we should encounter four interesting phenomena. Namely, the holding times should be
- of a random order of magnitude,
- asymptotically small compared with the age of the system,
- (stochastically) monotone and of increasing order of magnitude, and
- within the correlation length, comparable with the correlation length.
To see that the correlation length is of a smaller order than the system age, look (for the fixed ξ) at {X_ξ(βt); 0 < β ≤ 1}. The law of this process converges as t → ∞ to a "stationary" 0-1-valued noise; see Proposition 6.1 at p. 39. (Compare this phenomenon of a noisy behavior in time with the occurrence of a spatial "isolated Poissonian noise" in the analysis of the clumping in the time-space picture for branching systems in low dimensions [DF88].) To elaborate on this point, look at the exponential scale N^{βt} and at the component process {X_ξ(N^{βt}); 0 < β ≤ 1} (for the fixed ξ). As t → ∞ we get the same limiting "stationary" 0-1-noise. That is, the limit is independent in each "macroscopic" time point β, where the common one-dimensional marginal is just (11), with θ ∈ (0, 1) the initial density of the system. The latter fact follows from (14).
This indicates that in order to capture time correlations we have to study X ξ after very long times but on a much finer scale than β as it appears in N βt . To accomplish that, we will look backwards from late time points N T in time scales of smaller order. This will be incorporated formally by the following set-up.

Results
To capture the structure of the correlations in time of the component process, we will look at an asymptotically small neighborhood of a late time point. For fixed ξ ∈ Ξ and T > 0, we define the scaled component process

(16)  U^T := {U^T_β := X_ξ(N^T − N^{βT}); β ∈ [0, 1)};

that is, β ∈ [0, 1) becomes the "macroscopic backward time". Consequently, from the "terminal time" N^T we look backwards for the amount N^{βT}, where β varies in [0, 1). Note that N^T − N^{βT} ∼ N^T as T → ∞, so that the whole process U^T indeed describes the behavior "close to" N^T.
Recall that g ∈ G_0 and µ ∈ T_θ with θ ∈ (0, 1). We write =⇒_fdd for weak convergence of all finite-dimensional distributions. Now we describe the behavior of a single component X_ξ in time, based on the definitions (16) of U^T and 1.6 of Y^θ and τ.
Theorem 1

(a) (convergence) There is a {0, 1}-valued process U^∞ on the (macroscopic backward) time interval [0, 1) such that U^T =⇒_fdd U^∞ as T → ∞. Consequently, the distribution of U^∞ is a mixture of Bernoulli product laws: First realize the transformed Fisher-Wright diffusion Y^θ, and then build the product law with marginals (1 − Y^θ(β)) δ_0 + Y^θ(β) δ_1, β ∈ [0, 1). In particular, the one-dimensional marginals L(U^∞_β) are given by (11), for all β ∈ [0, 1).
(c) (qualitative description of U^∞) Consider the holding time h_U := sup{β ∈ [0, 1); U^∞ is constant on [0, β]}. Furthermore, beyond h_U the process U^∞ is a "mixture" of non-stationary 0-1-noise, for ∂ ∈ {0, 1} and the β_1, ..., β_k as in (b).

Remark 1.12 Note that the holding time h_U is measurable with respect to the σ-algebra of all (backward) paths with a non-empty starting interval of constant value.

The theorem says three things: (i) If we look back from time N^T in the time scale N^{βT}, the component we focus on has been "close" to its state ∂ for a time of random order of magnitude β. (ii) This order is (strictly) positive and coincides in law with the holding time τ of Y^θ. (iii) Later changes occur in times of a smaller order of magnitude (conditional noise), within the correlation length.
Remark 1.13 (time average of components) Since the correlation length (in time) is small compared with the system's age, one could prove that objects of the form t^{−1} ∫_0^t ds X_ξ(s) converge in law to θ. This is characteristic for the case of drift parameters {c_k} not decaying exponentially fast (the analog of the d = 2 case in lattice models). Compare [CG83].

An open problem

A very natural question is how the holding times close to the time point N^T behave in the limit T → ∞. To be a bit more specific, for a fixed ε ∈ (0, 1/2) we introduce a sequence of random (backward) times (see Figure 5). For our purpose, the increments H^T_1, H^T_2, ... of these times may serve as the (backward) holding times of the component process X_ξ at the boundaries, since the fraction of time the component process spends in [ε, 1 − ε] converges to 0 in probability as T → ∞; see Theorem 4 in [FG94].
Incorporating the scaling suggested by the result of Theorem 1, define the rescaled holding times H̄^T_i by N^{H̄^T_i T} = H^T_i; for our purpose they describe the order of magnitude of H^T_i. What one would like to do now is the following:
- Show that L(H̄^T_i; i ≥ 1) has a limiting law, say Γ.
- Identify the law Γ via the transformed Fisher-Wright tree.
- Show that Γ is concentrated on decreasing sequences.
Carrying out such an analysis, which involves joint laws of holding times rescaled by functions of different orders of magnitude, requires more than controlling moments of the time-space diagram. What is needed is a representation of the interacting system via particle systems in the sense of the work of Donnelly and Kurtz [DK96]. Such an analysis is outside the scope of the present paper.

Spatial ball averages in time dependence
We want to combine the previous set-up, describing a single component during time, with our results in [FG94] about the spatial structure at a fixed (late) time, and in this way obtain a better picture of how the clusters evolve in time. We approach this phenomenon from two angles: in the present subsection we consider spatial ball averages in their time dependence, whereas in the next one we shall deal with thinned-out time-space fields.
For fixed ξ ∈ Ξ and α ∈ [0, 1), consider the spatial ball averages

V^T_α := {V^T_α(β) := X_{ξ,[αT]}(N^T − N^{βT}); 0 ≤ β < 1}

as processes in the macroscopic backward time β ∈ [0, 1) (here [r] refers to the integer part of r). As T → ∞, a limiting process V^{α,∞} on [0, 1) will exist whose law depends on α. Since N^{βT} = o(N^T) (for β < 1 fixed), we stay again within the correlation length, and the one-dimensional marginal distribution of V^{α,∞} is again independent of β, but is now given by the law Q^θ_α of the transformed Fisher-Wright diffusion Y^θ of (9) at time α; see [FG94, Theorem 2].
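The ball averages entering these processes are easy to extract from a finite configuration array; the following helper is illustrative only (the layout "axis i carries level i + 1" is our encoding, not the paper's):

```python
import numpy as np

def ball_averages(x):
    """All ball averages xbar_{0,k}, k = 0, ..., K, around the label 0,
    for a configuration stored as an array of shape (N,)*K with axis i
    carrying level i + 1 (the entry for k = 0 is the component at 0)."""
    K = x.ndim
    out = [float(x[(0,) * K])]
    for k in range(1, K + 1):
        # the k-ball around 0 consists of the labels vanishing above level k
        block = x[(slice(None),) * k + (0,) * (K - k)]
        out.append(float(block.mean()))
    return out
```

Successive entries average over balls of volume N^k, mirroring the increasing spatial scales [αT] used above.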
The next theorem deals with this time-scaled process of spatial ball averages. Recall that µ ∈ T_θ and 0 < θ < 1.
with Y^θ the transformed Fisher-Wright tree of Definition 1.9; here the r.h.s. is a mixture of product laws. This theorem says that the spatial ball average has remained at its terminal value for at least a time of order N^{αT}. However, the order of this holding time is larger than α if the whole α-ball is covered by a 0- or 1-cluster at the terminal time N^T (this event has positive probability), in which case (depending on the random size of that cluster) the empirical mean had been in the same state as at time N^T for a random time. The order of magnitude is α ∨ τ. Looking back further then gives us conditionally (given α ∨ τ) independent observations, since the time grid is too large to detect earlier, and hence smaller, holding times. Theorem 2 (c), combined with the conjectures at p. 11 and a result in [FG94], suggests that a specific value in the order of magnitude of the holding time of a component (viewed backwards from a late time point) corresponds to the existence of a cluster at that late time which has a corresponding order of magnitude. Roughly speaking, on the macroscopic scales used, the spatial cluster size gives the holding time of a typical component in that cluster. This will be made precise in Theorem 3 below.

Time-space thinned-out systems
A second approach to investigate the history of a spatial cluster found at time N^T, and to relate the order of the spatial size of the cluster to the order of the holding time of a component, is the following. Choose a spatial network of points having distances αT. Consider a new field obtained by observing the system through time only at this network of observation points. Do this, however, only in a network of time points which also spread apart suitably as the system ages. We formalize this point of view as follows; a verbal explanation is given in Remark 1.15.
Definition 1.14 (thinning procedures) (a) (inverse level shift operators S_n^{−1} and spatially thinned-out systems) The operator S_n^{−1} shifts all coordinates (levels) of ξ by n steps and fills in the newly created coordinates with 0. Hence, ξ = 0 is a fixed point, and if ‖ξ‖ = m ≠ 0 then ‖S_n^{−1}ξ‖ = m + n. In particular, S_n^{−1} increases non-zero distances of pairs of labels by n. Applied to a whole configuration x ∈ [0, 1]^Ξ, we can view S_n^{−1}x as a spatially thinned-out system, since each fixed pair of distinct labels is moved to a distance exceeding n.
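In coordinates, S_n^{−1} is simply a shift that prepends n zero levels; encoding labels as finite tuples (our choice, for illustration only), the operators above read:

```python
def discrete_norm(xi):
    # ||xi||: the largest level carrying a nonzero coordinate (0 for xi = 0).
    return max((i + 1 for i, c in enumerate(xi) if c != 0), default=0)

def hier_distance(xi, zeta):
    # ||xi - zeta||: the largest level at which the two labels differ.
    L = max(len(xi), len(zeta))
    xi = tuple(xi) + (0,) * (L - len(xi))
    zeta = tuple(zeta) + (0,) * (L - len(zeta))
    return max((i + 1 for i in range(L) if xi[i] != zeta[i]), default=0)

def inv_level_shift(xi, n):
    # S_n^{-1}: shift all levels up by n, filling the new low levels with 0.
    return (0,) * n + tuple(xi)
```

In particular ‖S_n^{−1}ξ‖ = ‖ξ‖ + n for ξ ≠ 0, and the hierarchical distance of any pair of distinct labels grows by n, which is exactly the thinning effect exploited here.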
To describe its distribution, let F_{β,α} denote the law of the random vector (Y^θ_{β_i}(α); 0 ≤ i ≤ k), and write ∂̄ for the configuration identically equal to ∂. Consequently, W^{β,α,∞} is a "mixture" of independent fields; with probability P(τ ≥ β_1) it is even a constant field ∂̄ (with random ∂).
Theorems 1 (c) and 3 (c) reflect the fact that clusters have a time-space extension with an order of magnitude (α, α), where α is random. That is, the spatial cluster size is αT (in the hierarchical distance), whereas a "typical" component of that cluster lived for a time N^{αT}. Or, turned around: at time N^T, spatial clusters of size αT have an age of order N^{αT}. Hence, in the time-space diagram of the process, viewed back from the end N^T in an exponential time scale, we see at large times clusters of a size comparable with a square of random side length.

The most important feature of our analysis is that the large scale behavior of our model does not depend on the diffusion coefficient g; in particular, the transformed Fisher-Wright tree is a universal object in the class of models considered:

Corollary 1.17 (universality) The limiting objects U, V, and (in the sense of finite-dimensional distributions) W depend on the initial density θ ∈ (0, 1), but are otherwise independent of the "input parameters" a > 0, g ∈ G_0 and µ ∈ T_θ of the interacting diffusion X, and of the parameter N of the label set Ξ.

Strategy of proofs and outline
The proofs of Theorems 1-3 follow the strategy of first reducing the general results, by coupling and comparison techniques, to the case of interacting Fisher-Wright diffusions starting in a product law. Then we can use a time-space duality relation with a delayed coalescing random walk ϑ with (deterministic) immigration, its approximation by an (instantaneous) coalescing random walk η with immigration, and scaling limits for the latter model.
For this purpose, in Section 2 we study some random walk systems, in particular coalescing random walks. In Section 3 we introduce an extension of Kingman's coalescent. We call this object Λ an ensemble of log-coalescents. In Section 4 it occurs in certain scaling limits of coalescing random walks (e.g., Theorem 4 at p. 27). On the other hand, it is in duality with the transformed Fisher-Wright tree Y θ (Theorem 5 in Section 4, p. 31), which is our crucial object for the description of the space-time structure of interacting diffusions. In Section 5 other basic techniques like the duality of X and ϑ, coupling and moment comparison are compiled, culminating in the universal conclusion Theorem 6 at p. 37. In Section 6 we finally prove our Theorems 1-3 and with Theorem 7 (p. 39) a rather general version of a scaling limit for thinned-out X-systems.

Preliminaries: On coalescing random walks
A basic tool for our study of the interacting Fisher-Wright diffusion X will be a time-space duality relation with a delayed coalescing random walk with immigration. As a preparation for this, in the present section we develop the relevant random walk models and some of their properties.

Coalescing random walk with immigration
Let Z denote the continuous-time random walk on Ξ which governs the migration part of the dynamics (where a is the drift parameter a ≡ c_k of the interacting diffusion of Definition 1.1, and N is the "degree of freedom" in the hierarchical group Ξ). Let Z^ξ refer to Z starting with Z(0) = ξ ∈ Ξ (at time 0). The law of Z = Z^ξ is denoted by P^ξ. For convenience we sometimes also write Z(t) instead of Z_t (and we proceed similarly for other processes). We recall from [FG94, Lemma 2.21 and Proposition 2.37] that Z is a recurrent random walk, and that the hitting time distribution of the origin, starting from a fixed point ξ ≠ 0, has tails of order 1/log t as t → ∞. For a detailed study of this random walk we refer to Section 2 of [FG94].
Delayed coalescing random walk ϑ

Let ϑ = {ϑ_ξ(t); ξ ∈ Ξ, t ≥ 0} denote the (right-continuous) delayed coalescing random walk on Ξ with coalescing rate b > 0 (which corresponds to the diffusion parameter of the interacting Fisher-Wright diffusion; recall (5)). By definition, in the delayed coalescing random walk ϑ the particles move according to independent random walks as in the previous subsection, except when two particles meet. In the case of such a collision, as long as the two particles are at the same site, they attempt to coalesce into a single particle at (exponential) rate b.
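The delayed-coalescence mechanism itself is easy to simulate. In the following Gillespie-style sketch the hierarchical walk Z is replaced by a simple cycle walk (a stand-in of our own, since the jump kernel of Z is not reproduced here), while the rate-b pair coalescence follows the description above:

```python
import random
from collections import Counter

def delayed_coalescing_walk(positions, b=1.0, walk_rate=1.0, T=10.0, M=10, seed=0):
    """Each particle jumps at rate walk_rate to a uniform neighbour on the
    cycle Z_M, and each unordered pair of particles sharing a site
    coalesces into a single particle at rate b (delayed coalescence)."""
    rng = random.Random(seed)
    parts = list(positions)          # assumed non-empty
    t = 0.0
    while True:
        counts = Counter(parts)
        pairs = {s: c * (c - 1) // 2 for s, c in counts.items() if c >= 2}
        total = walk_rate * len(parts) + b * sum(pairs.values())
        t += rng.expovariate(total)  # next event time (Gillespie step)
        if t > T:
            return parts
        if pairs and rng.random() < b * sum(pairs.values()) / total:
            # coalescence: remove one particle from a multiply occupied site
            site = rng.choices(list(pairs), weights=list(pairs.values()))[0]
            parts.remove(site)
        else:
            i = rng.randrange(len(parts))
            parts[i] = (parts[i] + rng.choice((-1, 1))) % M  # stand-in jump

out = delayed_coalescing_walk([0, 0, 0, 5], b=2.0, T=5.0, seed=3)
```

Letting b → ∞ collapses the "as long as they share a site" delay and recovers the instantaneous coalescing walk η defined below.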
Write ϑ^ψ if ϑ starts (at time 0) with ψ ∈ Ψ. Here Ψ ⊂ (Z_+)^Ξ denotes the set of all those particle configurations ψ = {ψ_ξ; ξ ∈ Ξ} which are finite: ‖ψ‖ := Σ_ξ ψ_ξ < ∞. The configurations ψ with ‖ψ‖ = 1 (unit configurations) are denoted by δ_ξ, where ξ ∈ Ξ is the position of the particle. For a detailed description and discussion of ϑ we refer to § 3.a in [FG94], where the model is called coalescing random walk with delay. (ϑ is the dual of the interacting Fisher-Wright diffusion; see (64) at p. 34 below.)

Coalescing random walk η

Write η = η^ϕ, ϕ ∈ Φ, for the (instantaneous) coalescing random walk obtained by formally setting the coalescing rate b to ∞. Here Φ denotes the set of all (finite) populations ϕ ∈ Ψ with at most one particle at each site, that is, ϕ_ξ ≤ 1 for all ξ; see § 3.c in [FG94] for a detailed exposition. (Recall that η is the dual of the voter model on Ξ with interaction described by the kernel p_{ξ,ζ} of (25) and (26); see Liggett [Lig85, Chapter 5].) By an abuse of notation (no confusion will be possible), the distributions of η^ϕ and ϑ^ψ are written as P^ϕ and P^ψ, respectively.
Delayed coalescing random walk with immigration

As introduced above, the delayed random walk ϑ^ψ starts at time t_0 = 0 with ϑ(0) = ψ. Now we modify the model in the following way. Consider a finite sequence t_0, ..., t_k ∈ IR of (deterministic) time points and related (deterministic) populations ψ^0, ..., ψ^k ∈ Ψ, respectively. Start the delayed random walk at time t_* := t_0 ∧ ... ∧ t_k with the related population ψ_*, but in addition let the related populations ψ^i immigrate at the remaining time points t_i ≠ t_*, i = 0, ..., k. The resulting (right-continuous) delayed coalescing random walk with (deterministic) immigration is again denoted by ϑ, but we exhibit the immigration parameters in the notation: ϑ = ϑ^{ψ^0,...,ψ^k}_{t_0,...,t_k}, with law P^{ψ^0,...,ψ^k}_{t_0,...,t_k}. In particular, the starting time point is also viewed as an immigration time point. Of course, in the case k = 0 and t_0 = 0 we are back at the original delayed coalescing random walk: P^ψ_0 = P^ψ. Note that this family of (time-inhomogeneous) Markov processes has an obvious generalized time-homogeneity property (28).

Coalescing random walk with immigration

Similarly we define η, the (instantaneous) coalescing random walk with immigration (where b = ∞), and use the notation η = η^{ϕ^0,...,ϕ^k}_{t_0,...,t_k}, P^{ϕ^0,...,ϕ^k}_{t_0,...,t_k}, for t_0, ..., t_k ∈ IR and ϕ^0, ..., ϕ^k ∈ Φ.
These processes have a generalized time-homogeneity property analogous to (28). In this case one should have in mind a picture as shown in Figure 6. The delayed coalescing random walk with immigration is in a time-space duality with the interacting Fisher-Wright diffusion (see Proposition 5.1 at p. 34), whereas the (instantaneous) coalescing random walk with immigration is in a time-space duality with the voter model on Ξ. The word time-space refers here to the fact that we consider the whole path up to time t. (In the case of Ξ = Z^d with interaction determined by the simple random walk kernel p_{ξ,ζ}, the latter time-space duality was developed in Cox and Griffeath [CG83], using the name "frozen" random walks instead of walks with "immigration".)

Basic coupling
Throughout the paper it will be useful to define the relevant random walk models on a common probability space. For comparison we shall also need a system Z^{χ^0,...,χ^k}_{t_0,...,t_k} of independent random walks with immigrating populations χ^0, ..., χ^k ∈ Ψ at the times t_0, ..., t_k, respectively, defined as a Ψ-valued process in the obvious way.

[Figure 6: Coalescing random walk with immigration (0 = t_0 < t_k < t_k + t, k = 1)]

Finally we give the following basic coupling principle:

Construction 2.1 (basic coupling) Choose a basic probability space [Ω, F, P] in such a way that it supports all three (time-inhomogeneous) Markov families Z, ϑ and η, where k ≥ 0, t_0, ..., t_k ∈ IR, χ^0, ..., χ^k, ψ^0, ..., ψ^k ∈ Ψ and ϕ^0, ..., ϕ^k ∈ Φ, and that these families satisfy the domination Z(t) ≥ ϑ(t) ≥ η(t), t ≥ t_*.

Proof (existence of the basic coupling) First construct a probability space which supports the family of independent random walks with immigration Z^{χ^0,...,χ^k}_{t_0,...,t_k}. That is, at time t_* we start independent walks placed according to the related χ^*, and at all the remaining times t_i we additionally start independent walks placed according to χ^i. But in addition every immigrating particle (including those at time t_*) gets an internal degree of freedom, by definition one of the numbers 0, 1 or 2. The rules are as follows: If the immigrating particle belongs to one of the ϕ^i it gets the 0-mark, in the case of particles from ψ^i − ϕ^i we adjoin the mark 1, and for χ^i − ψ^i we take 2. The mark of a particle is preserved during its evolution except for the following two situations:

• If two particles meet which both have the mark 0, then one of them (chosen at random) instantaneously gets the mark 1.

• If a pair of particles with marks in {0, 1} (except if both are 0) stays at the same site, then at exponential rate b one of them having mark 1 is chosen at random (if we have two of them) and increases its mark from 1 to 2.
Here we let all possible pairs (at the same site) act independently.
Then at time t ≥ t_* count the particles as follows: Z(t) counts all particles, ϑ(t) the particles with mark 0 or 1, and η(t) the particles with mark 0. Apparently these processes satisfy Z(t) ≥ ϑ(t) ≥ η(t), t ≥ t_*, and are a version of Z, ϑ, η as wanted.
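The mark bookkeeping above is easy to put on a computer. The following is a minimal, hedged sketch, not the authors' construction: it replaces the hierarchical-group walk by lazy nearest-neighbour walks on Z, the exponential coalescing rate b by a per-step coalescence probability, and applies the mark upgrades per particle rather than per pair. The function name `simulate_marks` and all parameters are our own illustrative choices.

```python
import random
from collections import defaultdict

def simulate_marks(n_particles=6, steps=200, b=0.3, seed=1):
    """Coupled system of walks carrying marks 0/1/2 on a common space:
    mark 0 particles are alive in Z, theta and eta; mark 1 particles have
    already coalesced instantaneously (alive in Z and theta only); mark 2
    particles have also coalesced in the delayed system (alive in Z only)."""
    rng = random.Random(seed)
    parts = [[2 * i, 0] for i in range(n_particles)]  # (site, mark), one immigration epoch
    counts = []
    for _ in range(steps):
        for p in parts:  # lazy simple random walk step for every particle
            u = rng.random()
            if u < 0.25:
                p[0] -= 1
            elif u < 0.5:
                p[0] += 1
        sites = defaultdict(list)
        for p in parts:
            sites[p[0]].append(p)
        for group in sites.values():
            zeros = [p for p in group if p[1] == 0]
            for p in zeros[1:]:       # instantaneous rule: one 0-mark survives per site
                p[1] = 1
            partners = sum(1 for p in group if p[1] in (0, 1))
            for p in group:           # delayed rule: a 1-mark upgrades with prob. b
                if p[1] == 1 and partners >= 2 and rng.random() < b:
                    p[1] = 2
        Z = len(parts)
        theta = sum(1 for p in parts if p[1] in (0, 1))
        eta = sum(1 for p in parts if p[1] == 0)
        counts.append((Z, theta, eta))
    return counts
```

By construction the three counts satisfy Z(t) ≥ ϑ(t) ≥ η(t) at every step, and the ϑ- and η-counts are non-increasing, mirroring the domination stated above.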

□
Note that the trivariate process [Z, ϑ, η] is not Markov. (In defining Z, ϑ, η by deleting the internal marks, the Markov character is lost.)

Approximation by (instantaneously) coalescing walks
Doubtless, (instantaneous) coalescing random walks with immigration are easier to handle than the corresponding delayed ones. On the other hand, we want to show now that in our context of a recurrent walk Z the delayed coalescing random walk with immigration can asymptotically be replaced, without loss of generality, by the corresponding system with instantaneous coalescence, and we will use this extensively later on. (This implies in particular that the clustering properties of interacting Fisher-Wright diffusions are independent of the diffusion parameter b > 0.) On an intuitive level this equivalence is justified by the following argument: If two particles do not meet, then the coalescing rate b is irrelevant and can be set to ∞. On the other hand, once two particles meet and do not coalesce before one of them jumps away, then by recurrence they will meet again and again until they finally coalesce. (Caution: This heuristic argument has to be refined, since it does not take into account that one of these two particles could meanwhile be "absorbed" by another particle.) To put this idea on a firm basis, first associate with each ψ ∈ Ψ the "truncated" element ψ ∧ 1 ∈ Φ defined by (ψ ∧ 1)_ξ := ψ_ξ ∧ 1, ξ ∈ Ξ. The following result is a refinement and generalization of the approximation Proposition 3.6 of [FG94].
and time points. Then on our basic probability space [Ω, F, P] (recall Construction 2.1), the event (29) has P-probability converging to 1 as t → ∞. (Sometimes we do not display the t-dependence.) Remark 2.3 The approximate equivalence of ϑ and η explains, via duality, why (in the recurrent case) interacting Fisher-Wright diffusions and the voter model on Ξ have a similar large scale behavior.

◊
Proof The proof proceeds by induction over k, the number of immigration time points.
1 • (initial step of induction) Let k = 0. Then the processes are time-homogeneous, and for simplicity we may set s_0(t) ≡ 0. We treat this case k = 0 by doing again an induction, namely over the number m_0 of initial particles. For convenience, write m_0 =: m, ψ^0 =: ψ. Without loss of generality we may assume that s_1(t) = N^t is satisfied (otherwise change the notation of ψ(t)). Fix representations ψ(t) =: δ_{ζ(1,t)} + ··· + δ_{ζ(m,t)}. Trivially, the claim holds for m = 1. For the induction step, recall the coupling Construction 2.1 and assume that the statement is true for some m − 1 ≥ 1.
Write E^m = E^m(t) for the event (29) (in the case k = 0). Define E^m_{m−1} as E^m except replacing ψ by ψ − δ_{ζ(m,t)}. For fixed i < m, let C_{i,t}(s) and M_{i,t}(s) denote the events that the walks Z^{ζ(i,t)} and Z^{ζ(m,t)} coalesce, respectively meet, by time s.
Let σ(t) denote the first collision time of Z^{ζ(i,t)} and Z^{ζ(m,t)} after at least one of them has jumped away from its initial state. Recall that the difference of the independent walks Z^{ζ(i,t)} and Z^{ζ(m,t)} is a random walk of the same kind except with twice the jump rate. Define α(t) by N^{α(t)t} = ‖ζ(i,t) − ζ(m,t)‖. By the hitting probability Proposition 2.43 of [FG94], we have (30) for γ ∈ (0, 1) fixed. (In fact, apply this proposition twice, namely with β(t) ≡ 1 and (t) ≡ −∞ or (t) ≡ γ, respectively.) Consider a subsequence t' → ∞ such that the limit α(∞) := lim_{t'→∞} α(t') exists in [0, ∞]. If α(∞) ≥ 1, then by the same proposition we have (31). In the opposite case α(∞) < 1, the latter probability has a positive limit, and (30) applies. But then, due to recurrence of the random walk (cf. Lemma 2.21 in [FG94]), we conclude (32). On the other hand, from (31) we know that (32) holds also under α(∞) ≥ 1. Summarizing, (32) is true whenever t = t' → ∞. Dropping in notation the time argument N^t, we use the decomposition (33) (where CA denotes the complement of the event A). By the induction hypothesis, P[E^m_{m−1}] tends to 1 as t → ∞, so the probability of the event on the r.h.s. of (33) tends to 1. By (32) we can replace in that event ∩_{i<m} M_{i,t} by ∩_{i<m} C_{i,t}, and the probability still tends to 1. This finishes the induction on m, since the latter event implies E^m. Consequently the claim in the proposition holds in the case k = 0.

Speed of spread of random walks
Our random walk Z in Ξ has the following property: at time scale N^t the speed of growth of the norm ‖Z(N^t)‖ of Z(N^t) is of order 1 as t → ∞. To formulate with Lemma 2.6 below a more precise statement, for r, c ≥ 0 we introduce in (35) the rings Ξ[r, c]. Remark 2.5 (cancellation) The assumption α < β cannot be dropped. For instance, if α = β = 1 and

□
The announced speed property of our random walk now reads as follows. Note that we choose the initial state ξ(t) of the random walk Z ξ(t) itself t-dependent.
Now, under ζ(t) ≡ 0, the case α ≤ β directly follows from Lemma 2.26 in [FG94] (with ζ, s, r replaced by ξ(t), βt, (t)t, respectively). It remains to treat α > β. For the moment, consider the walk Z^0 starting at the origin 0 of Ξ. By the proved part of the lemma, we may assume that Z^0 at time N^{βt} − N^{(t)t} belongs to Ξ[βt, 2c]. Then, by Lemma 2.4, for t sufficiently large, the walk Z^{ξ(t)} ends up in Ξ[αt, 2c] with probability converging to 1 as t → ∞. This finishes the proof.

□
Remark 2.7 (non-cancellation) In the case α ≤ β, the cancellation effect of Remark 2.5 cannot happen in the situation of Lemma 2.6, since there is negligible probability that the walk will meet a prescribed point at a particular late time.
◊

2.5 Speed of spread of coalescing random walks

The above speed property of families of single random walks (Lemma 2.6) has consequences for the coalescing random walk with immigration, since we are interested in the latter system at late times and for time-dependent initial and immigrating populations. To describe the situation we need some notation (which is verbally explained below): Φ_t[ᾱ, m̄; c] denotes the set of all those populations ϕ = ϕ(t) ∈ Φ which can be represented as ϕ = ϕ^0 + ··· + ϕ^ℓ, where the ϕ^j = ϕ^j(t) ∈ Φ, 0 ≤ j ≤ ℓ, have the following properties (recall (35)):
• Pairs of particles from the j-th subpopulation spread with relative velocity α_j (cf. (c)).
• Mixed pairs of particles from [ϕ^j, ϕ^{j'}] spread at relative speed α_j ∨ α_{j'} (see (d)).

Now we are in a position to formulate the main result of this subsection concerning the speed of spread of multi-colonies in the coalescing random walk with spreading immigrating populations. In simplified words it says the following: Suppose at times s_i(t) := N^{β_i t}, i ≤ k, we have an immigration by populations, each being a superposition of ℓ_i + 1 subpopulations of m_{i,0}, ..., m_{i,ℓ_i} particles with velocities determined by α_{i,0}, ..., α_{i,ℓ_i}, respectively. Then the terminal population at normalized time β_{k+1} is a superposition of subpopulations which spread apart with the velocities α_{i,j} ∨ β_{k+1}, 0 ≤ j ≤ ℓ_i (all except some logarithmic error terms, and as described in Definition 2.8).
Proof The proof will be by induction over k, the number of immigration time points.
1 • (initial step of induction) Consider k = 0 (no additional immigration), and drop the index 0 in notation. Consider a pair ξ(t), ζ(t) of "particles" taken from the initial population ϕ = ϕ(t), that is, δ_ξ + δ_ζ ≤ ϕ. Recall that the difference Z := Z^ξ − Z^ζ of independent walks is a random walk in Ξ of the same kind but with twice the jump rate. Now there are two possible cases: The pair ξ, ζ of particles originates (i) from a subpopulation ϕ^j of ϕ related to the speed α_j, or (ii) from two different subpopulations ϕ^j and ϕ^{j'} of ϕ (i.e. a "mixed" pair). (i) By assumption on ϕ^j we have ξ − ζ ∈ Ξ[α_j t, c] (recall condition (c) of Definition 2.8). Hence we may apply the walk speed Lemma 2.6 (with the relevant parameter equal to β_0) to conclude that the event (40) has probability converging to 1 as t → ∞, and hence conditioning on this event is harmless. If now the coalescing mechanism is additionally applied (recall the coupling principle 2.1), then on the event (40) there are two cases. If the walks meet, then they coalesce, and we may apply the walk speed Lemma 2.6 to the surviving random walk starting with a particle ξ from ϕ^j, a case which has to be considered anyway (to check condition (b) of Definition 2.8). Then we get the desired position Z^ξ(s_1) ∈ Ξ[(α_j ∨ β_1)t, 2c]. On the other hand, if the walks do not meet, then the pair ξ, ζ of particles survives by time s_1, and its relative position is in Ξ[(α_j ∨ β_1)t, 2c], since we are in the event (40). Summarizing, the walks starting in the pair ξ, ζ from ϕ^j end up at time s_1(t) in a subpopulation corresponding to the (relative and absolute) speed α_j ∨ β_1.
(ii) Now consider a mixed pair ξ, ζ from ϕ^j, ϕ^{j'}. By assumption (recall condition (d) of Definition 2.8), it has relative speed α_j ∨ α_{j'}, say α_j without loss of generality. Again by the walk speed Lemma 2.6, we may assume that (40) holds. Hence we may continue to argue as in (i). Combining (i) and (ii) settles the case k = 0.

2 • (induction step) Consider k ≥ 1. By the Markov property of the process η and the generalized time-homogeneity as formulated in (28) at p. 16 for the process ϑ, the population from (39) can be thought of as arising from a process which starts in the population η(s_k−) + ϕ^k and runs as a coalescing random walk for the time N^{β_{k+1}t} − N^{β_k t}. Now we use that the claim is true for some k − 1 ≥ 0 (induction hypothesis). Then by (39) we may restrict our consideration to the case that χ^k belongs to the corresponding spread class. We take a pair ξ, ζ of particles from χ^k + ϕ^k. The cases that both particles belong either to χ^k or to ϕ^k can be dealt with as in the first step of induction; the only difference is that we now apply the walk speed Lemma 2.6 with the parameter β_k instead of β_0. Thus it remains to consider the mixed case where one of the particles belongs to each of the two subpopulations. Say ξ belongs to χ^k whereas ζ is related to ϕ^k. Then ξ ∈ Ξ[(α_{i',j'} ∨ β_k)t, 2^k c] for some i' = 0, ..., k − 1 and j' = 0, ..., ℓ_{i'}, and ζ ∈ Ξ[α_{k,j}t, c] for some j = 0, ..., ℓ_k. Now the condition (38) comes into play, namely for i = k. It guarantees that by the spread of sums Lemma 2.4 the speed of ξ − ζ can be determined: ξ − ζ ∈ Ξ[((α_{i',j'} ∨ β_k) ∨ α_{k,j})t, 2^k c]. Then one can continue as in the other two cases just described.
Summarizing, under the induction hypothesis, at the normalized time β k+1 we end up in the event as written in (39), with probability converging to one. This completes the proof by induction.
□

Remark 2.10 The condition ϕ^i(t) ∈ Φ_t[ᾱ_i, m̄_i; c] says roughly that all absolute positions are of specified orders. This was required as soon as just one "violation" of the parameter restrictions occurs. This is stronger than actually needed, but otherwise one would need a refined notation in order to describe the situation. ◊

Ensemble of log-coalescents with immigration
In this section we study coalescing random walks with immigrating multi-colonies: We consider later and later time points and let the initial and immigrating populations spread apart. There exists a limiting object which we call an ensemble of log-coalescents with immigration. The crucial result is Theorem 4 at p. 27.

A log-coalescent λ with immigration
The purpose of this subsection is to introduce a death process on a logarithmic time scale, which we call the log-coalescent. In the next subsection we shall relate it with a scaling limit of a system of coalescing random walks with spreading initial populations (Proposition 3.2).
Start by recalling Kingman's [Kin82] coalescent λ := (λ(t); t ≥ t_0) with coalescing rate b > 0. By definition, this is a (time-homogeneous, right-continuous Markov) death process starting at time t_0 ∈ IR, where a jump from m ≥ 0 to m − 1 occurs at rate b m(m−1)/2. The process λ describes the evolution of finite populations of particles without locations, where each pair of particles coalesces into one particle at rate b, independently of all the other present pairs.
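Kingman's coalescent as a pure death process is straightforward to sample. The sketch below is our own illustration (the name `kingman_path` is not from the paper): it draws the exponential holding times with total rate b·n(n−1)/2 while n particles are present.

```python
import random
from math import comb

def kingman_path(m, b=1.0, rng=None):
    """Sample one trajectory of Kingman's coalescent started from m
    particles: returns the list of (jump_time, new_count) pairs, the
    count decreasing from m to 1."""
    rng = rng or random.Random()
    t, n, path = 0.0, m, []
    while n > 1:
        rate = b * comb(n, 2)        # each of the C(n,2) pairs coalesces at rate b
        t += rng.expovariate(rate)   # exponential holding time
        n -= 1
        path.append((t, n))
    return path
```

For instance, started from m = 2 the single holding time is Exp(b), so its mean is 1/b.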
This is a time-inhomogeneous Markov jump process starting at time α_0. (We call it the log-coalescent, to avoid confusion with Kingman's coalescent.) The transition probabilities of the log-coalescent are denoted by p^m_n(α, β). From the time-homogeneity of λ, a corresponding homogeneity property of these transition probabilities follows. Since the transition probabilities of Kingman's coalescent λ can be calculated explicitly (see for instance Tavaré [Tav84, formula (6.1)]), we get the transition probabilities of the log-coalescent (restricting to m ≥ n ≥ 1), and p^m_0(β, 1) ≡ 1 if 0 = α < β. In addition, we now allow a (deterministic) immigration of particles in the log-coalescent.
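For concreteness, here is a sketch of Tavaré's closed formula [Tav84, formula (6.1)] for the transition probabilities of the rate-b coalescent (helper names are ours); log-coalescent transition probabilities then follow by the logarithmic time substitution mentioned above.

```python
from math import exp, factorial

def _rising(a, k):
    """Rising factorial a (a+1) ... (a+k-1)."""
    out = 1
    for i in range(k):
        out *= a + i
    return out

def _falling(a, k):
    """Falling factorial a (a-1) ... (a-k+1)."""
    out = 1
    for i in range(k):
        out *= a - i
    return out

def kingman_transition(m, n, t, b=1.0):
    """P(n particles at time t | m particles at time 0) for the rate-b
    coalescent, via Tavare's alternating-sum formula."""
    total = 0.0
    for k in range(n, m + 1):
        total += (exp(-b * k * (k - 1) * t / 2.0)
                  * (2 * k - 1) * (-1) ** (k - n)
                  * _rising(n, k - 1) * _falling(m, k)
                  / (factorial(n) * factorial(k - n) * _rising(m, k)))
    return total
```

For m = 2 this reduces to P(2 → 2) = e^{−bt} and P(2 → 1) = 1 − e^{−bt}, and the Chapman-Kolmogorov equation holds exactly.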

Coalescing walk starting in spreading multi-colonies
Before we proceed further, in this subsection we demonstrate first in a simpler situation the role which is played by the log-coalescent with immigration. We restate a limit proposition concerning a coalescing random walk starting in (spreading) multi-colonies. In fact, Proposition 3.28 of [FG94] (which is analogous to Theorem 6 in [CG86]), with the now obvious identification of the limit probabilities, can be specialized as follows (formally we also include the case m i = 0). Recall the rings Ξ[r, c] of (35).
Roughly speaking, start the coalescing random walk η with a superposition of ℓ + 1 subpopulations ϕ^0, ..., ϕ^ℓ, where pairs of particles from ϕ^j(t) spread with the relative velocity α_j whereas pairs from different subpopulations ϕ^j and ϕ^{j'} spread with the relative speed α_j ∨ α_{j'}. Then the number of particles at the late time N^{βt} is approximately given by the log-coalescent λ = λ^{m_0,...,m_ℓ}_{α_0,...,α_ℓ} at time β, with immigration of m_0, ..., m_ℓ particles at times α_0, ..., α_ℓ, respectively. Note that only requirements on the relative positions of particles in the initial populations are involved (in contrast to the scaling limit Theorem 4 below on the coalescing random walk with immigrating multi-colonies).
Remark 3.3 If the condition α_0, ..., α_ℓ ≤ β in Proposition 3.2 is violated by some α_j, then the walks starting with particles of this speed α_j cannot interact by time N^{βt} (with probability converging to 1 as t → ∞). So they simply evolve independently, and in the limit these particles have to be added to the number of particles arising from the log-coalescent. Consequently, that condition is natural in that it is adapted to the actual range of interaction of the coalescing random walk.

Ensembles Λ of log-coalescents with immigration
In this subsection we introduce the limiting object for coalescing random walks with immigration of spreading populations (multi-colonies). To avoid repeating cumbersome notation, we formulate a condition which we call the α≤β-Condition (recall Remark 3.3).

◊
We now want to introduce what we call an ensemble of log-coalescents with immigration, see Figure 7. It will be used in the next subsection to describe a more general version of Proposition 3.2 above, namely a scaling limit for the coalescing random walk with immigrating multi-colonies.
Roughly speaking, several log-coalescents with immigration evolve independently until they reach certain prescribed deterministic times β_1 < ··· < β_k, respectively. In addition, we have a tagged population (related to the horizontal lines in the figure). From the times β_1, ..., β_k on, the respective log-coalescents start to interact with the tagged population. (Recall that the coalescing rate b was set to 1.) In particular, Λ(β_{k+1}) denotes the terminal number of particles in the whole system. We call Λ the ensemble of log-coalescents with immigration and parameters ᾱ, m̄, β̄.
◊

By a generalized time-homogeneity, the following recursion formula holds for k ≥ 1, n ≥ 0. Note that the number of non-vanishing terms in the sum is bounded by Σ_{i,j} m_{i,j}, hence finite. Clearly, (48) generalizes to the following homogeneity property.

Definition 3.6 (ensemble of coalescents without immigration) If ℓ_0 = ··· = ℓ_k = 0 in the α≤β-Condition 3.4 and in Definition 3.5, then we call Λ an ensemble of coalescents without immigration, and write simply Λ^{ᾱ;m̄}_{β̄}.

◊
Remark 3.7 Note that the term "without immigration" refers only to the fact that within the (randomly) immigrating branches of Definition 3.5 (b) no immigration occurs.

Coalescing walk with immigration: Multi-colonies
Now we will formulate the announced scaling limit theorem for the coalescing random walk with immigrating multi-colonies spreading moderately (recall Definition 2.8 at p. 21): On a macroscopic scale the latter behaves as an ensemble of log-coalescents with immigration. (Recall (50).) Theorem 4 (scaling limit with immigrating multi-colonies) Fix a constant c ≥ 1, and suppose the α≤β-Condition 3.4. Consider immigrating multi-colonies. Then for the terminal population size of the related coalescing random walk with immigration we get convergence in law, with Λ = Λ^{ᾱ;m̄}_{β̄} the ensemble of log-coalescents with immigration.
Remark 3.8 Note that the limit process Λ is independent of the jump rate κ of the underlying random walk and of the parameter N of Ξ. Also, the limits are non-degenerate except in some boundary cases, e.g. if k = 0, ℓ_0 = 0 and m_{0,0} = 1, implying Λ(β) ≡ 1. Recall also that the limit law satisfies the recursion formula (51).
Proof By the Markov property and the generalized time-homogeneity as in (28) at p. 16, we can rewrite the expression as

= E^{ϕ^0,...,ϕ^{k−1}}_{s_0,...,s_{k−1}} P^{η(s_k−)+ϕ^k} [η'(s_{k+1} − s_k) = n]   (54)

with η' denoting an independent copy of η. According to the speed of spread of multi-colonies Proposition 2.9 at p. 22, we may assume that the subpopulation η(s_k(t)−) belongs to the set Φ_t[β_k; ≤ m̄_{k−1}; 2^k c], whereas for the other subpopulation, ϕ^k(t) ∈ Φ_t[ᾱ_k, m̄_k; c] ⊆ Φ_t[ᾱ_k, m̄_k; 2^k c] holds by assumption. Moreover, by the walk speed Lemma 2.6 at p. 20, the relative speed of mixed pairs ξ, ζ of particles can be determined uniformly: ξ − ζ ∈ Ξ[β_k t, 2^k c], since ξ arises from a walk starting at time s_{k−1} with a particle having a speed ≤ β_{k−1}. Altogether, the two subpopulations related to the two summands in η(s_k−) + ϕ^k fulfill the requirements of the scaling limit Proposition 3.2 for multi-colonies (with the parameter β_k). Hence, given η(s_k(t)−) = n', the probability expression appearing in (54) has a limit which is given by p^{m_k,n'}_{α_k,β_k}(β_{k+1}, n) (recall (46) for the latter). Now by the induction hypothesis the statement on the population sizes is true for some k − 1 ≥ 0, and hence the laws of η(s_k−) under P^{ϕ^0,...,ϕ^{k−1}}_{s_0,...,s_{k−1}} converge accordingly. Combined with the previous convergence statement for the probability conditioned on η(s_k−) = n', we arrive at the r.h.s. of the recursion formula (51), since the number of terms over which we sum is finite. This completes the proof by induction.

Coalescing walk with immigration: Colonies of common speed
Occasionally the α≤β-Condition 3.4 is not satisfied; therefore we now prepare a tool to treat such a situation. It comes up when single colonies immigrate at a sequence of time points and spread at a common speed α: On a macroscopic scale, by time α such a coalescing random walk with immigration behaves like a system of non-interacting particles, and from time α on like an ensemble of log-coalescents without immigration (recall Definition 3.6).
The limit object looks as follows: The tagged population and all the branches of Λ start at time α, namely with m 0 + · · · + m J−1 , m J , ..., m k particles, respectively. They evolve independently as log-coalescents without immigration, until the branches coalesce with the tagged population at the times β J , ..., β k , respectively.
Proof Since all the immigrating particles have absolute and relative speed α, by time N^{αt} none of them can interact, by the walk speed Lemma 2.6. More precisely, by that lemma, the corresponding event has probability converging to 1 as t → ∞. But starting at time N^{αt} we may apply Theorem 4, specialized to "single colonies", to get the claim of the proposition.

Coalescing walk with immigration: Exponential immigration time increments
Here we deal with a different time regime: Single populations with a common spreading speed immigrate, but now the immigration time increments are of the form N^{β_i t}, and actually of a decreasing order. The limit is again an ensemble of log-coalescents without immigration.
In the limit object, the tagged population and all the branches of Λ start at time α, namely with m k , ..., m 0 particles, respectively. They evolve independently as log-coalescents without immigration, until the branches coalesce with the tagged population at the times β k , ..., β 1 , respectively.
Proof The proof proceeds in two qualitatively different steps: First we analyze the evolution up to time s_k(t), and then we provide the final step from time s_k(t) to N^t.

1 • (initial population) By the speed of spread Proposition 2.9 and the scaling limit Proposition 3.2, we conclude that after the first step the relevant convergence holds (with probability converging to 1 as t → ∞). In the following time steps of macroscopic size β_i < β_1, this subpopulation η^{ϕ^0}_{s_0}(s_1−) further behaves (asymptotically) as a system of independent random walks (walk speed Lemma 2.6), which at time s_k satisfies the corresponding spread condition (repeated use of Proposition 2.9).

2 • (second immigration) By definition, ϕ^1 ∈ Φ_t[α, m_1; c] additionally immigrates at time s_1. By the parameter assumption (55), during the subsequent time increments these new particles cannot interact with the subpopulation of 1 • (Lemma 2.6). On the other hand, their own evolution is similar to that of the initial population: ϕ^1 results at time s_k in a corresponding subpopulation.

3 • (all immigrants) Continuing to argue in this way, at time s_k− we finally get k independent subpopulations (with probability converging to one).

4 • (final step) Define (t) by s_k(t) = N^{(t)t}. For the final step from time s_k to N^t, we may apply the scaling limit Proposition 3.2 for multi-colonies with ϕ^0, ..., ϕ^ℓ replaced by χ^0, ..., χ^{k−1}, ϕ^k, given n_0, ..., n_{k−1}. In fact, also mixed pairs of particles from the total population at time s_k satisfy the spreading condition (49), by the walk speed Lemma 2.6. Therefore,

L[η^{ϕ^0,...,ϕ^k}_{s_0,...,s_k}(N^t)] ⇒ (t → ∞) L[λ^{n_0,...,n_{k−1},m_k}_{β_1,...,β_k,α}(1)] = L[λ^{m_k,n_{k−1},...,n_0}_{α,β_k,...,β_1}(1)],

where [n_0, ..., n_{k−1}] is random, independent of the evolution, and equal in law to the independent vector [λ^{m_0}_α(β_1), ..., λ^{m_{k−1}}_α(β_k)].
But according to the Definition 3.5 of the ensemble of log-coalescents, specialized to the case without immigration, this limiting object can be described as claimed, finishing the proof.
□

4 Duality of Y^θ and Λ

In Theorem 4 (p. 27) of the previous section we learned that on a large space and time scale the coalescing random walk with immigrating multi-colonies can be described by an ensemble Λ of log-coalescents with immigration. In order to calculate probabilities for this limit process we use a duality relation with an object much simpler to handle. In fact, the main result of this section (Theorem 5) says that the limiting system is in duality with the transformed Fisher-Wright tree of Definition 1.9.

Duality of Y θ and Λ
By definition, the Fisher-Wright diffusion Y is a diffusion process on the interval [0, 1] with generator determined by the differential operator (1/2) b (r − r²) ∂²/∂r², 0 ≤ r ≤ 1. Recall that the terminal state Y(∞) ∈ {0, 1} is reached already after a finite time.
Consequently, recalling the action of the Fisher-Wright generator on h(n, ·), the generators of the (time-homogeneous) Markov processes λ and Y are in duality, and we get the well-known duality between Kingman's [Kin82] coalescent λ and the Fisher-Wright diffusion Y (both with parameter b and starting at time 0):

IE_y (Y(t))^n = IE_n y^{λ(t)},  n ≥ 1, y ∈ [0, 1]

(Tavaré [Tav84]). Switch to the standard situation b = 1. Turning to a logarithmic scale, we will generalize this duality relation in Theorem 5 below. It will tell us that the generating function of the terminal number of particles in the ensemble Λ of log-coalescents with immigration can be expressed via moments of the transformed Fisher-Wright tree Y^θ. (The definitions of Y^θ and Λ were given in 1.9 and 3.5 at pp. 7 and 26, respectively.) The duality holds for 0 ≤ θ ≤ 1, with Y^θ the transformed Fisher-Wright tree of Definition 1.9.
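The duality between the Fisher-Wright diffusion and Kingman's coalescent can be checked numerically. Below is a rough sketch of ours, not from the paper: a clipped Euler scheme for a single Fisher-Wright diffusion, compared with the duality prediction for the second moment, namely E_y[Y_t²] = y²e^{−bt} + y(1 − e^{−bt}), obtained from a coalescent started with λ(0) = 2; the first moment is the martingale value y. The Euler scheme is not boundary-exact, so we only expect agreement up to Monte Carlo and discretization error.

```python
import random
from math import exp, sqrt

def fw_moments_mc(y=0.3, b=1.0, t=1.0, dt=0.01, n_paths=4000, seed=3):
    """Monte Carlo estimates of E[Y_t] and E[Y_t^2] for the SDE
    dY = sqrt(b Y (1-Y)) dw, via Euler steps clipped to [0, 1]."""
    rng = random.Random(seed)
    steps = int(round(t / dt))
    s1 = s2 = 0.0
    for _ in range(n_paths):
        yv = y
        for _ in range(steps):
            yv += sqrt(max(b * yv * (1.0 - yv), 0.0)) * rng.gauss(0.0, sqrt(dt))
            yv = min(1.0, max(0.0, yv))   # 0 and 1 are absorbing traps
        s1 += yv
        s2 += yv * yv
    return s1 / n_paths, s2 / n_paths

def dual_second_moment(y, b, t):
    """E_y[Y_t^2] predicted by the coalescent dual with two particles."""
    return y * y * exp(-b * t) + y * (1.0 - exp(-b * t))
```

The agreement of the simulated second moment with `dual_second_moment` illustrates the displayed duality for n = 2; higher moments work the same way with λ(0) = n.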
Example 4.1 In the special case ℓ_i ≡ 0, m_{i,0} ≡ m_i ≤ 1, the r.h.s. of (59) simplifies to an expression involving only the transformed Fisher-Wright diffusion Y^θ defined in (9). In fact, condition first on the trunk; then all branches become conditionally independent. Next, for all i with m_i = 1, we can use the martingale property of the Fisher-Wright diffusion to replace the (conditional) expectation over the k independent branches by their termination points. This gives the l.h.s. of (60). Then apply again the martingale property. Note in particular that here the α_{i,0} are irrelevant. This is immediately clear from the ensemble of log-coalescents, since there is at most one particle in each branch, which cannot react before its termination time; hence its "age" is irrelevant. For convenience, as a preparation we first expose some elementary properties of the Fisher-Wright tree Y^θ from Definition 1.7: Lemma 4.2 (elementary properties of Y^θ) With respect to P^θ: (a) (exchangeability) Given a splitting point Y^θ_∞(s_i) for a branch, the corresponding branch Y^θ_{s_i} and the trunk from s_i on have the same law; is an independent pair, given F(s_k).
Proof of Theorem 5 We proceed again by induction on k, the number of immigration time points of Λ (the number of branches in Y θ ).
1 • (initial step of induction) In the case k = 0 the ensemble Λ reduces to a single log-coalescent λ = λ^{m_0}_{α_0} with immigration. By formula (6.2) in [FG94], the generating function related to its terminal number λ(1) is given by (61). Recalling Y^θ_0 = Y^θ yields (59) in the case k = 0.
2 • (induction step) Let k ≥ 1. Then by the recursion formula (51) and the homogeneity property (52), the l.h.s. of (59) can be written as (62). By the initial step of induction (recall (61)), the innermost sum can be rewritten, where we use Lemma 4.3 (a) (with i = k). Inserting this into (62), interchanging the expectation E^θ with the summation, and further rearranging yields a new expression, where we additionally use the homogeneity property (52) with c = 1/β_k.
Assume now that (59) is valid for some k − 1 ≥ 0 (induction hypothesis). Then the latter sum can be evaluated accordingly. Finally, by the conditional independence property 4.3 (c), the resulting expression can be rewritten; but this is equal to the r.h.s. of (59), finishing the proof by induction.

5 Duality, Coupling and Comparison
In this section we compile some basic methods to prove limit theorems for the interacting diffusion X as introduced in § 1.1. These basic tools combined will allow us to prove the key result of this section, Theorem 6, which asserts the universality of the limits obtained for the special case of interacting Fisher-Wright diffusions starting with product initial laws. Furthermore, using Sections 4 and 2, we actually see in Theorem 6 that everything boils down to coalescing random walks with immigration, an object studied in Section 3.
The methods needed are the following: a time-space duality of interacting Fisher-Wright diffusions, which is particularly useful in the case of i.i.d. initial components; a successful coupling, enabling us to generalize from product measures to arbitrary initial states in T_θ; and a moment comparison, to provide the step from the Fisher-Wright case g = bf, b > 0, to general diffusion coefficients g in G_0.

Time-space duality of X and ϑ
It is convenient to write the defining equation (1) for X in the form (63), with migration rate κ and migration probabilities defined in (25) and (26), respectively, and with δ_{ξ,ζ} = 1 if ξ = ζ and δ_{ξ,ζ} = 0 otherwise. We now develop a time-space duality between the interacting Fisher-Wright diffusion X (with diffusion parameter b > 0) and the delayed coalescing random walk ϑ with immigration (with coalescing rate b > 0). First recall that a single Fisher-Wright diffusion and Kingman's coalescent are in duality as written in (58). Taking into account that the drift term in the interacting diffusion (63) is related to a continuous-time random walk determined by κq, relation (58) generalizes to Shiga's [Shi80] duality relation (64) between the interacting Fisher-Wright diffusion X and the delayed coalescing random walk ϑ (Ψ was defined before (27)). (Here the notation z^ψ := Π_{ξ∈Ξ} z_ξ^{ψ_ξ} is used.) This relates all the multivariate moments of X_t with the generating functions of ϑ(t).
Since we want to study not only the law of the interacting diffusion at a single time t, but rather the whole path up to time t, we actually need the distributions of the process X viewed backwards from a "late" time point, say t_{k+1}. Hence we want to calculate moments of the form

IE^b_z [ X^{ψ^0}_{t_{k+1}−t_0} ··· X^{ψ^k}_{t_{k+1}−t_k} ]

with backward time points 0 =: t_0 < t_1 < ... < t_{k+1} (viewed from t_{k+1}). These moments can again be expressed via generating functions of a delayed coalescing random walk, but now with immigration of particles exactly at those fixed time points t_1, ..., t_k. Here is the needed generalization of duality to multiple time points (recall that b > 0): Proposition 5.1 (time-space duality of X and ϑ) For z ∈ [0, 1]^Ξ, k ≥ 0, ψ^0, ..., ψ^k ∈ Ψ and 0 ≤ t_0 < ... < t_{k+1}, the duality relation (65) holds. Consequently, the duality formula (65) relates the moments of the interacting Fisher-Wright diffusion X (starting at z) of orders ψ^0, ..., ψ^k at times looked at backwards from t_{k+1}, namely at the times t_{k+1} − t_0, ..., t_{k+1} − t_k, with the generating functions of the delayed coalescing random walk ϑ with immigrating populations ψ^0, ..., ψ^k at the forward times t_0, ..., t_k, respectively.
Remark 5.2 Only the "antitone" order in the duality relation (65) is important; that is, one can interchange the roles of forward and backward times.

◊
Proof of Proposition 5.1 The proof is by induction on k. For k = 0 we are back to the original duality relation (64), since X and ϑ without additional immigration are both time-homogeneous. Let k ≥ 1. Apply the Markov property at the "earliest forward" time t_{k+1} − t_k and the time-homogeneity of X to rewrite the l.h.s. of (65), where X' is an independent copy of X. Now assume that the assertion (65) is true for k − 1 ≥ 0 (instead of k). Applying (65) to X' we can continue the computation. Apply the original duality relation (64) (that is, the initial step of induction) to arrive at an expression where ϑ' is an independent copy of ϑ. The interior generating function expression can be reformulated using the generalized time-homogeneity as in (28). This finishes the proof of (65) by induction.

Successful coupling in the Fisher-Wright case
Coupling will actually be used twofold: first, to get rid of the independence assumptions on the initial state X(0) under which the duality (65) is still tractable; second, to truncate initial states in order to handle the restricted interacting Fisher-Wright diffusions needed in § 5.3. To prepare for the second use we first modify our basic model from Definition 1.1 a bit.

Definition 5.3 (diffusion coefficients in G)
Let G ⊃ G_0 denote the set of all diffusion coefficients g which are defined as in G_0 (recall Definition 1.1 (d) at p. 4), except that we require strict positivity of g only on a non-empty subinterval of (0, 1). Note that the definition of the interacting diffusion X as the strong solution to (63) still makes sense for these more general g ∈ G.
Definition 5.4 (coupling principle) Fix g ∈ G and two initial laws µ, ν on [0,1]^Ξ. Let Γ be a distribution on [0,1]^Ξ × [0,1]^Ξ with marginals µ, ν. Choose [X(0), X′(0)] according to Γ, and solve (63) separately starting from X(0) and X′(0), respectively, but using the same collection {w_ξ ; ξ ∈ Ξ} of driving standard Brownian motions (recall that we work with the unique strong solution of (63)). Then the bivariate process [X, X′] is called the coupling of the interacting diffusions X and X′ with diffusion coefficient g and joint initial law Γ. Write IP^g_Γ for its distribution, and IP^g_{[x,y]} in the degenerate case Γ = δ_x × δ_y.
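A minimal numerical sketch of this coupling principle (our own illustration: the finite index set, the mean-field drift standing in for the migration term of (63), and all parameter names are simplifying assumptions): fixing the seed fixes the driving Brownian increments, so two runs with the same seed but different initial states realize the coupled pair, and the strong solution is a function of the initial state and the noise alone.

```python
import numpy as np

def evolve(z0, seed, c=0.5, b=1.0, dt=0.01, steps=2000):
    """Euler scheme for a finite interacting Fisher-Wright system.  The seed fixes
    the collection of driving Brownian increments, so two calls with the SAME seed
    but different initial states play the role of the coupling of Definition 5.4."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        dw = np.sqrt(dt) * rng.standard_normal(z.size)
        drift = c * (z.mean() - z)                      # mean-field stand-in for migration
        noise = np.sqrt(np.clip(b * z * (1.0 - z), 0.0, None)) * dw
        z = np.clip(z + drift * dt + noise, 0.0, 1.0)
    return z

x0 = np.linspace(0.25, 0.35, 10)
y0 = x0 + 0.05                                  # a second initial state
x, y = evolve(x0, seed=7), evolve(y0, seed=7)   # coupled: same Brownian motions
x_again = evolve(x0, seed=7)                    # strong solution: function of (z0, noise)
print(float(np.abs(x - y).mean()))
```

Note that rerunning with the same seed and the same initial state reproduces the path exactly; this pathwise dependence on the noise is what makes the bivariate construction a genuine coupling.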

We now use this coupling concept to control the effect of a particular truncation of the initial state. For this purpose, for 0 ≤ ε ≤ 1/3 and z ∈ [0,1]^Ξ define the truncated configuration z_ε ∈ [ε, 1−ε]^Ξ by z_ε(ξ) := ε ∨ (z(ξ) ∧ (1−ε)), ξ ∈ Ξ. Moreover, if z is distributed according to µ, then we write µ_ε for the "truncated law", that is, for the distribution of z_ε.
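The truncation map is elementary; a sketch for a finite window of components (our own illustration, reading the truncation as componentwise clipping into [ε, 1−ε]):

```python
import numpy as np

def truncate(z, eps):
    """Truncated configuration z_eps: each component is clipped into [eps, 1 - eps].
    (Our reading of the truncation; requires 0 <= eps <= 1/3 as in the text.)"""
    assert 0.0 <= eps <= 1.0 / 3.0
    return np.clip(np.asarray(z, dtype=float), eps, 1.0 - eps)

z = np.array([0.0, 0.05, 0.5, 0.97, 1.0])
print(truncate(z, 0.1))   # -> [0.1 0.1 0.5 0.9 0.9]
```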
Proof For (a), see the proof of Lemma 4.6 in [FG94], whereas (b) is obvious.

Now we come to the main point of this subsection concerning interacting Fisher-Wright diffusions, namely to recall Proposition 5.11 of [FG94]. It says, roughly speaking, that coupled processes started with the same initial density θ approach each other as time increases, due to the fact that the same driving Brownian motions are used:

Lemma 5.6 (successful coupling of interacting Fisher-Wright diffusions) Assume that g = bf, b > 0. Let µ, ν ∈ T_θ. Then the coupling [X, X′] of interacting Fisher-Wright diffusions with joint initial law µ × ν is successful.

Successful coupling will enable us to switch from product initial laws µ ∈ T_θ to general ν ∈ T_θ.

Comparison with restricted Fisher-Wright diffusions
Since the limit processes U, V and W of Theorems 1, 2 and 3 do not depend on the diffusion coefficient g ∈ G_0, our basic method to obtain this universality in g is a comparison principle with (restricted) interacting Fisher-Wright diffusions. This is a special case of a general comparison principle proved in Cox et al. [CFG96]. The starting point is the fact (see Figure 8) that for each ε ∈ (0, 1/3] a given g ∈ G_0 can be bounded as follows: g_ε := b_ε f_ε ≤ g ≤ b_1 f for suitable constants b_ε, b_1 > 0 (recall that g is strictly positive on (0, 1) and Lipschitz). Here g_ε belongs to the more general set G ⊃ G_0 of diffusion coefficients introduced in Definition 5.3. We call g_ε a restricted Fisher-Wright diffusion coefficient. It is needed for the case of a diffusion coefficient g with a vanishing derivative at a boundary point of [0, 1] (as, for instance, in the Ohta-Kimura case g = f^2).
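For the Ohta-Kimura case g = f^2, with f(x) = x(1−x), the sandwich b_ε f_ε ≤ g ≤ b_1 f can be checked numerically. The sketch below is our own illustration; in particular the concrete form f_ε(x) = ((x−ε)(1−ε−x))⁺ is our assumption for a Fisher-Wright coefficient restricted to [ε, 1−ε], not a formula quoted from the paper.

```python
import numpy as np

f = lambda x: x * (1.0 - x)          # Fisher-Wright coefficient
g = lambda x: f(x) ** 2              # Ohta-Kimura coefficient: derivative vanishes at 0 and 1

eps = 0.1
# Assumed form of the restricted coefficient f_eps (supported on [eps, 1 - eps]):
f_eps = lambda x: np.clip((x - eps) * (1.0 - eps - x), 0.0, None)

x = np.linspace(0.0, 1.0, 10001)
b1 = 0.25                            # g = f^2 <= f / 4, since f <= 1/4 on [0, 1]
inside = (x > eps) & (x < 1.0 - eps)
b_eps = float(np.min(g(x[inside]) / f_eps(x[inside])))   # largest admissible b_eps on the grid

print(b_eps)
assert b_eps > 0.0
assert np.all(b_eps * f_eps(x) <= g(x) + 1e-12)   # lower bound g_eps <= g
assert np.all(g(x) <= b1 * f(x) + 1e-12)          # upper bound g <= b_1 f
```

The upper bound exists for every g ∈ G_0 because g is Lipschitz and vanishes at 0 and 1; the lower bound uses only that g is strictly positive on the compact interval [ε, 1−ε].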
Next we want to justify more formally why g ε is called a restricted Fisher-Wright diffusion coefficient.
Lemma 5.8 (restricted Fisher-Wright) If z belongs to the set [ε, 1−ε]^Ξ of restricted states, then under IP^{g_ε}_z, the transformed process L_ε X has the law IP^{b_ε}_{L_ε z} on [0,1]^Ξ.
Consequently, for truncated initial states, L_ε X is an interacting Fisher-Wright diffusion on [0,1]^Ξ with diffusion parameter b_ε.

Universality conclusion
The purpose of this subsection is to demonstrate how coupling and comparison are combined to prove universality statements on interacting diffusions, that is, to reduce proofs to the Fisher-Wright case started from a product initial law. The latter case amounts, using the time-space duality relation (65) and the approximation Proposition 2.2, to showing a limit assertion on coalescing random walks η with immigration.
Theorem 6 (universality conclusion) Fix natural numbers k ≥ 0 and n_i ≥ m_i ≥ 0, 0 ≤ i ≤ k. For t > 1, let time points s_0(t) < ··· < s_{k+1}(t) be given. Assume the coalescing random walk η with immigration satisfies the convergence (71), where A_{m_0,...,m_k}(θ) is independent of b > 0. Then, for every g ∈ G_0 and µ ∈ T_θ, 0 < θ < 1, the corresponding interacting diffusion X satisfies the convergence (72).

Proof Step 1 • We show that, without loss of generality, in (72) we may restrict to the Fisher-Wright case g = bf, b > 0. Indeed, fix 0 < ε ≤ 1/3 and, for the given g ∈ G_0, choose b_ε, b_1 > 0 such that g_ε = b_ε f_ε ≤ g ≤ b_1 f. Apply the moment comparison (68) to see first that we only have to deal with the lower bound once we know (72) in the Fisher-Wright case. By the truncation Lemma 5.5, up to a uniform error O(ε), we can replace µ by the ε-truncated law µ_ε ∈ T_{θ_ε} with θ_ε → θ as ε → 0.
Step 2 • Since we are now in the Fisher-Wright case g = bf, b > 0, we may apply the successful coupling Lemma 5.6 to reduce (72) to product initial laws µ ∈ T_θ. On the other hand, by the time-space duality (65), the l.h.s. of (72) can be rewritten as E^{ψ_0,...,ψ_k}_{s_0,...,s_k} [ z^{ϑ(s_{k+1})} ].
By the approximation Proposition 2.2 we may replace this r.h.s. by E^{ψ_0,...,ψ_k}_{s_0,...,s_k} [ θ^{η(s_{k+1})} ], where we used the fact that by our reduction the initial state has i.i.d. components with expectation θ. This, however, is the l.h.s. of (71), and the proof is finished.

6 Limit statements for interacting diffusions
The purpose of this section is to use the results of Sections 2-5 to prove Theorems 1-3 of the introduction. As a warm-up we first prove the noise property of a single component process mentioned in § 1.3 above (p. 9).

Noise property of a single component process
Fix θ ∈ (0, 1) and recall that T θ denotes the set of all shift ergodic initial distributions with density θ.
Proposition 6.1 (stationary 0-1-noise of components) Let µ ∈ T_θ and g ∈ G_0. Fix ξ ∈ Ξ, k ≥ 0 and 0 < β_1 < ... < β_{k+1}. Then the limiting process is independent at each time point and stationary. This property essentially follows from the fact that in the Fisher-Wright dual, namely the delayed coalescing random walk with immigration, no particles interact in the present scaling regime.
Proof Fix µ, g, ξ, k and β_1, ..., β_{k+1} as in the proposition. Without loss of generality, assume β_{k+1} = 1, t = N_T and ξ = 0. We apply the method of moments. Choose integers n_1, ..., n_{k+1} ≥ 1. It suffices to show the moment convergence (75). According to Remark 5.2, one can interchange the antitone order in the time-space duality Proposition 5.1. Applied together with Theorem 6, with ψ_i = n_i δ_0 and s_0(T) ≡ 0, this means that (75) will follow from the corresponding statement (76) for the coalescing random walk η. However, by the walk speed Lemma 2.6 at p. 20, at the first immigration time σ_k(T) = (1 − β_k)N_T the initial particle (which started at time σ_{k+1}(T) = 0 at the origin) is located in Ξ[T, 1] (defined in (35)) with probability approaching one as T → ∞. Consequently, it is at a distance of order T + o(T) from the next immigrating particle. During the time σ_{k−1}(T) − σ_k(T) = (β_k − β_{k−1})N_T until the next immigration, the resulting difference walk again moves away a distance of order T + o(T). Hence these particles do not meet, and both stay away from the origin. Continuing in this way, we see that η(N_T) = k + 1 with probability converging to 1 as T → ∞. Hence (76) holds, finishing the proof.

We shall deduce Theorem 1 from a more general statement which allows us to look simultaneously at several component processes; it is interesting in its own right, despite the cumbersome notation one has to use.
(b) (characterization of the limit laws) This family of limit laws L^{α;m}_β is characterized by the fact that the probability for all components to be equal to 1 is given in terms of Y^θ, the transformed Fisher-Wright tree of Definition 1.9 at p. 7.
In particular, the distribution of the limit array is a mixture of Bernoulli product laws: first realize the "weights" according to the distribution P^θ of the transformed Fisher-Wright tree Y^θ, and then form the product laws with the corresponding marginals.

Proof (a) Set s_i(T) := N_{β_i T}, 0 ≤ i ≤ k. In order to apply the method of moments, take "T-independent multiples" ψ_i(T) of ϕ_i(T), that is, ψ_i(T) ∈ Ψ satisfying ψ_i(T) ∧ 1 = ϕ_i(T) and with multiplicities ψ^i_ζ(T) > 0 independent of T. We want to show the moment convergence (79), with Λ the ensemble of log-coalescents with immigration of Definition 3.5. Since the r.h.s. of (79) is independent of b, according to the universality conclusion Theorem 6 it suffices to show (79) with the l.h.s. replaced by E^{ϕ_0(T),...,ϕ_k(T)}_{s_0(T),...,s_k(T)} [ θ^{η(N_T)} ] (recall that ψ_i ∧ 1 = ϕ_i). But then the claim (79) follows by the scaling limit Theorem 4 at p. 27 on coalescing random walks with immigrating multi-colonies. Hence the limit law L^{α;m}_β exists and is concentrated on {0,1}^{|m|}, since the limit expression in (79) is independent of the orders ψ^i_ζ(T) > 0 of the moments on the l.h.s. of (79).
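The concluding step uses an elementary fact worth making explicit: for a [0,1]-valued variable X, if E[X^n] does not depend on the order n ≥ 1, then E[X(1−X)] = E[X] − E[X^2] = 0, so X is {0,1}-valued almost surely. A toy check by exact enumeration (our own illustration, not from the paper):

```python
# Exact n-th moment of a discrete law on [0, 1].
def moment(support, probs, n):
    return sum(p * v ** n for v, p in zip(support, probs))

# For a {0,1}-valued variable all moments of order n >= 1 coincide ...
bern = ([0.0, 1.0], [0.7, 0.3])
m1, m5 = moment(*bern, 1), moment(*bern, 5)

# ... while any mass in the open interval (0,1) makes E[X^2] strictly smaller
# than E[X]: the gap equals E[X(1-X)] > 0.
mixed = ([0.0, 0.5, 1.0], [0.4, 0.3, 0.3])
gap = moment(*mixed, 1) - moment(*mixed, 2)   # = 0.3 * 0.5 * 0.5
print(m1, m5, gap)
```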
Proof of Theorem 1 Specialize the assumptions in Theorem 7 as follows: i ≡ 0, m_{i,0} ≡ m_i ≤ 1, α_{i,0} ≡ 0, and ϕ_i ≡ δ_ξ if m_i = 1. Then claims (a) and (b) of Theorem 7 imply the corresponding ones of Theorem 1, as in particular already explained in Example 4.1 at p. 31.
The marginal laws L(U^∞_β) are given by the basic ergodic theorem (14). Since we can map r ↦ 1 − r, we may fix our attention on the case U^∞_0 = ∂ = 1. We need to show the identity (80). Its l.h.s. coincides with the monotone limit lim_{n→∞} IP[U^∞_{k2^{−n}β} = 1, 0 ≤ k ≤ 2^n], which by the identity in (b) is equal to lim_{m→∞} E[(Y^θ_t)^m] with t := log(1/β).
If we restrict the latter expectation additionally to the event on the r.h.s. of (80), then Y^θ_t = 1, and we actually arrive at the r.h.s. of (80). The remaining part of the expectation can be bounded from above by E[(Y^θ_t)^m; Y^θ_t < 1], which converges to 0 as m → ∞ by bounded convergence. This verifies (80) and shows that [U^∞_0, h_U] has the claimed law. Combined with (b), the remaining claim of (c) follows immediately. This finishes the proof.

6.3 Time-space thinned-out systems: Proof of Theorem 3

1 • (convergence and characterization) Fix µ ∈ T_θ, 0 < θ < 1, natural numbers k, m_0, ..., m_k ≥ 0, and constants 1 > β_1 > ··· > β_k ≥ α > 0 = β_0. For i ∈ {0, ..., k}, consider χ_i ∈ Φ with |χ_i| = m_i, and write ϕ_i for the spread-out population, giving the particles of χ_i an asymptotic distance αT, as in the thinning procedure of Definition 1.14 (a). As in the proof of Theorem 7, apply the method of moments and take "T-independent multiples" ψ_i ∈ Ψ of ϕ_i. In order to determine the limit in law as T → ∞ of the array of variables W^{β,α,T}_{ξ,i}, we look at the moments (82), namely IE^g_µ [ ∏_{i=0}^k X^{ψ_i(T)}_{N_T − s_i(T)} ].
By Proposition 3.10 at p. 29, this generating function in θ converges as T → ∞ to the corresponding one of the ensemble Λ of log-coalescents without immigration. But according to the duality Theorem 5, the latter generating function is given by the expression (83). Hence the moments (82) have the limit (83), and we conclude that the limiting field W^{β,α,∞} exists. Moreover, since (83) is independent of the orders ψ^i_ξ(T) > 0 of the moments, the limit W^{β,α,∞} is {0,1}^{Ξ×{0,...,k}}-valued. This implies claims (a) and (b) of the theorem.

2 • (a moment estimate) Consider (83) and condition on F(β_k). Then all factors become independent, and we can switch to a product of conditional moments. Moreover, by Jensen's inequality, these conditional moments can be bounded from below by the corresponding powers of first moments. But by the martingale property of the Fisher-Wright diffusion these expectations can be computed, arriving altogether at the lower estimate E_θ [ ∏_{i=0}^k (Y^θ_{β_i}(β_k))^{m_i} ] for (83). Since Y^θ_{β_k}(β_k) = Y^θ_0(β_k), we have actually reduced (83) by one factor. Hence, by induction, we end up with the lower estimate θ^{m_0+···+m_k} for (83). Consequently, from (24), we obtain the corresponding lower bound for the limit field.

3 • (association) By definition, a countable family of variables is associated if every two nondecreasing functions of this family, depending only on finitely many components and being square integrable with respect to the law of the family, are non-negatively correlated. Our last formula implies the non-negative correlation property for events C, D of the form (85). According to [Lin88] it suffices to have this property for all increasing events (depending only on finitely many components). Since the variables are {0,1}-valued, all increasing events are of the form (85), and the needed property follows. Hence the limit field W^{β,α,∞} is associated.
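In the simplest setting the association claim can be verified exactly: if the components are conditionally i.i.d. Bernoulli(Θ) given a random weight Θ (as in the mixture representation of Theorem 7 (b)), then every pair of increasing events is non-negatively correlated. A toy enumeration (our own illustration; the two-point mixing law is an arbitrary choice standing in for the law of the tree functional):

```python
from itertools import product
from math import prod

# Mixing law for the random weight Theta: (value, probability) pairs.
weights = [(0.2, 0.5), (0.8, 0.5)]

def p(config):
    """Exact probability of a {0,1}^3 configuration: given Theta = t, the three
    components are i.i.d. Bernoulli(t); then average over the mixing law."""
    return sum(w * prod(t if c else 1.0 - t for c in config) for t, w in weights)

def expect(f):
    return sum(p(cfg) * f(cfg) for cfg in product((0, 1), repeat=3))

# Two increasing 0-1 events (nondecreasing functions of the configuration):
C = lambda cfg: float(cfg[0] == 1)
D = lambda cfg: float(cfg[1] == 1 and cfg[2] == 1)

cov = expect(lambda cfg: C(cfg) * D(cfg)) - expect(C) * expect(D)
print(cov)   # equals Var-type term E[Theta^3] - E[Theta] E[Theta^2] >= 0
```

Here cov = E[Θ^3] − E[Θ]·E[Θ^2] = 0.26 − 0.5·0.34 = 0.09 > 0, in line with the association of mixtures of product laws.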
2 • (restriction of the range of summation) Asymptotically as T → ∞ we may restrict the range of summation in (87) by additionally requiring (88) (recall the definition (34) of (r) at p. 20). Roughly speaking, we sum only over labels with absolute and relative speed α. To justify this restriction, first note that all terms of the sum in (87)

3 • (convergence) Set ϕ_i := δ_{ξ(M_{i−1}+1)} + ··· + δ_{ξ(M_i)}, 0 ≤ i ≤ k, with the ξ(j) from the range of summation in (87) but under the additional restriction (88). Note that ϕ_0 + ··· + ϕ_k belongs to Φ_T[α, M_k, 1] for all sufficiently large T. (We applied Definition 2.8 of spreading multi-colonies, specialized to a single colony.) Then a typical term of the sum in (87) can be written as (89). In order to calculate the limit of (89) as T → ∞, which then gives the limit of (87), we want to apply the universality conclusion Theorem 6. Therefore we look at E^{ϕ_0,...,ϕ_k}_{s_0,...,s_k} [ θ^{η(N_T)} ].