The Fleming-Viot limit of an interacting spatial population with fast density regulation

We consider population models in which the individuals reproduce, die and also migrate in space. The population size scales according to some parameter $N$, which can have different interpretations depending on the context. Each individual is assigned a mass of $1/N$ and the total mass in the system is called \emph{population density}. The dynamics has an intrinsic density regulation mechanism that drives the population density towards an equilibrium. We show that under a timescale separation between the \emph{slow} migration mechanism and the \emph{fast} density regulation mechanism, the population dynamics converges to a Fleming-Viot process as the scaling parameter $N$ approaches $\infty$. We first prove this result for a basic model in which the birth and death rates can only depend on the population density. In this case we obtain a \emph{neutral} Fleming-Viot process. We then extend this model by including position-dependence in the birth and death rates, as well as offspring dispersal and immigration mechanisms. We show how these extensions add \emph{mutation} and \emph{selection} to the limiting Fleming-Viot process. All the results are proved in a multi-type setting, where there are $q$ types of individuals interacting with each other. We illustrate the usefulness of our convergence result by discussing applications in population genetics and cell biology.


Introduction
Density-dependent models are well-known in population biology. In these models, the birth and death rates of individuals may depend on the density of the population, where the term density refers to the population size under a suitably chosen normalization. Many systems in ecology, epidemiology and immunology can be described by such models (see Thieme [35]). Considering the molecules of chemical species as the individuals in a population, we can also view a chemical reaction network as a density-dependent population model.
Density-dependent models are appealing because one can easily account for interactions among individuals by appropriately specifying the birth and death rates as functions of the population density. For competitive interactions, as in the Lotka-Volterra model (see [28,38,30]), the death rate increases with the population density, while for cooperative interactions, as in the Allee model (see [1]), the birth rate increases with the population density. Density-dependent models are good candidates for modeling natural populations that cannot grow indefinitely due to the limited availability of certain vital resources or due to severe competition at large population sizes. By having the death rate dominate the birth rate at large densities, one can ensure that the population density does not go beyond a certain threshold.
For a population having $q$ types of individuals, the population density is a $q$-dimensional vector whose $i$-th component gives the density of the population of the $i$-th type. For such a multi-type population, a density-dependent model can be written in the deterministic setting as a system of $q$ ordinary differential equations. If all the trajectories of this system stay within a compact set at all times, then we say that the population dynamics has a density regulation mechanism. Such a mechanism is called equilibrating if all the trajectories reach a fixed point for this system as time goes to infinity. In such a situation, this fixed point is called the equilibrium population density.
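To make the equilibrating case concrete, the following display sketches a minimal one-type example; the specific rate functions are our own illustrative choice, not taken from the models below.

```latex
% One-type example (q = 1): constant per-capita birth rate b and a
% death rate d + c h that grows with the density h (with b > d, c > 0).
% The density then obeys the logistic equation
\frac{dh}{dt} \;=\; h\,b \;-\; h\,(d + c\,h) \;=\; h\,\bigl(b - d - c\,h\bigr),
% whose trajectories started in (0,\infty) all converge to the fixed point
h_{eq} \;=\; \frac{b - d}{c},
% so this density regulation mechanism is equilibrating.
```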
In this paper, we will consider population models in which the individuals live in a geographical region E, that is a compact metric space. Even though we have a spatial structure, for us the population density will always denote the population size divided by a normalization parameter N. In other words, our notion of population density is global in the sense that it carries no information about the distribution of individuals in E. This is unlike other models of spatial populations where the population density is a spatially varying function specifying the local concentration of individuals at each location. The normalization parameter N will be a large positive integer which can have various interpretations depending on the context. In ecological models, N can be taken to be the carrying capacity of a habitat, which is the maximum number of individuals that the habitat can support with its resources. In epidemic models, N is usually the total population size, while in chemical reaction networks, N measures the volume of the system. In each of these cases, the population size at any time is of order N.
For the moment assume that all the individuals have the same type and that they reproduce, die and also migrate in E. At the time of birth, the offspring gets the same location as its parent. The population consists of approximately N individuals of a mass of 1/N each. The population density at any time is just the total mass of the individuals that are alive. Suppose that the birth and death rates of the individuals depend on the population density in such a way that they induce a density regulation mechanism which is equilibrating. We also assume that the migration mechanism operates at a timescale that is N times slower than the density regulation mechanism. In such a setting, we can view the dynamics of the empirical measure of the population as a measure-valued Markov process parameterized by N. Our goal is to understand how this family of Markov processes behaves as N → ∞. The population dynamics has two timescales separated by N. If we observe the process at the fast timescale, then the effect of migration vanishes in the limit and it is uninteresting to consider the population with a spatial structure in this case. Therefore we will observe the dynamics at the slow timescale and examine its behaviour in the infinite population limit. Since the density regulation mechanism is fast, it will have enough time to re-equilibrate the population density between any two events at the slow timescale. Hence in the limit N → ∞, we would expect the population density to remain equilibrated at all times. We will show that it is indeed the case. However our main task is to understand the dynamics of the spatial distribution of the population in the infinite population limit. We will prove that in the limit, the spatial distribution of the population evolves according to a Fleming-Viot process which takes values in the space of probability measures over E. 
This process was introduced in the context of population genetics by Fleming and Viot [17] in 1979 and it has been very well-studied since then. An excellent survey of Fleming-Viot processes is given by Ethier and Kurtz [13]. The model we just described will be called the basic model. In this model, the birth and death rates of an individual were density-dependent, but independent of the location of the individual. It shall be seen later that the limiting Fleming-Viot process in this case is neutral, in the parlance of population genetics. If we add small position-dependent terms to the birth and death rates, then in the limit we obtain a Fleming-Viot process with genetic selection. Perhaps unsurprisingly, altering the birth rate this way leads to fecundity selection, while altering the death rate leads to viability selection, in the limiting Fleming-Viot process. We also consider extensions of the basic model by allowing for offspring dispersal (offspring is born away from the parent) or immigration. Such extensions add extra mutations to the limiting Fleming-Viot process.
The results mentioned in the previous paragraph are proved in a multi-type setting. The population has q types of individuals and each type of individual can give birth to an individual of each type. All the individuals are migrating in E according to a type-dependent mechanism. Now we can view the joint dynamics of the empirical measures of the q subpopulations as a Markov process parameterized by N. We make similar assumptions on the dynamics as before. Again in the limit N → ∞, the population density (which is now a q-dimensional vector) stays at an equilibrium at all times. Assuming the irreducibility of an underlying interaction matrix, we show that in the limit all the q sub-populations become spatially inseparable. This means that on any patch of E, either there is no mass present or there is mass of each type present in a proportion determined by the equilibrium density. Moreover the spatial distribution of each of the q sub-populations evolves according to a single Fleming-Viot process. This Fleming-Viot process can be seen as describing the limiting dynamics of a mixed population, formed by taking a suitable density-dependent convex combination of the q sub-populations.
In ecological models, the individuals need resources to survive and reproduce. Normally in spatial population models, resources are assumed to have a fixed distribution in space. As individuals move, they find the unexploited resources and compete for them locally with other individuals present in their neighbourhood. Such a model is different from the models we consider in two ways. Firstly, due to the local nature of the interactions, the density is locally regulated rather than globally regulated as in our models. Secondly, since the discovery of resources is tied to the movement of individuals, it is reasonable to assume that both migration and birth-death mechanisms operate at the same timescale. For such spatial models, Oelschläger [31] has shown in a multi-type setting that the dynamics converges in the infinite population limit to a system of reaction-diffusion partial differential equations. Such equations are in widespread use in biology (see Fife [16]). We now discuss the conditions under which our models can be useful. Consider a situation where the resource is not fixed but rapidly mixing in the whole space. This resource is shared by all the individuals in the population. An individual may deplete the resource locally but its effect is felt globally due to the rapid mixing. This gives rise to global density dependence in a spatial population. If the individuals move very slowly in comparison to their resource consumption mechanism (which is linked to their birth and death mechanisms), then we have a situation in which our models can be used. This paper is motivated by our earlier work [19] in which we study the phenomenon of cell polarity using a model considered here. Cell polarity refers to the clustering of molecules on the cell membrane. This clustering is essential to trigger various other cellular processes, such as bud formation [4] or immune response [39]. 
Therefore understanding how cells establish and maintain polarity is of vital importance. In [2], Altschuler et al. devised a mathematical model for this phenomenon, by abstracting the mechanisms that are commonly found in cells exhibiting polarity. Their model has a fixed number of molecules that can either reside on the membrane or in the cytosol. These molecules move slowly on the membrane but diffuse rapidly in the cytosol. The dynamics has a positive feedback mechanism which allows a membrane molecule to pull a cytosol molecule to its location on the membrane. This mechanism is like a birth process in which a membrane molecule gives birth by exploiting the common resource (cytosol molecules) shared by all the membrane molecules. Since the migration of membrane molecules is slow and the mixing of the resource is fast, this model can be viewed as one of the models described in this paper (see Section 3.2 for details). Therefore the results in this paper are applicable and we obtain a Fleming-Viot process in the infinite population limit. In [19] we prove this convergence and use the limiting process to answer some interesting questions about the onset and structure of cell polarity. The model studied in [19] is rather simplistic as all the molecules are assumed to be identical. Most cells that exhibit polarity have molecules of many different types participating in the feedback mechanism and migrating on the membrane in different ways (see [10,4,34]). It is natural to ask if the Fleming-Viot convergence is valid in this general framework. The results in this paper show that it is indeed the case as long as certain basic elements of the dynamics are preserved. This ensures that the analysis in [19] can be extended to more complicated (and realistic!) models for cell polarity. We discuss this example further in Section 3.2.
Note that the geographical space E can be considered as the space of genetic traits. This casts our models into the setting of population genetics. The spatial migration can be seen as mutation that may happen at any time during the life of an individual, while the offspring dispersal mechanism is like mutation that can only happen at the time of birth of an offspring. We assume that the reproduction is clonal in the absence of mutation. The position-dependent birth and death mechanism is analogous to the selection mechanism in population genetics. Hence it is not surprising that spatial migration, offspring dispersal and position-dependent birth and death mechanisms correspond to mutation and selection in the limiting Fleming-Viot process. What is more interesting is that the sampling mechanism arises naturally from our models in the infinite population limit. This sampling mechanism is a key feature of the standard models in population genetics, such as the Wright-Fisher model, the Moran model and their variants (see [40,29,14]). This mechanism makes the models tractable by keeping the population size constant. It is done by matching the birth of an individual with the death of another individual chosen uniformly from the population. It is obvious that such a mechanism is quite unrealistic, at least for finite populations which are naturally fluctuating. However our Fleming-Viot convergence result shows that one can recover this sampling mechanism in the infinite population limit if the dynamics has an equilibrating density regulation mechanism that acts at a faster timescale than other events. It is well-known that a Fleming-Viot process arises in the infinite population limit of an appropriately scaled version of the Wright-Fisher or the Moran model (see [17] and [13]). 
Therefore if all the individuals have the same type (that is, $q = 1$) and $h_{eq}$ is the equilibrium population density, then for a large (but finite) value of the scaling parameter $N$, our models will have roughly the same dynamical behaviour as a suitably chosen Wright-Fisher or Moran model with the constant population size $N h_{eq}$. This insight provides a justification for the assumption of a constant population size in population genetics models. Most of the mathematical literature on population genetics is concerned with two types of questions. In the absence of mutation, one wants to know the probability and the time of fixation of a particular genetic trait. The term fixation describes the event in which the whole population has the same genetic trait. In the presence of mutation, one attempts to investigate the properties of the stationary distribution, if such a distribution is present. These questions are difficult to answer for finite populations and one typically answers them by studying the limiting Fleming-Viot process. Our discussion shows that for large $N$, the fixation times and probabilities or the stationary distribution will be approximately the same for our model and the corresponding Wright-Fisher or Moran model. We illustrate this point through an example in Section 3.1.
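To illustrate this correspondence numerically, here is a minimal sketch of a neutral Moran model with a constant population size (our own construction; in the regime discussed above one would take the population size to be roughly $N h_{eq}$). For a neutral type, the fixation probability equals its initial frequency, which the sketch estimates by Monte Carlo:

```python
import random

def moran_fixation(n_pop, k_init, rng):
    """Neutral Moran model: at each step one uniformly chosen individual
    reproduces and its offspring replaces an independently, uniformly
    chosen individual. Returns True if the focal type (initial count
    k_init) eventually fixes, False if it is lost."""
    k = k_init  # current count of the focal type
    while 0 < k < n_pop:
        parent_is_focal = rng.random() < k / n_pop
        victim_is_focal = rng.random() < k / n_pop
        if parent_is_focal and not victim_is_focal:
            k += 1
        elif victim_is_focal and not parent_is_focal:
            k -= 1
    return k == n_pop

def fixation_probability(n_pop, k_init, runs, seed=0):
    """Monte Carlo estimate of the fixation probability."""
    rng = random.Random(seed)
    return sum(moran_fixation(n_pop, k_init, rng) for _ in range(runs)) / runs

# Neutral theory predicts a fixation probability of k_init / n_pop = 0.25.
p_hat = fixation_probability(n_pop=20, k_init=5, runs=2000)
```

The estimate `p_hat` should lie close to the initial frequency 5/20; selection or mutation terms would shift it away from this neutral value.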
This paper is organized as follows. In Section 2 we describe the mathematical models that we consider and state our main results. In Section 3 we discuss the aforementioned applications of our results in greater detail. Finally in Section 4 we prove the main results.

Notation
We now introduce some notation that we will use throughout this paper. Let $\mathbb{R}$, $\mathbb{R}_+$, $\mathbb{R}_*$, $\mathbb{N}$ and $\mathbb{N}_0$ denote the sets of all reals, nonnegative reals, positive reals, positive integers and nonnegative integers respectively. For any $a, b \in \mathbb{R}$, their minimum is given by $a \wedge b$.
Let $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$ denote the standard Euclidean norm and inner product in $\mathbb{R}^n$ for any $n \in \mathbb{N}$. Moreover for any $v = (v_1, \dots, v_n) \in \mathbb{R}^n$, the norms $\|v\|_1$ and $\|v\|_\infty$ are defined as $\|v\|_1 = \sum_{i=1}^n |v_i|$ and $\|v\|_\infty = \max_{1 \leq i \leq n} |v_i|$. The vectors of all zeros and all ones in $\mathbb{R}^n$ are denoted by $0_n$ and $1_n$ respectively. Let $M(n, n)$ be the space of all $n \times n$ matrices with real entries. For any $M \in M(n, n)$, the entry at the $i$-th row and the $j$-th column is indicated by $M_{ij}$. Its infinity norm is defined as $\|M\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |M_{ij}|$ and its transpose and inverse are indicated by $M^T$ and $M^{-1}$ respectively. The symbol $I_n$ refers to the identity matrix in $M(n, n)$. For any $v = (v_1, \dots, v_n) \in \mathbb{R}^n$, $\mathrm{Diag}(v)$ refers to the matrix in $M(n, n)$ whose non-diagonal entries are all $0$ and whose diagonal entries are $v_1, \dots, v_n$. A matrix in $M(n, n)$ is called stable if all its eigenvalues have strictly negative real parts. While multiplying a matrix with a vector we always regard the vector as a column vector.
Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$. Then for any $k \in \mathbb{N}_0$, the class $C^k(U, V)$ refers to the set of all those functions $f$ that are defined on some open set $O \subset \mathbb{R}^n$ containing $U$, such that $f(x) \in V$ for all $x \in U$ and $f$ is $k$-times continuously differentiable at any $x \in O$.
Let $(S, d)$ be a metric space. Then by $B(S)$ ($C(S)$) we refer to the set of all bounded (continuous) real-valued Borel measurable functions on $S$. If $S$ is compact, then $C(S) \subset B(S)$ and both $B(S)$ and $C(S)$ are Banach spaces under the sup norm $\|f\|_\infty = \sup_{x \in S} |f(x)|$. Recall that a class of functions in $B(S)$ is called an algebra if it is closed under finite sums and products. Let $\mathcal{B}(S)$ be the Borel sigma field on $S$. By $M_F(S)$ and $\mathcal{P}(S)$ we denote the space of all finite positive Borel measures and the space of all Borel probability measures respectively. These measure spaces are equipped with the weak topology. For any $f \in B(S)$ and $\mu \in M_F(S)$ let $\langle f, \mu \rangle = \int_S f \, d\mu$. The space of cadlag functions (that is, right continuous functions with left limits) from $[0, \infty)$ to $S$ is denoted by $D_S[0, \infty)$ and it is endowed with the Skorohod topology (for details see Chapter 3, Ethier and Kurtz [12]). The space of continuous functions from $[0, \infty)$ to $S$ is denoted by $C_S[0, \infty)$ and it is endowed with the topology of uniform convergence over compact sets. An operator $A$ on $B(S)$ is a linear mapping that maps any function in its domain $\mathcal{D}(A) \subset B(S)$ to a function in $B(S)$. The notion of the martingale problem associated to an operator $A$ is introduced and developed in Chapter 4, Ethier and Kurtz [12]. In this paper, by a solution of the martingale problem for $A$ we mean a measurable stochastic process $X$ with paths in $D_S[0, \infty)$ such that for any $f \in \mathcal{D}(A)$,
$$f(X(t)) - \int_0^t A f(X(s)) \, ds$$
is a martingale with respect to the filtration generated by $X$. For a given initial distribution $\pi \in \mathcal{P}(S)$, a solution $X$ of the martingale problem for $A$ is a solution of the martingale problem for $(A, \pi)$ if $\pi = \mathbb{P} X(0)^{-1}$. If such a solution $X$ exists uniquely for all $\pi \in \mathcal{P}(S)$, then we say that the martingale problem for $A$ is well-posed. Additionally, we say that $A$ is the generator of the process $X$.
Throughout the paper ⇒ denotes convergence in distribution.

Model descriptions and the main result
Our first task is to describe the models that we consider in the paper. As mentioned in Section 1, we model a population which resides in some compact metric space $E$ and in which the individuals have one of $q$ possible types. We denote these types by elements of the set $Q = \{1, 2, \dots, q\}$. We identify each individual located at $x \in E$ with the Dirac measure $\delta_x$, concentrated at $x$. Moreover each individual is assigned a mass of $1/N$, where $N \in \mathbb{N}$ is our scaling parameter. For any $i \in Q$, the population of type $i$ individuals can be represented by an atomic measure of the form
$$\mu_i = \frac{1}{N} \sum_{j=1}^{n_i} \delta_{x^i_j},$$
where $n_i$ is the total number of type $i$ individuals and $x^i_1, \dots, x^i_{n_i} \in E$ are their locations. Define the space of atomic measures scaled by $N$ as
$$M_{N,a}(E) = \left\{ \frac{1}{N} \sum_{j=1}^{n} \delta_{x_j} : n \in \mathbb{N}_0 \text{ and } x_1, \dots, x_n \in E \right\} \subset M_F(E),$$
where $M_F(E)$ is the space of finite positive measures. Let $M^q_{N,a}(E)$ and $M^q_F(E)$ be the spaces formed by taking products of $q$ copies of $M_{N,a}(E)$ and $M_F(E)$ respectively. Since for each $i \in Q$, the type $i$ population can be represented by a measure $\mu_i \in M_{N,a}(E)$, the entire population can be represented by a $q$-tuple of measures $\mu = (\mu_1, \dots, \mu_q) \in M^q_{N,a}(E)$. Let $1_E$ denote the constant function in $C(E)$ which maps each point in $E$ to $1$. Define the density map $H : M^q_F(E) \to \mathbb{R}^q_+$ as the continuous function given by
$$H(\mu) = \left( \langle 1_E, \mu_1 \rangle, \dots, \langle 1_E, \mu_q \rangle \right).$$
We will refer to $h = H(\mu)$ as the density vector corresponding to $\mu \in M^q_F(E)$. Note that if the population is represented by a $\mu \in M^q_{N,a}(E)$ and if $h = (h_1, \dots, h_q)$ is the corresponding density vector, then $h_i$ is just the total number of type $i$ individuals divided by $N$. The density vector $h$ contains no information about the distribution of individuals on $E$.
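As a minimal computational sketch of this representation (the data layout and function names are our own), a finite population can be stored as a list of (type, location) pairs, each individual carrying mass $1/N$; the density map $H$ then reduces to counting individuals of each type:

```python
from collections import Counter

def density_vector(population, N, q):
    """Density vector H(mu) for a population given as a list of
    (type_index, location) pairs, each individual carrying mass 1/N.
    type_index ranges over 0, ..., q-1."""
    counts = Counter(t for t, _ in population)
    return [counts[i] / N for i in range(q)]

# Example: N = 4, two types; locations are points of some space E.
pop = [(0, 0.1), (0, 0.7), (1, 0.3)]
h = density_vector(pop, N=4, q=2)  # two type-0 and one type-1 individual
```

As the text notes, `h` depends only on the counts of each type, not on the locations.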

The type-dependent migration mechanism
In our models, each individual of type $i \in Q$ will migrate according to an independent $E$-valued Markov process with generator $B_i$. We will assume that each operator $B_i$ generates a Feller semigroup on $C(E)$ (see Chapter 4 in Ethier and Kurtz [12]). Furthermore we assume that there is an algebra of functions $D_0 \subset C(E)$ which is dense in $C(E)$, contains $1_E$ and satisfies
$$D_0 \subset \mathcal{D}(B_i) \ \text{ for each } i \in Q. \tag{2.3}$$
The martingale problem corresponding to each $B_i$ is well-posed and any solution is a strong Markov process with sample paths in $D_E[0, \infty)$ (see Theorem 4.2.7 and Corollary 4.2.8 in [12]). We now formally describe how this type-dependent migration of individuals translates into the evolution of our population in the space $M^q_{N,a}(E)$. For each $n \in \mathbb{N}$, define a space of atomic probability measures as
$$\mathcal{P}_{n,a} = \left\{ \frac{1}{n} \sum_{j=1}^{n} \delta_{x_j} : x_1, \dots, x_n \in E \right\}$$
and a class of continuous real-valued functions over $\mathcal{P}(E)$ by
$$C_0 = \left\{ F : F(\nu) = \prod_{l=1}^{m} \langle f_l, \nu \rangle \ \text{ for some } m \in \mathbb{N} \text{ and } f_1, \dots, f_m \in D_0 \right\}.$$
Suppose that $\nu = (1/n) \sum_{j=1}^{n} \delta_{x_j} \in \mathcal{P}_{n,a}$ and $F(\nu) = \prod_{l=1}^{m} \langle f_l, \nu \rangle \in C_0$. For positive integers $k \leq m$, let $P^m_k$ be the set of onto functions from $\{1, \dots, m\}$ to $\{1, \dots, k\}$ and for any $p \in P^m_k$ and $l = 1, \dots, k$ let
$$f_{p,l} = \prod_{l' : p(l') = l} f_{l'}. \tag{2.5}$$
Then we can write
$$F(\nu) = \frac{1}{n^m} \sum_{k=1}^{m} \frac{1}{k!} \sum_{p \in P^m_k} \sum_{j_1, \dots, j_k} \prod_{l=1}^{k} f_{p,l}(x_{j_l}), \tag{2.6}$$
where the last term has summation over distinct choices of $j_1, \dots, j_k \in \{1, \dots, n\}$. For each $i \in Q$, $n \in \mathbb{N}$ we now define an operator $B^n_i : \mathcal{D}(B^n_i) = C_0 \to B(\mathcal{P}_{n,a})$ by
$$B^n_i F(\nu) = \frac{1}{n^m} \sum_{k=1}^{m} \frac{1}{k!} \sum_{p \in P^m_k} \sum_{j_1, \dots, j_k} \sum_{l=1}^{k} B_i f_{p,l}(x_{j_l}) \prod_{l' \neq l} f_{p,l'}(x_{j_{l'}}), \tag{2.7}$$
where $F \in C_0$ is given by (2.6). Observe that any $F \in C_0$ is bounded and
$$\sup_{n \in \mathbb{N}} \| B^n_i F \|_\infty < \infty. \tag{2.8}$$
One can easily verify that the martingale problem for each $B^n_i$ is well-posed. If $\nu_0 = (1/n) \sum_{j=1}^{n} \delta_{x_j} \in \mathcal{P}_{n,a}$ then the solution of the martingale problem for $(B^n_i, \delta_{\nu_0})$ is just the empirical measure process of a system of $n$ individuals moving in $E$ according to independent Markov processes with generators $B_i$ and initial positions $x_1, \dots, x_n$. For more details see Section 2.2 in Dawson [7].
For any $f_1, \dots, f_m \in D_0$ consider a function $F : M^q_F(E) \to \mathbb{R}$ of the form (2.9), where $h = H(\mu)$ and $c : \mathbb{R}^q_+ \to \mathbb{R}^q_+$ is a function that satisfies (2.10). Define a class of functions by
$$C^q_0 = \left\{ F : M^q_F(E) \to \mathbb{R} \ : \ F \text{ is given by (2.9) and } c \text{ satisfies (2.10)} \right\}. \tag{2.11}$$
If $\mu = (\mu_1, \dots, \mu_q) \in M^q_{N,a}(E)$, then for each $i \in Q$, $\mu_i$ has the form $\mu_i = (1/N) \sum_{j=1}^{n_i} \delta_{x^i_j}$. Let $P^m_k$ be as before. Pick an $F \in C^q_0$ of the form (2.9). For any $p \in P^m_k$ and $l = 1, \dots, k$ define $f_{p,l}$ by (2.5). We can write the function $F$ in a form analogous to (2.6), where the last term has summation over distinct choices of $l_1, \dots, l_k \in \{1, \dots, q\}$. Let $B^N : \mathcal{D}(B^N) = C^q_0 \to B(M^q_{N,a}(E))$ be the operator whose action on any $F \in C^q_0$ written in this form is given by (2.12), where for any $n \in \mathbb{N}$, $i \in Q$, the operator $B^n_i$ is given by (2.7). For convenience $B^0_i$ is defined to be the identity map on $C_0$. The function $B^N F$ is bounded due to (2.8) and (2.10).
The martingale problem for $B^N$ is well-posed because the martingale problem for $B^n_i$ is well-posed for each $i \in Q$, $n \in \mathbb{N}$. To see this suppose that $\mu_0 = (\mu_{0,1}, \dots, \mu_{0,q}) \in M^q_{N,a}(E)$ and for each $i \in Q$, $\mu_{0,i} \in M_{N,a}(E)$ has the form $\mu_{0,i} = (1/N) \sum_{j=1}^{n_i} \delta_{x^i_j}$. If $n_i > 0$ let $\nu_{0,i} = (1/n_i) \sum_{j=1}^{n_i} \delta_{x^i_j}$ and if $n_i = 0$ let $\nu_{0,i} = \delta_{x_0}$ for some arbitrary $x_0 \in E$. For each $i \in Q$, let $\{\nu_i(t) : t \geq 0\}$ be the unique solution of the martingale problem for $(B^{n_i}_i, \delta_{\nu_{0,i}})$. Then the process $\{\mu(t) : t \geq 0\}$ given by
$$\mu(t) = \left( \frac{n_1}{N} \nu_1(t), \dots, \frac{n_q}{N} \nu_q(t) \right)$$
is the unique solution to the martingale problem for $(B^N, \delta_{\mu_0})$.

The density regulation mechanism
We mentioned in Section 1 that in our models, the birth and death rates of individuals depend on the population density in such a way that they induce an equilibrating density regulation mechanism. We now describe this mechanism in our multi-type setting.
For each $i, j \in Q$, let $\beta_{ij}$ be a bounded function in $C^2(\mathbb{R}^q_+, \mathbb{R}_+)$ and for each $i \in Q$ let $\rho_i$ be any function in $C^2(\mathbb{R}^q_+, \mathbb{R}_+)$. For any $h \in \mathbb{R}^q_+$, let the matrix $A(h) \in M(q, q)$ and the vector $\theta(h) \in \mathbb{R}^q$ be given by
$$A(h) = \left[ \beta_{ij}(h) \right]_{i,j \in Q} - \mathrm{Diag}\left( \rho_1(h), \dots, \rho_q(h) \right) \tag{2.13}$$
and
$$\theta(h) = A^T(h)\, h, \ \text{ that is, } \ \theta_i(h) = \sum_{j \in Q} h_j \beta_{ji}(h) - h_i \rho_i(h) \ \text{ for each } i \in Q. \tag{2.14}$$
Consider $\theta$ to be a vector field over $\mathbb{R}^q_+$. Observe that if $h \in \mathbb{R}^q_+$ is such that $h_i = 0$ then $\theta_i(h) \geq 0$. This shows that any solution to the initial value problem
$$\frac{dh(t)}{dt} = \theta(h(t)), \quad h(0) = h_0 \in \mathbb{R}^q_+, \tag{2.15}$$
stays inside $\mathbb{R}^q_+$ for all positive times for which it is defined. Standard existence and uniqueness theorems imply that for any $h_0 \in \mathbb{R}^q_+$, there is a solution $h(t)$ of (2.15) defined on some maximal interval $[0, a)$. Since each $\beta_{ij}$ is bounded and each $\rho_i$ is nonnegative, there is a constant $C_\theta > 0$ such that for any $h \in \mathbb{R}^q_+$
$$\sum_{i \in Q} \theta_i(h) \leq C_\theta \| h \|_1. \tag{2.17}$$
Therefore using Gronwall's inequality and (2.17) we obtain that $\|h(t)\|_1 \leq \|h(0)\|_1 e^{C_\theta t}$ for all $t \in [0, a)$, and so $\|h(t)\|_1$ cannot go to $\infty$ as $t \to a^-$. Thus $a = \infty$ and this shows that for any $h_0 \in \mathbb{R}^q_+$, the initial value problem (2.15) has a unique solution which is defined for all $t \geq 0$.
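The equilibration described above can be checked numerically. The sketch below (our own toy example; the rate functions are illustrative choices, not taken from the text) integrates the density ODE $dh/dt = \theta(h)$, with $\theta_i(h) = \sum_j h_j \beta_{ji}(h) - h_i \rho_i(h)$ as dictated by the birth and death rates, for a two-type population by the explicit Euler method:

```python
def theta(h, beta, rho):
    """Drift theta_i(h) = sum_j h_j * beta[j][i](h) - h_i * rho[i](h)
    of the density ODE for a q-type density-regulated population."""
    q = len(h)
    return [sum(h[j] * beta[j][i](h) for j in range(q)) - h[i] * rho[i](h)
            for i in range(q)]

def flow(h0, beta, rho, t_end, dt=1e-3):
    """Explicit-Euler approximation of the ODE solution h(t_end)
    started at h0, truncated at 0 to stay in the positive orthant."""
    h = list(h0)
    for _ in range(int(t_end / dt)):
        th = theta(h, beta, rho)
        h = [max(hi + dt * ti, 0.0) for hi, ti in zip(h, th)]
    return h

# Toy two-type example (our own choice of rates):
#   beta_ii(h) = 2, beta_ij(h) = 0.5 for i != j, rho_i(h) = 1 + h_1 + h_2.
# The symmetric fixed point solves 2.5 = 1 + 2 h*, i.e. h* = 0.75.
beta = [[lambda h: 2.0, lambda h: 0.5],
        [lambda h: 0.5, lambda h: 2.0]]
rho = [lambda h: 1.0 + h[0] + h[1]] * 2
h_final = flow([0.5, 1.0], beta, rho, t_end=20.0)
```

With these rates the Jacobian at the fixed point has eigenvalues $-1$ and $-1.5$, so trajectories started nearby are driven to $(0.75, 0.75)$, which is what the integration exhibits.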
Let $\psi_\theta : \mathbb{R}^q_+ \times \mathbb{R}_+ \to \mathbb{R}^q_+$ be the flow associated to the vector field $\theta$. This means that $\psi_\theta$ satisfies
$$\frac{\partial \psi_\theta(h_0, t)}{\partial t} = \theta\left( \psi_\theta(h_0, t) \right) \ \text{ and } \ \psi_\theta(h_0, 0) = h_0.$$
This flow is well-defined because of the arguments given in the preceding paragraph. In fact, since $\theta$ is in $C^2(\mathbb{R}^q_+, \mathbb{R}^q)$, for each $t \geq 0$ the map $h_0 \mapsto \psi_\theta(h_0, t)$ is continuous. This map also satisfies the semigroup property
$$\psi_\theta(h_0, s + t) = \psi_\theta\left( \psi_\theta(h_0, s), t \right) \ \text{ for any } s, t \geq 0.$$
We will say that a set $U \subset \mathbb{R}^q_+$ is $\psi_\theta$-invariant if $\psi_\theta(h_0, t) \in U$ for every $h_0 \in U$ and $t \geq 0$. Before we proceed, we need to make some more assumptions.
Assumption 2.1 (A) There exists a nonzero vector $h_{eq} \in \mathbb{R}^q_+$ such that $\theta(h_{eq}) = 0_q$.
(B) The Jacobian matrix of the vector field $\theta$ at $h_{eq}$ is stable.
(C) The matrix $A(h_{eq})$ is irreducible, that is, there does not exist a permutation matrix $P \in M(q, q)$ such that the matrix $P A(h_{eq}) P^{-1}$ is block upper-triangular.
(D) For each $i, j \in Q$, the functions $\rho_i$ and $\beta_{ij}$ are analytic at $h_{eq}$, that is, they agree with their Taylor series expansions in a neighbourhood of $h_{eq}$.
Part (A) says that there is a nonzero vector $h_{eq} \in \mathbb{R}^q_+$ which is a fixed point for the flow $\psi_\theta$. Part (B) implies that this fixed point $h_{eq}$ is asymptotically stable for this flow. The significance of part (C) will become clear later in this section. Part (D) is a technical condition that we require to prove our main result.
We define the region of attraction of the fixed point $h_{eq}$ for the flow $\psi_\theta$ by
$$U_{eq} = \left\{ h_0 \in \mathbb{R}^q_+ : \psi_\theta(h_0, t) \to h_{eq} \ \text{ as } t \to \infty \right\}.$$
Part (B) of Assumption 2.1 and Lemma 3.2 in Khalil [24] ensure that $U_{eq}$ is a $\psi_\theta$-invariant open set in $\mathbb{R}^q_+$. Note that $U_{eq}$ may not be an open set in $\mathbb{R}^q$. We are now ready to describe the density regulation mechanism. Suppose that when the population density vector is $h \in \mathbb{R}^q_+$, then for each $i, j \in Q$, an individual of type $i$ gives birth to an individual of type $j$ at rate $\beta_{ij}(h)$ and an individual of type $i$ dies at rate $\rho_i(h)$. At the time of birth, the offspring is placed at the same location in $E$ as its parent. Note that the birth and death rates do not depend on the locations of the individuals. If the scaling parameter is $N \in \mathbb{N}$ and the mass of each individual is $1/N$, then we can view this density-dependent population dynamics as a Markov process over the state space $M^q_{N,a}(E)$ with generator $R_N : \mathcal{D}(R_N) = B(M^q_{N,a}(E)) \to B(M^q_{N,a}(E))$ defined as follows. For any $F \in B(M^q_{N,a}(E))$ and any $\mu = (\mu_1, \dots, \mu_q) \in M^q_{N,a}(E)$ with $h = H(\mu)$,
$$R_N F(\mu) = \sum_{i, j \in Q} \beta_{ij}(h) \int_E \left[ F\left( \mu + \tfrac{1}{N} \delta_x e_j \right) - F(\mu) \right] N \mu_i(dx) + \sum_{i \in Q} \rho_i(h) \int_E \left[ F\left( \mu - \tfrac{1}{N} \delta_x e_i \right) - F(\mu) \right] N \mu_i(dx),$$
where $\mu \pm \tfrac{1}{N} \delta_x e_i$ denotes the element of $M^q_{N,a}(E)$ obtained from $\mu$ by adding (respectively, subtracting) the measure $\tfrac{1}{N} \delta_x$ to (from) its $i$-th component. Concrete results on the well-posedness of the martingale problem for $R_N$ will come later. First let us understand how this operator $R_N$ drives the population density to an equilibrium value. Let $\{\mu^N(t) : t \geq 0\}$ be a $M^q_{N,a}(E)$-valued Markov process with generator $R_N$ and let $\{h^N(t) : t \geq 0\}$ be the corresponding density process defined by $h^N(t) = H(\mu^N(t))$. Assume that $h^N(0) \to h_0$ a.s. as $N \to \infty$, where $h_0$ is some vector in $U_{eq}$. From Theorem 11.2.1 in [12], one can conclude that as $N \to \infty$, $\{h^N(t) : t \geq 0\}$ converges in the Skorohod topology on $D_{\mathbb{R}^q}[0, \infty)$ to the process $\{\psi_\theta(h_0, t) : t \geq 0\}$. Since $h_0 \in U_{eq}$, $\psi_\theta(h_0, t) \to h_{eq}$ as $t \to \infty$, which indicates that for a large $N$, the density process $h^N(\cdot)$ gets closer and closer to $h_{eq}$ with time. This shows, at least informally, that the operator $R_N$ drives the population density towards $h_{eq}$.
Henceforth we shall refer to the vector $h_{eq}$ as the equilibrium population density. Of course, this discussion totally ignores how $R_N$ affects the spatial configuration of the population. Our main goal in this paper is to discover how the spatial distribution of the population evolves when the density is equilibrated at a faster timescale in comparison to the other mechanisms.
For any $h \in \mathbb{R}^q_+$, the matrix $A(h)$ given by (2.13) signifies how the various types of individuals interact when the population density is $h$. Part (C) of Assumption 2.1 means that at the equilibrium population density, all the types of individuals are communicating with each other by influencing each other's birth and death rates.

Mathematical Models
We now describe the models that we consider in this paper. We start with a basic model which only has spatial migration along with density regulation. We then extend this model by adding other features such as position dependence in the birth and death rates, offspring dispersal and immigration. All the models will be parameterized by the scaling parameter N, with 1/N being the mass of each individual in the population. We will describe each model by specifying the generator of the associated Markov process. The well-posedness of the martingale problems corresponding to these generators is given by Proposition 2.4. Our main results are presented in Section 2.4.

Basic Model
In this model, the individuals migrate according to the type-dependent migration mechanism specified in Section 2.1 and their birth and death rates regulate the population density as described in Section 2.2. If the scaling parameter is $N$, then at any time we represent the population as a measure in $M^q_{N,a}(E)$. The population evolves due to the following events.
• Each individual of type $i \in Q$ migrates in $E$ according to an independent Markov process with generator $B_i$.
• When the population density vector is $h \in \mathbb{R}^q_+$, each individual of type $i \in Q$ gives birth to an individual of type $j \in Q$ at rate $N \beta_{ij}(h)$. At the time of birth, the offspring is placed at the same location as its parent.
• When the population density vector is $h \in \mathbb{R}^q_+$, each individual of type $i \in Q$ dies at rate $N \rho_i(h)$.
This population dynamics can be viewed as a $M^q_{N,a}(E)$-valued Markov process whose generator is given by
$$A^N_0 F(\mu) = B^N F(\mu) + N R_N F(\mu)$$
for any $F \in C^q_0$. Here $h = H(\mu)$ is the density vector corresponding to $\mu$.
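The basic model can be simulated directly with the Gillespie algorithm. The sketch below is our own illustration with toy rates (not taken from the text): a single-type ($q = 1$) population on the circle $E = [0, 1)$ with $\beta(h) = 2$ and $\rho(h) = 1 + h$, so that $h_{eq} = 1$. Birth and death occur at rate $N$ per capita while migration occurs at rate $1$, and the density stays near $h_{eq}$ on the slow timescale:

```python
import random

def simulate_basic_model(N, t_end, seed=0):
    """Gillespie simulation of a single-type basic model on the circle.
    Toy rates: per-capita birth rate N*beta(h) with beta(h) = 2 and
    per-capita death rate N*rho(h) with rho(h) = 1 + h, giving h_eq = 1.
    Each individual migrates (Gaussian step mod 1) at rate 1.
    Returns the final list of locations; the density is len(...) / N."""
    rng = random.Random(seed)
    locs = [rng.random() for _ in range(N)]  # start at density h = 1
    t = 0.0
    while t < t_end and locs:
        n = len(locs)
        h = n / N
        birth = n * N * 2.0          # total birth rate
        death = n * N * (1.0 + h)    # total death rate
        migr = n * 1.0               # total (slow) migration rate
        total = birth + death + migr
        t += rng.expovariate(total)  # waiting time to the next event
        u = rng.random() * total
        j = rng.randrange(n)         # individual affected by the event
        if u < birth:
            locs.append(locs[j])     # offspring placed at parent's location
        elif u < birth + death:
            locs[j] = locs[-1]       # remove individual j
            locs.pop()
        else:
            locs[j] = (locs[j] + rng.gauss(0.0, 0.1)) % 1.0
    return locs

locs = simulate_basic_model(N=100, t_end=1.0)
h_final = len(locs) / 100
```

For $N = 100$ the density fluctuates around $h_{eq} = 1$ with deviations of order $1/\sqrt{N}$, while each individual makes only about one migration step per unit of slow time.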

Model with position dependence in the birth and death rates
In the above model, the birth and death rates of individuals do not depend on their location. One may want to consider models in ecology where some spatial locations are more advantageous for reproduction or some locations are more hazardous for survival. If we think of $E$ as the space of genetic traits, then one may consider models in which the trait of an individual influences its chances of reproduction or survival. To capture such situations we now introduce a model in which the birth and death rates of an individual can vary with its position. However we will assume that this position-dependent variation is small, in the sense that even though an individual's birth and death rates are of order $N$ (as in Section 2.3.1), the spatial variation in these rates is of order $1$. We now make the model precise.
For each $i, j \in Q$, let $b^s_{ij}$ and $d^s_i$ be bounded continuous functions from $E \times \mathbb{R}^q_+$ to $\mathbb{R}_+$. These functions determine the spatial variation in the birth and death rates. The migration of individuals is as in the basic model. However now the birth and death mechanism changes as follows.
• When the population density vector is $h \in \mathbb{R}^q_+$, an individual of type $i \in Q$ located at $x \in E$ gives birth to an individual of type $j \in Q$ at rate $b^s_{ij}(x, h) + N \beta_{ij}(h)$. At the time of birth, the offspring is placed at the same location as its parent.
• When the population density vector is h ∈ R q + , an individual of type i ∈ Q located at The evolution of our population under this dynamics can be viewed as a M q N,a (E)-valued Markov process with generator A N 1 : where h = H(µ) is the density vector corresponding to µ.

Model with offspring dispersal at birth
In the basic model we described in Section 2.3.1, the offspring is placed at the same location as its parent at the time of birth. However we may want to construct models where this restriction is relaxed. For example, while modeling plant populations, one may wish to account for the spreading of seeds due to wind and other factors. Also in models for population genetics, where $E$ is the space of genetic traits, offspring may be born with a different trait than their parents due to mutations. To consider such situations we now present a model in which an offspring may be born away from its parent. We allow this offspring dispersal to either be rare (it happens with probability proportional to $1/N$) or small (the offspring is placed at a distance proportional to $1/N$ from the parent). We handle both these cases in a unified way. For each $i, j \in Q$ and $N \in \mathbb{N}$, let $\vartheta^N_{ij}$ be a function from $E$ to $\mathcal{P}(E)$ and let $p^N_{ij}$ be a function from $E$ to $[0, 1]$. The individuals migrate and die in the same way as described in the basic model (Section 2.3.1). The birth rates are also the same as in the basic model. However, when an individual of any type $i \in Q$ located at $x \in E$ gives birth to an individual of type $j$, the location of the offspring is $x$ with probability $(1 - p^N_{ij}(x))$ and distributed according to $\vartheta^N_{ij}(x, \cdot)$ with probability $p^N_{ij}(x)$. To pass to the limit $N \to \infty$, we need an assumption on $p^N_{ij}$ and $\vartheta^N_{ij}$ which is stated below.
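The offspring-placement rule can be sketched in a few lines. The uniform kernel on $E = [0,1]$ and the names below are our illustrative assumptions, with the rare-dispersal regime modeled by a dispersal probability $p(x)/N$.

```python
import random

# Sketch of the offspring placement rule: with probability p(x)/N the offspring
# disperses to a draw from the kernel theta(x, .); otherwise it stays at the
# parent's location x. Here E = [0, 1] and theta(x, .) = Uniform[0, 1] are
# illustrative choices of ours.
def offspring_location(x, N, p=lambda x: 1.0, rng=random):
    if rng.random() < p(x) / N:       # rare dispersal/mutation event
        return rng.random()           # draw the new location from the kernel
    return x                          # otherwise inherit the parent's location
```

For large $N$ dispersal becomes rare, which is exactly why it survives in the limit only as a mutation-type term in the generator.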
Assumption 2.2 For each $i, j \in Q$ we assume that there is an operator $C_{ij} : D(C_{ij}) \to C(E)$, whose domain is taken to be the same as $D_0$ (see (2.3)) for convenience, such that $N p^N_{ij}(x)\left(\int_E f(y)\, \vartheta^N_{ij}(x, dy) - f(x)\right) \to C_{ij}f(x)$ uniformly in $x \in E$, for every $f \in D_0$.
The evolution of our population under the dynamics described above can be viewed as an $\mathcal{M}^q_{N,a}(E)$-valued Markov process with generator $A^N_2$, defined by its action on any $F \in C^q_0$; here $h = H(\mu)$ is the density vector corresponding to $\mu$.

Model with immigration
Consider a population whose dynamics is as described in the basic model (Section 2.3.1). In addition, suppose that the individuals of each type immigrate to $E$ at a certain density-dependent rate and settle down according to some distribution on $E$. In this section we model this situation. Such a model can help us understand the effects of immigration on the population demography.
For each $i \in Q$, let $\kappa_i : \mathbb{R}^q_+ \to \mathbb{R}_+$ be a continuous function satisfying the growth condition (2.25) for some $C > 0$. The individuals migrate, reproduce and die as in the basic model. Moreover, when the population density vector is $h \in \mathbb{R}^q_+$, individuals of each type $i \in Q$ arrive in the population at rate $N\kappa_i(h)$ and their initial location is given by the distribution $\Theta_i \in \mathcal{P}(E)$.
The evolution of our population under this dynamics can be viewed as an $\mathcal{M}^q_{N,a}(E)$-valued Markov process with generator $A^N_3$, defined by its action on any $F \in C^q_0$; here $h = H(\mu)$ is the density vector corresponding to $\mu$.
For $\mu \in \mathcal{M}^q_{N,a}(E)$ write $h = H(\mu)$. Then for each $l \in \{0, 1, 2, 3\}$ and $F \in C^q_0$ we can write
$$A^N_l F(\mu) = B^N F(\mu) + G^N_l F(\mu) + N R^N F(\mu). \qquad (2.31)$$
This form makes the timescale separation clear between the fast density regulation mechanism ($N R^N$) and the slow migration ($B^N$), position-dependent birth and death ($G^N_1$), offspring dispersal ($G^N_2$) and immigration ($G^N_3$) mechanisms.

The main results
The main results of this paper are concerned with the limiting behaviour of the dynamics under the models described in Section 2.3. Before we present these results we must first verify that all the models in Section 2.3 can be represented by a suitable Markov process. This is established by the following proposition which will be proved in Section 4.1.

Proposition 2.4
For each $l \in \{0, 1, 2, 3\}$ and $N \in \mathbb{N}$, the $D_{\mathcal{M}^q_{N,a}(E)}[0, \infty)$ martingale problem for $A^N_l$ is well-posed. We now begin analyzing how a sequence of Markov processes with generators $A^N_l$ behaves as $N \to \infty$. The next proposition establishes some important properties of the limiting dynamics. The proof of this proposition is given in Section 4.2. Recall the definition of $U_{eq}$ from (2.20).
Proposition 2.5
Let $\{\mu^N(t) : t \geq 0\}$ be a Markov process with generator $A^N_l$, for some $l \in \{0, 1, 2, 3\}$, and let $\{h^N(t) = H(\mu^N(t)) : t \geq 0\}$ be the corresponding density process. Assume that there is a compact set $K_0 \subset U_{eq}$ such that $h^N(0) \in K_0$ a.s. for all $N \in \mathbb{N}$. Let $t_N$ be a sequence of positive numbers satisfying $t_N \to 0$ and $N t_N \to \infty$ as $N \to \infty$. Then conclusions (A) and (B), discussed below, hold.
Let the processes $\{\mu^N(t) : t \geq 0\}$ and $\{h^N(t) : t \geq 0\}$ be as in the above proposition.

Part (A) of this proposition implies that, for large $N$, the density process is constantly near the equilibrium population density $h_{eq}$ (after a small time shift $t_N$). This emphasizes the point that we made in Section 1: the density regulation mechanism, operating at a faster timescale than our timescale of observation, keeps the population density equilibrated at all times. Note that the process $\{\mu^N(t) : t \geq 0\}$ is $\mathcal{M}^q_{N,a}(E)$-valued and it keeps track of how the populations corresponding to all the $q$ types are evolving in the space $E$. Part (B) of the above proposition shows that in the limit, all the $q$ sub-populations are spatially fused (in proportions determined by the density vector). Hence their spatial evolution can be studied together by using a single $\mathcal{P}(E)$-valued process. This kind of model reduction result is quite common in stochastic reaction networks with multiple timescales (see [3] and [22]), where one can often equilibrate the concentrations of the fast chemical species and derive a reduced model for the dynamics of the slow species. In our case, the dynamics at the fast timescale equilibrates the population density as well as the relative abundances of all the $q$ sub-populations at each location on $E$. These relative abundances equilibrate because the birth-death interaction matrix $A(h)$ (see (2.13)) is irreducible at the equilibrium density $h_{eq}$ (see part (C) of Assumption 2.1).
The proof of Proposition 2.5 will exploit the form (2.31) of the operator $A^N_l$. A brief outline of the proof is as follows. We will define an $\mathbb{R}^q$-valued process $X^N$ from the population process, for some choice of $f \in C(E)$. Next we will show that $X^N$ is a semimartingale which satisfies an equation of the form
$$X^N(t) = X^N(0) + \int_0^t N F(X^N(s))\, ds + Z^N(t). \qquad (2.33)$$
This clearly indicates that for large values of $N$, the drift term of the form $N F(X^N(\cdot))$ completely overwhelms the effect of the semimartingale $Z^N$. Equations like (2.33) were studied by Katzenberger in [23] in a much more general setting. He showed that under certain conditions, the sequence of semimartingales $\{X^N : N \in \mathbb{N}\}$ converges in distribution to a semimartingale $X$ as $N \to \infty$. Moreover $X$ only takes values in a set $\Gamma$ which is an invariant manifold for the deterministic flow induced by $F$. In our case, this set $\Gamma$ consists of only one point $x_{eq} = (h_{eq}, 0_{q-1})$ and this enables us to prove Proposition 2.5. The details are given in Section 4.2.
We mentioned before that in the limit N → ∞, the spatial evolution of all the q subpopulations is governed by a single P(E)-valued process. Our next result, Theorem 2.6, shows that this P(E)-valued process is in fact a Fleming-Viot process that can be characterized by its generator. Before we state Theorem 2.6 we first need to introduce several objects. The existence and properties of some of these objects will be studied in the appendix.
Recall the equilibrium population density vector $h_{eq} = (h_{eq,1}, \ldots, h_{eq,q})$ from Section 2.2. It can be verified that this vector has strictly positive components (see part (A) of Lemma A.1). Moreover, part (C) of Lemma A.1 shows that there is a unique vector $v_{eq} = (v_{eq,1}, \ldots, v_{eq,q}) \in \mathbb{R}^q_*$ such that $v_{eq} A(h_{eq}) = 0_q$ and $\langle v_{eq}, h_{eq} \rangle = \sum_{i=1}^q v_{eq,i} h_{eq,i} = 1$.
From (2.34) we can see that the operator $B_{avg}$ is a convex combination of the operators $B_i$, $i \in Q$. Let $\gamma_{smpl}$ be the positive constant that will serve as the sampling rate. For each $i, j \in Q$, let the operator $C_{ij}$ be as in Assumption 2.2 and define the averaged operator $C_{avg}$ accordingly. For each $i \in Q$, let $\kappa_i, \Theta_i$ be as in Section 2.3.4 and define the operator $I_{avg} : D(I_{avg}) = B(\mathcal{P}(E)) \to B(\mathcal{P}(E))$ accordingly. We will assume that the operators $B_{avg}$, $(B_{avg} + C_{avg})$ and $(B_{avg} + I_{avg})$ generate Feller semigroups on $C(E)$. The well-posedness of the martingale problems corresponding to $A_0$, $A_1$, $A_2$ and $A_3$ follows from Theorem 3.2 in [13]. In fact, any solution will have continuous sample paths. The operator $A_0$ is the generator of a neutral Fleming-Viot process on $E$ with mutation operator $B_{avg}$ and sampling rate $2\gamma_{smpl}$. The operators $A_2$ and $A_3$ generate a similar Fleming-Viot process with the mutation operator changed to $(B_{avg} + C_{avg})$ and $(B_{avg} + I_{avg})$ respectively. The operator $A_1$ also generates a similar Fleming-Viot process, but with selection. The last two terms in its definition correspond to fecundity selection (with intensity function $b^s_{avg}$) and viability selection (with intensity function $d^s_{avg}$). See Donnelly and Kurtz [8] for more details. We now formally state the main result of our paper. The proof is given in Section 4.4.
Theorem 2.6
Suppose that the assumptions of Proposition 2.5 hold, where $U_{eq}$ is given by (2.20), and let $t_N$ be a sequence as in Proposition 2.5. Then there exists a distribution $\pi \in \mathcal{P}(\mathcal{P}(E))$ such that the suitably time-shifted population process converges to a process $\{\nu(t) : t \geq 0\}$, which is a Fleming-Viot process with type space $E$, generator $A_l$ and initial distribution $\pi$.
Remark 2.7 The initial distribution π of the process {ν(t) : t ≥ 0} is related to the distribution of µ(0). This relation is stated in Remark 4.10.
Remark 2.8 In Section 2.3 we first defined a basic model and then constructed auxiliary models by adding other mechanisms, one at a time. These other mechanisms are position-dependent birth and death, offspring dispersal and immigration. One can consider models in which more than one of these mechanisms is simultaneously added to the basic model. The proof will demonstrate that the generator of the limiting Fleming-Viot process is then obtained by adding the correct term corresponding to each of these additional mechanisms to the operator $A_0$. This correct term can be seen from the definitions of $A_1$, $A_2$ and $A_3$. For example, one can have the basic model along with position-dependent birth and death (Section 2.3.2) and offspring dispersal (Section 2.3.3); the limiting Fleming-Viot process then has the generator obtained by adding to $A_0$ the selection terms of $A_1$ and the dispersal term of $A_2$.
We now give a heuristic explanation of why the dynamics under the models described in Section 2.3 converges to a Fleming-Viot process. Note that part (A) of Proposition 2.5 says that the population density is pinned to the constant value $h_{eq}$ in the limit. Therefore any addition of new mass to the population must be concurrently offset by an equal reduction of existing mass, and vice versa. Furthermore, when mass is reduced or added to keep this balance, the reduction or addition happens at locations that are chosen more or less uniformly from the current empirical measure of the population. This is because the birth and death rates of individuals are dominated by a term which is density-dependent but location-independent. This argument offers some intuition as to why the fast birth-death terms (that form part of the operator $N R^N$) give rise to the sampling term in the limit (the second term in $A_0$). It also shows why the position-dependent birth and death terms in $A^N_1$ become selection terms in $A_1$, and the offspring dispersal (immigration) term in $A^N_2$ ($A^N_3$) becomes a mutation term in $A_2$ ($A_3$).
Since the position of an individual in E can also be seen as its genetic trait, one can interpret the migration on E as genetic mutation. Hence it is not surprising that the migration operators appear as part of the mutation operator in the limiting process. Part (B) of Proposition 2.5 says that in the limit, all the q sub-populations become spatially inseparable. This causes all the mechanisms in the limiting process to appear in an averaged form.
Let $\{\mu^N(t) = (\mu^N_1(t), \ldots, \mu^N_q(t)) : t \geq 0\}$ be a Markov process with generator $A^N_l$, for some $l \in \{0, 1, 2, 3\}$. It is difficult to prove the convergence of this process directly because the density regulation mechanism acts on it at the fast timescale. This can be seen by splitting the operator $A^N_l$ according to (2.31) and noting that $N R^N$ becomes unbounded as $N \to \infty$. To pass to the limit we consider another measure-valued process $\{\nu^N(t) : t \geq 0\}$ that is constructed by suitably combining the various components of $\{\mu^N(t) : t \geq 0\}$. In particular,
$$\nu^N(t) = \sum_{i=1}^q \Lambda_i(h^N(t))\, \mu^N_i(t),$$
where $h^N(t) = H(\mu^N(t))$ and $\Lambda = (\Lambda_1, \ldots, \Lambda_q)$ is a function from $\mathbb{R}^q_+$ to $\mathbb{R}^q_+$ which satisfies certain conditions. These conditions are chosen to ensure that $\{\nu^N(t) : t \geq 0\}$ is a $\mathcal{P}(E)$-valued process whose dynamics is such that the density regulation mechanism acts at the slow timescale. Such a function $\Lambda$ can be shown to exist by proving that a certain system of coupled partial differential equations has a solution with some desired properties. This is done in Section 4.3. We will then show that as $N \to \infty$ we have $\nu^N \Rightarrow \nu$, where $\{\nu(t) : t \geq 0\}$ is the Fleming-Viot process specified by Theorem 2.6. This convergence along with Proposition 2.5 allows us to prove Theorem 2.6. The details of the proof are given in Section 4.4.
The discussion in the preceding paragraph also shows that intuitively we can think of the limiting Fleming-Viot process as describing the spatial evolution of a mixed population formed by taking a suitable density-dependent linear combination of all the $q$ sub-populations. This is reminiscent of the notion of virtual species (formed by linearly combining several chemical species), which are needed in the specification of the reduced models in chemical reaction networks with multiple timescales (see [5]).

Applications
In this section we discuss the applications mentioned in Section 1 in greater detail. Note that a Fleming-Viot process usually has continuous paths (see [13]). Hence Theorem 2.6 can be seen as a diffusion approximation result which shows that a stochastic process with jumps can be approximated by a process with continuous paths. Such results provide a justification for drawing inferences about the original process (with jumps) by analyzing a more tractable process with continuous paths.
To demonstrate the usefulness of Theorem 2.6 we present two examples. In the first example we consider a population genetics model having logistic interactions along with rare mutation and weak selection. The words rare and weak indicate that the mutation and selection events occur at a slower timescale than other events. The difference between this model and a standard population genetics model (Wright-Fisher or Moran) is that the population size is not fixed but fluctuating due to the logistic interactions. Theorem 2.6 guarantees that by taking the infinite population limit in a suitable way, we obtain a Fleming-Viot process. In many cases this limiting process is well-studied and using its properties one can estimate fixation probabilities, fixation times and the stationary distribution for the finite population model. Our second example sheds light on the phenomenon of cell polarity which refers to the spatial crowding of molecules on the cell membrane. We draw upon our work in [19] to show that Fleming-Viot convergence can help us understand how cells establish and maintain polarity. In [19] we only consider a very simple model, but the results in this paper ensure that the same analysis holds for a general class of models.

Logistic model for population genetics
The logistic growth model is very popular in ecology. It was proposed by Verhulst [37] in 1838 to describe the growth of a population in the presence of competition for resources. In this model each individual reproduces at rate $\beta$ and dies at rate $\rho P/N$, where $P$ is the current population size and $N$ is the carrying capacity of the habitat. In the deterministic setting, the population size $P$ evolves as a function of time $t$ according to the ordinary differential equation
$$\frac{dP(t)}{dt} = \beta P(t) - \frac{\rho P^2(t)}{N}.$$
Let $h(t) = P(t)/N$ be the population density at time $t$. Then the above differential equation becomes
$$\frac{dh(t)}{dt} = h(t)\left(\beta - \rho h(t)\right).$$
It is immediate that if $h(0) > 0$ then $h(t) \to h_{eq} := \beta/\rho$ as $t \to \infty$.
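As a quick numerical sanity check of this convergence, one can integrate the density equation with an explicit Euler scheme; the step size and horizon below are our illustrative choices.

```python
# Explicit Euler integration of dh/dt = h*(beta - rho*h); for any h(0) > 0
# the iterates converge to the equilibrium h_eq = beta/rho.
def logistic_density(h0, beta, rho, dt=1e-3, t_max=50.0):
    h = h0
    for _ in range(int(t_max / dt)):
        h += dt * h * (beta - rho * h)   # Euler step for the logistic ODE
    return h
```

For example, with $\beta = 2$ and $\rho = 1$ the iteration approaches $h_{eq} = 2$ from any positive starting density, whether below or above the equilibrium.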
We now construct a population genetics model that has logistic interactions along with rare mutation and weak selection. Suppose the compact metric space $E$ is the set of all the genetic traits that an individual can have. Each individual is given a mass of $1/N$, with $N$ being the carrying capacity as before. The population at time $t$ can be represented by the measure $\mu^N(t) = \frac{1}{N}\sum_{i=1}^{n_N(t)} \delta_{x_i}$, where $n_N(t)$ is the number of individuals at time $t$ and $x_1, x_2, \cdots \in E$ are their genetic traits. Let $b^s$ be a continuous function from $E$ to $\mathbb{R}_+$. When the population density (total mass) is $h$, an individual with trait $x \in E$ gives birth at rate $(\beta + b^s(x)/N)$ and dies at rate $\rho h$. Its offspring has the same trait $x$ with probability $(1 - p(x)/N)$. However, with probability $p(x)/N$, the offspring is a mutant and its trait is chosen according to the distribution $\vartheta(x, \cdot) \in \mathcal{P}(E)$. The process $\{\mu^N(t) : t \geq 0\}$ can be viewed as a Markov process with state space $\mathcal{M}_{N,a}(E)$ (see (2.1)). The timescale at which we have described the dynamics is such that the mutation and selection events will vanish in the limit $N \to \infty$. Therefore, to study their effects, we must observe the process at the timescale which is $N$ times slower. The dynamics of $\{\mu^N(t) : t \geq 0\}$ has fast density regulation along with a position-dependent birth mechanism (see Section 2.3.2) and an offspring dispersal mechanism (see Section 2.3.3). Assuming that $\mu^N(0) \Rightarrow \mu(0)$ as $N \to \infty$ and $\langle 1_E, \mu(0) \rangle > 0$ a.s., Theorem 2.6 gives us the following. If $t_N$ is a sequence satisfying $t_N \to 0$ and $N t_N \to \infty$, then $\mu^N(\cdot + t_N) \Rightarrow h_{eq} \nu$ as $N \to \infty$, where $h_{eq} = \beta/\rho$ and $\{\nu(t) : t \geq 0\}$ is a Fleming-Viot process whose generator is given by (3.47). This is of course the generator of a Fleming-Viot process with mutation and fecundity selection. We now present a couple of cases where this process is well-studied.
Suppose that $E = \{1, \ldots, K\}$ for some $K \in \mathbb{N}$. Then for any $t \geq 0$ we can express $\nu(t) \in \mathcal{P}(E)$ as the $K$-tuple $(\nu_1(t), \ldots, \nu_K(t))$, where $\nu_i(t)$ is the proportion of individuals having genetic trait $i \in E$. This representation allows us to view $\{\nu(t) : t \geq 0\}$ as a process over the $K$-simplex $\Delta_K = \{\nu \in \mathbb{R}^K_+ : \sum_{i=1}^K \nu_i = 1\}$. For all $i, j \in E$ set $\theta_{ij} = \beta p(i) \vartheta(i, \{j\})$ and $\alpha_i = b^s(i)$. Then $\{\nu(t) : t \geq 0\}$ is a diffusion process over $\Delta_K$ whose generator acts on $f \in C^2(\mathbb{R}^K_+, \mathbb{R})$, where $\nu = (\nu_1, \ldots, \nu_K)$ and $\delta_{ij}$ is the Kronecker delta. This is the generator of the Wright-Fisher diffusion process [25]. Many explicit results about the fixation probabilities, fixation times and the stationary distribution can be found in [14].
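For intuition, a two-allele special case of such a diffusion can be simulated by a simple Euler-Maruyama scheme. The drift and noise coefficients below are the standard Wright-Fisher form with mutation; the specific constants are our illustrative stand-ins for the $\theta_{ij}$ and the sampling rate, not values taken from the paper.

```python
import math
import random

# Euler-Maruyama sketch of a two-allele Wright-Fisher diffusion with mutation:
# dX = (theta1*(1 - X) - theta2*X) dt + sqrt(c * X * (1 - X)) dW.
# All parameter values are illustrative.
def wright_fisher_path(x0=0.5, theta1=0.5, theta2=0.5, c=1.0,
                       dt=1e-3, t_max=10.0, seed=1):
    rng = random.Random(seed)
    x = x0
    for _ in range(int(t_max / dt)):
        drift = theta1 * (1.0 - x) - theta2 * x
        noise = math.sqrt(max(c * x * (1.0 - x), 0.0) * dt) * rng.gauss(0.0, 1.0)
        x = min(max(x + drift * dt + noise, 0.0), 1.0)   # clamp to [0, 1]
    return x
```

The multiplicative noise vanishes at the boundary, while the mutation drift pushes the frequency back into the interior; with $K$ alleles the same scheme runs on the simplex $\Delta_K$.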
Let us return to the situation where $E$ is a general compact metric space and the dynamics evolves according to (3.47). Assume that for all $x \in E$ we have $b^s(x) = 0$, $p(x) = 1$ and $\vartheta(x, \cdot) = \vartheta_0(\cdot)$, for some non-atomic probability measure $\vartheta_0 \in \mathcal{P}(E)$. The resulting Fleming-Viot process $\{\nu(t) : t \geq 0\}$ arises as a reformulation of the infinitely-many-neutral-alleles model due to Kimura and Crow [26] (see [12] and Section 9.2 in [13] for more details). In this case, $\nu(t)$ can be written as a countable sum $\sum_{i=1}^\infty a_i \delta_{x_i}$ for any $t > 0$ (see Theorem 7.2 in [13]). This means that at time $t$, a fraction $a_i$ of the population is located at $x_i$. Arranging these $a_i$'s in descending order, we can extract a process over the ordered infinite simplex $\Delta_\infty = \{(a_1, a_2, \ldots) : a_1 \geq a_2 \geq \cdots \geq 0, \ \sum_{i=1}^\infty a_i \leq 1\}$. This extracted process is a diffusion process over $\Delta_\infty$ whose various properties are presented in [11]. Furthermore, in [13] it is shown that the Fleming-Viot process $\{\nu(t) : t \geq 0\}$ is ergodic and its unique stationary distribution $\Pi \in \mathcal{P}(\mathcal{P}(E))$ is the law of $\sum_{i=1}^\infty \phi_i \delta_{\xi_i}$, where the infinite vector $(\phi_1, \phi_2, \ldots)$ has the Poisson-Dirichlet distribution with parameter $\beta/2\rho$ and $\xi_1, \xi_2, \ldots$ are i.i.d. with distribution $\vartheta_0$, independent of $(\phi_1, \phi_2, \ldots)$. The Poisson-Dirichlet distribution was introduced and studied by Kingman [27] in 1975.
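A sample close to the Poisson-Dirichlet distribution can be generated by the standard stick-breaking (GEM) construction followed by sorting; the truncation level below is an approximation parameter of this sketch, not part of the construction itself.

```python
import random

# Stick-breaking (GEM) sketch for the Poisson-Dirichlet distribution:
# draw V_i ~ Beta(1, alpha), set phi_i = V_i * prod_{j<i} (1 - V_j), then
# sort the weights in descending order. Truncating at n sticks leaves an
# unassigned mass with mean (alpha/(1+alpha))^n, negligible for large n.
def poisson_dirichlet_sample(alpha, n=500, seed=0):
    rng = random.Random(seed)
    remaining, weights = 1.0, []
    for _ in range(n):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)    # mass assigned to the i-th stick
        remaining *= 1.0 - v             # mass left to break further
    return sorted(weights, reverse=True)
```

In the setting above one would take $\alpha = \beta/2\rho$; the sorted weights then model the ranked allele frequencies at stationarity.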
The results mentioned in the last two paragraphs indicate the behaviour of the evolutionary dynamics under our original model for large values of N.

Cell polarity
Cell polarity is an important phenomenon and understanding the mechanisms responsible for it is a matter of fundamental concern for biologists. It is widely accepted that polarity is established in three stages (see [10,2,33]), which can be described as follows:
1. An unpolarized cell receives a spatial cue that may be intrinsic (coming from inside the cell) or extrinsic (coming from the extracellular environment).
3. The feedback network inside the cell is activated, which amplifies the weak initial signal into a robust signal that can direct the molecules towards the clustering site.
The feedback network has two components: positive feedback, which enables the membrane molecules to pull the cytosol molecules to their location on the membrane, and negative feedback, which pushes the membrane molecules into the cytosol. Positive feedback is responsible for the localized recruitment of molecules on the membrane while negative feedback helps in regulating the population size on the membrane. The molecules diffuse slowly on the membrane but rapidly in the cytosol. Even though the feedback mechanism may bring the molecules together on the membrane, any clusters that form may not persist due to spatial diffusion. This has led some biologists to propose that additional mechanisms are needed to generate spatial asymmetry (see [18,20]), but these mechanisms are not always found in cells that exhibit polarity. Hence it is important to investigate whether the feedback mechanism alone can counter spatial diffusion to establish cell polarity. For this purpose, Altschuler et al. [2] formulated a simple model based on the mechanisms mentioned above. We now describe their model. Consider the cell to be a sphere of radius $R$ in $\mathbb{R}^3$. The whole cell has $N$ molecules which may be present on the membrane or in the cytosol. The following four mechanisms change the configuration of molecules in the cell.
1. Association mechanism: Each molecule in the cytosol can move to a uniformly chosen location on the membrane at rate k on .
2. Positive feedback: Each molecule on the membrane pulls another molecule from the cytosol to its location at rate k fb × (fraction of molecules in the cytosol).
3. Negative feedback: Each molecule on the membrane is pushed into the cytosol at rate k off .
4. Spatial migration: Each membrane molecule is constantly diffusing on the membrane according to an independent Brownian motion with diffusion rate D.
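Ignoring spatial positions, the membrane/cytosol exchange produced by mechanisms 1-3 can be sketched with a small Gillespie simulation; the function name and parameter values are ours, chosen only for illustration.

```python
import random

# Gillespie sketch of the membrane molecule count under mechanisms 1-3,
# with space ignored: association at rate k_on per cytosol molecule,
# recruitment (positive feedback) at rate k_fb * (cytosol fraction) per
# membrane molecule, and drop-off (negative feedback) at rate k_off per
# membrane molecule. Diffusion (mechanism 4) does not change this count.
def membrane_fraction(N=500, k_on=0.1, k_fb=2.0, k_off=1.0, t_max=50.0, seed=3):
    rng = random.Random(seed)
    m, t = 0, 0.0                        # molecules on the membrane, time
    while t < t_max:
        cyt = N - m                      # molecules in the cytosol
        rates = [k_on * cyt,             # association
                 k_fb * m * cyt / N,     # positive feedback (recruitment)
                 k_off * m]              # negative feedback (drop-off)
        total = sum(rates)
        t += rng.expovariate(total)      # time to the next event
        u = rng.random() * total
        m += 1 if u < rates[0] + rates[1] else -1
    return m / N
```

Without association, the deterministic equilibrium membrane fraction is $1 - k_{off}/k_{fb}$, which matches the $h_{eq}$ appearing in the Fleming-Viot limit discussed below; a small association rate shifts it slightly upward.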
The association mechanism provides the initial spatial cue to trigger cluster formation. In [2], this spatial cue is intrinsic because the authors are concerned with spontaneous cell polarity, which means that polarity is established without any extracellular influence. Hence the association mechanism acts uniformly on the membrane. When one wants to consider polarity that is established in response to a chemical gradient (see [39]) then a molecule associating itself to the membrane must choose its location according to some distribution that encodes the gradient information. We mentioned in Section 1, that the positive feedback mechanism is like a birth process, where the pulled cytosol molecule is the offspring of the recruiting membrane molecule. This introduces genealogical relationships between the membrane molecules. A set of membrane molecules are said to belong to a clan if they have a common ancestor. Note that when the diffusion rate (D) is small, we would expect the clan members to be huddled together. The analysis of the above model in [2] gives some interesting results. When the dynamics is described deterministically, using a reaction-diffusion partial differential equation, then the model fails to capture cell polarity. However in the stochastic setting, the model does predict the formation of clusters in certain parameter regimes, when the number of molecules (N) is small. This result is proved by showing that the number of clans on the membrane drops to 1 at certain times. For small D, one would observe a cluster at these times. However the frequency of these events is proportional to N −1 , which indicates that polarity cannot occur in the large population limit N → ∞, unless other mechanisms are present.
In [19] we rigorously study this model under a different scaling of parameters. We multiply $k_{fb}$ and $k_{off}$ by $N$, leaving $k_{on}$ and $D$ unchanged. We keep track of the locations of the membrane molecules as well as their clan identities. A clan identity is a number in $[0, 1]$ which is passed unaltered from the parent molecule to the offspring. A molecule that associates itself to the membrane is assigned a uniformly chosen clan identity in $[0, 1]$. At any time, the molecules on the membrane that have the same clan identity should have a common ancestor and hence they must belong to the same clan. Note that here the number of types ($q$) is equal to 1 and the population density is the same as the fraction of cell molecules that are on the membrane. Let $\hat{E} = E \times [0, 1]$, where $E$ is the membrane (sphere of radius $R$ in $\mathbb{R}^3$). When the number of molecules is $N$, the population dynamics is described by a $\mathcal{M}_{N,a}(\hat{E})$-valued process $\{\mu^N(t) : t \geq 0\}$ as before. For any $h \in \mathbb{R}_+$ let $\beta(h) = k_{fb}(1 - h)$, $\rho(h) = k_{off}$ and $\kappa(h) = k_{on}(1 - h)$. Let $\Theta \in \mathcal{P}(\hat{E})$ be the uniform distribution on $\hat{E}$ and let the spatial migration operator $B$ (see Section 2.1) be $(D/2)\Delta$, where $\Delta$ denotes the Laplace-Beltrami operator on the sphere $E$. For any $f : E \times [0, 1] \to \mathbb{R}$, $\Delta$ acts on $f$ only as a function of the first coordinate. The operator $(D/2)\Delta$ is just the generator of the Brownian motion on $E$ with diffusion rate $D$. With this notation one can verify that this model is a special case of the model in Section 2.3.4. The association mechanism is analogous to immigration while the feedback mechanism gives rise to the density regulation mechanism. Theorem 2.6 (see also Theorem 2.3 in [19]) shows that as $N \to \infty$ we have $\mu^N \Rightarrow h_{eq} \nu$, where $h_{eq} = 1 - k_{off}/k_{fb}$ and $\{\nu(t) : t \geq 0\}$ is a $\mathcal{P}(\hat{E})$-valued Fleming-Viot process whose generator acts on functions of the form $F(\nu) = \prod_{l=1}^m \langle f_l, \nu \rangle \in C_0$.
It can be shown that this Fleming-Viot process is ergodic and has a unique stationary distribution in $\mathcal{P}(\mathcal{P}(E \times [0,1]))$ (see Section 5 in [13] and Proposition 2.5 in [19]). To study the evolution of the clan sizes we define a $\mathcal{P}([0, 1])$-valued process $\{\nu_c(t) : t \geq 0\}$ by $\nu_c(t)(A) = \nu(t)(E \times A)$ for Borel sets $A \subseteq [0, 1]$. This is a Fleming-Viot process that describes the infinitely-many-neutral-alleles model (recall the discussion in Section 3.1). Therefore for any $t > 0$, we can write $\nu_c(t) = \sum_{i=1}^\infty a_i \delta_{x_i}$, which means that a fraction $a_i$ of the population has clan identity $x_i$. At stationarity, the clan sizes (arranged in descending order) are distributed according to the Poisson-Dirichlet distribution with parameter $\alpha = k_{on}/k_{fb}$. Properties of the Poisson-Dirichlet distribution (see [15]) tell us that for any small $\epsilon > 0$, there is a positive probability of the largest clan having size greater than $(1 - \epsilon)$. Furthermore, one can show that at stationarity the molecules in each clan are concentrated on a circular patch on the membrane, and the square of the radius of this patch can be approximately computed in terms of the model parameters (see Theorem 2.7 in [19]). The last two assertions imply that if $D$ is small in comparison to $R^2$, then at stationarity there is a positive probability that most of the membrane molecules are in one clan and that clan is spread over a small area on the membrane. Due to ergodicity this event will occur infinitely often in any trajectory of the process $\{\nu(t) : t \geq 0\}$. Whenever this event happens we can expect the cell to be polarized. Therefore the limiting process exhibits recurring cell polarity. In [19] we discuss how the frequency of observing polarity depends on various model parameters.
The above analysis shows that if the feedback mechanism is strong enough, it can counter spatial diffusion to generate cell polarity. However, this conclusion is based on a highly simplified model. As mentioned in Section 1, most cells that exhibit polarity have complicated feedback circuits, with molecules of several types pulling each other on and off the membrane at various type-dependent rates. These different types of molecules may also have their own migration and association mechanisms. It would be interesting to know if the above analysis can be extended to general multi-type models for cell polarity. The results in this paper show that this can indeed be done as long as the feedback mechanism satisfies the assumptions in Section 2.2 and acts at a faster timescale than the association and migration mechanisms. In this case, Theorem 2.6 guarantees convergence to a Fleming-Viot process and this limiting process can then be analyzed in the same way as in [19]. This enables us to draw similar conclusions about the onset of cell polarity in this multi-type setting.
The well-posedness of the corresponding martingale problem is immediate from the well-posedness of the martingale problem for $B^N$ (see Chapter 4 in Ethier and Kurtz [12]). In our case, the dynamics under $A^N_l$ may exit any compact set of $\mathcal{M}^q_{N,a}(E)$. However, we can still argue the well-posedness of the corresponding martingale problem by showing that this exit time tends to infinity as the compact set becomes larger. We now make these ideas precise.
Lemma 4.1 Fix $l \in \{0, 1, 2, 3\}$, $N \in \mathbb{N}$ and $\pi \in \mathcal{P}(\mathcal{M}^q_{N,a}(E))$. For each $k \in \mathbb{N}$ let $\{\mu_k(t) : t \geq 0\}$ be a $\mathcal{M}^q_{N,a}(E)$-valued process with initial distribution $\pi$. Define a stopping time $\tau_k = \inf\{t \geq 0 : \|H(\mu_k(t))\|_1 \geq k\}$, where $H$ is the density map (2.2). Suppose that for each $k \in \mathbb{N}$ and $F \in C^q_0$, the process $F(\mu_k(t \wedge \tau_k)) - \int_0^{t \wedge \tau_k} A^N_l F(\mu_k(s))\, ds$ is a martingale. Then for any $t \geq 0$, $\lim_{k \to \infty} \mathbb{P}(\tau_k \leq t) = 0$.
Proof. Let {h k (t) = H(µ k (t)) : t ≥ 0} be the density process corresponding to µ k and let c k : R q + → R q + be the function defined by Pick any ǫ ∈ (0, 1). Since for each k ∈ N the distribution of µ k (0) is π there must exist a k ǫ > 0 such that If k satisfies k ǫ ≤ kǫ then (4.51) For each i ∈ Q and k ∈ N, define a function F k i ∈ C q 0 by F k i (µ) = c k,i (h)h i where h = H(µ). From (2.31) we know that for any µ ∈ M q N,a (E) One can easily verify that Note that for each i, j ∈ Q, the functions b s ji , d s i are bounded, while the functions θ i and κ i satisfy (2.16) and (2.25). This implies that there exists a positive constant C (depending on N and l) such that By the assumption stated in the statement of this lemma we can say that is a martingale starting at 0. Taking expectations we get Then summing over i ∈ Q in (4.53) and using (4.52) we arrive at From (4.51) and Gronwall's inequality, for k ≥ k ǫ /ǫ we obtain Then by Markov's inequality (4.54) Observe that For large k, the second probability on the right is 0 because of the following reason. The process h k has jumps of size 1/N and hence the definition of τ k (see (4.49)) implies that h k (t ∧ τ k ) 1 ≤ k + (1/N) < 2k. Therefore using (4.54) we get Letting ǫ → 0 proves the lemma. is open with a compact closure in M q F (E). Define an operator L k : for any F ∈ C q 0 . The operator L k can be seen as a bounded perturbation of the operator B N . We argued in Section 2.1 that the martingale problem for B N is well-posed. From Theorem 4.10.3 in Ethier and Kurtz [12], the martingale problem for L k is well-posed for each k ∈ N. Pick a π ∈ P M q N,a (E) and let {µ k (t) : t ≥ 0} be the unique solution to the martingale problem for (L k , π). Define a stopping time by Then for any F ∈ C q is a martingale. From (2.31) one can see that if µ ∈ U k then A N l F (µ) = L k F (µ). Using the optional sampling theorem we get that is a martingale. Lemma 4.1 ensures that for any t ≥ 0, lim k→∞ P(τ k ≤ t) = 0.
From Theorem 4.6.3 in Ethier and Kurtz [12] we can conclude that there exists a unique solution to the martingale problem for (A N l , π).

Properties of the limiting process
The goal of this section is to prove Proposition 2.5 which gives important insights into the limiting behaviour of the dynamics under the models described in Section 2.3. As mentioned in Section 2.4 our proof of Proposition 2.5 will rely on the work of Katzenberger [23] which studies how semimartingales behave when they are driven by a fast drift that forces them to stay on a stable manifold. Before we can use the framework in [23] we need to prove some preliminary results. We start by recalling a tightness condition for semimartingales.  Moreover assume that each semimartingale Z N can be written as where h N (t) = H(µ N (t)). Then Z N is a semimartingale with respect to the filtration generated by {µ N (t) : t ≥ 0}. For any compact K ⊂ R q + define where o K denotes the interior of the set K. If Z N K is the semimartingale given by Z N K (t) = Z N (t ∧ λ N (K)) for t ≥ 0, then the sequence of semimartingales {Z N K : N ∈ N} satisfies Condition 4.2.
Proof. Let i ∈ Q and f ∈ D 0 be fixed. For each N ∈ N and l ∈ {0, 1, 2, 3} define a function a N l on M q N,a (E), where h = H(µ), and let U N k ⊂ M q N,a (E) be defined accordingly. Then for each k ∈ N and l ∈ {0, 1, 2, 3} we have (4.59). To see this, note that for any N ∈ N, U N k ⊂ U k := {µ ∈ M q F (E) : ‖H(µ)‖ 1 ≤ 2k} and U k is a compact subset of M q F (E). For l ∈ {0, 1, 3}, a N l is a continuous function which does not depend on N, and hence we get (4.59) simply by observing that the required convergence holds uniformly over the compact set U k . Similarly we can define a continuous function for the case l = 2; here the C ji 's are the operators satisfying Assumption 2.2, and this assumption also implies a uniform bound. This bound along with (4.61) and the triangle inequality shows (4.59) for l = 2. Let c k : R q + → R q + be given by (4.50).
where h = H(µ). One can verify that for any µ ∈ U N k , (B N + G N l )F k (µ) = a N l (µ) (4.62) and (4.63) holds. From Lemma 4.1 we can conclude that for any fixed N, the stopping times τ N k converge to ∞ a.s. as k → ∞. Observe that F k belongs to the class C q 0 = D(A N l ). Hence F k (µ N (t)) − F k (µ N (0)) − ∫ t 0 A N l F k (µ N (s))ds is a martingale. From (2.31) and the optional sampling theorem we get that the correspondingly stopped process is also a martingale. If the set (0, t ∧ τ N k ] is non-empty then for any s ∈ (0, t ∧ τ N k ], we have c k (h N (s)) = 1 q and therefore F k (µ N (s)) = ⟨f, µ N i (s)⟩. If the set (0, t ∧ τ N k ] is empty then t ∧ τ N k = 0 and in this case m N k (t) = 0. From (4.62) and (4.63) we see that for all t ≥ 0, m N k (t) = m N (t ∧ τ N k ), where m N is defined by (4.64). But m N k is a martingale and for a fixed N, τ N k → ∞ a.s. as k → ∞. Therefore we can conclude that m N is a local martingale and hence Z N is a semimartingale.
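The quadratic-variation computation carried out next (with the squared function F 2 k) rests on the standard carré du champ identity: if µ solves the martingale problem for a generator L and both F and F 2 lie in its domain, then

```latex
m(t) = F(\mu(t)) - F(\mu(0)) - \int_0^t LF(\mu(s))\,ds
\quad\text{is a martingale, and so is}\quad
m^2(t) - \int_0^t \big(LF^2 - 2F\,LF\big)(\mu(s))\,ds,
```

so the expected quadratic variation of m is E ∫ t 0 (LF 2 − 2F LF)(µ(s)) ds.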
Let F 2 k : M q F (E) → R be given by F 2 k (µ) = (F k (µ)) 2 . Using the product structure one can verify the stated expressions for any µ ∈ U N k with h = H(µ). Using (2.31) and these expressions we can show, in a manner similar to (4.59), that (4.67) holds for any k ∈ N. The function F 2 k is also in C q 0 = D(A N l ). Therefore if m N k is the martingale given by (4.66) then the associated compensated square is also a martingale, and the expected quadratic variation of m N k can be computed as in (4.68). For any fixed k ∈ N, the sequence of semimartingales {Z N (· ∧ τ N k ) : N ∈ N} satisfies (4.56) because the discontinuities of µ N are of size proportional to 1/N. From (4.64) we can see how the semimartingale Z N (· ∧ τ N k ) can be decomposed. For any 0 < s ≤ τ N k , µ N (s) ∈ U N k . Using (4.68), (4.67) and (4.59) we can see that for any k ∈ N, the sequence of semimartingales {Z N (· ∧ τ N k )} satisfies Condition 4.2. For any compact set K ⊂ R q + , there exists a k such that K ⊂ {h ∈ R q + : ‖h‖ 1 < k}. If the stopping time λ N K is defined by (4.57) then λ N K ≤ τ N k a.s., where τ N k is given by (4.65). Hence it is immediate that if Z N K is the semimartingale defined by Z N K (·) = Z N (· ∧ λ N (K)) then the sequence of semimartingales {Z N K : N ∈ N} will also satisfy Condition 4.2.

Proof. [Proof of Proposition 2.5] Let {F N t } be the filtration generated by the process {µ N (t) : t ≥ 0}. For any compact set K ⊂ U eq let λ N (K) be given by (4.57). In this proof a sequence of {F N t }-semimartingales {Z N : N ∈ N} with paths in D R d [0, ∞) will be called well-behaved if for each compact K ⊂ U eq , the sequence of semimartingales {Z N (· ∧ λ N (K))} satisfies Condition 4.2.
Using Lemma 4.4 with f = 1 E (where 1 E is as in (2.2)), for each i ∈ Q we obtain a well-behaved R-valued semimartingale Z N,1 i such that the stated representation holds. Recall the definition of the matrix A(h) from (2.13); the above expression can be rewritten in terms of A(h). If we let Z N,1 be the R q -valued semimartingale with components Z N,1 i , we obtain (4.70), where the last equality holds due to definition (2.14). Now fix an f ∈ D 0 . From Lemma 4.4, for each i ∈ Q there is a well-behaved semimartingale Z N,f i such that the analogous representation holds. Using the integration by parts formula for semimartingales, for each i, j ∈ Q we can write (4.71), where Z N ij is another well-behaved semimartingale whose last term is the cross-variation term involving Z N,1 . The semimartingale Y N i is just a linear combination of semimartingales of the form h N j (t)⟨f, µ N i (t)⟩. Using (4.71) we can write (4.73), where Z N,2 i is a well-behaved semimartingale satisfying the corresponding identity for any t ≥ 0. Define a matrix G(h) for each h ∈ R q + as follows.
This allows us to write the dynamics compactly and hence, from (4.73), obtain (4.75). Let Y N and Z N,2 be the R q−1 -valued semimartingales formed by these components. From part (E) of Lemma A.1 the matrix G(h eq ) is stable, that is, all its eigenvalues have strictly negative real parts. We now define an (R q + × R q−1 )-valued semimartingale X N . From (4.70) and (4.75) we can see that X N satisfies (4.77), where Z N (t) = (Z N,1 (t), Z N,2 (t)) is a well-behaved semimartingale and F : R q + × R q−1 → R 2q−1 is the function given by the displayed formula. Let x eq = (h eq , 0 q−1 ). Then F (x eq ) = 0 2q−1 and the Jacobian matrix [JF (x eq )] ∈ M(2q − 1, 2q − 1) has a block lower-triangular form, where C is some matrix in M(q − 1, q) and O q,q−1 is the q × (q − 1) matrix of zeroes. We mentioned above that the matrix G(h eq ) is stable, and part (B) of Assumption 2.1 says that the matrix [Jθ(h eq )] is also stable. Due to the block triangular form, the matrix [JF (x eq )] is stable as well. Pick any h 0 ∈ U eq and y 0 ∈ R q−1 . Let h(t) = ψ θ (h 0 , t) for all t ≥ 0, where ψ θ is the flow defined in Section 2.2. The set U eq ⊂ R q + is ψ θ -invariant and hence h(t) ∈ U eq for all t ≥ 0. Let y(t) be the unique solution of the initial value problem dy/dt = G(h(t))y, y(0) = y 0 . (4.79) Since the above differential equation is linear in the y variable, the solution y(t) is defined for all t ≥ 0. Moreover h(t) → h eq and G(h(t)) → G(h eq ) as t → ∞. The matrix G(h eq ) is stable and therefore y(t) → 0 q−1 as t → ∞. Let U(x eq ) = U eq × R q−1 . For each x 0 = (h 0 , y 0 ) ∈ U(x eq ) and t ≥ 0, let ψ F (x 0 , t) = (h(t), y(t)) with h(t) = ψ θ (h 0 , t) and y(t) being the solution of (4.79). The mapping ψ F : U(x eq ) × R + → U(x eq ) is the flow of the vector field F on U(x eq ). For all x ∈ U(x eq ) and t ≥ 0, ψ F satisfies (4.80). From the discussion in the preceding paragraph we can conclude that ψ F (x, t) → x eq as t → ∞ for every x ∈ U(x eq ). We have assumed in this proposition that there is a compact set K 0 ⊂ U eq such that h N (0) ∈ K 0 a.s. for each N ∈ N.
Note that any function f ∈ D 0 is bounded. The definition of the semimartingale Y N guarantees that there is a compact set K 1 ⊂ R q−1 such that X N (0) = (h N (0), Y N (0)) ∈ K 0 × K 1 ⊂ U(x eq ) a.s. for each N ∈ N. Now consider equation (4.77). For large values of N, the semimartingale X N is driven by a large drift term of the form NF (X N (·)). The vector x eq is a stable fixed point for this drift term and U(x eq ) is its region of attraction. If we start in this region of attraction, then the drift completely overwhelms the effect of the well-behaved semimartingale Z N and drives X N to the stable fixed point x eq . Moreover, as N gets large, the trajectories of X N look more and more like the trajectories of the deterministic flow ψ F with time compressed by a factor of N. These ideas are made precise in a much more general setting by Katzenberger [23]. We use Theorem 6.3 in [23] to deduce the convergence (4.81) for any T > 0, where ‖·‖ is the standard Euclidean norm. From the definition of X N and ψ F it is also clear that for any T > 0 the analogous statement (4.82) holds. From now on, for any x ∈ R n and ε > 0, let B n ε (x) denote the open ball in R n centered at x with radius ε. We have already argued that the Jacobian matrix of F at x eq is stable.
From a simple linearization argument (see for example the proof of Theorem 3.7 in Khalil [24]) we can see that there exists a δ 0 > 0 such that the open ball B 2q−1 δ 0 (x eq ) ⊂ U(x eq ), and for every δ ∈ (0, δ 0 ) there exists a ψ F -invariant open set W δ whose closure W δ is contained in B 2q−1 δ (x eq ). From (4.81) we obtain the corresponding limit. As W δ is ψ F -invariant, the open sets on the right grow as t increases. Compactness of K 0 × K 1 implies that there exists a t δ > 0 such that ψ F (x, t) ∈ W δ for all x ∈ K 0 × K 1 and t ≥ t δ . This immediately gives us (4.83). Now let t N be any sequence satisfying the conditions of this proposition. Then for any T > 0 we have (4.84), where the second inequality holds because X N (0) ∈ K 0 × K 1 a.s. for each N ∈ N. From (4.82) and (4.84) we can see that (4.85) holds as N → ∞. This proves part (A) of the proposition since the norms ‖·‖ and ‖·‖ 1 are equivalent on R q . From the definition of Y N i we can check the stated identities for each i, j ∈ Q and t ≥ 0. The limits (4.85) immediately give us part (B) of the proposition for any f ∈ D 0 . As D 0 is dense in C(E), part (B) holds for any f ∈ C(E).
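The heuristic used in this proof, a drift of size N overwhelming a well-behaved noise term and collapsing trajectories onto the stable fixed point, can be illustrated with a one-dimensional toy simulation. The function below is purely illustrative (F(x) = x_eq − x, with Brownian increments standing in for the well-behaved semimartingale Z N); it is not the paper's model.

```python
import numpy as np

def simulate_fast_drift(N, T=1.0, dt=1e-4, x0=0.2, x_eq=1.0, seed=0):
    """Euler scheme for dX = N*(x_eq - X) dt + dZ, a one-dimensional toy
    analogue of equation (4.77): the large drift N*F pins X near the
    stable fixed point x_eq, while dZ is small Brownian noise."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(int(T / dt)):
        x += N * (x_eq - x) * dt + rng.normal(0.0, np.sqrt(dt / N))
    return x

# As N grows, the terminal state concentrates at x_eq = 1.
print(abs(simulate_fast_drift(N=10) - 1.0))    # moderate N: already close
print(abs(simulate_fast_drift(N=1000) - 1.0))  # large N: much closer
```

The stationary fluctuation of this toy Ornstein-Uhlenbeck process has standard deviation of order 1/N, which mirrors the way the well-behaved noise is suppressed by the fast drift in the proof.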
The following lemma will be useful in proving Theorem 2.6.
Lemma 4.5 Let the notation and assumptions be the same as in Proposition 2.5. Then there is a compact set K ⊂ U eq such that for all T > 0 the stated convergence holds. Proof. Since the Jacobian matrix [Jθ(h eq )] is stable (part (B) of Assumption 2.1), by a linearization argument similar to the one referred to in the proof above, we can find an ε > 0 such that the open ball B q ε (h eq ) ⊂ U eq and there exists a ψ θ -invariant open set U ε whose closure U ε ⊂ B q ε (h eq ). One can argue as before that since K 0 is a compact set, there exists a t ε such that for all t ≥ t ε and x ∈ K 0 , ψ θ (x, t) ∈ U ε . If we define K 0 accordingly, then it is a ψ θ -invariant compact set containing K 0 . Because we have assumed that h N (0) ∈ K 0 a.s. for all N ∈ N, we must have that for all t ≥ 0, ψ θ (h N (0), t) ∈ K 0 a.s. But U eq is open in R q + (see Section 2.2) and so there is a γ > 0 such that the stated inclusion holds. The limit on the right is 0 due to (4.83) and this proves the lemma.
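The linearization argument invoked in this proof (and in the previous one) produces invariant neighbourhoods from a quadratic Lyapunov function. A small numerical sketch, using an arbitrary stable matrix J rather than the actual Jacobians [Jθ(h eq )] or [JF (x eq )]:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# For a stable matrix J (all eigenvalues in the open left half-plane),
# solving J^T P + P J = -I gives a symmetric positive-definite P, and the
# sublevel sets {x : x^T P x <= c} are forward-invariant for dx/dt = J x.
# This is the mechanism behind the invariant sets W_delta and U_epsilon.
J = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
P = solve_continuous_lyapunov(J.T, -np.eye(2))

eigs_P = np.linalg.eigvalsh(P)
print(eigs_P)  # both eigenvalues positive: P is positive definite

# Along trajectories of dx/dt = J x, d/dt (x^T P x) = -||x||^2 < 0:
x = np.array([1.0, 1.0])
dV = x @ (J.T @ P + P @ J) @ x
print(dV)  # equals -||x||^2 = -2
```

Since the Lyapunov function decreases along the flow, any sublevel set contained in the ball B δ (x eq ) serves as the invariant open set W δ of the proof.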

Solution to a system of partial differential equations
Recall the discussion at the end of Section 2.4. To prove Theorem 2.6 we require a function Λ that allows us to construct a P(E)-valued process {ν N (t) : t ≥ 0} (see (2.44)) whose dynamics is well-behaved as N approaches ∞. The goal of this section is to guarantee that such a function Λ exists. Specifically, we need to show that for some open set U eq ⊂ R q containing U eq (given by (2.20)), we have a function Λ ∈ C 2 ( U eq , R q * ) which satisfies (4.87) along with ⟨Λ(h), h⟩ = 1 for all h ∈ U eq (4.88) and Λ(h eq ) = v eq . (4.89) Here v eq is defined in (2.34) and [JΛ(h)] in equation (4.87) refers to the Jacobian matrix of Λ at h. The significance of the above relations will become clear in Section 4.4. The major difficulty in solving (4.87) arises in the neighbourhood of h eq . This is because θ(h eq ) = 0 q (part (A) of Assumption 2.1), which causes a degeneracy in the system. However, the next proposition shows that by employing power series expansions we can get around this problem and find an analytic solution to (4.87) in a neighbourhood of h eq . We later construct an open set U eq containing U eq and extend the solution over the whole of U eq . We also show that this solution has all the properties we desire.

Proposition 4.6
There exists an open set V containing h eq such that the equation (4.87) has an analytic solution Λ on V satisfying (4.89).
Proof. We first transform equation (4.87) into another equation that is easier to work with. Let λ 1 , . . . , λ q be the eigenvalues of the matrix [Jθ(h eq )]. We will prove this proposition under the assumption that all these eigenvalues are real, and remark at the end on how the proof changes when they take complex values.
We know from part (B) of Assumption 2.1 that λ i < 0 for each i ∈ Q. Pick an ε 0 ∈ (0, 1) such that (4.90) holds. Let M 1 ∈ M(q, q) be the matrix representing the Jordan canonical form of [Jθ(h eq )]. Its diagonal is occupied by λ 1 , . . . , λ q , while its super-diagonal entries are either 0 or 1; all the other entries are 0. Let P 1 ∈ M(q, q) be the invertible matrix such that P 1 [Jθ(h eq )]P 1 −1 = M 1 . Then there is an invertible P ∈ M(q, q) with P [Jθ(h eq )]P −1 = M (4.91), where M is just the matrix M 1 with each 1 on the super-diagonal replaced by ε 0 . In this proof, 0 will always denote the vector of zeroes in R q . Since h eq ∈ R q * (see part (A) of Lemma A.1) and the map x → h eq + P −1 x is continuous, we can find an r 0 > 0 such that for any x ∈ B q r 0 (0) we have h eq + P −1 x ∈ R q * , where B q r 0 (0) is the open ball in R q with radius r 0 centered at 0. For all x ∈ B q r 0 (0), let A(x) ∈ M(q, q) and θ(x) ∈ R q be given by A(x) = A T (h eq + P −1 x) and θ(x) = P θ(h eq + P −1 x).
Suppose β : U → R q is a function which is analytic in an open set U ⊂ B q r 0 (0) and satisfies equation (4.92) for all x ∈ U, along with the condition (4.93), where v eq is given by (2.34). If V ⊂ R q + is the image of U under the map x → h eq + P −1 x, then V is an open set containing h eq and the function Λ : V → R q defined by Λ(h) = β(P (h − h eq )) is an analytic solution to (4.87) satisfying (4.89). Hence to prove the proposition it suffices to show that equation (4.92) has a solution β in some neighbourhood of 0 which satisfies (4.93).
Applying the operator D α to equation (4.92) and using the product rule for multiderivatives, we obtain an identity which, on rearranging, yields (4.97). For any j ∈ Q, let e j ∈ N q 0 be the multi-index (0, . . . , 0, 1, 0, . . . , 0), with the 1 at the j-th position. Observe that if |α − ν| = 1 then ν = α − e j for some j ∈ Q. Note that [J θ(0)] = M (see (4.91)). This matrix has the eigenvalues λ 1 , . . . , λ q on the diagonal and either 0 or ε 0 on the super-diagonal. For each j = 2, . . . , q let ε j = ε 0 if M (j−1)j = ε 0 and ε j = 0 otherwise. Note also that θ(0) = P θ(h eq ) = 0 and for each α ∈ N q 0 , γ α is given by (4.95). We plug x = 0 into (4.97) and divide by α!. Simplifying the second term, we can write Y α as in (4.98). For each k ∈ N, let S k be the set of multi-indices given by S k = {α ∈ N q 0 : |α| = k} and let s k denote the number of elements in S k . We order the multi-indices in S k as follows. We say that ν ≼ α if and only if Σ i∈Q iν i ≤ Σ i∈Q iα i . Let α k (1), . . . , α k (s k ) be all the elements of S k listed in the order given by ≼. Let the matrix Ξ (k) ∈ M(qs k , qs k ) be a block matrix composed of s k 2 blocks of size q × q. For each i, j ∈ {1, 2, . . . , s k } the block starting at row q(i − 1) + 1 and column q(j − 1) + 1 of the matrix Ξ (k) is occupied by the matrix L ij ∈ M(q, q) defined as follows. If i = j then L ii = A(0) + Σ l∈Q α k l (i)λ l I q . If i and j are such that α k (j) = α k (i) − e l + e l−1 for some l ∈ {2, . . . , q} then L ij = ε l (α k l−1 (i) + 1)I q . For every other i and j, L ij is just a matrix of zeroes. The matrix Ξ (k) is lower block-triangular and its determinant is the product of the determinants of the diagonal blocks. The eigenvalues λ 1 , . . . , λ q satisfy (4.90). Since all the eigenvalues of the matrix A(0) = A T (h eq ) have non-positive real parts (see part (B) of Lemma A.1), this determinant is non-zero. Hence the matrix Ξ (k) is invertible.
Let X (k) and Y (k) be the vectors in R qs k obtained by stacking the γ α and the Y α for α ∈ S k . Using (4.98) we obtain the linear system Ξ (k) X (k) = Y (k) , and since the matrix Ξ (k) is invertible, X (k) = (Ξ (k) ) −1 Y (k) . (4.100) Note that Y (k) only depends on {γ α : α ∈ S l for l ∈ {0, 1, . . . , k − 1}}. Hence for each k ∈ N we can solve for the whole set {γ α : α ∈ S k } using (4.100). Doing this iteratively for each k, we can solve for γ α for all α ∈ N q 0 . The function β given by (4.94) with this choice of γ α 's will solve (4.92) in a neighbourhood of 0 if we can show that (4.96) holds for some C > 0. Showing this will be our next task.
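The order-by-order scheme just described, solving a linear system at degree k whose right-hand side involves only lower-degree coefficients, is the same mechanism as in the following one-dimensional toy computation, where the Taylor coefficients of the solution of y' = (1 + x)y are obtained recursively. This is only an analogue: the paper's system (4.92) is multi-dimensional and the degree-k systems are the Ξ (k) .

```python
import math

def series_coeffs(n_terms):
    """Taylor coefficients of the solution of y' = (1 + x) y with y(0) = 1.
    Matching coefficients gives the recursion (k+1) c_{k+1} = c_k + c_{k-1},
    so each coefficient is determined by lower-order ones, just as each
    block of gamma_alpha's is determined by (4.100)."""
    c = [1.0]
    for k in range(n_terms - 1):
        prev = c[k - 1] if k >= 1 else 0.0
        c.append((c[k] + prev) / (k + 1))
    return c

x = 0.1
approx = sum(ck * x**k for k, ck in enumerate(series_coeffs(15)))
exact = math.exp(x + x**2 / 2)  # closed-form solution of the toy ODE
print(approx, exact)  # the truncated series matches to high accuracy
```

The convergence of the resulting series is exactly what the geometric bound (4.96) on the coefficients guarantees in the multi-dimensional setting.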
Any entry on the diagonal of Ξ (k) has the form A ii (0) + Σ j∈Q λ j α j for some α ∈ S k and i ∈ Q. Observe that A(0) = A T (h eq ) and this matrix only has non-positive entries on its diagonal (see (2.13)). From (4.90), for α ∈ S k we obtain the estimate (4.101). For each row of Ξ (k) , the sum of the absolute values of the non-diagonal entries is bounded above by (4.102). Hence from (4.101) and (4.102) we can conclude that there exists a K 0 ∈ N such that for all k ≥ K 0 the matrix Ξ (k) is strictly diagonally dominant and (4.103) holds. Theorem 1 in Varah [36] then bounds ‖(Ξ (k) ) −1 ‖ ∞ for all k ≥ K 0 . Part (D) of Assumption 2.1 says that for each i, j ∈ Q, the functions ρ i and β ij are analytic in a neighbourhood of h eq . This implies that there is a neighbourhood U of 0 such that the M(q, q)-valued function A and the R q -valued function θ are analytic component-wise on U. Therefore there is a constant C 0 , which we may assume satisfies C 0 > 1, such that (4.104) holds. Choose a δ > 0 satisfying δ < ε 0 /(C 0 q(q + 1)2 q+3 ) (4.105) and define C = C 0 /δ. We will prove (4.96) by induction. Let k > K 0 and suppose that C is large enough to satisfy (4.106) for all l ∈ {1, 2, . . . , k − 1} and ν ∈ S l . To prove (4.96) we need to show that ‖γ α ‖ ∞ ≤ C k for all α ∈ S k , which is equivalent to showing that ‖X (k) ‖ ∞ ≤ C k . From (4.100) and (4.103) we obtain a bound on ‖X (k) ‖ ∞ , and hence to prove (4.96) it suffices to show (4.107). From (4.99), (4.104) and (4.106), for any α ∈ S k we get an estimate on Y α . But C = C 0 /δ and |α| = k, so this yields (4.109). Since δ ∈ (0, 1/2) and n(α) ≤ q, by Taylor's theorem we obtain (4.110). Using the estimates (4.109), (4.110) and (4.108) we arrive at a bound which, since δ satisfies (4.105), shows (4.107) and completes the proof of the proposition. At the beginning of the proof we had assumed that the eigenvalues λ 1 , . . . , λ q of the matrix [Jθ(h eq )] are all real. If that is not true then the invertible matrix P that appears in (4.91) has complex entries. Let C denote the field of complex numbers.
Define a map φ : R q → C q by φ(h) = P (h − h eq ). The image of this map, denoted by φ(R q ), sits as a q-dimensional real vector space in C q . The map φ is an infinitely differentiable isomorphism between R q and φ(R q ) and using this we can define derivatives of real-valued functions over φ(R q ). As above, we can obtain an analytic solution β of (4.92) satisfying (4.93), defined on some open set U in φ(R q ) containing 0. On V = φ −1 (U), the function Λ defined by Λ(h) = β(φ(h)) will then be an analytic solution to (4.87) satisfying (4.89).
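The inverse bound from Theorem 1 in Varah [36] used in the proof above can be checked numerically on any strictly diagonally dominant matrix. The matrix below is an arbitrary example, not one of the Ξ (k) .

```python
import numpy as np

# Varah's bound: if A is strictly diagonally dominant with row gaps
#   d_i = |a_ii| - sum_{j != i} |a_ij| > 0,
# then ||A^{-1}||_inf <= 1 / min_i d_i.
A = np.array([[ 5.0, 1.0, -2.0],
              [ 1.0, 6.0,  2.0],
              [-1.0, 1.0,  4.0]])
row_offdiag = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
gaps = np.abs(np.diag(A)) - row_offdiag   # all positive: strict dominance
varah_bound = 1.0 / gaps.min()
inv_norm = np.linalg.norm(np.linalg.inv(A), np.inf)
print(inv_norm, varah_bound)  # the computed norm respects the bound
```

In the proof, the uniform row gap of the Ξ (k) grows linearly in k, which is why the inverse norms stay summable and the induction closes.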
The above proposition provides us with an analytic solution to (4.87) in a neighbourhood of h eq . Our next task is to extend it to a solution in C 2 ( U eq , R q * ) where U eq is an open set in R q containing U eq .
Recall from Section 2.2 that for all i, j ∈ Q, β ij , ρ i are functions in C 2 (R q + , R + ). Let O ⊂ R q be the open set containing R q + defined by O = {h ∈ R q : h i > −1 for all i = 1, . . . , q} .
Then we can extend the functions β ij , ρ i to functions β ij , ρ i ∈ C 2 (R q , R + ) such that β ij (h) = 0 and ρ i (h) = 0 for all h ∉ O. Moreover, since each β ij is bounded, we can make sure that its extension β ij is also bounded. For each h ∈ R q let A(h) ∈ M(q, q) be the matrix defined by (2.13) with β ij , ρ i replaced by β ij , ρ i . Also let θ ∈ C 2 (R q , R q ) be the function given by (4.111). Corresponding to θ we can define the flow map ψ ∈ C 2 (R q × R + , O) as the unique solution to the equation analogous to (2.18), with θ replaced by θ. Define the region of attraction of the fixed point h eq accordingly. Then U eq is an open set in R q (see Lemma 3.2 in [24]) containing U eq .

Proposition 4.7
There exists a solution Λ ∈ C 2 ( U eq , R q + ) of (4.87) satisfying (4.88) and (4.89).
Proof. Suppose that U ⊂ U eq is any ψ-invariant open set and the function Λ ∈ C 2 (U, R q ) satisfies (4.87) and (4.89). We first show that this function automatically satisfies (4.88) on U. Using (4.87) and the ψ-invariance of U, we find that the time derivative of ⟨ ψ(h, t), Λ( ψ(h, t))⟩ vanishes, where the last equality holds due to (4.111). This shows that for any fixed h ∈ U, the function ⟨ ψ(h, t), Λ( ψ(h, t))⟩ is a constant function of time. Therefore (4.89) implies that this constant equals ⟨v eq , h eq ⟩ = 1, which proves that Λ satisfies (4.88) on U. For any h ∈ U and 0 ≤ t ≤ t 0 , let Φ(h, t, t 0 ) be the matrix defined in Lemma A.2. Since Λ satisfies (4.112) we must have Λ( ψ(h, t)) = Φ(h, t, t 0 )Λ( ψ(h, t 0 )). (4.113) From Proposition 4.6 we know that on some open set V ⊂ R q containing h eq we can find a solution Λ ∈ C 2 (V, R q ) that satisfies (4.87) along with (4.89). Since v eq ∈ R q * (that is, it is positive component-wise) and Λ is a continuous function, by shrinking V if necessary we can ensure that the image of V under Λ lies in R q * .
Let h ∈ O n and m ≥ n. Then ψ(h, n) ∈ W . From part (A) of Lemma A.2 and (4.115) we can deduce that for any t ∈ [0, n) Hence if we define the map λ : then λ is a well-defined function in C 2 ( U eq × R + , R q * ) which satisfies From (4.114) and the definition of the matrix Φ we can see that Then this map is in C 2 ( U eq , R q * ) and (4.117) implies that for any (h, t) ∈ U eq × R + ψ(h, t)).
If we set t = 0 then we see that Λ is a solution to (4.87). Since Λ satisfies (4.89), equation (4.116) implies that Λ will also satisfy it. We have already shown that such a solution of (4.87) will automatically satisfy (4.88) for all h ∈ U eq . This completes the proof of the proposition.

Fleming-Viot convergence
In this section we will finally prove the main result of our paper, which is Theorem 2.6. Let {µ N (t) : t ≥ 0} be a M q N,a (E)-valued process with generator A N l for some l ∈ {0, 1, 2, 3}. As outlined at the end of Section 2.4, we first extract a P(E)-valued process {ν N (t) : t ≥ 0} from the process {µ N (t) : t ≥ 0}. This step requires a solution Λ of (4.87) whose existence was shown in Section 4.3. We then show that as N → ∞, we have ν N ⇒ ν where {ν(t) : t ≥ 0} is an appropriately defined Fleming-Viot process. This convergence and Proposition 2.5 prove Theorem 2.6. Before we proceed we need some preliminary results.
For any set U ⊂ R q , let M q F (E : U) denote the set of measures µ ∈ M q F (E) whose density vector H(µ) lies in U. From now on let Λ ∈ C 2 ( U eq , R q * ) be a function that satisfies (4.87), (4.88) and (4.89) on some open set U eq ⊂ R q containing U eq . Such a function exists by Proposition 4.7. Define a continuous map Γ : M q F (E : U eq ) → P(E) by letting ν = Γ(µ) be the measure given by (4.120), with h = H(µ) being the density vector corresponding to µ. Note that for each h ∈ U eq , Λ(h) is a vector which is positive in each component and hence ν(S) ≥ 0 for all S ∈ B(E).
Since the function Λ satisfies (4.88) we have This shows that ν is a probability measure on E. Let Υ ′ be the class of functions in C (M q F (E : U eq )) given by Let Υ be the smallest algebra of functions in C (M q F (E : U eq )) containing Υ ′ . Observe that if G ∈ C (M q F (E : U eq )) and L ∈ Υ, then the product GL is in Υ. Given two functions G 1 , G 2 ∈ C (M q F (E : U eq )) we say that G 2 (µ) = G 1 (µ) + Υ for all µ ∈ M q F (E : U eq ) if and only if the function (G 2 − G 1 ) is in the class Υ.
Let F ∈ C(P(E)) be a function in the class C 0 defined by (2.4). Then F has the form (4.122), where f 1 , . . . , f m ∈ D 0 . Corresponding to F , define the functions F l , F lk ∈ C 0 for all distinct l, k ∈ Q by (4.123). Using any F ∈ C 0 we construct a function F ∈ C q 0 as follows. We first extend the definition of Λ to the whole of R q by letting Λ(h) = 0 q for all h ∉ U eq . If F has the form (4.122) then consider the function F : M q F (E) → R given by (4.124), where h = H(µ). Due to (4.88), the function F is in the class C q 0 defined by (2.11). The next result demonstrates how the action of various operators on functions of the form (4.124) can be approximated.
Proposition 4.8 Let F ∈ C 0 have the form (4.122). Corresponding to F let F ∈ C q 0 have the form (4.124) and for distinct l, k ∈ Q let F l , F lk be given by (4.123). Then for all µ ∈ M q F (E : U eq ) with h = H(µ) and ν = Γ(µ) we have the following. (A) Let R N be the operator given by (2.21). Then where (4.126) (B) Let B N be the operator given by (2.12). Then (C) Let G N 1 be the operator given by (2.28). Then where for any x ∈ E and h ∈ R q (D) Let G N 2 be the operator given by (2.29). Then where the operators C ij are as in Assumption 2.2.
(E) Let G N 3 be the operator given by (2.30). Then

(4.131)
Proof. For any j ∈ Q, let e j be the vector in R q of the form e j = (0, . . . , 0, 1, 0, . . . , 0) with the 1 at the j-th position. Since Λ ∈ C 2 ( U eq , R q + ) and U eq is an open set containing U eq , if h ∈ U eq then using Taylor's theorem we can write a second-order expansion for any i, j ∈ Q. But then for any µ ∈ M F (E : U eq ) and x ∈ E we obtain an expression in which F l , F lk are as in (4.123) and χ j l (µ, x), φ j l (µ, x) are given by (4.132). On rearranging we obtain an expression for NR N F (µ) valid for any µ ∈ M F (E : U eq ). But note that one of its terms involves only the matrix A(h) and the vector θ(h), defined by (2.13) and (2.14), and since the function Λ satisfies (4.87), this term is just 0. Hence the formula for NR N F (µ) simplifies to

Equation (4.88) says that ⟨Λ(h), h⟩ = 1 for all h ∈ U eq .
Pick a j ∈ Q. Differentiating the above equation with respect to h j we get and differentiating again with respect to h j we obtain Recall that for any µ ∈ M F (E : U eq ), ν = Γ(µ) is given by (4.120). Using (4.88) one can verify that for any f ∈ C(E) and i ∈ Q (4.137) But the second term on the right is a function in the class Υ. Hence for all µ ∈ M F (E : U eq ) From the definitions of χ j l and φ j l (see (4.132)) it is immediate that for any j ∈ Q, l ∈ {1, . . . , m} and x ∈ E we have the following relations. For all µ ∈ M F (E : U eq ) Using (4.135) and (4.136) we obtain Recall that the class Υ is invariant under multiplication by functions in C(M F (E : U eq )). It can be checked that for any i, j ∈ Q and l, k ∈ {1, . . . , m} and the function µ → φ j l (µ, ·), µ i belongs to class Υ. Substituting these two relations in (4.134) proves part (A) of this proposition.
Recall the definition of the operator B n i from Section 2.1. If G(ν) = ∏ l j=1 ⟨g j , ν⟩ ∈ C 0 then one can verify (see Section 2.2 in [7]) that there is a constant c (depending on l and g 1 , . . . , g l ) such that the stated bound holds. This proves part (B) of the proposition. Using (4.133), (4.139) and (4.138) we obtain part (C). Using (4.133), (4.132) and (4.138) we obtain the corresponding expression, which proves part (D).
Again using (4.133), (4.139) and (4.138) we see that This proves part (E) and completes the proof of this proposition.
We now prove the main theorem of the paper. Proof. [Proof of Theorem 2.6] Fix an l ∈ {0, 1, 2, 3} and let {µ N (t) : t ≥ 0} be a solution to the martingale problem for A N l . Let {h N (t) = H(µ N (t)) : t ≥ 0} be the corresponding density process. Since µ N (0) ⇒ µ(0) as N → ∞ and H(µ(0)) ∈ U eq a.s., we must have that h N (0) ⇒ h(0) and h(0) ∈ U eq a.s. It suffices to prove the theorem under the assumption that for all N ∈ N, h N (0) ∈ K 0 a.s. for some compact K 0 ⊂ U eq . By Lemma 4.5, we can find a bigger compact set K ⊂ U eq containing K 0 and define the stopping time σ N accordingly. Let F ∈ C 0 be a function of the form (4.122) and let F ∈ C q 0 have the form (4.124). From (2.31) and Proposition 4.8 we can conclude that (4.142) holds for all µ ∈ M F (E : U eq ), where h = H(µ), ν = Γ(µ) and the continuous functions ζ B F , ζ R F , ζ G,0 F , ζ G,1 F , ζ G,2 F , ζ G,3 F from U eq × P(E) to R are defined in Remark 4.9.
The relation (4.142) implies (4.143). The function F belongs to the domain of A N l and the process {µ N (t) : t ≥ 0} is a solution to the martingale problem for A N l . Hence F (µ N (t)) − F (µ N (0)) − ∫ t 0 A N l F (µ N (s))ds is a martingale, and using the optional sampling theorem we see that the correspondingly stopped process is also a martingale. Note that for all t ≥ 0, µ N (t ∧ σ N ) is in the set M F (E : U eq ). Define a P(E)-valued process ν N accordingly. The form of the functions F and F shows that F (µ N (t ∧ σ N )) = F (ν N (t)) for any t ≥ 0. Hence the martingale m N F can be rewritten as (4.145). The linear span of functions in the class C 0 is a dense sub-algebra of C (P(E)) and for every F ∈ C 0 we have the martingale relation (4.145). Theorems 3.9.1 and 3.9.4 in Ethier and Kurtz [12], along with the estimate (4.143), imply that the sequence of processes {ν N : N ∈ N} is tight in the space D P(E) [0, ∞). Let t N be a sequence satisfying the conditions of Theorem 2.6 and pick a T > 0. It is easy to see that (4.147) holds, and similarly for any f ∈ C(E) and i, j ∈ Q we have (4.148). We argued before that the sequence of processes {ν N : N ∈ N} is tight. Let {ν(t) : t ≥ 0} be a limit point. Then along some sequence k N , ν N ⇒ ν as N → ∞. Since σ N ⇒ ∞, using the continuous mapping theorem and (4.147) we obtain convergence of the corresponding quantities along the subsequence k N . Using (4.151) and the continuous mapping theorem we can conclude that for any F ∈ C 0 , as N → ∞, the sequence of martingales m N F (given by (4.145)) converges in distribution along the subsequence k N to the martingale given by F (ν(t)) − F (ν(0)) − ∫ t 0 A l F (ν(s))ds. (4.152) This shows that {ν(t) : t ≥ 0} is a solution to the martingale problem for A l . Let π ∈ P(P(E)) be the distribution of Γ(µ(0)). Since µ N (0) ⇒ µ(0) as N → ∞ and Γ is a continuous map, we must also have ν N (0) ⇒ ν(0), where ν(0) has distribution π. We argued in Section 2.4 that the martingale problem for each A l is well-posed.
Hence {ν(t) : t ≥ 0} is the unique solution to the martingale problem for (A l , π) and thus ν N ⇒ ν as N → ∞ along the entire sequence. Moreover, the limiting process has sample paths in C P(E) [0, ∞) almost surely. Let { µ N (t) : t ≥ 0} be the process defined by (2.43). Pick an i ∈ Q and f ∈ C(E). From (4.137), for any 0 ≤ t < σ N − t N we can write the corresponding representation. Since ν N ⇒ ν, σ N ⇒ ∞ and t N → 0, (4.147), (4.148) and the continuity of the sample paths of {ν(t) : t ≥ 0} imply that for any T > 0, sup t∈[0,T ] |⟨f, µ N i (t)⟩ − h eq,i ⟨f, ν(t)⟩| ⇒ 0 as N → ∞.
This holds for any f ∈ C(E) and i ∈ Q. Hence by Theorem 3.7.1 in Dawson [7], µ N ⇒ h eq ν as N → ∞ in the Skorohod topology on D M q F (E) [0, ∞). This completes the proof of Theorem 2.6.
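As a toy illustration of the kind of limit proved here: the type frequency in a neutral two-type Moran model is a bounded martingale whose diffusion limit is the one-dimensional Wright-Fisher special case of a neutral Fleming-Viot process. The simulation below has none of the paper's structure (no space, no density regulation, no fast regulation mechanism); it only illustrates the neutral martingale property of the type frequency.

```python
import random

def moran_final_freq(n=100, steps=2000, p0=0.5, rng=None):
    """Neutral two-type Moran model: at each step a uniformly chosen
    individual dies and is replaced by the offspring of another uniformly
    chosen individual. Returns the final frequency of type 1."""
    if rng is None:
        rng = random.Random(0)
    k = int(n * p0)  # number of type-1 individuals
    for _ in range(steps):
        birth = 1 if rng.random() < k / n else 0  # parent is type 1?
        if rng.random() < k / n:                  # dying individual type 1?
            k -= 1
        k += birth
    return k / n

rng = random.Random(0)
runs = [moran_final_freq(rng=rng) for _ in range(400)]
mean = sum(runs) / len(runs)
print(mean)  # close to the initial frequency 0.5 (martingale property)
```

Individual runs drift towards fixation at 0 or 1, but the average over replicates stays near p0, which is the finite-N shadow of the neutrality of the limiting Fleming-Viot process.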

Remark 4.10
In the statement of Theorem 2.6 we did not specify how the initial distribution π of the limiting Fleming-Viot process {ν(t) : t ≥ 0} is related to the distribution of µ(0). However the above proof makes it clear that π is the distribution of Γ(µ(0)) where Γ is the map defined by (4.119).
where v is some vector in R q−1 . Let L = Diag(h eq ) and Q = P −1 L −1 . Note that Qh eq = P −1 L −1 h eq = P −1 1 q = 0 q and hence This shows that QG(h eq )Q −1 = QA(h eq )Q −1 . Hence the matrices G(h eq ) and A(h eq ) are similar and have the same eigenvalues. Part (B) of this lemma and (A.2) imply that the matrix G(h eq ) is stable. This completes the proof of this lemma.
Let A, θ, ψ and U eq be as defined in Section 4.3 (just prior to Proposition 4.7).
Lemma A.2 Fix a t 0 > 0. For (h, t) ∈ U eq × [0, t 0 ] consider the following matrix equation where I q is the q × q identity matrix. This equation has a unique solution in C 2 ( U eq × [0, t 0 ], M R (q, q)) that satisfies the following for any h ∈ U eq , s ≥ 0 and 0 ≤ t ≤ t 0 .
(C) If v 0 ∈ R q * then Φ(h, t, t 0 )v 0 ∈ R q * .
Proof. The function ψ is in C 2 (R q × R + , R q ) and the matrix-valued function (h, t) → A T ( ψ(h, t)) is in C 2 (R q × R + , M(q, q)). Standard existence and uniqueness results for ordinary differential equations guarantee that there is a unique solution of (A.3) in the class C 2 ( U eq × [0, t 0 ], M R (q, q)).
Part (A) of the lemma is just the Chapman-Kolmogorov property (see Proposition 2.12 in Chicone [6]). Note that due to the semigroup property for ψ (similar to (2.19)) both Φ( ψ(h, s), t, t 0 ) and Φ(h, t + s, t 0 + s) satisfy the same equation for t ∈ [0, t 0 ]. Hence by uniqueness of solutions, part (B) is immediate.
We now prove part (C). Note that only the diagonal elements of the matrix A T ( ψ(h, t)) can be negative. For t ∈ [0, t 0 ] let ψ(h, t)).
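The positivity statement in part (C) reflects a general fact: if only the diagonal entries of a matrix M can be negative (a Metzler matrix), then e^{tM} has nonnegative entries for t ≥ 0 and hence maps the positive orthant into itself. A quick numerical check with an arbitrary Metzler matrix, not the paper's A T ( ψ(h, t)):

```python
import numpy as np
from scipy.linalg import expm

# Metzler matrix: off-diagonal entries nonnegative, diagonal unrestricted.
# Writing e^{M} = e^{-s} e^{(M + s I)} with s large enough that M + s I has
# only nonnegative entries shows that e^{tM} is entrywise nonnegative.
M = np.array([[-2.0, 1.0, 0.5],
              [ 0.3, -1.0, 0.2],
              [ 0.0, 0.4, -3.0]])
Phi = expm(M)  # time-1 flow map of dv/dt = M v
print((Phi >= 0).all())  # True: all entries nonnegative

v = Phi @ np.array([1.0, 2.0, 0.5])
print((v > 0).all())     # True: the positive orthant is preserved
```

The same reasoning applied along the time-dependent flow gives the invariance of R q * claimed in part (C).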