A central limit theorem for the spatial Lambda Fleming-Viot process with selection

We study the evolution of gene frequencies in a population living in $\mathbb{R}^d$, modelled by the spatial Lambda Fleming-Viot process with natural selection (Barton, Etheridge and Veber, 2010 and Etheridge, Veber and Yu, 2014). We suppose that the population is divided into two genetic types, $a$ and $A$, and consider the proportion of the population which is of type $a$ at each spatial location. If we let both the selection intensity and the fraction of individuals replaced during reproduction events tend to zero, the process can be rescaled so as to converge to the solution to a reaction-diffusion equation (typically the Fisher-KPP equation, as in Etheridge, Veber and Yu, 2014). We show that the rescaled fluctuations converge in distribution to the solution to a linear stochastic partial differential equation. Depending on whether offspring dispersal is only local or if large scale extinction-recolonization events are allowed to take place, the limiting equation is either the stochastic heat equation with a linear drift term driven by space-time white noise or the corresponding fractional heat equation driven by a coloured noise which is white in time. If individuals are diploid (i.e. either $AA$, $Aa$ or $aa$) and if natural selection favours heterozygous ($Aa$) individuals, a stable intermediate gene frequency is maintained in the population. We give estimates for the asymptotic effect of random fluctuations around the equilibrium frequency on the local average fitness in the population. In particular, we find that the size of this effect - known as the drift load - depends crucially on the dimension $d$ of the space in which the population evolves, and is reduced relative to the case without spatial structure.


Introduction
Consider a population distributed across a geographical space (typically of dimension one or two). Suppose that each individual carries one of several possible versions (or alleles) of a gene. How do the different allele frequencies evolve with time and how are they shaped by the main evolutionary forces, such as natural selection and migration? To answer this question, early models from population genetics were adapted by G. Malécot [Mal48], S. Wright [Wri43] and M. Kimura [Kim53] to include spatial structure. These spatial models either considered subdivided populations reproducing locally and exchanging migrants at each generation or made inconsistent assumptions about the distribution of individuals across space.
In this work, we focus on a mathematical model for populations evolving in a spatial continuum, the spatial Λ-Fleming-Viot process (SLFV for short), originally proposed in [Eth08]. The main feature of this model is that, instead of each individual carrying exponential clocks determining its reproduction and death times, reproduction times are specified by a Poisson point process of extinction-recolonization events. At each of these events, some proportion (often denoted u) of the individuals present in the region affected by the event is replaced by the offspring of an individual (the parent) chosen within this region. (The proportion u which is replaced is called the impact parameter.) We shall only consider cases where the region affected is a (d-dimensional) ball, and the Poisson point process specifies the time, centre and radius of reproduction events. (Since we consider scaling limits, minor changes to this assumption would not change our results.) Natural selection can be included in the SLFV by introducing an independent Poisson point process of selective events which give an advantage to a particular type. At such an event, multiple potential parents are chosen in the region affected, and one of them is selected to be the parent, in a biased way depending on their types. The selection parameter determines the rate of this Poisson point process. A comprehensive survey of recent developments related to the SLFV can be found in [BEV13a].
Several works have focussed on characterising the behaviour of this model over large space and time scales, in the special case where only two types (or two alleles) a and A are present in the population. In this case the state of the process is given by a map $q_t : \mathbb{R}^d \to [0,1]$ defined Lebesgue almost everywhere, where $q_t(x)$ denotes the proportion of type a at location x and at time t. We shall first consider the simplest form of selection when individuals are haploid, i.e. each individual has one copy of the gene. At selective events, two potential parents are chosen and if their types are different, the parent is the one which has type A. In [EVY14], rescaling limits of this form of the spatial Λ-Fleming-Viot process with selection (SLFVS) have been obtained when both the impact parameter and the selection parameter tend to zero. Earlier results on the large scale behaviour of the SLFV had already been established in [BEV13b] in the neutral case (i.e. without selection), but keeping the impact parameter macroscopic. The behaviour of the SLFVS in the corresponding regime is studied in [EFS15] and [EFPS15].
The limiting process obtained by [EVY14] turns out to be deterministic as soon as $d \ge 2$, and, when the reproduction events have bounded radius, it is given by the celebrated Fisher-KPP equation (1). This result fits the original interpretation of this equation proposed by R. A. Fisher as a model for the spread of advantageous genes in a spatially distributed population [Fis37]. The spatial Λ-Fleming-Viot process with selection (SLFVS) can thus be thought of as a refinement of the Fisher-KPP equation, combining spatial structure and a random sampling effect at each generation, which biologists call genetic drift.
In the present work we prove a slightly stronger form of convergence to this deterministic rescaling limit. We also study the fluctuations of the allele frequency about (an approximation of) $(f_t)_{t \ge 0}$. We find that if the impact parameter is sufficiently small compared to the selection parameter and the fluctuations are rescaled in the right way then in the limit they solve the following stochastic partial differential equation,
where W is space-time white noise, and f is the solution of (1). More detailed statements with the precise conditions on the parameters of the SLFVS are given in Section 2. A very similar result was proved by F. Norman in the non-spatial setting [Nor75a] (see also [Nor74a], [Nor77] and [Nor75b]). Norman considered the Wright-Fisher model for a population of size N under natural selection (see [Eth09] for an introduction to such models). Let $p^N_n$ denote the proportion of individuals not carrying the favoured allele at generation n, and suppose that the selection parameter is given by $s_N = \varepsilon_N s$, with $\varepsilon_N \to 0$ and $\varepsilon_N N \to \infty$ as $N \to \infty$. (At each generation, individuals choose a parent of the favoured type with probability $\frac{(1+s_N)(1-p^N_n)}{1+s_N(1-p^N_n)}$.) Norman showed that, as $N \to \infty$, $p^N_{t/\varepsilon_N}$ converges to $g_t$, which satisfies $\frac{dg_t}{dt} = -s\, g_t (1 - g_t)$. (In the weak selection regime, i.e. when $N s_N = O(1)$, one recovers the classical Wright-Fisher diffusion.) Furthermore, the fluctuations of $p^N_{t/\varepsilon_N}$ around $g_t$ are of order $(N \varepsilon_N)^{-1/2}$. More precisely, for $t = n\varepsilon_N$, $n \in \mathbb{N}$, set $Z^N(t) = (N\varepsilon_N)^{1/2} \big( p^N_{t/\varepsilon_N} - g_t \big)$, and define $Z^N(t)$ for all $t \ge 0$ by linear interpolation. Theorem 2 in [Nor75a] states that, as $N \to \infty$, $(Z^N(t))_{t \ge 0}$ converges to the solution of the following stochastic differential equation,
where $(B_t)_{t \ge 0}$ is a standard Brownian motion; note that $(z_t)_{t \ge 0}$ is a Gaussian diffusion. A similar regime in the case of a neutral model with mutations was already studied by W. Feller in [Fel51, Section 9], who identified the limiting diffusion for the fluctuations around the equilibrium frequency. Norman's result can be extended to other classical models from population genetics, and in particular to continuous-time processes such as the Moran model and the (non-spatial) Λ-Fleming-Viot process (introduced in [BLG03]). The necessary tools can be found mainly in [EK86, Chapter 11] (see also Chapter 6 of the same book) and in [Kur71]. In this paper we adapt these methods to the setting of the spatial Λ-Fleming-Viot process, with the necessary tools for stochastic partial differential equations taken from [Wal86] (see also [MT95] and [DMFL86]).
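As a sanity check on Norman's deterministic limit, the following sketch iterates the one-generation mean implied by the parent-choice probability quoted above and compares it with the logistic curve solving $dg/dt = -s g(1-g)$. Sampling noise is ignored (this is the mean only), and the function names are illustrative, not taken from [Nor75a].

```python
import math

def favoured_parent_prob(p, s_n):
    """Probability that an individual picks a favoured-type parent when a
    fraction p of the current generation is NOT of the favoured type."""
    return (1 + s_n) * (1 - p) / (1 + s_n * (1 - p))

def mean_frequency(p0, s, eps, t):
    """Iterate the one-generation mean p -> 1 - favoured_parent_prob(p, eps*s)
    for t/eps generations, i.e. with selection parameter s_N = eps * s."""
    p = p0
    for _ in range(round(t / eps)):
        p = 1 - favoured_parent_prob(p, eps * s)
    return p

def logistic_solution(p0, s, t):
    """Closed-form solution of dg/dt = -s g (1 - g) with g(0) = p0."""
    odds = p0 / (1 - p0) * math.exp(-s * t)
    return odds / (1 + odds)

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        gap = abs(mean_frequency(0.5, 1.0, eps, 2.0) - logistic_solution(0.5, 1.0, 2.0))
        print(f"eps = {eps:g}: |mean recursion - logistic| = {gap:.2e}")
```

The recursion multiplies the odds $p/(1-p)$ by $1/(1+s_N)$ at each generation, so after $t/\varepsilon_N$ generations the odds are scaled by $(1+\varepsilon_N s)^{-t/\varepsilon_N}$, which tends to $e^{-st}$; this is why the gap shrinks with $\varepsilon_N$.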
We also consider a second regime for the SLFVS to allow large scale extinction-recolonization events; we let the radius of reproduction events follow an α-stable distribution truncated at zero. For this regime, as in [EVY14], we find the Fisher-KPP equation with non-local diffusion as a rescaling limit (i.e. with a fractional Laplacian instead of the usual Laplacian). The Laplacian is also replaced by a fractional Laplacian in (2), the equation satisfied by the limiting fluctuations, and the noise W becomes a coloured noise with spatial correlations of order $|x - y|^{-\alpha}$ (see Subsection 2.2).
These results are valid for a general class of selection mechanisms, with modified versions of (1) and (2) (and our proof will cover the general case). As an application of our results on the fluctuations, we turn to a particular kind of selection mechanism. Suppose a given gene is present in two different forms, denoted $A_1$ and $A_2$, within a population. Suppose also that each individual carries two copies of this gene (each inherited from one of two parents). We say that individuals are diploid, and homozygous individuals are those who carry two copies of the same type ($A_1A_1$ or $A_2A_2$) while heterozygous individuals carry one copy of each type ($A_1A_2$). Overdominance occurs when the relative fitnesses of the three possible genotypes are as follows,
where $s_1, s_2 > 0$. In words, heterozygous individuals produce more offspring than both types of homozygous individuals. In this setting, in an infinite population a stable intermediate allele frequency is expected to be maintained, preventing either type from disappearing. If q is the frequency of type $A_1$ and $p = 1 - q$ that of type $A_2$ and if mating is random, the respective proportions of the three genotypes will be $q^2$, $2qp$, $p^2$, hence the population cannot remain composed exclusively of heterozygous individuals. As a consequence, even when the stable equilibrium is reached, the mean fitness of the population will not be as high as the highest possible individual fitness (i.e. that of heterozygous individuals). This fitness reduction is referred to as the segregation load.
In finite populations, because of finite sample size, the allele frequency is never exactly at its optimum. This was the subject of a work by A. Robertson [Rob70] who considered this specific configuration of the relative fitnesses. He argued that the mean fitness in a panmictic population (i.e. one with no spatial structure) with finite but relatively large size N is reduced by a term of order $(4N)^{-1}$, irrespective of the strength of selection. This is due to a trade-off between genetic drift and natural selection. The stronger selection is, the quicker the allele frequency is pushed back to the equilibrium, but at the same time even a small step away from the optimal frequency is very costly in terms of mean fitness. On the other hand, if natural selection is relatively weak, the allele frequency can wander off more easily, but the mean fitness of the population decreases more slowly. This reduction in the mean fitness due to genetic drift, which is added to the reduction from the segregation load, is called the drift load.
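The two loads can be separated explicitly in a small sketch. It assumes the usual overdominance parametrisation, with fitnesses $1 - s_1$, $1$ and $1 - s_2$ for $A_1A_1$, $A_1A_2$ and $A_2A_2$ (an assumption here, since the fitness table is not reproduced above), and it neglects mutation. Under these assumptions the mean fitness at frequency q of $A_1$ splits as $1 - \frac{s_1 s_2}{s_1+s_2} - (s_1+s_2)(q-\lambda)^2$ with $\lambda = s_2/(s_1+s_2)$.

```python
def mean_fitness(q, s1, s2):
    """Mean fitness under random mating when A1 has frequency q, assuming
    genotype fitnesses 1 - s1 (A1A1), 1 (A1A2) and 1 - s2 (A2A2)."""
    p = 1 - q
    return q * q * (1 - s1) + 2 * q * p + p * p * (1 - s2)

def equilibrium(s1, s2):
    """Frequency of A1 maximising the mean fitness (no mutation)."""
    return s2 / (s1 + s2)

def segregation_load(s1, s2):
    """Fitness lost at equilibrium relative to an all-heterozygote population."""
    return s1 * s2 / (s1 + s2)

def deviation_cost(q, s1, s2):
    """Extra fitness lost when the frequency q deviates from equilibrium;
    averaging this over random fluctuations of q gives a drift load."""
    return (s1 + s2) * (q - equilibrium(s1, s2)) ** 2

if __name__ == "__main__":
    s1, s2 = 0.02, 0.03
    for q in (0.4, 0.6, 0.8):
        lhs = mean_fitness(q, s1, s2)
        rhs = 1 - segregation_load(s1, s2) - deviation_cost(q, s1, s2)
        print(q, lhs, rhs)
```

The quadratic term makes the trade-off described above visible: doubling $s_1 + s_2$ doubles the cost of a given deviation $q - \lambda$, while stronger selection also pulls q back towards $\lambda$ more quickly.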
Robertson's result can be made rigorous using tools found in [Nor74a] and [Nor74b]. We adapt these to our setting and study the same effect in spatially structured populations. We find that the spatial structure significantly reduces the drift load, in a way that depends crucially on dimension. It turns out that migration prevents the allele frequencies from straying too far away from the equilibrium frequency, because incoming migrants are on average close to this equilibrium.
The paper is laid out as follows. We define the spatial Λ-Fleming-Viot process for a haploid model with general frequency dependent selection and for a diploid model of overdominance in Section 1. In Section 2 we state the main convergence results for the SLFVS in the bounded radius and stable radius regimes and we present our estimate of the drift load in spatially structured populations. In Section 3, we present the main ingredient of the proof: a martingale problem satisfied by the SLFVS. At the end of Subsections 3.2 and 3.3, we state more general results on solutions to these martingale problems which imply our convergence results for the SLFVS. Most of the remainder of the paper is dedicated to the proofs of these results. The central limit theorem in the bounded radius case is proved in Section 4, while the stable regime is dealt with in Section 5 (the two proofs share the same structure, but differ in the details of the estimates). Finally, the asymptotics of the drift load are derived in Section 6.
Definition of the model

The state space of the spatial Λ-Fleming-Viot process with selection
We now turn to a precise definition of the underlying model, the spatial Λ-Fleming-Viot process with selection on $\mathbb{R}^d$, starting with the state space of the process. At each time $t \ge 0$, $\{q_t(x) : x \in \mathbb{R}^d\}$ is a random function such that $q_t(x)$ := proportion of type a alleles at spatial position x at time t, which is in fact defined up to a Lebesgue null set of $\mathbb{R}^d$. More precisely, let Ξ be the quotient of the space of Lebesgue-measurable maps $f : \mathbb{R}^d \to [0,1]$ by the relation of equality Lebesgue-almost everywhere. We endow Ξ with the topology of vague convergence: letting $\langle f, \phi \rangle = \int_{\mathbb{R}^d} f(x)\phi(x)\,dx$, a sequence $(f_n)_n$ converges vaguely to $f \in \Xi$ if and only if $\langle f_n, \phi \rangle \to \langle f, \phi \rangle$ as $n \to \infty$ for any continuous and compactly supported function $\phi : \mathbb{R}^d \to \mathbb{R}$. A convenient metric for this topology is given by choosing a separating family $(\phi_n)_{n \ge 1}$ of smooth, compactly supported functions which are uniformly bounded in $L^1(\mathbb{R}^d)$. Then for $f, g \in \Xi$, $d(f, g) = \sum_{n \ge 1} 2^{-n} |\langle f, \phi_n \rangle - \langle g, \phi_n \rangle|$ defines a metric for the topology of vague convergence on Ξ. The SLFVS up to time T is then going to be a $D([0,T], \Xi)$-valued random variable: a Ξ-valued process with càdlàg paths.
Definition 1.1. For $(f_t)_{t \in [0,T]}$ and $(g_t)_{t \in [0,T]}$ in $D([0,T], \Xi)$, $d\big((f_t)_t, (g_t)_t\big) = \sup_{t \in [0,T]} d(f_t, g_t)$ is a metric for the topology of uniform convergence on $D([0,T], \Xi)$.
For more details, see Section 2.2 of [VW15].

The spatial Λ-Fleming-Viot process with selection
Let us now define the dynamics of the process. Let $u \in (0,1]$ and $s \in [0,1]$, and let $\mu(dr)$ be a finite measure on $(0,\infty)$ satisfying condition (4). For $m \in \mathbb{N}$ and $w \in [0,1]$, let $B^m_w$ be a vector of m independent random variables taking the value a with probability w and A otherwise. Then let $F : [0,1] \to \mathbb{R}$ be a polynomial such that, for some $m \in \mathbb{N}$ and $p : \{a, A\}^m \to [0,1]$, condition (5) holds for each $w \in [0,1]$.
Definition 1.2 (SLFVS, haploid case with general frequency dependent selection). Let $\Pi$ and $\Pi^S$ be two independent Poisson point processes on $\mathbb{R}_+ \times \mathbb{R}^d \times (0,\infty)$ with intensity measures $(1-s)\,dt \otimes dx \otimes \mu(dr)$ and $s\,dt \otimes dx \otimes \mu(dr)$ respectively. The spatial Λ-Fleming-Viot process with selection for a haploid population with impact parameter u, radius of reproduction events given by $\mu(dr)$, selection parameter s and selection function F is defined as follows. If $(t, x, r) \in \Pi$, a neutral event occurs at time t within the ball $B(x, r)$:
1. Choose a location y uniformly at random in $B(x, r)$ and sample a parental type $k \in \{a, A\}$ according to $q_{t^-}(y)$ (i.e. $k = a$ with probability $q_{t^-}(y)$).
2. Update q as follows: for all $z \in \mathbb{R}^d$, $q_t(z) = q_{t^-}(z) + u\,\mathbf{1}_{\{|x - z| < r\}} \big( \mathbf{1}_{\{k = a\}} - q_{t^-}(z) \big)$.
Similarly, if $(t, x, r) \in \Pi^S$, a selective event occurs at time t inside $B(x, r)$:
1. Choose m locations $y_1, \ldots, y_m$ independently uniformly at random in $B(x, r)$, sample a type $k_i$ at each location $y_i$ according to $q_{t^-}(y_i)$ and then let $k = a$ with probability $p(k_1, \ldots, k_m)$ and $k = A$ otherwise.
2. Update q as in the neutral case, using this type k.
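The event dynamics can be sketched on a one-dimensional grid. This is an illustration only: it assumes the standard SLFV update, in which a fraction u of the population inside $B(x, r)$ is replaced by offspring of type k, it uses the genic selection rule (k = a only if both sampled types are a), and it reads q at the nearest grid point. The grid, the function names and the parameter values are hypothetical.

```python
import random

def sample_type_is_a(q, grid, x, r, rng):
    """Pick y uniformly in B(x, r) = (x - r, x + r) and return True (type a)
    with probability q at the grid point nearest to y (clamped to the grid)."""
    y = rng.uniform(x - r, x + r)
    h = grid[1] - grid[0]
    i = min(max(round((y - grid[0]) / h), 0), len(grid) - 1)
    return rng.random() < q[i]

def apply_event(q, grid, x, r, u, parent_is_a):
    """Replace a fraction u of the population inside B(x, r) by offspring of
    the parent: q <- q + u * (1_{k = a} - q) inside the ball, unchanged outside."""
    target = 1.0 if parent_is_a else 0.0
    return [qz + u * (target - qz) if abs(z - x) < r else qz
            for qz, z in zip(q, grid)]

def neutral_event(q, grid, x, r, u, rng):
    """Neutral event: a single parent sampled in the ball."""
    return apply_event(q, grid, x, r, u, sample_type_is_a(q, grid, x, r, rng))

def genic_selective_event(q, grid, x, r, u, rng):
    """Selective event with m = 2: the offspring are of type a only if both
    sampled potential parents are of type a."""
    k1 = sample_type_is_a(q, grid, x, r, rng)
    k2 = sample_type_is_a(q, grid, x, r, rng)
    return apply_event(q, grid, x, r, u, k1 and k2)

if __name__ == "__main__":
    rng = random.Random(0)
    grid = [0.1 * i for i in range(11)]
    q = [0.5] * len(grid)
    for _ in range(200):
        x = rng.uniform(0.0, 1.0)
        # Hypothetical choice: roughly one event in ten is selective.
        event = genic_selective_event if rng.random() < 0.1 else neutral_event
        q = event(q, grid, x, 0.25, 0.2, rng)
    print([round(v, 3) for v in q])
```

Each update is a convex combination of the old frequency and 0 or 1, so q stays in [0, 1]; a faithful simulation would also draw event times, centres and radii from the Poisson point processes rather than from a fixed loop.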
In each case, (4) is clearly satisfied. We now give two variants of this definition corresponding to the two selection mechanisms discussed in the introduction. We begin with a model for a selective advantage for A alleles in haploid reproduction.
Definition 1.3 (SLFVS, haploid model, genic selection). The spatial Λ-Fleming-Viot process with genic selection with impact parameter u, radius of reproduction events given by $\mu(dr)$ and selection parameter s is defined as in Definition 1.2 with $F(w) = w(1 - w)$. In this case, $m = 2$ and the function p is simply $p(k_1, k_2) = \mathbf{1}_{\{k_1 = k_2 = a\}}$.
In other words, during selective reproduction events, two types are sampled in B(x, r) and k = a if and only if both types are a.
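The link between p and F in the genic case can be checked by enumeration. The sketch assumes that condition (5) reads $w - F(w) = \mathbb{E}[p(B^m_w)]$; this form is not displayed in the text above, but it is the reading consistent with Definition 1.3, since two independent samples are both of type a with probability $w^2 = w - w(1-w)$. The helper names are illustrative.

```python
from itertools import product

def expected_p(p, m, w):
    """E[p(B_w^m)]: average of p over m independent types, each equal to 'a'
    with probability w and to 'A' otherwise."""
    total = 0.0
    for types in product("aA", repeat=m):
        weight = 1.0
        for k in types:
            weight *= w if k == "a" else 1 - w
        total += weight * p(*types)
    return total

def genic_p(k1, k2):
    """Definition 1.3: the offspring type is a if and only if both samples are a."""
    return 1.0 if k1 == "a" and k2 == "a" else 0.0

def implied_F(p, m, w):
    """Selection function implied by p under the assumed form of (5)."""
    return w - expected_p(p, m, w)

if __name__ == "__main__":
    for w in (0.0, 0.25, 0.5, 0.9, 1.0):
        print(w, implied_F(genic_p, 2, w), w * (1 - w))
```

The same enumeration applies to any rule $p : \{a, A\}^m \to [0,1]$, which gives a quick way to see which polynomial F a proposed selection mechanism corresponds to under this assumption.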
We now define a variant of the SLFVS to model overdominance. Individuals are diploid and we study a gene which is present in two different forms within the population, denoted $A_1$ and $A_2$. For $t \ge 0$ and $x \in \mathbb{R}^d$, let $q_t(x)$ := the proportion of the allele type $A_1$ at location x at time t.
(If $p_1$ is the proportion of $A_1A_1$ individuals and $p_H$ is the proportion of $A_1A_2$ heterozygous individuals, then $q = p_1 + \frac{1}{2} p_H$.) We assume that the relative fitnesses of the different genotypes are as follows:
In other words, for an event $(t, x, r)$ in the SLFVS with $w = |B(x,r)|^{-1} \int_{B(x,r)} q_{t^-}(z)\,dz$, we want to choose parental types $(k_1, k_2) \in \{A_1, A_2\}^2$ at random with
Further, we suppose that, with probability $\nu_1$, the type $A_1$ alleles produced mutate to type $A_2$, and that, with probability $\nu_2$, the type $A_2$ alleles mutate to type $A_1$ (this is a technical assumption to ensure that $q_t(x) \notin \{0, 1\}$; we shall assume that $\nu_1$ and $\nu_2$ are small). We are going to be interested in small values of $\nu_1$, $\nu_2$, $s_1$ and $s_2$. We thus define the following model, which is an approximation of the one described by (7) to the first order in $s_i$ and $\nu_i$.
Definition 1.4 (SLFVS, overdominance). Suppose that $\nu_1 + \nu_2 + s_1 + s_2 < 1$. Let $\Pi$, $\Pi^{S_i}$ and $\Pi^{\nu_i}$, $i = 1, 2$, be five independent Poisson point processes on $\mathbb{R}_+ \times \mathbb{R}^d \times (0,\infty)$. The spatial Λ-Fleming-Viot process with overdominance with impact parameter u, radius of reproduction events given by µ, selection parameters $s_1$, $s_2$ and mutation parameters $\nu_1$, $\nu_2$ is defined as follows. If $(t, x, r) \in \Pi$, a neutral event occurs at time t in $B(x, r)$:
1. Pick two locations $y_1$ and $y_2$ uniformly at random within $B(x, r)$ and sample one parental type $k_i \in \{A_1, A_2\}$ at each location according to $q_{t^-}(y_i)$, independently of each other.
If $(t, x, r) \in \Pi^{\nu_i}$, a mutation event occurs at time t in $B(x, r)$: (In other words we suppose that the $A_i$ genes of the offspring mutate to type $A_{3-i}$.)
2. Update q as in (8).
Remark. Similarly to the haploid case, existence and uniqueness for this process can be proved as in [EVY14] using a dual process.
We shall see in Section 3 that this process satisfies essentially the same martingale problem as the general haploid process in Definition 1.2 with

Statement of the results
In this section, we present our main results. We consider the SLFVS as in Definitions 1.2 and 1.4, and we let the impact parameter and the selection and mutation parameters tend to zero. On a suitable space and time scale (depending on the regime of the radii of reproduction events) the process $(q^N_t)_{t \ge 0}$ converges to a deterministic process. We also characterise the limiting fluctuations of $(q^N_t)_{t \ge 0}$ about an approximation to this deterministic process as the solution to a stochastic partial differential equation.

Fixed radius of reproduction events
We begin by considering the regime in which the radii of the regions affected by reproduction events are bounded. We shall only give the proof in the case of fixed radius events; the proof for bounded radius events is the same but notationally awkward. Fix $u, s \in (0,1]$ and $R > 0$, and choose $w_0 : \mathbb{R}^d \to [0,1]$ with uniformly bounded spatial derivatives of up to the fourth order. Take two sequences $(\varepsilon_N)_{N \ge 1}$, $(\delta_N)_{N \ge 1}$ of positive real numbers in $(0,1]$ decreasing to zero, and set $u_N = \varepsilon_N u$, $s_N = \delta_N^2 s$ and $q^N_0(x) = w_0(\delta_N x)$. Let $\mu(dr) = \delta_R(dr)$, and let $F : \mathbb{R} \to \mathbb{R}$ be a smooth, bounded function with bounded first and second derivatives such that $F|_{[0,1]}$ satisfies (5) for some $m \in \mathbb{N}$ and $p : \{a, A\}^m \to [0,1]$. Then for $N \ge 1$, let $(q^N_t)_{t \ge 0}$ be the spatial Λ-Fleming-Viot process with selection following the dynamics of Definition 1.2 with impact parameter $u_N$, radius of reproduction events R, selection parameter $s_N$ and selection function F started from the initial condition $q^N_0$. Define the rescaled process $(\mathbf{q}^N_t)_{t \ge 0}$ by setting $\mathbf{q}^N_t(x) = q^N_{t/(\varepsilon_N \delta_N^2)}(x/\delta_N)$. We justify this scaling as follows. Consider an individual sitting at location x at time t. It finds itself within a region affected by a reproduction event at rate $|B(0, R)|$. The probability that it dies and is replaced by a new individual is $u_N = \varepsilon_N u$, so, if we rescale time by $1/\varepsilon_N$, this will happen at rate $O(1)$. Also, we are going to see later (see Section 3.2) that the reproduction events act like a discrete heat flow on the allele frequencies. We rescale time further by $1/\delta_N^2$ and space by $1/\delta_N$, which corresponds to the diffusive scaling of this discrete heat flow. Since selective events also take place at rate $O(\delta_N^2)$, this is the right scaling to consider in order to observe the effects of both migration and selection in the limit. (Due to this diffusive scaling we shall refer to this regime as the Brownian case.) We need to introduce some notation. Let $L^{1,\infty}(\mathbb{R}^d)$ denote the space of bounded and integrable real-valued functions on $\mathbb{R}^d$.
For $r > 0$, we set $V_r = |B(0, r)|$ and, for $x, y \in \mathbb{R}^d$, $V_r(x, y) = |B(x, r) \cap B(y, r)|$. (10) For a function $\phi$ on $\mathbb{R}^d$, we write $\overline{\phi}(x, r)$ for the average of $\phi$ over the ball $B(x, r)$. When there is no ambiguity, we shall not specify the radius r and simply write $\overline{\phi}(x)$. This notation will be used throughout this paper and formulae will routinely involve averages of averages, etc. For example we also write
Let us define a linear operator $\mathcal{L}^{(r)}$ by setting
Finally let $\mathcal{S}(\mathbb{R}^d)$ denote the Schwartz space of rapidly decreasing smooth functions on $\mathbb{R}^d$, whose derivatives of all orders are also rapidly decreasing. Accordingly, let $\mathcal{S}'(\mathbb{R}^d)$ denote the space of tempered distributions.
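In dimension one these quantities are easy to compute, which gives a concrete feel for the discrete heat flow mentioned above. The sketch below is restricted to d = 1 and uses a midpoint rule for the ball average; the function names are illustrative. A second-order Taylor expansion gives $\overline{\phi}(x) - \phi(x) \approx \frac{r^2}{2(d+2)} \Delta\phi(x)$, that is $\frac{r^2}{6}\phi''(x)$ when d = 1, with equality for quadratic $\phi$.

```python
def ball_volume_1d(r):
    """V_r = |B(0, r)| in dimension one: the length of (-r, r)."""
    return 2.0 * r

def overlap_volume_1d(x, y, r):
    """V_r(x, y) = |B(x, r) ∩ B(y, r)| in dimension one."""
    return max(0.0, 2.0 * r - abs(x - y))

def ball_average_1d(phi, x, r, n=2000):
    """Average of phi over B(x, r) = (x - r, x + r), by the midpoint rule
    with n sub-intervals."""
    h = 2.0 * r / n
    total = sum(phi(x - r + (i + 0.5) * h) for i in range(n))
    return total * h / ball_volume_1d(r)

if __name__ == "__main__":
    phi = lambda y: y * y          # phi'' = 2, so the expansion is exact
    x, r = 0.3, 0.1
    print(ball_average_1d(phi, x, r) - phi(x), r * r / 6 * 2)
    print(overlap_volume_1d(0.0, 0.5, 1.0))
```

Averaging therefore acts on smooth functions like a small multiple of the Laplacian, which is why rescaling space by $1/\delta_N$ and time by $1/\delta_N^2$ is the natural diffusive scaling here.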
Let $f^N : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ be a deterministic function defined as the solution to (13). One can check that this defines a unique function by a Picard iteration. As stated in the introduction, the spatial Λ-Fleming-Viot process with genic selection with fixed radius of reproduction events converges, under what can be considered a diffusive scaling, to the solution of the Fisher-KPP equation (as in [EVY14] for $d \ge 2$) while the limiting fluctuations are given by the solution to a stochastic partial differential equation which generalises the result obtained in [Nor75a]. We can now give a precise statement of this result for general frequency dependent selection. The same result holds for radius distributions given by a finite measure µ on a bounded interval.
Theorem 2.1 (Central Limit Theorem for the SLFVS with fixed radius of reproduction events). Suppose that $\varepsilon_N = o(\delta_N^{d+2})$; then the rescaled process $(\mathbf{q}^N_t)_{t \ge 0}$ converges in $L^1$ and in probability (for the metric d of Definition 1.1) to the deterministic solution of the following PDE,
In addition,
defines a sequence of distribution-valued processes converging in distribution in $D([0,T], \mathcal{S}'(\mathbb{R}^d))$ to the solution of the following stochastic partial differential equation,
where W is a space-time white noise.
Remark. The impact parameter $u_N$ is inversely proportional to the neighbourhood size, i.e. the probability that two individuals have a common parent in the previous generation (see Section 3.6 of [BEV13a] for details). Hence, letting $u_N$ tend to zero corresponds to letting the neighbourhood size grow to infinity.
We shall show in Section 3 that Theorem 2.1 is a consequence of Theorem 3.5. The latter is a result on sequences of solutions to a martingale problem and is proved in Section 4.
Remark. It would have been more natural to consider the fluctuations directly around the deterministic limit $(f_t)_{t \ge 0}$, but in fact the difference between $f^N$ and f is too large (of order $\delta_N^2$, see Proposition 4.6). We have that

Stable radii of reproduction events
In the previous subsection, we assumed that the radius of dispersion of the offspring produced at reproduction events was small. We now wish to allow large scale extinction-recolonization events to take place to illustrate the fact that "catastrophic" extinction events can occur, followed by a quick replacement of the dead individuals by the offspring of a small subset (here only one individual) of the survivors. To do so, we suppose that the intensity measure for the radius of reproduction events $\mu(dr)$ has a power law behaviour, following the work in [EVY14]. The corresponding limiting behaviour is described by reaction-diffusion equations with non-local diffusion, studied for example in [Chm13, AK13]. Suppose that the measure $\mu(dr)$ for the radius of reproduction events is given by (14) for some $\alpha \in (0, 2 \wedge d)$. Fix $u, s \in (0,1]$ and choose $w_0 : \mathbb{R}^d \to [0,1]$ with uniformly bounded spatial derivatives of up to the second order. Again, take $(\varepsilon_N)_{N \ge 1}$ and $(\delta_N)_{N \ge 1}$ two sequences in $(0,1]$ decreasing to zero, and set $u_N = \varepsilon_N u$ and $s_N = \delta_N^\alpha s$. Let $F : \mathbb{R} \to \mathbb{R}$ be a smooth, bounded function with bounded first and second derivatives such that $F|_{[0,1]}$ satisfies (5) for some $m \in \mathbb{N}$ and $p : \{a, A\}^m \to [0,1]$. Then for $N \ge 1$, let $(q^N_t)_{t \ge 0}$ be the spatial Λ-Fleming-Viot process with selection following the dynamics of Definition 1.2 with impact parameter $u_N$, radius of reproduction events given by $\mu(dr)$ in (14), selection parameter $s_N$ and selection function F started from the initial condition $q^N_0$. The main difference with the setting of Subsection 2.1 is that the flow resulting from the reproduction events is the α-stable version of the heat flow (see Section 3.3). Thus we apply a stable scaling of time by $1/\delta_N^\alpha$ and space by $1/\delta_N$ (after rescaling time by $1/\varepsilon_N$ as previously). Since we have chosen $s_N = \delta_N^\alpha s$, this is the right scaling to consider in order to observe both selection and migration in the limit.
For all $x \in \mathbb{R}^d$ and $t \ge 0$, set $\mathbf{q}^N_t(x) = q^N_{t/(\varepsilon_N \delta_N^\alpha)}(x/\delta_N)$. We need some more notation; recall the notation for double averages in (11). The following will take up the role played by $F(w)$ in the fixed radius case. For $H : [0,1] \to \mathbb{R}$, $\delta > 0$, and $f \in \Xi$, set
Recalling the notation in (10), set, for $x, y \in \mathbb{R}^d$,
Remark. Up to a multiplicative constant, depending on d and α, $\mathcal{D}^\alpha$ is the fractional Laplacian (this can be seen via the Fourier transform, see [SKM93]).
We can now formulate our result for the stable radii regime. The main difference from Theorem 2.1 is that the Laplacian has to be replaced by the operator $\mathcal{D}^\alpha$ and that the noise driving the fluctuations is replaced by a coloured noise which is white in time and has spatial correlations which decay like $|z_1 - z_2|^{-\alpha}$. We also set the following notation: for $f \in \Xi$,
Note that if f denotes the frequency of type a in $q^N_t$ immediately before a (neutral) reproduction event which hits both $z_1$ and $z_2$ with $|z_1 - z_2| \ge 2\delta_N$, then $[f]_\alpha(z_1, z_2)$ is the probability that the offspring produced in this event are of type a.
Now define $f^N_t$ as the solution to
Theorem 2.2 (Central Limit Theorem for the SLFVS with stable radii of reproduction events). Suppose that $\varepsilon_N = o(\delta_N^{2\alpha})$; then the rescaled process $(\mathbf{q}^N_t)_{t \ge 0}$ converges in $L^1$ and in probability (for the metric d of Definition 1.1) to the deterministic solution of the following PDE,
In addition,
to the solution of the following stochastic partial differential equation,
where $W^\alpha$ is a coloured noise with covariation measure given by
Remark. The fact that the correlations in the noise decay as $|z_1 - z_2|^{-\alpha}$ can be expected from the results in [BEK06]. The authors prove that, if N is a Poisson point process on $\mathbb{R}^d \times \mathbb{R}_+$ whose intensity measure is of the form $dx\, f(r)\, dr$ with $f(r) \sim \frac{C}{r^{1+\alpha+d}}$, one can define a generalized random field X on the space of signed measures on $\mathbb{R}^d$ with finite total variation by
Under a suitable scaling of the radius and of the intensity measure, it is shown that the fluctuations of X converge (in the sense of finite dimensional distributions) to a centred Gaussian random linear functional $W^\alpha$ with
(The notation has been changed so as to fit that of our setting; in [BEK06], $\beta = \alpha + d$.) We shall show in Section 3 that Theorem 2.2 is a consequence of Theorem 3.8. The latter is a result on sequences of solutions to a martingale problem and is proved in Section 5.

Drift load for a spatially structured population
We shall illustrate the application of our results by studying the drift load in the SLFVS with overdominance as in Definition 1.4, in the case of bounded radii.
As in Section 2.1, fix $u, s_1, s_2, \nu_1, \nu_2$ in $(0,1]$ and $R > 0$ such that $s_1 + s_2 + \nu_1 + \nu_2 < 1$, take two sequences $(\varepsilon_N)_{N \ge 1}$, $(\delta_N)_{N \ge 1}$ of positive real numbers in $(0,1]$ decreasing to zero, and set
for $i = 1, 2$. Then for $N \ge 1$, let $(q^N_t)_{t \ge 0}$ be the SLFVS following the dynamics of Definition 1.4 with impact parameter $u_N$, radius of reproduction events R, selection parameters $s_{i,N}$ and mutation parameters $\nu_{i,N}$, started from some initial condition $q^N_0$. One thing to note is that for our results to hold, we need to make sure that the allele frequencies do not get "stuck", even locally, at the boundaries (i.e. upon reaching 0 or 1), which could significantly slow down the convergence to the equilibrium frequency. For this reason we choose to assume that during some mutation reproduction events the type of the offspring can differ from that of its parent. This will not affect the results in any other way provided that the mutation parameters are negligible compared to the selection parameters. In the remainder of this section, we assume that
We shall see in Section 3 that this function plays the same role as F in the haploid case. Note that F satisfies the following conditions:
furthermore there is only one such λ and it satisfies
For the function F given in (23), λ is given by
Let us define $K^N(t, x)$, the local mean fitness at a point $x \in \mathbb{R}^d$, as the expected fitness of an individual formed by fusing two gametes chosen uniformly at random from $B(x, R)$ at time $t \ge 0$. In other words, its two copies of the gene are sampled independently by selecting two parental locations $y_1$ and $y_2$ uniformly at random in $B(x, R)$ and then types according to $q_t(y_1)$ and $q_t(y_2)$. Then, for $\nu_i \ll s_i$ (see [Rob70]),

The first term is the segregation load mentioned in the introduction, and it is of order $\delta_N^2$. The remaining term is then the local drift load, which we aim to estimate at large times for large N. Let us set
The following theorem is proved in Section 6 using some of the intermediate results used to prove Theorem 2.1.
Theorem 2.3. Suppose that $q^N_0(x) = \lambda$ for all x and assume that $\varepsilon_N = o(\delta_N^4)$. There exists a constant $C > 0$, depending only on the dimension d, such that, for all
Assumption (24)-(25) is crucial in [Nor74a], which serves as a basis for this result. In fact this condition ensures that λ is the only equilibrium point for the allele frequency, and that it is stable.
Remark. We chose to start the process from the equilibrium frequency λ (i.e. very near stationarity) but we need not do so. The same result can be obtained starting from an arbitrary initial condition, provided we let t grow sufficiently fast that the process reaches stationarity quickly enough. The corresponding centering term $f^N$ is then defined as in (13), and (24)-(25) ensures that it converges to λ exponentially quickly. Starting from λ simplifies the proof as in this case, for all $t \ge 0$, $f^N_t = \lambda$.
In the non-spatial setting of the Λ-Fleming-Viot process, a simplified version of the proof of Theorem 2.3 shows that the drift load is asymptotically proportional to $u_N$. We can see $u_N$ as being inversely proportional to the neighbourhood size, in other words the probability that two individuals had a common parent in the previous generation (see [BEV13a] for details). This agrees with Robertson's estimate [Rob70] of $(4N)^{-1}$, where N is the total population size in a panmictic population. Note that this estimate is independent of the strength of selection. This can be seen as the result of a trade-off between selection and genetic drift: if selection is weak, the allele frequency can be far from the equilibrium whereas if selection is stronger, the allele frequency stays nearer to the equilibrium and in both cases the mean fitness of the population is the same.
For spatially structured populations, however, Theorem 2.3 shows that the local drift load is significantly smaller than in the non-spatial setting and does depend on the strength of natural selection. For example, if a population lives in a geographical space of dimension 2, the corresponding drift load will be of order $u_N s_N |\log s_N|$. Moreover, we see a strong effect of dimension on this estimate. Populations living in a space of higher dimension have a reduced drift load compared to populations evolving in lower dimensions. This result illustrates the fact that, in a higher dimension, migration is more efficient at preventing the allele frequencies from being locally far from the equilibrium frequency. It turns out from the proof that this is linked to the recurrence properties of Brownian motion.
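One rough way to see how recurrence of Brownian motion could produce a logarithm in dimension 2 is to look at the integral $I_d(s) = \int_1^\infty e^{-st} t^{-d/2} \, dt$, in which $t^{-d/2}$ mimics the decay of the heat kernel at the origin and $e^{-st}$ a relaxation at rate $s$. This is a heuristic sketch under stated assumptions, not the argument of Section 6: the lower cut-off at $t = 1$, the choice of integrand and the function name `return_integral` are all hypothetical.

```python
import math


def return_integral(d, s, n=20000):
    """Approximate I_d(s) = int_1^inf exp(-s t) t^(-d/2) dt.

    Uses the substitution t = exp(x) and the trapezoid rule on
    [0, log(50/s)]; the neglected tail is of order exp(-50).
    """
    upper = math.log(50.0 / s)
    h = upper / n

    def g(x):
        # Integrand after substitution: exp(-s e^x) * e^{x (1 - d/2)}.
        return math.exp(-s * math.exp(x) + x * (1.0 - d / 2.0))

    total = 0.5 * (g(0.0) + g(upper))
    for k in range(1, n):
        total += g(k * h)
    return total * h


for d in (1, 2, 3):
    for s in (1e-2, 1e-4):
        print(d, s, return_integral(d, s))
```

As $s \to 0$ the integral grows like $s^{-1/2}$ for $d = 1$, like $|\log s|$ for $d = 2$, and stays bounded for $d = 3$. Multiplying by $u_N s_N$ reproduces the order $u_N s_N |\log s_N|$ quoted above for dimension 2; the behaviour in other dimensions is only what this heuristic suggests and is not a statement taken from Theorem 2.3.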
Remark (Drift load in the stable case). If one considers instead the SLFVS with stable radii of reproduction events, under similar conditions to those in Theorem 2.2, one finds that for all d ≥ 1 and α ∈ (0, 2 ∧ d), ∆ N (t, x) is asymptotically equivalent to a constant times u N s N |log s N |.

Martingale problems for the SLFVS
This section provides the basic ingredients for the proofs of Theorems 2.1 and 2.2. In Subsection 3.1, we prove that the SLFVS satisfies a martingale problem. In Subsections 3.2 and 3.3, we study the martingale problem for the rescaled version of this process, in the fixed radius case and in the stable radii case, and state general convergence results for processes satisfying these martingale problems. Theorems 2.1 and 2.2 are a direct consequence of these results.

The martingale problem for the SLFVS
Let $(q^N_t)_{t \geq 0}$ be the SLFVS of Definition 1.2, as in Sections 2.1 and 2.2, with impact parameter $u_N$, distribution of reproduction event radii given by $\mu(dr)$, selection parameter $s_N$ and selection function $F$. Let $(\mathcal{F}_t)_{t \geq 0}$ denote the natural filtration of this process.
defines a (mean zero) square integrable $\mathcal{F}_t$-martingale with (predictable) variation process where

Proposition 3.1 can be seen as a way to write $q_t$ as the sum of the effects of the different evolutionary forces at play in this model. The term $\overline{\overline{\phi}} - \phi$ represents migration, while the term involving the function $F$ in (28) accounts for the bias introduced during selective events. As for the martingale term, it corresponds to the stochasticity at each reproduction event, which is called genetic drift.
Proof of Proposition 3.1. We drop the superscript N from q N in this proof. Let P t,x,r (resp. P S t,x,r ) denote the distribution of the parental type k at a reproduction event (t, x, r) ∈ Π (resp. in Π S ). Then, from the definition of (q t ) t≥0 , and Integrating with respect to the variable x over B(z, r) then yields Thus (28) indeed defines a martingale -see for example [EK86, Proposition 4.1.7] (we can change the order of integration to do the averaging on φ instead of q in the first term). To compute its variation process, write and the other term within the curly brackets is O (s N ). Thus, integrating with respect to x and using (30), we recover By Jensen's inequality, dx ≤ φ 2 2 and the result follows from the assumption that Now let q N t t≥0 denote the SLFVS with overdominance as defined as in Definition 1.4 with impact parameter u N , radius of reproduction events R, selection parameters s i,N and mutation parameters ν i,N defined in (22). Recall the definition of F in (23) and let (F t ) t≥0 denote the natural filtration of this process. where Proof. Suppose a reproduction event hits the ball B(x, r) at time t, and let w = q N t − (x, r). Then, Note that this corresponds to the first order approximation of (7), modified to take mutations into account. It is straightforward to check that where F is given by (23). It follows as in the proof of Proposition 3.1 that (34) is a martingale. The result for the variation process also follows as in the proof of Proposition 3.1. (Note that σ (r) is replaced by ρ (r) in order to account for the fact that (6) is replaced by (8).) Remark. If q were continuous then as r → 0, σ (r) ). The factor of 1/2 represents the doubling of effective population size for a diploid population compared to a haploid one.

The rescaled martingale problem - Fixed radius case
As at the start of Subsection 2.1, let $(\varepsilon_N)_{N \geq 1}$, $(\delta_N)_{N \geq 1}$ be sequences in $(0, 1]$ decreasing towards zero, and let $F : \mathbb{R} \to \mathbb{R}$.

defines a (mean zero) square-integrable martingale with (predictable) variation process

Remark. Of course, one cannot expect uniqueness to hold for this martingale problem, due to the unspecified error term in (38). In the limit when $N \to \infty$, however, the error terms will vanish.
Let $(q^N_t)_{t \geq 0}$ be defined as at the start of Section 2.1. Set $w^N_t(x) = q^N_t(x/\delta_N)$.

Proposition 3.4. For each $N$, the process $(w^N_t)_{t \geq 0}$ satisfies the martingale problem (M1).
Proof. From Proposition 3.1, we know that, for φ ∈ L 1,∞ (R d ), where M N t (φ) is a martingale. By a change of variables, Thus, recalling the definition of the operator L (r) in (12) and the initial condition Moreover, by a change of variables in the variation process given in (29), and σ Hence w N satisfies the martingale problem (M1).
Proposition 3.4 is the main ingredient in the proof of Theorem 2.1. In fact, we shall now see that, under suitable conditions on the parameters $(\varepsilon_N)_{N \geq 1}$ and $(\delta_N)_{N \geq 1}$, the function $F$ and the initial condition $w_0$, any sequence of processes $(w^N_t)_{t \geq 0}$ satisfying the martingale problem (M1) in Definition 3.3 will also satisfy a result analogous to Theorem 2.1. If $\tau_N$ is of a smaller order than $\eta_N$, $w^N$ can be expected to be asymptotically deterministic (on a suitable time-scale), and we can study its fluctuations around a deterministic centering term. Define $f^N : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ as in (13). Quite naturally, this corresponds to equating (37) to zero and making its time-scale fit that of the limiting process.
Since the operator $L^{(r)}$ approximates the Laplacian as $r \to 0$ (see Proposition A.1 in the appendix),

The following result is proved in Section 4.
Theorem 3.5. Suppose that $(w^N_t)_{t \geq 0}$ is a $\Xi$-valued process which satisfies the martingale problem (M1) in Definition 3.3 for some smooth, bounded $F : \mathbb{R} \to \mathbb{R}$ with bounded first and second derivatives and $(\delta_N)_N$, $(\varepsilon_N)_N$ converging to zero as $N \to \infty$. Moreover, suppose

Suppose also that $w_0$ has uniformly bounded derivatives of up to the fourth order and that there exists $\alpha_N$ such that the jumps of $(w^N_t)_{t \geq 0}$ are (almost surely) dominated by

for every $T > 0$ with $d$ given by Definition 1.1. In addition,

defines a sequence of distribution-valued processes which converges in distribution in $D([0, T], \mathcal{S}'(\mathbb{R}^d))$ to the solution of the following stochastic partial differential equation,

$W$ being a space-time white noise.
Theorem 2.1 is now a direct consequence.
Proof of Theorem 2.1. Recall that (q N t ) t≥0 is defined in (9) as a rescaling of q N t t≥0 , and by , and the bound on the jumps (42) holds with α N = ε N u by (6). Hence Theorem 3.5 applies and the result follows by noting that The proof of Theorem 3.5 can be found in full detail in Section 4, but, in order to shed some light on the limiting equations that we obtain and to identify the difficulties in proving this result, let us outline the first calculations involved in the proof. As in [Kur71], we use bounds on the martingale (37) to show the convergence of w N t/η N t≥0 . When properly rescaled, this martingale converges to a continuous Gaussian martingale, implying the convergence of the fluctuation process Z N t t≥0 . For ease of notation, we shall set the constants uV R , 2R 2 /(d + 2) and s to 1 in the definition of (M1). Let M N t (φ) denote τ −1/2 N times the martingale defined in (37). Formally, we can then write (M1) as . (This Brownian scaling is not surprising since in the SLFVS case M N is essentially an integral against a compensated Poisson process, and we expect M N to converge to an integral against white noise.) Replacing t by t/η N above, we have Subtracting the equation df N and multiplying by (η N /τ N ) 1/2 on both sides, we obtain Since the function F : R → R is smooth, for k ∈ {1, 2} and x, y ∈ [0, 1], we can define the following: Then R k is continuous and bounded by 1 k! F (k) ∞ . In addition, by Taylor's formula, Substituting the second relation into (45) yields In fact, this equality holds in mild form, (In other words, every step above can be done using the integral form, yielding (48).) 
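The linearisation step in the sketch above rests on Taylor's formula with integral remainders. The displayed definitions of $R_1$ and $R_2$ are not reproduced here, so the sketch below assumes the standard form $R_k(x, y) = \int_0^1 \frac{(1-t)^{k-1}}{(k-1)!} F^{(k)}(y + t(x - y)) \, dt$, which is consistent with the stated bound $\frac{1}{k!}\|F^{(k)}\|_\infty$. The cubic `F` is a hypothetical selection function chosen only so that the derivatives are explicit.

```python
def F(x):
    # Hypothetical smooth selection function, for illustration only.
    return x * (1.0 - x) * (1.0 - 2.0 * x)


def dF(x):
    return 1.0 - 6.0 * x + 6.0 * x * x


def d2F(x):
    return -6.0 + 12.0 * x


def remainder(k, x, y, n=2000):
    """Assumed R_k(x, y): midpoint-rule value of the integral
    int_0^1 (1-t)^(k-1)/(k-1)! * F^(k)(y + t(x-y)) dt, for k in {1, 2}."""
    deriv = dF if k == 1 else d2F
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        weight = 1.0 if k == 1 else (1.0 - t)
        total += weight * deriv(y + t * (x - y))
    return total * h


x, y = 0.9, 0.25
# First-order form: F(x) = F(y) + (x - y) R_1(x, y).
first_order = F(y) + (x - y) * remainder(1, x, y)
# Second-order form: F(x) = F(y) + (x - y) F'(y) + (x - y)^2 R_2(x, y).
second_order = F(y) + (x - y) * dF(y) + (x - y) ** 2 * remainder(2, x, y)
print(F(x), first_order, second_order)
```

For this particular `F`, the suprema of $|F'|$ and $|F''|$ over $[0, 1]$ are 1 and 6, so the bounds on $R_1$ and $R_2$ read 1 and 3 respectively.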
We can see $M^N$ as a martingale measure and, from a change of variables in (38), it can be seen that its covariation measure is given by

Accordingly, we will sometimes write $M^N_t(\phi)$ as a stochastic integral (as defined in [Wal86, Chapter 2]),

Note that we have linearised the drift term in (37) around the deterministic centering term, and that the remaining term (where $R_2$ appears) is the error due to this linearisation. The main difficulty in proving the convergence of $Z^N$ is to control this error. At first sight, it would seem that the factor $(\tau_N/\eta_N)^{1/2}$ in front of it is enough to make it vanish in the limit. However, some care is needed in dealing with the quadratic term in the spatial integral. Since $Z^N$ is going to converge as a distribution-valued process, its square does not make sense in the limit. The control of this term is achieved through Lemma 4.5, where we bound the square of the average of $Z^N_t$ over a ball of radius $r_N$. It is for this purpose that we require that $\tau_N/\eta_N = o(r_N^{2d})$.
Once this is done, we will be in a good position to prove the convergence of $Z^N$. Indeed, as $r_N$ tends to zero,

The proof of convergence of $Z^N$ follows the classical strategy of proving that the sequence is tight before uniquely characterising its possible limit points. Although we are no longer dealing with real-valued processes, the theory presented in [Wal86] provides the main tools needed for the proof of our result. In particular, the argument relies heavily on Mitoma's Theorem (Theorem 6.13 in [Wal86]), which states that a sequence of processes $(X^n_t)_{t \geq 0}$, $n \geq 1$, with sample paths in

The rescaled martingale problem - Stable radii case
defines a (mean zero) square-integrable martingale with (predictable) variation process where

(Note that the remark about uniqueness made after Definition 3.3 also applies to the martingale problem (M2).)

Let $(q^N_t)_{t \geq 0}$ be defined as at the start of Section 2.2. Set $w^N_t(x) = q^N_t(x/\delta_N)$.

Proposition 3.7. For each $N$, the process $(w^N_t)_{t \geq 0}$ satisfies the martingale problem (M2).

Proof. This is proved in a similar way to Proposition 3.4, using a change of variables and the definitions of $D^{(\alpha,\delta)}$, $F^{(\delta)}$ and $\sigma^{(\alpha,r)}$ in (17), (16) and (53) respectively.
Note that we cannot apply Proposition 3.1 directly since, in the stable case, $\int_0^\infty V_r^2 \, \mu(dr) = \infty$, but the term from the second line of (32) in the proof of Proposition 3.1 can be bounded by

We recover (52) since $V_r(z_1, z_2) \leq r^d \, \mathbb{1}_{\{r \geq \frac{1}{2}|z_1 - z_2|\}}$.

As in Subsection 3.2, we can now state a general result for a sequence of processes satisfying (M2) which will imply Theorem 2.2. Let $f^N$ be defined as in (20) and define $f$ as the solution to

The following result is proved in Section 5.
Theorem 3.8. Suppose that $(w^N_t)_{t \geq 0}$ satisfies the martingale problem (M2) in Definition 3.6 for some smooth, bounded function $F : \mathbb{R} \to \mathbb{R}$ with bounded first and second derivatives and $(\delta_N)_N$, $(\varepsilon_N)_N$ converging to zero as $N \to \infty$. Moreover, suppose

Suppose also that $w_0$ has uniformly bounded derivatives of up to the second order and that there exists $\alpha_N$ such that the jumps of $(w^N_t)_{t \geq 0}$ are dominated by

in $(D([0, T], \Xi), d)$. In addition,

defines a sequence of distribution-valued processes which converges in distribution in $D([0, T], \mathcal{S}'(\mathbb{R}^d))$ to the solution of the following stochastic partial differential equation,

where $W^\alpha$ is a coloured noise with covariation measure given by (21).
Theorem 2.2 is now a direct consequence.
Proof of Theorem 2.2. Recall that (q N t ) t≥0 is defined in (15) as a rescaling of q N t t≥0 , and by Proposition 3.7, letting w N t (x) = q N t (x/δ N ), w N t t≥0 satisfies the martingale problem (M2). Also , and the bound on the jumps (42) holds with α N = ε N u by (6). We conclude by applying Theorem 3.8 to w N t/η N = q N t .
The proof of Theorem 3.8 will make use of the same ideas as in the proof of Theorem 3.5 and, to improve readability, the steps of the proof which are most similar to those in the Brownian case will be dealt with more quickly, going into details only when the two arguments differ.
4 The Brownian case - proof of Theorem 3.5

As in the sketch of the proof in Subsection 3.2, for ease of notation, we shall set the constants $uV_R$, $2R^2/(d+2)$ and $s$ to 1 in the definition of (M1). Recall the expression for $\langle Z^N_t, \phi \rangle$ in (48); the next subsection shows how time-dependent test functions can be used to write $\langle Z^N_t, \phi \rangle$ as the sum of a stochastic integral against a martingale measure and a non-linear term. Subsection 4.2 will provide a bound on this quadratic term using a Gronwall estimate. We can then prove the convergence of the process

The following result is used to reduce the convergence of distribution-valued processes to the convergence of a family of real-valued processes; it is a direct corollary of Mitoma's theorem [Wal86, Theorem 6.13].
Theorem 4.1 ([Wal86, Theorem 6.15]). Let (X n ) n≥1 be a sequence of processes with sample paths in Then there exists a process (X t ) t≥0 with sample paths in D [0, T ], S (R d ) such that X n converges in distribution to X.
In order to apply this result to the sequence of distribution-valued processes $(Z^N)_{N \geq 1}$, we need to check that the two conditions (i) and (ii) are satisfied. The first one is proved in Subsection 4.4, thus implying the tightness of the sequence by Mitoma's theorem. Subsection 4.5 deals with the convergence of the martingale measure $M^N$ (again as a distribution-valued process, so this subsection will use Theorem 4.1). Finally, condition (ii) is checked in Subsection 4.6.
In this section, in order to simplify the notation, we often drop the sub- and superscripts $N$ when there is no ambiguity; for instance, $L$ should always be read as $L^{(r)}$, with $r = r_N$.
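Since $L^{(r)}$ appears throughout this section, it may help to see numerically how such an averaging operator approaches a multiple of the Laplacian as $r \to 0$ (the content of Proposition A.1). Definition (12) is not reproduced here, so the sketch below assumes, in dimension $d = 1$, the normalisation $L^{(r)}\phi = \frac{d+2}{2r^2}\big(\overline{\overline{\phi}} - \phi\big)$, where the double bar denotes the average over a ball of radius $r$ of the ball averages of $\phi$. This normalisation is inferred from the constant $2R^2/(d+2)$ mentioned above and should be treated as an assumption; with it, the limit is $\frac{1}{2}\Delta\phi$. All function names are hypothetical.

```python
import math


def ball_average(f, x, r, n=200):
    """Midpoint-rule average of f over the interval [x - r, x + r]."""
    h = 2.0 * r / n
    return sum(f(x - r + (k + 0.5) * h) for k in range(n)) / n


def double_average(f, x, r, n=200):
    """Average over [x - r, x + r] of the ball averages of f."""
    return ball_average(lambda y: ball_average(f, y, r, n), x, r, n)


def L_r(f, x, r, d=1):
    """Assumed normalisation: (d + 2) / (2 r^2) * (double average - f)."""
    return (d + 2) / (2.0 * r * r) * (double_average(f, x, r) - f(x))


x = 0.3
for r in (0.4, 0.2, 0.1, 0.05):
    # For f = cos, half the second derivative is -cos(x) / 2.
    print(r, L_r(math.cos, x, r), -0.5 * math.cos(x))
```

For $\phi = \cos$ the discrepancy shrinks roughly like $r^2$, which is the kind of rate one would expect from a Taylor expansion of $\phi$ inside the two averages.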

Time dependent test functions
Fix φ ∈ S(R d ). We consider time-dependent test functions ϕ : such that (with a slight abuse of notation) ϕ(s, t) ∈ S(R d ) for all 0 ≤ s ≤ t and ϕ is continuously differentiable with respect to the time variables. The following is proved by adapting Exercise 5.1 of [Wal86].
Proposition 4.2. Let M be a worthy martingale measure and suppose that V t is a mild solution to the following equation: Suppose that A t (V t ) is adapted and that this equation is well posed. Then if ϕ is a time dependent test function, Returning to (48), we define a time dependent test function ϕ N as the solution to It is straightforward to check that ϕ N (s, t) ∈ S(R d ) for all 0 ≤ s ≤ t. Proposition 4.2 and (48) then yield (58) Here we see that in the special case where F is linear, R 2 = 0 and it remains to prove the convergence of the stochastic integral of ϕ N against the martingale measure M N . Using [Wal86, Theorem 7.13] we need only prove the convergence of M N and that of ϕ N to ϕ, where ϕ solves The following lemma, whose proof is given in Appendix C, provides the convergence of ϕ N to ϕ. In addition, there exist constants K 2 and K 3 such that, for 0 < |β| ≤ 4, and K 2 does not depend on φ.
Remark. Recall the definition of R 1 in (46); it is tempting to try to define ϕ N as the solution to In this way, according to our previous calculations in (48) and using Proposition 4.2, we would get rid of the first integral in (58). However, in this case, s → ϕ N (·, s, ·) is not adapted to the canonical filtration of our process and the stochastic integral with respect to the martingale measure M N is not well defined.

Regularity estimate
The following result is an easy consequence of the definition of M N .
Proof. From the definition of Q N in (49) and the definition of σ (r) (We have used Jensen's inequality in the last line.) For t > 0 and x ∈ R d , let be the fundamental solution to the heat equation on R d ; φ → G t * φ is then the semigroup of standard Brownian motion. Then f t as defined in (40) satisfies be a symmetric Lévy process on R d with generator φ → L (r) φ and let G (r) be the corresponding semigroup. Note that since ξ (r) t = 0 with positive probability, G (r) is not a well-defined function, but we do have G (r) t ∈ L 1,∞ . Then f N as defined in (13) satisfies The following provides a bound on the second moment of Z N t , which allows us to control the quadratic term in (58). Note that x → Z N t (x) is a well defined function (despite the fact that w N t/η is only defined up to a Lebesgue-null set) and that for each N ≥ 1, it is uniformly bounded on R d (by Lemma 4.5. For T > 0, there exists a constant K 5 > 0, independent of N , such that for 0 ≤ t ≤ T, The proof of this result mirrors that of Theorem 1 in [Nor75a], although it is more technical because of the Laplacian and the various spatial averages. Proof. Coming back to equation (48), and using (46) instead of (47), we write To get rid of the operator L, we use Proposition 4.2 with the time-dependent test function G Now we take φ(y) = 1 Vr 1 |x−y|<r , and we obtain We now want to apply Gronwall's lemma, but the last term must be controlled carefully. Taking the square of both sides and using (a + b) 2 ≤ 2(a 2 + b 2 ), we have By Jensen's inequality (and noting that Taking expectations on both sides and using Fubini's theorem, we obtain started from the origin.) In addition,

From Lemma 4.4, we have
The right hand side does not depend on x, so we can take the supremum over x ∈ R d on the left and Finally, we can apply Gronwall's lemma to deduce that

Convergence to the deterministic limit
The following result, proved in Appendix B, shows that $f^N$ converges to $f$.

Proposition 4.6. For $T > 0$, there exist constants $K_6$ and $K_7$ such that, for all $N \geq 1$, and, for all $0 \leq |\beta| \leq 4$, sup where $\partial_\beta f$ is the spatial derivative with respect to the multi-index $\beta$.
We are now in a position to prove the first statement of Theorem 3.5, namely the convergence of the process $(w^N_t)_{t \geq 0}$. We are going to prove the following lemma.

Lemma 4.7. There exists a constant $K_8$ such that for all $N \geq 1$ and for any function $\phi$ satisfying $\|\phi\|_q \leq 1$ and $\max_{|\beta|=2} \|\partial_\beta \phi\|_q \leq 1$ for $q \in \{1, 2\}$,

Before we prove Lemma 4.7, we show that it implies the convergence of $(w^N_t)_{t \geq 0}$. We can choose a separating family $(\phi_n)_{n \geq 1}$ of compactly supported smooth functions satisfying $\|\phi_n\|_q \leq 1$ and $\max_{|\beta|=2} \|\partial_\beta \phi_n\|_q \leq 1$ for $q \in \{1, 2\}$, and define $d$ as in (3) using this family. Then

where the last line follows by Proposition 4.6 and Lemma 4.7. The right-hand side converges to zero as $N \to \infty$, yielding the uniform convergence (on compact time intervals) of $(w^N_t)_{t \geq 0}$ to $(f_t)_{t \geq 0}$, the solution of equation (40). Note that, as soon as $d \geq 2$, $r_N^2$ is the leading order term on the right-hand side (see (41)).
Proof of Lemma 4.7. We are going to make use of (48) and apply Doob's maximal inequality to the martingale part. Let us first show that there exist two constants K and K such that, for t ∈ [0, T ], Indeed, taking the expectation of the absolute value of both sides of (58) and using Lemma 4.4, we have where we used Lemmas 4.5 and 4.3 in the last line. We have thus proved (64). Recalling (48) and the notation Taking expectations on both sides, we use Lemma 4.5 and apply (64) with φ replaced by By Doob's inequality and Lemma 4.4, Vr tends to zero as N → ∞ due to assumption (41). Hence, if φ q ≤ 1 and max |β|=2 ∂ β φ q ≤ 1 for q ∈ {1, 2}, the right-hand-side of (65) is bounded by some constant independent of N and φ.

Tightness
To prove that the sequence $(Z^N)_{N \geq 1}$ is tight in $D([0, T], \mathcal{S}'(\mathbb{R}^d))$, we adapt the argument from the proof of Theorem 7.13 in [Wal86].
Proposition 4.8. For any $\phi \in \mathcal{S}(\mathbb{R}^d)$ and any sequence $(T_N, \rho_N)_{N \geq 1}$ such that $T_N$ is a stopping time (with respect to the natural filtration of the process $(Z^N_t)_{t \geq 0}$) with values in $[0, T]$ for all $N$ and $\rho_N$ is a deterministic sequence of positive numbers decreasing to zero as $N \to \infty$,

in probability as $N \to \infty$.

Proof of Proposition 4.8. We are going to treat each term in (58) separately. The first one converges to zero in $L^1$, uniformly on $[0, T]$, as a consequence of Lemma 4.5. The second one is dealt with as in [Wal86, Theorem 7.13].
The proof requires three auxiliary lemmas as follows (the first two are proved in Appendix C). We extend ϕ N to R d × [0, T ] 2 by setting, for s, t ∈ [0, T ], In other words, for s > t, ϕ N (s, t) equals φ.
Lemma 4.9. For T > 0, there exists a constant K 9 such that, for all N ≥ 1 and for q ∈ {1, 2},

Now define
Lemma 4.11. For any $0 < \beta < 1/2$, there exists a random variable $Y_N$ such that almost surely, and $E[Y_N^2] \leq C$ for all $N \geq 1$.
Returning to the process Z N t , φ , by (58), we can write Let us deal with each term separately. The first two are similar so we need only consider the first one. Since inside the integral s ≤ T N ≤ T , ϕ N (s, T N ) ≤ sup t∈[s,T ] ϕ N (s, t) and we have |ϕ N (s, t)| ds.
Taking the expectation on both sides, we get where the second line follows by Lemma 4.5 and the third line follows by Lemma 4.10. Recall that we assumed in (41) that τ N /η N = o r 2d N ; hence the first term on the right-hand-side of (69) converges to zero in L 1 . By Lemma 4.11, we have, almost surely, Taking the expectation of the square of both sides, we write Hence the third term converges to zero in L 2 and in probability as N → ∞. Finally, since T N is a stopping time, we can apply Lemma 4.4 to the fourth term, This concludes the proof of Proposition 4.8.

Convergence of the martingale measure M N
The next step is to show that the martingale measure $M^N$ converges weakly in is a stochastic integral against the space-time white noise $W$ and $f$ is the solution of (40). We will naturally use Theorem 4.1, along with the following result on convergence to Gaussian martingales (which is a consequence of Lévy's characterisation of Brownian motion). For any

Theorem 4.12 ([JS87, Theorem VIII 3.11]). Suppose $(X_t)_{t \geq 0} = (X^1_t, \ldots, X^d_t)_{t \geq 0}$ is a continuous $d$-dimensional Gaussian martingale and, for each $n \geq 1$, $(X^n_t)_{t \geq 0} = (X^{n,1}_t, \ldots, X^{n,d}_t)_{t \geq 0}$ is a local martingale such that

(i) $|\Delta X^n_t|$ is bounded uniformly in $n$ for all $t$, and $\sup_{t \leq T} |\Delta X^n_t| \xrightarrow[n \to \infty]{P} 0$.

(ii) For each $t \in \mathbb{Q} \cap [0, T]$, $X^{n,i}$, $X^{n,j}$

Then $X^n$ converges in distribution to $X$ in $D([0, T], \mathbb{R}^d)$.
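A textbook instance of this kind of convergence, given here only as an illustration and not as part of the proof, is the rescaled compensated Poisson process $X^n_t = (N_{nt} - nt)/\sqrt{n}$, where $N$ is a rate-one Poisson process. Its jumps all have size $n^{-1/2}$, its predictable quadratic variation equals $t$ for every $n$, and its limit is standard Brownian motion. The sketch below uses the closed-form characteristic function of the Poisson distribution to check that the law of $X^n_1$ approaches the standard Gaussian; the function name is hypothetical.

```python
import cmath
import math


def log_char_fn(theta, n, t=1.0):
    """Log characteristic function of X^n_t = (N_{nt} - n t) / sqrt(n),
    with N a rate-one Poisson process (illustrative example only,
    not the martingale measure M^N of the paper)."""
    z = 1j * theta / math.sqrt(n)
    # log E[exp(i theta X^n_t)] = n t (e^z - 1 - z).
    return n * t * (cmath.exp(z) - 1.0 - z)


theta = 1.5
for n in (10, 1000, 100000):
    jump = 1.0 / math.sqrt(n)  # size of every jump of X^n
    gap = abs(log_char_fn(theta, n) + 0.5 * theta ** 2)
    print(n, jump, gap)
```

Both the jump size and the gap to the Gaussian value $-\theta^2/2$ decay like $n^{-1/2}$, which mirrors the role played by the jump bound in condition (i).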
In our setting, the limiting process (M t (φ)) t≥0 is a continuous martingale with quadratic variation (See [Wal86, Theorem 2.5].) Since this quantity is deterministic, (M t (φ)) t≥0 is Gaussian, and we can apply the result above. The following lemma is then enough to conclude that M N converges to M .
Lemma 4.13. For any φ ∈ S(R d ), Indeed, by polarisation, we can recover

and (ii) of Theorem 4.12 is satisfied by vectors of the form M
The bound on the jumps of w t , φ in (42) implies But we have assumed that α 2 The rationale here is to show that the main contribution to this term comes from the diagonal {(z 1 , z 2 ) : z 1 = z 2 } when r → 0. From the definition of σ (r N ) in (30), Changing the order of integration gives We are left with showing that the right-hand-side of (71) converges in probability to To do this, we first justify that φ can be let out of the average, we use Lemma 4.5 to argue that we can replace w N s/η by f N s , then the regularity of f N allows us to remove the averages and finally we know from Proposition 4.6 that f N converges to f . First note that Since 0 ≤ w(y) ≤ 1 a.e., we have As a consequence, By the same argument (replacing w by 1 − w), we can also let φ out of the average in the first term on the right-hand-side of (71), and the problem reduces to showing the convergence of We now see that it is enough to show But, by the triangle inequality, (We have used Lemma 4.5, Proposition A.1 in Appendix A and Proposition 4.6.) The right-hand-side converges to zero as N → ∞ (due to assumption (41)), providing the desired result. From all this we conclude uniformly for s ∈ [0, T ], which gives us (ii).

Conclusion of the proof
We are almost done. We have proved that the sequence of processes Z N N ≥1 is tight, and we need only characterise its potential limit points. Recall the following expression for Z N t , φ from (58): The first term converges to zero in L 1 from (70). Also, since, from Lemma 4.4 and Lemma 4.3, For φ 1 , . . . , φ p in S(R d ), let ϕ 1 , . . . , ϕ p be the corresponding solutions of (59) with φ = φ i . Since we showed in Section 4.5 that M N converges weakly to M , by [Wal86, Proposition 7.12], for t 1 , . . . , t p ∈ [0, T ], This uniquely characterises the potential limit points of Z N N ≥1 . By Theorem 4.1, Z N t t≥0 converges in distribution to a distribution-valued process (z t ) t≥0 given by where ϕ satisfies the backwards heat equation (59) with terminal condition φ, It is an easy exercise to prove that z t satisfies (See the proof of [Wal86, Theorem 5.2].) In other words, (z t ) t≥0 is the (mild) solution of (44) (recall that M t = f t (1 − f t ) · W t ) and Theorem 3.5 is proved.
5 The stable case - proof of Theorem 3.8

Turning to the proof of the central limit theorem in the stable case, we note that its overall structure is the same as in the Brownian case. Some steps need a different treatment, however, and we explain those in more detail. Whenever the details of the argument are exactly the same as previously, we simply mention intermediate results without detailing their proof. To simplify our formulae, we use the following notation:

The specific constants can always be retrieved from Section 4 or from a trivial calculation. Also, as in Section 4, we set the constants $u$ and $(sV_1)/\alpha$ to 1 in the martingale problem (M2) defined in Definition 3.6. Let us write (M2) as

Setting $M^N_t(\phi) = \eta_N^{1/2} M^N_{t/\eta_N}(\phi)$ and using the definition of $F^{(\delta_N)}$ in (16), we have, by the same argument as for (48), and the covariation measure of $M^N$ is given by

Time dependent test functions
Recall Proposition 4.2 and how we used it in the previous proof. Define a time dependent test function ϕ N as the solution to the following.
By Proposition 4.2, we have We are thus left with finding a suitable way to bound the first term above and showing the convergence of the stochastic integral against M N . The convergence of the martingale measure M N is going to involve slightly different calculations compared to the previous case as the limiting noise is not a space-time white noise. The convergence of ϕ N , however, is proved in a similar way to before. Define ϕ as the solution to the following.
The following lemma, whose proof is given in Appendix C, provides the convergence of ϕ N to ϕ.

Regularity estimate
Let us first state the following L 2 bound for the stochastic integral.
The proof uses the following lemma, which is proved in Appendix C.
Lemma 5.3. For $\alpha < d$ and $f, g \in L^{1,\infty}(\mathbb{R}^d)$,

Proof of Lemma 5.2. From the expression for the covariation measure in (76), But, by the definition of $\sigma^{(\alpha,\delta)}$ in (53), The second inequality is obtained from the first one and Lemma 5.3.
Let G (α) (resp. G (α,δ) ) denote the fundamental solution to the fractional heat equation with the operator D α (resp. the fractional heat equation with the truncated operator D α,δ ). Then the centering term f N as defined in (20) can be written as Likewise, using the definition of f t in (54), We can now prove the following counterpart of the regularity estimate (Lemma 4.5), which allows us to bound the quadratic error term in (78).
Proof. From (75) and the definition of R 1 in (46), we have Using Proposition 4.2 with ϕ(x, s, t) = G Repeating the same steps as in the proof of Lemma 4.5 and using Jensen's inequality, we get Using the first inequality of Lemma 5.2 and bounding As a result Taking the supremum of E Z N s (y, δ N r) 2 over y inside the integral on the right-hand-side, the function G (α,δ) integrates to 1, yielding Integrating over R, we get Hence, by Gronwall's inequality, for 0 ≤ t ≤ T ,

Convergence to the deterministic limit
The following result is proved in Appendix B.
Proposition 5.5. For T > 0, The convergence of w N t/η to f t in L 1 will follow from the next lemma.
Lemma 5.6. For any function $\phi$ satisfying $\|\phi\|_q \leq 1$ and $\max_{|\beta|=2} \|\partial_\beta \phi\|_q \leq 1$ for $q \in \{1, \infty\}$,

Indeed, by the same argument as in Section 4.3, choosing a separating family $(\phi_n)_{n \geq 1}$ of compactly supported smooth functions satisfying this condition and using the corresponding metric $d$ on $\Xi$, one has by Proposition 5.5 and Lemma 5.6. From (55), it can be seen that the leading term on the right-hand side is $\delta_N^{\alpha \wedge (2-\alpha)}$, which goes to zero as $N \to \infty$, yielding the convergence of $w^N_{t/\eta}$. The following lemma is needed for the proof of Lemma 5.6 and is proved in the same manner as (64) in Section 4.3.
Lemma 5.7. For $\phi \in L^{1,\infty}(\mathbb{R}^d)$ and $t \in [0, T]$,

Proof. Taking expectations on both sides of (78), In the first integral, we have Hence, applying Lemma 5.4 to the first term and Lemma 5.2 to the second term on the right-hand side of (81) yields We have used the fact that (by Lemma 5.1) $\|\varphi^N(s, t)\|_q \lesssim \|\phi\|_q$ to pass from the first line to the second. The third line follows since $\tau_N/\eta_N = o(\delta_N^{2\alpha})$ by (55).
Proof of Lemma 5.6. The proof of Lemma 5.6 is similar to the proof of Lemma 4.7. Setting Taking the expectation on both sides, Lemma 5.7 can be used in the first term, and Lemma 5.4 in the second one, to yield But ψ s q D α,δ φ q + F ∞ φ q and, by Proposition A.2.i in Appendix A, D α,δ φ q φ q + max |β|=2 ∂ β φ q . In addition, by Doob's inequality, and using Lemma 5.2, As a result, if φ q ≤ 1 and max |β|=2 ∂ β φ q ≤ 1 for q ∈ {1, ∞},

Tightness
The overall argument for the tightness of the sequence Z N t t≥0 is the same as in Section 4.4.
Proposition 5.8. For any φ ∈ S(R d ) and for any sequence (T N , ρ N ) N ≥1 such that T N is a stopping time with values in [0, T ] for every N ≥ 1 and ρ N ↓ 0 as N → ∞,  (67); we need estimates on ϕ N as in Lemmas 4.9 and 4.10. The proof of the following lemma is in Appendix C.
Lemma 5.9. For T > 0, q ∈ {1, ∞} and for all s, t, t ∈ [0, T ], In addition, for all s ∈ [0, T ], We shall only detail how the quadratic part of (78) can be bounded using Lemma 5.4, and refer to Section 4.4 for the rest of the proof of Proposition 5.8. For T N a stopping time with values in [0, T ], write |ϕ N (s, t)|(δ N r) dr r α+1 ds.
Taking the expectation on both sides and the supremum inside the spatial integral against ϕ N , we get by Lemma 5.4 and Lemma 5.9. The other terms in (82) are bounded as in the proof of Proposition 4.8 in Section 4.4, using Lemmas 5.9 and 5.2.

Convergence of the martingale measure M N
The convergence of $M^N$ relies on applying Theorem 4.12 to vectors of the form $(M^N_t(\phi_1), \ldots, M^N_t(\phi_p))_{t \geq 0}$, although the details differ from the proof in the Brownian case (in Section 4.5). Indeed, $M^N$ no longer converges to a stochastic integral against a space-time white noise, but to $W^\alpha$, a coloured Gaussian noise such that

Hence the weak convergence of $M^N$ to $W^\alpha$ in $D([0, T], \mathcal{S}'(\mathbb{R}^d))$ will follow (as in Section 4.5) from the following lemma.
Proof. The proof of the first part is the same as for Lemma 4.13: which tends to zero since $\alpha_N^2 = o(\tau_N/\eta_N)$. For the second part of the statement, we first show that We have from the definition of $\sigma^{(\alpha,\delta_N)}$ w(x, r)dx Subtracting the corresponding expressions with $w^N_{s/\eta}$ and $f^N_s$ and reordering terms, we write We shall deal with the terms from each of the three lines separately, so let us call them $A(z_1, z_2)$, $B(z_1, z_2)$ and $C(z_1, z_2)$ (they are in fact defined for a.e. $z_1$ and $z_2$, and so is all that follows, but this is not a problem since what we really show is (83)). For the first term write (We have used the Cauchy-Schwarz inequality in the last line.) In addition, by Lemma 5.4,

0.
For the second term, by symmetry, In particular, by Proposition 5.5 ψ N z 2 q δ −α N φ q for q ∈ {1, ∞} and, since ψ N z 2 is deterministic, by Lemma 5.7 Hence, by (55), The third term is controlled in a similar way, this time setting which satisfies the same inequalities as the previous ψ N z 2 and using the bound on f N s ∞ from Proposition 5.5. As a result we have proved (83). Now write

0.
Finally, replacing w N s/η by f s and σ (α,δ N ) by σ α in (84), one writes using Proposition 5.5. It follows from Lemma 5.3 that and we have shown that, for all t ∈ [0, T ]

Conclusion of the proof
We can now conclude the proof of Theorem 3.8. We have proved that the sequence $(Z^N)_{N \ge 1}$ is tight and we can characterise its potential limit points using the convergence of $M^N$. Recall the following expression for $\langle Z^N_t, \phi \rangle$ from (78): In Section 5.4, we showed that the first term converges to zero in $L^1$. In addition, by Lemmas 5.1 and 5.2, For $\phi_1, \ldots, \phi_p$ in $\mathcal{S}(\mathbb{R}^d)$, let $\varphi_1, \ldots, \varphi_p$ be the corresponding solutions of (79) with $\phi = \phi_i$. Since $M^N$ converges weakly to $W^\alpha$, by [Wal86, Proposition 7.12], for $t_1, \ldots, t_p \in [0, T]$ Hence the same convergence holds (in distribution) for $(\langle Z^N_{t_1}, \phi_1 \rangle, \ldots, \langle Z^N_{t_p}, \phi_p \rangle)$ and this characterises the potential limit points of $(Z^N)_{N \ge 1}$. By Theorem 4.1, $(Z^N_t)_{t \ge 0}$ converges in distribution to a distribution-valued process $(z_t)_{t \ge 0}$ which satisfies By the same argument as in Section 4.6, $(z_t)_{t \ge 0}$ solves the stochastic PDE (56), which concludes the proof.
6 Drift load - proof of Theorem 2.3

Recall the definition of $F$ and $\rho^{(r_N)}_{z_1,z_2}$ in (23) and (36) respectively.
defines a (mean zero) square-integrable martingale with (predictable) variation process (Again, uniqueness does not hold for this martingale problem, but we will not require it.) Let q N t denote the SLFVS with overdominance defined in Definition 1.4 with parameters as defined in (22) in Section 2.3. As in Subsection 3.2, we consider the rescaled process w N t (x) = q N t (x/δ N ). By Proposition 3.2, using the same rescaling argument as in Proposition 3.4, we have the following result.
Proposition 6.2. The process w N t t≥0 satisfies the martingale problem (M3).
As in Theorem 2.1, we define the process of rescaled fluctuations by (Recall that since w 0 = λ, the centering term is constant and equals λ.) Then by the definition of Let us define the following notation for any φ ∈ L 1,∞ (R d ), Theorem 2.3 is then a direct consequence of the following theorem.
Note that the only difference between the martingale problems (M1) and (M3) in Definitions 3.3 and 6.1 is that σ (r N ) z 1 ,z 2 is replaced by ρ (r N ) z 1 ,z 2 . Hence it is easy to see that Lemma 4.4 and Lemma 4.5 also hold in this case (with different constants). It is also possible to adapt the proofs in Section 4.5 to show that on compact time intervals, Z N t t≥0 converges to the solution of the following SPDE, This process admits a stationary distribution, under which z t , φ is a Gaussian random variable with variance We can thus hope to extend the convergence of Z N t t≥0 to the whole real line (as in [Nor77]), and use the above expression to estimate the second moment of Z N t , φ r N for large times. Some care is needed though, as we are letting the support of the test function vanish as N → ∞.
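To make the role of a stationary variance concrete, the following Python sketch treats a scalar analogue only: an Ornstein-Uhlenbeck process $dX_t = -\gamma X_t\,dt + \sigma\,dW_t$ started from $X_0 = 0$, whose variance solves $v' = -2\gamma v + \sigma^2$ and converges to $\sigma^2/(2\gamma)$. The parameters `gamma` and `sigma` and the function names are assumptions made for illustration; this is a one-mode caricature, not the SPDE considered here.

```python
import math

def ou_variance_exact(t, gamma, sigma):
    """Variance at time t of dX = -gamma*X dt + sigma dW with X_0 = 0."""
    return sigma**2 * (1.0 - math.exp(-2.0 * gamma * t)) / (2.0 * gamma)

def ou_variance_euler(t, gamma, sigma, h=1e-3):
    """Explicit Euler scheme for v' = -2*gamma*v + sigma**2, v(0) = 0."""
    v = 0.0
    for _ in range(int(round(t / h))):
        v += h * (-2.0 * gamma * v + sigma**2)
    return v

if __name__ == "__main__":
    gamma, sigma = 0.5, 1.0  # hypothetical values, chosen only for the demo
    for t in (1.0, 5.0, 20.0):
        print(t, ou_variance_euler(t, gamma, sigma), ou_variance_exact(t, gamma, sigma))
    print("stationary variance:", sigma**2 / (2.0 * gamma))
```

In this toy setting the variance is bounded uniformly in time, which is the kind of control that Lemma 6.4 provides for the rescaled fluctuations.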
Proof of Theorem 6.3. Since q N 0 = λ, by the same argument as for (48), where M N is a martingale measure with covariation measure Q N given by Consider a time dependent test function ϕ N which solves Then, by Proposition 4.2 and (89), (91) The remainder of the proof now consists of proving that the main contribution to the variance of Z N t , φ r N is made by the last term on the right-hand-side and then estimating this contribution. Note that ϕ N is given explicitly by In particular, ϕ(s, t) q ≤ φ q e −F (λ)(t−s) . The following lemma extends the result of Lemma 4.5 to arbitrarily large times, and will be proved in Subsection 6.1.
Lemma 6.4. There exist constants $K_1$ and $K_0$ such that, for all $x \in \mathbb{R}^d$ and all $t \ge 0$,

Using the expression for $\varphi^N$ in (92) and the Cauchy-Schwarz inequality, Another use of the Cauchy-Schwarz inequality yields Hence, using Lemma 6.4 and the fact that $G^{(r)}$ As a result, uniformly in $t \in \mathbb{R}_+$. We now move on to estimating the contribution of the second term in (91).
The following lemma will be proved in Subsection 6.1.
As we shall see in Subsection 6.1, this is a consequence of the fact that in the expression for $Q^N$ in (90), $w^N$ can be replaced by $\lambda$. As a result, using (92) and (93) in (91) and since To study the asymptotic behaviour of the first integral, we use the scaling properties of the function $G^{(r)}$. Recall that $\xi$ is a Lévy process with infinitesimal generator $L^{(r)}$; it is not difficult to show that it satisfies the following scaling property: (Simply look at the infinitesimal generator of both processes.) Hence If we can show that, as $N, t \to \infty$, there is a constant $C > 0$ such that the result will follow. For this we need the following estimate of $f(t)$ when $t \to \infty$.
Lemma 6.6. For $\phi \ge 0$, as $t \to \infty$,

For the proof of this estimate we will use the following properties of the semigroup $G^{(r)}$, which will be proved in Appendix D.
Lemma 6.7. For any r > 0 and t > 0, the law of ξ (r) t takes the form is continuous on R d , is invariant under rotations which fix the origin and g (r) t (y) is a decreasing function of |y|.
Proof of Lemma 6.6. By the semigroup property of φ → G (r) t * φ, f (t) can also be written G (1) 2t * φ(1), φ(1) . In addition, by the scaling property of ξ (r) t t≥0 and using Lemma 6.7, By Proposition A.1.ii and Theorem 4.8.2 in [EK86], the finite dimensional distributions of ξ (r) t t≥0 converge to those of standard Brownian motion as r → 0. In particular, ξ 2 (x) → G 2 (x) as r → 0 for almost every x ∈ R d (the probability that ξ (r) t = 0 vanishes as r → 0 for any t > 0). Since G 2 is continuous on R d and g (r) 2 is decreasing as a function of the modulus, this convergence takes place uniformly on compact sets by Dini's second theorem. So, fixing ε > 0, for any R > 0, for r small enough, sup As a result, using the continuity of G 2 , for any y, for t large enough, Hence, since g (r) From the above expression for f , Replacing φ with φ(1) in (97) and letting t → ∞ yields the result.
Remark. This is in fact a consequence of the fact that ξ is transient.
We now prove (95) separately for each regime.
High dimension

If $d \ge 3$, change the variable of integration to write Since $f$ is integrable, by dominated convergence, and since $t\delta_N^{-2} \to \infty$,

Dimension 1

If $d = 1$, however, from (96), we see that, as $N \to \infty$, 1 for some constant $\hat{C} > 0$.
Dimension 2

If $d = 2$, let $T_1$ and $T_2$ be two positive constants and assume that $t \ge T_2$. We split the integral as follows: We first show that the first and last terms are of order $r_N^2$. Since $0 \le f(t) \le \|\phi\|_2^2$ for all $t \ge 0$, For the middle term, by (96), 1 $(4\pi s)^{-1} \|\phi\|_1^2$, so as $N \to \infty$, by dominated convergence, As a result as $N, t \to \infty$. We have thus proved (95), and the result.
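The split into three regimes reflects how a decay rate of the form $(4\pi s)^{-d/2}$ integrates at large times. As a rough numerical illustration (not part of the proof), the Python sketch below evaluates $\int_1^T (4\pi s)^{-d/2}\,ds$ for $d = 1, 2, 3$, both in closed form and by a midpoint rule. The decay rate for general $d$ is assumed by analogy with the $d = 2$ estimate quoted from (96); the cut-off at $s = 1$, the normalisation $\|\phi\|_1 = 1$ and the function names are choices made for this example.

```python
import math

def tail_integral(d, T):
    """Closed form of the integral of (4*pi*s)**(-d/2) over s in [1, T]."""
    c = (4.0 * math.pi) ** (-d / 2.0)
    if d == 2:
        return c * math.log(T)
    p = 1.0 - d / 2.0
    return c * (T**p - 1.0) / p

def tail_integral_numeric(d, T, n=20000):
    """Midpoint rule after the substitution s = exp(u), as a cross-check."""
    c = (4.0 * math.pi) ** (-d / 2.0)
    h = math.log(T) / n
    p = 1.0 - d / 2.0
    return c * h * sum(math.exp(p * (k + 0.5) * h) for k in range(n))

if __name__ == "__main__":
    for d in (1, 2, 3):
        values = [tail_integral(d, T) for T in (1e2, 1e4, 1e6)]
        print("d =", d, values)
```

The integral grows like $\sqrt{T}$ for $d = 1$, like $\log T$ for $d = 2$, and stays bounded for $d = 3$, which matches the need for a separate argument in each case.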
6.1 Proofs of Lemmas 6.4 and 6.5

The proof of Lemma 6.4 requires the following two technical lemmas, which are proved in Appendix D.
Lemma 6.8. Let $\phi : \mathbb{R}^d \to \mathbb{R}$, $r > 0$ and suppose that $g$ : Further, for some constant $c > 0$, for $r$ small enough,

Lemma 6.9. Suppose $h : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ is a function that is continuously differentiable with respect to the time variable $t$ and which satisfies the following differential inequality for some positive $\alpha$: Then for all $0 \le s \le t$ and for any $1 \le q \le \infty$,

Proof of Lemma 6.4. Set $h(t, x) = \mathbb{E}[Z^N_t(x, r_N)^2]$. We are going to make use of Lemma 6.9, so we want to obtain a differential inequality for $h$. To this end, average (89) on $B(x, r_N)$ to get , r N )).
(From now on all averages will be over radius r N .) By the generalised Itô formula, Expanding the brackets, the terms on the second line cancel and, integrating for s ∈ [0, t], we have Taking expectations on both sides, since the second term is a martingale, Differentiating yields The second term is bounded by 1 Vr N , and the first one has the same form as the left-hand-side of the first statement of Lemma 6.8. In [Nor74b] (at the beginning of the proof of Theorem 3.2), it is proved that the conditions on F in (24)-(25) imply Then, taking φ = Z N t and g = R 1 (w N s/η , λ), Lemma 6.8 implies that, for all t ≥ 0, with α N = γ + O r 2 N . Using Lemma 6.9 (with s = 0) we can now write, since Z N 0 = 0, where the sum is over jump times for the process (Z N t (x)) t≥0 . We can bound the size of the jumps ∆Z N s (x) by a deterministic constant. By the definition of the SLFVS with overdominance in Definition where the sum is still over the jump times of Z N s (x). These jumps occur according to a Poisson process with rate V 2R η −1 N , so, using (99) to bound E Z N s − (x) , we obtain

Now note that
where |g t (x)| 1. Now the second statement of Lemma 6.8 yields : and by Lemma 6.9, we have The following lemma is needed in the proof of Lemma 6.5.
Lemma 6.10. The following holds uniformly for all $t \ge 0$:

Proof. Recall the expression for $\langle Z^N_t, \phi \rangle$ in (91); using Lemma 6.4 and Lemma 4.4, we can write Replacing φ by (φ 1/r N ) r N - as defined in (88) - to use (94) and then looking at the proof of (95) in the proof of Theorem 6.3, we see that But φ 1/r N 1 = φ 1 and φ 1/r N 2 = r and we have the required result since

Proof of Lemma 6.5. We drop the superscript $N$ from $\varphi^N$ throughout the proof and take averages over radius $r := r_N$. Recall from the expressions for $Q^N$ in (90) and $\rho^{(r)}$ in (36) that the variance of the stochastic integral $\int_0^t \int_{\mathbb{R}^d} \varphi(x, s, t)\, M^N(dx\,ds)$ is given by which can also be written We want to show that in this expression, $w^N_{s/\eta}$ can (asymptotically) be replaced by $\lambda$, hence we write Since $(w^N_{t/\eta})^2 - \lambda^2 = (\tau/\eta)^{1/2} Z^N_t (w^N_{t/\eta} + \lambda)$, using Lemma 6.4, In addition, (The cross terms cancel out by symmetry.) Thus, where $\psi^N_{z_2}(z_1) = \frac{V_r(z_1, z_2)}{V_r^2} \varphi(z_1)$. In particular, By Lemma 6.10, we get 2 )dz 2 1/2 , using the Cauchy-Schwarz inequality in the second line. By (101), We use a similar argument for the other terms in (100) to show that replacing $w^N_{s/\eta}$ by $\lambda$ makes a difference of $o(r_N^2 c_N^{1/2} \|\varphi\|_2^2)$. We have thus shown that, since r N c uniformly in $s \ge 0$. The result follows.

A Approximating the (fractional) Laplacian
We use here the notation defined in (74).
Proposition A.1. Let $\phi : \mathbb{R}^d \to \mathbb{R}$ be twice continuously differentiable and suppose that $\|\partial_\beta \phi\|_q < \infty$ for $0 \le |\beta| \le 2$ and $1 \le q \le \infty$. Then If in addition, $\phi$ admits $\|\cdot\|_q$-bounded derivatives of up to the fourth order,

Proof of Proposition A.1. By Taylor's theorem, where $R_{ij}(y) = \int_0^1 (1 - t)\,\partial_{ij}\phi(x + t(y - x))\,dt$ (we use the notation $x_{i_1 \ldots i_k} = x_{i_1} \cdots x_{i_k}$). By symmetry, the integral of the first sum over a ball vanishes, and If $q = \infty$, then $|R_{ij}(y)| \le \frac{1}{2} \|\partial_{ij}\phi\|_\infty$ and we write , by Jensen's inequality. But, by the definition of $R_{ij}$ Plugging this into the previous inequality, we get The second inequality is proved in essentially the same way. We expand $\phi$ according to Taylor's theorem to the fourth order: where $R_{ijkl}(y) = \frac{1}{3!} \int_0^1 (1 - t)^3\,\partial_{ijkl}\phi(x + t(y - x))\,dt$. Integrating, all the antisymmetric terms vanish and we obtain

Proof of Proposition A.2. From the definition of $D^{\alpha,\delta}$ and $\Phi^{(\delta)}$ in (17), note that For $q \in \{1, \infty\}$, using Proposition A.1 Likewise, we have By Proposition A.1, we then write The third statement is a rewording of the first one in a slightly different setting. Indeed by (16), Hence as in the proof of (i) The last term appears because there is an average inside the function $F$. The result then follows from the fact that $\partial_{ij} F(\phi) = \partial_{ij}\phi\, F'(\phi) + \partial_i\phi\, \partial_j\phi\, F''(\phi)$.
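The Taylor-expansion argument for Proposition A.1 can be checked numerically in dimension one. The Python sketch below assumes that the operator being compared with $\frac{1}{2}\Delta$ has the form $\frac{d+2}{2r^2}\big(\overline{\overline{\phi}}(x, r) - \phi(x)\big)$, where the double bar denotes two successive averages over balls of radius $r$; this form, the test function $\phi = \cos$ and all function names are assumptions made for the example, since the definition in (74) is not reproduced in this section. Under that assumption the error should shrink by a factor of about 4 when $r$ is halved.

```python
import math

def ball_average_1d(phi, x, r, n=400):
    """Midpoint-rule average of phi over the interval [x - r, x + r]."""
    h = 2.0 * r / n
    return sum(phi(x - r + (k + 0.5) * h) for k in range(n)) / n

def L_r(phi, x, r, n=400):
    """Assumed operator in d = 1: 3/(2 r^2) * (double average of phi - phi)."""
    inner = lambda y: ball_average_1d(phi, y, r, n)
    double_avg = ball_average_1d(inner, x, r, n)
    return 3.0 / (2.0 * r * r) * (double_avg - phi(x))

def L_r_cos_exact(x, r):
    """Closed form for phi = cos: the double average is cos(x) * (sin(r)/r)^2."""
    return 3.0 / (2.0 * r * r) * ((math.sin(r) / r) ** 2 - 1.0) * math.cos(x)

if __name__ == "__main__":
    x = 0.3
    half_laplacian = -0.5 * math.cos(x)  # (1/2) * second derivative of cos
    for r in (0.2, 0.1, 0.05):
        print(r, abs(L_r(math.cos, x, r) - half_laplacian))
```

The quadratic decay of the error in $r$ is the one-dimensional counterpart of the fourth-order Taylor expansion used in the proof.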

B.1 The Brownian case
Proof of Proposition 4.6. Recall the following expression for f N from (60), Since G (r) We can now prove the second part of the statement by induction on |β|. Suppose that the result is established for every 0 ≤ |β| < k ≤ 4 and take β such that |β| = k. (From now on we omit the superscript N in the induction proof.) Noting that and recalling that w 0 is assumed to have uniformly bounded derivatives of up to the fourth order, we can differentiate on both sides of (103): The sum is uniformly bounded by a constant K by the induction hypothesis, and so, using the fact that G (r) We can apply Gronwall's inequality to conclude where the right hand side is independent of both t ∈ [0, T ] and N ≥ 1. We can now prove the first statement using Gronwall's inequality again, together with Proposition A.1 and the first part of the proof. Recall that G t denotes the fundamental solution to the heat equation. Recalling that we set the constants uV R , 2R 2 /(d + 2) and s to 1, equations (13) and (40) can be written as and f t (x) = G t * w 0 (x) + t 0 G t−s * F (f s )ds.

By Proposition A.1,
since $\max_{|\beta|=4} \|\partial_\beta f^N_s\|_\infty$ is uniformly bounded from the previous argument. Also by Proposition A.1, (The term within brackets is uniformly bounded from the first part of the proof.) Finally, we also have Hence, using the fact that $\|G_t * \phi\|_\infty \le \|\phi\|_\infty$, there exists a constant $C > 0$ such that, for $t \in [0, T]$, Applying Gronwall's inequality,

B.2 The stable case
Proof of Proposition 5.5. The proof of the convergence of the centering term in the stable case goes along the same lines as in the Brownian case of Proposition 4.6. Differentiating (80) yields: $\frac{dr}{r^{\alpha+1}}\,ds$.
One can then proceed by induction as previously to show and Gronwall's inequality yields the second part of the statement. For the first part, the proof is identical to that in the Brownian case, one simply has to replace the operators 1 2 ∆ and L (r) by D α and D α,δ , respectively, and likewise replace F (f N t ) by F (δ) (f N t ). Proposition A.2 then yields the correct estimates on the corresponding error terms.

C.1 The Brownian case
Proof of Lemma 4.3. The proof of Lemma 4.3 is similar in spirit to that of Proposition 4.6. We start by proving the bound on the derivatives of ϕ N . By the definition of ϕ N in (57), Using the fact that G (r) t is a contraction in L q , we have, for q = 1, 2, By Gronwall's inequality, we conclude that ϕ N (s, t) q ≤ 2 (q−1)/q φ q e 2 q−1 Thus the statement holds for β = 0. We can then proceed by induction on |β| as in the proof of Proposition 4.6 to show that the same holds for every 0 ≤ |β| ≤ 4 (making use of the fact that by Proposition 4.6, f N has uniformly bounded derivatives). We omit the details.
We are left with proving the convergence estimate for ϕ N which is again a Gronwall estimate. As in the proof of Proposition 4.6, write (57) and (59) as and ϕ(x, s, t) = G t−s * φ(x) − t s G u−s * F (f u )ϕ(u, t) (x)du.
By Proposition A.1 and the bound on the spatial derivatives of ϕ N , Still by Proposition A.1, (omitting superscripts N and time variables) The last term inside the brackets is uniformly bounded by Proposition 4.6 and the second to last is bounded as a consequence of the first part of the proof. Also, ∂ ij (F (f )ϕ) is dominated by a linear combination of (averages of) derivatives of both f and ϕ. The latter are bounded in L q while the former are bounded in L ∞ , hence the first term within the brackets is also uniformly bounded. To sum up, Finally, by Proposition 4.6, F (f N u ) − F (f u ) q r 2 N . Hence, subtracting (106) from (105) and using Jensen's inequality as above with the L q -contraction property of G t , we have, for t ∈ [0, T ], We conclude with Gronwall's inequality, yielding the first statement of Lemma 4.3.
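Several estimates in these appendices are closed with Gronwall's inequality: if $u(t) \le a + b \int_0^t u(s)\,ds$ on $[0, T]$ with $a, b \ge 0$, then $u(t) \le a e^{bt}$. The Python sketch below checks this on a discretised extremal case, where the inequality is replaced by an equality with a left Riemann sum. The constants `a`, `b` and the function names are hypothetical and unrelated to the constants appearing in the proofs.

```python
import math

def gronwall_sequence(a, b, T, n):
    """u_k = a + b*h*sum_{j<k} u_j, a left-Riemann version of u = a + b*int_0^t u."""
    h = T / n
    u, running_sum = [a], 0.0
    for _ in range(n):
        running_sum += u[-1]
        u.append(a + b * h * running_sum)
    return h, u

def satisfies_gronwall_bound(a, b, T, n):
    """Check u_k <= a*exp(b*k*h) for every k (small tolerance for rounding)."""
    h, u = gronwall_sequence(a, b, T, n)
    return all(uk <= a * math.exp(b * k * h) * (1.0 + 1e-12) for k, uk in enumerate(u))

if __name__ == "__main__":
    print(satisfies_gronwall_bound(1.0, 2.0, 3.0, 3000))
    h, u = gronwall_sequence(1.0, 2.0, 1.0, 100000)
    print(u[-1], math.exp(2.0))
```

Here $u_k = a(1 + bh)^k$, so the bound $a e^{bkh}$ holds at every step and becomes sharp as the mesh $h$ tends to zero.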
Proof of Lemma 4.9. We can assume that $t' > t \ge s$ (if $t' \ge s \ge t$, then $\varphi^N(s, t) = \phi = \varphi^N(s, s)$ and the problem reduces to bounding $\varphi^N(s, t') - \varphi^N(s, s)$). Using (104) and recalling the way we extended $\varphi^N$ in (67), we write Again, we use the $L^q$-contraction property of $G$ We need a bound on the first term; recalling the definition of $G^{(r)}$ in Subsection 4.2, we have By Jensen's inequality, by Proposition A.1. Hence, returning to (108), Gronwall's inequality now yields the result.
Proof of Lemma 5.9. The argument for the continuity estimate is the same as in the proof of Lemma 4.9, using Proposition A.2. For the second bound, we use the same argument as in Lemma 4.10, again using Proposition A.2.