Small-population corrections for selection-mutation models

We consider integro-differential models describing the evolution of a population structured by a quantitative trait. Individuals interact competitively, creating a strong selection pressure on the population. On the other hand, mutations are assumed to be small. Following the formalism of Diekmann, Jabin, Mischler, and Perthame, this creates concentration phenomena, typically consisting in a sum of Dirac masses slowly evolving in time. We propose a modification to those classical models that takes the effect of small populations into account and corrects some abnormal behaviours.


The first model
We study the dynamics of a population subject to mutations and selection due to competition between individuals. Each individual in the population is characterized by a quantitative phenotypic trait x (for example the size of the individual, their age at maturity, or their rate of intake of nutrients). For simplicity, x is taken here in R even though all the arguments could easily be extended to higher dimensional cases.
Probabilistic models are usually considered the most realistic in that setting. They consist of birth and death processes for each individual X i (Poisson processes, more precisely), with for instance a birth rate b(X i ) and a death rate which increases with the competition between individuals, for example d(X i ) + (1/N) Σ j≠i I(X i − X j ), where N scales the size of the population. When a birth occurs, it simply adds another individual with the same trait, except when a mutation takes place, with small probability. In that case the new individual has a different random trait, obtained through some distribution K. We refer to [25] and the references therein for a nice introduction to the probabilistic approach.
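To fix ideas, here is a minimal sketch of such an individual-based simulation; all the rates (b, d, the competition coefficient, the mutation probability, the Gaussian mutation kernel) are illustrative choices and not the ones used in the text, and the trait-dependent competition I(X i − X j ) is replaced by a constant kernel for brevity.

```python
import random

def simulate(pop, b=1.0, d=0.1, comp=1e-3, p_mut=0.01, sigma=0.05,
             t_max=5.0, seed=0):
    """Gillespie-type birth-death process; `pop` is a list of traits X_i.

    Total birth rate b*n; total death rate d*n + comp*n^2, the quadratic
    term standing for a (constant-kernel) competition between all pairs.
    """
    rng = random.Random(seed)
    t = 0.0
    while pop:
        n = len(pop)
        birth, death = b * n, d * n + comp * n * n
        t += rng.expovariate(birth + death)   # exponential waiting time
        if t > t_max:
            break
        i = rng.randrange(n)
        if rng.random() < birth / (birth + death):
            x = pop[i]                        # clonal birth ...
            if rng.random() < p_mut:          # ... except for rare mutations,
                x += rng.gauss(0.0, sigma)    # drawn from a Gaussian kernel K
            pop.append(x)
        else:
            pop.pop(i)                        # death of individual i
    return pop

final = simulate([0.0] * 50)
```

With these parameters the population grows toward the equilibrium size (b − d)/comp, which already illustrates why direct simulation becomes prohibitive for large populations.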
Of course this is only one possible model and there are many variants. One can modify the interaction between individuals for instance by introducing explicit resources (with chemostat like interactions maybe). The competition could influence both the birth rate and the mortality rate...
When the total number of individuals is too large (it can easily reach 10^10 − 10^12 for some micro-organisms), it becomes prohibitive to compute numerically the solution to this process. In that case one expects to be able to derive a deterministic model as a limit of large populations. Such a derivation was proved in [8] and one obtains integro-differential equations like
∂ t u(t, x) = ( b(x) − d(x) − ∫ R I(x − y) u(t, y) dy ) u(t, x) + M(u)(t, x), (1.1)
where the mutation kernel is for instance
M(u)(t, x) = ∫ R K(z) ( u(t, x + z) − u(t, x) ) dz.
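A discretization of (1.1) is indeed straightforward; the sketch below uses explicit Euler in time and discrete convolutions for the competition and mutation integrals, with purely illustrative choices of b, d, I and K (none of them comes from the text).

```python
import numpy as np

def step(u, x, dt, b, d, I, K):
    """One explicit Euler step for Eq. (1.1) on a uniform grid x.

    The competition integral (I * u)(x) and the mutation term
    M(u) = (smoothing of u by K) - u are computed by discrete convolution
    (K is assumed normalized so that dx * K.sum() == 1).
    """
    dx = x[1] - x[0]
    comp = dx * np.convolve(u, I, mode="same")     # int I(x-y) u(y) dy
    mut = dx * np.convolve(u, K, mode="same") - u  # M(u)
    return u + dt * ((b - d - comp) * u + mut)

# toy run on [-2, 2]: fitness peaked at x = 0, uniform competition
x = np.linspace(-2.0, 2.0, 201)
dx = x[1] - x[0]
u = np.exp(-x**2 / 0.02)                # initial bump of traits near 0
b = np.ones_like(x)
d = 0.25 * x**2
I = np.ones_like(x)                     # competition through total mass only
K = np.exp(-x**2 / 0.02)
K /= K.sum() * dx                       # normalize the mutation kernel
for _ in range(200):
    u = step(u, x, 1e-3, b, d, I, K)
```

The scheme is only first order and purely illustrative; it is the rescaled version (1.2) below, not (1.1), that is numerically delicate.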

The scaling of fast reaction and small mutations
Eq. (1.1) is easy and fast to solve. However in most situations, small parameters appear and complicate matters. This is a consequence of two small scales that are typical for this problem:
• The rate of mutation is very small in comparison to the reproduction rate. As we wish to see the evolution of traits generated by the mutations, this means that we need to rescale the equation in time and will hence get a large reproduction rate.
• The size of each mutation is small.
To simplify our equations further, let us now assume that b = 1 and denote r(x) = 1 − d(x) the reproduction rate of an individual without competition. Taking the two scalings into account, one has in fact to deal with an equation like
∂ t u ε (t, x) = (1/ε) ( r(x) − ∫ R I(x − y) u ε (t, y) dy ) u ε (t, x) + M ε (u ε )(t, x), (1.2)
where the mutation kernel now reads
M ε (u)(t, x) = (1/ε) ∫ R K(z) ( u(t, x + ε z) − u(t, x) ) dz.
Note that M ε (f ) is indeed of order 1 if f ∈ C 1 for instance. Eq. (1.2) is now much more delicate. The properties of the solution might depend on ε (its smoothness for instance) and solving it numerically can again become very costly if ε is too small (a typical value for many applications would be ε ∼ 10^−4). Therefore one would wish to derive a new model as ε → 0.
Eq. (1.2) is strongly similar to a reaction-diffusion equation with a strong reaction term. However a crucial difference here is that the reaction term is non local. As we will see this completely changes the behaviour of the solution.
This scaling was introduced in [17] and formally studied there. We briefly reproduce the main argument here. The starting point is to introduce a large deviation scaling (which makes sense in view of the original probabilistic interpretation of the model)
ϕ ε (t, x) = ε log u ε . (1.3)
Then it is easy to see that Eq. (1.2) becomes
∂ t ϕ ε = r − I ⋆ u ε + H ε (ϕ ε ), (1.4)
where
H ε (f ) = ∫ R K(z) ( e^{(f (x+ε z)−f (x))/ε} − 1 ) dz. (1.5)
Using the theory of viscosity solutions to Hamilton-Jacobi equations, one can pass to the limit in (1.4) and obtain for the limit ϕ of ϕ ε
∂ t ϕ = r − I ⋆ u + H(∂ x ϕ), (1.6)
with u the weak limit (in the space of measures) of u ε and
H(p) = ∫ R K(z) ( e^{z p} − 1 ) dz. (1.7)
Of course at the limit, ϕ and u are no longer connected by a relation like (1.3). Therefore Eq. (1.6) is no longer closed and the question of how to recover u from ϕ is one of the main difficulties here, though it is now fortunately better understood.
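For the reader's convenience, the computation behind (1.4)-(1.5) can be spelled out; it is only the formal chain rule for the transform (1.3), assuming the mutation term M ε has the integral form consistent with (1.5):

```latex
% Hopf--Cole: u_eps = exp(phi_eps/eps), so d_t u_eps = (1/eps) d_t phi_eps u_eps.
% Dividing Eq. (1.2) by u_eps/eps term by term gives
\partial_t \varphi_\varepsilon
  = r - I\star u_\varepsilon
  + \underbrace{\int_{\mathbb R} K(z)\Bigl(
      e^{(\varphi_\varepsilon(t,x+\varepsilon z)-\varphi_\varepsilon(t,x))/\varepsilon}
      - 1\Bigr)\,dz}_{H_\varepsilon(\varphi_\varepsilon)} ,
% and for smooth phi, (phi(x+eps z)-phi(x))/eps -> z d_x phi, whence
H(p) = \int_{\mathbb R} K(z)\,\bigl(e^{z\,p}-1\bigr)\,dz ,
\qquad p = \partial_x\varphi .
```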

The problem with Eq. (1.2)
Let us focus here on one other delicate issue with this approach, which is the main motivation for the current work. In the scaling under consideration, one has growth or decay of order exp(C/ε). In particular the ratio between the maximal value of u ε and its value at most other points is of this order exp(C/ε). However if we come back to the starting point, which means a total population of 10^10 − 10^12 and ε ∼ 10^−4, then there is an obvious problem. If (and it can be proved) u ε is of order exp(−C/ε) over a fixed interval of traits I, then the total population over this interval in fact contains much less than one individual! This has several consequences. First of all it means that, given the scales under consideration here, the limit from the probabilistic models to the deterministic one is not fully justified (the total number of individuals would have to be much higher with respect to ε). This, in itself, could be ignored since with 10^12 individuals the obvious alternative, using the probabilistic model, is not practical anyway. Hence one could reasonably accept to still work with the deterministic model if its predictions were qualitatively in agreement with the behaviour of the stochastic one. That is not the case.
The first indication of a serious flaw comes from the case where we do not put any mutations in the model. Setting M ε = 0 in (1.2) still allows one to perform the same analysis. In such a case, one would expect that nothing happens anymore at the limit, as no evolution should be possible without mutations. However this is not the case, and numerical simulations in particular show a remarkably similar behaviour between the cases with and without mutations (see [30] for example). This phenomenon is entirely due to the persistence of very small subpopulations which should actually go extinct but are kept by our deterministic model and can therefore re-emerge later if the conditions are right.
A second important problem concerns the issue of branching. Biologically speaking, branching is the process by which one population divides itself into two (and then possibly more) subpopulations. Mathematically one expects that at the limit, u will be a sum of moving Dirac masses,
u(t) = Σ i α i (t) δ x i (t) .
Branching then corresponds to the case when one Dirac mass becomes two. This phenomenon is one of the main motivations to study models like (1.2) instead of the adaptive dynamics approach for instance (see [15] for an introduction). At the limit (1.6), branching occurs at infinite speed: if a Dirac mass α δ x(t) divides itself at some time, the two new masses separate instantaneously. Instead probabilistic models predict a finite speed at branching... Those qualitative disagreements are important and should be corrected. One would then hope to derive models that are both capable of dealing with very large populations and still treat correctly the small subpopulations. How to keep the stochastic effects for the small populations is for the moment a completely open question. But one could at least try to truncate the populations with less than 1 individual, which is expected to be enough to correct the qualitative flaws of the deterministic models.
There are already some attempts in this direction, see [30] and very recently [29]. The correction proposed in [30] consists in studying equations like (1.8), obtained by adding a mortality term to (1.2). The added mortality term has the effect of killing (in times of order ε) any population with a density less than ū ε . From the modeling point of view, this is quite satisfactory as it corrects most of the problems with (1.2). Unfortunately the mathematical analysis of an equation like (1.8) is for the moment intractable. The only situation that is understood is when ū ε is chosen like exp(−φ/ε). However this is exactly the scaling that was giving less than 1 individual, and it is not satisfactory. Instead, given our scalings, one would like to work with ū ε that are polynomial in ε.
However that would mean truncating every value less than C ε log ε in ϕ ε , which at the limit, since ϕ ≤ 0, would truncate everything...

The proposed correction: Cooperative interactions
We propose here a correction that is inspired from the case with sexual reproduction. In the present context of mostly asexual reproduction, it can however be better understood as taking into account some cooperative effects between the individuals.
We can prove that it completely corrects all the abnormal behaviours of (1.2) at the limit. In addition it is for the moment the only correction for which one can derive rigorously the limit. That means in particular that one can obtain numerical simulations for realistically low ε.
There exists a well-known phenomenon in the case of sexual reproduction which drives small populations to extinction. If the population is too small then the probability of meeting a partner is very low and hence the birth rate declines. This is the idea that we follow here.
Let us explain it first in the context of sexual reproduction. Consider a subpopulation with trait x. If the density of population at x, u ε (x), is below a critical value ū ε , then we assume that the probability for an individual with trait x to meet a partner with the same trait is too low. Instead this individual will reproduce with an individual with a different trait y such that u ε (y) is large enough. Typically one can then expect that y should be the closest trait to x with a large enough population. However as the two traits are different, the individuals are not as compatible and the corresponding birth rate should decrease with the distance |x − y|.
The same phenomenon can be seen if one assumes that individuals of similar traits may cooperate. Selection-mutation models rightly focus on competition between individuals as the main interaction mechanism in order to observe selection (and hence evolution). Nevertheless cooperation usually exists as well; we assume here that it takes place at a smaller scale than the competitive interactions.
We are for example led to add to the reproduction rate a cooperative effect of the kind (1.9), for a small parameter η ε and a decreasing Φ. Note that the maximum effect of cooperation is capped (obviously birth rates cannot blow up). Of course if one expects to see an effect on the small populations then it is necessary that the reproduction rate without cooperation be negative, which means that the cooperation effect should be strong enough.
It is possible to simplify (1.9) (for numerical purposes for instance) while keeping the same structure and the same effects. When one combines it with the lower value of the reproduction rate and assumes for instance a linear Φ, then it is essentially possible to replace (1.9) by a regularization of (1.10). Those are the types of corrections that we consider here, with K large enough.

A brief overview of the various approaches for selection, mutation dynamics
The stochastic approach is based on individual-based models. As we mentioned before, they are related to evolutionary PDE models such as those considered here or in [13,23] through a scaling of large population (see again [8]). Using a simultaneous scaling of large population and rare mutations, a stochastic limit process was obtained in [6] in the case of a monotype population (i.e. when the limit process can only be composed of a single Dirac mass), and in [11] when the limit population can be composed of finitely many Dirac masses. Other features can be added to those models, age-dependence for instance as in [26].
At the deterministic level, one approach consists in studying a simultaneous scaling of mutation and selection, in order to obtain a limit dynamics where transitions from a single Dirac mass to two Dirac masses could occur; the famous branching phenomenon that we have already mentioned. This is where the deterministic models presented here fit. More is said about the contributions in that area in the next subsection.
Another approach consists in completely separating the two scalings. One then tries to directly characterize evolutionary dynamics as sums of Dirac masses under biologically relevant parameter scalings, instead of obtaining it as some limit. This is a key point in adaptive dynamics [22,27,28,14,7].
A classical way of justifying this form of the solution consists in studying the stationary behaviour of an evolutionary model involving a scaling parameter for mutations, and then letting this parameter converge to 0. The stationary state has been proved to be composed of one or several Dirac masses for various models (for deterministic PDE models, see [4,5,13,23,18], for Fokker-Planck PDEs corresponding to stochastic population genetics models, see [3], for stochastic models, see [33], for game-theoretic models, see [12]).
Closely related to these works are the notions of ESS (evolutionarily stable strategies) and CSS (convergence stable strategies) [28,15], which allow one in some cases to characterize stable stationary states [4,13,23,12].
In this context the phenomenon of evolutionary branching, which is especially important for us, simply corresponds to the direct transition from a population composed of a single Dirac mass to a population composed of two Dirac masses, [28,19,20]. The scalings and the first formal results have been obtained in [17]. This was followed by several works on other models and on the corresponding Hamilton-Jacobi PDE [5,31].
The main difficulty, as mentioned before, is the identification of the weak limit of u in terms of ϕ. There is usually one additional piece of information, which comes from uniform bounds on the total mass: max x ϕ(t, x) = 0 for all t. One solution is then to try to see the undetermined I ⋆ u as a Lagrange multiplier for this additional constraint.
For example if ϕ solves (1.6) then our constraint should imply that r−I ⋆u is non positive on the set {ϕ = 0} and vanishes on at least one point.
Unfortunately, it is in general not possible, even formally, to identify I ⋆ u in this way. For instance if ϕ attains its maximum at only one point x̄(t), then one has only the single constraint r(x̄(t)) − I ⋆ u(t, x̄(t)) = 0, which is not enough.
There are however cases of competitive interactions where this suffices. Suppose for instance that I = 1. Then I ⋆ u is just the total mass and formally one simply expects that
∫ R u(t, dy) = r(x̄(t)), where x̄(t) is the point where ϕ(t, ·) vanishes.
This is typical of so-called one-resource interactions, meaning that the individuals interact only through one average quantity. In general that means considering interactions of the type R( ∫ R u(t, dy) ) with an increasing R.
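In this one-resource case the closure can be read off formally from the constraint max ϕ = 0: at a point x̄(t) where the maximum is attained, ∂ t ϕ = 0, ∂ x ϕ = 0 and H(0) = 0, so (1.6) reduces to

```latex
0 = r(\bar x(t)) - \int_{\mathbb R} u(t,dy) + H(0)
\quad\Longrightarrow\quad
\int_{\mathbb R} u(t,dy) = r(\bar x(t)) .
```

The total mass is thus slaved to the reproduction rate at the dominant trait; this is only a formal computation, valid while the maximum point evolves smoothly.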
For models of the type we consider here, rigorous results (especially for the well posedness of the Hamilton-Jacobi eq. at the limit) mainly only exist in this case with just one resource, see [2] and [1], [24] (one resource but multidimensional traits).
However it is possible to extend the theory, see [9]. Formally one expects the limit u to satisfy the following conditions
r(x) − I ⋆ u(t, x) ≤ 0 on R, with equality on supp u(t, ·). (1.12)
This corresponds to the definition of an evolutionarily stable strategy. If one assumes a strong competition, i.e. that the operator I⋆ is positive, then there exists a unique measure satisfying (1.12) (see [23]). This is the approach followed here as well. With respect to the case with one resource, there are nevertheless additional difficulties. It is much harder to control the time oscillations of the reaction term, which for one resource is usually increasing and hence BV in time.

The model studied and the results
According to the previous considerations, we study the following equation, where we recall that the mutation kernel M ε reads as in (1.2). Following [17], one defines ϕ ε and obtains the corresponding equation on ϕ ε .
First of all we need to make sure that only the traits in a compact interval are important (to avoid traveling wave effects for instance). This can simply be ensured by asking all traits out of an interval to have a negative reproduction rate. Of course the whole population should not vanish immediately, and it is necessary that a non negligible part be concentrated on [−R, R]. The total population should also be bounded. We assume that the individuals interact through a strong competition. This allows us to define the ESS uniquely as per
Proposition 2.1 Assume (2.8), (2.6) and that r, I ∈ C(R). For any closed Ω ⊂ R, there exists a unique finite nonnegative measure µ(Ω) satisfying the ESS conditions on Ω.
Note however that sometimes one may have uniqueness of the environmental variables whereas the population measure is not unique (and (2.8) is violated); most of our method would remain valid in such a case. This is the situation of a single resource, where almost nothing is required, see [2]. Nevertheless in more general situations, the conditions for which this kind of property holds are not currently identified.
We need an additional assumption to make sure that the ESS can only be concentrated on a set of measure 0. This condition is in part technical and is required to avoid some time oscillations. However it also corresponds to the natural biological idea that only a few traits can be present at a given time.
It is probably hard to check (2.9) in specific models, but it is at least satisfied in large classes of parameters. One easy example is if the derivatives r (k) and I (k) are positive (or negative) for some k. It has in fact been proved, see [21], that generically in r and I the ESS is discrete (a finite sum of Dirac masses) and hence (2.9) should be satisfied.
As for D ε , we assume that there exists a critical scale φ ε s.t. (2.10) holds. Those assumptions are compatible with corrections of the type (1.9) and (1.10). They introduce a new scale in the problem, −φ ε , which corresponds to a critical population density of exp(−φ ε /ε). In line with our previous considerations, we would like to take this density polynomial in ε, which means −φ ε ∼ ε log ε or, more precisely, −φ ε = C ε log ε for a constant C uniform in ε. There is no need to be more precise on D ε to study the properties of Eq. (2.4). However if one wants to identify the limit, it is necessary to specify what D ε should look like at the limit. Therefore we make the additional assumption that D ε converges, for any fixed nonpositive f ∈ C(R). This corresponds to (1.10), but other shapes would be possible and would work essentially the same. Formally, we can hence expect that as ε → 0, Eq. (2.4) will lead to
∂ t ϕ = r − I ⋆ u − D + H(∂ x ϕ), (2.13)
where the Hamiltonian H now reads
H(p) = ∫ R K(z) ( e^{z p} − 1 ) dz. (2.14)
This is indeed what one can prove:
Theorem 2.1 Assume the bounds on the initial population, (2.8) on the interaction I, (2.9), (2.10) on D ε with (2.11), and (2.12) on the limit of D ε ; assume moreover that ϕ ε (t = 0, ·) converges to a function ϕ 0 for the norm · L ∞ (R) . Then up to the extraction of a subsequence in ε, ϕ ε converges to some continuous ϕ uniformly on any compact subset of [0, T ]×R and ϕ is a solution to (2.13) almost everywhere in t, x with initial condition ϕ(t = 0, ·) = ϕ 0 . In particular the function I ⋆ u ε converges to I ⋆ µ({ϕ(t, ·) = 0}).
From a practical point of view, computing the solution u ε of Eq. (2.1) is often too costly for small ε. This result allows one to approximate the population density u ε for small ε by the simpler µ({ϕ(t, ·) = 0}), where ϕ may be obtained by a discretization of (2.13), in the fashion of those done in [17]. Rigorous numerical analysis of this kind of Hamilton-Jacobi equations is however still very preliminary.
Section 3 gives a short sketch of the proof of Prop. 2.1 and can be safely ignored if one is familiar with [23], [32].
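As an illustration of such a discretization (in the spirit of, but much cruder than, the schemes of [17]), one can advance the ε-level equation ∂ t ϕ ε = r − I ⋆ u ε + H ε (ϕ ε ) with u ε = exp(ϕ ε /ε) by an explicit scheme; the correction term D ε is omitted here and all coefficients below are illustrative.

```python
import numpy as np

def hj_step(phi, x, dt, eps, r, I, K_z, z):
    """One explicit step for d_t phi = r - I*u + H_eps(phi), u = exp(phi/eps)."""
    dx, dz = x[1] - x[0], z[1] - z[0]
    u = np.exp(phi / eps)
    comp = dx * np.convolve(u, I, mode="same")     # (I * u)(x)
    # H_eps(phi)(x) = int K(z) (exp((phi(x+eps z) - phi(x))/eps) - 1) dz
    H = np.zeros_like(phi)
    for kz, zz in zip(K_z, z):
        shifted = np.interp(x + eps * zz, x, phi)  # phi(x + eps z)
        H += kz * (np.exp((shifted - phi) / eps) - 1.0)
    H *= dz
    return phi + dt * (r - comp + H)

eps = 0.05
x = np.linspace(-2.0, 2.0, 101)
z = np.linspace(-1.0, 1.0, 21)
K_z = np.exp(-4.0 * z**2)
K_z /= K_z.sum() * (z[1] - z[0])        # mutation kernel on its own grid
phi = -(x - 0.5)**2                     # population concentrated near x = 0.5
r = 1.0 - 0.5 * x**2
I = np.ones_like(x)
for _ in range(100):
    phi = hj_step(phi, x, 1e-3, eps, r, I, K_z, z)
```

Even this toy version shows the stiffness of the problem: the exponentials in H ε force small time steps as soon as ∂ x ϕ ε is not small, which is why dedicated schemes are needed for realistic ε.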
The proof of Theorem 2.1 is given in the next section and, being quite technical, is divided into several lemmas. Lemma 4.1 just corresponds to the classical a priori estimates on equations like (2.4). Lemma 4.2 essentially follows the steps of [9] and can also be skipped if the reader is familiar with that work.
In the proofs below, C denotes a numerical constant which may change from line to line and which only depends on T , norms of the initial data or the coefficients, but which is always uniform in ε.
We define as usual the distance between a point and a set, dist(x, O) = inf y∈O |x − y|. We also define the semi-distance δ(O 1 , O 2 ) between two sets O 1 and O 2 as usual by δ(O 1 , O 2 ) = sup x∈O 1 dist(x, O 2 ). We denote by M 1 (ω) the set of signed Radon measures on the subset ω of R, equipped with the total variation norm.

Sketch of the proof of Prop. 2.1
A complete proof of this proposition can be found in [23] and [32]. We only show uniqueness and give a rough sketch of the existence part. For a given compact set Ω, assume that we can find two measures µ 1 and µ 2 with support in Ω, both satisfying the ESS conditions on Ω. Summing with the symmetric term, one obtains
∫ I ⋆ (µ 1 − µ 2 ) d(µ 1 − µ 2 ) ≤ 0.
However since Î > 0, the corresponding quadratic form is positive and we can conclude that µ 1 = µ 2 .
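The symmetrization argument can be written out explicitly; assuming both µ i satisfy r − I⋆µ i ≤ 0 on Ω with equality on supp µ i :

```latex
\int (r - I\star\mu_1)\,d\mu_2 \le 0 = \int (r - I\star\mu_1)\,d\mu_1 ,
\qquad
\int (r - I\star\mu_2)\,d\mu_1 \le 0 = \int (r - I\star\mu_2)\,d\mu_2 .
% Summing the two resulting inequalities, the terms in r cancel and
\int I\star(\mu_1-\mu_2)\,d(\mu_1-\mu_2) \le 0 ,
% while \hat I > 0 makes the left-hand side a nonnegative quadratic form
% ( = \int \hat I\,|\widehat{\mu_1-\mu_2}|^2 ), forcing \mu_1 = \mu_2 .
```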
For the existence, one considers the equation for a well chosen initial data µ 0 . If Ω is fully discrete, Ω = {x 1 , ..., x n }, then one may choose µ 0 = Σ i δ x i . If Ω is an interval then one simply takes µ 0 = 1. The general case is trickier and consists in taking an initial measure µ 0 s.t. for any δ > 0, every δ-neighbourhood in Ω has positive mass. It is then easy to obtain lower and upper bounds for the total mass. Note that though there is also a mutation term here, its scaling is completely different from (2.1). The scaling does not involve small mutations and moreover the mutations are of order ε (instead of 1), thus vanishing at the limit.
Passing to the weak limit, µ ε → µ, gives a measure that has the desired properties (though proving this is by no means easy).
Proof of Theorem 2.1

A Priori estimates
We start by stating and proving the obvious a priori estimates that one can obtain for the problem. Those essentially follow the lines of previous works.
We show the following estimates on the solutions to (2.4).
Lemma 4.1 Let ϕ ε be a solution to (2.4) with the assumptions of Theorem 2.1. Then for any T > 0 the estimates below hold, where C only depends on the time T , ∫ R e ϕ 0 ε (x)/ε dx, ∂ x ϕ 0 ε L ∞ (R) and the infimum of ∂ xx ϕ 0 ε (x). Finally the level sets of ϕ ε are uniformly bounded in ε.
Proof. Almost all proofs here are taken directly from [9] and reproduced for the sake of completeness. Some resulting bounds are different and much worse than in this former article, more precisely the lower bounds on ∂ xx ϕ ε and H ε . Nevertheless even for those, the proofs, i.e. the way to obtain the bounds, are very close. As such we may skip some technical details.
Step 0: Upper Bound on the total mass. First notice that because of (2.6), there exists R > 0 s.t.
Let ψ be a smooth test function with support in |x| > R, taking values in [0, 1] and equal to 1 on |x| > R + 1. Using the previous bound, we compute the evolution of ∫ ψ u ε . On the other hand, on the bounded domain [−R − 1, R + 1], as I(x) > 0 for all x, one has a similar estimate for some constant C. Summing the two: since the sum of the first two terms of the r.h.s. is negative if ∫ u ε is larger than a constant independent of ε, this shows that ∫ u ε (t, x) dx remains uniformly bounded on any finite time interval.
Step 1: Bound on ∂ x ϕ ε . This is a classical bound for solutions to Hamilton-Jacobi equations and we have to check that it remains true uniformly at the ε level. We follow exactly [9] for instance. Compute By our assumptions and the upper bound on the total mass First note that this shows that ∂ x ϕ ε (t, .) L ∞ remains finite over a (possibly very short) time interval [0, t ε ]. Now we use the classical maximum principle where the constant α > 0 will be specified later, we have Therefore, choosing α = εe −C Ct,ε , we obtain for a constant C independent of t < t ε and of ε. Using a similar argument for the minimum, we deduce that t ε > T and that ∂ x ϕ ε is bounded on [0, T ] × R by a constant depending only on T and ∂ x ϕ 0 ε L ∞ (R) .
Step 2: First bound on H ε (ϕ ε ) and bounds on ∂ t ϕ ε and ϕ ε . First remark that This is not optimal, more precisely the lower bound is atrocious, but will suffice for the moment. Directly from Eq. (2.4), hence ending the proof of the full Lipschitz bound on ϕ ε . To get the upper bound on ϕ ε , we use this Lipschitz bound to get Hence the bound on the total mass yields that ϕ ε ≤ ε log 1/ε + C ε.
Step 3: Lower bounds on ∂ xx ϕ ε and H ε . Here we start paying for the introduction of an additional mortality term, as the second derivative of D ε is not bounded uniformly in ε. As before we use a maximum principle, from (2.4) and (2.10). The last term is of course nonnegative and so, with the same argument as before, we get a bound on d/dt inf ∂ xx ϕ ε . This proves the uniform lower bound on ∂ xx ϕ ε . Let us turn to the sharp lower bound on H ε (ϕ ε ), by differentiating once more, where H is defined as in (2.14) and since K is compactly supported. Because we assumed that ∫ R zK(z) dz = 0, we have H(p) ≥ 0 for any p, which gives the final bound.
Therefore, the set {x : ϕ ε (t, x) ≥ −l} is bounded for any l, uniformly in ε.
Step 5: Lower bound on the total mass and ϕ ε . By the previous steps we know that D ε = 0 whenever u ε ≥ e −φε/ε . Therefore by (2.7), choose R 0 s.t. min |x|≤R 0 r > 0 and integrate (2.1). This implies, by (2.7), a lower bound which allows us to conclude that the total mass remains bounded from below uniformly in ε.
In particular by step 4, this lower bound means that max ϕ ε ≥ −φ ε .

Passing to the limit in the equation: First steps
By the uniform bounds provided by Lemma 4.1, we can extract a subsequence in ε (still denoted by ε) and find a function ϕ on [0, T ] × R such that ϕ ε → ϕ uniformly on compact sets, by the Arzelà-Ascoli theorem. Note that by the upper and lower bounds on ϕ ε , one has max ϕ = 0. Since u ε is uniformly bounded in L 1 , it converges (still after an extraction) in the weak-* topology of measures to some u ∈ L ∞ ([0, T ], M 1 (R)). This in turn implies a weak convergence of I ⋆ u ε . Note that it is indeed only a weak convergence in spite of the convolution, because the convolution regularizes only in x and time oscillations are still possible.
Using the notion of viscosity solution (see [2] for instance) and the uniform bound on ∂ x ϕ ε , we may obtain the convergence of H ε (ϕ ε ) to H(∂ x ϕ).
Finally as D ε is uniformly bounded and can hence be assumed to converge weakly to some D ∈ L ∞ , we obtain ∂ t ϕ = r − I ⋆ u(t, .) − D + H(∂ x ϕ). (4.6) Unfortunately D and u are still unidentified. This is now our aim but it will require a much more precise understanding and control of the time oscillations of the set where the population is concentrated.

Continuity in time of the set {x | ϕ = 0}
At the limit, it is possible to show that the points where the population is concentrated move at most at a finite speed; we comment on that again at the very end of the proof. However we do not have yet enough tools to prove such a strong statement. For the time being we will content ourselves with proving a continuity result. The first step is to do it at the limit.
Lemma 4.2 Assume that for some point x 0 , some t > t 0 and δ > 0, x 0 stays at distance at least δ from {ϕ(s, .) = 0} for all s ∈ [t 0 , t]; then ϕ(t, x 0 ) < 0.
Before turning to the next result, let us point out that Lemma 4.2 actually implies the following:
∃τ ∈ C(R + ) with τ (0) = 0, s.t. ∀s ≥ t, ∀x ∈ {ϕ(s, .) = 0}, ∃y ∈ {ϕ(t, .) = 0} with |y − x| ≤ τ (s − t). (4.7)
One of the main ideas in the following proofs is to follow the characteristics corresponding to the Hamiltonian H. However Eq. (2.4) involves the modified Hamiltonian H ε which does not have characteristics per se (it is non local for example). Some modifications are hence needed; they follow the usual ideas for parabolic problems, which is why we use the following lemma.
Lemma 4.3 Introduce the intermediary scale ε̄ and consider any interval [a(t), b(t)] for ϕ ε (t, .). Then the lower bound is direct once one recalls that for any x ∈ [a(t), b(t)] the lower bound (4.2) applies. Let us turn now to the upper bound, which is trickier and involves the velocity V .
In general this maximum is attained at one (or several) point x ∈ ω m (t). First note that at such a point, one has ∂ x (ϕ ε (t, x) + ψ ε (t, x)) = 0.
By (4.8) this implies a bound at such a point. Note that by the definition of the maximum, ∀z ∈ supp K, (4.11) applies. Note finally that (4.8) and (4.11) allow us to conclude. Let us now turn to Lemma 4.2.
Proof of Lemma 4.2.
Therefore the result of the lemma is obviously true if t is such that τ t 0 (t) < δ.
Step 2: The connection between τ t and the Lemma. One does not have in general a uniform control on τ t 0 . In that case, one would find a sequence t n , a number δ > 0 s.t. ∀η > 0, lim sup τ tn (η) > δ.
The result of Lemma 4.2 would precisely rule this out and we argue by contradiction. Denote by t the first time when such a jump occurs. That means that • τ can be chosen uniform till t, i.e. ∃τ s.t. ∀s < t and ∀η ∈ [0, t − s], τ s (η) ≤ τ (η).
• There is a jump at t of size δ > 0, which means ∃x 0 , ∃t 0 < t s.t. ϕ(t, x 0 ) = 0 but Note that one can take t 0 as close to t as one wishes and in particular we may freely assume that t − t 0 is small enough s.t. τ (t − t 0 ) < δ/8.
Step 3: The contradiction. First of all, note that as ϕ ε → ϕ in L ∞ norm, one has that for ε small enough and some interval I ε with |I ε | → 0 as ε → 0 Indeed one observes that And by the continuity of ϕ for almost every s.
The same result can be obtained for the closest point on the right. By the assumption on x 0 in the Lemma, this implies that I 1 remains at a distance larger than δ/4 > Cφ ε of {ϕ ε (s, .) ≥ −φ ε }.
By the properties of D ε , we deduce that for any s ∈ [s 0 , t], by choosing ε small enough. Now let us apply Lemma 4.3 to I 1 to deduce that max Note that by (4.2) To conclude, just observe that ϕ(t 0 , y 0 ) = 0 so that Therefore taking ε small enough, one concludes that which gives the desired contradiction.

Control of the oscillations
Before obtaining the continuity of the set {ϕ ε ≥ −φ ε } which is our main goal, we have to control the oscillations in time of the reaction term.
We turn to the next proof.
Proof of Lemma 4.6. For any s ∈ [t 0 , t] \ I ε , either x 0 is at distance less than ν from {ϕ ε (s, .) ≥ −φ ε } or it is at distance larger than ν. Let us decompose [t 0 , t] accordingly into I ε ∪ I ν ∪ J ν , where I ν consists of the set of times where x 0 is at distance less than ν from {ϕ ε (s, .) ≥ −φ ε }.
We observe that on J ν , by the properties of D ε and since on {ϕ ε (s, .) ≥ −φ ε } one has by Lemma 4.5 that r − I ⋆ u ε (s, .) is less than ε̄. On the other hand, on I ν , still by Lemma 4.5, one has the corresponding bound. We recall that |I ε | ≤ ε 3/8 , which is asymptotically much smaller than ε̄ 2 . Therefore either |J ν | ≥ (t − t 0 ) ν and we find the first possibility, or |J ν | ≤ (t − t 0 ) ν and we find the second one. The next proof is
Proof of Lemma 4.7. Note first of all that by Lemma 4.2, one may in fact take any sequence t n → t 0 for any x 0 ∈ Ω t 0 .
Choose any t 0 , any η > 0 and any x 0 in {ϕ(t 0 , .) = 0}. Observe that for any t > t 0 we first deduce (4.15). Next find x t , the closest point to x 0 s.t. ϕ(t, x t ) = 0. By definition x t − x 0 → 0. We define the small interval around x t for t 0 ≤ s ≤ t. Now define ν s.t.
Note that as long as ε is small enough one has ν ≥ ε̄. Moreover ν can be chosen arbitrarily small by taking t close enough to t 0 . By applying Lemma 4.6 for this ν, (4.16) means that we are necessarily in the second case. We combine this inequality with (4.15). Now, we define the average and, from (4.16) and (4.17), we get a bound. By the fundamental property (2.9), we deduce the conclusion. To conclude, take t → t 0 s.t. ν → 0 and then ε small enough to obtain that |Ω t 0 | = 0.
then for ε small enough with respect to δ, for some constant C independent of ε.
Unfortunately, Lemma 4.8 does not imply a result like (4.7) with a function τ uniform in ε. It guarantees that there cannot be jumps of significant size in the set {ϕ_ε ≥ −φ_ε}, but that set could still propagate very fast. However, we can combine it with Lemma 4.7 to finally deduce a uniform-in-ε continuity in time for the support of {ϕ_ε ≥ −φ_ε}.
then for ε small enough with respect to δ and τ, for any t < t_0 + τ We turn to the proofs.
Proof of Lemma 4.8. Let us first characterize the "good" t_0. We define Ĩ_ε Note that of course I_ε ⊂ Ĩ_ε (just take t = t_0 + ε^{3/4}). We can estimate |Ĩ_ε| in the following way: if t_0 ∈ Ĩ_ε, then there exists an index k ≥ 0 s.t. t_0 ∈ Ĩ_ε^k, where Ĩ_ε^k is defined as the union of the intervals [i 2^k ε^{3/4}, (i + 1) 2^k ε^{3/4}] for those i s.t.
For s ∈ [t_0, t] \ I_ε, one has that max I_1(s) Apply now Lemma 4.5: if s ∈ I_ε, then this implies that max provided that ε is small enough with respect to δ.
We now apply Lemma 4.3 to I_1 to deduce that max provided that ε is small enough with respect to δ, which concludes the proof.
Let us choose such a t_0 with t_0 ∉ Ĩ_ε as well, and denote the corresponding interval Ī_ε. Now for any δ, divide the domain into intervals I_i of size δ/2, and denote m = sup By the previous property, m < 0. Define τ = −m/(3 ‖ϕ‖_{W^{1,∞}}). Now for any x_0 at distance larger than δ from {ϕ_ε(t_0, ·) ≥ −φ_ε}, there exist y_1 < x_0 and y_2 > x_0 s.t.
Obviously, for any t < t_0 + τ, ϕ(t, y_i) ≤ 2m/3, and hence for any y s.t. |y − y_i| ≤ τ, then provided ε is chosen small enough. Now we apply Lemma 4.8 a first time with δ = τ, which means that for ε small enough with respect to τ, there cannot be a jump of size τ in then by Lemma 4.8 with δ = τ, it means that for some s ∈ [t_0, t] one can find a point y with |y − y_1| ≤ τ or |y − y_2| ≤ τ s.t.
Hence we know that x_0 stays at distance δ/2 from the set {ϕ_ε ≥ −φ_ε}. We apply Lemma 4.8 once more to conclude.

Passing to the limit in the equation: Final steps
The first step is to characterize the limit of D_ε. Of course there is a natural candidate for it, given in (4.20). By (4.13), we essentially know that at most times t, the set {ϕ_ε(t, ·) ≥ −φ_ε} is included in a neighborhood of {ϕ(t, ·) = 0}. However, to prove (4.20), we need to show the converse, namely that {ϕ(t, ·) = 0} is included in a neighborhood of {ϕ_ε(t, ·) ≥ −φ_ε} at most times t. This is what the following lemma shows.
Lemma 4.11. For any fixed δ > 0, denote Then J_ε → 0 as ε → 0, and consequently (4.20) holds.
The last step is to identify the weak limit u of u_ε. Fortunately, we now have all the required tools and can follow the same ideas as in [9]. Let us therefore define where µ is given by Prop. 2.1.
We prove the following intermediary result, where τ is given by (4.7).
Then simply by passing to the limit one deduces
Corollary 4.13. For any fixed t_0, This means that any t_0 is a Lebesgue point of I⋆u, and then necessarily that I⋆u(t_0, ·) = I⋆µ(t_0, ·), which, as Î > 0, lets us conclude that u = µ. We have identified all the terms in the limiting equation, which reads This finishes the proof of our main theorem. It only remains to give the proofs of Lemmas 4.11 and 4.12.
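The Lebesgue point step above can be sketched in formulas. The displays below are a hedged reconstruction, not the paper's exact statements: we assume that Corollary 4.13 controls time averages of I⋆u, and that Î > 0 means, for instance, that the Fourier transform of the kernel I is positive (which makes convolution with I injective).

```latex
% Hedged reconstruction of the Lebesgue point argument.
% If Corollary 4.13 yields, for every fixed $t_0$ and small $\tau > 0$,
\[
  \frac{1}{\tau} \int_{t_0}^{t_0+\tau} I \star u(s,\cdot)\, \mathrm{d}s
  \;\longrightarrow\; I \star \mu(t_0,\cdot)
  \qquad \text{as } \tau \to 0,
\]
% while every $t_0$ is a Lebesgue point of $t \mapsto I \star u(t,\cdot)$,
% then the two limits of the same averages must coincide:
\[
  I \star u(t_0,\cdot) = I \star \mu(t_0,\cdot)
  \qquad \text{for every } t_0 .
\]
% If moreover $\hat I > 0$ (positive Fourier transform), convolution
% with $I$ is injective on finite measures, hence $u = \mu$.
```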
We conclude that for any t_0 ∈ Ī_ε there exists t > 0, chosen uniformly in ε, s.t.
Proof of Lemma 4.12. The steps are mostly the same as in [9]. We give them for the sake of completeness but we do not repeat all the details here. Part of the proof can also be simplified by using the more precise results that were proved before in the present article.
Step 1: The functional. We look at the evolution of a functional of the difference u_ε(t, x) dx − dµ_{t_0}(x).

Concluding Remarks: Finite speed of propagation of the support
It is now easy to obtain additional properties of the limit ϕ, for instance the finite speed of propagation of the support. We quickly sketch here how such properties can be proved. Simply by following the characteristics of the Hamilton-Jacobi equation, one obtains that the set {ϕ = 0} can only propagate at finite speed V. Finally, let us mention that some time regularity is also known for µ. For instance, the set {ϕ = 0} is continuous in time by Lemma 4.2. By the stability result in [23], one deduces that µ({ϕ(t, ·) = 0}) is continuous in time. As a matter of fact, the finite speed of propagation of {ϕ = 0} would even imply Hölder continuity of order 1/2 for µ.
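The characteristics argument can be made concrete under an assumption: that the limit ϕ solves a first-order Hamilton-Jacobi equation with quadratic Hamiltonian and a uniform Lipschitz bound, as is standard in this literature. The precise equation is not restated in this section, so the displays below are a sketch under that assumption, not the paper's exact formulas.

```latex
% Hedged sketch of finite speed of propagation. Assume an equation of the form
\[
  \partial_t \varphi = |\partial_x \varphi|^2 + r(x) - I \star \mu(t,x),
  \qquad \|\partial_x \varphi\|_{L^\infty} \le L .
\]
% The characteristics $X(t)$ associated with the gradient term satisfy
\[
  \dot X(t) = -2\,\partial_x \varphi\bigl(t, X(t)\bigr),
  \qquad |\dot X(t)| \le 2L =: V ,
\]
% so the zero set can only spread at speed at most $V$: for $t > t_0$,
\[
  \{\varphi(t,\cdot) = 0\}
  \subset
  \bigl\{\, x : \operatorname{dist}\bigl(x, \{\varphi(t_0,\cdot)=0\}\bigr)
               \le V\,(t-t_0) \,\bigr\} .
\]
```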