The Polymorphic Evolution Sequence for Populations with Phenotypic Plasticity

In this paper we study a class of stochastic individual-based models that describe the evolution of haploid populations where each individual is characterised by a phenotype and a genotype. The phenotype of an individual determines its natural birth- and death rates as well as the competition kernel, $c(x,y)$ which describes the induced death rate that an individual of type $x$ experiences due to the presence of an individual or type $y$. When a new individual is born, with a small probability a mutation occurs, i.e. the offspring has different genotype as the parent. The novel aspect of the models we study is that an individual with a given genotype may express a certain set of different phenotypes, and during its lifetime it may switch between different phenotypes, with rates that are much larger then the mutation rates and that, moreover, may depend on the state of the entire population. The evolution of the population is described by a continuous-time, measure-valued Markov process. In our last paper [4], such a model was proposed to describe tumor evolution under immunotherapy. In the present paper we consider a large class of models which comprises the example studied in [4] and analyse their scaling limits as the population size tends to infinity and the mutation rate tends to zero. Under suitable assumptions, we prove convergence to a Markov jump process that is a generalisation of the polymorphic evolution sequence (PES) as analysed by Champagnat and M\'el\'eard [8, 10].


Introduction
Over the last decade there has been increasing interest in the mathematical analysis of socalled stochastic individual based models of adaptive dynamics. These models were introduced in a series of papers by Bolker, Pacala, Dieckmann, and Law [6,7,12]. They describe the evolution of a population of individuals characterised by their phenotypes under the influence of the evolutionary mechanisms of birth, death, mutation, and ecological competition in an inhomogeneous "fitness landscape" as a measure valued Markov process. In these models there appear two natural scaling parameters. The carrying capacity, K, which regulates the size of the population and that can reasonably considered as a large parameter, and the mutation rate (of advantageous mutations), u, that in many biological situations can be taken as a small parameter. In a series of remarkable papers, Champagnat and Méléard [8,10] (and others) have analysed the limiting processes that arise in the limit when K is taken to infinity while at the same time u = u K tends to zero. Under conditions that ensure the separation of the ecological and evolutionary time scales. This means that the mutation rates are so small that the system has time to equilibrate (ecological time scale) between two mutational events. On the time scale where mutations occur (evolutionary time scale), the evolution of the population can then be described as a Markov jump process along a sequence of equilibria of, in general, polymorphic populations. An important (and in some sense generic) special case occurs when the mutant population fixates while the resident population dies out in each step. The corresponding jump process is called the Trait Substitution Sequence (TSS) in adaptive dynamics. Champagnat [8] derived criteria in the context of individual-based models under which convergence to the TSS can be proven. The general process is called the Polymorphic Evolution Sequence (PES) [10]. Here the limit is describes as a jump process between possibly polymorphic equilibria of systems of Lotka-Volterra equations of increasing dimension.
In the present paper we extend this analysis to models where an additional biological phenomenon is present, the so-called phenotypic plasticity. By this we mean the following. Individuals are no longer described by their phenotype, but by both their genotype and their phenotype. Moreover, an individual of a given phenotype can express several phenotypes and it can change its phenotype during the course of its lifetime.
Our original motivation for this comes from applications to cancer therapy, where it is wellknown that phenotypic switches ("phenotypic plasticity") is of utmost importance and in fact a major obstacle to successful therapies (see, e.g. [17] and references therein). For a first attempt at modelling specific scenarios in the framework of individual based stochastic models, see [4]. However, phenotypic switches without mutations are certainly relevant in many if not most biological systems.
Here we take a broader look at a large class of models. By expanding the techniques of [10] we prove that the microscopic process converges on the evolutionary time scale to a generalisation of the Polymorphic Evolution Sequences (PES) (cf. Thm. 3.3). The main difference in the proof is that we have to couple the process with multi-type branching processes instead of normal branching processes, which leads also to a different definition of invasion fitness in this setting. Note that we gave in [4] already heuristic arguments why the process should converge to a Markov jump process. The aim of this paper is to give the rigorous statement and its proof.
The remainder of this paper is organised as follows. In Section 2 we define the model, give a pathwise description of the Markov process we are studying and state the convergence towards a quadratic system of ODEs in the large population limit. In Sections 3 we consider the case of rare mutations and fast switches. More precisely, we state the convergence to the Polymorphic Evolution Sequences with phenotypic Plasticity (PESP) in Subsection 3.2 and prove it in Subsection 3.3.

The microscopic model
In this section we introduce the stochastic individual-based model we analyse (cf. [4,15,8,10,5]). The evolutionary process changes populations on a macroscopic level, but the basic mechanisms of evolution, heredity, variation (in our context caused by mutation and phenotypical switching), and selection, act on the microscopic level of the individuals. We describe the evolving population as a stochastic system of interacting individuals, where each individual is characterised by its phenotype and its genotype.
Let l ≥ 1 and X a finite set of the form X = G × P, where G is the set of genotypes and P is the set of phenotypes. We call X the trait space of the population. As usual, we introduce a parameter K ∈ N, called the carrying capacity. This parameter allows to scale the population size and can be interpreted as the size of available space or the amount of available resources. Let M(X) be the set of finite, non-negative measures on X, equipped with the topology of weak convergence, and let M K (X) ⊂ M(X) be the set of finite point measures on X rescaled by K, i.e.
where δ x denotes the Dirac mass at x ∈ X. We model the time evolution of a population as an M K (X)-valued, continuous time Markov process (ν K t ) t≥0 . To account for the process basic mechanisms of evolution and the phenotypic plasticity, we introduce the following parameters: (i) b(p) ∈ R + is the rate of birth of an individual with phenotype p ∈ P.
(ii) d(p) ∈ R + is the rate of natural death of an individual with with phenotype p ∈ P.
(iii) c(p,p)K −1 ∈ R + is the competition kernel which models the competitive pressure an individual with phenotype p ∈ P feels from an individual with phenotypep ∈ P and is inversely proportional to the carrying capacity K.
(iv) s g nat. (p,p) ∈ R + is the natural switch kernel which models the natural switching from phenotype p top of individuals with genotype g.
(v) s g ind. (p,p)(p)K −1 ∈ R + is the induced switch kernel which models the switching from phenotype p top of individuals with genotype g induced by an individual with phenotypê p.
(Compare with the cytokine-induced switch of [4], especially the one of TNF-α (Tumour Necrosis Factor).) (vi) u K m(g) with u K , m(g) ∈ [0, 1] is the probability that a mutation occurs at birth from an individual with genotype g ∈ G, where u K is a scaling parameter.
(vii) M((g, p), (g,p)) is the mutation law, i.e. if a mutant is born from an individual with trait (g, p), then the mutant's trait is (g,p) with probability M((g, p), (g,p)).
Note that most of the parameters depend on the phenotype only and that we explicitly allow that individuals with different genotypes can express the same phenotype and conversely that individuals with the same genotype can express different phenotypes.
Assumption 1. For simplicity we assume that s g ind. (p,p)(p)K −1 = 0 for allp ∈ X whenever s g nat. (p,p) = 0, i.e. depending on the environment the total switching rate can be larger or smaller but not zero or non-zero.
At any time t ≥ 0, we consider a finite population which consist of N t individuals and each individual is characterised its trait x i (t) ∈ X. The state of a population at time t is the measure The population process ν K is a M K (X)-valued Markov process with infinitesimal generator L K , defined, for any bounded measurable function φ : M K (X) → R and for all µ K ∈ M K (X) by .
The first and second terms describe the births (without and with mutation), the third term describes the deaths due to age or competition, and the last term describes the phenotypic plasticity. Observe that the first and second terms are linear (in µ K ), but the third and fourth terms are non-linear. The only difference to the standard model is the presence of the fourth term that corresponds to the phenotypic switches. However, this term changes the dynamics substantially. In particular, the system of differential equations which arises in the large population limit without mutation (u K = 0) is not a generalised Lotka-Volterra system anymore, i.e. has not the forṁ n = n f (n), where f is linear in n (cf. Thm. 2.1 and Def. 3.1).
Remark 1. (i) Since X is finite, we could also represent the population state as an |X|-dimensional vector. More precisely, let E be a subset of R |X| and E K ≡ E ∪ {n/K : n ∈ N 0 }, then for fixed K ≥ 1, the population process can be constructed as Markov process with state space E K by using independent standard Poisson processes (cf. [14] Chap. 11). (ii) For an extension to a non-finite trait space, e.g. if G and P are compact subsets of R k for some k ≥ 1, the modeling of switching the phenotype has to be changed in the following way: Each individual with trait (g, p) ∈ G × P has instead of the natural switch kernel s g nat. (p,p) a natural switch rate s nat. (g, p) combined with a probability measure S (g,p) nat. (dp) on P and instead of the induced switch kernel s g ind. (p,p)(p)K −1 a induced switch kernel s ind. ((g, p),p)K −1 combined with a family of probability measure {S ((g,p),p) ind.
2.1. Explicit construction of the population process with phenotypic plasticity. It is useful to give a pathwise description of ν K in terms of Poisson point measures (cf. [15]). Let us recall this construction. Let (Ω, F , P) be an abstract probability space. On this space, we define the following independent random elements: (i) a convergent sequence (ν K 0 ) K≥1 of M K (X)-valued random measures (the random initial population), (g,p) (ds, di, dθ, dx) ) (g,p)∈X on [0, ∞)×N×R + ×X with intensity measure ds n≥0 δ n (di)dθ x∈X δx(dx).
2.2. The Law of Large Numbers. If the mutation rate is independent of K and the initial conditions converge to a deterministic limit, then the sequence of rescaled processes, (ν K ) K≥1 , converges, almost surely, as K ↑ ∞ to the solution of system of ODEs. This follows directly from the law of large numbers for density depending processes, see, e.g. Ethier and Kurtz [14], Chap. 11. The following theorem gives a precise statement.
Theorem 2.1. Let u K ≡ 1. Suppose that the initial conditions converge almost surely to a deterministic limit, i.e. lim K↑∞ ν K 0 = ν 0 , where ν 0 is a finite measure on X. Then, for every T > 0, exists a deterministic function ξ ∈ C([0, T ], M F (X)) such that where || . || TV is the total variation norm. Moreover, let n be the unique solution to the dynamical systemṅ Then, ξ is given as ξ t = x∈X n x (t)δ x .
Proof. This result follows from Theorem 2.1 in Chapter 11 of [14], since we can construct the process as described in Remark 1 (i). For more details see [21].
Remark 3. If the trait spaces is not finite, one can obtain a similar result, cf. [15].
3. The interplay between rare mutations and fast switches.
In this section we state our main results. As in previous work, we place ourselves under the assumptions which ensure that a population reaches equilibrium before a new mutant appears. Under these assumptions we prove that the individual-based process with phenotypic plasticity convergences to a generalisation of the PES. Let us start with describing the techniques used in [10].
The key element in the proof of the convergence to the PES used by Champagnat and Méléard [10] is a precise analysis of how a mutant population fixates. A crucial assumption in [10] is that the competitive Lotka-Volterra systems that describes the large population limit always have a unique stable fixed pointn. Thus, the main task is to study the invasion of a mutant that has just appeared in a population close to equilibrium. The invasion can be divided into three steps: First, as long as the mutant population size is smaller than K , for a fixed small > 0, the resident population stays close to its equilibrium. Therefore, the mutant population can be approximated by a branching process. Second, once the mutant population reaches the level K , the whole system is close to the solution of the corresponding deterministic system and reaches an -neighbourhood ofn in finite time. Third, the subpopulations which have a zero coordinate inn can be approximated by subcritical branching processes until they die out.
The first and third steps require a time of order ln(K), whereas the second step requires only a time of order one, independent of K. Since the expected time between two mutations is of order 1/(u K K), the the upper bound on u K in (3.1) guarantees that, with high probability, the three steps of an invasion are completed before a new mutation occurs.
In the first invasion step the invasion fitness of a mutant plays a crucial role. Given a population in a stable equilibrium that populates a certain set of traits, say M ⊂ X, the invasion fitness f (x, M) is the growth rate of a population consisting of a single individual with trait x M in the presence of the equilibrium populationn on M. In the case of the standard model, it is given by Positive f (x, M) implies that a mutant appearing with trait x from the equilibrium population on M has a positive probability (uniformly in K) to grow to a population of size of order K; negative invasion fitness implies that such a mutant population will die out with probability tending to one (as K ↑ ∞) before it can reach a size of order K. The reason for this is that the branching process (birth-death process) which approximates the mutant population is supercritical if f (x, M) is positive and subcritical if f (x, M) is negative. In order to describe the dynamics of a phenotypically heterogeneous population on the evolutionary time scale, we have to adapt the notion of invasion fitness to the case where fast phenotypic switches are present. Since switches between phenotypes associated to the same genotype happen at times of order one, the growth rate of the initial mutant phenotype does not determine the probability of fixation. See [11] for a similar issue in a model with sexual reproduction. In the proof of Theorem 3.3 we approximate the dynamics of the mutant population by a multitype branching process until the reaches a size K (or dies out). A continuous-time multi-type branching process is supercritical if and only if the largest eigenvalue of its infinitesimal generator of is strictly positive (cf. [2,23]). Therefore, this eigenvalue will provide an appropriate generalisation of the invasion fitness.
3.1. The competitive Lotka-Volterra system with phenotypic plasticity. We first consider the large population limit without mutation (u K ≡ 0). Assume tha the initial condition is supported on d traits, (g, p) = ((g 1 , p 1 ), . . . , (g d , p d )) ∈ (G × P) d , and that the sequence of the initial conditions converges almost surely to a deterministic limit, i.e.
By Theorem 2.1, for every T > 0, the sequences of processes ν K ∈ D([0, T ], M K (X)) generated by L K with initial state ν K 0 converges almost surely, as K ↑ ∞, to a deterministic function ξ ∈ C([0, T ], M(X)). Since u K ≡ 0, no new genotype can appear in the population process ν K . Moreover, not every genotype can express every phenotype. <let us describe the support of ν K t more precisely.
For all g ∈ G, let X g be a stationary discrete-time Markov chain with state space P and transition probabilities The Markov chains {X g , g ∈ G} contain only partial information on the switching behaviour of the process ν K , but we see that this is the key information needed later.
In the sequel we work under the following simplifying assumption: For all g ∈ G, all communicating classes of X g are recurrent.
We denote the communicating class associated with (g, p) ∈ G × P by [p] g . This is the communicating class of X g which contains p, i.e. p can be seen as a representative of the class, which has an equivalence relation depending on g. By Assumption 1, this ensures that if we start with a large enough population consisting only of individuals carrying the same trait (g, p), then, after a short time, all phenotypes in the class [p] g will be present in the population, but none of the other classes. Observe that these Markov chains do not describe the dynamics of the whole process. If we allowed transient states this would not imply that the trait would get extinct, since its growth rate could be larger than the switching rate. Thus, [p] g is the set of phenotypes which are reachable in the Markov chain X g with X g 0 = p and the set of traits which can appear in the population process ν K is given by (3.6) With this notation, ξ is given by ξ(t) = x∈X (g,p) n x (t)δ x , where n is the solution of the competitive Lotka-Volterra system with phenotypic plasticity defined below.
Definition 3.1. For any (g, p) ∈ (G × P) d , we denote by LVS (d, (g, p)) the competitive Lotka-Volterra system with phenotypic plasticity. This is an |X (g,p) |-dimensional system of ODEs given bẏ We choose the (possibly misleading) name competitive Lotka-Volterra system with phenotypic plasticity to emphasise that we add phenotypic plasticity (induced by switching rates) in the usual competitive Lotka-Volterra system. Note, however, that the system LVS is not a system of Lotka-Volterra equations. We now introduce the notation of coexisting traits in this context (cf. [10]). Definition 3.2. For any d ≥ 2, we say that the distinct traits (g 1 , p 1 ), . . . , (g d , p d ) coexist if the system LVS (d, (g, p)) has a unique non-trivial equilibriumn(g, p) ∈ (0, ∞) |X (g,p) | which is locally strictly stable, meaning that all eigenvalues of the Jacobian matrix of the system LVS (d, (g, p)) atn(g, p) have strictly negative real parts.
Note that if (g 1 , p 1 ), . . . , (g d , p d ) coexist, then all traits of X (g,p) coexist and the equilibrium n(g, p) is asymptotically stable. We will prove later that if the traits (g 1 , p 1 ), . . . , (g d , p d ) coexist, then the invasion probability of a mutant trait (g,p) which appears in the resident population X (g,p) close ton(g, p) is given by the function where q (g,p) (g,p) is given as follows: Let us denote the elements of [p]g byp 1 ,p 2 , . . . ,p |[p]g| and assume without lost of generality thatp =p 1 . Then, q (g,p) (g,p) is the first component of the smallest solution of c(p i , p)n (g,p) (g, p) y i .
In fact, (1 − q (g,p) (g,p)) is the probability that a single mutant survives in a resident population with traits X (g,p) . We obtain this by approximating the mutant population with multi-type branching processes (cf. proof of Thm. 3.6). The function (1 − q (g,p) (g,p)) plays the same role as the function [ f (y; x)] + /b(y) in the standard case (cf. [10]).
To obtain that the process jumps on the evolutionary time scale from one equilibrium to the next, we need an assumption to prevent cycles, unstable equilibria or chaotic dynamics in the deterministic system (cf. [10] Ass. B).
We write n * and notn to emphasise that some components of n * can be zero. We use the shorthand notation ((g, p), (g,p)) for ((g 1 , p 1 ), . . . , (g d , p d ), (g,p)). Assumption 3 does not have to hold for all traits in X \ X (g,p) , but only for those traits (g,p) which can appear in the resident population by mutation, i.e. only if (g,p)∈X (g,p) m(g)M((g, p), (g,p)) is positive. Remark 4. It is possible to extend the definitions and assumptions for the study of rare mutations and fast switches in populations with non-discrete trait space if one assumes that an individual can change its phenotype only to finitely many other phenotypes. This must be encoded in the switching kernels. More precisely, for all (g, p) ∈ G × P the communicating class [p] g should contain finitely many elements.

3.2.
Convergence to the generalised Polymorphic Evolution Sequence. In this subsection we state the main theorem of this paper and give the general idea of the proof illustrated by an example.
Theorem 3.3. Suppose that the Assumptions 1, 2 and 3 hold. Fix (g 1 , p 1 ), . . . , (g d , p d ) ∈ G × P coexisting traits and assume that the initial conditions have support X (g,p) and converge almost surely ton(g, p), i.e. lim K↑∞ ν K 0 = x∈X (g,p)n x (g, p)δ x a.s.. Furthermore, assume that Then, the sequence of the rescaled processes (ν K t/Ku K ) t≥0 , generated by L K with initial state ν K 0 , converges in the sense of finite dimensional distributions to the measure-valued pure jump process Λ defined as follows: Λ 0 = (g,p)∈X (g,p)n (g,p) (g, p)δ (g,p) and the process Λ jumps for all (ĝ,p) ∈ X (g,p) from with infinitesimal rate m(ĝ)b(p)n (ĝ,p) (g, p)(1 − q (g,p) (g,p))M((ĝ,p), (g,p)). (3.13) Remark 5. (i) The convergence cannot hold in law for the Skorokhod topology (cf. [8]). It holds only in the sense of finite dimensional distributions on M F (X), the set of finite positive measures on X equipped with the topology of the total variation norm. (ii) The process Λ is a generalised version of the usual PES. Therefore, we call Λ Polymorphic Evolution Sequence with phenotypic Plasticity (PESP). (iii) Assumption 3 is essential for this statement. In the case when the dynamical system has multiple attractors and different points near the initial state lie in different basins of attraction, it is not clear and may be random which attractor the system approaches. The characterisation of the asymptotic behaviour of the dynamical system is needed to describe the final state of the stochastic process. This is in general a difficult and complex problem, which is not doable analytically and requires numerical analysis. Thus, we restrict ourselves to the Assumption 3.
We describe in the following the general idea of the proof, which is quite similar to the one given in [10]. The population is either in a stable phase or in an invasion phase. Until the first mutant appears the population is in a stable phase, i.e. the population stays close to a given equilibrium. From the first mutational event until the population reaches again a stable state, the population is in an invasion phase. In fact, the mutant either survives and the population reaches fast a new stable state (where the mutant trait is present) or the mutant goes extinct and the population is again in the old stable state. After this the populations is again in a stable phase until the next mutation, etc..
Note that we prove in the following that the invasion phases are relatively short (O(ln(K))) compared to the stable phase (O(1/u K K)). Since we study the process on the time scale 1/Ku K , the limit process proceeds as a pure jump process which jumps from one stable state to another.
The stable phase: Fix > 0. Let X (g,p) be the support of the initial conditions. For large K, the population process ν K is, with high probability, still in a small neighbourhood of the equilibrium n(g, p) when the first mutant appears. In fact, using large deviation results on the problem of exit from a domain (cf. [16]), we obtain that there exists a constant M > 0 such that the first time ν K leave the M -neighbourhood ofn(g, p) is bigger than exp(V K) for some V > 0 with high probability. Thus, until this stopping time, mutations born from individuals with trait x ∈ X (g,p) appear with a rate which is close to (3.11) ensures that the first mutation appears before this exit time.
The invasion phase: We divide the invasion of a given mutant trait (g,p) into three steps, as in [8] and [10] (cf. Fig. 2). In the first step, from a mutational event until the mutant population goes extinct or the mutant density reaches the value , the number of mutant individuals is small (cf. Fig. 2, [0, t 1 ]). Thus, applying a perturbed version of the large deviation result we used in the first phase, we obtain that the resident population stays close to its equilibrium densitȳ n(g, p) during this step. Using similar arguments as Champagnat et al. [8,10], we prove that the mutant population is well approximated by a |[p]g|-type branching process Z, as long as the mutant population has less than K individuals. More precisely, let us denote the elements of [p]g byp 1 , . . . ,p |[p]g| , then, for each 1 ≤ i ≤ |[p]g|, each individual in Z (carrying trait (g,p i )) undergoes (i) birth (without mutation) with rate b(p i ), (ii) death with rate d(p i ) + (g,p)∈X (g,p) c(p i , p)n (g,p) (g, p) and (iii) switch top j with rate s˜g nat. (p i ,p j ) + (g,p)∈X (g,p) s˜g ind. (p i ,p j )(p)n (g,p) for all 1 ≤ j ≤ |[p]g|. This continuous-time multi-type branching process is supercritical if and only if the largest eigenvalue of its infinitesimal generator, which we denote by λ max , is larger than zero. Hence, the mutant invades with positive probability if and only if λ max > 0. Moreover, the probability that the density of the mutant's genotype, ν K (g), reaches at some time t 1 is close to the probability that the multi-type branching process reaches the total mass K, which converges as K ↑ ∞ to (1 − q (g,p) (g,p)).
In the second step, we obtain as a consequence of Theorem 2.1 that once the mutant density has reached , for large K, the stochastic process ν K can be approximated on any finite time interval by the solution of LVS (d + 1, ((g 1 , p 1 ), . . . , (g d , p d ), (g,p))) with a given initial state. By Assumption 3, this solution reaches the -neighbourhood of its new equilibrium n * ((g, p), (g,p)) in finite time. Therefore, for large K, the stochastic process ν K also reaches with high probability the -neighbourhood of n * ((g, p), (g,p)) at some finite (K independent) time t 2 .
In the third step, we use similar arguments as in the first atep. Since n * ((g, p), (g,p)) is a strongly locally stable equilibrium (Ass. 3), the stochastic process ν K t stays close n * ((g, p), (g,p)) and we can approximate the densities of the traits (g, p) ∈ X ((g,p),(g,p)) with n * (g,p) ((g, p), (g,p)) = 0 by |[p] g |type branching processes which are subcritical and therefore become extinct, a.s..
The duration of the first and third step are proportional to ln(K), whereas the time of the second step is bounded. Thus, the second inequality in (3.11) guarantees that, with high probability, the three steps of invasion are completed before a new mutation occurs. After the last step the process is again back in a stable phase, but with a possibly different resident population, until the next mutation happens. An example: Figure 2 shows the invasion phase of a single mutant with trait (g,p 1 ), which appeared (at time 0) in a population close ton(g, p) (indicated by the dashed lines). In this example the resident population consists of two coexisting traits (g, p 1 ) and (g, p 2 ) and the mutant individuals can switch to one other phenotype only, i.e. [p 1 ]g = {p 1 ,p 2 }. The parameters of the simulation of Figure 2 are given in Table 1. The stable fixed point of the system LVS (2, ((g, p 1 ), (g, p 2 ))) b(p 1 ) = 3 d(p 1 ) = 1 c(p 1 , p 1 ) = 1 c(p 1 , p 2 ) = 0.7 c(p 1 ,p 1 ) = 0.7 c(p 1 ,p 2 ) = 0.7 s g nat. (p 1 ,  Figure 2 isn((g, p 1 ), (g, p 2 )) ≈ (1.507, 0.809). The infinitesimal generator of the multi-type branching process that approximates the mutant population in the first step is approximately 0.879 1.5 2 −0.621 . (3.14) Since the largest eigenvalue of this matrix is positive (≈ 2.016), the mutant population reaches with positive probability the second invasion step (cf. Fig. 2). Moreover, n * ≈ (0, 0, 2.608, 1.608) is the unique locally strictly stable fixed point of the dynamical system LVS (4, ((g, p 1 ), (g, p 2 ), (g,p 1 ), (g,p 2 ))). The dynamical system and hence also the stochastic process reach in finite time the -neighbourhood of this value. The infinitesimal generator of the multi-type branching process that approximates the resident population in the third step is approximately −1.951 The largest eigenvalue of this matrix is negative (≈ −0.951) meaning that the process is subcritical and goes extinct a.s.. Therefore, there exists a time t 3 such that all individuals which carry trait (g, p 1 ) or (g, p 2 ) are a.s. dead at time t 3 .
3.3. Proof of Theorem 3.3. In this paragraph we prove the convergence to the PESP. (The proof uses the same arguments and techniques as [10], which were developed in [8]. However, some extensions are necessary, if fast phenotypic switches are included in the process, which we state and prove in this subsection.) We start with an analog of Theorem 3 of [8]. Part (i) of the following theorem strengthens Theorem 2.1, and part (ii) provides control of exit from an attractive domain in the polymorphic case with phenotypic plasticity.

16)
where n(t, ν K 0 ) ∈ R |X (g,p) | denotes the value of the solution of LVS (d, (g, p)) at time t with initial condition n x (0, ν K 0 ) = ν K 0 (x) for all x ∈ X (g,p) . Note that the measure x∈X (g,p) n x (t, ν K 0 )δ x depends on K, since the initial condition and hence the solution of LVS (d, (g, p)) depends on K. (ii) Let (g 1 , p 1 ), . . . , (g d , p d ) ∈ X coexist. Assume that, for any K ≥ 1, Supp(ν K 0 ) = X (g,p) . Let τ mut. be the first mutation time. Define the first exit time from the ξ-neighbourhood of n x (g, p) by θ K,ξ exit ≡ inf t ≥ 0 : ∃x ∈ X (g,p) : ν K t (x) −n x (g, p) > ξ .
(3.17) Then there exist 0 > 0 and M > 0 such that, for all < 0 , there exists V > 0 such that if the initial state of ν K lies in the -neighbourhood ofn x (g, p), the probability that θ K,M exit is larger than e KV ∧ τ mut. converges to one, i.e. lim K↑∞ sup n K ∈(N/K) |X (g,p) | ∩B (n(g,p)) P θ K,M exit < e KV ∧ τ mut. ν K 0 (x) = n K x for all x ∈ X (g,p) = 0, (3.18) where n K ≡ (n K x ) x∈X (g,p) and B (n(g, p)) denotes the -neighbourhood ofn(g, p). Moreover, (3.18) also holds if, for all (g, p) ∈ X (g,p) , the total death rate of an individual with trait (g, p), d(p) + (g,p)∈X (g,p) c(p,p)ν K t (g,p), (3.19) and the total switch rates of an individual with trait (g, p),

20)
are perturbed by additional random processes that are uniformly bounded byc respectivelys ind. , wherec ands ind. are upper bounds for the parameters of competition and induced switch.
Remark 6. (i) One consequence of the second part of (ii) is that, with high probability, the process stays in the M -neighbourhood ofn x (g, p) until the first time that a mutant's density reaches the value . In other words, let θ K Invasion denote the first time that a mutant's density reaches the value , i.e Then, the probability that θ K,M exit is larger than e KV ∧ θ K Invasion converges to one. We use this result also for the third invasion step. (ii) Sincen(g, p) is a locally strictly stable fixed point of the system LVS (d, (g, p)), there exists a constant M > 0 such that, for all > 0 small enough, for all trajectories n(t) with ||n(0) −n(g, p)|| < , it holds that sup t≥0 ||n(t) −n(g, p)|| < M .
Proof. The main task to prove (i) is to show that a large deviation principle on [0, T ] holds for a sightly modify process and that the ν K has the same law on the random time interval we need to control it. In fact, Theorem 10.2.6 of [13] can be applied to obtain the large deviation principle.
The main task to prove (ii) is to show that the classical estimates for exit times from a domain (cf. [16]) for the jump process ν K can be used. Note that Freidlin and Wentzell study in [16] mainly small white noise perturbations of dynamical systems. However, there also are some comments on the generalisation to dynamical systems with small jump-like perturbations (cf. [16], Sec. 5.4).
The following Lemma describes the asymptotic behaviour of τ mut. and can be seen as an extension of Lemma 2 of [8] or Lemma A.3 of [10].

22)
Moreover, (τ mut. u K K) K≥1 converges in law to an exponential distributed random variable with parameter (g,p)∈X (g,p) m(g)b(p)n (g,p) (g, p) and the probability that the mutant, which appears at time τ mut. , is born from an individual with trait (g, p) ∈ X (g,p) converges to m(g)b(p)n (g,p) (g, p) (g,p)∈X (g,p) m(g)b(p)n (g,p) (g, p) (3.23) as K ↑ ∞.
Proof. There exist constants C > 0 and V > 0, such that on the time interval [0, exp(KV)] the total mass of the population, ν K t (X), is bounded from above by C. Therefore, we can construct an exponential random variable A with parameter C Ku K , where C = C max g∈G,p∈P m(g)b(p), such that A ≤ τ mut. on the event τ mut. < exp(KV) .
The fixed pointn(g, p) is asymptotic stable. Thus, ∃ 0 > 0 : ∀˜ ∈ (0, 0 ) ∃T (˜ ): In words, there exists a finite time T (˜ ) such that all trajectories, which start in the 0 neighbourhood of the fixed point, stay after T (˜ ) in the˜ /2-neighbourhood of the fixed point. Next, we apply the last theorem: By (i), for all˜ ∈ (0, 0 ) ∃T (˜ ) such that, for K large enough, Then, by (ii), there exist 0 > 0 and M > 0: for all˜ ∈ (0, 0 ) there exists V > 0 such that Moreover, for all˜ ∈ (0, 0 ) there exists K 0 ∈ N such that T (˜ ) < ln(K) for all K ≥ K 0 . Thus, setting = M˜ , ends the proof of (3.22), provided that lim K↑∞ P[τ mut. < e KV ] = 1. Again, we can construct for all > 0 two exponential random variables A 1,K, and A 2,K, with parameters a 1 u K K ≡ (g,p)∈X (g,p) u K m(g)b(p)(n (g,p) (g, p) + )K (3.28) and a 2 u K K ≡ (g,p)∈X (g,p) u K m(g)b(p)(n (g,p) (g, p) − )K (3.29) such that A 1,K, ≤ τ mut. ≤ A 2,K, on the event {T (˜ ) < τ mut. < e KV }, because u K Ke KV ↑ ∞ as K ↑ ∞. Therefore, for all > 0, the probability of the event {T (˜ ) < τ mut. < e KV } converges to one as K goes to infinity. Moreover, the random variables A 1,K, u K K and A 2,K, u K K converge both in law to the same exponential distributed random variable with parameter (g,p)∈X (g,p) m(g)b(p)n (g,p) (g, p) (3.32) as first K ↑ ∞ and then → 0. The random variables A, A 1,K, and A 2,K, can easily be constructed by using the pathwise description of ν K (cf. [3] or [9]).
Theorem 3.6 (The three steps of invasion). Let (g 1 , p 1 ), . . . , (g d , p d ) ∈ X coexist. Assume that, for any K ≥ 1, Supp(ν K 0 ) = X (g,p) ∪ {(g,p)}. Let τ mut. denote the next mutation time (after time zero) and define and ∀x {x ∈ X : n * x ((g, p), (g,p)) > 0} : ν K t (x) = 0 . Assume that we have a single initial mutant, i.e. ν K 0 (g,p) = 1/K. Then, there exist 0 > 0, C > 0, and M > 0 such that for all is the invasion probability defined in (3.8) and ∀η > 0, lim The structure of the proof is similar to the one of Lemma 3 in [8] (cf. also Lem. A.4. of [10]). However, we have to extend the theory to multi-type branching processes. Thus, the proof is not a simple copy the arguments in [8]. Before proving the theorem, let us collect some properties about multi-type continuous-time branching processes. Most of these can be found in [2] or [23]. The limit theorems we need in the sequel were first obtained by Kesten and Stigum [19,18,20] in the discrete-time case and by Athreya [1] in the continuous-time case.
Let Z(t) be a k-dimensional continuous-time branching process. Assume that Z(t) is nonsingular and that the first moments exist. (Note that a process is singular if and only if each individual has exactly one offspring and that the existence of the first moments is sufficient for the non-exposition hypothesis.) Then, the so-called mean matrix M(t) of Z(t) is the k × k matrix with elements 38) and e i is the i-th unit vector in R k . It is well known (cf. [2] p. 202) that there exists a matrix A, called the infinitesimal generator of the semigroup {M(t), t ≥ 0}, such that The process Z is called supercritical, critical, or subcritical according as λ max (A) is larger, equal, or smaller than zero.
Observe that the following properties are equivalent (cf. [23] p. 95-99 and [22]): In particular, irreducible implies positive regular. Note that a matrix is irreducible if it is not similar via a permutation to a block upper triangular matrix and that a Markov chain is irreducible if and only if the mean matrix is irreducible.
The following lemma is an extension of Theorem 4 of [8] for multi-type branching processes. and Moreover, forū = max 1≤i≤k u i min 1≤ j≤k u j and for any > 0, lim Moreover, conditionally on survival, the proportions of the different types present in the population converge almost surely, as t ↑ ∞, to the corresponding ratios of the components of the eigenvector: for all i = 1, . . . , k, Proof. We start with the proof of (i). Since Z(t) is in this case a subcritical irreducible continuoustime branching process and E[Z j (t) ln(Z j (t))|Z(0) = e i ] < ∞, we obtain by applying Satz 6.2.7 of [23] the existence of a constant C > 0 such that Moreover, we have a non-explosion condition. Thus, for all > 0, either T K equals infinity or it converges to infinity as K ↑ ∞. Putting both together, there exists a sequence s K with lim K↑∞ s K = +∞ such that The branching property implies that for all For all i ∈ {1, . . . , k}, 1 ≥ (q i (t K )) x i ≥ (q i (t K )) K and by (3.50) we have 1 − q i (t K ) = O(e λ max (A)t K ). Moreover, for any sequence (w K ) K≥1 such that lim K↑∞ w K = 0, This implies that, for all t K with t K ln(K) and C > 0, since lim K↑∞ Ce λ max (A)t k K = 0, Thus, taking the limit K ↑ ∞ in (3.52), we obtain the desired equation (3.45). To prove the inequality (3.46) we use the fact that ( k i=1 u i Z i (t))e −λ max t is a martingale (cf. [1], Prop. 2). By applying Doob's stopping theorem to the stopping time T K ∧ T 0 we obtain, for all x ∈ B 2 K , that Therefore, since λ max (A) < 0 in the subcritical case, where W is a nonnegative random variable. Since we assume that, for all i ∈ {1, . . . , k}, E[Z j (t) ln(Z j (t))|Z(0) = e i ] < ∞, we get that Note that we used that t K ln(K). Since P [T 0 = ∞|Z(0) = e i ] = 1 − q i , we deduce (3.48). On the other hand, there exist two sequences s 1 K and s 2 K , which converge to infinity as K ↑ ∞, such that, for K large enough, s 1 Using these properties about multi-type branching processes we can now prove the theorem about the three steps of invasion.
Proof of Theorem 3.6. The first invasion step. Let us introduce the following stopping times Untilθ K the mutant population ν K t (g) influences only the death and switching rates of the resident population and this perturbation is uniformly bounded by (c+s ind. ) . Thus, by applying Theorem 3.4 (ii), we obtain On the time interval [0, θ K,M exit ∧ τ mut. ∧θ K ], the resident population can be approximated by x∈X (g,p)n x (g, p)δ x and no further mutant appears. This allows us to approximate ν K t (g) by multitype branching processes.
Let k ≡ |[p]g|. We construct two (N 0 ) k -valued processes X 1, (t) and X 2, (t), using the pathwise definition in terms of Poisson point measures of ν K t , which control the mutant population ν K t (g). To this aim let us denote the elements of [p]g byp 1 , . . . ,p k (w.l.o.g.p ≡p 1 ). Then, we define X 1, by j (s−), θ≤d(p j )+ (g,p)∈X (g,p) c(p j ,p)n (g,p) (g,p)+cM e j N death (g,p j ) (ds, di, dθ) (p j ,p l )+ (g,p)∈X (g,p) sg ind. (p j ,p l )(p)n (g,p) (g,p)−s ind. M e l − 1 θ≤sg nat. (p j ,p l )+ (g,p)∈X (g,p) sg ind. (p j ,p l )(p)n (g,p) (g,p)+s ind. M e j N switch (g,p j ) (ds, di, dθ, dp l ), and similar X 2, by (p j ,p l )+ (g,p)∈X (g,p) sg ind. (p j ,p l )(p)n (g,p) (g,p)+s ind. M e l − 1 θ≤sg nat. (p j ,p l )+ (g,p)∈X (g,p) sg ind. (p j ,p l )(p)n (g,p) (g,p)−s ind. M e j N switch (g,p j ) (ds, di, dθ, dp l ), where e j is the j-th unit vector in R k and N birth , N death , and N switch are the collections of Poisson point measures defined in Subsection 2.1. Note that X 1, (t) and X 2, (t) are k-type branching processes with the following dynamics: For each 1 ≤ i ≤ k, each individual in X 1, (t), respectively X 2, (t), with trait (g,p i ) undergoes (i) birth (without mutation) with rate b(p i ) − , respectively b(p i ) + + 2(k − 1)s ind. M , (ii) death with rate D (g,p) (p i ) +cM + 2(k − 1)s ind. M , respectively D (g,p) ( Moreover, the processes X 1, (t) and X 2, (t) have the following property: There exists a K 0 > 1 such that for allp i ∈ [p]g and for all K On the other hand, if inf{t ≥ 0 : X 2, (t) = 0} ≤ inf{t ≥ 0 : X 2, (t) = K } ∧ θ K, exit ∧ τ mut. , theñ Next, let us identify the infinitesimal generator of the control processes X 1, and X 2, . Therefore, define, for i = 1, . . . , k, (3.70) ( f (g,p) (g,p i ) would be the invasion fitness of phenotypep i if there was no switch back from the other phenotypes top i .) Then, by Equation (3.40), the infinitesimal generators are given by the following matrixes We prove in the following that the number of mutant individuals grow with positive probability to K before dying out if and only if λ max of A (g,p) ≡ lim →0 A(X 1, ) is strictly positive. Thus, λ max (A (g,p) ) is an appropriate generalisation of the invasion fitness of the class [p]g: (A (g,p) ). (3.72) Since the birth and death rates of X 1, and X 2, are positive and since Assumption 2 implies that M(X 1, ) and M(X 2, ) are irreducible, we obtain that the processes X 1, and X 2, are non-singular and irreducible. Thus, X 1, and X 2, satisfy the conditions of Lemma 3.7. For l ∈ {1, 2}, let q(X l, ) denote the extinction probability vector of X l, , i.e.
Observe that q(X l, ) = (1, . . . , 1) if X l, is not supercritical. To characterise q(X l, ) in the supercritical case, let us introduce the following functions where η is some small enough constant and l ∈ {1, 2}, (3.73) defined, for all 1 ≤ i ≤ k, by and Observe that u 1 (y, ) and u 2 (y, ) are the infinitesimal generating functions of X 1, and X 2, and that u 1 (y, 0) = u 2 (y, 0). Moreover, the extinction vector of a multi-type branching process is given as the unique root of the generating function in the unit cube (cf. [2] p. 205 or [23] Chap. 5). Thus, in the supercritical case q(X 1, ) is the unique solution of u 1 (y, ) = 0 for y ∈ [0, 1) k (3.76) and q(X 2, ) is the unique solution of These solutions are in general not analytic. Applying Lemma 3.7 to X 1, and X 2, we obtain that there exists C 1 > 0 such that, for all η > 0, > 0 sufficiently small and K large enough, If X 2, is sub-or critical for small enough, then lim ↓0 q 1 (X 2, ) = lim ↓0 q 1 (X 1, ) = 1. In the supercritical case, let q ∈ [0, 1) k be the solution of u 1 (y, 0) = u 2 (y, 0) = 0. Then, by applying the implicit function theorem, there exist open sets U 1 ⊂ R and U 2 ⊂ R containing 0, open sets V 1 ⊂ R k and V 2 ⊂ R k containing q, and two unique continuously differentiable functions g 1 : U 1 → V 1 and g 2 : (3.81) By definition, g 1 (0) = g 2 (0) = q and q 1 = q (g,p) (g,p). We can linearise and obtain that there exists a constant C 2 > 0 such that q 1 (X 1, ) ≤ q (g,p) (g,p) + C 2 and q 1 (X 2, ) ≥ q (g,p) (g,p) − C 2 (3.82) Therefore, lim Conditionally on survival, the proportions of the different phenotypes in X 1, converge almost surely, as t ↑ ∞, to the corresponding ratios of the components of the eigenvector, which are all strictly positive (cf. Lem. 3.7, Eq. (3.49)). Moreover, there exists a constant C 3 > 0 such that, for all small enough, andθ K converges to infinity as K ↑ ∞. Thus, conditionally on {θ K < η Ku K ∧ θ K,M exit ∧ τ mut. }, there exists a (small) constant C 4 > 0 such that the probability that the densities of the phenotypes {p 1 , . . . ,p k }, are all larger than C 4 at timeθ K convergences to one as first K ↑ ∞ and then → 0. More precisely, there exists constants C 4 > 0 and C 5 > 0 such that, for all small enough, The second invasion step. By Assumption 3, any solution of LVS (d + 1, ((g, p), (g,p))) with initial state in the compact set converge, as t ↑ ∞, to the unique locally strictly stable equilibrium n * ((g, p)), (g,p)). Therefore, for all > 0 there exists T ( ) ∈ R such that any of these trajectories do not leave the set x ∈ R |X (g,p) |+k : |x − n * ((g, p)), (g,p))| ≤ 2 /2 (3.87) after time T ( ). Back to the stochastic system, let us introduce on the event {θ K < η Ku K ∧ θ K,M exit ∧ τ mut. } the following stopping time θ K, near n * = inf t ≥θ K : ν K t − x∈X ((g,p),(g,p)) n * x ((g, p), (g,p))δ x We used that, at timeθ K , the stochastic process ν K (considered as element of R |X (g,p) |+k ) lies in the compact set A, where A is defined in (3.86).
The third invasion step. After time θ K, near n * we use again comparisons with multi-type branching processes to show that all individuals carrying a trait which is not present in the new equilibrium n * die out. To this aim let us define X n * extinct = {(g, p) ∈ X ((g,p),(g,p)) : n * (g,p) ((g, p), (g,p)) = 0} (3.91) For proving that the populations with traits in X n * extinct stay small after θ K, near n * and that the populations with traits not in X n * extinct stay close to its equilibrium value after θ K, near n * , let us define θ K, not small = inf t ≥ θ K, near n * : ∃(g, p) ∈ X n * extinct such that ν K t (g, p) > (3.92) and θ K,M exit n * ≡ inf t ≥ θ K, near n * : ν K t − x∈X ((g,p),(g,p)) n * x ((g, p), (g,p))δ x By using first the strong Markov property at θ K, near n * , we can apply Theorem 3.4 (ii) and obtain that there exist constants M > 0 and C 7 > 0 such that, for all small enough, lim K↑∞ P θ K < θ K, near n * < τ mut. ∧ η Ku K and θ K,M exit n * < e KV ∧ τ mut. ∧ θ K, not small < C 7 (3.94) This is obtained in a similar way as Equation (3.64) in the first step. Note that (g, p) ∈ X n * extinct implies that (g, p i ) ∈ X n * extinct for all p i ∈ [p] g , which is a consequence of Assumption 2. Using the same arguments as in the first step, we can construct, for all (g, p) ∈ X n * extinct , a |[p] g |-type continuous-time branching process Y ,(g,p) (s) with initial condition such that, for all K large enough and, for all t ∈ [θ K, near n * , θ K,M exit n * ∧ θ K, not small ∧ τ mut. ], Moreover, Y ,(g,p) (t) is characterised as follows: For each p i ∈ [p] g , each individual in Y ,(g,p) (t) with trait (g, p i ) undergoes (i) birth (without mutation) with rate b(p i ) + 2(|[p] g | − 1)s ind. (M + |X n * extinct |) (ii) death with rate d(p i ) + (ĝ,p)∈X (g,p),(g,p) c(p i ,p)n * (ĝ,p) ((g, p), (g,p)) −c(M + |X n * extinct |) (iii) for all j i, switch to p j with rate s g nat. (p i , p j ) + (ĝ,p)∈X (g,p),(g,p) s g ind. (p i , p j )(p)n * (ĝ,p) ((g, p), (g,p)) −s ind. (M + |X n * extinct |) . Let A(Y ,(g,p) ) denote the infinitesimal generator of the process Y ,(g,p) . Since the equilibrium n * ((g, p), (g,p)) is locally strictly stable (cf. Ass. 3), the eigenvalues of the Jacobian matrix of the dynamical system at n * ((g, p), (g,p)) are all strictly negative. If is small enough, this implies that all eigenvalues of {A(Y ,(g,p) ), (g, p) ∈ X n * extinct } are strictly negative. (There exists an order of the elements of X (g,p),(g,p) such that the Jacobian matrix is an upper-block-triangular matrix and {A(Y 0,(g,p) ), (g, p) ∈ X n * extinct } are on the diagonal.) Thus, for all small enough, the branching processes {Y ,(g,p) , (g, p) ∈ X n * extinct } are all subcritical. Moreover, we can apply Lemma 3.7 and get, for all small enough and (g, p) ∈ X n * extinct lim K↑∞ P inf{t ≥ 0 : Y ,(g,p) (t) = 0} ≤ and there exists a constant C 8 such that, for all small enough and (g, p) ∈ X n * extinct , lim K↑∞ P inf{t ≥ 0 : Y ,(g,p) (t) = K } ≤ inf{t ≥ 0 : Y ,(g,p) (t) = 0} ≤ C 8 . (3.98) Hence, there exists a constant M > 0 and C 9 > 0 such that, for all η > 0 and small enough, lim K↑∞ P θ K < θ K,M Jump < τ mut. ∧ η Ku K ∧ θ K, not small ≥ 1 − q (g,p) (g,p) − C 9 , (3.99) which finishes the proof of the theorem.
Combining all the previous results, we can prove similar as in [8] that for, all > 0, t > 0 and Γ ⊂ X, where Λ is the PES with phenotypic plasticity defined in Theorem 3.3. Finally, generalising this to any sequence of times 0 < t 1 < . . . < t n , implies that (ν K t/Ku K ) t≥0 converges in the sense of finite dimensional distributions to (Λ t ) t≥0 (cf. [8], Cor. 1 and Lem. 1), which ends the proof of Theorem 3.3. Figure 3 shows two examples where in a population consisting only of type (g, p) and being close to n(g, p) a mutation to genotypeg occurs. In these example,g is associated with two possible phenotypesp 1 andp 2 .

Examples.
A B Figure 3. Simulations of the invasion phase with K = 1000. (A) The mutant phenotypẽ p 1 has a negative initial growth rate but can switch top 2 which has a positive one. The fitness of the genotypeg is positive. (B) The fitness of the mutant genotypeg is positive, although each phenotype has a negative initial growth rate. This is possible because an outgoing switch is a loss of a cell for a phenotype, but not for the whole genotype.
In example (A), we start with a single mutant carrying trait (g,p 1 ) and which can switch tõ p 2 but the back-switch is relative weak (cf. Tab. 2). According to definition (3.70) we have f (g,p) (g,p 1 ) < 0 and f (g,p) (g,p 2 ) > 0. However, the global fitness of the genotypeg is positive. More precisely, it is given by the largest eigenvalue of −3 2 0.6 1 , which equals approximatively 1.280. Therefore, the multi-type branching process approximating the mutant population in the first step is supercritical. This does not depend on the phenotype of the first mutant, i.e. we would have the same if we had started with a single mutant carrying trait (g,p 2 )). However, the probability of invasion depends this. In this example, the invasion probability is given by the solution of Thus, if we start with the trait (g,p 1 ), the invasion probability is approximately 0.199. Whereas it is 0.338 if the first one has trait (g,p 2 ). In Figure 3 (A), the mutant population with genotypeg survives and the stochastic process is attracted to the new equilibrium n * ((g, p), (g,p 1 ), (g,p 2 )) ≈ (0, 0.543, 2.554), which is a strictly stable.