Genetic contribution of an advantaged mutant in the biparental Moran model

We consider a population of haploid individuals reproducing sexually, i.e. for which the genome of each individual is a random mixture of the genome of its two parents. We assume that initially one individual carries a mutation at one locus, and that individuals carrying this mutation have an advantage regarding genome transmission. Our aim is to study the long time effect of this mutation on the genetic composition of the population, when population size is large.

1 Motivation and model

Motivation
In this article we aim to understand how the advantage conferred on an individual by a genetic mutation influences its contribution to the genome of a population after a long time. This article follows two other works ([3] and [4]) that focus on the contribution of ancestors to the genome of a sexually reproducing population, in which each individual has two parents who contribute equally to the genome of their offspring. More precisely, we consider a population of haploid individuals reproducing sexually, i.e. for which the genome of each individual is a random mixture of the genomes of its two parents. We assume that initially a proportion $a$ of these individuals carry a mutation at one locus, and that individuals carrying this mutation have an increased life expectancy and are therefore advantaged regarding genome transmission. Our aim is to study the long-time effect of this mutation on the genetic composition of the population when the population size is large, and in particular to quantify the impact of the strength of selection on the genetic contribution of an ancestor to the population.

Biparental genealogies have received some interest, notably in [2,5,7], in which the time to recent common ancestors and ancestors' weights are investigated for the Wright-Fisher biparental model. In [3], we studied the asymptotic law of the contribution of an ancestor to the genome of the present-time population. The articles [9] and [1] study the link between pedigree, individual reproductive success and genetic contribution. The monoparental Moran model with selection at birth has also been studied, notably in [6], which studies its dual coalescent, and in [8], which studies allele fixation probabilities and ancestral lines. Finally, the limiting case where the strength of selection is infinite was studied in [4].

Model
As in the previous papers [3] and [4], we consider a population of $N$ individuals whose dynamics through time is modeled using a Moran biparental model. However, this model is modified in order to take into account the advantage conferred by a mutation of interest: we assume that selection impacts only the death of individuals. More precisely, we assume that the population is composed of a fixed number $N$ of individuals and that, at a given locus, a mutation confers some advantage on the individuals that carry it. Advantaged individuals (that carry the mutation) are characterized by a death weight $1$, whereas non-advantaged individuals are characterized by a death weight $1 + s$. At each discrete time step, two individuals are chosen independently and uniformly to be parents, and produce one offspring. This offspring replaces a third individual, chosen with a probability proportional to its death weight. Therefore, at each time step, a given non-advantaged individual is $1 + s$ times more likely to die than a given advantaged individual. Hence advantaged individuals have an increased life expectancy and consequently a larger mean offspring number.

Genetic transmission is assumed to follow Mendel's rules, which means that for a given locus, one of the two parents is chosen uniformly and transmits its allele (i.e. a copy of its genome at this locus) to the offspring. This transmission is not independent for loci that are close on the genome, but can be considered as independent for loci that are on different chromosomes, for instance. In particular, the transmission of the advantage to the offspring is assumed to be characterized by Mendelian transmission at one locus. In other words, the offspring inherits the level of advantage of one of its parents, chosen uniformly at random among its two parents. By convention, we call "mother" the parent that transmits its allele at the locus under mutation, and "father" the other parent. Note that the limiting case where $s$ is equal to infinity (so that the individual that dies is always chosen among non-advantaged individuals) has been studied in the previous paper [4]. Comparisons to this previous work will be provided later on.

Let us denote by $I = \{1, 2, ..., N\}$ the sites in which individuals live, and by $(\mu_n, \pi_n, \kappa_n) \in I^3$ the respective positions of the mother, father, and offspring at time step $n$. As in [3], this reproduction dynamics defines an oriented random graph on $I \times \mathbb{Z}_+$ (as represented in Figure 1), denoted $G_N$, representing the pedigree of the population, such that between time $n$ and time $n + 1$, two arrows are drawn from $(\kappa_n, n + 1)$ to $(\pi_n, n)$ and $(\mu_n, n)$ respectively, and $N - 1$ arrows are drawn from $(i, n + 1)$ to $(i, n)$ for each $i \in I \setminus \{\kappa_n\}$. Note that individuals are now characterized by their advantage: advantaged individuals are represented in red. We finally denote by $\mathcal{Y}_n \subset \{1, ..., N\}$ the set of advantaged individuals at time $n$ and by $\{F_n, n \in \mathbb{Z}_+\}$ the natural filtration associated to the stochastic process $(\mu_n, \pi_n, \kappa_n, \mathcal{Y}_n)_{n \in \mathbb{Z}_+}$.
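The dynamics described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `moran_step` and the list-of-booleans representation are hypothetical choices, and the code assumes that the dying individual is drawn independently of the two parents (so it may coincide with one of them), which is the convention used in the transition probabilities computed later in the paper.

```python
import random


def moran_step(advantaged, s, rng):
    """Run one time step in place and return (mother, father, offspring_site).

    advantaged[i] is True when the individual at site i carries the mutation.
    """
    n_sites = len(advantaged)
    # Parents: two independent uniform draws. By the paper's convention, the
    # "mother" is the parent transmitting its allele at the selected locus.
    mother = rng.randrange(n_sites)
    father = rng.randrange(n_sites)
    # Death: site drawn with probability proportional to its death weight
    # (1 for advantaged individuals, 1 + s for the others).
    # Assumption: this draw is independent of the choice of parents.
    death_weights = [1.0 if adv else 1.0 + s for adv in advantaged]
    offspring_site = rng.choices(range(n_sites), weights=death_weights, k=1)[0]
    # The offspring inherits the advantage status of its mother.
    advantaged[offspring_site] = advantaged[mother]
    return mother, father, offspring_site


if __name__ == "__main__":
    rng = random.Random(1)
    population = [True] * 10 + [False] * 90  # a = 10% of N = 100
    steps = 0
    while 0 < sum(population) < len(population):
        moran_step(population, s=1.0, rng=rng)
        steps += 1
    print("absorbed at", sum(population), "advantaged after", steps, "steps")
```

The returned triple plays the role of $(\mu_n, \pi_n, \kappa_n)$, so a list of such triples encodes the pedigree $G_N$.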

Ancestors genetic weights
To study the impact of selection on the genetic composition of the population, we now consider a new locus, distant enough from the locus under mutation that the genome is assumed to be transmitted independently at these two loci. We then consider a gene sampled in one of the individuals present in the population at time $n$, and our interest is to study the probability for this gene to come from each of the individuals living at time 0. The genealogy of this gene (i.e. the individual in which a copy of this gene was present at each time $t$, assuming no mutation and no recombination) is a random walk on the pedigree $G_N$, starting from the position $(i, n)$.
The key element in this model, as introduced in [3], is therefore the probability, given the genealogy $G_N$, that any gene of individual $i$ living in generation $n$ comes from ancestor $j$ living at generation 0. This quantity is a random variable, since it is a deterministic function of the random graph $G_N$. Illustrations are given in Figure 1. If the genome size is very large and the evolutions of distant genes are sufficiently decorrelated, we can expect this quantity to be close to the proportion of genes of individual $i$ that come from individual $j$. This quantity will also be called the weight of the ancestor $j$ in the genome of individual $i$.
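As an illustration, the weights of all ancestors can be computed from a given pedigree by propagating them forward in time: at each step the offspring's row of weights becomes the average of its parents' rows, since each parent transmits its allele with probability 1/2. The sketch below assumes a pedigree encoded as a list of (mother, father, offspring site) triples with 0-based sites; the function names are hypothetical.

```python
from fractions import Fraction


def ancestor_weights(n_sites, pedigree):
    """Return weights[i][j]: probability, given the pedigree, that a gene of
    the individual at site i at the final time comes from ancestor j at time 0.

    pedigree: list of (mother, father, offspring_site), one triple per step.
    """
    # At time 0 every individual is its own ancestor.
    weights = [[Fraction(int(i == j)) for j in range(n_sites)]
               for i in range(n_sites)]
    for mother, father, child in pedigree:
        # The new individual copies each parent with probability 1/2.
        new_row = [(wm + wf) / 2
                   for wm, wf in zip(weights[mother], weights[father])]
        weights[child] = new_row
    return weights


def population_weight(weights, ancestor):
    """Weight of one ancestor for a gene sampled uniformly in the population."""
    return sum(row[ancestor] for row in weights) / len(weights)


if __name__ == "__main__":
    w = ancestor_weights(3, [(0, 1, 2), (2, 0, 1)])
    for row in w:
        print([str(x) for x in row])
```

Exact fractions are used so that the values can be compared directly with hand computations such as the one in the caption of Figure 1.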

Main result
Let us denote by $(\mathcal{Y}_n)_{n \in \mathbb{Z}_+}$ the set of advantaged individuals at time step $n$, and by $(Y_n = |\mathcal{Y}_n|)_{n \in \mathbb{Z}_+}$ the number of advantaged individuals at each time $n \in \mathbb{Z}_+$. We are interested in the impact of selection on the weight of ancestors, and therefore in the probability that a gene sampled in the population comes from an advantaged individual.
This amounts to studying two quantities: $\Xi^A_n$, which is a random variable (as a deterministic function of the pedigree between time 0 and time $n$) giving the probability that a gene sampled uniformly among advantaged individuals at time $n$ comes from an initially advantaged individual, given the pedigree between time 0 and time $n$, and $\Xi^B_n$, which gives the probability that a gene sampled uniformly among non-advantaged individuals at time $n$ comes from an initially advantaged individual, given the pedigree between time 0 and time $n$. These two random quantities can then be seen as the weight of initially advantaged individuals among advantaged and disadvantaged individuals, respectively. An example of the weight of the initially advantaged individuals is given in Figure 1. We will focus on the expectation of these two random variables, and more specifically on the expectation of the first one once the advantageous mutation is fixed (i.e. once $Y_n = N$, which happens at a stopping time we denote by $T^Y_N$), when we start from a fixed proportion $a$ of individuals carrying this mutation. Our main result, Theorem 2.9, describes the behavior for large $N$ of $E(\Xi^A_{T^Y_N} 1_{T^Y_N < \infty} \mid Y_0 = \lfloor aN \rfloor)$ as a function of $a$ and $s$.

It can be shown that this quantity increases with $s$ and converges to $2\sqrt{a} - a$ as $s \uparrow \infty$. This result (for $s = \infty$) can also be retrieved from [4]. As an example, for large enough $s$, an initial proportion of advantaged individuals equal to $a = 1\%$ would yield an average asymptotic genetic weight close to 19% for this set of 1% initially advantaged individuals.
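The numerical example can be checked directly from the limiting expression $2\sqrt{a} - a$. The short sketch below (the function name is a hypothetical choice) evaluates it for a few initial proportions and also reports the amplification factor relative to the initial proportion $a$, which is the weight one would expect without any advantage.

```python
import math


def limiting_weight_infinite_selection(a):
    """Limit 2*sqrt(a) - a of the mean weight as s goes to infinity."""
    return 2.0 * math.sqrt(a) - a


if __name__ == "__main__":
    for a in (0.001, 0.01, 0.1, 0.5):
        w = limiting_weight_infinite_selection(a)
        print(f"a = {a:<6} weight = {w:.4f} amplification = {w / a:.1f}")
```

For small $a$ the weight behaves like $2\sqrt{a}$, so the rarer the initial mutants, the larger the relative amplification of their genetic contribution.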

Number of advantaged individuals dynamics
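Recall that $(\mathcal{Y}_n)_{n \in \mathbb{Z}_+}$ is the set of advantaged individuals at time step $n$, and that $(Y_n)_{n \in \mathbb{Z}_+}$ is the number of advantaged individuals at each time step.
This Markov chain is absorbed in 0 and in $N$.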
Proof. At each time step, as only one individual dies and one individual is born, the number of advantaged individuals can only increase or decrease by 1, or remain the same. Now if the number of advantaged individuals is equal to $k$, then the probability that the individual that dies at this time step is advantaged is equal to $\frac{k}{k + (1+s)(N-k)}$. For the Markov chain $(Y_n)_{n \in \mathbb{Z}_+}$ to decrease, the mother must be a non-advantaged individual, while the replaced individual is advantaged. The first event occurs with probability $(N - k)/N$, while the second event occurs with probability $\frac{k}{k + (1+s)(N-k)}$. Similarly, for the Markov chain $(Y_n)_{n \in \mathbb{Z}_+}$ to increase, the mother must be an advantaged individual, while the replaced individual is non-advantaged. The first event occurs with probability $k/N$, while the second event occurs with probability $\frac{(1+s)(N-k)}{k + (1+s)(N-k)}$. Finally, for the Markov chain $Y_n$ to stay at position $k$, the mother and the replaced individual must be either both advantaged or both disadvantaged, which occurs with probability $\frac{k^2 + (1+s)(N-k)^2}{N(k + (1+s)(N-k))}$. As $p_0 = p_N = 0$, the states 0 and $N$ are absorbing.
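The three transition probabilities described in this proof can be computed exactly with rational arithmetic. The sketch below is only an illustration (the function name and the return order are hypothetical); it also makes visible that the ratio of the up and down probabilities equals $1 + s$ whatever the value of $k$.

```python
from fractions import Fraction


def transition_probabilities(k, n_sites, s):
    """Return (down, stay, up) for the number of advantaged individuals,
    given that it currently equals k in a population of n_sites individuals."""
    s = Fraction(s)
    total_weight = k + (1 + s) * (n_sites - k)
    # Probability that the dying individual is advantaged / non-advantaged.
    die_advantaged = Fraction(k) / total_weight
    die_other = (1 + s) * (n_sites - k) / total_weight
    # Probability that the mother is advantaged / non-advantaged.
    mother_advantaged = Fraction(k, n_sites)
    mother_other = Fraction(n_sites - k, n_sites)
    up = mother_advantaged * die_other
    down = mother_other * die_advantaged
    stay = mother_advantaged * die_advantaged + mother_other * die_other
    return down, stay, up


if __name__ == "__main__":
    for k in (0, 1, 5, 9, 10):
        print(k, [str(p) for p in transition_probabilities(k, 10, 3)])
```

Because the up/down ratio does not depend on $k$, the chain observed only at its jump times is a random walk with constant bias.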
Proof. By decomposing according to the duration spent by the Markov chain, one has:
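Since, at each jump, the chain moves up with probability $(1+s)/(2+s)$ and down with probability $1/(2+s)$, hitting probabilities of the jump chain follow from the classical gambler's-ruin formula. The sketch below uses that textbook formula rather than a statement quoted from the paper, and the function name is hypothetical; it illustrates why the probability of falling from $aN$ to $aN/2$ is exponentially small in $N$.

```python
from fractions import Fraction


def prob_hit_upper_first(start, lower, upper, s):
    """Probability that a walk with up-probability (1+s)/(2+s), started at
    `start`, reaches `upper` before `lower` (lower <= start <= upper, s > 0)."""
    r = 1 / (1 + Fraction(s))  # ratio of down to up probabilities
    return (1 - r ** (start - lower)) / (1 - r ** (upper - lower))


if __name__ == "__main__":
    n_sites, s = 100, 1
    # Fixation probability of a single advantaged individual.
    print(float(prob_hit_upper_first(1, 0, n_sites, s)))
    # Probability of dropping from aN to aN/2 before fixation, with a = 0.2.
    print(float(1 - prob_hit_upper_first(20, 10, n_sites, s)))
```

The second quantity is bounded by $(1+s)^{-aN/2}$, which is the kind of exponential bound used in the proofs below.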

Mean weight of advantaged individuals
Our aim in this work is to quantify the effect of selection on the contribution of an ancestor to the genome of the population. As mentioned in Section 1.4, we therefore focus on the quantities $\Xi^A_n$ and $\Xi^B_n$, which are respectively the probability that a gene sampled uniformly among advantaged (resp. disadvantaged) individuals comes from an initially advantaged individual. Indeed, denoting by $\mathcal{U}(S)$ the uniform law on any set $S \subseteq I$, (2.1) These two quantities measure the amount of genome that comes from the initially advantaged population, respectively among advantaged and disadvantaged individuals, knowing the pedigree, i.e. the parental relationships between individuals. We are particularly interested in the order of magnitude of $\Xi^A_n$ and $\Xi^B_n$ when $n$ goes to infinity and when the population is large. The case of greatest interest is when the advantageous mutation ends up invading the whole population, i.e. when $T^Y_N < \infty$, which happens with probability close to 1 as soon as $Y_0 \geq aN$ for a positive $a$ and large $N$. In this article we therefore aim at studying $E(\Xi^A_{T^Y_N} 1_{T^Y_N < \infty} \mid Y_0 = \lfloor aN \rfloor)$. To this aim, let us define $(F^Y_n)_{n \in \mathbb{Z}_+}$ the natural filtration associated to the stochastic process $(Y_n)_{n \in \mathbb{Z}_+}$, and $U_n = E(\Xi^A_n \mid F^Y_n)$, $V_n = E(\Xi^B_n \mid F^Y_n)$. (2.2) As previously, the quantity $U_n$ (resp. $V_n$) gives the probability, given $F^Y_n$, that a gene sampled among advantaged (resp. disadvantaged) individuals at time $n$ comes from any advantaged individual at time 0. These two stochastic processes are useful because they satisfy recursions (2.3) and (2.4), stated in the following proposition and established case by case in the proof below.

Proof. Recall that $U_n$ is equal to the probability that a gene sampled uniformly in advantaged individuals at time $n$ comes from an advantaged individual at time 0 (backwards in time). Similarly, $V_n$ is equal to the probability that a gene sampled uniformly in non-advantaged individuals at time $n$ comes from an advantaged individual at time 0.
Now if $Y_{n+1} = Y_n + 1$ then, as mentioned in the proof of Proposition 2.1, at time $n$ an advantaged individual is chosen to be the mother, and a non-advantaged individual is chosen to die. Therefore if $Y_n = k$ and $Y_{n+1} = Y_n + 1 = k + 1$, then a gene sampled uniformly among advantaged individuals at time $n + 1$ is either carried by an advantaged individual already present at time $n$ (with probability $k/(k + 1)$), or is sampled in the newly advantaged individual (with probability $1/(k + 1)$). In the first case the gene comes from an advantaged individual at time 0 with probability $U_n$. In the second case, it comes from an advantaged individual at time 0 with probability $\frac{1}{2} U_n + \frac{1}{2}\left(\frac{k}{N} U_n + \frac{N-k}{N} V_n\right)$. The first term of this expression corresponds to the case where the gene sampled at time $n + 1$ comes from the (advantaged) mother of the considered individual (which occurs with probability 1/2), and the second term corresponds to the case where the sampled gene comes from the father, who is advantaged with probability $k/N$. Therefore $U_{n+1} = \frac{k}{k+1} U_n + \frac{1}{k+1}\left[\frac{1}{2} U_n + \frac{1}{2}\left(\frac{k}{N} U_n + \frac{N-k}{N} V_n\right)\right]$.
On the contrary, if $Y_{n+1} = Y_n + 1$, a gene sampled uniformly among non-advantaged individuals at time $n + 1$ necessarily comes from an individual that was already non-advantaged at time $n$, so that $V_{n+1} = V_n$. This ends the first point of the proposition.
In the second case where $Y_{n+1} = Y_n - 1$, the mother chosen at time $n$ must be a non-advantaged individual, while the dead individual must be an advantaged mutant. Therefore if $Y_{n+1} = Y_n - 1$ and $Y_n = k$, then a gene sampled uniformly among advantaged individuals at time $n + 1$ is necessarily carried by an advantaged individual already present at time $n$, which gives that $U_{n+1} = U_n$. Now a gene sampled uniformly at time $n + 1$ among non-advantaged individuals is either sampled among the previously non-advantaged individuals (with probability $(N - k)/(N - k + 1)$), or is sampled in the newly non-advantaged individual (with probability $1/(N - k + 1)$). In the first case this gene comes from an advantaged individual at time 0 with probability $V_n$, and in the second case this gene comes from an advantaged individual at time 0 with probability $\frac{1}{2} V_n + \frac{1}{2}\left(\frac{k}{N} U_n + \frac{N-k}{N} V_n\right)$, which ends the second point of the proposition.
Let us now assume that both the mother and the dying individual at time $n$ are advantaged. Then if one gene is sampled among non-advantaged individuals at time $n + 1$, it is necessarily carried by a non-advantaged individual already present at time $n$. So $V_{n+1} = V_n$. If a gene is sampled among advantaged individuals at time $n + 1$, then it is either sampled in the new individual (with probability $1/k$), in which case it comes from an advantaged individual at time 0 with probability $\frac{1}{2} U_n + \frac{1}{2}\left(\frac{k}{N} U_n + \frac{N-k}{N} V_n\right)$, or it is sampled in another advantaged individual (with probability $(k - 1)/k$), in which case it comes from an advantaged individual at time 0 with probability $U_n$. In the end this gives that, in this particular case where $Y_{n+1} = Y_n = k$ and both the mother and the dying individual at time $n$ are advantaged, $U_{n+1} = \frac{k-1}{k} U_n + \frac{1}{k}\left[\frac{1}{2} U_n + \frac{1}{2}\left(\frac{k}{N} U_n + \frac{N-k}{N} V_n\right)\right]$.

Let us finally assume that both the mother and the dying individual at time $n$ are non-advantaged individuals. Then if one gene is sampled among advantaged individuals at time $n + 1$, it is necessarily the copy of a gene already present in an advantaged individual at time $n$. So $U_{n+1} = U_n$. If a gene is sampled among non-advantaged individuals at time $n + 1$, then it is either sampled in the new individual (with probability $1/(N - k)$), in which case it comes from an advantaged individual at time 0 with probability $\frac{1}{2} V_n + \frac{1}{2}\left(\frac{k}{N} U_n + \frac{N-k}{N} V_n\right)$, or it is sampled in another non-advantaged individual (with probability $(N - k - 1)/(N - k)$), in which case it comes from an advantaged individual at time 0 with probability $V_n$. In the end this gives that, in this particular case where $Y_{n+1} = Y_n = k$ and both the mother and the dying individual at time $n$ are non-advantaged individuals, $V_{n+1} = \frac{N-k-1}{N-k} V_n + \frac{1}{N-k}\left[\frac{1}{2} V_n + \frac{1}{2}\left(\frac{k}{N} U_n + \frac{N-k}{N} V_n\right)\right]$. This ends the proof.
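The case analysis of this proof translates into a simple recursion that can be followed along a simulated trajectory of $(Y_n)$. The sketch below is an illustration under stated assumptions, not the paper's own code: function names are hypothetical, the father is taken uniform among the $N$ individuals, and when $Y$ does not move the two sub-cases are mixed with conditional weights proportional to $k^2$ and $(1+s)(N-k)^2$.

```python
import random


def newborn_weight(mother_class_weight, u, v, k, n_sites):
    """Weight of the newborn: half from the mother's class, half from a
    father chosen uniformly among the N individuals (k of them advantaged)."""
    father_weight = (k * u + (n_sites - k) * v) / n_sites
    return 0.5 * mother_class_weight + 0.5 * father_weight


def update_uv(u, v, k, n_sites, s, move):
    """Return (U_{n+1}, V_{n+1}) given Y_n = k (0 < k < N) and
    move = Y_{n+1} - Y_n in {-1, 0, 1}."""
    m = n_sites - k
    if move == 1:   # advantaged mother, non-advantaged death
        return (k * u + newborn_weight(u, u, v, k, n_sites)) / (k + 1), v
    if move == -1:  # non-advantaged mother, advantaged death
        return u, (m * v + newborn_weight(v, u, v, k, n_sites)) / (m + 1)
    # move == 0: both advantaged or both non-advantaged.
    w_adv, w_non = k * k, (1 + s) * m * m
    p_adv = w_adv / (w_adv + w_non)
    u_new = ((k - 1) * u + newborn_weight(u, u, v, k, n_sites)) / k
    v_new = ((m - 1) * v + newborn_weight(v, u, v, k, n_sites)) / m
    return p_adv * u_new + (1 - p_adv) * u, p_adv * v + (1 - p_adv) * v_new


def u_at_absorption(n_sites, a, s, rng):
    """Follow (Y_n, U_n, V_n) from Y_0 = floor(a N) until Y hits 0 or N.
    Returns (fixed, U) where fixed is True if the mutation invaded."""
    k, u, v = int(a * n_sites), 1.0, 0.0
    while 0 < k < n_sites:
        total = k + (1 + s) * (n_sites - k)
        p_up = (k / n_sites) * (1 + s) * (n_sites - k) / total
        p_down = ((n_sites - k) / n_sites) * k / total
        x = rng.random()
        move = 1 if x < p_up else (-1 if x < p_up + p_down else 0)
        u, v = update_uv(u, v, k, n_sites, s, move)
        k += move
    return k == n_sites, u


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [u_at_absorption(400, 0.25, 1000.0, rng) for _ in range(20)]
    print(sum(u for _, u in runs) / len(runs))
```

With a very large $s$ and $a = 0.25$, the average printed by the demo can be compared with the limiting value $2\sqrt{a} - a = 0.75$ quoted in the introduction.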
Proof. Both points of the proposition are immediate, recalling the formulas (2.5), (2.3) and (2.4) for the respective matrices.

It is simpler to consider only the times at which the Markov chain jumps, and set .

• The eigenvalues associated to the left eigenvector

Proof. The time during which the Markov chain $(Y_n)_{n \in \mathbb{Z}_+}$ stays in the state $k$ before jumping follows a geometric law (with values in $\mathbb{Z}_+$) with the probability of success defined in Proposition 2.1.
As the matrix B(0) k is stochastic, so is the matrix Let us then write This equation, together with Equation (2.5), gives first that where the second equality in Equation (2.8) is obtained using Mathematica (see Section A). Now recall that . which gives the first point of the proposition.
where I is the identity matrix of size 2. Therefore, recalling that . which gives the second point of the proposition.
The previous proposition gives in particular that

Proposition 2.6. (i) For any time $n \leq \inf(T^Z_0, T^Z_N)$, where $S^+_n(k)$ and $S^-_n(k)$ are respectively the numbers of transitions from $k$ to $k + 1$ and from $k$ to $k - 1$ in the trajectory of $(Z_l)_{l \leq n}$, and (ii) One has , and .
Proof. From Proposition 2.5 one has the following system of equations: Therefore first which gives Equation (2.12). Now using the first equation of (2.14), one has: which gives that Similarly, using the second equation of (2.14) gives that and therefore

In this article we focus on the situation in which the mutation is already present in a significant proportion of the individuals, and study the order of magnitude of the weight of ancestors once the advantageous mutation has spread in the population. This leads us to use a continuous approximation, in which the previous equations are replaced by differential equations. The first step of our study consists in studying the expectation of the difference of weights between advantaged and disadvantaged individuals. Recall that $T^Z_y = \inf\{n : Z_n = y\}$.
Although we are particularly interested in the case where $b = 1$, we start by considering the case where $b < 1$, in order to simplify the approximation. The limit where $b$ goes to 1 will be tackled in the final step of the proof of Theorem 2.9.

Proof. Define for any
By definition, the function ϕ D is such that for any k Lϕ D (k) = 0, , where where the function R is such that there exists a positive constant where and from Markov property, as s > 0.
Besides, recall from Proposition 2.2 that $P(T^Z_{aN/2} < \infty \mid Z_0 = aN)$ decreases exponentially with $N$ when $a$ is fixed. Therefore The first quantity has an exponential bound from Proposition 2.2. The second quantity is bounded by $C/N$, from Equation (2.15) and since

Pushing the same approach further and using the previous proposition, we get that

Proposition 2.8. For any $0 < a < b < 1$, Moreover, when $N$ is large, .
Our aim from here is to prove that the functions $\varphi_V$ and $\psi_V$ are close to each other. Setting $\psi_V(k) = f_V(k/N)$, we have Therefore where the function $R$ is such that there exists a positive constant $C$ such that $|R(x)| \leq C/N^2$. Hence, since $\psi_V$ is bounded, for all $k \in [[1, bN]]$. Therefore for all $k \in [[1, bN]]$. Now, as in the proof of Proposition 2.7, one can introduce a function $\tilde\psi_V$ which coincides with $\psi_V$ above $\lfloor aN/2 \rfloor$ and with $\varphi_V$ below. Next, reasoning exactly as in the proof of Proposition 2.7, the fact that $P(T^Z_{aN/2} < \infty \mid Z_0 = aN)$ decreases exponentially with $N$ when $a$ is fixed gives that there exists a constant $C$ such that $|\tilde\psi_V(k) - \varphi_V(k)| \leq C/N$ for all $k \in [[aN, bN]]$, which gives the result for $\tilde V$. The expression for $\tilde U$ then follows by Proposition 2.7.
We can now complete the proof of our theorem, Theorem 2.9.
Note that from Equation (2.13), the sequences $(\tilde U_n)_{n \in \mathbb{Z}_+}$ and $(\tilde V_n)_{n \in \mathbb{Z}_+}$ are respectively decreasing and increasing. In particular for any $\epsilon > 0$, if $T^Z_N < \infty$, What is more, .
Letting ϵ go to 0 gives the result.
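To get a feeling for how the limiting weight depends on $s$, one can use a back-of-the-envelope fluid approximation. This is a heuristic sketch and not the statement of Theorem 2.9: averaging the case-by-case updates of $U$ and $V$ over one time step, writing $x = Y_n/N$ and $D = U - V$, and neglecting fluctuations suggests $D(x) = (a/x)^{(1+s)/(2s)} \left((1-x)/(1-a)\right)^{1/(2s)}$ and a weight at fixation equal to $a + \int_a^1 D(x)\,dx$. The function name below is hypothetical.

```python
import math


def heuristic_limit_weight(a, s, n_points=20000):
    """Midpoint-rule value of a + integral over [a, 1] of D(x), where
    D(x) = (a/x)**((1+s)/(2s)) * ((1-x)/(1-a))**(1/(2s)).
    Heuristic fluid approximation, assuming 0 < a < 1 and s > 0."""
    exp_x = (1.0 + s) / (2.0 * s)
    exp_1mx = 1.0 / (2.0 * s)
    h = (1.0 - a) / n_points
    total = 0.0
    for i in range(n_points):
        x = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += (a / x) ** exp_x * ((1.0 - x) / (1.0 - a)) ** exp_1mx
    return a + h * total


if __name__ == "__main__":
    a = 0.01
    for s in (0.1, 1.0, 10.0, 100.0, 1e6):
        print(f"s = {s:<9} heuristic weight = {heuristic_limit_weight(a, s):.4f}")
    print("2*sqrt(a) - a =", 2 * math.sqrt(a) - a)
```

Under this approximation the weight increases with $s$, stays close to $a$ when $s$ is small, and approaches $2\sqrt{a} - a$ when $s$ is large, which is consistent with the properties announced in the introduction.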

Figure 1: Graph $G_N$ representing the pedigree of a population with 5 individuals during 10 time steps. Time is oriented from past to future, and arrows materialize gene flow between individuals when going backward in time. Advantaged individuals are represented in red. Numbers at the bottom give the probability that a gene sampled in each of the individuals comes from the initially advantaged individual. In this example the genetic weight of the initially advantaged individual (i.e. the probability that a gene sampled uniformly at time $n = 9$ comes from this individual at time $n = 0$) is equal to $21/40 = \frac{1}{5}(1 + 1/2 + 1/2 + 1/4 + 3/8)$.
the number of advantaged individuals at each time step $n \in \mathbb{Z}_+$. This number of advantaged individuals satisfies the following

Proposition 2.1. The stochastic process ( jump time of the Markov chain $Y$. Then let us set for any $n \leq \sup\{n : \tau_n < \infty\}$, $Z_n = Y_{\tau_n}$, which is the sequence of states visited by the Markov chain $(Y_n)_{n \in \mathbb{Z}_+}$, and set for any $y \in \{0, 1, ..., N\}$, $T^Z_y = \inf\{n : Z_n = y\}$, and $T^Z_{y,z} = \inf\{n : Z_n \in \{y, z\}\}$. Then the stochastic process $(Z_n)_{n \in \mathbb{N}}$ follows the

Proposition 2.2. The stochastic process $(Z_n)_{n \in \mathbb{N}}$ is a simple biased random walk absorbed in 0 and $N$: as long as $Z_n \in \{1, ..., N - 1\}$, $Z_{n+1} \in \{Z_n - 1, Z_n + 1\}$ and $P(Z_{n+1} = Z_n + 1 \mid Z_n) = \frac{1+s}{2+s}$.

In the last case where $Y_{n+1} = Y_n = k$, the mother and the dying individual chosen at time $n$ are either both advantaged or both non-advantaged. Since the first event occurs with probability $\frac{k}{N} \times \frac{k}{k + (1+s)(N-k)}$ and the second event occurs with probability $\frac{N-k}{N} \times \frac{(1+s)(N-k)}{k + (1+s)(N-k)}$, the probability of the first event conditioned on the fact that $Y_{n+1} = Y_n = k$ is equal to $\frac{k^2}{k^2 + (1+s)(N-k)^2}$, and the probability of the second event conditioned on the fact that $Y_{n+1} = Y_n = k$ is equal to $\frac{(1+s)(N-k)^2}{k^2 + (1+s)(N-k)^2}$.

Our aim is to prove that the functions $\varphi_D$ and $\psi_D$ are close to each other on $[\lfloor aN \rfloor, \lfloor bN \rfloor]$ for large $N$. Let us define the infinitesimal generator $L$ on the set of real-valued functions $g$ on $[[0, N]]$ vanishing in 0 and $N$ by all $x \in [0, b]$. One has for all $x \in [0, b]$,

the function $R$ is such that there exists a positive constant $C$ such that $|R(x)| \leq C/N^2$ for all $x \in [a, b]$. Hence there exists a positive constant $C'$ such that for all $k \in [\lfloor aN \rfloor, \lfloor bN \rfloor]$, $|L\psi(k)| \leq C'/N$. Let us now define the function $\tilde\psi_D$ by $\tilde\psi_D(k) = \psi_D(k)$ for all $k \in [\lfloor aN/2 \rfloor, \lfloor bN \rfloor]$ and $\tilde\psi_D(k) = \varphi_D(k)$ for all $k \in [0, \lfloor aN/2 \rfloor[$. Our aim from now on is to prove that the functions $\varphi_D$ and $\tilde\psi_D$ are close to each other on $[\lfloor aN \rfloor, \lfloor bN \rfloor]$. Note that since $\tilde\psi_D - \varphi_D$ vanishes in 0 and $N$,

First, by definition of $\tilde U$, $E(\Xi^A_{T^Y_N} 1_{T^Y_N < \infty} \mid Y_0 = \lfloor aN \rfloor) = E(\tilde U_{T^Z_N} 1_{T^Z_N < \infty} \mid Y_0 = \lfloor aN \rfloor)$.