Ancestries of a recombining diploid population

Journal of Mathematical Biology

Abstract

We derive the exact one-step transition probabilities of the number of lineages that are ancestral to a random sample from the current generation of a bi-parental population that is evolving under the discrete Wright–Fisher model with \(n\) diploid individuals. Our model allows for a per-generation recombination probability of \(r\). When \(r=1\), our model is equivalent to Chang’s (Adv Appl Probab 31:1002–1038, 1999) model for the karyotic pedigree. When \(r=0\), our model is equivalent to Kingman’s (Stoch Process Appl 13:235–248, 1982) discrete coalescent model for the cytoplasmic tree or sub-karyotic tree containing a DNA locus that is free of intra-locus recombination. When \(0 < r <1\) our model can be thought of as tracking a sub-karyotic ancestral graph containing a DNA sequence from an autosomal chromosome that has an intra-locus recombination probability \(r\). Thus, our family of models indexed by \(r\in [0,1]\) connects Kingman’s discrete coalescent to Chang’s pedigree in a continuous way as \(r\) goes from \(0\) to \(1\). For large populations, we also study three properties of the ancestral process corresponding to a given \(r\in (0,1)\): the time \(\fancyscript{T}_n\) to a most recent common ancestor (MRCA) of the population, the time \(\fancyscript{U}_n\) at which all individuals are either common ancestors of all present day individuals or ancestral to none of them, and the fraction of individuals that are common ancestors at time \(\fancyscript{U}_n\). These results generalize the three main results of Chang (Adv Appl Probab 31:1002–1038, 1999). When we appropriately rescale time and recombination probability by the population size, our model leads to the continuous time Markov chain called the ancestral recombination graph of Hudson (Theor Popul Biol 23:183–201, 1983) and Griffiths (The two-locus ancestral graph, Institute of Mathematical Statistics 100–117, 1991).

References

  • Amato P, Tachibana M, Sparman M, Mitalipov S (2014) Three-parent in vitro fertilization: gene replacement for the prevention of inherited mitochondrial diseases. Fertil Steril 101(1):31–35

  • Athreya K, Ney P (2004) Branching processes. Dover Publications Inc., Mineola, NY (reprint of the 1972 original, Springer, New York)

  • Barton N, Etheridge A (2011) The relation between reproductive value and genetic contribution. Genetics 188:953–973

  • Cameron P (1994) Combinatorics: topics, techniques, algorithms. Cambridge University Press, Cambridge

  • Chang J (1999) Recent common ancestors of all present-day individuals. Adv Appl Probab 31(4):1002–1038 (with discussion and reply by the author)

  • Donnelly P, Wiuf C, Hein J, Slatkin M, Ewens W, Kingman J (1999) Discussion: Recent common ancestors of all present-day individuals. Adv Appl Probab 31:1027–1035

  • Ethier S, Kurtz T (1986) Markov processes: characterization and convergence. Wiley, New York

  • Fisher R (1930) The Genetical Theory of Natural Selection. Clarendon Press, Oxford

  • Gallagher J (2015) UK approves three-person babies. BBC News Retrieved 17 March 2015 from http://www.bbc.com/news/health-31594856a

  • Gravel S, Steel M (2014) The existence and abundance of ‘ghost’ ancestors in biparental populations. Preprint

  • Griffiths R (1991) The two-locus ancestral graph. In: Basawa I, Taylor R (eds) Selected Proceedings of the Sheffield Symposium on Applied Probability: Held at the University of Sheffield, Sheffield, August 16–19, 1989, IMS Lecture Notes-Monograph Series, vol 18, Institute of Mathematical Statistics, pp 100–117

  • Griffiths R, Marjoram P (1997) An ancestral recombination graph. In: Donnelly P, Tavaré S (eds) Progress in Population Genetics and Human Evolution, IMA Volumes in Mathematics and its Applications, vol 87. Springer, pp 257–270

  • Gusfield D (2014) ReCombinatorics: The Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks. MIT Press, Cambridge

  • Hudson R (1983) Properties of a neutral allele model with intragenic recombination. Theor Popul Biol 23:183–201

  • Hurst G, Jiggins F (2005) Problems with mitochondrial DNA as a marker in population, phylogeographic and phylogenetic studies: the effects of inherited symbionts. Proc R Soc B 272:1525–1534

  • Kämmerle K (1989) Looking forwards and backwards in a bisexual Moran model. J Appl Probab 26(4):880–885

  • Kämmerle K (1991) The extinction probability of descendants in bisexual models of fixed population size. J Appl Probab 28(3):489–502

  • Kemeny J, Snell J (1960) Finite Markov chains. D. van Nostrand Company Inc, Princeton

  • Kingman J (1982) The Coalescent. Stoch Process Appl 13:235–248

  • Lachance J (2009) Inbreeding, pedigree size, and the most recent common ancestor of humanity. J Theor Biol 261:238–247

  • Matsen F, Evans S (2008) To what extent does genealogical ancestry imply genetic ancestry? Theor Pop Biol 74:182–190

  • Möhle M (1994) Forward and backward processes in bisexual models with fixed population sizes. J Appl Probab 31(2):309–332

  • Möhle M (1998) A convergence theorem for Markov chains arising in population genetics and the coalescent with selfing. Adv Appl Probab 30:493–512

  • Ralph P (2009) Most recent common ancestors, genetic inheritance, stochastic gene transcription, and Brownian motion on disconnected sets: a probabilistic analysis of a few models. Ph.D. Thesis

  • Stein W et al (2009) Sage Mathematics Software (Version 4.2.1). The Sage Development Team, http://www.sagemath.org

  • Wakeley J, King L, Low B, Ramachandran S (2012) Gene genealogies within a fixed pedigree, and the robustness of Kingman’s coalescent. Genetics 190:1433–1445

  • White D, Wolff J, Pierson M, Gemmell N (2008) Revealing the hidden complexities of mtDNA inheritance. Mol Ecol 17:4925–4942

  • Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159

Acknowledgments

B.T. was partly supported by post-doctoral fellowships at Department of Statistics, University of Oxford, UK (EPSRC grant EP/E05885X/1) and at Instituto de Matemática e Estatística, Universidade de São Paulo, Brasil (CNPq Processo 151782/2010-5 & MaCLinC). B.T. would like to thank Professor Yoshiharu Kohayakawa for many useful discussions related to Theorems 23 and Corollary 1. R.S. was partly supported by a visiting scholarship at Department of Mathematics, Cornell University, Ithaca, NY, USA, a sabbatical grant from College of Engineering, University of Canterbury, and consulting revenues from Wynyard Group, Christchurch, NZ. R.S. thanks Robert C. Griffiths for an introduction to his ARGs, Alison Etheridge for discussions on stationary behaviour of the ancestral size chain, Jae Young Choi and Neil Gemmell for discussions on cytoplasmic inheritance, and Krithika Yogeeswaran for discussions on the nested embeddings in Fig. 3. A.V. was supported by the ANR project MANEGE (ANR-09-BLAN-0215) and R.S. and A.V. were supported in part by the chaire Modélisation Mathématique et Biodiversité of Veolia Environnement–École Polytechnique–Museum National d’Histoire Naturelle–Fondation X. The authors would like to thank the Referees and Associate Editor for their helpful suggestions to improve the presentation of the paper.

Author information

Corresponding author

Correspondence to R. Sainudiin.

Appendices

Appendix A: Galton–Watson processes with Poissonian offspring distribution

In this section, we collect a few facts about Galton–Watson processes. These facts are used in the proofs of Theorems 2 and 3. Recall that a Galton–Watson process with offspring distribution \(\mu \) (a probability distribution on \({\mathbb Z}_+\)) counts the number of individuals alive in each generation in a population evolving as follows: each individual in generation \(k\ge 0\) gives birth to a random number of descendants with law \(\mu \), independently of all other individuals; generation \(k+1\) is then made of all these offspring. If the population becomes extinct at some time, it remains extinct for all later generations (and by extension we say that the Galton–Watson process becomes extinct, i.e., stuck at \(0\), at that time).

The following lemmas summarize Lemmas 4 and 16 of Chang (1999) (with minor modifications, since Chang’s results are for a Galton–Watson process with offspring distribution \(\mathtt {Poisson}(2)\)) and some well-known general results from Chapter 1 of the book by Athreya and Ney (2004). Therefore, we do not give all of their proofs here. We write \(m\) for the expectation of \(\mu \), \(\sigma ^2\) for its variance, and \(\mathbb {P}_i\) (and \(\mathbb {E}_i\)) for the law under which the population starts with \(i\) individuals.

Lemma 11

Let \((Y_t)_{t\in {\mathbb Z}_+}\) be a Galton–Watson process with \(m < 1\). Let \(\tau _0 := \inf \{t : Y_t = 0 \}\) be the extinction time of \((Y_t)_{t\in {\mathbb Z}_+}\). Then for any \(k\in {\mathbb Z}_+\),

$$\begin{aligned} \mathbb {P}_1[\tau _0 > k] \le m^k. \end{aligned}$$

Furthermore, if \(\sigma ^2<\infty \) we have for large \(k\)

$$\begin{aligned} \mathbb {P}_1[\tau _0 >k] \ge \frac{1-m}{\sigma ^2}\, m^{k+1}. \end{aligned}$$

When \(\mu \) is the law \(\mathtt {Poisson}(\lambda )\) for some \(\lambda >0\), we have \(m=\lambda =\sigma ^2<\infty \).
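As a numerical illustration of Lemma 11 in the Poisson case, the following Python sketch computes \(\mathbb {P}_1[\tau _0>k]\) from the standard generating-function recursion \(s_{k+1}=1-\psi (1-s_k)\), \(s_0=1\), and prints it next to the two bounds. The function names and the value \(\lambda =0.5\) are illustrative assumptions and do not come from the paper.

```python
import math

def survival_probabilities(lam, kmax):
    """Return [P_1(tau_0 > k) for k = 0..kmax] for Poisson(lam) offspring.

    With psi(z) = exp(-lam * (1 - z)), the recursion s_{k+1} = 1 - psi(1 - s_k)
    becomes s_{k+1} = 1 - exp(-lam * s_k), started from s_0 = 1.
    """
    s = [1.0]
    for _ in range(kmax):
        # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
        s.append(-math.expm1(-lam * s[-1]))
    return s

def lemma11_bounds(lam, k):
    """Lower and upper bounds of Lemma 11 with m = sigma^2 = lam < 1."""
    m, sigma2 = lam, lam
    return (1.0 - m) / sigma2 * m ** (k + 1), m ** k

if __name__ == "__main__":
    lam = 0.5  # illustrative subcritical mean (assumption)
    for k, s_k in enumerate(survival_probabilities(lam, 10)):
        lower, upper = lemma11_bounds(lam, k)
        print(f"k={k:2d}  lower={lower:.3e}  P[tau_0>k]={s_k:.3e}  upper={upper:.3e}")
```

For Poisson offspring the lower bound simplifies to \((1-\lambda )\lambda ^k\), so both bounds decay at the same geometric rate \(\lambda \) and differ only by a constant factor.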

Lemma 12

Let \((Y_t)_{t\in {\mathbb Z}_+}\) be a Galton–Watson process with offspring distribution \(\mathtt {Poisson}(\lambda )\) for a given \(\lambda >0\). Let \(\psi \) be the probability generating function of \(X\), where \(X\) is a random variable with \(\mathtt {Poisson}(\lambda )\) law. That is, for any \(z\in [0,1]\), \(\psi (z)=\mathbb {E}[z^X]= e^{-\lambda (1-z)}\). Let also \(\varrho \) be the smallest solution in \([0,1]\) to \(\psi (x)=x\). Then

  (i) The probability \(\mathbb {P}_1[\tau _0<\infty ]\) that \((Y_t)_{t\in {\mathbb Z}_+}\) becomes extinct in finite time, starting with 1 individual, is equal to \(\varrho >0\).

  (ii) The Markov chain \((Y_t/\lambda ^t)_{t\in {\mathbb Z}_+}\) is a martingale. As \(t\) tends to infinity, it converges a.s. to a random variable \(M\) satisfying \(\{M>0\}=\{Y\ \mathrm {survives\ for\, ever}\}\).

  (iii) Let \((b_t)_{t\ge 0}\) be a sequence of positive integers such that \(\ln b_t = o(t)\) as \(t \rightarrow \infty \). Then

    $$\begin{aligned} \lim _{t \rightarrow \infty } \frac{\ln \mathbb {P}_1[1 \le Y_t \le b_t]}{t} = \ln (\lambda \varrho ). \end{aligned}$$

The first two points in Lemma 12 hold in fact for any Galton–Watson process (with \(\lambda \) replaced by the mean of its offspring distribution in \((ii)\)). The last point says in essence that, up to polynomial prefactors, the probability that \(Y_t\) is still positive but grows less than exponentially in \(t\) decays like \(e^{-|\ln (\lambda \varrho )|t}\) as \(t\rightarrow \infty \). Observe that the product \(\lambda \varrho \) is always less than one (except in the critical case \(\lambda =1\), which will not concern us). Indeed, when \(\lambda <1\) we have \(\varrho =1\). When \(\lambda >1\), the identity \(\psi '(z)=\lambda \psi (z)\) gives \(\lambda \varrho = \psi '(\varrho )\), and \(\psi '(\varrho )<1\) because \(\psi \) is strictly convex, \(\psi (0)>0\) and \(\varrho <1\) is the smallest positive value at which \(\psi (x)=x\), the largest being \(x=1\).
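The quantities \(\varrho \) and \(\lambda \varrho \) are easy to evaluate numerically. The sketch below assumes nothing beyond the definition \(\psi (z)=e^{-\lambda (1-z)}\): it iterates \(x\mapsto \psi (x)\) from \(x=0\), which increases monotonically to the smallest fixed point in \([0,1]\). The function name, tolerance and sample values of \(\lambda \) are illustrative choices.

```python
import math

def extinction_probability(lam, tol=1e-13, max_iter=100_000):
    """Smallest fixed point in [0, 1] of psi(x) = exp(-lam * (1 - x)).

    Iterating x <- psi(x) from x = 0 increases monotonically to that root.
    """
    x = 0.0
    for _ in range(max_iter):
        x_new = math.exp(-lam * (1.0 - x))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    # Illustrative values; the critical case lam = 1 is deliberately excluded.
    for lam in (0.5, 1.5, 2.0, 3.0):
        rho = extinction_probability(lam)
        print(f"lambda={lam}: rho={rho:.6f}, lambda*rho={lam * rho:.6f}")
```

In each non-critical case the printed product \(\lambda \varrho \) should fall strictly below one, in line with the observation above.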

Finally, let us give a comparison result between a single family size \((G_t)_{t\in {\mathbb Z}_+}\) (resp., the size of the family of non-descendants \((B_t)_{t\in {\mathbb Z}_+}\)) and a Galton–Watson process with offspring distribution \(\mathtt {Poisson}(1+r)\) (resp., \(\mathtt {Poisson}(1-r)\)).

Lemma 13

Let \((Y^+_t)_{t\in {\mathbb Z}_+}\) (resp., \((Y^-_t)_{t\in {\mathbb Z}_+}\)) be a Galton–Watson process with offspring distribution given by \(\mathtt {Poisson}(1+r)\) (resp., \(\mathtt {Poisson}(1-r)\)). For any \(b>0\), let \(\tau _b^Y:=\inf \{t:\, Y_t\ge b\}\), \(\tau _{0,b}^Y:=\inf \{t:\, Y_t\ge b \hbox { or }Y_t=0\}\) and \(\tau _0^Y:=\inf \{t:\, Y_t=0\}\) (where \(Y=Y^+\) or \(Y^-\)). Define the same quantities for the processes \(G\) and \(B\). Here again, \(\mathbb {P}_i\) denotes the probability measure under which the process under consideration starts at \(i\).

  (i) If \(k\) and \(b\) grow with \(n\) in such a way that \(kb^2=o(n)\), then as \(n\rightarrow \infty \)

    $$\begin{aligned} \mathbb {P}_1\big [\tau ^G_b>k\big ] = \mathbb {P}_1\big [\tau ^{Y^+}_b>k\big ](1+o(1)) \quad \hbox {and}\quad \mathbb {P}_1\big [\tau ^G_{0,b}>k\big ] = \mathbb {P}_1\big [\tau ^{Y^+}_{0,b}>k\big ](1+o(1)). \end{aligned}$$

  (ii) If for some \(\alpha \in (0,1/4)\) and \(\gamma \in (2\alpha ,1/2)\) we have \(i=\fancyscript{O}(n^\alpha )\) and \(k=o(n^{1-2\gamma })\), then as \(n\rightarrow \infty \)

    $$\begin{aligned} \mathbb {P}_i\big [\tau ^B_0>k\big ]= \mathbb {P}_i\big [\tau ^{Y^-}_0>k\big ](1+o(1)). \end{aligned}$$

Remark 5

Note that \(i=\fancyscript{O}(n^\alpha )\) means that \(i\) is bounded by a constant times \(n^\alpha \), which allows us to take \(i\) constant in \(n\) or growing more slowly than \(n^\alpha \).

In words, despite the dependence between the different family sizes in our original model, the early development of a single family \((G_t)_{t\in {\mathbb Z}_+}\) is very close to that of a \(\mathtt {Poisson}(1+r)\) Galton–Watson process. Likewise, as soon as far fewer than \(n\) individuals remain in \(B\), the extinction of this subpopulation occurs in the same way as in a \(\mathtt {Poisson}(1-r)\) Galton–Watson process.

Proof

The proof of \((i)\) is similar to that of Lemma 3 of Chang (1999), and so we omit it here. The proof of \((ii)\) follows the same lines but is a bit more complex. Our main aim is to show that as long as the processes do not grow too much and we do not look at too many generations, their transition probabilities are equivalent. Then, since \(Y^-\) starting below \(n^\alpha \) will not grow beyond \(n^\gamma \) before going extinct with very high probability, neither will \(B\) and extinction will occur roughly at the same time (in distribution) for both.

Let us thus consider \(x,y\le n^\gamma \). Recall that conditionally on \(B_t=x\), \(B_{t+1}\sim \mathtt {Bin}(n,(1-r)\frac{x}{n} + r\frac{x^2}{n^2})\). Since the sum of \(x\) independent \(\mathtt {Poisson}(1-r)\) random variables has the law \(\mathtt {Poisson}((1-r)x)\), we have for any \(t\in {\mathbb Z}_+\)

$$\begin{aligned}&\frac{\mathbb {P}[B_{t+1}=y\, |\, B_t=x]}{\mathbb {P}[Y^-_{t+1}=y\, |\, Y^-_t=x]} = \frac{\left( {\begin{array}{c}n\\ y\end{array}}\right) \big ((1-r)\frac{x}{n}+r\frac{x^2}{n^2}\big )^y \big (1- (1-r)\frac{x}{n}-r\frac{x^2}{n^2}\big )^{n-y}}{e^{-(1-r)x}(1-r)^yx^y/y!}\\&\qquad = \frac{n!}{(n-y)!n^y}\bigg (1+\frac{r}{1-r}\, \frac{x}{n}\bigg )^y \exp \bigg \{(1-r)x +(n-y)\ln \bigg (1- (1-r)\frac{x}{n}-r\frac{x^2}{n^2}\bigg )\bigg \} \end{aligned}$$

But \(x,y\le n^\gamma \ll n\), and so a first order Taylor expansion gives us that

$$\begin{aligned} \bigg (1+\frac{r}{1-r}\, \frac{x}{n}\bigg )^y = e^{\frac{r}{1-r}\frac{xy}{n}+\fancyscript{O}\big (\frac{yx^2}{n^2}\big )} \le e^{\frac{2r}{1-r}n^{2\gamma -1}} \end{aligned}$$

and

$$\begin{aligned} \exp \bigg \{(1-r)x +(n-y)\ln \bigg (1- (1-r)\frac{x}{n}-r\frac{x^2}{n^2}\bigg )\bigg \} \le e^{C\big (\frac{x^2 + xy}{n}\big )+\fancyscript{O}\big (\frac{yx^2}{n^2}\big )} \le e^{C' n^{2\gamma -1}}. \end{aligned}$$

Together with the fact that \(n!/((n-y)!\,n^y)\le 1\), we obtain that

$$\begin{aligned} \frac{\mathbb {P}[B_{t+1}=y\, |\, B_t=x]}{\mathbb {P}[Y^-_{t+1}=y\, |\, Y^-_t=x]}\le e^{C_rn^{2\gamma -1}} \end{aligned}$$
(8.5)

for a constant \(C_r>0\) independent of \(x\) and \(y\) (recall that \(\gamma <1/2\)).

The same analysis, separating the cases \(y=0,1\) and \(y\ge 2\) and using the fact that for \(2\le y\le n^\gamma \) we have

$$\begin{aligned} \frac{n!}{(n-y)!n^y}=\prod _{j=0}^{y-1}\bigg (1-\frac{j}{n}\bigg )\ge e^{-\frac{y(y-1)}{n}}\ge e^{-n^{2\gamma -1}}, \end{aligned}$$

shows that

$$\begin{aligned} \frac{\mathbb {P}[B_{t+1}=y\, |\, B_t=x]}{\mathbb {P}[Y^-_{t+1}=y\, |\, Y^-_t=x]} \ge e^{-C'_r n^{2\gamma -1}} \end{aligned}$$
(8.6)

for a constant \(C'_r>0\) independent of \(x\) and \(y\). Putting together Eqs. (8.5) and (8.6), we obtain that

$$\begin{aligned} \mathbb {P}[B_{t+1}=y\, |\, B_t=x] = \mathbb {P}[Y^-_{t+1}=y\, |\, Y^-_t=x] \big (1+\fancyscript{O}(n^{2\gamma -1})\big ), \end{aligned}$$

where the remainder is uniform in \(x\in \{1,\ldots ,n^\gamma \}\) and \(y\in \{0,\ldots ,n^\gamma \}\). As a consequence, for any \(x_0,\ldots ,x_{k-1}\in \{1,\ldots ,n^\gamma \}\) and \(x_k\in \{0,\ldots ,n^\gamma \}\), we have

$$\begin{aligned} \mathbb {P}[B_0=x_0,\ldots ,B_k=x_k]= & {} \mathbb {P}[Y^-_0=x_0,\ldots ,Y^-_k=x_k]\big (1+\fancyscript{O}(n^{2\gamma -1})\big )^k \\= & {} \mathbb {P}[Y^-_0=x_0,\ldots ,Y^-_k=x_k]\, e^{\fancyscript{O}(kn^{2\gamma -1})}. \end{aligned}$$

Summing over all paths corresponding to the event considered and using the fact that \(kn^{2\gamma -1}=o(1)\) as \(n\rightarrow \infty \), we can write that

$$\begin{aligned} \mathbb {P}_i\big [\tau _{0,n^\gamma }^{B}>k\big ] = \mathbb {P}_i\big [\tau _{0,n^\gamma }^{Y^-}>k\big ]\big (1+o(1)\big ) \end{aligned}$$

and

$$\begin{aligned} \mathbb {P}_i\big [\tau _{0,n^\gamma }^{B}\le k\, ;\, B_{\tau _{0,n^\gamma }}\ge n^\gamma \big ] = \mathbb {P}_i\big [\tau _{0,n^\gamma }^{Y^-}\le k\, ;\, Y^-_{\tau _{0,n^\gamma }}\ge n^\gamma \big ]\big (1+o(1)\big ), \end{aligned}$$

the latter being the probability that the process leaves \(\{1,\ldots ,n^\gamma -1\}\) by time \(k\) and does so by reaching or exceeding \(n^\gamma \). Now,

$$\begin{aligned} \mathbb {P}_i\big [\tau _0^{B}>k\big ] = \mathbb {P}_i\big [\tau _{0,n^\gamma }^{B}>k\big ] + \mathbb {P}_i\big [\tau _{0,n^\gamma }^{B}\le k\, ;\, \tau _0^{B}>k \big ]. \end{aligned}$$
(8.7)

From the above, the first term on the r.h.s. is equal to the corresponding term for \(Y^-\) up to a vanishing error term. As for the second term on the r.h.s., it is bounded by

$$\begin{aligned} \mathbb {P}_i\big [\tau _{0,n^\gamma }^{B}\le k\, ;\, B_{\tau _{0,n^\gamma }}\ge n^\gamma \big ]= \mathbb {P}_i\big [\tau _{0,n^\gamma }^{Y^-}\le k\, ;\, Y^-_{\tau _{0,n^\gamma }}\ge n^\gamma \big ]\big (1+o(1)\big ). \end{aligned}$$

To finish the proof, let us show that the probability that \(Y^-\), starting below \(n^\alpha \), reaches \(n^\gamma \) before becoming extinct tends to \(0\) as \(n\rightarrow \infty \). Together with Eq. (8.7), this will give us the desired result since Eq. (8.7) holds also with \(B\) replaced by \(Y^-\).

Since \(Y^-\) cannot grow beyond \(n^\gamma \) unless one of the \(i\le C n^\alpha \) families emanating from an initial individual reaches \(n^{\gamma -\alpha }/C\), we have

$$\begin{aligned} \mathbb {P}_i\big [\tau _{0,n^\gamma }^{Y^-}\le k\, ;\, Y^-_{\tau _{0,n^\gamma }} \ge n^\gamma \big ]\le & {} i\mathbb {P}_1\big [Y^- \hbox { ever reaches }n^{\gamma -\alpha }/C\big ]\\\le & {} n^\alpha \sum _{j=1}^\infty \mathbb {P}_1\big [Y^-_j\ge n^{\gamma -\alpha }/C \big ]. \end{aligned}$$

But \(\mathbb {E}_1[Y^-_j]=(1-r)^j\), and so the Markov inequality applied to each term in the sum gives us

$$\begin{aligned} \mathbb {P}_i\big [\tau _{0,n^\gamma }^{Y^-}\le k\, ;\, Y^-_{\tau _{0,n^\gamma }} \ge n^\gamma \big ] \le n^\alpha \sum _{j=0}^\infty Cn^{\alpha -\gamma }(1-r)^j = \frac{C}{r}\,n^{2\alpha -\gamma }. \end{aligned}$$

As \(\gamma >2\alpha \), this quantity goes to zero as \(n\) tends to infinity.
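The one-step comparison at the heart of the proof of \((ii)\) can be checked numerically. The sketch below evaluates the logarithm of the ratio between the \(\mathtt {Bin}(n,(1-r)\frac{x}{n}+r\frac{x^2}{n^2})\) and \(\mathtt {Poisson}((1-r)x)\) probabilities of the value \(y\) over a grid of \(x,y\le n^\gamma \), and prints the largest deviation next to \(n^{2\gamma -1}\). The function names and the values of \(n\), \(r\) and \(\gamma \) are illustrative assumptions.

```python
import math

def log_binomial_pmf(n, p, y):
    """log P[Bin(n, p) = y], using log-gamma to avoid overflow."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log1p(-p)

def log_poisson_pmf(mean, y):
    """log P[Poisson(mean) = y]."""
    return -mean + y * math.log(mean) - math.lgamma(y + 1)

def log_transition_ratio(n, r, x, y):
    """log of P[B_{t+1}=y | B_t=x] / P[Y^-_{t+1}=y | Y^-_t=x]."""
    p = (1.0 - r) * x / n + r * x * x / (n * n)
    return log_binomial_pmf(n, p, y) - log_poisson_pmf((1.0 - r) * x, y)

if __name__ == "__main__":
    n, r, gamma = 10**6, 0.3, 0.4  # illustrative values (assumptions)
    cap = int(n ** gamma)          # largest x and y considered
    scale = n ** (2 * gamma - 1)
    worst = max(abs(log_transition_ratio(n, r, x, y))
                for x in range(1, cap + 1, 10)
                for y in range(0, cap + 1, 10))
    print(f"n^gamma={cap}, n^(2*gamma-1)={scale:.4f}, max |log ratio|={worst:.4f}")
```

The largest deviation should be of the same order as \(n^{2\gamma -1}\), which is the scale appearing in Eq. (8.5).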

Appendix B: Proof of Eq. (8.1)

Here we derive the approximation of Eq. (8.1) in detail.

Note first that we have the following approximation:

$$\begin{aligned} \frac{n_{[j]}}{n^j} := \frac{n (n-1) \cdots (n-(j-1))}{n^j} = \prod _{k=1}^{j-1} \left( 1-\frac{k}{n}\right) = 1- \left( {\begin{array}{c}j\\ 2\end{array}}\right) \frac{1}{n}+\fancyscript{O}\left( \frac{1}{n^2}\right) . \end{aligned}$$

We will first consider some special cases. Fix \(I,J,K\) as before such that \(|I| = i, |K| = k, |J| = j\). For \(M\subseteq I\), let \(A(M)\) be the set of parents of vertices in \(M\).

Lemma 14

$$\begin{aligned} |B(i+s|i,s)| = \frac{(i+s)!}{2^s}. \end{aligned}$$
(8.8)

Proof

We have \(|I| = i, |K| = s, |J| = i+s\). In this case, no two vertices in \(I\) have a common parent, hence

$$\begin{aligned} |B(i+s|i,s)| = \left( {\begin{array}{c}i+s\\ 2s\end{array}}\right) \left( {\begin{array}{c}2s\\ \underbrace{2,2,\ldots ,2}_{\text {s times}}\end{array}}\right) (i-s)! = \frac{(i+s)!}{2^s}. \end{aligned}$$

The first factor is the number of ways to select \(A(K)\). The second factor is the number of ways to assign parents to vertices in \(K\), each vertex being assigned 2 distinct parents. The last factor is the number of ways to assign parents to vertices in \(I{\backslash }K\), each vertex being assigned a distinct parent.

Lemma 15

$$\begin{aligned} |B(i+s|i,s+1)| = \frac{(i+s)!((i-s-1)(i+3s+2)+4s(s+1))}{2^{s+2}} \end{aligned}$$
(8.9)

Proof

We have 3 cases.

Case 1 \(|A(K)| = 2s+2\), \(|A(I \backslash K)| = i - s - 2\), and \(A(I \backslash K) = J \backslash A(K)\). The contribution to \(|B(i+s|i,s+1)|\) in this case is

$$\begin{aligned} \left( {\begin{array}{c}i+s\\ 2s+2\end{array}}\right) \left( {\begin{array}{c}2s+2\\ \underbrace{2,2,\ldots ,2}_{\text {(s+1) times}}\end{array}}\right) \left( {\begin{array}{c}i-s-1\\ 2\end{array}}\right) (i-s-2)! = \frac{(i+s)!(i-s-1)(i-s-2)}{2^{s+2}} \end{aligned}$$
(8.10)

The first factor is the number of ways to select \(A(K)\). The second factor is the number of ways to assign parents to vertices in \(K\), each vertex being assigned 2 distinct parents. The last two factors together give the number of ways to assign parents to vertices in \(I\backslash K\), which is the number of onto maps from \(I\backslash K\) to \(J \backslash A(K)\).

Case 2 \(|A(K)| = 2s+2\), \(|A(I \backslash K)| = i - s - 1\), and \(|A(K) \cap A(I \backslash K)| = 1\). The contribution to \(|B(i+s|i,s+1)|\) in this case is

$$\begin{aligned} \left( {\begin{array}{c}i+s\\ 2s+2\end{array}}\right) \left( {\begin{array}{c}2s+2\\ \underbrace{2,2,\ldots ,2}_{\text {(s+1) times}}\end{array}}\right) (2s+2)(i-s-1)! = \frac{(i+s)!(i-s-1)(2s+2)}{2^{s+1}} \end{aligned}$$
(8.11)

The first two factors are as in Case 1. The third factor is the number of ways to select the (only) vertex in \(A(K) \cap A(I \backslash K)\). The last factor is the number of ways to assign distinct parents to vertices in \(I\backslash K\).

Case 3 \(|A(K)| = 2s+1\) and \(A(I \backslash K) = J \backslash A(K)\). We have \(|A(I \backslash K)| = i - s - 1\). Also, there are \(s-1\) vertices in \(K\) with two distinct parents each, 2 vertices in \(K\) with one common parent, and \(i-s-1\) vertices in \(I\backslash K\) with distinct parents. The contribution to \(|B(i+s|i,s+1)|\) in this case is

$$\begin{aligned} \left( {\begin{array}{c}i+s\\ 2s-2,3,i-s-1\end{array}}\right) \left( {\begin{array}{c}s+1\\ 2\end{array}}\right) 6 \left( {\begin{array}{c}2s-2\\ \underbrace{2,2,\ldots ,2}_{\text {(s-1) times}}\end{array}}\right) (i-s-1)! = \frac{(i+s)!s(s+1)}{2^{s}} \end{aligned}$$
(8.12)

The first factor gives the number of partitions of \(J\) into 3 parts as described above. The second factor is the number of ways to select the two vertices in \(K\) that have a common parent; and the third factor is the number of ways to assign 2 parents to each of them, with one parent in common. The fourth factor is the number of ways to assign parents to the remaining \(s-1\) vertices in \(K\), each vertex being assigned 2 distinct parents. The last factor is the number of ways to assign distinct parents to vertices in \(I\backslash K\).

Now \(|B(i+s|i,s+1)|\) is obtained by adding the contributions in Eqs. (8.10), (8.11) and (8.12).
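The closed forms (8.8) and (8.9) can be checked by exhaustive enumeration for small \(i\) and \(s\). The sketch below rests on an assumed reading of the definition from Sect. 4, inferred from the counting arguments above: \(|B(j|i,k)|\) counts the parent assignments in which \(k\) designated vertices of \(I\) each receive an unordered pair of distinct parents in \(J\), the remaining \(i-k\) vertices receive a single parent, and every vertex of \(J\) is used at least once. All function names are hypothetical.

```python
from itertools import combinations, product
from math import factorial

def count_B(j, i, k):
    """Brute-force |B(j | i, k)| under the assumed definition:
    the first k of the i child vertices each get an unordered pair of distinct
    parents in {0..j-1}, the other i-k get one parent, and all j parents are used.
    """
    pairs = [frozenset(c) for c in combinations(range(j), 2)]
    singles = [frozenset((p,)) for p in range(j)]
    choices = [pairs] * k + [singles] * (i - k)
    everyone = frozenset(range(j))
    return sum(1 for pick in product(*choices)
               if frozenset().union(*pick) == everyone)

def lemma14(i, s):
    """Right-hand side of Eq. (8.8)."""
    return factorial(i + s) // 2 ** s

def lemma15(i, s):
    """Right-hand side of Eq. (8.9); the quotient is an integer count."""
    num = factorial(i + s) * ((i - s - 1) * (i + 3 * s + 2) + 4 * s * (s + 1))
    return num // 2 ** (s + 2)

if __name__ == "__main__":
    for i in range(1, 4):
        for s in range(0, i + 1):      # Lemma 14: k = s, j = i + s
            assert count_B(i + s, i, s) == lemma14(i, s)
        for s in range(0, i):          # Lemma 15: k = s + 1, j = i + s
            assert count_B(i + s, i, s + 1) == lemma15(i, s)
    print("Eqs. (8.8) and (8.9) agree with brute force for i = 1, 2, 3")
```

The enumeration grows like \(\left( {\begin{array}{c}j\\ 2\end{array}}\right) ^k j^{\,i-k}\), so it is only practical for very small cases, which is enough for a sanity check.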

Let us now return to the proof of Eq. (8.1) and consider the case \(j\ge i\). We must have \(k \ge j-i\), i.e., at least as many recombinants as additional lineages. To find an approximation of \(^{n,r}P_{i,j}\), we use the expression obtained in Theorem 1 [more precisely, we use Eq. (4.4)]. We first evaluate the order of magnitude (in \(n\)) of each term appearing in the sum over \(k\in \{j-i,\ldots ,i\}\). We have

$$\begin{aligned} \left( {\begin{array}{c}n\\ j\end{array}}\right) \frac{1}{n^{i-k} \left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{k}} = \frac{2^kn!}{j!(n-j)!n^i(n-1)^k} = \frac{2^k}{j! n^{i-j}(n-1)^k}\,\bigg (1- \left( {\begin{array}{c}j\\ 2\end{array}}\right) \frac{1}{n}+\fancyscript{O}\left( \frac{1}{n^2}\right) \bigg ). \end{aligned}$$
(8.13)

Hence, the term corresponding to \(k\) will be of the order of \(\fancyscript{O}(n^{-2})\) whenever \(k\ge j-i+2\). From now on, we thus consider the terms \(k=j-i\) and \(k=j-i+1\) only. Let us write \(j=i+s\), with \(0\le s\le i\). Suppose first that \(k=s\). Using again the notation \(|B(j|i,k)|\) for the number of bipartite graphs defined in Sect. 4 (where we replaced the sets \(I\), \(J\), \(K\) by their cardinalities since only these quantities matter), we can write

$$\begin{aligned} \left( {\begin{array}{c}n\\ i+s\end{array}}\right)&\frac{1}{n^{i-s}\left( {\begin{array}{c}n\\ 2\end{array}}\right) ^s}\, \left( {\begin{array}{c}i\\ s\end{array}}\right) r^s(1-r)^{i-s} |B(i+s|i,s)| \\&= \bigg (1- \left( {\begin{array}{c}i+s\\ 2\end{array}}\right) \frac{1}{n}+\fancyscript{O}\bigg (\frac{1}{n^2}\bigg )\bigg )\frac{2^s}{(i+s)!(1-1/n)^s}\, \left( {\begin{array}{c}i\\ s\end{array}}\right) r^s(1-r)^{i-s}\frac{(i+s)!}{2^s}\\&= \left( {\begin{array}{c}i\\ s\end{array}}\right) r^s(1-r)^{i-s} \bigg (1- \left( {\begin{array}{c}i+s\\ 2\end{array}}\right) \frac{1}{n}+\fancyscript{O}\bigg (\frac{1}{n^2}\bigg )\bigg )\bigg (1+\frac{s}{n}+ \fancyscript{O}\bigg (\frac{1}{n^2}\bigg )\bigg )\\&= \left( {\begin{array}{c}i\\ s\end{array}}\right) r^s(1-r)^{i-s} \bigg (1- \bigg \{\left( {\begin{array}{c}i+s\\ 2\end{array}}\right) -s\bigg \} \frac{1}{n}+\fancyscript{O}\bigg (\frac{1}{n^2}\bigg )\bigg ), \end{aligned}$$

where the first equality uses Eqs. (8.13) and (8.8).

Let us now suppose that \(k=s+1\). This time, we have

$$\begin{aligned} \left( {\begin{array}{c}n\\ i+s\end{array}}\right)&\frac{1}{n^{i-s-1}\left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{s+1}}\, \left( {\begin{array}{c}i\\ s+1\end{array}}\right) r^{s+1}(1-r)^{i-s-1} |B(i+s|i,s+1)| \\&= \frac{2^{s+1}}{(i+s)!(n-1)(1-1/n)^s}\left( 1-\left( {\begin{array}{c}i+s\\ 2\end{array}}\right) \frac{1}{n}+\fancyscript{O}\left( \frac{1}{n^2}\right) \right) \left( {\begin{array}{c}i\\ s+1\end{array}}\right) r^{s+1}(1-r)^{i-s-1}\\&\quad \times \frac{(i+s)!\big ((i-s-1)(i+3s+2)+4s(s+1)\big )}{2^{s+2}}\\&= \frac{1}{2(n-1)}\left( {\begin{array}{c}i\\ s+1\end{array}}\right) r^{s+1}(1-r)^{i-s-1}\big ((i-s-1)(i+3s+2)+4s(s+1)\big ) +\fancyscript{O}\left( \frac{1}{n^2}\right) , \end{aligned}$$

where we have used Eq. (8.9). Combining the above, we obtain the desired approximation when \(j\ge i\).

Next, we consider the case where \(j = i-s\), with \(s > 0\). Using again Eq. (8.13), we see that the terms appearing in the sum over \(k\) in the expression of \(^{n,r}P_{i,j}\) will be \(\fancyscript{O}(1/n^2)\) whenever \(s+k\ge 2\), which will be the case whenever \(k\ge 1\) or \(k=0\) and \(s\ge 2\). We thus concentrate on the case \(k=0\) and \(s=1\) only, corresponding to the scenario where a single pair of lineages coalesces and no recombinations occur. Since there are \(\left( {\begin{array}{c}i\\ 2\end{array}}\right) \) possible choices for the pair of lineages that coalesces and then \((i-1)!\) possible allocations of parents, we have \(|B(i-1|i,0)| = (i-1)!\left( {\begin{array}{c}i\\ 2\end{array}}\right) \) and thus

$$\begin{aligned} \left( {\begin{array}{c}n\\ i-1\end{array}}\right) \frac{1}{n^i}\,(1-r)^i|B(i-1|i,0)|&= \frac{1}{(i-1)!\, n}\left( 1-\fancyscript{O}\left( \frac{1}{n}\right) \right) (1-r)^i (i-1)!\left( {\begin{array}{c}i\\ 2\end{array}}\right) \\&= \frac{1}{n}\,\left( {\begin{array}{c}i\\ 2\end{array}}\right) (1-r)^i \left( 1-\fancyscript{O}\left( \frac{1}{n}\right) \right) . \end{aligned}$$

This entails the approximation given on the first line of Eq. (8.1).
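The approximation for \(j=i-1\) can be compared with exact values for a concrete \(n\). The sketch below assumes that \(^{n,r}P_{i,j}\) is the sum over \(k\) of the terms \(\left( {\begin{array}{c}n\\ j\end{array}}\right) n^{-(i-k)}\left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{-k}\left( {\begin{array}{c}i\\ k\end{array}}\right) r^k(1-r)^{i-k}|B(j|i,k)|\), which is the form of the terms displayed in this appendix; Eq. (4.4) itself is not reproduced here. It also assumes the same reading of \(|B(j|i,k)|\) as in the counting arguments above, computed by inclusion-exclusion over unused parents rather than by the closed forms. Function names and parameter values are hypothetical.

```python
from fractions import Fraction
from math import comb

def count_B(j, i, k):
    """|B(j | i, k)| by inclusion-exclusion over the parents left unused
    (assumed definition: k children with 2 distinct parents, i-k with 1,
    every one of the j parents used)."""
    return sum((-1) ** m * comb(j, m) * comb(j - m, 2) ** k * (j - m) ** (i - k)
               for m in range(j + 1))

def transition_probability(n, r, i, j):
    """Exact P_{i,j} as a Fraction, summing the k-terms in the assumed form."""
    total = Fraction(0)
    for k in range(i + 1):
        weight = Fraction(comb(n, j), n ** (i - k) * comb(n, 2) ** k)
        total += weight * comb(i, k) * r ** k * (1 - r) ** (i - k) * count_B(j, i, k)
    return total

if __name__ == "__main__":
    n, r, i = 1000, Fraction(1, 4), 3  # illustrative values (assumptions)
    row = {j: transition_probability(n, r, i, j) for j in range(1, 2 * i + 1)}
    print("row sum:", sum(row.values()))
    # First-order term for a single coalescence and no recombination
    approx = Fraction(comb(i, 2)) * (1 - r) ** i / n
    print("P_{i,i-1} exact :", float(row[i - 1]))
    print("first-order term:", float(approx))
```

Using exact rational arithmetic makes the row sum a clean consistency check on the assumed form, and the gap between the exact entry and the first-order term should shrink like \(1/n^2\).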

Cite this article

Sainudiin, R., Thatte, B. & Véber, A. Ancestries of a recombining diploid population. J. Math. Biol. 72, 363–408 (2016). https://doi.org/10.1007/s00285-015-0886-z
