Abstract
We introduce two models for random trees with multiple states, motivated by studies of trait dependence in the evolution of species. Our discrete-time model, the multiple state ERM tree, is a generalization of Markov propagation models on a random tree generated by a binary search or ‘equal rates Markov’ mechanism. Our continuous-time model, the multiple state Yule tree, is a generalization of the tree generated by a pure birth or Yule process to the tree generated by multi-type branching processes. We study state-dependent topological properties of these two random tree models. We derive asymptotic results that allow one to infer model parameters from data on states at the leaves and at branch-points that are one step away from the leaves.
Acknowledgments
We thank the referees for constructive comments and suggestions which improved the paper’s exposition. This research was supported by NSERC (Natural Sciences and Engineering Research Council of Canada) Discovery Grant # 346197-2010.
Appendix
Proof of Lemma 12
In the event that \(\varvec{Z}(T)=0\) there is nothing to prove, so we consider \(\varvec{W}\) on the event \(\varvec{Z}(T)\ne 0\Leftrightarrow \varvec{W}(0)\ne 0\) (and \(\varvec{W}(T)\ne 0\) as well).
For any \(n\ge 1\) and any times \(0\le t_0\le t_1\le \cdots \le t_n\le T\), we denote the joint distribution of \(\varvec{W}\) at these times by
We first show, by induction, that \(\forall n\ge 1\)
This is evident for \(n=2\). Assume the equation is true \(\forall i\le n-1\) with \(n>2\). Notice that
The branching property of the birth-death process \(\varvec{Z}\) guarantees independence of its subtrees originating from non-overlapping subsets of individuals present at any time \(t_1\). Since all individuals surviving at time T must be descendants of the process \(\varvec{W}\), we have
where \(C_{\varvec{z}_1,\varvec{w}_1}\) denotes the combinatorial number of distinct ways of choosing \(\varvec{w}_1\) out of \(\varvec{z}_1\) individuals, and \(p_{\varvec{z}}^{\varvec{0}}(t,T)={\mathbb {P}}[\varvec{Z}(T)=0|\varvec{Z}(t)=\varvec{z}]\) is the extinction probability by time T of the process \(\varvec{Z}\) started at time t with \(\varvec{Z}(t)=\varvec{z}\).
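As a hedged illustration of the two quantities just defined (a sketch under the assumptions that individuals evolve independently and that lineages are counted separately for each of the \(k\) states, not a restatement of the paper's own display), write \(z_i\) and \(w_i\) for the i-th coordinates of \(\varvec{z}\) and \(\varvec{w}\), and \(\varvec{e}_i\) for the i-th unit vector. The branching property then gives
\[
p_{\varvec{z}}^{\varvec{0}}(t,T)=\prod_{i=1}^{k}\bigl(p_{\varvec{e}_i}^{\varvec{0}}(t,T)\bigr)^{z_i},
\qquad
C_{\varvec{z},\varvec{w}}=\prod_{i=1}^{k}\binom{z_i}{w_i},
\]
so the probability that exactly \(\varvec{w}\) of the \(\varvec{z}\) individuals present at time t have a lineage surviving to time T would be
\[
C_{\varvec{z},\varvec{w}}\prod_{i=1}^{k}\bigl(1-p_{\varvec{e}_i}^{\varvec{0}}(t,T)\bigr)^{w_i}\bigl(p_{\varvec{e}_i}^{\varvec{0}}(t,T)\bigr)^{z_i-w_i}.
\]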
Given \(\varvec{Z}({t_1})=\varvec{w}_1\), the process \((\varvec{Z}(t))_{t\ge t_1}\) is the sum of birth-death processes defined by subtrees \(\{\mathcal T^{(i)}\}, i=1,\ldots ,|\varvec{w}_1|\), each originating from one of the \(|\varvec{w}_1|\) individuals at time \(t_1\). We may assume that each \({\mathcal {T}}^{(i)}\) is started by an individual of state \(\tau ^{(i)}\), where \(\tau ^{(1)},\ldots ,\tau ^{(|\varvec{w}_1|)}\) is some ordering of the \(|\varvec{w}_1|\) surviving originator states. The probability for the surviving lineages is
where \(\varvec{W}(t)({\mathcal {T}}^{(i)})\) denotes the number of individuals of \({\mathcal {T}}^{(i)}\) at time t which have a surviving lineage at time T. Since the subtrees \({\mathcal {T}}^{(i)}\) are independent
where \(\varvec{e}_i\) denotes the unit k-dimensional vector whose i-th coordinate is 1 and all other coordinates are 0, and the summation is over all possible decompositions of \(\varvec{w}_j\) into vectors \((\varvec{w}_j^{(i)})_{i=1,\ldots ,|\varvec{w}_1|}\) with all nonzero coordinate values, for each \(j=2,\dots , n\). By the inductive hypothesis (10) for \(n-1\), the probabilities in the product on the right side are equal to
where the last equality follows from (11) and (12) since
As the first factor on the right side above does not depend on \((\varvec{w}_n^{(i)})_{i=1,\ldots ,|\varvec{w}_1|}\) the sum in (13) may be split into outer sums, over \(2\le j\le n-1\), and an inner sum, over \(j=n\) that is equal to
By the same argument using splitting over independent subtrees, but this time splitting the individuals at time \(t_{n-1}\) into subsets of sizes \((\varvec{w}_{n-1}^{(i)})_{i=1,\ldots ,|\varvec{w}_1|}\), we can show that this sum contributes to the outer sums a factor of
where the last equality follows again from Eqs. (11) and (12), and combining with the outer sums in (13) implies
as desired. By using once again Eqs. (11) and (12), this becomes Eq. (10) for step n. Equation (10) may be written in terms of conditional probabilities as
which implies the Markov property for \((\varvec{W}(t))_{t\ge 0}\). \(\square \)
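The relationship between \(\varvec{Z}\) and \(\varvec{W}\) used throughout this proof can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a binary splitting mechanism in which an individual of state i splits at rate `birth[i]` and dies at rate `death[i]`, and each child switches to a uniformly chosen state with probability `switch`. All function names, rates and the switching rule are hypothetical choices made for illustration only.

```python
import random


def simulate_tree(T, birth, death, switch, root_type=0, seed=0):
    """Simulate a multi-type binary birth-death tree on [0, T].

    Rates and the type-switching rule are illustrative assumptions.
    Returns a list of individuals, each stored as a dict.
    """
    rng = random.Random(seed)
    k = len(birth)
    nodes = []
    stack = [(root_type, 0.0, None)]  # (type, start time, parent index)
    while stack:
        typ, start, parent = stack.pop()
        total = birth[typ] + death[typ]  # assumed strictly positive
        event = start + rng.expovariate(total)
        node = {"type": typ, "start": start, "end": min(event, T),
                "alive_at_T": event >= T, "children": []}
        idx = len(nodes)
        nodes.append(node)
        if parent is not None:
            nodes[parent]["children"].append(idx)
        # Before T, the event is a split (prob. birth/total) or a death.
        if event < T and rng.random() < birth[typ] / total:
            for _ in range(2):
                child = rng.randrange(k) if rng.random() < switch else typ
                stack.append((child, event, idx))
    # Children always have larger indices than their parent, so a reverse
    # pass marks every individual that has a descendant alive at time T.
    for node in reversed(nodes):
        node["survives"] = node["alive_at_T"] or any(
            nodes[c]["survives"] for c in node["children"])
    return nodes


def counts(nodes, t, T, k, surviving_only=False):
    """Type counts at time t: Z(t), or W(t) if surviving_only is True."""
    out = [0] * k
    for n in nodes:
        alive = n["start"] <= t and (
            t < n["end"] or (n["alive_at_T"] and t == T))
        if alive and (n["survives"] or not surviving_only):
            out[n["type"]] += 1
    return out


if __name__ == "__main__":
    T, k = 2.0, 2
    tree = simulate_tree(T, birth=[1.0, 1.5], death=[0.5, 0.5],
                         switch=0.2, seed=1)
    for t in (0.0, 1.0, 2.0):
        z = counts(tree, t, T, k)
        w = counts(tree, t, T, k, surviving_only=True)
        print(f"t={t}: Z(t)={z}, W(t)={w}")
```

On any simulated tree, `counts(..., surviving_only=True)` is dominated coordinate-wise by the full count, agrees with it at time T, and is nonzero at time 0 exactly when the population is nonempty at time T, which matches the equivalence \(\varvec{Z}(T)\ne 0\Leftrightarrow \varvec{W}(0)\ne 0\) stated at the start of the proof.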
About this article
Cite this article
Popovic, L., Rivas, M. Topology and inference for Yule trees with multiple states. J. Math. Biol. 73, 1251–1291 (2016). https://doi.org/10.1007/s00285-016-0992-6
DOI: https://doi.org/10.1007/s00285-016-0992-6
Keywords
- Ancestral tree
- Multi-type branching process
- Yule tree
- Binary search tree
- Tree topology
- Parameter reconstruction