Identifiability of a Markovian model of molecular evolution with gamma-distributed rates

Elizabeth S. Allman; Cécile Ané; John A. Rhodes

doi:10.1239/aap/1208358894

Published online by Cambridge University Press: 01 July 2016

Affiliations

Elizabeth S. Allman: University of Alaska Fairbanks
Cécile Ané: University of Wisconsin Madison
John A. Rhodes: University of Alaska Fairbanks

Postal address (Allman, Rhodes): Department of Mathematics and Statistics, University of Alaska Fairbanks, PO Box 756660, Fairbanks, AK 99775, USA.
Postal address (Ané): Department of Statistics, University of Wisconsin Madison, Medical Science Center, 1300 University Avenue, Madison, WI 53706, USA. Email address: ane@stat.wisc.edu

Abstract

Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its own rate. Very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis, although nonidentifiability was proven for a semiparametric model and an incorrect proof of identifiability was published for a general parametric model (GTR + Γ + I). Here we prove that one of the most widely used models (GTR + Γ) is identifiable for generic parameters, and for all parameter choices in the case of four-state (DNA) models. This is the first proof of identifiability of a phylogenetic model with a continuous distribution of rates.
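As a rough illustration of the model class named in the abstract, the sketch below computes the transition probabilities along a single edge of length t when a time-reversible (GTR) rate matrix Q is scaled by a gamma-distributed site rate r. It assumes the common convention that r has shape α and mean 1, in which case the gamma moment generating function gives E[exp(Qrt)] = (I − Qt/α)^(−α). The function names, the base frequencies, and the exchangeabilities are hypothetical choices made for this example and are not taken from the article.

```python
import numpy as np

def gtr_rate_matrix(pi, s):
    """Hypothetical helper: GTR rate matrix with q_ij = s_ij * pi_j (i != j),
    scaled so that the expected substitution rate at stationarity is 1."""
    pi = np.asarray(pi, dtype=float)
    Q = np.asarray(s, dtype=float) * pi[None, :]
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))      # each row sums to zero
    return Q / -(pi @ np.diag(Q))            # normalise the mean rate to 1

def gamma_averaged_transition(Q, pi, t, alpha):
    """E[exp(Q r t)] for r ~ Gamma(shape=alpha, mean=1) (assumed convention)."""
    d = np.sqrt(np.asarray(pi, dtype=float))
    # Reversibility makes D^{1/2} Q D^{-1/2} symmetric, with D = diag(pi).
    S = (d[:, None] * Q) / d[None, :]
    S = (S + S.T) / 2.0                      # remove round-off asymmetry
    lam, U = np.linalg.eigh(S)               # real eigenvalues, all <= 0
    # Gamma moment generating function applied to each eigenvalue.
    mgf = (1.0 - lam * t / alpha) ** (-alpha)
    M = (U * mgf) @ U.T
    return (M / d[:, None]) * d[None, :]     # undo the symmetrising change of basis

# Illustrative parameter values only.
pi = np.array([0.1, 0.2, 0.3, 0.4])
s = np.array([[0, 1, 2, 1],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [1, 2, 1, 0]], dtype=float)
Q = gtr_rate_matrix(pi, s)
P = gamma_averaged_transition(Q, pi, t=0.5, alpha=0.7)
print(P.sum(axis=1))                         # each row should sum to 1
```

The eigendecomposition route is used here because the gamma average then reduces to a scalar formula applied to each eigenvalue. Identifiability in the sense of the abstract asks whether the joint distribution of tip observations built from such matrices determines the tree, Q, and α; the sketch only constructs one edge's matrix and does not address that question.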

Keywords

Phylogenetics; identifiability

MSC classification

Primary: 60J25: Continuous-time Markov processes on general state spaces

Secondary: 92D15: Problems related to evolution; 92D20: Protein sequences, DNA sequences

Type: General Applied Probability
Information: Advances in Applied Probability, Volume 40, Issue 1, March 2008, pp. 229–249

DOI: https://doi.org/10.1239/aap/1208358894
Copyright: Copyright © Applied Probability Trust 2008
