Tree-based maximal likelihood substitution matrices and hidden Markov models

Mitchison, G.; Durbin, R.

doi:10.1007/BF00173195

Published: December 1995

Journal of Molecular Evolution, volume 41, pages 1139–1151 (1995)

Abstract

There has been considerable interest in the problem of making maximum likelihood (ML) evolutionary trees which allow insertions and deletions. This problem is partly one of formulation: how does one define a probabilistic model for such trees which treats insertion and deletion in a biologically plausible manner? A possible answer to this question is proposed here by extending the concept of a hidden Markov model (HMM) to evolutionary trees. The model, called a tree-HMM, allows what may be loosely regarded as learnable affine-type gap penalties for alignments. These penalties are expressed in HMMs as probabilities of transitions between states. In the tree-HMM, this idea is given an evolutionary embodiment by defining trees of transitions. Just as the probability of a tree composed of ungapped sequences is computed, by Felsenstein's method, using matrices representing the probabilities of substitutions of residues along the edges of the tree, so the probabilities in a tree-HMM are computed by substitution matrices for both residues and transitions. How to define these matrices by a ML procedure using an algorithm that learns from a database of protein sequences is shown here. Given these matrices, one can define a tree-HMM likelihood for a set of sequences, assuming a particular tree topology and an alignment of the sequences to the model. If one could efficiently find the alignment which maximizes (or comes close to maximizing) this likelihood, then one could search for the optimal tree topology for the sequences. An alignment algorithm is defined here which, given a particular tree topology, is guaranteed to increase the likelihood of the model. Unfortunately, it fails to find global optima for realistic sequence sets. Thus further research is needed to turn the tree-HMM into a practical phylogenetic tool.
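
The abstract refers to Felsenstein's method for computing the probability of a tree of ungapped sequences from substitution matrices along its edges. The following is a minimal sketch of that recursion for a single alignment column, not the authors' implementation. Several things here are assumptions made for illustration: the four-letter alphabet, the Jukes-Cantor substitution matrix (the paper instead learns its matrices from a protein database), the uniform root distribution, the nested-list tree encoding, and the function names `jc_matrix`, `conditional`, and `site_likelihood`. The tree-HMM described above would apply the same kind of recursion to transitions as well as residues; this sketch covers residues only.

```python
import math

ALPHABET = "ACGT"  # assumption: a 4-letter alphabet keeps the example short


def jc_matrix(t):
    """Jukes-Cantor substitution probabilities P[a][b] for branch length t.

    This stands in for a learned substitution matrix (an assumption).
    """
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def conditional(node):
    """Return L[a] = P(leaves below node | node carries residue a).

    A leaf is a one-letter string; an internal node is a list of
    (child, branch_length) pairs. This encoding is hypothetical.
    """
    if isinstance(node, str):
        return [1.0 if ALPHABET[a] == node else 0.0 for a in range(4)]
    result = [1.0] * 4
    for child, t in node:
        P = jc_matrix(t)
        below = conditional(child)
        for a in range(4):
            # Sum over the child's residue b, weighted by P(a -> b along the edge).
            result[a] *= sum(P[a][b] * below[b] for b in range(4))
    return result


def site_likelihood(tree):
    """Likelihood of one column, assuming a uniform distribution at the root."""
    return sum(0.25 * x for x in conditional(tree))


if __name__ == "__main__":
    # ((A:0.1, C:0.2):0.05, G:0.3), written in the nested-list encoding.
    example = [([("A", 0.1), ("C", 0.2)], 0.05), ("G", 0.3)]
    print(site_likelihood(example))
```

Under these assumptions the likelihood of a full ungapped alignment is the product of the per-column values. Searching over tree topologies, as the abstract envisages, would mean repeating this computation for each candidate tree.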

Author information

Authors and Affiliations

MRC Laboratory of Molecular Biology, Hills Road, CB2 2QH, Cambridge, England
G. Mitchison & R. Durbin

About this article

Cite this article

Mitchison, G., Durbin, R. Tree-based maximal likelihood substitution matrices and hidden Markov models. J Mol Evol 41, 1139–1151 (1995). https://doi.org/10.1007/BF00173195

Received: 13 September 1994
Accepted: 17 April 1995
Issue Date: December 1995
DOI: https://doi.org/10.1007/BF00173195
