Abstract
An evolutionary configuration (EC) is a set of aligned sequences of characters (possibly representing amino acids, DNA, RNA or natural language). We define the probability of an EC based on a given phylogenetic tree, and give an algorithm to compute this probability efficiently. From these probabilities, we can compute the most likely sequence at any point in the phylogenetic tree, or its probability profile. The probability profile at the root of the tree is called the probabilistic ancestral sequence. By computing the probability of an EC, we can use dynamic programming to find alignments over two subtrees. This gives an algorithm for computing multiple alignments. These multiple alignments are maximum likelihood alignments and are a compatible generalization of two-sequence alignments.
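The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a DNA alphabet, no gaps, independent alignment columns, a uniform prior at the root, and the Jukes-Cantor substitution model; the per-column computation is the standard pruning recursion over the tree. The tree encoding (a leaf is a sequence name, an internal node is a list of `(child, branch_length)` pairs) and all function names are hypothetical choices made for this illustration.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA characters, no gap symbol


def jc_prob(a, b, t):
    """Jukes-Cantor probability that character a becomes b over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def conditional(node, column):
    """Return {x: P(observed leaves below node | node carries character x)}."""
    if isinstance(node, str):  # leaf: the name of an observed sequence
        return {x: 1.0 if x == column[node] else 0.0 for x in ALPHABET}
    result = {x: 1.0 for x in ALPHABET}
    for child, t in node:  # internal node: list of (child, branch length)
        below = conditional(child, column)
        for x in ALPHABET:
            result[x] *= sum(jc_prob(x, y, t) * below[y] for y in ALPHABET)
    return result


def column_probability(tree, column):
    """Probability of one aligned column, with a uniform prior at the root."""
    cond = conditional(tree, column)
    return sum(0.25 * cond[x] for x in ALPHABET)


def root_profile(tree, column):
    """Posterior distribution of the root character for one column."""
    cond = conditional(tree, column)
    total = sum(0.25 * cond[x] for x in ALPHABET)
    return {x: 0.25 * cond[x] / total for x in ALPHABET}


def ec_probability(tree, sequences):
    """Probability of a gap-free EC, assuming columns evolve independently."""
    length = len(next(iter(sequences.values())))
    prob = 1.0
    for i in range(length):
        column = {name: seq[i] for name, seq in sequences.items()}
        prob *= column_probability(tree, column)
    return prob


def ancestral_profile(tree, sequences):
    """One root profile per column: a sketch of a probabilistic ancestral sequence."""
    length = len(next(iter(sequences.values())))
    return [
        root_profile(tree, {name: seq[i] for name, seq in sequences.items()})
        for i in range(length)
    ]


# Hypothetical three-sequence example: ((s1, s2), s3) with made-up branch lengths.
tree = [([("s1", 0.1), ("s2", 0.1)], 0.2), ("s3", 0.3)]
sequences = {"s1": "ACGT", "s2": "ACGA", "s3": "ACGT"}
profile = ancestral_profile(tree, sequences)
```

Under these assumptions, `ec_probability(tree, sequences)` returns the probability of the whole configuration, and each entry of `profile` is a distribution over `A`, `C`, `G`, `T` for the root at that column. Handling gaps, other alphabets, or an empirical substitution matrix would need a different model than the one assumed here.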
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gonnet, G.H., Benner, S.A. (1996). Probabilistic ancestral sequences and multiple alignments. In: Karlsson, R., Lingas, A. (eds) Algorithm Theory — SWAT'96. SWAT 1996. Lecture Notes in Computer Science, vol 1097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61422-2_147
DOI: https://doi.org/10.1007/3-540-61422-2_147
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61422-7
Online ISBN: 978-3-540-68529-6
eBook Packages: Springer Book Archive