Evolving Better Multiple Sequence Alignments

  • Conference paper
Genetic and Evolutionary Computation – GECCO 2004

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3102))

Abstract

Aligning multiple DNA or protein sequences is a fundamental step in the analysis of phylogeny, homology, and molecular structure. Because optimal multiple sequence alignment is prohibitively expensive, heuristic algorithms are applied instead. Heuristic alignment algorithms represent a practical trade-off between speed and accuracy, but they can be improved. We present EVALYN (EVolved ALYNments), a novel approach to multiple sequence alignment in which sequences are progressively aligned based on a guide tree optimized by a genetic algorithm. We hypothesize that a genetic algorithm can find better guide trees than traditional, deterministic clustering algorithms. We compare our evolutionary approach to CLUSTAL W and find that EVALYN performs consistently and significantly better as measured by a common alignment scoring technique. Additionally, we hypothesize that evolutionary guide tree optimization is inherently efficient and has lower time complexity than the commonly used neighbor-joining algorithm. We present a compelling analysis in support of this scalability hypothesis.
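The idea of optimizing a guide tree with a genetic algorithm can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the nested-tuple tree encoding, the leaf-swap mutation, the truncation selection scheme, and in particular `toy_fitness`, which stands in for the score of the progressive alignment that a real system would build along each candidate tree.

```python
import random

# A guide tree is either a leaf (a sequence index, an int) or a pair (left, right).

def leaves(tree):
    """Return the leaf indices of a tree in left-to-right order."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def random_tree(items, rng):
    """Build a random binary tree by repeatedly joining two random subtrees."""
    nodes = list(items)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]

def swap_leaves(tree, rng):
    """Mutation (hypothetical operator): exchange two leaves, keeping the shape."""
    i, j = rng.sample(leaves(tree), 2)

    def relabel(t):
        if isinstance(t, int):
            return j if t == i else i if t == j else t
        return (relabel(t[0]), relabel(t[1]))

    return relabel(tree)

def toy_fitness(tree):
    """Stand-in score: penalize joining subtrees whose smallest leaf indices
    are far apart. A real fitness would score the resulting alignment."""
    if isinstance(tree, int):
        return 0
    left, right = tree
    gap = abs(min(leaves(left)) - min(leaves(right)))
    return toy_fitness(left) + toy_fitness(right) - gap

def evolve(n_seqs, fitness, generations=50, pop_size=20, seed=0):
    """Truncation selection with elitism: keep the better half of the
    population, refill with mutated copies of survivors. Needs n_seqs >= 2."""
    rng = random.Random(seed)
    pop = [random_tree(range(n_seqs), rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = [swap_leaves(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve(6, toy_fitness)
print(best, toy_fitness(best))
```

Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next. Replacing `toy_fitness` with a function that performs progressive alignment along the tree and returns the alignment score would bring the sketch closer to the approach the abstract describes.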

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sheneman, L., Foster, J.A. (2004). Evolving Better Multiple Sequence Alignments. In: Deb, K. (eds) Genetic and Evolutionary Computation – GECCO 2004. GECCO 2004. Lecture Notes in Computer Science, vol 3102. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24854-5_45

  • DOI: https://doi.org/10.1007/978-3-540-24854-5_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22344-3

  • Online ISBN: 978-3-540-24854-5

  • eBook Packages: Springer Book Archive
