Chapter Summary
How to discover evolutionary homology of nucleotide and amino acid sequences and how to analyze these homologous sequences are discussed, including homology search, pairwise alignment, multiple alignment, and genome-wide sequence viewing.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.
Karlin, S., & Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences USA, 87, 2264–2268.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25, 3389–3402.
Zhang, Z., Schwartz, S., Wagner, L., & Miller, W. (2000). A greedy algorithm for aligning DNA sequences. Journal of Computational Biology, 7, 203–214.
NCBI BLAST. (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch).
Kitano, T., Sumiyama, K., Shiroishi, T., & Saitou, N. (1998). Conserved evolution of the Rh50 gene compared to its homologous Rhblood group gene. Biochemical and Biophysical Research Communications, 249, 78–85.
DDBJ. (https://www.ddbj.nig.ac.jp/).
DDBJ getentry. (http://getentry.ddbj.nig.ac.jp).
DDBJ ARSA. (http://ddbj.nig.ac.jp/arsa/).
DDBJ BLAST. (http://ddbj.nig.ac.jp/arsa/).
NCBI blastp. (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome).
Lipman, D. J., & Pearson, W. R. (1985). Rapid and sensitive protein similarity searches. Science, 227, 1435–1441.
Pearson, W. R., & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences USA, 85, 2444–2448.
Kent, W. J. (2002). BLAT—The BLAST-like alignment tool. Genome Research, 12, 656–664.
Ma, B., Tromp, J., & Li, M. (2002). PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18, 440–445.
Eddy, S. R. (2009). A new generation of homology search tools based on probabilistic inference. Genome Informatics, 23, 205–211.
Waterman, M. S. (1995). Introduction to computer biology. London: Chapman & Hall.
Chao, K.-M., & Zhang, L. (2008). Sequence comparison: Theory and methods. London: Springer.
Saitou, N., & Ueda, S. (1994). Evolutionary rate of insertions and deletions in non-coding nucleotide sequences of primates. Molecular Biology and Evolution, 11, 504–512.
Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48, 443–453.
Sellers, P. H. (1974). On the theory and computation of evolutionary distances. SIAM Journal on Applied Mathematics, 26, 787–793.
Waterman, M. S., Smith, T. F., & Beyer, W. A. (1976). Some biological sequence metrics. Advances in Mathematics, 20, 367–387.
Gotoh, O. (1982). An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162, 705–708.
Altschul, S. F., & Erickson, B. W. (1986). A nonlinear measure of subalignment similarity and its significance levels. Bulletin of Mathematical Biology, 48, 603–616.
Fitch, W. (1969). Locating gaps in amino acid sequences to optimize the homology between two proteins. Biochemical Genetics, 3, 99–108.
Schulz, J., Florian Leese, F., & Held, C. (2011). Introduction to dot-plots. Web page available at http://www.code10.info/.
YASS server. (http://bioinfo.lifl.fr/yass/index.php).
Murata, M., Richardson, J. S., & Sussman, J. L. (1985). Simultaneous comparison of three protein sequences. Proceedings of National Academy of Sciences, USA, 82, 3073–3077.
Feng, D.-F., & Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, 25, 351–360.
Notredame, C. (2007). Recent evolutions of multiple sequence alignment algorithms. PLoS Computational Biology, 3, e123.
MEGA (Molecular Evolutionary Genetics Analysis). (https://www.megasoftware.net).
Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22, 4673–4680.
Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–1797.
Notredame, C., Higgins, D. G., & Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302, 205–215.
Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30, 3059–3066.
Morgenstern, B., Dress, A., & Werner, T. (1996). Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proceedings of National Academy of Sciences, USA, 93, 12098–12103.
Brudno, M., Do, C., Cooper, G., Kim, M. F., Davydov, E., Green, E. D., et al. (2003). LAGAN and multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Research, 13, 721–731.
Bray, N., & Pachter, L. (2004). MAVID: Constrained ancestral alignment of multiple sequences. Genome Research, 14, 693–699.
Darling, A. C. E., Mau, B., & Perna, N. T. (2010). ProgressiveMauve: Multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE, 5, e11147.
Kryukov, K., & Saitou, N. (2010). MISHIMA—A new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data. BMC Bioinformatics, 11, 142.
Popendorf, K., Tsuyoshi, H., Osana, Y., & Sakakibara, Y. (2010). Murasaki: A fast, parallelizable algorithm to find anchors from multiple genomes. PLoS ONE, 5, e12651.
Marcais, G., et al. (2018). MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology, 14, e1005944.
Felsenstein, J., Sawyer, S., & Kochin, R. (1982). An efficient method for matching nucleotide acid sequences. Nucleic Acids Research, 10, 133–139.
Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution, 30, 772–780.
Kryukov, K. (unpublished). MSHIMA version 2.
SeaView—Multiplatform GUI for molecular phylogeny. (http://doua.prabi.fr/software/seaview).
Sievers, F., et al. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7, 539.
Lipman, D. J., Altschul, S. F., & Kececioglu, J. D. (1989). A tool for multiple sequence alignment. Proceedings of the National Academy of Sciences of the United States of America, 86, 4412–4415.
UNIPROT. (http://www.uniprot.org).
Larkin, M. A., et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947–2948.
Subramanian, A. R., Kaufmann, M., & Morgenstern, B. (2008). DIALIGN-TX: Greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms for Molecular Biology, 3, 6.
Bradley, R. K., Roberts, A., Smoot, M., Juvekar, S., Do, J., Dewey, C., et al. (2009). Fast statistical alignment. PLoS Computational Biology, 5, e1000392.
Blanchette, M., et al. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research, 14, 708–715.
Kurtz, S., et al. (2004). Versatile and open software for comparing large genomes. Genome Biology, 5, R12.
Brudno, M., Chapman, M., Gottgens, B., Batzoglou, S., & Morgenstern, B. (2003). Fast and sensitive multiple alignment of long genomic sequences. BMC Bioinformatics, 4, 66.
Raphael, B., Zhi, D., Tang, H., & Pevzner, P. (2004). A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Research, 14, 2336–2346.
Do, C. B., Mahabhashyam, M. S. P., Brudno, M., & Batzoglou, S. (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research, 15, 330–340.
Lassmann, T., & Sonnhammer, E. L. L. (2005). Kalign—An accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics, 6, 298.
Lotynoja, A., & Goldman, N. (2005). An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences USA, 102, 10557–10562.
Sze, S.-H., Lu, Y., & Yang, Q. (2006). A polynomial time solvable formulation of multiple sequence alignment. Journal of Computational Biology, 13, 309–319.
Liu, Y., Schmidt, B., & Maskell, D. L. (2010). MSAProbs: Multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics, 26, 1958–1964.
Shih, A. C.-C., & Li, W.-H. (2003). GS-Aligner: A novel tool for aligning genomic sequences using bit-level operations. Molecular Biology and Evolution, 20, 1299–1309.
Keightley, P. D., & Johnson, T. (2004). MCALIGN: Stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution. Genome Research, 14, 442–450.
Schwartz, S., et al. (2000). PipMaker—A web server for aligning two genomic DNA sequences. Genome Research, 10, 577–586.
PipMaker and MultiPipMaker. (http://pipmaker.bx.psu.edu/pipmaker).
Matsunami, M., Sumiyama, K., & Saitou, N. (2010). Evolution of conserved non-coding sequences within the vertebrate Hox clusters through the two-round whole genome duplications revealed by phylogenetic footprinting analysis. Journal of Molecular Evolution, 71, 427–436.
VISTA. (http://genome.lbl.gov/vista/index.shtml).
UCSC (University of California, Santa Cruz) Genome Bioinformatics. (http://genome.ucsc.edu).
NCBI Genome Data Viewer. (https://www.ncbi.nlm.nih.gov/genome/gdv/).
Higgins, D. G., & Sharp, P. (1988). CLUSTAL: A package for performing multiple sequence alignment on a microcomputer. Gene, 73, 237–244.
Sokal, R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationship. University of Kansas Science Bulletin, 38, 1409–1438.
Higgins, D. G., Bleasby, A. J., & Fuchs, R. (1992). CLUSTAL V: Improved software for multiple sequence alignment. Computational Applied Biosciences, 8, 189–191.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge: Cambridge University Press.
Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.
Wilbur, W. J., & Lipman, D. (1984). The context dependent comparison of biological sequences. SIAM Journal of Applied Mathematics, 44, 557–567.
Myers, E. W., & Miller, W. (1988). Optimal alignments in linear space. CABIOS, 4, 11–15.
Clustal: Multiple Sequence Alignment. (http://www.clustal.org/).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Saitou, N. (2018). Homology Search and Multiple Alignment. In: Introduction to Evolutionary Genomics. Computational Biology, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-92642-1_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-92642-1_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92641-4
Online ISBN: 978-3-319-92642-1
eBook Packages: Computer ScienceComputer Science (R0)