Homology Search and Multiple Alignment

Saitou, Naruya

doi:10.1007/978-3-319-92642-1_15

Naruya Saitou²⁴

Part of the book series: Computational Biology ((COBO,volume 17))

1836 Accesses

Chapter Summary

How to discover evolutionary homology of nucleotide and amino acid sequences and how to analyze these homologous sequences are discussed, including homology search, pairwise alignment, multiple alignment, and genome-wide sequence viewing.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

eBook: USD 16.99; Price excludes VAT (USA)

Softcover Book: USD 64.99; Price excludes VAT (USA)

Hardcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.
Article Google Scholar
http://www.ncbi.nlm.nih.gov/books/NBK21097/.
Karlin, S., & Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences USA, 87, 2264–2268.
Article Google Scholar
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25, 3389–3402.
Article Google Scholar
Zhang, Z., Schwartz, S., Wagner, L., & Miller, W. (2000). A greedy algorithm for aligning DNA sequences. Journal of Computational Biology, 7, 203–214.
Article Google Scholar
NCBI BLAST. (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch).
Kitano, T., Sumiyama, K., Shiroishi, T., & Saitou, N. (1998). Conserved evolution of the Rh50 gene compared to its homologous Rhblood group gene. Biochemical and Biophysical Research Communications, 249, 78–85.
Article Google Scholar
DDBJ. (https://www.ddbj.nig.ac.jp/).
DDBJ getentry. (http://getentry.ddbj.nig.ac.jp).
DDBJ ARSA. (http://ddbj.nig.ac.jp/arsa/).
DDBJ BLAST. (http://ddbj.nig.ac.jp/arsa/).
NCBI blastp. (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome).
Lipman, D. J., & Pearson, W. R. (1985). Rapid and sensitive protein similarity searches. Science, 227, 1435–1441.
Article Google Scholar
Pearson, W. R., & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences USA, 85, 2444–2448.
Article Google Scholar
Kent, W. J. (2002). BLAT—The BLAST-like alignment tool. Genome Research, 12, 656–664.
Article Google Scholar
http://genome.ucsc.edu/FAQ/FAQblat.html.
Ma, B., Tromp, J., & Li, M. (2002). PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18, 440–445.
Article Google Scholar
Eddy, S. R. (2009). A new generation of homology search tools based on probabilistic inference. Genome Informatics, 23, 205–211.
Google Scholar
http://hmmer.org.
Waterman, M. S. (1995). Introduction to computer biology. London: Chapman & Hall.
Book Google Scholar
Chao, K.-M., & Zhang, L. (2008). Sequence comparison: Theory and methods. London: Springer.
Google Scholar
Saitou, N., & Ueda, S. (1994). Evolutionary rate of insertions and deletions in non-coding nucleotide sequences of primates. Molecular Biology and Evolution, 11, 504–512.
Google Scholar
Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48, 443–453.
Article Google Scholar
Sellers, P. H. (1974). On the theory and computation of evolutionary distances. SIAM Journal on Applied Mathematics, 26, 787–793.
Article MathSciNet Google Scholar
Waterman, M. S., Smith, T. F., & Beyer, W. A. (1976). Some biological sequence metrics. Advances in Mathematics, 20, 367–387.
Article MathSciNet Google Scholar
Gotoh, O. (1982). An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162, 705–708.
Article Google Scholar
Altschul, S. F., & Erickson, B. W. (1986). A nonlinear measure of subalignment similarity and its significance levels. Bulletin of Mathematical Biology, 48, 603–616.
Article MathSciNet Google Scholar
Fitch, W. (1969). Locating gaps in amino acid sequences to optimize the homology between two proteins. Biochemical Genetics, 3, 99–108.
Article Google Scholar
Schulz, J., Florian Leese, F., & Held, C. (2011). Introduction to dot-plots. Web page available at http://www.code10.info/.
YASS server. (http://bioinfo.lifl.fr/yass/index.php).
Murata, M., Richardson, J. S., & Sussman, J. L. (1985). Simultaneous comparison of three protein sequences. Proceedings of National Academy of Sciences, USA, 82, 3073–3077.
Article Google Scholar
Feng, D.-F., & Doolittle, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, 25, 351–360.
Article Google Scholar
Notredame, C. (2007). Recent evolutions of multiple sequence alignment algorithms. PLoS Computational Biology, 3, e123.
Article Google Scholar
MEGA (Molecular Evolutionary Genetics Analysis). (https://www.megasoftware.net).
Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22, 4673–4680.
Article Google Scholar
Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–1797.
Article Google Scholar
Notredame, C., Higgins, D. G., & Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302, 205–215.
Article Google Scholar
Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30, 3059–3066.
Article Google Scholar
Morgenstern, B., Dress, A., & Werner, T. (1996). Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proceedings of National Academy of Sciences, USA, 93, 12098–12103.
Article Google Scholar
Brudno, M., Do, C., Cooper, G., Kim, M. F., Davydov, E., Green, E. D., et al. (2003). LAGAN and multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Research, 13, 721–731.
Article Google Scholar
Bray, N., & Pachter, L. (2004). MAVID: Constrained ancestral alignment of multiple sequences. Genome Research, 14, 693–699.
Article Google Scholar
Darling, A. C. E., Mau, B., & Perna, N. T. (2010). ProgressiveMauve: Multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE, 5, e11147.
Article Google Scholar
Kryukov, K., & Saitou, N. (2010). MISHIMA—A new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data. BMC Bioinformatics, 11, 142.
Article Google Scholar
Popendorf, K., Tsuyoshi, H., Osana, Y., & Sakakibara, Y. (2010). Murasaki: A fast, parallelizable algorithm to find anchors from multiple genomes. PLoS ONE, 5, e12651.
Article Google Scholar
Marcais, G., et al. (2018). MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology, 14, e1005944.
Article Google Scholar
Felsenstein, J., Sawyer, S., & Kochin, R. (1982). An efficient method for matching nucleotide acid sequences. Nucleic Acids Research, 10, 133–139.
Article Google Scholar
Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution, 30, 772–780.
Article Google Scholar
Kryukov, K. (unpublished). MSHIMA version 2.
Google Scholar
SeaView—Multiplatform GUI for molecular phylogeny. (http://doua.prabi.fr/software/seaview).
Sievers, F., et al. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7, 539.
Article Google Scholar
Lipman, D. J., Altschul, S. F., & Kececioglu, J. D. (1989). A tool for multiple sequence alignment. Proceedings of the National Academy of Sciences of the United States of America, 86, 4412–4415.
Article Google Scholar
UNIPROT. (http://www.uniprot.org).
Larkin, M. A., et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947–2948.
Article Google Scholar
Subramanian, A. R., Kaufmann, M., & Morgenstern, B. (2008). DIALIGN-TX: Greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms for Molecular Biology, 3, 6.
Article Google Scholar
Bradley, R. K., Roberts, A., Smoot, M., Juvekar, S., Do, J., Dewey, C., et al. (2009). Fast statistical alignment. PLoS Computational Biology, 5, e1000392.
Article MathSciNet Google Scholar
Blanchette, M., et al. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research, 14, 708–715.
Article Google Scholar
Kurtz, S., et al. (2004). Versatile and open software for comparing large genomes. Genome Biology, 5, R12.
Article Google Scholar
Brudno, M., Chapman, M., Gottgens, B., Batzoglou, S., & Morgenstern, B. (2003). Fast and sensitive multiple alignment of long genomic sequences. BMC Bioinformatics, 4, 66.
Article Google Scholar
Raphael, B., Zhi, D., Tang, H., & Pevzner, P. (2004). A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Research, 14, 2336–2346.
Article Google Scholar
Do, C. B., Mahabhashyam, M. S. P., Brudno, M., & Batzoglou, S. (2005). ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research, 15, 330–340.
Article Google Scholar
Lassmann, T., & Sonnhammer, E. L. L. (2005). Kalign—An accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics, 6, 298.
Article Google Scholar
Lotynoja, A., & Goldman, N. (2005). An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences USA, 102, 10557–10562.
Article Google Scholar
Sze, S.-H., Lu, Y., & Yang, Q. (2006). A polynomial time solvable formulation of multiple sequence alignment. Journal of Computational Biology, 13, 309–319.
Article MathSciNet Google Scholar
Liu, Y., Schmidt, B., & Maskell, D. L. (2010). MSAProbs: Multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics, 26, 1958–1964.
Article Google Scholar
Shih, A. C.-C., & Li, W.-H. (2003). GS-Aligner: A novel tool for aligning genomic sequences using bit-level operations. Molecular Biology and Evolution, 20, 1299–1309.
Article Google Scholar
Keightley, P. D., & Johnson, T. (2004). MCALIGN: Stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution. Genome Research, 14, 442–450.
Article Google Scholar
Schwartz, S., et al. (2000). PipMaker—A web server for aligning two genomic DNA sequences. Genome Research, 10, 577–586.
Article Google Scholar
PipMaker and MultiPipMaker. (http://pipmaker.bx.psu.edu/pipmaker).
Matsunami, M., Sumiyama, K., & Saitou, N. (2010). Evolution of conserved non-coding sequences within the vertebrate Hox clusters through the two-round whole genome duplications revealed by phylogenetic footprinting analysis. Journal of Molecular Evolution, 71, 427–436.
Article Google Scholar
VISTA. (http://genome.lbl.gov/vista/index.shtml).
UCSC (University of California, Santa Cruz) Genome Bioinformatics. (http://genome.ucsc.edu).
NCBI Genome Data Viewer. (https://www.ncbi.nlm.nih.gov/genome/gdv/).
Higgins, D. G., & Sharp, P. (1988). CLUSTAL: A package for performing multiple sequence alignment on a microcomputer. Gene, 73, 237–244.
Article Google Scholar
Sokal, R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationship. University of Kansas Science Bulletin, 38, 1409–1438.
Google Scholar
Higgins, D. G., Bleasby, A. J., & Fuchs, R. (1992). CLUSTAL V: Improved software for multiple sequence alignment. Computational Applied Biosciences, 8, 189–191.
Google Scholar
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
Article Google Scholar
Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge: Cambridge University Press.
Book Google Scholar
Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.
Google Scholar
Wilbur, W. J., & Lipman, D. (1984). The context dependent comparison of biological sequences. SIAM Journal of Applied Mathematics, 44, 557–567.
Article MathSciNet Google Scholar
Myers, E. W., & Miller, W. (1988). Optimal alignments in linear space. CABIOS, 4, 11–15.
Google Scholar
Clustal: Multiple Sequence Alignment. (http://www.clustal.org/).

Download references

Author information

Authors and Affiliations

Division of Population Genetics, National Institute of Genetics (NIG), Mishima, Shizuoka, Japan
Naruya Saitou

Authors

Naruya Saitou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Naruya Saitou .

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Saitou, N. (2018). Homology Search and Multiple Alignment. In: Introduction to Evolutionary Genomics. Computational Biology, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-92642-1_15

Download citation

DOI: https://doi.org/10.1007/978-3-319-92642-1_15
Published: 26 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92641-4
Online ISBN: 978-3-319-92642-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics