Abstract
Many homology search algorithms have been proposed and studied, yet a sound classification scheme and a comprehensive comparison of them are still lacking, particularly for index based homology search algorithms. This paper briefly introduces the main index construction methods and uses them to classify index based homology search algorithms into three categories: length based index algorithms, transformation based index algorithms, and combinations of the two. Based on this classification, the characteristics of currently popular index based homology search algorithms are compared and analyzed, and several promising new index techniques are discussed. Taken together, the paper provides a survey of index based homology search algorithms.
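To make the two base categories concrete, the following is a minimal sketch and not code from the paper. It assumes that a length based index means an inverted index over fixed-length substrings (k-mers), and that a transformation based index maps each string to a frequency vector whose distance gives a lower bound on edit distance. All function names, the DNA alphabet, and the toy database are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(sequences, k):
    """Length based sketch: map each length-k substring to its (seq_id, offset) occurrences."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def candidate_hits(index, query, k):
    """Count, per database sequence, how many indexed k-mer occurrences match the query's k-mers."""
    hits = Counter()
    for offset in range(len(query) - k + 1):
        for seq_id, _ in index.get(query[offset:offset + k], ()):
            hits[seq_id] += 1
    return hits


def frequency_vector(window, alphabet="ACGT"):
    """Transformation based sketch: represent a string by its letter counts (assumed alphabet)."""
    counts = Counter(window)
    return tuple(counts[c] for c in alphabet)


def frequency_distance_bound(u, v):
    """Lower bound on the edit distance between two strings, computed from their frequency vectors.

    Each insertion, deletion, or substitution changes at most one count upward
    and at most one count downward by 1, so the larger of the total surplus and
    the total deficit cannot exceed the number of edit operations.
    """
    surplus = sum(a - b for a, b in zip(u, v) if a > b)
    deficit = sum(b - a for a, b in zip(u, v) if b > a)
    return max(surplus, deficit)


if __name__ == "__main__":
    database = ["ACGTACGT", "TTTTGGGG"]  # toy data, illustrative only
    index = build_kmer_index(database, k=3)
    print(candidate_hits(index, "CGTAC", k=3))
    print(frequency_distance_bound(frequency_vector("ACGTAC"),
                                   frequency_vector("AAGTAC")))
```

In both cases the index acts as a filter: sequences with many shared k-mers, or windows whose frequency-vector bound is within the allowed error, would be passed on to a full alignment step, while the rest are discarded without alignment. A combined scheme in this sketch would apply both filters in turn.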
Cite this article
Jiang, X., Zhang, P., Liu, X. et al. Survey on index based homology search algorithms. J Supercomput 40, 185–212 (2007). https://doi.org/10.1007/s11227-006-0041-0
DOI: https://doi.org/10.1007/s11227-006-0041-0