Survey on index based homology search algorithms

The Journal of Supercomputing

Abstract

Many homology search algorithms have been proposed and studied, but a sound classification scheme and a comprehensive comparison of them are still lacking. This is especially true for index based homology search algorithms. This paper briefly introduces the main index construction methods and, according to these methods, classifies index based homology search algorithms into three categories: those using length based indexes, those using transformation based indexes, and those combining the two. Based on this classification, the characteristics of currently popular index based homology search algorithms are compared and analyzed. Several promising new index techniques are also discussed. Taken together, the paper provides a survey of index based homology search algorithms.


Author information

Corresponding authors

Correspondence to Xianyang Jiang or Stephen S.-T. Yau.

About this article

Cite this article

Jiang, X., Zhang, P., Liu, X. et al. Survey on index based homology search algorithms. J Supercomput 40, 185–212 (2007). https://doi.org/10.1007/s11227-006-0041-0

  • DOI: https://doi.org/10.1007/s11227-006-0041-0
