The Closest BLAST Hit Is Often Not the Nearest Neighbor

  • Letter to the Editor
Journal of Molecular Evolution

Abstract

It is well known that basing phylogenetic reconstructions on uncorrected genetic distances can lead to errors. Nevertheless, it is common practice in genomic reports that discuss many genes to report simply the most similar BLAST (Altschul et al. 1997) hit (Ruepp et al. 2000; Freiberg et al. 1997), because BLAST hits provide a rapid, efficient, and concise analysis of many genes at once. These hits are often interpreted to imply that the query gene is most closely related to the database gene or protein that returned the closest BLAST hit. Though the closest hit and the closest relative may coincide, for many genes, particularly genes with few homologs, they may not be the same. A number of circumstances can account for such limitations in accuracy (Eisen 2000). We stress here that genes appearing to be the most similar based on BLAST hits are often not each other's closest relatives phylogenetically. The extent to which this occurs depends on the availability of close relatives in the databases. As an example we have chosen the analysis of the genomes of Aeropyrum pernix, a crenarchaeote with few fully sequenced close relatives, and Escherichia coli, an organism whose closest relative, Salmonella typhimurium, is completely sequenced.
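The distinction between the most similar sequence and the closest relative can be illustrated with a minimal sketch. The four-gene tree, its branch lengths, and the function names below are all hypothetical and are not data or methods from the article. Path (patristic) distance on the tree stands in for a similarity ranking, which is a simplifying assumption: BLAST scores are not tree distances. The true sister of the query Q is placed on a long, fast-evolving branch, so a slowly evolving, more distant gene ends up ranked as most similar.

```python
# Toy illustration: hypothetical tree and branch lengths, not data from the article.
# Each node maps to (parent, branch length to parent); "root" has no entry.
TREE = {
    "n1": ("root", 0.05),
    "n2": ("root", 0.05),
    "Q": ("n1", 0.10),  # query gene
    "A": ("n1", 0.90),  # true sister of Q, but fast-evolving (long branch)
    "B": ("n2", 0.10),  # more distant relative, slowly evolving
    "C": ("n2", 0.20),  # more distant relative
}
LEAVES = ["Q", "A", "B", "C"]


def path_to_root(node):
    """Return {ancestor: distance from node} for the node and all its ancestors."""
    dist = 0.0
    out = {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b):
    """Path length between two leaves, measured through their most recent common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    # The shared ancestor giving the shortest total path is the most recent one.
    return min(pa[n] + pb[n] for n in pa if n in pb)


def most_similar(query):
    """Leaf with the smallest path distance to the query (stand-in for a top hit)."""
    others = [x for x in LEAVES if x != query]
    return min(others, key=lambda x: patristic(query, x))


def sisters(query):
    """Leaves that share the query's immediate parent node in the tree."""
    parent = TREE[query][0]
    return [x for x in LEAVES if x != query and TREE[x][0] == parent]


if __name__ == "__main__":
    print("most similar to Q:", most_similar("Q"))
    print("sister(s) of Q:", sisters("Q"))
```

In this sketch the two answers differ only because the branch leading to A is long. Shortening that branch to the length of the others makes the most similar gene and the sister gene coincide, which mirrors the situation where close relatives are well represented in the databases.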

References

  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997): Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

  • Eisen JA (2000): Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr Opin Genet Dev 10:606–611

  • Eisen JA (1998): Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res 8:163–167

  • Eisen JA, Hanawalt PC (1999): A phylogenomic study of DNA repair genes, proteins, and processes. Mutat Res 435:171–213

  • Freiberg C, Fellay R, Bairoch A, Broughton WJ, Rosenthal A, Perret X (1997): Molecular basis of symbiosis between Rhizobium and legumes. Nature 387:394–401

  • Golding GB (1983): Estimates of DNA and protein sequence divergence: an examination of some assumptions. Mol Biol Evol 1:125–144

  • Lawrence JG, Ochman H (1998): Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci USA 95:9413–9417

  • Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA, Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA, Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, Fraser CM, et al (1999): Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399:323–329

  • Ribeiro S, Golding GB (1998): The mosaic nature of the eukaryotic nucleus. Mol Biol Evol 15:779–788

  • Ruepp A, Graml W, Santos-Martinez ML, Koretke KK, Volker C, Mewes HW, Frishman D, Stocker S, Lupas AN, Baumeister W (2000): The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature 407:508–511

  • Sicheritz-Ponten T, Andersson SG (2001): A phylogenomic approach to microbial evolution. Nucleic Acids Res 29:545–552

  • Strimmer K, von Haeseler A (1996): Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol 13:964–969

  • Tourasse NJ, Gouy M (1999): Accounting for evolutionary rate variation among sequence sites consistently changes universal phylogenies deduced from rRNA and protein-coding genes. Mol Phylogenet Evol 13:159–168

  • Woese CR (1987): Bacterial evolution. Microbiol Rev 51:221–271

Cite this article

Koski, L.B., Golding, G.B. The Closest BLAST Hit Is Often Not the Nearest Neighbor. J Mol Evol 52, 540–542 (2001). https://doi.org/10.1007/s002390010184

  • DOI: https://doi.org/10.1007/s002390010184
