Fast and sensitive protein alignment using DIAMOND

Abstract

The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.

Figure 1: Comparison of DIAMOND and RAPSearch2 against BLASTX for four sequencing technologies and for ORFs predicted from a bacterial assembly.

Accession codes

Accessions

Sequence Read Archive

Acknowledgements

This research was partially supported by the National Research Foundation and Ministry of Education Singapore under its Research Centre of Excellence Programme, and by the A*STAR Computational Resource Centre through the use of its high-performance computing facilities.

Author information

Contributions

B.B. designed and implemented the algorithm. C.X. performed the experimental study. C.X. and D.H.H. initiated and guided the project. D.H.H. and B.B. wrote the manuscript.

Corresponding authors

Correspondence to Benjamin Buchfink or Daniel H Huson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Spaced seeds.

(a) The four seed shapes of weight 12 that DIAMOND uses by default. Ones and zeros indicate positions to use and ignore, respectively. (b) Illustration of the application of a spaced seed to match letters between a reference and a query sequence.
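The idea in panel (b) can be sketched in a few lines of Python. This is only an illustration of how a spaced seed selects letters, not DIAMOND's implementation: the shape `SHAPE`, the helper names `spaced_key` and `seed_hits`, and the toy sequences are assumptions made for the example, and the shape shown has weight 3 rather than the weight 12 of the default shapes.

```python
# Minimal sketch of spaced-seed matching (illustrative only).
# SHAPE is a hypothetical shape of weight 3; it is NOT one of DIAMOND's default shapes.
SHAPE = "1101"


def spaced_key(seq, start, shape):
    """Return the letters of seq under the '1' positions of shape, starting at start.

    Returns None if the shape would run past the end of the sequence.
    """
    if start + len(shape) > len(seq):
        return None
    return "".join(seq[start + k] for k, bit in enumerate(shape) if bit == "1")


def seed_hits(reference, query, shape):
    """Return (reference_offset, query_offset) pairs whose spaced keys agree."""
    # Index every window of the reference by its spaced key.
    ref_index = {}
    for i in range(len(reference) - len(shape) + 1):
        ref_index.setdefault(spaced_key(reference, i, shape), []).append(i)

    # Look up every window of the query in that index.
    hits = []
    for j in range(len(query) - len(shape) + 1):
        for i in ref_index.get(spaced_key(query, j, shape), []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # The sequences differ at the third letter, which the shape ignores.
    print(seed_hits("MKVLA", "MKALA", SHAPE))
```

Because the third position of the shape is a zero, the two toy sequences still produce a seed match at offset 0 even though they differ there; a contiguous seed of length 4 would miss it.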

Supplementary Figure 2 Ratio of main memory accesses.

The ratio K/K’ as a function of the total length of the query sequences, for different seed lengths. The variables K and K’ represent the approximate number of main memory accesses required when using a single index or double index, respectively.
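As a rough illustration of what a double index means here, the sketch below builds a sorted seed list for the queries and another for the references, then walks both lists together so that matching seeds are found by a linear merge rather than by random lookups. It is a simplification under stated assumptions: it uses contiguous seeds of length `k` instead of spaced seeds, works on plain Python lists, and the function names `seed_list` and `double_index_hits` are hypothetical. It does not compute K or K'.

```python
# Minimal sketch of seed matching with a double index (illustrative only).
# Assumptions: contiguous k-mers as seeds, in-memory lists, hypothetical helper names.


def seed_list(seqs, k):
    """Return a sorted list of (seed, sequence_id, offset) for all k-mers in seqs."""
    entries = [
        (s[i:i + k], sid, i)
        for sid, s in enumerate(seqs)
        for i in range(len(s) - k + 1)
    ]
    entries.sort()
    return entries


def double_index_hits(queries, references, k):
    """Return (query_id, query_offset, ref_id, ref_offset) for every shared seed."""
    q = seed_list(queries, k)
    r = seed_list(references, k)
    hits = []
    i = j = 0
    # Merge-style traversal: both lists are sorted by seed, so each is scanned once.
    while i < len(q) and j < len(r):
        if q[i][0] < r[j][0]:
            i += 1
        elif q[i][0] > r[j][0]:
            j += 1
        else:
            seed = q[i][0]
            i_end = i
            while i_end < len(q) and q[i_end][0] == seed:
                i_end += 1
            j_end = j
            while j_end < len(r) and r[j_end][0] == seed:
                j_end += 1
            # Every query occurrence of this seed pairs with every reference occurrence.
            for _, qid, qoff in q[i:i_end]:
                for _, rid, roff in r[j:j_end]:
                    hits.append((qid, qoff, rid, roff))
            i, j = i_end, j_end
    return hits


if __name__ == "__main__":
    print(sorted(double_index_hits(["MKVL"], ["AMKV", "KVLA"], 3)))
```

With a single index, each query seed would instead trigger a lookup into the reference index at an effectively random location; the merge above touches both lists sequentially, which is the kind of access pattern the figure's comparison is about.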

Supplementary Figure 3 PCoA analysis of 12 permafrost samples based on a subset of 6 million reads.

BLASTX results are shown on the left, (a) and (c). DIAMOND-fast results are shown on the right, (b) and (d). The upper two panels show the first and second principal coordinates, whereas the lower two panels show the first and third principal coordinates.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–3 (PDF 523 kb)

Supplementary Software

DIAMOND v0.4.7 source code (ZIP 2737 kb)

About this article

Cite this article

Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176

  • DOI: https://doi.org/10.1038/nmeth.3176
