Abstract
Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.
Acknowledgements
This study was supported in part by the US National Institutes of Health National Institute of General Medical Sciences grant R01GM068851 (J.J.M. and W.P.R.), NIH R37 AI-42347 (B.M.D. and M.K.W.) and the Howard Hughes Medical Institute (B.M.D. and M.K.W.).
Author information
Contributions
A.B., A.A.K., W.P.R., C.S.C., E.P., M.F., C.L.T., M.T., B.M.D., A.K., J.J.M., M.K.W. and E.E.S. designed the experiments; A.B., A.A.K., C.S.C., D.W., J.S. and J.B. designed the methods; W.P.R., E.P., D.H., M.A., S.W., P.P., R.S., J.Y., M.V., E.M., K.L., S.L., B.L., A.J., L.R., M.F., C.L.T., M.T. and B.M.D. carried out all sample-preparation experiments, all sequencing runs and PCR-validation experiments; A.B., A.A.K., W.P.R., C.S.C., D.W., J.B., M.K.W. and E.E.S. jointly analyzed the data sets; and A.B., A.A.K., W.P.R., L.R., M.F., C.L.T., M.T., B.M.D., J.J.M., M.K.W. and E.E.S. wrote the manuscript.
Ethics declarations
Competing interests
Many of the authors are employees of and own stock in Pacific Biosciences.
Supplementary information
Supplementary Text and Figures
Supplementary Methods, Supplementary Results, Supplementary Tables 1-9 and Supplementary Figs. 1-13 (PDF 2063 kb)
About this article
Cite this article
Bashir, A., Klammer, A., Robins, W. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat Biotechnol 30, 701–707 (2012). https://doi.org/10.1038/nbt.2288
DOI: https://doi.org/10.1038/nbt.2288