Evaluation of assembly methods combining long-reads and short-reads to obtain Paenibacillus sp. R4 high-quality complete genome

Shin, Seung Chul; Choi, Woong; Lee, Junhyuck; Kim, Hyo Jin; Kim, Han-Woo

doi:10.1007/s13205-020-02474-0

Original Article
Published: 19 October 2020

3 Biotech, Volume 10, article number 480 (2020)

Seung Chul Shin^1 (ORCID: orcid.org/0000-0001-6835-484X),
Woong Choi^2,
Junhyuck Lee^2,3,
Hyo Jin Kim^4,5 &
Han-Woo Kim^2,3

Abstract

We sequenced Paenibacillus sp. R4 using Oxford Nanopore Technology (ONT), single-molecule real-time (SMRT) technology from Pacific Biosciences (PacBio), and Illumina technologies to investigate the application of nanopore reads in de novo sequencing of bacterial genomes. We compared the genome sequences of assemblies built from nanopore reads with those built from PacBio reads, focusing on differences in the prediction of coding sequences. The results indicated that, for more accurate prediction of open reading frames, contigs in assemblies using only PacBio reads also needed to be corrected with high-quality short reads, and that repeat regions in the genome did not significantly increase the number of coding sequences mispredicted after genome polishing. In assemblies using only nanopore reads, genome polishing was essential, but a large number of repeat regions in the genome might increase the number of coding sequences mispredicted after polishing. The hybrid assembly combining long reads and short reads gave the best coding sequence predictions among the genome assemblies using nanopore reads.
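
As a toy illustration of why base-level errors matter for coding sequence prediction, the sketch below assumes that residual errors in long-read-only assemblies are often single-base insertions or deletions. This is commonly reported for such assemblies but is not stated in the abstract. The sequence and the helper `first_orf_length` are hypothetical and are not taken from the study. The sketch only shows how one deleted base can shift the reading frame and truncate a predicted open reading frame.

```python
# Toy example with a made-up sequence; not data or code from the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf_length(seq: str) -> int:
    """Count codons from the first ATG up to (not including) the first
    in-frame stop codon. Returns 0 if there is no ATG or no in-frame stop."""
    start = seq.find("ATG")
    if start == -1:
        return 0
    # Walk the sequence codon by codon in the frame set by the ATG.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return 0


# Hypothetical error-free sequence: ATG GCA TTG ACC GGT AAA CTG TAA
reference = "ATGGCATTGACCGGTAAACTGTAA"

# Simulate a single-base deletion (the G at index 3), as an unpolished
# assembly might contain. The frame shifts to ATG CAT TGA ...
with_deletion = reference[:3] + reference[4:]

print(first_orf_length(reference))      # 7 codons before the stop
print(first_orf_length(with_deletion))  # 2 codons before a premature stop
```

In this toy case the frameshift creates a premature stop, so a gene predictor would report a truncated coding sequence. Correcting such errors with high-quality short reads restores the original frame.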

Availability of data and material

The raw data have been deposited at the National Center for Biotechnology Information (NCBI) BioProject repository PRJNA564035 (SRX6807868-SRX6807870). This strain is available from the Polar and Alpine Microbial Collection (PAMC) of Korea Polar Research Institute with the accession number PAMC 29622.

Funding

This research was supported by a National Research Foundation of Korea Grant from the Korean Government (MSIT; the Ministry of Science and ICT) (NRF-2017M1A5A1013568) (KOPRI-PN20082) (Title: application study on the Arctic cold-active enzyme degrading organic carbon compounds).

Author information

Authors and Affiliations

Division of Life Sciences, Korea Polar Research Institute (KOPRI), Inchon, 21990, Republic of Korea
Seung Chul Shin
Unit of Polar Genomics, Korea Polar Research Institute (KOPRI), Inchon, 21990, Republic of Korea
Woong Choi, Junhyuck Lee & Han-Woo Kim
Department of Polar Sciences, University of Science and Technology, Inchon, 21990, Republic of Korea
Junhyuck Lee & Han-Woo Kim
Graduate School of International Agricultural Technology, Seoul National University, Pyeongchang, 25354, Republic of Korea
Hyo Jin Kim
Institutes of Green Bio Science and Technology, Seoul National University, Pyeongchang, 25354, Republic of Korea
Hyo Jin Kim

Contributions

Conceptualization: HWK, JHL, SCS; methodology: HJK, SCS; software: SCS; formal analysis: WC, SCS; investigation: WC; resources: SCS; data curation: SCS; writing—original draft preparation: All authors; writing—review and editing: All authors; visualization: SCS, HWK; supervision: SCS, HWK; project administration: SCS, HWK; funding: HWK.

Corresponding authors

Correspondence to Seung Chul Shin or Han-Woo Kim.

Ethics declarations

Conflict of interest

The authors declare that they have no competing financial and non-financial interests.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 39 kb)

About this article

Cite this article

Shin, S.C., Choi, W., Lee, J. et al. Evaluation of assembly methods combining long-reads and short-reads to obtain Paenibacillus sp. R4 high-quality complete genome. 3 Biotech 10, 480 (2020). https://doi.org/10.1007/s13205-020-02474-0

Received: 06 August 2020
Accepted: 07 October 2020
Published: 19 October 2020
DOI: https://doi.org/10.1007/s13205-020-02474-0
