DNA Sequence Assembly and Annotation of Genes

How to Generate the DNA Sequence and to Predict the Function of Genes

Introduction to Bioinformatics in Microbiology

Abstract

This chapter describes the main DNA sequencing strategies and their pros and cons, to help you select the optimal strategy for your research question, and explains how to assemble and annotate DNA sequences. DNA sequencing is the determination of the order of nucleotides in parts of, or in whole, chromosomes of organisms and viruses. It can be applied to a single gene, a whole genome, or many genomes at a time, as in metagenomics. One of the most popular sequencing machines is the MiSeq from Illumina, which is capable of small whole-genome sequencing, transcriptomics, and 16S rRNA metagenomics. Samples can be multiplexed by using unique combinations of specific barcodes and indexes. Real-time, single-molecule sequencing allows sequencing of native DNA, resulting in significantly longer read lengths, and the sequence information becomes available as the bases are incorporated, i.e., in real time. Base calling is the first step in sequencing, where the electronic signal generated in the sequencing machine is separated from random noise and converted to nucleotide information. The nucleotide information then needs to be assembled into DNA sequences that resemble the original DNA as closely as possible. Assembly can be done either de novo, without a reference, or with a reference if the genome of the organism or virus is well known. The most important quality parameter to consider is the coverage. Another important parameter is N50. Different assemblies can be compared with QUAST. The “minimum information about a genome sequence” (MIGS) specification provides an exhaustive list of the information required for genomic sequences, including requirements for metadata. Genome annotation is the identification and labeling of all the relevant features of the genomic sequence. At first, this includes the coordinates, given as nucleotide positions, where coding regions are predicted. Annotation is mainly a prediction of coding genes; however, other structural genes such as rRNA genes are also identified.
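
The abstract names coverage and N50 as the key assembly quality parameters without defining them. As a minimal sketch, assuming the commonly used definitions (mean coverage as total sequenced bases divided by genome size; N50 as the contig length at which contigs of that length or longer contain at least half of the assembled bases), both can be computed as follows. The function names `mean_coverage` and `n50` and the example numbers are illustrative and do not come from the chapter.

```python
from typing import Iterable, List


def mean_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Mean coverage (depth): total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def n50(contig_lengths: List[int]) -> int:
    """N50: length of the contig at which the running total of contig
    lengths, taken from longest to shortest, first reaches at least
    half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical example: 1000 reads of 150 bases over a 5000-base genome.
    print(mean_coverage([150] * 1000, 5000))  # 30.0
    # Hypothetical contig lengths; the total is 290, so half is 145.
    print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

A higher N50 indicates that more of the assembly sits in long contigs, but it says nothing about correctness, so it is normally read together with coverage and the other statistics that a tool such as QUAST reports.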

References

  • Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75.

  • Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.

  • Chun J, Oren A, Ventosa A, Christensen H, Arahal DR, da Costa MS, Rooney AP, Yi H, Xu XW, De Meyer S, Trujillo ME. 2018. Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int J Syst Evol Microbiol. 68, 461–466.

  • Cock et al. 2010. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771.

  • Compeau PE, Pevzner PA, Tesler G. 2011. How to apply de Bruijn graphs to genome assembly. Nat Biotechnol. 29:987–91.

  • Ewing B, Hillier L, Wendl MC, Green P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.

  • Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, Ashburner M, Axelrod N, Baldauf S, Ballard S, Boore J, Cochrane G, Cole J, Dawyndt P, De Vos P, DePamphilis C, Edwards R, Faruque N, Feldman R, Gilbert J, Gilna P, Glöckner FO, Goldstein P, Guralnick R, Haft D, Hancock D, Hermjakob H, Hertz-Fowler C, Hugenholtz P, Joint I, Kagan L, Kane M, Kennedy J, Kowalchuk G, Kottmann R, Kolker E, Kravitz S, Kyrpides N, Leebens-Mack J, Lewis SE, Li K, Lister AL, Lord P, Maltsev N, Markowitz V, Martiny J, Methe B, Mizrachi I, Moxon R, Nelson K, Parkhill J, Proctor L, White O, Sansone SA, Spiers A, Stevens R, Swift P, Taylor C, Tateno Y, Tett A, Turner S, Ussery D, Vaughan B, Ward N, Whetzel T, San Gil I, Wilson G, Wipat A. 2008. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol. 26, 541–7.

  • Glass EM, Wilkening J, Wilke A, Antonopoulos D, Meyer F. 2010. Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harb Protoc.

  • Goodwin S, McPherson JD, McCombie WR. 2016. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 17:333–51.

  • Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–5.

  • Idury RM, Waterman MS. 1995. A new algorithm for DNA sequence assembly. J Comput Biol. 2, 291–306.

  • Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44(D1):D457–62.

  • Koren S, Harhay GP, Smith TP, Bono JL, Harhay DM, Mcvey SD, Radune D, Bergman NH, Phillippy AM. 2013. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biology 14: R101.

  • Larsen MV, Cosentino S, Rasmussen S, Friis C, Hasman H, Marvig RL, Jelsbak L, Sicheritz-Pontén T, Ussery DW, Aarestrup FM, Lund O. 2012. Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol. 50, 1355–61.

  • Madigan M, Bender KS, Buckley DH, Sattley WM, Stahl D. 2019. Brock Biology of Microorganisms. Pearson, Harlow, UK.

  • Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, Prjibelski AD, Pyshkin A, Sirotkin A, Sirotkin Y, Stepanauskas R, Clingenpeel SR, Woyke T, McLean JS, Lasken R, Tesler G, Alekseyev MA, Pevzner PA. 2013. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol. 20, 714–37.

  • Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, Edwards RA, Gerdes S, Parrello B, Shukla M, Vonstein V, Wattam AR, Xia F, Stevens R. 2014. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42(Database issue):D206–14.

  • Pearson WR, Lipman DJ. 1988. Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A. 85, 2444–8.

  • Pevzner PA, Tang H, Waterman MS. 2001. An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci U S A. 98, 9748–53.

  • Sanger F, Nicklen S, Coulson AR. 1977. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74, 5463–7.

  • Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–9.

  • Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–9.

Further Reading

  • Loosdrecht, M. C. M. van, Nielsen, P. H., Lopez Vazquez, C. M. and Brdjanovic, D. 2016. Experimental methods in wastewater treatment. IWA Publishing, London, UK.

Author information

Corresponding author

Correspondence to Henrik Christensen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Christensen, H., Moodley, A. (2018). DNA Sequence Assembly and Annotation of Genes. In: Christensen, H. (eds) Introduction to Bioinformatics in Microbiology. Learning Materials in Biosciences. Springer, Cham. https://doi.org/10.1007/978-3-319-99280-8_2
