Deciphering Metatranscriptomic Data

RNA Bioinformatics

Abstract

Metatranscriptomic data contribute another piece of the puzzle to understanding the phylogenetic structure and function of a community of organisms. High-quality total RNA is a rich mixture of ribosomal, transfer, messenger, and other noncoding RNAs, and each family of RNA is vital to answering questions about the hidden microbial world. Software tools designed for deciphering metatranscriptomic data fall into two main categories: the first reassembles the millions of short nucleotide fragments produced by high-throughput sequencing technologies into the original full-length transcriptomes for all organisms within a sample, and the second taxonomically classifies the organisms and determines their individual functional roles within a community. Species identification is mainly established using the ribosomal RNA genes, whereas the behavior and functionality of a community are revealed by the messenger RNA of the expressed genes. Numerous chemical and computational methods exist to separate families of RNA before further downstream analyses; they are primarily suited to isolating mRNA or rRNA from a total RNA sample. In this chapter, we demonstrate a computational technique for filtering rRNA from total RNA using the software SortMeRNA. Additionally, we propose a post-processing pipeline that uses the latest software tools to conduct further studies on the filtered data, including the reconstruction of mRNA transcripts for functional analyses and the phylogenetic classification of a community using the ribosomal RNA.
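
As a rough illustration of what computational rRNA filtering involves, the sketch below splits reads into "rRNA-like" and "other" by the fraction of their k-mers that also occur in a set of reference rRNA sequences. It is a toy under stated assumptions, not SortMeRNA's algorithm or interface: the function names, the k-mer length, and the 0.5 threshold are hypothetical choices made for the example, and reverse-complement matches are ignored.

```python
from typing import Dict, Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers in seq (RNA 'U' is treated as DNA 'T')."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer found in the reference rRNA sequences."""
    index: Set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def partition_reads(
    reads: Dict[str, str],
    index: Set[str],
    k: int,
    min_fraction: float = 0.5,  # assumed threshold, for illustration only
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split reads into (rRNA-like, other) by shared k-mer fraction."""
    rrna: Dict[str, str] = {}
    other: Dict[str, str] = {}
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        # Reads shorter than k have no k-mers and are never called rRNA.
        fraction = len(read_kmers & index) / len(read_kmers) if read_kmers else 0.0
        if fraction >= min_fraction:
            rrna[name] = seq
        else:
            other[name] = seq
    return rrna, other
```

In a pipeline of the kind described above, the "rRNA-like" partition would feed taxonomic classification and the remaining reads would feed transcript reconstruction; a production tool would additionally handle both strands, sequencing errors, and much larger reference databases.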

Acknowledgments

This research was supported by the French National Agency for Research (grant ANR-2010-COSI-004) and the French National Sequencing Center (Genoscope).

Author information

Corresponding author

Correspondence to Evguenia Kopylova.

Copyright information

© 2015 Springer Science+Business Media New York

About this protocol

Cite this protocol

Kopylova, E. et al. (2015). Deciphering Metatranscriptomic Data. In: Picardi, E. (ed) RNA Bioinformatics. Methods in Molecular Biology, vol 1269. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-2291-8_17

  • DOI: https://doi.org/10.1007/978-1-4939-2291-8_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-2290-1

  • Online ISBN: 978-1-4939-2291-8

  • eBook Packages: Springer Protocols
