Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects
- Subject Areas
- Biodiversity, Conservation Biology, Genetics, Molecular Biology, Zoology
- Keywords
- Biodiversity assessment, stream monitoring, small ribosomal subunit, high throughput sequencing
- Copyright
- © 2016 Elbrecht et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects. PeerJ Preprints 4:e1855v1 https://doi.org/10.7287/peerj.preprints.1855v1
Abstract
Cytochrome c oxidase I (COI) is a powerful marker for DNA barcoding of animals, with good taxonomic resolution and a large reference database. However, when COI is used for DNA metabarcoding, species detection and estimation of taxon abundances are limited by primer bias, which is caused by highly variable primer binding sites across the COI gene. We therefore explored the potential of the 16S ribosomal DNA gene as an alternative metabarcoding marker for species-level assessments. Ten bulk samples, each containing equal amounts of tissue from 52 freshwater invertebrate taxa, were sequenced on the Illumina NextSeq 500 system. Compared with COI, the 16S marker amplified more insect species and amplified them more evenly, probably because of reduced primer bias. Rough estimates of biomass might therefore be less biased with 16S than with COI. These results suggest that the choice of marker depends on the scientific question. If the goal is taxonomic identification at the species level, COI is more appropriate because of its established reference databases and the known taxonomic resolution of this marker, with the caveat that a greater proportion of species will be missed using the COI Folmer primers. If the goal is a more comprehensive survey in a context where a local reference database can be built, the 16S marker could be more appropriate.
Author Comment
First preprint of our 16S metabarcoding manuscript. It will be submitted for review and publication at PeerJ shortly. However, feel free to share feedback and your thoughts in the comments; we really appreciate it and will try to incorporate your additional reviews and comments!
Supplemental Information
16S fusion primers used in this study
Figure S1. 16S fusion primers developed in this study. They include the flow cell and sequencing primer binding regions for current Illumina sequencers. The amplified fragment has a size of ~157 bp and can be sequenced directly after purification (one-step PCR). Up to 10 samples can be uniquely tagged from both the forward and reverse directions and pooled in one NextSeq run. The bases used for shifting on Ins_F and Ins_R can also serve to uniquely tag samples (inline barcodes). To maximize sequence diversity and reduce the effects of tag switching by uniquely tagging samples from both sides, it is recommended that all 10 primer pairs be used in the following combinations: P5_Ins_R0+P7_Ins_F4, P5_Ins_R1+P7_Ins_F3, P5_Ins_R2+P7_Ins_F2, P5_Ins_R3+P7_Ins_F1, P5_Ins_R4+P7_Ins_F0, P5_Ins_F0+P7_Ins_R4, P5_Ins_F1+P7_Ins_R3, P5_Ins_F2+P7_Ins_R2, P5_Ins_F3+P7_Ins_R1, P5_Ins_F4+P7_Ins_R0
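The recommended pairing in Figure S1 can be sketched as a lookup table: each primer name appears exactly once on the P5 side and once on the P7 side, so a read carrying a tag combination that was never assigned to a sample (for example after tag switching) can be flagged instead of being counted. This is a minimal illustration, not the authors' pipeline. It assumes that the trailing digit of each primer name equals its number of shifting bases, uses primer names as stand-ins for the actual inline tag sequences, and uses hypothetical sample labels S1 to S10 and hypothetical helper names (`shift`, `check_design`, `assign`).

```python
from __future__ import annotations

# Recommended (P5 primer, P7 primer) combinations, copied from Figure S1.
PAIRS = [
    ("P5_Ins_R0", "P7_Ins_F4"),
    ("P5_Ins_R1", "P7_Ins_F3"),
    ("P5_Ins_R2", "P7_Ins_F2"),
    ("P5_Ins_R3", "P7_Ins_F1"),
    ("P5_Ins_R4", "P7_Ins_F0"),
    ("P5_Ins_F0", "P7_Ins_R4"),
    ("P5_Ins_F1", "P7_Ins_R3"),
    ("P5_Ins_F2", "P7_Ins_R2"),
    ("P5_Ins_F3", "P7_Ins_R1"),
    ("P5_Ins_F4", "P7_Ins_R0"),
]

# Hypothetical sample labels S1..S10, one primer pair each.
SAMPLES = {f"S{i}": pair for i, pair in enumerate(PAIRS, start=1)}

# Reverse lookup: observed (P5 tag, P7 tag) -> sample label.
TAG_TO_SAMPLE = {pair: name for name, pair in SAMPLES.items()}


def shift(primer: str) -> int:
    """Shifting bases implied by the primer name, e.g. 'P7_Ins_F4' -> 4.

    Assumption: the trailing digit encodes the number of shifting bases.
    """
    return int(primer[-1])


def check_design(samples: dict[str, tuple[str, str]]) -> set[int]:
    """Check that no primer is reused on the same side; return the total shifts per pair."""
    p5 = [a for a, _ in samples.values()]
    p7 = [b for _, b in samples.values()]
    if len(set(p5)) != len(p5) or len(set(p7)) != len(p7):
        raise ValueError("a primer is reused on the same side")
    return {shift(a) + shift(b) for a, b in samples.values()}


def assign(p5_tag: str, p7_tag: str) -> str | None:
    """Return the sample whose P5 and P7 tags both match, or None if the
    combination was never assigned (e.g. a product of tag switching)."""
    return TAG_TO_SAMPLE.get((p5_tag, p7_tag))
```

For example, `check_design(SAMPLES)` returns `{4}`, because in every recommended pair the two shift numbers add up to 4. `assign("P5_Ins_R2", "P7_Ins_F2")` returns `"S3"`, while the unassigned combination `assign("P5_Ins_R2", "P7_Ins_F3")` returns `None`.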
Distribution of reads obtained by NextSeq and number of reads discarded throughout the different bioinformatics processing steps
Figure S2. Number of sequences obtained per sample after library demultiplexing (A) and percentage of sequences excluded in the different bioinformatic processing steps (B). A: Library demultiplexing. Numbers above the bars give each sample's relative contribution (in percent) to the total number of sequences obtained. Bar colour indicates whether sequencing started with Ins_F (white) or Ins_R (black). B: Number of reads excluded in each data processing step; the mean percentage of sequences excluded in each step is given in brackets. Primer bias between Ins_F and Ins_R was tested with a t-test.
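The quantities in Figure S2 are simple ratios plus a two-group comparison. The sketch below shows one way they could be computed from per-sample read counts. The function names are hypothetical, and the caption does not state which t-test variant was used, so Welch's unequal-variance test here is an assumption. The code returns only the t statistic and its Welch-Satterthwaite degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def relative_contribution(read_counts: dict[str, int]) -> dict[str, float]:
    """Percentage of all demultiplexed reads contributed by each sample (panel A labels)."""
    total = sum(read_counts.values())
    return {name: 100 * n / total for name, n in read_counts.items()}


def percent_excluded(reads_in: int, reads_out: int) -> float:
    """Percentage of reads discarded by a single processing step (panel B)."""
    return 100 * (reads_in - reads_out) / reads_in


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: Welch's variant; the caption only says 'a t-test'.
    Each group needs at least two values (sample variance uses n - 1).
    """
    # Squared standard errors of the two group means.
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df
```

In this setting, `group_a` and `group_b` would hold a per-sample value (such as the percentage of reads excluded in one step) for samples whose sequencing started with Ins_F and with Ins_R, respectively.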