A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform

Background Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. Methods We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower c ost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorpo rating three barcodes to each sample, with the possibility to add a fourth-index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. Results The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Conclusions Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias. Electronic supplementary material The online version of this article (doi:10.1186/s40168-017-0279-1) contains supplementary material, which is available to authorized users.


Background
Significant advances in microbiome analysis have been achieved by the adoption of improved sequencing technologies and increased computational capabilities. Sequencing large numbers of relatively short DNA fragments has become routine, and microbiologists have adapted these technologies to characterize communities of microbes either by targeted sequencing of conserved regions containing phylogenetically informative polymorphisms (e.g., 16S or 18S rRNA gene sequencing) or by sequencing a sub-set of the randomly sheared DNA molecules in a sample (so-called shotgun metagenomics). Both approaches present unique challenges for identification and interpretation of biologically meaningful information, and for the moment, the high costs associated with deep sequencing in shotgun metagenomics currently limits full exploitation. As such, combined approaches may offer the best possibility for achieving a significantly improved understanding of complex microbial communities. Thus, targeted PCR amplification and sequencing of the 16S rRNA gene continues to offer a powerful and economic way to gain insight into the bacterial community composition in large numbers of samples.
To address issues related to accuracy, cost, and throughput, we have designed an improved Illumina compatible library preparation protocol, combining features of several existing popular sequencing strategies, for 16S rRNA gene amplicon sequencing. Variable region 4 (V4) of the 16S rRNA gene is currently the most commonly used marker for bacterial amplicon sequencing and the 515-806 fragment in particular has been shown to outperform alternative variable regions both in terms of reproducibility and classification accuracy [1,2]. The 515-806 fragment can be spanned by sequencing 200-300 nt from both ends, easily achieved on the Illumina MiSeq (which delivers 15-25 million paired reads). The use of a lengthy overlap between paired reads is necessary for reliable paired read merging, ensuring increased coverage on each sequence and higher quality data. Illumina's HiSeq 2500 machine is capable of producing 300 million 250 bp paired reads per run using Rapid run mode v2 500 cycle reagents, at approximately one-third the cost per base compared to MiSeq.
Nucleotide diversity-having equal proportions of A, C, G, and T nucleotides at each base position in a sequencing library-is required for effective template generation on Illumina sequencing platforms [3]. In particular, the diversity in the first 11 bases of the DNA fragment is critical for cluster identification and color matrix estimation. Libraries with low complexity and uneven nucleotide distribution such as amplicon libraries can be successfully sequenced by blending in a balanced spike-in (Illumina recommends PhiX) which significantly reduces the amount of useful data obtained. This issue can be partially overcome by the addition of so-called heterogeneity spacers, short sequences of variable length, at the 5' end of the amplicon thus introducing the required base diversity for effective sequencing with a minimal requirement for spike-in [4].
The novel protocol that we describe in this study adds a third Illumina compatible index to the existing dual indexing strategy developed initially by Kozich et al. [5]. We use a two-stage PCR protocol (as recommended by Illumina [6]), thus reducing the number of oligos required for multiplexing large numbers of samples. In addition, our protocol includes the heterogeneity spacers introduced by Fadrosh et al. [4] to increase nucleotide diversity at the start of sequencing reads, allowing greater utilization of available sequencing capacity.
Here, we benchmark our new sequencing protocol using both sequencing of a mock community and repeated sequencing of a standardized environmental sample. We looked for possible bias effects of PCR cycle number, input template DNA amount, PCR indexing of samples, MiSeq vs. HiSeq, between libraries and between runs variation. These types of evaluation are necessary to understand potential misinterpretations that could result from technical issues inherent in the design and practice of characterizing bacterial communities.

Methods
Briefly, the procedure starts with a PCR reaction (PCR1) for general amplification of the V4 515-806 region of the 16S rRNA gene. Oligos used in this reaction includes the dual index sequence necessary for multiplexing (see below) along with heterogeneity spacers for increasing library complexity. Equal amounts of each amplified sample are then enriched in a normalization step and pooled. A second PCR reaction (PCR2) is then performed, using each amplicon pool as template, which primes on the ends of the oligos used for PCR1 (Fig. 1). This reaction adds a third index sequence and completes the Illumina adapters required for sequencing. Amplified PCR2 products are cleaned and quantified, then blended to make a single library before sequencing on the HiSeq 2500 (rapid mode) or MiSeq.

Oligo design
The primers used for PCR1 consists of four parts ( Fig. 1): (1). the 3′ end primes the V4 region of the 16S rRNA gene [7] with 515fB and 806rB in forward and reverse primer, respectively. (2.) A heterogeneity spacer (0-7 bp) as described by Fadrosh et al. [4]. (3.) Internal barcodes on both forward and reverse primers, also as described by Fadrosh et al. [4], to allow 96-384 sample index combinations. (4.) The 5′ end contains partial Illumina adapters. The forward primer used in PCR2 is the same for all the samples and contains the remaining Illumina adapter that is necessary for sequencing. Reverse primers contain the third index (6 nt Illumina TruSeq index) and the remaining Illumina adapter. Complete lists of primers employed in this study are provided in Additional file 1: Table S1. All primers were purchased from Integrated DNA Technologies (Coralville, IA, USA) as 4 nmol ultramers.

Bacterial mock community
The mock bacterial community used in this study was generously provided by Prof. Lutgarde Raskin, Department of Civil and Environmental Engineering, University of Michigan and has been described elsewhere [8]. Briefly, near full length sequences of the 16S rRNA gene from 33 bacterial species representing more than 25 different phyla (Additional file 2: Table S2, Additional file 3: Figure S1) were cloned into plasmids. Plasmid concentrations were then measured before blending to equal proportions at a final concentration of 1e 9 plasmids/μl.

Library preparation
For an initial MiSeq sequencing run, the PCR1 amplification was carried out with the following reaction mixture: 9 μl H2O, 10 μl 5Prime Hot Master Mix (Gaithersburg, MD, USA), 2.5 μl of 1 μM PCR1 forward primer, 2.5 μl of 1 μM PCR1 reverse primer, and 1 μl mock community template at either 2.5e 6 or 2.5e 7 molecules/μl. The following program was used for amplification: 94°C for 3 min for initial melting; 25, 30, or 35 cycles of 94°C for 10 s, 50°C for 30 s, 72°C for 45 s; 72°C for 10 min. Products were verified on a 1% agarose gel, and all samples were cleaned and normalized to equal concentrations using the SequalPrep Normalization Plate Kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions.
Samples assigned the same third index were pooled before the PCR2 reaction (Additional files 4, 5 and 6: Tables S3-5). PCR2 amplification was carried out with the following reaction mix: 10 μl H2O, 20 μl 5Prime Hot Master Mix, 5 μl of 1 μM PCR2 forward primer, 5 μl of 1 μM PCR2 reverse primer,10 μl template pool at 1 ng/μl. The following program was used for amplification: 94°C for 3 min; 5 or 10 cycles of 94°C for 10 s, 58°C for 30 s, 72°C for 45 s; 72°C for 10 min. Products were again verified on a 1% agarose gel before cleaning with the Agencourt AMPure XP PCR purification kit (Beckman Coulter, Brea, CA, USA) according to the manufacturer's recommendations. DNA concentration and fragment size were measured on a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) and Agilent Bioanalyzer (Santa Clara, CA, USA), respectively. Samples with significant amounts of off-target fragment were cleaned one additional time with AMPure beads, but with the protocol modified to a 1:1 mixture ratio of beads and sample in order to eliminate shorter fragments. Libraries were blended in proportions to ensure similar coverage of all samples. For later HiSeq sequencing runs (Additional files 7, 8, and 9: Tables S6-8), the library preparation procedure was identical to the MiSeq protocol with 35 and 10 cycles for PCR1 and PCR2, respectively, with the following exception: the repeated melting steps during both PCR amplifications were increased from 10 to 30 s.

Illumina sequencing
Paired-end sequencing was performed on a MiSeq (Illumina, San Diego, CA, USA) using v3 reagents, generating 300 bp reads per end, or on a HiSeq 2500 (Illumina) in Rapid mode, with v2 reagents, producing 250 bp reads per end, according to manufacturer's instructions at the Norwegian Sequencing Centre (Oslo, Fig. 1 Triple indexing design. The triple indexing strategy incorporates two PCR steps. During the first PCR (PCR1), the template sequence of interest is targeted and amplified (green). The primers for this reaction also contain an indexing sequence and a heterogeneity spacer sequence (red), and a partial Illumina adapter (blue). A second PCR (PCR2) allows for the introduction of a third indexing sequence (dark blue) as well as completion of the Illumina sequencing adapter Norway). In each case, Illumina spike-in (PhiX) was included at 10% while loading was dropped to 80% normal recommended levels. Raw bcl files were processed using RTA v1. 18.54 (MiSeq) and v1.18.66.3 (HiSeq). Initial de-multiplexing of the data based on the Illumina index reads (third index) was carried out using bcl2fastq v.2.17.1.14, which also converted the raw data to fastq files. Quality of the sequenced data was verified using FastQC v0.11.3 [9] and was summarised using multiqc v0.3.1 [10].

Downstream analyses of mock community data
Low quality reads were trimmed, and Illumina adapters were removed using Trimmomatic v0.36 [11] with default settings. Reads mapping to the PhiX genome (NCBI id: NC_001422.1) using BBMap v36.02 [12] were also removed. De-multiplexing of data based on the dual index sequences was carried out using custom scripts (https:// github.com/arvindsundaram/triple_index-demultiplexing). Internal barcodes and spacers were removed using cutadapt v1.4.1 [13], and paired reads were merged using FLASH v1.2.11 [14] with default settings. Sequence files were then converted from fastq to fasta, and primers were trimmed from merged read ends. For each mock community sample, reads were aligned against a local database consisting of the 16S rRNA gene sequences of the 33 bacterial species included in the mock community using BLAST [15] and only reads with full length alignment with 100% sequence identity were retained for subsequent analyses, except for chimera detection. Chimera detection was carried out on the raw, merged reads using the uparse [16] pipeline in usearch v8.1.1861 [17] with no minimum group size for the dereplication and clustering steps. Chimera formation rates were then found from the uparse output files.
The BLAST tabular output files were then imported into R [18]. Operational taxonomic unit (OTU) tables were produced by counting the number of occurrences of each bacterial species in the mock community. Since the resulting OTU tables were non-sparse with each OTU at an expected relative abundance of~3%, ordination was done using principal components analysis (PCA) with the prcomp function in R (stats package). Bray-Curtis distance matrix computation, adonis modeling (Permutational Multivariate Analysis of Variance Using Distance Matrices) and ANOSIM (using Bray-Curtis dissimilarities and 10000 permutations) were carried out using the R package vegan.

Environmental sample sequencing
A standardized environmental sample was made by suspending a fresh fecal sample (32.5 g) from a healthy infant in 500 ml of PBS. 1 ml aliquots of this suspension were then transferred to microtubes and immediately frozen at −80°C. 500 μl of sample material was used for DNA extraction with the PowerSoil 96 well DNA isolation kit (MO BIO Laboratories Inc., Carlsbad, CA, USA). A total of 54 standardized samples were DNAextracted on 19 different plates and were prepared as part of 19 different sequencing libraries (Additional file 8: Table S7), with 3 standardized samples per library with two exceptions (one library with two samples and one with only one). Library preparation and Illumina sequencing were carried out as described above, and the samples were sequenced as part of two separate HiSeq sequencing runs (26 samples in one, 28 in the other). Sequence processing was performed as described above up to and including paired read merging. Further processing was carried out using a combination of vsearch v2.0.3 [16] and usearch v8.1.1861 [17]. Specifically, dereplication was performed with the -derep_fulllength function in vsearch with a minimum unique group size set to 2. OTU clustering, chimera removal, taxonomic assignment, and OTU table building were carried out using the uparse pipeline [19] in usearch. Taxonomic assignment to the genus level was done against the RDP classifier. Read depths were normalized by common scaling [20]. This entails multiplying each OTU count in a given library with the ratio of the smallest library size (97,187 reads) to size of the library in question. This procedure replaces rarefying (i.e., random sub-sampling to the lowest number of reads) as it produces the library scaling one would achieve by averaging over an infinite number of repeated sub-samplings. After conversion of OTU counts to relative abundances, OTUs with an average relative abundance below 0.01% were removed from the data in order to reduce noise from artifacts. Kruskal's non-metric multidimensional scaling was carried out using the function isoMDS in the R package MASS with default settings.
We also sequenced 25 fecal samples from 5 healthy adults. We analyzed 4-6 samples, collected on consecutive days, per individual. With one exception the samples were prepared as part of the same library, and all samples were sequenced in one HiSeq run. Sequencing and subsequent analyses followed the exact same procedure as the standardized samples described above, with the exception that scaling was done against a smallest library size of 53,902 reads.
For further comparison, 15 additional human samples were sequenced using the widely used earth microbiome project (EMP) protocol [7], targeting the exact same V4 fragment of the 16S rRNA gene as our triple indexing approach. Sample collection and extraction was performed as described in [21], but in this case, the actual sequencing was carried out at the Broad Institute (Cambridge, Massachusetts, USA) core sequencing facility and sequencing was done on a MiSeq instrument with 150 PE reads. We re-sequenced these samples using the triple indexing approach on the HiSeq at the Norwegian Sequencing Centre (Oslo, Norway). Sequencing data were processed as described above, with exception that scaling was done to the smallest library size of 10,241 reads.

Results
We evaluated several potential sources of bias in our sequencing strategy, using the total output from one MiSeq run and partial output from two HiSeq runs, resulting in six separate datasets (Table 1). Bias caused by sample indexing both in PCR1 (Dataset 1, Additional file 4: Table S3) and PCR2 (Dataset 2, Additional file 5: Table  S4) reactions, as well as the effects of DNA template amount used for PCR and PCR cycle number (Dataset 3, Additional file 6: Table S5), were evaluated by sequencing the mock community 160 times in a single MiSeq run. We sequenced the mock community 24 additional times as part of a HiSeq run (Dataset 4, Additional file 7: Table  S6) in order to assess design optimization, as well as to provide a basis for comparison between MiSeq and HiSeq results. Furthermore, a standardized environmental sample was sequenced 54 times as part of two separate HiSeq runs (Dataset 5, Additional file 8: Table S7) in order to assess variability between sequencing libraries and sequencing runs. In addition, we sequenced, in a single HiSeq run, 25 fecal samples collected on consecutive days from 5 adult volunteers (Dataset 6, Additional file 9: Table S8), in order to provide a baseline measure of variability in a realistic experimental setting, as well as to assess discriminatory power. Finally, we compared our sequencing technique with the widely used EMP protocol (Dataset 7, Additional file 10, Table S9).

Effect of PCR1 indexing
Twelve forward and eight reverse PCR1 primers were used in this design to generate 96 PCR1 index combinations (Dataset 1, Additional file 4: Table S3), i.e., a full standard plate. Since these primers contain different 12 bp indices, and have slightly different lengths due to the "heterogeneity spacer", they can conceivably amplify with different efficiencies during PCR, which could bias results. To examine this effect, we amplified the same template (the mockcommunity library, 2.5e 6 input molecules per reaction) using 96 PCR1 primer combinations, employing 35 cycles of PCR1 and 10 cycles of PCR2 amplification (the same PCR2 forward and reverse primer was used for all reactions). These amplification conditions represent the maximum number of cycles (hence maximum possible bias) used in the study. The 96 samples were sequenced on the MiSeq platform yielding a mean of 80,330 (±31,559 s.d.) reads per sample after quality filtering and paired read merging.
Variation in yield from each PCR reaction can be expected due to amplification efficiency and technical variation caused by normalization prior to PCR2. Amplification efficiency was seen to be approximately equal from all reactions, as determined by running the products on an agarose gel (data not shown). However, index combinations where the same index sequence was present on both forward and reverse PCR1 primers clearly produced a lower yield of sequence data (Additional file 11: Figure S2), with mean read numbers at 8968 (same index samples, n = 8) vs. 86,818 (different index samples, n = 88). However, apart from the relatively low read numbers, the species relative abundances measured in these samples did not differ noticeably from other samples, and thus were included in further analyses.
Based on the blending estimates described in Pinto and Raskin [8], the expected proportion of each species was 3%. Overall relative abundance estimates generally centered on the expectations (Fig. 2), with a mean deviation from the expected value by 1.34 (±0.91 s.d.). However, some species deviated more than others from expectation. In particular, Thermomicrobium roseum was detected, at very low abundances, in only 7 of the 96 sequenced samples. Notably, this is the species with the highest GC content (69%) in the mock community, so low sequencing efficiency of the rDNA from this bacterium is unsurprising, as Illumina sequencing is not effective on sequences with extreme GC content [22]. Overall, significant spread in relative abundance estimates within species was observed, with a mean standard deviation of 0.66%. This could largely be attributed to bias effects from the primers used for PCR1 amplification causing significant structure in the data (Fig. 3, Additional file 12: Figure S3), with 74.4% of the intersample variation in Bray-Curtis distances explained by samples having been amplified with different PCR1 reverse primers (p < 0.001, ANOSIM) and 18.7% ascribed to the forward primers (p = 0.003, ANOSIM), accounting for a total of 93.1% of observed variation. Observed effects were primarily associated with the oligo indices designated r1, r10, and r13 (Fig. 3a). These  Table S3, Additional file 7: Table S6). Species abundance estimates are shown side-by-side with MiSeq estimates labeled 'MS' and HiSeq estimates labeled 'HS'. For enhanced visualization, each pair of colored bars (blue or white) depicts the estimated relative abundances for one species. The dotted red line shows the relative abundance expectation given perfectly equal blending. Each box represents the interquartile range while the whiskers represent 1.5 times the interquartile range. Points outside the whiskers represent outliers  Table S3). a. Samples are colored according to the reverse primer used for PCR1. b. Samples are colored according to the forward primer used for PCR1. In both a and b, the first two dimensions, explaining 62% of the total variance, are shown effects were again associated with specific bacterial species (Additional file 13: Figure S4), e.g., r1 appeared to be biased in favor of Mycobacterium orale and an uncultured Bacteroidetes while r10 and r13 favored an uncultured Cyanobacterium and an uncultured Verrucomicrobium. In general, there was good agreement between the 96 measured mock communities with an average Spearman correlation of 0.85 between the vectors describing relative species abundances, but the relatively poorer agreement between samples processed with the r1 primer and those amplified with the r10 and r13 primers was evident, with correlations even falling below 0.5 (Additional file 14: Figure S5). We also observed a significant negative relationship between estimated mean relative abundances and GC content of sequenced fragments (p = 0.002, linear regression; Fig. 4), with relative abundance decreasing by 0.18% for each 1% increase in GC content. Chimera formation rates, at a mean of 17% (±2.2 s.d.), were not significantly associated with specific oligo combinations, except for a slightly elevated rate for samples amplified with primer r13 (p < 0.001, linear model, (Additional file 15: Figure S6). Furthermore, it was evident that the large majority of chimeric sequences occurred only once in any given sample, with a mean proportion of singleton chimeras at 90%.

Effects of PCR2 indexing
Similar to the analysis above, we attempted to quantify bias emerging from the use of different index sequences introduced in PCR2 amplification. In the case of PCR2, however, indices were only present on the reverse primer, were shorter (6 bp), and no heterogeneity spacers were required. Furthermore, the priming sequence used for this amplification did not contain wobble-bases (Additional file 1: Table S1), further reducing the potential for bias. We compared relative species abundances measured in five sets of PCR1 products (four per set for a total of 20 samples) amplified using 35 cycles, with each set of 4 samples using a different PCR2 index primer (Dataset 2, Additional file 5: Table S4) and 10 cycles of amplification. The samples were sequenced on the MiSeq platform yielding and average of 144,171 (±100,253 s.d.) reads. PCR2 was again performed for 10 cycles of amplification, to introduce the maximum possible bias. By far, the largest proportion of intersample variation in Bray-Curtis distances was explained by differences caused by PCR1 reverse priming oligos (82.7%, Adonis model), with the main difference between samples amplified using r1 and r10 (Additional file 16: Figure S7). Conversely, only 8.6% (Adonis model) of the observed distance variation could be related to differences between oligos used for PCR2 amplifications, and these effects were not found to be statistically significant (p = 0.98, ANOSIM).

Effects of PCR cycle number and template amount
PCR has on numerous occasions been shown to introduce amplification bias [23]. We therefore wished to examine how the number of amplification cycles, during PCR1 and PCR2, affected sequencing results. To this end we sequenced 48 mock community samples (MiSeq, mean of 104,705 (±43,026 s.d.) reads per sample) according a factorial design (Dataset 3, Additional file 6: Table S5). Briefly, we performed PCR1 amplification using either 25, 30, or 35 cycles (16 samples per treatment). To investigate the effect of PCR2 cycle number, PCR1 products were further amplified with either 5 or 10 cycles (24 samples per treatment). Furthermore, within each combination of PCR cycle number regimes, initial PCR1 amplification was carried out using either relatively low (2.5e 6 molecules) or high (2.5e 7 molecules) amounts of DNA template (24 samples per treatment). This design gave a total of four samples within each possible combination of variable levels. We did not observe any significant interaction effects between template amount and PCR cycle number (p = 0.94 and 0.98 for PCR1 and PCR2, respectively, Adonis models), or between PCR1 and PCR2 cycle regime (p = 0.82, Adonis model). Thus, for further statistical analysis each variable was treated as separate from the two others, with the number of samples per treatment as stated above.
Significant effects of input DNA template amount for the PCR1 amplification were observed (p < 0.001, ANOSIM; Fig. 5), particularly related to the species  Table S3) and HiSeq (p = 0.012, n = 24, Additional file 7: Table S6) data. Estimates drop by 0.18 and 0.16% for each 1% increase in GC content for the MiSeq and HiSeq estimates, respectively Fibrobacter succinogenes being favored in reactions using the relatively high template concentration (Additional file 17: Figure S8). However, the abundance estimates were remarkably robust to the effect of DNA input amount over the 10-fold range tested here, explaining only 12.1% of the total observed variation in inter-sample Bray-Curtis distances (Adonis model).
There was a significant, albeit noisy, effect of the number of PCR1 cycles on the number of sequence reads produced (Additional file 6: Table S5), with read numbers on average increasing with 4084 reads per added cycle (p = 0.006, linear regression). No corresponding effects were observed for PCR2 cycle number or template amount (p = 0.91 and 0.89, respectively; linear models). We observed a clear but subtle effect of the number of cycles used in PCR1 reactions on estimated relative abundances (p < 0.001, ANOSIM) (Additional file 18: Figure S9), with this factor accounting for 13.6% of the observed variation in Bray-Curtis distances (Adonis model). However, comparisons of squared deviations from the expected relative abundances between samples that had undergone different numbers of PCR1 cycles did not find any particular regime to produce results significantly closer to the expectation (25 vs. 30 cycles: p = 0.45; 25 vs. 35 cycles: p = 0.30; 30 vs. 35 cycles: p = 0.86, unpaired t tests). All in all there were ten species for which we observed a significant linear relationship between measured relative abundance and the number of cycles used during PCR1 amplification (Fig. 6). PCR1 cycle number seemed to particularly affect abundance estimations of species with a high GC content, e.g. T. neapolitana and Thermodesulfobacterium commune (GC of 64 and 61%, respectively). One extreme example is T. roseum, the species with the highest GC content (69%) on the sequenced fragment. This species was observed in 15 out of 16 sequenced samples, albeit at very low abundances, when using 25 PCR1 cycles, but only in 5 out of 32 samples at higher cycle numbers (relative abundances <0.01%). Another noteworthy example is the uncultivated Gemmatimonadetes species (GC of 62%) which was readily observed after 25 and 30 PCR cycles, but dropped sharply at 35. Conversely, we observed a positive relationship for four species of low or intermediate GC content (Fig. 3). The number of cycles used for PCR2 did not produce discernible effects (p = 0.84, ANO-SIM) (Additional file 19: Figure S10) and accounted for only 1.1% of observed distance variation between samples.
Even when considering the bias sources built into the experimental design in terms of PCR regime and template amount, the by far strongest effects on data structure could be ascribed to bias related to the primers used for PCR1 amplification. As noted above, the DNA oligos used for forward and reverse priming differed only in their respective index/spacer sequences, yet 62.1% of inter-sample variation in Bray-Curtis distances was explained by samples having been amplified with different reverse primers during PCR1. As above, the forward primers had a much smaller effect, contributing only 2.1% of inter-sample variation. All in all, 91.1% of the total distance variation could be explained by the combination of PCR cycle number effects, template concentration and PCR1 priming bias.

Chimeric read formation
Chimeric amplification artifacts, arising from two or more templates, artificially inflate diversity estimates in 16S rRNA gene sequencing data. Over-amplified libraries can include up to 45% such artifacts [24]. We detected a strong linear association between PCR cycle regime and chimeric sequence formation when considering PCR1 cycle numbers, with the proportion of chimeric sequences increasing with as much as 1% for each added reaction cycle (R 2 = 0.81, p < <0.001, linear model) (Additional file 20: Figure S11). We did not observe a significant relationship when only considering the second step PCR2 regime (p = 0.61, t test); however, when considering the combined effects of both PCR1 and PCR2, there was still a strong linear relationship (R 2 = 0.65, p < <0.001, linear model: Fig. 7). It was also evident that for samples having undergone an equal total number of PCR cycles (PCR1 + PCR2) partitioned differently between first and second step reactions, i.e., 25 + 10/  Table S5). Samples are colored according to PCR1 and PCR2 cycle regime, with the number of cycles indicated in the legend (PCR1 + PCR2). Filled dots and triangles represent samples prepared with tenfold difference in input DNA template concentration used for PCR1. The first two dimensions, explaining 65% of the total variance, are shown 30 + 5, and 30 + 10/35 + 5, the mean chimera formation rates were significantly higher in the samples having undergone more first step cycles (p < 0.001 for both comparisons, t tests). We did not observe significant effects of input template amount on chimera formation rates (p = 0.72, t test). However, we only tested a relatively narrow range of template amounts and cannot draw conclusions for scenarios with more extreme variation in input DNA.

Optimization and scalability to the HiSeq platform
In order to test the scalability of our protocol and to investigate the potential for increasing measurement consistency by reducing bias introduced through suboptimal primer combinations, we sequenced the mock community 24 more times (Dataset 4, Additional file 7: Table S6) as part of a HiSeq rapid run producing 250 bp paired-end reads (mean read number of 297,930 (±91,387 s.d.)). In this experiment, we eliminated any instance of identical indices on the forward and reverse primers, and we replaced the poorly performing primers r1, r10, and r13 with alternative ones (r3, r12, and r15; Supplementary table S6). These primers were chosen because they have similar heterogeneity spacer lengths to the ones they replaced (Additional file 1: Table S1). We also extended the melting time during PCR from 10 to 30 s to try to alleviate bias caused by high the GC content of some fragments.
Overall measurement accuracy was comparable with the MiSeq run described in the previous experiment  Fig. 7 Relationship between PCR cycle number and chimeric sequence formation in dataset 3. The combined numbers of PCR1 and PCR2 amplification cycles are indicated on the x-axis. Black and red dots indicate samples amplified using 5 and 10 cycles for PCR2, respectively. A highly significant linear relationship (p < <0.001, linear regression model) was observed. The effects were primarily related to the PCR1 cycle number, e.g., samples undergoing 35 cycles (25 cycle PCR1 and 10 cycles PCR 2) had less chimeras than samples undergoing 35 cycles (30 cycles PCR1 and 5 cycles PCR2) (Fig. 2) with a mean deviation from the expected abundances at 1.38%, and observed differences in estimated community composition were not statistically significant (p = 0.84, ANOSIM). Significant structure caused by bias both on the reverse and forward primers was still observed (p < 0.001 and p = 0.01, respectively, ANOSIM), with 71.9 and 23.1% (Adonis model) of the observed inter-sample variation in Bray-Curtis distances explained by the reverse and forward priming oligos used for PCR1 amplification, respectively (Additional files 21 and 22: Figures S12 and S13). The most severe discrepancies were caused by the new primer combinations (r3, r12, and r15), suggesting that spacer length may be playing a part in amplification bias. However, the new primers performed better than the indices that were replaced, as Spearman correlations in this run did not go below 0.66 (Additional file 23: Figure S14). The mean Spearman correlation of the estimated OTU composition vectors was 0.87.
The bacterial species that were subject to the most severe bias were the same as before (Fig. 2). Importantly, in this sequencing run we were consistently able to observe, at moderate abundances, the T. roseum sequences that were absent from the MiSeq run under the same 35 cycle PCR1 and 10 cycle PCR2 regimes with a mean relative abundance increase from 0.00023 to 1.0%. Estimates for other high GC content species T. neapolitana and uncultured Gemmatimonadetes species also saw moderate increases in mean relative abundance (0.7-1.2% and 2.9-3.2%, respectively). We attribute this to the increased time for DNA denaturation used for both PCR1 and PCR2 amplification prior to sequencing on the HiSeq, and this result suggests that denaturation time during PCR can affect measurements, especially for high GC content species.

Sequencing of environmental samples
Further evaluation of the sequencing protocol was done by sequencing human fecal samples. First, a standardized sample was sequenced 54 times using a variety of indices (Dataset 5, Additional file 8: Table S7). These samples were processed as part of 19 different rounds of DNA extraction and subsequent library preparation (i.e., on 19 different 96-well plates; Table 1). Furthermore, they were sequenced as part of two different HiSeq runs, with 26 samples sequenced on the first run and 28 on the second. The mean Bray-Curtis distance of estimated bacterial community composition between samples was 0.18, a value slightly higher than that observed for the mock community (0.12) (Fig. 8a). Most of the variation (54.4%, Adonis model) in inter-sample distances could be ascribed to batch effects (p < 0.001, ANOSIM), i.e., the fact that the samples were processed on different plates. This could be due to differences in DNA extraction efficiency or downstream steps of library preparation. 20.1% of the variation could be attributed to amplification bias related to the reverse primers used during the PCR1 amplification (p < 0.001, ANOSIM), while 3.7% of the variation was related to the forward primers (p = 0.009, ANOSIM). 14% of the variation could be explained by the samples having been sequenced on two separate HiSeq runs (p < 0.001, ANOSIM). In total, these four factors accounted for 92.3% of the observed variation.
For the purpose of comparison, 25 fecal samples were obtained from 5 healthy adult volunteers. The samples were taken sequentially on a daily basis (4-6 days) and were sequenced on a single HiSeq run (Dataset 6, Additional file 9: Table S8). The purpose of this was to provide a baseline measure of inter-sample differences, as well as to evaluate relative effects of bias caused by indexed PCR, in a realistic experimental setting. The mean Bray-Curtis distance of this sample set was 0.57, illustrating that the variation observed in the data from both the mock community and the standardized fecal sample are comparatively low (Fig. 8a). The inter-sample variation in distances could in this instance mainly be ascribed to differences between the individuals from which the samples originated (74.1%, Adonis model), while only 11.0 and 8.7% of the total variation was explained by the reverse and forward PCR1 primers, respectively (93.8% explained by all 3 factors). This demonstrates good potential for detecting biological signal in the face of inherent technical bias (Fig. 8b).
We further compared the triple indexing approach with the widely used EMP library preparation method [7]. Bray-Curtis distances between samples sequenced using the two methods was significantly lower in identical relative to non-identical samples (Fig. 8c, p < <0.001, Wilcoxon rank sum test), and sample clustering followed the expected pattern ( Fig. 8d) with no significant systematic batch effects (p = 0.85, ANOSIM). We observed high correlations between sample pairs with means of 0.76 (Spearman) and 0.85 (Pearson), and sample pairing accounted for 94.3% of the observed between sample distances (Adonis model).

Discussion
In this study, we carried out extensive benchmarking of an Illumina sequencing library preparation protocol incorporating several extant features in order to improve on popular designs. Our main recommendations for optimal use of the protocol described here are summarized in Table 2. The use of a triple indexing strategy allows for extensive multiplexing with a greatly reduced requirement for long custom oligos. This approach is also in theory extendable to quadruple indexing, with the addition of a barcode sequence on the reverse primer used during PCR2. Addition of heterogeneity spacers [4] allowed for PhiX spike-in at low levels while maintaining high data quality. Finally, the rapid run mode available on the Illumina HiSeq 2500 allows for a read length that easily spans the~300 bp of the V4 marker fragment when using paired-end sequencing and generates up to 10 times as much data as a MiSeq run, allowing faster data generation on a single machine run.
The original article analyzing the mock community used in this study [8] reported significant deviations from the expected relative abundances according to their blending design. It should be noted that plasmid blending represents a significant error source, and some deviation from the intended ratios is inevitable. Pinto and Raskin [8] reported poor detection of some species, such as the thermophiles T. neapolitana and T. commune. These species were also detected below expected  Table S3), standardized sample (SS, dataset 5, Additional file 8: Table S7), and healthy adult (HA, dataset 6, Additional file 9: Table S8) group. Each box represents the interquartile range while the whiskers represent 1.5 times the interquartile range. Points outside the whiskers represent outliers. The number of pairwise distances for each group is indicated over the boxes. b. Multidimensional scaling (MDS) plot showing clustering of 25 samples taken from 5 healthy adult volunteers (dataset 6, Additional file 9: Table S8). Sample origin is indicated by color (individual 1-5). The stress value of the MDS model was 13.2%, indicating a good fit. c. Pairwise Bray-Curtis distances for the 15 samples sequenced using 2 different library preparation methods (dataset 7, Additional file 10, Table S9). The leftmost box shows distances between identical samples (P = paired), while the box on the right shows the distances for non-identical samples (UP = unpaired). d Multidimensional scaling plot showing clustering of the 15 samples sequenced using 2 different library preparation methods (dataset 7, Additional file 10, Table S9). Paired samples, i.e., identical samples sequences using different techniques, are joined by black lines Table 2 Recommendations abundances in our study. On the other hand, the previous study reported very low recovery of sequences from V. vadensis and M. orale. We detected V. vadensis at around the expected 3%, while M. orale was highly overrepresented in our data at around 6%. Conversely, Pinto and Raskin [8] found sequences from the two Syntrophus spp. to be overrepresented in their data, as did we. However, they detected the uncultured Planctomycetes far over the expected 3% while we found it to represent an average of around 2% of reads. Direct comparison between the two studies is complicated by the fact that they used primers targeting different regions of the 16S rRNA gene, a known cause of differential PCR amplification bias [2]. Furthermore, we employed the Illumina sequencing platform as opposed to 454 pyrosequencing, which was used in the older study.
The input template DNA amount used for PCR1 had only moderate effects on the estimated relative abundances in the mock community, and the effects we did observe were consistently associated with specific species. In particular, for F. succigenes, a major cellulolytic component of the ruminant microbiota [25], we observed an increase from 3.3 to 6.7% mean relative abundance when going from low to high amounts of template DNA. It is unclear whether this type of bias is specific to the mock community we analyzed, or if it represents a more general phenomenon. Wu et al. [26] sequenced various dilutions of sediment samples without observing significant effects on estimated community composition. Kennedy et al. [27], on the other hand, found significant effects of template concentration when analyzing soil and fecal samples. This suggests that template amount bias might be system specific and should be treated on an ad hoc basis. By sequencing dilutions of a subset of samples within a given experiment, one could determine if bias is associated with particular taxa, and either error correction could be employed or caution used when making conclusions based on taxa that are more prone to template amount bias.
It is well established that multi-template PCR is subject to several sources of bias [28,29], and these error sources can be aggravated when using indexed primers for highly multiplexed DNA sequencing [30].The PCR cycle number used for amplification of a target fragment is a factor that can be expected to have effects on amplicon sequencing results. As expected, the number of PCR cycles used for amplification had a big impact on chimeric sequence formation rates, in line with previous findings [2]. Interestingly, chimera formation was much more strongly associated with the PCR1 reactions than PCR2 reactions (Fig. 7), indicating that primer degeneracy or the appended indices and adapters, or a combination of the two, increases the probability of template switching during PCR amplification relative to an amplification with perfectly matching primer sequences, such as for the PCR2 reactions used in our protocol. However, the relatively low cycle number used for PCR2 may also have contributed to masking the effects of this reaction. The high proportion of singleton chimeric reads (90%) also suggests that simple abundance filtering can greatly help towards elimination of chimeric sequences from the dataset. The remainder can be eliminated using one of several available chimera detection programs [19,24].
We only observed relatively subtle effects of PCR1 cycle number and estimated relative abundances of species in the mock community. There was, however, a systematic negative relationship between PCR cycle number and estimated relative abundances of sequences with very high GC content (Fig. 6). This is a phenomenon that has been observed previously [31]. A likely explanation is that the relatively higher melting temperatures of these sequences causes them to be more easily outcompeted by lower GC containing template sequences due to differential amplification efficiency. Increasing the denaturation time during PCR1 increased recovery of the high GC content sequences in our mock community (T. roseum, T. neapolitana, and uncultured gemmatimonadetes). The effects of melting time during PCR should be especially considered for bacterial communities expected to containing species with a wide range of GC contents. We also observed a significant negative relationship between estimated mean relative abundances and GC content of sequenced fragments in general. This suggests that a GC correction factor could be employed where relative abundance estimates are multiplied by some empirical constant in order to compensate for high GC content. The exact value of the correction factor (in our case 0.18% per percent increase in GC) should be corroborated by further experimentation.
Unexpectedly, we observed greatly reduced sequencing output for samples using identical index sequences on the forward and reverse primers during PCR1 (a reduced output by a factor of around 10 was observed in all 15 cases). A cursory analysis suggests that this phenomenon is caused by formation of high melting temperature hairpin structures that interfere with sequencing efficiency. Eliminating these primer combinations for the HiSeq runs removed this problem. However, although unlikely, we cannot conclude that this issue is not specific to the MiSeq platform.
Primer indexing was found to be an important structuring factor affecting relative abundance estimates. This has been observed by other investigators [8,30,32]. The bulk of the observed variation (74.4%) due to primer indexing in this study was caused by the reverse primers. We were able to partially alleviate this problem by replacing underperforming primers (Additional file 1: Table S1), with the total amount of the variation ascribed to primer indexing decreasing from 93.1 to 79.7%. Some bias caused by primer indexing during multi-template PCR is inevitable; however, an informed rationale for designing primer indices/spacers that minimize bias would represent a significant advancement. Possible ways of reducing this source of bias are by first using an unindexed PCR followed by either ligation of index and adapter sequences [2] or by using a second step indexed PCR [30,33]. We did not observe significant bias due to the non-degenerate PCR2 amplifications (which used six-nucleotide indices regularly employed by Illumina library preparation reagents on the reverse primers), suggesting that carrying out indexing PCR on amplicons, rather than genomic DNA, may alleviate primer indexing bias. The choice of method will be a cost to bias trade-off (Table 1). Using an unindexed PCR1 with dual indexing would entail the purchase of a substantially larger number of long custom DNA oligos in order to achieve a high degree of multiplexing. In the case of dual indexing the number of oligos scales with the square of the number of samples to be sequenced, while with triple indexing the number of oligos increases linearly with one additional oligo per 96 samples. For example, for one HiSeq run we multiplexed 1152 samples (12 plates) using only 33 oligos, while the corresponding number required for a two-step PCR dual indexing protocol would be 70.
Primer indexing, although significant, was a relatively minor contributor to the overall variation when sequencing a standardized human fecal sample. In this case, the main variation (54.4%) in inter-sample distances was related to batch effects, such as DNA extraction being performed on different days. This highlights the importance of having highly standardized protocols starting with DNA extraction and through to the actual sequencing step. Although the mean Bray-Curtis distances between samples were higher for the standardized fecal samples than the mock community (Fig. 8), the mean Spearman correlation of the estimated OTU composition vectors of both the mock community (0.85) and the standardized fecal sample (0.87) were similar. This suggests that one should get consistent results if one chooses either to benchmark by sequencing a defined mixture of 16S rRNA gene molecules or by repeated sequencing of a standardized environmental sample. Comparison with fecal samples taken sequentially from five adults demonstrated that the inter-sample distances observed in a real-world experimental setting are significantly higher than observed in both the mock community and standardized environmental sample data. In this experiment, primer indexing bias was only a minor contributor to the overall variation, while the biological signal resulting from sample origin accounted for 74.1% of the total observed variation. Comparison of the triple indexing approach with the EMP library preparation protocol demonstrated good agreement between the two methods. The EMP protocol uses single indexing of the reverse primer in a single step PCR amplification and in this case, the sequencing was carried out at a separate sequencing facility. We observed no significant batch effects on data produced either by us or at the Broad Institute, indicating that the biological signal by far outweighs any technical biases.
Additional file 11: Figure S2. Read counts for all PCR1 primer combinations in Dataset 1. Each bar represents the read count of a mock community sample after quality filtering and paired read merging. The figure shows data for all 96 PCR1 index combinations (Dataset 1, Additional file 4: Table S3). Tenfold fewer reads were observed in samples that were amplified with primer pairs using the same forward and reverse indices. (BMP 1083 kb) Additional file 12: Figure S3. Effects reverse primer indices used for PCR1 on relative abundance estimates in the mock community (Dataset 1, Additional file 4: Table S3). For each species, the eight boxes displayed shows relative abundance estimates obtained using each of the eight indexed PCR1 primers, with the primer name appended to the species names. Each set of measurements results from 12 replicates. Species are separated by grey dotted lines. Each box represents the interquartile range while the whiskers represent 1.5 times the interquartile range. Points outside the whiskers represent outliers. (PDF 27 kb) Additional file 13: Figure S4. Principal components analysis loadings plot showing the bacterial species defining the two largest variance components in Dataset 1. The main axis of variation accounts for 50% of the total and defines the difference between samples amplified with PCR1 primers r1 (e.g., Mycoplasma orale) and r10 and r13 (e.g., uncultured verrucomicrobium). (BMP 1806 kb) Additional file 14: Figure S5. Heat map of pairwise sample correlations in mock community MiSeq data. Pairwise Spearman correlations between the vectors of estimated relative abundances for mock community samples amplified using all 96 PCR1 primer combinations (Dataset 1, Additional file 4: Table S3). Primer pairs are shown on the x-and y-axis. The color of each cell indicates the degree of correlation, according to the color key on the right side of the figure. (BMP 656 kb) Additional file 15: Figure S6. Association between chimeric sequence formation and reverse index sequence from PCR1. The data are the mock community samples amplified using 96 PCR1 primer combinations (dataset 1, Additional file 4: Table S3). Chimera formation was not significantly associated with specific primer combinations, except for a slightly elevated rate for samples amplified with primer r13 (p < 0.001, linear model). Each box represents the interquartile range while the whiskers represent 1.5 times the interquartile range. Points outside the whiskers represent outliers. (BMP 441 kb) Additional file 16: Figure S7. Scores plot based on a principal components analysis model computed from the matrix of species relative abundances in the mock community (dataset 2, Additional file 5: Table S4). Samples are colored according to the reverse primer used for PCR1. Symbol characters represent the different PCR2 reverse primers. The first two dimensions, explaining 79% of the total variance, are shown. (BMP 441 kb) Additional file 17: Figure S8. Effects of input DNA amount used for PCR1 on relative abundance estimates in the mock community (dataset 3, Additional file 6: Table S5). For enhanced visualization, each pair of colored bars (alternating blue or white for easier visualization) depicts the estimated relative abundances for one species. Species abundance estimates for high (H) and low (L) input template amounts are shown side-by-side as indicated in the x-axis labels (H = 2.5e 7 molecules, L = 2.5e 6 molecules). Each set of measurements results from 24 replicates. Each box represents the interquartile range while the whiskers represent 1.5 times the interquartile range. Points outside the whiskers represent outliers. (BMP 1806 kb) Additional file 18: Figure S9. Effects of PCR1 cycle number on relative abundance estimates in the mock community (dataset 3, Additional file 6: Table S5). For enhanced visualization, each alternate triplet of colored bars (blue or white) depicts the estimated relative abundances for one species. Species abundance estimates for 25, 30, and 35 cycles are shown side-byside as indicated in the x-axis labels. Each set of measurements results from 16 replicates. Each box represents the interquartile range while the whiskers represent 1.5 times the interquartile range. Points outside the whiskers represent outliers. (BMP 1806 kb) Additional file 19: Figure S10. Effects of PCR2 cycle number on relative abundance estimates in the mock community (dataset 3, Additional file 6: Table S5). For enhanced visualization, each alternate pair of colored bars (blue or white) depicts the estimated relative abundances for one species. Species abundance estimates for 5 and 10 cycles are shown side-by-side as indicated in the x-axis labels. Each set of measurements results from 24 replicates. Each box represents the interquartile range while the whiskers represent 1.5 times the interquartile range. Points outside the whiskers represent outliers. (BMP 1806 kb) Additional file 20: Figure S11. Relationship between PCR1 cycle number and chimeric sequence formation in dataset 3 (Additional file 6: Table S5). The number of PCR1 cycles is indicated on the x-axis. Black and red dots indicate samples amplified using 5 and 10 cycles for PCR2, respectively. A highly significant linear relationship (p < <0.001, linear regression model) was observed. (BMP 441 kb) Additional file 21: Figure S12. Effects of reverse PCR1 primer indexing in HiSeq data. Scores plot based on a principal component analysis model computed from the matrix of species relative abundances in the mock community sequenced on the HiSeq (dataset 4, Additional file 7: Table S6). Samples are colored according to the reverse primer used for PCR1. The first two dimensions, explaining 72% of the total variance, are shown. (BMP 441 kb) Additional file 22: Figure S13. Effects of forward primer indexing in HiSeq data. Scores plot based on a principal component analysis model computed from the matrix of species relative abundances in the mock community sequenced on the HiSeq (dataset 4, Additional file 7: Table S6). Samples are colored according to the forward primer used for PCR1. The first two dimensions, explaining 72% of the total variance, are shown. (BMP 441 kb)