Adapterama III: Quadruple-indexed, triple-enzyme RADseq libraries for about $1USD per Sample (3RAD)

Molecular ecologists have used genome reduction strategies that rely upon restriction enzyme digestion of genomic DNA to sample consistent portions of the genome from individuals being studied (e.g., RADseq, GBS). However, researchers often find the existing methods expensive to initiate and/or difficult to implement consistently. Here, we present a low-cost and highly robust approach for the construction of dual-digest RADseq libraries. Major features of our method include: 1) minimizing the number of processing steps; 2) focusing on a single strand of sample DNA for library construction, allowing the use of a non-phosphorylated adapter on one end; 3) ligating adapters in the presence of active restriction enzymes, thereby reducing chimeras; 4) including an optional third restriction enzyme to cut apart adapter-dimers formed by the phosphorylated adapter thus increasing the efficiency of adapter ligation to sample DNA; 5) integrating adapter designs that can be used interchangeably; 6) incorporating variable-length internal tags within the adapters to increase the scope of sample tagging and facilitate pooling while also increasing sequence diversity; 7) maintaining compatibility with universal dual-indexed primers, which facilitate construction of combinatorial quadruple-indexed libraries that are compatible with standard Illumina sequencing reagents and libraries; and, 8) easy tuning for molecular tagging and PCR duplicate identification. We present eight adapter designs that work with 72 restriction enzyme combinations, and we demonstrate our approach by the use of one set of adapters and one set of restriction enzymes across a variety of non-model organisms to discover thousands of variable loci in each species.

cut by NheI and ligated to Read 1 adapters should be recut by the same RE, any remaining 2 1 1 molecules of this form are suitable for PCR amplification and may be present in final libraries. primers, each of which returns a different indexing read (Fig. 4). We tested the 3RAD protocol on eight example projects: Kinosternidae (turtles), Ixodidae samples. These projects span a broad diversity of organisms (e.g., in taxonomic classification, 2 2 9 population size, heterozygosity level, and genome size), motivating evolutionary questions, and 2 3 0 associated methods (i.e., from population genetics to phylogenetics; Table 3). After preliminary 1 0 species corresponding to adapter sets R1.A and R2.1 (Table 1 and Design 1 in Appendix S3). We 2 3 3 used the enzymes XbaI, EcoRI-HF, and NheI for all example projects. 3. Oligonucleotides were ordered in plates from Integrated DNA Technologies (Coralville, IA).

3 8
Adapters were solubilized in a solution of 10 mM Tris pH 8, 0.1 mM EDTA, and 100 mM NaCl, 2 3 9 mixed in equal-molar amounts, annealed, diluted and aliquoted (Appendix S4). Similar methods were employed for all samples, but some details varied among species, 2 4 3 as the protocols were developed and implemented over the course of multiple years. In general, 2 4 4 we focus on the version of our protocol that does not include molecular ID tags (Appendix S1 Hoffberg et al. 2016, Appendix S5, and Appendix S6 for protocols that employ molecular ID 2 4 7 tags).

5
Finally, we estimated the prevalence and impact of loci with third RE cut sites in our data.

6
We estimated the proportion of this third RE cut site relative to the first RE cut site (i.e., intended 3 3 7 cut site) for five of the projects presented here. To evaluate the effect of these loci in 3 3 8 downstream analyses, we reanalyzed two of our projects (i.e., both Sphyrna projects) after  cleaning and trimming the raw sequence data as before, but disabling rad check (--3 4 2 disable_rad_check) to leave the cut sites intact; and, 2) using the previous step's output as input, checking only for exact, intended RE cut sites (i.e., XbaI and EcoRI). From this output, we  samples are pooled. The initial cost of restriction digestion and ligation is ~$0.85 per sample 3 5 8 (Appendix S3, "Library_prep_costs" Sheet). If samples are amplified individually, this adds > 3 5 9 $1 per sample, but if the ligations are pooled, then this cost averages to $0.06 per sample. Size-3 6 0 selection using the Pippin prep adds $0.12 per sample, assuming user access to the equipment. A 3 6 1 total of $0.25 per sample is required for tips, plates and tubes. Thus, the total cost for library 3 6 2 preparation is about $1.35 per sample, when samples are amplified in pools of 96.

6 3
In the Ixodidae example project, we obtained between 1.8-7.3 (mean =4.2) million reads 3 6 4 per sample, and for all other projects, we obtained between 0.6-3.6 (mean = 1.3) million reads Information, Table S1). Average coverage per locus varied from 6x for Eurycea to 70x for  Table 3, Supporting Information Table S1).

6 9
The number of loci obtained for each dataset varied from 18,629 in Gambusia to 425,729 in Eurycea. After filtering to retain only polymorphic loci found in at least 75% of individuals 3 7 1 within each population in each dataset, we recovered between 30 (Eurycea) to 19,843 (Ixodidae) 3 7 2 loci containing between 360 and 69,518 SNPs, respectively (Table 3). As expected for RADseq 1 6 and reduced the size of our final datasets from 7183 to 6738 loci in Sphyrna tiburo and 5263 to  (Table S4). We have developed a flexible low-cost system for preparing dual-digest RADseq libraries from many individuals. To illustrate the utility of our method, we presented summary 3 9 2 statistics from analyses of eight small example projects, representing diversity in taxonomy and 3 9 3 scientific objectives. For each project, we obtained tens to thousands of loci containing SNPs for 3 9 4 downstream population genetic and phylogenetic analyses, and among projects, we highlighted 3 9 5 the role of genome size, genetic variation, and sequencing coverage in determining the quality 3 9 6 and quantity of data recovered. For example, when processed differently, we recovered large 3 9 7 numbers of homologous loci both among species in the family Kinosternidae and within a single representative species, Sternotherus depressus, from the same libraries and sequencing reads. These data are informative both in studying variation among populations of S. depressus and 4 0 0 across relatively deep evolutionary time (~55 Myr; Appendix S6). We note that when datasets 4 0 1 span deeper evolutionary time (e.g., Eurycea and Kinosternidae), we recover more loci in initial 4 0 2 steps, but fewer of these loci are shared among individuals. However, increasing evidence 4 0 3 supports the utility of these large and sparse data matrices for systematic-level questions (e.g.  using REs with shorter recognition sequences yielded more loci; data not shown). Additionally, 4 1 8 using broad size-distributions allows larger numbers of loci to be selected and gives more size 4 1 9 tolerance among alleles, whereas narrow distributions yields more limited numbers of loci. We 4 2 0 suspect that use of narrow size-distributions excludes alleles at loci with significant size variation, 4 2 1 and may lead to increased levels of incorrect genotype calls due to the missing alleles outside of 4 2 2 the selected size-range. However, most researchers are targeting loci without size variation, so 4 2 3 this bias should be small.

2 4
A key advance of the 3RAD workflow (Appendix S1 and S2) is the combination of decreases the amount of input DNA required. The 3RAD adapters we designed are useable with 4 3 0 multiple enzymes that leave compatible, cohesive ends (Table 2). Thus, there are at least 72 so that researchers may easily change the costs to reflect updated pricing from their supplier(s).

3 8
It should be noted that certain enzyme combinations work well with some species, but best for any particular organism in a few representative samples (Table 1). We start by looking  Our results show that XbaI, EcoRI-HF and NheI provide a suitable combination of REs  shown). Most of the organisms we have studied to date work well with these three REs, and 4 5 5 most that do not work with the design 1 adapters and primary enzymes, perform better with the 1 9 design 2 adapters and primary enzymes (MspI, ClaI, and BamHI; data not shown). Although viable, we only rarely use the adapters pairs from designs 3 and 4 (R1.C, R1.D, R2.3 and R2.4).

5 8
In our standard protocols (Appendix S1 and S5), we cut DNA with three different REs-4 5 9 two enzymes to create sticky ends for adapter ligation (similar to ddRAD and 2-enzyme GBS) 4 6 0 and the third enzyme to digest a recognition site formed by self-ligation of the phosphorylated 4 6 1 adapters (Fig. 2). Although the third enzyme facilitates creation of libraries with very little input that can be ligated to the R1 adapter, but the adapter:DNA ligation product is susceptible to re- to cleave as many of these products as possible. These loci are, in principle, suitable for 4 6 6 downstream analyses, but because the protocol is designed to minimize their retention, they 4 6 7 should have lower coverage than the intended RE cut site. A high prevalence of these off-target 4 6 8 loci can require additional sequencing (and thus, increase costs), but these reads can easily be optimal concentration are warranted. Alternatively, it is possible to engineer adapters that can 4 7 2 ligate to genomic DNA, but not self-ligate (e.g., using 3' dideoxycytidine), which we have done, phosphorylated adapters that ligate together get cut apart, thus much lower amounts of input  sequence capture to focus sequencing on informative loci, and reduce missing data, and remove 4 9 8 PCR duplicates. In summary, our 3RAD method produces results similar to others reported in the   Derivative works, such as the oligonucleotides presented here, are authorized for use with 5 0 7 1 Illumina instruments and products only. All other uses are strictly prohibited. We thank: Julian 5 0 8 Catchen for modifying Stacks and Isaac Overcast and Deren Eaton for modifying ipyrad to deal 5 0 9 with the variable-length dual internal tags, as well as incorporating additional enzymes to 5 1 0 facilitate data processing of 3RAD data; Erin Lipp for generously sharing laboratory space and support for adapter oligonucleotide development and synthesis. We also thank our colleagues 5 1 5 involved in logistics and sample collection for example projects included in this paper. This Foundation.  Sequence reads generated for study are deposited in the NCBI SRA (SUB2631727). All 6 1 7 scripts used to analyze data and VCF files produced are available from Dryad additional data analysis; all authors edited and commented on the manuscript. The authors declare competing interests. The EHS DNA lab provides oligonucleotide presented herein allows all researchers to synthesize the oligonucleotides at any vendor of their 6 3 2 choice and freely publish results with proper attribution to this paper and Illumina ® ™.  blocks adapter-dimer formation and is not actually required, but is used for naming each adapter set (supplementary file 3). Digestion efficiency 6 3 5 is given for three NEB buffers (2.1, 3.1, and CutSmart®), with the best conditions highlighted in green, and poor or important non-standard 6 3 6 conditions in red. Sensitivity to methylation in the template sequence is given, as is the optimal temperature for digestion and the number of bases 6 3 7 in the recognition sequence.