Adapterama II: universal amplicon sequencing on Illumina platforms (TaggiMatrix)

Next-generation sequencing (NGS) of amplicons is used in a wide variety of contexts. In many cases, NGS amplicon sequencing remains overly expensive and inflexible, with library preparation strategies relying upon the fusion of locus-specific primers to full-length adapter sequences with a single identifying sequence or ligating adapters onto PCR products. In Adapterama I, we presented universal stubs and primers to produce thousands of unique index combinations and a modifiable system for incorporating them into Illumina libraries. Here, we describe multiple ways to use the Adapterama system and other approaches for amplicon sequencing on Illumina instruments. In the variant we use most frequently for large-scale projects, we fuse partial adapter sequences (TruSeq or Nextera) onto the 5′ end of locus-specific PCR primers with variable-length tag sequences between the adapter and locus-specific sequences. These fusion primers can be used combinatorially to amplify samples within a 96-well plate (8 forward primers + 12 reverse primers yield 8 × 12 = 96 combinations), and the resulting amplicons can be pooled. The initial PCR products then serve as template for a second round of PCR with dual-indexed iTru or iNext primers (also used combinatorially) to make full-length libraries. The resulting quadruple-indexed amplicons have diversity at most base positions and can be pooled with any standard Illumina library for sequencing. The number of sequencing reads from the amplicon pools can be adjusted, facilitating deep sequencing when required or reducing sequencing costs per sample to an economically trivial amount when deep coverage is not needed. We demonstrate the utility and versatility of our approaches with results from six projects using different implementations of our protocols. Thus, we show that these methods facilitate amplicon library construction for Illumina instruments at reduced cost with increased flexibility. 
A simple web page to design fusion primers compatible with iTru primers is available at: http://baddna.uga.edu/tools-taggi.html. A fast and easy to use program to demultiplex amplicon pools with internal indexes is available at: https://github.com/lefeverde/Mr_Demuxy.

an ever-growing capacity to generate more reads per run. Substantial progress has been made in developing new, lower-cost instruments, but much less progress has been made in reducing the cost of sequencing runs (cf. Glenn, 2011 vs. Glenn, 2016). Thus, the large number of reads from a typical NGS run comes with a relatively large buy-in cost but yields an extremely low cost per read. Frustratingly, within every NGS platform, the lowest-cost sequencing kits have the highest costs per read (Glenn, 2011, 2016). This creates a fundamental challenge: how do we efficiently create and pool large numbers of samples so that we can divide the cost of high-capacity NGS sequencing runs among many samples, thereby reducing the cost per sample?

It is well known that identifying DNA sequences (commonly called indexes, tags, or barcodes; we use the term indexes throughout) can be incorporated during sample preparation for NGS (i.e., library construction) so that multiple samples can be pooled prior to NGS, thereby allowing the sequencing costs to be divided among the samples (see Faircloth & Glenn, 2012 and references therein). When enough unique indexes are available, many samples, including samples from multiple projects, can be pooled and sequenced on higher-throughput platforms, which minimizes costs for all samples in the pool.

In many potential NGS applications, the number of desired reads per sample is limited, so the cost of preparing samples for NGS becomes the largest component of the overall cost of collecting sequence data. Thus, it is desirable to increase the number of low-cost library preparation methods available.
As the cost of library construction is reduced, it becomes cost-effective to use NGS for projects requiring fewer DNA sequences per sample (i.e., if sample preparation plus sequencing costs less by NGS than by capillary electrophoresis, then it is economical to switch).
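The switching criterion above compares two per-sample totals. The sketch below makes that comparison explicit under a simple model with a one-time buy-in plus fixed per-sample costs. The model is an assumption, and every dollar figure in it is a hypothetical placeholder rather than a value from this study.

```python
import math


def per_sample_cost(buy_in: float, prep_per_sample: float,
                    seq_per_sample: float, n_samples: int) -> float:
    """Per-sample cost when a one-time buy-in (e.g., primer oligos) is spread
    over n_samples, plus per-sample preparation and sequencing costs."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return buy_in / n_samples + prep_per_sample + seq_per_sample


def ngs_is_cheaper(ngs: dict, capillary: dict, n_samples: int) -> bool:
    """True if NGS prep + sequencing costs strictly less per sample than capillary."""
    return (per_sample_cost(n_samples=n_samples, **ngs)
            < per_sample_cost(n_samples=n_samples, **capillary))


def break_even_samples(ngs: dict, capillary: dict):
    """Smallest sample count at which NGS becomes strictly cheaper per sample,
    or None if NGS variable costs are not lower than capillary ones."""
    var_ngs = ngs["prep_per_sample"] + ngs["seq_per_sample"]
    var_cap = capillary["prep_per_sample"] + capillary["seq_per_sample"]
    if var_ngs >= var_cap:
        return None
    extra_buy_in = ngs["buy_in"] - capillary["buy_in"]
    return max(1, math.floor(extra_buy_in / (var_cap - var_ngs)) + 1)


# Hypothetical cost figures (USD), for illustration only.
ngs = {"buy_in": 1000.0, "prep_per_sample": 2.0, "seq_per_sample": 0.5}
capillary = {"buy_in": 0.0, "prep_per_sample": 1.0, "seq_per_sample": 5.0}

for n in (100, 500, 1000):
    print(n, ngs_is_cheaper(ngs, capillary, n))  # 100 False, 500 True, 1000 True
print(break_even_samples(ngs, capillary))        # 286
```

Under these placeholder numbers, the buy-in is recovered once enough samples share it. This is why low buy-in methods matter most for small projects.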

Previous NGS amplicon library preparation methods

Amplicon library preparations for NGS have integrated indexes for more than a decade (e.g., Binladen et al., 2007; Craig et al., 2008). Early NGS strategies consisted of conducting individual PCRs targeting different DNA regions from one sample and then pooling them. Full-length adapters were then ligated to each sample pool, providing sample-specific identifiers. This approach has the advantage of being economical with respect to amplicon production, primer cost, and pooling of amplicons prior to adapter ligation. It is also ecumenical, because the resulting amplicons can be ligated to adapters for any sequencing platform. The downside of this first approach is that adapters must be ligated to the amplicons, which is time-consuming, expensive, and error-prone, and which can introduce errors into the resulting sequences. To avoid ligation of adapters to amplicons, most NGS amplicon sequencing strategies have subsequently relied upon the fusion of locus-specific primers to full-length adapter sequences and the addition of identical indexes to both 5' and 3' ends (e.g., Roche fusion primers; Binladen et al., 2007; Bentley et al., 2009; Bybee et al., 2011; Cronn et al., 2012; Shokralla et al., 2014). These strategies often use the whole sequencing run for amplicons only.

Illumina platforms have traditionally struggled to sequence amplicons for two reasons: 1) the platform requires a diversity of bases at each base position (Mitra et al., 2015), which is easily achieved in genomic libraries but not in amplicon libraries; and 2) read lengths are limited, making the complete sequencing of long amplicons challenging or impossible.

Several alternatives have been proposed to resolve the first issue (i.e., low base diversity).
First, users have typically added a genomic library (e.g., the PhiX control library supplied by Illumina) to amplicon library pools to create the needed base diversity, but this method wastes sequencing reads on a non-target (PhiX) library. Second, to address the limited read length described above, custom sequencing primers can be used in place of the Read1 and/or Read2 sequencing primer(s) (Caporaso et al., 2011). This method allows for longer effective read lengths because no read length is spent sequencing the primers used for amplification (e.g., 16S primer sequences). However, optimizing custom sequencing primers can be very expensive, costing hundreds of dollars for each attempt. A third alternative is to use the amplicons as template for shotgun library preparations, most often using Nextera library preparation kits (Illumina, 2018a). A fourth method is to add heterogeneity spacers of one, two, three (etc.) bases before the index sequence (e.g., Cruaud et al., 2017). Because amplicons can contain repeats longer than the heterogeneity spacers, however, regions of no diversity can still occur. Thus, all of the proposed solutions have specific limitations, and none is particularly economical for sequencing standard PCR products from a wide range of samples, as is typical in molecular ecology projects.

In general, NGS has been widely adopted to sequence complex amplicon pools where cloning would have been used previously (e.g., 16S from bacterial communities or viruses within individuals). Such amplicon pools may have extensive or no length variation. Amplicons for single loci from haploid or diploid organisms (with no length variation between alleles) are typically still sequenced via capillary electrophoresis at a cost of about $5 USD per read.
In contrast to the high cost of individual sequencing reads via capillary instruments, >50,000 […] come in units of ~$2,000 USD for reads that total a length similar to that of capillary sequencing (Glenn, 2016; paired-end (PE) 300 reads). Thus, it would be desirable to have processes that allow users to: 1) pool samples from multiple projects on a single MiSeq run and divide costs proportionately; and 2) prepare templates (i.e., construct libraries) at costs less than or similar to those of traditional capillary sequencing.

Characteristics of an ideal system include:

1) use of universal Illumina sequencing primers;
2) minimizing total sample costs, ideally to below those of standard capillary/Sanger sequencing;
3) minimizing the time and equipment needed for library preparations;
4) minimizing buy-in (start-up) costs;
5) eliminating error-prone steps, such as adapter ligation;
6) maximizing the number of samples (e.g., ≥ thousands) that can be identified in a pool of samples run simultaneously;
7) maximizing the range of proportions that amplicons can occupy when added to other pools (e.g., from <1% to >90%); and
8) creating a very large universe of sample identifiers (e.g., ≥ millions), so that identifiers would not need to be shared among samples, studies, or researchers, even when samples pass through large sequencing centers.

Single-locus amplicon sequencing represents one extreme example of the needs identified above. In some scenarios, researchers may only be sequencing a single short, homogeneous amplicon for which ≥ 20x coverage is excessive. An Illumina MiSeq run using version 3 chemistry generates ~20 million reads. The cost of sequencing reagents for only 20 reads of 600 bases on such a run is <$0.01 USD (i.e., one millionth of the run). It is impractical to amass 1 million amplicon samples for a single run. However, a small volume of dozens or hundreds of samples can easily be added into a MiSeq run with other samples/pools that need the remaining reads. By […] each with their own identification indexes, is critical to the feasibility of this strategy. We have developed, and describe below, a system that meets most of the design characteristics enumerated above.
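The "one millionth of the run" figure above is a simple proportion: the reads needed per sample divided by the run output, multiplied by the run's reagent cost. In the sketch below, the ~20 million reads per v3 run comes from the text. The $1,500 run cost is an assumed placeholder. The <$0.01 claim holds for any run cost below $10,000, because $0.01 × 1,000,000 = $10,000.

```python
def reagent_cost_per_sample(reads_needed: int, reads_per_run: int,
                            run_cost_usd: float) -> float:
    """Share of a run's reagent cost consumed by one sample's reads."""
    return run_cost_usd * reads_needed / reads_per_run


def samples_per_run(reads_needed: int, reads_per_run: int) -> int:
    """How many samples at this depth it would take to fill the whole run."""
    return reads_per_run // reads_needed


# ~20 million reads per MiSeq v3 run (from the text); $1,500 is an assumed run cost.
print(reagent_cost_per_sample(20, 20_000_000, 1500.0))  # 0.0015
print(samples_per_run(20, 20_000_000))                  # 1000000
```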

In this paper, we focus on library preparation methods for amplicons. We introduce TaggiMatrix, an amplicon library preparation protocol built upon methods developed in Adapterama I (Glenn et al., 2019). This general method can be optimized for various criteria, including minimizing library preparation cost and reducing PCR bias. Briefly, we tag the forward and reverse locus-specific primers with different, variable-length index sequences, and we also include indexes in the iTru or iNext primers. The result is quadruple-indexed libraries with high base diversity, which enable highly combinatorial strategies to index, pool, and sequence many samples on Illumina instruments.
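Because the internal (fusion-primer) indexes and the outer (iTru/iNext) indexes are chosen independently, the number of distinguishable samples is the product of the counts at each level. The sketch assumes 8 forward × 12 reverse internal indexes, matching the 96-well design described in this paper. The outer-index counts are free parameters, and the values used below are hypothetical placeholders, not the size of the Adapterama I primer set.

```python
def unique_sample_ids(n_fwd_internal: int, n_rev_internal: int,
                      n_i5: int, n_i7: int) -> int:
    """Number of distinct quadruple-index combinations.

    Each sample is identified by (internal forward, internal reverse,
    outer i5, outer i7), and each level is chosen independently.
    """
    combos_per_pool = n_fwd_internal * n_rev_internal  # e.g., 96 wells
    outer_pairs = n_i5 * n_i7                          # one pair labels one pool
    return combos_per_pool * outer_pairs


# 8 x 12 internal combinations; the 24 x 24 outer counts are hypothetical.
print(unique_sample_ids(8, 12, n_i5=24, n_i7=24))  # 55296
```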

Methodological objectives

Our goal was to develop a protocol that would help to overcome the challenges of amplicon library preparation and fulfill the characteristics of an ideal system enumerated above. We extend […] 2). This approach should work with a wide variety of primers (e.g., Table 2). Such combinatorial indexing is designed to work in 96-well plate arrays but can be modified for other systems. […] (1-12) are designed and synthesized (File S1). Then, each DNA sample in each well of the 96-well plate can be amplified with a different forward and reverse primer combination (File S1, PCR_Set_up). These PCR products can be pooled and amplified using a similar combinatorial scheme with tagged universal iTru/iNext primers in the second PCR (Table 3) […] with Read1 and Read2 fusions (e.g., R1Forward + R2Reverse vs. R1Reverse + R2Forward; "flipped" primers) to account for this issue (Fig. 2). It is also possible to perform replicate amplifications with both sets of primers (regular and flipped) to significantly increase base diversity in amplicon libraries.
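The combinatorial layout can be sketched as a mapping from plate wells to primer pairs. Here, forward primers index rows A-H and reverse primers index columns 1-12. The primer names (F1-F8, R1-R12) and the row/column orientation are hypothetical conventions for illustration; the PCR_Set_up sheet in File S1 remains the authoritative layout. With `flipped=True`, the forward and reverse locus primers trade places on the Read1 and Read2 fusions, as in the "flipped" primers above.

```python
from string import ascii_uppercase


def plate_layout(n_fwd: int = 8, n_rev: int = 12, flipped: bool = False) -> dict:
    """Map each well (e.g., 'A1') to its (Read1 fusion, Read2 fusion) primer pair.

    Rows take forward locus primers F1..F<n_fwd>; columns take reverse locus
    primers R1..R<n_rev>. Regular: Read1 + forward, Read2 + reverse.
    Flipped: Read1 + reverse, Read2 + forward.
    """
    layout = {}
    for r in range(n_fwd):
        for c in range(1, n_rev + 1):
            fwd, rev = f"F{r + 1}", f"R{c}"
            well = f"{ascii_uppercase[r]}{c}"
            if flipped:
                layout[well] = (f"Read1+{rev}", f"Read2+{fwd}")
            else:
                layout[well] = (f"Read1+{fwd}", f"Read2+{rev}")
    return layout


regular = plate_layout()
flipped = plate_layout(flipped=True)
print(len(regular), regular["A1"], regular["H12"], flipped["H12"])
# 96 ('Read1+F1', 'Read2+R1') ('Read1+F8', 'Read2+R12') ('Read1+R12', 'Read2+F8')
```

Every well receives a unique primer pair, so only 8 + 12 = 20 indexed primers are needed to label 96 samples.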

TaggiMatrix applied case studies

We tested iTru primers designed as described above in five different experiments, covering a wide range of experiments typically done in molecular ecology projects. We also tested iNext primers designed as described above in a single project (Table 4). In each experiment, we used at least two sets of primers. The first set (i.e., locus-specific fusion primers) generated primary amplicons, and the second set (i.e., iTru or iNext) converted primary amplicons into full-length libraries for sequencing (Fig. 3). […] Table 2). To assist with production of fusion primers and reduce errors, we have created and provided Excel spreadsheets (TaggiMatrix; File S1) and a web page (http://baddna.uga.edu/tools-taggi.html). With TaggiMatrix, users simply input the names and sequences of the locus-specific primers, and all 22 fusion primers (i.e., 2 non-indexed and 20 internally indexed) and their names are generated automatically. Note that these tools do not check for secondary structures or other PCR-inhibiting characteristics (see Discussion). We then used the locus-specific fusion primers in a primary PCR, followed by a clean-up step and a subsequent PCR with iTru primers from Adapterama I. As an example, a general protocol for 16S amplification using TaggiMatrix is provided in File S2.

We used this approach for five projects (Table 4), each with slight modifications. First, we used primers targeting cytochrome b to characterize the source of blood meals in kissing bugs. In this project, we first amplified DNA with standard primers, then ligated a Y-yoke adapter to these products, and then amplified these products in an iTru PCR (Method 1 in Table 3).
Second, we used primers targeting several portions of the ITS region, including "flipped" fusion primers, to identify fungal pathogens in tree tissues. In this project, we first amplified DNA with standard primers, then amplified these products with indexed fusion primers, and then amplified these products in an iTru PCR (Method 2 in Table 3). Third, we used primers targeting 12S to characterize plethodontid salamander communities from environmental DNA samples. In this project, we first amplified DNA with either internally indexed or non-indexed fusion primers and then amplified these products in an iTru PCR (Methods 4 or 5 in Table 3). Fourth, we used […]

We generated libraries compatible with Nextera sequencing primers using the same approach as […] TaggiMatrix Excel file (File S1) to facilitate the construction of iNext fusion primers. We used this approach in one project, with primers targeting one chloroplast locus, two mitochondrial loci, and two nuclear loci to perform a fine-scale population genetic analysis of the invasive vine Wisteria. In this project, we first amplified DNA with indexed fusion primers and then amplified these products in an iNext PCR (Method 5 in Table 3). Full methods describing the sample collection, DNA extraction, library construction (including detailed descriptions of pooling schemes), and data analysis are provided in File S3.
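Conceptually, each internally indexed fusion primer that TaggiMatrix generates is a concatenation, 5′ to 3′, of four parts: a partial adapter (Read1 or Read2 stub), a variable-length spacer, an index, and the locus-specific primer. The sketch below assembles such a set. Several parts of it are assumptions:

- The TruSeq stubs are the standard Illumina Read1/Read2 sequencing-primer regions as we understand them, and they should be checked against File S1.
- Placing the spacer between the stub and the index is assumed.
- The 8 forward / 12 reverse split is inferred from the 96-well design.
- The index and locus sequences are hypothetical placeholders, not the Table 1 indexes.

```python
from itertools import product

# Partial TruSeq Read1/Read2 stubs (assumed; verify against File S1 before ordering).
READ1_STUB = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2_STUB = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
# Hypothetical 0-3 nt heterogeneity spacers, cycled within each group of four indexes.
SPACERS = ["", "t", "gc", "agt"]


def fusion_primer(stub: str, spacer: str, index: str, locus: str) -> str:
    """5'->3': adapter stub + spacer (lower case) + index (upper case) + locus primer."""
    return stub + spacer.lower() + index.upper() + locus.upper()


def fusion_set(locus_fwd: str, locus_rev: str, fwd_indexes, rev_indexes) -> dict:
    """Return {name: sequence}: one non-indexed primer per direction plus one
    internally indexed primer per supplied index."""
    primers = {
        "Read1_fwd_noindex": READ1_STUB + locus_fwd.upper(),
        "Read2_rev_noindex": READ2_STUB + locus_rev.upper(),
    }
    for i, idx in enumerate(fwd_indexes, start=1):
        primers[f"Read1_fwd_{i}"] = fusion_primer(READ1_STUB, SPACERS[(i - 1) % 4], idx, locus_fwd)
    for i, idx in enumerate(rev_indexes, start=1):
        primers[f"Read2_rev_{i}"] = fusion_primer(READ2_STUB, SPACERS[(i - 1) % 4], idx, locus_rev)
    return primers


# Placeholder inputs only: NOT the Table 1 indexes or a real locus primer pair.
placeholder_indexes = ["".join(p) for p in product("ACGT", repeat=5)][:20]
primers = fusion_set("AGCTTGCAGTCA", "TCAGGTACCATG",
                     placeholder_indexes[:8], placeholder_indexes[8:])
print(len(primers))  # 22 = 2 non-indexed + 8 forward + 12 reverse indexed
```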

Pooling, Sequencing, Analysis

The methods used for pooling, sequencing, and analysis varied among the six projects (File S3), but some general approaches were used consistently. Amplicon library pools from each of the six projects were pooled with additional samples and sequenced at different times on Illumina MiSeq instruments. The sizes of the amplicons were determined from known sequence targets and verified by agarose gel electrophoresis against known size standards. We quantified purified amplicon pools using a Qubit fluorometer (Thermo Fisher Scientific Inc., Waltham, MA). We then entered into an Excel spreadsheet the size, concentration, and desired number of reads for the amplicon sub-pools and for all other samples or sub-pools to be combined in the sequencing run. The spreadsheet calculated the amount of each sub-pool to use (an example pooling guide is provided in File S4). We targeted total proportions ranging from <1% to 44% of the MiSeq runs (Table 4). We used v3 600-cycle kits to obtain the longest reads possible for four of the projects. For the other two projects, we used v2 500-cycle kits, which reduce buy-in costs when shorter reads are sufficient.
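The spreadsheet step above can be sketched as follows. The sketch first converts each sub-pool's Qubit concentration (ng/µL) and fragment size (bp) to molarity, using the common approximation of 660 g/mol per base pair of double-stranded DNA. It then allots molecules to each sub-pool in proportion to that sub-pool's desired reads. This is an assumed formulation rather than a transcription of File S4, and the sub-pool names and values are hypothetical.

```python
def molarity_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """dsDNA molarity (nM) from mass concentration and fragment length,
    assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * length_bp)


def pooling_volumes(subpools: dict, total_fmol: float) -> dict:
    """Volume (uL) of each sub-pool so that its share of molecules in the final
    pool equals its share of the desired reads.

    subpools: {name: (conc_ng_per_ul, length_bp, desired_reads)}
    total_fmol: total amount of library wanted in the combined pool.
    """
    total_reads = sum(reads for _, _, reads in subpools.values())
    volumes = {}
    for name, (conc, length, reads) in subpools.items():
        fmol_needed = total_fmol * reads / total_reads
        # 1 nM equals 1 fmol/uL, so fmol divided by nM gives uL.
        volumes[name] = fmol_needed / molarity_nM(conc, length)
    return volumes


# Hypothetical sub-pools: amplicons should get 10% of the reads.
subpools = {"amplicon_pool": (10.0, 450, 200_000),
            "genomic_libs": (5.0, 600, 1_800_000)}
for name, vol in pooling_volumes(subpools, total_fmol=200.0).items():
    print(f"{name}: {vol:.2f} uL")  # amplicon_pool: 0.59 uL; genomic_libs: 14.26 uL
```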

Following sequencing, results were returned via BaseSpace or obtained by demultiplexing the outer indexes contained in the bcl files using Illumina software (bcl2fastq). Following […]
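After the outer i5/i7 indexes have been demultiplexed, the reads within each pool still have to be assigned to wells using the internal indexes at the start of Read1 and Read2. Mr. Demuxy (https://github.com/lefeverde/Mr_Demuxy) performs this step; its interface is not described here. The sketch below is therefore an independent, simplified illustration that uses exact prefix matching on spacer + index tags. The tag sequences are hypothetical. The sketch also assumes regular (non-flipped) libraries, since flipped primers would swap which tag appears on which read. Real data would additionally need tolerance for sequencing errors.

```python
ROWS = "ABCDEFGH"


def match_tag(seq: str, tags: dict):
    """Name of the longest tag (spacer + index) that is a prefix of seq, or None."""
    seq = seq.upper()
    for name, tag in sorted(tags.items(), key=lambda kv: len(kv[1]), reverse=True):
        if seq.startswith(tag.upper()):
            return name
    return None


def assign_well(read1: str, read2: str, fwd_tags: dict, rev_tags: dict):
    """Well for a read pair from a regular (non-flipped) library, or None if
    either internal tag is unrecognized. Tag F<i> gives the row; R<j> gives the column."""
    f = match_tag(read1, fwd_tags)
    r = match_tag(read2, rev_tags)
    if f is None or r is None:
        return None
    return f"{ROWS[int(f[1:]) - 1]}{int(r[1:])}"


# Hypothetical tags: spacer (lower case) + index (upper case).
fwd_tags = {"F1": "ACGTAC", "F2": "tGCATGC"}
rev_tags = {"R1": "TTGCAG", "R2": "gaCCATGA"}
print(assign_well("TGCATGCAGCTTAG", "GACCATGAGGTCA", fwd_tags, rev_tags))  # B2
```

Trying longer tags first keeps a short tag from claiming a read that actually carries a longer tag beginning with the same bases.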

We used five methods, each taking advantage of the iTru or iNext indexing primers developed in Adapterama I, in six exemplar amplicon sequencing projects. These projects illustrate the range of methodological approaches that can be used to overcome the challenges of amplicon library preparation and fulfill most of the characteristics of an ideal amplicon library preparation system.

In all but one project (Table 4, project 1), we designed fusion primers to generate amplicons that can be amplified by iTru5 and iTru7 (or iNext5 and iNext7) primers to create full- […] Illumina sequencing, and maximization of efficiency of library preparation.

In our project characterizing the blood meals of kissing bugs (Table 4, project 1), we obtained an average of 116,902 reads per sample and identified a total of five vertebrate species as sources of the blood meals. In our project identifying fungal pathogens in tree tissues (Table 4, project 2), we obtained an average of 436,825 reads per pool (i.e., 96 samples) and characterized the diverse fungal communities found in these samples. In our project characterizing plethodontid salamander communities from environmental DNA samples (Table 4, project 3), we obtained an average of 163,555 reads per PCR replicate and identified reads matching six of the seven species expected to be present in the streams. In our project comparing basal DNA methylation of p21 (Table 4, […]

In Adapterama I, we introduced a general approach to reduce the cost of genomic library preparations for Illumina instruments. Here, we made extensive use of the iNext and iTru primers described in Adapterama I and showed that they can also facilitate amplicon library construction at reduced cost with increased flexibility. As in Adapterama I, we focused mostly on iTru to simplify our presentation of the method, but iNext works identically in most situations.

Although we focused on Illumina, many of these approaches can be extended to other platforms by following the design principles described here (e.g., use primers from sheet ITS_10nt_5'tags in File S1 following Method 3). For platforms that sequence individual molecules (e.g., PacBio and Oxford Nanopore), variable-length indexes offer no advantage and longer indexes carry a negligible penalty, but equal-length indexes offer significant informatic advantages. Thus, for many other platforms, it will be better to use longer indexes of equal length.

In general, TaggiMatrix Method 5 achieves our design goals, in that it: 1) uses the universal Illumina sequencing primers; 2) minimizes costs (as little as $2.20 per library, i.e. […] (Table 1). Because Illumina reads are of fixed length, longer spacers decrease the total amount of useful sequence obtained for downstream analyses. Thus, there is a trade-off in how long the heterogeneity spacers should be. Here, we implement 0-3 nt heterogeneity spacers. These could easily be tuned to 0-7 nt for forward primers and 0-11 nt for reverse primers, to accommodate a researcher's preferences or mononucleotide repeats known to occur in the target sequences.
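The benefit of staggered spacers can be checked numerically. Take a group of reads that share the same locus primer, offset them by 0-3 nt, and tabulate the red/green channel balance at each cycle (A/C red, G/T green on MiSeq and HiSeq ≤ 2500, per the Table 1 legend). A cycle at which every read shows the same channel is the low-diversity situation the spacers are meant to avoid. The locus and spacer strings below are hypothetical, and the indexes are omitted so that the spacer effect is isolated.

```python
from collections import Counter

RED = set("AC")  # A and C: red channel; G and T: green (MiSeq/HiSeq <= 2500)


def red_fraction_by_cycle(reads, n_cycles: int) -> list:
    """Fraction of reads showing a red-channel base (A or C) at each cycle."""
    fractions = []
    for i in range(n_cycles):
        bases = Counter(r[i].upper() for r in reads if i < len(r))
        total = sum(bases.values())
        red = sum(n for b, n in bases.items() if b in RED)
        fractions.append(red / total if total else float("nan"))
    return fractions


def monochrome_cycles(reads, n_cycles: int) -> int:
    """Number of cycles at which all reads share one channel (red fraction 0 or 1)."""
    return sum(f in (0.0, 1.0) for f in red_fraction_by_cycle(reads, n_cycles))


locus = "GTGCCAGCAG"              # hypothetical locus-primer start
spacers = ["", "A", "CT", "TCA"]  # hypothetical 0-3 nt spacers
unstaggered = [locus] * 4
staggered = [s + locus for s in spacers]
print(monochrome_cycles(unstaggered, 10), monochrome_cycles(staggered, 10))  # 10 0
```

Without spacers, every cycle is single-channel. With four staggered reads, every one of the first ten cycles has both channels represented.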

TaggiMatrix provides an easy way to create indexed fusion primers for convenient ordering from any oligo vendor. However, the current web page and spreadsheets do not perform quality control on the generated primer sequences. Thus, before ordering, it is important to validate the fusion primers to ensure that they do not form hairpins, dimers, or other secondary structures that inhibit PCR. Several programs exist for validating primer designs, and these should be used before ordering. We also recommend obtaining and testing a small number of fusion primers before investing in large batches of long fusion primers. When deciding on the best method to use (i.e., Methods 1-5), users should consider the number of samples, reagent cost, and time available to optimize the primers (Fig. 5).
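As a very crude first-pass screen, and not a substitute for the thermodynamic tools recommended above, one can flag primer pairs in which the 3′ end of one primer is perfectly complementary to a stretch of the other. Such complementarity is a common cause of primer dimers. The window length (k = 4) is an arbitrary assumption, and degenerate (IUPAC) bases are not handled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_dimer(p1: str, p2: str, k: int = 4) -> bool:
    """True if the last k bases of p1 can pair perfectly with some stretch of p2,
    i.e., p2 contains the reverse complement of p1's 3' end."""
    return revcomp(p1[-k:].upper()) in p2.upper()


def flag_dimers(primers: dict, k: int = 4) -> list:
    """Sorted (name_a, name_b) pairs, including self-pairs, where either
    primer's 3' end pairs with the other primer."""
    names = sorted(primers)
    hits = []
    for i, a in enumerate(names):
        for b in names[i:]:
            if (three_prime_dimer(primers[a], primers[b], k)
                    or three_prime_dimer(primers[b], primers[a], k)):
                hits.append((a, b))
    return hits


# Hypothetical primers: "a" ends in a palindrome (GGCC), and "b" ends in AAAA,
# which can pair with the TTTT run inside "a".
print(flag_dimers({"a": "TTTTTTTTGGCC", "b": "AAAAAAAAAAAA"}))  # [('a', 'a'), ('a', 'b')]
```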

When developing adapters and primers to make multiple libraries that will be pooled and sequenced, it is important to determine whether primers with different indexes have biased amplification characteristics. This can be done by testing all primers via quantitative PCR on a common template pool. Such testing confirms that each primer was synthesized, aliquoted, and reconstituted successfully and has similar amplification efficiency. In practice, however, such rigorous quality control will be neither economical nor necessary for many projects. Because sequencing reads are so cheap (~10,000 reads per $1 USD for PE300 reads on a MiSeq), being off by thousands of reads per sample costs less than precise quantification, especially once personnel time for such quantification is considered.

Thus, it will often be less expensive to subsample reads from overrepresented samples and/or 456 simply redo the small proportion of samples that do not generate a sufficient number of reads.
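Subsampling an overrepresented sample down to a target depth amounts to drawing a uniform random subset of its reads (or read pairs). The sketch below uses reservoir sampling (Algorithm R), which needs only one pass over the records and so can stream through a large file. Treating reads as in-memory records, and the fixed seed (used only for reproducibility), are assumptions of the sketch. Established FASTQ tools may be preferable in practice.

```python
import random


def reservoir_subsample(records, target: int, seed: int = 1) -> list:
    """Uniformly sample up to `target` records from an iterable in one pass
    (Algorithm R); returns all records if there are fewer than `target`."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < target:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < target:
                reservoir[j] = rec
    return reservoir


def cap_depth(reads_by_sample: dict, max_reads: int) -> dict:
    """Cap every sample at max_reads; samples at or below the cap are unchanged."""
    return {s: reservoir_subsample(reads, max_reads)
            for s, reads in reads_by_sample.items()}


capped = cap_depth({"overrepresented": list(range(50)), "typical": list(range(5))}, 20)
print(len(capped["overrepresented"]), capped["typical"])  # 20 [0, 1, 2, 3, 4]
```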

Another common concern with amplicon library preparation methods involving PCR is the introduction of bias due to PCR duplicates. Our method can be modified to incorporate 8N […]

Table 1 Internal identifying index sequences.
All indexes have an edit distance of ≥ 3. Upper-case letters are the indexes; lower-case letters add length variation to facilitate sequence diversity at each base position of amplicon pools (see text for details). For Illumina MiSeq and HiSeq models ≤ 2500, adenine and cytosine are detected in the red channel, whereas guanine and thymine are detected in the green channel. Indexes and spacers have balanced red and green representation at each base position within each group of four indexes (i.e., indexes 1-4, 5-8, 9-12, 13-16, and 17-20).

Table 2 Primer pairs used in the example projects presented.
Project, target locus, forward and reverse primer names and sequences, and the sources of the primer sequences are shown.

Table 4 Detailed information for example projects presented to validate our approach.

Summarized information for all example projects used to demonstrate TaggiMatrix. The "Method" column refers to methods in Table 3. The "Target Reads" column gives the approximate number of reads per pool (i.e., not per individual sample) that we targeted when pooling samples with other libraries. Note that these data were generated on many independent MiSeq runs. The kissing bug image is from Joseph Hughes (https://creativecommons.org/licenses/by-nc-sa/3.0/), and all other images are from PhyloPic 2.0 (Public Domain Dedication 1.0).

Table 5 Oligos and iTru buy-in, and library prep costs among methods.

Costs associated with implementing the different methods. Segment a) presents the buy-in cost of oligos and iTru primers, together with the per-sample cost of library prep, which consists of both fixed and variable costs depending on pooling at early stages. Segment b) is the per-sample cost of library prep (not considering primers/adapters) for a given number of samples. Segment c) is the total experimental cost of primers/adapters and library prep according to the number of samples in the experiment. Its first section is expressed in terms of number of samples, and its second in terms of plates, each plate consisting of 96 samples. Costs for iTru primers are calculated from list prices of aliquots from baddna.uga.edu. Costs for 'oligos' are calculated using list prices from Integrated DNA Technologies (IDT; Coralville, IA). Other costs are from list prices of various vendors as of January 2019. See Files S1 and S6 for additional details on price calculations and for total experiment prices for a given number of samples.

Figure 1
High-throughput workflow to create and multiplex TaggiMatrix libraries
The components of the quadruple-indexed amplicon libraries. A specific DNA region is amplified using fusion and tagged locus-specific primers, also known as "indexed fusion primers", to produce a fusion amplicon. iTru adapters are then either ligated using Y-yoke adapters or incorporated using limited-cycle PCR with i5 and i7 indexed primers to make the complete double-stranded DNA library. Internal indexes and outer i5/i7 indexes are represented, as well as the set of primers used.

Figure 2
Examples of possible primer types (Table 3), including "flipped" fusion primers
Elements in the box are combined to form each of the primer types shown below the box. Standard locus-specific primer sequences are indicated by the letter "N", in uppercase for the forward primer and lowercase for the reverse primer. Green and red nucleotide bases indicate unique index sequences. Blue and pink sequences are Read1 and Read2 fusion sequences, respectively.

Figure 3
Sequencing reads that can be obtained from dual-indexed paired-end sequencing. a) Illustration of a double-stranded DNA molecule from a full-length amplicon library (i.e., following the limited-cycle round of PCR). Horizontal arrowheads indicate the 3' ends. Labels on the double-stranded DNA indicate the function of each section, with shading to help indicate boundaries. b) Scheme of the four separate primers used for the four sequencing reactions that occur in paired-end dual-indexed sequencing, and the reads that each primer produces (number in the circle). The four sequencing primers are added one at a time in the following order: Read1, Index Read1, Index Read2, and Read2. Vertical height indicates this order (top primer added first). 3A and 3B correspond to workflow A (NovaSeq™ 6000, MiSeq™, HiSeq 2500, and HiSeq 2000) and workflow B (iSeq™ 100, MiniSeq™, NextSeq™, HiSeq X, HiSeq 4000, and HiSeq 3000), respectively, of dual-indexed workflows on paired-end flow cells (Illumina 2018).

Figure 4
Total cost of experiments across the five methods for a given number of samples. Line plot of the total cost of each method as a function of the number of samples. The starting point on the X-axis (x = 0) represents the buy-in cost of oligos.

Figure 5
Decision tree to select the best fitting method according to the experiment goals and budget.
A guide to choosing the amplicon sequencing method that best fits your lab, research, or experimental goals.
Supplementary Figure S1 Diagram of full-length amplicon TaggiMatrix library product
Double-stranded amplicon library product after implementation of TaggiMatrix, indicating the tags and indexes incorporated through fusion primers and iTru/iNext primers, respectively.
Supplementary Figure S2 Detailed illustration of the components of one of the possible designs (Method 5) for constructing TaggiMatrix amplicon libraries
First, locus-specific fusion primers with tags are used to amplify the target DNA region. Because these primers carry indexes, products can be pooled after this step. Library amplification then uses iTru universal primers, whose indexes label each pool and which incorporate the Illumina platform oligos (P5 and P7).
Supplementary File S1 TaggiMatrix spreadsheet
Excel spreadsheet demonstrating the step-by-step process of creating indexed fusion primers with TaggiMatrix. The first sheet (Introduction) explains how the document works. The second, third, and fourth sheets (…iTru_Fusions) are examples of creating indexed fusion primers for 16S, cyt-b, and COI universal primers, respectively. The fifth sheet (iNext_&_iTru_Primers) lists the universal primer sequences and prices. The sixth and seventh sheets (…Order_Sheet) are examples of how to complete the order form to fill plates with primer sets. The eighth sheet (PCR_Setup) shows how to lay out the primers combinatorially in a 96-well plate. The ninth, tenth, and eleventh sheets (…Tags…) list the index sequences incorporated into the fusion primers, their spacers, and examples.

Supplementary File S2 TaggiMatrix protocol for 16S amplicon library prep
Step-by-step library construction for 16S libraries with indexed fusion primers.

Supplementary File S3 Supplementary methods and results for TaggiMatrix example datasets
A detailed guide through the methods, results, and discussion of sequence analyses from TaggiMatrix data generated for each example dataset presented in this manuscript.

Supplementary File S4 TaggiMatrix video: what is happening inside the tube?
This presentation demonstrates the key features of TaggiMatrix, including how the combinatorial indexing is performed in a plate.

Demultiplexing Internal Indexes Using Mr. Demuxy
A guide to running Mr. Demuxy to demultiplex amplicon data in FASTQ format using internal indexes.