Effect of marker choice and thermal cycling protocol on zooplankton DNA metabarcoding studies

Abstract DNA metabarcoding is a promising approach for rapidly surveying biodiversity and is likely to become an important tool for measuring ecosystem responses to environmental change. Metabarcoding markers need sufficient taxonomic coverage to detect groups of interest, sufficient sequence divergence to resolve species, and will ideally indicate relative abundance of taxa present. We characterized zooplankton assemblages with three different metabarcoding markers (nuclear 18S rDNA, mitochondrial COI, and mitochondrial 16S rDNA) to compare their performance in terms of taxonomic coverage, taxonomic resolution, and correspondence between morphology‐ and DNA‐based identification. COI amplicons sequenced on separate runs showed that operational taxonomic units representing >0.1% of reads per sample were highly reproducible, although slightly more taxa were detected using a lower annealing temperature. Mitochondrial COI and nuclear 18S showed similar taxonomic coverage across zooplankton phyla. However, mitochondrial COI resolved up to threefold more taxa to species compared to 18S. All markers revealed similar patterns of beta‐diversity, although different taxa were identified as the greatest contributors to these patterns for 18S. For calanoid copepod families, all markers displayed a positive relationship between biomass and sequence reads, although the relationship was typically strongest for 18S. The use of COI for metabarcoding has been questioned due to lack of conserved primer‐binding sites. However, our results show the taxonomic coverage and resolution provided by degenerate COI primers, combined with a comparatively well‐developed reference sequence database, make them valuable metabarcoding markers for biodiversity assessment.

that include a wide range of metazoan and nonmetazoan phyla.
However, greater taxonomic coverage often comes at the cost of taxonomic resolution.
The "holy grail" of metabarcoding is to retrieve relative abundance data through the proportion of reads assigned to each taxon. Many studies have highlighted the potential of metabarcoding as at least a semiquantitative method for both nuclear ribosomal (Hirai et al., 2015;Lindeque et al., 2013;Sun et al., 2015;Weber & Pawlowski, 2013) and mitochondrial DNA markers (Evans et al., 2016;Kelly, Port, Yamahara, & Crowder, 2014;Murray et al., 2011). Biases introduced during DNA extraction, PCR amplification, and sequencing are likely to skew the number of reads per taxon, with a disproportionate effect of primertemplate mismatches on PCR-amplification efficiency (Elbrecht & Leese, 2015;Piñol et al., 2015). Hence, it may be particularly difficult to retrieve relative abundance data targeting protein-coding genes such as COI. However, a study using environmental DNA metabarcoding to characterize aquatic mesocosms found "mismatch potential" for six mitochondrial primer sets had no consistent effect on the relationship between species biomass and high-throughput sequencing (HTS) read abundance (Evans et al., 2016).
In this study, we compared the performance of one nuclear (18S) and two mitochondrial (COI and 16S rDNA) metabarcoding markers for characterizing zooplankton assemblages from Storm Bay, Tasmania. Southeast Australia is a global marine "hotspot" (Hobday & Pecl, 2013), with the greatest projected increases in sea surface temperature predicted to occur northeast and east of Tasmania (Lough, Gupta, & Hobday, 2012), including Storm Bay. Two of the markers (COI and 18S) have previously been used to characterize taxonomically diverse marine samples (Leray et al., 2013;Zhan et al., 2013Zhan et al., , 2014; the third was designed to amplify mitochondrial 16S rDNA from calanoid copepods, one of the most abundant and diverse components of the zooplankton. To compare performance of the three markers, we evaluated taxonomic coverage and resolution, correspondence between morphology-and DNA-based identification, and the ability to assess relative abundance of calanoid copepods from the proportion of HTS reads. For the COI marker, high annealing temperatures in the first rounds of the published touchdown PCR protocol (Leray et al., 2013) could bias PCR amplification toward taxa with less mismatches in the primer-binding sites (Sipos et al., 2007); hence, we compared the number of taxa detected using the touchdown protocol to a protocol with a single low annealing temperature. We also explored the technical repeatability of taxon detection by re-sequencing COI amplicons generated with identical PCR protocols.  volume of 10 μl. The optimum quantity of template DNA was determined with qPCR for each extract (Murray, Coghlan, & Bunce, 2015).
Replicate PCR products were pooled then diluted 1:10 and Illumina sequencing adapters added in a second round of PCR (10 cycles with an annealing temperature of 55°C) using the same conditions as the first round, except primer and MgCl 2 concentration was reduced to 0.1 μM each and 1.5 mM, respectively. Products from each round of PCR were separated by electrophoresis and visualized on 2% agarose gels. Second round PCR products were pooled in equal ratios based on band intensity (Murray et al., 2015). The pooled products were purified using Agencourt AMPure XP beads ( To explore repeatability of PCR amplification and sequencing, PCR amplicons were generated a second time using the 35 × 46°C protocol as described above and sequenced on a MiSeq using the MiSeq Reagent Kit v3 (2 × 300 bp) with the amplicon libraries for the marker comparison.

| Comparing metabarcoding markers
DNA extracts for January, March, and April, including no template controls, were PCR-amplified with two existing and one newly designed primer set (Table 1). We used ecoPrimer (Riaz et al., 2011) to design new primers (Cop16SF and Cop16SR) based on 33 mitochondrial genomes from 11 copepod species. The ecoPrimer parameters used were a maximum of three mismatches between each primer and the target sequence with no mismatches allowed within two nucleotides of the 3′ end. Amplicon length was restricted to 100-600 bp.
Primers were required to have no mismatches in at least 50% of species (option −q 0.50), with no more than three mismatches in at least 60% of species (−s 0.60). Primer design was refined based on calanoid copepod 16S sequences in Geneious version 8.1.7 (http://www.geneious.com, Kearse et al., 2012).
PCR amplifications were performed in two rounds as described for 20 s, with a final extension at 72°C for 5 min. COI could not be amplified with Phusion polymerase due to inosine residues in the reverse primer and was amplified with AmpliTaq Gold (Life Technologies, Melbourne, Australia), using 35 cycles with an annealing temperature of 46°C as described above. Replicate PCR products were pooled then diluted 1:10 and Illumina sequencing adapters added in a second round of PCR (10 cycles with an annealing temperature of 55°C) using the same conditions as the first round, except primer concentration was reduced to 0.1 μM each and MgCl 2 concentration was reduced to 1.5 mM for COI. Products from each round of PCR were separated by electrophoresis and visualized on 2% agarose gels. Pooling and purification were performed as described above, with paired-end sequencing performed on a MiSeq using MiSeq Reagent Kit v3 (2 × 300 bp).

| Data analysis
Reads were deconvoluted based on 10 bp MIDs on the MiSeq.

| Beta-diversity
Morphology-based counts and HTS read counts for site 2 were fourth-root-transformed. Differentiation among collection months for each metabarcoding marker was compared using Bray-Curtis distance based on a rarefied OTU

| Correlation of calanoid copepod HTS reads and biomass
We compared the strength of the correlation between biomass (determined from morphological species counts) and number of HTS reads per taxon for the three metabarcoding markers. We focused on calanoid copepods as all markers were capable of PCR-amplifying this group and the biomass of each species could be estimated from counts. Counts for each sex of each species were converted to dry weights (μg) based on sex-specific prosome lengths from the literature using the approach of Hirai et al. (2015). HTS read counts and dry weights were summarized at family-level and converted to the percentage of total calanoid HTS reads or dry weight. Pearson (r) and Spearman rank (ρ) correlation coefficients were calculated for correlations between the proportions of HTS reads and biomass using the "cor.test" function in R (R Core Team 2015).

| Effect of thermal cycling protocol on taxon detection
We compared OTUs detected with the COI marker from amplicons generated with either (1) the published touchdown PCR protocol (Leray et al., 2013) or (2)

| Repeatability of COI amplification & sequencing
The COI marker was PCR-amplified from the January DNA extracts

| Comparison of three metabarcoding markers and morphological ID
No template controls produced a small number of merged HTS reads (2-124) for each marker; however, no reads were retained for these samples after discarding sequences with mismatches in the primer or MIDs. The total number of reads with no mismatches to the expected MID and primers for the DNA extracts from the site sampled T A B L E 2 Comparison of OTUs and taxonomic assignments for a COI marker PCR-amplified using either a touchdown thermal cycling protocol (Leray et al., 2013)  in January, March, and April (site 2, n = 10 samples; including 2 biological replicates in March and April and 2 extracts for each collection) ranged from 47227 (COI) to 98870 (Uni18S), with rarefaction plots approaching a plateau for each marker ( Figure S1 in Appendix).
The number of OTUs detected was two-to threefold higher for COI compared to Uni18S and Cop16S (  (Figure 2).

| Taxonomic coverage and resolution
The majority of OTUs with taxonomy for each marker were assigned to metazoa (100% for Cop16S), whereas Uni18S OTUs were also assigned to Alveolata and Rhizaria, with COI OTUs assigned to these groups (except Rhizaria) as well as bacteria, Haptophyceae, fungi, stramenopiles, and Viridiplantae. Cop16S detected fewer zooplankton phyla (5) compared to COI and Uni18S (9 and 10, respectively, Table 3). However, COI detected a greater number of metazoan phyla (9) compared to Cop16S and Uni18S (5 and 7, respectively, Figure   S2). Reflecting the better resolution of mitochondrial markers, COI resolved threefold more zooplankton taxa to species compared to Uni18S. Although all markers were capable of amplifying crustacean taxa, mitochondrial markers resolved three-to fivefold more crustacean taxa to species compared to Uni18S (

| Morphology versus DNA-based ID
Zooplankton from site 2 were identified by morphology to species where possible, although some specimens were only assigned to phylum (e.g., larval bryozoans, undifferentiated Chaetognatha).
A total of 56 zooplankton taxa (January-32, March-31, April-45, Figure 2) representing 10 phyla were identified, with more than half the taxa belonging to Copepoda (62.5%). The three genetic markers combined detected 55-60% of morphologically identified taxa at site 2 ( Figure 4). DNA-based ID would often detect congeneric species to those identified using morphology. Including congeneric species increased the proportion of taxa detected using DNA to 69-77% ( Figure 4). Overall, 20 of the 25 instances where taxa (including congeners) were not detected with any marker were taxa that represented less than 2% of the total count (

No. taxa/OTUs
F I G U R E 3 Crustacean families detected with three metabarcoding markers in zooplankton samples from site 2 in Storm Bay, Tasmania. Circle size is proportional to the number of reads assigned to that taxon based on normalized read counts morphological identifications to higher taxonomic levels (e.g., the bryozoan Membranipora membranacea).

| Beta-diversity
Principle coordinate analysis plots showed samples clustered by month for each marker (p < .001), with greater than 48% of variation explained by collection month in each case (Uni18S: R 2 = .48, COI: R 2 = .49, Cop16S: R 2 = .67, Figure 5). The results of the SIMPER analyses showed there was typically better agreement between the morphological ID and mitochondrial markers (Table 4; Tables S3 and   S4). For example, COI, Cop16S, and morphology all indicated the cladocerans Penilia spp. and Podon intermedius (absent in January and abundant reads/count in March), and the euphausiid Nyctiphanes australis (more abundant in January) were the main taxa behind the difference between the January and March samples. In contrast, Uni18S identified Oikopleura dioica (two OTUs) and several copepods as the most significant contributors to the difference between January and March (Table 4).

| Correlation of calanoid percentage biomass & HTS read counts
We found a positive relationship between calanoid copepod familylevel proportions of HTS reads and dry weight for each marker ( Figure 6), with one or both Pearson and Spearman rank correlations significant for each marker in each month. Both correlation coefficients tended to be highest in each month for Uni18S (r = .63-.91, ρ = .43-.81, Table 5), although the Spearman's rank correlation was highest for COI in January (ρ = .87).

| Ecological insights provided by metabarcoding
Fish eggs and larvae were present in zooplankton samples from site 2 each month, but could not be identified further by morphology. In contrast, three fish species known to be present in Storm Bay were detected with COI at site 2 over the three collection times (Acanthaluteres vittiger, Aldrichetta forsteri, and Platycephalus richardsoni). The additional taxonomic resolution afforded by a metabarcoding approach could thus provide valuable information on the reproductive biology of important commercial and recreational fish species.
Metabarcoding also detected taxa known to be invasive in Storm Bay (e.g., the New Zealand screwshell, Maoricolpus roseus, Cop16S), as well as taxa experiencing range expansions as a result of regional increases in sea surface temperature, for example, Noctiluca scintillans (Hallegraeff, Hosja, Knuckey, & Wilkinson, 2008). Noctiluca scintillans was not detected at site 2 in January or March, but was present in all four replicates for both COI and Uni18S in April, immediately prior to a Noctiluca bloom at sites around Storm Bay in May 2015 (personal observation). Crustacean parasites of the Syndinidae family (Syndinium turbo (Uni18S) and Hematodinium (COI)) were detected in all samples.
Blue whale (Balaenoptera musculus) was detected with COI in the two January site 2 PCR replicates in both HTS runs. Genomic

| DISCUSSION
The aim of this study was to compare the performance of metabarcoding markers targeting either nuclear (18S) or mitochondrial (16S or COI) DNA for characterizing zooplankton communities in terms of taxonomic coverage and resolution, correspondence with morphology-based identification, and their ability to quantify relative abundance. We also explored the reproducibility and impact of thermal cycling protocol on OTU and taxon detection for COI.
The increased number of OTUs and slightly greater number of taxa detected using a single low annealing temperature compared to the touchdown protocol for COI demonstrates the importance of considering thermal cycling protocols in metabarcoding studies. The detection of additional taxa supports the use of low annealing temperatures to maximize taxonomic coverage for any given marker (Clarke et al., 2014;Sipos et al., 2007). Amplification and sequencing of COI amplicons generated using the low annealing temperature protocol was highly repeatable in spite of using different sequencing chemistries (v2 and v3) and number of sequencing cycles (2 × 250 and 2 × 300), with OTUs representing >0.1% of reads in one replicate almost always detected in the corresponding replicate. Estimating the reproducibility of rare OTUs by sequencing a set of technical replicates in each study T A B L E 4 Results of SIMPER analysis comparing January and March zooplankton samples identified using either morphology or three metabarcoding markers. The top five contributors for each method or marker are shown  could provide a means to establish an abundance threshold for OTU retention, as nonreproducible OTUs are more likely to represent PCR or sequencing artifacts (De Barba et al., 2014;Ficetola et al., 2015).
The use of different polymerases for COI (AmpliTaq Gold) and 16S/18S (Phusion) complicates the marker comparison in this study.
Phusion polymerase could not be used for the COI marker due to inosine residues in one primer. The use of a nonproofreading polymerase could inflate the number of OTUs detected for COI. Similarly, using a proofreading polymerase could explain the reduced number of OTUs detected with 18S in this study compared to other zooplankton metabarcoding studies Chain et al., 2016). However, we feel comparisons of the number of taxa detected with each marker are robust.
Many of the results of this study are consistent with well-recognized characteristics of 18S and mitochondrial markers identified in previous studies. As per Tang et al. (2012), we find 18S provided poor species resolution compared to COI and 16S mitochondrial markers, despite targeting a longer variable region (V4) than used in many studies (e.g., Jarman et al., 2013). The taxonomic coverage reflected the target taxa for each marker to some degree, with Uni18S detecting one more zooplankton phylum than COI (Table 3), but COI detecting more metazoan phyla. In contrast to the findings of Zhan et al. (2014), the mitochondrial 16S marker also provided slightly better taxonomic coverage than Uni18S within the crustacea (four vs. three orders, 11 vs. 10 families).
Although direct comparison is complicated given the use of different DNA polymerases, the Cop16S primers and those used by Zhan et al. (2014) bind to almost identical sites in the mitochondrial 16S rDNA.
The lower annealing temperature used in this study (45°C vs. 50°C) may have contributed to the broader coverage of the Cop16S marker observed. As highlighted in many metabarcoding studies to date, all three markers suffered from incomplete reference databases, with 2.1-33.7% of OTUs unassigned for each marker, and many OTUs assigned to congeneric species of those identified using morphology.
All three metabarcoding markers revealed that distinct zooplankton communities were present at the three time points (Figure 5), hence are all potentially valuable tools for monitoring seasonal and temporal change. However, the markers differed in the taxa identified as driving the difference between months. For example, COI and Cop16S identified the cladocerans Penilia spp. and Podon intermedius, and the euphausiid Nyctiphanes australis as the key taxa driving the difference between January and March samples (Table 4), whereas Uni18S failed to detect these taxa in any month (Figure 3). It is unclear why Uni18S failed to detect euphausiids and cladocerans, despite their high proportional representation in both the morphology-based counts and mitochondrial HTS reads. Although 18S sequences were not available for Nyctiphanes australis or Podon intermedius, no primer-template mismatches were identified in the available sequence data for Penilia avirostris, or congeneric Podon or Nyctiphanes species. However, the predicted Uni18S amplicon for Penilia avirostris was more than 100 bp longer than the mean length of the Uni18S OTUs in this study (419 ± 26 bp, mean ± SD). Amplicon length polymorphism has been shown to cause differential amplification and taxonomic bias in bacterial and fungal HTS studies (Ihrmark et al., 2012;Ziesemer et al., 2015) and may explain the failure to detect cladocerans with Uni18S. Predicted Uni18S amplicon lengths for Euphausiidae other than Nyctiphanes australis are close to the mean length observed in this study (431 bp), and thus, it remains unclear why euphausiids were not detected with 18S. If we had only analyzed our samples with 18S instead of three metabarcoding markers and morphology, we would have failed to detect the important contribution of the cladocerans and euphausiids to the beta-diversity pattern observed, highlighting the benefit of using multiple markers or approaches for biodiversity assessments.
Several metabarcoding studies have identified a positive relationship between biomass and the number of HTS reads for a given taxon for both nuclear rDNA and mitochondrial markers (e.g., Evans et al., 2016;Sun et al., 2015). In this study, we examined the relationship for calanoid copepods as all markers were capable of PCR-amplifying this group, and we could estimate biomass from morphology-based counts using the approach of Hirai et al. (2015). We found all three markers displayed a positive relationship between biomass and HTS reads for calanoid families. Interestingly, both Pearson and Spearman rank correlations were typically strongest for the nuclear 18S rDNA marker.
Many studies have shown strong correlations between biovolume and 18S copy number for a range of eukaryotic taxa, including metazoans (Godhe et al., 2008;de Vargas et al., 2015;Zhu, Massana, Not, Marie, & Vaulot, 2005). Several factors could reduce the strength of the correlation between biomass and HTS reads in this study, including using separate samples for DNA-based and morphology-based identification, and estimating biomass using conversion factors rather than direct measurement. Our results suggest the number of 18S rDNA HTS reads provides a better proxy for calanoid copepod biomass than mitochondrial markers, but should be confirmed and extended to other zooplankton groups using well-characterized samples such as mock communities.

| CONCLUSION
Our study extends previous research demonstrating the value of metabarcoding for rapidly surveying biodiversity, including the potential to identify nonmetazoans and developmental stages such as eggs and T A B L E 5 Pearson and Spearman rank correlation coefficients between proportion of calanoid copepod family biomass and high-throughput sequencing reads for three metabarcoding markers.
p-values are shown in brackets larvae, as well as environmental DNA (whale) and parasite detections not possible with traditional methods. It is worth noting that additional data on developmental stage or sex can be obtained using morphology that is not available using a metabarcoding approach on its own. Our results show that different metabarcoding markers and/or protocols provide slightly different views of genetic biodiversity. Comparisons with morphology-based datasets are useful for showing which markers best match traditional datasets and highlighting potential shortcomings of markers. Standardization of thermal cycling protocols and markers will be required to allow valid comparisons between studies.
In contrast to previous studies that recommend 18S as the most suitable marker for surveying zooplankton communities (Zhan et al., 2014), we find COI provided similar coverage of zooplankton phyla, but better taxonomic resolution (Table 3) and agreement between morphology-and DNA-based identifications (Figure 4). Although the use of COI for metabarcoding has been questioned due to lack of conserved primer-binding sites (Clarke et al., 2014;Deagle et al., 2014), the taxonomic coverage and resolution provided by degenerate COI primers, combined with a comparatively well-developed reference sequence database, make them valuable metabarcoding markers for biodiversity assessment. The potential for retrieving at least semiquantitative abundance data was confirmed for all markers, with 18S providing the strongest relationship between calanoid copepod biomass and number of HTS reads. However, alternatives to PCR-based approaches may be required to accurately quantify species abundance with high-throughput sequencing (Dowle, Pochon, Banks, Shearer, & Wood, 2015;Zhou et al., 2013).