The impact of intragenomic rRNA variation on metabarcoding‐derived diversity estimates: A case study from marine nematodes

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. Environmental DNA published by John Wiley & Sons Ltd 1Department of Nematology, University of California, Riverside, Riverside, CA, USA 2School of Fisheries and Ocean Sciences, University of Alaska, Fairbanks, AK, USA

One major problem of eukaryotic metabarcoding studies is how to identify and circumvent the issue of intragenomic rRNA variation. Bioinformatic tools for copy correction of rRNA metabarcoding datasets have been developed exclusively for prokaryotes, since bacterial and archaeal species can also have multiple rRNA gene copies and exhibit some level of intragenomic polymorphism (Callahan et al., 2019;Sun, Jiang, Wu, & Zhou, 2013). Tools such as PICRUSt (Langille et al., 2013), CopyRighter (Angly et al., 2014), and PAPRICA (Bowman & Ducklow, 2015) rely on well-curated databases and phylogenetic frameworks to predict and correct for 16S rRNA copy number in bacterial and archaeal species. While many eukaryotic studies utilize alternate metabarcoding loci (e.g., mitochondrial genes with lower levels of intragenomic variation and higher taxonomic resolution at the species level; Kumar et al., 2017), the application of other gene regions is problematic in many metazoan groups such as marine nematodes which lack "universal" mitochondrial primer binding sites (Blaxter et al., 2005), and where the diversity of undescribed species is especially high (Blaxter, 2016). Thus, there is a pressing need to characterize patterns of rRNA variation across a broad range of microbial eukaryote taxa. Research efforts in this area are needed in order to elucidate the relationship between genomic patterns and traditional specimen-level data, identify correlations between molecular variation and ecology and life-history traits, and rapidly expand public database resources (thereby increasing the accuracy and precision of taxonomy assignments for eukaryotic MOTUs).
In the present study, we assessed patterns of 18S rRNA variation from individual marine nematode specimens, including potentially confounding effects on diversity estimations (e.g., relic DNA: DNA remaining in the marine sediment; gut contents: potential nematode prey items). Using a previously published 18S rRNA metabarcoding dataset generated from individual marine nematodes (Schuelke, Pereira, Hardy, & Bik, 2018), we estimated amplicon sequence variants (ASVs) using DADA2 (a software tool that models and corrects Illumina-sequenced amplicon errors, Callahan et al., 2016). Metabarcoding data were analyzed in conjunction with fulllength reference sequences (~1,600 bp Sanger sequences of the 18S rRNA gene) generated for a subset of nematode morphospecies. Our investigation focused on three main questions: (a) From an evolutionary perspective, do nematodes with similar life histories share similar genomic patterns? We hypothesized that phylogenetically related nematodes (i.e., within the same genus or clade) would show similar ASV profiles or "Head-Tail" patterns of dominant and minor with specimen-level data and alleviate the confounding effects of intragenomic gene variants in studies of microbial eukaryotes.

K E Y W O R D S
18S rRNA gene, high-throughput sequencing, intragenomic variation, marine nematodes, metabarcoding MOTUs (Porazinska, Giblin-Davis, Esquivel, et al., 2010;Porazinska, Giblin-Davis, Sung, Thomas, & Others, 2010). (b) From an ecological perspective, do patterns of rRNA variation in marine nematode species correspond to known life-history traits? We hypothesized that predatory nematodes (feeding group 2B) would have more complex and diverse ASV profiles than bacterivorous nematodes (feeding group 1A). (c) Do we find consistency across different data types and methods of diversity estimation (i.e., classical morphological taxonomy, Sanger-based DNA barcoding, and high-throughput metabarcoding)? We hypothesized that rRNA polymorphisms observed in Sanger DNA barcodes would be also detected among ASVs and that morphological species would be largely congruent with molecular phylogenetic clade structures.

| 18S rRNA metabarcoding of individual marine nematode specimens
The 18S rRNA metabarcoding dataset derived from marine nematodes was previously generated as part of Schuelke et al. (2018).
Individual marine nematode specimens were picked from sediment samples representing a diverse range of habitats and geographic regions (Table S1). These same nematodes, representing 44 marine nematode genera, were morphologically identified and imaged on temporary slide mounts under light microscopy (Nikon Eclipse E600; Nikon Corporation) prior to DNA extraction and PCR (i.e., one nematode per tube; Figure 1). Where possible, nematode specimens were separated into putative morphospecies (i.e., sp1, sp2, and so on; Table S1) when morphological variation within a genus was also supported by DNA sequence differences and clade structure observed in molecular phylogenies. Additional details on sediment sample collection and sample processing are provided in Schuelke et al. (2018).
DNA extractions, PCR, and Illumina sequencing were carried out according to the methods in Schuelke et al. (2018). Briefly, individual nematodes were transferred into separate 0.2-ml PCR tubes following taxonomic identification, and DNA extractions were carried out using a Proteinase K "Worm Lysis Buffer" protocol where samples were incubated in a Thermomixer heated shaker block (Eppendorf) at 65°C and 750 rpm for 2 hr, followed by a 5-min incubation at 100°C to inactivate proteinase K. Although we did not include PCR replicates, each nematode morphospecies in our study was usually represented by multiple samples, so that we could also have some sort of replication at the species/clade level. Taxonomy blank samples (i.e., those containing only WLB, where the taxonomist simulated the transfer of nematodes into tubes) were also included as a checkpoint for potential sources of contamination (i.e., airborne microorganisms, reagent contaminants) in the laboratory. Lysates were used immediately or stored at −20°C. DNA extracts from individual nematodes were subsequently used to generate 18S rRNA metabarcoding datasets using the V1-V2 hypervariable region optimized for environmental amplification of microbial metazoan rRNA genes (F04 and R22 primers, Creer et al., 2010). Dual-index primer constructs were designed by modifying the Earth Microbiome Project Illumina amplicon protocol (http://www.earth micro biome.org/, Caporaso et al., 2012) and included a second barcode in the reverse primer.
All primer constructs and oligo sequences have been made available on Figshare (https://doi.org/10.6084/m9.figsh are.5701090). To avoid contamination, PCRs were set up in a dedicated laminar-flow hood that underwent daily sterilization with bleach and UV light. All PCR products were purified using magnetic beads following the manufacturer's protocol (Agencourt AMPure XP beads; Beckman Coulter). Sample concentrations were subsequently measured using a Qubit ® 3.0 Fluorometer and a Qubit ® dsDNA HS (High Sensitivity) Assay Kit (Thermo Fisher Scientific). Normalization values were calculated to ensure that approximately equivalent DNA concentrations were pooled across all samples (including controls and blank samples). The final library was subjected to an additional magnetic bead cleanup step, followed by size selection on a BluePippin (Sage Science) to remove any remaining primer dimer and isolate target PCR amplicons within the range of 300-700 bp. A Bioanalyzer trace was run on the size-selected pool as a quality control measure, and the pooled 18S rRNA amplicon library was sequenced on an Illumina MiSeq platform (2 × 300-bp paired-end run) at the UC Davis Genomics Core Facility (Schuelke et al., 2018). All wet laboratory protocols and downstream bioinformatics scripts used in this study have been deposited on GitHub (https://github.com/BikLa b/nemat ode-rRNA-variants).

| Generating nematode reference sequences via Sanger sequencing
We used Sanger sequencing to generate 18S rRNA barcodes from a subset of 86 nematode specimens included in the original study (Schuelke et al., 2018) where sufficient volumes of DNA template remained. We additionally isolated another 24 nematode specimens collected from the same Arctic samples (Table S1) in order to improve our 18S rRNA reference database and increase the accuracy of nematode taxonomic assignments derived from ASV sequences.

| Illumina data processing and generation of ASVs
Raw Illumina data were demultiplexed using a custom script for handling dual-index barcode combinations (available on GitHub). Next, the demultiplexed 18S rRNA dataset was analyzed in QIIME2 version 2019.10 (Bolyen et al., 2019) where primer sequences were trimmed using the cutadapt plugin (Martin, 2011). Denoising was based on optimal parameters (forward and reverse reads truncated at 232 and 253 bp, respectively, and a median PHRED score of ≥30). ASVs were subsequently generated using the DADA2 algorithm (Callahan et al., 2016). The estimation of ASVs is based on 100% sequence identity (or 1 nucleotide difference) and accounts for every single mutation in the dataset, thus allowing us to determine the frequency and importance of base pair changes in nematode rRNA gene copies. DADA2 was run using default parameters, including default chimera checking parameters (consensus option, which carries out de novo chimera identification and removes ASVs identified as chimeras if they are present in a significant fraction of samples). Taxonomy assignments for ASVs were obtained via the BLAST+ consensus taxonomy classifier (Camacho et al., 2009), using a custom reference database trimmed at the Illumina barcode length (~350 bp) and a minimum confidence value of 70%. Our custom reference database consisted of the QIIME-formatted SILVA 132 release (https://www.arb-silva. de/no_cache /downl oad/archi ve/qiime /), a collection of marine nematode 18S rRNA barcodes produced by Macheriotou et al. (2019), and the nearly full-length 18S rRNA gene sequences generated from marine nematodes as part of this study (database files available on Github: https://github.com/BikLa b/nemat ode-rRNA-variants).

| Bioinformatic analyses of ASV patterns and phylogeny reconstruction
Our final dataset differs slightly from that presented by Schuelke et al. (2018). We reduced the 18S rRNA metabarcoding dataset down to 227 individual nematode samples by removing samples with ambiguous nematode taxonomy (i.e., specimens identified to order level and above), relatively low-read counts (i.e., samples with <500 reads after DADA2 analysis), and very high-read counts (e.g., 4 samples had >100,000 reads). Thus, we have removed all outlier samples that could impact the interpretation of overall patterns.
The resulting ASV table was summarized and analyzed in order to assess patterns and sources of rRNA variation associated with individual marine nematodes. First, seven initial metrics were calculated for each nematode specimen (Table S2): the total number of demultiplexed and quality trimmed reads, the number of reads retained by DADA2, total number of ASVs, the number of ASVs with taxonomy assignments to Nematoda, the number of ASVs with ≥1% relative abundance that had taxonomy assignments to Nematoda, the relative abundance of the dominant ASV, and F I G U R E 1 Workflow diagram of single-nematode metabarcoding. Freeliving marine nematodes were isolated from processed marine sediment samples, mounted on glass slides, and identified down to genus level and/or putative morphospecies. DNA extractions and metabarcoding PCRs were subsequently carried out for single worms, followed by Illumina MiSeq sequencing and downstream bioinformatic analysis. Figure  created with BioRe nder.com whether or not the dominant ASV corresponded to the expected nematode morphospecies. Nematode samples were also classified into four categories (Table 1)  ASV matched the nematode ID, and exhibited relative abundances of <75% and ≥10%, respectively. For some samples, the lack of a dominant or highly dominant ASV matching the expected nematode ID appeared to be related to the coamplification of DNA from other taxa (e.g., fungi, polychaetes, other nematode genera; Table   S2 and discussion below).
Next, we explored the relationship between metabarcoding taxonomy assignments and ASV profiles obtained from individual nematodes. In particular, we evaluated patterns across samples representing the same nematode family, genus, phylogenetic clade (i.e., nematode specimens having the same dominant ASV or Sanger DNA barcode), and feeding group. The relationship between the number of ASVs and number of reads retained by DADA2 was estimated using Pearson's correlation coefficient with the package ggpubr v.0.2 in R v.3.6.0 (R Core Team, 2019). Data normality was assessed using Shapiro-Wilk's method, and Kruskal-Wallis (K-W) tests were used to assess differences among nematode trophic groups and nematode families with the package FSA v0.8.24 in R version 3.6.0 (R Core Team, 2019). The Mann-Whitney U test with adjustments for p-value (BH method; Benjamini & Hochberg, 1995) was used for pairwise comparisons (Zar, 2010). We also extracted diversity estimates, including Shannon diversity H′ (Log 2 ), Margalef's species richness (d), and Inverted Simpson (D) diversity from the ASV table using PRIMER v7 software (Clarke & Gorley, 2015) and compared among nematode families and feeding groups. Ranked ASV distribution curves (i.e., number of reads vs. ranked ASV) were generated with ggplot2 v.3.1.1 in R (Wickham, 2016) to explore variation on ASV abundance derived from individual nematode specimens (and also averaged across nematode specimens representing the same nematode family) in order to identify common trends (e.g., Head-Tail  (Darriba, Taboada, Doallo, & Posada, 2012). Gamma parameters were estimated from log-likelihood units, and bootstrap support was automatically calculated for the best scoring ML tree (Stamatakis, 2014). For each nematode specimen, we subsequently plotted the ASV profiles (i.e., relative abundance of all ASVs) as well as the taxonomic assignments of ASVs according to the following categories: (a) ASVs corresponding to the expected nematode morphospecies, (b) ASVs corresponding to other nematodes, and (c) ASVs corresponding to other non-nematode eukaryotes. All tree annotations, including plotting associated TA B L E 1 Number of nematode samples (also presented as a percentage of the total) belonging to the different sample categories. Classification is based on the most common ASV profiles (i.e., RA% of ASVs in a sample, see Section 2 for further details) metadata, were conducted using the package ggtree v.1.16.0 in R (Yu, Lam, Zhu, & Guan, 2018).
Similarly, we explored the ASV profiles of nematodes representing different feeding groups (i.e., 1A, 1B, 2A, and 2B) and tested whether or not the proportion of ASVs assigned to the three previous categories (i.e., expected nematode morphospecies, other nematodes, and other non-nematode eukaryotes) differs among these groups. Kruskal-Wallis and normality tests were carried out as described above. Next, we plotted the ASV profiles (number of reads and relative abundance of ASVs) of selected nematode specimens representing each feeding group (3 specimens per group) in a phylogenetic context. For this analysis, phylogenetic trees were built in MEGA 7 using the neighbor-joining method. Genetic divergence using p-distance (pairwise deletion of gaps/missing data) was estimated among sequences (Kumar, Stecher, & Tamura, 2016). Tree annotations were carried out using ggtree as described above (Yu et al., 2018).

| Diversity estimates with DADA2
The total number of ASVs obtained per sample ranged from 1 to 91 (avg. = 11, median = 9) and showed a significant but weak positive correlation with the number of retained reads (Figure 2a). Moreover, the number of ASVs did not vary significantly among nematode feeding groups (Kruskal-Wallis, X 2 = 5.83, p = .12) but was signif- The Head-Tail pattern of rRNA variants associated with individual nematodes (ranked ASV curves; Figure 4) shows that the relative abundance of ASVs decreases very quickly and produces a "short tail" of rare ASVs with low-read counts. This was a very consistent trend across all four main nematode families, although specimens from the Oxystominidae displayed a slightly longer tail (Figure 4a).
Overall, the dominant ASV for each nematode was about an order of magnitude more abundant than the second most abundant ASV.
Lower variations in relative abundance were observed among less abundant ASVs (Figure 4b). We also observed that the relative abundance of the highest ranked ASVs for each worm was in the 81%-90% range and that only one nematode sample had an ASV with a relative abundance of 100% (e.g., Nem.227: Halichoanolaimus sp., Selachinematidae; Table S2).

| ASV profile patterns among nematode samples
To assess patterns of intragenomic rRNA variation in individual nematode specimens, we quantified the total number of ASVs per specimen (and the relative abundance of each ASV) with QIIMEderived assignments to nematode taxa. For 204 out of 227 nematode samples (90%), the ASV with the highest read count in each specimen's metabarcoding profile matched our expected morphological nematode genus ID (obtained via light microscopy). For the remaining 23 nematode samples (10%), the top ASV did not correspond to a nematode sequence (e.g., having a taxonomy assignment to a fungal or polychaete species), or alternatively, the top ASV was assigned to a different nematode genus than expected. However in both of the latter scenarios, our single-worm metabarcoding profiles always contained an ASV with the expected taxonomy assignment (the nematode genus ID determined via light microscopy), and these species-barcode ASVs were always present at >1% relative abundance.
Based on the relative abundance profiles of all per-specimen ASVs with taxonomic assignments to nematodes, our samples were most commonly shown to have a "highly dominant" or "dominant" ASV present at >75% relative abundance in the overall profile (44% of specimens; Table 1). The second most common category was that of "intragenomic variation" where two competing nematode ASVs had <75% relative abundance in the specimen metabarcoding profile (35% of specimens; Table 1); often these two ASVs had relative abundance similar to each other and only differed by 10%-20% in terms of their overall relative abundance in the specimen metabarcoding profile. Moreover, half of the samples in the intragenomic variation category had at least three ASVs with a relative abundance ≥10%. The third most common category (20% of specimens, Table 1) was "no dominant ASV" mostly due to the coamplification of nontarget DNA, including other nematodes and fungi. Still within this category, further analysis of those samples where the most abundant ASV did not match the expected nematode (i.e., 23 out 46 samples) revealed that these ASVs were usually unique to a sample and not shared across the dataset. These findings suggest that the existence of other organisms' DNA in the samples most likely represents a real biological phenomenon, as opposed to sample contamination and/ or a sequencing artifact derived from the Illumina platform (e.g., tag jumping; see discussion for further detail).

| ASV profile patterns among taxonomic/ phylogenetic and feeding groups
Phylogenetic analysis allowed us to assess whether or not closely related nematode species share similar ASV profiles (e.g., same dominant ASV and ASVs with similar relative abundance values).
Overall, ASV profiles were highly consistent across nematode specimens representing the same nematode family (Chromadoridae, Comesomatidae, Desmoscolecidae, and Oxystominidae; Figure 5 and Figure S2). Within the four nematode families we assessed indepth, ASV patterns seemed to be clade specific-in other words,  166 Chromadoriade sp5). Variation in ASV profiles usually occurred when the dominant ASV did match a different nematode group or another eukaryotic taxon (e.g., fungi, polychaete) instead of the expected nematode genus ID. When comparing the overall ASV taxonomy associated with these inconsistent nematode samples, we also observed a higher proportion of ASVs matching groups other than the nematode morphospecies ( Figure S2), suggesting higher coamplification of nontarget eukaryotic taxa. Moreover, nematode specimens with higher putative levels of coamplification tended to have a higher number of ASVs compared to other members of the same clade.
With respect to trophic diversity, we did not find significant differences in ASV profiles among nematode feeding groups (i.e., 1A, 1B, 2A, and 2B); the total number of reads (data not shown) and the number of ASVs did not vary significantly across these four groups (Figure 2b). Similarly, when ASVs are categorized based on their taxonomic assignment (i.e., expected nematode morphospecies, other nematodes, and other non-nematode eukaryotes), the pattern is very consistent across nematode feeding groups. As expected, there was a significantly higher proportion of ASVs matching the expected nematode morphospecies ( Figure S3), whereas no significant differences are observed between ASVs assigned to other nematodes and non-nematode eukaryotes (except for feeding group 1A, where poorly sampled nematode clades may F I G U R E 5 Composition and variation of DADA2 ASV profiles recovered across different phylogenetic clades representing four nematode families: Chromadoridae (orange), Comesometidae (green), Desmoscolecidae (purple), and Oxystominidae (pink). On each branch, the taxonomic assignment of the dominant ASV recovered from each nematode specimen is indicated as the correct nematode morphospecies (gray circle), another nematode (blue triangle), or a non-nematode eukaryote (orange square). (a) Maximum-likelihood tree reconstruction based on the representative ASV sequence (i.e., matching the morphological nematode ID). Tree is rooted on the branch leading to all representatives of the family Oxystominidae (BS > = 70% are shown). (b) ASV profiles based on the relative abundance of all ASVs (presented in a continuous color-coded scale) for selected nematodes specimens, indicated by dotted blue boxes in panel a. (c) Taxonomic assignments of ASVs are summarized by color: ASVs corresponding to the expected nematode morphospecies (gray), other nematodes (blue), and other non-nematode eukaryotes (orange). A complete ML tree including all nematode specimens representing these four main families is provided in Figure S2 explain this statistical result). On the other hand, variation among ASV categories may show a correlation with nematode taxonomy, including variations within clades (e.g., among closely related species and even among nematode specimens representing the same morphospecies; Figure S3).

| Intragenomic variation as a source of rRNA patterns
Nematode specimens exhibiting intragenomic variation represented over one-third of the samples in our dataset (  Figure S4, Table S3). In some nematodes, intragenomic rRNA variants diverged by as much as 5%-20% sequence identity. However, the location of polymorphic sites (observed in multiple sequence alignments) appeared to be clade specific. As an example, nematode specimens representing Dichromodora sp. had highly divergent ASVs (p-distance: mean = 2.57%, range = 0%-20.95%). However, we observed that the Sanger-derived nematode barcode (i.e., the reference sequence for a morphospecies) exhibited 100% pairwise identity to the dominant ASV in most of the cases ( Figure S4), suggesting that the dominant ASV most likely represents the rRNA variant sequence with the highest copy number within a nematode genome.

| Consistent levels of rRNA variation across individual nematode specimens
Metabarcoding studies focusing on prokaryotic and eukaryotic diversity have heavily relied on rRNA genes owing to both theoretical and practical reasons (Bik, Porazinska, et al., 2012;Bik, Sung, et al., 2012;Deiner et al., 2017;Ibal, Pham, Park, & Shin, 2019;Leasi et al., 2018;Lindner et al., 2013). However, the multicopy nature of rRNA genes and the existence of intragenomic variation among these gene copies confound our ability to accurately survey biodiversity and understand patterns and processes in nature. Numerous studies, whether using cloning techniques coupled with Sanger sequencing or newer HTS technologies such as Illumina, have extensively documented intragenomic rRNA variation in prokaryotes and eukaryotes (Alanagreh, Pegg, Harikumar, & Buchheim, 2017;Brabec, Kuchta, Scholz, & Littlewood, 2016;Lindner & Banik, 2011;Nieto Feliner, Gutiérrez Larena, & Fuertes Aguilar, 2004;Pereira & Baldwin, 2016;Sun et al., 2013). A common feature observed in eukaryotic rRNA datasets is the existence of a "Head-Tail" pattern as defined by Porazinska, Giblin-Davis, Esquivel, et al. (2010), whereby the "Head" represents a highly dominant MOTU (i.e., containing the majority of sequence reads) and the "Tail" comprises a set of rarer MOTUs each containing a decreasing number of reads.
In the present study, we explored 18S rRNA variation in marine nematodes from different perspectives, including assessment of both intragenomic variation (i.e., within-individual) and intraspecific variation (i.e., among specimens representing the same nematode morphospecies). Our results showed that Head-Tail patterns were a common and consistent phenomenon among individual nematode specimens and all major nematode families. Moreover, Head-Tail patterns in our dataset were persistent even when using the newest bioinformatic algorithms that employ error-correction strategies to generate amplicon sequence variants (ASVs), which essentially create MOTUs clustered at 100% sequence identity. We observed that ASV pipelines such as DADA2 return a smaller number of ASVs associated with individual nematode specimens, as well as a much shorter tail of rare MOTUs compared to other methods (e.g., VSearch, data not shown). is estimated to harbor 100-150 rRNA copies (Stein et al., 2003).
In this study, we consistently recovered <100 nematode ASVs associated with individual worms (Figure 2), confirming that the DADA2 pipeline produces outputs that are consistent with known biological patterns of intragenomic rRNA variation in nematodes (Porazinska, Giblin-Davis, Esquivel, et al., 2010;Porazinska et al., 2009). Head-Tail patterns in our nematode samples did not appear to be influenced by sequencing depth, exhibiting only a weak correlation between ASVs and the number of reads (Figure 2a,b). In addition to providing high resolution between rRNA variants (i.e., differences of one nucleotide), DADA2 has been shown to outperform other methods by recovering more real variants and fewer spurious sequences (Callahan et al., 2016). Previous studies of marine nematodes also indicate that DADA2 ASVs produce the most realistic estimates of overall species richness, accurately reporting 30 out of 39 nematode species in a mock community assessment (Macheriotou et al., 2019). Thus, DADA2 ASVs are able to consistently recover variant rRNA copies that persist within eukaryotic genomes while simultaneously providing an accurate view of community assemblage structure. However, quantifying the exact number of intragenomic rRNA variants using metabarcoding datasets remains challenging, even when using ASV profiles resolved at the level of a single specimen.

| Clade-specific patterns of rRNA variants and recovery of nontarget taxa
Unsurprisingly, most of the dominant ASVs recovered in our dataset were associated with the expected nematode morphospecies, genus, or family. Although the number of 18S rRNA reads, and consequently the number of ASVs, varied across nematode specimens (Table S2), we observed that ASV patterns across nematode taxa are most likely to be clade specific ( Figure 5, Figure S3). Individual nematodes representing the same morphospecies overall shared the same ASV nucleotide sequences (both dominant and minor rRNA variants), and these ASVs were recovered at very similar relative abundances across replicate worms from the same morphospecies, thus resulting into almost identical ASV profiles (i.e., ASV richness and diversity; see Dichromadora sp. and Sabatieria sp5 in Figure 5). Newer HTS technologies are highly sensitive in detecting very low amounts of DNA, and the recovery of nontarget organisms (especially when using "universal" metabarcoding PCR primers) could lead to misinterpretations of microbial eukaryote assemblage structure, especially when generating metabarcoding data from bulk sediments. An alternative explanation for those cases where the dominant ASV did not match the expected morphospecies (especially when the recovered match was a different nematode genus) is the issue of "tag jumping" in Illumina-based metabarcoding studies, where reads may be mis-assigned to samples during demultiplexing and obscure diversity patterns (Schnell, Bohmann, & Gilbert, 2015).
Errors (i.e., false positives) due to tag jumping are generally reported to be relatively low: 0.01%-0.03% specifically for phiX (Hänfling et al., 2016;Olds et al., 2016); or 2.1%-2.6% within specific libraries (Schnell et al., 2015). However, these levels will certainly vary among metabarcoding studies depending on the sequencing strategy adopted (single-versus paired-end barcoding, Illumina platform used, etc.). Although we cannot entirely rule out the effects of tag jumping in this study, the consistency of our results suggested that, if present, its impact was minimum.
Recovery of some types of nontarget DNA can provide novel scientific insights, especially if this information is explicitly linked with specimen-level data. For example, many nematodes and microbial metazoan species are known to harbor microbial symbionts and parasites, such as microsporidian taxa infecting nematodes in both soil and marine environments (Gibson & Morran, 2017;Sapir et al., 2014). One recent metabarcoding study also reported distinct community profiles of protist taxa associated with metazoan DNA extracts (dominated by parasitic and pathogenic species, Geisen, Laros, Vizcaíno, Bonkowski, & de Groot, 2015) compared to protist mock community assemblages, indicating that host-associated microbial pathogens remain poorly characterized despite their seemingly common occurrence in natural habitats worldwide. However, it is important to note that coamplification of nontarget taxa does not necessarily imply ecological interactions, especially since most metabarcoding approaches cannot differentiate between living species (extant microbial communities) and relic DNA (dead organisms and extracellular DNA).
In the present dataset, some nematode groups such as the family Desmoscolecidae (Figure 5b) appeared to have a higher proportion of ASVs with taxonomy assignments to other nematodes and non-nematode eukaryotes. Morphologically, Desmoscolecidae are small nematodes (often <500 μm) characterized by short, ovalshaped bodies with conspicuous transverse rings also known as "desmen," which are composed of secretions and sediment particles including grains of sand (Decraemer & Rho, 2013;Platt & Warwick, 1988). Marine sediment particles are known to readily bind DNA and shield nucleotides from degradation (Torti, Lever, & Jørgensen, 2015), and the persistent attachment of sediment particles on the cuticle of Desmoscolecid nematodes may greatly facilitate the coamplification of relic DNA from the environment. Similarly, we noticed that some nematode specimens in the family Comesomatidae (e.g., Cervonema sp2) also displayed a large number of ASVs with taxonomic assignments to other nematodes and other non-nematode eukaryotes. Contrary to Desmoscolecidae nematodes, representatives of the Comesomatidae are much larger with a fusiform body shape and a smooth cuticle. Based on their buccal cavity morphology, this group is classified as nonselective deposit feeders (i.e., feeding group 1B sensu Wieser, 1953) and therefore is likely to ingest larger sediment particles and potentially relic DNA from the environment. Comesomatidae nematodes have been observed to prey on other small nematodes (Moens & Vincx, 1997) as well as being attracted to dead animals (Gerlach, 1977). Our single-nematode metabarcoding approach could not unequivocally determine whether recovery of nontarget taxa represented false positives due to tag jumping (see above), bioinformatic artifacts (e.g., inaccurate taxonomy assignments resulting from sparse rRNA databases), chance amplification of relic DNA, or active feeding or behavioral strategies of each nematode species. The coamplification of nontarget DNA is also impacted by the overall integrity of the specimen: Degraded nematode specimens and low-quality host DNA may increase the chance of amplifying microbial associates, relic DNA, and gut contents, as well as be impacted by potential primer biases which may promote recovery of certain taxa over others (Lanner, Curto, Pachinger, Neumüller, & Meimberg, 2019;Sow et al., 2019).
Our assessment of nontarget nematode taxa is also heavily impacted by database size, completeness, and the bioinformatic algorithm and parameters used to determine taxonomy assignments from metabarcoding ASVs (Macheriotou et al., 2019). In the present dataset, the accuracy of taxonomic assignments for single-nematode ASV profiles appears to vary greatly depending on phylogenetic distance and the number of closely related 18S rRNA sequences present in our reference databases (SILVA and our local custom Sanger database of nematode specimens  (Nem.269,Nem. 274,Nem. 276) were either erroneously assigned to a nematode genus that was well-represented in our database (e.g., to Sabatieria, a genus where many full-length DNA barcodes are available) or only assigned to higher taxonomic ranks (i.e., order or class).
However, we were able to circumvent some database issues by using phylogenetic analysis of 18S rRNA data to confirm placement of these morphospecies in the correct clades.

| No correlation between rRNA variation and nematode life-history traits
Although different nematode feeding strategies could potentially result in distinct ASV profiles, we did not depict a consistent pattern across our 18S rRNA dataset. The number of ASVs did not significantly differ among nematode feeding groups (1A, 1B, 2A, 2B sensu, Wieser, 1953; Figure 2c). Moreover, when we examined the taxonomic assignment of ASVs (i.e., matching the expected nematode morphospecies, another nematode, or a non-nematode eukaryote), we observed a high degree of variation within feeding groups ( Figure   S3). The majority of high-abundance ASVs matched the expected nematode morphospecies, and these sequences were always significantly different from ASVs matching other nematodes and nonnematode eukaryotes. These results further support the idea that ASV profiles are likely to correspond more strongly to nematode phylogenetic clades rather than (potentially arbitrary) ecological classifications such as feeding groups (Moens, Bouillon, & Gallucci, 2005).
We initially hypothesized that predatory nematodes (i.e., feeding group 2B) would have more complex ASV profiles compared to other nematode groups, owing to higher prey diversity, larger and more complex buccal cavities, and active feeding strategies (Jensen, 1987). This hypothesis was readily disproven by the present dataset, but we did recover some potential instances of potential predator-

| Integration of data types helps clarify and circumvent intragenomic rRNA variation
In this study, we integrated traditional morphology-based taxonomy and molecular approaches, including Sanger sequencing and HTS, to evaluate the variation of 18S rRNA sequences associated with individual marine nematode specimens. High-resolution morphological taxonomy (i.e., at the genus or species level) of microbial metazoa such as nematodes is challenging, time consuming, and requires specialized taxonomic expertise (Bik, Porazinska, et al., 2012;Bik, Sung, et al., 2012;De Ley et al., 2005;Holovachov, Haenel, Bourlat, & Jondelius, 2017). Here, our integrated morphological-molecular approach allowed us to take into account all lines of evidence to delimit species and assess overall levels of biodiversity. "Reverse taxonomy" was often the most useful approach (Kanzaki et al., 2012;Markmann & Tautz, 2005), where the variation in metabarcoding ASVs and Sanger-derived 18S rRNA sequences guided our reassessment of morphological variation and helped to refine our initial nematode identifications.
For example, in the family Desmoscolecidae, we were able to identify 13 putative morphospecies, whose variation at the molecular level was also supported by morphological differences previously overlooked during our initial light microscopy observations.
Similarly, a large number of putative nematode morphospecies were also identified in the three other major families viz. Comesomatidae, Chromadoridae, and Oxystominidae. Our results also showed that metabarcoding data strongly agree with Sanger sequencing data.
For example, in most cases the Sanger-derived 18S rRNA barcode of nematode specimens was 100% identical to the dominant ASV sequence recovered from the metabarcoding dataset. This suggests that in studies of nematodes, the species-specific DNA barcode will be recovered as the dominant ASV and with high relative abundance.
Moreover, using this short 18S rRNA fragment (i.e., V1-V2 region), we were able to detect mutations among closely related taxa that also supported morphological variation observed under light microscopy. Overall, our results showed high congruence across all of the applied methods.
The idea of supplementing HTS metabarcoding studies with additional datasets (e.g., morphological data) varies among eukaryotic groups, but it has been lately supported by many authors (Cahill et al., 2018;Dell'Anno, Carugati, Corinaldesi, Riccioni, & Danovaro, 2015;Geisen et al., 2018;Lejzerowicz et al., 2015;Macheriotou et al., 2019). These studies have highlighted the importance of classical taxonomy in characterizing biodiversity as well as advocating for the integration of both approaches, particularly for those groups poorly and/or sparsely represented in molecular databases such as free-living marine nematodes. For example, V1-V2 regions of the 18S rRNA are commonly used for marine nematodes (Bik, Porazinska, et al., 2012;Bik, Sung, et al., 2012;Fonseca et al., 2010), whereas studies on terrestrial nematodes have targeted the V9 region (Porazinska, Giblin-Davis, Esquivel, et al., 2010;Porazinska et al., 2009). Moreover, many marine nematode groups lack formal taxonomic descriptions despite high levels of biodiversity (Miljutin et al., 2010), with many major groups having no representation at all in molecular databases. Thus, delimiting free-living nematode species and quantifying abundance from rRNA metabarcoding reads alone will continue to be fraught with difficulties unless more rapid progress is made in filling taxonomic and database gaps.

| CON CLUS IONS
The present study outlined patterns of intragenomic rRNA variation associated with individual free-living marine nematodes, providing assessments of ASV profiles across nematode phylogenetic clades and ecological feeding groups. While Head-Tail patterns and relative abundance of rRNA variants were consistent at the level of morphospecies (i.e., ASVs were recovered at very similar relative abundances across replicate worms, or higher levels of nontarget taxa were recovered some groups such as the sediment-encrusted Desmoscolecidae), the level and categorization of intragenomic rRNA variation (Table 1) appeared to vary widely across taxa.
Interpretation of broad patterns was additionally confounded by the uncertainties related to ASV taxonomic assignments and sparse eukaryotic databases. Despite our use of labor-intensive wet laboratory protocols to isolate individual nematode specimens and minimize environmental DNA contamination, we consistently recovered signals from relic DNA and potential prey/parasite taxa associated with single nematodes. However, in most samples the signal from the target nematode morphospecies represented the dominant sequence ("Head") in the ASV profiles of individual worms, indicating that metabarcoding protocols will reliably report the species-specific DNA barcode, even in the presence of intragenomic rRNA variants and low levels of eDNA derived from other sources. Furthermore, amplification of nontarget taxa can provide novel ecological insights on host-microbe interactions and feeding strategies, generating observational data and hypotheses that can be explored in future studies of microbial metazoa. The choice of metabarcoding marker locus (e.g., rRNA loci vs. mitochondrial genes), reference database composition and completeness, and read clustering and taxonomy assignment algorithms will continue to have a large impact on the assessment of biodiversity patterns and species presence/absence in HTS datasets (Holovachov et al., 2017;Macheriotou et al., 2019).
However, the use of complementary datasets (environmental metabarcoding data and specimen-level morphology and reference DNA barcodes) can provide a robust and accurate view of microbial eukaryote community assemblages, particularly for rRNA gene loci where pervasive intragenomic variation confounds our ability to quantitatively link -Omics data with individual specimens.

Additional funding for this was provided by a North Pacific Research
Board award to SH and HMB (NPRB project 1303).

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
TJP and TS picked and isolated nematode specimens, carried out taxonomic identifications and wet laboratory molecular protocols, and generated Illumina and Sanger sequencing data. TJP, ADS, and HMB carried out data analysis, interpreted results, and wrote the manuscript. SMH collected marine sediment samples, and HMB and SMH contributed to the conception and design of this study.