Taxonomy assignment approach determines the efficiency of identification of OTUs in marine nematodes

Precision and reliability of barcode-based biodiversity assessment can be affected at several steps during acquisition and analysis of data. Identification of operational taxonomic units (OTUs) is one of the crucial steps in the process and can be accomplished using several different approaches, namely, alignment-based, probabilistic, tree-based and phylogeny-based. The number of identified sequences in the reference databases affects the precision of identification. This paper compares the identification of marine nematode OTUs using alignment-based, tree-based and phylogeny-based approaches. Because the nematode reference dataset is limited in its taxonomic scope, OTUs can only be assigned to higher taxonomic categories, families. The phylogeny-based approach using the evolutionary placement algorithm provided the largest number of positively assigned OTUs and was least affected by erroneous sequences and limitations of reference data, compared to alignment-based and tree-based approaches.


Introduction
Metabarcoding studies based on high throughput sequencing of amplicons from marine samples have reshaped our understanding of the biodiversity of marine microscopic eukaryotes, revealing a much higher diversity than previously known [1]. Early metabarcoding studies of the slightly larger sediment-dwelling meiofauna have mainly focused on scoring relative diversity of taxonomic groups [1][2][3]. The next step in metabarcoding, identification of species, is limited by the available reference database, which is sparse for most marine taxa, and by the matching algorithms. In this paper we evaluate to what extent OTUs of marine nematodes can be assigned to family-level taxa using publicly available reference sequences, and which of three matching strategies (alignment-based, tree-based or phylogeny-based) is the most effective.
The reference datasets for marine nematodes are sparsely populated, as correctly pointed out in [4]. The most recent check of NCBI GenBank (February 2017) reveals that fewer than 180 genera and about 170 identified species of marine nematodes are included, compared to over 530 described genera and almost 4750 described species (based on [5] with updates). This summarized number of records in GenBank does not take into consideration which genes are represented (mostly near complete or partial 18S and partial 28S rDNA), but gives the total number of entries.
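To put these counts in proportion, the following minimal sketch computes the approximate share of described marine nematode taxa that have any GenBank record. It uses only the rounded figures quoted above, so the results are rough upper estimates; the function and variable names are introduced here for illustration.

```python
# Rounded counts quoted in the text (GenBank check, February 2017).
# They are bounds or approximations, so the ratios are rough estimates only.
genera_in_genbank, genera_described = 180, 530
species_in_genbank, species_described = 170, 4750


def coverage(represented: int, described: int) -> float:
    """Return the percentage of described taxa that have at least one record."""
    return 100.0 * represented / described


genus_coverage = coverage(genera_in_genbank, genera_described)
species_coverage = coverage(species_in_genbank, species_described)

print(f"genera with records:  at most about {genus_coverage:.0f}%")
print(f"species with records: about {species_coverage:.1f}%")
```

Roughly a third of the genera but only a few percent of the species are represented, which is why assignment above species level is considered in the following paragraphs.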
Not all of these entries include sequences suitable to be used as references for metabarcoding. Since completeness of the reference databases for marine nematodes is insufficient to assign all anonymous metabarcodes (operational taxonomic units, OTUs) to species level [6], one has to consider if they can be assigned to taxonomic categories above species level, and if this type of data can be used in research.
Assignment of OTUs to nematode genera faces the same problem as assignment of OTUs to species: limited representation of identified taxa in reference databases (see above). Identification to the family level of those OTUs that cannot be assigned to any particular species or genus is the next best option. It provides enough information to group nematode OTUs into trophic [7][8] and functional [9] groups and apply ecological metrics, such as the Maturity Index [10], used to evaluate the complexity and functioning of nematode communities [11]. This approach has already been applied in metabarcoding studies of terrestrial nematode communities from the Arctic and the tropics [12][13].
Although it would be possible to generate new barcodes for marine nematodes from our study sites to supplement existing reference datasets, the purpose of the present paper is to follow the typical scenario in which metabarcoding projects rely on existing databases and do not publish new reference sequences.
Identification of OTUs can be done using a number of currently available approaches and applications, several of which will be tested and compared below. In general, all taxonomy assignment methods can be grouped into four categories: alignment-based, probabilistic, tree-based and phylogeny-based.
Alignment-based approaches utilize various measures of similarity between query and reference sequences based solely on their alignment. They are implemented in VAMPS [14], Taxonerator [15] and CREST [16], or can be performed directly through the BLASTN [17] function of the NCBI server (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The performances of CREST and BLASTN are evaluated in detail in this publication. VAMPS, on the other hand, is specifically designed for prokaryotic organisms, and Taxonerator uses the same routine as BLASTN, so neither is included in this comparison.
Probabilistic approaches rely on likelihood estimates of OTU placement and include the UTAX algorithm of the USEARCH software package [18] and the Statistical Assignment Package (SAP) [19]. For technical reasons, neither of these tools is included in this comparison: 1) exact details of the UTAX algorithm have not been published, and thus the results produced by this approach are difficult to evaluate; 2) the standalone version of SAP could not be successfully installed, while the web server (http://services.birc.au.dk/sap/server) was not stable in use and consistently returned error messages.
The tree-based approach evaluates similarity between query and reference sequences by analyzing the position of each individual OTU relative to reference sequences on the cladogram and the bootstrap support that it receives. This approach includes the following bioinformatic steps: multiple sequence alignment of short query reads with reference sequences is done de novo using any available multiple sequence alignment tool; the dataset is usually trimmed to the barcode size; the cladogram is built using one of the phylogeny inference algorithms, most commonly Neighbour Joining followed by bootstrapping [20][21][22][23][24][25].
Phylogeny-based identification of query sequences is performed in three stages. During the preparation stage, a manually curated reference alignment is created using full length sequences of the gene that includes the barcoding region. A reference phylogeny is estimated based on this alignment. Taxonomic assignment of the query barcodes is then done by using the reference tree as a constraint and testing placement of query reads across all nodes in the reference topology, with placement likelihood calculated for every combination. The highest scoring placements are retained for evaluation. This approach is implemented in MLTreeMap [26], pplacer [27] and the Evolutionary Placement Algorithm (EPA) [28]. Of the three, only the Evolutionary Placement Algorithm is used in this paper, since "there was no clear difference in accuracy between EPA and pplacer" (cited from [27]) in comparative tests performed in [28]. MLTreeMap is designed for taxonomy assignment of barcodes into higher-level taxonomic categories (Phylum and above) and was not suitable for our purpose.

Sampling sites, sampling, extraction and fixation
Samples used in this study were collected in two ecologically distinct locations along the west coast of Sweden. Coarse shell sand was sampled at 7-8 m depth with a bottom dredge along the north-eastern side of the Hållö island near Smögen (N 58° 20.32-20.38' E 11° 12.73-12.68').
Soft mud was collected using a Warén dredge at 53 m depth in the Gullmarn Fjord near Lysekil (N 58° 15.73' E 11° 26.10'), in the so-called "Telekabeln" site. Samples from both sites were extracted using two different techniques each. Material for metabarcoding was preserved in 96% ethanol and stored at -20°C, material for morphology-based identification was preserved in 4% formaldehyde.
Meiofauna from the coarse sand from Hållö was extracted using two variations of the flotation (decanting and sieving) technique. In the first case, fresh water was used to induce osmotic shock in meiofaunal organisms and force them to detach from the substrate. A 200 ml volume of sediment was placed in a large volume of fresh water and thoroughly mixed to suspend meiofauna and sediment. The supernatant was sieved through a 1000 µm sieve in order to separate and discard the macrofaunal fraction. The filtered sample was then sieved through a 45 µm sieve to collect meiofauna, which was preserved either for sequencing or morphological identification. The sieving step was repeated three times. Ten replicates were preserved for molecular studies and two replicates were preserved for morphology-based observations. In the second case, a 7.2% solution of MgCl2 was used to anaesthetize nematodes and other organisms and detach them from the substrate. Meiofauna was decanted through a 125 µm sieve. Similarly, ten replicates were preserved for molecular studies and two replicates were preserved for morphology-based observations.
Meiofauna from the mud samples was also extracted using two different methods: flotation and siphoning. For the flotation, fresh water was used to induce osmotic shock in meiofaunal organisms. A 2.4 l volume of sediment was placed in a large volume of fresh water and thoroughly mixed to suspend meiofauna and sediment. The supernatant was sieved through a 1000 µm sieve in order to separate and discard the macrofaunal fraction. The filtered sample was then sieved through a 70 µm sieve to collect meiofauna. The last procedure was repeated three times. Meiofauna was collected, divided into 12 subsamples and preserved: six subsamples were preserved for molecular studies and six subsamples were preserved for morphology-based observations.
For siphoning, a total volume of 12 l of sediment was transferred to a plastic container, covered with 20 cm of seawater, and left to settle overnight. Meiofauna was then collected through siphoning off the top layer of sediment and passing it through a 125 µm sieve from which samples were taken. One sample was fixed in 96% ethanol, and split into six equal subsamples for molecular studies. The second sample was also split into six subsamples and preserved for morphology-based observations.

Morphology-based analysis of samples
In order to estimate nematode diversity it is usually recommended to either count and identify all nematode individuals in the entire sample, or in a subsample of predetermined volume.
The alternative, least time consuming and most commonly used option is to count a predetermined number (usually 100 or 200) of randomly picked nematodes from the sample. Unfortunately, this latter approach can be imprecise for samples with high species diversity. Moreover, since nematodes are affected by Stokes' law, which causes uneven distribution of specimens of different size along the bottom of the counting dish, it is difficult to obtain randomized data with this approach. Therefore, we opted to count and identify all nematodes for all samples (or subsamples). The amount of time required for this task limited the effort to two replicates for each site and extraction method, eight in total. Although we appreciate that counting nematodes in only two replicates per sample is not enough to quantitatively evaluate the composition of nematode communities, it is nevertheless sufficient to provide the list of species and genera for each sampling site and extraction method for the purpose of this publication.
All nematode specimens were identified and counted for two replicates each from Hållö flotation with MgCl2, Hållö flotation with freshwater and Telekabeln siphoning. Telekabeln flotation with freshwater samples were subsampled by taking 1/10 of the entire sample. Specimens from formaldehyde-preserved samples were transferred to pure glycerine using a modified Seinhorst's rapid method [29] and mounted on glass slides using the paraffin wax ring method. All nematode specimens were identified to genus, and, when possible, to species level and placed in the classification system published in [5] and accepted in the WoRMS [30] and NeMys [31] reference databases. Please note that this classification is in many cases different from the nematode classification used in GenBank [32], SILVA [33] and GBIF (www.gbif.org). The 18S rRNA marker was amplified using PCR primers modified from [2], yielding a ≈370 bp fragment that includes the V1-V2 hypervariable domains of 18S rRNA (Supplementary Figure 1).

Sequencing procedures
Illumina MiSeq library preparation was done using the dual PCR amplification method [33]. All subsequent sequencing and bioinformatic analysis steps are fully described in [6].

Preliminary taxonomic assignment using QIIME
Preliminary taxonomic assignment was done using the QIIME [35] script assign_taxonomy.py against the SILVA database [33] release 111. Default settings in QIIME used for preliminary sorting of OTUs grouped query sequences into two groups based on similarity level: to phyla at 80% similarity and to species at 97% similarity. The output for each query sequence included the closest match but did not give the similarity level, making it impossible to evaluate these assignments. Only two OTUs were positively identified to species level using QIIME: Viscosia viscosa (TS6.SSU58722) and Chromadora nudicapitata (HF2.SSU192072). Six more OTUs were [...]
The original output from the QIIME analysis included 145 OTUs assigned to the phylum Nematoda. Four of them were incorrectly placed among nematodes due to errors in the reference database derived from SILVA: they group with Arthropoda (HE1.SSU866120, HE6.SSU382930, HF6.SSU331569) and Phoronida (TS6.SSU559982) in all other analyses and were excluded. Two more sequences cluster with nematodes but appear to have long insertions within conserved regions (HE6.SSU358113 and TF5.SSU411806). Each of them was found in only one sample, further supporting the idea that they are derived from an erroneous amplification product, and both were removed from any further analysis. The final list of nematode OTUs includes 139 query sequences.
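The exclusions described above can be expressed as a simple filter. The sketch below uses the OTU identifiers and counts given in the text; the variable and function names are introduced here for illustration and are not part of the QIIME workflow.

```python
# OTUs that QIIME placed in Nematoda but that group with other phyla
# in all other analyses (identifiers and phyla as listed in the text).
NON_NEMATODE = {
    "HE1.SSU866120": "Arthropoda",
    "HE6.SSU382930": "Arthropoda",
    "HF6.SSU331569": "Arthropoda",
    "TS6.SSU559982": "Phoronida",
}

# OTUs with long insertions in conserved regions, each found in one sample only.
SUSPECT_INSERTIONS = {"HE6.SSU358113", "TF5.SSU411806"}


def keep_otu(otu_id: str) -> bool:
    """Return True if the OTU stays in the final nematode query set."""
    return otu_id not in NON_NEMATODE and otu_id not in SUSPECT_INSERTIONS


n_initial = 145  # OTUs assigned to Nematoda in the original QIIME output
n_final = n_initial - len(NON_NEMATODE) - len(SUSPECT_INSERTIONS)
print(n_final)
```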

Taxonomy assignment of nematode OTUs using alignment-based methods
All 139 nematode OTUs were manually analysed using BLASTN 2.5.0+ [17] against the nucleotide collection of the NCBI database (http://blast.ncbi.nlm.nih.gov/Blast.cgi) on August 22, 2016 with the following settings: optimize for highly similar sequences (megablast), exclude uncultured/environmental sample sequences, max target sequences: 100, sorted by max score.
Closest matches were evaluated. If the top match sequence was still labelled as "uncultured", "unidentified" or "environmental", the next best match was evaluated. Assignment to the family level was based on the top hit with at least 90% identity score, with 100% sequence cover, as well as assignment consistency (e.g. top hits assigned to the same family) following the observations described in [36], and contrary to the approach used in [37].
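The decision rules in this paragraph were applied manually; the sketch below shows one way they could be written down. It assumes the hits are already sorted by max score. The `Hit` record, the function name and the `n_consistent` parameter (how many top named hits must agree on the family) are hypothetical, since the text does not state how many hits were checked for consistency, and the example scores are invented.

```python
from dataclasses import dataclass
from typing import List, Optional

# Labels that mark hits without a morphologically identified voucher.
EXCLUDED_LABELS = ("uncultured", "unidentified", "environmental")


@dataclass
class Hit:
    description: str    # hit title as shown in the BLAST report
    family: str         # family of the hit according to the chosen classification
    identity: float     # percent identity
    query_cover: float  # percent query cover


def assign_family(hits: List[Hit], min_identity: float = 90.0,
                  min_cover: float = 100.0,
                  n_consistent: int = 3) -> Optional[str]:
    """Return a family name, or None if the rules are not satisfied.

    `hits` must be sorted by max score, best first.
    `n_consistent` is an assumption made for this sketch.
    """
    # Skip anonymous hits and move on to the next best match.
    named = [h for h in hits
             if not any(label in h.description.lower() for label in EXCLUDED_LABELS)]
    if not named:
        return None
    top = named[0]
    if top.identity < min_identity or top.query_cover < min_cover:
        return None
    # Require the leading named hits to agree on the family.
    if any(h.family != top.family for h in named[:n_consistent]):
        return None
    return top.family


# Hypothetical hit list: the scores are invented for illustration.
example_hits = [
    Hit("Uncultured eukaryote clone", "", 99.0, 100.0),
    Hit("Viscosia viscosa", "Oncholaimidae", 95.0, 100.0),
    Hit("Oncholaimus sp.", "Oncholaimidae", 93.0, 100.0),
]
print(assign_family(example_hits))
```

Returning `None` rather than the best available guess keeps unassigned OTUs explicit, which matches the way unidentified OTUs are treated elsewhere in the paper.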
The LCAClassifier function of the CREST web server (http://apps.cbu.uib.no/crest) was used to assign taxonomy to the 139 OTUs using the built-in silvamod database [16] on August 25, 2016. Three different values of the LCA relative range were tested separately: 2%, 5% and 10%, but only the results based on an LCA range of 2% were retained for further analysis and comparison.
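To illustrate why the LCA relative range matters, here is a generic lowest-common-ancestor sketch, not the CREST implementation itself. It assumes the range is applied to alignment bit scores relative to the best hit, and that each hit carries a lineage listed from the root downwards; the function name and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def lca_assign(hits: Sequence[Tuple[float, Tuple[str, ...]]],
               lca_range: float = 0.02) -> Tuple[str, ...]:
    """Return the lineage shared by all hits scoring within `lca_range` of the best.

    `hits` is a sequence of (bit_score, lineage) pairs, where lineage runs
    from the root towards the tip.
    """
    best = max(score for score, _ in hits)
    kept = [lineage for score, lineage in hits if score >= best * (1.0 - lca_range)]
    shared: List[str] = []
    # Walk down the ranks until the retained lineages disagree.
    for ranks in zip(*kept):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)


# Invented scores, used only to show the effect of the range.
example_hits = [
    (600.0, ("Nematoda", "Chromadorea", "Chromadoridae", "Chromadora")),
    (595.0, ("Nematoda", "Chromadorea", "Chromadoridae", "Prochromadora")),
    (560.0, ("Nematoda", "Chromadorea", "Cyatholaimidae", "Paracanthonchus")),
]
print(lca_assign(example_hits, 0.02))
print(lca_assign(example_hits, 0.10))
```

In this toy case a 2% range keeps two hits and resolves to family, while a 10% range admits a third hit from another family and the assignment falls back to class level.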

Taxonomy assignment of nematode OTUs using tree-based approach
According to published tests [38], the tree-based approach does not allow grouping of sequences into well supported monophyletic clades equivalent in their taxonomic composition to nematode orders, but most of the marine nematode families are well resolved and supported. The reference sequence dataset was based on the "filtered" alignment from [38] that was updated with newly published sequences of marine nematodes. The final reference dataset is composed of 305 sequences representing the majority of marine nematode families as well as selected freshwater and terrestrial families, some species of which are known to inhabit the marine environment, plus three outgroup taxa (Supplementary Table 1). The same set of sequences was used for the taxonomy placement using a phylogeny-based approach (Section 2.7). Two independent analyses were performed: in the first case, all 139 query sequences (cumulative query dataset) were aligned with the reference dataset and analysed at once; in the second case, the 139 query sequences were split into 14 groups of ten or nine (partitioned query dataset), and each group was separately aligned with the complete reference dataset and analysed. This was done to verify whether the number and composition of query sequences have any impact on the effectiveness of the tree-based taxonomy assignment approach.
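The partitioning of the query dataset can be sketched as follows. The text does not say how OTUs were allocated to groups, so the sequential split and the placeholder OTU names below are assumptions; the sketch only shows that 139 sequences in 14 groups gives group sizes of ten or nine.

```python
from typing import List, Sequence


def partition(items: Sequence[str], n_groups: int) -> List[List[str]]:
    """Split `items` into `n_groups` consecutive groups whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_groups)
    groups: List[List[str]] = []
    start = 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more
        groups.append(list(items[start:start + size]))
        start += size
    return groups


# Placeholder identifiers standing in for the 139 nematode OTUs.
otus = [f"OTU_{i:03d}" for i in range(1, 140)]
groups = partition(otus, 14)
sizes = [len(g) for g in groups]
print(sizes)
```

Each group would then be aligned with the complete reference dataset and analysed separately, as described above.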

Taxonomy assignment of nematode OTUs using phylogeny-based approach
Alignments from [43][44] were combined and supplemented with other sequences of marine nematodes available in GenBank. In order to minimize any potential errors and inconsistencies at the tree-building, alignment and placement stages, all sequences used for generating the reference alignment and reference tree were selected to be as complete as possible, with the exception of taxa for which no alternative option was available. Secondary structure annotation was manually added to all non-annotated sequences using the JAVA-based editor 4SALE [45], and all sequences were manually aligned to maximize apparent positional homology of nucleotides. The resulting alignment includes representatives of all families of marine nematodes for which sequence data are available, as well as selected freshwater, terrestrial and animal parasitic taxa (Supplementary Table 1

Morphology-based analysis of samples
The nematode fauna in the coarse sand from the Hållö site included 107 different nematode species belonging to 86 genera and 33 families (Supplementary Table 2). Of these, flotation using MgCl2 recovered 88 species from 73 genera and 26 families, while flotation using H2O recovered 101 species from 83 genera and 33 families. The differences in nematode fauna extracted using two variations of the same method are limited to rare species of different size classes (from less than 0.5 mm to over 2 mm). Relative abundance of these rare species does not exceed 0.14% (0.01-0.14%, with the average of 0.03%).

Taxonomy placement of OTUs using tree-based approaches
3.3.1. Cumulative query dataset.
Tree-based taxonomy assignment of the cumulative query dataset produced 54 well supported placements (Figure 1; Supplementary Table 6) that fulfilled the following criteria: the OTU must cluster within a monophyletic clade that has high bootstrap support (≥70%) and is at or below family level. The remaining 85 OTUs could not be placed in clades satisfying these criteria, and are thus treated as unidentified.
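The two acceptance criteria can be written as a small decision function. This is a simplified sketch: it assumes that the smallest supported clade containing the OTU has already been read off the tree, together with its bootstrap value and the families of the reference sequences it contains. The function name and input format are hypothetical.

```python
from typing import Optional, Set


def assign_from_clade(reference_families: Set[str], bootstrap: float,
                      min_support: float = 70.0) -> Optional[str]:
    """Return the family for an OTU, or None if the clade fails either criterion.

    `reference_families` holds the families of all reference sequences in the
    clade that contains the OTU; `bootstrap` is the support for that clade.
    """
    if bootstrap < min_support:
        return None  # clade is not sufficiently supported
    if len(reference_families) != 1:
        return None  # clade spans several families, so it is above family level
    return next(iter(reference_families))


print(assign_from_clade({"Chromadoridae"}, 92.0))
print(assign_from_clade({"Chromadoridae"}, 55.0))
print(assign_from_clade({"Desmodoridae", "Draconematidae"}, 98.0))
```

OTUs for which the function returns `None` correspond to those treated as unidentified in the text.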

Partitioned query dataset.
Tree-based taxonomic assignment of the partitioned query dataset produced somewhat different results compared to the cumulative query dataset: 67 OTUs were placed in monophyletic clades equivalent to family-level categories with sufficient support (Supplementary Table 6). Of these, the taxonomic placement of only 47 OTUs matched the identification produced using the cumulative query dataset, and identifications of 20 OTUs were new. Seven OTUs were not assigned using the partitioned query dataset but were positively identified using the cumulative query dataset.
[...] Table 7). There are ten additional cases in which a positive identity cannot be attained because OTUs are placed either within a paraphyletic assemblage (family Desmodoridae or Linhomoeidae) or within a closely related monophyletic clade (Draconematidae or Siphonolaimidae, respectively).
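Returning to the tree-based results, the counts reported for the cumulative and partitioned query datasets can be cross-checked with set arithmetic. The sketch below uses placeholder OTU names, since only the counts are given in the text; the number of OTUs left unassigned by both runs is derived here and is not stated in the source.

```python
# Placeholder identifiers; only the group sizes come from the text.
shared = {f"otu{i}" for i in range(0, 47)}             # same family in both runs
only_partitioned = {f"otu{i}" for i in range(47, 67)}  # new in the partitioned run
only_cumulative = {f"otu{i}" for i in range(67, 74)}   # assigned in the cumulative run only

cumulative = shared | only_cumulative
partitioned = shared | only_partitioned
assigned_by_either = cumulative | partitioned

total_otus = 139
unassigned_by_both = total_otus - len(assigned_by_either)

print(len(cumulative), len(partitioned), len(assigned_by_either), unassigned_by_both)
```

The totals of 54 and 67 agree with the figures reported for the two runs, and fewer than two thirds of the OTUs assigned by either run received the same family in both.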

EPA/PaPaRa.
The results produced using PaPaRa-based alignment and the Evolutionary Placement Algorithm are exactly the same as those obtained using mothur-based alignment and described in Section 3.4.1 (Supplementary Table 7), even though visual comparison of alignments produced by mothur and by PaPaRa revealed some differences.

Comparison of different taxonomy assignment approaches
Among the three different taxonomy assignment approaches tested (each with two variations), the Evolutionary Placement Algorithm (both variations) placed the largest number of query OTUs into family-level taxonomic categories (105 out of 139), while the CREST implementation of alignment-based assignment was the least efficient (26 out of 139). Despite such a broad range of success rates, most of the identified OTUs were assigned to the same families (Supplementary Table 8

Comparison between barcode-based and morphology-based identification
The Evolutionary Placement Algorithm (phylogeny-based approach) provided the largest number of positively identified OTUs and will be compared with the faunistic lists created by identifying nematode specimens using morphological characters. Since species-level identification cannot be achieved for most of the OTUs, the results of barcode-based and morphology-based identifications can only be compared as the number of identified OTUs/morphospecies per family

General notes
Three different taxonomy assignment approaches (with two modifications each) tested in this project provide some variation in the number of positively identified OTUs; however, the assigned identities of those OTUs that were identified were consistent, with very few exceptions

Alignment-based approach
Alignment-based approaches tested in this publication include manual analysis using BLASTN 2.5.0+ [17] against the nucleotide collection of the NCBI database and the LCAClassifier function of CREST against the built-in silvamod database [16]. Both tested approaches have their [...] Dell'Anno et al. [4] is an example where a broad similarity threshold resulted in incorrect assignment of several nematode OTUs from deep-sea samples to nematode species known to inhabit freshwater and soil and never found in the marine environment (e.g. Anaplectus porosus, Anaplectus sp., Pakira orae and Tylolaimophorus sp.).

Tree-based approach
Phylogenetic hypotheses used to infer relationships of taxa are usually thoroughly described and rigorously evaluated, and undergo comparison and testing using different alignment and tree-building algorithms. Phylogenetic trees used to identify unknown barcodes are less so [20][21].
Barcodes are by definition relatively short, hypervariable sites flanked by conserved regions. Hypervariable domains V1 and V2, which are part of the barcoding region of the 18S rRNA used in this publication, cause poor alignment and hence have a negative effect on the quality of the resulting phylogeny. Different alignment and phylogeny-inference algorithms may provide competing phylogenetic hypotheses [38], and, as a result, different placements of OTUs in the cladogram. Taxon composition and sequence quality (exclusion of incorrectly identified species, low quality and short sequences) of the reference dataset are also crucial [38], as they determine which taxa can be identified and which cannot. Even the number and composition of OTUs have a strong effect on the final phylogenetic tree, and, as a result, on the outcome of the taxonomy assignment, as shown in Section 3.3. The latter is caused by the need to align de novo the combined dataset that includes reference and query sequences: the presence of unidentified sequencing errors among query OTUs can have a negative effect on the alignment and phylogeny inference, even if all reference sequences are of high quality. This effect is global, i.e. by affecting the entire alignment, tree topology and bootstrap support, erroneous sequences can potentially cause other OTUs to be misidentified or unidentified. In conclusion, successful use of tree-based approaches to assign taxonomy to anonymous OTUs is highly dependent not only on the quality and completeness of the reference dataset and on the alignment and phylogeny inference algorithms, but also on the quality and diversity of query sequences.

Phylogeny-based approach
Phylogeny-based approaches allow the estimation of the most likely position of each OTU within the constrained phylogenetic tree and of the rank of its taxonomic placement in supraspecific categories if these are well resolved and supported in the reference phylogeny, and can even work with paraphyletic taxa. Moreover, since the reference alignment and reference phylogeny are constrained during phylogeny-based taxonomy assignment procedures, the quality of query sequences has no impact on the result, i.e. the presence of erroneous sequences among query OTUs (chimaeras) has no effect on the identification of other query OTUs. The outcome of the analysis depends solely on the quality of the reference alignment and reference phylogeny. Even minor differences in the alignment of OTUs against the reference alignment noted above (Section 3.4.2) had no effect on the results. An additional advantage of the phylogeny-based taxonomy assignment approach implemented in the Evolutionary Placement Algorithm is the possibility to use cumulative likelihood scores when assigning taxa to clades equivalent to supraspecific taxonomic categories (Supplementary Figure 3).
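The idea of cumulative likelihood scores can be sketched as follows. The sketch assumes that each query OTU comes with a list of candidate branches and a likelihood weight for each, with the weights of one query summing to at most 1, and that every branch of the reference tree has been mapped to a family where possible. The function names, the branch identifiers, the weights and the 0.9 acceptance threshold are all hypothetical and do not describe the authors' settings.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def family_support(placements: List[Tuple[int, float]],
                   branch_family: Dict[int, str]) -> Dict[str, float]:
    """Sum the likelihood weights of all candidate branches per family."""
    totals: Dict[str, float] = defaultdict(float)
    for branch, weight in placements:
        family = branch_family.get(branch)
        if family is not None:  # branches outside any family are ignored
            totals[family] += weight
    return dict(totals)


def best_family(placements: List[Tuple[int, float]],
                branch_family: Dict[int, str],
                min_cumulative: float = 0.9) -> Optional[str]:
    """Return the family with the highest cumulative weight if it passes the threshold."""
    totals = family_support(placements, branch_family)
    if not totals:
        return None
    family, weight = max(totals.items(), key=lambda item: item[1])
    return family if weight >= min_cumulative else None


# Invented example: no single branch is convincing, but the family is.
branch_family = {12: "Desmodoridae", 13: "Desmodoridae",
                 14: "Desmodoridae", 40: "Draconematidae"}
placements = [(12, 0.45), (13, 0.35), (14, 0.15), (40, 0.05)]
print(best_family(placements, branch_family))
```

In the example the best single branch carries less than half of the weight, yet the branches inside one family together carry almost all of it, which is the situation in which a cumulative score is more informative than the top placement alone.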

Metabarcoding versus morphology-based identification
Morphology-based identification procedures are strongly biased by the expertise and experience of the researcher performing the identification, as well as the state of knowledge on the diversity of particular groups of nematodes. Metabarcoding, on the other hand, should be able to better estimate the diversity of poorly known groups of nematodes, or groups for which taxonomic expertise is not available at the moment, as well as unidentifiable specimens (eggs, juveniles, damaged specimens, etc.). Moreover, metabarcoding can reveal taxa that are physically hidden and cannot be observed by the researcher during sorting and identification, such as internal parasites. Similarly to the results obtained by [52], barcode-based identification revealed the presence of endoparasitic nematodes from the families Mermithidae and Benthimermithidae in our samples. They had been overlooked during morphology-based identification, likely being juveniles within the bodies of other invertebrates.
The number of OTUs identified by metabarcoding is strongly influenced by the clustering procedures applied to the raw sequence data, and, depending on the threshold used, will give different results. Assuming that the OTUs produced through metabarcoding are equivalent to currently recognized morphospecies, the only reason metabarcoding would fail to correctly estimate the number of species in the sample is a problem with amplification of the barcoding gene. The genus Halalaimus is a good example of a problematic taxon in this case: only one Halalaimus OTU (TS5.SSU874117) was recovered with metabarcoding, and only from the Telekabeln site. Morphology-based identification recovered at least two different Halalaimus species at the Hållö site and more than eight species at the Telekabeln site, some of which were relatively common. GenBank hosts a number of Halalaimus sequences, confirming that the genus is sufficiently diverse genetically, and that our single Halalaimus OTU is unlikely to encompass multiple morphospecies, but is rather a result of amplification problems.
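The dependence of the OTU count on the clustering threshold mentioned at the start of this section can be shown with a toy example. The sketch below uses a simple greedy centroid clustering on equal-length sequences; it is not the clustering procedure used in this study (which is described in [6]), and the four short sequences are invented.

```python
from typing import List, Sequence


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Sequence[str], threshold: float) -> List[str]:
    """Return cluster centroids; a sequence founds a new cluster only if it
    is less than `threshold` identical to every existing centroid."""
    centroids: List[str] = []
    for seq in seqs:
        if not any(identity(seq, c) >= threshold for c in centroids):
            centroids.append(seq)
    return centroids


# Invented toy reads of equal length.
reads = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGGAA", "TTTTACGTAC"]
otu_counts = {t: len(greedy_cluster(reads, t)) for t in (0.80, 0.90, 0.97)}
print(otu_counts)
```

The same four reads yield two, three or four OTUs depending on the threshold, so OTU counts from different studies are comparable only if the clustering settings are comparable.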

Reference databases
Taxonomy-assignment procedures described in the literature [16,35] often rely on various releases of the SILVA database [33], which in turn is based on the sequence data published in GenBank or EMBL. These databases can be "built-in" (CREST), and completely inaccessible for the user, or "pre-made" and hard to modify (QIIME). Presence of erroneously identified sequences of nematodes and other organisms in GenBank and SILVA databases has been mentioned multiple times [36, 38, 53-54]. If the reference database is not checked for errors prior to the analysis, the results produced by any taxonomy-assignment algorithm should be evaluated using available data on geographical or ecological distribution of species, in order to avoid mistakes.
As mentioned above, the SILVA database in itself does not always follow the most recent accepted classification for certain groups of organisms. As a result, placing some of the OTUs into nematode families based on SILVA classification turned out to be incorrect. For example, the genera Paracyatholaimus and Praeacanthonchus were placed in the family Chromadoridae using QIIME, whereas they belong to the family Cyatholaimidae. Similar examples are Enoploides placed in Enoplidae instead of Thoracostomopsidae, Calyptronema in Oncholaimidae instead of Enchelidiidae, Achromadora in Chromadoridae instead of Achromadoridae, Camacolaimus in Leptolaimidae instead of Camacolaimidae, and some others. Output from CREST [16] only gives the name of the supraspecific taxon for those cases where an anonymous OTU cannot be identified to species level. This prevents proper evaluation of the assignment results and correction of assignments based on an erroneous reference sequence or incorrect classification. We do not expect any database to be able to quickly reflect changes in nematode classification, but we expect end users of these databases to be aware of the need to verify and, if necessary, to update the output of any taxonomy-assignment procedure that they may use.
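One simple way for an end user to apply such corrections is a genus-to-family lookup that overrides the family reported by the database. The sketch below contains only the genera named in this paragraph and is therefore incomplete; the dictionary and function names are introduced here for illustration.

```python
# Genus -> family according to the classification followed in this paper.
# Only the examples named in the text are listed; a real table would be longer.
FAMILY_CORRECTIONS = {
    "Paracyatholaimus": "Cyatholaimidae",
    "Praeacanthonchus": "Cyatholaimidae",
    "Enoploides": "Thoracostomopsidae",
    "Calyptronema": "Enchelidiidae",
    "Achromadora": "Achromadoridae",
    "Camacolaimus": "Camacolaimidae",
}


def correct_family(genus: str, database_family: str) -> str:
    """Return the corrected family if the genus is listed, else the database family."""
    return FAMILY_CORRECTIONS.get(genus, database_family)


print(correct_family("Enoploides", "Enoplidae"))
print(correct_family("Chromadora", "Chromadoridae"))
```

A lookup of this kind only works when the output names the matched genus or species, which is the limitation of supraspecific-only output discussed above.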
Another disadvantage of taxonomy-assignment software that uses built-in databases and offers only top-pick assignments in the output files (QIIME, CREST) is that a substantial number of OTUs are matched with environmental samples, labelled in such databases with the words "environmental" (e.g. "environmental sample"), "uncultured" (e.g. "uncultured eukaryote") and "unidentified" ("unidentified nematode"). These are themselves OTUs generated during previous metabarcoding projects and identified not by looking at actual morphological vouchers but by using one of the multiple taxonomy-assignment methods. Moreover, by giving only one top "hit" assignment, such software removes the possibility of checking whether the "second best" hit is based on sequence data from a physically observed and identified morphological voucher, and of seeing its similarity score, preventing the researcher from making educated decisions on the taxonomic identity of an OTU.

Conclusions and future prospects
The identification of OTUs is a key step in metabarcoding and it is essential that the most effective method is used (as opposed to the fastest or simplest). Ideally the barcoding sequences should be assigned taxonomic names that provide a link to all biological knowledge that may exist in relation to the organism. Misidentification will compromise the results in studies of e.g. biogeography, community structure, habitat state or presence of certain important species (invasive, rare, indicators, etc.).
Identification of OTUs should be at the appropriate taxonomic level, which is determined by the available reference sequences and the purpose of the study. In the case of marine nematodes we were able to assign our barcode sequences to family-level taxa to a high degree despite the very incomplete reference database. The relevance of family-level metabarcoding data in ecological studies remains poorly tested and requires extensive comparison with data obtained using classical approaches.
The full potential of metabarcoding is realised when sequences are identified to species level. This conveys the most information and permits more robust inferences. A prerequisite for this is taxonomic groundwork in the form of complete curated reference databases with sequences of reliably identified specimens.
We found the phylogeny-based taxonomy assignment approach to be the most efficient and the least error-prone. The alignment-based approach is less reliable because the similarity thresholds it depends on do not account for inter- and intra-taxon variations in barcode sequence, while tree-based approaches can be affected by the quality of the input OTU data. If phylogeny-based taxonomy assignment methods become widely used in nematode metabarcoding, it is imperative to create and maintain high quality reference alignments and reference phylogenetic trees to be used by researchers worldwide.

Ethics statement
There are no particular ethical aspects specific to this publication. It did not involve: (1) experiments on animals; (2) collection of protected species; (3) research on human subjects; and (4) collection of personal data.

Data accessibility
The data supporting this article are available in the electronic supplementary material.