Exploring species boundaries with multiple genetic loci using empirical data from non‐biting midges

Over the past decade, molecular approaches to species delimitation have seen rapid development. However, species delimitation based on a single locus, for example, DNA barcodes, can lead to inaccurate results in cases of recent speciation and incomplete lineage sorting. Here, we compare the performance of Automatic Barcode Gap Discovery (ABGD), Bayesian Poisson tree processes (PTP), networks, generalized mixed Yule coalescent (GMYC) and Bayesian phylogenetics and phylogeography (BPP) models to delineate cryptic species previously detected by DNA barcodes within Tanytarsus (Diptera: Chironomidae) non‐biting midges. We compare the results from analyses of one mitochondrial (cytochrome c oxidase subunit I [COI]) and three nuclear (alanyl‐tRNA synthetase 1 [AATS1], carbamoyl phosphate synthetase 1 [CAD1] and 6‐phosphogluconate dehydrogenase [PGD]) protein‐coding genes. Our results show that species delimitation based on multiple nuclear DNA markers is largely concordant with morphological variation and delimitations using a single locus, for example, the COI barcode. However, ABGD, GMYC, PTP and network models led to conflicting results based on a single locus and delineate species differently than morphology. Results from BPP analyses on multiple loci correspond best with current morphological species concept. In total, 10 lineages of the Tanytarsus curticornis species complex were uncovered. Excluding a Norwegian population of Tanytarsus brundini which might have undergone recent hybridization, this suggests six hitherto unrecognized species new to science. Five distinct species are well supported in the Tanytarsus heusdensis species complex, including two species new to science.

Potentially cryptic species are detected now more frequently than ever. However, the presence of introgression (Gay et al., 2007;Martinsen, Whitham, Turek, & Keim, 2001) and incomplete lineage sorting (Ballard & Whitlock, 2004;Heckman, Mariani, Rasoloarison, & Yoder, 2007;Willyard, Cronn, & Liston, 2009) present obstacles for correct species delimitation using single genetic markers. Moreover, deep mitochondrial genetic divergence is not always accompanied by correspondingly deep nuclear differentiation. Genealogical concordance among multiple loci can provide convincing evidence for species boundaries and validate the presence of genetically distinctive but morphologically cryptic lineages. This has been explored in several insect taxa. For instance, Fossen, Ekrem, Nilsson, and Bergsten (2016) found evidence of genetically distinct lineages in closely related northern European water scavenger beetles using multiple loci, and Low et al. (2016) delineated taxonomic boundaries in the largest species complex of black flies using multiple genes, morphological and chromosomal data. Also within the Chironomidae, several potential cryptic species have been detected by DNA barcodes and subsequently confirmed by nuclear DNA markers and careful analyses of morphological characters (e.g., Anderson, Stur, & Ekrem, 2013).
There are several methods that can be used to investigate species boundaries even with low populational sampling frequency. For example, in DNA barcoding, the so-called barcode gap assumes larger interspecific than intraspecific genetic distance. Although some studies show that barcode gaps can disappear with increased sampling and geographical coverage (Bergsten et al., 2012), many investigated groups tend to retain barcodes gaps as sampling is increased (Čandek & Kuntner, 2015;Huemer, Mutanen, Sefc, & Hebert, 2014;Marín et al., 2017). This is also the case for Chironomidae (own observation in BOLD, Lin, Stur, & Ekrem, 2015). The software Automatic Barcode Gap Discovery (ABGD) recursively searches for barcode gaps in the distribution of pairwise sequence divergences and assigns input sequences into hypothetical species based on pairwise distances. It is recognized that ABGD performs well on large barcode data sets with an appropriate prior of maximum intraspecific divergence (Lin et al., 2015;Pentinsaari, Vos, & Mutanen, 2016;Puillandre, Lambert, Brouillet, & Achaz, 2012); however, it is sensitive to singleton sequences and requires knowledge of threshold values (Pentinsaari et al., 2016;Puillandre et al., 2012).
Standard phylogenetic analyses assuming dichotomous splitting of ancestral branches can also be used to recognize species as monophyletic groups in phylogenetic trees.
Character-based, rigorously tested approaches include maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI). While these methods produce results that are based on similarities analysed in a strictly hierarchical framework, techniques using coalescent-based species delimitation combine population genetics and phylogenetics to objectively delineate evolutionary significant units of diversity. For single genetic loci, the generalized mixed Yule coalescent model (GMYC) (Pons et al., 2006) and the Poisson Tree Processes (PTP) (Zhang, Kapli, Pavlidis, & Stamatakis, 2013) are widely used to apply the phylogenetic species concept with assumed reciprocal monophyly in gene trees. The GMYC model combines a Yule species birth model with a neutral coalescent model of intraspecific branching (Fujisawa & Barraclough, 2013;Pons et al., 2006) and has been widely accepted for species delimitation based on single-locus data under many circumstances, including high singleton presence, taxon richness and the presence of gaps in intraspecific sampling coverage (Talavera, Dincă, & Vila, 2013). However, relative to other methods, GMYC has a tendency of oversplitting lineages resulting from errors in the reconstruction of ultrametric input trees (Paz & Crawford, 2012;Pentinsaari et al., 2016;Tänzler, Sagata, Surbakti, Balke, & Riedel, 2012). The method also shows a tendency of overlumping in cases where different lineages are results of rapid and recent divergences (Esselstyn, Evans, Sedlock, Khan, & Heaney, 2012).
The PTP model requires a rooted input tree and assumes that intra-and interspecific substitutions follow distinct Poisson processes and that intraspecific substitutions are discernibly fewer than interspecific substitution (Tang, Humphreys, Fontaneto, & Barraclough, 2014;Zhang et al., 2013). The bPTP model, an updated version of the original PTP with Bayesian support values, provides more accurate results for species delimitation (Zhang et al., 2013).
Single gene trees can be discordant and are not the same as species trees due to processes like incomplete lineage sorting. Using coalescent-based methods on multiple loci can therefore be advantageous as it uncouples gene trees and species trees, and the gene tree coalescences are allowed to be older than species tree coalescences. It is recognized that the coalescent-based method Bayesian phylogenetics and phylogeography (BPP) (Yang & Rannala, 2010) is efficient in delineating closely related species using multiple loci (Yang, 2015;Yang & Rannala, 2010. The BPP method implements a reversible jump Markov chain Monte Carlo (rjMCMC) search to estimate the posterior probability of species delimitation hypotheses. The method estimates ancestral population sizes and species population divergence times through estimated distributions of gene trees from multiple loci. The method requires sequence data and a guide species tree with defined topology as input and BPP can lead to false species delimitation when the guide tree is inaccurately specified (Rannala & Yang, 2013).
Statistical parsimony network analysis implemented in the TCS software provides a rapid and useful tool for species delimitation (Hart & Sunday, 2007), when applied to nonrecombinant loci. TCS calculates the maximum number of mutational steps constituting a 95% parsimonious connection between two haplotypes and then joins these into networks following specific algorithms (Templeton, Crandall, & Sing, 1992).
Currently, only a few studies have compared the performance of these different analytical methods for multiple genetic markers, especially for delineation of potentially cryptic species in insects.
The genus Tanytarsus van der Wulp, 1874 (Diptera: Chironomidae) is the most species-rich genus of the tribe Tanytarsini in subfamily Chironominae with more than 350 valid species worldwide. Larvae of Tanytarsus are eurytopic, occur in all types of freshwater, sometimes even in marine or terrestrial environments, and play an important role in freshwater biomonitoring. However, morphological determination of species in some Tanytarsus species groups can be notoriously difficult. Additionally, there are many unknown and cryptic Tanytarsus species where the boundaries remain uncertain. In a previous study, DNA barcodes uncovered several potential cryptic species within the Tanytarsus curticornis Kieffer, 1911 andTanytarsus heusdensis Goetghebuer, 1923 species complexes (Lin et al., 2015).
Based on morphologically similar characteristics in the adult male, the T. curticornis species complex previously included Tanytarsus brundini, Lindeberg, 1963, Tanytarsus congus Lehmann, 1981, T. curticornis Kieffer, 1911, Tanytarsus ikicedeus Sasa & Suzuki, 1999, Tanytarsus neotamaoctavus Ree, Jeong & Nam, 2011, Tanytarsus pseudocongus Ekrem, 1999, Tanytarsus salmelai Giłka & Paasivirta, 2009, Tanytarsus tamaoctavus Sasa, 1980. The T. heusdensis species complex included four described species: T. heusdensis Goetghebuer, 1923, Tanytarsus reei Na & Bae, 2010, Tanytarsus tamaduodecimus Sasa, 1983, Tanytarsus tusimatneous Sasa & Suzuki, 1999. The similar phenotypes within the T. curticornis and T. heusdensis species complexes likely have led to misidentifications and an underestimation of species biodiversity in Tanytarsus. Thus, these two species complexes are well suited to explore the suitability of different molecular markers and analytical methods in the analyses of species boundaries within non-biting midges. Currently to this study, a taxonomic review of the two species complexes based on morphology and DNA barcodes was conducted and formal descriptions published (Lin, Stur, & Ekrem, 2017 The goal of this study was to investigate whether different molecular markers and analytical tools give similar results when applied to a set of morphologically similar species of Chironomidae, and whether the results are comparable to those achieved from DNA barcodes or morphological analysis alone (op. cit.).

| Molecular methods and analyses
Adult specimens were preserved in 85% ethanol, immatures in 96% ethanol, and stored dark at 4°C before morphological and molecular analyses. Genomic DNA of most specimens was extracted from the thorax and head using QIAGEN ® DNeasy Blood & Tissue Kit and GeneMole DNA Tissue Kit on a GeneMole ® instrument (Mole Genetics, Lysaker, Norway) at the Department of Natural History, NTNU University Museum. When using QIAGEN ® DNeasy Blood & Tissue Kit was used, the standard protocol of the kit was followed, except that the final elution volume was 100 μl due to small specimen size. When using GeneMole DNA Tissue Kit, the standard protocol was followed, except that 4 μl Proteinase K was mixed with 100 μl buffer for overnight lysis at 56°C and the final elution volume was 100 μl. After DNA extraction, the cleared exoskeleton was washed with 96% ethanol and mounted in Euparal on the same microscope slide as its corresponding antennae, wings, legs and abdomen following the procedure outlined by Saether (1969). Vouchers are deposited at the Department of Natural History, NTNU University Museum, Trondheim, Norway, University Museum of Bergen, Bergen, Norway, or the College of Life Sciences, Nankai University, Tianjin, China.
Fragments of one mitochondrial protein-coding gene cytochrome c oxidase subunit I (COI) and three nuclear protein-coding genes (alanyl-tRNA synthetase 1 [AATS1], carbamoyl phosphate synthetase 1 [CAD1] and 6-phosphogluconate dehydrogenase [PGD]) were amplified. The primers used to amplify the four regions are shown in Table 1. DNA amplification of COI was carried out in 25 μl reactions using 2.5 μl 10× Takara ExTaq pcr buffer (CL), 2 μl 2.5 mM dNTP mix, 2 μl 25 mM MgCl 2 , 0.2 μl Takara Ex Taq HS, 1 μl 10 μM of each primer, 2 μl template DNA and 14.3 μl ddH2O. Amplification cycles were performed on a Bio-Rad C1000 Thermal Cycler (Bio-Rad, California, USA) and followed a program with an initial denaturation step of 95°C for 5 min, then followed by 34 cycles of 94°C for 30 s, 51°C for 30 s, 72°C for 1 min and one final extension at 72°C for 3 min. DNA amplifications of selected three nuclear genes were carried out using 2.5 μl 10× Ex Taq Buffer, 2 μl 2.5 mM dNTP Mix, 0.1 μl Ex Taq HS (all TaKaRa Bio INC, Japan), 0.5 μl 25 mM MgCl 2 and 1 μl of each 10 μM primer. The amount of template DNA was adjusted according to the DNA concentration and varied between 2 and 5 μl. ddH 2 O was added to make a total of 25 μl for each reaction. Amplification cycles were performed on a Bio-Rad C1000 Thermal Cycler and followed a program with an initial denaturation step of 98°C for 10 s, then 94°C for 1 min followed by five cycles of 94°C for 30 s, 52°C for 30 s, 72°C for 2 min and seven cycles of 94°C for 30 s, 51°C for 1 min, 72°C for 2 min and 37 cycles of 94°C for 30 s, 45°C for 20 s, 72°C for 2 min 30 s and one final extension at 72°C for 3 min. PCR products were visualized on a 1% agarose gel, purified using Illustra ExoProStar 1-Step (GE Healthcare Life Sciences, Buckinghamshire, UK) and shipped to MWG Eurofins (Ebersberg, Germany) for bidirectional sequencing using BigDye 3.1 (Applied Biosystems, Foster City, CA, USA) termination. Not all individuals were successfully sequenced for all three nuclear loci (Tables S1 and S2). Sequences were assembled and edited using Sequencher 4.8 (Gene Codes Corp., Ann Arbor, Michigan, USA). The forward and reverse sequences were automatically assembled by the software, and the contig was inspected and edited manually. The appropriate International Union of Pure and Applied Chemistry (IUPAC) code was applied when the ambiguous base calls existed. Sequence information was uploaded on BOLD (www.boldsystems. org) along with an image and collateral information for each voucher specimen. The sequences names were edited using Mesquite 2.7.5 (Maddison & Maddison, 2010). Alignment of the sequences was carried out using the Muscle algorithm (Edgar, 2004) on nucleotides in MEGA 7 (Kumar, Stecher, & Tamura, 2016). Introns were detected with a reference sequence (Chironomus tepperi, GenBank: FJ040616) and removed from the alignment using GT-AG rule (Rogers & Wall, 1980). After removing introns, the codons were aligned. No evidence of paralogue copies was observed in any sequences.

| Automatic barcode gap discovery (ABGD)
Although several species had fewer than three specimens, the aligned COI barcodes of T. curticornis and T. heusdensis species complexes were sorted into hypothetical species using the ABGD method to discover the existence of the DNA barcode gaps and estimate the number of molecular OTUs. The analyses were conducted on the ABGD website (http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html) with a prior p that ranges from .005 to .1 and the K2P model, following default settings.

| Phylogenetic reconstructions
All nuclear genetic markers were concatenated using SequenceMatrix v1.7.8 (Vaidya, Lohman, & Meier, 2011). Phylogenetic analyses used the partition strategies and models of sequence evolution selected based on the Bayesian information criterion (BIC) in the jModelTest 2.1.7 (Darriba, Taboada, Doallo, & Posada, 2012). We used a maximumlikelihood (ML) phylogenetic analysis on each loci and on the concatenated nuclear gene data set with RAxML8.1.2 (Stamatakis, 2006(Stamatakis, , 2014 using raxmlGUI v1.5b1 (Silvestro & Michalak, 2012), with unlinked partitions as selected by PartitionFinder (Lanfear, Calcott, Ho, & Guindon, 2012). We used 1,000 bootstrap replicates in a rapid bootstrap analysis and a thorough search for the best scoring ML tree. Results indicated no conflict between nuclear gene trees, but incongruence between mitochondrial and nuclear trees. As a result, we used a concatenated nuclear data set and mitochondrial COI data set separately to reconstruct phylogenetic relationships of all specimens sequenced. We also implemented Bayesian inference in MrBayes v3.2.6 (Ronquist et al., 2012).
In the Bayesian analyses, data sets were partitioned by gene, four chains on two runs for 20 million generations, sampled every 1,000 generations with a burn-in of 0.25. Convergence of posterior probabilities in each run was monitored using Tracer v1.6 (Rambaut, Suchard, Xie, & Drummond, 2014); the first 10% of the sampled trees were discarded as burn-in.

| GMYC
The single-threshold GMYC analyses were conducted in R v3.2.3 (R Core Team 2016) in a Linux environment, with the use of the splits package. The ultrametric single-locus gene tree required for the GMYC method was obtained using BEAST 1.8.2 (Drummond, Suchard, Xie, & Rambaut, 2012) on the reduced data set (identical sequences were excluded in RAxML), with 10 million MCMC generations under the Yule speciation model. A strict molecular clock was shown to be appropriate to infer the ultrametric trees through the model comparison using a Bayes factor test in Tracer 1.6 (Rambaut et al., 2014). For the T. curticornis species complex, the GTR + G substitution model (Tavaré, 1986) was selected for the AATS1, CAD1 and PGD genes, the HKY + G substitution model (Hasegawa, Kishino, & Yano, 1985) was selected for COI. For the T. heusdensis species complex, the GTR + G substitution model was selected for PGD, the HKY + G substitution model was selected for AATS1, CAD1 and COI. Effective sample sizes (ESS) and trace plots estimated with Tracer 1.6 were used as convergence diagnostics, and a burn-in of one million generations was used to avoid suboptimal trees in the final consensus tree. Ultrametric maximum clade credibility (MCC) trees were computed using the mean node heights with TreeAnnotator v1.8.2 for each locus gene.

| PTP
A rooted input tree for each gene was generated with RAxML using rapid Bootstrap with 1,000 replicates and the GTR + G + I substitution model. The PTP and bPTP analyses for each gene were run on a web server (http://species.hits.org/ptp/) with 500,000 MCMC generations, excluding outgroup, following the remaining default settings.

| BPP
We combined the data sets of the T. curticornis and T. heusdensis species complexes for the BPP analyses because the statistical power of BPP can be increased when closely related outgroups are included (Rannala & Yang, 2013). The multilocus Bayesian species delimitation method in BPP X1.2.2 (Yang, 2015;Yang & Rannala, 2010) was used with two concatenated data sets (three nuclear loci [AATS1, CAD1 and PGD] and all loci [AATS1, COI, CAD1 and PGD]). Two start guide species trees were estimated in *BEAST v1.8.2 (Drummond et al., 2012) on the above concatenated data sets and run with 40 million MCMC generations under the Yule Process speciation model. The HKY + G substitution model was selected for AATS1, COI, CAD1 and PGD genes for the T. curticornis and T. heusdensis species complexes. Effective sample sizes (ESS) and trace plots were examined in Tracer 1.6 and used as convergence diagnostics. A burn-in of one million generations was used to avoid suboptimal trees in the final consensus tree. Ultrametric maximum clade credibility (MCC) trees were computed using the mean node heights with TreeAnnotator v1.8.2. Trees were visualized using FigTree 1.4.3 (available at http://tree.bio.ed.ac.uk/software/figtree).
We used algorithm 0 with a default fine-tuning parameter ε = 2 and species model prior to 1 as uniform rooted trees. The estimation of the marginal posterior probability of speciation associated with each node in the guide tree is performed by summarizing the probabilities for all models that support a particular speciation event with probability values of ≥95% (Leaché & Fujita, 2010). The posterior probabilities for models can be mainly affected by the prior distributions on the ancestral population size (θ) and root age (τ), with large values for θ and small values for τ favouring conservative models containing fewer species (Yang & Rannala, 2010). As no empirical data were available for the studied species, we ran the species delimitation analyses by the following combinations of gamma distributions: 1. Θ: G (2: 1,000), τ: G (2: 200); 2. Θ: G (2: 1,000), τ: G (2: 2,000); 3. Θ: G (2: 100), τ: G (2: 200); 4. Θ: G (2: 100), τ: G (2: 2,000); 5. Θ: G (2: 100), τ: G (2: 500). All BPP analyses were run for 500,000 generations with sampling every five generations, after discarding an initial burn-in of 20,000 generations. Heredity scalars were set to 1.0 for AATS1, COI, CAD1 and PGD, while algorithm was set to "0." Every analysis was run twice to check for convergence between runs and agreement on the posterior probability of the species delimitation models.

| Sequencing results
The aligned length (bp) for the four loci used in the full analysis was as follows: AATS1 (408), CAD1 (909), COI (658), PGD (747). The number of variable and parsimony informative sites as well as the average nucleotide composition in each genetic marker is shown in Tables S3 and S4. The COI sequences were heavily AT-biased, especially in third position (>82%), in both species complexes.

| ABGD
In the T. curticornis species complex, the COI sequences were sorted into 10 molecular OTUs, but no definite "barcode gap" was observed in the pairwise K2P distances as some morphospecies showed high intraspecific divergence ( Figure S1a). For the T. heusdensis species complex, two gaps were observed ( Figure S1b), and the COI sequences were sorted into five molecular OTUs when the threshold was placed at 9% according to the second gap in the distribution of pairwise nucleotide distances ( Figure  S1b).

F I G U R E 1 Maximum-likelihood
tree based on the concatenated nuclear data set of the Tanytarsus curticornis species complex. Bootstrap support (1,000 replicates) and posterior probabilities of nodes are indicated above and below the branches, respectively. Only nodes with BS > 70% and/or BP > 0.95 are labelled

| Phylogenetic analyses
The phylogenetic analyses under ML and Bayesian inference produced identical trees in the T. curticornis and T. heusdensis species complexes for the concatenated nuclear genes data. In the T. curticornis species complex, the concatenated nuclear genes data yielded 10 well-supported monophyletic groups (Figure 1). A population of T. brundini from Sølendet, Norway, was separated from other populations of T. brundini in all three nuclear markers ( Figure 1) and in the trees resulting from analyses of a concatenated mitochondrial and nuclear data set (Figure 2). All T. brundini sequences clustered together in the trees based on COI barcodes (Figure 3). In the T. heusdensis species complex, the data set based on concatenated nuclear genes as well as the data set based on COI barcodes yielded five well-supported monophyletic groups (Figure 4a,b).

| Patterns of haplotype diversity
For the T. curticornis species complex, generally the networks based on mitochondrial and nuclear genes showed 10 haplotype groups (Figures 5 and 6). However, sequences of T. thomasi were sorted into one haplotype group in AATS1 gene network, two haplotype groups in COI and two haplotype groups in CAD1. The PGD marker gave three haplotype groups. Furthermore, sequences of T. brundini were arranged into one haplotype group in the COI network and two groups in all networks based on nuclear markers (Figures 5b and 6).
For the T. heusdensis species complex, the TCS network of mitochondrial haplotypes showed six groups where the sequences of T. reei split into two haplotype groups (Figure 7a), resulting from the high intraspecific divergence in COI sequences for this species. As expected, the TCS networks F I G U R E 2 Maximum-likelihood tree based on the concatenated mitochondrial and nuclear data set of the Tanytarsus curticornis species complex. Bootstrap support (1,000 replicates) and posterior probabilities of nodes are indicated above and below the branches, respectively. Only nodes with BS > 70% and/or BP > 0.95 are labelled based on all three nuclear alleles confirmed the results obtained in the phylogenetic trees and retrieved five genetic groups (Figure 7b-d).

| Species delimitation with GMYC
In the T. curticornis species complex, the GMYC model resulted in a slight oversplitting in COI-( Figure S2), CAD1-( Figure S3) and PGD-data ( Figure S4) varying between 10 and 14 OTUs (Table 2). Surprisingly, the GMYC analysis of AATS1 using an ultrametric tree with 44 terminals returned a result where only three OTUs were distinguished (Figure 8). This might be a result of insufficient sampling, low intraspecific divergences and recent speciation of the T. curticornis species complex (Timothy Barraclough pers. comm.). We excluded a few sequences with little divergence and generated a new ultrametric tree with 33 individuals under the same settings in BEAST. This AATS1 data set yielded seven OTUs (Figure 9). We also ran the smaller data set for AATS1 using different ultrametric input trees, but still got a lower number of distinguished clusters (3-7 OTUs) compared to the other markers. Recent and rapid divergences can result in uncertainty in the GMYC model and lead to a certain tendency of overlumping (Esselstyn et al., 2012;Reid & Carstens, 2012). However, the variation observed for AATS1 in our data is comparable to that of the other markers. It is difficult to explain why the results are so different with the AATS1 data set and we speculate that the observed pattern might be caused by a systematic error with the GMYC model.
In the T. heusdensis species complex, the GMYC analyses delimited five species with a single threshold. The most likely solution showed concordant results between all nuclear markers and corresponded well to defined morphospecies. The analyses of CAD1 ( Figure S5) and PGD ( Figure S6) both distinguished five molecular OTUs, but as we failed to amplify the AATS1 segment for T. adustus, this marker resulted in four distinct molecular OTUs for the T. heusdensis species complex ( Figure S7). For COI, GMYC analysis resulted in six distinguished clusters as geographically divergent populations of T. reei from Germany and Eastern Asia formed two separate OTUs ( Figure S8). The lack of similar pattern in the nuclear sequence data sets is difficult to explain, but could be due to higher evolutionary rate of the COI barcodes. Thus, in the T. heusdensis species complex, species delimitations based on the AATS1, CAD1 and PGD nuclear markers appear more reliable than those using the mitochondrial COI gene.

| Species delimitation with PTP and bPTP
The bPTP analysis of COI for the T. curticornis species complex failed to reach convergence by 500,000 MCMC generations, which is the upper limit of the web server. Disregarding this, the PTP and bPTP analyses yielded similar result with the GMYC analysis of the COI, CAD1 and PGD data sets delineating 10-14 OTUs (Table 3). For the marker AATS1, the PTP and bPTP analyses resulted in 14 and 16 OTUs, respectively, considerably higher than the results of the GMYC analyses as well as more than the expected nine morphospecies. The PTP and bPTP analyses of the T. heusdensis species complex yielded same results as the GMYC model, delineating 4-6 OTUs for each marker (Table 3).

| Species delimitation with BPP
Initial runs showed errors in the RJ fine-tune variable (≤0) when the parameters were set as follows: Θ: G (2: 100), τ0: G (2: 2,000) and Θ: G (2: 100), τ0: G (2: 1,000). Thus, we used the parameters as Θ: G (2: 100), τ0: G (2: 500) in the final runs. The results from BPP analyses on the concatenated data sets of both nuclear genes and all genes (including COI) showed that 15 candidate species were well supported (posterior probabilities 0.99-1.00; Table 4).  The T. curticornis species complex was divided into 10 species where the Sølendet population of T. brundini was isolated as a separate species. For the T. heusdensis species complex, both data sets isolated five species in the BPP analyses.

| DISCUSSION
Several previous studies have evaluated the performance of the species delimitation approaches used here on singlelocus data. GMYC appears to have a tendency to oversplit and sometimes overlump lineages due to sampling bias, differences in population size and speciation rates (Dellicour & Flot, 2015;Esselstyn et al., 2012;Fujisawa & Barraclough, 2013;Pentinsaari et al., 2016;Reid & Carstens, 2012;Talavera et al., 2013). PTP generates more robust results or results that are highly congruent with GMYC (Pentinsaari et al., 2016;Tang et al., 2014). While ABGD and parsimony networks appear to perform well when speciation rates are low and interspecific divergence is high (Dellicour & Flot, 2015). When sampling is comprehensive within species and effective population sizes are small, these species delimitation methods using single-locus data sets generally yield the same results. However, when intraspecific divergence is high and interspecific divergence is low, species delimitation models using single-locus data set often are unable to separate species properly. Moreover, non-monophyletic species caused by infrequent horizontal gene flow and incomplete lineage sorting may also lead to inaccurate species delimitation results when using single-locus data set (Camargo, Morando, Avila, & Sites, 2012;Fontaneto, Flot, & Tang, 2015;Fujita, Leaché, Burbrink, McGuire, & Moritz, 2012). To overcome these problems, species delimitation using multiple loci can be used. The BPP species delimitation method is perhaps the most popular method using multiple loci and has been proven efficient in species separation (Fehlauer-Ale et al., 2014;Leaché et al., 2017;Yang & Rannala, 2010). Using various species delimitation approaches with different criteria and searching a consensus from different outcomes may increase our confidence regarding species boundaries of target groups.
In this study, species delimitation analyses based on single loci using ABGD, parsimony networks, GMYC and PTP give the same results for the T. heusdensis species complex, but different results for the T. curticornis species complex. The BPP species delimitation model is based on multiple loci and provides a result that better reflects the observations on morphological divergence.
The above results show that while the COI marker divides the T. curticornis species complex into nine lineages, the three nuclear genes identify 10 lineages. The conflicting result between mitochondrial and nuclear genes is caused by a Norwegian population of T. brundini which has COI sequences similar to other populations of T. brundini, while all nuclear markers show deep divergence. We are not able to detect morphological differences between the specimens of this particular population and other populations of T. brundini at present, but have only compared adult males. It is widely recognized that the discordance among gene trees can be caused by the stochastic process of lineage sorting (Maddison, 1997;Pamilo & Nei, 1988) and numerous examples exist in literature. For instance, the phylogenetic incongruence in the Drosophila melanogaster species complex is caused by incomplete lineage sorting (Pollard, Iyer, Moses, & Eisen, 2006). Thus, incomplete lineage sorting in the nuclear markers is a possible explanation for the discordance between mitochondrial and nuclear gene trees. Another explanation can be horizontal gene transfer by infrequent hybridization between two cryptic species, resulting in equal mitochondrial genotypes while keeping divergent nuclear genomes. This pattern is previously documented for crickets (Shaw, 2002) and water fleas (Taylor, Sprenger, & Ishida, 2005) and would fit well with our observations. Based on the observed genetic divergence, we searched and found fine but consistent morphological differences that separate six of the distinct clusters of the T. curticornis species complex from previously described species (T. heberti, T. madeiraensis, T. songi, T. thomasi, T. tongmuensis and T. wangi). These taxa are diagnosed and described elsewhere (Lin et al., 2017).
The five mitochondrial DNA lineages we previously identified in the T. heusdensis species complex (Lin et al., 2015) are also recognized in the analyses based on nuclear markers. The divergence among the T. heusdensis sensu lato lineages is on par with that between other recognized Tanytarsus species, suggesting that the complex as of now comprises five distinct species. Two are recognized as new to science based on morphology (T. adustus and T. pseudoheusdensis) and are described by Lin et al. (2017).
Overall, DNA barcodes are very effective in distinguishing chironomid species and provide novel insight into the taxonomy of some groups. However, DNA barcodes occasionally fail to separate genetically distinct species and can give inaccurate results due to deep intraspecific divergence (Meier, Shiyang, Vaidya, & Ng, 2006;Zhou, Adamowicz, Jacobus, DeWalt, & Hebert, 2009). Multiple reasons why gene trees and species trees are often not the same exist (Maddison, 1997;Nichols, 2001;Rosenberg, 2002) and incomplete lineage sorting, insufficient taxon sampling, horizontal gene flow or recent speciation can be difficult to distinguish regardless of analytic method implemented. Thus, species delimitation analyses based on multiple loci with coalescent models are widely accepted as it improves the discovery, resolution, consistency and stability of our understanding of species (Fujita et al., 2012;Leaché & Fujita, 2010). Our findings are consistent with those of Dupuis, Roe, and Sperling (2012) who found that one marker is not enough for species delimitation in closely related animals and fungi. Also in insects, multiloci-based species delimitation has proved to be more suitable as it is not equally susceptible to introgression (Boykin et al., 2014;Dincă, Lukhtanov, Talavera, & Vila, 2011;Hsieh, Ko, Chung, & Wang, 2014;Malausa et al., 2011;Schutze et al., 2015;Song & Ahn, 2014). Our results are in agreement with this and demonstrate that species delimitation analyses based on multiple loci give a more credible result than a single locus.
The discovery, description and naming of cryptic species obviously are important for both biological conservation and estimates of species richness (e.g., Delić, Trontelj, Rendoš, & Fišer, 2017;Hebert, Penton, Burns, Janzen, & Hallwachs, 2004). But it also can be of significance for environmental management if the cryptic lineages have different have preferences or react differently to environment stressors (Feckler et al., 2014). We do not have sufficient information to evaluate the potential ecological differences between the species in the T. curticornis and T. heusdensis complexes, but acknowledge the possibility for such comparative studies now that molecular characterization of these taxa exists.

| CONCLUSION
In our study, species delimitations based on the AATS1, CAD1 and PGD nuclear DNA markers were largely consistent with delimitations using the mitochondrial COI gene alone. The results were only conflicting for the species T. brundini, in which nuclear markers separated a Norwegian population as a distinct species. Bayesian species delimitation based on multiple loci gives a more reliable result than single locus-based species delimitation methods. In total, 15 species of the T. curticornis and T. heusdensis species complexes were differentiated genetically. Subsequent detection of morphological characters that support these species boundaries led to the integrative discovery of eight species new to science.