Integrative taxonomy of the genus Pseudostegana (Diptera, Drosophilidae) from China, with descriptions of eleven new species

The genus Pseudostegana (Okada, 1978) currently contains thirty-nine described species. Numerous Pseudostegana specimens were collected during fieldwork in southwestern China from 2010 to 2017. Eleven new species were discovered among them and are described here: Pseudostegana alpina Zhang & Chen, sp. nov.; Pseudostegana amnicola Zhang & Chen, sp. nov.; Pseudostegana amoena Zhang & Chen, sp. nov.; Pseudostegana mailangang Zhang & Chen, sp. nov.; Pseudostegana meiduo Zhang & Chen, sp. nov.; Pseudostegana meiji Zhang & Chen, sp. nov.; Pseudostegana mystica Zhang & Chen, sp. nov.; Pseudostegana stictiptrata Zhang & Chen, sp. nov.; Pseudostegana stigmatptera Zhang & Chen, sp. nov.; Pseudostegana ximalaya Zhang & Chen, sp. nov. and Pseudostegana zhuoma Zhang & Chen, sp. nov. A key to all Chinese Pseudostegana species based on morphological characters is provided. Two mitochondrial loci (COI and ND2) and one nuclear locus (28S rRNA) were sequenced for the Pseudostegana specimens, and the concatenated data set was analysed with Bayesian inference and RAxML. Molecular species delimitation was performed using the distance-based automatic barcode gap discovery (ABGD) method. The molecular data support the morphological differences observed among these Chinese species and confirm that the new species are distinct.


INTRODUCTION
The genus Pseudostegana (Okada, 1978) is widely distributed in tropical areas from the Oriental to the Australasian region. Adult flies of this genus are yellow to black in body color, with a body length of about 2–4 mm. They are usually found resting on fallen logs, tussocks or fruits beside streams, flapping their wings slowly like butterflies (Li, Gao & Chen, 2010). Currently, a total of 39 Pseudostegana species have been described (Chen, Toda & Wang, 2005; Li, Gao & Chen, 2010): 17 spp. from Malaysia, 12 spp. from China, five spp. from the Philippines, three spp. from Papua New Guinea, two spp. from Indonesia and one sp. from Vietnam. Chen, Toda & Wang (2005) revised the taxonomy of Pseudostegana using morphological characters, and proposed six species groups (the atrofrons, javana, latiparma,
(ICZN), and hence the new names contained in the electronic version are effectively published under that code from the electronic edition alone. This published work and the nomenclatural acts it contains have been registered in ZooBank, the online registration system for the ICZN. The ZooBank LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID to the prefix http://zoobank.org/. The LSID for this publication is: urn:lsid:zoobank.org:pub:91F288F7-0083-4EBD-9B49-DF51048E9519. The online version of this work is archived and available from the following digital repositories: PeerJ, PubMed Central and CLOCKSS.

DNA extraction, PCR and sequencing
Detailed information on the samples used in the molecular study is given in Table 1. Two Parastegana species, Parastegana femorata Duda, 1923 and Parastegana brevivena Chen & Zhang, 2007, were chosen as outgroups. Genomic DNA was extracted from the abdominal tissue of a single fly, after dissection of the genitalia, using the TIANamp Genomic DNA kit (TIANGEN™, Beijing, China). DNA fragments of COI, ND2 and 28S were sequenced for each of the selected samples (Table 1). Target fragments were amplified and sequenced using published primers (Table S2). PCR products were sequenced in both directions (forward and reverse strands) at BGI (Beijing, China) on an ABI 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).

Alignment and phylogenetic reconstruction
The mitochondrial sequences obtained were aligned with the ClustalW method implemented in MEGA 6.0 (Tamura et al., 2013) and translated into amino acid sequences to screen for nuclear paralogous copies (Numts). In addition, the ratio between the number of nonsynonymous substitutions per nonsynonymous site (dN) and the number of synonymous substitutions per synonymous site (dS) was assessed in MEGA 6.0 (Tamura et al., 2013) using the Nei-Gojobori method (Nei & Gojobori, 1986). The "Q-INS-i" method in the online MAFFT software (http://mafft.cbrc.jp/alignment/server/) was applied to align the 28S data set. Phylogenetic analyses were performed using maximum likelihood (ML) and Bayesian inference (BI) methods based on the combined data set of COI, ND2 and 28S segments (only specimens with all three genes were included). The combined alignment was partitioned into seven blocks: six blocks for the first, second and third codon positions of the two mitochondrial coding genes, and one block for the rRNA gene fragment. Partitioning schemes were then searched in PartitionFinder 1.1.1 (Lanfear et al., 2012) using the "greedy" algorithm with the Akaike information criterion (AIC), and the corresponding optimal models were selected under the AIC using jModelTest v2.1.3 (Guindon & Gascuel, 2003; Darriba et al., 2012). The best partition scheme and substitution models selected for the BI and ML analyses were: (1) COI codon positions 1 and 2 + ND2 codon positions 1 and 2 + 28S, TIM+I+G; (2) COI codon position 3 + ND2 codon position 3, GTR+G+I. BI was performed in MrBayes 3.2.1 (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003) on the CIPRES Science Gateway (http://www.phylo.org). Two independent runs of 20,000,000 generations were carried out in parallel, sampling every 2,000 generations.
For each run, the first 4,000 samples were discarded as burn-in, and the two runs were combined using LogCombiner (Drummond & Rambaut, 2007) to estimate a consensus tree. ML analysis was performed with RAxML GUI 1.3 (Silvestro & Michalak, 2012) with 20 random addition replicates. Of the models selected, the GTR+G+I model was used for the ML analysis. Reliability of the ML tree was assessed by a thorough bootstrap analysis with 1,000 replicates. A posterior probability ≥0.95 in the Bayesian tree or a bootstrap support ≥70% in the ML tree was considered to indicate strong support for a given clade (Hillis & Bull, 1993; Erixon et al., 2003).
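As a quick check of the sampling settings and support criteria described above, the arithmetic can be sketched in Python. This is only an illustration: the function and variable names are hypothetical, the sample taken at generation 0 is ignored, and the support thresholds are read as "at least 0.95" and "at least 70", which is an assumption about the intended cut-offs.

```python
# Settings quoted from the text; all names here are illustrative only.
GENERATIONS = 20_000_000
SAMPLE_EVERY = 2_000
BURNIN_SAMPLES = 4_000
N_RUNS = 2


def retained_samples(generations, sample_every, burnin, n_runs):
    """Post-burn-in samples pooled across runs (generation-0 sample ignored)."""
    per_run = generations // sample_every
    if burnin >= per_run:
        raise ValueError("burn-in would remove every sample")
    return (per_run - burnin) * n_runs


def strongly_supported(posterior=None, bootstrap=None):
    """Assumed reading of the criteria: PP >= 0.95 or bootstrap >= 70."""
    pp_ok = posterior is not None and posterior >= 0.95
    bs_ok = bootstrap is not None and bootstrap >= 70
    return pp_ok or bs_ok


pooled = retained_samples(GENERATIONS, SAMPLE_EVERY, BURNIN_SAMPLES, N_RUNS)
print(pooled)  # 12000
```

Under these assumptions each run yields 10,000 samples, so the burn-in corresponds to the first 40% of each run.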

Species delimitation
Pairwise genetic distances (Kimura 2-parameter) between taxa were calculated using MEGA 6.0 (Tamura et al., 2013). DNA-based species delimitation was tested on separate data sets of COI, ND2 and 28S with the automatic barcode gap discovery (ABGD) method. The ABGD analyses were performed through the web interface (http://wwwabi.snv.jussieu.fr/public/abgd/, web version April 11, 2013), with the prior P ranging from 0.005 to 0.1 and the simple distance model. This method statistically infers the DNA barcode gap in a single-locus alignment and partitions the data into putative species based on this gap (Puillandre et al., 2012).
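For readers unfamiliar with the Kimura 2-parameter (K2P) distance, a minimal Python sketch is given here. It is not the MEGA implementation: the function name is hypothetical, and skipping sites with gaps or ambiguity codes (pairwise deletion) is an assumption, since the gap treatment used in the study is not stated. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)
```

For example, `k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` has one transition in ten sites (P = 0.1, Q = 0) and gives about 0.112, slightly above the uncorrected distance of 0.1.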

Phylogenetic analysis
The GenBank accession numbers for the DNA sequences obtained are shown in Table 1. The final alignments of COI, ND2 and 28S were 666 (252 variable, 225 parsimony-informative), 1,047 (579 variable, 469 parsimony-informative) and 1,002 (182 variable, 110 parsimony-informative) bases long, respectively. No strong evidence of Numts was found in our mitochondrial data, since neither stop codons nor frameshifts were detected within the COI and ND2 sequences. In addition, the dN/dS ratio analyses did not detect any recent COI or ND2 pseudogene.
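The stop-codon screen mentioned above can be illustrated with a short Python sketch. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which only TAA and TAG act as stop codons, and an ungapped coding fragment whose reading frame is known. The function name and the example sequences are hypothetical and are not taken from the study.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI table 5).
# TGA codes for Trp and AGA/AGG for Ser in this code, so they are not stops.
INVERT_MITO_STOPS = {"TAA", "TAG"}


def in_frame_stops(seq, frame=0):
    """Return 0-based nucleotide offsets of in-frame stop codons.

    `seq` is an ungapped coding fragment; `frame` (0, 1 or 2) is the offset
    of the first complete codon. An empty list means no stop codon was found.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in INVERT_MITO_STOPS
    ]


print(in_frame_stops("ATGTGAGGA"))  # [] because TGA is Trp in this code
print(in_frame_stops("ATGTAAGGA"))  # [3]
```

A fragment with in-frame stop codons in its expected reading frame would be flagged as a possible Numt; a clean result does not by itself rule one out, which is why the text combines this check with the dN/dS analysis.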
In general, the ML (Fig. 2) and Bayesian (Fig. S1) trees were similar in topology, especially at the terminal branches. Pseudostegana was recovered as a monophyletic genus with strong support. Within Pseudostegana, the monophyly of the fleximediata, latiparma and javana groups all received strong support. The zonaria group was separated into two clades: (I) Pseudostegana insularis + Pseudostegana silvana and (II) Pseudostegana nitidifrons + Pseudostegana amnicola sp. nov. + Pseudostegana mailangang sp. nov. The monophyly of the zonaria group was not supported by the Bayesian and ML analyses, as Pseudostegana insularis and Pseudostegana silvana were phylogenetically more closely related to the latiparma group (Fig. 3 and Fig. S1). At the species level, the phylogeny of the combined data set yielded 13 monophyletic clades and seven singletons with strong support.

Species delimitation
The values of genetic variation (K2P distance) across taxonomic levels are summarized in Tables S3, S4 and S5. Intraspecific genetic variation calculated using COI ranged from 0.0% to 6.0%, and the maximum intraspecific variation was detected in P. amnicola. In most cases, small intraspecific distances (<1%) were observed. The interspecific genetic variation ranged from 8.6% to 19.0%, so the minimum interspecific variation exceeded the maximum intraspecific variation (Fig. 3A; Table S3). Intraspecific genetic variation calculated using ND2 ranged from 0.0% to 5.6%, and the maximum intraspecific variation was again detected in P. amnicola. The interspecific genetic variation ranged from 5.6% to 20.2%. Thus the intraspecific and interspecific genetic variation slightly overlapped (Fig. 3B; Table S4). Compared with the mitochondrial genes, genetic variation in the nuclear gene 28S was small. The intraspecific genetic variation ranged from 0.0% to 0.5%, while the interspecific genetic variation ranged from 0.1% to 3.3%. The intraspecific and interspecific genetic variation largely overlapped (Fig. 3C; Table S5).
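The barcode-gap comparison above amounts to subtracting the maximum intraspecific distance from the minimum interspecific distance for each locus. The short Python sketch below does this with the ranges reported in the text. The dictionary layout and function name are hypothetical, and the first 28S range (0.0% to 0.5%) is read as the intraspecific range, which is an assumption.

```python
# K2P distance ranges (%) as reported in the text: (minimum, maximum).
REPORTED_RANGES = {
    "COI": {"intra": (0.0, 6.0), "inter": (8.6, 19.0)},
    "ND2": {"intra": (0.0, 5.6), "inter": (5.6, 20.2)},
    "28S": {"intra": (0.0, 0.5), "inter": (0.1, 3.3)},  # assumed reading
}


def barcode_gap(intra, inter):
    """Minimum interspecific minus maximum intraspecific distance (%).

    A positive value indicates a barcode gap; zero or a negative value
    means the two ranges touch or overlap.
    """
    return round(inter[0] - intra[1], 1)


gaps = {
    locus: barcode_gap(r["intra"], r["inter"])
    for locus, r in REPORTED_RANGES.items()
}
for locus, gap in gaps.items():
    status = "gap" if gap > 0 else "no gap"
    print(f"{locus}: {gap:+.1f} ({status})")
```

With these figures only COI shows a positive gap (2.6 percentage points), while the ND2 ranges touch at 5.6% and the 28S ranges overlap, in line with the pattern described in the text.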
Species delimitation with the ABGD method based on COI and ND2 resulted in 22 (Table S6) and 20 (Table S7) molecular operational taxonomic units (MOTUs), respectively. The two mitochondrial fragments were congruent for most MOTUs, but the COI-based analysis divided P. amnicola into two groups (Fig. 2). The analysis based on the 28S data set yielded the lowest number of MOTUs (Table S8).
Abdomen: Sternites usually yellow to brown. Male terminalia: Epandrium broad, sometimes slightly constricted mid-dorsally, pubescent except for anterior margin. Surstylus separated from epandrium, mostly lacking pubescence, with several setae on outer and inner surfaces. Cercus separated from epandrium, pubescent and setigerous. Hypandrium broad, large, laterally mostly with one pair of paramedian setae, mid-anteriorly connected with apical part of aedeagal apodeme by aedeagal guide. Paramere with two long sensilla distally and several small sensilla. Gonopods forming postero-median lobe, baso-laterally contiguous to parameres. Aedeagus usually with one pair of flap-like, serrated processes basally. Aedeagal apodeme long, rod-shaped, basally laterally flattened.

Systematic accounts
The fleximediata species group
4H). Pleura dark brown in male (Fig. 4K), dark brown to black in female. Legs: Mostly yellow; brown to dark on mid- and hindleg femora. Abdomen: All tergites brown (male) to dark brown (female), yellow laterally (Fig. 4I). Sternites brownish yellow in male, brown in female.

Measurements
Description: Male and female. Head: Ocellar triangle brown on posterior three-quarters, dark brown on anterior quarter (Fig. 6A). Frons brown (Fig. 6A). Face brown above, dark brown below. Clypeus black. Gena brown. Palpus dark brown, broadened in male, medially one-third as wide as long in male (Fig. 6D), brown in female. Thorax: Mesonotum brownish yellow anteriorly, yellowish brown to brown posteriorly (Fig. 6B); scutellum yellowish brown, yellow at tip (Fig. 6B). Pleura glossy, yellow on anterior one-third, dark brown on posterior two-thirds (Fig. 6E). Legs: Yellow. Abdomen: Tergites glossy, brownish yellow on second and third in male but only on second in female, the rest black (Fig. 6C). Sternites brownish to dark brown.
Diagnosis: This species is distinguished from Pseudostegana amoena sp. nov. by having the mesonotum brown, submedially with two pairs of yellow longitudinal stripes not reaching the scutellum (Fig. 6H); the pleura yellow on anterior quarter, brown on posterior three-quarters (Fig. 6K); the aedeagus ventrally protruded submedially in lateral view (Fig. 16C).

DISCUSSION
DNA sequence data can carry effective information on species boundaries, providing a useful tool for taxonomic studies (Hamilton et al., 2014; Kekkonen et al., 2015). The integration of DNA sequence data with traditional morphological characters increases the ease and reliability of both species identification and species discovery (Vogler, 2006; Cardoso, Serrano & Vogler, 2009; Kekkonen & Hebert, 2014; Roberts et al., 2016). We proposed the Pseudostegana species on the basis of morphological variation and then tested their status with DNA data. Although some new species (e.g., Pseudostegana alpina sp. nov. and Pseudostegana meiduo sp. nov.) are described from few specimens, both the morphological and the molecular analyses support our taxonomic hypotheses. The new species described here, Pseudostegana alpina sp. nov., Pseudostegana amoena sp. nov., Pseudostegana mailangang sp. nov., Pseudostegana meiduo sp. nov., Pseudostegana meiji sp. nov., Pseudostegana stictiptrata sp. nov., Pseudostegana stigmatptera sp. nov., Pseudostegana ximalaya sp. nov. and Pseudostegana zhuoma sp. nov., were recovered as distinct entities in the phylogenetic trees and the ABGD analyses.
species by having a distinct cross band at the basal position of the wing. Unfortunately, no other fleximediata group species was included in the present phylogenetic analyses for testing its phylogenetic status. We tentatively place Pseudostegana meiduo sp. nov. in the fleximediata group, but its taxonomic status should be evaluated with additional sampling. Phylogenetic analyses recovered the monophyly of the latiparma and javana groups, but the zonaria group was found to be paraphyletic, as clade I (Pseudostegana insularis + Pseudostegana silvana) was separated from clade II (Pseudostegana nitidifrons + Pseudostegana amnicola sp. nov. + Pseudostegana mailangang sp. nov.). The latter clade differs morphologically from the former in the shape of the palpus. The palpus of Pseudostegana latifasciata, Pseudostegana nitidifrons, Pseudostegana amnicola sp. nov. and Pseudostegana mailangang sp. nov. is slender and rod-like, whereas the palpus of the other zonaria group species is expanded (Chen, Toda & Wang, 2005).
Yunnan and the adjacent areas (Southwest China) are located at the junction of the Himalaya, Mountains of Southwest China and Indo-Burma biodiversity hotspots (Myers et al., 2000). In recent years, fieldwork by members of our laboratory has revealed a hidden diversity of Pseudostegana species in this area, where only eight species had previously been reported (Chen, Toda & Wang, 2005; Li, Gao & Chen, 2010). In total, Southwest China harbors 38% (19 out of 50; including the new species described in this study) of Pseudostegana species, and all of them are endemic to this region. Southwest China therefore seems to be an important center of diversification for Pseudostegana. Moreover, the short, relatively ancient branches of the phylogenetic trees suggest that adaptive radiation probably occurred during the early evolutionary history of Pseudostegana. Although we provide a molecular phylogeny for the Chinese Pseudostegana, additional systematic research, including other species groups and species in Southwest Asia, will be needed to better understand the origin and diversification of this genus.