Molecular Phylogeny of the Butterfly Genus Polytremis (Hesperiidae, Hesperiinae, Baorini) in China

Background The genus Polytremis, restricted to the continental part of the southeastern Palaearctic and northern Oriental regions, is one of the largest and most diverse lineages of the tribe Baorini. Previous studies on the genus were focused mainly on morphological classification and identification of new species. Due to the lack of effective and homologous traits of morphology, there were many challenges in the traditional classification. In this report, we reconstruct the phylogeny to provide a solid framework for these studies and to test the traditional limits and relationships of taxa. Methodology and Principal Findings We sequenced a mitochondrial and three nuclear gene fragments, coupled with an evaluation of traditional morphological characters, to determine the phylogenetic relationships for a total of 15 species representing all major species groups of the Polytremis genus in China, and to elucidate their taxonomic status. Conclusions and Significance Analysis of mitochrondial and nuclear DNA showed considerable congruent phylogenetic signal in topology at the inter-species level. We found strong support for the monophyly of Polytremis and some clades were recognized with morphological data. Thus, the COI sequence in our study could be used as a DNA barcode to identify almost all members of the genus. However, incongruences of phylogenetic analyses occurred: in contrast to the phylogenetic trees of mitochondrial COI, it was not possible for nuclear rDNA to discriminate P. gotama from P. caerulescens, suggesting a possible recent separation of these two species. Additionally, P. theca was the only species with a greater intra-specific genetic distance compared to some inter-specific genetic distances in this study and some problems associated with the cryptic diversity of the species are discussed. The results of this study will helpful to reveal the causes of the high degree of diversity of butterflies, and possibly other groups of insects in China.


Introduction
Butterflies have long served as a model system for ecological and evolutionary studies on the basis of the high degree of diversity and complexity [1]. The family Hesperiidae, commonly known as ''skippers'' includes around 4000 species, of which Hesperiinae is the largest subfamily. Baorini is a tribe of subfamily Hesperiinae and one of the largest and most diverse lineages of the tribe. The genus Polytremis Mabille, 1904 has a thick body and relatively small wings. These wings are commonly yellowish brown or dark brown [2]. The main external features are characterized by serial, linear, semi-hyaline spots and the absence of a cell spot on the underside of each hindwing (except P. gotama  which has a silver short line in discal cell), as well as the unspined mesotibia. Male genitalia are distinguished by the swollen tegumen, bifid uncus and elongated harpe [3,4].
The genus Polytremis has 18 members and is restricted geographically to the continental part of the southeastern Palaearctic and northern Oriental regions. These species, except P. minuta and P. annama, were described in China, including 11 Chinese endemic species [3][4][5][6][7]. Evans [5] recognized 12 species and the other 6 species were reported in recent years [2][3][4]6]. Since a majority of the species is gathered in southern China, we suggest this region could be the primary area where Polytremis originated and separated.
Earlier studies of the genus Polytremis were focused mainly on morphological classification, population distribution and identification of new species. There is a lack of effective and homologous traits of morphology in most species of the family Hesperiidae. Similarly, there were many challenges in the traditional classification system and phylogenetic relationship based on morphological classification in the genus Polytremis [8,9]. The inherent defects of phenotypic plasticity and genetic variability in traditional classification can also lead to lack of effective identification. [10,11].
Analysis of DNA has been widely used in the phylogenetic studies of the family Hesperiidae using mitochondrial cytochrome c oxidase I (COI) and II (COII) genes or nuclear Wingless, EF-1a genes. [12,13]. These methods are more effective and specific than traditionally morphological methods in genetic variation and phylogenetic relationships involved in sibling species, cryptic and morphologically intermediate species [14,15]. In this study, we constructed the molecular phylogenetic trees of mitochondrial gene COI and three parts of nuclear genes termed the D3 region of 28S rDNA and the V4 and V7 regions of 18S rDNA from the Chinese species of Polytremis. In addition, we discuss how far the results of molecular phylogeny are congruent with traditional classification based on morphology. We also explore the feasibility of the mitochondrial COI gene as a DNA barcode for classification and identify members of the genus Polytremis. Consequently, we tested 67 Polytremis spp. specimens by molecular techniques to: (1) discuss the relationships between the genetic lineages and morphological variation; (2) investigate whether DNA barcodes can be used to identify Polytremis spp. conveniently and accurately; and (3) assess whether there is cryptic diversity or the existence of morphological intermediates in the genus Polytremis.

Results
COI and rDNA (D3+V4+V7) from all specimens were amplified and sequenced. The results of the nucleotide substitution model detected with software DAMBE (Data Analysis in Molecular Biology & Evolution) show that the COI and the concatenated sequences (COI+rDNA) do not exhibit saturation in either transition or transversion and can be used for further phylogenetic analysis (Fig. 1).

(a) Genetic Divergences
The data set of COI alignment contains 490 nucleotide positions, of which 144 positions are variable and 97 are parsimony informative. The mean base composition of the fragment shows a strong bias of A+T (T 39.3%, C 18.5%, A 28.6% and G 13.6%), as found commonly in insect mitochondrial genomes [16]. In all, 31 haplotypes of Polytremis and outgroups were found and deposited in GenBank with accession numbers KC684389-KC684419 (Table 1). The average inter-specific K2P distance is 7.9% and the shortest distance was observed between P. gigantea Tsukiyama, 1997 and P. suprema Sugiyama, 1999 (K2P distance 1.7%). The intra-specific distance ranges up to 4.2% for the P. theca Evans, 1937. The specimens of their subspecies P. theca theca Evans, 1937 and P. theca fukia Evans, 1940 are characterized by K2P distances ranging up to 1.9% and 2.6%, respectively.
The nuclear DNA (D3+V4+V7) aligned data set contains 1048 nucleotide positions without gaps or stop codons, of which 42 positions are variable and 25 are parsimony informative. The mean base composition of the fragments shows no significant BIAS (T 24.3%, C 23.2%, A 23.8% and G 28.7%) (accession numbers KC684338-KC684388 and KC692371; Table 1). The average inter-specific K2P distance is 1.0%. The K2P distance between P. theca theca and P. theca fukia is 0.3%. P. gotama and P. caerulescens Mabille, 1876 share the same sequence.

(b) Network of Genus Polytremis
Twenty-seven mitochondrial COI haplotypes of the genus Polytremis were used for network construction with the software Network4.5 using the median-joining method [17] along with sequences of other butterflies retrieved from GenBank (presented in Table 1). The results of network are shown in Fig. 2B. The network of concatenated sequences (COI+rDNA) gives substantially the same network of relationships as that of COI (data not shown).

(c) Phylogenetics
Figures 2A shows the ML tree based on the data set of COI and reveals five main clades. Clade I contained 8 species: P. caerulescens, P. kiraizana Sonan, 1938, P. matsuii Sugiyama, 1999, P. suprema and P. gigantea are first clustered with a strong support value. This is followed by clustering of P. gotama and P. nascens Leech, 1893. Finally, the newly described species P. jigongi Zhu, 2012 is revealed [4]. Two haplotypes of P. jigongi formstrongly supported lineages and are clearly separate from the other species. The genetic data confirms the validity of the new species. Clade II contains only P. theca reported to include three subspecies [5,7], which shows a greater intra-specific genetic distance than some inter-specific genetic distances in the genus Polytremis (see above). Clade IV contains only one species, i.e., P. mencia Moore, 1877. Furthermore, the sister group relationship between P. pellucida Murray, 1875 and P. zina Evans, 1932 in Clade III is confirmed. Clade V contains three species, P. lubricans Herrich-Schä ffer, 1869, P. eltola Hewitson, 1869 and P. discreta Elwes & Edwards, 1897, which are distributed sympatrically in the oriental region throughout India and Malaya. The ML tree of concatenated sequences (COI+ rDNA) gave substantially the same topologies as that of COI (Fig. 3).

(d) Combined Morphological and Molecular Analysis
Hierarchical Cluster analysis of the 20 morphological characters yields a two-cluster solution (Fig. 4). The first cluster includes 12 species of Polytremis. The second includes the remaining three species and the outgroups. The combined analysis of morphological and molecular data returns three equally parsimonious trees with length (L) 328 steps, consistency index (HI) 0.6799 and retention index (RI) 0.8323. All supported clades from the combined data matrix are clades also appearing when the molecular data matrix is analysed on its own (Fig. 5).

(a) Genetic Divergence
The level of DNA sequence divergence reflected the taxonomic hierarchy of the original species. On the basis of our data, the lowest intra-specific COI genetic distance was observed between P. gigantea and P. suprema (K2P distance 1.7%). Intra-specific distances, with only one exception (P. theca), were shorter. The COI data confirmed the sister group relationship between P. gigantea and P. suprema, which form a monophyletic group together with P. caerulescens, P. kiraizana and P. matsuii. All of their interspecific distances were ,3% (K2P distance). This was not surprising, because the five species shared many morphological traits inclding absence of a cornuti, thin coecum penis and ear-like uncus with a pair of processes.
We found that P. theca was the only species for which the intraspecific genetic distance was greater than some inter-specific genetic distances based on COI in the genus Polytremis. The distance was still much less than the average inter-specific genetic distances (K2P distance 7.9%) of the genus Polytremis. The species was widely distributed in the south of the Qinling Mountains, except in the Hainan Province and the southern tropical regions of Yunnan Province in China [5,7]. A few subspecies were reported on the basis of morphological features of the wings [7]. Our specimens included two of them, namely P. theca theca and P. theca fukia. The COI tree revealed two distinct haplotype lineages without intermediates and the K2P distance was 4.2%. Additionally, the subspecies were separated by nuclear rDNA sequence (K2P distance 0.3%), suggesting the possible existence of a sibling species paired with allopatric distribution. This aspect warrants further study.
The average inter-specific rDNA genetic distance was 1.0% and far less than that of COI (K2P distance 7.9%). Except for P. gotama and P. caerulescens, other species in Polytremis could be distinguished with rDNA. P. caerulescens and P. gotama could be separated in the COI (K2P distance 1.9%), but showed no variation in the rDNA. The differences observed between results based on mitochondrial and nuclear markers may contribute to incomplete lineage sorting, introgressive hybridization or recent separation [18,19]. Considering that Polytremis is quite an old genus and the splits of COI of the two lineages are also fairly old, it seems that incomplete lineage sorting may not be an appealing explanation for the discordance [20]. Additionally, P. gotama has been described as an independent species by morphological features [21] and our COI data of the three specimens of P. caerulescens (from two populations) and one specimen of P. gotama revealed two distinct haplotype lines with   K2P distances of 1.9%, without intermediates, This observation may indicate that they were two species based on the morphological and molecular level. Nevertheless, more specimens from different population of the two species need to be collected and analyzed in future to see if this pattern is recovered consistently and further confirm the relationship of them. Instead, several arguments favor the recent separation assumption. As far as their morphological characteristics were concerned, P. caerulescens was considered to be closely related to P. gotama on the basis of the structural similarity of male genitalia and the cell spot on the upperside of the hindwing, which were not found in the other Polytremis species [22]. Additionally, only these two species were observed and captured at altitudes.2000 m (Zhu et al. unpublished data). P. caerulescens was a little larger than P. gotama. They both varied in other morphological traits, including the ground color of the wings and male stigma on the upperside of the forewing (see Table 2), which clearly support the existence of two closely related but distinct species. K2P distances of the COI  fragments reached 1.9%, whereas rDNA showed no sequence variation, suggesting a possible recent separation of these two species.

(b) DNA Barcoding and Tree-based Identificaion
Ideally, inter-specific divergence should be about 10 times higher than intra-specific divergence [23], which could form a barcoding gap between inter-and intra-species. However, the absence of a gap in some studies led researchers to caution against the use of a simple distance threshold-oriented barcoding approach [24][25][26]. As far as the studies of Lepidoptera are concerned, Hajibabaei et al [27] found such a gap in their dataset while Wiemers and Fiedler [26] failed to confirm this finding in their studies. Meyer and Paulay [24] assume that insufficient sampling on both the inter-specific and intra-specific level are responsible for the barcoding gap, while others argue that the main reason for an overlap can be found in inappropriate assumptions underlying a sequence from the DNA library (i.e. poor identification, alpha-taxonomy or incompatible species criteria).
Although thresholds have been proposed in Lepidoptera as 3% for COI [28], intra-and inter-specific genetic divergences did not fall into separate intervals and an obvious 'barcode gap' did not occur in COI in our study of Polytremis (Fig. 6). It was entailed by two factors. Firstly, The Interspecific K2P distances among five sister species (P. gigantean, P. suprema, P. caerulescens, P. kiraizana and P. matsuii) were less than 3%, which caused the intra-and interspecific genetic divergences overlap from 1.7% to 3%. Secondly, all intra-specific distances were less than 3% except for P. theca. However, we inferred the subspecies of P. theca could be a sibling species pair according the molecular and morphology data in the study. Regardless, for COI, the overlaps between intra-and interspecific variations would not affect identification in a thoroughly sampled evaluation [24]. The Polytremis species could be distinguished by tree-based methods and clustered with a well support, suggesting the COI sequence could be used as a DNA barcode to correctly identify almost all species in the genus (Fig. 2A).
The markers of the nuclear rDNA sequences used in our studies have been proposed as a reasonable alternative to mitochondrial COI. These markers could identify or delimit species or specieslike units. As they are not inherited maternally and avoid problems of mitochondrial markers such as introgression and pseudogenes [29].

(c) Analyses of Molecular and Morphological Data
ML tree constructed in this study based on COI showed that specimens belonging to the same species formed a monophyletic cluster including the new species, P. jigongi, whose earlier descriptions were based on morphology and geographical distribution [4]. Meanwhile, there was considerable congruence in topology of the inter-species level for both mtDNA COI and concatenated sequences ML trees indicating certain clades were well differentiated phylogenetically (Fig. 2, 3). The strong support for the monophyly of Polytremis was found in the analyses of the COI and concatenated alignments.
The results obtained by Hierarchical Cluster analysis showed traditional classification were basically consistent with molecular phylogeny of the genus Polytremis (Fig. 4). However, the morphological analysis resulted in only limited resolution based on just 20 morphological characters since the morphological characters and character states were commonly homologous in Polytremis. On contrary, molecular classification provided a more precise, lower artificial taxonomic rank. Thus, the combination of the morphological and molecular matrix was better resolved for understanding of the phylogeny of this genus.

(a) Ethics Statement
No specific permits were required for the described studies. No specific permissions were required for these locations. The location from where we collected the insects is not privately-owned or protected in any way. The insects used in the studies did not involve endangered or protected species. During the experiment, the insects were never maltreated.

(b) Samples Collection
We collected 67 specimens from 15 of the estimated 18 species in the genus Polytremis, including the newly reported P. jigongi [4], from different localities (Fig. 7, Table 1). All specimens were caught in the field and preserved by dehydration in small envelopes. The preliminary species-level identification was based on traditional morphological characteristics of wings, genitalia (see Table S1), locality and additional information [22]. Borbo cinnara and Pseudoborbo bevani, two species classed in the same tribe as Polytremis spp. served as outgroups in the phylogenetic analyses.

(c) Molecular Methods
The DNA was isolated from leg tissue using a QIAamp DNA Mini kit (QIAGEN, Hilden, Germany) essentially following the manufacturer's instructions but with some modification. Briefly, after adding proteinase K and buffer AL (QIAGENH), the mixed homogeneous solution was incubated at 70uC for 2 h. Subsequently, 200 mL of 100% ethanol was added and the mixture transferred to a QIAamp spin column. The mixture in the spin column was subjected to 3 cycles of centrifugation at full speed (14,000 g) for 1 min and the filtrate was returned to the spin column to increase the amount of DNA obtained.
We amplified approximately 490 bp of the mitochondrial COI gene, recommended as the universal and standard barcoding marker for species identification [23,30], using the primers RON For nuclear DNA, We selected three expansion segments (known as variable domains) of 18S rDNA and 28S rDNA, the slowly evolving genes used normally in higher classification studies [32,33]. The faster evolving mitochondrial gene and highly conserved nuclear genes were included in order to resolve deep polyphyletic structuring inferred in the earlier study by Fan [34]. The author divided the genus into two genera, Polytremis and Zinaida Evans, 1939, on the basis of the morphological and molecular data with a limited sample of 7 species. The ,240 bp long D3 region was amplified with primers CD3F and CD3R (5'-GGA CCC GTC TTG AAA CAC -3' and 5'-GCA TAG TTC ACC ATC TTT C -3') using a PCR protocol of 94uC for 5 min, then 32 cycles at 94uC for 45 s, annealing at 52uC for 45 s, extension at 72uC for 80 s and a final extension step at 72uC for 7 min. The ,390 bp region of the V4 gene fragment was amplified with the primer pair CV4F (5'-TGG TGC CAG CAG CCG CGG TAA -3') and CV4R (5'-CCT CTA ACG TCG CAA TAC GAA TGC CC -3'). The PCR temperature protocol was: 94uC for 5 min then 32 cycles of denaturation at 94uC for 45 s, annealing at 66uC for 45 s, extension at 72uC for 2 min and a final extension step at 72uC for 8 min. Finally, an ,400 bp long region of the V7 gene fragment was amplified using the forward primer CV7F (5'-CTT AAA GGA ATT GAC GGA GGG CAC CAC C -3') and the reverse primer CV7R (5'-GAT TCC TTC AGT GTA GCG CGC GTG -3') with the following PCR conditions: 94uC for 5 min then 32 cycles of denaturation at 94uC for 45 s, annealing at 68uC for 45 s, extension at 72uC for 2 min and a final extension step at 72uC for 8 min [19]. Extraction blanks were run in all reactions to control for contamination during the extraction and PCR processes. The amplification products were subjected to electrophoresis in a 2% (w/v) agarose gel in TAE buffer (0.04 M Tris-acetate, 0.001 M EDTA) with a DL1000 ladder size marker (Takara, Otsu, Shiga, Japan) to determine whether the amplification reactions were successful. In addition, after electrophoresis in agarose gels, the amplification products from leg tissues were    (Table 1).

(d) Molecular Analysis
The sequence data of the mitochondrial COI, nuclear rDNA and concatenated sequences (mitochondrial COI+nuclear rDNA) were aligned with published homologous sequence from butterflies in the same tribe as Polytremis spp., (e. g., Parnara spp., Pelopidas spp. etc.) as well as some out of the tribe (presented in Table 1), translated to amino acid sequences to check for nuclear mitochondrial pseudogenes (numts) and pruned to remove redundant sequences with Bioedit v.7.0 [35]. The haplotype sequence matrix was used for all subsequent phylogenetic analyses ( Table 1). For the COI and the concatenated data set, the substitution saturation was determined with DAMBE v.4.5 by plotting numbers of transitions and transversions against the Kimura 2-parameter distance (K2P) [36,37]. MEGA v4.0 was used to calculate the intra-and inter-specific genetic divergences based on the K2P model [38]. Phylogenetic trees were constructed by the ML methods using PAUP 4.0b10 [39]. Relationships among the mitochondrial COI and concatenated sequences (mitochondrial COI+nuclear rDNA) haplotypes were represented as a haplotype network obtained by the software DnaSP4.90 [40] and Network4.5 (fluxus-engineering.com) using the medianjoining method [17].

(e) Analysis of Some Morphological Features
To investigate the morphological variation between Polytremis species, we dissected, photographed and described a total of 20 morphological features of genitals and wings. These features recommended by Evans [5], Eliot [41] and Chou [42] were used to recognize groups and/or species within Polytremis. The description of morphological characters in each species is given in Table 2. A rectangular raw data matrix was constructed that contains 18 rows (species) and 20 columns (morphological variables). The Hierarchical Cluster procedure of the SPSS package (version 11.5, Chicago, IL, 2001) was employed using squared Euclidean distance as the measure and between-groups linkage as the cluster method. Twenty morphological variables are:    The various data partitions (morphological and molecular data) originated as Nexus files, opened with the program Mesquite [43], exported in Hennig86 /Nona format, and were opened and combined with the program Winclada [44]. Data were exported from Winclada and analyzed in TNT [45]. All analyses employed all four new technology search methods [46,47], using the default settings in TNT, except: the ratchet weighting was probability 5% and there were 200 iterations; tree drifting used 50 cycles; treefusing used five rounds; minimum length was set to be hit 25 times. Molecular gaps were treated as missing data. Stability of the nodes of the most parsimonious trees was assessed using 1000 bootstrap pseudoreplicats.