Molecular phylogeny of the Anopheles hyrcanus group (Diptera: Culicidae) based on rDNA–ITS2 and mtDNA–COII

The Anopheles hyrcanus group, which includes 25 species, is widely distributed in the Oriental and Palaearctic regions. Given the difficulty in identifying cryptic or sibling species based on their morphological characteristics, molecular identification is regarded as an important complementary approach to traditional morphological taxonomy. The aim of this study was to reconstruct the phylogeny of the Hyrcanus group using DNA barcoding markers in order to determine the phylogenetic correlations of closely related taxa and to compare these markers in terms of identification efficiency and genetic divergence among species. Based on data extracted from the GenBank database and data from the present study, we used 399 rDNA–ITS2 sequences of 19 species and 392 mtDNA–COII sequences of 14 species to reconstruct the molecular phylogeny of the Hyrcanus group across its worldwide range. We also compared the performance of rDNA–ITS2 against that of mtDNA–COII to assess the genetic divergence of closely related species within the Hyrcanus group. Average interspecific divergence for the rDNA–ITS2 sequence (0.376) was 125-fold higher than the average intraspecies divergence (0.003), and average interspecific divergence for the mtDNA–COII sequence (0.055) was eightfold higher than the average intraspecies divergence (0.007). The barcoding gap ranged from 0.015 to 0.073 for rDNA–ITS2, and from 0.017 to 0.025 for mtDNA–COII. Two sets of closely related species, namely, Anophels lesteri and An. paraliae, and An. sinensis, An. belenrae and An. kleini, were resolved by rDNA–ITS2. In contrast, the relationship of An. sinensis/An. belenrae/An. kleini was poorly defined in the COII tree. The neutrality test and mismatch distribution revealed that An. peditaeniatus, An. hyrcanus, An. sinensis and An. lesteri were likely to undergo hitchhiking or population expansion in accordance with both markers. In addition, the population of an important vivax malaria vector, An. sinensis, has experienced an expansion after a bottleneck in northern and southern Laos. The topology of the Hyrcanus group rDNA–ITS2 and mtDNA–COII trees conformed to the morphology-based taxonomy for species classification rather than for that for subgroup division. rDNA–ITS2 is considered to be a more reliable diagnostic tool than mtDNA–COII in terms of investigating the phylogenetic correlation between closely related mosquito species in the Hyrcanus group. Moreover, the population expansion of an important vivax malaria vector, An. sinensis, has underlined a potential risk of malaria transmission in northern and southern Laos. This study contributes to the molecular identification of the Anopheles hyrcanus group in vector surveillance.


Background
The Anopheles hyrcanus group consists of at least 25 species and is classified into the Myzorhynchus series of Anopheles, with one provisional designated member [1,2]. The members of this group are extensively distributed within the Oriental and Palaearctic regions, including a number of species capable of transmitting not only malaria [3][4][5][6] and filariasis [7,8], but also Japanese encephalitis virus [9][10][11]. According to previous studies, Anopheles sinensis and An. lesteri are the major malaria vector present in China [12]; An. hyrcanus acts as a potential malaria vector in the south of France [13,14]; An. kleini and An. pullus are the primary malaria vectors in South Korea [3]; An. sinensis, An. nigerrimus and An. peditaeniatus are the potential malaria vectors in Thailand [4]; An. hyrcanus group is among the major Anopheles species found across eight provinces in Laos [15]. As suggested in our recent study, An. sinensis is the predominant Anopheles species and suspected to be extensively distributed in the north of Laos (Phoshaly Province) [16]. Since the primary malaria vectors are classified mainly into the Hyrcanus group, it is essential to devise an efficient and precise method to identify the members of this group [17], which is an essential requirement for malaria vector surveillance in practice [18,19]. However, even trained taxonomists are unlikely-or find it extremely difficult-to accurately distinguish species within the Hyrcanus group based only on morphological properties [6,7] due to the significant variation in morphology and the adults of some closely related species exhibiting nearly identical adult morphological properties [8,9].
DNA barcoding refers to an important addition to conventional approaches based on morphology and an effective tool that is used to identify species without the need to consider life stages. A DNA marker that is evolving at the species level can contribute toward accurate phylogenetic reconstruction in the Hyrcanus group and elucidate the ambiguity arising from an improper classification process [20,21]. The internal transcribed spacer 2 (ITS2) has been commonly employed to address taxonomic problems in the Hyrcanus group due to its low intraspecific and high interspecific variability, as suggested in an abundance of studies [8,9,18,[22][23][24]. Using this marker, three newly proposed lineages revealed, including two species, An. belenrae and An. kleini, separated from An. sinensis [25], and one species showing a close relation to An. hyrcanus, with the provisional designation of An. hyrcanus sp IR [18]. The mitochondrial cytochrome c oxidase subunit region (e.g. COI and COII) was taken as the standard barcode for identifying species within an extensive range of animal taxa [18,19] and may be effective in providing barcoding data, in particular for assessing interspecific hybridization. Nevertheless, introgression in animals is considered to frequently involve mitochondrial DNA (mtDNA), as evidenced by recently appearing hybridization events among species [26,27].
In order to reconstruct the molecular phylogeny of the Hyrcanus group, it is necessary to identify the barcoding gap of ITS2 and COII. Accordingly, the specimens of identical species in different geography sites must be examined [19,28] for calculating the intraspecific and interspecific variations of COII and ITS2 within the group. The database of COII and ITS2 sequences in Gen-Bank enables reference sequences to be used for identifying Hyrcanus group species based on a comparatively extensive geographic distribution [29]. Therefore, in the present study, GenBank sequences and data from our original study in northern and southern Laos were used to reconstruct a phylogeny for the Hyrcanus group on the basis of ITS2 and COII, with the aim to determine the phylogenetic correlations between taxa with close relations. In addition, we compared rDNA-ITS2 and mtDNA-COII in terms of their efficiency to distinguish different species and to determine the genetic divergence among different species in the Hyrcanus group, thereby contributing to the identification of molecular data on mosquitoes for use in malaria vector surveillance.

Mosquito collection and identification
Adult mosquitoes were collected by overnight trapping with battery-operated CDC light traps (model 1012; John W. Hock Inc., Gainesville, FL, USA) in cattle/pig pens or human residences (rooms) from 20:00 h to 08:00 h in Pathoomphone County (Champasak Province), Pak lay County (Xaignabouli Province) and Yot Ou County (Phongsaly Province), Laos, in 2018 (Additional file 1: Fig. S1). The live adult mosquitoes were killed by freezing. Subsequent isolation and identification procedures were carried out based on gender, species and subgroup, under a dissecting microscope using standard techniques [30,31]. All mosquitoes were first morphologically sorted in the field using the keys of Das et al. [32]. Each morphologically identified specimen was kept individually in a 1.5-ml microcentrifuge tube filled with 75% ethanol and then stored at 4 ℃ for molecular confirmation of species and further processing.

Sequence alignment and phylogenetic analysis
The keywords "(species name)" and "ITS2/COII" were used to search for ITS2 or COII sequences of members of the Hyrcanus group deposited in GenBank. ITS2 and COII sequences that were distant from conspecific sequences after initial sequence alignment were eventually excluded from further analyses. In total, 691 ITS2 sequences and 368 COII sequences of the Hyrcanus group were ultimately extracted from GenBank and used in this study (Additional file 2: Tables S1, S2). These sequences were subsequently aligned and identical sequences obtained from the same dataset or species were excluded from further analysis. Ultimately, 267 ITS2 and 260 COII sequences (haplotypes) were further screened in a genetic divergence analysis and phylogenetic analysis (Additional file 2: Tables S3, S4). A total of 132 COII and ITS2 sequences from 89 An. sinensis, 18 An. peditaeniatus, 3 An. nitidus, 8 An. argyropus and 14 An. nigerrimus were generated in this study. Table 1 provides detailed information on the ITSE and COII sequences.
The ITS2 and COII sequence dataset was combined with data on fragments from our original study and records retrieved from GenBank. A multiple sequence alignment was conducted in MEGA-X [33], while the manual adjustment was made using BioEdit V7.0.9 if required [34]. Gaps were excluded from the analysis and characters were unweighted. Both the maximum likelihood (ML) tree and the Neighbor-Joining (NJ) tree were performed with 1000 bootstraps in MEGA X [33]. The NJ method generally reveals shallow intraspecific and deep interspecific divergences [19,35], for which a bootstrapped NJ tree was constructed using 1000 replicates [36] to provide a graphical representation of the phylogenetic correlations among the Hyrcanus group members. Anopheles lindesayi (GenBank accession no. AJ620898) and An. claviger (GenBank accession nos. AY129232 and DQ229313) were taken as outgroup taxa to the Hyrcanus group, in line with a prior study [37]. The visualization and the editing of the tree were performed using FigTree v1.4.2 [38].

Intra-and interspecific ITS2/COII variation
The mean intra-and interspecific K2P distances of the ITS2 sequence in 19 Hyrcanus group members and those of the COII sequence in 14 Hyrcanus group members were computed and compared using the K2P distance model [39] in this study. Tables 2 and 3 show the intraand interspecific divergences of ITS2 and COII in the Hyrcanus group. Individual species were represented as few as one and as many as 143 individuals, for a total of 399 ITS2 sequences, and by one to 140 individuals for a   [1,9,14,18,29], and further studies are required before a definitive conclusion can be drawn. Accordingly, the interspecific K2P distance varied from 0.073 between An.     (Table 3). Accordingly, the interspecific K2P distance varied from 0.025 between An. sineroides and An. paraliae to 0.081 between An. crawfordi and An. argyropus, with an average of 0.055. Based on these findings, the intragroup species divergence in the COII sequence was approximately eightfold higher than the average withinspecies divergence. The ITS2 barcoding gap ranged from 0.015 to 0.073 (Fig. 1a), while the COII barcoding gap ranged from 0.017 to 0.025 (Fig. 1b), suggesting that the ITS2 spacer can serve as a more effective marker than COII for differentiating members of the Hyrcanus group. In Fig. 2, each dot represents a species, with interspecific distance on the Y-axis and intraspecific distance on the X-axis. Notably, there are more ITS2 dots than COII dots close to the top left-hand corner of the graph.

Phylogenetic analysis
ITS2 and COII records obtained from GenBank were combined with sequences obtained in the original study and suspicious fragments (those distant from conspecific sequences after initial sequence alignment) were excluded, leaving 399 ITS2 sequences of 19 members within the Hyrcanus group, together with 392 COII sequences of 14 Hyrcanus group members to reconstruct a phylogenetic tree. The topology of the NJ tree and ML tree showed an approximate consistency in terms of main lineage, despite a slight difference in node confidence data between the two (Additional file 3: Fig. S2; Additional file 4: Fig. S3). Although the ITS2-or COIIbased phylogenetic tree was consistent with conventional morphology taxonomy in terms of species recognition, its subgroup arrangement failed to comply with that achieved under morphology-based grouping.
The NJ-K2P analysis of the ITS2 sequences resulted in the identification of two major clusters in the Hyrcanus group: the Nigerrimus and Lesteri-Unassigned species subgroups, respectively. . Each of these species was arranged on a single branch and had the homolog to its closest taxon in the tree, demonstrating their potential role as the candidate species or recent divergence. However, one An. kweiyangensis (GenBank accession no. AF261150.2) was classified into the An. liangshanensis clade, one An. engarensis (GenBank accession no. AB159604.1) was classified into the An. klein clade and one An. hyrcanus sp IR was classified into the An. hyrcanus clade (Additional file 3: Fig. S2a). All lineages covering individuals representing the same species were supported by high bootstrap data, with the exceptions of An. pseudopictus and An. hyrcanus, which exhibited barcode congruence with a significantly small interspecific distance (0.001) ( Tables 2,  3). Moreover, slight genetic divergence was also observed between An. lesteri and An. paraliae (0.042), between An. kleini and An. engarensis (0.069), between An. liangshanensis and An. kweiyangensis (0.098), between An. hyrcanus and An. hyrcanus sp IR (0.020) and between An. hyrcanus sp IR and An. pseudopictus (0.020) (Tables 2, 3).
According to the NJ-K2P analysis conducted on COII sequences, the group fell into a minimum of three major clusters. The first cluster comprised only An. nigerrimus; the second cluster included An. nitidus, An. pursati and An. argyropus; the third cluster included An. sinensis, An. belenrae, An. kleini, An. lesteri, An. paraliae, An. crawfordi, An. hyrcanus, An. peditaeniatus, An. sineroides and An. pullus. Nearly all those node-linking sequences of individuals pertaining to the identical species showed high bootstrap value; however, the correlation of An. sinensis/An. belenrae/An. kleini remained unclear. Instead, they exhibited extremely small pairwise distance data (Tables 2, 3) which led to the formation of a distinct clade with high node confidence data (Additional file 3: Fig.  S2b) and An. nitidus, which is close to the average interspecific distance (0.055). Thus, it is practically possible for these sequences from An. lesteri individuals to be incorrect, these results can presumably be attributed to the misidentification of original specimens.

Demographic history and neutrality test on the basis of ITS2 and COII sequences
A demographic history and neutrality test was further conducted using a total of 823 ITS2 and 500 COII sequences of the Hyrcanus group extracted from Gen-Bank and our original data. Tables 4 and 5 (Table 5).
A smooth and unimodal mismatch distribution was detected in An. lesteri, An. sinensis, An. hyrcanus and An. peditaeniatus using both markers, which conforms to the expected mismatch distributions under the sudden expansion model. In addition, a smooth and unimodal mismatch distribution was also detected not only in the population of An. liangshanensis on the basis of ITS2, but Table 4 Genetic diversity indices and neutrality tests (Fu's Fs and Tajima

Subdivision of the Hyrcanus group
Based on morphological properties, the Hyrcanus group can be classified into three subgroups [48,49]: the Nigerrimus subgroup, including An. nigerrimus, An. nitidus, An. pursati and An. pseudosinensis; the Lesteri subgroup, including An. lesteri, An. paraliae, An. peditaeniatus, An. crawfordi and An. vietnamensis; and species in an unassigned subgroup. Given the difficulty in identifying cryptic species by morphological characteristics alone, molecular methods have been used as powerful tools to complement traditional morphological taxonomy [10,11,17]. However, the barcoding of DNA for all members of the Hyrcanus group has scarcely been performed. Fang et al. were the first to have reconstructed the molecular phylogeny and analyzed the genetic divergence of the Hyrcanus group through two DNA barcoding markers, namely, ITS2 and COI [50,51]. Based on the GenBank database and their original study data, they used 461 ITS2 sequences of 19 species and 466 COI sequences of 18 species to reconstruct the molecular phylogeny of the Hyrcanus group across its worldwide geographic range [50,51].
Similarly, both rDNA and mtDNA were used in the present study to perform DNA barcoding. So far, there are 691 ITS2 of 19 species and 368 COII of 14 species of the Hyrcanus group deposited in GenBank (Additional file 2: Table S1). The NJ tree obtained in this study supported the monophyly of the Hyrcanus group, but the subgroup arrangement based on the rDNA and mtDNA DNA markers failed to comply with that indicated by morphological characteristics. Based on ITS2 sequences, there were two major subgroups that could be recognized: the first group includes An.  [51]. In addition, as indicated by the tree topologies in prior studies [8,9,18], An. crawfordi had the smallest distance to An. peditaeniatus; however, according to the results of the study of Fang [51] and the present study, An. crawfordi was at a distance from An. peditaeniatus, but approached An. lesteri and An. paraliae to form a single clade in the NJ tree based on ITS2. Moreover, An. pseudopictus individuals were embedded in the An. hyrcanus  [14,51]. Similarly, An. engarensis and An. kleini were found to be of the identical species according to the study of Hwang [9] and the current study. However, the results allow for some speculation on the relationship between An. kweiyangensis and An. liangshanensis. In the ITS2 tree built in the study of Fang [51] and in the present study, the two species were classified into the same clade, but they were classified into two different clades in the COI tree [50].
Notably, three major clusters were found using the COII sequences in this study: An. nigerrimus was separated from the Nigerrimus group to form the first cluster; An. nitidus, An. pursati and An. argyropus were included in the second cluster; and all remaining species were included in the third cluster. Fang et al. reported that the pairwise distance of An. paraliae and An. lesteri reached 0.019, but that it was impossible to distinguish the two species in the phylogenetic tree based on COI [50]. In this study, however, the two species were easily distinguishable in the NJ tree based on COII, despite the pairwise distance of An. lesteri and An. paraliae reaching as low as 0.017. In their study, Taai et al. considered An. paraliae to be a synonym of An. lesteri, based on the results of crossing experiments performed between these two species using data on species distributions, morphological variants, cytology and comparative DNA sequence analyses [29]. Therefore, An. paraliae and An. lesteri should be treated as two closely related species in this study. In addition, the relationship between An. sinensis, An. belenrae and An. kleini remained inconclusive in the COII tree. Showing significantly low pairwise distance values (Tables 2, 3), these species contributed to the formation of a monoclade with a high achievement of node confidence (Additional file 3: Fig. S2), which suggests the potential occurrence of ancient hybridization in the aforementioned three species with close correlations.
Due to the failure in previous studies to cover an appropriate number of species, it is difficult to detect subgroups in the phylogenetic tree, as a result of which no consistent results were produced in the current and prior studies on the genetic relationships of Hyrcanus group members. The genes submitted to GenBank were not properly presented, with a number of error sequences possibly in the database [52,53]. In this study, some error sequences related to the authors who submitted these sequences to GenBank were used to conduct the phylogenetic analysis, which led to misidentification (See Results section Phylogenetic analysis). These error sequences must be removed from future phylogenetic analyses.

Phylogenetic reconstructions with the use of COII and ITS2
An effective DNA marker must exhibit low intraspecific distances and large interspecific distances [54]. In this study, we found that the COII barcoding gap varied from 0.017 to 0.025, whereas that of ITS2 ranged from 0.015 to 0.073. The ITS2 interspecific genetic distance was 125fold higher than that of the average intraspecies difference; in comparison, the average COII divergence among group of species was merely eightfold higher than that within species.
Since mtDNA is characterized by maternal inheritance, any offspring or hybrid would have only the mtDNA of the maternal species. Accordingly, hybridization can lead to shared or highly consistent sequences in the mitochondrial genome. Therefore, the major downside of employing COII for phylogenetics research of the Hyrcanus group is that COII fails to distinguish between those species with a close relationship, i.e. An. sinensis, An. belenrae and An. kleini. As suggested by the divergent mtDNA of An. belenrae and An. kleini, the mitochondrial genomes of incipient sibling species were sympatrically replaced by those of An. sinensis extensively in many regions. In general, recent hybridization can play a role in transferring mtDNA from one species to another to cause mtDNA variation [26,55,56]. Thus, results on mtDNA are considered to be more useful when speculating about the possibility of ancient hybridization in mosquito molecular phylogeny. In contrast, rDNA has been suggested to be more reliable than mtDNA in addressing evolutionary problems with the recently diverged taxa or cryptic species of mosquitoes [51] and in building species boundaries when these can not be addressed by using mtDNA. Hence, the ITS2 marker has been widely used for species identification and phylogenetic reconstruction for the Hyrcanus group [8,9,18,22,23].

Demographic history of An. sinensis population
According to the results of the neutrality tests and mismatch distribution, a number of members of the Hyrcanus group, namely An. peditaeniatus, An. hyrcanus, An. sinensis and An. lesteri, are suspected to have experienced hitchhiking or population expansion based on both markers. These four species show an extensive distribution, and some have the potential to transmit malaria. In addition, the An. sinensis samples collected from Laos in this study showed significant negative values in the neutrality tests, according to both markers. A smooth and unimodal mismatch distribution of ITS2 and COII was observed in the population of An. sinensis from Laos, suggesting that the expansion occurred after a bottleneck was reached. According to several recent studies, An. sinensis acted as an important vector of local malaria outbreak, which highlights a potential risk of malaria