Genetic diversity and population structure of the primary malaria vector Anopheles sinensis (Diptera: Culicidae) in China inferred by cox1 gene

Anopheles sinensis is a primary vector for Plasmodium vivax malaria in most regions of China. A comprehensive understanding of genetic variation and structure of the mosquito would be of benefit to the vector control and in a further attempt to contribute to malaria elimination in China. However, there is only inadequate population genetic data pertaining to An. sinensis currently. Genetic variations and structure among populations of An. sinensis was examined and analyzed based on the nucleotide sequences of a 662 nt variable region of the mitochondrial cox1 gene among 15 populations from 20 collection sites in China. A total of 453 individuals in 15 populations were analyzed. The cox1 gene sequences were aligned, and 247 haplotypes were detected, 41 of these shared between populations. The range of haplotype diversity was from 0.709 (Yunnan) to 0.998 (Anhui). The genealogic network showed that the haplotypes were divided into two clusters, cluster I was at a high level of homoplasy, while cluster II included almost all individuals from the Yunnan population. The Yunnan population displayed a significantly high level of genetic differentiation (0.452−0.622) and a restricted gene flow with other populations. The pairwise FST values among other populations were lower. The AMOVA result showed that the percentage of variation within populations (83.83%) was higher than that among populations (16.17%). Mantel test suggested that geographical distance did not significantly contribute to the genetic differentiation (R2 = 0.0125, P = 0.59). Neutral test and mismatch analysis results showed that the An. sinensis population has undergone demographic expansions. Anopheles sinensis populations showed high genetic polymorphism by cox1 gene. The weak genetic structure may be a consequence of low genetic differentiation and high gene flow among populations, except the Yunnan samples. The Yunnan population was isolated from the other populations, gene flow limited by geographical distance and barriers. These findings will provide a theoretical basis for vector surveillance and vector control in China.


Background
Anopheles sinensis Wiedemann is an Oriental species with a wide distribution in China [1]. It is the primary malaria vector in plain regions of central China, especially in the paddy planting areas, and it has also been identified as a pathogenic vector for other disease such as Brugia malayi, one type of lymphatic filariasis [2,3]. Despite its disputable malaria vector capacity, An. sinensis is still incriminated as a competent vector for Plasmodium vivax malaria due to its abundant population size and wide distribution, which have led to occasional local malaria epidemics or outbreaks throughout history [4,5].
The morbidity of P. vivax malaria has dropped down to a historically low level since the Chinese government initiated the National Malaria Elimination Action Plan Programme in 2010 [6]; however, the ecological habitat and distribution of An. sinensis has not changed significantly [7,8]. In addition, the proportion of imported P. vivax malaria cases has notably increased in recent years [9,10], and so increased knowledge on the genetic structure and the divergence among An. sinensis populations are essential to lay out effective strategies for vector control and further malarial elimination.
As one important member of the Hyrcanus group, An. sinensis has been differentiated from its morphologically indistinguishable sibling species (An. lesteri, An. yatsushiroensis, and so on) by comparison of sequence data from the second internal transcribed spacer (ITS2) region of the ribosomal DNA (rDNA), which also greatly facilitates the subsequent genetics study [11].
To date, An. sinensis populations in China exhibit variations in morphology [12], chromosomes [13,14], ecology [12,14], vector capacity [15], the mitochondrial DNA (mtDNA) NADH dehydrogenase subunit 5 (nad5) [16] and mtDNA control region [17]. Our research team applied random amplified polymorphic DNA (RAPDs) [18] and microsatellite DNA markers [19] to detect the genetic structure of An. sinensis samples, the results revealing considerable polymorphism among populations, the genetic divergence however did not correlate with geographical distance. There were two gene pools in An. sinensis populations inferred by microsatellites, these structured populations possibly limiting the migration of genes under pressures/selections, such as insecticides and immune genes against malaria [20].
Different molecular markers may demonstrate different genetic structures. The appropriate markers are usually neutral, such as the mitochondrial cytochrome c oxidase subunit I (cox1) gene, which was used to analyze genetic variation and population structure of the anopheline mosquitoes [21][22][23][24][25][26][27]. The cox1 gene is slowly evolving compared to other protein-coding mitochondrial genes and is also widely used for reconstructing molecular phylogenies [28]. Although a few Chinese An. sinensis population genetic studies have been reported [16,18,19], to obtain a more accurate genetic structure of these populations, more molecular markers and more specimens are needed. In the present study, we sought to elucidate the genetic properties and variability of the An. sinensis populations collected from almost all distribution regions in China based on the sequences of the cox1 gene.

Mosquito collection and identification
Wild mosquito adults were caught from July 1997 to August 2010 by light traps or artificial catching aspirator at livestock corrals. With the owners' consent, the light traps were set up in pig or cow pens from 18:30 pm to 8:30 am. The number of sampling sites was 20 in 15 provinces from China. The collection information was summarized in Tables 1 and Fig. 1. Mosquitoes of the An. hyrcanus group were sorted out in the field by morphology using the identification keys [1], and kept individually in silica gel filled tubes at 4°C until DNA extraction. The An. sinensis species identity was diagnosed by PCR assay based on ITS2 rDNA sequences [11].
Genomic DNA isolation and sequencing of the cox1 mitochondrial gene fragments The genomic DNA of single mosquitoes were isolated using QIAamp DNA Mini Kit (Qiagen, Hilden, Germany), following the manufacturer's protocol. The forward (5′-GGT CAA CAA ATC ATA AAG ATA TTG G-3′) and reverse (5′-AAA CTT CAG GGT GAC CAA AAA ATC A-3′) primers were synthesized to amplify cox1 fragments [29]. The PCR kit was from Aidlab, China. PCR reactions were carried out in a Verity 96 well 157 Thermal Cycler (Applied Biosystems, Foster, USA). The cycling parameter included an initial step of denaturation at 94°C for 5 min, followed by 30 cycles of amplification at 94°C for 30 s, 45°C for 30 s, and 72°C for 30 s, with a final extension step at 72°C for 5 min. After electrophoresis, PCR products were purified and sequenced in both directions using PCR primers on an ABI 3730 automatic sequencer (Applied Biosystems, Foster, USA) by Biosune Biotech (Shanghai Co., Ltd., Beijing, China). The electropherograms were inspected manually to verify sequence quality.

Data analyses
Multiple cox1 sequence alignments were performed by CLUSTAL X [30] and edited by MEGA 6.0 [31]. The sequence differences within populations with number of haplotypes (H), the haplotype diversity (Hd), average number of nucleotide differences (K), average number of mutations per sequence (θ), number of variable sites (S) and nucleotide diversity (Pi) were estimated using DnaSP [32]. Median-joining networks of all An. sinensis haplotypes were constructed using Network 5.0 [33] to visualize relationships among unique haplotypes.
The genetic structure was analyzed with 15 populations. The percentage of sequence divergence within and between populations was calculated based on Nei & Li [34]. The pairwise F ST values for short-term genetic distance between populations were estimated with the methods of Slatkin [35] and tested for significance by permutation. The gene flow [Nm = (1 -F ST )/4 F ST ]) between localities was estimated from pairwise F ST [36]. Mismatch distributions and hierarchical analysis of molecular variance (AMOVA) were calculated using ARLEQUIN 3.11 [37]. Significant correlation between population genetic distance and linear straight geographical distances were assessed using the Mantel test implemented in Isolation by Distance Web Service (IBDWS) and significance was evaluated based on 1,000 permutations [38]. The neutrality test was evaluated by Tajima's D and Fu's Fs, which was estimated using the DnaSP software program [32].

Population sampling and An. sinensis identification
Mosquito samples were collected from 20 sites in 15 provinces in China. Fifteen populations were analyzed, in which HEN, SD and LN consisted of specimens collected from two or three sites in proximity to each other (these were pooled, as stated in Table 1). There were 453 individuals of An. sinensis mosquitoes, which were identified by PCR assay.

Sequence characteristics of cox1 gene
A segment of mtDNA, corresponding to the coding region of cox1, was successfully amplified from An. sinensis individuals. A sequence of 662 nt was obtained and analyzed. No insertion or deletion was detected across all samples. The conserved sites were 553, variable sites 107. Of these, 41 were singleton and 66 were parsimonyinformative.

Genetic variation among populations
The fixation index (F ST ) was used to evaluate the genetic distance among populations. In this study there were seven negative F ST values, comprising AH/FJ, CQ/JX, FJ/ GZ, FJ/HEN, FJ/JS, HB/GZ and HB/JS, which indicated genetic differentiation among these populations was very limited (Table 3). Across all populations, F ST values of the YN population were all greater than 0.45, and correspondingly, the gene flow (Nm) of the YN population were all much less than 1.0. In theory, it was hard to prevent genetic divergence caused by genetic drift if the gene flow (Nm) value was less than 1.0 [36].
In the hierarchical AMOVA, both the 'among populations' and 'within populations' variance components were considerably high, the latter (83.83%) contributing more to total variances than the former ( Table 4). The mean genetic divergence among populations was 0.16 by cox1 sequences.
Tests of isolation by distance were performed for all of the populations. No statistically significant correlations were detected between genetic differentiation and geographical distances based on the Mantel test (Fig. 3). The correlation coefficient was 0.11, which is not significant based on 1,000 permutations (P = 0.59). The results suggested that geographical distance does not significantly contribute to the genetic differentiation observed in An. sinensis populations.

Neutrality test and mismatch analysis
The Tajima's D and Fu's Fs values were all negative ( Table 5) by DnaSP software, which suggested many low-frequency mutations in populations as well as that the populations were in the process of expansion. The strongly negative values of Fu's Fs were observed (P < 0.01) in each population, which would be more sensitive in detecting deviations from neutrality.
Demographic expansions were analyzed using mismatch analysis; both the sum of squared deviation (SSD) values (0.03, P = 0.24) and raggedness index (Rag) (0.03) were not statistically significant in almost all the populations except the SSD value for the HEN population (Table 5) [39]. The mismatch distributions showed a smooth and main unimodal curve peaks (Additional file 2: Figure S1), which coincide with the population expansion model.

Discussion
In this study, the population genetic diversity was analyzed on An. sinensis samples obtained from 20 collection sites (22°16′N to 40°61′N, 103°51′E to 120°75′E), which covered almost the entire distribution range of An. sinensis in China [1]. Sample sizes and site distribution throughout China were adequate and provided an ample dataset to study. The results should be an objective representation, since sampling strategy and geographical coverage greatly influence the analysis and interpretation of the data generated from the samples.
There is considerable cox1 gene nucleotide diversity in anopheline mosquitoes, which has been used to explore population genetic structure [21][22][23][24][25][26] and DNA barcoding [27,[40][41][42][43][44]. The results showed that genetic polymorphism of An. sinensis populations have a high haplotype diversity (Hd = 0.709-0.998) and nucleotide diversity (Pi = 0.004-0.011), in turn suggesting that cox1 can be considered a suitable molecular marker for calculating genetic variation and detecting genetic structure. Fig. 2 Haplotype network of the cox1 as calculated by Network 5.0. Each circle (yellow) represents a haplotype, and the size of a circle is proportional to the number of individuals that contained the haplotype, the red dots (median vector) are a hypothesised haplotype, which was not detected The high level of genetic diversity indicated that the species could maintain a relatively large effective population size by a broad tolerance to environmental and habitat pressure. Due to the evolutionary rates of different molecular markers, the genetic divergence showed different degrees, e.g. Pi values as 0.61−1.00 of cox1 gene, compared to 0.24−0.65 of cox2 in An. sinensis samples [27]; θs values as 0.58−4.285 of cox2, while as 0.274−3.545 of cytb in An. lesteri populations [45].
Overall, the results showed that there were low genetic differences and high gene flow among different populations except in the case of the YN population, which was similar with that previously detected in An. sinensis populations in China using microsatellites and other molecular markers. There was also no correlation between genetic differences and geographical distance [16,18,19,46]. The distribution range of An. sinensis in China was wide, with a large population size and similar ecological habit [1]. Gene flow and introgression between individuals occurred easily, the expansion and spread of genes responsible for immunity against malaria or insecticide resistance thus highly probable between the populations [20,47]. Yunnan, a mountainous area in southwest China, is noted as a center of biodiversity because of its highly complex topography [48,49]. It was obvious that the YN population of An. sinensis was relatively unique in this study as in other reports [16,46]. The finding also reported that there was a great difference between YN and other populations in the pyrethroid resistance mechanism [47,50]. Therefore, both physical distance and heterogeneous landscape could be factors inhibiting gene flow between the YN population and others. In the Republic of Korea (ROK), An. sinensis populations represented positive and significant isolationby-distance patterns by microsatellites and mtDNA control region, and both molecular markers showed the Taebaek and Sobaek Mountain ranges to be barriers between the northern and southern parts of the ROK [17,51].
The genealogy network showed that these haplotypes of An. sinensis were divided into two clusters. Cluster I included all samples from 15 populations and cluster II included most samples from the YN population, this supported by the phylogenetic findings showing the YN populations also as an independent clade (Additional file 3: Figure S2). However, three clusters of An. sinensis populations were detected over seven Chinese provinces by mtDNA-ND5 gene. All three clusters were observed in An. sinensis samples collected from different sites demonstrating apparent differences in relative abundance for given clusters [16]. The CI was similar with cluster I, while CII and CIII were added together corresponding to cluster II of this study. But two gene pools grouping An. sinensis samples by microsatellites were difficult to correspond to the two clusters by cox1 in the present study [19].  Anopheles sinenesis population genetic variation detected by the microsatellites and the mtDNA were compared. Firstly, microsatellite results sorted the Chinese An. sinensis populations into two gene pools, six of 14 populations were mixed with individuals from both gene pools, indicating the coexistence of two genetic units in the areas sampled [19]. In this study, 15 populations were divided into two clusters, and almost all individuals in cluster II were from the YN population. Secondly, the level of F ST values in An. sinensis populations from China used by microsatellites showing a range of 0.004 −0.048 [19] compared to a range of 0.000−0.452 in this study. Similar the report in ROK, the range of F ST was 0.000−0.110 by microsatellites [51], while 0.000−0.3125 by mtDNA control region [17]. Thirdly, no overall correlation between genetic and geographical distance was detected in populations of An. sinensis in China, both by mtDNA and microsatellites, unlike in ROK [17,51]. This may be due to China's wide geographical area, resulting in dilution of the impact imposed by geographical barriers.
In this study, the neutrality test values of Tajima's D and Fu's Fs were all negative, which suggensted that An. sinensis population expansion events may have occurred in the demographic history. Moreover, the distribution of pairwise nucleotide differences was characteristic of a population that has undergone a large expansion. Furthermore, small and not statistically significant mismatch analysis statistics SSD and Rag supported the hypothesis of population expansion. These findings were also consistent with previously reported results based on the ND5 gene [16]. These results were supported by many low-frequency mutations in populations and with possible effects of purifying selection, or population expansion of An. sinensis in these locations.

Conclusions
A better understanding of genetic diversity of local An. sinensis and metapopulation dynamics could provide important information for the epidemiological surveillance and malaria vector control strategy. In this study, our collection sites covered most of the distribution in which malaria used to be meso-endemic or hypo-endemic in its regions in China. Anopheles sinensis populations showed high genetic polymorphism by cox1 gene. The weak genetic structure may be a consequence of low genetic differentiation and high gene flow among  Abbreviations: SSD sum of squared deviation, Rag raggedness index *P < 0.01 populations, except in the case of the YN population. The population structure implies that the expansion and spread of genes responsible for immunity against malaria or insecticide resistance would be more probable between the populations. The YN population was isolated from the other populations with its large pairwise F ST value, gene flow limited by geographical distances and barriers, thus forming a separate cluster (cluster II). Anopheles sinensis population size was large owing to the recent expansion of these populations in China.

Additional files
Additional file 1: Table S1. GenBank accession numbers for the Anopheles sinensis haplotypes among populations. (PDF 2557 kb) Additional file 2: Figure S1.