Molecular evolution and population genetics of a Gram-negative binding protein gene in the malaria vector Anopheles gambiae (sensu lato)

Background Clarifying the role of the innate immune system of the malaria vector Anopheles gambiae is a potential way to block the development of the Plasmodium parasites. Pathogen recognition is the first step of innate immune response, where pattern recognition proteins like GNBPs play a central role. Results We analysed 70 sequences of the protein coding gene GNBPB2 from two species, Anopheles gambiae (s.s.) and An. coluzzii, collected in six African countries. We detected 135 segregating sites defining 63 distinct haplotypes and 30 proteins. Mean nucleotide diversity (π) was 0.014 for both species. We found no significant genetic differentiation between species, but a significant positive correlation between genetic differentiation and geographical distance among populations. Conclusions Species status seems to contribute less for the molecular differentiation in GNBPB2 than geographical region in the African continent (West and East). Purifying selection was found to be the most common form of selection, as in many other immunity-related genes. Diversifying selection may be also operating in the GNBPB2 gene. Electronic supplementary material The online version of this article (doi:10.1186/s13071-016-1800-2) contains supplementary material, which is available to authorized users.


Background
In order to complete their life-cycle, the malaria parasites Plasmodium sp. have to go through important stages within their mosquito vectors Anopheles sp., before being transmitted to human hosts. Malaria control strategies based on obstructing the parasite life-cycle within the mosquito are dependent on an understanding of the mosquito anti-pathogen defence system [1,2]. This has been facilitated by the availability of the Anopheles gambiae genome sequence [3,4].
The mosquito innate immune system constitutes a major barrier to infection [5,6]. The first step of the innate immune response is pathogen recognition, which is activated by pattern recognition receptors (PRRs) that bind to pathogen-associated molecular patterns [7]. One important group of PRRs are the Gram-negative bacteria-binding proteins or glucan-binding proteins (GNBPs). These were initially identified in An. gambiae due to the similarities with GNBPs from other insects and because they are transcriptionally upregulated following infection with bacteria and Plasmodium parasites [8]. Six members of this gene family are expressed in An. gambiae and function as PRRs by binding ß-1,3-glucan and lipopolysaccharide on the surface of pathogens.
GNBPs are divided into two distinct sequence groups: subfamily A, that includes all known fruit fly and moth as well as two mosquito sequences (GNBPA 1 and 2); and subfamily B that is mosquito-specific (GNBPB 1, 2, 3 and 4) and probably a result from gene duplication [7].
Prior work reported that GNBPs are transcribed in multiple tissues (hemocytes, midgut, and salivary glands) and while they are all upregulated following an immune challenge, they vary in their antimicrobial specificities [7][8][9][10][11][12][13][14]. Specifically, GNBPs have been shown to regulate immune gene expression through the Toll or the IMD (Immune Deficiency) pathways. Certain members are able to mediate Plasmodium oocyst intensities in An. gambiae [13] and one GNBP homologue in An. gambiae was highly expressed in the fat body and salivary glands [8,15]. On the other hand, in cultured Anopheles cells infected with Wolbachia strains, GNBPB1 gene was downregulated by Wolbachia infection [16]. As for GNBPB2, it has been shown to be induced by challenges with Salmonella typhimurium [13] and Beauveria bassiana [11]. It was also upregulated upon challenge with Escherichia coli [13].
In sub-Saharan Africa, most malaria transmission is sustained by members of the Anopheles gambiae complex. Within the nominal species two molecular forms (denoted M and S) were previously described (see [17] and references therein). Recently, these molecular forms were reclassified as distinct species, and the M-form was named Anopheles coluzzii, while the S form retained the nominotypical name An. gambiae [17].
Early genome-wide genotyping studies have shown that most of the genetic divergence between An. gambiae (s.s.) and An. coluzzii is concentrated in three relatively small centromeric regions in X, 2L and 3L [18][19][20]. Differentiation was also detected in immunity genes between the two species [21][22][23] with the most remarkable case being the near fixation in An. coluzzii of an allelic variant of the thioester-containing protein 1 (TEP1) [23]. TEP1 is an important component in the innate immune response of An. gambiae to Plasmodium infection, which targets malaria parasites for destruction during their initial invasion of the body cavity. Several studies have addressed the molecular evolution and genetic diversity of the anti-malaria immune genes of An.
In order to untangle the modes of selection operating in the gene GNBPB2 and better understand its evolution in malaria vectors, patterns of genetic diversity and population differentiation were examined in samples of An. gambiae (s.s.) and An. coluzzii from six sub-Saharan African countries.

Mosquito sampling
Mosquito samples analysed in this study were collected mainly indoors by various methods of adult sampling during the rainy season in seven localities from six sub-Saharan African countries, within the framework of epidemiological surveys. Details on these collections can be found in Additional file 1: Table S1. After collection, individual specimens were kept in silica gel filled tubes.

DNA extraction, PCR amplification and sequencing
Genomic DNA was extracted from each specimen as described in Collins et al. [35]. Species identification of the members of the An. gambiae complex was carried out by PCR-RFLP as described in Favia et al. [36].
The primers used to amplify the GNBPB2 gene were designed based on the complete An. gambiae genome at Ensembl (sequence annotated AGAP002798). These primers are described in Table 1 and available at NCBI Probe database (Pr032290638).
Nested PCR assays were performed in 50 μl reaction volumes with final reagent concentrations of 1× reaction buffer, 3 μM of MgCl 2 , 4 μM dNTPs, 0.5 μM of each primer (except for the centre primers with 0.1 μM), and 0.05 U/μl of Taq DNA polymerase. PCR cycling conditions consisted in 2 min of initial denaturation at 95°C, followed by 35 cycles of 1 min at 95°C, 30 s at 51°C, 1 min at 72°C and a final extension step of 5 min at 72°C. For the primers out, the intermediate step of 72°C lasted for 2 min. For the primers centre, the annealing temperature was 55°C. PCR products were purified with Table 1 Primer sequences used to amplify the GNBPB2 gene (NCBI Probe database accession number: Pr032290638) GNBPB2-centre -R GCCWCGRTAGTCCATATTGC SureClean kit (BIOLINE, London, UK) and commercially sequenced by Macrogen, Korea.

Data analysis
Sequences were edited and aligned with BioEdit Sequence Alignment Editor version 7.0.5.2 [37]. In DnaSP version 5.10.01 [38] the different coding and non-coding regions were defined and the translated sequences were obtained. Summary statistics, including the number of segregating sites (S), number of haplotypes (Hap), Haplotype diversity (Hd), nucleotide diversity (π), and the standard neutrality tests: Tajima's D [39], Fu and Li's D* and F* [40] and Fu's Fs [41] were calculated using DnaSP. This program was also used to compute π between species and the π(a)/π(s) ratios along the gene GNBP2 using the sliding window option (window length = 50 bp; sliding interval = 10 bp).
Additionally, we performed the dN/dS test for detecting selection implemented in the HYPHY program [42]. This test was also executed in a new alignment with the original sequences of the exon 2 obtained in the present study, to which 31 sequences of the same exon (599 bp) of An. gambiae from Cameroon available on GenBank (accession numbers AM774987-AM774989, AM774998-AM775011 [21,34], AM900863-AM900876 [34]) were added. The same set of exon sequences was analysed for a recombination detection using RDP4 software [43] and no evidence of recombination was found.
Genetic differentiation among populations was quantified by computing pairwise F ST (conventional F-statistics from haplotype frequencies). Slatkin's linearized F ST estimates were tested for correlation with pairwise measures of the natural logarithm of the geographic distance using Mantel's test [44]. In order to evaluate if some populations contribute differently than others to the average F ST , population specific F ST indices were also calculated [45].
In order to estimate the total percentage of variance attributable to differences between species and among geographic areas (western and eastern Africa), a standard analysis of molecular variance AMOVA was performed with 5000 permutations [46]. These estimates were obtained with Arlequin version 3.11 [47] using the complete sequence for 70 individuals.
Sequential Bonferroni corrections adjusted critical probability values for multiple tests to minimize type I errors [48].

Results
We obtained 70 sequences of 1335 bp. Twenty-four of the mosquitoes corresponded to An. coluzzii from Angola, Ghana-Okyereko and Guinea-Bissau. The remaining 46 samples corresponded to An. gambiae from Gabon, Ghana-Accra, Ghana-Okyereko, Guinea-Bissau, Mozambique and Tanzania. All sequences are available in the GenBank database under accession numbers: KX620787-KX620856.

Genetic diversity and neutrality tests
The alignment of the 70 sequences resulted in 135 segregating sites defining 63 distinct haplotypes (Additional file 2: Table S2).
Summary diversity statistics are presented in Additional file 3: Table S3 The levels of haplotype diversity were very high and similar among populations and between species (0.911-1.000). The levels of nucleotide diversity compared between species are presented in Fig. 1.
The translation of DNA sequences generated protein sequences with 391 amino acids. In GNBPB2, protein diversity was large. We obtained 17 proteins for An. gambiae, with 13 showing a frequency equal to one. The most common proteins had a frequency of 11, accounting for 24 % of the proteins detected. In An. coluzzii, we obtained 12 proteins, seven of which showed a frequency equal to one. The most common proteins had a frequency of 8, accounting for 24 % of the proteins detected (Fig. 2).
For the total sequence, all neutrality tests showed negative results (non-significant, Additional file 3: Table S3). However, significant negative values for Fu and Li's D* and F* were detected when exon 1 was analysed separately. Furthermore, the D* value for the exon 2 was also significantly negative, over all populations. Considering each population individually, a significant positive value of Tajima's D test was obtained for the intron section in mosquitoes from Angola.
If most non-synonymous mutations are deleterious, then the rate of non-synonymous evolution will be lower than the neutral rate, resulting in π(a)/π(s) and dN/dS ratios < 1 [49]. Since our data revealed π(a)/π(s) ratio values lower than one, this suggests negative or purifying selection (Fig. 3). This ratio was particularly low in exon 2 (Additional file 3: Table S3).
From the dN/dS test performed with HyPhy, in the whole coding region we found two positively selected sites and 45 negatively selected sites, mainly evident in exon 2 (with only one site under positive selection and 40 sites under negative selection). As for exon 1, the selection signature was negligible (one site positively selected and another negatively selected).
When we focused on exon 2 by analysing more sequences from GenBank in a new alignment of 101 sequences, the number of negatively selected sites increased to 60, and with only two sites under positive selection.

Genetic differentiation and population structure
Global F ST among geographic locations was 0.018 (P < 0.003, Table 2) when the complete sequence was analysed, and 0.021 (P < 0.003) when only the coding regions (exon 1 and 2) were considered. The results presented hereafter refer to the whole sequence.
The pairwise differentiation (F ST ) estimates ranged from 0 to 0.066, and all comparisons were non-significant (after Bonferroni correction). The same pattern was obtained when the species status was also taken into account, e.g. samples from Guinea Bissau and Ghana (Okyereko) were divided in two, one corresponding to An. coluzzii individuals and the other to An. gambiae (s.s.). The overall F ST between species was 0.006 (P = 0.027).
The partition of molecular variance was 0.14 % of total variance (P ≥ 0.05) between species and 1.75 % (P < 0.01 in Table 3) among sample sites. On the other hand, , independent of species status, the molecular variance among groups increased to 2.26 % (P = 0.03), and the variance among sample sites was reduced to 0.85 % (P ≥ 0.05). Furthermore, the combination with the maximum variation among groups (2.75 %, P < 0.01) was the one comprising three groups: Angola, East Africa and West Africa without Angola (Table 3).

Discussion
The levels of nucleotide diversity for GNBPB2 detected in this study are comparable with the GNBP gene reported by Lehmann et al. [30] and other immune-related genes in An. gambiae [24,27,32] and Drosophila melanogaster [50,51]. As an exception, Morlais et al. [24] found levels of nucleotide diversity for GNBPB1 ten times smaller (Π = 0.0016) than the values of GNBPB2, but this work was based on the analysis of laboratory strains rather than wild populations so that comparisons may not be straightforward. The patterns of high protein variation in GNBPB2, and high haplotype diversity in exon 2 may be consistent with diversifying selection, a mode of selection that maintains high levels of diversity (e.g. MHC genes in mammals [52]) and fits well with the role of immune recognition [53]. This selection for hyper-variability was not excluded for the GNBP gene in the study from Lehman et al. [30] that also presented large protein diversity.
Our study showed very low levels of genetic differentiation between species (F ST = 0.006, P = 0.027), when compared with values obtained in other population studies in several immune-related genes [21,22]. Unlike these studies where only samples from one village were used (from Cameroon in [21] and Burkina Faso in [22]), we sampled mosquitoes from nine countries, ranging from Guinea Bissau in western Africa to Mozambique in the Southeast of the continent. Overall, the effect of geographical distance among populations was more decisive in our study than that of species status. Indeed, the genetic discontinuity between West and East Africa accounts for 2.2 % (P = 0.03) of the total variance (Table 3), while the hierarchical genetic diversity analysis revealed that 0.14 % (P > 0.05) of the total variance arose from differences between the two species. Furthermore, we also detected a pattern of isolation by distance made evident by the significant positive correlation between genetic differentiation and geographical distance. This scenario has already been reported for An. gambiae in an extensive analysis of neutral markers on a large geographic scale by [54].
In terms of genetic structure, the sample of An. coluzzii from Angola stands out in this study. The variance among groups was maximized when three groups of samples were defined: West Africa, East Africa and Angola (2.76 % of variation, P = 0.01, Table 3). The population from Angola is the most differentiated   Table 2). This value is 1.7 times the global F ST among populations, and is higher than the indices calculated for the most eastern populations (Mozambique and Tanzania). Such differential contribution to the average F ST may suggest special evolutionary constraints in the population [55]. In fact, in Angola a significant positive value of the D Tajima's neutrality test [39] in the intron section of the GNBPB2 (Additional file 3: Table S3) was detected, which signifies an excess of intermediate frequency polymorphisms, indicating a possible decrease in population size. Overall our findings may reflect a population structuring associated with different African biomes as reported by Pinto et al. [56]. These authors analysed An. gambiae samples from 12 African countries with 13 microsatellite loci and reported a strong population structuring within An. coluzzii, which was divided into three distinct genetic clusters (west, central, and southern Africa). These clusters were associated with the central African rainforest belt and northern and southern savannah biomes, suggesting limited gene flow between them. Furthermore, a study based on sequence analysis of an X-linked locus revealed that the majority of An. coluzzii individuals in Angola had a 16-bp insertion that was fixed in An. gambiae but absent in An. coluzzii individuals from west and central Africa [57], a finding that suggests interspecific introgression may have occurred in this geographical region.
The results of neutrality tests were generally variable and non-significant. Because both, selective events or demographic changes can produce similar deviations from neutrality in these tests [58] we used also the dN/dS ratio test that is not sensitive to demographic events, to help in the detection of selection effects. In both exons, dN/dS was <1, which signifies a rate of non-synonymous evolution lower than neutral rate, due to most nonsynonymous mutations being deleterious (i.e. purifying selection) [49]. Indeed, a strong signature of purifying selection was detected essentially in exon 2, and further confirmed by a joint analysis of other sequences of exon 2 of GNBPB2 available from previous studies [21,34]. This is concordant with the majority of Anopheles immune related genes, which are also under purifying selection (e.g. [29,30,32]). Overall, this suggests functional constraints possibly associated with the immunoregulatory role of this gene.

Conclusions
The present paper expands our limited knowledge about the gene GNBPB2 with a population genetics approach in the two main malaria vectors, An. gambiae (s.s.) and An. coluzzii, over a wide geographic area in Africa. Our study showed that GNBPB2 is similar in the two species. On the other hand, the variability of the gene is differentiated according to the geographic distance of different populations in the African continent. Generally, the selection tests results are consistent with most of the studies that have addressed questions regarding the evolution and genetic diversity of Anopheles sp. innate immunity genes involved in Plasmodium infection. Purifying selection was found to be the most common form of selection operating on these genes [21,[25][26][27][28][29][30][31][32][33][34], but diversifying selection should not be excluded. Specifically, Lehmann et al. [30] confirmed similar selective effects on the GNBP gene on a contemporary time scale.

Additional files
Additional file 1: Table S1. Mosquito samples used in this study [59,60]. (DOCX 18 kb) Additional file 2: Table 2 Ethics approval and consent to participate Not applicable.
Author details