Genomic insights into adaptive divergence and speciation among malaria vectors of the Anopheles nili group

Abstract Ongoing speciation in the most important African malaria vectors gives rise to cryptic populations, which differ remarkably in their behavior, ecology, and capacity to vector malaria parasites. Understanding the population structure and the drivers of genetic differentiation among mosquitoes is crucial for effective disease control because heterogeneity within vector species contributes to variability in malaria cases and allow fractions of populations to escape control efforts. To examine population structure and the potential impacts of recent large‐scale control interventions, we have investigated the genomic patterns of differentiation in mosquitoes belonging to the Anopheles nili group—a large taxonomic group that diverged ~3 Myr ago. Using 4,343 single nucleotide polymorphisms (SNPs), we detected strong population structure characterized by high‐F ST values between multiple divergent populations adapted to different habitats within the Central African rainforest. Delineating the cryptic species within the Anopheles nili group is challenging due to incongruence between morphology, ribosomal DNA, and SNP markers consistent with incomplete lineage sorting and/or interspecific gene flow. A very high proportion of loci are fixed (F ST = 1) within the genome of putative species, which suggests that ecological and/or reproductive barriers are maintained by strong selection on a substantial number of genes.


| INTRODUCTION
One of the principal goals of population genetics is to summarize the genetic similarities and differences between populations (Wright, 1984). This task can be relatively straightforward for some taxa, but the genetic relationship among populations can also be difficult to summarize, especially for species whose evolutionary history is complex and reticulate. The best known mosquito species of the genus Anopheleswhich includes all vectors of human malaria parasites-exhibit very complex rangewide population structure due to the combined effects of cryptic speciation, adaptive flexibility and ongoing gene flow across strong but incomplete reproductive barriers (Harbach, 2013;Krzywinski & Besansky, 2003). For example, almost all major malaria vectors of the Afrotropical region belong to large taxonomic groups encompassing multiple incipient species relatively isolated reproductively and geographically from one another (reviewed by Sinka et al., 2010;Coetzee & Koekemoer, 2013;Dia, Guelbeogo, & Ayala, 2013;Lanzaro & Lee, 2013). These characteristics make them promising model systems to study speciation and the processes which contribute to reproductive barriers (e.g., Turner, Hahn, & Nuzhdin, 2005;Lawniczak et al., 2010;Neafsey et al., 2010;Fontaine et al., 2015;Weng, Yu, Hahn, & Nakhleh, 2016), but can also have far-reaching practical consequences. Both spatial and temporal variabilities in malaria cases and the effectiveness of vector control measures are greatly impacted by heterogeneity within vector species (Molineaux & Gramiccia, 1980;Van Bortel et al., 2001). For these reasons, research on the genetic structure among the major African malaria vector mosquitoes has intensified over the last few decades Coetzee & Koekemoer, 2013;Dia et al., 2013;Lanzaro & Lee, 2013).
The recent scaling up of insecticide-treated nets usage and indoor insecticide spraying to a lesser extent have led to a dramatic reduction in malaria morbidity and mortality across the continent (WHO, 2016).
However, other consequences of these large-scale interventions include increased insecticide resistance (reviewed by Hemingway et al., 2016;Ranson & Lissenden, 2016), range shift (e.g., Bøgh, Pedersen, Mukoko, & Ouma, 1998;Derua et al., 2012;Mwangangi et al., 2013) and profound evolutionary changes among vector populations. In contrast to insecticide resistance and range shift, which have been extensively studied, the recent adaptive changes among mosquito populations have yet to be addressed significantly. These changeswhich involve local adaptation and genetic differentiation, introgressive hybridization, and selective sweeps across loci conferring resistance to xenobiotics-are particularly evident in the most anthropophilic species (Barnes et al., 2017;Clarkson et al., 2014;Norris et al., 2015).
The ecology, taxonomic complexity, geographic distribution, role in transmission, and evolutionary potential of each vector species are unique. Consequently, further research is needed to specifically resolve population structure and the genomic targets of natural selection at a fine scale in all of the important taxa including currently understudied species. The present work focused on a group of malaria vector species representing a large taxonomic unit named Anopheles nili group. Despite the significant role some of its species play in sustaining high malaria transmission, this group has received little attention. To date, four species that occur in forested areas of Central and West Africa and are distinguishable by slight morphological variations are known within the An. nili group: An. nili sensu stricto (hereafter An. nili), An. ovengensis, An. carnevalei, and An. somalicus (Awono-Ambene, Kengne, Simard, Antonio-Nkondjio, & Fontenille, 2004;Gillies & Coetzee, 1987;Gillies & De Meillon, 1968). These species are characterized by reticulate evolution and complex phylogenies that have been challenging to resolve so far (Awono-Ambene et al., 2004, 2006Kengne, Awono-Ambene, Antonio-Nkondjio, Simard, & Fontenille, 2003;Ndo et al., 2010Ndo et al., , 2013Peery et al., 2011;Sharakhova et al., 2013). Populations of An. nili and An. ovengensis are very anthropophilic and efficient vectors of Plasmodium in rural areas where malaria prevalence is particularly high .
To delineate genomic patterns of differentiation, we sampled mosquito populations throughout the range of species of the An. nili group in Cameroon and used reduced representation sequencing to develop genomewide SNP markers that we genotyped in 145 individuals. We discovered previously unknown subpopulations characterized by high pairwise differentiation within An. ovengensis and An.
nili. We further explored the genetic differentiation across the genome and revealed the presence of a very high number of outlier loci that are targets of selection among locally adapted subpopulations. These findings provide significant baseline data on the genetic underpinnings of adaptive divergence and pave the way for further genomic studies in this important group of mosquitoes. Notably, a complete reference genome will enable us to conduct in-depth studies in order to decipher the functional and phenotypic characteristics of the numerous differentiated loci as well as the contribution of recent selective events in ongoing adaptation.

| Mosquito species
We surveyed 28 locations within the geographic ranges of species of the An. nili group previously described in Cameroon (Figure 1) (Antonio-Nkondjio et al., 2009;Awono-Ambene et al., 2004, 2006Ndo et al., 2010Ndo et al., , 2013. The genetic structure of Anopheles species is most often based on macrogeographic or regional subdivisions of gene pools, but can also involve more subtle divergence between larvae and adults, or between adult populations found in or around human dwellings (e.g., Riehle et al., 2011). To effectively estimate F I G U R E 1 Map showing the sampling locations and the relative frequencies of the morphologically defined species An. nili and An. ovengensis in Cameroon. Small and large black dots indicate, respectively, the 28 locations surveyed and the four sampling sites where mosquitoes were collected the genetic diversity and identify potential cryptic populations within species, we collected larvae and adult mosquitoes within and around human dwellings using several sampling techniques (Service, 1993) in September-October 2013 (Table S1). To identify the four currently known members of the An. nili. group, we used morphological keys and a diagnostic PCR, which discriminates species based on point mutations of the ribosomal DNA (Awono-Ambene et al., 2004;Gillies & Coetzee, 1987;Gillies & De Meillon, 1968;Kengne et al., 2003).

| Library preparation, sequencing, and SNP discovery
We created double-digest restriction site-associated DNA (ddRAD) libraries as described in Kamdem et al. (2017) using a modified version of the protocol designed by Peterson, Weber, Kay, Fisher, and Hoekstra (2012). Briefly, genomic DNA of mosquitoes was extracted using the DNeasy Blood and Tissue kit (Qiagen) and the Zymo Research MinPrep kit for larvae and adult samples, respectively. Approximately 50 ng (10 μl) of DNA of each mosquito was digested simultaneously with MluC1 and NlaIII restriction enzymes. Digested products were ligated to adapter and barcode sequences enabling identification of individuals. Samples were pooled, purified, and 400-bp fragments selected. The resulting libraries were amplified via PCR and purified, and fragment size distribution was checked using the BioAnalyzer. PCR products were quantified, diluted and single-end sequenced to 100 base reads on Illumina HiSeq2000.

| SNP discovery and genotyping
The  (Wu & Nacu, 2010). To identify and call SNPs within consensus RAD loci, we utilized the ref_map.pl program of Stacks. We set the minimum number of reads required to form a stack to three and allowed two mismatches during catalogue creation.
We generated SNP files in different formats for further downstream analyses using the populations program of Stacks and Plink v1.09 (Purcell et al., 2007).

| Population genomics analyses
We analyzed the genetic structure of An. nili sensu lato (s.l.) populations using a principal component analysis (PCA) and an unrooted Neighbor-Joining tree (NJ). We also examined ancestry proportions and admixtures between populations in Admixture v1.23 (Alexander, Novembre, & Lange, 2009) and Structure v2.3.4 (Pritchard, Stephens, & Donnelly, 2000). We used the package adegenet (Jombart, 2008) to implement the PCA in R (R Development Core Team 2016). The individual-based NJ network was generated from SNP allele frequencies via a matrix of Euclidian distance using the R package ape (Paradis, Claude, & Strimmer, 2004). We ran Admixture with 10-fold cross-validation for values of k from 1 through 8. Similarly, we analyzed patterns of ancestry from k ancestral populations in Structure, testing five replicates of k = 1-8. We used 200,000 iterations and discarded the first 50,000 iterations as burn-in for each Structure run. Clumpp v1.1.2 (Jakobsson & Rosenberg, 2007) was used to summarize assignment results across independent runs. To identify the optimal number of genetic clusters in our sample, we applied simultaneously the lowest cross-validation error in Admixture, the ad hoc statistic deltaK (Earl & VonHoldt, 2012;Evanno, Goudet, & Regnaut, 2005)  Single-population models were fitted to allele frequency spectra, and the best model was selected using the lowest likelihood and Akaike information criterion as well as visual inspections of residuals.

| SNP genotyping
We collected mosquitoes from four locations out of 28 sampling sites ( Figure 1, Table S1) and sequenced 145 individuals belonging, according to morphological criteria and diagnostic PCRs, to two species (An. nili [n = 24] and An. ovengensis [n = 121]). We assembled 197,724 RAD loci that mapped to unique positions throughout the reference genome. After applying stringent filtering rules, 408 loci present in all populations and in at least 50% of individuals in each population were retained. Within these loci, we identified 4,343 high-quality biallelic markers that were used to analyze population structure and genetic differentiation.

| Morphologically defined species do not correspond to genetic clusters
The PCA and the NJ tree show that the genetic variation across 4,343 SNPs is best explained by more than two clusters, implying subdivisions within An. nili and An. ovengensis (Figure 2). Three subgroups are apparent within An. nili while two distinct clusters segregate in An.
These five subpopulations are strongly correlated with the different sampling sites suggesting local adaptation of divergent populations.
Importantly, Structure and Admixture analyses reveal that, at k = 2, one population identified by morphology and the diagnostic PCR as An.
nili has almost the same ancestry pattern as the largest An. ovengensis cluster (Figure 3). Such discrepancies between morphology-based and molecular taxonomies can be due to a variety of processes including phenotypic plasticity, introgressive hybridization, or incomplete lineage sorting (i.e., when independent loci have different genealogies by chance) (Arnold, 1997;Combosch & Vollmer, 2015;Fontaine et al., 2015;Weng et al., 2016). At k = 2 and k = 3, some populations also exhibit half ancestry from each morphological species suggestive of gene flow. We found a conflicting number of genetic clusters in our samples likely reflecting the complex history of subdivisions and admixtures among populations (Figure 4). The Evanno et al. (2005) method, which highlights the early stages of divergence between An.
nili and An. ovengensis, indicates two probable ancestors. DAPC and the Admixture cross-validation error, which are more sensitive to recent hierarchical population subdivisions, show five or more distinct clusters as revealed by the PCA and the NJ tree ( Figure 4).
As suggested by the long internal branches, which connect subpopulations on the NJ tree, there is strong differentiation between and within morphological species characterized by globally high-F ST values (Table 1). Relatively lower F ST values observed between certain clusters may be due to greater interpopulation migration and intermixing or more recent divergence. The F ST values do not reflect the morphological delimitation of species. Indeed, the level of genetic differentiation is higher between some subpopulations within the same morphological species. Overall, patterns of genetic structure and differentiation reveal a group of populations whose phylogenies and species status are likely confounded by hybridization and/or incomplete lineage sorting. We argue that the current taxonomy based on morphology and ribosomal DNA does not capture the optimal reproductive units among populations of this group of mosquitoes.

| Genomic signatures of divergent selection and demographic history
We analyzed patterns of genetic differentiation across SNP loci throughout the genome. Pairwise comparisons are based on filtered variants that satisfy all criteria to be present in both populations, which explains the discrepancy in the number of SNPs observed between specific paired comparisons ( Figure 5). The distribution of locus-specific F ST values between the five subpopulations revealed a U-shape characterized by two peaks around 0 and 1. The large majority of SNPs have low-to-moderate divergence, but a substantial number of variants are extremely differentiated between populations.
The maximum F ST among SNPs is 1, and the proportion of loci with F ST = 1 varies from 6.52% between the populations we termed An.
nili group 1 and An. ovengensis group 1 to 44.74% between the subgroups called An. nili group 2 and An. nili group 3 ( Figure 5). This pattern of genomewide divergence suggests that a very high number of sites with abrupt differentiation-which likely contain genes that contribute to divergent selection and/or reproductive isolation-coexist with regions of weak divergence that can be freely exchanged between species. As is the case with the overall genetic differentiation, morphology is not a reliable predictor of locus-specific divergence.
Precisely, the lowest percentage of fixed SNPs is found between An. ovengensis from Nyabessan and An. nili collected from Mbébé and Ebebda (Figures 1 and 5). In contrast, the largest proportion of fixed loci is observed between locally adapted subgroups within the same morphological species: An. nili. The draft reference genome made F I G U R E 3 Ancestry proportions inferred in Admixture with k = 2-8 F I G U R E 2 Population genetic structure inferred from 4,343 SNPs using a PCA (a) and a neighbor-joining tree (b). The percentage of variance explained is indicated on each PCA axis. Note the strong association between the five genetic clusters and the different sampling locations  (Nosil & Feder, 2012;Roesti, Hendry, Salzburger, & Berner, 2012).

Models of population demography indicate that all subgroups have
experienced an increase in effective size in a more or less recent past (Table 2). Nevertheless, confidence intervals of population parameters are high in some populations, and our results should be interpreted with the necessary precautions. The population growth is less significant in An. nili group 1.

| Genetic differentiation
Advances in sequencing and analytical approaches have opened new avenues for the study of genomes of disease vectors. We have focused on malaria mosquitoes of the An. nili group, whose taxonomy and population structure have been challenging to resolve with low-resolution markers. We analyzed genetic structure using genomewide SNPs and found strong differentiation and local adaption among populations belonging to the two morphologically defined species An. nili and An.
ovengensis. The exact number of subpopulations remains contentious, with the suggested number of divergent clusters varying from two to five. Significant population structure at eight microsatellite loci has been described among An. nili populations from Cameroon, with F ST values as high as 0.48 between samples from the rainforest area . By contrast, An. ovengensis was discovered recently and the genetic structure of this vector remains understudied. This species was initially considered as a sibling of An. nili (Awono-Ambene et al., 2004, 2006Kengne et al., 2003), but more recent studies have started to challenge the assumed relatedness between the two species due to the high divergence revealed by polytene chromosomes (Sharakhova et al., 2013). Our findings call for a careful review of the current taxonomy within this group of species, which is a necessary first step for accurately delineating the role played by the different subpopulations in malaria transmission.
Our samples were collected from locations characterized by a more or less degraded forest within the rainforest area of Cameroon.
In these habitats, larvae of An. nili s.l. exploit relatively similar breeding sites consisting of slow-moving rivers (Antonio-Nkondjio et al., 2009).
The ecological drivers of genetic differentiation remain unknown, and will be difficult to infer from our data given the apparent similarity of habitats among the divergent populations we described. Further study is needed to clearly address the environmental variables that may be correlated with ongoing adaptive divergence at adult and larval stages.
One of the most expected outcomes of current large-scale malaria

An. ovengensis group 2
An. nili group 1 - F I G U R E 5 Distribution of F ST values throughout the genome between An. nili group 1 and An. nili group 2 (a); An. nili group 1 and An. nili group 3 (b); An. nili group 1 and An. ovengensis group 1 (c); An. nili group 1 and An. ovengensis group 2 (d); An. nili group 2 and An. nili group 3 (e); An. nili group 2 and An. ovengensis group 1 (f); An. nili group 2 and An. ovengensis group 2 (g); An. nili group 3 and An. ovengensis group 1 (h); An. nili group 3 and An. ovengensis group 2 (i); An. ovengensis group 1 and An. ovengensis group 2 (j). The number of SNPs with F ST = 1 is indicated in each pairwise comparison as well as the total number of SNPs in parenthesis coluzzii, An. funestus and An. moucheti, which reveals a substantial population increase suggesting that intense insecticide exposure has yet to leave deep or detectable impacts on patterns of genetic variation among mosquito populations Kamdem et al., 2017;O'Loughlin et al., 2014).

| Genomic architecture of geographic and reproductive isolation
Understanding the genomic architecture of reproductive isolation may reveal crucial information on the sequence of events that occur from the initial stages of divergence among populations to the onset of strong reproductive barriers between species (e.g., Turner et al., 2005;Harr, 2006;Nadeau et al., 2012;Ellegren et al., 2012;Carneiro et al., 2014;Burri et al., 2015). One influential concept of speciation coined the "genic view of species" proposes that boundaries between species are properties of individual genes or genome regions and not of whole organisms or lineages (Barton & Hewitt, 1985;Harrison, 1990;Key, 1968;Nosil & Feder, 2012;Rieseberg, Whitton, & Gardner, 1999;Wu, 2001). We have discovered a substantially high number of SNPs that are strongly differentiated between populations and often fixed within subgroups of An. nili s.l. Interpreting this intriguing pattern of genomic differentiation is not straightforward due to the complex interactions between numerous forces-including positive or negative selection, recombination, introgressive hybridization and incomplete lineage sorting-that can affect the level of divergence among SNPs (Begun & Aquadro, 1992;Cutter & Payseur, 2013;Harrison & Larson, 2016;Nachman & Payseur, 2012;Roesti et al., 2012). Some of these variants exhibiting high divergence among populations certainly contain markers of ecological and/or reproductive isolation. However, as far as reproductive barriers are concerned, recent studies have indicated a complex relationship between the degree of genetic differentiation and gene flow at the genome level (e.g., Gompert et al., 2012;Hamilton, Lexer, & Aitken, 2013a,b;Larson, Andrés, Bogdanowicz, & Harrison, 2013;Larson, White, Ross, & Harrison, 2014;Parchman et al., 2013;Taylor, Curry, White, Ferretti, & Lovette, 2014). Highly divergent genomic regions do not necessarily coincide with regions of reduced gene flow among established or emerging species. Several alternative interpretations exist for the numerous high-F ST regions we detected in all pairwise comparisons (Cruickshank & Hahn, 2014;Delmore et al., 2015;Nachman & Payseur, 2012;Noor & Bennett, 2009). Nevertheless, careful examination of these outliers of differentiation may reveal significant insights into the wide range of genes and traits that contribute to ecological divergence and/or reproductive isolation between subgroups of An. nili s.l. A complete genome assembly will be necessary to better delineate specific regions of the genome under natural selection, and therefore clarify the genomic basis of phenotypic fitness differences between divergent populations. This will also help understand the extent to which recent selection associated with human interventions contribute to local adaptation and genetic differentiation as observed in An. gambiae and An. coluzzii .
Signals consistent with gene flow between An. nili and An. ovengensis are apparent in our data although it has been proposed that the two morphological species diverged ~3 Myr ago . Some individuals display almost half ancestry from each morphological species. The disagreement between morphology/PCR and molecular taxonomies observed in Structure and Admixture analyses also suggests that incongruent genealogies may be widespread along chromosomes due to hybridization. However, hybridization can be difficult to detect because other factors such as incomplete lineage sorting or technical artifacts can leave signatures that are similar to those of interspecific gene flow (Liu et al., 2014;Patterson et al., 2012). A complete reference genome is also needed to analyze the detailed distribution of genealogies across small genomic windows and to disentangle the relative contribution of processes that generate the putative admixtures and species confusion observed among divergent populations (Fontaine et al., 2015;Martin et al., 2013;Weng et al., 2016).

| CONCLUSIONS AND IMPLICATIONS
Delineating the fine-scale population structure of mosquito populations is crucial for understanding their epidemiological significance Relative to ancestral population size. b Expressed in units 2Ne generations from start of growth to present.
T A B L E 2 Demographic models of different subgroups of An. nili s.l.
and their potential response to vector control measures. Moreover, recent malaria control efforts affect interspecific gene flow, genetic differentiation, population demography and natural selection in mosquitoes (Athrey et al., 2012;Barnes et al., 2017;Clarkson et al., 2014;Kamdem et al., 2017;Norris et al., 2015). Deciphering the signatures of these processes across mosquito genomes is important to minimize their negative impacts on vector control. Our findings shed some light on the complex evolutionary history and provide a framework for future investigations into the genetic basis of ecological and reproductive barriers among species of the An. nili group.

ACKNOWLEDGEMENTS
Funding for this project was provided by the University of California Riverside and NIH grants 1R01AI113248 and 1R21AI115271 to BJW.
We thank inhabitants and administrative authorities of the sampling sites included in this study for their collaboration.