Tracing the origin of the early wet‐season Anopheles coluzzii in the Sahel

Abstract In arid environments, the source of the malaria mosquito populations that re‐establish soon after first rains remains a puzzle and alternative explanations have been proposed. Using genetic data, we evaluated whether the early rainy season (RS) population of Anopheles coluzzii is descended from the preceding late RS generation at the same locality, consistent with dry season (DS) dormancy (aestivation), or from migrants from distant locations. Distinct predictions derived from these two hypotheses were assessed, based on variation in 738 SNPs in eleven A. coluzzii samples, including seven samples spanning 2 years in a Sahelian village. As predicted by the “local origin under aestivation hypothesis,” temporal samples from the late RS and those collected after the first rain of the following RS were clustered together, while larger genetic distances were found among samples spanning the RS. Likewise, multilocus genotype composition of samples from the end of the RS was similar across samples until the following RS, unlike samples that spanned the RS. Consistent with reproductive arrest during the DS, no genetic drift was detected between samples taken over that period, despite encompassing extreme population minima, whereas it was detected between samples spanning the RS. Accordingly, the variance in allele frequency increased with time over the RS, but not over the DS. However, not all the results agreed with aestivation. Large genetic distances separated samples taken a year apart, and during the first year, within‐sample genetic diversity declined and increased back during the late RS, suggesting a bottleneck followed by migration. The decline of genetic diversity followed by a mass distribution of insecticide‐treated nets was accompanied by a reduced mosquito density and a rise in the mutation conferring resistance to pyrethroids, indicating a bottleneck due to insecticidal selection. Overall, our results support aestivation in A. coluzzii during the DS that is accompanied by long‐distance migration in the late RS.


| INTRODUCTION
Malaria in Africa extends from the humid equatorial forest to semi-arid zones in the north and south. Three species within the Anopheles gambiae species complex, A. gambiae, A. coluzzii and A. arabiensis are primary vectors of malaria in Africa. Beyond the equatorial zone, malaria transmission is seasonal and in regions where surface waters required for larval development are absent during the 3-to 7-month-long dry season (DS), mosquito densities drop dramatically (Adamou et al., 2011;Dao et al., 2014;Lehmann et al., 2010;Lemasson et al., 1997;Omer & Cloudsley-Thompson, 1968, 1970Simard, Lehmann, Lemasson, Diatta, & Fontenille, 2000). Whether populations survive the long DS by aestivation (a "semi-dormant" state promoting extended longevity during the DS) or are re-established by migrants from distant locations where larval sites are available year-round has remained an enigma for over 60 years (Donnelly, Simard, & Lehmann, 2002;Gillies & De Meillon, 1968). Continued interest in this question is fuelled by recognition that the fragile DS populations are an ideal target for malaria control. Recent studies have amassed evidence that in the West African Sahel A. coluzzii aestivates, whereas A. gambiae re-establishes populations via long-distance migration (Adamou et al., 2011;Arcaz et al., 2016;Dao et al., 2014;Huestis et al., 2012;Lehmann et al., 2010;Yaro et al., 2012). However, direct evidence such as observation of aestivating adults in their hidden shelters or the recapture of marked mosquitoes hundreds of kilometres from their release sites remains elusive; thus, additional indirect evidence could help understand this problem. Here, we undertake a population genetic study to evaluate and distinguish between these explanations of persistence throughout the DS.
Previous genetic studies have aimed to detect seasonal bottlenecks in different populations of the A. gambiae species complex that undergo severe density depression during the DS or that have undergone vector control (Athrey et al., 2012;Hodges et al., 2013;Lehmann, Hawley, Grebert, & Collins, 1998;Simard et al., 2000;Taylor, Toure, Coluzzi, & Petrarca, 1993;Wondji, Simard, et al., 2005). The effective population size (N e ) approximates the number of breeding adults in a population if the population density is stable and parental contributions to the next generation are randomly distributed (Nei & Tajima, 1981;Waples, 1989). However, over a series of generations, it will be most affected by the lowest N e in any generation, as the harmonic mean. Studies of A. gambiae have revealed that except for in cases of strong vector control, N e was large (N e ≥ 1,000) and commonly the upper 95% CI included infinity, consistent with a robust aestivating population or with mass migration. This also indicates that the statistical power derived from genotyping specimens at up to 12 microsatellite loci with ~90 alleles is insufficient for the estimation of N e of these populations. Shallow genetic structure typically characterizes populations of A. gambiae and A. coluzzii (excluding differences between inverted and standard chromosomal inversions) over large distances (Czeher et al., 2010;Lehmann et al., 1996Lehmann et al., , 2003Pinto et al., 2013;Slotman et al., 2007;Weetman, Wilding, Steen, Pinto, & Donnelly, 2012), supporting the presence of large N e , as well as a formidable migration force (gene flow).
Our goal was to determine whether the early rainy season (RS) population of A. coluzzii in the Sahel descend from the preceding late RS generation from the same locality, consistent with aestivation (hypothesis 1), rather than from distinct and presumably distant locations, consistent with migration (hypothesis 2). The former hypothesis was proposed for A. coluzzii and the latter for A. gambiae, respectively . The expectation of a DS population bottleneck typically defined as N e < 100 (Avise, 1994;Luikart, Allendorf, Cornuet, & Sherwin, 1998) would reflect that the few mosquitoes observed in the DS are survivors that persist until the following rains. However, reduced numbers but no bottleneck is expected if hundreds or even thousands of, mostly unobserved, mosquitoes successfully undergo aestivation as suggested by recent studies (Adamou et al., 2011;Dao et al., 2014). Alternatively, local Sahelian populations may go extinct during the DS and become re-established at the early RS by migrants from areas with permanent water facilitating continued breeding.
These hypotheses lead to distinct predictions in terms of the withinpopulation genetic diversity and differentiation between populations as summarized in Table 1.
By including eleven samples of A. coluzzii and two of A. gambiae in this study, we focus the tests on the former species with only exploratory analysis of the second. In total, we genotyped 520 mosquitoes T A B L E 1 Key predictions based on aestivation and long-distance migration hypotheses  (Adamou et al., 2011;Lehmann et al., 2010).
Seven temporally spaced samples of A. coluzzii were obtained from the Sahelian village Thierola, two from Niono and a single sample from each of the villages of Kolimana and Bagadaji (Table 2 and Figure 1).
Two temporally spaced samples of A. gambiae were obtained from Thierola (Table 2 and Figure 1). Hereafter, population refers to locality (and species), whereas sample refers to a particular collection time point for a given population. During the DS, there is no surface water nor anopheline breeding sites in the villages of Thierola and Bagadaji and mosquito density is very low (Adamou et al., 2011;Dao et al., 2014;Lehmann et al., 2010). The village of Kolimana sits along the bank of the Niger River (50 km East of Thierola), and there are at least a few larval sites available year-round. The village of Soukrani (Niono) is surrounded by a large rice irrigation system where the year-round availability of many larval sites support an extremely high density of A. coluzzii (Diuk-Wasser et al., 2007;Sogoba et al., 2007). Samples were obtained during the RS (July to October), the DS (December to May) and during the transition months (June and November; Figure 1).
Most samples were collected over a maximum of 5 days, but several such as A. coluzzii from Thierola in August 2008 were collected over a longer period (Table 2).
F I G U R E 1 Map of study populations (coloured dots) with samples from Thierola named after the month and year of collection (n = sample size). Distances from Thierola are shown (yellow). Inset: Temporal sequence of sample collection dates from Thierola (red and purple for Anopheles coluzzii superimposed on population density (mean Anopheles gambiae s.l./house, Dao et al., 2014). Observed daily mean densities (in ~120 houses) are shown as circles and mean density over 5 days intervals is shown by the line. The seasonal acronyms under the x-axis includes "RS" and "DS" to denote rainy and dry seasons, preceded by "E" or "L" to denote the early and late parts of each season, respectively T A B L E 2 Sample size, polymorphism and genetic diversity in 738 SNP loci Year and month of collection (days between collection of the first and last mosquitoes if >5 days). c The seasonal acronyms includes "RS" and "DS" to denote rainy and dry seasons, preceded by "E" or "L" to denote the early and late parts of each season, respectively.

| DNA extraction and genotyping
Following morphological identification (Gillies & De Meillon, 1968), only A. gambiae complex mosquitoes were included, with identity refined using sequential species-specific PCR and PCR-RFLP with legs as template (Fanello, Santolamazza, & della Torre, 2002

| Data analysis
Quality of SNP scores was carefully evaluated and 619 SNPs were Within-sample genetic diversity was measured as the number of polymorphic SNPs (major allele frequency <95%); expected heterozygosity (H e ) per locus; and mean number of alleles per locus. To accommodate variation in sample size, these statistics were also computed based on the smallest sample size (n = 40 chromosomes), selecting the first 20 mosquitoes in larger samples. The differences between samples (n = 40 A. coluzzii) in the mean heterozygosity and the mean number of alleles were evaluated using ANOVA, treating locus as a random variable and sample as a fixed parameter and using contrasts to estimate differences between samples according to a priori hypotheses (see predictions in Table 1). The analysis used Proc Mixed (SAS 9.4).
Exact goodness-of-fit tests were used to assess deviations of genotype composition from Hardy-Weinberg expectations (HWE) in each locus and sample, accommodating multiple tests as discussed below.
The inbreeding coefficient, Fis, measuring the difference of observed heterozygosity from H e in units of H e (Nei, 1987;Wright, 1951)  To identify subgroups without prior information on species, geographical location or time point, we used STRUCTURE 2.3.3 (Pritchard et al. 2000). An admixture model assuming correlated allele frequencies was used throughout. Preliminary runs with the possible number of subpopulations (clusters, K) set between 1 and 10 employed 20,000 iterations, the first 10,000 as burn-in. Probability plots suggested a maximum of K = 6 or 7, and also a rapid plateau in parameter values within the burn-in period. Consequently, these were followed by test runs of 50,000 iterations, again with a burn-in of 10,000, with K set between 1 and 7. Each value of K was replicated five times, very low variability in log probability values detected among runs following processing suggesting that additional replicates would not alter the optimum solution. Results were processed with Structure Harvester v0.6.94 (Earl & Vonholdt, 2012). The optimal value was determined using the deltaK method (Evanno, Regnaut, & Goudet, 2005).
Subpopulation membership of individuals was determined by the highest ancestry proportion (Q) of each individual.
To test whether genetic drift between samples collected during the DS (and immediately after the first rains) has occurred, we partitioned the temporal variation in allele frequency (F T , Krimbas and Tsakas 1971;Nei & Tajima, 1981;Pollak 1983) to its components: the experimental sampling variance and the genetic variance (Ne). Thus, if F T was larger than the experimental sampling variance, genetic drift was measurable, consistent with reproduction over one or more generations. On the other hand, if F T was smaller than (or equal to) the experimenter's sampling variance, genetic drift was not evident, consistent with no reproduction. The genetic component is more difficult to detect the larger Ne is (Nei & Tajima, 1981;Waples, 1989 Nei and Tajima (1981): and y i are the common allele in locus i (all di-allelic) in the first (x) and second (y) samples, and S xi and S yi are the sample sizes (number of chromosomes) of the first (x) and second (y) samples from that locus (i). The Wilcoxon rank test was used to test whether the median of the distribution of the difference (F T i − V T i ) over all loci was equal or larger than zero to determine whether it F T > V T for each pair of samples.
Global tests were employed to evaluate significance of multiple tests. The sequential Bonferroni procedure (Holm, 1979) was applied to detect a single test-specific departure (e.g., for a locus), whereas the binomial test (which estimates the probability of obtaining the observed number of significant tests at a given significance level given the total number of tests) can detect weaker departures across multiple tests, such as genomewide departures. Unless otherwise specified, calculations were carried out using programmes written by TL in SAS (2011).

| Within-sample genetic diversity
The number of polymorphic loci, P 95 (major allele frequency, MAF < 95%), among samples varied considerably (63%-87%) in the full and in the standardized data set, with sample size fixed at n = 40 chromosomes (55%-82%, χ 2 12 = 206, p < .0001, Table 2 and Figure 2). Similarly, expected heterozygosity (H e ) and mean number of alleles/locus (A) varied between samples when tested in the standardized data set (F 12,8844 = 32.6, p < .0001 for H e , and F 12,8844 = 52.6, p < .0001 for A; Figure 2 and Table 2). Although the correlations be- and Table 2). The following year, which is represented by five samples, this phenomenon did not repeat: diversity was stable throughout October 2009 to August 2010, including June 2010-the seasonally equivalent sample to May 2009, which actually showed slightly elevated diversity (Figure 2 and Table 2). The inconsistency in stability of within-population diversity in A. coluzzii between years does not fit well with any single prediction. While a drop in diversity in the mid-RS of the first year to the early rain of the following year might appear to correspond to re-establishment by migrants, such large changes in genetic diversity have not been observed in previous studies on A. gambiae populations, including those from more arid habitats (Lehmann et al., 1998;Simard et al., 2000;Wondji, Simard, et al., 2005), suggesting an exceptional event. A possible explanation for the extreme fluctuation during the first year is a bottleneck associated with F I G U R E 2 Within-sample genetic diversity (mean and 95% CI) in a standardized sample size (N = 20 individuals, see text). Bubble size is proportional to the fraction of polymorphic loci (major allele frequency < 95%). Least-squares means and 95% CI of expected heterozygosity (H e ) on the y-axis and the number of alleles (A) on the x-axis, computed in an ANCOVA, with locus treated as a random effect and population and the frequency of the common allele as fixed effects, are shown by lines (sample sizes are given in Table 2   Genetic diversity, measured by H e , was not homogenous across the chromosomes of either species, showing variation between species, populations and temporal samples (Fig. S3). A. coluzzii exhibited reduced H e on the distal part of 2L and elevated H e along the right arm of the 2 nd chromosome, whereas A. gambiae exhibited higher diversity across the large 2La inversion, but similar or lower diversity near the Vgsc locus ( Fig. S3A-B). Unlike A. gambiae, diversity around the Vgsc-1014F resistance locus was not depressed in A. coluzzii. Diversity across the 3 rd chromosome was similar for A. coluzzii samples, but was heterogeneous in A. gambiae (Fig. S3A-B). Diversity of A. coluzzii from Kolimana in the left arm of chromosome two was higher than that of Thierola and Niono (Fig. S3C-D). Temporal variation in diversity between Thierolan A. coluzzii samples indicated seasonal fluctuation near the 2Rb inversion (Fig. S3E-H).

| Genetic distance among populations and samples
Overall genetic distances (Euclidian) between samples were computed across all 738 loci and used to depict the similarities among F I G U R E 4 Genetic (Euclidian) distance between sample pairs portrayed by multidimensional scaling. Arrows denote sequence of sample collection (from the past to recent). The population sample's label follows Table 2 The

| Testing "reproductive arrest" during the DS
A key prediction of aestivation is reproductive arrest during the DS.
This arrest applies to the mosquitoes that initiate aestivation at the end of the RS until they resume reproduction after the first rain (Adamou et al., 2011;Dao et al., 2014;Lehmann et al., 2010). DS reproductive arrest implies minimal genetic drift between samples collected during that period, whereas the migration hypothesis predicts high drift because new migrants arrive from distant population(s) that continuously reproduce, albeit in smaller numbers during the DS. Therefore, under aestivation, the variance in allele frequency between two samples of the same generation should solely reflect sampling variation (assuming negligible selection). On the other hand, without aestivation, that is, continuous reproduction, these samples were taken several generations apart and an additional component of variance, given by their Ne, would be discernible. Accordingly, if the total variance in allele frequency (F T ) is larger than the experimenter's sampling variance (V T ), genetic drift is evident, consistent with reproduction (see Section 2). Tests of genetic drift (reproduction) were conducted between all pairwise samples in Thierola (by species) and Niono (Table 3)

. For
A. coluzzii in Thierola, all but one sample pair with one of the samples preceding October 2009 exhibited F T > V T , whereas F T of all other sample pairs were smaller or equal to V T (Table 3). Importantly, these F I G U R E 5 Division of samples (M and S denote Anopheles coluzzii and Anopheles gambiae, respectively) among the clusters inferred by Structure. The total sample was subdivided into five clusters (K = 5, Fig. S5). (a) Composition of each sample with respect to the five clusters identified by Structure. Horizontal lines indicate series of temporal samples taken from the same species and location. The seasonal acronyms under the x-axis include "RS" and "DS" to denote rainy and dry seasons, preceded by "E" or "L" to denote the early and late parts of each season, respectively. (b) Composition of each cluster inferred by Structure with respect to all of the samples grouped by species and location based on the assignment probability (x-axis). Vertical broken line points to 50% assignment, above which individuals may be assigned to that cluster.  (Table 3), possibly reflecting smaller sample size of this species (Table 2), thus reducing the power for detection of F T > V T .
To evaluate the overall season-specific effects of DS and RS on drift in A. coluzzii, we regressed for each sample pair, their mean F T (across all loci) over the duration of RS and DS that separated them ( Figure 6). Bootstrapping (1000 pseudoreplicates), used to determine the 95% CI around the season coefficients, revealed that the RS slope was positive (0.038, 95% CI = 0.018-0.054), consistent with occurrence of time-dependent drift, whereas the DS slope was not significantly greater than zero (−0.018, 95% CI = −0.035 -0.0124, Figure 6). Whether the regression excluded or included the intercept, the significance of the RS and DS effects, as determined by bootstrapping were the same (not shown).

| DISCUSSION
Using genetic data, we aimed to determine whether in the Sahel, the early RS population of A. coluzzii descended from the preceding late RS generation at the same locality, consistent with aestivation, or from migrants from distant locations. These hypotheses were proposed for different malaria mosquito species based on recent studies of population dynamics, mark-release-recapture, semi-field cage experiments and seasonal changes in behaviour, physiology and morphology (Adamou et al., 2011;Arcaz et al., 2016;Dao et al., 2014;Huestis et al., 2012;Lehmann et al., 2010Lehmann et al., , 2014Yaro et al., 2012). However, the inability to find direct evidence, such as mosquitoes in their shelters during the DS, has prevented a definitive T A B L E 3 Temporal variation in allele frequency (F T ) between samples of Anopheles coluzzii (Thierola and Niono) and Anopheles gambiae (Thierola) in relation to experimental sampling variation (V T ) as a test of reproduction between time points, evident as genetic drift. Red highlighting indicates significantly higher total variance than sampling variance (measurable drift) and blue significantly higher sampling variance (no drift). Sample pairs of A. coluzzii from Thierola are shown above the diagonal and are listed according to the dates indicated by the columns and rows, whereas the sample pairs from Niono (green) and A. gambiae from Thierola (magenta) are shown below diagonal and their sampling dates listed *,**,***denote significance level p < .05, p < .01, and p < .001, respectively using the Wilcoxon Signed Test for the Median (see methods).
resolution of this problem. Using 738 SNPs genotyped in eleven A. coluzzii samples, including seven temporal samples spanning two years in one location, and two A. gambiae samples from the same location, we evaluated predictions (see Table 1) based on these two alternative hypotheses. Rather than providing a solid fit to one or the other hypotheses, the results present a mosaic of good and poor fit with respect to both sets of predictions, necessitating reassessment of underlying scenarios and consideration of additional ones.  Figure 4), according to prediction 1.3 (Table 1) and contrary to 2.3, above. Likewise, assignment of individuals to clusters based solely on their multilocus genotype using Structure revealed a similar composition of samples taken from the end of the RS and until after the first rain of the following RS, but a distinct composition for samples taken across the RS ( Figure 5). Additionally, genetic drift was detected between samples spanning the RS but not between samples spanning solely a DS (Table 3). Further, the temporal variance in allele frequency increased with time over the RS but not over the DS ( Figure   6), in accordance with prediction 1.4 but contrary to 2.4.
Although resistant mutation from 13% to 36% between these time points (Fig.   S2), and by the >6-fold lower peak density of A. coluzzii during that RS . This insecticide-induced bottleneck could also explain the conspicuously large genetic distance between these samples and all others (Figures 4, 5 and Table 3). Notably, the subsequent rise The effects of periods of dry season (x-axis) and rainy season (y-axis) on the mean variance in allele frequency, F T , (z-axis) across loci in all pairwise sample combinations in Thierola (n = 21 comparisons,  Figures 4 and S4). As reproduction ceases in the Sahel due to the absence of surface waters across vast areas such as around Thierola, these heterozygote deficits would persist until reproduction resumes a few weeks after the first rain (e.g., Th10Au Figure 3). These deviations, although mild (0 < Fis < 0.1), suggest massive migration, which presumably follows the prevailing winds, and accordingly arrive from the south. As nearby A. coluzzii populations were genetically similar, such migration must arrive from a distant area (>150 km). The heterozygote deficits observed in Niono may also attest for admixture with long-distance migrants and/or with Sahelian populations that surround the narrow (~5 km wide) irrigated area.
The exceptional value of heterozygote excess manifested by the 2009 A. coluzzii sample of Thierola from May and to a lower degree by the Kolimana sample from August ( Figure 3) are difficult to explain and similar values have not been observed previously in A. gambiae species (Lehmann et al., 1998(Lehmann et al., , 2003Slotman et al., 2007;Wondji, Frederic, et al. 2005). The heterozygote excess might reflect an aftermath of the insecticide treatment. Accordingly, the local Sahelian-adapted population was reduced by ITNs, whereas many of the migrants carried mutations conferring resistance to the insecticides, such as the Vgsc-1014F.
During the Sahelian DS, those genotypes were selected against, resulting in the preferential survival of heterozygotes that persisted until the first rains of 2009. Possibly such DS-related constraints cap the frequency of the Vgsc-1014F (and possibly other mutations) in Sahelian A. coluzzii (Fig. S2). Under a severe bottleneck, different alleles are fixed in females and males and their offspring would present heterozygote excess (Balloux, 2004), but this seems unlikely given the observed density (Lehmann et al., 2010). Finally, it is possible that aestivation involves a small fraction of the population, consisting of individuals carrying a combination of alleles from several polymorphic loci. These genotypes are favoured by the harsh Sahelian DS, yet are selected against during the RS, possibly generating subpopulations with a degree of linkage across the genome that produces the subpopulations identified by Structure as well as the departures from HWE as recently been demonstrated in overwintering Drosophila melanogaster in the temperate zone (Bergland, Behrman, O'Brien, Schmidt, & Petrov, 2014).
Having only two samples of A. gambiae precludes a comprehensive analysis of their fit to key predictions based on migration as previously proposed (Adamou et al., 2011;Dao et al., 2014). Yet, the observed patterns revealed large variation in within-population diversity, possibly consistent with prediction 2.1. In accordance with 2.3, considerable genetic distance separated the samples, possibly reflecting a founder effect, but contrary to 2.4, genetic drift was not detected between the two samples, providing no evidence for continuous reproduction (Table 3), possibly due to the lower power resulting from smaller samples.
In conclusion, our results reveal strong but imperfect agreement with predictions based solely on aestivation in A. coluzzii. Altogether, the observed patterns in this species support aestivation accompanied by long-distance migration in the late RS. Such migration does not address the problem of persistence throughout the DS, but could be important for persistence through the unpredictable dry spells during the RS, which are rather common in the Sahel (Hastenrath & Polzin, 2011Salack, Giannini, Diakhate, Gaye, & Muller, 2014;Tarhule et al., 2015). Without migration, every Sahelian population that survives the DS via aestivation would become extinct because of RS dry spell (