The genetic architecture of target‐site resistance to pyrethroid insecticides in the African malaria vectors Anopheles gambiae and Anopheles coluzzii

Abstract Resistance to pyrethroid insecticides is a major concern for malaria vector control. Pyrethroids target the voltage‐gated sodium channel (VGSC), an essential component of the mosquito nervous system. Substitutions in the amino acid sequence can induce a resistance phenotype. We use whole‐genome sequence data from phase 2 of the Anopheles gambiae 1000 Genomes Project (Ag1000G) to provide a comprehensive account of genetic variation in the Vgsc gene across 13 African countries. In addition to known resistance alleles, we describe 20 other non‐synonymous nucleotide substitutions at appreciable population frequency and map these variants onto a protein model to investigate the likelihood of pyrethroid resistance phenotypes. Thirteen of these novel alleles were found to occur almost exclusively on haplotypes carrying the known L995F kdr (knock‐down resistance) allele and may enhance or compensate for the L995F resistance genotype. A novel mutation I1527T, adjacent to a predicted pyrethroid‐binding site, was found in tight linkage with V402L substitutions, similar to allele combinations associated with resistance in other insect species. We also analysed genetic backgrounds carrying resistance alleles, to determine which alleles have experienced recent positive selection, and describe ten distinct haplotype groups carrying known kdr alleles. Five of these groups are observed in more than one country, in one case separated by over 3000 km, providing new information about the potential for the geographical spread of resistance. Our results demonstrate that the molecular basis of target‐site pyrethroid resistance in malaria vectors is more complex than previously appreciated, and provide a foundation for the development of new genetic tools for insecticide resistance management.


| INTRODUC TI ON
Pyrethroid insecticides have been the cornerstone of malaria prevention in Africa for almost two decades (Bhatt et al., 2015).
Pyrethroids are currently used in all insecticide-treated bed-nets (ITNs) and are used in indoor residual spraying (IRS) as well as in agriculture. Resistance to these insecticides is now widespread in malaria vector populations across Africa (Hemingway et al., 2016).
The World Health Organization (WHO) has published plans for insecticide resistance management (IRM) that emphasize the need for improvements in both our knowledge of the molecular mechanisms of resistance and our ability to monitor them in natural populations (World Health Organization, 2012;World Health Organization et al., 2017).
The voltage-gated sodium channel (VGSC) is the physiological target of pyrethroid insecticides and is integral to the insect nervous system. The sodium channel protein consists of four homologous domains (DI-DIV) each of which comprises six transmembrane segments (S1-S6) connected by intracellular and extracellular loops (Dong et al., 2014). Pyrethroid molecules bind to this protein, stabilize the ion-conducting active state and thus disrupt normal nervous system function, producing paralysis ('knock-down') and death. However, amino acid substitutions at key positions within the protein alter the interaction with insecticide molecules, increasing the dose of insecticide required for knock-down, known as knock-down resistance or kdr (Davies et al., 2007a;Dong et al., 2014).
In the African malaria vectors Anopheles gambiae and An. coluzzii, three substitutions have been found to cause pyrethroid resistance. Two of these substitutions occur in codon 995 1 , with L995F prevalent in West and Central Africa (Martinez-Torres et al., 1998;Silva et al., 2014), and L995S found in Central and East Africa (Ranson et al., 2000;Silva et al., 2014). A third substitution, N1570Y, has been found in West and Central Africa and shown to increase resistance in association with L995F (Jones et al., 2012).
However, studies in other insect species have found a variety of other Vgsc substitutions inducing a resistance phenotype (Davies et al., 2007b;Dong et al., 2014;Rinkevich et al., 2013). To our knowledge, no studies in malaria vectors have analysed genetic variation across the full Vgsc coding sequence, and thus, the molecular basis of pyrethroid target-site resistance has not been fully explored.
Basic information is also lacking about the spread of pyrethroid resistance in malaria vectors (World Health Organization, 2012). For example, it is not clear when, where or how many times pyrethroid target-site resistance has emerged. Geographical paths of transmission, carrying resistance alleles between mosquito populations, are also not known. Previous studies have found evidence that L995F occurs on several different genetic backgrounds, suggesting multiple independent outbreaks of resistance driven by this allele (Etang et al., 2009;Lynd et al., 2010;Pinto et al., 2007;Santolamazza et al., 2015). However, these studies analysed only small gene regions in a limited number of mosquito populations and therefore had limited resolution to make inferences about relationships between haplotypes carrying this allele. It has also been shown that the L995F allele spread from An. gambiae to An. coluzzii in West Africa (Clarkson et al., 2014;Diabaté et al., 2004;Norris et al., 2015;Weill et al., 2000). However, both L995F and L995S now have wide geographical distributions (Silva et al., 2014), and to our knowledge, no attempts have been made to infer or track the geographical spread of either allele across Africa.
Here, we report an in-depth analysis of genetic variation in the Vgsc gene, using whole-genome Illumina sequence data from phase 2 of the Anopheles gambiae 1000 Genomes Project (Ag1000G) (The Anopheles gambiae 1000 Genomes Consortium, 2020). The Ag1000G phase 2 resource includes data on nucleotide variation in 1,142 wild-caught mosquitoes sampled from 13 countries, with representation of West, Central, Southern and East Africa, and of both An. gambiae and An. coluzzii. We investigate variation across the complete gene coding sequence and report population genetic data for both known and novel non-synonymous nucleotide substitutions. We then use haplotype data from the chromosomal region spanning the Vgsc gene to study the genetic backgrounds carrying resistance alleles, investigate the geographical spread of resistance between mosquito populations and provide evidence for recent positive selection. Finally, we explore ways in which variation data from Ag1000G can be used to design highthroughput, low-cost genetic assays for surveillance of pyrethroid resistance, with the capability to differentiate and track resistance outbreaks.

| Vgsc non-synonymous nucleotide variation
To identify variants with a potentially functional role in pyrethroid resistance, we extracted single nucleotide polymorphisms (SNPs) that alter the amino acid sequence of the VGSC protein from the Ag1000G phase 2 data resource (The Anopheles gambiae 1000 Genomes Consortium, 2020). We then computed their allele frequencies among 16 mosquito populations defined by species and country of origin. Alleles that confer resistance are expected to increase in frequency under selective pressure; therefore, we filtered the list of potentially functional variant alleles to retain only those at or above 5% frequency in one or more populations (Table 1). The resulting list comprises 23 variant alleles, including the known L995F, L995S and N1570Y resistance alleles, and a further 20 alleles which prior to Ag1000G had not previously been described in anopheline mosquitoes. We reported 12 of these novel alleles in our overall analysis of the 765 samples in the Ag1000G phase 1 data resource 1 Codon numbering is given here relative to transcript AGAP004707-RD as defined in the AgamP4.12 gene-set annotations. A mapping of codon numbers from AGAP004707-RD to Musca domestica, the system in which kdr mutations were first described (Williamson et al., 1996), is given in Table 1. TA B L E 1 Non-synonymous nucleotide variation in the voltage-gated sodium channel gene coluzzii; Ag, An. gambiae. Species status of specimens from Guinea-Bissau, Gambia and Kenya is uncertain (The Anopheles gambiae 1000 Genomes Consortium, 2020). All variants are at 5% frequency or above in one or more of the 16 Ag1000G phase 2 populations, with the exception of 2,400,071 G > T which is only found in the CMAg population at 0.3% frequency but is included because another mutation is found at the same position (2,400,071 G > A) at >5% frequency and which causes the same amino acid substitution (M490I).
a Position relative to the AgamP3 reference sequence, chromosome arm 2L. b Codon numbering according to Anopheles gambiae transcript AGAP004707-RD in geneset AgamP4.12.
d Location of the variant within the protein structure. Transmembrane segments are named according to domain number (in Roman numerals) followed by 'S' then the number of the segment; for example, 'IIS6' means domain two, transmembrane segment six. Internal linkers between segments within the same domain are named according to domain (in Roman numerals) followed by 'L' then the numbers of the linked segments; for example, 'IL45' means domain one, linker between transmembrane segments four and five. Internal linkers between domains are named 'L' followed by the linked domains; for example, 'LI/II' means the linker between domains one and two. 'COOH' means the internal carboxyl tail.
The 23 non-synonymous variants were located on a transmembrane topology map and on a 3-dimensional homology model of the Vgsc protein. (Figure 1). The substitutions were found to be distributed throughout the channel, in all of the four internally homologous domains (DI-DIV), in S1, S5 and S6 membrane-spanning segments, in two of the intracellular loops connecting domains, and in the C-terminal tail. The S5 and S6 segments that form the central ion-conducting pore of the channel carry six of the eight segment substitutions, including V402 and L995 which have been shown to produce insecticide resistance phenotypes (Davies et al., 2007a;Dong et al., 2014;Martinez-Torres et al., 1998;Ranson et al., 2000;Silva et al., 2014). Two substitutions are located on the DIII-DIV linker including the resistance conferring N1570 (Jones et al., 2012).
A further six substitutions are found concentrated in the protein's carboxyl tail (C-terminus), including two alternative substitutions at the resistance-associated P1874 residue (Sonoda et al., 2008). The DIII-DIV linker and the C-terminus segment interact in the closedstate channel and substitutions are found throughout this intracellular subdomain. Finally, there are four novel substitutions located on the DI-DII intracellular linker, but this region is missing from the model as it was not resolved in the cockroach NavPaS structure used as the model template .
The two known resistance alleles affecting codon 995 had the highest overall allele frequencies within the Ag1000G phase 2 cohort (  (Wang et al., 2015). To study the patterns of association among non-synonymous variants, we used haplotypes from the Ag1000G phase 2 resource to compute the normalized coefficient of linkage disequilibrium (D′) between all pairs of variant alleles ( Figure 2). As expected, we found N1570Y in almost perfect linkage with L995F. Of the 20 novel non-synonymous alleles, 13 also occurred almost exclusively in combination with L995F ( Figure 2).
These included two variants in codon 1874 (P1874S, P1874L), one of which (P1874S) has previously been associated with pyrethroid resistance in the crop pest moth Plutella xylostella (Sonoda et al., 2008).
The abundance of high-frequency non-synonymous variants occurring in combination with L995F is notable for two reasons.
First, Vgsc is a highly conserved gene, expected to be under strong functional constraint and therefore purifying selection, so any non-synonymous variants are expected to be rare (Davies et al., 2007b). Second, in contrast with L995F, we did not observe any F I G U R E 2 Linkage disequilibrium (D′) between non-synonymous variants. A value of 1 indicates that two alleles are in perfect linkage, meaning that one of the alleles is only ever found in combination with the other. Conversely, a value of −1 indicates that two alleles are never found in combination with each other. The bar plot at the top shows the frequency of each allele within the Ag1000G phase 2 cohort. See Table 1 for population allele frequencies high-frequency non-synonymous variants occurring in combination with L995S. This contrast was clear when data on all variants within the gene were considered: for haplotypes carrying the L995F allele, the ratio of non-synonymous to synonymous nucleotide diversity π N /π S was 20.04 times higher than haplotypes carrying the wild-type allele, but for those carrying L995S π N /π S was 0.5 times lower than haplotypes carrying the wild-type allele. These results indicate that and two separate nucleotide substitutions causing M490I, did not occur in combination with any known resistance allele and were almost exclusively private to a single population (Table 1).

| Genetic backgrounds carrying resistance alleles
The Ag1000G data resource provides a rich source of information about the spread of insecticide resistance alleles in any given gene, because data are not only available for SNPs in protein coding regions, but also SNPs in introns, flanking intergenic regions and in neighbouring genes. These additional variants can be used to analyse the genetic backgrounds (haplotypes) on which resistance alleles are found. In our initial report of the Ag1000G phase  Vgsc gene (π < 4.5 × 10 −5 bp −1 ). In contrast, diversity among wildtype haplotypes was two orders of magnitude greater (Cameroon An. gambiae π = 1.4 × 10 −3 bp −1 ; Guinea-Bissau π = 5.7 × 10 −3 bp −1 ).
Thus, it is reasonable to assume that each of these five groups contains descendants of an ancestral haplotype that carried a resistance allele and has risen in frequency due to selection for insecticide resistance. Given this assumption, these groups each provide evidence  (Figures S1 and S2). This analysis supported a refinement of our initial grouping of haplotypes carrying resistance alleles. All haplotypes within groups S4 and S5 were effectively identical on both the upstream and downstream flanks of the gene, but there was a region of divergence within the Vgsc gene itself that separated them in the fixed window analyses ( Figure S3). The 13.8 kbp region of divergence occurred upstream of codon 995 and contained 6 SNPs that were fixed differences between S4 and S5. A possible explanation for this short region of divergence is that a gene conversion event has occurred within the gene, bringing a segment from a different genetic background onto the original genetic background on which the L995S resistance mutation occurred.

| Positive selection for resistance alleles
To investigate evidence for positive selection on non-synonymous alleles, we performed an analysis of extended haplotype homozygosity (EHH) (Sabeti et al., 2002). Haplotypes under recent positive selection will have increased rapidly in frequency, thus have had less time to be broken down by recombination, and should on average have longer regions of haplotype homozygosity relative to wild-type haplotypes. We defined a core region spanning Vgsc codon 995 and an additional 6 kbp of flanking sequence, which was the minimum required to differentiate the haplotype groups identified via clustering and network analyses. Within this core region, we found 18 distinct haplotypes at a frequency above 1% within the cohort. These included core haplotypes corresponding to each of the 10 haplotype groups carrying L995F or L995S alleles identified above, as well as a core haplotype carrying I1527T which we labelled L1 (due to it carrying the wild-type leucine codon at position 995). We also found a core haplotype corresponding to a group of haplotypes from Kenya carrying an M490I allele, which we labelled as L2. All other core haplotypes we labelled as wild-type (wt). We then computed EHH decay for each core haplotype up to a megabase upstream and downstream of the core locus ( Figure 5).
As expected, haplotypes carrying the L995F and L995S resistance alleles all experience a slower decay of EHH relative to wildtype haplotypes, supporting positive selection. Previous studies have found evidence for different rates of EHH decay between L995F and L995S haplotypes, suggesting differences in the timing and/or strength of selection (Lynd et al., 2010). However, we found no systematic difference in the length of shared haplotypes when comparing F1-F5 (carrying L995F) against S1-S5 (carrying L995S) ( Figure S3). There were, however, some differences between core haplotypes carrying the same allele. For example, shared haplotypes were significantly longer for S1 (median 1.  Figure S3). Longer shared haplotypes indicate a more recent common ancestor, and thus, some of these core haplotypes may have experienced more recent and/or more intense selection than others.
As sample collections took place over 12 years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012), it might be expected that core haplotypes appearing earlier in our sampling would have smaller shared haplotypes due to increased opportunity for recombination and mutation. However, no correlation was found between the year a core haplotype was first detected and the median length (r(8) = 0.03, p = 0.93, Figure S3). Thus, this signal may also be due to the extreme demographic history of this population.

| Cross-resistance between pyrethroids and DDT
The VGSC protein is the physiological target of both pyrethroid insecticides and DDT (Davies et al., 2007a). The L995F and L995S alleles are known to increase resistance to both of these insecticide classes (Martinez-Torres et al., 1998;Ranson et al., 2000). By 2012, over half of African households owned at least one pyrethroid impregnated ITN and nearly two thirds of IRS programmes were using pyrethroids (Hemingway et al., 2016). Pyrethroids were also introduced into agriculture in Africa prior to the scale-up of public health vector control programmes and continue to be used on a variety of crops such as cotton (Reid & McKenzie, 2016). DDT was used in Africa for several pilot IRS projects carried out during the first global campaign to eradicate malaria, during the 1950 s and 1960 s (Davies et al., 2007b). DDT is still approved for IRS use by WHO and remains in use in some locations; however within the last two decades, pyrethroid use has been far more common and widespread. DDT was also used in agriculture from the 1940 s, and although agricultural usage has greatly diminished since the 1970 s, some usage remains (Abuelmaali et al., 2013). In this study, we reported evidence of positive selection on the L995F and L995S alleles, as well as the I1527T + V402L combination and possibly M490I. We also found 14 other non-synonymous substitutions that have arisen in association with L995F and appear to be positively selected. Given that pyrethroids have dominated public health insecticide use for two decades, it is reasonable to assume that the selection pressure on these alleles is primarily due to pyrethroids rather than DDT. It has previously been suggested that L995S may have been initially selected by DDT usage (Lynd et al., 2010). However, we did not find any systematic difference in the extent of haplotype homozygosity between these two alleles, suggesting that both alleles have been under selection over a similar time frame. We did find some sig-

| Resistance phenotypes for novel nonsynonymous variants
The non-synonymous variants are distributed throughout the channel protein but can be considered in terms of three clusters: Faso occurs in segment IIIS6 and is immediately adjacent to two pyrethroid-sensing residues in this binding site (Dong et al., 2014).
It is thus plausible that pyrethroid binding could be altered by this substitution. The I1527T substitution (M. domestica codon 1532) has been found in Aedes albopictus (Xu et al., 2016), and substitutions in the nearby codon 1529 (M. domestica I1534T) have been reported in Aedes albopictus and in Aedes aegypti where it was found to be associated with pyrethroid resistance (Dong et al., 2014;Ishak et al., 2015;Li et al., 2018). We found the I1527T allele in tight linkage with two alleles causing a V402L substitution (M. domestica V410L).
Substitutions in codon 402 have been found in multiple insect species and are by themselves sufficient to confer pyrethroid resistance (Dong et al., 2014). The fact that we find I1527T and V402L in such tight mutual association is intriguing because haplotypes carrying V402L alone should also have been positively selected and thus be present in one or more populations.
The V402 residue is located towards the middle of the IS6 helix.
The L995F and L995S substitutions occur at a similar position on the IIS6 helix. It was proposed these S6 substitutions confer resistance by allosterically modifying formation of the pyrethroid-binding site (O'Reilly et al., 2006). More recently, the L995 kdr residue was speculated to form part of a second pyrethroid-binding site in the insect channel termed 'PyR2' . A major functional effect of the L995F substitution is enhanced closed-state inactivation (Vais et al., 2000). This contributes to kdr resistance by reducing the number of channels that undergo activation, which is the functional state that pyrethroids bind to with highest affinity (Vais et al., 2000).
Fast inactivation involves movement of the DIV domain to form a receptor for the DIII-DIV linker fast inactivation particle containing the 'MFM' sequence motif (equivalent to the 'IFM' motif in mammals) (Capes et al., 2013;Dong et al., 2014). Recent eukaryotic sodium channel structures reveal that the DIII-DIV linker is in complex with the C-terminal segment in the closed-state conformation but the DIII-DIV linker appears to dissociate and bind in close proximity in the DIV S6 helix upon transition to the inactivated state Yan et al. 2017). It seems that binding of the DIII-DIV linker pushes the DIV S6 helix forward to occlude the pore and produce the inactivated state . We suggest that substitutions located on the DIII-DIV linker and C-terminal tail may perturb the conformation of this subdomain when it assembles in the closedstate channel and may subsequently affect capture or release of the DIII-DIV linker from this complex. The expected functional outcome would be altered channel inactivation, although whether inactivation is enhanced or diminished and if this compensates for a deleterious effect of L995F on channel function awaits elucidation. The N1570Y substitution on the DIII-DIV linker has been functionally characterised but inactivation kinetics in the mutant channel were found unaltered (Wang et al., 2015). Pyrethroid sensitivity was also unaffected by N1570Y although resistance was greatly enhanced in the N1570Y + L995F double mutant (Wang et al., 2015).
The final cluster of novel variants is located on the DI-DII intracellular linker. This segment includes the novel M490I substitution that was found on the Kenyan L2 haplotypic background potentially under selection. M490I did not occur in association with L995F or any other non-synonymous substitutions. Although we were unable to model this region, we speculate that the DI-DII linker passes under the DII S4-S5 linker and these regions may interact, as was found in a bacterial sodium channel structure (Sula et al., 2017). The structural effects of DI-DII substitutions may be altered interactions with the DII S4-S5 linker, the movement of which is critical for formation of the pyrethroid-binding site (O'Reilly et al., 2006;Usherwood et al., 2007). Overall, there are a number of potential mechanisms by which a pyrethroid resistance phenotype may arise and topology modelling reveals how many of the non-synonymous variants we discover may be involved, though clearly much remains to be unravelled regarding the molecular biology of pyrethroid resistance in this channel. ficient resolution to answer these questions and could be used to provide ongoing resistance surveillance. The cost of whole-genome sequencing continues to fall, making it a practical tool for malaria vector surveillance. However, to achieve substantial spatial and temporal coverage of mosquito populations, it would also be necessary to develop targeted genetic assays for resistance outbreak surveillance. Technologies such as amplicon sequencing (Kilianski et al., 2015) are already being trialled on mosquitoes (Lucas et al., 2019), and these could scale to tens of thousands of samples at low cost and could be implemented using existing platforms in national molecular biology facilities.  This table includes the allele frequency within each of the 10 haplotype groups defined here, to aid in identifying SNPs that are highly differentiated between two or more haplotype groups. We also provide Table S3 which lists all 10,244 SNPs found within the Vgsc gene and up to 10 kbp upstream or downstream, which might need to be taken into account as flanking variation when searching for PCR primers to amplify a SNP of interest. To provide some indication for how many SNPs would need to be assayed in order to track the spread of resistance, we used haplotype data from this study to construct decision trees that could classify which of the 12 groups a given haplotype belongs to ( Figure 6). This analysis suggested that it should be possible to construct a decision tree able to classify haplotypes with >95% accuracy by using 20 SNPs or less. In practice, more SNPs would be needed, to provide some redundancy and also to type non-synonymous polymorphisms in addition to identifying the genetic background. However, it is still likely to be well within the number of SNPs that could be assayed in a single multiplex via amplicon sequencing. Thus, it should be feasible to produce lowcost, high-throughput genetic assays for tracking the spread of pyrethroid resistance. If combined with whole-genome sequencing of mosquitoes at sentinel sites, this should also allow the identification of newly emerging resistance outbreaks.

| Data
We used variant calls and phased haplotype data from the Ag1000G  Table 1 includes the predicted effect for all SNPs that are non-synonymous in one or more of these transcripts. None of the variants that are nonsynonymous in a transcript other than AGAP004707-RD were found to be above 5% frequency in any population.
For ease of comparison with previous work on Vgsc, pan Insecta, in Table 1 and Table S1, we report codon numbering for both An.
gambiae and Musca domestica (the species in which the gene was first discovered). The M. domestica Vgsc sequence (EMBL accession X96668 (Williamson et al., 1996)) was aligned with the An. gambiae AGAP004707-RD sequence (AgamP4.12 geneset) using the Mega v7 software package (Kumar et al., 2016). A map of equivalent codon numbers between the two species for the entire gene can be download from the MalariaGEN website (https://www.malar iagen.net/ sites/ defau lt/files/ conte nt/blogs/ domes tica_gambi ae_map.txt).

| Haplotype networks
Haplotype networks were constructed using the median-joining algorithm (Bandelt et al., 1999) as implemented in a Python module available from https://github.com/malar iagen/ ag100 0g-phase 2-vgsc-report. Haplotypes carrying either L995F or L995S mutations were analysed with a maximum edge distance of two SNPs.
Networks were rendered with the Graphviz library and a composite figure constructed using Inkscape. Non-synonymous edges were highlighted using the SnpEff annotations (Cingolani et al., 2012).

| Positive selection
Core haplotypes were defined on a 6,078-bp region spanning Vgsc codon 995, from chromosome arm 2L position 2,420,443 and ending at position 2,426,521. This region was chosen as it was the smallest region sufficient to differentiate between the ten genetic backgrounds carrying either of the known resistance alleles L995F or L995S. Extended haplotype homozygosity (EHH) was computed for all core haplotypes as described in Sabeti et al. (2002) using scikit-allel version 1.1.9 (Miles & Harding, 2016), excluding non-synonymous and singleton SNPs. Analyses of haplotype homozygosity in moving windows (Figures S1 and S2) and pairwise haplotype sharing ( Figure S3) were performed using custom Python code available from https://github.com/malar iagen/ ag100 0g-phase 2-vgsc-report.

| Design of genetic assays for surveillance of pyrethroid resistance
To explore the feasibility of identifying a small subset of SNPs that would be sufficient to identify each of the genetic backgrounds carrying known or putative resistance alleles, we started with an input data set of all SNPs within the Vgsc gene or in the flanking regions 20 kbp upstream and downstream of the gene. Each of the 2284 haplotypes in the Ag1000G Phase 2 cohort was labelled according to which core haplotype it carried, combining all core haplotypes not carrying known or putative resistance alleles together as a single 'wild-type' group. Decision tree classifiers were then constructed using scikit-learn version 0.19.0 (Pedregosa et al., 2011) for a range of maximum depths, repeating the tree construction process 10 times for each maximum depth with a different initial random state.
The classification accuracy of each tree was evaluated using stratified fivefold cross-validation.

| Homology modelling
A homology model of the An. gambiae voltage-gated sodium channel (AGAP004707-RD AgamP4.12) was generated using the 3.8 Å resolution structure of the Periplaneta americana sodium channel NavPaS structure (PDB code 5X0M) . Sequences were aligned using Clustal Omega (Sievers et al., 2011). 50 starting models were generated using MODELLER (Webb et al., 2016).
The internal scoring function of MODELLER was used to select 10 models, which were visually inspected and submitted to the VADAR webserver (Willard et al., 2003) to assess stereochemistry in order to select the best final model. Figures were produced using PyMOL (DeLano Scientific, San Carlos, CA, USA).

ACK N OWLED G EM ENTS
We thank the anonymous reviewers for comments which improved this manuscript.