Overexpression and nonsynonymous mutations of UDP-glycosyltransferases are potentially associated with pyrethroid resistance in Anopheles funestus

UDP-glycosyltransferases (UGTs) enzymes are pivotal in insecticide resistance by transforming hydrophobic substrates into more hydrophilic forms for efficient cell elimination. This study provides the first comprehensive investigation of Anopheles funestus UGT genes, their evolution, and their association with pyrethroid resistance. We employed a genome-wide association study using pooled sequencing (GWAS-PoolSeq) and transcriptomics on pyrethroid-resistant An. funestus, along with deep-targeted sequencing of UGTs in 80 mosquitoes Africa-wide. UGT310B2 was consistently overexpressed Africa-wide and significant gene-wise Fst differentiation was observed between resistant and susceptible populations: UGT301C2 and UGT302A3 in Malawi, and UGT306C2 in Uganda. Additionally, nonsynonymous mutations in UGT genes were identified. Gene-wise Tajima's D density curves provide insights into population structures within populations across these countries, supporting previous observations. These findings have important implications for current An. funestus control strategies facilitating the prediction of cross-resistance to other UGT-metabolised polar insecticides, thereby guiding more effective and targeted insecticide resistance management efforts.


Introduction
In sub-Saharan Africa, malaria remains a significant cause of morbidity and mortality, particularly among pregnant women and children under 5 years old.It accounts for >96% of malaria-related deaths globally [1].The primary strategies for malaria control depend heavily on insecticide-based interventions, such as indoor residual spray (IRS) and long-lasting insecticide nets (LLINs) [1,2].These interventions have been highly successful in reducing malaria cases and associated morbidity, globally preventing over 2 billion malaria cases and 11.7 million malaria-related deaths between 2000 and 2021 [1].However, several mosquito species have developed multiple and cross-resistance to several insecticides, including pyrethroids the main compound approved by WHO for LLINs.Unless resistance management strategies are designed by elucidating the evolutionary and molecular basis of resistance, recent gains in reducing malaria burden could be lost [3,4].
Anopheles funestus, a major malaria vector across sub-Saharan Africa, has demonstrated resistance to several insecticides, including pyrethroids, jeopardising recent efforts to eradicate and control malaria [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19].The primary mechanisms of insecticide resistance are target site resistance and metabolic resistance [20].In An. funestus, metabolic resistance plays a predominant role, as the high expression of efficient detoxifying enzymes allows for the rapid removal or destruction of insecticides [21].Metabolic resistance involves three-phase metabolic pathways present in all major groups of organisms.These pathways consist of modification, biotransformation, and excretion of toxic insecticides [21].During Phase I, the toxicity of insecticides is reduced when a reactive and polar group is added to the substrate by a variety of enzymes including, cytochrome P450 monooxygenases (P450s), esterases, alcohol dehydrogenases and aldehyde dehydrogenases.In Phase II, the activated metabolites produced by the Phase I reactions are conjugated with charged species and bio-transformed into more polar and soluble metabolites that can be actively transported [22,23].The Phase II reactions are catalysed by a variety of transferases, including sulfotransferases, glutathione S-transferases (GSTs) and UDPglycosyltransferases (UGTs) [24].Conjugated toxins are then excreted from the cells into the extracellular medium via Phase III, where a variety of membrane-bound transporters are involved, notably ATPbinding cassette (ABC) transporters [24].
Previous studies have highlighted the role of An. funestus P450s enzymes, including CYP6P9a, CYP6P9b, CYP9J11, CYP6Z1, CYP6M7 and CYP9K1, in metabolising pyrethroids, leading to significant depletion and reduced efficacy of pyrethroids-treated bed nets [7,8,14,17,25,26].Additional classes of detoxification enzymes in An. funestus, such as GSTe2, confer resistance to pyrethroids and DDT through allelic variations that increase metabolic activities, resulting in cross-resistance [15].The upregulation of other detoxification enzymes belonging to Phase II such as UGTs has been observed in previous transcriptomic studies investigating the molecular basis of resistance to pyrethroid [17,26].However, despite their established role in detoxification in other insects, UGTs role in pyrethroid resistance in malaria vectors, including An. funestus, remains largely uncharacterised.
UDP-glycosyltransferases (UGTs) constitute a superfamily of enzymes that play a vital role in the biotransformation of various hydrophobic compounds into more hydrophilic products [27].These enzymes catalyse the covalent addition of a glycosyl group from an active donor, uridine diphosphate (UDP) glucose, to hydrophobic compounds containing hydroxyl, carboxyl, or amino functional groups through the glycosylation reaction [22,[27][28][29].The resulting glycosides are more polar metabolites that can be easily excreted from the cell by export transporters than the substrate compound.Glucose conjugation is involved in various physiological processes in insects, including pigmentation, cuticle formation (sclerotization) and metabolic detoxification [30][31][32].In certain lepidopteran insects, especially, UGTmediated glycosylation activities are associated with resistance to plant defensive allelochemicals.These activities have been observed in economically important species such as the silkworm Bombyx mori [33], the tobacco hornworms Manduca sexta [34] and the Asian corn borer Ostrinia furnacalis [35][36][37] UGT-mediated biotransformation of pyrethroids has been suggested in Anopheles sinensis, a common malaria vector in Southeast Asia [38].Their contribution to resistance against several classes of insecticide has been reported in the Diamondback moth Plutella xylostella and the tomato leafminer Tuta absoluta resistance to chlorantraniliprole [39,40], the housefly Musca domestica resistance to organophosphate [41], and the greenfly cotton aphid Aphis gossypii resistance to neonicotinoid and spirotetramat [42,43].
While genome-wide characterisation of UGT genes, their evolution, and association with insecticide resistance has been conducted in various insects using different sequencing technologies [44][45][46][47], UGT genes remain largely uncharacterised in malaria vectors.Although their overexpression in response to pyrethroids exposure has been reported in An. funestus, the investigations of UGTs in this context has been limited compared to other gene families, such as cytochrome p450s or GSTs.Selective sweeps and nonsynonymous mutations associated with UGTs, which could potentially contribute to pyrethroid resistance, have not been previously identified.In this study, we combine genome-wide association of pooled-template sequencing (GWAS-PoolSeq) with transcriptomic analysis of pyrethroids-resistant feild populations of An. funestus and deep sequencing of part of the genome of 80 mosquitoes to comprehensively characterise An. funestus UGTs genome-wide, their evolution, and association with pyrethroid resistance.This study provides the first comprehensive investigation into the role of UGTs role in pyrethroids resistance in An. funestus, the major malaria vector.

Mosquito collection, rearing and sequencing
The collection, rearing and sequencing of mosquitoes were described in detail previously in [7,11,17,18,26,48].In brief, two An.funestus laboratory colonies (FANG and FUMOZ) and field mosquitoes from Malawi, Cameroon and Uganda were used in this study.The FUMOZ colony is a multi-insecticide-resistant An. funestus colony derived from Southern Mozambique [49].While the FANG is an insecticidesusceptible An. funestus colony derived from Angola [49].Mosquitoes were collected from field populations representing Southern Africa (Malawi), Central West Africa (Cameroon) and East Africa (Uganda).Mosquitoes were collected from Southern Chikwawa (16 • 1′S, 34 • 47′E), Malawi in 2002 and January 2014 [13]; Tororo (0 • 45′N, 34 • 5′E), Uganda in March 2014 [12], and from Mibellon (6 • 46′N, 11 • 70′E), Cameroon in February 2015.The collected F 0 mosquitoes were determined to be belonging to the An.funestus group using morphological and molecular identification [50,51].Genomic DNA was extracted using DNeasy blood and tissue kit (Qiagen, Hilden, Germany) from 40 F mosquitoes individually and pooled in equal amounts.Library preparation and whole-genome sequencing by Illumina HiSeq2500 (2 × bp paired-end) were carried out by the Centre for Genomic Research (CGR), University of Liverpool, UK [18,48].The pooled-sequencing data (PoolSeq) of 40 mosquitoes per pool for F 0 populations from field An. funestus populations and two lab strains are available in the European Nucleotide Archive (ENA) under accession numbers PRJEB24384 and PRJEB13485 [18,48].
For transcriptional profiling of pyrethroid resistance in field populations and laboratory colonies, collected F 0 gravid mosquitoes were forced to lay eggs using the forced egg-laying method [9].Egg batches were transported to the Liverpool School of Tropical Medicine (DEFRA) license (PATH/125/2012).Eggs were allowed to hatch in the insectary, and larvae were reared to adulthood in distilled water and fed Tetra-minTM baby fish food every day.The water of each larvae tray was changed every two days to reduce mortality, according to rearing conditions described previously in [9].F 1 females that are two to five days old were subjected to insecticide resistance bioassays as described previously in [12,13].F 1 females were exposed to permethrin (0.75%) for a varying length of time to define putatively resistant and susceptible mosquitoes.In populations from Malawi and Uganda, susceptible mosquitoes were defined as those that die after 60 min of permethrin (0.75%) exposure and resistant mosquitoes that are still alive after min.Due to lower levels of resistance in the population from Cameroon, susceptible mosquitoes were dead mosquitoes collected after 20 min of exposure and resistant are those that are still alive after 60 min exposure.Total RNA was extracted from pools of 10 female mosquitoes using the Arcturus PicoPure ™ RNA Isolation Kit (Thermo Scientific, MA, USA) as described previously [17,26].A total of 18 RNA pools were collected for populations from Malawi, Cameroon and Uganda.In each field-collected population, 3 pools of alive mosquitoes after exposure to permethrin and 3 pools of unexposed mosquitoes were collected.In addition, 4 pools each were collected for laboratory colonies FANG and FUMOZ.Library preparation and sequencing by Illumina HiSeq 2500 (2 × 125-bp) were done at the Centre for Genomic Research (CGR), University of Liverpool [17,18].The pooled RNAseq data are available in the European Nucleotide Archive (ENA) under accession numbers PRJEB24351, PRJEB45224 and PRJEB70998.
For the phylogenetic analysis, amino acid sequences of 123 UGT genes from four Diptera species, Anopheles funestus (27), Aedes aegypti (35), Anopheles gambiae (26), Drosophila melanogaster (35) were retrieved from Vectorbase and globally aligned using Clustal Omega alignment built within Geneious Prime 2022.1.1 (https://www.geneious.com)[54].The phylogenetic tree was built using Geneious builtin FastTree.FastTree uses a modified version of the neighbour-joining algorithm to construct an initial tree that is refined using a maximumlikelihood approach and heuristics are used to speed up the treebuilding process [57].The phylogenetic tree was edited using the online tool, Interactive Tree of Life (http://itol.embl.de/).An. funestus UGT genes were named based on the UGT nomenclature system [58].
To construct the Templeton, Crandall and Sing (TCS) haplotype network, UGT haplotypes for all individuals were extracted from the variant calling files using bcftools [59] resulting in 160 haplotypes for each UGT gene.Haplotypes were aligned using MUSCLE aligner [60] and TCS haplotype network was built using POPART software [61].

Differential expression analysis of total RNA and analysis of PoolSeq data
All the 21 UGT genes annotated in the FUMOZ genome including the partially sequenced UGT308B3 (AFUN018708) were investigated for association with pyrethroid resistance using GWAS-PoolSeq and Differentially expressed genes (DEGs) analyses.
To identify UGTs that are overexpressed in response to exposure to permethrin, differentially expressed genes (DEGs) analyses between mosquitoes alive after exposure to permethrin and unexposed populations from Malawi, Uganda, and Cameroon were performed using pools of total RNA.In addition, differentially expressed gene analyses were performed by contrasting transcription profiles of populations resistant to permethrin from Malawi, Uganda, Cameroon, and laboratory-resistant colony (FUMOZ) against the transcription profile of laboratory-susceptible (FANG) (See Materials and Methods for details).In each separate contrast, DEGs were determined globally while DEGs of detoxification genes were highlighted including differential expressions of UGT genes (Supplementary Fig. 3).The analysis involves initial preprocessing of raw reads, alignment to the FUMOZ reference genome using the AfunF3.2annotation (Assembly ID GCA_003951495.1) and count of reads on the gene level.The DEG analysis was conducted using edgeR [62].Differentially regulated genes were defined in each separate contrast as those with a corrected p-value threshold of <0.05 and log 2 fold change >1.
To validate the expression profiles of select UGTs using quantitative real-time PCR, RNA was extracted in pools of 10 from 3 to 5 days old females randomly selected and unexposed to any insecticides using Arcturus PicoPure RNA isolation kit (Applied Biosystems, CA, USA) according to the manufacturer's instructions.Reverse transcription and qRT-PCR were performed using Invitrogen reverse transcriptase kit (Invitrogen, Waltham, CA, USA) and SYBR green master mix kit (Sigma, Aldrich, Germany) respectively according to the manufacturer's instructions.The expression of RNA was calculated using ddCT protocol [63] and normalised to the expression of Ribosomal protein S7 housekeeping gene.The efficiency of the primers was incorporated into the analysis and values were finally converted into their logarithmic forms for normal distribution.The primer sequences are provided in Supplementary Table S9.
For the PoolSeq analysis, DNA pools from An. funestus laboratory colonies (FANG and FUMOZ) and F 0 field-collected mosquitoes were examined for a signature of selection in the UGT gene family.The preprocessing of raw reads, alignment and variant calling was described previously [7].In each population, pools of genomic DNA were analysed by calculating the gene-wise ratio between nucleotide diversity of nonsynonymous polymorphism and nucleotide diversity of synonymous polymorphism (π n /π s ) for all genes including the 21 UGT genes using snpgenie [64] for details see [48].

Design of the SureSelect bait and target enrichment sequencing of candidate resistance regions from laboratory and field-collected colonies
Target sequencing baits were designed using the SureSelect DNA Advanced Design Wizard in the eArray program of Agilent.The design and sequencing of the SureSelect experiment were described in detail previously [7] In summary, a total of 1302 target sequences were included in the enrichment sequencing, constituting 3,059,528 bp.The library preparation and sequencing were performed by the Centre for Genomic Research (CGR), University of Liverpool, using the SureSelect target enrichment custom kit.The libraries were sequenced in 2 × 150 bp paired-end fragments on an Illumina MiSeq with 20 samples per run.The regions targeted by the enrichment sequencing include a selection of detoxification genes potentially involved in insecticide resistance, heat shock proteins, immune response genes and odorant binding proteins (see [7] for further details).In addition, all the genes in the major candidate trait loci associated with pyrethroid resistance previously identified were included in the targeted sequencing.These include a 120 kb region Bacterial Artificial Chromosome (BAC) clone of rp1 (resistance to pyrethroid 1) locus on chromosome 2R and a 113 kb BCA clone of rp2 on chromosome 2 L [19,65].
This fine-scale targeted technique was used to sequence part of the genome of a total of 80 individual mosquitoes from the two lab colonies (FANG and FUMOZ) and F 1 field-collected mosquitoes from Cameroon, Malawi and Uganda.For field-collected populations exposure duration for susceptibility assay for each population is described in 2.1, 10 putatively F 1 permethrin susceptible females and 10 F 1 resistant females were targeted by the SureSelect bait.In addition, 10 mosquitoes were targeted from each laboratory colony FANG, FUMOZ.Centre for Genomic Research (CGR), the University of Liverpool, conducted the library construction and capture using the SureSelect target enrichment custom kit.The libraries were subjected to paired-end sequencing (2 × 150 bp) on an Illumina MiSeq instrument with 20 samples per run [7].A broad-scale analysis of these data can be found in [7].

Analysis of the SureSelect data
SureSelect pair-end trimmed reads were aligned using BWA (version 0.7.17) against An.funestus FUMOZ reference genome (Assembly ID GCA_003951495.1) [55].Sequence alignment map (sam) files were converted to binary alignment map (bam) files using samtools (1.13) [66,67].Using Picard tools (2.26.3) alignment bam files were sorted according to the position in the reference genome, duplicated reads were marked, and reads were assigned a new read group tag [68].Variant calling was carried out using freebayes (v1.2.0-dirty) by setting the number of alleles to be considered to 4 using the option (− -use-best-nalleles 4) to reduce run time and all other options were set to default [69].Subsequently, resulting variants were filtered using vcffilter (vcflib version 1.0.0_rc2)based on Phred-scaled quality-score >20 (QUAL ≥ 20) keeping 531,726 variants [70].SnpEff (5.0) was used to annotate and predict the effects of genetic variants between samples and the FUMOZ reference genome [71].Variants were further filtered by removing SNPs with missing values, removing all indels, and only retaining bi-allelic SNPs.Variants were separated by country to multiple VCF files so when filtered by missing values a maximum number of SNPs will be retained (Supplementary Table S6) [66,67].

Gene-wise F ST differentiation and allele frequency spectrum (AFS) test statistics between pyrethroid resistance and putatively susceptible populations
To conduct population genetics analyses on a gene level from the SureSelect target enrichment sequencing only genes with an average coverage of 100 and above calculated in jvarkit https://github.com/lindenb/jvarkit using merged alignment files of all the FANG and FUMOZ individuals were retained.The target enrichment sequencing covered 807 genes, compromising 6.16% of the total number of genes in the FUMOZ reference genome.UGT genes associated with pyrethroid resistance were detected using gene-wise allele frequency spectrum (AFS)-based methods on biallelic SNPs within targeted regions.For each gene targeted by the enrichment sequencing, gene-wise F ST was calculated using PopGenome R package to investigate divergence between resistant (alive) and susceptible (dead) Africa-wide, in each country and between laboratory colonies (FANG and FUMOZ) [72].A p-value for each gene-wise F ST was calculated using bootstrapping process.To understand how much variability there might be in the gene-wise F ST values due to a chance we generated 1000 bootstrap samples by resampling from the gene-wise F ST values.P-values were determined by comparing the distribution of bootstrap F ST values to the observed F ST values.Genes with gene-wise F ST values that are in the top 0.05 quantiles of genes per chromosome and a p-value <0.05 were considered significant (Supplementary Dataset 5, Fig. 3, Supplementary Fig. 6).Additionally, the genomic fragment spanning the full length of UGTs targeted by enrichment sequencing were analysed in the 80 mosquitoes Africawide.Nucleotide diversity (π) was calculated within each population (resistant and susceptible) as well as between populations from Malawi, Uganda, and Cameroon, and both laboratory colonies the FANG and the FUMOZ.
To detect genomic regions impacted by selections, gene-wise Tajima's D neutrality test statistics were estimated for all genes included in the targeted sequencing.Gene-wise population genetics neutrality statistics including Tajima's D (on synonymous SNPs, nonsynonymous SNPs and combined) Fu, and Li's D and F tests, Watterson estimator and composite likelihood ratio (CLR) test [73] were calculated for all genes included by the targeted sequencing using PopGenome R package [72].Coalescent simulation to derive the expected distribution of gene-wise neutrality tests statistics across the loci targeted by the enrichment analysis was generated using the MS program [74].Tajima's D values were considered significant if they fall at the 0.05 quantiles at both extremities of the simulated Tajima's D density plot (Fig. 4, Supplementary Dataset 6).

F st differentiation between pyrethroid resistance and putatively susceptible populations on the SNP level
For all SNPs in the targeted region by SureSelect, Weir & Cockerham's wc F ST for two populations and p F ST probabilistic approach to detect the difference in allele frequencies between the resistant and susceptible population in every country, Africa-wide and between laboratory colonies (FANG and FUMOZ) was calculated using vcflib (1.0.0_rc2) [70].R qvalue package (https://github.com/StoreyLab/qvalue) was used to perform a false discovery rate (FDR) estimation for the list of p-values calculated from the p F ST test, a p-value of 0.05 were used to extract significantly divergent SNPs from each analysis.The functional effect of each SNP with significant p F ST from each comparison on the coding region was determined by overlapping using SNPs genomic positions.Supplementary Table S6 surmises the total number of SNPs identified in the targeted region, along with the significantly divergent SNPs between resistant and susceptible, and the number of predicted functional effects of significant SNPs including nonsynonymous SNPs.The functional effect of each SNP as predicted by SnpEff, a p-value quantifying allele frequency difference (p F ST ) and Weir & Cockerham's F ST (wc F ST ) are provided in (Supplementary Dataset 7).

Characteristics, phylogenetics, and evolution of An. funestus UDPglycosyltransferases (UGT) genes
The assembled and annotated Anopheles funestus genome for the FUMOZ resistant-strain colony contains 21 UGTs; 20 genes with complete transcripts located on chromosomes 2 and 3, encoding 511 to amino acids (Table 1), and a partially sequenced UGT gene AFUN018708 (65 amino acids) due to incompleteness in the transcriptome assembly [55].The latest An.funestus assembly from an individual female specimen from La Lopé, Gabon (AfunG1) contains seven more UGT genes and the partially sequenced UGT gene (AFUN018708) from the FUMOZ colony reference was identified to be a partial sequence of AFUN2_007305(b) later named UGT308B3 according to UGT nomenclature [75].Overall, the number of predicted genes in the AfunG1 assembly (14,819 genes) is higher than the FUMOZ colony assembly (14,176 genes).In total 27 UGT genes were identified in An. funestus from both assemblies.(Table 1) [55].Genomics and transcriptomics analyses in this paper used the FUMOZ strain reference genome as its annotation is well-integrated within VectorBase.
Anopheles funestus UGT genes were distributed into 12 families, of which 9 UGTs were clustered within the UGT308 family (Fig. 1, Table 1).The UGT308 gene family demonstrates significant gene expansion, making it the largest in the overall phylogenetic tree and the largest among the UGT families in An. funestus.There are 3 An.funestus UGT genes each in UGT families UGT301, UGT302 and UGT306 with a close 1:1 orthologue gene between An. funestus and An.gambiae.There are two An.funestus UGT genes in the UGT36 family belonging to subfamilies B and C are close orthologous to the UGTs from An. gambiae and Ae.Aegypti belonging to the same subfamilies.There is only one An.funestus UGT each in UGT49, UGT50, UGT309, UGT310, UGT313, UGT314 and UGT315 families (Fig. 1, Table 1).The species phylogeny among the four Diptera species is mirrored by the branching pattern of the UGT50 family, the An.funestus UGT50 gene is closer to An. gambiae and Ae.Aegypti with a pairwise identity of 93.1% aaID and 79.6% aaID respectively than D. melanogaster DmUgt50B3 with 58.6% aaID (Fig. 1) (Supplementary Dataset 2).

T. Al-Yazeedi et al.
Protein function analysis using the consensus sequence and UGT50B8 amino acid sequence predicted a signal peptide at the N-terminal, a transmembrane domain, and a cytoplasmic tail at the C-terminal domain (Supplementary Fig. 2).A significant similarity is observed at the UGT signature motif of 29 aa-long sequences (consensus: FITHGGLLSTQEAIYHGVPIVGIPFFGDQ) found in the middle of the Cterminal domain with two residues conserved in all UGT genes (G415, P428) [58].Other conserved signature motif sequences involved in sugar donor binding regions (DBR1 and DBR2) and catalytic mechanisms are found in the C-terminal domain comparable to mammalian and insect UGT genes (Supplementary Fig. 1).

Studying UGT association with pyrethroid resistance using DNA and RNA pools
The analysis of genomic DNA pools involves the calculation of the gene-wise ratio between nucleotide diversity of nonsynonymous polymorphism and nucleotide diversity of synonymous polymorphism (π n / π s ) for all genes including the 21 UGT genes [7,48].The results are presented in (Supplementary Dataset 3) elucidating the association of genes with pyrethroid resistance.The (π n /π s ) ratio gene-wise for all the UGT genes in all the populations is generally <1, implying stabilizing or purifying selection acting against changes in the amino acid sequence, except for UGT315A3 (AFUN000679) in the FUMOZ population where to investigate their association with pyrethroid resistance a positive selection was detected driving changes in the protein sequence.
Results for differentially expressed genes (DEGs) globally in each separate contrast while highlighting detoxification genes including UGT genes are presented in (Supplementary Fig. 3).Differential expression analyses between mosquitoes resistant and unexposed to permethrin in populations from Uganda and Cameroon did not detect significant changes in the expression level of detoxification genes including UGTs.However, the expression contrast in Malawi between resistant and unexposed populations detected significant differential expression of 7 UGT genes; 5 genes were upregulated, and 2 genes were downregulated.In Malawi, 4 of the genes that are upregulated are also upregulated when the resistant population is compared with FANG (Fig. 2).UGT310B2 (AFUN011266) is significantly upregulated in all resistant populations to permethrin when compared to the FANG population, 2.7 FC in FUMOZ, 5.6 in Malawi, 2.6 in Uganda and 3.5 FC in Cameroon (Fig. 2B).While UGT301C2 (AFUN004354) overexpression was specific to the FUMOZ population (2.2 FC) when compared with FANG (Fig. 2, Supplementary Fig. 4, Supplementary Dataset 4).The overexpression of UGT310B2 in FUMOZ compared to FANG was detected using quantitative PCR (qPCR) (Fig. 2C).Detecting UGT301C2 by qPCR was challenging due to its relative overexpression in the FUMOZ colony compared to the FANG (Fig. 2C, Supplementary Dataset 4).
Overall, global transcriptional regulation in resistant populations in response to permethrin exposure, when compared to unexposed populations from the same region, was more evident in Malawi compared to Uganda and Cameroon (Supplementary Fig. 5).In Malawi, a total of 1826 significant DEGs were detected, and 48 of those genes belong to the major detoxification gene families, 23 cytochrome P450s, 12 carboxylesterases, 4 glutathione S-transferases, 7 UDP-glycosyltransferases and 2 ABC transporters.In the population from Cameroon, only a single carboxylesterase gene (AFUN002514) was detected to be upregulated [17].While the Ugandan resistant population overexpresses three cytochromes P450 genes AFUN015739 (CYP307A1), AFUN020895 and AFUN019365 compared to the unexposed population from the same country (Supplementary Fig. 3, Supplementary Dataset 4) (Supplementary text 1).

Identifying selection and divergence in an. Funestus UGT genes using targeted sequencing
Gene-wise allele frequency spectrum-based (AFS) summary statistics

Table 1
A complete list of UGT genes identified in An. funestus.The table includes official names for all UGT genes, VectorBase gene IDs, chromosomal location including gene start (Start), end position (End) and strand orientation (Strand), and length of the gene and amino acid sequence (aa sequence length).For the overlapping 20 UGT genes found in the FUMOZ colony reference and in the AfunG1 genome, VectorBase IDs from both genomes were reported, while Chr, Start, End and Strand were extracted from the FUMOZ reference genome.For the extra 7 UGTs specific to AfunG1 genome, the chromosome location, gene start, and end position were reported from the AfunG1 genome.were calculated using biallelic SNPs within targeted regions.A total of 136,348 bi-allelic polymorphic sites were retained for AFS analysis across 80 mosquitoes Africa-wide, divided into equal numbers of dead and alive across the continent and in each country (see methods).In each country, 91,619, 90,340, and 49,762 polymorphic biallelic sites were detected respectively in Malawi, Uganda and Cameroon (Table S1).Identified polymorphic sites are divided between genes targeted by enrichment sequencing including 61 genes on the X chromosome, 431 genes on Chr2 and 315 genes on Chr3 (807 in total) (Table S1).Many of those genes belong to detoxification gene families, including P450 (66 genes), GSTs (12 genes), COEs (5 genes), UGTs (12 genes), ABC transporters (14 genes), and some of the remaining 698 genes could also be associated with pyrethroid resistance (Supplementary Text 2).Low polymorphism is present in samples collected from Malawi compared to populations from other countries.Additionally, low polymorphism was detected between laboratory colonies FANG and FUMOZ compared to all field isolates, where 28,566 polymorphic sites were detected.A low level of polymorphism in the field population from Malawi and the FANG laboratory colony, originally from Angola, is expected when FUMOZ, originally from Mozambique, is used as a reference genome since all populations are from Southern Africa (Table S1).
Africa-wide gene-wise F ST test between dead and alive mosquitoes across the investigated countries detected a significant gene-wise F ST value of 0.07463 for UGT301C2 on the top 0.5 quantiles (Chromosome 0.95 quantiles = 0.048) (Fig. 3) (Table S2 and S3).To some extent, a geographical pattern of elevated gene-wise F ST for UGT genes was evident in samples collected from Southern African populations (FANG, FUMOZ and Malawi) with elevated gene-wise F ST for UGT301C2 and UGT314A3, compared to Uganda (East Africa) and Cameroon (Central West Africa) with a shared high gene-wise F ST for UGT306C2 (Fig. 3) (Table S3).The commonly overexpressed UGT310B2 was not included in the targeted sequencing to investigate if there is a link between differentiation and overexpression.UGT301C2 which is overexpressed in the FUMOZ colony is highly differentiated between FUMOZ and FANG populations but not significant.Gene-wise Tajima's D density curve of genes included in the enrichment sequencing may reflect the demographic history of An. funestus population in those countries (Fig. and Supplementary Fig. 7).Fig. 1.Phylogenetic tree of 123 UGT sequences from An. funestus, Ae.Aegypti, An. gambiae, D. melanogaster.Phylogenetic analysis was constructed using FastTree method.Fast tree support values that are larger than 0.5 are marked on the corresponding branch.UGT names start with the species initials An. funestus (Af), Ae.Aegypti (Aa), An. gambiae (Ag), and D. melanogaster (Dm), followed by the family classification (numbers), the subfamily classification (letters), and the unique numbers.The numbers on the outer circle refer to the family classification.

Gene-wise differentiation and selection in laboratory colonies (FANG and the FUMOZ) and Malawi
Gene-wise F ST differentiation between the susceptible laboratory colony the FANG and the resistant laboratory colony the FUMOZ did not detect UGTs with significant differentiation among the top 0.05 quantiles in respective chromosomes, but UGT301C2 and UGT314A3 on chromosome 2 are the most differentiated UGT genes (Fig. 3) (Table S2).The average gene-wise F ST values in the analysis between FANG and FUMOZ were higher than all the other analyses between putatively susceptible and resistant from other countries, revealing the genetic difference between the two geographically distant isolates (Fig. 3) (Table S2).Furthermore, in Malawi (South Africa), significant differentiation of UGT genes UGT301C2 on chromosome 2 and UGT302A3 on chromosome 3 was identified, among the top 0.05 quantiles on respective chromosomes (0.95 quantiles of chromosome 2 = 0.069 and chromosome 3 = 0.085), with gene-wise F ST of 0.9142 and 0.12021 respectively.(Fig. 3 and Table S3).Meanwhile, differentiation of UGT314A3 on chromosome 2 in populations from Malawi was similar to the observed differentiation between laboratory colonies (the FANG and the FUMOZ) only higher than 80% of gene-wise F ST values and not significant (Fig. 3).
In all populations from Southern Africa including Malawi, FANG colony (derived from Angola) and FUMOZ colony (derived from Mozambique) Tajima's D density curves were close to equilibrium, represented by the coalescent simulation curve (Fig. 4B).In the FUMOZ population, 7 UGT genes out of the 12 UGTs targeted in this study have a high gene-wise Tajima's D than 0.95 quantiles of simulated Tajima's D values and the gene-wise Tajima's D average in FUMOZ is higher than the average of simulated Tajima's D for all chromosomes (Fig. 4) (Table S4 and S5).Similarly, in the susceptible FANG population, Tajima's D average for all chromosomes is higher than the coalescent simulation average (Table S4).This observation may indicate a strong balancing selection or decrease in population size (sudden population contraction) in the laboratory colony populations, probably introduced by laboratory propagation.However, in Malawi, UGT301C2 has a negative Tajima's D value of − 1.8243 below the 0.05 quantiles of simulated Tajima's D, deviating from the empirical distribution of genewise Tajima's D values on chromosome 2 indicating a recent selective sweep driven by directional selection (Fig. 4, Table S5).

Gene-wise differentiation and selection in central West Africa (Cameroon) and East Africa (Uganda)
When comparing susceptible and resistant populations from Uganda, UGT306C2 on chromosome 3 had a significant differentiation with gene-wise with F ST value of 0.0494 within the top 0.05 quantiles, while in Cameroon gene-wise F ST for UGT306C2 was high but not significant (Fig. 3, Table S3 and Supplementary Dataset 5).
In Uganda and Cameroon, gene-wise Tajima's D values were predominantly skewed towards negative values.In Cameroon, the average and median of gene-wise Tajima's D per chromosome are below the 0.05 quantiles of simulated Tajima's D for corresponding Chromosomes.There were 7 UGT genes in Cameroon below 0.05 quantiles of simulated Tajima's D and most UGT genes in Uganda have negative Tajima's D values but not within the lowest 5% of simulated Tajima's D for corresponding chromosomes (Fig. 4, Supplementary Fig. 7, Table S4 and  S5).

Analysis of UGTs polymorphism across Africa identified nonsynonymous SNPs potentially associated with resistance
Low gene-wise nucleotide diversity was detected in the susceptible FANG population for all UGT genes compared to other populations probably due to the lack of selection pressure and consistent population genetic drift introduced by laboratory maintenance of the colony (Fig. 5A).The low diversity of the FANG UGTs haplotype is evident in haplotype networks were FANG haplotypes mostly cluster together.Gene-wise nucleotide diversity varies between UGT genes, and the number of polymorphic substitutions relates to the gene size (Fig. B).Analysis of the Templeton, Crandall and Sing (TCS) haplotype tree for UGTs targeted by the enrichment sequencing from 80 mosquitos (160 haplotypes) highlights the high polymorphism of UGTs across Africa with many singleton haplotypes separated by many mutational steps, except for UGT306C2, UGT306A3 and UGT301C2 where predominant haplotypes shared by different populations were detected (Fig. 5C-E).
To identify SNPs that are potentially associated with pyrethroid resistance within UGTs, we identified significantly differentiated SNPs based on p-values quantifying differences in allele frequencies (pFst) between susceptible and resistant populations in each country (Table S6 and S7) In comparison between laboratory colonies FANG and the FUMOZ, 30.5% of targeted UGT SNPs showed significant differentiation, including 26 non-synonymous variants.In Malawi, 11.6% of UGT SNPs are significantly differentiated, with 6 genes containing 14 significant nonsynonymous SNPs.Cameroon shows 5.4% of significantly differentiated UGT SNPs, including 4 nonsynonymous SNPs on 4 genes.Uganda exhibits 8.6% of highly divergent UGT SNPs, with 5 significant nonsynonymous SNPs on 4 UGTs (Supplementary Fig. S8 and S9).
We focused our investigation on significantly differentiated nonsynonymous SNPs, especially non-synonymous SNPs that could potentially affect the enzyme catalytic activities occurring close to the conserved motifs involved in binding to the glycosyl donor (Fig. 6).Three Nonsynonymous SNPs were detected at the conserved motif of three different UGTs occurring only in susceptible populations (Supplementary Fig. 11) (Table S8).In addition, nonsynonymous changes were detected between sugar donor binding residues 1 and 2 (DBR1 and 2) in UGT306C2 (c.945 T > A, p.His315Gln) only in 3 haplotypes from the resistant Malawi population, and in UGT302A3 (c.962A > G, p. Asn321Ser) occurring in a frequency of 7 haplotypes from the total 20 haplotypes of the FUMOZ population.Other mutations were detected outside the conserved domain on the signal peptide, transmembrane domain, carboxylic tail, and the N-terminal domain that is believed to determine substrate specificity.Notably, a nonsynonymous SNP on the N-terminal domain of UGT36C3 (c.226A > G, p.Thr76Ala) that is fixed in the resistant population from Uganda and occurs in 17/20 haplotype of the putatively susceptible populations (Table S8).
Overall, the most highly differentiated SNPs in analysis between susceptible in resistant populations from the three countries belong to detoxification genes other than UGTs except for Malawi, where SNPs on UGT302A3 are the most highly differentiated SNPs (Supplementary Fig. 2. Differentially expressed An. funestus UGT genes in response to permethrin resistance.(A) Differentially expressed genes (DEGs) between the field-resistant population after exposure to permethrin and the unexposed population from the same country were identified in Malawi but not in Cameroon and Uganda.(A) Comparing the transcriptional profile of resistant populations from Malawi, Cameroon, Uganda and FUMOZ to that of susceptible FANG population identified Africawide upregulation of UGT310B2.The red arrow pointing upwards indicates upregulation and the blue arrow pointing downwards indicates downregulation.(B) All An. funestus UGTs (x-axes) plotted against fold change (FC) (y-axes) obtained from DEGs analyses colour coded according to the plot legend highlights the Africa-wide upregulation of UGT310B2 with a maximum fold change of 5.6 in resistant population from Malawi when compared with transcription profile of FANG.UGT310B2, UGT306D2, UGT301E3, and UGT315A3 have a significant overexpression in Malawi resistant population when compared with the FANG profile and with the unexposed population from Malawi highlighting the potential role of UGTs in detoxification in Malawian population.(C) The overexpression of UGT310B2 was detected in the FUMOZ colony using qPCR.However, due to the relative overexpression of UGT301C2 from the FUMOZ RNAseq profile compared to the FANG detecting the overexpression in the FUMOZ colony by qPCR was challenging.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) T. Al-Yazeedi et al.

Discussion
It is essential to improve our understanding of insecticide resistance since malaria prevention is heavily dependent on insecticide-based interventions.Although other gene families such as cytochrome p450s or GSTs have been extensively studied, the role of UGT genes in pyrethroid resistance remains largely unexplored in malaria vectors.Our study provides a comprehensive genome-wide characterisation of the UGT gene family in An. funestus and investigates their expression and selection in geographically distinct populations.A major outcome of the Fig. 3. Gene-wise F ST for all genes included in the targeted sequencing in all analyses between resistant and susceptible.The average and 0.95 quantiles of gene-wise F ST for each chromosome are represented by the blue horizontal dotted line and the red dotted line respectively.The genomic location of all genes in the targeted region (x-axis), represented by a circle, was plotted against the Gene-wise F ST (y-axis).UGT genes included in the targeted sequencing are colour coded according to the plot legend.Raw data of all the genes included in the targeted sequencing is in (Supplementary Dataset 5), averages and 0.9 quantiles for each comparison are in (Table S2), and UGT genes F ST values are in (Table S3).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) study was the detection of overexpressed and significantly differentiated UGTs in field-collected pyrethroid-resistant populations of An. funestus.

Characterisation and evolution of An. funestus UGTs
The number of UGT genes identified in An. funestus (27) is comparable to that of An. gambiae (26), however both of Ae. aegypti and D. melanogaster contain 35 UGT genes.An. funestus UGTs exhibit typical characteristics of enzymes that catalyse glycosylation at the C-terminal domain containing significant conservation of 29 amino acids signature motifs and UDP-glycosyl donor binding regions (DBR1 and DBR2), as described for other insects [27,45].On the other hand, the N-terminal domain is highly variable between UGTs and is believed to be involved in substrate specificity [38,44,46].The 'loose' fit of the substrate binding domain at the N-terminal provides a binding site for structurally diverse substrates by the same UGT isoform [22,27].Additionally, a signal peptide is located at the N-terminal indicating that UGTs protein precursor is destined towards the secretory pathway, while the retention of those proteins inside the ER membrane is mediated by the C-terminal hydrophobic transmembrane domain that spans the membrane with a group I topology.Most of the UGT protein resides in the ER and only a small portion of the protein, particularly the cytoplasmic tail, resides in the cytosol [27,58].
Previous phylogenetic studies of insect UGTs indicate that UGT50 is the only family that is found universally in all insect species [46].UGT50 family contains one UGT gene from each Diptera species selected for this investigation and the branching pattern of the UGT50 mirrors the species phylogeny among the four Diptera species investigated in this study [46,76].The UGT308 family is the largest in An. funestus by gene number and in the overall phylogenetic tree and, where nine An. funestus UGTs clustered within this family.The expansion in the UGT308 family was potentially driven by divergence to increase the number of substrates binding for glycosylation and gene duplication, as illustrated by the subfamily UGT308G in An. funestus and UGT308J in Ae.Aegypti.Based on previous investigations in mosquitoes, genes belonging to the UGT308 family were speculated to be involved in insecticide resistance, and the gene expansion may have evolved to resist consistent exposure to insecticides [38,77].However, in our investigation, we did not detect overexpression of An. funestus UGT308 genes in resistant field-collected populations.

Overexpression of UGT310B2 Africa-wide in pyrethroid-resistant populations was detected
In this study, DNA and RNA PoolSeq were analysed to identify an association between An. funestus UGTs and pyrethroid resistance.The investigation started by calculating the gene-wise ratio between nonsynonymous and synonymous polymorphisms pN/pS from pooled DNA.The pN/pS genetic test is a powerful population genetic test that requires few assumptions and is a good indicator of selective pressure at a  Targeted UGTs polymorphism and haplotype network analysis.UGTs gene-wise nucleotide diversity was calculated between (resistant and susceptible) populations and within populations from Malawi, Cameroon, Uganda and laboratory colonies the FANG and the FUMOZ (A).The number of polymorphic substitutions for UGTs in each country reflects the gene size (Table 1) (B).TCS haplotype network reveals limited haplotype clustering in UGT301C2 (C), UGT306A3 (D) and UGT306C2 (E) where predominant haplotypes shared by different populations were detected.The predominant haplotype in UGT301C2 was observed in (40/ 140) haplotypes from Cameroon (9), Uganda (14), Malawi (12) and FUMOZ (5).While 18/20 FANG haplotypes cluster together in three predominant nodes separated by a single mutation step each contains (9/20), (7/20), and (2/20) haplotypes respectively (Fig. 5C).In UGT306A3 two main haplotype clusters separated by a single mutation step were detected.The biggest cluster contains (45/160) haplotypes where all the 20 FANG haplotypes cluster, 9/20 FUMOZ and 15/40 haplotypes from Malawi.The second cluster contains (41/160) haplotypes including 12/40 from Malawi, 16/40 from Cameroon and 7/40 from Ugandan population (Fig. 5D).In UGT306C2 many haplotype clusters were observed the most dominant is 23 haplotypes cluster specific to FUMOZ (7/20), FANG (6/20) and Malawi (11/40).gene level [78].pN/pS detected what could be a stabilizing or a purifying selection acting against changes in the protein sequence in all UGTs from field collected mosquitoes from Malawi, Cameroon and Uganda.The pN/pS analysis is limited by only detecting selection pressure in protein-coding regions, however, evolutionary changes in regulatory regions of genes associated with permethrin detoxification may affect the timing and expression level of those genes.Such changes cannot be detected by gene-wise population genetic tests using pooled DNA.Therefore, we investigated the transcriptional regulation of An. funestus UGT genes in response to permethrin exposure using total RNA pools.Pooled RNA samples derived from several biologically similar animals, compared to sequencing corresponding individuals individually, reduce the cost of sequencing while retaining similar biological information.
Insecticide resistance surveillance detected an increase in insecticide resistance in Malawi, Cameroon and Uganda in recent times [7,9,13,17].Changes in transcriptional regulation in response to permethrin exposure between resistant populations and unexposed populations from the same territory were more evident in Malawi than in Uganda and Cameroon.High resistance levels in Uganda and Cameroon may have obscured the difference in gene expression of detoxification genes induced by permethrin exposure.On the other hand, comparing the transcriptional profile of field-resistant populations and the laboratoryresistant (FUMOZ) to that of the laboratory-susceptible colony (FANG) detected a pronounced difference in detoxification genes expression [17].We detected overexpression of UGT310B2 (AFUN011266) across all pyrethroid-resistant populations.UGT overexpression in other mosquito vector populations, such as An.sinensis and A. gambiae, have been reported and linked to pyrethroid resistance [38,77].Although the upregulation of UGTs expression in field-resistant mosquitoes was not as prominent as Phase I detoxification enzymes cytochrome P450s, UGTs still play an important role in the metabolic breakdown of pyrethroid insecticides (Supplementary Fig. S3).UGTs conjugate polar hydrophobic compounds with hydrophilic glycosyl groups, producing more polar metabolites that can be easily excreted from the cell by export transporters [22,[27][28][29].Pyrethroids are mostly composed of nonhydrocarbon chains and cyclic easter or acid groups and typically lack a polar group that can be glycosylated by UGTs [79,80].Therefore, in the detoxification process of pyrethroids, they undergo initial oxidation by phase I cytochrome P450s, resulting in the production of more polar and reactive metabolites that potentially can be conjugated by UGTs [81].
The overexpression of CYPs compared to other detoxification enzymes indicates their primary role in the detoxification process of pyrethroids.While relative overexpression of UGTs compliments their roles as secondary enzymes interacting with the oxidised pyrethroid, a by-product of CYPs.Previous studies have identified overexpression of An. funestus cytochrome P450 genes, including CYP6P9a, CYP6P9b, CYP9J11, CYP6Z1, CYP6M7 and CYP9K1 in resistant populations.The recombinant protein of those genes expressed in vitro metabolises permethrin with significant depletion, therefore reducing the efficacy of permethrin-treated bed nets [7,8,14,17,25,26].The identified overexpression of UGT genes in this study is relevant to current vector control strategies and management.UGTs may contribute to crossresistance against polar insecticides that can be directly glycosylated enhancing their solubility and excretion.Contribution of UGTs to resistance against other insecticides that contain a potential glycosylation site has been reported in the Diamondback moth Plutella xylostella resistance to chlorantraniliprole [39], the housefly Musca domestica resistance to organophosphate [41], and the greenfly cotton aphid Aphis gossypii resistance to neonicotinoid and spirotetramat [42,43].

Targeted sequencing reveals differentiated UGTs and population structure in investigated countries
Sequencing DNA from pools of individuals is a cost-effective approach for conducting population genetics studies, which are otherwise economically challenging to sequence many individual genomes at high coverage [82].However, to avoid pooling biases and detect SNPs at an individual resolution while maintaining a costeffective approach, SureSelect target enrichment sequencing was implemented on individual mosquitoes.Polymorphism detected in the targeted region reflects the geographical distance between the collected populations and the FUMOZ reference genome originally from Mozambique (Southern Africa).As expected, populations from Southern Africa Malawi and the FANG colony (originally from Angola) harbour lower polymorphism than populations collected from Uganda and Cameroon.The gene-wise differentiation (F ST ) for UGT genes is generally lower than other detoxification genes in all analyses except in Malawi, where UGT302A3 and UGT301C2 are the most differentiated detoxification genes, highlighting the potentially important role of UGTs in detoxification in this region.In Uganda and Cameroon, the gene-wise differentiation (F ST ) of some cytochrome p450s and GSTs genes were among the highly differentiated genes illustrating their important role in the detoxification mechanism of pyrethroid in those regions (Supplementary Dataset 5).
A genome-wide process could be inferred from gene-wise Tajima's D density curves for genes in targeted regions revealing a history of An. funestus population in investigated locations, with a likely population expansion after a recent bottleneck effect in Cameroon and Uganda.Gene-wise Tajima's D density curves for populations from Cameroon and Uganda are skewed towards negative values, whereas populations from Southern Africa Malawi, FANG (Angola) and FUMOZ (Mozambique) were close to equilibrium, represented by the coalescent simulation curve.Results in this study, support previous findings of population expansion in the western part of An. funestus population range, west of the African Rift Valley [18,83].A similar pattern is found in co-distributed malaria vectors An. gambiae and An.coluzzii, suggesting that those species have responded to common geographic constraints with a population expansion north of the Congo basin and west of the East African Rift Valley [84].

Low-frequency nonsynonymous SNPs within UGTs in field populations show a limited directional selection
Haplotype clustering of UGTs did not reveal a strong directional selection in any of the investigated populations.UGT haplotypes clustered randomly with many singleton haplotypes separated except for UGTs that show significant gene-wise divergence or a potential genelevel selective sweep inferred from Tajima's D. A dominant haplotype was detected in UGT301C2 that is significantly differentiated gene-wise in analysis between resistant and susceptible populations in Malawi and Africa-wide.A potential gene-level selective sweep is detected in Malawi at UGT301C2, while in Cameroon the significantly negative Tajima's D is not a deviation from the genome-wide trend.Haplotype clustering was detected for UGT306A3 and C2 both genes recorded significant Tajima's D in Cameroon while UGT306C2 is significantly differentiated between resistant and putatively susceptible populations from Uganda.The haplotype clustering for UGT301C2 and UGT306A3 revealed limited directional selection with a predominant haplotype detected in fieldcollected populations, while UGT306C2 haplotypes from field populations cluster in more than one node.The limited selection in UGTs might reflect their role in pyrethroid detoxification as phase II enzymes in the detoxification pathway of pyrethroid compared to the previously described selection for CYP9K1 in Uganda, and CYP6P9A and CYP6P9B in populations from Southern Africa (Supplementary Fig. 10) [7,10,11,14].
The resulting averaged values of gene-wise F ST can mask or lower the magnitude of a positive selection where a combination of positive and negative selection within the gene at different times along its evolution may cancel each other out.Most of the significantly differentiated SNPs in all investigated populations are synonymous and intronic.Therefore, we focused the investigation on significantly differentiated nonsynonymous at the conserved motifs.Nonsynonymous mutations causing amino acid changes at conserved motifs acting as a binding site to the UDP-glycosyl donor are found in low frequencies in field-collected populations but with a significant differentiation between putatively susceptible and resistant populations.Other nonsynonymous mutations were detected outside the conserved motifs at the signal peptide, transmembrane domain, carboxylic tail, and the N-terminal domain.
Changes in the N-terminal domain might have a detrimental effect on substrate specificity.However, the N-terminal domain generally exhibits greater sequence divergence between UGT isoforms to enable the glycosylation of diverse substrates by the same UGT isoform [22,27].Research on allelic variations in human UGTs indicated that amino acid substitutions at key residues can affect the enzyme-substrate selectivity, glycosylation activity and overall drug metabolism [85][86][87][88].Unless those nonsynonymous mutations identified in this research are validated using recombinant proteins substantiating their functional role in pyrethroid detoxification is challenging.To capture the complete effect of those nonsynonymous sites in UGTs activity we recommend using a baculovirus expression system in insect cells to account for posttranslational modifications that have a significant impact on their activity [89].Post-translational modification of UGTs specifically Nglycosylation was demonstrated to be important for UGT protein folding and catalytic activity [85,86,89].
It has been established that direct selection of nonsynonymous SNPs causing amino acid alterations in An. funestus detoxification genes such as P450s [7,8,90] and GSTs [15] enhance their detoxification activities.Despite the low frequency of the UGTs nonsynonymous SNPs outlined in this paper they are the first nonsynonymous SNPs to be reported in An. funestus UGTs.The selection detected in An. funestus UGTs nonsynonymous SNPs is limited and further illustrate the secondary role of An. funestus UGTs in pyrethroid resistance.However, the outlined significantly differentiated UGTs nonsynonymous SNPs will help future prediction of cross-resistance to insecticides that can be directly detoxicated by UGTs.

Conclusion
In this study, we have provided a comprehensive investigation of the potential role of UGTs in pyrethroid resistance in resistant laboratory colonies and field-collected An. funestus populations.Findings in this study have important implications for current An.funestus vector control strategies and management as UGT enzymes may confer cross-resistance to other polar insecticides, which can be directly detoxified by UGTs.Notably, we have identified a common overexpression of UGT310B2 across various regions in Africa.The overexpression of UGT310B2 in the FUMOZ colony was confirmed using quantitative PCR.The increased expression of UGT genes implies a mechanism by which mosquitoes can rapidly metabolise and eliminate pyrethroid with a risk of crossresistance to other polar insecticides.This compromises the ability of pyrethroid-based interventions and reduces their effectiveness in controlling An. funestus population.The gene-wise (F ST ) differentiation for UGT genes is generally lower than other detoxification genes in all analyses except in Malawi, where UGT302A3 and UGT301C2 are the most differentiated detoxification genes, highlighting the potential role of UGTs in detoxification in this region.In addition, SNPs belonging to UGT302A3 were the most highly differentiated SNPs in Malawi between resistant and putatively susceptible based on p-values quantifying differences in allele frequencies (pF ST ).In Uganda, a significant gene-wise F ST differentiation was detected in UGT306C2.The high gene-wise F ST divergence of UGT301C2 and UGT306C2 was supported by a limited selection with limited clustering of haplotypes from those regions in the haplotype tree.Gene-wise Tajima's D density curves infer genome-wide process and reveal population structures of An. funestus population from the three countries, supporting previous observations.In addition, this study identified significantly differentiated UGTs nonsynonymous SNPs that might implicate UGTs detoxification activities and produced detailed records of those SNPs.
This study highlights the complexity of insecticide resistance and the geographic-specific variability in resistance mechanisms.This information is crucial for tailoring vector control strategies to specific regions and populations.Further investigations using more recent fieldcollected populations across Africa should be carried out to explore the role of UGT genes in pyrethroid resistance and predict their potential for cross-resistance to other insecticides.

Fig. 9 )
Fig. 9) (Supplementary Dataset 7).A nonsynonymous SNP in the Nterminal of UGT302A3 (c.334C > G, p.Gln112Glu) is among the most highly differentiated SNPs in Malawi, occurring in 12 haplotypes of the resistant population and only 1 haplotype of the susceptible population.While in Cameroon the most highly differentiated SNPs in detoxification genes belong to CYP304B1 and CYP6AK1 and in Uganda belong to GSTE7 and CYP6Z1 (Supplementary Dataset 7).

Fig. 4 .
Fig. 4. Tajima's D for UGT genes and density curves of all the genes included by the targeted enrichment sequencing for all the populations.In populations from Southern Africa Malawi (MAL) (orange), FANG (Angola) (Yellow) and FUMOZ (Mozambique) (Blue) Tajima's D density curves were close to equilibrium (Black) (B).In Malawi, UGT301C2 is impacted by a recent selective sweep with a negative below the 0.05 quantiles of simulated Tajima's D (A).In Uganda (UG) and Cameroon (CAM), gene-wise Tajima's D was predominantly skewed towards negative values.In Cameroon, 7 UGT genes are below 0.05 quantiles of simulated Tajima's D and most UGT genes in Uganda are with a negative Tajima's D but not within the lowest 5% (A).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 6 .
Fig. 6.Significantly differentiated UGT genes nonsynonymous SNPs between laboratory colonies and in each country.Non-synonymous SNPs that are significantly differentiated between susceptible and resistant in each analysis are colour-coded according to the associated UGT gene.The red dotted line indicates the significance level at p-value 0.05 (− log(0.05)= 2.99).Further details of those nonsynonymous SNPs are included in Table S8 and the location of those SNPs relative to UGTs conserved sites is highlighted in Supplementary Fig. 11.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)