Genetic diversity of next generation antimalarial targets: A baseline for drug resistance surveillance programmes

Drug resistance is a recurrent problem in the fight against malaria. Genetic and epidemiological surveillance of parasite alleles conferring antimalarial resistance is crucial to guide drug therapies and clinical management. New antimalarial compounds are currently at various stages of clinical trials and regulatory evaluation. Using ∼2000 Plasmodium falciparum genome sequences, we investigated the genetic diversity of eleven gene-targets of promising antimalarial compounds and assessed their potential efficiency across malaria endemic regions. We determined whether the loci were under selection prior to the introduction of new drugs and established a baseline of genetic variance, including potential resistant alleles, for future surveillance programmes.


Introduction
The continuous emergence and spread of resistance to first line antimalarial treatments, including artemisinin and its derivatives, threatens global efforts to reduce the burden of malaria. The development of a fully effective vaccine has been hampered by the complex life cycle of the malaria parasite and the high genetic diversity of key parasite antigens. Thus, antimalarial drugs, particularly those targeting basic cellular machinery common to all stages of the parasite life cycle, are the most promising approaches to control malaria.
The pipeline of antimalarial drugs has greatly expanded over the past decade, particularly because of strong public-private partnerships and significant investment in innovative technologies (Flannery et al., 2013; Wells et al., 2015). A set of next generation antimalarial compounds, for which the molecular targets are known or being investigated, is currently at various phases of preclinical and clinical assessment (Wells et al., 2015).
Knowledge of parasite molecular drug targets can be exploited to monitor the potential emergence and spread of resistant alleles, particularly from the introduction of a drug, and rapidly inform local policies to tailor interventions. Without knowledge of antimalarial gene targets, the identification and surveillance of resistant alleles need to be based on accurate clinical drug efficacy trials and genome-wide population genetic studies of field collected samples (Anderson et al., 2011). This approach can be both costly and labour intensive. Alternatively, a powerful strategy to identify mutations linked to resistance, prior to the licensing of a drug, is to expose laboratory-adapted strains to sub-lethal and increasing concentrations of the drug to induce selection in vitro. This strategy led to the identification of polymorphisms in the Plasmodium falciparum (P. falciparum) kelch13 gene underlying resistance to artemisinin (Ariey et al., 2014). The role of this gene was subsequently confirmed in association studies of field collected samples (Miotto et al., 2015) and using a reverse genetics approach (Ghorbal et al., 2014).
Here we consider eleven gene-targets of key investigated compounds that, due to their efficiency, might become the next antimalarial drugs, and for which mutations conferring resistance have been identified in in vitro studies (Baragaña et al., 2015; Dong et al., 2011; Flannery et al., 2015; Herman et al., 2015; Kato et al., 2016; LaMonte et al., 2016; Lim et al., 2016; McNamara et al., 2013; Ross et al., 2014). These eleven genes were also selected because they are gene-targets for a range of new antimalarial compounds already under evaluation in clinical trials. We present a survey of the natural genetic variation (SNPs, insertions and deletions (indels), copy number variants (CNVs)) and diversity in these gene-targets using a publicly available global collection of ∼2000 P. falciparum "field" parasite genomes from 18 countries. We use the variation to establish whether these regions are already under selective pressure, and report a baseline reference to assist future surveillance programmes in detecting the emergence of resistance mutations.
Sequencing data was generated by the Pf3k project (www.malariagen.net/pf3k), is open access and is described in Miotto et al. (2015). Whole genome analysis of these data has also been recently described (Ravenhall et al., 2016) and we used a set of characterised high quality SNPs and indels identified in the eleven candidate target genes. In addition, larger structural variants (e.g. CNVs) in these regions were identified using Delly software (Rausch et al., 2012). Using the SNP variants, population genetic analyses were performed to establish whether the targeted coding regions are under selection. In particular, the Tajima's D method was applied to detect regions under balancing selection (R package pegas); extended haplotype homozygosity approaches (|iHS|, XP-EHH) were applied to identify long-range positive directional selection; and FST statistics were used to assess population differentiation (see Ravenhall et al. (2016) for a detailed description of these methods).
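To illustrate the Tajima's D statistic mentioned above, the following Python sketch computes D from a small set of biallelic haplotypes using the standard textbook definition. It is not the pegas implementation used in the analysis: the function name `tajimas_d` and the encoding of haplotypes as equal-length strings of "0" and "1" are assumptions made only for this example.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Return Tajima's D for equal-length 0/1 haplotype strings.

    Returns None when D is undefined (fewer than 4 sequences, or no
    segregating sites). Hypothetical helper, for illustration only.
    """
    n = len(haplotypes)
    if n < 4:
        return None  # the variance term vanishes for n < 4
    length = len(haplotypes[0])
    # Count of the "1" allele at each site
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(length)]
    seg = [k for k in counts if 0 < k < n]  # segregating sites only
    s = len(seg)
    if s == 0:
        return None
    # Mean number of pairwise differences (pi), summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its s.d.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

In this sketch, a sample in which every variant is a singleton gives a negative D (an excess of rare alleles), whereas variants at intermediate frequency give a positive D.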

Results
Across the eleven gene-targets, a total of 778 SNPs were identified, with just over half (n = 424, 54.5%) leading to non-synonymous changes (Table 1, Supplementary Table 1). The overall genetic diversity was low, with the majority of SNPs (75.1%) having minor allele frequencies of less than 5%. The SNP density (expressed as base pairs per SNP) across genes was similar (~1 SNP per 33 bp), except for those coding for the ras-related protein (Rab11A, 1 SNP per 258.6 bp), elongation factor 2 (eEF2, 1 SNP per 73.4 bp) and the acetyl-CoA transporter (ACT, 1 SNP per 64.2 bp), all with lower density, suggesting greater gene conservation. The pfact gene was recently identified to be the target, together with the UDP-galactose transporter gene-target (Pfugt), of a variety of imidazolopiperazine compounds. One of these compounds (KAF156) has potent activity against gametocytes and parasite liver stages, and is currently in Phase II clinical trials. Rab11A is a molecular target for aminopyridine class compounds (McNamara et al., 2013), and eEF2 is the target for quinoline-4-carboxamide (DDD107498) compounds, both with activity against multiple parasite lifecycle stages (Baragaña et al., 2015). The eEF2 protein mediates GTP-dependent translocation of the ribosome along the mRNA and is required during protein synthesis. The Rab11A protein is likely involved in cytokinesis and interacts with another antimalarial gene-target, Pfpi4k (McNamara et al., 2013). Only one non-synonymous SNP was detected for each of these two genes, supporting their likely essential function. A low proportion of non-synonymous SNPs (19.6%) was also detected for the mitochondrial cytochrome b (MtcytB) gene. This gene is the target for several antimalarial compounds under evaluation (Dong et al., 2011) and for atovaquone, a longstanding antimalarial drug used in combination with proguanil in Malarone™ for the curative and prophylactic treatment of malaria.
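The summary quantities reported above (percentage of non-synonymous SNPs, base pairs per SNP, and minor allele frequency) are simple ratios, sketched below in Python. The function names are hypothetical, the gene length used in the usage note is an illustrative value rather than one taken from the study, and the minor allele frequency assumes one haploid genotype call per isolate.

```python
def percent(part, total):
    """Percentage of `part` within `total`."""
    return 100.0 * part / total


def bp_per_snp(gene_length_bp, n_snps):
    """SNP density expressed as base pairs per SNP."""
    return gene_length_bp / n_snps


def minor_allele_frequency(alt_count, n_samples):
    """Frequency of the rarer allele at a biallelic site.

    Assumes one haploid call per isolate (an assumption of this sketch).
    """
    freq = alt_count / n_samples
    return min(freq, 1.0 - freq)


# 424 non-synonymous SNPs out of 778, as reported in the text
nonsyn_pct = round(percent(424, 778), 1)
```

For example, `nonsyn_pct` evaluates to 54.5, matching the reported figure; a hypothetical 3300 bp gene carrying 100 SNPs would give `bp_per_snp(3300, 100) == 33.0`, and a variant seen in 30 of 2000 isolates has a minor allele frequency of 0.015, below the 5% threshold.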
The Pfpi4k gene has the highest percentage of non-synonymous SNPs (71.4%), and encodes a lipid kinase that is a cellular target of imidazopyrazines and quinoxaline compounds (McNamara et al., 2013). Its product probably acts in the Golgi complex and regulates essential membrane trafficking events (McNamara et al., 2013). We also detected a high proportion of non-synonymous SNPs for the PfcPhers (62.7%), Pfatp4 (60.9%) and Pfcarl (59.2%) genes. Pfatp4 and Pfcarl have been extensively studied as antimalarial targets. Pfcarl encodes an uncharacterized protein that also localises to the Golgi apparatus of the parasite, and the Pfatp4 product probably functions as a Na+-efflux ATPase (Flannery et al., 2015; LaMonte et al., 2016). Several mutations in these genes, particularly in Pfatp4, have previously been reported to confer resistance to a growing number of structurally unrelated antimalarial compounds (Table 2). None of these mutations was identified in the set of global field isolates considered here. However, several non-synonymous mutations were observed in their vicinity (Table 2). PfcPhers is a recently discovered gene-target that can be inhibited by a novel compound (bicyclic azetidine BRD3444) with action in all parasite life stages (liver, blood, and transmission), and with the advantage that it can act as a single low dose (Kato et al., 2016). One of the non-synonymous SNPs identified for this gene in the global dataset is a mutation (L550V amino-acid change) linked to in vitro resistance to BRD3444 (Kato et al., 2016). This mutation was detected in a few field isolates from the Democratic Republic of Congo (frequency 1.79%) and Ghana (0.5%) (Table 2). We also detected synonymous SNPs in a codon for which an amino-acid change (V545I) has also been implicated in resistance to BRD3444.
Although we did not find reported antimalarial resistance mutations in any of the other genes in the set of clinical isolates, we detected some mutations in their vicinity, including several less than two amino acids away (Table 2). The potential effect of these natural genetic variants on resistance to new antimalarial compounds should be investigated.
We also assessed the presence of indels and CNVs, as structural variants have been found to be associated with resistance to antimalarial treatments, including pfmdr1 for mefloquine and pfgch1 for sulfadoxine/pyrimethamine (Ravenhall et al., 2016). Copy number variation in the gene-targets Pfatp4, Pfpi4k and Pfdhodh, the latter coding for the enzyme dihydroorotate dehydrogenase (Ross et al., 2014), has also been identified in parasite lines resistant to antimalarial compounds. No CNVs were identified across the unique regions of the eleven candidate genes. Indels were detected in both homopolymeric and tandem repetitive regions, none of them changing the reading frame of the respective proteins.
We also investigated whether any of the gene-targets was under selective pressure. Tajima's D values were predominantly negative (82.7%; median -0.32, range -3.51 to 1.17), indicating an excess of rare alleles, consistent with a historical population expansion of P. falciparum and in keeping with results from genome-wide analyses (Ravenhall et al., 2016). There was no evidence of positive directional selection (all |iHS| < 2; median = -0.23, min = -1.58, max = 1.65). Overall, there was little evidence of selective pressure in the candidate regions, implying that they are likely to be evolving neutrally across geographical regions. A majority of SNPs (63.2%) were specific to a single country (Fig. 1), and 22.7% to a single continent. Using the 778 SNPs, a principal component analysis revealed clustering by continent (Fig. 2). The Pfatp4, Pfcarl and Pfpi4k genes contributed the most to the observed regional clustering (Supplementary Fig. 1). The FST measure was used to identify SNPs with allele frequency differences between countries and continents. This analysis revealed nine SNPs with FST > 0.45, with clear geographic allelic frequency differences (Supplementary Table 2), particularly differentiating African from Asian origins. These SNPs were localized in the Pfatp4, Pfcarl, Pfpi4k and Pfcprs genes. Pfcprs encodes a cytoplasmic prolyl-tRNA synthetase, a functional target of febrifugine and its synthetic derivatives, which have activity at erythrocytic and liver stages (Herman et al., 2015).
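As a rough illustration of how an FST value reflects allele frequency differences between populations, the Python sketch below applies Wright's heterozygosity-based definition, FST = (HT - HS) / HT, to a single biallelic SNP. The specific estimator used in the analysis is not stated here, so this should be read as one common formulation under assumed inputs; the function name `fst_biallelic` and the example frequencies are hypothetical.

```python
def fst_biallelic(freqs, sizes=None):
    """Wright's FST for one biallelic SNP across populations.

    freqs: allele frequency of the same allele in each population.
    sizes: optional sample sizes used as weights (equal weights if None).
    Illustrative sketch only; not a bias-corrected estimator.
    """
    if sizes is None:
        sizes = [1] * len(freqs)
    total = float(sum(sizes))
    # Pooled allele frequency and total expected heterozygosity (HT)
    p_bar = sum(p * w for p, w in zip(freqs, sizes)) / total
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    # Weighted mean within-population heterozygosity (HS)
    h_s = sum(w * 2.0 * p * (1.0 - p) for p, w in zip(freqs, sizes)) / total
    return (h_t - h_s) / h_t
```

With hypothetical frequencies of 0.9 in one population and 0.1 in another, this gives FST = 0.64, which would exceed the 0.45 threshold used above; identical frequencies in both populations give FST = 0.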

Discussion
Continuous monitoring of drug efficacy and genome selection pressure is crucial to ensure early detection and appropriate response to the emergence of drug resistance. We assessed eleven potential antimalarial gene-targets of compounds that are at various stages of testing, and for which mutations linked to resistance are known. The availability of whole genome sequencing data for worldwide field isolates enabled us to survey the genetic diversity in these targets. We identified one mutation associated with in vitro resistance to an antimalarial compound, at low frequency, in two African countries. We also identified several amino-acid changes in close proximity to resistance-linked mutations (eight non-synonymous substitutions detected less than two amino acids away). These and other mutations detected in these genes might have a role in the development of resistance, highlighting the need for drug screening with field isolates in addition to laboratory adapted strains. The high divergence of Plasmodium biology and the lack of crystallized protein structures hindered the assessment of the potential impact of the polymorphisms detected in these genes.
The genetic diversity described here may play a role once selection begins, and should be taken into account by surveillance programmes. From these observations, we speculate that for new antimalarial compounds acting on the PfcPhers gene, acquired resistance may occur more rapidly in the field, as pre-existing resistant alleles already circulate, albeit at low frequency, in clinical isolates. Thus, the identification of suitable partner drugs will be crucial to protect their efficacy. This approach has been effective in prolonging the use of some antimalarial drugs (e.g. atovaquone, artemisinin), which, despite resistance readily evolving in vitro and in the field, have been used as effective antimalarials in combination therapies.
For the Pfrab11A and PfeEF2 gene-targets, the particularly low genetic diversity and the detection of only one non-synonymous mutation could suggest that new antimalarials targeting these genes may have a longer lifespan in the field. Nevertheless, antimalarial resistance mutations have arisen in vitro in all these gene-targets. Although several of them produced low fitness mutants that might neither survive in the human body nor be transmitted, combination therapies should be considered to increase the useful therapeutic life of these new compounds.
With the continuous emergence of resistance to artemisinin derivatives, the introduction of new antimalarial drugs is urgent, and a priori knowledge of the parasite diversity that these drugs are likely to encounter will aid drug resistance monitoring programmes. Overall, the genetic information described here for eleven gene-targets across 18 countries in malaria endemic regions forms a baseline of diversity that can assist genetic surveillance studies in detecting allele frequency changes associated with the pressure imposed by a newly introduced drug.

Financial support
This work was supported by the Medical Research Council UK (Grant no. MC_PC_15103 to A.R.G., Grant no. MR/K000551/1, MR/M01360X/1, MR/N010469/1 to T.G.C. and S.C.) and by the Biotechnology and Biological Sciences Research Council (Grant Number BB/J014567/1 to M.R.).