Genetic markers associated with dihydroartemisinin-piperaquine failure in Plasmodium falciparum malaria in Cambodia: a genotype-phenotype association study.

BACKGROUND
As the prevalence of artemisinin-resistant Plasmodium falciparum malaria increases in the Greater Mekong subregion, emerging resistance to partner drugs in artemisinin combination therapies seriously threatens global efforts to treat and eliminate this disease. Molecular markers that predict failure of artemisinin combination therapy are urgently needed to monitor the spread of partner drug resistance, and to recommend alternative treatments in southeast Asia and beyond.


METHODS
We did a genome-wide association study of 297 P falciparum isolates from Cambodia to investigate the relationship of 11 630 exonic single-nucleotide polymorphisms (SNPs) and 43 copy number variations (CNVs) with in-vitro piperaquine 50% inhibitory concentrations (IC50s), and tested whether these genetic variants are markers of treatment failure with dihydroartemisinin-piperaquine. We then did a survival analysis of 133 patients to determine whether candidate molecular markers predicted parasite recrudescence following dihydroartemisinin-piperaquine treatment.


FINDINGS
Piperaquine IC50s increased significantly from 2011 to 2013 in three Cambodian provinces (2011 vs 2013 median IC50s: 20·0 nmol/L [IQR 13·7-29·0] vs 39·2 nmol/L [32·8-48·1] for Ratanakiri, 19·3 nmol/L [15·1-26·2] vs 66·2 nmol/L [49·9-83·0] for Preah Vihear, and 19·6 nmol/L [11·9-33·9] vs 81·1 nmol/L [61·3-113·1] for Pursat; all p≤10-3; Kruskal-Wallis test). Genome-wide analysis of SNPs identified a chromosome 13 region that associates with raised piperaquine IC50s. A non-synonymous SNP (encoding a Glu415Gly substitution) in this region, within a gene encoding an exonuclease, associates with parasite recrudescence following dihydroartemisinin-piperaquine treatment. Genome-wide analysis of CNVs revealed that a single copy of the mdr1 gene on chromosome 5 and a novel amplification of the plasmepsin 2 and plasmepsin 3 genes on chromosome 14 also associate with raised piperaquine IC50s. After adjusting for covariates, both exo-E415G and plasmepsin 2-3 markers significantly associate (p=3·0 × 10-8 and p=1·7 × 10-7, respectively) with decreased treatment efficacy (survival rates 0·38 [95% CI 0·25-0·51] and 0·41 [0·28-0·53], respectively).


INTERPRETATION
The exo-E415G SNP and plasmepsin 2-3 amplification are markers of piperaquine resistance and dihydroartemisinin-piperaquine failures in Cambodia, and can help monitor the spread of these phenotypes into other countries of the Greater Mekong subregion, and elucidate the mechanism of piperaquine resistance. Since plasmepsins are involved in the parasite's haemoglobin-to-haemozoin conversion pathway, targeted by related antimalarials, plasmepsin 2-3 amplification probably mediates piperaquine resistance.


FUNDING
Intramural Research Program of the US National Institute of Allergy and Infectious Diseases, National Institutes of Health, Wellcome Trust, Bill & Melinda Gates Foundation, Medical Research Council, and UK Department for International Development.


Introduction
of malaria. 14 The natural selection of low-frequency, preexisting resistance to piperaquine might also be occurring and further contributing to this problem.
Recent increases in treatment failures with dihydroartemisinin-piperaquine 4,9,11,12 and piperaquine 50% inhibitory concentrations (IC 50 s) 15 suggest that piperaquine resistance has emerged in Cambodia. These fi ndings, and the discovery that piperaquine-resistant parasites are sensitive to the former artemisinin combination therapy partner drug mefl oquine, 4 have recently led Cambodia's national malaria control programme and WHO to recommend artesunatemefl oquine as the fi rst-line artemisinin combination therapy in ten Cambodian provinces, including Pursat and Preah Vihear. Molecular markers are urgently needed for large-scale surveillance programmes to predict treatment failures with dihydroartemisinin-piperaquine in Cambodia and other countries in the Greater Mekong subregion, and investigate the molecular mechanism of piperaquine resistance.
Several genetic variations have been associated with decreased piperaquine susceptibility: a single copy of the mdr1 gene has been associated with dihydroartemisinin-piperaquine treatment failures in Cambodian patients, 4 whereas amplifi cation of a region downstream of mdr1 and the crt single-nucleotide polymorphism (SNP) C101F has been associated with raised piperaquine IC 50 s in vitro. 16 These genetic variants are not useful as molecular markers, however, because the fi rst is wild type, the second is very uncommon, and the third has not yet been observed in MalariaGEN P falciparum Community Project's global catalogue of variation in clinical samples. Another study 11 found that a triple mutation (kelch13-C580Y, MAL10:688956 SNP, and MAL13:1718319 SNP) linked with slow parasite clearance is associated with a 5·4 times greater risk of dihydroartemisinin-piperaquine treatment failure in Cambodia's Oddar Meancheay province; however, these SNPs were not linked to piperaquine resistance.
The purpose of this genome-wide association study (GWAS) was to discover molecular markers of dihydroartemisinin-piperaquine treatment failures in Cambodia. In designing it, we reasoned that such markers would associate with raised piperaquine IC 50 s, increase in prevalence over time in areas where artemisinin resistance is common, and be uncommon where

Research in context
Evidence before this study We searched PubMed using the terms "artemisinin", "piperaquine", "resistance", and "Cambodia" without any date or language restrictions. This search identifi ed 26 articles, fi ve of which reported original clinical trials that documented dihydroartemisinin-piperaquine failures in the treatment of uncomplicated Plasmodium falciparum malaria in Cambodia. Collectively, these studies reported failure rates of 11-54% for dihydroartemisinin-piperaquine in Pailin, Pursat, Oddar Meancheay, and Preah Vihear provinces in 2010-13, and associated an increased risk of treatment failure with several parasite genotypes and phenotypes: the presence of kelch13 gene mutations that confer artemisinin resistance; the presence of two mutations (MAL10:688956 and MAL13:1718319) linked to slow parasite clearance after artesunate treatment; the absence of mdr1 gene amplifi cation, a molecular marker of mefl oquine resistance; and decreased piperaquine susceptibility in vitro. These studies did not identify molecular markers of both dihydroartemisininpiperaquine failure and piperaquine resistance.

Added value of this study
This study, which investigated the whole-genome sequences and piperaquine susceptibilities of 297 P falciparum clinical isolates from Cambodia, provides two new molecular markers of piperaquine resistance and dihydroartemisinin-piperaquine failures. The fi rst marker is a single-nucleotide polymorphism, which encodes a glutamic acid-to-glycine aminoacid substitution, in a putative exonuclease gene. This exonuclease marker is associated with decreased piperaquine susceptibility in vitro, and a low rate (0·38) of dihydroartemisinin-piperaquine treatment effi cacy. This marker is found predominantly in Cambodia but not in other areas of the world where dihydroartemisinin-piperaquine has not been used. The second marker is an amplifi cation of the plasmepsin 2 and plasmepsin 3 genes. This plasmepsin marker is also associated with decreased piperaquine susceptibility in vitro, and a low rate (0·41) of dihydroartemisinin-piperaquine treatment success. This marker is a strong candidate for causal mutation because plasmepsins encode aspartic proteases, which are involved in the parasite's haemoglobin degradation pathway targeted by related antimalarial drugs such as chloroquine. These exonuclease and plasmepsin markers are commonly found together, along with kelch13 gene mutations, which confer artemisinin resistance, and a single copy of the mdr1 gene, which is associated with decreased piperaquine, but increased mefl oquine, susceptibility in vitro.

Implications of all the available evidence
In Cambodia, emerging treatment failure with dihydroartemisinin-piperaquine is associated with the presence of plasmepsin and exonuclease markers. The plasmepsin marker is a strong candidate for a functional polymorphism and can be used to monitor the spread of dihydroartemisinin-piperaquine treatment failures in Cambodia and neighbouring countries in mainland southeast Asia, where dihydroartemisinin-piperaquine is a fi rst-line treatment for malaria. Mapping the extent of this marker will enable national malaria control programmes to immediately recommend alternative therapies to improve the treatment of patients and drive the elimination of malaria.

Study design and participants
To obtain samples for this GWAS study, we enrolled patients with uncomplicated P falciparum malaria into parasite clearance rate 3,7 and drug effi cacy 4 studies in 2010-13 in three provinces where piperaquine resistance is common (Pursat), emerging (Preah Vihear), or uncommon (Ratanakiri). Written informed consent was given by adult patients, or a parent or guardian of child patients. Protocols were approved by the Cambodian National Ethics Committee for Health Research and the National Institute of Allergy and Infectious Diseases Institutional Review Board and are registered with ClinicalTrials.gov, numbers NCT00341003, NCT01240603, and NCT01736319.
Using parasitised blood samples from these studies, we measured piperaquine IC 50 s ex vivo or after shortterm culture in vitro, and obtained whole-genome parasite sequence data, whenever possible. The GWAS was designed to identify genetic markers of raised piperaquine IC 50 s, whereas a dihydroartemisininpiperaquine effi cacy study 4 was used to test for association between GWAS candidate markers and parasite recrudescence. PCR genotyping using msp1, msp2, and glurp microsatellites as genetic markers distinguished recrudescences from newly acquired infections. 18 A summary of samples, according to province of origin and year of collection, is shown in the appendix.

GWAS and survival analyses
The preparation, sequencing, genotyping, and phenotyping of samples are described in the appendix. The GWAS and correction for population structure were done using a linear mixed model, implemented in FaST-LMM 19 version 2.06. 17 We tested 11 630 SNPs with minor allele frequency greater than 0·033, using genotypes encoded as the number of non-reference alleles (0 or 1), and excluding heterozygous calls to minimise confounding eff ects of mixed infections. Piperaquine IC 50 was used as the continuous dependent variable. When whole-genome sequence or phenotype data were available for both initial and recrudescent samples, we only analysed data from the recrudescent samples. Diff erent approaches (using initial sample data only, using the phenotype and genotype of the same sample, averaging initial and recrudescent IC 50 s) did not alter results. A relationship matrix was calculated using a subset of 6678 SNPs with minor allele frequency greater than 3% (options: -maf 0·03), missing data rate below 25% (options: -geno 0·25), and unlinked (in windows of 100 SNPs, shifted forward by ten SNPs each time, removing one from each pair of SNPs with linkage disequilibrium >0·3; options: -indep-pairwise 100 10 0·3) and extracted using PLINK 20 version 1.07. In estimating the relationship matrix, we found that excluding proximal SNPs (within 10 kb or 100 kb from the tested variant) did not substantially aff ect results. Given the number of independent SNPs used, we applied Bonferroni correction to defi ne a signifi cance threshold of p of 8·6 × 10 -⁷ or less for GWAS analyses. We also defi ned a suggestive threshold of p of 10 -⁴ or less to identify relatively high-ranking loci.
To adjust for potential confounder eff ects, we treated the geographical origin of the sample, the presence of mdr1 amplifi cation, and the presence of kelch13 resistance alleles collectively as covariates in the linear mixed model. These covariates reduced the genomic infl ation factor λ GC from 3·455 to 1·964 (appendix). We also treated genetic similarity across samples as a random eff ect to correct for confounding eff ects of population structure, which further reduced λ GC to 1·106. Although this value is still slightly above 1·0, this is unlikely due to un-accounted population stratifi cation. When we confi ned the estimation of λ GC to only unlinked SNPs, its value dropped to 1·013, suggesting that residual infl ation is due to extended homozygosity haplotypes in the samples and thus has little eff ect on the GWAS.
We did a survival analysis using the R package survival. We fi tted a Cox proportional hazard regression model, in which treatment success (recrudescent vs non-recrudescent infection) represented the survival status, and we added the age of the patient (in years), parasitaemia on day 0 (log-scaled), and dose of piperaquine given (mg/kg, in fi ve-unit increments starting at 35 mg/kg) as covariates. 21 We then included the two markers (exo-E415G and plasmepsin 2-3) as covariates, and estimated adjusted hazard ratios (aHRs) and adjusted survival curves.

Copy number variations
We subsequently tested the association of copy number variations (CNVs) with piperaquine IC 50 s using the same method and parameters just described. We tested 43 CNVs present in at least fi ve samples; genotypes represented the presence or absence (encoded as 0 or 1) of the CNV. In this analysis, mdr1 amplifi cation was not included as a covariate in the linear mixed model. To call CNVs across the genome, we modifi ed a procedure used previously. 22,23 Briefl y, we fi rst divided the genome into 300 bp non-overlapping bins and calculated for each sample the number of reads whose alignment started within each bin. We then normalised these binned read counts by dividing by the median read count of the core regions of chromosome 9. We excluded bins where guanine-cytosine (GC) content was less than 20% due to coverage bias in most samples. Copy number state for each bin was predicted in each sample by fi tting a See Online for appendix Gaussian hidden Markov model to the normalised coverage data.
After determining the most likely state for each window (Viterbi algorithm), in each sample independently we merged adjacent windows having the same state prediction and allowed a maximum gap of 5000 bp between them. We then further removed windows less than 700 bp or not supported by face-away reads, and data for 43 samples with excessive variation in read coverage. This led to the identifi cation of a median six CNVs per sample. Finally, we merged all the CNV calls across the samples, by considering any partial overlap. In total, we identifi ed 134 regions containing CNVs across the core genome; 43 of these CNVs were present in at least fi ve samples and were tested for association with the phenotype, as described above. For the 133 samples in the clinical study used for the survival analysis, we manually inspected each sample individually and scanned them for the presence of reads spanning the breakpoints.

Role of the funding source
The funders had no role in study design, data collection, analysis, interpretation, or report writing. The corresponding authors had full access to all data in the study and fi nal responsibility for the decision to submit for publication.

Results
We analysed 486 P falciparum clinical isolates collected between July 9, 2010, and Dec 31, 2013, in three Cambodian provinces where artemisinin resistance is entrenched (Pursat), emerging (Preah Vihear), and uncommon (Ratanakiri; appendix). 3 , respectively). These regional diff erences are consistent with the relative prevalences of artemisinin-resistant parasites and recrudescent parasitaemias following dihydroartemisininpiperaquine therapy. 4,7 To investigate the genetic basis of piperaquine resistance in vitro, we did a GWAS analysis of the 297 samples (appendix) for which we had both piperaquine IC 50 s and whole-genome sequences (see Methods). We tested 11 630 SNPs that were well covered in this set of samples, and where one of the two alleles was present in at least ten samples. IC 50 s were treated  as a continuous dependent variable in a linear mixedmodel algorithm, and we treated the province of sample origin, presence of mdr1 amplifi cation, and presence of kelch13 mutations as covariates; to correct for the confounding eff ect of population structure, we treated genetic similarity across samples as a random eff ect (appendix).
GWAS analysis identifi ed one major locus on chromosome 13, containing two non-synonymous SNPs, that were signifi cantly associated with raised IC 50 s (p ≤ 8·6 × 10 − ⁷; table 1, fi gure 1A, appendix). The strongest signal (2·55 times increase in IC 50 after adjustment, p = 2·7 × 10 − ⁹) is from a non-synonymous SNP (referred to as exo-E415G) producing a Glu415Gly substitution in a Chromosome number putative exonuclease. The second strongest signal (p=2·3 × 10 − ⁷) from a non-synonymous variant is an SNP (referred to as mcp-N252D) within the same locus, producing an Asn252Asp substitution in a putative mitochondrial carrier protein 1. A locus on chromosome 12 also reaches genome-wide signifi cance, but only with a single synonymous SNP. To exclude the possibility of a batch eff ect, we repeated the GWAS analysis by including the year of sample collection as a covariate in the linear mixed model (appendix). This analysis confi rmed the locus on chromosome 13 as the only one signifi cantly associated with the phenotype. When also correcting for year, mcp-N252D is the only SNP to reach genome-wide signifi cance (p=7·3 × 10 − ⁷), although exo-E415G is just below the threshold (p=1·3 × 10 − ⁶). These data indicate that the chromosome 13 locus (referred to as L13) shows multiple signifi cant GWAS hits that are robust to multiple corrections.
To prioritise the multiple GWAS hits, we analysed their geographical spread, on the assumption that piperaquine resistance is emerging in Cambodia and neighbouring countries where dihydroartemisinin-piperaquine therapy is used (appendix). Taken together, the results of the GWAS and global allele frequency survey identify exo-E415G as the best candidate SNP marker of piperaquine resistance.
To gain a better picture of its frequency distribution in Cambodia, we used Sanger sequencing of PCR-amplifi ed DNA fragments to genotype this SNP (ie, exo-E415G) in 168 additional samples for which whole-genome sequence data were not available, from the same initial cohort of 241 patients. 4 As expected, given the low p value of this SNP in the GWAS, the phenotype distribution diff ers signifi cantly between the wild-type and the mutant alleles. When stratifi ed by province and year, the association seems to be particularly strong in Pursat and Preah Vihear (fi gure 1B, appendix), and from 2012 onwards (appendix). Consistent with these results, the frequency of exo-E415G appears to increase in Pursat and Preah Vihear over time (fi gure 1C, table 2). In summary, exo-E415G is the best predictor of raised IC 50 s in the current dataset, although this fi nding alone is insuffi cient to infer a causal role for it in mediating piperaquine resistance.
Since gene CNVs have also been associated with drug resistance in P falciparum, 24 we investigated the association of piperaquine IC 50 s with 43 genomic regions exhibiting CNVs in at least fi ve samples. We did a GWAS using the same method we used in the SNP GWAS, except that genotypes were the presence or absence of the CNV in each sample. We found that two CNVs strongly associated with IC 50 s (p≤2·3 × 10 − ⁴; fi gure 2A, appendix). As expected, 4 mdr1 amplifi cation was associated (p=1·4 × 10 − ⁶) with low IC 50 s. However, the most signifi cant association (p=7·6 × 10 − ⁷) was found for a novel amplifi cation on chromosome 14, encompassing two of the ten plasmepsin genes in the P falciparum genome-plasmepsin 2 and plasmepsin 3-that encode aspartic proteases involved in the haemoglobin-tohaemozoin conversion pathway targeted by quinolines. 25 A detailed analysis of sequencing reads aligned within the plasmepsin 2-3 locus revealed that the amplifi cation boundaries were identical in all samples.  The number of samples that are wild-type (WT) or mutant (Mut) for the exonuclease E415G mutation (exo-E415G) and have one (CN1) or multiple (CN2+) copies of plasmepsin 2-3, according to province and year of collection, are shown. Samples where data for one of the two markers were not available or the marker was present in a mixed infection were excluded.   The putative breakpoints lie near the 3ʹ end of both plasmepsin 1 and plasmepsin 3, so that each amplifi cation creates an intact extra copy of plasmepsin 2 together with a new chimeric version of plasmepsin 3, with its 3ʹ end replaced by the 3ʹ end of plasmepsin 1 (appendix). Due to the degree of homology between plasmepsin 1 and plasmepsin 3, the aminoacid sequence of the chimeric plasmepsin 3 protein is identical to that of The prevalence of plasmepsin 2-3 amplifi cation in our cohort shows a steady increase over time in Pursat and Preah Vihear compared with Ratanakiri (appendix), and refl ects our observations of increasing piperaquine IC 50 s and rising prevalence of exo-E415G in these two provinces (table 2, appendix). Despite being on diff erent chromosomes, exo-E415G and plasmepsin 2-3 amplifi cation are in signifi cant linkage disequilibrium (r²=0·56, empirical p=1·6 × 10 -⁵; appendix), and have a very similar allele frequency distribution in these Cambodian data. In particular, of the 462 samples where both markers were reliably typed, 72% (n=334) have neither marker, 19% (n=86) have both markers, and only 9% (n=42) have one of the two markers. Although the co-occurrence of the two markers in the population is interesting and surprising, the few samples where the two markers are found separately makes any conclusion regarding their relative eff ect on the phenotype diffi cult to support statistically (appendix).
To further investigate whether the exo-E415G and plasmepsin 2-3 amplifi cation markers segregate with a newly described piperaquine resistance phenotype, we subjected a subset of 12 clinical isolates in triplicate to the in-vitro piperaquine survival assay. 26 In this assay, the survival of early ring-stage parasites following exposure to piperaquine for 48 h is assessed relative to non-exposed parasites tested in parallel. For these experiments, we selected two groups of parasites (appendix). In the fi rst group, parasites did not recrudesce after dihydroartemisinin-piperaquine treatment, had low piperaquine IC 50 s ex vivo and after cultivation in vitro, were wild type for kelch13 and exonuclease, and carried single copies of mdr1 and plasmepsins 2-3. In the second group, parasites recrudesced after dihydro artemisinin-piperaquine treatment, had high piperaquine IC 50 s ex vivo and after cultivation in vitro, carried single copies of mdr1, and harboured the following mutations: kelch13-C580Y, exo-E415G, and plasmepsin 2-3 amplifi cation. This second group of parasites showed signifi cantly higher piperaquine survival rates (median 61·6 nmol/L [IQR 56·8-67·0]; n=6) than the fi rst (2·4 nmol/L [1·6-2·9]; n=6; p=0·002). The relationship between raised piperaquine survival rate and the presence of either exo-E415G or plasmepsin 2-3 amplifi cation were fully concordant.

Discussion
In GWAS analyses of piperaquine IC 50 phenotypes, we identifi ed two genetic markers of piperaquine resistance in vitro and of dihydroartemisininpiperaquine treat ment failures in patients: a nonsynonymous SNP encoding a Glu415Gly mutation in a putative exonuclease (exo-E415G), and amplifi cation of the plasmepsin 2 and plasmepsin 3 genes (plasmepsin 2-3 amplifi cation). The prevalence of these two markers has increased substantially in recent years in Pursat and Preah Vihear, where artemisinin resistance is prevalent and where dihydroartemisinin-piperaquine has been  Of the associated variations found in this study, plasmepsin 2-3 amplifi cation is the strongest candidate for mediating piperaquine resistance because of the role of plasmepsins in haemozoin synthesis pathways in the parasite food vacuole. Since piperaquine, like chloroquine, is believed to inhibit the conversion of toxic haem moieties to non-toxic haemozoin crystals during haemoglobin digestion, it is possible that piperaquine targets plasmepsin 2, plasmepsin 3, or both, and that overproducton of these plasmepsins counteracts the drug's action. The SNP markers, exo-E415G and mcp-N252D, do not lend themselves to an equally simple explanation and, until functional studies are done, it will not be possible to determine whether either is directly involved in mediating piperaquine resistance, whether their role is compensatory for lost fi tness in piperaquineresistant mutants, or whether they are associated with piperaquine resistance simply because of their strong linkage to plasmepsin 2-3 amplifi cation or some other functional mutation in the Cambodian parasite populations.
Assigning a role to the single copy variant of mdr1, which is associated with lower sensitivity to piperaquine, is equally problematic. Although it is possible that mdr1 plays a functional part in piperaquine resistance (eg, by restricting the amount of drug that enters the parasite's food vacuole), recent changes in Cambodia's antimalarial drug policy have caused a decline in mefl oquine pressure, which might have promoted the loss of mdr1 amplifi cations in the parasite populations. In view of the pronounced population structure following the emergence of artemisinin resistance in Cambodia, 27 it is possible that the association with a single copy of mdr1 is the result of piperaquine resistance having emerged in specifi c artemisinin-resistant, mefl oquine-sensitive populations.
Taken together, these fi ndings identify plasmepsin 2-3 amplifi cation as the most likely candidate to be a causal variant of piperaquine resistance. In the samples studied here, it is strongly linked to exo-E415G, which is on a diff erent chromosome and might represent some other functional component of the resistance phenotype. In practice, this means that the exo-E415G SNP can be presently used as a predictive marker of piperaquine resistance in Cambodia; however, this might not be the case elsewhere. Therefore, plasmepsin 2-3 amplifi cation should be monitored in areas where dihydroartemisinin-piperaquine is used, although it is somewhat more laborious to genotype. Despite unresolved questions about causal mechanism, we now have tools to identify areas where treatment failures with dihydroartemisinin-piperaquine are likely to occur, and thereby to empower national malaria control programmes to make informed dec isions about whether to switch to alternative artemisinin combination therapies for fi rst-line treatment of P falciparum malaria.

Declaration of interests
We declare no competing interests.