Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging artemisinin resistance

Background Artemisinin-based combination therapies are the first line of treatment for Plasmodium falciparum infections worldwide, but artemisinin resistance has risen rapidly in Southeast Asia over the past decade. Mutations in the kelch13 gene have been implicated in this resistance. We used longitudinal genomic surveillance to detect signals in kelch13 and other loci that contribute to artemisinin or partner drug resistance. We retrospectively sequenced the genomes of 194 P. falciparum isolates from five sites in Northwest Thailand, over the period of a rapid increase in the emergence of artemisinin resistance (2001–2014). Results We evaluate statistical metrics for temporal change in the frequency of individual SNPs, assuming that SNPs associated with resistance increase in frequency over this period. After Kelch13-C580Y, the strongest temporal change is seen at a SNP in phosphatidylinositol 4-kinase, which is involved in a pathway recently implicated in artemisinin resistance. Furthermore, other loci exhibit strong temporal signatures which warrant further investigation for involvement in artemisinin resistance evolution. Through genome-wide association analysis we identify a variant in a kelch domain-containing gene on chromosome 10 that may epistatically modulate artemisinin resistance. Conclusions This analysis demonstrates the potential of a longitudinal genomic surveillance approach to detect resistance-associated gene loci to improve our mechanistic understanding of how resistance develops. Evidence for additional genomic regions outside of the kelch13 locus associated with artemisinin-resistant parasites may yield new molecular markers for resistance surveillance, which may be useful in efforts to reduce the emergence or spread of artemisinin resistance in African parasite populations. Electronic supplementary material The online version of this article (doi:10.1186/s13059-017-1204-4) contains supplementary material, which is available to authorized users.


Background
Artemisinin-based combination therapy (ACT) is the first-line treatment for Plasmodium falciparum malaria infection in most of the world [1,2]. Resistance to ACT treatment, manifested as delayed clearance of parasites following treatment, was first documented in Cambodia in 2009 [3,4] and has since spread throughout SE Asia [5,6]. Mutations in the BTB/POZ or propeller domain of the kelch13 gene (PF3D7_1343700) have been associated with artemisinin resistance (ART-R), as evidenced by in vitro selection [7] and transfection experiments [8], and are associated with reduced cure rates following ACT [9][10][11]. Surveys have documented the rapid increase in frequency of kelch13 mutations in SE Asia over the last 10 years, driven by a combination of de novo mutation creating new resistance alleles and natural selection favoring the spread of existing alleles [5,[12][13][14]. Though kelch13 resistance mutations are becoming prevalent in SE Asia, and have been observed at low frequency in Africa, South America, and other parts of the world, to date they have not been observed to spread in any location outside of SE Asia [13,15].
Several hypotheses explain the failure of kelch13-mediated resistance to spread outside of SE Asia via de novo mutations or human migration, as has previously occurred with resistance to chloroquine [16] and pyrimethamine [17]. Host immune state and/or host genetics may play a role, for example. The selective pressure of ACTs on parasite populations is presumably more intense in SE Asia due to lower disease endemicity, commensurately less acquired immunity to disease, and therefore a greater likelihood that infected individuals will become symptomatic and be treated with drugs. This hypothesis is difficult to exclude, but malaria endemicity is highly variable across sub-Saharan Africa and many other regions, and access to ACTs is high in many regions outside of SE Asia [18], making it unlikely that SE Asian drug selection pressure is unique. An alternative hypothesis, to be explored in more detail in this study, is that kelch13 mutations induce a fitness cost in parasites lacking an appropriate genetic background. In many pathogens, mutations conferring resistance to drugs also confer deleterious fitness effects that are usually suppressed or mitigated by co-segregating compensatory mutations, a phenomenon well documented in bacteria [19,20], yeast [21], and P. falciparum [22][23][24].
If background mutations that abrogate a fitness cost of kelch13 mutations or provide resistance to partner drugs are required for the spread of kelch13 mutations, SE Asia is a favorable location for these mutations to arise and be selected in association with kelch13 mutations. Beyond the fact that ACTs have been in use for much longer in SE Asia relative to Africa [25], offering a longer window of time for requisite background mutations to occur and rise in frequency, SE Asian parasites experience a lower rate of sexual outcrossing than parasites in most African populations. This is because P. falciparum is an obligately sexual but facultatively outcrossing eukaryotic parasite. Meiosis occurs following the union of parasite gametes in the mosquito midgut, but mosquitoes that feed on humans infected by a single genotype of P. falciparum will result in self-fertilization of male and female gametes from the same genotype, rather than outcrossing between unrelated genotypes. In lowtransmission settings like SE Asia, a majority of human infections are caused by a single parasite genotype [26], leading to infrequent outcrossing relative to high transmission regions like Africa, where human infections may contain multiple parasite genotypes. While sexual outcrossing is typically considered to increase the efficiency of directional selection in situations of disadvantageous linkage disequilibrium (LD), outcrossing may be deleterious in non-mutationally limited scenarios where compensatory alleles are likely to arise on a genomic background in which they confer a selective advantage, or where sign epistasis applies [27]. In addition, there is reduced competition between parasites in low-transmission settings because most infected humans and mosquitoes harbor only a single parasite genotype. SE Asia, therefore, may be an ideal setting for kelch13 mutations to maintain association with a favorable genetic background and spread via natural selection.
There is some existing evidence for the involvement of additional loci in ART-R. Two groups have identified a region of chromosome 14 as being associated with slow parasite clearance [28,29]. Miotto et al [30] have suggested that variants in several loci outside of kelch13 are associated with ART-R.
We hypothesized that background mutations providing compensatory fitness for ART-R mutations in kelch13 or mutations at other loci conferring partner drug resistance should rise in frequency over time with kelch13 resistance mutations. We performed whole-genome sequencing of samples collected between 2001 to 2014 from Northwestern Thailand, a period spanning the emergence and spread of ACT resistance in this region [6], using hybrid selection to enrich parasite DNA in early clinical samples collected as dried blood spots without leukocyte depletion. In conjunction with genotype-phenotype association tests and scans for signals of natural selection, we used longitudinal changes in allele frequency to identify a list of candidate mutations that may provide a suitable background for kelch13 resistance mutations. These markers give insight into the mechanism of kelch13-based ART-R, clarify the genomic architecture of this trait, and suggest other loci in the genome that could be informative targets of future ACT resistance surveillance efforts.

Results
We sequenced a total of 194 isolates distributed uniformly among four time intervals (48-50 per time interval) between 2001 and 2014 from five clinics situated within 100 km of each other along the northwestern border of Thailand (Fig. 1). The median clearance rate half-life of the samples sequenced during each time interval exhibits a sharp increase in this region after 2008 [6], indicating that our collection window spans the emergence of ACT resistance in this region (Fig. 1). Although kelch13 mutations are strongly associated with slow clearance (Additional file 1: Figure S1), clearance rate half-life ranges from 3.0 to 9.6 h for parasites with kelch13 mutations, and 16 out of 68 samples with kelch13 mutations exhibit a clearance rate half-life less than 5 h, suggesting that resistance may vary according to the nature of different kelch13 mutations and/or parasite genomic background.

Dataset filtration
We performed an analysis of identity by descent (IBD) among pairwise comparisons of samples within sampling intervals to ascertain changes in the degree of clonality due to recent common ancestry over time, using a hidden Markov model-based approach that makes use of SNP calls [31]. High levels of IBD among samples impedes the identification of individual variants subject to selection due to increased LD among variants within IBD blocks. Analysis of pairwise IBD distributions within each time interval showed only a modest amount of recent common ancestry during the first three sampling intervals but a high level of clonality among isolates collected in 2014 (Fig. 2). This phenomenon resembles the previously documented increase in parasite clonality in Cambodia [32] and most likely stems from decreasing disease transmission in this region [26] coupled with the selective sweeps of multiple kelch13 resistance mutations. The increased IBD among the 2014 samples elevates the LD between SNPs, making it difficult to identify signals associated with individual background mutations. Therefore, the 2014 samples were excluded from subsequent  analyses of temporal frequency trends, but were included in the genome-wide association study (GWAS). Other isolates were discarded due to low sequencing coverage depth (see "Methods"), resulting in a high-quality dataset, with an average of 82% of their genomes exhibiting at least tenfold sequencing coverage, and 87% exhibiting at least fivefold coverage. A set of 134 isolates were used in the temporal analysis and an extended collection of 150 isolates, including samples from 2014 with available parasite clearance phenotypes, were used in the GWAS (Additional file 2: Table S1). The dataset was also filtered based on location and nature of polymorphic sites (see "Methods"). The resultant 17,911 SNPs were analyzed by a conventional genotypephenotype association analysis aimed at detecting variants associated with a low parasite clearance rate under artesunate therapy and 15,117 SNPs were evaluated using a phenotype-agnostic approach to identify variants with a temporal trend and other features suggestive of ACT selection.

Frequency trajectory of kelch13 mutations
Fifteen distinct nonsynonymous kelch13 mutations were found among sequenced isolates (Additional file 1: Figure S2). Figure 3a shows the frequency of kelch13 mutations exhibiting greater than 5% allele frequency in at least one of the first three sampled time intervals.
Although other kelch13 mutations exhibit a higher frequency before 2011, C580Y overtakes those in the 2011-2012 sample. Two independent origins of the C580Y mutation were inferred by the observation of shared haplotypes in the vicinity of that mutation (Additional file 1: Figure S4), consistent with previous SNP genotyping analyses of parasites from the Thai-Myanmar border [33] and other SE Asian locations [12].

GWAS results
A GWAS using the filtered set of SNPs identified the kelch13 C580Y mutation as significantly associated with slow parasite clearance at a genome-wide level (P = 8.15e-06, Q = 0.037) [34]; Q-Q and Manhattan plots are shown in Additional file 1: Figure S5. The 100 most significant associations identified using this approach are listed in Additional file 3: Table S2.
An additional association analysis was performed with only those samples containing kelch13 mutations in a search for other variants that could be potentiating slow clearance rates mediated by kelch13 mutations. No variants exhibited a statistically significant association with clearance rate after correction for multiple testing. However, among the ten most significant nonsynonymous SNPs associated with ART-R (Additional file 4: Table S3) we identified a SNP (P = 2.1e-3, Q = 0.47) located on chromosome 10 in a gene functionally annotated as "kelch protein, putative" (PF3D7_1022600; kelch10). kelch10 has  limited sequence similarity to kelch13 (Additional file 1: Figure S3b, c), restricted to a few amino acid positions between one instance of the Kelch-type beta-propeller domain (IPR015915) and the Galactose oxidase, betapropeller domain (IPR015916) of kelch13 [35]. Both domains were defined based on tertiary structure, imported by InterPro from CATH, a protein structure classification database [36], and they represent betapropellers with six and seven blades in kelch10 and kelch13, respectively, explaining the lack of amino acid sequence similarity between the loci. The mutation in kelch10 induces a proline to threonine amino acid substitution at position 623 (P623T) and is located between instances of beta-propeller domains (Additional file 1: Figure S7b). P623T exhibits variable impact on parasite clearance rate half-life in the presence of different kelch13 mutations (Additional file 1: Figure S3d). It significantly increases parasite clearance rate half-life in the presence of the kelch13 E252Q and C580Y mutations (Wilcoxon rank sum test, P = 8.7e-3 and P = 8.1e-3, respectively), but not other kelch13 mutations. The kelch10 P623T mutation confers a nearly statistically significant increase in clearance rate on a wild-type kelch13 background (P = 0.07; Additional file 1: Figure S3d). The large difference between the median clearance half-life between isolates with and without P623T drove us to pursue additional genotyping data to confirm this observation. We used PCR-based Sanger dideoxy sequencing to genotype the kelch10 mutation in 68 additional samples with clearance rate data and the kelch13 E252Q mutation from the same geographic region. SNP genotyping data (not shown) indicate that 52 distinct parasite clones are represented within these 68 additional samples, with four clonal groups having the P623T mutation in kelch10. Clonal groups containing the E252Q (kelch13) and P623T (kelch10) mutations exhibit a significantly slower clearance rate relative to those with only the E252Q mutation (Wilcoxon rank sum test, P = 3.21e-3; Fig. 4). Given that E252Q is the only common kelch13 mutation in SE Asia that occurs outside the BTP-POZ and propeller domains of kelch13, this association may be evidence of an epistatic relationship with kelch10 that potentiates the resistance phenotype of E252Q. Further functional work will be necessary to elucidate the relationship between these variants and the relationship between kelch10 P623T and kelch13 C580Y mutations. A paucity of isolates exhibiting wild-type kelch13 and slow clearance also compromised the power of a GWAS restricted to wild-type kelch13 isolates. No statistically significant associations were observed after correction for multiple testing.

Temporal screen for kelch13 background SNPs
The set of 15,117 genome-wide SNPs meeting the quality and frequency filters described in the "Methods" section was screened for positions exhibiting a nonreference allele frequency (NRAF) trajectory similar to C580Y, under the hypothesis that any variants contributing to a fit genomic background for kelch13 mutations should increase in frequency in tandem with the most successful kelch13 mutation. Two approaches were utilized to identify variants with NRAF trajectories similar to C580Y.
The stricter approach, requiring an NRAF of 0% in the first two sampling intervals and an NRAF >5% in the third interval, yielded 779 SNPs, a set that we designate The difference between distributions shown in each boxplot was evaluated by a Wilcoxon rank sum test C580Y-like. An alternative approach, utilizing a threedimensional vector-distance metric not involving hard NRAF thresholds, yielded 382 SNPs, 158 of which also belong to the C580Y-like set. After removing shared SNPs we designated this set as C580Y-vector-like. Variants segregating non-independently as part of linked haplotype blocks within these two variant sets were identified as described in "Methods", and single "tag SNPs" were selected to represent blocks of variants that routinely co-occurred within samples. This reduced the number of variants within the C580Y-like and C580Y-vector-like sets to 254 and 208 independently segregating variants. Variants in these two lists were next filtered for their degree of association with parasite genotypes harboring mutant kelch13 genotypes, under the assumption that background mutations under selection for positive fitness effects in conjunction with kelch13 mutations should be found disproportionately in association with such mutations. Figure 5a and Additional file 1: Figure S6c illustrate the degree of association of C580Y-like and C580Y-vectorlike variants with mutant kelch13, relative to panels of control SNPs exhibiting similar NRAF in the third (2011-2012) sampling interval. The candidate background SNPs were binned into quartiles according to their 2011-2012 NRAF. In the C580Y-like set, the lowest quartile of candidate background variants (5% < NRAF ≤ 6%) exhibits no significant difference from control SNPs with regard to LD with mutant kelch13 (P = 0.47; Wilcoxon rank sum test). The higher frequency quartiles all exhibit enriched LD with mutant kelch13 relative to frequency-matched control SNPs, however, suggesting they may contain legitimate background variants for resistance mutations at that locus (6% > NRAF ≤ 8%; P = 3.49e-3; 8% > NRAF ≤ 10%, P = 1.14e-3; 10% < NRAF ≤28%, P = 2.23e-8). LD analysis of the C580Y-vector-like set yields qualitatively similar results (Additional file 1: Figure S6c).
Both the C580Y-like and C580Y-vector-like variant sets were also analyzed for signatures of natural selection using the iHS (integrated haplotype score) statistic [37], assuming that variants exhibiting upward NRAF trajectories as a result of neutral genetic drift or sampling error would be unlikely to exhibit independent evidence of selection. Similar to the mutant kelch13 analysis, candidate variants were binned into quartiles according to their 2011-2012 NRAF and compared to control SNPs exhibiting similar NRAF in that sampling interval. Figure 5b and Additional file 1: Figure S6d illustrate the results of this analysis for the C580Y-like and C580Y-vector-like sets, respectively. As with the analysis for LD with mutant kelch13, in the C580Y-like set variants in the lowest 2011-2012 NRAF quartile (5% < NRAF ≤ 6%) exhibited no significant difference in iHS distribution relative to frequency-matched control SNPs (P = 0.37, Wilcoxon rank sum). Once more, however, C580Y-like variants in . The C580Y-like SNPs with NRAF less than 8% generally exhibit association with fewer kelch13 alleles than control SNPs, indicating they may be more likely to be products of genetic hitchhiking than C580Y-like SNPs with higher NRAF the three higher NRAF quartiles exhibited significantly lower iHS distributions than control SNPs, suggesting they may be subject to recent natural selection (6% > NRAF ≤ 8%, P = 7.6e-7; 8% > NRAF ≤ 10%, P = 1.74e-6; 10% > NRAF ≤28%, P = 3.22e-10; Wilcoxon rank sum test). iHS analysis of the C580Y-vector-like set for natural selection again yields qualitatively similar results (Additional file 1: Figure S6d). The observation of enhanced LD with mutant kelch13, as well as evidence of enriched signals of natural selection in candidate SNPs exhibiting high NRAF in 2011-2012, indicates that some of these variants may constitute part of a genomic background required for parasite fitness in the presence of kelch13 mutations. An alternative hypothesis is that high-NRAF candidate background SNPs could exhibit these LD and selection signatures as a result of selective sweeps targeting the kelch13 resistance mutations, and that some or all of these candidate background variants are themselves selectively neutral. Though these candidate markers reside on distinct chromosomes (Additional file 5: Table S4; Additional file 6: Table S5), reduced opportunities for sexual outcrossing in a low-transmission setting like Thailand could result in selective sweeps that impact the whole genome, rather than just the immediate vicinity of the selected kelch13 variants on chromosome 13.
To explore this hypothesis, candidate SNPs were examined for the richness of their association with different kelch13 mutations, under the hypothesis that if these mutations are neutral and were dragged up in frequency due to chance co-occurrence with a kelch13 mutation undergoing a selective sweep, they should be primarily associated with a single kelch13 mutation rather than co-occur with multiple different kelch13 mutations. This analysis indicates that many of the candidate background SNPs with low NRAF exhibit low allelic richness in their kelch13 mutation association, but that candidate background SNP exhibiting medium to high NRAF (>8%) are found in association with multiple different kelch13 mutations, similar to a set of frequency-matched control variants (Fig. 5c), reducing the likelihood that their NRAF increase is due to a trivial kelch13 hitchhiking scenario. Additional file 5: Table S4 lists all nonsynonymous candidates in the C580Y-like set with NRAF greater than 8%. Additional file 6: Table S5 lists nonsynonymous candidates in the C580Y-vector-like set that are disjunct from the C580Y-like set. The Shannon entropy index of the association with distinct kelch13 mutations was calculated for each candidate. SNPs with a higher Shannon entropy index are less likely to have been detected due to hitchhiking with a single kelch13 mutation, given their association with multiple distinct kelch13 mutations (Additional file 5: Table S4; Additional file 6: Table S5). We also investigated the distribution of IBD regions across the parasite genome to evaluate whether localized enrichment of IBD could be driving the observed temporal changes in SNP frequency. IBD is generally rare in this dataset, observed between fewer than 1.5% of all pairwise sample comparisons for any genomic region (Additional file 1: Figure S7). And despite high levels of IBD observed between some pairs of samples (Fig. 2), the occurrence of IBD is distributed relatively uniformly across the genome, suggesting that the observed temporal frequency changes in SNPs in the C580Y-like and C580Y-vector-like sets are not being driven by large blocks of IBD regions. Table 1 contains a list of some of the most promising candidate background SNPs that pass these filters in the C580Y-like and C580Y-vector-like sets, and it includes selected candidates from the GWAS sets. Ranked either by 2011-2012 NRAF or Shannon index, the top candidate in the C580Y-like set is a nonsynonymous mutation in a putative phosphatidylinositol 4-kinase (PI4K) locus (PF3D7_0419900). A distinct Plasmodium PI4K beta locus has been identified as a potential Plasmodium drug target [38,39], and another component of the phosphatidylinositol pathway (phosphatidylinositol-3phosphate (PI3P)) has been implicated in the mechanism of kelch13-mediated ART-R [40]. Further functional work will be required to evaluate the potential compensatory role of this and other candidate background mutations, including nonsynonymous mutations in a gene encoding a Sec14 domain-containing protein (PF3D7_0626400), a member of a family of proteins also associated with vesicle trafficking, previously reported as having orthologs in yeast functioning as regulators of PI4K [41] and ranked among the top 14 candidates in the C580Y-like set when ranking by Shannon index and 2011-2012 NRAF. Several other genes encoding proteins associated with vesicle trafficking and/or the endoplasmic reticulum are also in the list (Table 1).

Discussion
Assuming a conventional eukaryotic mutation rate of approximately 3 × 10 −9 mutations per base pair generation [42,43], and assuming that a typical P. falciparum infection results in approximately 10 11 parasites during its peak [44], it is likely that virtually all of the 23 million nucleotide positions in the parasite genome mutate into all possible alternative states during the course of a single infection within an individual. The observation that drug resistance does not routinely emerge in populations after years of use of a drug as first line therapy suggests that the genomic architecture of drug resistance that induces minimal fitness cost to the parasite is complex. To be favored by natural selection, resistance mutations must occur on an appropriate genetic background capable of positively potentiating the resistance phenotype and/or compensating for any deleterious fitness impacts that may result from the resistance mutations. Observations of restored susceptibility in parasite populations to anti-malarial drugs within several years of their withdrawal due to high levels of resistance attest to the negative fitness impact of resistance mutations in the absence of drug pressure [45,46]. In this study, we have performed retrospective longitudinal surveillance of the P. falciparum genome for mutations inside and outside of the kelch13 ART-R locus that may be required for resistance to spread in northwest Thailand. Many of the same longitudinal samples were examined by Cheeseman et al. [47] using a lower resolution targeted SNP genotyping approach in a study that initially identified the chromosome 13 region adjacent to kelch13. Longitudinal genomic surveillance is complementary to GWAS as a means of identifying loci associated with a phenotype subject to natural selection and has the advantage of not requiring phenotype data, which can be much more difficult to collect than genomic data [48][49][50][51][52]. The present results are based on relatively small sample sizes, and the candidate loci we identified require further functional validation to confirm they are the products of direct or indirect selection from ACTs. The falling cost of genome sequencing, however, makes a compelling argument for more highly powered prospective longitudinal genomic surveillance in pathogen populations where the evolution of resistance to therapeutics is a risk.
These results, together with the very detailed longitudinal profile of the temporal dynamics of resistance mutations within the kelch13 locus itself over the same time interval [14], speak to the complexity of drug resistance evolution when observed in real time instead of after the evolution of a fit resistance genotype. This complexity could give fair warning of the emergence of resistance if changes were observed prospectively via a genome-wide surveillance program. For drugs with resistance profiles requiring multiple mutations, alternation of the drugs used for treatment on a cycle of months or years could disrupt the successful combinations of resistance mutations and genetic backgrounds before either gets too high in frequency in a pathogen population. Drug cycling to deter resistance evolution has been proposed, modeled, and experimentally tested in a bacterial context [53][54][55][56], but the effect of the genomic architecture on this kind of therapeutic regimen in a sexually outcrossing eukaryotic parasite has not been fully explored.
These results also highlight the potential shortcomings of in vitro resistance selection studies. While kelch13 was discovered by sequencing parasites selected in vitro for resistance to artemisinin [7], such studies cannot recapitulate the complex fitness landscapes influencing the evolution of parasites exposed to drugs in the field. Mutations observed when sequencing culture-adapted parasite lines selected for resistance to an anti-malarial compound may or may not represent changes that are eligible for spread in parasite populations in the field, because cultured parasites are not exposed to the same fitness-determining challenges as wild parasites. And different wild populations of parasites may assemble a fit resistance genotype in different ways. Notably, we failed to identify any mutations with our GWAS and temporal analyses in NW Thailand that were previously found to be associated with ART-R in Cambodia [30]. This lack of replication does not impugn the results from the Cambodian study, because there may legitimately be different determinants of fit genomic backgrounds in the two parasite populations. Ideally, markers for resistance and genomic backgrounds that enable resistance should be identified from parasites collected as close as possible to the geographic setting in which surveillance will take place.
The variants we have identified as candidate background mutations required for the spread of kelch13 resistance mutations occur in some genes that belong to pathways already hypothesized to contribute to ART resistance in P. falciparum. Mbengue and colleagues [40] found elevated levels of phosphatidylinositol-3-phosphate (PI3P) to be associated with ART resistance in vitro. Elevated levels of PI3P can result from polyubiquitination of phosphatidylinositol-3-kinase (PI3K). We observed no polymorphisms in PI3K, but our top hit in the C580Ylike set was a nonsynonymous mutation in a PI4K locus (PF3D7_0419900; distinct from the PI4K described in other recent studies [38]), which may impact PI3P or other relevant members of the phosphoinositol pathway in a manner conducive to evolutionarily fit resistance. Dogovski and colleagues [57] identified the proteasome/ ubiquitination pathway as associated with ART resistance via that pathway's involvement in the cellular stress response. They hypothesize that an enhanced stress response, manifested via lower levels of ubiquitination, delays cell death and confers resistance to ART, and observe enhanced resistance in vitro when ART is coadministered with proteasome inhibitors. We also find several loci associated with the proteasome/ubiquitination pathway, including a putative sentrin-specific protease (SENP2; PF3D7_0801700) and a putative ubiquitin protein ligase (PF3D7_1448400). These loci, as well as the kelch domain-containing protein from chromosome 10 (PF3D7_1022600) that may be epistatically associated with the E252Q mutation at the kelch13 locus, constitute a list of potential genetic surveillance targets in other regions of the world where kelch13 mutations and/or phenotypic resistance are observed, such as Guyana [58].

Conclusions
Longitudinal genome-wide surveys of P. falciparum parasite populations are a powerful tool for identifying markers of resistance and understanding the nature of resistance evolution. Such surveys should be conducted prospectively in pathogen populations where resistance is anticipated to evolve. Our analysis suggests a complex genomic architecture behind the emergence and spread of ART resistance in NW Thailand. The potentially multi-locus nature of high fitness ART-R genotypes decreases the likelihood that such genotypes will emerge de novo in parasite populations in sub-Saharan Africa, most of which have a much higher rate of sexual outcrossing concomitant with higher transmission. However, introduction of high fitness Asian ART-R parasites bearing a suite of genomic adaptations into sub-Saharan Africa could accelerate the establishment of ART-R there, as occurred with pyrimethamine resistance when the triple mutant dhfr haplotype was introduced from Asia [17]. The present results justify the continued investment of resources to contain ART-R to those regions of SE Asia where it has shown the capacity to spread.

Sample collection and resistance phenotyping
Details on sample collection and ART phenotyping have been published previously [6]. Briefly, blood samples were collected from Karen and Burmese patients with a high parasite count (>4% infected red blood cells) but no signs of severe malaria. Collection was performed at five clinics (Maela, Wang Pha, Mae Ra Mat, Mae Kon Khen, and Mawker Thai) located along the Thai-Myanmar border (Fig. 1) between 2001 and 2014. Samples collected prior to 2010 were collected as blood spots on filter paper from finger pricks and underwent hybrid selection to enrich parasite DNA prior to sequencing [59]. Samples collected from 2010 onward were collected as venous blood and depleted of leukocytes to reduce host DNA content, and were therefore sequenced without hybrid selection. Approximately 48 samples from each of four time intervals (2001-2004, 2008, 2010-2011, and 2014) were selected for sequencing following SNP genotyping to avoid sequencing multiple isolates exhibiting the same parasite genotype [26]. Patients were treated with artesunate for 48 h, receiving artesunate-mefloquine combination therapy after 48 h [6], and parasite density was monitored at 6-h intervals until patients were slide negative. Parasite clearance data were used to estimate clearance half-life as described previously [14]. In brief, we plotted the log linear decay in parasite density over time and determined the time taken for parasite density to reduce by half; hence, a twofold change in clearance half-life from 3 to 6 results in 256-fold reduction in parasite density over a single 48-h asexual parasite cycle. Previous investigations of the clearance half-life in distinct infections caused by the same parasite clone have demonstrated the high heritability (h2 = 0.67) and robustness of clearance half-life as a measure of ACT resistance [51,60]. Ethical approval for this work was given by the Oxford Tropical Research Ethics Committee (OXTREC 562-15) and the Faculty of Tropical Medicine, Mahidol University (MUTM 2015-019-01).

Library preparation and sequencing
Illumina sequencing libraries containing a 200-bp insert were constructed and samples were sequenced on an Illumina HiSeq 2500 platform using paired-end 101-bp reads. Samples collected prior to 2010 were enriched for parasite DNA using a previously published hybrid selection protocol [59] and a set of RNA oligonucleotide baits prepared by random transcription of genomic DNA prepared from the 3D7 parasite line.

Genotype calling
Reads were aligned using BWA [61] (default parameters) against the P. falciparum 3D7 v3 reference genome assembly (PlasmoDB release 12.0). Fourteen samples exhibiting less than fivefold read coverage depth across 60% or more of the reference assembly were excluded from analysis. Genotype calling was performed using the GATK UnifiedGenotyper (version 2.4-9) [62] using the following parameters: EMIT_ALL_SITES, -stand_emit_ conf 0.0, -stand_call_conf 0.0, -sample_ploidy 2 and -glm SNP (SNPs only, no INDELs). Annotation of polymorphic sites and evaluation of their effect on the coding sequence of genes was done via SNPEff [63].

Genotype filtering
Polymorphic sites meeting any of the following criteria were removed from the analysis using VCFTools [64] and custom scripts: heterozygous sites, QUAL < 60, GQ < 30, indels, sites lacking genotype calls in 10% or more of the samples within one or more time intervals and sites with NRAF lower than or equal to 5% in all time intervals. NRAF was calculated using custom scripts. Polymorphic sites located in pericentromeric, subtelomeric, and hypervariable regions (listed in Additional file 7: Table S6) were removed, as were those occurring in genes belonging to large antigenic gene families (Additional file 8: Table S7).

Identifying regions identical by descent
A previously described hidden Markov model was utilized to identify identical by descent (IBD) genomic regions between pairwise comparisons of samples [31].

Genome wide association study
Genotype calls were converted from VCF format to PLINK format using VCFTools [64]. GEMMA [65] was used to evaluate the degree of association between genotype calls and the quantitative clearance rate half-life values under artemisinin treatment. Only positions with less than 10% missing genotypes and with the major allele frequency higher or equal to 15% were evaluated (GEMMA parameters: -miss 0.10 -maf 0.15). Q-Q and Manhattan plots were rendered using the R package qqman [66,67]. Local false discovery rate (q value) was calculated using the R package q value [34].

Comparison between kelch13 and kelch10
Protein sequences of the kelch13 and kelch10 (PF3D7_ 1022600) genes were aligned using the NCBI-BLAST web page [68]. Domains were identified using InterProScan [35,69]. Projections connecting regions of similarity between the linear representation of kelch13 and kelch10 sequences (Additional file 1: Figure S3) were rendered by Kablammo [70]. Boxplots in Additional file 1: Figure S1, as others boxplots in this work, were rendered using the ggplot2 R package [71].

Temporal analysis
Two methods were employed to identify SNP sets with temporal frequency signatures concordant with selection for ACT resistance. First, SNPs were identified that exhibited a sample frequency history strictly similar to the C580Y kelch13 mutation (C580Y-like), which has prevailed as the most successful ART resistance mutation in the region [14]. To be defined as C580Y-like, SNPs were required to exhibit a NRAF of 0 during the first two sampling intervals (2001-2004 and 2008) and NRAF >5% during the third sampling interval (2011-2012).
The second approach to identify SNPs with population histories similar to that of C580Y was less strict with regard to NRAF requirements. SNP frequency changes over time were interpreted as three-dimensional vectors, with each dimension corresponding to the NRAF of the SNP in one of the first three sampling intervals (2001-2004, 2008, and 2011-2012). The Manhattan distance was calculated between the vector representing C580Y and the vectors representing all other SNPs. Distances to the C580Y vector were ranked and a cutoff value was applied according to the first plateau in this series (Additional file 1: Figure S6a). The plateau was empirically defined as a consecutive series of at least 30 C580Y vector distances with a slope equal to 0. SNPs with distances less than the first plateau were designated as C580Y-vector-like.
Defining haplotype blocks LD was measured by calculating the Pearson correlation coefficient (r) between all pairs of SNPs within the C580Y-like and C580Y-vector-like sets using custom scripts. Pairs of SNPs exhibiting r ≥0.8 were classified as belonging to the same haplotype block.

Extended haplotype homozygosity
Genotypes were imputed by Beagle [72] using the recombination map described in [73] and default parameters. Integrated haplotype score (iHS) [37] was calculated via the REHH R package [74] using scan_hh command (limhaplo = 2, limehh = 0.05, limehhs = 0.05) and the Plasmodium reichenowi genome as the ancestral genotype. When ancestral state could not be determined from P. reichenowi, the P. falciparum 3D7 v3 (PlasmoDB release 12.0) assembly was assumed to reflect the ancestral allele. iHS of triallelic sites (distinct ancestral, derived, and reference alleles) were not computed.

LD with mutant kelch13
"Mutant kelch13" was defined as any kelch13 genotype containing one or more non-synonymous differences relative to the 3D7 reference sequence. LD (r) was calculated between every candidate SNP in the C580Y-like and C580Y-vector-like variant sets and mutant kelch13.

Correlation between 2011-2012 allele frequency and signatures of selection
The median of iHS and the median LD with mutant kelch13 were calculated for each haplotype block and for