Genome-wide association analysis identifies genetic loci associated with resistance to multiple antimalarials in Plasmodium falciparum from China-Myanmar border

Drug resistance has emerged as one of the greatest challenges facing malaria control. The recent emergence of resistance to artemisinin (ART) and its partner drugs in ART-based combination therapies (ACT) is threatening the efficacy of this front-line regimen for treating Plasmodium falciparum parasites. Thus, an understanding of the molecular mechanisms that underlie the resistance to ART and the partner drugs has become a high priority for resistance containment and malaria management. Using genome-wide association studies, we investigated the associations of genome-wide single nucleotide polymorphisms with in vitro sensitivities to 10 commonly used antimalarial drugs in 94 P. falciparum isolates from the China-Myanmar border area, a region with the longest history of ART usage. We identified several loci associated with various drugs, including those containing pfcrt and pfdhfr. Of particular interest is a locus on chromosome 10 containing the autophagy-related protein 18 (ATG18) associated with decreased sensitivities to dihydroartemisinin, artemether and piperaquine – an ACT partner drug in this area. ATG18 is a phosphatidylinositol-3-phosphate binding protein essential for autophagy and recently identified as a potential ART target. Further investigations on the ATG18 and genes at the chromosome 10 locus may provide an important lead for a connection between ART resistance and autophagy.


Results
In vitro sensitivities of field isolates to antimalarial drugs. We evaluated in vitro sensitivities of 94 culture-adapted parasite isolates and the laboratory clone 3D7 to 10 commonly used antimalarial drugs using a SYBR Green I assay and RSA ( Fig. 1a and Supplementary Table S1). With the standard drug assay that measures parasite proliferation, parasite isolates were found to be relatively sensitive to ART family drugs with a median IC 50 value of 1.9 nM to DHA, and 2.5 nM to AS and AM, although the ranges were wide. In contrast, with the RSA which measures the survival rate of ring-stage parasites after short exposure to a bolus DHA treatment, the median survival rate of these clinical isolates was 0.7%, and the range was also very broad (0-64.03%). CQ resistance prevailed in these clinical isolates, with only five sensitive isolates present in the study samples (Fig. 1a,b). Similarly, most parasites were also very resistant to SP with a median IC 50 value of 139.7 μ g/ml and a broad range with ~4,000-fold difference. Though cutoffs to define resistance to CQ and SP were not defined here, the resistant and sensitive isolates were clearly separated by large gaps in their in vitro IC 50 s (Fig. 1a). Likewise, parasites also had relatively wide ranges of IC 50 values to LMF, PND and QN, with > 20-fold differences in IC 50 between the most and the least susceptible isolates. In comparison, parasites showed fairly narrow ranges of IC 50 to PPQ and MQ (< 10-fold difference). A comparison between the clinical isolates and 3D7 revealed statistically significant decreases in sensitivity to all 10 drugs tested in the field isolates (P < 0.0001, Student's t-test, Table S1). A significantly higher median ring survival rate was also observed in clinical isolates compared to 3D7 (P = 0.0003). Hierarchical clustering analysis on the fold changes of drug sensitivities from field isolates as compared with those of 3D7 clearly illustrated multidrug resistance phenotypes in many isolates (Fig. 1b). Yet, stratification of the IC 50 data by year did not reveal significant differences between the years. Pairwise comparison between the 10 drugs by Spearman's rho analysis revealed highly significant, positive, correlations among the IC 50 s of DHA, AS and AM. DHA and PPQ also showed significant positive correlations with all other drugs. Significant positive correlation was also noticed between the following pairs of drugs: PND and MQ, LMF and MQ, CQ and QN, and CQ and SP. However, there was little correlation between RSA results and the IC 50 s of the tested drugs (Fig. 1c), indicating that these two assays measure two different drug response phenotypes.
Single nucleotide polymorphism (SNP) array genotypes. We performed genome-wide genotyping of the parasite isolates utilizing a high-density Affymetrix SNP array 32 . The SNP array detected a total of 17,582 (100%) high-confidence SNPs at a minimum of 90% call rate in all samples, giving an approximate density of one SNP per kb. Low-frequency SNPs [minor allele frequency (MAF) < 0.02)] predominated at 40.4% in the samples ( Supplementary Fig. S1). For quality control, we first removed 1,506 (8.6%) SNPs within the multigene families var, rifin and stevor to avoid artifacts from duplicated sequences. Furthermore, 779 SNPs (1.4%) with more than 10% missing genotypes were also excluded. Missing data in the remaining 15,297 SNPs were imputed by Beagle 33 , which is a linkage disequilibrium (LD)-based imputation program that has the ability to accurately impute missing genotypes and improve the accuracy of association analyses in P. falciparum 34 . This was further filtered to remove multi-allelic and low-frequency SNPs with a MAF of < 2%, resulting in a final dataset consisting of 8,564 SNPs. These SNPs are unevenly distributed across the genome with an average distance of ~4.2 kb between adjacent SNPs ( Supplementary Fig. S2) with the MAF ranging from 2.1% to 49.5%.
Population structure and LD. To determine whether the study population is genetically structured, we performed principal component analysis (PCA, Fig. 2a). This analysis revealed partitioning of the parasites into one major and two minor clusters, indicating that most of the isolates had a similar genetic background. The genome-wide average LD level in this population, as expected, decayed rapidly within a few hundred base pairs and reached a baseline level within 1000 bp (Fig. 2b). Compared to other populations, LD decayed to  50 values to SP and CQ, and ring-stage survival rates to DHA. Dashed line indicates the separation. (b) Fold change of drug sensitivities of field isolates in comparison with those of 3D7. Each column represented the sensitivities of a field isolate to 10 drugs. Hierarchical clustering of parasite isolates was analyzed using "ward" method in the R package stats. The degree of fold change is colour-coded. (c) Correlations between drug sensitivities of parasite isolates to ten antimalarial drugs. The correlations between drug sensitivities were analyzed by Spearman's test. The degree of correlation between sensitivities of two drugs is color-coded. Drug Abbreviations: chloroquine (CQ); sulfadoxine-pyrimethamine (SP); ring-survival rates from RSA; dihydroartemisinin (DHA); artemether (AM); piperaquine (PPQ); quinine (QN); lumefantrine (LMF); and pyronaridine (PND). R 2 = 0.2 within approximately 220 bp in this parasite population. This rate was slower than that in parasites from Cambodia and Thailand, which were reported to be within 187 and 162 bp 34 , respectively, suggesting a lower level of outcrossing and less effective recombination. This result is also consistent with the lower estimated multiplicity of infection (mean, 1.41) in the parasite population from the China-Myanmar border 35 than that from the Thailand-Myanmar border (mean, 2.15) 36 analyzed using the polymorphic marker merozoite surface protein 1 (msp1). However, this estimated LD decay rate was only an approximation given the average SNP resolution of 4.2 kb, while a more accurate calculation requires higher-density SNPs from whole genome sequencing. In addition, despite the overall fast decay in LD, there were a number of SNP pairs with much higher R 2 values (R 2 > 0.3), passing the average distance of 4.2 kb ( Supplementary Fig. S3).
Genetic loci linked to parasite responses to drugs. To detect genes associated with altered susceptibilities to the 10 antimalarial drugs, we conducted a genome-wide association study (GWAS) using the IC 50 values as continuous outcomes. To increase the robustness of the study, we used multiple programs, including genome-wide efficient mixed-model association (GEMMA) 37 , PLINK 38 , linear-regression and nonparametric regression in R, and warped linear mixed model (WarpedLMM) 39 . Whereas linear regression analysis assumes a Gaussian distribution of quantitative phenotypes and requires data normalization, a recent study suggested that data transformations are typically not needed 40 . We therefore exploited both original and normalized drug sensitivity data in our GWAS. The top three principal components were applied as covariates to reduce the influences of population stratification. Quantile-quantile (Q-Q) plots showed that the λ values approached 1 for all drugs, indicating effective correction of the population structure ( Supplementary Fig. S4). A P value of 5.83 × 10 −6 was used as the threshold after Bonferroni correction to assess the significance. As an internal validation of our GWAS, we first assessed associations of known molecular markers pfcrt, pfdhfr and pfdhps for resistance to CQ, pyrimethamine and sulfadoxine, respectively. Different GWAS programs showed different sensitivities in identifying resistance-associated SNPs. Moreover, exploitation of different phenotypic formats also yielded different results. GEMMA and PLINK with log-transformed phenotype data identified 17 and 46 significant SNPs for CQ, respectively, and the pfcrt locus was among the most significant association when normalized phenotype data were used (Fig. 3, Supplementary Fig. S5, and Supplementary Table S2). Flanking genes including pfcg1, glp3, cg2, cg7, lysophospholipase and a conserved gene were also identified as significantly associated with CQ resistance by GEMMA and/or PLINK. Consistently, strong LD (R 2 > 0.3) was detected in a wide region (~177 kb) around pfcrt that harbors these genes (Table S3). Besides pfcrt, a region on chromosome 6 where a phospholipase, an amino acid transporter gene (PF3D7_0629500) and a SET domain protein are located was also positively associated with CQ resistance in the study population. Similarly, the most significant correlation between pfdhfr with SP resistance was detected, and a region of ~75 kb flanking the SP gene showed strong LD, highlighting the feasibility of the GWAS approach for identifying SNPs associated with altered drug sensitivities (Fig. 3, Supplementary Fig. S5 and Supplementary Tables S2 and S3). Interestingly, nine common SNPs were identified as significantly associated with resistance to both CQ and SP (Table S2), including one in pfcrt.
Given that clinical ART resistance is associated with an increased ring survival rate, we performed GWAS using both IC 50 values and RSA results. Of particular interest, an autophagy-related protein on chromosome 10, ATG18 (PF3D7_1012900), along with a neighbor gene NLI interacting factor-like phosphatase (PF3D7_1012700, NIF4) showed a strong association with increased IC 50 s to both DHA and AM in multiple analyses. The SNP in pfatg18 corresponds to a T38N mutation and is present in 42.6% of the study population. Interestingly, this locus was approaching significance in association with decreased sensitivities to PPQ, a partner drug in the ACT mostly used at the China-Myanmar border (Fig. 3, Supplementary Fig. S5, Table 1, and Supplementary Table S2). It is noteworthy that a recent large-scale GWAS also detected the association between NIF4 with delayed parasite clearance half-life (PCt 1/2 ) in patients after treatment with an ART derivative in other locations in Southeast Asia 41 , and ATG18 was recently identified as a potential ART target as it was found covalently bound to an ART analog 19 . In addition, strong LD correlation was observed in a very wide range spanning ~905 kb on chromosome 10, where a gene encoding DNA polymerase delta catalytic subunit (PF3D7_1017000) was reported to be associated with DPC in Southeast Asia 12 . In comparison, association with the original RSA result identified positive associations with genes on nearly all chromosomes, of which five were shared with the positives identified when the analysis was performed with the normalized RSA result. These analyses identified DNA repair protein RAD5 (PF3D7_1343400), a gene within 6 kb of K13, as significantly associated with the RSA phenotype. Consistently, this gene was previously detected to be associated with DPC in Southeast Asia 12 . In addition, two genes, PF3D7_0602400 (elongation factor G) and PF3D7_0602500 (geranylgeranyltransferase), were identified by several GWAS methods with the highest P-values.
For in vitro sensitivities to PPQ, QN, LMF and PND, significant associations were identified with a number of SNPs (Table 1 and Table S2). Of note, three SNPs showed association with PPQ resistance, although they were only detected by one method. Of these, the GTPase-activating protein on chromosome 8 (PF3D7_0804900) was the most significant. Seven SNPs were associated with IC 50 values of QN, among which the PPM6 (PF3D7_1309200) protein on chromosome 13 and a conserved protein (PF3D7_0205800) on chromosome 2, which flanks a DNA  repair protein RAD2, also were correlated with increased ring survival rates. This study identified five loci on chromosome 10 and 13 that were associated with sensitivities to PND. In contrast, only a Plasmodium exported protein (hyp11, PF3D7_1148800) was associated with reduced LMF sensitivity.

Evidence of selection on drug resistance loci.
To investigate the potential evidence of drug selection, we evaluated the SNP diversity by measuring the average heterozygosity. We first assessed the effect of selective sweeps for the known resistance genes, pfdhfr and pfcrt. The parasites were grouped by their drug sensitivities, with resistant and sensitive groups being calculated for their levels of heterozygosity separately. We found clear selection signals in the region of pfdhfr and pfcrt where SNP diversity was reduced (Fig. 4a). Correspondingly, the extended haplotype homozygosity (EHH) analysis using alleles from 3D7 as the reference showed a slow decay of EHH in our samples around both the pfdhfr and pfcrt loci, further suggesting selective sweeps in those regions. Of the candidate resistant loci, we are particularly interested in the locus on chromosome 10, where the ATG18 and NIF4 genes are located and associated with decreased sensitivities to DHA, AM and the ACT-partner drug PPQ. When the parasites were grouped by their genotypes based on the significant SNP in ATG18, we indeed detected an extended region of ~22 kb around ATG18 where heterozygosity was reduced to 0, and the EHH decayed at a slower rate, suggesting there might be a recent selection (Fig. 4a 3 ,b 3 ). We then examined the genome-wide evidence for selection using integrated haplotype score (iHS). Using an iHS score of 3.14 as a significant cutoff value for the top 1% across the population, we detected multiple loci that were likely under significant positive selection ( Fig. 5 and Supplementary Table S4). These included the malaria vaccine candidate genes EBA-175, MSP1, AMA-1 and TRAP, consistent with previous observations in both Asian and African populations 34,[42][43][44] . Positive selection also was detected at the PF3D7_0104100 locus, which was identified in previous selection scans in Southeast Asian and African populations 34,43 . This gene is located within 9.5 kb of pfubp-1 (ubiquitin carboxyl-terminal hydrolase 1), the ortholog of which in the rodent parasite P. chabaudi was linked with ART resistance. Further, pfubp-1 was associated with reduced susceptibility to ART in P. falciparum isolates from Kenya 43 . In addition, an autophagy protein ATG7 also was detected to be under positive selection. Signals for significant associations also were evident in surfin 13.1 and 14.1. However, direct selection on pfcrt and pfdhfr was not detected by iHS. This is because the iHS is defined as the standardized log-ratio of integrated EHH, which computes the integral of the observed decay of EHH away from a focal SNP until EHH reaches 0.05 45 . Although the EHH decay of those two regions from 1 to 0.1 was much slower in our isolates, it decayed faster from 0.1 to 0.05 than that in the reference strain (Fig. 4b 1 ,b 2 ), which resulted in the masking of potentially significant iHS scores.

Discussion
GWAS has become an important approach for identifying molecular makers associated with genetic traits. In malaria parasites, genome-wide scan has been used to identify positive selections in the P. falciparum genome and markers for resistance to antimalarial drugs in different parasite populations 11,12,32,42,43,46-48 . In this study, we tested the association of SNPs from a high-density SNP array with in vitro susceptibilities to 10 antimalarial drugs in 94 P. falciparum parasites collected from the China-Myanmar border area, aiming to discover new genetic loci mediating and modifying drug sensitivities. This border region of China-Myanmar has a different antimalarial drug use history relative to other countries of the GMS such as Cambodia and Thailand. Besides, ART drugs have been used in this region for more than three decades mostly as monotherapies 27 , suggesting that different mutations mediating drug resistance might have been selected over time. The genetic background in these parasites is relatively uniform, and the confounding by a few outliers could be successfully corrected. To improve the confidence of the associations, we performed GWAS by use of multiple programs implemented with linear or nonparametric regression models. Furthermore, because of different distributions of the drug assay results, we used both original and transformed data for all GWAS analyses. Extensive deployments of two antimalarials CQ and SP in the past in most malaria endemic areas have selected for high levels of resistance in parasite populations to these drugs. The principal genes pfcrt and pfdhfr associated with resistance to CQ and pyrimethamine, respectively, have been well characterized and are often used as the proof of principle in GWAS. In our in vitro drug assay, we found that the majority of parasites showed significant resistance to these drugs, and only several strains remained sensitive. The GWAS identified the most significant associations of CQ and SP resistance with pfcrt and pfdhfr, respectively. Although the genome-wide LD decays rapidly, strong LD was detected in a region of ~177 kb around pfcrt, where several genes were identified in the association analysis, consistent with a CQ selective sweep at the pfcrt locus. Encouraged by these results, we extended our analysis to eight additional antimalarial drugs. Five drugs had the same positive hits with at least two analysis programs, while PPQ had positive identifications by only one method.  Given the emergence and/or spread of ART resistance in many areas of the GMS, we performed GWAS using both the standard in vitro drug assay and the newly developed RSA. K13 mutations have been linked to clinical ART resistance in other regions of the GMS 3,13,41 . Using the RSA data, we identified significant association of a SNP in RAD5, a flanking gene of K13, with increased RSA values. In addition, GWAS also identified a region on chromosome 6, where the apicoplast translation protein EF-G and geranylgeranyltransferase are located. Both proteins have been indicated as potential drug targets and inhibition of their function has deleterious effects on parasites 49,50 . The identification of multiple loci associated with the RSA results may be correlated with the numerous potential cellular targets of the ART drugs 19 . Using data from the standard drug assay, two adjacent genes on chromosome 10, ATG18 and NIF4, were among the top list of genes significantly associated with IC 50 data of both DHA and AM. NIF4 also was marginally associated with altered sensitivity to PPQ, a partner drug of ACT which has been used extensively in China as a replacement drug of CQ. Furthermore, reduced heterozygosity and delayed EHH at this locus suggested a recent selection in this region. Interestingly, NIF4 also showed significant association with DPC in parasites from other location of Southeast Asia. Noteworthy, an ART-based probe covalently bound to ATG18, suggesting it might be an ART target 19 . Further, since ATG18 binding to PI3P is required for robust autophagic activity in yeast 51 , regulated autophagy might play a role in parasite's response to oxidative stress imposed by certain antimalarial drugs 52 . Recently, strong evidence suggests that the PI3K signaling pathway is a target of ART, and several K13 mutations were linked to increased activity of PI3K and the corresponding PI3P level 17 . Thus, the association of ATG18 mutation with reduced sensitivities to AM and DHA in the standard IC 50 assay presents a potential, logical link between the autophagy and PI3K pathways in mediating ART resistance.
To detect genome-wide loci potentially under recent positive selection we employed the iHS as a measure of long-range directional selection. Consistent with earlier studies 42,43 , this analysis identified several genes involved in merozoite invasion, which are among the top vaccine candidates. It is well known that these genes are under strong balancing selection by host immunity. Of particular interest is the detection of PF3D7_0104100, a gene adjacent to the deubiquitinating protease gene pfubp1. Association of ubp1 with ART resistance was first identified in P. chabaudi and later identified in P. falciparum 43,53 . The involvement of ubiquitination-proteasome pathway in ART resistance may imply a potential role of ubp1 in stress response 18 . In Kenya, though the mutant pfubp1 was not associated with an increased risk of subsequent recurrence of infection post ACT treatment, the mutant genotype was found to be significantly more prevalent in recurrent samples 54 . In Cambodia where clinical ART resistance has been detected, selection on the pfubp1 gene was not detected 41 , suggesting that the impact of ubp1 may be dependent on parasite genetic backgrounds. The positive selection around pfcrt and dhfr, which was detected by EHH, was not identified by iHS due to the faster decay of EHH from 0.1 to 0.05 in field isolates than that in the reference strain. In addition, we also did not detect long-range selection in other drug resistance gene loci such as K13 and pfmdr1 in our parasite population. This could be due to that the selection on K13 by the ART drugs might be a recent event 55 . As a marker for multidrug resistance, pfmdr1 mutations have been linked to resistance to MQ, AS, and QN in field parasites and confirmed by genetic manipulations 56,57 . Extensive use of MQ in Thailand and Cambodia has selected for increased copy numbers of pfmdr1 58 . As such, selective sweep has been observed in parasite populations from this region 59 . It is likely that the lack of a selective sweep in the pfmdr1 locus in the parasite population from the China-Myanmar border area is due to no previous MQ usage in this region. Likewise, our earlier studies only detected Y184F mutation but no pfmdr1 amplification in parasites from this region 10,29 .
In this study we used robust in vitro drug assay data from 94 culture-adapted parasites from the GMS as phenotypes in GWAS and identified new candidate genes associated with in vitro drug resistance. Though the parasites exhibited > 10 fold difference in their sensitivities to the 10 drugs tested, we did not identify a considerable number of SNPs associated with in vitro resistance for certain drugs (e.g., MQ, AS, PPQ and LMF). In addition, the rapid decay of LD within several hundred base pairs in P. falciparum somewhat limits the power of GWAS with the SNP array data. An increase of population size and use of genome sequencing data are likely to increase the power of the GWAS. This also would allow us to detect the linkage of resistant parasites to genetic background, which may predispose parasites to the evolution of resistance 41 .

Materials and Methods
Parasite collection and culture adaption. Plasmodium falciparum infected blood samples were collected during the years 2004-2011 from febrile patients attending local clinics located along the China-Myanmar border. Uncomplicated P. falciparum malaria was identified by microscopy. The study has been approved by the Institutional Review Boards of Penn State University, Kunming Medical University and Department of Health of Kachin, and informed consent was obtained from all adult participants and from parent/guardian of children. The collecting and processing of venous blood samples were performed in accordance with the Appropriate Technology in Health (PATH) guidelines. A total of 94 parasite isolates (2 in 2004, 8 in 2007; 15 in 2008; 58 in 2009; 8 in 2010 and 4 in 2011), determined to be monoclonal infections by genotyping three polymorphic antigen markers msp1, msp2 and glutamate-rich protein, were retrieved from an archive of culture-adapted parasites and used for in vitro drug assay and genotyping. Routine cultures of the parasites were maintained in type O + human red blood cells in complete medium supplemented with 6% human AB serum under an atmosphere of 90% N 2 /5% O 2 /5% CO 2 .
In vitro drug assay and IC 50 calculation. The standard SYBR Green I-based fluorescence assay was used to assess parasite susceptibilities to 10 antimalarial drugs: DHA, AS, AM, PND, LMF, PPQ, MQ, CQ, QN and SP (S:P = 20:1, w/w), of which CQ and SP served as internal controls to validate the analysis. DHA, AS, AM, MQ, CQ, QN, and SP were purchased from Sigma (St. Louis, MO, USA), while PND, LMF and PPQ were obtained from Kunming Pharmaceutical Co. (Kunming, Yunnan, China). The stock solution of AM was dissolved in dimethyl sulfoxide (DMSO), and SP in 40% DMSO. Others were prepared as previous described 10 . In vitro cultures were synchronized with two rounds of 5% D-sorbitol treatment, and ring-stage parasites were assayed for drug sensitivity in 96-well microtiter plates at 1% hematocrit and 0.5% parasitemia. Three biological replicates were tested for each isolate and each drug concentration was repeated twice. To reduce the variations between plates, the standard laboratory clone 3D7 was always included as a reference. RSA was performed to measure susceptibility of ring-stage parasites to DHA following the INV10 Standard Operating Procedure 10 . IC 50 was measured using a non-linear regression model in GraphPad Prism 5 (GraphPad Software, Inc. La Jolla, CA, USA). Normal distribution of the assay values was tested by Shapiro-Wilk test. Geometric mean of the IC 50 and 95% confidence interval were calculated for normally distributed data. Median and interquartile range were calculated if the data were not normally distributed. The correlation between drug assays was assessed by Spearman's correlation coefficients in R. Student t-test was applied to investigate potentially significant differences in mean assay values between the field isolates and 3D7. SNP array and data imputation. Parasite genomic DNA was extracted from cultured isolates using Wizard ® Genomic DNA Purification Kit (Promega, WI, USA). Genomic DNA was genotyped utilizing a high-density Affymetrix SNP array that allows interrogation of over 17,000 SNPs performed at RML Genomics Unit, Research Technologies Branch, NIAID. The BRLMM-P algorithm was used to call the SNPs as previously reported 32 . Genotypes were transferred to a spreadsheet for further analyses. SNPs within var, rifin and stevor genes were excluded to reduce artifacts due to duplicated sequences. SNPs with call rates below 90% were also removed. The SNP data were then imputed by using the software BEAGLE v3.3.2 33 . This dataset was further filtered to remove multi-allelic and low-frequency SNPs with MAF of < 2% by PLINK 1.9.
LD measurement and population structure determination. The genome-wide pairwise LD was measured by PLINK 1.9, setting a flag of -ld-window-r 2 to 0 to include all pairs of SNPs. A setting of this flag to 0.3 was used to measure the SNP pairs with R 2 > 0. 3. LD values were plotted by using the R software suite. Population structure was investigated by PCA based on the variance-standardized relationship matrix using PLINK 1.9.

GWAS.
Phenotypes were transformed by natural logarithm and rank-based inverse-normal transformation.
To account for zeros in the RSA results, 0.5% was added to all values for log-transformation. GWAS was performed using multiple software packages, GEMMA, PLINK 1.9, linear-regression and nonparametric regression in R. In addition, the newly developed WarpedLMM that estimates optimal transformation of the phenotypes was also applied to the analysis. For PLINK, R and WarpedLMM, the top three principal components as covariates were applied to correct the inflation and reduce the influences of population structure. GEMMA estimates genetic relatedness from genotypes and automatically adjusts the inflation. P values were obtained from each model and a threshold after Bonferroni correction (0.05/number of SNPs analyzed) was used to assess the genome-wide significance. Q-Q plots for P values were used to evaluate the robustness of different models in minimizing inflation due to population stratification. The genomic inflation factor lambda (λ ) was calculated for each drug susceptibility phenotype by an R package snpStats. Manhattan plots were generated by modified scripts from the R package CMplot. SNP diversity analysis. Genetic diversity at resistance loci was evaluated by the average heterozygosity. We calculated the expected heterozygosity using the equation = −∑ − He p i (1 ) n n 1 2 , where He is the expected heterozygosity, n is the sample size and pi is the frequency of the ith allele. The heterozygosity values were averaged by gene, and if there was only one SNP in a gene or no annotation for one SNP, it was grouped in the neighbor gene within 10 kb.

Recent directional selection.
Positive selection at resistance loci was examined by EHH in the R package rehh 60 , a method that detects the transmission of an extended haplotype without recombination. The SNPs at S220A in pfcrt, C59R in pfdhfr and T38N in pfatg18 served as the focal SNPs in the EHH analysis. The iHS, which was also implemented in rehh, was used to identify genome-wide long-range directional selection. The iHS is the standardized log-ratio of the integrated extended-haplotype homozygosity for the ancestral and derived alleles at a core SNP, with large positive values indicating long haplotypes carrying the ancestral allele and large negative values indicating long haplotypes carrying the derived allele. Both extreme positive and negative iHS scores are potentially interesting, since ancestral alleles may hitchhike with selected site having large positive score 45 . Therefore, we used the absolute values of iHS to capture unusually long haplotypes surrounding both types of alleles.