Malaria Molecular Surveillance in the Peruvian Amazon with a Novel Highly Multiplexed Plasmodium falciparum AmpliSeq Assay

While the power of next-generation sequencing technologies to inform and guide malaria control programs has become broadly recognized, the integration of genomic data for operational incorporation into malaria surveillance remains a challenge in most countries where malaria is endemic. The main obstacles include limited infrastructure, limited access to high-throughput sequencing facilities, and the need for local capacity to run an in-country analysis of genomes at a large-enough scale to be informative for surveillance. ABSTRACT Molecular surveillance for malaria has great potential to support national malaria control programs (NMCPs). To bridge the gap between research and implementation, several applications (use cases) have been identified to align research, technology development, and public health efforts. For implementation at NMCPs, there is an urgent need for feasible and cost-effective tools. We designed a new highly multiplexed deep sequencing assay (Pf AmpliSeq), which is compatible with benchtop sequencers, that allows high-accuracy sequencing with higher coverage and lower cost than whole-genome sequencing (WGS), targeting genomic regions of interest. The novelty of the assay is its high number of targets multiplexed into one easy workflow, combining population genetic markers with 13 nearly full-length resistance genes, which is applicable for many different use cases. We provide the first proof of principle for hrp2 and hrp3 deletion detection using amplicon sequencing. Initial sequence data processing can be performed automatically, and subsequent variant analysis requires minimal bioinformatic skills using any tabulated data analysis program. The assay was validated using a retrospective sample collection (n = 254) from the Peruvian Amazon between 2003 and 2018. By combining phenotypic markers and a within-country 28-single-nucleotide-polymorphism (SNP) barcode, we were able to distinguish different lineages with multiple resistance haplotypes (in dhfr, dhps, crt and mdr1) and hrp2 and hrp3 deletions, which have been increasing in recent years. We found no evidence to suggest the emergence of artemisinin (ART) resistance in Peru. These findings indicate a parasite population that is under drug pressure but is susceptible to current antimalarials and demonstrate the added value of a highly multiplexed molecular tool to inform malaria strategies and surveillance systems. IMPORTANCE While the power of next-generation sequencing technologies to inform and guide malaria control programs has become broadly recognized, the integration of genomic data for operational incorporation into malaria surveillance remains a challenge in most countries where malaria is endemic. The main obstacles include limited infrastructure, limited access to high-throughput sequencing facilities, and the need for local capacity to run an in-country analysis of genomes at a large-enough scale to be informative for surveillance. In addition, there is a lack of standardized laboratory protocols and automated analysis pipelines to generate reproducible and timely results useful for relevant stakeholders. With our standardized laboratory and bioinformatic workflow, malaria genetic surveillance data can be readily generated by surveillance researchers and malaria control programs in countries of endemicity, increasing ownership and ensuring timely results for informed decision- and policy-making.

T he Global Technical Strategy for Malaria 2016-2030 identified the use of highquality surveillance data for decision-making as an essential pillar for malaria elimination (1). In view of the current challenges in malaria control (e.g., the spread of drug resistance, changing transmission intensity, risk of malaria importation into malaria-free areas, and rapid diagnostic test [RDT] failures), genetic epidemiology is increasingly being recognized for its potential to inform national malaria control programs (NMCPs). By aligning multiple scientific fields in support of NMCP priorities, "use cases" were specified to create scenarios where genetic surveillance can be informative for decision-making (2,3) and can be used to develop and implement new technologies.
Molecular markers associated with drug resistance are reliable predictors of treatment responses (2,4,5), can provide warning for emerging resistance, guide treatment policies (6), and even replace therapeutic efficacy studies in low-malaria-transmission areas (4,5,7,8). The value of these markers was demonstrated in Southeast Asia, where Plasmodium falciparum resistance to artemisinin (ART) was associated with mutations in the Kelch protein 13 gene (K13) (9)(10)(11)(12). Resistance mutations in the same gene recently emerged in Africa (13).
In addition, monitoring population genetic markers can guide control and elimination strategies by identifying drug resistance gene flow, distinguishing imported from autochthonous cases, and estimating the source of reintroduced cases in malaria-free areas (2,(14)(15)(16). Parasite genetics can identify underlying patterns of transmission and transmission intensity (17).
The gene products of P. falciparum hrp2 and hrp3 are detected by RDTs, and gene deletions resulting in false-negative RDT results were first detected in Peru 10 years ago (18). Since then, hrp2 and hrp3 deletions have been reported in many areas (19)(20)(21)(22)(23)(24), raising concerns about the viability of histidine-rich protein 2 (HRP2)-based RDTs. The WHO recommends assessing the prevalence of hrp2 and hrp3 mutants among malaria patients and changing case management strategies when more than 5% of infections contain deletions (25).
Multiple genetic markers are required to address the above-mentioned challenges and are most efficiently targeted in a single genomic tool (3,26). Whole-genome sequencing (WGS) can provide the most information content per sample; however, implementing WGS for surveillance is constrained by the limited access to high-throughput sequencers and the lack of technical and bioinformatic skills in countries of endemicity. As a result, sequencing is often performed through genomic consortia, delaying the turnaround time from sample collection to genetic data and limiting its usefulness to NMCPs. In addition, obtaining high-quality WGS data from low-density infections and dried blood spots (DBSs) remains challenging. Consequently, it is rare for genomic surveillance to be used beyond research (27)(28)(29).
Targeted next-generation sequencing (NGS) tools are emerging that combine numerous targets of interest into one workflow, being more time-and cost-efficient than WGS or traditional PCR-based approaches with amplicons detected by different techniques (10,(30)(31)(32)(33). Laboratory protocols for NGS assays are well standardized, sequencing is feasible on benchtop sequencers, and automated analysis pipelines are available (2,6), for example, the Malaria Resistance Surveillance (MaRS) protocol (34), which targets multiple drug resistance markers. While microsatellites (MSs) have frequently been used for population genetic surveillance as they are not under evolutionary pressure, these markers are not easily translated to NGS due to the short repetitive nature of the variation. Instead, single-nucleotide polymorphism (SNP) barcodes can be successfully targeted by NGS (35)(36)(37) and are informative for connectivity between populations (38, 39) and predictions of infection origins (40,41). The SpotMalaria platform combining drug resistance markers with a SNP barcode was helpful in tracking the rapid spread of resistant parasites in the Greater Mekong subregion (42).
Currently, there is no multifunctional tool that includes a combination of more than two types of markers (i.e., SNP barcodes and drug resistance, etc.) to serve several use cases. The characterization of hrp2 and hrp3 deletions still relies on PCR assays that classify deletions based on the failure to amplify targets (23). The few existing tools that combine population markers with drug resistance target short regions around validated resistance SNPs, missing the potential to detect novel resistance-associated mutations. Furthermore, many SNP panels were designed from genomes across the world and lack the resolution to study subtle patterns on a smaller geographical scale. Therefore, we designed a targeted NGS assay (Pf AmpliSeq) that combines a specifically designed barcode for the target country, 13 full-length resistance-associated genes, hrp2 and hrp3, and an ama1 microhaplotype region. The assay is applicable to DBSs and adaptable to different settings. The AmpliSeq technology (43) applies multiplex PCR to simultaneously amplify a high number of targets in a rapid procedure and allows overlapping amplicons to cover large genes.
We applied the Pf AmpliSeq assay to Peruvian samples for assay validation. P. falciparum in Peru makes a good case study as malaria elimination is on the regional agenda (44), with countries first targeting P. falciparum elimination due to its lower case load than that of Plasmodium vivax but also in fear of emerging ART resistance. South America has been a hot spot for chloroquine (CQ) and sulfadoxine-pyrimethamine (SP) resistance evolution (45). More recently, K13-mediated ART resistance emerged in Guyana (46,47), and K13-independent delayed parasite clearance after ART treatment was reported in Suriname (46,48,49). P. falciparum elimination is also threatened by the increasing prevalence and spread of hrp2 and hrp3 deletions (21,23).
Peru reduced its malaria case incidence by $40% in 2010 (Fig. 1), and after a resurgence in 2016, cases have been decreasing in the past 5 years after the introduction of focal screening and treatment (50 (51), with continued high efficacy (107), and the combination of AS plus sulfadoxinepyrimethamine (SP) was used on the North Coast. In 2015, AS1MQ1PQ became the recommended firstline treatment in the whole country (108). Peru reduced the malaria case incidence by $40% by 2010 using passive case detection (PCD), diagnosis by light microscopy (LM), and treatment with AS1MQ as the main components of the control program (109). In 2015, with the realization that PCD missed asymptomatic infections (110), focal screening and treatment were introduced, resulting in a further reduction in the malaria incidence (50). CQ1PQ, chloroquine plus primaquine; QN, quinine.
Peru to become one of the first countries in the Americas to introduce artemisinin combination therapy (ACT) in 1999 (54)(55)(56). However, ACT efficacy and drug resistance markers have not been monitored in Peru since 2009. In 1998 to 2001, up to 20% of infections in Peru had an hrp2 deletion, increasing to 53% with hrp2 deletions and 37% with deletions of both genes in 2012 to 2014 (18,19). In this study, we demonstrate the added value of molecular surveillance in Peru with different use cases using our Pf AmpliSeq assay and explore the changes in the parasite population in the past 2 decades.

RESULTS
Pf AmpliSeq design and performance. A panel of 13 P. falciparum drug resistance-associated genes (Table 1) and corresponding variants of interest, i.e., (putative) resistance-associated mutations (see Data Set S1 in the supplemental material), were included in the assay design. The selected genes included validated mutations (4, 7) or variants relevant for Peru, i.e., those previously reported or associated with historically and currently used drugs. Additional targets were the hrp2 and hrp3 genes, the apical membrane antigen 1 gene (ama1), the conserved Plasmodium membrane protein gene (cpmp) (57), and the microsatellites (MSs) poly-alpha, ARAII, TA81, and PfPK2 (58). We designed a barcode of 28 SNPs with in-country resolution in Peru, as explained in detail in the supplemental material. Primers for an AmpliSeq custom panel targeting the desired regions were designed using DesignStudio software by the Illumina concierge team. This design process makes individual amplicon testing (by PCR) before multiplexing unnecessary and resulted in two primer pools ready for library preparation. The final assay included .85% of the desired regions in 233 amplicons (Data Set S1) with various amplicon lengths (55 to 323 bp). Target regions where primer design was not possible within the constraints of the multiplex assay due to AT-rich sequences or high genetic variability were the cpmp region and a small section of the crt gene, including crt variant I218F.
We validated the assay with P. falciparum cases (n = 312) (Fig. 2) from multiple previous studies in Peru, laboratory strains (n = 5), and previously genotyped samples (n = 6) as controls. A high number of reads was generated in the assay (mean of 217,037 6 490,478 paired reads/sample after trimming low-quality reads), with a median of 99.6% (range, 3.9 to 99.9%) of trimmed reads aligning to the P. falciparum 3D7 (Pf3D7) genome. A mean of 1,336 6 3,627 paired reads (range, 20 to 43,795) were generated per library (including all samples and controls [also negative controls]). To improve the quality of the sequences (percentage of base calls with quality score above 30 [%Q 30 ]), we increased the nucleotide diversity in the run (30% GC content in the libraries) with a 20% spike-in of a PhiX library. However, while the overall %Q 30 increased, adding 20% PhiX is not recommended as it did not decrease the proportion of low-quality reads that were excluded (9.7% 6 10.6% of reads trimmed without a spikein versus 10.3% 6 7.3% with a spike-in), with a 20% reduction of the total read number.
Depth of coverage. The median amplicon depth of coverage of aligned high-quality reads past the filter (DP) was 82.9 (interquartile range [IQR], 42.9 to 116.1) (Fig. S1). Fifteen (6.4%) amplicons had low DP values (,10) (Table S1), while 6 (2.6%) had high DP values (.150) (Table S2). Among these, 4 hrp2 and hrp3 amplicons had lower DP values in the study samples, but not in the controls, due to the high prevalence of deletions in these genes. The lower DP values for other amplicons are likely due to sequence variations in primer binding sites, resulting in poor amplification and large amounts of missing data for these regions, e.g., A220S in crt. In contrast, 2 hrp2 amplicons had higher depths than the average, possibly due to cross-alignment between hrp2 and hrp3 repetitive regions. For 23/233 amplicons, no variants (i.e., only reference alleles) were detected despite good amplification and sequencing (Table S3), and these amplicons were located in conserved regions of drug resistance genes.
Uninfected controls. Primer specificity was tested with four uninfected human blood samples (negative controls), and none resulted in genotypes in the assay target region. We did observe background sequences with a very low number of reads corresponding to multiple unspecific hits to Plasmodium and human sequences; all of them were outside the assay target regions and below the filtering threshold.
Eight stored previous library preparations (including control samples and one human negative control) were included to test replicability; however, the human negative control was contaminated with P. falciparum sequences, probably acquired after library preparation and during storage; thus, replicability could not be assessed.
Parasite density limit and selective whole-genome amplification. The DP decreased while the missingness increased with decreasing parasite density in a dilution of 3D7 (6,000 to 6 parasites/mL [p/mL]) at a DNA concentration mimicking that of a DBS sample. At the lowest density (6 p/mL), the median DP was only 10, and 60% of loci were missing. Selective whole-genome amplification (sWGA) prior to library preparation improved the mean DP and the number of high-quality reads at parasite densities of #60 p/mL (Fig. S2), with 10-fold more reads at 60 p/mL and 90-fold more reads at 6 p/mL.
The reproducibility of the assay was tested, with a median difference of 4 SNPs (0.6% of SNP loci; 0.007% of 57,445 bp targeted in the assay) between sample and control replicates observed. With indels and multiallelic variants, a median difference of 50 alleles (2.8% of loci; 0.09% of 57,445 bp) was observed.
In comparison to previous results with laboratory strains and control samples, 97.6% of genotypes were accurate (Table S5). For the control samples, 7.9% additional minor alleles in heterozygous genotypes were detected by AmpliSeq, which were likely to be undetected in the SpotMalaria pipeline and WGS data due to lower sequencing depth.
The complexity of infection (COI) estimates the number of genetically distinct clones in an infection, which can be used as a proxy for transmission intensity (16,59). We applied different methods to determine the COI from NGS data, with results varying considerably by method (Table S7). Therefore, we determined the most frequent value (mode) of the COI, which showed an increasing proportion of multiple-clone infections in Peru over time (13. Use case of hrp2 and hrp3 gene deletions. Compared to previous PCR classifications, all samples (10/10) were correctly classified as "RDT failure" (deletion of both genes) versus "RDT detectable" (presence of one or both genes) using the read depths of hrp2 and hrp3 amplicons. For 6/10 samples, the classification was correct for both genes, but for 4 samples, the hrp2 classification was inconclusive (Table S8) due to discrepancies in classifications between amplicons.
With the Pf AmpliSeq assay, 94.9% (241/254) of the study samples could be classified as RDT failure or RDT detectable. The proportion of individuals with RDT failure increased from 17.7% in 2003 to 2005 to 42.6% in 2008 to 2012 and 73.3% in 2014 to 2018 (P . 0.001 by a x 2 test). While hrp3 presence versus absence could be determined for most samples (96.5%), 26.8% of hrp3-positive (hrp3 1 ) samples were inconclusive for hrp2 (Fig. S9), but since they were not hrp3 deleted, they were classified as RDT detectable.
Barcode performance. All 28 barcode SNPs were detected by the assay (Table S9), although 1 (Pf3D7_12_1127001) was genotyped as an indel. In the study samples, the population minor allele frequencies (MAFs) of the barcode SNPs ranged between 0.35 and 0.50 at 9 loci and between 0.1 and 0.34 at 12 loci and were ,0.1 at 7 loci. We did not observe the minor allele of one of the barcode SNPs (Pf3D7_05_921893) in the study samples, although it was detected in laboratory strains (n = 5) and 5/6 control samples.
Use case of transmission intensity: genetic diversity and differentiation. The similarity of the samples was explored over space (district) and time (year of collection) using principal-component analysis (PCA), which showed clustering by year rather than by collection site (Fig. S4). Therefore, the samples were divided into three time periods for subsequent analyses: Use case for connectivity of parasite populations: barcode multilocus lineages and population structure. Using the 28-SNP barcode, we identified 36 multilocus (ML) lineages with distinct barcodes and investigated their dynamics in the study population over time ( Fig. 4) and districts ( Table 2). The distinct lineages clarify the significant linkage disequilibrium (LD) in the barcode observed for all time periods and districts (Table S11). In 2003 to 2005, we observed 23 lineages, and we observed only 11 in 2008 to 2012, offering an explanation for the decrease in genetic diversity.
One lineage (155) became predominant after 2008 in coexistence with other (related) lineages within district and period (Table 2  With discriminant analysis of principal components (DAPC), which is similar to PCA but maximizes the differences between predefined populations, we increased the resolution to all biallelic variants (n loci = 772), and we were able to determine the underlying causes of the observed differentiation (Fig. 5). The highest-contributing alleles along the first axis (2003 to 2005 versus 2008 to 2018) were the drug resistance mutations dhfr C50R and N51I, mdr1 D1246Y, and dhps K540E (Table S13). Contributors to the differentiation along the second axis (2014 to 2018 and Pastaza) were seven barcode SNPs and dhfr, mdr1, coronin, and ubp1 variants.
Use case for drug resistance: artemisinin resistance-associated genes. The fulllength K13 gene was amplified with 12 overlapping amplicons. Although ART resistance-validated SNPs were detected in control samples, we did not observe these SNPs (F446I, N458Y, M476I, Y493H, R539T, I543T, P553L, R561H, P574L, and C580Y) in the study samples. We observed several low-frequency mutations in the K13 propellor region (V445A, I448K, G449C, V581I, and a premature stop codon at position 613)  (Table 3), always as minor alleles in polyclonal infections. Outside the propeller region, the K189T mutation was observed at a high frequency (63% to 84%).
Forty-four amplicons covered the ubp1 gene (91.8% of the gene), including all variants of interest. Previously reported variants (including R3138H) were not detected, but we observed a change in the predominant ubp1 haplotype. In 2003 to 2005, a haplotype of Q107L, commonly with K1193T, was most frequently observed (42.6%) ( Table 3). In later years, R1133S haplotypes (often linked with E1011K) replaced the Q107L haplotypes.
Eleven amplicons amplified the full-length coronin gene. We identified two novel mutations, V62M and V424I, and none of the in vitro-associated variants (60). V62M was seen at 20.6% in 2003 to 2005 but decreased over time (  (Table 3). While V62M is located in the WD-40 beta-propeller domain (containing the resistance-associated mutations G50E, R100K, and E107V), V424I is located just outside this region and is not expected to contribute to ACT resistance.
Use case for drug resistance: CQ and ACT partner drug resistance-associated genes. The SP resistance-associated genes were each targeted with 9 amplicons, covering 100% of the dhfr gene and 87% of dhps, including all variants of interest. For 2003 to 2005, we observed predominantly single (69.5%) and double (22.7%) dhfr mutants and wild-type (wt) dhps ($56.7%) ( Table 3). Although SP use was discontinued in 1999, after 2008, SP resistance increased, with a high frequency of triple mutants in dhfr and dhps. Consistent with previous reports (45, 61), we did not detect dhfr C59R in Peru. Therefore, the "sextuple" mutant here is different from the African sextuple mutant, with C59R instead of C50R, but with a similar superresistance phenotype (62).  Table 2.    Sixteen amplicons covered the crt gene (91.1%), missing one variant of interest (I218F) associated with piperaquine (PPQ) resistance. The CQ resistance (CQR) crt haplotypes CVMNT (CQR) and SVMNT (highly CQR) increased over time (Table 3), with 100% SVMNT in 2014 to 2018. Mutation I356V, whose contribution to CQR is unclear, was frequent (.93%) in all time periods.
Eighteen amplicons covered the mdr1 gene (87%), targeting all variants of interest associated with CQ and mefloquine (MQ) resistance (7). While the mdr1 haplotypes NDFCDD and NGFSDD were predominant in 2003 to 2005, in later years, mainly the NDFCDY haplotype was found (Table 3). There are conflicting results on the association of NDFCDY with susceptibility to MQ, artesunate (AS), and CQ (61,63). The mdr1 D144G SNP has so far been reported only in Peru (annotated as D142G previously [61]) and has been decreasing in recent years.
Copy number variations (CNVs) in the mdr1 and plasmepsin II genes associated with PPQ (64) and MQ (65) resistance, respectively, were not previously observed in Peru (66). We confirmed single copies of both genes by quantitative PCR (qPCR) in a subset of 78 samples (Fig. S8). While both genes were sequenced in our assay, the lack of increased copy numbers resulted in insufficient data to validate CNV detection by Pf AmpliSeq.
Use case for drug resistance gene flow: drug resistance and parasite lineage evolution. Between 2003 and 2005, we observed lineages with predominantly single and double dhfr mutants, wt dhps, CVMNT and SVMNT crt haplotypes, multiple mdr1 haplotypes, and the presence of hrp2 and hrp3 deletions ( Table 2). From 2008, the A scatterplot of discriminant analysis eigenvalues 1 and 2 using all biallelic SNPs in the core targeted region (excluding amplicons targeting repetitive regions and nonnuclear targets) shows the differentiation between Peruvian P. falciparum samples collected before 2008 and those collected afterward (x axis) (PC1) and a subsequent separation along the y axis (PC2) from 2014 to more recent years and along geographic distance. DAPC calculates the discriminant components using predefined populations in such a way that samples from the same population will be grouped, while it simultaneously maximizes the distance with samples from other populations. SNPs contributing most to the DAPC are listed in Table S13 in the supplemental material. DAPC was performed with 40 principal components and 7 discriminants as determined by crossvalidation. The increased resolution of the DAPC allowed the detection of greater genetic variation in samples from 2014 to 2018 (black shapes) than with the lineage analysis. majority of parasite lineages harbored sextuple dhfr/dhps mutants, crt SVMNT, mdr1 NDFCDY, and hrp2 and/or hrp3 deletions. Parasites with this dhps/dhfr/crt/mdr1 resistance haplotype and both hrp2 and hrp3 deleted were first reported in a few cases in Peru in 2006 (61) and were called the B V1 lineage. Here, we observed many parasites with the same "B V1 -type" resistance haplotype although not always with hrp2 and hrp3 deletions. With the higher resolution of the Pf AmpliSeq assay, we observed many related lineages with this resistance haplotype that differed in their barcode genotypes, the presence of hrp2 and hrp3 deletions, and coronin SNPs ( Table 2). This indicates gene exchange between lineages, in contrast to the idea of isolated clonal lineages in Peru.
The B V1 -type lineages (lineage 155 and others) seen after 2008 are most closely related to earlier lineages 158, 141, and 63 in our study samples (Fig. S7). However, the triple dhfr mutants in the B V1 lineage are not accompanied by secondary mutations (I164L and the "Bolivian repeat" [BR], a silent 5-amino-acid insertion before codon 30), indicating that they did not emerge from the dhfr double mutant lineages circulating in earlier years. In conclusion, the combination of different markers in the Pf AmpliSeq assay allowed the identification of multiple recombining P. falciparum lineages in Peru that share increasing SP and CQ resistance, as well as RDT failure, in more recent years.

DISCUSSION
We designed a new highly multiplexed sequencing assay (Pf AmpliSeq) targeting 13 antimalarial resistance genes, combined with a country-specific 28-SNP barcode for population genetic analysis in Peru and hrp2 and hrp3 genes to detect deletions that cause false-negative RDT results. The novelty and strength of this assay are in the many different types of markers multiplexed in one easy workflow, making it suitable for many surveillance use cases. Compared to other amplicon sequencing assays (30,34,67) and genetic surveillance platforms (42), the Pf AmpliSeq assay includes the highest number of drug resistance-associated genes combined with an in-country SNP barcode as well as diagnostic resistance markers.
In many remote areas, sample collection for malaria molecular surveillance is possible with DBSs only, which are easily transported at room temperature to a central laboratory for further processing and ideally stored at 220°C to maintain DNA integrity. This assay performs well on DBS samples with parasite densities of $60 p/mL and allows high-accuracy sequencing at a higher depth of coverage (100-to 1,000-fold versus 50-to 100-fold) and a lower cost (see Table S14 in the supplemental material) than WGS. At densities of ,60 p/mL, sWGA prior to the Pf AmpliSeq assay increases the number of reads but also increases the error rate and cost. In addition, the Pf AmpliSeq assay is less sensitive to DNA degradation than methods targeting long amplicons.
Initial raw data processing can be performed automatically with the software of the sequencer using the manifest file available at bio-protocol. However, here, we used a Linux-based pipeline that gave us more flexibility and insight during the validation process. Subsequent variant analysis can be performed with minimal bioinformatic skills using any software that can analyze tabulated data. An automated analysis environment would be of great benefit for the rapid output of genetic reports and easy interpretation, increasing the actionable use of data to enable programmatic decision-making.
Only nonvalidated K13 SNPs were detected at low frequencies in the Peruvian samples, suggesting that ART resistance has not reached Peru, at least in recent years (2003 to 2018). Mutations in other genes in the endocytosis pathway have been associated with decreased sensitivity to ART in in vitro studies, and changes in ubp1 haplotypes paralleled the emergence of K13 markers in Thailand (60,(68)(69)(70)(71)(72)(73)(74)(75)(76). However, the observed ubp1 and coronin mutations here are uncharacterized. Genetic variants in KIC7, Eps15, and Formin2 associated with in vitro ART resistance (70-72, 74, 77) were not yet reported when the assay design was being completed. These genes will be included in future versions of the assay.
One main advantage of the Pf AmpliSeq assay is that the panel of targeted amplicons can be easily adapted either to update the panel with newly identified drug resistance-associated genes or other makers of interest or to adapt the assay to a different country or region by exchanging the SNP barcode for in-country resolution. The assay in its current form is limited to South America. This is the first proof of principle of an NGS amplicon assay targeting hrp2 and hrp3 deletions. From the analyzed study samples, 56.3% had both hrp2 and hrp3 deleted, but this might be an underestimation since infections containing polyclonal strains with and without deletions will be classified as RDT detectable. For hrp2, 22% of samples could not be classified due to conflicting results between hrp2 amplicons. The difference in read depths among hrp2 amplicons suggests that only part of the gene is deleted, with part of the first exon still being present. As was recently done in Ethiopia (78), a better characterization of the genomic structure of hrp2 deletions in Peru will improve amplicon design. In addition, copy number calling tools could be explored for more systematic classifications of hrp2 and hrp3 deletions.
With the combination of phenotypic markers and the 28-SNP barcode in the Pf AmpliSeq assay, we were able to detect changing allele frequencies in resistance-associated genes, but we were also able to identify temporal genetic differentiation in the parasite population, alongside a decrease in genetic diversity. The barcode served to distinguish different lineages in multiple resistance haplotypes (in dhfr, dhps, crt and mdr1) circulating in recent years and enabled the investigation of drug resistance evolution.
SP and CQ resistance in South America was reportedly spread from a single origin in the lower Amazon (45), with five distinct clonal lineages circulating in Peru before 2006 (19,61,79), followed by frequent outcrossing and lineage evolution (79). With the Pf AmpliSeq assay, we observe the same SP/CQ-resistant lineages and their evolution over time, with increased resolution and additional variants detected in other genes.
After 2008, the parasite population in Peru was increasingly dominated by B V1 -type lineages, which are highly SP/CQ resistant and can frequently escape HRP2-based RDTs (HRP2-based RDTs are not commonly used in Peru). The B V1 -type lineages seem to be sensitive to ART-MQ (absence of K13 variants and the mdr1 CNV), while the high level of CQR (in crt and mdr1) could be the result of CQ treatment of coendemic P. vivax cases. Coinfections with very low P. falciparum densities (80) are frequently misdiagnosed and treated as P. vivax monoinfection (79). When the B V1 -type lineages first appeared (2011), we observed strong population differentiation and the loss of secondary dhfr mutations, characteristic of Peruvian and Bolivian SP-resistant parasites (45), which supports suggestions from others that this lineage was introduced from other parts of the Amazon (81,82), possibly Colombia (20,21) or Suriname (49). Analysis of samples from neighboring countries could shed light on the evolution and spread of resistance and hrp2 and hrp3 deletions in the region; however, genomes from South America are heavily underrepresented in publicly available genomic databases.
The COI results varied depending on the analysis method used, and it was difficult to determine which method approached the true COI. The variant calling tool used here assumes a diploid genotype with balanced within-sample allele frequencies, which is frequently not the case in malaria complex infections, and this can complicate COI analyses. Alternative haplotype-based analysis approaches (e.g., SeekDeep [83], HaplotypR [57], and DADA2 [84]) might have a greater ability to unravel complex infections. These tools use a different error-handling approach and thereby can resolve more fine-scale variation and are more suitable for high-resolution relatedness analysis (in combination with identity by descent [IBD] [85]) but are less suitable for confidently calling drug-resistant variants. However, applying these methods to the Pf AmpliSeq data is complicated by the overlapping amplicons in drug resistance genes and requires further validation. While it is possible to perform IBD analysis with the AmpliSeq data, in this study with sampling by convenience, it was deemed less appropriate.
Half of the SNPs in the barcode became fixed in the most recent years, reflecting a true decrease in diversity (as the sample size was sufficient) with closely related clonal lineages. Although our study sample size is comparable to those of previous reports from Peru (e.g., 220 samples in 1999 [79], 104 samples from 1999, and 62 samples from 2006 to 2007 [61]), it represents a limited proportion of the Peruvian P. falciparum population. In the future, a more systematic sampling approach, ideally performed at regular times and covering all regions where malaria is endemic, would increase genetic surveillance intelligence to inform malaria control and elimination strategies in Peru.
Concluding remarks. We have demonstrated the validity of the Pf AmpliSeq assay for several use cases in malaria molecular surveillance in Peru. With this tool, we have identified P. falciparum lineages with an increasing accumulation of drug and diagnostic resistance-associated mutations that can become a serious threat to P. falciparum elimination in the Amazon region. The Pf AmpliSeq assay also has the potential to perform well to characterize malaria outbreaks in Peru (81,86) and predict the origin of imported infections (40,41,87).
In Peru, we have established a molecular surveillance network (GENMAL) with the aims of sharing tools, samples, and data; coordinating malaria research efforts; and stimulating dialogue among malaria stakeholders in the region. The network promotes the implementation and use of NGS tools to inform decision-making for malaria elimination and is a platform to support the efficient translation of results into policy and practice. The Pf AmpliSeq assay can be adapted to serve malaria molecular surveillance in the Amazon region and South America, filling the void of genetic data from this region.

MATERIALS AND METHODS
Study sites, samples, and controls. P. falciparum qPCR-positive samples (n = 312) from previous studies were selected based on geographical representation (Fig. 2) and parasite density ($100 p/mL by qPCR). Samples from 2003 to 2017 (n = 269) were from published (18,88)  P. falciparum laboratory isolates from cultures (91) and uninfected human blood spotted onto DBS cards were included as controls (details are listed in Table S15 in the supplemental material). Control samples (n = 6) with known drug resistance-associated variants from a previous study in Vietnam (92) and genotyped using the SpotMalaria pipeline and WGS were also included. DNA from these controls (100 mL or 3 DBS punches) was extracted using a QIAamp DNA minikit (Qiagen) according to the manufacturer's instructions. sWGA was performed as previously described (93). Genotypes of the laboratory strains were reported previously (66,(94)(95)(96).
SNP barcode design. A barcode of 28 biallelic SNPs was designed as described in detail in the supplemental material. Briefly, MalariaGEN Plasmodium falciparum Community Project data (97) were used to select SNPs (0.35 # MAF # 0.5) contributing to between-country differentiation using DAPC (98). Two SNPs per chromosome were selected with priority for synonymous SNPs with low pairwise LD.
Pf AmpliSeq workflow and sequencing. Library preparation was performed using the AmpliSeq library plus for Illumina kit (Illumina), AmpliSeq custom panels (Data Set S2), and AmpliSeq CD Indexes (Illumina) according to the manufacturer's instructions. The DNA concentrations for the controls and 25 samples were determined using the Qubit v3 high-sensitivity DNA kit (Invitrogen). Controls were diluted to 1 ng input DNA to mimic DBS samples. Study samples (mean DNA concentration of 6.1 6 0.3 ng/mL) were not diluted prior to library preparation. For each sample, a single library was prepared, except for a subset of samples to determine reproducibility. Target regions were amplified from 7 mL DNA (1 to 150 ng) with adjusted cycling conditions (1 cycle of 99°C for 2 min, 21 cycles of 99°C for 15 s, and 1 cycle of 60°C for 8 min) in two reactions and subsequently combined for final library preparation according to the manufacturer's guidelines. Sample libraries were quantified using the Kapa library quantification kit for Illumina platforms (Kapa Biosystems), diluted to 2 nM with low-Tris-EDTA buffer, and pooled to generate a final pool containing 2 nM each library. The denatured library pool (diluted to 18 pM) was loaded onto a MiSeq system (Illumina) for 2 Â 300-bp paired-end sequencing (MiSeq reagent kit v3; Illumina). A detailed protocol is available at https://doi.org/10.21769/BioProtoc.4621. A 20% PhiX spike-in (Illumina) was used in one sequencing run to determine the effect on the sequencing quality and the trade-off in coverage. The generated demultiplexed FASTQ files were processed with an in-house analysis pipeline on a Unix operating system computer as described in more detail in the supplemental material.
For the final analysis, we selected 254/312 (81%) samples with good-quality data (,50% missing genotype calls and mean coverage of . 15) and retained only one library of replicates (with the lowest missingness). A MAF of 0.01 with a precision of 0.02 (2%) can be detected with sample size of 96 samples, while a MAF of 0.01 with 1% precision can be detected (as determined with https://epitools.ausvet .com.au/oneproportion).
Detection of mdr1 and pm2 copy number variations by PCR. CNVs in pm2 and mdr1 were determined for a subset of samples (n = 78) selected randomly from all time periods and districts using qPCR (92). Samples and controls were tested in triplicate, with 20% of the samples being retested for reproducibility and 100% of the results being in agreement.
Analysis of the complexity of infection. The COI was estimated using Real McCOIL categorical and proportional methods (99) in R using the following subsets of variants: (i) all biallelic SNPs, (ii) 25 barcode SNPs, and (iii) "core variants," biallelic variants excluding repetitive regions (i.e., hrp2, hrp3, and MSs) and mitochondrial and apicoplast regions. Default settings were used for a fixed error; for the categorical method, an upper bound of 10 and an initial COI of 5 were used, and for the proportional method, an upper bound of 15 and an initial COI of 7 were used. We allowed 50% missingness for biallelic SNPs, 20% for the barcode, and 40% for core variants.
We also estimated the proportions of single-and multiple-clone infections based on the number of heterozygous genotypes in (i) the 28-SNP barcode, (ii) biallelic SNPs in ama1, and (iii) core variants. Samples with $1 heterozygous genotype were considered to contain multiple clones (COI $ 2). Finally, we estimated the COI from the MS-amplified regions. As the different methods generated divergent results, the mode (most frequent value for single versus multiple clones) was determined as the final measure for each sample.
Validation of detection of hrp2 and hrp3 deletions. The performance of Pf AmpliSeq for hrp2 and hrp3 deletion detection was assessed using previous PCR-based hrp2 and hrp3 genotypes of 10 included samples (31). The presence or absence of both genes was determined using the mean read depth of hrp2 and hrp3 amplicons, as explained in more detail in the supplemental material. Due to the repetitive nature and homologies between the hrp2 and hrp3 genes, we used a conservative cutoff value, resulting in a "gray zone" where deletion/presence was left inconclusive when the majority of amplicons were not in accordance (Fig. S9). A final variable for RDT failure (absence of both hrp2 and hrp3) versus RDT detectable (presence of hrp2 and/or hrp3) was created, allowing the classification of samples that were inconclusive for one of the two genes if the other gene was present.
Population genetic analysis. Population genetic analyses were performed using the 28-SNP barcode, excluding samples missing .7 SNPs (25%), unless specified otherwise. Samples with heterozygous genotypes were included, justified by the little genetic differentiation (F ST = 0.05) between samples with single-clone barcodes and samples with heterozygous barcode genotypes. Genetic differentiation was measured as F ST (100), G9 ST (101), and Jost's D (102) using 1,000 bootstraps with the R package diveRsity (103). PCA was performed on the genotype matrix of within-sample allele frequencies using the prcomp function (R stats package v4.0.5) using core variants. Prior to PCA, missing genotypes were replaced by the mean allelic frequency at a locus in all samples. The expected heterozygosity (H e ) was calculated (adegenet package [104,105] in R) using diploid barcode genotypes. A Wilcoxon signed-rank test with continuity correction was used to compare the mean observed heterozygosity (H obs ) to the H e . LD was measured as the standardized index of association (rD) in a multilocus analysis using 999 resamplings (method, permutation of alleles), using the poppr package v2.8.6. (106) in R. ML lineages were defined by grouping isolates with similar barcodes using the Hamming distance and clustered based on the maximum distance (farthest neighbor) using poppr. DAPC (98) was performed with cross-validation, and the associated allele loadings for the first four components were determined using the adegenet package in R.
Analysis of genetic variants. Barcode allele frequencies were calculated from allele depths to reflect population allele frequencies in complex infections, as described in the supplemental material. Haplotypes were created by combining genotypes of variants of interest (listed in Data Set S1). Frequency tables with confidence intervals were created with the freqtables R package.
Data availability. Sample metadata, drug resistance and hrp2 and hrp3 haplotypes, barcodes, lineages, locations, and dates, etc., are accessible at https://microreact.org/project/aV7RHNCBmG3sJ2rwzchE4k -peru-2003-2018-molecular-surveillance-with-p-falciparum-ampliseq-assay. Raw data (FASTQ files) are available at the SRA under BioProject accession number PRJNA855317, and individual library accession numbers are listed in the supplemental metadata file at Microreact (see the URL mentioned above). A detailed protocol of the library preparation procedures has been deposited at https://doi.org/10.21769/BioProtoc.4621. It is anticipated that this link will be active by 5 March 2023; until that time, all the data will be available from the corresponding author upon request. Variant files (vcf) and scripts are available upon request. All other data are included in the manuscript and supplemental material.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.