Plasmodium vivax genomic surveillance in the Peruvian Amazon with Pv AmpliSeq assay

Background Plasmodium vivax is the most predominant malaria species in Latin America, constituting 71.5% of malaria cases in 2021. With several countries aiming for malaria elimination, it is crucial to prioritize effectiveness of national control programs by optimizing the utilization of available resources and strategically implementing necessary changes. To support this, there is a need for innovative approaches such as genomic surveillance tools that can investigate changes in transmission intensity, imported cases and sources of reintroduction, and can detect molecular markers associated with drug resistance. Methodology/Principal findings Here, we apply a modified highly-multiplexed deep sequencing assay: Pv AmpliSeq v2 Peru. The tool targets a newly developed 41-SNP Peru barcode for parasite population analysis within Peru, the 33-SNP vivaxGEN-geo panel for country-level classification, and 11 putative drug resistance genes. It was applied to 230 samples from the Peruvian Amazon (2007–2020), generating baseline surveillance data. We observed a heterogenous P. vivax population with high diversity and gene flow in peri-urban areas of Maynas province (Loreto region) with a temporal drift using all SNPs detected by the assay (nSNP = 2909). In comparison, in an indigenous isolated area, the parasite population was genetically differentiated (FST = 0.07–0.09) with moderate diversity and high relatedness between isolates in the community. In a remote border community, a clonal P. vivax cluster was identified, with distinct haplotypes in drug resistant genes and ama1, more similar to Brazilian isolates, likely representing an introduction of P. vivax from Brazil at that time. To test its applicability for Latin America, we evaluated the SNP Peru barcode in P. vivax genomes from the region and demonstrated the capacity to capture local population clustering at within-country level. Conclusions/Significance Together this data shows that P. vivax transmission is heterogeneous in different settings within the Peruvian Amazon. Genetic analysis is a key component for regional malaria control, offering valuable insights that should be incorporated into routine surveillance.


Introduction
Malaria continues to impact millions of people around the world, particularly those living in low-and middle-income countries.P. vivax infections make up 18.0% to 71.5% of cases in regions outside Africa, with the highest proportions in the Americas [1], mainly affecting populations living in remote areas with poor access to healthcare services.The Amazon Basin reports the majority of malaria cases in the Americas, including Peru (> 26000 malaria cases in 2022, 88% caused by P. vivax) [2].
Challenges to malaria control and elimination in the Amazon region are diverse.P. vivax infections in the region are often asymptomatic (58-93%), below the detection threshold of microscopy (61-96%, sub-patent infections) [3], while a significant proportion of these infections carry gametocytes, the parasite stages responsible for transmission to the vector [4].Moreover, the highest P. vivax incidence occurs in hard-to-reach areas, such as remote indigenous communities [5], and cross-border regions [6] where human mobility, often linked to economic activities, facilitates transmission and reintroduction [7].In addition, low density infections typically require PCR-based tools for diagnosis, which are difficult to implement in remote areas.Although isothermal amplification assays -such as loop-mediated isothermal amplification (LAMP)-have the potential for point-of-care diagnosis [8][9][10], they have not been broadly implemented.Therefore, these asymptomatic and sub-patent infections typically are undiagnosed and untreated, maintaining malaria transmission [4].
Despite these P. vivax specific challenges, the region has made significant efforts to control malaria, with a reduction of the number of cases by 60% from 2000 to 2021 [1].Therefore, many countries are moving towards malaria elimination in the next 5-10 years by implementing National Malaria Control Programs (NMCPs).In Peru, the Ministry of Health (MINSA) launched a National Malaria Elimination Program (NMEP) in 2022, intending to reduce the number of cases by 90% in 2030 [11].
P. vivax transmission is not uniform across the Peruvian Amazon region and varies significantly between different ecological niches suggesting that transmission is also influenced by ecological and environmental factors such as temperature and rainfall [12].Since efforts to control P. vivax malaria in Peru must consider its complexity and heterogeneity of transmission, well-structured and integrated genomic surveillance systems can significantly contribute to pinpoint priorities and guide NMCP/NMEPs decision-making [13][14][15].These systems empower countries to detect areas with the greatest need for control measures, identify program deficiencies and suboptimal strategies, and determine the amount of connectivity between regions.Through genomic analyses, the P. vivax parasite population of Latin America has been described as a distinct population from the rest of the world, presenting relatively lower genetic diversity due to relatively recent introductions in the region [16][17][18].In the case of the Peruvian Amazon, Iquitos city acts as a source from where parasites spread to other areas through human mobility related to socio-economic activities [19,20].Additionally, there is substantial gene flow between parasite populations from geographically distant areas in the region [21].
Previously, we have developed a highly-multiplexed amplicon sequencing (AmpliSeq) assay for malaria genomic surveillance of P. falciparum in Peru (Pf AmpliSeq Peru, [22]) and P. vivax in Vietnam (Pv AmpliSeq v1 Vietnam, [23]).Our assays include validated and candidate genes associated with antimalarial resistance, SNP-barcodes to serve different functions, i.e. a global P. vivax SNP-barcode for prediction of origin, and country or region-specific SNP-barcodes for population genetics analysis such as transmission dynamics and connectivity.The AmpliSeq assays have the advantage, in addition to cover a large panel of genetic variants of interest, to be easily adaptable for diverse epidemiological contexts, and have been successfully used to identify temporal changes in the parasite population (before and after NMCP interventions), monitor drug resistant markers, assess connectivity, and identify sources of reintroduction [22,23].
In this study, we modified the Pv AmpliSeq v1 Vietnam targeted NGS assay to Pv Ampli-Seq v2 Peru, incorporating a newly designed Peru-specific SNP-barcode (named Pv Peru barcode from here on) with in-country resolution instead of the Vietnam specific barcode.The Peru barcode was validated in silico using a published dataset of 399 genomes from Latin America [18].
We analyzed 230 samples collected between 2007 and 2020 across 11 districts in the Peruvian Amazon and generated new genetic data as baseline information for molecular surveillance in Peru.While previous population genetic analysis in the Loreto region have focused in Maynas Province, we have included samples from remote communities in 2 additional provinces: Loreto and Mariscal Ramon Castilla.This tool provides valuable information to effectively guide malaria control and elimination efforts in Peru by facilitating the generation of high-quality data and strategic information for malaria surveillance in Latin America.

Ethics statement
Samples from previous studies with written consent for future use for malaria research were used with protocols registered in the Decentralized System of Information and Follow-up to Research (SIDISI) of the University Directorate of Research, Science and Technology at Universidad Peruana Cayetano Heredia (UPCH), to be then evaluated by UPCH Institutional Research Ethics Committee (CIEI) prior to its execution (SIDISI codes 52707, 61703, 66235, 101518, 102725).In Yavari, Ramon Castilla and Trompeteros districts, samples were collected as part of routine surveillance activities by MINSA.MINSA authorities transferred these samples to the UPCH team for research activities (SIDISI project 102725).Secondary use of all samples for the purpose of genomic surveillance of malaria was approved through the Institutional Review Board of the Institute of Tropical Medicine Antwerp (reference 1417/20).

SNP selection for barcode design
To design a SNP barcode with in-country resolution in Peru, raw whole genome sequencing data (Fast Q files) of P. vivax isolates from Peru generated in-house (n = 30 from [23,24]) was combined with online available Peruvian P. vivax genomes (n = 100 from [25][26][27][28]) and jointly genotyped after variant calling as described elsewhere [18].Briefly, FASTQ files were aligned to the PvP01 reference genome version 46 from PlasmoDB [29] and variants were called using GATK4 HaplotypeCaller.Allele frequencies of the selected SNPs in the design were assessed in the global genome dataset (n = 1474) [18], as well as in genomes from Latin America (n = 399) from that dataset.
To design the SNP Peru barcode, unlinked biallelic SNPs were filtered from the genomic dataset by LD-pruning in 5-6 iterations by scanning over the genome in 500 bp windows to remove uninformative SNPs with pairwise LD>0.2 using the python package scikit-allel.Subsequently, the contributions of the SNPs to genetic clusters were determined using discriminant analysis of principal components (DAPC) [30] with K-means inferred populations (n = 10) using the adegenet package in R. DAPC was performed multiple times (n = 20) with cross-validation, and associated allele loadings between simulations were compared to determine the most contributing alleles.Finally, 49 SNPs with high allele loadings in the DAPC were selected with spread over the chromosomes (aiming for 2-4 SNPs per chromosome, which gave good population differentiation in our previous studies [22,23] and distance (>500 bp) from drug resistant amplicons targeted in the assay to avoid linkage).The Illumina Concierge team (Illumina, San Diego, USA) used DesignStudio software with the PvP01 reference genome to add amplicons for the new barcode to the existing Pv AmpliSeq v1 Vietnam custom panel design [23] without the Vietnam SNP-barcode.Out of the 49 selected SNPs, amplicon design was successful for 41 SNPs, resulting in the Pv Peru Barcode.

Samples and study settings
In order to generate baseline genetic surveillance data that includes parasite populations from peri-urban areas, remote areas and border communities across a wide area in the Peruvian Amazon, we selected samples retrospectively from prior studies in Peru.P. vivax qPCR-positive samples (n = 230) were selected based on geographical origin and parasite density.We applied a parasite density cut-off of �5 parasites/μL by qPCR, to prioritize samples with sufficient template for expected successful library preparation as determined previously [23].Samples were from 11 districts in 3 provinces in the Loreto region: Loreto, Maynas, and Mariscal Ramon Castilla (Fig 1 ).

DNA extractions and quantification
DNA from all samples was extracted using the E.Z.N.A Blood DNA Mini Kit (Omega Bio-tek, Georgia, USA) or QIAmp DNA Blood Mini Kit (Qiagen, Germany) following the manufacturer's instructions.For DNA extraction using the E.Z.N.A kit, we used 40μL of whole blood, or a ~0.72 cm 2 piece of dried blood spot (DBS); DNA was eluted in a final volume of 50 μL.For DNA extraction using the QIAmp kit, we used two punches of DNA (0.6 cm diameter); DNA was eluted in a final volume of 100μL.P. vivax parasitaemia was quantified by 18S rRNA qPCR [35] and/or Pv mtCOX1 qPCR [36].

AmpliSeq library preparation and bioinformatics
Pv AmpliSeq library preparation was performed using AmpliSeq Library PLUS for Illumina kit (Illumina), AmpliSeq Custom Panel design (i.e. the primer pools as per the new design with amplicons of 158-384 bp lengths; S2 Table ) and AmpliSeq CD Indexes (Illumina) as per the manufacturer's instructions.Library preparation was performed on 7 μL of undiluted DNA as previously described [23], following similar steps as the protocol for P. falciparum AmpliSeq [37].Briefly, target regions were amplified in two reactions, and then combined for final library preparation.Libraries were quantified using Qubit v3 High sensitivity DNA kit (Invitrogen, Massachusetts, USA), and subsequently diluted to 2nM with low Tris-EDTA buffer, and pooled (pooling 144 libraries per sequencing run).Denatured library pool (7 pM) was loaded on a MiSeq system (Illumina) for 2x300 paired-end sequencing (Miseq Reagent Kit v3, Illumina) with 1-5% PhiX spike-in (Illumina).
FASTQ files were processed with an in-house analysis pipeline on a Unix operating system computer, as previously described [23], where trimmed reads were aligned to the PvP01 reference genome using Burrows-Wheeler aligner followed by GATK haplotypecaller generating a jointly-called VCF file with variants (SNPs and INDELs) detected in the targeted regions (scripts are available at https://github.com/Ekattenberg/Plasmodium-AmpliSeq-Pipeline).Variants were hard filtered (QUAL>30, overall DP>100, MQ>50, QD>1.0,SOR<4, GT depth >5) and annotated with SnpEff (v4.3T) [38], resulting in 3862 high quality genotypes (incl.all variant types, e.g.SNPs and indels).Per locus filtered depth of coverage (format field DP in the vcf) was used to calculate the median depth of all loci per sample or per amplicon.Aligned coverage was calculated as the number of bases passed filter divided by the number of bases (59815 bp) targeted in the Pv AmpliSeq v2 Peru.
A total of 274 unique samples from 5 different projects, 84 replicates and 69 controls were attempted with the Pv AmpliSeq assay (Fig A in S1 Text).For the analysis of genomic surveillance use cases in Peru, we included 230/274 (83.9%) samples (Fig 1) with good quality data (<25% missing genotype calls for all variants, mean coverage >15), and retained only one library of replicates (with lowest missingness).The size of the bubbles indicates the number of samples collected in a specific year and district. https://doi.org/10.1371/journal.pntd.0011879.g002

Data analysis
Different SNP subsets were used in the data analysis as specified for each analysis below.The Pv Peru barcode was first evaluated in silico in a WGS dataset from [18], using the barcode SNPs and a larger set of biallelic SNPs in the WGS data that were also present in the newly generated Pv AmpliSeq Peru data (n = 730).In the WGS data not all the SNPs from the Pv Ampli-Seq Peru (n = 2909 in total) were present in this dataset, partly as the MAF filter applied in that dataset could have removed SNPs that are rarely observed in other regions in the world.For the analysis of the Pv AmpliSeq Peru data, we used the 41 SNP loci.In addition, while the Pv Peru Barcode is appropriate to be used for population genetic analyses, higher geographical resolution with the AmpliSeq assay data can be obtained by using all biallelic variants or SNPs in the vcf (n loci = 3862, including n SNP = 2909).Using all biallelic variants in the assay also allows the use of models such as identity-by-descent, which requires at least 200 biallelic SNPs [39].We use the biallelic SNPs as we previously saw these variants had a lowest error rate compared to indels [22].However, for the DAPC analysis indels are also included, as these can be biologically relevant variants.Genetic diversity, expressed as expected heterozygosity (He), was calculated using polymorphic barcode (40/41 SNPs) genotypes from the vcf with the adegenet package in R [40,41].Nucleotide diversity (pi) was determined by sliding across the target regions in the genome in 500-bp windows using vcftools and a vcf file with all biallelic SNPs detected by the assay.To compare the median of the genetic diversity parameters across different districts and periods, Kruskal-Wallis rank sum test was performed.For pairwise comparisons between groups, Wilcoxon test with the False Discovery Rate (FDR) or Benjamini-Hochberg procedure as correction for multiple testing was used.Statistical tests were performed with the R package stats.Pvalues < 0.05 were considered significant.PCA and DAPC with cross-validation was performed to infer population structure based on haplotypes across districts and years [30].Associated allele loadings for the first four components in the DAPC were determined.
Genetic differentiation, expressed as fixation index (F ST ), was calculated using all biallelic SNPs (n = 2909) detected by the assay with the R package hierfstat [42].Within-host infection complexity was assessed using within-sample F-statistic (Fws) with the R package moimix [43] using all biallelic SNPs (n = 2909) detected by the assay.Fws � 0.95 was considered a proxy for a monoclonal infection as in Auburn et al. 2012 [44].
We created a list of variants of interest (S3 Table ) that included variants in genes reported in the literature as potentially associated with P. vivax antimalarial resistance.The list was supplemented with non-synonymous variants detected in the target genes that contributed to the variation in the DAPC.Haplotypes were created by combining genotypes of variants of interest.
To measure pairwise identity-by-descent (IBD) between samples, PED and MAP file formats were generated from VCF data using VCFtools.The level of IBD-sharing was calculated employing the isoRelate package in R [45], following previously described settings [23].Specifically, we used all biallelic SNPs (n = 2909) identified by the AmpliSeq assay, applying filtering criteria of the MAF = 0.001 and SNP and individual missingness thresholds (0.6 and 0.3, respectively), resulting in 1219 SNPs for subsequent IBD analysis.Furthermore, IBD thresholds were set to include a minimum of 15 SNPs per segment and a segment length of 3000 bp, aimed at mitigating false-positive calls using an error of 0.001.Network plots (at thresholds of 99%, 50% and 10% IBD) were created using the igraph package in R to visualize the relatedness between samples (95% and 50%) and connectivity between districts (10%).
A likelihood-based classifier was used to predict the origin of P. vivax isolates using the 33-SNP vivaxGEN-geo barcode, using the vivaxgen geo framework (https://geo.vivaxgen.org/)and reference dataset [46].

Design and in silico evaluation of the Pv Peru-barcode using WGS
We designed a 41-SNP Pv Peru Barcode, with in-country resolution (i.e.able to separate between distinct P. vivax populations in Peru based on variability of allele frequencies between districts and provinces) using P. vivax genomes from Peru (n = 130) [18].SNPs in the barcode were selected based on their contribution to geographically distinct genetic clusters in the DAPC.Allele frequencies (AF) of the 41 SNPs in the barcode were evaluated in silico in P. vivax genomes (n = 1474) originating from 31 countries, including 130 Peruvian genomes [18].Minor allele frequencies (MAF) of all SNPs varied between 0.01-0.49,with a median of 0.16 (S4 Table ).Within genomes from Peru, most SNPs (63%) had a MAF>10%, with 10 alleles observed at MAF<5%.The MAF in isolates from multiple countries in Latin America (including Mexico, Panama, Colombia, Brazil) were similar (median MAF 0.09, range 0.01-0.49) to the MAF found in Peru.The Pv Peru Barcode was capable of separately clustering the Latin American population in PCA analysis (Fig 3A and 3C).The resolution for population structuring increased when using n = 753 loci (i.e. the overlap in SNPs between the WGS data and all biallelic SNPs detected in the Pv AmpliSeq v2 Peru target region, mimicking the resolution for population structure achievable with the assay), separating the samples from different countries with distinct subpopulations (Fig 3B and 3D), matching patterns previously observed in the whole genome dataset (Fig B in S1 Text).Altogether, this demonstrates a wider applicability of the Pv AmpliSeq v2 Peru assay to the Latin American region and the optimal resolution of the Pv Peru barcode to differentiate parasite populations by country in Latin America.

Barcode and Pv AmpliSeq v2 Peru assay performance
The Pv Peru Barcode was experimentally validated with the Pv AmpliSeq v2 Peru assay using P. vivax isolates (n = 230) from several regions in Peru (Figs 1 and 2).All 41 targeted SNPs in the barcode were amplified successfully, with a median MAF of 0.24 (range: 0.01-0.49,mean ± standard deviation: 0.25 ± 0.14) in all samples from Peru.For the SNP PvP01_13_v1_32509, only the reference allele was detected.Most of the samples performed well with the Pv Peru barcode: 185 (80.4%) and 213 (92.6%) samples had <10% and < = 25% missing genotypes at the barcode positions, respectively.There were 7 (3%) samples without barcode data due to an experimental error (Pv Vietnam barcode primers from the Pv Ampli-Seq v1 Vietnam assay were used instead of the Pv Peru barcode), however the samples were kept for the variant of interest and prediction of origin analyses as they passed the criteria for all positions.We detected a median of 14 (range 4-28) biallelic SNPs per barcode amplicon in all samples, which could potentially be used as microhaplotypes in a haplotype-based approach (S4 Table ).The Pv AmpliSeq v2 Peru assay generated a high number of reads per sample (median 119033, interquartile range [IQR] 59535-243132 reads/sample after trimming low-quality reads), with a median of 99.4% (IQR 97.7-99.9%) of reads aligning in pairs to the PvP01 reference genome.The median depth of coverage of aligned high-quality reads past the filter (DP) was 335 reads (IQR 121 to 859) (Fig C in S1 Text).There were 4 (1.8%) amplicons with low mean DP-values (<10) (pvama1_4, pfdhfr_8, pvmdr1_1, and PvP01_14_v1_2841138), and 1 amplicon (0.44%) had high mean DP-values (>200) (pvcrt_11).
Primer specificity was confirmed using uninfected human blood samples (n = 2) and a 3D7 laboratory strain (n = 3) as negative controls.The few sequencing reads generated mapped predominantly outside the assay target regions, and only a mean 1.1% ± 0.8% of all variants in these regions were called.All called variants were below the filtering and inclusion thresholds.In addition, previously Pv AmpliSeq v1 Vietnam genotyped samples (n = 24, from [18,23]), were repeated in the Pv AmpliSeq v2 Peru assay.When comparing genotypes at variants of interest loci that were included in both the AmpliSeq v1 and AmpliSeq v2 versions (n = 26 loci), the majority of pairwise comparisons between the replicates, resulted in identical genotypes.Except in the case of 5 genotypes (in 3 samples), where mixed genotypes (more than one allele present) were not detected in both replicates.These samples also had more than one allele at other loci (indicative of multiple strain infections), and therefore 10/15 mixed genotypes were identified in both replicates.

Spatial-temporal patterns in transmission intensity
Complexity of infection (COI) and genetic diversity-expressed as expected heterozygosity (He) and nucleotide diversity (pi)-were used as a proxy of transmission intensity.Both parameters were compared between parasite isolates from different districts and years (Fig 3).He was estimated using the SNPs-barcode positions only, and pi was measured over all biallelic variants detected in the samples.Moderate levels of He were found in all districts, except in Yavari, which had a significantly lower diversity (mean He = 0.0121; p < 0.0001) (Fig 4A ).Pi was significantly higher in Iquitos compared to its surrounding districts Mazan and San Juan Bautista (adjusted p<0.0001).In addition, similar to the observed patterns in He, pi was significantly lower in Yavari (adjusted p<0.0001) compared to all other districts (Fig 4C

Population structure and connectivity
The geographic differentiation and structure of parasite populations was investigated at the district level using all variants detected (n loci = 3862, including n SNP = 2909) to increase the resolution for fine scale population structure detected with the 41 Pv Perubarcode SNPs.Estimation of pairwise genetic differentiation (Fst) using all biallelic SNPs (n SNP = 2909) showed little differentiation (Fst: 0.02-0.07)between districts; except Yavari, which was highly differentiated (Fst: 0.39-0.55)from all other districts and in particular from Iquitos (Fig 5A).
We explored the population structure using DAPC and all variants detected with the AmpliSeq assay (n loci = 3862).The first two components (PC) separate the communities in Maynas province from remote communities in Loreto and Mariscal Ramon Castilla provinces: Yavari (PC1), Trompeteros (PC2) and Ramon Castilla (PC2) ( the contribution of these genomic regions to the population structure with sufficient resolution for within-country analysis in Peru.Highly-contributing alleles in genes of drug resistance interest, especially pvmdr2 and upstream variants of pvcrt, were found, including both missense mutations, potentially under drug-induced selection pressure, as well as synonymous mutations, which do not impact the phenotype but reflect the population genetic background (S5 Table ).
We assessed the connectivity of parasites in the Peruvian Amazon across space and time by measuring the genetic relatedness between isolates.We analyzed the pairwise IBD between samples within and between districts and years (Fig 6 and

Variants of interest
We investigated the haplotypes in the genes that have a potential association with drug resistance in P. vivax and orthologue species as previously described [23].Haplotypes were   ).In pvdhfr, we observed the pyrimethamine-resistance associated mutations S58R (125/ 230, 54.3%) and S117N (3/230, 1.3%), alongside other uncharacterized mutations, including S58K (S6 Table ).Two observed different nucleotide changes (codons AGA and AGG) resulted in the S58R amino acid change in pvdhfr.The A553G mutation in pvdhps and mutations F57L, T61M, S117T in pvdhfr were absent from the samples from Peru.In pvmdr1, we observed the CQ resistance-associated mutations Y967F (CQ and AQ) and F1076L (CQ) in 15.2% (35/230) of samples in Peru, always combined in the same haplotype that also carried the S698G mutation (S6 Table ).
We also investigated the variability in the antigenic pvama1 gene, with 6 observed haplotypes of 6 non-synonymous variants detected in the amplicons targeting the highly variable regions in this gene (S6

Predicting the origin of infections
We predicted the origin of samples from different districts in the Loreto region using the SNP vivaxGEN-geo barcode included in the panel.As expected, most samples from Mazan, San Juan Bautista, Iquitos and Trompeteros were predicted to originate from Peru (predicted value ranging between 50-75%), with a small proportion of samples with principal predictions to originate from Brazil (predicted value ranging between 22-35%) and Colombia (predicted value ranging between 3-25%) ( Fig 7).Samples from Yavari, Ramon Castilla and San Pabloall border communities-were more similar to Brazilian isolates ( Fig 7).

Discussion
In this study, we designed a Pv Peru Barcode capable in combination with the Pv AmpliSeq panel [23] to differentiate parasite populations from not only Peru, but the wider region of Latin America at a high resolution.We successfully applied the Pv AmpliSeq v2 Peru assay to a retrospective sample collection (n = 230) from 11 districts in the Peruvian Amazon.We observed marked differences between peri-urban, remote and border communities, with an overall heterogenous P. vivax population with high diversity in the Loreto region, but moderate to low diversity at the community level.This pattern of low-local but relatively highregional genetic diversity has been described previously using microsatellites [47] and is likely caused by the effect of random genetic drift in small populations that remain relatively isolated from each other.Rare alleles disappear (decreased local diversity translated in a reduction of He, and high relatedness IBD), while differentiation between sites increases.In contrast, in peri-urban communities close to Iquitos -the regional capital-a relatively high genetic diversity is observed with connectivity within and between districts and a temporal drift of the parasite population, in concordance with previous reports [19,48].
The observed connectivity between the peri-urban settings in Maynas province is likely due to economic activities and travel to the economic centers in the areas [49].The presence of malaria corridors in the Peruvian Amazon due to human mobility and other socioeconomic factors has been previously reported [7,21].The effects of human movement on malaria transmission in peri-urban areas of the Peruvian Amazon have been well-studied, though less is known about how migration and mobility affect the spread of malaria to and from remote indigenous and border communities.
In the remote isolated area of Trompeteros (district in Loreto province and inhabited by an indigenous population), a genetically distinct parasite population was observed with moderate diversity, high relatedness between isolates in the community and limited connectivity to other districts in Loreto province.While in the remote community of Yavari district (in Mariscal Ramon Castilla province and along the border with Brazil), a highly differentiated clonal P. vivax population was observed that carried distinct haplotypes in drug resistant genes and pvama1.This population was predicted to be more similar to Brazilian isolates and showed little connectivity with populations in the other provinces in Peru.This highly differentiated clonal population may have been introduced from Brazil causing an outbreak in Yavari.Indeed, 22% (36/163) of samples collected from symptomatic patients in the community, tested positive for P. vivax by PCR analysis.However, due to the limited epidemiological data at that time, an outbreak could not be officially confirmed.However, in addition of typically clonal P. falciparum outbreaks [50][51][52], P. vivax outbreaks have also been reported in Peru The origin of samples was predicted using the vivaxgen geo framework (https://geo.vivaxgen.org/),which returns 3 predictions of origin for each sample with a score for each prediction.Here we plot a heatmap of the frequency (y-axis) of the highest-scoring prediction of origin for the samples from each district of the sample collection site (x-axis).The majority of samples from Mazan, San Juan Bautista, Iquitos and Trompeteros were predicted to originate from Peru, while bordering communities Ramon Castilla, San Pablo and Yavari were predicted to originate from Brazil.
https://doi.org/10.1371/journal.pntd.0011879.g007[32].An alternative explanation is that, due to the vicinity of the Yavari community to the Brazilian border, the parasite population in Yavari shares ancestry with populations in Brazil.This would also result in a similar genetic pattern as observed.
Predictions of origin of parasites in this region are also limited by the low number of South American isolates in the reference dataset in the applied likelihood model, which could be improved by including more recent genomes from this region from other studies [16,18].There were 7 (3%) samples from Peru with predicted origins from Vietnam, Afghanistan and Iran.While it is possible that cases were imported from areas outside of South America, it is more likely that this is a result of incorrect predictions due to missing data in the 33-SNP vivaxGEN-geo barcode loci used for the prediction (3 samples had 18-30% missingness in this barcode compared to the proportion of missingness in all variants: 4-28%), or inaccuracies in the reference dataset, as this was also observed in our earlier study [23].
The presence of highly related parasites observed in these remote districts offers compelling evidence of local transmission in these isolated and hard-to-reach communities, which are accessible only after extensive riverine and road travel spanning many hours or days from Iquitos.Direct interactions with inhabitants during field work reveal occasional movement to neighboring communities for work or social engagements, yet such mobility rarely extends to more distant regions like Maynas.Nonetheless, our analysis underscores the relatedness between P. vivax parasites in these remote communities and those nearer to economic centers.Future investigations should prioritize studying patterns of human mobility and associated socio-epidemiological factors within these hard-to-reach communities to gain deeper insights into the factors driving malaria transmission.
Similar spatiotemporal transmission dynamics have been observed in various malariaendemic Latin American countries.In Brazil, the population structure is notably complex, characterized by numerous clades and a high prevalence of monoclonal infections.Notably, a clear genetic separation exists between the northern states (Amapa ´and Para ´), with high genetic diversity, and a highly clonal P. simium cluster originating from São Paulo [16].Additionally, within the Ma ˆncio Lima municipality, a cluster primarily comprised of monoclonal infections has persisted from 2014 to 2016, indicating stable transmission dynamics in this area [53].In contrast, Colombia displays a high proportion of monoclonal infections, alongside two subpopulations with significant spatiotemporal shifts, particularly evident in the Co ´rdoba region where one subpopulation has increased in prevalence over time [54].Similarly, the P. vivax population in Panama is highly clonal, with a dominant lineage persisting across the country over a decade (2007-2019), suggesting stable transmission despite elimination efforts and varying transmission levels across regions [55].Conversely, Venezuela has experienced an increase in polyclonality and genetic diversity of parasites in 2018 compared to 2003 [56].Overall, these findings underscore the impact of environmental factors and human activities, such as rural-urban mobility, on shaping parasite populations in these regions, highlighting the challenges in malaria control and elimination strategies.
The differences in gene flow and transmission dynamics resulting in distinct genetic patterns are important to guide NMEP strategies for malaria control and elimination.First, it is important to inform implementation of local vs.regional interventions in order to ensure progress towards malaria elimination in all areas.Second, gene flow between the Yavari population and neighboring Brazil, but also with isolates from other Peruvian districts, shows the importance of increasing surveillance in border areas (not only with Brazil, but also Colombia and Ecuador) and creating connections with health authorities from these countries to collectively better control malaria [57].In addition, specific attention can be directed towards addressing stable local transmission in hard-to-reach indigenous communities, such as in Trompeteros.The risk of malaria resurgence following elimination in these areas is mitigated by geographical remoteness and cultural differences.Nevertheless, also due to remoteness, MINSA interventions frequently fail to reach these communities, and when they do, are typically temporary.Consequently, the adoption of district or community-level strategies with targeted interventions, as previously proposed [58], may be advantageous in addressing malaria control challenges in Peru's heterogeneous settings.
Other regions of Peru, such as the Amazonas region (neighboring Loreto) or the Northern Coast, were not included in this study, highlighting the need for further investigations to fully characterize the entire P. vivax population in Peru.This is particularly crucial given the observed low local but high regional diversity.However, compared to other previous studies, our study included more districts and remote areas spanning a larger period [19][20][21].To fully assess the diversity of Peruvian parasites, a more systematic sampling approach is warranted, possibly linked with activities conducted by the NMCP and/or NMEP.Regular sample collection should be conducted through multiple sentinel sites across the country, including hardto-reach communities.Additionally, employing current active case detection strategies may be necessary when number of cases increases over a defined threshold at the community level.
We detected distinct haplotypes in genes putatively associated with drug resistance in Peruvian parasites.However, in contrast to P. falciparum, no validated markers of resistance have been validated for P. vivax, with the exception of SP-resistance [59].Therefore, the observed haplotypes are not characterized in clinical or in vitro studies for CQ or ART resistance in P. vivax, which are the first-line drugs for P. vivax and P. falciparum malaria in Peru, making it difficult to interpret these results considering the national treatment guidelines.In addition, while the WHO recommends routinely monitoring antimalarial efficacy every 2 years [60], the latest treatment efficacy studies (TES) for P. vivax in Peru were conducted about 10 years ago [61,62].Ex-vivo assays represent an additional approach to characterize resistance profiles in Plasmodium parasites.These assays need to be conducted quickly after sample collection and require laboratory facilities in the vicinity to the collection sites [63].For instance, reports from Porto Velho and Ma ˆncio Lima in Brazil spanning from 2012 to 2015 indicated the absence of chloroquine resistance during that period [64,65].Another study with samples from Iquitos collected between 2015 and 2019 reported chloroquine resistance in 3.3% (1/30) of isolates [66].However, these assays are not suitable to analyze low parasitemia infections, common in P. vivax infections, thereby complicating routine analysis [63].
The Pv AmpliSeq assay has been successfully applied in this study to low density samples from Peru, using the previously established threshold of a minimum of 5 parasites per microliter of blood [22].Accurate quantification, through standardized qPCR protocols and the use of a Ct-value cut-offs (e.g. a Ct-value of 34 cycles), which provide a more reliable representation of parasite DNA quantity compared to challenging-to-standardize diluted samples with known parasite densities, is crucial for precise sample selection in efficient surveillance protocols.In addition, maintaining DNA integrity in DBS samples, which are well-suited for sample collection in remote communities due to their ease of transportation to reference laboratories [67], is important to ensure optimal performance of the assay.
Asymptomatic infections are a critical challenge for P. vivax elimination in the Amazon basin, particularly in Brazil and Peru.Though most of these infections commonly have low density parasitemia, 15-31% of asymptomatic infections are patent [3].In this study, we have included samples from asymptomatic patients obtained in active case detection (Iquitos, San Juan Bautista, Trompeteros) and by population-based survey (Mazan), showing the potential of the Pv AmpliSeq assay to analyze both asymptomatic and symptomatic infections, as we also demonstrated for the Pf AmpliSeq assay in Peru [22].Asymptomatic infections should be included in molecular surveillance efforts given the large contribution of these infections into the malaria burden and transmission in Peru.
In summary, our study elucidated the heterogeneity of P. vivax transmission across diverse settings in the Peruvian Amazon by applying the Pv AmpliSeq v2 Peru assay.Results reveal local patterns of low diversity and connectivity, that are important for the NMEP to improve and tailor strategies in accordance with local epidemiology, ecology, and human population.Genomic surveillance of malaria using the Pv AmpliSeq v2 Peru assay can be a valuable tool for routine application in the country and potentially in other countries in South America with similar allele frequencies of the barcode SNPs.To ensure sustained progress in malaria control and eventual long-term elimination, it is essential to prioritize the effectiveness of NMCP/NMEPs through the optimization of resource utilization and the strategic implementation of necessary changes.Informed decision-making, guided by relevant data and an increasing recognition of the potential of genetic surveillance tools, is crucial for effectively addressing the unique challenges associated with P. vivax control and elimination efforts.
).The samples included here were collected in previously published studies from 2007-2008 (n = 41) [31], 2016 and 2017 (n = 54) [32], from population-based cross-sectional surveys in Mazan district in 2018 (n = 13) [33,34], and n = 32 samples collected in 2014 by passive case detection (PCD) at the health center at San Juan district in Iquitos (S1 Table).In addition, samples from Ramon Castilla, Yavari and Trompeteros districts were collected in collaboration with MINSA authorities as part of surveillance activities in remote border communities in the Loreto region.Collections in Ramon Castilla (n = 2) and Yavari (n = 20) were conducted in December 2018 through active case detection (ACD) and PCD, respectively.Samples from Trompeteros (n = 68) were collected during 3 weeks at the end of November as part of ACD visits and during April-May 2020 by PCD (Fig 2 and S1 Table

Fig 1 .
Fig 1. Map of included study sites in the Peruvian Amazon. A. Provinces in the Loreto region.Three provinces were included in this study (Loreto: orange, Maynas: green, Mariscal Ramon Castilla: blue).Loreto province, includes samples from the Trompeteros district.The remote indigenous Nueva Jerusalen community is settled there.In Mariscal Ramon Castilla province, samples were included from 3 districts containing border communities with Colombia and Brazil.The area within the red square is shown in B. B. Maynas province, with many districts with peri-urban communities.Maps were created with an in-house script in R using the R-packages used R libraries SF, ggspatial, ggrepel.The base layer of the map was obtained from: https://ide.inei.gob.pe/#geo.https://doi.org/10.1371/journal.pntd.0011879.g001

Fig 2 .
Fig 2. Overview of samples from Peru over time (year) and district.Samples were collected in 11 districts (y-axis) between 2007 and 2020 (year on x-axis).The size of the bubbles indicates the number of samples collected in a specific year and district.

Fig 3 .
Fig 3. Principal Component Analysis of P. vivax genomes dataset(23), filtered on the SNPs in the Pv Peru Barcode only (A & C), and all biallelic SNPs detected by the Pv AmpliSeq v2 Peru assay in the Peruvian Amazon samples (B&D).The samples (dots) are colored according to the originating population (regions or country), using the classifications fromAdam et al., 2022 [68].https://doi.org/10.1371/journal.pntd.0011879.g003 ). Pi was higher in 2007-2008 than in later years (adjusted p<0.001,Fig 4D), alongside a period of intensification of malaria control and reduction of cases in Peru.No temporal trends were observed in He (Fig 4B).Diversity in samples from 2018 was lower than other years, however this is likely a result of the low diversity in Yavari in 2018, which made up the majority of samples in this year.In addition, we found polyclonal infections in 5 districts, all of them at a low proportion (10-35.3%)(Fig D in S1 Text).

Fig 4 .
Fig 4. Molecular markers for transmission intensity.Violin plot of Expected heterozygosity (He) of 40-SNP barcode positions (A & B), and nucleotide diversity (pi) (C and D) measured across the targeted region in 5000 bp windows by district (A& C) and years (B & D).https://doi.org/10.1371/journal.pntd.0011879.g004 Fig F in S1 Text).High relatedness was seen mainly between samples from the same district, with a clonal population (sequences sharing IBD > 99%) in Yavari (Fig 6A).At moderate levels of IBD-sharing (50%), there was a lot of relatedness within Trompeteros samples (Fig 6B).At even lower levels of IBD (10%), samples from Yavari, Trompeteros, Mazan and San Juan Bautista become connected (i.e.high amount of sample pairs with at least 10% IBD), regardless of their geographical distance within these districts (Fig 6C).Many clusters of high IBD (50% or 90%) were found within one year, however, some sample pairs from different years with high relatedness were observed (Fig F in S1 Text).

DFig 5 .
Fig 5. Population structure and genetic differentiation.Heatmap of pairwise F ST values between 5 districts in the Peruvian Amazon estimated with the hierfstat package in R (A).Scatter plot of discriminant analysis eigenvalues 1 and 2 (LD1 and LD2) using all variants detected by the Pv AmpliSeq v2 Peru assay of Peru samples (n = 230) grouped by district (B) or district and year (C), and Maynas samples (n = 124) grouped by period (D).https://doi.org/10.1371/journal.pntd.0011879.g005

FreqFig 7 .
Fig 7.Prediction of origin of P.vivax.The origin of samples was predicted using the vivaxgen geo framework (https://geo.vivaxgen.org/),which returns 3 predictions of origin for each sample with a score for each prediction.Here we plot a heatmap of the frequency (y-axis) of the highest-scoring prediction of origin for the samples from each district of the sample collection site (x-axis).The majority of samples from Mazan, San Juan Bautista, Iquitos and Trompeteros were predicted to originate from Peru, while bordering communities Ramon Castilla, San Pablo and Yavari were predicted to originate from Brazil.