Longitudinal analysis of Plasmodium falciparum genetic variation in Turbo, Colombia: implications for malaria control and elimination

Malaria programmes estimate changes in prevalence to evaluate their efficacy. In this study, parasite genetic data was used to explore how the demography of the parasite population can inform about the processes driving variation in prevalence. In particular, how changes in treatment and population movement have affected malaria prevalence in an area with seasonal malaria. Samples of Plasmodium falciparum collected over 8 years from a population in Turbo, Colombia were genotyped at nine microsatellite loci and three drug-resistance loci. These data were analysed using several population genetic methods to detect changes in parasite genetic diversity and population structure. In addition, a coalescent-based method was used to estimate substitution rates at the microsatellite loci. The estimated mean microsatellite substitution rates varied between 5.35 × 10−3 and 3.77 × 10−2 substitutions/locus/month. Cluster analysis identified six distinct parasite clusters, five of which persisted for the full duration of the study. However, the frequencies of the clusters varied significantly between years, consistent with a small effective population size. Malaria control programmes can detect re-introductions and changes in transmission using rapidly evolving microsatellite loci. In this population, the steadily decreasing diversity and the relatively constant effective population size suggest that an increase in malaria prevalence from 2004 to 2007 was primarily driven by local rather than imported cases.


Background
During the 1970s and 1980s, malaria incidence increased in South America due to the disorganized development of cities and the emergence of drug resistance in Plasmodium falciparum [1,2]. However, during the last 10 years, transmission has decreased and many areas exhibit limited spatial connectivity between parasite populations [3]. As a result of this tangible reduction in transmission, malaria control programmes in these areas are expected to change from the control to the elimination phase.
Previous studies of P. falciparum in South America identified strong genetic structure and epidemic expansions that were the result of inbreeding and bottlenecks due to high drug pressure [4][5][6][7]. As a result, parasite populations within this region exhibit high genetic differentiation. Regardless of these discoveries, there have been few attempts to incorporate information on parasite genetic diversity into control programmes [8][9][10][11][12]. A recent example is the study by Daniels et al. in Senegal [11]. This study illustrated how genetic information obtained by using SNPs on P. falciparum samples collected over an 8 years period enriched epidemiological information [11,12]. In particular, the authors could detect genetic changes in the parasite population that correlated with field data by using epidemiological models [11].
In this paper, the relationship between parasite genetic variation and malaria epidemiology is explored in a locality that has seen substantial fluctuations in malaria prevalence. This study was conducted on samples collected over 8 years in the northwest region of Colombia, one of the most malaria-endemic countries in South America [13]. Due to economic activities, such as mining and agriculture, there is substantial migration of workers with very high exposure to malaria (in close proximity to Anopheles sp. breeding sites) into different towns. These local movements are considered important in the maintenance of malaria transmission in Colombia and other similar settings in the Americas [1,14]. Samples were genotyped at several microsatellite loci and analysed using Bayesian coalescent methodologies that have been successfully applied to other human pathogens [15][16][17]. In contrast to cross-sectional studies, genetic data from longitudinal historical samples can be used to separately estimate the effective population size (N e ) and substitution rate at each locus. In addition, other mutations associated with drug resistance in P. falciparum were explored to determine how their frequency changed after the drugs were no longer in use.  [14,18]. The samples used were those collected prior treatment [14,18]. The inclusion criteria for the participants were: (a) patients that were older than 1 year of age with fever that lasted 48 h, (b) the patients were living permanently in the study site, and (c) none had severe malaria or were pregnant. The exclusion criteria were: (a) having another associated disease requiring additional treatment, (b) having mixed Plasmodium vivax and P. falciparum infections, (c) being hypersensitive to anti-malarial drugs, (d) moving to another site different from that where the study was conducted, (e) voluntary withdrawal from the study, (f ) and third-party administration of anti-malarial drugs [19]. All patients received treatments approved by the Colombian authorities for uncomplicated malaria at the time: amodiaquine (AQ) plus sulfadoxine-pyrimethamine (SP) until 2006, artesunate (AS) plus mefloquine (MQ) in 2007, and artemether plus lumefantrine (LF) (Coartem ® ) from 2008 forward.

DNA isolation and genotyping methods
DNA was isolated using the QIAamp DNA mini kit (QIAGEN, Valencia, CA, USA) and whole genome amplification was performed using the Repli-g Mini Kit (QIA-GEN, Valencia, CA, USA). Twelve neutral microsatellites distributed across seven chromosomes were used in this study [19]. Fluorescent-labelled PCR products were separated on an Applied Biosystems 3730 capillary sequencer and scored using Gene Marker v1.95 (SoftGenetics LLC). The discovery of more than one allele in a sample was interpreted as a multiclonal infection. Missing data (no amplifications) were reported by locus but not considered for defining haplotypes.

Population genetic analyses
Genetic variation at each locus was quantified by the allelic diversity, which was estimated using the for- where n is the number of isolates sampled and p i is the frequency of the ith allele (i = 1, …, L) in the sample. The sampling variance for H E was calculated To assess the parasite population structure, the genetic data was analysed with Structure v2.1 [24], which uses a Bayesian clustering algorithm to assign isolates to a fixed number (K) of populations or clusters. The data were evaluated using different K values (K = 2-10) and ten independent chains were run for each K value with a burn-in period of 10,000 iterations followed by 50,000 iterations. The admixture model was used in all cases, allowing for the presence of individuals with ancestry in two or more of the K populations. The population was also described using median-joining networks inferred by Network 4.6.1.1 [25] on neutral and linked microsatellite data. The standardized index of association (I S A ) was also used to test for evidence of overall multilocus linkage disequilibrium in the population per year [26]. FST values were calculated using Arlequin 3.11 [27].

Bayesian phylogenetic analyses: microsatellite substitution rates and phylodynamics
Microsatellites can gain and lose repeats by two different mechanisms: unequal crossing over and DNA-replication slippage [28]. Bayesian coalescent-based method implemented in BEAST v.1.7.5 [29] were used to compare 12 models of microsatellite evolution, which make different assumptions about the relationship between the expansion and contraction rates and the repeat length [30]. In addition, the mean substitution rate at each locus in units of absolute time was estimated. The tip dates (sampling dates) were recorded in months and the coalescent constant population size model was used with a strict molecular clock and a standard uniform prior on the mean substitution rate. Once all 12 microsatellite mutation models had been run in this fashion, the model best supported by the data was identified by calculating the Bayes' factor for each model relative to a reference model and choosing the one with the largest factor.
The best supported substitution model and the corresponding substitution rate were used in an Extended Bayesian Skyline Plot (EBSP) to estimate the parasite effective population size (N e ) through time, assuming a generation time for malaria infections of a month. The EBSP assumes a non-parametric piecewise linear model of population size and uses information from multiple unlinked neutral loci to estimate the number of population size changes. Each analysis was performed with three independent runs based on 8 × 10 8 generations and samples taken every 1 × 10 6 steps after a burn-in period of 20 %. Convergence of the Markov chains was assessed by monitoring both the trace plots and effective sample sizes.

Genotyping of neutral microsatellite loci
A total of 245 out of 257 samples amplified at 11 out of the 12 loci. Nine loci were retained for further analysis due to their clear pattern on the electropherogram (single peak) that facilitated the identification of alleles (Additional file 1). Of these, five exhibited high levels of variation (Additional file 1). Although 57 multiple infections were detected, 90 % of these were variable at only one locus suggesting that these patients were infected by highly related parasites (Table 1). A total of 79 different multi-locus genotypes were identified and analysed (Additional file 2), those included four genotypes that were found in at least six consecutive years following 2003 (Additional file 2). However, most pairwise fixation indices calculated between samples collected in different years were significantly different from zero (Table 2), indicating substantial changes in the genetic composition of the parasite population between years. Genetic diversity (H E = 0.35 ± 0.11, Table 1 Table 2.

Genotyping of Pfcrt, Pfdhfr and Pfdhps
Most of the samples were double mutants (75E and 76T) for Pfcrt. There was also one triple mutant (74I, 75E and

Cluster analysis
Six clusters were identified, most of which persisted for several years ( Fig. 1a; Table 3). While all six clusters were found in samples from 2002 to 2003, only five were detected in samples from 2007 to 2009, and a clear expansion of cluster D was observed in 2005. In contrast, cluster E was not detected in 2007, 2008 or 2009, and may have suffered a random local extinction during a period when the parasite population size was reduced following the introduction of AQ. The software Network version 4.6.1.1 was used to characterize the relationship between multilocus genotypes (Fig. 1b) and those clusters that persisted for several years. Genotypes assigned to cluster F were highly dispersed in the network without a clear grouping pattern. One sample was distantly related to the rest of the network and had a sensitive genotype for Pfcrt, Pfdhps and Pfdhfr, which is present in Central America. Additionally, haplotype networks were constructed using the microsatellite loci flanking the drug resistance genes. The haplotypes associated with the Pfdhps mutant were grouped together, while the haplotypes associated with the wild type Pfdhps genotype were highly dispersed throughout the network (Fig. 2). This pattern was not observed in Pfcrt and Pfdhfr where the resistant genotypes were fixed.

Time estimation and mutation rates
A total of 12 different microsatellite mutation models implemented in BEAST v1.7.5 were tested and the PL1 was the best supported model at each of the nine neutral loci included in this study (Table 4). Under this model, the substitution rate is linearly proportional to microsatellite length (P), the probability of microsatellite contraction is different from expansion but dependent on allele length (L), and each mutation changes the length of the microsatellite by exactly one repeat (1). Consequently, PL1 was used in all subsequent analyses conducted using BEAST. The mean substitution rate was estimated at each of these loci, which varied between 5.35 × 10 −3 and 3.77 × 10 −2 substitutions/month for the six most variable loci (mean of the posterior distribution; Table 5). Surprisingly, somewhat higher rates were estimated at three loci that harbored little variation, but the 95 % highest probability density (HPD) intervals were also much broader at these loci than at the other six, suggesting low information content in the data and greater influence of the prior distribution on the substitution rate.
Examination of the extended Bayesian skyline plot (EBSP; Fig. 3) reveals that there is little correspondence between the estimated N e (assuming a generation time of a month) and the number of monthly cases reported from 2002 to 2009 to the Antioquia Sectional Health Directorate (Dirección Seccional de Salud de Antioquia), Colombia. Neither the short-term fluctuations nor the longer-term trends seen in the monthly case data are evident in the EBSP. Furthermore, while the estimated N e is several times smaller than the number of monthly case reports in most months between January 2002 and January 2007, the opposite pattern is seen in the period 2007 to 2009 when there was a pronounced decrease in the number of reported cases but also slight increase in the estimated effective population size.

Discussion
The prevalence of P. falciparum malaria in Turbo, Colombia fluctuated dramatically between 2003 and To better understand the processes influencing P. falciparum malaria in this region, this study involved the characterization of this parasite population using a combination of neutral microsatellite markers and loci linked to known drug resistance mutations at three loci. Because these two classes of loci are expected to differ in both their mutation rates and exposure to selection, they have the potential to reveal different features of malaria epidemiology.
The analysis of the microsatellite data revealed the existence of several related multilocus genotypes that persisted for at least 6 years in the region around Turbo ( Fig. 1; Additional file 2). This is the pattern expected in a parasite population where control efforts have left behind lineages that became the founders of clusters of highly related parasites [7,10]. This observation differs from a previous study of a Plasmodium population in Peru which found that multilocus genotypes were rapidly  In low-transmission areas, meiotic recombination is most likely to occur between lineages that are closely related, leading to high levels of inbreeding. This is consistent with the fact that many of the putatively multiple infections identified in our sample vary at a single locus, possibly due to mitotic mutations that have occurred within the infected individual. Such mitotic events can lead to an overestimation of the number of observed multiple infections [8]. Inbreeding can also account for the significant levels of multilocus linkage disequilibrium documented in every year except 2002 (see "Results"). On the other hand, the microsatellite data suggests that there has been some recombination among the different clusters segregating in Turbo, as is evidenced by the admixture patterns seen in the cluster analysis (Fig. 1).
Although several multilocus haplotypes were observed to persist for multiple years, the cluster analysis revealed pronounced changes in the genetic composition of the P. falciparum population in Turbo from year to year, with different clusters dominant in different years. This observation is confirmed by the pairwise fixation indices shown in Table 2, which show significant genetic differentiation between every pair of years except between 2009 (when only 14 individuals were sampled) and the Fig. 2 Median joining haplotype network using Pfdhps-linked microsatellites. The haplotypes are represented by circles with width being proportional to their frequencies. Haplotypes in the red dashed circle are single mutants grouped in colour clusters according to their genotypes (using Structure). Haplotypes in green are wild type genotypes preceding 2 years. Rapid non-directional changes in the genetic composition of a population, such as observed here, can be explained by demographic stochasticity (i.e., genetic drift), which is expected to be pronounced in small populations. This is consistent with the low effective population size revealed by the EBSP (Fig. 3).
There is a lack of a correspondence between the estimated Ne displayed by the EBSP (using the posterior median of the EBSP) and the monthly case reports. During 2003 to 2007, N e remained relatively unchanged even though there were substantial fluctuations in prevalence. Furthermore, between 2007 and 2009 there was a gradual rise in N e even though the number of monthly case reports rapidly decreased following 2008. The reduction in cases coincides with an intense malaria control programme known as "Papa Luis" carried out in Antioquia between 2007 and 2009. This programme consisted of massive administration of chloroquine (an effective drug to treat P. vivax but not P. falciparum), biological control of Anopheles larvae, swamp drainage and indoor residual spraying [32]. During this period, the introduction of artemisinin-based combination therapy (ACT) is also thought to have contributed to a reduction in malaria cases.
The discrepancy between the epidemiological and genetic estimates could be explained by several factors. For parasites, N e depends not only on prevalence, but also on the transmission rate and the variance in the number of new cases transmitted by infected individuals [9,10,33,34]. In particular, a fixed parasite generation time of 1 month was assumed for the study, but the time between transmission events could vary affecting the parasite generation time. Similarly, the number of new cases generated by infected individuals can vary over time due to changes in social dynamics or public health policy and this too will affect N e . The observation that during 2003 to 2007, N e remained relatively unchanged even though there were substantial fluctuations in prevalence suggests that seasonal clonal expansions led to a high prevalence at some time points. Indeed, frequent clonal expansions could even reduce N e if they were accompanied by an increase in either the transmission rate or the transmission variance between infected individuals.
Another factor that may have contributed to the mismatch between the estimated values of Ne and the case report numbers is the variation in the sample sizes collected for genetic analysis in different years. In particular, the apparent rise in the estimated N e from 2007 to 2009 may be due to the smaller number of samples collected in these years compared with the period 2003 to 2006. One consequence of the variable sample sizes is that the posterior distribution of the effective population size is more strongly influenced by the prior distribution in those periods when the data are less informative. Indeed, this is evident in the increasing credibility intervals for N e between 2007 and 2009. Because of this sensitivity to sample sizes, archiving samples for genotyping should be a standard policy since those will facilitate understanding potential re-introductions or changes in prevalence.  The genetic data from samples collected over a span of 8 years also allowed to estimate microsatellite substitution rates separately from N e . At the six polymorphic loci with enough information, estimates ranged from 5.35 × 10 −3 to 3.77 × 10 −2 substitutions per locus per month (Table 5). In contrast, Su et al. estimated a genome-wide average microsatellite mutation rate of 1.59 × 10 −4 per locus per meiosis from a genetic cross that was typed at 901 loci [35]. Notably, this rate falls below the 95 % HPD interval of five of the six loci that were investigated. The large discrepancy between these estimates could be due to two differences between the studies. First, whereas Su et al. only considered changes arising during meiosis, the estimates reported here reflect the accumulation of mutations over monthlong periods during which the parasite is likely to have undergone multiple mitoses and at most one or two meioses. For the purposes of demographic/epidemiologic inferences, this composite rate is more relevant than the meiotic mutation rate. Secondly, locus-specific variation in microsatellite substitution rates could also explain this discrepancy. Since variable loci were chosen, those may be evolving more rapidly than the genome-wide pool of loci analysed by Su et al.
It should be noted that the mean substitution rates reported in this paper were obtained using the model best supported by our data, the single-phase proportional linear model (PL1). There was no support for any of the two-phase models of microsatellite evolution implemented in BEAST in which alleles may expand or contract by more than one repeat in a single step, although this too could be specific to the loci analysed in this study. The PL1 model assumes that the overall substitution rate is proportional to the repeat length, that each mutation results in the gain or loss of a single repeat, and that the probability of an expansion is a decreasing linear function of the repeat length, so that above a certain threshold, mutations are more likely to result in a contraction rather than in an expansion of the microsatellite [30]. As such, the rates reported in Table 5 are estimates of the mean substitution rate with respect to the stationary distribution of the microsatellite length under this model. These estimates provide empirical evidence that polymorphic microsatellite loci can be used to make inferences at time scales that are epidemiologically relevant [36].
Lastly, although drug resistance mutations can decrease in frequency once the corresponding drugs are no longer in use [37,38], there are also endemic areas where such mutations have gone to fixation from intermediate frequencies even after drug pressure has been removed [4,5,20]. This study provides evidence of this process. Moreover, the data show that drug resistance mutations were fixed in a parasite population with a low effective population size and then remained at high frequency even after the drug policy was changed. Thus, anti-malarial drugs will remain ineffective unless there is an influx of sensitive parasites from other areas (e.g., Central America). The finding of a sensitive and genetically unrelated parasite demonstrates that such re-introductions are possible. It is worth noting that sensitive Pfdhps genotypes were still segregating in the population.

Conclusion
This study provided an opportunity to observe changes in the genetic composition of a local Plasmodium population over many parasite generations. Several multilocus genotypes were found to be stable for many years, a dynamic that should be considered when genotyping is used to separate local from re-introduced cases in areas with seasonal or low malaria transmission. The estimated substitution rates reported in this study support previous claims that polymorphic microsatellite loci evolve on a timescale that facilitates epidemiologic investigations. Regardless of spikes in prevalence followed by the scale-up of interventions and/or changes in drug policy, the parasite effective population size remained small and relatively unchanged for many years. This suggests that the observed fluctuations in the number of cases were likely the result of local processes (e.g. operational or environmental factors) that affected malaria transmission. Such processes will be better understood by incorporating genetic data into epidemiological models [11]. Furthermore, mutations associated with drug resistance remained fixed in the population demonstrating that, without the influx of sensitive migrants, drugs will likely remain ineffective in endemic areas with similar epidemiologic characteristics such as the one studied here. Although several factors could affect the number of malaria cases, variation in the genetic composition of Plasmodium populations provides additional information that can facilitate the understanding of such changes and, by so doing, support science-based decision making process regarding malaria control policies.