Population Genetic Analysis of Plasmodium falciparum Circumsporozoite Protein in Two Distinct Ecological Regions in Ghana

Extensive genetic diversity in the Plasmodium falciparum circumsporozoite protein (PfCSP) is a major contributing factor to the moderate efficacy of the RTS,S/AS01 vaccine. Transmission intensity and the rates of recombination within and between populations influence the extent of genetic diversity. Understanding the extent and dynamics of PfCSP genetic diversity in different transmission settings will help to interpret the results of current RTS,S efficacy and Phase IV implementation trials conducted within and between populations in malaria endemic areas such as Ghana.

variations ranging between 1 and 6 were observed in the TH2R and TH3R epitope regions of the PfCSP. Tajima's D was negatively skewed especially for the population from Cape Coast given expected historical population expansion. On the contrary, positive Tajima's D was observed for the Navrongo P. falciparum population, consistent with balancing selection acting on the immuno-dominant TH2R and TH3R vaccine epitopes.

Conclusion
The low frequencies of the Pfcsp vaccine haplotype in the populations analyzed call for additional molecular and immuno-epidemiological studies with temporal and wider geographic sampling in endemic populations targeted for RTS,S application. These results have implications for the efficacy of the vaccine in Ghana and will inform the choice of alleles to include in future multivalent or chimeric vaccines.

Background
Stagnation in the decline of malaria over the last 5 years indicates that the set global malaria elimination targets may not be achieved without the addition of a broadly effective vaccine to complement the panel of available malaria control tools [1]. However, it has taken over 15 years to finally license a moderately efficacious malaria vaccine for implementation, owing to the extreme levels of antigenic diversity of most vaccine candidates, which reduce their efficacy across a broad range of evolving natural parasite populations. Efficacy data from a Phase III clinical trial conducted across 11 sites in 7 African countries in children (aged 5 to 17 months) and infants (aged 6-12 weeks) revealed that the vaccine confers moderate protective efficacy against clinical disease and severe malaria, which waned over time [2]. The vaccine conferred only 36.3% protection against clinical malaria and 32.2% against severe malaria in children aged 5-17 months who received 3 primary doses of RTS,S with a booster at the 20th month [2]. The European Medicines Agency gave a favorable scientific opinion in 2015, indicating that the benefits of protective immunity outweigh the risk and noting the potentially high impact this moderate efficacy could make, given the huge disease burden [3]. Subsequently, Ghana, Kenya and Malawi were selected and are currently conducting the pilot Phase IV implementation trials via the Malaria Vaccine Implementation Programme (MVIP) led by WHO [4].
The RTS,S vaccine is a malaria subunit vaccine that is formulated from a fragment of the circumsporozoite protein (CSP) of the P. falciparum 3D7 laboratory strain, fused with the Hepatitis B surface antigen and combined with the AS01 adjuvant [5]. For cell mediated immunity, RTS,S includes a fragment of the central NANP-NVDP repeat polymorphic B-cell epitope region and a highly polymorphic C-terminal non-repeat epitope region of PfCSP, which covers CD4+ and CD8+ T-cell epitopes denoted as TH2R and TH3R respectively [6]. Several studies have reported high levels of polymorphisms at the T-cell epitopes within the C-terminal region of PfCSP in natural parasite populations [7][8][9][10]. Although there are variations in the immuno-dominant central repeat region (CRR), it was hoped that antibodies targeting a single dominant epitope based on the tetrapeptide repeat NANP would provide strain-surpassing immunity. This hope was strengthened by the findings of a molecular epidemiology study in African children that showed no evidence of naturally acquired strain-specific immunity to different variants of CSP obtained using the 454 sequencing platform [8]. In addition, initial ancillary studies on Phase II clinical trials conducted in three sites, The Gambia, Kenya and Mozambique, revealed that immune protection of the RTS,S/AS02 vaccine was not strain-specific even after vaccination [11][12][13][14]. However, these studies were based on only a few hundred isolates and were not statistically powered to detect moderate effects such as the strain-specific immune response of the vaccine.
Subsequently, an ancillary next generation deep sequencing analysis of Phase III trial samples in 2015 showed that the vaccine indeed conferred partial protection against clinical malaria for strain-specific vaccine alleles (50.3%) and poor protection against mismatch strains (33.3%) [15]. Also, recent studies of the population structure of Pfcsp suggest that geographically variable levels of diversity and geographic restriction of specific subgroups may have an impact on the efficacy of Pfcsp-based malaria vaccines in specific geographic regions [7,16]. In particular, extreme global genetic diversity of Pfcsp strains has been reported, with the 3D7 vaccine strain found at a frequency of only approximately 5.0% and 0.2% in some African and Asian countries respectively [16].
The need to explore the extent of genetic diversity and the natural dynamics of malaria vaccine antigens in endemic areas where vaccines will be deployed is a point of focus due to the polymorphic nature of P. falciparum antigens [15,17,18]. Furthermore, evolutionary factors such as selection operating on parasites differ locally owing to varying transmission patterns, ecology and degrees of acquired immunity in humans [19]. Therefore, further characterization of the genetic diversity of immune epitopes of vaccine antigens is important, especially in regions like Ghana where the vaccine is undergoing the Phase IV implementation trial. This should provide a broader assessment of the extent to which the local natural diversity could impact its efficacy and wider implementation.
Malaria transmission in Ghana is generally perennial but with marked seasonal effects that vary with local ecology and overall transmission intensity [20]. For control purposes, malaria transmission across Ghana has been classified eco-epidemiologically into three main zones: forest ecology with perennial but high transmission during the rainy season (May-August and October-November); northern/Guinea savannah with seasonal and intense malaria transmission during the rainy season (June-October), but with periods of very low transmission during the dry season; and coastal savannah with low to moderate perennial transmission and a marked seasonal effect during the rainy season [21]. The implementation trial of the RTS,S vaccine is being conducted in three regions, namely the Brong-Ahafo and Volta regions in the forest ecology and the Central region on the coastal belt, with varying transmission levels. Understanding the extent and drivers of diversity in these regions could also have a profound impact on improving the design of future circumsporozoite protein-based vaccines. Using paired-end short-read sequences of Pfcsp in parasite populations from two geographically distinct sites in Ghana, we evaluated within-host diversity (complexity of infection) and the extent of population-specific haplotype diversity of the C-terminal region of Pfcsp encompassing the TH2R and TH3R epitopes. Our results provide information on diversity most relevant to vaccine escape and cross-protection. We further explored PfCSP amino acid diversity and conservation. In addition, we assessed evidence of selection in Pfcsp that could be driving and sustaining the diversity observed.

Study area
The study was conducted in two sites, the Cape Coast Metropolitan area with Cape Coast as the main township and Kassena-Nankana districts (KNDs) with Navrongo as the main township (Fig 1). Cape Coast is located in the Southern coastal savannah region with low to moderate perennial malaria transmission but with marked seasonal effect during the rainy season (May-August and October-November). The estimated annual entomological inoculation rate (EIR) is fewer than 50 infective bites per person per year [21]. The KNDs are located in the Upper East Region of Ghana with a Guinea savannah vegetation. Malaria is perennial in the KNDs with high seasonal malaria transmission during the rainy season (June to October), and minimal transmission during the rest of the year, which are relatively dry months. The estimated annual EIR for the KNDs is up to 157 infective bites per person per year [22].
In Cape Coast, P. falciparum parasites were isolated from 101 children (aged 6-59 months) who lived within the municipality and presented with clinical malaria at the Cape Coast District hospital. Samples were collected during the major rainy season (May-August) in 2013. In Navrongo, P. falciparum parasites were isolated from 131 children aged 12-59 months who lived in the KNDs and also presented with P. falciparum clinical malaria at health facilities in the KNDs in the years 2010 (January to October), 2011 (January to February) and 2013 (August to October), during both dry and wet seasons. For both study sites, children presenting with fever, i.e. axillary temperature ≥ 37.5 °C or a history of fever during the previous 24 hours, were screened with a malaria Rapid Diagnostic Test (RDT). Blood smears were prepared for RDT positive individuals and P. falciparum asexual parasites were determined by microscopy. Venous blood (2-5 mL) was collected and archived from P. falciparum infected patients who gave consent.
Genomic DNA extraction and sequencing
Genomic DNA was extracted using the QiaAmp DNA prep kit (Qiagen, Valencia, CA) following the manufacturer's protocol, and confirmation of P. falciparum positive samples was done by amplification using nested PCR with specific primers [23] (see Additional file 1). The genomic DNA was submitted to the Wellcome Trust Sanger Institute, Hinxton, UK, for whole genome sequencing using the Illumina HiSeq platform as part of the MalariaGEN community project. Illumina sequencing libraries (200 bp insert) were aligned to the reference P. falciparum 3D7 genome, after which variant calling was done following the customized GATK pipeline. Each sample was genotyped for 797,000 polymorphic bi-allelic coding SNPs across the genome, ensuring a minimum of 5x paired-end coverage across each variant per sample. The dominant allele was retained in the genotype file at loci with mixed reads (reference/non-reference).
The genotypes were assigned denoting the reference and non-reference nucleotides as 0 and 1 respectively. Polymorphic sites with low call rates and those in hypervariable, telomeric and repetitive sequence regions were excluded.
Sequence acquisition and pre-processing
Genome sequences from Navrongo and Cape Coast were mined from the MalariaGEN Plasmodium falciparum Community (Pf3k) Project release 5.1 Database [24] in variant call format (VCF). Genetic variants on chromosome 3 were retrieved for both Navrongo and Cape Coast. For the VCF file of Cape Coast, all genotypes at each SNP position were mono-allelic (monoclonal); we modeled biallelic genotypes using a custom-made Python script. This was based on the approach by the MalariaGEN Pf3k Project, where loci with mixed allele calls were modeled using the read and allelic depth [25]. Briefly, to account for PCR errors, genotypes of SNPs with read depth < 5 were not determined. At SNP positions with read depth ≥ 5, the sample was genotyped as heterozygous if the allelic depth of both alleles was ≥ 2. The remaining genotypes were called as either homozygous for the reference allele or homozygous for the alternative/derived allele.
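The depth-based genotype rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function name `call_genotype` is hypothetical, total read depth is assumed to be the sum of the reference and alternative allelic depths, and homozygous calls are assumed to go to the allele with more supporting reads.

```python
def call_genotype(ref_depth, alt_depth, min_depth=5, min_allele_depth=2):
    """Call one biallelic SNP from allelic read depths.

    Returns None (not determined), "0/0", "0/1" or "1/1".
    Assumption: total depth = ref_depth + alt_depth.
    """
    total_depth = ref_depth + alt_depth
    if total_depth < min_depth:
        # Too few reads: genotype is left undetermined.
        return None
    if ref_depth >= min_allele_depth and alt_depth >= min_allele_depth:
        # Both alleles are supported by at least two reads: mixed call.
        return "0/1"
    # Assumption: the better-supported allele gives the homozygous call.
    return "0/0" if ref_depth > alt_depth else "1/1"


if __name__ == "__main__":
    for depths in [(3, 1), (10, 0), (1, 9), (4, 3)]:
        print(depths, call_genotype(*depths))
```

With these thresholds a tie between the two alleles cannot reach the final branch, because a non-heterozygous site with depth of at least 5 always has one allele supported by at most one read.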
Data for both populations were filtered to obtain only biallelic SNPs using Bcftools v1.9 [26] and quality checked as follows: only SNPs that passed all VCF filters were retained. Isolates with > 10% missing SNPs were excluded, followed by the removal of SNPs with > 5% missing data using PLINK v1.9 [27].
Further, heterozygosity was calculated and 8 isolates with outlier heterozygosity within the Cape Coast population were excluded. No outlier heterozygosity was observed in Navrongo. SNPs with a minor allele frequency (MAF) < 1% were removed. The remaining missing SNPs were imputed and phased using Beagle v5.1 [28].
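The two missingness filters (isolates first, then SNPs) can be illustrated with a short Python sketch. It only mirrors the order of operations described above and is not a substitute for PLINK; the function name and the dictionary layout of the genotype matrix are assumptions made for the example.

```python
def filter_missingness(calls, max_sample_missing=0.10, max_snp_missing=0.05):
    """Filter a genotype matrix by missingness.

    calls maps a sample name to a list of genotype calls, one per SNP,
    with None marking a missing call (hypothetical layout).
    Returns the filtered matrix and the indices of the SNPs that were kept.
    """
    def missing_fraction(values):
        return sum(v is None for v in values) / len(values)

    # Step 1: drop isolates with too many missing SNP calls.
    kept_samples = {name: g for name, g in calls.items()
                    if missing_fraction(g) <= max_sample_missing}
    if not kept_samples:
        return {}, []

    # Step 2: among the remaining isolates, drop SNPs with too much missing data.
    n_snps = len(next(iter(kept_samples.values())))
    kept_snps = [i for i in range(n_snps)
                 if missing_fraction([g[i] for g in kept_samples.values()])
                 <= max_snp_missing]

    filtered = {name: [g[i] for i in kept_snps]
                for name, g in kept_samples.items()}
    return filtered, kept_snps
```

Applying the isolate filter first matters: a SNP that looks poorly covered only because of a few low-quality isolates is retained once those isolates are removed.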

Population genetics analysis
Minor allele frequency distribution
Prior to the removal of rare alleles (MAF ≤ 0.01), the minor allele frequency distribution for all putative SNPs (n = 90) within the Pfcsp for both Cape Coast (n = 35 SNPs) and Navrongo (n = 55 SNPs) P. falciparum isolates was determined using PLINK v1.9. MAF is the frequency at which the second most common allele occurs at a given SNP position in a population.
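For a biallelic SNP coded 0 (reference) and 1 (non-reference), the MAF definition above reduces to a few lines of Python. This is an illustrative sketch only; the function names are hypothetical, and the bin labels follow the frequency classes used in this paper (rare: MAF ≤ 0.01; low frequency: 0.01 < MAF ≤ 0.05).

```python
def minor_allele_frequency(alleles):
    """Return the MAF of one biallelic SNP.

    alleles is a list of 0/1 allele calls, with None for missing calls.
    """
    called = [a for a in alleles if a is not None]
    if not called:
        return None
    alt_frequency = sum(called) / len(called)
    # The minor allele is whichever of the two alleles is less common.
    return min(alt_frequency, 1.0 - alt_frequency)


def frequency_class(maf):
    """Assign a MAF value to the frequency classes used in the text."""
    if maf <= 0.01:
        return "rare"
    if maf <= 0.05:
        return "low frequency"
    return "moderate to high"
```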

Within host parasite diversity estimation and statistical analysis
The genetic diversity within the individuals was assessed by estimating Wright's inbreeding coefficient (Fws). For this analysis, we were primarily interested in the within host diversity of Pfcsp, which refers to the number of different Pfcsp strains contained within an individual infection. The retained variants (13 and 22 SNPs) from the 92 Cape Coast isolates and 128 Navrongo isolates respectively were used for this analysis.
The Fws metric estimates the heterozygosity of parasites ($H_W$) within the individual relative to the heterozygosity within the parasite population ($H_S$) using the read count of alleles. The Fws metric for each sample was calculated using the equation:
$$F_{WS} = 1 - \frac{H_W}{H_S}$$
where $H_W$ is the heterozygosity derived from the allele frequency of each unique allele found at specific loci of the parasite sequences within the individual and $H_S$ is the heterozygosity derived from the corresponding allele frequencies of those unique alleles within the population [29,30]. Fws ranges from 0 to 1; a low Fws value indicates low inbreeding rates within the parasite population and thus high within host diversity relative to the population. An Fws threshold ≥ 0.95 indicates samples with clonal (single strain) infections, while samples with Fws < 0.95 are considered highly likely to have mixed strain infections, signifying within host diversity. Fws was calculated using the R package moimix [31]. Samples with clonal infections were used for selection analysis. The Pearson Chi-square test was used to measure the statistical significance of any differences observed in the within host diversity estimates between the population pair. The test was done using R software, with P values < 0.05 considered statistically significant.
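As a rough illustration of how such a within-host metric can be computed from read counts, the Python sketch below assumes the commonly used form Fws = 1 - Hw/Hs, with heterozygosity at a biallelic SNP taken as 1 - (p² + q²) and averaged over SNPs. The function names and input layout are hypothetical, and this simplified ratio of means is not the procedure implemented in moimix.

```python
def heterozygosity(ref_frequency):
    """Expected heterozygosity at a biallelic SNP: 1 - (p^2 + q^2)."""
    p = ref_frequency
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)


def fws(sample_read_counts, population_ref_frequencies):
    """Estimate Fws for one sample.

    sample_read_counts: list of (ref_reads, alt_reads) per SNP.
    population_ref_frequencies: reference allele frequency per SNP in the
    population, in the same SNP order (assumed to be precomputed).
    """
    hw_values, hs_values = [], []
    for (ref, alt), pop_p in zip(sample_read_counts, population_ref_frequencies):
        depth = ref + alt
        if depth == 0:
            continue  # no reads at this SNP for this sample
        hw_values.append(heterozygosity(ref / depth))
        hs_values.append(heterozygosity(pop_p))
    if not hs_values:
        return None
    mean_hs = sum(hs_values) / len(hs_values)
    if mean_hs == 0:
        return None  # no variation in the population at these SNPs
    mean_hw = sum(hw_values) / len(hw_values)
    return 1.0 - mean_hw / mean_hs


def is_clonal(fws_value, threshold=0.95):
    """Apply the Fws >= 0.95 rule used in the text for clonal infections."""
    return fws_value >= threshold
```

A sample whose reads all support a single allele at every SNP has Hw = 0 and therefore Fws = 1, while a sample whose within-host allele frequencies match the population frequencies has Fws = 0.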

Genetic diversity within parasite populations
We examined the haplotype diversity (the probability that two randomly chosen strains within the population have different haplotypes) of the Pfcsp in each population by exploring the variants in the C-terminal region of the gene (909-1140 bp). We reconstructed 184 Pfcsp FASTA DNA sequences with the retained variants (13 SNPs) from the 92 Cape Coast isolates and 256 DNA sequences (22 SNPs) from the 128 Navrongo isolates using an in-house Python script.
The following metrics were then used to assess the diversity of Pfcsp C-terminal within each parasite population using the DnaSP software (version 6.10.01) [32]: number of sequences (n), number of haplotypes (h), segregating sites (S), the average number of pairwise nucleotide differences (K), nucleotide diversity (π) and haplotype diversity (Hd) [33,34].
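The indices listed above have simple textbook definitions, which the following Python sketch computes for a set of aligned, equal-length sequences. It is meant only to make the definitions concrete and is not a reimplementation of DnaSP; the function name is hypothetical, gaps and ambiguous bases are not handled, and π is taken as K divided by the alignment length.

```python
from collections import Counter
from itertools import combinations


def diversity_indices(sequences):
    """Compute n, h, S, K, pi and Hd for aligned sequences of equal length."""
    n = len(sequences)
    length = len(sequences[0])

    # S: alignment columns with more than one nucleotide.
    segregating = sum(1 for column in zip(*sequences) if len(set(column)) > 1)

    # K: average number of nucleotide differences over all sequence pairs.
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    k = total_differences / len(pairs)

    # Hd: n/(n-1) * (1 - sum of squared haplotype frequencies).
    haplotype_counts = Counter(sequences)
    sum_sq = sum((count / n) ** 2 for count in haplotype_counts.values())
    hd = n / (n - 1) * (1.0 - sum_sq)

    return {
        "n": n,
        "h": len(haplotype_counts),
        "S": segregating,
        "K": k,
        "pi": k / length,
        "Hd": hd,
    }


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "AAAA"]  # toy data, not study sequences
    print(diversity_indices(toy))
```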
To assess the genealogical relationships between the Pfcsp C-terminal haplotypes found in Navrongo and Cape Coast, we constructed a network based on the method described by Templeton, Crandall, and Sing (TCS) [35,36] using PopArt [37]. The haplotypes were denoted as 3D7, Hap 2 up to Hap 66 in the network.
We further explored the amino acid haplotypes within each population by translating all the 440 Pfcsp DNA sequences (Cape Coast (184) and Navrongo (256)) into amino acid sequences and comparing them to the 3D7 reference strain (0304600.1, PlasmoDB [38]) using in-house Python scripts. The frequencies of the TH2R (amino acids 311-327; PSDKHIKEYLNKIQNSL) and TH3R (amino acids 352-363; NKPKDELDYAND) haplotypes in each parasite population were also determined using a customized Python script and plotted.
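A minimal Python sketch of the epitope haplotype tally is given below. It is not the authors' script: the function names are hypothetical, the input is assumed to be already translated protein sequences numbered as in the 3D7 reference, and the two vaccine haplotype strings are those quoted in the text.

```python
from collections import Counter

TH2R_3D7 = "PSDKHIKEYLNKIQNSL"  # residues 311-327 of the 3D7 reference
TH3R_3D7 = "NKPKDELDYAND"       # residues 352-363 of the 3D7 reference


def epitope_haplotype_frequencies(protein_sequences, start, end):
    """Frequency of each haplotype in residues start..end (1-based, inclusive)."""
    haplotypes = Counter(seq[start - 1:end] for seq in protein_sequences)
    total = sum(haplotypes.values())
    return {hap: count / total for hap, count in haplotypes.items()}


def differences_from_vaccine(haplotype, vaccine_haplotype):
    """Number of amino acid positions that differ from the vaccine haplotype."""
    return sum(a != b for a, b in zip(haplotype, vaccine_haplotype))


if __name__ == "__main__":
    # Toy input: epitope-only strings, so the epitope starts at position 1.
    toy = [TH3R_3D7, TH3R_3D7, "NKPKDELNYAND", TH3R_3D7]
    print(epitope_haplotype_frequencies(toy, 1, len(TH3R_3D7)))
    print(differences_from_vaccine("NKPKDELNYAND", TH3R_3D7))
```

For full-length PfCSP sequences the same call would use `start=311, end=327` for TH2R and `start=352, end=363` for TH3R.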
Population differentiation and structure analysis
Wright's fixation index (Fst) and principal component analysis (PCA) were used for population differentiation and structure analyses. To reduce bias in the Fst and PCA analyses, we pruned out SNPs (from the 2504 Cape Coast chromosome 3 retained SNPs and the 1954 Navrongo retained SNPs) with a pairwise linkage disequilibrium (LD) value of r² > 0.5 within a window of 100bp in the entire chromosome 3 dataset using a step size of 10. After pruning, 516 chromosome 3 SNPs shared between the populations remained, of which 10 were Pfcsp SNPs.
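To make the pruning criterion concrete, the Python sketch below computes pairwise r² between two biallelic SNPs from phased 0/1 haplotypes and applies a simple greedy prune at r² > 0.5. It is an illustration under stated assumptions: the function names are hypothetical, and the greedy pass compares every SNP with all SNPs already kept rather than reproducing the sliding-window behaviour of the software used in the study.

```python
def r_squared(snp_a, snp_b):
    """Pairwise LD (r^2) between two SNPs given phased 0/1 alleles per haplotype."""
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return None  # at least one SNP is monomorphic
    return (p_ab - p_a * p_b) ** 2 / denominator


def greedy_ld_prune(snps, threshold=0.5):
    """Keep a SNP only if its r^2 with every SNP already kept is <= threshold.

    snps maps a SNP identifier to its list of 0/1 alleles (hypothetical layout).
    """
    kept = []
    for name, alleles in snps.items():
        in_ld = False
        for kept_name in kept:
            r2 = r_squared(alleles, snps[kept_name])
            if r2 is not None and r2 > threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(name)
    return kept
```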
The Pfcsp SNPs were then used to estimate Fst and population structure. Weir and Cockerham's Fst per SNP between Cape Coast and Navrongo parasite isolates was calculated using Vcftools v0.1.5 [39], and population structure by PCA was assessed using smartpca (Cambridge, MA, USA) in the EIGENSOFT package v6.1.3 [40]. Principal components were computed with the number of outlier removal iterations set at 10 while maintaining other parameters at their defaults. In all, 10 PCs were computed, with 9 and 5 outlier samples removed from the 92 and 128 isolates from Cape Coast and Navrongo respectively. Thus, 83 samples remained in the Cape Coast population and 123 samples in the Navrongo population after outlier samples were removed.
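For readers who want a feel for a per-SNP Fst calculation, the sketch below uses Hudson's estimator, which is simpler than the Weir and Cockerham estimator used in this study and is swapped in here purely for illustration. The function name is hypothetical; the inputs are the frequency of one allele and the number of sampled haplotypes in each population.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson's per-SNP Fst estimator (not Weir and Cockerham's).

    p1, p2: frequency of the same allele in populations 1 and 2.
    n1, n2: number of haplotypes sampled in each population (each > 1).
    """
    numerator = ((p1 - p2) ** 2
                 - p1 * (1 - p1) / (n1 - 1)
                 - p2 * (1 - p2) / (n2 - 1))
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return None  # the SNP is fixed for the same allele in both populations
    return numerator / denominator
```

Identical allele frequencies give a value at or slightly below zero (the sampling correction can make the estimate negative), while a SNP fixed for different alleles in the two populations gives 1.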

Signatures of selection
To test for departures from neutrality, Tajima's D [41] was calculated in sliding windows of 100 bp with a step size of 10, using Pfcsp monoclonal samples from each population, in Vcftools v0.1.5. Tajima's D compares the average number of pairwise differences (π) with the total number of segregating sites (S). Negative values indicate directional or purifying selection (or population expansion), while positive values indicate balancing selection.
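The statistic itself follows directly from the number of sequences, the number of segregating sites and the average number of pairwise differences. The Python sketch below implements the standard formula for a single window; the function name is hypothetical, and sliding-window bookkeeping as done by Vcftools is not reproduced.

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_differences):
    """Tajima's D for n sequences, S segregating sites and mean pairwise differences."""
    s = segregating_sites
    if n < 3 or s == 0:
        return None  # the statistic is undefined for these inputs

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    # Difference between the two estimators of diversity, scaled by its SD.
    return (mean_pairwise_differences - s / a1) / math.sqrt(variance)
```

When the pairwise-difference estimate is smaller than S/a1 (an excess of rare variants) the value is negative, and when it is larger (an excess of intermediate-frequency variants) the value is positive.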
To detect loci likely to be under recent positive selection in the Cape Coast and Navrongo monoclonal chromosome 3 isolates, we calculated the standardized integrated haplotype score (|iHS|) for each SNP with a MAF > 0.05 in chromosome 3 (358 out of the 2504 and 608 out of the 1954 remaining SNPs from Cape Coast and Navrongo respectively) [42]. Again, for the purpose of this analysis, the Fws metric was used to identify these monoclonal isolates from the retained variants within the chromosome 3 region (2504 SNPs in Cape Coast and 1954 SNPs in Navrongo).
|iHS| measures the amount of extended haplotype homozygosity (EHH) at a given SNP along the ancestral allele relative to the derived allele [42]. The reference and alternate alleles were characterized as ancestral and derived alleles respectively. This was done in R using the rehh package v2.0.4 [43]. Genomic regions under positive selection were identified as those with multiple SNPs having |iHS| values > 3, and these formed the focal SNPs for the EHH analysis. EHH for both the reference and alternate alleles was calculated and bifurcation plots were generated to visualize the decay of EHH at increasing distances from the focal SNP loci [44] using the rehh package v2.0.4 in R.
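The EHH quantity that underlies iHS can be illustrated with a few lines of Python. The sketch assumes phased haplotypes coded as 0/1 lists and uses the standard definition of EHH as the probability that two randomly chosen haplotypes carrying the core allele are identical between the focal SNP and a target SNP. The function name is hypothetical, and the integration and standardization steps that turn EHH into iHS (as done by rehh) are not shown.

```python
from collections import Counter


def ehh(haplotypes, focal_index, target_index, core_allele):
    """EHH for one core allele between a focal SNP and a target SNP.

    haplotypes: list of phased haplotypes, each a list of 0/1 alleles.
    """
    carriers = [h for h in haplotypes if h[focal_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return None  # at least two carriers are needed to form a pair

    low, high = sorted((focal_index, target_index))
    # Group carriers by their allele sequence across the interval.
    groups = Counter(tuple(h[low:high + 1]) for h in carriers)
    identical_pairs = sum(count * (count - 1) for count in groups.values())
    return identical_pairs / (n * (n - 1))


if __name__ == "__main__":
    toy = [[1, 0, 0], [1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 0, 0]]  # toy data
    for target in range(3):
        print(target, ehh(toy, 0, target, core_allele=1))
```

EHH equals 1 at the focal SNP and decays as the target SNP moves away and recombination or mutation breaks up the shared haplotype, which is the decay visualized in the bifurcation plots.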

Results
Minor allele frequency distribution of Pfcsp
A total of 90 SNPs within the Pfcsp were analyzed for minor allele frequency (MAF). The P. falciparum population from Navrongo was more variable at Pfcsp (55 SNPs) than that from Cape Coast (35 SNPs) (Fig 2). The allele frequency distribution of all putative SNPs within the Pfcsp locus ranged between 0.001-0.45 in Navrongo and 0.001-0.40 in Cape Coast (Fig 2). As expected for natural P. falciparum populations in Africa (high transmission settings), the allele frequency spectrum was dominated by very low frequency alleles (MAF ≤ 0.05) in both populations. Rare alleles (MAF ≤ 0.01) accounted for 62.9% (22/35) and 61.8% (34/55) of SNPs in Cape Coast and Navrongo respectively. Low-frequency variants (MAF range = [0.01-0.05]) accounted for 20% (7/35) and 10.9% (6/55) of SNPs in Cape Coast and Navrongo respectively. However, the remaining alleles were observed at moderate to high MAF in both populations, implying some underlying evolutionary events.

Genetic diversity of Pfcsp C-terminal haplotypes
To assess the extent of genetic diversity and similarity within and between the two populations, we investigated the diversity in the C-terminal region of Pfcsp (231 bp) from a total of 440 DNA sequences from Cape Coast (n = 184) and Navrongo (n = 256) (Table 1) and summarized this in a Templeton, Crandall, and Sing (TCS) network (Fig 4).
From the genetic diversity indices analyzed, the Pfcsp C-terminal region was generally more diverse in the Navrongo isolates than in the Cape Coast isolates (Table 1). In summary, we observed a higher average number of pairwise nucleotide differences (K = 3.761) and more segregating sites (S = 16) in Navrongo than in Cape Coast (K = 1.148, S = 8). Consequently, Pfcsp nucleotide diversity (π) was also higher in the Navrongo isolates (π = … ± 0.0007) than in isolates from Cape Coast (π = 0.005 ± 0.0004). Haplotype diversity was also higher in Navrongo (Hd = 0.925 ± 0.009) than in Cape Coast (Hd = 0.718 ± 0.026) parasite isolates.
The TH2R and TH3R sites were more polymorphic in both populations than the remaining amino acid sequence in the C-terminal region of PfCSP. In general, non-synonymous mutations predominated in all the isolates in both the TH2R and TH3R epitope regions, with implications for cross-protection. Of the 92 (184 amino acid haplotypes) and 128 (256 amino acid haplotypes) isolates from Cape Coast and Navrongo, there were 8 and 27 non-vaccine TH2R haplotypes respectively (see Additional file 3). There were also 2 and 10 non-vaccine TH3R haplotypes in Cape Coast and Navrongo respectively, with 1 non-vaccine haplotype (NKPKDELNYAND) shared between the two populations (Additional file 3). The frequency of the Pf3D7-type TH2R vaccine haplotype (PSDKHIKEYLNKIQNSL) was 56.5% and 7.4% in Cape Coast and Navrongo respectively (Fig 5A), while the frequency of the Pf3D7-type TH3R vaccine haplotype (NKPKDELDYAND) was 79.3% and 18.4% in the Cape Coast and Navrongo isolates respectively (Fig 5B). The number of amino acid differences observed between the Pf3D7 reference (3D7 0304600.1, PlasmoDB) and the Ghanaian isolates ranged between 1 and 6 in both epitope regions (see Additional file 3).

Population differentiation and structure of Pfcsp
Overall, Weir and Cockerham's Fst between the Cape Coast and Navrongo Pfcsp populations was < 0.05 (Fig 6A), which signifies minimal population differentiation due to genetic structure and suggests gene flow between the populations despite the geographic distance between the sites. This is consistent with the lack of genetic structure observed between Cape Coast and Navrongo parasite isolates through principal component analysis (Fig 6B).

Evidence of selection within populations
Tajima's D values were greater than zero in the TH2R and TH3R epitope regions of the C-terminal loci of Pfcsp (221,422-221,583) for the population of monoclonal Pfcsp isolates from Navrongo (Fig 7A), suggesting balancing selection. However, a Tajima's D < 0 was seen in the Cape Coast population at these loci, suggesting likely directional selection or clonal expansion in the population. Alleles at SNP locus population, suggesting recent positive selection (Fig 7C). The extended haplotype homozygosity revealed some extended haplotypes from the focal SNP locus 221554 in the Navrongo population, but no long-range haplotypes extended beyond 221554 (Fig 8A & 8B).

Discussion
The RTS,S/AS01 malaria vaccine is based only on the Pfcsp sequence of the P. falciparum 3D7 clone [18] and strain-specific immunity has been confirmed for the licensed vaccine [15]. To provide new insights into how well RTS,S/AS01 may perform if implemented on a large scale in different malaria endemic regions, it is important to assess the intra-host diversity and extent of diversity in circulating parasites from different transmission settings.
Using Pfcsp sequence data generated from whole genome sequencing of 92 and 128 clinical parasite isolates from Cape Coast in the coastal savannah region and the Kassena-Nankana districts (KNDs) in the Guinea savannah zone of the Upper East Region of Ghana respectively, we observed higher within host malaria parasite diversity in Navrongo, where 49.2% of infections had Fws < 0.95, than in Cape Coast, where only 28.3% of infections had Fws < 0.95. Such high within host diversity is typical of high transmission areas, where infected individuals usually harbor more polyclonal infections than those living in low transmission areas, where infections are often monoclonal [45]. Malaria transmission in Navrongo is much higher (EIR up to 157) than in Cape Coast (EIR < 50) [21,22]. These findings are consistent with a higher outcrossing potential in the parasite population in the KNDs than in that of the coastal town of Cape Coast. This marked difference in within-host diversity is noteworthy for future region-specific vaccine intervention strategies.
We observed high genetic diversity in the C-terminal TH2R and TH3R amino acid epitopes in the two sites. Notably, the vaccine-specific Pf3D7-type haplotype in the TH2R and TH3R epitopes represents about 56.5% and 79.3% respectively of the observed haplotypes in Cape Coast and only about 7.4% and 18.4% of those in the Navrongo isolates. The observed location-specific variation in diversity in these epitopes, which correlates with malaria transmission intensity, is consistent with findings from previous studies [8][9][10][46][47][48]. Such polymorphisms at the T-cell epitopes have been suggested to be due to an immune evasion mechanism in response to host T-cell immune responses [10] or to selection in the mosquito host during the malaria transmission cycle [46]. We observed amino acid differences within the TH2R and TH3R epitope regions ranging from 1 to 6 at each epitope in both parasite populations, with implications for the duration of vaccine efficacy [13]. This is similar to the amino acid haplotype differences observed in the C-terminal region in the Zambian and DRC populations, which ranged from 2 to 10 [16]. In addition, there were more amino acid substitutions in the Navrongo parasite population than in Cape Coast, which is consistent with the lower frequency of the vaccine haplotype observed in the network analysis for the Navrongo parasite population; this will have implications for vaccine efficacy in comparison with high malaria burden populations in Ghana. Another hypothesis drawn from a previous study suggests that polymorphism at the T-cell epitopes could also be driven by an evolutionary response to intermolecular interactions at the surface of CSP [49].
The high degree of location-specific Pfcsp diversity observed in Ghana might result in differences in vaccine efficacy, potentially reducing RTS,S/AS01 vaccine effectiveness, particularly in Navrongo where the vaccine haplotype was less prevalent. Monitoring differential vaccine efficacy by Pfcsp haplotype during the RTS,S/AS01 implementation programs will be valuable for such a high transmission area, where post-vaccination expansion of non-vaccine haplotypes in the population is likely to be observed; this could lead to reduced vaccine efficacy and vaccine breakthrough infections.
Our data show that rare alleles are abundant in both Cape Coast and Navrongo and contribute to the diversity of the parasite population. Despite this high level of genetic diversity resulting from the non-synonymous nucleotide and amino acid substitutions observed, there remained a shared gene pool between the two sites that resulted in a largely homogeneous parasite population. Over the sampled range of 784.4 km between the two sites, there was gene flow between the local populations of P. falciparum based on Pfcsp sequence analysis, with the pairwise index of differentiation (Fst) being less than 0.05. The principal component analysis further confirmed the lack of population structure or genetic isolation. Previous studies have indicated that human population mixing is likely to cause gene flow of P. falciparum parasites [50,51]. Despite the ecological and epidemiological diversity between the 2 sites, human movement between the two sites is significant and could account for Pfcsp gene flow within the country, with implications for the spread of any emerging vaccine-resistant parasite. High levels of genetic recombination in the high transmission area may explain the observed differences in haplotype diversity in Navrongo in comparison with Cape Coast [52], despite the observed gene flow between the two sites.
We observed negative Tajima's D in the Cape Coast isolates, signifying a likely population expansion of the 3D7 major haplotype in an area with moderate malaria transmission after over 15 years of enhanced nationwide malaria control interventions (chemotherapy and vector control). This result corroborates the finding from Thiès, Senegal, where increased deployment of malaria control interventions resulted in an increase in the frequency of clonal strains and a decrease in the probability of multiple infections [53].
Evidence of recent positive and balancing selection was observed in the Navrongo parasite isolates. The majority of alleles present at the C-terminal region in the Navrongo parasite population had a positive Tajima's D score and were highly polymorphic, which is likely due to balancing selection in response to host immune pressure on this immunogenic epitope [46,54,55]. Evidence of balancing selection on Pfcsp had been reported previously for a population from Malawi [46]. Balancing selection is common for immune targets and has been reported in other vaccine antigen candidates, such as in the domain I epitope of the Pf38 gene (found on the merozoite surface) in Papua New Guinea and The Gambia [56] and also in the extracellular domains of AMA1, a target of allele-specific immune responses [57]. However, seasonal genetic drift among loci, attributable to sampling across multiple transmission seasons in the Navrongo population, may contribute to the signal of balancing selection observed. We also observed evidence of recent positive directional selection (|iHS| > 3) at the T-cell epitope loci in the Navrongo parasite population. This could be due to the addition of new and useful alleles to the already existing repertoire of alleles that are being maintained by balancing selection in the population [19]. On the other hand, the signature of positive selection observed in Navrongo could be attributed to the dominance of one allele over the others at the T-cell epitope region in the Navrongo parasite population. Considering the differences in the eco-epidemiological background and the EIRs of these two populations, the intensity of transmission at these two ecologically distinct sites could account for the differences in selection signals observed [47].
The samples analyzed here were non-randomly selected from the population, and this may limit and bias the inferences that can be drawn from Pfcsp and PfCSP diversity. Notably, the Navrongo and Cape Coast isolates were opportunistic samples whose sequence data were deposited into the Pf3k database at different times, leading to a geographically biased set of sequences, possibly over-representing limited genotypes from a small number of geographic foci and in turn under-representing higher frequency SNPs. Furthermore, the conclusions drawn from sequences obtained from any given sequence repository are subject to change as sample sizes and geographic and temporal distributions are continually updated and expanded. Another limitation that may affect the data interpretation is the small sample sizes analyzed. Finally, the Navrongo sequences obtained from Pf3k represent different periods, whereas the sequences from Cape Coast were sampled in a single period; this may prevent samples from these two regions from being optimally comparable. However, we observed no population structure within and between the two populations, an indication that timespan did not affect the results obtained here. In addition, samples from Cape Coast were collected from a single district hospital in comparison to the three health facilities in Navrongo. This may potentially have an impact on the results obtained, although the Cape Coast district hospital serves a wide catchment area comparable to that of the three Navrongo sites. Despite these inherent limitations, the sequence analysis elaborated here is a powerful approach capable of elucidating local patterns in vaccine candidate genetic diversity and would be useful for monitoring the effect and efficacy of interventions. A larger sample size with wider geographic and temporal analysis will further reveal the full extent of the diversity of Pfcsp locally and across Africa. This will help inform strategies for a wider implementation of the RTS,S vaccine.

Conclusion
The extent of CSP polymorphism observed in our study sites would likely implicate an allele-specific immune response during the pilot Phase IV implementation trials being conducted in Ghana. Similar to observations in a study of an AMA-1 vaccine, vaccine efficacy during this trial in Ghana may depend on the degree of homology between the amino acid haplotypes circulating in the natural parasite populations and the 3D7 vaccine haplotype [58]. This might slowly result in a directional selective advantage for unmatched CSP haplotypes because the vaccine does not target them. This emphasizes the need for a polyvalent malaria vaccine [59,60].
With the ongoing Phase IV RTS,S vaccine implementation trials in Ghana, which include populations from Cape Coast and Navrongo, the findings from this study provide prior information on the extent of diversity in Pfcsp and the evolutionary forces driving these variations within Ghanaian natural parasite populations. This will inform the vaccine implementation outcomes and contribute to future vaccine designs. These findings further emphasize the need for incorporating large-scale prevalence and population genetic analysis of vaccine candidate antigens into future malaria vaccine design to predict malaria vaccine outcomes.