Selections, frameshift mutations, and copy number variation detected on the surf4.1 gene in the western Kenyan Plasmodium falciparum population

Plasmodium falciparum SURFIN4.1 is a putative ligand expressed on the merozoite and likely on the infected red blood cell. Its gene was suggested to be under directional selection in the eastern Kenyan population but under balancing selection in the Thai population. To understand this difference, surf4.1 sequences of western Kenyan P. falciparum isolates were analysed. Frameshift mutations and copy number variation (CNV) were also examined in parasites from western Kenya and Thailand. Significant positive departures from neutral expectations were detected by 3 population-based tests on the surf4.1 region encoding the C-terminus of variable region 2 (Var2) in the western Kenyan population, similar to the Thai population; this region was not covered by the previous analysis of the eastern Kenyan population. A significant excess of nonsynonymous substitutions per nonsynonymous site over synonymous substitutions per synonymous site was also detected in the Var2 region. Significant negative departures from neutral expectations were detected on the region encoding the Var1 C-terminus, consistent with the previous observation in the eastern Kenyan population. Parasites possessing a frameshift mutation that results in a product without intracellular Trp-rich (WR) domains accounted for 22/23 in western Kenya and 22/36 in Thailand. More than one copy of the surf4.1 gene was detected in western Kenya (4/24), but no CNV was found in Thailand (0/36). The authors infer that the high polymorphism of the SURFIN4.1 Var2 C-terminus in both Kenyan and Thai populations was shaped by diversifying selection and maintained by balancing selection, most likely driven by immunological pressure. In contrast, the SURFIN4.1 Var1 C-terminus is suggested to be under directional selection, consistent with the previous report for the eastern Kenyan population.
Most western Kenyan isolates possess a frameshift mutation that would restrict the expression of SURFIN4.1 to the merozoite, whereas only about 60% of Thai isolates possess this frameshift. This difference would affect the level and type of selection pressure on the protein, as seen in the opposite extremes of Tajima's D values for the Var1 C-terminus between the Kenyan and Thai populations. The CNV observed in Kenyan isolates may be a consequence of this frameshift mutation, serving to increase benefits on the merozoite surface.


Background
Malaria poses a serious public health challenge, causing an estimated 438,000 deaths and 214 million clinical cases in 2015, especially in sub-Saharan Africa, where the dominant species is Plasmodium falciparum [1]. In human hosts, P. falciparum multiplies in the red blood cell (RBC), where malaria pathogenicity is mediated by parasite-encoded proteins expressed on the RBC-invasive merozoite stage and on the surface of the parasite-infected RBC (iRBC). Therefore, parasite proteins that interact with host RBCs and endothelial cells are potential immune targets and may serve as vaccine candidates for intervention.
SURFIN, a large type 1 transmembrane protein encoded by the surface-associated interspersed gene (surf gene) family, occupies a unique position among such proteins because its extracellular Cys-rich domain (CRD) and intracellular Trp-rich (WR) domain share homology with a variety of adhesins expressed on the surface of RBCs infected not only by P. falciparum but also by other Plasmodium species. For example, the CRD is shared with PIR proteins encoded by the Plasmodium interspersed repeats (pir) super multigene family in primate and rodent malaria parasites, and the WR domain with P. falciparum PfEMP1 and Plasmodium knowlesi SICAvar [2]. This suggests that SURFIN components were utilized to generate lineage-specific adhesins expressed on the iRBC surface through domain shuffling with other proteins. These shared structural components point to an evolutionarily basal position and important roles of SURFIN in the parasite-host interaction. P. falciparum SURFIN 4.1 is proposed to be involved in the reversible association with target RBCs when it is located on the merozoite surface [3]. Furthermore, a partial SURFIN 4.1 exogenously expressed in P. falciparum was observed to be transported to the iRBC [4], suggesting that this protein is potentially also expressed on the iRBC surface, like its paralog SURFIN 4.2 [2]. Interestingly, in the P. falciparum 3D7 and MS822 lines, SURFIN 4.1 is truncated just after the transmembrane region by a premature stop codon caused by a frameshift mutation. Conversely, in another line, FCR3, this frameshift mutation does not occur and SURFIN 4.1 contains two WR domains in the intracellular region [4]. Moreover, the sequence of the P. falciparum IT line in the PlasmoDB database (PFIT_0400900) does not possess these frameshifts, and the protein product is expected to contain three WR domains, similar to SURFIN 4.2 (Fig. 1).
Since the intracellular region containing the WR domain has an important role in iRBC surface expression in the case of SURFIN 4.2 [5], SURFIN 4.1 would be expressed on the iRBC surface only by P. falciparum isolates possessing the WR domain. SURFIN 4.1 variants without the WR domain would be expressed only on the merozoite surface. The abundance of this type of SURFIN 4.1 would affect the evolutionary process and diversity, because the exposure times to the host immune system are very different (>24 h on the iRBC surface versus up to several minutes on the merozoite surface). The optimal specificity and strength of the putative SURFIN 4.1-host receptor interactions, although not yet proven, would also differ (iRBC-endothelial cell/RBC versus merozoite-RBC). Copy number variation (CNV) has also been reported for surf 4.1, but its biological significance is not known [2].
Population genetic approaches offer an avenue to study the effect of acquired immune responses on pathogen genetic polymorphism [6]. This strategy, which involves examining nucleotide diversity for signatures of selection on particular antigens, has unveiled candidate vaccine targets such as AMA1 [7-9], MSP1 [10], MSP2 [11,12], MSP3 [13], EBA175 [14] and SURFIN 4.2 [15], among others. The surf 4.1 gene region encoding the SURFIN 4.1 extracellular region possesses more polymorphic sites than the corresponding region of the surf 4.2 gene. Significant positive deviations of Tajima's D from zero were detected on surf 4.2 in both Thai and Kenyan parasite populations, and positive selection was proposed on this gene [15,16]. However, this is not the case for surf 4.1, for which positive Tajima's D values were detected in the Thai P. falciparum population [17] but negative values in the parasite population from the Ngerenye area of Kilifi, eastern Kenya [15]. To gain insight into the cause of this difference, the authors determined surf 4.1 gene sequences in the western Kenyan P. falciparum population and evaluated allelic diversity, the type of selection pressure, frameshift mutations, and copy number variation.

Parasite collection and DNA extraction
Twenty-three P. falciparum isolates obtained in August 2014 on the Lake Victoria islands of Kibuogi, Ngodhe and Takawiri and in the shore village of Ungoye in Kenya [18] were adapted to in vitro culture essentially as described previously [19]; briefly, in RPMI-1640 medium containing 10% heat-inactivated pooled type AB+ human serum, 200 mM hypoxanthine, 20 µg/mL gentamicin and O+ human RBCs at 2% hematocrit. Human erythrocytes and plasma used for the culture were obtained from the Nagasaki Red Cross Blood Center. The sampling was authorized by the Ethical Review Committee of the University of Nairobi and Kenyatta National Hospital and carried out in accordance with the approved guidelines. After a short period of culture (≤15 days, median 7 days), genomic DNA (gDNA) was extracted from the parasites using the QIAamp DNA Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. Four isolates were cloned by limiting dilution before DNA extraction to determine the sequence encoding the extracellular region of SURFIN 4.1 (Gene ID PF3D7_0402200). To further analyse the frameshifts and CNV in the western Kenyan parasites, DNA was obtained after cloning. For the cloned parasites, it took up to 40 days from the start of culture to DNA extraction.
To evaluate the frameshifts and CNV of surf 4.1 in Thai P. falciparum isolates, genomic DNA prepared previously from 25 isolates and 3 clones, which showed clear single peaks for the surf 4.1 sequence, was used [17]. The following Thai isolates were newly cloned before DNA extraction to avoid potential problems caused by infection with more than one allele: MS807-H3, MS814-F1, MS815-A2, MS818-C2, MS819-A3, MS829-B5, MS833-B4, and MS844-A4. PCR products were subjected to 1.5% agarose gel electrophoresis and visualized with ethidium bromide under UV transillumination. A negative control reaction using distilled water as the template was always included. A DNA standard marker was used to evaluate the size of the PCR products.

Polymerase chain reaction (PCR) amplification and sequencing
When a single PCR band with little or no background was confirmed on the agarose gel, the amplified DNA fragments were treated with ExoSAP-IT (GE Healthcare, Buckinghamshire, UK) and sequenced directly with the ABI PRISM® BigDye™ Terminator ver1.1 using a panel of oligonucleotide primers as described before [17], according to the manufacturer's instructions. Sequences were then analysed with an ABI3730 DNA analyzer (Applied Biosystems, Foster City, CA). Samples with multiple peaks in the chromatogram, indicating a mixed-allele infection, were sequenced either after cloning the PCR-amplified DNA fragment into pGEM®-T Easy (Promega, Madison, WI) (KT14-111-T1, KU14-119-T1, KU14-119-T2) or from PCR products obtained from parasite lines after parasite cloning (KN14-076-B6, KT14-158-G3, KU14-061-B1, and KU14-257-A7). Only sequences supported by at least two independent plasmid clones were used. One isolate (KU14-119) yielded 2 distinct surf 4.1 sequences (KU14-119-T1 and KU14-119-T2) that were supported by 2 and 4 independent plasmid clones, respectively.

Quantitative real-time PCR (qPCR)
Copy number variation of the surf 4.1 gene was examined by the comparative CT method using the ama1 gene, for which CNV has not been noted, as the control locus. qPCR for surf 4.1 was performed under the following conditions: an initial denaturation step of 95 °C for 10 min followed by 50 amplification cycles of 95 °C for 15 s, 45 °C for 20 s, and 62 °C for 1 min. Reactions were performed using Power SYBR Green Master Mix (Applied Biosystems) with primers PFD0100c.rtF2 (TAAGAACAGAACATAATTATGATAA, nt 308-332) and PFD0100.rtR1 (CAATCCTGTTCTGCATATTTTATG, nt 431-409) on an Applied Biosystems 7500 Fast Real-Time PCR System. qPCR for ama1 (PF3D7_1133400) was performed in the same way as for surf 4.1, except that primers fAMA1-RT.F1 (AAGACGAAAATACATTACAACACGCA, nt 203-228) and fAMA1-RT.R1 (CTACTCTTATACCTGAACCATGAACT, nt 388-363) were used with a combined annealing/extension step at 62 °C for 1 min. The quality of the PCR-amplified products was validated by melting curve analysis. The amplification efficiencies of qPCR for both surf 4.1 and ama1 were determined to be 0.7 using serially diluted DNA of the P. falciparum 3D7 line, and this value was used for the calculation.
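The copy number calculation implied by the comparative CT method can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that a single-copy line (such as 3D7) serves as the calibrator and that the same efficiency (0.7) applies to both loci, so each cycle multiplies the template by 1.7 rather than 2. The function name and all CT values are hypothetical.

```python
def relative_copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal,
                         efficiency=0.7):
    """Estimate target copy number relative to a single-copy calibrator.

    ct_target / ct_ref: CT of surf 4.1 and ama1 in the test sample.
    ct_target_cal / ct_ref_cal: the same two CTs in the calibrator line
    (assumed here to carry one copy of the target).
    efficiency: per-cycle amplification efficiency (1.0 would mean doubling).
    """
    delta_sample = ct_target - ct_ref            # delta-CT of the sample
    delta_cal = ct_target_cal - ct_ref_cal       # delta-CT of the calibrator
    delta_delta = delta_sample - delta_cal       # delta-delta-CT
    return (1.0 + efficiency) ** (-delta_delta)


# Hypothetical CT values, for illustration only.
estimate = relative_copy_number(ct_target=21.4, ct_ref=22.7,
                                ct_target_cal=22.5, ct_ref_cal=22.6)
print(round(estimate, 2))
```

With an efficiency below 1.0, a one-cycle shift in delta-CT corresponds to a smaller fold change than the two-fold step assumed by the textbook formula, which is why the measured efficiency enters the calculation.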

Statistical analyses
Sequences were aligned using the CLUSTAL W program [20]. The mean numbers of synonymous substitutions per synonymous site (dS) and nonsynonymous substitutions per nonsynonymous site (dN) and their standard errors were computed using the Nei and Gojobori method [21] with the Jukes and Cantor correction, implemented in MEGA4.0 [22]. The statistical difference between dS and dN was tested using a one-tailed Z-test with 500 bootstrap pseudo-samples in MEGA. A value of dN significantly higher than dS at the 95% confidence level was taken as evidence for positive selection. When the number of synonymous (Sd) or nonsynonymous (Nd) differences was less than 10, Fisher's exact test was used to assess significance. Nucleotide diversity (π) was computed using DnaSP5.0 [23]. Sliding window plots of nucleotide diversity (window of 90 bases with a step size of 3 bases) were also computed using DnaSP5.0. Images of the sliding window plots were generated using Excel and modified using Adobe Photoshop. Nucleotide and amino acid (aa) positions are numbered according to the 3D7 line sequence.
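The sliding window calculation of nucleotide diversity described above can be illustrated with a short sketch. It is not a reimplementation of DnaSP: gap handling here is pairwise (positions with a gap in either sequence of a pair are skipped), which is an assumption, and all function names are hypothetical. The window of 90 bases and step of 3 bases follow the text, and the Jukes-Cantor correction is the standard d = -(3/4) ln(1 - 4p/3).

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nucleotide_diversity(seqs, correct=True):
    """Mean pairwise distance (pi) over all pairs of aligned sequences."""
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    if correct:
        dists = [jukes_cantor(p) for p in dists]
    return sum(dists) / len(dists)


def sliding_pi(seqs, window=90, step=3, correct=True):
    """Return (1-based window start, pi) for each window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + 1, nucleotide_diversity(chunk, correct)))
    return results
```

Calling `sliding_pi(alignment)` on a list of equal-length aligned strings yields one value per window, which can then be plotted against position to locate regions where polymorphism accumulates.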
Tajima's D test was used to evaluate departure from the neutral evolution model by comparing θ (nucleotide diversity estimated from the number of segregating sites, S) and π (observed pairwise nucleotide diversity), to investigate whether polymorphic single nucleotide alleles tend to occur at higher or lower frequencies than expected under neutral drift [24]. Fu and Li's D* and F* tests evaluate departures from neutrality by comparing the number of mutations in the external branches (considered to be "new" mutations) and internal branches (considered to be "older" mutations) of the genealogy. Selective pressure causes the number of external mutations to deviate from the neutral expectation, whereas the number of internal mutations is less affected. Under positive balancing selection, the number of internal "old" mutations is expected to be higher than the number of external "new" mutations. Fu and Li's D* compares the estimate of θ based on the number of singletons (mutations appearing only once among the sequences, which are new and located on the external branches) with that based on S. Fu and Li's F* compares the estimate of θ based on the number of singletons with that based on k (the average number of pairwise nucleotide differences) [25]. These tests were executed using DnaSP5.0.
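For readers unfamiliar with the statistic, Tajima's D can be computed from an alignment as sketched below, using the standard constants from Tajima's 1989 formulation. The sketch assumes equal-length, gap-free sequences and counts segregating sites rather than total mutations; it returns only the D value and does not reproduce the significance testing performed by DnaSP. The function name is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free aligned sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D requires at least 4 sequences")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # k: average number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants depending only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D contrasts k with S / a1 (the estimate of theta from S).
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A sample in which every variant is a singleton gives a negative D (excess of rare alleles, as under directional selection or population expansion), whereas a sample split evenly between two haplotypes gives a positive D (excess of intermediate-frequency alleles, as under balancing selection).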

Polymorphism of the surf 4.1 gene and its product in the western Kenyan P. falciparum population
Twenty-four nucleotide sequences (nt 4-2289) encoding the extracellular region of SURFIN 4.1 were obtained from western Kenyan field isolates in this study. A total of 410 polymorphic nucleotide sites were observed, with an average pairwise nucleotide diversity of 0.048. All insertions/deletions (indels; AAT at nt 1348-1350, TAC at 1746-1748, GGA at 2272-2274) found in previously reported Thai surf 4.1 sequences were also detected in western Kenyan isolates. To identify the regions accumulating polymorphisms, the extracellular region of SURFIN 4.1 was divided into 4 regions based on amino acid sequence conservation among SURFIN members: the N-terminal segment (Nter; aa 1-50, nt 1-150), the Cys-rich domain (CRD; aa 51-195, nt 151-585), variable region 1 (Var1; aa 196-502, nt 586-1506), and variable region 2 (Var2; aa 503-765, nt 1507-2289), as described previously [17]. The polymorphic sites were mainly distributed in the variable regions: 398 of the 410 polymorphic nucleotide sites were located there, and 11.7% of the 918 nucleotides constituting the Var1 region and 37.2% of the 783 nucleotides constituting the Var2 region were polymorphic (Table 2). The degree of polymorphism was greater at the amino acid level: 26.0% of the 307 amino acids of Var1 and 66.6% of the 261 amino acids of Var2 were polymorphic. Sliding window plots of nucleotide diversity and amino acid polymorphism indicated that, in the Var2 region, more polymorphism accumulated toward the C-terminal side, similar to the previous finding for the Thai parasite population (Fig. 2) [17].

Selection on the surf 4.1 gene in the western Kenyan P. falciparum population
The signatures of selection on surf 4.1 were first evaluated in the western Kenyan parasite population by comparing dS and dN (Table 1). A significant excess of dN over dS was observed when the entire sequence or only the Var2 region was analysed (p = 0.004 and 0.0003, respectively). No significant difference between dN and dS was found in the Nter, CRD, and Var1 regions (Fisher's exact test). To visualize the sites where an excess of dN over dS is observed and to compare the pattern among the 3 geographical locations, sliding window plots of dN/dS calculated for the surf 4.1 gene sequences previously reported from the eastern Kenyan (n = 51) and Thai (n = 37) P. falciparum populations were overlaid in Fig. 3 [15,17]. All three populations showed a similar pattern, and the two Kenyan populations showed a significant excess of dN over dS at the Var1 C-terminus and across almost the entire Var2 region (p < 0.05).
No significant deviation from neutrality was detected by the 3 population-based tests when the entire obtained sequence was analysed or when the 4 subdivided regions were analysed separately (Table 2). To further localize the regions where potential balancing or directional selection acts on the surf 4.1 gene and to compare the 3 geographical locations, sliding window plots of Tajima's D and Fu and Li's D* and F* calculated for surf 4.1 gene sequences previously reported from the eastern Kenyan and Thai P. falciparum populations were overlaid in Fig. 4. All 3 population-based tests showed similar patterns between the western and eastern Kenyan populations and revealed a significant positive deviation from zero at the Var2 C-terminus (p < 0.05) in the western Kenyan population, where the Thai population also showed significantly positive Fu and Li's D* and F* values and a positive but not significant Tajima's D value (0.05 ≤ p < 0.1) [17]. This region was not sequenced for the eastern Kenyan population and was therefore not examined in the previous report [15]. Thus, both Kenyan and Thai populations consistently showed positive deviation at the Var2 C-terminus. Tajima's test also detected a significant positive deviation in the CRD region (p < 0.05, Fig. 4, asterisk) in the western Kenyan population, where positive but not significant values were similarly seen in both the eastern Kenyan and Thai populations (p ≥ 0.1). In addition, a significant negative deviation was detected in the C-terminal half of the Var1 region (p < 0.05, Fig. 4, hashes), where a significant negative deviation was also detected in the eastern Kenyan population [15] but positive values were detected in the Thai population. The significant negative deviation observed in the region spanning the very C-terminal side of Var1 to the N-terminal side of Var2 in the eastern Kenyan population (Fig. 4, dagger) was not seen in the western Kenyan population.
To understand the difference in the selection detected on the surf 4.1 gene locus between Kenyan and Thai P. falciparum populations, the pattern and frequency distribution of the polymorphic amino acid sites were visualized in these populations (Fig. 5). Both Kenyan populations (eastern Kenya, 1998, reported previously [15], and western Kenya, 2014, obtained in this study) had a similar distribution of polymorphic amino acid sites, suggesting that the pattern is spatially and temporally stable between these two areas in Kenya. When the frequencies of the minor allele at 181 dimorphic amino acid sites were compared, it was evident that the Kenyan populations possess fewer dimorphic amino acid sites at intermediate frequency, whereas the Thai population (1988-1989, reported previously [17]) was skewed towards more frequent intermediate alleles.

Figure legend (fragment): … [4], and IT line sequence has no frameshift mutations and contains three WR domains

Table 1 Nucleotide diversity of Plasmodium falciparum surf 4.1 in western Kenyan isolates (n = 24)
Extracellular, extracellular region; Nter, N-terminal segment; CRD, cysteine-rich domain; Var1, variable region 1; Var2, variable region 2; sites, nucleotide sites analyzed; indels, insertion/deletion polymorphism; k, the average number of nucleotide differences; N and S, average numbers of nonsynonymous and synonymous sites; π, pairwise nucleotide diversity (Jukes-Cantor model); dN, mean number of nonsynonymous substitutions per nonsynonymous site; dS, mean number of synonymous substitutions per synonymous site; SE, standard error computed using the Nei-Gojobori method with the Jukes-Cantor correction. SE was estimated using the bootstrap method with 500 replications. The numbers of synonymous (Sd) and nonsynonymous (Nd) differences were calculated by the Nei-Gojobori method. p value indicates the statistical difference between dN and dS, tested using a one-tailed Z-test with 500 bootstrap pseudo-samples implemented in MEGA. ns indicates not significant by two-tailed Fisher's exact tests (*). Numbering follows the 3D7 line sequence.

Frequency of the frameshift mutation that would result in no SURFIN 4.1 expression on the iRBC surface varies between western Kenyan and Thai parasite populations
Frameshift mutations influence the expression of SURFIN 4.1 on the iRBC surface, potentially affecting its evolutionary dynamics. To explore this possibility, the frequency of the frameshift mutations at nt 2498-2503, 3894-3903, and 4529-4536 was examined in P. falciparum lines isolated from western Kenya and Thailand (Fig. 1; Table 3). To gain insight into whether these frameshifts are common in the great ape malaria parasites closely related to P. falciparum [26], the surf 4.1 gene sequence (PRCDC_0005300, GeneDB) of the CDC line of the chimpanzee malaria parasite Plasmodium reichenowi was examined. Although only one sequence was examined, the P. reichenowi surf 4.1 sequence did not possess these frameshifts, suggesting that an intact surf 4.1 gene without frameshifts existed in the common ancestor of P. falciparum and P. reichenowi and was likely not rare.

surf 4.1 CNV is common in the western Kenyan P. falciparum population, but not in Thai population
Recent genome-wide analysis of P. falciparum CNVs suggested that CNVs are targets of purifying selection and that particular CNVs found at high frequencies in populations with a large effective population size (such as Africa) are likely beneficial [27]. Because CNV has been reported for surf 4.1, the copy number was examined in western Kenyan and Thai isolates. More than one copy of the surf 4.1 gene was detected in western Kenyan lines (17%; 3/24 showed two copies and 1/24 showed three copies), but not in Thai lines (0%, 0/36). This suggests that surf 4.1 CNV may be positively selected in Africa even though the cytoplasmic WR domain is not expressed. As controls, the 3D7 and FCR3 lines were examined, and single and three copies were detected, respectively. Although Mphande et al. detected 6 copies of surf 4.1 in FCR3, CNV is known to change rapidly during in vitro culture, so this discrepancy may be due to the different culture histories of FCR3 [3].

… for the western Kenyan isolates (n = 24)
Numbering follows the 3D7 line sequence. Extracellular, extracellular region; Nter, N-terminal segment; CRD, cysteine-rich domain; Var1, variable region 1; Var2, variable region 2; sites, nucleotide sites analyzed; η, the total number of mutations; S, number of segregating sites; π, observed nucleotide diversity; θ, the expected nucleotide diversity under neutrality derived from S

Discussion
In this study, P. falciparum field isolates were collected from patients in western Kenya and adapted to in vitro culture, and 24 sequences encoding the SURFIN 4.1 extracellular region were determined. Despite the previous report describing only negative selection on surf 4.1 in the eastern Kenyan parasite population [15], positive balancing selection was detected on this gene in the western Kenyan population. Negative values were detected on the surf 4.1 region encoding the Var1 C-terminus by population-based analyses in the western Kenyan parasite population; such values were also detected for the eastern Kenyan population and were inferred to reflect directional selection [15]. However, positive values were detected for this region in the Thai population [17]. To seek a possible explanation for this discrepancy, the study examined frameshifts in the cytoplasmic region that yield a surf 4.1 product with only a 19-aa cytoplasmic tail lacking the WR domain, which is implicated in iRBC surface exposure. Ninety-six percent of Kenyan P. falciparum lines possessed this frameshift mutation, and their SURFIN 4.1 is expected to be exposed to host immunity only on the merozoite surface.

Figure legend (fragment): … [15] and Thai [17] are plotted with a solid cyan line and a dotted black line, respectively. Sites departing significantly from neutrality (p < 0.05, two-tailed) are indicated with circle, square, and diamond symbols on each line. Asterisk and hash symbols indicate the regions where positive or negative deviation from neutrality, respectively, was detected in the western Kenyan parasite population in this study. The dagger symbol indicates the region where negative deviation from neutrality was detected in the eastern, but not the western, Kenyan population [15].
If SURFIN 4.1 exposed on the iRBC is involved in adherence to endothelial cells or uninfected RBCs, the frequency of the observed alleles results from a balance between pressure to increase the alleles with the best-fitted specificity and strength for receptor recognition and pressure to escape host immunity. Because the Var1 region may participate in forming a homomeric complex [4], the dominant alleles in Kenya may be better fitted to function on the merozoite surface.
Table 3 Existence of frameshift mutations in the surf 4…

Based on the findings of this study, the authors propose the following model for surf 4.1 gene evolution in Kenya and Thailand (Fig. 6). An ancestral surf 4.1 gene with an intact full-length open reading frame originally arose in Laverania Plasmodium infecting great apes in Africa [26], as suggested by the sequence of the P. reichenowi surf 4.1 ortholog, which lacks frameshift mutations. After the introduction of ancestral P. falciparum into the human population, most of the currently observed surf 4.1 polymorphism was shaped under diversifying selection in Africa, as suggested by the significant excess of dN over dS and the polymorphism conserved between Kenyan and Thai parasite isolates. It should be noted that SURFIN 4.1 is more polymorphic than SURFIN 4.2, which is still expressed on the iRBC, suggesting stronger immune pressure on SURFIN 4.1 [17]. A frameshift mutation that results in a truncated SURFIN 4.1 expressed on the merozoite surface but not on the iRBC then emerged and expanded in Africa. Although the cause of the increased frequency of this frameshift mutation in the P. falciparum population is unclear, a loss of function of SURFIN 4.1 on the iRBC surface (for example, through a deficient mutant of the host receptor) followed by immunity against the non-functional or less functional SURFIN 4.1 would favour this frameshift mutation. Such a hypothesis is plausible because some human molecules are known to have evolved to escape parasite recognition; for example, the majority of west Africans do not express the Duffy Antigen Receptor for Chemokines (DARC), a receptor for another human malaria parasite, Plasmodium vivax, on their RBCs and are thereby resistant to this parasite species [28,29]. Acquiring a role only on the …

… it is not clear whether the frequency of this frameshift mutation increased, was maintained, or decreased to ~60% after the migration of P. falciparum to Asia, or whether the frequency is still increasing toward the level in Africa (~95%). The transmission intensity in Southeast Asia is much lower than in Africa, which reduces the chance of mating and producing novel haplotypes and may influence the speed at which the frequency of particular alleles fitted to the new environment changes. However, because P. falciparum is speculated to have migrated to Asia with humans approximately 50,000-60,000 years ago [30], the intermediate frequency of this frameshift mutation observed in Thailand may already reflect adaptation to the local environment.
If this is the case, the hypothetical SURFIN 4.1 receptor may still be expressed in the Asian human population. Nonetheless, more studies are needed to understand the background of this complex genetic signature; in particular, identification of the hypothetical SURFIN 4.1 receptor would provide a critical answer to this question.

Conclusions
Signatures of diversifying and balancing selection were detected on the surf 4.1 region encoding the SURFIN 4.1 Var2 C-terminus in the western Kenyan P. falciparum population, similar to the Thai parasite population [17]. In contrast, a signature of directional selection was detected on the region encoding the SURFIN 4.1 Var1 C-terminus in the western Kenyan population, consistent with the previous report for the eastern Kenyan population [15]. Most western Kenyan isolates possess a frameshift mutation that would restrict the expression of SURFIN 4.1 to the merozoite, but only ~60% of Thai isolates possess this frameshift, which would substantially affect the level and type of selection pressure on this protein. CNV was observed in 4/24 Kenyan isolates and 0/36 Thai isolates; it may be a consequence of this frameshift mutation, serving to increase expression on the merozoite surface.