Genetic polymorphisms in the circumsporozoite protein of Plasmodium malariae show a geographical bias

Plasmodium malariae is characterized by its long asymptomatic persistence in the human host. The epidemiology of P. malariae is incompletely understood and is hampered by the limited knowledge of genetic polymorphisms. Previous reports from Africa have shown heterogeneity within the P. malariae circumsporozoite protein (pmcsp) gene. However, comparative studies from Asian countries are lacking. Here, the genetic polymorphisms in pmcsp of Asian isolates have been characterized. Blood samples from 89 symptomatic P. malariae-infected patients were collected, from Thailand (n = 43), Myanmar (n = 40), Lao PDR (n = 5), and Bangladesh (n = 1). pmcsp was amplified using semi-nested PCR before sequencing. The resulting 89 pmcsp sequences were analysed together with 58 previously published pmcsp sequences representing African countries using BioEdit, MEGA6, and DnaSP. Polymorphisms identified in pmcsp were grouped into 3 populations: Thailand, Myanmar, and Kenya. The nucleotide diversity and the ratio of nonsynonymous to synonymous substitutions (dN/dS) in Thailand and Myanmar were higher compared with that in Kenya. Phylogenetic analysis showed clustering of pmcsp sequences according to the origin of isolates (Asia vs. Africa). High genetic differentiation (Fst = 0.404) was observed between P. malariae isolates from Asian and African countries. Sequence analysis of pmcsp showed the presence of tetrapeptide repeat units of NAAG, NDAG, and NAPG in the central repeat region of the gene. Plasmodium malariae isolates from Asian countries carried fewer copies of NAAG compared with that from African countries. The NAPG repeat was only observed in Asian isolates. Additional analysis of 2 T-cell epitopes, Th2R and Th3R, showed limited heterogeneity in P. malariae populations. This study provides valuable information on the genetic polymorphisms in pmcsp isolates from Asia and advances our understanding of P. malariae population in Asia and Africa. Polymorphisms in the central repeat region of pmcsp showed association with the geographical origin of P. malariae isolates and can be potentially used as a marker for genetic epidemiology of P. malariae population.


Background
Plasmodium malariae is one of six Plasmodium spp. that cause malaria in humans (Plasmodium falciparum, Plasmodium vivax, P. malariae, Plasmodium ovale curtisi, P. ovale wallikeri [1], and Plasmodium knowlesi [2]). Plasmodium malariae exhibits unique characteristics; it is the only Plasmodium species with a 72-h long erythrocytic stage in humans, and it can maintain low parasitaemia in humans for a decade [3], while still being infectious to Anopheles mosquito (vector). Although P. malariae is widely distributed in malaria endemic regions, fewer molecular studies have been conducted in this species compared with those in P. falciparum and P. vivax. Plasmodium malariae often maintains low parasitaemia and commonly co-infects with the highly prevalent species, P. falciparum and P. vivax. Consequently, designing experiments to study the epidemiology of P. malariae is difficult.
Comparative genomics of Plasmodium spp., including P. malariae, has been used to elucidate the evolutionary history of Plasmodium spp. that infect humans [4,5]. Population genetic studies of P. malariae should be conducted for more understanding in genetic diversity of this parasite. Measurement of gene polymorphism might be helpful for more understanding in biology of P. malariae. The polymorphic genes such as genes encoding for antigen are usually selected for using as genetic markers. One of the prominent surface antigens that is important for sporozoite function and invasion to hepatocyte is the circumsporozoite protein (CSP). CSP has been used as a marker to measure the population diversity of P. falciparum [6], P. vivax [7], and P. knowlesi [8].
CSP is the major surface protein of Plasmodium sporozoites. The gene encoding for CSP (csp) comprises 1 central repeat region and 2 nonrepeat end regions (N-and C-terminals). The N-terminal nonrepeat region contains a conserved region I located before the central repeat region. The C-terminal nonrepeat region contains 3 subregions, namely Th2R, conserved region II, and Th3R. Th2R and Th3R are identified as T-cell epitope regions [9] and are variable in natural Plasmodium populations [10]. csp has been studied in P. malariae samples collected from African countries [11]. Heterogeneity of sequences was reported among the isolates from sub-Saharan Africa with polymorphism essentially limited to the central repeat region [11]. A recent survey of P. malariae in Kenya also revealed high diversity in csp sequence [12]. However, polymorphisms in csp have not been reported in P. malariae isolates of Asia. The aim of this study is to analyse polymorphisms in csp of P. malariae field isolates collected from Thailand, Myanmar, Lao PDR, and Bangladesh. Understanding the sequence diversity within csp of P. malariae would contribute to more understanding in nature of this parasite's distribution in the regions.

Plasmodium malariae isolates and DNA extraction
A total of 89 P. malariae isolates were collected from 4 different Asian countries, including Thailand, Myanmar, Lao PDR, and Bangladesh (Table 1). This study received ethical clearance from the Faculty of Tropical Medicine, Mahidol University, Thailand (MUTM2011-049-06). Genomic DNA was extracted from the isolates according to the manufacturer's instruction (Qiagen, Germany) and stored at − 20 °C until further use.

PCR amplification of pmcsp
The DNA samples were subjected to nested PCR [13] to confirm the presence of P. malariae and detect the presence of any other Plasmodium species. Sequences of pmcsp corresponding to the accession numbers S69014, U09766, J03992, AJ001525, AJ001523, AJ002582, AJ002578, AJ002580, AJ002576, AJ001526, AJ001524, AJ002583, AJ002581, AJ002577, AJ002579, AJ002575 were retrieved from NCBI (https ://www.ncbi.nlm.nih. gov/). Gene-specific primers spanning the complete coding sequence of pmcsp were designed based on the multiple sequence alignment of pmcsp sequences ( Table 2). Semi-nested PCR approach was used for the amplification of pmcsp using the conditions described in Table 2. All PCRs were carried out with 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl 2 , 125 M dNTPs, 250 nM of each primer, and 4 units of Taq Polymerase (Kapa biosystems, USA). The PCR products were examined by gel electrophoresis. All amplified PCR products were purified using the FavorPrep ™ Gel/PCR Purification Kit (Favorgen, Taiwan) and sequenced.

Genetic analysis of pmcsp
DNA sequences of pmcsp were read on both strands and analysed using BioEdit Sequence Alignment Editor Program as described previously [14]. DNA sequence polymorphisms, haplotype diversity, nucleotide diversity, and the rate of synonymous (dS) and nonsynonymous (dN) substitutions were calculated using DnaSP version 5.10.01 [15] and MEGA6 [16]. Phylogenetic analysis was done using the neighbour-joining method [17].

Overall polymorphism of pmcsp
pmcsp of 89 isolates of P. malariae (Thailand = 43, Myanmar = 40, Lao PDR = 5, and Bangladesh = 1; Table 1) was successfully PCR amplified and sequenced. Multiple sequence alignment of these sequences was performed, and sequences corresponding to primer-binding sites, including 12 amino acids at the 5′ end and 6 amino acids at the 3′ end, were removed. Thus, the total length of pmcsp used in this analysis varied from 315 to 403 amino acids. In addition to the pmcsp sequences obtained from the 89 isolates, 58 pmcsp sequences were retrieved from NCBI (https ://www.ncbi.nlm.nih.gov/). All 143 pmcsp sequences (Asia = 91 and Africa = 52) were analyzed for DNA sequence polymorphisms. DNA divergence between populations was calculated in area containing more than 30 sequences: Thailand (n = 43), Myanmar (n = 40), and previously published results from Kenya (n = 38). Average nucleotide diversity in Kenya (pi = 0.017) was lower compared with that in Thailand (pi = 0.036) and Myanmar (pi = 0.043). When compared across different continents, the nucleotide diversity in Asia (pi = 0.042) was higher compared with that in Africa (pi = 0.015). A sliding method plot with a window length of 100 bp and a step size of 25 bp using DnaSP v5 revealed a pi value in three different locations (Fig. 1a) and two continents (Fig. 1b). The haplotype diversity of pmcsp was similar across these 3 countries, and ranged from 0.660 to 0.977. However, the ratio of nonsynonymous (dN) to synonymous (dS) substitutions in Kenya (dN/dS = 0.442) was lower compared with that in Thailand (dN/dS = 1.563) and Myanmar (dN/dS = 0.880). The ratio of nonsynonymous (dN) to synonymous (dS) substitutions in Africa (dN/dS = 0.411) was lower compared  with that in Asia (dN/dS = 1.017) ( Table 3). The neutrality test was performed in different populations though Tajima's D, Fu and Li's F* and Fu and Li's D* tests. Analysis based on the non-repeat regions revealed significant differences in Asia and Africa (P < 0.05) for Tajima's D and Fu and Li's F* tests (Table 3). It was indicated that these population groups have negative value reflect a lower frequency alleles than expected under a neutral model, which can result from population size expansion after a recent bottleneck or purifying selection. A total of 538 positions along pmcsp of all 147 P. malariae isolates were used for inferring their relationship with the neighbor-joining method [17]. The bootstrap consensus tree inferred from 1000 replicates which is taken to represent the evolutionary history of those 147 P. malariae isolates analysed. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. Results showed that most of the P. malariae isolates clustered within their respective continents, with the exception of a couple of isolates from Thailand that clustered together with the four isolates from Venezuela and showed closely related to the African isolates (Fig. 2). where r i refer to the repeat number of the sample i, j indexes country and k indexes continent. The repeat numbers of each repeat type from the sample from any country was assumed to be sampled from a Poisson distribution where its average number was λ. These average numbers λ were thought to be varied between countries and also be distributed normally around the average numbers in the continent level (μ) with the standard deviation σ. The model was fitted to the data using the Markov Chain Monte Carlo (MCMC) technique in Win-BUGS [18] (Fig. 3 and Additional file 4).

Sequence diversity in nonrepeat regions: conserved regions I and II, Th2R, Th3R
Sequences of the nonrepeat regions of pmcsp, including conserved regions I and II, Th2R, and Th3R, were  (Table 4). Limited polymorphisms were observed in all 4 conserved regions. Seven amino acid haplotypes were identified in conserved region I. Among these, the haplotype "AVENKLKQP" was the most predominant and was identified in 82.52% of isolates (118 out of 143 isolates). In conserved region II, the haplotype "ITEEWSPCSVTCG" was identified in 93.01% of isolates (133 out of 143 isolates). The Th2R and Th3R domains showed 6 and 9 haplotypes, respectively. The most prevalent haplotypes in Th2R and Th3R were present in 91.61 and 74.83% of isolates, respectively. All polymorphisms in each Th3R haplotype were nonsynonymous. Contrary to Th2R, synonymous substitutions were found in 2 samples. In addition, both N-and C-terminal nonrepeat regions of pmcsp were used for estimating the average level of genetic differentiation between each population. The Fst [19] for all pairwise comparisons between population were significant (P < 0.001) ( Table 5). The Fst value between Thailand and Myanmar (0.087) was lower than that between Thailand and Kenya (0.518) and Myanmar and Kenya (0.366). High differentiation between Asia and Africa was observed (0.404).

Discussion
In this study, perform genetic characterization of pmcsp in 89 isolates collected from Thailand, Myanmar, Lao PDR, and Bangladesh. Plasmodiummalariae isolates collected from African countries show heterogeneity in pmcsp sequence [10]. Similar results have been reported for 38 P. malariae isolates collected from Kenya [12]. A total of 143 pmcsp sequences were analysed, including those obtained from 89 isolates collected in this study along with 38 previously published African isolates. The pmcsp DNA divergence was higher in Thailand and Myanmar compared with that in Kenya, reflecting the   Fig. 3 Average number of the tetrapeptide repeats: NAAG (a) and NDAG (b), between the Asian and African samples overall DNA divergence in Asia which was also higher than that in Africa. This finding is opposed with the divergence that have been studied in P. falciparum csp from Thailand and Myanmar compared to Kenya [20]. The nucleotide diversity in pfcsp was higher in Kenya than that in Thailand and Myanmar [20]. This is likely related to lower transmission intensity of P. falciparum in Thailand and Myanmar than Kenya. High diversity in pmcsp collected from Thailand and Myanmar might related to wide range of collecting sites in each country with various time point of sample collection (year 2002-2016). However, the overall pmcsp gene characteristic of the sample collected from Asia and Africa is restricted to their continents as demonstrated by the phylogenetic tree analysis. It is referred to the genetic differentiation of pmcsp in Asia and Africa. The analysis focused on 2 main parts of csp: 1 central repeat region and 4 nonrepeat regions, including conserved regions I and II, Th2R, and Th3R. The most characteristic feature of the central repeat region in pmcsp is the tetrapeptide repeat unit. Previous studies have identified 2 major types of repeat units: NAAG and NDAG. The NAAG repeat was present in all P. malariae isolates. However, the NAAG repeat found in Asian isolates was   highly polymorphic than previously reported isolates [10,12]. Plasmodium malariae isolates from African countries carried high copy numbers of the NAAG repeat (40-79 copies), whereas isolates from Asian countries carried either low (0-38) or high (40-51) copy number of the NAAG repeat. By contrast, the NDAG repeat was highly conserved with low diversity, and its copy numbers varied from 2 to 9. The NDAG repeat is likely to be a universal repeat.  [21]. The NANP repeat in the central repeat region of pfcsp has been shown to represent an important target of antibodies isolated from individuals with naturally acquired immunity to malaria [22]. The central repeat region of csp is the immunodominant region and each Plasmodium species carries a unique pattern of tetrapeptide repeats in this region. The type and copy number of tetrapeptide units may be dependent on the immune pressure in different malaria endemic regions. Analysis of the nonrepeat regions, Th2R and Th3R, which serve as T cell epitopes revealed 6 and 9 amino acid haplotypes among 143 P. malariae isolates, respectively. Only one major haplotype was identified in both Th2R and Th3R, accounting for 91.61 and 74.83% of isolates, respectively. Amino acid haplotypes in Th2R and Th3R in P. malariae isolates were unrestricted to geographical location and showed low diversity compared with Th2R and Th3R haplotypes of P. falciparum [23] and P. knowlesi [8]. These data suggest that the central repeat region of csp may be responsible for the immune response in which the diversity might result in different levels of host immune system between Asia and Africa. The nonrepeat region from both N-and C-terminal parts of pmcsp were used to estimate genetic differentiation between populations. Pairwise comparison of P. malariae from Asia and Africa revealed high level of differentiation between the two continents, which is in concordance to previous studies on genetic differentiation in P. falciparum and P. vivax from Asia compared to other continents [10,20,24]. To gain a better understanding of the natural distribution of pmcsp polymorphisms, more number of samples from other malaria endemic regions should be investigated. Analysis of the genetic diversity of pmcsp will be valuable for understanding the population structure of P. malariae, which will further help in the development of strategies to eliminate malaria.

Conclusions
This study provides valuable information on the genetic polymorphisms in pmcsp isolates from Asia and advances our understanding of P. malariae population in Asia and Africa. High genetic differentiation between Asia and Africa inferred the different population between these two continents, which might result from different host immunity in each region.