Worldwide Genetic Variability of the Duffy Binding Protein: Insights into Plasmodium vivax Vaccine Development

The dependence of Plasmodium vivax on invasion mediated by Duffy binding protein (DBP) makes this protein a prime candidate for development of a vaccine. However, the development of a DBP-based vaccine might be hampered by the high variability of the protein ligand (DBPII), known to bias the immune response toward a specific DBP variant. Here, the hypothesis being investigated is that the analysis of the worldwide DBPII sequences will allow us to determine the minimum number of haplotypes (MNH) to be included in a DBP-based vaccine of broad coverage. For that, all DBPII sequences available were compiled and MNH was based on the most frequent nonsynonymous single nucleotide polymorphisms, the majority mapped on B and T cell epitopes. A preliminary analysis of DBPII genetic diversity from eight malaria-endemic countries estimated that a number between two to six DBP haplotypes (17 in total) would target at least 50% of parasite population circulating in each endemic region. Aiming to avoid region-specific haplotypes, we next analyzed the MNH that broadly cover worldwide parasite population. The results demonstrated that seven haplotypes would be required to cover around 60% of DBPII sequences available. Trying to validate these selected haplotypes per country, we found that five out of the eight countries will be covered by the MNH (67% of parasite populations, range 48–84%). In addition, to identify related subgroups of DBPII sequences we used a Bayesian clustering algorithm. The algorithm grouped all DBPII sequences in six populations that were independent of geographic origin, with ancestral populations present in different proportions in each country. In conclusion, in this first attempt to undertake a global analysis about DBPII variability, the results suggest that the development of DBP-based vaccine should consider multi-haplotype strategies; otherwise a putative P. vivax vaccine may not target some parasite populations.


Introduction
After more than a century of development of malaria control measures, Plasmodium vivax remains more widely distributed than Plasmodium falciparum and it is a potential cause of morbidity and mortality. Around 2.85 billion people live at risk of infection by P. vivax, with the greatest burden occurring in Middle East, Asia, the Western Pacific, Central and South America [1,2]. Although neglected, P. vivax causes important socioeconomic loss, with the overall global cost of vivax infection estimated as being between US$1.4-4.0 billion per year [3]. The recent emergence of drugresistant strains and severe (sometimes fatal) disease challenges the traditional view of P. vivax malaria as a benign infection [4,5]. Consequently, the malaria eradication research agenda (MalERA) placed the P. vivax in the top list of priorities [www.ploscollections. org/malERA2011].
The complex life cycle of the Plasmodium includes an erythrocytic phase that is responsible for clinical symptoms of human malaria. While P. falciparum invades mature as well immature red blood cells (RBC) through multiple invasion pathways, P. vivax invades preferentially reticulocytes [6] and requires mainly the interaction of parasite ligand to Duffy antigen/ receptor for chemokines (DARC) on RBC membrane [7]. The P. vivax ligand is a 140 kDa micronemal type I membrane protein, called the Duffy binding protein (DBP), and gene-deletion experiment showed that DBP plays an important role in the irreversible junction of the merozoite with host erythrocytes, a key step of human infection [8]. Cysteine-rich region II of the DBP (DBP II ) comprises erythrocyte binding motif known as Duffybinding-like domain (DBL) [9], which is also found in other erythrocyte binding proteins (erythrocyte binding antigen 175 -EBA-175, EBA-140 and EBA-181) and in cytoadherent proteins (Plasmodium falciparum erythrocyte membrane protein 1) [10]. The crystal structure of the orthologous DBP ligand domain of the simian malaria Plasmodium knowlesi provided insight into the molecular basis for receptor recognition of the PvDBP. The proposed DARC-recognition site of DBP lies in a solventaccessible groove on a fairly flat surface and exposed site for DARC recognition in subdomain 2 of DBP II [11]. Recently, Bolton and Garry (2011) demonstrated an additional region on subdomain 1 of DBP II that might be also necessary for DARC binding [12].
DBP II is an important anti-P.vivax vaccine candidate since antibodies against this molecule: (i) inhibit in vitro their binding to DARC; (ii) reduce merozoite invasion of human erythrocytes; and (iii) confer protection against blood-stage infection [13,14,15,16]. The important role of DBP-DARC interaction is reinforced by previous studies that showed individuals without DARC on their erythrocytes surface are highly resistant to P. vivax invasion [17,18]. In addition, studies developed in Brazil and Papua New Guinea showed reduced susceptibility to P. vivax infection in heterozygous carriers of one DARC-negative allele compared to two DARC-positive allele carriers [19,20]. However, the paradigm of the absolute dependence on the presence of Duffy on the red cell for P. vivax infection has been recently questioned for some findings indicating that P. vivax can infect and cause disease in Duffy-negative people [21,22,23]. Nevertheless, this situation seems to occur in specific areas and/or a small proportion of the populations, thus the epidemiological importance of this alternative pathway seem to be restricted to specific endemic areas, such as Madagascar [23].
Analysis of genetic variability from field parasites showed that the DBP binding domain (region II) is highly polymorphic [24,25,26,27,28,29,30,31], therefore it might hamper the vaccine development as some variable residues alter immune recognition of protein [32,33,34]. The excess of non-synonymous substitutions observed in DBP II is consistent with the hypothesis of positive selective pressure acting on this protein domain, and suggests allelic variation as a mechanism of immune evasion [35]. As antigenic variation presents a major limitation in successful vaccine design, we analyzed all DBP II sequences recorded in GenBank in order to identify the main haplotypes shared among P. vivax isolates from different malaria-endemic areas and undertake a detailed analysis of the nucleotide diversity of the dbpII gene. Here we show the need to include a minimum number of DBP II haplotypes in a DBP-based vaccine; otherwise no broad coverage against worldwide P. vivax isolates might be reached.

Data Analysis
Genetic diversity analysis. DBP II sequences were aligned and compared using the Clustal W multiple alignment algorithm in BioEdit Sequence Alignment editor [36] to identify the single nucleotide polymorphisms (SNPs). Gaps were removed from alignments because indels (insertions/deletions) and repeats evolve by different mechanisms than SNPs and might result in false estimates of biologically significant diversity. The number of segregating sites (S), haplotypes (H), nucleotide diversity (p), haplotype diversity (Hd), and the corresponding standard deviations (SD) were estimated using DnaSP 5.10 software [37]. Between-population differentiation using the pairwise fixation index F ST was measured with Arlequin 3.5 software [38]. Haplotype construction. Haplotypes (combinations of nucleotides with no particular weight placed upon any position) were constructed by using DnaSP 5.10. We removed synonymous SNPs to focus the analysis only on the protein diversity. Cluster (population) analysis. To determine whether our sample could be grouped into genetic clusters and to infer the number of clusters (K) that best fit the data, we used the Bayesian clustering method implemented in the Structure 2.3 software [39,40]. Structure was run 10 times for K = 1-10 for 30,000 Monte Carlo Markov Chain (MCMC) iterations after burn-in period of 10,000 using the admixture model and correlated allele frequencies. We did not use prior information about population origin for each individual (USEPOPINFO = 0). The mean log probability of the data (Ln P[D]) and its standard deviation was plotted to predict the optimal value for K. Graphs of Structure results were produced by using the DISTRUCT program [41].

Polymorphism and genetic differentiation
We compiled 511 sequences from GenBank for the gene fragment encoding Duffy binding protein region II (DBP II ). The population dataset included sequences from the natural parasite populations of eight countries (Table 1): Brazil [35], Colombia [25], India and Iran [30], Papua New Guinea [27,42,43], South Korea [28], Sri Lanka [31] and Thailand [44]. The average of sample was 64 sequences (ranged from 11 to 123) per country. By analyzing a region of the DBP II of 676 bp that is available for the overall dataset, 127 polymorphic sites were identified (ranged from 16 to 73 per country) with a nucleotide diversity (p) varying between 0.006 and 0.0109 (South Korea and Thailand, respectively). Most of these SNPs (55%) are singletons, i.e. observed only in one sequence. Despite the wide range of number of haplotypes per country (ranged from 9 to 73, mean of 24), the levels of haplotype diversity (Hd) among them were equally high and quite similar (ranged from 0.922 to 0.993 in Sri Lanka and Colombia, respectively). In order to remove most potential sequencing errors that could interfere with the analysis and interpretation of the results, additional analyses were performed excluding singleton polymorphisms (Table S1). By comparing both analyses a significant bias could be detected in the number of segregating sites and haplotypes in South Korea and PNG samples. However, nucleotide (p) and haplotype diversity (Hd) were not significantly affected by this further analysis because these diversity parameters exclude polymorphisms that are present at low frequencies.
To determine how the observed diversity was distributed among geographic regions, population structure was inferred by measuring genetic differentiation among countries (F ST ). The highest differentiation was identified between Colombia and South Korea (F ST = 0.384) and the lowest differentiation was detected between Iran and Brazil (F ST = 20.011) ( Table 1). We repeated the analyses for F ST using the dataset in which singletons were excluded and no significant differences were observed for F ST values among countries (Table S1).

Haplotype diversity
To focus the analysis on the putative antigenic diversity (i.e. polymorphisms that change protein sequence), the nonsynon-ymous single nucleotide polymorphism (nsSNP) haplotypes were derived for DBP II sequences. In total 46 nsSNPs were identified among all sequences with most of them being rare (19 nsSNPs with frequency #1% and 17 with frequency between 1-10%). Only 10 nsSNPs showed allele frequency above 10% in the whole world (R308S, K371E, G384D, E385K, K386N, H390R, N417K, L424I, W437R, and I503K). All but one nsSNP were found at the eight countries with DBP genetic diversity data available, except for the SNP R308S that was absent in Colombia and South Korea ( Table 2). Moreover we investigated if these polymorphisms were localized in regions previously identified or predicted as T-or Bcell epitopes. All but one nsSNP (K371E) are in regions predicted or experimentally identified as epitope in the DBP II (Table 2).
In further analysis, those 10 most frequent polymorphisms were used to build predominant DBP II haplotypes circulating in malaria-endemic areas. Seventy-three haplotypes were defined for the whole dataset, ranging from 7 to 29 haplotypes per country (Table S2). In order to determine a minimum number of DBP II haplotypes (MNH) required to be included in anti-P. vivax vaccine, we next sought to identify the number of haplotypes per country able to cover at least 50% of local parasite population. In the whole dataset, 17 out of 73 haplotypes fitted this criterion: two haplotypes in South Korea and Sri Lanka; three in Papua New Guinea, Iran and India; four haplotypes in Brazil and Colombia and six haplotypes in Thailand ( Figure 1 and Table S2). Together, the 17 haplotypes covers about 70% of worldwide parasite population. Among them, seven were shared between two or more areas, being one haplotype (Hap 23, colored in red) found in high frequency in four countries from two different continents (America and Asia). This result agrees with those from F ST analysis, which estimated low genetic differentiation between DBP II sequences from Brazil and Iran, India or Sri Lanka (Table 1, Figure 1 and    Table S2). The reference strain Sal-1 sequence, which has being used to develop a DBP II -based vaccine, covers only 10% of worldwide samples (Hap 12, colored in purple in Figure 1). So far, Sal-1 DBP II was detected only in three endemic areas (Brazil, India and PNG), being in very low frequency in other countries ( Figure 1 and Table S2).
Concerning a more global approach for DBP-based vaccine, we further sought to determine the MNH that will be able to cover the majority (,50%) of worldwide parasite population indepen-dent of the region of origin. Seven haplotypes fitted this criterion and were found in 60% of 511 DBP II sequences of P. vivax deposited in GenBank (Haplotypes 4, 12, 23, 24, 44, 59, 64 of Table S2). Considering the distribution of these 7 selected haplotypes by locality, it was possible to categorize those localities in two groups ( Figure 2). The first group includes Sri Lanka, Iran, Brazil, India and PNG, where around 50% of the parasites population will be covered by these haplotypes. The second group includes South Korea, Thailand and Colombia, where about only 24% (range 12-33%) of parasite isolates will be covered by these seven selected haplotypes.

Clustering
Aiming to identify related subgroups of DBP II sequences among parasites circulating in the study countries, individual samples were clustered to population on the basis of their genotypes, independent of their geographic location. For that, we used the clustering method that uses departures from Hardy-Weinberg equilibrium to detect population structure. The algorithm groups related individuals into a predefined number of clusters (K), herein K = 1-10. A Bayesian approach is taken to infer the K value that provides the best fit to the data as measured by the log-likelihood score. Each individual is then assigned a membership coefficient (Q) to each of the clusters with majority of the haplotypes being assigned to only one cluster at ''true'' K. The estimated log probability of our data [Ln P(D)] plateaued between K = 4-6 ( Figure 3A). Simulation studies have shown that once the real K has been reached, Ln P(D) will typically plateau or continue to increase slightly, indicating that K = 6 provides the best fit to our data. We show clustering results for K values of 2-6, being each individual represented by a vertical line, and each ancestral population in a different color ( Figure 3B). To determine whether the above-defined subgroups were geographically restricted we   Table S2). doi:10.1371/journal.pone.0022944.g002 plotted the average Q for each country. For all countries but South Korea the analysis supported low levels of differentiation among geographical regions, with three to six ancestral populations present in each country (Figure 4).

Discussion
The major pathway used by P. vivax to invade human reticulocytes depends on the interaction of the DBP with its cognate receptor, which makes DBP a high priority anti-P. vivax vaccine candidate. A significant challenge for the development of DBP-based vaccines is its highly polymorphic nature, which can biases antibodies response toward a specific DBP variant [32,33,34]. Engineering vaccines that combine all potential haplotypes that might be circulating in malaria-endemic areas could circumvent polymorphism's limitation; however, it is not feasible for a highly polymorphic antigen like DBP. Therefore, the purpose of this study was to identify predominant DBP II haplotypes that could be included in a putative vaccine, with potential to induce an immune response against P. vivax circulating around the world.
By comparing the DBP II diversity found in different countries worldwide, we demonstrated high levels of haplotype diversity among P. vivax isolates. The profile of DBP II genetic diversity was not related with the levels of malaria endemicity, since similar pattern was observed from areas with low and unstable malaria transmission, such as Brazil, as well as from highly endemic areas such as Papua New Guinea. These findings suggested that recombination plays an important role in determining the haplotype structure of DBP II , as we recently demonstrated [35]. Genetic recombination of parasites takes place in the vector, as part of the Plasmodium life cycle, and is likely facilitated by multiplicity of infections, i.e. the simultaneous infection of a host by more than one parasite variant. For P. falciparum, the levels of multiplicity of infection are partially correlated with the levels of transmission intensities [45]. Nevertheless for P. vivax, strikingly high values of multiplicity of infection are reported even in regions of low endemicity such as Brazil and Thailand. Thus, the proportion of multiple-clone P. vivax infections, estimated by using microsatellites analyses, range from 49-57% in Brazil [46,47], 10-47% in India [48,49], 9-60% in Sri Lanka [50] and 52-63% in Thailand [48,49]. This difference in pattern of multiplicity of infections between P. vivax and P. falciparum could be related with specific biological features of the P. vivax parasite, such as earlier gametocytogenesis [51] and relapse [52]. Early gametocytogenesis might allow for a more efficient transmission to the mosquito vector before symptoms appear and, thus, before drug treatment is initiated, while relapses would also enhance transmission and increase the probability of detecting mixed infections as further inoculations occur.
Besides the role of recombination we have recently demonstrated that natural selection has an important role in the generation and maintenance of the genetic diversity of DBP II [35]. The maintenance of this variability by diversifying selection presumably helps parasite to evade the host immune recognition and favors a low-medium frequency of distinct haplotypes. As we can expect due differences in the endemicity spectrum and immune response profile, both recombination and selection seem to be acting differentially among distinct geographical areas. If no recombination is assumed, we would expect that the number of described polymorphic sites observed would be arranged into a maximum of n+1 haplotypes. Here, this profile was found in the majority of studied countries, some of them with the number of haplotypes lower than the number of segregating sites (South Korea and Iran). However, it is important to highlight that the small number of DBP II sequences available for those two countries  might biased the number of haplotypes observed for those regions; specially, because there was a significant correlation between the number of haplotypes and sample size (Spearman correlation coefficient: r s = 0.6628, P = 0.0139). Interestingly, in two countries, Brazil and Sri Lanka, the number of haplotypes was higher than the number of segregating sites, suggesting a major role of recombination in these areas. At this time, it is not possible to conclude if the differences on haplotype number could be consequence of different recombination rates or due to the different levels of selective pressure.
For the purpose of rationalizing vaccine design will be necessary to define polymorphisms that represent antigenically distinct haplotypes. Here, we selected the most frequent nonsynonymous single nucleotide polymorphisms (nsSNPs) to derive DBP II haplotypes. Nine out of 10 nsSNPs lay in regions of DBP II that are immunologically relevant, mapping on previously defined Tand B-cell epitopes [32,35,53,54,55,56]. Based on these nsSNPs, we identified a number between two and six DBP II haplotypes, which will be required to cover 50% of parasite population in each studied country. Unfortunately, most of those haplotypes were geographically restricted and we could not find a single highfrequency haplotype covering all endemic areas. Of note, the DBP II Sal-1 variant that is currently being used to develop a DBP II -based vaccine was found so far in low frequency (10%) and it seems to be restricted to specific geographic areas. Aiming to avoid geographically restrict DBP II haplotypes, in the next analysis we determined the MNH that covers at least fifty percent of DBP II sequences available in GenBank independent of the region of origin. By using this approach, seven frequent haplotypes were identified, that broadly cover parasite populations from 5 out of 8 endemic-countries, i.e., Sri Lanka, Iran, Brazil, India and PNG. These same selected haplotypes were not able to cover parasite populations from South Korea, Colombia and Thailand; however, the results for South Korea and Colombia must be interpreted with caution because of the limited number of sequences available. Together, these results show that the development of DBP-based vaccine should considered multi-haplotype strategies. Similar strategy has been proposed to another polymorphic malaria vaccine candidate, the P. falciparum Apical Membrane Antigen 1 (PfAMA-1) [57]. By using a similar approach to that used here, the authors demonstrated that a worldwide collection of PfAMA1 haplotypes could be clustered into six populations that were independent of geographic location. Because continuous culture of P. falciparum has been successfully automated for over 30 years [58], the same authors were able to demonstrate that immunization with one member of PfAMA-1 elicited antibodies that block in vitro parasite invasion against the same subgroup [57]. Altogether, these two studies provide an important proof of concept that vaccine against malaria polymorphic antigens should consider multi-haplotype strategies. Now it remains to be determined whether those DBP II haplotypes described here could elicit broadly immune response. Although essential, those studies will be a difficult task due to limitation on P. vivax culture. Recent progress in short-term culture of P. vivax field isolates [59,60] and in vitro DBP II -DARC binding assays [15,61] present new opportunities to improve understanding of the ability of DBP II haplotypes to elicit cross-reactive responses against those that are genetically similar. In addition, these assays might help to define which polymorphisms determine antigenically different DBP II haplotypes.
An additional consideration in the malaria vaccines design is the geographic structuring of the parasite populations because significant variation among regions would suggest a need for vaccines to be tailored accordingly. Overall, we detected low to medium differentiation between countries, except to sequences from Colombia and South Korea, which showed high differentiation compared to the other regions. Remarkably, the four haplotypes that covers more than half Colombian parasites were not shared with any other parasite population, even with the Brazilian, another Latin America parasite population. The contrast among-region structuring was confirmed by F ST analysis with whole data set and clustering analysis using the STRUC-TURE software with the ten selected nsSNPs. The Bayesian clustering tool identified six subgroups of related DBP II sequences in the worldwide parasite population. Nevertheless, clustering was not related to the geographical origin of P. vivax isolates and all subgroups were present in the eight endemic countries studied, although in different proportions. Moreover, the large majority of isolates (.75%) was formed predominantly by only one ancestral population. Similar pattern was described for P. falciparum merozoite antigens in a meta-population analysis [62]. In that study, the merozoite antigens generally had lower levels of differentiation among countries. In addition, cluster analysis suggested that haplotypes formed subgroups independent of geographic origin with high levels of diversity within population. Thus, to improve probability of broad efficacy, vaccines based on merozoite antigens as DBP II may incorporate predominant haplotypes from different regions. Further analysis will be required to define the stability of the haplotypes in a region over time, as natural fluctuations can result from frequency dependent selection or new haplotypes can appear as result of recombination. For DBP, a single study so far has investigated the dynamic of asymptomatic P. vivax infections in a cohort of PNG children [63]. By following those children over six-month period, it was possible to demonstrate that the DBP II dominant haplotypes remained relatively stable throughout the study. However, it is difficult to determine the period of time necessary to observe such changes.
The present study comprises the first attempt to undertake a global analysis about DBP II variability and provides new insights on future design of a broad-spectrum P. vivax vaccine. Overcome the limitation of DBP diversity may require the inclusion of representative haplotypes otherwise a large proportion of the P. vivax population will be not target.

Supporting Information
Table S1 Estimates of genetic diversity and differentiation for PvDBP II encoding gene among P. vivax isolates in the absence of singleton sequences. (DOC)