Bioinformatics analysis of calcium-dependent protein kinase 4 (CDPK4) as Toxoplasma gondii vaccine target

Toxoplasma gondii (T. gondii), an obligate intracellular apicomplexan parasite, can infect numerous warm-blooded animals, including humans. Calcium-dependent protein kinases (CDPKs) are essential Ca2+ signaling mediators and participate in parasite host cell egress, outer membrane motility, invasion, and cell division. Several online bioinformatics servers were employed to analyze and predict the important properties of the CDPK4 protein. The CDPK4 protein has 1158 amino acid residues with an average molecular weight (MW) of 126.331 kDa. The aliphatic index and GRAVY of this protein were estimated at 66.82 and −0.650, respectively. The protein comprised 30.14% and 34.97% alpha-helix, 59.84% and 53.54% random coil, and 10.02% and 11.49% extended strand according to the SOPMA and GOR4 tools, respectively. The Ramachandran plot showed 87.87%, 8.40%, and 3.73% of amino acid residues in the favored, allowed, and outlier regions, respectively. In addition, several potential B- and T-cell epitopes were predicted for the CDPK4 protein with different bioinformatics tools, and antigenicity and allergenicity evaluation indicated that this protein is immunogenic and non-allergenic. This paper provides a basis for further studies toward the development of an effective vaccine against T. gondii infection.


Introduction
Toxoplasma gondii (T. gondii), an obligate intracellular parasite, can infect nearly all warm-blooded animals, including humans [1,2]. The current drugs against toxoplasmosis are not effective enough and are associated with serious side effects [3]. Therefore, the discovery and development of an effective vaccine has high priority and is critically required to limit T. gondii infection [4][5][6]. Calcium-dependent protein kinases (CDPKs) are a category of serine/threonine kinases found only in plants and protists such as ciliates and apicomplexan parasites [7,8]. Multiple CDPKs (the most important household proteins) have been identified in apicomplexan protists, especially in T. gondii. The CDPKs are essential Ca2+ signaling mediators and participate in parasite host cell egress, outer membrane motility, invasion, and cell division [9][10][11][12]. The CDPK family has been regarded as a good target for anti-Toxoplasma medications and an appropriate option for the design of vaccines [13][14][15][16][17]. There is no report yet regarding immunization with CDPK4 in experimental animals. However, in several studies, vaccination with CDPK1 [13,18,19], CDPK2 [20], CDPK3 [15,21,22], CDPK5 [14], and CDPK6 [16] induced strong humoral and cellular responses and prolonged the survival time in mouse models.
Predicting epitopes is highly important for determining an antigen's immunogenicity in vaccine development. Bioinformatics tools and online resources enable researchers to predict and recognize the potential B- and T-cell epitopes [23][24][25][26]. Bioinformatics is an interdisciplinary science that analyzes biological data using methods and algorithms from computer science, mathematics, statistics, physics, biology, and medicine [23,26]. Thus, the current study was performed to analyze several important features of the CDPK4 protein using different bioinformatics servers.

Retrieval of CDPK4 protein sequence of T. gondii
First, the complete amino acid sequence of the T. gondii CDPK4 protein was retrieved from the ToxoDB server (https://toxodb.org/toxo/).
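Sequences downloaded from a database such as ToxoDB are typically handled as FASTA text. The following is a minimal sketch of reading such a file in Python; the function name `read_fasta` and the example record are illustrative assumptions, not part of the original workflow, and the residues shown are placeholders rather than the CDPK4 sequence.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Placeholder record: header and residues are invented for illustration only.
example = StringIO(">example_protein hypothetical record\nMKTAYIAK\nQRQISFVK\n")
records = list(read_fasta(example))
header, sequence = records[0]
print(header, len(sequence))
```

In practice the `StringIO` object would be replaced by an open file handle for the downloaded FASTA file.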

Physicochemical evaluations
The physicochemical characteristics of the CDPK4 protein, including its aliphatic index, half-life, theoretical isoelectric point (pI), grand average of hydropathicity (GRAVY), and electric charge distribution, were explored with the ProtParam server [27,28].
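Two of these indices have simple closed forms and can be sketched directly. The code below is an illustrative re-implementation, not the ProtParam source: GRAVY is taken as the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are assumptions introduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    n = len(seq)

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


# Toy sequences chosen only to make the arithmetic easy to follow.
print(gravy("AR"))              # mean of 1.8 and -4.5
print(aliphatic_index("AVIL"))  # each residue contributes 25 mole percent
```

A negative GRAVY indicates an overall hydrophilic protein, and a higher aliphatic index is usually read as greater thermostability.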

Prediction of acylation and phosphorylation sites of CDPK4
In order to predict acylation and phosphorylation sites of CDPK4 protein, CSS-Palm and NetPhos online tools were employed, respectively [29,30].

Transmembrane domains and subcellular location prediction
The transmembrane domains of CDPK4 were examined by the TMHMM server v.2.0. Moreover, the PSORT II server was applied to predict the subcellular position of the CDPK4 protein.

Construction of the 3D model
3D models play a decisive role in the development of vaccines. In this case, the SWISS-MODEL webserver was applied to build the three-dimensional models of the CDPK4 protein through a homology modeling approach [34].

Refining and validating the 3D modeled structure
The most suitable SWISS-MODEL-constructed model was selected and refined with GalaxyRefine to attain high-quality template-based protein predictions [35]. The Ramachandran plot was used to validate the three-dimensional structure of the CDPK4 protein through the SWISS-MODEL software [36]. The quality of the model was checked using ProSA-web [37,38].

Prediction of major histocompatibility complex-I (MHC-I) and MHC-II epitopes
In this study, the IEDB [43] and NetMHCcons 1.1 [44] online services were used to predict the binding affinity of CDPK4 peptides toward MHC class I. Furthermore, the IEDB [45] and NetMHCIIpan 3.2 [46] servers were used to examine the 15-mer T-cell epitopes for the H2-IEd, H2-IAd, and H2-IAb mouse alleles.

Detection of the CTL epitopes
To activate cytotoxic T lymphocytes (CTLs), an antigen must first be presented by MHC-I molecules. The choice of CTL epitopes therefore plays a decisive role in designing a vaccine. To this end, the free web server CTLpred [47] was used.

Initial overview of the protein CDPK4
The amino acid sequence of the CDPK4 protein was retrieved from the ToxoDB server under the accession ID TGME49_237890. The CDPK4 protein includes 1158 amino acid residues with an estimated molecular weight of 126.331 kDa (antigens with a MW below 5-10 kDa are considered weak immunogens) [51], and its theoretical pI was 9.15. The total numbers of negatively and positively charged residues were 145 and 167, respectively. Its half-life was predicted as 30 h, > 20 h, and > 10 h in mammalian reticulocytes (in vitro), yeast cells, and E. coli, respectively. Based on the instability index (58.84), the CDPK4 protein is classified as unstable. The relatively good estimated aliphatic index of 66.82 indicates the thermostability of the protein.
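The charge counts and the stability call above follow simple conventions that can be sketched in a few lines. In ProtParam, negatively charged residues are counted as Asp + Glu and positively charged residues as Arg + Lys, and an instability index above 40 is read as predicting an unstable protein. The function names below are assumptions for illustration, and the toy sequence is not the CDPK4 sequence.

```python
def charged_residue_counts(seq):
    """Return (negative, positive) counts: Asp + Glu and Arg + Lys."""
    negative = sum(seq.count(aa) for aa in "DE")
    positive = sum(seq.count(aa) for aa in "RK")
    return negative, positive


def stability_class(instability_index, cutoff=40.0):
    """Classify a protein from its instability index using the usual cutoff of 40."""
    return "unstable" if instability_index > cutoff else "stable"


# Toy sequence for the counts; the reported CDPK4 index for the classification.
print(charged_residue_counts("DEKRKA"))
print(stability_class(58.84))
```

With the reported counts (145 negative, 167 positive), the excess of basic residues is consistent with the basic theoretical pI of 9.15.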

Prediction of PTM sites of CDPK4
Post-translational modifications (PTMs) have important roles in cellular control processes [52]. Based on the findings, 137 phosphorylation and 21 acylation sites were predicted in the CDPK4 sequence, suggesting that these sites may control several functions of the protein and affect its activity (Fig. 1; Additional file 1: Table S1).

Identification of transmembrane domains and subcellular location
The data obtained from TMHMM server v. 2.0 indicated no transmembrane domain in the CDPK4 sequence (Additional file 2: Figure S1). Furthermore, based on the PSORT II prediction, the subcellular location of CDPK4 is as follows: 82.6% nuclear, 8.7% plasma membrane, 4.3% mitochondrial, and 4.3% cytoskeletal.

Secondary structure assessment
It should be noted that determining the protein secondary structure by introducing special constraints, such as beta-turn or alpha-helix, is a key step in the assessment of the tertiary structure. The findings showed that the CDPK4 protein comprised 30.14% (349/1158) and 34.97% (405/1158) alpha-helix, 59.84% (693/1158) and 53.54% (620/1158) random coil, and 10.02% (116/1158) and 11.49% (133/1158) extended strand according to the SOPMA and GOR4 servers, respectively (Additional file 2: Figures S2 and S3). The findings from the PSIPRED server are depicted in Additional file 2: Figure S4. Alpha-helices and beta-turns located in the internal portions of the protein, with high hydrogen-bond energy, help maintain the protein's structure, resulting in a better interaction with antibodies [38,53]. The principal biological behavior of proteins depends on their spatial structure. Knowledge of protein structures and awareness of the relationships between structures and functions are therefore important [38].
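The percentages above follow directly from the residue counts reported for each predictor. As a small check, assuming only those counts, the composition can be recomputed as follows (the function name `composition` is introduced here for illustration):

```python
def composition(counts):
    """Convert per-state residue counts into percentages rounded to two decimals."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Residue counts as reported in the text for the 1158-residue CDPK4 sequence.
sopma = composition({"helix": 349, "strand": 116, "coil": 693})
gor4 = composition({"helix": 405, "strand": 133, "coil": 620})

print(sopma)
print(gor4)
```

Both sets of counts sum to 1158 residues, so each predictor assigns every residue to exactly one of the three states.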

3D model analysis
Following the analysis, five 3D models were established for the CDPK4 sequence among which, the one with the highest identity was chosen. The chosen template exhibited a 34.99% sequence identity. The SWISS-MODEL results are presented in Additional file 2: Figure S5.

Refinement and validation of the tertiary structure
The GalaxyRefine software was employed to refine the tertiary structure. According to the Ramachandran plot and ProSA-web results, the quality of the three-dimensional structure improved after refinement. Prior to refinement, validation of the protein using the SWISS-MODEL tool showed that 87.87% of residues were situated in favored regions, while 8.40% and 3.73% lay in allowed and outlier regions, respectively, verifying its immunogenic efficiency (Fig. 2c). The post-refinement Ramachandran plot showed that 95.34% of the residues lay within the favored regions, whereas 3.54% were in the allowed regions and only 1.12% were in the outlier regions (Fig. 2d). The Z-score is indicative of the model quality; this parameter was −8.09 in the initial model (based on ProSA-web), and the majority of residues lay in the favored regions (Fig. 2a). Further improvement in the quality of the 3D structure after refinement can also be inferred from the Z-score value (−8.15) (Fig. 2b).
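The effect of refinement can be summarized as the change in each Ramachandran region, in percentage points. The short sketch below uses only the values reported above; the function name `region_shift` is an assumption introduced for illustration.

```python
# Ramachandran region percentages reported before and after refinement.
before = {"favored": 87.87, "allowed": 8.40, "outlier": 3.73}
after = {"favored": 95.34, "allowed": 3.54, "outlier": 1.12}


def region_shift(before, after):
    """Change per region in percentage points (after minus before)."""
    return {region: round(after[region] - before[region], 2) for region in before}


shift = region_shift(before, after)
print(shift)
```

A positive shift in the favored region together with negative shifts in the allowed and outlier regions is the pattern expected from a successful refinement.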

B-cell epitopes prediction
Epitope prediction can offer invaluable data for identifying immunogenic peptides. The linear B-cell epitopes determined by Bcepred are presented in Additional file 1: Table S2, while the results obtained from the ABCpred server are listed in Additional file 1: Table S3. A greater peptide score indicates a higher probability of being an epitope. According to the IEDB online tool, the mean scores of hydrophilicity, antigenicity, beta-turn, BepiPred linear epitope prediction, flexibility, and surface accessibility of the CDPK4 protein are 2.381, 1.016, 1.012, 0.350, 1.013, and 1.000, respectively (Additional file 2: Figure S6). The SVMTriP-derived results are tabulated in Additional file 1: Table S4. The analysis of linear B-cell epitopes demonstrated that the CDPK4 protein contains favorable epitopes and appropriate indices. The prediction accuracy of the Bcepred models based on individual properties ranges from 52.92% to 57.53%. This server also helps to predict B-cell epitopes using physicochemical features [38,39]. Another valuable step in the in silico analysis is the identification of the conformational epitopes needed for antibody-antigen interaction [30]. In this case, the ElliPro tool identified five discontinuous B-cell epitopes (Table 1).

Analysis of MHC-I and MHC-II molecules
Fig. 2 Validation of CDPK4 protein 3D structure using Ramachandran plot. a The Z-score plot for the 3D structure of the predicted protein before refinement with the ProSA-web server. b The Z-score plot for the 3D structure of the predicted protein after refinement with the ProSA-web server. c Ramachandran plot analysis of the initial model using the SWISS-MODEL server showed that 87.87%, 8.40% and 3.73% of residues were located in favored, allowed and outlier regions, respectively. d After refinement, 95.34%, 3.54% and 1.12% of residues were located in favored, allowed and outlier regions, respectively.

The binding of peptides to MHC molecules is an important step in the presentation of antigens to T-cells. In general, lower percentile ranks (or IC50 values) indicate higher affinity, which represents better T-cell epitopes, and vice versa [28]. Based on the bioinformatics analyses, CDPK4 T-cell epitopes could strongly bind to MHC-I and MHC-II molecules. Since T. gondii is an intracellular protozoan, T-cell-mediated cellular immunity has a pivotal role against this microorganism [54]. For the development of an effective vaccine against T. gondii, it is therefore essential to determine which type of T-cell-mediated immune response is involved [38,54].
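The selection rule described in this section (a lower percentile rank means stronger predicted binding) can be sketched as a simple filter and sort. Everything in the example is hypothetical: the `Prediction` type, the `top_binders` function, the cutoff of 10, and the placeholder peptides and ranks, which are not results from this study.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    allele: str
    peptide: str
    percentile_rank: float


def top_binders(predictions, max_rank, n=None):
    """Keep predictions at or below max_rank, best (lowest rank) first."""
    kept = sorted(
        (p for p in predictions if p.percentile_rank <= max_rank),
        key=lambda p: p.percentile_rank,
    )
    return kept if n is None else kept[:n]


# Placeholder 15-mers and ranks, invented purely to illustrate the ordering.
rows = [
    Prediction("H2-IAb", "AAAAAAAAAAAAAAA", 12.5),
    Prediction("H2-IAd", "GGGGGGGGGGGGGGG", 0.8),
    Prediction("H2-IEd", "SSSSSSSSSSSSSSS", 4.2),
]

for p in top_binders(rows, max_rank=10.0):
    print(p.allele, p.peptide, p.percentile_rank)
```

The same pattern applies when ranking by IC50 instead of percentile rank, since in both cases smaller values indicate stronger predicted binding.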

Prediction of the CTL epitopes
The CTLpred server was used to select the 10 highest-ranking CTL epitopes. The details are given in Additional file 1: Table S9.

Allergenicity, immunogenicity, and solubility analysis
The CDPK4 protein could exhibit high immunogenicity, as its antigenicity scores were 0.780 and 0.622 according to the ANTIGENpro and VaxiJen servers, respectively. The AllerTOP and AlgPred servers suggest that this protein is non-allergenic. Determining allergenicity is important to make sure that vaccine candidates are low in allergenicity [38]. Based on the SOLpro output, the solubility of the CDPK4 protein was determined as 0.7087.

Conclusion
This paper provided a detailed description of the fundamental aspects of the CDPK4 protein, such as physicochemical characteristics, transmembrane domains, secondary and tertiary structures, B- and T-cell epitopes, and other features, using bioinformatics servers. Based on the findings, the CDPK4 protein showed an acceptable antigenicity score. This protein also contains many good B- and T-cell epitopes, suggesting that CDPK4 can be considered an appropriate vaccine candidate against T. gondii. This research presented important fundamental and theoretical evidence for further in vivo investigations of the CDPK4 protein to establish an effective vaccine against acute and chronic T. gondii infection.

Limitations
In this paper, only in silico analysis was performed. More studies are recommended for the development of an effective vaccine in vivo using the CDPK4 alone or combined with other antigens in the future. Also, a combination of immunodominant CDPK4 epitopes with various adjuvants and formulations will be useful.
Additional file 1: Table S1. The acylation sites of CDPK4 sequence.
Additional file 2: Figure S1. (A) Exp number, first 60 AAs: The expected number of amino acids in transmembrane helices in the first 60 amino acids of the protein. If this number is more than a few, you should be warned that a predicted transmembrane helix in the N-term could be a signal peptide; Total prob of N-in: The total probability that the N-term is on the cytoplasmic side of the membrane; (B) Analysis of the transmembrane domains of CDPK4. Figure S2. (A) The results of the GOR4 server suggested that CDPK4 contains 34.97% alpha helix (Hh), 11.49% extended strand (Ee) and 53.54% random coils (Cc) in secondary structure; (B) Graphical finding from prediction of secondary structure of CDPK4 using GOR4. Figure S3. (A) The results of the SOPMA server suggested that CDPK4 contains 30.14% alpha helix (Hh), 10.02% extended strand (Ee) and 59.84% random coils (Cc) in secondary structure; (B) Graphical finding from prediction of secondary structure of CDPK4 using SOPMA server.
On the graphs, the Y-axes indicate the corresponding score for each residue (averaged in the specified window), while the X-axes indicate the residue positions in the sequence. A higher residue score can be interpreted as a higher likelihood that the residue is part of an epitope (yellow color on the graphs). Green color (under the threshold) shows the unfavorable regions with respect to the properties of interest.