Analysis of Structures and Epitopes of Surface Antigen Glycoproteins Expressed in Bradyzoites of Toxoplasma gondii

Toxoplasma gondii is a protozoan parasite capable of infecting humans and animals. The surface antigen glycoproteins SAG2C, -2D, -2X, and -2Y are expressed on the surface of bradyzoites and have been shown to protect bradyzoites against immune responses during chronic infection. We studied the structures of the SAG2C, -2D, -2X, and -2Y proteins using bioinformatics methods. Protein sequences were aligned with the T-Coffee method. Secondary structures and functional domains were predicted using PSIPRED v3.0 and SMART, and 3D models of the proteins were constructed and compared using the I-TASSER server, VMD, and Swiss-PdbViewer. Our results showed that SAG2C, -2D, -2X, and -2Y are highly homologous proteins that share conserved peptides and HLA-I restricted epitopes. The similarity in structure and domains suggests putative common functions that might stimulate similar immune responses in hosts. The conserved peptides and HLA-restricted epitopes could provide important insights for vaccine studies and the diagnosis of this disease.


Introduction
Toxoplasma gondii (T. gondii) is a species of parasitic protozoa in the genus Toxoplasma that can be carried by many warm-blooded animals, including humans [1]. There are three infectious stages in the complex life cycle of T. gondii: the tachyzoites, the bradyzoites, and the sporozoites [2]. The bradyzoite is a slowly replicating form of the parasite that is responsible for chronic T. gondii infection [3]. In chronic toxoplasmosis, the parasitophorous vacuoles containing the reproductive bradyzoites form cysts in muscle and brain tissue [4].
The surface antigens of T. gondii, which play roles in host cell attachment and host immune evasion, are dominated by the SRS (SAG1-related sequence) family of proteins, which includes the SAG1-like sequence branch and the SAG2-like sequence branch [5]. SRS proteins are expressed in a stage-specific manner. SAG1, SAG2A, SAG2B, SAG3, SRS1, SRS2, and SRS3 are mainly expressed on the tachyzoite surface [6]. Studies have indicated that SAG2 members participate in parasite invasion of the host and that antibodies against them could block further attachment of T. gondii to host cells [7,8]. Previous studies have demonstrated that T. gondii parasites with a deletion of the SAG2C, -2D, -2X, and -2Y gene cluster are less capable of maintaining a chronic infection in the brain [9]. This finding indicated that SAG2CDXY are important for the persistence of cysts in the brain and that these antigens might protect bradyzoites against an immune response. In contrast to SAG2A and SAG2B, which are expressed in tachyzoites, SAG2C, -2D, -2X, and -2Y appear to be expressed exclusively on the surface of bradyzoites [9,10]. However, among the 160 members of the SRS family, structures have been reported for only three proteins: (i) the tachyzoite-expressed SAG1 [11], (ii) the bradyzoite-expressed BSR4 [12], and (iii) the SporoSAG [13]. The structures and functional domains of SAG2C, -2D, -2X, and -2Y remain unclear.
In this study, we sought to predict the structures and functional domains of SAG2C, -2D, -2X, and -2Y by bioinformatics methods. The protein sequence alignments were performed by the T-Coffee method. Secondary structures and functional domains were predicted using PSIPRED v3.0 and SMART. The 3D structure model of each protein was built using the I-TASSER server. The structural similarities of these proteins were summarized, and possible functions of some key amino acids were predicted from their spatial conformation using VMD and Swiss-PdbViewer. Furthermore, HLA-restricted epitopes of the SAG2C, -2D, -2X, and -2Y proteins were predicted via algorithms.
T-Coffee [14,15] was used to align SAG2C, SAG2D, SAG2X, and SAG2Y. The secondary structures were predicted using PSIPRED v3.0 (http://bioinf.cs.ucl.ac.uk/psipred/) [16,17]. Domains were identified and annotated with the web-based Simple Modular Architecture Research Tool, SMART (http://smart.embl-heidelberg.de/) [18].
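As an illustration of how pairwise identity can be read off a multiple alignment such as the one produced by T-Coffee, a minimal Python sketch follows. The function names and the toy sequences are hypothetical and are not taken from SAG2C, -2D, -2X, or -2Y. Identity is computed here over the columns in which both sequences carry a residue, which is one of several common conventions and is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_table(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise percent identity for every pair of aligned sequences."""
    return {
        (m, n): percent_identity(aligned[m], aligned[n])
        for m, n in combinations(aligned, 2)
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = {"seqA": "MK-TLV", "seqB": "MKATLI", "seqC": "MR-TLV"}
toy_table = identity_table(toy_alignment)
```

Applied to the gapped sequences exported from an alignment, `identity_table` gives one value per protein pair, which is the kind of summary used when describing the proteins as highly homologous.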

Data
The 3D models of the proteins were constructed by I-TASSER, a protein structure prediction server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/) that is considered suitable for predicting the 3D structures of proteins longer than 100 amino acids [19][20][21]. VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3D graphics and built-in scripting (http://www.ks.uiuc.edu/Research/vmd/). VMD was used to read standard Protein Data Bank (PDB) files and display the structures they contain [22][23][24][25]. Swiss-PdbViewer (http://www.expasy.org/spdbv/) is an application with a user-friendly interface that allows several proteins to be analyzed at the same time. The proteins can be superimposed to obtain structural alignments and compare their active domains. We deduced amino acid mutations, H bonds, angles, and distances between atoms from its graphical and menu interface. 3D structural fit analysis was performed for the pairs SAG2C and -2D, and SAG2X and -2Y [22,23].
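The distances and angles read from a graphical interface correspond to simple geometric quantities on atomic coordinates. The following Python sketch shows one way to compute them. The function names and the example coordinates are illustrative assumptions and are not values from the models in this study.

```python
import math
from typing import Sequence


def distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two atoms given as (x, y, z) coordinates."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def angle_deg(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle in degrees at atom b, formed by the atoms a-b-c."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(y * y for y in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("atoms a and c must differ from atom b")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates in angstroms, for illustration only.
atom_a = (1.0, 0.0, 0.0)
atom_b = (0.0, 0.0, 0.0)
atom_c = (0.0, 1.0, 0.0)
example_distance = distance(atom_a, atom_c)
example_angle = angle_deg(atom_a, atom_b, atom_c)
```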

Conserved HLA-Restricted Epitopes Prediction.
Consensus methods including ANN, SMM, and CombLib-Sidney in the Immune Epitope Database (IEDB; http://www.immuneepitope.org/) were used to predict HLA-restricted epitopes [26][27][28]. We used this tool to determine the ability of each peptide sequence to bind to a specific HLA class I molecule.
Furthermore, we used SMART to identify the domains of these proteins (Figure 3). SAG2C, SAG2X, and SAG2Y all have two domains, while SAG2D has only one. SAG2D has an insertion of an adenosine, causing a frame shift and a premature stop codon, presumably leading to a truncated protein. SAG2C and SAG2D have transmembrane segments, while no transmembrane segments were identified in SAG2X and SAG2Y. Figure 3 shows that these proteins have no signal peptides, indicating that they are mature proteins. Members of the SAG2 family also differ in open reading frame size: the smaller SAG2D protein consists of only one SAG domain, whereas SAG2C, SAG2X, and SAG2Y contain two SAG domains interrupted by a single intron. Thus, SAG2C, SAG2X, and SAG2Y have similar domain architectures, whereas SAG2D has only one domain.

Construction of 3D Model for SAG2C, -2D, -2X, and -2Y Proteins.
3D models of the SAG2C, -2D, -2X, and -2Y proteins were constructed by the I-TASSER server. Five models were generated for each protein by Dr. Zhang's lab [19]. We selected the model with the highest C-score, a confidence score with which I-TASSER estimates the quality of predicted models. It is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations [20]. The C-score is typically in the range of [−5, 2], and a higher C-score indicates a model with higher confidence. Low-temperature replicas (decoys) generated during the simulation were clustered by SPICKER, and the top five cluster centroids were selected to generate full atomic protein models. The cluster density was defined as the number of structure decoys per unit of space in the SPICKER cluster. A higher cluster density means that the structure occurs more often in the simulation trajectory and is therefore a better quality model. Table 2 shows the parameters for construction of the 3D model of each protein.
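The selection rule described above, choosing among the candidate models the one with the highest C-score, can be sketched in a few lines of Python. The record type, the function names, and all numeric values below are hypothetical and serve only as an illustration; they are not the values reported in Table 2.

```python
from typing import NamedTuple, Sequence


class Model(NamedTuple):
    name: str
    c_score: float          # confidence score of the model
    cluster_density: float  # decoys per unit of space in the cluster


def in_typical_range(c_score: float, low: float = -5.0, high: float = 2.0) -> bool:
    """True if the C-score lies in the typical range [-5, 2]."""
    return low <= c_score <= high


def best_model(models: Sequence[Model]) -> Model:
    """Return the model with the highest C-score."""
    if not models:
        raise ValueError("no models to choose from")
    return max(models, key=lambda m: m.c_score)


# Hypothetical candidate models for one protein (illustrative values only).
candidates = [
    Model("model1", -0.8, 0.12),
    Model("model2", -1.9, 0.05),
    Model("model3", -2.4, 0.03),
]
chosen = best_model(candidates)
```

In this toy example the model with the highest C-score also has the highest cluster density, in line with the statement that denser clusters correspond to better quality models.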
The best model of each protein was selected and viewed in VMD (Figure 4). SAG2C, -2X, and -2Y have two obvious domains, D1 and D2, which are formed by two β-strands separated by one α-helix; SAG2D has one domain, which is formed by one β-strand separated by one α-helix. The β-strands rotate to form a sheet tube, a common feature of these proteins. Furthermore, the binding sites of residues in the models were predicted and are shown in Table 3.
Figure 3: Prediction of protein domains. The web-based tool SMART was used to identify the domains of these proteins: transmembrane segments predicted by the TMHMM2 program (blue), segments of low compositional complexity determined by the SEG program (purple), signal peptides determined by the SignalP program (red), and domains (gray).
Figure 4: The 3D models of SAG2C, -2D, -2X, and -2Y. The protein sequences were submitted to Dr. Zhang's lab through the website http://zhanglab.ccmb.med.umich.edu/I-TASSER/. The 3D model with the highest score for each protein was selected. The models were viewed in VMD with the secondary structure coloring method (yellow: β-strands, purple: α-helix, gray: coil) and the new cartoon drawing method. The domains of each model are shown in sheet form.
Table 2 footnotes: a C-score is a confidence score for estimating the quality of models predicted by I-TASSER. The C-score is typically in the range of [−5, 2], where a higher value signifies a model with higher confidence and vice versa. b TM-score and RMSD are known standards for measuring structural similarity between two structures and are usually used to measure the accuracy of structure modeling when the native structure is known. c Number of decoys represents the number of structural decoys used in generating each model. d Cluster density represents the density of the cluster.
Table 3 footnotes: a C-score LB is the confidence score of the predicted binding site. C-score LB values range between 0 and 1, where a higher score indicates a more reliable ligand-binding site prediction. b TM-score is a measure of global structural similarity between the query and the template protein. c RMSD is the RMSD between residues that are structurally aligned by TM-align. d IDEN is the percentage sequence identity in the structurally aligned region. e Cov. represents the coverage of the global structural alignment and is equal to the number of structurally aligned residues divided by the length of the query protein. f BS-score is a measure of local similarity (sequence and structure) between the template binding site and the predicted binding site in the query structure.
Figure 6: Identifying HLA-restricted epitopes on the surface of 3D models (panel (d): SAG2Y). The predicted HLA-restricted epitope sequences shown in Tables 2 and 3 were marked on the surface of the 3D models of SAG2C, -2D, -2X, and -2Y. The 3D structures of the proteins are shown using the Surf method. Red balls stand for epitopes restricted by HLA-A*1101, green balls for epitopes restricted by HLA-A*0201, and blue balls for epitopes restricted by HLA-B*0702.
Table 4 footnotes: a TM-score of the structural alignment between the query structure and known structures in the PDB library. b RMSD is the RMSD between residues that are structurally aligned by TM-align. c IDEN is the percentage sequence identity in the structurally aligned region. d Cov. represents the coverage of the alignment by TM-align and is equal to the number of structurally aligned residues divided by the length of the query protein.
Structural analysis of SAG2C, -2D, -2X, and -2Y revealed that the five-on-three sandwich fold of the SAG2 proteins was most similar to that of the T. gondii bradyzoite-expressed BSR4, with TM-scores of 0.583, 0.661, 0.672, and 0.670, respectively (Table 4). BSR4 is a prototypical bradyzoite surface antigen encoded in a cluster of SRS genes on chromosome IV, including the closely related paralogs SRS6 and SRS9 [8,9]. Sequence alignment shows that SAG2C, -2D, -2X, and -2Y share 71% sequence identity with the bradyzoite-expressed BSR4. This observation is consistent with the prediction that stage-specific structural features might play an important role in the process of infection, dissemination, and pathogenesis in T. gondii. In BSR4, two strands are organized in an antiparallel fashion, followed by another strand on the lower face of the sandwich. The dimeric structure of SAG1 showed a sandwich with two parallel outside strands and an opposite one in between [29]. The overall topology of the five-on-three sandwich D2 domain is conserved between SAG2C, -2D, -2X, -2Y and BSR4. A detailed comparison of SAG2C, -2D, -2X, -2Y and BSR4 reveals a similarity in the topology of the D1 and D2 domains, consistent with the lower Z-score from the Dali search.
By comparison, the next most similar structure is SporoSAG (surface antigen glycoprotein), with a substantially reduced TM-score [30,31]. SporoSAG is a dominant surface coat protein expressed on the surface of sporozoites. SporoSAG crystallized as a monomer and displayed unique features of the SRS sandwich fold compared to SAG1 and BSR4 [9]. Intriguingly, the structural diversity is localized to the upper sheets of the sandwich fold and may have important implications for multimerization and host cell ligand recognition. In the fit analysis, SAG2D fits well onto the C-terminal region of SAG2C, and SAG2X and SAG2Y fit well from the C-terminus to the N-terminus (Figure 5).
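The similarity measures used in these comparisons (TM-score, RMSD, and coverage) can be illustrated for a fixed residue-to-residue alignment. The Python sketch below assumes that the two structures are already superimposed and that the distances between aligned residue pairs are known. It uses the standard TM-score definition with the length-dependent scale d0 = 1.24 (L − 15)^(1/3) − 1.8, whereas TM-align additionally searches for the superposition that maximizes the score. The function names, the 0.5 Å floor applied to d0 for short chains, and the example distances are assumptions of this sketch.

```python
import math
from typing import Sequence


def d0(length: int) -> float:
    """Length-dependent distance scale of the TM-score, in angstroms."""
    if length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], query_length: int) -> float:
    """TM-score for a fixed superposition, normalized by the query length."""
    scale = d0(query_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / query_length


def rmsd(distances: Sequence[float]) -> float:
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def coverage(n_aligned: int, query_length: int) -> float:
    """Fraction of query residues that are structurally aligned."""
    return n_aligned / query_length


# Hypothetical example: 100-residue query, 50 aligned pairs at 2.0 angstroms.
example_distances = [2.0] * 50
example_tm = tm_score(example_distances, 100)
example_rmsd = rmsd(example_distances)
example_cov = coverage(len(example_distances), 100)
```

Because the sum runs only over aligned pairs while the normalization uses the full query length, a low coverage caps the TM-score even when the aligned residues superimpose closely.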

Conserved HLA-Restricted CD8+ T Cell Epitope Prediction.
A consensus of epitope prediction algorithms was used to predict peptides that could stimulate an effective and protective human immune response against T. gondii. We also examined whether the proteins have similar epitopes scattered on their surfaces. The epitopes from SAG2C, -2D, -2X, and -2Y were predicted using the tools from IEDB (http://www.immuneepitope.org/), which can identify novel HLA class I restricted CD8+ T cell epitopes derived from T. gondii. Sixteen peptides were selected based on a high HLA allele binding score (percentile rank < 3).
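The selection step can be sketched as follows: enumerate candidate peptides with a sliding window, keep those whose predicted percentile rank is below 3, and look for peptides shared by all proteins. This is a minimal Python illustration and not the IEDB implementation. The peptide length of 9 is an assumption (the length used is not stated here), and the function names, sequences, and ranks are hypothetical; in practice the ranks would come from the IEDB consensus prediction.

```python
from typing import Dict, List, Set, Tuple


def windows(sequence: str, k: int = 9) -> List[Tuple[int, str]]:
    """All overlapping k-mers of a sequence with 1-based start positions."""
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def select_binders(predictions: List[dict], max_rank: float = 3.0) -> List[dict]:
    """Keep predictions with percentile rank < max_rank, best rank first."""
    kept = [p for p in predictions if p["percentile_rank"] < max_rank]
    return sorted(kept, key=lambda p: p["percentile_rank"])


def shared_peptides(sequences: Dict[str, str], k: int = 9) -> Set[str]:
    """k-mers that occur in every one of the given sequences."""
    sets = [{pep for _, pep in windows(seq, k)} for seq in sequences.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical predictions (allele names as in the text, ranks invented).
toy_predictions = [
    {"peptide": "ACDEFGHIK", "allele": "HLA-A*1101", "percentile_rank": 2.9},
    {"peptide": "CDEFGHIKL", "allele": "HLA-A*0201", "percentile_rank": 0.4},
    {"peptide": "DEFGHIKLM", "allele": "HLA-B*0702", "percentile_rank": 5.0},
]
toy_selected = select_binders(toy_predictions)
```

A lower percentile rank corresponds to stronger predicted binding, so the strict threshold keeps only the best-ranked peptides for each allele.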
More interestingly, when we marked the HLA-restricted epitopes on the aligned sequences of the proteins, we found that epitopes restricted by the same HLA allele are located in the same domains of the proteins (Figure 6). Our results indicate that epitopes from SAG2C, -2D, -2X, and -2Y can be recognized by the appropriate MHC-I molecules and presented on the cell surface to induce a response in host CD8+ T cells, which might be helpful for vaccine studies and diagnosis of this parasitic disease. Some identified peptides from these proteins have been shown to be recognized by PBMCs from T. gondii seropositive individuals with the appropriate HLA restriction and to significantly induce IFN-γ production in T cells from immunized mice [32,33], which supports our predictions.

Conclusions
In this study, we have conducted a detailed bioinformatic and structural characterization of the bradyzoite proteins SAG2C, -2D, -2X, and -2Y. This characterization provides a structural view of T. gondii SRS family members at the chronic bradyzoite stage. Our bioinformatic analysis showed that SAG2C, -2D, -2X, and -2Y are homologous members of the SAG2 subfamily. Consistently, our structural analysis demonstrated that SAG2C, -2D, -2X, and -2Y are more similar to the bradyzoite-expressed BSR4 and to SporoSAG than to the tachyzoite-expressed SAG1. This result indicates that the SAG2 family has a conserved structure at the bradyzoite stage that differs considerably from that of SAG1 at the tachyzoite stage. Furthermore, the predicted conserved peptides and HLA-restricted epitopes may inform vaccine studies and diagnosis of this parasitic disease.