Introduction

Epstein–Barr virus (EBV), also known as human herpesvirus 4 (HHV-4), is a member of the Herpesviridae family and is one of the eight known types of human herpesvirus. EBV is the most common human virus in the world1 and was isolated in 1964 from tumor cells (Burkitt’s lymphoma) by Epstein’s group2. EBV is related to distinct forms of cancer, such as Burkitt’s lymphoma, stomach cancer, Hodgkin’s lymphoma and nasopharyngeal carcinoma3,4. A High number of cases are usually reported. In the United States and other developing countries, most people are infected with EBV5 as 90% of the adults in the United States have been formally diagnosed with EBV infection. EBV infection can be asymptomatic or symptomatic, and the latter case includes mild fatigue, fever, enlarged spleen, swollen liver, swollen lymph nodes, inflamed throat, or rashes6. From 2006 to 2015, several clinical trials were conducted to develop vaccines; however, an EBV vaccine, phase 2 trial, from gp350 protein has been testified. This vaccine reduced the rate of Infectious Mononucleosis (IM) but not virus infection7.

Precautionary measures, such as avoiding direct contact with patients (including refraining from using a patient’s toothbrush, sharing food, or exchanging bodily fluids), can help reduce the risk of infection. EBV can infect host B cells and booms via a nonlytic mechanism8. EBV viral proteins play important roles in lymphoproliferative disease. For example, viral membranous proteins, such as LMP-1, may induce tumorigenic replication in infected B cells9,10. There are two types of EBV; Epstein and Yvonne Barr identified EBV in tumor tissue associated with Burkitt’s Lymphoma11. Glycoprotein 42 (gp42), Glycoprotein H, Glycoprotein L, and Glycoprotein B aid in the entrance to the host cell. Glycoprotein 42 binds to the HLA class II molecule because it is required for B cells, which inhibit epithelial cell fusion. For epithelial-cell fusion, the GH receptor protein interacts with GP4212 and GL is transported to the cell surface, which is essential for the correct folding of GH13.

EBV glycoprotein B is important for viral fusion events with B cells14. The human immune system precisely targets EBV glycoprotein 350 (gp350), which is an example of a lytically expressed gene15. The attachment of GP350 to the MHC-II molecules in the cell is aided by the already attached GP42 protein of EBV virus16. The fusion of the B-cell membrane and the outer viral envelope of the EBV virion requires functional spicule glycoproteins such as GH, GL, and gp4217. Previous studies suggest that glycoproteins such as GB complement membrane fusion18.

Vaccination is a significant approach to improve the standard of public health and provide an effective way to control the growing infections. In nature, plants act as bioreactors, which have been used to express efficient vaccine antigens against viral, bacterial and protozoan infections. Besides, we know that antibody epitope prediction using computational tools, one of the crucial steps of vaccine design19. Recent advancement in vaccine design has aimed for the expansion of conventional assays designed to quantify T-cell responses against various vaccine candidates20. Immunoinformatic approaches have made great contributions to predicting B-cell and T-cell epitopes in the development of subunit vaccines21. For subunit vaccine development, identification of continuous B-cell or nonlinear also known as non-continuous and cytotoxic T-lymphocyte (CTL) epitopes are essential. Among B-cell epitopes, >90% are noncontinuous22,23. The use of computational tools contribute greatly in biology designing in silico vaccine, prediction of T-cell epitope is crucial which does not only reduce the cost but also the necessity for experimental results24. Epitopic vaccine against HIV, malaria and tuberculosis resulted in promising outputs and maintained the defensive and beneficial potential therapeutic uses of the developed vaccines candidates25. Immunoinformatics plays an upright role in antibody and immunodiagnostic agents development, and vaccine design. The early phases of vaccine and other therapeutics agents developments were based on solely immunological experimentations. These early developmental techniques were tedious and costly too. The use of bioinformatics such as computational techniques greatly reduces the time and cost of developing such agents for therapeutic purposes. Khan et al.26, used multiple bioinformatics tools to predict vaccine against multiple HPV viruses26. Thus the development of modern therapeutic medicines and vaccines greatly rely on such tools27,28.

Systems medicine emphases significantly on the components of pathway kinetics to probe different conditions. Systems medicine could be also utilized to investigate interaction mechanisms between microbes. Metagenomics data could be utilized for such analysis. Rather than whole cell interaction, an insight onto proteins interaction could be also comprehended through systems biology approaches (Singh, P. K. et al.29. These latest techniques widely expand the circle of new drugs development. Overall, it is known that multidisciplinary aspects of the production of therapeutic proteins that has gained much more attention (Dangi, A. K. et al.30. Structural modelling using computational resources led to success in the development of computer assisted drugs. Likewise molecular docking algorithms scoring functions of different conformers to design new drug candidates.

In the present study, potential B-cell and T-cell epitopes (as effective vaccine candidates) were identified with the help of immunoinformatic approaches. T-cell immunogenicity is associated with epitope binding strength to the MHC molecule31. Molecular modelling tools were applied to peptide-MHC complexes to investigate their post-docking interaction in order to select potential candidates for the development of peptide vaccines.

Materials and Methods

Epstein-Barr Virus Sequence

The sequences of EBV proteins, including GH, GL, GB, GN, GM, GP42 and GP350, were retrieved from the Universal Protein knowledgebase (Uniprot) database (http://www.uniprot.org/). The sequence retrieval accession numbers along with other information are provided in the (Table 1). The 3D coordinates of all the selected proteins were predicted by using online webserver phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id = index)32. The overall workflow of the work is shown in the Fig. 1.

Table 1 Detailed information, including individual protein sequence length, and region and accession number is shown in the table below.
Figure 1
figure 1

The figure above is showing the pipeline of the study. Resources, methods, and each step is discussed.

Prediction of Linear B-Cell Epitopes

After interacting with antigens (such as B-cell epitopes), B-lymphocyte cells differentiate into memory cells and antibody secreting plasma cells33. B-cell epitopes have a hydrophilic nature and are accessible for flexible regions34. IEDB (http://www.iedb.org/) online analysis resources were used to obtain the Parker hydrophilicity prediction values35, Emini prediction values of surface accessibility36, Kolaskar and Tongaonkar’s antigenicity scale values37, and Karplus and Schulz Flexibility Prediction values. B-cell epitopes were predicted using ElliPro (http://tools.immuneepitope.org/toolsElliPro/) using both protein sequences and structural information38. ElliPro utilizes the Protrusion Index (PI) of residues, protein shape approximation, and the final neighbouring residues clustering, which rely on PI.

Prediction of Potential Cytotoxic T-lymphocyte (CTL) Epitopes

CTL epitopes were predicted using the NetCTL.1.2 (http://www.cbs.dtu.dk/services/NetCTL/) server28. NetCTL accepts the FASTA sequence format to perform different analyses, such as prediction of MHC class I binding affinity, TAP transport efficiency and C-terminal cleavage. The artificial neural network and weight matrix were used for the prediction of MHC-I binding and proteasome-dependent C-terminal cleavage.

Peptide Library Construction and Molecular Docking

All of the predicted epitopes were modelled using the online webserver PEP-FOLD3 using 200 simulation runs to sample the conformations39 and sOPEP energy function40. Subsequently, we have docked the best ranked peptide models to the selected class I MHC molecules HLA-A (PDB ID: 2GIT) using the PatchDock docking server41. The algorithm of the PatchDock server uses structural geometry to find docking transformations with good molecular shape complementarity42. The resulting complexes were refined through the FireDock server43,44. High energy complexes were subjected to interaction analysis and molecular dynamics simulations45.

Molecular Dynamics Simulations

The accepted complexes were subjected to Molecular Dynamic simulations using the AMBER 14 molecular dynamics package46. The system was neutralized using Na+ ions using tleap. Each system was solvated in a rectangular box with buffer distance of 8.0 A° using TIP3P water molecules. A two-stage energy minimization of the complexes, using the SANDER module of AMBER 14, was performed to relieve the atomic clashes. An initial minimization of 6,000 steps, followed by another round of minimization (6,000 steps), were used to restrain the positions of all atoms in the systems, except those from the water molecules in the first minimization. The pmemd.cuda47 software was used to simulate the minimized complexes. The SHAKE algorithm and the Particle-Mesh Ewald (PME) method were used to include the long-range interactions, and a non-bonded interaction cutoff radius of 10 A° was considered. For equilibration, 10,000 ps time was applied, followed by a 50 ns simulation carried out at 310 K using the Langevin temperature coupling scheme at constant pressure (1 atm) with isotropic molecule-based scaling. Sampling of the MD trajectories was carried out every 2.0 ps. RMSD and hydrogen bonding analysis were carried out using the integrated CPPTRAJ and PYTRAJ48 modules in AMBER 14 and were visualized using the online server PDBePISA49, UCSF Chimera50 and PyMOL51.

Antigenic and Allergenic behaviour the predicted Epitopes

To confirm the allergenic and non-allergenic properties of all the designed epitopes, B-cell and T-cell epitopes, AlgPred52 (http://crdd.osdd.net/raghava/algpred/), which is an online web tool was employed with accuracy of 85%. Primary amino acid sequences were used of all the selected proteins for this purpose. The antigenicity of all the epitopes was predicted by using ANTIGENpro53 (http://scratch.proteomics.ics.uci.edu/) using different machine learning algorithms to process the amino acids sequences.

PKPD Modeling

The design and execution of the EBV signalling cascade was performed based on a literature survey within a virtual cell. Proposed peptides were chosen for kinetic analysis of EBV, where a concentration of 0.45 µm was assigned based on previous literature. Modelling of chemical reaction networks was applied for the analysis of this pathway.

This nonlinear kinetics scheme follows the Michaelis–Menten equation as below:

$${\rm{V}}=\frac{({\rm{Vmax}})\,\cdot \,({\rm{S}})}{{\rm{Km}}+{\rm{S}}}$$
(1)

This equation can be transformed to,

$${\rm{C}}=\frac{({\rm{Cmax}})\,\cdot \,({\rm{D}})}{{\rm{Km}}+{\rm{D}}}$$

or

$${\rm{V}}=\frac{{\rm{d}}[{\rm{P}}]}{{\rm{dt}}}=\frac{({\rm{Amax}})\,\cdot \,({\rm{D}})}{{\rm{Km}}+{\rm{D}}}$$
(2)

Here,

C = steady state concentration;

Cmax = theoretical maximum for C;

Amax = theoretical maximum for A;

D = dose.

Results

Sequence retrieval and analysis

Uniprot was used to retrieve the primary amino acid sequences of the selected proteins (glycoprotein B, glycoprotein L, glycoprotein N, glycoprotein H, glycoprotein M, glycoprotein 42 and glycoprotein 350) of EBV as shown in Fig. 2. Information about the protein source, accession number, number of active residues, and other information is given in Table 1.

Figure 2
figure 2

The 3D structures of the selected proteins. The Phyre 2 online server was used for B-cell and T-cell epitope prediction. (A) Glycoprotein B (B) Glycoprotein H (C) Glycoprotein L (D) Glycoprotein M (E) Glycoprotein N (F) Glycoprotein 42 (G) Glycoprotein 350.

Allergenicity and antigenicity prediction

The allergic and nonallergenic behaviours of EBV species were predicted using AlgPred (http://www.imtech.res.in/raghava/algpred/). Allergenicity prediction of known protein sequences is based on similarity. We checked if the epitopes are antigenic or not using the online server AntigenPro (http://www.scratch.proteomics.ics.uci.edu/)35. All of the proteins were found to be nonallergenic, while they possess antigenic properties (Table 2).

Table 2 Antigenic and allergenic results of the selected proteins.

B-cell epitope prediction

The BCPred server predicted 58 B-cell epitopes: five epitopes were for GP 42, eight for GP H, nineteen for GP B, one for GP L and GP N, five for GP M, and nineteen for GP 350. However, epitopes with scores above 0.99 were selected as the most potentially antigenic epitopes. Therefore, only one epitope each from GP42, GL, GM, GN and GH; four epitopes from GB; and fifteen epitopes from GP350 were found to meet the threshold value. B-cell epitopes, along with their scores, are tabulated in Table 3.

Table 3 B-Cell epitopes predicted by BCPred.

Surface accessibility of EBV

Threshold values >1 were set to predict the surface probability values. Amino acids with higher surface probability values (>1) have greater probability to be present on protein surfaces36. The maximum surface probability scores for Glycoprotein B (RRRRRD428–433), Glycoprotein H (EREDRD520–525), Glycoprotein L (KNGSNQ68–73), Glycoprotein M (RNRRRS362–367), Glycoprotein N (TEAQDQ44–49), Glycoprotein 42 (TKKKHT199–124), and Glycoprotein 350 (PRPRYN810–815) were 9.415, 9.265, 4.395, 9.054, 4.777, 5.691 and 4.859, respectively. The minimum surface probability scores were 0.032 (VVILVI745–750), 0.033 (CVFCLV5–10), 0.071 (LAICLV8–13), 0.067 (IIPILC309–314), 0.07 (LVLVII76–81), 0.054 (VIVLLL18–23), and 0.058 (AALLVC3–8). Figure S1 (Supplementary Materials) shows the graphical representation of the predicted surface accessibility of EBV. Moreover, for all of the other proteins which have maximum and minimum accessibility scores are shown in Table S1 (Supplementary Materials).

Surface flexibility of EBV selected proteins

The Karplus and Schulz flexibility method was used to calculate the motions of atoms (back and forth, considering temperature or B factor). Low B-factor values indicate a highly systematic structure, and high B factors indicate a distorted structure 32. The graphical representation of the surface flexibility results for EBV is shown in Fig. S3. Maximum flexibility scores for Glycoprotein B, Glycoprotein H, Glycoprotein L, Glycoprotein M, Glycoprotein N, Glycoprotein 42, and Glycoprotein 350 were 1.13, 1.094, 1.121, 1.157, 1.071, 1.091, 137 for heptapeptides EQNQEQK803–809, VITQGPN346–442, PKNGSNQ67–73, STSSSSS368–374, GASSPTN31–37, VRGGGRV31–37, PGNSSTS733–739, respectively. Minimum flexibility scores were 0.88, 0.872, 0.904, 0.866, 0.862, 0.897, and 0.894 for the peptides QAIMLAL795–801, LAAMLMA351–357, FLAICLV7–13, FLWWVVF193–199, IYLMYVC85–91, VAAAAIT37–43, and LLVMADC877–883, respectively. Figure S2 (Supplementary Materials) shows a graphical representation of the predicted surface flexibility of EBV. Moreover, for all of the other proteins which have maximum and minimum flexibility scores are shown in Table S2 (Supplementary Materials).

Parker Hydrophilicity Prediction for EBV

Hydrophilicity of the predicted epitopes was calculated using the Parker hydrophilicity approach35. The graphical illustration of the predicted Parker hydrophilicity of EBV is shown in Fig. S3 (Supplementary Materials). From all of the predicted EBV peptides, the maximum hydrophilicity calculated was 6.843 for Glycoprotein 350 at the amino acid positions DNGTESK496–502. These regions were predicted to act as active T-cell epitopes. The minimum hydrophilicity score calculated was −7.857 for Glycoprotein M at the amino acid positions FLWWVVF193–199. Moreover, for all of the other proteins which have maximum and minimum hydrophilicity scores are shown in Table S3 (Supplementary Materials).

T-cell epitope identification

Epitope predictions for all seven proteins were conducted on NetCTL, an online epitope prediction server. MHC-I binding prediction using the SMM method resulted in many potential epitopes against one allele, HLA-A*24:02. The weight matrix and artificial neural network was used for the prediction of MHC-I binding and proteasome dependent C-terminal cleavage. The MHC binding affinity, the TAP score, and the C-terminal cleavage score were considered to select the most promising epitopes among those predicted. The top three epitopes (as shown in Table 4) for glycoprotein B (ETDQMDTIY, QMDTIYQCY, PTTVMSSIY), glycoprotein L (MTAASYARY, LTSAQSGDY, ATSVLLSAY), glycoprotein H (ALENISDIY, LLTTLETLY, SSSALTGHL), glycoprotein N (IADCVAFIY, FLALGNSFY, TTDSEEEIF), glycoprotein M (LTEAQDQFY, IASAIYLMY, ASAIYLMYV), glycoprotein 42 (CAELYPCTY, HTFQVPQNY, NTREYTFSY), and glycoprotein 350 (PTNTTDITY, FLGNNSILY, HAEMQNPVY) were selected for docking.

Table 4 List of the total peptides T-cell vaccines predicted by NetCTL.

Peptide modelling and docking studies for HLA-A*24:02 and epitope interaction analysis

The selected top three epitopes from all proteins were docked against HLA-A*24:02 using Fire dock. The epitope QMDTIYQCY (glycoprotein B) was docked to understand the interaction pattern between peptide-MHC complexes. The global docking energy and van der Waals (vdW) energy were reported as −35.20 (kcal/mol) and −25.12 (kcal/mol), respectively. Residues Gln7, Cys8 and Asp3 from the docked peptides and Thr143, Tyr116, Thr80 and Lys146 from the MHC molecules were involved in binding. Peptide MTAASYARY (glycoprotein H) was reported to share a global energy of −34.27 (kcal/mol), with −29.12 (kcal/mol) vdW energy. On the other hand, peptides from Glycoprotein M (TTDSEEEIF), Glycoprotein N (LTEAQDQFY), Glycoprotein 42 (CAELYPCTY), and Glycoprotein 350 (PTNTTDITY) contributed global energies of −36.20 (kcal/mol), −34.25 (kcal/mol), −40.20 (kcal/mol), and −30.48 (kcal/mol), respectively. The vdW interaction energies for these complexes ranged from −29.71(kcal/mol) to −18.80 (kcal/mol). Residues Lys66, Arg97, Tyr99, Gln155, His114, and Thr163 from these complexes were uniformly involved in hydrogen bonding interactions. The Chimera interaction analysis tool predicted the interactions between peptide and MHC-I molecules within 3Ǻ. Overall, the stability was supported by the variable amounts of hydrogen bonding. The molecular interaction patterns are depicted in Fig. 3, while the interacting atoms are shown in Table 5.

Figure 3
figure 3

Interaction pattern of the docked peptides against MHC I molecules.

Table 5 Molecular Docking analysis of the final peptides.

PKPD modelling-based validation

In a time-course simulation with the shortlisted peptides (obtained from the screening), the initial concentration of the top-listed peptides was set at 0.45 µm54,55,56, and after 20 seconds, all of the interactions of the EBV mechanism in the cell signalling cascade were stabilized given in Fig. 4 and 5. EBV is a Baltimore Class I virus of the Herpesviridae family and plays a crucial role in lymphoproliferative disease. During tropic EBV infection, EBV binds to the HLA class II molecule, and B cells inhibit epithelial cell fusion, while the GH receptor protein interacts with GP42, and GL is transported to the cell surface where it is essential for the correct folding of GH. EBV GB is important for viral fusion events with B cells. Glycoprotein 350 binding is supported by the binding of EBV gp42 to B-cell MHC-II, while fusion of the B-cell membrane and the outer viral envelope of EBV virion must have functional spicule glycoproteins, such as GH, GL and gp42. Vaccines development requires the identification of extremely competent B-cell linear or nonlinear and CTL epitopes, where T cells act as mediators.

Figure 4
figure 4

Model of the biochemical pathway of MDS in the presence of peptides.

Figure 5
figure 5

The time-course simulation with screened peptides against relation of interacting EBV.

A systems biology approach is useful to investigate T-cell epitopes in peptide sequences, and earlier reports have accelerated research leading to the development of immune biology.

Root Mean Square Deviation (RMSD)

Molecular dynamics simulation was conducted to confirm the post-docking stability of the complexes. Trajectories were obtained after 50 ns and subjected to backbone stability using RMSD. RMSD of the selected complexes after 50 ns revealed that all of the complexes were stable and that the peptides had occupied the binding grooves of MHC-I molecules. RMSDs of all the complexes were calculated and plotted on the graph shown in Fig. 6. The RMSDs of all of the complexes ranged from 0.08 to 0.2 nm. These results show that small fluctuations were observed during the simulation time, but these fluctuations are likely because some of the peptides are modelled as loop structures.

Figure 6
figure 6

Root mean square deviation (RMSD) of the peptide-MHC complexes showing the stability of the peptides in the binding cavity of the MHC molecule.

Discussion

Vaccination is one of the best options for providing immunity against different pathogenic organisms, thus delivering protection against different diseases. Different types of vaccines, such as peptide, conjugated, subunit, and DNA vaccines, can be used to provoke the immune response. However, in this postgenomic and proteomic era, researchers prefer peptide or subunit vaccines over the whole pathogenic agent due to the availability and accessibility of huge data sets for different pathogens. These data can be systematically analyzed through the use of computational tools. Presentation of MHC-Antigen on the surface of T-cells provide a way to kill the infected cell. This presentation of MHC-antigen complex direct apoptosis or self-eating process of the infected cell. The antigenic element of the pathogen provoke the immune response by activating a signalling process. The peptide fragment bound to MHC molecule is primarily presented T-cell which significantly rely on different factors including proteasome cleavage and transport with the aid of ER. TAP which transporting channels or proteins help in the transport to the surface of the cell. Therefore, considering the c-terminal cleavage activity and TAP efficiency greatly help in the selection of effective vaccine candidates31,57,58,59. Immunoinformatic approaches have contributed greatly to the development of vaccines. Therefore, we employed these tools to design peptide vaccines against EBV to provide a means to protect humanity from the multiple diseases caused by EBV, such as infectious mononucleosis, Burkitt’s lymphoma60, Hodgkin’s lymphoma61, stomach cancer, laryngeal carcinoma62, multiple sclerosis63,64 and lymphomatoid granulomatosis65. Additional diseases that have been linked to EBV include Giannotti–Crosstie syndrome, erythema multiform, acute genital ulcers, and oral hairy leukoplakia66; furthermore, hypersensitivity to mosquito bites has been associated with EBV infection67. EBV has been implicated in disorders related to alpha-synuclein aggregation (e.g., Parkinson’s disease, dementia with Lewy bodies, and multiple system atrophy)68. The proteome of EBV has many important functional proteins involved not only in its pathogenesis but also in the maintenance of the pathogenic condition. EBV has been reported to use its glycoproteins, such as glycoprotein B, glycoprotein L, glycoprotein H, glycoprotein N, glycoprotein M, glycoprotein 42, and glycoprotein 350, to attach to and infect its host cell. Therefore, predicting and validating both B-cell and T-cell epitopes from these proteins are important steps to provide immunity against these various diseases. Based on MHC binding affinity, TAP score, C-terminal cleavage score, molecular docking and MD simulation, we propose Glycoprotein B (QMDTIYQCY134–142), Glycoprotein L (MTAASYARY256–264), Glycoprotein N (TTDSEEEIF396–404), Glycoprotein M (LTEAQDQFY43–52), Glycoprotein 42 (CAELYPCTY132–140), and Glycoprotein 350 (PTNTTDITY316–324) as the final T-cell epitopes that could provoke the immune response in the host cell. The systems biology approach with pharmacokinetic/dynamics modelling validated that these epitopes significantly activates the immune response pathway and, thus, provide a strong basis for the testing of these epitopes under in vivo conditions. Structural stability analysis of the peptide-MHC complexes also revealed the stability of these immunogenic complexes. Antigenic and allergenic profiles also confirmed that these epitopes are strong candidates. Furthermore, B-cell epitopes were reported as the primary choice for the development of a B-cell immune response.

This study provides a means for the development of peptide-based vaccines against EBV infection that could prevent many important diseases. To date, no such computational meta-analysis integrated with dynamics has been reported for the purpose of developing a peptide vaccine for EBV. This multiple step process has noticeably increased the scope and precision of this study. Our results will facilitate efficient subsequent experimental efforts, as the specified regions from Glycoprotein B (QMDTIYQCY134–142), Glycoprotein L (MTAASYARY256–264), Glycoprotein N (TTDSEEEIF396–404), Glycoprotein M (LTEAQDQFY43–52), Glycoprotein 42 (CAELYPCTY132–140), and Glycoprotein 350 (PTNTTDITY316–324) could be used for the development of candidate CTL epitopes. This study will aid in the progress of peptide vaccines against EBV.

Conclusion

It is well known that EBV causes many human diseases, including cancer. This study integrated multiple approaches to elucidate possible effective peptide vaccines that could provide protection against multiple infections. This study provides insight into the disease-causing factors of EBV virus and, thus, finalized potential B-cell and T-cell epitopes that will aid in the development of effective vaccines. Despite the prediction and validation of such peptides computationally, testing of our predicted epitopes should be carried out in animal models.