Immunoinformatics and molecular modeling approach to design universal multi-epitope vaccine for SARS-CoV-2

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmittable and pathogenic human coronavirus that caused a pandemic situation of acute respiratory syndrome, called COVID-19, which has posed a significant threat to global health security. The aim of the present study is to computationally design an effective peptide-based multi-epitope vaccine (MEV) against SARS-CoV-2. The overall model quality of the vaccine candidate, immunogenicity, allergenicity, and physiochemical analysis have been conducted and validated. Molecular dynamics studies confirmed the stability of the candidate vaccine. The docked complexes during the simulation revealed a strong and stable binding interactions of MEV with human and mice toll-like receptors (TLR), TLR3 and TLR4. Finally, candidate vaccine codons have been optimized for their in silico cloning in E. coli expression system, to confirm increased expression. The proposed MEV can be a potential candidate against SARS-CoV-2, but experimental validation is needed to ensure its safety and immunogenicity status.


Introduction
The world health organization (WHO) on March 11, 2020 declared the recent novel coronavirus (SARS-CoV-2) outbreak as a pandemic. This outbreak not only affected millions of people worldwide, and posed a serious health burden to society, but also led to extreme economic imbalance globally. SARS-CoV-2 belongs to the family beta coronaviruses and are closely associated with Severe Acute Respiratory Syndrome Virus (SARS-CoV) and Middle East Respiratory Syndrome Virus (MERS-CoV), which were responsible for the outbreak in 2002 and 2012 respectively [1,2]. SARS-CoV-2 is a small virus containing positive sense single stranded RNA genome of ~30 Kb in length which encodes 9860 amino acids. The genome is translated into mainly three types of proteins: 1) nonstructural proteins (ORF1a/b, nsp2, nsp3, nsp4, nsp9), 2) accessory proteins (ORF3a, ORF8, ORF7a/b), and 3) structural proteins which includes spike, envelope, membrane and nucleocapsid protein [3,4]. In this regard, a recent study on the SARS-CoV-2 human protein interaction map illustrates several SARS-CoV-2 and human protein receptors as putative therapeutic targets [2]. Non-structural proteins include viral enzymes and other accessory proteins that assist genome replication and pathogenesis. Structural proteins play key roles in virus pathogenesis, replication, and spread of infection. Basically, interaction of spike protein with human ACE2 receptors leads to the internalization of viruses inside the host cells, where the viral genome is replicated and interacted with nucleocapsid protein. After replication, the envelope and membrane proteins help with the assembly and packaging of progeny virus particles [3][4][5][6].
There is no specific drugs or vaccine established for the treatment of COVID-19 disease. The Pfizer/BioNTech vaccine is 95% effective in preventing COVID-19 infection [7], but its approval status varies worldwide. In countries where the vaccine has not been approved by the relevant regulatory authority, it is an investigational drug, and its safety and efficacy have not been established [8]. The most promising approach to deal with any pandemic is vaccination; however, the development of vaccines is a challenging process. Lessons and experiences learned from earlier SARS and MERS outbreaks as well as the advancement of technology has accelerated the process of development of vaccines. So far, many vaccine candidates have either entered the clinical trial or are ready to start the trial process by fulfilling preliminary evaluation protocols. For instance, Moderna's mRNA-1273 based vaccine entered phase 1/2 trial, DNA Vaccine named INO-4800 by Inovio pharmaceuticals, microneedle array delivered recombinant coronavirus vaccines etc. have shown promising results [9]. The most important strategies for the development of vaccines against SARS-CoV-2 are inactivated or weakened virus, replicating or non-replicating viral vector-based approaches, DNA, RNA, virus particle like approaches and protein subunit-based approaches, the later includes immunoinformatics approach [10].
Among all, the immunoinformatics and reverse vaccinomics approaches are rapid, accurate, cost-effective and reliable against pathogens [11]. In this regard, pioneering work of Ruth Arnon on an epitope-based vaccine against influenza (H5N1) virus proved the reliability of immunoinformatics approach [11,12]. Several groups also reported the immunoinformatics approaches to design subunit vaccines against SARS-CoV-2, however, only two of them are under preclinical trials [13]. Several studies proposed subunit vaccines through the immunoinformatics approach to accelerate the race over safe vaccine development, but they were limited to either single protein [14,15] or very few proteins of SARS-CoV-2 [16,17].
This study aims to design a multi-epitope vaccine against SARS-CoV-2 that efficiently elicits both innate and adaptive immune responses into the host body and thereby provides maximum protection. Therefore, 13 out of 21 structural and non-structural proteins (spike, nucleocapsid, membrane, envelope, endoRNAse, nsp4, nsp9, nsp6, ORF3a, ORF6, ORF7a, ORF8 and ORF10) of SARS-CoV-2 are selected based on their antigenicity score. These proteins are analyzed for prediction of Cytotoxic T cell (CTL), Helper T cell (HTL) and B cell epitope. The toxicity, allergenicity and conservancy of epitopes are checked along with the population coverage study. Combining adjuvants, most promising CTL, HTL and B-Cell epitope along with suitable linker, the vaccine is constructed. Vaccine structure is modeled and docked against TLR3 and TLR4 to ensure its efficacy of inducing an immune response. Furthermore, the docked complex is validated and analyzed using molecular dynamics simulations. Finally, to ensure the expression and translation efficiency of vaccine the in silico cloning is performed. The unique feature of our study over other existing reported peptide vaccine is that the selected epitopes could be able to elicit immune response in both human and mice, which would be beneficial in terms of cost reduction in the process of in vivo efficacy testing of the vaccine construct.

Methods
The bioinformatics resources used in the current study are presented in Supplementary Table S1.

MHC class I (CD8 + T cell) epitope screening
Selected proteins were screened through NetCTL 1.2 server with default parameters (Threshold-0.75; Weight on C terminal cleavage: 0.15 and Weight on TAP transport efficiency: 0.05) to predict 9-mer CTL epitopes against 12 human super-families of HLA Class-I alleles (A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58, B62). The epitope length was selected as 9 mer, because the NetCTL 1.2 server has been trained on a set of 886 known 9 mer MHC class I ligands, whereas the average AUC score (Area under the ROC Curve) of the server is 0.941. Epitopes showing good binding affinity with human supertypes were screened through 1) TMHMM server to predict the topology, 2) VaxiJen v2.0 server to predict antigenic score, 3) AllerTOP v.2.0 server to predict allergenicity, and 4) ToxinPred server to predict toxicity. Epitopes which passed the screening as topologically outside, antigenic, non-allergen and nontoxic were passed through NetMHC 4 server for Mice alleles Epitopes showing positive results were further cross-checked with IEDB class I immunogenicity server for Human HLA class I binding. Finally, most HLA covering epitopes were selected as final epitopes and subjected to population coverage analysis.

MHC class II (CD4 + T cell) epitope screening
The IEDB MHC II binding server was used to predict epitopes from selected antigenic proteins. Firstly, 15-mer HTL epitopes were predicted for Mice alleles (H2-IA b , H2-IA d , H2-IE d ) with a selecting percentile rank <10. Then, the TMHMM, VaxiJen v2.0, AllerTOP v.2.0 and ToxinPred servers were utilized as in the first filtration such as for CTL screening. Secondly, epitopes were screened through the IFNepitope server to select epitopes with the capability to induce IFN-γ production. Finally, epitopes showing positive results were further cross-checked with the IEDB class II immunogenicity server for Human HLA class II binding. As with CTL screening, most HLA class II covering epitopes were selected as final HTL epitopes and then subjected to population coverage analysis.

Linear and discontinuous B cell epitope screening
The available webserver used for B cell epitope prediction was not accurate enough, and B-cell epitope prediction is a semi-low accuracy procedure. Hence, in our study we predicted the B cell epitopes through a multi-method approach including (i) the physicochemical-based algorithms (e.g., Bcepred and IEDB-based linear epitope prediction) and (ii) the machine learning based algorithms (e.g., ABCpred, BepiPred, and LBtope). We initially used the ABCpred webserver as a standard for BCL (B cell lymphoma) prediction (14, 16, 18 and 20-mer), and furthermore, other webservers were used to find overlapped epitopes through cross checking. Finally, overlapped consensus epitope were screened through VaxiJen v2.0, AllerTOP v.2.0 and ToxinPred servers for selecting the final linear BCL epitopes.
We also predicted the discontinuous epitopes through IEDB Ellipro tools in our study. Structural proteins were used for this analysis. Three dimensional (3D) structures of structural proteins were retrieved from RCSB (Research Collaboratory for Structural Bioinformatics) and I-TASSER online server. Finally, shortlisted overlapped consensus epitopes were cross-checked with the IEDB Ellipro webserver to find overlapped discontinuous epitope residues.

Population coverage and conservancy analysis
Shortlisted CTL and HTL epitopes along with respective HLA allele types were submitted to the IEDB population coverage analysis tool. Calculations were performed separately for MHC class I and MHC class II as well as in the combined condition.
Structural protein sequences from other strains of SARS-CoV-2 were retrieved from NCBI database. Then, the IEDB Epitope Conservancy analysis tool was employed for shortlisted epitopes with a >= 100% sequence identity threshold.

Vaccine construction and structural model
To construct a multi-subunit vaccine, CTL, HTL and BCL were joined in an orderly fashion with appropriate linker. The vaccine construct started with beta defensin-3 adjuvant at the N-terminal end followed by EAAAK linker and PADRE sequence. GGGS linkers were added to join each CTL epitopes, whereas GPGPG and KK linkers were used to join HTL and BCL, respectively. Finally, an additional PADRE sequence was added at C-terminal end. Several parameters such as Physiochemical properties, solubility, antigenicity and allergenicity of designed vaccine construct were analyzed through the ExPASy-ProtParam, Protein-Sol, VaxiJen v.20 and AlgPred servers, respectively.
The secondary structure of the vaccine protein was predicted using the Self-Optimized Prediction Method with Alignment (SOPMA) server [18]. It predicts four conformational states including helix, beta sheets and bridges, turns and coils. The Robetta server was utilized to construct 3D vaccine protein models. The five generated models were evaluated through Rampage Ramachandran plot analysis, while only the best model was optimized in YASARA dynamics suite through 100 ns of MD simulation.

Molecular docking and dynamics simulation of vaccine-TLRs complex
Human TLR-3 (PDB id 1ZIW), human TLR-4 (PDB id: 4G8A), mice TLR3 (3CIG) and mice TLR4 (PDB id 2Z64) were retrieved from the protein database (PDB) and prepared for docking analysis. Proteinprotein docking was performed through the ClusPro 2.0 server. Based on the lowest energy score, efficiently docked complexes were selected and downloaded. Interactions between vaccine and TLR complexes were evaluated.
The docked Toll-like Receptor TLRs (human TLR3, 4 and mice TLR3, 4) and constructed multi-epitope vaccine complex MD simulations were computed using the YASARA dynamics suite [19]. The YASARA dynamics suite was also used to estimate conformational change and all possible interactions. AMBER 14 [20] was considered for all MD simulations at constant temperature (298 K) using the NVT ensemble. The 0.9% concentration of NaCl salt was used to neutralize the system where water molecules (0.998 g/cm 3 density) were added. The cell size was 20 Å larger than the complex in all cases. A cuboid cell box and periodic boundary conditions were employed for execution of the simulation. A cut-off radius of 8.0 Å was used for the particle-mesh Ewald (PME) method [21] for long-range electrostatic interaction calculations. For each system, the total simulation time for each case was 100 ns MD simulations with 100 ps time interval.

Codon adaptation, in silico cloning in E. coli expression system and mRNA structure analysis
The Jcat online tool was used to design optimized codons of the vaccine protein for E. coli K12 expression system. Prokaryotic ribosome binding site, rho-independent transcription terminators and four restriction sites (AclI, NtoI, EagI and EgeI) were avoided in optimized reverse translated sequence. GC content and Codon Adaptation Index (CAI) calculated through the Jcat server were compared and evaluated with the results from GenScript server. Highly efficient cDNA sequence was inserted in pETite N-His SUMO Kan vector (Lucigen, USA) under T7 promoter through SnapGene v5.1 software which would translate the recombinant protein containing Histidine tags, SUMO protein and vaccine protein. mRNA secondary structure of the fusion protein was predicted through the RNAfold server to check the translation efficiency and thermodynamic stability.

Selection of significant protein
The NCBI database was used to retrieve amino acid sequences of 21 structural and non-structural proteins. All proteins were evaluated through the VaxiJen server, which produced an overall score for each protein sequence prevailing their antigenicity response. We found an antigenicity score of more than 0.45 for 13 proteins (nsp4, nsp6, nsp9, endoRNAse, Spike, ORF3a, Envelop, ORF6, ORF7a, Membrane, ORF8, Nucleocapsid, and ORF10 protein), as shown in Supplementary Table  S2. Based on antigenic scores, these proteins were chosen for further design of a multi-epitope vaccine.

Identification and evaluation of CTL (cytotoxic T lymphocytes) epitopes
CTL epitopes of structural and non-structural proteins were initially predicted by a two-stage filtration/screening process. First filtration was conducted after CTL selection by the NetCTL1.2 server, which predicted CTL epitope of 9-mer in length. After identifying CTL epitopes through NetCTL1.2 server, epitopes were filtered by checking antigenicity, allergenicity, non-toxicity and immunogenicity in the first filtration process. Finally, epitopes were evaluated through best binding affinity with mice alleles in the second filtration. About 1582 epitopes were predicted from 13 proteins by NetCTL1.2 server and 109 CTL epitopes were screened out of 1582 in first stage filtration, as shown in Supplementary File 1. Out of 109, 54 CTL epitopes were selected in the second filtration stage, as presented in Supplementary File 1.

Identification and evaluation of HTL (helper T lymphocytes) epitopes
Initially, HTL epitopes for 13 proteins were predicted by the IEDB server for mice MHC-II alleles (H2-IA b , H2-IA d , H2-IE b ). Most significant epitopes were selected based on their percentile rank, which should be less than 10. About 703 HTL epitopes were predicted from the IEDB MHC-II server (Supplementary File 2). Subsequently, these epitopes were screened by following the CTL-first filtration process. However, a total of 120 unique HTL epitopes were predicted for 9 proteins including nsp4, nsp6, nsp9, spike, membrane, envelope, ORF3a, ORF8, and nucleocapsid protein along with their respective mice alleles. Among these 120, after second filtration through IFNepitope server only 59 epitopes were found as IFN-γ positive, as provided in Supplementary File 2.

Prediction of B-cell epitopes and discontinuous B-cell epitopes
The most significant linear B-cell epitopes for spike, envelope, membrane, and nucleocapsid proteins were identified by using a multimethod approach, as detailed in the method section. Initially, a pool of linear B cell epitopes was generated from ABCpred server for all four structural proteins. Epitopes with higher antigenicity, non-toxicity and non-allergenicity were shortlisted from the initial pool. In the case of the spike protein, two consensus 14-mer epitopes (S 206-219 and S 374-388 ) were finally selected for vaccine construction. IEDB Bepipred 2.0 method predicted epitopes from the residue number 206 to 217 and 369 to 394 susceptible to induce immune responses as linear B-lymphocyte epitopes ( Figure S1a) (Supplementary File 3). Most residues of these positions were also predicted by other IEDB tools, such as Chou and Fashman B turn prediction ( Figure S1b), Emini surface accessibility prediction method ( Figure S1c), and Karplus and Schulz flexibility prediction method ( Figure S1d). Bcepred and LBtope webserver also ensured this final two B cell epitopes (S 206-219 and S 374-388 ) can act as linear B cell epitope. Furthermore, Ellipro tools in IEDB also confirmed these shortlisted epitopes from ABCpred to be a part of both linear and discontinuous epitopes by using a spike protein crystal structure 6VSB [22], as presented in Supplementary File 3.
In the case of envelope protein, one consensus BCL epitope (E 57-73 ) from ABCpred was selected as the final epitope. Initially, the IEDB bepipred prediction method predicted that only one epitope from 57 to 71 residues can induce immune response as linear B-lymphocyte epitope (Supplementary File 3). Additionally, this region was also found most common in other IEDB tools including Chou and Fashman B turn prediction, Emini surface accessibility prediction, and Karplus and Schulz flexibility prediction, as shown in Figure S2. Besides, this residue is common in the Bcepred and LBtope webserver prediction. Ellipro tools in IEDB also reconfirmed this final epitope from ABCpred to be a part of both linear and discontinuous epitopes by using an envelope protein crystal structure (QHD43418) retrieved from I-TASSER webserver (Supplementary File 3).
Similarly, from membrane protein and N protein two BCL epitopes (M 176-193, N 91-106 ) were selected based on their consensus results in another prediction method as shown in Supplementary File 3, Figure S3 and S4. Thus, we secured final 5 consensus BCL epitopes for vaccine design with higher antigenicity, non-toxicity, and non-allergenicity properties (Supplementary File 3).

Epitope conservancy, final epitopes prediction and population coverage analysis
For effective immunization, epitopes conservancy across other strains around the world is essential. The results of epitope conservancy analysis revealed that CTL, HTL, and B-cell epitopes of the final proteins were 100% conserved, except for one B-cell epitopes of E protein which showed 96% conservancy, as presented in Supplementary File 1, 2 and 3.
After Conservancy analysis, we selected the final CTL and HTL epitopes. We selected total 12 CTL epitopes from each protein by considering their high antigenic and immunogenic score and multiple types of mice allele and human super type coverage (Supplementary File 1). Thereafter, we predicted the MHC-I binding allele of each of the 12 CTL epitopes from IEDB server. The cut off value for the prediction was selected as percentile rank ≤1, and according to IEDB recommendations, a percentile rank of ≤1 was used as the cutoff to predict peptide binders for MHC-I and a percentile rank of ≤10 was used as the cutoff to predict peptide binders for MHC-II [23].
In the case of CD4 + T cell epitope prediction, similarly 9 HTL epitopes were selected from each protein by considering their mice allele percentile rank, high antigenic score and maximum number of the HLA-II alleles that are covered by these binders under a percentile rank of 10 (Supplementary File 2).
The population coverage analysis was done for selected 12 CTL and 9 HTL epitopes and their corresponding HLA allele among 13 areas of the world provided in IEDB population coverage tool. We found 87.12% (CTLs) and 90.42% (HTLs) of global population coverage in an individual manner (Supplementary File 4). As CTL and HTL epitopes are contained by a vaccine protein, we focused on their combined coverage which was 98.64% of the global population. The maximum coverage (100%) was found in three countries i.e., Europe, North America, and East Africa. In addition, the average population coverage for each of the CTL epitopes and HTL epitopes can be found in Supplementary File 4.

Vaccine construction and physicochemical analysis
A total 12 CTL, 9 HTL and 5 BCL epitopes based on their antigenic nature, binding energy, and the non-allergenic property were linked together by using GGGS, GPGPG, and KK linkers to construct the final vaccine. In the N-terminal of vaccine, adjuvant was added to protect it from degradation. Moreover, an EAAAK linker was used to join the adjuvant to the CTL epitopes. On the other hand, GGGS, GPGPG, and KK linkers were joined with the CTL, HTL and BCL epitopes, as shown in Fig. 1a. The final vaccine construct consisted of 502 amino acids (Fig. 1b).
The evaluated physicochemical properties of constructed vaccines are represented in the Table S3. The vaccine construct was found to be basic in nature by identifying the Isoelectric point (PI) of 10.06 with 52.29 kDa molecular weight (MW). In addition, half-life was calculated >30, >20 and > 10 h in mammalian reticulocytes (in vitro), in yeast (in vivo), and in E. coli (in vivo) respectively. GRAVY was estimated as 0.316, indicating hydrophobic in nature. Moreover, the vaccine construct was found to be soluble (0.593) in the E. coli system during overexpression. Thermo-stability (Aliphatic index) of the vaccine was found to be 89.34, which contain a higher amount of hydrophobic amino acids. Furthermore, the instability index (II) was calculated to be 33.48 classified that protein would be stable in a wet lab. In addition, antigenicity for the vaccine design was 0.6045, as calculated by the Vaxijen server, which indicates that the vaccine is immunogenic and can stimulate sufficient immune response.

Homology modeling, optimization, and structural quality of the vaccine
To analyze the tertiary structure of the vaccine construct, the 502 amino acids peptide sequence was utilized for the prediction of the homology model ( Figure S5a). The Robetta server predicted five homology models of the designed vaccine protein based on comparative modeling and ab initio method.
The initial homology model of the vaccine construct was optimized by performing 100 ns of MD simulation in physiological environments. The stability of the homology model during the MD simulation was determined by its deviation from the elementary structure in terms of RMSD. The RMSD-Cα values of the vaccine protein in the entire MD simulation trajectories was presented in Figure S5b. The homology model of the vaccine protein reached a stable state after 50 ns, while the average RMSD-Cα was 8.72 Å.
The MD simulation optimized homology model of the vaccine protein was evaluated through a Ramachandran plot. The results showed that 93% of residues were placed in most favored regions, 6% in additional allowed regions, and 0.3% in disallowed regions respectively, as predicted by RAMPAGE server ( Figure S5c). Model quality was also checked by ERRAT and the Verify3D server where the quality of vaccine construct was found 86.62% and 87.45%, respectively.
The secondary structure of the final vaccine construct was predicted by the SOPMA web server, which predicts a secondary structure according to the amino acids sequence of the vaccine protein. Among 502 amino acids, 187 are involved in beta sheet formation and 185 in coil formation, while alpha helix and beta turns are formed by 91 and 39 amino acids, respectively. Overall secondary structure prediction result indicates 37.25% are coiled, 36.85% are beta strands, 18.13% are alpha helix, and 7.77% are beta turn, as shown in Figure S6.

Molecular docking and molecular dynamics of the vaccine with human and mice TLRs
Human TLRs (3,4) and mice TLRs (3,4) were used as receptors for docking analysis with vaccine construct using ClusPro, and the calculated energy scores are tabulated in Table 1. Energy scores attained for human TLR3 and TLR4 were − 1400.3 and − 1713.7, respectively, and for mice TLR3 and TLR4 were − 1436.6 and − 1393.4, respectively. These complexes were subjected to MD simulations to analyze their stability.
To evaluate the dynamics of the constructed vaccine complex, MD simulation was performed. The RMSD of alpha-carbon, Rg and RMSF for all amino acid residues of the protein were analyzed for 100 ns. In case of human TLR3 complex, RMSD showed an initial deviation of 0.53 Å and gradually increased to 4.11 Å at 50 ns. Between 55 and 85 ns, the fluctuation increased sharply and then stabilized till the end of the simulation, as shown in Fig. 2a. A similar pattern was observed in the case of the mice TLR3 complex; fluctuation was increased sharply within 5-35 ns while the RMSD value range was 0.52-4.40 Å. However, the average deviation was found to be 4.75 Å and 4.61 Å for human and mice TLR3 complexes, respectively. In contrast, there was no sharp deviation observed in human TLR4 and mice TLR4 complexes. After 35 ns to the end the simulation revealed the stable nature of both complexes with mild fluctuation. Investigating the compactness of protein structure during the MD simulation, we analyzed the radius of gyration (Rg). The Rg value showed a wide difference between human TLR4 complex and other three TLR complex structures with an increased trend for human TLR3 and mice TLR (3 and 4) complexes, as shown in Fig. 2b. In case of human TLR4 complex, the lowest Rg value was observed from starting to the end of the simulation, which was ~32.18 Å (average). In contrast, the higher Rg value was found for human TLR3 complex at ~36.23 Å (average), which was followed by mice TLR4 (average ~35.71 Å) then mice TLR3 (average ~35.66 Å). Furthermore, the root-mean-square fluctuation (RMSF) of amino acid residues was analyzed to investigate the stability of the receptor-ligand interaction (Fig. 3). In the RMSF plot, a slight fluctuation was observed in amino acid residues, which may reflect the incessant interaction between receptor and ligand complex.
As a result, highly fluctuated regions in the plots indicate that the degree of flexibility increased in the receptor-ligand complex.
To gain further insight and understanding of the residue interactions between the docked complexes, we analyzed non-bonding and hydrogen bonding interactions after MD simulations using PDBsum algorithm [24]. The constructed vaccine produced 9 H-bonding interactions with human TLR3 (Fig. 4a and Table S4) and 3 salt bridges and 12 H-bond interactions with mice TLR3 (Fig. 4b and Table S4). On the other hand, 1 salt bridge and 8 H-bond interactions with human TLR4 and 4 salt bridge and 27 H-bond interactions were formed by constructed vaccine, as shown (Fig. 4c, d and Table S4).

Codon optimization for in silico cloning in E. coli and mRNA secondary structure analysis
Codon Optimization index (CAI) and GC contents of the optimized 1506 bp long nucleotide sequence of vaccine construct was evaluated and compared. To have better expression (transcription and translation) in organisms, the GC content ranging between 30% and 70% is considered optimal, whereas a CAI value must be higher than 0.8 up to 1 [25,26]. GC contents of optimized nucleotide sequence obtained from Jcat and GeneScript servers were 53.25% and 56.79%, respectively, while calculated CAI values were 0.92 and 0.78, respectively. Since Jcat server generated better optimized codons, this nucleotide sequence was therefore used as Gene of Interest (GOI) in SnapGene v 5.1 to construct recombinant pETite N-His SUMO Kan plasmid vector (Lucigen, USA). Stop codon TAA was added at 3 ′ end of GOI followed by EagI restriction site (C-terminus) while AclI restriction site (N-terminus) was added upstream of GOI followed by the addition of part of SUMO nucleotide sequence (Fig. 5). If expressed in E. coli K12, the constructed vector will produce a large fusion protein starting with Methionine (N-terminus) followed by six Histidine tags joined with SUMO protein and vaccine protein (C-terminus). After removal of His tags during downstream  processes, the native vaccine protein can be obtained by adding SUMO protease which will cleave the SUMO fusion protein and thereby release the exact vaccine protein with native N terminal residue (Glycine of adjuvant) as designed. Moreover, mRNA structure of expressed genes needs to be stable and the 5 ′ end should not have pseudoknots or long stable hairpin. RNAfold server predicted that the minimum free energy (MFE) of expressed mRNA secondary structure was − 622.10 kcal/mol. Lower MFE indicates higher thermodynamic stability of mRNA secondary structure ( Figure S7). No pseudoknot or stable hairpin was detected at the 5 ′ end of expressed mRNA, even first 10 nucleotides were free from forming secondary structure. These findings conclude on high level expression of gene construct [27,28] in E. coli K12, hence facilitating the production of the vaccine protein.

Discussion
Vaccines are considered the most potential immune modalities [29,30]. The production and development of vaccines is relatively cost effective, yet a laborious and time-consuming process [31,32]. In light of current advances in molecular immunology and computation, researchers are changing conventional vaccinology methods and applying immunoinformatics approaches (reverse vaccinology) to rationally design vaccines, especially for infectious diseases [33]. By using immunoinformatics approach, a vaccine was first designed against serogroup B Neisseria meningitidis [34] and successfully produced. Since then, numerous vaccines were developed against such diseases as Chlamydia pneumoniae, Streptococcus pneumoniae [34], H. pylori [35], Enterotoxigenic Escherichia coli [36], Rickettsia prowazekii [37] based on  immunoinformatics strategies.
In this study, we designed a multi-epitope vaccine targeting 13 proteins of SARS-CoV-2. Multi-epitope vaccines have several advantages over traditional and single-epitope based vaccines: a) multiple epitopes of MHC Class I and Class II can be recognized by TCRs; b) CTL, HTL, and B cell epitopes can be overlapped, and thus have the capability to induce an immune response of cellular and humoral immunity simultaneously, c) linking of an adjuvant with vaccine epitopes provides long-lasting immune response, d) the complications of in vitro antigen expression can be avoided [38][39][40][41].
An ideal multi-epitope vaccine should hold both categories of antigenic epitopes including T-cell and B-cell epitopes to generate particular immune response against predicted antigens [42]. Therefore, T-cell and B-cell epitopes were predicted from 13 proteins of SARS-CoV-2. The selection procedure was maintained based on their immunogenic properties such as antigenicity, toxicity, allergenicity, and cytokine The vaccine construct has 502 amino acid residues (52.29 kDa) with adjuvant beta defensin (a 45 mer peptide) [43], PADRE sequence [44] and linkers. The adjuvant sequence is an eminent agonist for TLR which is essential for the activation if different mediators of innate and adaptive immunity, compatible for vaccine development [45,46]. The adjuvant was also joined with EAAAK linker at the N-terminal of the vaccine protein, merged for effective separation of a bi-functional fusion protein [47,48]. The EAAAK linker was joined with PADRE sequence which is one of the helper T-cell epitope, significant for enhancing CTL responses of different antigens [49]. Moreover, GGGS [50] and GPGPG [51] linkers were merged with CTL and HTL epitopes for efficient recognition and separation of the multi-epitope vaccine [48]. Finally, the BCL epitopes were linked with a KK linker to preserve their distinct immunogenic activity [52]. However, the effectiveness of the vaccine depends on the broad population coverage in which the vaccination is used [53]. The epitope conservancy analysis was conducted with 51 different SARS-CoV-2 strains. Notably, the construct showed 98.64% of global population coverage in pathogenic invaded province (stated earlier). Therefore, this constructed vaccine would cover majority of the global population.
The final vaccine construct was found to be basic in nature and stable in physiological pH range as predicted by physiochemical analysis. Moreover, the estimated aliphatic index score revealed that the vaccine is more thermostable, while the positive GRAVY indicated its hydrophobicity suggesting strong interactions with amino acid residues. In addition, the construct was found to be soluble during overexpression in E. coli system.
For effective transportation into the body, the vaccine construct should have strong binding affinity with Toll-like receptors (TLRs). Although, TLRs play a significant role in the identification of pathogens and installation of the innate immune response against viruses, TLR recognition of SARS-CoV-1 is not well characterized [54]. In this study, we first assessed docking analysis of mice TLRs (3 and 4) against vaccine construct and compared it with human TLRs (3 and 4) respectively. One study revealed that mice TLR3 and TLR4 were more susceptible to SARS-CoV-1 [54]. Recently, a vaccine clinical trial was done with mice TLR4 against S protein of MERS-CoV. They also developed the standard operating procedures for the rapid development of clinic grade MNA SARS-CoV-2 subunit vaccines [55]. One more rationale for the selection of TLR3 is that it could recognize the RNA or dsRNS genome of coronavirus and TLR4 might recognize the outer component of coronavirus the spike protein [56]. The docking scores revealed that the vaccine construct has high binding interaction with human and mice TLRs, thus suggesting that the vaccine can trigger TLR activation and can enhance immune responses against SARS-CoV-2.
Molecular dynamics studies can provide information on the stability of vaccine-TLR complexes. In the case of the RMSD results, slight fluctuations were observed at 55-85 ns for vaccine-human TLR3 complex, however, all vaccine-receptor complexes were highly stable throughout the 100 ns simulation time. These results are in line with the structural compactness and high degree of vaccine-receptor complex flexibility which were analyzed by Rg and RMSF.
Furthermore, the intermolecular protein-protein interactions between vaccine and TLRs were determined via PDBSum ( Fig. 4 and Table S4). Structural analysis showed that vaccine protein formed higher number H-bond and salt bridge interaction with TLR4 than TLR3. In the case of human and mice TLR3, human TLR3 formed 9 and mice TLR3 formed 12 H-bond interactions with vaccine protein, although 3 salt bridge interactions were present in mice TLR3 complex, but this type of interactions was absent in human TLR3 complex. On the other hand, human TLR4 formed 8 and mice TLR4 formed 27 H-bond interactions with the vaccine protein. Although, salt bridge interactions were produced in both complexes, a common salt bridge interaction was found with a residue Arg438 of vaccine construct. Among all non-covalent interactions, salt bridges are the strongest interaction and contribute to a major extent in biomolecular stability [57]. In silico cloning and respective mRNA secondary structure analysis revealed that efficient gene expression and translation initiation can be achieved in wet lab trials.
In our study, certain epitopes of SARS-CoV-2 were dominant for Tcell and B-cell responses and these epitopes were almost well conserved in terms of sequence with SARS-CoV-1. Among them, of particular interest is the SARS-CoV-2 S 1060-1068 CTL epitope found to have 100% conservation in SARS-CoV-1 S 1042-1050 that could induce recall response when IFN-γ stimulation of blood CD8 + T-cells, revealing a significant difference between normal healthy donors and SARS-recovered patients [58]. Moreover, SARS-CoV-2 S 373-387 BCL epitope is located in the receptor binding domain (RBD) which has higher binding affinity for angiotensin-converting enzyme 2 (ACE2) of human respiratory epithelial cell [59]. This epitope is >90% conserved in SARS-CoV-1 S 361-374 . Zhi et al. (2004) reported that FSTFKCYGVSATKLN epitope corresponding to SARS-CoV-1 S 361-375 and CYGVSATKLNDLCFS epitope corresponding to SARS-CoV-1 S 366-380 was identified as the major positive peptides responsible for specific stimulation of T-cells to produce IFN-γ. They also found that CYGVSATKL (SARS-CoV-1 S 366-374 ) has strong binding affinity for mice H-2-K d [60]. Another group reported the frequencies of IFNγ producing cells elicited by CD8 + T cell epitope KCYGVSATKL (SARS-CoV-1 S 365-374 ) [61]. One region from SARS-CoV-2 E 57-72 (YVYSRVKNLNSSRVPD) protein was partially nesting in the region E 55-70 (TVYVYSRVKNLNSSEG), which was recognized as having high neutralizing activity in a SARS-CoV-1 convalescent sera and found to have the capability to elicit marked IgG, IgA, and IgM responses [62].
Currently, there are no effective drugs or vaccines established for the treatment of COVID-19, although vaccine development is a timeconsuming process that involves experimental to clinical trial settings, and no vaccine candidate has completed clinical trials yet [7,[66][67][68][69][70]. The majority of the vaccine designed against SARS-CoV-2 are mainly focused on a single protein (such as S, E, M, and N) [71][72][73], but single proteins such as the spike protein-based vaccine design still has many obstacles which may induce harmful immune response, causing liver damage in vaccinated animals [74]. Besides, another study designed a vaccine based on M protein, but their vaccine construct showed lower antigenic response to the virus [73]. For a potential vaccine candidate, structural, non-structural and accessory proteins of the virus are suitable selections which induce protective immune responses [75][76][77]. There are many studies that propose epitope and multi-epitope based vaccines against SARS-CoV-2 via the application of various computational and immunoinformatics approaches [78][79][80][81][82][83][84][85][86]. After scanning the literature, we found most of the published studies concerning epitope prediction on the S protein [71,87,88]. Furthermore, we compared our findings with reported studies manually, and found that none of the predicted epitopes have been projected earlier, which indicates the novelty of our study. In our study, the selected best epitopes have a good immune response to mice alleles besides human alleles, and also vaccine-mice TLRs complexes remain stable during the MD simulation. This type of study has not appeared in any current studies. Asaf et al. and Srivastava et al. recently reported that computational identification of multi-epitopes for CD4 + and CD8 + T cell based on multi proteins of SARS-CoV-2. But they only predicted T cell epitopes, a B cell peptide was not predicted [89,90]. Bhattacharya et al. predicted MHC-I and MHC-II epitopes based on S protein, but did not predict the capability of producing IFN-γ [15]. A report from Feng et al. outlined the identification of MHC-I epitopes in the main structural proteins of SARS-CoV-2, but did not consider any accessory and non-structural proteins [91]. A comprehensive analysis was conducted by Grifoni and colleagues, in which they identified all SARS-CoV-2 protein CD8 + T cell epitopes, but their main focus was on the peptides with sequence homologous to known SARS-CoV-1 epitopes [92]. Though a research group has recently conducted a study like us, it never considered accessory proteins for probable vaccine construction. Rather, they have designed a multi-epitope vaccine from four proteins (Spike, Mpro, RdRp, and Nsp13) of SARS-CoV-2 which showed a lower antigenicity score (0.4259) [93] when compared to our study.

Conclusion
In this study, we designed a unique multi-epitope vaccine against SARS-CoV-2 by using various computational tools. Our vaccine construct showed satisfactory physiochemical properties, antigenicity, non-allergenicity, anti-toxicity, and immunogenicity. Molecular docking analysis followed by 100 ns MD simulation of the vaccine construct with TLR3 and TLR4 was executed, which showed stable interactions of the vaccine candidate with these receptors. However, the vaccine construct showed strong expression, as confirmed by computational runs, and further experimental validation is required to ensure vaccine construct efficacy against COVID-19.

Declaration of competing interest
The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper.