Epitope-based chimeric peptide vaccine design against S, M and E proteins of SARS-CoV-2, the etiologic agent of COVID-19 pandemic: an in silico approach

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiologic agent of the ongoing pandemic of coronavirus disease 2019 (COVID-19), a public health emergency of international concern declared by the World Health Organization (WHO). An immunoinformatics approach along with comparative genomics was applied to design a multi-epitope-based peptide vaccine against SARS-CoV-2 combining the antigenic epitopes of the S, M, and E proteins. The tertiary structure was predicted, refined and validated using advanced bioinformatics tools. The candidate vaccine showed an average of ≥90.0% world population coverage for different ethnic groups. Molecular docking and dynamics simulation of the chimeric vaccine with the immune receptors (TLR3 and TLR4) predicted efficient binding. Immune simulation predicted a significant primary immune response with increased IgM and a secondary immune response with high levels of both IgG1 and IgG2. It also predicted increased proliferation of T-helper cells and cytotoxic T-cells along with increased IFN-γ and IL-2 cytokines. Codon optimization and mRNA secondary structure prediction revealed that the chimera is suitable for high-level expression and cloning. Overall, the constructed recombinant chimeric vaccine candidate demonstrated significant potential and can be considered for clinical validation to fight against this global threat, COVID-19.

The S glycoprotein, because of its higher antigenicity and surface exposure (Almofti et al., 2018;Zhou et al., 2019;Shang et al., 2020), plays the most crucial role in the attachment and entry of viral particles into host cells through the host angiotensin-converting enzyme 2 (ACE2) receptor (Gralinski & Menachery, 2020;Shang et al., 2020). It is noteworthy that the E and M proteins also have important functions in viral assembly, budding and replication of virus particles, and play a role in augmenting the immune response against SARS-CoV (Shi et al., 2006;Schoeman & Fielding, 2019;Shang et al., 2020). Monoclonal antibodies (mAbs) primarily target the trimeric S glycoprotein of the virion, which consists of three homologous chains (A, B, and C) and is composed of two major domains, the receptor-binding domain (RBD) and the N-terminal domain (NTD) (Pallesen et al., 2017;Song et al., 2018;Zhou et al., 2019;Wrapp et al., 2020). The NTD is located on the side of the spike trimer and has not been observed to undergo any dynamic conformational changes (Shang et al., 2020). Thus, this specific region might play a role in viral attachment, inducing neutralizing antibody responses and stimulating protective cellular immunity (Almofti et al., 2018;Ul Qamar et al., 2019;Shang et al., 2020).
Most of the recent vaccine candidates induce neutralizing antibodies against the different forms and/or variants of the spike protein of SARS-CoV-2 (Le et al., 2020). However, the immune responses generated by a single protein have generally been inadequate to warrant its use in the development of an effective prophylactic tool (Shi et al., 2015;Shey et al., 2019). On this note, multi-epitope vaccine candidates have already been designed against several viruses, including MERS-CoV and SARS-CoV, and their efficacies have been further reported (Almofti et al., 2018;Ul Qamar et al., 2019;Yong et al., 2019). Two related studies have reported the in silico design of epitope-based chimeric vaccine candidates targeting the E, M, S and N proteins of SARS-CoV-2, albeit not peer-reviewed (Yazdani et al., 2020;Akhand et al., 2020). Besides, Kibria, Ullah & Miah (2020) used an immunoinformatic approach to design a 70 aa long multi-epitope vaccine focusing on the virion outer surface proteins (E, M, and S) (Chan et al., 2020).
Scientists are racing against the clock to develop an effective vaccine for controlling and preventing COVID-19 based on the genomics, functional structures, and host-pathogen interactions; nevertheless, the ultimate result of these efforts is still uncertain. Currently, 10 candidate vaccines against SARS-CoV-2 are in clinical trials, and 121 more are under preclinical evaluation (WHO, 2020). Strikingly, researchers are trialing different technologies, albeit mainly targeting the spike protein, some of which have not been used in a licensed vaccine before (Le et al., 2020). One such vaccine appears to be effective and safe based on limited data and on application of the vaccine within a relatively tiny group of individuals. However, several uncertainties remain, for example whether these results will translate to antibody responses in the general population and whether the vaccine will be safe within specific sub-populations (children, pregnant women, and elderly people); in addition, the lack of a standardized virus neutralization assay and of an accurate vaccine titer complicates data interpretation. Moreover, only half of the medium-dose recipients developed neutralizing antibodies, and the T-cell response was not particularly impressive (Sheridan, 2020).
Thus, mimicking a more natural state of the virus, in which surface-exposed proteins or the immunodominant epitopes of those proteins influence the immune response, might be a solution if those candidates ultimately cannot meet the final goal. Furthermore, excluding the nucleocapsid (N) protein, which is embedded within the structure and attached to the viral genome, will increase the chance of developing a more pseudo-virus state for the expressed chimera that may produce specific antibody as well as T-cell responses. In addition, peptide-based chimeric vaccines are biologically safe as they do not need in vitro virus culture, and their selectivity might ensure accurate activation of specific immune responses (Dudek et al., 2010;Wang et al., 2019).
Considering these facts, we propose the development of a multi-epitope vaccine candidate that differs from all previous studies in containing the whole RBD and NTD regions of the spike protein along with specific epitopes of the M and E proteins. This gives an excellent chimeric conformation and might lead to the generation of more potent protective immune responses, since smaller epitopes have less ability to give immune protection. Hence, we assume that a chimeric vaccine targeting multiple epitopes on the RBD and NTD segments of the S protein and on the M and E proteins would be a potentially effective vaccine candidate in combating the COVID-19 pandemic, and could therefore be used against the highly pathogenic SARS-CoV-2.

Comparative structural analysis of SARS-CoV, MERS-CoV and SARS-CoV-2
Multiple sequence alignment revealed that the S protein of SARS-CoV-2 shares 77.38% and 31.93% sequence identity with the S proteins of SARS-CoV and MERS-CoV, respectively (Fig. S1). The structural alignment of the coronavirus S proteins (structures validated using the Ramachandran plot as in Fig. S2) reflects a high degree of structural heterogeneity in the receptor-binding domain (RBD) and N-terminal domain (NTD) of chain A and chain C compared to that of chain B (Fig. S3). The divergence of the individual structural domains, NTD and RBD, of the SARS-CoV-2 spike protein from those of both SARS-CoV and MERS-CoV warrants the use of these domains for epitope-based chimeric vaccine development, particularly against SARS-CoV-2.
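As an illustration of how a percent-identity figure such as those above can be derived from a pairwise alignment, the following minimal sketch counts identical columns between two already-aligned sequences. The function name, the gap character and the choice of denominator (all columns except double-gap columns) are assumptions made for this example; alignment tools differ in how they define the denominator, so this is not necessarily the calculation behind the reported values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns in which both rows contain a gap are ignored,
    and every other column counts towards the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    columns = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap and res_b == gap:
            continue  # uninformative column
        columns += 1
        if res_a == res_b:  # both are residues here, since double gaps were skipped
            matches += 1

    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy rows, not taken from the S protein alignment.
    print(percent_identity("ACDE", "ACDF"))
    print(percent_identity("AC-E", "ACDE"))
```

Applied to two rows extracted from a multiple sequence alignment, the same function gives the pairwise identity under the stated denominator convention.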

Screening for B-cell epitopes
Linear epitope prediction (ElliPro) based on solvent accessibility and flexibility revealed 15, 18, and 19 epitopes within chains A, B and C of the S protein, respectively, wherein a score of >0.8 was the threshold for highly antigenic epitopes (Table 1). The amino acid (aa) residues at positions 56-194 and 395-514 of the detected epitopes belonged to the NTD and RBD regions of the S protein, respectively. However, the epitopes at aa positions 1067-1146 were not selected as potential epitope candidates because of their presence in viral transmembrane domains (Fig. S4). The tertiary structures of the RBD and NTD illustrate their surface-exposed position on the S protein (Fig. 1). Using the IEDB analysis resource and the Bepipred linear epitope prediction 2.0 tool, we predicted eight and six B-cell epitopes in the RBD and NTD regions, respectively, out of a total of 22 epitopes in the S protein, while the E and M proteins had 2 and 6 epitopes, respectively (Fig. 2, Table S1). However, only 5 epitopes were exposed on the surface of the virion and had a high antigenicity score (>0.4), indicating their potential for initiating immune responses (Table 2).
Among the five annotated epitopes having an antigenicity score of ≥0.5 (VaxiJen 2.0 tool), the RBD and NTD regions each possessed two highly antigenic epitopes, while the envelope (E) protein contained only one highly antigenic epitope and the membrane (M) protein had none (Table 2). Furthermore, the Kolaskar and Tongaonkar antigenicity profiling found five highly antigenic epitopes in the RBD region with an average antigenicity score of 1.042 (minimum = 0.907, maximum = 1.214), and seven highly antigenic epitopes in the NTD with an average antigenicity score of 1.023 (minimum = 0.866, maximum = 1.213) (Fig. S5, Table S2). The average Kolaskar scores for the envelope protein B-cell epitope (EBE) and membrane protein B-cell epitope (MBE) were 0.980 and 1.032, respectively (Table S2). In addition, through ABCPred analysis, we further verified 18 and 11 B-cell epitopes in the RBD and NTD regions, with average antigenicity scores of 0.775 and 0.773 in the associated domains, respectively (Table S3).

Selection of T-cell and IFN-γ inducing epitopes
The IEDB MHC-I prediction tool retrieved 77 T-cell epitopes in the RBD that interacted with 21 possible MHC-I alleles, whereas the NTD possessed 35 T-cell epitopes with 17 possible MHC-I alleles (Data S1). Similarly, the IEDB MHC-II prediction tool generated 124 13-mer peptides from the RBD and 73 10-mer peptides from the NTD segments of the S protein that showed interaction with many different and/or common MHC-II alleles, with IC50 values ranging from 1.4 to 49.9 nM (Data S1). Furthermore, the analysis tool of the IEDB generated overall scores for proteasomal processing, TAP transport, and MHC-binding efficiency, indicating the intrinsic potential of the epitopes to be recognized by immunoreactive T-cells (Data S1).
The findings of the IFNepitope program suggest that both the target RBD and NTD regions of the S protein and the membrane protein B-cell linear epitope (MBE) had a high probability of inducing the release of IFN-γ, with a positive score. A total of 56 potential positive IFN-γ inducing epitopes (15-mer) were predicted for the RBD with an average epitope prediction score of 0.255 and a maximum SVM score of 0.625. On the other hand, a total of 33 potential positive epitopes were predicted for the NTD with an average epitope prediction score of 0.312 and a maximum SVM score of 0.811. Moreover, the M protein also possessed several IFN-γ inducing epitopes having an average epitope prediction score of 0.980 (Table S4).

Table 1 Linear epitopes present on the spike (S) glycoprotein surface, predicted through ElliPro in the IEDB analysis resource based upon solvent accessibility and flexibility, are shown with their antigenicity scores. The regions highlighted in green were the potential antigenic domains, while the region highlighted in yellow represents the transmembrane domain of the S protein.

Design-construction, antigenicity and physicochemical properties of the chimeric vaccine candidate
The epitope sequences selected for designing the chimeric construct were PADRE (13 aa), MBE (20 aa), NTD (139 aa), RBD (200 aa), EBE (15 aa), and Invasin (16 aa), and the construct was named CoV-RMEN (417 aa), as shown in Fig. 3A. These segments were connected with repeats of hydrophobic (glycine; G) and acidic (glutamic acid; E) aa linkers to make the final vaccine construct more flexible, with a balanced ratio of acidic and basic amino acids. The molecular weight (MW) of the CoV-RMEN was 46.8 kDa with a predicted isoelectric point (pI) of 8.71. The projected half-life was 4.4 h in mammalian reticulocytes in vitro, >20 h in yeast in vivo and >10 h in E. coli in vivo. The protein was predicted to be less soluble upon expression, with a solubility score of 0.330. An instability index (II) value of 29.74 predicted the protein to be stable (an II of >40 indicates instability). The estimated aliphatic index was 66.59, indicating thermostability of the final chimera. The predicted grand average of hydropathicity (GRAVY) was −0.300. An antigenicity score of 0.450 was predicted by the VaxiJen 2.0 server with the virus model at a threshold of 0.4, and was further verified by ANTIGENpro with a score of 0.875 (the maximum expected score is 1.0), indicating the highly antigenic nature of the designed vaccine, CoV-RMEN. Moreover, the vaccine was also predicted to be non-allergenic by both the AllerTOP v.2 and AllergenFP servers.
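Two of the physicochemical descriptors reported above have simple closed-form definitions, sketched below for illustration. GRAVY is the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index follows Ikai's formula based on the mole percentages of Ala, Val, Ile and Leu. The function names are hypothetical and the CoV-RMEN sequence itself is not reproduced in this text, so the example runs on toy peptides only; it is not a substitute for the server calculations used in the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def predicted_stable(instability_index: float) -> bool:
    """Stability call using the cut-off quoted in the text (II > 40 is unstable)."""
    return instability_index <= 40.0


if __name__ == "__main__":
    print(gravy("EGGE"))             # the EGGE linker on its own
    print(aliphatic_index("AVIL"))   # toy peptide
    print(predicted_stable(29.74))   # instability index reported for CoV-RMEN
```

A negative GRAVY, as reported for the construct, indicates a net hydrophilic character under this scale.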

Structural characterization of the CoV-RMEN
The CoV-RMEN peptide was predicted to contain 43.2% alpha helix, 67.4% beta sheet, and 12% turns (Fig. 3B, Fig. S6) using the CFSSP (Chou and Fasman secondary structure prediction) server. Additionally, regarding the solvent accessibility of aa residues, 34%, 30% and 34% were predicted to be exposed, medium exposed and buried, respectively (Fig. S7). The RaptorX Property server predicted only two aa residues in disordered domains. The Phyre2 server predicted the tertiary structure model of the designed chimeric protein from 5 templates (c5x5bB, c2mm4A, c6vsbB, c5x29B and c6vybB) based upon heuristics to maximize confidence, alignment coverage and percent identity. In the final 3D structure of the CoV-RMEN peptide, 82% of the residues were modelled with more than 90% confidence (Fig. 3C). Moreover, 65 residues were modelled ab initio. The selected structural model had an RMSD of 0.414, a GDT-HA of 0.9538, and a MolProbity score of 2.035. The Ramachandran plot analysis of the final modelled protein showed 94.7% of the aa residues in favored regions (Fig. 3D), consistent with the 94.0% score predicted by the GalaxyRefine analysis. Additionally, 4.8% of the residues were located in allowed regions, and only 0.5% in disallowed regions (Fig. 3D). The chosen model after refinement had an overall quality factor of 74.45% with ERRAT (Fig. S8) and a ProSA-web based Z-score of −6.17 (Fig. 3E).

Molecular docking and dynamics simulation analysis
Among the selected epitopes from the RBD and NTD segments, the top five based on IC50 score (Data S1) revealed highly favorable molecular interactions for stable binding with their respective HLA alleles. The docking complexes thus formed had significantly negative binding affinity (ΔG always remained ≤−8.2 kcal mol−1, average = −9.94 kcal mol−1), and most of the amino acid (aa) residues of the epitopes were involved in molecular interactions with their respective HLA alleles (Fig. 4, Data S1). The immune responses of TLR2, TLR3 and TLR4 against the vaccine construct (CoV-RMEN) were estimated by analyzing the overall conformational stability of the vaccine protein-TLR docked complexes. The active interface aa residues of the refined complexes of CoV-RMEN and the TLRs were predicted (Fig. 5, Table 3). The relative binding free energies (ΔG) of the protein-TLR complexes were significantly negative (Table 3), which suggests that the interaction of the chimeric protein might favor stimulation of the TLR receptors. Consistently, the numbers of interface contacts (ICs) per property for the vaccine protein-TLR2 complex were: charged-charged, 5; charged-polar, 2; charged-apolar, 17; polar-polar, 1; polar-apolar, 7; and apolar-apolar, 16. For the vaccine protein-TLR3 complex, the ICs per property were: charged-charged, 16; charged-polar, 22; charged-apolar, 26; polar-polar, 6; polar-apolar, 25; and apolar-apolar, 29. The vaccine protein-TLR4 complex showed similar ICs per property: charged-charged, 5; charged-polar, 11; charged-apolar, 30; polar-polar, 4; polar-apolar, 31; and apolar-apolar, 39.
Furthermore, the molecular dynamics (MD) simulation analysis of the docked CoV-RMEN-TLR3 and CoV-RMEN-TLR4 complexes showed stable RMSD values between ∼4.35 and ∼5.4 nm for a specified time frame of 100 ps at a reasonably consistent temperature (∼300 K) and pressure (1 bar), whereas the CoV-RMEN-TLR2 complex showed RMSD values between 5.5 and 6.2 with the same cut-off parameters. These data indicate that the docked CoV-RMEN-TLR3 and CoV-RMEN-TLR4 complexes are more stable than the CoV-RMEN-TLR2 complex (Fig. 5).
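To make the interface-contact figures easier to compare across the three receptors, the short sketch below tabulates the counts quoted above and derives the total number of contacts and the share of each contact class per complex. The counts are copied from the text; the data layout and function name are illustrative choices, not part of the original analysis.

```python
CONTACT_CLASSES = (
    "charged-charged", "charged-polar", "charged-apolar",
    "polar-polar", "polar-apolar", "apolar-apolar",
)

# Interface contacts (ICs) per property as quoted in the text.
INTERFACE_CONTACTS = {
    "TLR2": (5, 2, 17, 1, 7, 16),
    "TLR3": (16, 22, 26, 6, 25, 29),
    "TLR4": (5, 11, 30, 4, 31, 39),
}


def summarize(counts):
    """Return (total contacts, {contact class: percent of total})."""
    total = sum(counts)
    shares = {
        name: round(100.0 * value / total, 1)
        for name, value in zip(CONTACT_CLASSES, counts)
    }
    return total, shares


if __name__ == "__main__":
    for receptor, counts in INTERFACE_CONTACTS.items():
        total, shares = summarize(counts)
        print(f"CoV-RMEN-{receptor}: {total} contacts")
        for name, pct in shares.items():
            print(f"  {name:<16} {pct:5.1f}%")
```

On these numbers the TLR3 and TLR4 interfaces carry 124 and 120 contacts, against 48 for TLR2, which is in line with the stability ranking reported for the docked complexes.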
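For readers unfamiliar with the RMSD metric used to judge trajectory stability, the following minimal sketch computes the root-mean-square deviation between two sets of matched atomic coordinates. It is a simplified illustration: MD analysis tools normally superimpose each frame on a reference structure before computing RMSD, a step omitted here, and the function name and toy coordinates are assumptions made for this example. The result carries the unit of the input coordinates.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates.

    Assumption: the two structures are already superimposed; no fitting
    is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    if not coords_a:
        raise ValueError("coordinate sets are empty")

    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]  # every atom displaced by 5 units
    print(rmsd(reference, reference))
    print(rmsd(reference, shifted))
```

A trajectory whose RMSD against the starting structure stays within a narrow band over time is usually read as conformationally stable, which is the interpretation applied to the TLR3 and TLR4 complexes above.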

Immune simulation
The cumulative results of the immune responses after three antigen exposures at four-week intervals revealed that the primary immune response against the antigenic fragments was elevated, as indicated by a gradual increase in IgM level after each antigen exposure (Fig. 6A). Besides, the secondary immune response, crucial for immune stability, was shown to increase with adequate generation of both IgG1 and IgG2. Also, the elevated level of all circulating immunoglobulins indicates the accuracy of the relevant clonal proliferation of the B-cell and T-cell populations. The level of cytokines after antigen exposure increased concomitantly, as reflected by the escalation of IFN-γ and IL-2, which are the most significant cytokines for the anti-viral immune response and clonal selection (Fig. 6B).

Figure caption (fragment; molecular docking of epitopes with HLA alleles): ... and (P-T) represent the top five MHC-II epitopes of the same domains bound to their respective HLA alleles. The protein-peptide docking was performed in the GalaxyWEB GalaxyPepDock server, followed by refinement using GalaxyRefineComplex, and the free energy (ΔG) of each complex was determined in the PRODIGY server. Ribbon structures represent HLA alleles and stick structures represent the respective epitopes. Light color represents the templates on which the allele and epitope structures were built. Further information on the molecular docking analysis is also available in Data S1.

Population coverage analysis
The selected CTL and HTL epitopes covered 94.9% and 73.11% of the world population, respectively. Importantly, CTL and HTL epitopes showed 98.63% population coverage worldwide when considered in combination. The highest population coverage was found to be 99.99% in the Latin American country Peru (Fig. 7, Data S2). In China, where the viral strain (SARS-CoV-2) first appeared and had more devastating outbreaks, the population coverage for CTL and HTL epitopes was 92.67% and 53.44%, respectively, with a combined coverage of 96.59%. SARS-CoV-2 is currently causing serious outbreaks in countries across different continents, including Italy, England, Spain, Iran, South Korea and the United States of America, where the combined population coverage was found to be 98.8%, 99.44%, 95.35%, 98.48%, 99.19% and 99.35%, respectively (Fig. 7A, Data S2). In addition to geographical distribution, ethnic group was also found to be an important determinant of good coverage by the CTL and HTL epitopes (Fig. 7B). Of the 147 ethnic groups studied, the Peru Amerindian group had the highest population coverage for CTL epitopes (99.98%), while the Austria Caucasoid group had the highest population coverage for HTL epitopes (88.44%) (Fig. 7B, Data S2). Furthermore, 53.06% of the ethnic groups had a combined population coverage of more than 90.0% for both CTL and HTL epitopes.
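The relationship between the separate and combined coverage figures can be illustrated with a simple union-of-probabilities calculation. The sketch below assumes that CTL and HTL coverage are statistically independent, so that the uncovered fraction is the product of the two uncovered fractions. This assumption is introduced here for illustration; the text does not state how the population coverage tool combines the two classes, although the formula does reproduce the world (98.63%) and China (96.59%) values reported above.

```python
def combined_coverage(ctl_percent: float, htl_percent: float) -> float:
    """Combined coverage (percent) of two epitope classes.

    Assumption: coverage by the two classes is independent, so
    combined = 100 - (100 - CTL) * (100 - HTL) / 100.
    """
    for value in (ctl_percent, htl_percent):
        if not 0.0 <= value <= 100.0:
            raise ValueError("coverage must be between 0 and 100 percent")
    return 100.0 - (100.0 - ctl_percent) * (100.0 - htl_percent) / 100.0


if __name__ == "__main__":
    print(round(combined_coverage(94.9, 73.11), 2))   # world figures from the text
    print(round(combined_coverage(92.67, 53.44), 2))  # China figures from the text
```

The calculation also makes clear why the combined figure is dominated by the CTL epitopes: once one class covers more than 90% of a population, the second class can only close part of the small remaining gap.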

Expression prediction of the CoV-RMEN
The length of the optimized codon sequence of the vaccine construct CoV-RMEN in E. coli (strain K12) was 1,251 nucleotides. The optimized nucleotide sequence had a Codon Adaptation Index (CAI) of 0.87 and an average GC content of 50.26%, showing the possibility of good expression of the vaccine candidate in the E. coli host (Figs. 8A-8C). Moreover, the evaluation of minimum free energy for 25 structures of the chimeric mRNA through the 'Mfold' server showed that the ΔG of the best predicted structure for the optimized construct was −386.50 kcal/mol. The first nucleotides at the 5′ end did not form a long stable hairpin or pseudoknot. Therefore, the binding of ribosomes to the translation initiation site and the subsequent translation process can be readily accomplished in the target host. These outcomes were in agreement with data obtained from the 'RNAfold' web server (Figs. 8D and 8E), where the free energy was −391.37 kcal/mol. After codon optimization and mRNA secondary structure analysis, the sequence of the recombinant plasmid was designed using SnapGene software by inserting the adapted codon sequences into the pETite vector (Lucigen, USA), which contains a SUMO (Small Ubiquitin-like Modifier) tag and a 6x-His tag facilitating both the solubilization and affinity purification of the recombinant protein (Fig. 9). As an alternative to E. coli as the expression system, the HEK-293 eukaryotic cell line was found promising for CoV-RMEN expression. The codon adaptation index (CAI) and GC content for this system were 1.0 and 61.60%, respectively, which indicates a high level of expression of the vaccine construct in the HEK-293 cell line as well.
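The two sequence-level metrics quoted above can be sketched as follows. GC content is the percentage of G and C bases, and the CAI is the geometric mean of per-codon relative adaptiveness weights. The weight table in the example is a toy, hypothetical table rather than a real E. coli or human codon-usage table, and the sketch skips refinements used by dedicated tools (for instance, excluding Met and Trp codons), so it illustrates the definitions only. The check at the end confirms that a 417 aa construct corresponds to the 1,251 coding nucleotides reported.

```python
import math


def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def split_codons(sequence: str):
    """Split a coding sequence into consecutive codons."""
    seq = sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_adaptation_index(sequence: str, weights: dict) -> float:
    """Geometric mean of relative adaptiveness weights (each in (0, 1])."""
    codon_weights = [weights[codon] for codon in split_codons(sequence)]
    if not codon_weights:
        raise ValueError("empty sequence")
    log_sum = sum(math.log(w) for w in codon_weights)
    return math.exp(log_sum / len(codon_weights))


# Hypothetical weights for illustration only; not a real codon-usage table.
TOY_WEIGHTS = {"ATG": 1.0, "GCG": 1.0, "GCA": 0.5}

if __name__ == "__main__":
    print(gc_content("ATGGCGGCA"))
    print(codon_adaptation_index("ATGGCGGCA", TOY_WEIGHTS))
    print(417 * 3)  # coding nucleotides for the 417 aa construct
```

A CAI close to 1.0 means that nearly every codon is the host's most preferred synonym, which is why the value of 1.0 reported for HEK-293 is read as favorable for expression.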

DISCUSSION
SARS-CoV-2, a virus of high zoonotic importance and transmission rate, has spread rapidly around the world and causes life-threatening COVID-19 (Gorbalenya et al., 2020). The number of SARS-CoV-2 infections and subsequent deaths is increasing day by day (Tai et al., 2020), and thus the COVID-19 outbreak was declared a public health emergency of international concern (Hui et al., 2020;Zhou et al., 2020a). The scientific community across the world is trying to develop an effective and safe vaccine against this rapidly emerging SARS-CoV-2 (Abdelmageed et al., 2020). Although a good number of vaccine candidates for COVID-19 are now under trial, and some of them have advanced to human trials (Lane, 2020), none has yet been declared effective and safe for the prevention of SARS-CoV-2 infections. Strikingly, no effective therapeutic drugs or vaccines are yet available for the treatment of SARS-CoV-2 patients (Hoque et al., 2020a). Through a comprehensive genomic and proteomic study, we endeavored to design an antigenic multi-epitope (immunodominant) chimeric vaccine for SARS-CoV-2, named CoV-RMEN (417 aa), which would avoid the risk of laboratory-escape viral transmission, reduce the cost, and may elicit immunity by selectively stimulating antigen-specific B- and T-cells.
The novel approach of designing multi-epitope-based vaccines (which include multiple conserved epitopes) aims at inducing specific cellular immunity and highly potent neutralizing antibodies against infections (Dawood et al., 2019;Yong et al., 2019;Gralinski & Menachery, 2020;Kibria, Ullah & Miah, 2020). These epitope-based vaccines also provide increased safety and have the ability to focus on sustainable immune responses because they include multiple conserved epitopes. Unlike the full-length S protein, the RBD and NTD segments possess critical neutralizing domains without any non-neutralizing immunodominant region (Ul Qamar et al., 2019;Gralinski & Menachery, 2020;Shang et al., 2020;Wrapp et al., 2020). Mutations in the RBD may enable new strains to escape neutralization by established RBD-targeting antibodies; hence, other functional regions, especially the NTD, should be considered for developing an effective vaccine as well (Wang et al., 2019;Zhou et al., 2019). Besides, combined administration of RBD and NTD proteins induced highly potent neutralizing antibodies and long-term protective immunity in animal models (Song et al., 2018). From the safety and effectiveness perspectives, the RBD and NTD are more promising candidates for the development of SARS-CoV-2 vaccines than the full-length S protein. The presence of E and M proteins on the envelope can augment the immune response against SARS-CoV (Millet & Whittaker, 2015;Almofti et al., 2018), and these proteins are thus considered suitable candidates for vaccine development (Yong et al., 2020;Ahmed, Quadeer & McKay, 2020;Gralinski & Menachery, 2020). Thus, antibodies against the immunologically substantial epitopes of the S, M and E proteins of SARS-CoV-2 would provide protective immunity against the infection (Yong et al., 2020;Ahmed, Quadeer & McKay, 2020;Gralinski & Menachery, 2020;Shang et al., 2020). Therefore, an immune response targeting the RBD and/or NTD of the S glycoprotein and the M and E proteins of SARS-CoV-2 would be an important prophylactic and therapeutic intervention, which can be tested further in suitable models before clinical trials (Chan et al., 2020).
Effective immunity to viral infections is significantly dependent on the activation of both B- and T-cells (Shi et al., 2015;Shey et al., 2019). Therefore, to induce specific humoral or cellular immunity against pathogens, an ideal vaccine should contain both B-cell and T-cell epitopes. Our analyses revealed that the selected RBD and NTD regions of the CoV-RMEN contain an ample number of high-affinity B-cell, MHC Class I, MHC Class II and interferon-γ (IFN-γ) epitopes with high antigenicity scores. Moreover, the membrane B-cell epitope (MBE) and envelope B-cell epitope (EBE) enhanced the overall stability, immunogenicity and antigenicity of the CoV-RMEN. The development of memory B-cells and T-cells was evident, with memory in B-cells lasting for several months. This finding contrasts with several earlier reports in which the T-cell mediated immune response was considered a long-lasting response compared to that of B-cells (Abdelmageed et al., 2020;Wrapp et al., 2020). Another notable finding of this study was the development of a Th1 response, which enhances the growth and proliferation of B-cells, augmenting the adaptive immunity (Carvalho et al., 2002). If a strong B-cell response occurs in animal trials (mice or rabbits), these antibodies could be used for diagnostic purposes, as they should recognize the prominent antigens on the viral surface (Kibria, Ullah & Miah, 2020). Moreover, CD8+ and CD4+ T-cell responses play a major role in antiviral immunity (Abdelmageed et al., 2020). Another crucial fact is that Toll-Like Receptors (TLRs) can effectively bind the spike protein of the CoV (Totura et al., 2015;Zander et al., 2017), and might play an important role in the innate immune response to SARS-CoV-2 infection (Shahabi et al., 2020).
The physicochemical properties also revealed the chimera to be a basic or alkaline protein (pI = 8.71) that would be thermostable upon expression, and thus our proposed vaccine CoV-RMEN would be well suited for worldwide use in different endemic areas (Shey et al., 2019;Ul Qamar et al., 2019). The structural forms (secondary and tertiary) of the CoV-RMEN, when tested as synthetic peptides, showed the ability to fold into their native structure, and hence could mimic the natural infection by SARS-CoV-2 (Almofti et al., 2018). The refined tertiary (3D) structure of the final vaccine construct presented the desirable structural features based on Ramachandran plot predictions (Shey et al., 2019;Srivastava et al., 2019). Molecular docking analysis showed that the predicted chimeric protein can establish stable protein-protein interactions with TLRs (TLR-2, TLR-3, TLR-4) (Totura et al., 2015). An efficient activation of surface molecules by the CoV-RMEN is crucial for the immune activation of dendritic cells and the subsequent antigen processing and presentation to CD4+ and CD8+ T-cells via MHC-II and MHC-I, respectively (Shi et al., 2015;Shey et al., 2019;Shang et al., 2020). The molecular dynamics simulation also revealed that the docked CoV-RMEN-TLR complexes were stable, with higher binding affinity for TLR-3 and TLR-4 (Abraham et al., 2015). Furthermore, the CoV-RMEN showed good antigenicity scores on VaxiJen v2.0 and ANTIGENpro, indicating that these peptide sequences are expected to be highly antigenic in nature (Shey et al., 2019). The non-allergenic properties of the CoV-RMEN further strengthen its potential as a vaccine candidate (Shey et al., 2019;Ul Qamar et al., 2019).
Immune simulation of the CoV-RMEN gave results consistent with typical immune responses, and the immune responses grew after the recurrent antigen exposures (Fig. 6). The levels of the antiviral cytokine IFN-γ and the cell-stimulatory IL-2 increased significantly, which also contributes to the subsequent immune response after vaccination in the host (Almofti et al., 2018). This indicates high levels of helper T-cells and consequently efficient antibody production, supporting a humoral response (Shey et al., 2019). A lower IC50 value indicates higher binding affinity of the epitopes for the MHC class I and II molecules. While most of the previous studies (Sakib et al., 2014;Adhikari, Tayebi & Rahman, 2018) reported that a binding affinity (IC50) threshold of 250 nM identifies peptide binders recognized by T-cells, and that this threshold can be used to select peptides, we kept the binding affinity within 50 nM to obtain a better confidence level in predicting epitopes for MHC-I and MHC-II alleles. The clonal specificity estimated by the Simpson index suggested a possible diverse immune response, which is plausible considering that the generated chimeric peptide is composed of sufficient B- and T-cell epitopes (Fig. 6).
The interaction between the T-cell epitopes and their respective HLA alleles revealed significant binding affinity, reflecting the immune activation of B- and T-cells as supported by other reports (Srivastava et al., 2019;Jaimes et al., 2020). T-cell epitopes from the RBD and NTD regions showing high interaction with HLA alleles covered more than 98% of the world population across different ethnic groups, and these findings are corroborated by many earlier studies (Huang et al., 2007;Jaimes et al., 2020;Ul Qamar et al., 2019). The incorporation of GG and EGGE linkers between the predicted epitopes of the CoV-RMEN produced sequences with minimized junctional immunogenicity, and allowed the rational design and construction of a potent multi-epitope vaccine (Huang et al., 2007;Badawi et al., 2016;Shey et al., 2019;Srivastava et al., 2019). The glycosylation of surface antigens helps enveloped viruses evade recognition by the host immune system, and can influence the ability of the host to raise an effective adaptive immune response (Pereira et al., 2018) or even be exploited by the virus to enhance infectivity (Wolfert & Boons, 2013). Moreover, some antibodies such as mAb 8ANC195 have evolved to recognize peptide epitopes with no dependence on glycan binding (Kong et al., 2015). However, no data are available on whether the recognition of SARS-CoV-2 spike glycoproteins by specific antibodies is interfered with by the glycosylation of the spike, strengthened by sugars close to the peptide epitope, or unaffected by sugar modification (Zhou et al., 2020b). Notably, most of the epitopes of the CoV-RMEN harbor no glycoside apart from the NTD region (Watanabe et al., 2020). Furthermore, most of the glycans of the NTD epitopes are present at the termini of the putative HLA antigens, and may not interfere with antigen presentation in an HLA complex (Grant et al., 2020).
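The effect of tightening the binding threshold from 250 nM to 50 nM can be sketched as a simple filter over predicted peptide-allele pairs. The record layout, the function name and the example rows below are hypothetical and do not come from Data S1; the peptide names are placeholders, and the 50 nM cut-off is treated as inclusive by assumption.

```python
from typing import List, NamedTuple


class BindingPrediction(NamedTuple):
    peptide: str
    allele: str
    ic50_nm: float  # predicted IC50 in nM; lower means stronger binding


def strong_binders(predictions: List[BindingPrediction],
                   cutoff_nm: float = 50.0) -> List[BindingPrediction]:
    """Keep predictions at or below the cut-off, strongest binders first."""
    kept = [p for p in predictions if p.ic50_nm <= cutoff_nm]
    return sorted(kept, key=lambda p: p.ic50_nm)


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    example = [
        BindingPrediction("PEPTIDE_A", "HLA-A*02:01", 1.4),
        BindingPrediction("PEPTIDE_B", "HLA-B*15:01", 49.9),
        BindingPrediction("PEPTIDE_C", "HLA-A*24:02", 180.0),
    ]
    print(len(strong_binders(example, cutoff_nm=250.0)))  # conventional threshold
    print(len(strong_binders(example)))                   # stricter 50 nM threshold
```

The stricter cut-off trades a shorter epitope list for higher confidence in each retained binder, which is the rationale given in the text.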
Integrity of the ACE2 receptor of the RBD, envelope protein B-cell epitope (EBE) and membrane protein B-cell epitope (MBE) in the CoV-RMEN suggests that the vaccine may maintain efficacy despite antigenic drift and glycosylation phenomena, as long as the virus continues to target the same host receptor (Grant et al., 2020).
One of the first steps in validating a candidate vaccine is to screen for immunoreactivity through serological analysis. This requires the expression of the recombinant protein in a suitable host. As we focused on the epitopes without the glycosylation or non-significant glycosylation, high-level expression of the vaccine was optimized into well-established and cost-effective prokaryotic expression system E. coli K-12 strain as the first choice using the plasmid pETite containing SUMO (Small Ubiquitin-like Modifier) tag and 6x-His tag facilitated both the solubilization and affinity purification of the recombinant protein (Biswal et al., 2015). Codon optimization of the CoV-RMEN revealed its high-level expression in E. coli (strain K12). Stable mRNA structure, codon adaptability index (0.87), and the GC content (50.26%) were favourable for high-level expression of the protein in the bacterium. After successful cloning of the gene, the recombinant plasmid can be propagated efficiently using E. coli cells, and subsequent protein expression can be performed in E. coli K-12 strain using IPTG (Isopropyl β-d-1-thiogalactopyranoside) induction, and cultivation at 28 • C as also reported earlier (Biswal et al., 2015).
As an alternative to E. coli, the eukaryotic cell line HEK-293 was considered for CoV-RMEN expression. The codon adaptation index (CAI) and GC content were 1.0 and 61.60%, respectively, which indicate a high level of expression of the vaccine construct in the HEK-293 cell line. In this case, the pSecTag2 mammalian expression vector could be used, which has a secretion signal from the V-J2-C region of the mouse Ig kappa-chain for efficient secretion of the recombinant protein, a C-terminal poly-histidine (6xHis) tag for rapid purification, and a C-terminal c-myc epitope for detection with an anti-myc antibody. To remove the affinity and detection tags (His tag and c-myc epitope) after purification, a Factor Xa cleavage site (LVPR↓GS) could be added to the C-terminal of the CoV-RMEN (Waugh, 2011).

Sequence retrieval and structural analysis
A total of 250 partial and complete genome sequences of SARS-CoV-2 were retrieved from NCBI (National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/protein) (Table S5). We aligned these sequences through the MAFFT online server (https://mafft.cbrc.jp/alignment/server/) using default parameters, and Wu-Kabat protein variability was analyzed (Fig. S9) in the protein variability server (http://imed.med.ucm.es/PVS/) for the SARS-CoV-2 NCBI reference genome (Accession no: NC_045512.2). We retrieved the S protein sequences of the SARS-CoV and MERS-CoV from the whole genome reference sequences of the respective three viruses from the NCBI database. Moreover, the S proteins of SARS-CoV (GenBank accession no: NC_004718.3), we adopted the "Motif and SVM hybrid" (MERCI: Motif-EmeRging and with Classes-Identification, and SVM) approach. The prediction is based on a dataset of IFN-γ-inducing and IFN-γ-noninducing MHC allele binders (Dhanda, Vir & Raghava, 2013).
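The Wu-Kabat variability coefficient mentioned above is commonly defined per alignment position as N × k / n, where N is the number of sequences, k the number of distinct residues observed at that position, and n the count of the most frequent residue. The following is a minimal sketch of that calculation, not the PVS server's implementation; the function names, the decision to ignore gap characters, and the toy alignment are assumptions made for illustration.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Wu-Kabat coefficient for one alignment column: N * k / n."""
    # Assumption: gap characters are ignored when counting residues.
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n_total = len(residues)                      # N: sequences with a residue here
    k_distinct = len(counts)                     # k: number of different residues
    n_most_common = counts.most_common(1)[0][1]  # n: count of the commonest residue
    return n_total * k_distinct / n_most_common


def variability_profile(aligned_seqs):
    """Return the Wu-Kabat coefficient for every column of an alignment."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [wu_kabat_variability(col) for col in zip(*aligned_seqs)]


# Hypothetical toy alignment (not taken from the study's data).
toy_alignment = ["MFVFL", "MFVFL", "MFIFL", "MYVFL"]
profile = variability_profile(toy_alignment)
```

A fully conserved column yields a coefficient of 1, and larger values indicate more variable positions, which is why low-variability regions are attractive sources of epitopes.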

Design and construction of multi-epitope vaccine candidate (CoV-RMEN)
The design and construction of the candidate vaccine (denoted as 'CoV-RMEN') followed previously published peptide vaccine development protocols for different emerging infectious diseases such as SARS and MERS (Shi et al., 2015;Badawi et al., 2016;Almofti et al., 2018;Shey et al., 2019;Srivastava et al., 2019;Ul Qamar et al., 2019). The multi-epitope protein was constructed by positioning the selected RBD, NTD, MBE and EBE aa sequences linked with short, rigid and flexible GG linkers. To develop a highly immunogenic recombinant protein, two universal T-cell epitopes, namely a pan-human leukocyte antigen DR-binding peptide (PADRE) (Agadjanyan et al., 2005) and an invasin immunostimulatory sequence taken from Yersinia (Invasin) (Li et al., 2015), were added to the N and C termini of the vaccine construct, respectively, linked by EGGE.
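The assembly scheme described above can be sketched as simple string concatenation. This is an illustrative sketch only: the function name, the order of the four epitope segments, and all sequences below are placeholders assumed for the example, not the actual CoV-RMEN sequences.

```python
EPITOPE_LINKER = "GG"     # joins the RBD, NTD, MBE and EBE segments
ADJUVANT_LINKER = "EGGE"  # joins the universal T-cell epitopes to the core


def assemble_construct(n_term_helper, epitopes, c_term_helper):
    """Build: N-terminal helper + EGGE + epitopes joined by GG + EGGE + C-terminal helper."""
    core = EPITOPE_LINKER.join(epitopes)
    return ADJUVANT_LINKER.join([n_term_helper, core, c_term_helper])


# Placeholder sequences (hypothetical; substitute the real peptides).
padre_placeholder = "AAAAA"    # stands in for PADRE
invasin_placeholder = "TTTTT"  # stands in for the invasin sequence
segments = ["RRRR", "NNNN", "MMMM", "LLLL"]  # stand-ins for RBD, NTD, MBE, EBE

construct = assemble_construct(padre_placeholder, segments, invasin_placeholder)
```

Keeping the linkers as named constants makes it straightforward to test alternative linkers or epitope orders when screening constructs for junctional immunogenicity.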
Furthermore, the local structural quality of the CoV-RMEN was refined with the GalaxyRefine server, and ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) was used to calculate the overall quality score for the refined structure. The ERRAT server (http://services.mbi.ucla.edu/ERRAT/) was also used to analyze non-bonded atom-atom interactions compared to reliable high-resolution crystallographic structures. A Ramachandran plot was obtained through the RAMPAGE server (Lovell et al., 2003).

Physicochemical properties prediction of CoV-RMEN
The online web-server ProtParam (Gasteiger et al., 2005) was used to assess various physicochemical parameters of the CoV-RMEN, including aa residue composition, molecular weight, theoretical pI, instability index, in vitro and in vivo half-life, aliphatic index, and grand average of hydropathicity (GRAVY). The solubility of the multi-epitope vaccine peptide was evaluated using the Protein-Sol server (https://protein-sol.manchester.ac.uk/).
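Two of the listed parameters have simple closed forms: GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below illustrates these two formulas; it is not ProtParam's code, the function names are assumptions, and the input sequences are toy examples.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy sequences (hypothetical, for illustration only).
toy_gravy = gravy("AV")
toy_aliphatic = aliphatic_index("AVIL")
```

A negative GRAVY suggests an overall hydrophilic protein, while a higher aliphatic index is generally read as an indicator of greater thermostability.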

Analysis of cDNA and mRNA for cloning and expression of CoV-RMEN
Reverse translation and codon optimization were performed using the GenScript Rare Codon Analysis Tool (https://www.genscript.com/tools/rare-codon-analysis) in order to express the CoV-RMEN in E. coli (strain K12). Stability of the mRNA was verified using two different tools, namely the RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) and mfold (http://unafold.rna.albany.edu/?q=mfold) web-servers. The optimized gene sequence of CoV-RMEN will be artificially synthesized with an N-terminal recombinant human rhinovirus (HRV 3C) protease site (LEVLFQ↓GP), and the final construct will be cloned into the pETite vector (Lucigen, USA) through an enzyme-free method (Waugh, 2011). Finally, the sequence of the recombinant plasmid was designed by inserting the adapted codon sequences into the pETite vector using SnapGene software (from Insightful Science; available at snapgene.com). As an alternative to E. coli, the eukaryotic expression system HEK-293 was optimized using a similar analysis for vaccine production.
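The two sequence-level metrics reported for the optimized gene, GC content and the codon adaptation index (CAI), can be illustrated with a short sketch. CAI is the geometric mean of the relative adaptiveness w of each codon, where w is a codon's usage divided by the usage of the most frequent synonymous codon in the host. This is not the GenScript tool's implementation; the function names are assumptions, and the codon usage counts are hypothetical toy values rather than real E. coli K12 or human usage tables.

```python
import math


def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def relative_adaptiveness(codon_usage, synonymous_groups):
    """w(codon) = usage of codon / usage of the most used synonymous codon."""
    weights = {}
    for codons in synonymous_groups:
        top = max(codon_usage[c] for c in codons)
        for c in codons:
            weights[c] = codon_usage[c] / top
    return weights


def cai(dna, weights):
    """Geometric mean of w over the codons that appear in the weight table."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


# Hypothetical usage counts for Lys (AAA, AAG) and Glu (GAA, GAG) codons only.
toy_usage = {"AAA": 30, "AAG": 10, "GAA": 40, "GAG": 20}
toy_groups = [("AAA", "AAG"), ("GAA", "GAG")]
toy_weights = relative_adaptiveness(toy_usage, toy_groups)

cai_optimal = cai("AAAGAA", toy_weights)     # uses only the preferred codons
cai_suboptimal = cai("AAAGAG", toy_weights)  # one non-preferred codon
toy_gc = gc_content("ATGC")
```

A sequence built only from the host's preferred codons reaches a CAI of 1.0, which is how the value reported for the HEK-293-optimized construct should be read.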

Population coverage by CTL and HTL epitopes
The predicted T-cell epitopes were shortlisted based on the aligned Artificial Neural Network (ANN)-predicted half-maximal inhibitory concentration (annIC50) values of up to 50 nM. The IEDB "Population Coverage" tool (http://tools.iedb.org/population/) was used to determine the world human population coverage by the shortlisted CTL and HTL epitopes (Bui et al., 2006). We used OmicsCircos to visualize the association between world population and different ethnic groups (Hoque et al., 2020b).
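The shortlisting step and the idea behind population coverage can be sketched as follows. This is a simplified illustration, not the IEDB tool's algorithm: it assumes Hardy-Weinberg proportions within each HLA locus and independence between loci, and every epitope label, allele frequency, and function name below is hypothetical.

```python
IC50_CUTOFF_NM = 50.0  # threshold stated in the text


def shortlist(predictions, cutoff=IC50_CUTOFF_NM):
    """Keep (epitope, allele, ic50_nM) records with IC50 at or below the cutoff."""
    return [p for p in predictions if p[2] <= cutoff]


def locus_coverage(allele_freqs, covered_alleles):
    """Fraction of individuals carrying at least one covered allele at a locus.

    Assumes Hardy-Weinberg proportions: 1 - (1 - sum of covered frequencies)^2.
    """
    p = sum(f for allele, f in allele_freqs.items() if allele in covered_alleles)
    return 1.0 - (1.0 - p) ** 2


def combined_coverage(per_locus):
    """Combine locus-level coverages, assuming the loci are independent."""
    uncovered = 1.0
    for c in per_locus:
        uncovered *= 1.0 - c
    return 1.0 - uncovered


# Hypothetical predictions and allele frequencies (not study data).
predictions = [
    ("PEPTIDE_A", "HLA-A*02:01", 12.0),
    ("PEPTIDE_B", "HLA-A*24:02", 48.5),
    ("PEPTIDE_C", "HLA-B*07:02", 310.0),  # filtered out: above 50 nM
    ("PEPTIDE_D", "HLA-B*08:01", 20.0),
]
freqs_a = {"HLA-A*02:01": 0.25, "HLA-A*24:02": 0.15, "HLA-A*01:01": 0.10}
freqs_b = {"HLA-B*07:02": 0.10, "HLA-B*08:01": 0.20}

kept = shortlist(predictions)
covered = {allele for _, allele, _ in kept}
coverage = combined_coverage(
    [locus_coverage(freqs_a, covered), locus_coverage(freqs_b, covered)]
)
```

In this toy case the three retained epitopes bind alleles that together cover roughly 77% of the hypothetical population, which shows why adding epitopes restricted by further common alleles raises coverage.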

CONCLUSIONS
This multi-epitope peptide vaccine candidate, CoV-RMEN, possesses potential epitopes from the RBD and NTD segments of the spike (S) protein and from the M and E proteins, retaining potential antigenicity and non-allergenicity properties. This chimera, suitable for high-level expression and cloning, includes potential CTL, HTL and B-cell epitopes ensuring humoral and cell-mediated immunity, and the immune simulation predicted increased production of immunoglobulins and cytokines. Molecular docking and dynamics simulation of the CoV-RMEN with the immune receptors (TLRs) predicted strong binding affinity, in particular with TLR3 and TLR4. Remarkably, the CoV-RMEN had more than 90.0% world population coverage for different ethnic groups. The limitation posed by the small number of available genome sequences of SARS-CoV-2, which tends to mutate frequently, may not affect our analysis, since we included four highly conserved peptides from three major proteins of the SARS-CoV-2 genome in the multi-epitope vaccine. However, future in vitro and in vivo studies are required to assess the potential of the designed vaccine candidate to induce a positive immune response against SARS-CoV-2 infections, and also to validate the results obtained herein through immuno-informatics analyses.