Epitope-based chimeric peptide vaccine design against S, M and E proteins of SARS-CoV-2 etiologic agent of global pandemic COVID-19: an in silico approach

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongoing pandemic of coronavirus disease 2019 (COVID-19), a public health emergency of international concern declared by the World Health Organization (WHO). An immuno-informatics approach along with comparative genomics was applied to design a multi-epitope-based peptide vaccine against SARS-CoV-2 combining the antigenic epitopes of the S, M and E proteins. The tertiary structure was predicted, refined and validated using advanced bioinformatics tools. The candidate vaccine showed an average of ≥ 90.0% world population coverage for different ethnic groups. Molecular docking of the chimeric vaccine peptide with the immune receptors (TLR3 and TLR4) predicted efficient binding. Immune simulation predicted a significant primary immune response with increased IgM and a secondary immune response with high levels of both IgG1 and IgG2. It also predicted increased proliferation of T-helper cells and cytotoxic T-cells along with increased IFN-γ and IL-2 cytokines. The codon optimization and mRNA secondary structure prediction revealed that the chimera is suitable for high-level expression and cloning. Overall, the constructed recombinant chimeric vaccine candidate demonstrated significant potential and can be considered for clinical validation to fight against this global threat, COVID-19.


Recognition of T-cell epitopes
The IEDB analysis resource tool was employed to identify T-cell epitopes in the RBD [...] the IEDB MHC-II prediction tool generated 124 13-mer peptides from the RBD and 73 10-mer peptides from the NTD segments of the S protein that interacted with many different and/or common MHC-II alleles, with IC50 values ranging from 1.4 to 49.9 nM (Supplementary Data 1). Furthermore, for MHC-I and MHC-II processing, the IEDB analysis tool generates an overall score for each epitope's intrinsic potential to be a T-cell epitope, based on proteasomal processing, TAP transport, and MHC-binding efficiency (Supplementary Data 1). The outcomes of these tools are substantial because they consider a vast number of HLA (human leukocyte antigen) alleles during computation.
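As a rough illustration of the peptide enumeration and affinity filtering described above, the following Python sketch slides a fixed-length window along a protein sequence and keeps predicted binders below an IC50 cutoff. It does not call the IEDB tools. The toy sequence, the example records with their IC50 values, and the 50 nM cutoff (chosen only because the reported IC50 values fall below 50 nM) are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

# (peptide, allele, predicted IC50 in nM)
Prediction = Tuple[str, str, float]


def sliding_peptides(sequence: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every k-residue window."""
    for start in range(len(sequence) - k + 1):
        yield start + 1, sequence[start:start + k]


def strong_binders(predictions: List[Prediction],
                   ic50_cutoff_nm: float = 50.0) -> List[Prediction]:
    """Keep records with IC50 below the cutoff, strongest binder first.

    The 50 nM default is an assumption for this sketch, not a value
    taken from the IEDB documentation.
    """
    kept = [p for p in predictions if p[2] < ic50_cutoff_nm]
    return sorted(kept, key=lambda p: p[2])


# Placeholder sequence (the 20 standard residues), not the real RBD or NTD.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
windows = list(sliding_peptides(toy_sequence, 13))

# Made-up prediction records for illustration only.
toy_predictions = [
    ("ACDEFGHIKLMNP", "HLA-DRB1*01:01", 12.5),
    ("CDEFGHIKLMNPQ", "HLA-DRB1*07:01", 310.0),
    ("DEFGHIKLMNPQR", "HLA-DRB1*15:01", 3.2),
]
selected = strong_binders(toy_predictions)
```

A sequence of length L yields L - k + 1 overlapping k-mers, so the number of candidate peptides grows with the length of the segment being scanned.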

Molecular docking analysis of T-cell epitopes with HLA alleles
From the epitopes selected from the RBD and NTD segments, the top five based on IC50 score were used in molecular docking analysis with their respective HLA allele binders using the GalaxyWeb server, and they showed favorable molecular interactions in terms of binding affinity. The resulting docking complexes had significantly negative binding energy, and most of the aa residues of the epitopes were involved in molecular interactions with their respective HLA alleles (Supplementary Data 1). The epitope-HLA docking complexes were further refined with GalaxyRefineComplex, and their binding affinity was analyzed with the PRODIGY web server. All of the selected epitopes showed significantly negative binding affinity (ΔG always remained ≤ -8.2 kcal mol⁻¹, average = -9.94 kcal mol⁻¹; Fig. 4, Supplementary Data 1).
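To give the reported binding free energies a more intuitive scale, a ΔG value can be converted to an equilibrium dissociation constant through the standard thermodynamic relation ΔG = RT ln Kd. The sketch below applies that relation to the two ΔG values quoted above. The temperature of 25 °C is an assumption made for this example and is not stated in the text, and the function name is hypothetical.

```python
import math

# Gas constant in kcal mol^-1 K^-1
GAS_CONSTANT_KCAL = 1.987204e-3


def dissociation_constant(delta_g_kcal_per_mol: float,
                          temperature_c: float = 25.0) -> float:
    """Return Kd in mol/L from a binding free energy, using dG = RT ln(Kd).

    The 25 degC default is an assumption for this sketch.
    """
    temperature_k = temperature_c + 273.15
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


kd_weakest = dissociation_constant(-8.2)   # weakest reported complex
kd_average = dissociation_constant(-9.94)  # average over selected epitopes

print(f"Kd at -8.2 kcal/mol:  {kd_weakest:.2e} M")
print(f"Kd at -9.94 kcal/mol: {kd_average:.2e} M")
```

Under this assumption, a more negative ΔG corresponds to a smaller Kd, that is, tighter predicted binding; the predicted values remain computational estimates rather than measured affinities.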

IFN-γ inducing epitope prediction
The IFNepitope program suggested that both target regions of the S protein (RBD and NTD) and the B-cell linear epitope (MBE) had a high probability of inducing IFN-γ release, with positive scores. A total of 56 potential positive IFN-γ inducing epitopes (15-mer) were predicted for the RBD, with an average epitope prediction score of 0.255 and a maximum SVM score of 0.625. On the other hand, a total of 33 potential positive epitopes were predicted for the NTD, with an average epitope prediction score of 0.312 and a maximum SVM score of 0.811. Moreover, the M protein also possessed several IFN-γ inducing epitopes with an average epitope prediction score of 0.980 (Supplementary Table 4 [...] (Fig. 5a, Supplementary Data 2). In addition to geographical distribution, ethnic group was also found to be an important determinant of good coverage of the CTL and HTL epitopes (Fig. 5b).

Secondary and tertiary structures prediction of the CoV-RMEN
The CoV-RMEN peptide was predicted to contain 43.2% alpha helix, 67.4% beta sheet, and 12% turns (Fig. 6b, Supplementary Fig. 5) using the CFSSP (Chou and Fasman secondary structure prediction) server. In addition, with regard to solvent accessibility of aa residues, 34% were predicted to be exposed, 30% medium exposed, and 34% buried. Only 2 aa residues (0.0%) were predicted to be located in disordered domains by the RaptorX Property server (Supplementary Fig. 6). The Phyre2 server predicted the tertiary structure model of the designed chimeric protein from 5 templates (c5x5bB, c2mm4A, c6vsbB, c5x29B and c6vybB), selected by heuristics to maximize confidence, percent identity and alignment coverage. In the final model of the CoV-RMEN peptide, 82% of residues were modelled with more than 90% confidence (Fig. 6c). Moreover, 65 residues were modelled ab initio.

Immune simulation
The immune-stimulatory ability of the predicted vaccine CoV-RMEN was assessed with the C-ImmSim server. The analysis predicts the generation of adaptive immunity in the target host species (human) using position-specific scoring matrices (PSSM) and machine learning techniques for the prediction of epitopes and immune interactions 28. The cumulative results of immune responses after three antigen exposures at four-week intervals revealed that the primary immune response against the antigenic fragments was elevated, as indicated by a gradual increase in IgM level after each antigen exposure (Fig. 7a). Besides, the secondary [...]

Molecular docking of CoV-RMEN with immune receptors (TLR3 and TLR4)
The ClusPro server was used to determine the protein binding and hydrophobic interaction sites on the protein surface. The immune responses of TLR3 and TLR4 against the vaccine construct (CoV-RMEN) were estimated by analyzing the overall conformational stability of the docked vaccine protein-TLR complexes. The active interface aa residues of the refined complexes of CoV-RMEN and TLRs were predicted (Fig. 8, Table 3). The relative binding free energies (ΔG) of the protein-TLR complexes were significantly negative (Table 3) [...] content is between 30% and 70% (Fig. 9a,b,c).

Prediction of mRNA secondary structure of the CoV-RMEN
The minimum free energy of 25 structures of the chimeric mRNA was evaluated for the optimized sequences using the 'Mfold' server. The results showed that the best predicted structure for the optimized construct had ΔG = -386.50 kcal/mol. The first nucleotides at the 5' end did not form a long stable hairpin or pseudoknot. Therefore, ribosome binding to the translation initiation site and the subsequent translation process can be readily accomplished in the target host. These outcomes were in agreement with data obtained from the 'RNAfold' web server (Fig. 9d,e), where the free energy was -391.37 kcal/mol.

Expression of CoV-RMEN with SUMO-fusion
After codon optimization and mRNA secondary structure analysis, the sequence of the [...]
[...] chimeric peptide vaccine production does not involve virus replication, and therefore reduces the cost of production. Hence, a low-cost strategy should be adopted for developing a highly demanded vaccine for mankind. Heterologous expression of a vaccine candidate protein has very promising scope for developing such a low-cost vaccine, provided that all essential properties for antigenicity, immunogenicity and functional configuration are conserved to mimic the structural and functional properties of the actual antigen 34. Construction of a vaccine candidate with multiple potential epitopes can potentiate the multi-valency of the antigen to develop an immune response against a number of epitopes of a pathogen. Also, rational engineering of epitopes for increased potency and magnitude, the ability to enhance the immune response to conserved epitopes, increased safety, the absence of unnecessary viral material, and cost effectiveness are cumulative potential benefits of multi-epitope recombinant protein-based vaccines 20. This study was designed to assist with the initial phase of multi-epitope vaccine candidate selection, and thereby with safe and effective vaccine development, by providing recommendations of epitopes that may potentially be considered for incorporation in vaccine design for SARS-CoV-2. [...]
Vaccine design is improved through the use of specialized spacer sequences 39. To design the CoV-RMEN (vaccine candidate), GG and EGGE linkers were incorporated between the predicted epitopes to produce sequences with minimized junctional immunogenicity, thereby allowing the rational design of a potent multi-epitope vaccine 21,38.
The molecular weight of our vaccine candidate, CoV-RMEN, is 46.8 kDa with a predicted theoretical pI of 8.71, indicating that the protein is basic in nature. Also, the predicted instability index indicates that the protein will be stable upon expression, further strengthening its potential for use. The aliphatic index showed that the protein contains aliphatic side chains, indicating potential hydrophobicity. All these parameters indicate that the recombinant protein is thermally stable and hence would be well suited for use in different endemic areas worldwide 6,21.
Knowledge of the secondary and tertiary structures of the target protein is essential in vaccine design 39,40. Secondary structure analysis of the CoV-RMEN indicated that the protein consisted of 43.2% alpha helix, 67.4% beta sheet, and 12% turns, with only 2 residues disordered. [...]
[...] all of which showed significant antigenic properties compared to any other viral proteins. This chimera also includes potential CTL, HTL and B-cell epitopes to ensure humoral as well as cellular immune responses, and the optimal expression and stability of the chimera were validated. Given the multiple limitations and high costs of preparing attenuated vaccines for contagious agents like SARS-CoV-2, this chimeric peptide vaccine candidate offers hope of an available and relatively cheap option that can reach the entire world. CoV-RMEN can be a very effective measure against COVID-19 globally. Hence, it could be cloned, expressed and tested in in vivo validations and animal trials at the laboratory level.
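The aliphatic index mentioned in the physicochemical discussion above is commonly computed with Ikai's formula, which weights the mole percentages of alanine, valine, isoleucine and leucine. The sketch below implements that formula as it is used by tools such as ExPASy ProtParam. Whether the authors used exactly this definition is an assumption, and the example sequence is a placeholder, not the CoV-RMEN sequence.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index after Ikai (1980):

        AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu))

    where X(.) is the mole percent of each residue in the sequence.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Placeholder peptide for illustration only.
example = "MAVILGGEGGEKLV"
print(f"Aliphatic index: {aliphatic_index(example):.1f}")
```

A higher value reflects a larger relative volume occupied by aliphatic side chains, which is the property usually linked to thermostability of globular proteins.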

Sequence retrieval and structure generation
A total of 250 partial and complete genome sequences of SARS-CoV-2 were retrieved from NCBI (Supplementary Table 5 [...]

Linear B-cell epitopes prediction
We employed both structure- and sequence-based methods for B-cell epitope prediction. Conformational B-cell epitopes on the S protein were predicted by ElliPro (Antibody Epitope Prediction tool; http://tools.iedb.org/ellipro/), available in the IEDB analysis resource 51, with the minimum score value set at 0.4 and the maximum distance set at 6 Å. ElliPro allows the prediction and visualization of B-cell epitopes in a given protein sequence or structure. The ElliPro method is based on the location of the residue in the protein's three-dimensional (3D) structure. ElliPro implements three algorithms to approximate the protein shape as an ellipsoid, calculate the residue protrusion index (PI), and cluster neighboring residues based on their PI values. Residues lying outside the ellipsoid that covers 90% of the inner core residues of the protein are assigned a PI of 0.9 23. Antigenicity of the full-length S (spike glycoprotein), M (membrane protein) and E (envelope protein) proteins was predicted using [...]
[...] The GalaxyRefine server was further used to improve the local structural quality of the CoV-RMEN according to the CASP10 assessment, and ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) was used to calculate an overall quality score for a specific input structure, displayed in the context of all known protein structures. The ERRAT server (http://services.mbi.ucla.edu/ERRAT/) was also used to analyze non-bonded atom-atom interactions compared to reliable high-resolution crystallographic structures. A Ramachandran plot was obtained through the RAMPAGE server (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php). The server uses the PROCHECK principle to validate a protein structure with a Ramachandran plot and produces separate plots for glycine and proline residues 64.

Immune simulation
To further characterize the immunogenicity and immune response profile of the CoV-RMEN, in silico immune simulations were conducted using the C-ImmSim server [...]
Figure legend (fragment): The red, cyan, and yellow colored regions represent the potential antigenic domains predicted by the IEDB analysis resource ElliPro analysis.
Figure legend (fragment): [...] GalaxyWEB-GalaxyPepDock server followed by refinement using GalaxyRefineComplex, and the free energy (ΔG) of each complex was determined with the PRODIGY server. Ribbon structures represent HLA alleles and stick structures represent the respective epitopes. Light color represents the templates on which the allele and epitope structures were built. Further information on the molecular docking analysis is also available in Supplementary Data 1.