Introduction

The world witnessed a fatal outbreak of a novel coronavirus (COVID-19) infection, also known as severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), at the end of the year 2019 [1, 2]. It was preliminary identified as a causative agent for a series of unusual pneumonia in Wuhan City, China, and later led to the sporadic inter-continental pandemic [1, 3, 4]. On March 2020, as the infection has crossed most international borders, WHO designated the pathogen as SARS-CoV-2 and coronavirus disease 2019 (COVID-2019) (WHO.int). By January 2022, more than > 31.8 million confirmed cases with > 5.5 million death reported in 208 countries (WHO.int). The pandemic outbreak affected millions of lives globally due to mandatory isolations/quarantines and lockdown and travel restrictions resulting in economic mayhem (Socio-economic impact of COVID-19 | United Nations Development Programme 2020) [5]. In humans, COVID-19 infections lead to respiratory tract infections presenting a vast range of pathophysiological symptoms ranging from mild, such as fever, coughing, and shortness of breath, and to severe symptoms, such as pneumonia, multiple organ failure (kidney, liver, heart, and CNS), and severe acute respiratory failure leading to fatality [1, 6].

The coronaviruses (CoVs) are termed due to crown-like spikes on their surface and are classified into four main types α, β, γ, and δ, infecting wide ranges of hosts ranging from mammals to aves [7]. The SARS-CoV-2 is a β‐coronavirus with a nucleocapsid of helical symmetry and belongs to the subfamily Orthocoronavirinae, in the family Coronaviridae, order Nidovirales [8,9,10]. The genome of SARS-CoV-2 is a ( +) sense single-stranded RNA as the genetic component which encodes a set of viral replicase proteins and structural proteins including spike protein (S), membrane, envelope, nucleocapsid, and proteases which cleaves polyproteins and several uncharacterized proteins [10]. The SARS-CoV-2 genomes encode for 14 ORFs, including ORF1a and ORF1ab, which encodes for 16 non-structural proteins (NSP1-16) and four essential structural proteins, including S glycoprotein, envelope (E) protein, membrane (M) protein, and nucleocapsid (N) protein which are vital for viral assembly and invasion of SARS-CoV-2 [11]. The polyproteins 1a and 1ab (pp1a and pp1ab) are auto-proteolytically cleaved by SARS-CoV-2 proteases to release 11 (pp1a) and 5 (pp1ab) functional proteins necessary for viral replication, survival and proliferation [12,13,14]. The homotrimer S protein constitutes the spikes on the viral envelope responsible for attachment to host cells. The three transmembrane domains of M protein determine the virion shape, facilitate membrane curvature, and bind to nucleocapsid. The E protein plays a role in virion assembly, pathogenesis, and release, and the N protein is essential for binding with the viral RNA genome [15].

The mode of entry of SARS-CoV-2 is identical to SARS-CoV, using human angiotensin-converting enzyme 2 (ACE2) and CLEC4M/DC-SIGNR receptor for attachment to the cell membrane [16,17,18]. The ACE2 is also expressed in the mucosa of the epithelial cell and the oral cavity of the tongue, type II alveolar cells (AT2), among others, acting as entry routes for viruses through endosomal pathways [19]. The SARS-CoV-2 S protein is a homotrimer class I type fusion protein that enables viral entry to human cells via ACE2 receptor [16]. The S protein is a 1273 residue glycoprotein with three conformational states: pre-fusion native state, pre-hairpin intermediate state, and post-fusion hairpin state (spike protein) [16]. The S protein comprises two subunits, S1, which is required for receptor recognition and S2, required for membrane fusion. The S1 subunit has C-terminal RBD (receptor-binding domain), primarily interacting with the human ACE2 receptor [3]. The S protein undergoes a conformational change when it fuses with the ACE2 receptor destabilizing the pre-fusion trimer, resulting in the discharge of the S1 subunit, permitting the transition of the S2 subunit to a steady post-fusion state [20]. Then, the host cell-mediated S protein priming by a cellular serine protease TMPRSS2 is responsible for the virus entering the host cell [13, 16]. After the virus has entered the host cell, it follows the typical replication cycle, infection and proliferation of a ( +) sense RNA virus such as MERS-CoV and SARS-CoV [21].

So far, since the pandemic breakout, the S glycoprotein is known to have a pivotal role in viral attachment and infection, in the induction of neutralizing-antibody and T-cell responses, and protective immunity, thus designating it as a top candidate for vaccine production. A vaccine based on the S protein and its epitope could induce antibodies in response to block virus binding and fusion [21, 22]. RNA viruses are known to develop high mutation rates related to the virus’s high proliferation, infection, and evasion of the host immune system. These high mutation rates are due to the low fidelity of the RNA-dependent RNA polymerase (RdRp) [23]. It also has become evident that the high mutation frequency in the S protein is correlated with specific variants of concern that have caused significant mortality in certain geographical regions of the globe [24, 25]. Since the COVID-19 pandemic, viral genome sequencing has been accelerated and shared at an unprecedented rate to identify the lineages and trace the virus evolution (SARS-CoV-2 Variant Classifications and Definitions 2021). More than one million SARS-CoV-2 sequences are available via the Global Initiative on Sharing All Influenza Data (GISAID), enabling real-time surveillance of the pandemic unfolding [26]. Since late 2020, SARS-CoV-2 evolution has witnessed numerous mutations, labeling the new variants as variants of concern, changing the virus characteristics, such as transmissibility and antigenicity [27]. A large number of SARS-CoV-2 variants have been recently revealed through genome sequencing projects, and the characteristics mutations of these variants are, unfortunately, the targets of most of the currently licensed and used COVID-19 vaccines [28]. Recent reports have summarized most of the SARS-CoV-2 variants identified and their mutational landscape [27,28,29,30,31,32].

A recent emerging cluster of cases reported in South Africa identified new variants of SARS-CoV-2 by a team led by Tulio de Oliveira at the University of KwaZulu-Natal. The variant carries many mutations also found in other variants, including Delta, and reports suggest that it is spreading at warp speed across South Africa and anecdotal reports across other countries. Due to the high rate of infectivity, WHO designated the strain as B.1.1.529, a variant of concern naming it Omicron on 26 November 2021. The Omicron joins Delta, Alpha, Beta, and Gamma on the current WHO list of variants of concern (Tracking SARS-CoV-2 variants 2021). The B.1.1.529 (Omicron) variant harbors 30 amino acid substitutions, three small deletions, and one small insertion in the SARS-CoV-2 S protein — the host immune system’s prime target antibodies recognize, potentially dampening their potency. Out of these 30 mutations, some of them were also present in the other strains, such as Delta and Alpha were found to be problematic and linked to high infectivity and enhanced ability to evade antibodies (Science Brief: Omicron (B.1.1.529) Variant | CDC 2021; Tracking SARS-CoV-2 variants, 2021) [30]. The mutational profile of 30 mutations can be found in Table S1. The key amino-acid substitutions in the S protein are A67V, del69-70, T95I, del142-144, Y145D, del211, L212I, ins214EPE, G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, D796Y, N856K, Q954H, N969K, and L981F (substitutions in RBD region (no. 15 mutations) are highlighted in bold type) (Fig. 1). The RBD domain of the S protein is the prime target of neutralizing antibodies generated following infection by SARS-CoV-2 [33], and the same is the component of currently used mRNA and adenovirus-based vaccines licensed for use and others awaiting regulatory approval [34].

Fig. 1
figure 1

SARS-CoV-2 S glycoprotein (B.1.1.529 Omicron) trimer structure model in different representations: top view (a), surface representation (b), cartoon representation (c). The protein chain is colored in gray, and mutations in the B.1.1.529 Omicron variant are colored in cyan, orange and green, respectively, for three chains in the trimer model. (d) Structural superimposition of RBD domain from S protein of Wuhan reference (gray) and B.1.1.529 Omicron variant (blue). The mutations in B.1.1.529 Omicron S protein are labeled with residue and number in the yellow box

Public vaccination is the best approach and tool for controlling and eliminating the pandemic virus, and the same vaccine development is urgently needed [35]. The diagnosis of COVID-19 is nowadays reliable, efficient, and quick, though public vaccination is required to prevent COVID-19 and its severe repercussions [36]. Since the outbreak of COVID-19, numerous vaccine candidates have been proposed, deployed, or in the development stage. A total of 139 and 194 vaccine candidates are in clinical and pre-clinical development, respectively (WHO.int). Diverse vaccine candidates include but are not limited to live-attenuated or inactivated whole virus, RNA and DNA-based vaccines, subunit vaccines, viral vectors, self-assembling virus-like particle vaccines, and others, with peculiar advantages and disadvantages. Subunit vaccines constructed on S protein have been highly effective for eliciting immunogenicity against the previous coronavirus breaks out, such as SARS and MERS [37, 38]. One promising approach includes designing a subunit vaccine with numerous and diverse antigenic variables which can produce an inclusive spectrum of native viral antigens [42]. A typical vaccine development cycle takes a minimum of ten years from lab research to its approved use after all clinical assessments, making traditional approaches time-consuming and labor intensive [39, 40]. The advent of genome sequencing and high-performance computational technology has made immunoinformatics predictions cost-effective and convenient [41,42,43]. In silico immuno-informatics approaches have reduced the number of experiments needed for vaccine development, and it has been demonstrated to identify protein immunogenic attributes with high efficiency [44,45,46]. Numerous researchers have employed immunoinformatics approaches for the development of novel and potential multi-epitope vaccines against several virulent pathogens including dengue and Ebola virus, Streptococcus parauberis, and SARS-CoV-2 [43,44,45,46,47,48,49,50]. The design of a multi-epitope vaccine candidate targeting different non-structural and structural proteins of SARS-CoV-2 have been recently reported by researchers globally with the potential to be an effective vaccine candidate [51,52,53,54,55,56,57,58]. Recently, researchers have also attempted to identify epitopes against Omicron variants of SARS-CoV-2 and linked them with linkers to finally generate what is known as multi-epitope based peptide vaccine [59,60,61].

In this work, we employed immunoinformatics to predict multiple immunogenic proteins from the SARS-CoV-2 proteome and thereby design a multi-epitope vaccine. In the present study, we used a computational-based immunoinformatics screening approach to identify and construct multi-subunit vaccine candidates. We used SARS-CoV-2 S protein (B.1.1.529 variant) to identify antigenic elements generating B-cell and T-cell immunity. We predicted B-cell and T-cell epitopes and evaluated their further toxicity, allergenicity, and antigenicity properties. We also performed a population coverage analysis and estimated the broad coverage globally. Furthermore, B-cell and T-cell epitopes were linked with appropriate adjuvants to generate a multi-subunit epitope vaccine (MESV). We also performed immune simulations to estimate the possible immune response generated by identified MESV. Finally, to assess the suitability of recombinant MESV production, we also performed in silico cloning.

Methodology

Data retrieval, structural, and physicochemical analysis of SARS-CoV-2 S protein

The protein sequence of S protein (B.1.1.529 variant) was derived from the GISAID (https://www.gisaid.org/) Database (Accession ID EPI_ISL_6980876). For sequence level comparison with wild-type strain, the reference strain (NC 045,512.2) was compared with B.1.1.529 variant S protein for sequence similarity using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) [62]. For tertiary structure prediction of B.1.1.529 S protein, the amino-acid sequence was submitted to the I-TASSER (https://zhanggroup.org/I-TASSER/) [63]. For structure refinement, the GalaxyRefine (https://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE) [75] and ModRefiner (https://zhanggroup.org/ModRefiner/) [64] programs were used and to assess the Ramachandran statistics, the SAVES server (https://saves.mbi.ucla.edu/) was utilized as reported in our earlier studies [65, 66]. The GRAVY (grand average of hydropathicity), half-life, sub-atomic weight, instability index, aliphatic record, and amino acid atomic composition of the protein sequence were computed through ProtParam (http://web.expasy.org/protparam/).

Prediction of B-cell linear and discontinuous epitopes

To identify the linear B-cell epitope areas in the antigen sequence, the ABCpred (http://crdd.osdd.net/raghava/abcpred/) [68] was utilized. ABCpred operates based on artificial neural networks, which are collections of mathematical models that imitate some of the recognized aspects of the biological nervous system and draw on analogies of adaptive biological learning. The putative epitopes are then tested for antigenicity using Vaxijen 2.0 (http://www.ddg-pharmfac.net/vaxijen/) [69]. We also predicted the discontinuous epitopes with much more substantial attributes than the linear epitopes. The identification of discontinuous B-cell epitopes is a significant difficulty in vaccine design. The DiscoTope (https://services.healthtech.dtu.dk/service.php?DiscoTope-2.0) web server [70] was used to predict the surface area accessibility as well as amino acids that create discontinuous B-cell epitopes as identified through X-ray crystallography of antigen/antibody protein structures.

Prediction of CTL and HTL epitopes

The IEDB MHC I binding prediction methods (http://tools.iedb.org/mhci) were used to predict S glycoprotein CD8 + T-cell epitopes from the sequence. This method unifies epitope prediction restricted to many MHC class I alleles and proteasomal C-terminal cleavage using complicated artificial neural network applications. For predicting the CD4 + T-cell epitopes (peptides), the MHC II (http://tools.iedb.org/mhcii/) was used. Briefly, the epitopes showing the highest binding diversity with the various HLA serotypes were chosen. Furthermore, these epitopes were submitted to Vaxijen 2.0 server for assessing their antigenicity at a 0.7 value threshold. All the top-scoring epitopes identified from each tool mentioned above were then submitted to the IEDB T-cell class I Immunogenicity predictor (http://tools.iedb.org/immunogenicity/). After the determination of the HLA-confined CD8 + and CD4 + T-cell epitopes, the epitope toxicity and allergenicity were predicted using the ToxinPred server (http://crdd.osdd.net/raghava/toxinpred/) [71] and AlgPred (http://crdd.osdd.net/raghava/algpred/) [67] server respectively. Finally, the non-allergenic epitopes predicted from the above servers were chosen as T-cell epitopes.

Population coverage analysis

The IEDB Population Coverage tool (http://tools.iedb.org/population/) was used to test chosen epitopes from the HLA class I and class II families and their corresponding binding leukocytes antigens. With a known HLA background, this tool computed the distribution or fraction of persons projected to respond to the selected epitopes. Likewise, the IEDB device processes the usual number of epitope hits/HLA allele blends perceived by the all-out populace, just as the most extreme and least number of epitope hits are perceived by 90% of the chosen populace. The HLA genotypic frequencies are estimated, and T-cell epitopes are searched by region, ethnicity, and country.

Design, structural modeling, and validation of multi-epitope vaccine construct

The most efficient epitopes likely to generate subunit immunization should have the qualities such as exceptionally antigenic, immunogenic, non-allergenic, and non-toxic. Those epitopes with the following qualities were chosen further to construct multi-epitope subunit vaccine (MESV). An adjuvant was also appended with the EAAAK linker to the primary cytotoxic T lymphocytes (CTL) epitope. The different epitopes were connected with AAY, GPGPG, and KK linkers. β-defensin has been used as an adjuvant since it is a 45 amino acids long peptide that is an immunomodulator and an antimicrobial agent. The 6X His-tag was inserted into the vaccine constructs at the C-terminal end for easy purification using affinity chromatography. For checking the sequence homology of the MESV constructs with the human proteome, the MESV sequence was submitted for the BLAST sequence homology tool. Physicochemical properties such as half-life, theoretical isoelectric point (pI), hydropathy, and aliphatic index of the MESV construct were assessed using the Protparam (https://web.expasy.org/protparam/) tool [72]. The MESV construct’s allergenicity was assessed using the AllerTOP v.2.0 (https://www.ddg-pharmfac.net/AllerTOP/) [73, 74]. For evaluation of the secondary structure component of the MESV construct, PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) [76] was used for the identification of secondary structural elements such as alpha-helices, extended chains, the number of beta turns, and random coils.

The homology model of MESV was constructed using I-TASSER (https://zhanggroup.org/I-TASSER/) webserver. The GalaxyRefine [63] and ModRefiner [75] servers were used to energy minimize the predicted MESV 3D structure. The Ramachandran plot statistical evaluation was performed using the SAVES web server, followed by structural validation analysis utilizing the PROSA (https://prosa.services.came.sbg.ac.at/prosa.php) webserver. For estimating the conformational B-cell epitopes of the proposed MESV, the ElliPro tool (http://tools.iedb.org/ellipro/) was used [77]. It estimates the residual protrusion index (PI), protein structure, and neighbour residue clustering to predict epitopes.

Molecular docking of the MESV and human immune receptors

Studying the binding affinity of the epitopes, which characterizes their molecular interaction with the human HLA class I and II molecules, is one of the finest approaches to gaining insight into their immune response. The CASTp server (http://sts.bioe.uic.edu/castp/) [78] was used to estimate the binding pockets on HLA-class I and II molecules as well as the human immunological receptor (TRL3, PDB: 3ULV, and TLR5, PDB: 3J0A). The CASTp server provides an extensive and detailed quantitative assessment of a protein’s topographic characteristics. The molecular docking of MESV against TLR3 structure and TLR5 structure (PDB: 3J0A) was carried out using GalaxyPepDock (https://galaxy.seoklab.org/cgi-bin/submit.cgi?type=PEPDOCK) [79]. The exact output was re-submitted to ClusPro (https://cluspro.bu.edu/) for further refinement [80]. The entire docking process has been previously validated and followed as mentioned in our earlier studies on different targets and using different approaches [80,81,82,83,84,85,86].

MM/GBSA-based evaluation of docked pose

The best scoring dock complex was retrieved from the above docking process, and the top pose was assessed through molecular mechanics/generalized born surface area (MM/GBSA) calculations computed by the HawkDock Server (http://cadd.zju.edu.cn/hawkdock/) [87]. HawkDock investigates the docked pose on a per residue basis across electrostatic potentials, Van der Waal potentials, polar solvation free energy, and solvation free energy using empirical models. The default parameters were used to minimize the docked pose complex for 5000 steps, including steepest descents (2000 cycles) and conjugate gradient minimizations (3000 cycles) through an implicit solvent model, the ff02 force field.

MD simulation and immune simulation of the multi-epitope vaccine

The WAXSIS web server (http://waxsis.uni-goettingen.de/) was utilized for the small-and wide-angle X-ray scattering (SWAXS) to study the properties of the peptide vaccine construct [88]. An accurate prediction of the generated curves was required. The predictions are complicated due to the scattering contribution from the hydration layer and the impact of temperature fluctuations. The MD simulation provides a complete model of the hydration layer and solvent. The Guinier analysis evaluated the protein compactness and the radius of gyration. The Vaccine-TLR5 complex was simulated and investigated; the deformability, B factor, eigenvalues, and elastic network of the docked complex were also analyzed.

To evaluate the immunogenicity and immune response characteristics, the MESV was subjected to immune simulation using a C-ImmSim online server (http://150.146.2.1/C-IMMSIM/index.php) [89]. C-ImmSim predicts related immunological reactions using a machine learning approach for three anatomical compartments: (i) bone, where hematopoietic stem cells are activated, and myeloid cells are created, (ii) the lymphatic organ, and (iii) the thymus, where naive T lymphocytes are selected to avoid autoimmune conditions. The simulations were carried out to administer two-dose injections of MESV at six-week intervals on days 0 and 42. Each time step was positioned at 100 based on the default values, indicating that each time step is 8 h long and time step 1 is the injection provided at time zero. As a result, two injections were given at 6-week intervals. The T-cell memory will be evaluated constantly in this situation. The plot analysis generated a graphical interpretation of the Simpson index.

In silico cloning and codon optimization

For maximizing vaccine expression in the host E. coli K12 system, codon optimization was carried out using the Java Codon Adaptation Tool (JCat) (http://www.jcat.de/) [90]. The restriction enzymes, prokaryotic ribosome binding site, and rho-independent transcription termination were kept in the default state. The acquired codon adaptation index (CAI) value and GC content of the adapted sequence against a certain threshold were assessed. The improved nucleotide was then cloned into the pET28a ( +) vector prototype using the SnapGene 4.2 tool.

Results

Data retrieval, structural, and physicochemical analysis of SARS-CoV-2 S protein

The protein-coding sequence of S protein (B.1.1.529 variant) was retrieved from the GISAID database in FASTA format. The sequence homology of S protein B.1.1.529 variant with reference strain (NC 045,512.2) was 97% (Supplementary Fig. S1). The tertiary structure of full-length S protein (B.1.1.529 variant) was generated with I-TASSER with a C-score of − 1.60 (− 5 to 2, the higher the better) and an estimated TM-score of 0.52 ± 0.15. Ideally the homology model prediction having C-score of >  − 1.50 and a TM-score > 0.5 is considered as a model of correct global topology [62, 63]. The Ramachandran statistics of the 3D model were 73.4% in the core, 20.4% in allowed, 3.5% in generously allowed, and 2.7% in the disallowed region (Fig. 2). After rounds of energy refinement, the Ramachandran statistics were improved to 86.7% in the core, 9.7% in allowed, 1.6% in generously allowed, and 1.9% in the disallowed region. The physiochemical attributes of the S protein (B.1.1.529 variant) are as follows: no. of amino acids (1270), Mol. Wt. 14,1300 Da, pI 7.14, estimated half-life > 10 h in prokaryotes and 30 h in eukaryotes, instability index 34.12 classifying the protein as stable. The estimated aliphatic index 84.95 indicating higher thermostability of the protein [90]. The negative GRAVY score of − 0.080 indicates the protein is hydrophilic and will interact with the water molecules [91].

Fig. 2
figure 2

SARS-CoV-2 S glycoprotein (B.1.1.529 Omicron) trimer structure model with each color representing three different chains

Linear and discontinuous B-cell epitopes

B-cell epitopes play a significant role in viral infection resistance. The amino acid screening methods analyzed possible B-cell epitopes from the S protein (B.1.1.529 variant) sequence. For predicting possible B-cell epitopes, a consensus-based technique was utilized. Three potential linear B-cell epitopes with non-allergenic, non-toxicity, and antigenic properties were identified after stringent criteria selection (Table 1). The peptide “HVTYVPAQEKNFTTAP” exhibited the most significant antigenic index compared to the other predicted B-cell epitope candidates due to its higher antigenicity score of 0.8375 which is above the virus threshold of 0.4 [68]. Our results with B-cell epitope prediction had higher antigenicity score compared to another recent report of Omicron S protein epitope prediction [60]. The linear B-cell epitopes were mapped on the trimeric S glycoprotein model, as shown in Fig. 3. The discontinuous B-cell epitopes were predicted by Discotope 2.0 using the trimeric S protein (B.1.1.529) model. The positions of discontinuous epitopes were mapped on the surface of the model structure of S protein (Fig. 4). All the discontinuous B-cell epitopes were mapped on the fully-exposed “spike head” region (Fig. 4), and no discontinuous B-cell epitopes were found on the less-exposed “spike stem” and the “spike root” region of the S glycoprotein (Fig. 4). Predicted discontinuous epitopes were selected from Omicron’s S glycoprotein’s complete protein chain component and graded according to their propensity score (Table 2).

Table 1 B-cell linear epitopes of SARS-Cov-2 variant (B.1.1.529) S protein and their immunogenic properties
Fig. 3
figure 3

SARS-CoV-2 S glycoprotein (B.1.1.529 Omicron) trimer structure model with mapped B-cell (yellow) and T-cell (green) predicted epitopes. The novel mutations of the B.1.1.529 Omicron variant in the predicted epitopes are red

Fig. 4
figure 4

SARS-CoV-2 S glycoprotein (B.1.1.529 Omicron) trimer structure model with mapped discontinuous B-cell (green) epitopes

Table 2 Discontinuous B-cell epitope contact numbers and their propensity score in the S glycoprotein (B.1.1.529 Omicron)

T-cell (THTL and TCTL) epitope prediction

The S protein (B.1.1.529) sequence was tested against several HLA class 1 alleles. The peptides were chosen depending on their rate rankings and the number of alleles they possibly bind. Also, the peptides were exposed to antigenicity tests utilizing Vaxijen 2.0. Nine epitopes were chosen for the subsequent analysis regarding the antigenicity scores. The main peptides have high restricting energy to HLA class I atoms and show a non-allergenic property. Before considering vaccine design, allergenicity prediction is critical, as vaccination candidates may induce a type II hypersensitivity reaction. Allergen 1.0 and ToxinPred assessed the epitope’s non-allergenicity and non-toxicity properties. The S protein (B.1.1.529) sequence was also searched through many MHC-II alleles for the HLA class II T-cell epitopes. Three epitopes were chosen for their antigenic characteristics. These non-allergenic epitopes can elicit an immunological response by activating either or all of the IFN-γ and IL-4 cytokines. Therefore, an ideal peptide anticipated as an epitope should have high antigenicity and capacity to bind with many alleles, thus having a high potential to initiate a strong defense response upon immunization. A total of 9 MHC class-I allele binding peptides and 3 MHC class-II allele binding peptides were obtained using the above stringent criteria virus threshold of 0.4 [68] (Tables S2A and S2B). The T-cell epitopes (MHC class I and II) were mapped on the trimeric S glycoprotein model, as shown in Fig. 3.

The non-allergenic peptides were as follows: “IPFAMQMAY” restricted to HLA-B*35:01,HLA-B*53:01, HLA-B*51:01 HLA allele; “LPFFSNVTW” restricted to HLA-B*35:01, HLA-B*51:01, HLA-B*57: 01, HLA-B*07:02, HLA-B*58:01 with an antigenicity score of 1.0808; “SVYAWNRKR” attaches to 7 alleles HLA-A*31:01, HLA-A*33:01, HLA-A*68:01, HLA-A*03:01, HLA-A*68:01, HLA-A*11:01, HLA-A*30:01; “EILPVSMTK” restricted to HLA-A*68:01, HLA-A*11:01, HL-A*11:01, HLA-A*03:01,HLA-A*33:01, HLA-A*33:01; the peptide “TEILPVSMTK” would be able to bind with 3 alleles HLA-A*11:01, HLA-A*03:01, HLA-A*68:01; “PFAMQMAYR” restricted to HLA-A*33:01; “IPFAMQMAYR” restricted to HLA-A*33:01, HLA-A*35:01; “KFGAISSVL” restricted to HLA-A*24:02, HLA-A*23:01, “PFAMQMAYRF” restricted to HLA-A*23:01, HLA-A*24:02 (Table S2A).

The peptide “PYRVVVLSFELLHAP,” with an antigenicity score of 0.7072, attaches to 8 HLA alleles: HLA-DRB1*01:01, HLA-DRB1*03:01, HLA-DRB1*04:01, HLA-DRB1*04:05, HLA-DRB1*07:01, HLA-DRB1*08:02, HLA-DRB4*01:01, HLA-DRB5*01:01. The epitope “RVVVLSFELLHAPAT,” with an antigenicity score of 0.7485, is also restricted to 8 HLA alleles: HLA-DRB1*01:01, HLA-DRB1*03:01, HLA-DRB1*04:01, HLA-DRB1*04:05, HLA-DRB1*07:01, HLA-DRB1*08:02, HLA-DQA1*03:01/DQB1*03:02, HLA-DQA1*04:01/DQB1*04:02, HLA-DQA1*05:01/DQB1*02:01. The peptide “YRVVVLSFELLHAPA” binds to allele HLA-DPA1*03:01/DPB1*04:02, HLA-DPA1*02:01/DPB1*01:01, and HLA-DPA1*01:03/DPB1*02:01 with the antigenicity score of 0.7072. The HLA-DRB1 is the most common and versatile MHC-II molecule.

Population coverage of CTL and HTL epitopes

The distribution of HLA alleles varies among heterogeneous ethnic groups and geographical regions. Therefore, while constructing a viable epitope-based vaccine, it is necessary to consider population coverage. The selected MHC-I class epitopes showed higher individual percentage cover when examined with the entire global population. A total of 76.38% of the world’s individuals are capable of responding to the median of one MHC-I epitope (Table 3).

Table 3 Population coverage analysis of MHC-I class epitopes (+ restricted,—not restricted)

It was inferred that North America would possibly show a significant response to the selected HLA-I class-restricted epitopes out of the selected continents and subcontinents. East Asia, Oceania, West Indies, West Africa, Northeast Asia, and Southeast Asia had the highest population coverage of 83.65%, 82.01%, 81.85%, 79.54%, 77.90%, and 77.45%, respectively, while the Central America population had the lowest population coverage at 6.44%. The MHC-I class epitope’s population coverage has been considerably higher than the MHC-II class epitopes (Table 4) (Figs. 5 and 6).

Table 4 Population coverage analysis of MHC-II class epitopes (+ restricted)
Fig. 5
figure 5

Population coverage of MHC class I and II epitopes in selected geographical regions

Fig. 6
figure 6

a 3D structure model of MESV shown in cartoon representation, and b secondary structure prediction of the MESV

Structural model and physicochemical properties of multi-epitope subunit vaccine (MESV)

The constructed multi-epitope subunit vaccine (MESV) comprises 280 amino acids from 15 antigenic B-cell and T-cell epitopes linked with an immunoadjuvant. The tertiary structure of the multiple epitope vaccine was constructed using the I-TASSER server with a C-score of − 4.11 (C-score range: − 5 to 2, the higher the better) and an estimated TM-score of 0.20 ± 0.05 and was further structurally validated using Ramachandran statistics and the ProSA-web server. Ideally, the homology model prediction having C-score of >  − 1.50 indicates a model of correct global topology [62]. The Ramachandran statistics of the 3D model were 66.5% in the core, 26.4% in allowed, 5.0% in generously allowed, and 2.1% in the disallowed region. After rounds of energy refinement, the Ramachandran statistics were improved to 86.8% in the core, 10.7% in allowed, 0.4% in generously allowed, and 2.1% in the disallowed region (Fig. S4). In addition, the MESV structural model had a Z-score of − 2.32, indicating that it was a relatively acceptable model. The physiochemical attributes of MESV are as follow: no. of amino acids (280), Mol. Wt. 31,277.77 Da, pI 10.19, estimated half-life 30 h in eukaryotes, instability index 36.51 classifying the protein as stable. The estimated aliphatic index 71.61 indicating higher thermostability of the protein [90]. The negative GRAVY score of − 0.139 indicates the protein is hydrophilic and will interact with the water molecules [91].

Molecular docking between the vaccine and the toll-like receptor (TLR5)

TLR5 was chosen for its immunomodulatory potential to induce IFN-gamma. It was demonstrated in the study that our chosen CD4 + epitopes induced both Th1 and Th2 cytokines. In addition, the vaccine’s molecular interaction with TLR5 (PDB: 3J0A) was studied, considering their refined binding energies and numerous interacting residues. The possible orientation and interactions between the MESV and TLR5 are shown in Fig. 7a and b, respectively. The initial pose of MESV and TLR5 docked complex was first generated using GalaxyPepDock and then, the pose with the highest docking score (lowest free energy) was subjected to ClusPro server. The predicted binding free energy of the complex in the cluster center was − 48.61 (kcal/mol) as predicted by HawkDock server. The docking score of MESV-TLR5 complex was also compared to a recent report on MESV prediction using immuninformatics in Omicron spike protein [60]. The docking score for MESV-TLR5 was reported to be − 48.61 (kcal/mol), which is better than the reported in similar study with a score of − 35.16 kcal/mol.

Fig. 7
figure 7

a Docked complex of MESV (red) and TLR5 receptor (blue); b, c docked complex of MESV (blue) and TLR5 receptor (red) in surface representation at different angles

MD simulation and immune simulation of the MESV

The radius of gyration shows the rigidity of the vaccine, and the Rg value of the vaccine-TLR5 construct was 36.275. The eigenvalue of the vaccine-TLR5 complex was 1.697699e-06; the value is low, which shows the easier deformation of the complex, which means that the vaccine-TLR5 construct will activate the immune cascade. At some points, the B factor value of the complex is high, which directly corresponds to the higher deformability. The MD simulation results are shown in Fig. 8. The immune simulation of the vaccine was also carried out to simulate the immune response post-vaccination. At administering every dose of the MESV peptide vaccine, there is a sharp increase in the antibody response and a co-currently decrease in the antigen level. The antibody response was significantly higher, i.e., IgM > IgG humoral response highlighting the seroconversion (Fig. 9a). Also, following the second dose, IFN-γ response was higher (accompanying both CD8 + T-cell and CD4 + Th 1 response), and IL-10 and TGF-b cytokines response related to T-reg phenotype (Fig. 9b). Overall, the immune simulation results showed that each dose simultaneously increased the immune response. B-cell population per cell analysis revealed that the overall B-cell population and B-cell memory responses were higher and stable, showing minimal decay for over 300 days (Fig. 9c). After the second dose, there was a simultaneous rise in Th effector cell phenotype and a lower Th-memory readout (Fig. 9d). In addition, there was a concomitant increase in natural killer cell activities (Fig. 9e) throughout the simulation. To summarize the results, the points mentioned above are a good performance indicator of the vaccine construct, showing its capability to stimulate the correct immunological compartment for an effective response.

Fig. 8
figure 8

The MD simulation results of the MESV vaccine. a SAWXS net intensity of MSEV-TLR5 complex; and molecular dynamics simulation of the MESV-TLR5 complex, showing b deformability; c eigenvalue; d variance; e covariance matrix indicates coupling between pairs of residues (red), uncorrelated (white), or anti-correlated (blue) motions; f elastic network analysis which defines which pairs of atoms are connected by springs; and g radius of gyration plot showing the calculated values vs Guinier fit

Fig. 9
figure 9

Results of MESV immune simulations: a the plot of antigen level along with immunoglobin profile; b induced array of cytokines; c B-cell population over the time; d Th cell population over the time; e Tc cell population over the time; f natural killer cells (total count)

In silico codon optimization and cloning

The optimal vaccine codon sequence was 840 nucleotides long. The GC content of the cDNA sequence and adaptive codon index was calculated to be 52.14%, which is still within the recommended range of 30–70% for effective translational efficiency. The computed codon adaptability index was 1, within the range of 0.8–1.0, indicating that the vaccine designs were effectively expressed in Escherichia coli. The optimized nucleotide sequence was cloned into the pET28a ( +) vector with NdeI and EagI restriction enzymes (Supplementary Fig. S2).

Discussion

With the emergence of SARS-CoV-2 and, as a result, the World Health Organization (WHO) declaration that COVID-19 was a pandemic, researchers began working to develop vaccines to prevent the disease [92]. Despite the high mutational rate in SARS-CoV-2, new variants are emerging. Some variants may evade the current vaccines. This study was carried out because of the rapid rise of the omicron variant and its fear of eluding the vaccine. Traditional vaccine development has a high success rate, but the immunoinformatics approach is instrumental because it saves time and eliminates various biosafety concerns because of the immunogenic reaction. Compared to traditional vaccines, multieptiope vaccines have an advantage. The traditional method of vaccine development which involves the entire organisms or proteins can lead to unavoidable and unnecessary antigenic load along with allergenic responses. The said issue can be circumvent by peptide based vaccines comprising of multiple short immunogenic peptide with the capability to elicit targeted immune responses [22, 53, 54]. Designing a multi-epitope vaccine entails screening the viral genome and simulating a targeted immune response that accurately identifies immunogenic epitopes. Several vaccines against SARS-CoV-2 have been developed and reported [93,94,95]. Due to the vast information available for SARS-CoV-2 proteomics and genomics, the in silico approach for vaccine development is helpful for SARS-CoV-2. Few of the corporation involved with vaccine development have also demonstrated the use of immunoinformatics techniques to identify the most antigenic epitopes from plethora of candidates to final vaccine candidate development [43, 96].

In our study, the development of the multi-epitope vaccine elicits a humoral and cell-mediated immune response. MESV contains the epitopes that bind with CTL, HTL, and B-cells and beta-defensin-3 was used as an adjuvant. The S protein of the omicron variant was used to create vaccine epitopes. As an adjuvant, beta-defensin-3 was combined with GPGPG and KK linkers as per the previous report [97]. Antigenicity and allergenicity of predicted CTL and HTL epitopes were assessed. The population analysis of CTL and HTL epitopes in 16 different regions of the world showed that epitopes selected from S protein usually bind to HLA molecules. Immune cells like monocytes, macrophages, and immature dendritic cells express TLR receptors in the plasma membrane, which identifies the S protein of SARS-CoV-2 as a structural component. The binding of COVID19 antigens to TLRs activates NFκB, protein kinase, and cytokine secretion. Molecular modeling can provide detailed information to determine the exact position of the drug/ligand [98].

Here, we used protein–protein molecular docking to study the interaction of vaccine constructs with TLR5. The MESV interacted with TLR5 via two salt bridges between K255-E432 and E229-R29. Three H-bonds were also seen between the complex (H278-S617, R248-D390, and K238-E123). Optimization of the vaccine nucleotide sequence codon was performed to ensure efficient expression in the E. coli host, and its elevation of GC was acceptable for E. coli. Although in silico vaccine studies required entering the clinical phase for confirmation, prediction from immunoinformatics tools confirmed the viability of the vaccine and its role in inducing humoral and cellular immune responses. To eliminate the SARS-CoV-2 infection, there should be a successful interaction with MESV and TLR5, which activates the messenger cascade and triggers the antiviral response. Since in silico analysis has limitations, the in vivo/ in vitro experiments with the clinical trials are required to reinforce the result.