Immune targets to stop future SARS-CoV-2 variants

ABSTRACT The SARS-CoV-2 variants of concern (VOCs) accumulated mutations that give the viral particle higher infectivity as well as the capacity to escape neutralizing antibodies (NAbs) elicited by vaccines. In this study, we aimed to identify immune determinants that can be targeted to control future SARS-CoV-2 VOCs. Using an immunoinformatics pipeline, we constructed a first data set consisting of 215 spike (S) protein amino acid sequences of the wild-type SARS-CoV-2, as well as of the Alpha, Beta, Delta, Gamma, and Omicron variants/subvariants (BA.1, BA.2, BA.4, BA.5, XBB, and BQ.1). A second data set was composed of epitope amino acid sequences for NAbs as well as for T-cells involved in anti-viral activity, retrieved from the Immune Epitope Database (IEDB). Epitope conservation and population coverage analyses were carried out using these data sets. Fully conserved epitopes were localized in the S protein using PyMOL. The main results were as follows: (i) fully conserved epitopes were identified: 28 for NAbs, and 53 and 99 for class-I and class-II human leukocyte antigen (HLA), respectively; (ii) the fully conserved epitopes were shown to have high coverage in the world population (99.77%, class combined); and (iii) the receptor-binding domain (RBD) of the S protein better balanced the numbers of epitopes for NAbs and T-cells, while subunit 2 (S2) concentrated the highest numbers of T-cell epitopes. These results indicate that the RBD holds different kinds of immune targets that could be used in an escape mutation-proof vaccine antigen to control future SARS-CoV-2 variants.
IMPORTANCE The emergence of SARS-CoV-2 had a major impact across the world. Collaboration among scientists from all over the world resulted in a rapid response against COVID-19, mainly through the development of vaccines against the disease. However, many viral genetic variants that threaten vaccine effectiveness have emerged. Our study reveals highly conserved antigenic regions in the spike protein in all variants of concern (Alpha, Beta, Gamma, Delta, and Omicron) as well as in the wild-type virus. Such immune targets can be used to fight future SARS-CoV-2 variants.

Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1). Since its emergence in late 2019, more than 770 million people have been affected globally, resulting in more than 7 million deaths (2). Despite this substantial global epidemiological impact, the number of COVID-19 cases has diminished, mainly due to the progress of vaccination (3, 4), which led the World Health Organization (WHO) to declare the end of COVID-19's emergency phase (5).
From late 2020 to early 2023, many SARS-CoV-2 genetic variants, including variants of concern (VOCs), emerged. In historical order of appearance, the VOCs were Alpha, Beta, Gamma, Delta, and Omicron. They carry immune escape mutations in the spike (S) protein, the main immune target, that allow them to circumvent the neutralizing activity of serum antibodies elicited by vaccines based on the wild-type SARS-CoV-2 (6-8). Although mRNA vaccines based on the wild-type SARS-CoV-2 are known to induce T-cell memory able to cross-recognize VOCs from Alpha to Omicron (9), the loss of several neutralizing antibody (NAb) epitopes, especially in Omicron, led to the development and use of bivalent mRNA COVID-19 vaccines. These include a component of the original virus strain, aimed at providing broad protection against COVID-19, and a component of the newest viruses: subvariants of the Omicron VOC (BA.1, BA.4, and BA.5) (10, 11).
Although bivalent COVID-19 vaccines have been shown to be effective in preventing symptomatic infections (11), the WHO recognizes that the risk of new VOCs remains (5). As there is no global effort to eradicate the disease, new SARS-CoV-2 variants are expected to emerge. To deal with this scenario, it is important to identify highly conserved immune determinants. In this study, we aimed to find immune targets on the SARS-CoV-2 S protein that have been conserved since the beginning of the COVID-19 pandemic and that are involved in the anti-viral immune response.

NAb epitopes to stop future SARS-CoV-2 variants
To identify B-cell immune determinants that can be targeted by NAbs to stop future SARS-CoV-2 variants, we carried out conservation analyses of all NAb epitopes deposited in IEDB against a data set of S protein amino acid sequences representative of the main viral genetic variants to date: Wuhan (wild type), Alpha, Beta, Gamma, Delta, and Omicron. As shown in Table 1, of the nine epitopes present in the Wuhan N-terminal domain (NTD), eight were abrogated in the Omicron VOC. Although the number of conserved epitopes increased slightly in the Alpha VOC (mutations can generate new epitopes), it decreased over time, following the historical sequence of VOC appearance (Alpha, Beta, Gamma, Delta, and Omicron). Notably, none of the original NTD epitopes was fully conserved in the data set containing sequences representative of all viral variants (named All; see Table 1). A similar result was observed for NAb epitopes in the receptor-binding domain (RBD): of the original 107 epitopes present in Wuhan, only 21 were fully conserved in Omicron or in All. Only epitopes located in subunit 2 (S2) of the S protein tended to be conserved: of the eight present in Wuhan, eight and seven were fully conserved in Omicron and All, respectively. Overall, of the 124 epitopes originally present in the whole S protein of Wuhan, only 30 and 28 were fully conserved in Omicron and All, respectively. The heat map (Fig. S1) shows a clear tendency for the number of fully conserved NAb epitopes to decrease along the historical sequence of VOC appearance.
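The per-domain counts above can be turned into retention fractions with a few lines of arithmetic. The sketch below is illustrative only: the counts are the Wuhan and All values quoted in this paragraph (Table 1), and the helper names are hypothetical. It makes explicit that S2 keeps the largest fraction of its NAb epitopes (7 of 8), while the RBD, despite keeping only about one-fifth of its original epitopes, still contributes most of the fully conserved NAb epitopes in absolute terms (21 of 28).

```python
# Illustrative sketch: NAb epitope counts quoted in the text (Table 1).
# Each entry is (epitopes present in Wuhan, epitopes fully conserved in "All").
NAB_COUNTS = {
    "NTD": (9, 0),
    "RBD": (107, 21),
    "S2": (8, 7),
    "Whole S": (124, 28),
}


def retention(wuhan: int, conserved: int) -> float:
    """Return the fraction of Wuhan epitopes that remain fully conserved."""
    return conserved / wuhan if wuhan else 0.0


def retention_table(counts: dict) -> dict:
    """Map each region to its retention fraction, rounded to 3 decimals."""
    return {region: round(retention(w, c), 3) for region, (w, c) in counts.items()}


if __name__ == "__main__":
    for region, frac in retention_table(NAB_COUNTS).items():
        print(f"{region}: {frac:.1%} of Wuhan NAb epitopes fully conserved in All")
```

Running it reports 0.0% for the NTD, 19.6% for the RBD, 87.5% for S2, and 22.6% for the whole S protein.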
Regarding the positions of the fully conserved NAb epitopes (considering the All data set) in the S protein, as shown in a 3D model of the spike protein trimer (Fig. 1A), 21 are located in the RBD (Fig. 1B). Most of these epitopes span amino acid positions 400 to 500 of the S protein sequence (Supplemental material 4). Notably, all of these epitopes are discontinuous, in contrast to the seven located in S2 (Fig. 1C), which are all linear and span amino acid positions 809 to 826 and 1,146 to 1,166 (Supplemental material 4). These regions precede heptad repeat 1 (HR1) and heptad repeat 2 (HR2), respectively. Four of the S2 epitopes lie outside the 3D model used, which ends at the beginning of HR2 (zoomed epitopes in Fig. 1C). Collectively, these results indicate that most of the NAb epitopes present in the SARS-CoV-2 S protein were abrogated over time, following the historical sequence of VOC appearance, and that most of the fully conserved NAb epitopes are located in the RBD.
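Assigning an epitope to the NTD, the RBD, the rest of S1, or S2 (as in Tables 1 to 3 and Fig. 1) amounts to checking its residue positions against domain boundaries. The sketch below assumes commonly cited boundaries in Wuhan numbering (NTD 14 to 305, RBD 319 to 541, S1/S2 boundary after residue 685); these are assumptions for illustration, not necessarily the boundaries used by the authors, and the function names are hypothetical.

```python
# Assumed domain boundaries (Wuhan numbering, 1-based, inclusive). These are
# commonly cited values used here for illustration only.
NTD = (14, 305)
RBD = (319, 541)
S1_END = 685   # S1 is taken as residues 1-685
S2_END = 1273  # S2 is taken as residues 686-1273


def domain_of(position: int) -> str:
    """Return the assumed S protein region containing a residue position."""
    if not 1 <= position <= S2_END:
        raise ValueError(f"position {position} is outside the 1-{S2_END} range")
    if NTD[0] <= position <= NTD[1]:
        return "NTD"
    if RBD[0] <= position <= RBD[1]:
        return "RBD"
    if position <= S1_END:
        return "S1 (other)"
    return "S2"


def regions_of_epitope(positions) -> set:
    """Return the set of regions touched by an epitope's residue positions.

    Works for linear epitopes (pass range(start, end + 1)) and for
    discontinuous epitopes (pass the list of contact residue positions).
    """
    return {domain_of(p) for p in positions}


if __name__ == "__main__":
    # Linear S2 epitope spans reported in the text.
    print(regions_of_epitope(range(809, 827)))    # {'S2'}
    print(regions_of_epitope(range(1146, 1167)))  # {'S2'}
```

An epitope that straddles a boundary returns more than one region, which is one way to flag cases that need manual assignment.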

T-cell epitopes to stop future SARS-CoV-2 variants
Conservation analyses were also carried out to identify T-cell epitopes that can be used to stop future SARS-CoV-2 variants. As shown in Tables 2 and 3, a relevant fraction of the T-cell epitopes originally present in Wuhan was abrogated, although 53 class-I and 99 class-II HLA epitopes remained fully conserved in All. All of these epitopes are linear, and some of them overlap. The NTD (Fig. 2A) and RBD (Fig. 2B) are less densely occupied by class-I HLA epitopes targeted by IFN-γ-secreting T-cells than S2 (Fig. 2C). Likewise, the NTD (Fig. 2D) and RBD (Fig. 2E) are less densely occupied by class-II HLA epitopes targeted by IFN-γ-secreting T-cells than S2 (Fig. 2F). Fully conserved class-II HLA epitopes outnumber class-I HLA epitopes (Supplemental material 4). Although the number of conserved epitopes increased slightly at some points along the historical sequence of VOC appearance, these results collectively indicate that a relevant number of T-cell epitopes were abrogated. In addition, most of the fully conserved T-cell epitopes are located in S2, followed by the RBD and NTD. Moreover, a smaller proportion of T-cell epitopes than of NAb epitopes was abrogated.
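Whether fewer T-cell than NAb epitopes were abrogated depends on whether absolute counts or proportions are compared. The sketch below uses the whole-S counts reported in Tables 1 to 3 (Wuhan versus All); the variable and function names are hypothetical. It shows that the class-II HLA set loses more epitopes in absolute terms (158) than the NAb set (96), but a smaller proportion (about 61% versus 77%), with class-I HLA epitopes losing about 58%.

```python
# Whole-S epitope counts from Tables 1-3: (present in Wuhan, fully conserved in All).
COUNTS = {
    "NAb": (124, 28),
    "class-I HLA": (127, 53),
    "class-II HLA": (257, 99),
}


def abrogation(counts: dict) -> dict:
    """Return (number lost, proportion lost rounded to 3 decimals) per epitope class."""
    result = {}
    for kind, (wuhan, conserved) in counts.items():
        lost = wuhan - conserved
        result[kind] = (lost, round(lost / wuhan, 3))
    return result


if __name__ == "__main__":
    for kind, (lost, frac) in abrogation(COUNTS).items():
        print(f"{kind}: {lost} epitopes abrogated ({frac:.0%})")
```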

Epitope population coverage
To estimate the coverage of the world population by the fully conserved T-cell epitopes, population coverage analyses were carried out for specific populations as well as for the entire world population.

Identification of an immune target combining epitopes for NAb and T-cells
To identify an immune target that could present epitopes for both NAbs and T-cells, we compared the distribution of the different kinds of epitopes in the NTD, RBD, and S2. All of the NAb epitopes were abrogated in the NTD, whereas a reduced number of T-cell epitopes were conserved in it. In contrast, S2 concentrated the highest number of T-cell epitopes among all domains/subunits of the S protein; however, although its NAb epitopes were well conserved, their number was originally low compared with that of the RBD. The RBD concentrated most of the fully conserved NAb epitopes, and these were better balanced with fully conserved T-cell epitopes than in the other domains. All of the NAb epitopes conserved in the RBD are discontinuous. Thus, as shown in Fig. 3, the RBD better balances different kinds of immune targets that could be used in an escape mutation-proof vaccine antigen to control future SARS-CoV-2 variants.
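The notion of one domain "better balancing" NAb and T-cell epitopes can be made concrete with a simple score. The sketch below uses a hypothetical balance score (smaller count divided by larger count, so 1.0 means equal numbers), which is not a metric defined in the study, applied to the fully conserved counts in All reported in Tables 1 and 3. Only class-II HLA epitopes are included because per-domain class-I counts are not reproduced in this section.

```python
# Fully conserved epitope counts in "All" (Tables 1 and 3): (NAb, class-II HLA).
CONSERVED = {
    "NTD": (0, 12),
    "RBD": (21, 19),
    "S2": (7, 64),
}


def balance(nab: int, tcell: int) -> float:
    """Hypothetical balance score: smaller count / larger count (1.0 = equal)."""
    larger = max(nab, tcell)
    return min(nab, tcell) / larger if larger else 0.0


def most_balanced(counts: dict) -> str:
    """Return the region whose NAb and T-cell counts are closest in ratio."""
    return max(counts, key=lambda region: balance(*counts[region]))


if __name__ == "__main__":
    for region, (nab, tcell) in CONSERVED.items():
        print(f"{region}: balance = {balance(nab, tcell):.2f}")
    print("Most balanced:", most_balanced(CONSERVED))
```

Under this score the NTD scores 0.00, S2 about 0.11, and the RBD about 0.90, consistent with the qualitative comparison above.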

DISCUSSION
Although bivalent COVID-19 vaccines have been shown to be effective in preventing symptomatic infections by the newest SARS-CoV-2 genetic variants, the WHO recognizes that the risk of new VOCs remains. Many mutations accumulated in the main vaccine target, the S protein, making the resulting viruses more transmissible as well as capable of circumventing the immune response elicited by vaccines. The control measures used since the beginning of the pandemic have shaped the selection of viral variants toward these two features. As there is no global effort to eradicate the disease, new SARS-CoV-2 variants are expected to emerge from areas with absent and/or inefficient immunization programs. To deal with this scenario, we aimed to find S protein epitopes that can be targeted to stop future variants.
Our results indicated that the highest number of fully conserved NAb epitopes is located in the RBD, whereas most of the conserved T-cell epitopes are located in S2. The conservation of NAb and T-cell epitopes in the RBD and S2, respectively, may be related to key biological functions of the spike protein, especially the maintenance of structural stability, binding to the host cell receptor, and membrane fusion (12). Most of the fully conserved NAb epitopes in the RBD were discontinuous and concentrated in positions 400 to 500 of the S protein amino acid sequence (13). These positions play key roles in attachment to the host receptor angiotensin-converting enzyme 2 (ACE2). Mutations in these positions are expected to compromise or abrogate the capacity of the virus to attach to and infect the host cell. S2, which is composed successively of a fusion peptide, HR1, HR2, a transmembrane domain, and a cytoplasmic domain, is responsible for viral fusion and entry. Proteolytic cleavage at the S1/S2 and S2′ sites separates S1 from S2 and primes the fusion machinery, after which HR1 and HR2 interact to drive membrane fusion; these events are essential for entry. All of the fully conserved NAb epitopes in S2 were located upstream of HR1 and HR2. Mutations in these positions are expected to compromise protein stability or abrogate the fusion and entry events necessary for the viral life cycle and infectivity (14-16). We previously showed a similar constraint on mutation in flaviviruses, in which the most conserved NAb epitopes were concentrated in structures with key biological functions (17, 18). The highly conserved NAb epitopes identified in this study are likewise located in positions that appear to have key biological functions. This result is in line with several reports showing the conservation of key epitopes in the RBD of different SARS-CoV-2 variants, and also of other coronaviruses (19-25). On the other hand, conserved T-cell epitopes were located in different domains of the S protein. It seems that the distribution of this kind of epitope is not governed by the stringency related to biological functions observed for NAb epitopes. Overall, the RBD better balanced the numbers of fully conserved NAb and T-cell epitopes. These results are important because both the humoral and cellular arms of the immune response are necessary to control the virus. NAbs produced by B-cells are essential for viral control because they prevent viral particles from infecting cells through different mechanisms (26). CD8+ cytotoxic T-cells, in turn, can specifically recognize infected cells through viral peptides presented by class-I HLA and kill them, which contributes to the control of viral replication and spread (27). In addition, CD4+ helper T-cells activate memory B-cells and CD8+ T-cells, contributing to a broader, more specific, and more effective immune response (27). As can be noted, the functions of these immune cell types are interconnected, which highlights the importance of our finding that their target epitopes are conserved and more equitably present in the SARS-CoV-2 RBD.
The distribution of highly conserved epitopes in the S protein shown in this study can inform the rational design of future vaccine antigens. For example, the current strategy of periodically updating the vaccine antigen could be replaced with a broadly protective antigen designed to deal with future SARS-CoV-2 variants in the long term. The immune targets identified here could be used to fight not only future COVID-19 but also other coronaviruses that are related to or descend from SARS-CoV-2. In addition, combinations of fully conserved epitopes, or the RBD, which better balances NAb and T-cell epitopes, could be incorporated into cutting-edge vaccine platforms, such as those based on self-amplifying RNA, which require smaller doses and can induce potent immune responses (28). Peptides based on fully conserved RBD epitopes could also be displayed on ferritin or other nanoparticles to generate more potent immune responses (28). Ferritin self-assembles into a spherical nanoparticle whose surface can be decorated with proteins. Altogether, the results and indications presented in this study can help fight COVID-19 in the long term.
One could argue that the sample size of protein sequences used in this study was not adequate. However, some important points deserve to be highlighted: (i) the uneven quality of sequences deposited in the GenBank and Global Initiative on Sharing All Influenza Data (GISAID) databases; (ii) the high genomic similarity within each SARS-CoV-2 VOC, which means that a high proportion of the sequences of each variant are mostly identical; and (iii) the quality filtering applied to the sequences, which removed sequences with degenerate bases as well as duplicates. In addition, we studied only immune targets located in the SARS-CoV-2 S protein. Other viral proteins also contain epitopes; however, the number of well-characterized epitopes in the S protein is far higher than in any other protein. Moreover, we studied only SARS-CoV-2 S proteins and not those from other sarbecoviruses. Finally, one could argue that we did not present an in vivo proof of concept, which is true. However, all of the epitopes presented in this study were previously well characterized in vitro and in vivo and are involved in protective anti-viral immune mechanisms against SARS-CoV-2.
In this study, we showed that, among all S protein domains, the RBD best balances fully conserved epitopes for NAbs and for IFN-γ-producing T-cells. These findings indicate that the RBD holds the most balanced numbers of the different kinds of immune targets that have been conserved since the beginning of the COVID-19 pandemic and that could be included in an escape mutation-proof vaccine antigen.

Data sets of SARS-CoV-2 spike protein amino acid sequences and epitopes
A first data set of 215 amino acid sequences of the spike protein from the wild-type SARS-CoV-2 (Wuhan, n = 28) as well as from its variants of concern Alpha (n = 24), Beta (n = 16), Gamma (n = 32), Delta (n = 50), and Omicron (BA.1, BA.2, BA.4, BA.5, XBB, and BQ.1; n = 60) was built (Supplemental material 1). Quality-filtered data sets (no degenerate bases and no duplicate sequences) were obtained with the Biopython-based software Sequence Cleaner (https://github.com/metageni/Sequence-Cleaner). Sequences were representative of the Americas, Europe, Africa, Asia, and Oceania. From July 2022 to January 2023, they were retrieved from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/) as amino acid sequences or from GISAID (https://gisaid.org/) as genomic nucleotide sequences. S protein-coding sequences retrieved from GISAID were translated into amino acid sequences using the multiplatform bioinformatics software UGENE v.45.0 (http://ugene.net/). The S protein sequences were selected according to the following criteria: (i) complete sequences and (ii) absence of unidentified amino acids. In addition, three data sets of immune targets (epitope amino acid sequences) were built from IEDB (https://www.iedb.org/), a platform that provides experimentally confirmed data on antibody and T-cell epitopes for different infectious diseases. The data set of immune targets for NAbs consisted of 415 epitopes validated by virus neutralization assays, such as plaque reduction neutralization tests and focus reduction neutralization tests, and by structural biology analyses (29) (Supplemental material 2). The data set of epitopes for class-I human leukocyte antigen (HLA) consisted of 159 epitopes validated by T-cell IFN-γ release assays (Supplemental material 3). The data set of epitopes for class-II HLA consisted of 321 epitopes validated by T-cell IFN-γ release assays (Supplemental material 3). The databases (NCBI, GISAID, and IEDB) were accessed until 30 January 2023 to construct the data sets used in this study. Duplicate epitope sequences were removed from the analyses, as were NAb epitopes with more than one chain.
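The sequence selection criteria above (complete sequences, no unidentified amino acids, no duplicates) can be sketched in plain Python. This is not the Sequence Cleaner tool or its options; the function names, the set of ambiguity codes, and the minimum-length threshold used as a proxy for completeness are all hypothetical. The default threshold is set slightly below the 1,273-residue Wuhan length so that variants carrying short deletions are not rejected.

```python
# Hypothetical QC sketch (not Sequence Cleaner): keep complete S protein
# sequences without unidentified residues and drop exact duplicates.
AMBIGUOUS = set("XBZJ")  # assumed ambiguity codes treated as unidentified residues


def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean(records, min_length: int = 1260) -> list:
    """Filter records by length and ambiguity, keeping the first of each duplicate."""
    seen, kept = set(), []
    for header, seq in records:
        seq = seq.upper().rstrip("*")  # drop a trailing stop symbol, if present
        if len(seq) < min_length:      # assumed proxy for completeness
            continue
        if AMBIGUOUS & set(seq):       # contains unidentified residues
            continue
        if seq in seen:                # exact duplicate of an already kept sequence
            continue
        seen.add(seq)
        kept.append((header, seq))
    return kept
```

For example, `clean(parse_fasta(fasta_text))` returns the retained (header, sequence) pairs in input order.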

Epitope conservation analysis
As previously described (17, 18, 30), the IEDB epitope conservancy analysis tool (http://tools.iedb.org/conservancy) was used to determine epitope conservation among the SARS-CoV-2 S protein sequences in our data set. In the present study, only fully conserved epitopes (100% conserved in the S protein amino acid sequences of the data sets used) were considered, including epitopes for B- and T-cells. Separate conservation analyses were carried out using the S protein data set of each variant (Wuhan, Alpha, Beta, Gamma, Delta, or Omicron), as well as using the whole data set containing all SARS-CoV-2 variant S protein amino acid sequences (All).
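The core of the conservation analysis, asking what fraction of sequences contain an epitope with 100% identity, can be sketched as follows. This is a simplified illustration, not the IEDB tool's implementation: linear epitopes are matched as exact substrings, and discontinuous epitopes are represented as hypothetical (position, residue) pairs checked under the assumption that all sequences share the Wuhan reference numbering. Real variant sequences carry insertions and deletions, so positions would first need to be mapped through an alignment.

```python
def linear_conservancy(epitope: str, sequences: list) -> float:
    """Fraction of sequences containing the linear epitope with 100% identity."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for seq in sequences if epitope in seq)
    return hits / len(sequences)


def discontinuous_conservancy(residues: list, sequences: list) -> float:
    """Fraction of sequences matching every (1-based position, residue) pair.

    Assumes every sequence uses the same numbering as the reference (no indels).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(
        1
        for seq in sequences
        if all(pos <= len(seq) and seq[pos - 1] == aa for pos, aa in residues)
    )
    return hits / len(sequences)


def count_fully_conserved(linear_epitopes: list, datasets: dict) -> dict:
    """For each data set (e.g. one per variant, plus "All"), count epitopes at 100%."""
    return {
        name: sum(1 for ep in linear_epitopes if linear_conservancy(ep, seqs) == 1.0)
        for name, seqs in datasets.items()
    }
```

Running `count_fully_conserved` once per variant data set and once on the pooled set mirrors the one-to-one and All comparisons described above.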

Population coverage analysis
The T-cell epitopes selected in the conservation analysis were subjected to population coverage analysis using the IEDB population coverage calculation tool (http://tools.iedb.org/population/), as previously described (18).
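To illustrate the kind of calculation behind population coverage, the sketch below estimates the fraction of individuals carrying at least one HLA allele that restricts the epitope set, assuming Hardy-Weinberg equilibrium within each locus and independence between loci. This is a simplified approximation in the spirit of the IEDB tool, not its implementation; the function names are hypothetical and the allele frequencies in the example are made up for illustration.

```python
from math import prod


def restricting_frequency(allele_freqs: dict, restricting: set) -> float:
    """Sum of allele frequencies at one locus for alleles that restrict the epitopes."""
    return sum(f for allele, f in allele_freqs.items() if allele in restricting)


def locus_coverage(allele_freqs: dict, restricting: set) -> float:
    """P(individual carries >= 1 restricting allele at this locus), under HWE."""
    p = restricting_frequency(allele_freqs, restricting)
    return 1 - (1 - p) ** 2


def population_coverage(loci: dict, restricting: set) -> float:
    """P(individual carries >= 1 restricting allele at any locus), loci independent."""
    miss = prod(1 - locus_coverage(freqs, restricting) for freqs in loci.values())
    return 1 - miss


# Made-up allele frequencies for illustration only (each locus sums to 1).
LOCI = {
    "HLA-A": {"A*02:01": 0.3, "A*01:01": 0.2, "other": 0.5},
    "HLA-B": {"B*07:02": 0.1, "other": 0.9},
}
RESTRICTING = {"A*02:01", "B*07:02"}

if __name__ == "__main__":
    print(f"Coverage: {population_coverage(LOCI, RESTRICTING):.2%}")
```

With these invented frequencies, the HLA-A and HLA-B loci contribute 51% and 19% coverage, respectively, and about 60.31% of individuals carry at least one restricting allele. The IEDB tool additionally reports average_hit and pc90, which this sketch does not compute.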
Class-I HLA epitopes which are targeted by IFN-γ-secreting T-cells decreased (Table 2) from 127 in the Wuhan whole S protein to 83 and 53 in Omicron and All, respectively. Regarding class-II HLA epitopes which are targeted by IFN-γ-secreting T-cells, conservation analyses (Table 3) revealed that of the 62 epitopes present in the Wuhan NTD, 22 and 12 were fully conserved in Omicron and All, respectively. Of the 36 epitopes originally present in the Wuhan RBD, 19 were fully conserved in both Omicron and All. Epitopes contained in S1 but outside the NTD and RBD decreased from 27 in Wuhan to 10 and 4 in Omicron and All, respectively. In addition, epitopes originally present in the Wuhan S2 decreased from 132 to 102 and 64 in Omicron and All, respectively. Overall, class-II HLA epitopes decreased from 257 in the Wuhan whole S protein to 150 and 99 in Omicron and All, respectively.

FIG 2 Fully conserved class-I/II HLA epitopes in the SARS-CoV-2 spike protein which are targeted by IFN-γ-secreting CD4+ and CD8+ T-cells. Epitopes are shown in blue in 3D models of the spike protein trimer domains. Fully conserved epitopes for class-I HLA which are targeted by IFN-γ-secreting T-cells are shown in the NTD (A), RBD (B), and S2 (C). Fully conserved epitopes for class-II HLA which are targeted by IFN-γ-secreting T-cells are shown in the NTD (D), RBD (E), and S2 (F).

FIG 3 SARS-CoV-2 spike protein trimer 3D model with highlighted epitopes. (A) The fully conserved epitopes for NAbs and for class-I/II HLA which are targeted by IFN-γ-secreting T-cells (shown in blue) are part of the RBD (shown in gray). (B) Top view of the 3D model; (C) bottom view of the 3D model. Most of the NAb epitopes are discontinuous. All of the T-cell epitopes are linear. Many of the epitopes overlap. The RBD concentrates both NAb and T-cell targets that could be used in an escape mutation-proof vaccine antigen to control future SARS-CoV-2 variants.

TABLE 1
Numbers of fully conserved NAb epitopes in subunits and domains of the spike proteins of the main SARS-CoV-2 genetic variants
a NTD, N-terminal domain, located in subunit 1 (S1) of the SARS-CoV-2 spike protein. b RBD, receptor-binding domain, located in S1. c S2, subunit 2 of the SARS-CoV-2 spike protein. d Wuhan, wild-type SARS-CoV-2. e All, set of spike protein amino acid sequences (n = 215) from wild-type SARS-CoV-2 (Wuhan) as well as from its variants of concern (VOCs) Alpha, Beta, Gamma, Delta, and Omicron.

TABLE 2
Numbers of fully conserved class-I HLA epitopes in subunits and domains of the spike proteins of the main SARS-CoV-2 genetic variants
a NTD, N-terminal domain, located in S1 of the SARS-CoV-2 spike protein. b RBD, receptor-binding domain, located in S1. c S1, subunit 1 of the SARS-CoV-2 spike protein. d S2, subunit 2 of the SARS-CoV-2 spike protein. e Wuhan, wild-type SARS-CoV-2. f All, set of spike protein amino acid sequences (n = 215) from wild-type SARS-CoV-2 (Wuhan) as well as from its variants of concern Alpha, Beta, Gamma, Delta, and Omicron.

TABLE 3
Numbers of fully conserved class-II HLA epitopes in subunits and domains of the spike proteins of the main SARS-CoV-2 genetic variants
All, set of spike protein amino acid sequences (n = 215) from wild-type SARS-CoV-2 (Wuhan) as well as from its variants of concern Alpha, Beta, Gamma, Delta, and Omicron.

As shown in Table 4, class-I HLA epitopes which are targeted by IFN-γ-secreting T-cells had an average coverage of 88.18% in the world population. The coverage was higher than 92% in most continents and subcontinents, except Central America. In contrast, class-II HLA epitopes which are targeted by IFN-γ-secreting T-cells had an average coverage of 59.55% in the world population. This coverage was higher than 85% in continents historically highly affected by COVID-19, such as North America and Europe. The class-combined coverage (combining class-I and class-II epitope coverages) for the world population was 93.06%. Altogether, these results indicate that the fully conserved T-cell epitopes identified in this study have high coverage in the world population and have the potential to stimulate effective T-cell-based immune responses against future SARS-CoV-2 variants.

TABLE 4
Population coverage of class-I and class-II HLA epitopes, which are targeted by IFN-γ-secreting CD8+ and CD4+ T-cells, respectively

Population/area | Class-I HLA: coverage (a), average_hit (b), pc90 (c) | Class-II HLA: coverage (a), average_hit (b), pc90 (c) | Class combined: coverage (a), average_hit (b), pc90 (c)
a Population coverage (%) by the Immune Epitope Database (IEDB). b Average number of epitope hits in the population of each area. c Minimum number of epitope hits/HLA combinations recognized by 90% of the population. d Averages of coverage, average_hit, and pc90 values over all continents (populations/areas) studied. e Standard deviations of coverage, average_hit, and pc90 values over all continents (populations/areas) studied.