Identification of Novel Candidate CD8+ T Cell Epitopes of the SARS-CoV2 with Homology to Other Seasonal Coronaviruses

Cross-reactive T cell immunity to seasonal human coronaviruses (HCoVs) may lead to immunopathology or protection during SARS-CoV2 infection. To understand the influence of cross-reactive T cell responses, we used the Immune Epitope Database (IEDB) and NetMHCpan (ver. 4.1) to identify candidate CD8+ T cell epitopes restricted through HLA-A and B alleles. Conservation analysis of these epitopes was carried out with the HCoVs OC43, HKU1, and NL63. Of the 18 candidate CD8+ T cell epitopes (binding score of ≥0.90) with a high degree of homology (>75%) to the other three HCoVs, 12 were within the NSP12 and NSP13 proteins. They were predicted to be restricted through HLA-A*2402, HLA-A*0201, HLA-A*0206, and the HLA-B allele B*3501. Thirty-one candidate CD8+ T cell epitopes that were specific to the SARS-CoV2 virus (<25% homology with other HCoVs) were predominantly identified within the structural proteins (spike, envelope, membrane, and nucleocapsid) and NSP1, NSP2, and NSP3. They were predicted to be restricted predominantly through HLA-B*3501 (6/31), HLA-B*4001 (6/31), HLA-B*4403 (7/31), and HLA-A*2402 (8/31). To understand disease pathogenesis, it will be crucial to determine which T cell responses are associated with protection, and how the functionality and phenotype of epitope-specific T cell responses differ between HLA alleles common in different geographical groups.


Introduction
Infection with the SARS-CoV2 virus is currently the leading cause of mortality among elderly and vulnerable groups in many countries. Outbreaks of COVID-19 have now been reported in 220 countries [1], with some countries experiencing a massive second wave. Most people infected with SARS-CoV2 experience asymptomatic or mild illness, whereas severe illness and fatalities are seen in older individuals and in those with comorbidities such as diabetes, cancer, and cardiovascular disease [2–4]. However, the observed case fatality ratios (CFRs) vary widely between countries. For instance, Mexico, Italy, and Iran report CFRs of over 3.5%, while India, Turkey, and South Korea report CFRs of <2% [5,6]. The number of deaths per 100,000 population also varies widely among countries: many European countries and the US report over 100 deaths per 100,000 population, while most South Asian and South East Asian countries report <25 deaths per 100,000 population, despite large ongoing outbreaks in the South East Asian region [5].
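The two measures above answer different questions: the CFR divides deaths by confirmed cases, so it depends on how many infections are detected, whereas deaths per 100,000 divides deaths by the whole population. The short Python sketch below computes both for a hypothetical country; the function names and all counts are illustrative assumptions, not data from any of the cited sources.

```python
def case_fatality_ratio(deaths: int, confirmed_cases: int) -> float:
    """Deaths as a percentage of confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


def deaths_per_100k(deaths: int, population: int) -> float:
    """Deaths per 100,000 people in the whole population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * deaths / population


# Hypothetical country: 40,000 deaths among 1,000,000 confirmed cases
# in a population of 50,000,000 people.
cfr = case_fatality_ratio(40_000, 1_000_000)   # 4.0 (%)
rate = deaths_per_100k(40_000, 50_000_000)     # 80.0 deaths per 100,000
print(f"CFR = {cfr:.1f}%, mortality = {rate:.1f} per 100,000")
```

The same death count can therefore give a high CFR but a modest population mortality rate, or the reverse, depending on case detection and population size.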
Many factors have been shown to influence CFRs in different countries and could account for the differences between them, including the proportion of the population aged >70 years, GDP per capita, BCG vaccination status [7,8], climate, population density, and social distancing measures. Immune factors could also contribute to these differences, including immunity to human seasonal coronaviruses (HCoVs) that may provide cross-protection against SARS-CoV2. Children are more frequently exposed to HCoVs than adults, and a recent study showed that a large proportion of children aged 1 to 16 years, sampled between 2011 and 2018, had IgG antibodies that cross-react with the spike protein of SARS-CoV-2 [9]. Sera from SARS-CoV2-uninfected individuals with such cross-reactive antibodies were able to neutralize SARS-CoV-2 pseudotypes without causing antibody-dependent enhancement of infection [9].
Higher lymphocyte counts, and specifically higher CD8+ T cell counts, have been associated with early viral clearance and reduced disease severity [10]. Robust SARS-CoV2-specific CD4+ and CD8+ T cell responses have been detected following natural infection [11,12]. SARS-CoV2 cross-reactive T cells have been detected following SARS-CoV1 infection in 2003; these were predominantly directed against the nucleocapsid (N) protein and the non-structural proteins NSP7 and NSP13 [13]. Interestingly, some studies have also detected SARS-CoV2 cross-reactive CD4+ T cells in a large proportion of unexposed individuals, which are thought to be due to the presence of T cells specific to HCoVs [12,14]. However, differences in T cell assay methodology, with differing sensitivity and specificity, have led to differing findings on cross-reactivity. Ultimately, it will be important to know whether T cells can respond to virally infected target cells presenting naturally processed epitopes. Such cross-reactive T cell responses have been speculated to protect against infection with the SARS-CoV2 virus, but also to contribute to disease pathogenesis or to influence responses to vaccines [13,14].
Because CFRs differ among countries owing to multiple factors, it will be important to understand whether prior immunity to HCoVs influences disease outcome in SARS-CoV2 infection. The predominant CD8+ and CD4+ T cell epitopes recognized by a population are also influenced by the frequencies of HLA types in that population. Therefore, in this study, we analyzed the homology of all SARS-CoV2 proteins with three HCoVs (OC43, NL63, and HKU1) to identify regions that are likely to be cross-reactive and regions that are specific to SARS-CoV2, and then used the IEDB to predict CD8+ T cell epitopes restricted through the common HLA class I alleles in the Sri Lankan population.

T-Cell Epitope Prediction for MHC Class I HLA Alleles
The IEDB (Immune Epitope Database) (www.iedb.org, 10 October 2020) [15] and NetMHCpan (ver. 4.1) [16] were used to find putative peptide sequences restricted through MHC class I HLA alleles, as these methods use artificial neural network-based algorithms that have been reported to predict epitopes with high accuracy. The HLA-A*02, HLA-A*24, HLA-A*33, HLA-B*35, HLA-B*40, HLA-B*44, HLA-C*07, HLA-C*06, HLA-C*04, and HLA-C*03 alleles were used for this epitope prediction, as these are the predominant alleles in the Sri Lankan population (each seen in >10% of the population) [17]. Predictions were carried out for 8mer, 9mer, and 10mer epitopes. The NetMHCpan EL 4.1 prediction gives a score ranging from 0.0 to 1.0, and epitopes with a higher score are considered stronger binders.
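The prediction tools scan each protein as overlapping windows of fixed length, so a protein of length L yields L - k + 1 candidate peptides for each length k. The following Python sketch, assuming 1-based inclusive residue numbering, shows how the 8mer, 9mer, and 10mer candidates could be enumerated before scoring; `enumerate_kmers` is a hypothetical helper, the 11-residue fragment is illustrative, and the binding scores themselves would still come from NetMHCpan or the IEDB tools.

```python
from typing import Iterator, Tuple


def enumerate_kmers(
    protein: str, lengths: Tuple[int, ...] = (8, 9, 10), first_residue: int = 1
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, peptide) for every overlapping window of each length.

    Positions are 1-based and inclusive; `first_residue` is the residue number
    of protein[0], so a fragment can keep the numbering of the full protein.
    """
    for k in lengths:
        for i in range(len(protein) - k + 1):
            start = first_residue + i
            yield start, start + k - 1, protein[i:i + k]


# Illustrative 11-residue fragment numbered from residue 70.
fragment = "FLLPSLATVAY"
for start, end, peptide in enumerate_kmers(fragment, lengths=(9,), first_residue=70):
    print(start, end, peptide)
# 70 78 FLLPSLATV
# 71 79 LLPSLATVA
# 72 80 LPSLATVAY
```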

Conservation Analysis of SARS-CoV-2 Proteins and Predicted Epitopes with OC43, HKU1, and NL63
Epitopes with a predicted binding score of >0.8 from the above approach that were highly conserved between SARS-CoV2 and OC43, HKU1, and NL63 were further analyzed to determine the percentage identity and similarity between SARS-CoV2 and the other HCoVs. The sequences were aligned using Clustal W in MEGA X software (www.megasoftware.net, 5 October 2020). Conservation of protein sequences was analyzed using the conservation analysis tools available at the European Bioinformatics Institute (EBI) (www.ebi.ac.uk, 12 October 2020), IEDB, and Jalview (www.jalview.org, 15 October 2020).
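Percentage identity for a pair of aligned sequences can be computed column by column. A minimal sketch, assuming gap characters are written as '-' and that gapped columns count toward the alignment length (conventions differ between tools); `percent_identity` is a hypothetical helper, and the second peptide is an invented homologue rather than a real OC43, HKU1, or NL63 sequence. Percentage similarity would additionally need a residue grouping or substitution matrix, which is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same (non-gap) residue.

    Gapped columns count toward the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned 9mer pair differing at two positions (7/9 identical).
print(round(percent_identity("FVDGVPFVV", "FVDGIPFVI"), 1))  # 77.8
```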

Results
To identify possible cross-reactive regions between SARS-CoV2 and other HCoVs, we carried out a conservation analysis of the SARS-CoV-2 virus with OC43, HKU1, and NL63, which have been detected in Sri Lanka. We analyzed the homology and conservation of the 4 structural proteins and the 16 NSPs of SARS-CoV2 with the 3 other HCoVs (Table S1). NSP12, NSP13, and NSP16 of SARS-CoV2 showed ≥65% homology with OC43 and HKU1, which are beta coronaviruses, and >55% homology with NL63, which is an alpha coronavirus.
In contrast, NSP1 and NSP2 of SARS-CoV2 showed <20% homology with the other three viruses, suggesting that these proteins are more specific to SARS-CoV-2 than the other proteins. All four structural proteins (S, E, M, and N), NSP3, and NSP6 showed <35% homology with OC43, HKU1, and NL63. Therefore, immune responses directed at these proteins may also be specific to SARS-CoV2, unlike immune responses generated against NSP12, NSP13, and NSP16. The SARS-CoV-2 proteins S, E, M, and N and the non-structural proteins showed less homology with NL63 than with OC43 and HKU1, suggesting that SARS-CoV-2 is genetically closer to OC43 and HKU1 than to NL63.

Identification of Possible CD8+ Epitopes of the SARS-CoV2 Virus Restricted through HLA-A Alleles
The NetMHCpan EL4 epitope prediction tool gives peptide binding scores ranging from 0 to 1.0, and we considered a predicted score of ≥0.90 as indicative of a strong binder. None of the 8mer peptides gave a predicted score of ≥0.90. However, 39 9mer peptides, restricted through different HLA-A alleles, gave a high binding score of ≥0.90 (Table 1). Five of these epitopes were identified from the spike and NSP3 proteins, and four epitopes each from NSP4, NSP6, NSP12, NSP14, and NSP15. No peptides with high binding scores were identified from the envelope, NSP1, NSP9, NSP10, NSP11, and NSP16 proteins. The epitopes YYTSNPTTF (NSP3, residues 726–734; predicted to be restricted through HLA-A*2402) and FLLPSLATV (NSP6, residues 70–78; predicted to be restricted through HLA-A*0201) gave a score of 0.99, while NYMPYFFTL (NSP3, residues 1349–1357), FLLNKEMYL (NSP4, residues 420–428), and ALWEIQQVV (NSP8, residues 152–160) gave a score of 0.98. However, these peptides had <45% homology with the other three viruses. Of the 39 9mer epitopes predicted in this study, 23 were predicted to be restricted through HLA-A*02 and 16 through HLA-A*24. Although HLA-A*33 is seen in over 10% of the Sri Lankan population, no 9mers with a score of ≥0.90 were identified for this allele.
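The score cut-off and the per-allele tallies in this paragraph can be expressed as a filter followed by a count. In this sketch, the two 0.99 scores come from the text, while the third record, the `Prediction` type, and both helper functions are hypothetical; the grouping simply truncates four-digit names such as HLA-A*2402 to their two-digit group (HLA-A*24).

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    peptide: str
    allele: str   # four-digit notation as used in the text, e.g. "HLA-A*2402"
    score: float  # binding score on the 0.0 to 1.0 scale


def strong_binders(preds: Iterable[Prediction], threshold: float = 0.90) -> List[Prediction]:
    """Keep predictions whose score is at or above the cut-off (inclusive)."""
    return [p for p in preds if p.score >= threshold]


def count_by_allele_group(preds: Iterable[Prediction]) -> Counter:
    """Tally predictions per allele group, e.g. 'HLA-A*2402' -> 'HLA-A*24'."""
    return Counter(p.allele[:8] for p in preds)


preds = [
    Prediction("YYTSNPTTF", "HLA-A*2402", 0.99),  # score reported in the text
    Prediction("FLLPSLATV", "HLA-A*0201", 0.99),  # score reported in the text
    Prediction("LLPSLATVA", "HLA-A*0201", 0.35),  # hypothetical score
]
strong = strong_binders(preds)
print([p.peptide for p in strong])  # ['YYTSNPTTF', 'FLLPSLATV']
print(count_by_allele_group(strong))
```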
Only four 10mer peptides were predicted to have a score of ≥0.90: two from the spike protein and one each from the NSP3 and NSP6 proteins (Table 1).

Identification of Possible CD8+ Epitopes of the SARS-CoV2 Virus Restricted through HLA-B Alleles
As for HLA-A alleles, we considered a predicted score of ≥0.90 as indicative of a strong binder. None of the 8mer peptides gave a score of ≥0.90, and therefore none were predicted to be restricted through HLA-B alleles. However, 38 9mer peptides had high binding scores and were predicted to be restricted through HLA-B alleles (Table 2). The highest numbers of epitopes were predicted from the spike protein (5/38) and NSP13 (4/38). Three epitopes each were predicted from the nucleocapsid, NSP2, NSP3, NSP4, and NSP12 proteins. No epitopes were identified from the envelope, membrane, and NSP11 proteins. Nine epitopes gave a score of 0.99: IPFAMQMAY (spike, residues 895–903), TPSGTWLTY (nucleocapsid, residues 325–333), SEVGPEHSL (residues 195–203) and GETLPTEVL (residues 562–570) from NSP2, EEFEPSTQY (residues 120–128) and QEILGTVSW (residues 546–554) from NSP3, LPSLATVAY (NSP6, residues 72–80), SEFSSLPSY (NSP8, residues 4–12), and VENPHLMGW (NSP12, residues 608–616). restricted through HLA-B*40 showed a homology of >75% with OC43 and HKU1, which are two other beta coronaviruses.
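Reported positions such as FLLPSLATV (70–78) and LPSLATVAY (72–80) in NSP6 overlap, so the two peptides together imply an 11-residue stretch, FLLPSLATVAY, starting at residue 70. The sketch below, which assumes 1-based inclusive numbering, uses a hypothetical `locate` helper to recompute positions from a sequence, a simple way to check reported positions for consistency; the fragment is reconstructed from peptides in this section and was not checked against a reference sequence.

```python
from typing import List, Tuple


def locate(peptide: str, protein: str, first_residue: int = 1) -> List[Tuple[int, int]]:
    """Return every 1-based inclusive (start, end) at which `peptide` occurs.

    `first_residue` is the residue number of protein[0]; overlapping hits
    are reported because the search restarts one residue after each hit.
    """
    hits = []
    i = protein.find(peptide)
    while i != -1:
        start = first_residue + i
        hits.append((start, start + len(peptide) - 1))
        i = protein.find(peptide, i + 1)
    return hits


# Fragment implied by the two overlapping NSP6 peptides, numbered from residue 70.
nsp6_fragment = "FLLPSLATVAY"
print(locate("FLLPSLATV", nsp6_fragment, first_residue=70))  # [(70, 78)]
print(locate("LPSLATVAY", nsp6_fragment, first_residue=70))  # [(72, 80)]
```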

Identification of Possible CD8+ Epitopes of the SARS-CoV2 Virus Restricted through HLA-C Alleles
None of the 8mer and 10mer peptides gave a score of ≥0.90, and therefore none were predicted to be restricted through HLA-C alleles. However, 21 9mer peptides had high binding scores and were predicted to be restricted through HLA-C alleles (Table 3). The highest numbers of epitopes were predicted from NSP3 (6/21) and the spike protein (4/21). No epitopes were predicted from the nucleocapsid, envelope, NSP2, NSP4, NSP6, NSP9, NSP11, NSP12, NSP15, and NSP16 proteins.
After identifying peptides with high predicted binding to the common HLA-A, B, and C alleles present in the Sri Lankan population, we identified the regions of the SARS-CoV2 virus that had >75% homology with the HCoVs. We then identified candidate CD8+ T cell epitopes within these regions that were predicted to be restricted through these HLA alleles. The aim was to determine whether SARS-CoV2 contains CD8+ T cell epitopes that are likely to cross-react with the other HCoVs. None of the predicted 8mer CD8+ epitopes within the SARS-CoV2 virus gave a high binding score; therefore, only the predicted 9mer and 10mer CD8+ T cell epitopes of the SARS-CoV2 virus were analyzed for their degree of homology with OC43, HKU1, and NL63.
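Finding regions with >75% homology can be sketched as a sliding window over a pairwise alignment, for example one exported from Clustal W. The helper below is a hypothetical illustration, not the procedure used in this study: it skips windows that contain gaps, uses an inclusive 75% cut-off, and reports start positions in the ungapped SARS-CoV2 sequence. The aligned stretches are invented for the example.

```python
from typing import List, Tuple


def conserved_windows(
    aligned_ref: str, aligned_other: str, width: int = 9, min_identity: float = 75.0
) -> List[Tuple[int, str, float]]:
    """Scan a pairwise alignment for windows with high percent identity.

    Returns (ref_start, ref_window, identity), where ref_start is the 1-based
    residue number in the ungapped reference. Windows containing a gap in
    either sequence are skipped, which is a simplification.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("sequences must come from the same alignment")
    hits = []
    for i in range(len(aligned_ref) - width + 1):
        a = aligned_ref[i:i + width]
        b = aligned_other[i:i + width]
        if "-" in a or "-" in b:
            continue
        identity = 100.0 * sum(x == y for x, y in zip(a, b)) / width
        if identity >= min_identity:
            # Count reference residues (not alignment columns) before this window.
            ref_start = len(aligned_ref[:i].replace("-", "")) + 1
            hits.append((ref_start, a, identity))
    return hits


# Hypothetical aligned stretches: a reference and one HCoV.
ref = "FVDGVPFVVST"
hcov = "FVDGIPFVISA"
for start, window, ident in conserved_windows(ref, hcov):
    print(start, window, round(ident, 1))
# 1 FVDGVPFVV 77.8
# 2 VDGVPFVVS 77.8
```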
Epitopes with ≥75% homology with ≥2 HCoVs are shown in Table 4. Thirty-four 9mer epitopes and 18 10mer peptides within the SARS-CoV2 virus had ≥75% homology with ≥2 HCoVs. Of the 9mer peptides, 22/34 gave a peptide binding score of ≥0.90. Of these CD8+ T cell epitopes within the cross-reactive regions, 11/34 were predicted to be restricted through HLA-A alleles, 6/34 through HLA-B alleles, and 5/34 through HLA-C alleles. Six highly cross-reactive CD8+ T cell epitopes (9mers) with high HLA-A (A*0201 and A*0206) binding scores were identified from NSP12 and NSP13 (FVDGVPFVV, residues 334–342). Two of these peptides showed 100% homology with OC43 and HKU1. The envelope, nucleocapsid, and the other non-structural proteins (apart from NSP12 and NSP13) did not have regions with >75% homology with the other HCoVs. The alignments of SARS-CoV2 NSP12 and NSP13, from which most of the cross-reactive epitopes were identified, and the epitope positions are shown in Figures S1 and S2. Of the 10mer peptides analyzed, 18 within SARS-CoV2 had ≥75% homology with ≥2 HCoVs (Table 4). Only one of these, AEIVDTVSAL (NSP13, residues 446–455), gave a score of ≥0.90. Of these 18 10mer CD8+ T cell epitopes, 14 were predicted to be restricted through HLA-A alleles, 3 through HLA-B alleles, and 1 through HLA-C. As with the 9mer peptides, 6 of the 10mer peptides that were highly homologous with OC43, HKU1, and NL63 were found within the NSP12 and NSP13 region.

Identification of CD8+ T Cell Epitopes of SARS-CoV2 That Show ≤25% Homology with OC43, HKU1, and NL63
After identifying highly cross-reactive CD8+ T cell epitopes within SARS-CoV2, we identified regions that were specific to the virus and did not cross-react with other HCoVs, and which are therefore likely to contain SARS-CoV2-specific CD8+ T cell epitopes. 9mer peptides representing different regions of the SARS-CoV2 virus that had ≤25% homology with >2 HCoVs were analyzed, and 60 such potential CD8+ T cell epitopes were identified (Table 5). Of these, 19/60 were predicted to be restricted through HLA-A alleles, 20/60 through HLA-B alleles, and 21/60 through HLA-C alleles. In total, 31/60 9mer peptides gave a binding score of ≥0.90; of these, 12/31 were predicted to be restricted through HLA-A alleles, 14/31 through HLA-B alleles, and 5/31 through HLA-C alleles. A region within the spike protein (VASQSIIAY, residues 686–694) had no homology with the other HCoVs but had a high binding score of >0.95 for HLA-B*3501, and two other 9mer peptides within the nucleocapsid (TPSGTWLTY, residues 325–333, and MEVTPSGTW, residues 322–330) had <22% homology and were predicted to be restricted through HLA-B*3501 and HLA-B*4403. Three other CD8+ T cell epitopes within NSP2, NSP3, and NSP6 that had high binding scores and 0% homology were also identified. We identified 49 10mer candidate CD8+ T cell epitopes that had ≤25% homology with two HCoVs (Table 5), of which 5/49 gave a score of ≥0.90. Of the 49, 22 were predicted to be restricted through HLA-A alleles, 22 through HLA-B alleles, and 5 through HLA-C alleles. Again, the peptides with the highest binding scores and the lowest percentage identity were predicted to be restricted through HLA-B*3501 and HLA-B*4403.
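The two criteria used for grouping peptides, ≥75% and ≤25% homology with the HCoVs, can be combined into one labeling rule. A minimal sketch, assuming percent identity has already been computed for each HCoV and that a peptide must meet a cut-off for at least `min_viruses` HCoVs (the default of two is an assumption); `classify` is a hypothetical helper and the identity values are invented.

```python
from typing import Dict


def classify(identities: Dict[str, float], min_viruses: int = 2) -> str:
    """Label a SARS-CoV2 peptide from its percent identity with each HCoV.

    identities: HCoV name -> percent identity of the aligned peptide
    min_viruses: how many HCoVs must meet a cut-off (an assumption here)
    """
    high = sum(1 for v in identities.values() if v >= 75.0)
    low = sum(1 for v in identities.values() if v <= 25.0)
    if high >= min_viruses:
        return "cross-reactive candidate"
    if low >= min_viruses:
        return "SARS-CoV2-specific candidate"
    return "intermediate"


# Hypothetical identity values for two peptides.
print(classify({"OC43": 100.0, "HKU1": 100.0, "NL63": 55.6}))  # cross-reactive candidate
print(classify({"OC43": 22.2, "HKU1": 11.1, "NL63": 0.0}))     # SARS-CoV2-specific candidate
```

Setting `min_viruses=3` would require a cut-off to hold for all three HCoVs.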
The highest binders specific to SARS-CoV2 were identified within the spike protein (TEKSNIIRGW, residues 95–104), NSP2 (KEIKESVQTF, residues 489–498), and NSP3 (EEEFEPSTQY, residues 120–129, and VPTDNYITTY, residues 502–511).

Conservation Analysis of the Candidate CD8+ T Cell Epitopes with Binding Scores of ≥0.90
We investigated whether the 18 candidate CD8+ T cell epitopes with >75% identity to other HCoVs and the 31 SARS-CoV2-specific candidates (<25% identity) were conserved within SARS-CoV2. These candidate epitopes were highly conserved (Tables S2 and S3), and these regions were also highly conserved within the new SARS-CoV2 variants (Figures S3 and S4).
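Conservation of a candidate epitope across SARS-CoV2 sequences can be summarized as the percentage of sequences carrying the exact peptide, plus the positions where any sequence differs. The sketch below is illustrative only: the helper names, isolate names, and sequence fragments are hypothetical, and only the peptide VASQSIIAY is taken from this article.

```python
from typing import Dict, List


def peptide_conservation(peptide: str, sequences: Dict[str, str]) -> float:
    """Percentage of sequences that contain the peptide unchanged."""
    if not sequences:
        raise ValueError("no sequences supplied")
    present = sum(1 for seq in sequences.values() if peptide in seq)
    return 100.0 * present / len(sequences)


def variable_positions(peptide: str, aligned_instances: List[str]) -> List[int]:
    """1-based positions where any aligned instance differs from the peptide.

    Each instance is assumed to be the same-length stretch from one sequence.
    """
    return sorted({
        i + 1
        for inst in aligned_instances
        for i, (a, b) in enumerate(zip(peptide, inst))
        if a != b
    })


# Hypothetical sequence fragments; the third carries one substitution (I -> V).
isolates = {
    "isolate_1": "GAVASQSIIAYTM",
    "isolate_2": "GAVASQSIIAYTM",
    "isolate_3": "GAVASQSIVAYTM",
}
print(round(peptide_conservation("VASQSIIAY", isolates), 1))  # 66.7
print(variable_positions("VASQSIIAY", [s[2:11] for s in isolates.values()]))  # [7]
```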

Similarity of Candidate Peptides to Published SARS-CoV2 CD8+ T Cell Epitopes
Several CD8+ T cell epitopes restricted through different HLA-A and B alleles have been published [11,18–21]. We next examined whether any of the candidate CD8+ T cell epitopes had already been identified in patients naturally infected with the SARS-CoV2 virus. Of the 31 highly conserved candidate T cell epitopes specific to SARS-CoV2 (<25% homology with other HCoVs), 20 had been identified in infected individuals (Table S4). Although our prediction using the dominant HLA alleles in Sri Lanka assigned some of these epitopes to HLA-B*3501 and HLA-B*4403, some were found to be restricted through HLA-A*0201, A*1101, and A*0301 in infected individuals. Of the 18 candidate T cell epitopes predicted to be cross-reactive (>75% homology with the other HCoVs), 7 had also been identified in naturally infected individuals. For these 7 cross-reactive candidates, the HLA allele we predicted and the HLA restriction identified following natural infection were similar for 4 (Table S5).
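The comparison with published epitopes reduces to a lookup of each candidate peptide, followed by a check of whether the predicted allele is among those reported. A minimal sketch, assuming exact sequence matches only (overlapping or nested peptides would need a looser rule); `compare_with_published` is a hypothetical helper, and the peptide names and allele assignments are placeholders, not the contents of Tables S4 and S5.

```python
from typing import Dict, Set


def compare_with_published(
    predicted: Dict[str, str], published: Dict[str, Set[str]]
) -> Dict[str, int]:
    """Summarise overlap between predicted candidates and published epitopes.

    predicted: peptide -> HLA allele predicted in silico
    published: peptide -> HLA alleles reported for that peptide in the literature
    Only exact sequence matches count; nested or shifted peptides are ignored.
    """
    reported = [pep for pep in predicted if pep in published]
    same_allele = [pep for pep in reported if predicted[pep] in published[pep]]
    return {
        "candidates": len(predicted),
        "reported": len(reported),
        "same_allele": len(same_allele),
    }


# Hypothetical placeholders, not real peptides or published restrictions.
predicted = {"pep_1": "HLA-B*3501", "pep_2": "HLA-B*4403", "pep_3": "HLA-A*2402"}
published = {"pep_1": {"HLA-B*3501"}, "pep_2": {"HLA-A*0201", "HLA-A*1101"}}
print(compare_with_published(predicted, published))
# {'candidates': 3, 'reported': 2, 'same_allele': 1}
```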

Discussion
In this study, we identified candidate CD8+ T cell epitopes that were highly conserved within SARS-CoV2, some of which show >75% homology with the HCoVs OC43, HKU1, and NL63 and are therefore candidates to give rise to cross-reactive T cell responses. The majority of the predicted CD8+ T cell epitopes (binding score of ≥0.90) with a high degree of homology to the other three HCoVs were within the NSP12 and NSP13 proteins. They were predicted to be restricted through HLA-A*2402, HLA-A*0201, HLA-A*0206, and the HLA-B allele B*3501. Therefore, the presence of SARS-CoV2 cross-reactive CD8+ T cell responses could depend on the frequency of these HLA alleles in a population, as the most cross-reactive candidate CD8+ T cell epitopes are predicted to be restricted through them.
The frequencies of HLA-A*0201 and HLA-A*0206 in the Sri Lankan population are 4.9% to 6.6% and 2.1% to 2.4%, respectively, while the frequency of HLA-A*2402 is 20.8% to 30.3% [17,22]. The HLA-B*35 frequency in the Sri Lankan population is 21% to 23% [17,22]. In contrast, the most frequent HLA-A alleles in the European, US, and Brazilian populations are HLA-A*0201 and A*0206 (24.5% to 27.5%), several-fold higher than in the Sri Lankan population, while HLA-A*2402 and B*3501 are less frequent (7.9% to 9.5%) [23–25]. A recent in silico analysis showed that HLA-A*0201 was associated with a higher risk of COVID-19, whereas HLA-A*2402 was associated with a higher capacity to present SARS-CoV2 antigens [26]. HLA-A*0201-restricted SARS-CoV2-specific CD8+ T cell responses were shown to have suboptimal antiviral activity and a reduced frequency compared with those in other viral infections such as influenza and Epstein-Barr virus infection [27]. The in silico analysis also showed that countries in which the most frequent HLA-A allele was A*0201 (Italy, France, Germany, Brazil) had the highest COVID-19 case fatality rates (CFRs), whereas those in which HLA-A*2402 was the most frequent (India, Iran) had lower CFRs [26]. Therefore, it will be important to investigate whether certain HLA alleles present CD8+ T cell epitopes associated with protection, whereas others present epitopes associated with immunopathology and poor antiviral capacity.
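Whether the reported percentages are allele frequencies or the fraction of people carrying an allele matters when comparing populations. If they are allele frequencies p, then under Hardy-Weinberg equilibrium the expected fraction of carriers is 1 - (1 - p)^2. A minimal sketch of that conversion, applied to the Sri Lankan ranges quoted above; the helper name and the interpretation of the figures as allele frequencies are assumptions.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Expected fraction of individuals with at least one copy of an allele.

    Assumes Hardy-Weinberg equilibrium at a single diploid HLA locus:
    P(no copy) = (1 - p)^2, so P(at least one copy) = 1 - (1 - p)^2.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 1.0 - (1.0 - allele_freq) ** 2


# Sri Lankan ranges quoted above, treated here as allele frequencies (an assumption).
ranges = {
    "HLA-A*0201": (0.049, 0.066),
    "HLA-A*0206": (0.021, 0.024),
    "HLA-A*2402": (0.208, 0.303),
}
for allele, (low, high) in ranges.items():
    print(f"{allele}: carriers {carrier_frequency(low):.1%} to {carrier_frequency(high):.1%}")
```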
In contrast, the highly conserved candidate CD8+ T cell epitopes within SARS-CoV2 that are likely to be virus-specific were predominantly identified within the structural proteins (spike, envelope, membrane, and nucleocapsid) and NSP1, NSP2, and NSP3. Of these candidate CD8+ T cell epitopes (binding score of ≥0.90) specific to SARS-CoV2 (<25% homology with the HCoVs), 6/31 were predicted to be restricted through HLA-B*3501, 6/31 through HLA-B*4001, 7/31 through HLA-B*4403, and 8/31 through HLA-A*2402. Only 3/31 were predicted to be restricted through the HLA-A*0201 or A*0206 alleles common in Europe, the USA, and Brazil. Therefore, HLA-A*2402, HLA-B*3501, HLA-B*4001, and HLA-B*4403, which are predominant HLA alleles in Sri Lanka and India, may present both highly cross-reactive and SARS-CoV2-specific CD8+ T cell epitopes. Indeed, our analysis showed that 20/31 highly conserved, SARS-CoV2-specific candidate CD8+ T cell epitopes had already been identified in individuals with natural infection. Although our prediction using the dominant HLA alleles in Sri Lanka assigned these epitopes to HLA-B*3501 and HLA-B*4403, some were found to be presented through different HLA alleles in naturally infected individuals in Europe and the USA [11,18,21]. However, given that these epitopes are highly conserved within the virus, they could be presented by multiple HLA alleles, which should be further investigated.
T cell responses of higher magnitude and breadth have been observed in patients with more severe COVID-19 [12,28]. However, it is not yet known whether a higher magnitude and breadth of T cell responses in COVID-19 is associated with protection or with immunopathology. It is debated whether cross-reactive T cells cause immunopathology in certain viral infections such as dengue [29,30], and whether such cross-reactive T cells are protective in SARS-CoV2 infection should be investigated. We hope that the candidate epitopes presented here will help subsequent functional T cell analyses, particularly as viral variants emerge, since they were found to be highly conserved within the new UK SARS-CoV2 variant B.1.1.7 and the new South African variant B.1.351. By focusing on highly conserved regions within and between coronaviruses, the candidate epitopes may be of value in understanding immune responses across populations and for future vaccine design. In addition, as many COVID-19 vaccines are currently being rolled out, it will be crucial to understand which T cell responses are associated with protection and how the functionality of epitope-specific T cell responses differs between HLA alleles.

Conclusions
In summary, we have identified candidate SARS-CoV2 CD8+ T cell epitopes that are likely to be highly cross-reactive with other HCoVs, as well as epitopes that are specific to SARS-CoV2. Cross-reactive epitopes were predominantly identified within NSP12 and NSP13, while specific epitopes were identified within the structural proteins and NSP1 to NSP3. It will be crucial to understand the CD8+ T cell epitopes presented through the most frequent HLA alleles, together with their phenotype and functionality, to better understand immune responses to SARS-CoV2 and possible implications for vaccines.
Supplementary Materials: https://www.mdpi.com/article/10.3390/v13060972/s1.
Figure S1: Multiple alignment of 9mer and 10mer peptide sequences identified from SARS-CoV2 NSP12 with OC43, HKU1, and NL63. All peptides shown have a predicted binding score of ≥0.90.
Figure S2: Multiple alignment of 9mer and 10mer peptide sequences identified from SARS-CoV2 NSP13 with OC43, HKU1, and NL63. All peptides shown have a predicted binding score of ≥0.90.
Figure S3: Conservation analysis of six candidate CD8+ T cell epitopes that had a high degree of homology (>75%) with HCoVs and a binding score of ≥0.90, across SARS-CoV2 variants in different countries, including the new UK variant (B.1.1.7) and the new South African variant (B.1.351).
Figure S4: Conservation analysis of six candidate CD8+ T cell epitopes that were specific to SARS-CoV2 (<25% homology with HCoVs) and had a binding score of ≥0.90, across SARS-CoV2 variants in different countries, including the new UK variant (B.1.1.7) and the new South African variant (B.1.351).
Table S1: Homology of different proteins of SARS-CoV2 with the OC43, HKU1, and NL63 coronaviruses.
Table S2: Highly conserved candidate CD8+ T cell epitopes that had a high degree of homology (>75%) with HCoVs and a binding score of ≥0.90.
Table S3: Highly conserved candidate CD8+ T cell epitopes that were specific to SARS-CoV2 (<25% homology with HCoVs) and had a binding score of ≥0.90, across SARS-CoV2 variants in different countries.
Table S4: The candidate CD8+ T cell epitopes (<25% homology with other HCoVs) and their predicted HLA alleles, alongside the published CD8+ T cell epitopes (marked in red) and their HLA restriction.
Table S5: The candidate CD8+ T cell epitopes (>75% homology with other HCoVs) and their predicted HLA alleles, alongside the published CD8+ T cell epitopes (marked in red) and their HLA restriction.