Selection for immune evasion in SARS-CoV-2 revealed by high-resolution epitope mapping and sequence analysis

Summary Here, we exploit a deep serological profiling strategy coupled with an integrated, computational framework for the analysis of SARS-CoV-2 humoral immune responses. Applying a high-density peptide array (HDPA) spanning the entire proteomes of SARS-CoV-2 and endemic human coronaviruses allowed identification of B cell epitopes and relate them to their evolutionary and structural properties. We identify hotspots of pre-existing immunity and identify cross-reactive epitopes that contribute to increasing the overall humoral immune response to SARS-CoV-2. Using a public dataset of over 38,000 viral genomes from the early phase of the pandemic, capturing both inter- and within-host genetic viral diversity, we determined the evolutionary profile of epitopes and the differences across proteins, waves, and SARS-CoV-2 variants. Lastly, we show that mutations in spike and nucleocapsid epitopes are under stronger selection between than within patients, suggesting that most of the selective pressure for immune evasion occurs upon transmission between hosts.


INTRODUCTION
infection-naive subjects have been described by several studies. 11 Notably, regions of high homology between SARS-CoV-2 and endemic hCoVs have been highlighted as a likely source of this cross-reactivity, although the role of such cross-reactive responses in the outcome of SARS-CoV-2 infection and vaccination is currently not fully understood (reviewed in 11 ).
Cross-reactive B cells generated to epitopes of low homology may interfere with responses to a secondary heterologous infection by skewing the response to antibodies with higher avidity for the initial infectious agent, reducing the generation of de novo antibody responses. Evidence for such cross-reactive mechanisms was reported in common cold coronaviruses as well as SARS-CoV-2. [12][13][14] Beneficial cross-reactive memory B cells and antibodies can also be generated to epitopes of high similarity between a primary and a heterologous secondary infection. Cross-reactive B cells specific to highly conserved epitopes might then go on to produce an enhanced memory-like response to heterologous infection, including the production of cross-neutralizing antibodies that can prevent viral entry to cells. Evidence for cross-reactive B cell responses was reported in coronavirus infections, including beta-coronaviruses and SARS-CoV-2. 15,16 Age and disease severity have been suggested to be a contributing factor for these observations. 11 However, none of these studies directly explored effects of pre-existing cross-reactive antibodies on COVID-19 severity. As such, the causal relationship between cross-reactivity of hCoVs and SARS-CoV-2 and fatal outcomes remains unclear.
Cross-reactivity can also impact the T cell compartment. Cross-reactive T cells generated to epitopes of relatively low homology may detrimentally impact a secondary infection with heterologous agents by dominating the response to the secondary infection, disrupting the development of high-avidity de novo T cell responses, thereby leading to immunopathology and reduced viral clearance. Evidence for these detrimental cross-reactive mechanisms has been reported in studies of flavivirus infection, 17,18 but to our knowledge not in coronavirus infections. Beneficial cross-reactive T cell responses can be generated by the stimulation of T cells through high-homology epitopes. These T cells then cross-react with high avidity during a heterologous secondary infection and might block pathogenicity by preventing invasive infection or expedite the rate of viral clearance by forming a 'secondary-like' memory immune response with an increased magnitude of B and T cell responses. Evidence for such cross-reactive mechanisms was reported in studies of coronavirus infections, including SARS-CoV-2. [19][20][21][22][23][24] Overall, there is no evidence from longitudinal cohort studies or population-level studies to suggest that pre-existing SARS-CoV-2 cross-reactive antibodies exacerbate COVID-19. 11,25 Rather, current evidence suggests that alongside the de novo immune response, cross-reactive T cells and antibodies form part of the protective immune response to SARS-CoV-2 infection. Considering the high propensity for SARS-CoV-2 to mutate viral proteins, notably in S protein, variants of concern (VOCs) and variants under investigation (VUIs) can acquire properties for increased transmissibility, disease severity, and/or immune evasion. 26 Thus, promoting this cross-reactive, pre-existing memory immune response to common hCoVs may be an effective strategy against SARS-CoV-2 and future VOCs. 27,28 To better understand the molecular determinants underlying protective immunity to pathogens, including viruses, one must define the epitopes in various viral proteins, the minimal unit of an antigen that can be recognized by T and B cells and can elicit potent cellular and humoral immune responses, respectively. A recent study used VirScan technology, a high-throughput, programmable phage-display immunoprecipitation and sequencing (PhIP-Seq) method, 29 to analyze epitopes of antiviral antibodies in sera of COVID-19 patients relative to pre-COVID-19 sera controls. 30 However, the nature and dynamics of the peptide pools of VirScan/PHIP-seq may limit the resolution, sensitivity, and breadth of specific epitope detection in infected individuals, in turn, providing a fragmented view of the complete footprint of epitope recognition by antibodies. 29,31 In the current study, we provide a comprehensive analysis of SARS-CoV-2 humoral immune responses in a dataset of symptomatic or recovered COVID-19-positive and COVID-19-negative patients. We exploited a high-density peptide array (HDPA) by spotting overlapping 15-mer peptides derived from the entire SARS-CoV-2 and hCoVs proteomes to rapidly identify B cell epitopes recognized by distinct antibody isotypes in patients' blood sera of individual patient groups. We then subjected our data to an integrated computational pipeline to evaluate the fine immunological properties of detected SARS-CoV-2 epitopes and relate them to their evolutionary and structural characteristics. We show that while some epitopes are common iScience Article (public epitopes) across all studied hCoVs (including SARS-CoV-2), others are unique (private epitopes) to a specific hCoV. Then, to highlight epitopes that have an important role for protecting against SARS-CoV-2 when an individual gets infected, we defined differential epitopes as epitope for which the response is at least two-times higher in COVID-19-positive than COVID-19-negative individuals. We also highlight hotspots of pre-existing immunity and a subset of cross-reactive epitopes that contributes to increasing the average humoral immune response to SARS-CoV-2. Finally, using a dataset of over 38,000 publicly available genome sequences, collected during the first two waves of the pandemic, we tracked single nucleotide variants (SNVs) within and between COVID-19 patients and found evidence for positive selection on nonsynonymous mutations in epitopes. Selection is stronger between than within patients, indicating that selection for immune evasion occurs mostly upon transmission between hosts. Overall, our results have implications for future genomic surveillance and vaccine design.

RESULTS
Antibody fingerprinting with high-density peptide arrays provides a high-resolution antibody epitope map across the SARS-CoV-2 proteome Most previously reported high-resolution SARS-CoV-2 B cell epitope mapping strategies relied on VirScan/ PHIP-seq methodology. 30,32 However, the nature and dynamics of the peptide pools of VirScan/PHIP-seq limit the resolution, sensitivity, and breadth of specific epitope detection in infected individuals, providing a fragmented view of the complete footprint of epitope recognition by antibodies. 29,31 To assess the humoral immune response against SARS-CoV-2 at the epitope level, we used a HDPA technology to define virus protein-specific B cell epitopes and potential antigenic hotspots for antibody reactivity. A high-resolution linear epitope map across the entire SARS-CoV-2 proteome was achieved using the PEPperCHIP SARS-CoV-2 proteome microarray technology ( Figure 1A). 33 We performed this assay on sera obtained from ten SARS-CoV-2-positive individuals (asymptomatic and recovered) and five SARS-CoV-2-negative, control subjects (SARS-CoV-2-negative) ( Table S1). The degree of immune reactivity to spike protein (S), envelope protein (E), membrane glycoprotein (M), nucleocapsid phosphoprotein (N) and ORF1AB was measured in relative fluorescence units (RFUs). Linear overlapping peptides of 15 amino acid length were used for each protein and a dual isotype analysis, determining IgG-and IgA-specific antibody responses, was performed ( Figure 1A). This was followed by a comprehensive analysis workflow to characterize the differential epitopes, their structural properties and utilize genome sequence analysis of arising SARS-CoV-2 variants to assess immune evasion potential.
Sera from SARS-CoV-2-positive individuals yielded strong immune reactivity (measured in RFU) in S and N proteins, as well as in select regions of ORF1AB ( Figure 1B). Antibody responses were also identified in samples from SARS-CoV-2-negative individuals ( Figure 1B), suggesting that HDPA technology is well suited to detect epitopes of pre-existing immune responses conferred through prior infections with hCoVs. Although sera from SARS-CoV-2-positive and -negative individuals had very similar epitope coverage per amino acid, some regions in the SARS-CoV-2 proteome were more immuno-dominant than others, exerting higher RFU values ( Figure 1B). Except for M protein, all proteins analyzed had stronger antibody responses to more unique peptides in the SARS-CoV-2-positive patient group ( Figure S1, Tables 1, and  S2). In addition, the mean RFU values of SARS-CoV-2-positive sera were higher toward most regions of the SARS-CoV-2 proteome than in the SARS-CoV-2-negative group, demonstrating the elicitation of robust antibody responses to immuno-dominant epitopes upon SARS-CoV-2 infection ( Figure 1C). For further analysis of epitopes with greatest immuno-dominance, only peptide RFU values greater than or equal to Figure 1. High-density peptide arrays (HDPA) provide high-resolution antibody epitope maps across the SARS-CoV-2 proteome (A) Overview of analytical pipeline. The proteome of SARS-CoV-2 was translated into 15-mer overlapping peptides with a peptide-to-peptide overlap of 13 amino acids. The resulting individual peptides were printed in duplicates on the microarray. Sera from confirmed SARS-CoV-2-positive and -negative individuals were incubated on PEPperCHIP the HDPA. Serum antibody binding was visualized using respective fluorescently labeled secondary antibodies (anti-human IgG and anti-human IgA). Image acquisition and data quantification resulted in epitope-specific antibody profiles for SARS-CoV-2.
(B) Average relative fluorescent units (RFU) profiles and peptide coverages are plotted across the SARS-CoV-2 proteome (ORF1A, ORF1B, Spike (S) protein, Envelope (E) protein, Membrane (M) glycoprotein, Nucleocapsid (N) phosphoprotein). Antibody responses to each linear 15-mer peptide were mapped across the SARS-CoV-2 proteome and average RFU calculated for each amino acid residue. The normalized positional 'epitope coverage' at each protein residue location is defined as the ratio of total peptides mapped to each position by the total expected peptides (see STAR methods section iScience Article 1000 were used in our further analysis. Taken together, our results demonstrate that the applied HDPA approach allows highly sensitive detection of a large pool of epitopes across the SARS-CoV-2 proteome.

Structural features of identified epitopes and comparison with computationally predicted epitopes
An epitope is the minimal unit of an antigen that can be recognized by T and B cells and can elicit potent cellular and humoral immune responses, respectively. B cell epitopes can be divided into two major categories, namely linear and conformational epitopes. In a linear epitope, a stretch of continuous amino acids forms the antibody binding site, while amino acid residues that are brought together by protein folding form conformational epitopes. In our current study, antibody responses are detected using linear peptide arrays, and thus these epitopes are primarily linear in nature, although some linear epitopes contributing to conformational components of a protein may also be detected. Though there is a significant interest for short linear epitopes in vaccine design, most of the current SARS-CoV-2 vaccine immunogens are structural S proteins. 34 Thus, we asked whether the short peptide-based approach in the applied HDPA approach can also reveal conformational epitope sites, as has been recently suggested. 30 Using 3D structures and biophysical properties of the SARS-CoV-2 proteome, we applied the DiscoTope algorithm 35 to computationally predict conformational B cell epitopes as well as the BepiPred algorithm 36 to obtain linear B cell epitopes. We then compared these to the epitope sites identified in our HDPA experiment. Apart from E and M proteins, we observed significant overlap of experimentally identified epitopes with predicted epitopes ( Figure 2A), with approximately 38% of the total proteome being part of the amino acid residues contributing to the B cell epitome of SARS-CoV-2 recognized in infected individuals. We observed overlap of mapped epitope sites with conformational epitopes predicted by DiscoTope, suggesting that the applied HDPA approach also identifies a considerable number of conformational epitopes.
It is important for epitopes to be solvent exposed to allow their amino acid side chains to interact with the antibody. In our study, there was no significant difference in the average normalized solvent accessibility (SASA) of epitopes compared to non-epitope regions (mean SASA = 5.3 Å 2 ). However, residues in highly conserved epitope sites have lower solvent accessibility ( Figures S2A-S2C), suggesting that many identified epitopes might only get exposed through proteolysis or conformational changes throughout the infectious cycle or stage of anti-viral immune response. This implies that our linear peptide-based HDPA approach can capture more epitopes compared to those that use full-length antigens or protein domains to study immune profiling of antibody responses. 37,38 Using structural models we mapped the epitopes of S protein of SARS-CoV-2 identified by the HDPA approach, which revealed that many of the epitope sites identified in the S protein are in the N terminal domain (NTD) and the receptor binding domain (RBD) ( Figures 2B and 2C). Interestingly, most epitope sites identified in the NTD and RBD have low conservation scores, while many other epitope sites identified in the S protein had high conservation scores ( Figures 3A  and 3B). In addition, HDPA analysis revealed strong antibody immunoreactivity in a few epitope sites of E ( Figure 3C), M ( Figure 3D), N proteins ( Figure 3E), as well as ORF1A ( Figure S3) and ORF1B ( Figure S4).
Cryo-EM studies have indicated that the SARS-CoV-2 spike protein is highly flexible and exhibits several prefusion conformations where three RBDs adopt distinct orientations: ''up'' (receptor-accessible state) and ''down'' (receptor-inaccessible). Protomers with the up-conformation can facilitate the binding between the spike protein and the angiotensin-converting enzyme 2 (ACE2) receptor, thereby allowing iScience Article and facilitating host cell infection. 39 Analysis of structures of distinct SARS-CoV-2 human neutralizing antibodies (NAbs) in complex with the SARS-CoV-2 spike trimer or RBD revealed structural correlates of SARS-CoV-2 neutralization and allowed classification of antibodies into categories. 40 Focusing on RBD-binding a distinction into four distinct classes has been proposed: Class 1 -NAbs that block ACE2 and bind only to ''up'' RBDs; Class 2 -ACE2-blocking NAbs that bind both ''up'' and ''down'' RBDs; Class 3 -NAbs that bind outside the ACE2 site and recognize both ''up'' and ''down'' RBDs; and Class 4 -NAbs that do not block ACE2 and bind only to ''up'' RBDs. 40 To analyze the extent with which HDPA epitope sites overlap with known NAb-binding residues in SARS-CoV-2 RBD we aligned the epitopes sites identified by HDPA with NAb-binding residue sites mapped by Cryo-EM studies 40. Epitopes found by HDPA generally tend to agree with cryo-EM, particularly in more conserved regions of the protein sequence ( Figure 4A). More than 60% of RBD residues in NAb-binding sites are identified by HDPA analysis, mapping epitope sites in all four distinct structural correlate classes of SARS-CoV-2 RBD-binding NAbs identified by cryo-EM (Figure 4B). Taken together, our results demonstrate that the applied HDPA profiling strategy can identify a set of linear and conformational B cell epitopes unique in sera of SARS-CoV-2-infected individuals. In addition, a large portion of the identified epitope sites are residues in previously reported NAb-binding sites, demonstrating that epitope sites mapped by HDPA analysis are of functional relevance in antibody-mediated immunity to SARS-CoV-2.

Cross-reactivity to endemic seasonal human coronaviruses is a significant driver of antibody responses to SARS-CoV-2 epitopes
In addition to the zoonotic pathogens SARS-CoV, MERS-CoV, and SARS-CoV-2, four other low-pathogenicity hCoVs are endemic and co-circulating in the human population 41 : strains OC43 and HKU1 (beta-CoVs like SARS-CoV-2), and NL63 & 229E (alpha-CoVs), of which OC43 and 229E are the most common, accounting for 5-30% of common colds. 42 Notably, structural proteins of SARS-CoV-2 show some degree of amino acid sequence identity with hCoVs. 43,44 One prevailing view in our understanding of COVID-19 immunopathogenesis is that an underlying immune response toward endemic hCoVs is a hallmark feature of SARS-CoV-2-infected asymptomatic individuals. 11 This pre-existing immunity is hypothesized to partially control viral replication and eliminate infected cells resulting in less severe pathology and inflammation. 11,12,[45][46][47][48] Having established the link between structure accessibility and protein conservation ( Figures 3B and S2), we next asked how conservation is related to the humoral immune response. One way protein conservation could affect adaptive immunity is through cross-reactivity to related viruses. To this end, we first analyzed the humoral immune response against the four hCoVs OC43, HKU1, NL63 and 229E at the epitope level using HDPA on sera from the same ten SARS-CoV-2-positive (asymptomatic or recovered) patients and five control subjects (SARS-CoV-2-negative; Table S1). HDPA yielded strong antibody reactivities to many distinct sites across the proteomes of all hCoVs (Tables 2, S3, S4, S5, and S6). We then defined cross-reactive epitopes based on the conservation of peptide residues across hCoVs and the presence of an immune response to epitopes in SARS-CoV-2 and at least of one of the endemic hCoVs (HKU1, NL63, OC43 or 229E; Figure 5A). To evaluate the conservation of peptide sequences across hCoVs, we aligned the protein sequences of these viruses and calculated a conservation score, reflecting the conservation of physicochemical properties in the alignment where identical residues score the highest. 49 We also defined cross-reactivity at the level of epitope sites (single amino acids) to account for the possibility that a particular amino acid site within a 15-mers epitopes is associated with cross-reactive immunity. Epitope sites with a conservation score R 6 and for which we obtained antibody reactivity for both SARS-CoV-2 and at least one of the hCoVs were considered as cross-reactive epitope sites. These cross-reactive epitope sites represent 27.2% of the pool of detected epitope sites by the applied HDPA assay (Table S7). We also carried out local alignment of the peptides from the HDPA (with RFU R 1000) of all five viral strains to the SARS-CoV-2 proteome to evaluate the cross-reactivity profile of SARS-CoV-2 epitopes and identified hotspots of conserved epitopes (example for S protein in Figure 5B). iScience Article Next, to highlight cross-reactive epitope sites that are particularly important for the humoral immune response after exposure to SARS-CoV-2, we focused on differential cross-reactive epitope sites that give an antibody response signal in sera of SARS-CoV-2-positive over SARS-CoV-2-negative individuals. Although we analyzed sera from a smaller sized cohort compared to a recent study using PhIP-Seq, 30 we were nonetheless able to sensitively detect more differential cross-reactive epitope sites discriminating virus exposed from non-exposed individuals ( Figure 5C). The increased sensitivity of HDPA over PhIP-Seq analysis for identifying cross-reactive epitope sites was further highlighted when performing a sensitivity analysis on the number of cross-reactive epitope sites that define a cross-reactive epitope ( Figure S5A).
We next analyzed if the humoral immune response to SARS-CoV-2 epitopes correlated with the number of cross-reactive epitopes identified. In other words, to what extent is the response to SARS-CoV-2 predictable based on cross-reactivity to other endemic hCoVs? We defined cross-reactive epitopes as peptide sequences with at least five cross-reactive epitope sites. We found a positive correlation between the average humoral immune response to SARS-CoV-2 epitopes and the number of cross-reactive epitopes per patient in a recently published PhIP-Seq dataset 30 ( Figure 5D) and in our HDPA dataset ( Figure S5). We detected a stronger positive correlation between the average antibody response to SARS-CoV-2 epitopes and the number of cross-reactive epitopes in SARS-CoV-2-positive compared to negative patients ( Figure 5D; correlation coefficients: 4.42e-3 vs. 1.54e-3; ANOVA p for Covid19 status = 7.43e-10). This positive correlation is robust to the threshold number of cross-reactive epitope sites defining a cross-reactive epitope and is also replicated in our HDPA dataset ( Figure S5).
Leveraging the sample size in this same PhIP-Seq dataset, we aimed to identify the subset of cross-reactive epitopes with the greatest contribution in humoral immunity in SARS-CoV-2-positive patients using the IndVal test, which is a non-parametric test to identify significant associations from presence-absence data. 50 This test allowed us to identify 75 epitopes that are significantly associated with SARS-CoV-2-positive over SARS-CoV-2-negative samples, with 16 out of these 75 epitopes (21.3%) being cross-reactive (Table S8.). These results again highlight the contribution of a subset of hCoV-cross-reactive epitopes for the humoral immune response to SARS-CoV-2.

Point mutations and natural selection in epitopes occur at higher rates upon transmission than within patients
There is mounting evidence that mutations in SARS-CoV-2 enhance viral fitness, replication rate and transmissibility, and/or partially evade adaptive immunity that has been induced by prior infection or vaccination. 26 Thus, it is essential to shed light on the interplay between SARS-CoV-2 mutations and the acquired immune response in infection. To this end, we tracked the evolution of SARS-CoV-2 B cell epitopes using SNVs identified in 38,685 SARS-CoV-2 genome sequences from the NCBI sequence read archive (Table S9). We selected SARS-CoV-2 samples from the first pandemic wave (defined as January 1 to July 31, 2020) and the second wave (defined as August 1 to December 31, 2020) sequenced using Illumina paired-end amplicons with a minimum average depth of coverage of 2003 and fewer than 10,000 sites with a depth of coverage lower than 1003. Combined with additional filters to remove sequencing errors (see STAR methods for details), such deep coverage allowed us to identify SNVs that are polymorphic within patients, reflecting within-patient evolution, 51,52 as well as those that are shared between the consensus sequences of different patients. We refer to these within-patient SNVs as 'mutations' and to between-patient SNVs (those present at >75% frequency within a sample and observed in three or more samples) as 'substitutions' that have likely been transmitted across multiple patients. Our definitions of mutations and substitutions are not mutually exclusive: an SNV can be a mutation in one sample and a substitution in another. We counted the absolute number of substitutions relative to the Wuhan-1 reference genome, so the count does not reflect the unique number of substitution events along a phylogeny, but rather the prevalence iScience Article of the substitutions in the database. As such, substitution counts are weighted to reflect their 'success' in transmitting widely.
Using this dataset of mutations and substitutions, we first asked whether cross-reactive (public) epitopes evolved differently than epitopes private to SARS-CoV-2. We found that cross-reactive epitopes tend to evolve more slowly than SARS-CoV-2 private epitopes, accumulating fewer substitutions and having lower ratios of nonsynonymous to synonymous substitutions ( Figures S6A-S6D). The same trend of slower evolution in cross-reactive epitopes is also observed at the level of within-patient mutations, but the effect is much stronger at the level of substitutions between patients ( Figures S6A-S6D). This is consistent with the fact that these epitopes are conserved across multiple distinct hCoV strains and could be evolving under strong and long-standing functional constraints. However, purifying selection has less time to purge deleterious mutations within hosts, and is therefore more detectable over longer time scales spanning multiple transmission events.
Mutations in epitopes have the potential to evade or lessen the effectiveness of adaptive immunity conferred by infection or vaccination. A recent topic of debate has been the extent to which natural selection for immune evasion acts on SARS-CoV-2 during infection, or upon transmission. 26 During influenza virus infection, most of the selective pressure for immune evasion occurs upon transmission, not within a patient. 53 This is because viral loads often peak before the priming of adaptive immune responses. As such, peak viral transmission occurs before there is time for selection to act within a patient, and for immune evasion to occur. A similar 'asynchrony' transmission model has been proposed for SARS-CoV-2, 54 although data supporting such model has been lacking. To test the asynchrony model in SARS-CoV-2 we tracked SNVs within as well as between patients, within and outside epitope sites, and across the first two pandemic waves (Figures 6 and S7). Throughout both waves, we found  iScience Article consistently lower within-host mutation rates in epitopes sites when compared to non-epitope sites across most SARS-CoV-2 proteins (Figures 6A and S7A). In contrast, the structural proteins S, M, and N had significantly higher rates of between-host substitution in epitope sites compared to non-epitope sites ( Figures 6B and S7B), suggesting stronger positive selection for epitope changes upon transmission than within hosts. To further assess the evidence for selection on epitopes, we used the ratio of nonsynonymous to synonymous SNVs both between patients (dN/dS) and within patients (pN/pS) calculated separately within and outside epitopes. Higher ratios indicate positive or relaxed purifying selection, whereas lower ratios indicate stronger purifying selection. We found that the structural proteins S and N have consistently higher nonsynonymous SNV rates in epitope sites, both within and between patients, and across both pandemic waves ( Figures 6C, 6D, S7C, and S7D). While this result is consistent with positive selection of altered epitopes (immune evasion) occurring both within and between patients, dN/dS ratios (between patients) are consistently higher than pN/pS ratios (within patients). These observations indicate that nonsynonymous substitutions in S and N epitope sites accumulate most rapidly upon transmission, rather than within patients. Taken together these results support the notion that most of the selective pressure for immune evasion of SARS-CoV-2 occurs upon transmission between hosts, consistent with the asynchrony model. 53

Assessing the immune evasion potential of SARS-CoV-2 variants
The observation of similar pattern of mutations and selective pressures in epitopes across pandemic waves 1 and 2 ( Figures 6 and S7) was surprising, given the expectation that increasing levels of immunity in the population would lead to increased selection for immune evasion over time. The second wave is characterized by the rise of VOCs and VUIs with higher transmissibility and, in some cases, increased disease severity and acquired immune evasion phenotypes. 26 The rise of VOCs has been suggested to be due to a shift in the SARS-CoV-2 fitness landscape. 55 If part of this shift were due to rising population immunity from the first to second wave, one would expect increasing selection for immune evasion variants, resulting in higher frequencies of SNVs in epitopes in wave 2. Although we found a higher total number of nonsynonymous SNVs (including both mutations and substitutions) in epitope sites unique to wave 1 than unique to wave 2 ( Figure 7A), wave 2-specific SNVs reached higher frequencies across samples compared to wave 1-specific SNVs, consistent with increased selection for immune evasion over time ( Figure 7B). However, mutations common to both waves achieved the highest frequencies, indicating their early appearance and persistence over time ( Figure 7B).
A likely driver of VOC evolution is selection for increased transmissibility. For example, the Delta VOC is estimated to be 76-117% more transmissible than non-VOCs and non-VUIs, while Gamma is 29-48% more transmissible and Alpha is 24-33% more transmissible than the original Wuhan SARS-CoV-2 strain. 56 However, selection for immune evasion could also play a significant role for increased spread of VOCs and VUIs. To test this hypothesis, we defined signature mutations of each variant (see STAR methods; Table S10) as substitutions that are present in R90% of sequences assigned to that lineage. We calculated the prevalence of substitutions in thousands of publicly available consensus sequences collected from NCBI during 2020 and added data from CoV-Spectrum about under-represented lineage in the database or lineages that emerged during 2021. 57 We focused on nonsynonymous signature mutations located in epitope sites and found that VOCs and VUIs contain significantly more signature mutations in epitopes compared to non-VOCs and non-VUIs ( Figure 7C) suggesting that evasion of the humoral immune response could be a significant driver of VOC/VUI evolution. We then ranked VOCs and VUIs based on their number of signature mutations in epitopes ( Figure 7D). We observed that Delta has an intermediate number of mutations in mapped epitopes ( Figure 7D). However, the most nonsynonymous epitope mutations were observed in Omicronand its sublineages and recombiants such as XBB ( Figure 7D), which other studies have shown to be highly immune evasive. [58][59][60] Most epitope mutations in VOC/VUI occur in the S protein ( Figure 7D). Normalizing by gene length revealed a relatively high density of epitope mutations in the N protein, especially at sites of the N protein that overlap with ORF9c ( Figure S8), a iScience Article membrane-anchored protein of SARS-CoV-2 that can hinder interferon signaling, viral protein degradation and other stress response pathways when expressed in human lung epithelial cell lines. 61 Finally, having established these general evolutionary patterns of mutation and selection on epitopes, we attempted to pinpoint specific epitope mutations that could hinder the immune response. For each epitope site, we extracted both the measured patient immune responses and the prevalence of nonsynonymous (missense) mutations from the NCBI dataset (Table S11). Among the most prevalent mutations identified are two mutations occurring at consecutive sites in the N protein (N:R203K and N:G204R) that overlap with ORF9c (encoded within the N gene). Taken together our observations show that high resolution epitope mapping combined with genome sequence analysis provides a powerful strategy to rapidly assess the immune evasion potential of emerging SARS-CoV-2 variants. Our results demonstrate that the HDPA approach provides a sensitive, high-throughput antibody profiling strategy to identify linear and conformational B cell epitopes. Using structural models, we found that many of the epitope sites identified in the S protein are in the NTD and RBD region of the S protein. Interestingly, most epitope sites identified in the NTD and RBD are poorly conserved across coronaviruses, while epitope elsewhere in the S protein were more highly conserved. In addition, HDPA analysis revealed strong and specific antibody immunoreactivity in select epitope sites of structural SARS-CoV-2 proteins (E, M, N proteins), as well as ORF1AB.
Antibody cross-reactivity with similar viral antigens affects the accuracy of serological tests, but also has the potential to elicit beneficial immunological memory responses that could affect the course of SARS-CoV-2 infections. Our results highlight a significant cross-reactivity between SARS-CoV-2 and hCoV B cell epitope sites in many viral proteins, demonstrating that HDPA allows to uncover another dimension of cross-reactive immunity relative to PhIP-Seq. The fact that more differential epitope sites in the S and N proteins were detected by a recent study using PhIP-Seq 30 probably reflects the lower sample size of our dataset. However, HDPA detected more cross-reactive epitope sites than PhIP-Seq with fewer patient samples analyzed, reflecting one of the benefits of our applied methodology. Sensitivity limitations of PhIP-seq to broadly detected polio epitopes have been previously reported 29,31 and might contribute to the observed differences, similarly affecting detectability of CoV antigens. Such limitations are not observed in our HDPA approach 62 which typically yielded strong polio responsiveness in over 90% of sampled individuals. 63 While HDPA assays show significant benefits over PhIP-seq analyses, a few limitations still apply in our study. It needs to be taken into consideration that our analysis only targeted ten patient and five control sera, and the antibody-binding epitope profile reflects the breadth of anti-viral immunity in these specific patients. Hence, the non-epitope sites may not necessarily be sites that cannot be recognized by antibodies and increasing the number of analyzed sera could also potentially increase the breadth of the antibody response detected in our assay. Moreover, the signal readout of the HDPA microarray combines several parameters which could potentially affect epitope availability including antibody titer, antibody affinity and antibody off-rates since assays are stopped at a specific timepoint followed by washing and addition of a secondary antibody. Folding effects of the immobilized peptides on the chip can also play a role in masking available epitopes, as peptides tend to form structures (e.g., helical in nature) with increasing distance from the glass slide coating. Such folding may be beneficial or counterproductive for antibody binding. Based on our experience running HDPA, in general, RFU of equal to or higher than 100 reveal a positive antibody response. However, to focus on the most prominent antibody responses, our analysis was performed with an RFU cut-off of 1000 or higher, representing the most significant epitope sites of the cohort analyzed.
Although HDPA is primarily considered to provide linear epitope information, folding effects of the immobilized peptides may provide a large structural interface diversity for conformational antibody-binding sites. In addition, studies have indicated that the SARS-CoV-2 spike protein is very flexible and exhibits iScience Article several prefusion conformations. 39 The HDPA approach might therefore provide a larger and more diverse structural interface for antibody-binding than full-length antigens, in turn, reflecting enhanced diversities of structural interfaces to viral antigen sites during viral infection. Using RFU values of HDPA analysis to calculate ratio values of the analyzed sera (uninfected vs. infected) allowed for the definition of differential epitope sites per amino acid residue within the distinct SARS-CoV-2 proteins. Using structural modeling, we also mapped the epitope sites identified by HDPA within the S protein of SARS-CoV-2 and showed that many of the epitope sites are in the NTD and the receptor binding domain (RBD) (Figure 3).
To analyze the extent to which HDPA epitope sites overlap with known neutralizing antibody (NAb)-binding residues in SARS-CoV-2 RBD, we aligned the epitopes sites identified by HDPA with NAb-binding residue sites mapped by cryo-EM studies 39,40. We demonstrate that more than 60% of RBD residues in NAb-binding sites are identified by HDPA analysis, mapping epitope sites in all four distinct structural correlate classes of SARS-CoV-2 RBD-binding NAbs identified by cryo-EM 39,40. Hence, our data clearly shows that our HDPA profiling strategy can identify sets of linear and conformational B cell epitopes unique in sera of SARS-CoV-2-infected individuals. Importantly, a large portion of the identified epitope sites are residues located in previously reported NAb-binding sites, demonstrating that epitope sites mapped by HDPA analysis are functionally relevant in Ab-mediated immunity to SARS-CoV-2.
Importantly, the cross-reactivity in identified B cell epitope sites positively relates to previous infections with seasonal common cold hCoVs. This suggests that immune memory conferred by previous seasonal hCoV infections positively influences SARS-CoV-2-specific antibody responses and may explain the large portion of SARS-CoV-2-infected individuals with mild and asymptomatic disease symptoms. 4 Notably, there was little to no correlation between cross-reactivity and immune response in COVID-19 negative patients, suggesting that resistance to infection is not easily explained by cross-reactivity. However, this molecular cross-reactivity can pose important complications in serological tests, particularly when studying asymptomatic patients. Cross-reactivity in immunodominant epitopes can be molecular determinants of strong immunity in individuals and therefore may serve as the basis for future pan-coronavirus vaccine design strategies. In turn, mutations in these cross-reactive epitopes can potentially breach pre-existing immune protection conferred by previous viral exposures, contributing to viral evolution, immune selection, and immune evasion.
By combining our epitope dataset with publicly available SARS-CoV-2 genome sequences, we were able to study mutations that occur in epitopes and compare their rates of evolution and selective pressures to nonepitope sites. Ideally, we would have matched epitopes and viral mutations arising from the same patients to infer selection more directly for immune evasion. Although such matched data is currently rare, we were still able to make inferences about evolution within epitopes on a population-wide scale. First, we found that mutations in SARS-CoV-2 epitopes are under evolutionary constraints. SARS-CoV-2-specific epitopes that are cross-reactive with other endemic seasonal hCoVs tend to accumulate fewer substitutions and are under purifying selection against nonsynonymous changes. Second, epitopes in structural proteins S, M, and N accumulate more substitutions and are under stronger positive selection for nonsynonymous changes than non-epitopes. Natural selection favoring changes in epitope sites was therefore detectable during the first two pandemic waves. As population immunity accumulates over time, we would expect increasing selection for immune evasion. Consistent with this expectation, we observed that mutations in epitopes increased in frequency from the first to the second pandemic wave, and we expect this trend to continue.
Notably, we found much slower rates of evolution and weaker evidence for positive selection on epitopes within patients, indicating that most selection for immune evasion occurs upon transmission rather than within patients. This is consistent with asynchrony between peak viral loads (when selection is most efficient) and the adaptive immune response, as is the case for influenza. 53,64 Notable exceptions are chronic infections, in which significant adaptive evolution occurs within patients, likely including antibody evasion. 65,66 However, such infections likely represent a small minority of the sequences included in our dataset. While they may be important -particularly if a chronic infection is transmitted -they do not represent most COVID-19 cases. Another non-exclusive explanation for the higher between-host mutation rate in epitope sites is the small transmission bottleneck. 67,68 Consistent with a general trend of immune evasion, we observed that VOCs and VUIs contain significantly more signature mutations in epitopes than non-VOCs and non-VUIs demonstrating that evasion of the iScience Article humoral immune response is a significant driver of VOC/VUI evolution. The most mutations in epitopes were found in the VOCs Delta, C.36.3, and especially Omicron (B.1.1.529) and its sublineages. Most of these epitope mutations in VOCs are localized to the S protein, highlighting that polymorphism in the S protein critically impacts antigenicity in highly transmissible variants.
Much research rightly focuses on the S protein, but we also find mutation in the N protein epitopes that could be selected for immune evasion. The N protein had the highest dN/dS values during both pandemic waves analyzed, suggesting the presence of a subset of epitope substitutions under positive selection. After normalizing by gene length, we found the highest density of epitope mutations in the N protein, especially in regions of overlap with ORF9c. Orf9c is one of the four conserved overlapping genes (OLGs) of SARS-CoV-2, 69 wherein a single stretch of nucleotides encodes two distinct proteins in different reading frames. OLGs are 'genes within genes' that compress genomic information, thereby allowing genetic innovation via overprinting. 70,71 However, a single mutation in an OLG may alter two proteins at the same time, constraining evolution of the pre-existing open reading frame (ORF). Although, OLGs are known entities that contribute to the emergence and pathogenicity of new viruses, 72 unfortunately, genome annotation methods typically miss OLGs, instead favoring one ORF per genomic region. 72 Similarly, they remain inconsistently reported in viruses of the SARS coronavirus species. 73 Importantly, annotations of ORF9b and ORF9c are conflicting or absent in the SARS-CoV-2 reference genome Wuhan-Hu-1 (NCBI: NC_ 045512.2) and genomic studies. 74,75 In addition, OLGs are often not displayed in genome browsers 76 and therefore such inconsistencies complicate research to decipher their role in infection and immunity.
The small protein encoded by the ORF9c OLG has recently been shown to constitute a membrane-associated protein to suppresses antiviral interferon and antigen-presentation responses and modify innate immune responses. 61,77,78 Here, we found that N protein epitopes in the region overlap with ORF9c constitute an antigenic target of the humoral immune response and accumulate a high density of mutations in VOCs. It remains to be investigated if and to what extent ORF9c-specific immune responses contribute to host protection and if mutations could also affect these responses. Other OLG-derived proteins, including Orf3d, ORF8 and Orf9b, have been shown to elicit strong antibody responses in sera from COVID-19 patients, [79][80][81] although their contribution to host protection remains unknown. Concerns have arisen that S-specific vaccine immunity conferred solely to S protein may fail to neutralize emerging variants of SARS-CoV-2 and contribute to selection of immune escape variants. [82][83][84] Vaccination studies in rodent models using N protein as antigenic target have recently shown the establishment of protective immunity. 85 Hence, expansion of viral antigenic targets in SARS-CoV-2 vaccines, including OLG proteins, to broaden epitope coverage and immune effector mechanisms should be a goal in the development of new COVID-19 vaccines.

Limitation of the study
While HDPA assays show significant benefits over other epitope mapping strategies, a few limitations still apply in our study. It needs to be taken into consideration that we were only able to analyze 10 patients and 5 control sera. Hence, the non-epitope sites may not necessarily be sites that cannot be recognized by antibodies, and increasing the number of analyzed sera could potentially increase the breadth of the antibody response detected by the HDPA assay. Moreover, the signal readout of the HDPA microarray comprises several parameters which could potentially affect epitope availability including antibody titer, antibody affinity and antibody off-rates since assays are stopped at a specific timepoint followed by washing and addition of a secondary antibody. Folding effects of the immobilized peptides on the chip can also play a role in masking available epitopes, as peptides tend to form structures (e.g., helical in nature) with increasing distance from the glass slide coating. Such folding may be beneficial or detrimental for antibody binding.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

DECLARATION OF INTERESTS
Volker Stadler is the Chief Executive Officer of PEPperPRINT. All other authors declare no competing interests.

INCLUSION AND DIVERSITY
We support inclusive, diverse, and equitable conduct of research. iScience Article general, relative fluorescent units (RFU) of equal to or higher than 100 was considered a positive antibody response. However, as mentioned in the results sections, to focus on the most prominent antibody responses, our analysis was performed with an RFU cut-off of 1000 or higher.

Received
Defining cross-reactivity using protein conservation and immune response to endemic human coronaviruses To find epitope sites associated with cross-reactivity, we first calculated the conservation of peptide sequences across endemic hCoVs (HKU1, NL63, OC43 or 229E). To do so, we aligned the reference sequences of the hCoVs (proteome) in Jalview and extracted the conservation score (Cscore). 91 This conservation score reflects the conservation of physico-chemical properties in the alignment, where identical residues score the highest. 49 Epitope sites with a conservation score R6 and for which we detected antibody responses for both SARS-CoV-2 and at least one of the endemic hCoVs were considered as cross-reactive epitope sites. To find epitope sites associated with cross-reactivity, we first calculated the conservation of peptide sequences across endemic hCoVs (HKU1, NL63, OC43 or 229E). To do so, we aligned the reference sequences of the hCoVs (proteome) in Jalview and extracted the conservation score (Cscore) (86). 91 This conservation score reflects the conservation of physico-chemical properties in the alignment, where identical residues score the highest (49). 49 Epitope sites with a conservation score R6 and for which we detected antibody responses for both SARS-CoV-2 and at least one of the endemic hCoVs were considered as cross-reactive epitope sites. To find epitope sites associated with cross-reactivity, we first calculated the conservation of peptide sequences across endemic hCoVs (HKU1, NL63, OC43 or 229E). To do so, we aligned the reference sequences of the hCoVs (proteome) in Jalview and extracted the conservation score (Cscore). 91 This conservation score reflects the conservation of physico-chemical properties in the alignment, where identical residues score the highest. 49 Epitope sites with a conservation score R6 and for which we detected antibody responses for both SARS-CoV-2 and at least one of the endemic hCoVs were considered as cross-reactive epitope sites.
Detecting epitopes that are significantly more prevalent in SARS-CoV-2 positive patients First, antibody responses to each linear 15-mer peptide were mapped across the SARS-CoV-2 proteome and average RFU calculated for each amino acid residue.
Second, the normalized positional 'epitope coverage' at each amino acid residue within the proteins was defined as the ratio of total peptides mapped to each position by the total expected peptides, with values ranging between 0 and 1. A value of 1 in the SARS-CoV-2-positive group means that amino acid residues within the proteins were covered by peptides that showed immune response in all 10 SARS-CoV-2-positive patients and 14 peptides that overlap that position. (14 3 10 = 140 is the theoretical expected positional coverage to be 100%). Similarly, a value of 1 in SARS-CoV-2-negative group is 70 peptides with response (14 peptides x 5 SARS-CoV-2-negative patients = 70. i.e., all 70 unique peptides that cover residue locations).
Third, to identify the epitopes that are particularly prevalent in SARS-CoV-2-positive subjects, we performed an indicator value analysis. 50 This type of analysis is frequently used in ecology to determine whether species have significant associations with certain site groups. We applied this method to epitopes presence/absence data by replacing species with epitopes and site groups with patient groups defined by SARS-CoV-2 PCR status (positive or negative). The indicator value analysis measures the IndVal metric, which is the product of the specificity (e.g., the proportion of individuals within the whole dataset that exhibit a response to the epitope and belongs to a certain patient group) and the fidelity (e.g., the proportion of individuals within a certain patient group that exhibits a response) to the epitope. To control for the differences in sample size between patient groups, we used the group-equalized version of IndVal, IndValpag. 50 The R function multipatt from the R package indicspecies allowed us to perform this analysis and evaluate the significance of the associations through permutation tests. 50 Structural properties of B cell epitopes and B cell epitope prediction SARS-CoV-2 protein sequences were obtained from Uniprot. 92 Structure models of all 24 proteins in SARS-CoV-2 were obtained from I-TASSER. 93 Solvent accessibility was calculated using freeSASA. 94 Pymol was used for visualization. Using 3D structures and biophysical properties of the SARS-CoV-2 proteome, we applied the DiscoTope algorithm 35 to computationally predict conformational B cell epitopes with a significance threshold of À7.7 (75% specificity, 45% sensitivity). In addition, we used the Bepipred algorithm 36 iScience Article to obtain linear B cell epitopes. Epitopes with minimum length of 7 amino acid residues and minimum score of 0.55 (80% specificity, 30% sensitivity) were used for the analysis.

Epitope evolution profiling
To understand the evolution of SARS-CoV-2 epitopes in SARS-CoV2-positive patients, we made use of single nucleotide variants (SNVs) from 38,685 whole genome sequences from the NCBI sequence read archive (Table S9, see https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/4ZXDW0). We selected SARS-CoV-2 samples from the first pandemic wave (defined as January 1 to July 31, 2020) and the second wave (defined as August 1 to December 31, 2020) sequenced using Illumina paired-end amplicons with a minimum average depth of coverage of 200 x and fewer than 10,000 sites with a depth of coverage lower than 100x. We then retained single nucleotide variants present in both minus and plus strands at a minimum frequency of 2%, occurring at sites with a minimum depth of 100x, having a minimum within-sample frequency of 5% and located between sites 101 and 29778 of the genome to exclude sites at the extremities that are prone to sequencing errors and have been frequently masked. 95 These additional filters allowed us to remove sequencing errors and provided deep coverage to identify SNVs that are polymorphic within patients, reflecting within-patient evolution, 51,52 as well as those that are shared between the consensus sequences of different patients.
Next, to compare epitope evolution from the evolution of non-epitope sites of the same protein, we measured the evolution rates at within-host (SNVs with a frequency <75% that are not transmitted for sure) and between-host/transmission level (SNVs with a frequency R75% that are observed in at least three samples). Because the number of SNVs observed will vary depending on sample coverage, which varies across samples, we estimated the evolution rates in each sample separately using the number of SNVs observed per site with adequate coverage. Such sites are defined as having a detection power of at least 80%, which is the probability of detecting five reads supporting the presence of a SNVs with a frequency of at least 5% in a site of coverage C, i.e., the minimum adequate coverage, under a binomial distribution. This approach has been used previously for similar purposes with the Lassa virus. 96 We also inferred selection in the proteins of interest (ORF1A/B, Spike (S) protein, Envelope (E) protein, Membrane (M) glycoprotein, Nucleocapsid (N) phosphoprotein) using dN/dS, the ratio of non-synonymous (dN) and synonymous substitutions rates (dS), which we calculated from the called SNVs in each sample. 97 dN = dS = ðNb nsub = Nb nss Þ = ðNb ssub = Nb ss Þ Equation 1 where Nb nsub is the number of non-synonymous substitutions, Nb nss is the number of non-synonymous sites, Nb ssub is the number of synonymous substitutions, and Nb ss is the number of synonymous sites. dN/dS can detect purifying selection (dN/dS < 1), neutral evolution (dN/dS z 1) and positive selection (dN/dS > 1). In each sample, we calculated dN/dS only if there were more than three SNVs including at least one synonymous SNV.
Finally, we inferred selection at the within-host level, using pN/pS, which we calculated from intrahost SNVs (iSNVs), i.e., SNVs that are not fixed (within-sample frequency <75%): pN pS = ðNb nmut = Nb nss Þ = ðNb smut = Nb ss Þ Equation 2 where Nb nmut is the number of non-synonymous iSNVs, Nb nss is the number of non-synonymous sites, Nb smut is the number of synonymous iSNVs, and Nb ss is the number of synonymous sites. These analyses have been implemented in R (https://github.com/arnaud00013/SARS-CoV-2-HPDA-evolutionary-analysis).

Selection for immune escape in VOCs and VUIs genomes
To reveal VOCs and VUIs mutations possibly involved in selection for immune escape, we first defined the signature mutations of each variant (Table S10) as substitutions that are present in R90% of sequences assigned to that lineage. We calculated the prevalence of substitutions in thousands of publicly available consensus sequences collected during 2020 and added data from CoV-Spectrum about under-represented lineage in the database or lineages that emerged during 2021. 57 Then, we only focused on nonsynonymous signature mutations in our database and asked if these signature mutations are located at epitope sites as these mutations can change the antibodies' ability to recognize the epitopes. The signature mutation prevalence data were collected from our database of NCBI samples for the earlier lineages iScience Article (PANGO v.2.1.7) and from GISAID data obtained from CoV-spectrum for more recent lineages like Omicron. The database of lineage signature mutations is available on Github (https://github.com/ arnaud00013/SARS-CoV-2-HPDA-evolutionary-analysis).

QUANTIFICATION AND STATISTICAL ANALYSIS
All statistical details of experiments can be found in the figure legends. Findings were considered significant with a p value of less than 0.05. All analyses were performed with R Software Version 4.2.1.
Whole genome sequences used for evolution profiling are publicly available.