N-terminal domain mutations of the spike protein are structurally implicated in epitope recognition in emerging SARS-CoV-2 strains

Abstract
During the past two years, the world has been ravaged by a global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Acquired mutations in the SARS-CoV-2 genome affecting virus infectivity and/or immunogenicity have led to a number of novel strains with higher transmissibility than the original Wuhan strain. Mutations in the receptor binding domain (RBD) of the SARS-CoV-2 spike protein have been extensively studied in this context. However, mutations and deletions within the N-terminal domain (NTD), located adjacent to the RBD, are less studied. Many of these are found within certain β-sheet-linking loops, which are surprisingly long in SARS-CoV-2 in comparison to SARS-CoV and other related β-coronaviruses. Here, we perform a structural and epidemiological study of novel strains carrying mutations and deletions within these loops. We identify short- and long-distance interactions that stabilize the NTD loops and form a critical epitope that is essential for recognition by a wide variety of neutralizing antibodies from convalescent plasma. Among the different mutations/deletions found in these loops, Ala 67 and Asp 80 mutations as well as His 69/Val 70 and Tyr 144 deletions have been identified in different fast-spreading strains. Similarly, deletions of amino acids 241-243 and 246-252 have been found to affect the network of NTD loops in strains with high transmissibility. Our structural findings provide insight into the role of these mutations/deletions in altering the epitope structure and thus affecting the immunoreactivity of the NTD region of the spike protein.

Research in context
Evidence before this study
The COVID-19 pandemic is already in its fourth wave, with novel, highly aggressive strains expanding rapidly. Mutations in the RBD of the SARS-CoV-2 spike protein have been extensively studied in the context of higher infectivity. However, accumulating evidence underlines the critical role of NTD mutations and deletions in the immunogenicity of the spike protein.
Added value of this study
In this study, we find that certain loops within the N-terminal domain of the SARS-CoV-2 spike protein have evolutionarily diverged in comparison to other β-coronaviruses, particularly SARS-CoV. These highly flexible loops lie in close proximity and contribute to various interactions that stabilize a surface-exposed tertiary structure. A super-epitope recognized by most neutralizing antibodies from convalescent plasma is formed and stabilized by specific amino acid residues within these loops, implying that these residues are critical for epitope structure. Relative to their length, these loops accumulate a disproportionately high number of mutations, driving SARS-CoV-2 evolution. Mutations and deletions affecting these residues are predicted to promote structural changes in the super-epitope region. Such mutations/deletions are common in fast-spreading SARS-CoV-2 strains associated with immune escape.

[1]. Coronaviruses are divided into four genera (α, β, γ and δ), can infect various hosts from birds to mammals, and can cause severe morbidity and mortality. They have come under scrutiny in the past two decades due to recurrent incidents of widespread human infection. The most common human coronaviruses are the α-coronaviruses 229E and NL63 and the β-coronaviruses HKU1 and OC43, which infect the respiratory tract of people throughout the globe, causing common cold symptoms. More recently, β-coronavirus strains of zoonotic (animal) origin have emerged as serious threats to human life: MERS-CoV (Middle East respiratory syndrome, MERS), SARS-CoV (severe acute respiratory syndrome, SARS) and SARS-CoV-2 (coronavirus disease 2019, COVID-19) [2]. Although the mortality rates of MERS and SARS are much higher, SARS-CoV-2, which emerged in Wuhan, Hubei province, China [3], has infected over 205.5 million people and caused close to 4.3 million deaths worldwide (https://coronavirus.jhu.edu/map.html).

Coronaviruses
SARS-CoV and SARS-CoV-2 share almost 80% sequence identity, and both enter host cells through interaction of the S (spike) protein with angiotensin-converting enzyme 2 (ACE2) [4]. The S proteins of the two SARS viruses share 80% similarity at the protein level and bind ACE2 with similar affinity [5].
It is widely accepted that genetic variability and evolution within positive-strand RNA viruses are mainly driven by the low fidelity of RNA replication, as the RNA-dependent RNA polymerase (RdRp) is prone to high error rates [6]. Unlike other RNA viruses, in which replication depends primarily on the RdRp, coronaviruses encode non-structural proteins (nsps), including processivity factors (nsp7-8), a helicase (nsp13), a single-strand binding protein (nsp9), a proofreading exonuclease (nsp14) and other cofactors (e.g. nsp10, nsp16), that form a replication complex with the RdRp; this complex has proofreading activity and corrects errors introduced by the viral RdRp [7,8]. As a result, coronaviruses are characterized by a 10-fold lower mutation rate than other RNA viruses. Even with this proofreading activity, the estimated mutation rate is 4 × 10⁻⁴ nucleotide substitutions/site/year for SARS-CoV [9] and 1.12 × 10⁻³ mutations/site/year for SARS-CoV-2 [10]. Recent studies have provided strong evidence that the SARS-CoV-2 spike protein NTD contains epitopes recognized by neutralizing antibodies produced by the host adaptive immune response [11]. Moreover, it has been proposed that this region of the spike protein is dynamically involved in host cell surface adhesion, mediating interactions with glycan groups of the cellular glycoenvironment [12]. Therefore, tracking genetic variation in the SARS-CoV-2 NTD is important for monitoring emerging strains with potentially higher capability for immune escape or higher infectivity. To this end, we investigated the evolution of the NTD in β-coronaviruses. By comparing the SARS-CoV and SARS-CoV-2 spike structures and analyzing the available mutation data for SARS-CoV-2, we discovered that specific NTD loop elements, which have evolutionarily diverged in the SARS-CoV-2 clade, display high mutation rates and drive genetic variation.
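To put the two per-site rates on a more intuitive scale, they can be converted into expected substitutions per year over a sequence of a given length. The sketch below is a back-of-envelope illustration only: it assumes a constant, uniform rate across all sites (ignoring selection and site-to-site rate variation), and the 29,903-nt length of the Wuhan-Hu-1 reference genome (NC_045512.2) is an outside value, not a figure from this study. The NTD span (residues 15-305) is the one used in the alignment analysis.

```python
# Back-of-envelope conversion of per-site rates into expected substitutions/year.
# Assumption: a constant, uniform rate across sites (no selection, no rate
# heterogeneity), so these are rough orders of magnitude, not predictions.

SARS_COV_RATE = 4e-4          # substitutions/site/year, SARS-CoV [9]
SARS_COV2_RATE = 1.12e-3      # mutations/site/year, SARS-CoV-2 [10]
SARS_COV2_GENOME_NT = 29_903  # Wuhan-Hu-1 reference genome length (outside value)


def expected_substitutions_per_year(rate_per_site: float, n_sites: int) -> float:
    """Expected substitutions per year across n_sites under a uniform rate."""
    return rate_per_site * n_sites


# Spike NTD residues 15-305: 291 codons, i.e. 873 nucleotides.
ntd_nt = (305 - 15 + 1) * 3

genome = expected_substitutions_per_year(SARS_COV2_RATE, SARS_COV2_GENOME_NT)
ntd = expected_substitutions_per_year(SARS_COV2_RATE, ntd_nt)
fold = SARS_COV2_RATE / SARS_COV_RATE

print(f"SARS-CoV-2 genome: ~{genome:.1f} substitutions/year")
print(f"Spike NTD (aa 15-305): ~{ntd:.2f} substitutions/year")
print(f"Rate ratio SARS-CoV-2 / SARS-CoV: {fold:.1f}x")
```

Under these assumptions, the SARS-CoV-2 rate corresponds to roughly 33 substitutions per genome per year and about one per year across the NTD coding region; because real rates vary strongly between sites, such averages say nothing about individual loops.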
Interestingly, specific mutations and deletions in these loops are associated with an altered NTD structure and affect an epitope region that is common to many different antibodies targeting the NTD region of the spike protein. The corresponding epidemiological data for these mutants revealed that specific NTD mutations and deletions have been positively selected over the latest waves of the COVID-19 pandemic.

[13]. For multiple sequence alignment visualization, Jalview 2.11 (http://www.jalview.org/) was used. To investigate the structural divergence of the SARS-CoV-2 NTD with respect to SARS-CoV, the secondary structure of the SARS-CoV NTD, as resolved in the crystal structure (PDB ID: 5X4S), was compared to the cryo-EM structure of SARS-CoV-2 (PDB ID: 6VYB). Phylogenetic trees were generated using ClustalW2.

Mutation analysis
Global mutation data for SARS-CoV-2 NTD sequences were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) at https://www.gisaid.org/epiflu-applications/phylodynamics/. The frequency over time of SARS-CoV-2 strains harboring mutations in the β3-β4, β9-β10 and β14-β15 loops was analyzed through GISAID from January 2020 to July 2021. Nextstrain and WHO nomenclature for the major SARS-CoV-2 clades was applied in our analysis of specific variants [23,24].
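The frequency-over-time analysis amounts to counting, for each collection month, the fraction of sequenced genomes that carry a given variant. The sketch below assumes the metadata have already been exported as (collection date, set of variant labels) pairs; this input format is hypothetical (GISAID access requires registration and its export formats are not reproduced here), and labels such as "del69-70" are illustrative, not GISAID identifiers.

```python
from collections import Counter
from datetime import date


def monthly_frequency(records, variant):
    """Fraction of sequenced genomes per (year, month) that carry `variant`.

    records: iterable of (collection_date, set_of_variant_labels), one per genome.
    This input format is a hypothetical stand-in for a real metadata export.
    """
    totals = Counter()
    carriers = Counter()
    for collected, variants in records:
        month = (collected.year, collected.month)
        totals[month] += 1
        if variant in variants:
            carriers[month] += 1
    # (year, month) tuples sort chronologically; months with no carriers give 0.0.
    return {month: carriers[month] / totals[month] for month in sorted(totals)}


# Toy records invented for illustration; these are not GISAID data.
records = [
    (date(2020, 12, 3), {"del69-70", "del144"}),
    (date(2020, 12, 20), set()),
    (date(2021, 1, 5), {"del69-70"}),
    (date(2021, 1, 9), {"del69-70", "del144"}),
]

print(monthly_frequency(records, "del69-70"))
print(monthly_frequency(records, "del144"))
```

Monthly binning is a design choice; with sparse sampling in some countries, weekly counts become noisy, and frequencies also depend on how sequencing effort is distributed across regions.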

Ethics
Ethical approval was not required because this work is a meta-analysis of publicly available data.

Role of funders
The authors received no financial support for this research.

Specific loop regions drive the evolutionary divergence of the spike protein NTD in β-coronaviruses
Sequencing of SARS-CoV-2 genomes from patients in Wuhan, China, and phylogenetic analysis of representative β-coronavirus genomes (genus Betacoronavirus) revealed that the subgenus Sarbecovirus of the genus Betacoronavirus can be classified into three distinct clades. Clade 1 consists of two SARS-CoV-related strains from Rhinolophus sp. in Bulgaria (BM-48) and Kenya (BtKY72). Human SARS-CoV-2 sequences and two bat SARS-like strains from eastern China (bat-SL-CoVZC45 and bat-SL-CoVZXC21) form clade 2, while SARS-CoV strains from humans and bat SARS-like coronaviruses from southwestern China form clade 3 [17].
To investigate whether a protein alignment of the spike NTD across β-coronaviruses yields a similar phylogenetic profile, we performed a comparative sequence analysis of the spike NTDs of representative β-coronavirus strains. To this end, residues 15-305 of the SARS-CoV-2 spike protein (Fig. 1A) were aligned against the NTD sequences of SARS-CoV and of bat SARS-like coronavirus strains from clade 1 (BM48), clade 2 (ZC45, ZXC21) and clade 3 (HKU3, Rp3). As Fig. 1B indicates, this protein alignment showed that SARS-CoV-2 and the related bat-derived strains ZC45 and ZXC21 cluster together, while SARS-CoV and the bat SARS-like sequences cluster separately, mirroring the phylogenetic pattern obtained with the genomic sequences.
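The clustering step can be illustrated with a minimal distance-based tree. This sketch is not a reimplementation of ClustalW2 (which builds neighbor-joining trees by default); it swaps in UPGMA (average linkage) on p-distances computed over ungapped alignment columns. The four short sequences and their labels are invented toy data, not spike NTD sequences, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no overlapping, ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs: dict[str, str]) -> str:
    """Average-linkage (UPGMA) clustering; returns a Newick topology without branch lengths."""
    clusters = {name: 1 for name in seqs}  # Newick label -> number of leaves
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]]) for p in combinations(seqs, 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        for c in clusters:
            # Distance to the merged cluster is the leaf-count-weighted average.
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Toy aligned sequences (invented; two "cov2-like" and two "sars-like" groups).
seqs = {
    "cov2_like_1": "ACDEFGHIKL",
    "cov2_like_2": "ACDEFGHIKM",
    "sars_like_1": "ACNQFGHLRL",
    "sars_like_2": "ACNQFGHLRM",
}
print(upgma(seqs))
```

UPGMA assumes a roughly constant evolutionary rate across lineages; neighbor-joining relaxes that assumption, which is one reason alignment tools favor it for real phylogenies.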
Integrating the secondary structure elements of SARS-CoV (as identified from the PDB 5X4S structure) into this sequence alignment, we observed that the NTD is overall well conserved across most β-strands and loops, with the exception of the loops connecting strands β3-β4, β9-β10 and β14-β15 (Fig. 1C). More specifically, the SARS-CoV-2 β3-β4 and β14-β15 loops are longer than their SARS-CoV counterparts. While an extended β3-β4 loop is also shared among all bat SARS-CoV members, the β14-β15 loop extension is restricted to SARS-CoV-2. To identify the structural differences in the divergent NTD regions between SARS-CoV-2 and SARS-CoV, the cryo-electron microscopy (cryo-EM) structure of the SARS-CoV-2 spike protein [18] was compared with the SARS-CoV NTD crystal structure [19]. This structural comparison revealed that the β3-β4, β9-β10 and β14-β15 loops in SARS-CoV-2 have been evolutionarily extended relative to SARS-CoV (Fig. 1D).
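Comparing loop lengths on top of the alignment reduces to mapping the reference loop boundaries (numbered on the structure-annotated sequence) onto alignment columns and counting the residues each sequence contributes between them. The sketch below assumes 1-based reference numbering and a gapped alignment held as plain strings; the two-sequence alignment is a toy example and the function names are hypothetical.

```python
def ref_columns(aligned_ref: str, start: int, end: int, first_resnum: int = 1) -> tuple[int, int]:
    """Return the (first, last) alignment columns holding reference residues start and end."""
    resnum = first_resnum - 1
    first = last = None
    for col, aa in enumerate(aligned_ref):
        if aa == "-":
            continue  # gap in the reference: no residue number consumed
        resnum += 1
        if resnum == start:
            first = col
        if resnum == end:
            last = col
    if first is None or last is None:
        raise ValueError("residue range not covered by the reference sequence")
    return first, last


def loop_lengths(alignment: dict[str, str], ref: str, start: int, end: int) -> dict[str, int]:
    """Count ungapped residues each sequence has within the reference loop span."""
    first, last = ref_columns(alignment[ref], start, end)
    return {name: sum(aa != "-" for aa in seq[first:last + 1]) for name, seq in alignment.items()}


# Toy alignment (invented): the query carries a 3-residue insertion in the loop.
alignment = {
    "ref":   "KLGT---NDKF",
    "query": "KLGTHSVNDKF",
}
print(loop_lengths(alignment, "ref", 3, 6))
```

Because columns that are gaps in the reference still fall between the mapped boundaries, insertions in other sequences (the kind of extension described for SARS-CoV-2) are counted, while insertions flanking the boundaries are not.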

Interloop interactions and conformational stability of β3-β4, β9-β10 and β14-β15 domains
According to the cryo-EM structure of the SARS-CoV-2 trimeric spike complex and its proposed conformational states [18,20], these loops are highly flexible and exposed on the outer surface of the trimeric spike complex, away from the RBD. The protein surface corresponding to these interacting loops is hydrophilic and carries a positive electrostatic potential, owing to the presence of several charged/hydrophilic amino acids (Fig. 2A). Notably, the β3-β4 and β14-β15 loops (amino acids 62-80 and 242-263, respectively) lie in close proximity and are stabilized by electrostatic interactions between amino acids in both loops (Fig. 2B).
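One simple way to enumerate candidate interloop electrostatic contacts is to look for oppositely charged side-chain atoms from the two loops within a distance cutoff. The sketch below is not the modeling procedure used in this study: it assumes a 4 Å cutoff (a commonly used salt-bridge criterion), treats His as positively charged (which depends on local pH), and runs on invented coordinates with hypothetical residue numbers. With real data, the atom records would come from a parsed structure such as PDB 6VYB.

```python
import math
from itertools import product

# Side-chain atoms taken to carry charge (assumption: His counted as basic).
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2"),
         ("HIS", "ND1"), ("HIS", "NE2")}


def charged_atoms(atoms, start, end):
    """Split charged atoms of residues start..end (inclusive) into acidic and basic lists."""
    acidic, basic = [], []
    for resname, resnum, atom_name, xyz in atoms:
        if start <= resnum <= end:
            if (resname, atom_name) in ACIDIC:
                acidic.append((f"{resname}{resnum}", xyz))
            elif (resname, atom_name) in BASIC:
                basic.append((f"{resname}{resnum}", xyz))
    return acidic, basic


def salt_bridges(atoms, loop_a, loop_b, cutoff=4.0):
    """Residue pairs (loop_a residue, loop_b residue) with opposite charges within cutoff Å.

    atoms: iterable of (resname, resnum, atom_name, (x, y, z)).
    """
    atoms = list(atoms)  # allow generators: the atoms are scanned twice
    acid_a, base_a = charged_atoms(atoms, *loop_a)
    acid_b, base_b = charged_atoms(atoms, *loop_b)
    pairs = set()
    candidates = list(product(acid_a, base_b)) + list(product(base_a, acid_b))
    for (res_a, xyz_a), (res_b, xyz_b) in candidates:
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.add((res_a, res_b))
    return sorted(pairs)


# Invented coordinates (Å) and residue numbers; not taken from any real structure.
atoms = [
    ("ASP", 5, "OD1", (0.0, 0.0, 0.0)),
    ("ASP", 5, "OD2", (0.5, 1.0, 0.0)),
    ("LYS", 8, "NZ", (20.0, 0.0, 0.0)),
    ("ARG", 25, "NH1", (3.0, 0.0, 0.0)),
    ("GLU", 27, "OE1", (17.0, 0.0, 0.0)),
    ("LYS", 28, "NZ", (0.0, 9.0, 0.0)),
]
print(salt_bridges(atoms, loop_a=(1, 10), loop_b=(20, 30)))
```

A purely geometric filter like this flags candidates only; whether such pairs actually stabilize the loops depends on solvation, side-chain dynamics and protonation states.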
Interestingly, 4A8, one of the best characterized neutralizing antibodies [11], was the first NTD-targeting antibody shown to recognize a discontinuous epitope (ID: 1087268, Immune Epitope Database and Analysis Resource, IEDB) encompassing β9-β10 and β14-β15 amino acids (Tyr 144, Tyr 145, His 146, Lys 147, Lys 150, Trp 152, His 245, Arg 246, Ser 247, Tyr 248, Leu 249). These amino acids lie at the recognition interface, within 4 Å of the antibody. As mentioned above, several of the epitope amino acids participate in an extensive network of interactions with residues Ala 67, His 69 and Asp 80 (Fig. 4A) and Val 143 (Fig. 4B), suggesting that these interactions are important for antibody recognition.
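The 4 Å criterion for interface residues can be expressed as a brute-force distance search between antigen and antibody atoms. This is a minimal sketch on invented coordinates with hypothetical residue numbers; it does not reproduce the IEDB annotation procedure, and for full-size structures a spatial index (for example a k-d tree) and a proper structure parser would be preferable.

```python
import math


def interface_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Antigen residues with at least one atom within `cutoff` Å of any antibody atom.

    Each atom is (resname, resnum, (x, y, z)). Brute force, O(N * M) distance checks.
    """
    antibody_xyz = [xyz for _, _, xyz in antibody_atoms]
    hits = set()
    for resname, resnum, xyz in antigen_atoms:
        if any(math.dist(xyz, other) <= cutoff for other in antibody_xyz):
            hits.add((resnum, resname))
    return sorted(hits)


# Invented coordinates (Å); residue numbers are hypothetical.
antigen = [
    ("TYR", 1, (0.0, 0.0, 0.0)),
    ("TYR", 1, (1.0, 0.0, 0.0)),
    ("LYS", 2, (10.0, 0.0, 0.0)),
    ("HIS", 3, (5.5, 4.0, 0.0)),
]
antibody = [
    ("SER", 101, (2.5, 0.0, 0.0)),
    ("ASP", 102, (8.5, 0.0, 0.0)),
]
print(interface_residues(antigen, antibody))
```

The result is sensitive to the cutoff: in this toy case the His residue sits 5 Å from both antibody atoms, so it joins the interface only if the cutoff is relaxed.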
Recent studies have highlighted the immunogenic properties of the NTD and, besides 4A8, a wide panel of neutralizing antibodies has been shown to recognize an NTD supersite [21]. Our structural analysis of the epitopes recognized by the COV57 [22], S2L28, S2M28, S2X333 [23], 2-17, 5-24, 4-8 and 2-51 [24] antibodies revealed that the β3-β4, β9-β10 and β14-β15 loops have an important role in the formation of this universal epitope (Fig. 5A, B). Therefore, mutations or deletions that affect amino acids in these loops are expected to remodel this epitope and alter the binding affinity of these antibodies. Masking of exposed amino acid residues by glycosylation has been described as a general mechanism of viral immune evasion [25]. It is thus worth noting that two glycosylation sites have been identified in the β3-β4 and β9-β10 loops, at residues Asn 74 and Asn 149, respectively [26].
3.3. Amino acid variability in the β3-β4, β9-β10 and β14-β15 loops modulates the evolutionary dynamics of the spike protein NTD region in SARS-CoV-2 strains
To investigate the distribution of identified SARS-CoV-2 mutations across the NTD secondary structure, mutation data from 2,022,459 high-quality genomic sequences in the GISAID SARS-CoV-2 database (https://www.gisaid.org/), as of July 9th, 2021, were analyzed and aligned to residues 1-350 of the SARS-CoV-2 spike protein (YP_009724390). We observed that approximately 46.4% of the identified NTD non-synonymous mutations in different SARS-CoV-2 strains are found within the β3-β4, β9-β10 and β14-β15 loop regions, indicating a higher degree of variation in these elements than in other NTD secondary structure elements (Fig. 6A-C). Since the β3-β4, β9-β10 and β14-β15 loops are positioned away from the NTD core and from the interaction surface with the RBD, mutations therein may be under reduced selective pressure. On the other hand, most of the identified loop variants affect the solvent-accessible surface of the spike protein, and these mutations can potentially affect the dynamics of intermolecular interactions with sugars or antibodies. This is particularly important for RNA viruses, as high mutation rates in their proteins are known to be associated with escape from the host immune response, higher virulence and altered tissue tropism [27-29]. Interestingly, mutations in the β3-β4 and β9-β10 loops occur at a higher frequency than those in the β14-β15 loop, suggesting a more dynamic role for these two regions during SARS-CoV-2 evolution.
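Whether a 46.4% share counts as enrichment depends on how much of the analyzed NTD sequence the loops occupy. A minimal way to frame this is to compare the observed share of mutations in the loops with the share expected if mutations fell uniformly across residues, together with a one-sided binomial tail probability. The uniform null model, the β9-β10 loop boundaries and all counts in the example are assumptions for illustration (the text gives only the β3-β4 and β14-β15 ranges, 62-80 and 242-263), and treating each mutation as an independent draw is itself a simplification.

```python
from math import comb


def residues_in(intervals):
    """Total residues in inclusive (start, end) intervals; assumes no overlap."""
    return sum(end - start + 1 for start, end in intervals)


def loop_enrichment(n_in_loops, n_total, loop_intervals, region_length):
    """Return (observed share, expected share, fold enrichment, one-sided p-value).

    Null model (assumption): each mutation hits any residue of the region
    with equal probability, independently of the others.
    """
    expected = residues_in(loop_intervals) / region_length
    observed = n_in_loops / n_total
    # P(X >= n_in_loops) for X ~ Binomial(n_total, expected)
    p_value = sum(
        comb(n_total, k) * expected**k * (1 - expected) ** (n_total - k)
        for k in range(n_in_loops, n_total + 1)
    )
    return observed, expected, observed / expected, p_value


# beta3-beta4 (62-80) and beta14-beta15 (242-263) are from the text; the
# beta9-beta10 range (140-156) and the mutation counts are HYPOTHETICAL.
loops = [(62, 80), (140, 156), (242, 263)]
obs, exp, fold, p = loop_enrichment(n_in_loops=46, n_total=100,
                                    loop_intervals=loops, region_length=350)
print(f"observed {obs:.1%}, expected {exp:.1%}, {fold:.1f}-fold, p = {p:.2g}")
```

With millions of sequences, almost any deviation from uniformity is "significant"; the fold enrichment, and a null model that accounts for codon-level mutational biases, are more informative than the p-value alone.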
3.4. Specific mutations within the β3-β4 and β9-β10 loops can lead to conformational changes
Since we observed that mutations within the β3-β4 and β9-β10 loops are quite frequent, we sought to investigate whether specific

3.5. Epidemiological data provide evidence that specific mutations and deletions within the β3-β4, β9-β10 and β14-β15 loops have undergone positive selection during SARS-CoV-2 evolution
The high mutation rate of viruses provides unique opportunities for natural selection of strains with greater stability, higher transmission rates and immune escape. As a result, certain variants become increasingly represented in the population through positive selection [30].
As of June 2021, Nextstrain had identified 13 major clades (19A-B, 20A-20J and 21A). During SARS-CoV-2 evolution, different mutated strains emerged within these clades, displaying high transmissibility and thus affecting COVID-19 epidemiology. Global monitoring of these strains, known as variants of concern (VOCs) and variants of interest (VOIs), has been established in response to the COVID-19 pandemic, and a Greek-letter naming scheme has been adopted by the World Health Organization (WHO) for easier labelling. By July 15 1.616 with NTD mutations/deletions in the β3-β4, β9-β10 and β14-β15 loops revealed that, except for the Gamma strain, all the other highly aggressive strains are characterized by multiple alterations in the three loop regions that form the universal NTD epitope (Fig. 7A). Notably, deletions are distributed among all loop regions, with a specific enrichment for Δ69-70 and Δ144 in the β3-β4 and β9-β10 loops, respectively, which are present in several different VOCs/VOIs (Alpha, Eta, B.1.375, B.1.1.616). Moreover, Δ69-70 co-occurs with Δ144 in the Alpha and Eta strains. As a result, the frequency of these two deletions among SARS-CoV-2 sequences genotyped from patients has become very high in recent months (Fig. 7B, C). Besides Δ144, the β9-β10 loop, which has a critical role in the formation of the NTD antigenic supersite, seems to display different patterns of deletions in aggressive strains, as Δ144-145  Fig. 2). Notably, Δ241-243, which is present in the Beta strain, affects a region adjacent to the β14-β15 loop, while its impact on supersite formation has not yet been studied.
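The co-occurrence of two deletions can be quantified by comparing their observed joint frequency with the product of their individual frequencies, which is what independence would predict. This is a minimal sketch on invented genotypes with illustrative labels. In real data, co-occurrence within a lineage such as Alpha largely reflects shared ancestry, so an excess over the independence expectation does not by itself imply functional interaction or joint selection.

```python
def co_occurrence(genotypes, a, b):
    """Return (observed joint frequency of a and b, frequency expected under independence)."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes")
    f_a = sum(a in g for g in genotypes) / n
    f_b = sum(b in g for g in genotypes) / n
    f_ab = sum(a in g and b in g for g in genotypes) / n
    return f_ab, f_a * f_b


# Toy genotypes (invented), one set of variant labels per sequenced genome.
genotypes = [
    {"del69-70", "del144"},
    {"del69-70", "del144"},
    {"del69-70"},
    set(),
]
observed, expected = co_occurrence(genotypes, "del69-70", "del144")
print(f"observed {observed:.3f} vs expected {expected:.3f} under independence")
```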
In VOCs/VOIs and variants under monitoring, the above pattern of deletions coexists with specific missense mutations. More specifically, mutations in the β3-β4 loop that are associated with the genetic properties of these strains include A67V, G75V, T76I and D80A. In the other two loops, the missense mutations identified are G142D, W152C and E154K in the β9-β10 loop and R246I and D253G in the β14-β15 loop (Fig. 7A). Despite the mutation/deletion overlap among strains, the different combinations of genetic variations generate complex virus genotypes (Fig. 7B, C and Supplementary Fig. 2), and the functional implications of these combinations are difficult to assess.
These acquired alterations are expected to modify the network of interloop and intraloop interactions, and thus to induce structural remodeling of the NTD antigenic supersite. Indeed, deletions within the above loops have been proposed to decrease the binding and neutralization potency of COVID-19 convalescent sera or monoclonal antibodies (mAbs). These deletions are therefore thought to facilitate immune escape and, as a result, to be positively selected [32,33]. In addition, recent studies have highlighted the role of specific missense mutations in structural changes of the NTD antigenic supersite, and functional data have correlated these changes with immune escape. For instance, the W152C mutation introduces a free cysteine that can form new disulfide bonds. In the Epsilon (B.1.427/B.1.429) strain, an alternative disulfide bond between C136 and W152C is proposed to drive NTD conformational changes that lead to escape from NTD-targeting neutralizing antibodies [34]. Moreover, G142D has also been shown to alter the binding of NTD-targeting antibodies [23]. Notably, the Ala 67 and Asp 80 residues in the β3-β4 loop, which we predict to maintain interloop interactions in the NTD supersite, are mutated in the Eta (A67V) and Beta (D80A) strains. In line with our analysis, NTD remodeling induced by D80A has also been proposed recently [35].

Discussion
SARS-CoV-2 is genetically related to SARS-CoV, a deadly coronavirus that emerged in late 2002 and caused an outbreak of severe acute respiratory syndrome. SARS-CoV was highly lethal, but after intensive public health measures the outbreak was contained in 2003 [36]. The new coronavirus SARS-CoV-2 is less deadly but far more transmissible [37]. Moreover, while SARS-CoV appears to infect pneumocytes and enterocytes of the small intestine [38], SARS-CoV-2 can infect multiple organs such as the intestine, liver, kidney and blood vessels [39,40].
To identify divergent structural elements in the SARS-CoV-2 spike protein NTD that could potentially modulate interactions with the host, we performed a comparative sequence and structural analysis of the SARS-CoV and SARS-CoV-2 NTDs. As expected, the NTD sequences of SARS-CoV and SARS-CoV-2 are highly similar. The most striking difference is the length of loops β3-β4, β9-β10 and β14-β15, which are considerably longer in SARS-CoV-2 and in certain bat coronaviruses with genomes closely related to SARS-CoV-2, indicating that the structural evolution of these elements is characteristic of the SARS-CoV-2 clade. The low degree of homology between the sequences of the respective loops from bat and human β-coronaviruses suggests that amino acid variations in these elements have a major impact on the divergence of spike proteins within the SARS-CoV-2 clade. Although the cryo-EM data suggest that these divergent loop regions are part of a highly flexible NTD region, our molecular modeling data indicate that a network of electrostatic and hydrophobic interactions between several residues of the β3-β4 and β9-β10 loops and residues of the β14-β15 loop mediates interloop communication that confers relative stability. Residues Ala 67, His 69 and Asp 80 in the β3-β4 loop and Val 143 in the β9-β10 loop were identified as playing an important role in these interactions.
It is well accepted that antibodies targeting the RBD contribute significantly to the neutralizing activity of convalescent sera [41,42]. In a recent study, however, Voss and colleagues analyzed the proteomic profile of IgGs in convalescent sera and demonstrated that the response is directed predominantly (>80%) against epitopes residing outside the RBD, including the NTD region and the S2 domain [43]. The same study also reported that anti-NTD antibodies contribute critically to neutralization and that their protective effect is related to their relative levels in plasma.
Moreover, while RBD-targeting neutralizing antibodies recognize distinct epitopes, various neutralizing antibodies against the NTD target a common site comprising primarily loops β9-β10 and β14-β15 (N3 and N5 according to Chi et al.) [21]. It has been postulated that this region is highly immunogenic in part because it is glycan-free, which allows epitope recognition. In addition, the high flexibility of the loops allows the peptide to assume multiple conformations, accommodating recognition by several antibodies. The neutralizing activity of NTD antibodies relies on hindrance, which prevents the spike protein from binding to the ACE2 receptor [21].
It is well accepted that epitope conformation is essential for the neutralizing activity of antibody responses against SARS-CoV [45], and that specific mutants can escape the neutralizing activity of certain antibodies [46]. Since conformational changes in the β3-β4, β9-β10 and β14-β15 loops could alter NTD antibody recognition, we analyzed the GISAID mutation data to study the sequence variation profile of the divergent SARS-CoV-2 loops. Our analysis revealed a disproportionately high rate of mutations in these loops, indicating a dynamic role in spike sequence divergence and evolution within the SARS-CoV-2 clade that could enhance immune escape. Based on our molecular modeling of the spike 3D structure, residues Ala 67, Asp 80 and Val 143 maintain a rigid network of interactions. In this context, the A67V, D80Y, D80A and V143F variants are predicted to rewire this network, either promoting new hydrophobic interconnections (A67V) or inducing a loss of intraloop hydrogen bonds (D80Y, D80A) and interloop hydrophobic interactions (V143F). This might hinder NTD recognition by neutralizing antibodies from convalescent plasma. Consistent with this, recent studies revealed that in-frame deletions in the NTD of SARS-CoV-2 strains that affect the β3-β4 (Δ69-70), β9-β10 (Δ141-144, Δ146) and β14-β15 (Δ243-244) loops are associated with immune escape in patients [32].
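As a rough, structure-blind complement to the modeling argument, the substitutions discussed above can be placed on a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which was not part of this study's analysis; hydropathy differences ignore structural context, side-chain size and specific hydrogen bonds, so they can only hint at, not confirm, the predicted gain (A67V) or loss (V143F) of hydrophobic contacts. The function name is hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
_SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def hydropathy_change(substitution: str) -> float:
    """Mutant minus wild-type hydropathy for a single-residue label such as 'A67V'."""
    match = _SUBSTITUTION.fullmatch(substitution)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {substitution!r}")
    wild_type, _position, mutant = match.groups()
    return KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type]


for label in ["A67V", "D80A", "D80Y", "V143F"]:
    print(f"{label}: {hydropathy_change(label):+.1f}")
```

On this scale A67V and both Asp 80 substitutions increase hydropathy (the latter by removing a charged side chain), while V143F slightly decreases it; deletions such as Δ69-70 fall outside what a per-residue scale can describe.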
To determine whether SARS-CoV-2 strains harboring mutations in NTD loops are associated with greater prevalence, we investigated Nextstrain data with a focus on WHO-reported VOCs (Alpha, Beta, Gamma, Delta), VOIs (Eta, Iota, Kappa and Lambda) and strains under monitoring (B.   for many of these prevalent strains, recent functional studies have established a link between their high transmissibility and their ability to escape from RBD- and NTD-targeting neutralizing antibodies. Regarding escape from NTD-targeting antibodies, strong experimental evidence exists for the Alpha and Beta [35,47-49], Delta [50,51] and Epsilon [34] strains.
This analysis revealed that, with the exception of the Gamma strain, all other highly aggressive strains harbor multiple mutations and deletions in the β3-β4, β9-β10 and β14-β15 loops. The high prevalence of deletions in these loop regions may indicate that deletions are subject to stronger selection for antigenic drift than missense mutations.
The structural analysis of complexes between neutralizing antibodies and the NTD suggests that the β9-β10 and β14-β15 loops are closer to the antibody binding surface than the β3-β4 loop, which is not directly involved in interactions with antibody residues. However, our structural analysis revealed an important role for the β3-β4 residues Ala 67, His 69 and Asp 80 in interloop interactions that stabilize the β14-β15 loop. Since the β14-β15 loop is a critical part of the NTD antigenic supersite, β3-β4 loop mutations may have a critical role in immune escape. Interestingly

We acknowledge that all computational data presented here merit experimental validation. Nevertheless, we have uncovered important aspects of virus interaction with the host immune system. Our findings could facilitate the generation of monoclonal antibodies and vaccines with a better profile against novel fast-spreading SARS-CoV-2 variants.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.