Principles of SARS-CoV-2 glycosylation

The structure and post-translational processing of the SARS-CoV-2 spike glycoprotein (S) are intimately associated with the function of the virus and of sterilising vaccines. The surface of the S protein is extensively modified by glycans, and their biosynthesis is driven by both the wider cellular context and, importantly, the underlying protein structure and local glycan density. Comparison of virally derived S protein with both recombinantly derived and adenovirally induced proteins reveals hotspots of protein-directed glycosylation that drive conserved glycosylation motifs. Molecular dynamics simulations revealed that, while the S surface is extensively shielded by N-glycans, it presents regions vulnerable to neutralising antibodies. Furthermore, glycans have been shown to influence the accessibility of the receptor binding domain and the binding to the cellular receptor. The emerging picture is one of unifying principles of S protein glycosylation and an intimate role of glycosylation in immunogen structure and efficacy.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike (S) glycoprotein is important for the assembly and stability of the virus [1]. The entry of SARS-CoV-2 is primarily mediated by the interaction between S and the ACE2 receptor, expressed on the surface of the host cell [2,3]. The S glycoprotein comprises two subunits, S1 and S2. The receptor-binding domain (RBD) in the S1 subunit interacts with the ACE2 receptor [4]. The S protein is extensively glycosylated, typically encoding 66 N-linked glycosylation sites per trimer and a lower number of O-linked glycans present at low occupancy [5][6][7]. The high density of viral glycosylation has prompted investigations into the functional role of glycans in a wide range of settings, from their impact on protein folding to their influence on immunogenicity.
The trimeric S protein is a key target for sterilizing antibody responses [8,9] and a key component of all currently approved vaccines [10][11][12][13][14][15]. S protein has also been used as a reagent in serological testing [16,17], principally because of its primary role in viral infection and its exposure on the viral surface. There is a wide range of S protein constructs across the currently approved vaccines, which rarely utilize native sequences, with the exception of the Sinovac vaccine construct [12], which corresponds to an inactivated virus with a native S protein sequence. Other formats are based on a range of non-native sequences, and even the effective adenoviral-based ChAdOx1 nCoV-19 (AZD1222) vaccine [18], which encodes the native S protein sequence, is codon optimized to improve expression levels in human cell lines. Other design features used within the trimeric S protein formats include C-terminal truncations that remove the hydrophobic transmembrane regions, as required for solubilization, and the use of stabilizing mutations, such as in the 2P, HexaPro, and S-closed formats [19][20][21]. For example, the RNA-based Pfizer/BioNTech vaccine encodes a codon-optimized S protein with the 2P mutations [10]. The proline mutations are incorporated in these formats to maintain the S protein in the prefusion conformation, as there is enhanced presentation of neutralizing antibody (NAb) epitopes compared to the post-fusion conformation [22]. Notably, not all vaccine formats rely on trimeric S protein, and RBD-based constructs have shown promising immunogenicity [23] and safety [24].
The wide range of constructs at the basis of currently approved SARS-CoV-2 vaccines suggests that immunogenicity is not sensitive to precise glycan processing. At one extreme, robust immune responses have been generated from material derived from insects [25][26][27], plants [28,29], and mammalian expression systems [10,15]. Even among mammalian-derived immunogens, a wide range of cell-specific glycosylation is utilized, from common recombinant expression systems such as HEK 293 to in vivo production by adenoviral and RNA-delivery systems [10,13]. In the latter case, it is anticipated that the bulk of production is located within phagocytic immune cells, which would be expected to impact cell-specific glycosylation [30]. Previous studies in HIV-1 have shown differences in glycan composition depending on the producer cell, such as the presence of lactosaminoglycans in macrophages, whereas these glycan compositions are absent in peripheral blood mononuclear cells (PBMCs) [30]. Despite the apparent insensitivity of the immune response to glycan processing, terminal glycan structures could have implications for viral infectivity [31]. For example, S protein derived from pseudovirus expressed in lung epithelial cells has shown more extensive processing, such as a higher sialylation content, compared to protein expressed in HEK 293T cells, and this has been shown to be associated with increased infectivity [31]. However, the fine processing of glycans does not play a critical role in influencing immunogenicity; in fact, truncation of S protein glycosylation enhances the immune response [31]. Even a bulk change in glycosylation of S protein using the chemical inhibitor kifunensine does not influence serological properties [32]. Despite these observations, at the site-specific level, glycans on S are critical for maintaining the quaternary structure [33] and thus indirectly influence immunogenicity and antigenicity.
Furthermore, in HIV-1, immunogen glycosylation is known to enhance humoral immunity by influencing antigen trafficking to immune cells [34]. Overall, the glycosylation of SARS-CoV-2 S protein has a weaker impact on immunological properties than that of more densely glycosylated proteins such as those of HIV-1 [35].
Beyond the impact of glycans on the immune response to immunogens, there is growing evidence that viral glycosylation contributes to the inflammatory response to viral material [36]. Thus, understanding the fine structure of the SARS-CoV-2 S protein glycan shield on the virus and on different immunogens can contribute to the understanding of the role of glycosylation in natural infection and vaccine development. Here, we review the parameters that structurally affect the glycosylation of the S protein and assess the extent to which recombinant S protein mimics that of the native virus (Figure 1).
Figure 1. [...] [37]. (B) Model of a representative SARS-CoV-2 virus highlighting one S protein on the envelope, adapted from Yao et al. [37] to highlight a single spike. (C) Fine glycan processing of viral S, showing the content of oligomannose-type glycans and a heat plot of the percentage of oligomannose-type glycans of viral-derived S protein, using reported abundances [38,39]. (D) Glycan processing of recombinant S, with values obtained from previously reported data [38,39]. The glycan composition is categorized into three groups based on the abundance of oligomannose-type glycans: green (100-80%), orange (79-30%) and magenta (29-0%). The heat plot represents the percentage of oligomannose-type glycans at each site on a scale from 0% (white) to 100% (green).
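The three colour categories used in the Figure 1 caption can be expressed as a simple thresholding rule. The following is a minimal sketch in Python, not the authors' analysis code: the function names are hypothetical, the caption gives integer bins (100-80%, 79-30%, 29-0%), and treating fractional values with cut-offs at 80% and 30% is an assumption made here. The percentages in the usage example are placeholders, not measured abundances.

```python
from typing import Dict

# Cut-offs taken from the Figure 1 caption; applying them to
# fractional values (e.g. 79.5%) is an assumption of this sketch.
GREEN_MIN = 80.0   # 100-80% oligomannose-type glycans
ORANGE_MIN = 30.0  # 79-30% oligomannose-type glycans


def classify_site(oligomannose_pct: float) -> str:
    """Return the colour category for one glycosylation site."""
    if not 0.0 <= oligomannose_pct <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    if oligomannose_pct >= GREEN_MIN:
        return "green"
    if oligomannose_pct >= ORANGE_MIN:
        return "orange"
    return "magenta"


def classify_sites(abundances: Dict[str, float]) -> Dict[str, str]:
    """Classify every site in a {site name: percentage} mapping."""
    return {site: classify_site(pct) for site, pct in abundances.items()}


if __name__ == "__main__":
    # Placeholder values for illustration only (not experimental data).
    placeholder = {"site_a": 95.0, "site_b": 45.0, "site_c": 5.0}
    for site, colour in classify_sites(placeholder).items():
        print(site, colour)
```

Keeping the thresholds as named constants makes it straightforward to test how sensitive the colouring of a site is to the chosen cut-offs.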

Role of glycans in protein folding and stability
The N-linked glycosylation of S protein is essential for folding and assembly in both recombinant formats and in viral settings. Glycosylation mediates interactions with the calnexin/calreticulin folding pathway and secretion checkpoint. Inhibitors targeting entry into the pathway, for example, inhibition of α-glucosidases by iminosugar-based compounds such as N-butyldeoxynojirimycin (NB-DNJ) and N-butyldeoxygalactonojirimycin (NB-DGJ), have led to a marked reduction in viral infectivity [40]. Beyond the capacity for glycans to mediate chaperone interactions, they can also impart physicochemical stability. Furthermore, the high degree of N-linked glycosylation of the SARS-CoV-2 S protein suggests a selective pressure on the virus that has driven the density of the glycan shield. Among the many different key functions driving the evolution of the shield, glycans can aid S folding and structural integrity [33,41], facilitate the interaction with cellular factors [42,43], or shield underlying epitopes [33,44]. Because of this key functional role, the glycan shield has evolved significantly along the phylogeny, and it is continually evolving [45], with the very recent loss of N370 glycosylation due to the T372A mutation in SARS-CoV-2. Through molecular dynamics (MD) simulation, it has been observed that the presence of both the N234 and N370 glycans ties the closed RBDs together and likely hinders RBD opening [45][46][47]. This possibly explains the absence of the N370 glycan site in SARS-CoV-2. Similarly, we can speculate that SARS-CoV-2 glycosylation carries the signature of selective pressure from its host prior to zoonosis.
There is a direct relationship between the spatial accessibility of the glycosylation sites and the processing state [48,49]. Within the context of the S glycoprotein, MD simulations have shown that the N-glycan at position N234 is one of the least accessible in both closed and open S conformations (see Figure 2), which explains the high abundance of large oligomannose-type glycans at this site [6,33]. The N234 glycan is also one of the most ordered glycans observed in cryo-EM structures [19], and this glycan has also been shown by MD simulations to stabilize the RBD open conformation and its dynamics [33,45]. MD simulations [33] have shown that the N234 glycan is able to access and effectively fill the cleft left vacant by the opening of the RBD, thus supporting its stability. This function is also supported by the N165 glycan, located in proximity (Figure 2). Indeed, deletion of these glycans through the single point mutations N234A, N165A and N343A corresponds to significantly reduced binding to the ACE2 receptor [33,41,50]. However, none of these single point mutations fully abrogates binding, suggesting that multiple residues are involved in RBD opening [33,41]. Further insight from MD and cryo-electron microscopy (cryo-EM) studies has shown that glycosylation at N343 is essential for gating the RBD opening and closing [41,45]. The N234 and N165 glycans shield the receptor binding motif (RBM) consistently in both the 'closed' and 'open' states, whereas shielding by N343 decreases with RBD opening [41,45] (Figure 2). Overall, the interaction of the clustered N165, N234, and N343 glycosylation sites with the surrounding protein helps stabilize the orientation and dynamics of the open RBD conformation, allowing the different possible orientations and thereby supporting receptor recognition [33,41,45] (Figure 2B).
Figure 2. [...] showing the interaction between neighbouring glycans N165, N234 and N343. The RBD is accessible for ACE2 binding in the 'open' state. The N343 glycan acts as a "glycan gate" as it pushes the RBD from the "down" to the "up" conformation. N234 is modelled with a Man9GlcNAc2 glycan represented in green; N165 and N343 are modelled with biantennary complex-type glycans [38,39]. The RBD is shown in cyan, and the remaining S protein is highlighted in grey. These models are reproduced from previous studies [41,45].

Role of protein structure in glycan maturation
The processing of glycans on viral glycoproteins is highly dependent on the accessibility of the glycan sites to the glycan processing enzymes in the ER and Golgi apparatus [51,52]. Glycan maturation is strongly influenced by glycosylation site accessibility and can be sterically impaired by the local architecture of the protein [52]. In contrast to N-linked glycans, which are added co-translationally in the ER, O-linked glycans are added in the Golgi apparatus, and their presence is influenced by both the intrinsic susceptibility of the sequence to modification and the local steric accessibility. Importantly, the proximity of potential O-glycosylation sites to the furin cleavage site, which generates the separate S1 and S2 subunits, means that their modification can impact viral maturation and infectivity [53].
One extreme example of steric constraints limiting glycan maturation within the S protein is the N234 site, where limited enzymatic access results in almost exclusively oligomannose-type glycans being presented at that site [6,33,38] (Figure 3A). More broadly, comparison of S protein from a wide variety of laboratories and expressed in different mammalian cell lines demonstrated a consistent subpopulation of oligomannose glycans, indicating that this is an intrinsic property of the glycoprotein [38]. These mannose signatures were also observed in spike protein derived from infectious virus [38,39]. However, when only the S1 subunit is expressed, N234 is fully processed to complex-type glycosylation [39], consistent with the relaxation of steric restrictions to processing in the monomeric form [7,27,39] (Figure 3B).
Sialylation was another significant difference observed between S1 and trimeric S: the former displayed a high content of sialylated glycans with high branching, whereas the latter showed a low content of sialylated glycans, mostly mono-sialylated [27]. Likewise, an increase in O-glycosylation was observed in recombinant S1 protein compared to trimeric S. For example, T323 is more occupied in recombinant S1 protein, and T678 O-glycosylation was observed only on the isolated S1 subunit, suggesting greater accessibility to Golgi-resident transferases [39,53] (Figure 3B). However, within the native trimeric sequence, furin cleavage unmasks the T678 site, which is modified around 25% with O-glycan structures [39]. Furthermore, isolated RBD expressed as a recombinant protein reveals a high occupancy of O-glycosylation compared to trimeric S protein [7]. For example, T323 is fully occupied in monomeric RBD, whereas this site shows low occupancy on trimeric S protein [7]. In addition, the T470 O-glycosylation site was identified only on monomeric RBD, albeit at very low abundance [7] (Figure 3C). Similarly, the N-glycosylation sites N331 and N343 of monomeric RBD revealed elevated branching and terminal processing when compared to trimeric S protein [38]. These glycan characterizations of different viral protein formats highlight the impact of protein architecture on glycan structure. Although a wide variety of protein formats have been shown to be effective vaccines, the changes in glycosylation may impact immunogen efficacy, as has been observed in the case of HIV-1 [34].
Figure 3. [...] [7]. N-linked glycosylation takes place at a specific sequon, Asn-X-Ser/Thr (where X is any amino acid except proline), whereas O-linked glycosylation is not dictated by a specific sequon and occurs on serine and threonine residues in exposed regions. The N-linked glycosylation is presented in three categories on the basis of oligomannose content as described in Figure 1: oligomannose (green), hybrid (orange) and complex-type (magenta) glycans. The O-linked glycosylation at the T323 site (see magnification) on trimeric S is present at low levels (0.2%), with values obtained from Eldrid et al. [7]. (B) The glycan composition of the recombinant monomeric S1 subunit, with values reproduced from Wang et al. [27] and Brun et al. [39]. Most of the N-glycan sites on the S1 subunit are highly processed (magenta), except N657, which is unoccupied (wheat color). O-glycosylation is present on the S1 subunit at sites T323 and T678 (see magnification). (C) The glycan composition of monomeric RBD protein (cyan), which binds to the main host receptor (ACE2). The N-glycan sites of RBD are highly processed (magenta), with values reproduced from Allen et al. [38]. O-glycosylation was observed on monomeric RBD protein at sites T323 and S469/T470 (see magnification) [7].
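The sequon rule stated in the Figure 3 caption (Asn-X-Ser/Thr, where X is any amino acid except proline) lends itself to a short illustration. The sketch below, in Python, scans a one-letter amino acid sequence for candidate N-linked sites; the function name `find_sequons` and the toy peptide are hypothetical and are not taken from the studies cited here. A matching sequon only marks a potential site: as noted above for N657 on the S1 subunit, a sequon can remain unoccupied.

```python
import re
from typing import List

# N, followed by any residue except proline, followed by S or T.
# The lookahead keeps the match zero-width after the N, so
# overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    seq = sequence.upper()
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Toy peptide for illustration only (not a spike sequence).
    toy = "MNGTANPSNKSA"
    # N2 (N-G-T) and N9 (N-K-S) qualify; N6 is followed by proline.
    print(find_sequons(toy))  # prints [2, 9]
```

Positions are reported 1-based to match the residue numbering convention used for sites such as N234 and N343; applied to a full-length protomer sequence, the count would be multiplied by three to obtain the number of potential sites per trimer.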

Role of glycans in viral entry
The initial step of SARS-CoV-2 infection is binding to the host cell receptor and entry into the target cell. While additional factors can enhance entry (see below), entry is principally mediated by the S-ACE2 interaction [4,5,54]. ACE2 is ubiquitously expressed on epithelial, endothelial, and blood cells [55]. The viral S protein and the ACE2 receptor are both extensively glycosylated, with ACE2 exhibiting 14 N-linked glycans across the dimer [5,6,19]. O-glycosylation was found at low levels on both ACE2 and S protein [5,7,39]. Previous studies have determined the effect of N- and O-glycosylation processing of both S and ACE2 on S-ACE2 binding [56,57]. Glycan engineering of ACE2 did not significantly impact ACE2-S binding [56,57], whereas engineering of S glycosylation had a modest influence on binding [57]. In contrast to these recombinant settings, glycan processing of S protein can influence viral entry [57].
Beyond the impact of glycan maturation on receptor recognition, modification of the furin cleavage site by O-linked glycosylation can impact viral entry. For example, the P681H and P681R mutations found in the highly transmissible alpha and delta variants, respectively, decrease O-glycosylation, which potentially increases furin cleavage and may influence viral infectivity [53]. Together, these results have highlighted the importance of S protein glycosylation for viral entry, and abrogation of these glycans using inhibitors provides insight into intervention strategies to target SARS-CoV-2 infection [58,59].
Although ACE2 is necessary for infection, there are carbohydrate-based attachment factors that can enhance infection, particularly in cases where ACE2 expression is limiting [60]. This has the effect of broadening the viral tropism beyond that which is achieved with ACE2 expression alone. For example, cell surface heparan sulfate (HS) is recognized by SARS-CoV-2 S protein at a site separate from the one involved in ACE2 binding and enhances infection [61][62][63]. Specifically, HS binds to the RBD and favours its open conformation, thus exposing the RBD for ACE2 interaction and viral infection [61][62][63]. Furthermore, sialylated glycolipids have been shown to interact with the RBD and facilitate viral entry [64], which has also been exploited for diagnostic purposes [65].
Additionally, viral glycans can also be recognised by soluble and cellular lectins, which play an important role in recognition. Nuclear magnetic resonance (NMR) studies and MD simulations have revealed the binding of human lectins such as galectins to the N331 and N343 glycan sites of the RBD [66]. Mannose-binding lectins (MBL) can recognize glycans on S protein and have been shown in vitro to inhibit SARS-CoV-2 infection by activation of the lectin-mediated complement pathway [67]. Furthermore, there are cellular proteases such as transmembrane serine protease 2 (TMPRSS2), which enhance viral infectivity by cleaving both S and ACE2 protein [54]. These attachment factors mediate the viral S protein interaction with ACE2, and their inhibition reduces viral infectivity [54,62,63,68].

Exploiting breaches within the glycan shield
Despite the extensive glycosylation and dynamics of the S protein, antibodies can still recognize breaches within the glycan shield of S protein and can neutralize the virus [33,69]. Previous studies have characterized the glycan composition and accessible surface area (ASA) of these glycans using liquid chromatography-mass spectrometry (LC-MS) and MD simulations, respectively [6,33,38,69]. Despite the extensive coverage of glycans on the head domain, the glycans are sufficiently sparse for there still to be extensive vulnerabilities to antibody binding [33,69] (Figure 4A). In contrast, the glycans in the stalk region have shown effective glycan shielding in MD simulation studies, suggesting less accessibility for antibodies to bind this region [33,69]. Recent studies have shown the isolation of antibodies against the head region, including the RBD, RBM, NTD and S2 subunit, from COVID-19 patients [70][71][72][73]. Antibodies against the RBD such as C002 and S2M11 have shown recognition of quaternary epitopes on the RBD. The C002 neutralizing antibody (nAb) binds to a region which spans two RBDs, one in the 'down' conformation adjacent to an RBD in the 'up' conformation (Figure 4B). The S2M11 antibody binds to two neighbouring RBDs and stabilizes the closed conformation of trimeric S protein (Figure 4B). Furthermore, Abs have also been elicited against the NTD present on the head of the S protein [70,71,74]. One such antibody is 4A8, highlighted in Figure 4B, which recognizes the same region of the NTD irrespective of RBD dynamics and has also been shown to stabilize the NTD region. The N149 glycan site, which is present in the NTD, might also be involved in the NTD-4A8 interaction [70] (Figure 4B).
In addition to the view of accessibility derived from static models, considering the dynamics of the system further enhances our understanding of the antigenic surface. Dynamic motions of the S protein can influence glycan-glycan interactions and can bring the glycans of different domains into close contact. Using a network analysis approach, the glycans responsible for effective shielding were predicted [69,75,76]. The glycans N234 and N165, involved in RBD dynamics, demonstrated stronger glycan-glycan interaction in the open state compared to the closed state, as shown in Figure 2B. The glycans at sites N603 and N616 have been shown to play a central role in connecting the upper head with the lower head, as these glycans are responsible for the proper shielding of the S protein [69]. Neutralizing antibodies exploit spaces between these shielded regions where the glycan density is low, or bind to epitopes containing glycans, as has been observed in the case of HIV-1 (Figure 4B) [75,77].
Figure 4. [...] [33]. The RBD present on the head region of S is highlighted in cyan. The stalk region is effectively shielded with glycans. The mesh network represents the virion membrane in which the S protein is embedded. (B) Illustration of antibodies binding within the glycan shield of S protein, obtained by structural alignment (www.pymol.org). The 4A8 nAb (pink, PDB 7C2L) targets the NTD region of the S protein [70]. The C002 nAb (yellow, PDB 7K8T) binds to the RBD in the "up" conformation [73]. The S2M11 nAb (green, PDB 7K43) recognizes a quaternary epitope consisting of two neighbouring RBDs and stabilizes the trimeric S in the closed state [72].

Further glycobiology perspectives in vaccine development and beyond
It has been established that the underlying protein architecture has a significant impact on glycosylation [78]. At one extreme, the trimeric viral spike exhibits oligomannose-type glycans, whereas RBD domains are almost entirely devoid of such structures [6,33,38]. Given the known influence of glycan processing on immunogen targeting, it will be fruitful to understand the impact of protein formats on vaccine efficacy. Similarly, there is considerable opportunity to enhance the understanding of RNA-based [10] and adenoviral-based delivery systems [13] by considering the range of vaccine materials derived from different cellular sources upon immunization [39,79]. It is already established that protein modification can influence antigen stability, and it is likely that such modifications will also impact antigen distribution and cellular presentation [34,80,81]. Overall, understanding the interplay between protein architecture, production routes, and glycan processing will illuminate how viral immunogens work and aid in their optimization.