Why glycosylation matters in building a better flu vaccine

Low vaccine efficacy against seasonal influenza A virus (IAV) stems from the ability of the virus to evade existing immunity while maintaining fitness. While most potent neutralizing antibodies bind antigenic sites on the globular head domain of the IAV envelope glycoprotein hemagglutinin (HA), the error-prone IAV polymerase enables rapid evolution of key antigenic sites, resulting in immune escape. Significantly, the appearance of new N-glycosylation consensus sequences (sequons, NXT/NXS, rarely NXC) on the HA globular domain occurs among the more prevalent mutations as an IAV strain undergoes antigenic drift. The appearance of new glycosylation shields underlying amino acid residues from antibody contact, tunes receptor specificity, and balances receptor avidity with virion escape, all of which help maintain viral propagation through seasonal mutations. The World Health Organization selects seasonal vaccine strains based on information from surveillance, laboratory, and clinical observations. While the genetic sequences are known, mature glycosylated structures of circulating strains are not defined. In this review, we summarize mass spectrometric methods for quantifying site-specific glycosylation in IAV strains, and compare the evolution of IAV glycosylation to that of human immunodeficiency virus. We argue that the determination of site-specific glycosylation of IAV glycoproteins would enable development of vaccines that take advantage of glycosylation-dependent mechanisms whereby virus glycoproteins are processed by antigen-presenting cells.


Introduction
Viruses replicate by causing infected cells to produce virions that infect other cells. As has been observed by Fodor et al., virion composition determines virus stability, transmissibility, tropism, and immunogenicity (1). In the case of IAV, viral hijacking of the host cell machinery results in error-prone replication. Some host cell proteins are taken up to construct the new virions, which are pleiomorphic in structure. Viral protein mutations caused by error-prone replication result in antigenic drift, which allows the virus to evade neutralization by immune system molecules.
Hence, generating new vaccines annually is a critical effort in public health.
There are 18 known IAV HA subtypes, divided into two groups, and 11 known neuraminidase (NA) subtypes (2). Among the possible HA and NA combinations, only H1N1, H2N2 and H3N2 have caused pandemics. Today H1N1 and H3N2 circulate seasonally in humans. As shown in  (13)(14)(15). FluBlok is a licensed recombinant vaccine containing HA grown in insect cells from baculovirus vectors. Unlike egg-based expression, these alternative expression systems do not depend on a large supply of pathogen-free eggs for vaccine production, making them economical choices as well as ways to avoid the lowered immunogenicity that arises from egg adaptation.

The number of HA sequons increases as IAV circulates seasonally in humans.
Glycosylation in the HA stalk region, at or near residues 15, 26, 289, 483, and 542, occurs in all HA forms (see (16) and references therein). These glycans may interact with glycan binding chaperones in the endoplasmic reticulum and appear to play roles in HA trimer assembly (17,18). In contrast to seasonal strains, pandemic IAV, newly introduced to the human population, evades host antibody and innate immune defenses, and penetrates into the deep lung to infect bronchiolar, alveolar epithelial cells, and alveolar macrophages (19). To circulate in humans, these IAV must evade antibody recognition (20,21). Thus, amino acid residues of the HA globular domain mutate rapidly under evolutionary pressure to avoid antibody recognition (22). Newly emerging pandemic IAV typically begin with a low degree of glycosylation of the HA globular domains, but the number of N-glycosylation consensus sites increases as the strains circulate seasonally. The amino acids shielded by N-glycosylation appear not to mutate at a high rate, relative to those that are exposed to antibody binding (20,23).
Most T cell responses are generated against immunodominant viral peptides, which make up only a small fraction of the thousands of processed viral peptides. T cell populations that recognize glycopeptides presented by MHC-I and MHC-II have been identified (32,33), indicating the significance of glycosylation on acquired immune responses (34).
Immunodominance reflects many factors, including antigen presentation and T cell activation (35). Immunodominant antigens are recognized by large T cell populations, relative to those of subdominant antigens. This hierarchy is a reproducible pattern among individuals.
Immunodominance largely results from the fact that only a small percentage of peptides bind MHC molecules with affinity sufficient for stable presentation to activate CD8+ T cells, a pattern that shows high evolutionary conservation (36). At present, the apparent inability of memory CD8+ T cells to protect against IAV has driven renewed focus on vaccines that elicit an antibody response (29).
Among IAV proteins, the order of antibody immunodominance is approximately HA > NA >>> nucleoprotein (36). It is therefore of interest that the number of HA (and NA) sequons increases during seasonal circulation in humans. In order to evaluate the influence of IAV glycosylation on binding of strain antibodies, researchers currently rely on genetic sequences to predict sequons.
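Sequence-based sequon prediction of the kind described above can be illustrated with a short script. This is a minimal sketch, not a published tool: the function name and the toy sequence are hypothetical, and the exclusion of proline at the X position is the standard convention for N-glycosylation sequons rather than something stated in this review. The rare NXC form is opt-in.

```python
import re

# N-X-S/T sequon, where X is any residue except proline (standard
# convention, assumed here). A lookahead is used so that overlapping
# sequons such as "NNTS" are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")
SEQUON_WITH_CYS = re.compile(r"(?=N[^P][STC])")  # also accepts rare N-X-C


def find_sequons(protein: str, include_nxc: bool = False) -> list[int]:
    """Return 1-based positions of the Asn residue in each predicted sequon."""
    pattern = SEQUON_WITH_CYS if include_nxc else SEQUON
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MKNGTAANPSLLNNTSQNACD"  # hypothetical sequence, not a real HA
    print(find_sequons(toy))                    # [3, 13, 14]
    print(find_sequons(toy, include_nxc=True))  # [3, 13, 14, 18]
```

Such a scan reports only where glycosylation is possible. It says nothing about whether a site is occupied or which glycoforms are present, which is the gap that site-specific mass spectrometry addresses.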
It is known that the number of sequons on the HA globular head domain increases with the amount of time the IAV subtype circulates seasonally in humans (21,24-26). Genetic information has been used to model HA glycosylation by inserting the generic N-glycosylation chitobiose core structure onto HA crystal structure coordinates (26,37-39). However, information about the size and composition of glycans at individual sites and their impact on antigenic integrity is available for only a few strains.

Comparing how glycosylation is used to evade the human immune system in IAV and human immunodeficiency virus 1 (HIV-1)
While IAV and HIV-1 differ in their replication mechanisms, both exhibit sufficient antigenic variation in their surface proteins over time in the human population to evade the protection conferred by standard vaccine strategies. In both cases, the ability of the virus to evolve reduces the efficacy of vaccines. Glycosylation of the HIV-1 envelope protein trimer, consisting of gp120 and gp41, corresponds to about half its mass (43). The underlying protein is highly mutative and evolves constantly to evade host antibodies. The high density of N-glycosylation limits the accessibility of glycan biosynthetic processing enzymes, resulting in a shield of primarily high mannose N-glycans that facilitate viral escape by interfering with proteolytic processing of envelope peptides for presentation by the major histocompatibility complex (44,45). While broadly neutralizing antibodies to envelope protein have been identified that either tolerate the dense glycan shield or bind to epitopes that contain glycans, it has not been possible to formulate a vaccine that elicits such responses (46). By contrast, glycosylation of IAV HA appears to interfere with receptor binding and/or membrane fusion if too many sequons are occupied on the head domain (47-49).
The simplest mechanism whereby IAV escapes neutralizing antibodies of the adaptive immune system occurs through mutation(s) that diminish antibody binding affinity (50). In some cases, mutations can cause allosteric effects that decrease the antibody access to the epitope (51).
Mutations that increase the avidity of HA for the host sialic acid receptor may cause IAV to bind host cells more avidly than competing antibodies (52,53). Amino acid mutations may create new avidity of an amino acid mutation as a mechanism for maintaining fitness (58). Increased glycosylation may also compromise viral fitness by enabling the binding of HA by lectins of the innate immune system (59-62), or by negatively impacting assembly of stable HA trimers in the ER (16,24,38,49,63-65). Lectins, including surfactant protein D (SP-D) and mannose binding lectin (MBL), neutralize IAV by binding to glycosylated HA. Because these interactions depend on the glycan structures present at each glycosite, they cannot be predicted from HA sequence information alone.

Glycosylation and antigenic cartography of influenza viruses
Antigenic cartography (66) is used to assess the antigenic distance among HA molecules from different IAV strains (67-69). Antigenic distances are calculated from hemagglutination inhibition and microneutralization assays (69). Wan et al. developed a 3D antigenic cartography construction and visualization resource to study strain candidates for vaccines (68). Given its roles in shielding underlying protein sequences from antibody binding, glycosylation is likely to impact antigenic cartography of a given IAV strain. Expanded knowledge of site-specific glycosylation in different IAV strains, including the range of glycoforms present at each site, would make it possible to correlate calculated antigenic distances with HA glycosylation. This would be a boon to efforts in predicting the pandemic potential of zoonotic viruses. It would also facilitate vaccine planning by improving the ability to predict whether a given seasonally circulating virus will likely escape vaccines.
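As an illustration of how assay titers become antigenic distances, the sketch below converts a hemagglutination inhibition titer table into target distances. It assumes one widely used convention, which is not spelled out in this review: the distance between antigen i and antiserum j is the log2 of the highest titer recorded for that antiserum minus the log2 of the titer against antigen i, so that one unit equals a twofold drop in titer. The titer values and function name are hypothetical.

```python
import math


def table_distances(titers):
    """Convert an antigen-by-serum titer table into target distances.

    D[i][j] = log2(max titer for serum j) - log2(titer[i][j]).
    Rows are antigens (viruses), columns are antisera.
    """
    log_t = [[math.log2(t) for t in row] for row in titers]
    col_max = [max(col) for col in zip(*log_t)]
    return [[col_max[j] - v for j, v in enumerate(row)] for row in log_t]


# Hypothetical HI titers for three antigens tested against two antisera.
titers = [
    [1280, 160],
    [320, 640],
    [40, 640],
]

for row in table_distances(titers):
    print([round(d, 2) for d in row])
```

Cartography then places antigens and sera on a map so that map distances approximate these target distances. That optimization step is omitted here.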

Towards a broadly neutralizing IAV vaccine
The major HA antigenic sites in the head domain show high rates of mutation, including the addition of new sequons (70). At the same time, the evolution of receptor binding sites and the stem domain is much more limited to conserve their functions (71-73). Antibody escape mutants occur in five major head domain antigenic clusters (50). Neutralizing antibodies appear to target regions proximal to the receptor binding site and mutations responsible for antigenic drift tend to occur within these proximal regions (74-78). This may be related to the accumulation of N-glycans on the head domain that shield underlying antigenic sites (79).
Such glycosylation will disrupt binding of sialic acid residues if it occurs too close to the receptor binding site, thus leaving an opening for neutralizing antibodies to bind.
In principle, broadly neutralizing antibodies can target the conserved sites of the receptor binding and stalk regions, which are present across different IAV strains (79).
However, most human antibodies against HA (and NA) bind hypervariable residues, not those conserved among IAV strains. Efforts to generate broadly neutralizing antibodies have focused on the receptor binding site and the highly conserved stalk region (80), for which escape mutants would disrupt key viral functions and therefore have a high fitness cost. For this to work, however, it is necessary to direct the immune system away from the immunodominant variable residues towards subdominant residues that are conserved among strains.
Although most neutralizing antibodies target residues proximal to the receptor binding site, some appear to mimic the sialic acid receptor itself and bind to conserved residues, thus offering the potential for a broadly neutralizing response (78). Some researchers have pointed to the lack of accessibility of the stem region for the lack of broadly neutralizing antibodies from vaccines (81), while others have noted that the stem region of HA in virions should be accessible to antibody binding based on structural studies (82). Improved understanding of the dynamics of IAV immunodominance in human populations will be necessary in order to design vaccine strategies that succeed in generating antibodies against conserved epitopes that confer broadly neutralizing protection against IAV (83). Structural analysis of HAs from pandemic and seasonal IAV indicates that while the HA fold is conserved, the surface properties and glycosylation patterns differ significantly among subtypes (80). A large-scale in vitro mutational analysis of the H1 and H3 HA receptor binding site identified many replication-competent mutations not yet observed in nature, indicating that the receptor binding site can accommodate much more sequence diversity than previously believed (84). These researchers noted that many deleterious single mutations were viable when
Mass spectrometric analysis of viral glycoproteins has been summarized in a recent review (86). Downard et al. developed an approach for using accurate mass measurement of proteolytic peptides of IAV proteins, referred to as proteotyping, to identify HA and NA from circulating IAV types and subtypes. The accurate mass values constitute signatures for conserved regions of IAV proteins that enable virus typing (87,88). The investigators used this approach to differentiate seasonal strains from pandemic H1N1 (89-91) and study the evolution of H5N1 strains (92) and NA subtypes (93).
They developed computer algorithms to identify virus reassortants from whole virus digests (94). FluShuffle considers combinations of viral protein identities that match the mass spectral data using Gibbs sampling. FluResort uses those identities to calculate the weighted distance of each across two or more phylogenetic trees through viral protein sequence alignment. As an extension to this approach, the FluClass algorithm performs phylogenetic classification using MS data starting from DNA- or protein-based phylogenetic trees (95). The MassTree algorithm identifies and displays protein mutations and calculates mutational frequencies across phylogenetic trees for studies of IAV evolution (96-98).
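The core idea of matching accurate peptide masses to signatures can be sketched in a few lines. The example below is a generic illustration of mass matching within a parts-per-million tolerance, not the implementation of FluShuffle, FluResort, FluClass, or MassTree. The signature names, masses, and the 5 ppm tolerance are hypothetical placeholders.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_signatures(observed, signatures, tol_ppm=5.0):
    """Return {signature name: observed mass} for signatures matched within tol_ppm."""
    hits = {}
    for name, theoretical in signatures.items():
        for mass in observed:
            if abs(ppm_error(mass, theoretical)) <= tol_ppm:
                hits[name] = mass
                break  # keep the first observed mass that matches
    return hits


# Hypothetical signature masses (Da). Real values would be computed from
# conserved peptides in reference sequences of each type or subtype.
signatures = {"H1_peptide_a": 1500.7500, "H3_peptide_b": 1820.9100}
observed = [1500.7530, 1640.8000, 1820.9300]

print(match_signatures(observed, signatures))
# {'H1_peptide_a': 1500.753}  (2 ppm error; the H3 candidate is off by ~11 ppm)
```

A tight tolerance is what makes a single mass usable as a signature, which is why this style of typing depends on high mass accuracy.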

IAV glycoproteomics
As reviewed (99), mass spectrometry has been used in proteomics studies of IAV proteins and in mass profiling of tryptic peptides and glycopeptides. An early pioneering study characterized N-glycosylation on three IAV strains (100). An LC/MS method has been used to characterize glycoforms at specific sites using alternating high and low collision energy values (a data independent analysis (DIA) experiment known as MS^E) combined with multiple reaction monitoring assays as a means of comparing recombinant HA samples as vaccine candidates (14,101). These investigators used this approach to analyze site-specific glycosylation in a series of engineered H3N2 HA variants with added sequons that mirror those that appeared during seasonal circulation since 1968 (49,102). They also characterized HA glycosylation in a series of engineered H5N7 as part of an effort to define glycosylation structure-function relationships in this avian IAV strain (103). In additional work, they examined glycosylation in a set of reference HA antigens used in influenza vaccine potency testing (104). We have used site-specific glycosylation information to model interactions between HA and surfactant protein-D (60,102,105-107).

Mass spectrometry methods for assigning site-specific glycosylation.
For detailed glycoproteomics reviews, see (110-116). As shown in Figure 3, glycopeptide glycoforms elute from a reversed phase chromatography column over a narrow retention time window.

MS workflows for assigning glycopeptides.
Identification of site-specific glycosylation requires tailored analytical and bioinformatics methods. Proteomics workflows identify and quantify proteins based on prediction of peptide tandem mass spectra from genomic databases. While small PTMs have single predictable mass shifts, glycosylation at a given site is heterogeneous, pushing confident site-specific assignment there may be problems determining which precursor peak produced the glycopeptide fragments.
Despite these limitations, investigators have used low collision energy settings to produce Y-type ions for identification and quantification of IgG glycopeptides in a complex matrix of human plasma (120-122). A SWATH DIA method was used to quantify high mannose N-glycopeptides from yeast using manually created glycopeptide libraries (123,124). Researchers developed a DIA strategy to quantify 25 N-glycopeptides from plasma using a search space of 161 glycoforms for a study of liver cirrhosis (121). Others have used DIA to produce comprehensive glycosylation maps of human serum IgM using extracted ion chromatograms of shared peptide-specific fragment ions to filter related glycoforms for a given glycosite (125). This approach allowed identification of glycopeptides with unexpected modifications. A targeted DIA method

Figure 3. Comparison of acquisition methods for tandem MS of glycopeptides. Extracted ion chromatograms for glycopeptide IADTNITTIPQGLPPSLTELHLDGNK glycoforms are shown, illustrating that a large number of glycoforms elute over a narrow retention time range using reversed phase chromatography LC-MS as described (128). Automated precursor ion selection using data dependent acquisition, targeted precursor ion selection using parallel reaction monitoring (PRM) and data independent acquisition are compared.
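A glycoform search space of the kind used in these studies is, at its simplest, a list of candidate glycan compositions attached to one peptide, each with a predicted precursor m/z. The sketch below builds such a list for the peptide named in the Figure 3 caption. It is an illustration under stated assumptions, not the workflow of any cited study: the high mannose compositions (HexNAc2Hex5 to HexNAc2Hex9) and the 3+ charge state are hypothetical choices, and the masses are standard monoisotopic residue values.

```python
# Monoisotopic residue masses (Da) for amino acids and monosaccharides.
AA = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
SUGAR = {"Hex": 162.052824, "HexNAc": 203.079373,
         "Fuc": 146.057909, "NeuAc": 291.095417}
WATER, PROTON = 18.010565, 1.007276


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(AA[a] for a in seq) + WATER


def glycopeptide_mz(seq: str, glycan: dict, charge: int) -> float:
    """Precursor m/z of a peptide carrying one glycan of the given composition."""
    # Each glycosidic bond loses one water, so residue masses add directly.
    mass = peptide_mass(seq) + sum(SUGAR[s] * n for s, n in glycan.items())
    return (mass + charge * PROTON) / charge


peptide = "IADTNITTIPQGLPPSLTELHLDGNK"  # from the Figure 3 caption

# Hypothetical search space: high mannose glycoforms HexNAc2Hex5..Hex9.
for n_hex in range(5, 10):
    glycan = {"HexNAc": 2, "Hex": n_hex}
    mz = glycopeptide_mz(peptide, glycan, charge=3)
    print(f"HexNAc2Hex{n_hex}: m/z {mz:.4f} (3+)")
```

Adjacent glycoforms in this list differ by one hexose, about 54 m/z units at 3+. Together with their close retention times, this shows why many glycoforms of one peptide compete for precursor selection within the same narrow elution window.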