Maturing Glycoproteomics Technologies Provide Unique Structural Insights into the N-glycoproteome and Its Regulation in Health and Disease

The glycoproteome remains severely understudied because of significant analytical challenges associated with glycoproteomics, the system-wide analysis of intact glycopeptides. This review introduces important structural aspects of protein N-glycosylation and summarizes the latest technological developments and applications in LC-MS/MS-based qualitative and quantitative N-glycoproteomics. These maturing technologies provide unique structural insights into the N-glycoproteome and its synthesis and regulation by complementing existing methods in glycoscience. Modern glycoproteomics is now sufficiently mature to initiate efforts to capture the molecular complexity displayed by the N-glycoproteome, opening exciting opportunities to increase our understanding of the functional roles of protein N-glycosylation in human health and disease.

Protein glycosylation encompasses a broad class of posttranslational modifications involving the covalent attachment of complex carbohydrates (glycans) to specific amino acid residues of polypeptide chains. The human biosynthetic machinery catalyzes diverse types of glycosylation, with the best studied being attachment of glycans to asparagine (N-glycosylation) and serine/threonine (O-glycosylation) residues (1). As with all types of protein glycosylation, N-glycosylation is a template-less modification synthesized by a suite of glycosylation enzymes in the secretory pathway, Fig. 1A (2). Template-less synthesis means that glycosylation is determined by the physiological state of the glycosylation machinery and the nature of the proteins undergoing glycosylation. Jointly, these attributes determine the repertoire of glycans present on synthesized glycoproteins (glycoforms) and create the important features of protein site- and cell-specific glycosylation (3-5). Protein glycosylation is therefore a spatiotemporally dynamic modification that cells can utilize to respond to the constantly changing milieu.
N-linked glycans are typically present on asparagine residues in Asn-Xxx-Ser/Thr (Xxx ≠ Pro) consensus sequences (sequons) in humans. This preference is caused by specific recognition of the sequon by the peptide-binding site of oligosaccharyltransferase (OST) (6), the enzyme that catalyzes this reaction. However, it is now clear that mammalian cells can also, although rarely, glycosylate more relaxed sequons (e.g. Asn-Xxx-Cys) (7-13). The use of such nonconsensus sequons seems to be more frequent in rodents, where even glutamine-linked glycosylation has been reported (14). Low efficiency N-glycosylation in these noncanonical sequons is consistent with a role of the canonical sequon in recognition and high-affinity binding to OST to promote glycan transfer (15). Mammalian N-glycans share a common trimannosylchitobiose core composed of three mannose (Man) and two N-acetylglucosamine (GlcNAc) residues, extended with a variety of monosaccharides including Man, GlcNAc, galactose (Gal), fucose (Fuc), N-acetylgalactosamine (GalNAc) and sialic acids such as N-acetylneuraminic acid (NeuAc) and N-glycolylneuraminic acid (NeuGc). N-glycans can be further modified by noncarbohydrate moieties through phosphorylation, sulfation and acetylation (16,17). The conserved trimannosylchitobiose core of N-glycans is a remnant of the N-glycan precursor (Glc3Man9GlcNAc2) initially transferred to proteins in mammalian cells. This oligosaccharide structure is built stepwise on a dolichol pyrophosphate carrier embedded in the endoplasmic reticulum (ER) membrane, and is transferred en bloc to nascent polypeptides by OST. The terminal Glc and Man play critical roles in assisting glycoprotein folding and in ensuring glycoprotein quality control in the ER. After glycoproteins are correctly folded, the terminal Glc and Man are generally removed by α-glucosidases and α-mannosidases in the ER and cis-Golgi.
N-glycans can then be extended by glycosyltransferases in the multiple Golgi compartments, potentially resulting in an extreme diversity of structures on mature glycoproteins in an organism-, cell-, or regulation-specific manner. The diverse mammalian N-glycans can be crudely classified into three conventional classes: high mannose, hybrid, and complex type, Fig. 1B. However, it is becoming clear that other unconventional N-glycan classes such as paucimannosidic and chitobiose core types decorate some mammalian glycoproteins (see below) (7, 13, 18-21). The structures, biosynthetic pathways, and associated disorders of N-glycosylation have been recently reviewed elsewhere and readers are encouraged to use these resources for a deeper introduction (16,22).
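As an illustration of the sequon rule described above, the following minimal Python sketch scans a protein sequence for canonical Asn-Xxx-Ser/Thr (Xxx ≠ Pro) sequons and, optionally, for the relaxed Asn-Xxx-Cys motif. The function name, the pattern names, and the toy sequence are hypothetical and chosen only for illustration; a matching sequon indicates a potential site, not that the site is actually occupied.

```python
import re

# Zero-width lookaheads are used so that overlapping sequons (e.g. "NNTS")
# are all reported. Both patterns exclude Pro at the Xxx position.
CANONICAL_SEQUON = re.compile(r"(?=N[^P][ST])")  # Asn-Xxx-Ser/Thr, Xxx != Pro
RELAXED_SEQUON = re.compile(r"(?=N[^P]C)")       # Asn-Xxx-Cys (nonconsensus)


def find_sequons(sequence, pattern=CANONICAL_SEQUON):
    """Return the 1-based positions of Asn residues that start a sequon."""
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "MKNLTAANPSGNGTQNACK"  # hypothetical sequence, not a real protein
    print("canonical:", find_sequons(toy_sequence))
    print("relaxed:  ", find_sequons(toy_sequence, RELAXED_SEQUON))
```

In the toy sequence, Asn8 is followed by Pro and is therefore skipped, whereas Asn16 is reported only by the relaxed pattern.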
A substantial proportion of mammalian genomes is dedicated to genes encoding proteins involved in glycosylation pathways, and these are highly conserved. Consistent with this, glycosylation is central to many biological processes. N-glycans are critical for enabling efficient glycoprotein folding and for maintaining the structural and functional integrity of glycoproteins (16). Protein N-glycosylation is also intimately associated with development processes (23,24), with facilitating or preventing bacterial binding to the host (25) and with sustaining the normal function of individual cells, tissues, organs, and organisms (26). Finally, protein glycosylation is a strictly regulated modification process in healthy cells, and these biochemical processes are dysregulated in various pathologies including, but not restricted to, cancer (27,28), inflammation (29), Alzheimer's disease (30), multiple sclerosis (31), and cystic fibrosis (25). Disease-associated changes in protein glycosylation may arise from changes in glycoprotein abundance, glycosylation site occupancy (macro-heterogeneity or "glycosylation efficiency"), or glycan micro-heterogeneity at different sites on a glycoprotein (see details below). Changes in the glycoprotein micro-heterogeneity are dictated by the capacity of glycan-processing enzymes (glycosidases and glycosyltransferases) in the glycoprotein biosynthesis pathway, the nature of specific glycoprotein substrates, and other cellular factors (3,4).
FIG. 1. Overview of the biosynthesis and structural classes of mammalian protein N-glycosylation. A, Schematic summary of the biosynthetic machinery of N-glycoproteins. The enzymatic processing, which is initiated while the glycoproteins are still being translated, translocated, and folded, may terminate at any point in the enzymatic sequence depending partially on the Asn solvent accessibility of the maturely folded glycoprotein (3). This generates site-, cell-, and even subcellular-specific glycoform heterogeneity forming one of the functionally most important features of the glycoproteome (5), and also creates substantial analytical challenges. TGN: trans-Golgi network. B, Mammalian N-glycoproteins are typically divided into three main N-glycan classes: high mannose, hybrid, and complex type. Unusual paucimannosidic and chitobiose core type N-glycans arising from unconventional truncation pathways (dashed box) have been reported in specific cell types and physiological conditions (21). Monosaccharide residues are depicted according to the established nomenclature (180) with residual monoisotopic masses provided.
This review summarizes the present analytical tools and technologies capable of performing large-scale (systemwide) analysis of protein N-glycosylation micro-heterogeneity and the unique structural insights that can be derived from such experiments by covering the very latest literature describing recent technological developments and applications in LC-MS/MS-based qualitative and quantitative N-glycoproteomics.
System-wide Structural Analysis of Protein N-glycosylation-Deciphering the glycosylation 'code' has been the ambition of generations of glycobiologists. The ability to accurately characterize the structure of glycoproteins is necessary if we are to succeed in our quest to unravel the diverse functions of glycans and develop the next generations of glycoprotein-based therapeutics (32). However, glycoproteins are challenging to characterize because of the multiple layers of structural diversity that form a spectrum of chemically similar glycoforms. The information needed to unambiguously characterize such heterogeneous glycoproteins is consequently much larger than for unmodified proteins or for proteins with structurally simple modifications such as phosphorylation or methylation. Even the most detailed modern glycoprotein characterization studies usually only capture part of the glycoprotein structure. Some glycoprotein structural features can be inferred or predicted from the biosynthetic constraints of the well-studied glycosylation machinery of mammalian cells (33). Nevertheless, it is important to stress that even incomplete structural information can often be very useful in deciphering the structure/function relationships of glycoproteins, and in identifying alterations in the biosynthetic glycosylation machinery.
Glycoproteomics is the site-specific analysis of the glycoproteome at the systems level, Fig. 2A. Glycoproteomics experimental workflows are typically initiated with protein extraction from biological samples, denaturation, and protease digestion, Fig. 2B. At this step, isotopic labels assisting in glycopeptide quantitation or enhancing their MS features (e.g. ionization and fragmentation) can be introduced. The resulting peptide mixtures are often extremely complex, and glycopeptides are consequently typically enriched and/or prefractionated prior to detection, usually by LC-MS/MS. Glycoproteomics experiments are commonly based on the identification and, less frequently, also the quantitation of intact glycopeptides. Glycoproteomics yields system-wide information on the glycoprotein carriers, the glycan attachment sites, the occupancies of glycosylation at each site, and the structure and heterogeneity of the attached glycans. As showcased in this review by recent examples, glycoproteomics is a powerful technology to map disease-associated alterations in the glycoproteome, Fig. 3. Such glycosylation alterations may originate from multiple tissues, which may be differently regulated during pathogenesis. Typically, in glycoproteomics investigations, intact glycopeptides derived from the total complement of glycoproteins extracted from bodily fluids or complex tissues from healthy and diseased individuals (or other biological scenarios) are qualitatively and quantitatively compared. By also measuring any changes in glycoprotein abundance and site occupancy, the exact mechanism(s) contributing to the observed glycoproteome regulation can be interrogated (see example below).
Many other analytical approaches can be used to characterize aspects of glycoprotein structural diversity. These include site-specific analysis of N-glycoproteins isolated to relative purity rather than in complex mixtures (11, 12, 20, 34-42), N-glycomics analyses of glycans released from glycoproteins (18, 38, 43-47), and identification and quantification of previously glycosylated sites on de-N-glycosylated proteins ("deglycoproteomics") after removal of the entire glycan or with remnant N-glycan core remaining (48-57). Although these studies per se do not qualify under our definition of glycoproteomics (site-specific analysis of the glycoproteome at the intact glycopeptide level), they still provide useful information in conjunction with glycoproteomics for the glycobiologist, provided correct experimental design is applied (58).
Key Technologies and Recent Analytical Developments in N-glycoproteomics-In 2014, we reviewed the status of modern LC-MS/MS-based glycoproteomics (59). Other excellent recent reviews are also available (15, 30, 60-68). Thus, this review will highlight the very latest (~2014-present) technological advances and applications in glycoproteomics, which have been instrumental for the recent performance improvements in detection limits, accuracy of glycopeptide identification and quantitation, and gains in glycoproteome coverage.
FIG. 3. Three fundamental levels of molecular dysregulation of multiple tissues contributing to an altered secreted N-glycoproteome during disease. A, Hypothetical example illustrating three sources of dysregulation: 1) protein level (green), 2) site occupancy (blue), or 3) glycosylation micro-heterogeneity (red) from three separate tissues (Tissue A-C) contributing to a joint secreted N-glycoproteome (Protein A-C) in a body fluid derived from disease (right) and 'normal' healthy (left) condition. B, After proteolysis and enrichment, the altered abundance of the resulting glycopeptides can be detected using LC-MS/MS-based label-free quantitative glycoproteomics as shown by color-coded traces representing extracted ion chromatograms (XICs). However, establishing which of the three mechanisms causes the glycopeptide alterations for the detected glycoproteins may be challenging solely with glycopeptide analysis, especially in glycoproteomes arising from multiple tissues. Parallel quantitative proteomics and "deglycoproteomics" (detection of formerly occupied N-sites) of the same samples can assist in this task.
The modern discipline of glycoproteomics has deep roots in the protein and carbohydrate analytical chemistry pioneered in the late 1980s and early 1990s with the advent of biomolecular mass spectrometry (69). Impressive analytical strategies using relatively insensitive MS instrumentation were developed e.g. the selective detection of glycopeptides in mixtures using deglycosylation-based mass shifts (70) and selection ion chromatograms (71), and the fundamentals of glycopeptide ionization and fragmentation behavior were accurately described (72,73). These early studies remain a solid foundation on which many modern glycoanalytical strategies are conceived. It is also clear that glycoproteomics more recently has profited handsomely from technology developments arising from the larger and more mature discipline of proteomics including sample handling, LC-MS/MS acquisition strategies, and data handling and processing (74). In parallel, glycoproteomics has been a beneficiary of the continual performance enhancements of modern mass spectrometers including improved speed, sensitivity, resolution, and accuracy, most notably implemented on the latest Q-TOF (Sciex, Waters, Agilent, Bruker) and on multiple Orbitrap (Thermo) instrument platforms (75). Developments and applications of several key glycoproteomics-specific technologies have additionally been critical for the rapid maturation of glycoproteomics workflows over the past two years, Table I. Specifically, key advances have been made in the enrichment of intact glycopeptides from complex peptide mixtures, in LC-MS/MS-based detection of intact glycopeptides through optimized dissociation and acquisition styles of glycopeptides, and in data handling for more automated, yet still confident, glycopeptide identification and quantitation.
Enrichment and Prefractionation Strategies for Glycoproteomics-Because of the substoichiometry of glycopeptides in complex peptide mixtures arising from the extensive glycan micro- and macro-heterogeneity, and their inherently poor detectability, glycopeptide enrichment is a critical component of glycoproteomics experiments. Recent advances in glycopeptide enrichment have been fully reviewed elsewhere (76-79). Exciting initiatives in glycopeptide enrichment strategies include the optimized use of boronic acid as a reversible glycopeptide capture method on magnetic graphene (80) and on magnetic microspheres (81). Boronic acid, which reacts with cis-diol-containing monosaccharide residues, has also recently been used to enrich intact glycopeptides (82) and glycoproteins (83), but remains an infrequently used glyco-enrichment method. Solid phase hydrazide-based glycopeptide capture has also seen developments (48) and applications (see Table I), but this approach is most commonly used in conjunction with peptide N-glycosidase F-catalyzed release and analysis of formerly N-glycosylated peptides and does not satisfy our definition of glycoproteomics. Other new enrichment methods of interest include the metabolic incorporation of N-azido sugars into N-glycopeptides to facilitate their specific enrichment and detection (84,85). The selective precipitation of glycopeptides by acetone (86), the use of size exclusion chromatography (87,88), and the combined use of porous graphitized carbon (PGC) and reversed phase (RP) (89) and titanium dioxide (90) solid phase extraction (SPE) for efficient enrichment of glycopeptides and sialoglycopeptides, respectively, are also promising developments. However, common to most of these proof-of-principle methodology studies is the need for further validation to demonstrate their true potential in glycoproteomics.
The frequently used zwitterionic-HILIC SPE-based methods for enrichment and analysis of intact N-glycopeptides (91-93) have been further tested. Usefully, it was found that N-glycopeptides are still efficiently retained when using higher concentrations (0.1%) of surfactants and detergents including SDS and Triton X-100 (94). In addition, other HILIC phases were used to enhance the loading capacity (95) and have been synthetically tweaked by using "click-chemistry" (96,97) and by introducing mixed mode-HILIC retention mechanisms (98 -102). As all N-glycopeptides harbor a minimum degree of localized hydrophilicity arising from a high density of polar hydroxyl groups, HILIC remains the most used and, in our opinion, the most efficient and least biased enrichment method facilitating large-scale analysis of intact and native (nonderivatized) glycopeptides in N-glycoproteomics.
Only a few developments in the off-line separation and fractionation of glycopeptides prior to LC-MS/MS detection have recently been published. Some glycoproteomics approaches even bypass this step because of the increased capacity of modern LC-MS/MS instrumentation to handle the extreme complexity of biologically relevant glycopeptide mixtures, and perhaps because the multiple LC fractions resulting from off-line separations dramatically increase the required LC-MS/MS instrument time, lower the overall sensitivity of the analysis and complicate the downstream quantitative data analysis (103,104). However, multiple glycoproteomics strategies, particularly in studies where the amounts of biological sample material are not a limiting factor, still opt to use off-line separation to increase the glycoproteome coverage (46, 105-107). The most exciting new developments in this area include the off-line use of neutral/high pH (pH 7-10) RP-LC (7, 108), which displayed orthogonal glycopeptide retention behavior relative to the conventional acidic (~pH 2) RP-LC-(online) MS/MS. Importantly, in this system the higher pH RP separation maintained the "clustering effect" by eluting glycopeptide families with the same peptide backbone in narrow bands within single LC fractions, enabling their downstream identification and quantitation within a single LC-MS/MS analysis (106).

MS Acquisition Strategies in Glycoproteomics-LC-MS/MS-based detection of glycopeptides has seen many exciting advances in the past two years. Jointly, these advances have significantly contributed to the enhanced performance of glycoproteomics workflows. As reviewed (109), much focus has been directed to developing, optimizing and combining informative dissociation methods to enable confident large-scale N-glycopeptide identification. Systematic studies have investigated the glycopeptide (glycan and peptide) sequence coverage as a function of several variables e.g. the adduct formation, proton mobility and (normalized) collision energy in higher-energy collision dissociation (HCD) (110), beam-type (Q-TOF) CID (111-113), and the combined use of resonance activation (ion trap) CID and HCD (77). Conventional electron transfer dissociation (ETD), HCD, and ion trap CID, which are often activated as "back-to-back" alternating dissociation events on the same isolated precursor ions or alternatively engaged in separate LC-MS/MS runs, are the most popular dissociation techniques because of their complementary nature when applied to N-glycopeptides. Electron capture dissociation (ECD), typically used on Fourier transform ion cyclotron resonance instrumentation, is a less used (and less efficient) electron-based dissociation method in glycopeptide analysis (114,115). No single dissociation method yields information on the entire structural detail of N-glycopeptides. In general terms, the c/z-ion series formed in ETD-MS/MS yield information on the glycosylation site and the peptide identity, but only reveal the glycan as a "blind" modification with an accurate mass, Fig. 4. HCD produces abundant diagnostic oxonium ions and partial B- and Y-ion series for glycopeptide classification and solid b/y-ion series for confident peptide identification (116,117), but typically without the Asn-conjugated glycan, which is lost in the fragmentation process. The glycopeptide fragments generated by beam-type CID on Q-TOF instrumentation resemble the glycopeptide fragmentation pattern produced by HCD in the C-trap of Orbitrap mass spectrometers.
In contrast, resonance activation CID as generated on linear and 3D (Paul) ion traps produces more complete B- and Y-ion series (and even occasionally C- and Z-ions) as well as some oxonium ions to identify the conjugated glycan, but typically neither the site nor the carrier peptide is confidently identified, and the peptide is often left as a "blind" mass. In addition, ion trap CID induces mostly single bond cleavages of glycopeptides and some important oxonium ions may therefore not be formed by this dissociation method. Other limitations of ion traps are the reduced resolution and mass accuracy relative to high resolution mass analyzers and the cut-off in the low-m/z region, which can impede glycopeptide identification by masking the important diagnostic oxonium ions. ETD is valuable for eliminating incorrect assignments from unexpected peptide modifications e.g. carbamidomethylation side-reactions (118) or nonspecific protease activity (119), but may still compromise the identification accuracy in the case of unexpected glycan derivatization e.g. from reactive buffers (120). The "hybrid" dissociation method, electron transfer and higher-energy collision dissociation (EThcD), where a supplemental energy is applied to all fragment ions formed by ETD, including the usually abundant unreacted precursor ions, has shown great potential by the production of richer and more informative spectra consisting of both b/y- and c/z-ion series, thereby increasing the sequence coverage of both nonmodified peptides (121) and N-glycopeptides (105,122). Although engaging multiple dissociation modes is becoming more popular, some researchers still opt for applying single fragmentation techniques i.e. HCD (46) and ETD (7) in glycoproteomics, for example because of the nature of the research question (e.g. only investigating the site-specific glycan monosaccharide composition), limited availability of instruments allowing multiple fragmentation modes or to avoid the longer MS duty cycle associated with approaches using alternating/hybrid dissociation methods.
TABLE I (footnote). *Hydrazide is usually used to capture N-glycopeptides with a subsequent peptide N-glycosidase F release of formerly N-glycosylated peptides for glycosylation site mapping and is, thus, not a generally used tool in glycoproteomics.
FIG. 4 (footnotes). *N-Glycan identification is usually restricted to specification of the monosaccharide composition and the partial or complete topology. **The b/y-ions in the beam-type CID (HCD) and EThcD dissociation schemes usually lose the conjugated glycan and therefore do not provide information about the glycosylation site.
The abundant formation of the glycopeptide-specific oxonium ions upon beam-type CID (HCD) has been utilized to create more intelligent glycoproteomics acquisition styles that enable more instrument time being spent on fragmenting glycopeptides rather than the usually more abundant nonglycosylated peptides in biological mixtures. The GlcNAc-derived fragments m/z 138.0545 and m/z 204.0867 are particularly useful diagnostic ions for such acquisition styles. In fact, the same oxonium ions are often generated upon beam-type CID (HCD) fragmentation of GalNAc-containing peptides, albeit in different intensities (123), meaning that these acquisition strategies are also useful for O-glycopeptides. The saccharide oxonium ion profile derived from glycopeptide fragmentation can even provide valuable supporting information of the monosaccharide compositions of the attached N-glycans (123,124). The original product-dependent (PD) method, which triggered ETD events upon recognition of m/z 204.086 in HCD-MS/MS (125), was lately followed up with a more informative approach where alternating ion trap CID and ETD events are triggered upon oxonium ion recognition in HCD-MS/MS spectra (126). In another innovative LC-MS/MS study, N-glycopeptides in nonenriched peptide mixtures were crudely isolated (postacquisition) from their nonglycosylated counterparts based on their accurate mass (127). This type of glycopeptide classification was based on a common mass defect of N-glycopeptides arising from their oxygen-rich elemental compositions. This unique property can be used to filter large LC-MS/MS data sets for the presence of N-glycopeptides.
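The oxonium ion-based triggering logic described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used on any instrument or in any cited study: the function names, the 20 ppm tolerance, and the relative-intensity threshold are assumptions, and only the two GlcNAc-derived diagnostic ions named in the text are included.

```python
# Diagnostic oxonium ions named in the text (singly charged m/z values).
OXONIUM_IONS = {
    "HexNAc fragment": 138.0545,
    "HexNAc": 204.0867,
}


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def matched_oxonium_ions(peaks, tol_ppm=20.0, min_rel_intensity=0.05):
    """Return the names of diagnostic ions found in an MS/MS peak list.

    peaks: list of (m/z, intensity) tuples from one HCD-MS/MS spectrum.
    tol_ppm and min_rel_intensity are illustrative (assumed) thresholds.
    """
    if not peaks:
        return []
    base_peak = max(intensity for _, intensity in peaks)
    found = []
    for name, target_mz in OXONIUM_IONS.items():
        for mz, intensity in peaks:
            within_tol = abs(ppm_error(mz, target_mz)) <= tol_ppm
            if within_tol and intensity >= min_rel_intensity * base_peak:
                found.append(name)
                break
    return found


def should_trigger(peaks, required=1, **kwargs):
    """Decide whether a follow-up (e.g. ETD) scan would be triggered."""
    return len(matched_oxonium_ions(peaks, **kwargs)) >= required


if __name__ == "__main__":
    glyco_spectrum = [(138.0549, 500.0), (204.0871, 1000.0), (366.1400, 300.0)]
    plain_spectrum = [(175.1190, 1000.0), (204.1343, 800.0)]
    print(should_trigger(glyco_spectrum))  # both diagnostic ions within tolerance
    print(should_trigger(plain_spectrum))  # m/z 204.1343 is far outside 20 ppm
```

A tight mass tolerance matters here because nonglycosylated peptides can produce fragments of the same nominal mass; in the second toy spectrum the m/z 204.1343 peak is rejected for that reason.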
The growing volume of glycoproteomics literature can be used to identify trends and common methods utilized across multiple laboratories. Convenient examples include the common use of RP-LC as a standardized chromatographic platform to separate glycopeptides in an online-conjugated manner, together with the LC-ESI-MS/MS acquisition style, which commonly utilizes data-dependent acquisition (DDA) to detect native (underivatized) intact glycopeptides at high resolution/high mass accuracy (better than 20 ppm accuracy of precursor and product ions) in positive ion polarity mode. Such approaches are shared by most glycoproteomics researchers, see Table I. In fact, compared with the more established discipline of glycomics that still, based on lab-to-lab or researcher-to-researcher preferences and traditions, utilizes a wide spectrum of analytical approaches to achieve structural information of glycans (128), glycoproteomics strategies appear surprisingly uniform, likely because of similarities with standard proteomics workflows. However, the literature does harbor recent examples of studies exploring new avenues deviating from these mainstream approaches. For example, ion mobility MS was used to separate and analyze intact ovomucoid glycopeptides (40) and urine glycopeptides from a patient diagnosed with Schindler disease (129). Furthermore, recent studies have explored the feasibility of measuring deprotonated glycopeptides in negative ion mode with (130) and without (131) derivatization. Innovative strategies for the derivatization of sialic acid residues have also been designed to achieve linkage-specific (α2,3- versus α2,6-sialyl) information at the glycopeptide level (132) and to obtain more equal ionization response from neutral and sialylated glycopeptides (133). Furthermore, attempts to generate universal workflows using Pronase digestion and C18-PGC LC-MS/MS instead of the conventional RP-LC-MS/MS of tryptic glycopeptides were described (39,134).
In addition, data-independent acquisition (DIA), referred to as "SWATH-MS" on Sciex MS platforms and "MSE" on Waters MS platforms, has also been trialed to monitor both micro- (135,136) and macro- (136-138) heterogeneity of protein N-glycosylation using LC-MS/MS instrumentation. However, common to these less used, but still very exciting, analytical choices is the fact that much work is still needed to accurately document their potential in glycoproteomics relative to their already implemented counterparts.
Interesting developments have been described in targeted glycopeptide detection and quantitation strategies, as recently reviewed (139). Several applications of multiple reaction monitoring (MRM)-based glycopeptide analysis from a variety of biological systems and diseases were lately published including the use of MRM-based serum glycoproteomics to study esophagus diseases (140), liver disease (141), and the immunoglobulin subclasses (134,142,143). Among other technical challenges unique to MRM-based glycopeptide detection and quantitation, one of the most important considerations is the selection of the precursor-product transitions being monitored. Needless to say, the transitions must be unique for the glycopeptide(s) of interest. The low-mass oxonium ions may not provide sufficient specificity because they can arise from multiple glycopeptides. It is anticipated that the need for targeted glycopeptide analysis will increase in the immediate future as a direct result of the maturation of discovery-type glycoproteomics technologies, yielding more "glycopeptide candidates" to target. MRM-based strategies are ideal for targeted analyses because of their high sensitivity, dynamic range and throughput directly from complex biological matrices requiring minimal sample handling. As recently summarized (15), several recent studies also report large-scale (proteome-wide) quantification of N-glycosylation site occupancy. Novel deglycosylation-based (144) or nondeglycosylation-based (145) analytical LC-MS/MS approaches have been established for the accurate quantitation of glycosylation site occupancy. These analytical platforms were applied to measure site occupancy in various biological settings including in plants (146), human saliva (83), and yeast (137). Recently, an alternative method using differential glycoprotein oxidation and labeling was utilized to map the sialoglycan occupancy on specific protein N-glycosylation sites (145). 
These methods and the results they produce are highly interesting and of great value to glycobiologists, and emphasize the diverse analytical techniques applicable for glycoprotein analysis.
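The notion of glycosylation site occupancy (macro-heterogeneity) can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the signals of the occupied and unoccupied forms of a site have been measured on a comparable scale (for example, summed XIC areas of the formerly glycosylated and the never-glycosylated peptide in a deglycosylation-based experiment) and that both forms respond equally in the mass spectrometer, which is generally not guaranteed. The function names and site labels are hypothetical.

```python
def site_occupancy(occupied_signal, unoccupied_signal):
    """Fraction of a site that carries a glycan, from two summed signals."""
    total = occupied_signal + unoccupied_signal
    if total <= 0:
        raise ValueError("no signal recorded for this site")
    return occupied_signal / total


def occupancy_table(signals):
    """Map {site: (occupied, unoccupied)} to {site: occupancy fraction}."""
    return {
        site: site_occupancy(occupied, unoccupied)
        for site, (occupied, unoccupied) in signals.items()
    }


if __name__ == "__main__":
    # Hypothetical summed XIC areas (arbitrary units) for two sites.
    example = {"Asn88": (3.0e6, 1.0e6), "Asn120": (0.0, 5.0e5)}
    for site, fraction in occupancy_table(example).items():
        print(f"{site}: {fraction:.0%} occupied")
```

Comparing such fractions between conditions, together with protein-level abundance, helps separate changes in occupancy from changes in glycoprotein expression.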
Glycoinformatic Tools for Glycopeptide Analysis-Following the acquisition step of large-scale glycopeptide data, a bottleneck that has severely limited the field of glycoproteomics is downstream glycopeptide identification. The identification process was, until recently, largely driven by manual expert annotation of the resulting MS/MS spectra. As described in topical reviews (122, 147-149), many glycoinformatics initiatives have been launched to automate the glycopeptide identification process. Recent commercial and open-source software developments facilitating automated glycopeptide identification or providing assistance in the process include Byonic™ (150) and Protein Prospector (151), among others (see Table I). The different programs utilize various strategies to identify N-glycopeptides including the use of characteristic Y1 ions, oxonium ions, B/Y- and C/Z-type glycan fragment ion series, and b/y- and c/z-type peptide fragment ions (116,117). In order to limit the search space and enhance the detailed structural knowledge of the glycans encountered in a given glycoproteome, several studies have performed parallel N-glycome profiling (21,40,88,105,106,108,162). This so-called 'glycomics-assisted glycoproteomics' approach can also be complemented with quantitative proteome and de-N-glycoproteome profiling (mapping of formerly N-glycosylated peptides) (21, 46, 88, 103, 105-108, 135, 159, 162), which reduce the search space even further and provide supporting evidence to pinpoint the exact mechanism(s) driving the observed glycoproteome alterations. At this stage, for most of the available glycopeptide identification software tools it is strongly advised that manual validation of the glycopeptide assignments is still performed to generate sufficient confidence in the reported identifications.
This is particular true for a subset of glycopeptides that are known to cause frequent misassignments including 1) peptides with susceptibility for unexpected peptide modifications/deriva-tizations e.g. Met carbamidomethylation (addition of CH 2 -CO-NH 2 ) (⌬m ϭ 57 Da/41 Da from nonmodified/oxidized Met equating to the mass difference of HexNAc/Fuc and HexNAc/ Hex, respectively) (118), 2) peptides carrying isobaric (Fuc 1 NeuAc 1 -R versus Hex 1 NeuGc 1 -R, ⌬m ϭ 0 Da) or nearisobaric monosaccharide subcompositions (e.g. Fuc 3 -R vs. NeuAc 1 Fuc 1 -R, ⌬m ϭ 1 Da), the latter relying crucially on correct monoisotopic peak picking, and 3) unexpected analyte formation e.g. noncovalent glycopeptide dimer formation (163). Analysts should also be aware that peptides may harbor multiple (N-and O-) glycosylation sites and that the established monosaccharide composition may therefore not necessarily be restricted to a single glycan moiety on a single site. Similar to proteomics, it is necessary that glycoproteomics moves toward false discovery rate (FDR)-based identifications, where the accuracy of the reported glycopeptides are statistically assessed by a joint probability score for the entire glycopeptide structure or by multiple probability scores to assess the correctness of the individual components of the identified glycopeptide i.e. the peptide, site and monosaccharide composition. However, assigning appropriate and meaningful FDRs to glycopeptide identifications involves more variables than generating FDRs of nonmodified peptide identifications and has consequently proven challenging and is still in its infancy (158).
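The mass coincidences behind these misassignments can be checked with a few lines of arithmetic. The following Python sketch is an illustration only and is not taken from any of the cited software tools; the function and variable names are hypothetical. It computes monoisotopic masses from elemental compositions of the glycosidically linked residues and evaluates the carbamidomethylation coincidences, the exactly isobaric Hex1NeuAc1/Fuc1NeuGc1 pair (Fuc is a deoxyhexose and NeuGc carries one more oxygen than NeuAc, so the oxygen count balances), and the near-isobaric Fuc3/NeuAc1Fuc1 pair.

```python
# Monoisotopic atomic masses in Da (standard values).
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Elemental compositions of glycosidically linked residues (free sugar minus H2O).
RESIDUE = {
    "Hex":    {"C": 6,  "H": 10, "O": 5},
    "HexNAc": {"C": 8,  "H": 13, "N": 1, "O": 5},
    "Fuc":    {"C": 6,  "H": 10, "O": 4},
    "NeuAc":  {"C": 11, "H": 17, "N": 1, "O": 8},
    "NeuGc":  {"C": 11, "H": 17, "N": 1, "O": 9},
}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(ATOM[element] * count for element, count in formula.items())


def composition_mass(composition):
    """Mass of a monosaccharide composition given as {residue: count}."""
    return sum(formula_mass(RESIDUE[name]) * count
               for name, count in composition.items())


def delta_mass(a, b):
    """Mass difference (a minus b) between two monosaccharide compositions."""
    return composition_mass(a) - composition_mass(b)


# Net mass added by carbamidomethylation (C2H3NO) and by oxidation (O).
CARBAMIDOMETHYL = formula_mass({"C": 2, "H": 3, "N": 1, "O": 1})
OXIDATION = formula_mass({"O": 1})

comparisons = {
    "HexNAc - Fuc": delta_mass({"HexNAc": 1}, {"Fuc": 1}),
    "HexNAc - Hex": delta_mass({"HexNAc": 1}, {"Hex": 1}),
    "Hex1NeuAc1 - Fuc1NeuGc1": delta_mass({"Hex": 1, "NeuAc": 1},
                                          {"Fuc": 1, "NeuGc": 1}),
    "Fuc3 - NeuAc1Fuc1": delta_mass({"Fuc": 3}, {"NeuAc": 1, "Fuc": 1}),
}

print(f"carbamidomethyl: {CARBAMIDOMETHYL:.4f} Da")
print(f"carbamidomethyl minus oxidation: {CARBAMIDOMETHYL - OXIDATION:.4f} Da")
for label, value in comparisons.items():
    print(f"{label}: {value:.4f} Da")
```

Because the near-isobaric pair differs by roughly one dalton, which is close to the spacing between isotopic peaks, picking the first 13C isotope peak instead of the monoisotopic peak is enough to swap one composition for the other.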
Quantitative Glycoproteomics-As discussed in more detail below, the first studies performing isotope-assisted quantitative glycoproteomics using stable isotope labeling with amino acids in cell culture (SILAC) (105) and isobaric tags for relative and absolute quantitation (iTRAQ) (46,108) have recently been published. These approaches have enabled accurate relative quantitation of single glycoforms on individual protein sites by comparing the precursor (SILAC) or the product (iTRAQ) ion intensities across two or more conditions. However, as we discussed in a recent review (67), comparing the entire glycoform distribution (micro-heterogeneity) at individual glycosylation sites in separate conditions may be more informative in glycobiology. Both isotope-assisted quantitative glycoproteomics (46,105,108,164) and label-free (based on XIC area, precursor intensity or spectral count) quantitative glycoproteomics (21,104,106,159,165) can generate quantitative information on site micro-heterogeneity. Importantly, if the ambition is to understand the mechanisms driving alterations of the glycoproteome in specific conditions (see Fig. 3), parallel proteome profiling and site occupancy experiments are typically required. Recently, isotope-coded carbamidomethylation was shown to be a simple and inexpensive alternative to SILAC and iTRAQ-based glycopeptide labeling and quantitation, but this strategy suffers from the severe limitation that it is only applicable to cysteine-containing glycopeptides (166). Stable isotope dimethyl and succinic anhydride labeling appear to be more widely applicable alternatives when aiming for a universal glycopeptide analytical workflow (167,168).
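As an illustration of label-free profiling of site micro-heterogeneity, the minimal Python sketch below converts XIC areas into a relative glycoform distribution per glycosylation site. It is not the procedure of any cited study: the function name, the site label and the areas are hypothetical, the areas are assumed to be already summed over charge states, and all glycoforms of a site are assumed to ionize with comparable efficiency, which is a simplification.

```python
from collections import defaultdict


def glycoform_distribution(xic_areas):
    """Convert XIC areas into relative glycoform abundances per site.

    xic_areas maps (site, glycan_composition) to an XIC area.
    Returns {site: {glycan_composition: fraction}}; the fractions of each
    site sum to 1 when at least one glycoform has a non-zero area.
    """
    site_totals = defaultdict(float)
    for (site, _glycan), area in xic_areas.items():
        site_totals[site] += area

    distribution = defaultdict(dict)
    for (site, glycan), area in xic_areas.items():
        total = site_totals[site]
        distribution[site][glycan] = area / total if total > 0 else 0.0
    return dict(distribution)


# Hypothetical XIC areas for one site measured in two conditions.
control = {("N123", "HexNAc2Hex5"): 3.0e6,
           ("N123", "HexNAc4Hex5NeuAc2"): 1.0e6}
treated = {("N123", "HexNAc2Hex5"): 1.0e6,
           ("N123", "HexNAc4Hex5NeuAc2"): 1.0e6}

control_profile = glycoform_distribution(control)
treated_profile = glycoform_distribution(treated)

for glycan, fraction in control_profile["N123"].items():
    change = treated_profile["N123"][glycan] - fraction
    print(f"N123 {glycan}: control {fraction:.2f}, change {change:+.2f}")
```

Comparing the two profiles shows a shift in the glycoform distribution at the site, but it cannot by itself reveal whether the carrier protein or the site occupancy also changed, which is why the parallel proteome and occupancy measurements remain necessary.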
Glycoproteomics Databases-Glycoproteome data deposition and storage are important aspects to consider at the end of the glycoproteomics pipeline. Large glycoproteome data sets often harbor information of great value to glycobiologists pursuing other research questions that can tentatively be probed using the acquired data (62). This naturally requires the data (processed or raw) to be freely available in a data repository (e.g. ProteomeXchange and PRIDE) and the experimental design and conditions to be explicitly described. In analogy to the MIAPE and MIRAGE guidelines detailing the minimum information required for proteomics (169) and glycomics (170) experiments, respectively, it is desirable that an equivalent set of guidelines is designed and implemented for published glycoproteomics data. The curated data, in the form of the identified N-glycan structures at specified protein sites, have begun to be captured and stored at established glycoinformatic platforms such as UnicarbKB (http://unicarbkb.org) (171) for easy browsing and relevant connectivity to the larger protein-centric UniProt (http://uniprot.org) knowledge repository and the glycan-centric UnicarbDB (http://unicarb-db.biomedicine.gu.se) MS/MS database. Informatics initiatives aiming at integrating glycoproteomics data and other glyco- and protein-specific data as well as enzyme information at a higher level have also been presented (147). Taken together, it is clear that the recent advances in glycoproteomics strategies are driven by exciting initiatives in all components of the analytical workflow. As described below, the significant aggregate value of these advances is tangible in recent glycoproteomics applications reporting thousands of unique N-glycopeptides from biological protein samples.
Glycoproteomics Facilitates Insight in the Biosynthesis and Regulation of the N-glycoproteome-The technological advances described above have facilitated an increasing number of glycoproteomics-centered studies in recent years, Table II. As expected, considering their diverse experimental design (sample origin/amount, sample preparation, LC-MS/MS acquisition, and data handling strategies), these glycoproteomics studies differed markedly in their N-glycoproteome coverage, with some reporting fewer than 100 unique glycopeptides from fewer than 50 glycoproteins and others reporting thousands of unique glycopeptides from several hundred glycoproteins, Fig. 5. Using this metric to evaluate analytical performance, it becomes clear that glycoproteomics is now capable of achieving relatively deep coverage of the N-glycoproteome, albeit far from the extensive proteome coverage provided by modern ultra-sensitive proteomics technologies (172). This is particularly true when considering the immense size of the glycoproteome, which is orders of magnitude larger than the proteome. We would like to emphasize that, although deep glycoproteome coverage is worth pursuing in our efforts to enhance our understanding of the structures and regulation of the glycoproteome, important glycobiology can be learnt from glycoproteomics experiments without access to the latest high-performance MS instrumentation, provided that an intelligent experimental design and hypothesis are applied.
Glycoproteomics applications can be crudely classified into two types: deep glycoproteome profiling and comparative glycoproteomics. The former seeks to perform deep structural exploration of N-glycoproteomes derived from diverse bodily fluids, tissues/cells or even subcellular compartments, thereby providing species-, tissue-, and protein-specific knowledge of the N-glycoproteome produced at a single condition, time and space (13,21,88,103,135,159,162). Although predominantly qualitative and descriptive in nature, such efforts enhance our fundamental knowledge of glycobiology. This is particularly relevant considering the largely unexplored state of the N-glycoproteome even in many accessible biological tissues and fluids (173). Knowledge obtained from such efforts includes a spatiotemporal understanding of protein- and tissue-specific N-glycosylation patterns (13), information that cannot be captured by (deglyco-) proteomics, glycomics or other -omics technologies. As described further below, deep glycoproteome profiling studies may also expand our understanding of the N-glycosylation machinery by identifying novel N-glycan structures that may only be present on a small subset of glycoproteins or restricted to specific tissues or physiological conditions. As such, deep site-specific glycoprotein profiling may ultimately provide new "structure-driven" insight into the capacity and regulatory mechanisms of the biosynthetic machinery (21).
The other class of glycoproteomics studies aims to identify alterations in a specified "N-glycoproteome system" by comparing two or more biological conditions (46,104,105,107,108). Occasionally, such comparative glycoproteomics studies may also aim to pinpoint the underlying mechanism(s) driving the pathway dysregulation that causes the altered N-glycosylation signatures (46,105). In order to specify which of the three levels of regulation is responsible for the observed glycoproteome alterations (see Fig. 3), the proteome and site-specific occupancies are typically mapped in parallel with site-specific glycoform profiling. The molecular mechanisms controlling regulation can be supported by orthogonal techniques at the protein or glycan level, for example by using antibodies or lectins in Western/lectin blotting, protein/lectin arrays, ELISA, immunofluorescence, or immunohistochemistry (and qPCR/RNAseq at the transcriptional level), or, better, at the glycoprotein level, e.g. by proximity ligation assays (174) or using antibodies directed to shared glycan and polypeptide epitopes (175). Below, two recently published examples of these approaches are discussed to illustrate the potential of modern glycoproteomics.
Cystic Fibrosis Sputum Glycoproteomics-We previously performed N-glycomics of human sputum and found that a significant proportion of the N-glycome of sputum from inflamed and bacterial-infected cystic fibrosis patients (~25-45%) consisted of a new class of N-glycans, the so-called paucimannosidic N-glycans (Fuc0-1Man1-3GlcNAc2,  The glycoform distributions at the individual sites were profiled using XIC-based label-free relative quantitation. From a total of 115 unique N-glycopeptides covering 36 sites distributed across 30 human glycoproteins, paucimannosidic N-glycans were mapped to 35 unique glycopeptides from 23 sites and 18 proteins. The sputum glycoproteome and glycome data suggested that the paucimannosidic structures were carried by proteins located in the azurophilic granules of neutrophils in the (neutrophil-rich) sputum, which was confirmed by immunofluorescence colocalization using subcellular- and paucimannosidic epitope-specific antibodies (21). A new spatially and temporally regulated biosynthetic pathway based on expression of β-hexosaminidases and α-mannosidases in specialized compartments of maturing neutrophils was suggested to generate paucimannosidic glycoproteins (see Fig. 1A for the proposed truncation pathway) (21). Proteins isolated from the azurophilic granule of human neutrophils also carried further truncated epitopes of the chitobiose core type (Fuc0-1GlcNAc1-2, see Fig. 1B) (20). Recently, protein paucimannosylation was detected in other cell and tissue types, including human fetal lung fibroblasts (103), saliva (176), epithelial colorectal (44,177,178) and breast (179) cancer cells, and the mouse brain (19), also using glycoproteomics and other -omics technologies. Glycoproteomics was furthermore used to identify core and noncore fucosylated full and truncated chitobiose core type N-glycans from mouse synaptosome (7) and liver (13).
Taken together, these findings illustrate the potential of modern glycoproteomics technologies to identify new classes of protein glycosylation hidden in the glycoproteome and to study their biosynthesis and spatial and temporal location.
Comparative Glycoproteomics of TNF-induced Insulin Resistance (IR) in Adipocytes-As discussed in more general terms above, glycoproteomics is a powerful tool to map the regulation of glycoproteomes derived from the same cellular system under different biological conditions. A prime example of this application is a recent study by Parker and colleagues, in which the regulation of the N-glycoproteome during TNF-based induction of IR was investigated in vitro in murine adipocytes (105). Wild-type and IR cells were grown in SILAC media and the N-glycome (87 N-glycans), proteome (7258 proteins), and the de-N-glycoproteome (2187 formerly N-glycosylated sites on 1041 proteins) were first profiled. The N-glycoproteome was then profiled using HILIC-SPE enriched and high-pH RP-LC off-line separated glycopeptides using Orbitrap Fusion LC-MS/MS instrumentation. The ten most abundant precursors in each cycle were selected for HCD fragmentation followed by PD (m/z 138.0545 and 204.0867) re-isolation and fragmentation of glycopeptides using EThcD and ion trap CID-MS/MS. In total, 1580 unique N-glycopeptides covering 332 unique N-sites on 154 proteins were confidently identified using Byonic and quantified using the ion intensities of the isotopically labeled precursors. The novelty of this study was the ability to accurately detect glycoproteome alterations at the site-specific level and, importantly, to be able to differentiate their regulation at the protein, glycosylation site occupancy, and glycan level (see Fig. 3 and text for details). The vast majority of the N-glycoproteome alterations observed upon IR-induction could be accounted for by protein regulation; more than 200 proteins were significantly changed in abundance. Very limited differences in site occupancy (only three sites) and glycosylation micro-heterogeneity pattern (only 16 N-glycopeptides) were observed when the altered protein levels were adjusted for.
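The idea of adjusting glycopeptide changes for altered protein levels can be sketched as follows. This is a hedged illustration in Python and not the statistical procedure of the cited study: the function names, the two-fold threshold and the three labels are assumptions made for the example, and site occupancy, which requires de-N-glycoproteome data, is deliberately left out.

```python
import math


def adjusted_log2_ratio(glycopeptide_ratio, protein_ratio):
    """log2 fold change of a glycopeptide after subtracting the log2 fold
    change of its carrier protein (both given as condition/control ratios)."""
    return math.log2(glycopeptide_ratio) - math.log2(protein_ratio)


def classify_change(glycopeptide_ratio, protein_ratio, threshold=1.0):
    """Attribute an observed glycopeptide change to a regulation level.

    threshold is a hypothetical cutoff in log2 units (1.0 = two-fold).
    """
    raw = math.log2(glycopeptide_ratio)
    adjusted = adjusted_log2_ratio(glycopeptide_ratio, protein_ratio)
    if abs(adjusted) >= threshold:
        # The change persists after protein adjustment.
        return "glycosylation-level"
    if abs(raw) >= threshold:
        # The change disappears once protein abundance is accounted for.
        return "protein-level"
    return "unchanged"


# Hypothetical SILAC ratios (IR / wild type) for three glycopeptides.
examples = {
    "glycopeptide A": (4.0, 4.0),  # protein changed, glycoform tracks it
    "glycopeptide B": (4.0, 1.0),  # protein stable, glycoform changed
    "glycopeptide C": (1.1, 1.0),  # no meaningful change
}

for name, (gp_ratio, prot_ratio) in examples.items():
    print(name, classify_change(gp_ratio, prot_ratio))
```

Working in log2 space makes the adjustment a simple subtraction and treats up- and down-regulation symmetrically; a real analysis would additionally need replicate-based statistics rather than a fixed cutoff.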
Interestingly, the conventional proteomics experiments acquired on the Orbitrap Q-Exactive platform proved to be sufficiently sensitive to identify and accurately quantify 94 low-abundant glycosyltransferases and glycosidases involved in the glycosylation biosynthetic pathway. Together with transcriptomics data of altered expression of competing galactosyl- and sialyltransferases, these data yielded important clues to the enzymatic origin of the altered N-glycosylation signature associated with IR-transformation (terminal galactosylation and sialylation switching).
It could be speculated that glycomics or glycoepitope (e.g. lectin blotting)-centric analytical methods may have incorrectly assigned these protein abundance changes to quantitative glycosylation changes. It should also be recognized that other physiological conditions and tissues consisting of multiple cell types (in contrast to the isolated homogeneous cell cultures investigated here) would most likely show much more complex glycoproteome regulation at all three molecular levels (glycan, occupancy, and protein level), supporting the relevance of performing this detailed level of molecular mapping. Similarly impressive glycoproteomics efforts have recently been published in other biological systems using iTRAQ labeling and quantitation of glycopeptides (46,108).
Understanding the exact mechanism(s) contributing to altered glycoproteomes in specific diseases is very important to advance our molecular knowledge of the underlying pathogenesis and when developing accurate biomarkers and targeted therapeutics toward the investigated pathologies. Thus, with the availability of the maturing glycoproteomics technologies, and their potential to provide unique structure-based insight into glycobiology, it is anticipated that similar comparative glycoproteomics strategies will be applied in the immediate future to investigate the glycoproteome regulation in a wide spectrum of biological systems.

CONCLUSIONS
In summary, acknowledging the wealth of biological and chemical information stored in the relatively unexplored N-glycoproteome, the past decade, and the past two years in particular, have seen tremendous efforts to develop and implement efficient strategies for large-scale site-specific analysis of protein N-glycosylation. An increasing number of tangible examples are now available in the literature demonstrating the enormous potential of glycoproteomics. The maturation of glycoproteomics has given glycoscientists a powerful new tool that importantly complements existing structural and functional glycobiological techniques and can serve both as an initial discovery-type experiment for hypothesis generation and as a confirmatory tool for hypothesis testing. At present, glycoproteomics is still limited to relatively specialized research groups. However, with the recent advances and maturation of workflows, we anticipate that glycoproteomics will rapidly become an accessible tool for the wider research community, in particular for established proteomics scientists. Ultimately this will allow glycoscientists and more general biomolecular scientists alike to explore the intriguing information stored in the spatiotemporally regulated glycoproteome.