A Comprehensive, Open-source Platform for Mass Spectrometry-based Glycoproteomics Data Analysis

Glycosylation is among the most abundant and diverse protein post-translational modifications (PTMs) identified to date. The structural analysis of this PTM is challenging because of the diverse monosaccharides which are not conserved among organisms, the branched nature of glycans, their isomeric structures, and heterogeneity in the glycan distribution at a given site. Glycoproteomics experiments have adopted the traditional high-throughput LC-MSn proteomics workflow to analyze site-specific glycosylation. However, comprehensive computational platforms for data analyses are scarce. To address this limitation, we present a comprehensive, open-source, modular software for glycoproteomics data analysis called GlycoPAT (GlycoProteomics Analysis Toolbox; freely available from www.VirtualGlycome.org/glycopat). The program includes three major advances: (1) “SmallGlyPep,” a minimal linear representation of glycopeptides for MSn data analysis. This format allows facile serial fragmentation of both the peptide backbone and PTM at one or more locations. (2) A novel scoring scheme based on calculation of the “Ensemble Score (ES),” a measure that scores and rank-orders MS/MS spectra for N- and O-linked glycopeptides using cross-correlation and probability-based analyses. (3) A false discovery rate (FDR) calculation scheme where decoy glycopeptides are created by simultaneously scrambling the amino acid sequence and introducing artificial monosaccharides by perturbing the original sugar mass. Parallel computing facilities and user-friendly GUIs (Graphical User Interfaces) are also provided. GlycoPAT is used to catalogue site-specific glycosylation on simple glycoproteins, standard protein mixtures and human plasma cryoprecipitate samples in three common MS/MS fragmentation modes: CID, HCD and ETD. It is also used to identify 960 unique glycopeptides in cell lysates from prostate cancer cells.
The results show that the simultaneous consideration of peptide and glycan fragmentation is necessary for high quality MSn spectrum annotation in CID and HCD fragmentation modes. Additionally, they confirm the suitability of GlycoPAT to analyze shotgun glycoproteomics data.

Glycosylation regulates protein folding and cell-cell interactions in a variety of biological contexts (1,2). It is an important post-translational modification (PTM) in the context of protein therapeutics, development, normal physiology and diseases like inflammation and cancer (2). Unlike DNA and proteins, which are composed of a uniform set of nucleotide or amino acid building blocks across all organisms, monosaccharide composition is not uniform among species. To add to this complexity, glycans often contain branched structures, and they can be heterogeneous both in terms of whether a particular site is glycosylated (macroheterogeneity) and in terms of the distribution of different glycans at a single site (microheterogeneity). This heterogeneity reflects the metabolic status of the cell, tissue or organ system at multiple levels, particularly the factors controlling mRNA transcription, protein translation and glycosylation reaction rates (3).
Tools to study glycosylation are rapidly being developed and recent years have witnessed the increasing use of mass spectrometry (MS) for the structural analyses of glycans (4,5). In this regard, although classical glycomics methods first separate the glycans from proteins to determine either glycan structure or site of protein glycosylation, more recent glycoproteomics workflows focus on analyzing site-specific glycosylation by interrogating the intact glycopeptide (4,6). Commonly, the latter applications use liquid chromatography (LC) to resolve a complex mixture of (glyco)peptides that are generated by the enzymatic digestion of proteins. In the most popular format, following electrospray ionization (ESI) and high-resolution precursor/MS1 mass quantitation, tandem MSn analysis is performed on selected ions following fragmentation using either vibrational dissociation methods like CID (collision induced/activated dissociation) and HCD (higher-energy collisional dissociation or beam-type CID), or electron-activated dissociation methods like ETD (electron transfer dissociation) (4,6,7). Because of the high-throughput nature of the experiment, each LC-MS run results in tens of thousands of fragmentation spectra. Here, the CID mode predominantly produces B-/Y-ions through glycan fragmentation while leaving the peptide backbone largely intact. Thus, it can assign glycan structure but not the site of glycosylation. HCD results in more extensive glycan fragmentation than CID, along with peptide backbone b-/y-ion fragmentation. Although it does not provide detailed glycan-structure information, it identifies MS/MS spectra corresponding to glycopeptides because of the release of prominent low molecular mass mono- and disaccharide oxonium ions. Partial information on the site of glycosylation is also obtained (8). ETD predominantly results in N-Cα peptide bond cleavage to generate c-/z-type ions while leaving the glycan(s) intact (4).
This is invaluable for the identification of glycosylation sites. Together, the complementary fragmentation data regarding the glycans and peptide backbone may be spliced together for comprehensive structural analysis.
Although several programs exist for the analysis of either one or a few glycoproteomics tandem MS spectra, the lack of programs that can handle high-throughput data is a major limitation in the field (reviewed in (5,7,9,10)). Although a few programs for such data analysis have appeared, there is no gold standard because the glycoproteomics experimental workflows are still evolving (4). Specifically, many of the currently available programs either only handle limited fragmentation modes or provide specific specialized analysis functions (11-13), cannot handle data in XML (eXtensible Markup Language) format (14,15), lack a well-developed scoring algorithm (16), are protein centric in that they focus on the site of glycosylation and glycan composition rather than detailed carbohydrate structure (17,18), lack a user-friendly Graphical User Interface (GUI) (11,12,14,16,18-20) or are proprietary (17). The program "Protein Prospector" has also been modified to handle organism-scale glycopeptide databases, particularly for ETD fragmentation mode data analysis (21). Although some of these programs are "freely available" on request, to the best of our knowledge none of these are open-source and modular, with comprehensive documentation that can enable expansion by the community. This is important because the number of ways in which tandem-MS runs can be performed with different fragmentation modes is large, as more than one mode of fragmentation may be applied in a single LC run. The analyses of such experiments can be even more complicated when the individual runs interrogate MS3 and higher levels. Such higher-level analysis is likely to be part of future glycoproteomics workflows because of the need to distinguish between different isomeric, complex glycans (22). Thus, different spectra refinement and scoring strategies need to be tested and this cannot be accomplished by existing programs.
Additionally, to the best of our knowledge, existing programs do not handle custom monosaccharide definitions and they do not contain a systematic strategy to fragment glycopeptides at multiple locations, a requirement for efficient MSn analysis. Because of these limitations, a majority of investigators in the field continue to rely on manual data post-processing and spectral interpretation, and this limits scientific progress (4,6).
To address the above limitations, this manuscript introduces a new computational, open-source framework called GlycoProteomics Analysis Toolbox (GlycoPAT, available from www.VirtualGlycome.org/glycopat). This program is modular in design, and it is built around a new linear, minimal representation of glycopeptides called SmallGlyPep (SGP1.0). SGP1.0 is well suited for MSn data analysis because it allows the straightforward representation of multiple glycans on a single peptide backbone, and efficient in silico glycopeptide fragmentation at one or more locations. The incorporation of application programming interfaces (APIs) from a previous toolbox called GNAT (Glycosylation Network Analysis Toolbox, (23)) enables generation of candidate glycan search libraries based on existing knowledge of biochemistry. Data in both text and mzXML input formats are supported (24). The framework includes several additional, novel features including: (1) A scoring scheme to rank candidate glycopeptides based on an ensemble score (ES) which integrates multiple statistical parameters including cross-correlation and probability-based scores; (2) A method to identify minimum acceptable ES scores (ES cut-off) based on decoy libraries and glycopeptide false discovery rate (FDR) calculations; and (3) Parallel computing facilities to accelerate processing of bulky experimental data and search libraries. The program has been tested using single standard glycoproteins, simple mixtures of proteins, human blood plasma cryoprecipitate mixtures enriched in coagulation-related proteins, and complex prostate cancer cell lysates. Stand-alone GUIs, with basic functionality, are also provided to facilitate quick usage by those without access to the MATLAB software, persons unfamiliar with programming, and individuals looking for a ready-to-go, freely available application for data analysis.
The results confirm the ability of GlycoPAT to perform scoring in multiple fragmentation modes, and analyze complex biological samples. They show that the simultaneous consideration of peptide and glycan fragmentation enhances the quality of MSn spectrum annotation, particularly following HCD and CID fragmentation.

EXPERIMENTAL PROCEDURES
Code Availability-GlycoPAT program source code, compiled GUIs and detailed instructional manuals and videos are available at the SourceForge and YouTube repositories. These resources can be accessed from the software homepage: www.VirtualGlycome.org/glycopat.
Experimental Design and Statistical Rationale-Single standard protein runs include 3 LC-MS runs with fetuin (UniProt P12763), 1 with asialofetuin, and 2 with RNaseB (P61823). A total of 9 runs were performed for defined mixtures of proteins: fetuin, fibronectin (P02751), RNaseB and human α1-acid glycoprotein 1 (AGP-1, P02763). In addition, 14 independent runs assayed the human plasma cryoprecipitome. Additional data for Basigin/CD147 and prostate cancer cells were downloaded from PRIDE. Statistical analysis methods are described as part of the software package. The mass spectrometry proteomics data have been deposited at the ProteomeXchange Consortium (PRIDE identifier: PXD006031).
Tandem-MS Experiment-Bovine asialofetuin, bovine fetuin, bovine RNase B, and AGP-1 were purchased from Sigma-Aldrich (St. Louis, MO). Human fibronectin was from Life Technologies (Grand Island, NY). Five milliliters of human blood was drawn by venipuncture from an O-blood group individual into 1:9 sodium citrate following human subjects protocols approved by the University at Buffalo Health Science IRB. Platelet poor plasma (PPP) was isolated from this blood as described previously (25). The PPP was rapidly frozen to −80°C, and then slowly thawed at 3°C. Protein precipitate thus formed was collected by centrifugation at 5000 × g for 15 min. The pellet was resuspended in 20 mM HEPES buffer and dissolved by warming to 37°C. For MS, all protein samples were processed with a surfactant-aided on-pellet digestion procedure (26,27). Briefly, 100 µg protein of each sample was spiked with 0.5% SDS and then denatured and reduced using 4 mM tris(2-carboxyethyl)phosphine (TCEP) at 37°C for 30 min. Following this, fresh 20 mM iodoacetamide was added for 30 min in the dark. Then, 6-fold cold acetone was added to the sample volume in two steps with vortexing, and the mixture was incubated overnight at −20°C. Samples thus obtained were centrifuged at 20,000 × g at 4°C for 30 min, the supernatant was discarded, and the pellet was washed with methanol and then air-dried for several minutes. This dried pellet was broken into small particles in 50 mM Tris-FA (formic acid) buffer (pH 8.5) with a sonicator, sequencing-grade trypsin or Glu-C (Thermo-Pierce) was added at a 1:20 (w/w) enzyme-to-protein ratio to a total volume of 100 µl, and then the sample was incubated at 37°C for 18 h. Six µg of digested samples prepared in this manner were analyzed using either an LTQ-Orbitrap XL mass spectrometer or an Orbitrap-Fusion Tribrid mass spectrometer (Thermo Scientific, San Jose, CA), both with ETD module.
MS Data Preprocessing-The .RAW files generated from the MS instruments were converted either to text format with .dta extension using Bioworks 3.3.1 (Thermo-Scientific) or to .mzXML format using the msconvert tool (ProteoWizard 3.0.5759, (30)). In the .dta files, the first row contains the precursor MS1 mass ([M+H]+) and charge assignment inferred by Bioworks. The remaining rows list MS2 fragment m/z values in the first column and corresponding intensity (I) data in the second column. The .mzXML file presents the same data and also additional experimental information, like fragmentation mode, in XML format.
Theoretical Glycopeptide Database Generation-The GlyDB was generated for N- and O-linked glycans as explained in Results. GlyPepDB was then synthesized in silico by digesting one or more proteins supplied in FASTA format using the specified proteolytic enzyme(s), and then appending both fixed and variable PTMs. UniProt IDs for proteins in the simple mixtures are provided above, and accession IDs for plasma proteins are listed in Supplemental Tables. The MATLAB function used in this step is called digestSGP and it outputs the GlyPepDB in SGP1.0 format. When generating this database in the current manuscript, either 2 or 3 missed cleavages, fixed cysteine carbamidomethyl modification (+57.02146 Da) and variable methionine oxidation (+15.99492 Da) were allowed. N-glycans could appear at Asn in the N-X-S/T sequon and O-glycans were allowed at Ser/Thr. Although there was no limit on the number of fixed modifications on any peptide, the number of variable modifications was limited to two, and occasionally three. These variable modifications include both glycan and nonglycan PTMs.
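The .dta layout described above can be read with a few lines of code. The following is a minimal Python sketch, not part of GlycoPAT (which is written in MATLAB); the function names `parse_dta` and `precursor_mz`, the proton mass constant, and the example peak list are illustrative assumptions.

```python
from typing import List, Tuple

PROTON = 1.007276  # proton mass in Da (assumed constant for this sketch)

def parse_dta(text: str) -> Tuple[float, int, List[Tuple[float, float]]]:
    """Parse .dta-style text: the first row holds the precursor [M+H]+ mass
    and charge; every later row holds a fragment m/z and its intensity."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    precursor_mh = float(rows[0][0])
    charge = int(rows[0][1])
    peaks = [(float(mz), float(intensity)) for mz, intensity in rows[1:]]
    return precursor_mh, charge, peaks

def precursor_mz(precursor_mh: float, charge: int) -> float:
    """Convert the singly protonated mass [M+H]+ to m/z at a given charge."""
    neutral_mass = precursor_mh - PROTON
    return (neutral_mass + charge * PROTON) / charge

# Hypothetical two-peak spectrum, for illustration only.
example = """1479.6512 2
204.0867 1520.0
366.1395 830.5
"""
mh, z, peaks = parse_dta(example)
```

Keeping the precursor as [M+H]+ and converting to m/z only when needed mirrors the file layout and avoids storing the same mass in two forms.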
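The digestion and site-selection rules above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the digestSGP implementation: the helper names are hypothetical, trypsin is modeled as cleaving after every K/R, and the N-X-S/T sequon is taken to exclude proline at X, which is the usual convention but is not stated in the text.

```python
import re
from typing import List

def cleave(protein: str, residues: str = "KR") -> List[str]:
    """Cut C-terminal to each residue in `residues` (K/R models trypsin)."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        if aa in residues:
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces

def digest(protein: str, max_missed: int = 2, residues: str = "KR") -> List[str]:
    """Join up to `max_missed` + 1 adjacent pieces to model missed cleavages."""
    pieces = cleave(protein, residues)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

def n_sequon_sites(peptide: str) -> List[int]:
    """0-based positions of Asn in N-X-S/T, assuming X is not proline."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", peptide)]

def o_glycan_sites(peptide: str) -> List[int]:
    """0-based positions of Ser/Thr, where O-glycans were allowed."""
    return [i for i, aa in enumerate(peptide) if aa in "ST"]

# Hypothetical sequence used only to exercise the functions.
protein = "MKNGSAKLNPTRCNYTK"
peptides = digest(protein, max_missed=1)
```

A Glu-C digest could be approximated in the same way by passing `residues="E"`, again as a simplification of the real enzyme specificity.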
Scoring Experimental Spectra to Obtain Ensemble Score (ES)-GlycoPAT scoring follows the two steps shown in Fig. 1B. Although the parameters used in the current manuscript are stated below and the text corresponds to the case of MS/MS fragmentation, this can be changed for other applications that require MSn data analysis.
In the first step, the experimental MS1 precursor mass was compared with the theoretical mass of all (glyco)peptides in GlyPepDB. GlyPepDB members with a mass difference less than the tolerance (typically 10 ppm) are termed "candidate (glyco)peptides." In the second step, the ensemble score (ES) was calculated by comparing the experimental MS/MS spectrum with the theoretical spectrum generated in silico for the candidate glycopeptide. The nature of this scoring differed among the fragmentation modes: CID, HCD and ETD. In this regard, one of two noise reduction methods was applied to delete low-intensity peaks in the experimental data that are due to instrument noise. "Global noise reduction" was applied in CID and HCD modes to remove all peaks less than 2 times the median peak intensity, provided these are <1% of the most intense peak. "Local noise reduction" was applied in the ETD mode to delete the unfragmented precursor ion peak, and peaks below the median value in local m/z windows that span 100 Th. Following this, the theoretical MS/MS spectrum of the candidate glycopeptide was generated in CID/HCD modes by fragmenting glycosidic bonds to form B-/Y-type ions in the case of glycopeptides, or peptide b-/y-ions when the glycan is absent. In CID mode, two glycan fragmentations were allowed when the number of monosaccharides in the glycopeptide exceeded 4 because multiple fragmentations on a single glycan can occur in the case of N-linked glycopeptides. A mix of 1 glycan and 1 peptide fragmentation was permitted when the glycans had 4 or fewer monosaccharides because this is commonly observed in the case of O-GalNAc type glycopeptides (31). HCD mode analysis focused on mapping the underlying peptide backbone rather than analyzing the attached glycan.
Thus, glycopeptide fragments in the theoretical MS/MS spectrum included selected oxonium ions (m/z = 138.05496, 204.08665; 292.10269 and 274.09213, if the corresponding monosaccharides are present in the candidate), and peptide b-/y-ions that contain glycans separated from the underlying peptide by up to 2 glycosidic linkages. In ETD mode, only c-/z-peptide ions were included. In all cases, the oxonium ion charge state (z) = 1. Theoretical peptide fragments had z ≤ precursor charge state for CID and HCD, and z < precursor charge state for ETD. Next, the ability of the theoretical spectrum to match experimental MS/MS data was quantified using four statistical scores:
I. Pearson Cross-correlation Analysis (Xcorr)-This procedure follows previous literature (32) with some modifications. Specifically, the theoretical MS/MS spectrum was simplified such that only one peak was included for each (glyco)peptide fragment. The specific charge state for that fragment corresponded to the charge state of the theoretical peak that had a corresponding experimental MS/MS peak match. In case more than one theoretical peak had matching experimental peaks, the theoretical charge state corresponding to the most intense experimental peak was chosen. If no match was found, the fragment with charge state of +1 was retained. The intensity I of each peak in the theoretical spectrum was arbitrarily set to 50. Next, the intensity data for both the theoretical and experimental fragmentation spectra were binned according to the instrument resolution. When the MS/MS tolerance was p ppm, consecutive bins at a given m/z value were separated by (m/z)·p/10^6 Da. When the MS/MS tolerance was q Da, consecutive bins were separated by q Da. In this manuscript, the MS/MS tolerance was often 1 Da. Thus, for a theoretical or experimental peak at, say, m/z = 331.5, a peak with corresponding intensity was placed at m/z of both 331 and 332.
The intensities of multiple peaks appearing at a given integer m/z value were then summed to determine the final intensity. The theoretical MS/MS spectrum was then offset/translated over the corresponding experimental data over a range of lags (τ), and the cross-correlation score Xcorr(τ) was calculated following (32). Here, x_i and y_i correspond to the intensity of the i-th m/z value of the processed experimental and theoretical spectrum, respectively, and x̄ and ȳ are the corresponding mean values averaged over all n possible peaks. τ ranged from −50 to +50. During the correlation analysis, two parameters were recorded: i. Peak lag, or the value of τ where Xcorr(τ) was maximum; and ii. Height Center (HCcorr), which quantifies the normalized height of Xcorr(τ = 0) with respect to the mean Xcorr(τ) value:
$$HC_{corr} = \frac{X_{corr}(\tau = 0) \times \left[\max(\tau) - \min(\tau) + 1\right]}{\sum_{\tau=\min(\tau)}^{\max(\tau)} X_{corr}(\tau)}$$
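The binning and lag scan described above can be sketched as follows. This Python illustration makes several assumptions that are not confirmed by the text: Xcorr at each lag is taken to be the ordinary Pearson coefficient between the experimental bins and the theoretical bins shifted by that lag, bins shifted out of range are treated as zero, and all function names and peak lists are hypothetical.

```python
import math

def bin_spectrum(peaks, n_bins):
    """Sum intensities into integer m/z bins (1 Da tolerance); a peak at
    331.5 contributes to both bin 331 and bin 332."""
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        for b in {math.floor(mz), math.ceil(mz)}:
            if 0 <= b < n_bins:
                binned[b] += intensity
    return binned

def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                    sum((b - mean_y) ** 2 for b in y))
    return num / den if den else 0.0

def shift(y, lag):
    """Translate y by `lag` bins, padding with zeros (an assumption)."""
    n = len(y)
    return [y[i - lag] if 0 <= i - lag < n else 0.0 for i in range(n)]

def xcorr_profile(x, y, max_lag=50):
    """Correlation at every integer lag from -max_lag to +max_lag."""
    return {lag: pearson(x, shift(y, lag)) for lag in range(-max_lag, max_lag + 1)}

def peak_lag_and_hc(profile):
    """Peak lag and HCcorr = Xcorr(0) x (number of lags) / sum of Xcorr."""
    lags = sorted(profile)
    peak_lag = max(lags, key=lambda lag: profile[lag])
    total = sum(profile.values())
    hc = profile[0] * (lags[-1] - lags[0] + 1) / total if total else 0.0
    return peak_lag, hc

# Hypothetical peaks; theoretical intensities are set to 50 as in the text.
experimental = [(204.09, 100.0), (366.14, 80.0), (528.19, 60.0), (690.25, 30.0)]
theoretical = [(mz, 50.0) for mz in (204.09, 366.14, 528.19)]
x = bin_spectrum(experimental, 800)
y = bin_spectrum(theoretical, 800)
peak_lag, hc_corr = peak_lag_and_hc(xcorr_profile(x, y))
```

For this toy pair of spectra the correlation is largest at zero lag and well above its mean over all lags, which is the behavior expected of a good match.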
For a good match, peak lag should lie between −1 and +1, and HCcorr should be large. This is captured in the scoring parameter s1.
II. % Ion Match-The total number of peaks in the full theoretical spectrum is N1. The number of these peaks that have corresponding experimental matches is K1. Thus, % ion match = 100 × K1/N1. High % ion match values indicate superior spectrum matches, and this is captured in the score s2.
III. Top 10 Peaks-This parameter quantifies how many of the 10 most intense experimental peaks were matched during the % ion match calculation, after excluding the unfragmented precursor in ETD mode. This count is captured in the score s3.
IV. Poisson Probability-The probability-based scoring strategy determined whether the predicted match between the experimental data and candidate glycopeptide is a chance event (33,34). For this, a set of "decoy" glycopeptides was generated. This was done by randomly scrambling the peptide sequence in the glycopeptide, and at the same time arbitrarily adding or subtracting a molecular mass between −50 and +50 Da for each monosaccharide while keeping the total glycan mass unaltered (see Fig. 4 for an example). Twenty-five such decoys were generated for each candidate glycopeptide. The theoretical fragment ions for individual candidate and decoy glycopeptides were then compared with the experimental spectrum. The p value for the candidate was then computed from K1 and N1, the number of matched and total fragment ions for the candidate glycopeptide, and from K and N, the corresponding values for the entire database that includes both candidate and decoy glycopeptides. Low p values indicate a better match, and this is captured in the score s4.
Ensemble Score (ES)-The above four parameters were weighted according to the following equation to arrive at an ensemble score: ES = s1·w1 + s2·w2 + s3·w3 + s4·w4.
Although the individual weights can be varied depending on the fragmentation mode, mode-specific settings were used for the current manuscript. The CID mode weights all the scoring parameters equally. HCD excludes Top 10 because the most intense peaks in the MS/MS spectrum are often dominated by oxonium ions, which do not inform glycopeptide identification. ETD has a high probability-based weighting because glycopeptide fragmentation is often incomplete, with a large precursor peak remaining.
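The % ion match, Top 10 and weighted-sum steps can be illustrated with a short Python sketch. Everything beyond the weighted sum itself is an assumption made for illustration: the equal CID weights are normalized to 0.25 each, s2 and s3 are scaled to the 0-1 range by dividing by 100 and 10, the s1 and s4 values are placeholders, and the function names and peak lists are hypothetical.

```python
def percent_ion_match(theo_mz, exp_peaks, tol_da):
    """100 x K1/N1: share of theoretical ions with an experimental match."""
    matched = [t for t in theo_mz
               if any(abs(t - mz) <= tol_da for mz, _ in exp_peaks)]
    return 100.0 * len(matched) / len(theo_mz)

def top10_matched(theo_mz, exp_peaks, tol_da):
    """Count of the 10 most intense experimental peaks that match an ion."""
    top = sorted(exp_peaks, key=lambda peak: peak[1], reverse=True)[:10]
    return sum(any(abs(t - mz) <= tol_da for t in theo_mz) for mz, _ in top)

def ensemble_score(scores, weights):
    """ES = s1*w1 + s2*w2 + s3*w3 + s4*w4."""
    return sum(s * w for s, w in zip(scores, weights))

# Assumption: "equal weighting" in CID mode, normalized so the weights sum to 1.
CID_WEIGHTS = (0.25, 0.25, 0.25, 0.25)

# Hypothetical theoretical ions and experimental (m/z, intensity) peaks.
theo = [204.09, 366.14, 528.19, 900.00]
exp = [(204.08, 100.0), (366.15, 80.0), (528.20, 60.0), (700.00, 500.0)]

s2 = percent_ion_match(theo, exp, tol_da=0.5) / 100.0  # assumed scaling
s3 = top10_matched(theo, exp, tol_da=0.5) / 10.0       # assumed scaling
s1, s4 = 1.0, 0.5  # placeholders for the correlation and probability scores
es = ensemble_score((s1, s2, s3, s4), CID_WEIGHTS)
```

Setting the third weight to zero would reproduce the stated HCD behavior of excluding Top 10, while the remaining HCD and ETD weights would need to come from the software itself.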
Glycopeptide False Discovery Rate (FDR) Calculation and Structure Assignments-The ES for the candidate glycopeptide was generated as described in the previous section. For each candidate glycopeptide, a "decoy glycopeptide" was also generated by scrambling the peptide backbone and adding/subtracting up to 50 Da mass to each monosaccharide while conserving the overall glycan mass. The ES for this decoy is called ESdecoy. The glycopeptide FDR at any ES cut-off value was then calculated using the ES and ESdecoy values. To confirm assignments in this manuscript, ES cut-off was determined for glycopeptide FDR = 1%. All candidate glycopeptides with ES > ES cut-off were then manually inspected using the Browse Results GUI (browsegui) of GlycoPAT. This program presents an annotated MS/MS spectrum detailing the assignments that could be made for the candidate glycopeptide, including an ion map summarizing the identified hits and a more elaborate table showing the details of each assignment. Thus, all assignments were manually inspected and validated.
Proteomics and Glycoproteomics Analysis of Human Plasma Cryoprecipitate-Proteomics analysis was performed using Proteome Discoverer (version 2.1) with the embedded SEQUEST HT search engine to identify proteins in two cryoprecipitate samples digested with trypsin and two more samples that were Glu-C digested. These samples were subjected to the above described LC-MS/MS (HCD) experiments. The search parameters were: MS1 tolerance = 10 ppm, MS2 tolerance = 0.20 Da, fixed carboxyamidomethyl modification, variable methionine oxidation, max missed cleavage = 2 for trypsin and 3 for Glu-C. The reviewed Swiss-Prot human FASTA database of "Uniprot Release 2015_01" (Homo sapiens subset with forward-decoys) was used for the search. The results were then combined using Scaffold (version 4.4.3, Portland, OR) using 0.1% peptide decoy FDR and 4.3% protein FDR. GlycoPAT search was then conducted on the N-linked glycopeptides of the top 7 proteins identified in this manner. The GlyDB search library used here was identical to the "standard N-glycan GlyDB" described in Results, except that it also included O-type blood group antigen terminal modifications.
Glycoproteome of Prostate Cancer Cells-LNCaP and PC-3 prostate cancer cell line glycoproteomics data were downloaded from ProteomeXchange (PXD002107) (36). Twenty-two of the 24 .RAW files could be converted to mzXML. The monoisotopic mass corresponding to each MS/MS product was determined using the "averagine" method (37), using trypsin digested fetuin glycopeptide results presented in this publication as a model. Briefly, we calculated the molecular composition of each of the identified fetuin glycopeptides (i.e. C_aH_bN_cO_dS_e), and determined the average unit glycopeptide composition by dividing by the overall molecular mass. The isotopic distribution of this typical glycopeptide was determined using the Bioinformatics toolbox of MATLAB. Next, for each experimental MS/MS spectrum, we determined the precursor isotopic distribution by adding the local MS1 spectra (±4 Da) surrounding the parent ion in a 1 min chromatographic window (±30 s) (13). The monoisotopic peak was then determined at 10 ppm resolution by translating the experimental MS1 distribution over the distribution of the theoretical unit glycopeptide scaled based on the parent ion mass, and determining the position at which the cross-correlation was maximum.
Once the experimental monoisotopic mass was determined, each of the MS files, with ~40,000 spectra, was searched against a theoretical GlyPep library with 429,841 members using a 36-core cluster (Intel Xeon-E5645 processor, 12 cores per node, 3 nodes). This library was generated using the 1793 unique peptides reported using the SPEG (Solid Phase Extraction of Glycopeptides) method (36), and 172 unique glycan masses including high mannose, bi-, tri-, and tetra-antennary structures, core- and terminal fucosylated carbohydrates, and sialylated glycans similar to previous work (38). Additional variable modifications included methionine oxidation. Only one N-glycan was permitted on each glycopeptide, whereas there was no limit on the number of oxidation sites. Fixed modifications included cysteine carbamidomethylation, and iTRAQ labeling (114, 115, 116, 117) at N terminus and lysine. For search parameters, tolerance for MS1 and MS2 was 10 ppm and 0.06 Da, respectively. All other parameters used were program defaults. During the final data processing steps, the candidate with highest ES was selected, provided ES ≥ 0.5. Additionally, all accepted results had at least two glycan oxonium ions, and it was verified that the intensity of the highest oxonium ion exceeded the intensity of the iTRAQ reporter. All ES results were compared with Byonic scores reported previously (36).
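The decoy construction and cut-off selection can be sketched in Python as shown below. The FDR definition used here (decoys above the cut-off divided by candidates above the cut-off) is a common target-decoy convention adopted as an assumption, not necessarily the formula used in GlycoPAT; the centering and rescaling of mass offsets is likewise just one way to conserve total glycan mass, and all names and score lists are hypothetical.

```python
import random

def make_decoy(peptide, sugar_masses, rng, max_shift=50.0):
    """Scramble the peptide and perturb each monosaccharide mass by up to
    max_shift Da while keeping the summed glycan mass unchanged."""
    residues = list(peptide)
    rng.shuffle(residues)
    if not sugar_masses:
        return "".join(residues), []
    offsets = [rng.uniform(-max_shift, max_shift) for _ in sugar_masses]
    mean = sum(offsets) / len(offsets)
    centered = [o - mean for o in offsets]        # offsets now sum to zero
    largest = max(abs(c) for c in centered)
    scale = min(1.0, max_shift / largest) if largest else 1.0
    decoy_masses = [m + c * scale for m, c in zip(sugar_masses, centered)]
    return "".join(residues), decoy_masses

def fdr_at(cutoff, target_es, decoy_es):
    """Assumed definition: decoys above cut-off / candidates above cut-off."""
    n_target = sum(es > cutoff for es in target_es)
    n_decoy = sum(es > cutoff for es in decoy_es)
    return n_decoy / n_target if n_target else 0.0

def es_cutoff(target_es, decoy_es, max_fdr=0.01):
    """Lowest observed score at which the assumed FDR is within max_fdr."""
    for cutoff in sorted(set(target_es) | set(decoy_es)):
        if fdr_at(cutoff, target_es, decoy_es) <= max_fdr:
            return cutoff
    return None

rng = random.Random(0)
sugars = [203.0794, 162.0528, 291.0954]  # HexNAc, Hex, Neu5Ac residue masses
decoy_peptide, decoy_sugars = make_decoy("NGSAK", sugars, rng)

# Hypothetical score lists for illustration.
targets = [0.9, 0.8, 0.7, 0.4]
decoys = [0.5, 0.3, 0.2, 0.1]
cutoff = es_cutoff(targets, decoys)
```

With a single monosaccharide the offset collapses to zero, since no perturbation can conserve the total mass in that case.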

RESULTS
SmallGlyPep (SGP1.0)-The SGP1.0 nomenclature is designed for the minimal representation of glycopeptides in linear text format for MS-based glycoproteomics data analysis (Fig. 1, supplemental Movie SA). Its design enables the straightforward in silico MSn fragmentation of glycopeptides at one or more locations that may reside either on the peptide backbone or glycan/nonglycan PTMs. Here, upper and lowercase letters represent amino acids and PTMs, respectively. Glycan PTMs are described within braces or curly brackets. Nonglycan PTMs are enclosed within chevrons or angle brackets. The list of currently available monosaccharides and nonglycan modifications in GlycoPAT is provided in supplemental Table S1. Additional members can be added by modifying class definitions as described in the software manual. In addition to single letters, arbitrary monosaccharides can also be represented by numbers corresponding to their molecular mass. During the representation of glycans, each monosaccharide is enclosed within a single pair of braces, with the open bracket ("{") just prior to the residue representing the glycosidic bond that links it to the rest of the molecule and the paired closing bracket ("}") indicating the nonreducing end of the antenna on which this monosaccharide resides.
For illustration, a glycopeptide with one N-glycan, one core-2 O-glycan and one nonglycan PTM is shown in Fig. 1A.
To convert this molecule from the conventional Symbolic Nomenclature for Glycans (39) to SGP1.0, the monosaccharides are represented by single letters with hexose, N-acetylhexosamine, N-acetylneuraminic acid (sialic acid) and fucose being annotated by "h," "n," "s" and "f," respectively (top of Fig. 1A). Curly bracket pairs, color coded in Fig. 1A, are then introduced for linearization with the open and closed brackets bracing the carbohydrate arm containing the monosaccharide(s). Thus, the number of curly bracket pairs equals the number of monosaccharides. Fragmentation of a glycosidic bond results in the release of the glycan fragment enclosed within paired curly brackets (shown using red dashed boxes, Fig. 1A). Thus, the SGP1.0 nomenclature enables streamlined design of algorithms for in silico glycopeptide fragmentation at multiple sites. This is necessary for MSn data analysis.
GlycoPAT Workflow and Graphical-user-interface (GUI)-Using SGP1.0 as the foundation, a suite of functions was written in MATLAB for tandem-MS glycoproteomics data analysis (workflow in Fig. 1B), including GUIs for MS/MS experiments (Fig. 1C). The full program includes ~13,000 lines of code and additional libraries.
In this workflow, first, the glycan search database (or "GlyDB") is designed by either manually listing the glycans in text input files or generating them automatically using the "connection inference" algorithm described previously (23,40). Fig. 2 illustrates the latter case, using an input set of 7 seed glycans and 9 enzymes to generate the "standard N-glycan GlyDB." Here, the seed glycans include one high-mannose structure that initiates glycan biosynthesis, and terminal bi-, tri- and tetra-antennary sialylated structures both with and without core fucose (Fig. 2A). Among the enzymes (Fig. 2B), the mannosidases (Man I and Man II) trim high-mannose structures, N-acetylglucosaminyltransferases (GnT-I, -II, -IV, -V) enable N-glycan branching, and the remaining enzymes either extend or decorate the glycan terminus. Enzyme specificity rules are presented based on previous class structures (supplemental Table S2, (23,40)). Using this algorithm, the network linking the starting high mannose glycan (species i) to the tri-antennary glycan (species v) contains 37 automatically generated glycans and 63 reactions. Grouping isomeric glycans with identical monosaccharide compositions and similar fragmentation patterns reduced the glycan number in GlyDB from 37 (Fig. 2C) to 19 prototypic structures (Fig. 2D). Similarly, the full glycosylation network connecting all 7 seed glycans (Fig. 2A) has 250 unique glycans and 596 reactions (supplemental Fig. S1). Clustering these glycans reduced the "standard N-glycan GlyDB" to 75 members. Similarly, a "standard O-glycan GlyDB" was generated with 15 species using seed glycans and enzymes in supplemental Table S3. The full reaction network is illustrated in supplemental Fig. S2. The GlyDB library generated in Fig. 2 can be readily expanded by including additional glycosylation related enzymes, as illustrated later in the study of prostate cancer cells.
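The bracket rule above lends itself to a simple stack-based fragmentation routine. The Python sketch below is an illustration only and is not GlycoPAT code; the function names and the example string "GT{n{h{s}}}K" are hypothetical, and the example follows the letter codes given in the text.

```python
def brace_pairs(sgp):
    """Return (open_index, close_index) for every matched '{...}' pair."""
    stack, pairs = [], []
    for i, ch in enumerate(sgp):
        if ch == "{":
            stack.append(i)
        elif ch == "}":
            if not stack:
                raise ValueError("unbalanced '}' in SGP string")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '{' in SGP string")
    return sorted(pairs)

def glycosidic_fragments(sgp):
    """For each glycosidic bond, return (released glycan, remaining molecule).
    Cleaving a bond releases the text enclosed by its paired braces."""
    return [(sgp[o:c + 1], sgp[:o] + sgp[c + 1:]) for o, c in brace_pairs(sgp)]

# Hypothetical glycopeptide: peptide GTK carrying HexNAc-Hex-Neu5Ac on Thr.
sgp = "GT{n{h{s}}}K"
n_monosaccharides = len(brace_pairs(sgp))
fragments = glycosidic_fragments(sgp)
```

Because every monosaccharide owns exactly one brace pair, applying `glycosidic_fragments` again to a remainder gives the doubly fragmented species needed for higher-order MSn analysis.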
Alternatively, it may also be generated using glycomics-based MS profiling studies or using curated organism-specific glycan databases.
The "theoretical glycopeptide database" or GlyPepDB, which contains the list of potential glycopeptides in the sys-tem, was generated in SGP1.0 format using: (1) GlyDB from the previous step, (2) list of protein inputs provided in FASTA format (input 3, Fig. 1B), (3) fixed and variable nonglycan PTMs (input 4, Fig. 1B), and (4) protease(s) used for digestion (input 5, Fig. 1B). To limit the size of GlyPepDB, GlycoPAT has facilities to limit the maximum and minimum number of variable PTMs on any peptide, and stipulate specific protein backbone locations (e.g. the 55th and 68th amino acid) where variable modifications may occur. Such facilities are important to limit the combinatorial expansion of GlyPepDB and focus the specific search.
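The effect of capping the number of variable PTMs can be seen in a small enumeration. The following Python sketch is illustrative and does not reproduce the GlycoPAT implementation; the function name, the site dictionary, and the modification strings (including "<o>" as an oxidation label) are assumptions chosen to resemble SGP1.0 syntax.

```python
from itertools import combinations, product

def expand_variable_mods(peptide, site_options, min_mods=0, max_mods=2):
    """Enumerate modified peptides carrying between min_mods and max_mods
    variable modifications. site_options maps a 0-based residue position
    to the list of modification strings allowed at that position."""
    positions = sorted(site_options)
    results = []
    for k in range(min_mods, min(max_mods, len(positions)) + 1):
        for chosen in combinations(positions, k):
            for mods in product(*(site_options[p] for p in chosen)):
                placement = dict(zip(chosen, mods))
                results.append("".join(
                    aa + placement.get(i, "") for i, aa in enumerate(peptide)))
    return results

# Hypothetical peptide: two candidate glycans at Asn-1, oxidation at Met-4.
site_options = {0: ["{n}", "{n{h}}"], 3: ["<o>"]}
up_to_two = expand_variable_mods("NGSMK", site_options, max_mods=2)
up_to_one = expand_variable_mods("NGSMK", site_options, max_mods=1)
```

Even in this toy case the library grows from 4 to 6 entries when the cap is raised from one to two modifications, which is why such limits matter for proteome-scale searches.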
To determine which of these glycopeptides from GlyPepDB are present in the sample, GlycoPAT first matches each experimental MS1 mass to the precursor mass of the GlyPepDB members. Once a "candidate glycopeptide" is identified, a score is generated to relate the corresponding experimental MSn spectrum with the theoretical spectrum generated by in silico fragmentation of the candidate GlyPepDB member. This ensemble score (ES) weighs various statistical parameters: (1) Xcorr: the "cross correlation" score; (2) % ion match: the percentage of ions that are matched between the theoretical and observed spectrum; (3) Top 10 peaks: the number of the 10 most intense experimental peaks that match the theoretical ions; and (4) p value: the probability based on the generation of a set of glycopeptide decoys.
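Two of these parameters, % ion match and Top 10 peaks, can be sketched in a few lines of Python. The sketch is illustrative only: the function name, the fixed 0.5 Da tolerance, and the choice to express % ion match as the fraction of theoretical ions found in the spectrum are assumptions, and the weights used to combine the parameters into the ES are not reproduced here.

```python
def match_metrics(theoretical_mz, observed_peaks, tol_da=0.5):
    """Return (% ion match, Top 10 hits) for one candidate glycopeptide.

    theoretical_mz : list of theoretical fragment m/z values
    observed_peaks : list of (m/z, intensity) tuples from the MS/MS spectrum
    tol_da         : assumed fragment mass tolerance in Da
    """
    def is_matched(mz, candidates):
        return any(abs(mz - c) <= tol_da for c in candidates)

    observed_mz = [mz for mz, _ in observed_peaks]
    n_found = sum(is_matched(t, observed_mz) for t in theoretical_mz)
    pct_ion_match = 100.0 * n_found / len(theoretical_mz)

    # The ten most intense experimental peaks, checked against theoretical ions.
    top10 = sorted(observed_peaks, key=lambda p: p[1], reverse=True)[:10]
    top10_hits = sum(is_matched(mz, theoretical_mz) for mz, _ in top10)
    return pct_ion_match, top10_hits

theoretical = [204.1, 366.1, 657.2, 1000.0]
observed = [(204.09, 900.0), (366.14, 800.0), (500.0, 50.0), (657.24, 300.0)]
pct, hits = match_metrics(theoretical, observed)
```

In this toy spectrum three of the four theoretical ions are observed, and three of the experimental peaks are explained by the candidate.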
The GlycoPAT software is freely available as an open-source, platform-independent toolbox. GUIs are currently available to implement core functions (Fig. 1C, supplemental

Analysis of Single Glycoproteins-
Fig. 3 presents data confirming the ability of GlycoPAT to identify standard proteins (fetuin, asialofetuin and RNaseB) in different fragmentation modes. These spectra were identified using the "standard O- or N-glycan GlyDB" described above. Here, Fig. 3A-3C compares the MS/MS fragmentation patterns of N-linked glycans from fetuin in HCD, CID, and ETD modes. Consistent with previous work (8), the HCD spectrum contains abundant low m/z peaks corresponding to HexNAc (m/z = 204), Neu5Ac (m/z = 292), HexHexNAc (m/z = 366), and cross-ring fragments of monosaccharides and water loss (m/z = 138, 167, 185) (Fig. 3A). At high m/z, Y-ions corresponding to the peptide backbone cleavage with short glycan stubs are also evident. CID led to less extensive fragmentation of sialylated tetra-antennary glycopeptides compared with HCD. Thus, the oxonium B-ions were less intense compared with HCD, and high molecular mass peaks with a loss of either Neu5Ac or Neu5AcHex were abundant (Fig. 3B). Whereas the B-ions had z = 1, the Y-ions included both neutral loss and loss of charge peaks. The presence of core-fucosylation was evident based on the diagnostic ion at m/z = 1381. Fragmentation in ETD mode led to c/z-ions, though the efficiency of fragmentation was low, with a large precursor ion peak remaining in the MS/MS spectra (removed from Fig. 3C). A few b- and y-ions were also noted, likely because of the use of supplemental activation (a low energy HCD) to dissociate the charge-reduced species. These data confirm the ability of GlycoPAT to analyze glycopeptide fragmentation data in three fragmentation modes.
In addition to the above, GlycoPAT also identified other glycan types including asialoglycopeptides from asialofetuin (Fig. 3D), high mannose glycoconjugates from RNaseB (Fig. 3E) and O-linked glycopeptides from fetuin (Fig. 3F), all using CID fragmentation. Among these, the nonsialylated tri-antennary glycan in Fig. 3D displayed a range of B- (m/z = 366, 528) and Y-ions. Fragmentation of the high-mannose (Man 8) glycan located at Asn60 of RNaseB resulted in a ladder pattern because of successive loss of one to seven mannose residues in products with z = 3-5 (Fig. 3E). The CID mode fragmentation of O-linked glycopeptides resulted in a pattern similar to the N-linked glycopeptide, with B-ions (m/z = 366, 657) and Y-ions because of the loss of Neu5Ac, Neu5AcHex and Neu5AcHexHexNAc (Fig. 3F). Additionally, a small portion of the peptide backbone was also fragmented, resulting in selected b- and y-ions.
Ensemble Score and Decoy-based Strategy for Controlling False-positives-It is necessary to determine the minimal ES or ES cut-off value, which identifies high-quality spectrum matches with minimal false-positive hits. Although GlycoPAT has facilities to implement both global and local glycopeptide FDR calculations to determine this ES cut-off, the global approach is illustrated in Fig. 4. Here, one decoy glycopeptide is first generated for each "candidate glycopeptide" that had an MS1 match. This is generated by scrambling the base peptide sequence and randomly adding or subtracting a mass of up to 50 Da to each monosaccharide while keeping the total mass of each glycan constant (Fig. 4A). Alternate methods are also possible in GlycoPAT though they were not used in this manuscript. These methods for generating peptide decoys include swapping the first 1-2 amino acids, or reversing the amino acid sequence (Fig. 4A). Overall, the approach used for creating decoy glycopeptides is more comprehensive compared with prior work that either only scrambled the peptide or added a fixed mass to the glycan (41).
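The decoy construction described above can be sketched in Python as follows. This is a minimal illustration and not the GlycoPAT functions shown in Fig. 4B: the function names are hypothetical, the residue masses are standard monoisotopic values, and the way the mass shifts are drawn (uniform, then centered so that they sum to zero) is one assumed way of keeping the total glycan mass constant while keeping each shift within 50 Da.

```python
import random

def scramble_peptide(peptide: str, rng: random.Random) -> str:
    """Shuffle the amino acid order; composition and mass are unchanged."""
    residues = list(peptide)
    rng.shuffle(residues)
    return "".join(residues)

def perturb_glycan_masses(masses, rng: random.Random, max_shift=50.0):
    """Shift each monosaccharide mass by at most max_shift Da, net shift zero.

    Raw shifts are drawn from [-max_shift/2, +max_shift/2]; subtracting their
    mean makes them sum to zero and keeps every shift within +/- max_shift.
    """
    raw = [rng.uniform(-max_shift / 2, max_shift / 2) for _ in masses]
    mean = sum(raw) / len(raw)
    return [m + (r - mean) for m, r in zip(masses, raw)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
peptide = "LNGSAKPEANPTR"  # toy peptide
# HexNAc, HexNAc, Hex, Hex, Hex residue masses (Da)
glycan_masses = [203.0794, 203.0794, 162.0528, 162.0528, 162.0528]

decoy_peptide = scramble_peptide(peptide, rng)
decoy_masses = perturb_glycan_masses(glycan_masses, rng)
```

Because both the peptide composition and the total glycan mass are preserved, the decoy has the same precursor mass as the candidate and competes for the same MS1 match, while its fragment ions differ.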
The GlycoPAT function names and overall strategy to create decoy glycopeptides are presented in Fig. 4B, along with one example in Fig. 4C. Here, the decoy monosaccharides are represented using numbers corresponding to the mass of the decoy, because monosaccharides in GlycoPAT can be defined using either predefined single letter nomenclature or molecular mass. In this manner, ES is calculated for each "candidate glycopeptide" and its corresponding decoy (Fig. 4D). The global FDR at any ES is then defined based on the ratio of the number of decoy glycopeptides having scores > ES compared with that for the candidate glycopeptides. Fig. 4E-4G presents an example of glycopeptide FDR calculation for trypsin-digested bovine fetuin in CID mode. In Fig. 4E, a cumulative ES score plot is shown for ~850 candidate glycopeptides and their corresponding decoys. As expected, the candidate glycopeptides have a higher ES score. Here, at ES = 0.2, global glycopeptide FDR equals ~18.8% (= 150/800 × 100). Fig. 4F presents the same data following calculation of glycopeptide FDR at each ES value. As seen, glycopeptide FDR increases upon relaxing the ES value (Fig. 4F). To set a conservative selection criterion for minimizing false-positives, the current manuscript uses a 1% FDR. In this example, this corresponds to an ES cut-off value of 0.53 (see inset). Fig. 4G presents the relation between glycopeptide FDR cut-off values and number of candidate glycopeptide spectra identified as true hits. Here, relaxing FDR increases the number of accepted spectra (Fig. 4G). In the fetuin example, 530 of the 850 candidate spectra had ES cut-off >0.53 and FDR <1% (see inset). Because many of these spectra corresponded to the same glycopeptide, the actual number of fetuin glycopeptides identified is smaller.
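The global FDR definition and the choice of an ES cut-off can be written compactly. The Python sketch below is an illustration under stated assumptions rather than the GlycoPAT routine: the function names are hypothetical, and the synthetic score lists are constructed only to reproduce the 150/800 arithmetic quoted above.

```python
def global_fdr(target_scores, decoy_scores, es):
    """Ratio of decoys scoring above es to candidates scoring above es."""
    n_target = sum(s > es for s in target_scores)
    n_decoy = sum(s > es for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0

def es_cutoff(target_scores, decoy_scores, max_fdr=0.01):
    """Smallest observed score threshold whose global FDR is <= max_fdr."""
    thresholds = sorted(set(target_scores) | set(decoy_scores) | {0.0})
    for es in thresholds:
        if global_fdr(target_scores, decoy_scores, es) <= max_fdr:
            return es
    return None

# Synthetic scores: 800 candidates and 150 decoys lie above ES = 0.2.
targets = [0.9] * 800 + [0.1] * 50
decoys = [0.3] * 150 + [0.1] * 700

fdr_at_02 = global_fdr(targets, decoys, 0.2)   # 150 / 800
cutoff = es_cutoff(targets, decoys, max_fdr=0.01)
```

With these synthetic lists the FDR at ES = 0.2 is 0.1875, matching the ~18.8% in the text, and the 1% criterion is first met at a threshold of 0.3, where no decoy remains. Real score distributions are not guaranteed to give a monotonic FDR curve, so scanning from the lowest threshold upward is one of several reasonable conventions.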

Comparing GlycoPAT Scoring with SEQUEST and Byonic-
The scoring results using GlycoPAT were compared with two popular commercial software packages, SEQUEST (32) and Byonic (17) (Fig. 5). Whereas SEQUEST is dedicated to MS-based proteomics data analysis, Byonic extends the classical proteomics methods for glycopeptide analysis. The first two panels compare the GlycoPAT ES for a single fetuin MS run with equivalent metrics in SEQUEST (Fig. 5A) and Byonic (Fig. 5B). As seen, most of the assignments with ES > ES cut-off (0.54) have both high SEQUEST Sf score (>0.6, Fig. 5A) and high Byonic scores (>200, Fig. 5B). Additionally, some assignments with ES < 0.54 also have high Byonic and/or SEQUEST Sf scores, consistent with the conservative practice in GlycoPAT of limiting the number of spectra accepted as true positives based on low glycopeptide FDRs. Upon comparing the glycoproteomics score using GlycoPAT with Byonic, differences were evident because the scoring criteria are not similar (Fig. 5C, 5D). This is because GlycoPAT considers the extensive fragmentation of glycans during scoring, whereas Byonic (in HCD and CID modes) primarily considers only the oxonium ions, nonglycosylated peptide, peptide plus core HexNAc (+ core fucose, if present) and loss of sialic acid (42). In this regard, there is reasonable agreement between both programs when scoring HCD MS/MS spectra because this is dominated by the peptide Y0-ion and small glycopeptide stubs (Fig. 5C). Spectra in quadrant-I score better in Byonic (score >175) because it considers water-losses and some larger glycopeptides that are not considered by GlycoPAT. These peaks, which are unique to Byonic, are indicated by red arrows in Fig. 5E. GlycoPAT scores are higher in quadrant-IV because the SGP1.0 nomenclature enables the simultaneous fragmentation of both the glycan and peptide backbone. In Fig. 5F, several such peaks with simultaneous glycan and peptide breaks are evident (green arrows in Fig. 5F).
Another example spectrum from quadrant-IV is also shown in supplemental Fig. S4, with the raw output window from Byonic and GlycoPAT, contrasted with manual spectrum annotation.
The importance of considering extensive glycan fragmentation is very clear when considering CID data analysis, where the breakage of glycosidic bonds dominates the spectrum (Fig. 5D). Here, several spectra in quadrant-II were identified to be good hits by both programs with ES >0.47 and Byonic score >175 (example in Fig. 5G). The glycopeptides in quadrant IV had high GlycoPAT but low Byonic scores. As seen in the representative spectrum in Fig. 5H, this is because GlycoPAT comprehensively identifies both the glycan B-ions (m/z = 366.2, 657.4) and Y-ions corresponding to progressive monosaccharide losses.
Although some representative spectra are shown in Fig. 5, the conclusions drawn here were generally true for at least three different runs performed in HCD mode on a Thermo Q Exactive instrument, and 8-10 runs performed in CID mode on Thermo LTQ-Fusion. GlycoPAT annotated spectra for additional quadrants are provided in supplemental Fig. S5 for HCD data, and supplemental Fig. S6 for CID. Overall, these results highlight the importance of considering both full glycan fragmentation, and simultaneous glycan and peptide fragmentation during glycoproteomics scoring.

Analysis of Single Proteins, Simple Mixtures and Human Blood Plasma Cryoprecipitate-
Table I summarizes all the N-glycans in the "standard N-glycan GlyDB" that were identified for three single standard proteins (fetuin, asialofetuin, RNase B) and also mixtures of standard proteins (fetuin, AGP-1, fibronectin plus RNase B). These runs were performed following digestion using either trypsin or Glu-C in CID fragmentation mode. They did not apply either chromatography methods to enrich for glycopeptides or HCD product-dependent strategy to select them (43). A 1% glycopeptide FDR cut-off criterion was used for the initial screen followed by manual validation of each spectrum. During such validation, multiple glycans were grouped using curly brackets when the observed glycan fragmentation pattern was consistent with more than one member of GlyPepDB. Unique structural assignments were also possible in some cases. Supplemental Material provides the detailed structures identified for the single proteins (supplemental Table S4) and mixtures (supplemental Table S5), along with individual annotated spectra in jpeg and MATLAB .fig formats.
In the single protein study (supplemental Table S4), N-glycans were identified at the single N-glycosylation site of RNase B (Asn60) and all three putative sites of fetuin and asialofetuin (Asn 99, 156, and 176). Consistent with previous reports (44-49), the glycans of fetuin included sialylated bi-, tri-, and tetra-antennary carbohydrates both with and without fucose. Fucosylated N-glycans are annotated in supplemental Fig. S7A. As seen, the bi-antennary glycans are preferred at Asn99 and Asn156 but not Asn176. supplemental Fig. S7B presents the retention time profiles of various fetuin glycopeptides generated by trypsin digestion. Here, the sialylated biantennary N-glycan on Asn156 eluted first at 47 min, followed by sialylated bi- and tri-antennary structures at Asn156 and finally the larger tri- and tetra-antennary glycans of Asn99 and Asn176. Glycans identified on asialofetuin (supplemental Table S4) are similar to normal fetuin, except that they lack sialic acids. High mannose structures dominate RNaseB as previously reported (50), with one additional hybrid N-glycan (Man3GlcNAc).
In the glycoprotein mixture with four components (supplemental Table S5), 15 N-glycosylation sites were identified including 2 of 3 putative sites on fetuin, 3 of 5 on AGP-1, 9 of 11 on fibronectin and the single site on RNase B. In the case of AGP-1, we detected 14 of the 22 bi-, tri- and tetra-antennary N-linked glycans reported in a previous glycomics profiling study (51). All glycans were sialylated, with glycan structural diversity being greatest at amino acids 56 and 103. The Man5 and Man6 high mannose structures, which are most abundant in RNaseB (52), were measured in the mixed sample. Additionally, complex structures were observed, which is consistent with an earlier multiple-laboratory collaborative investigation (52). The study of fibronectin also revealed the presence of many of the bi- and tri-antennary carbohydrates reported earlier (53), along with additional tetra-antennary structures that are thought to be elevated following oncogenic transformation (54). The current study reports N-glycans at all 7 fibronectin sites reported previously (53), along with additional glycosylation at amino acids 1236 and 1417. Altogether, 72 unique glycopeptides were determined in this mixture including 42 on fibronectin. Several of the site-specific glycosylation data identified here were not reported previously, especially for AGP-1 (52,55) and fibronectin (53).
In a last example, the ability of GlycoPAT to profile human N-linked glycans in a complex mixture was assessed by analyzing plasma cryoprecipitate prepared from 5 ml blood drawn from an O-type blood donor. Here, spectra potentially corresponding to glycopeptides were identified using product-dependent HCD-mode fragmentation. These putative glycopeptides were then fragmented using CID. Here, once a candidate glycopeptide was identified based on MS1 mass match, CID MS/MS was used to determine glycan structure. The presence of a prominent peak corresponding to the underlying peptide with 0-2 HexNAcs (+0-2 Hex or +Fuc in the case of core-fucose) in the HCD MS/MS spectra confirmed the peptide backbone identity. Additionally, the MS1 spectrum was manually inspected to verify that the fragmented ion was the monoisotopic peak. The GlyPepDB library, in this case, included the seven most abundant proteins present in the sample based on sequence coverage. The glycans in this data set included the "standard N-glycan GlyDB" and additional carbohydrate structures corresponding to blood group antigens. This experiment revealed 57 unique glycopeptides, including 22 glycopeptides on von Willebrand Factor (VWF) (supplemental Table S6). Although the glycans identified here were themselves identical to a previous study based on glycomics analysis (56), several novel site-specific glycosylation events or glycopeptides are reported in supplemental Table S6. Detailed structural analysis performed to distinguish between different glycoforms confirmed the presence of core-fucosylated glycans on several VWF glycopeptides. However, unequivocal confirmation of the presence of blood group antigens on VWF was not possible (56), as this requires higher levels of MSn analysis. The single site of glycosylation on plasminogen (PLMN) contained a tetra-antennary glycan. Fibronectin (FINC) had a subset of the glycopeptides identified in the 4-protein mixture studies.
The glycopeptides of fibrinogen were mostly bi-antennary sialylated structures, particularly on the β (FIBB) and γ (FIBG) chains of the protein, consistent with a previous glycomics investigation (57). A higher level of glycan branching was noted on the α-chain (FIBA). Finally, the study identified complex N-glycans at four of the eight potential sites of alpha-2-macroglobulin (58). Overall, the pilot study illustrates the ability of GlycoPAT to analyze the plasma glycoproteome.

Glycoproteomics Analysis of Prostate Cell Lines-
To illustrate the ability of GlycoPAT to identify glycopeptides in complex samples, we analyzed previously published prostate cancer glycoproteomics experiments using GlycoPAT, and compared the findings with Byonic scores (supplemental Table S7 in (36)). The GlyPepDB in this case had 429,841 members. Peptides and glycans used to generate this library are listed in supplemental Tables S7 and S8. Such analysis identified 1441 spectra including 960 unique glycopeptides (supplemental Table S9). 1086 of these spectra were common between GlycoPAT and Byonic (Fig. 6A), and thus identifications with high GlycoPAT ES also typically displayed high Byonic scores. Though the total number of files/spectra identified by both programs was similar, 355 identifications were unique to GlycoPAT with ES >0.5, and 616 were unique to Byonic with score >300 (Fig. 6B). MS/MS spectra uniquely identified using GlycoPAT with ES >0.5 are individually annotated as part of supplemental Material. Many of these were identified because of the unique scoring scheme of GlycoPAT, which is probability based, and which weights simultaneous glycan and peptide fragmentation events. Unique identifications reported by Shah et al. (2015) (36) fell below the GlycoPAT acceptance criteria, primarily either because of differences in the monoisotopic peak assignment or low ES score (Fig. 6C). Overall, the analysis of this complex data set using two independent approaches illustrates the complexity of the glycoproteomics data analysis problem. It suggests that further refinement of the scoring strategy is necessary. Additionally, reliable glycopeptide identifications likely require comprehensive MSn scoring in more than one fragmentation mode.

Fig. 6 (36). A, Dot plot showing summary scores for GlycoPAT (ES) and Byonic, along with contour lines. Each dot represents the same MS/MS spectrum-glycopeptide identification by both GlycoPAT and Byonic. High GlycoPAT scores generally agreed with Byonic assignments, though some scatter in the data is evident. B, Venn diagram showing overlap in the number of spectra identified using GlycoPAT and Byonic. Most of the identifications were common, though some hits were unique to GlycoPAT because of consideration of simultaneous fragmentation events on the glycan and peptide backbone. C, 620 files uniquely identified by Byonic were not identified by GlycoPAT. This was because of differences in precursor monoisotopic mass assignment, lower than acceptable ES score or library used for the search.

DISCUSSION
This manuscript presents a well-documented, open-source software for glycoproteomics data analysis. The program presents several novel concepts and commonly used functions for glycopeptide digestion, database generation, glycopeptide fragmentation, spectrum scoring and glycopeptide FDR calculations. The scoring scheme developed has been validated for CID, HCD and ETD modes, primarily for N-glycans and O-linked glycans. The focus of the current effort was on software development, rather than the discovery of new biology, because open-source, easy-to-use, modular computational resources are currently lacking in the field of glycoproteomics. This represents a major research bottleneck that hampers the field (7,10). To this end, the current program comes with systematic class definitions, modular design, comprehensive documentation and online tutorials to facilitate program expansion by various investigators in the field. Using this tool, arbitrary monosaccharides, PTMs and fragmentation rules for additional methods like EThcD (59) can be rapidly introduced to enable tandem-MS data analysis (detailed examples in user manual).
Additionally, the program is written in MATLAB (with JAVA libraries) because the presence of a vast library of well-written MATLAB functions will enable the rapid expansion of GlycoPAT capabilities without the need for extensive coding. This includes functions for GUI development, statistical analysis, text manipulation, data visualization and table handling. Using a 5-node, 60-core computing cluster and default GlycoPAT settings, ~90,000 MS spectra for a single plasma cryoprecipitate run can be analyzed in 8 h against a ~50,000-member GlyPepDB. Finally, the use of MATLAB facilitates the ready integration of programming modules developed in this package with other programs in the field of Systems Glycobiology that already use the same platform (7,23,60). Together, these developments aim to make the field of glycoproteomics more accessible to the larger biological community.
Unlike proteomics-based programs like Byonic, GlycoPAT is glycan-centric, with a greater focus on carbohydrate fragmentation/structure analysis based on CID-mode data. In this regard, low-energy CID yields a pattern of glycan B-/Y-ions that reflects the carbohydrate assembly. The analysis of this fragmentation pattern enables partial validation of the glycan structure, but it misses the underlying peptide. To address this limitation, GlycoPAT also has facilities for high-energy HCD and ETD MSn spectra analysis, as these provide clues regarding the glycosylation site. Here, HCD enables the identification of glycopeptide fragmentation spectra because of the release of prominent oxonium-ions at low m/z, and it also fragments the peptide backbone. Additionally, as shown in Fig. 5, simultaneous glycan and peptide backbone fragmentation is also a common occurrence in HCD and must be considered during scoring. Unlike the collision activated dissociation modes, ETD predominantly leaves the glycan intact, but it results in glycopeptide backbone fragmentation. This mode is, however, most useful only for multiply charged glycopeptides with precursor m/z <1000. Because of the above, the ideal experimental workflow and data analysis software should combine complementary information emerging from fragmentation of the same peptide in all three fragmentation modes to arrive at a final identification.
Samples analyzed by GlycoPAT thus far include standard glycoproteins, simple mixtures, the most abundant entities in plasma cryoprecipitate and prostate cancer cell line preparations. The ES, measured in the current version, is limited to MS/MS spectrum analysis. It is anticipated that future versions of the program will include more complex ES estimates that integrate the individual scores from HCD, ETD, and CID fragmentation modes at multiple MSn levels. Additionally, as shown recently, there are several challenges with glycopeptide identification in complex mixtures that are not handled well in current software (35,61). These "challenging assignments" occur because of the following identical or near-identical mass balances that can lead to false identifications, especially in complex mixtures where the search library size is large: (1) Neu5Gc − Neu5Ac = Hex − Fuc = oxidation mass (35); (2) 2 × Fuc − Neu5Ac = 1 Da (35); (3) HexNAc − Fuc = carbamidomethyl modification (61); (4) HexNAc − Hex = carbamidomethyl − oxidation (61); (5) Asn/Gln deamidation. As suggested by Darula et al. (61), additional software development, analysis of data from more than one fragmentation mode, and consideration of LC retention time are necessary to handle these challenging assignments.
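These mass coincidences can be checked directly from monoisotopic masses. The short Python sketch below uses standard residue and modification masses (rounded to five decimals) that are not taken from this manuscript; the variable names are illustrative.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
HEX = 162.05282      # hexose, C6H10O5
HEXNAC = 203.07937   # N-acetylhexosamine, C8H13NO5
FUC = 146.05791      # fucose (deoxyhexose), C6H10O4
NEU5AC = 291.09542   # N-acetylneuraminic acid, C11H17NO8
NEU5GC = 307.09033   # N-glycolylneuraminic acid, C11H17NO9

# Monoisotopic mass shifts (Da) of common peptide modifications.
OXIDATION = 15.99491        # +O
CARBAMIDOMETHYL = 57.02146  # +C2H3NO
DEAMIDATION = 0.98402       # Asn/Gln -> Asp/Glu

coincidences = {
    "Neu5Gc - Neu5Ac": NEU5GC - NEU5AC,
    "Hex - Fuc": HEX - FUC,
    "oxidation": OXIDATION,
    "2 x Fuc - Neu5Ac": 2 * FUC - NEU5AC,
    "deamidation": DEAMIDATION,
    "HexNAc - Fuc": HEXNAC - FUC,
    "carbamidomethyl": CARBAMIDOMETHYL,
    "HexNAc - Hex": HEXNAC - HEX,
    "carbamidomethyl - oxidation": CARBAMIDOMETHYL - OXIDATION,
}

for label, delta in coincidences.items():
    print(f"{label:30s} {delta:9.5f} Da")
```

Cases (1), (3) and (4) are exact elemental coincidences, so no mass accuracy can separate them at the precursor level. Case (2) differs from a one-isotope shift and from deamidation by only a few hundredths of a dalton, which is why monoisotopic peak assignment matters for these identifications.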
Although the current manuscript introduces an extensive computational infrastructure for glycoproteomics analysis, additional functional modules are currently being developed to handle more complex experimental workflows and to reduce computational time. Specifically, the current version uses the SGP1.0 nomenclature as it accommodates arbitrary monosaccharide types and enables easy in silico glycopeptide fragmentation at multiple locations. This format is currently being expanded to also accommodate bond linkage information. Additionally, a new module called DrawGlycan is being integrated into GlycoPAT to render high-quality glycan drawings, including bond fragmentation data, in the final annotated output spectrum (60). Modules are also being added to quantitatively discriminate between isomeric glycan structures that share the same precursor mass. Finally, the current code has been validated using single glycoproteins, glycoprotein mixtures or plasma cryoprecipitate, without the implementation of glycopeptide enrichment strategies. Such enrichment methods using lectins, ion-exchange or other specialized columns may enhance the fidelity of the glycopeptide identifications. Together, these advancements, along with the use of quantitative MS methods, are planned because they can reveal new details of the heterogeneous glycoproteome that are currently masked by the more abundant nonglycosylated entities.