Deep sequencing of complex proteoglycans: a novel strategy for high coverage and site-specific identification of glycosaminoglycan-linked peptides

Proteoglycans are distributed in all animal tissues and play critical, multifaceted physiological roles. Expressed in a spatially and temporally regulated manner, these molecules regulate interactions among growth factors and cell surface receptors and play key roles in basement membranes and other extracellular matrices. Because of the high degree of glycosylation by glycosaminoglycan (GAG), N-glycan and mucin-type O-glycan classes, standard mass spectrometry-based proteomics methods give poor peptide sequence coverage of complex proteoglycans. As a result, there is little information concerning how proteoglycan site-specific glycosylation changes during normal and pathological processes. Here, we developed a workflow to improve sequence coverage and identification of glycosylated peptides in proteoglycans. We applied this workflow to the small leucine-rich proteoglycan decorin and the hyalectan proteoglycans neurocan, brevican, and aggrecan. We characterized glycosylation of these proteoglycans using LC-MS methods easily implemented on instruments widely used in proteomics laboratories. For decorin, we assigned the linker-glycosite and three N-glycosylation sites. For neurocan and brevican, we identified densely glycosylated mucin-like regions in the extended domains. For aggrecan, we identified 50 linker-glycosites and mucin-type O-glycosites in the extended region and N-glycosites in the globular domains, many of which have not been observed previously. Most importantly, we demonstrate an LC-MS and bioinformatics approach that will enable routine analysis of proteoglycan glycosylation from biological samples to assess their role in pathophysiology.


Introduction
At present, 43 distinct proteoglycan genes have been reported (1), and their products are found ubiquitously in intracellular vesicles, on cell surfaces, in pericellular spaces and in extracellular matrices. Proteoglycan backbone sequences contain modules, conserved among family members, that bind to extracellular matrix collagens, glycoproteins, and hyaluronan to form three-dimensional networks, the structures and functions of which are tissue-, development-, and disease-specific. The functions of this relatively small number of gene products are elaborated by spatially and temporally regulated glycosylation (2). These molecules are of interest for their roles in the dysregulated cell signaling mechanisms associated with a wide variety of cancers (3)(4)(5)(6)(7)(8)(9) and for understanding the roles that extracellular matrix structural cues play in permitting cancer growth, angiogenesis and metastasis (10).  (11)(12)(13). Proteoglycans can carry modifications to the linker tetrasaccharide including sulfation of one or both Gal residues (14,15). Fucosylation of Xyl and sialylation of Gal adjacent to Xyl have been reported for bikunin (16).
The small leucine-rich proteoglycans (SLRPs), including 18 gene products and many splice variants, comprise the largest family of leucine-rich repeat-containing proteins. They contain homologous 24 amino acid leucine-rich repeats and protein cores of 36-42 kDa (1). Two of the five SLRP classes are non-canonical and are not modified with GAG chains. SLRP biological functions arise in part through multifaceted protein binding interactions. The crystal structure of decorin, the most studied family member, shows a curved solenoid structure that typifies the SLRPs (17). Protein binding occurs primarily through the amino acid side chains on the concave side of the solenoid. The SLRPs bind and modulate the structures of collagens. They also impact cellular signaling through their interactions with receptor tyrosine kinases and growth factors.
The hyalectans comprise aggrecan, versican, brevican and neurocan, each of which contains N-terminal globular G1 domains with homology to link proteins that bind the hyaluronan (HA) polysaccharide, anchoring the molecules to the cell surface. All contain a C-terminal G3 domain with EGF-like repeats, a complement regulatory module, and a C-type lectin module (1).
Aggrecan contains a G2 domain, adjacent to G1, which has link homology. The hyalectans also contain N-glycosylation sequons in the globular domains that have been studied using mass spectrometry (18).
Much of the information regarding patterns of proteoglycan expression derives from western blots, immunohistochemistry, and other antibody-based studies (19,20). While such antibodies indicate the presence of individual proteoglycan epitopes, the epitope structures can be difficult to determine and the level of antibody specificity is often unclear. Lectins are also used for staining tissue proteoglycans but leave the underlying proteoglycan structure undefined (21)(22)(23).
Using conventional proteomics workflows, protein levels are inferred from the abundances of a few representative unmodified peptides. Since small, homogeneous PTMs including phosphorylation, acetylation, and methylation can be predicted readily from genomics databases, they have been studied widely using proteomics (24,25). Complex glycosylation alters the dissociation behavior of peptides, and is heterogeneous as a rule. These properties multiply the number of molecular forms (proteoforms) that must be considered in a proteomics search space. As a result, specialized analytical and bioinformatics methods are necessary (26,27).
Enzymatic removal of GAG chains leaves a linker tetrasaccharide plus one disaccharide attached to the protein or peptide. Larson et al. identified characteristic oxonium ions in tandem mass spectra of linker-peptides of CS and HS proteoglycans (16). They also observed a set of linker substituents, including sulfate, phosphate, fucose, and sialic acid (28)(29)(30). They used this approach to identify proteoglycan core proteins based on the linker-peptide sequences. In a study of CSPGs enriched from human urine or cerebrospinal fluid, these authors identified linker-glycopeptides from proteoglycans including decorin, brevican and neurocan (30).
Our goal is to characterize proteoglycan glycosylation as completely as possible. We focus in this work on one member of the SLRP family (decorin) and three hyalectans (neurocan, brevican and aggrecan). We demonstrate analytical and bioinformatics workflows that address proteoglycan complexity. The LC-MS measurements were conducted using widely available instrumentation. We used our publicly available, open source GlycReSoft software to calculate bioinformatics search space sets for proteoglycan samples and assign glycopeptides. We assigned the proteoglycan N-, mucin-type O-, and GAG-glycosylation by calculating theoretical glycopeptide lists against which higher energy dissociation LC-MS data were searched.

Materials and methods:
Materials: Bovine articular cartilage aggrecan, bovine articular cartilage decorin, chymotrypsin, endoproteinase Glu-C, dithiothreitol, iodoacetamide, sodium dodecyl sulfate, urea, MS-safe protease and phosphatase inhibitors, and chondroitinase ABC were purchased from Sigma-Aldrich (St. Louis, MO). Recombinant human neurocan and brevican were purchased from R&D Systems. As described in the manufacturer data sheet (Figure S 1), the human neurocan was expressed with a C-terminal 6-His tag in Chinese hamster ovary cells with a predicted core protein molecular weight of 141.4 kDa. The molecular weight observed using reducing SDS-PAGE was 140-290 kDa. Activity was determined by measuring enhancement of neuroglioma cell adhesion. As described in the manufacturer data sheet (Figure S 2), human brevican was expressed in a mouse myeloma cell line with a C-terminal 6-His tag with a predicted core protein molecular weight of 97.7 kDa. The molecular weight observed using reducing SDS-PAGE was 130-160 kDa. Activity was assessed using a functional ELISA assay in which brevican bound biotinylated hyaluronan with an estimated Kd < 3 nM. Trypsin/Lys-C mix was purchased from Promega Biosciences (Madison, WI). The BCA protein quantification kit was purchased from Thermo-Fisher Scientific (Waltham, MA).
Proteoglycan preparation and digestion: Proteoglycans were dissolved in water and protein concentration was measured using a BCA kit. For decorin, brevican and neurocan, proteolytic digestion was carried out using a trypsin and Lys-C mixture (Promega Corporation, Madison, WI) at a 1:50 enzyme:protein ratio with 100 µg proteoglycan in 100 µL of 50 mM ammonium bicarbonate containing 0.025% ProteaseMax surfactant (Promega) at 37 °C overnight. To increase peptide coverage for aggrecan, Glu-C digestion was performed following trypsin/Lys-C digestion. All digestions were performed on a Thermo mixer with 30 sec shaking after 15 min. Enzymes were inactivated by heating at 98 °C for 5 min.
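The sequential digestion described above can be illustrated with a minimal in-silico sketch. It assumes idealized cleavage rules (trypsin/Lys-C C-terminal to K and R with no proline exception, Glu-C C-terminal to E only, no missed cleavages); the function name `cleave` and the toy sequence are hypothetical and are not taken from the workflow used in this study.

```python
def cleave(peptides, residues):
    """Split each peptide C-terminal to any residue listed in `residues`."""
    out = []
    for pep in peptides:
        start = 0
        for i, aa in enumerate(pep):
            if aa in residues:
                out.append(pep[start:i + 1])
                start = i + 1
        if start < len(pep):          # keep the C-terminal remainder
            out.append(pep[start:])
    return out

# Hypothetical toy sequence, not a real aggrecan region
toy = "ASGEPKLSGVERTSGE"
tryptic = cleave([toy], "KR")         # trypsin/Lys-C step (idealized)
tryptic_gluc = cleave(tryptic, "E")   # subsequent Glu-C step (idealized)
```

Applying the second protease to the products of the first shortens the peptides, which is the reason given above for adding Glu-C in the aggrecan digests.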
Enrichment of GAG-linked peptides: The enrichment workflow is shown in Figure S 4.
Proteolytic digests were loaded on a 0.5 mL 10 kDa molecular weight cutoff filter (Millipore, Burlington, MA), and centrifuged at 14,000 × g for 20 min. A 25 µL volume of 0.5 M NaCl was added to the retentate and the filter was centrifuged at 14,000 × g for 10 min. A 50 µL volume of water was added to the retentate and the filter was centrifuged at 14,000 × g for 10 min. GAG-linked peptides were retained by the 10 kDa molecular weight cutoff filter. Chondroitinase ABC (5 mU) was added to the retentate and incubated at 37 °C overnight. The solutions were filtered through a 10 kDa molecular weight cutoff filter as above. Fractions were analyzed using reversed-phase LC-MS as described below. Plots of the extracted ion chromatograms for the ΔHexAHexNAc oxonium ion at m/z 362.1 showed that the fraction retained by the 10 kDa filter contained the linker-peptides (Figure S 5). As will be described in detail for each proteoglycan sample, CS-linker-peptides were enriched in the retained fraction.
Liquid chromatography mass spectrometry: Enriched digests were cleaned with Omix C-18 tips (Agilent Technologies, Santa Clara, CA) and 100 ng quantities were loaded on a Q-Exactive HF mass spectrometer (Thermo Scientific, San Jose, CA) equipped with an Advion NanoMate nanoESI source, coupled to a Waters nanoAcquity nanoLC system. A Waters XBridge reversed-phase column (150 μm × 100 mm) with 1.7 μm BEH C18 resin and a Waters trap column (180 μm × 20 mm) packed with 5 μm Symmetry™ C18 stationary phase were used for online desalting and separation of the proteomics samples as previously described (31). The mass spectrometer was programmed to acquire data-dependent tandem MS using the top 20 most
Mass Spectral Preprocessing using GlycReSoft (33,34)
The workflow for analysis of glycoproteomics samples using GlycReSoft is shown in Figure S  and proceeding it to boost signal and stabilize isotopic pattern consistency. We provide all raw mass spectral data for readers to verify our results using the GlycReSoft software. We also provide the GlycReSoft reports as supplemental files in interactive HTML format that show assigned and annotated tandem mass spectra.
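The extracted ion chromatogram used to track linker-peptides can be sketched as follows. This is a minimal illustration, assuming tandem MS scans are available as (retention time, peak list) pairs; the data layout, the 0.01 Da extraction window, and the toy scans are assumptions made for this example and do not describe the software used in this study.

```python
def extracted_ion_chromatogram(scans, target_mz=362.1, tol=0.01):
    """Sum the intensity within +/- tol (Da) of target_mz in every scan.

    `scans` is a list of (retention_time_min, [(mz, intensity), ...]) tuples.
    """
    xic = []
    for rt, peaks in scans:
        total = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, total))
    return xic

# Hypothetical toy scans: only the first and third contain the oxonium ion
scans = [
    (10.0, [(204.087, 1e5), (362.105, 5e4)]),
    (10.5, [(204.087, 8e4)]),
    (11.0, [(362.098, 2e4), (362.103, 1e4)]),
]
xic = extracted_ion_chromatogram(scans)
```

Comparing such traces between the retained and flow-through fractions indicates where the linker-peptides elute and in which fraction they are enriched.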

Glycopeptide Database Construction
We constructed a glycomics search space that included N-, mucin-type O-, and linker saccharides. We used a combinatorial search space for N-glycans (Hex0-3, HexNAc2-8, Fuc0-4, NeuAc0-4; HexNAc > Fuc; HexNAc>NeuAc; sulfate = HexNAc -2). We manually curated a list of 148 human mucin-type O-glycans with NeuAc and 427 bovine mucin-type O-glycans with NeuAc or NeuGc. We manually curated a list of 20 saccharides that corresponded to the linker tetrasaccharide plus ΔHexA-HexNAc disaccharide plus 0-3 sulfate, 0-1 Fuc, 0-1 phosphate, 0-1 NeuAc. For human proteoglycans, the search space contained only NeuAc (a total of 3025 compositions). For bovine proteoglycans, both NeuAc and NeuGc were allowed (a total of 8791 compositions). We constructed a glycopeptide search space for each sample using the mzIdentML file containing proteomics search results for that file, using all identified peptides as well as all theoretical peptides produced by an in-silico digest of those proteins with one glycosylation site. In order to build a search space for bi-glycosylated glycopeptides, we used results from the search for singly glycosylated peptides. All search space files have been posted to the Pride archive along with the raw MS data (Pride number will be given in the final accepted manuscript).
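The combinatorial part of the N-glycan search space can be sketched by enumerating monosaccharide counts under the stated rules. The sketch below applies only the Hex, HexNAc, Fuc and NeuAc ranges together with the HexNAc > Fuc and HexNAc > NeuAc constraints; the sulfate rule is deliberately left out, so the number of compositions it yields is not expected to match the totals reported above. The function name is hypothetical and this is not the GlycReSoft implementation.

```python
from itertools import product

def n_glycan_compositions():
    """Enumerate Hex0-3, HexNAc2-8, Fuc0-4, NeuAc0-4 with HexNAc > Fuc, NeuAc."""
    comps = []
    for hexose, hexnac, fuc, neuac in product(
        range(0, 4), range(2, 9), range(0, 5), range(0, 5)
    ):
        if hexnac > fuc and hexnac > neuac:
            comps.append(
                {"Hex": hexose, "HexNAc": hexnac, "Fuc": fuc, "NeuAc": neuac}
            )
    return comps

compositions = n_glycan_compositions()
```

Each composition would then be paired with candidate peptides to build the glycopeptide search space; rule-based pruning of this kind keeps the number of candidates manageable.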

Glycopeptide Database Search Identification
We carried out glycopeptide identification by searching the preprocessed mass spectra against the associated databases using a 10 parts-per-million (ppm) error tolerance for the precursor mass and a 10 ppm error tolerance for product ions. We aggregated identified glycopeptides with a spectrum-level FDR of 5%. Our final result set contains only those identified in the search of the two glycosylation site database. The GlycReSoft reports have been provided as supplementary information.
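As an illustration of the mass tolerance criterion, the following minimal sketch computes a parts-per-million error and tests it against a 10 ppm window. The function names are hypothetical; the example mass is that of the decorin linker glycopeptide in Figure 1, and the offsets are arbitrary.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts-per-million."""
    return (observed - theoretical) / theoretical * 1e6

def within_tolerance(observed, theoretical, tol_ppm=10.0):
    """True when the absolute ppm error does not exceed tol_ppm."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm

theoretical = 4354.5765                                   # Da
close_match = within_tolerance(theoretical + 0.02, theoretical)  # about 4.6 ppm
far_match = within_tolerance(theoretical + 0.06, theoretical)    # about 13.8 ppm
```

Because the window scales with mass, a fixed ppm tolerance corresponds to a wider absolute window (in Da) for larger glycopeptides.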
The scoring function in GlycReSoft ranges between 0 and positive infinity, being based on -10 * log10(p value) from a binomial test of the matched peak intensity against all peaks + -10 * log10(p value) from a binomial test of the number of matched fragments given the number of theoretical fragments and mass accuracy constraints, weighted by the glycopeptide sequence coverage. This score is further augmented by a small bias towards higher mass accuracy following a Gaussian distribution, and bias towards glycan compositions which contain signature ions present in the matched scan. Larger scores correspond to higher confidence assignments, and a score below 20 tends to be unreliable. The glycopeptide false discovery rate is estimated using the target-decoy strategy (35), and for each glycopeptide the reported q-value (36) is the lowest FDR at which a glycopeptide can be accepted.
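The -10 * log10(p value) form of one score component can be illustrated with a simple binomial tail calculation. The sketch below covers only a fragment-count term, treats the per-fragment chance-match probability as a hypothetical input, and omits the intensity term, coverage weighting, mass accuracy bias and signature ion bias; it is not the GlycReSoft implementation.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )

def neg_log10_score(k, n, p):
    """-10 * log10 of the probability of matching >= k of n fragments by chance."""
    p_value = max(binomial_tail(k, n, p), 1e-300)  # guard against log10(0)
    return -10 * math.log10(p_value)

# Hypothetical example: 20 theoretical fragments, 5% chance-match probability
weak = neg_log10_score(3, 20, 0.05)
strong = neg_log10_score(8, 20, 0.05)
```

Matching more fragments than expected by chance lowers the p value and raises the score, consistent with larger scores indicating higher confidence assignments.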

Glycopeptide identification comments
In many cases O-glycopeptides did not produce adequate glycosylated product ions to localize the glycosylation site. In other instances, O-glycopeptides could be solved by using a single large O-glycan or with two smaller O-glycans which sum to the same composition. When no discriminating ions were present, we reported all possible configurations supported by the data.
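The ambiguity between one large O-glycan and two smaller O-glycans of the same summed composition can be made concrete with a short sketch. It lists every unordered pair from a candidate list whose compositions add up to the observed total; the candidate list and function name are hypothetical and much smaller than the curated lists used in this study.

```python
from collections import Counter
from itertools import combinations_with_replacement

def pairs_summing_to(total, candidates):
    """Return all unordered pairs of candidate compositions that sum to `total`."""
    target = Counter(total)
    hits = []
    for a, b in combinations_with_replacement(candidates, 2):
        if Counter(a) + Counter(b) == target:
            hits.append((a, b))
    return hits

# Hypothetical candidate O-glycan compositions
candidates = [
    {"HexNAc": 1},
    {"HexNAc": 1, "Hex": 1},
    {"HexNAc": 1, "Hex": 1, "NeuAc": 1},
    {"HexNAc": 2, "Hex": 2, "NeuAc": 1},
]
observed_total = {"HexNAc": 2, "Hex": 2, "NeuAc": 1}
alternatives = pairs_summing_to(observed_total, candidates)
```

In this toy case the observed total is explained either by the single largest candidate or by one pair of smaller glycans, which is the situation in which all configurations are reported unless discriminating ions are present.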
In some scans, we observed additional HexNAc-bearing peptide backbone product ions.
Many of the linker glycopeptides of aggrecan contain short recurring peptide subsequences, making the glycosylation at each occurrence ambiguous. As these digest products would be indistinguishable under the experimental conditions, we assigned all evidence for each recurring glycopeptide to all valid sites. Different proteases would be necessary to separate the glycosylation patterns at each recurring subsequence using the flanking non-redundant sequences.
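The assignment of evidence to all valid sites can be illustrated by locating every occurrence of a recurring peptide in a protein sequence. The sketch below is a minimal example with a hypothetical toy sequence; positions are 1-based and overlapping occurrences are allowed.

```python
def find_occurrences(protein, peptide):
    """Return 1-based start positions of every occurrence of `peptide`."""
    positions = []
    start = protein.find(peptide)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(peptide, start + 1)
    return positions

# Hypothetical toy sequence with a recurring subsequence
protein = "MKEGPSGEASGVEGPSGELSGVEGPSGEK"
sites = find_occurrences(protein, "GPSGE")
```

A glycopeptide with this backbone would map to every returned position, so its evidence cannot be attributed to a single site without flanking non-redundant sequence.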
Sulfated N-glycopeptides were identified by precursor exact mass and the presence of sulfated oxonium ions. Because the intact masses of sulfate + 2 HexNAc (504.1261 Da) and 3 Hexose (504.16903 Da) differ by only about 0.043 Da, these compositions could be discriminated at a mass accuracy of 10 ppm only for glycopeptides below 4291.126 Da.
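The mass limit for discriminating these two compositions follows from dividing their mass difference by the relative tolerance. The sketch below assumes the simple criterion that two candidates are distinguishable when their mass difference exceeds the tolerance window at the precursor mass; the function name is hypothetical, and the result (roughly 4.29 kDa) agrees with the stated limit to within the rounding of the reference masses.

```python
SULFATE_2HEXNAC = 504.1261   # Da, value quoted in the text
THREE_HEX = 504.16903        # Da, value quoted in the text

def max_resolvable_mass(delta_da, tol_ppm):
    """Largest mass at which a difference of delta_da exceeds tol_ppm."""
    return delta_da / (tol_ppm * 1e-6)

delta = THREE_HEX - SULFATE_2HEXNAC      # about 0.043 Da
limit = max_resolvable_mass(delta, 10.0)  # Da
```

Above this mass, exact mass alone no longer separates the candidates at 10 ppm, so sulfated oxonium ions are needed to support the assignment.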
We digested bovine articular cartilage decorin with trypsin and filtered the resulting peptides using a 10 kDa molecular weight cutoff membrane. The retained fraction was digested using chondroitinase ABC to remove CS chain repeats, leaving a disaccharide and linker tetrasaccharide attached to the core protein. A proteomics database search determined that the sample contained decorin and significant quantities of cartilage oligomeric matrix protein, the hyalectan aggrecan, and the SLRPs biglycan and fibromodulin (Figure S 7B). We used the proteomics search export to build a glycopeptide search space that included N-glycosylation, mucin-type O-glycosylation and linker saccharides.
Peptide 30-DEASGIGPEEHFPEVPEIEMGPVCPFR-58 corresponded to the decorin N-terminal peptide, containing acidic residues followed by an SG sequence as a consensus site for GAG modification (49). Figure 1 shows an example linker-peptide tandem mass spectrum for this peptide with a trisulfated linker saccharide. While the peptide sequence is clear, the fragment ion masses do not definitively assign the glycosylation to the SG position. The assignment was therefore made from the consensus xylosylation site (50). Figure 2 shows the set of linker-peptide glycoforms detected at position 33 (51). The two most abundant of these carried three and two sulfate groups, respectively. The abundances for those linkers with no intensity bar fell below 1e7. All of these linker compositions were attached to peptide backbones assigned with high confidence. The precursor mass errors are too small for the linker compositions to be confused with one another, so the absence of an EIC for some compositions likely reflects low abundance or incomplete deconvolution. Others have reported the presence of fucose and sialic acid substitutions on the linker saccharide of bikunin (16,52). We identified sialylated linker compositions with NeuAc and with NeuGc. We detected bi- and mono-sulfated linker compositions but no fucosylated compositions for this decorin sample. Xylose phosphorylation regulates proteoglycan biosynthesis (13) and occurs sub-stoichiometrically in mature proteoglycans (12,53), possibly reflecting a balance of kinase and phosphatase activities. For decorin, the xylose-phosphate-containing linker saccharides were low in abundance.
Decorin N-glycosylation helps regulate core protein transit and secretion (54).  coverage. This included mucin-type O-glycosylation of peptide 339-346 near the decorin C-terminus based on a tandem mass spectrum score of 46.054 (see supplemental files). As far as we know, this is the first report of comprehensive glycopeptide assignments for decorin, and it demonstrates an approach that will be useful for comparative studies on site-specific glycosylation.
The proteomics database search identified aggrecan in the decorin sample with high coverage for the G1 and G2 globular domains and low coverage for the extended and G3 domains (Figure S 8A). An abundant set of N-glycopeptides was detected that corresponded to positions 332 in the G1 and 666 in the G2 domains (Figure S 8B, 3C). We identified no aggrecan linker-peptides in this sample. This is consistent with the conclusion that a population of truncated aggrecan molecules co-purified with decorin.
We also detected biglycan in the decorin sample. The biglycan sequence contains two SG attachment sites in the N-terminal region and was given an eponym to indicate two GAG chains (50). Biglycan tryptic peptide coverage from the proteomics database search of the decorin sample was 58%, indicating that this SLRP co-purified with decorin. We identified an abundant

Glycoproteomics of brain hyalectan proteoglycans
Neurocan. Consisting of an interacting network composed of hyaluronan, hyalectans and other glycoproteins, perineuronal nets are essential in control of plasticity in the adult brain (21).
Interestingly, enzymatic digestion of CS chains, the bulk of which are bound to hyalectan core proteins, restores synaptic plasticity temporarily in adult brain (56) and memory in mice with tau pathology (57). In addition, perineuronal net structures become altered during aging, repair, learning, memory and in response to drugs of abuse (21). Much of the understanding of the roles of hyalectans in brain relies on immunohistochemical staining and/or genetic expression analyses. Therefore, in order to understand brain physiology and pathophysiology, it is necessary to integrate knowledge of extracellular matrix proteoglycan structure.
Neurocan is the predominant hyalectan in developing brain (58,59) and is upregulated in response to brain injury (60). In adults, neurocan is associated with perineuronal nets (PNNs) that surround axons of some neurons (61) (Figure S 11C, D). After inclusion of glycopeptides, the total coverage of neurocan was 59%. An example tandem mass spectrum of a glycopeptide modified with two mucin-type O-glycans is shown in Figure 4. The spectrum shows abundant peptide backbone fragmentation (blue and red ions) and peptide+saccharide ions (goldenrod-color ions). The peptide backbone product ions identify the glycosylation sites unambiguously but do not directly determine the glycan composition at each site. Because hyalectan proteoglycans are extremely complex, we report the most reasonable localization supported by the data.
We detected linker peptides proximal to the G1 domain in the range of 360-415 (Figure S 11D).
While the peptide sequence and glycan composition were assigned with confidence, the glycosite was ambiguous. The most abundant neurocan linker saccharide composition contains xylose phosphate and no sulfation (Figure 5). The neurocan linker-glycopeptides are shown in Figure S 11E. The average neurocan linker composition is significantly less highly sulfated than that of decorin but distinct from that of aggrecan, as discussed below. In particular, linker with phosphorylated xylose was significantly more abundant in neurocan than in aggrecan. Figure 6 shows the overall coverage including the abundant mucin-type glycosylation in the extended domain. This is the first detailed map of glycosylation in the neurocan extended domain. Using this methodology, investigators will be able to determine changes to neurocan extended domain glycosylation associated with neuropsychiatric and other brain diseases.
Brevican is an important component of PNNs expressed in a developmentally regulated manner (63). Secreted by neurons and glial cells and incorporated into the brain extracellular matrices, its expression is upregulated in response to injury (64) and in brain tumor microenvironments, and has been associated with tumor aggressiveness (65,66) Figure 7 in which glycosylation is localized to the sequon by the presence of y13-y26 ions that include saccharide modifications. The extended region contains a mucin-like array of O-glycans, including two sites of proline hydroxylation (Figure S 12D). Several multiply glycosylated peptides were detected, an example of which is shown in Figure 8. The SL and TR glycosites were assigned based on the absence of saccharide modification for y1-y5, and the presence of saccharide modifications for y10-y24. Since the glycans undergo significant collisional dissociation, it was not possible to determine the glycan mass at each of the glycosites. The total brevican coverage after inclusion of glycopeptides was 79%, including 52 glycosites (Figure 9). To the best of our knowledge, this is the first report of a mucin-like domain that includes proline hydroxylation sites for brevican.
Aggrecan is a large and complex hyalectan, heavily modified by CS chains in the extended domains. The average aggrecan molecule has been estimated to carry 90 CS chains based on the total masses of core protein, CS, and the average mass of a CS chain (68). The bovine aggrecan sequence has 141 SG and 35 SA sub-sequences that may be considered candidate CS attachment sites (69). It also contains 9 N-glycosylation sequons and many potential mucin-type O-linked glycosylation sites. The protein coverage for aggrecan reported in UniprotKB is confined primarily to the globular domains (Figure S 13A). To the best of our knowledge, site-specific identification of peptides with CS chains in aggrecan has not been reported previously.
In initial experiments we found that use of trypsin as the only protease produced extended region glycopeptides that were too large for analysis using HCD LC-tandem MS. We therefore  Bi-glycosylated linker-peptides were abundant in the trypsin/Glu-C digest dataset. Figure 10 shows a representative tandem mass spectrum of an aggrecan peptide modified with two linker saccharides. Although peptide backbone fragment ions were sufficiently abundant to assign both sites of xylosylation unambiguously, the glycan composition at each site could not be assigned. For many of the bi-glycosylated peptides, only one of the glycosites could be assigned confidently based on peptide backbone product ions. As a result, it is difficult to compare differences in linker-saccharide composition at different aggrecan glycosites. Looking at the average linker saccharide composition for aggrecan (Figure 11), the most abundant glycoforms have 0 sulfate groups and a very low degree of xylose phosphorylation. Neurocan linker peptides showed a similar extent of sulfation but much higher xylose phosphorylation (Figure 5). By contrast, the most abundant decorin linker-peptides had 3 sulfate groups. In addition, decorin linker-saccharide showed sialic acid substitution (Figure 2).
Bovine aggrecan contains 144 SG and 35 SA sequences. We assigned a total of 50 linker glycosites, 41 to SG and 9 to SA. The glycosites corresponded to 9 different tri-amino acid sequences (Figure 12A). The most abundant tetra-amino acid glycosites were PSGE (12 instances), ASGV (8) and ISAS (7). Among XSG glycosites, P was the most common residue in the first position (40%) followed by A (24%), and the fourth position was occupied most commonly by V (31%) and E (29%) (Figure 12B, C). We compared the expression of XSG sequence motifs in aggrecan among 9 mammalian species (Figure S 14A) and observed the order of abundance of PSG>ASG>LSG, VSG. The expression of SGX sequence motifs showed an order of abundance of SGE>SGL>SGV (Figure S 14B). It therefore appears that the types of motifs occupied by linker mirror those present in the aggrecan protein sequences.
Among SA glycosites, 70% had I in the first position and all had S in the fourth position (Figure 12D, E). All SA glycosites occurred in SASG sequences. In many cases such sequences were modified with two linker-saccharides, making assignment of the SA glycosites unambiguous.
The two aggrecan N-glycosites occurred at NQT sequons in homologous peptide sequences in the G1 (position 332) and G2 (position 666) domains, respectively. The G1 site was occupied by a set of complex N-glycoforms ranging from bi- to tetra-antennary, in agreement with the glycoform distribution previously reported (18). The G2 N-glycosite was occupied by a set of bi- and tri-antennary complex N-glycans qualitatively similar in relative abundances to those at the G1 site (Figure 13). Both sites included sulfated glycoforms. Mucin-type O-glycosites occurred in the regions 987-90, 1490-92, 1780-86, 1871-75, and 1897-99. Overall, the density of mucin glycosites was much lower than observed in the extended domains of brevican and neurocan.
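The sequence-context tallies described above can be reproduced in outline by counting the residues that flank each SG or SA dipeptide. The sketch below is a minimal illustration using a hypothetical toy sequence rather than the bovine aggrecan sequence; the function name is an assumption, and a lookahead pattern is used so that overlapping motifs are counted.

```python
import re
from collections import Counter

def flanking_counts(sequence, core="SG"):
    """Count residues immediately before and after every `core` dipeptide."""
    before, after = Counter(), Counter()
    for m in re.finditer(f"(?=(.){core}(.))", sequence):
        before[m.group(1)] += 1
        after[m.group(2)] += 1
    return before, after

# Hypothetical toy sequence containing PSGE, ASGV and ISAS contexts
toy = "APSGEASGVLPSGEISASGV"
sg_before, sg_after = flanking_counts(toy, "SG")
sa_before, sa_after = flanking_counts(toy, "SA")
```

Running the same tally over occupied glycosites and over all candidate motifs in the protein sequence gives the two distributions that are compared in the text.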
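Candidate N-glycosites such as the NQT sequons above can be located with a short pattern search for the N-X-S/T sequon, where X is any residue except proline. The sketch below is a minimal example on a hypothetical toy sequence; the function name is an assumption and positions are 1-based.

```python
import re

# N-X-S/T with X != P; the lookahead allows overlapping sequons to be found
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (1-based position of N, sequon) for every N-X-S/T match."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence)]

# Hypothetical toy sequence: NQT and NGS are sequons, NPS is not
toy = "GANQTLSNPSANGSK"
sequons = find_sequons(toy)
```

The presence of a sequon indicates only a candidate site; occupancy is established from the glycopeptide assignments.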

Conclusions
Until the present, much of the knowledge of tissue proteoglycan glycosylation has derived from classical biochemical and antibody-binding studies. To date, glycoproteomics methods have been used to characterize bikunin (16), CSPGs from urine and spinal fluid (28,30), C. elegans (71), and the CS/HS PG perlecan (29). We report glycoproteomics methods to assess site-specific glycosylation of the SLRP decorin and the hyalectans aggrecan, brevican and neurocan. Our methods assign peptides glycosylated by linker saccharides, N-glycans or mucin-type O-glycans. We used LC-MS instruments and collisional dissociation methods that are widely available in proteomics laboratories. Our software used to interpret the data (GlycReSoft) is open source and publicly available.
The advantage of mass spectrometry-based omics methods is that the proteoglycan samples need not be pure; it is possible to define glycoproteomics search space information based on the measured proteome of the proteoglycan samples. The method assigns N-, mucin-type O- and linker-glycopeptides from the same LC-MS data. Decorin is the archetypical SLRP and we demonstrate that nearly complete proteomics and glycoproteomics coverage can be obtained from a single data acquisition. We demonstrate that the extended domains of brevican and neurocan contain dense regions of mucin glycosylation. Linker glycosylation was not detected in brevican and was much lower in abundance than mucin glycosylation for neurocan.
Since it is likely that glycosylation of the extended domains varies according to physiological context, it seems reasonable to conclude that brevican and neurocan mucin-regions impact the functions of these molecules. Aggrecan is vastly complex and we demonstrate the first assignments of site-specific linker-glycosylation in the extended domain.
The collisional dissociation methods employed here, while effective for singly glycosylated peptides, were able to distinguish sites in only a subset of multiply glycosylated peptides. In order to resolve such glycosylation more effectively, electron activated dissociation (ExD) methods may be useful. The key point is to produce peptide backbone dissociation abundant enough for analysis of multiply glycosylated peptides; however, the linker-peptides are particularly acidic due to uronic acid, sialic acid, sulfate, and/or phosphate substitution. As a result, additional work is needed to define the effectiveness of positive mode ExD methods, since charge states will be relatively low, reducing the dissociation efficiency. It may be that negative ion ExD will prove more effective based on the negative charge of the glycopeptides, and this will require investigation.

Acknowledgements
This work was funded by NIH grant numbers P41GM104603, R21CA177479 and U01CA221234.

Data availability
The data discussed in the manuscript are available through the Pride Archive (Project PXD008855).
Figure 1. Tandem mass spectrum of a decorin linker glycopeptide 30-58 (4354.5765 Da). Color code: green, oxonium ions; red, peptide y-ions; blue, peptide b-ions; goldenrod, peptide+saccharide fragments; grey, unassigned ions.
Color code: green, oxonium ions; red, peptide y-ions; blue, peptide b-ions; goldenrod, peptide+saccharide fragments; grey, unassigned ions. Glycosites are indicated in red.
Figure 11. Aggrecan average linker saccharide composition.
Figure 12. Comparison of linker glycosite sequence contexts in bovine aggrecan. XSGZ glycosites: A USXZ glycosites with total for each glycosite indicated with an asterisk *; B: (XSG) C (SGZ) D (XSA), E (SAZ), where X and Z are variable amino acid residues.
Figure 13. Comparison of glycosylation at two aggrecan N-glycosites from the trypsin plus Glu-C digest fraction retained by the molecular weight cutoff membrane.
Figure 14. Diagram showing all peptides and glycopeptides detected for bovine aggrecan from the trypsin plus Glu-C digest fraction retained by the molecular weight cutoff membrane. Each colored bar represents an assigned glycopeptide. Additional details are given in Figure S 13C and the GlycReSoft HTML report in the Supplemental Information.