Human Urinary Glycoproteomics; Attachment Site Specific Analysis of N- and O-Linked Glycosylations by CID and ECD

Urine is a complex mixture of proteins and waste products and a challenging biological fluid for biomarker discovery. Previous proteomic studies have identified more than 2800 urinary proteins but analyses aimed at unraveling glycan structures and glycosylation sites of urinary glycoproteins are lacking. Glycoproteomic characterization remains difficult because of the complexity of glycan structures found mainly on asparagine (N-linked) or serine/threonine (O-linked) residues. We have developed a glycoproteomic approach that combines efficient purification of urinary glycoproteins/glycopeptides with complementary MS-fragmentation techniques for glycopeptide analysis. Starting from clinical sample size, we eliminated interfering urinary compounds by dialysis and concentrated the purified urinary proteins by lyophilization. Sialylated urinary glycoproteins were conjugated to a solid support by hydrazide chemistry and trypsin digested. Desialylated glycopeptides, released through mild acid hydrolysis, were characterized by tandem MS experiments utilizing collision induced dissociation (CID) and electron capture dissociation fragmentation techniques. In CID-MS2, Hex5HexNAc4-N-Asn and HexHexNAc-O-Ser/Thr were typically observed, in agreement with known N-linked biantennary complex-type and O-linked core 1-like structures, respectively. Additional glycoforms for specific N- and O-linked glycopeptides were also identified, e.g. tetra-antennary N-glycans and fucosylated core 2-like O-glycans. Subsequent CID-MS3, of selected fragment-ions from the CID-MS2 analysis, generated peptide specific b- and y-ions that were used for peptide identification. In total, 58 N- and 63 O-linked glycopeptides from 53 glycoproteins were characterized with respect to glycan- and peptide sequences. The combination of CID and electron capture dissociation techniques allowed for the exact identification of Ser/Thr attachment site(s) for 40 of 57 putative O-glycosylation sites. 
We defined 29 O-glycosylation sites which have, to our knowledge, not been previously reported. This is the first study of human urinary glycoproteins where “intact” glycopeptides were studied, i.e. the presence of glycans and their attachment sites were proven without doubt.

In search of disease biomarkers, urine qualifies as an important biological fluid that can easily be collected by repeated and noninvasive sampling from single individuals. Proteins present in urine are derived not only from glomerular ultrafiltration of plasma but also from tubular secretion of soluble proteins, detachment of glycosylphosphatidylinositol-anchored proteins and exosome shedding through the urothelium (1). For healthy individuals, 30% of the urinary proteome has been estimated to originate from the plasma filtrate whereas the remaining 70% is believed to be derived from the kidneys and the urothelium (2). Until 2005, ~800 urinary proteins had been identified by various proteomic approaches (3-7). In 2006, a comprehensive proteomic study identified more than 1500 proteins from healthy human urine samples, simultaneously reflecting the complexity and the potential information concealed in the urinary proteome (8). In 2009, Kentsis et al. reported the hitherto largest data set for the urinary proteome, unveiling more than 2300 protein identities (9). The "core urinary proteome" was recently defined as a common set of nearly 600 urinary proteins with a dynamic concentration range spanning five orders of magnitude (10). Interestingly, the authors also reported that the 20 most abundant proteins, which were estimated to constitute 2/3 of the core urinary proteome by mass, were glycoproteins with serum albumin being the only exception.
Glycoproteins are characterized by the presence of oligosaccharides linked to the peptide backbone primarily through N- or O-glycosidic bonds at asparagine or serine/threonine residues, respectively (11). N- and mucin-type O-glycosylations are widely accepted as the most common and structurally diverse post-translational modifications found on secreted proteins and on the extracellular parts of membrane bound proteins (12). Given that protein glycosylation is involved in various cellular processes (13-16), the site-specific characterization of N- and O-linked glycosylations and identification of the modified proteins is becoming increasingly important. Urine is potentially a rich source for N- and O-linked glycoproteins derived from renal and distal organs and represents an interesting subproteome for structural characterization of human glycoproteins. However, glycoproteomic characterization of urine is lacking and only a few proteomic studies aimed at identifying urinary glycoproteins have been reported (17-20). In these studies, the glycan moieties were either cleaved off or not studied at all. It is, however, important to analyze qualitative glycan differences in glycoproteomes because changes associated with the carbohydrate moieties may reflect physiological status (21-23). Perhaps more importantly for the urinary proteome, the study of intact glycopeptides could reveal not only the glycoprotein origin but potentially also provide information regarding pathological changes of its original tissue (24,25). By analyzing tryptic glycopeptides originating from urinary glycoproteins both the glycan structures and glycosylation sites of proteins may be addressed. However, a highly purified mixture of glycopeptides is the prerequisite for such studies because of the general phenomena of ion suppression and stoichiometric effects in the mass spectrometric analysis of complex mixtures (26-28).
Enrichment methods for the isolation of formerly N-linked glycopeptides from biological sources have been described using hydrazide chemistry, TiO2 affinity purification, lectin chromatography and hydrophilic interaction liquid chromatography (HILIC) (29-33). The N-glycans are typically removed by PNGase F treatment during these protocols and the site-specific information of N-glycan structures is usually not addressed. Only a few glycoproteomic studies, aimed at analyzing intact N-glycopeptides from biological samples, have been published (34,35). Also, by comparison to N-glycosylation, characterization of protein O-glycosylation is analytically more challenging for several reasons, e.g. the heterogeneity associated with O-glycan core structures (36). Although collision-induced dissociation (CID)-based MSn strategies are well capable of revealing both O-glycan and peptide sequences for intact glycopeptides (37), the site-specific information of the modified amino acid is usually lost. This is because of predominant glycosidic fragmentation of the precursor during MS2, and peptide fragmentation occurring mainly for the deglycosylated peptide ion in the MS3. Additionally, the exact glycosylation site of identified peptides containing several Ser/Thr residues cannot be predicted due to the lack of a consensus sequence for mucin-type O-glycosylation. The alternative fragmentation techniques electron capture dissociation (ECD) (38,39) and electron transfer dissociation (ETD) (40) have been introduced for site-specific analysis of CID-labile PTMs, but characterization of protein O-glycosylations using ECD/ETD has generally been limited to synthetic glycopeptides or single glycoproteins (41-45). Thus, investigation of protein O-glycosylation has lagged behind and relatively little is known about O-linked glycans with respect to their protein carriers and amino acid attachment sites.
Recently, Darula and Medzihradszky used lectin enrichment with jacalin, recognizing core 1 O-glycans (Galβ1-3GalNAcα-O-Ser/Thr), and identified 21 O-glycosylation sites from bovine serum glycoproteins by combining ETD and exoglycosidase digestion (46). We have previously developed a sialic acid specific capture-and-release protocol for the enrichment of both N- and O-glycosylated peptides from sialylated glycoproteins in biological samples using hydrazide chemistry (37). Only CID based characterization was employed in our previous study and assignment of O-glycan attachment sites was therefore not possible for most O-glycosylated peptides. The low sensitivity and fragmentation yield for ECD/ETD compared with CID make it advantageous to use highly enriched samples of O-glycosylated peptides. We tested the sialic acid capture-and-release protocol on human serum samples but, as expected, N-glycosylated peptides completely dominated the LC-MS/MS chromatograms (Halim et al., unpublished). We then turned our attention to urine, with the ambition to characterize N- and O-glycosylated peptides, since urine may also serve as a sample source for biomedical diagnosis. However, because urine contains large amounts of salts and pigments, which could interfere with the periodate oxidation step in our protocol, we first developed a simple method to remove low-molecular-weight waste products and attain pure protein samples suitable for redox chemistry and proteomics purposes. In this study, we have thus extended our protocol (Fig. 1 and supplemental Fig. S1) to include a unique dialysis procedure for isolation of human urinary proteins prior to the sialic acid capture-and-release method. In addition to the CID-based approach, we also included ECD for the characterization of O-glycan attachment sites and as a complementary peptide fragmentation mode for the identification of urinary glycopeptides.

EXPERIMENTAL PROCEDURES
Collection and Preparation of Human Urine-First morning, midstream urine was obtained from a healthy male individual during five consecutive days and prepared separately. Immediately after collection, 50 ml de-identified urine was separated from intact cells and debris by centrifugation at 3000 × g, 4°C for 20 min. The uppermost 20 ml were frozen at −20°C and used for further analysis. Routine clinical chemistry analyses of all five samples were within the reference range: U-Albumin (<5.4 mg/L) and U-Creatinine (mean 17.4 mmol/L; range 12-28 mmol/L). U-Bilirubin, U-Urobilinogen, U-Acetone, U-Glucose, U-Erythrocytes, U-Leukocytes and U-Nitrite were all negative.
After thawing, 10 ml of each sample was dialyzed against 14 × 2 L of tap water at 4°C using Spectra/Por MWCO 12-14 kDa (Spectrum Laboratories) for 7 days (Fig. 1). The urine samples were lyophilized, dissolved in 6 ml 5% sodium dodecyl sulfate (SDS) and dialyzed against 2 × 2 L of 1.5% SDS at 60°C for 24 h. The SDS was subsequently removed by dialysis against 2 × 2 L Milli-Q deionized H2O (dH2O) at room temperature for 24 h. Finally, the samples were lyophilized and dissolved in 0.5 ml dH2O. Protein content was determined using the BCA-1 protein assay (Sigma-Aldrich) on a NanoDrop 1000 spectrophotometer (Thermo Scientific) according to the manufacturer's protocol.
Protein Separation-For protein separation prior to in-gel trypsin digestion, 80 μg of urinary proteins were dissolved in NuPage LDS sample buffer (Invitrogen, Carlsbad, CA) supplemented with 50 mM dithiothreitol, reduced and denatured at 70°C for 10 min. Protein samples were then separated on 4-12% Bis-Tris precast polyacrylamide gels (Invitrogen). SeeBlue Plus2 pre-stained standard (Invitrogen) was used as molecular weight marker and proteins were visualized by Coomassie colloidal blue staining. For in-gel trypsin digestion one gel lane was divided into 15 equally sized gel slices and subjected to automated trypsin digestion (supplemental Fig. S1A) on a BioMek 2000 work station equipped with a vacuum manifold. 96-well plates supplemented with a 7 μl volume of C18 reversed phase chromatographic resin were used for vacuum filtration and sample clean-up. The work-flow essentially followed the protocol previously described (47) except that the peptide extraction was performed twice with 0.2% trifluoroacetic acid to allow for peptide binding to the C18 resin of the filter plates. Finally, peptides were eluted twice in 40 μl of 60% acetonitrile in 0.1% trifluoroacetic acid and the eluted fractions were evaporated to dryness in a vacuum centrifuge. Prior to liquid chromatography/tandem MS (LC-MS/MS) analysis samples were redissolved in 0.1% formic acid.
For electrophoretic analysis of repeatedly dialyzed urine samples, 30 μg of urinary proteins were denatured by heating (100°C, 5 min) in 1% SDS and 100 mM dithiothreitol and separated on a 4-12% Bis-Tris precast polyacrylamide gel (Invitrogen). SeeBlue Plus2 prestained standard (Invitrogen) was used as molecular weight marker and proteins were visualized by Coomassie colloidal blue staining (supplemental Fig. S2C).

Glycopeptide Enrichment Procedure
Hydrazide Capture-Capture of sialylated glycoproteins to hydrazide beads (supplemental Fig. S1B) was done as previously described (37) with minor modifications. One hundred μg protein in 1 ml dH2O was oxidized with 2 mM periodic acid for 15 min at 0°C. The reaction was quenched by the addition of 5 μl 99% glycerol and buffer exchanged to 2.5 ml coupling buffer (100 mM acetate, 150 mM NaCl, pH 4.5) using Sephadex PD-10 columns (GE Healthcare). One hundred μl hydrazide beads (Bio-Rad) in coupling buffer was added and agitated for 16 h at room temperature in the dark. The beads were subsequently washed with 3 × 3 ml 0.1% Tween 20 in PBS, pH 7.4 and finally with 2 × 3 ml of 50 mM NH4HCO3, pH 8.0.
Reduction, Alkylation and Trypsin Digestion-The glycoproteins captured onto the beads were then incubated with 0.3 ml 10 mM dithiothreitol for 1 h at 37°C in the dark. Following a washing step (50 mM NH4HCO3, pH 8.0), 0.3 ml 55 mM iodoacetamide (Sigma-Aldrich) was added and incubated for 30 min at room temperature in the dark. The beads were then washed with 2 × 3 ml of 8 M urea, 50 mM NH4HCO3, pH 8.0 and with 2 × 3 ml of 1% SDS in dH2O with gentle agitation. Finally, five washing steps with 3 ml of 50 mM NH4HCO3, pH 8.0 were performed. Captured glycoproteins were digested with 1 μg sequencing grade porcine trypsin (Promega, Madison, WI) in 70 μl 50 mM NH4HCO3, pH 8.0, at 37°C for 18 h. The trypsin-released peptides were transferred to prelubricated Eppendorf tubes (Costar). Any remaining peptides were extracted once with 100 μl 50% acetonitrile, pooled and lyophilized together with the trypsin-released peptides and subjected to mass spectrometric analysis (supplemental Fig. S1B).
Release of Glycopeptides-The beads were initially washed once with 3 ml of 50% acetonitrile in dH2O, once with 3 ml dH2O and once with 3 ml 1.5 M NaCl in dH2O. The beads were then washed with 3 × 3 ml dH2O, 2 × 3 ml 50% acetonitrile in dH2O, 2 × 3 ml 25% acetonitrile in dH2O and finally with 2 × 3 ml dH2O. One hundred μl 0.1 M formic acid was added to the beads and incubated for 1 h at 80°C (supplemental Fig. S1C). The released glycopeptides were transferred to prelubricated Eppendorf tubes (Costar, Cambridge, MA). Any remaining glycopeptides were extracted once with 50 μl 50% acetonitrile in dH2O, pooled and lyophilized together with the formic acid released glycopeptides and subjected to mass spectrometric analysis.
LC-MS/MS Analysis-Tryptic peptides, obtained either from in-gel digestion of electrophoretically separated urinary proteins (supplemental Fig. S1A), from unglycosylated peptides released by trypsin digestion of hydrazide captured glycoproteins (supplemental Fig. S1B) or glycopeptides released through formic acid hydrolysis (supplemental Fig. S1C) were separated by reversed phase chromatography on a 15 cm capillary column (Zorbax SB300 C18, 0.075 mm ID). Peptides/glycopeptides were reconstituted in 40 μl 0.1% formic acid, 20 μl was loaded onto the column in eluent A (0.1% formic acid) and separated with a linear gradient from 3% to 60% eluent B (84% acetonitrile in 0.1% formic acid) at a flow rate of 250-300 nL/min. Gradient lengths were either 50 min, for the analysis of the peptide fraction, or 150 min, for the glycopeptide fraction and the in-gel digested fractions. The LC system (Ettan MDLC, GE Healthcare) was coupled in-line with an LTQ-FTICR instrument (Thermo Fisher Scientific) via a nanoelectrospray source (Thermo Fisher Scientific). The source was operated at 1.4 kV, with no sheath gas flow and with the ion transfer tube at 200°C. The mass spectrometer was programmed for acquisition in a data dependent mode. The survey scans were acquired in the FTICR mass analyzer and covered the m/z range 300-2000. For the analysis of peptides the seven most intense peaks in each full mass scan, with charge state ≥2 and intensity above a threshold of 100, were selected for fragmentation in the linear ion trap (LTQ) by CID. Glycopeptides were analyzed with two independent methods, one based on CID fragmentation and the other on ECD fragmentation. For the CID method the most intense peak in each FTICR full scan was selected for fragmentation in the linear ion trap (LTQ) followed by subsequent selection and fragmentation of the five most intense MS2 fragment ions.
For the ECD method the two most intense peaks in each FTICR full scan were selected for fragmentation in the ICR cell. CID fragmentation was performed with normalized collision energy of 35%, activation q = 0.25, activation time of 30 ms and three microscans. ECD fragmentation was performed with a relative energy of 4 and 5 in subsequent scans, a duration of 70 ms and three microscans. For all fragmentation events dynamic exclusion was enabled with a repeat count of 2. Peaks selected for fragmentation more than twice within a 30 s interval were excluded from selection (20 ppm window) for 180 s and the maximum number of excluded peaks was 200. AGC settings were 1,000,000 (FTMS full scan), 30,000 (Ion trap), 10,000 (Ion trap MSn), and 500,000 (FTMS ECD).

Data Analysis
Protein Identification-Raw data containing centroid MS/MS spectra, from the analysis of tryptic peptides, were converted into .dta format by the Bioworks software (version 3.3.1) utility extract_msn (Thermo Fisher Scientific) and analyzed with an in-house version of the Mascot software (Mascot ver. 2.3.01, http://www.matrixscience.com). Search parameters were set as follows: peptide tolerance, 10 ppm; MS/MS tolerance, 0.5 Da; enzyme, trypsin, one missed cleavage allowed; fixed carbamidomethyl modification of cysteine; variable oxidation of methionine; database, IPI human version 3.72 (86,392 sequences). Fragment ions from the b- and y-series, including losses of ammonia or water, were used for scoring. Minimal requirement for each protein identification was two unique peptide hits with scores above the significance threshold (p < 0.05).
Protein Clustering-Mascot results, including information on identified proteins and peptides, were imported into the ProteinCenter software (Proxeon Bioinformatics). Data was filtered so that each identified protein contained at least two unique peptides and identified proteins were clustered, based on peptide sharing, into groups of indistinguishable proteins. Lists of protein identifiers from two independent studies (8,9) were also imported into the ProteinCenter software and comparisons of the three data sets were performed.
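The filtering and grouping step described above can be illustrated with a minimal sketch. It assumes that "indistinguishable" means proteins supported by exactly the same set of peptides; the actual ProteinCenter clustering rules are not given here and may differ. The function name and the toy identifiers (P1 to P4) are hypothetical.

```python
from collections import defaultdict

def group_indistinguishable(protein_peptides, min_peptides=2):
    """Group proteins that are supported by an identical set of peptides.

    Assumption: proteins with fewer than `min_peptides` distinct peptides
    are discarded, mirroring the two-unique-peptide filter in the text.
    """
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        distinct = frozenset(peptides)
        if len(distinct) >= min_peptides:
            groups[distinct].append(protein)
    # Sort for a deterministic result
    return sorted(sorted(members) for members in groups.values())

# Hypothetical toy input: protein identifier -> identified peptides
example = {
    "P1": ["AAK", "CCR"],
    "P2": ["CCR", "AAK"],         # same evidence as P1
    "P3": ["DDK", "EEK", "FFR"],
    "P4": ["GGK"],                # only one peptide, filtered out
}
print(group_indistinguishable(example))
```

In this toy case P1 and P2 collapse into one protein group, P3 forms its own group, and P4 is removed by the peptide-count filter.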
Glycopeptide Characterization Using CID-Glycopeptide identification and relative quantification of N- and O-glycan microheterogeneity was done as previously described (37). N- and O-linked glycan sequences were manually verified in CID-MSn spectra for each glycopeptide by tracing peaks corresponding to the loss of individual monosaccharides. Manually selected MS3 spectra, corresponding to the fragmentation of unmodified peptides for O-glycopeptides, were individually converted to .mzXML format via the Readw application (http://www.proteomecenter.org). Each .mzXML file was individually visualized with the mMass (version 2.4) application (48) and searched with the Mascot algorithm. The peptide monoisotopic mass was manually defined for each search by subtracting the monoisotopic mass of the glycan from the FTICR-MS1 measured precursor. Search parameters were set as follows: peptide tolerance, 10 ppm; MS/MS tolerance, 0.6 Da; enzyme, trypsin, one missed cleavage allowed; fixed carbamidomethyl modification of cysteine; variable oxidation of methionine and variable loss of NH3 (-17.0266 Da) at N-terminal cysteine and glutamine; taxonomy, human, 20,259 sequences (protein entries); database, SwissProt 101005. Peptides were considered as positive identifications if the ion score was above the significance threshold (p < 0.05). For MS3 spectra that did not yield positive identifications in the above described procedure, the peak lists of individual glycopeptides were manually exported from the mMass application as .txt files and analyzed with an in-house version of the Mascot software (Mascot version 2.3.01, www.matrixscience.com). The precursor mass was manually defined in each .txt file so that it would match the monoisotopic mass of the peptide as described above. Enzyme specificity was set to semitrypsin or to no enzyme to account for peptides with a single or no tryptic sites, respectively.
Finally, variable phosphorylation at serine or threonine residues was used in selected cases. All CID-MS 3 spectra that resulted in positive identifications were also converted to .mgf files according to the same procedures as above and Mascot searched against a decoy database (taxonomy, human, 20,245 sequences (protein entries); database, Swissprot 110817) using the same search parameters as above.
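The precursor-mass adjustment described above (subtracting the glycan mass from the FTICR-MS1 measured precursor) amounts to simple arithmetic, sketched below. The monosaccharide residue masses and the proton mass are standard monoisotopic values; the function names are illustrative and not part of any software used in this study. The nominal m/z 674.0 of the triply charged HexHexNAc glycoform of the IGF-II peptide reported in RESULTS is used only as an example input.

```python
PROTON = 1.007276  # monoisotopic mass of a proton, Da

# Monoisotopic residue masses (Da) of the monosaccharides used in this study
MONOSACCHARIDE = {"Hex": 162.052824, "HexNAc": 203.079373, "dHex": 146.057909}

def glycan_mass(composition):
    """Sum of monosaccharide residue masses, e.g. {'Hex': 1, 'HexNAc': 1}."""
    return sum(MONOSACCHARIDE[name] * count for name, count in composition.items())

def peptide_mass(precursor_mz, charge, composition):
    """Neutral monoisotopic peptide mass after removing the glycan."""
    neutral_glycopeptide = precursor_mz * charge - charge * PROTON
    return neutral_glycopeptide - glycan_mass(composition)

# Illustrative input: nominal m/z 674.0, 3+, carrying one HexHexNAc
mass = peptide_mass(674.0, 3, {"Hex": 1, "HexNAc": 1})
print(f"peptide mass ~ {mass:.3f} Da")
```

With an accurately measured precursor in place of the nominal value, the result would be the peptide mass to enter in the Mascot search.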
For all N-linked glycopeptides, the peak list for CID-MS3 spectra of selected ions (peptide+HexNAc or peptide+dHexHexNAc) was converted to .txt files as described above. The precursor mass was manually defined in each .txt file so that it would match the monoisotopic mass of the peptide+HexNAc or peptide+dHexHexNAc. This was accomplished by subtracting the monoisotopic mass of the N-glycan (apart from HexNAc or dHexHexNAc) from the monoisotopic mass of the FTICR-MS1 measured precursor. The sequence rule SEQ = B-NX[STC] or SEQ = C-N[KR] was included in the .txt file to constrain each search against peptide sequences containing the N-glycosylation consensus (with or without a tryptic cleavage site within the consensus sequence itself). This constraint lowered the acceptance threshold value but was justified by the clear presence of the N-linked glycan sequence in CID-MS2. Search parameters were as described above, with the exception of including HexNAc (203.0794 Da) or dHexHexNAc (349.1373 Da) as variable modification of asparagine. Mascot scoring options were set to include the neutral loss of HexNAc (203.0794 Da) from the precursor ion and from peptide b- and y-type fragments. Searches were performed with the Mascot algorithm and peptides were considered as positive identifications if the ion score was above the significance threshold (p < 0.05). All CID-MS3 spectra that resulted in positive identifications were also converted to .mgf files according to the same procedures as above and Mascot searched against a decoy database (taxonomy, human, 20,245 sequences (protein entries); database, Swissprot 110817) using the same search parameters as described for N-linked glycopeptides above.
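The effect of the sequence constraint can be approximated outside Mascot with two regular expressions. This is only a sketch of our reading of the rule (a peptide either contains N-X-[S/T/C] or ends in N-[K/R], where trypsin has cleaved inside the consensus); it does not reproduce Mascot's own sequence-query syntax or scoring, and the function name is hypothetical. As in the rule quoted above, X is treated as any residue.

```python
import re

# N-X-[S/T/C] anywhere in the peptide (X = any residue, as in the quoted rule)
SEQUON = re.compile(r"N.[STC]")
# Consensus interrupted by tryptic cleavage: peptide ends with N-K or N-R
SPLIT_SEQUON = re.compile(r"N[KR]$")

def may_carry_n_glycan(peptide):
    """True if the peptide is compatible with the N-glycosylation consensus."""
    return bool(SEQUON.search(peptide) or SPLIT_SEQUON.search(peptide))

# Hypothetical test peptides plus the IGF-II O-glycopeptide from this study
for seq in ["AANGTK", "AAAAANK", "DVSTPPTVLPDNFPR"]:
    print(seq, may_carry_n_glycan(seq))
```

The IGF-II peptide contains an asparagine but no consensus motif, so it is rejected, which is consistent with its assignment as an O-glycopeptide.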
Glycopeptide Characterization Using ECD-The precursor ion masses of ECD spectra were matched to precursor ion masses of glycopeptides that had been identified by the CID-MSn approach. Peak lists of c, (c−1), z and (z+1)-ions were prepared for candidate glycopeptides using the MS-product tool (http://prospector.ucsf.edu). Glycopeptide identifications were verified and O-glycan attachment sites were pinpointed manually to unique Ser/Thr residues by tracing c- and z-ion peaks that contained or lacked the anticipated glycan(s). Also, the Mascot Distiller program (version 2.3.2.0, Matrix Science) was used for peak picking and to prepare Mascot files from the ECD spectra. Subsequent MS2 spectra at relative energy 4 and 5 were aggregated and the ions presented as singly protonated in the output Mascot files. Search parameters were set as follows: peptide tolerance, 10 ppm; MS/MS tolerance, 0.03 Da; enzyme, trypsin, one missed cleavage allowed; fixed carbamidomethyl modification of cysteine; variable modification of HexHexNAc (365.1322 Da), Hex2HexNAc2 (730.2644 Da) and dHexHex2HexNAc2 (876.3223 Da) of serine, threonine and tyrosine; variable Hex5HexNAc4 (1622.5816 Da) modification of asparagine; variable oxidation of methionine and variable loss of NH3 (-17.0266 Da) at N-terminal cysteine and glutamine; taxonomy, human (20,259 sequences); database, SwissProt 101005. Instrument was set to match 1+ ions of the c, z and z+1 series (c, z+1 and z+2 using Mascot terminology). We did not observe any y-ions and these were thus not considered in the scoring. Acceptance of a positive identification was based on scoring above the significance threshold value (p < 0.05). The Mascot files were analyzed with the in-house version of the Mascot software (Mascot version 2.3.01).

RESULTS
Protein Yields and Identifications-Starting from 10 ml urine we used dialysis against water to remove salts and pigments but this was found to yield inadequate sample purity. However, after a second dialysis against 1.5% SDS at 60°C the procedure was satisfactorily efficient in removing pigments (Fig. 1 and supplemental Fig. S2). We recovered 31 ± 10 μg/ml protein (mean ± 1 SD) from the dialyzed urine samples. One dialyzed urine sample was analyzed by GeLC-MS/MS (supplemental Fig. S1A). Applying the criteria of at least two uniquely identified peptides per identified protein, we identified 989 urinary proteins that were grouped into 413 protein groups of indistinguishable proteins by clustering based on peptide sharing (Supplementary Excel Table, Gel-based proteomics). Following hydrazide capture (supplemental Fig. S1B), 63 proteins were either identified only from peptides found in the tryptic digests of captured proteins (n = 10), only from the covalently linked glycopeptides released through acid hydrolysis (n = 36) (supplemental Fig. S1C) or from both of these procedures (n = 17). Thus, 53 glycoproteins could be identified solely based on the identification of unique glycopeptides and for 17 of those glycoproteins the identities were also supported by peptide identifications (supplemental Table S1 and supplemental Fig. S3). Altogether, 26 urinary glycoproteins were identified from 122 un-glycosylated peptides found in the tryptic digests of glycoproteins captured onto the beads. Most of these proteins were annotated either as glycoproteins (n = 20) or as potential glycoproteins (n = 4) in the UniProtKB/Swiss-Prot database (49), e.g. Uromodulin, Kallikrein-1, Kininogen and Zinc-alpha-2-glycoprotein. Also, Phosphoinositide-3-kinase-interacting protein 1 (UniProtKB accession Q96FE7) and Protein YIPF3 (UniProtKB accession Q9GZM5), which are currently not annotated as potential glycoproteins, were indeed found to be glycosylated (see below).
Serum albumin repeatedly appeared together with the enriched glycoproteins and was identified from 18 peptides only in the tryptic digests of the beads. In total, 442 urinary protein groups were identified in our samples by gel-based proteomics and hydrazide capture enrichment. We observed 400 protein identifications overlapping the data sets of Kentsis et al. and Adachi et al., whereas 42 protein identifications were found to be unique in our data set (supplemental Fig. S3).  (51). The CID-MS2 spectrum of the Hex2HexNAc2 glycoform in Fig. 2A did not contain a fragment ion at m/z 407, suggesting that two separate core 1-like glycans occupied two individual Ser/Thr residues within the glycopeptide. Conversely, in other cases core 2-like glycans were indeed identified (Fig. 3, see below). For the Hex2HexNAc2 glycoform (Fig. 2A) the intact peptide ion (Y0-ion) was observed as the fifth most intense ion (for z ≥ 2 ions) at m/z 828.4 and peptide fragmentation was obtained in the final CID-MS3 spectrum. The HexHexNAc2 glycoform was the next glycopeptide that eluted (m/z 741.7, Fig. 2E) and the CID-MS2 spectrum (Fig. 2B) showed an intense charge reduced fragment ion at m/z 929.1 corresponding to the loss of HexHexNAc and a proton from the precursor ion. Additional charge reduced fragment ions at m/z 1010.6 and 827.7 showed the loss of HexNAc and HexHexNAc2, respectively. CID-MS3 of Y0 at m/z 827.7 resulted in peptide fragmentation  (see below). The Y-type fragment ion at m/z 687.5 corresponding to [peptide+HexNAc2+3H]3+ showed that two HexNAc residues were attached to the peptide but did not reveal if they were located on individual Ser/Thr or linked in a core 2-like manner. Again, a diagnostic [HexNAcHexNAc+H]+ ion at m/z 407 was not observed, indicating that the HexNAc residues were located on separate Ser/Thr residues. Approximately 1 min later the HexHexNAc glycoform eluted (m/z 674.0 in Fig. 2E) and the CID-MS2 spectrum (Fig. 2C) (Table I).

Identification of O-Linked Glycopeptides by CID-We
Assignment of Glycan Attachment Sites by ECD-We also acquired ECD-MS2 spectra of the triply charged D93VSTPPTVLPDNFPR107 glycopeptides from IGF-II, with Hex2HexNAc2 (Fig. 2G), HexHexNAc2 (Fig. 2H) and HexHexNAc (Fig. 2I) glycans. Fragmentation of triply charged precursors generated sufficient c- and z-ions to be used for glycosylation site identification purposes. For the ECD-MS2 of the triply charged Hex2HexNAc2 glycoform (Fig. 2G), the c3-ion was observed without the additional mass of any glycan (m/z 319.16), indicating that Ser95 was not modified. The c7-ion, however, was detected with the additional mass of Hex2HexNAc2 (m/z 1445.62), showing that the glycan(s) had to reside within the Thr96-Pro-Pro-Thr99 sequence. The cyclic structure of proline precludes ECD induced N-terminal cleavage and c4, c5, z10, and z11 ions can thus not be observed. The only fragment ions that can resolve the glycan attachment site(s) are therefore z9 and c6. A glycosylated c6 fragment was indeed observed at m/z 979.44 (Fig. 2G), indicating that Thr96 harbored a single HexHexNAc. The c7 fragment was observed at m/z 1445.62, which mapped the second HexHexNAc to Thr99. The glycan sequence, determined as two separate HexHexNAc-O-Ser/Thr structures by CID-MS2 (Fig. 2A), was thus mapped by ECD-MS2 (Fig. 2G) to two individual amino acids, i.e. Thr96 and Thr99 of IGF-II. The ECD-MS2 spectrum of the triply charged HexHexNAc2 glycoform (Fig. 2H) allowed us to verify the peptide sequence and the presence of a HexHexNAc2 moiety within the Asp93-Val94-Ser95-Thr96-Pro97-Pro98-Thr99 region. However, we did not detect any fragment ions that could differentiate whether Thr96 or Thr99 was modified with the single HexNAc. For the HexHexNAc glycoform (Fig. 2I) the c3 ion was once again observed without the additional mass of the carbohydrate, showing that Ser95 was not modified.
Furthermore, the c6 ion was detected at m/z 614.31 and was thus not glycosylated, showing that Thr96 was not the glycosylation site. In contrast, the c7-ion was detected with the additional mass of HexHexNAc (365.13 Da) at m/z 1080.49, thereby pinpointing the glycosylation site to Thr99 of IGF-II as previously described (52). Taken together, these experiments also revealed the site occupancy (macroheterogeneity) within the D93VSTPPTVLPDNFPR107 tryptic glycopeptide, i.e. the initial HexHexNAc glycosylation occurs at Thr99 whereas the second HexHexNAc is attached to Thr96.
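The c-ion values used for the site assignments above can be recomputed with a short sketch. It assumes the usual convention that a singly charged c-ion equals the sum of the N-terminal residue masses plus NH3 plus one proton, with any glycan located on a residue inside the fragment added on top. Residue and glycan masses are standard monoisotopic values; the function and variable names are illustrative only.

```python
PROTON = 1.007276
NH3 = 17.026549
HEXHEXNAC = 365.132197

# Monoisotopic residue masses (Da) for the amino acids in the IGF-II peptide
RESIDUE = {
    "D": 115.026943, "V": 99.068414, "S": 87.032028, "T": 101.047679,
    "P": 97.052764, "L": 113.084064, "N": 114.042927, "F": 147.068414,
    "R": 156.101111,
}

PEPTIDE = "DVSTPPTVLPDNFPR"   # D93 ... R107 of IGF-II
FIRST_RESIDUE = 93

def c_ion_mz(n, glycans):
    """m/z of the singly charged c_n ion.

    `glycans` maps protein residue numbers (e.g. 96, 99) to glycan masses;
    a glycan contributes only if its site lies within the first n residues.
    """
    last = FIRST_RESIDUE + n - 1
    backbone = sum(RESIDUE[aa] for aa in PEPTIDE[:n])
    sugar = sum(m for site, m in glycans.items() if FIRST_RESIDUE <= site <= last)
    return backbone + NH3 + PROTON + sugar

one_site = {99: HEXHEXNAC}                   # HexHexNAc glycoform
two_sites = {96: HEXHEXNAC, 99: HEXHEXNAC}   # Hex2HexNAc2 glycoform

for label, n, glycans in [("c3", 3, two_sites), ("c6", 6, one_site),
                          ("c7", 7, one_site), ("c6", 6, two_sites),
                          ("c7", 7, two_sites)]:
    print(label, sorted(glycans), f"{c_ion_mz(n, glycans):.2f}")
```

Under these assumptions the computed values agree with the reported fragments to within about 0.01 m/z units (319.16, 614.31, 1080.49, 979.44 and 1445.62), which is the pattern that places one HexHexNAc on Thr99 and the second on Thr96.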
Fragments corresponding to the loss of 43.02 Da from glycopeptide precursors were also observed in ECD-MS2, seen at m/z 1172.05 (Fig. 2G) and at m/z 989.48 (Fig. 2I). Such secondary fragments have been attributed to the loss of an acetyl radical (C2H3O•) from the N-acetyl moiety of HexNAc containing glycopeptides (53). Also, elimination of HexHexNAc from precursor ions was occasionally observed in ECD-MS2 (m/z 1010.49 in Fig. 2G and m/z 827.92 in Fig. 2I) but such fragmentation channels were minor dissociation pathways, which did not have a negative impact on the interpretation of ECD spectra. In total, 32 O-linked glycosylation sites were manually assigned to unique Ser/Thr residues using ECD (Table I and (Table I,  supplemental Table S2 and supplemental Fig. S5). For the C-terminal tryptic peptide A342VAVTLQSH350 from protein YIPF3 a single HexNAc, in accordance with the Tn-antigen (GalNAcα-O-Ser/Thr, Fig. 3A), was identified. The ECD-MS2 spectrum showed that the HexNAc was attached to the Thr346 residue (Fig. 3B). The HexHexNAc glycoform was also identified by CID-MS2 (Fig. 3C) and ECD-MS2 (Fig. 3D). Further, three core 2-like structures with Hex(HexNAc)HexNAc (Fig. 3E), Hex(HexHexNAc)HexNAc (Fig. 3G) and dHexHex(HexHexNAc)HexNAc (Fig. 3I) glycans were also identified. One glycosylation site on the Thr346 residue was mapped for these O-linked glycopeptides by ECD-MS2 (Fig. 3F, 3H, and  Fig. 3E and 3G); the HexHexNAc2 B/Y-type ion (m/z 569, Fig. 3G); Hex2HexNAc2 (m/z 731, Fig. 3G) and dHexHex2HexNAc2 (m/z 877, Fig. 3I) verified that these glycans exceeded the HexHexNAc structure in complexity and thus confirmed the presence of one as opposed to two glycosylation sites for this peptide. B/Y-type oxonium ions exceeding m/z 407, e.g. at m/z 569, equally well matched ions corresponding to [Hex-(HexNAc)-HexNAc + H]+ and [HexNAc-Hex-HexNAc + H]+, i.e. a branched or a linear glycan sequence, respectively.
Thus, B/Y-type ions at m/z 569 were unable to differentiate core 2-like glycans from elongated (linear) core 1-like structures. The same limitation is true for B-type ions at m/z 731 (Fig. 3G), corresponding to the entire Hex 2 HexNAc 2 moiety of O-linked glycopeptides. Y-type oxonium ions at m/z 528, corresponding to [Hex-HexNAc-Hex + H] + , could potentially reveal a linear O-glycan sequence but such ions were not observed in any CID-MS n experiments for Hex 2 HexNAc 2 glycoforms in this study.
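The ambiguity described above follows directly from how oxonium ion masses are calculated: the m/z depends only on monosaccharide composition, not on linkage or branching. The following sketch makes this explicit. It is an illustration under stated assumptions (the function `oxonium_mz` and the dictionary layout are hypothetical, and standard monoisotopic residue masses are used), not part of the published workflow.

```python
# Monoisotopic residue masses (Da) of the monosaccharide units.
UNIT = {"Hex": 162.05282, "HexNAc": 203.07937, "dHex": 146.05791}
PROTON = 1.00728

def oxonium_mz(composition):
    """m/z of a singly protonated oxonium ion for a composition {unit: count}.

    Linkage and branching are deliberately ignored: they do not change the mass.
    """
    return sum(UNIT[unit] * count for unit, count in composition.items()) + PROTON

ions = {
    "HexNAc": {"HexNAc": 1},
    "HexHexNAc": {"Hex": 1, "HexNAc": 1},
    "HexNAc2": {"HexNAc": 2},
    "HexHexNAc2": {"Hex": 1, "HexNAc": 2},       # branched or linear, same mass
    "Hex2HexNAc2": {"Hex": 2, "HexNAc": 2},
    "dHexHex2HexNAc2": {"dHex": 1, "Hex": 2, "HexNAc": 2},
    "Hex2HexNAc": {"Hex": 2, "HexNAc": 1},       # the m/z 528 ion discussed above
}
for name, composition in ions.items():
    print(f"{name:16s} {oxonium_mz(composition):9.4f}")
```

Because Hex-(HexNAc)-HexNAc and HexNAc-Hex-HexNAc share the composition HexHexNAc 2 , both map onto the same calculated value near m/z 569, which is why additional sequence-specific fragments would be needed to tell them apart.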
Additionally, we identified secondary modifications of some O-linked glycopeptides. The CID-MS 2 fragmentation spectrum of the HexHexNAc glycosylated P 52 ATDETVLA 60 peptide (Microfibrillar-associated protein 5, UniProtKB accession Q13361) (Fig. 4A) showed an initial loss of ~80 Da (m/z 641.0), which we tentatively assigned as a sulfate group (79.9568 Da), but which could in theory also be a phosphate group (79.9663 Da). The precursor ion (m/z 681.2804 2+ , not shown) was found to deviate by 1.69 ppm (-5.28 ppm for a phosphorylated precursor ion) from the theoretical monoisotopic mass of a sulfated precursor ion. In addition to the oxonium ions at m/z 204 (HexNAc) and m/z 366 (HexHexNAc), a fragment ion at m/z 446 was also observed, which indicated that the sulfate group resides on the glycan and not on the peptide (Fig. 4A and Fig. 4B). Co-eluting with the sulfated precursor, we also observed the nonsulfated glycoform, i.e. the HexHexNAc modified P 52 ATDETVLA 60 peptide, which was also characterized by CID-MS n and ECD-MS 2 fragmentation (supplemental Fig. S5). The FTICR-MS 1 measured mass difference between the sulfated (m/z 681.2804 2+ ) and nonsulfated (m/z 641.3019 2+ ) variants of the HexHexNAc glycosylated P 52 ATDETVLA 60 peptide was found to be 79.9570 Da, which deviates from the theoretical value of a sulfate group (79.9568 Da) by only 0.0002 Da. Although the m/z 446 ion, corresponding to HexHexNAc+Sulf, was detected and mass measured in the ion trap, the accurate mass of the sulfate group was thus indirectly confirmed by the mass measurements of the precursor ions in the ICR cell. Unfortunately, whether the Hex or the HexNAc was carrying the secondary modification could not be determined.
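The precursor-based confirmation of the sulfate group can be retraced with a few lines of arithmetic. The sketch below is illustrative only; the function name `neutral_mass_difference` is an assumption, and it uses the sulfated precursor value of m/z 681.2804 given in the text together with the nonsulfated precursor at m/z 641.3019, both doubly charged.

```python
SULFATE = 79.9568     # Da, theoretical value quoted in the text
PHOSPHATE = 79.9663   # Da, theoretical value quoted in the text

def neutral_mass_difference(mz_a, mz_b, charge):
    """Mass difference (Da) between two ions observed in the same charge state."""
    return (mz_a - mz_b) * charge

delta = neutral_mass_difference(681.2804, 641.3019, charge=2)
print(f"measured difference:   {delta:.4f} Da")
print(f"offset from sulfate:   {abs(delta - SULFATE):.4f} Da")
print(f"offset from phosphate: {abs(delta - PHOSPHATE):.4f} Da")
```

The measured difference lies about 0.0002 Da from the sulfate value and roughly 0.009 Da from the phosphate value, in line with the tentative sulfate assignment.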
The CID-MS 2 spectrum of the HexHexNAc glycosylated T 19 PAPLDSVFSSSER 32 peptide (Vitamin K-dependent protein C, UniProtKB accession P04070) is shown in Fig. 4C. This glycopeptide was also detected with a mass increment of ~80 Da. However, the CID-MS 2 fragmentation of this glycopeptide resulted in an initial loss of Hex (to m/z 888.3) followed by a loss of HexNAc (to m/z 786.8), showing that the modification, tentatively assigned as a phosphorylation, was attached to the peptide and not to the glycan. The precursor ion (m/z 969.4200 2+ , supplemental Fig. S5) was found to deviate by 3.30 ppm (8.20 ppm for a sulfated precursor ion) from the theoretical monoisotopic mass of a phosphorylated precursor ion. The results in Fig. 4C indicate that O-linked glycans are more susceptible to CID-induced fragmentation than phosphate groups. ECD-MS 2 fragmentation (supplemental Fig. S5) localized the glycan within the sequence to Thr 19 but the phosphorylated serine residue, among the four possible, was not identified (Table I and supplemental Fig. S5).
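The ppm values quoted for the two ~80 Da modifications can be put in perspective by asking how far apart the sulfate and phosphate hypotheses lie for a given precursor. The sketch below is an assumption-labeled illustration (the function `ppm_gap` is hypothetical) using standard monoisotopic masses for SO 3 and HPO 3 .

```python
SO3 = 79.95682    # monoisotopic mass added by sulfation, Da
HPO3 = 79.96633   # monoisotopic mass added by phosphorylation, Da

def ppm_gap(precursor_mz, charge):
    """Separation (ppm) between the sulfated and phosphorylated hypotheses
    for a precursor observed at precursor_mz in the given charge state."""
    delta_mz = (HPO3 - SO3) / charge
    return delta_mz / precursor_mz * 1e6

mfap5_gap = ppm_gap(681.2804, 2)      # P52-A60 glycopeptide precursor
protein_c_gap = ppm_gap(969.4200, 2)  # T19-R32 glycopeptide precursor
print(f"{mfap5_gap:.2f} ppm, {protein_c_gap:.2f} ppm")
```

The two hypotheses are separated by roughly 7 ppm and 5 ppm, respectively, which agrees with the spacing between the paired deviations reported in the text (1.69 versus -5.28 ppm, and 3.30 versus 8.20 ppm). Distinguishing them therefore requires mass accuracy in the low-ppm range.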
CID- and ECD-fragmentation of N-linked Glycopeptides-Fifty-eight glycopeptides, corresponding to 25 differently N-glycosylated peptides from 17 urinary glycoproteins, were identified (0.0% false positive identifications) in the formic acid released glycopeptide fractions (supplemental Fig. S1C and supplemental Fig. S6). They are all listed together with their N-linked glycans, their attachment sites and Mascot scores of the dominating glycopeptides in Table I and in supplemental Table S3. As a general feature, we observed the presence of several glycoforms for each N-linked glycopeptide. The relative abundance of specific glycoforms was determined by integrating chromatographic peaks for individual peptide glycoforms and the values were used to estimate the relative distribution of N-glycan microheterogeneity at each site. Oligosaccharide compositions corresponding to the biantennary complex type structure typically dominated, although triantennary and fucosylated bi- and triantennary glycoforms were also identified (supplemental Table S3). Sialic acid microheterogeneity was not observable since sialic acids were hydrolyzed in the preparative procedure. To illustrate the methodology, MS n of N-linked glycopeptides originating from three well-known N-glycoproteins (apolipoprotein D, uromodulin and prothrombin) are described in more detail (Fig. 5). First, the FTICR-MS 1 spectrum (Fig. 5A) showed the ADGTVNQIEGEATPVN 98 LTEPAK peptide from apolipoprotein D (UniProtKB accession P05090) with N-linked glycans corresponding to the complex type biantennary and fucosylated bi-, tri-, and tetraantennary structures. NAc and Hex 2 HexNAc, respectively. The third most intense ion (m/z 1229.5) resulted from a glycosidic cleavage at the GlcNAcGlcNAc chitobiose core and corresponds to the [peptide+HexNAc+2H] 2+ (Y 1 ) ion. CID-MS 3 of the [peptide+HexNAc+2H] 2+ ion (Fig. 5C) induced peptide backbone fragmentation into b- and y-ions, which were used for identification of the glycan attachment site and peptide sequence by the Mascot algorithm.
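The relative glycoform distribution mentioned in this paragraph, obtained by integrating chromatographic peaks, can be sketched as follows. This is a minimal illustration and not the authors' procedure: trapezoidal integration is an assumed choice, the helper names are hypothetical, and the two traces are invented numbers used only to show the calculation.

```python
def peak_area(times, intensities):
    """Trapezoidal integration of a chromatographic peak (time vs intensity)."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area

def relative_distribution(areas):
    """Convert {glycoform: area} into {glycoform: percent of the summed area}."""
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}

# Hypothetical traces: retention time (min) and intensity (arbitrary units).
traces = {
    "biantennary": ([30.0, 30.5, 31.0], [0.0, 6.0, 0.0]),
    "fucosylated biantennary": ([30.0, 30.5, 31.0], [0.0, 2.0, 0.0]),
}
areas = {name: peak_area(t, y) for name, (t, y) in traces.items()}
distribution = relative_distribution(areas)
print(distribution)
```

Expressing each glycoform as a share of the summed peak area for one peptide gives a per-site estimate of microheterogeneity; it assumes that glycoforms of the same peptide ionize with comparable efficiency.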
Second, the precursor at m/z 1394.9 corresponded to a fucosylated tetra-antennary complex type N-glycopeptide from uromodulin (UniProtKB accession P07911), and its CID-MS 2 fragmentation resulted in a prominent charge-reduced fragment ion at m/z 1909.8 because of the loss of a terminal HexHexNAc moiety and a proton (Fig. 5D). The second most intense fragment (m/z 1017.4) corresponded to [peptide+dHexHexNAc+2H] 2+ , indicating that the fucose resided on the asparagine-linked GlcNAc. Additional fragment ions were visible at m/z 1836.8, m/z 1727.6, and m/z 1646.7 corresponding to the loss of dHexHexHexNAc, Hex 2 HexNAc 2 and Hex 3 HexNAc 2 , respectively, and revealed partial structural information on the N-linked glycan. The CID-MS 3 spectrum at m/z 1909.8 (Fig. 5E) showed further sequential glycosidic fragmentation and the entire N-glycan sequence was verified. Ideally, the fragment ion corresponding to [peptide+HexNAc+2H] 2+ at m/z 944.6 (Fig. 5D) would have been used for the peptide identification but because of its low abundance it was not selected for CID-MS 3 fragmentation. Low abundance of Y 1 -ion peaks was found to be a common feature for core fucosylated N-glycopeptides in CID-MS 2 spectra (supplemental Fig. S6). Instead, the fragment ion corresponding to [peptide+dHexHexNAc+2H] 2+ (m/z 1017.4, Fig. 5D) was selected for CID-MS 3 fragmentation (Fig. 5F). We observed an intense peak at m/z 943.9 corresponding to the loss of dHex together with minor peaks corresponding to peptide fragmentation, and the MS 3 spectrum was matched to the tryptic QDFN 322 ITDISLLEHR peptide of uromodulin, with a Mascot score of 13 (p < 0.05 threshold; >10).
Third, the CID-MS 2 fragmentation of a pentuply charged biantennary N-linked glycopeptide at m/z 867.6 (Fig. 5G) resulted in a different fragmentation pattern compared with a triply charged biantennary N-glycopeptide (compare Figs. 5B and 5G) because of the different charge states, 3+ versus 5+. For the pentuply charged precursor we observed abundant glycosidic fragmentation of the terminal HexHexNAc residues and no apparent ion intensity corresponding to the peptide+HexNAc fragment. Subsequent CID-MS 3 at m/z 993.3 (Fig. 5H) allowed for verification of the biantennary glycan structure but the amino acid sequence remained unidentified because of the lack of CID-MS 3 data on the peptide+HexNAc fragment ion. However, considering the high charge state, and thus the relatively low m/z ratio, this glycopeptide was efficiently fragmented into c- and z-type ions by ECD-MS 2 (Fig. 5I) and the peptide sequence was identified as the tryptic YPHKPEIN 143 STTHPGADLQENFCR peptide from prothrombin (UniProtKB accession P00734). The combination of CID-MS n with ECD-MS 2 was also found to be useful in the identification of an additional N-linked glycopeptide (supplemental Fig. S6), namely the tryptic LHEITN 117 ETFR peptide of vasorin (UniProtKB accession Q6EMK4).

DISCUSSION
The production of urine takes place in the nephron and involves a complex process of ultrafiltration, reabsorption and secretion, eventually leading to the formation of a complex solution containing metabolic waste products, proteins and peptides (54). The high content of salt and metabolic waste products in human urine requires sample purification for the removal of interfering compounds and isolation of urinary proteins prior to proteomic analysis. As yet, there is no universal method that offers complete recovery of the urinary proteome. Various approaches have been investigated for this purpose with each method offering advantages and disadvantages when compared with each other (55). In our study, the choice of sample preparation method was important not only for qualitative recovery of urinary proteins but was also essential for our downstream application, i.e. mild periodic acid oxidation of sialic acids. Efficient and selective oxidation of sialic acids was critical for the enrichment procedure of urinary glycoproteins, a reaction conducted under mild conditions employing only 2 mM periodic acid. Thus, the sample preparation method had to offer qualitative recovery of the urinary proteome and deplete metabolic waste products that might interfere or quench the subsequent oxidation of sialic acids. Several sample preparation methods were examined for this purpose, including organic solvent precipitation (acetone and trichloroacetic acid), spin column purification, size exclusion and reversed phase (C18) chromatography (not shown). Unfortunately, all were found to yield inadequate sample purity and failed in removing residual urinary pigments, which interfered with the sialic acid oxidation.
Eventually, we explored dialysis followed by lyophilization as a way to isolate and concentrate urinary proteins in a two-step procedure. Dialysis of urine against water alone was inefficient (supplemental Fig. S2) but the addition of 1.5% SDS and dialysis at 60°C was found to yield sufficient sample purity for subsequent sialic acid oxidation. The dilute dialysates were subsequently concentrated through lyophilization to minimize the risk of unnecessary sample losses. Albeit time consuming, the preparative procedure employed in this study was thus justified by the strict requirement of sample purity and qualitative protein recovery.
Given that the dialyzed samples would serve as the basis for enrichment of sialoglycoproteins, it was also important to validate the preparative procedure to ensure that a representative urinary proteome was isolated following dialysis and lyophilization. By comparing our data set with the comprehensive proteomic studies of Adachi et al. and Kentsis et al. (8,9), we concluded that 90% of our protein identifications showed a nearly uniform overlap with the data sets of these studies (supplemental Fig. 3A). This observation confirmed that the glycoproteomic data would not mirror an atypical urinary subproteome as a result of the preparative procedure. It should be stressed that our proteomic analysis was not intended to expand the urinary proteome coverage. Thus, in contrast to previous studies, we did not deplete or prefractionate the urine sample prior to the one-dimensional electrophoretic separation, which may explain the relatively low number of protein identifications in this study.
Subsequent enrichment of sialoglycoproteins from the dialysates was achieved through conjugation of oxidized sialic acids to hydrazide beads (supplemental Fig. S1B and S1C). Although side reactions with terminal Hex or HexNAc residues of nonsialylated glycoproteins cannot be completely avoided, the mild oxidation constitutes the first step of introducing specificity to the enrichment procedure. Under these mild conditions, oxidation takes place primarily at the glycerol side chain (C7-C9) of sialic acids. In other words, hydrazide reactive aldehyde groups are specifically introduced on sialic acid by periodic acid oxidation at 0°C. Consequently, targeted enrichment of sialoglycoproteins is enabled by reducing sample complexity through sequential washes of the solid phase to remove nonglycosylated and nonsialylated urinary proteins.
Following trypsin digestion and peptide extraction, the solid phase was extensively washed to remove any remaining nonglycosylated peptides in order to avoid interference by e.g. ion suppression effects in downstream analyses. The covalently linked glycopeptides were subsequently released by mild formic acid hydrolysis for MS-analysis. The formic acid treatment results in specific hydrolysis of sialic acid glycosidic bonds without affecting linkages between dHex, Hex or HexNAc residues, and thereby represents the second step of specificity in the glycopeptide enrichment procedure. Only species sensitive to formic acid cleavage are released from the hydrazide beads, which include glycopeptides conjugated through sialic acids and exclude nonsialylated glycopeptides. Thus, other biomolecules harboring hydrazide reactive groups but lacking formic acid sensitive linkages are also excluded in this step. The combination of both specificity steps, i.e. mild periodic acid oxidation and mild formic acid hydrolysis, thus allows for selective isolation of desialylated glycopeptides. Consistent with this statement, base peak chromatograms of formic acid released fractions revealed various N- and O-linked glycopeptides as the dominating components (supplemental Fig. S4) with >80% of the subsequent CID-MS 2 spectra possessing typical glycopeptide fragmentation patterns accompanied by diagnostic carbohydrate oxonium ions (56).
Identification of glycan and peptide sequences was enabled by subjecting enriched glycopeptides to multiple rounds of CID fragmentation. CID-MS 2 spectra of HexHexNAc glycoforms displayed prominent Y 1 and Y 0 fragments that were used to identify HexHexNAc-O-Ser/Thr sequences. Weak fragment ions corresponding to the mass of peptide+Hex, indicated with an asterisk in Figs. 2 to 4, were also observed during CID-MS 2 . These observations may appear to contradict the HexHexNAc-O-Ser/Thr sequence outlined above, since they suggest a Hex residue as the internal peptide-linked monosaccharide. However, migration of hexose residues upon CID of protonated N-glycans and N-glycopeptides has been previously observed (57,58), resulting in fragment ions which may lead to incorrect structural predictions. We speculate that the weak peptide+Hex fragment ions generated upon CID of protonated O-linked glycopeptides are most likely caused by hexose migrations similar to those observed for protonated N-linked glycopeptides but further studies are needed to verify these findings.
O-linked glycopeptides containing the Hex 2 HexNAc 2 glycoform generally required five CID-MS 3 experiments to delineate glycan and peptide sequences. For O-linked glycopeptides with more than four monosaccharide units, isolation of intact peptide ions for CID-MS 3 fragmentation proved difficult because of the increasing dominance of glycosidic fragments in MS 2 spectra. Thus, the characterization of glycan and peptide sequences for O-linked glycopeptides glycosylated beyond the simple core 1-like structure was rapidly complicated by the increasing number of monosaccharides. This is in contrast to N-linked glycopeptides, which are readily identified even though they contain 9-13 monosaccharide units. Difficulties in characterizing O-linked glycopeptides with the Hex 2 HexNAc 2 glycoform arise not only from isolation of Y 0 -ions for CID-MS 3 , but also from assigning the correct glycan sequence for the carbohydrate moiety. Y-type fragments are usually unable to resolve complex O-glycan sequences since they are equally well matched to the fragmentation pattern of different glycoforms. Thus, the identification procedure for O-glycopeptides is not easily automated and careful manual annotation is still necessary for correct assignment of glycan sequences.
By combining the CID and ECD data for each precursor ion, complementary information on core glycosylation could be gathered. ECD-induced peptide fragmentation of Hex 2 HexNAc 2 glycoforms revealed whether the oligosaccharide components were located on two separate amino acids, suggesting a macroheterogeneity with two core 1-like glycans (Fig. 2), or as different glycans on one single amino acid, indicating site-specific microheterogeneity (Fig. 3). However, ECD fragmentation does not provide structural information on the glycan sequence per se and determination of glycan sequence was therefore mainly based on CID-MS n data. Thus, the main purpose of the ECD experiments was to determine the amino acid attachment sites of O-linked glycans. Traditionally, O-linked glycans are attached to serine or threonine residues but recently we reported a tyrosine residue to be modified by a sialylated O-linked glycan on amyloid beta peptides in human cerebrospinal fluid (59). However, our ECD experiments did not reveal any tyrosine glycosylated peptides in the urine samples, suggesting that complex tyrosine glycosylation is rarer, and possibly more tissue specific, than mucin-type O-glycosylation on serine and threonine residues.
The majority of O-linked glycopeptides in Table I were thus identified with a single core 1-like glycan, which raises the question of whether proteins O-glycosylated with core 1-like glycans are positively selected by our approach. We argue that terminal sialic acids should be equally well oxidized by the periodic acid treatment, regardless of their core glycan structure, and that O-glycopeptides are equally well enriched on the hydrazide beads, given that they are sialylated to the same extent. The release mechanism should also not be dependent on the core glycan structure but only related to the hydrolysis of acid sensitive NeuAc-Gal or NeuAc-GalNAc glycosidic linkages. The subsequent detection of glycopeptides in LC-FTICR-MS 1 is largely dependent on two factors: 1) the chromatographic properties of the peptide backbone, i.e. only glycopeptides of suitable length and hydrophobic character will be resolved by the C18 column; and 2) the physicochemical properties of the peptide backbone, which will dictate the extent of ionization and the stability of the parent ions. O-glycosylation microheterogeneity was found to have a minor impact on chromatographic retention times (Fig. 2E) with various peptide glycoforms eluting within a narrow time frame. The chromatography is thus not expected to favor any particular peptide glycoform since the retaining properties of the C18 column are generally dependent on the peptide composition rather than on the glycan structure. Thus, enrichment and characterization of O-glycan microheterogeneity, i.e. core 1-like versus core 2-like glycosylations, is probably not limited by the chromatographic resolution since different core glycans attached to the same peptide backbone are expected to be resolved equally well. Positive mode ionization of glycopeptides results in detection of [M+nH] n+ molecular ions, an outcome that is dependent on the proton affinity of the peptide backbone.
This property justifies the comparison of signal intensities not only for detection of microheterogeneity but also for relative quantification of individual peptide glycoforms (60). We were also able to observe extensive microheterogeneity for specific O-glycopeptides, as demonstrated for the A 342 VAVTLQSH 350 peptide of protein YIPF3 in Fig. 3. This O-glycopeptide was identified in five different core glycoforms ranging from a single HexNAc residue to a fucose-containing pentasaccharide, clearly showing that our approach is not selective for O-glycopeptides occupied only by core 1-like glycans. Taken together, this indicates that the observed HexHexNAc core 1-like glycans are indeed the predominant O-glycans of the sialylated human urinary glycoproteome. In an earlier study the sialylated core 1 glycan was indeed shown to be the dominating O-glycan for uromodulin in nonpregnant female and male urine samples whereas Lewis structures on O-glycans were typical for uromodulin in pregnant female urine (61). We were unable to identify any O-linked glycopeptides from uromodulin in our study, which suggests that the O-glycans of uromodulin are located within trypsin-inaccessible regions of the protein. Alternatively, the trypsin digestion might also result in short, hydrophilic O-glycopeptides which were not retained by the C18 column and thus not detected during analysis. This limitation, which extends to all urinary glycoproteins and is valid for both N- and O-linked glycosylations, may be circumvented by the use of alternative proteases.
Several urinary glycoproteins, e.g. CD44, macrophage colony-stimulating factor 1, vasorin, complement component 7 and protein HEG homolog, identified as enriched glycopeptides in Table I, are each estimated to constitute less than 0.1-0.02% (by mass) of the core urinary proteome (10). This clearly shows that sialylated glycoproteins present in minute amounts in the urine are selectively made accessible for glycoproteomic characterization by the enrichment procedure. Notably, several other glycoproteins of Table I have been identified as potential biomarkers, e.g. elevated levels of urinary IGF-II in urothelial carcinoma of the bladder (62), and it is not unlikely that these changes are accompanied by aberrant O-glycan profiles. The sialyl-Tn antigen (Neu5Acα2-6GalNAcα-O-Ser/Thr) is a rare glycoepitope in normal tissue but high expression levels are known to occur in ovarian (63), gastric (64), colorectal (65) and pancreatic (66) carcinomas. Existing evidence also indicates that O-glycan occupancy is increased in cancer cells (67,68). The ability to probe both these features simultaneously, i.e. site occupancy and O-glycan microheterogeneity, thus offers a unique opportunity to link aberrant glycans with distinct proteins. Although nonsialylated structures, e.g. Tn-antigen (GalNAcα-O-Ser/Thr) or high-mannose type N-glycans, are not enriched by the procedure, this analytical strategy could provide further insight into the process of pathogenesis for a wide range of diseases by identifying key proteins that are aberrantly glycosylated. Thus, the methodology and the results presented in this study should be of value for further exploration of the urinary glycoproteome in search of novel disease biomarkers.