N-linked (N-) Glycoproteomics of Urinary Exosomes*

Epithelial cells lining the urinary tract secrete urinary exosomes (40–100 nm) that can be targeted to specific cells, modulating their functionality. One potential targeting mechanism is adhesion between vesicle surface glycoproteins and target cells. This makes the glycopeptide analysis of exosomes important. Exosomes reflect the physiological state of the parent cells; therefore, they are a good source of biomarkers for urological and other diseases. Moreover, urine collection is easy and noninvasive, and urinary exosomes give information about renal and systemic organ systems. Accordingly, multiple studies on proteomic characterization of urinary exosomes in health and disease have been published. However, no systematic analysis of their glycoproteomic profile has been carried out to date, although a conserved glycan signature has been found for exosomes from urine and other sources including T cell lines and human milk. Here, we have enriched and identified the N-glycopeptides from these vesicles. The enriched N-glycopeptides were characterized for their peptide sequence, glycan composition, structure, and glycosylation site using collision-induced dissociation MS/MS (CID-tandem MS) data interpreted by the publicly available software GlycopeptideId. Released glycans from the same sample were also analyzed with MALDI-MS. We have identified the N-glycoproteome of urinary exosomes. In total, 126 N-glycopeptides from 51 N-glycosylation sites belonging to 37 glycoproteins were found. The peptide sequences of these N-glycopeptides were identified unambiguously, and their glycan compositions (for 125 N-glycopeptides) and structures (for 87 N-glycopeptides) were proposed. A corresponding glycomic analysis with released N-glycans was also performed. We identified 66 unique nonmodified N-glycan compositions and, in addition, 13 sulfated/phosphorylated glycans. This is the first systematic analysis of the N-glycoproteome of urinary exosomes.

Urine is a combination of plasma filtrate and the secretion profile of cells lining the urogenital tract. This secretion profile, in addition to proteins and metabolites, also contains exosomes and larger microvesicles that have glycoproteins on their surface. In healthy individuals, ~70% of the urinary proteome originates from the kidneys and the rest represents plasma filtered by the glomeruli (1). Proteins present in urine are a collection of proteins secreted by a number of tissues, and this collection changes in disease states (2). Therefore, the urinary proteome may serve as a rich source of biomarkers for urogenital and systemic diseases, which have been reviewed previously (3). Moreover, urine collection is a noninvasive procedure, which makes urine an ideal candidate for the discovery of novel biomarkers. Only a few large scale studies on the urinary proteome and glycoproteome have been published (4,5). However, most of them have not focused on glycopeptide characterization.
Microvesicles, including exosomes, are secreted by many cell types and are involved in functions including antigen presentation, cell-to-cell communication, and immunomodulation (6). They are specialized compartments of cells, and they mirror the physiological state of the cells secreting them while also providing information about the environment into which they are secreted. For instance, the immunosuppressive and pro-angiogenic environment of cancer may be mediated partially by exosomes (6). Exosomes and other types of microvesicles are abundantly found in urine and are thought to be mainly secreted by epithelial cells lining the urinary system (7). These vesicles contain DNA, RNA, and proteins.
Glycosylation is an important post-translational modification of proteins and lipids and appears to play many roles, e.g. in cell adhesion, cell-to-cell communication, and immune response (8). Glycosylation is also very important for targeting of proteins to various compartments of the cells. Accordingly, glycans of glycoproteins have important roles in protein sorting to membrane microdomains and furthermore in influencing their intracellular trafficking (9). Microvesicles have a glycan signature that is distinct from the parent cell, suggesting that they originate from specialized membrane microdomains implying a role of glycosylation in microvesicle protein sorting (10,11). Changes in N-glycans of exosomes from expressed prostatic secretions correlate with disease severity (12). HIV-1 particles were found to have a glycome (the comprehensive glycan profile of a protein, cell, or tissue) largely shared with microvesicles, which is taken to imply that the virus hijacks the glycomachinery of infected cells and uses it systematically to infect additional cells or to deceive the normal immunodefence (13). Thus, the specific glycoproteins of exosomes may be of major impact in targeting exosomes to distinct cells and tissues.
Exosome uptake in various cell types has been shown to occur through the mechanisms of clathrin-mediated endocytosis, phagocytosis, and micropinocytosis (14,15). The uptake of exosomes by dendritic cells and macrophages has been shown to be inhibited by mannose, N-acetylglucosamine, and lactose residues, respectively. This uptake is mediated by a C-type lectin in dendritic cells and galectin-5 in macrophages (14,15). All this points toward a system in which the exosome glycosylation pattern is kept specific by the cells secreting them to suit the target cell makeup and uptake pathways, and further downstream functions. Taken together, these findings suggest that better understanding of surface glycosylation patterns as well as the glycomics and glycoproteomics of exosomes might help in establishing the specificity of exosome uptake by target cells and activated downstream pathways. This information about exosome uptake might be utilized in therapeutics involving exosomes. Glycome and glycoproteome of urinary microvesicles will provide information not only about the functional state of constituent proteins, but it will also highlight the similarities and differences among proteins that are specifically targeted to exosomes.
An N-glycopeptide analysis using collision induced dissociation (CID)-Tandem MS has been reported previously for different sample types (16,17). This approach provides information about the composition of glycan, partial structure, glycosylation site, and peptide sequence from the same molecule compared with approaches where released glycans or peptides of N-glycopeptides are analyzed separately.
We have published an algorithm for an automated analysis of N-glycopeptides (18,19). A public, web-based software with some changes to the original was developed and named GlycopeptideId (www.appliednumerics.com, GlycopeptideID version 28-02-28 0.91 beta). The major change was to analyze glycan structures against a database and not with an iterative de novo glycan structure analysis. The proposed structures were further manually validated by the presence of diagnostic ions, when they were available for the given structure.
This software was utilized in this study, and a comprehensive glycopeptide characterization of urinary exosomal glycoproteins was carried out. We report here the glycan structure determination of urinary exosomal glycoproteins. We have characterized 126 N-glycopeptides representing 51 N-glycosylation sites that belong to 37 glycoproteins. Additionally, glycomic analysis of released N-glycans was performed, and 66 unique nonmodified and 13 sulfated and/or phosphorylated glycans were found. About a third of the total glycan compositions were common to both analyses, whereas approximately a third were unique to each analysis.

EXPERIMENTAL PROCEDURES
Urine Collection-The first morning urine was collected from three healthy volunteers (one male and two females, age 25 to 45 years) in clean 1 L plastic bottles. The samples were anonymized and randomized. The urine was tested anonymously with the Combur 10 Test®D dipstick (Roche Diagnostics; Mannheim, Germany), and pooled together. Combur test values are given in the supplemental materials (supplemental Table S1). Only subjects with normal Combur test values were to be included; no abnormal values were found, so no one was excluded from the study. No protease inhibitors were added. The study was ethically approved by the Coordinating Ethics Committee, Hospital District of Helsinki and Uusimaa, Finland. The reference number of the ethical approval is 77/13/03/00/14.
Preparation of Exosomal and Other Fractions of Urine-A schematic representation of the methodology used to isolate vesicles is shown in Fig. 1. The isolation was performed as previously described (20,21). In summary, pooled urine samples (800 ml) were initially centrifuged at a relative centrifugal force (RCF) of 2000 × g for 20 min. Next, 800 ml of the resulting supernatant (SN1) was concentrated with an Amicon filter with a 100-kDa cutoff (Millipore, Bedford, MA). The concentrated SN1 was centrifuged in an Eppendorf 5810R centrifuge (Eppendorf, Hamburg, Germany) at 15,000 revolutions per minute (rpm) (~18,000 × g) for 30 min at room temperature (RT) in a fixed-angle rotor (F34-6-38, Eppendorf). The 18,000 × g supernatant (SN18) fraction was subjected to an ultracentrifugation step in an Optima™ L-90 K preparative ultracentrifuge (Beckman Coulter, Miami, FL) at 44,000 rpm (200,000 × g) for 2 h at RT in a fixed-angle rotor (Beckman 70.1Ti, Beckman Coulter).
Additional methods for SDS-PAGE, Western blotting and negative transmission electron microscopy are given in Supplemental File S1.
Glycoproteomics-The whole workflow for glycopeptide analysis is described in Fig. 2. Briefly, exosomes were lysed, reduced, alkylated, and trypsin digested. The tryptic digests were used for SNA affinity chromatography or size exclusion chromatography (SEC). Enriched N-glycopeptides were analyzed by CID-tandem MS. All the MS2 spectra were deconvoluted and combined into .pkl files, which were analyzed by GlycopeptideId in a database-dependent manner. The spectra were annotated, and a list of peptide sequences, glycan compositions, glycan structures, and glycosylation sites was returned.
Trypsin Digestion and Glycopeptide Enrichment-Trypsin Digestion-Total proteins from lysed exosomes were digested as described (22). Briefly, 100 µl of protein solution (1 mg total protein) was maintained in Tris buffer, pH 8.0, and 6 M urea. tion) for 1 h at RT. Iodoacetamide was added to a final concentration of 40 mM and the proteins were alkylated at RT for 1 h. Iodoacetamide was quenched by adding the same amount of DTT and leaving the reaction mixture to stand for 1 additional hour at RT. The protein mixture was diluted 10 times to reduce the concentration of urea to 0.6 M, and 20 µg of bovine trypsin was added. The digestion was carried out at +37°C overnight. The next day, the reaction was stopped by adding concentrated acetic acid to bring the pH below 5.
SNA adsorbent slurry (150 µl) was pipetted into spin columns and equilibrated with 2 × 600 µl of 10 mM HEPES, pH 8.0, and 0.15 M NaCl (binding buffer). A tryptic digest of 200 µg of total exosomal proteins was loaded onto the column in binding buffer and incubated at +4°C overnight. SEC was performed as described (23). The tryptic peptide fraction was subjected to SEC using a Superdex Peptide 10/300 GL column (GE Healthcare, Amersham Biosciences), eluted isocratically at a flow rate of 0.5 ml/min with an aqueous solution that contained 0.1% formic acid. Fractions were collected from this column each minute. Using this procedure, the N-glycopeptides were separated from most of the nonglycosylated peptides. Fractions that contained N-glycopeptides were pooled into microcentrifuge tubes and dried under reduced pressure using a SpeedVac system (Savant, ThermoElectron Corporation, ThermoFisher Scientific, MA).
For LC-MS/MS analysis, salt and lactose were removed from the glycopeptide samples using Pierce C18 spin columns according to the manufacturer's protocol. The eluted N-glycopeptides were dried in a SpeedVac (Savant, ThermoElectron Corporation, ThermoFisher Scientific, Waltham, MA) and dissolved in 20 µl of 0.1% formic acid. The samples were stored at −20°C until analysis.
GlycopeptideId Based Search of CID Spectra-N-Glycopeptide Identification-The MS2 spectra were deconvoluted in Waters MassLynx 4.1 software using the MaxEnt3 module and saved as peak lists (file format .pkl). The glycopeptide MS2 data were analyzed with the GlycopeptideID software. GlycopeptideID is an open access web service aimed at easing the analysis of intact N-glycopeptide CID LC-MS2 data. The service is developed by Applied Numerics Ltd (Helsinki, Finland) in collaboration with the University of Helsinki. The computational methods used closely follow the ones presented by Joenväärä et al. (18). The main difference is that the glycan structures are identified against a glycan database and no de novo glycan structures are built. Here, a brief overview of the service is given. A detailed description is given at the service website and will also be published later.
The N-glycopeptide analysis algorithm has three main steps: (1) scoring possible peptide backbones by matching the MS2 spectra against a peptide database, (2) subtracting the mass of the potential peptide from the mass of the glycopeptide to obtain the mass of the glycan, from which the glycan composition is calculated using monosaccharide masses, with the N-glycan core always included, and (3) scoring the MS2 spectra against the glycan structures from GlycomeDB that correspond to the glycan compositions calculated in the previous step.
For the scoring of the glycan there are two possibilities: (1) the glycan structure comes from GlycomeDB when the calculated composition is also found in GlycomeDB. The glycan structure limits the possible theoretical fragments, so different structures with the same glycan composition yield different scores, and (2) when the glycan composition is not found in GlycomeDB, it is built de novo as the best-fitting monosaccharide combination by mass error (the N-glycan core is always included). In this case, only the N-glycan core limits the possible theoretical fragments, but the spectra are still scored as before.
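The mass arithmetic behind step (2) and the de novo branch can be illustrated with a minimal sketch. It is not the GlycopeptideID implementation: the residue masses are standard monoisotopic values rather than values quoted in this paper, the search bounds in `MAX_COUNT` and all function names are hypothetical, and the tolerance is applied to neutral masses for simplicity.

```python
from itertools import product

# Standard monoisotopic residue masses in Da (assumed; not quoted from the paper).
RESIDUE_MASS = {"H": 162.052824, "N": 203.079373, "F": 146.057909, "S": 291.095417}
# The N-glycan core (three hexoses, two HexNAc) is always part of the composition.
CORE = {"H": 3, "N": 2, "F": 0, "S": 0}
# Hypothetical upper bounds for the search range of each monosaccharide.
MAX_COUNT = {"H": 12, "N": 8, "F": 4, "S": 4}


def glycan_mass(composition):
    """Sum of residue masses for a composition such as {"H": 5, "N": 4, "F": 0, "S": 2}."""
    return sum(RESIDUE_MASS[k] * n for k, n in composition.items())


def within_tolerance(observed, theoretical, abs_da=0.05, ppm=20.0):
    """Accept a match if the error is within the lower of the absolute and ppm limits."""
    limit = min(abs_da, theoretical * ppm * 1e-6)
    return abs(observed - theoretical) <= limit


def candidate_compositions(glycopeptide_mass, peptide_mass):
    """List core-containing compositions whose mass explains the glycan part."""
    keys = ("H", "N", "F", "S")
    ranges = [range(CORE[k], MAX_COUNT[k] + 1) for k in keys]
    hits = []
    for counts in product(*ranges):
        composition = dict(zip(keys, counts))
        theoretical = peptide_mass + glycan_mass(composition)
        if within_tolerance(glycopeptide_mass, theoretical):
            hits.append(composition)
    return hits
```

Because the glycan is attached with loss of water, the glycopeptide mass minus the peptide mass equals the sum of the residue masses, which is why no water term appears in `glycan_mass`.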
The outcome is a list of the best matching N-glycopeptides for each precursor. The results are ordered by the total score, which is related to the (binomial) probability that a similar match could be achieved by random sampling. The theoretical glycan MS2 spectrum is composed of fragments generated by glycosidic cleavages; no cross-ring fragments are evaluated.
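One common way to turn such a binomial probability into a score is sketched below. The exact scoring function of GlycopeptideID is not given here, so the form of the score (negative log of the binomial tail), the per-fragment random match probability `p_random`, and the function names are all assumptions made for illustration.

```python
import math


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_theoretical, n_matched, p_random):
    """Hypothetical score: -log10 of the chance of matching this many fragments at random."""
    tail = binomial_tail(n_theoretical, n_matched, p_random)
    # Guard against log(0) when the tail underflows or p_random is zero.
    return -math.log10(max(tail, 1e-300))
```

With this form, a candidate that explains more of its theoretical glycosidic fragments receives a higher score, and a candidate with no matched fragments scores zero.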
The identified glycan can be a glycan composition or a glycan structure from GlycomeDB. False discovery rate (FDR) analysis can be applied to the peptide part. The analysis is based on applying an identical workflow against the target and decoy (reversed) peptide databases, and the FDR above a given score limit is estimated from the relative number of decoy matches.
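The target-decoy estimate described above can be sketched as follows. This is a generic illustration under the assumption that the FDR is taken as the ratio of decoy to target matches at or above the score limit; the function names and example scores are hypothetical.

```python
def reversed_decoy(peptide):
    """Build a decoy peptide by reversing the target sequence."""
    return peptide[::-1]


def estimate_fdr(target_scores, decoy_scores, score_limit):
    """Estimate the FDR above a score limit as decoy matches relative to target matches."""
    n_target = sum(1 for s in target_scores if s >= score_limit)
    n_decoy = sum(1 for s in decoy_scores if s >= score_limit)
    if n_target == 0:
        return 0.0
    return n_decoy / n_target
```

For example, with four target matches and one decoy match above the limit, the estimate is 0.25; raising the limit until no decoy passes drives the estimate to zero.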
In this study, the glycans were searched against GlycomeDB (http://www.glycome-db.org/) human glycan structures (GlycomeDB database downloaded on December 5, 2013) with the N-glycan core and also against computer generated (de novo) compositions with the given range of monosaccharides. Peptides were searched against the peptide database generated from the reviewed UniProt (http://www.uniprot.org/) human proteins (database: UniProt release 2013_04). The peptide database contained peptide sequences (191,896 sequences) with a potential N-glycosylation site (N-X-[S/T/C], X ≠ P; X can be any amino acid except proline), with a maximum length of 30 amino acids and a maximum of two missed trypsin cleavage sites. Carbamidomethylation of cysteine was searched as a fixed modification. The mass tolerance is specified both as an absolute Δm/z and in ppm, and whichever is lower is used. The precursor and fragment tolerances were set to Δm/z 0.05 Da and 20 ppm.
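A peptide database of this kind can be approximated with a short in silico digestion. The sketch below is an illustration under stated assumptions, not the procedure used to build the actual database: it assumes the usual trypsin rule (cleavage after K or R, but not before P), it only detects sequons that lie entirely within a peptide, and the function names and the flanking residues of the toy protein are hypothetical.

```python
import re

# N-glycosylation sequon: N, then any residue except P, then S, T, or C.
SEQUON = re.compile(r"N[^P][STC]")


def cleavage_sites(protein):
    """Return cut positions for trypsin (after K/R, not before P), including both ends."""
    cuts = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(protein))
    return cuts


def sequon_peptides(protein, max_missed=2, max_len=30):
    """List tryptic peptides that contain a sequon, allowing missed cleavages."""
    cuts = cleavage_sites(protein)
    found = []
    for i in range(len(cuts) - 1):
        last = min(i + max_missed + 2, len(cuts))
        for j in range(i + 1, last):
            peptide = protein[cuts[i]:cuts[j]]
            if peptide and len(peptide) <= max_len and SEQUON.search(peptide):
                found.append(peptide)
    return found
```

Applied to a toy sequence such as `"AARYKGLNLTEDTYKPRGGK"`, the digestion returns both GLNLTEDTYKPR and its missed-cleavage form YKGLNLTEDTYKPR, the two galectin-3-binding protein peptides reported in the Results.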
In the resulting data, glycan compositions are given as one-letter abbreviations H: Hexose, N: Hexosamine, S: Sialic acid, and F: Fucose, and the number following indicates the amount of the monosaccharides. For example, S2H6N5 stands for a glycan containing two sialic acids, six hexoses, and five hexosamines. Proteins are given as the UniProt Id, e.g.
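The one-letter composition notation is simple to parse mechanically. The following minimal sketch, with a hypothetical function name, converts a composition string into monosaccharide counts and rejects strings that do not follow the letter-number pattern.

```python
import re

# One-letter codes as defined in the text.
NAMES = {"H": "Hexose", "N": "Hexosamine", "S": "Sialic acid", "F": "Fucose"}


def parse_composition(text):
    """Convert a string such as 'S2H6N5' into a dict of monosaccharide counts."""
    if not re.fullmatch(r"(?:[HNSF]\d+)+", text):
        raise ValueError(f"not a valid composition string: {text!r}")
    counts = {}
    for letter, number in re.findall(r"([HNSF])(\d+)", text):
        counts[letter] = counts.get(letter, 0) + int(number)
    return counts
```

For the example in the text, `parse_composition("S2H6N5")` gives two sialic acids, six hexoses, and five hexosamines.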
The glycan structure format used in GlycopeptideID is a simplified version of the Consortium for Functional Glycomics modified IUPAC condensed format (www.functionalglycomics.org/static/consortium/Nomenclature.shtml). The stereoisomer (α, β) and the regioisomer (e.g. 1-4) notations are omitted and the long monosaccharide names are replaced by single letter codes (H:Hex, N:HexNAc, F:Fuc, and S:NeuAc). The resulting format consists of linear glycan sequences with branching shown by parentheses and written from the nonreducing to the reducing end. As an example, a core fucosylated, complex N-glycan with two branches shown in Consortium for Functional Glycomics format as GlcNAc-Asn is written as SHNH(SHNH)HN(F)N. The aim of the format is to have a compact notation and to show only the amount of information that can be identified with the current experimental setup. In supplemental Table S2, the structure matched to each spectrum is shown in this format.
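The relation between this simplified structure notation and the composition notation can be shown with a minimal sketch. It only counts residues and checks that parentheses are balanced; it does not interpret the branching. The function name is hypothetical.

```python
from collections import Counter


def structure_to_composition(structure):
    """Collapse a structure string such as 'SHNH(SHNH)HN(F)N' into a composition string."""
    counts = Counter()
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
        elif symbol in "SHNF":
            counts[symbol] += 1
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    # Report residues in the order used for compositions in this paper (S, H, N, F).
    return "".join(f"{code}{counts[code]}" for code in "SHNF" if counts[code])
```

The example structure SHNH(SHNH)HN(F)N collapses to S2H5N4F1, a disialylated, core fucosylated composition with five hexoses and four HexNAc.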
Glycomics-N-glycan analysis was performed on 250 µl of samples 1-5. Asparagine-linked glycans were detached from the proteins by Elizabethkingia meningoseptica N-glycosidase F digestion (Prozyme, Hayward, CA). The released N-glycans were purified by solid-phase extraction with Hypersep C18 and Hypersep Hypercarb (Thermo Scientific). The samples were further purified with miniaturized solid-phase extractions. The total N-glycome, as well as the neutral and acidic fractions, was analyzed with a matrix assisted laser desorption-ionization time of flight (MALDI-TOF) MS Ultraflex III TOF/TOF (Bruker Daltonics Inc, Bremen, Germany). Neutral N-glycans and the total N-glycome were analyzed in positive ion reflector mode as [M+Na]+ ions and acidic N-glycans in negative ion reflector mode as [M−H]− ions.
Relative intensities were determined with flexAnalysis 3.4 software (Bruker Daltonics). The glycan profiles were produced from the resulting signal lists, and all interfering signals not arising from the glycan components of the sample were eliminated (overlapping isotopic patterns, multiple alkali metal adduct signals, and products of elimination of water from oligosaccharides). The resulting glycan signal intensities were normalized to sum to 100%. A signal was accepted if the signal-to-noise ratio was >2. The maximum accepted difference between calculated and measured mass was 0.1 Da.
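The acceptance criteria and normalization can be summarized in a minimal sketch. It does not reproduce flexAnalysis or the removal of interfering signals; it only applies the signal-to-noise and mass-difference thresholds stated above and rescales the accepted intensities to 100%. The function name, the input layout, and the example values are hypothetical.

```python
def glycan_profile(signals, calculated_mz, min_snr=2.0, max_delta=0.1):
    """Build a relative glycan profile (percent) from (m/z, intensity, S/N) signals."""
    accepted = {}
    for measured_mz, intensity, snr in signals:
        if snr <= min_snr:
            continue  # signal-to-noise ratio must exceed the threshold
        for name, expected_mz in calculated_mz.items():
            if abs(measured_mz - expected_mz) <= max_delta:
                accepted[name] = accepted.get(name, 0.0) + intensity
                break
    total = sum(accepted.values())
    if total == 0:
        return {}
    return {name: 100.0 * value / total for name, value in accepted.items()}
```

A usage example with illustrative [M+Na]+ values:

```python
calculated = {"H5N2": 1257.42, "H5N4": 1663.58}
signals = [(1257.45, 300.0, 10.0), (1663.60, 100.0, 5.0),
           (1500.00, 50.0, 8.0), (1257.40, 20.0, 1.5)]
profile = glycan_profile(signals, calculated)
```

Here the unassigned signal and the low signal-to-noise signal are dropped, and the two accepted signals are rescaled so that their relative intensities sum to 100%.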

RESULTS
Exosome Purification-The exosomes were purified as described in Fig. 1. Previously published protocols were used for isolating urinary exosomes (7,20,21). The exosomes were analyzed with transmission electron microscopy (TEM) using a previously published methodology (20). Two representative TEM pictures are shown in supplemental Fig. S1 (supplementary File S1). As can be seen, most of the vesicles are in the range of 30–100 nm and appear cup shaped or round in morphology, which is typical of exosomes. This also shows that the purification procedure is gentle enough to preserve the integrity of the vesicles. The Western blot showing the presence of TSG101 in various urine fractions is given in supplemental Fig. S2 (supplementary File S1). In the 200,000 × g pellet (P200,000 × g), the presence of TSG101 at its molecular weight is indicative of exosomes.
N-glycopeptides-The results obtained from the above workflow (Fig. 2), after combining the N-glycopeptides enriched by either SNA affinity chromatography or SEC, are presented in Table I. A total of 126 N-glycopeptides from 51 N-glycosylation sites belonging to 37 glycoproteins were identified in our study with a false positive rate of 2.56%. Of these 126 N-glycopeptides, one was the result of a missed cleavage: galectin-3-binding protein had two peptides, YKGLNLTEDTYKPR and GLNLTEDTYKPR, which were both found with the same glycan composition. These two N-glycopeptides were counted as one composition; therefore, 125 compositions were elucidated, and 87 potential structures were proposed for these 125 compositions (supplemental Table S2). The number of N-glycopeptides found per protein ranged from one (multiple proteins) to 31 (galectin-3-binding protein).
Peptide Sequences/N-Glycosylation Sites-In total, 51 N-glycosylation sites belonging to 37 proteins were found in our analysis. The number of N-glycosylation sites found per protein ranged from one to five. Most proteins yielded a single site, whereas some yielded three sites (aminopeptidase N and megalin) and cubilin yielded five.
Out of the total 51 sites, 65% (33 sites) were known to be experimentally validated as glycosylated mainly by hydrazide chemistry methods (Source: UniProt). An additional 21% of N-glycosylation sites (11 sites) are listed in UniProtKb as potential. These sites are predicted based on the consensus N-glycosylation site motif (NXS/T). Our study confirms the presence of N-glycans in these sites. Another 14% of the total sites found (seven sites), are new and not present in any database to the best of our knowledge. All these sites contain the consensus N-glycosylation motif.
The glycoproteins containing new sites are AMN1 homolog, aminopeptidase N, gamma glutamyltransferase light chain 1, neuropathy target esterase, regulator of G-protein signaling, zinc finger protein 518A, and uromodulin (supplementary Table S2). Annotated spectra of all the N-glycopeptides are given in supplementary File S1. Detailed annotation of the fragment ion spectra (glycan assignment) is given in supplemental Table S3. In that table, the spectral annotations can be told apart by the m/z values, which are used as the names of the Excel sheets.
Some of the compositions from our study are previously known (12 out of 125) and are presented here. Fetuin A glycopeptide, with the N-glycosylation site at position 156, was shown to have a S2H5N4 composition (S: sialic acid, H: hexose, and N: HexNAc) in our study. The core fucosylated variant of this composition has been identified in another study in bovine fetuin (24). That study by Nilsson et al. used desialylated glycans for analysis; therefore, the information on the degree of sialylation was lost, and only H5N4 was identified. S2H5N4 implies that this glycan is biantennary, as two sialic acids are present to cap the antennae, unless polysialic acid is present. Similarly, Apolipoprotein D H5N4F1 was identified in their study from cerebrospinal fluid (24), whereas the sialylated version S2H5N4F1 was identified in our study, elucidating the degree of sialylation. Ceruloplasmin was identified to have H5N4 at Asn138 and Asn397 in their study, whereas S2H5N4 was identified in our study. H5N4 was identified at Asn187 in hemopexin by Nilsson et al., whereas S2H5N4 was identified in our study. The same was true of Asn144 of IGHA1, where they identified H5N4, whereas we have identified S1H5N4. Asn36 of the thyroxine binding globulin was found to have H5N4 in their study, whereas we found S2H5N4. In another study by the same group, Halim et al. (17) found some structures in common with our study.
They again used sialic acid oxidation and the capture method to enrich N-glycopeptides, which results in loss of sialic acid before analysis. Vasorin, at Asn117, was found to have H5N4 in their study and we found S2H5N4. Prothrombin at Asn121 was found to contain H5N4, whereas we have found S2H5N4. Apolipoprotein D at Asn65 was found to have H6N5 in their study, whereas we have found S3H6N5. Two glycans found at Asn176 of IgG heavy chain are already known from various studies (25,26). These 12 glycans identified at various sites of different proteins in different studies from varied sources, such as plasma, cerebrospinal fluid, and urine, serve as validation of our results. Other than these 12 glycan compositions, 113 other glycan compositions proposed in our study are novel, as are the 85 structures barring two IgG N-glycopeptides.
Glycan Structures from N-glycopeptides-For 87 N-glycopeptides a structure was proposed, whereas for another 38 N-glycopeptides, a de novo glycan composition was given and no structure was assigned to these compositions.
Proposed Structures-In this section, we discuss only the 87 N-glycopeptides for which a structure was proposed. Out of the total 87 GlycomeDB derived and scored structures from 126 N-glycopeptides in our study, 82% were complex type structures, 14% high mannose, and 3% hybrid. These classes of glycans can also be inferred from the glycan composition with the help of general knowledge of N-glycans.
Of the total complex type N-glycopeptides (71) found in our study, 38% were bisecting GlcNAc complex glycans. Verification of the bisecting type structures was done by checking for the presence of the peptide anchored ion of H1N3, which is possible only if a bisecting GlcNAc is present. Fifty-eight percent of the 87 N-glycopeptides had terminal hexose residues, whereas 13% had terminal HexNAc. Eighty-eight percent of the complex type structures likely had a lactosamine motif and 8% had the LacdiNAc motif. However, when more than one lactosamine unit was present, the units could not be assigned to their correct locations, meaning that antennae information was not available; therefore, bi-, tri-, and tetra-antennary structures cannot be claimed.
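The expected m/z of a peptide anchored diagnostic ion such as H1N3 follows directly from the peptide mass and the residue masses. The sketch below assumes standard monoisotopic residue and proton masses (not quoted from this paper) and uses a hypothetical function name and a hypothetical peptide mass in the usage example.

```python
# Standard monoisotopic masses in Da (assumed; not quoted from the paper).
RESIDUE_MASS = {"H": 162.052824, "N": 203.079373, "F": 146.057909, "S": 291.095417}
PROTON = 1.007276


def peptide_anchored_mz(peptide_mass, fragment, charge=1):
    """m/z of the peptide carrying a glycan fragment such as {"H": 1, "N": 3}."""
    glycan_part = sum(RESIDUE_MASS[k] * n for k, n in fragment.items())
    return (peptide_mass + glycan_part + charge * PROTON) / charge
```

For a hypothetical peptide of neutral mass 1000.0 Da, `peptide_anchored_mz(1000.0, {"H": 1, "N": 3})` gives the singly charged position at which the bisecting GlcNAc diagnostic ion would be looked for; the same arithmetic applies to the peptide anchored N1F1 ion used to confirm core fucosylation.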
Fifty percent of the structures were sialylated. Because polysialic acid is rare, and assuming it is absent, the minimum number of branches can be claimed unambiguously: at least one antenna is present in monosialylated structures and at least two antennae in disialylated structures. Out of the total complex structures, 65% were fucosylated. Of the fucosylated structures, 92% were core fucosylated, whereas 8% were fucosylated only on the antennae. An overlapping 16% were fucosylated on both the core and the antennae. Six percent of the complex structures had the potential Lewis X motif, whereas 9% had the potential blood group H antigen (this assignment is based on matching the glycopeptide precursor spectra to the database entries, and no experimental validation has been performed); however, all these structures could include any of the three: Lewis X, H antigen, or Lewis A. This is because it is not possible to differentiate between these structures with the currently available MS data analysis. Of the hybrid structures, all were sialylated and also had the lactosamine motif. This description applies only to the N-glycopeptides for which a structure was proposed.
De novo Glycan Composition-Of all the N-glycopeptides where a de novo composition was proposed, 27 were found to be of complex type glycans, whereas three were hybrid. These classes are being claimed with the help of drawing the glycan compositions (with knowledge of N-glycans) and searching them on JCGGDB (http://jcggdb.jp/idb/jsp/Glycan-CompositionSearch.jsp). Sixteen of the compositions had evidence for bisecting GlcNAc in the form of a peptide anchored ion of H1N3. Twenty-four compositions were found to contain fucose. For seven of these, unambiguous assignment of core and/or antennae fucosylation was possible from the spectra annotation. Nine of the compositions were sialylated with a varying number (1 to 3) of sialic acids present.
Gene Ontology Enrichment-The 37 glycoproteins found in our analysis were classified according to their subcellular distribution by Gene Ontology (GO) analysis using the program GoRetriever from AgBase (27). This analysis is shown in Fig. 3.
Some of the biggest categories were the extracellular region (21, 57% of total) followed by cytoplasmic and intracellular proteins (18, 49% of total). Organelle proteins (16, 43%) and intracellular proteins were other big categories followed by lysosomal proteins (30%), membrane bound vesicles (24%), endoplasmic reticulum proteins (19%), and Golgi apparatus proteins (11%). In comparison, plasma membrane, extracellular region proteins, and cytoplasm were the three biggest categories in a large scale proteomic study published on urinary exosomes (7).
Cubilin as a Case Example-As a representative example of our study, Cubilin is described here in more detail. We were able to find 21 N-glycopeptides belonging to five different N-glycosylation sites in cubilin, of which, 16 structures were proposed.
Cubilin N-glycopeptides-Glycan structures found at various glycosylation sites of cubilin are presented in Fig. 4. We have used color symbols in the figures, for example, for mannose and galactose, although MS alone does not give this information. MS can only tell us that a residue is a hexose, but the GlycopeptideId software searches the data against a database (GlycomeDB), which provides this kind of information. With the help of general knowledge about N-glycans, mannose and galactose can be safely differentiated. It is well known that the N-glycan core has three mannoses and two GlcNAc and no galactose or GalNAc. Similarly, a terminal hexose has to be a galactose and cannot be a mannose if the structure is of the complex type. Here, the description is given only for the N-glycopeptides for which a glycan structure was proposed. All the N-glycopeptides identified in this study belonging to cubilin contained complex type glycans (Fig. 4). Thirteen of the 16 matched glycan structures were bisecting complex type structures. Nine structures were core fucosylated and three were core as well as antennae fucosylated glycans, whereas an additional three structures were fucosylated only on the antennae (Fig. 4). Fucosylated and nonfucosylated variants of the same structures were found on multiple sites. For confirmation of the fucosylation position, we were able to find evidence of a peptide anchored N1F1 peak in the spectra in all but three cases (m/z value of 1067.45, 1105.7, and 916.3). In these cases, the peptide anchored fucosylated N-glycan core or a part thereof was observed. In the case of antennae fucosylation, glycan fragments bearing antennal fucoses were observed in the spectra without ambiguity.
Glycomics-Released N-glycans were analyzed with MALDI-TOF, and neutral and total N-glycan compositions are shown in Tables II and III. The glycomic analysis yielded 66 unique compositions (containing the N-glycan core and nonmodified N-glycans; this count includes the acidic fraction compositions but excludes the sulfated/phosphorylated glycans), compared with 67 unique compositions (from 126 N-glycopeptides) found in the glycopeptide analysis. Altogether, 101 unique compositions were found by combining glycomic and glycoproteomic analysis (Fig. 5). Of these, 34 were unique to the glycomic analysis, 35 were unique to the glycoproteomic analysis, and 32 compositions were common to both.
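The composition counts reported here can be cross-checked with simple set arithmetic. The following is a minimal sketch using only the counts quoted in the text; the function and key names are illustrative and are not part of the original analysis.

```python
def overlap_summary(only_glycomic, only_glycoproteomic, shared):
    """Derive per-method totals and the combined total from overlap counts."""
    combined = only_glycomic + only_glycoproteomic + shared
    return {
        "glycomic_total": only_glycomic + shared,
        "glycoproteomic_total": only_glycoproteomic + shared,
        "combined_total": combined,
        # fraction of all unique compositions seen by both techniques
        "shared_fraction": shared / combined,
    }


# Counts as reported in the text (34 glycomic-only, 35 glycoproteomic-only, 32 shared).
summary = overlap_summary(only_glycomic=34, only_glycoproteomic=35, shared=32)
print(summary)
```

The derived totals should reproduce the 66, 67, and 101 unique compositions stated above, with roughly one third of all compositions shared between the two techniques.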
The average size of the glycans uniquely found in the glycomic analysis was 1900 Da, whereas that of the glycans uniquely found in the glycoproteomic analysis was 2466 Da. A few big glycans can skew the result of such a comparison, so we also calculated the median size of the unique glycans found in both analyses; it was 1890 Da for glycomic analysis and 2465 Da for glycoproteomic analysis. A zoomed-in part of the neutral and acidic MALDI-TOF spectra is shown in Fig. 6 and complete spectra for total, neutral, and acidic fractions are shown in supplemental Figs. S3, S4, and S5 (supplemental File S1). High-mannose and hybrid glycans were common to both analyses, but bigger complex type sialylated glycans were preferentially found in glycoproteomic analysis. For example, glycans containing one sialic acid were 51% of unique glycans found in glycoproteomic analysis (18 of 35 glycans). In comparison, only 12% of unique glycans found in glycomic analysis contained one sialic acid (4 of 34 glycans). One glycan containing two sialic acids was found in glycomic analysis (1 of 34, ~3%), whereas seven of them were found in glycoproteomic analysis (7 of 35, 20%). No glycans containing three or four sialic acids were found in glycomic analysis, whereas five and two glycans containing three and four sialic acids, respectively, were found in glycoproteomic analysis.
FIG. 4. Proposed structures for Cubilin belonging to five different N-glycosylation sites. Blue squares are HexNAc, green circles are mannoses, and yellow circles are galactose, whereas the red triangle is fucose. The peptide sequence is shown at the top of each box. Five different peptide sequences are shown. N-glycosylation site is marked with a red letter. The m/z value of each precursor is given at the bottom of the glycan structure and the charge is given in parenthesis. When the antennal fucose position is ambiguous, such as attached to either HexNAc or Hexose, then the fucose is drawn outside of the parenthesis. It is to be noted that LacNAc units cannot be assigned to a given branch from the spectrum; and therefore, the number of antennae remains ambiguous. Here, the structures from GlycomeDB that matched to various spectra are shown. When the glycan composition is determined, if there are several identical compositions in GlycomeDB with different structures, the potential structures are theoretically fragmented and compared with the empirical spectrum and scored. The different structures yield different fragments and the highest scoring structure is reported.
Fifty-one percent of unique glycans in glycoproteomic analysis were singly fucosylated, whereas only 24% were singly fucosylated in the glycomic analysis. Glycans containing two fucoses were more common in glycomic analysis (26% of total unique glycans) than in glycoproteomic analysis (9%). Nine percent of unique glycans found in the glycomic study had three fucoses, whereas 14% of such glycans were present in glycoproteomic analysis.
In the acidic fraction of the glycomic analysis, we also found sulfated and/or phosphorylated glycans (Table IV), which are absent in the glycopeptide analysis. These glycans are not sialylated, but mainly represent the complex type, and 69% of them are fucosylated (9 of 13 glycan compositions). Total matched and unmatched peaks of neutral, total, and acidic N-glycans are given in supplemental Table S4.
DISCUSSION
Exosomes are formed in the multivesicular bodies (MVB) and upon appropriate stimuli these MVB fuse with cell membranes to release their contents (28). Exosomes can reflect the state of the cells secreting them (29). They can reflect, very early in the course of diseases, the changes going on in the cells upon pathophysiological stimuli. However, glycosylation of exosomal proteins is largely unknown, with the exception of classes of glycans on the surface as evidenced by lectin binding assays. This surface glycome is known to change in autosomal dominant polycystic kidney disease compared with healthy volunteers' urinary exosomes (30). In this study, we have characterized multiple glycoproteins, their N-glycosylation sites, glycan compositions, and structures. This is the first glycomic and glycoproteomic analysis of urinary exosomes.
N-glycopeptides were enriched from the original digest to avoid the matrix effect, as their ionization is impaired compared with nonglycosylated peptides. Nonglycosylated and glycosylated peptides compete for ionization, and ion suppression of glycopeptides can occur. In addition, compared with unmodified peptides from the same protein, the molar amount of each glycopeptide decreases as microheterogeneity increases. N-glycopeptides were enriched by SNA lectin and SEC. They were initially enriched using SNA affinity chromatography, but the number of structures mapped using SNA alone was low. In the case of some other lectins such as RCA and PHA-E, the analysis did not yield any structures because of the very low amount of N-glycopeptides bound. It is well known that most lectins prefer multivalent ligands (31,32). This multivalent context may not be preserved when glycoproteins are digested to N-glycopeptides, reducing the affinity of lectins to N-glycopeptides and the yield of the chromatography. Therefore, we decided to adopt size exclusion chromatography to enrich N-glycopeptides. N-glycopeptides are generally larger than peptides and separate well in SEC (23). Peaks from SNA and SEC analysis were combined as one peak list file. In supplemental Table S2, the m/z values of N-glycopeptides observed in SNA and/or SEC are indicated as SNA or SEC. Our complete workflow is depicted in Fig. 2.
The most valuable data in this study are the amino acid sequences of the N-glycopeptides and the assignment of glycosylation sites and glycan compositions (from glycomics and glycoproteomics experiments). However, this study also contains the structures proposed for the glycan part of multiple N-glycopeptides, which are derived from the GlycomeDB search. More details about derivation of structures are given in the Methods section. These structures are only proposed, and wherever possible the ambiguity in these assignments has been mentioned in the text and/or figures. Many structural features (fucosylation, bisecting HexNAc, and classes of glycans) have been manually verified by confirming the presence of diagnostic ions and drawing the structures from the given compositions using the knowledge of N-glycans.
The highest numbers of glycoforms were found for five N-glycosylation sites in cubilin (Fig. 4) and two sites in the galectin-3-binding protein. For cubilin, a number of structures carried a bisecting GlcNAc and many of them were core and/or antennal fucosylated. We have used intact peptide+glycan fragments from MS2 spectra to manually validate the core and in some cases the antennal fucosylation site. In most of them, fragment ions containing fucose were found. For m/z values of 1076.78, 1125.47, 844.35, 1107.8, 1053.1, and 1067.45, a diagnostic ion consisting of the whole peptide and HexNAc+Fucose (N1F1; written PN1F1, where P is the intact peptide) was found in the annotated spectra. For m/z 916.37, a fragment ion of PH2N2F1 was found in the annotated spectra, confirming the core fucosylation. For 984.1, the fragment ion of PN1F1 was not found; the fragment ion for PH2N3F1 was found, but based on this fragment ion alone it cannot be resolved whether the fucosylation is of the core or of the antennae. However, a glycan fragment of H1N1F1 was also found in the annotated spectra, which would not have been possible if the glycan was only core fucosylated. For the m/z value of 1105.7, a peptide containing fragment ion of PH1N3F1 was found, which can only be true if the structure is bisecting and core fucosylated, as this structure is. For m/z 1107.7, a glycan fragment of H1N1F1 was found in the annotated spectra, which confirms that the fucosylation is antennal. Similarly for m/z 1053.7, the glycan fragment of H1N1F1 confirms that it is antennal fucosylated.
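The manual validation logic described above can be summarized as a small decision rule. The sketch below is only an illustration of that reasoning, not the procedure implemented in GlycopeptideId; the function name and the ion sets are assumptions made for this example, and it deliberately ignores context-dependent ions such as PH1N3F1, which is informative only when the structure is bisecting.

```python
# Peptide-anchored fragments taken here as evidence of core fucosylation.
CORE_IONS = {"PN1F1", "PH2N2F1"}
# Glycan-only fragment that cannot arise from a core fucose alone.
ANTENNA_IONS = {"H1N1F1"}


def fucose_evidence(observed_ions):
    """Classify fucose position from a collection of annotated fragment labels."""
    observed = set(observed_ions)
    core = bool(observed & CORE_IONS)
    antenna = bool(observed & ANTENNA_IONS)
    if core and antenna:
        # Could also be two isobaric species, one core and one antennal fucosylated.
        return "core and antennal"
    if core:
        return "core"
    if antenna:
        return "antennal"
    # e.g. PH2N3F1 alone does not distinguish core from antennal fucose
    return "ambiguous"


print(fucose_evidence(["PH2N3F1", "H1N1F1"]))
print(fucose_evidence(["PH2N3F1"]))
```

Because fucose can migrate during fragmentation, a call produced by a rule of this kind is best treated as supporting evidence rather than proof.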
In the case of galectin-3-binding protein (supplemental Table  With m/z value of 925.6, the proposed composition was S1H6N5F1. The structure matched to an antennally fucosylated complex oligosaccharide in the database. However, a PH2N3F1 fragment was found in the spectrum. Considering this fragment, the fucose could also be on the core. If the fucose is moved to the core, the glycan composition still holds true. Therefore, in the case of this composition the position of the fucose is ambiguous. An ion with an m/z value of 1131.5, which has a diagnostic ion for core fucose, also has antennal fucosylation. This spectrum, in addition to PN1F1, has a glycan fragment of H1N1F1, which can only be true if it is antennally fucosylated. It is unambiguous in the case of m/z 1131.5 that it is core as well as antennally fucosylated. However, this could also be two isobaric ions, one core and the other antennally fucosylated, with the same mass and retention times. Manual confirmation of the presence of fucoses (manually searching the spectra for diagnostic ions containing fucose) as well as sialic acid moieties by using multiple diagnostic ions and fragments is of paramount importance. It is well known that fucose can be rearranged between antennae during MS2 fragmentation and fucose can be erroneously assigned to the wrong antenna (33). Therefore, we write these structures as Lewis X/A/H antigen.
Microheterogeneity in the glycosylation site was detected on multiple peptides as shown in Fig. 4. Several Lewis X and/or Lewis A/blood group H antigen structures were found in our analysis. One study has been published on mouse kidney proteins in which multiple Lewis X conjugated proteins were found (34). The authors treated the tissue proteins with α1-2 fucosidase and found no evidence of Lewis Y epitopes, and using cross-ring cleavage-based diagnostic ions in glycans released by PNGase-F, confirmed the absence of Lewis A and -B epitopes (35). The proteins that they confirmed to be Lewis X-conjugated are found in our study together with the Lewis X distinctive ion H1N1F1 that cannot be from core fucose. Precursors with m/z values of 1125.4, 1053.7, 984.1, and 1107.7 have the characteristic fragment H1N1F1. This finding strengthens our results and is in agreement with the published reports.
A corresponding glycomic analysis of exosomal proteins was done with matrix assisted laser desorption-ionization time of flight (MALDI-TOF). Neutral N-glycans and total N-glycome were analyzed in positive ion reflector mode as [M+Na]+ ions and acidic N-glycans in negative ion reflector mode as [M-H]- ions. Despite the data acquisition differences in glycomic and glycoproteomic analysis, when the nonmodified glycan compositions (having N-core) were analyzed, the results were surprisingly similar in terms of number of unique compositions found (66 versus 67), with roughly a one-third overlap between the two techniques. However, there were some preferences; for example, sialylated glycans were better represented in glycoproteomic analysis compared with glycomic analysis.
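To illustrate how a glycan composition maps to the ion types observed in the two MALDI modes, the following sketch computes theoretical monoisotopic m/z values from standard monosaccharide residue masses. It is a minimal illustration only; the function names and the composition dictionary format are assumptions made here and do not describe the software used in this study.

```python
# Monoisotopic residue masses (Da) of common N-glycan building blocks.
RESIDUE_MASS = {
    "H": 162.052823,  # hexose
    "N": 203.079373,  # N-acetylhexosamine
    "F": 146.057909,  # deoxyhexose (fucose)
    "S": 291.095417,  # N-acetylneuraminic acid
}
WATER = 18.010565
SODIUM_ION = 22.989221  # Na minus one electron
PROTON = 1.007276


def neutral_mass(composition):
    """Monoisotopic mass of a free (released, underivatized) glycan."""
    return sum(RESIDUE_MASS[k] * n for k, n in composition.items()) + WATER


def mz_sodium_adduct(composition):
    """[M+Na]+ as used for neutral glycans in positive ion mode."""
    return neutral_mass(composition) + SODIUM_ION


def mz_deprotonated(composition):
    """[M-H]- as used for acidic glycans in negative ion mode."""
    return neutral_mass(composition) - PROTON


man5 = {"H": 5, "N": 2}            # high-mannose H5N2
s1h5n4 = {"S": 1, "H": 5, "N": 4}  # monosialylated biantennary
print(round(mz_sodium_adduct(man5), 2))
print(round(mz_deprotonated(s1h5n4), 2))
```

The sketch assumes free reducing-end glycans without derivatization; labeled or permethylated glycans would need different mass terms.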
In the mass spectrometric analysis of glycans and glycopeptides, enrichment is typically needed because of poor ionization of glycans and glycopeptides from complex mixtures, for example, a tryptic digest of exosomal proteins. The molecular classes, types, and chemical properties of glycans and glycopeptides are vastly different, and thus the enrichment methods also vary. The majority of the differences between glycomic and glycoproteomic analysis can be explained by the different enrichment techniques (graphitized carbon solid phase extraction and lectin/SEC, respectively) and ionization methods (MALDI versus ESI). Another difference is the use of RP-UPLC in glycoproteomic analysis, which reduces the complexity even further, increases the ionization efficiency of glycopeptides, and gives the instrument time to fragment them properly. Our results show that glycomic and glycopeptide analyses are complementary to each other and, where the results overlap, multiple compositions can be confirmed with both techniques.
Sulfated and/or phosphorylated glycan compositions were found in the negative mode analysis in MALDI (Table IV), which are absent in the glycoproteomic analysis. Detecting these modifications in glycoproteomic analysis using ESI-MS is inherently difficult because of their labile nature.
It has to be noted that these two analysis types are fundamentally very different from each other. MALDI analyzes the glycans, which are mainly singly charged sodium adducts or deprotonated species (glycomic), whereas ESI-MS analyzes the glycopeptides, which are mainly multiply charged (we have found and resolved +3 to +5) protonated species (glycoproteomic). In glycomic analysis one single glycan composition signal might be coming from tens of different proteins containing the same glycan. On the other hand, because of the peptide part, these same glycans are detected as different species in glycoproteomic experiments.
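The consequence of the peptide moiety and multiple charging can be illustrated numerically. In the sketch below the peptide masses are hypothetical values chosen only for illustration; the glycan residue mass corresponds to an H5N4 composition computed from standard monoisotopic residue masses, and the function name is an assumption of this example.

```python
PROTON = 1.007276


def mz_protonated(neutral_mass, charge):
    """m/z of an [M+zH]z+ ion for a given neutral monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


# H5N4 glycan as a residue (five hexoses and four HexNAc, no added water).
GLYCAN_RESIDUE = 5 * 162.052823 + 4 * 203.079373

# Hypothetical peptide masses (Da); not taken from the data set.
peptides = {"peptide_A": 1500.0, "peptide_B": 2100.0}

for name, peptide_mass in peptides.items():
    glycopeptide_mass = peptide_mass + GLYCAN_RESIDUE
    series = [round(mz_protonated(glycopeptide_mass, z), 3) for z in (3, 4, 5)]
    print(name, series)
```

The same glycan therefore appears at a distinct set of m/z values for every peptide and charge state, whereas the released glycan would give one singly charged signal in the MALDI analysis.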
Glycoproteomic experiments split the intensity of a single type of glycan into several different peaks, whereas glycomic analysis combines the intensity from several different proteins, which makes the direct comparison of these two types of data less fruitful. Glycomic analysis lacks the information on the protein part completely when performed on complex mixtures. Glycopeptide analysis provides information that can be related to the function of a protein and its modulation by glycosylation, which is important in physiological and/or pathological states. Based on these observations, we conclude that the two techniques give different views of the same sample and that their data complement each other. However, it is clear that advances in both techniques are needed to get a complete picture of the glycome of a given entity such as cells, tissues, or exosomes.
Combining the data from the glycomic and glycoproteomic approach revealed that exosomes seem to be enriched in fucosylated glycans. Fucosylation of glycan structures has been linked with cancer progression and inflammation (36). Fucosylated proteins such as carbohydrate antigen 19-9 and α-fetoprotein (AFP) have been used as serum tumor markers of pancreatic and liver cancer, respectively (37). The fucosylated form of AFP is better than AFP in diagnosing liver cancer (38). Of the total complex glycan structures found in our study, 50% were found to be fucosylated. Therefore, exosomes might prove to be a valuable source of biomarkers for these diseases. However, further studies to provide information about this potential of glycan structures are needed. The present study paves the way for studies to exploit the glycoproteomics of urinary exosomes in search of glycopeptide biomarkers as well as identifying the target proteins for uptake of exosomes by cells.