Mapping In Vivo O-Glycoproteome Using Site-specific Extraction of O-linked glycopeptides (EXoO)

Protein glycosylation is one of the most abundant post-translational modifications. However, detailed analysis of in vivo O-linked glycosylation, a major type of protein glycosylation, has been severely impeded by the scarcity of suitable methodologies. Here, we present a chemoenzymatic method for the site-specific extraction of O-linked glycopeptides (EXoO), which enabled the unambiguous mapping of over 3,000 O-linked glycosylation sites and the definition of their glycans on over 1,000 proteins in human kidney tissues, T cells and serum. This large-scale localization of O-linked glycosylation sites nearly doubles the number of previously identified sites, demonstrating that EXoO is the most effective method to date for defining the site-specific O-linked glycoproteome in different sample types. Detailed structural analysis of the identified sites revealed conserved motifs and topological orientations facing the extracellular space, the cell surface, and the lumen of the ER and the Golgi. EXoO also revealed significant differences in the in vivo O-linked glycoproteome of tumor and normal kidney tissues, pointing to its broader use in clinical diagnostics and therapeutics.

including the use of lectins [6, 7], HILIC [8, 9], hydrazide chemistry [10, 11], metabolic labelling [5, 12] and a gene-engineered cell system named 'SimpleCell' [13]. However, all of these methodologies have

In EXoO, proteins are first digested to generate peptides, which are then conjugated to a solid support. After washing, the O-linked glycopeptides are enzymatically released from the support using the endo-protease OpeRATOR, which requires the presence of O-linked glycans to specifically cleave on the N-terminal side of O-linked glycan-occupied Ser or Thr (Fig. 1A). To demonstrate proof of principle, bovine fetuin was analysed and the six known O-linked glycosylation sites documented in the UniProt database were pinpointed at Ser-271, Thr-280, Ser-282, Ser-296, Thr-334 and Ser-341 (Supplementary Table 1). In addition, a new O-linked glycosylation site at Ser-290 was identified (Supplementary Table 1 and Supplementary Fig. 1). Of note, O-linked glycans were still attached to the site-specific O-linked glycopeptides, as confirmed by the detection of oxonium ions, peptide (Y0) ions and, less commonly, peptide + HexNAc (Y1) ions in the MS/MS spectrum (Fig. 1B).
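
The cleavage rule described above (OpeRATOR cuts on the N-terminal side of a glycan-occupied Ser or Thr) can be illustrated with a minimal in-silico sketch. The function name `operator_cleave`, the 0-based site indexing and the example peptide are hypothetical and not part of the published workflow. The sketch does not model attachment to the solid support or any sequence preference of the enzyme.

```python
def operator_cleave(peptide, glyco_sites):
    """Split a peptide on the N-terminal side of each glycan-occupied Ser/Thr.

    peptide     -- amino-acid sequence in one-letter code
    glyco_sites -- 0-based indices of residues assumed to carry an O-glycan
    Returns the fragments in N- to C-terminal order. Every fragment after a
    cut starts with an occupied Ser or Thr.
    """
    for i in glyco_sites:
        if peptide[i] not in "ST":
            raise ValueError(f"residue {peptide[i]}{i} is not Ser or Thr")

    fragments = []
    start = 0
    for cut in sorted(set(glyco_sites)):
        if cut > start:  # a site at position 0 needs no cut
            fragments.append(peptide[start:cut])
        start = cut
    fragments.append(peptide[start:])
    return fragments


# Made-up peptide with occupied Ser at index 3 and Thr at index 5.
print(operator_cleave("AAPSGTPR", [3, 5]))
```

Under these assumptions, each fragment beginning with Ser or Thr pinpoints one glycosylation site at its first residue, which is what makes the site assignment unambiguous.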

T cells and serum
EXoO was benchmarked using human kidney tissue, T cells and serum to determine the performance of the method in samples with differing levels of protein complexity. To do this, O-linked glycopeptides were extracted using EXoO, fractionated into 24 fractions and then subjected to LC-MS/MS analysis (Fig. 2A)

unique peptides compared to that seen for serum, with more than half of the peptides detected in serum also being identified in the tissue sample, possibly due to the presence of serum in tissue samples (Fig. 2B). To visualize the relative abundance of peptides in different samples, the PSM numbers of peptides, which are suggestive of relative abundance, were clustered by unsupervised hierarchical clustering (Fig. 2C). This showed not only that the peptides differed between samples but also that their relative abundances were markedly divergent between samples (Fig. 2C). Interestingly, immunoglobulin heavy constant alpha 1 (IGHA1) has the highest PSM

To identify changes in the O-linked glycoproteome between normal and tumor kidney tissue, spectral counting label-free quantification of the EXoO-identified peptides was used (Fig. 3A).
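
Spectral counting, as used above, takes the number of PSMs assigned to a peptide as a proxy for its relative abundance. The following is a minimal sketch of that idea, not the authors' pipeline. Normalizing by the total PSM count per sample and adding a pseudocount of 1 are assumptions made for illustration, and the sample and peptide names are invented.

```python
import math
from collections import Counter


def spectral_counts(psms):
    """Count PSMs per peptide within each sample.

    psms -- iterable of (sample, peptide) pairs, one pair per PSM
    Returns {sample: Counter({peptide: count})}.
    """
    counts = {}
    for sample, peptide in psms:
        counts.setdefault(sample, Counter())[peptide] += 1
    return counts


def log2_ratio(counts, peptide, sample_a, sample_b, pseudocount=1.0):
    """log2 of the normalized spectral count of a peptide in A relative to B."""
    total_a = sum(counts[sample_a].values())
    total_b = sum(counts[sample_b].values())
    # The pseudocount (an assumption) avoids division by zero for peptides
    # absent from one of the samples.
    norm_a = (counts[sample_a][peptide] + pseudocount) / total_a
    norm_b = (counts[sample_b][peptide] + pseudocount) / total_b
    return math.log2(norm_a / norm_b)


# Invented PSM list: 4 PSMs per sample in total.
psms = (
    [("tumor", "SPEPTIDE")] * 3 + [("tumor", "TOTHER")] * 1
    + [("normal", "SPEPTIDE")] * 1 + [("normal", "TOTHER")] * 3
)
counts = spectral_counts(psms)
print(log2_ratio(counts, "SPEPTIDE", "tumor", "normal"))
```

A per-sample table of such counts is also the kind of matrix that could be passed to a hierarchical clustering routine to compare samples.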

Solid-phase extraction of site-specific O-linked glycopeptides from fetuin
Bovine fetuin (P12763) was denatured in buffer containing 8 M urea and 1 M ammonium bicarbonate (AB) and reduced in 5 mM DTT at 37°C for 1 hour. Proteins were alkylated in 10 mM iodoacetamide at room temperature (RT) for 40 min in the dark. The resulting samples were diluted eight-fold using 100 mM AB buffer before adding trypsin (enzyme/protein ratio of 1/40 w/w) and incubating at 37°C for 16 hours. Following digestion, peptides were desalted using a C18 column (Waters, Milford, MA) according to the manufacturer's instructions.

Peptides were conjugated to AminoLink resin (Pierce, Rockford, IL) as previously described [20]. Briefly, the pH of the peptide-containing eluate of the C18 column was adjusted to 7.4 by adding phosphate buffer (pH 8.0). Peptides were then incubated with the resin (100 µg/100 µl resin, 50% slurry) and 50 mM sodium cyanoborohydride (NaCNBH3) at RT for at least 4 hours or Tris-HCl buffer (pH 7.4) to collect the remaining peptides. The pooled peptides were then desalted on a C18 column and dried by lyophilization.
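
The dilution and enzyme amounts in the digestion step above can be checked with a few lines of arithmetic. This is a sketch only. The helper names are hypothetical, the 100 µg protein input is an example value, and the ammonium bicarbonate calculation assumes that "eight-fold" means one part sample plus seven parts of 100 mM AB.

```python
def after_dilution(c_sample, c_diluent, fold):
    """Concentration after mixing 1 part sample with (fold - 1) parts diluent."""
    return (c_sample + c_diluent * (fold - 1)) / fold


def trypsin_mass_ug(protein_ug, protein_per_enzyme=40):
    """Trypsin mass for an enzyme/protein ratio of 1/protein_per_enzyme (w/w)."""
    return protein_ug / protein_per_enzyme


# Urea: 8 M in the sample, none in the diluent.
print(after_dilution(8.0, 0.0, 8))   # urea, in M
# Ammonium bicarbonate: 1 M in the sample, 0.1 M in the diluent.
print(after_dilution(1.0, 0.1, 8))   # AB, in M
# Trypsin for an example input of 100 µg protein.
print(trypsin_mass_ug(100))          # in µg
```

Under these assumptions the dilution brings urea down to 1 M before trypsin is added, and 100 µg of protein calls for 2.5 µg of trypsin.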

Extraction of O-linked glycopeptides from human kidney tissue, serum and T cells
Collection and use of human tissue was approved by the Johns Hopkins Institutional Review Board (IRB). Kidney tumors were categorised as clear cell renal cell carcinomas (CCRCC), and samples of tumor tissue were stored at -80°C before use. Control normal kidney tissue samples were collected from the same individuals. Proteins from human kidney tissues, serum (Sigma-Aldrich, St. Louis, MO) and CEM T cells were trypsin-digested as described above. Following digestion, guanidination of peptides was conducted on a C18 column using a procedure described previously to recover the Lys-containing peptides from complex samples [17].

Peptide fractionation
Peptides (100 µg) were split into 96 fractions using a 1220 Series HPLC (Agilent Technologies, Inc., CA) equipped with a Zorbax Extend-C18 analytical column containing 1.8 μm particles at a flow rate of 0.3 ml/min. Mobile phase A was 10 mM ammonium formate (pH 10) and B was 10 mM ammonium formate and 90% acetonitrile (pH 10). Peptides were separated using the

LC-MS/MS analysis
Peptides dissolved in 0.1% formic acid (FA) were analyzed on a Fusion Lumos mass spectrometer with an EASY-nLC 1200 system or a Q-Exactive HF mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) with a Waters NanoAcquity UPLC (Waters, Milford, MA). The mobile phase flow rate was 0.2 μL/min with 0.1% FA/3% acetonitrile in water (A) and 0.1% FA/90% acetonitrile (B). The gradient profile was set as follows: 6% B for 1 min, 6−30% B for 84 min, 30−60% B for 9 min, 60−90% B for 1 min and 90% B for 5 min, followed by equilibration in 50% B at a flow rate of 0.5 μL/min for 10 min. MS analysis was performed using a spray voltage of 1.8 kV. Spectra (AGC target 4 × 10^5 and maximum injection time 50 ms) were collected from 350 to 1800 m/z at a resolution of 60 K, followed by data-dependent HCD MS/MS (at a resolution of 50 K, collision energy 29, intensity threshold of 2 × 10^5 and maximum IT 250 ms) of the 15 most abundant ions using an isolation window of 0.7 m/z. Charge-state screening was enabled to reject ions with unassigned, single, or more than six charges. Fixed first mass was 110 m/z. A dynamic exclusion time of 45 s was used to discriminate against previously selected ions.
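
The gradient profile above can be written as a small table of segments, which makes the total run time easy to check. The sketch below assumes linear ramps within each segment, which is the usual reading of such a profile but is not stated explicitly here, and it leaves out the 10 min equilibration step. The names `GRADIENT` and `percent_b` are hypothetical.

```python
# (duration in min, %B at segment start, %B at segment end)
GRADIENT = [
    (1, 6, 6),
    (84, 6, 30),
    (9, 30, 60),
    (1, 60, 90),
    (5, 90, 90),
]


def gradient_length_min():
    """Total length of the gradient, excluding equilibration."""
    return sum(duration for duration, _, _ in GRADIENT)


def percent_b(t_min):
    """%B at time t_min, assuming linear ramps within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start = 0.0
    for duration, b_start, b_end in GRADIENT:
        end = start + duration
        if t_min <= end:
            fraction = (t_min - start) / duration
            return b_start + fraction * (b_end - b_start)
        start = end
    raise ValueError("time is beyond the end of the gradient")


print(gradient_length_min())  # minutes
print(percent_b(43))          # halfway through the 6-30% B ramp
```

With the segment durations listed in the text, the gradient itself spans 100 min, and the composition at 43 min is 18% B under the linear-ramp assumption.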