Probing the O-Glycoproteome of Gastric Cancer Cell Lines for Biomarker Discovery

Circulating O-glycoproteins shed from cancer cells represent important serum biomarkers for diagnostic and prognostic purposes. We have recently shown that selective detection of cancer-associated aberrant glycoforms of circulating O-glycoprotein biomarkers can increase specificity of cancer biomarker assays. However, the current knowledge of secreted and circulating O-glycoproteins is limited. Here, we used the COSMC KO “SimpleCell” (SC) strategy to characterize the O-glycoproteome of two gastric cancer SimpleCell lines (AGS, MKN45) as well as a gastric cell line (KATO III) which naturally expresses at least partially truncated O-glycans. Overall, we identified 499 O-glycoproteins and 1236 O-glycosites in gastric cancer SimpleCells, and a total of 47 O-glycoproteins and 73 O-glycosites in the KATO III cell line. We next modified the glycoproteomic strategy to apply it to pools of sera from gastric cancer patients and healthy individuals to identify circulating O-glycoproteins with the STn glycoform. We identified 37 O-glycoproteins in the pool of cancer sera, and only nine of these were also found in sera from healthy individuals. Two identified candidate O-glycoprotein biomarkers (CD44 and GalNAc-T5) circulating with the STn glycoform were further validated as being expressed in gastric cancer tissue. A proximity ligation assay was used to show that CD44 was expressed with the STn glycoform in gastric cancer tissues. The study provides a discovery strategy for aberrantly glycosylated O-glycoproteins and a set of O-glycoprotein candidates with biomarker potential in gastric cancer.

Most broad proteomic studies for discovery of cancer biomarkers in serum have been designed to interrogate the proteome without taking into account that cancer cells often produce aberrant glycoforms (1). Many cancer biomarkers currently used in the clinic are based on circulating O-glycoproteins that are detected in established serological assays (CA125, CA15-3, CEA, and CA19.9) (2). In addition to being overexpressed in cancer, these proteins also carry aberrant glycans, which opens the opportunity to selectively detect aberrant glycoforms. An inherent problem with most cancer biomarker assays is that they often have poor specificity because the detected glycoprotein is also found at elevated levels in nonmalignant conditions (2,3). We recently found that the specificity of the widely used CA125 biomarker assay can be increased by selectively detecting aberrant O-glycoforms of the MUC16 mucin probed in the CA125 assay (4). Thus, the truncated O-glycan STn (NeuAcα2-6GalNAcα1-O-Ser/Thr) (Fig. 1) was particularly suited for discrimination of MUC16 circulating in cancer patients in contrast to MUC16 circulating in benign conditions (4).
One of the most characteristic phenotypes of cancer cells is the expression of truncated O-glycans, and the structures T (Galβ1-3GalNAcα1-O-Ser/Thr), STn, and Tn (GalNAcα1-O-Ser/Thr) (Fig. 1) are considered pancarcinoma antigens (2,5). These truncated O-glycans are essentially not produced in normal and benign cells, which suggests that circulating O-glycoproteins in normal and benign conditions should have more mature O-glycans, whereas O-glycoproteins shed from cancer cells are expected to display truncated glycan structures. Cancer cells produce, secrete, and shed many different O-glycoproteins with truncated O-glycans, and provided these glycoproteins reach the circulation they may be detectable in serum. However, it is also known that nonsialylated glycoproteins are cleared from circulation through innate immune lectin receptors (6). In fact, we were previously unable to detect circulating T and Tn glycoforms of MUC1 and MUC16, while the sialylated ST (NeuAcα2-3Galβ1-3[NeuAcα2-6]±GalNAcα1-O-Ser/Thr) and STn glycoforms were readily detectable (4,7). Furthermore, two classical serological biomarker assays, CA19-9 (8) and CA72.4 (9-11), are based on the detection of sialylated O-glycans, and especially the latter, which detects STn, shows that proteins expressing the STn glycoform circulate in serum of cancer patients. Interestingly, although CA72.4 has been used for decades, it is still largely unknown which O-glycoproteins carry STn and are detected by the CA72.4 assay (9,10).
The truncated STn O-glycan has attracted much attention because it is highly expressed in most gastric (12), colorectal (13), ovarian (14), breast (15), pancreatic (16), and bladder (17) carcinomas, whereas expression of STn on normal tissues is highly restricted (11,18). In addition, STn expression is associated with carcinoma aggressiveness and poor prognosis (15,19). We have recently described the presence of a few STn-bearing glycoproteins in serum from individuals with gastric cancer and gastric cancer precursor lesions (20). The biosynthetic and genetic mechanisms underlying the expression of this truncated O-glycan in cancer have remained poorly understood, and a number of mechanisms have been proposed that may not be mutually exclusive. One mechanism is the altered expression of the sialyltransferase ST6GalNAc-I, which is believed to be the main STn synthase (21,22) (Fig. 1), and in fact overexpression of this enzyme in cell lines appears to override the normal O-glycan elongation machinery and result in expression of STn (22,23). Another mechanism may be reduced core1 elongation that leads to accumulation of Tn, which serves as substrate for ST6GalNAc-I (22). The core1 synthase C1GALT1 is dependent on a private chaperone Cosmc, and several studies have reported that somatic mutations in the COSMC gene (24), or hypermethylation of the COSMC gene in cancer (25), lead to increased expression of Tn and STn. We have further shown that knockout (KO) of COSMC in a number of human cancer cell lines produces cells that express different levels of Tn and STn truncated O-glycans ranging from exclusive Tn to exclusive STn (26). A third potential mechanism offered recently may be related to cancer-associated relocation of the polypeptide GalNAc-transferases (GalNAc-Ts) that initiate O-glycosylation (Fig. 1) from Golgi to ER, which appears to induce expression of the Tn truncated O-glycans, although expression of STn has not been explored yet (27).
In the present study, we applied a glycoproteomics strategy to explore potential biomarker O-glycoproteins with the STn glycoform in gastric cancer. We first characterized the O-glycoproteome, including the secretome, of two gastric cancer cell lines, AGS (intestinal type gastric carcinoma) and MKN45 (diffuse type gastric carcinoma), using our SimpleCell (SC) discovery platform, where we identified a total of 499 O-glycoproteins (1236 O-glycosites). This strategy involves genetic engineering of cell lines to produce homogeneous truncated O-glycans (Tn and/or STn) by KO of COSMC, followed by Vicia villosa lectin (VVA) enrichment of Tn glycoproteins and/or glycopeptides for sensitive identification of O-glycoproteins and O-glycosites by mass spectrometry (26,28) (Fig. 1). We applied the same glycoproteomics workflow to a wild type (wt) gastric cancer cell line, KATO III (diffuse type gastric carcinoma), which naturally expresses Tn and STn O-glycans in a mixture with more complex structures, and identified a considerably smaller O-glycoproteome (total of 47 O-glycoproteins) compared with SimpleCells (total of 499 O-glycoproteins). We next modified the strategy to enrich for STn O-glycoproteins in pools of serum from cancer patients and normal controls using pretreatment with neuraminidase to remove sialic acid and expose Tn for VVA capture. This approach enabled us to isolate and identify 37 O-glycoproteins (49 O-glycosites) in gastric cancer serum. Finally, we confirmed that two of the identified serum O-glycoproteins (CD44 and GalNAc-T5) were expressed in gastric cancer tumors by immunohistology, and further used proximity ligation assay (PLA) to show that STn glycoforms of CD44 were expressed in cancer tissue.
Fig. 1 (legend fragment). ... GalNAc-transferases. The addition of GalNAc to serines or threonines (or tyrosines) forms the Tn structure that can be sialylated by ST6GalNAc-I or further elongated to form up to four core structures. The core structures can be further elongated.
This study clearly shows that cancer patients have a variety of circulating O-glycoproteins with the STn glycoform, and supports the hypothesis that these glycoproteins originate from the cancer tissue. The identified secreted and circulating aberrant O-glycoproteins serve as a discovery set for biomarkers of gastric cancer.

EXPERIMENTAL PROCEDURES
Generation and Characterization of AGS and MKN45 SimpleCells-We targeted the COSMC gene in two human gastric cell lines, AGS and MKN45, using zinc finger nuclease (ZFN) precise gene editing as previously described (26,29,30). Briefly, cells were transfected with 4 μg of compoZr® C1GalT1C1 DNA using an Amaxa™ Nucleofector™ according to the manufacturer's cell line-specific protocols (Lonza, Basel, Switzerland). Cells were screened on acetone-fixed slides by immunocytochemistry using monoclonal antibodies (MAbs) to the Tn (5F4) and STn (3F1) O-glycans (see supplemental Table S1 for MAbs used in the study). The cells were then cloned by limiting dilution and the final clones were further characterized by staining with monoclonal antibodies to either the C1GalT1 enzyme (5B6) or the T antigen (3C9) O-glycan structure with and without pretreatment with neuraminidase (Sigma). Slides were examined using a Zeiss Optical Microscope. Finally, we analyzed selected clones by PCR followed by sequencing to define the induced mutations in COSMC.
Serum Samples-Serum samples from gastric carcinoma patients were collected from The Portuguese Institute of Oncology (IPO-Porto, Portugal), and control samples from individuals undergoing hernia surgery from the Hospital de São João in Porto, Portugal. All samples were collected with informed consent and use of samples was approved by the local Ethical committees (CHSJ and IPO). A total of 29 individual samples (~0.69 ml each, from a sample taken at time of surgery) from both intestinal and unclassified subtype, according to Lauren's classification (31), with different disease stages, were used for the gastric carcinoma serum pool. Individuals were selected for blood group B and O to avoid potential cross-reactivity of VVA lectin with blood group A, although glycan array data from the Consortium for Functional Glycomics show no such cross-reactivity (http://www.functionalglycomics.org). Although some O-glycoproteins may potentially be lost in coagulation, most comparative proteomic studies of serum and plasma have found similar glycoprotein composition (24,32). The age of carcinoma patients varied from 49 to 81 with an average value of 65 years old, and included 17 males and 12 females. Age of control individuals varied from 21 to 83 years old with an average of 69 and included 20 males and 5 females.
Sample Preparation and Lectin Affinity Chromatography-Total cell lysates (TCL) and culture medium (SEC) were processed as previously described (28,33). In brief, spent culture medium (~80 ml harvested from two 175 ml T-flasks seeded at 5 × 10⁵ cells and cultured for 3 days) was cleared, dialyzed, and subjected to neuraminidase treatment (10 U Clostridium perfringens neuraminidase Type VI (Sigma)) before being loaded on a 0.3 ml Vicia villosa agglutinin (VVA) agarose (Vector Laboratories, Burlingame, CA) column. The column was washed with 10 column volumes (CV) of 0.4 M glucose in LAC A buffer (20 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1 M urea, 1 mM CaCl2, MgCl2, MnCl2, and ZnCl2) followed by 1 ml 50 mM AmBic. The glycoproteins were then eluted with 4 × 500 μl 0.05% RapiGest with heating to 90°C for 10 min as described previously (28). Fractions were pooled and directly reduced with 5 mM DTT for 40 min at 60°C, alkylated with 10 mM iodoacetamide in the dark for 45 min, and digested with trypsin (25 μg).
Cell pellets were obtained by removing the media, washing with phosphate buffer solution (PBS), and scraping the cells in PBS; the cells were then lysed in RapiGest and sonicated. The cleared lysate was reduced, alkylated with iodoacetamide, and then digested overnight with trypsin.
For AGS SC we also used chymotrypsin and Glu-C in addition to trypsin. Thus, for this cell line three data sets for TCL and three for SEC were generated.
Serum pools (20 ml) were diluted in PBS to 100 ml, neuraminidase treated (10 U), and loaded twice on a 0.3 ml VVA agarose column. The glycoproteins were eluted, reduced, and alkylated as described above, and digested with chymotrypsin (25 μg) overnight at 37°C.
All digests were finally subjected to lectin weak affinity chromatography (LWAC) on VVA agarose as previously described (26,28). Briefly, the digestion was quenched with 1% trifluoroacetic acid (TFA), and the resulting peptides were desalted with C18 stage tips and dried in a SpeedVac. The digest was then neuraminidase treated (New England Biolabs, Ipswich, MA), diluted in 2 ml LAC A buffer and injected onto a pre-equilibrated 2.6 m long VVA agarose (Vector Laboratories, Burlingame, CA) column, similar to the system described previously (28,34,35). The flow was set to 100 μl min⁻¹ and 500 μl fractions were collected. The column was washed with 0.4 M glucose in LAC A buffer and eluted with 2 × 2 ml 0.2 M GalNAc and 1 × 2 ml 0.4 M GalNAc.
Analysis of KATO III further included isolation of T-glycopeptides. The flowthrough fractions (2 ml) of the VVA LWAC step for TCL were concentrated on C18 and dried as before and further re-enriched for T glycoforms by PNA LWAC as previously described (26).
MS Approach and Data Analysis-An EASY-nLC 1000 UHPLC (Thermo Scientific, Bremen, Germany) interfaced via a nanoSpray Flex ion source to an LTQ-Orbitrap Velos Pro spectrometer (Thermo Scientific, Bremen, Germany) was used for analysis. The nLC was operated in a single analytical column setup using PicoFrit Emitters (New Objectives, 75 μm inner diameter) packed in-house with Reprosil-Pure-AQ C18 phase (Dr. Maisch, 1.9-μm particle size, 19-21 cm column length), and gradient elution times of 120 min were employed.
A precursor MS1 scan (m/z 350-1700) of intact peptides was acquired in the Orbitrap at a nominal resolution setting of 30,000, followed by Orbitrap HCD-MS2 and ETD-MS2 (m/z of 100-2000) of the five most abundant multiply charged precursors in the MS1 spectrum; a minimum MS1 signal threshold of 50,000 was used for triggering data-dependent fragmentation events; MS2 spectra were acquired at a resolution of 7500 for HCD MS2 and 15,000 for ETD MS2. Activation times were 30 and 200 ms for HCD and ETD fragmentation, respectively; isolation width was four mass units, and usually one microscan was collected for each spectrum. Automatic gain control targets were 1,000,000 ions for Orbitrap MS1 and 100,000 for MS2 scans, and the automatic gain control for the fluoranthene ion used for ETD was 300,000. Supplemental activation (20%) of the charge-reduced species was used in the ETD analysis to improve fragmentation. Dynamic exclusion for 60 s was used to prevent repeated analysis of the same components. Polysiloxane ions at m/z 445.12003 were used as a lock mass in all runs.
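The data-dependent selection rule described above (five most abundant multiply charged precursors, minimum MS1 signal of 50,000, dynamic exclusion) can be illustrated with a minimal Python sketch. This is not the instrument's acquisition software; the function name, the dictionary layout of a peak, and the m/z matching tolerance `tol` are hypothetical choices made only for illustration.

```python
def select_precursors(peaks, excluded_mz=(), top_n=5, min_signal=50_000, tol=0.01):
    """Pick precursors for MS2 from one MS1 scan (illustrative only).

    peaks       : list of dicts with keys "mz", "charge", "intensity"
    excluded_mz : m/z values currently under dynamic exclusion
    tol         : hypothetical m/z window used to match excluded values
    """
    candidates = [
        p for p in peaks
        if p["charge"] >= 2                      # multiply charged only
        and p["intensity"] >= min_signal         # MS1 signal threshold
        and not any(abs(p["mz"] - e) <= tol for e in excluded_mz)
    ]
    # Most abundant first, then keep at most top_n precursors.
    candidates.sort(key=lambda p: p["intensity"], reverse=True)
    return candidates[:top_n]
```

In a real run the exclusion list would be refreshed as each 60 s exclusion window expires; here it is simply passed in by the caller.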
Data Analysis-Data processing was performed using Proteome Discoverer 1.4 software (Thermo Scientific, Bremen, Germany) as previously described with small changes (28). Because of its higher processing speed, the Sequest HT node was used instead of Sequest. All spectra were initially searched with full cleavage specificity, filtered according to the confidence level (medium, low, and unassigned), and further searched with semispecific enzymatic cleavage. In all cases the precursor mass tolerance was set to 6 ppm and the fragment ion mass tolerance to 50 mmu. Carbamidomethylation on cysteine residues was used as a fixed modification. Methionine oxidation and HexNAc attachment to serine, threonine, and tyrosine were used as variable modifications for ETD MS2. All HCD MS2 spectra were preprocessed as described (28) and searched under the same conditions mentioned above using only methionine oxidation as a variable modification. All spectra were searched against a concatenated forward/reverse human-specific database (UniProt, January 2013, containing 20,232 canonical entries) with a maximum of two missed cleavage sites. In addition, another 251 common contaminants were included in the search using a target false discovery rate (FDR) of 1%. FDR was calculated using the target decoy PSM validator node, a part of the Proteome Discoverer workflow. The resulting list was filtered to include only peptides with glycosylation as a modification. This resulted in a final glycoprotein list identified by at least one unique glycopeptide. ETD MS2 data were used for unambiguous site assignment. HCD MS2 data were used for unambiguous site assignment only if the number of GalNAc residues was equal to the number of potential sites on the peptide.
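Two of the rules stated above lend themselves to a short worked illustration: the 6 ppm precursor mass tolerance, and the criterion that HCD data assign sites unambiguously only when the number of GalNAc (HexNAc) residues equals the number of potential acceptor sites. The Python sketch below is a minimal illustration under stated assumptions, not the Proteome Discoverer implementation; all function names are hypothetical, and potential sites are taken to be Ser, Thr, and Tyr, matching the variable modification used in the search.

```python
def within_ppm(observed_mz, theoretical_mz, tol_ppm=6.0):
    """True if the precursor mass error lies within the ppm tolerance."""
    error_ppm = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return abs(error_ppm) <= tol_ppm


def potential_sites(peptide):
    """Count Ser/Thr/Tyr residues in a one-letter peptide sequence."""
    return sum(peptide.count(residue) for residue in "STY")


def hcd_site_unambiguous(peptide, n_hexnac):
    """HCD rule: every potential site must be occupied for the
    assignment to be unambiguous without ETD fragment evidence."""
    return n_hexnac > 0 and n_hexnac == potential_sites(peptide)
```

For example, a peptide with two Ser/Thr/Tyr residues carrying two HexNAc leaves no ambiguity about which residues are modified, whereas the same peptide carrying one HexNAc would require ETD data for site assignment.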
Immunocytochemistry-Cells were set on multiwell glass slides and fixed in ice-cold acetone for 10 min before immunofluorescence as previously described (36). Cells were then incubated with MAbs (undiluted hybridoma supernatants) (supplemental Table S1) overnight at 4°C. To assess the expression of ST antigen, cells were pretreated with neuraminidase diluted in 0.1 M sodium acetate buffer (pH 5.5) to a final concentration of 0.1 U/ml, for 1 h at 37°C. FITC-conjugated rabbit anti-mouse immunoglobulin (Dako, Glostrup, Denmark) diluted 1:100 in 0.1% bovine serum albumin/PBS (BSA/PBS) solution was used as a secondary antibody for 45 min at room temperature. Cells were visualized and pictures were acquired using a Zeiss Optical Microscope.
Immunohistochemistry-Tissue samples were obtained from University Hospital São João and IPO-Porto and their use was approved by the local Ethical committees. Expression of CD44v6 (clone MA54, Invitrogen, 1:50) and STn (MAb 3F1, 1:200) was evaluated in 24 cases of human gastric carcinomas (11 intestinal subtype and 13 unclassifiable according to Lauren's classification (31)). Paraffin sections were dewaxed and rehydrated. Antigen retrieval was carried out by microwave treatment in sodium citrate buffer (10 mM, pH 6.0) for 20 min. Treatment with 3% hydrogen peroxide (H2O2) in methanol for 10 min was performed to block endogenous peroxidase. Tissue sections were incubated for 20 min with normal rabbit serum in PBS with 10% BSA, before incubation with MAbs to CD44v6 or STn for 1 h diluted in PBS with 5% BSA. After three washing steps with PBS, sections for immunohistochemistry were incubated with biotin-labeled secondary rabbit anti-mouse antibody (Dako, Glostrup, Denmark) diluted 1:200 in PBS with 5% BSA for 30 min and with ABC kit (Vector Labs, Burlingame, CA) for 30 min. Sections were stained for 2-3 min with 0.05% 3,3′-diaminobenzidine tetrahydrochloride (DAB) containing 0.01% H2O2. Sections were counterstained with Mayer's hematoxylin solution, dehydrated, and mounted. Slides were examined using a Zeiss Optical Microscope.
To evaluate the expression of GalNAc-T5 we produced a novel MAb 5F11 as previously described (37). Briefly, human GalNAc-T5 was recombinantly expressed in insect cells as a secreted HIS-tagged construct, purified by Ni-chromatography, and used for immunization of mice. Hybridomas were selected and characterized as previously described (37). The specificity of MAb 5F11 for GalNAc-T5 was evaluated by immunocytology with SF9 cells expressing different human GalNAc-T isoforms (GalNAc-T5, T7, T10, and T13) and human cell lines (K562, K562 GalNAc-T5 KO, Colo205, Hek293, LST174, HL60, and A431). MAb 5F11 was used in immunofluorescence labeling of frozen sections of human gastric carcinomas (n = 12, from intestinal, diffuse, and unclassified subtype, according to Lauren's classification (31)) and adjacent gastric mucosa (n = 4) and intestinal metaplasia (n = 4). Serial sections selected for immunofluorescence were stained by hematoxylin and eosin. Immunofluorescence labeling with MAb 5F11 was performed as previously described (21).
In situ Proximity Ligation Assay-In situ Proximity Ligation Assay (PLA) was performed in paraffin sections from gastric carcinoma cases for the detection of colocalization of STn and CD44v6. PLA was performed by adapting the procedure previously described (38). Briefly, Duolink In situ Probemaker was used to conjugate the PLA oligonucleotide arms directly to primary antibodies according to the manufacturer's instructions (MAb MA54 to CD44v6 and MAb 3F1 to STn). Duolink II reagents from Olink Bioscience were used according to the manufacturer's instructions. Paraffin sections were dewaxed and rehydrated, antigen retrieval was performed as described previously (38), and endogenous peroxidase was blocked before sections were incubated for 30 min at 37°C with the Blocking Solution (Olink Bioscience, Uppsala, Sweden). Conjugated MAbs to CD44v6 (1:50) and STn (1:200) (labeled with PLA probes) were diluted in PBS with 5% BSA and with 1:20 of assay reagent (Olink Bioscience, Uppsala, Sweden), and slides were incubated for 1 h at room temperature. Slides were then washed with a filtered solution of 0.01 M Tris, 0.15 M NaCl, and 0.05% Tween 20 (Wash buffer A). For the ligation step, a solution with the ligation stock (1:5), consisting of two oligonucleotides, and the ligase (1:40), both diluted in pure water, was added to slides for 30 min at 37°C in order to hybridize to the two PLA probes. After washing with wash buffer A, the amplification solution (1:5), consisting of nucleotides and fluorescently labeled oligonucleotides, together with polymerase (1:80), both diluted in pure water, was added to the samples. The oligonucleotide arm of one of the PLA probes acted as a primer for a rolling-circle amplification (RCA) reaction using the ligated circle as a template, generating a repeated sequence product. The fluorescently labeled oligonucleotides then hybridized to the RCA product. The slides were then washed with 0.2 M Tris and 0.1 M NaCl (Wash buffer B) and then with 0.01× wash buffer B.
To visualize the nuclei, cells were incubated with DAPI (Sigma-Aldrich, 0.4 mg/ml). Samples were examined under a Zeiss Imager.Z1 Axio fluorescence microscope (Zeiss, Welwyn Garden City, UK). Images were acquired using a Zeiss AxioCam MRm and the AxioVision Rel. 4.8 software.

Characterization of the O-glycoproteome of Gastric Cancer AGS and MKN45 SCs-In order to identify secreted or shed O-glycoproteins from gastric cancer cells, we first established gastric cancer cell lines displaying simplified homogeneous O-glycoproteomes. We achieved this by targeting the Cosmc chaperone in two different gastric cancer cell lines (AGS and MKN45) to produce SCs (26). The COSMC knockout clones were initially screened and selected by immunocytology with MAb 5F4 detecting the Tn structure. AGS and MKN45 wt cells produce little if any detectable Tn and STn antigens as evaluated by immunocytology with specific MAbs (Fig. 2). Detailed characterization showed that both AGS and MKN45 SCs expressed a mixture of Tn and STn antigens. We confirmed loss of the core1 synthase C1GalT1 and elongated O-glycans by immunocytology with MAbs to C1GalT1 and T, respectively. COSMC KO was verified by target-specific PCR followed by sequencing. The selected AGS SC clone had a small +1 bp insertion and wt alleles were not detectable. With the MKN45 SC clone we were not able to identify mutated alleles using primers spanning 600 bp around the ZFN target site, and no wt allele was found. It is likely that the ZFN-mediated deletion is large and thus not amplified with the PCR strategy used, as previously described for other cell lines (28). We therefore PCR amplified the genes MCTS1 and GLUD2 flanking COSMC and verified their presence in both MKN45 wt and SCs, suggesting that the deletion introduced only affected COSMC (not shown). Both MKN45 and AGS cell lines expressed very little, if any, ST6GalNAc-I detectable by immunocytology with MAb 2C3 (Fig. 2).
Interestingly, AGS SC showed weak reactivity with MAb 3C9 detecting the core1 T O-glycan despite KO of COSMC and loss of the C1GalT1 enzyme as evaluated by immunocytology (Fig. 2). It is currently unclear why AGS SC appear to produce detectable T structures in contrast to other SCs characterized previously (28), but one possibility may be that the relatively high level of expression of C1GalT1 in AGS wt cells (Fig. 2) may result in folding and expression of a minor amount of C1GalT1 despite the lack of COSMC in AGS SC.
The analysis of both TCL and SEC of AGS and MKN45 identified a total of 499 O-glycoproteins and 1236 O-glycosites as listed in supplemental Table S2. The TCL analysis identified 252 O-glycoproteins in AGS SC and 216 in MKN45 SC, whereas the SEC analysis identified 135 O-glycoproteins in AGS SC and 303 in MKN45 SC (Fig. 3). Although there was considerable overlap between the set of O-glycoproteins identified in TCL and SEC for each cell line, a majority of O-glycoproteins identified were only found in one of the sources. Of the total glycoproteins and glycosites, 178 O-glycoproteins (36%) and 310 O-glycosites (25%) were shared among the two cell lines (Fig. 3, supplemental Fig. S1). We compared the O-glycoproteome from the two gastric cancer SC lines with the O-glycoproteome derived from 12 human cancer cell lines from different organs of which none were generated from gastric tissues (designated "Other SCs") (28) (Fig. 4). This revealed that we identified a new subset of O-glycoproteins and O-glycosites in the gastric lines that had not been previously found. Specifically, out of the total 499 O-glycoproteins identified in the two gastric SimpleCells, 324 (65%) overlapped with the O-glycoproteins (663) identified in other SimpleCells. Among the novel 175 O-glycoproteins identified only in gastric cells, there were proteins typically expressed in gastric tissues, such as MUC5AC, a gel-forming mucin highly expressed in the gastric epithelium (39), and gastrin, a peptide hormone produced by G cells in the stomach (40). Comparing the O-glycosylation sites showed that 733 out of the 1236 (59%) O-glycosites identified in the gastric cell lines overlapped with previously identified O-glycosites (Fig. 4).
The finding of new O-glycoproteins and new glycosites in previously identified O-glycoproteins in the gastric cell lines, compared with our previous data set from 12 human cell lines, may reflect cell-specific differences in proteomes and in particular in the repertoire of GalNAc-Ts. We previously found that analysis of different cell lines continuously produced a subset of novel O-glycoprotein identifications without apparent saturation (28); however, it should also be considered that in the present analysis we have used more advanced and sensitive mass spectrometry instrumentation.
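The overlap percentages quoted for the gastric SimpleCell data set follow directly from the reported counts. The short Python check below recomputes them; the variable names are illustrative, and the only inputs are the counts given in the text.

```python
def pct(part, total):
    """Percentage of `part` in `total`, rounded to the nearest integer."""
    return round(100 * part / total)


gastric_proteins = 499   # O-glycoproteins in AGS SC + MKN45 SC
gastric_sites = 1236     # O-glycosites in AGS SC + MKN45 SC

shared_proteins = pct(178, gastric_proteins)      # shared by both cell lines
shared_sites = pct(310, gastric_sites)
known_proteins = pct(324, gastric_proteins)       # also found in other SCs
known_sites = pct(733, gastric_sites)
novel_proteins = gastric_proteins - 324           # found only in gastric SCs

print(shared_proteins, shared_sites, known_proteins, known_sites, novel_proteins)
```

The values reproduce the 36%, 25%, 65%, and 59% figures and the 175 novel O-glycoproteins stated above.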
Characterization of the Aberrant O-glycoproteome of KATO III wt Gastric Cancer Cells-KATO III, in contrast to AGS and MKN45, expresses Tn/STn glycans as evaluated by immunocytology, although it also produces more complex O-glycans as evidenced by detection of the core1 (T) glycoform ST (after neuraminidase treatment) by anti-T MAbs (Fig. 2) (36). Moreover, expression of the C1GalT1 enzyme is readily detectable by immunocytology (Fig. 2). We wanted to evaluate the performance of the VVA LWAC and MS workflow developed for SimpleCells in enrichment and identification of the subfraction of the O-glycoproteome carrying the aberrant Tn/STn O-glycan structures as a prelude to analysis of serum. Remarkably, the application of the VVA glycoproteomics workflow with KATO III wt using the same amount of cells/media (0.5 ml packed cells, 80 ml media) provided a relatively small data set (47 O-glycoproteins, 73 O-glycosites) compared with those obtained with SimpleCells (Fig. 5). However, about a third of the identified O-glycoproteins (17) were not previously found (supplemental Table S2) (28).
In addition to Tn-glycopeptides we also identified 49 O-glycoproteins (38 unambiguous O-glycosites) with T O-glycans using PNA LWAC after the VVA LWAC step from KATO III TCL (Fig. 5), which is in agreement with the presence of T and ST O-glycans (Fig. 2). Twelve of the T O-glycoproteins overlapped with Tn O-glycoproteins indicating that these proteins carried both Tn and T O-glycosites. Although PNA LWAC appears to have lower sensitivity than VVA (33), we identified a similar number of O-glycoproteins as with VVA.

Characterization of the STn O-glycoproteome in Serum-We then modified the SimpleCell enrichment strategy to capture and analyze O-glycoproteins with the STn glycoform in serum pools from healthy individuals and gastric carcinoma patients (Fig. 6A). We used neuraminidase pretreatment of whole serum to uncover STn glycoproteins and enable VVA LWAC enrichment. We initially showed by ELISA that the desialylation and enrichment step resulted in accumulation of proteins reactive with Tn antibodies and lectins, and that Tn reactivity was higher in samples from cancer patients than in healthy control serum (not shown). These preliminary results also showed that a large amount of sera would be needed for the LWAC and MS strategy, because semiquantification of Tn reactivity in VVA-enriched O-glycoprotein samples obtained from serum versus that obtained from the spent medium of a SimpleCell suggested that 1 ml of serum generated Tn reactivity equivalent to that of 1 ml of conditioned culture medium from a SimpleCell line.
Application of 20 ml of pooled gastric cancer serum resulted in identification of a total of 37 O-glycoproteins and 49 O-glycosites (Fig. 6B; supplemental Table S2). Applying the same strategy to a 20 ml normal serum pool in contrast only resulted in identification of 9 O-glycoproteins (15 O-glycosites), and all of these overlapped with those identified in the cancer serum pool (Fig. 6B; Table I). Some proteins had two or more glycosites and for 35% only a single site was identified.
We also searched MS spectra for HexHexNAc representing the T structure, which would be expected to be widely found on most serum glycoproteins after pretreatment with neuraminidase, although this should not be specifically captured by the VVA lectin enrichment steps unless the structure coincided on proteins/peptides with Tn structures. We did find 11 HexHexNAc sites distributed in 6 O-glycoproteins, and all of these were found on glycopeptides in combination with Tn O-glycosites.
A total of 24 glycoproteins found in the serum analyses were not found in the gastric SimpleCell glycoproteome (Fig. 6C), and of these 14 were not found in any of the SimpleCell glycoproteomes reported previously (Fig. 6E). These 14 were mainly complement proteins, coagulation factors, and immunoglobulin chains, all of which are abundant normal proteins in serum. The O-glycoproteins identified in healthy individuals were predominantly abundant serum proteins (Table I). The relative amounts of a large number of proteins in serum are fairly well established and proteomics strategies often relate their sensitivities to these levels (41,42). Of the serum O-glycoproteins identified for which reliable quantification exists, we found that many are placed among the classical plasma protein group that is mainly produced and secreted by the liver. However, we did find glycoproteins such as ADAMTS, Syndecan-1, fractalkine, and Dickkopf-related protein 1 among a lower abundance set of proteins. Moreover, we identified CD44 and GalNAc-T5 selectively in the cancer serum pool, and both are found in the gastric cancer cell lines as well as in many other SCs (26,28).
... (supplemental Fig. S2). All four possible Ser/Thr glycosylation sites in this peptide have been identified as being glycosylated (28), and in the present study these four sites were also identified in the MKN45 SC and AGS SC. The majority of the O-glycosites identified in CD44 in the gastric cancer cell lines were located in exons v5, v6, and v9. We first used immunohistology to probe expression of CD44 and STn in gastric cancer tissue. We used a MAb directed to the CD44v6 splice variant, which is known to be highly expressed in gastric cancer (43). Although the peptide found in serum was not located in exon v6, CD44v6 is described as cancer specific, and the v9 exon should also be included in the CD44v6-containing isoform (43). Immunohistology showed that CD44v6 was expressed in all 20 gastric cancer tissue samples evaluated.
The expression was heterogeneous among the cases, ranging between low (<25% of positive cells), moderate (25-50% of positive cells), and high (>50% of positive cells), and with variability in intensity (supplemental Table S3; Fig. 7). The STn O-glycan was also expressed in all samples tested with varying intensity and regional heterogeneity (supplemental Table S3). There appeared to be substantial overlap in staining patterns for CD44 and STn. To further confirm co-localization we used PLA with MAbs to CD44v6 and STn on 20 of the same cases of gastric carcinoma, and found that the majority (85%) of the gastric carcinoma cases were positive for PLA signal (Fig. 7; supplemental Table S3). The majority of the cases (65%) showed a strong positive signal, whereas 20% showed a moderate positive PLA signal. The 15% of the cases that were PLA negative for CD44v6/STn did appear to display distinct staining patterns without clear co-localization of CD44v6 and STn by immunocytology using the individual MAbs, in contrast to the remaining cases. The results support that CD44v6 is aberrantly glycosylated in gastric cancer and carries the STn glycoform.
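As an illustration only, the scoring thresholds and PLA percentages reported above can be restated in a short Python sketch. The function names are hypothetical and not part of the study; the sketch assumes that a value of exactly 25% or 50% positive cells falls in the moderate category, as the stated ranges imply, and that the percentages refer to the 20 cases analyzed by PLA.

```python
def classify_expression(percent_positive: float) -> str:
    """Map the percentage of positive cells to the categories used in the text.

    Thresholds follow the text: low (<25%), moderate (25-50%), high (>50%).
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 25:
        return "low"
    if percent_positive <= 50:
        return "moderate"
    return "high"


def cases_from_percent(percent: float, n_cases: int) -> int:
    """Convert a reported percentage of cases into a whole number of cases."""
    return round(percent * n_cases / 100)


N_CASES = 20  # gastric carcinoma cases analyzed by PLA (from the text)

# Reported PLA categories, as percentages of the 20 cases (from the text).
pla_percent = {"strong": 65, "moderate": 20, "negative": 15}
pla_counts = {label: cases_from_percent(p, N_CASES) for label, p in pla_percent.items()}

# Strong plus moderate cases together make up the PLA-positive fraction.
pla_positive = pla_counts["strong"] + pla_counts["moderate"]
print(pla_counts, "positive:", pla_positive, "of", N_CASES)
```

Under these assumptions the three PLA categories account for all 20 cases, and the strong and moderate cases together correspond to the 85% reported as PLA positive.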
In serum of cancer patients we identified one glycosite, T429, from GalNAc-T5, and this site was also identified in the gastric cancer SimpleCells. The long stem region of ~500 residues in GalNAc-T5 has a large number of O-glycosites reported (30 sites) (28), and most of these (19 sites) and an additional 7 novel ones were also identified in the gastric SCs (supplemental Fig. S3). The T429 glycosite is positioned in the stem region most proximal to the catalytic domain of the enzyme. Glycosyltransferases are often proteolytically cleaved in the stem region and released as soluble secreted active enzymes (44). GalNAc-T5 has the longest stem region of all known mammalian glycosyltransferases, with almost 500 residues, and the cleavage site for release of the soluble catalytic domain is unknown. We produced a new MAb (5F11) to human GalNAc-T5 and confirmed expression of this enzyme in the gastric cell lines by immunocytology (supplemental Fig. S4). We also evaluated the expression of GalNAc-T5 in gastric carcinomas and adjacent non-neoplastic gastric mucosa. This evaluation was done on frozen sections because our MAbs developed to GalNAc-Ts are normally generated to recognize the native active enzymes (37). In normal-appearing mucosa GalNAc-T5 was expressed in all epithelial cells with a supranuclear Golgi-like staining pattern (Fig. 8B). GalNAc-T5 was expressed in all (n = 12) gastric carcinomas tested (Fig. 8). In diffuse-type gastric carcinoma cases GalNAc-T5 labeling was similarly restricted to a cytoplasmic Golgi-like localization (Fig. 8J, L), whereas in intestinal-type gastric carcinoma cases a strong diffuse labeling throughout the cytoplasm was found (Fig. 8F, H). The GalNAc-T5 labeling included invasive tumor areas. In addition, GalNAc-T5 was also expressed in the epithelial cells of the adjacent normal gastric mucosa, but here with the typical supranuclear staining pattern found for Golgi-resident glycosyltransferases (37). 
The diffuse staining pattern found in cancer cells resembles the staining pattern of other glycosyltransferases, including GalNAc-Ts, in cancer cells where the Golgi complex is disorganized (45,46). We also evaluated the expression of GalNAc-T5 in areas of intestinal metaplasia present in the mucosa adjacent to gastric carcinomas in four of the cases, and observed strong, distinct supranuclear Golgi-like staining in goblet cells of metaplastic glands in all four cases (Fig. 8D).

DISCUSSION
It is our hypothesis that cancer cells produce and shed/secrete O-glycoproteins with truncated cancer-associated O-glycans that can be detected in circulation, and ultimately that such O-glycoproteins may serve as biomarkers with higher specificity. To address this we first characterized the O-glycoproteomes of two gastric cancer cell lines, and identified a large set of shed/secreted O-glycoproteins of which a major fraction was novel (28). We next developed a strategy for selective isolation and identification of circulating O-glycoproteins with truncated STn O-glycans based on our O-glycoproteomics strategy (26), and we were able to confirm that circulating truncated O-glycoproteins are selectively found in cancer patients in an exploratory study with pools of sera. Although there was only partial overlap between the shed'ome of the gastric cancer cell lines and the identified serum O-glycoproteins, we confirmed that two interesting candidates, CD44 and GalNAc-T5, found in both gastric cancer cells and in gastric cancer serum, were expressed in gastric primary tumors. Furthermore, applying the PLA technology we could confirm expression of the CD44v6 splice variant with the STn glycoform in tumor tissues. Although the design of our studies was mainly of a developmental and exploratory nature, the results support the hypothesis and warrant further efforts to identify and develop biomarker assays based on aberrant O-glycoproteins expressed by cancer cells.
The SC O-glycoproteomics strategy enables proteome-wide discovery of O-glycosites with high sensitivity (26). We previously applied this strategy to 12 human cancer cell lines derived from different organs to probe the human O-glycoproteome and found that each cell line expressed a minor subset of unique O-glycoproteins (26,28). This suggests that the O-glycoproteome of a cell is differentially regulated by its proteome and its repertoire of GalNAc-Ts, as predicted (28).
Here we found that the O-glycoproteomes of gastric cancer cell lines also contained new subsets of O-glycoproteins, some of which are characteristic of gastric tissues, such as MUC5AC (39) and gastrin (40). Moreover, we identified proteins that play pivotal roles in the neoplastic transformation of gastric carcinoma, such as E-Cadherin (47,48) and c-Met (hepatocyte growth factor receptor), which is highly expressed in MKN45 (49).
The sensitivity of the O-glycoproteome strategy is largely based on selective enrichment of O-glycopeptides and nLC-MS sequencing (50). The VVA lectin is quite efficient in selective enrichment of both O-glycoproteins and O-glycopeptides carrying Tn O-glycans (51). This enabled us to develop a workflow for enrichment of STn O-glycoproteins in serum using initial desialylation and capture of Tn O-glycoproteins and subsequent enrichment of O-glycopeptides (Fig. 6A). Several O-glycoproteome studies of serum have been undertaken in the past using PNA (35,52), and these have provided insight into the general presence of O-glycoproteins and O-glycosites in serum. Our interest was to isolate the small subfraction of circulating O-glycoproteins with aberrant STn O-glycosylation, which are predicted to exist from studies with the CA72-4 assay (53,54) and lectin arrays (55-58). High levels of STn are detectable in sera of patients with different cancer types (59-61), and this is associated with lower survival (62,63). We confirmed that STn O-glycoproteins are selectively found in gastric cancer patients. Because of the amount of serum needed for our glycoproteomics workflow we had to use pools of sera, and it is clearly necessary to perform studies of individual sera in the future.
We identified only a limited set of 37 O-glycoproteins in serum, which is quite low compared with our discovery rate in SimpleCells. However, applying the same workflow to wt KATO III gastric cancer cells, which express Tn and STn without COSMC knockout, we also identified only a small set of O-glycoproteins (Fig. 5). Although this clearly illustrates the sensitivity of the SimpleCell strategy, it also shows that cancer cells with Tn and STn phenotypes detectable by immunocytology may not provide the same level of sensitivity. The difference may reflect more inherent difficulties with analysis of heterogeneous glycoproteins and/or low quantities. In this respect we did identify O-glycoproteins in serum with estimated concentrations of 5-20 pg/ml (Dickkopf-related protein 1, GalNAc-T5, ST6GalNAc-I), which is nearly 10^9-fold lower than the concentrations of the most abundant serum proteins.
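The order of magnitude of this concentration range can be checked with a short Python sketch. The reference concentration of 40 mg/ml for an abundant serum protein is an assumption made for illustration (roughly the level of serum albumin) and is not a value reported in this study; the function name is likewise hypothetical.

```python
import math

PG_PER_MG = 1e9  # picograms per milligram


def fold_difference(high_pg_per_ml: float, low_pg_per_ml: float) -> float:
    """Return how many times higher the first concentration is than the second."""
    return high_pg_per_ml / low_pg_per_ml


# ASSUMPTION: about 40 mg/ml for a highly abundant serum protein (albumin-like).
ABUNDANT_MG_PER_ML = 40.0
abundant_pg_per_ml = ABUNDANT_MG_PER_ML * PG_PER_MG

# Estimated concentration range of the low-abundance O-glycoproteins (from the text).
low_abundance_pg_per_ml = (5.0, 20.0)

folds = [fold_difference(abundant_pg_per_ml, low) for low in low_abundance_pg_per_ml]
for low, fold in zip(low_abundance_pg_per_ml, folds):
    print(f"{low:g} pg/ml: about 10^{math.log10(fold):.1f}-fold below the abundant protein")
```

With this assumed reference value, both ends of the 5-20 pg/ml range fall between 10^9 and 10^10-fold below the abundant protein, in line with the scale stated above.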
Of the O-glycoproteins identified in both SimpleCells and cancer serum, we focused on CD44 and GalNAc-T5. CD44 is the major cell surface receptor for hyaluronate (64); it serves as a gastric stem cell marker and has been implicated in important cellular functions (47,65). The CD44 gene encodes several protein isoforms because of alternative splicing (66). The variant isoforms of CD44 (CD44v) seem restricted to subpopulations with stem cell potential, and they have been proposed to be involved in cancer development. CD44v6 in particular has been suggested to play a major role in cancer progression because of its ability to bind hepatocyte growth factor (HGF), osteopontin (OPN), and other major cytokines produced in the tumor microenvironment (67). The glycosite identified in CD44 from gastric cancer serum was located within the peptide IPVTSAKTGSF^584, which includes the sites also identified in the gastric cancer SimpleCells. This peptide sequence is located in exon v9 of CD44 (supplemental Fig. S2). CD44v6-containing isoforms are highly expressed in premalignant and malignant lesions of the stomach, pinpointing their potential as biomarkers for early transformation of the gastric mucosa (43). In gastric cancer, the CD44v6 isoforms contain exon v9, in which we identified the O-glycosite in gastric cancer serum (43). We also found O-glycosites in the v6-specific sequence, but only in the O-glycoproteomes derived from AGS and MKN45 SCs. We confirmed by in situ PLA that gastric cancer tissue expresses the CD44v6 variant and that it carries STn O-glycans. It is therefore likely that CD44v6/STn in serum originates from cancer cells and may represent a useful biomarker.
GalNAc-T5 is a poorly studied member of the GalNAc-T family (68) that is expressed in kidney and stomach (69,70). RNA-seq data obtained in various human tissues by the Illumina Human Body Map Project (HBM) (www.illumina.com; ArrayExpress ID: E-MTAB-513) indicate that GalNAc-T5 is expressed primarily in the lung in humans. According to the Human Protein Atlas (www.proteinatlas.org), GalNAc-T5 is also expressed, primarily in tissues of the digestive tract such as stomach, colon, and rectum, whereas the original Northern blot analysis performed when GalNAc-T5 was cloned showed expression in sublingual gland, stomach, small intestine, and colon in the rat (68). Interestingly, we have found GalNAc-T5-derived O-glycosites in most human cancer cell lines tested with our SC O-glycoproteomics strategy (28), suggesting that this GalNAc-T isoform is more widely expressed. Using a new MAb we found that GalNAc-T5 is expressed in the three gastric cancer cell lines studied and in K562, Colo205, and LST174, but is only weakly expressed or not detectable in other cancer cell lines such as HL60, OVCAR3, and HeLa. We also found that GalNAc-T5 is strongly expressed in normal gastric epithelium and in gastric cancer (Fig. 8). The expression of GalNAc-T5 in normal gastric tissues was limited to a perinuclear Golgi-like localization, as previously found for other GalNAc-Ts. In contrast, GalNAc-T5 was expressed throughout the cytoplasm of cells in gastric cancer tissues. A recent study also reported that GalNAc-T5 was expressed in normal gastric epithelium and gastric cancer using a commercial polyclonal antibody generated to a recombinant peptide fragment from the stem region (70). This study suggested that GalNAc-T5 expression was decreased or lost in advanced TNM stages of gastric cancer. However, the decreased reactivity in the advanced stages of cancer was not correlated with mRNA levels for the enzyme, as these were similar among all the samples. 
Interestingly, this apparent discrepancy could be related to our finding that the stem region of GalNAc-T5 is O-glycosylated. The polyclonal antibody used to detect GalNAc-T5 was raised to a fairly short non-glycosylated peptide, and potentially enhanced O-glycosylation could block reactivity. The MAb generated in the present study was produced to the active catalytic domain of GalNAc-T5 to ensure reactivity with the native enzyme, and further studies with this MAb may help clarify whether GalNAc-T5 may serve as a biomarker. Nevertheless, the present study provides compelling evidence that GalNAc-T5 is shed from gastric tumors with STn O-glycans and may have potential as a biomarker.
In summary, we have developed a fruitful discovery strategy for aberrant O-glycoproteins in serum and identified a number of O-glycoproteins that may serve as serum biomarkers for gastric cancer. We also provided compelling evidence that at least some of the aberrant O-glycoproteins detected in cancer serum are derived from cancer cells, and the strategy should be applicable to many cancers known to express STn.