N-glycosylation Profiling of Colorectal Cancer Cell Lines Reveals Association of Fucosylation with Differentiation and Caudal Type Homeobox 1 (CDX1)/Villin mRNA Expression

Various cancers such as colorectal cancer (CRC) are associated with alterations in protein glycosylation. CRC cell lines are frequently used to study these (glyco)biological changes and their mechanisms. However, differences between CRC cell lines with regard to their glycosylation have hitherto been largely neglected. Here, we comprehensively characterized the N-glycan profiles of 25 different CRC cell lines, derived from primary tumors and metastatic sites, in order to investigate their potential as glycobiological tumor model systems and to reveal glycans associated with cell line phenotypes. We applied an optimized, high-throughput membrane-based enzymatic glycan release for small sample amounts. Released glycans were derivatized to stabilize and differentiate between α2,3- and α2,6-linked N-acetylneuraminic acids, followed by N-glycosylation analysis by MALDI-TOF(/TOF)-MS. Our results showed pronounced differences between the N-glycosylation patterns of CRC cell lines. CRC cell line profiles differed from tissue-derived N-glycan profiles with regard to their high-mannose N-glycan content but showed a large overlap for complex type N-glycans, supporting their use as a glycobiological cancer model system. Importantly, we could show that the high-mannose N-glycans did not only occur as intracellular precursors but were also present at the cell surface. The obtained CRC cell line N-glycan features were not clearly correlated with mRNA expression levels of glycosyltransferases, demonstrating the usefulness of performing the structural analysis of glycans. Finally, correlation of CRC cell line glycosylation features with cancer cell markers and phenotypes revealed an association between highly fucosylated glycans and CDX1 and/or villin mRNA expression, both of which correlate with cell differentiation. Together, our findings provide new insights into CRC-associated glycan changes and set the basis for more in-depth experiments on glycan function and regulation.

Colorectal cancer (CRC) is a very prevalent and heterogeneous pathology with highly variable disease progression and clinical outcome among patients. It is the third most common cancer in men and the second most common in women (1) with a highly stage-specific patient survival (2). Treatments are often curative for patients with local disease stages (stage I-II), whereas a 5-year survival of only 13% is observed in patients with distant metastasis (stage IV) (2). As CRC is often asymptomatic in the first years, only 40% of the patients are diagnosed at stage I-II, which points to the urgent need for sensitive diagnostic tools for early detection and, consequently, effective curative treatment (3). In this context, understanding the complex mechanisms of CRC is a prerequisite for the development of new, more efficient means of detection, treatment, and prognosis of the disease.
Altered glycosylation is a hallmark of cancer (4) and is known to occur with cancer progression (4,5) as glycans are involved in many cancer-associated events such as adhesion, invasion, and cell signaling (6). As a result of altered glycan structures, cellular processes can be affected due to a change of interactions with glycan-binding proteins (7-9). Several CRC tissue-associated changes in N-glycans, O-glycans, and glycosphingolipid glycans have been reported and recently reviewed (7). For instance, N-glycans extracted from colorectal tumor tissues are characterized by an increase of sulfated glycans, (truncated) high-mannose-type glycans, and glycans containing sialylated Lewis type epitopes, while showing a decrease of bisection as compared with glycans from nontumor colorectal tissue of the same individuals (10). In accordance, elevated expression of sialyl Lewis A (NeuAcα2,3Galβ1,3[Fucα1,4]GlcNAc-R; NeuAc = N-acetylneuraminic acid, Gal = galactose, Fuc = fucose, GlcNAc = N-acetylglucosamine, R = rest) and pauci-mannosidic N-glycans (truncated high-mannose-type, Man1-4GlcNAc1-4GlcNAc; Man = mannose) was recently found to be correlated with poor prognosis in (advanced) colon carcinomas, and N-glycomic profiling was successfully applied to distinguish colorectal adenomas from carcinomas (11).
Due to limitations in accessibility of tumor materials and possibilities of in vivo studies on a large scale, cancer cell lines represent a relevant alternative and are widely used as model systems for studying the molecular mechanisms associated with cancer outcome and progression. Since the early 1960s, colorectal cancer cell lines have been established, with HT29, LoVo, LS-180, LS-174T, and Co115 representing the first continuous cell lines derived from colon tumors and xenografts (12-14). Major benefits of cancer cell lines are their continuous availability, their fast growth, and relatively easy handling, making them suitable also for high-throughput screenings (15) and a large range of experimental possibilities (16). Of note, advantages and limitations of cell lines have been recently reviewed (15).
In order to select suitable in vitro models, the characterization of molecular features and their comparison to tumor tissues are needed. A detailed Cancer Cell Line Encyclopedia was recently established containing a genomic dataset for 947 human cancer cell lines, of which 58 are colorectal cancer lineages (17). The Cancer Cell Line Encyclopedia includes data collections on genomic characterization, point mutation frequencies, DNA copy number, and mRNA expression levels. Comparison of these features between cell lines and primary tumors showed a high correlation in most cancer types, especially for colorectal cancer, suggesting that cell lines do represent tumor tissues quite reasonably, at least on the genetic level. However, the number of publications characterizing cancer cell lines at a molecular level is far behind the number of articles using cancer cell lines as model systems (18), and only a few studies have been conducted on whether in vitro cultured cell lines can serve as suitable models for human tumors (19-22). Furthermore, cell lines are well characterized genetically, but they are largely understudied with regard to their glycosylation profiles.
Here, we developed and optimized a new analytical method for the more sensitive and higher throughput N-glycome profiling of cells. This method is based on the release of N-glycans in a 96-well plate format from a PVDF-membrane (23) starting from a low number of cells (250,000 cells), the chemical derivatization of released N-glycans enabling the stabilization and discrimination of α2,3- and α2,6-linked N-acetylneuraminic acids (24), followed by registration of the N-glycans by MALDI-TOF(/TOF)-MS. The method was applied to characterize the N-glycome of 25 different colorectal cell lines in a fast and robust manner, including biological and technical replicates for all the cell lines. We obtained the comprehensive N-glycan profiles of 21 cell lines derived from primary tumors, two from lymph node metastases, one from a lung metastasis, and one from ascites fluid to assess their potential as glycobiological tumor model systems. Cancer cell line glycosylation features were then correlated with cancer cell markers and phenotypes as well as glycosyltransferase expression. This study provides new insights into colon cancer-associated glycan changes and sets a basis for studies into the functions of N-glycans in CRC with cell lines as model systems.
Cells and Cell Culture-Human colorectal cancer cell lines (see Table I and Supplemental Table S1) were obtained from the Department of Surgery of the Leiden University Medical Center (LUMC), Leiden, The Netherlands, as well as the Department of Pathology of the VU University Medical Center (VUmc), Amsterdam, The Netherlands. Cells were cultured in Hepes-buffered RPMI 1640 culture medium containing L-glutamine and supplemented with penicillin (5000 IU/ml), streptomycin (5 mg/ml), and 10% (v/v) fetal calf serum (FCS) at the LUMC or in DMEM medium, supplemented with 10% (v/v) FCS and antibiotics, except for the KM12 cell line, which was grown in RPMI/10% FCS/antibiotics and L-glutamine at the VUmc. Cells were incubated at 37°C with 5% CO₂ in humidified air, and cell culturing was performed up to a confluence of 80% under sterile conditions. For harvesting of the cells, medium was removed and adherent cells were washed twice with 1x PBS and trypsinized using 1x trypsin/EDTA solution in 1x PBS. To stop trypsin activity, medium in a ratio of 2:5 (trypsin/EDTA/medium; v/v) was added and cells were pelleted at 300 × g for 5 min. Cells were then resuspended in 3 ml 1x PBS and counted using the Countess™ Automated Cell Counter (Invitrogen, Paisley, UK) based on trypan blue staining. Cells were aliquoted to 2.0 × 10⁶ cells per ml of 1x PBS and washed twice with 1 ml 1x PBS for 3 min at 1000 × g. The supernatant was removed and pellets stored at -20°C until further use.
Release of N-glycans from Glycoprotein-Cell pellets (~2 × 10⁶ cells) from two or three biological replicates of each colorectal cancer cell line (see Table I and Supplemental Table S1) were suspended in 100 µl mQ and sonicated in a water bath for 30 min. As control, 20 µl of human control Visucon-F plasma was used that was brought to 100 µl with mQ. Glycans were released using a PVDF-membrane based protocol adapted from Burnina et al. (23). Briefly, 0.25 × 10⁶ cells/well or 25 µl/well diluted human plasma in denaturation buffer (5.8 M GuHCl and 5 mM DTT) were loaded in quadruplicate onto preconditioned HTS 96-well plates with hydrophobic Immobilon-P PVDF membrane and incubated for 30 min at 60°C in a humidified, sealed box serving as an incubation chamber within an oven. Plates were shaken for 5 min on a horizontal shaker prior to centrifugation (1 min, 500 × g). The wells were washed twice with 200 µl mQ with 2 min incubation steps on a horizontal shaker prior to centrifugation and once with 200 µl 100 mM NaHCO₃ (1 min, 500 × g). For N-glycan release, 50 µl 100 mM NaHCO₃ and 1 mU PNGase F (Roche) were added per well. Plates were placed into the incubation device and incubated for 3 h at 37°C. Glycans were recovered into 96-well collection plates by centrifugation (2 min, 1000 × g); any residual solution was collected from the membrane.
Release of N-glycans from the Cell Surface-A subset of CRC cell lines was used to obtain the cell surface N-glycome in comparison to the N-glycosylation profile of the residual pellet using a protocol modified from Hamouda et al. (26). CRC cells were cultured as described above. Cells were harvested, pelleted, and aliquoted in Eppendorf tubes to 4 × 10⁶ cells in 1x PBS. Subsequently, cells were washed 2 × 10 min and 1 × 5 min at 300 × g with 500 µl 1x PBS. Next, cell pellets were carefully resuspended in 500 µl 1x PBS, 3.5 µl recombinant PNGase F (~9 µg) were added, and samples were incubated for 30 min at 37°C and 250 rpm in an Innova 43 incubator shaker (New Brunswick, Enfield, CT). Afterwards, samples were centrifuged for 15 min at 500 × g and the supernatant containing the released glycans was collected. Supernatant and residual pellet were stored separately at -20°C until further use. N-glycans from residual pellets were then released using the PVDF-membrane based protocol as described, but with 1 million cells per well and overnight PNGase F incubation.
Derivatization of N-glycans and Hydrophilic Interaction Liquid Chromatography (HILIC) Solid Phase Extraction Glycan Enrichment-Released N-glycans were derivatized by ethyl esterification adapted from Reiding et al. (24), allowing for discrimination of N-acetylneuraminic acid linkages (α2,3 versus α2,6). Briefly, 20 µl of released N-glycans from the total and residual cell pellets as well as the control samples were added to 100 µl of ethyl esterification reagent (0.25 M EDC and 0.25 M HOBt, 1:1 v/v). For the cell surface glycans, the supernatant was concentrated in a vacuum centrifuge to a volume of 20-30 µl, and 100 µl ethyl esterification reagent was added. Samples were incubated for 1 h at 37°C. Subsequently, 100 µl ACN were added and the mixture was incubated at -20°C for 15 min. Samples were brought to room temperature prior to glycan purification by hydrophilic interaction liquid chromatography (HILIC) solid phase extraction modified from a protocol described previously (27). For purification of pellet-derived N-glycans, pipette tips of 20 µl volume were packed with a 4 mm piece of cotton thread (ca. 200 µg of cotton) and equilibrated by pipetting 3 × 20 µl mQ water, followed by 3 × 20 µl 85% ACN. Samples were loaded by carefully pipetting up and down 50 times. The tips were washed by pipetting 3 × 20 µl 85% ACN/1% TFA and 3 × 20 µl 85% ACN. N-glycans were eluted in 10 µl mQ water. For cell surface glycans, pipette tips of 200 µl volume were packed with cotton wool (ca. 1000 µg) to prevent clogging through salts. The cotton HILIC purification was performed as described but with 150 µl pipetting volume, while elution was kept at 10 µl mQ water.
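To illustrate why the derivatization separates the two linkage types in the mass spectrum, the following minimal sketch computes the mass shifts involved. It assumes the commonly reported outcome of the ethyl esterification reaction, which is not spelled out in the text above: an α2,6-linked N-acetylneuraminic acid is converted to an ethyl ester (net gain of C2H4), whereas an α2,3-linked one forms a lactone (net loss of H2O). The function and constant names are illustrative only.

```python
# Monoisotopic atomic masses in Da.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Assumed net changes per sialic acid (not stated in the text above):
# ethyl esterification adds C2H4, lactonization removes H2O.
ETHYL_ESTER_SHIFT = 2 * MASS["C"] + 4 * MASS["H"]
LACTONE_SHIFT = -(2 * MASS["H"] + MASS["O"])


def derivatization_shift(n_alpha26, n_alpha23):
    """Total mass shift (Da) for a glycan with the given sialic acid counts."""
    return n_alpha26 * ETHYL_ESTER_SHIFT + n_alpha23 * LACTONE_SHIFT


if __name__ == "__main__":
    separation = ETHYL_ESTER_SHIFT - LACTONE_SHIFT
    print(f"alpha2,6 (ethyl ester): {ETHYL_ESTER_SHIFT:+.4f} Da")
    print(f"alpha2,3 (lactone):     {LACTONE_SHIFT:+.4f} Da")
    print(f"separation per sialic acid: {separation:.4f} Da")
```

Under these assumptions, two glycans that differ only in the linkage of one sialic acid are separated by roughly 46 Da, which is easily resolved in reflectron-mode MALDI-TOF-MS.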
MALDI-TOF-MS Analysis-For mass spectrometric analysis, 5 µl of derivatized N-glycans were spotted onto an anchor chip MALDI target plate (Bruker Daltonics) and cocrystallized with 1 µl of 1 mg/ml superDHB in ACN/mQ (1/1, v/v) containing 1 mM NaOH. Samples were allowed to dry at room temperature and were then recrystallized with 0.5 µl 5 mg/ml superDHB in ACN/mQ (1/1, v/v) containing 1 mM NaOH. MALDI-TOF-MS spectra were acquired using an UltrafleXtreme™ mass spectrometer in the positive-ion reflector mode, controlled by FlexControl 3.4 software Build 119 (Bruker Daltonics). The instrument was calibrated using a Bruker peptide calibration kit. Spectra were obtained over a mass window of m/z 1000-5000 with ion suppression below m/z 900 for a total of 10,000 shots (1000 Hz laser frequency, 200 shots per raster spot during complete random walk). Tandem mass spectrometry (MALDI-TOF-MS/MS) was performed for structural elucidation via fragmentation in gas-off TOF/TOF mode.
Data Processing of MALDI-TOF-MS Spectra-An average spectrum over all the total cell line sample spectra was generated using an in-house developed script in Python 2.7.3 (Python Software Foundation; http://docs.python.org/py3k/reference/index.html). The average spectrum was internally recalibrated using glycan peaks of known composition (Supplemental Table S2), smoothed (Savitzky-Golay algorithm, peak width: m/z 0.06, four cycles), and baseline-corrected (Tophat algorithm) using FlexAnalysis Software (Version 3.3; Bruker Daltonics). Peaks with a signal-to-noise ratio > 2 were picked, manually revised, and analyzed in GlycoWorkbench 2.1 stable build 146 (http://www.eurocarbdb.org/) using the Glyco-Peakfinder tool (http://www.eurocarbdb.org/ms-tools/) to generate a list of glycan compositions. Using MassyTools version 0.1.5.1, a novel in-house software developed for automated data processing, the glycan peak list generated from the average spectrum was used for targeted extraction of the area under the curve for each of the CRC cell line N-glycomic profiles (28). Within this software, background was determined dynamically and subtracted from the intensities of all isotopic peaks prior to calibration of each spectrum and targeted data extraction. Several quality parameters are calculated in order to assess the quality of each individual spectrum as well as of the picked glycan peaks and to allow for the selection of good quality data. Based on these quality parameters, the composition list was reviewed, and only the glycans exhibiting a correct isotopic peak pattern and an average signal-to-noise ratio per cell line above 6 in at least 50% of the spectra of a specific cell line (including both technical and biological replicates) were used for final data extraction. Selected glycan compositions were confirmed by MS/MS. The final peak list as well as MS/MS data are listed in Supplemental Table S3.
Further, glycan profiles had to pass at least two out of three quality criteria for inclusion: (i) total intensity > 1 × 10⁵, (ii) fraction of analyte area with a signal-to-noise ratio > 6 of more than 80%, and (iii) fraction of glycan analyte intensity in total spectrum intensity > 40%. The area-under-the-curve values were rescaled to a total relative intensity of 100% for each spectrum.
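The inclusion rule and the rescaling step described above can be sketched as follows. This is a minimal illustration, not the MassyTools implementation; the function names, argument names, and the example values are hypothetical, while the three thresholds are taken from the text.

```python
def passes_quality_control(total_intensity, fraction_area_sn_above_6,
                           fraction_analyte_intensity):
    """Return True if a spectrum meets at least two of the three criteria."""
    criteria = [
        total_intensity > 1e5,              # (i) total intensity
        fraction_area_sn_above_6 > 0.80,    # (ii) analyte area with S/N > 6
        fraction_analyte_intensity > 0.40,  # (iii) analyte share of spectrum
    ]
    return sum(criteria) >= 2


def rescale_to_percent(areas):
    """Rescale area-under-the-curve values so that they sum to 100%."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {glycan: 100.0 * area / total for glycan, area in areas.items()}


if __name__ == "__main__":
    # Hypothetical spectrum: fails criterion (iii) but passes (i) and (ii).
    print(passes_quality_control(2.4e5, 0.91, 0.35))
    # Hypothetical extracted areas for three compositions.
    print(rescale_to_percent({"H5N2": 600.0, "H5N4F1": 300.0, "H5N4": 100.0}))
```

Requiring two of three criteria, rather than all three, keeps spectra that are weak in a single respect but otherwise informative.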
For the comparison of cell surface and residual pellet N-glycomes, a condensed peak list was used because the slightly lower spectral quality resulted in the detection of fewer analytes.
Data and Statistical Analysis-Averages, standard deviations, and relative standard deviations were calculated for all biological replicates of the cell lines (Supplemental Table S4). Preprocessed, selected, and rescaled total cell N-glycome data were imported into SIMCA software Version 13.0 (Umetrics AB, Umeå, Sweden), and a principal component analysis (PCA) was performed to reveal outliers and batch effects and to study the effects of possible confounders. A PCA displays the variation in the data on new vectors (principal components) and makes it possible to visualize the differences in the data in a two-dimensional way. The PCA is an unsupervised model that finds "natural" variation in the data without over-fitting. The observations (= cell line samples) are displayed in the score plots, while the variables (= relative intensity values for each glycan) are displayed in the loading plots. The glycans located at the outskirts of the loading plot are those contributing most to the displayed principal components. The samples located on the outskirts of the score plot are those showing a particularly large deviation (variation) from the other samples. In addition, the particular locations of data points in score and loading plots are associated. For example, samples with particularly high intensities of glycans located in the top left of the loading plot will locate in the top left of the corresponding score plot. Furthermore, we classified the glycans in traits such as levels of fucosylation and sialylation, for which relative abundances were later also calculated (see below). We used these traits to color the variables in the loading plots to find possible contributions of glycan traits to principal components.
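The relationship between scores (one value per sample) and loadings (one value per glycan) can be made concrete with a small sketch. It extracts only the first principal component by power iteration on the covariance matrix and uses plain mean centering; the scaling and algorithm used internally by SIMCA are not described above, so both choices are assumptions, and the example data are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every variable (glycan)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_principal_component(rows, iterations=500):
    """Return (loadings, scores, explained variance fraction) for PC1."""
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))
    if total_variance == 0:
        raise ValueError("data have no variance")
    # Start from the covariance column of the most variable glycan. It lies
    # in the range of the covariance matrix, so it is not wiped out by the
    # fixed-sum constraint of relative intensities (a uniform start would be).
    start = max(range(p), key=lambda i: cov[i][i])
    vector = [cov[i][start] for i in range(p)]
    for _ in range(iterations):
        norm = math.sqrt(sum(c * c for c in vector))
        vector = [c / norm for c in vector]
        vector = [sum(cov[i][j] * vector[j] for j in range(p))
                  for i in range(p)]
    eigenvalue = math.sqrt(sum(c * c for c in vector))
    loadings = [c / eigenvalue for c in vector]
    scores = [sum(a * b for a, b in zip(row, loadings)) for row in centered]
    return loadings, scores, eigenvalue / total_variance


if __name__ == "__main__":
    # Hypothetical relative intensities (%) of three glycans in four samples.
    data = [[60.0, 30.0, 10.0], [55.0, 33.0, 12.0],
            [30.0, 50.0, 20.0], [28.0, 52.0, 20.0]]
    loadings, scores, explained = first_principal_component(data)
    print(loadings, scores, round(explained, 3))
```

Glycans with large absolute loadings drive the component, and samples rich in a glycan with a positive loading obtain positive scores, which is the association between score and loading plots described above.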
To validate observed trends in PCA, averages per cell type (biological replicates) were used to calculate glycan traits (for calculations see Table II). Relative intensities were first summed according to N-glycan types (pauci-mannose, high-mannose, hybrid, and complex type). Data were then rescaled to 100% excluding high-mannose-type glycans to prevent influences of possible high-mannose-type intracellular precursors. Then, additional glycan-derived traits for complex- and hybrid-type glycans were calculated (mono-fucosylation, multi-fucosylation, α2,6- and α2,3-sialylation, ratio Hex versus HexNAc, and number of antennae) and compared (Supplemental Table S5). Mann-Whitney and Kruskal-Wallis tests with Tukey's posttest at a significance level of α = 0.05 were performed in GraphPad Prism Version 5.04 (GraphPad Software, Inc., La Jolla, CA) to explore differences of glycan traits with cell line characteristics such as stage of the original tumor, tumor site (primary versus metastasis), and differentiation.
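The trait calculation described above can be sketched as follows. The exact formulas are given in Table II, which is not reproduced here, so this sketch makes two labeled assumptions: fucosylation traits are expressed as a percentage of the summed complex- and hybrid-type intensity, and multi-fucosylation means two or more fucoses. The input layout and the example profile are hypothetical.

```python
def derived_traits(profile):
    """Compute type sums and fucosylation traits from one averaged profile.

    `profile` is a list of dicts with the keys 'type' ('pauci-mannose',
    'high-mannose', 'hybrid' or 'complex'), 'fucoses' (int), and
    'intensity' (relative intensity in %).
    """
    # Step 1: sum relative intensities per N-glycan type.
    type_sums = {}
    for glycan in profile:
        kind = glycan["type"]
        type_sums[kind] = type_sums.get(kind, 0.0) + glycan["intensity"]

    # Step 2: rescale to 100% after excluding high-mannose-type glycans.
    remaining = sum(v for k, v in type_sums.items() if k != "high-mannose")
    if remaining <= 0:
        raise ValueError("no signal left after excluding high-mannose glycans")
    rescaled = {k: 100.0 * v / remaining
                for k, v in type_sums.items() if k != "high-mannose"}

    # Step 3 (assumed denominator): traits within complex + hybrid glycans.
    subset = [g for g in profile if g["type"] in ("complex", "hybrid")]
    subset_total = sum(g["intensity"] for g in subset)
    if subset_total <= 0:
        raise ValueError("no complex- or hybrid-type signal")
    mono = sum(g["intensity"] for g in subset if g["fucoses"] == 1)
    multi = sum(g["intensity"] for g in subset if g["fucoses"] >= 2)
    return {
        "type_sums": type_sums,
        "rescaled_types": rescaled,
        "mono_fucosylation": 100.0 * mono / subset_total,
        "multi_fucosylation": 100.0 * multi / subset_total,
        "fucosylation": 100.0 * (mono + multi) / subset_total,
    }


if __name__ == "__main__":
    example = [
        {"type": "high-mannose", "fucoses": 0, "intensity": 50.0},
        {"type": "complex", "fucoses": 0, "intensity": 10.0},
        {"type": "complex", "fucoses": 1, "intensity": 25.0},
        {"type": "complex", "fucoses": 2, "intensity": 10.0},
        {"type": "hybrid", "fucoses": 0, "intensity": 5.0},
    ]
    print(derived_traits(example))
```

Summing many individual peaks into a few traits averages out peak-level noise, which is why the traits are more robust than single glycan intensities.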
AA-Labeling and LC-ESI-ion trap-MS/MS-Released glycans from technical replicates of total cell pellets were pooled per cell line, and 20 µl of sample were incubated with 40 µl AA-labeling solution (48 mg/ml AA in DMSO/15% glacial acetic acid and 1 M 2-picoline borane in DMSO, 1:1, v/v) for 2 h at 60°C (29). Samples were cooled down to room temperature, brought to 85% ACN, and purified by cotton-thread HILIC solid phase extraction as described before but with elution in 5 µl mQ water. AA-labeled glycans were transferred to glass vials with insert and 0.5 µl of sample was injected into a nano-RP-LC-ESI-ion trap-MS(/MS) system for glycan fragmentation analysis. Within the Ultimate 3000 RSLCnano system (Thermo Scientific, Sunnyvale, CA), samples were first concentrated onto a trap column (Acclaim PepMap100 C18 column, 100 µm × 2 cm, C18 particle size 5 µm, pore size 100 Å, Thermo Scientific) prior to separation on an Acclaim PepMap RSLC nano-column (75 µm × 15 cm, C18 particle size 2 µm, pore size 100 Å, Thermo Scientific). A flow rate of 700 nl/min was applied in a multistep linear gradient (t = 0-5 min, c(B) = 3%; t = 35 min, c(B) = 27%; t = 40-45 min, c(B) = 70%; t = 46-58 min, c(B) = 3%, with 0.1% formic acid in water as solvent A, and 95% ACN and 5% water as solvent B). The separation was also monitored by UV absorption at 215 nm. The LC-system was coupled to a CaptiveSpray nanoBooster (Bruker Daltonics) for mass spectrometric MS/MS analysis on an AmazonSpeed ion trap (Bruker Daltonics) using a captive spray for ionization of samples in the positive ion mode. For the electrospray (1300 V), fused-silica capillaries with an internal diameter of 20 µm were used. Solvent evaporation was achieved at 220°C with an ACN-enriched nitrogen nebulizer gas at a pressure of 0.2 bar. Tandem MS was performed automatically on the seven highest precursors per MS with ion detection from m/z 100 to 3000. Fragmentation spectra were analyzed using Data Analysis 4.2 Build 387 (Bruker Daltonics).
Analysis on AA-labeled glycans was exclusively used for structural elucidation (Supplemental Table S3) and targeted search for indicative fragment ions of blood group antigens (Supplemental Table S6).
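As a small aid for reading the multistep gradient given above, the following sketch returns the solvent B percentage at any time point. The breakpoints are taken from the text; straight-line interpolation between consecutive breakpoints is an assumption based on the description as a "multistep linear gradient", and the names are illustrative.

```python
# Breakpoints (time in min, solvent B in %) taken from the gradient above.
GRADIENT = [(0, 3), (5, 3), (35, 27), (40, 70), (45, 70), (46, 3), (58, 3)]


def solvent_b_percent(t):
    """Solvent B (%) at time t (min), interpolated linearly between steps."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    for minute in (0, 20, 37.5, 42, 50):
        print(minute, solvent_b_percent(minute))
```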
RNA Isolation and cDNA Synthesis-For mRNA analysis, 1 million of the cultured cells were transferred to RNase-free Eppendorf tubes, centrifuged for 5 min at 300 × g, pelleted, and lysed in 500 µl of lysis buffer. Lysed cells were stored at -80°C until mRNA was specifically isolated by capture of poly(A+) RNA in streptavidin-coated tubes using an mRNA Capture kit. cDNA was synthesized using a Reverse Transcription System kit following the manufacturer's guidelines. Lysates were incubated with biotin-labeled oligo(dT)20 for 5 min at 37°C, and then 50 µl of the mix was transferred to streptavidin-coated tubes and incubated for 5 min at 37°C. After washing three times with 250 µl of washing buffer, 30 µl of the reverse transcription mix (5 mM MgCl₂, 1× reverse transcription buffer, 1 mM dNTP, 0.4 U of recombinant RNasin RNase inhibitor, 0.4 U of reverse transcriptase, 0.5 µg of random hexamers in nuclease-free water) were added to the tubes and incubated for 10 min at room temperature followed by 45 min at 42°C. To inactivate avian myeloblastosis virus (AMV) reverse transcriptase and separate mRNA from the streptavidin-biotin complex, samples were heated at 99°C for 5 min, transferred to microcentrifuge tubes, incubated on ice for 5 min, diluted 1:2 (v/v) in nuclease-free water, and stored at -20°C until analysis.
Real-Time PCR-Oligonucleotides for 17 glycosyltransferases (GTs) were designed using the computer software Primer Express 2.0 (Applied Biosystems, Foster City, CA), synthesized by Invitrogen Life Technologies (Breda, The Netherlands), and are published elsewhere (30). PCRs were performed with the SYBR Green method in an ABI 7900HT sequence detection system (Applied Biosystems). The reactions were set up on a 96-well plate by mixing 4 µl of the two times concentrated SYBR Green Master Mix (Applied Biosystems) with 2 µl of an oligonucleotide solution containing 5 nM/µl of both oligonucleotides and 2 µl of a cDNA solution corresponding to 1:100 (v/v) of the cDNA synthesis product. The thermal profile for all the reactions was 2 min at 50°C, followed by 10 min at 95°C and then 40 cycles of 15 s at 95°C and 1 min at 60°C. The housekeeping gene glyceraldehyde-3-phosphate dehydrogenase was used as endogenous reference (31).
Glycosyltransferase mRNA Data and Statistical Analysis-To calculate the relative abundance of the genes, the formula 100 × 2^(Ct(GAPDH) - Ct(GT)) was used, where Ct is the cycle threshold, GAPDH is glyceraldehyde-3-phosphate dehydrogenase, and GT is the respective glycosyltransferase. In this formula, the Ct value is defined as the number of PCR cycles at which the SYBR Green fluorescent signal exceeds the threshold of 0.2 relative units (32). Averages were calculated for nine technical replicates, and the three highest and lowest values were colored in green and red, respectively (Supplemental Table S7). In order to improve comparability, mRNA data were also log2-transformed. To assess the quality and discriminative power, GT mRNA data were imported into SIMCA software and a PCA model was built as described above. The nine technical replicates were used as individual cross-validation groups. The median across the included cell lines was calculated and GT mRNA data were compared with relative abundances of corresponding glycan traits based on the total N-glycome. Both derived traits based on MS data and GT gene expression data were imported into GraphPad Prism (GraphPad Software), and a linear regression model was calculated.
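The relative abundance formula, the log2 transformation, and the linear regression against glycan traits can be illustrated with a short sketch. The formula follows the text above; the ordinary least-squares fit is a generic stand-in for the regression performed in GraphPad Prism, and all function names and example values are hypothetical.

```python
import math


def relative_abundance(ct_reference, ct_target):
    """100 x 2^(Ct reference - Ct target), in % of the reference gene."""
    return 100.0 * 2.0 ** (ct_reference - ct_target)


def log2_abundance(ct_reference, ct_target):
    """log2-transformed relative abundance."""
    return math.log2(relative_abundance(ct_reference, ct_target))


def linear_regression(x, y):
    """Ordinary least squares; returns (slope, intercept, r squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # A target crossing the threshold three cycles after the reference gene
    # is present at 1/8 of the reference level.
    print(relative_abundance(18.0, 21.0))
    # Hypothetical log2 expression values versus a glycan trait (%).
    expression = [log2_abundance(18.0, ct) for ct in (21.0, 22.5, 24.0, 26.0)]
    trait = [52.0, 47.5, 40.1, 33.0]
    print(linear_regression(expression, trait))
```

Because each PCR cycle ideally doubles the product, a difference of one Ct unit corresponds to a twofold difference in abundance, which is why the log2 scale is the natural one for comparison.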

RESULTS
High-Throughput Method for Cell Line N-glycosylation Profiling-We established a high-sensitivity method for MALDI-TOF-MS profiling of N-glycans from cell lines. Cellular (glyco-)proteins were solubilized in chaotropic agents and applied to PVDF-membranes in a 96-well format (23). N-glycans were released from adsorbed proteins by PNGase F and derivatized by ethyl esterification to achieve stabilization of sialic acids as well as differentiation of sialic acid α2,3- and α2,6-linkages (24). Glycans were purified using cotton-HILIC micro-solid phase extraction (27) and analyzed by MALDI-TOF-MS (see workflow in Supplemental Fig. S1). This novel workflow showed high sensitivity and allowed the robust acquisition of high-quality N-glycan profiles from 250,000 cells. The total cell N-glycan profiling workflow was applied to analyze 25 CRC cell lines in multiple biological and technical replicates per cell line, in order to be able to discern biological and technical variation. Applying an unsupervised principal component analysis (PCA) revealed only little variation between replicates (technical as well as biological) of each cell line, as seen by close clustering of scores (Supplemental Figs. S2A and S2B). The dataset was further analyzed for possible batch effects. PCA scores were colored according to batches (Supplemental Fig. S2C), and technical replicates distributed on different plates as well as different sample preparations were compared. Based on this comparison, batch effects were found to be minor and no correction was applied. Relative intensities for all glycans in the biological replicates of the cell lines with standard deviation and relative standard deviation are given in Supplemental Table S4. Relative standard deviations were below 20% for peaks with relative intensities above 5%, with the exception of cell line C10, which showed slightly higher variation.
Although this method already provided a robust data acquisition, glycan traits were calculated by summing up relative intensities of glycan peaks corresponding to a certain glycan class (see Table II) to further increase the robustness of the data analysis.
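The replicate statistics reported above (average, standard deviation, and relative standard deviation) and the check against the 20% and 5% figures can be sketched as follows. The use of the sample standard deviation (n - 1 denominator) is an assumption, and the replicate values are hypothetical.

```python
import statistics


def replicate_stats(values):
    """Return (mean, standard deviation, relative standard deviation in %)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; assumed, not stated above
    rsd = 100.0 * sd / mean if mean else float("nan")
    return mean, sd, rsd


def flag_variable_peaks(replicates, min_intensity=5.0, max_rsd=20.0):
    """List glycans above `min_intensity` (%) whose RSD exceeds `max_rsd` (%)."""
    flagged = []
    for glycan, values in replicates.items():
        mean, _, rsd = replicate_stats(values)
        if mean > min_intensity and rsd > max_rsd:
            flagged.append(glycan)
    return flagged


if __name__ == "__main__":
    # Hypothetical relative intensities (%) from three biological replicates.
    data = {
        "H9N2": [20.0, 21.0, 19.0],
        "H5N4F1": [10.0, 12.0, 14.0],
        "H5N4F2": [6.0, 10.0, 14.0],
    }
    for name, values in data.items():
        print(name, replicate_stats(values))
    print(flag_variable_peaks(data))
```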
N-glycosylation Profiling of 25 Colorectal Cancer Cell Lines-This sensitive and robust N-glycan profiling method was used to characterize the N-glycosylation of 25 CRC cell lines. Exemplary N-glycan profile spectra of two cell lines are shown in Fig. 1, while a complete set of N-glycan profiles is provided in Supplemental Fig. S3. Calculated traits are depicted in Fig. 2 and Supplemental Table S5. The N-glycan profiles of almost all CRC cell lines were dominated by high-mannose-type glycans (37.5% to 64.3%, average 53.0%, Figs. 1 and 2A; for structural evidence see Fig. 3A). The only exception was the cell line HCT116, which also showed a high abundance of complex type glycans (58.9%). Complex type glycans could be detected up to m/z 4500, and their abundance varied considerably among the different cell types (31.3% to 58.9%, average 43.1%, Fig. 2A). The high abundance of high-mannose-type glycans may reflect a vast contribution of intracellular precursors, especially indicated by the glycan Man9HexNAc2Glc1 (m/z 2067.69), which led to the decision to exclude high-mannose-type glycans when calculating the derived glycosylation traits related to complex type N-glycans (e.g. Hex/HexNAc ratio, sialylation levels). To prove, however, that high-mannose-type glycans are indeed also present on the cell surface, N-glycans were shaved off the plasma membrane in a 30-min incubation step of living cells in the presence of high concentrations of recombinant PNGase F. The relative abundance of high-mannose-type N-glycans on the cell surface with respect to the residual pellet is given in Fig. 2B, showing that high-mannose-type N-glycans are expressed on plasma membrane proteins but to a lower extent as compared with their presence as intracellular precursors. The cell line C10 is an exception and shows similar levels of high-mannose-type glycans on the cell surface and the residual pellet.
An exemplary MS spectrum comparing the cell surface N-glycome of CaCo2 cells with the residual pellet containing intracellular precursors is shown in Supplemental Fig. S4.
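The m/z value quoted above for the glucosylated precursor Man9HexNAc2Glc1 can be reproduced from its composition with a short calculation. The sketch assumes that the glycan is detected as a sodium adduct [M+Na]+ with a free reducing end; the adduct type is not stated explicitly above and is inferred from the positive-ion mode and the NaOH-containing matrix. Residue masses are standard monoisotopic values, and the names are illustrative.

```python
# Monoisotopic residue masses (Da): monosaccharide minus water.
RESIDUE_MASS = {"Hex": 162.052824, "HexNAc": 203.079373}
WATER = 18.010565        # added once for the free reducing end
SODIUM_ION = 22.989221   # Na+ (sodium atom minus one electron)


def sodiated_mz(composition):
    """m/z of the singly charged [M+Na]+ ion for a glycan composition."""
    residues = sum(RESIDUE_MASS[name] * count
                   for name, count in composition.items())
    return residues + WATER + SODIUM_ION


if __name__ == "__main__":
    # Man9 plus one glucose corresponds to ten hexoses and two HexNAc.
    print(f"{sodiated_mz({'Hex': 10, 'HexNAc': 2}):.2f}")
    # Man5 for comparison.
    print(f"{sodiated_mz({'Hex': 5, 'HexNAc': 2}):.2f}")
```

Because mannose and glucose are both hexoses and therefore isobaric, the mass alone identifies the composition Hex10HexNAc2; the assignment of the tenth hexose as glucose rests on the known biosynthetic precursor structure.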
For further data analysis, a PCA model based on biological replicates (averaged technical replicates) was generated using SIMCA software. This resulted in a model explaining 84.4% (R²Xcum) of the data with 13 principal components (PCs) featuring a good prediction power of 65.4% (Q²cum). Coloring the scores according to cell lines showed clear clustering based on the cell type as shown in Fig. 4A, indicating robust glycosylation features shared by biological and technical replicates. Cell line samples clustered within the Hotelling's T² 95% confidence region with the exception of HCT116, which seems to differ vastly from the other tested cell lines. Coloring the loading plots, which represent each glycan variable, according to glycan classes helped to explore which differences in glycosylation drove the separation in the PCA model. Trends observed in PCA models were then compared with relative abundances of glycan classes by derived trait calculations based on the total cellular N-glycome.
[Figure legend fragment: (Hex1HexNAc1α2,6NeuAc1). The presence of structural isomers cannot be excluded. Annotation was performed using GlycoWorkbench 2.1 stable build 146 (http://www.eurocarbdb.org/).]
Fucosylation-While PC1, 3, and 4 appeared to be linked to antennarity and terminal HexNAc residues, PC2 (13.5%) was found to reflect fucosylation, separating fucosylated (upper half) from non-fucosylated glycans (lower half; Fig. 4C). Accordingly, score plots of PC1 versus PC2 (Fig. 4A) show HCT116 and DLD-1 located apart from the other cell lines in the lower half, in line with the low relative quantities of fucosylated glycans in these cell lines (Fig. 2E). Relative quantification of complex type glycans reveals only 24.4%, 29.1%, and 41.2% fucosylation in HCT116, DLD-1, and HCT116_VUmc, respectively, whereas all other investigated CRC cell lines contain 74.4% to 86.4% fucosylated N-glycans (Fig. 2E, Supplemental Table S5). Cell line HCT116 has a mutation in the fucosylation pathway and is therefore expected to have low fucosylation. Splitting the total fucosylation into mono-fucosylation (one fucose) and multi-fucosylation (2-5 fucoses, indicative of the presence of Lewis-type antigens) revealed mono-fucosylation to be most abundant in Colo320_VUmc cells (72.4%), followed by SW1463 (67.3%), C10 (66.8%), and Co115 (59.4%). MS/MS spectra give evidence of core-fucosylation in many of the N-glycans, with the loss of the reducing end GlcNAc only occurring after the loss of fucose (Supplemental Fig. S3D, Supplemental Table S3). Nevertheless, mono-fucosylation can also occur on antennae and is not a reliable measure of core-fucosylated glycans. LS180 cells showed the highest multi-fucosylation, based on relative quantification of MS data (53.7%), followed by cell lines T84 (53.4%), HCT8 (51.8%), LS174T (50.7%), and SW1116 (47.5%; Fig. 2E, Supplemental Table S5). The presence of antenna fucosylation is indicated by MS/MS data (Supplemental Table S3). Furthermore, antenna fucosylation with indications of the presence of blood group antigens was found (Supplemental Table S6), which is in accordance with literature information on the expression of blood group antigens in some of the cell lines (Supplemental Table S1). In cell line SW48 (blood type AB), fragment ions indicative of the presence of blood group A antigen (GalNAc-Gal-(Fuc)-GlcNAc-) at m/z 308.88  N-glycans (Supplemental Fig. S6). The presence of blood group A antigens further contributes to the trait Hex/HexNAc ratio. However, abundances of blood group antigens were rather low on N-glycans, and expression of these antigens might be higher on other glycans and glycoconjugates such as O-glycans and glycosphingolipids.
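The derived fucosylation traits discussed above can be illustrated with a short calculation. The sketch below is a simplified assumption of how such traits may be computed, not the exact procedure of this study: the record layout (`glycan_class`, `fucoses`, `abundance`) and the function name are hypothetical, and the numbers in the usage note are invented for illustration.

```python
def fucosylation_traits(glycans):
    """Percentages of complex-type signal that is fucosylated, mono- and multi-fucosylated.

    glycans: list of dicts with hypothetical keys
      'glycan_class' ('complex', 'high_mannose', ...),
      'fucoses' (number of fucose residues in the composition),
      'abundance' (relative signal intensity).
    High-mannose and other classes are left out, so the traits are
    expressed relative to the complex-type glycans only.
    """
    complex_type = [g for g in glycans if g["glycan_class"] == "complex"]
    total = sum(g["abundance"] for g in complex_type)
    if total == 0:
        raise ValueError("no complex-type signal to normalize against")

    def share(predicate):
        selected = sum(g["abundance"] for g in complex_type if predicate(g))
        return 100.0 * selected / total

    return {
        "fucosylation": share(lambda g: g["fucoses"] >= 1),
        "mono_fucosylation": share(lambda g: g["fucoses"] == 1),
        "multi_fucosylation": share(lambda g: g["fucoses"] >= 2),
    }
```

For a toy profile with 50 units of high-mannose signal and complex-type signals of 10 (no fucose), 25 (one fucose), and 15 (two fucoses), the function returns 80% fucosylation, 50% mono-fucosylation, and 30% multi-fucosylation; the high-mannose signal does not enter the calculation.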
Overall, observations in the PCA plots could largely be confirmed by relative quantification using derived traits, showing vast differences between the cell lines. Furthermore, coloring the loadings according to glycan classes was found to be a promising way to explore the data and to find the discriminators contributing to each principal component.
Comparison with N-glycans Derived from Tissues-The aim of this study was to further evaluate the potential of the studied CRC cell lines as a model system. We therefore compared the N-glycan profiles derived from CRC cell lines (Supplemental Fig. 3 and Supplemental Table S4) with the data obtained for N-glycans derived from CRC tumor and control tissues by Balog et al. (10) (Supplemental Table S3). Profiles were further compared with recently published data on colorectal adenoma and carcinoma tissues (11), and most of the complex type glycans found in the cell lines were also present in tissues. Furthermore, typical cancer antigens like (sialyl) Lewis epitopes were preserved in the CRC cell lines. Also, the Sda antigen that we found in CaCo2 cells has been described to be expressed in the human gastrointestinal tract, while its expression decreases during CRC progression (38). The major difference between CRC cell line and tissue N-glycan profiles is the dominance of high-mannose-type glycans in most of the cells, which is partly due to intracellular precursors. Notably, high-mannose-type N-glycans were also present in CRC tissues (10, 11) and on the cell surface of CRC cell lines (Fig. 2B, Supplemental Fig. S4) but in lower relative abundance than in the total cell line profiles determined in this study.
Glycan Trait Associations with Stage and Differentiation-To gain more insight into the biology of the N-glycans, glycan traits were tested for association with cell line characteristics such as the stage of the original tumor and differentiation. Coloring the PCA scores of PC1 and PC2 according to the stages of the tumors from which the cell lines originated showed a gradient from stage I toward stage IV (Fig. 5A), indicating differences in fucosylation as well as in the ratio of Hex/HexNAc based on the corresponding loading plots (Figs. 4B and 4C). However, statistical evaluation by a Kruskal-Wallis nonparametric test did not reveal significant differences in fucosylation or Hex/HexNAc ratio between colorectal cancer stages (data not shown). Interestingly, each disease stage group contained two subgroups that were identified as CDX1/villin-positive and CDX1/villin-negative cell groups (data not shown). The homeobox gene CDX1 is an intestine-specific transcription factor associated with differentiation and villin expression and was therefore used as a marker for the differentiation state of the cell lines (35,39). Coloring the PCA plot according to CDX1/villin expression showed a separation within the first two principal components (Fig. 5B), and statistical analysis using the Mann-Whitney test revealed CDX1/villin expression to associate with terminal HexNAc (Fig. 5C, p value .047) as well as multi-fucosylation (Fig. 5D, p value .003). Coexpression data from the Cancer Cell Line Encyclopedia partly supported this association and showed CDX1 to be positively correlated with FUT3 (Pearson's correlation 0.63; Supplemental Table S8). Furthermore, the glycosyltransferase B3GNT8, involved in the generation of LacNAc repeats, was positively correlated with CDX1 (Pearson's correlation 0.79; Supplemental Table S8). We observed a trend toward higher poly-antennarity, reflecting more than four LacNAc repeats, in CDX1/villin-positive cells, but without statistical significance (data not shown).
However, the cell line SW48 appeared as an outlier not following this association, expressing a CDX1/villin-positive glycan phenotype while being CDX1/villin-negative.
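The two-group comparison described above (a glycan trait in CDX1/villin-positive versus CDX1/villin-negative cell lines) can be sketched as a Mann-Whitney U test. The following is a minimal standard-library sketch under stated assumptions: the function name is hypothetical, the p value uses the normal approximation with tie correction and no continuity correction (which is only approximate for small groups), and it is not the software implementation used in this study.

```python
import math
from collections import Counter


def mann_whitney_u(group_a, group_b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n_a, n_b = len(group_a), len(group_b)
    # U for group_a: count pairs where a exceeds b; ties count one half.
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u = min(u_a, n_a * n_b - u_a)
    n = n_a + n_b
    # Tie correction: sum of (t^3 - t) over groups of identical values.
    tie_term = sum(t ** 3 - t
                   for t in Counter(list(group_a) + list(group_b)).values())
    variance = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - n_a * n_b / 2.0) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p_value
```

Calling `mann_whitney_u(positive_trait_values, negative_trait_values)` with the multi-fucosylation percentages of the two groups would give a statistic comparable in spirit to the one reported for Fig. 5D; for group sizes as small as those typical of cell line panels, an exact test is generally preferable to this approximation.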
Glycosyltransferase Expression-In order to obtain information on enzyme expression, mRNA levels of 17 glycosyltransferases (GTs) were determined by real-time PCR for a subset of cell lines (all cell lines obtained from the Leiden University Medical Center; Supplemental Table S7). The PCA resulted in five principal components explaining 84.7% (R²Xcum) of the data with a good predictive power of 56.1% (Q²cum). The cell lines clustered per cell type, while technical replicates clustered closely together. Notably, based on the GT gene expression data, the location of the cell lines differs from that in the MS-based data, and cell lines HCT116 and DLD-1 do not cluster distinctly apart from the other cell lines (Supplemental Fig. S7A). In contrast, C10, HCT8, WiDr, and SW1116 seem to be different from the other compared cell lines when the measured GTs are taken into account. The corresponding loading plot (Supplemental Fig. S7B) suggests high branching of N-glycans (MGAT4B, MGAT5) as well as LacNAc repeats (B3GNT3,8) and α2,3-sialylation (ST3GAL3,4) for these latter cell lines, which is in accordance with the high gene expression levels obtained (Supplemental Table S7). Cell lines SW1116 and WiDr, furthermore, have high levels of fucosylation in both the loading plot and the mRNA data. CaCo2 also localizes apart from the majority of cell lines and shows only MGAT4A and partly ST3GAL6 associated in the loading plot (Supplemental Fig. S7B). The mRNA data support this observation since only the MGAT4A and ST3GAL6 genes are highly expressed in CaCo2, while the expression of all other GT genes is below the average of the tested cell lines (Supplemental Table S7).
Correlation analysis between derived glycan traits based on MS data and corresponding GT mRNA levels showed significant correlation for mannosyl-(α1,3-)-glycoprotein-β-1,4-N-acetylglucosaminyltransferase, isozyme A (MGAT4A; R² = 0.363, p value = .008), involved in the synthesis of tri- and multi-antennary glycans, and the relative abundance of the corresponding glycan class (Supplemental Fig. S7C), as well as for fucosyltransferase (FUT) 4, the enzyme responsible for Lewis X epitopes on LacNAc repeats, with the trait multi-fucosylation (R² = 0.2947, p value = .020; Supplemental Fig. S5D). Furthermore, β-galactoside-α2,3-sialyltransferase 4 (ST3GAL4), involved in the production of sialyl Lewis X, showed significant correlation (R² = 0.236, p value = .048) with the relative abundance of α2,3-linked sialic acid containing N-glycans based on MS data (Supplemental Fig. S7E). Overall, the mRNA data showed more overlap for GTs specifically involved in N-glycan biosynthesis than for enzymes contributing to the biosynthesis (elongation/decoration) of several glycan classes, i.e. N-glycans, O-linked glycans, and/or glycosphingolipids. However, FUT8, the enzyme responsible for α1,6-fucosylation of the N-glycan core, showed no correlation with the derived trait mono-fucosylation and instead showed the highest levels for SW48, SW480, SW620, and WiDr. This may indicate that, although MS/MS data suggested mainly core-fucosylated structures, mono-fucosylation is representative not only of core-fucosylation but also of antenna-fucosylation. The discrepancy between GT gene expression and MS data shows that the value of GT expression data for predicting actual glycan structures is rather limited and that a better understanding of the factors influencing protein glycosylation may be needed to improve this prediction.
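To make the reported R² values concrete, the sketch below computes Pearson's correlation coefficient and its square for paired observations, for example the mRNA level of one GT and the matching derived glycan trait across cell lines. It is an illustrative standard-library sketch; the function names are hypothetical, no p value is computed, and it is assumed (not stated in the text) that the reported R² corresponds to a simple linear relationship between the two variables.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Coefficient of determination of a simple linear fit of y on x."""
    return pearson_r(x, y) ** 2
```

An R² of 0.363, as reported for MGAT4A, would mean that about 36% of the variance in the glycan trait is accounted for by a linear dependence on the mRNA level, which helps to put the limited predictive value of GT expression noted above into perspective.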
DISCUSSION
We investigated the N-glycome of 25 CRC cell lines, including 21 cell lines derived from primary tumors and four from metastases, revealing vast glycosylation differences between the cell lines. By applying a PVDF-membrane-based N-glycan release in combination with a new derivatization technique for linkage-specific stabilization of sialic acids, we achieved a sensitive high-throughput MALDI-TOF-MS analysis of N-glycans from small numbers of cells. The method proved to be robust for data acquisition, while the calculated, derived glycan traits showed an even more pronounced robustness than most individual glycans. Moreover, the differentiation between α2,3- and α2,6-linked N-acetylneuraminic acids by linkage-specific derivatization adds major biological relevance to the results as it allows detection of the presence of sialyl Lewis antigens that are highly associated with cancer (40).
Often, physiological conditions are disrupted when cells are cultured in vitro without being in their natural surroundings containing stromal and other cells. The main critical points of cell lines as model systems are therefore the following: (i) Culturing conditions may influence the cellular phenotype. (ii) Tumor heterogeneity is not represented. (iii) The influence of the tumor environment is not given (15). Nevertheless, it was shown that genetic profiles as well as functional markers and morphological features are commonly retained for CRC cell lines (12,41). Also, our study revealed a major overlap in the expression of complex-type N-glycans between CRC cell lines and CRC tissue samples (10,11) as well as the expression of CRC/colon-associated epitopes such as (sialyl) Lewis antigens, the Sda antigen, blood group antigens, and terminal HexNAc, thereby revealing the potential of these CRC cell lines as an N-glycomic model system. Similarly, high correlations between tissues and cell lines were described on the genomic level (17,19,41). Notably, the N-glycan profiles of the majority of analyzed CRC cell lines were dominated by high-mannose-type glycans (Ø 53.0% in our study), which is in accordance with recently reported data by the group of N. Packer (Ø 55.0% high-mannose N-glycans) (22) but which represents a major difference to N-glycans derived from CRC tissues. However, it should be noted that in comparison to control tissues, CRC tissues also exhibit elevated levels of high-mannose-type glycans (10). Similar trends were also observed in breast cancer (42), and human stem cells also show a high abundance of high-mannose-type N-glycans (43). From a biosynthesis point of view, the accumulation of high-mannose-type glycans in the CRC cell lines and tissues suggests an incomplete processing of N-glycans, possibly due to shorter division/replication times (44).
Especially the glycan Man9HexNAc2Glc1 (m/z 2067.69) provides evidence for the presence of intracellular precursors in the analyzed total cell homogenate glycan pools. Nevertheless, lectin-binding data showed strong interaction of several CRC cell lines with ConA (data not shown), suggesting the presence of high-mannose-type glycans. Of note, this lectin is not entirely specific for high-mannose N-glycans and can also bind to hybrid and di-antennary N-glycans, albeit with lower affinities (45). Investigations on the cell surface glycosylation of a subset of cell lines also showed spectra with dominant peaks of high-mannose-type glycans, proving the presence of high-mannose-type glycans on the surface, but at a lower relative abundance than in whole cell homogenates, as was recently also shown for HEK cells (26). This cell surface protocol could not be performed in 96-well plates and involved large amounts of highly active PNGase F to allow short incubation times of 30 min in order to shave the glycans from the cell surface. This makes it less suited to high-throughput use and very costly. Furthermore, 4 million cells are needed for this protocol, which is not achievable for some cell types. Therefore, although it includes intracellular glycans, our method offers a good and reliable alternative for the screening of N-glycosylation of cell lines and is suited for small sample amounts. To address the presence of intracellular precursors, high-mannose-type glycans were excluded for a major part of the trait calculations in order to observe true changes in complex type N-glycans.
In this study, we identified the presence of Sda antigen (NeuAcα2-3[GalNAcβ1-4]Galβ1-4GlcNAc-R) in CaCo2 cells based on MS data, while mRNA data on the corresponding GT, B4GALNACT2, showed elevated levels for WiDr and SW1116 cells and not for CaCo2. The latter result is in contrast to previous findings by Dall'Olio and coworkers, who detected B4GALNACT2 mRNA exclusively expressed in CaCo2 when tested with eight other CRC cell lines (not WiDr and SW1116) (46). They describe a varying activity of this GT and suggest B4GALNACT2 as a marker of colonic cell maturation (46). Furthermore, the group around Dall'Olio showed the inhibition of sialyl Lewis X formation through Sda antigen expression with LS174T colon cancer cells as a model system (47). To our knowledge, there is no literature on the presence of Sda antigen in WiDr and SW1116.
The studies of Packer and coworkers on the N-glycome of membrane proteins from various CRC cell lines as well as CRC tissues using porous graphitized carbon LC-ESI-MS/MS (22,48) showed that the metastatic CRC cell line LIM1215 presents high levels of N-glycans containing bisecting GlcNAc as compared with two nonmetastatic cancer cell lines. We also observed high levels of potentially bisected complex or hybrid type N-glycan structures (Hex = HexNAc) in SW620 (38.4%), a cell line derived from a lymph node metastasis. This is in contrast to the literature, which often describes a role of bisecting GlcNAc in suppressing metastasis (49). Moreover, the expression of MGAT3, the enzyme initiating bisection, was very low in SW620, suggesting that the suspected bisection is more likely to reflect terminal HexNAc on antennae. Also, our data did not show enhanced levels of potential bisection (Hex = HexNAc) exclusively for cell lines of metastatic origin but also in the well-differentiated stage I cell line SW1116 (51.2%), for which the presence of bisecting GlcNAc was confirmed by LC-MS/MS experiments and MGAT3 gene expression was also largely elevated. This is in accordance with another study of Packer and coworkers describing high expression (15%) of bisecting GlcNAc for the cell lines SW1116, but also SW620, and only very low levels for SW480 (22). In line with this, our data showed very low expression levels of MGAT3 for SW480, and no indication of bisection was found in MS/MS. However, the discrepancy between enzyme gene expression and relative abundance of bisection in some of the cell lines reveals the limit of the applied MS-based methods, as it cannot in all cases be sufficiently differentiated whether the additional HexNAc represents a bisecting GlcNAc or a terminal HexNAc.
An additional method may be offered by a novel lectin that recognizes terminal nonreducing GlcNAc residues, also on galactoses of N-glycan antennae, and was shown to stain colon cancer tissues and cell lines (HT29), whereas no or weak staining was observed in healthy tissues (60).
Regarding sialylation, we observed a higher level of α2,3-sialylation and a decreased percentage of α2,6-sialylation in CDX1-positive cells that correlate with cell differentiation. In contrast, Sethi and collaborators reported on the presence of α2,3-sialylation exclusively in the poorly differentiated cell line LIM2405 (48). In general, overall levels of observed sialylation and fucosylation differ between this study and the studies of Packer and coworkers (22,48). Such dissimilarities in the results are not surprising since different methods of glycan extraction and purification as well as different analytical techniques (MALDI-TOF-MS(/MS) versus porous graphitized carbon LC-ESI-MS(/MS)) were applied. In addition, different culture conditions may play a role since it has been shown that protein glycosylation is influenced by cellular, medium, and process effects, which is frequently addressed for the production of glycoproteins in the pharmaceutical industry (51). In our study, we also found differences in the N-glycosylation profile between cell lines cultured in different laboratories, and the impact of various factors on the glycan profile during cell culture needs to be further investigated, not only for pharmaceutical production but also for in vitro experiments with cancer cell lines, in order to prevent misleading interpretation of results.
Interestingly, we observed a correlation between multi-fucosylation and expression of CDX1 and/or villin. CDX1 is an intestine-specific transcription factor associated with differentiation and villin expression (34,39). Investigations on cancer stem cell subgroups in CRC cell lines revealed a higher portion of cancer stem cells in nondifferentiating cell lines such as HCT116, which is accompanied by loss of CDX1 expression and a more aggressive phenotype (50). After forcing CDX1 expression in HCT116 cells, intestinal epithelial differentiation to crypt-forming colonies was observed (50). Furthermore, cell line HCT116 has a mutation in the GDP-mannose-4,6-dehydratase (GMDS) and therefore very low fucose levels and showed an aggressive phenotype, while restoration of the GMDS transcript, and therefore enhanced fucosylation, suppressed tumor formation and metastasis (52). Enhanced fucosylation was further described in early stages of cancer, while lower fucosylation was associated with later stages and cancer progression (53). In accordance, complex fucosylation decreased from colorectal adenomas toward carcinomas (11). In contrast, high antenna fucosylation of TGF-β was associated with poor prognosis and metastasis (54). We further found a correlation between CDX1/villin expression and glycans with terminal HexNAc (HexNAc≥Hex ratio), in the form of terminal HexNAc, bisecting GlcNAc, or LacdiNAc structures, and a trend (not significant, data not shown) toward higher antennarity/LacNAc repeats was observed for CDX1/villin-positive cells. In line with this, Kawasaki et al. (55) identified highly fucosylated polylactosamine-type N-glycans in the CRC cell line SW1116 that are expressed specifically on CD26/dipeptidyl peptidase IV and serve as ligands for mannan-binding protein, a C-type lectin involved in host defense and tumor growth inhibition in CRC cells.
In this study, we found that SW1116, a well-differentiated cell line derived from a primary stage I tumor, is characterized by an overall enhanced presence of polylactosamine repeats as well as high levels of multi-fucosylation. Supporting the finding of Kawasaki et al., Chen et al. (56) reported that expression of β-1,4-galactosyltransferase 3, involved in the synthesis of (poly)lactosamine epitopes, significantly suppresses β1-integrin-mediated cell migration and invasion. For this subset of CRC cell lines, our findings in combination with other reports in the literature support the idea that multi-fucosylation, and possibly also (poly)lactosamine repeats, are characteristics of differentiated cell lines expressing CDX1 and villin and exhibiting a less invasive, less aggressive phenotype. Notably, SW48 cells behave as outliers and show a more CDX1/villin-positive N-glycan phenotype. Clearly, further glycomic studies on more CRC cell lines as well as CDX1-knockdown cell lines and cells with forced CDX1 expression are needed to evaluate whether CDX1 acts as an inducer of the associated glycomic profiles, and assays on aggressiveness/invasiveness of corresponding cell lines are required to show the biological relevance.
The analysis of the CRC cell lines with regard to their N-glycome provided new insight into their glycosylation features, knowledge of which lags behind that of their genetics, and helps to evaluate cell lines as glycobiological model systems. However, in order to choose an optimal model system, a larger-scale study comparing cell line glycosylation with tissue-derived glycan profiles is needed. Also, we observed major differences between the cell lines with regard to their N-glycosylation, demonstrating that the study of a single CRC cell line model can lead to wrong generalizations of glycobiological findings. Moreover, the similarity with regard to other glycan classes needs to be further studied, and the first results on O-glycosylation (22) showed major differences between CRC cell lines and tumor tissues. Therefore, cell lines need to be well characterized in all aspects, and assumptions and interpretations should refer only to the cell phenotype that has been studied. The high heterogeneity of cell line glycosylation can complicate the interpretation and comparison of results but also offers a major advantage. One of the main criticisms of cell lines as model systems is that they lack the ability to represent tumor heterogeneity. Since tumor heterogeneity can hardly be displayed in one single cell line, combining different cell lines in vitro might help to mimic the tumor. Collecting detailed glycomic data on tumor subpopulations in a database can facilitate this approach by matching the characteristics to available cell line glycomic data and thereby improve the potential of cell lines as model systems.