N-linked Glycosylation Enrichment for In-depth Cell Surface Proteomics of Diffuse Large B-cell Lymphoma Subtypes

Global analysis of lymphoma genome integrity and transcriptomes has tremendously advanced our understanding of their biology. Technological advances in mass spectrometry-based proteomics promise to complete the picture by allowing the global quantification of proteins and their post-translational modifications. Here we use N-glyco FASP, a recently developed mass spectrometric approach based on lectin enrichment, in conjunction with a super-SILAC approach to quantify N-linked glycoproteins in lymphoma cells. From patient-derived diffuse large B-cell lymphoma cell lines, we mapped 2383 glycosites on 1321 protein groups, which were highly enriched for cell membrane proteins. This N-glyco subproteome alone allowed the segregation of the ABC from the GCB subtype of diffuse large B-cell lymphoma, which had been considered a single disease entity before gene expression studies. Encouragingly, many of the glycopeptides driving the segregation belong to proteins previously characterized as segregators in a deep proteome study of these subtypes (S. J. Deeb et al. MCP 2012 PMID 22442255). This is consistent with the high correlation that we observed between the expression levels of the glycosites and their corresponding proteins. Detailed examination of glycosite and glycoprotein expression levels uncovered, among other findings, enrichment of transcription factor binding motifs, including known NF-κB-related ones. Thus, enrichment of a class of post-translationally modified peptides can classify cancer types as well as reveal cancer-specific mechanistic changes.

Diffuse large B-cell lymphoma (DLBCL) is the most frequent subtype of malignant lymphoma and is clinically heterogeneous (1). The molecular characterization of DLBCL based on gene expression profiling led, for the first time, to the identification of distinct DLBCL entities with significant differences in their pathogenesis, response to conventional treatment and clinical outcome (2). In fact, gene expression signatures related these subtypes to distinct stages of B-cell development. Germinal-center B-cell-like DLBCL (GCB) has a gene expression signature characteristic of germinal center B cells and a more favorable outcome than the activated B-cell-like DLBCL (ABC) subtype, whose gene expression signature is characteristic of B cells activated through their B-cell receptor (2). We have previously demonstrated that these subtypes can be segregated based on their in-depth protein expression profiles in a cell line model derived from patients (3). Diagnosis in this system is particularly challenging because the two subtypes studied are histologically indistinguishable but can be differentiated by gene expression profiling (2).
The cell surface proteome of B cells mediates their interactions with the surrounding environment and is particularly important in determining their fate. The B-cell receptor, for instance, is the key functional player on the surface of B cells, responsible for their development, peripheral maintenance and antigen-specific functional response. Other cell surface proteins such as ICAM-1 (CD54) mediate the binding of B cells to other cell types. Furthermore, CD40 and CD80 bind to T-cell proteins (CD40L and CD28, respectively) and mediate costimulatory signals required for B-cell (and T-cell) activation. The large repertoire of B-cell surface proteins and the complex regulation of B-cell activation make the B-cell surface an attractive niche in which to explore tumorigenic differences.
In classic approaches such as flow cytometry, antibodies directed against known proteins are commonly employed to phenotype cells of different origin. This technology requires highly specific antibodies and allows multiplexing of up to 18 to 36 differentiation markers at a time (4). However, classifying closely related tumors derived from the same cell type, when it is not known which proteins are expressed on the cell surface and at what levels, is a more complex problem that first requires an unbiased, quantitative, in-depth analysis of membrane proteins. Because glycosylation is a hallmark of membrane proteins, we investigated whether enriching for glycosylated peptides could serve as a handle to explore the cell surface proteome. In addition, we asked whether closely related tumor subtypes such as the different DLBCLs can be classified by mass spectrometry (MS)-based proteomics on the basis of PTM-bearing peptides.
The cell surface proteome has been investigated by different approaches. One early method, optimized for the global analysis of both membrane and soluble proteins, used high pH, which favors the formation of membrane sheets, together with proteinase K, which nonspecifically cleaves the exposed hydrophilic domains of membrane proteins (5). More recent methods targeting the cell surface are based on capturing and covalently labeling glycan moieties on cell surface proteins. Using such an approach, a study applying the cell surface capture (CSC) technology, which covalently labels extracellular glycan moieties on live cells, identified 104 proteins in Jurkat T cells, 96 proteins in an experiment comparing Jurkat T cells and Ramos B cells, and 341 proteins in an experiment detecting cell surface changes during differentiation of embryonic stem cells (6). Using the same technology, the combined analysis of 19 B-cell precursor acute lymphoblastic leukemia (BCP-ALL) cases resulted in the identification of 713 cell surface proteins (7).
As glycosylation is increasingly recognized as one of the key post-translational modifications involved in tumorigenesis, with the potential for defining biomarkers, several glycoproteomic studies have examined different cancer entities (8). In some of these studies, the primary focus was to specifically capture cell surface and membrane N-glycoproteins using hydrazide chemistry or lectin affinity approaches. Membrane N-glycoproteins were investigated in colon carcinoma (9), thyroid cancer (10), and breast cancer (11). A more recent study in breast cell lines distinguished normal, benign and cancerous cells, as well as luminal from basal breast cancer cells, based on their glycoprotein profiles (12). Our laboratory has previously described an extension of the filter-aided sample preparation (FASP) method (13) in which lectins placed on top of a filter selectively retain and enrich glycosylated peptides (14). This approach, termed N-glyco FASP, allows the characterization of thousands of glycosylation sites in complex biological samples such as cell lines, tissues and body fluids in evolutionarily diverse species (15). For quantification, this method can also be combined with SILAC (14). In particular, for comparing a large number of unlabeled samples, the super-SILAC approach can be employed (16). It allows precise quantitative comparison of many samples, whether cell lines or tissues, by spiking the same SILAC-labeled standard into each of them (16,17). The standard is generated so that it encompasses as many proteins of the system in question as possible. For that purpose, we had previously selected six lymphoma cell lines for a lymphoma super-SILAC mix based on their maximally distinct protein expression profiles (3).
Here we took advantage of the depth of the N-glyco FASP method and the quantitative accuracy of the super-SILAC approach to explore their applicability to the characterization and classification of DLBCL patients. Segregating these lymphomas based on their quantified glycoproteomes would classify closely related cancer subtypes on the basis of their pattern of post-translational modifications (PTMs), a long-standing aim of clinical proteomics. Furthermore, differences between DLBCL subtypes in glycosylation patterns or in the expression levels of cell surface glycoproteins may reflect tumor-associated hallmarks or characteristics of the stage of B-cell development from which these cells are derived. Characterizing tumors at the protein and PTM level therefore has the potential to increase our understanding of tumor biology. In particular, the segregating signatures of closely related tumor subtypes could shed light on the biology of the developmental stage from which the tumors are derived.
Cell lines from which we generated the super-SILAC mix were labeled with heavy amino acids by growing them in RPMI medium containing ¹³C₆¹⁵N₂-lysine (Lys8) and ¹³C₆¹⁵N₄-arginine (Arg10) (Cambridge Isotope Laboratories, Andover, MA) instead of the natural amino acids, supplemented with 20% dialyzed fetal bovine serum. We used quantitative mass spectrometry to assess the level of incorporation of the heavy amino acids after at least six passages. Almost complete incorporation was achieved in the six cell lines from which we generated the super-SILAC mix (Ramos, Mutu, BL-41, U2932, L428, DB): less than 1% of tryptic peptides contained unlabeled arginine or lysine, and less than 0.3% of identified peptides showed evidence of Arg-to-Pro conversion (3). The super-SILAC mix was generated by mixing equal amounts of the heavy lysates from the six cell lines.
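The incorporation check described above can be illustrated with a short sketch. It assumes, hypothetically, that each peptide from a heavy-only lysate is summarized as a pair of heavy and light intensities, so that any light signal reflects unincorporated natural lysine or arginine; the function names and the 1% cutoff as a per-peptide light fraction are illustrative choices, not the procedure used in this study.

```python
# Sketch: checking heavy-label incorporation from SILAC peptide intensities.
# Assumption: each peptide is a hypothetical (heavy_intensity, light_intensity)
# pair measured in a heavy-only lysate, so any light signal reflects natural
# (unincorporated) Lys/Arg. Values below are invented for illustration.
from typing import List, Tuple

Peptide = Tuple[float, float]  # (heavy intensity, light intensity)


def incorporation(heavy: float, light: float) -> float:
    """Fraction of a peptide's signal that carries the heavy label."""
    total = heavy + light
    if total <= 0:
        raise ValueError("peptide has no signal")
    return heavy / total


def fraction_flagged(peptides: List[Peptide], max_light: float = 0.01) -> float:
    """Share of peptides whose light (unlabeled) fraction exceeds max_light."""
    flagged = [p for p in peptides if 1.0 - incorporation(*p) > max_light]
    return len(flagged) / len(peptides)


if __name__ == "__main__":
    peptides = [(1.0e7, 2.0e4), (5.0e6, 1.0e3), (8.0e6, 4.0e5), (2.0e6, 0.0)]
    for heavy, light in peptides:
        print(f"incorporation = {incorporation(heavy, light):.4f}")
    print(f"peptides with >1% light signal: {fraction_flagged(peptides):.0%}")
```

With these invented values only the third peptide carries more than 1% light signal, so a quarter of the peptides would be flagged.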
Protein Digestion and N-glyco Peptide Enrichment-Equal amounts of the super-SILAC mix and the unlabeled cell lysates (300 µg) were mixed on a 30 kDa filter (Millipore, Billerica, MA) and further processed by the filter-aided sample preparation (FASP) method (13). Briefly, the SDS-containing lysis buffer was replaced with a urea buffer, followed by alkylation with iodoacetamide. The samples were then digested overnight with trypsin at 37 °C in 50 mM ammonium bicarbonate, and peptides were eluted with water (2×).
For N-glycopeptide enrichment, tryptic peptides were transferred to a new filtration unit, mixed with a mixture of lectins (ConA, WGA and RCA120) and incubated for 60 min. Concanavalin A (ConA) binds mannose; wheat germ agglutinin (WGA) binds sialic acid as well as N-acetylglucosamine; and agglutinin RCA120 binds galactose modified at the 3-O position as well as terminal galactose. Washing the samples with a buffer composed of 20 mM Tris/HCl pH 7.6, 1 mM MnCl₂, 1 mM CaCl₂ and 0.5 M NaCl eluted the unbound peptides, whereas the captured glycopeptides remained on the filter. The captured peptides were treated with PNGase F in H₂¹⁸O, which leaves a characteristic mass shift on the previously glycosylated site (18). This was followed by elution and measurement of the deglycosylated peptides (14).
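The characteristic mass shift left by PNGase F can be worked out from monoisotopic masses. When PNGase F removes the glycan, the asparagine is converted to aspartate (loss of NH, gain of O); in H₂¹⁸O the incoming oxygen is ¹⁸O, which adds roughly 2 Da on top of ordinary deamidation. The short calculation below uses standard isotope masses and is meant only as an arithmetic illustration.

```python
# Worked arithmetic: mass shift at a formerly N-glycosylated Asn after PNGase F
# treatment in H2(18)O. Asn -> Asp replaces an NH group with an O atom; using
# 18O instead of 16O adds the mass difference between the two oxygen isotopes.

# Monoisotopic masses (Da) of the isotopes involved.
H1 = 1.00782503
N14 = 14.00307401
O16 = 15.99491462
O18 = 17.99915961

deamidation = O16 - (N14 + H1)       # ordinary deamidation, about +0.984 Da
heavy_oxygen_extra = O18 - O16       # 18O label, about +2.004 Da
shift_18o = deamidation + heavy_oxygen_extra  # about +2.988 Da in H2(18)O

print(f"deamidation with 16O: {deamidation:+.4f} Da")
print(f"deamidation with 18O: {shift_18o:+.4f} Da")
```

The roughly 3 Da shift distinguishes PNGase F-catalyzed deamidation from spontaneous deamidation, which incorporates ¹⁶O from ordinary water and shifts the mass by only about 0.984 Da.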
LC-MS/MS Analysis-Deglycosylated peptides were separated by nanoflow HPLC (Proxeon Biosystems, now Thermo Fisher Scientific) coupled on-line to an LTQ Orbitrap Velos mass spectrometer (Thermo Fisher Scientific) via a nanoelectrospray ion source (Proxeon Biosystems). Peptides were loaded at a flow rate of 500 nl/min onto a C18 reversed-phase column (20 cm long, 75 µm inner diameter). The column was packed in-house with ReproSil-Pur C18-AQ 1.8 µm resin (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) in buffer A (0.5% acetic acid). Peptides were eluted with a linear gradient of 8–30% buffer B (80% acetonitrile and 0.5% acetic acid) at a flow rate of 200 nl/min over 145 min, followed by 20 min from 30 to 60% buffer B. After each gradient, the column was washed with up to 90% buffer B and re-equilibrated with buffer A. Data were acquired using a data-dependent "top 10" method, dynamically choosing the 10 most abundant precursor ions from the survey scan (mass range 300–1800 Th), isolating them in the LTQ and fragmenting them by higher energy collisional dissociation (HCD) (19). Full scan MS spectra were acquired at a resolution of 30,000 at m/z 400 with a target value of 1,000,000 ions. The ten most intense ions were sequentially isolated and accumulated to a target value of 40,000 with a maximum injection time of 150 ms. The lower threshold for targeting a precursor ion in the MS scans was 5000 counts. Fragmentation spectra were acquired in the Orbitrap analyzer at a resolution of 7500 at m/z 400.
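The "top 10" precursor selection can be sketched as follows, using the parameter values given above (10 precursors, a 5000-count threshold, m/z 300 to 1800). This is a simplified illustration only: real instrument control software also applies charge-state screening, dynamic exclusion and other rules that are omitted here, and the function name is hypothetical.

```python
# Simplified sketch of "top N" data-dependent precursor selection.
# Only intensity ranking, an intensity threshold and an m/z window are modeled;
# charge-state screening and dynamic exclusion are deliberately left out.
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity in counts)


def select_precursors(survey: List[Peak], top_n: int = 10,
                      min_intensity: float = 5000.0,
                      mz_range: Tuple[float, float] = (300.0, 1800.0)) -> List[Peak]:
    """Return up to top_n eligible peaks, most intense first."""
    lo, hi = mz_range
    eligible = [p for p in survey if lo <= p[0] <= hi and p[1] >= min_intensity]
    # Most intense first: this is the order in which they would be fragmented.
    eligible.sort(key=lambda p: p[1], reverse=True)
    return eligible[:top_n]


if __name__ == "__main__":
    # Hypothetical survey scan peaks, not data from this study.
    survey = [(250.0, 1e6), (500.0, 4000.0), (600.0, 2e4), (700.0, 3e5), (1900.0, 9e5)]
    print(select_precursors(survey))  # peaks outside m/z 300-1800 or below 5000 counts are skipped
```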
Data Analysis-MaxQuant software (version 1.2.6.20) was used to analyze the mass spectrometric raw data. We searched the MS/MS spectra against the Uniprot database (81,213 entries, release 2012_07) with the Andromeda search engine incorporated in the MaxQuant framework (20,21). Cysteine carbamidomethylation was set as a fixed modification, and N-terminal acetylation, methionine oxidation and deamidation in H₂¹⁸O were set as variable modifications. A false discovery rate (FDR) of 0.01 was required for proteins and peptides. Enzyme specificity was set to trypsin, allowing cleavage N-terminal to proline. A minimum of seven amino acids per identified peptide was required, and two miscleavages were allowed. The initial allowed mass deviation was up to 6 ppm for the precursor ion and up to 20 ppm for the fragment masses. Mass accuracy of the precursor ions was improved by the time-dependent recalibration algorithms of MaxQuant. The "match between runs" option was enabled to match identifications across replicates. SILAC pairs were quantified by MaxQuant with standard settings and a minimum ratio count of two. We analyzed the MaxQuant output data with the Perseus tools, which are also available in the MaxQuant environment.
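The 1% FDR threshold is conventionally estimated with a target-decoy strategy: spectra are searched against the real database and a reversed "decoy" version, and the FDR at a score cutoff is estimated as the number of decoy hits divided by the number of target hits above it. The sketch below shows only this generic idea; it is not MaxQuant's implementation, which includes further refinements, and the function names are hypothetical.

```python
# Generic target-decoy FDR sketch for peptide-spectrum matches (PSMs).
# Not MaxQuant's algorithm: it only illustrates estimating FDR as
# (decoy hits) / (target hits) above a score cutoff.
from typing import List, Tuple

PSM = Tuple[float, bool]  # (search engine score, is_decoy)


def fdr_score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> float:
    """Lowest score at which the estimated FDR is still <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = float("inf")  # accept nothing if no cutoff satisfies max_fdr
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # deepest point in the ranked list meeting the FDR
    return cutoff


def accepted_targets(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Target PSMs scoring at or above the FDR-controlled cutoff."""
    cutoff = fdr_score_cutoff(psms, max_fdr)
    return [p for p in psms if not p[1] and p[0] >= cutoff]
```

In practice, decoy hits are reported with the results only so that the error rate can be estimated; the identifications that are used further are the target hits above the cutoff.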

RESULTS AND DISCUSSION
Defining the Quantitative N-glycoproteome of Diffuse Large B-cell Lymphoma Cell Lines-We selected five ABC-DLBCL (HBL1, OciLy3, RIVA, TMD8, U2932) and five GCB-DLBCL (BJAB, DB, HT, SUDHL-4, SUDHL-6) cell lines derived from lymphoma patients. Our previous study had shown that these five ABC and five GCB cell lines segregate very clearly by principal component analysis of their global protein expression profiles (3). In that study we reached a depth of 7756 identified proteins, which allowed the extraction of a signature of 55 proteins that strongly distinguishes between these cancer subtypes. This finding confirms that these cell lines are good representatives of ABC and GCB lymphomas and therefore attractive models to investigate whether closely related tumor subtypes can be characterized by a quantitative PTM-based approach.
Enriching for glycoproteins would provide a handle on cell surface proteins, which may be especially informative for classification and biomarker discovery. For unbiased large-scale enrichment, we used the FASP-based N-linked glycopeptide capture method (N-glyco FASP) (Fig. 1 and "Experimental Procedures"). Briefly, the FASP-eluted peptides were mixed with a mixture of lectins aimed at capturing the three N-glycan classes (high mannose, complex and hybrid), and the glycopeptides were retained on a 30 kDa filter. The large size of the glycopeptide-lectin complexes ensures their retention while the nonglycosylated peptides are washed away. This method was shown to be efficient and unbiased in mapping the N-glycoproteome of mouse tissues and blood plasma (14) as well as in nonmammalian systems (15). Subsequently, deglycosylation of the captured N-glycopeptides was performed in H₂¹⁸O. For an accurate quantitative comparison of the expression profiles of glycosites between the ABC-DLBCL and GCB-DLBCL subtypes, we used a heavy-labeled super-SILAC mix of six lymphoma cell lines (3). The pooled lysates constituting the super-SILAC mix were spiked into each of the samples (five ABC-DLBCL and five GCB-DLBCL cell lines) in a 1:1 ratio before the first step of the glyco-enrichment experiment (Fig. 1). The resulting 10 samples were measured in quadruplicate with 165 min gradients, for a total measuring time of less than 6 days.
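Because every sample is measured against the same heavy standard, two unlabeled samples can be compared through their ratios to that standard: (L_A/H) / (L_B/H) = L_A/L_B, so the standard cancels out. The minimal sketch below illustrates this "ratio of ratios" with hypothetical values and function names.

```python
# Sketch: comparing two unlabeled samples through a shared super-SILAC standard.
# Each glycosite is quantified as a light/heavy (L/H) ratio against the same
# heavy spike-in, so dividing the two ratios cancels the standard.
import math


def ratio_of_ratios(ratio_a_to_std: float, ratio_b_to_std: float) -> float:
    """Relative abundance of sample A versus sample B for one glycosite."""
    return ratio_a_to_std / ratio_b_to_std


def log2_fold_change(ratio_a_to_std: float, ratio_b_to_std: float) -> float:
    """log2(A/B); positive values mean higher abundance in sample A."""
    return math.log2(ratio_of_ratios(ratio_a_to_std, ratio_b_to_std))


if __name__ == "__main__":
    # Hypothetical L/H ratios of one glycosite in an ABC and a GCB cell line.
    print(log2_fold_change(2.0, 0.5))  # 4-fold higher in sample A, i.e. log2 = 2
```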
Analysis using stringent filtering criteria in the MaxQuant software environment (20) resulted in the identification of 2383 glycosites, which mapped to 1321 protein groups (supplemental Tables S1 and S2). The median Andromeda identification score of deglycosylated peptides was 127, and the average localization probability of the glycosylation site to a single amino acid was 93%. Next, we filtered for sites with a localization probability greater than 0.75 (class I sites) and a score difference greater than 5 to the next best matching peptide in Andromeda. (Omitting the second filtering step would have added only two glycosites, on beta-1,4-galactosyltransferase (score difference 4.7) and hypoxia up-regulated protein 1 (score difference 4.9).) This analysis resulted in 2064 very high confidence sites mapped with single amino acid resolution to 1304 protein groups, with an average localization probability of 99.4%, and only these were used in further analysis. Almost all of the high confidence sites were also quantified with at least two valid ratios (1967 of 2064).
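The two-step site filter described above reduces to a simple predicate. In this sketch each site is a dictionary with the hypothetical keys `localization_prob` and `score_diff` (the Andromeda score difference to the next best matching peptide). These field names are assumptions for illustration and are not MaxQuant column names.

```python
# Sketch of the two-step glycosite filter: class I localization (> 0.75)
# and an Andromeda score difference > 5 to the next best peptide.
# The dictionary keys are hypothetical, not MaxQuant output column names.
from typing import Dict, List


def high_confidence_sites(sites: List[Dict], min_loc_prob: float = 0.75,
                          min_score_diff: float = 5.0) -> List[Dict]:
    return [s for s in sites
            if s["localization_prob"] > min_loc_prob      # class I site
            and s["score_diff"] > min_score_diff]         # unambiguous match


if __name__ == "__main__":
    sites = [
        {"id": "a", "localization_prob": 0.99, "score_diff": 12.0},
        {"id": "b", "localization_prob": 0.98, "score_diff": 4.7},  # fails step 2
        {"id": "c", "localization_prob": 0.60, "score_diff": 20.0},  # fails step 1
    ]
    print([s["id"] for s in high_confidence_sites(sites)])
```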
In each of the cell lines we identified and quantified 1374 sites on average ( Fig. 2A). There was excellent overlap between the cell lines as 913 sites were quantified across all 10 cell lines (Fig. 2B). From the ratios of the individual samples to the super-SILAC mix we calculated the Pearson correlation coefficients between the measurements. Without exception, quadruplicates co-clustered in a very tight manner (see color code of correlation coefficients in Fig. 2C), demonstrating reproducibility and precision of our quantitative strategy.
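The replicate comparison rests on the Pearson correlation between the super-SILAC ratios of two measurements. The sketch below computes it for two replicates whose ratios are listed for the same sites in the same order. Log2-transforming the ratios first is an assumption here, a common choice for ratio data that the text does not state explicitly.

```python
# Sketch: Pearson correlation between two replicate measurements.
# Assumption: ratios are log2-transformed before correlating and are given
# for the same sites in the same order, with no missing values.
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(ratios_1: Sequence[float], ratios_2: Sequence[float]) -> float:
    """Pearson r of log2 super-SILAC ratios from two replicates."""
    return pearson([math.log2(r) for r in ratios_1], [math.log2(r) for r in ratios_2])
```

Computing this value for every pair of the 40 measurements gives the correlation matrix on which the replicates are clustered.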
General Characteristics of the DLBCL-cell N-glycoproteome-Almost 96% (1973 sites) of the glycosites we identified match the canonical N-X-[S/T] motif, where X is any amino acid except proline. The 91 sites that did not match show enrichment for cysteine (p = 2.3E-11) in the S/T position, which confirms our previous observations (14) (Fig. 3A). Matching our data set to the Uniprot database (release 2012_07) shows that almost 82% (1687 sites) of the glycosites that we identified are annotated as glycosylated. However, for 1038 of these sites the annotation is based on prediction or similarity, and our results therefore experimentally validate them as N-glycosylation sites (Fig. 3B). To our knowledge, this dataset constitutes by far the largest human B-cell lymphoma N-glycoproteome reported to date and adds substantially to the human database of N-glycosylation sites. Among the 1304 protein groups to which the glycosites are mapped, 923 protein groups were identified with a single glycosylation site. Very few protein groups (24) were identified with more than seven sites (Fig. 3C), and the maximum number of N-glycosylation sites, 19, was measured on alpha-2-macroglobulin receptor. Insulin-like growth factor 2 receptor was identified with 13. Other proteins of regulatory interest also turned out to be heavily glycosylated; for instance, lymphocyte antigen 75 (DEC205, CD205), receptor-type tyrosine-protein phosphatase eta (PTPRJ) and lysosome-associated membrane glycoprotein 1 (CD107a, LAMP-1) each have more than seven sites. The latter protein also highlights that, in addition to the cell surface proteome, our approach enriches intracellular N-glycoproteins. For 1082 sites (52%) the annotated topological domain was extracellular, and for 225 sites (11%) it was lumenal. Glycosylation on lumenal domains occurs, for instance, on lysosomal or ER proteins (supplemental Fig. S1).
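Checking whether a site falls in the canonical sequon can be sketched with a regular expression. The sketch assumes plain one-letter protein sequences and reports 1-based positions of asparagines that start an N-X-[S/T] motif with X not proline, optionally also allowing cysteine in the third position as in the non-canonical sites noted above. Only the motif is tested; whether a site is actually occupied has to come from the data.

```python
# Sketch: finding N-glycosylation sequons (N-X-S/T, X != P) in a sequence.
# A lookahead is used so that overlapping sequons (e.g. "NNST") are all found.
import re
from typing import List

CANONICAL = re.compile(r"(?=N[^P][ST])")
WITH_CYS = re.compile(r"(?=N[^P][STC])")  # also N-X-C, the non-canonical variant


def sequon_positions(seq: str, include_cys: bool = False) -> List[int]:
    """1-based positions of Asn residues that start a sequon."""
    pattern = WITH_CYS if include_cys else CANONICAL
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    seq = "MNGSAPNPTKNAC"  # hypothetical sequence
    print(sequon_positions(seq))                    # NPT is excluded (X = P)
    print(sequon_positions(seq, include_cys=True))  # adds the N-A-C site
```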
The Proteome Versus the N-glycoproteome-We next compared our data set of N-glycosylated peptides and proteins to our previously measured in-depth proteome of the same cell lines (3). That proteome contained 7756 protein groups, 517 of which also occur among the 1304 protein groups of the N-glycoproteome (Fig. 4A). Strikingly, 787 proteins were identified exclusively by their N-glycosylated peptides, attesting to the enrichment capacity of the workflow. In eukaryotes, N-linked glycosylation occurs on secreted or membrane-bound proteins, which are often of low abundance and therefore difficult to detect in highly complex samples such as total cellular lysates. Consistent with this, the intensities of deglycosylated peptides from proteins identified only in the N-glyco experiment are shifted toward lower values compared with those whose corresponding protein was identified in the in-depth proteome experiment (blue versus red bars in Fig. 4B).
Having extracted a large set of glycoproteins at high sensitivity, we explored which subsets of proteins are enriched in the N-glycoproteome. To obtain an overview of the cellular localizations and molecular functions of the identified glycoproteins, we analyzed the 1304 protein groups using Uniprot keywords. The keywords with the highest coverage were "glycoprotein" (89.1%), "membrane" (75.8%), "polymorphism" (71.4%), and "transmembrane" (70.4%) (supplemental Table S3). Compared with the proteome, the two most enriched keywords are "glycoprotein" and "signal". Cell membrane proteins and proteins associated with the lysosome, Golgi apparatus and endoplasmic reticulum (ER) were also highly enriched in the glycoproteome compared with the proteome (p < 1.5E-08). Interestingly, the extracellular matrix (ECM), a category that was difficult to capture without N-glyco enrichment, is well represented in the N-glycoproteome (p < 3.5E-07) (Fig. 4C). Although these are suspension cells, the N-glyco FASP method captures this set of proteins because proteins destined for secretion are glycosylated as they pass through the ER and Golgi along the classical secretory pathway. The ECM proteins are highly enriched in proteins involved in cancer pathways (FDR 3.5E-05, p < 2E-07), including Wnt signaling (FDR 0.00085, p < 1.5E-05) and Hedgehog signaling (FDR 0.0022, p < 6.5E-05).
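The text reports enrichment p-values without naming the test. A one-sided Fisher's exact test, equivalent to the upper tail of the hypergeometric distribution, is a common choice for this kind of category enrichment, and the sketch below assumes it for illustration. Here N is the number of proteins in the background, K the number of background proteins carrying the keyword, n the size of the tested subset and k the number of subset proteins carrying the keyword.

```python
# Sketch: one-sided Fisher's exact test (hypergeometric upper tail) for
# keyword enrichment. The choice of test is an assumption for illustration.
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated proteins.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total


if __name__ == "__main__":
    # Toy example: all 5 drawn proteins carry a keyword held by 5 of 10 proteins.
    print(enrichment_p_value(10, 5, 5, 5))  # 1/252
```

Exact integer arithmetic via `math.comb` avoids overflow for proteome-sized counts; for very many categories, the resulting p-values would additionally need multiple-testing correction, as reflected by the FDR values quoted above.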
Extracellular matrix proteins are intensely studied in cancer progression because of their role in cellular functions such as adhesion, cell shape, migration, proliferation, polarity, differentiation and apoptosis. The enrichment of this class of proteins via the N-glycoproteome allows comparative analysis of different modes by which cancer cells can manipulate their environment. Molecular function categories which are overrepresented in the glycoproteome include receptors, secreted proteins, cell adhesion proteins, glycosyl transferases and metalloproteases (Fig. 4C). These functions are characteristic for glycoproteins and correlate with their extracellular or lumenal location (supplemental Fig. S1).
To investigate how the abundance of glycosites compares to that of proteins, we used the proteome dataset as a reference in which no enrichment was performed and quantification was based solely on unmodified peptides. Using a common super-SILAC mix in both the proteome and glycoproteome measurements allows normalization of technical variance within each experiment and comparative analysis of proteome versus N-glycoproteome measurements. We analyzed the 1203 glycosites belonging to the proteins that matched between both experiments after filtering for at least two valid values in each set of proteome or glycoproteome measurements. Glycosite ratios correlate well with protein ratios (Pearson r = 0.78 on average) (Fig. 4D). This indicates that in the DLBCL cell lines, changes in N-glycosylation of a protein usually reflect changes in protein abundance, as expected for a largely cotranslational and stable modification such as N-glycosylation. This need not hold for every site, and the exceptions highlight cases in which glycosylation levels and protein expression levels are differentially regulated. To reveal additional biological differences between the subtypes and to simplify the analysis, we treated the GCB cell lines and the ABC cell lines as one entity each and used the median expression level of glycosites and proteins across them. We calculated the log of glycosite-to-protein ratios and found only a few outliers (supplemental Fig. S2). The two strongest outliers, hyperglycosylated in the GCB subtype, are glycosites on HLA proteins (HLA-A and HLA-E). Glycosylation of MHC class Ia, for instance, is required for recognition by allogeneic cytotoxic T lymphocytes and for mediating cytolysis (22). Other interesting hits, hyperglycosylated in the ABC subtype, are three glycosites on ENTPD1 (CD39). CD39 was first described as a B-lymphocyte activation marker (23).
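The glycosite-to-protein comparison can be sketched as follows. The sketch assumes that, for one glycosite, super-SILAC ratios are available per cell line from both the glycoproteome and the proteome experiment. Following the text, the median over the cell lines of a subtype is taken, and log2 of the glycosite-to-protein ratio is positive when the site is more abundant than its protein would predict. The `subtype_difference` helper, which contrasts GCB with ABC, is a hypothetical illustration of how outliers might be ranked and is not taken from the text.

```python
# Sketch: log2 glycosite-to-protein ratio per subtype, using medians over the
# cell lines of each subtype. All inputs are hypothetical super-SILAC ratios.
import math
from statistics import median
from typing import Dict, List


def log2_site_to_protein(site_ratios: List[float], protein_ratios: List[float]) -> float:
    """> 0: site more abundant than its protein suggests (hyperglycosylated)."""
    return math.log2(median(site_ratios) / median(protein_ratios))


def subtype_difference(site: Dict[str, List[float]],
                       protein: Dict[str, List[float]]) -> float:
    """Hypothetical helper: GCB minus ABC log2 glycosite/protein ratio."""
    gcb = log2_site_to_protein(site["GCB"], protein["GCB"])
    abc = log2_site_to_protein(site["ABC"], protein["ABC"])
    return gcb - abc
```

Using medians rather than means keeps a single unusual cell line from dominating the subtype-level value.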
It is a prototypic member of the ecto-nucleoside triphosphate diphosphohydrolase (E-NTPDase) family, which hydrolyzes extracellular nucleoside diphosphates and triphosphates. The biological actions of CD39 are a consequence of this activity on extracellular nucleotides (24). N-linked oligosaccharides have been shown to affect the enzymatic activity of CD39 (25), whose role in B lymphocytes is not yet clear but may include contributing to the affinity maturation of antibody responses and facilitating post-germinal center terminal B-cell differentiation (24).
Segregation of DLBCL Subtypes Based on Glycopeptide Signatures-Principal component analysis (PCA) converts a large number of data points, which in our case are the SILAC ratios of the glycosites, into a small set of uncorrelated variables, the principal components. Applying this classical statistical method to the glycoproteome correctly segregated the cell lines into their corresponding subtypes on the basis of component 1 alone (Fig. 5A), the component that accounts for the largest variability in the system (18% in this case). The deglycosylated peptides most strongly driving the segregation (the "loadings") belonged to CD44, IL4I1, CD205, PLBD1, SIRPA, LTBP1, MILR1, MME, and CD27 (Fig. 5B). Reassuringly, several of these proteins were among the strongest drivers of segregation between ABC-DLBCL and GCB-DLBCL in our previous global proteome study. These included CD27, which is more abundant in GCB-DLBCL, and CD44, CD205 and IL4I1, which are more abundant in ABC-DLBCL. The fact that both our global proteome and glycoproteome studies yielded these candidates makes them more likely to be true markers of segregation.
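The principle of PCA can be sketched in a few lines. This sketch assumes a complete samples-by-glycosites matrix of log-transformed SILAC ratios; real data require filtering or imputation of missing values. It finds the first component by power iteration on the covariance matrix, a simple stand-in for the SVD-based PCA of tools such as Perseus, and returns the loadings, the sample scores and the fraction of variance explained.

```python
# Sketch: first principal component by power iteration on the covariance matrix.
# Assumes a complete samples x glycosites matrix (no missing values).
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center(X: Matrix) -> Matrix:
    """Subtract each column mean so every glycosite is centered at zero."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def covariance(Xc: Matrix) -> Matrix:
    """Sample covariance matrix (glycosites x glycosites) of a centered matrix."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_component(X: Matrix, iters: int = 500,
                    seed: int = 0) -> Tuple[List[float], List[float], float]:
    """Return (loadings, sample scores, fraction of variance on PC1)."""
    Xc = center(X)
    C = covariance(Xc)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in C]  # arbitrary nonzero start vector
    for _ in range(iters):
        # Power iteration: repeatedly multiply by C and renormalize.
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance in the data")
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v is the variance captured along v.
    eigenvalue = sum(vi * sum(cij * vj for cij, vj in zip(row, v))
                     for vi, row in zip(v, C))
    total_variance = sum(C[i][i] for i in range(len(C)))
    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    return v, scores, eigenvalue / total_variance
```

The loadings play the role of the "drivers" discussed above: glycosites with large absolute loadings contribute most to the separation of samples along component 1. The sign of a component is arbitrary, so only the relative position of the two subtypes is meaningful.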
One of the most differentiating markers is CD44, which is up-regulated in the ABC subtype relative to the GCB subtype. CD44 is increasingly linked to the progression of different cancer types as well as to cancer-initiating cells (CICs), also known as cancer stem cells. In fact, CD44 is the most common marker of CICs (26). In B cells, BCL-6 transcriptionally blocks the expression of a set of genes, including CD44, that are induced when B cells are activated (27). The high abundance and important roles of BCL-6 in germinal center B cells (28) therefore explain the relatively low expression of CD44 in the GCB subtype, which is derived from germinal center B cells.
The second strongest driver of segregation is interleukin 4-induced protein 1 (IL4I1 or FIG1). Both deglycosylated peptides of this protein were very strong segregators. Apart from providing an additional positive control for our workflow, this finding indicates that the protein itself is highly up-regulated in the ABC subtype, which is indeed what we found in our previous proteome study, where IL4I1 was also a strong differentiator between subtypes. IL4I1 is normally induced through the IL-4 receptor via STAT6. Its expression is also regulated by NF-κB signaling upon activation of B cells through the CD40 pathway (29). Accordingly, we find the glycosites of CD40 as well as CD80, the two receptors required for T-cell-dependent activation, also up-regulated in the ABC subtype.
Consistent with our findings, differential IL-4-induced gene expression and intracellular signaling in the two subtypes have already been reported (30). IL4I1 has an immunomodulatory role: it has been identified as a secreted L-phenylalanine oxidase that can inhibit T-cell proliferation by producing H₂O₂. It mediates an immunosuppressive effect in vivo by blocking the CD8⁺ antitumor T-cell response (31). Expression of IL4I1 has also been reported to be characteristic of primary mediastinal lymphoma, the third subtype of diffuse large B-cell lymphoma (32).
On the opposite side (higher expression in GCB), two of the strongest drivers are MME (CD10) and CD27, again consistent with the results of our proteome study. High MME expression is prognostic for GCB (2, 33). Down-regulation of MME is mediated through an NF-κB-dependent mechanism, which explains the relatively lower expression of this protein in the ABC subtype, a subtype characterized by activation of this pathway (34). CD27 has likewise been suggested as a marker with powerful prognostic value for DLBCL and has been included in several prediction algorithms. Serum levels of CD27 are reported to correlate with the outcome of patients receiving standard B-cell lymphoma treatment (R-CHOP) (35).
In contrast to the drivers mentioned above, allergin-1 (MILR1) and LTBP1 have not yet been associated with lymphoma classification. Allergin-1 has been studied in the context of allergic responses, where it was shown to suppress IgE-mediated, mast cell-dependent anaphylaxis in mice. The same study showed that allergin-1 is expressed on macrophages, neutrophils and dendritic cells as well as on mast cells and/or basophils in both humans and mice (36). Interestingly, allergin-1 was also found to be expressed on human B cells (36). This broad expression pattern corresponds to that of other immunoglobulin-like inhibitory receptors such as FcγRIIB, PIR-B, gp49B1, MAIR-I and SIRP-α. SIRP-α, which we also identified as a strong segregator of the two subtypes, is expressed on macrophages and dendritic cells and plays an important role in blocking phagocytosis through its interaction with CD47 (37), although its role on B cells is unknown. The activation of specific epitopes on the variable domain of CD47 resulted in a rapid induction of apoptosis in T cells (38). Thus, our data suggest that allergin-1 and SIRP-α might have important roles in nonallergic immune responses, possibly with relevance to the biology of lymphomas. LTBP1 belongs to the family of latent transforming growth factor beta (TGF-β) binding proteins, which are master regulators of TGF-β bioavailability. In addition, LTBPs are integral components of the fibronectin and microfibrillar extracellular matrix (ECM). In breast cancer, elevated LTBP1 levels appear in two gene signatures predictive of enhanced metastatic behavior. The role of LTBP1 in metastasis is unclear, but it has been suggested that LTBPs may provide a bridge between structural and signaling components of the epithelial to mesenchymal transition (EMT) (39).
Next, we performed a global analysis of these proteins to discover pathways or protein classes that contribute substantially to the segregation. We annotated the proteins based on the gene set enrichment analysis (GSEA) database (40,41), which consists of a priori defined gene sets curated from publications or derived computationally, including gene sets defined by shared promoter motifs. The two gene sets with the highest enrichment among genes up-regulated in the ABC subtype were V$NFKB_Q6_01 and V$NFKAPPAB_01. The first gene set corresponds to genes whose promoter regions [-2 kb, 2 kb] around the transcription start site contain the computationally derived motif NNNNKGGRAANTCCCN, which does not match any known transcription factor. However, the second motif, also highly enriched, is GGGAMTTYCC, which matches NF-κB RELA. The proteins responsible for this enrichment included known NF-κB-regulated proteins such as ATP1B, CPD, ICAM1, PFN1, CD83, LTB, IL4I1, WNT10A, and SLC12A2. In fact, the ABC subtype is characterized by constitutive activity of the NF-κB pathway. More specifically, NF-κB signaling in ABC has been shown to lead to nuclear translocation of p50/RELA heterodimers and, to a lesser extent, p50/c-REL heterodimers (42). Hence, despite its relatively small size, the N-glycoproteome can reveal biologically relevant differences between the subtypes.
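The promoter motifs above are written in IUPAC nucleotide code (for example, M = A or C, Y = C or T, K = G or T, R = A or G, N = any base). The sketch below converts such a motif into a regular expression and reports its occurrences in a sequence. It assumes that promoter sequences are supplied as plain strings and scans only the forward strand, whereas real motif annotation would also consider the reverse complement; the function names are hypothetical.

```python
# Sketch: scanning DNA for IUPAC-coded motifs such as GGGAMTTYCC.
# Assumption: forward strand only; promoter sequences are plain strings.
import re
from typing import List

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(motif: str):
    """Compile an IUPAC motif; the lookahead reports overlapping hits too."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in motif.upper()) + ")")


def motif_hits(motif: str, sequence: str) -> List[int]:
    """0-based start positions of the motif in the sequence."""
    return [m.start() for m in motif_regex(motif).finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence containing GGGACTTTCC (M = C, Y = T).
    print(motif_hits("GGGAMTTYCC", "TTGGGACTTTCCAA"))
```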
Unsupervised Hierarchical Clustering and t Test Signature. Unsupervised hierarchical clustering of the deglycosylated peptides again perfectly segregated ABC and GCB into the two major branches of the dendrogram (Fig. 6A). To extract a glycopeptide signature that significantly segregates the two subtypes, we performed a t test with a false discovery rate (FDR) of 0.05 and an S0 of 0.1, which resulted in a signature of 38 glycosites (Fig. 6B). Many of these glycosites occurred in proteins that were members of the proteomic signature previously found to segregate the subtypes (3), reflecting the high correlation, noted above, between the expression levels of the glycosites and of their corresponding proteins. Specifically, the previous proteomic signature contained 10 glycoproteins (UniProt keyword annotation): CD27, ICAM1, RCN, CD205 (LY75), IL4I1 (FIG1), CD44, SYPL1, HLA-C, ATP1B1, and EVDB. With the exception of EVDB, all of these were identified in the glycoproteome study, and in our glycosite signature, glycosites belonging to seven of these nine glycoproteins were significantly different between the two subtypes. This is remarkable and reassuring, especially considering that the glycosite signature comprises only 20 glycoproteins. This large overlap prompted us to evaluate how much information the N-glycoproteome alone adds to the characterization of the system. To this end, we subtracted the glycosites on markers already identified in our global proteome study or in mRNA profiling studies. This left 20 glycosites on 12 proteins exclusive to the glyco-signature. Remarkably, principal component analysis (PCA) segregated the two subtypes based solely on these exclusive glycosites (supplemental Figs. S3A and S3B).
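A t test parameterized by an FDR and an S0 value can be read as a SAM-style test (Tusher et al.): S0 is a small constant added to each site's standard error so that sites with tiny variance but a small fold change do not dominate, and the FDR is estimated by recomputing the statistic after permuting the sample labels. The sketch below is a minimal NumPy version under those assumptions. The function names are hypothetical, the input is assumed to be a complete matrix of log2 super-SILAC ratios (glycosites × replicates) per subtype, and the software actually used for the analysis may differ in details such as how the cutoff is chosen.

```python
import numpy as np


def sam_statistic(a: np.ndarray, b: np.ndarray, s0: float) -> np.ndarray:
    """Per-row modified t statistic d = (mean_a - mean_b) / (s + s0).

    a, b: arrays of shape (n_sites, n_replicates) holding log2 ratios.
    s is the pooled standard error of the difference in means, as in a
    two-sample t test; s0 damps sites with very small variance.
    """
    na, nb = a.shape[1], b.shape[1]
    diff = a.mean(axis=1) - b.mean(axis=1)
    pooled_var = ((na - 1) * a.var(axis=1, ddof=1)
                  + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    s = np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return diff / (s + s0)


def significant_sites(a, b, s0=0.1, fdr=0.05, n_perm=200, seed=0):
    """Boolean mask of sites passing a permutation-based FDR on |d|."""
    rng = np.random.default_rng(seed)
    data = np.hstack([a, b])
    na = a.shape[1]
    observed = np.abs(sam_statistic(a, b, s0))

    # Null distribution: recompute |d| after shuffling the subtype labels.
    null = np.concatenate([
        np.abs(sam_statistic(data[:, idx[:na]], data[:, idx[na:]], s0))
        for idx in (rng.permutation(data.shape[1]) for _ in range(n_perm))
    ])

    # Scan cutoffs from low to high and keep the first (most permissive) one whose
    # estimated FDR = (false hits per permutation) / (observed hits) is small enough.
    for cutoff in np.sort(observed):
        n_obs = np.count_nonzero(observed >= cutoff)
        n_false = np.count_nonzero(null >= cutoff) / n_perm
        if n_false / n_obs <= fdr:
            return observed >= cutoff
    return np.zeros_like(observed, dtype=bool)


# Simulated usage: 200 sites, 8 replicates per subtype, 10 true differences.
rng = np.random.default_rng(1)
abc = rng.normal(0.0, 0.3, size=(200, 8))
gcb = rng.normal(0.0, 0.3, size=(200, 8))
abc[:10] += 3.0
mask = significant_sites(abc, gcb, s0=0.1, fdr=0.05)
print(mask.sum(), mask[:10].all())
```

Raising S0 or lowering the FDR makes the signature stricter; the less stringent setting mentioned later in the text corresponds to calling the same function with different `s0` and `fdr` arguments.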
Often in the glycosite signature, several deglycosylated peptides belonging to the same protein are significantly differentially expressed between the subtypes. This was most prominent for CD205 (LY75) (5 peptides) and ICAM1 (4 peptides). CD205 belongs to the family of C-type lectin receptors (CLRs), which function as pattern recognition receptors recognizing carbohydrate ligands from pathogenic microorganisms (43). CD205 has mainly been studied in dendritic cells, where it is used as a docking site to deliver specific antigens (44). In B cells, CD205 has been shown to modulate their phenotype, causing up-regulation of costimulatory molecules on the cell surface (45). It has also been shown that CD205 may promote cell adhesion, and blocking CD205 has been suggested as a potential clinical strategy to interfere with early ovarian cancer metastasis (46). ICAM1 is an NF-κB-regulated cell-surface receptor of the immunoglobulin superfamily whose serum levels correlate with higher tumor burden and dissemination in DLBCL (47). ICAM1 functions in cell adhesion, has a costimulatory role that ensures a proper T cell response, and participates in lymphoid trafficking and extravasation (47).
Performing a Fisher's exact test (p < 0.01; enrichment factor > 5) on our signature glycoproteins after adding GSEA annotations revealed enrichment of several interesting gene sets (supplemental Table S4). The most significant was, in fact, V$NFKAPPAB_01, and the second corresponds to genes up-regulated during an inflammatory response in macrophages. To obtain a broader view of such enrichments, we performed a less stringent t test with an FDR of 0.1 and an S0 of 0.01, which resulted in 57 differentially expressed sites. The Fisher's exact test on this signature, requiring the same fivefold enrichment factor as before, added some new and interesting categories (supplemental Table S5). Two gene sets corresponding to IL6-regulated genes from two independent studies were enriched (DASU_IL6_SIGNALING_SCAR_UP and BROCKE_APOPTOSIS_REVERSED_BY_IL6). The role of IL6 in lymphomagenesis is interesting because, upon transformation, B-cell lymphomas use paracrine IL6 signaling as a survival signal (48). In ABC-DLBCL in particular, NF-κB signaling was shown to induce the expression of IL6, which activates STAT3 in an autocrine manner. Indeed, combination treatments that block both NF-κB and STAT3 signaling act synergistically and are especially toxic to ABC-DLBCL (49).
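The filter used above combines a significance threshold with an enrichment factor. A minimal sketch, assuming a one-sided (over-representation) test against a background of all quantified glycoproteins: the enrichment factor is the fraction of signature proteins carrying an annotation divided by the corresponding fraction in the background, and the p-value is the upper tail of the hypergeometric distribution, which is exactly what a one-sided Fisher's exact test on the 2 × 2 table computes. The function names and the counts in the example are hypothetical.

```python
from math import comb


def enrichment(k: int, n_sig: int, n_set: int, n_total: int) -> tuple[float, float]:
    """Enrichment factor and one-sided Fisher's exact p-value for one annotation.

    k        signature proteins carrying the annotation
    n_sig    proteins in the signature
    n_set    background proteins carrying the annotation
    n_total  background proteins (e.g., all quantified glycoproteins)
    """
    factor = (k / n_sig) / (n_set / n_total)
    # Upper hypergeometric tail P(X >= k), equal to a one-sided Fisher's exact
    # test on the table (in/out of signature) x (with/without annotation).
    tail = sum(comb(n_set, x) * comb(n_total - n_set, n_sig - x)
               for x in range(k, min(n_set, n_sig) + 1))
    return factor, tail / comb(n_total, n_sig)


def passes(k, n_sig, n_set, n_total, p_max=0.01, min_factor=5.0):
    """Apply the p < 0.01 and enrichment factor > 5 criteria from the text."""
    factor, p = enrichment(k, n_sig, n_set, n_total)
    return p < p_max and factor > min_factor


# Hypothetical counts: 4 of 20 signature proteins vs. 15 of 1,321 background proteins.
factor, p = enrichment(4, 20, 15, 1321)
print(f"enrichment factor {factor:.1f}, p = {p:.2g}")
```

Requiring both criteria guards against two failure modes: tiny annotation sets that are "significant" only through one or two proteins, and large sets with a statistically detectable but biologically modest shift.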
The evident biological relevance of the t test signature and of the enrichment analysis highlights the potential of a PTM-based approach; in this case, probing membrane proteins revealed differences in intracellular signaling.

CONCLUSION AND OUTLOOK
We have previously shown that the protein expression patterns of cell lines derived from ABC-DLBCL and GCB-DLBCL subtypes can unambiguously differentiate them (3). Taking proteomic approaches one step further, we investigated whether a specific set of functionally relevant proteins can also address this question. We focused this study on membrane proteins, which are key players in cancer cell biology and are located at the interface between a cancer cell and its environment. Given the redundancy in activated downstream signaling pathways, studying membrane proteins can be a more specific way of characterizing cancer cells, which can help in classifying them and in developing targeted therapies. We used the N-glyco-FASP protocol (14) as a tool to efficiently enrich for this class of proteins. The N-glyco enrichment protocol does not use any chemical derivatization steps but does involve the extra steps of lectin enrichment and deglycosylation with PNGase F (14). Because these steps could introduce additional variability, we performed this study in quadruplicate and used a lymphoma super-SILAC mix as an internal standard, which successfully minimized the effects of technical variation. High quantitative precision is necessary to differentiate the two subtypes, especially here, where quantification of an N-glycosylation site usually relies on a single peptide (17).
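Why the super-SILAC internal standard cancels technical variation can be shown with a ratio-of-ratios calculation. Each light-labeled cell line is measured against the same heavy-labeled standard, so for a glycosite with abundance a and b in two cell lines and h in the standard, (a/h)/(b/h) = a/b; in log space this is a simple difference of log2 ratios. The Python sketch below assumes light/heavy ratios from replicate runs and summarizes the quadruplicates by the median log2 ratio, which is one reasonable choice rather than necessarily the one used in the study; the function name is hypothetical.

```python
import math
from statistics import median


def log2_fold_change(lh_a: list[float], lh_b: list[float]) -> float:
    """log2 fold change of one glycosite between cell lines A and B.

    lh_a, lh_b: light/heavy (sample/super-SILAC) ratios from replicate runs.
    Because both samples are measured against the same heavy standard, the
    standard cancels: (a/h) / (b/h) = a/b, i.e., a difference of log2 ratios.
    Replicates are summarized by the median log2 ratio (an assumption).
    """
    med_a = median(math.log2(r) for r in lh_a)
    med_b = median(math.log2(r) for r in lh_b)
    return med_a - med_b


# Hypothetical quadruplicate L/H ratios for one glycopeptide in an ABC and a GCB line.
print(log2_fold_change([2.1, 1.9, 2.0, 2.2], [0.50, 0.45, 0.55, 0.50]))
```

Working with log2 ratios makes up- and down-regulation symmetric around zero, and the median limits the influence of a single aberrant replicate, which matters when a site is quantified from only one peptide.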
Applying the N-glyco-FASP method to 10 patient-derived lymphoma cell lines yielded a subset of the proteome highly enriched for membrane and secreted proteins. To our knowledge, this is the largest membrane proteome of B-cell lymphoma reported to date. It enabled us to segregate the two closely related DLBCL subtypes based on their N-glycoproteome expression profiles. Importantly, the loadings of component 1, which segregates the two subtypes in the principal component analysis, include glycosites on proteins that we had suggested as markers in our previous proteome study. This overlap further validates these easily accessible cell surface proteins as clinically interesting candidates. By implication, our novel candidates such as allergin-1 (MILR1), LTBP1, and SIRP-α should be very interesting targets for investigation as biological drivers of segregation. In addition to investigating these proteins individually, we tested bioinformatically for enrichments in the loadings of component 1. This revealed that one of the gene sets up-regulated in the ABC subtype corresponds to genes whose promoter regions around the transcription start site contain the motif for NF-κB RELA. Our unbiased approach therefore links the differences in N-glycoproteomes to differential transcriptional regulation between the subtypes. Differential activity of NF-κB signaling is considered one of the major pathways accounting for molecular differences between the ABC and GCB subtypes of DLBCL. Hence, our approach can link differences in the glycoproteome to intrinsic biological differences between the subtypes using a small subset of proteins obtained in a straightforward and rapid manner.
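The loadings of component 1 indicate which glycosites drive the separation along the first principal component. A minimal NumPy sketch, assuming a complete (no missing values) samples × glycosites matrix of log2 ratios: PCA is computed by singular value decomposition of the column-centered matrix, and the glycosites with the largest absolute PC1 loadings are reported. The function names and the simulated data are hypothetical and are not the authors' pipeline.

```python
import numpy as np


def pca_component1(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Scores and loadings of the first principal component.

    X: complete matrix of shape (n_samples, n_glycosites), e.g., log2 ratios.
    """
    Xc = X - X.mean(axis=0)                  # center each glycosite
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[0]
    # SVD signs are arbitrary; make the largest-magnitude loading positive.
    if loadings[np.argmax(np.abs(loadings))] < 0:
        loadings = -loadings
    scores = Xc @ loadings
    return scores, loadings


def top_loadings(loadings: np.ndarray, names: list[str], n: int = 5) -> list[tuple[str, float]]:
    """Glycosites with the largest absolute PC1 loadings, i.e., the main drivers."""
    order = np.argsort(-np.abs(loadings))[:n]
    return [(names[i], float(loadings[i])) for i in order]


# Simulated example: 4 "ABC" and 4 "GCB" samples, 50 sites, 5 of which differ.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.2, size=(8, 50))
X[:4, :5] += 3.0
scores, loadings = pca_component1(X)
names = [f"site{i}" for i in range(50)]
print(np.round(scores, 2))
print(top_loadings(loadings, names))
```

The ranked loadings are also the natural input for annotation enrichment along component 1, since they order glycosites by how strongly they contribute to the subtype separation.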
Our study demonstrates that enrichment of a single PTM can be used to differentiate between closely related tumor subtypes. This highlights the potential of targeting a particular set of proteins (in this case, membrane proteins) that could be of very high clinical relevance in cancer classification and in the provision of targeted therapies.
In conclusion, the continuous development of mass spectrometry-based technologies is generating increasingly powerful tools to describe the biology of cancer cells and thereby unlock their secrets. This is especially true for post-translational modifications, which cannot be identified by genomics approaches. As a first step in this direction, we have established here that MS-based quantification of enriched glycosylated membrane proteins can distinguish between related lymphoma subtypes and identify novel, disease-segregating cell surface targets on B-cell lymphoma cells. This approach might provide the basis for the future diagnosis of B-cell lymphoma subtypes, of other closely related tumor subtypes, and even of normal cells wherever an unbiased global screen of the cell surface is required.