Proteomics Profiling of CLL Versus Healthy B-cells Identifies Putative Therapeutic Targets and a Subtype-independent Signature of Spliceosome Dysregulation

Chronic lymphocytic leukemia (CLL) is a heterogeneous B-cell cancer exhibiting a wide spectrum of disease courses and treatment responses. Molecular characterization of RNA and DNA from CLL cases has led to the identification of important driver mutations and disease subtypes, but the precise mechanisms of disease progression remain elusive. To further our understanding of CLL biology, we performed isobaric labeling and mass spectrometry proteomics on 14 CLL samples, comparing them with B-cells from healthy donors (HDB). Of 8694 identified proteins, ~6000 were relatively quantified across all samples (q < 0.01). A clear, subtype-independent CLL signature of 544 proteins significantly overexpressed relative to HDB was identified, highlighting established hallmarks of CLL (e.g. CD5, BCL2, ROR1 and CD23 overexpression). Previously unrecognized surface markers were overexpressed (e.g. CKAP4, PIGR, TMCC3 and CD75), and three others (LAX1, CLEC17A and ATP2B4) have been implicated in B-cell receptor signaling, which plays an important role in CLL pathogenesis. Several other significantly overexpressed proteins (e.g. Wee1, HMOX1/2, HDAC7 and INPP5F) also represent potential targets. Western blotting confirmed overexpression of a selection of these proteins in an independent cohort. mRNA processing machinery was broadly upregulated across the CLL samples. Spliceosome components showed consistent overexpression (p = 1.3 × 10⁻²¹), suggesting dysregulation in CLL independent of SF3B1 mutations. This study highlights the potential of proteomics to identify putative CLL therapeutic targets and reveals a subtype-independent protein expression signature in CLL.

pathogenesis (5,6). More recently, the DNA methylation profile of CLL cases was shown to closely reflect that of the proposed cell of origin, namely memory B-cells (MBC) and naive B-cells (NBC) for M-CLL and U-CLL, respectively. Interestingly, both studies identified a third epigenetic CLL subgroup with an intermediate methylation signature, enriched within M-CLL cases with 95–98% IGHV germline identity. These three CLL epitypes exhibit different clinicobiological features, with MBC-like CLL cases following a more indolent clinical course (7)(8)(9)(10).
Although no single aberration appears to drive disease development, many recurrent gene mutations and chromosome abnormalities have been described in CLL, and several have prognostic and/or predictive significance. Deletions of 17p and 11q, which result in the loss of TP53 and of baculoviral IAP repeat-containing 3 (BIRC3) or ataxia-telangiectasia mutated serine/threonine kinase (ATM), respectively, are frequently associated with TP53 and ATM mutations on the remaining allele and with poor outcome following chemo-immunotherapy (11,12). In contrast, the most frequent cytogenetic abnormality in CLL, deletion of 13q, results in increased expression of the anti-apoptotic protein Bcl-2, largely because of loss of miR-15a and miR-16-1, and is associated with a good prognosis, particularly in M-CLL.
Other recurrent mutations also influence disease progression and treatment response. Next-generation sequencing studies have confirmed that mutations of splicing factor 3B subunit 1 (SF3B1) (13) and neurogenic locus notch homolog protein 1 (NOTCH1), a transmembrane receptor and transcriptional regulator determining cell fate (14), are the most frequent recurrent mutations in CLL, with incidences of ~18% and 12%, respectively, at the time of initial treatment. Mutations in either gene are associated with poorer outcome following chemotherapy or chemo-immunotherapy, and NOTCH1 mutations are also predictive of a poor response to chemotherapy plus anti-CD20 antibody combinations (15,16). SF3B1 is a spliceosome component with a role in the regulation of pre-mRNA intron excision. Heterozygous missense mutations in the C-terminal HEAT domain are the most frequent alterations of SF3B1 and affect spliceosomal function (17). Indeed, SF3B1 mutation has been shown to induce large numbers of aberrantly spliced and altered gene products in CLL (18). The frequency of SF3B1 mutations and the resulting changes to spliceosomal activity suggest that dysregulation of the spliceosome plays a prominent role in CLL pathogenesis.
Liquid chromatography with mass spectrometry (LC-MS) proteomics provides an effective means of establishing global differential protein expression profiles. Several studies have previously employed MS proteomics to analyze CLL (recently reviewed in (19,20)), the latest being a characterization of 9 U-CLL versus 9 M-CLL cases that identified 3521 proteins and relatively profiled 2024, of which ~100 were differentially expressed (±1.5-fold change, p < 0.05) (21). Several earlier studies provided limited proteome coverage but suggested a number of markers of poor prognosis, such as nucleophosmin, PDCD4 and TCL1 (22)(23)(24)(25)(26)(27). Technological and methodological advances such as Orbitrap technology, isobaric labeling and two-dimensional chromatography have greatly improved proteomics coverage, leading to the first comprehensive drafts of the human proteome (28–30).
Our current discovery-stage study applied isobaric labels and LC-MS proteomics to characterize isolated B-cells from 3 healthy donors and 14 CLL patients. CLL samples representing a range of clinically relevant subtypes associated with poor prognosis were compared with healthy donor B-cells to assess CLL-specific differential protein expression. The resulting quantitative proteomes revealed a strong signature common to CLL, highlighting several potential therapeutic targets and suggesting mechanisms, such as spliceosome overexpression, that may contribute to pathogenesis.

Isolation of Healthy Donor and CLL B-cells From Clinical Samples-Ethical approval for the use of human samples was granted under REC references 228/02/t (Southampton) and 06/Q2202/30 (Bournemouth). Peripheral blood mononuclear cells (PBMCs) from healthy donors and CLL patients were obtained from Southampton Blood Services and the Bournemouth Tissue Bank, respectively. PBMCs were isolated from whole blood or blood cones by density gradient centrifugation as previously described (3) and frozen at 5 × 10⁷ cells/ml in FCS containing 10% v/v DMSO. Healthy donor and CLL patient PBMCs were thawed and washed, and B-cells were isolated by negative selection using the EasySep human B-cell enrichment kit without CD43 depletion, according to the manufacturer's instructions. The enriched B-cells were washed 3 times in excess PBS. B-cell purity was assessed by flow cytometry after immunostaining for CD19 and CD3 and was at least 90% in all samples (supplemental Table S1). Isolated cell pellets were snap-frozen and stored in liquid nitrogen before lysate preparation.
Experimental Design and Statistical Rationale-CLL samples were selected to include a range of clinically relevant CLL subtypes associated with poor prognosis. To accommodate the capacity limit of the isobaric labels, samples were assigned evenly across two batches (supplemental Table S1): to analyze 14 CLL proteomes simultaneously with relative quantitative comparability, the samples were assigned to two tandem mass tag (TMT) 10-plex label sets. Three healthy donor B-cell (HDB) samples were used as both biological controls and inter-experimental bridging controls; HDB peptides were prepared and TMT-labeled before being split and allocated to each 10-plex. The selected samples included (non-exclusively) 5 trisomy 12 cases, 3 CD38+ (>99%) cases, 8 unmutated IGHV cases, 5 NOTCH1-mutant cases and 5 SF3B1-mutant cases (supplemental Table S1).
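The plex layout described above (7 CLL samples plus the same 3 HDB bridging controls in each 10-plex) can be checked with a short Python sketch. The round-robin assignment, the sample names and the channel-to-sample mapping are hypothetical; the authors balanced subtypes across the two sets as listed in supplemental Table S1, which this sketch does not reproduce. Only the ten TMT10plex reporter channel names are standard.

```python
# Hypothetical sketch: sample names, round-robin order and channel mapping are
# illustrative only; the real assignment balanced subtypes (supplemental Table S1).
TMT10_CHANNELS = ["126", "127N", "127C", "128N", "128C",
                  "129N", "129C", "130N", "130C", "131"]


def assign_plexes(cll_ids, hdb_ids, n_sets=2):
    """Split CLL samples evenly across label sets; each set also gets every HDB control."""
    sets = [[] for _ in range(n_sets)]
    for i, sample in enumerate(cll_ids):
        sets[i % n_sets].append(sample)  # round-robin keeps set sizes within one of each other
    layouts = []
    for samples in sets:
        members = samples + list(hdb_ids)  # shared bridging controls link the sets
        if len(members) > len(TMT10_CHANNELS):
            raise ValueError(f"{len(members)} samples exceed the {len(TMT10_CHANNELS)} channels")
        layouts.append(dict(zip(TMT10_CHANNELS, members)))
    return layouts


cll = [f"CLL{i:02d}" for i in range(1, 15)]   # 14 hypothetical CLL sample IDs
hdb = ["HDB1", "HDB2", "HDB3"]                # 3 bridging controls
plex_a, plex_b = assign_plexes(cll, hdb)      # 7 CLL + 3 HDB = 10 channels each
```

Adding a fifteenth CLL sample to this design would overflow one set (8 + 3 = 11 channels), which is why a third plex or fewer bridging channels would then be needed.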
To minimize batch effects, reduce technical noise and disregard inconsistent observations, a stringent approach considering only the smallest of the CLL:HDB fold changes was adopted, as detailed in Quantitative and Statistical Analysis of MS Data below. CLL:HDB ratios were evaluated by an FDR-corrected one-sample t test alongside a value termed the regulation score (Rs) (31), a more robust measure than an average, also detailed below. Based on the power analysis calculations presented by Levin (33), with 14 samples a fold change of ~1.25 could be detected in those instances where variation was as low as 25%. Using the standard deviation in the denominator of the regulation score adjusted for variation on a case-by-case basis. At a threshold of Rs > 0.3, the lowest fold change defined as significantly regulated was 1.25, observed with an average variation of 18%.
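The kind of power statement made above can be illustrated with a generic Monte Carlo simulation of a two-sided one-sample t test. This is a sketch under stated assumptions, not necessarily the calculation presented by Levin: it assumes normally distributed log2 ratios, converts the 25% variation into a log2 standard deviation with the approximation SD ≈ CV/ln 2, and uses the 5% two-sided critical t value for 13 degrees of freedom (n = 14) as a fixed default.

```python
import math
import random


def simulated_power(fold_change=1.25, cv=0.25, n=14, t_crit=2.1604,
                    n_sim=10000, seed=1):
    """Monte Carlo power of a two-sided one-sample t test on log2 ratios.

    Assumptions: log2 ratios ~ Normal(log2(fold_change), sd) with
    sd ~= cv / ln(2) (delta-method approximation); t_crit is the two-sided
    5% critical value of Student's t with n - 1 df (2.1604 for n = 14).
    """
    rng = random.Random(seed)           # seeded for reproducibility
    mu = math.log2(fold_change)
    sd = cv / math.log(2)
    hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(mu, sd) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))  # sample SD
        if abs(m / (s / math.sqrt(n))) > t_crit:                 # reject H0: mean = 0
            hits += 1
    return hits / n_sim


print(f"estimated power for a 1.25-fold change: {simulated_power():.2f}")
```

Running it with `fold_change=1.0` returns approximately the nominal 5% false-positive rate, a useful sanity check on the simulation.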
Sample and Peptide Preparation and Labeling for LC-MS-Snap-frozen cell pellets were lysed on ice by trituration through a 23-gauge needle in 0.5 M TEAB containing 0.05% w/v SDS. Disrupted cells were further sonicated on ice, and lysates were cleared by centrifugation at 16,000 × g for 10 min at 4 °C. Protein concentration was determined with a Direct Detect Spectrometer (Millipore, Billerica, MA). One hundred µg of cell lysate (200 µg for each HDB sample, for use across two 10-plex experiments) was reduced with 50 mM TCEP and alkylated with 200 mM MMTS before overnight digestion at room temperature with proteomics-grade trypsin at a 30:1 ratio. Peptides were incubated with TMT 10plex isobaric tags according to the manufacturer's instructions. CLL sample peptides were assigned to two TMT 10-plex label sets, A and B (supplemental Table S1).

Peptide Prefractionation-Peptides were cleaned up by three rounds of solid-phase extraction using 100 µl C18 ZipTips according to the manufacturer's instructions. Peptides were lyophilized, reconstituted in 2% v/v ACN, 0.1% v/v NH4OH, and resolved by high-pH reversed-phase C8 chromatography (XBridge, 150 mm × 3 mm ID, 3.5 µm particles; Waters, Milford, MA) at 300 µl/min on an LC-20AD HPLC system (Shimadzu, Kyoto, Japan) maintained at 30 °C. Mobile phase A was 99.9% H2O, 0.1% NH4OH and mobile phase B was 99.9% ACN, 0.1% NH4OH. The 120 min gradient was: 0 min, 2% B; 10 min, 2% B; 75 min, 30% B; 105 min, 85% B; 120 min, 2% B. Fractions were collected in a peak-dependent manner and individually lyophilized. The 25 most peptide-rich fractions, which were reproducible between both experiments, were selected for individual analysis, and the remaining fractions were orthogonally concatenated.
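The gradient program and fraction handling above can be sketched in Python. Two assumptions are labeled here: %B is taken to change linearly between the listed time points, and the number of concatenated pools (`n_pools`) is a hypothetical parameter, since the text does not state it. Orthogonal concatenation is implemented in its usual sense of pooling fractions spaced evenly across the high-pH elution order, so that each pool samples the whole first-dimension gradient.

```python
# (time in min, %B) pairs from the text; linear ramps between points are an assumption.
GRADIENT = [(0, 2), (10, 2), (75, 30), (105, 85), (120, 2)]


def percent_b(t_min):
    """Mobile-phase B percentage at t_min, linearly interpolated."""
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the 120 min program")


def select_and_concatenate(abundance, n_top=25, n_pools=10):
    """abundance: peptide amount per fraction, indexed in elution order.

    Returns (fractions analyzed individually, pooled groups). n_pools is a
    hypothetical value; the text does not give the number of pools.
    """
    ranked = sorted(range(len(abundance)), key=lambda i: abundance[i], reverse=True)
    top = sorted(ranked[:n_top])
    rest = sorted(ranked[n_top:])  # remaining fractions back in elution order
    # Orthogonal concatenation: pool k takes every n_pools-th remaining fraction,
    # so each pool spans early, middle and late high-pH elution.
    pools = [rest[k::n_pools] for k in range(n_pools)]
    return top, [p for p in pools if p]


top, pools = select_and_concatenate([5, 1, 9, 3, 7, 2, 8, 4, 6, 0], n_top=3, n_pools=2)
print(percent_b(90), top, pools)
```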
Peptide Fraction Analysis by LC-MS/MS-Lyophilized peptide fractions were individually reconstituted in 2% ACN, 0.1% FA, and ~500 ng of peptides was loaded with a Dionex Ultimate 3000 (Thermo Scientific) and analyzed by LC-MS/MS as described previously (31). In summary, peptides were trapped on C18 and eluted over a reversed-phase gradient of 8 h (the richest fraction), 5 h (2nd-4th richest), 4 h (6th-13th richest) or 3 h (remaining and pooled fractions). Gradients for each group (8, 5, 4 …

Peptides were analyzed with an LTQ Orbitrap Elite Velos Pro hybrid mass spectrometer (Thermo Scientific). MS analysis of eluting peptides was conducted between 350 and 1900 m/z at 120,000 resolution. The 12 most intense 2+ and 3+ precursor ions per MS scan (minimum intensity 1000) were characterized by tandem MS with higher-energy collisional dissociation (HCD; 30,000 resolution, 1.2 Da isolation window, 40% normalized collision energy) and CID (ion trap MS, 2 Da isolation window, 35% normalized collision energy). Additionally, the DMSO ion at m/z 401.922718 (MS1) (32) and the TMT-H+ ion at m/z 230.170422 (MS2) were used as lock masses, and the mass spectrometer was calibrated weekly.
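A simplified view of the data-dependent acquisition settings above, written as a Python sketch: precursors from one survey scan are filtered by charge (2+ or 3+), minimum intensity (1000) and m/z range (350–1900), and the 12 most intense are chosen. Real instrument control also applies dynamic exclusion and other rules that are omitted here, and the `Precursor` record is a hypothetical simplification. The ppm helper shows how the deviation of a lock-mass ion from its theoretical m/z would be expressed.

```python
from typing import NamedTuple

LOCK_MASS_MS1 = 401.922718  # DMSO ion (from the text)
LOCK_MASS_MS2 = 230.170422  # TMT-H+ ion (from the text)


class Precursor(NamedTuple):
    mz: float
    charge: int
    intensity: float


def select_precursors(survey, top_n=12, charges=(2, 3), min_intensity=1000.0,
                      mz_range=(350.0, 1900.0)):
    """Simplified top-N selection from one MS1 survey scan (no dynamic exclusion)."""
    eligible = [p for p in survey
                if p.charge in charges
                and p.intensity >= min_intensity
                and mz_range[0] <= p.mz <= mz_range[1]]
    eligible.sort(key=lambda p: p.intensity, reverse=True)  # most intense first
    return eligible[:top_n]


def ppm_error(observed_mz, theoretical_mz):
    """Mass deviation in parts per million, e.g. of an observed lock-mass peak."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6
```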
MS Data Processing-Target-decoy searching of raw spectral data was performed with Proteome Discoverer software version 1.4.1.14 (Thermo Scientific). Spectra were subjected to a two-stage search, both stages using SequestHT (version 1.1.1.11), with Percolator used to estimate the FDR at a threshold of q ≤ 0.01. In both searches, fragment ion mass tolerances of 0.02 Da and 0.5 Da were used for HCD and CID spectra, respectively. Fixed modifications of methylthio (C) and TMT (K and peptide N terminus) were used, searching for tryptic peptides. The first search allowed only a single missed cleavage, a minimum peptide length of 7, a precursor mass tolerance of 5 ppm and no variable modifications, and was run against the human UniProt Swiss-Prot database (downloaded January 2015; 20,159 protein sequences). The second search used only spectra with q > 0.01 from the first search; it allowed 2 missed cleavages, a minimum peptide length of 6, a precursor mass tolerance of 10 ppm and a maximum of 2 variable modifications (at most 1 of any given type) from TMT (Y), oxidation (M), deamidation (N, Q) and phospho (S, T, Y), and was run against the human UniProt TrEMBL database (downloaded January 2015; 67,812 protein sequences). PhosphoRS was used to predict the probability of phosphorylation at specific residues. Reporter ion intensities were extracted from non-redundant PSMs with a tolerance of 20 ppm. Data from the two 10-plex experiments were searched separately and combined for protein grouping by Proteome Discoverer. The raw data and processed outputs have been deposited to the ProteomeXchange Consortium (33) via the PRIDE partner repository with the dataset identifier PXD002004.
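The first-stage FDR control and the hand-off to the second search can be illustrated with the basic target-decoy estimate (FDR = decoys/targets above a score threshold, converted to q-values). This is a stand-in for illustration only: Percolator rescores PSMs with a semi-supervised classifier before estimating q-values, and the `(spectrum_id, score, is_decoy)` input format, with one best PSM per spectrum, is a hypothetical simplification.

```python
def q_values(psms):
    """psms: list of (spectrum_id, score, is_decoy), one best PSM per spectrum.
    Higher score = better. Returns {spectrum_id: q} from the simple
    decoy/target FDR estimate (not Percolator's rescoring)."""
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # FDR if the threshold were set here
    # q-value = smallest FDR at this or any less stringent threshold
    q, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return {p[0]: qi for p, qi in zip(ranked, q)}


def partition_for_second_search(psms, threshold=0.01):
    """Accept target PSMs at q <= threshold; pass spectra with q > threshold on."""
    q = q_values(psms)
    accepted = [sid for sid, _, is_decoy in psms if not is_decoy and q[sid] <= threshold]
    rescan = [sid for sid, _, _ in psms if q[sid] > threshold]
    return accepted, rescan
```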
Quantitative and Statistical Analysis of MS Data-Log2 ratios were generated describing each CLL sample relative to each of the three HDB bridging controls (supplemental Table S2). To select only the most robust and consistent findings, the CLL:HDB ratio was assessed against all 3 HDB controls and, rather than a potentially misrepresentative average, the ratio with the smallest fold change was selected. Where the 3 CLL:HDB ratios for a protein disagreed in direction (up- versus downregulation), the finding was rejected by assigning a value of 0. For example: for fold changes of 1.4, 1.5 and 1.8, 1.4 was selected; for fold changes of −2.2, −1.7 and −1.9, −1.7 was selected; and for fold changes of 1.5, −1.9 and 1.3, a value of 0 was taken.
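The minimal-deviation rule above can be written directly on log2 ratios. This Python sketch reproduces the three worked examples in the text; the helper that converts the text's signed fold-change notation (e.g. −1.7 meaning 1.7-fold lower) to log2 is an illustrative addition, and both function names are hypothetical.

```python
import math


def signed_fold_change_to_log2(fc):
    """Convert signed fold-change notation (e.g. -1.7 = 1.7-fold down) to log2."""
    return math.log2(fc) if fc > 0 else -math.log2(-fc)


def minimal_log2_ratio(log2_ratios):
    """Pick the CLL:HDB log2 ratio closest to zero across the HDB controls.

    Returns 0.0 when the ratios disagree in direction (or any is exactly 0),
    mirroring the rejection of inconsistent findings. Needs at least one ratio.
    """
    if all(r > 0 for r in log2_ratios) or all(r < 0 for r in log2_ratios):
        return min(log2_ratios, key=abs)
    return 0.0


f = signed_fold_change_to_log2
print(minimal_log2_ratio([f(1.4), f(1.5), f(1.8)]))     # log2(1.4)
print(minimal_log2_ratio([f(-2.2), f(-1.7), f(-1.9)]))  # -log2(1.7)
print(minimal_log2_ratio([f(1.5), f(-1.9), f(1.3)]))    # 0.0
```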
To reduce ratio compression, peptide-spectrum match (PSM) data for proteins (q < 0.01) were exported from Proteome Discoverer and submitted to Statistical Processing for Isobaric Quantitation Evaluation (SPIQuE) at spiquetool.com. This method weights the contribution of each PSM to a protein's quantitation based on PSM features (manuscript in preparation); for example, high-intensity peptides with low isolation interference are given a greater weighting factor, as previously applied (31).
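SPIQuE's actual weighting scheme is unpublished (manuscript in preparation), so the following Python sketch is only a hypothetical illustration of the idea described: each PSM's log2 ratio contributes in proportion to a weight that rises with reporter intensity and falls with isolation interference. The specific weight, intensity × (1 − interference), is an assumption and is not SPIQuE's formula.

```python
def weighted_protein_log2_ratio(psms):
    """psms: list of (log2_ratio, reporter_intensity, isolation_interference),
    with interference as a fraction in [0, 1].

    HYPOTHETICAL weighting, not SPIQuE's: w = intensity * (1 - interference).
    """
    weights = [inten * (1.0 - interf) for _, inten, interf in psms]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no PSM carries positive weight")
    return sum(w * r for w, (r, _, _) in zip(weights, psms)) / total


# A clean PSM (ratio 1.0) outweighs a 50%-interfered one (ratio 0.0): result 2/3
print(weighted_protein_log2_ratio([(1.0, 1000.0, 0.0), (0.0, 1000.0, 0.5)]))
```

Down-weighting interfered PSMs targets compression because co-isolated peptides contribute reporter signal that pulls measured ratios toward 1:1.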
Because the proportion of non-B-cells differed substantially between healthy donor and CLL PBMCs, contamination emerged in the HDB samples as a number of apparently downregulated proteins. To account for this contamination, the analyses focused primarily on overexpressed proteins, which are more specifically attributable to CLL. As the most apparent source of contamination in the proteomics results was platelet-derived proteins, a filtering list of the 1000 most abundant platelet proteins, ranked by copy number, was generated from a prior proteomics analysis of platelets (34). Of these 1000 proteins, 194 showed apparent overexpression in the HDB samples and were removed from the dataset. Additionally, the WT CLL sample 4621 appeared as an outlier and was excluded from further analyses.
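The platelet filter above amounts to a set operation. In this Python sketch, "apparent overexpression in the HDB samples" is operationalized as a negative mean CLL:HDB log2 ratio; that criterion, the function name and the example values are assumptions for illustration, not the authors' exact rule.

```python
def remove_platelet_contaminants(protein_log2, platelet_top1000):
    """protein_log2: {protein: mean CLL:HDB log2 ratio}.

    Assumed criterion: a listed platelet protein counts as contamination if it
    appears higher in HDB, i.e. its CLL:HDB log2 ratio is negative.
    Returns (kept proteins, flagged contaminants).
    """
    flagged = {acc for acc in platelet_top1000
               if acc in protein_log2 and protein_log2[acc] < 0}
    kept = {acc: r for acc, r in protein_log2.items() if acc not in flagged}
    return kept, flagged


# Hypothetical example values
ratios = {"PF4": -3.0, "ITGA2B": -2.5, "CD5": 2.0, "ACTB": 0.1}
kept, flagged = remove_platelet_contaminants(ratios, {"PF4", "ITGA2B", "ACTB"})
```

A platelet protein that is not higher in HDB (ACTB in the example) is kept, mirroring how only 194 of the 1000 listed proteins were removed.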
For the minimally deviated log2 ratios, differential expression in CLL relative to HDB was defined by an FDR-corrected one-sample t test (p < 0.05) together with the regulation score (Rs > 0.3 or Rs < −0.3). Rs provided a single, robust measure representative of both the magnitude and the consistency of differential expression: Rs = mean / (standard deviation + 1). For proteins with no, or inconsistent, differential expression, Rs tended toward 0.
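The two criteria above can be sketched in Python. Two details are assumptions because the text does not specify them: the sample (n − 1) standard deviation is used in Rs, and the FDR correction is taken to be Benjamini-Hochberg. Computing the t-test p values themselves is left out to keep the sketch dependency-free.

```python
import statistics


def regulation_score(log2_ratios):
    """Rs = mean / (SD + 1) over per-sample minimally deviated log2 ratios.
    Assumption: sample SD (n - 1 denominator)."""
    return statistics.mean(log2_ratios) / (statistics.stdev(log2_ratios) + 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values (assumed FDR method), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_regulated(rs, q, rs_cut=0.3, q_cut=0.05):
    """Significant if corrected p < 0.05 and Rs > 0.3 or Rs < -0.3."""
    return q < q_cut and abs(rs) > rs_cut
```

Because the SD enters the denominator, a protein with the same mean log2 ratio but noisier values across samples receives a smaller Rs.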
Bioinformatics Analyses-Proteins reaching the thresholds outlined above were submitted to either Ingenuity Pathway Analysis (IPA) or the Database for Annotation, Visualization and Integrated Discovery (DAVID). For DAVID analyses, default settings were used for pathway and gene ontology (GO) term enrichment, with Benjamini-corrected p values < 0.05 considered significant. For IPA analyses, default settings were used, and annotations of biomarkers and drug targets were also obtained from IPA. Hierarchical clustering of log2 ratios was performed with Cluster 3.0 (University of Tokyo, Human Genome Centre) using Euclidean distance and complete linkage, and Java TreeView (version 1.1.6r2) was used to visualize the clustering. Chromosome enrichment analysis was performed in DAVID using default settings and visualized with the Ensembl genome browser.
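The fold-enrichment values reported from these analyses compare the share of query proteins carrying an annotation with the share in the background. The Python sketch below computes that ratio and a plain hypergeometric upper-tail p value; DAVID itself reports the EASE score, a modified Fisher exact p value, so results from this sketch would be somewhat less conservative and are illustrative only.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """k of n query proteins carry an annotation held by K of N background proteins."""
    return (k / n) / (K / N)


def hypergeometric_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): a one-sided over-representation test.

    Plain hypergeometric tail, not DAVID's EASE score.
    """
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Hypothetical counts: 10 of 100 hits vs 50 of 5000 background -> 10-fold enrichment
print(fold_enrichment(10, 100, 50, 5000), hypergeometric_upper_tail(10, 100, 50, 5000))
```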
Flow Cytometry-Cells were stained with antibody at either the manufacturer's recommended concentration or, for in-house antibodies, 10 µg/ml for 30 min in the dark at 4 °C, washed and analyzed by flow cytometry on a FACSCalibur (BD Biosciences, Franklin Lakes, NJ) (35,36).

Quantitative Proteomics of CLL Relative to Healthy Donor B-cell Samples Identifies a Consistent, Reproducible CLL Phenotype-The MS and TMT workflow applied to measure relative differential expression in the proteomes of CLL cells versus HDB controls is summarized in Fig. 1. A total of 8694 proteins were identified (q < 0.01), of which 5956 were relatively quantified in all samples (supplemental Table S2).
Hierarchical clustering of differential expression across the 14 CLL samples, relative to HDB, highlighted a broad CLL-specific signature with no distinctly clustered subtypes (Fig. 2A, 2B). The reproducibility of this signature was also apparent when comparing the average CLL expression determined in the two distinct 10-plex experiments (R² = 0.799) and the technical reproducibility between the HDB bridging controls (R² = 0.797) (supplemental Fig. S1).
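Clustering of this kind (Euclidean distance and complete linkage, as stated in Bioinformatics Analyses) can be sketched as a naive agglomerative procedure in pure Python. It is intended to show the linkage rule rather than to replicate Cluster 3.0, and the two-dimensional example profiles are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Naive agglomerative clustering with complete linkage and Euclidean distance.

    profiles: {sample_id: [log2 ratios over the same proteins]}.
    Returns merge steps as (cluster_a, cluster_b, distance) with frozenset clusters.
    """
    clusters = [frozenset([s]) for s in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: distance = largest pairwise member distance
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [clusters[i] | clusters[j]]
    return merges


# Hypothetical two-protein profiles: CLL samples resemble each other, not HDB
profiles = {"CLL1": [1.0, 2.0], "CLL2": [1.1, 2.1], "HDB1": [0.0, 0.0], "HDB2": [0.1, 0.0]}
for a, b, d in complete_linkage(profiles):
    print(sorted(a), sorted(b), round(d, 3))
```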
In total, 544 significantly overexpressed and 592 significantly underexpressed CLL proteins, relative to HDB, were identified, with regulation scores of > 0.3 or < −0.3, respectively (Fig. 2C). Among the overexpressed proteins were well-characterized hallmarks of CLL: CD5, BCL2, CD23 and ROR1. Eighteen proteins showed greater overexpression than CD5, several of which were previously undescribed in CLL (Fig. 2D).
Markers differentiating CLL subtypes were also observed by proteomics. Integrin alpha 4 (CD49d) expression was significantly higher in trisomy 12 cases, as previously described (37); IgM expression was associated with U-CLL cases (3); CD38 expression was correctly identified as significantly higher in CD38+ (>99%) cases; and the Y chromosome-encoded proteins EIF1AY and RPS4Y1 distinguished patient gender (supplemental Fig. S1). Subtype-specific differences emerging from the proteomics data were also examined (supplemental Fig. S3, Table S3), but the study lacked the statistical power to establish them as significant.
Putative Novel CLL Immunophenotypes and Drug Targets-In addition to confirming established characteristics, several novel observations emerged among the most upregulated proteins (Fig. 2C, 2D). Cytoskeleton-associated protein 4 (CKAP4), a cell surface receptor for antiproliferative factor (APF), was the most significantly and consistently upregulated CLL protein (250–590% of HDB expression, p = 1.8 × 10⁻⁸), identified by 34 unique peptides from over 500 PSMs, and has not previously been reported in CLL at the protein level. No corresponding overexpression was detected at the mRNA level in a previous transcriptomics dataset (38) (supplemental Fig. S2). To validate this finding, CKAP4 expression was evaluated by Western blotting in the proteomics-analyzed samples and in an independent cohort of 10 additional CLL samples, relative to HDB controls (Fig. 3B, 3C). CKAP4 was substantially overexpressed in all but two CLL samples, with little CKAP4 expression observed in healthy B-cells.
INPP5F (SAC2), an inositol 4-phosphatase with a role in AKT signaling, was also among the top 20 overexpressed CLL proteins (200–500% of HDB expression in 12/14 samples, p = 1.2 × 10⁻⁵). Western blotting (Fig. 3B, 3C) confirmed this overexpression, although expression appeared more variable in the independent cohort than observed by proteomics. The G2 checkpoint kinase Wee1 was significantly overexpressed in CLL (>175% of HDB expression in 9/14 samples, p = 4.3 × 10⁻⁵), and IPA annotation highlighted Wee1 as a potential therapeutic target of the inhibitor MK1775. Again, overexpression was confirmed by Western blotting, with the majority of CLL samples having greater Wee1 expression than HDB samples. BCL2, ROR1 and CD79b expression were also evaluated to demonstrate the concordance between proteomics- and Western blotting-derived ratios for proteins with known over- or underexpression in CLL.
In addition to the proposed cell surface expression of CKAP4, CLL proteomics identified 20 consistently upregulated proteins annotated with surface localization (Fig. 4). This list highlighted 10 putative novel markers with no previous descriptions of protein upregulation in CLL (CKAP4, PIGR, LAX1, CLEC17A, ATP2B4, TMCC3, ST6GAL1, ATP1B1, C17orf80 and NPTN; supplemental Fig. S4), making them candidates for future evaluation of their roles in CLL biology or as biomarkers.

FIG. 1. PBMCs from 14 CLL patients and 3 healthy donors were subjected to negative B-cell isolation followed by whole cell lysis, reduction, alkylation and trypsin digestion. 100 µg of peptides from each CLL sample was assigned to one of two tandem mass tag (TMT)-labeled 10-plex experiments. 200 µg of each healthy donor B-cell (HDB) protein lysate was labeled and split to provide bridging controls across the two 10-plex experiments. Each 10-plex was handled and analyzed separately using two-dimensional liquid chromatography coupled with data-dependent mass spectrometry; in each case, 60 peak-dependent fractions were analyzed. Peptides were identified from mass spectra using target-decoy searching (false discovery rate < 1%). The identified proteins were quantified from isobaric labels relative to the HDB bridging controls, and differential expression was analyzed to identify CLL-specific differences in protein expression.

FIG. 3. A, Proteomics-derived quantitations for proteins previously described with overexpression in CLL, relative to HDB controls. B, Western blot validation of differential protein expression observed in an independent cohort of CLL samples versus HDB controls. C, Comparison between the differential expression observed for key proteins by Western blotting and by proteomics for CLL and HDB controls.

[Figure annotation tracks: TMT 10-plex set (A or B) and patient gender (M or F) for each CLL sample.]
Although for CKAP4, and to some extent PIGR, surface localization may be affected by internalization, other candidates, such as lymphocyte transmembrane adapter 1 (LAX1), plasma membrane calcium-transporting ATPase 4 (ATP2B4) and prolectin (CLEC17A), are likely to be localized primarily at the cell periphery. LAX1, ATP2B4 and CLEC17A may additionally represent potential effectors of BCR signaling (39–42).

FIG. 4. Proteomics identification of CLL-overexpressed cell surface proteins. Proteomics-derived quantitations for the 20 most consistently upregulated cell surface-expressed proteins in CLL, relative to HDB. The data represent the number of unique peptides and PSMs, the log2 ratios relative to HDB, the average log2 ratios, and the proposed function and evidence for prior observations in CLL.
Next, overexpressed CLL proteins were annotated for their potential as therapeutic targets based on existing drug/inhibitor knowledge using IPA (Fig. 5). Both functional heme oxygenase isoforms (HMOX1 and HMOX2), which exert cytoprotective effects through degradation of free heme, were upregulated and annotated as targets of tin mesoporphyrin. Histone deacetylases HDAC3 and HDAC7, and to a lesser extent HDAC1 and sirtuin 5, also represented putative targets of inhibition. The list additionally contained several kinases as therapeutic targets: two mitogen-activated protein kinases, MAPK8/JNK1 and MAPK13; two tyrosine-protein kinase proto-oncoproteins, FGR and LCK; and two cell cycle progression kinases, cyclin-dependent kinase 7 (CDK7) and the G2 checkpoint kinase WEE1.
Upregulated KEGG pathways were also interrogated, identifying a 9-fold enrichment of spliceosome components (n = 36, p = 1.3 × 10⁻²¹) (Fig. 7A). A further 60 components were marginally upregulated (0.1 < Rs < 0.3, p < 0.05). Fig. 7B presents all proteins within the quantified proteome that map to the KEGG spliceosome pathway, highlighting a near-consistent trend of some degree of overexpression across the CLL samples relative to HDB.

DISCUSSION

CLL has been the subject of numerous investigations applying genomics and transcriptomics that have contributed greatly to the clinical and biological understanding of the disease (6,(43)(44)(45). However, the low correlation observed between mRNA and protein expression limits the insight available from these studies (46,47). Indeed, a comparison between a previous transcriptomics analysis of CLL (38) and these proteomics results highlights minimally correlated differential expression (supplemental Fig. S2). Proteomics has been applied to CLL in several investigations, providing insight into potential differences between subtypes and some CLL-specific signatures (21,22,25,26,48,49). To date, however, CLL proteomics studies have lacked the coverage to identify most expressed proteins and have yet to fully explore comparisons with healthy donor B-cell controls.
This study aimed to implement advances in quantitative LC-MS proteomics to provide a detailed characterization of a broad spectrum of CLL samples and evaluate changes in protein expression relative to B-cells derived from healthy donors. Overall, this investigation provided a substantial, reproducible (supplemental Fig. S1) and representative (Fig. 3) description of the CLL proteome to a depth of almost 6000 proteins. Additionally, the accuracy of the results for individual samples was highlighted by the expression of key subtype markers (supplemental Fig. S1) suggesting the potential of the presented methods for the dissection of subtype-specific differences in CLL and other cancers in the future.
The most striking finding was a consistent, subtype-independent expression profile across the CLL samples (Fig. 2B). Given the heterogeneous clinical nature of CLL, some variation and clustering of subtypes was anticipated, although homogeneity among CLL subtypes has also been observed previously by transcriptomics (5,6). This suggests that the phenotypic differences between CLL subtypes may be a product of post-translational modifications, microenvironment interactions or CLL niche-specific characteristics. Further studies with greater sample numbers will be required to better understand these potentially subtle differences in protein expression.
Phenotypic differences exist between CLL cells in lymph nodes and peripheral blood (50), suggesting that evaluation of CLL cells from additional niches may be required to understand the differences in disease behaviors observed between subtypes. Furthermore, evaluation of fractionated B-cell subsets from several niches would serve as more informative controls. Indeed, given recent insights from methylation studies relating to the likely cell of origin for different CLL subtypes, B-cell subsets relevant to each CLL sample (i.e. MBC for M-CLL and NBC for U-CLL) should be evaluated, guided by their CpG methylation signatures (7)(8)(9)(10). Currently, we are not aware of proteomics data describing these B-cell subsets.
The findings presented here provide several novel hypotheses for further investigation. CKAP4, for instance (Fig. 2D, 3), was robustly identified as a highly abundant, overexpressed, putative surface protein in CLL and validated in a separate cohort by Western blotting. This offers potential mechanistic insight into CLL and a prospective clinical tool. In addition to roles in the endoplasmic reticulum and as a transcription factor (51), CKAP4, also known as CLIMP-63, can act as a cell surface receptor for tissue plasminogen activator (tPA) (52), surfactant protein A (SP-A) (53) and antiproliferative factor (APF) (54). APF treatment of a bladder cancer cell line resulted in reduced proliferation, attributable to substantially reduced phosphorylation of AKT and GSK3β and increased expression of p53 (55). Interestingly, substantial CKAP4 overexpression was also observed in tumors of the Eµ-TCL1 mouse model of CLL, suggesting a potential model for its further investigation (31).

FIG. 5. Proteomics identification of CLL-overexpressed drug targets. Proteomics-derived quantitations for the 20 most consistently upregulated annotated targets of small-molecule inhibitors in CLL, relative to HDB. Proteins were annotated using IPA. The data represent the number of unique peptides and PSMs, the log2 ratios relative to HDB, the average log2 ratios and the IPA-annotated drugs known to target each protein.

FIG. 6. Bioinformatics analysis of the CLL-overexpressed proteome. A, GO term enrichment for the 544 overexpressed proteins (Rs > 0.3, p < 0.05). The Benjamini-corrected GO term enrichment p values were plotted against the number of CLL-upregulated proteins annotated with each term, additionally highlighting the observed fold enrichment relative to the number expected by chance. B, IPA-enriched canonical pathway "cleavage and polyadenylation of pre-mRNA" (p = 2.0 × 10⁻¹⁰) and C, "preinitiation complex assembly" (p = 1.7 × 10⁻⁴).
The identification of several consistently upregulated membrane proteins in CLL versus HDB (Fig. 4, supplemental Fig. S4) highlights the potential of proteomics approaches to identify novel immunotherapy targets for selective targeting with monoclonal antibodies. Additionally, identifying proteins linked with BCR signaling should enable a better understanding of this process in CLL, which may enable improved therapeutic targeting of this currently incurable disease (39–42). LAX1 is phosphorylated by Src and Syk upon BCR stimulation (39), and B-cell hyper-responsiveness in LAX1−/− mice suggests a regulatory role in BCR signaling (57). ATP2B4 has a role in BCR-induced calcium efflux (40), and prolectin, also known as CLEC17A, is expressed in germinal center B-cells, where its expression correlates with proliferation (58). Prolectin also has a potential role in BCR signaling through an association with BLNK (42).
The analysis of drug targets further highlighted the potential of proteomics to identify putative clinical tools (Fig. 5). Wee1 overexpression (Fig. 3B), for instance, suggests a potential target for cell cycle inhibition, with the inhibitor MK1775 shown to have therapeutic benefit in other cancers (59). Upregulation of HMOX1 and HMOX2 suggested increased degradation of free heme in CLL; combined inhibition of both isoforms could induce apoptosis (60). A trend of HDAC upregulation in these results highlighted the potential of targeted HDAC inhibitors (HDACi). HDAC1 and HDAC3 were both upregulated and are specifically targetable, for instance with entinostat, which has previously been shown to induce proapoptotic effects in CLL cells (61). HDAC7 also exhibited consistent upregulation, observed previously at the mRNA level (62), suggesting the possibility of more targeted HDAC interference with fewer off-target effects than those seen in previous pan-HDACi trials in CLL (62)(63)(64).
The strong signature of upregulated mRNA processes highlighted by the bioinformatics analyses (Fig. 6) suggests a general underlying defect in CLL, independent of subtype. SF3B1 mutations (65) and observations of major dysregulation of splicing patterns (44) have previously indicated aberrant spliceosome activity in CLL. Additionally, inhibition of SF3B1 by spliceostatin-A was toxic to CLL cells independent of SF3B1 mutational status, suggesting a broader role for aberrant splicing in CLL biology (66). The observation of broadly consistent overexpression of spliceosomal proteins (Fig. 7) may therefore offer some explanation for these previous observations. This reinforces the notion that interference with aberrant splicing activity could offer a means of better understanding and potentially treating CLL.
A limitation of our study was the presence of non-B-cell contaminants in the HDB samples, resulting from the substantial difference in the proportion of non-B-cells and platelets between healthy donor and CLL patient PBMCs: non-B-cells made up 95% of healthy donor PBMCs but less than 10% of CLL patient PBMCs. Emphasis was therefore placed on proteins overexpressed in CLL, and conclusions based on downregulated proteins, potentially attributable to contamination in the HDB samples, were avoided.
In summary, these results offer the first comprehensive insight into the molecular composition of CLL compared with healthy donor B-cells at the proteome level. They demonstrate the potential of proteomics to identify protein-specific differences between cancers and healthy tissues. Applying such approaches on a larger scale promises to elucidate putative therapeutic targets and prognostic and diagnostic indicators, in addition to dissecting the underlying cancer biology.

FIG. 7. … (p = 1.3 × 10⁻²¹). This pathway is annotated with the proteins identified as significantly overexpressed (p < 0.05), with substantial (Rs > 0.3) and marginal (0.1 < Rs < 0.3) overexpression marked by red and yellow stars, respectively. B, The differential expression observed for all proteins, and the individual log2 ratios for each CLL sample, mapping to the KEGG "spliceosome" pathway.