A Proteomics and Transcriptomics Approach to Identify Leukemic Stem Cell (LSC) Markers

Interactions between hematopoietic stem cells and their niche are mediated by proteins within the plasma membrane (PM), and changes in these interactions might alter hematopoietic stem cell fate and ultimately result in acute myeloid leukemia (AML). Here, using nano-LC/MS/MS, we set out to analyze the PM profile of two leukemia patient samples. We identified 867 and 610 unique CD34+ PM (-associated) proteins in these AML samples, respectively, including previously described proteins such as CD47, CD44, CD135, CD96, and ITGA5, but also novel ones like CD82, CD97, CD99, PTH2R, ESAM, MET, and ITGA6. Further validation by flow cytometry and functional studies indicated that long-term self-renewing leukemic stem cells reside within the CD34+/ITGA6+ fraction, at least in a subset of AML cases. Furthermore, we combined proteomics with transcriptomics approaches using a large panel of AML CD34+ (n = 60) and normal bone marrow CD34+ (n = 40) samples. Thus, we identified eight subgroups of AML patients based on their specific PM expression profile. Gene set enrichment analysis (GSEA) revealed that these eight subgroups are enriched for specific cellular processes.

Acute myeloid leukemia (AML) is a disease characterized by an increase of immature myeloid blasts in the bone marrow as a consequence of the loss of normal differentiation and proliferation of hematopoietic progenitor cells (1,2). The cancer stem cell (CSC) model (3-6) suggests that AML is maintained by a rare population of leukemic stem cells that are thought to be relatively quiescent, therapy resistant, and frequently the cause of disease relapse. The interaction with the surrounding bone marrow microenvironment is important for the regulation of hematopoietic stem cell fate, and probably also for that of leukemic stem cells (LSCs) (7). Consequently, differential expression of proteins at the plasma membrane could account for the specific interactions of leukemic cells with their niche.
Therefore, the characterization of the plasma membrane proteome of LSCs is fundamental to further unravel the biology of leukemia development. In addition, a better understanding of the membrane proteome features could contribute to improved identification, isolation, and targeting of LSCs.
It is unclear whether there is a common plasma membrane protein signature that generally defines AML, or whether subtypes of leukemia can be identified based on the expression of specific plasma membrane proteins. From a cytogenetic standpoint, AML is a very heterogeneous disease with different levels of classification (8). Leukemic cells often carry several recurring mutations, such as point mutations, insertions, deletions, gene rearrangements, and/or chromosomal translocations (8,9). Deep sequencing has revealed, and will most likely continue to reveal, many more mutations in AML (10,11). This diversity further complicates the search for diagnostic factors. Gene expression profiling has recently been shown to be a valid approach for determining AML signatures and prognostic factors (12,13), especially when it is performed on the CD34+ cell population (14) or on LSC-containing cell populations as defined by engraftment in xenograft models (15). Distinct subgroups could indeed be identified based on transcriptome data. However, it remains to be verified whether these transcriptome changes are also translated into changes at the protein level, and whether unique plasma membrane proteins exist that might aid in the identification of distinct subgroups of AML.
Over the last two decades, advances in mass spectrometry-based technologies have enabled the identification and characterization of diagnostic markers in complex biological samples (16-18). In our study, we used liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) to analyze the plasma membrane proteome of two different AML samples, separated into leukemic stem cell-enriched CD34+ and leukemic stem cell-depleted CD34− fractions (19), to identify specific plasma membrane-associated signatures.
Following this approach, a CD34+-specific plasma membrane protein profile was identified, which included putative AML markers such as CD47, ITGA6, CD44, CD82, and CD135. We then correlated the proteomics results with gene expression profiles of a large cohort of AML CD34+ and normal CD34+ samples, which resulted in the classification of eight AML subgroups, each associated with a specific PM expression profile. Subsequent gene set enrichment analysis (GSEA) revealed that each of the identified subgroups was characterized by specific cellular processes and prognosis.

EXPERIMENTAL PROCEDURES
Isolation of AML CD34+ and CD34− Cells, MS5 Cocultures, and FACS Analysis-AML blasts from peripheral blood or bone marrow of untreated patients with AML were studied after informed consent was obtained in accordance with the Declaration of Helsinki, and the protocol was approved by the Medical Ethical Committee. AML mononuclear cells were isolated by density gradient centrifugation, and CD34+ cells were stained using CD34-PE antibody (BD Biosciences, San Jose, CA, USA) and selected by sorting on a MoFlo (DakoCytomation, Carpinteria, CA, USA). AML cocultures were performed on MS5 stromal cells as described previously (19,20). All fluorescence-activated cell sorter (FACS) analyses were performed on a FACSCalibur (Becton-Dickinson [BD], Alphen a/d Rijn, the Netherlands), and the data were analyzed using WinList 3D (Verity Software House, Topsham, USA) or FlowJo (Tree Star, Oregon, USA) software. Cells were incubated with antibodies at 4°C for 30 min. Antibodies against CD34, CD38, CD135, CD47, and ITGA6 were obtained from BD Biosciences (Breda, The Netherlands), antibodies against CD96 and PTH2R were obtained from Santa Cruz Biotechnology (Santa Cruz, CA, USA), and secondary goat-anti-rabbit-FITC antibodies (used for PTH2R stains) were obtained from Invitrogen (Breda, The Netherlands).
Membrane Protein Purification-AML CD34+ and AML CD34− cell populations were sorted with a MoFlo XDP sorter (Beckman Coulter). The cell suspension was centrifuged, and the pellet was frozen in liquid N2. The cells were quickly thawed, mechanically lysed with six passes through a 30½-gauge needle, and diluted twofold in lysis buffer (50 mM Tris/HCl pH 8, 250 mM sucrose, 2 mM EDTA, 0.2 mM MgCl2, and protease inhibitor mixture). The total cell lysate was depleted of the nuclear fraction by a low-speed centrifugation step (1000 × g for 10 min at 4°C), and the supernatant was layered on top of a 60% sucrose cushion and centrifuged for 2 h at 100,000 × g at 4°C with a TLA100.1 rotor. The top layer was diluted sixfold with 50 mM Tris/HCl, pH 8, and centrifuged for 1 h at 80,000 × g at 4°C. The pellet was resuspended in 100 mM Na2CO3 pH 8.5, 0.1% SDS and incubated with TCEP (tris(2-carboxyethyl)phosphine) for 1 h at 60°C, followed by the addition of methyl methanethiosulfonate (MMTS) for 10 min at room temperature, to reduce and modify cysteine residues. One microgram of Trypsin Gold (mass spectrometry grade, Promega) was added, and the reaction was incubated overnight at 37°C. The sample was then treated with 1 unit of PNGase F (Sigma) for 2 h at 37°C, followed by a second trypsin digestion overnight at 37°C. The tryptic peptides were acidified with 5% formic acid and cleaned up with C18 TopTips (Glygen) according to the manufacturer's instructions, eluting with 80% methanol in 5% formic acid.
Strong Cation Exchange Fractionation-Off-line peptide prefractionation by strong cation exchange (SCX) was performed on a silica-based polysulfoethyl aspartamide column (200 × 2.1 mm, 200 Å, Cat. 202SE0502, PolyLC Inc., Columbia, USA) mounted on an Ettan MDLC system (Amersham Biosciences AB, Uppsala, Sweden) and run at a flow rate of 200 µl/min. The pH of the sample was adjusted to 3.0 with phosphoric acid before separation. Gradient solutions were A: 10 mM triethylammonium phosphate, pH 2.7, 25% acetonitrile; and B: 10 mM triethylammonium phosphate, pH 2.7, 25% acetonitrile, 1 M KCl. Gradient conditions: the column was equilibrated with five column volumes (CV; 1 CV = 0.7 ml) of 100% A. After sample loading, the column was washed with 10 CV of 100% A. Peptides were eluted stepwise: (1) 0 to 5% B in 5 CV; (2) 12 to 30% B in 10 CV; and (3) 24 to 60% B in 5 CV. Fractions were collected every 120 s in a 96-well plate and subsequently dried in a vacuum centrifuge. Eluted peptides were concentrated to ~40 µl in a vacuum centrifuge and diluted 1:2 with 0.2% trifluoroacetic acid. Depending on their complexity, either single fractions or pools of two fractions were analyzed by RP-LC-MS/MS.
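As a worked check of the SCX settings, the elution volumes and run times follow directly from the stated flow rate (200 µl/min), column volume (0.7 ml), and fraction interval (120 s). The short Python sketch below does this arithmetic under two assumptions: the flow rate was constant across all three elution steps, and the equilibration and wash steps are not counted. The helper names are illustrative only.

```python
# Worked arithmetic for the stepwise SCX elution described above.
# Assumption: constant flow over all three elution steps; equilibration
# (5 CV) and wash (10 CV) are not counted.
FLOW_UL_PER_MIN = 200      # 200 µl/min
CV_UL = 700                # 1 column volume = 0.7 ml
FRACTION_INTERVAL_S = 120  # one fraction every 120 s

ELUTION_STEPS = [          # (gradient step, column volumes)
    ("0-5% B", 5),
    ("12-30% B", 10),
    ("24-60% B", 5),
]


def step_minutes(n_cv: int) -> float:
    """Time needed to pump n_cv column volumes at the stated flow rate."""
    return n_cv * CV_UL / FLOW_UL_PER_MIN


def elution_summary():
    """Total elution time (min), number of fractions, and volume per fraction (µl)."""
    total_min = sum(step_minutes(n_cv) for _, n_cv in ELUTION_STEPS)
    n_fractions = total_min * 60 / FRACTION_INTERVAL_S
    fraction_ul = FLOW_UL_PER_MIN * FRACTION_INTERVAL_S / 60
    return total_min, n_fractions, fraction_ul


if __name__ == "__main__":
    for label, n_cv in ELUTION_STEPS:
        print(f"{label}: {n_cv} CV -> {step_minutes(n_cv):.1f} min")
    total_min, n_fractions, fraction_ul = elution_summary()
    print(f"total {total_min:.0f} min, {n_fractions:.0f} fractions of {fraction_ul:.0f} µl")
```

Under these assumptions the 20 elution column volumes take about 70 min and yield about 35 fractions of roughly 400 µl each, which fits comfortably in one 96-well plate.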
Reverse Phase Liquid Chromatography and ESI-MS-SCX fractions resuspended in 5% formic acid were separated on a capillary column (C18 PepMap 300, 75 µm × 250 mm, 3-µm particle size, Dionex, Amsterdam, The Netherlands) mounted in line with a precolumn (EASY-Column C18, 100 µm × 20 mm, 5-µm particle size, Thermo Scientific, Bremen, Germany) on a Proxeon Easy-LC system (Proxeon Biosystems, Odense, Denmark). Solutions of 0.1% formic acid in water and 0.1% formic acid in 100% acetonitrile were used as the mobile phases. A gradient from 2 to 35% acetonitrile was run over 140 min at a flow rate of 200 nl/min. Eluted peptides were analyzed using a linear ion trap-Orbitrap hybrid mass spectrometer (LTQ-Orbitrap, Thermo Scientific). The LTQ was operated in data-dependent mode, in which each full MS scan was followed by MS/MS scans, with dynamic exclusion set to a repeat count of 1, an exclusion duration of 30 s, and an exclusion list size of 500. MS scans were acquired in the Orbitrap in the range from 250 to 2000 m/z, with a resolution of 60,000 (full width at half maximum). The seven most intense ions per scan were submitted to MS/MS fragmentation (35% Normalized Collision Energy™) and detected in the linear ion trap. The capillary temperature was set at 200°C, the spray voltage was 1.7 kV, and the capillary voltage was 39.96 V. Each fraction was analyzed in triplicate with exclusion lists.
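The data-dependent acquisition logic above (top 7 precursors per full scan, dynamic exclusion with repeat count 1, 30 s duration, list size 500) can be illustrated with a simplified Python simulation. This is a sketch of the general idea only, not the instrument firmware: the 10 ppm matching window for excluded masses and the rule that the oldest entry is dropped when the list is full are assumptions, and charge states, isotopes, and intensity thresholds are ignored. The class and function names are hypothetical.

```python
class ExclusionList:
    """Simplified dynamic exclusion with repeat count 1: a precursor m/z is
    excluded for `duration_s` seconds after its first MS/MS event."""

    def __init__(self, duration_s=30.0, max_size=500, tol_ppm=10.0):
        self.duration_s = duration_s
        self.max_size = max_size
        self.tol_ppm = tol_ppm      # assumed m/z matching window
        self._entries = []          # (mz, expiry_time), oldest first

    def _purge(self, t):
        # Forget entries whose exclusion time has elapsed.
        self._entries = [(mz, exp) for mz, exp in self._entries if exp > t]

    def is_excluded(self, mz, t):
        self._purge(t)
        return any(abs(mz - m) / m * 1e6 <= self.tol_ppm for m, _ in self._entries)

    def add(self, mz, t):
        self._entries.append((mz, t + self.duration_s))
        if len(self._entries) > self.max_size:
            self._entries.pop(0)    # assumption: drop the oldest entry when full


def select_precursors(peaks, t, exclusion, top_n=7):
    """Pick up to top_n most intense, non-excluded precursors from one full scan.

    peaks: list of (mz, intensity) pairs; t: scan time in seconds.
    """
    chosen = []
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == top_n:
            break
        if not exclusion.is_excluded(mz, t):
            chosen.append(mz)
            exclusion.add(mz, t)    # repeat count 1: excluded after first MS/MS
    return chosen


if __name__ == "__main__":
    # Ten hypothetical precursors with decreasing intensity.
    peaks = [(400.0 + 50 * i, 1000.0 - 100 * i) for i in range(10)]
    excl = ExclusionList()
    for t in (0.0, 10.0, 45.0):
        print(t, select_precursors(peaks, t, excl))
```

In this toy run, the scan at 10 s fragments only the three precursors not selected at 0 s, and by 45 s the 30 s exclusion has expired, so the seven most intense ions are selected again. This is how dynamic exclusion lets lower-abundance peptides be sampled.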
Database Searching-The MS raw data were submitted to Mascot (version 2.1, Matrix Science, London, UK) and Sequest using the Proteome Discoverer 1.1 analysis platform (Thermo Scientific) and searched against the ipi.HUMAN.v3.83 database, covering 91,464 entries. Precursor and fragment ion mass tolerances were set at 20 ppm and 2.0 Da, respectively, using semitrypsin as protease specificity and allowing up to two missed cleavages. Oxidation of methionine residues, deamidation of asparagine and glutamine, and MMTS modification of cysteines were specified as variable modifications. The MS/MS-based peptide and protein identifications were further validated with the program Scaffold (version Scaffold_3.1.4, Proteome Software Inc., Portland, OR). Protein identifications were accepted if they could be established at greater than 50% probability and contained at least one identified peptide. X! Tandem (thegpm.org; version 2007.01.01.1) was set up to search a subset of the ipi.HUMAN.v3.83 database, also assuming trypsin.
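Two of the search parameters above, the 20 ppm precursor mass tolerance and the allowance of up to two missed cleavages, can be made concrete with a small Python sketch. It assumes the common trypsin rule (cleave C-terminal to K or R, but not before P) and generates fully tryptic peptides only. The semitryptic specificity used in the actual searches, which additionally allows one non-tryptic terminus, is not modeled, nor is the peptide mass calculation. The function names are hypothetical.

```python
import re


def tryptic_peptides(seq: str, max_missed: int = 2):
    """In silico trypsin digest: cleave after K/R unless the next residue is P.

    Returns (peptide, missed_cleavages) pairs with up to `max_missed`
    internal uncleaved sites. Fully tryptic peptides only.
    """
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", seq)]
    # Fragment boundaries; a C-terminal K/R does not create a new fragment.
    bounds = [0] + [s for s in sites if s < len(seq)] + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append((seq[bounds[i]:bounds[j]], j - i - 1))
    return peptides


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_ppm: float = 20.0) -> bool:
    """True if the observed precursor mass falls inside the ppm window."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


if __name__ == "__main__":
    for peptide, missed in tryptic_peptides("AAKGGRPLLRCC"):
        print(peptide, missed)
    print(within_tolerance(1000.01, 1000.0), within_tolerance(1000.03, 1000.0))
```

For a peptide of 1000 Da, a 20 ppm window corresponds to ±0.02 Da, so the 10 ppm deviation in the example is accepted and the 30 ppm deviation is rejected.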
The data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: ivx+b6vfwuUwT9OCMTxjkJNjE+0aF4klO8y/fkQ0sR4A1j1telRKrH38zfjGKGoJ/u0Vlg/s/0hb0FZSw3wCYU6it20AAAAAAAACZg==
Principal Components Analysis-Gene expression profiling of 147 samples in total was performed previously (14) using Illumina HumanHT-12 Expression BeadChips. These 147 samples comprised 40 NBM CD34+, 60 AML CD34+, and 47 AML CD34− samples (see Supplemental File S2). All samples were background-corrected using Illumina GenomeStudio and then jointly forced to positive values, normalized, and transformed using the R packages Bioconductor (21) and lumi (22). Probes with a detection p value larger than 0.01 in all samples, as provided by GenomeStudio, were removed. Log2 transformation and quantile normalization were applied. As a quality control measure, we performed a principal component analysis on the correlation matrix of all 147 samples (23). The first component was removed from the data (24). To ensure the reliability and reproducibility of the results, we used multivariate permutations to determine significance, using a false discovery rate of 0.4.
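The preprocessing steps above (detection p value filter, log2 transformation, quantile normalization, and removal of the first principal component from a PCA on the sample correlation matrix) can be sketched in NumPy. This is an illustration under stated assumptions, not the lumi/Bioconductor pipeline that was actually used. Background correction and lumi's forcing to positive values are omitted, ties in quantile normalization are broken arbitrarily, and the input matrices are hypothetical, with probes in rows and samples in columns.

```python
import numpy as np


def filter_probes(expr, detection_p, p_cutoff=0.01):
    """Drop probes whose detection p value exceeds the cutoff in every sample."""
    keep = ~(detection_p > p_cutoff).all(axis=1)
    return expr[keep], keep


def quantile_normalize(x):
    """Give every sample (column) the same distribution: the mean of the sorted columns.

    Ties are broken arbitrarily by argsort (a simplification).
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    return reference[ranks]


def remove_first_component(x):
    """Subtract the first principal component of a PCA on the sample correlation matrix.

    Each sample (column) is standardized across probes, so the right singular
    vectors of the standardized matrix are the eigenvectors of the
    sample-by-sample correlation matrix. The rank-1 term of the first component
    is subtracted, and each column's original mean and scale are restored.
    """
    mu = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    z = (x - mu) / sd
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    z_resid = z - s[0] * np.outer(u[:, 0], vt[0])
    return z_resid * sd + mu


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.lognormal(mean=7.0, sigma=1.0, size=(1000, 6))  # hypothetical intensities
    det_p = rng.uniform(0.0, 0.02, size=raw.shape)            # hypothetical detection p values
    expr, keep = filter_probes(raw, det_p)
    normalized = quantile_normalize(np.log2(expr))
    corrected = remove_first_component(normalized)
    print(f"{keep.sum()} of {len(keep)} probes kept; corrected matrix {corrected.shape}")
```

Removing the first component in this way discards the single direction that explains most of the between-sample correlation. Such a direction often reflects a global technical effect rather than biology.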
Information Gain-Information gain (IG) is a measure of the expected reduction in information entropy in the presence of a known variable (25). More specifically, in our case we used two sets of samples, AML CD34+ and NBM CD34+. This cell-type annotation carries a certain information entropy, and without prior knowledge about these samples it would not be possible to classify them correctly. By making use of additional information available about these samples, in our case gene expression profiles, one can reduce the information entropy of the cell-type annotation, which aids in classifying the cell type of a sample. The IG is the reduction in information entropy obtained by using this extra knowledge about the samples. In our case, we calculated the IG using the gene expression levels of PM proteins that were up-regulated in AML CD34+ cells. The IG allowed us to prioritize genes based on their value in predicting the cell type.
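Written out, IG = H(Y) - H(Y | X), where Y is the cell-type label (AML CD34+ or NBM CD34+), X is the expression of one gene, and H is the Shannon entropy. The text does not state how continuous expression values were discretized. The Python sketch below therefore assumes a single best-threshold split, as in decision-tree learning. The function names are illustrative.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(expression, labels):
    """IG (bits) of one gene for the class labels, using the best single threshold.

    Assumption: continuous expression is discretized by one cut point, chosen
    to maximize IG = H(Y) - H(Y | expression below/above the cut).
    """
    x = np.asarray(expression, dtype=float)
    y = np.asarray(labels)
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]
    h_y = entropy(y)
    n = len(y)
    best = 0.0
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue  # only cut between distinct expression values
        h_cond = (i * entropy(y[:i]) + (n - i) * entropy(y[i:])) / n
        best = max(best, h_y - h_cond)
    return best


if __name__ == "__main__":
    labels = ["NBM"] * 3 + ["AML"] * 3
    print(information_gain([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], labels))
```

With balanced classes, a gene that separates AML from NBM samples perfectly reaches the maximum of 1 bit, and a gene whose expression carries no class information scores 0. This is the sense in which IG ranks genes by their predictive value for the cell type.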

RESULTS
Identification of the Plasma Membrane Proteome of Leukemic Stem Cell-Enriched Fractions of Primary Leukemia Patient Samples-To investigate the plasma membrane signature of leukemic stem cells, we selected cells from two patients: a poor-risk AML patient (FAB M1, FLT3-ITD, NPM1wt; inv(3q), −7, −10; designated AML1) and a myeloid blast crisis patient sample (CML patient that progressed to AML; FLT3wt, NPM1wt, inv(16) and t(9;22); designated AML2). Both samples were sorted into CD34+ and CD34− populations to obtain stem cell-enriched and stem cell-depleted fractions (3,14,19,26). About five million cells could be sorted per sample, and because of this relatively small number of cells we chose to simplify the membrane purification procedure, thus minimizing the loss of membrane proteins. Therefore, after cell lysis and a low-speed centrifugation step, the sample preparation included only a sucrose cushion centrifugation to separate the membrane component from the nuclear and cytoplasmic fractions, as summarized in the scheme of Fig. 1A. Because of this technical approach, the resulting membrane-enriched fractions were still a very complex mixture of cellular and subcellular membrane proteins. To deal with this high level of sample complexity and still obtain a comprehensive inventory of the plasma membrane proteome, we applied a multidimensional protein identification technology (MudPIT) approach (27) combined with a high-resolution LC-MS/MS workflow. After protein digestion, the resulting tryptic peptide mixtures were separated by off-line SCX chromatography followed by reverse-phase (RP) chromatography on a column directly coupled to an LTQ-Orbitrap mass spectrometer. Each sample was eluted with a 4-hour gradient to improve the separation of peptide peaks. Each SCX fraction was analyzed in three technical replicates using incremental exclusion lists, which have been shown to increase the number of identified peptides in highly complex samples in label-free systems (28).
The MS/MS spectra were then searched against the ipi.HUMAN database using Mascot, Sequest, and X!Tandem to increase the confidence of protein identification. The results were further validated with the program Scaffold. The total number of proteins identified with at least one unique peptide was 3504 for AML1 CD34+, 1162 for AML1 CD34−, 2580 for AML2 CD34+, and 4058 for AML2 CD34− (Fig. 1B and Supplemental File S1). Gene ontology annotation indicated that 32% of all identified proteins were classified as "membrane," of which 61% were plasma membrane proteins (Fig. 1C). In total, 867 and 610 unique AML CD34+ PM-associated proteins (PM+) could be detected in AML1 and AML2, respectively, of which 619 and 386 were annotated as specific PM proteins (Fig. 1B and Supplemental File S1). Fewer PM proteins were detected in the CD34− fraction of AML1 compared with AML2, although we do not know whether differences in the heterogeneity of the CD34− compartments or technical issues underlie this observation. As shown in the Venn diagram, only limited overlap in plasma membrane (-associated) proteins was observed between the CD34+ and CD34− fractions within each AML patient sample (Fig. 1B). Apparently, the leukemic stem cell-enriched CD34+ population is quite distinct from the leukemic stem cell-depleted CD34− fraction in terms of its plasma membrane proteome composition. Likewise, the CD34+ plasma membrane proteomes of the two AML samples overlapped only partially (Fig. 1D). This indicates that, as expected, there is considerable heterogeneity between the plasma membrane proteomes of individual patients as well. Gene ontology annotation for biological processes, using the combined list of all identified AML CD34+ PM proteins, revealed enrichment for processes such as cell adhesion, ion transport, cell migration, and cytoskeleton organization (Fig. 1E).
A short list of identified AML CD34+ plasma membrane proteins is shown in Fig. 1F.
Identification of Leukemic Stem Cell Markers Using a Transcriptomics Approach-We determined the gene expression profiles of AML and NBM samples using Illumina BeadArrays. Transcriptomes of NBM CD34+ samples (n = 40) were compared with those of 60 AML CD34+ stem cell-enriched samples and 47 paired leukemic stem cell-depleted CD34− samples (an overview of patient characteristics is provided in Supplemental File S2) (14). Genes that were more highly expressed in the AML CD34+ fraction were identified using a one-sided Kruskal-Wallis U test, and significance was determined by multivariate permutation (MP) (29), which robustly limits the false discovery rate (30) of the analysis. Multivariate permutation exploits the correlation structure of the data and combines the low false positive rate of the Bonferroni correction with the high true positive rate of the Benjamini and Hochberg FDR correction. Within these AML CD34+-specific transcriptomes, GO annotation was used to select all genes associated with the GO terms plasma membrane (GO:0005886), external side of plasma membrane (GO:0009897), integral to plasma membrane (GO:0005887), and cell surface (GO:0009986). Thus, 238 AML CD34+-specific up-regulated probe sets encoding 200 unique genes were identified (Fig. 2A, Supplemental File S3). Fig. 2B shows a supervised cluster analysis of these differentially expressed probe sets in NBM CD34+ versus AML CD34+ samples, with the top 20 up-regulated genes indicated.
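The selection step can be sketched in Python. For two groups, the rank-based test that supports a one-sided alternative is the Mann-Whitney U (Wilcoxon rank-sum) test, so the sketch uses SciPy's `mannwhitneyu` with `alternative="greater"`. The multivariate permutation procedure cited above (29) is not reproduced here. As a plainly simpler stand-in, the sketch permutes sample labels, which keeps the gene-gene correlation structure intact, and estimates the FDR as the mean number of discoveries in permuted data divided by the observed number. The GO filter uses the four GO terms listed in the text. The annotation dictionary and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# GO terms used in the text to define plasma membrane-associated genes.
PM_GO_TERMS = {"GO:0005886", "GO:0009897", "GO:0005887", "GO:0009986"}


def one_sided_pvalues(aml, nbm):
    """Per-gene one-sided test (H1: higher in AML). Inputs are genes x samples."""
    return np.array([mannwhitneyu(a, b, alternative="greater").pvalue
                     for a, b in zip(aml, nbm)])


def permutation_fdr(aml, nbm, alpha=0.001, n_perm=100, seed=0):
    """Flag genes with p < alpha and estimate the FDR by permuting sample labels.

    Whole columns are shuffled, so correlations between genes are preserved.
    Simplified stand-in, not the multivariate permutation method of ref. 29.
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([aml, nbm])
    n_aml = aml.shape[1]
    significant = one_sided_pvalues(aml, nbm) < alpha
    false_counts = []
    for _ in range(n_perm):
        shuffled = data[:, rng.permutation(data.shape[1])]
        p_perm = one_sided_pvalues(shuffled[:, :n_aml], shuffled[:, n_aml:])
        false_counts.append(int((p_perm < alpha).sum()))
    fdr = float(np.mean(false_counts)) / max(int(significant.sum()), 1)
    return significant, min(1.0, fdr)


def plasma_membrane_genes(genes, go_annotations):
    """Keep genes annotated with at least one of the plasma membrane GO terms."""
    return [g for g in genes if go_annotations.get(g, set()) & PM_GO_TERMS]
```

Permuting whole sample columns, rather than each gene independently, is what preserves the correlation between genes. This is the property the text credits multivariate permutation with exploiting.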
Comparison of the Plasma Membrane Proteome and Transcriptome of AML CD34+ Populations-Next, the data sets obtained from our proteomics and transcriptomics approaches were compared. Among the 200 genes up-regulated in AML CD34+ cells at the transcriptome level, 59 were also detected at the proteome level (Fig. 2C, Supplemental File S3). GO annotation for the term molecular function showed enrichment of signal transducer activity, receptor activity, kinase activity, integrin binding, cytokine binding, receptor binding, and calcium ion binding (Fig. 2C). These 59 genes probably belong to a larger set of proteins that define differences in the plasma membrane proteome between AML and NBM CD34+ cells. However, because the expression of these 59 genes was confirmed at both the transcriptome and proteome levels, we consider this list to contain putative leukemic stem cell markers that can be used to further understand the molecular biology of AML.
Verification and Functional Characterization of a Number of Putative Leukemic Stem Cell Markers-First, we set out to determine the stem cell frequencies within the CD34+/CD38− and CD34+/CD38+ compartments in 10 primary AML patient samples by long-term culture-initiating cell (LTC-IC) assays in limiting dilution. As shown in Fig. 3A, there was a high level of heterogeneity in the percentages of CD34+/CD38− and CD34+/CD38+ cells. We also observed that AML LTC-IC activity was not restricted to the CD34+/CD38− fraction, but that stem cell activity was present within the CD34+/CD38+ compartment as well, in line with previously published data (15,31). Therefore, we focused our further studies on the AML CD34+ compartment.
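The text does not state how LTC-IC frequencies were calculated from the limiting-dilution data. A common approach, assumed here, is the single-hit Poisson model. Under this model, a well seeded with d cells is negative with probability exp(-f d), where f is the stem cell frequency, and f is estimated by maximum likelihood over all cell doses. The Python sketch below implements that model with SciPy. The function name and the example numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def ltcic_frequency(cells_per_well, n_wells, n_negative):
    """Maximum-likelihood stem cell frequency under the single-hit Poisson model.

    cells_per_well, n_wells, n_negative: one entry per cell dose.
    A well seeded with d cells is negative with probability exp(-f * d).
    """
    d = np.asarray(cells_per_well, dtype=float)
    w = np.asarray(n_wells, dtype=float)
    r = np.asarray(n_negative, dtype=float)

    def neg_log_likelihood(log_f):
        f = np.exp(log_f)
        log_p_neg = -f * d
        log_p_pos = np.log(-np.expm1(-f * d))  # log(1 - exp(-f d)), numerically stable
        return -np.sum(r * log_p_neg + (w - r) * log_p_pos)

    # Optimize over log(f) so the search covers frequencies from ~2e-9 to 1.
    res = minimize_scalar(neg_log_likelihood, bounds=(-20.0, 0.0),
                          method="bounded", options={"xatol": 1e-8})
    return float(np.exp(res.x))


if __name__ == "__main__":
    # Hypothetical readout: 12 wells per dose, negative wells counted after culture.
    f = ltcic_frequency([1000, 300, 100], [12, 12, 12], [1, 5, 10])
    print(f"estimated LTC-IC frequency: 1 in {1 / f:.0f} cells")
```

For a single dose the estimate reduces to f = -ln(negative fraction) / d. For example, 10 of 20 wells negative at 100 cells per well gives f = ln 2 / 100, or about 1 in 144 cells.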
Evaluating Heterogeneity in Plasma Membrane Markers in AML-Leukemia is not a single disease; rather, a number of different leukemia subtypes exist. Such subtypes might also differ in how leukemic cells interact with and respond to their environment, and thus in their plasma membrane composition. Therefore, we set out to determine whether leukemia subtypes could be distinguished on the basis of differential expression of PM proteins in our transcriptome data. To select the best discriminating uncorrelated markers, we designed the following algorithm. First, we calculated the information gain (25) for all genes. Then the gene with the highest information gain was selected, and all genes that were moderately correlated with this gene (Pearson's r ≥ 0.1) were removed. This process was repeated on the remaining genes until we obtained the best possible list of uncorrelated genes that were candidate leukemic stem cell markers. The whole process is described in Algorithm 1 (see Supplemental Materials and Methods). Thus, eight plasma membrane markers were identified that were almost completely uncorrelated and could significantly discriminate eight subgroups of AML within our cohort of 60 samples, namely FLT3, GPR114, ITGA5, CD44, TNFRSF10B, PTH2R, FCGR1A, and TMEM5, ranked in decreasing order of IG. Expression of the majority of these could also be confirmed at the proteome level (Supplemental File S1). The supervised cluster analysis shown in Fig. 4A indicates that AML CD34+ samples can be separated from NBM CD34+ samples on the basis of the expression of these eight markers. The eight identified subgroups were not associated with a particular karyotype, risk group, or FLT3-ITD or NPM1 mutation status. Finally, we asked whether these eight subgroups would be characterized by specific cellular processes.
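The iterative selection described above can be sketched in Python as a greedy loop. This is a reading of the prose, not a transcription of Algorithm 1 in the supplement. It assumes that an information gain value has already been computed for each gene. It also applies the cutoff to the signed Pearson r, as written in the text, so genes with r ≥ 0.1 to the selected gene are removed; whether |r| was used instead is not stated. The function name and example data are hypothetical.

```python
import numpy as np


def select_uncorrelated_markers(expr, ig, r_cutoff=0.1):
    """Greedy marker selection by information gain with correlation pruning.

    expr: dict gene -> 1-D array of expression values across samples
    ig:   dict gene -> information gain of that gene
    Repeatedly picks the remaining gene with the highest IG and discards every
    remaining gene whose Pearson r with it is >= r_cutoff (signed r, as in the text).
    """
    remaining = set(expr)
    selected = []
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical among equal IG)
        best = max(sorted(remaining), key=lambda g: ig[g])
        selected.append(best)
        remaining.discard(best)
        remaining = {g for g in remaining
                     if np.corrcoef(expr[best], expr[g])[0, 1] < r_cutoff}
    return selected


if __name__ == "__main__":
    expr = {
        "A": np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
        "B": np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0]),    # nearly collinear with A
        "C": np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]),  # unrelated pattern
    }
    print(select_uncorrelated_markers(expr, {"A": 0.9, "B": 0.8, "C": 0.5}))
```

Because every kept gene must be weakly correlated with all previously selected genes, the loop tends to return a short list of markers that each capture a different expression pattern. This matches the text's aim of finding markers that define distinct subgroups.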
Expression of all genes was ranked according to their Pearson correlation coefficient with each of the eight uncorrelated plasma membrane markers. Thus, we generated eight individual lists, each headed by one of the eight identified plasma membrane proteins. Genes whose expression correlated strongly with the plasma membrane marker reside at the top of these lists, and correlation decreases toward the bottom. To evaluate whether these lists would help to further understand biological aspects of these subgroups, we performed GSEA; the data are summarized in Fig. 4B and Supplemental File S4. The eight subgroups were enriched for very specific GSEA terms. The FLT3 group, characterized by the highest IG, was enriched for genes associated with doxorubicin resistance, MYC signaling, glucose metabolism, and stem cell signatures. Even though this group was not significantly enriched for AMLs carrying FLT3-ITDs, we did observe enrichment for FLT3-ITD gene expression programs, suggesting that AMLs with high FLT3 expression might use similar signaling pathways. The association with stem cell and MYC signatures was further confirmed by performing GSEA directly with gene sets obtained from Wong and colleagues (ESC-like module (32)) and Neff and colleagues (MYC signature (33)). Enrichment for MYC and stem cell signatures was also observed for the PTH2R and TMEM5 groups. The ITGA5 group was enriched for gene sets associated with adhesion, actin cytoskeleton, CXCR4, and integrin signaling, whereas enrichment for MET signaling was observed for the CD44 and TNFRSF10B groups. Finally, we analyzed whether any of the subgroups was enriched for good or poor prognosis gene sets as described by Yagi and colleagues (34). A strong and significant correlation with the poor prognosis signature was observed for the GPR114 and TMEM5 groups. Associations with the good prognosis signature were observed for the following groups: FLT3, CD44, PTH2R, ITGA5, and FCGR1A.
In the last two groups (ITGA5 and FCGR1A), there was a concomitant negative correlation with the poor prognosis signature (Fig. 4B).

DISCUSSION
Hematopoietic stem cells reside within specialized niches in the bone marrow, with which they interact via proteins within the PM. Changes in these interactions might alter hematopoietic stem cell fate and ultimately result in hematological malignancies, including AML. AML is still difficult to treat, often because of disease relapse caused by therapy-resistant LSCs. Thus, the identification of markers to recognize and ultimately target LSCs is warranted.
The main aim of the current study was to characterize in detail the plasma membrane composition of primary leukemic stem and progenitor cells from leukemia patients. We used proteome and transcriptome approaches, both of which have advantages and disadvantages. The transcriptome can easily be quantified in a large series of samples with low cell numbers, but the presence of mRNA does not always correlate directly with protein expression. In contrast, the proteomics approach provides insight into whether certain proteins reside in the plasma membrane, but the proteome is much more difficult to quantify, particularly when only limited numbers of cells can be obtained, as is the case for primary leukemic stem and progenitor cells isolated from patient samples. Furthermore, the absolute number of plasma membrane identifications strongly depends on the number of cells analyzed; in particular, low amounts of starting material limit the precise quantification of the least abundant membrane proteins. We therefore combined proteome and transcriptome approaches to gain further insight into the PM proteome composition of primary hematopoietic stem cells isolated from AML patients.
The isolation of plasma membrane proteins for proteomic analysis by mass spectrometry has recently been reported for embryonic stem cells, murine hematopoietic stem and progenitor cells, and carcinoma cell lines (35-38). We have now adapted these methods to gain further insight into the plasma membrane proteome of primary AML patient cells. We were able to analyze sorted CD34+ stem cell-enriched and CD34− stem cell-depleted AML populations by shotgun proteomics. Among all identified proteins, 32% could be annotated as membrane proteins, of which 61% were plasma membrane proteins. Thus, 619 and 386 unique plasma membrane proteins were identified in the CD34+ compartments of AML1 and AML2, respectively. These lists included novel markers like CD82, CD97, CD99, PTH2R, ESAM, MET, and ITGA6, as well as previously described ones such as CD47, CD44, CD135, CD96, and ITGA5.
FIG. 4. Evaluating heterogeneity in plasma membrane markers in AML. A, By identifying the best discriminating uncorrelated markers using an information gain approach (see the "Results" and Materials and Methods sections for details), we were able to identify eight plasma membrane markers that were almost completely uncorrelated. Supervised cluster analysis of the expression of these eight markers in AML CD34+ and NBM CD34+ samples is shown. B, An overview of the eight uncorrelated markers, including their information gain, is shown. Gene set enrichment analysis (GSEA) of the eight plasma membrane markers indicates that the identified subgroups are associated with specific gene signatures. NS denotes not significant.
Although these novel plasma membrane proteomes will help to design focused future studies to further unravel the biology of leukemias, we realize that this study was based on only two patients. Also, our proteome approach did not allow a quantitative evaluation of which of these plasma membrane proteins were more highly expressed in leukemic stem and progenitor cells than in normal CD34+ stem and progenitor cells. Therefore, we continued with a transcriptome approach in which 60 AML patient samples were sorted into CD34+ stem cell-enriched and CD34− leukemic stem cell-depleted fractions (of which 47 could be analyzed). For comparison, 40 NBM CD34+ samples were also included in the analyses. Thus, 238 probe sets encoding 200 unique plasma membrane-associated genes were identified that were significantly up-regulated in the AML CD34+ fraction. Of these 200, 59 were indeed expressed at the plasma membrane protein level based on our proteome studies. It is currently unclear why the remaining 141 were not identified in our proteome analysis. It is possible that (some of) these transcripts are not translated into protein, or that technical issues, such as the limited number of cells available for analysis and/or the presence of large hydrophobic membrane-spanning regions, make these proteins more difficult to detect. Nevertheless, our proteome and transcriptome data together indicate that these 59 plasma membrane proteins are overexpressed in leukemic stem cell-enriched CD34+ cells. Increased expression of a number of these, including CD135, CD47, ITGA6, CD96, and PTH2R, was further confirmed at the protein level in an independent cohort of AML patients by FACS analyses.
In our analysis, CD135 is the most strongly and most significantly up-regulated plasma membrane protein-encoding gene in AML CD34+ cells compared with normal BM CD34+ cells at the RNA level, and increased expression was also confirmed at the protein level. CD135 (or FLT3) is a membrane receptor that is expressed in the majority of AML cases (39,40). Activating mutations such as internal tandem duplications (FLT3-ITD) and point mutations in the tyrosine kinase domain (FLT3-TKD) are present in about 30% of AML patients (41,42). Overexpression of FLT3 was found to be an unfavorable prognostic factor for overall survival in AML cases without FLT3-ITD (43). Indeed, very high expression of CD135 was also observed in the absence of FLT3-ITDs, suggesting that targeting this pathway might also be beneficial in patients who do not carry mutations in this receptor.
Similarly, we observed that CD47 expression is increased in AML CD34+ cells at both the RNA and protein levels, although there was considerable heterogeneity among samples. CD47 overexpression has been shown to be associated with decreased overall survival in human AML (44). Its interaction with signal regulatory protein alpha (SIRPα) is involved in cell-to-cell communication, preventing the phagocytosis of red blood cells or platelets by macrophages (45,46). In AML, it has been shown that disruption of the CD47-SIRPα interaction using monoclonal antibodies leads to phagocytosis of AML LSCs and inhibition of engraftment (44). Other previously described leukemic stem cell markers for which we provide additional evidence include CD96 (47), ITGA5 (48), CD44 (49-51), and IL3RA (52,53). Aberrant expression of CD97 and CD99 has been observed particularly in lymphoid malignancies (54); however, in line with observations by Akashi and colleagues (55), we observe up-regulation in AML as well.
Interesting new potential leukemic stem cell markers identified in our analysis include CD82, PTH2R, ESAM, MET, and ITGA6. Recently, CD82 was shown to mediate homing and engraftment of human stem and progenitor cells (56). Parathyroid hormone receptors are typically associated with formation of the stem cell niche (57), but we find them to be up-regulated in AML CD34+ cells as well. ESAM was recently reported as a marker of actively cycling human stem cells that nonetheless retain long-term reconstitution activity (58), and it will be interesting to determine its role in myeloid leukemias. The hepatocyte growth factor receptor MET has been shown to play an essential role in numerous cancers (59), and autocrine activation of MET was recently found to be frequent in AML (60). Notta and colleagues described that integrin alpha 6 (ITGA6, CD49f) is expressed on human hematopoietic stem cells and can be used to isolate single cells that provide long-term reconstitution in mice (61). Interestingly, we find that ITGA6 is strongly up-regulated in AML CD34+ cells. Based on these results, we have initiated preliminary functional studies, which revealed that cells capable of long-term in vitro expansion indeed reside within the CD34+/ITGA6+ compartment. Ongoing studies include in vivo experiments evaluating long-term engraftment in xenograft models to further validate these findings.
Leukemia is clearly not a single disease. Typically, patients are grouped into risk categories based on their prognosis, in which karyotype, mutation status of genes such as FLT3, NPMc, and CEBPα, or expression levels of genes such as EVI1 and BAALC play a dominant role (8). Because biological differences between types of leukemia might also be initiated by interactions of leukemic cells with their environment, possibly based on differences in the composition of their plasma membrane proteome, we set out to determine whether leukemia subtypes could be associated with differential expression of PM proteins. By identifying the best discriminating uncorrelated genes with an iterative approach based on information gain (IG; see the Results and Materials and Methods sections for details), we identified eight plasma membrane markers that were almost completely uncorrelated within our cohort of 60 AML samples: FLT3, GPR114, ITGA5, CD44, TNFRSF10B, PTH2R, FCGR1A, and TMEM5, in decreasing order of IG. Although our cohort of 60 patients was too small for a thorough statistical analysis of whether the identified subgroups were associated with a particular karyotype, mutation status, or risk group, an unsupervised cluster analysis based on the expression of the eight markers revealed several distinct clusters, some of which were enriched for mutated NPMcyt and FLT3-ITDs. Furthermore, for each of the eight uncorrelated plasma membrane markers, we ranked the expression of all genes by their Pearson correlation coefficient with that marker, generating eight individual ranked lists. Next, we used GSEA to evaluate whether these lists correlated with previously published transcriptome sets.
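The marker-selection and ranking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (whose details are in the Results and Materials and Methods sections): we assume that information gain is computed for a binary class label (for example AML CD34+ versus NBM CD34+) after binarizing each gene at its median expression, and that "uncorrelated" means an absolute Pearson r at or below a hypothetical threshold `max_abs_r`. All function names are hypothetical.

```python
import numpy as np


def entropy(labels: np.ndarray) -> float:
    """Shannon entropy (bits) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature: np.ndarray, labels: np.ndarray) -> float:
    """IG of the labels after splitting samples on one gene.

    Assumption: the gene is binarized as above vs. at-or-below its median.
    """
    high = feature > np.median(feature)
    gain = entropy(labels)
    for mask in (high, ~high):
        if mask.any():
            # Subtract the weighted entropy of each side of the split.
            gain -= mask.mean() * entropy(labels[mask])
    return gain


def select_uncorrelated(expr: np.ndarray, labels: np.ndarray,
                        n_markers: int = 8, max_abs_r: float = 0.5) -> list:
    """Greedy selection on a genes x samples matrix: walk genes in order of
    decreasing IG and keep a gene only if its |Pearson r| with every gene
    already kept is <= max_abs_r (hypothetical threshold)."""
    ig = np.array([information_gain(g, labels) for g in expr])
    r = np.corrcoef(expr)  # gene-by-gene Pearson correlation matrix
    selected = []
    for idx in np.argsort(-ig):
        if all(abs(r[idx, s]) <= max_abs_r for s in selected):
            selected.append(int(idx))
        if len(selected) == n_markers:
            break
    return selected


def rank_by_correlation(expr: np.ndarray, marker_idx: int) -> np.ndarray:
    """Gene indices ordered by Pearson r with one marker, most positive first
    (a preranked list for GSEA; the marker itself comes first with r = 1)."""
    r = np.corrcoef(expr)[marker_idx]
    return np.argsort(-r)
```

Walking the genes once in IG order is equivalent to repeatedly picking the highest-IG gene that is still compatible with the markers chosen so far, which is one plausible reading of an "iterative approach that applies information gain".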
First, we analyzed whether enrichment would be observed with gene sets that had been associated with poor or good prognosis (Yagi et al. 2003). Two of the identified subgroups correlated positively with a good-prognosis signature and negatively with a poor-prognosis signature. Conversely, one subgroup showed a strong positive correlation with poor prognosis and a negative correlation with good prognosis. Although these observations are interesting, they are not directly linked to clinical outcome in our patients, and further studies are required to confirm whether these findings are clinically relevant. GSEA analyses also indicated that the eight subgroups could be characterized by specific cellular processes, pointing to their possible biological relevance. For instance, a strong positive correlation with MYC signatures was identified in three of the eight subgroups, a significant negative correlation in one, and no significant correlation in the remaining four. Furthermore, whether certain oncogenes enforce gene expression profiles similar to those of self-renewing embryonic stem cells is extensively debated in the field, and we find that three of the eight subgroups correlate strongly and positively with the embryonic-like signatures defined by Wong et al. (32). Clearly, definitive statements cannot be made based on GSEA approaches alone, and future studies will be required to functionally validate these findings. However, these data will be useful for the design of such experiments, and it will be interesting to characterize these differences in detail and to analyze whether the differences in the plasma membrane transcriptome described here allow a deeper understanding of the biology of the various subtypes of human myeloid leukemia.
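To make the GSEA step more concrete, the running-sum enrichment score of the published GSEA method (Subramanian et al., 2005) on one correlation-ranked list can be sketched as below. This is a simplified illustration, assuming the standard weighted statistic with weight exponent `p = 1`; it omits the permutation-based significance testing and score normalization that the GSEA software performs, and the function name and toy gene names are hypothetical.

```python
import numpy as np


def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: gene names sorted by decreasing correlation with a marker.
    correlations: the matching correlation values, in the same order.
    gene_set: collection of gene names (e.g. a MYC or prognosis signature).
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    n, n_hit = len(ranked_genes), int(in_set.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranked list")
    # Hits are weighted by |correlation|^p; misses all weigh the same.
    weights = np.abs(np.asarray(correlations, dtype=float)) ** p
    hit = np.where(in_set, weights, 0.0)
    p_hit = np.cumsum(hit) / hit.sum()
    p_miss = np.cumsum(~in_set) / (n - n_hit)
    running = p_hit - p_miss
    # ES is the running-sum value with the largest deviation from zero.
    return float(running[np.argmax(np.abs(running))])


genes = ["A", "B", "C", "D", "E", "F"]
corr = [0.9, 0.7, 0.5, -0.1, -0.5, -0.8]
print(enrichment_score(genes, corr, {"A", "B"}))  # set at the top of the list
print(enrichment_score(genes, corr, {"E", "F"}))  # set at the bottom
```

A positive score means the gene set is concentrated among genes positively correlated with the marker (the pattern reported above for MYC signatures in three subgroups), whereas a negative score means it is concentrated among anticorrelated genes.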