Heterogeneity of B cell lymphopoiesis in patients with premalignant and active myeloma

To better characterize the heterogeneity of multiple myeloma (MM), we profiled plasma cells (PCs) and their B cell lymphopoiesis in the BM samples from patients with monoclonal gammopathy of undetermined significance, smoldering MM, and active MM by mass cytometry (CyTOF) analysis. Characterization of intra- and interneoplastic heterogeneity of malignant plasmablasts and PCs revealed overexpression of the MM SET domain (MMSET), Notch-1, and CD47. Variations in upregulation of B cell signaling regulators (IFN regulatory factor 4 [IRF-4], CXCR4, B cell lymphoma 6 [Bcl-6], c-Myc, myeloid differentiation primary response protein 88 [MYD88], and spliced X box-binding protein 1 [sXBP-1]) and aberrant markers (CD319, CD269, CD200, CD117, CD56, and CD28) were associated with different clinical outcomes in clonal PC subsets. In addition, prognosis was related to heterogeneity in subclonal expression of stemness markers, including neuroepithelial stem cell protein (Nestin), SRY-box transcription factor 2 (Sox2), Krüppel-like factor 4 (KLF-4), and Nanog. Furthermore, we have defined significantly elevated levels of MMSET, MYD88, c-Myc, CD243, Notch-1, and CD47 from hematopoietic stem cells to PCs in myeloma B cell lymphopoiesis, noted even in premalignant conditions, with variably modulated expression of B cell development regulators, including IRF-4, Bcl-2, Bcl-6, and sXBP-1; aberrant PC markers (such as CD52, CD44, CD200, CD81, CD269, CD117, and CXCR4); and stemness-controlling regulators, including Nanog, KLF-4, octamer-binding transcription factor 3/4 (Oct3/4), Sox2, and retinoic acid receptor α2 (RARα2). This study provides the rationale for precise molecular profiling of patients with MM by CyTOF technology to define disease heterogeneity and prognosis.


Evaluation of metal-conjugated antibody efficacy
To evaluate metal-conjugated antibody efficacy, metal-conjugated antibody was digested at supplemented with 0.05% sodium azide (NaN3) and stored at 4 °C. All antibody-metal conjugates were titrated for optimal concentration of use and evaluated for their efficacy using positive controls, either human peripheral blood mononuclear cells or human cell lines, listed in Supplementary Table S2.

Sample collection
Fresh BM aspirates were collected immediately (< 1 minute) after aspiration into sodium heparinized tubes (BD Biosciences, San Jose, CA). Samples were then fixed with proteomic stabilizer buffer (Smart Tube, San Carlos, CA) according to the manufacturer's instructions, by adding 1.4 ml of proteomic stabilizer buffer to 1 ml of BM sample and incubating for 10 min at room temperature (RT) on a rotator, and then frozen at -80 °C. BM samples were thawed just prior to analysis in a 4 °C cold water bath. Erythrocytes were lysed with hypotonic "1 x thaw-lyse" buffer (concentrate diluted 1000-fold with distilled water; Smart Tube, San Carlos, CA) at a ratio of 4:1 to the total volume of the sample and incubated for 10 min at RT. To pellet leukocytes, samples were centrifuged at 600 x g for 6 min at RT and the supernatant was discarded. For complete lysis of erythrocytes, the steps with 1 x thaw-lyse buffer and its removal were repeated.
Cells were then washed twice in cell staining medium (CSM; 1 × phosphate buffered saline (PBS) with 0.5% bovine serum albumin (BSA) and 0.02% NaN3) and collected at 600 x g for 6 min at RT.
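The volume arithmetic of the fixation and lysis steps can be sketched as follows. This is a minimal Python illustration and not part of the original protocol: the function names are hypothetical, and it assumes that the 4:1 ratio means four volumes of 1 x thaw-lyse buffer per volume of fixed sample (BM plus stabilizer).

```python
def stabilizer_volume_ml(bm_ml):
    """Proteomic stabilizer buffer to add: 1.4 ml per 1 ml of BM sample."""
    return 1.4 * bm_ml


def lysis_buffer_volume_ml(sample_total_ml, ratio=4.0):
    """1 x thaw-lyse buffer to add, assuming 4:1 (buffer:sample) by volume."""
    return ratio * sample_total_ml


def concentrate_volume_ul(working_ml, dilution=1000):
    """Concentrate (in microliters) needed to prepare `working_ml` of
    working buffer by a 1000-fold dilution in distilled water."""
    return working_ml * 1000.0 / dilution


# Example: a 2 ml BM aspirate (hypothetical input volume)
bm = 2.0
fixative = stabilizer_volume_ml(bm)
lysis = lysis_buffer_volume_ml(bm + fixative)
print(f"stabilizer: {fixative:.2f} ml, lysis buffer: {lysis:.2f} ml, "
      f"concentrate: {concentrate_volume_ul(lysis):.2f} ul")
```

If the ratio is instead meant relative to the BM volume alone, only the argument passed to `lysis_buffer_volume_ml` changes.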

Cell sorting
Samples were thawed as described above, and part of each sample aliquot was stained with anti-human CD15-PerCP-Cy5.5 mouse antibody (clone HI98; BD Pharmingen, San Jose, CA, USA) for 30 min at 4 °C. CD15-negative cells were sorted, depleting a high percentage of CD15+ granulocytes, using a 70-µm nozzle on a FACS Aria III 5L cell sorter. Cells were harvested in CSM and used immediately for CyTOF sample preparation.

CyTOF sample preparation
Cells were washed in cell staining media (CSM; PBS with 0.5% BSA and 0.02% NaN3) and 1% outlier density to exclude outliers, (iii) 10% target density to retain 10% of cells after the downsampling step of the SPADE analysis, and (iv) number of clusters: 300 in the BI and BII panels to define the desired number of clusters. In addition, all BM samples of MM patients and healthy donors in the BI and BII panels were analyzed simultaneously by the SPADE approach, generating a single tree structure that captured all subpopulations present in the entire dataset.
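The roles of the outlier density and target density parameters can be illustrated with a simplified sketch of density-dependent downsampling. It is written in Python for illustration only (the analysis itself was run with SPADE), and it makes several assumptions: local density is taken as the number of cells within a fixed L1 radius, the radius is a hypothetical parameter, and the target density is found by bisection so that the expected number of retained cells equals 10% of the input. The published SPADE implementation differs in how the neighborhood size is chosen.

```python
import random


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def local_density(cells, radius):
    """Number of cells within `radius` of each cell (the cell itself included)."""
    return [sum(1 for b in cells if l1_distance(a, b) <= radius) for a in cells]


def percentile_value(values, q):
    """Value at quantile q (0..1) of the sorted list, nearest-rank style."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def expected_kept(dens, od, td):
    """Expected number of retained cells for outlier density od and target density td."""
    return sum(0.0 if d <= od else min(1.0, td / d) for d in dens)


def solve_target_density(dens, od, n_target):
    """Bisection for the target density whose expected retention is n_target."""
    lo, hi = float(od), float(max(dens))
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_kept(dens, od, mid) < n_target:
            lo = mid
        else:
            hi = mid
    return hi


def downsample(cells, radius, outlier_q=0.01, target_fraction=0.10, seed=0):
    """Return (indices of retained cells, outlier density, target density)."""
    rng = random.Random(seed)
    dens = local_density(cells, radius)
    od = percentile_value(dens, outlier_q)            # 1% outlier density
    td = solve_target_density(dens, od, target_fraction * len(cells))
    kept = []
    for i, d in enumerate(dens):
        # Drop outliers, keep sparse cells, thin dense regions in proportion td/d.
        p = 0.0 if d <= od else min(1.0, td / d)
        if rng.random() < p:
            kept.append(i)
    return kept, od, td
```

The effect is that abundant populations are thinned heavily while rare populations are largely preserved, which is why a fixed target percentage does not simply remove 90% of every population.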
Furthermore, SPADE was used to compare multiple samples with overlapping staining in the BI and BII panels. After downsampling the data separately, the downsampled data were pooled into a meta-downsampled dataset, a meta-cloud that represents where each cell lies in the high-dimensional space defined by the markers common to both panels. Therefore, the shape of the SPADE tree was defined by the 13 overlapping clustering markers in the BI and BII panels.
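The pooling step can be sketched as follows. This is a minimal Python illustration, not the code used in the study: the data layout (a dict with a marker list and event rows), the function name, and the marker names in the usage note are hypothetical placeholders.

```python
def pool_on_common_markers(panels):
    """Pool downsampled events from several panels on their shared markers.

    `panels` maps a panel name to {"markers": [...], "events": [[...], ...]},
    where each event row follows the order of that panel's marker list.
    Returns (common markers, list of (panel name, values on common markers)).
    """
    names = list(panels)
    # Keep the marker order of the first panel, restricted to shared markers.
    common = [m for m in panels[names[0]]["markers"]
              if all(m in panels[n]["markers"] for n in names[1:])]
    pooled = []
    for name in names:
        markers = panels[name]["markers"]
        idx = [markers.index(m) for m in common]
        for row in panels[name]["events"]:
            pooled.append((name, [row[i] for i in idx]))
    return common, pooled
```

For example, with a BI panel measuring `shared_1`, `BI_only`, `shared_2` and a BII panel measuring `shared_2`, `shared_1`, `BII_only`, the pooled dataset contains every event from both panels described only by `shared_1` and `shared_2`, in that order. Panel-specific markers stay out of the clustering and can be overlaid on the tree afterwards.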
For markers that varied across panels, their behavior can be visualized by contrasting their intensities on differently colored samples. The boundaries separating clusters that show drastically different colors in the SPADE tree were drawn manually and annotated as immunophenotypic populations, according to the colored tree and based upon examination of positive versus negative expression in the relevant biaxial plots of cellular events in each cluster. Prior knowledge was used to interpret the biological relevance of individual tree clusters.

CyTOF data visualization
For interactive visualization of the results, we developed a web portal using R software and its additional packages, mainly shiny. Shiny is an R library that allows the development of a simple interactive web application within R. The application is accessible via any web browser and runs on server computing resources. The other R packages used for the development include shinyBS, igraph, ggplot2, Cairo, gplots, colorRamps, plotly and DT (Supplementary Table 3). We also normalized the median expression for each marker in each sample. To normalize the expression we used the formula:
$$y = \frac{\sum c \cdot e}{c_B}$$
where y is the normalized median expression of one marker in one cluster of one sample, c is the number of cells in one node, e is the median expression for one marker in one node, the sum runs over the nodes belonging to the cluster, and cB (written c_B in the formula) is the sum of the number of cells from nodes belonging to one cluster.
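Read as a cell-count-weighted mean of node medians, the normalization can be sketched in Python as below. This reading is an interpretation of the definitions of c, e and cB rather than a confirmed implementation, and the function name and input layout are hypothetical.

```python
def normalized_cluster_expression(nodes):
    """Cell-count-weighted mean of node medians for one marker in one cluster.

    `nodes` is a list of (c, e) pairs: c is the number of cells in a node and
    e is the median expression of the marker in that node.
    """
    c_b = sum(c for c, _ in nodes)          # cB: total cells in the cluster
    if c_b == 0:
        raise ValueError("cluster contains no cells")
    return sum(c * e for c, e in nodes) / c_b


# Two nodes of one cluster: 100 cells with median 2.0, 300 cells with median 4.0
print(normalized_cluster_expression([(100, 2.0), (300, 4.0)]))
```

Weighting by cell count keeps small nodes with extreme medians from dominating the value reported for a merged cluster.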
For further analysis, we modified the mergeClusters, identifyDAC and volcanoViewer functions from the SPADEVizR package (5). To effectively visualize the comparison of the up- or down-regulation of statistically significant markers in multiple cell clusters, we used circular plots (R software and R libraries: circlize (6) and ComplexHeatmap (7)). Venn diagrams were designed to show common changes of marker expression in cell clusters at various stages of MM using the VennDiagram R package.
Principal component analysis (PCA), correspondence analysis (CA) and their visualizations were performed in R software using the FactoMineR, factoextra, survminer and survival R libraries.
Correlation heatmaps were created using the ggcorrplot R package (Supplementary Table 3).

Statistical analysis
The Kolmogorov-Smirnov and Shapiro-Wilk tests of normality were used to assess the distribution of the data. Outliers were identified by the Tukey test. Statistical significance of differences between two groups was determined by the non-parametric Mann-Whitney U test. Differences in median values among the four MM stages versus the HD control group were evaluated by Dunn's multiple comparison test after the Kruskal-Wallis one-way analysis of variance by ranks, with p values < 0.05 considered statistically significant.
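Two of these steps can be sketched in plain Python for illustration. The sketch assumes that the Tukey test refers to Tukey's fences (values more than 1.5 interquartile ranges outside the quartiles), which the text does not state explicitly. It computes only the Mann-Whitney U statistics, not p values, and the function names are hypothetical. The software used by the authors for these tests is not named in this section.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences, assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1            # mean of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y) from the rank sums of the pooled sample."""
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:len(x)])
    u1 = r1 - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return u1, u2
```

In practice the p values, the Kruskal-Wallis test and Dunn's post hoc comparisons would come from a statistics package rather than hand-written code.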