CombiFlow: Flow cytometry-based identification and characterization of genetically and functionally distinct AML subclones

Summary
Many cancers, including leukemias, are dynamic oligoclonal diseases. Tools to identify and prospectively isolate genetically distinct clones for functional studies are needed. We describe our CombiFlow protocol, which is a combinatorial flow cytometry-based approach to identify and isolate such distinct clones. CombiFlow enables the visualization of clonal evolution during disease progression and the identification of potential relapse-inducing cells at minimal residual disease (MRD) time points. The protocol can be adapted to various research questions and allows functional studies on live sorted cell populations. For complete details on the use and execution of this protocol, please refer to de Boer et al. (2018).


SIRPA-PE (10 µL) Thermo
Total n/a n/a 7.5 mL
Make fresh each time.

Run the marker panel
Timing: 4 h
Plasma membrane protein marker expression data are acquired by flow cytometry and can be used as input for the following analyses:
1) Leukemic cells can be identified based on aberrant expression of one or more markers compared to the healthy control.
2) Aberrantly expressed plasma membrane protein markers can be used to identify and sort potential subclones.
3) Disease progression can be tracked by running the panel on longitudinal samples from individual patients.
9. Cells are stained for the markers CD34, CD38, CD45 and CD45RA, from here on called the backbone stain. These markers are used as a basis to merge all measured plasma membrane protein markers and are therefore present in every tube. A more detailed explanation of this principle can be found in Merge the marker panel flow data. Antibodies can be added at 15°C-25°C. All stainings are carried out at 4°C in the dark.
   a. Label unstained and single stained control tubes
   b. Take 15 million viable mononuclear cells (MNCs) of the thawed AML sample and resuspend them in 320 µL PBS
   c. Add 2 µL of cell suspension to the unstained and single stained control tubes containing 98 µL PBS
   d. Add the backbone stain to the remaining cells:
      i. FcR blocking reagent 10 µL
      ii. CD34-APC 9 µL
      iii. CD38-FITC 9 µL
      iv. CD45-PECy7 3 µL
      v. CD45RA-BV421 3 µL
10. Stain cells for 20 min at 4°C in the dark
   a. Label tubes for the marker panel (1-36)
   b. Add PBS to the backbone-stained cells to a total volume of 100 µL × number of markers
   c. Add 100 µL of backbone-stained cells to each of the labeled tubes. Resuspend in between so that all tubes receive a similar number of cells
   d. Add one PE-labeled plasma membrane protein marker antibody per tube. The volume of antibody per tube is indicated in the key resources table
   e. Stain cells for 30 min at 4°C in the dark
   f. Wash cells with 2 mL of PBS per tube and centrifuge at 450 g for 5 min
   g. Remove PBS and resuspend cells in 100 µL fresh PBS
11. Acquire expression data using a flow cytometer
Note: The extra volume of backbone-stained cells will be approximately 300 µL. Keep this at 4°C until all PE-labeled marker panel antibodies have been added. It can serve as a back-up if a pipetting mistake is made with the marker panel.
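The volume arithmetic in steps 9 and 10 can be sketched as a small helper. This is only an illustrative calculation, not part of the published protocol. It assumes that all volumes are in microliters, that "total volume" in step 10b means the final volume after adding PBS, and that six control tubes are drawn from the suspension (one unstained plus five single stains). The protocol does not state the number of control tubes, so `n_control_tubes` is a hypothetical default, as are the function and parameter names.

```python
def backbone_dilution_ul(
    n_markers,
    start_volume_ul=320.0,        # step 9b: cells resuspended in PBS
    n_control_tubes=6,            # ASSUMPTION: not stated in the protocol
    control_aliquot_ul=2.0,       # step 9c: volume taken per control tube
    backbone_volumes_ul=(10.0, 9.0, 9.0, 3.0, 3.0),  # step 9d: FcR block + 4 antibodies
    volume_per_tube_ul=100.0,     # step 10c: volume dispensed per panel tube
    spare_volume_ul=0.0,          # optional back-up volume to keep aside
):
    """Return the stained volume, the target volume and the PBS to add (all in µL)."""
    stained_ul = (
        start_volume_ul
        - n_control_tubes * control_aliquot_ul
        + sum(backbone_volumes_ul)
    )
    target_ul = n_markers * volume_per_tube_ul + spare_volume_ul
    pbs_to_add_ul = target_ul - stained_ul
    if pbs_to_add_ul < 0:
        raise ValueError("Stained cell volume already exceeds the target volume")
    return {
        "stained_ul": stained_ul,
        "target_ul": target_ul,
        "pbs_to_add_ul": pbs_to_add_ul,
    }


# Example: full panel of 36 markers, keeping 300 µL aside as back-up
print(backbone_dilution_ul(36, spare_volume_ul=300.0))
```

If the back-up volume mentioned in the note is wanted, it can be passed as `spare_volume_ul`; with the default of 0 the helper dilutes to exactly 100 µL per marker tube.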

OPEN ACCESS
Note: The steps described above have been optimized for staining of cells in FACS tubes. For staining of cells in a 96-well plate some adjustments may be needed in the protocol.
Note: While we have used a backbone panel with antibodies against CD34, CD38, CD45 and CD45RA, other markers, or additional markers such as CD64 or CD117, can be included as backbone markers as well.
Note: The percentage of viable cells after thawing, when measuring the marker panel, usually lies between 70% and 90%. The largest cell loss occurs during the thawing procedure; the resulting debris is cleared by the DNase in the NCS mix, so it does not lead to accidental staining of large numbers of dead cells. If desired, PI (at 1 mg/mL) can be added to the backbone panel to discriminate viable cells. In case of limited patient material, for instance at follow-up time points, a prioritized list of 15 markers can be used; these are indicated by an asterisk behind the antibody in the key resources table. The ranking of these markers was based on the number of times each marker was positive in a cohort of 87 AML samples and on literature describing the use of specific markers in AML.
Note: We acquire flow data on the MACSQuant Analyzer 10. Other flow cytometers may also be used, provided their lasers are suitable for the fluorophores used.
Optional: Experiment can be paused here. Data analysis and the thawing of a second AML sample for sorting can be done on a different day.

Merge the marker panel flow data
Timing: 30 min
The number of markers that can be measured per tube in flow cytometry is limited by the availability of fluorochrome-labeled antibodies with distinct excitation and emission spectra and of flow cytometers with multiple lasers. To circumvent this problem, we add a backbone of markers to each independent FACS tube. In addition to the backbone, an antibody against a specific plasma membrane marker is added to each tube; these antibodies can all carry the same fluorochrome, in our case PE. Flow data acquired in separate tubes can then be combined based on the expression of the backbone markers using Infinicyt™. With the recent development of full spectrum flow cytometers, such as the Cytek Aurora, more fluorochromes may be included to further increase the resolution and reduce analysis time.
Note: By making use of Infinicyt to separate the PE-labeled markers, the panel can be easily adapted. The number of markers can be increased, provided that a PE-labeled antibody is available.
Note: Pay attention to the names of the markers and make sure these are consistent throughout the analyses. This will avoid errors in the CombiFlow pipeline. Spaces are allowed but will be replaced by an underscore in the CombiFlow analysis script.
Note: Additional information on exporting FCS files can be found here: FCS file export from FlowJo. Additional explanations on the merging of FCS files and the APS plotting can be found in the Infinicyt documentation available on the Cytognos website.
Note: If the merged file is used as input for CombiFlow, select which markers to include in the export, since the markers must match between all included files.
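The naming requirements in the notes above can be checked before running the pipeline. The following is a minimal sketch, not part of the CombiFlow script: it assumes the marker names of each exported file have already been collected into a dictionary, and the function names and example file names are hypothetical. Only the space-to-underscore replacement is taken from the note; any other normalization the script may apply is unknown.

```python
def normalize_marker(name):
    """Mirror the documented behavior: spaces become underscores."""
    return name.replace(" ", "_")


def find_marker_mismatches(markers_per_file):
    """Return, per file, the markers that are not shared by all files.

    markers_per_file maps a file name to the list of marker names it contains.
    Files whose markers are all shared are omitted from the result.
    """
    normalized = {
        file_name: {normalize_marker(m) for m in markers}
        for file_name, markers in markers_per_file.items()
    }
    if not normalized:
        return {}
    shared = set.intersection(*normalized.values())
    return {
        file_name: sorted(markers - shared)
        for file_name, markers in normalized.items()
        if markers - shared
    }


# Hypothetical example: the relapse export contains one marker too many
example = {
    "diagnosis.fcs": ["CD34", "CD45 RA", "CD25"],
    "relapse.fcs": ["CD34", "CD45_RA", "CD25", "CD99"],
}
print(find_marker_mismatches(example))
```

An empty result means that all files carry the same marker set after normalization; any listed marker should either be added to the other exports or deselected before merging.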

Identification of potential subclones
Timing: 30 min-1 h
With the merged flow data from Merge the marker panel flow data, marker expression can be compared with that of a healthy donor. Aberrantly expressed markers can be selected and used for further analysis in the Infinicyt™ environment to identify potential subclones, as depicted in Figure 1. First, markers with aberrant expression are selected and used as input for the Principal Component analysis, which is used to identify the most discriminating marker (Figures 1A and 1B). Subpopulations can be sorted based on this marker and sequenced (Figure 1C). Based on the sequencing outcomes, a pedigree can be created that depicts the expected evolutionary track of the leukemic cells (Figure 1D).
15. Prepare the data from the merged file for further analysis
   a. Export marker expression data per cell from the population of interest in the merged FCS file using FlowJo™
   b. Prepare a data file for the healthy control
16. Run R script: "Marker panel - histograms" to plot plasma membrane protein marker expression data of a healthy control and the data of interest to find aberrantly expressed proteins
   a. Save generated histogram plot as pdf
17. Select plasma membrane protein markers that are aberrantly expressed in comparison to the healthy control using the previously generated histogram plots
18. Open the analysis file, saved in step 13i, in the Infinicyt™ environment. See Methods video S3 for stepwise instructions
Note: If more than one plasma membrane protein marker is needed to distinguish between potential subclones, it is necessary to check whether this marker is available with a different fluorophore than PE before continuing with Sorting of potential subclones.

Troubleshooting 2
Sorting of potential subclones

Timing: 4 h
After determining the most distinguishing plasma membrane protein marker(s) to identify potential subclones using the merged FCS file and Infinicyt™, the potential clones can be sorted using the protocol below. This part is a continuation of Thawing an AML.

The CombiFlow pipeline can be used to study the hematopoietic landscape of healthy and leukemic samples at diagnosis and longitudinally. Potential relapse-inducing populations can be identified at MRD time points, and changes in clonal composition can be monitored based on marker expression data, as depicted in Figures 2 and 3. The CombiFlow pipeline is based on an analysis workflow for CyTOF data (Nowicka et al., 2017). Whereas in the described examples clusters are assigned to visualize differences between diagnosis, relapse and healthy (Figure 2), the pipeline may also be used to visualize different cell types such as blasts, lymphocytes and mature myeloid cells.
27. Data preparation
   a. Export population of interest from the merged FCS files using FlowJo™ and zip the files
   b. Create a metadata file
   c. Create a panel file
28. Create a cluster file. The number of clusters can be adjusted based on the expected number of populations. If this is not known before running the analysis, a fixed number of clusters can be used. Here, 40 clusters were created per analysis to avoid missing out on smaller, more rare cell populations
29. Clusters are formed using the FlowSOM algorithm (Van Gassen et al., 2015) and can be visualized for all samples or per sample using tSNE landscapes. These can be colored by marker expression to identify and visualize specific cell populations such as CD34+ cells or CD4+ lymphocytes
30. Create a heatmap that depicts the transformed marker expression per cluster. This can be used to assign clusters to specific cell types
31. Run the script until the step 'Manual clustering per sample'
   a. Clusters can be assigned to a sample based on the cluster cell count per sample
   b. Create a sample file
32. Visualize the newly assigned clusters for all samples combined or per sample
33. A new heatmap can be created, which depicts the combined expression of all clusters assigned to one specific sample
34. Continue running the script until the Principal Component analysis step
   a. Select which markers and clusters to include in the Principal Component analysis. A Principal Component analysis can be performed for all included samples, but a comparison of diagnosis and relapse, without the healthy clusters, is also a possibility
   b. Run the Principal Component analysis to identify the most discriminating marker(s).
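The metadata and panel files from step 27 are plain tables, so they can be generated programmatically rather than typed by hand. The sketch below is illustrative only: the column names (`file_name`, `sample_id`, `condition`, `antigen`, `is_backbone`) are assumptions loosely modeled on the CyTOF workflow the pipeline is based on, and the headers that the CombiFlow script actually expects may differ. File and sample names are hypothetical.

```python
import csv
import io


def build_metadata_rows(samples):
    """samples: iterable of (fcs_file_name, sample_id, condition) tuples."""
    return [
        {"file_name": file_name, "sample_id": sample_id, "condition": condition}
        for file_name, sample_id, condition in samples
    ]


def build_panel_rows(markers, backbone_markers):
    """One row per marker; flags whether it belongs to the backbone stain."""
    return [
        {
            "antigen": marker.replace(" ", "_"),  # spaces become underscores
            "is_backbone": int(marker in backbone_markers),
        }
        for marker in markers
    ]


def rows_to_csv_text(rows):
    """Serialize a non-empty list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example with one diagnosis sample and one healthy control
metadata = build_metadata_rows([
    ("AML_dx.fcs", "AML_dx", "diagnosis"),
    ("healthy.fcs", "healthy", "healthy"),
])
panel = build_panel_rows(["CD34", "CD38", "CD25"], backbone_markers={"CD34", "CD38"})
print(rows_to_csv_text(metadata))
print(rows_to_csv_text(panel))
```

Generating both tables from the same lists of file names and markers keeps them consistent with each other, which is the main source of errors listed under Problem 5.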
Troubleshooting 5
Optional: Clusters can be assigned not only to a sample but also to a cell type. To do so, create a file similar to the cluster and sample file, but now assign each cluster to a cell type based on the expression data in the heatmap. For example, a cluster with small size and high CD45 and CD19 expression may be assigned to lymphocytes or, more specifically, to B cells.
Note: Some clusters may have cells originating from both the leukemic and healthy sample. We opted to assign these cells to the healthy sample until the ratio Healthy/Leukemic cells fell below 1:8.
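The assignment rule in the note above can be written down explicitly. This is a minimal sketch of that rule for a single healthy sample and a single leukemic sample, with hypothetical function and label names; in the protocol the assignment is done manually from the cluster cell counts per sample. The 1:8 threshold is taken from the note, and a cluster at exactly 1:8 is still treated as healthy because the note speaks of the ratio falling below 1:8.

```python
from fractions import Fraction


def assign_cluster(healthy_count, leukemic_count, min_ratio=Fraction(1, 8)):
    """Assign a mixed cluster to 'healthy' or 'leukemic' by its cell counts.

    The cluster stays 'healthy' as long as healthy/leukemic >= min_ratio.
    """
    if healthy_count == 0 and leukemic_count == 0:
        return "empty"
    if leukemic_count == 0:
        return "healthy"
    ratio = Fraction(healthy_count, leukemic_count)
    return "healthy" if ratio >= min_ratio else "leukemic"


# Hypothetical cluster counts: (healthy cells, leukemic cells)
clusters = {1: (120, 30), 2: (10, 80), 3: (10, 81), 4: (0, 500)}
assignment = {cid: assign_cluster(h, l) for cid, (h, l) in clusters.items()}
print(assignment)
```

Using exact fractions avoids floating-point rounding at the threshold. With more than one leukemic sample (for example diagnosis and relapse), the rule would need to be extended, which is not covered here.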
Note: Uniform manifold approximation and projection (UMAP) visualization may be an alternative for the currently used tSNE plots. Implementation of UMAP into the CombiFlow pipeline is currently ongoing, as well as a more automated, machine-learning approach for the cluster assignments to reduce analysis time.

EXPECTED OUTCOMES
With this protocol we generated marker expression profiles of 2 diagnosis samples, a paired relapse sample and a healthy control. For the unpaired diagnosis sample, we selected the plasma membrane protein markers that were aberrantly expressed in comparison to the expression levels in the healthy control (Figure 1A). Using the aberrantly expressed markers, a Principal Component analysis was performed in the Infinicyt™ environment and the marker CD25 was identified as the most distinguishing marker (Figure 1B). The diagnosis sample was sorted into two populations, CD34+CD25+ and CD34+CD25-, and cells were processed for further analysis. We isolated gDNA from both clones and performed targeted sequencing for the mutations DNMT3A G>A, RUNX1 dup. CCTA and FLT3-ITD. We found that the DNMT3A and RUNX1 mutations were present in both clones, but FLT3-ITD was only found in the CD25+ clone (Figure 1C). Based on these findings we can predict a possible evolutionary pattern of the clones found in this particular individual. Various downstream analyses were performed on the sorted subclones. RNA sequencing and GO analysis revealed enrichment of processes related to cell proliferation, mitochondrial activity and cytokine signaling in the FLT3-ITD clone, whereas the FLT3-WT clone was enriched for processes related to chromatin organization and histone modification. In vitro culture of the subclones followed by treatment with the FLT3 tyrosine kinase inhibitor AC-220 revealed differences in sensitivity to AC-220, as only the FLT3-ITD subclone showed reduced cell counts and increased levels of apoptosis.
The CombiFlow examples show one AML patient with diagnosis and relapse samples and a healthy control. In Figure 2 the forty clusters are visualized in total and per sample. Here it can be observed that the diagnosis and relapse samples differ in their marker expression profiles. TruSight sequencing data were also available at diagnosis and relapse; variant allele frequencies at both time points are depicted in a fish plot (Figure 3A). Clonal evolution between diagnosis and relapse was observed, as the BCOR mutation was no longer detected at relapse, whereas the mutations in KMT2A and CEBPA newly appeared. The size of the NPM1-mutated clone increased. These genetic differences between diagnosis and relapse were visualized in the tSNE landscape (Figure 3B), indicating that the markers used to create the landscape could separate cell clusters derived from the diagnosis and relapse samples. The healthy control clustered away from the AML samples. The Principal Component analysis reflected this separation and identified ITGA5, CD97 and IL1RAP as the top markers to distinguish between the AML and healthy cells (Figures 3C and 3D). CD47, CD99 and CLL1 differ between diagnosis and relapse and possibly within the healthy cells as well (Figure 3D).

LIMITATIONS
A possible bottleneck is the merging of the flow data in Infinicyt™. This step is sensitive to aberrations in expression of the backbone markers, which may make it necessary to exclude one of the markers. Furthermore, the identified subclone may be very small, so that not enough cells are left after sorting for further analysis. In addition, differences in mutational status are not always found in the sorted subclones, in which case a different sorting strategy has to be devised. Another limitation is that the number of events that can be included in the CombiFlow analysis is currently relatively low, due to a lack of computational power. Running the analysis on a computing cluster may circumvent this and allow the number of events to be increased. Another issue that is commonly present when working with patient material is a low quantity of cells. For longitudinal studies this is mainly the case at MRD time points, when most of the cells are expected to be healthy but a small aberrant population may remain. A smaller marker panel may be used in these instances to reduce the amount of material needed. Markers may be selected based on their aberrant expression at diagnosis, yet this may exclude markers whose expression only becomes aberrant following treatment. Lastly, we are currently working on a machine-learning approach to automate the creation of clusters and their assignment to a sample or cell type, in order to reduce analysis time and remove potential user bias.

TROUBLESHOOTING
Problem 1
A merge warning occurs (step 13).

disadvantage of this is that the marker can no longer serve as a backbone marker and the calculate data step will take longer. Another possibility is to include more events in the 'calculate data' step. For example, instead of including only the blasts or CD45+ cells, include all viable cells. This will mainly help if cell numbers per file are relatively low.

Problem 2
No subpopulations can be identified in the APS plot (step 19).

Potential solution:
This is a possibility as some AMLs do not have clear subpopulations based on their marker expression profile. It may be that markers currently not included in the panel are capable of distinguishing subpopulations in the AML in question. Another option is that the highest ranking marker does have a larger range in expression than other markers but that lower ranked markers have a more bimodal expression pattern indicative of subpopulations. It is therefore, in these instances, recommended to deselect the one or two highest ranking parameters to see if the APS plot changes.

Problem 3
Stickiness of AML cells results in clogging of the FACS sorter (step 25).

Potential solution
AML cells can be diluted further in PBS mix; the DNaseI in the mix breaks down free DNA. If this is not sufficient, cells can be put through a 35 µm cell strainer.

Problem 4
The sorted subpopulations do not differ in mutation status (step 26).

Potential solution
It is possible that sorting an AML into subpopulations based on the highest ranking marker(s) in the PCA does not result in genetically distinct subclones. A potential solution may be to sort the AML again, except this time not based on the highest ranking but on the second or third ranking marker. Another explanation for the lack of difference in mutation status may be that the differences in marker expression are not due to genetic, but due to epigenetic or transcriptomic variation between the subpopulations. This will require more extensive analysis of the sorted subpopulations than NGS sequencing.

Problem 5
Running the CombiFlow script results in an error (step 29)

Potential solution
Check if the script refers to the correct files. The metadata file should contain the exact file names that are included in the zipped folder containing the FCS files. The markers included in the analysis should match between each FCS file and between the FCS files and the panel file. Lastly, the CombiFlow script can be used for different comparisons, e.g., AML versus healthy or tracking an AML patient longitudinally. This may affect the description of the condition, which should match between the metadata file and the script.
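The file-name check described above can be automated before the script is run. The sketch below is an illustration under stated assumptions, not part of the CombiFlow script: it assumes the metadata file is a CSV with a column holding the FCS file names (called `file_name` here, which is hypothetical), and that FCS files in the zipped folder end in `.fcs`. It only compares file names and does not inspect the markers inside the FCS files.

```python
import csv
import io
import zipfile


def check_metadata_against_zip(zip_source, metadata_csv_text, file_column="file_name"):
    """Compare FCS file names in a zip archive with those listed in the metadata.

    zip_source may be a path or a binary file-like object.
    Returns the names missing on either side, each as a sorted list.
    """
    with zipfile.ZipFile(zip_source) as archive:
        in_zip = {
            name.split("/")[-1]                 # drop any folder prefix
            for name in archive.namelist()
            if name.lower().endswith(".fcs")
        }
    reader = csv.DictReader(io.StringIO(metadata_csv_text))
    in_metadata = {row[file_column] for row in reader}
    return {
        "missing_from_zip": sorted(in_metadata - in_zip),
        "missing_from_metadata": sorted(in_zip - in_metadata),
    }


# Hypothetical example built in memory: the metadata lists a file that was not zipped
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("export/AML_dx.fcs", b"")
    archive.writestr("export/healthy.fcs", b"")
buffer.seek(0)
metadata_text = "file_name,condition\nAML_dx.fcs,diagnosis\nAML_rel.fcs,relapse\n"
print(check_metadata_against_zip(buffer, metadata_text))
```

Both lists being empty indicates that the metadata file and the zipped folder refer to the same set of files; the condition labels still need to be compared with the script by hand.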

RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Prof Dr. J.J. Schuringa, j.j.schuringa@umcg.nl

Materials availability
This study did not generate new unique reagents.