Technical Aspects of Flow Cytometry-based Measurable Residual Disease Quantification in Acute Myeloid Leukemia: Experience of the European LeukemiaNet MRD Working Party

Measurable residual disease (MRD) quantified by multiparameter flow cytometry (MFC) is a strong and independent prognostic factor in acute myeloid leukemia (AML). However, several technical factors may affect the final read-out of the assay. Experts from the MRD Working Party of the European LeukemiaNet evaluated which aspects are crucial for accurate MFC-MRD measurement. Here, we report on the agreement, obtained via a combination of a cross-sectional questionnaire, live discussions, and a Delphi poll. The recommendations consist of several key issues from bone marrow sampling to final laboratory reporting to ensure quality and reproducibility of results. Furthermore, the experiences were tested by comparing two 8-color MRD panels in multiple laboratories. The results presented here underscore the feasibility and the utility of a harmonized theoretical and practical MFC-MRD assessment and are a next step toward further harmonization.


INTRODUCTION
The presence of measurable residual disease (MRD) in acute myeloid leukemia (AML) is an independent prognosticator of relapse and shorter survival. [1][2][3][4][5] MRD can be measured at high sensitivity using molecular assays (RT-qPCR, next-generation sequencing [NGS]) or multiparameter flow cytometry (MFC). [4][5][6][7] Advantages of MFC-MRD are its wide applicability (>90% of AML cases), short turnaround time (TAT), relatively high sensitivity (10⁻³ to 10⁻⁵), and the possibility to discriminate between living and dead cells. 8 Nevertheless, this technique has recognized flaws: (1) sensitivity and specificity depend on the monoclonal antibody (MoAb) panel used, the number of cells analyzed, and the discriminatory level of the leukemia-associated immunophenotypes (LAIPs); (2) there is limited harmonization and standardization, since many laboratories use their own MFC-MRD assay; and (3) analysis and interpretation of data require relevant expertise. Taken together, these may limit comparability and clinical interpretation of MFC-MRD results. Therefore, an international group of experts on behalf of the European LeukemiaNet (ELN) proposed in 2018 consensus recommendations regarding the entire process from specimen collection to result reporting. 5 Here, we provide more in-depth and updated technical guidance for flow cytometric AML-MRD from the ELN expert group following live ELN working group discussions, interlaboratory comparisons, a cross-sectional survey, and a Delphi poll, all conducted between 2019 and 2021 (details in Supplementary Material, Supplemental Digital Content, http://links.lww.com/HS/A214 and Heuser et al 9 ). These addressed how specific technical procedures can affect the quality of the flow data and aspects of assay analytical performance (eg, limit of blank [LOB], limit of detection [LOD], lower limit of quantification [LLOQ]), and included evaluation of 2 candidate 8-color MRD panels designed to facilitate standardization.

Bone marrow sampling procedures
The quality of the bone marrow (BM) aspirate depends substantially on the skill and experience of the person who performs the procedure. MRD should be assessed from a small volume (<5 mL) of the first pull of a BM aspirate to prevent dilution with peripheral blood (PB) (hemodilution). In fact, MRD frequencies are about 1-log lower in PB compared with BM, causing an increased likelihood of false-negative results when the sample is diluted. 10,11 Hemodilution has emerged as a critical issue in assessing the reliability of an MRD test. There are several modalities to detect it, mainly consisting of formulas to detect PB contamination (Table 1). [12][13][14][15][16] All formulas require additional measurements to estimate hemodilution or BM purity, such as matched PB or specific markers, which are not included in the standard MRD panels. Easier to implement is the examination of the mast cell population (CD117hi), as a decrease (≤0.002%) might indicate PB contamination 15; however, the frequency of mast cells may be altered in myeloid neoplasias. 17 Another option to estimate possible PB contamination is to determine the proportion of mature neutrophils, with a presence above 90% indicating hemodilution. 5 An alternative approach is to change the denominator of the MRD assay from white blood cells (WBC) to the primitive/progenitor fraction (PM-MRD: based on CD34, CD117, or CD133). 5,18 No consensus has been reached on a specific modality to be used in daily practice, because implementation is hindered by TAT or costs in large centers that process many samples on a daily basis. However, consensus was that all strategies in use should be further explored by the different centers to assess the frequency and extent of hemodilution in the different clinical trials. Once the potential relevance of a strategy has been established, it can be implemented as a standard comment on sample quality in MRD reporting.
For now, when hemodilution is documented (including hypocellular, nonregenerated BM), a second BM evaluation should be requested within 2 weeks to avoid unreliable MRD results.
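As an illustration only, the two panel-independent checks described above (mast cell frequency and mature neutrophil proportion) could be combined into a simple screening flag. The following Python sketch is not part of the ELN recommendations: the function name, the argument names, and the assumption that both frequencies are expressed as a percentage of WBC are hypothetical, and the thresholds are simply the values quoted in the text.

```python
# Thresholds quoted in the text; the WBC denominator is an assumption.
MAST_CELL_MIN_PCT = 0.002    # mast cells (CD117hi) at or below this suggest PB contamination
NEUTROPHIL_MAX_PCT = 90.0    # mature neutrophils above this suggest hemodilution


def hemodilution_flags(mast_cell_pct, neutrophil_pct):
    """Return the list of reasons a BM sample may be hemodiluted (empty if none)."""
    flags = []
    if mast_cell_pct <= MAST_CELL_MIN_PCT:
        flags.append("low mast cells")
    if neutrophil_pct > NEUTROPHIL_MAX_PCT:
        flags.append("high mature neutrophils")
    return flags


# Hypothetical samples: the first trips both checks, the second none.
print(hemodilution_flags(0.001, 95.0))
print(hemodilution_flags(0.010, 60.0))
```

A non-empty result would only prompt a sample-quality comment or a repeat aspirate; it does not replace the formulas in Table 1, and mast cell frequencies may be altered in myeloid neoplasias.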
Another factor of variability is the anticoagulant used in tubes for MRD sample collection. Most ELN centers recommend EDTA tubes although heparin and sodium citrate are also utilized. All anticoagulants are liable to influence the sample. Although EDTA allows for a prolonged conservation of the samples over time, 19 it may induce a change of expression patterns of antigens such as CD11b (Figure 1). 21 Due to this, heparin has been recommended by the ELN myelodysplastic syndromes (MDS) workgroup. 22,23 However, the other anticoagulants are acceptable as long as the laboratory validates its assay for stability. 24

Sample transportation
From the ELN experience, most MRD testing is performed in centralized reference laboratories due to expertise, infrastructure, and cost. This generates issues related to sample transport. Since BM cells start to deteriorate once removed from the BM environment, the interval between BM aspiration and analysis should preferably be shorter than 72 hours. 25 Samples are recommended to be stored at ambient conditions to preserve cell viability. 26 All samples are preferably tested for viability, stability, and overall quality, but this is particularly important if a sample is more than 72 hours old. Viability and overall quality of the sample are assessed, even if no viability dye is included, by the initial plotting of forward scatter (FSC) versus side scatter (SSC), which checks the light scatter properties of the sample at the beginning of the gating strategy. 27 When viability is inadequate based on the presence of a high amount of debris in FSC and SSC (Figure 2), it is advisable to request a second BM specimen. When BM sample quality appears suboptimal, MRD analysis can still be performed but should be accompanied by a comment on sample quality in the report.

Sample processing
There are 2 main approaches to prepare a BM sample for MRD-MFC acquisition. The first and most utilized procedure is stain-lyse-wash (SLW), where the washing step can be omitted. 28,29 This method starts with staining using the appropriate amount of MoAbs, followed by lysing and optionally washing with phosphate-buffered saline (PBS). Human serum albumin or other protein solutions (0.1%) can be added to PBS to prevent cell clumping or cell adherence to plastic, which is especially relevant for samples of poor quality such as those processed >72 hours after aspiration. 30 The second approach is the lyse-wash-stain-wash (LWSW) procedure. This includes bulk lysis of red blood cells followed by washing with PBS, resuspension of the pellet in a smaller volume allowing for increased cell concentration, staining of the cells with the MoAb cocktail, and sometimes a second washing with PBS. 30 The bulk lysis may also lead to a higher reproducibility, since the labeling conditions are reproducible, and the volume is constant for a given quantity of cells. The SLW procedure allows a clearer separation between positive and negative events, while using the bulk lysis followed by washing allows leukocytes to be concentrated and causes fewer artifacts. 29,31 Different lysis solutions are used to eliminate red blood cells. Ammonium chloride (NH4Cl) preparations show minimal effects on WBC counts; however, the washing steps required to remove the buffer yield unavoidable cell loss. Lysis buffers may contain chemicals that fix the remaining WBC after red blood cell lysis; for example, fluorescence-activated cell sorting (FACS) Lyse (BD Biosciences, San Jose, CA) contains approximately 1.5% paraformaldehyde at the diluted working solution concentration.
These effects may also be achieved by adding a small amount (0.25%) of ultra-pure formaldehyde to NH4Cl buffer in an SLW procedure to allow fixation with preservation of light scatter properties, allowing viability assessment and preservation of nucleated red cells. 27 Several other stable commercial lysis reagents such as Versalyse (Beckman Coulter; Miami, FL) allow omitting washing without damaging leukocytes. The panel does not suggest a specific lysis protocol, provided that the selected lysis solution maintains optimal FSC and SSC properties and a mean fluorescence intensity (MFI) of all markers that allows detection and enumeration of the cell populations of interest (Figure 3). The advice of the ELN MFC-MRD working group is against additional sample fixation, as it causes changes in morphology due to cell shrinkage and loss of granulation, resulting in worse FSC and SSC. 25 The same sample preparation should be applied to all MRD samples regardless of the time point. 29

FLOW CYTOMETRY
Monoclonal antibody panels
Current MoAb panels used by ELN members consist of 8 to 10 fluorochromes to allow proper discrimination of the aberrant markers required to identify MRD. The ELN MFC expert group agreed on specific MoAb backbone markers to select cell populations of interest, to improve comparability among labs, and to allow, by specific software, merging of different tubes with various LAIP markers. The backbone should consist of 3-5 markers among CD45, CD34, CD117, CD13, CD33, or HLA-DR. This combination provides CD45 for WBC gating, primitive markers (CD34, CD117), and myeloid markers (CD13, CD33, or HLA-DR) to highlight the leukemia cell population. Additional markers will comprise the LAIP-specific antibodies (eg, CD56 and CD7), which are most informative for distinguishing leukemic cells from normal hematopoietic cells. 32 This recommendation was already included in the previous version of the guidelines, but in this paper, it is also substantiated by a multicentric validation of a tube including these backbone markers. In cases with a monocytic/myelomonocytic component (5%-10% of AML cases, but more often seen in high-risk MPN/MDS), CD64, CD11b, and CD4 may be added, such as in the previously proposed special "monocytic tube," consisting of CD64/CD11b/CD14/CD4/CD34/HLA-DR/CD33/CD45. 5,33 This tube relies mostly on lack of CD14 expression on CD4+ HLA-DR+ CD64+ monocytic cells or lack of HLA-DR, CD4, and CD64 on CD14-positive cells. Other markers can be explored in this specific setting (eg, CD56, CD35, IREM2) as well. 34

Cytometer settings and set-up of the flow cytometry instruments
Standardized flow cytometer settings are crucial for reproducible measurement and should avoid interference between fluorochromes. To make comparable measurements between different instruments, standard flow cytometer settings for target MFIs are frequently used, such as the EuroFlow 29 and the Harmonemia settings.
35 Daily cytometer calibration checks are strongly advised to verify correct performance of the cytometer's optical configuration. 36 In addition, it is recommended to harmonize instrument settings using calibration beads specific to the instrument platform. Specialized reference beads are also used on the cytometer to calibrate and standardize performance to reduce intertest variability. 37 Given the increasing number of fluorochromes available for standard instruments and accompanying increased complexity, automated compensation should be performed using either software supplied with the instrument or off-line software. Manual compensation is strongly discouraged by the panel because it is time-consuming and can cause incorrect/irreproducible MFC results. 25

Sample running and leukemic cell detection
At diagnosis of AML, the BM contains frequent LAIP+ blasts and these high numbers allow accurate LAIP detection by acquiring at least 50,000 events in the blast gate. By contrast, MRD events in follow-up samples are expected to be scarce and the process should follow the rules of rare events acquisition. The sensitivity of the assay depends on the number of relevant events acquired. To allow accurate MRD assessment at the limit of detection (LOD) and limit of quantification (LOQ), acquisition of a minimum denominator of 500,000 to 1,000,000 CD45-expressing events using the combined LAIP/DfN (different from normal) approach is advised and negativity should be confirmed with all tubes. Increasing the number of analyzed cells may further improve sensitivity and allow monitoring of minority clones identified by diagnostic LAIPs. Acquisition at a high pressure or flow rate can lead to several technical issues; for example, changes in scatter patterns may disturb the gating procedure (Figure 4).
Furthermore, fluidic instability, for example, shear turbulence or disturbances, may substantially alter the detection of rare events; therefore, plotting the TIME parameter against any other parameter sensitive to fluidic alterations allows postacquisition correction during analysis (Figure 5). Today, events not fulfilling appropriate characteristics of singularity, compensation, and fluidics may be excluded from the final analysis by dedicated software (eg, FlowAI). 38
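To make the TIME-based check more concrete, the Python sketch below splits an acquisition into equal time bins and flags bins in which the median of a fluidics-sensitive parameter drifts away from the overall median. It is a deliberately simplified illustration and does not reproduce the algorithm of FlowAI or of any other package; the function name, the number of bins, and the 20% tolerance are assumptions chosen for the example.

```python
from statistics import median


def unstable_time_bins(times, values, n_bins=10, tolerance=0.2):
    """Return indices of time bins whose median deviates from the overall median.

    times     : acquisition time of each event
    values    : a parameter sensitive to fluidic alterations (eg, FSC-A)
    tolerance : allowed relative deviation (assumed value, to be validated locally)
    """
    t_min, t_max = min(times), max(times)
    width = (t_max - t_min) / n_bins or 1.0  # guard against a zero-length run
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        idx = min(int((t - t_min) / width), n_bins - 1)
        bins[idx].append(v)

    overall = median(values)
    flagged = []
    for i, bin_values in enumerate(bins):
        if bin_values and abs(median(bin_values) - overall) > tolerance * abs(overall):
            flagged.append(i)
    return flagged


# Hypothetical run of 100 events with a disturbance between t = 40 and t = 49.
times = list(range(100))
values = [100.0] * 100
values[40:50] = [200.0] * 10
print(unstable_time_bins(times, values))
```

Events falling in flagged bins could then be reviewed on the TIME plot and, if appropriate, excluded before gating.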

Selection of control samples
To define whether antigen expression is aberrant, it is essential to make every effort to evaluate the performance of each aberrant phenotype under different conditions. This reference is generated by running the selected MRD panel on sufficient numbers of normal ("control") BM samples, collected during surgical procedures, during bone marrow harvesting from healthy donors, or from patients with solid or hematological malignancies but without BM infiltration. The number of control samples should be at least 10 and the procedure should be repeated every time the assay methodology is modified. A larger number and variety of samples will minimize the risk of "control" samples containing aberrant cells that could be mistaken for AML. It is agreed that obvious outliers in control BM, for example, >2 SD, should be excluded. Also, during "stressed" normal differentiation, antigen expression can change without implying the presence of disease. Furthermore, in older patients clonal hematopoiesis of indeterminate potential can occur, although its influence on the pattern of differentiation antigens is not fully elucidated. 39,40 However, there is some evidence that preleukemic clonal hematopoiesis may persist in AML patients who are in complete remission without any genetic evidence of MRD, and that the aberrant blast phenotypes may represent preleukemic clonal hematopoiesis leading to their misidentification as MRD. 41,42 Age-matched controls are therefore preferred to prevent possible age-related differences in marker expression from being identified as LAIPs. Furthermore, it is essential to know how the surface marker expression may be modified in the recovery phase after chemotherapy or transplantation. This observation implies that marker combinations defining a LAIP may have different frequencies in normal background marrow cells and, consequently, different specificity for MRD detection.
43 Several sources can be used to investigate BM regeneration such as samples after completion of consolidation therapy with no subsequent emerging leukemia; or post stem cell transplantation LAIP negative patients with no subsequent relapse or patients treated with myelosuppressive chemotherapy for other malignancies not involving BM.
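The exclusion of obvious outliers among control samples can be sketched as follows. This Python example assumes that the background frequency of one LAIP (in % of WBC) has been measured in each control BM, applies the ">2 SD" example from the text around the mean, and uses the sample standard deviation; the function name and the data are hypothetical.

```python
from statistics import mean, stdev


def exclude_outliers(background_pcts, n_sd=2.0):
    """Drop control values lying more than n_sd standard deviations from the mean."""
    if len(background_pcts) < 10:
        raise ValueError("at least 10 control samples are advised")
    center = mean(background_pcts)
    spread = stdev(background_pcts)  # sample SD (n - 1); an assumption
    return [x for x in background_pcts if abs(x - center) <= n_sd * spread]


# Hypothetical background frequencies (%) of one LAIP in 10 control BM samples.
controls = [0.01] * 9 + [0.10]
kept = exclude_outliers(controls)
print(len(kept))
```

The retained values can then serve as the blank measurements when the background level of that LAIP is characterized.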

Gating strategy
The identification of leukemic cells starts by visualizing the different cell populations, irrespective of the software used. To ensure the best quality of relevant event acquisition, a parameter sensitive to fluidic alterations (eg, FSC-Height or FSC-Area) should be displayed versus TIME in each tube. After eliminating debris and checking for viability on an FSC/SSC scatter plot, as mentioned earlier, it is recommended to perform doublet discrimination (eg, FSC-A versus FSC-H) to exclude cell aggregates. Then WBC are defined as the CD45-expressing population in a CD45/SSC plot. The primitive/progenitor cells can be found in the CD34+, CD117+, or CD133+ fractions. The immature blasts may be found in the CD45dim/SSClow population or in the CD45 high/CD34 negative fraction if cells are more mature. Within those fractions, aberrancies can be defined when myeloid markers (CD13 and CD33) are combined with lineage markers not seen in normal bone marrow (eg, CD7 and CD56). Combinations between the primitive marker positive cells and maturation markers can also be found (eg, CD11b). In addition, some under- and over-expression of markers can be observed (eg, CD33 and HLA-DR). Since the MRD gating strategy requires hierarchical or sequential gating to identify the aberrant immunophenotype, a final confirmation of complete inclusion of the leukemic population in each gate employed is suggested. This can be usefully visualized using a density plot display. As a final step, backgating of the LAIP cells on the CD45, CD34, and SSC/FSC plots can be used to ensure identification of the appropriate population. Backgating on the CD45 and FSC plots is especially important for monocytic/myelomonocytic aberrancies, which are harder to gate than immature cells due to the high overlap with regenerating bone marrow. The gating strategy is visualized in Figure S1, Supplemental Digital Content, http://links.lww.com/HS/A214.
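As a minimal sketch of the doublet discrimination step (FSC-A versus FSC-H), the Python function below keeps events whose area-to-height ratio stays close to the median ratio of the sample, on the assumption that aggregates show a disproportionately large area for their height. In practice this gate is usually drawn manually or by the analysis software; the function name and the 15% tolerance are hypothetical.

```python
from statistics import median


def singlet_mask(fsc_a, fsc_h, tolerance=0.15):
    """Return one boolean per event: True if the event looks like a single cell.

    fsc_a, fsc_h : FSC-Area and FSC-Height per event (heights must be > 0)
    tolerance    : allowed relative deviation of FSC-A/FSC-H from the sample median
    """
    ratios = [a / h for a, h in zip(fsc_a, fsc_h)]
    center = median(ratios)
    return [abs(r - center) <= tolerance * center for r in ratios]


# Hypothetical events: the fourth has an inflated area and is treated as a doublet.
fsc_a = [100.0, 102.0, 98.0, 180.0, 101.0]
fsc_h = [100.0, 100.0, 100.0, 100.0, 100.0]
print(singlet_mask(fsc_a, fsc_h))
```

Only events marked True would be carried forward to the CD45/SSC plot.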

Data analysis and interpretation
Different software can be used for MRD-MFC analysis of the digital data files (.fcs) with similar performance. Every software package has its own advantages and disadvantages. There is no agreement on the minimal percentage of LAIP that should be present at diagnosis to select a given marker for MRD monitoring during therapy. Since antigen expression can shift during therapy, it is recommended to identify more than 1 LAIP at diagnosis and to recognize the presence of new LAIPs following therapy to reduce false-negative results when using the LAIP approach. In principle, the LAIP method measures the most prominent LAIP population at diagnosis, which is then followed during therapy. 30 This approach is optimized by the analysis of a baseline sample at diagnosis, which is not always available in daily routine settings but should be strongly recommended. In addition, emergent clones can result from clonal evolution or from the persistence of chemoresistant subclones during follow-up and might be missed with the LAIP approach. [44][45][46] Therefore, some researchers use the DfN approach to identify aberrant populations following therapy. 27 The DfN approach can be applied in the absence of a diagnostic sample. Since it does not rely on stability of the diagnostic LAIP during the treatment course, it can identify leukemic populations even when a subclone emerges outside the LAIP selected at diagnosis. However, this approach requires detailed knowledge of normal and regenerating BM profiles to distinguish the AML clones from preleukemic immunophenotypes. 41 The different techniques also influence the specificity and sensitivity of the assay, with likely higher specificity of the LAIP method compared to higher sensitivity with the DfN approach.
43 The ELN experts would therefore recommend a combination of the 2 analysis methods, 5 recognizing that caution should be used for certain markers that may be transiently expressed in regenerating BM (eg, CD25, CD22, and CD15, from published and shared experience). 43 Regardless of the approach used, the panel suggests that particular attention be devoted to emerging clones, including in regions not originally included in the blast gate. An example of an emerging clone in the DfN approach can be found in Figure S2, Supplemental Digital Content, http://links.lww.com/HS/A214.
For clinical decision-making based on the presence of MRD, the threshold defining positivity is of crucial importance and may depend on the time point of sample collection during and after therapy, the treatment schedule and the AML subtype. In general, the threshold of 0.1% after 2 cycles of chemotherapy is used as a prognostic factor in AML for outcome and for clinical decisions regarding intensity of the consolidation treatment. 47 Although more research is required to validate the cutoff for a particular clinical situation, it is essential to be aware of the performance characteristics of the assay in the lower ranges of detection due to the interference of background events. For accurate knowledge of assay performance, 3 characteristics need to be considered: (1) LOB is the maximum number of LAIP cells measured in samples lacking leukemia (such as normal or regenerating BM or samples not stained with the antibody of interest) [$\mathrm{LOB} = \mathrm{mean}_{\mathrm{blank}} + 1.645 \times \mathrm{SD}_{\mathrm{blank}}$]; (2) limit of detection (LOD) is the minimal number of LAIP cells that can accurately be distinguished above background [$\mathrm{LOD} = \mathrm{LOB} + 1.645 \times \mathrm{SD}_{\mathrm{low\ positive}}$]; (3) LLOQ is the lowest LAIP% that can be reliably quantified relative to a defined acceptance criterion and is equal to or higher than LOD. 48 LOD is ideally established by measuring 10 samples having a very low positive LAIP in triplicate. A coefficient of variation (CV) of <30% is proposed to confirm acceptable LLOQ. 49 The LOD of DfN gates is estimated in the same manner as for the LAIP approach.
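The LOB and LOD formulas above translate directly into a few lines of code. The Python sketch below is illustrative only: it assumes that LAIP frequencies are entered in % of WBC, that the sample standard deviation (n - 1) is used, and that the CV is computed on replicate measurements of one low-positive sample; the function names and the numbers are hypothetical.

```python
from statistics import mean, stdev


def limit_of_blank(blank_values):
    """LOB = mean(blank) + 1.645 * SD(blank)."""
    return mean(blank_values) + 1.645 * stdev(blank_values)


def limit_of_detection(blank_values, low_positive_values):
    """LOD = LOB + 1.645 * SD(low positive)."""
    return limit_of_blank(blank_values) + 1.645 * stdev(low_positive_values)


def cv_percent(replicates):
    """Coefficient of variation (%) of replicate measurements."""
    return 100.0 * stdev(replicates) / mean(replicates)


# Hypothetical LAIP frequencies (% of WBC).
blanks = [0.001, 0.002, 0.003]         # samples lacking leukemia
low_positive = [0.010, 0.012, 0.014]   # very low positive sample, in triplicate

print(limit_of_blank(blanks))
print(limit_of_detection(blanks, low_positive))
print(cv_percent(low_positive) < 30.0)  # proposed acceptance criterion for the LLOQ
```

In a real validation, the blank set would contain at least the 10 control samples discussed earlier, and the low-positive set the 10 samples measured in triplicate.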
For multiple myeloma and CLL MFC-MRD, a precise number of events to define the LOD and LLOQ is suggested based on the CV calculated from Poisson statistics for rare events. 50,51 It should be emphasized that these represent theoretical estimates of the reproducibility of enumeration for small numbers of events and must be compared with a desired criterion for reproducibility to determine a theoretical LOD or LLOQ; for example, a desired CV of <10% would require at least 100 events in the population. In addition, they do not take into account the impact of noise or other assay performance characteristics, so they are a best-case scenario that must be confirmed through experiment for each assay. Applied to AML-MRD assessment, a cluster of 20 events carrying an aberrant phenotype can be sufficient for the recognition of MRD in a well-controlled assay and can represent the LOD, that is, whether an abnormal population is present or not. Similarly, a cluster greater than 50 events can be regarded as the threshold for a standardized and reproducible enumeration of rare populations and can represent the LLOQ if a CV of 14% is judged acceptable. 52,53 Thus, the acquisition of 500,000 to 1,000,000 events will allow a theoretical LOD of 0.004% and 0.002% and an LLOQ of 0.01% and 0.005%, respectively. Note that the values obtained by this approach will often differ from those obtained using data as defined in the prior paragraph. These and other approaches are currently being evaluated for AML. The FDA advises to technically validate the LLOQ of the assay 1-log below the chosen threshold for clinical decision-making. 54 This restricts lowering the threshold for MRD positivity and negativity that can be used in clinical studies, although lower MRD thresholds have been prospectively validated for prognostic impact, as has MRD positivity defined as any detectable MRD.
6,[55][56][57][58] The panel did not modify the suggestions given in the first release of the guidelines, so, after 2 cycles of intensive chemotherapy, the threshold of 0.1% on a denominator of 500,000-1,000,000 CD45-expressing relevant events is still the standard to be pursued. This aside, the LOD and LOQ of each determination should be specified and the clinical value of MRD above or below these limits should be actively researched.
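The Poisson-based figures quoted above can be reproduced with a short calculation. For a Poisson-distributed count of n events the CV is 1/√n, and the theoretical limit is the minimal cluster size divided by the number of acquired events. The Python sketch below uses the cluster sizes from the text (20 events for the LOD, 50 for the LLOQ); the function names are hypothetical.

```python
import math


def poisson_cv_percent(n_events):
    """Theoretical CV (%) for counting n_events rare events: 100 / sqrt(n)."""
    return 100.0 / math.sqrt(n_events)


def theoretical_limit_percent(min_cluster_events, total_events):
    """Smallest frequency (%) at which min_cluster_events can be collected."""
    return 100.0 * min_cluster_events / total_events


print(poisson_cv_percent(100))           # CV for 100 events
print(round(poisson_cv_percent(50), 1))  # CV for a 50-event cluster (LLOQ)

for total in (500_000, 1_000_000):
    lod = theoretical_limit_percent(20, total)
    lloq = theoretical_limit_percent(50, total)
    print(total, lod, lloq)
```

These are best-case estimates: they ignore background events and other sources of assay noise, so they complement rather than replace the experimentally determined LOB, LOD, and LLOQ.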

Reporting
In most multicenter clinical trials, the final MRD assay result is reported to the clinicians as "MRD-positive" or "MRD-negative," which makes the MRD results easily interpretable. However, this does not imply that all MRD detectable by the assay but below the "MRD-positive" threshold (such as 0.1%) holds no prognostic significance. The "technical MRD" below the 0.1% threshold with appearance of residual or emerging leukemic populations may also be described in the report to alert the clinician in case close monitoring with short follow-up intervals is advisable. For clinical decision-making, the MRD assay should be analytically validated based on the guidelines for rare events in MFC. 59 The accuracy and level of confidence for an MRD result are in part dependent on the number of relevant events acquired (CD45-expressing or mononuclear cells), as described earlier, so reported results should be qualified if insufficient cells are evaluated to meet the assay's validated LOD. In addition, reported results should be qualified if sample quality, cell viability, or hemodilution impair the performance characteristics of the assay and may result in a false-negative result. When a sample does not meet the required quality criteria, a repeat BM sample should be requested in 2-4 weeks when clinically indicated. The diagnostic conclusion of MRD testing and the informative value of the final amount of residual leukemic cells should be mutually agreed upon between experts from the laboratory and clinicians. When molecular MRD is also available, the results should be combined to further increase the probability of predicting relapse. 55,60

INTERLABORATORY VALIDATION
The working group proposes that a minimum set of CD markers/fluorochromes should be implemented as a prerequisite for harmonized MRD detection, including 5 of the suggested (see earlier) backbone markers (CD45, CD34, CD117, CD13, and CD33), 2 aberrant lineage markers (CD7, CD56), and HLA-DR. To compare data from different laboratories using these markers, a consensus ELN tube based on a large German study was validated. In addition, the clinically validated HOVON P1 tube, which is validated in several HOVON/SAKK/AML-SG protocols, was used as comparison. With both tubes, experts could identify a useful LAIP in 70%-90% of AML cases in different studies comprising several hundred patients and covering all WHO-AML subtypes except acute promyelocytic leukemia. 61 Both tubes consist of the same antigens but with different fluorochromes measured in different detectors (Figure 6). There are 2 different CD45 clones available for the ELN tube depending on the instrument used: HI30 MoAb (order number 560777) for BD Biosciences instruments (San Jose, CA) and J33 antibody (order number B36294) for Beckman Coulter instruments (Brea, CA). Both tubes were measured in a subset of 62 diagnosis and follow-up samples by 4 different laboratories, which showed that blast percentage (Pearson r = 0.99, P < 0.001) and MRD percentage (Pearson r = 0.98, P < 0.001) were significantly correlated (Figure 7A, B). Moreover, we selected 6 samples, which were measured in 1 laboratory using both HOVON P1 and ELN tubes, and compared the results solely based on the gating between 4 centers with 3 different methods of gating including a DfN approach. Parameters collected were as follows: WBC, blast percentage, and MRD percentage. Although a good concordance in MRD% was found between the tubes in each laboratory, there were some outlier results (in 5/24 analyses), occurring equally in both tubes (Figure 7C).
After a joint review and discussion of the results, it was agreed that the gating of CD45 and CD34 is crucial and that CD34 positivity was a little more distinct in the HOVON tube. As shown in the figure, some outliers were present in this comparison, but these were resolved after discussion by adjusting the gates and in 1 case by selecting another CD marker combination as LAIP. The presented results reflect the need for strict gating strategies and definition of useful LAIPs to have the best comparison between different laboratories. The LAIPs most often found using these tubes are a combination of CD34 or CD117 as primitive marker with: CD7+, CD56+, CD33-, and CD13-/CD33+ (an overview of the most used LAIPs from the consensus tube can be found in Table S1, Supplemental Digital Content, http://links.lww.com/HS/A214). A marker combination that was frequently seen in regenerating bone marrow and is therefore not recommended is CD34-CD117+HLA-DR+.

DISCUSSION
The number of centers using MFC as a tool to measure MRD is progressively increasing worldwide, and there is a concomitant demand for harmonization and qualification. This manuscript provides a consensus document for the optimal MFC-MRD assessments based on the combined multicenter expertise of the ELN-group members. MRD assessment using MFC should be performed in a specialized laboratory due to the high complexity of the analysis. Strict quality criteria should be put in place to ensure comparability of MRD results among laboratories and different clinical studies. Laboratories might have some room to adhere to their own local standard operating procedures, but baseline criteria are crucial. First, the flow cytometer settings should allow quality control and adequate scatter properties. Second, the chosen panel is recommended to contain backbone markers (recommended: CD45, CD34, CD117, CD13, CD33, and HLA-DR, see earlier) in each tube to clearly identify the cell populations of interest. The third requirement is a strict gating strategy that has been validated to measure LAIP cells not found in normal and regenerating BM. Finally, to use the MRD measurements for clinical decision-making, the assay needs to be fully validated (LOB, LOD, LLOQ, and other features such as precision and stability).
Lack of harmonization in technical approaches to MRD measurements may render MRD data comparison between studies challenging, as seen in some meta-analyses. 62 It may also reduce its prognostic value. 63 Yet, recently a meta-analysis of 81 studies with different MRD methods showed a clear prognostic value for MRD in uni- and multivariate analyses. 4 The implementation of a universal tube is a huge step to reduce interlaboratory variation. Both the ELN and the HOVON P1 tubes allowed good distinction of different cell populations and produced comparable intra- and interlaboratory results. Large multicenter clinical trials have been performed using these tubes and, when complemented with additional data, will allow further insights into optimization of MRD for clinical use. Although not accredited for AML-MRD and still in a pilot phase, the initiative of the United Kingdom National External Quality Assessment Service to standardize MRD measured by flow cytometry and add an external quality assessment is much appreciated. The round robin test performed by the initiative is an example of how a 4-eye principle and harmonizing results can lead to improved MRD measurements.
Interpretation of these findings is included in an update of the AML-MRD consensus manuscript in which the recommendations were substantiated using a Delphi poll (See also Heuser et al). 9,64 Interestingly, our expert panel on MFC-MRD had a mean overall consensus of 88% (range 76%-94%) in the last round of the Delphi poll. The main message of our intensive collaboration is that we may not need 1 method that fits all, but rather defined essential quality targets to achieve accurate and reliable MRD results.

Future perspectives
There are several new factors that might improve MRD assessment in the near future. Currently, 25%-30% of MRD-negative patients still develop a relapse, and some ways to reduce these false-negative results are being investigated. We might see a larger contribution of MRD based on leukemia stem cell (LSC) frequency because it has been shown to be of prospective additive value to MRD in AML risk classifications. 65,66 The largest drawback of LSC assessment is the effort needed to identify the very low frequency of the stem cells, which may not be present in all subtypes of AML with similar immunophenotypes. Moreover, LSC population testing also has to undergo the same standardization/harmonization steps as MRD testing by MFC (eg, instrument settings, reagents, and panel). One recent major step toward standardization is the fabrication of a single LSC tube, 67 which has been validated in a multicenter setting. 68 Implementation of prepared dry-tube kits could also limit technical variation, but research is still ongoing.
MRD assessment is mostly used for clinical decision-making after cycle 2 to select the appropriate consolidation and pretransplant therapy. Fewer MRD assay data are available to guide decision-making at other time points. It has previously been observed that the optimal time point for MRD measurement could be influenced by specific molecular aberrations. 69 Across different studies, different cutoffs are used to define MRD positivity (0%-0.1%). 52 The optimal cutoff is still under debate, in part because of the relatively high LOD and LLOQ of current assays. As the cutoff for clinical decision-making is lowered, LOD and LLOQ need to be lowered as well, and more sensitive and consistent approaches must be developed. Background events will also play a larger role at lower MRD cutoffs, increasing the chance of a false-positive result. Therefore, although this is a time-consuming effort, it is strongly advised to determine the LOB/LOD/LLOQ and background expression, particularly for the most frequently used LAIPs, to increase the accuracy of the assay. 43,45 Likewise, as a complement to BM, the use of PB for monitoring of MRD could improve the accuracy of MFC-MRD because it suffers less from background. 10,70 Single-center studies showed that MRD as measured using PB is highly specific and may therefore have a prominent role in clinical management and MRD monitoring after treatment. 10,11 Easier sampling could lower patient discomfort and allow MRD to be assessed more frequently. However, MRD monitoring can still be challenging in specific cases, such as AML with monocytic differentiation, where populations easily overlap, generating false-positive results. An extensive evaluation with cross-correlation is therefore warranted.
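As an illustration of how LOB, LOD, and LLOQ might be estimated from replicate measurements, the following minimal Python sketch uses the commonly cited parametric approach (LOB = mean of blanks + 1.645 SD of blanks; LOD = LOB + 1.645 SD of a low-level sample). This is an assumption for illustration and not necessarily the procedure used by the participating laboratories. The function names, the input layout, and the 20% coefficient-of-variation acceptance criterion for LLOQ are hypothetical, and all values are expressed as percentages of LAIP events.

```python
from statistics import mean, stdev

# One-sided 95th percentile of the standard normal distribution.
Z_95 = 1.645


def limit_of_blank(blank_values):
    """LOB: highest apparent signal expected in samples without leukemic cells."""
    return mean(blank_values) + Z_95 * stdev(blank_values)


def limit_of_detection(blank_values, low_values):
    """LOD: lowest level reliably distinguished from the LOB.

    low_values are replicates of a sample with a low, known LAIP frequency.
    """
    return limit_of_blank(blank_values) + Z_95 * stdev(low_values)


def lower_limit_of_quantification(replicates_by_level, max_cv=0.20):
    """LLOQ: lowest tested level whose replicate CV meets the precision target.

    replicates_by_level maps a nominal level to its replicate measurements.
    max_cv is a hypothetical acceptance criterion, not taken from the text.
    Returns None when no tested level is precise enough.
    """
    for level in sorted(replicates_by_level):
        values = replicates_by_level[level]
        if stdev(values) / mean(values) <= max_cv:
            return level
    return None


if __name__ == "__main__":
    # Made-up replicate values, for demonstration only.
    blanks = [0.001, 0.002, 0.003]
    low_sample = [0.010, 0.012, 0.014]
    print(limit_of_blank(blanks))
    print(limit_of_detection(blanks, low_sample))
```

In such a scheme, each frequently used LAIP would be characterized separately with its own blank (normal or regenerating BM) and low-level replicates, because background expression differs between LAIPs.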
Recent developments in the field of precision medicine and targeted therapy for AML have led to an interest in finding specific molecular aberrations. New techniques, such as NGS, may add further value in predicting the recurrence of leukemia. 60,71 These techniques come with their own disadvantages and are to date less routinely applicable than flow cytometry but represent additional tools to increase the prognostic sensitivity and specificity of MRD detection. 72 Currently, most laboratories still perform the gating strategy manually, which is claimed to be time-consuming, subjective, and expert dependent. Hence, automated computational analysis of MFC-MRD data, for example with FlowSOM, is promising and has gained a lot of interest. 20,[73][74][75] Unsupervised analysis has also been applied in the pretransplant setting. 76 The future application of unsupervised/machine learning approaches to data analysis may reduce interobserver variability and therefore contribute to the harmonization of MRD results.

CONCLUDING REMARKS
In conclusion, MRD measured by MFC has become a relevant tool in an increasing number of hematology centers. Implementing the technique comes with challenges due to its complexity, in particular for AML, in which clonal heterogeneity prohibits a "one size fits all" approach. In the future, consensus approaches, such as attempted in this paper, may contribute to reducing subjectivity and to determining a common analytical backbone for MFC-MRD. This will facilitate interlaboratory comparisons of MFC-MRD results for daily practice and clinical trials and enable future meta-analyses incorporating MFC-MRD big data.
Take home messages
• MRD should be assessed from a small volume (<5 mL) of the first pull of a bone marrow aspirate to prevent hemodilution;
• Bone marrow samples should be stored at ambient conditions, analyzed within 72 hours after collection, and processed using SLW or LWSW procedures;
• At follow-up, collect a minimum of 500,000 CD45-expressing events and 100 viable cells in the blast compartment assessed for aberrancy(s) for determining MRD negativity;
• For clinical decision-making, MRD assessment should be performed with a qualified assay, including adequate LOB, LOD, and LLOQ with a harmonized use of the integrated LAIP and DfN strategy;
• Using these technical requirements with own in-house protocols, substantial concordance between different laboratories can be achieved as demonstrated with the consensus tube (composed of CD34, CD13, CD7, CD33, CD56, CD117, HLA-DR, and CD45).
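The acquisition thresholds listed above can be expressed as a simple check. The following Python sketch is illustrative only: the class and function names are hypothetical, the MRD frequency is assumed to be reported as a percentage of CD45-expressing events, and the default cutoff of 0.1% is merely the upper end of the range of cutoffs used across studies, not a recommended value.

```python
from dataclasses import dataclass

# Minimum event counts taken from the take-home messages above.
MIN_CD45_EVENTS = 500_000
MIN_BLAST_EVENTS = 100


@dataclass
class MrdAcquisition:
    """Event counts from one follow-up sample (hypothetical data layout)."""
    cd45_events: int          # CD45-expressing events acquired
    viable_blast_events: int  # viable events in the blast compartment
    laip_events: int          # events in the aberrant (LAIP/DfN) gate


def mrd_percent(acq):
    """Aberrant events as a percentage of CD45-expressing events (assumed denominator)."""
    return 100.0 * acq.laip_events / acq.cd45_events


def acquisition_is_adequate(acq):
    """True when enough events were collected to support an MRD-negative call."""
    return (acq.cd45_events >= MIN_CD45_EVENTS
            and acq.viable_blast_events >= MIN_BLAST_EVENTS)


def classify(acq, cutoff_percent=0.1):
    """Return 'positive', 'negative', or 'inadequate' for one sample.

    cutoff_percent is an illustrative default, not a recommendation.
    """
    if mrd_percent(acq) >= cutoff_percent:
        return "positive"
    if acquisition_is_adequate(acq):
        return "negative"
    return "inadequate"


if __name__ == "__main__":
    sample = MrdAcquisition(cd45_events=600_000,
                            viable_blast_events=150,
                            laip_events=300)
    print(mrd_percent(sample), classify(sample))
```

In this sketch, a sample below the cutoff is only called negative when the minimum event counts are met; otherwise it is flagged as inadequate rather than negative. A real reporting rule would also have to take the validated LOD and LLOQ of the assay into account.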