Introduction

AML is a broad diagnostic category encompassing a genetically heterogeneous set of myeloid malignancies,1, 2, 3, 4 with molecularly distinct subgroups displaying different outcomes to treatment5, 6 and the potential for intrapatient oligoclonality such that the predominant clone at presentation is not necessarily the leukemic clone ultimately responsible for clinical relapse and death.7 Relapse of AML after allo-SCT has a dismal prognosis and remains the most common form of treatment failure.8, 9, 10, 11, 12

For AML patients in CR after initial treatment the use of high-sensitivity tests to detect measurable residual disease (MRD,13, 14, 15, 16, 17, 18, 19, 20) prior to allo-SCT can identify patients with higher rates of relapse and death compared with MRD-negative patients and may supersede stratification based on clinical discriminators, such as CR1 vs CR2.21 In situations where highly specific assays are available, such as core-binding factor AML, MRD is the most important prognostic factor for relapse prediction even when high-risk clinical features and pre-treatment molecular risk stratification markers are considered,22, 23 and can be used to optimize subsequent treatment.24 Unfortunately, due in part to the heterogeneity of AML, no single universal AML MRD assay has yet been developed.25, 26

WT1 is a tumor suppressor gene that encodes a zinc-finger transcription factor. It is expressed in ~85–90% of AML cases and is aberrantly overexpressed, to an extent that allows its use as an MRD marker, in between 46 and 74% of cases.27, 28, 29, 30 WT1 expression by RQ–PCR27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39 has been tested extensively for AML MRD, as have flow cytometry approaches,40, 41, 42, 43, 44, 45, 46 but unfortunately neither method is applicable to all AML patients. We therefore sought to test whether a novel AML MRD approach incorporating detection of multiple potentially overexpressed genes could offer superior pre-SCT relapse risk stratification compared with the use of WT1 alone.

Patients and methods

Patient eligibility

Laboratory analysis was performed on samples previously collected from AML (excluding APL) patients who had received allo-SCT as part of clinical protocols (MB, AJB) performed between 1994 and 2012. These were predominately myeloablative, T-cell depleted, PBSC sibling-matched allogeneic transplants with cyclosporine-based GVHD prophylaxis (Supplementary Table S1). Eligibility criteria for MRD testing were a stored pre-SCT peripheral blood sample, pathological evaluation of disease status within 2 months prior to transplant date and at least 12 months of post-SCT clinical outcome data or until date of death if this occurred before 12 months. All clinical protocols were conducted in accordance with Declaration of Helsinki principles and were approved by the institutional review board. Written informed consent was obtained from all subjects.

Clinical samples

Peripheral blood samples from 50 healthy adult donors were collected as baseline controls. Patient samples immediately prior to transplantation were taken from aliquots of a research leukapheresis product processed using Ficoll-Hypaque isolation, followed by freezing and storage in the vapor phase of liquid nitrogen. RNA was isolated from peripheral blood samples of both patients and healthy donors using AllPrep Mini Kits (Qiagen, Valencia, CA, USA), and assessed using a NanoDrop 1000 Spectrophotometer (Wilmington, DE, USA).

Real-time PCR array

One μg of total RNA was reverse-transcribed into cDNA using the RT2 First Strand Kit (Qiagen). Custom RT2 Profiler PCR array plates including controls for human genomic DNA contamination, reverse transcription and PCR efficacy (SABiosciences, Qiagen) were used for RQ–PCR reactions performed using RT2 SYBR Green ROX qPCR Mastermix (SABiosciences, Valencia, CA, USA) on an ABI 7900 thermal cycler (Applied Biosystems, Foster City, CA, USA) as described previously.47

Data handling and statistical analysis

During the study period, the format and content of pathologist reports at our institution varied; therefore the term ‘active disease’ (relapsed or refractory) has been used here to refer to the current consensus AML response criteria48 of at least 5% blasts on BM aspirate differential and/or an AML-defining chromosomal abnormality. Clinical annotation of samples for pathological diagnosis and clinical outcome was performed independently by two physicians (KL, NJ) blinded to research laboratory testing results. Statistical analysis was performed using GraphPad Prism (La Jolla, CA, USA) with comparison between survival and relapse curves using the log-rank (Mantel–Cox) test, McNemar’s test as described,49 and receiver-operating characteristic curves created using StAR.50

Results

Patients

Eligible samples were identified from 74 of the 111 AML (excluding APL) patients transplanted using these protocols during the study time period (Table 1; Supplementary Table S2; Table 2). Patients studied were transplanted a median of 9.9 years (range: 1.75–20.2 years) before this laboratory analysis.

Table 1 Patient characteristics
Table 2 Individual patient characteristics

Twenty-two patients suffered non-relapse mortality (30%), 18 within the first year after allo-SCT (24%). Twenty-eight (38%) patients relapsed, a median of 99 days after allo-SCT (range 33–1089 days); relapse was fatal in 96% of cases, with one patient still alive 81 days after relapse at the time of analysis. The median survival after relapse was 85 days (range 0–1328). Twenty-six patients (35%) had active disease immediately prior to allo-SCT; only three were alive at time of analysis (12%) including one with ongoing relapse.

Design and validation of a multigene (MG-MRD) RQ–PCR array for AML

We previously reported expression of potential leukemia-associated antigens51 using a highly reproducible RQ–PCR array in primary untreated human AML samples.47 We reanalyzed this data set to focus on 13 out of 30 peripheral blood samples that lacked WT1 expression of at least 50-fold higher than the median seen in normal individuals (Figure 1). We identified that a combination of cyclin A1 (CCNA1, gene ID: 8900), proteinase 3 (PRTN3, gene ID: 5657), preferentially expressed antigen in melanoma (PRAME, Gene ID: 23532) and mesothelin (MSLN, Gene ID: 10232) (Supplementary Table S3) could provide at least one gene overexpressed at least 50-fold in 9 of these 13 patients, which together with WT1 would provide at least one gene highly overexpressed in 26 of all 30 newly diagnosed untreated patients (that is, 87% sensitivity for at least one 50-fold overexpressed gene).
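The panel-selection logic in the preceding paragraph can be illustrated with a minimal sketch. It assumes relative expression is computed by the common 2^-ΔΔCt convention against the normal-donor median (the exact formula is not stated here), and it uses invented fold-change values for four hypothetical samples rather than study data. The function names `fold_change` and `panel_coverage` are illustrative only.

```python
from typing import Dict, Iterable, Tuple

FOLD_CUTOFF = 50.0  # from the text: at least 50-fold above the normal-donor median


def fold_change(delta_ct_sample: float, delta_ct_normal_median: float) -> float:
    """Relative expression via 2^-ddCt (assumed convention; dCt = Ct_gene - Ct_ABL)."""
    return 2.0 ** -(delta_ct_sample - delta_ct_normal_median)


def panel_coverage(fold_by_sample: Dict[str, Dict[str, float]],
                   panel: Iterable[str],
                   cutoff: float = FOLD_CUTOFF) -> Tuple[int, int]:
    """Return (samples with >= 1 panel gene at or above the cutoff, total samples)."""
    genes = list(panel)
    covered = sum(
        1 for folds in fold_by_sample.values()
        if any(folds.get(gene, 0.0) >= cutoff for gene in genes)
    )
    return covered, len(fold_by_sample)


# Invented fold-change values for four hypothetical samples (NOT study data).
demo = {
    "S1": {"WT1": 120.0, "PRAME": 3.0},
    "S2": {"WT1": 10.0, "PRAME": 75.0},
    "S3": {"WT1": 5.0, "CCNA1": 49.0},
    "S4": {"WT1": 2.0, "MSLN": 200.0},
}

wt1_only = panel_coverage(demo, ["WT1"])
full_panel = panel_coverage(demo, ["WT1", "CCNA1", "PRTN3", "PRAME", "MSLN"])
print("WT1 alone:", wt1_only, "five-gene panel:", full_panel)
```

Applied to the counts reported above, the same tally gives 17 of 30 samples covered by WT1 alone and 26 of 30 (87%) once the four additional genes are included.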

Figure 1

Overexpressed genes in AML patients with low levels of WT1 expression. Samples of peripheral blood from 30 newly diagnosed, untreated, AML patients were assayed for gene overexpression by quantitative real-time PCR as previously described.47 Data were reanalyzed to identify genes overexpressed at least 50-fold in those patients with less than 50-fold WT1 overexpression compared with the normal donors. Red: 50–99-fold overexpression; bright red: 100-fold or greater overexpression; black: not detected; white: transcript detected but less than 50-fold overexpressed compared with normal donors. CCNA1, cyclin A1; MSLN, mesothelin; PRAME, preferentially expressed antigen in melanoma; PRTN3, proteinase 3; WT1, Wilms tumor 1.

A custom RQ–PCR array containing these genes together with the housekeeping gene c-abl (ABL, gene ID: 25) was then tested using peripheral blood from 50 reportedly cancer-free healthy donors (Figure 2; Supplementary Figure S2). For each gene tested, a single high-expression outlier was removed. For WT1, PRAME, CCNA1 and PRTN3, a threshold value for MRD was then established as the 98th percentile expression level seen in healthy donors plus a ‘cushion’ of one half of the s.d. (Figure 2). This is referred to as the 4-gene panel (4G-MRD), with elevation of any gene above its threshold classified as 4G-MRD positive. An exception to this method was used for MSLN, as a large s.d. was observed in expression levels in normal donors; the threshold value was therefore determined based on the maximum value seen in pre-transplant samples of those who did not relapse post transplant. Accordingly, as the set used for MSLN threshold discovery is part of the cohort under investigation, further validation will be required, and results from this ‘5G-MRD’ panel are therefore reported separately. As before, elevation of any one of the five genes above its threshold was considered MRD positive.
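The donor-based threshold rule described above (remove one high outlier, take the 98th percentile, add half an s.d.) can be sketched as follows. Several details are assumptions made for illustration: the percentile uses linear interpolation, the s.d. is the sample s.d. of the donor values after outlier removal, and the names `percentile`, `mrd_threshold` and `is_mrd_positive` are hypothetical. The donor values are synthetic.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile with linear interpolation between order statistics (assumed method)."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def mrd_threshold(donor_values: Sequence[float]) -> float:
    """98th percentile of healthy-donor expression plus half an s.d.,
    after dropping the single highest value as the outlier."""
    trimmed = sorted(donor_values)[:-1]      # remove one high-expression outlier
    cushion = 0.5 * statistics.stdev(trimmed)  # assumption: sample s.d. of trimmed values
    return percentile(trimmed, 98.0) + cushion


def is_mrd_positive(sample: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Positive if any panel gene exceeds its own threshold."""
    return any(sample.get(gene, 0.0) > limit for gene, limit in thresholds.items())


# Synthetic donor values (NOT study data): 1..50 plus one extreme outlier.
donors = [float(v) for v in range(1, 51)] + [1000.0]
wt1_threshold = mrd_threshold(donors)

thresholds = {"WT1": wt1_threshold, "PRAME": 10.0}  # PRAME limit invented for the demo
print(round(wt1_threshold, 3))
print(is_mrd_positive({"WT1": 5.0, "PRAME": 12.0}, thresholds))
```

The MSLN threshold in the text was set differently (from non-relapsing patients in the study cohort), so it is deliberately not derived by this function.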

Figure 2

Patterns of MG-MRD expression. Level of expression of the constituent gene transcripts of the MG-MRD array in normal healthy donor control and pre-transplant AML patient peripheral blood samples. Gene expression is normalized to expression of the ABL control gene. Selected threshold values are indicated by a horizontal dotted line. Two non-relapsing patients with pathologist-detectable disease pre-SCT also had 4G-MRD values above the threshold, one with both WT1 and CCNA1, the other with WT1 and PRTN3. Several donors in this historical self-described healthy donor cohort had MSLN expression above the selected threshold; MSLN is not a specific AML antigen and has also been described as being overexpressed in a variety of solid tumors.62 CCNA1, cyclin A1; MSLN, mesothelin; PRAME, preferentially expressed antigen in melanoma; PRTN3, proteinase 3; WT1, Wilms tumor 1.

Comparison between traditional BM pathological diagnosis and peripheral blood molecular testing

Pathological examination of the BM for assessment of both treatment response and relapse is highly integrated into our care of patients with AML.48 We wished to test how our peripheral blood-based molecular assay compared with traditional BM testing in the assessment of clinically evident AML disease burden prior to allo-SCT. Twenty-six patients (35%) had a clinical diagnosis of active disease prior to transplant based on BM examination (Table 1). Molecular testing of peripheral blood from the same pre-transplantation time period identified 22 patients using WT1 testing (85% concordance) or 24 using multiple gene MRD (MG-MRD) testing (92% concordance) (Figure 3a). The two patients identified by both morphological criteria and MG-MRD, but not WT1 testing alone, suffered from very early relapse (84 and 96 days) and are therefore likely to represent active (but WT1 negative) disease at the time of allo-SCT.

Figure 3

MG-MRD correlates with pathological diagnosis and relapse risk. (a) Peripheral blood-based MG-MRD testing has good concordance with pathologist BM diagnosis pre-SCT. Blue: pathologist determination of active disease based on clinical examination of BM (Path+), but negative for peripheral blood MRD testing. Purple: both BM pathological diagnosis and MRD positive. Red: negative for active AML by pathologist examination (‘remission marrow’), but positive for residual disease by peripheral blood-based MRD testing. (b) MG-MRD can identify patients mistakenly classified as low risk by WT1 MRD. Patients with pre-SCT positivity for WT1 represent the high-risk group. MG-MRD testing can identify additional high-risk individuals from the ‘low risk’ WT1-negative group. The mortality in the additional nine patients (12%) reclassified as high risk was 100%. Six of those nine patients reclassified as high risk by MG-MRD were in a CR pre-transplant. (c) Pre-SCT MG-MRD testing improves prediction of early clinical relapses post SCT. All 28 relapses post SCT are plotted by day of clinical relapse aligned with the result of pre-SCT MRD testing. MG-MRD prior to transplantation correctly predicted all relapses in the first 180 days after SCT and was particularly useful in correctly identifying those at risk of early (that is, before median relapse of 99 days) relapse but not identified by WT1 testing. Patients relapsing at 33, 56, 59, 87, 91, 122, 181, 303, 304, 493 and 1089 days post SCT were in a CR prior to SCT (bold).

One of the two patients previously categorized as having active disease by BM examination but not identified by MG-MRD molecular testing of peripheral blood was an atypical case with only 5% blasts prior to transplant, but with cytogenetics showing a leukemia-defining t(8;21) translocation in 4 of 20 metaphases. This patient suffered relapse 275 days after transplantation.

Risk stratification based on BM examination or WT1 testing

Our hypothesis was that highly sensitive detection of disease burden prior to allo-SCT could offer improved prediction of both relapse and survival. Traditional response criteria based on BM examination could risk-stratify these 74 patients into two groups: the ‘CR’ group (n=48) with a 3-year OS of 48%, and a group of 26 patients with active disease at the time of allo-SCT with a 3-year OS of only 15%.

Peripheral blood testing for WT1 overexpression identified 25 patients as positive, who experienced a post-SCT 3-year OS of only 8%. WT1 testing, however, failed to identify 9 of the 24 patients who relapsed in the first year post allo-SCT (63% sensitivity) (Figure 3c; Table 3). For patients in CR before SCT, WT1 correctly predicted just three of nine relapses in the year after SCT. Excluding patients who did not have the opportunity to relapse due to non-relapse mortality, WT1 has an excellent positive predictive value of 83–100% for occurrence of relapse in the year following transplantation, but failed to identify the majority of CR patients who relapsed in this time period (Figure 4b; Table 3).

Table 3 Summary of MRD test performance characteristics (relapse 1 year post SCT)
Figure 4

MG-MRD allows stratification into highly polarized groups for survival and relapse risk. (a) Three-year OS. MG-MRD can effectively segregate patients based on pre-SCT peripheral blood gene expression profile into groups with high and low risk of death following transplantation. The color pie chart below each survival curve illustrates the fraction of the entire cohort triaged to the high-risk category (red) based on the MRD method used. (b) Relapses in the first year after transplantation. The analysis includes only patients in pathologist-confirmed CR prior to transplantation and excludes patients dying of non-relapse causes. The green pie chart illustrates sensitivity of each pre-SCT MRD test to predict relapse in the year following transplantation. Statistical analysis was performed using GraphPad Prism with comparison between survival and relapse curves performed using the log-rank (Mantel–Cox) test.

Pre-SCT multigene MRD testing for prediction of post-SCT outcomes

On the premise that the addition of individually suboptimal secondary MRD assays to the backbone of WT1 testing would increase the overall sensitivity of MRD assessment, we applied the MG-MRD array to peripheral blood samples from this same pre-transplant time point using either WT1 plus three other genes (4G-MRD) or 4G-MRD plus a fifth gene (5G-MRD). MG-MRD risk stratification was superior to the use of WT1 RQ–PCR-based MRD alone. Use of additional genes in the MRD assay correctly reclassified patients assigned by WT1 testing to a ‘low risk’ MRD-negative category into a ‘high risk’ MRD-positive category (Figure 3b). In total, all nine patients additionally identified using MG-MRD from the cohort designated as ‘low risk’ based on testing negative for WT1 alone relapsed and/or died in the first year after transplantation (Figure 3b). Seven of these nine patients had been determined to be in pathological CR prior to allo-SCT. Six relapsed within 100 days of allo-SCT (four had been determined clinically to be in CR at time of allo-SCT), another relapsed 304 days after transplant and two died of non-relapse mortality in the first year after SCT. The sensitivity in predicting clinical relapse in the year after transplant was improved (92%) compared with using WT1 alone (63%). Importantly, pre-SCT MG-MRD testing correctly predicted every case of early relapse within 100 days after allo-SCT compared with just 57% using WT1 alone (Figure 3c). The 46% of patients who were 5G-MRD positive prior to transplantation had a 3-year OS of just 9% and experienced 22 out of the total of 24 relapses observed in the first year following transplantation.
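As a worked check of the sensitivity figures quoted above, the short sketch below recomputes them from the relapse counts given in the text (24 first-year relapses, 9 of them missed by WT1, 22 identified by 5G-MRD). The helper name `sensitivity` is illustrative; no data beyond these counts are assumed.

```python
def sensitivity(detected: int, total_events: int) -> float:
    """Fraction of true events (here, first-year relapses) flagged by the pre-SCT test."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return detected / total_events


RELAPSES_FIRST_YEAR = 24                       # from the text
wt1_detected = RELAPSES_FIRST_YEAR - 9         # WT1 missed 9 of the 24 relapses
mg5_detected = 22                              # 5G-MRD identified 22 of the 24

wt1_sens = sensitivity(wt1_detected, RELAPSES_FIRST_YEAR)
mg5_sens = sensitivity(mg5_detected, RELAPSES_FIRST_YEAR)

print(f"WT1 alone: {wt1_sens:.1%}")
print(f"5G-MRD:    {mg5_sens:.1%}")
```

The unrounded values are 62.5% and 91.7%, reported in the text as 63% and 92%.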

In patients entering transplant in CR, the improvement in sensitivity of pre-allo-SCT MRD assessment using MG-MRD (89%) versus WT1 alone (33%) to predict 1-year post-allo-SCT relapse was pronounced (Figure 4b; Table 3) and statistically significant (P=0.031, 95% confidence interval: 0.0–0.821, McNemar’s exact test, Supplementary Figure S3). There was no statistically significant difference in specificity between these tests. MG-MRD prior to allo-SCT identified eight of the nine CR patients who relapsed in the first year (Figures 3c and 4b). MG-MRD in patients in CR prior to transplant was associated with a positive predictive value of 100% for death following transplant (all 4G-MRD-positive patients relapsed and died; all 5G-MRD-positive patients died, 80% from relapse, Figure 3a).
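The paired comparison above can be reproduced approximately with an exact McNemar calculation. This is a minimal sketch, not the StAR implementation cited in the methods. It assumes that, because WT1 is itself part of the MG-MRD panel, every relapse flagged by WT1 is also flagged by MG-MRD, so the discordant pairs among the nine CR relapsers are 8 − 3 = 5 (MG-MRD positive, WT1 negative) and 0 (the reverse). The function name `mcnemar_exact` is hypothetical.

```python
from math import comb
from typing import Tuple


def mcnemar_exact(b: int, c: int) -> Tuple[float, float]:
    """Exact (binomial) McNemar test on discordant pair counts b and c.

    Returns (one_sided_p, two_sided_p) under the null that a discordant pair
    falls in either direction with probability 0.5.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return tail, min(1.0, 2.0 * tail)


# Discordant pairs among the nine CR patients who relapsed in year one
# (derived from the counts in the text under the assumption stated above).
mg_only = 8 - 3   # detected by MG-MRD but not by WT1
wt1_only = 0      # detected by WT1 but not by MG-MRD

one_sided, two_sided = mcnemar_exact(mg_only, wt1_only)
print(one_sided, two_sided)
```

Under these assumptions the one-sided value is 0.03125, which rounds to the reported P=0.031, and the two-sided value is 0.0625. The text does not state which convention the authors' software applied, so this agreement should be read as a numerical observation only.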

Discussion

WT1 remains the best single universal molecular marker for detection of AML minimal residual disease.27, 29, 31, 34, 35, 39, 52 It is, however, an imperfect test suitable for use in MRD detection in only ~50–75% of AML cases27, 28, 47 whilst also exhibiting significantly different patterns of expression in different AML subtypes.27, 28, 32 Rather than searching for a single superior alternative marker to WT1, we attempted to mitigate the limitations of this good, but suboptimal, test of AML MRD by creating a panel that allowed detection of not only WT1 but also other genes that may be overexpressed in those AML samples with lower levels of WT1 expression (Figure 1). In prior studies, those AML patients in CR without detectable WT1 expression prior to allo-SCT were assigned to a ‘low risk’ group with a reported relapse rate between 20%34 and 27%,31 which compares well with the 24% 3-year relapse rate observed in the WT1-negative group in this large retrospective study. By incorporating other genes in addition to WT1 for the detection of MRD, our MG-MRD assay correctly reclassified, even when assessed prior to allo-SCT, between 10 and 20% of patients from the WT1 MRD-negative group into a high-risk group with 100% mortality (Figures 3 and 4). In this historical single-institution cohort, the 49% of patients (36/74) with detectable disease based on traditional BM morphological examination and/or peripheral blood 5G-MRD positivity prior to allo-SCT experienced a 3-year relapse-free survival rate of only 6%.

Active disease identified by BM examination prior to allo-SCT is a well-known risk factor for relapse and death post allo-SCT,53 and in our series was associated with a mortality rate of 88% (23 of 26 patients). Notably, even for those patients in CR prior to allo-SCT, the presence of pre-transplantation 5G-MRD positivity in peripheral blood testing (seen in 10 patients, Figures 3a and 4b) was associated with survival statistically indistinguishable from those with pathologist BM-based diagnosis of active disease (P=0.78, curve comparison by a log-rank Mantel–Cox test, Supplementary Figure S1) and with a 100% mortality rate (at least 80% of which was attributable to relapse).

Testing peripheral blood prior to transplantation, 4G- and 5G-MRD assays identified 20 or 22, respectively, of the 24 patients who would suffer hematological relapse in the first year after allo-SCT (Figure 3c). Excluding those who died of the competing risk of non-relapse mortality, all three ‘false positive’ tests were also seen with WT1 testing, and no additional false-positive results were created by MG-MRD testing. Notably, one of these three ‘false positives’ did relapse in the second year after transplant. The two others were also positive for active disease prior to transplantation by marrow morphology, and likely represent successful cures by allo-SCT.

Although this multigene MRD assay represents an advance in high-sensitivity RQ–PCR-based detection of AML, several caveats should be noted. (1) Sensitivity to predict post-allo-SCT relapse was improved compared with use of WT1 alone, but remained less than 100% (67–92%, Table 3) due to the inability to detect late relapses based solely on pre-SCT assessment (Figure 3c). Surveillance monitoring for MRD in the post-SCT period may mitigate this. (2) Although prediction based on detectable WT1 expression in peripheral blood (that is, above the background threshold level seen in normal healthy donors) was in good agreement with that reported in prior studies,27, 28, 47 we did not use the ELN-validated WT1 PCR test27 as a direct comparator, and there is a possibility that the performance characteristics of the WT1 test could be further improved. Nevertheless, WT1 itself was also a core constituent of the multigene panel, and further optimization of the primer combinations used to detect other genes assayed by our multigene MRD panel is possible. Future work will evaluate use of more sophisticated methods for threshold setting, including machine-learning approaches.54 (3) Thresholds most predictive of relapse risk may vary between transplants with different conditioning and immunosuppression regimens and different AML subtypes. The transplant protocols in this series (Supplementary Table S1) were broadly similar. Future work will also collect detailed information on AML cytogenetic and molecular subtypes, which was not available here, and test a wider variety of transplant approaches, including reduced intensity conditioning and alternative donor transplants. (4) Finally, it should be emphasized that findings from this proof-of-concept study require validation in an independent cohort to determine the optimum sequence and specificity of the panel of primers that give maximum breadth of coverage and reliably detect AML in low-disease burden states. 
For example, although only two (of 24) relapses in the year following allo-SCT were not predicted by the 5G-MRD panel tested on pre-allo-SCT peripheral blood samples, it is notable that one of these had a diagnosis of AML based on a karyotype that included the leukemia-defining t(8;21) translocation in the context of a low (~5%) level of CD34-positive cells on immunohistochemistry of BM. Other reports have previously commented on the trend for lower WT1 expression in AML containing the t(8;21) fusion protein AML1-ETO.27, 28 Subsequent iterations of this MG-MRD array will now also contain primers specific for t(8;21), inv(16),55 mutated NPM156 and potentially also other recurrent1, 57, 58 or personalized somatic mutations seen in AML. The other patient falsely classified as negative, based on pre-SCT MG-MRD testing, had a confirmed CR by BM examination but had a history of CNS involvement by AML.

The ability, using MG-MRD, to identify a group with exceptionally high post-transplant mortality, primarily from disease relapse, sets the stage for MRD-based clinical interventions based on relapse risk. These could include maintenance therapy with additional agents post allo-SCT to prevent relapse in MRD-positive patients,59, 60, 61 and de-escalation of conditioning in patients who are MRD negative prior to allo-SCT to avoid excessive toxicity. In our series, 13 of the 15 deaths seen in the 3 years following transplantation in the 38 pathological CR, MRD-negative patients were due to non-relapse causes. Direct assessment of tumor burden, and hence relapse risk, prior to transplantation could potentially help ‘tune’ immunosuppression and T-cell dose strategy. Given that some MRD-negative patients may already be cured prior to transplant, it may also be able to identify a subset of individuals who could be spared allo-SCT.