Positron Emission Tomography Score Has Greater Prognostic Significance Than Pretreatment Risk Stratification in Early-Stage Hodgkin Lymphoma in the UK RAPID Study.

PURPOSE Accurate stratification of patients is an important goal in Hodgkin lymphoma (HL), but the role of pretreatment clinical risk stratification in the context of positron emission tomography (PET)-adapted treatment is unclear. We performed a subsidiary analysis of the RAPID trial to assess the prognostic value of pretreatment risk factors and PET score in determining outcomes.


INTRODUCTION
The goal of Hodgkin lymphoma (HL) treatment is to optimize patient outcomes by maximizing cure while minimizing toxicity. Cure rates for early-stage HL are high, but treatment toxicity reduces long-term survival and confers significant morbidity. 1,2 Risk-adapted treatment strategies can potentially address this but are reliant on accurate risk stratification of patients to facilitate individualized treatment approaches. The German Hodgkin Study Group (GHSG) and the European Organisation for Research and Treatment of Cancer (EORTC) have developed clinical prognostic scores for early-stage HL that are frequently used to risk stratify patients for treatment selection, 3-5 but it is unclear whether these scores have sufficient specificity to predict outcomes with modern combined-modality treatment. 6
Over the past decade, early response to treatment assessed by [18F]fluorodeoxyglucose (FDG) positron emission tomography (PET) has emerged as a powerful prognostic indicator in HL. 7,8 PET-guided approaches have been evaluated in trials 9-13 and successfully implemented in clinical practice, 14 as one of the first applications of personalized medicine. The introduction of the 5-point scale for PET reporting helped to standardize image interpretation 15 and allowed the threshold used to define a positive PET scan to be adapted according to the research question. 16 In trials involving patients with HL, a positive PET scan has been defined as either FDG uptake greater than the normal mediastinum (PET score, 3, 4, or 5) or equal to or greater than the liver uptake (score, 4 or 5), partly dependent on whether the study intervention involves treatment escalation or de-escalation. 16 Little is known about the predictive value of individual PET scores, and it remains unclear whether all PET-positive or -negative patients derive equal benefit from PET-adapted approaches. 17
The randomized H10 study demonstrated that patients with early-stage HL with a positive PET scan after two cycles of doxorubicin, bleomycin, vinblastine, and dacarbazine (ABVD) benefit from treatment intensification. 9 Patients with positive PET scans had a progression-free survival (PFS) advantage when switched to escalated bleomycin, etoposide, doxorubicin, cyclophosphamide, vincristine, procarbazine, and prednisone (escBEACOPP), compared with continuing ABVD, although they experienced greater toxicity. PET positivity in H10 was defined by International Harmonization Project (IHP) criteria (broadly equivalent to PET scores of 3, 4, or 5 18 ), and it is unknown whether all patients with a PET score of 3 or higher derive equal benefit from treatment escalation. We performed this subsidiary analysis of the United Kingdom (UK) National Cancer Research Institute (NCRI) RAPID (Randomised Phase III Trial to Determine the Role of FDG-PET Imaging in Clinical Stages IA/IIA Hodgkin's Disease) study, 11 in which all PET-positive patients continued ABVD, to explore whether a subset of PET-positive patients with early-stage HL could be adequately treated with ABVD and radiotherapy. Our aim was to investigate the associations of PET score after three cycles of ABVD, pretreatment risk factors, and clinical prognostic scores with patient outcomes.

Study Design
The RAPID trial was one of the first to use a PET-adapted treatment approach in early-stage HL. 11 The primary objective of this phase III noninferiority study was to investigate whether PET response could be used to omit radiotherapy in selected patients and reduce late toxicity. The trial design and randomization procedures have been published. 11 In brief, patients with newly diagnosed, histologically confirmed stage IA or IIA HL were eligible if age 16 to 75 years and without mediastinal bulk disease. Baseline staging was performed by computed tomography (CT). Clinical risk stratification was retrospectively assessed according to standard criteria (Appendix Table A1, online only).
Patients received three cycles of ABVD and then underwent PET and CT assessment. Patients with progressive disease by CT criteria were excluded at this point. 19 PET scans were performed within UK NCRI-accredited PET centers using standardized methods for quality control and image acquisition. 20 PET images were centrally reviewed by two independent reporters at St Thomas' Hospital, London, United Kingdom. FDG uptake was prospectively graded using a 5-point scale according to the likelihood of disease response or nonresponse. The central review score determined further management. A similar graded response method was subsequently adopted internationally, widely referred to as the Deauville criteria. 15,21 A PET score of 5 was defined as 3 or more times the maximum liver uptake in RAPID, and uptake greater than the mediastinum was considered to represent a positive PET result.
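The two positivity thresholds discussed in this report (mediastinal threshold: scores 3, 4, or 5; liver threshold: scores 4 or 5) can be written down as a small lookup. The following is a minimal sketch only; the names `Threshold` and `is_pet_positive` are hypothetical and are not part of any trial software.

```python
from enum import Enum


class Threshold(Enum):
    """Hypothetical labels for the two positivity thresholds named in the text."""
    MEDIASTINUM = "mediastinum"  # positive if PET score is 3, 4, or 5 (as in RAPID)
    LIVER = "liver"              # positive if PET score is 4 or 5


def is_pet_positive(score: int, threshold: Threshold) -> bool:
    """Classify a 5-point-scale PET score as positive under the chosen threshold."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("PET score must be an integer from 1 to 5")
    cutoff = 3 if threshold is Threshold.MEDIASTINUM else 4
    return score >= cutoff


if __name__ == "__main__":
    for s in range(1, 6):
        print(s, is_pet_positive(s, Threshold.MEDIASTINUM), is_pet_positive(s, Threshold.LIVER))
```

Under this sketch, a score of 3 is the only value whose classification changes with the threshold, which is why the choice of threshold matters for the comparisons reported here.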
In total, 602 patients were recruited between October 2003 and August 2010, of whom 571 completed three cycles of ABVD and underwent PET evaluation; 145 patients (25.4%) were PET positive (uptake greater than the mediastinum; PET score, 3, 4, or 5) and received a fourth cycle of ABVD and 30 Gy of involved-field radiotherapy (IFRT); 426 patients (74.6%) achieved complete metabolic response (CMR; PET score, 1 or 2), of whom 420 were randomly assigned using a one-to-one ratio to receive 30-Gy IFRT (n = 209) or no additional treatment (NFT; n = 211). Six patients with CMR withdrew before random assignment. Three additional patients were excluded from this analysis because review of original diagnostic material at relapse identified a non-HL diagnosis. Outcomes for 562 patients are reported here (PET positive, n = 143; IFRT, n = 208; NFT, n = 211; Fig 1). Patients were monitored for disease progression by regular clinical evaluation and by CT scans at 6, 12, and 24 months post-treatment.
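The patient flow above can be checked arithmetically. The sketch below uses only the counts quoted in the text; the variable names are illustrative, and the split of the three non-HL exclusions across groups is inferred from the reported group sizes rather than stated directly.

```python
# Counts quoted in the text (variable names are illustrative only).
completed_pet = 571
pet_positive = 145
cmr = 426
randomized = {"IFRT": 209, "NFT": 211}
withdrew_before_randomization = 6
excluded_non_hl = 3
analysed = {"PET positive": 143, "IFRT": 208, "NFT": 211}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# PET-positive and CMR patients together account for everyone who had PET.
assert pet_positive + cmr == completed_pet
# CMR patients were either randomized or withdrew before random assignment.
assert sum(randomized.values()) + withdrew_before_randomization == cmr
# Analysed population = PET-evaluated minus withdrawals minus non-HL exclusions.
assert completed_pet - withdrew_before_randomization - excluded_non_hl == sum(analysed.values())

print(pct(pet_positive, completed_pet), pct(cmr, completed_pet), sum(analysed.values()))
```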

Statistical Considerations
The primary end point of this subsidiary analysis was HL-specific event-free survival (EFS), calculated from the date of registration to relapse or death resulting from HL, censored at the date last seen or date of death resulting from any non-HL cause. PFS was calculated from the date of registration to relapse or death resulting from any cause, censored at the date last seen. Overall survival (OS) was calculated from the date of registration to death resulting from any cause, censored at the date last seen.
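The three end point definitions above differ only in which events count and where follow-up is censored. As an illustration, they could be coded as follows; the `Patient` record and function names are hypothetical, durations are returned in days, and this is a sketch of the stated definitions rather than the trial's analysis code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Patient:
    """Hypothetical per-patient record; field names are assumptions."""
    registered: date
    last_seen: date
    relapse: Optional[date] = None
    death: Optional[date] = None
    death_from_hl: bool = False


def _result(p: Patient, end: date, event: bool) -> Tuple[int, bool]:
    return (end - p.registered).days, event


def efs(p: Patient) -> Tuple[int, bool]:
    """HL-specific EFS: event = relapse or HL death; non-HL death censors."""
    events = [d for d in (p.relapse, p.death if p.death_from_hl else None) if d]
    if events:
        return _result(p, min(events), True)
    # No HL event: censor at non-HL death if it occurred, else at date last seen.
    return _result(p, p.death or p.last_seen, False)


def pfs(p: Patient) -> Tuple[int, bool]:
    """PFS: event = relapse or death from any cause; censored at date last seen."""
    events = [d for d in (p.relapse, p.death) if d]
    if events:
        return _result(p, min(events), True)
    return _result(p, p.last_seen, False)


def overall_survival(p: Patient) -> Tuple[int, bool]:
    """OS: event = death from any cause; censored at date last seen."""
    if p.death:
        return _result(p, p.death, True)
    return _result(p, p.last_seen, False)
```

For example, a patient who dies of a non-HL cause without relapse is censored for EFS but counts as an event for both PFS and OS, which is the situation described later for the two treatment-related deaths among patients with a score of 4.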
EFS, PFS, and OS are described using the Kaplan-Meier method; univariable and multivariable Cox regression analyses were performed to explore the associations with PET score, pretreatment risk factors, and clinical prognostic scores.
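For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines. This is a generic textbook implementation under the assumption that each patient contributes a duration and an event flag; it is not the software used for this analysis, it omits confidence intervals, and Cox regression is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Toy data: four patients, two events and two censored observations.
    print(kaplan_meier([1, 2, 3, 4], [True, False, True, False]))
```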

RESULTS
Baseline characteristics are listed in Table 1; data for risk stratification by GHSG criteria were available for 480 patients (85.4%), of whom 155 (32.3%) had unfavorable risk. Data for stratification by EORTC criteria were available for 492 patients (87.5%), of whom 184 (37.4%) were unfavorable risk. Patients were classified as unfavorable risk largely because of the number of involved nodal sites, age (EORTC only), and erythrocyte sedimentation rate; only one patient had extranodal disease, and patients with mediastinal bulk and B symptoms were excluded from RAPID.
After a median follow-up of 61.6 months, 44 patients (7.8%) had an HL-related event, with five deaths resulting from HL and 39 additional disease progressions. Twelve non-HL deaths occurred from pneumonia or pneumonitis related to primary HL treatment (n = 6), other malignancies (n = 4), myocardial fibrosis (n = 1), and intracranial hemorrhage (n = 1). For PET-positive patients, there was no end-of-treatment PET scan; however, no patient received salvage therapy for inadequate response in the absence of confirmed disease progression. There was better discrimination between PET-positive and -negative patients in terms of both EFS and PFS using the Lugano classification of PET positivity (PET score, 4 or 5; liver threshold) than the mediastinal threshold. 21 Patients with a PET score of 4 or 5 had a 5-year EFS of 80.3% (95% CI, 69.3% to 91.3%).
Outcomes According to PET Score After Three Cycles of ABVD
There was strong evidence that higher PET score was associated with increased risk of progression or HL-related death (EFS; P < .001) on univariable analysis, and results remained significant, with similar effect sizes, when adjusted for baseline GHSG (P = .01) or EORTC risk stratification (P = .01). A similar association was identified between PET score and PFS (unadjusted P < .001; adjusted P = .03 and P = .04 for GHSG and EORTC stratification, respectively).
EFS and PFS by individual PET score are listed in Table 2 and Figures 2A and 2B. Patients with a score of 5 had a significantly higher risk of progression or HL-related death than those with all other PET scores (Table 3; P < .001 for both EFS and PFS). Furthermore, a score of 5 identified poor-prognosis patients among the favorable EORTC or GHSG groups, and similarly, a lower PET score identified good-prognosis patients in the unfavorable group (Appendix Figs A1A to A1D, online only). A similar association was observed for OS (P = .002; Fig 2C). The 5-year OS rate was 85.2% (95% CI, 69.7% to 100%) in patients with a score of 5, compared with 97.8% (95% CI, 96.4% to 99.2%) in patients with a score of 1 to 4. Compared with those with a score of 1 to 4, patients with a score of 5 were

DISCUSSION
The RAPID trial was a large prospective phase III randomized study and the first to our knowledge to use a graded 5-point scale for response adaptation in lymphoma, which has become the modern standard for response assessment. 16,21 Contemporaneous studies used the now outdated IHP criteria, 18 with binary positive versus negative outcomes. This subsidiary analysis from RAPID assessed the prognostic relevance of early PET, using graded response and pretreatment risk factors in early-stage HL. Our results demonstrate that PET score after three cycles of ABVD has greater prognostic value than pretreatment risk stratification. These findings support the continuing use of early PET response assessment as part of risk-adapted treatment strategies for early-stage HL.
Using a binary definition of PET positivity (score, 3 to 5) did not sufficiently discriminate outcomes in RAPID. PET-positive patients had a 5-year PFS of 88.4% (95% CI, 83.1% to 93.7%), compared with 91.4% (95% CI, 88.5% to 94.3%) in those achieving CMR, although the two groups had divergent treatment strategies. Some authors have interpreted the results of RAPID and similar studies to mean that PET assessment has limited prognostic value in early-stage HL. 22,23 However, the value of a positive PET scan is dependent on the threshold used. Our results demonstrate that individual PET scores are strongly associated with outcomes and reinforce the role of PET in individualized treatment planning in early-stage HL. Using the more widely accepted definition of PET positivity in the Lugano classification (score, 4 or 5) 21 provided better discrimination, although only a score of 5 was clearly associated with adverse outcomes in RAPID.
In RAPID, patients with a PET score of 3 had excellent outcomes after ABVD and IFRT without chemotherapy intensification. With a 5-year EFS of 95.3% (95% CI, 90.8% to 99.8%) with ABVD and IFRT alone, our results do not support treatment escalation in this cohort. Whether these patients can be treated with ABVD alone remains unclear; in the PET-adapted Cancer and Leukemia Group B (CALGB) 50604 study, patients with a Deauville score of 3 had inferior outcomes to those with a score of 1 or 2 after receiving four cycles of ABVD alone, although patient numbers were small. 12 Patients with a score of 4 also had good outcomes with ABVD and IFRT, with a 5-year EFS of 93.5% (95% CI, 84.9% to 100%), similar to patients with a score of 1 to 3. Although PFS was slightly lower in patients with a score of 4 (87.5%; 95% CI, 76.1% to 98.9%), this included two treatment-related non-HL deaths (bronchopneumonia and pneumonitis), where treatment escalation would not have been beneficial or feasible. None of the deaths in this group were attributable to HL.
Patients with a PET score of 5 after three cycles of ABVD had particularly poor outcomes, with five progressions and three HL-related deaths in only 21 patients. It is clear that treatment with ABVD and IFRT alone is inadequate for these patients, and alternative strategies should be explored. These might include escalation of chemotherapy intensity, as demonstrated by the H10 study, 9 or introduction of novel agents.
One of the main limitations of this study is that relatively few patients had a PET score of 4, with a small number of events. Although our findings require additional confirmation, they are supported by emerging data in HL and other lymphoma subtypes that demonstrate patients with a PET score of 5 have significantly worse outcomes than those with a PET score of 4. 10,24-27 Baseline PET scans were not performed; therefore, we cannot determine whether patients with a score of 5 had appearances suggestive of progressive metabolic disease, which may have a worse prognosis. However, there was no evidence of progression by CT criteria for patients in this analysis, and early progression is rare in early-stage HL. 28 In other studies 10,27 and international guidance, 16 a score of 5 refers to uptake markedly above liver, without distinguishing whether findings also suggest disease progression, such as increasing metabolic activity and/or new lesions. There was no formal monitoring of the discrepancy rate among central PET reviewers, but several studies have demonstrated that concordance between PET readers using the 5-point scale is high (76% to 84%). 8,29,30
It is unclear whether the results of the H10 study can be generalized to the RAPID population, given significant differences in inclusion criteria, particularly with respect to B symptoms and mediastinal bulk. PET scans were also performed earlier in H10, after two cycles of ABVD. However, it is notable that, although H10 was randomized, PET scans were reported only as positive or negative by IHP criteria. 18 PET score was not used to stratify patients, and it is unknown whether patients with a score of 5 were balanced between treatment arms. We propose using a score of 5 as a basis for treatment escalation and/or as a stratification factor in future PET-adapted trials.
The PET scoring system used here and in other UK NCRI trials 29 evolved directly into the 5-point Deauville scale. 15 A PET score of 5 is defined as three times the maximum liver uptake in UK NCRI-led trials. The Lymphoma Study Association and Fondazione Italiana Linfomi use the lower threshold of twice the maximum liver uptake. Improvements in imaging technology, especially new reconstruction algorithms, mean that today, PET is more sensitive, and there may be a shift toward more scans being scored as 3 or 4. 31 Treatment efficacy may also affect the predictive ability of PET. 32 Quantitative PET (qPET), which is a ratio between residual FDG and mean liver uptake, replaces an ordinal with a continuous scale and may help to refine the threshold between adequate and inadequate response for treatment optimization and allow individualized risk estimates in the future. 33,34
In RAPID, neither GHSG nor EORTC risk score was associated with outcomes. Unlike most early-stage HL studies, treatment in RAPID was not adapted according to baseline risk, and this is one of the first studies to explore the prognostic relevance of clinical risk stratification in the context of PET-adapted treatment. Given the much stronger association with PET score, our findings suggest that pretreatment risk stratification may have diminished relevance with PET-adapted treatment, particularly for patients without mediastinal bulk or B symptoms. It is unclear whether our findings are applicable to the wider early-stage HL population, particularly patients with mediastinal bulk, who were excluded from RAPID and in whom the association with adverse outcomes may be stronger. 6 However, all risk factors are weighted equally within EORTC and GHSG groupings, which are designed to apply to all patients with early-stage HL, including the RAPID population; therefore, our findings highlight weaknesses in current risk stratification models.
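The score-5 thresholds and the qPET ratio described in this section reduce to simple comparisons against liver uptake. The sketch below is illustrative only: the function names and inputs are hypothetical, the multipliers (3 for UK NCRI-led trials, 2 for the Lymphoma Study Association and Fondazione Italiana Linfomi) are taken from the text, and how residual and liver uptake are actually measured is not specified here.

```python
def meets_score5(lesion_uptake: float, liver_max_uptake: float, multiplier: float = 3.0) -> bool:
    """True if lesion uptake is at least `multiplier` times maximum liver uptake.

    multiplier=3.0 reflects the UK NCRI definition quoted in the text;
    multiplier=2.0 reflects the lower threshold used by other groups.
    """
    if liver_max_uptake <= 0:
        raise ValueError("liver uptake must be positive")
    return lesion_uptake >= multiplier * liver_max_uptake


def qpet(residual_uptake: float, liver_mean_uptake: float) -> float:
    """Continuous ratio of residual FDG uptake to mean liver uptake."""
    if liver_mean_uptake <= 0:
        raise ValueError("liver uptake must be positive")
    return residual_uptake / liver_mean_uptake


if __name__ == "__main__":
    # The same (made-up) lesion can be scored 5 under one threshold but not the other.
    print(meets_score5(7.0, 3.0, multiplier=3.0), meets_score5(7.0, 3.0, multiplier=2.0))
    print(qpet(3.0, 2.0))
```

The example values are invented for illustration; they show only that the proportion of scans scored 5 depends on which multiplier a group adopts.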
Our results are similar to those of retrospective studies in advanced-stage HL, where the International Prognostic Score failed to retain independent prognostic significance over interim PET assessment. 7,35 Indeed, in a subsidiary analysis of H10, only PET assessment, but not baseline risk stratification, was prognostic on multivariable analysis, although treatment was adapted according to EORTC stratification. 28 A subsidiary analysis of the GHSG early-stage HL trials in the pre-PET era showed a small absolute difference in PFS between favorable and unfavorable risk groups for patients treated with ABVD and IFRT (9.4% for GHSG and 6.7% for EORTC risk stratification). 6 These findings emphasize the need to re-evaluate the use of clinical prognostic grouping in early-stage HL in the era of PET-adapted therapy. Incorporation of biological or baseline PET parameters may be required to improve pretreatment risk stratification. 28,36,37
In conclusion, this subsidiary analysis of the RAPID trial demonstrates that PET response assessment after chemotherapy has a much stronger association with outcomes than clinical risk stratification in early-stage HL. We have shown that a positive PET scan does not carry uniform prognostic weight, with only a PET score of 5 associated with inferior outcomes in RAPID; patients with nonbulky early-stage HL and a PET score of 3 or 4 after three cycles of ABVD were treated effectively with a fourth cycle of ABVD and IFRT. In future trials, we propose reserving treatment escalation with its attendant toxicity in this patient group for those with a PET score of 5, who have significantly worse outcomes than those patients with PET scores of 1 to 4. These results support the continued development and use of PET-adapted strategies in early-stage HL.