International Society of Urological Pathology (ISUP) Gleason Grade Groups stratify outcomes in the CHHiP Phase 3 prostate radiotherapy trial

To compare the results of Gleason Grade Group (GGG) classification following central pathology review with previous local pathology assessment, and to examine the difference between using overall and worst GGG in a large patient cohort treated with radiotherapy and short‐course hormone therapy.


Introduction
The Gleason grading system of prostate cancer (PCa) was established in the 1960s and is a key parameter for clinical decision making in localized PCa. Modifications were introduced by the International Society of Urological Pathology (ISUP) in 2005 [1]. In 2013, Pierorazio et al. proposed the use of Gleason Grade Groups (GGGs) based on Gleason score (GS) [2], which aligned PCa grading with other cancers. This avoided the anomaly that the least aggressive PCa has a GS of 6 and distinguished between GS 7 = 3 + 4 and GS 7 = 4 + 3, which have different clinical outcomes. This concept was adopted by the 2014 ISUP Conference [3,4]. The GGG system has been validated using biochemical relapse in large single-centre, multi-institutional and multimodal therapy studies [5][6][7][8] and using PCa mortality in a conservatively treated cohort [9]. [Correction added on 4 September 2023, after first online publication: In the Conclusion section of the Abstract, "GGG 2-3" has been corrected to "GGG 2 to GGG 3" in this version.]
The CHHiP (Conventional or Hypofractionated High-dose intensity-modulated radiotherapy in localized Prostate cancer) trial is the largest Phase 3 trial assessing radiotherapy in localized PCa. The trial demonstrated non-inferiority of modest hypofractionation compared with standard fractionation [10] and the hypofractionated schedules have now become an internationally recognized standard of care [4,11,12]. Patients with low- to high- (mostly intermediate-) risk PCa were treated with standardized intensity-modulated radiotherapy (IMRT) in combination with short-course androgen suppression or blockade. We established a prostatic tissue biopsy library from 2047 patients in order to study potential markers of fraction sensitivity [13]. The aims of this exploratory analysis were: (i) to review the agreement between local pathology GS at the time of original diagnosis and central review GS using contemporary ISUP 2014 criteria to determine GGG; (ii) to examine the differences between using overall and worst GS in determining GGG; (iii) to report biochemical and clinical failure (BCF) and time to development of distant metastases (DM) by GGG; and (iv) to determine the effect of histopathology review on allocation of National Comprehensive Cancer Network (NCCN) prognostic risk groups [4].

Materials and Methods
The CHHiP trial randomized 3216 PCa patients between three radiotherapy regimens (74 Gy in 37 fractions, 60 Gy in 20 fractions and 57 Gy in 19 fractions) between October 2002 and June 2011 [10]. Randomization (via telephone call to the Institute of Cancer Research Clinical Trials and Statistics Unit [ICR-CTSU]) was 1:1:1 using computer-generated random permuted blocks (sizes 6 and 9) with NCCN risk group (low vs intermediate vs high) and radiotherapy centre as stratification factors. A total of 2749 CHHiP patients consented to participate in a translational substudy, Trans-CHHiP. Between August 2012 and April 2014, the ICR-CTSU acquired microscope slides and/or tissue blocks from 2047 UK participants who consented to Trans-CHHiP [13]. Between 2012 and 2016, all available tumour slides were reviewed by a specialist urological pathologist (C.M.C.) and additional slides were cut from blocks if necessary or if slides were technically unsuitable or not submitted. GSs [1,3] were given for each tumour core separately and an overall GS was given for the case. This was derived from assessments of the predominant and most aggressive (or secondary) patterns, aggregating all cores together, and these were also used to allocate GGG using the 2014 ISUP recommendations (Appendix S1) [3] interpreted using guidance from the UK Royal College of Pathologists [14]. Additionally, the slide or slides with the highest individual GS was/were defined as the worst score and this score was also converted into pathological GGG. Both tumour length and percentage core involvement were also noted. All cases were reviewed blinded to any clinical information, content of the original report, treatment allocation or patient outcome. The pathology report from the hospital where the biopsy was taken was also obtained with the slides/tissue blocks. Biopsies had been taken between 2002 and 2011 and reported contemporaneously according to local histopathological practices at the time. Data from these local pathology reports were separately extracted, without further clinical information or knowledge of patient outcome, to give an overall GS and derive the GGG (C.S., D.D.) for comparison with the central review pathology. Trans-CHHiP was approved by the London multicentre research ethics committee (04/MRE02/10).
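As an illustration of the grade-group allocation described above, the following Python sketch maps a primary and secondary Gleason pattern to an ISUP 2014 GGG and derives a worst GGG from per-core scores. It is not the code used in the trial: the function names `grade_group` and `worst_grade_group` are hypothetical, and the sketch assumes that the worst GGG can be taken as the highest GGG among the individual cores. The overall GS depends on the pathologist's aggregate assessment of all cores and would be supplied rather than computed.

```python
from typing import Iterable, Tuple


def grade_group(primary: int, secondary: int) -> int:
    """Map Gleason patterns (primary + secondary) to an ISUP 2014 Grade Group."""
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason patterns must be between 1 and 5")
    score = primary + secondary
    if score <= 6:
        return 1                      # GS <= 6
    if score == 7:
        if (primary, secondary) == (3, 4):
            return 2                  # GS 7 = 3 + 4
        if (primary, secondary) == (4, 3):
            return 3                  # GS 7 = 4 + 3
        # Other GS 7 combinations are not covered by this sketch.
        raise ValueError("Unsupported GS 7 combination")
    if score == 8:
        return 4                      # 4 + 4, 3 + 5 or 5 + 3
    return 5                          # GS 9 or 10


def worst_grade_group(cores: Iterable[Tuple[int, int]]) -> int:
    """Assumption: worst GGG is the highest GGG found in any single core."""
    return max(grade_group(p, s) for p, s in cores)
```

For example, a case with cores graded 3 + 3, 3 + 4 and 4 + 3 would have a worst GGG of 3 under this assumption, even if the overall GS assigned by the pathologist were 3 + 4 (GGG 2).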

Statistical Considerations
Agreement between (i) local pathology GS at the time of original diagnosis and overall GS using contemporary ISUP 2014 to determine GGG and (ii) overall and worst GS from central review was assessed using the kappa (κ) statistic; κ values of 0.21-0.40, 0.41-0.60 and 0.61-0.80 were considered to represent fair, moderate and substantial agreement, respectively. The Phoenix consensus definition of PSA > nadir + 2 ng/mL was used to define biochemical failure; clinical failure events included recommencement of androgen deprivation therapy (ADT), local recurrence, lymph node or pelvic recurrence and DM [10]. Time was measured from randomization to date of first biochemical failure or clinical event. Patients without an event were censored on date of last PSA assessment. Time to development of DM was measured as time from randomization to first reported DM. Patients alive without DM were censored on the date of last follow-up, and those who died without an event were censored on the date of death. The log-rank test was used to assess the equality of survivor functions across GGGs. Multivariable Cox regression analysis was conducted including the following prespecified variables, as per Berney et al. [9]: GGG (with GGG 1 used as the reference group); pre-hormone PSA level (<10 ng/mL [reference group] or ≥10 ng/mL; this categorization was used to distinguish between low- and intermediate-risk patients and enable clinical interpretation); clinical T-stage (with T1 as the reference group); and proportion of positive cores (categorical variable in 10% increments: ≤10%, >10% to ≤20%, >20% to ≤30%, etc.). The proportional hazards assumption was tested on the basis of the Schoenfeld residuals. Kaplan-Meier curves were used to graphically display BCF and time to metastases by GGG, defined using overall and worst GS.
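To make the agreement measure concrete, the sketch below computes an unweighted Cohen's κ for two sets of GGG assignments and attaches a descriptive label. This is illustrative only: the text does not state whether a weighted or unweighted κ was used, the trial analyses were run in Stata rather than Python, and the function names are hypothetical. The fair, moderate and substantial bands follow the text; the labels for values outside those bands (poor, slight, almost perfect) are assumed from the widely used Landis and Koch convention.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(ratings_a: Sequence[int], ratings_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two equal-length lists of categories."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters need the same non-zero number of cases")
    n = len(ratings_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal distribution.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Descriptive band for a kappa value (outer bands are an assumption)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

With hypothetical local GGGs `[1, 1, 2, 2]` and central GGGs `[1, 1, 2, 3]`, observed agreement is 0.75 and chance agreement is 0.375, giving κ = 0.6 (moderate).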

Results
A total of 2047 patients consented to Trans-CHHiP, and pathological material and/or reports were submitted by local histopathology departments (Appendix S2). Central review and grading were performed in 1875/2047 patients (92%). Reasons for lack of central review included poor technical quality of submitted slides (n = 27), non-submission of blocks when slides were inadequate (n = 39), lack of tumour in recut blocks because of cutting out of tissue (n = 77), TURP (n = 21), no prostate tissue (n = 4) and clerical discrepancies (n = 4).
[Correction added on 4 September 2023, after first online publication: The format of the number of patients in the preceding sentence has been updated in this version.] Patients included in the central review were recruited between January 2003 and June 2011 and, although some statistically significant differences were apparent between the centrally reviewed and non-centrally reviewed CHHiP cohorts (Table 1), the differences were small and not likely to impact on the clinical significance of the analyses. Central review using overall GS allocated 420 (22%), 1038 (55%), 299 (16%), 67 (4%) and 51 patients (3%) to GGGs 1, 2, 3, 4 and 5, respectively. There was considerable variation between local pathology-reported overall GS and central pathology-reviewed overall GS (Appendix S3A). GGG was concordant in 853/1841 (46%) cases, upgraded in 614/1841 (33%) cases and downgraded in 374/1841 (20%) cases, with a κ statistic of 0.19. Central review using worst GS defined 419 (22%), 972 (52%), 295 (16%), 133 (7%) and 56 patients (3%) as having GGG 1, 2, 3, 4 and 5, respectively. Agreement between central review GGG using overall and worst GS was high (Appendix S3B) and concordant in 1742/1875 (93%) cases, with a κ statistic of 0.89. One hundred and thirty-nine patients had a higher worst GS compared with their overall GS. Of these patients, 133 were moved to a higher GGG as a result of using the worst rather than overall GS. The six patients whose GGG did not change had GGG 4 (two patients went from 3 + 5 to 5 + 3) and GGG 5 disease (three patients went from 4 + 5 to 5 + 4 and one patient from 5 + 4 to 5 + 5). Baseline characteristics by GGG defined using local GS, and central review of overall and worst GS showed the distributions expected (Appendix S4A-C).
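The counts reported above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures given in the text and shows how the exclusions, the GGG distributions and the concordance counts relate to one another; the variable names and the helper `pct` are illustrative.

```python
def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


consented = 2047
not_reviewed = {
    "poor slide quality": 27,
    "blocks not submitted": 39,
    "no tumour in recut blocks": 77,
    "TURP": 21,
    "no prostate tissue": 4,
    "clerical discrepancies": 4,
}
reviewed = consented - sum(not_reviewed.values())

# GGG distributions from central review (GGG 1 to 5).
overall_ggg = {1: 420, 2: 1038, 3: 299, 4: 67, 5: 51}
worst_ggg = {1: 419, 2: 972, 3: 295, 4: 133, 5: 56}

# Local versus central overall GGG, in cases where both were available.
local_vs_central = {"concordant": 853, "upgraded": 614, "downgraded": 374}
compared = sum(local_vs_central.values())

# Overall versus worst GGG on central review.
higher_worst_gs = 139
moved_to_higher_ggg = 133
unchanged_ggg = higher_worst_gs - moved_to_higher_ggg
concordant_overall_worst = reviewed - moved_to_higher_ggg

for label, count in local_vs_central.items():
    print(f"{label}: {count}/{compared} ({pct(count, compared)}%)")
print(f"overall vs worst concordant: {concordant_overall_worst}/{reviewed}")
print(f"moved to higher GGG: {pct(moved_to_higher_ggg, reviewed)}%")
```

The six reasons for exclusion sum to 172, which accounts for the difference between 2047 consented and 1875 reviewed patients, and the 133 reclassified patients account for the difference between 1875 reviewed and 1742 concordant cases.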

Discussion
We found that specialist histopathology review using contemporary ISUP guidelines [3,15,16] outperformed local histopathology assessments made at the time of original diagnoses, with resultant GGGs more clearly associated with both BCF and development of DM. The Kaplan-Meier curves for GGGs 3 and 4 overlapped for BCF as previously suggested for patients treated with radiotherapy and ADT [5,7], although adjusting for the use of ADT may separate prognostic groups [6]. Despite the low rate of DM development in the CHHiP trial, there were sufficient events to suggest a continuous increase in risk of DM with increasing GGG, with GGG 3 having an intermediate outcome between GGG 2 and 4. Other investigators have shown similar separation of prognostic groups for both the development of DM and PCa-specific survival [8,17]. However, not all studies using dose-escalated radiotherapy have found GGG to be of prognostic value [18], which is probably because varying patterns of usage and duration of ADT have blurred treatment outcomes. We assessed GS both as an overall and worst score to derive GGG and found overall GS-derived GGG performed at least as well as worst GS-derived GGG, giving very similar results for both BCF and development of DM. Equivalence of prognostic value of using overall or worst grade GGG or GS has been reported by other investigators in patients managed by hormone therapy alone or by watchful waiting [9,19] but, to our knowledge, has not been evaluated in a contemporary cohort of patients treated with radical radiotherapy. The proportion of patients upgraded was quite low at 7.1% but similar to the 6.4%-7.6% previously reported [9,20]. We examined the outcome of patients whose GGG increased from 2 to 3 when using worst rather than overall GS. For BCF, PCa in these 'upgraded' patients behaved similarly to GGG 2 PCa initially but, by 8 years, could be grouped together more clearly with GGG 3; a similar pattern was suggested for DM but the overall number of events was small.
The distinction between GGG 2 and GGG 3 may be clinically important, potentially changing the NCCN risk group from favourable to unfavourable intermediate risk [4] and so making a patient unsuitable for active surveillance and supporting the use of short-course ADT with radiotherapy. The issue of whether to use overall or worst GS has been debated in the literature. Although worst rather than overall biopsy grade was associated with higher stage and grade at radical prostatectomy [21], as mentioned, the evidence is unconvincing for patients managed conservatively [9,19]. Previously in the UK it has been common practice to assign a composite or overall score [15], although a recent survey suggested 78% of UK clinicians would select the highest GS in a histopathology report for patient management [14]. ISUP Consensus Conferences [1,3,22] recommended a separate score for each biopsy but there was no consensus on the use of a global score for systematic biopsies, despite agreement that a global score should be given for MRI-targeted biopsies. The European Association of Urology (EAU)-European Association of Nuclear Medicine (EANM)-European Society for Radiotherapy and Oncology (ESTRO)-European Society of Urogenital Radiology (ESUR)-International Society of Geriatric Oncology (SIOG) Guidelines recommend that each biopsy is reported separately, together with a global ISUP 2014 GGG [4].
Oxley et al. have recommended a pragmatic rather than 'one-size-fits-all' approach, with histopathologists using their judgement to determine the most appropriate score in an individual case and to record this as the 'bottom line' score [15]. National and international guidelines have not made recommendations for specialist histopathology review [4,15,22]. However, potential reasons for variation in Gleason grading have been well documented [23] and proponents of central pathology review point to the potential impact of review on clinical management decisions [24,25]. High-quality histological diagnosis is critical for analyses of patient populations as well as individual case determination. In particular, pathology review may impact on the analysis of prognostic factors in randomized clinical trials [26]. This impact was also observed in this study, with an increase in the proportion of high-risk patients observed. Histological grading is an integral part of all standard and novel risk factor classifications [27][28][29] and we suggest pathology review is desirable when practicable for analysis of datasets with the aim of assessing the generalizability of these prognostic models in contemporary practice. Histology review is also appropriate when evaluating the potential contribution of novel diagnostic tissue or imaging biomarkers [4,30] to ensure any added value is determined alongside contemporary high-quality pathology practice.
Limitations to our findings include the protocol-defined range of eligible patients within the CHHiP trial who were treated with a prescribed schedule of neoadjuvant ADT and radiotherapy, albeit a recognized standard of care. In addition, histopathology review was performed by a single experienced uro-pathologist. We were unable to separate the benefit derived from the histopathology review from benefits that might have derived from changes in diagnostic practice over the course of the trial. Lastly, there were too few cases with GGG 4 or 5 cancers to assess the performance of the ISUP system in distinguishing between these high-grade tumours.
In conclusion, we have confirmed that the ISUP 2014 GGG classification is prognostic in a large randomized controlled trial using neoadjuvant ADT and IMRT. ISUP grades were associated with risk of BCF as well as the development of DM. Results were similar when using either overall or worst GS to determine GGG. However, the small proportion of patients with PCa upgraded from GGG 2 (overall GS) to GGG 3 (worst GS) appeared, on long-term follow-up, to segregate with GGG 3. Histological reassessment resulted in a shift of patients to a higher NCCN risk category. We recommend reporting of both overall and worst GSs to derive ISUP GGG and encourage histopathologists to assist clinicians by giving a 'bottom line' score to guide clinical management.
Analyses were restricted to Trans-CHHiP participants with available centrally reviewed pathology. Statistical comparisons were made between the baseline characteristics of the centrally reviewed cohort and those of the non-centrally reviewed CHHiP trial population to assess the representativeness of the centrally reviewed population. All statistical analyses were conducted using STATA version 15.0 (StataCorp, College Station, TX, USA).

Fig. 1 Kaplan-Meier analysis for biochemical or clinical failure-free rates by Gleason Grade Group (GGG) defined using (A) local pathology Gleason score (GS), (B) central review overall GS, (C) central review worst GS and (D) central review worst GS, with patients upgraded from GGG 2 to GGG 3 highlighted.

Table 1
Baseline characteristics defined using data reported on the CHHiP case report forms for the complete CHHiP cohort (n = 3216) and the patients included (n = 1875) or excluded (n = 1341) in this study.
*Statistical comparisons made of the centrally reviewed CHHiP population and the non-centrally reviewed CHHiP population. CHHiP, Conventional or Hypofractionated High-dose intensity-modulated radiotherapy in localized Prostate cancer; CRF, case report form; IQR, interquartile range; NCCN, National Comprehensive Cancer Network. © 2023 The Authors. BJU International published by John Wiley & Sons Ltd on behalf of BJU International.

Table 2
Biochemical and/or clinical failure: multivariable Cox regression analysis including Gleason rescores on central review (overall and worst), pre-hormone therapy PSA <10 or ≥10 ng/mL, clinical T-stage and proportion of positive cores.

Table 3
Time to development of metastases: multivariable Cox regression analysis including Gleason rescores (overall and worst), pre-hormone PSA <10 or ≥10 ng/mL, clinical T-stage and proportion of positive cores. Proportion of positive cores was treated as a continuous variable with an increase of one unit equivalent to a 10% increment. GGG, Gleason Grade Group; GS, Gleason score.