Quality of Life in Patients with High-grade Non–muscle-invasive Bladder Cancer Undergoing Standard Versus Reduced Frequency of Bacillus Calmette-Guérin Instillations: The EAU-RF NIMBUS Trial

Take Home Message This study did not find better quality of life with a reduction in the number of bacillus Calmette-Guérin instillations in patients with high-grade non–muscle-invasive bladder cancer. This result together with the previous finding that a reduced frequency schedule is inferior underlines the use of a standard bacillus Calmette-Guérin instillation schedule.


Introduction
Urothelial bladder cancer (UBC) carries a large global disease burden, being the 11th most common cancer, with approximately 550 000 new cases annually [1]. Nearly 75% of all primary UBC patients are diagnosed with non–muscle-invasive bladder cancer (NMIBC). Patients with high-grade NMIBC have increased risks of recurrence, progression, and metastases [2]. Intravesical bacillus Calmette-Guérin (BCG) instillations following a transurethral resection of the bladder tumor (TURBT) are the standard of care to reduce these risks. The European Association of Urology (EAU) guidelines recommend a weekly instillation for 6 wk as an induction phase, followed by a maintenance phase of 1 yr (three courses of three weekly instillations at 3, 6, and 12 mo) after TURBT for intermediate-risk and up to 3 yr for high-risk patients [3-5]. Adverse events, however, are significant during the long-term administration of BCG, often leading to treatment discontinuation [6,7].
The European Organization for Research and Treatment of Cancer (EORTC) trial (EORTC 30962) concluded that BCG dose reduction did not affect toxicity level and led to higher recurrence rates [8]. The EAU-RF NIMBUS trial evaluated whether a reduced instillation frequency during both the induction and the maintenance phase is noninferior to the EAU guideline standard of care [9]. Unfortunately, safety analyses showed the reduced approach to be inferior to the standard approach for the risk of recurrence, leading to early cessation of patient recruitment to avoid further harm in the reduced BCG frequency arm. The current post hoc analysis of the EAU-RF NIMBUS trial evaluated whether patients with reduced BCG instillation frequency in both the induction and the maintenance phase experienced lower toxicity and consequently better quality of life (QoL) than patients receiving the standard BCG instillation frequency.

Patients and methods
The EAU-RF NIMBUS trial was a European randomized controlled trial that assessed whether a reduction in the BCG instillation frequency is noninferior to the standard BCG frequency in patients with high-grade NMIBC (Ta-T1) [9]. Recruitment took place between December 2013 and July 2019 at 51 study sites spread across Germany, The Netherlands, France, Belgium, and Spain. Patient recruitment was stopped on July 1, 2019, after a data review and safety analysis by the Independent Data Monitoring Committee (IDMC) showed the reduced BCG instillation arm to be inferior to the standard BCG instillation arm with regard to the risk of recurrence.
The trial had been approved by all the relevant institutional review boards and independent ethics committees, and had been performed according to the Declaration of Helsinki [10], Good Clinical Practice, and local regulatory requirements.

Inclusion and exclusion criteria
BCG-naïve patients who had been clinically diagnosed with primary or recurrent high-grade NMIBC (Ta or T1), with single or multiple urothelial papillary bladder carcinoma(s), and with or without concomitant carcinoma in situ (CIS) were eligible. A routine repeated TURBT (re-TUR and/or re-re-TUR) had to be performed to confirm the absence of muscle-invasive cancer. High-grade Ta patients were allowed to be included without a re-TUR if a biopsy specimen confirmed the complete removal of the tumor and included detrusor muscle tissue.
The exclusion criteria were having had previous systemic or multi-instillation intravesical chemotherapy within the preceding 3 mo, having any type of tumor(s) in the upper urinary tract or prostatic urethra at any time, having any immunodeficiency, and having any other type of malignancy besides basal cell carcinoma of the skin or localized prostate cancer under active surveillance.

Randomization
After enrolment, patients were allocated using a validated randomization program (EAU-RF website) according to the minimization method with a random element as described by Pocock [11]. Stratification factors included center, Ta versus T1, concomitant CIS versus no CIS, single versus multiple tumors, and BCG strain (Connaught, Medac, or Tice). The patients were randomized to one of two treatment groups: 1. The standard frequency (SF) arm. Induction: once-weekly BCG instillations at weeks 1-6; maintenance: once-weekly instillations at weeks 1-3 at months 3, 6, and 12 (15 planned instillations).
Follow-up was conducted through cystoscopy and urine cytology every 3 mo during the first 2 yr and every 6 mo thereafter. Histological confirmation had to be provided in case of CIS, or if there was a suspicion of disease recurrence.
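The allocation principle described above can be illustrated with a minimal sketch of minimization with a random element. This is not the trial's randomization program: the imbalance measure (sum of absolute arm differences over the new patient's factor levels), the probability `p_best` of following the minimizing arm, and the names `Minimizer` and `assign` are all assumptions made for illustration only.

```python
import random
from collections import defaultdict

ARMS = ("SF", "RF")


class Minimizer:
    """Sketch of minimization with a random element (hypothetical parameters)."""

    def __init__(self, p_best=0.8, seed=None):
        # p_best: assumed probability of choosing the arm that minimizes imbalance
        self.p_best = p_best
        self.rng = random.Random(seed)
        # counts[(factor, level)][arm] -> patients already allocated
        self.counts = defaultdict(lambda: {arm: 0 for arm in ARMS})

    def _imbalance_if(self, patient, arm):
        """Total imbalance over the patient's factor levels if `arm` were chosen."""
        total = 0
        for factor, level in patient.items():
            c = dict(self.counts[(factor, level)])
            c[arm] += 1
            total += abs(c["SF"] - c["RF"])
        return total

    def assign(self, patient):
        """Allocate one patient (dict of factor -> level) and update the counts."""
        scores = {arm: self._imbalance_if(patient, arm) for arm in ARMS}
        if scores["SF"] == scores["RF"]:
            arm = self.rng.choice(ARMS)  # tie: simple random allocation
        else:
            best = min(ARMS, key=scores.get)
            other = "RF" if best == "SF" else "SF"
            arm = best if self.rng.random() < self.p_best else other
        for factor, level in patient.items():
            self.counts[(factor, level)][arm] += 1
        return arm


# Example patient described by the five stratification factors (values invented)
minimizer = Minimizer(p_best=0.8, seed=42)
patient = {"center": "A", "stage": "T1", "cis": "no", "tumors": "multiple", "strain": "Tice"}
allocated_arm = minimizer.assign(patient)
```

Setting `p_best` to 1.0 makes the allocation deterministic whenever the arms are not tied, whereas values below 1.0 keep individual assignments unpredictable at the cost of slightly weaker balance.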
Patients' participation in the study was ended in case of a recurrence in the bladder, a urothelial carcinoma in the upper urinary tract or prostatic urethra, or presence of distant metastases, or in case systemic chemotherapy was indicated.
The remaining items evaluate additional symptoms that are commonly reported by cancer patients (dyspnea, appetite loss, sleep disturbance, constipation, and diarrhea). Paper questionnaires on QoL were handed out during an outpatient visit at the scheduled time points. The questionnaires were completed prior to the first and the last instillation of each BCG cycle, leading to a total of eight measurement points (T0-T7; see Fig. 1). The endpoint of the NIMBUS trial was time to first recurrence.
Consequently, QoL questionnaires were no longer completed once a patient experienced a recurrence.
In addition, treating physicians were responsible for carrying out side effect (SE) evaluations by means of a form that included known local and systemic SEs (World Health Organization grading of toxicity: grade 1, mild; grade 2, moderate; grade 3, severe; and grade 4, life-threatening toxicity) prior to the first and the last instillation of each BCG cycle [13].

Endpoints
The primary endpoint for the analysis was QoL. Additionally, toxicity incidence and severity were recorded.
All five functional and three symptom scales plus the individual symptom items of the questionnaires were transformed to a 0-100 score. A high scale score represents a higher response level. Thus, a high score for a functional scale represents a high/healthy level of functioning, a high score for the global health status/QoL represents high QoL, but a high score for a symptom scale/item represents a high level of symptoms/problems. Differences in the mean QoL between the two treatment arms were evaluated using linear regression at T1, T5, and T7 while adjusting for T0 (baseline measurement, ie, prior to induction week 1). Differences between the trends in QoL of the two treatment arms were tested for significance by fitting a linear mixed model with time as the fixed factor with eight levels (T0-T7). Chi-square or Fisher exact tests were used to test for significant differences in the number of SEs between the two treatment arms.
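The 0-100 transformation mentioned above can be sketched as follows. The paper does not spell out the formula; the sketch assumes the standard linear transformation from the EORTC QLQ-C30 scoring manual, in which the raw score is the mean of the items in a scale, items are answered on a 1-4 scale (range 3), and the two global health status items on a 1-7 scale (range 6). The function names are hypothetical, and missing-item handling is left out.

```python
def raw_score(items):
    """Mean of the answered item responses that make up one scale."""
    return sum(items) / len(items)


def functional_score(items, item_range=3):
    """Functional scales: reversed so that a higher score means better functioning."""
    return (1 - (raw_score(items) - 1) / item_range) * 100


def symptom_score(items, item_range=3):
    """Symptom scales and single items: a higher score means more symptoms."""
    return (raw_score(items) - 1) / item_range * 100


def global_health_score(items, item_range=6):
    """Global health status/QoL: a higher score means better QoL."""
    return (raw_score(items) - 1) / item_range * 100
```

With this convention, a patient answering "not at all" (1) to every item of a functional scale scores 100 on that scale, and the same answers on a symptom scale score 0, which matches the interpretation of high and low scores given above.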
After performing the intention-to-treat (ITT) QoL and SE analyses, supplementary per-protocol (PP) QoL and SE analyses were performed. Patients were excluded from the PP analysis if they had incomplete treatment due to missed instillations, had extra BCG instillation(s), switched treatment arm after the study's premature stop, or stopped treatment for other reasons besides SEs or recurrence.

Results
A total of 359 patients were randomized to one of the two treatment arms. The SF arm contained 182 patients, while the RF arm contained 177 patients. At baseline, there were no significant differences in characteristics between the two treatment arms (Table 1). At the time of study discontinuation, 52% (n = 94) of the patients in the SF arm had received all 15 planned instillations. In total, 48 (26%) patients in this arm received nine or fewer instillations. In the RF arm, 45% (n = 79) had received all nine planned instillations when the study was stopped. In the SF arm, 24 patients developed a recurrence or new CIS within 1 yr and went off study. In the RF arm, this number was 46 (Fig. 2). In total, 30 and 55 patients in the SF and RF arms, respectively, developed a recurrence.

QoL analyses
The QLQ-C30 questionnaires were completed by 304 (84.7%) patients at T1, 226 (63.0%) patients at T5, and 168 (47.2%) patients at T7. At T0, 91% of the patients in the SF arm and 94% in the RF arm completed the QoL assessment. Detailed results of the questionnaires can be found in Table 2. A summary of the results is depicted in Figure 3. Aside from physical functioning at T5 (p = 0.05), we found no differences in the means of any QoL scale between the two treatment arms (p > 0.05; Table 2). Moreover, the linear mixed model, which was adjusted for T0, did not show any statistically significant temporal changes in any QoL domain in either the SF or the RF arm.
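The baseline-adjusted comparison of mean scores between arms can be sketched as an ordinary least-squares fit of the follow-up score on treatment arm and T0 score. This is a minimal pure-Python illustration, not the software used in the trial: the arm coding (0 = SF, 1 = RF), the function names, and the restriction to point estimates (no standard errors or p values) are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def baseline_adjusted_difference(arm, baseline, followup):
    """Fit followup = b0 + b1*arm + b2*baseline and return [b0, b1, b2].

    b1 is the difference in mean follow-up score between arms,
    adjusted for the baseline (T0) score.
    """
    rows = [[1.0, float(a), float(t0)] for a, t0 in zip(arm, baseline)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, followup)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Invented scores for six patients (0 = SF, 1 = RF); not trial data
arm = [0, 0, 0, 1, 1, 1]
t0_scores = [50, 60, 70, 55, 65, 80]
t5_scores = [52, 58, 73, 60, 66, 79]
b0, arm_effect, baseline_slope = baseline_adjusted_difference(arm, t0_scores, t5_scores)
```

Adjusting for T0 in this way compares the arms at equal baseline scores, so that chance differences in baseline QoL do not masquerade as a treatment effect.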

Toxicity
SE evaluations were completed in 57.7% of patients at T1, 44.5% of patients at T5, and 34.6% of patients at T7 (Table 3).
For patients for whom an SE form was not filled out, we conducted an enquiry among the participating urologists. Thirty-three of 51 sites responded; 26 out of the 33 responding urologists (79%) stated that there were no SEs when the SE form was not filled out. In patients recruited by the remaining seven sites (21%), there might have been SEs, but the grading was not assessed.
In Table 3, we present a best-case scenario assuming that there were no SEs in patients for whom the SE form was not filled out. In general, the treatment toxicity did not exceed grade II in the majority of the patients. Grade III and IV local SEs were more frequent in the SF arm (n = 7; 3.8%) than in the RF arm (n = 1; 0.6%; p = 0.07; data not shown). Overall, local SEs were reported more often than systemic SEs at all time points for both arms. Urination problems (frequency, urgency, dysuria, and incontinence) were the most commonly reported local SEs, whereas fever and general malaise were the most frequent systemic SEs. Although the differences were mostly not statistically significant, the numbers of recorded local and systemic SEs were generally higher in the SF arm. We found the SF arm to have a significantly higher frequency of total local SEs at T7 (p 0.001). Moreover, the total number of patients with local SEs was significantly higher in the SF arm than in the RF arm at T5 (p = 0.01) and T7 (p 0.001). Specifically, there were significant differences in the incidence of urgency (p = 0.05) and general malaise (p = 0.03) at T1, and frequency (p = 0.002), urgency (p = 0.02), dysuria (p = 0.001), and chemical cystitis (p = 0.03) at T7 favoring RF arm patients.
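To illustrate how a between-arm comparison of SE counts can be tested, the following is a minimal sketch of a two-sided Fisher exact test on a 2 x 2 table. It is applied to the grade III and IV local SE counts quoted above (7 of 182 in the SF arm and 1 of 177 in the RF arm), under the assumption that these are per-patient counts; whether the reported p value came from a Fisher exact or a chi-square test is not stated, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Rows: SF arm, RF arm; columns: with / without a grade III-IV local SE
p_value = fisher_exact_two_sided(7, 182 - 7, 1, 177 - 1)
```

Under these assumptions the sketch gives a p value of approximately 0.07, in line with the value reported for this comparison.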

When a worst-case scenario is assumed in which all patients without an SE grading form had SEs, we see a prevalence of local SEs at T2 of 42% and 35% in the SF and RF arms, respectively. At T5 and T7, the prevalence was 41.2% versus 31.1% and 39.6% versus 27.1%, respectively. The prevalence of systemic SEs at T2, T5, and T7 was 22.5% versus 18.1%, 24.7% versus 19.8%, and 20.3% versus 17.5%, respectively. Overall, we see the same pattern of fewer SEs in the RF arm than in the SF arm, but differences are small.

PP analyses
After excluding patients according to the PP criteria, a total of 249 patients remained, of whom 123 (49.4%) were randomized to the SF arm and 126 (50.6%) to the RF arm. No significant differences in the baseline characteristics were observed between the two arms (Supplementary Table 1). Supplementary Table 2 and Supplementary Figure 1 summarize the results obtained from the QoL analyses. For the largest part, similar results to those in the ITT analysis are seen. The difference in physical functioning in the ITT analysis at T5 is no longer present in the PP results. However, we found a higher mean score of diarrhea in the RF arm at T1 (p = 0.01), which was not the case in the ITT analysis. Again, the linear mixed model did not display statistically significant temporal changes in any QoL domain in either the SF or the RF arm (p > 0.05). The PP analysis of SEs was also largely consistent with the ITT analysis (Supplementary Table 3). However, unlike in the ITT analysis, no significant differences were found in the total number of patients with local SEs at T5 (p = 0.15), urgency at T1 (p = 0.39), and general malaise at T1 (p = 0.06). We additionally found the number of patients with bacterial cystitis in the SF arm to be significantly higher than that in the RF arm (p = 0.03), which was not the case in the ITT analysis.

Discussion
An analysis of EAU-RF NIMBUS study data did not show better QoL in patients undergoing an RF BCG instillation regimen. However, there were significant differences in the incidence of general malaise at T1, and of storage symptoms of frequency, urgency, and dysuria at T7 favoring RF arm patients. Previous studies showed contrasting results in terms of the QoL and toxicity experienced after a dosage reduction in BCG. Yokomizo et al. [14] found a lower BCG dose (40 mg) to be associated with lower toxicity and better QoL than the standard BCG dose (80 mg). This study focused primarily on an eight-instillation induction phase, while QoL was assessed only once after the induction phase had ended. The EORTC 30962 trial analyzed the efficacy of one-third BCG doses compared with the standard dose. They did not report any difference in toxicity between the reduced and full-dose arms [8]. This trial, however, was mainly designed to analyze the toxicity after the maintenance phase and did not focus on the induction phase. Nonetheless, these studies focused on the effect of a reduced dose of BCG instillations, whereas our study focused on a full dose but RF of BCG instillations. A direct comparison of these studies is therefore difficult.
In accordance with literature, our study reported urinary SEs, general malaise, and fever as the most frequent SEs caused by BCG instillations [15,16].These SEs were predominantly mild to moderate, reflecting generally good BCG tolerability.
However, a reduced BCG instillation frequency significantly decreased the number of overall SEs. This difference was significant for general malaise (p = 0.03) at T1, and for frequency (p = 0.002), urgency (p = 0.02), and dysuria (p = 0.001) at T7. In fact, nearly three times as many patients in the SF arm did not complete the instillations due to SEs (14 vs 5 patients).
Interestingly, the higher toxicity reported in the SF arm did not translate into worse QoL. This unexpected result may be explained by the instrument used to measure QoL (QLQ-C30), which is not optimal for this group of patients. Although the QLQ-C30 questionnaire has been validated internationally, it does not focus directly on (non–muscle-invasive) bladder cancer (BC). Literature suggests that the questionnaire fails to assess finer BC-specific details/domains, which reduces the responsiveness to changes. Domains such as sexual functioning, self-consciousness, embarrassment, and psychological distress are of greater importance in BC patients but are not assessed thoroughly by the QLQ-C30 questionnaire [17]. Several BC-specific questionnaires have been designed to offer an instrument that closes these gaps, such as the EORTC QLQ-NMIBC24 questionnaire, which has shown excellent measurement properties with regard to validity, reliability, and responsiveness [18], but this did not exist at the time of the original trial design.
The randomized setting of our study is a strength. In addition, the QoL evaluations at eight time points over a year should have been able to pick up temporal QoL changes. Nonetheless, there are limitations to this post hoc analysis. In addition to the suboptimal QoL questionnaire, QoL was no longer measured after the endpoint (a recurrence) was reached, so that the influence of, for example, extra TURBTs could not be studied. Moreover, the large number of unanswered questionnaires resulted in a smaller number of evaluable patients (again raising further questions about the suitability of the instrument used to measure QoL changes). Lastly, the patients could not be blinded to RF instillation, which may have induced a response bias. This may have produced a placebo-like effect, with patients reporting better QoL with RF BCG instillations, and vice versa.

Conclusions
An analysis of the EAU-RF NIMBUS study data did not show better QoL with EORTC QLQ-C30 v3.0 in patients undergoing an RF BCG instillation regimen, despite lower storage symptoms at T7 in favor of RF. This finding may possibly be explained by the insensitivity of the EORTC QLQ-C30 questionnaire for small QoL domain changes. Our study, together with the previous finding that an RF schedule is inferior, supports the use of a standard BCG instillation schedule.

Fig. 1 - Overview of the two treatment arms where each block represents a BCG instillation. The crossed-out blocks in the RF arm represent the instillations that were not performed. The different time points represent the moments at which the QLQ-C30 questionnaire and the side-effect evaluations were completed. BCG = bacillus Calmette-Guérin; MM = maintenance month; QLQ-C30 = Quality of Life Questionnaire Core 30 version 3.0; RF = reduced frequency; SF = standard frequency; T = time point; W = week.
BCG = bacillus Calmette-Guérin; CI = confidence interval; CIS = carcinoma in situ. a Three patients had CIS only: 1× treatment completed (15 instillations), patient included in follow-up, no recurrence; 1× treatment completed (14 instillations), patient included in follow-up, first recurrence, and tumor in prostatic urethra at month 36; 1× consent withdrawn after six instillations, patient included in follow-up until that time point, no recurrence. b Patient did not receive BCG and was not included in follow-up. c One patient was previously treated with BCG. This was a protocol violation. The patient was kept in the analyses for consistency with the original paper on the NIMBUS trial (by Grimm et al. [9]).
European Urology Open Science 56 (2023) 15-24
The number of grade III and IV systemic SEs was similar to that of the local SEs (3.8% and 0.6%, respectively).

Fig. 2 - Intravesical treatments received and reasons to stop. a Examples of "Other" are consent withdrawn, lost to follow-up, and patient not compliant.
EORTC QLQ-C30 = European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 version 3.0; QoL = quality of life; RF = reduced frequency; SD = standard deviation; SF = standard frequency.

Fig. 3 - Summary of the results from the EORTC QLQ-C30 where the X axes represent the time points and the Y axes represent the mean QoL of the different EORTC scales and items (all scales have a range 0-100; for QoL scales, a higher score means better QoL; for symptom scales, a higher score means more symptoms). EORTC = European Organization for Research and Treatment of Cancer; QoL = quality of life; QLQ-C30 = Quality of Life Questionnaire Core 30 version 3.0; RF = reduced frequency; SF = standard frequency.
BCG = bacillus Calmette-Guérin; RF = reduced frequency; SE = side effect; SF = standard frequency; WHO = World Health Organization. a The following grade 3 or grade 4 side effects were observed: (1) grade 3 local side effects: one event in the RF group (M3W1) and seven events in four patients in the SF group (3× M2W6, 2× M6W1, and 2× M6W3); (2) grade 4 local side effects: none; (3) grade 3 systemic side effects: one event in the RF group (M3W1); and (4) grade 4 systemic side effects: seven events in four patients in the SF group (M6W1, M6W3, and M12W2). b For patients for whom a side-effect form was not filled out, we assumed that there were no side effects. c Calculated using Mann-Whitney U test based on the average number of side effects per patient.

Table 3 - Incidence of WHO grade I-IV side effects a by treatment group at time points T1 (induction week 6), T5 (maintenance month 6, week 3), and T7 (maintenance month 12, week 3)