Introduction

In 2018, there were approximately 549,000 new cases of bladder cancer worldwide, and bladder cancer accounted for approximately 200,000 deaths [1]. The vast majority of bladder cancer cases (90%) were urothelial carcinoma [2, 3].

The burden of advanced urothelial cancer (UC) is attributable to disease and treatment characteristics [4, 5]. Haematuria, urinary frequency and urgency, and pain are among the most common signs and symptoms [6]. Additionally, symptoms such as bleeding, pain, dysuria, constipation, fatigue, emotional distress, and urinary obstruction adversely impact quality of life (QoL) in advanced bladder cancer [5]. Treatment-related side effects of fatigue, and the impact on daily activities, are also reported as relevant to these patients [5], as are issues with self-esteem, embarrassment, and difficulty engaging in sexual relationships [4, 7, 8]. Emerging novel treatments [9] have accelerated interest in developing and validating patient-reported outcome (PRO) instruments to gain a full understanding of UC and its impact, information that is important for patients and clinicians. In addition, PRO data can inform the benefit/risk assessment for regulators and payers [10].

Many instruments exist to assess health-related quality-of-life (HRQoL) in UC [11, 12]. PRO data are particularly important for these patients due to the burden of disease and therapy [11]. The Functional Assessment of Cancer Therapy-Bladder (FACT-Bl) has been used in several studies, often to determine comparative effects of various interventions on HRQoL [6–8, 13–16]. Despite its wide use, validity data in patients with locally advanced or metastatic UC have not been published. The current study confirms the validity of the FACT-Bl in patients with urothelial carcinoma and follows an approach that is consistent with FDA guidelines for PRO validation [17]. It assesses the psychometric properties of the FACT-Bl in a group of patients who participated in the combined phase 1/2 clinical trial of durvalumab monotherapy. Funding for this research was provided by AstraZeneca.

Materials and methods

Study design

A multicentre, open-label (dose-escalation, dose-exploration, and dose-expansion) study was previously conducted to evaluate durvalumab’s safety, tolerability, and antitumour activity in patients with inoperable or metastatic solid tumours (Study 1108; NCT01693562). Study results are published elsewhere [18]. A total of 182 patients with upper and lower tract UC who had received, and had progressed on or were refractory to, 1 or 2 prior lines of systemic therapy for inoperable or metastatic disease, including a standard platinum-based regimen, were included in that open-label study. That clinical study was conducted according to the Declaration of Helsinki and approved by the independent ethics committee or institutional review board at each participating centre, with written informed consent obtained from all patients.

Data collection

PROs were evaluated using pen-and-paper versions of the FACT-Bl, the European Organisation for Research and Treatment of Cancer Quality-of-Life Questionnaire C30 (EORTC QLQ-C30), and a one-item pain questionnaire [19] completed at the screening visit; on day 1 of treatment doses 1, 3, and 5; at weeks 6, 12, and 16; and every 8 weeks until end of treatment (12 months).

Measures

FACT-Bl

The FACT-Bl (version 4) is a multidimensional, self-administered 39-item questionnaire to assess patient bladder cancer-specific symptoms using a ‘core’ set of questions (Functional Assessment of Cancer Therapy-General; FACT-G), a cancer site-specific bladder subscale, and HRQoL [20, 21]. Table 1 summarises the five subscales and three summary scores produced by the FACT-Bl.

Table 1 Functional Assessment of Cancer Therapy-Bladder Cancer (FACT-Bl) subscales, summary, and prioritised item characteristics and scores at baseline

NFBlSI-18

The National Comprehensive Cancer Network FACT Bladder Symptom Index (NFBlSI-18), a measure of advanced bladder cancer-specific symptoms, assesses symptoms perceived as most important by patients and oncology clinical experts. The NFBlSI-18 is based almost entirely on the FACT-Bl, including 16 items from the FACT-Bl instrument plus two items that have not been previously included (‘I feel weak all over’ and ‘I feel light-headed [dizzy]’) [5]. These two items were added based on qualitative analysis of patient and clinician priorities for symptoms and concerns associated with receiving treatment for advanced bladder cancer.

The NFBlSI-18 yields three subscale scores (i.e. disease-related symptoms, treatment side effects, and general function/well-being) and a summary score. Items are rated on a 5-point Likert scale ranging from 0 = ‘not at all’ to 4 = ‘very much’ with a 7-day recall period. Higher scores represent better QoL. The subscale and summary scores are calculated using the Manual of Functional Assessment of Chronic Illness Therapy (FACIT) Measurement System [22]. Only the NFBlSI-18 total summary score and the disease-related symptoms-physical subscale (NFBlSI-DRS-P) score are considered in the current analyses, and these were prorated based on the 16 available items.
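As an illustration of the prorating step, the sketch below applies the proration rule commonly used with FACIT instruments: the sum of the answered items is multiplied by the number of items in the scale and divided by the number of items answered. The function name `prorated_score`, the more-than-half completion requirement, and the example responses are assumptions made for illustration only; item responses are assumed to be already recoded so that higher values indicate better QoL. The scoring manual [22] remains the authoritative description.

```python
from typing import Optional, Sequence


def prorated_score(item_scores: Sequence[Optional[int]],
                   n_items_in_scale: int) -> Optional[float]:
    """Prorate a scale score from the answered items only.

    item_scores holds recoded responses (0-4); None marks an unavailable item.
    Returns None when half or fewer of the scale's items were answered
    (an assumed completion rule, not taken from the study).
    """
    answered = [s for s in item_scores if s is not None]
    if len(answered) * 2 <= n_items_in_scale:
        return None
    return sum(answered) * n_items_in_scale / len(answered)


# Hypothetical patient: 16 available items, each scored 3; 2 items not collected.
responses = [3] * 16 + [None] * 2
print(prorated_score(responses, n_items_in_scale=18))  # 54.0
```

With 16 of 18 items available, the observed sum is scaled by 18/16, so the prorated score stays on the same range as the full 18-item index.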

Statistical analysis

Psychometric analyses were performed on the full analysis set population using baseline (dose 1, day 1) data only, except for the test–retest analysis, which used dose 3 (day 29) data, and the responsiveness analysis, in which data up to dose 7 (day 85) were used as period 2.

FACT-Bl item and scale characteristics

Performance of the 39 FACT-Bl items was evaluated by means of descriptive statistics (mean, standard deviation) and percentage of lowest and highest responses (floor and ceiling effects, respectively) at baseline (dose 1, day 1). Single items regarding pain and fatigue were prioritised in the clinical trial and examined specifically in psychometric analyses. Characteristics of FACT-Bl subscales for physical, functional, social/family, and emotional well-being and the Bladder Cancer subscale (PWB, FWB, SWB, EWB, and BlCS, respectively) as well as total summary scores (FACT-G total score, FACT-Bl total score, FACT-Bl Trial Outcome Index [TOI], and NFBlSI-18) were summarised using measures of central tendency (e.g. mean, median) and variability (e.g. standard deviation [SD], interquartile range).
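The floor and ceiling percentages described above can be sketched as follows. This is a minimal illustration rather than the study code (the analyses were run in SAS); the function name `floor_ceiling_percent` and the example responses are hypothetical, and missing responses are assumed to be excluded from the denominator.

```python
from typing import Optional, Sequence, Tuple


def floor_ceiling_percent(responses: Sequence[Optional[int]],
                          lowest: int = 0,
                          highest: int = 4) -> Tuple[float, float]:
    """Return (% at lowest option, % at highest option) among answered responses."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered responses")
    n = len(answered)
    floor_pct = 100 * sum(r == lowest for r in answered) / n
    ceiling_pct = 100 * sum(r == highest for r in answered) / n
    return floor_pct, ceiling_pct


# Hypothetical responses to one item on the 0-4 scale; None is a skipped item.
item = [0, 0, 1, 2, 3, 4, 4, 4, 4, 2, None]
print(floor_ceiling_percent(item))  # (20.0, 40.0)
```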

Correlation analysis

Item-to-item, item-to-total, and between-scales correlations were assessed using Spearman correlation coefficients.
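For readers less familiar with rank correlation, the sketch below computes a Spearman coefficient as the Pearson correlation of average ranks, which handles the tied responses that are common with 0–4 items. It uses only the Python standard library; the function names and example data are hypothetical and this is not the study code.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
# Hypothetical item responses versus subscale totals for six patients.
print(round(spearman([0, 1, 2, 2, 3, 4], [5, 9, 12, 14, 18, 25]), 2))
```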

Reliability

Internal consistency reliability of subscale and total scores was estimated using Cronbach’s α. To evaluate reliability in stable patients, a group of patients whose EORTC QLQ-C30 QoL score was within ± 0.25 standard deviations of their baseline score was isolated. We evaluated intraclass correlation coefficients (ICCs) between baseline (period 1) and dose 3 (day 29) (period 2) [23]. Coefficients of 0.6 and higher are considered acceptable, and coefficients of 0.7 and higher are considered good [23].
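The internal consistency estimate can be illustrated with the standard formula for Cronbach’s α, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below is a minimal standard-library version with hypothetical data; it assumes complete, already recoded responses and does not cover the ICC, whose exact form is not stated here.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item scale answered by four respondents.
data = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(data), 2))  # 0.75
```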

Construct validity

Construct validity testing included convergent validity and known-group validity. Convergent validity was assessed at baseline using a Spearman correlation with EORTC QLQ-C30 domain scores. Known-group validity was examined by independent sample t test comparing baseline mean FACT-Bl scale scores by baseline tumour burden (above and below the median value, i.e. 59.9 mm) and by the baseline EORTC QLQ-C30 global health status/QoL score (above and below the median value, i.e. 50 points).
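The known-group comparison can be sketched as a median split followed by a pooled-variance t statistic. The code below is an illustration under stated assumptions: the function names and data are hypothetical, patients exactly at the median are placed in the lower group (the handling of such ties is not specified above), and only the t statistic and degrees of freedom are computed, since the p value requires the t distribution from a statistical package.

```python
from math import sqrt
from statistics import mean, median, variance
from typing import List, Sequence, Tuple


def split_at_median(grouping: Sequence[float],
                    scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split scores into (at or below median, above median) of the grouping variable."""
    cut = median(grouping)
    low = [s for g, s in zip(grouping, scores) if g <= cut]
    high = [s for g, s in zip(grouping, scores) if g > cut]
    return low, high


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Independent-samples t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical tumour burden (mm) and scale scores for six patients.
burden = [20, 40, 50, 70, 90, 110]
score = [6, 5, 4, 3, 2, 1]
low_burden, high_burden = split_at_median(burden, score)
t_stat, dof = pooled_t(low_burden, high_burden)
print(round(t_stat, 3), dof)  # 3.674 4
```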

Responsiveness

The ability to detect change was assessed by comparing changes in the FACT-Bl scores over time between responders and non-responders using mixed models with repeated measures. Assessments up to and including dose 7 (day 85) were included in the analysis to maximise the longitudinal window and ensure sufficient sample size. Responders versus non-responders were defined in two ways: objective response (responders defined as patients with a confirmed objective complete or partial tumour response [18]; non-responders included the remainder of the patients [n = 150]) and patient evaluation of change using global health status (GHS)/QoL (patients demonstrating at least a 10-point improvement in GHS/QoL scale at dose 7 [day 85] compared with their baseline score were classified as responders and the remainder of patients as non-responders). The FACT-Bl was considered responsive if the mean change from baseline to dose 7 (day 85) was > 0 and statistically significant (indicating improvement) for the responder group and < 0 and statistically significant (indicating deterioration) for non-responders.
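The patient-reported responder definition can be illustrated with a short sketch that classifies patients by a 10-point GHS/QoL improvement and then summarises the mean change in a FACT-Bl score per group. This is a descriptive simplification with hypothetical names and data; it does not reproduce the mixed model with repeated measures or its significance tests.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def classify_responders(baseline_ghs: Sequence[float],
                        followup_ghs: Sequence[float],
                        threshold: float = 10.0) -> List[bool]:
    """True where GHS/QoL improved by at least `threshold` points from baseline."""
    return [(f - b) >= threshold for b, f in zip(baseline_ghs, followup_ghs)]


def mean_change_by_group(baseline: Sequence[float],
                         followup: Sequence[float],
                         is_responder: Sequence[bool]) -> Tuple[float, float]:
    """Mean change from baseline for (responders, non-responders)."""
    changes = [f - b for b, f in zip(baseline, followup)]
    resp = [c for c, r in zip(changes, is_responder) if r]
    non = [c for c, r in zip(changes, is_responder) if not r]
    return mean(resp), mean(non)


# Hypothetical GHS/QoL and FACT-Bl total scores for four patients.
ghs_base, ghs_day85 = [50, 50, 60, 40], [67, 50, 65, 58]
fact_base = [100.0, 110.0, 90.0, 95.0]
fact_day85 = [112.0, 108.0, 88.0, 105.0]
flags = classify_responders(ghs_base, ghs_day85)
print(flags)  # [True, False, False, True]
print(mean_change_by_group(fact_base, fact_day85, flags))  # (11.0, -2.0)
```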

Clinically meaningful thresholds

As in other published studies [24], both anchor-based and distribution-based methods were used to explore a preliminary clinically meaningful change (CMC). For anchor-based methods, two external anchors were used: the objective response based on the Response Evaluation Criteria in Solid Tumours (RECIST) [25] and the EORTC QLQ-C30 GHS/QoL scale at day 57.

Several alternative methods were tested for convergence on the CMC using a robust sample size. Day 57 was selected as the most distant time point with at least 50% of the patients with a baseline reporting a score. The external anchors were as follows: (1) patients with objective response classified as ‘responders’, and the remaining patients as ‘non-responders’ and (2) patients classified as GHS/QoL responders/non-responders using the established clinically meaningful threshold of 10 points (as described above). Three distribution-based methods were also used: (1) 0.5 SD, (2) 1 standard error of measurement (SEM) at baseline, and (3) the reliable change index. T tests were performed to compare mean changes from baseline for clinical responders versus non-responders.
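The three distribution-based quantities can be sketched with their textbook definitions: half a baseline SD, SEM = SD × √(1 − reliability), and a reliable change threshold of 1.96 × √2 × SEM. The exact formulas and the reliability coefficient used in the analysis are not stated above, so the form of the reliable change index, the function name, and the example inputs are all assumptions made for illustration.

```python
from math import sqrt
from typing import Dict


def distribution_based_thresholds(baseline_sd: float,
                                  reliability: float,
                                  z: float = 1.96) -> Dict[str, float]:
    """Half-SD, SEM, and reliable-change threshold for one score.

    reliability is a coefficient between 0 and 1 (for example Cronbach's
    alpha or a test-retest ICC); which one applies is an assumption here.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    sem = baseline_sd * sqrt(1 - reliability)
    return {
        "half_sd": 0.5 * baseline_sd,
        "sem": sem,
        "rci": z * sqrt(2) * sem,  # assumed reliable-change form
    }


# Hypothetical score with a baseline SD of 10 and reliability of 0.84.
for name, value in distribution_based_thresholds(10.0, 0.84).items():
    print(name, round(value, 2))
```

With these inputs the half-SD is 5.0, the SEM is 4.0, and the reliable-change threshold is about 11.09, which shows how the three methods can bracket a range rather than a single value.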

All data preparation and analyses were performed using SAS version 9.3 (SAS Institute, NC) or higher. Statistical comparisons were made using two-sided tests at α = 0.05 significance level unless specifically stated otherwise. Due to the exploratory nature of the analyses, adjustments for multiple comparisons were not made.

All analyses, except for item characteristics, were performed on items recoded as necessary with higher scores indicating better QoL.

Results

Baseline demographics and patient response

As of data cut-off (24 October 2016), 191 patients were treated for locally advanced or metastatic UC, 182 of whom had progressed after platinum-based therapy [16]. Table 2 provides detailed patient demographics. Further demographic details are published elsewhere [18].

Table 2 Patient baseline demographics

Questionnaire completion/compliance

Out of 182 patients in the second-line-or-later (2L+) post-platinum UC subgroup, 172 (94.5%) completed the FACT-Bl questionnaire at baseline. Response rate was high (over 92%) for all items. Two questions were for patients with ostomies only (46 [26%] patients answered these questions). Two questions about sexuality were asked: one for all patients (89 [49%] patients responded) and one for men only (65 [36%] patients responded). Compliance specific to eligible populations for these questions was not calculated.

Item and scale performance

Subject responses covered the entire range (0–4) for each FACT-Bl item. The majority of items had floor or ceiling effects reflecting minimal symptoms and high functioning. Issues were noted for the three items addressing sexual functioning where at least 25% of the patients reported the lowest response option.

Mean values for FACT-Bl and FACT-G total were 107.5 (range 45.7–156.0) and 75.6 (range 21.7–108.0), respectively, which represent 69% and 70%, respectively, of the scale range. The mean score for the FACT-Bl FWB scale was 15.8 (range 0.0–28.0), representing moderately impaired functioning. Subscales and summary scores at baseline are reported in Table 1.

Correlation analysis

The patterns of correlations matched expectations, with items in a scale correlating more highly with the score of that scale than with the score of other scales in the instrument, and all subscales correlating strongly with the FACT-Bl total score and FACT-G total score. The highest correlation coefficient (0.91) was observed between FWB and FACT-G, and the second highest (0.81) between PWB and FACT-G. All subscales correlated moderately or higher with the TOI or NFBlSI-18 index. The SWB subscale had a very low correlation (r < 0.3) with the PWB subscale and moderate correlations with the EWB and BlCS subscales. Table 3 shows the subscale and summary score correlations.

Table 3 Between subscale and summary score correlations

Reliability analysis

All subscales and summary scores demonstrate adequate to good internal consistency (Cronbach’s α range 0.63 to 0.93). The BlCS subscale internal consistency (Cronbach’s α value of 0.63) was slightly lower than the generally recommended 0.70 [26]. Composed of different symptoms, the BlCS subscale demonstrated more inter-item variability across patients.

Minimal change in the mean subscale or mean summary scores from baseline to dose 3 (day 29) demonstrated good test–retest reliability. The estimated ICC for the two visits (4 weeks apart) ranged from 0.58 (EWB) to 0.80 (FACT-G total score), with the lower bound of the 95% CI ranging from 0.45 (for EWB) to 0.85 (for FACT-G total score). All ICCs exceeded 0.70, except for emotional well-being (ICC 0.58) and social well-being (ICC 0.66). The mean ICC was 0.72. This confirms that FACT-Bl and the NFBlSI-18 showed fair to very good reliability for all dimensions in this patient population. The single items ‘I have a lack of energy’ and ‘I have pain’ also demonstrate acceptable reliability with ICC values of 0.60 and 0.70, respectively.

Construct validity

As could be expected, the physical functional domain from the EORTC QLQ-C30 correlates highly (r ≥ 0.77) with the PWB. Strong correlations are also observed with the summary scores (FACT-Bl total score, FACT-G total score, TOI, and NFBlSI-18) and the FWB and BlCS. The EORTC QLQ-C30 emotional functional domain correlates highly (r ≥ 0.72) with EWB. In contrast, the EORTC QLQ-C30 social functional domain correlates weakly with SWB (r = 0.22).

The fatigue domain from the EORTC QLQ-C30 correlates highly with the single-item GP1 (fatigue) from the FACT-Bl (r = − 0.8), and the pain domain from the EORTC QLQ-C30 correlates highly with the single-item GP4 (pain) from the FACT-Bl (r = − 0.9). Note that the Spearman correlation coefficients are negative because higher scores on the fatigue and pain domains of the EORTC QLQ-C30 indicate a worse health state, whereas higher values on the single items GP1 (fatigue) and GP4 (pain) indicate a better health state.

The global health status/QoL score from the EORTC QLQ-C30 correlates highly with FACT-G, FACT-Bl TOI, FACT-Bl total score, and NFBlSI-18 (r > 0.76). Evidence for known-group validity was found for the FACT-Bl and NFBlSI-18 through significant differences between groups defined by baseline tumour burden and EORTC QLQ-C30 GHS/QoL scores (Table 4). Increased tumour burden was associated with lower scores (worse health status and more symptoms). This finding holds for all the investigated scores except for the SWB and EWB, where the difference between groups was not significant. Similarly, the FACT-Bl showed significant differences between groups defined by the GHS/QoL scores at baseline.

Table 4 Mean score at baseline by tumour burden and EORTC QLQ-C30 GHS/QoL

Responsiveness

The FACT-Bl subscale and total scores examined in this study were responsive to changes in bladder cancer symptom severity during a 12-week time frame. The mean change from baseline to dose 7 (day 85) was > 0 (indicating improvement) for almost all FACT-Bl scores for the responder group, while estimates < 0 (indicating deterioration) were observed for non-responders, regardless of which criterion was used to define responders (objective tumour response or EORTC QLQ-C30 GHS/QoL). The mean change from baseline to dose 7 (day 85) for responders defined by GHS/QoL ranged from 0.75 (for NFBlSI-18) to 21.1 (FACT-Bl total score), compared with − 2.95 (FACT-Bl total score) to 0.06 (BlCS) for non-responders. Similarly, the mean change from baseline to dose 7 (day 85) for responders defined by objective response ranged from − 1.05 (for SWB) to 12.0 (FACT-Bl total score), compared with − 3.27 (FACT-Bl total score) to 0.16 (SWB) for non-responders.

Clinically meaningful change

The estimated clinically meaningful thresholds are provided in Table 5. Of note are the FACT-Bl total score ranges of 6.2–11.5 (rounded to 6–12), the FACT-Bl TOI of 5.4–8.7 (rounded to 5–9), the fatigue item (GP1) ranges of 0.6–1.1 (rounded to 1–2), the pain item (GP4) ranges of 0.7–1.0 (rounded to 1–2), and the NFBlSI-18 ranges of 4.4–6.7 (rounded to 4–7).

Table 5 Clinically meaningful thresholds

The anchor-based estimates of the clinically meaningful thresholds were larger than the distribution-based estimates, and the two were reconciled to provide final estimates. Figure 1 shows the mean change for responders and non-responders in the clinical anchor group.

Fig. 1 Mean score changes from baseline in clinical anchor-based responders and non-responders. Notes: Independent sample t test, p value from pooled t test unless otherwise noted. p value from Satterthwaite approximation (double asterisks)

Discussion

This paper reports on the psychometric properties of the FACT-Bl in patients with locally advanced or metastatic UC. Our study fills a gap in the psychometric evidence for the FACT-Bl, following FDA guidelines for PRO validation [17], and may be useful for other studies of patients with advanced UC.

Results from the UC patient cohort Study 1108 showed clinically favourable activity and an acceptable safety profile for durvalumab [18]. The current analysis showed an overall high completion of the FACT-Bl and provided additional information on outcomes important to these patients. The completion rates were above the minimum required for scoring the scales and subscales.

The psychometric properties of the existing FACT-Bl scales and the pain and fatigue items were found to be very good, with correlations in the range of others accepted throughout the validation literature [27–29]. In addition to reliability, the FACT-Bl subscale and total scores showed good evidence for construct validity and were responsive to changes in UC symptom severity during a 12-week time frame assessed by both objective tumour response and patient evaluation of change, suggesting appropriateness of the instrument to detect symptomatic change.

This study has some limitations. First, the FACT-Bl was created several years ago, and many new therapies have been developed that address symptoms, and are associated with side effects, which are not necessarily captured in the FACT-Bl. Although this questionnaire does include an overall side-effect bother item (item GP5), it is possible that additional items may be needed to assess the impact of newer treatments on patients’ lives. Second, these data were obtained from patients participating in a clinical trial, and the results may not be completely generalisable to all patients with advanced UC.

The NFBlSI-18 is an abbreviated version of the FACT-Bl that adds two new questions to the 16 FACT-Bl questions that are deemed most important by patients with advanced cancer and by clinicians who care for them [5]. In this report, the total NFBlSI-18 score was prorated based on the 16 items in the FACT-Bl. So, although these results provide support for the use of the more-focused NFBlSI-18, more data on the 18-item version will be important to more fully understand its validity.

Another limitation of the study is the unavailability of anchors (e.g. the Patient Global Impression of Change [PGIC] [30]) to determine clinically meaningful thresholds. However, very useful clinical trial end point anchors were used, including tumour response data and the scores on the EORTC QLQ-C30, another commonly used QoL questionnaire. Our analysis suggests that the established thresholds of 2–3 points for the PWB, FWB, EWB, and SWB subscales and 5–7 points for FACT-G are appropriate for this patient population [31]. These ranges are comparable to those found in studies of patients with other cancers, including the prostate [32], lung [33], and breast [31].

The ICCs showed good stability of the FACT-Bl. However, the use of the day 29 assessment as a proxy measure for test–retest reliability may have attenuated the ICCs, as real change may have occurred during this period. The ICCs for EWB and SWB did not exceed the threshold of 0.70, although ICCs for these two subscales are typically lower than for other subscales, as has been demonstrated in similar studies [34].

Conclusions

Psychometric properties of the existing established scales for the FACT-Bl as well as the NFBlSI-18 were found to be very good for use in this population of advanced urothelial cancer patients. Emerging therapies for bladder cancer have accelerated interest in the development and validation of PRO instruments for this patient population to capture meaningful improvements in quality-of-life and symptom outcomes.