Quantifying trade-offs: quality of life and quality-adjusted survival in a randomised trial of chemotherapy in postmenopausal patients with lymph node-negative breast cancer

We evaluated quality of life (QL) and quality-adjusted survival in International Breast Cancer Study Group Trial IX, a randomised trial including 1669 eligible patients receiving tamoxifen for 5 years or three prior cycles of cyclophosphamide, methotrexate and 5-fluorouracil (CMF) followed by 57 months tamoxifen. During the time with CMF toxicity (Tox), without symptoms and toxicity (TWiST), and following relapse (Rel), patients scored their QL indicators and a utility indicator for subjective health estimation between ‘perfect’ and ‘worst’ health. Scores were averaged within Tox, TWiST and Rel and transformed to utilities. Mean durations for the three transition times were weighted with utilities to obtain mean quality-adjusted TWiST (Q-TWiST). Patients receiving CMF reported significantly worse scores for most QL domains at month 3, but less hot flushes. After completing chemotherapy, there were no differences by treatment groups. Benefits evaluated by Q-TWiST favoured the additional chemotherapy. CMF provided 3 more months of Q-TWiST for patients with ER-negative tumours, but CMF provided no benefit in Q-TWiST for patients with ER-positive tumours. Q-TWiST analysis based on patient ratings is feasible in large-scale cross-cultural clinical trials.

The International Breast Cancer Study Group (IBCSG) recently presented first results of a randomised trial (Trial IX) testing the role of adjuvant chemotherapy preceding treatment with tamoxifen for postmenopausal patients with lymph node-negative breast cancer (International Breast Cancer Study Group, 2002). Patients showed substantially better disease-free survival (DFS) with adjuvant chemotherapy if their oestrogen receptor (ER) status was negative (i.e. endocrine nonresponsive). In contrast, if their cancer was ER-positive (i.e. endocrine-responsive), they obtained no benefit from the chemotherapy compared with tamoxifen alone. In this report, we extend the analysis to quantify trade-offs based on quality of life (QL) and quality-adjusted survival (QAS).
The objective of the QL analysis was an extension of our earlier findings on the impact of adjuvant therapy on QL (Hürny et al, 1996b). Presence, duration, timing and anticipation of chemotherapy had a measurable effect on patients' QL, but patients' psychological adaptation was found to be more important for their QL than cytotoxic side effects. In this trial, we used additional QL indicators (Bernhard et al, 1997).
The objective of the QAS analysis was an extension of the Q-TWiST model designed to evaluate 'trade-offs' in clinical trials (Goldhirsch et al, 1989). This model divides the life-span from the beginning of adjuvant treatment until death into three time segments corresponding to distinct health states: Tox (time with toxicity), TWiST (time without symptoms and toxicity) and Rel (time after systemic relapse). In previous trials, Tox and Rel were weighted by arbitrary utilities and added to TWiST to reach an overall assessment of different treatment groups. In this trial, patients were asked to directly assess the relevant utilities during Tox, TWiST and Rel using a scale which asked them to imagine that they would have to live the rest of their life in their current condition and then to rate this condition between 'perfect health' and 'worst health'. This scale has been shown to correspond to utility values as measured by conventional time trade-off (Hürny et al, 1998).
QL and QAS are usually evaluated separately despite their complementary objectives: understanding the pattern of morbidity and adaptation (QL), and providing information for clinical policy and decision-making (QAS). In this analysis, we link these two concepts to provide a quantitative backing for the trade-offs inherent in the use of effective but toxic therapies.

The Trial
Between October 1988 and August 1999, 1715 postmenopausal patients were randomly assigned to receive either tamoxifen (20 mg day À1 ) for 60 months or three 28-day courses of CMF (cyclophosphamide at 100 mg m À2 orally on days 1 -14, methotrexate at 40 mg m À2 intravenously on days 1 and 8 and 5fluorouracil at 600 mg m À2 intravenously on days 1 and 8, followed by tamoxifen (20 mg day À1 ) for 57 months. Tamoxifen following chemotherapy was to begin on day 15 of the final course of CMF. Of the 1715 patients randomly assigned, 1669 (97%) were eligible and assessable. The details of the trial protocol and conduct are described elsewhere (International Breast Cancer Study Group, 2002). The median follow-up of this analysis was 71 months. Participating investigators are displayed in Appendix A.
Patients were asked to complete a QL form at beginning of treatment (baseline), 2 months later (i.e. day 1 of cycle 3 or 8 weeks after tamoxifen start), at each 3 months for the first year, at months 18 and 24, and 1 and 6 months after relapse. This schedule was expanded, with yearly assessments up to month 72 (June 1995) regardless of the disease status (December 1996), that is, the 1 and 6 month assessments following relapse were dropped.

Quality of life analysis
We predicted worse QL during chemotherapy (month 3) but no residual effects among treatment groups after completion of chemotherapy (Hürny et al, 1996b). A clinically meaningful difference was defined based on our previous observations of the coping indicator in postmenopausal patients with lymph nodepositive breast cancer (Hürny et al, 1996b), as the ratio of the average change between baseline and month 3 in patients with Tam alone vs those with three initial cycles CMF followed by Tam (ratio ¼ 3.37; not shown in the original article). We chose to use the ratio instead of the absolute difference in scores because the baseline scores in the present trial were better than those of the earlier trial, which involved women with node-positive disease.
As a secondary hypothesis, we expected smaller adverse effect of CMF on QL at month 3 for patients with ER-negative than those with ER-positive tumours. We reasoned that patients with ERnegative tumours would be more likely to feel that chemotherapy was necessary and therefore find it more bearable than patients with ER-positive disease.
We report QL data for the first 24 months in patients without recurrence within this time. Of the 1669 eligible and assessable patients, 1398 patients were included in the QL analysis. Of these, 803 completed all of their assessments using the 1993 version of the IBCSG QL Core Form (Bernhard et al, 1997). This form comprised indicators for physical well being, mood (Hürny et al, 1996a), coping effort (Hürny et al, 1993), appetite, tiredness, hot flushes, nausea/vomiting, perceived social support, restrictions in arm movement and subjective health estimation (SHE) (Hürny et al, 1998) in the linear analogue self-assessment (LASA) format. The other 595 patients either filled out all of their assessments using the previous version of the form, which included four of these indicators (physical well being, mood, coping effort, appetite) (Butow et al, 1991), or they filled out a combination of the two. For consistency, only data provided by the 803 patients mentioned above were used for the analyses of the additional indicators. All indicators were analysed separately.
To test for differences between treatments in QL scores at each time point, we used ANOVA adjusting for culture (country/ language; see Table 1 for definitions). For baseline analysis and tests for heterogeneity among treatment groups at each time point, we used the square roots of the scores (Hürny et al, 1996b) because this transformation approximated a normal distribution and was effective in stabilising the variances for all indicators. The figures however show the results in the original scores from 0 to 100.
We also tested for differences in QL scores between baseline and month 3 and 6, respectively, of the within-patient changes in an ANOVA model that included assigned treatment and culture. The intrapatient differences were normally distributed for all QL scores.

Quality-adjusted survival analysis
For this analysis, we used the SHE indicator as designed for QAS (Hürny et al, 1998). Patients were asked to imagine that they would have to live the rest of their life in their current condition and then to rate this condition between 'perfect health' and 'worst health'. This indicator was previously validated against a time trade-off (TTO) interview (i.e. a preference measure) in patients with metastatic breast cancer. The conventional negative anchor 'death' was replaced by 'worst health' in the adjuvant setting since the two versions showed comparable results and 'death' was judged to be an unacceptable anchor for some of the language/culture groups in the present study.
Following the Q-TWiST model (Goldhirsch et al, 1989), we defined three clinical health states: Tox, TWiST and Rel. The clinical question of this trial was the impact of adding three cycles of CMF. Therefore, Tox was calculated only in patients randomised to receive CMF.
To calculate Q-TWiST, each health state is assigned a utility coefficient (u t , u twist and u rel ) which gives a value to time spent in the state relative to the value of an equal amount of time spent in a state of 'perfect health'. The utilities are assumed to be in the interval [0,1], where a zero indicates worst possible health, and unity indicates a state as good as perfect health. The Q-TWiST model is obtained by taking the linear combination of the health state durations adjusted by the respective utilities: Our objective was to make a realistic patient-derived estimate of the utilities to evaluate Q-TWiST. We hypothesised that utilities derived from observed SHE values in both the Tox and Rel states would be substantially higher than the arbitrary value of 0.5 used to illustrate the Q-TWiST method when it was introduced (Goldhirsch et al, 1989). Also, we predicted that the actual scores during the TWiST state would be less than 'perfect', if only because all patients were receiving Tam. Within each defined health state, all available SHE scores were used (N ¼ 1669). For Tox, we used the median value at month 3 (i.e. peak of toxicity). For TWiST, we used the median of the SHE scores averaged within patients over the first 24 months after randomisation (excluding the first 3 months in patients with CMF). For Rel, we used the median of the SHE scores averaged within patients over the first 6 months after relapse. These SHE estimates were converted to quality weights using a power transformation: (Torrance et al, 1996;Torrance et al, 2001). We used the a value from our validation study (a ¼ 1.6) and performed a sensitivity analysis for a range of published a's.
An exploratory analysis was conducted to assess u t confined to those patients who had any recorded subjective toxic effect of grade 2 or higher (excluding amenorrhea) during their chemotherapy as was done in the original Q-TWiST model (Goldhirsch et al, 1989).
Mean health state durations were estimated from censored survival data (product limit method) up to 72 months from randomisation by computing the areas between the survival curve estimates for the transition times. These durations were adjusted using the patient-derived utilities in order to estimate mean Q-TWiST for each treatment group. The treatments were separately compared overall and within the prospectively stratified ERnegative and ER-positive cohorts.
We performed a threshold utility analysis both overall and within the ER-negative and ER-positive cohorts. These findings provide a decision aid for a range of utilities for Tox and Rel. To account for the less than 'perfect health', we divided each of the three utilities by the patient estimated u twist so that Q-TWiST is interpreted relative to TWiST and more accurately reflects the patients' perception during the time period: Q-TWiST ¼ u t /u twist Â Tox þ u twist /u twist Â TWiST Â u r /u twist Â Rel. The treatment comparison results are presented for all possible values of the utilities.
For all analyses, P-values less than 0.05 were deemed statistically significant. No adjustment was made for multiple testing.

Quality of life
Of the 1669 eligible patients, 1398 were assessable for the QL study. The submission rate of the QL form was 87% (1213) at baseline declining to 75% (1044) at month 24. The definition of the sample is shown in Table 1. The patient characteristics are shown in Table 2. Figure 1 depicts the expected short-term effect between treatments at month 3, with less severe tiredness (P ¼ 0.01) in patients receiving Tam alone, as compared to those patients with three initial cycles of CMF. Nausea/vomiting (Po0.01) and appetite were similarly affected (Po0.01). Also, at 3 months, physical well being (Po0.01) and mood (Po0.01) showed effects in favour of Tam alone. Subjective health estimates (SHE) were similarly affected (P ¼ 0.02), as displayed in Figure 2. In contrast, patients receiving CMF reported less hot flushes at 3 months (Po0.01) than those receiving Tam. CMF recipients started tamoxifen after completing chemotherapy.
The global QL indicators showed effects in favour of Tam alone. The coping scores indicated a steady improvement over the first 24 months. This adaptation was delayed but not prevented in patients receiving CMF (Po0.01), as shown in Figure 3. The ratio of the average change in the coping indicator between baseline and month 3 in patients with Tam alone vs those with CMF followed by  (6)  47 (7) 89 (6) Vessel invasion absent 540 (77) 528 (76) 1068 (76)    Tam (ratio ¼ 3.17) was close to that observed in our previous trial (ratio ¼ 3.37), indicating a clinically meaningful difference. There was a baseline difference (P ¼ 0.03) with patients assigned to CMF reporting lower SHE scores. Of these patients, 43% completed the QL form prior to randomisation (thus up to 57% did so after knowing treatment assignment), and 95% of all patients with CMF at day 1 of the first cycle or before (thus 5% did so after experiencing initial toxicity).
The mean treatment differences in the changes of scores over the first 3 months are shown in Figure 4. After the completion of chemotherapy (i.e. month 6 assessment), there were no statistically significant residual differences between treatment groups for any of the QL measures.
Similar analyses were performed within the ER-negative and ER-positive cohorts. Contrary to our hypothesis, there was no indication for a different treatment effect on QL by ER status (data not shown).
The patient derived utilities for the three health states estimated from SHE scores are shown in Table 4. For Tox, 276 patients with CMF responded to the SHE question. The patient-derived utility for this period was 0.89. Of these patients, 66% (N ¼ 183) had a subjective toxic effect of grade 2 or higher (excluding amenorrhea) reported during the three cycles. The estimated utility for Tox of this subgroup was also 0.89.
For TWiST, SHE scores were available for 742 patients. The estimated utility for this period was 0.91. There was no difference in utility during TWiST by randomised treatment. For Rel, SHE scores were available for 37 patients. Those who relapsed after adjuvant CMF indicated a trend to lower SHE scores (median ¼ 0.5, range: 0.10 -0.99) than those initially treated with Tam alone (median ¼ 0.62, range: 0.26 -1.0). Given the small subgroups, we used the overall SHE scores (median ¼ 0.54) for QAS (u r ¼ 0.71).   Positive values indicate either an improvement or less deterioration in QL for patients randomised to receive Tam alone compared with patients randomised to receive CMF followed by Tam.
The average values of time spent in Tox, TWiST and Rel within 72 months of randomisation are summarised in Table 5. The calculation of Q-TWiST is illustrated by the weighted combination of these components using the patient-derived estimates of u t ¼ 0.89, u twist ¼ 0.91 and u r ¼ 0.71. Based on these utilities, the average Q-TWiST within the first 6 years for patients receiving CMF followed by Tam was 62.5 months, 1.1 month longer than patients receiving Tam alone (P ¼ 0.03).
Patients benefited from adjuvant chemotherapy if their ERstatus was negative, with an average gain in Q-TWiST of 3 months as compared to those with tamoxifen alone (P ¼ 0.03). Patients with ER-positive tumours obtained no Q-TWiST benefit from the chemotherapy . Figure 5 allows a sensitivity analysis to display the effect on Q-TWiST of any combination of utilities during Tox and Rel. To prepare this figure, we standardised the other utilities relative to u twist For example, a conventional treatment comparison for DFS makes no deduction for reduced utility during Tox (u t ¼ u twist ¼ 1). Based on these assumptions, CMF followed by Tam is obviously preferred to Tam alone. Taking into account different values for Tox (u t o1;u twist ¼ 1) and including Rel, there is a gain in Q-TWiST for CMF followed by Tam for most values of u t and u r . This trend towards an improvement reaches statistical significance for     Finally, we performed a sensitivity analysis for a range of proposed a's used to transform the SHE scale to reflect a TTO scale. After using an a of 1.4, the utilities for each of the three health states were similar although slightly reduced. After using an a value of 1.8, again the utilities were similar for all three health states although slightly higher. Thus, incorporating utilities using an a within the range of 1.4 -1.8 would not markedly affect the results obtained from our Q-TWiST analysis (data not shown).

DISCUSSION
The main objective of these analyses was to link QL and QAS in comparing the randomised treatments. In addition, we investigated whether there was a difference in the magnitude of the chemotherapy effect according to ER status of the primary tumour as was found for survival in this trial (International Breast Cancer Study Group, 2002).

CMF better than Tam alone (NS)
CMF better than Tam alone (S) +6 +5 +4 +3 +2 +1 Figure 6 Threshold utility analysis of the ER-negative cohort (N ¼ 382). The diagonal lines indicate the units of months gained in Q-TWiST for different utilities for Tox and Rel. The utilities for TWiST is defined as u twist ¼ 1 (i.e. reference state). All values of the utilities for Tox and Rel result in improved Q-TWiST for CMF followed by Tam compared with Tam alone. The shaded region (top left) represents utilities (u t and u r ) for which the improvement in Q-TWiST is statistically significant (Pp0.05). S ¼ significant and NS ¼ not significant. The diagonal lines indicate the units of months gained in Q-TWiST for different utilities for Tox and Rel. The utilities for TWiST is defined as u twist ¼ 1 (i.e. reference state). The solid line labelled 0 is the threshold above which CMF followed by Tam results in improved Q-TWiST. The area below 0 with hatch marks represents improved Q-TWiST for Tam alone. The shaded region (bottom right) represents utilities (u t and u r ) for which the improvement in Q-TWiST is statistically significant for Tam alone (P p 0.05). S ¼ significant and NS ¼ not significant.

Quality of life
The subjective impact of adjuvant therapy was investigated over the first 24 months of treatment. At month 3, patients receiving CMF reported the expected side effects. Better QL in patients with Tam alone was also indicated by the various global measures of well being, coping and health status, despite the earlier beginning of hot flushes for the Tam alone group.
After completing chemotherapy, QL scores rapidly improved as found in our previous trials (Hürny et al, 1996b;The International Breast Cancer Study Group, 2001) and in an adjuvant trial by the Eastern Cooperative Oncology Group with a more comprehensive QL questionnaire (Fairclough et al, 1999). Contrary to our hypothesis, patients' perception of chemotherapy was not affected by the ER status of their tumour.
We faced problems with the timing of administration for the baseline QL assessment. A substantial proportion of patients completed the QL form after randomisation but before starting CMF. These patients were presumably aware of their assigned treatment. The baseline scores may reflect an anticipation of cytotoxic side effects or perception of worse health status, as described for indicators of QL and health status and for preference measures (Hürny et al, 1994;Hürny et al, 1996b;Jansen et al, 2001b). To eliminate any differential anticipatory effects on baseline scores in future studies, we have introduced a completed QL form as an eligibility criterion.
Overall, the improvement in QL over the first 6 months was more pronounced than the transient impairment by chemotherapy. These findings confirm those of IBCSG Trials VI and VII (Hürny et al, 1996b), suggesting that a patient's psychological adaptation is more important for her QL than cytotoxic side effects. There were no residual effects of CMF as assessed by our indicators. Similarly, in a survey in premenopausal women with node-negative breast cancer treated with or without adjuvant CMF, there were no long-term effects of CMF in general and breast cancer-specific QL domains (Joly et al, 2000). This finding does not exclude long-term sequelae, such as fatigue (Bower et al, 2000), or impaired sexual (Broeckel et al, 2002) or cognitive functioning (Phillips and Bernhard, 2003), in subgroups.

Quality-adjusted survival
The Q-TWiST method has hitherto used utilities assigned arbitrarily (Goldhirsch et al, 1989) or estimated based on patient reported QL (Fairclough et al, 1999). Although this approach provides a useful decision aid via a threshold analysis, we wanted to take into account patients' own perception of their health status.
We used a global indicator for health status to reflect the patients' perception across the three distinct health states (Tox, TWiST, Rel) (Hürny et al, 1998). This extension of the Q-TWiST model has provided new information for decision-making. The extent of impairment during Tox was less severe than the conventional assumptions. This finding is not related to the timing of the baseline assessment as we used only the scores at month 3. Our finding is consistent with those of preference studies showing that a majority of patients who previously received adjuvant chemotherapy for breast cancer accept the morbidity of these therapies in turn for a relatively modest survival gain (Lindley et al, 1998;Ravdin et al, 1998;Jansen et al, 2001a;Simes and Coates, 2001).
The SHE scores during the TWiST health state were less than 'perfect' and relatively close to the Tox scores. The fact of having cancer and late effects of surgical and systemic treatments may have had an impact on the health estimates (Broeckel et al, 2000;Ganz et al, 2002). All patients received Tam. Endocrine side effects may be under-reported (Fellowes et al, 2001), especially vasomotor and gynaecological symptoms (Day et al, 1999;Fallowfield et al, 2001).
The SHE scores during the Rel state were distinctly worse than those of Tox and TWiST, although better than the arbitrary values used in previous analyses. For a majority, the relatively high value may reflect their readjustment and the beneficial impact of the treatment for advanced disease (Bernhard et al, 1999). However, there was substantial variability due to both the small number of patients who had a recurrence and the limited information available on these cases.
Overall Q-TWiST accumulated within 72 months after randomisation indicated an advantage of 1 month longer of 'perfect health' with CMF rather than Tam alone. However, in the ER-negative cohort, three cycles CMF provided 3 months more of Q-TWiST than Tam alone, while for the larger ER-positive cohort, CMF provided no benefit.
The findings of the QAS analysis thus complement and expand those of the QL investigation. They provide additional information for clinical policy and decision-making in postmenopausal patients with node-negative breast cancer: Taking patients' view into account, patients benefited substantially from adjuvant chemotherapy if their ER status was negative.
These analyses are truncated at 72 months follow-up. Further followup will enhance any advantage of chemotherapy over Tam alone.
In contrast to other studies using global indicators of health status for QAS analysis (Earle et al, 2000), we specified the frame of individual reference. Patients were asked to imagine they would have to live the rest of their life in their current condition and then to rate their concurrent condition. Although not a true preference measure, this indicator is suitable for large-scale phase-III trials taking into account patients' own evaluation of distinct health states within a given context (e.g. culture). Our premise was to evaluate estimates based on actual experience instead of hypothetical scenarios. Experience of breast cancer (Ashby et al, 1994) and adjuvant chemotherapy (Lindley et al, 1998;Jansen et al, 2001b) have been shown to be associated with current preferences. Most often, this type of evaluation is based on cross-sectional comparisons (Earle et al, 2000). A repeated formal utility interview is not feasible in an international phase-III trial and would not imply a priori a better performance than a rating scale such as our SHE indicator (Giesler et al, 1999).
The method of insertion of utility values into decision models by multiplication of utilities and time assumes the independence of time and values. This assumption has been challenged (Bleichrodt and Johannesson, 1997;Bala et al, 1999;Stiggelbout et al, 1995;Bernhard et al, 2001). Preference studies in cancer patients with repeated assessment reported both reasonably stable (O'Connor et al, 1987;Llewellyn-Thomas et al, 1993;Jansen et al, 2001b) and unstable (Llewellyn-Thomas et al, 1992;Jansen et al, 2000) utilities, suggesting caution in interpreting single point estimates. Variation in self-report health status across distinct clinical situations is plausible and argues for a longitudinal evaluation.
Further developments include the relationship between additional QL domains, such as fatigue (Sadler and Jacobsen, 2001) and cognitive function (Phillips and Bernhard, 2003), and the SHE-scores across different health states (Lowy and Bernhard, 2004). Finally, this concept can be adapted to the palliative setting where there are no clearly separated health states (Glasziou et al, 1998;Cole et al, 2004).
In summary, patients receiving initial CMF indicated the expected adverse impact on QL and a delay in adaptation compared to those assigned Tam alone. The patient estimated utilities for TWiST indicated less than 'perfect health' under Tam. Longitudinal evaluations of SHE scores provide important information for QAS (Q-TWiST) analysis and clinical decision-making. 141711), the Frontier Science and Technology Research Foundation, the Swiss Group for Clinical Cancer Research (SAKK), the Swiss Cancer League, the American Cancer Society (RPG-90-013-08-PBP) and the United States National Cancer Institute (CA-75362). We also acknowledge support for the Cape Town participants from the Cancer Association of South Africa and for St Gallen participants from the Foundation for Clinical Cancer Research of Eastern Switzerland.