Main

Overall survival (OS) is the gold standard for the assessment of efficacy in phase III trials of cancer therapies (Sargent and Hayes, 2008). However, use of OS as the primary end point requires that large numbers of patients be followed for an extended period of time to detect statistically significant differences between the treatment groups, thereby increasing study costs and delaying access to potentially beneficial treatments. Also, for ethical or practical reasons, patients randomised to control therapy are often allowed to crossover to study therapy, or receive an off-study investigational or other active treatment upon disease progression, thereby diluting the observed effect of study treatment on OS. These factors make measures of time to disease progression (e.g., progression-free survival (PFS) or time to progression (TTP)) attractive alternatives to OS. Measures of PFS/TTP generally require fewer patients and/or shorter follow-up to detect statistically significant differences between the treatment groups, and are not confounded by use of subsequent therapies upon disease progression. Moreover, PFS/TTP may be important measures per se, as disease progression may be associated with reduced patient health-related quality of life and increased healthcare costs.

The use of PFS/TTP as a valid surrogate end point for OS requires that treatment effects on OS can be reliably predicted from observed treatment effects on PFS/TTP (Fleming and DeMets, 1996; Tang et al, 2007; Burzykowski et al, 2008). Although the association between treatment effects on PFS/TTP and treatment effects on OS has been examined in a variety of solid tumours (Louvet et al, 2001; Johnson et al, 2006; Buyse et al, 2007; Tang et al, 2007; Sherrill et al, 2008), it has not been rigorously examined in patients with metastatic renal cell carcinoma (mRCC) (Knox, 2008). The objective of this study was to evaluate the association between treatment effects on PFS/TTP and treatment effects on OS in randomized controlled trials of patients with mRCC.

Materials and methods

Search strategy

Medline was searched to identify clinical trials of interleukin-2, interferon (IFN)-α, axitinib, lapatinib, pazopanib, sunitinib, sorafenib, bevacizumab, everolimus, or temsirolimus in mRCC. The search was limited to studies published in English from January 1997 to January 2010, which reported data on survival and/or mortality in the abstract. Abstracts of identified studies were reviewed by two independent reviewers (AK and TED) to identify studies for which full-text articles would be retrieved and reviewed. This search was supplemented with hand searches of the American Society of Clinical Oncology (ASCO) and European Cancer Organisation (ECCO) Web sites for abstracts, posters, and/or presentations reported between January 2005 and December 2010, as well as reference lists of retrieved articles and prior meta-analyses and systematic reviews (Coppin et al, 2005, 2008; Coppin, 2008; Thompson Coon et al, 2010). Studies were included if they reported median PFS/TTP and median OS for two or more treatment groups or hazard ratios (HRs) for PFS/TTP and HRs for OS for one or more treatment comparisons.

Data extraction

For each study selected for inclusion, information was extracted on first author, year of publication, prior treatment (treatment-naive, prior cytokine treatment, prior targeted treatment, mixed prior treatment), treatments evaluated, measures of PFS/TTP used (PFS or TTP), overall response rate (ORR) and whether trial patients were allowed to crossover to other study therapy or other active treatment after progression. For each treatment group, sample sizes for ORR, PFS/TTP and OS, median PFS/TTP, median OS, and corresponding 95% confidence intervals (CIs) for median PFS/TTP and median OS were recorded. Also recorded were HRs (and corresponding 95% CIs) for PFS/TTP and OS. Studies representing duplicate reports of the same trial were excluded, with the report least likely to have been impacted by crossover selected for the analysis (e.g., based on rank-preserving structural failure time (RPSFT) models or inverse probability of censoring weighted (IPCW) analyses, with patients censored at crossover, or at study unblinding before crossover).

Measures of treatment effect

Two measures of treatment effects on PFS/TTP and OS were analysed: (1) the absolute differences between the treatment groups in median PFS/TTP (in months) vs the absolute differences between groups in median OS (in months) and (2) the negative of the natural log of the HR for PFS/TTP (−ln HRPFS/TTP) vs the negative of the natural log of the HR for OS (−ln HROS). For small treatment effects (relative risk reduction (RRR) within ±30%), the −ln HR is approximately equal to the RRR. The HR is frequently used as the primary measure of treatment effect in controlled clinical trials. However, the median survival for each treatment group is also frequently reported. The advantage of the HR is that it reflects a comparison of hazards for the entirety of the survival distribution, whereas the difference in medians reflects a comparison at a single point on the distribution. On the other hand, if treatment has no effect on post-progression survival (PPS), the gain in median PFS will be an unbiased estimate of the gain in median OS regardless of the duration of PPS, whereas the HR for OS will tend to be greater than that for PFS, and the degree of difference will depend on the duration of PPS (Broglio and Berry, 2009). Both of these measures have been used in prior studies of the association between PFS/TTP and OS in other cancers (Louvet et al, 2001; Johnson et al, 2006; Buyse et al, 2007; Tang et al, 2007; Sherrill et al, 2008).
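The closeness of −ln HR to the RRR for small effects can be checked numerically. The following is a minimal sketch, assuming the RRR is defined as 1 − HR (a common definition that is not stated explicitly here); the function names are illustrative only.

```python
import math

def neg_log_hr(hr: float) -> float:
    """Return -ln(HR), the log-scale measure of treatment effect."""
    return -math.log(hr)

def rrr(hr: float) -> float:
    """Relative risk reduction, assumed here to be 1 - HR."""
    return 1.0 - hr

# Compare the two measures across a range of hazard ratios.
for hr in (0.70, 0.80, 0.90, 1.00, 1.10, 1.30):
    print(f"HR={hr:.2f}  RRR={rrr(hr):+.3f}  -ln(HR)={neg_log_hr(hr):+.3f}")
```

Under this definition the two quantities agree closely near HR = 1 and drift apart as the effect grows, which is why the approximation is restricted to small treatment effects.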

For studies that reported both PFS and TTP, we recorded PFS. For those that reported TTP only, we combined TTP results with those for PFS from other studies. Although TTP and PFS are different measures, in the setting of mRCC, wherein survival is short and death due to reasons other than mRCC is rare, the HRs and differences in median survival are likely similar for TTP and PFS. TTP and PFS have been combined in prior studies of the association between disease progression end points and OS in other tumours (Tang et al, 2007; Sherrill et al, 2008). In an evaluation of studies of metastatic breast cancer patients in which TTP and PFS were not combined but were analysed separately, the associations between TTP and PFS on the one hand and OS on the other were similar (Burzykowski et al, 2008).

For studies that did not report HRs for PFS/TTP or OS, HRs were estimated using data from Kaplan–Meier curves or numbers of events and log-rank statistics (Tierney et al, 2007). For treatment arms for which median OS was not reached but for which Kaplan–Meier survival curves were reported, median survival was estimated by fitting Weibull survival functions to reported Kaplan–Meier curves (Carroll, 2003). For studies that included more than two treatment groups, treatment effects on PFS/TTP and OS were calculated for k−1 of k potential comparisons (e.g., for a study with treatments A, B, and C, we calculated two comparisons: A vs B and A vs C). In cases with an obvious control arm, this arm was selected as the reference group for all comparisons.
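One way to extrapolate a median from a Kaplan–Meier curve that has not yet reached 50% is sketched below. It assumes a two-parameter Weibull survival function, S(t) = exp(−(t/λ)^k), fitted by least squares on the linearised scale; this fitting approach is an assumption for illustration and may differ from the procedure described by Carroll (2003). The data points are hypothetical.

```python
import math

def weibull_median(times, survival):
    """Estimate median survival from points read off a Kaplan-Meier curve.

    Assumes S(t) = exp(-(t / lam) ** k) and fits k and lam by ordinary
    least squares on the linearised form ln(-ln S) = k*ln(t) - k*ln(lam).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in survival]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx                      # Weibull shape
    intercept = y_bar - k * x_bar      # equals -k * ln(lam)
    lam = math.exp(-intercept / k)     # Weibull scale
    # The median solves S(t) = 0.5, i.e. t = lam * (ln 2) ** (1 / k).
    return lam * math.log(2.0) ** (1.0 / k)

# Hypothetical points (months, survival proportion) for an arm whose
# curve has not dropped to 0.5 by the end of follow-up.
times = [3, 6, 12, 18, 24]
survival = [0.97, 0.92, 0.80, 0.70, 0.61]
print(round(weibull_median(times, survival), 1))
```

Because the estimate extrapolates beyond observed follow-up, it is sensitive to the assumed distribution and should be treated as approximate.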

Statistical analyses

The possibility of publication bias was assessed by examining asymmetry of a funnel plot of estimates of −ln HROS vs its s.e. and using Egger’s test (Egger et al, 1997). Pearson correlation coefficients between treatment effects on PFS/TTP and treatment effects on OS were calculated. In calculating Pearson correlation coefficients, each treatment comparison was weighted by the sum of the number of patients in the two treatment groups compared (non-parametric Spearman correlation coefficients were also calculated; these were virtually the same as the Pearson correlations and are not reported). The associations between treatment effects on PFS/TTP and treatment effects on OS also were examined using ordinary least squares regression with each treatment comparison weighted by the sum of the number of patients in the two treatment groups. Ninety-five percent prediction limits were calculated from weighted regressions using the mean number of patients per comparison as a weight.
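The weighted correlation and weighted regression described above can be sketched as follows. This is an illustrative implementation, not the software used in the study: the function names and the four data points are hypothetical, and the weights stand for the total number of patients in each comparison.

```python
import math

def _weighted_moments(x, y, w):
    """Return weighted means, variances, and covariance of x and y."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    return mx, my, vx, vy, cov

def weighted_pearson(x, y, w):
    """Pearson correlation with each observation weighted by w."""
    _, _, vx, vy, cov = _weighted_moments(x, y, w)
    return cov / math.sqrt(vx * vy)

def weighted_least_squares(x, y, w):
    """Return (intercept, slope) of the weighted regression of y on x."""
    mx, my, vx, _, cov = _weighted_moments(x, y, w)
    slope = cov / vx
    return my - slope * mx, slope

# Hypothetical comparisons: differences in median PFS/TTP and OS (months)
# and total patients per comparison (the weight).
diff_pfs = [0.5, 1.4, 3.0, 5.5]
diff_os = [0.2, 2.0, 4.1, 6.0]
n_patients = [120, 200, 650, 750]

print(weighted_pearson(diff_pfs, diff_os, n_patients))
print(weighted_least_squares(diff_pfs, diff_os, n_patients))
```

Weighting by sample size gives larger trials more influence on both the correlation and the fitted line, at the cost of treating trial size as a proxy for precision.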

Analyses were conducted separately by prior treatment for mRCC (none vs any), PFS/TTP end point reported (PFS vs TTP), whether crossover to active therapy after disease progression was allowed, and year of publication. Analyses also were conducted using all potential comparisons from trials with more than two treatment arms (e.g., for a study with treatments A, B, and C, we calculated three comparisons: A vs B, A vs C, and B vs C), setting the intercept terms in regression models to zero, and using all comparisons and setting intercept terms to zero. An analysis also was conducted to assess the association between ORR and OS for studies that reported ORR. In this analysis, comparisons involving arms with zero or missing response data were excluded. The treatment effect on ORR was measured in terms of the natural log of the relative risk of the response (ln RRORR) and the treatment effect on OS was measured in terms of the −ln HROS. An analysis also was conducted of the association between the −ln HRPFS/TTP and −ln HROS in which each comparison was weighted by the inverse of the variance of the −ln HROS rather than the number of subjects.
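An inverse-variance weight requires the s.e. of −ln HROS, which trial reports rarely give directly. A common approach, assumed here for illustration and not confirmed as the method used in this study, recovers it from the reported 95% CI of the HR. The function names are hypothetical; the example values are an HR of 0.647 with a 95% CI of 0.483 to 0.870.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def se_log_hr(ci_lower: float, ci_upper: float) -> float:
    """Approximate s.e. of ln(HR) from the 95% CI limits of the HR.

    Assumes the interval is symmetric on the log scale.
    """
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * Z_95)

def inverse_variance_weight(ci_lower: float, ci_upper: float) -> float:
    """Weight equal to 1 / variance of ln(HR)."""
    return 1.0 / se_log_hr(ci_lower, ci_upper) ** 2

se = se_log_hr(0.483, 0.870)
print(f"s.e. of ln(HR) = {se:.3f}")
print(f"inverse-variance weight = {inverse_variance_weight(0.483, 0.870):.1f}")
```

Negating ln HR does not change its variance, so the same weight applies to −ln HROS.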

Results

Search results

The search identified 235 potential studies. From these, as well as hand searches of reference lists of retrieved studies, ASCO and ECCO web sites, and prior systematic reviews, a total of 31 studies were identified, representing 10 943 patients, 75 treatment groups, and 41 potential treatment comparisons that reported sufficient information for either the analysis of correlation between differences in median PFS/TTP and differences in median OS or between −ln HRPFS/TTP and −ln HROS (Table 1) (Kruit et al, 1997; Negrier et al, 1998, 2000, 2007, 2008; Medical Research Council Renal Centre Collaborators, 1999; Pyrhonen et al, 1999; Motzer et al, 2000, 2007, 2008, 2009, 2010; Atzpodien et al, 2001, 2002, 2004, 2006; Dutcher et al, 2003; Yang et al, 2003; Atkins et al, 2004; Aass et al, 2005; Donskov et al, 2005; McDermott et al, 2005; Tannir et al, 2006; Bukowski et al, 2007; Escudier et al, 2007a, 2007b; Hudes et al, 2007; Amato et al, 2008; Figlin et al, 2008; Sternberg et al, 2009, 2010a; Gore et al, 2010; Korhonen and Malangone, 2010; Rini et al, 2010; Korhonen et al, 2011; Wiederkehr et al, 2011). The great majority of the excluded studies were excluded for lack of information on both PFS or TTP and OS.

Table 1 Comparisons included in analysis

Study characteristics

Fifteen studies (48%) were published before 2006; 17 (55%) were in treatment-naive patients; seven (23%) allowed crossover to active treatment after disease progression. Ten studies (32%) included one or more targeted treatments. For the phase III trial of sunitinib vs IFN, several analyses of OS were conducted, which might be differentially affected by crossover from IFN to sunitinib. In our base case, we used the results from the analysis in which patients who received any post-study treatment were excluded (HR 0.647, 95% CI: 0.483–0.870, median OS 28.1 months vs 14.1 months for sunitinib (n=193) vs IFN (n=162)) (Figlin et al, 2008; Motzer et al, 2009). The HR from this analysis was virtually identical to that reported in the interim analysis of the ITT population before patients were allowed to cross over (HR 0.65, 95% CI: 0.449–0.942, median OS not reached for sunitinib (n=375) or IFN (n=375)) (Motzer et al, 2007). We used the values from the former because median OS was not reached for the latter. For the phase III trial of everolimus, placebo patients were allowed to cross over to everolimus after documented progression (McDermott et al, 2005; Korhonen and Malangone, 2010; Motzer et al, 2010; Korhonen et al, 2011; Wiederkehr et al, 2011). For this study we used median OS based on analysis using the RPSFT model to control for crossover (Korhonen and Malangone, 2010; Korhonen et al, 2011); the HR for OS was based on analysis using IPCW analysis (Wiederkehr et al, 2011). For the phase III trial of pazopanib, the HR for OS was based on the analysis using RPSFT to control for crossover (Sternberg et al, 2010b).

Thirty studies representing 40 treatment comparisons reported median PFS/TTP and median OS for one or more comparisons. Median OS was estimated based on fitting of Weibull survival functions to Kaplan–Meier curves for one treatment arm (the bevacizumab plus placebo arm in the study by Bukowski et al (2007)). This arm was represented in one comparison. Across all studies, median PFS/TTP and OS averaged 4.9 and 16.6 months, respectively. The median difference between the treatment groups in PFS/TTP averaged 1.4 months (s.d. 2.1 months, range −1.4–7.1 months); the median difference between the treatment groups in OS averaged 2.0 months (s.d. 5.7 months, range −18.0–14.0 months). Twenty-eight studies representing 36 treatment comparisons reported sufficient information for the analysis of −ln HRPFS/TTP vs −ln HROS. The −ln HRPFS/TTP averaged 0.31 (s.d. 0.36, range −0.42–1.17); the −ln HROS averaged 0.15 (s.d. 0.27, range −0.45–0.84). HRs were estimated from Kaplan–Meier curves or log-rank statistics and event counts in 40 treatment arms represented in 23 comparisons.

The funnel plot of estimates of −ln HROS vs corresponding s.e.’s provided no strong evidence of publication bias (Figure 1). The estimated intercept on a regression of the inverse s.d. vs the standardized effect size (Egger’s test) was 0.17 (P=0.7658); this also suggests no evidence of publication bias.
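For readers unfamiliar with Egger’s test, the regression behind it can be sketched as below. The sketch follows the usual formulation in Egger et al (1997), in which the standardised effect (effect divided by its s.e.) is regressed on precision (1/s.e.) and the intercept measures asymmetry. The function name and the input values are hypothetical, and the P-value, which requires a t distribution, is omitted.

```python
import math

def egger_regression(effects, std_errors):
    """Regress standardised effect (effect / s.e.) on precision (1 / s.e.).

    Returns (intercept, s.e. of intercept). An intercept far from zero
    relative to its s.e. suggests funnel-plot asymmetry.
    """
    z = [e / s for e, s in zip(effects, std_errors)]
    p = [1.0 / s for s in std_errors]
    n = len(z)
    p_bar = sum(p) / n
    z_bar = sum(z) / n
    spp = sum((pi - p_bar) ** 2 for pi in p)
    spz = sum((pi - p_bar) * (zi - z_bar) for pi, zi in zip(p, z))
    slope = spz / spp
    intercept = z_bar - slope * p_bar
    # Residual variance with n - 2 degrees of freedom.
    resid_ss = sum((zi - intercept - slope * pi) ** 2 for pi, zi in zip(p, z))
    resid_var = resid_ss / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + p_bar ** 2 / spp))
    return intercept, se_intercept

# Hypothetical -ln(HR) estimates and their standard errors.
effects = [0.10, 0.25, 0.05, 0.30, 0.18]
std_errors = [0.08, 0.15, 0.12, 0.30, 0.20]
intercept, se_intercept = egger_regression(effects, std_errors)
print(intercept, se_intercept)
```

When every study estimates the same underlying effect without selective reporting, the standardised effect is proportional to precision and the intercept is expected to be near zero.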

Figure 1

Funnel plot of negative log of HR for OS vs corresponding s.e. for each comparison. The funnel plot shows an assessment of publication bias. If there is no publication bias, the coordinates should be scattered symmetrically around the pooled estimate. The vertical line represents the fixed effects pooled estimate of −ln HROS. The diagonal lines describing the funnel represent the 95% CI for each value of the s.e. The outlier is the coordinate for the pivotal study of pazopanib (−ln HROS=0.84, s.e.(−ln HROS)=0.62) (Sternberg et al, 2010b). The relatively high degree of imprecision associated with this estimate was due to the RPSFT method used to analyse OS to control for crossover.

Association between treatment effects on PFS/TTP and treatment effects on OS

The weighted Pearson correlation coefficient for the difference in median PFS/TTP and the difference in median OS was 0.54 (P=0.0002). In linear regression analysis, a 1-month difference in median PFS/TTP was associated with a 1.17-month difference in median OS (95% CI: 0.59, 1.76; adjusted R2=0.28) (Figure 2).

Figure 2

Association between differences in median PFS/TTP and differences in median OS. Abbreviation: R2=adjusted R-squared. Area of bubbles is proportional to the number of patients. Solid line is predicted value. Dashed lines are 95% prediction intervals.

The weighted Pearson correlation coefficient for −ln HRPFS/TTP and −ln HROS was 0.80 (P<0.0001). The coefficient on −ln HRPFS/TTP vs −ln HROS was 0.64 (95% CI: 0.47, 0.81; adjusted R2=0.63) (Figure 3), suggesting that a 10% increase in the RRR for PFS/TTP is associated with a 6% increase in the RRR for OS.

Figure 3

Association between negative log of HR for PFS/TTP and negative log of HR for OS. Abbreviation: R2=adjusted R-squared. Area of bubbles is proportional to the number of patients. Solid line is predicted value. Dashed lines are 95% prediction intervals.

Subgroup and sensitivity analyses

Results in subgroups of studies are presented in Table 2. The correlation between treatment effects on PFS/TTP and treatment effects on OS was greater in studies that did not allow/require crossover, studies that used PFS rather than TTP, and in studies published before 2005 (studies before 2005 were less likely to have allowed crossover). There was no significant association between the treatment effects on PFS/TTP and OS in the subset of trials of vascular endothelial growth factor (VEGF) inhibitors, although there was a trend in the linear regression for −ln HRPFS/TTP vs −ln HROS (P=0.0510). Results were similar to those of the primary analysis when all potential comparisons from trials with multiple treatment arms were included. The adjusted R2 for the analysis of differences in median PFS/TTP vs differences in median OS was greater with the exclusion of the study by Bukowski et al (2007), a randomized phase II trial comparing bevacizumab plus erlotinib vs bevacizumab plus placebo that was an extreme outlier, with a positive treatment effect on PFS/TTP and a negative treatment effect on OS (difference in median PFS/TTP 1.4 vs difference in median OS −18.0 (the latter was estimated based on fitting a Weibull survival function to Kaplan–Meier curves) and HR for PFS/TTP 0.86 vs HR for OS 1.57). The observed negative effect of erlotinib on OS in this study may have been due to the relatively high utilisation of non-study treatment post progression in the placebo group (Bukowski et al, 2007). The associations between treatment effects on PFS/TTP and treatment effects on OS were less strong when we used the results for OS from trials of sunitinib, everolimus, and pazopanib that were not adjusted for crossover from placebo to active therapy. The weighted Pearson correlation coefficient for the natural log of the relative risk of the ORR (i.e., ln RRORR) vs −ln HROS was 0.78 (P<0.0001). In linear regression, the coefficient on ln RRORR vs −ln HROS was 0.30 (95% CI: 0.20, 0.39; adjusted R2=0.59) (Figure 4). In the analysis of −ln HRPFS/TTP vs −ln HROS in which comparisons were weighted by the inverse variance of −ln HROS (35 comparisons), the weighted Pearson correlation coefficient for −ln HRPFS/TTP vs −ln HROS was 0.76 (P<0.0001). The coefficient on −ln HRPFS/TTP was 0.53 (95% CI: 0.37, 0.68; adjusted R2=0.56). These results are qualitatively similar to those in which the results are weighted by the numbers of subjects.

Table 2 Sensitivity and subgroup analyses
Figure 4

Association between the log of relative risk of overall response and the negative log of the hazard ratio of OS. R2=adjusted R-squared. Area of bubbles is proportional to the number of patients. Solid line is predicted value. Dashed lines are 95% prediction intervals.

Discussion

Advances in understanding the biology and genetics of renal cell carcinoma have led to novel approaches for treatment of mRCC that target the VEGF receptor. With the growing therapeutic arsenal against mRCC, it is now feasible for patients to receive multiple lines of potentially beneficial treatment. Indeed, a recent trial reported on a study population that had received three to five prior lines of therapy (Motzer et al, 2010). With the increasing number of effective treatments available (Soulieres, 2009), the effects of first-line therapies on OS are more likely to be confounded by the effects of subsequent therapies. The question of whether PFS/TTP rather than OS should be employed as a primary outcome measure in pivotal studies of new treatments for mRCC is therefore important. This situation is similar to that with metastatic colorectal cancer, in which there was rapid development of novel treatments, necessitating the consideration of using PFS as a surrogate for OS in pivotal studies (Buyse et al, 2007). Although several novel treatments for mRCC have been approved for use in the United States with TTP or PFS as the primary end point in pivotal studies, and results of population-based historical cohort studies of sunitinib and sorafenib have demonstrated that the introduction of these treatments has resulted in increased survival (Heng et al, 2009a; Warren et al, 2009), a rigorous examination of the association between PFS/TTP end points and OS has yet to be undertaken.

The analysis presented here suggests that treatment effects on measures of PFS/TTP are strongly associated with treatment effects on OS in patients with mRCC. However, the proportion of variability in treatment effects on OS that was explained by treatment effects on PFS/TTP was modest. In particular, the adjusted R2 was 0.63 for the association between −ln HRPFS/TTP and −ln HROS. This value is within the range reported in other prior analyses of the relationship between treatment effects on PFS/TTP and OS (Sherrill et al, 2008). A high R2 is not a necessary criterion for surrogacy, however, as some of the unexplained variation may reflect the sampling error in each trial due to small sample size. Even for a perfect surrogate end point, therefore, R2 will be less than one in a set of trials with small samples (Tang et al, 2007). The trials examined in this evaluation were relatively small (median of 96 patients per arm). Moreover, there is no standard value above which an R2 (or correlation coefficient) can be claimed to be sufficient. The adjusted R2 for the association between differences in median PFS/TTP and differences in median OS was only 0.28. While the difference in median survival times may be a more appropriate measure of treatment effect than HRs if the proportional hazards assumption is violated, median survival times represent only a single point on the survival distribution and are potentially imprecise. It is not surprising therefore that the amount of unexplained variation is greater when treatment effects are measured in terms of differences in median survival. Despite the relatively low R2 from this regression, it is useful to note that the results from the regression analysis presented here suggest that, on average, there is a slightly better than 1-month gain in median OS associated with a 1-month gain in median PFS/TTP. This is consistent with the hypothesis that treatment effects on post-progression survival are uncorrelated with treatment effects on PFS/TTP (Bowater et al, 2008).
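The point that R2 falls below one even for a perfect surrogate can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the true effect on OS is set to exactly 0.64 times the true effect on PFS/TTP, the variance of each observed ln HR is approximated as 4 divided by the number of events, and the number of trials, events per trial, and range of true effects are arbitrary.

```python
import math
import random

def r_squared(x, y):
    """Unadjusted R-squared from a simple linear regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def simulate_r2(n_trials=40, events_per_trial=150, slope=0.64, seed=1):
    """Simulate trials in which PFS is a perfect surrogate for OS.

    Observed effects equal the true effects plus sampling error with
    s.e. = sqrt(4 / events), an approximation for a 1:1 randomised trial.
    """
    rng = random.Random(seed)
    se = math.sqrt(4.0 / events_per_trial)
    obs_pfs, obs_os = [], []
    for _ in range(n_trials):
        true_pfs = rng.uniform(-0.2, 0.8)   # true -ln(HR) for PFS
        true_os = slope * true_pfs          # perfectly determined by PFS
        obs_pfs.append(rng.gauss(true_pfs, se))
        obs_os.append(rng.gauss(true_os, se))
    return r_squared(obs_pfs, obs_os)

print("small trials:", simulate_r2(events_per_trial=150))
print("very large trials:", simulate_r2(events_per_trial=10**9))
```

With very large trials the sampling error vanishes and R2 approaches one; with trial sizes closer to those analysed here, sampling error alone leaves part of the variation unexplained.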

Not surprisingly, the association between treatment effects was stronger in studies that did not allow crossover to active treatment. Additionally, the association between treatment effects on PFS/TTP and OS was weaker in trials conducted after 2005, when targeted therapies for treatment of mRCC were more likely to be available as potential off-study second-line treatments. Estimates of the association between treatment effects on PFS/TTP and OS based on the entire sample of trials may therefore be conservative. An increase in response rate was also correlated with OS, although the association was not as strong as that with treatment effects on PFS/TTP measured in terms of −ln(HR).

Limitations of this study should be noted. First, this study was based on published results of controlled trials, which may be subject to publication bias. To the extent that only studies showing positive effects on both PFS and OS were published, our estimates may overstate the true association between PFS and OS. However, a funnel plot analysis of the −ln HROS provided no strong evidence of publication bias (the plot was symmetric around the mean effect size and Egger’s test was not significant).

Ideally, the assessment of association of PFS/TTP and OS should be demonstrated over different stages of the disease (as the causal pathways of the disease process might differ depending on the stage) and across classes of drug (as drugs with different modes of action may have different pathways of intervention) (Fleming and DeMets, 1996). It is possible that the association reported here could only apply to specific recognised prognostic groups, but analyses by prognostic groups were unfeasible based on data reported in study publications (Molina and Motzer, 2008; Heng et al, 2009b). The majority of studies included in this analysis involved comparisons of two or more cytokine therapies. The association between treatment effects on PFS/TTP and those on OS was significant in trials evaluating targeted and non-targeted therapies. The association between treatment effects on PFS/TTP and OS was not significant for comparisons involving VEGF inhibitors, although there was a trend towards an association (P=0.0510). The number of such comparisons was small, however, and these comparisons may have been more likely to have been confounded by crossover and receipt of other non-study therapies post progression. It is reasonable to assume that results presented here can be generalised to evaluations of agents, such as axitinib, that have similar mechanisms of action to the therapies included in this analysis (Rugo et al, 2005; Rini et al, 2007; Rixe et al, 2007).

For studies that allowed for crossover from control to active therapy, we used the reported measure of treatment effect that was considered to be least likely to be subject to confounding by such crossover. While it would be desirable to use a common measure of treatment effect for all studies, it is well established that crossover from control to active treatment may attenuate observed treatment effects on OS relative to what would have been observed in the absence of such crossover (Finkelstein and Schoenfeld, 2011; Saad and Buyse, 2012). To include results of studies with extensive crossover without controlling for crossover would add no useful information to the analyses. The RPSFT and IPCW methods used in the analyses of everolimus (Korhonen and Malangone, 2010; Korhonen et al, 2011; Wiederkehr et al, 2011) and pazopanib (Sternberg et al, 2010b) are useful methods for analysing OS in the context of selective crossover (Finkelstein and Schoenfeld, 2011; Morden et al, 2011; Rimawi and Hilsenbeck, 2012).

In unblinded trials, there may be a motivation for clinicians to call a patient’s disease progression earlier if the patient is in the control arm than if the same patient had been in the experimental arm (Dodd et al, 2008). To the extent that this inflates the treatment effects on PFS, the association between treatment effects on PFS/TTP and treatment effect on OS might be attenuated (because OS is not impacted by this bias). The use of blinded independent central review (BICR) may reduce any such bias. However, retrospective BICR may necessitate informative censoring on local assessment of progression, which may bias the comparison in favour of control patients (Dodd et al, 2008). This also would attenuate the observed association between treatment effect on PFS/TTP and treatment effect on OS. Treatment assignment was blinded in only six of the studies included in the analyses. Independent review of progression was employed in six studies. As studies that used blinded treatment assignment and/or review of progression tended to be those evaluating novel targeted agents, assessment of the independent effects of blinding of treatment assignment and/or BICR on the association between treatment effects on PFS and treatment effects on OS was infeasible.

Information from the trial reports on the frequency of assessments, the criteria used to assess response and/or progression, or the duration of treatment was not extracted. It therefore was not feasible in this analysis to assess how these and other unmeasured factors might affect the association between treatment effects on PFS and treatment effects on OS. Differences in these factors might help explain some variability in observed associations between treatment effects on PFS/TTP and on OS.

As the searches upon which this study was based were conducted in 2010, results of randomized controlled trials of systemic therapies for mRCC may have been published since the original literature search for this study was conducted. One such trial is the Renal EFFECT trial, a randomized controlled trial of intermittent vs continuous sunitinib (Motzer et al, 2012). It may be worthwhile in future research to update these analyses using results of this and other recently published studies, and to explore in multivariate analysis the independent effects of study design and other factors on the associations between treatment effects on PFS/TTP and treatment effects on OS.

In conclusion, results presented in this study suggest that treatment effects on disease progression end points are strongly associated with treatment effects on OS. Further research is required to establish whether disease progression end points may be used as surrogate end points for OS in clinical trials of novel treatments for mRCC.