Combining tumour response and progression free survival as surrogate endpoints for overall survival in advanced colorectal cancer

Progression free survival (PFS) and tumour response (TR) have been investigated as surrogate endpoints for overall survival (OS) in advanced colorectal cancer (aCRC); however, their validity has been shown to be suboptimal. In recent years, meta-analytic methods allowing for the joint use of multiple surrogate endpoints have been proposed. The aim of this research was to assess whether PFS and TR used jointly as surrogate endpoints for OS improve their predictive value. Data were obtained from a systematic review of randomised controlled trials investigating the effectiveness of different pharmacological therapies in aCRC: systemic chemotherapies, anti-epidermal growth factor receptor therapies, anti-angiogenic agents, other multi-targeted antifolate treatments and intra-hepatic arterial chemotherapy. Multivariate meta-analysis was used to model the association patterns between treatment effects on the surrogate endpoints (PFS, TR) and the final outcome (OS). Analysis of 33 trials which reported treatment effects on all three outcomes showed a reasonably strong association between treatment effects on PFS and OS. A weak surrogate relationship was noted between the treatment effects on TR and OS. Modelling the two surrogate endpoints, TR and PFS, jointly as predictors of OS gave no marked improvement in either the surrogacy patterns or the precision of the predicted treatment effect in the cross-validation procedure. When investigating subgroups of therapy, only a small improvement in the precision of predicted treatment effects on the final outcome was noted in studies investigating anti-angiogenic therapy. Overall, the simultaneous modelling of two surrogate endpoints did not lead to improvement in the association between treatment effects on surrogate and final endpoints in aCRC.


Introduction
Surrogate endpoints have been receiving increased attention from the research community in the last three decades, as they offer a cost-effective and quicker alternative to the use of final outcomes, especially if they can be measured with a shorter follow-up period [1]. For surrogate endpoints to be used effectively in clinical research, they need to be validated. There are three levels of surrogate endpoint validation: biological plausibility of association between outcomes, patient-level association between outcomes and study-level association [2,3]. For the purposes of this study we focus on the last of these levels. Study-level association is the hallmark of surrogacy, i.e. establishing whether the treatment effect on the surrogate endpoint is likely to predict a treatment effect on the clinical outcome. This is usually carried out through meta-analyses of randomised controlled trials (RCTs) and, in particular, using a bivariate meta-analysis [4,5,6].
To identify a surrogate endpoint for overall survival (OS) in advanced or metastatic colorectal cancer, a number of candidate endpoints have been investigated, including progression free survival (PFS), tumour response (TR) and time to progression (TTP) [7,8,9,10,11,12]. In previous work investigating trial-level surrogacy in advanced colorectal cancer, Buyse et al found that PFS was an acceptable surrogate endpoint for OS [7]. A more recent study, investigating surrogacy patterns across a broader range of treatments in a meta-analysis of 101 RCTs, showed suboptimal validity of PFS as a surrogate endpoint for OS in advanced or metastatic colorectal cancer [10]. Other studies that investigated the surrogate relationship between PFS and OS in advanced colorectal cancer suggested that further validation is required [8,9].
In these previous studies, which used meta-analytic techniques in the form of either a meta-regression or a bivariate meta-analysis, only one surrogate endpoint at a time was investigated. Recently, researchers have investigated the use of multiple surrogate endpoints in a patient-level surrogate endpoint validation in multiple sclerosis [13] and as joint predictors of clinical benefit measured on the final outcome in a meta-analytic framework using Bayesian multivariate meta-analysis [14].
In this paper we investigate whether the use of treatment effects on two candidate surrogate endpoints, PFS and TR, can result in increased precision of the predicted treatment effect on the final outcome, namely OS, in advanced colorectal cancer when they are modelled jointly. We investigated the predictive ability of the surrogate endpoints in this setting by conducting a bivariate meta-analysis to investigate one surrogate endpoint at a time and a trivariate meta-analysis to evaluate the surrogate endpoints jointly. We conducted the analysis in the Bayesian framework using the multivariate meta-analysis method described by Bujkiewicz et al [14]. The added value of multiple surrogate endpoints modelled jointly was then investigated by comparing the meta-analytic models in terms of predicted intervals.

Data sources
We used data from a systematic review by Ciani et al 2015 [10], which included treatment effect estimates from 101 randomised controlled trials (RCTs) in advanced or metastatic colorectal cancer that assessed a pharmacological therapy against another therapy. The trials investigated a broad range of treatments in five classes: systemic chemotherapy, anti-epidermal growth factor receptor monoclonal antibodies (anti-EGFR), anti-angiogenic agents, other multi-targeted antifolate (MTA) and intra-hepatic arterial chemotherapy (IHA). For the purposes of the systematic review conducted by Ciani et al. [10] the following definitions of outcomes were considered: OS was defined as the time from randomisation to death; PFS was defined as the time from randomisation to tumour progression (regardless of how progression was defined) or death from any cause; and TR was defined by objective tumour measurements using methods that classify patients as responders, with a complete or partial confirmed best response. Responses were determined using the criteria and recommendations of the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines [15] or the World Health Organization recommendations [16].
For the purposes of this paper, we used data from the above systematic review on treatment effects measured on OS, PFS and TR. Not all of the studies in the systematic review reported treatment effects on all three outcomes of interest.
In our base case scenario analysis, we used data from trials reporting all three outcomes. As a sensitivity analysis, we carried out an analysis on data from studies which reported treatment effects on at least two outcomes. Analyses were repeated in subsets of data defined by the type of treatment. Sensitivity analyses examining the impact of an outlying observation and of the choice of the prior distribution for the between-studies correlation were also carried out (for details see section 2.2). Crossover in RCTs, for example from the control to the experimental arm following progression, often results in loss of information about the treatment effect on the final outcome, that is, about what the effect would have been had crossover not been allowed. As patients move to the experimental treatment arm, the difference in treatment effect on OS between the treatment arms diminishes, potentially leading to zero effect with large uncertainty. This creates difficulty in estimating the association patterns between treatment effects on the surrogate and final endpoints, because the estimates of the latter are not reliable and their variability is reduced (potentially diminishing the correlation between the treatment effects on the two outcomes). Another sensitivity analysis was therefore carried out on the subset of trials which did not allow for crossover.
Individual patient data (IPD) were available from one of the RCTs included in the systematic review, the study by Hurwitz et al. [17], which investigated the use of bevacizumab in combination with irinotecan, fluorouracil, and leucovorin in patients with metastatic colorectal cancer. The IPD were used to obtain the within-study correlations between the treatment effects on the surrogate endpoints and on the final outcome.

Statistical analysis
We used multivariate meta-analysis in a Bayesian framework to model jointly the treatment effects on one or two surrogate endpoints and on overall survival, which was the final clinical outcome. Treatment effects on PFS and OS were modelled using hazard ratios (HRs) and treatment effects on TR were modelled using odds ratios (ORs). The log scale was used to allow the assumption of normality of the effects. For studies in which there were no responders in one of the treatment arms, a continuity correction of 0.5 was added to all values of the contingency table to enable finite odds ratio and variance estimators to be derived.
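The continuity correction described above can be illustrated with a minimal Python sketch. The function name and arguments are hypothetical, the variance is the usual large-sample (Woolf) estimator, and applying the correction whenever any cell is empty (rather than only when an arm has no responders) is an assumption of this sketch, not something stated in the text.

```python
import math

def log_odds_ratio(resp_trt, n_trt, resp_ctl, n_ctl, correction=0.5):
    """Return (log OR, variance of log OR) for a 2x2 response table."""
    a, b = resp_trt, n_trt - resp_trt   # treatment arm: responders, non-responders
    c, d = resp_ctl, n_ctl - resp_ctl   # control arm: responders, non-responders
    # Assumption of this sketch: correct all four cells if any cell is zero.
    # The text describes the case of no responders in one arm.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d   # large-sample variance of log OR
    return log_or, variance

# Hypothetical trial arms, for illustration only
print(log_odds_ratio(10, 20, 5, 20))   # no empty cell, no correction applied
print(log_odds_ratio(0, 20, 5, 20))    # no responders on treatment, corrected
```

Without the correction the second call would require the logarithm of zero, so neither the log OR nor its variance would be finite.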
To model jointly the treatment effects on two surrogate endpoints and the final outcome, we used a trivariate random-effects model, in which the estimates of the treatment effects on the two surrogate endpoints (log OR of TR, denoted $Y_{1i}$, and log HR of PFS, denoted $Y_{2i}$) and of the treatment effect on the final outcome (log HR of OS, denoted $Y_{3i}$) are assumed to be correlated and normally distributed:

$$\begin{pmatrix} Y_{1i} \\ Y_{2i} \\ Y_{3i} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_{1i} \\ \mu_{2i} \\ \mu_{3i} \end{pmatrix}, \Sigma_i \right), \quad \Sigma_i = \begin{pmatrix} \sigma_{1i}^2 & \sigma_{1i}\sigma_{2i}\rho_{wi}^{12} & \sigma_{1i}\sigma_{3i}\rho_{wi}^{13} \\ \sigma_{2i}\sigma_{1i}\rho_{wi}^{12} & \sigma_{2i}^2 & \sigma_{2i}\sigma_{3i}\rho_{wi}^{23} \\ \sigma_{3i}\sigma_{1i}\rho_{wi}^{13} & \sigma_{3i}\sigma_{2i}\rho_{wi}^{23} & \sigma_{3i}^2 \end{pmatrix} \qquad (1)$$

The trivariate distributions describe the within-study variability, with the effects $Y_{ki}$ estimating the true treatment effects $\mu_{ki}$ in the population; $\sigma_{ki}^2$ are the corresponding variances of the estimates of the treatment effects on outcome $k = 1, \ldots, 3$ in each study $i$, and $\rho_{wi}^{kl}$ are the within-study correlations between these estimates. The elements of the within-study variance-covariance matrix are assumed known. To describe the between-study variability we modelled the correlated true treatment effects $\mu_{ki}$ in the product normal formulation of conditional univariate normal distributions:

$$\begin{aligned} \mu_{1i} &\sim N(\eta_1, \psi_1^2) \\ \mu_{2i} \mid \mu_{1i} &\sim N(\eta_{2i}, \psi_2^2), \quad \eta_{2i} = \lambda_{20} + \lambda_{21}\mu_{1i} \\ \mu_{3i} \mid \mu_{2i} &\sim N(\eta_{3i}, \psi_3^2), \quad \eta_{3i} = \lambda_{30} + \lambda_{32}\mu_{2i} \end{aligned} \qquad (2)$$

where the variances $\psi_k^2$, $k = 1, 2, 3$, relate to the between-study heterogeneity parameters $\tau_k^2$: $\psi_1^2 = \tau_1^2$, $\psi_2^2 = \tau_2^2 - \lambda_{21}^2\tau_1^2$ and $\psi_3^2 = \tau_3^2 - \lambda_{32}^2\tau_2^2$, with the regression coefficients related to both the heterogeneity parameters and the between-studies correlations $\rho_b^{kl}$: $\lambda_{21} = \rho_b^{12}\tau_2/\tau_1$ and $\lambda_{32} = \rho_b^{23}\tau_3/\tau_2$.
In the Bayesian framework the parameters are given prior distributions. The between-studies correlations are given informative prior distributions as recommended by Burke et al [18]. Assuming that an increase in the treatment effect on PFS (reduced progression rate) will lead to an increased effect on OS (reduced mortality rate), and hence a positive correlation, we place a uniform prior distribution allowing only values between zero and one on this correlation: $\rho_b^{23} \sim U(0, 1)$.
In a similar manner, assuming that an increased response rate would lead to reduced progression or mortality rates, and hence a negative correlation, we place a prior distribution that allows only values between minus one and zero on the correlations between treatment effects on TR and PFS and between the effects on TR and OS: $\rho_b^{12}, \rho_b^{13} \sim U(-1, 0)$. A sensitivity analysis was carried out using a non-informative prior distribution $U(-1, 1)$ for the between-studies correlations. The heterogeneity parameters are given half normal distributions, $\tau_{1,2,3} \sim N(0, 1000)I(0, )$, to allow only for positive values [6]. The remaining parameters are given non-informative normal prior distributions: $\eta_1 \sim N(0, 1000)$ and $\lambda_{20}, \lambda_{30} \sim N(0, 1000)$.
In the above model describing the between-studies variability, we used a structured between-studies variance-covariance matrix by assuming conditional independence between the true treatment effects on the first surrogate endpoint µ 1i and on the final outcome µ 3i . As a sensitivity analysis we also used an alternative model with an unstructured variance-covariance matrix and another model with a different structure on the variance-covariance matrix, in which the true treatment effects on the two surrogate endpoints, µ 1i and µ 2i , were assumed conditionally independent.
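To make the structured model more concrete, the following Python sketch maps between-studies heterogeneity standard deviations and correlations to the slopes and conditional variances of the product normal formulation, and then builds the between-studies covariance matrix that the chain of conditional distributions implies. The slope relation used (correlation times the ratio of standard deviations) is the standard conditional-normal identity and is consistent with the conditional variances given in the text; the function names and all numerical values are hypothetical.

```python
def product_normal_params(tau, rho12, rho23):
    """Slopes and conditional variances implied by the heterogeneity SDs
    tau = (tau1, tau2, tau3) and the between-studies correlations."""
    tau1, tau2, tau3 = tau
    lam21 = rho12 * tau2 / tau1            # slope of mu2 on mu1
    lam32 = rho23 * tau3 / tau2            # slope of mu3 on mu2
    psi2 = (tau1 ** 2,
            tau2 ** 2 - lam21 ** 2 * tau1 ** 2,
            tau3 ** 2 - lam32 ** 2 * tau2 ** 2)
    return lam21, lam32, psi2

def implied_covariance(tau, rho12, rho23):
    """Between-studies covariance implied by the chain mu1 -> mu2 -> mu3."""
    tau1, tau2, tau3 = tau
    lam21, lam32, _ = product_normal_params(tau, rho12, rho23)
    cov12 = lam21 * tau1 ** 2
    cov23 = lam32 * tau2 ** 2
    cov13 = lam32 * cov12   # mu3 depends on mu1 only through mu2
    return [[tau1 ** 2, cov12, cov13],
            [cov12, tau2 ** 2, cov23],
            [cov13, cov23, tau3 ** 2]]

# Hypothetical values: TR is the most heterogeneous outcome
tau = (0.6, 0.3, 0.2)
cov = implied_covariance(tau, rho12=-0.5, rho23=0.8)
rho13 = cov[0][2] / (tau[0] * tau[2])
print(product_normal_params(tau, -0.5, 0.8))
print(rho13)   # equals rho12 * rho23 under conditional independence
```

Under the conditional independence assumption the correlation between the true effects on TR and OS is not a free parameter; it is the product of the other two correlations, which is what distinguishes the structured model from the unstructured one used in the sensitivity analysis.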
As stated above, the within-study correlations are assumed to be known. However, IPD are necessary to obtain them, as the correlations between treatment effects on the log HR and log OR scales are not reported in the original RCT reports. For the purposes of this study, IPD from only one study were available [17]. Within-study correlations between the treatment effects on the three outcomes (TR, PFS and OS) were obtained using bootstrapping methods with 5000 bootstrap samples [19]. We assumed that the within-study correlations were equal across all studies.
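The bootstrap idea can be sketched in Python as follows. This is only an illustration under strong simplifying assumptions: the patient data are synthetic, the log hazard ratio is estimated under an exponential model without censoring (swapped in here for whatever survival model was fitted to the real IPD), and 0.5 is added to each response count to keep the log odds finite in every resample. All function names and parameter values are hypothetical.

```python
import math
import random

def simulate_arm(n, resp_prob, scale, rng):
    """Synthetic patients as (response, PFS time, OS time) tuples.
    Responders get longer expected PFS and death follows progression,
    so the three outcomes are correlated by construction."""
    patients = []
    for _ in range(n):
        resp = 1 if rng.random() < resp_prob else 0
        mean_pfs = scale * (3.0 if resp else 1.0)
        pfs = rng.expovariate(1.0 / mean_pfs)
        os_time = pfs + rng.expovariate(1.0 / (2.0 * scale))
        patients.append((resp, pfs, os_time))
    return patients

def effects(trt, ctl):
    """Return (log OR of TR, log HR of PFS, log HR of OS)."""
    def log_odds(arm):
        r = sum(p[0] for p in arm)
        return math.log((r + 0.5) / (len(arm) - r + 0.5))

    def log_hazard(arm, k):
        # Exponential model, no censoring: hazard = events / total time
        return math.log(len(arm) / sum(p[k] for p in arm))

    return (log_odds(trt) - log_odds(ctl),
            log_hazard(trt, 1) - log_hazard(ctl, 1),
            log_hazard(trt, 2) - log_hazard(ctl, 2))

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlations(trt, ctl, n_boot=1000, seed=1):
    """Resample patients within each arm and correlate the re-estimated effects."""
    rng = random.Random(seed)
    draws = [effects(rng.choices(trt, k=len(trt)), rng.choices(ctl, k=len(ctl)))
             for _ in range(n_boot)]
    y_tr, y_pfs, y_os = zip(*draws)
    return {"TR-PFS": pearson(y_tr, y_pfs),
            "TR-OS": pearson(y_tr, y_os),
            "PFS-OS": pearson(y_pfs, y_os)}

rng = random.Random(2024)
treated = simulate_arm(300, 0.45, 10.0, rng)
control = simulate_arm(300, 0.30, 8.0, rng)
print(bootstrap_correlations(treated, control))
```

Resampling is done within each arm so that the arm sizes are preserved in every bootstrap sample. The sketch uses 1000 resamples for speed, whereas the analysis in the text used 5000.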
Two models were considered, the bivariate and the trivariate random effects models. The bivariate model was used to describe and evaluate the association between treatment effects on a single surrogate endpoint (PFS or TR) and on the final outcome (OS), whilst the trivariate model was used to describe the association between treatment effects on multiple surrogates (PFS and TR) jointly and the treatment effect on the final outcome (OS). The trivariate model described by the formulae (1)-(2) reduces to the bivariate meta-analytic model when there are only two outcomes. More details on the product normal formulation of the bivariate model for surrogate endpoints can be found in Bujkiewicz et al. [6], on the trivariate (and multivariate) model for multiple surrogate endpoints in Bujkiewicz et al. [14], and on borrowing of strength across outcomes with missing data in Bujkiewicz et al [20].

Surrogacy relationship evaluation
We followed the surrogacy criteria introduced by Daniels and Hughes [4], and adopted by Bujkiewicz et al. [14], by which the slope in (2), $\lambda_{32}$, indicates the association between the treatment effect on the second surrogate endpoint (PFS) and the treatment effect on the final outcome (OS). For the treatment effects to be associated, we require the slope $\lambda_{32} \neq 0$. For the association to be perfect, the conditional variance must be $\psi_3^2 = 0$. Moreover, we would expect the intercept $\lambda_{30} = 0$, to ensure that no treatment effect on the surrogate endpoint implies no treatment effect on the final outcome. In a similar manner we can describe the surrogacy criteria between the first and second surrogate endpoints: $\lambda_{21} \neq 0$, $\psi_2^2 = 0$ and $\lambda_{20} = 0$. We also report the adjusted $R^2$ [21,22], which for a perfect surrogacy relationship should be one. In our model, for the relationship between treatment effects on the second surrogate endpoint and on the final outcome, $R^2_{adj} = \lambda_{32}^2 \tau_2^2 / \tau_3^2$.
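As an illustration of how these criteria might be checked against MCMC output, the Python sketch below summarises posterior draws of the slope, intercept and conditional variance. The function names are hypothetical, the interval is a simple equal-tailed one based on order statistics, and the adjusted R-squared is taken here to be the squared between-studies correlation (slope squared times the ratio of heterogeneity variances), which is an assumption of the sketch.

```python
from statistics import mean

def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws (order-statistic version)."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    lower = ordered[int((1 - level) / 2 * last)]
    upper = ordered[int((1 + level) / 2 * last)]
    return lower, upper

def surrogacy_summary(slope, intercept, cond_var, tau_surr, tau_final):
    """Summarise the surrogacy criteria from aligned lists of posterior draws."""
    slope_lo, slope_hi = credible_interval(slope)
    int_lo, int_hi = credible_interval(intercept)
    # Assumed form of adjusted R-squared: squared between-studies correlation
    r2 = [(lam * ts / tf) ** 2 for lam, ts, tf in zip(slope, tau_surr, tau_final)]
    return {
        "slope_excludes_zero": not (slope_lo <= 0.0 <= slope_hi),
        "intercept_includes_zero": int_lo <= 0.0 <= int_hi,
        "mean_conditional_variance": mean(cond_var),
        "mean_r2_adj": mean(r2),
    }

# Hypothetical draws standing in for MCMC output
n = 100
summary = surrogacy_summary(
    slope=[0.30 + 0.001 * i for i in range(n)],
    intercept=[-0.05 + 0.001 * i for i in range(n)],
    cond_var=[0.01] * n,
    tau_surr=[0.3] * n,
    tau_final=[0.2] * n,
)
print(summary)
```

A conditional variance of exactly zero is never observed in practice, so the sketch reports its posterior mean rather than a pass or fail flag.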

Cross validation and model comparison
In order to investigate whether the joint use of treatment effects on multiple surrogate endpoints gives more precise predictions of the treatment effect on the final outcome, a cross-validation procedure was carried out. In one study at a time the treatment effect on the final outcome was assumed unknown and predicted from the treatment effect on surrogate endpoint (or multiple surrogate endpoints jointly) using the bivariate (or trivariate) meta-analytic model.
In this Bayesian approach to multivariate meta-analysis, this was achieved by assuming that the unreported outcomes were missing at random; these were then predicted by the Markov chain Monte Carlo (MCMC) simulation of the model [20]. The predicted interval was obtained by assuming the variance $\sigma_{3n}^2$ (or $\sigma_{2n}^2$ in the bivariate case) of the treatment effect on the final outcome to be known and inflating it by the variance of the random effect, giving the variance of the predicted effect $\sigma_{3n}^2 + \mathrm{var}(\mu_{3n} \mid Y_{1n}, Y_{2n}, \sigma_{1n}, \sigma_{2n}, Y_{1(-n)}, Y_{2(-n)}, Y_{3(-n)})$, where $Y_{1(-n)}$, $Y_{2(-n)}$ and $Y_{3(-n)}$ denote the data from the remaining studies without the validation study $n$, similarly to Daniels and Hughes [4]. The added value of multiple surrogate endpoints modelled jointly was then investigated by comparing the parameters describing the criteria for surrogacy (slope, intercept and conditional variance) and the predicted effects obtained from the cross-validation procedure. The predicted effects on the final outcome obtained by the bivariate and the trivariate meta-analytic models were compared in terms of the width of the predicted intervals. We investigated surrogacy across all RCTs as well as in subgroups defined by class of therapy.
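The variance inflation step can be written out as a short Python sketch. It assumes that posterior draws of the true effect on the final outcome in the validation study are available from the MCMC run and uses a normal approximation for the 95% interval; both the function name and the approximation are assumptions of this sketch rather than details given in the text.

```python
import math
from statistics import mean, variance

def predicted_interval(mu_draws, se_final, z=1.96):
    """Predicted interval for the effect on the final outcome in study n.

    mu_draws: posterior draws of the true effect mu_3n, obtained with the
              observed effect on the final outcome in study n left out.
    se_final: standard error of the observed effect on the final outcome,
              treated as known.
    """
    centre = mean(mu_draws)
    total_var = se_final ** 2 + variance(mu_draws)   # sigma^2 + var(mu | data)
    half_width = z * math.sqrt(total_var)
    return centre - half_width, centre + half_width

# Hypothetical draws and standard error, for illustration only
draws = [-0.2, -0.1, 0.0, 0.1, 0.2]
low, high = predicted_interval(draws, se_final=0.1)
observed = 0.05
print(low, high, low <= observed <= high)
```

In the cross-validation the final check, whether the interval contains the observed estimate, is repeated with each study in turn taken as the validation study.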

Software and computing
All models were implemented in WinBUGS [23], where the estimates were obtained using MCMC simulation with 250000 iterations (including 150000 burn-in). Convergence was checked by visually assessing the history, chains and autocorrelation using the graphical tools in WinBUGS. All posterior estimates are presented as means with 95% credible intervals (CrI). R was used for data manipulation and to execute the WinBUGS code multiple times (for validation of surrogates for each study) using the R2WinBUGS package [24]. The OpenBUGS and R2OpenBUGS versions of the software were used for the cross-validation procedures, which were conducted on a Linux (Red Hat, Inc., Raleigh, North Carolina)-based high performance computer. WinBUGS programs corresponding to the bivariate and trivariate models are included in the online appendix. In the sensitivity analysis of trials that did not allow for patient crossover we combined seven studies, out of the total of 33 trials reporting all three outcomes. These included three trials of systemic chemotherapy, one of anti-EGFR therapy, two of anti-angiogenic agents and one of IHA.

Included data
A list of references for the studies included in the analysis can be found in the supplementary materials.
Exploratory analysis of the data (presented graphically in Figures 1 to 4 and in Table 1 of the supplementary materials) showed considerable heterogeneity of the treatment effects on TR and PFS, with the confidence intervals of the treatment effects on TR particularly wide, especially for two classes of therapy: systemic chemotherapy and anti-EGFR therapies.

Within-study correlations
The within-study correlations required to populate the within-study variance-covariance matrix in the multivariate meta-analytic model were obtained from IPD. The within-study correlations between each pair of treatment effects on the three endpoints are as follows: the correlation between treatment effects on PFS and OS (log HR(PFS) and log HR(OS)) is $\rho_{wi}^{32} = 0.513$, the correlation between treatment effects on TR and OS (log OR(TR) and log HR(OS)) is $\rho_{wi}^{31} = -0.333$ and the correlation between treatment effects on TR and PFS (log OR(TR) and log HR(PFS)) is $\rho_{wi}^{21} = -0.433$.
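As a small illustration of how these correlations populate the within-study variance-covariance matrix of a single trial, the Python sketch below combines the three reported correlations with hypothetical standard errors and checks that the resulting matrix is positive definite. The function names and the standard errors are assumptions; only the correlations come from the text.

```python
# Reported within-study correlations; outcome order is 0 = TR, 1 = PFS, 2 = OS
RHO_W = {(0, 1): -0.433, (0, 2): -0.333, (1, 2): 0.513}

def within_study_cov(se, rho=RHO_W):
    """Build the 3x3 within-study covariance matrix from standard errors
    se = (se_TR, se_PFS, se_OS) and pairwise correlations."""
    n = len(se)
    cov = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            r = 1.0 if k == l else rho[(min(k, l), max(k, l))]
            cov[k][l] = r * se[k] * se[l]
    return cov

def is_positive_definite_3x3(m):
    """Sylvester's criterion: all leading principal minors are positive."""
    d1 = m[0][0]
    d2 = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    d3 = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return d1 > 0 and d2 > 0 and d3 > 0

# Hypothetical standard errors for one trial
cov = within_study_cov((0.40, 0.15, 0.12))
for row in cov:
    print(row)
print(is_positive_definite_3x3(cov))
```

Because the same correlations are assumed for every study, only the standard errors change from one trial to the next.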

Surrogacy criteria: base case scenario
Three bivariate meta-analyses were carried out to investigate the association between treatment effects on each pair of the three outcomes: TR, PFS and OS. The treatment effect on TR was evaluated as a surrogate for the treatment effect on PFS and for the treatment effect on OS, and the treatment effect on PFS was investigated as a surrogate for the treatment effect on OS. The trivariate model was then used to investigate whether treatment effects on both TR and PFS modelled jointly improved the prediction of the treatment effect on OS. Table 1 shows the results of these analyses conducted in the base case scenario, where the data used were from studies reporting treatment effects on all three outcomes.
Results of the three bivariate models applied to all of the data, presented in the top part of Table 1, showed that there was an association between the treatment effects on each pair of outcomes. The intervals of the intercepts obtained from the bivariate models all contained zero, indicating that no effect on the surrogate endpoint could imply no effect on the final outcome. The intervals for the slopes did not contain zero, indicating a positive association where the slope was positive and a negative association where the slope was negative. However, the surrogate relationships were not strong. When investigating TR as a surrogate endpoint for OS, the association between the treatment effects on the two outcomes was weak: the slope was small, $\lambda_{21} = -0.05$ (95% CrI: -0.13, 0.00), with the 95% CrI containing zero when rounded to the second decimal place, and the mean and the lower bound of $R^2_{adj} = 0.33$ (95% CrI: 0.00, 0.91) were also small. The slope and the adjusted R-squared were higher for the relationship between the treatment effects on PFS and TR; the slope was -0.32.
Results from the trivariate meta-analysis, which described the associations between the treatment effects on the two pairs of outcomes (effects on PFS and TR, and effects on OS and PFS) in a single model, were similar to those obtained from the separate bivariate models. Precision around the intercept, slope and conditional variance was minimally reduced for the association between the treatment effects on PFS and TR in the trivariate analysis, whereas precision of these estimates for the association between the treatment effects on OS and PFS remained the same in the trivariate analysis as in the bivariate analysis using a single surrogate endpoint. Table 1 also shows the results for each subclass of therapy.
For the subgroup of trials investigating systemic chemotherapy, the results were similar to those obtained from the whole cohort of studies but were typically obtained with increased uncertainty (wider CrIs) and showed weaker association in terms of a lower mean slope. The adjusted R-squared was minimally higher in this subgroup for the association between the treatment effects on OS and TR, whilst a lower mean slope was obtained for the association between TR and PFS and between PFS and OS, compared to the analysis of all treatments. The slopes for the association between the treatment effects on TR and PFS and between the effects on OS and PFS were obtained with minimally higher precision from the trivariate models than from the bivariate models. For the anti-EGFR therapies, the results were also similar to those from the analysis of all studies but, as for systemic chemotherapy, showed a weaker association pattern. For anti-angiogenic agents the mean slopes and the mean R-squared values were considerably higher for all investigated surrogacy relationships compared to the other subclasses and the analysis of all treatments; however, they were obtained with high uncertainty, likely due to the small number of studies in the subgroup.

Cross-validation
To investigate the predictive value of the surrogate endpoints when modelled jointly, a cross-validation procedure was carried out. The treatment effects on OS predicted from the treatment effect on PFS alone were compared with those predicted from the treatment effects on both TR and PFS jointly, by exploring the associated uncertainty described by the predicted intervals as well as whether the predicted intervals contained the observed point estimate of the treatment effect on OS in each study. In the predictions obtained using the complete data set of 33 studies, some of the predicted intervals were inflated when the predictions were made from the treatment effects on both surrogate endpoints jointly, compared to the predictions made from the treatment effect on PFS only.
The intervals were on average 0.21% wider, with the percentage change in the width of the interval ranging from a 1.96% reduction to a 4.4% increase. However, from the point of view of the cross-validation procedure, the 95% predicted intervals included the observed point estimate in all of the studies apart from one, which had the most extreme (lowest) value of the treatment effect on OS. This was the case for both the bivariate and the trivariate models. The full set of predicted intervals is included in Table 2 of the supplementary materials.
The cross-validation procedure was repeated for each class of therapy separately. In terms of predicted intervals, there was on average a very modest improvement in the precision of predictions for the anti-angiogenic agents obtained from the trivariate compared to the bivariate model (on average a 2.35% narrower 95% predicted interval). For the systemic chemotherapy class there was a minimal improvement in precision for most of the studies, with an average reduction of 1.41% in the width of the predicted interval. However, the predicted intervals were inflated for the anti-EGFR therapies with, on average, a 6.9% increase in the width of the interval. All predicted intervals are included in Table 3 of the supplementary materials.
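The percentage changes in interval width quoted above can be computed as in the following Python sketch. The function names and the example intervals are hypothetical and do not reproduce any study in the analysis; a negative value means that the trivariate interval is narrower than the bivariate one.

```python
from statistics import mean

def width_change_percent(bivariate, trivariate):
    """Percentage change in predicted interval width when moving from the
    bivariate to the trivariate model; negative means narrower."""
    width_bi = bivariate[1] - bivariate[0]
    width_tri = trivariate[1] - trivariate[0]
    return 100.0 * (width_tri - width_bi) / width_bi

def mean_width_change(pairs):
    """Average percentage change over (bivariate, trivariate) interval pairs."""
    return mean(width_change_percent(bi, tri) for bi, tri in pairs)

# Hypothetical predicted intervals on the log HR scale for three studies
pairs = [((-0.50, 0.50), (-0.49, 0.49)),
         ((-0.30, 0.40), (-0.31, 0.41)),
         ((-0.60, 0.20), (-0.60, 0.20))]
for bi, tri in pairs:
    print(round(width_change_percent(bi, tri), 2))
print(round(mean_width_change(pairs), 2))
```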
Overall, there was little benefit in combining treatment effects on two surrogate endpoints to predict the treatment effect on the final outcome. This lack of improvement, or even increased uncertainty of the predicted effect when using multiple surrogate endpoints, may be due to increased overall between-studies heterogeneity when extending the data to include the treatment effect on TR. The between-studies heterogeneity for the treatment effect on TR was considerably higher than the heterogeneity of the treatment effects on PFS and OS in the data set including all treatments. This was also the case for the subgroups of studies including the systemic chemotherapy and the anti-EGFR therapy trials. However, for the anti-angiogenic agents, the between-studies heterogeneity of the treatment effect on TR was comparable with that for the treatment effect on PFS. This may explain some increase in precision of the slope and the predicted effects on OS when using multiple surrogate endpoints in this class of therapy, as including an additional outcome did not increase the overall uncertainty. However, due to the small number of studies the added value was minimal. The treatment effects on all three outcomes obtained from the bivariate and trivariate models were comparable. All mean treatment effects and the heterogeneity parameters are listed in Table 1 of the supplementary materials.

Sensitivity analysis
Sensitivity analysis extending the data set to the 51 studies reporting at least two outcomes gave similar results; the surrogacy criteria for the association between the treatment effects on PFS and OS were satisfied both when looking at all therapies and for the systemic chemotherapy and anti-angiogenic agent subgroups (Tables 4 and 5 in the online appendix).
Results of the sensitivity analysis of trials with no crossover, presented in Table 2, were similar to those obtained from the base case scenario with respect to the surrogacy criteria. The treatment effects on the surrogate and final outcomes appeared to be associated for all investigated surrogacy relationships. However, for all sets of surrogacy relationships and for both the bivariate and the trivariate analyses, all estimates were obtained with large uncertainty compared to the whole set of studies, which was most likely due to the small number of studies (only 7) in the meta-analysis. The association between treatment effects on OS and PFS appeared to be weaker in terms of the conditional variance, which increased compared to the results obtained from all 33 studies reporting all three outcomes. This was the case in both sets of results, from the bivariate and the trivariate analyses. This also led to reduced values of the adjusted R-squared. The use of multiple surrogate endpoints did not improve the strength of the association patterns when compared to the use of a single surrogate endpoint in this subset of studies. As for the full data set, this could also be due to the large heterogeneity of the treatment effects on TR (see Table 6 in the online appendix, which includes the average effects and the heterogeneity parameters).
The cross-validation showed a very modest increase in precision, on average 2.3% and up to 6.2%, when predicting the treatment effect on OS from the treatment effects on both PFS and TR compared to the predictions made from the effect on PFS only. The full set of results from the cross-validation is included in Table 7 of the online appendix.
An additional sensitivity analysis was carried out to investigate the impact of an outlying observation (the study with the largest effect size estimate for TR). The results (listed in Table 8 of the online appendix) showed an increased mean slope and mean R-squared for the full set of studies and for the subgroup of EGFR inhibitors; however, the uncertainty around these parameters in both sets of results also increased.
A final sensitivity analysis was carried out to investigate the impact of the choice of the prior distribution for the between-studies correlation, replacing the informative prior distributions U(-1,0) for the negative association and U(0,1) for the positive association with a non-informative prior distribution U(-1,1). The results of this analysis were somewhat different from those obtained from the main analysis and are listed in Table 9 of the online appendix.
The surrogacy criteria for the association between PFS and OS, and also between TR and PFS, were satisfied by both the bivariate and the trivariate models applied to all treatments in the base case scenario. In terms of subclasses of therapy, the association between TR and PFS was preserved for systemic chemotherapy and anti-angiogenic agents.

Discussion
We investigated the use of multiple surrogate endpoints as joint predictors of the clinical benefit measured on the final clinical outcome in advanced colorectal cancer. A multivariate meta-analytic framework allowed us to combine the treatment effects on TR and PFS as joint predictors of the treatment effect on OS. In a Bayesian meta-analytic framework, we modelled the correlated treatment effects in the product normal formulation, which is a convenient form for exploring a range of parameters describing the surrogacy relationships, such as the intercept, slope and conditional variance (as set out by Daniels and Hughes [4]) and the adjusted R-squared (introduced by Burzykowski et al [22] and in the Bayesian framework by Renfro et al [21]). These models are also used to make predictions of the treatment effect on the final clinical outcome (OS) from the treatment effects on the surrogate endpoints. In this respect they have the advantage of taking into account the uncertainty around all the parameters, including the measurement error around the treatment effects on surrogate endpoints (in contrast to, for example, the standard approach to meta-regression, where treatment effects on surrogate endpoints are treated as fixed covariates) [14].
The treatment effects on PFS and TR were associated with treatment effect on OS. However, overall the joint use of two surrogate endpoints did not lead to much improvement in the association between treatment effects on the surrogate and final endpoints but in the subclass of anti-angiogenic agents led to very modest improvement in precision of the predicted effects on OS. Some small improvement in precision, when modelling both surrogate endpoints jointly, was also observed in cross-validation procedure conducted on trials without cross-over. In the trials allowing for cross-over, there is typically reduced effect on OS with large uncertainty around the treatment effect estimate. This is likely to affect the results of modelling surrogate relationships, using both the bivariate and the trivariate methods.
It is possible that the trivariate approach would show some noticeable benefit if more studies were available for the analysis of trials without cross-over. In our analysis of the trials which did not allow for cross-over, there was typically reduced uncertainty of the predicted effects when using multiple surrogates, but the reduction was small, as was the number of studies in the analysis. We therefore cannot draw strong conclusions based on our findings. Moreover, not all studies reported whether treatment cross-over was allowed. Another source of uncertainty that may have prevented the improvement of the surrogate relationship when using both candidate surrogate endpoints was the large between-studies heterogeneity of the treatment effect on tumour response. This may have been caused by the heterogeneity of the methods used to measure the response across the trials.
To investigate the sensitivity of the results to the model parameterisation, we carried out a sensitivity analysis (the results of which are shown in Table 6 of the online appendix). Using alternative assumptions about the between-studies variance-covariance structure gave similar results for the surrogate relationship between the treatment effects on PFS and OS; however, the adjusted R-squared was reduced compared to the bivariate and trivariate models considered in the main analysis, and for one of the models the interval of the slope included zero, suggesting a marginally poor surrogate relationship.
In conclusion, the impact of the joint modelling of the treatment effects on two surrogate endpoints (TR and PFS) on their surrogate relationship with the treatment effect on overall survival was not noticeable in advanced colorectal cancer. Further work will be needed to investigate in detail the treatment cross-over and the heterogeneity of the definitions of the outcomes and potentially the patient populations.
In the sensitivity analysis of the between-studies variance-covariance structure, two alternative models were used. In the first one (denoted in Table 10 as 3D structured **) we assumed conditional independence of the true treatment effects on the surrogate endpoints (TR and PFS), whilst the treatment effect on the final outcome was conditional on the treatment effects on both surrogate endpoints. The second model assumed a fully unstructured between-studies variance-covariance matrix, with the true treatment effects on all three outcomes correlated.