Combining Correlated Outcomes and Surrogate Endpoints in a Network Meta-Analysis of Colorectal Cancer Treatments

Simple Summary
Cytotoxic agents and biological targeted agents are currently combined for the treatment of advanced or metastatic colorectal cancer. However, questions have been raised about which chemotherapy or targeted therapy provides higher efficacy and lower toxicity, and whether adding a targeted therapy to chemotherapy not only increases the treatment effect but also reduces adverse events. In this study, we first estimated the treatment effect on overall survival, which had not been reached in several randomized controlled trials, from the treatment effects on overall response rate and/or progression-free survival. We then performed a network meta-analysis to compare the efficacy and safety of 12 commonly used regimens. Our analyses showed that, in both first- and second-line settings, FOLFOX+cetuximab had a high probability of being a preferred treatment in terms of efficacy, and FOLFIRI+bevacizumab in terms of safety.
Abstract
This study aimed to investigate the efficacy and safety of systemic therapies in the treatment of unresectable advanced or metastatic colorectal cancer. Predicted hazard ratios (HRs) and their 95% credible intervals (CrIs) for overall survival (OS) were calculated from the odds ratio (OR) for the overall response rate and/or the HR for progression-free survival using multivariate random effects (MVRE) models. We performed a network meta-analysis (NMA) of 49 articles to compare the efficacy and safety of FOLFOX/FOLFIRI±bevacizumab (Bmab)/cetuximab (Cmab)/panitumumab (Pmab) and FOLFOXIRI/CAPEOX±Bmab. The NMA showed significant OS improvements with FOLFOX, FOLFOX+Cmab, and FOLFIRI+Cmab compared with FOLFIRI (HR = 0.84, 95% CrI = 0.73–0.98; HR = 0.76, 95% CrI = 0.62–0.94; HR = 0.80, 95% CrI = 0.66–0.96, respectively), and with FOLFOX+Cmab and FOLFIRI+Cmab compared with FOLFOXIRI (HR = 0.69, 95% CrI = 0.51–0.94 and HR = 0.73, 95% CrI = 0.54–0.97, respectively).
The odds of grade ≥3 adverse events were significantly higher for FOLFOX+Cmab than for FOLFIRI+Bmab (OR = 2.34, 95% CrI = 1.01–4.66). FOLFIRI+Pmab was also associated with higher odds of these events than FOLFIRI (OR = 2.16, 95% CrI = 1.09–3.84) and FOLFIRI+Bmab (OR = 3.14, 95% CrI = 1.51–5.89). In both first- and second-line settings, FOLFOX+Cmab showed a high probability of being a preferred treatment in terms of efficacy, and FOLFIRI+Bmab in terms of safety. The findings of the efficacy and safety comparisons may support the selection of appropriate treatments in clinical practice. PROSPERO registration: CRD42020153640.

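$Y_{2i}$ representing the log HR for PFS, and $Y_{3i}$ representing the log HR for OS are assumed to be correlated and normally distributed [63]:

$$\begin{pmatrix} Y_{1i} \\ Y_{2i} \\ Y_{3i} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_{1i} \\ \mu_{2i} \\ \mu_{3i} \end{pmatrix}, \begin{pmatrix} \sigma_{1i}^2 & \sigma_{1i}\sigma_{2i}\rho_{wi}^{12} & \sigma_{1i}\sigma_{3i}\rho_{wi}^{13} \\ \sigma_{1i}\sigma_{2i}\rho_{wi}^{12} & \sigma_{2i}^2 & \sigma_{2i}\sigma_{3i}\rho_{wi}^{23} \\ \sigma_{1i}\sigma_{3i}\rho_{wi}^{13} & \sigma_{2i}\sigma_{3i}\rho_{wi}^{23} & \sigma_{3i}^2 \end{pmatrix} \right)$$

where $\mu_{ki}$ are the true treatment effects, $\sigma_{ki}^2$ are the corresponding variances of the treatment effects for individual study $i$ and outcome $k$, and $\rho_{wi}^{kl}$ are the within-study correlations among these estimates. The between-study variability is modeled by a product of conditional univariate normal distributions with structured covariance [63]:

$$\mu_{1i} \sim N(\eta_1, \psi_1^2)$$
$$\mu_{2i} \mid \mu_{1i} \sim N(\lambda_{20} + \lambda_{21}\mu_{1i}, \psi_2^2)$$
$$\mu_{3i} \mid \mu_{1i}, \mu_{2i} \sim N(\lambda_{30} + \lambda_{31}\mu_{1i} + \lambda_{32}\mu_{2i}, \psi_3^2)$$

where the conditional variances $\psi_k^2$ are related to the between-study heterogeneity parameters $\tau_k^2$ through the regression coefficients $\lambda_{kl}$, which in turn depend on both $\tau_k^2$ and the between-study correlations $\rho_b^{kl}$. Given that the HR for OS is expected to be positively associated with the HR for PFS and negatively associated with the OR for ORR, we assigned uniform prior distributions to the between-study correlations, $\rho_b^{13} \sim U(-1, 0)$ and $\rho_b^{23} \sim U(0, 1)$. Additionally, we assigned half-normal priors to the heterogeneity parameters, $\tau_k \sim N(0, 1000)I(0,)$, and normal priors to the other parameters, $\eta_1, \lambda_{20}, \lambda_{30} \sim N(0, 1000)$.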
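In the product-normal formulation, each slope $\lambda_{kl}$ and conditional variance $\psi_k^2$ is a deterministic function of the heterogeneity parameters $\tau_k$ and the between-study correlations $\rho_b^{kl}$. The Python sketch below illustrates that mapping using the standard conditional-normal identities, assuming the true effects $(\mu_{1i}, \mu_{2i}, \mu_{3i})$ are trivariate normal between studies. The function name `product_normal_params` is hypothetical and is not taken from the authors' WinBUGS code; the intercepts $\lambda_{20}$ and $\lambda_{30}$ receive their own priors in the model and are not derived here.

```python
def product_normal_params(tau1, tau2, tau3, rho12, rho13, rho23):
    """Slopes and conditional variances of a product-normal between-study model.

    Hypothetical sketch: assumes (mu1, mu2, mu3) are trivariate normal with
    SDs tau1..tau3 (all > 0) and between-study correlations rho12, rho13, rho23.
    """
    # mu1 ~ N(eta1, psi1^2): the first margin is unconditional.
    psi1_sq = tau1 ** 2

    # mu2 | mu1: bivariate-normal regression of mu2 on mu1.
    lam21 = rho12 * tau2 / tau1
    psi2_sq = tau2 ** 2 * (1.0 - rho12 ** 2)

    # mu3 | mu1, mu2: coefficients = Cov(mu3, [mu1, mu2]) @ inv(Cov([mu1, mu2])).
    s11, s22, s12 = tau1 ** 2, tau2 ** 2, rho12 * tau1 * tau2
    c1, c2 = rho13 * tau1 * tau3, rho23 * tau2 * tau3
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("|rho12| must be < 1")
    lam31 = (c1 * s22 - c2 * s12) / det
    lam32 = (c2 * s11 - c1 * s12) / det

    # Conditional variance = Schur complement tau3^2 - c' inv(S) c.
    psi3_sq = tau3 ** 2 - (lam31 * c1 + lam32 * c2)
    if psi3_sq <= 0:
        raise ValueError("correlations do not form a positive-definite matrix")
    return {"psi1_sq": psi1_sq, "lam21": lam21, "psi2_sq": psi2_sq,
            "lam31": lam31, "lam32": lam32, "psi3_sq": psi3_sq}


# Illustrative values only; the signs of rho13 and rho23 respect the priors above.
params = product_normal_params(0.3, 0.2, 0.15, rho12=-0.4, rho13=-0.5, rho23=0.8)
print(params)
```

Checking positive definiteness inside the function mirrors why the priors on $\rho_b^{kl}$ must be restricted jointly: not every combination of three correlations yields a valid covariance matrix.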
For studies reporting OS and only one surrogate endpoint (ORR or PFS), the reduced bivariate random effects model [63] was applied in the same way to investigate the association between OS and ORR or between OS and PFS.
The parameters estimated from the multivariate random effects (MVRE) models were then used to calculate the predicted log HR for OS in studies that reported the OR for ORR and/or the HR for PFS but not the HR for OS.
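A minimal Monte Carlo sketch of this prediction step for the bivariate PFS and OS case is shown below. It is a simplification under stated assumptions: the intercept, slope, and conditional variance are fixed at point values rather than sampled from their posteriors, and the true PFS effect is drawn only from the study's own sampling distribution, whereas the full Bayesian model propagates parameter uncertainty and shrinks the PFS effect toward the between-study distribution. The function `predict_hr_os` and the trial inputs are hypothetical; the slope of 0.79 and variance of 0.02 echo the bivariate PFS estimates reported in the Results, and the intercept of 0 is an illustrative assumption.

```python
import math
import random
import statistics


def predict_hr_os(log_hr_pfs, se_log_hr_pfs, intercept, slope, cond_var,
                  n_draws=50_000, seed=2020):
    """Predicted HR for OS (median, 2.5%, 97.5%) from an observed HR for PFS.

    Hypothetical sketch of the bivariate surrogacy prediction:
      mu_pfs ~ N(observed log HR_PFS, SE^2)
      mu_os | mu_pfs ~ N(intercept + slope * mu_pfs, cond_var)
    Surrogacy parameters are treated as fixed, which understates uncertainty.
    """
    rng = random.Random(seed)
    sd_cond = math.sqrt(cond_var)
    draws = []
    for _ in range(n_draws):
        mu_pfs = rng.gauss(log_hr_pfs, se_log_hr_pfs)            # true PFS effect
        mu_os = rng.gauss(intercept + slope * mu_pfs, sd_cond)   # predicted OS effect
        draws.append(mu_os)
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points: [0] = 2.5%, [-1] = 97.5%
    return (math.exp(statistics.median(draws)),
            math.exp(cuts[0]), math.exp(cuts[-1]))


# Hypothetical trial: HR for PFS = 0.80 with SE(log HR) = 0.10.
hr, lo, hi = predict_hr_os(math.log(0.80), 0.10, intercept=0.0, slope=0.79, cond_var=0.02)
print(f"Predicted HR for OS = {hr:.2f} (95% interval {lo:.2f}-{hi:.2f})")
```

Because the conditional variance enters every draw, even a precisely estimated PFS effect yields a comparatively wide interval for the predicted OS effect.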

Network Meta-Regression Analysis of Treatment Therapies
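In the network meta-analysis, we calculated the pooled HR for OS and the pooled OR for grade ≥3 AEs to compare the pairwise efficacy and safety of mCRC treatments, following a generalized linear model framework [64,65]:

$$\theta_{ik} = \mu_i + \delta_{ik}\, I\{k \neq 1\}, \qquad \delta_{ik} \sim N\left(d_{t_{i1}t_{ik}} + \beta x_i,\ \tau^2\right)$$

where $\theta_{ik}$ is the linear predictor (log hazard or log odds) for arm $k$ of trial $i$, $\mu_i$ is the trial-specific effect of the treatment in arm 1 of trial $i$, $\delta_{ik}$ is the trial-specific effect of the treatment in arm $k$ compared with the treatment in arm 1 of the same trial, $t_{ik}$ and $t_{i1}$ are the treatments in arm $k$ and arm 1 of trial $i$, $d_{t_{i1}t_{ik}}$ is the pooled relative effect of $t_{ik}$ versus $t_{i1}$, $\beta$ is the meta-regression coefficient, and $\tau^2$ is the between-study variance. The trial-level subgroup indicator $x_i$ is defined as

$$x_i = \begin{cases} 1 & \text{if study } i \text{ compares primary treatments} \\ 0 & \text{if study } i \text{ compares secondary treatments} \end{cases}$$

Furthermore, the consistency between direct and indirect estimates was evaluated with the node-splitting method, and between-study heterogeneity was assessed with $I^2$ values [66].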
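The $I^2$ statistic is conventionally derived from Cochran's Q under inverse-variance weighting, $I^2 = \max\{0, (Q - df)/Q\}$. The sketch below applies that frequentist pairwise formula to one treatment comparison on the log HR scale; the function name and inputs are hypothetical, and the exact way [66] computes $I^2$ within the Bayesian model may differ.

```python
def i_squared(log_effects, std_errors):
    """Cochran's Q and I^2 for one pairwise comparison (inverse-variance weights)."""
    if len(log_effects) != len(std_errors) or len(log_effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    # Fixed-effect pooled estimate used as the reference for Q.
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # Share of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Three hypothetical trials of the same comparison (log HR and its SE).
q, i2 = i_squared([0.0, 0.2, 0.4], [0.1, 0.1, 0.1])
print(f"Q = {q:.2f}, I^2 = {100 * i2:.0f}%")  # prints Q = 8.00, I^2 = 75%
```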
We additionally ranked the treatments using the surface under the cumulative ranking curve (SUCRA) [67]. The SUCRA value for treatment $i$ is defined as

$$\mathrm{SUCRA}_i = \frac{1}{n-1}\sum_{k=1}^{n-1} F(i,k)$$

where $n$ is the number of treatments and $F(i,k)$ is the cumulative probability that treatment $i$ ranks $k$th best or better, calculated as

$$F(i,k) = \sum_{j=1}^{k} P(i,j)$$

where $P(i,j)$ is the probability that treatment $i$ ranks $j$th for a given outcome (OS or grade ≥3 AEs). The SUCRA value therefore summarizes the overall ranking in a single number ranging from 0 to 1 [67]; a higher SUCRA value indicates a higher probability that the treatment ranks among the best in terms of efficacy or safety [67].
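The SUCRA definition can be computed directly from MCMC output. The sketch below assumes a list of posterior draws in which each draw holds one effect per treatment on a scale where lower is better (for example, log HR for OS or log OR for grade ≥3 AEs versus a common reference); it tallies the rank probabilities $P(i,j)$ and then accumulates them into SUCRA values. The function names are hypothetical and the draws in the usage lines are made up.

```python
def rank_probabilities(draws):
    """P[i][j] = share of draws in which treatment i ranks (j+1)th; lower effect = better."""
    n = len(draws[0])
    counts = [[0] * n for _ in range(n)]
    for draw in draws:
        order = sorted(range(n), key=lambda t: draw[t])  # best (lowest) first
        for rank, t in enumerate(order):
            counts[t][rank] += 1
    return [[c / len(draws) for c in row] for row in counts]


def sucra(P):
    """SUCRA_i = (1 / (n - 1)) * sum_{k=1}^{n-1} F(i, k), with F(i, k) = sum_{j<=k} P(i, j)."""
    n = len(P)
    scores = []
    for row in P:
        cumulative, total = 0.0, 0.0
        for k in range(n - 1):        # ranks 1 .. n-1
            cumulative += row[k]      # F(i, k + 1)
            total += cumulative
        scores.append(total / (n - 1))
    return scores


# Four made-up posterior draws of log HR vs. a reference for treatments A, B, C.
draws = [[0.00, -0.20, 0.10], [0.00, -0.10, -0.05],
         [0.00, -0.30, 0.20], [0.00, 0.05, -0.10]]
print(sucra(rank_probabilities(draws)))  # [0.375, 0.75, 0.375]
```

Treatment B has the lowest effect in three of the four draws, so it receives the highest SUCRA; the scores always average 0.5 across treatments.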
The standardized SUCRA values for efficacy and safety were presented in a two-dimensional plot. We then applied k-means clustering to group the treatments into four categories: high efficacy and high safety, high efficacy and low safety, high safety and low efficacy, and low efficacy and low safety [68].
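The clustering step can be sketched as follows, assuming z-score standardization of each SUCRA axis and Lloyd's k-means with k = 4. The deterministic farthest-point initialization and the rule that names a cluster by the sign of its standardized centroid (above or below the average SUCRA) are assumptions made here for reproducibility and readability, not necessarily the settings used by the authors; the SUCRA pairs are hypothetical.

```python
import statistics


def standardize(values):
    """z-scores: (x - mean) / sample SD."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k=4, max_iter=100):
    """Lloyd's k-means on 2-D points with farthest-point (maximin) initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (statistics.mean(m[0] for m in members),
                                statistics.mean(m[1] for m in members))
    return labels, centroids


def describe(centroid):
    """Name a cluster by the sign of its standardized centroid (above/below average)."""
    eff = "high efficacy" if centroid[0] > 0 else "low efficacy"
    saf = "high safety" if centroid[1] > 0 else "low safety"
    return f"{eff}, {saf}"


# Hypothetical (efficacy SUCRA, safety SUCRA) pairs for eight regimens.
eff = [0.90, 0.85, 0.90, 0.95, 0.10, 0.15, 0.10, 0.05]
saf = [0.90, 0.95, 0.10, 0.15, 0.90, 0.85, 0.10, 0.15]
points = list(zip(standardize(eff), standardize(saf)))
labels, centroids = kmeans(points)
for i, lab in enumerate(labels):
    print(f"regimen {i}: {describe(centroids[lab])}")
```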
All Bayesian models were fitted in WinBUGS 1.4.3 (MRC Biostatistics Unit, UK) [69] using three chains and 150,000 Markov chain Monte Carlo iterations (including 50,000 burn-in iterations).
The study methodology and progress were registered in PROSPERO, the National Institute for Health Research international prospective register of systematic reviews (registration number: CRD42020153640).

Association between Surrogate Endpoints and the Correlated Outcome
The characteristics and findings of the 49 included studies are summarized in Table 1. Bivariate and trivariate random effects models were fitted to estimate the surrogacy associations between treatment effect sizes. We then calculated the predicted HRs (95% CrIs) for OS for the 17 and 5 study populations that reported results for ORR only and for both ORR and PFS, respectively (Table 1). None of the 22 predicted pairwise treatment comparisons showed a significant difference in OS. Table 2 shows the surrogacy parameters of the MVRE models. Because the 95% CrIs of the posterior intercepts contained zero, no treatment effect on the surrogate endpoint(s) implied no treatment effect on the outcome. In other words, in studies with no significant differences between the intervention and comparison groups in ORR and/or PFS, no difference in OS between the groups was predicted either.

Abbreviations: OR (odds ratio), HR (hazard ratio), CI (confidence interval), CrI (credible interval), FOLFOX (5-fluorouracil, folinic acid, and oxaliplatin), FOLFIRI (5-fluorouracil, folinic acid, and irinotecan), FOLFOXIRI (5-fluorouracil, folinic acid, oxaliplatin, and irinotecan), CAPEOX (capecitabine and oxaliplatin), Bmab (bevacizumab), Cmab (cetuximab), and Pmab (panitumumab). Bold font indicates statistical significance.

Table 2. Surrogacy parameters of multivariate random effects models. When the 95% CrI of the posterior slope did not contain zero, a positive slope indicated a significant positive association and a negative slope a significant negative association between the treatment effects on the surrogate endpoint and the outcome.
As a result, in the bivariate model the treatment effect on OS was significantly positively associated with the treatment effect on PFS (posterior slope = 0.79, 95% CrI = 0.49-1.09), with a relatively high adjusted R-squared value (posterior R-squared = 0.54, 95% CrI = 0.25-0.76) and low variance (posterior variance = 0.02, 95% CrI = 0.01-0.04). Although the association between the treatment effects on OS and ORR was significantly negative, it was weak: the upper limit of the 95% CrI of the slope (−0.04) and the lower limit of the 95% CrI of the R-squared value (0.01) were close to zero, and the variance was relatively high (posterior variance = 0.07, 95% CrI = 0.04-0.13). In the trivariate model, we observed a strong relationship between the treatment effect on OS and the treatment effects on ORR and PFS combined (posterior slope = 0.76, 95% CrI = 0.46-1.06; posterior R-squared = 0.53, 95% CrI = 0.22-0.76; posterior variance = 0.04, 95% CrI = 0.02-0.07), which was similar to the bivariate model with PFS as the surrogate endpoint.

Network Geometry for the Efficacy and Safety of CRC Treatments
A total of 12 commonly used regimens for a/mCRC were included in the NMA for efficacy (Figure 1A). After the predicted estimates from the MVRE models were included, new direct-comparison evidence became available for FOLFOX+Bmab vs. FOLFOX+Cmab, FOLFOXIRI+Bmab vs. FOLFOXIRI, and CAPEOX vs. CAPEOX+Bmab (Figure 1B). The NMA for safety included 11 regimens; FOLFOXIRI was excluded because comparative data on grade ≥3 AEs for FOLFOXIRI were not available from RCTs (Figure 1C).

Discussion
This study applied MVRE models to calculate the predicted treatment effect on the correlated outcome of OS from the treatment effects on the surrogate endpoints ORR and/or PFS. Both the observed and predicted HRs for OS (efficacy) and the ORs for grade ≥3 AEs (safety) were included in a Bayesian NMA, which combines direct and indirect comparisons. Our findings indicated that FOLFOX+Cmab had a high probability of being a preferred primary and secondary treatment for a/mCRC in terms of efficacy, and FOLFIRI+Bmab in terms of safety.
Although OS is the gold standard endpoint in oncology research and practice, surrogate endpoints have also been investigated over the past few decades because of the difficulty of obtaining OS data. It has been reported that surrogate endpoints were used in 84% of trials during 2005-2012 [70] and in 66% of the oncology indications approved by the United States Food and Drug Administration between 2009 and 2014 [71]. In mCRC, PFS has been the most frequently evaluated surrogate for OS, in addition to response rate and tumor shrinkage criteria [72][73][74][75][76]. Cicero et al. recently found a moderate correlation between OS and PFS in first-line and second-line treatments with FOLFOX+Bmab for mCRC [72]. Similar findings were reported in a systematic review of 20 RCTs in the second-line treatment of mCRC, with moderate (0.73) and poor (0.17) correlations of PFS and ORR with OS, respectively [73]. Compared with such correlation-based analyses, the Bayesian surrogacy models developed by Bujkiewicz et al. [62] have the advantage of accounting for the uncertainty (measurement error) in the treatment effects on surrogate endpoints, and they allowed us to combine the treatment effects on both ORR and PFS when calculating the HR for OS [63]. In the present study, the association between the HR for OS and the HR for PFS (R-squared = 0.54, 95% CrI = 0.25-0.76) was not improved by adding the OR for ORR as a second surrogate endpoint in the MVRE models (R-squared = 0.53, 95% CrI = 0.22-0.76).
In contrast, several studies have questioned the accuracy of surrogate endpoints for predicting treatment outcomes, especially in oncology research [77]. A lack of validation, owing to weak to moderate correlations between ORR or PFS and OS, was reported in patients with cancer treated with immune checkpoint inhibitors [77]. Additionally, a cross-sectional study of 51 products authorized by the European Medicines Agency between 2011 and 2018 for any indication (26 assessed through conditional marketing authorization and 25 through accelerated assessment) found that 46 approvals were based on surrogate endpoints that had not been demonstrated to predict clinical outcomes [78]. In patients with mCRC, although PFS has been examined as a surrogate endpoint for OS in both literature-based (50 RCTs) [79] and individual patient data-based (22 RCTs) [80] analyses, further studies of individual patient data at different time points are needed to validate these findings.
The combinations of 5-fluorouracil and folinic acid with oxaliplatin or irinotecan, first introduced in the 1990s, showed significant improvements in response rate and survival time compared with regimens without oxaliplatin or irinotecan [81]. Since then, several RCTs have directly compared the activity of FOLFOX and FOLFIRI [51,53,58]. While FOLFOX was reported to be associated with an approximately 30% lower risk of death [51,58], another head-to-head trial [53], as well as our predicted values from RCTs reporting surrogate endpoints [56,59], showed comparable effects. The current NMA supported the superior efficacy of FOLFOX over FOLFIRI, while their safety was equivalent. However, the choice between oxaliplatin-based and irinotecan-based therapy remains controversial. Physicians might prefer FOLFOX because of its cost-effectiveness and lower rates of nausea and vomiting compared with FOLFIRI, which might make FOLFIRI less appropriate for older female patients [82]. In contrast, hand-foot syndrome is more frequent in patients treated with FOLFOX, which might make FOLFOX less preferred in some polar countries. Among European countries, preferences for FOLFOX- and FOLFIRI-containing regimens in first-line and second-line treatment were also reversed between Germany and Spain on the one hand and Italy and France on the other [83]. Nevertheless, our NMA showed that the efficacy and safety of FOLFOX- and FOLFIRI-containing regimens (FOLFOX+Bmab vs. FOLFIRI+Bmab, FOLFOX+Cmab vs. FOLFIRI+Cmab, and FOLFOX+Pmab vs. FOLFIRI+Pmab) were not significantly different.
Although adding anti-VEGF therapy such as Bmab to chemotherapy regimens showed significant OS benefits in the ARTIST trial [38], the treatment effect on OS was not reported in other trials [17,25,34,57]. A previous NMA reported consistent findings of comparable OS and grade ≥3 AEs for FOLFOX/FOLFIRI/FOLFOXIRI/CAPEOX plus Bmab vs. chemotherapy alone, although FOLFOX/FOLFIRI plus Bmab resulted in a significantly better disease control rate and PFS than FOLFOX/FOLFIRI alone [8]. Wu et al. recently reported nonsignificant differences in OS between chemotherapy+Bmab and chemotherapy alone in KRAS wild-type (HR = 1.17, 95% CI = 0.93-1.48) and RAS wild-type (HR = 0.88, 95% CI = 0.63-1.23) mCRC patients, although the number of individual studies was small [84].
In this study, we did not observe any significant differences in OS among patients who received chemotherapy plus Bmab, Cmab, or Pmab. However, a pooled analysis by tumor side suggested Cmab and Pmab for left-sided mCRC and Bmab for right-sided mCRC [84]. Additionally, the effect of anti-EGFR therapies differed according to the presence of KRAS or NRAS mutations [85]. While Cmab and Pmab significantly prolonged OS or PFS among patients with RAS wild-type mCRC, survival outcomes tended to be worse among patients with RAS mutations [85]. Adding VEGF or EGFR inhibitors to chemotherapy showed similar effects in the first-line treatment of RAS wild-type mCRC [85]. Cmab- and Pmab-based therapies showed significant OS improvements of 25% and 32%, respectively, compared with chemotherapy+Bmab among patients with KRAS wild-type tumors, but not among those with RAS wild-type tumors [84].
In the present study, we took advantage of MVRE models, which incorporate the treatment effects on surrogate endpoints into the estimation of the final clinical outcome. Although the treatment effects were consistent with those of the previous NMA [8], they tended to be closer to the null in our study because we additionally included treatment effects on OS predicted for RCTs that did not report the HR for OS or in which OS had not been reached. We also considered whether each treatment was used for a primary or secondary indication in the meta-regression model to adjust for the effect of treatment line.
Despite its strengths, this study has some limitations. Subgroup analyses by tumor side or genotype were not performed. We also grouped treatments by regimen components, regardless of schedule (sequential or continuous administration, doses, and order) and route of drug administration (bolus or infusion). We were unable to investigate the impact of these parameters, although dose reductions due to side effects have been reported not to affect survival in chemotherapy for colon cancer [86]. Furthermore, the type of chemotherapy chosen can depend on the site or on regional hospital preferences when standard treatments show no differences. Finally, publication bias was not assessed because of the small number of head-to-head RCTs for each treatment comparison.

Conclusions
In summary, we found a significant relationship between the treatment effect on the correlated outcome of OS and the treatment effects on surrogate endpoints. The findings of the efficacy and safety comparisons may support the selection of appropriate treatments in clinical practice.

Conflicts of Interest:
The authors declare no conflict of interest.