Disease-free survival as a surrogate endpoint for overall survival in adjuvant trials of pancreatic cancer: a meta-analysis of 20 randomized controlled trials

We aimed to assess whether disease-free survival (DFS) could serve as a reliable surrogate endpoint for overall survival (OS) in adjuvant trials of pancreatic cancer. We systematically reviewed adjuvant randomized trials for non-metastatic pancreatic cancer after curative resection that reported a hazard ratio (HR) for DFS and OS. We assessed the correlation between treatment effect (HR) on DFS and OS, weighted by sample size or precision of hazard ratio estimate, assuming fixed and random effects, and calculated the surrogate threshold effect (STE). We also performed sensitivity analyses and a leave-one-out cross-validation approach to evaluate the robustness of our findings. After screening 450 relevant articles, we identified a total of 20 qualifying trials comprising 5170 patients for quantitative analysis. We noted a strong correlation between the treatment effects for DFS and OS, with a coefficient of determination of 0.82 in the random effect model, 0.82 in the fixed effect model, and 0.80 with sample size weighting; the robustness of this finding was further verified by the leave-one-out cross-validation approach. Sensitivity analyses with restriction to phase 3 trials, large trials, trials with mature follow-up periods, and trials with adjuvant therapy versus adjuvant therapy strengthened the correlation (0.75 to 0.88) between DFS and OS. The STE was 0.96 for DFS. Therefore, DFS could be regarded as a surrogate endpoint for OS in adjuvant trials of pancreatic cancer. In future similar adjuvant trials, a hazard ratio for DFS of 0.96 or less would predict a treatment impact on OS.


Background
Pancreatic cancer is one of the few malignant tumors with increasing incidence and mortality in both sexes [1], and it is predicted to become the third leading cause of death in the European Union in 2020 [2]. Fewer than 20% of pancreatic cancer patients present at a localized, resectable stage at their first visit, and curative resection remains the only chance of cure for these patients. Progress in surgical techniques in recent years has likely minimized postoperative complications, which is regarded as an important factor in long-term survival [3,4]. However, in the absence of adjuvant therapy, approximately 90% of patients suffer distant or local relapse within 5 years after curative resection, and curative resection alone yields a 5-year overall survival (OS) of only approximately 8 to 13% [5][6][7]. Thus, effective adjuvant therapies are required to reduce this risk.
The gold standard endpoint in adjuvant trials of pancreatic cancer is OS, which has the advantage of being simple and reliable to measure, straightforward to interpret, and clinically useful. However, this endpoint has disadvantages: it requires many patients and a lengthy follow-up to detect statistically significant differences. In addition, its estimates are potentially diluted by non-cancer deaths and by subsequent therapies after recurrence. Therefore, reliable endpoints that could be used as surrogates for OS in pancreatic cancer could shorten the follow-up period and reduce the cost of drug development. Among them, disease-free survival (DFS) is a reasonable potential surrogate endpoint for OS in the adjuvant setting of pancreatic cancer. Several meta-analyses have validated DFS as a surrogate for OS in lung cancer [25], gastric cancer [26] and colorectal cancer [27]. Although Petrelli et al. reported that DFS cannot represent a reliable surrogate endpoint for OS in adjuvant trials of pancreatic cancer [28], the number of included trials in that study was comparatively small (12 trials); additionally, one of the 12 trials was an adjuvant trial of periampullary adenocarcinoma (the ESPAC-3 periampullary cancer randomized trial) rather than pancreatic cancer [29], which could confound the results. Therefore, with the accumulated evidence of 20 randomized controlled trials, we performed a rigorous meta-analysis to evaluate whether DFS could be used as a surrogate endpoint to measure the effect of adjuvant therapy in pancreatic cancer.

Search strategy and data collection
In December 2018, we systematically searched Medline and Embase using the key words "pancreatic neoplasm", "chemotherapy", "radiotherapy", and "chemoradiotherapy", limited to "clinical trial", "controlled clinical trial" or "randomized controlled trial". We also searched the ClinicalTrials.gov and Cochrane Library databases, and manually searched the references of the included trials and abstracts of two conference proceedings (the 2019 American Society of Clinical Oncology [ASCO] annual meeting and the European Society for Medical Oncology [ESMO] 2018 congress) to retrieve additional studies.
Inclusion criteria were randomized controlled trials of adjuvant treatment for non-metastatic pancreatic cancer after curative resection, reporting hazard ratio (HR) for OS and DFS in full-text publication. We excluded reviews, abstracts, case reports, studies that were not published as full-text articles and studies with cohorts of less than 50 patients. For each trial, the following data were collected by two independent investigators (RCN and SQY): OS and DFS results, final publication year, trial conduct period, type of study (phase II or III), staging information, treatment arms, number of patients, primary endpoint, and median follow-up time.
Table footnotes: This trial was designed as a two-by-two factorial design to test two comparisons: chemoradiotherapy, and chemotherapy. Patients were randomly assigned to chemoradiotherapy-alone group (n = 73), chemotherapy-alone group (n = 75), both chemoradiotherapy and chemotherapy group (n = 72), and observation group (n = 69). c These trials were analyzed by per-protocol population. d The long-term outcomes of CONKO-001 trial

Statistical analysis
This analysis is at the trial level throughout, with no individual patient-level data being incorporated. We computed the correlation between the treatment effects (HR) on DFS and OS through a linear regression model [27]. To account for differences between studies in size and in the precision of HR estimates, we weighted the analysis proportionally to the study sample size or to the precision of the observed treatment effects, giving three weighting strategies (sample size, fixed effect, and random effect) [30]. The fixed effect meta-analysis presumes that a common treatment effect exists across all trials and uses the estimated inverse variance as weights, whereas the random effect meta-analysis permits the treatment effect to vary from trial to trial and incorporates the potential among-trial variation of effects into the weights. Following A'Hern et al. [31], we down-weighted the sample size if trials reported more than two treatment arms.
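The three weighting strategies described above can be illustrated with a minimal Python sketch. It is not the authors' code (their analyses were run in R). The function names, the recovery of the standard error from a reported 95% CI, the DerSimonian-Laird estimator for the among-trial variance, and the choice of the OS estimate's precision for the weights are all assumptions made for illustration. The input values are invented and are not taken from the included trials, and the down-weighting of multi-arm trials is not implemented.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR), recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def dl_tau2(effects, ses):
    """DerSimonian-Laird estimate of among-trial variance (assumed estimator)."""
    w = [1 / s ** 2 for s in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(effects) - 1)) / c)

def weighted_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x.

    Returns (intercept, slope, weighted R^2).
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)

# Invented trial-level inputs for illustration only:
# (HR for DFS, HR for OS, OS 95% CI lower, OS 95% CI upper, patients)
trials = [
    (0.60, 0.66, 0.50, 0.87, 350),
    (0.75, 0.80, 0.62, 1.03, 280),
    (0.85, 0.86, 0.70, 1.06, 420),
    (0.95, 0.98, 0.78, 1.23, 150),
    (1.05, 1.02, 0.80, 1.30, 200),
]

# The regression is run on the log hazard ratio scale.
x = [math.log(t[0]) for t in trials]
y = [math.log(t[1]) for t in trials]
ses = [se_from_ci(t[2], t[3]) for t in trials]
tau2 = dl_tau2(y, ses)

weights = {
    "sample size": [float(t[4]) for t in trials],
    "fixed effect": [1 / s ** 2 for s in ses],            # inverse variance
    "random effect": [1 / (s ** 2 + tau2) for s in ses],  # adds among-trial variance
}

for name, w in weights.items():
    intercept, slope, r2 = weighted_fit(x, y, w)
    print(f"{name}: slope={slope:.2f}, weighted R^2={r2:.2f}")
```

When the estimated among-trial variance is zero, the fixed effect and random effect weights coincide, which is one way the two models can return the same coefficient of determination.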
We calculated the weighted coefficient of determination (R²) to quantify the variation explained by the surrogate endpoint, with an R² value higher than 0.75 regarded as a strong correlation, higher than 0.5 as good, higher than 0.25 as moderate, and equal to or lower than 0.25 as poor. We performed several sensitivity analyses that restricted the analyses to phase 3 trials, large trials (≥200 patients included), trials with mature follow-up periods (median follow-up ≥24 months), trials with adjuvant therapy versus observation, and trials with adjuvant therapy versus adjuvant therapy to verify the robustness of our findings. We also calculated the surrogate threshold effect (STE), which was defined as the minimum treatment effect on the surrogate necessary to predict an OS benefit [32]. The upper limit of the confidence interval for the estimated surrogate treatment effect should fall below the STE to predict a non-zero effect on OS. For each meta-analysis, we applied an internal validation through leave-one-out analysis to evaluate the prediction accuracy of the surrogate model [33]. Each trial was left out once, and the surrogate model was built with the other trials. This model was then re-applied to the left-out trial, and a 95% prediction interval was calculated to compare the predicted and observed treatment effect on OS. We used R version 3.4.0 for all statistical analyses (http://www.r-project.org).
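The STE and leave-one-out steps can be sketched in Python as follows. This is a simplified illustration and not the authors' R code: it assumes an unweighted ordinary least squares fit on the log HR scale, whereas the analysis described here is weighted. The helper names, the bisection search, and the input values are hypothetical. The critical values are the standard two-sided 95% Student t quantiles for n - 2 degrees of freedom and are passed in by the caller.

```python
import math

def ols_fit(x, y):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {"a": intercept, "b": slope, "s2": sse / (n - 2),
            "n": n, "x_bar": x_bar, "sxx": sxx}

def prediction_interval(fit, x0, t_crit):
    """(lower, predicted, upper) log HR on OS for a new trial at log HR x0."""
    centre = fit["a"] + fit["b"] * x0
    spread = 1 + 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    half = t_crit * math.sqrt(fit["s2"] * spread)
    return centre - half, centre, centre + half

def surrogate_threshold(fit, t_crit, lo=math.log(0.2), hi=0.0, iters=80):
    """HR on DFS at which the upper prediction limit for the HR on OS is 1.

    Found by bisection on the log scale; returns None if the upper limit
    does not change sign between lo and hi.
    """
    upper = lambda x0: prediction_interval(fit, x0, t_crit)[2]
    if not (upper(lo) < 0 < upper(hi)):
        return None
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upper(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.exp(lo)

def leave_one_out(x, y, t_crit):
    """For each trial, refit without it and test whether its observed
    log HR on OS falls inside the 95% prediction interval."""
    hits = []
    for i in range(len(x)):
        fit = ols_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        low, _, high = prediction_interval(fit, x[i], t_crit)
        hits.append(low <= y[i] <= high)
    return hits

# Invented hazard ratios for eight hypothetical comparisons.
hr_dfs = [0.55, 0.65, 0.75, 0.80, 0.88, 0.95, 1.00, 1.10]
hr_os = [0.62, 0.70, 0.80, 0.82, 0.92, 0.94, 1.03, 1.08]
x = [math.log(h) for h in hr_dfs]
y = [math.log(h) for h in hr_os]

fit = ols_fit(x, y)
ste = surrogate_threshold(fit, t_crit=2.447)  # t quantile for 8 - 2 = 6 df
hits = leave_one_out(x, y, t_crit=2.571)      # 7 trials per refit, 5 df

print("STE (HR on DFS):", ste)
print("Observed OS effect inside prediction interval:", sum(hits), "of", len(hits))
```

Working on the log scale keeps the regression linear in the treatment effects; the threshold is converted back to a hazard ratio only at the end.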

Results
After the systematic literature review, we identified 20 qualifying trials (5 phase 2 trials and 15 phase 3 trials) comprising 5170 patients for final analysis (Fig. 1). The ESPAC-1 trial [10] was designed as a two-by-two factorial design to evaluate the role of adjuvant chemoradiotherapy and chemotherapy independently, with 75 patients randomly assigned to the chemotherapy group, 73 patients to the chemoradiotherapy group, 72 patients to the chemoradiotherapy and chemotherapy group, and 69 patients to the observation group. Neoptolemos et al. reported the interim result of the ESPAC-1 trial in 2001 [40], and updated the long-term survival outcomes after a median follow-up of 47.0 months [10]; thus, we included the latter publication in the present study. The CONKO-001 trial was first published in 2007 [16] and was updated in 2013 [7]. Overall, the 20 trials included 23 comparisons for quantitative analysis, among which nine comparisons reported improvement in OS, and eleven comparisons reported improvement in DFS (Table 2).
Fig. 2 Correlation between treatment effects on DFS and OS. Each trial is represented by a circle, with the size of the circle being proportional to the sample size. The blue line represents the 95% prediction limit of the regression line (red line). STE = 0.96. OS, overall survival; DFS, disease-free survival; STE, surrogate threshold effect; HR, hazard ratio
We first assessed the degree of association using the sample size weighting strategy, and observed that the correlation between the treatment effects on DFS and OS was strong (R² = 0.80, 95% CI: 0.49 to 0.99) (Fig. 2). Additionally, weighting by the precision of the treatment effects, whether assuming no difference in treatment effect between trials (fixed effect model) or permitting such differences (random effect model), slightly strengthened the degree of association (fixed effect: 0.82, 0.52 to 0.99; random effect: 0.82, 0.52 to 0.99). We then calculated an STE of 0.96, indicating that a future adjuvant trial would need the upper limit of the confidence interval of the DFS HR to fall below 0.96 to predict, with 95% confidence, an OS benefit.
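The decision rule implied by the STE can be written as a short check. This is a sketch only: the function name and the two example trials are hypothetical, and the default threshold of 0.96 is the STE reported in this analysis.

```python
def predicts_os_benefit(dfs_hr_ci_upper, ste=0.96):
    """True when the upper 95% confidence limit of the DFS hazard ratio
    lies below the surrogate threshold effect (STE)."""
    return dfs_hr_ci_upper < ste

# Two hypothetical future trials: (DFS HR, 95% CI lower, 95% CI upper).
trial_a = (0.80, 0.68, 0.94)
trial_b = (0.85, 0.72, 1.00)

for label, (hr, low, high) in (("A", trial_a), ("B", trial_b)):
    print(f"Trial {label}: DFS HR {hr} ({low} to {high}) ->",
          predicts_os_benefit(high))
```

The upper limit for trial A (0.94) lies below 0.96, so an OS benefit would be predicted. The upper limit for trial B (1.00) does not, even though its point estimate is below 0.96.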
Given the potential heterogeneity of the included studies, we performed several sensitivity analyses (Table 3), and noted that restriction of the analysis to phase 3 trials would strengthen the correlation between DFS and OS (0.82 to 0.83). When we restricted the analyses to trials with adjuvant therapy versus observation, the degree of association between DFS and OS was not strong (0.68 to 0.73) (Fig. 3a). Nonetheless, we recognized that adjuvant therapy versus adjuvant therapy rather than observation is now the standard design setting for pancreatic cancer; thus, we then restricted the analyses to trials with adjuvant therapy versus adjuvant therapy, and observed a very strong correlation between DFS and OS (0.89 to 0.93). Other sensitivity analyses that restricted the analyses to large trials and trials with mature follow-up periods also exhibited strong correlations between DFS and OS (0.80 to 0.87) (Fig. 3b). Finally, we performed a leave-one-out cross validation approach to assess the accuracy of DFS in predicting OS. We noted that the observed HR for OS fell between the limits of the 95% prediction intervals in 22 of 23 comparisons, indicating that the treatment effect on DFS is a reliable predictor of OS (Fig. 4).

Discussion
The point at which a potential surrogate endpoint could be theoretically validated has been extensively discussed [41]. The correlation approach has been widely adopted to validate surrogate endpoints in locally advanced lung cancer [25], gastric cancer [26,42] and colorectal cancer [27]. In the present study, we included a total of 20 high quality adjuvant randomized controlled trials to evaluate the surrogacy of DFS for OS in pancreatic cancer. Our findings demonstrated that the correlation between DFS and OS was strong (0.80 to 0.82), irrespective of the applied weighting strategy. Sensitivity analyses that were restricted to phase 3 trials, large trials, trials with mature follow-up periods, and trials with adjuvant therapy versus adjuvant therapy also yielded strong or very strong correlations (0.80 to 0.93) between DFS and OS. Therefore, we propose the use of DFS as the surrogate endpoint for OS in adjuvant trials of pancreatic cancer.
Fig. 3 Sensitivity analyses (Table 3) restricted to trials with adjuvant therapy versus observation (a) and trials with adjuvant therapy versus adjuvant therapy (b). Each trial is represented by a circle, with the size of the circle being proportional to the sample size. The blue line represents the 95% prediction limit of the regression line (red line). OS, overall survival; DFS, disease-free survival; HR, hazard ratio
Although recent advances in adjuvant chemotherapy have translated into a substantial survival benefit for pancreatic cancer, a large number of treated patients still suffer relapse or metastasis; thus, new therapeutic strategies are urgently needed. Clinicians are now evaluating the therapeutic effect of more intensive adjuvant chemotherapy, adjuvant targeted therapy and immunotherapy in pancreatic cancer after curative resection.
It is well recognized that OS is the standard endpoint for clinical trials; however, phase 3 trials that use OS as the endpoint are time consuming, which delays the clinical application of new therapeutic strategies. Therefore, reliable surrogate endpoints for OS are urgently needed in adjuvant trials of pancreatic cancer; among the candidates, DFS is the most reasonable, and it has been set as the primary endpoint in several phase 3 trials [7, 17-19, 23, 37]. A previous meta-analysis reported that the correlation between DFS and OS was not strong enough to support DFS as a reliable surrogate endpoint for OS in adjuvant trials of pancreatic cancer [28]; nonetheless, it included only 12 trials, one of which was conducted in the adjuvant setting of periampullary cancer rather than pancreatic cancer [29]. Therefore, in the present meta-analysis, we applied more rigorous criteria and three weighting strategies to address this issue. Our findings revealed that the degree of association between DFS and OS was strong, which was further verified through extensive sensitivity analyses and a leave-one-out validation approach. We believe that the robust correlation between DFS and OS in adjuvant therapy of pancreatic cancer is mainly attributable to the fact that pancreatic cancer is an aggressive tumor and that the subsequent lines of therapy are limited if patients develop relapse or metastasis.
Given that adjuvant chemotherapy has shown superior survival outcomes to observation for pancreatic cancer, adjuvant chemotherapy, including gemcitabine-based or S-1-based regimens, rather than observation would be set as the control arm in adjuvant trials. Interestingly, we found that the correlation between DFS and OS was not strong (0.68 to 0.73) with restriction to trials with adjuvant therapy versus observation; nonetheless, we noted a very strong correlation between DFS and OS when we restricted the analysis to trials with adjuvant therapy versus adjuvant therapy (0.89 to 0.93). Therefore, in future adjuvant trials of pancreatic cancer, DFS could serve as a robust surrogate endpoint for OS.
STE is an alternative measure for surrogate endpoint validation [32]. The closer the STE of a surrogate endpoint is to 1, the easier it is to predict an OS benefit. In the present meta-analysis, the STE was 0.96 for DFS, indicating that an adjuvant trial in pancreatic cancer producing a hazard reduction of at least 4% for disease recurrence or death could be expected to yield a statistically significant OS benefit.
Fig. 4 Leave-one-out cross-validation analysis of the prediction of OS by treatment effect on DFS: observed HR for OS for left-out trial vs. predicted HR for OS and 95% prediction interval for predicted HR for OS. To assess model accuracy, a leave-one-out cross-validation strategy was used: each unit of analysis was left out once, and the linear model was then constructed from scratch using the remaining data [33]. This model was then re-applied to the left-out study in order to compare the predicted and observed treatment effect on OS. Based on the linear regression models, a 95% prediction interval was calculated to compare the predicted and observed treatment effect on OS. OS, overall survival; DFS, disease-free survival; HR, hazard ratio
There are several limitations that should be noted. First, the data for our analysis were extracted at the trial level rather than the individual patient level; therefore, potential publication bias cannot be excluded. Second, the included trials spanned nearly three decades, and the ascertainment of DFS was mainly influenced by the imaging examinations and surveillance intervals, which may have changed considerably over time and among trials. Third, long-term follow-up was not available from all trials included in our analysis. Pancreatic cancer is a relatively aggressive malignancy with severe heterogeneity; thus, short follow-up in adjuvant trials will result in fairly wide confidence intervals of the HR for the treatment effects.
In the sensitivity analysis, the correlation between DFS and OS remained strong (R² = 0.75) when we included only trials with median follow-up > 24 months. Fourth, the included trials in our analysis comprised a wide range of therapeutic strategies, including adjuvant chemotherapy, radiation therapy, chemoradiotherapy, chemoimmunotherapy and targeted treatment. Although we performed sensitivity analyses to reduce the potential effect of this treatment heterogeneity, the results of our analysis should be interpreted with caution. Therefore, we strongly recommend that authors of individual trials share their data so that the results of our analysis can be further verified with individual-patient data.

Conclusions
In conclusion, our analysis suggested that DFS could serve as a reliable surrogate endpoint for OS in adjuvant trials of pancreatic cancer. In future similar adjuvant trials, a hazard ratio for DFS of 0.96 or less would predict a treatment impact on OS. However, these results should be further verified by individual-patient data analysis.