Sample size re-estimation without un-blinding for time-to-event outcomes in oncology clinical trials

Sample size re-estimation is essential in oncology studies. However, the use of blinded sample size reassessment for survival data has been rarely reported. Based on the density function of the exponential distribution, an expectation-maximization (EM) algorithm of the hazard ratio was derived, and several simulation studies were used to verify its applications. The method had obvious variation in the hazard ratio estimates and overestimation for the relatively small hazard ratios. Our studies showed that the stability of the EM estimation results directly correlated with the sample size, the convergence of the EM algorithm was impacted by the initial values, and a balanced design produced the best estimates. No reliable blinded sample size re-estimation inference can be made in our studies, but the results provide useful information to steer the practitioners in this field from repeating the same endeavor.


Introduction
Given the life-threatening nature and the unmet medical needs of many types of cancer, the drug development process in the field of oncology should be accelerated. To this end, many clinical trial approaches have been proposed. The sample size required for a clinical trial should be sufficiently large to provide a reliable answer to the questions addressed [1] , and the method by which the sample size is calculated should be provided in the protocol. This method is the most basic requirement for planning all studies because it is critical to the success of a study and pertains to budget considerations. For fixed sample size designs, there is a risk that expected trial outcomes may not obtain adequate power because some uncertainty is usually associated with the parameters in the planning phase. This deficit can be remedied by re-estimating the parameters during the interim analysis and modifying the initially planned sample size if necessary [2][3] .
Although various methods have been proposed and used for sample size, a basic debate remains: Should the interim data be examined with the treatment group in a blinded or un-blinded manner? Regulatory authorities certainly favor blinded sample size reassessment (BSSR) (CPMP Working Party on Efficacy of Medicinal Products [4] ; ICH-E9 Expert Working Group [1] ; European Medicines Agency/Committee for Medicinal Products for Human Use [5] ) because it better preserves study integrity. Fortunately, the work of Gould and Shih [6] has shown that un-blinding is not necessary to efficiently estimate within-group variance.
Nevertheless, previous studies of sample size reestimation addressed various statistical and practical aspects of this approach, such as BSSR in noninferiority and equivalence trials with normally distributed outcome variables and hypotheses formulated in terms of the ratio and difference of means [7] , BSSR in multi-armed clinical trials when the outcome variable is normally distributed [8] , BSSR with negative binomial counts in superiority and non-inferiority trials [9] , and BSSR with count data in multiple sclerosis [10] , etc.
However, significantly longer progression-free survival (PFS), overall survival (OS) and time to progression (TTP) are well known to be widely used in oncology studies as primary endpoints to evaluate the efficacy of treatment. Thus, the response variable of most oncology clinical trials is survival time. To the best of our knowledge, the use of BSSR in this context has rarely been reported. Here, we tried to develop an EM approach for BSSR with exponentially distributed outcomes. The performance and applicability of this procedure are described based on several simulation studies.

Derived EM algorithm
The sample-size formula for an oncology clinical trial can be simplified if it is expressed as the number of deaths required rather than the number of patients. Suppose that a two-sided test will be performed with a significance level of α/2 and a power of 1-β for a hazard ratio Δ. Let z 1-a/2 and z 1-β be the 1-α/2 and 1-β percentiles of the normal distribution, respectively, and let P A and P B be the proportion of the patients randomized to treatments A and B, respectively. Then, the total number of deaths required is given by the following expression [11] : In general, only Δ in the above formula is unknown and needs to be estimated based on previous studies. However, if Δ is much larger or smaller than estimated, the sample size will need to be re-estimated without breaking the randomization codes. Gould and Shih proposed a EM algorithm-based procedure for blinded variance estimation for normally distributed endpoints [6,12] ; we extended this EM algorithm for Δ estimation to exponentially distributed endpoints without un-blinding the treatment group at the interim stage.
As the treatments are not identified, any interim observation, x i , i = 1,…,n, could be in either treatment group such that the treatment assignments are "missing at random". Let t i denote the treatment group indicator, e.g., t i = 1(0) indicates that sample i is in treatment group 1 (group 2); t 1,… t n are independent random variables with P(t i = 1) = q. The density function of the exponential distribution is f ðtÞ ¼ le -lt , and l is the scale parameter. Given t i , x i (i = 1,…,n) is distributed as follows, with density Therefore, the expression for the expected t i given x i is Then, the log-likelihood of the interim observations is After taking the partial derivative of the above loglikelihood, the maximum likelihood estimates of l 1 /l 2 are The EM algorithm for estimating l 1 /l 2 proceeds as follows. For example, if θ is assumed to be 0.5, then the "E" step consists of substituting the "initial" estimates of l 1 and l 2 into formula (3) to obtain provisional values for the expected value of t i . "M" consists of obtaining maximum likelihood estimates of l 1 /l 2 according to (5) after replacing t i in (5) with the provisional expectations. The "E" and "M" steps are repeated until the value of l 1 /l 2 stabilizes.

Simulations
We use simulations to evaluate performances of the proposed procedures. Specifically, various Δ estimators were compared for different scenarios. The censoring was not considered in the following simulations because the derived EM algorithm was used to re-estimate the number of events. For the EM algorithm, the recursive computation was continued until successive estimates of l 1 and l 2 differed by less than 0.001. Besides, we also designed a simulated clinical trial to investigate the power and illustrate BSSR procedure.
Firstly, the sample size requirement of the above EM algorithm was investigated. We considered the situation of equal samples for each distribution, population parameters were set as l 1 = 0.1, 0.12, 0.15, 0.2, 0.25, 0.3 and l 2 = 0.1, and several scenarios for the number of events per group differed from 5 to 800. The initial values were set to 1.0 and 0.8. In addition, the complete proportion of the total sample size in interim analysis is also an impact factor related to sample size. We defined Δ= 0.66, 2.0, 3.0 to do investigation. It was set α = 0.05, power = 90%, we calculated the number of events using (1) for each scenario with 100% complete proportion, and the estimated values are listed in column "n" in the first line of Table 1. The other "n" cells are calculated according to the defined complete proportion; for example, the 80% complete proportion means 80% of the total number of events are completed in interim stage, n is 103 for scenario l 1 = 0.10 and l 2 = 0.15 which is calculated by 129*80%.
Secondly, as the negligible impact of initialization on the EM estimation procedure, we investigated its effects by calculating the Δ estimates according to the above EM procedure for 1,000 repeated simulations with different values of the initialization constants of l* 1 and l* 2 . We selected equal number of events per group (n = 50) for the internal pilot study.
Thirdly, the balanced design had been considered in most sample size estimation studies. Here, we investigated the impact of the sample allocation ratio on the above EM procedure. It was set with α = 0.05, power = 90%, and we calculated the number of events using (1) for each allocation ratio and the estimated values listed in column "n" of Table 2. As some cells of "n" had the n 1 or n 2 values which were smaller than 20, considering the impact of the sample size, the number of events was expanded to at least 50 for one of the groups to obtain column "n*" in Table 2. Δ was then re-estimated based on "n" and "n*", respectively.
In a simulation study, we investigated the power and BSSR procedure with dummy randomized clinical trials in which the survival data of the two treatment groups were compared. We obtained the independent and identically distributed time-to-event observations from exponential distributions and the censored data from uniform distribution.
In our design, an initial sample size (n 0 ) was carried out based on assumed parameters. Although this sample constituted the final sample size for the fixed design, the sample size could be adjusted using BSSR based on half of the initially sample size, i.e., 50% information time.
The following parameter values were considered: assumed median survival time T = 10 and 5 for the treatment group and the control group, respectively; therefore, the scale parameters l of the exponential distributions were calculated to be 0.07 and 0.14 based on the formula l = log (2)/T. Two different scenarios were considered for the true median survival time, T' = 10, 5 and T' = 12, 5. Furthermore, the design was balanced and the significance level and target power were the usual α = 0.05 (two-sided) and 1-β = 0.9.
The number of failures for the fixed design was estimated to be 45 [13][14] per group based on the assumed parameters, and the internal pilot study included 23 failures per group for an information time of 50%. The assumed censor rates were 0, 20%, and 40%, and then the corresponding event rates were 100%, 80%, and 60%, respectively. Accordingly, these values resulted in fixed sample sizes (n 0 ) of 45, 56, and 75, respectively, based on sample size = number of events/(event rate). The true parameters were used to generate simulated data with a fixed sample size (n 0 ). According to EM algorithm, the blinded re-estimated sample size (n 1 ) was based on the 23 (the half of 45) events from an assumed internal pilot study which is defined as 50% information time of the integrity trial [15] . Subsequently, 1,000 trails were simulated from each scenario.

Results
Sample size requirement Fig. 1 shows the EM re-estimation of Δ for different number of events; 1,000 simulation replications were performed for each situation. The estimation results stabilized as the number of events increased. All results were overestimated when the number of events per group was less than 30. The smaller Δ, the higher sample size was required. More than 100 sample sizes was needed when Δ was 1.5. For Δ = 1.0, there was about 16.05% overestimation even if sample size reached 800, this result was similar with Xie's research (13.54%) [16] . Table 1 also shows that the estimates were impacted by event numbers (sample size). For l 1 = 0.10 and l 2 = 0.15, the estimates were all acceptable even if the completed proportion was 20%, which nearly satisfied the sample size requirement of this EM algorithm. On the contrary, the estimates significantly deviated from the real values for l 1 = 0.3 and l 2 = 0.1 if the completed proportion was less than 80% due to the number of events was insufficient for the EM algorithm.

Initial values
The simulation results are presented in Table 3. Specifically, the estimates obtained from the EM procedure depended on the initialization. It shows that the Δ estimates exceeded 1 if l* 1 >l* 2 and were less than 1 if l* 2 < l* 1 . Fortunately, the estimated Δ values were very close to the true values when the initial values satisfied l* 1 >l* 2 with the true values l 1 >l 2 , and l* 1 < l* 2 with l 1 < l 2 . The estimated number of the required events using (1) was equal for both Δ = l 1 /l 2 and Δ = l 2 /l 1 . Therefore, the choice of l* 1 >l* 2 or l* 1 < l* 2 did not affect the re-estimation of the number of events. Table 3 also shows the overestimation results for Δ = 1 (l 1 = 0.2, l 2 = 0.2), and the choice of initial values did not affect these overestimated results. Table 2 results indicate that the EM estimates of Δ were impacted by the sample allocation ratio. The balanced design produced the best estimates. The unbalanced designed estimates, including 2:1, 1:2, 3:1 and 1:3, were slightly larger or smaller, and the sample size difference between the two groups directly correlated with the estimated values. More unbalance would get more biased estimation although the number of events increased to at least 50 for one group in "n*" column.  l 1 and l 2 are exponential distribution parameters, Δ is l 1 /l 2 , n* is expanded sample size. Table 4 shows the statistical power tested by the logrank test and exponential regression for fixed design and sample sizes re-estimated adjusted design. For the true median survival times of T' = 12 and 5, the original sample sizes were 45 for a 100% event rate, 56 for an 80% event rate and 75 for a 60% event rate, as shown in column "n 0 ". Thus, these sample sizes were overfull to obtain 90% power for the log-rank test or exponential regression. Therefore, the power of the log-rank test and exponential regression exceeded 90% significantly. After a Δ re-estimation using the above EM algorithm, the number of events was re-calculated. Based on the pre-defined event rates (60%, 80% and 100%), the corresponding sample sizes were re-calculated and were shown in column "n 1 ". From the output, the reestimated sample sizes "n 1 " were near the actual requirement, the power for the adjusted design was closer to 90% for a 100% event rate.

Example (simulated)
For true median survival times of T' = 10 and 5, "n 0 " was the correct required sample size, and the values of re-estimated sample sizes "n 1 " were similar to the "n 0 " values; the power for both fixed design and adjusted design was close to 90% for a 100% event rate. The variation details of the simulated EM estimated values of λ 1 and λ 2 for each scenario are presented in Fig. 2.
Notably, the power for both the log-rank test and exponential regression inversely correlated with the event rate in each scenario because the sample size calculation was based on the event numbers/(event rate). Lower event rates required larger samples, which increased the power of the test.

Discussion
The topic of sample size reassessment during an ongoing trial has become very popular in recent years. The inflation of the type I error rate and the loss of power has long been an intractable problem for sample size adjustment in an internal pilot study with adaptive design. In the ICH E9 guidelines, it is reflected by the following requirement for planned sample size adjustment: "The step taken to preserve blindness and consequences, if any, for the type I error and the width of confidence intervals should be explained". The calculation of the actual type I error rate for the blinded case was previously derived for the t-test situation [17] , and the results showed that the nominal inflation in the type I error rate did not significantly differ for the blinded sample size recalculation in the unrestricted design. Over the last decade, a number of studies had addressed the problem of BSSR for an ongoing clinical trial. These documents noted that the assumptions of sample size calculation should be reassessed with the blinded data and that the effect on the type I error rate can be controlled [18] . To our knowledge, this work is restricted to the normally or binomially distributed data.
Cancer is currently one of the major diseases affecting human health, and anti-cancer drug becomes the focus of research in the pharmaceutical industry. Survival time-related outcomes such as PFS and OS are generally used as primary endpoints in oncology studies to assess efficacy. Thus, oncology trials usually require long follow-up periods and include a planned interim analysis, and sample size re-estimation is also essential. In this paper, we tried to extend BSSR method for exponentially distributed survival data based on EM algorithm.
The blinded data of an ongoing trial is from a mixture of two populations; one for the treatment group and the other for the control group. EM method described here is applied to estimate model parameters of the mixture distributions and therefore assess the hazard ratio. The derived EM estimation only can be used to re-estimate the number of events for oncology trials, and the censoring is ignored. Normally, it is acceptable in the clinical trial because the censor rate is always unknown and only assumed in the design stage. With the re- Table 3 EM re-estimation of Δ with different initial values on 1,000 runs Initial values l 1 = 0.2, l 2 = 0.1 l 1 = 0.1, l 2 = 0.2 l 1 = 0.2, l 2 = 0.2 l 1 = 0.3 l 2 = 0.1 l 1 and l 2 are exponential distribution parameters, Δ is l 1 /l 2 , l 1 * and l 2 * are initial values. Fig. 2 Kernel Density of λ 1 and λ 2 estimation. A and C are for Scenario 1-3 in Table 4, B and D are for Scenario 4-6 in Table 4. λ 1 and λ 2 are exponential distribution parameters, T' 1 and T' 2 are true median survival times, Δ' is true hazard ratio, Δ is re-estimated hazard ratio, n 0 is original sample size, n 1 is re-estimated sample size. estimated number of events in the interim stage, the total censor rate of the two treatments which is available without breaking blind, could be used to do sample size re-estimation, because sample size = the re-estimated number of events/(1-interim stage censor rate). On the other hand, only exponential distribution has been considered in our research, and Weibull distribution needs to be investigated in the future since it is more widely applied in survival analysis.
Our studies show that estimates from the EM method are highly variable, which coincides with the literature on interim analysis of treatment effects with blinded data and a normally distributed endpoint. Besides, the relatively small hazard ratios (Δ < 1.2) are overestimated according to this EM method.
Specifically, for the hazard ratios (Δ≥1.2), the EM estimation shown here is subject to a sample size requirement and the stability of the estimation results directly correlated with the number of events. The initial values for the parameters directly affect the convergence of the algorithm and the estimated results. The simulation shows that the initial values should be carefully selected, and the calculation of the optimum initial values requires further research. Moreover, the EM estimation described here is more suitable for a balanced design.
Due to the variation and overestimation of smaller hazard ratios, no reliable inference can be made on sample size re-estimation in our studies. The results from this paper provide useful information to steer the practitioners in this field from repeating the same endeavor. This should be of some relief to health authorities.