An optimised multi-arm multi-stage clinical trial design for unknown variance

Multi-arm multi-stage trial designs can bring notable gains in efficiency to the drug development process. However, for normally distributed endpoints, the determination of a design typically depends on the assumption that the patient variance in response is known. In practice, this will not usually be the case. To allow for unknown variance, previous research explored the performance of t-test statistics, coupled with a quantile substitution procedure for modifying the stopping boundaries, at controlling the familywise error-rate to the nominal level. Here, we discuss an alternative method based on Monte Carlo simulation that allows the group size and stopping boundaries of a multi-arm multi-stage t-test to be optimised according to some nominated optimality criteria. We consider several examples, provide R code for general implementation, and show that our designs confer a familywise error-rate and power close to the desired level. Consequently, this methodology will provide utility in future multi-arm multi-stage trials.


Introduction
With the cost of drug development increasing, study designs that can enhance the efficiency of clinical research are of great interest. One such class is that of group sequential designs [1]. This approach exploits the fact that data are accumulated over time by incorporating interim analyses at which the study may be stopped early, thereby reducing the expected sample size.
Recently, this methodology was extended to allow multiple treatments to be compared to a shared control [2]. These multi-arm multi-stage (MAMS) designs can bring sizeable gains in efficiency over conducting a series of single-stage two-armed trials [3]. Unfortunately, a limitation of this methodology in the case of normally distributed outcome data is that designs are usually determined under the supposition of known patient variance in response. Typically, this will not be the case at the design stage. Then, utilising test statistics that assume known variance will result in operating characteristics that differ from their nominal level if the true variance is not equal to the specified value.
For two-armed group sequential trials, several authors have suggested methods to broach this problem. These include a recursive algorithm [4], and a quantile substitution procedure [1,5]. The latter approach was also explored for MAMS trials, and demonstrated to more accurately control the familywise error-rate (FWER) to the desired level, at a small cost to the trial's power [6].
A Monte Carlo based procedure was also proposed for two-armed group sequential trials [7]. In this paper, we extend it to MAMS trials. Explicitly, we describe how the stage-wise group size and stopping boundaries can be optimised. Finally, using the TAILoR trial [2] as a motivating example, we compare the performance of our method to several other approaches.

Methods
We consider a MAMS trial with $K + 1$ arms, and a maximum of $J$ stages. Of the arms, $K$ (indexed $k = 1, \dots, K$) are to be compared to a single control arm (indexed $k = 0$). We test the following hypotheses
$$H_0^{(k)}: \tau_k = \mu_k - \mu_0 \le 0, \qquad H_1^{(k)}: \tau_k > 0, \qquad k = 1, \dots, K.$$
Here, $\mu_k$ is the mean response of patients allocated to arm $k = 0, \dots, K$. We assume that in each stage, $n$ patients are allocated to each arm present in the trial. To allow for the early dropping of arms, we denote by $n_{kj}$ the actual number of patients allocated to arm $k = 0, \dots, K$ in stage $j = 1, \dots, J$. Thus, $n_{kj} \in \{0, n\}$. Designs with unequal allocation, or with two-sided null hypotheses, could be treated similarly.
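Denoting by $X_{kji}$ the response of the $i$th patient, in treatment arm $k$, in stage $j$, we assume that the $X_{kji}$ are independent and distributed as $X_{kji} \sim N(\mu_k, \sigma^2)$. Extending [7], let $\bar{X}_{kj}$ denote the mean of the responses accrued in arm $k$ across stages $1, \dots, j$, and $\hat{\sigma}_j^2$ the estimate of $\sigma^2$ formed from the responses accrued by that point, with the convention that $X_{kji} = 0\ \forall i$ if $n_{kj} = 0$. At interim analysis $j$ the following test statistics are constructed for each experimental arm $k$ still present in the trial
$$Z_{kj} = \left(\bar{X}_{kj} - \bar{X}_{0j}\right)\sqrt{\frac{jn}{2\sigma^2}}.$$
When $\sigma$ is assumed known, the $Z_{kj}$ are together multivariate normal (henceforth, the $Z$-test statistics). With $\sigma$ replaced by its estimate $\hat{\sigma}_j$, the joint distribution of the resulting $t$-test statistics, $T_{kj} = Z_{kj}\,\sigma/\hat{\sigma}_j$, does not have a simple form. It is this that makes the determination of stopping boundaries for use with $t$-test statistics difficult.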
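As a small illustration of how the two kinds of statistic relate, the following Python sketch computes a $Z$-type and a $t$-type statistic for one experimental arm against control. It is a minimal sketch under stated assumptions, not the authors' code: the function name `arm_statistics` is hypothetical, equal cumulative sample sizes are assumed in both arms, and the variance is pooled over these two arms only, whereas the design described here would pool over all arms.

```python
import math

def arm_statistics(control, treatment, sigma):
    """Return (z, t) for one experimental arm versus control.

    control, treatment: all responses accrued so far in each arm
                        (equal lengths assumed for this sketch).
    sigma: the presumed standard deviation used by the Z statistic.
    """
    m = len(control)
    if m != len(treatment) or m < 2:
        raise ValueError("need equal arm sizes of at least 2")
    mean_c = sum(control) / m
    mean_t = sum(treatment) / m
    # Z statistic: standardised mean difference using the presumed sigma.
    z = (mean_t - mean_c) * math.sqrt(m / (2.0 * sigma ** 2))
    # Pooled variance estimate (assumption: pooled over these two arms only).
    ss = sum((x - mean_c) ** 2 for x in control)
    ss += sum((x - mean_t) ** 2 for x in treatment)
    sigma_hat = math.sqrt(ss / (2 * m - 2))
    # t statistic: the Z statistic with sigma replaced by its estimate.
    t = z * sigma / sigma_hat
    return z, t
```

Note that the $t$-type statistic does not depend on the presumed value of `sigma`, which is the reason it is attractive when the variance is unknown.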
We now consider two categories of MAMS design: one that terminates the entire trial as soon as any null hypothesis is rejected, and one that stops recruitment only to those arms for which the corresponding null hypothesis has been accepted or rejected. These two types of design have been referred to as including simultaneous and separate stopping respectively [8].
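To describe our design, we introduce the vectors $\boldsymbol{\psi} = (\psi_1, \dots, \psi_K)^\top$ and $\boldsymbol{\omega} = (\omega_1, \dots, \omega_K)^\top$, where $\psi_k = 1$ if $H_0^{(k)}$ is rejected and $\psi_k = 0$ otherwise, and $\omega_k = j$ if stage $j$ is that at which $H_0^{(k)}$ is rejected or accepted, or the whole trial is stopped and no decision on $H_0^{(k)}$ is made. Given efficacy and futility boundaries $\mathbf{e} = (e_1, \dots, e_J)^\top$ and $\mathbf{f} = (f_1, \dots, f_J)^\top$, with $f_J = e_J$, the trial is conducted as follows.
1. Set $j = 1$, $\boldsymbol{\psi} = \mathbf{0}$, and $\boldsymbol{\omega} = \mathbf{0}$.
2. Conduct stage $j$ of the trial, allocating $n$ patients to the control arm, and $n$ patients to each arm $k$ with $\omega_k = 0$.
3. Compute the test statistic $T_{kj}$ for each arm $k$ with $\omega_k = 0$.
4. For each arm $k$ with $\omega_k = 0$: if $T_{kj} \ge e_j$, reject $H_0^{(k)}$ and set $\psi_k = 1$, $\omega_k = j$; if $T_{kj} < f_j$, accept $H_0^{(k)}$ and set $\omega_k = j$.
5. When using the simultaneous stopping rule, if no null hypothesis has been rejected and at least one arm $k$ has $\omega_k = 0$, set $j = j + 1$ and return to 2. Else stop the trial, and for each $k$ with $\omega_k = 0$, set $\omega_k = j$. When using the separate stopping rule, if at least one arm $k$ has $\omega_k = 0$, set $j = j + 1$ and return to 2. Else stop the trial.
On trial completion, $\boldsymbol{\psi}$ and $\boldsymbol{\omega}$ then conform to their designations above.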
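The two stopping rules can be sketched in Python as below. This is an illustrative sketch only: the function name `conduct_trial` is hypothetical, and the test statistics are passed in as a precomputed table (an assumption made purely to keep the example deterministic), whereas in a real trial each stage's statistics would be computed from the data accrued in the arms still recruiting.

```python
def conduct_trial(stats, e, f, rule="separate"):
    """Apply efficacy (e) and futility (f) boundaries stage by stage.

    stats[j][k]: statistic for experimental arm k+1 at analysis j+1
                 (hypothetical precomputed input for this sketch).
    rule: "simultaneous" stops the whole trial at the first rejection;
          "separate" stops only the arms for which a decision is made.
    Returns (psi, omega): rejection indicators and stopping stages.
    """
    J = len(e)
    K = len(stats[0])
    psi = [0] * K
    omega = [0] * K  # 0 means the arm is still recruiting
    for j in range(1, J + 1):
        for k in range(K):
            if omega[k] != 0:
                continue
            if stats[j - 1][k] >= e[j - 1]:
                psi[k], omega[k] = 1, j      # reject the null for arm k
            elif stats[j - 1][k] < f[j - 1]:
                omega[k] = j                 # accept the null for arm k
        any_active = any(w == 0 for w in omega)
        if rule == "simultaneous":
            proceed = any_active and not any(psi)
        else:
            proceed = any_active
        if not proceed or j == J:
            # Arms left undecided stop at the current stage.
            omega = [w if w != 0 else j for w in omega]
            break
    return psi, omega
```

For example, with `stats = [[1.0, 3.0], [2.5, 0.0]]`, `e = [2.8, 2.0]` and `f = [0.0, 2.0]`, the separate rule lets the first arm continue to stage two after the second arm is stopped for efficacy, whereas the simultaneous rule ends the whole trial at stage one.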
We would like to ensure that the FWER, the probability of rejecting at least one true null hypothesis, is controlled to some level $\alpha$. There are several ways to define power in a multi-arm setting. Here, as in [2], we desire power of at least $1 - \beta$ to reject $H_0^{(1)}$ when $\tau_1 = \delta_1$ and $\tau_k = \delta_0$ for $k = 2, \dots, K$ (the least favourable configuration, LFC). This is the so-called pairwise power for $H_0^{(1)}$ (see, e.g., [9]).
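To this end, define for a given $\boldsymbol{\psi}$
$$a(\boldsymbol{\psi}) = I\Big(\sum_{k=1}^{K} \psi_k > 0\Big), \qquad b(\boldsymbol{\psi}) = I(\psi_1 = 0).$$
Here $I(A)$ is the indicator function on event $A$. Furthermore, $\Xi_{\mathrm{sim}}$ and $\Xi_{\mathrm{sep}}$ represent the set of possible $(\boldsymbol{\psi}, \boldsymbol{\omega})$ combinations when using the simultaneous and separate stopping rules respectively; we write $\Xi$ for whichever of the two applies.
Denoting the probability of a particular $(\boldsymbol{\psi}, \boldsymbol{\omega})$ combination on trial completion for a given vector of treatment effects $\boldsymbol{\tau} = (\tau_1, \dots, \tau_K)^\top$ by $\mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \boldsymbol{\tau})$, we specify our required operating characteristics as
$$\mathrm{FWER} = \sum_{(\boldsymbol{\psi}, \boldsymbol{\omega}) \in \Xi} \mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \mathbf{0})\, a(\boldsymbol{\psi}) \le \alpha, \qquad (1)$$
$$\mathrm{TIIER} = \sum_{(\boldsymbol{\psi}, \boldsymbol{\omega}) \in \Xi} \mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \boldsymbol{\delta})\, b(\boldsymbol{\psi}) \le \beta, \qquad (2)$$
where $\boldsymbol{\delta} = (\delta_1, \delta_0, \dots, \delta_0)^\top$ and TIIER denotes the type-II error-rate. Additionally, we optimise our choices of $n$, $\mathbf{e}$, and $\mathbf{f}$. In theory, this could be achieved for almost any optimality criterion, with several sensible choices having been previously proposed (see, e.g., [10]). Here, we focus on minimising some weighted combination of the expected sample sizes (ESSs) when $\boldsymbol{\tau} = \mathbf{0}$ and $\boldsymbol{\tau} = \boldsymbol{\delta}$, and the maximal possible sample size; an approach that has proved effective in several trial design settings [5,11]. Note that
$$\mathrm{ESS}_{\boldsymbol{\tau}} = \sum_{(\boldsymbol{\psi}, \boldsymbol{\omega}) \in \Xi} \mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \boldsymbol{\tau})\, N(\boldsymbol{\omega}), \qquad N(\boldsymbol{\omega}) = n\Big(\max_{k} \omega_k + \sum_{k=1}^{K} \omega_k\Big),$$
and that the maximal possible sample size is $(K + 1)Jn$. This could therefore, following [12], be achieved by identifying the $n$, $\mathbf{e}$, and $\mathbf{f}$ that minimise the following function
$$w_1 \mathrm{ESS}_{\mathbf{0}} + w_2 \mathrm{ESS}_{\boldsymbol{\delta}} + w_3 (K + 1)Jn + P\{I(\mathrm{FWER} > \alpha) + I(\mathrm{TIIER} > \beta)\}.$$
Here $P \in \mathbb{R}^{+}$ is a penalty for designs with undesirable operating characteristics, taken as the sample size required by a corresponding single-stage design. Moreover, the $w_i \in \mathbb{R}^{+} \cup \{0\}$, for $i = 1, 2, 3$, are weights reflecting the relative desire to minimise each of the three included factors. Note that previous work suggests that designs that place all of their weight on one of the three factors (e.g., $w_1 = 1$, $w_2 = w_3 = 0$) will perform particularly badly for other choices of the weights [5,11]. It is therefore advisable to consider a range of options for the weights, and also to take $w_i \ne 0$, for $i = 1, 2, 3$.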
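To make the structure of such a penalised criterion concrete, the Python sketch below evaluates a weighted objective of this kind. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the penalty is assumed to be added once for each violated operating characteristic, which is one plausible form among several.

```python
def trial_sample_size(n, omega):
    """Total patients recruited, given the stage at which each arm stopped.

    The control arm recruits n per stage until the last arm stops;
    experimental arm k recruits n per stage for omega[k] stages.
    """
    return n * (max(omega) + sum(omega))

def design_objective(ess_null, ess_lfc, n, K, J,
                     fwer, type2, alpha, beta, weights, penalty):
    """Weighted combination of two ESSs and the maximal sample size,
    plus a penalty for each violated constraint (assumed penalty form)."""
    w1, w2, w3 = weights
    max_sample_size = (K + 1) * J * n
    value = w1 * ess_null + w2 * ess_lfc + w3 * max_sample_size
    if fwer > alpha:
        value += penalty   # familywise error-rate above the nominal level
    if type2 > beta:
        value += penalty   # power below the desired level
    return value
```

Because the penalty is large relative to the sample sizes, a search routine minimising this function is steered away from designs that fail either constraint.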
Unfortunately, the complex joint distribution of the $T_{kj}$ prevents us from calculating the $\mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \boldsymbol{\tau})$ required for this exactly. Instead, we use a Monte Carlo method. We offer first a more practical description of how this works, before providing a formal description below.
Suppose as an example that $K = J = 2$, $\mu_0 = \mu_1 = \mu_2 = 0$, $\sigma^2 = 1$, and that we will use the simultaneous stopping rule. For any choice of values for $n$, $\mathbf{e} = (e_1, e_2)^\top$, and $\mathbf{f} = (f_1, f_2)^\top$ (with $f_2 = e_2$), we can simulate a trial's outcome by generating data from each treatment arm in stage one, using the fact that $X_{k1i} \sim N(0, 1)$ for $k = 0, 1, 2$ and $i = 1, \dots, n$. With these data, the $T_{k1}$ for $k = 1, 2$ can be formed. If $T_{k1} \ge e_1$ for $k = 1$ or 2, the trial terminates, with a familywise error (FWE) having occurred. If $T_{k1} < f_1$ for $k = 1$ and 2, then the trial also terminates here, with no FWE having occurred. Otherwise the trial progresses to stage 2, with recruitment continued in arm 0 and the arms $k$ with $f_1 \le T_{k1} < e_1$. We draw data for stage two in these arms, again using the standard normal distribution, and then compute the $T_{k2}$ for those $k$ with $f_1 \le T_{k1} < e_1$. The trial now terminates, either with a FWE having been committed if $T_{k2} \ge e_2$ for at least one of these $k$, or without a FWE having been committed otherwise. The FWER can then be estimated for this design by repeating the above process many times, and counting the proportion of instances in which a null hypothesis is rejected. Similarly, one can estimate the power under the LFC, or estimate ESSs. A global optimisation routine can then be used to search for the optimal values of $n$, $\mathbf{e}$, and $\mathbf{f}$.
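The two-arm, two-stage example above can be sketched in Python using only the standard library. This is an illustrative sketch, not the authors' R code: the function names are hypothetical, and the variance estimate is assumed to pool the within-arm sums of squares over all data accrued so far, including data from an arm that has since been dropped.

```python
import math
import random

def pooled_sd(samples):
    """Pooled standard deviation across arms (assumed estimator)."""
    ss, df = 0.0, 0
    for x in samples:
        mean = sum(x) / len(x)
        ss += sum((v - mean) ** 2 for v in x)
        df += len(x) - 1
    return math.sqrt(ss / df)

def trial_commits_fwe(n, e, f, rng):
    """Simulate one K = J = 2 trial under the global null with
    simultaneous stopping; return True if any null is rejected."""
    data = {k: [] for k in (0, 1, 2)}
    active = [1, 2]
    for j in (1, 2):
        for k in [0] + active:
            data[k].extend(rng.gauss(0.0, 1.0) for _ in range(n))
        sd = pooled_sd(list(data.values()))
        mean0 = sum(data[0]) / (j * n)
        scale = sd * math.sqrt(2.0 / (j * n))
        t = {k: (sum(data[k]) / (j * n) - mean0) / scale for k in active}
        if any(t[k] >= e[j - 1] for k in active):
            return True                       # efficacy boundary crossed
        active = [k for k in active if t[k] >= f[j - 1]]
        if not active:
            return False                      # all arms dropped for futility
    return False

def estimate_fwer(n, e, f, reps=10000, seed=1):
    """Proportion of simulated trials in which a FWE is committed."""
    rng = random.Random(seed)
    return sum(trial_commits_fwe(n, e, f, rng) for _ in range(reps)) / reps
```

A call such as `estimate_fwer(20, [2.5, 2.0], [0.0, 2.0])` returns a Monte Carlo estimate whose precision improves with `reps`; the boundary values here are arbitrary placeholders rather than an optimised design.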
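Formally, we generate $R = 100{,}000$ independent sets of responses for each treatment arm under $\boldsymbol{\tau} = \mathbf{0}$ for some suitably large value of $n$. Subsets of these datasets are then used to form the responses for any smaller value of $n$. Next, for any $n$, $\mathbf{e}$, and $\mathbf{f}$, and chosen stopping rule, for the $r$th dataset, the trial is conducted as specified above. Importantly, the values of $\boldsymbol{\psi}$ and $\boldsymbol{\omega}$ on trial completion are determined, and denoted $\boldsymbol{\psi}_r$ and $\boldsymbol{\omega}_r$. An approximation to FWER for this design is then
$$\widehat{\mathrm{FWER}} = \frac{1}{R} \sum_{r=1}^{R} I\Big(\sum_{k=1}^{K} \psi_{rk} > 0\Big).$$
We can similarly compute approximations $\widehat{\mathrm{TIIER}}$, $\widehat{\mathrm{ESS}}_{\mathbf{0}}$, and $\widehat{\mathrm{ESS}}_{\boldsymbol{\delta}}$ to the type-II error-rate TIIER, $\mathrm{ESS}_{\mathbf{0}}$, and $\mathrm{ESS}_{\boldsymbol{\delta}}$. Thus, to find the optimal design, we minimise the following function in $n$, $\mathbf{e}$, and $\mathbf{f}$
$$w_1 \widehat{\mathrm{ESS}}_{\mathbf{0}} + w_2 \widehat{\mathrm{ESS}}_{\boldsymbol{\delta}} + w_3 (K + 1)Jn + P\{I(\widehat{\mathrm{FWER}} > \alpha) + I(\widehat{\mathrm{TIIER}} > \beta)\}.$$
Note that the requirement to generate datasets necessitates $n$ to be treated as an integer. Thus, an algorithm that can simultaneously search over the discrete $n$, and the continuous $\mathbf{e}$ and $\mathbf{f}$, is required. We achieve this using CEoptim in R [13]. Code to implement our method is available from https://sites.google.com/site/jmswason/supplementary-material.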
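The reuse of one large simulated dataset across candidate group sizes can be sketched as follows. This Python sketch is illustrative only: the function names and the nested-list layout are hypothetical, and obtaining responses under a non-null configuration by adding a constant shift to the null responses is an assumption made for the example.

```python
import random

def generate_master(reps, arms, stages, n_max, seed=1):
    """Draw standard normal responses once, indexed as
    master[replicate][arm][stage][patient], for the largest group size."""
    rng = random.Random(seed)
    return [[[[rng.gauss(0.0, 1.0) for _ in range(n_max)]
              for _ in range(stages)]
             for _ in range(arms)]
            for _ in range(reps)]

def stage_responses(master, r, k, j, n, effect=0.0):
    """First n responses for replicate r, arm k, stage j (1-based),
    shifted by a treatment effect (assumed way of leaving the null)."""
    return [x + effect for x in master[r][k][j - 1][:n]]
```

Because every candidate $n$ draws on the same underlying responses, the simulated operating characteristics of two candidate designs are compared on common random numbers, and the objective function is deterministic for a given dataset.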
For both scenarios, and both considered stopping rules, we determined the balanced-optimal design for $t$-test statistics using the Monte Carlo method described above (denoting the optimal values by $n_t$, $\mathbf{e}_t$, $\mathbf{f}_t$). For comparison, we use the triangular designs [14] for $Z$-test statistics (denoting the values by $n_Z$, $\mathbf{e}_Z$, $\mathbf{f}_Z$), which can be found using the MAMS package in R [2]. These designs are so named for the shape of their stopping regions, can be found quickly, and have been shown to provide good performance in terms of their associated ESSs for MAMS trials [12]. The resultant designs are given in Table 1.
Table 1: The triangular designs determined using the known variance test statistics, and the balanced-optimal designs determined using the unknown variance test statistics, are displayed for the two considered trial design scenarios, and the two considered stopping rules. All boundaries are given to three decimal places.
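We then examined, using 100,000 trial simulations, the performance of the following approaches as a function of the true variance $\sigma_T^2$:
A1. $n_Z$, $\mathbf{e}_Z$, $\mathbf{f}_Z$ with $Z$-test statistics and the presumed value of $\sigma^2$;
A2. $n_Z$, $\mathbf{e}_Z$, $\mathbf{f}_Z$ with $t$-test statistics;
A3. $n_Z$, $\mathbf{e}_Z$, $\mathbf{f}_Z$ with $t$-test statistics, and modification of the $\mathbf{e}_Z$, $\mathbf{f}_Z$ using quantile substitution. That is, at interim analysis $j$ we replace $e_{Zj}$ and $f_{Zj}$ by $\mathcal{T}_{\nu_j}^{-1}\{\Phi(e_{Zj})\}$ and $\mathcal{T}_{\nu_j}^{-1}\{\Phi(f_{Zj})\}$, where $\Phi$ is the cumulative distribution function of the standard normal distribution and $\mathcal{T}_{\nu_j}$ is the cumulative distribution function of Student's $t$-distribution with $\nu_j$ degrees of freedom, $\nu_j$ being the degrees of freedom of the variance estimate at analysis $j$;
A4. $n_t$, $\mathbf{e}_t$, $\mathbf{f}_t$ with the $t$-test statistics.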
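Quantile substitution as used in A3 can be illustrated with the Python sketch below, which maps a normal-theory boundary to the Student's $t$ quantile with the same tail probability. It is a minimal sketch using only the standard library: the helper names are hypothetical, the $t$ distribution function is obtained by Simpson's rule and inverted by bisection (numerical choices made for self-containment, not taken from the paper), and the search is restricted to the interval $[-40, 40]$, which is ample for the degrees of freedom arising in practice.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """Distribution function via Simpson's rule on [0, |x|] and symmetry."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, nu, lo=-40.0, hi=40.0):
    """Invert t_cdf by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def quantile_substitution(boundary, nu):
    """Replace a Z-test boundary by the t quantile of equal probability."""
    return t_quantile(NormalDist().cdf(boundary), nu)
```

For a positive boundary the substituted value is larger than the original, reflecting the heavier tails of the $t$-distribution, and the adjustment shrinks as the degrees of freedom grow.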
The results of these comparisons are given in Table 2. In both scenarios, using either stopping rule, the assumption of known variance results in large inflation of the FWER when $\sigma_T^2 > \sigma^2$. In contrast, Approaches 3 and 4 control the FWER far more accurately in all cases, with Approach 4 controlling to the nominal level on slightly more occasions overall. Moreover, whilst one of the two ESSs is comparable for Approaches 3 and 4, Approach 4 always attains a lower value for the other.

Discussion
In this article, we extended previous work for two-armed group sequential trials to allow the design parameters of a MAMS $t$-test to be optimised, when employing either a simultaneous or separate stopping rule. For the considered examples, the method was successful in providing operating characteristics close to their nominal level.
It is important to note that by Equation (1), the FWER is controlled under the global null hypothesis ($\boldsymbol{\tau} = \mathbf{0}$). This is known to provide strong control under the assumption of known variance with $Z$-test statistics [2]. However, it is not known whether this is the case for the $t$-test statistics considered here. Therefore, whilst intuitively it seems logical that Equation (1) would provide strong control in this setting, a search over the vector $\boldsymbol{\tau}$ should be employed after initial design determination to verify this.
In conclusion, our method provides an alternative approach for dealing with unknown variance to the heuristic quantile substitution procedure. Precisely, quantile substitution offers a quick, often effective means of controlling the FWER relatively accurately. However, if it is vital to control the FWER, the proposed method should be preferable, and additionally allows the stopping boundaries to be optimised. In certain circumstances it can therefore be expected to allow the determination of more efficient designs.
Table 2: The estimated familywise error-rate ($\widehat{\mathrm{FWER}}$), power ($1 - \widehat{\mathrm{TIIER}}$), and expected sample sizes (ESSs) when $\boldsymbol{\tau} = \mathbf{0}$ ($\widehat{\mathrm{ESS}}_{\mathbf{0}}$) and $\boldsymbol{\tau} = \boldsymbol{\delta}$ ($\widehat{\mathrm{ESS}}_{\boldsymbol{\delta}}$) of the four considered approaches (A1-A4) are shown as the true variance $\sigma_T^2$ varies, for the two considered trial design scenarios, and the two considered stopping rules. The rejection rate and ESS values are given to four and one decimal places respectively.