Computation of the Properties of Multi-Stage Clinical Trial Design Based on SCPRT

Abstract
The multi-stage clinical trial, also termed the group sequential clinical trial, is a commonly used design for evaluating a new treatment against existing one(s). Such a design allows the trial to stop early for a pronounced treatment effect, or the lack thereof, based on data accumulated at an intermediate stage. The sequential conditional probability ratio test (SCPRT) approach is derived from the concept of discordance probability, namely the probability that the decision to accept or reject the null hypothesis based on interim data would be reversed should the trial continue to the planned end. This probability is controlled at a preset small level. The SCPRT is one of the intuitively appealing procedures, along with stochastic curtailing, and it has been shown to be more efficient than procedures based on stochastic curtailing. However, in the existing SCPRTs, the discordance probability, type I error and power are not easy to compute. Here we investigate a simulation-based method to compute these quantities and apply it to a real data problem as an illustration.

Introduction
Multi-stage clinical trials are commonly used to evaluate a new treatment against existing one(s) [1,2]. For a one-stage (also called non-sequential) clinical trial design with a given nominal level α, the stopping boundary is the level-α critical value for the test of the null hypothesis of no difference among the treatments. For multi-stage clinical trial designs, the stopping boundaries should be designed so that the overall (or family-wise) type I error is approximately at the specified level α. These boundaries serve as stopping guidelines to prevent inappropriate early stopping. Such boundaries are not unique, and there is a large body of research on this problem. It is now common in phase III trials sponsored by the pharmaceutical industry and the National Cancer Institute to include interim monitoring boundaries in the protocol documents [3][4][5][6].
Commonly used stopping boundaries include the O'Brien-Fleming boundary [2] and the corresponding spending function or a variant of it. The sequential conditional probability ratio test (SCPRT) boundary [7][8][9] is derived from the concept of a negligible discordance probability, namely the chance that the decision to accept or reject the null hypothesis based on interim data would be reversed should the trial continue to the planned end. The approach is computationally more complex. Once the multi-stage design (i.e., the boundary) is chosen, we can calculate its discordance probability, and the SCPRT boundary has the feature that it controls the discordance probability at a preset small level, so that the probability that the conclusion obtained at an early stage differs from that at the end of the trial is negligible. The calculation of this probability is our focus here. In the existing SCPRT, however, the discordance probability, type I error and power are not easy to compute; they are given only for some selected configurations of stages and interim sample sizes, and only for balanced designs. In practice, it is desirable to compute these quantities for any given configuration of stages and interim sample sizes, balanced or imbalanced. Here we investigate a simulation method to compute these quantities for general configurations. The simulation results are reported and the method is illustrated with a real data example.
In Section 2 we describe the problem and the SCPRT procedure. Section 3 describes the proposed simulation method for computing the discordance probability, type I error and power of the SCPRT. Section 4 presents the simulation results for some selected configurations of stages and interim sample sizes, for both balanced and imbalanced cases, and applies the method to a real data set as an illustration.

The problem and brief review of the SCPRT
As mentioned in the Introduction, sequential tests are commonly used in clinical trials. The SCPRT is a special case of such tests, developed by Xiong [7], Tan, Xiong and Kutner [8], Xiong, Tan and Boyett [9], and others. With the SCPRT, if a trial is stopped early with a conclusion, then the same conclusion would very likely hold had the trial continued to its planned end, so that consistent conclusions are reached with high probability. In clinical trials, the parameter θ of interest is often the difference among the treatments, and the goal is to test the null hypothesis $H_0: \theta \le \theta_0$ vs $H_a: \theta > \theta_0$. Consider a group sequential test with k stages, let $n_1 < \cdots < n_k$ be the cumulative sample sizes at these stages, and set $t_j = n_j/n_k$. At each stage j, we compute a statistic $S_j$ from the observations on the $n_j$ individuals and, based on $S_j$, decide whether to stop the trial and draw a conclusion about the treatment or to continue to the next stage. Let $s_k^0$ be the cutoff value of $S_k$ in a non-sequential trial. In the SCPRT, at each stage time $t_j$ the decision either to stop the trial or to continue to the next stage is determined by the following conditional maximum likelihood ratio
$$L(t_j, S_j \mid t_k, s_k^0) = \frac{\max_{s > s_k^0} f(t_j, S_j \mid t_k, s)}{\max_{s \le s_k^0} f(t_j, S_j \mid t_k, s)},$$
where $f(t_j, S_j \mid t_k, s)$ is the conditional density of $S_j$ given $S_k = s$, which is free of the parameter θ as long as $S_k$ is a sufficient statistic for θ. The decision rule is of the form: continue to the next stage if $L(t_j, S_j \mid t_k, s_k^0) \in [\underline{l}, \overline{l}]$ for some $-\infty < \underline{l} < \overline{l} < \infty$, to be determined such that the test has a given nominal level α and power. In general the above conditional maximum likelihood ratio, and hence $\underline{l}$ and $\overline{l}$, are difficult to evaluate. However, if $S_j$ is a Brownian motion with drift θ on the unit time interval, i.e., $S_j \sim N(\theta t_j, t_j)$ (j = 1, ..., k; $t_k = 1$), then the likelihood ratio can be evaluated in closed form [9], and the boundaries $[a_j, b_j]$ are determined such that the family-wise type I error of the sequential test is controlled at a given nominal level α with a given power. In this case the boundaries are feasible to evaluate, and they will be our focus hereafter. To be specific, assume the responses $X_1, \ldots, X_{n_k}$ are iid $N(\mu, \sigma^2)$ with $\sigma^2$ known, and that we want to test $H_0: \mu \le \mu_0$ vs $H_1: \mu > \mu_0$. Under this assumption, $S_j = \sum_{i=1}^{n_j} (X_i - \mu_0)/(\sigma \sqrt{n_k}) \sim N(\theta t_j, t_j)$ with $\theta = \sqrt{n_k}(\mu - \mu_0)/\sigma$, corresponding to a local alternative, which is more reasonable than a fixed alternative.
The SCPRT stopping boundaries for the one-sided hypothesis $H_0$ vs $H_a$ in a k-stage clinical trial are given as
$$a_j = z_\alpha t_j - \{2 a t_j (1 - t_j)\}^{1/2}, \qquad b_j = z_\alpha t_j + \{2 b t_j (1 - t_j)\}^{1/2}, \qquad j = 1, \ldots, k, \quad (1)$$
where $t_j = Var[S_j]/Var[S_k]$ is called the information time (information fraction). Recall that the test is one-sided, so in the non-sequential trial we reject $H_0$ when the test statistic $S_k > z_\alpha$ and accept it otherwise. In the sequential trial, at an intermediate stage j, if $S_j \notin [a_j, b_j]$ we stop the trial and make a decision: if $S_j > b_j$ we reject $H_0$, and if $S_j < a_j$ we accept $H_0$. If $S_j \in [a_j, b_j]$, the trial continues to the next stage. At the final stage k, $a_k = b_k = z_\alpha$, so the trial always stops: if $S_k > b_k$ we reject $H_0$, otherwise we accept it. In particular, if the observed data are iid, then $t_j = n_j/n_k$. Here $z_\alpha$ is the upper α quantile (i.e., the (1-α)-th quantile) of the standard normal distribution, and the values of a = a(k, ρ) and b = b(k, ρ) depend on the maximum conditional discordance probability ρ (or the maximum discordance probability $\rho_{max}$), the number of stages k and the time points of the stages (the information times). In practice a symmetric boundary pair is often used, i.e. a = b, together with balanced information times ($t_j - t_{j-1} = 1/k$ for j = 2, ..., k). In this case the values of a = a(k, ρ) (= b = b(k, ρ)), which depend on k, ρ and the standard deviation σ, are given in Table 1 of Xiong, Tan and Boyett [9] for some given σ.
As an example, if the trial has k = 4 stages with a known σ and we choose ρ = 0.02, then the corresponding value of a can be read from Table 1 of Xiong, Tan and Boyett [9], which also gives values of a for other selected configurations of k, $(n_1, \ldots, n_k)$ and ρ. Our goal in this communication is to present a simulation-based method to compute the discordance probability, type I error and power of the SCPRT procedure in the general case.
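The boundary computation can be sketched in a few lines of Python. This is a minimal illustration, assuming the SCPRT boundary form $a_j = z_\alpha t_j - \{2 a t_j(1-t_j)\}^{1/2}$, $b_j = z_\alpha t_j + \{2 b t_j(1-t_j)\}^{1/2}$ on the Brownian-motion scale with $t_j = n_j/n_k$. The function name `scprt_boundaries` and the value a = 3.0 are hypothetical choices for illustration; in practice a would be taken from Table 1 of [9] or computed for the chosen ρ.

```python
from math import sqrt
from statistics import NormalDist


def scprt_boundaries(n, a, alpha=0.05, b=None):
    """Lower and upper SCPRT boundaries (a_j, b_j) for cumulative sample sizes n.

    Assumes the form z_alpha*t_j -/+ sqrt(2*a*t_j*(1 - t_j)), t_j = n_j / n_k.
    """
    if b is None:          # symmetric boundary pair, a = b
        b = a
    z = NormalDist().inv_cdf(1.0 - alpha)   # upper-alpha normal quantile
    lower, upper = [], []
    for n_j in n:
        t = n_j / n[-1]                     # information time of stage j
        lower.append(z * t - sqrt(2.0 * a * t * (1.0 - t)))
        upper.append(z * t + sqrt(2.0 * b * t * (1.0 - t)))
    return lower, upper


# Hypothetical 4-stage balanced design; a = 3.0 is an illustrative value only.
lower, upper = scprt_boundaries((50, 100, 150, 200), a=3.0)
for j, (lo, hi) in enumerate(zip(lower, upper), start=1):
    print(f"stage {j}: continue if {lo:.3f} <= S_j <= {hi:.3f}")
```

At the final stage $t_k = 1$, so both boundaries collapse to $z_\alpha$ and the trial always stops, as described above.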

Discordance probability
The computation of the discordance probability is technical, so we first review some basic facts. In a group sequential clinical trial, a decision can be made at some time point $n_j$ ahead of the planned final stage at time $n_k$, and the decision made at time $n_j$ may differ from the one that would be made at the final time point $n_k$ if the trial were continued to the end. It is impossible to guarantee that every early decision would be the same as the one made at the end of the trial had the trial gone on. Intuitively, in a good group sequential clinical trial the chance that an early decision differs from that of the non-sequential trial made at the end should be small (but not too small, for one can make the stopping boundaries of the intermediate stages arbitrarily wide so that there is no intermediate stopping and the multi-stage trial is the same as a single-stage one). The discordance probability of a multi-stage clinical trial is the probability that the sequential test and the non-sequential test lead to different decisions (rejection/acceptance of the null hypothesis) when both are applied to the same sequence of observations. For the formal definition of this concept, its computation and its detailed properties we refer to Xiong, Tan, Kutner [10] and the citations there.
Let $P(D)$ be the discordance probability; it depends on k and the $(a_j, b_j)$'s. Let N be the stopping time of the sequential trial, and let $B_a$ and $B_r$ be the acceptance and rejection regions for $(N, S_N)$ for testing $H_0: \theta \in \Theta_0$ vs. $H_a: \theta \in \Theta_a$. Let $R_a$ and $R_r$ be the acceptance and rejection regions for the same test based on the non-sequential statistic $S_k$, with $n_k$ fixed. For a sequential clinical trial with k stages, define the events
$$D_a = \{(N, S_N) \in B_a,\ S_k \in R_r\}, \qquad D_r = \{(N, S_N) \in B_r,\ S_k \in R_a\}.$$
Then $D = D_a \cup D_r$, and the discordance probability between the statistics $(N, S_N)$ with k stages and $S_k$ is
$$P(D) = P(D_a) + P(D_r).$$
For a sequential clinical trial with k stages, the conditional discordance probability is defined as (Xiong, Tan, Kutner [10]) $\rho_{k,s} = P(D \mid S_k = s)$, and by (2), (3), we need to compute the corresponding quantities at each $(n_j, S_j)$. For continuous end points, the normal distribution is typically assumed. The resulting iterative formula for these quantities at $(n_j, s)$ is not easy to use in a simulation-based method, as it involves multiple integration.
Below we consider another expression for these quantities at $(n_j, s)$, also from Xiong, Tan, Kutner [10], with N the stopping time as in (3). See also Xiong [11] and Xiong and Tan [12]. For each fixed k, they find the pair (a, b) from (6). However, the (a, b) determined by (6) depend on k and on the given value of $\rho_k$. In the symmetric (a = b) and balanced case (equal spacing between the stage time points), Table 1 in Xiong, Tan, Boyett [9] gives values of a for different k and $\rho_k$. Even in this case the computation is non-trivial, especially for large k. For the symmetric case, we consider below a simulation-based method for the computation of $a = a(k, \rho_k)$.

Computation of discordance probability
To emphasize its dependence on a, we re-write $\rho_k$ in (6) as $\rho_k(a)$. However, the supremum in (6) is not easy to evaluate, so we consider the expected conditional discordance probability, given by
$$\bar\rho_k(a) = E[\rho_{k,S_k}(a)] = E[P(D \mid S_k)]. \quad (7)$$
One may attempt to minimize the above expected conditional discordance probability over a (or the maximum conditional discordance probability, or $\rho_{max}$); however, this does not make sense. If we take a to be sufficiently large, then at each intermediate stage time $n_j$, $S_j$ will always be in $[a_j, b_j]$, so that there is no stopping at any intermediate stage, the sequential trial becomes the same as the non-sequential trial, and the discordance probability is zero. This is not desired, as we want the sequential trial to differ from the non-sequential one and to have a real chance of early stopping. Thus a can only be computed for each subjectively given value of $\bar\rho$ (or ρ, or $\rho_{max}$), and this value should be chosen such that the probability of the conclusion of the sequential test being reversed by the test at the planned end is small, but not too small [7,8].
Under $\theta_0 = 0$, assume $\sigma = 1$, so that (7) is re-written as … Note that in the above we only use the $S_k^{(i)}$'s satisfying either condition i) or condition ii). In i), for each fixed j, we need to compute …
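The idea of estimating the expected discordance probability by simulation can be sketched as follows. This is a minimal illustration and not the authors' exact algorithm: it assumes that $S_j$ is a Brownian motion observed at $t_j = n_j/n_k$, that the boundaries have the symmetric form $z_\alpha t_j \pm \{2 a t_j(1-t_j)\}^{1/2}$, and that every simulated path is continued to $t_k = 1$ so that the sequential and non-sequential decisions can be compared on the same path. The function name, the Monte Carlo size and the values of a are hypothetical. The returned frequency estimates $P(D)$, which equals the expectation of the conditional discordance probability $P(D \mid S_k)$ over $S_k$.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_discordance(n, a, alpha=0.05, theta=0.0, M=20000, seed=1):
    """Monte Carlo estimate of the discordance probability P(D)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)
    t = [n_j / n[-1] for n_j in n]                      # information times
    half = [sqrt(2.0 * a * tj * (1.0 - tj)) for tj in t]  # boundary half-widths
    discordant = 0
    for _ in range(M):
        s, prev, seq_reject = 0.0, 0.0, None
        for tj, h in zip(t, half):
            dt = tj - prev
            # Brownian increment with drift theta: S_j ~ N(theta*t_j, t_j)
            s += theta * dt + sqrt(dt) * rng.gauss(0.0, 1.0)
            prev = tj
            # Record the first boundary crossing, but keep simulating the path.
            if seq_reject is None and abs(s - z * tj) > h:
                seq_reject = s > z * tj
        nonseq_reject = s > z            # decision of the fixed-sample test
        if seq_reject is None:           # only if S_k == z_alpha exactly
            seq_reject = nonseq_reject
        discordant += seq_reject != nonseq_reject
    return discordant / M


# Hypothetical balanced 4-stage design; values of a are illustrative only.
for a in (0.1, 3.0, 1000.0):
    print(a, simulate_discordance((50, 100, 150, 200), a))
```

With a very large a no path crosses an interim boundary, so the estimate is zero, which matches the remark above that minimizing the discordance probability over a is not meaningful.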

Computation of type I error
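Recall that in the single-stage case the theoretical (or asymptotic) type I error is the significance level α, while the observed type I error for testing $H_0$ vs $H_1$ in the simulation is the proportion of simulated trials in which $H_0$ is rejected. In the general case of k (> 1) stages, we need to control the family-wise type I error to be no more than α and, for a continuous statistic, to be equal to α, i.e., the probability under $H_0$ of rejecting at any of the k stages equals α. The simulated type I error is then $\hat\alpha = M^{-1} \sum_{i=1}^{M} e_i$, where M is the Monte Carlo size and $e_i = 1$ if $H_0$ is rejected in the i-th simulated trial and $e_i = 0$ otherwise.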

Computation of power
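Recall that in the single-stage case the theoretical (or asymptotic) power for a given θ ≠ 0 can be computed from the asymptotic distribution, while the observed power in the simulation, for testing $H_0$ vs $H_1$, is the proportion of simulated trials in which $H_0$ is rejected. In the k-stage case:
i) For the i-th simulated trial, the statistic is updated from stage to stage with independent standard normal errors $\epsilon_j$, setting $S_j = S_{j-1} + \theta (n_j - n_{j-1})/n_k + ((n_j - n_{j-1})/n_k)^{1/2} \epsilon_j$ so that $S_j \sim N(\theta t_j, t_j)$, up to the final stage j = k. If $S_k > b_k$ (= $z_\alpha$), re-set $e_i = 1$.
ii) The simulated power is $M^{-1} \sum_{i=1}^{M} e_i$.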
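The type I error and power computations share the same simulation loop and differ only in the drift θ. The sketch below is a hedged illustration under the same assumptions as the Brownian-motion model ($S_j \sim N(\theta t_j, t_j)$, symmetric boundaries $z_\alpha t_j \pm \{2 a t_j(1-t_j)\}^{1/2}$); the function name `simulate_rejection`, the value a = 3.0, θ = 3.0 and the Monte Carlo size are hypothetical. It returns the rejection rate at each stage; their sum is the family-wise type I error when θ = 0 and the power when θ ≠ 0.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_rejection(n, a, alpha=0.05, theta=0.0, M=20000, seed=1):
    """Per-stage rejection rates of the sequential test under drift theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)
    t = [n_j / n[-1] for n_j in n]
    lo = [z * tj - sqrt(2.0 * a * tj * (1.0 - tj)) for tj in t]
    hi = [z * tj + sqrt(2.0 * a * tj * (1.0 - tj)) for tj in t]
    reject_at = [0] * len(t)
    for _ in range(M):
        s, prev = 0.0, 0.0
        for j, tj in enumerate(t):
            dt = tj - prev
            s += theta * dt + sqrt(dt) * rng.gauss(0.0, 1.0)
            prev = tj
            if s > hi[j]:            # stop and reject H0 at stage j
                reject_at[j] += 1
                break
            if s < lo[j]:            # stop and accept H0 at stage j
                break
    return [r / M for r in reject_at]


design = (40, 75, 100)               # an unbalanced three-stage design
type1 = simulate_rejection(design, a=3.0, theta=0.0)
power = simulate_rejection(design, a=3.0, theta=3.0)
print("stage-wise rejection under H0:", type1, "family-wise:", sum(type1))
print("stage-wise rejection under H1:", power, "power:", sum(power))
```

Because the sample sizes enter only through $t_j = n_j/n_k$, the same function covers balanced and unbalanced designs.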

Simulation results
In this section we present simulation results for the discordance probability, type I error and power, for some selected configurations of k, (n 1 , ..., n k ) for both balanced and imbalanced designs.
From the algorithm in Section 3.1 we see that the discordance probability depends on k and α, but not on the configuration of $(n_1, \ldots, n_k)$. The simulated mean discordance rate is given in Table 1 below, along with the maximum discordance rate ρ from Table 1 of Xiong, Tan, Boyett [9].
Simulation results for the type I error for some selected configurations are given in Table 2 below for the balanced case, with $n_1 = n_2 - n_1 = \cdots = n_k - n_{k-1} = 50$. In each case, the number in the upper row is the value of a from Table 1 in Xiong, Tan, Boyett [9], and the number in brackets in the lower row is the corresponding simulated type I error. The Monte Carlo size is M = 500,000. Table 3 gives the family-wise type I errors for the unbalanced case for some selected k and $(n_1, \ldots, n_k)$; the a values, which are the same as in Table 2, are omitted. The rejection rates at each intermediate stage are also displayed. The sample size vector is given in brackets; for example, (40, 75, 100) means a three-stage design with $(n_1, n_2, n_3) = (40, 75, 100)$. We see that when ρ is small, the early rejection rate is very small (<0.01), so the multi-stage design and the one-stage design are very similar. For moderate values of ρ (up to 0.1), there are non-negligible early-stage rejections and the family-wise type I error is close to that of the one-stage design. For large values of ρ (> 0.1), there is a good chance of early rejection, but the family-wise type I error deviates somewhat from the nominal level α.
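To judge whether a simulated type I error differs from the nominal level by more than simulation noise, one can compare the difference with the Monte Carlo standard error of a binomial proportion. The short sketch below uses the standard formula $\sqrt{p(1-p)/M}$; the function name is a hypothetical choice, and α = 0.05 is assumed as the nominal level for illustration.

```python
from math import sqrt


def mc_standard_error(p, M):
    """Standard error of a proportion p estimated from M independent trials."""
    return sqrt(p * (1.0 - p) / M)


# Monte Carlo size used for the tables, with an assumed nominal level of 0.05.
se = mc_standard_error(0.05, 500_000)
print(f"{se:.6f}")   # prints 0.000308
```

With M = 500,000, deviations of a simulated type I error from α of more than about 0.001 therefore exceed three standard errors and are unlikely to be simulation noise alone.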
The power results obtained by simulation are given below. They are under the same set-up as in Table 1, with balanced stages $n_1 = n_2 - n_1 = \cdots = n_k - n_{k-1} = 50$, except that θ ≠ 0. The value of a is omitted (Table 4). Table 5 shows the power at each interim stage for some selected sample sizes and values of θ.

Real data analysis
We use the beta-blocker heart attack trial (BHAT) data described in Tan, Xiong and Kutner [8] as an illustration of the method. The BHAT was a randomized double-blind trial comparing propranolol (n=1916) with placebo (n=1921) in patients with recent myocardial infarction. The trial was terminated early as a result of a large treatment benefit. Aspects of the interim monitoring and early stopping of this trial have been summarized in DeMets et al. [13], Lan and DeMets [14], and DeMets and Lan [15]. The total number of deaths was postulated to be 628. Patient accrual lasted 2 years, from June 1978 to June 1980, with a 2-year follow-up period. Thus, the maximum duration of the trial was 4 years. Seven interim analyses, at the times the Policy and Data Monitoring Board met, had been planned using the O'Brien-Fleming boundary. With an adjustment for compliance, 3-year mortality rates were projected to be 0.1746 for the placebo group and 0.1375 for the treatment group. Thus, the log-hazard ratio of the control to the experimental treatment is 0.26, which is deemed the minimum difference of clinical importance to be detected. Roughly 628 deaths are required for a fixed-sample-size test to declare such a difference significantly different from zero at a significance level of 5% with 90% power. The O'Brien-Fleming boundary was crossed at the sixth interim analysis and the trial was then stopped 9 months early. The observed number of deaths at 3.25 years of study was 318. A reasonable guess of the number of deaths at the planned end was 408. The information time is (0.137, 0.189 1.645). The SCPRT boundary is still crossed at the sixth interim analysis, because the value of the standardized log-rank statistic for the difference in mortality is 2.820 and the boundary value is 2.309.
The mean discordance probability between the sequential test and the non-sequential test at the planned end of the four-year study is ρ=0.0304, which indicates that it is highly unlikely that the conclusion would be reversed had the trial continued to the planned end. The discordance probability provides a less conservative assessment of the likelihood of trend reversal than does the stochastic curtailing procedure. The discordance probability refers to the probability of decision reversal for the whole interim analysis procedure, whereas stochastic curtailing is local and provides a conditional power of 0.8802 (or the conditional probability of reversal is 0.13), assuming the same number of deaths of 408 at the end.
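The log-hazard ratio of 0.26 quoted for the BHAT design can be checked from the projected 3-year mortality rates. The sketch below assumes proportional hazards, so that the hazard ratio of control to treatment equals the ratio of the log survival probabilities; the function name is hypothetical. It also computes the ratio of observed to projected deaths, 318/408, which under the usual log-rank convention (an assumption here, not stated in the text) is the information fraction at the time the trial stopped.

```python
from math import log


def log_hazard_ratio(mort_control, mort_treatment):
    """Log hazard ratio (control vs. treatment) under proportional hazards.

    With survival S_c = 1 - mort_control and S_t = 1 - mort_treatment at the
    same time point, S_c = S_t ** HR, hence HR = log(S_c) / log(S_t).
    """
    return log(log(1.0 - mort_control) / log(1.0 - mort_treatment))


theta = log_hazard_ratio(0.1746, 0.1375)   # projected 3-year mortality rates
info_fraction = 318 / 408                  # observed / projected deaths
print(f"log-hazard ratio: {theta:.2f}")    # prints 0.26
print(f"deaths observed / deaths projected: {info_fraction:.3f}")
```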

Conclusion
We proposed a simulation method for computing the properties of the sequential conditional probability ratio test in the group sequential clinical trial setting, including the discordance probability, type I error and power. The method applies to general configurations with any number of stages and any given sample sizes at each stage, and it is easy to use. Thus the SCPRT procedure can be used more conveniently and is now available for broader application, rather than only for some selected configurations. The method is applied to a real data example to illustrate its usage. With the simulation method, future designs of individual clinical trials can compare the performance of the SCPRT with that of other commonly used procedures, such as the O'Brien-Fleming procedure, so that the most appropriate design can be chosen for a particular trial.