Precision of maximum likelihood estimation in adaptive designs

There has been increasing interest in trials that allow for design adaptations like sample size reassessment or treatment selection at an interim analysis. Ignoring the adaptive and multiplicity issues in such designs leads to an inflation of the type 1 error rate, and treatment effect estimates based on the maximum likelihood principle become biased. Whereas the methodological issues concerning hypothesis testing are well understood, it is not clear how to deal with parameter estimation in designs where adaptation rules are not fixed in advance, so that, in practice, the maximum likelihood estimate (MLE) is used. It is therefore important to understand the behavior of the MLE in such designs. The investigation of Bias and mean squared error (MSE) is complicated by the fact that the adaptation rules need not be fully specified in advance and, hence, are usually unknown. To investigate Bias and MSE under such circumstances, we search for the sample size reassessment and selection rules that lead to the maximum Bias or maximum MSE. Generally, this leads to an overestimation of Bias and MSE, which can be reduced by imposing realistic constraints on the rules like, for example, a maximum sample size. We consider designs that start with k treatment groups and a common control and where selection of a single treatment and control is performed at the interim analysis with the possibility to reassess each of the sample sizes. We consider the case of unlimited sample size reassessments as well as several realistically restricted sample size reassessment rules. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.


Introduction
There has been increasing interest over the last years in adaptive two-stage clinical trials where more than one treatment group is compared with one common control. These trials allow for design adaptations such as, for example, sample size reassessment or treatment selection at an interim analysis. It is well known that ignoring the adaptive and multiplicity issues leads to a considerable inflation of the type 1 error rate and that effect estimates based on the maximum likelihood principle may be biased. For the comparison of a single treatment with a control and balanced sample sizes between groups, Proschan and Hunsberger [1] showed that the maximum type 1 error rate can be inflated from 0.05 to 0.11. Graf and Bauer [2] extended these arguments to allow for individual sample size reassessment rules in the treatment and control group, respectively, which increases the maximum type 1 error rate to 0.19. However, when selecting one out of k treatments and the control for a second stage, Graf et al. [3] showed that, if the Dunnett test is used to adjust for multiplicity [4], the maximum type 1 error rate may not exceed the pre-specified α-level for specific restrictions on the second-stage sample size reassessment rule, because of the over-correction for the treatments not tested at the end of the study. A large number of hypothesis testing methods have been developed that allow for flexible sample size adaptations (not pre-fixed in advance) without compromising the overall type 1 error rate, based on the combination test approach [5-7] or the conditional error principle [8, 9], and they have been extended to multi-armed clinical trials allowing for treatment selection [6, 7, 10-12].
Whereas the methodological issues concerning hypothesis testing are well understood, up to now it is not clear how to deal with parameter estimation after flexible interim adaptations. Several methods have been proposed to reduce or remove the Bias [12-19]. The Bias depends on many different features, such as the selection procedure, the sample size reassessment rule, or the unknown parameters. The proposed methods therefore only apply to specific adaptation rules and, hence, are not generally applicable. In particular, in designs where adaptation rules are not fixed in advance, estimation is still an unsolved issue, so that in practice the maximum likelihood estimate (MLE) is still used.
Bauer et al. [20] investigated the impact of treatment selection on the mean Bias and the mean squared error (MSE) when selecting those j (out of k) treatments with the largest observed effects while fixing the total per-group sample size. They further considered designs where the sample size is reshuffled equally to the selected treatment arms and the control, with the conclusion that, due to regression to the mean, the Bias decreases as compared with the scenario without reshuffling. To our knowledge, no investigations of other types of sample size and selection rules have been undertaken yet. Hence, the behavior of the MLE is not yet fully understood for adaptive designs.
Adaptive designs have the practically important feature that the selection and sample size reassessment rules need not be fully pre-specified. This complicates the investigation of Bias and MSE, which depend on the actually unknown sample size and selection rules. Simulations or numerical investigations under typical adaptation rules are important but can only give partial answers. We therefore investigate the behavior of the MLE from another point of view: we search for the selection and sample size reassessment rule leading to the maximum mean Bias or maximum MSE when using the MLE at the end of the adaptive trial to estimate the treatment effect. Brannath et al. [16] calculated the maximum mean Bias for the case of a one-sample z-test and concluded that the maximum mean Bias in a flexible two-stage design is in general not larger than that of a conventional group sequential design. We will consider scenarios where more than one treatment group is compared with a common control, and one treatment and the control are selected for the second stage. Moreover, we also allow for flexible choices of the second-stage allocation ratio, permitting, for example, a larger increase in sample size for the selected treatment than for the control group.
The case of unlimited sample sizes provides upper bounds for Bias and MSE. We therefore consider also scenarios with restrictions on the sample size to obtain less conservative estimates for real adaptive trials. We will, for instance, investigate bounded second stage sample sizes as well as the restriction on the control group to have a smaller sample size than that of the treatment group. We will also consider designs with a fixed overall sample size for the control group and designs with a fixed total sample size permitting only a reshuffling between the selected treatment and the control, with and without the restriction of a smaller control group.
We will see in this paper that the maximum mean Bias and maximum MSE of the MLE are independent of the true means in the treatment and control groups, both with and without restrictions on the second-stage sample sizes. As a consequence, they are the same under the null and all alternative hypotheses. This is a very attractive property of the maximum mean Bias and maximum MSE of the MLE that simplifies their investigation and discussion considerably.
The rest of the paper is organized as follows. In Section 2, we describe the type of interim adaptations investigated to calculate the maximum mean Bias and maximum MSE. In Section 3, we investigate the maximum mean Bias and maximum MSE for the case where only k = 1 treatment is compared with one control. In Section 4, we generalize the arguments to the scenario of selecting one out of k > 1 treatments and the control for the second stage, a strategy that is intensively discussed in the literature [11-13, 18, 20]. We end with a discussion of the results in Section 5.

Designs with treatment selection
Assume a clinical trial with parallel groups and a two-stage design that starts at the first stage with k treatments and a control and continues in the second stage with one selected treatment and the control. We assume normally distributed outcomes, X_(i,j,l) ∼ N(μ_i, σ²), i = 0, … , k, where i represents the treatment group, with i = 0 for the control and i = 1, … , k for the experimental treatments, and j ∈ {1, 2} is the index for the stage. The index l stands for the individual, where l = 1, … , n_(i,1) in the first stage and l = 1, … , n_(i,2) in the second stage for each treatment group i. The common variance σ² is assumed to be known.
An interim analysis is performed after recruitment of n_(i,1) patients in the ith experimental treatment group and n_(0,1) in the control group. For simplicity, we assume balanced sample sizes in the first stage, that is, n_(i,1) = n_(0,1) = n for all i = 1, … , k, which is a common scenario. However, the second-stage sample sizes can be unbalanced. Based on the data of the first stage, X_(i,1,l), i = 0, … , k and l = 1, … , n, we select one of the k treatments, say treatment s ∈ {1, … , k}, and the control for the second stage. We may also reassess the second-stage sample sizes based on the first-stage data. In the second stage, n_(s,2) = r_s n patients are recruited in the selected treatment group and n_(0,2) = r_0 n in the control group, where the second-to-first-stage ratios 0 ⩽ r_i ⩽ ∞, i = 0, s, can depend on the first-stage data. Note that the selected treatment (or the control) can also be stopped at interim by setting r_s = 0 (or r_0 = 0). In contrast to the majority of the literature on point estimation in designs with treatment selection, we do not assume a specific selection or sample size reassessment rule and thereby consider the full flexibility permitted by adaptive designs [6, 7, 10, 11].
Treatment selection here means deciding on the treatment 'of interest', for which the effect estimate will be investigated further. For treatments not selected at the interim analysis, we assume that the treatment effect is not of interest at the end of the trial. In the final analysis, the overall effect of the selected treatment versus control is estimated by the maximum likelihood estimators calculated over both stages,

x̄_i = {n x̄_(i,1) + n_(i,2) x̄_(i,2)}/{n + n_(i,2)},  i = 0, s,

where x̄_(i,1) is the sample mean of the first stage and x̄_(i,2) is the sample mean of the second stage for group i = 0, s. If sample size adjustments are performed based on the first-stage data, the overall sample mean x̄_i may be biased (see, e.g., Brannath et al. [16]).
Our intention is to derive the worst case, meaning that we search for the sample size reassessment and selection rules maximizing the mean Bias (denoted in the sequel as 'Bias' for short) or the MSE of the effect estimate of the selected treatment compared with the control. We prefer to consider the 'root mean squared error', RMSE = √MSE, because it is on the same scale as the mean and the Bias. In the context of designs with treatment selection, the Bias and MSE are defined as

Bias = E(x̄_s − x̄_0) − (μ_s − μ_0),  MSE = E[{x̄_s − x̄_0 − (μ_s − μ_0)}²].

These quantities have also been denoted by 'selection Bias' and 'selection MSE' (cf. Bauer et al. [20]). The general idea of this paper is to determine the maximum Bias or maximum MSE by maximizing, at each interim sample point, the conditional Bias or conditional MSE given the interim data. By searching for the treatment selection and sample size adaptation rules that maximize the conditional Bias (or MSE), we obtain the treatment selection and sample size rules that maximize the overall Bias (or MSE). This idea was used in Brannath et al. [16] to obtain the maximum Bias in the one-sample case and, thereby, also in the balanced two-sample case. A similar idea has earlier (and later) been used to determine the maximum type 1 error rate of the naive z-test or Dunnett test [1-3].
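To make these definitions concrete, the following sketch estimates Bias and RMSE by straightforward simulation for one illustrative adaptation rule. The rule (select the treatment with the largest interim mean; double the selected arm's second-stage sample size when the interim effect is small), the sample sizes, and all parameter values are hypothetical choices of ours for demonstration, not rules analyzed in this paper.

```python
import math
import random

def simulate_bias_rmse(k=2, n=50, mu=(0.0, 0.0, 0.0), sigma=1.0,
                       n_sim=20000, seed=1):
    """Monte Carlo estimate of Bias and RMSE of the overall MLE x_s - x_0.

    mu[0] is the control mean, mu[1..k] are the treatment means.
    Hypothetical rule for illustration only: select the treatment with the
    largest interim mean; double its second-stage sample size (r_s = 2) if
    the interim effect is below sigma/4, otherwise keep r_s = 1; the
    control keeps r_0 = 1.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_sim):
        # first-stage sample means (only the means matter for the MLE)
        m1 = [rng.gauss(mu[i], sigma / math.sqrt(n)) for i in range(k + 1)]
        s = max(range(1, k + 1), key=lambda i: m1[i])       # selection
        r_s = 2 if m1[s] - m1[0] < sigma / 4 else 1         # reassessment
        r_0 = 1
        # second-stage sample means under the reassessed sample sizes
        m2_s = rng.gauss(mu[s], sigma / math.sqrt(r_s * n))
        m2_0 = rng.gauss(mu[0], sigma / math.sqrt(r_0 * n))
        x_s = (m1[s] + r_s * m2_s) / (1 + r_s)              # overall MLEs
        x_0 = (m1[0] + r_0 * m2_0) / (1 + r_0)
        errors.append(x_s - x_0 - (mu[s] - mu[0]))
    bias = sum(errors) / n_sim
    rmse = math.sqrt(sum(e * e for e in errors) / n_sim)
    return bias, rmse
```

With all means equal, the selection step alone makes the returned Bias positive; the worst-case analysis of the following sections asks how large Bias and MSE can become over all such rules.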

Two-arm trials with sample size reassessment
For illustrative purposes, we start with the scenario where only one treatment group (k = s = 1) is compared with a control. The results will be generalized to k > 1 in Section 4. We start with a discussion of the maximum Bias and then proceed with a similar investigation of the maximum RMSE.

Maximum Bias
Brannath et al. [16] calculated the maximum Bias of the one-sample mean in a two-stage design with data-driven sample size reassessments. Their result easily generalizes to the treatment effect estimate in a two-arm parallel group design with balanced first- and second-stage sample sizes, because the treatment effect estimate in a balanced two-arm trial is formally equal to the one-sample mean of observations with variance 2σ². According to the result in [16], the maximum Bias in an adaptive two-stage trial with two arms and the restriction n_(0,2) = n_(1,2) becomes

B*(n, σ, r_min, r_max) = √(2σ²/n) φ(0) {1/(1 + r_min) − 1/(1 + r_max)},    (1)

where φ denotes the standard normal density and r_min and r_max are pre-specified lower and upper bounds for the data-driven second-to-first-stage ratio r = n_(0,2)/n = n_(1,2)/n. Note that the maximum Bias is independent of the true means μ_0 and μ_1. We can set r_min = 0 and r_max = ∞ if no such bounds exist. In this case, the maximum Bias becomes B*(n, σ, 0, ∞) = √(2σ²/n) φ(0).

3.1.1. Flexible second-to-first-stage ratios. The restriction n_(0,2) = n_(1,2) may be too strong for applications because it does not permit an unequal increase or decrease of the sample sizes in the two arms. For instance, if the control is a placebo, ethical reasons may advise us to increase the sample size only in the treatment group or even to decrease it in the control arm. A reduction in the placebo allocation ratio will usually also increase the willingness to participate in the trial. We therefore also consider the Bias under unequal sample size adaptations, which can be determined by maximizing the conditional Bias with regard to n_(0,2) and n_(1,2) without the constraint n_(0,2) = n_(1,2). We will see in subparagraph 3.1.5 (where we describe our calculations) that the maximum Bias remains independent of μ_1 and μ_0 for flexible second-stage sample sizes n_(0,2) and n_(1,2).
Note that r_0 = n_(0,2)/n and r_1 = n_(1,2)/n are the individual second-to-first-stage ratios for the control and treatment group, with r_min ⩽ min(r_0, r_1) ⩽ max(r_0, r_1) ⩽ r_max. Figure 1(A) for k = 1 shows the maximum Bias, B*, standardized by the standard error √(2σ²/n) of the first-stage mean. The results shown therefore do not depend on the first-stage sample size n or the common known variance σ². The solid lines in Figure 1(A) show B*/√(2σ²/n) for r_min = 0, 0.5, and 1 and r_max varying from 0 to 3. As expected, the maximum Bias increases with decreasing r_min and increasing r_max, showing that more flexibility leads to a larger maximum Bias. For example, for r_min = 1, meaning that the second-stage sample size has to be at least as large as the first-stage sample size, and r_max = 2, allowing a doubling of the second-stage sample size as compared with the first stage, the maximum Bias is 0.09 times the first-stage standard error, increasing to 0.19 and 0.38 for r_min = 0.5 and 0, respectively. This shows that the option for sample size reductions (including early stopping) can largely increase B*.
The maximum Bias appears to be large for some of the scenarios in Figure 1(A). However, recall that we have plotted B* in units of √(2σ²/n) and that √(2σ²/n) decreases with increasing per-group first-stage sample size n. Assume, for instance, that n is half the total sample size required for a z-test with power 90% at Δ = μ_1 − μ_0 in a classical two-armed parallel group design at one-sided level α = 0.025. Then √(2σ²/n) = Δ {Φ^(−1)(0.975) + Φ^(−1)(0.9)}^(−1) = 0.31Δ, and if (r_min, r_max) = (1, 2), then B* = 0.09 √(2σ²/n) is only 3% of the effect size assumed in the sample size calculation. Allowing for more flexibility, as, for example, (r_min, r_max) = (0, 2), the Bias increases substantially to 12% of the effect size.
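These values can be reproduced directly from the maximum-Bias expression. The helper below encodes our reading of the balanced-case formula, with the flexible case obtained by the factor √2 derived later in subparagraph 3.1.5:

```python
import math

PHI0 = 1.0 / math.sqrt(2.0 * math.pi)   # standard normal density at 0

def max_bias_std(r_min, r_max, flexible=False):
    """Standardized maximum Bias B*/sqrt(2*sigma^2/n).

    Balanced second-stage sample sizes:
        phi(0) * (1/(1 + r_min) - 1/(1 + r_max));
    fully flexible ratios multiply this by sqrt(2)
    (our reading of the formulas in this section).
    """
    bias = PHI0 * (1.0 / (1.0 + r_min) - 1.0 / (1.0 + r_max))
    return math.sqrt(2.0) * bias if flexible else bias
```

For example, `max_bias_std(1, 2, flexible=True)` returns about 0.094, the value quoted above as 0.09, and `max_bias_std(0, math.inf)` returns φ(0) ≈ 0.399.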
3.1.2. Restriction to r_1 ⩾ r_0. A reasonable constraint to reduce the maximum Bias is to require that the experimental treatment group is never smaller than the control group. In this case, r_0 can vary within (r_min, r_max), while r_1 is restricted to (r_0, r_max). The dot-dashed lines in Figure 1(A) show the standardized maximum Bias for this type of restriction; the dotted lines show the standardized maximum Bias if we instead restrict the second-stage sample sizes to be balanced. As expected, the maximum Bias under r_1 ⩾ r_0 is always smaller than the maximum Bias with flexible ratios but larger than the one with balanced second-stage sample sizes; its line lies right in the middle of the two other lines. If r_max = 2, the standardized maximum Bias under the constraint r_1 ⩾ r_0 becomes 0.08, 0.16, and 0.32 for r_min = 1, 0.5, and 0, respectively. We can see that, for r_min ⩾ 0.5, the difference in Bias between the constraints r_1 = r_0 and r_1 ⩾ r_0 is small.

Fixing r_0.
A stronger restriction is to fix the total sample size of the control group (that is, to fix r_0), while in the experimental treatment group sample sizes are reassessed within the window (r_0, r_max). The MLE of the control group is then unbiased. The maximum Bias therefore does not depend on the interim outcome of the control group. However, it depends on the fixed r_0. The dashed lines in Figure 1(A) give the standardized maximum Bias for r_0 = 0, 0.5, or 1, while r_max varies from r_0 to 3. For example, if r_0 = 1 and r_max = 2, then the standardized maximum Bias is 0.05, that is, a little more than half of the Bias with flexible sample size reallocations. We are aware that fixing r_0 = 0 may be an unrealistic scenario, always resulting in a second stage without a control group; however, it is included in the figure for completeness of presentation.

Figure 1(A) indicates that the lower bound r_min for the second-stage sample sizes has quite some impact on the maximum Bias. To further elaborate the impact of r_min, we have calculated the maximum Bias for r_max = ∞ and r_min = 0, 0.5, 1. Table I shows the results for the standardized maximum Bias. The row 'flexible' gives the maximum Bias with flexible second-stage allocation ratios; in the row 'r_1 ⩾ r_0', the sample size of the experimental treatment group is constrained to be at least as large as that of the control group. 'r_0 = r_1' means that the second-stage sample sizes are restricted to be balanced, and 'fix r_0' that the sample size of the control group is fixed. Here, we are interested in the case k = 1 (one experimental treatment only). We observe that the maximum Bias is halved by letting r_min = 1 (i.e., forcing the second-stage sample sizes to be at least as large as the first-stage ones) compared with r_min = 0. With r_min = 0.5, the maximum Bias is reduced by about 33%.
These reduction factors for the maximum Bias seem to be completely independent of the additional restrictions on r_1 and r_0, which is a remarkable finding. A possible explanation is that, for r_max = ∞, the maximum Bias under each restriction scales with the factor 1/(1 + r_min), so that it is dominated by the minimum sample size r_min.

Determination of maximum Bias.
As mentioned in the introduction, the sample size reassessment rule, which maximizes the Bias, is obtained by maximizing the conditional Bias, that is, the deviation of the conditional mean of the treatment effect estimate (given the interim data) from the true parameter value.
For the calculation of the conditional Bias, we standardize the individual stage-wise means,

z_(i,j) = √(n_(i,j)) (x̄_(i,j) − μ_i)/σ,  i = 0, 1,  j = 1, 2.

Recall that n_(1,1) = n_(0,1) = n and our definition of r_i = n_(i,2)/n, i = 0, 1, with r_min ⩽ min(r_0, r_1) ⩽ max(r_0, r_1) ⩽ r_max. To simplify the notation, we will omit the index j for the first-stage data and summaries, for example, denoting the first-stage standardized means by z_i, i = 0, 1. Similar calculations as in [16] give the conditional Bias

CB(z_0, z_1, r_0, r_1, n, σ) = (σ/√n) {z_1/(1 + r_1) − z_0/(1 + r_0)}.    (2)

To evaluate the worst case, the second-to-first-stage ratios r_1 and r_0 are chosen to maximize (2):

C̃B(z_0, z_1, n, σ) = max_{r_min ⩽ r_0, r_1 ⩽ r_max} CB(z_0, z_1, r_0, r_1, n, σ).    (3)

Note again that we assume in the following the same lower and upper bounds r_min, r_max for r_0 and r_1 and that (3) corresponds to the fully flexible case without additional restrictions on r_0 and r_1 (like, e.g., r_1 ⩾ r_0). The generalization to different bounds and additional restrictions on (r_1, r_0) is formally straightforward. Clearly, C̃B depends on the restrictions made for the ratios r_i. To assess the worst-case reassessment rule for a given interim result, the true means of the treatment and control group would have to be known. However, our intention is to evaluate an upper bound for the overall Bias. The maximum Bias B* is evaluated by integrating the maximum conditional Bias over all interim outcomes:

B*(n, σ, r_min, r_max) = ∫∫ C̃B(z_0, z_1, n, σ) φ(z_0) φ(z_1) dz_0 dz_1.    (4)

Obviously, the maximum Bias B*(n, σ, r_min, r_max) does not depend on the unknown μ_0 and μ_1. Fortunately, (2) is the sum of two terms depending only on r_1 or r_0. Hence, in the fully flexible case, we can maximize each term separately in r_1 and r_0, respectively. Denoting the worst-case sample size fractions by r̃_i, i ∈ {0, 1}, we obtain r̃_1 = r_min for z_1 > 0 and r̃_1 = r_max for z_1 < 0. Similarly, r̃_0 = r_max for z_0 > 0 and r̃_0 = r_min for z_0 < 0. Figure (A) in the Appendix shows the four subsets of the interim outcome space corresponding to the four values of the tuple (r̃_0, r̃_1).
The maximum Bias, B*, is obtained by integrating the maximum conditional Bias in each subset and summing up the four integrals. This leads to

B*(n, σ, r_min, r_max) = √2 √(2σ²/n) φ(0) {1/(1 + r_min) − 1/(1 + r_max)}.    (5)

Without any restrictions on the second-stage sample size reassessment rule, that is, setting (r_min, r_max) = (0, ∞), the maximum Bias simplifies to B*(n, σ, 0, ∞) = √2 √(2σ²/n) φ(0). Comparison of (1) and (5) reveals that, when dropping the constraint of equal second-stage sample sizes, the maximum Bias is increased by the factor √2, that is, by about 41%.

We finally note how to account for constraints like r_1 ⩾ r_0. To account for r_1 ⩾ r_0, we need to rule out that r_1 = r_min and r_0 = r_max simultaneously. For z_0 > 0 and z_1 > 0, we therefore maximize CB(z_0, z_1, r_0, r_1, n, σ) under the constraint r_1 = r_0. In this case, the maximum depends on z_1 − z_0: it is attained for r_1 = r_0 = r_min if z_1 − z_0 > 0 and otherwise for r_1 = r_0 = r_max. The maximization of CB(z_0, z_1, r_0, r_1, n, σ) under the constraint of a fixed r_0 follows similar lines as in the fully flexible case (leading to a rule that depends on z_1 only).

[Table I. Standardized maximum Bias, B*_k √(n/(2σ²)), as well as the standardized maximum root mean squared error, √(MSE*_k n/(2σ²)), for different restrictions on the sample size reassessment rules: flexible r_s and r_0, balanced second-stage sample sizes (r_s = r_0), a larger second-stage sample size in the treatment group (r_s ⩾ r_0), and a fixed sample size in the control group (fix r_0). Values are given for r_min = 0, 0.5, and 1, setting r_max = ∞. For comparison, the fixed design with r_min = r_max is given, showing the maximum selection Bias and mean squared error, respectively.]
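The closed-form result (5) can be cross-checked numerically by applying the worst-case rule pointwise and integrating the conditional Bias on a grid. The sketch below does this; grid range and step size are arbitrary numerical choices, and the result is reported in units of √(2σ²/n):

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def max_bias_flexible_std(r_min, r_max, h=0.01, z_max=8.0):
    """Standardized maximum Bias under fully flexible ratios.

    Applies the worst-case rule pointwise
    (r1 = r_min if z1 > 0 else r_max; r0 = r_max if z0 > 0 else r_min)
    and integrates on a midpoint grid; the two terms of the conditional
    Bias separate, so a one-dimensional integration suffices.
    """
    total = 0.0
    steps = int(round(2.0 * z_max / h))
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        r1 = r_min if z > 0 else r_max
        r0 = r_max if z > 0 else r_min
        total += phi(z) * (z / (1.0 + r1) - z / (1.0 + r0)) * h
    return total / math.sqrt(2.0)   # units of sqrt(2*sigma^2/n)
```

For (r_min, r_max) = (0, ∞), the routine recovers √2 φ(0) ≈ 0.56, and for (1, 2) the value 0.09 from Figure 1(A).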
3.1.6. Reshuffling. Assume now that a sample size of n_g patients per group is pre-planned over both stages, resulting in a total of 2n_g patients in the trial. This overall patient number is kept fixed. The interim analysis is performed after recruitment of tn_g patients per group, where t ∈ (0, 1). The ratio t can be interpreted as the timing of the interim analysis. To keep the overall sample size, the second stage needs to consist of 2(1 − t)n_g patients in total. This number is allocated to the experimental treatment and control group in a data-dependent manner. This means that in the interim analysis, a second-stage sample size allocation rate v, 0 ⩽ v ⩽ 1, is chosen based on the interim results, such that in the second stage a number of v2(1 − t)n_g patients is allocated to the control and (1 − v)2(1 − t)n_g to the experimental treatment group. The conditional Bias (2) can be rewritten as

CB(z_0, z_1, v, n_g, t, σ) = {σ/√(tn_g)} [z_1/{1 + (1 − v)w_t} − z_0/(1 + v w_t)],    (6)

where we use the notation w_t = 2(1/t − 1) for mathematical convenience. The allocation rate 0 ⩽ v ⩽ 1 is now chosen to maximize the conditional Bias (6). Setting the first derivative of the conditional Bias to zero yields a quadratic equation whose two roots, v^(1) and v^(2), are candidates for the ṽ maximizing the conditional Bias. The candidates v^(1) and v^(2) only exist if z_0 and z_1 have different signs and z_0 ≠ −z_1. If z_0 = −z_1 > 0, one can see from (6) that the conditional Bias is maximized at v^(3) = 1/2. Furthermore, v^(1) and v^(2) are ineligible if larger than 1 or smaller than 0. Whether v^(1) or v^(2) is actually the maximizer depends on z_0 and z_1. To assess the global maximum, the boundary candidates v^(4) = 0 and v^(5) = 1 also have to be investigated. Note that for z_0 = −z_1 < 0, the conditional Bias at the candidates v^(4) and v^(5) coincides and gives the maximum. The worst-case conditional Bias is the maximum over the five candidates.
Figure (B) in the Appendix shows the subspaces of the interim outcome, in terms of the standardized means in the treatment and control groups, corresponding to the different maximizers v^(i) for t = 0.5. The white area gives the subspace where either v^(1) or v^(2) is the global maximum. The dashed line gives the subspace where v^(3) is the global maximum. It can be seen that v^(1) is never the global optimum for t = 0.5. Numerical integration can be used to compute the overall Bias:

B*(n_g, t, σ) = ∫∫ max_{0 ⩽ v ⩽ 1} CB(z_0, z_1, v, n_g, t, σ) φ(z_0) φ(z_1) dz_0 dz_1.    (7)

For the numerical integration, we used the R package R2Cuba [21]. In the following, we also show results under the restriction 0 ⩽ v ⩽ 0.5, which guarantees that the second-stage sample size of the experimental treatment group is never smaller than that of the control group. The solid black line, marked with 1 in Figure 3(A), shows the standardized maximum Bias as a function of the timing of the interim analysis t for k = 1 and 0 ⩽ v ⩽ 1. The maximum Bias is now standardized by the standard error of a fixed-size-sample test with per-group sample size n_g, that is, √(2σ²/n_g). We do not standardize with the standard error of the interim estimate because it depends on t. The dashed line (marked with 1) gives the standardized maximum Bias under the restriction 0 ⩽ v ⩽ 0.5. One can see that (for k = 1) the standardized maximum Bias decreases with increasing t, that is, the later the interim analysis, the smaller the maximum Bias. This is due to the larger first-stage and smaller second-stage sample sizes. For t = 0.5, that is, planning the interim analysis halfway, the standardized maximum Bias is 0.40 if 0 ⩽ v ⩽ 1 and decreases to 0.21 if 0 ⩽ v ⩽ 0.5.
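A brute-force alternative to the root candidates is to maximize the conditional Bias over a fine grid of allocation rates v and integrate numerically. The sketch below does this for the reshuffling design (grid sizes are our own tuning choices) and reproduces the standardized values 0.40 and 0.21 quoted above for t = 0.5:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def max_bias_reshuffle_std(t, v_hi=1.0, hz=0.1, z_max=5.0, nv=26):
    """Standardized maximum Bias under reshuffling (brute force).

    Maximizes the conditional Bias over a grid of allocation rates v in
    [0, v_hi] and integrates over the interim outcomes (z0, z1);
    standardization is by sqrt(2*sigma^2/n_g).
    """
    w = 2.0 * (1.0 / t - 1.0)
    vs = [v_hi * i / (nv - 1) for i in range(nv)]
    nz = int(round(2.0 * z_max / hz))
    grid = [-z_max + (i + 0.5) * hz for i in range(nz)]
    total = 0.0
    for z1 in grid:
        for z0 in grid:
            cb = max(z1 / (1.0 + (1.0 - v) * w) - z0 / (1.0 + v * w)
                     for v in vs)
            total += phi(z0) * phi(z1) * cb * hz * hz
    return total / math.sqrt(2.0 * t)
```

The grid search is far slower than the five-candidate evaluation but requires no case distinctions, which makes it a convenient cross-check.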

Maximum mean squared error
To maximize the MSE, we proceed similarly to the maximum Bias calculation. For each interim outcome, the sample size reassessment rule is searched that maximizes the conditional MSE (worst case). The conditional MSE, given the interim outcome, can be calculated as follows (see Appendix A.1):

CMSE(z_0, z_1, r_0, r_1, n, σ) = (σ²/n) [{z_1/(1 + r_1) − z_0/(1 + r_0)}² + r_1/(1 + r_1)² + r_0/(1 + r_0)²].    (9)

For each z_0 and z_1, r_0 and r_1 are searched to maximize the CMSE:

C̃MSE(z_0, z_1, n, σ) = max_{r_min ⩽ r_0, r_1 ⩽ r_max} CMSE(z_0, z_1, r_0, r_1, n, σ),    (10)

where r_min and r_max again denote the lower and upper bounds for the second-to-first-stage ratios r_i, i = 0, 1. Additional constraints on (r_0, r_1), like r_1 ⩾ r_0, need to be accounted for in the maximum (10). Integrating over all interim outcomes gives the maximum MSE, denoted by MSE* in the sequel,

MSE*(n, σ, r_min, r_max) = ∫∫ C̃MSE(z_0, z_1, n, σ) φ(z_0) φ(z_1) dz_0 dz_1.    (11)

Note that MSE* is also independent of the group means μ_i, i = 0, 1.
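The conditional MSE is simple to evaluate numerically. The sketch below encodes our reading of (9) in units of σ²/n and checks it against the fixed design: with both ratios fixed at r, the expectation over the interim outcomes must reduce to 2/(1 + r), the MSE of a fixed-size-sample test with per-group sample size (1 + r)n:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cmse_std(z0, z1, r0, r1):
    """Conditional MSE of the MLE difference, in units of sigma^2/n
    (our reading of Equation (9))."""
    cb = z1 / (1.0 + r1) - z0 / (1.0 + r0)              # conditional Bias
    var = r1 / (1.0 + r1) ** 2 + r0 / (1.0 + r0) ** 2   # conditional variance
    return cb * cb + var

def expected_cmse_fixed(r, h=0.05, z_max=6.0):
    """Expectation of cmse_std over (z0, z1) with both ratios fixed at r;
    it should equal 2/(1 + r), the fixed-design MSE in the same units."""
    nz = int(round(2.0 * z_max / h))
    grid = [-z_max + (i + 0.5) * h for i in range(nz)]
    return sum(phi(z0) * phi(z1) * cmse_std(z0, z1, r, r) * h * h
               for z0 in grid for z1 in grid)
```

In the standardization used below (units of 2σ²/n), the fixed-design RMSE is √{1/(1 + r)}, that is, 0.82 for r = 0.5 and 0.71 for r = 1, matching the values quoted in the following subsections.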

Flexible second-to-first-stage ratios.
We start by investigating the case of completely flexible r_1 and r_0 within the bounds (r_min, r_max). Note again that we assume equal bounds for the treatment and the control group; however, the sample size reassessment rules for the treatment and control group can differ. To maximize the CMSE in (9) for given z_0 and z_1 at interim, a total of nine candidates has to be investigated, and the global maximum is the maximum over these nine candidates. Integrating over all interim outcomes gives the MSE*. Details can be found in Appendix A.2. The solid lines in Figure 2(A) show the maximum RMSE, say RMSE*, divided by the standard error of the first-stage mean difference, that is, √(2σ²/n). Note that we use here the same standardization as for the maximum Bias and that the standardized RMSE* does not depend on n or σ. As for the Bias, the standardized RMSE* decreases with increasing r_min and decreasing r_max. Setting r_max = 2 and r_min = 0, RMSE* is 1.10 times the first-stage standard error. Increasing r_min to 0.5 or 1, the values decrease to 0.84 and 0.71.
The gray horizontal line through 1 represents the standardized RMSE of the first-stage mean difference. For the investigated r_min ⩾ 0.5, the standardized RMSE* is always smaller than 1, meaning that we gain precision from the second stage, independently of the sample size reassessment rule. If r_min = 0, RMSE* is larger than the first-stage RMSE, indicating that we can lose precision if sample sizes are reassessed and the trial can be stopped at interim (compared with a trial that consists of the first stage only). The latter is because of the Bias that is possible under sample size reductions and early stopping (Figure 1(A)).
Setting r_min = r_max > 0 gives the RMSE of a fixed-size-sample test with a sample size larger than the first stage. For example, for r_min = r_max = 0.5, the standardized RMSE is 0.82, decreasing to 0.71 for r_min = r_max = 1. It is interesting to see that, for r_min ⩾ 0.5, RMSE* under flexible sample size reassessments increases only slightly in r_max and remains close to the RMSE of the fixed-size-sample test with second-stage per-group sample size r_min n. Hence, for sufficiently large r_min, the Bias from any adaptive sample size increase will not have a substantial negative effect on the precision of the overall maximum likelihood estimate.
The rows 'flexible' in Table I show the standardized RMSE* when setting r_max = ∞ for r_min = 0, 0.5, 1. Without any restrictions on the reassessment rule, setting (r_min, r_max) = (0, ∞), the standardized RMSE* is 1.13. Setting r_min = 1 and r_max = ∞, it is 0.72, as compared with 0.71 for the corresponding fixed-size-sample test.

Balanced second-stage sample sizes.
Restricting the second-stage sample sizes to be balanced between the groups (r = r_1 = r_0) reduces the CMSE (9) to

CMSE(y, n, σ, r) = (2σ²/n) (y² + r)/(1 + r)²,

where y = (z_1 − z_0)/√2 is standard normally distributed. Setting the first derivative to zero, a candidate for the global maximum is r^(1) = 1 − (z_1 − z_0)². By calculating the second derivative at the point r^(1), it can be shown that r^(1) is a maximum if |z_1 − z_0| ⩽ √2. This candidate is the global maximum if r_min ⩽ r^(1) ⩽ r_max; otherwise, the global maximum is attained at r^(2) = r_min or r^(3) = r_max. The worst-case CMSE is the maximum over the three candidates:

C̃MSE(y, n, σ, r_min, r_max) = max_{i = 1, 2, 3 : r_min ⩽ r^(i) ⩽ r_max} CMSE(y, n, σ, r^(i)).

The dotted lines in Figure 2(A) show the standardized RMSE* for the case of equal second-stage sample sizes in the groups. The restriction to balanced sample sizes decreases the RMSE*, the decrease being smaller for larger r_min. Setting r_max = 2, the standardized RMSE* is 1.04, 0.82, and 0.71 for r_min = 0, 0.5, or 1, respectively. Note that, for r_min ⩾ 0.5, the lines corresponding to the different restrictions are indistinguishable. The rows 'r_1 = r_0' for k = 1 in Table I show the standardized RMSE* for r_max = ∞. Without any restrictions (r_min = 0), the standardized RMSE* is 1.04 and, hence, can still be larger than the RMSE of the first stage. For r_min = 0.5 and 1, the standardized RMSE* is more or less equal to the standardized RMSE of the corresponding fixed-size-sample test with second-stage per-group sample size r_min n. This shows that, for sufficiently large r_min, the worst-case Bias from data-driven, balanced second-stage sample size increases has a more or less negligible effect on the precision of the overall maximum likelihood estimate.
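The three-candidate maximization for the balanced case is easy to verify numerically: maximize over r^(1), r_min, and r_max for each y and integrate. The sketch below (a one-dimensional grid integration with our own step-size choices) reproduces the standardized RMSE* of 1.04 for (r_min, r_max) = (0, ∞) and the fixed-design value 0.71 for r_min = r_max = 1:

```python
import math

def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def cmse_balanced_std(y, r):
    """Balanced-case conditional MSE in units of 2*sigma^2/n."""
    return (y * y + r) / (1.0 + r) ** 2

def rmse_star_balanced(r_min, r_max, h=0.001, y_max=8.0):
    """Standardized maximum RMSE under balanced reassessment.

    For each y, the conditional MSE is maximized over the candidates
    r(1) = 1 - (z1 - z0)^2 = 1 - 2*y^2 (interior stationary point),
    r_min, and r_max; the worst case is then integrated over y ~ N(0, 1).
    """
    r_cap = min(r_max, 1e12)     # avoid inf/inf when r_max is infinity
    total = 0.0
    steps = int(round(2.0 * y_max / h))
    for i in range(steps):
        y = -y_max + (i + 0.5) * h
        cands = [r_min, r_cap]
        r1 = 1.0 - 2.0 * y * y
        if r_min <= r1 <= r_max:
            cands.append(r1)
        total += phi(y) * max(cmse_balanced_std(y, r) for r in cands) * h
    return math.sqrt(total)
```

The reduction to a one-dimensional integral is what makes the balanced case so much cheaper than the fully flexible one.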

3.2.3. Restricting the treatment to r_1 ⩾ r_0. The dot-dashed lines in Figure 2(A) show the standardized RMSE* when restricting the sample size of the treatment group to be at least as large as the sample size of the control group. Setting r_max = 2, the standardized RMSE* is 1.07, 0.83, and 0.71 for r_min = 0, 0.5, or 1, respectively, which is only slightly larger than RMSE* under balanced second-stage sample sizes. The rows 'r_1 ⩾ r_0' for k = 1 in Table I show the maximum for r_max = ∞. Without any restrictions (r_min = 0), the standardized RMSE* is 1.09. Here, we see some inflation of the RMSE* compared with the one under the constraint r_1 = r_0.

3.2.4. Fixing r 0 . Note that, when fixing r 0 , the MLE of the control group is unbiased. In the maximization of the CMSE, only three of the nine candidates remain. Two candidates are derived by setting r 1 = r 0 (= r min ) or r 1 = r max . The third candidate can be calculated as candidate r (6) in the maximization of the CMSE with flexible ratios in Appendix A.2, with r min replaced by r 0 . The dashed lines in Figure 2(A) show the standardized RMSE * , assuming r 0 = 0, 0.5, or 1 and r 0 ⩽ r 1 ⩽ r max . As expected, the standardized RMSE * is smaller than the one with flexible reassessment in the control; however, the difference decreases with increasing r min . Setting r max = 2, the standardized RMSE * is 1.06, 0.83, and 0.71 for r 0 = 0, 0.5, and 1, respectively. We can see from Table I and Figure 2(A) that, for sufficiently large r min or large r max , the differences in RMSE * between the rule with fixed r 0 and the one with r 1 ⩾ r 0 are only small, so that there is no substantial gain in (minimum) precision from fixing r 0 (the lines in Figure 2(A) are indistinguishable).

3.2.5. Reshuffling.
In case of reshuffling, the CMSE can be rewritten as in equation (14), where, as before, w t = 2(1∕t − 1). Recall that the per-group first-stage sample size is tn g , and the total overall sample size is fixed at 2n g . At the second stage, 2v(1 − t)n g patients are allocated to the control and 2(1 − v)(1 − t)n g patients to the treatment group. By setting the first derivative of (14) to zero, candidates for the global maximum are found. This problem can be reduced to finding the roots of a third-degree polynomial; therefore, at most three candidates (v (1) , v (2) , v (3) ) must be assessed. Note that we did not derive these candidates analytically. Instead, we used the R-function polyroot [22] for the numerical root finding. Considering furthermore v (4) = 0 and v (5) = 1, the worst case CMSE is the maximum over five candidates. Integrating over all interim outcomes gives the maximum MSE, denoted as before by MSE * . Figure (D) in the Appendix gives the subspaces of the interim outcomes of treatment and control used to evaluate the worst case CMSE. In the white area, either v (1) , v (2) , or v (3) is the global maximizer. As for the maximum Bias, we will furthermore also give results when restricting 0 ⩽ v ⩽ 0.5, which means that a larger sample size has to be allocated to the treatment group.
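The five-candidate maximization over v can be mirrored in Python, with `numpy.roots` playing the role of R's `polyroot`. This is a sketch under the assumption of a placeholder objective and a placeholder derivative polynomial; the paper's actual CMSE expression (14) would supply both, and the function name `worst_case_v` is ours.

```python
import numpy as np

def worst_case_v(deriv_coeffs, objective, v_lo=0.0, v_hi=1.0):
    """Maximize `objective` over v in [v_lo, v_hi]. The interior candidates
    v^(1), v^(2), v^(3) are the real roots of the third-degree derivative
    polynomial (coefficients ordered highest degree first, as numpy.roots
    expects); v^(4) = v_lo and v^(5) = v_hi are the boundary candidates."""
    roots = np.roots(deriv_coeffs)
    interior = [r.real for r in roots
                if abs(r.imag) < 1e-10 and v_lo <= r.real <= v_hi]
    candidates = interior + [v_lo, v_hi]
    values = [objective(v) for v in candidates]
    i = int(np.argmax(values))
    return candidates[i], values[i]

# toy example: derivative polynomial with roots 0.2, 0.5, and 2,
# so only two roots land inside [0, 1]
coeffs = [1.0, -2.7, 1.5, -0.2]
v_star, _ = worst_case_v(coeffs, lambda v: -(v - 0.5) ** 2)
```

The root outside the admissible range (here v = 2) is discarded, exactly as candidates falling outside [0, 1] are discarded in the paper's search.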
The solid line marked with 1 (the case k = 1) in Figure 3 (B) shows RMSE * for 0 ⩽ v ⩽ 1 divided by the standard error of a fixed-size-sample test with per-group sample size n g . The standardized RMSE * is plotted as a function of the timing of the interim analysis t. The dashed line marked with 1 gives the corresponding standardized RMSE * if 0 ⩽ v ⩽ 0.5. Like the Bias, the standardized RMSE * is decreasing with increasing t. For t = 0.5, the standardized RMSE * is 1.39 if 0 ⩽ v ⩽ 1 and decreases to 1.23 if 0 ⩽ v ⩽ 0.5. Note that the standardized RMSE * is always larger than 1, that is, the RMSE with sample reshuffling (between the experimental and control group) is always larger, and for small t substantially larger, than the RMSE of the reference fixed-sample design with the same overall sample size.

4. Multi-arm trials with interim treatment selection
In this section, we consider two-stage designs, which start with a control and k > 1 experimental treatment groups and where one experimental treatment, say treatment s ∈ {1, … , k}, and the control are selected for the second stage. The second stage sample sizes are then set based on the interim results. Again, we assume balanced sample sizes in the first stage, while in some of our rules the second-stage sample sizes are permitted to be unbalanced.

4.1. Maximum Bias
To evaluate the maximum Bias, we search for the selection and sample size adaptation rules that maximize the Bias. These are obtained by first maximizing, for each MLE X̄ i − X̄ 0 , the conditional Bias (see formula (2) for k = 1) with respect to the sample size fractions r i and r 0 and then selecting the treatment s with the largest maximized conditional Bias (15). Integrating over all interim outcomes gives the worst case Bias (16), where s is data-dependently determined as in (15) and z s , z 0 denote the observed interim outcomes of the selected treatment and control group, respectively. Note that, for given z i , each CB(z 0 , z i , n, σ, r min , r max ) can be calculated according to the case of k = 1 (Section 3.1).

4.1.1. Flexible second-to-first-stage ratios.
In case of flexible second-to-first-stage ratios, r 0 is maximized independently from r s (see formula (2) with r 1 replaced by r s ). Because the conditional Bias, and thereby also its maximum CB, is increasing in z s for fixed z 0 , equality (17) holds. This means that the treatment with the largest worst case conditional Bias at interim is the treatment with the largest observed z i , that is, z s = max i=1,…,k z i , and (15) reduces to the selection of this treatment. The maximum Bias (16) can therefore be reduced to a two-dimensional integral, where Φ denotes the cumulative distribution function of the standard normal distribution. Note that the probability density function of the maximum of k independent standard normal variables is kΦ(x) k−1 φ(x). Like for k = 1, the maximum Bias is independent from μ i for all i = 0, 1, … , k. The solid lines in Figure 1(B) to (F) show the standardized maximum Bias for k = 2 to 6 as a function of r max for r min = 0, 0.5, and 1. The maximum Bias is standardized by the first-stage standard error √(2σ 2 ∕n) of one treatment-to-control comparison. Because of this standardization, the shown Biases are also independent of the first stage sample size n and the common variance σ 2 .
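The distributional fact used here, that the maximum of k independent standard normals has CDF Φ(x) k and hence density kΦ(x) k−1 φ(x), is easy to verify by simulation; the sketch below uses only the Python standard library and checks the CDF form directly.

```python
import math
import random

def std_norm_cdf(x):
    """Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(1)
k, n_sim, x = 4, 200_000, 0.8

# P(max of k iid standard normals <= x) should equal Phi(x)^k,
# the CDF whose derivative is k * Phi(x)^(k-1) * phi(x)
hits = sum(max(random.gauss(0.0, 1.0) for _ in range(k)) <= x
           for _ in range(n_sim))
empirical = hits / n_sim
theoretical = std_norm_cdf(x) ** k
assert abs(empirical - theoretical) < 0.01
```

With 200,000 replications, the Monte Carlo standard error is roughly 0.001, so the 0.01 tolerance is comfortable.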
The gray horizontal line shows the standardized Bias for a fixed sample size of n patients per treatment group and post-trial selection [20], which results here from setting r min = r max = 0. Setting r min = r max > 0 gives an adaptive design where in an interim analysis, one treatment and the control are selected for the second stage and a second stage with a fixed sample size is performed. This means that r min = r max gives the selection Bias without any additional Bias because of sample size reassessment.
As expected, the standardized maximum Bias is increasing with increasing k. For flexible reassessment rules, the difference of the maximum Bias to the pure selection Bias (the case r min = r max ) is large for all shown k; however, it is not increasing (rather decreasing) in k. Again, the maximum Bias is effectively decreased by increasing r min .

4.1.2. Maximum Bias under restrictions on the second-stage sample-size ratios.
Obviously, equality (17) holds also under restrictions like r 0 = r s , r 0 ⩽ r s or a fixed r 0 . Hence, we can also utilize the mathematical results from Section 3.1 when restricting the second-to-first-stage ratios.
The dotted and the dashed-dotted lines in Figure 1 (B) to (F) show the standardized maximum Biases for k = 2 to 6 with balanced second-stage sample sizes (r s = r 0 ) and under the restriction r s ⩾ r 0 , respectively. Both restrictions substantially reduce the maximum Bias. For k ⩾ 2, we see only a small difference between the maximum Bias with balanced sample sizes and the one with the restriction r s ⩾ r 0 .
The difference of the maximum Bias to the selection Bias (r min = r max ), that is, the additional Bias due to sample size reassessment, is still rather large for r min = 0 but becomes substantially smaller for r min ⩾ 0.5 and is decreasing with increasing number of treatments k. This means that, with a larger number of treatments, the selection Bias dominates the Bias from data-driven sample-size reassessments. Fixing r 0 (dashed lines) leads to a further reduction in the maximum Bias, which is then close to the selection Bias.

4.1.3. Maximum Bias for r max = ∞.
To investigate the influence of r min more carefully, we give in Table I the standardized maximum Biases for different r min , setting r max = ∞ also for k ⩾ 2. For comparison, the rows r min = r max contain the selection Bias without any sample size reassessments. The maximum Bias is decreasing in r min also in this fixed sample size case because of the increasing second stage sample sizes. Like for the case k = 1, the reduction in the maximum Bias due to an increase in r min is more or less independent from the further restrictions on r s and r 0 , and it seems independent from k: the maximum Bias always decreases by about 33% when setting r min = 0.5 (compared with r min = 0) and by about 50% for r min = 1.
The table confirms the finding that the restriction r s ⩾ r 0 leads to a substantial reduction in the maximum possible Bias, while the restriction to balanced second-stage sample sizes does not lead to a substantial further reduction. For k ⩾ 2 and r min ⩾ 0.5, fixing r 0 has some (but not a large) additional effect on the maximum Bias and brings the Bias close to the pure selection Bias (r min = r max ). We may deduce from these findings that a data-driven increase in the sample size of the selected experimental treatment group will (when initially large enough) not lead to substantial additional Bias.

4.1.4. Reshuffling. Like in Section 3.2.5, we assume now that a total sample size of n g patients per group is pre-planned for the two stages, whereby tn g per group are used in the first stage, t denoting the timing of the interim analysis. As a consequence, the overall pre-planned second-stage sample size is (1 − t)n g (k + 1). Now, in the interim analysis, one treatment is selected, and the second-stage sample size (1 − t)n g (k + 1) is reshuffled between the selected treatment and control; that is, for some v ∈ (0, 1), (1 − v)(1 − t)n g (k + 1) patients are allocated to the selected experimental treatment and v(1 − t)n g (k + 1) patients to the control group. The conditional Bias can be calculated as (6) with w t = (k + 1)∕t − (k + 1). Note that the sample size over both stages is tn g + (1 − v)(1 − t)n g (k + 1) in the selected treatment and tn g + v(1 − t)n g (k + 1) in the control group. It can be shown that equality (17) holds also in the case of reshuffling (Appendix A.3), so that in the calculation of the maximum Bias B * k , the (k + 1)-dimensional integral can be reduced to a two-dimensional integral. Figure 3(A) shows values of B * k standardized by the standard error of a two-group fixed-size-sample test with sample size n g , that is, √(2σ 2 ∕n g ). The standardized maximum Bias is shown as a function of t for k = 1 to 6. The solid lines show the values for 0 ⩽ v ⩽ 1 and the dashed lines for 0 ⩽ v ⩽ 0.5. Recall that v ⩽ 0.5 corresponds to the constraint that the control group is at most as large as the selected experimental treatment group. For comparison, the gray solid lines give the maximum Bias for an adaptive design with interim selection of one treatment and control at time point t, the second-stage sample size being (1 − t)n g (k + 1)∕2 per group. This is the selection Bias without additional sample size reassessment. The selection Bias is 0 for t = 0 because we then perform a fixed-sample-size test with only one treatment and control.
It is increasing with increasing t, reaching the selection Bias of a trial with post-trial selection at t = 1. This is equivalent to setting r min = r max = 0 in Figure 1. The standardized maximum Bias (including the selection Bias and the Bias due to sample size reassessment) is decreasing with increasing t for v ⩽ 1; however, there is a non-monotonic behavior of the standardized maximum Bias if v is restricted to be smaller than 0.5. The maximum Bias depends on both the selection Bias and the Bias due to additional sample-size reassessment. The selection Bias is increasing with t, and the Bias due to sample size reassessment is decreasing with t. This leads to a trade-off between both types of Bias for k ⩾ 1.
For t = 0.5 and k = 2, 3, 4, the standardized maximum Bias is 0.80, 1.00, 1.14 if v ⩽ 1 and 0.43, 0.50, 0.52 if v ⩽ 0.5, respectively. For comparison, the selection Bias is 0.23, 0.28, and 0.29 for k = 2, 3, and 4. In summary, a sample-size reshuffling between the selected treatment and the control group can lead to a substantial Bias. The maximum Bias is halved by the constraint that the control group is never larger than the selected experimental treatment group.
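The sample-size bookkeeping for reshuffling can be checked directly: whatever t and v are chosen, the stage sizes above always add up to the fixed overall total of (k + 1)n g patients. A small sketch (the function name is ours):

```python
def total_sample_size(k, n_g, t, v):
    """Overall sample size under reshuffling: each of the k+1 groups gets
    t*n_g patients in stage 1; at interim, the planned second-stage total
    (1-t)*n_g*(k+1) is split between the selected treatment (fraction 1-v)
    and the control (fraction v); the k-1 dropped arms get nothing more."""
    stage1 = (k + 1) * t * n_g
    stage2_treatment = (1 - v) * (1 - t) * n_g * (k + 1)
    stage2_control = v * (1 - t) * n_g * (k + 1)
    return stage1 + stage2_treatment + stage2_control

# the total is (k+1)*n_g regardless of the interim time t and the split v
for k in (1, 3, 6):
    for t in (0.25, 0.5, 0.75):
        for v in (0.0, 0.3, 0.5, 1.0):
            assert abs(total_sample_size(k, 100, t, v) - (k + 1) * 100) < 1e-9
```

This makes explicit why the reshuffling results are standardized differently from the flexible-ratio results: here the overall sample size is fixed by design.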

4.2. Maximum mean squared error
To evaluate the maximum MSE, we proceed similarly to the evaluation of the maximum Bias. The selection rule maximizing the MSE is to select the treatment with the maximum worst case CMSE based on the interim result (19). Note that the treatment with the maximum worst case CMSE is not necessarily the treatment with the maximum observed z i at interim or the treatment with the maximum absolute difference to the control |z i − z 0 |, because the conditional MSE (9) cannot be written as a function of z i − z 0 . The maximum MSE is a (k + 1)-dimensional integral over all interim outcomes, where CMSE is calculated as discussed in Section 3.2 and s is chosen as in (19). To evaluate the (k + 1)-dimensional integral, numerical integration was performed using the R-package R2Cuba [21]. Figure 2(B) to (F) shows the standardized RMSE * k for k = 2 to 6. As for k = 1, the maximum RMSE was standardized by the standard error of the first stage, that is, √(2σ 2 ∕n).

4.2.1. Results.
The solid lines show the scenario with full flexibility of the reassessment rules within the boundaries (r min , r max ), and the dashed lines the case where the sample size of the control group is fixed. The dot-dashed lines show the values when restricting the sample size of the treatment group to be at least as large as the sample size of the control group (r s ⩾ r 0 ), and the dotted lines when restricting the second stage sample sizes to be balanced (r 0 = r s ). For comparison, the dashed gray horizontal line shows the standardized RMSE of a fixed-sample-size test when selecting the treatment with the maximum effect at the end. The solid gray horizontal line represents the case r min = r max = 0 where we select, after n observations per group, the treatment with maximum CMSE. By definition (see also formula (9)), for r min = r max = 0, the CMSE is simply the square of the difference between the estimated and true effect.
Note also that, if we restrict the second stage sample size to be balanced between groups, the treatment with the maximum CMSE at interim is the treatment with the maximum observed |z i − z 0 |. Again, the values for r min = r max give the selection RMSE without additional sample size reassessment. We can see from the figures that even though the adaptive sample size reassessment may increase the Bias substantially, it has only a small effect on the RMSE; that is, sample-size reassessments do not increase the RMSE much over the RMSE under treatment selection, at least when sample-size reductions are limited to r min ⩾ 0.5. Especially for r min ⩾ 0.5, the lines for the different restrictions are indistinguishable because of the small differences between the results. Table I gives the standardized RMSE * k for several scenarios setting r max = ∞. Recall that, for comparison, the rows r min = r max show the RMSE under treatment selection only. Like the maximum Bias, RMSE * k is increasing with increasing k; however, for r min > 0, the additional increase in RMSE * k due to sample size reassessment is small, in particular under the additional restrictions on r s and r 0 . The difference becomes smaller, the larger r min and k are. The difference between the fixed and adaptive sample size case is particularly small with balanced second stage sample sizes. This may be due to the fact that balanced sample sizes are optimal with regard to the variance of the second-stage effect estimate. Moreover, aiming at the reduction of RMSE * k , we find for k ⩾ 1 that fixing r 0 is no more effective, and can even be less effective, than the restriction to balanced sample sizes (in contrast to what we find for the maximum Bias). Again, there is no large difference between the constraints r s ⩾ r 0 and r s = r 0 , in particular for larger k.
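As an alternative to the deterministic cubature (the R2Cuba package) used for the (k + 1)-dimensional integral over the interim outcomes, plain Monte Carlo gives a quick approximation. The sketch below uses a placeholder integrand in place of the worst case CMSE; `mc_integral` and `toy_integrand` are our own names, not the paper's.

```python
import numpy as np

def mc_integral(g, k, n_sim=200_000, seed=0):
    """Approximate E[g(z_0, z_1, ..., z_k)] for independent standard normal
    interim outcomes by averaging over simulated draws."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sim, k + 1))
    return g(z[:, 0], z[:, 1:]).mean()

def toy_integrand(z0, z_arms):
    """Placeholder for the worst case CMSE: a per-arm stand-in criterion;
    the selection step then picks, per outcome, the arm maximizing it."""
    per_arm = (z_arms - z0[:, None]) ** 2
    return per_arm.max(axis=1)

# sanity check: for k = 1 there is no selection and
# E[(Z_1 - Z_0)^2] = Var(Z_1 - Z_0) = 2
approx = mc_integral(toy_integrand, k=1)
assert abs(approx - 2.0) < 0.05
```

Deterministic cubature converges faster for smooth integrands, which is presumably why the paper uses it; the Monte Carlo version is simply easier to sketch and to scale in k.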

4.2.2. Reshuffling. For the case of a sample-size reshuffling between the selected treatment and the control group, the maximum conditional mean squared error, CMSE, can be calculated as in Section 3.2.5. Figure 3(B) shows the resulting standardized RMSE * k for k = 1 to 6 for 0 ⩽ v ⩽ 1 (solid black lines) and for the restriction 0 ⩽ v ⩽ 0.5 (dashed black lines). The gray solid lines give the maximum selection RMSE for an adaptive design, selecting one treatment and control at interim time point t and allocating in the second stage (1 − t)n g (k + 1)∕2 patients to each of the two groups. In this balanced case, the treatment with the maximum CMSE at interim is the treatment with the maximum absolute observed difference |z i − z 0 | at interim. Note again that this is not necessarily true if we allow for reshuffling leading to unbalanced second stages.
The selection RMSE (gray lines) is increasing with increasing t. This is similar to the results of [20], where the selection Bias was calculated for the case of selecting the treatment with the maximum effect at interim. We note again that selecting the treatment with the maximum treatment effect is not the same as selecting the treatment with the maximum CMSE at interim. As for the Bias, the standardized RMSE * k shows a non-monotonic behavior. There is a trade-off between the variance, which is increasing with t for selection, and the Bias due to sample size reassessment, which is decreasing with t. For k ⩾ 2 and with the constraint v ⩽ 0.5, there is a t for which the RMSE is minimal. The minimum is achieved at t-values close to 0.5.
The later the interim analysis, the smaller the difference between RMSE * k and the selection RMSE. For t = 0.5 and k = 2, 3, 4, the standardized RMSE * k is 1.56, 1.67, 1.74 if v ⩽ 1 and 1.28, 1.27, 1.25 if v ⩽ 0.5, respectively. For comparison, the selection RMSE is 0.99, 0.93, 0.88 for k = 2, 3, and 4, respectively. Note that the case t = 1 gives the worst case RMSE of a classical fixed-sample-size parallel group design where the single treatment is selected post-trial (in a fully flexible manner), and that the RMSE * k of an adaptive design with mid-trial treatment selection and sample size reshuffling is smaller.

Discussion
We investigated in this paper the maximum effect of data-driven sample-size reassessments and treatment selection on the Bias and precision of maximum likelihood estimators in multi-armed adaptive designs. We assumed that, in an interim analysis, one out of k treatments and the control are selected for a second stage and sample sizes are reassessed in a fully flexible manner, with and without restrictions. To the best of our knowledge, we are the first to consider Bias and MSE under flexible selection and sample size reassessment rules. In [20], for instance, selection Bias and MSE were considered without sample size reassessment and only for some specific selection rules.
To cope with flexible decision rules, we calculated the maximum Bias and maximum MSE, searching at each possible interim outcome for the worst case treatment selection and sample size assignments that maximize the conditional Bias or conditional MSE. We are aware of the fact that the determination of maximum Bias and MSE will lead to an overestimation and that Bias and MSE may in reality be (substantially) smaller. To bound the conservatism of our approach, we considered several restrictions on the sample-size rules, like balanced second-stage sample sizes or rules for which the selected experimental treatment group is at least as large as the control group. We saw that these restrictions substantially reduce the maximum Bias and maximum MSE and that in some cases (e.g., when k = 1 and r min = 1) the maximum Bias and the maximum inflation of the MSE are small enough to justify the use of the MLE.
In spite of the conservatism of our approach, we have been able to draw several important conclusions. One important conclusion is that a lower bound for the second stage sample sizes may effectively reduce the Bias and the inflation of the MSE. We saw, for instance, that under the constraint that the second stage sample sizes are at least as large as the first stage n (i.e., the case r min = 1), the Bias is in general limited and not much larger than the pure selection Bias. This is particularly the case under the restriction that, in the second stage, the treatment group is at least as large as the control group. Moreover, we found that the maximum Bias is not much further decreased by forcing the treatment groups to be balanced at the second stage or the size of the control group to be fixed. Constraining the second stage sample sizes to be at least as large as the first stage n has an even more pronounced effect on the maximum MSE, which is then more or less independent from the maximum sample size (r max ) and the additional restrictions on the second-stage allocation ratios. We can, therefore, conclude that, with a sufficiently large minimal second-stage sample size, a further increase of the sample size in the selected treatment group has only a limited negative effect on Bias and MSE.
We also learned that, when fixing the total sample size and reshuffling the (fixed) second-stage sample size between the control and the selected treatment group, the additional Bias and MSE due to the sample-size reassessments may be substantial, even under the (realistic) constraint that the control group is not larger than the experimental treatment group. This is particularly the case when the interim analysis is done early. Note that the results with fixed and flexible overall sample sizes are not easy to compare, because we had to use different standardizations for the reshuffling and the other cases, and because in the other cases, the total sample size is not fixed but data dependent and of a less determined magnitude.
Our paper necessarily leaves important questions open. It is known that the selection Bias can be severe even without sample-size reassessments if selection is done late. Early selection will in general reduce the Bias as compared with 'post-trial' selection [20]. Our findings confirm these results. Hence, an important question that goes beyond the scope of this paper is the performance of adjusted estimates that account for the selection Bias under flexible selection and sample-size reassessments. To this end, it is important to note that Bias-adjusted estimates have only been suggested and considered for designs with fixed (known) selection rules, namely selecting the seemingly most efficient treatment. We consider shrinkage estimates as one of the most interesting candidates, as they are known to perform well in terms of the MSE under the common treatment selection process (cf. [19]), but other estimates may be considered as well. Another interesting and important extension of our work would be the consideration of selection rules with more than one selected experimental treatment and with realistic constraints on the selection process. Selection of more than one treatment with the play-the-winner rule without additional sample-size reassessment was investigated in [20]. Calculation of the maximum Bias or MSE for further selection rules would be an interesting contribution.
Depending on the given z 1 and z 0 , this candidate is either a minimum or a maximum. If a maximum, this candidate is ineligible if either r 0 (5) or r 1 (5) is larger than r max or smaller than r min . Setting r 0 (6) = r min , the corresponding worst case reassessment rule r 1 (6) for the treatment group can be calculated by setting the first derivative with respect to r 1 (assuming r 0 fixed) to zero. Candidates (r 0 (7) , r min ), (r max , r 1 (8) ), and (r 0 (9) , r max ) can be assessed similarly. The global maximum for given z 0 and z 1 is the maximum over all nine candidates, and formula (10) can be rewritten as a maximum over these candidates.
Figure (C) in the Appendix shows the subsets (corresponding to the several candidates) when setting r min = 0 and r max = ∞. The subset A5 is the area where candidate 5 is the global maximum; see also the subsets for candidates 1 (area A1), 2 (area A2), 3 (area A3), 6 (area A6), and 7 (area A7). It can be seen that candidates 4, 8, and 9 are never global maxima. Some of the regions are similar to the regions maximizing the conditional Bias (Figure (A)). For z 1 or z 0 close to 0, the worst-case reassessment rule maximizing the CMSE differs from setting r 0 or r 1 to r min or r max .
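The nine-candidate maximization of Appendix A.2 follows the standard pattern for maximizing a smooth function over a rectangle: one interior stationary pair, one stationary point on each of the four edges, and the four corners. A generic sketch with a placeholder objective (all names are ours; the paper's CMSE would supply the objective and the stationary points):

```python
def worst_case_2d(f, interior, edge_candidates, r_min, r_max):
    """Maximize f(r0, r1) over the square [r_min, r_max]^2 by evaluating
    the interior stationary pair, the edge-constrained stationary points,
    and the four corners -- nine candidates in total -- keeping only
    those that fall inside the square."""
    corners = [(r_min, r_min), (r_min, r_max), (r_max, r_min), (r_max, r_max)]
    candidates = [c for c in [interior] + edge_candidates + corners
                  if r_min <= c[0] <= r_max and r_min <= c[1] <= r_max]
    best = max(candidates, key=lambda c: f(*c))
    return best, f(*best)

# toy objective peaking at (1, 1.5); its edge stationary points are the
# projections of the peak onto the four edges of [0, 2]^2
f = lambda r0, r1: -(r0 - 1.0) ** 2 - (r1 - 1.5) ** 2
edges = [(0.0, 1.5), (2.0, 1.5), (1.0, 0.0), (1.0, 2.0)]
best, val = worst_case_2d(f, (1.0, 1.5), edges, 0.0, 2.0)
```

When the interior pair is ineligible (outside the square), the maximum moves to an edge or corner candidate, exactly as in the candidate bookkeeping above.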

A.3 Maximum CB under reshuffling
The following equality, (22), holds in the case of reshuffling.
For fixed v, n g , t, and z 0 , the conditional Bias (6) is monotone in z i . Assume now that, for some observed z s and z 0 , the optimal second-stage allocation rate is ṽ. For a z * s > z s , because of monotonicity, CB(z 0 , z * s , ṽ, n g , t, σ) > CB(z 0 , z s , ṽ, n g , t, σ). ṽ may not be the allocation rate maximizing the conditional Bias for z * s , but replacing it by the actual optimum ṽ * can only increase the Bias, and therefore CB(z 0 , z * s , ṽ * , n g , t, σ) ⩾ CB(z 0 , z * s , ṽ, n g , t, σ). We conclude that the maximized conditional Bias CB is monotone in z i (for fixed z 0 ), and therefore equality (22) holds.