Response adaptive designs for Phase II trials with binary endpoint based on context-dependent information measures

In many rare disease Phase II clinical trials, two objectives are of interest to an investigator: maximising the statistical power and maximising the number of patients responding to the treatment. These two objectives compete, so clinical trial designs offering a balance between them are needed. Recently, it was argued that response-adaptive designs such as families of multi-arm bandit (MAB) methods could provide the means for achieving this balance. Furthermore, response-adaptive designs based on a concept of context-dependent (weighted) information criteria were recently proposed with a focus on Shannon's differential entropy. Here, information-theoretic designs based on the weighted Renyi, Tsallis and Fisher informations are also proposed. Owing to the built-in parameters of these novel designs, the balance between the statistical power and the number of patients that respond to the treatment can be tuned explicitly. The asymptotic properties of these measures are studied in order to construct intuitive criteria for arm selection. A comprehensive simulation study shows that using the exact criteria rather than the asymptotic ones, or using information measures with more parameters, namely the Renyi and Tsallis entropies, brings no appreciable gain in terms of the power or the proportion of patients allocated to superior treatments. The proposed designs based on information-theoretic criteria are compared to several alternative approaches. For example, by tuning the built-in parameter, one can find designs with power comparable to that of fixed equal randomisation but with a greater number of patients responding in the trial.

1.1. Theorem 1 Expressions for the standard Renyi, Tsallis and Fisher DE of the r.v. $Z^{(n)}_x$ with PDF (7) can be found in [4].
Above, the fact that $\lim_{n\to\infty} x/n = \alpha$ is used and $x$ is replaced with $\alpha n$ in $U_1$, $U_2$, $U_3$. Note that $H^{\varphi_\kappa}_\nu(f^{(n)}_x) = \frac{1}{1-\nu} \log U_1 U_2 U_3$. Applying Stirling's formula and a Taylor series expansion for the logarithms, combining all terms together and removing all terms that decay gives formula (10).
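For reference, the two standard expansions invoked in this step are recalled below. They are textbook results rather than formulas taken from this paper, and the symbols $z$ and $u$ are generic placeholders, not the paper's notation.

$$
\log \Gamma(z) = \left(z - \tfrac{1}{2}\right)\log z - z + \tfrac{1}{2}\log(2\pi) + O\!\left(z^{-1}\right), \qquad z \to \infty,
$$

$$
\log(1 + u) = u - \frac{u^{2}}{2} + O\!\left(u^{3}\right), \qquad u \to 0.
$$

In arguments of this kind, the first expansion is typically applied to each Gamma function whose argument grows linearly in $n$, and the second to logarithms of the form $\log(1 + c/n)$.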

• Tsallis entropy
The proof is straightforward from the relation between the Tsallis WDE and the Renyi WDE $H^{\varphi_\kappa}_q(f^{(n)}_x)$. Using the asymptotics for $H^{\varphi_\kappa}_q(f^{(n)}_x)$ from Theorem 1 gives formula (11).

Proposition 2. For any fixed large $n$,

Proof of Proposition 2. From the definition of the Tsallis WDE (6) it follows that we need to show that $\frac{\partial}{\partial \nu}\left((q - 1)\, T^{\varphi_\kappa}_q(f^{(n)}_x)\right)$, for any fixed large $n$. From Theorem 1, for large $n$, $\exp \omega(q, \alpha, \kappa, n, \gamma)$, Using formula (27) and inserting $q = 1$ in the expression above gives formula (28).
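As a reminder of why Renyi asymptotics transfer to the Tsallis case, the standard (unweighted) definitions are recalled below for a generic density $f$ and order $q \neq 1$. This is a sketch under the assumption that the weighted definitions (6) used in the paper have the analogous form with the weight function inside the integral; the paper's exact expressions are not reproduced here.

$$
H_q(f) = \frac{1}{1-q} \log \int f^{q}(z)\, dz, \qquad
T_q(f) = \frac{1}{q-1}\left(1 - \int f^{q}(z)\, dz\right),
$$

$$
T_q(f) = \frac{1}{q-1}\left(1 - \exp\{(1-q)\, H_q(f)\}\right).
$$

In this unweighted setting both quantities tend to the Shannon differential entropy $-\int f \log f$ as $q \to 1$.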

• Fisher entropy
Consider the Fisher Information, in whose expression the fact that $\lim_{n\to\infty} x/n = \alpha$ is used. Integral (29) can be found explicitly (see [4]). After simplification, the result involves $\psi(x)$, a derivative of the Digamma function. Using asymptotics for the Digamma function, given that $\beta$ and $\upsilon$ are constants, and applying a Taylor series expansion for the logarithms, we arrive at expression (32). Note that the fourth term in (32) tends to zero and the last two terms in the last parenthesis cancel each other out as $n$ tends to infinity. Expanding the first squared expression and multiplying out gives the limit of the last term in (32). Combining all terms together we obtain formula (12).
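For orientation, the leading terms of the standard large-argument expansions of the Digamma function and of its first derivative (the trigamma function) are given below. These are textbook results with a generic argument $z$; how they enter (32) is not reconstructed here.

$$
\psi(z) = \log z - \frac{1}{2z} + O\!\left(z^{-2}\right), \qquad
\psi'(z) = \frac{1}{z} + \frac{1}{2z^{2}} + O\!\left(z^{-3}\right), \qquad z \to \infty.
$$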

Calibration
Below, we provide the details of the calibration procedure described in Section 3.2, namely, the calibration of the prior parameter E, the cut-off parameter δ and the additional parameters q and ν.

Calibration of prior distribution
For each new design, increasing E improves power and worsens PCA. The major problem when calibrating E was the type I error, as for various values of κ the effect of E on the type I error is unpredictable (see Figure 5). To solve this problem, the procedure was adjusted manually via a cut-off parameter δ. In general, for most of the designs, the power-PCA balance is more likely to be shifted towards power. Thus, initially, we were interested in values of E ≤ 10 in order to improve PCA at a minor cost in power.
Consider the AF, the numerical results for calibration of which are given in Table 2. At first, the experiment is conducted with different sets of parameters. For instance, for the AF several pairs of the parameter E and the cut-off δ can be chosen: E = 9 with δ = 9.5%, E = 7 with δ = 9%, and E = 6 with δ = 8.5%. Since we are interested in smaller values of E, the pair E = 6, δ = 8.5% is chosen. Values of E and δ for the other entropy criteria were calibrated following similar logic.
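The selection logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that δ acts as a cut-off on the p-value of the final test and that a pair (E, δ) is admissible when the simulated type I error under the null does not exceed a target level. The function names and the `target` argument are hypothetical; only the three candidate pairs come from the text.

```python
def empirical_type1_error(null_p_values, delta):
    """Share of simulated null trials whose p-value is at or below the cut-off delta."""
    return sum(p <= delta for p in null_p_values) / len(null_p_values)


def calibrate_cutoff(null_p_values, grid, target):
    """Largest delta on the grid whose empirical type I error does not exceed target.

    Returns None when no grid value is admissible.
    """
    feasible = [d for d in grid if empirical_type1_error(null_p_values, d) <= target]
    return max(feasible) if feasible else None


def choose_pair(candidates):
    """Among admissible (E, delta) pairs, prefer the smallest prior parameter E."""
    return min(candidates, key=lambda pair: pair[0])


# Candidate pairs for the AF design as quoted in the text (delta as a fraction).
af_candidates = [(9, 0.095), (7, 0.09), (6, 0.085)]
chosen_E, chosen_delta = choose_pair(af_candidates)
```

In practice `null_p_values` would come from simulating the design under scenarios with equal response probabilities for each value of E and κ, which is outside the scope of this sketch.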

Calibration of parameters for Renyi and Tsallis entropies
Next, we present details of the calibration of q for the exact criterion based on the Tsallis entropy and of ν for the exact criterion based on the Renyi entropy. The calibration of these parameters was conducted by comparing the designs with κ ∈ {0.1, 0.2, . . . , 0.9} and the already calibrated values E = 9 and E = 8 for Tsallis and Renyi, respectively.
In a pairwise comparison of the designs with different values of the additional parameters q and ν (for the Tsallis and Renyi designs, respectively), it was found that the designs with q = 0.35 and ν = 0.75 perform slightly better than, or comparably to, the designs with other values of q or ν in terms of power and PCA. The operating characteristics for q ∈ {0.1, 0.35, 0.9} and ν ∈ {0.1, 0.75, 0.9}, for Tsallis and Renyi respectively, and κ ∈ {0.1, 0.5, 0.9} are given in Figure 6. Figure 6 also illustrates that q and ν have a smaller effect on the operating characteristics than κ. For this reason, a more rigorous calibration might be excessive.

Operating characteristics
Below, the numerical results for the designs based on different information measures are given. First, the operating characteristics of interest are compared for the Renyi and Tsallis entropy criteria. Afterwards, we address the problem of the "asymmetrical" designs. Finally, an alternative operating characteristic for comparing the designs, namely the probability of correct selection (PCS), is considered.
Figure 7 illustrates how different values of κ ∈ {0.1, 0.5, 0.9} influence the operating characteristics of the Renyi criterion in comparison to the Tsallis criterion. The results for the designs based on the Renyi and Tsallis criteria are comparable in terms of both power and PCA for different scenarios (Figure 7) and on average in comparison to FR (Figure 8).
Note that both designs are asymmetrical in terms of both power and PCA for small values of κ ∈ {0.1, 0.2}. Designs asymmetrical in terms of PCA are not considered in the analysis for a reason described below. For the Tsallis criterion, in terms of PCA for $\theta_a < 0.5$, T0.1 outperforms T0.5 with an average difference of 9.7%, and T0.5 outperforms T0.9 with an average difference of 7.1%. In contrast to the usual positive effect of κ on PCA, T0.1 is outperformed by T0.5 under scenarios with $\theta_a > 0.6$ by an average of 12.6%.

Designs asymmetric in terms of power
In Section 3.3 it was stated that the drawback in terms of power of the AS0.5 design for the scenarios with $\theta_a > 0.5$ is caused by the form of the chosen weight function, which reflects the interest in outcomes close to γ = 0.999. To describe this drawback, the probability of allocating x of 75 patients to the less efficacious arm B was calculated as the share of iterations in which this event took place (Figure 9). Recall that the total number of iterations is 10,000, $\theta_a = 0.9$ and $\theta_b = 0.5$. If the penalty parameter κ is small, then with high probability fewer than 10 out of 75 patients will have received the inferior treatment by the end of the experiment. In particular, for the AS0.5 design under the scenario $\theta_a = 0.9$, 5 or fewer patients were assigned to treatment arm B with probability 83.7%, and no patients were assigned to arm B with probability 11.4%. Therefore, in a high proportion of simulated trials there might not be enough information to obtain statistically significant results. Although Fisher's exact test is known to be valid even for small sample sizes, it is applied at the end of the procedure and does not account for all the information collected during the sequential trial; for example, the fact that only a few patients received the treatment from one arm indicates that the design considers the other arm more efficacious. Presumably, this issue would be resolved if the proposed response-adaptive design were randomised, which is left for further research. Alternatively, the significance level for Fisher's test at the end of the experiment could be made more "flexible" for small values of κ, so that it accounts for the changes in information gains. However, a more careful study is required. Note that S0.5 and T0.5 were found to be highly asymmetrical in terms of power for the same reason.
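To make the loss of power concrete, the sketch below implements a two-sided Fisher exact test with the standard library only and evaluates it on 2x2 tables in which arm B receives very few of the 75 patients. The response counts are hypothetical, chosen to mirror $\theta_a = 0.9$ and $\theta_b = 0.5$; they are not taken from the paper's simulations, and the function name is an assumption for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatment arms; columns are (responders, non-responders).
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that arm 1 has k responders given the fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Hypothetical end-of-trial tables with n_b patients on the inferior arm B.
for n_b in (0, 2, 5, 37):
    n_a = 75 - n_b
    resp_a, resp_b = round(0.9 * n_a), round(0.5 * n_b)
    p = fisher_exact_two_sided(resp_a, n_a - resp_a, resp_b, n_b - resp_b)
    print(f"n_b={n_b:2d}  p-value={p:.4f}")
```

When no patient is allocated to arm B, only one table is compatible with the margins, so the p-value equals 1 and the null hypothesis cannot be rejected at any cut-off.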

Designs asymmetric in terms of PCA
To describe the drawback in terms of PCA of the T0.1 design for the scenarios with $\theta_a > 0.5$, the probability of allocating x of 75 patients to the superior arm A was calculated as the share of iterations in which this event happened. The probabilities were calculated for the scenario with $\theta_a = 0.9$, $\theta_b = 0.5$. The results are given in Figure 10. It follows that for T0.1 the effect of κ is so extreme that the information gain changes slowly, as negative responses are not penalised enough. So, when the first patient is assigned to arm B (with probability 0.5, since the priors for both treatment arms are equal) and responds positively (with probability $\theta_b = 0.5$), the design sticks to arm B. As a result, for each $\theta_a > 0.5$, no patients were assigned to the superior arm A with a probability of 17%.
This drawback can be adjusted manually, e.g. by randomly allocating more than one patient to the treatment arm. However, until a more detailed study is conducted, the designs with small values of κ, which are highly asymmetrical in terms of PCA, will not be considered in the set of competing designs.
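The "share of iterations" computation used for Figures 9 and 10 can be sketched as follows. The allocation rule here is a deliberately simple stand-in (a greedy rule on Beta(1, 1) posterior means with random tie-breaking), not the information-gain criterion of the paper, so its numbers should not be expected to match the reported ones. All function names are hypothetical.

```python
import random
from collections import Counter


def greedy_arm(s, f, rng):
    """Pick the arm with the larger posterior mean s/(s+f); break ties at random.

    Means are compared by integer cross-multiplication to avoid float ties.
    """
    lhs = s["A"] * (s["B"] + f["B"])
    rhs = s["B"] * (s["A"] + f["A"])
    if lhs == rhs:
        return rng.choice(["A", "B"])
    return "A" if lhs > rhs else "B"


def simulate_trial(theta_a, theta_b, n_patients, rng):
    """Run one trial and return the number of patients allocated to arm A."""
    s = {"A": 1, "B": 1}  # prior pseudo-successes (stand-in uniform prior)
    f = {"A": 1, "B": 1}  # prior pseudo-failures
    theta = {"A": theta_a, "B": theta_b}
    n_on_a = 0
    for _ in range(n_patients):
        arm = greedy_arm(s, f, rng)
        n_on_a += arm == "A"
        if rng.random() < theta[arm]:
            s[arm] += 1
        else:
            f[arm] += 1
    return n_on_a


def allocation_shares(theta_a, theta_b, n_patients=75, n_iter=10_000, seed=0):
    """Share of iterations in which exactly x patients were allocated to arm A."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_trial(theta_a, theta_b, n_patients, rng) for _ in range(n_iter)
    )
    return {x: counts[x] / n_iter for x in range(n_patients + 1)}


shares = allocation_shares(theta_a=0.9, theta_b=0.5, n_iter=2_000)
share_no_patients_on_a = shares[0]
```

Replacing `greedy_arm` with an information-gain criterion and reading off `shares[0]` would give the kind of quantity reported above for T0.1; forcing the first few patients on each arm to be randomised is one way to explore the manual adjustment mentioned in the text.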

Dynamic programming approaches
To support the choice of p = 0.9 for the CRDP design, we provide the averaged operating characteristics of several specifications of CRDP designs with p ∈ {0.6, 0.7, 0.8, 0.9} and l/n ∈ {0.1, 0.15, 0.25, 0.35} in Figure 11. Among the designs with a balance shifted towards power, no significant difference was found between the designs with l = 0.35n and various values of p; for the designs with l = 0.35n the comparison is followed by a scenario-by-scenario comparison. Among the designs with a balance shifted towards PCA, in terms of averaged characteristics, the advantage in PCA from choosing p = 0.9 is more prominent than for the designs shifted towards power. In the scenario-by-scenario comparison (Figure 12), the average difference in power between p = 0.6 and p = 0.9 is 0.6% for CRDP 0.1 and 1.5% for CRDP 0.25. The average difference in PCA between p = 0.6 and p = 0.9 is 3.4% for CRDP 0.1 and 1.7% for CRDP 0.25.