A new design strategy for hypothesis testing under response adaptive randomization

Abstract: The aim of this paper is to provide a new design strategy for response adaptive randomization in the case of normal response trials aimed at testing the superiority of one of two available treatments. In particular, we introduce a new test statistic based on the treatment allocation proportion ensuing the adoption of a suitable response adaptive randomization rule that could be more efficient and uniformly more powerful with respect to the classical Wald test. We analyze the conditions under which the suggested strategy, derived by matching an asymptotically best response adaptive procedure and a suitably chosen target allocation, could induce a monotonically increasing power that discriminates with high precision the chosen alternatives. Moreover, we introduce and analyze new classes of targets aimed at maximizing the power of the new statistical test, showing both analytically and via simulations i) how the power function of the suggested test increases as the ethical skew of the chosen target grows, namely overcoming the usual trade-off between ethics and inference, and ii) the substantial gain of inferential precision ensured by the proposed approach.

Introduction
The demands of individual care and experimental information often come into conflict, and the ensuing ethical problem, usually referred to as individual-versus-collective ethics, is how to balance the welfare of the patients in the trial against a possible knowledge gain that will improve the care of future patients. In the context of clinical trials, especially in phase-III trials for treatment comparisons, it is widely accepted that Response Adaptive (RA) randomization is a possible answer, and this is the reason why over the past two decades there has been a growing stream of statistical papers on this topic [for a recent review see 2,5].
Often the conflicting goals related to the ethical demand of maximizing the subjects' care and to the statistical aim of drawing correct inferential conclusions with high precision can be formalized into suitable (constrained/combined) optimization problems; in this setting, several authors derived target allocations of the treatments that could be regarded as a valid trade-off between ethics and inference [see e.g. 19,22,4]. Generally, these targets depend on the unknown model parameters and can be approached asymptotically by using suitable RA randomization procedures, namely sequential allocation rules that change at each step the probabilities of treatment assignment on the basis of earlier responses and past allocations, in order to converge to the chosen target. Classical examples are the well-known doubly-adaptive biased coin design [13] and the efficient randomized adaptive design (ERADE) [14].
Although U.S. government agencies and Health Authorities encourage the adoption of RA procedures [9,10], their use remains controversial due to some inferential problems that could arise [20,21]. Indeed, due to the adaptation process, RA designs induce a complex dependence structure between the outcomes; moreover, the allocations are themselves informative about the model parameters, so the ensuing statistical inference must be unconditional on the design, i.e., it should take into account the randomness of the design as well [see 12,5]. In this context, several authors provided the conditions under which the maximum likelihood estimators (MLEs) retain the strong consistency and asymptotic normality properties that allow one to apply the usual asymptotic inference, e.g., the classical Wald test. However, the large majority of the design literature focuses on optimizing the estimation of the treatment effects, while little attention is devoted to hypothesis testing, which has been approached almost exclusively for binary response trials. In particular, Hu and Rosenberger [11,12] showed that the asymptotic power of the Wald test for testing the equality of the probabilities of success of the treatments is negatively affected by the variability of the design; they then introduced the concept of asymptotically best design, namely RA procedures which, by satisfying a CLT property with asymptotic variance attaining its minimum (i.e., the Rao-Cramér lower bound), guarantee an asymptotically best power (like, for instance, the ERADE). Adopting a large deviations approach, Azriel et al.
[3] derived an optimal target allocation in the case of two treatments that maximizes the asymptotic power of the Wald test, showing that the ensuing target is quite close to the balanced one (namely the target allocation optimizing the power under homoscedastic outcomes), and thus significantly different from the Neyman allocation (i.e., the optimal target under heteroscedasticity). Moreover, Yi and Wang [23] compared the performances of the Wald, score, and likelihood ratio tests under RA designs for binary outcomes, showing that these tests are still asymptotically equivalent, with a slight superiority of the Wald test. In the case of homoscedastic normal response trials, Baldi Antognini et al. [6] recently showed how some target allocations may induce an anomalous behaviour of the power of the Wald test, which could be locally decreasing or could vanish as the difference between the treatment effects grows, also suggesting a suitably modified version of the Wald statistic which avoids some degenerate scenarios.
Since in some circumstances the treatment allocation proportion ensuing from the adoption of an RA design behaves as a proper estimator of the chosen target, namely it is consistent and asymptotically normal, the aim of this paper is to provide a new statistical test for clinical trials with normal responses which, being based on the proportion of treatment assignments, is very simple to implement and allows one to discriminate with high precision the chosen alternatives. By combining this new test with both i) an asymptotically best RA procedure and ii) a suitably chosen target allocation, it is possible to identify a new design strategy that could be more efficient than the classical Wald test, even when the latter is matched with the balanced target in the case of homoscedastic outcomes or with the Neyman allocation under heteroscedasticity.
More specifically, we derive the conditions on the suggested design strategy under which the power function of the new test monotonically increases as the sample size or the difference between the treatment effects grows. Moreover, we analyze classes of targets aimed at maximizing the power of the new test, also showing how this power function monotonically increases as the ethical skew of the chosen target grows, thereby overcoming the usual trade-off between ethics and inference. Throughout the paper, simulation studies are performed in order to show the substantial gain of inferential precision ensured by the proposed approach, which leads to a gain of power that, in some circumstances, is 10% higher than that of the classical Wald test.
The paper is structured as follows. Starting from some preliminaries in Section 2, Section 3 introduces the new test for normal homoscedastic outcomes, while Section 4 deals with the case of heteroscedasticity. Several examples are provided and some simulations are performed in order to validate the theoretical results, stressing also some practical implications of the proposed methodology. Finally, Section 5 draws some conclusions.

Normal model and target allocations
Suppose that patients come to the trial sequentially and are assigned to one of two competing treatments, say A and B. At each step i ≥ 1, let δ_i be the indicator managing the allocation of the i-th subject, namely δ_i = 1 if he/she is assigned to A and δ_i = 0 otherwise, and let Y_i be the corresponding response. Conditionally on the allocations, patients' responses are assumed to be independent and (at least approximately) normally distributed, with Y_i | δ_i = 1 ~ N(μ_A, σ²_A) and Y_i | δ_i = 0 ~ N(μ_B, σ²_B), where μ_j and σ²_j denote the mean and the variance under treatment j = A, B. After n assignments, let μ̂_jn be the MLE of the treatment effect μ_j (i.e., the corresponding sample mean), while π_n = n⁻¹ Σ_{i=1}^{n} δ_i and (1 − π_n) are the allocation proportions to A and B, respectively.
Under this setting, inference is usually focused on the difference μ = μ_A − μ_B between the treatment effects, so from now on we suppose that the interest lies in testing the superiority of a treatment (say A) in a "the-larger-the-better" scenario, namely testing the null hypothesis H_0 : μ = 0 against the right-tailed alternative H_1 : μ > 0 (the two-tailed alternative can straightforwardly be derived). Notice that, under this scenario, μ_B is usually regarded as a nuisance parameter; moreover, when σ²_A = σ²_B = σ² this corresponds to testing the stochastic dominance of treatment A over B with respect to the usual stochastic order, while in the case of heteroscedastic responses with σ²_A > σ²_B it corresponds to checking whether A dominates B in the monotone convex order.
The ethical concern of assigning more patients to the better treatment collides with the inferential goal of deriving correct statistical conclusions with high precision. This duality could be overcome by adopting a suitable target allocation ρ to A (conversely, 1 − ρ to B) combining these goals. Generally, these targets should depend on both i) the difference μ ∈ R between the treatment effects, in order to skew the assignments towards the better performing treatment, and ii) a tuning parameter T > 0 which manages the randomization component of the target: low values of T accentuate the ethical skew towards the better treatment, while as T grows the ethical component vanishes and the target tends to the balanced one. Thus, from now on we consider target allocations ρ(μ) = ρ(μ, T) satisfying, for any fixed T > 0, suitable properties A1-A3. These are natural conditions widely satisfied by the target allocations suggested in the literature. In particular, A1 ensures that both treatments are treated symmetrically, namely the target does not change if the treatment labels are switched. The ethical requirement A2 ensures that the superior treatment should be favored, while A3 is a technical condition.
As discussed in Baldi Antognini et al. [6], for any chosen T the behaviour of the target ρ could be represented by the cdf of a continuous r.v. centered at 0 and having support on R, like the Normal target ρ_N(x) = Φ(x/T) suggested in Bandyopadhyay and Biswas [8] and Atkinson and Biswas [1], where Φ is the cdf of the standard normal. This target also coincides with the one proposed by Ivanova et al. [15] in the homoscedastic case with T = 2σ√2. From A1, ρ could also be characterized by re-scaling the cdf of a continuous random variable defined on R⁺ ∪ {0}, such as the exponential one (i.e., the Laplace cdf). Letting dρ/dx = ρ_x, then ρ_x(·) can be regarded as the associated pdf: from A2, ρ_x(x) ≥ 0 for any x ∈ R and, according to A3, lim_{x→∞} ρ_x(x) = 0 and lim_{x→∞} x ρ_x(x) = 0 to ensure integrability. Finally, observe that any given target ρ satisfying A2 with k = 1 can be univocally re-scaled to any desired threshold k̄ ∈ (1/2, 1) by letting ρ̄ = (2k̄ − 1)ρ + 1 − k̄.

Clearly, the target function could depend not only on the difference μ between the treatment effects, but also on some nuisance parameters. In this case, condition A1 could be replaced by a suitable analogue A1'. For instance, to encompass the "the-larger-the-better" framework, Zhang and Rosenberger [24] (page 564) briefly discussed a further target which, as correctly stated by the authors, does not have any natural interpretation. It satisfies A1' only in the homoscedastic case, whereas for heteroscedastic outcomes it is linked to the well-known Neyman allocation ρ* = σ_A/(σ_A + σ_B), one of the most commonly cited targets. However, ρ* does not have any ethical appeal, since it could assign the majority of patients to the worse treatment, and it does not satisfy assumptions A1'-A2.
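To fix ideas, the Normal target and the re-scaling device above can be sketched in a few lines of code. This is a minimal illustration with function names of our own choosing; only ρ_N(x) = Φ(x/T) and the re-scaling ρ̄ = (2k̄ − 1)ρ + 1 − k̄ are taken from the text.

```python
import math

def normal_target(x, T):
    """Normal target rho_N(x) = Phi(x/T): the allocation proportion to A as a
    function of the effect difference x = mu_A - mu_B. Low values of T sharpen
    the ethical skew, while large T pushes the target towards balance (1/2)."""
    return 0.5 * (1.0 + math.erf(x / (T * math.sqrt(2.0))))

def rescale_target(rho, k_bar):
    """Re-scale a target with limit k = 1 to a threshold k_bar in (1/2, 1),
    via rho_bar = (2*k_bar - 1)*rho + 1 - k_bar (symmetry A1 is preserved)."""
    return (2.0 * k_bar - 1.0) * rho + 1.0 - k_bar

# A1 (symmetry): rho(x) + rho(-x) = 1; A2 (ethics): rho(x) > 1/2 for x > 0.
assert abs(normal_target(1.3, 2.0) + normal_target(-1.3, 2.0) - 1.0) < 1e-12
assert normal_target(0.7, 1.0) > 0.5 and normal_target(0.0, 1.0) == 0.5
```

Small T sharpens the skew towards the better treatment: for instance, normal_target(0.5, 1.0) = Φ(0.5) ≈ 0.69, while normal_target(0.5, 5.0) = Φ(0.1) ≈ 0.54.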

Wald test for RA designs
Given a desired target ρ(μ), RA randomization procedures can be employed to converge to it. After an initial stage of n_0 allocations to both treatments, needed to derive non-trivial estimates, at each step n > 2n_0, μ is estimated by μ̂_n = μ̂_An − μ̂_Bn and the target is estimated accordingly by ρ̂_n = ρ(μ̂_n); then, the next assignment is skewed towards the better performing treatment and the allocation proportion π_n progressively approaches ρ as n grows. Given a target satisfying A1-A3, several authors [see e.g. 17,7] provided the conditions under which the allocation proportion is a strongly consistent estimator of the target, namely lim_{n→∞} π_n = ρ(μ) a.s. These conditions, usually satisfied by the RA rules suggested in the literature (like, e.g., the sequential ML design [17], the doubly adaptive biased coin design and the ERADE; for a general discussion see [12,5]), also guarantee the strong consistency and the asymptotic normality of the MLEs, i.e., lim_{n→∞} (μ̂_An, μ̂_Bn) = (μ_A, μ_B) a.s. and, as n → ∞, √n (μ̂_An − μ_A, μ̂_Bn − μ_B) converges in distribution to a bivariate normal with mean zero and covariance matrix diag(σ²_A/ρ(μ), σ²_B/(1 − ρ(μ))).
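As an illustration of such an RA rule, the following sketch implements the ERADE assignment probabilities together with a toy sequential trial. This is our own minimal rendering under the paper's setting, not the authors' code, and the helper names are hypothetical.

```python
import random
import statistics

def erade_prob(pi_n, rho_hat, alpha=0.5):
    """ERADE assignment probability for treatment A: undershoot the estimated
    target rho_hat when the current proportion pi_n exceeds it, overshoot when
    it falls short; alpha in [0, 1) tunes the degree of randomization."""
    if pi_n > rho_hat:
        return alpha * rho_hat
    if pi_n < rho_hat:
        return 1.0 - alpha * (1.0 - rho_hat)
    return rho_hat

def run_trial(n, mu_a, mu_b, sigma, target, n0=1, alpha=0.5, seed=0):
    """Sequential RA trial with normal responses: after n0 forced allocations
    per arm, each patient is assigned via ERADE towards target(mu_hat)."""
    rng = random.Random(seed)
    ya = [rng.gauss(mu_a, sigma) for _ in range(n0)]
    yb = [rng.gauss(mu_b, sigma) for _ in range(n0)]
    for _ in range(n - 2 * n0):
        rho_hat = target(statistics.mean(ya) - statistics.mean(yb))
        pi_n = len(ya) / (len(ya) + len(yb))
        if rng.random() < erade_prob(pi_n, rho_hat, alpha):
            ya.append(rng.gauss(mu_a, sigma))
        else:
            yb.append(rng.gauss(mu_b, sigma))
    return ya, yb
```

With a constant target of 0.75 the allocation proportion settles close to 0.75 after a few hundred patients; with a data-driven target such as ρ_N(μ̂_n) it tracks the estimated target instead.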
Taking now into account the case of homoscedastic outcomes with unknown common variance σ² = σ²_A = σ²_B (for the heteroscedastic case see Section 4), the Wald test statistic is W_n = √n μ̂_n √(π_n(1 − π_n)) / σ̂_n, (2.5) where σ̂²_n is the usual pooled sample variance. Since lim_{n→∞} π_n = ρ(μ) a.s., then lim_{n→∞} σ̂²_n = σ² and lim_{n→∞} ρ(μ̂_n) = ρ(μ) a.s. Under H_0, W_n converges in distribution to a standard normal r.v., so that, letting z_α be the α-percentile of Φ, the power of the right-sided test W_n of level α can be approximated by Φ(√n σ⁻¹ μ √(ρ(μ)(1 − ρ(μ))) − z_{1−α}), (2.6) since in a large-sample set-up σ̂²_n ≈ σ² and ρ(μ̂_n) ≈ ρ(μ). When σ² is a priori known, the Wald test (2.5) has the same form with σ̂²_n replaced by σ², and therefore the corresponding power function can still be approximated by (2.6). As is well known, the power (2.6) is maximized when the allocations are balanced, namely if ρ(μ) = 1/2 for any μ, which conversely does not satisfy any ethical requirement.
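For concreteness, the statistic (2.5) and the approximated power (2.6) can be coded directly. This is a sketch with illustrative function names; `statistics.NormalDist` supplies the standard normal cdf and quantiles.

```python
import math
from statistics import NormalDist

_SN = NormalDist()  # standard normal: cdf and quantiles

def wald_statistic(mean_a, mean_b, pooled_var, n, pi_n):
    """Wald statistic (2.5): W_n = sqrt(n)*mu_hat*sqrt(pi_n*(1-pi_n))/sigma_hat."""
    return (math.sqrt(n) * (mean_a - mean_b)
            * math.sqrt(pi_n * (1.0 - pi_n)) / math.sqrt(pooled_var))

def wald_power(mu, rho, sigma, n, alpha=0.05):
    """Approximated power (2.6) of the right-sided Wald test of level alpha."""
    z = _SN.inv_cdf(1.0 - alpha)
    return _SN.cdf(math.sqrt(n) * mu * math.sqrt(rho * (1.0 - rho)) / sigma - z)

# The power is maximized by the balanced allocation rho = 1/2 ...
assert wald_power(0.3, 0.5, 1.0, 250) > wald_power(0.3, 0.7, 1.0, 250)
# ... and equals the nominal level alpha under H0 (mu = 0).
assert abs(wald_power(0.0, 0.5, 1.0, 250) - 0.05) < 1e-9
```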
Within this framework, a first fundamental requirement on the Wald test W_n is that the corresponding power function should tend to one as n grows, which is clearly satisfied for every fixed μ > 0. Moreover, for any fixed n (sufficiently large for the CLT approximation), additional desirable properties are that the power should i) converge to one as μ grows and ii) be monotonically increasing in μ. As shown in Baldi Antognini et al. [6], these requirements induce the following restrictions on the adopted target ρ: requirement i) is fulfilled for targets with a low ethical improvement, namely such that lim_{x→∞} x √(1 − ρ(x)) = ∞ (2.7) (trivially true for targets with k < 1 in A2), while ii) requires 2ρ(x)(1 − ρ(x)) ≥ x (2ρ(x) − 1) ρ_x(x) for every x > 0. (2.8) Note that these conditions are not trivial and involve the entire functional form of the chosen target. For instance, they fail for ρ_N and ρ_E, while both of them are satisfied by the targets ρ_R in (2.9) and ρ_S in (2.10).

The new design-based test for homoscedastic normal outcomes
When the desired target satisfies A1-A3, ρ(·) is monotone in μ and therefore invertible: in this setting, testing the equality of the treatment effects H_0 : μ = 0 is equivalent to testing H_0 : ρ(μ) = 1/2 (alternative one- or two-tailed hypotheses can be derived accordingly). At the same time, if the chosen RA procedure converges almost surely to the desired target, ρ̂_n and the allocation proportion π_n are competing estimators of ρ. Moreover, when the RA rule also satisfies a CLT property, the allocation proportion itself can be regarded as a strongly consistent and asymptotically normal estimator of ρ and, if the design is asymptotically best, the asymptotic variance of the allocation proportion achieves its minimum. Thus, in the following we will focus on asymptotically best RA rules, namely RA procedures such that, as n → ∞, √n (π_n − ρ(μ)) converges in distribution to a normal with mean zero and asymptotic variance attaining the Rao-Cramér lower bound λ² = σ² ρ_x(μ)² / [ρ(μ)(1 − ρ(μ))]. Letting λ̂²_n = σ̂²_n ρ_x(μ̂_n)² / [π_n(1 − π_n)] be a suitable consistent estimator of the asymptotic variance λ², which employs the current allocation proportion π_n to estimate ρ(μ) in order to prevent degenerate scenarios [see 6, for a thorough discussion], we propose the following test statistic Z_n = √n (π_n − 1/2) / λ̂_n. Under H_0, Z_n converges to a standard normal distribution and therefore the rejection region is {Z_n > z_{1−α}}. Thus, given the consistency of λ̂_n, the power of the right-tailed test Z_n of level α can be approximated by Φ(√n λ⁻¹ (ρ(μ) − 1/2) − z_{1−α}). (3.4)

Remark 3.1. Due to the complex dependence structure induced by RA randomization, the exact distribution of the MLEs, as well as that of a given test statistic, is not available for a fixed sample size. Therefore, the exact power of a test cannot be computed. Within this framework, the approximated power functions in (2.6) and (3.4) are based on the CLT for the corresponding test statistics.
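A minimal sketch of the proposed statistic and of the approximated power (3.4), assuming the expressions for λ² and λ̂²_n recalled above (function names are ours):

```python
import math
from statistics import NormalDist

_SN = NormalDist()

def z_statistic(pi_n, n, var_hat, rho_x_hat):
    """Design-based statistic Z_n = sqrt(n)*(pi_n - 1/2)/lambda_hat_n, with
    lambda_hat_n^2 = sigma_hat^2 * rho_x(mu_hat)^2 / (pi_n*(1 - pi_n))."""
    lam_hat = math.sqrt(var_hat) * rho_x_hat / math.sqrt(pi_n * (1.0 - pi_n))
    return math.sqrt(n) * (pi_n - 0.5) / lam_hat

def z_power(mu, rho, rho_x, sigma, n, alpha=0.05):
    """Approximated power (3.4): Phi(sqrt(n)*(rho - 1/2)/lambda - z_{1-alpha}),
    where lambda^2 = sigma^2*rho_x^2/(rho*(1-rho)) is the Rao-Cramer bound."""
    lam = sigma * rho_x / math.sqrt(rho * (1.0 - rho))
    return _SN.cdf(math.sqrt(n) * (rho - 0.5) / lam - _SN.inv_cdf(1.0 - alpha))

# Under H0 (rho = 1/2) the statistic is centered and the power equals alpha.
assert z_statistic(0.5, 250, 1.0, 0.4) == 0.0
assert abs(z_power(0.0, 0.5, 0.4, 1.0, 250) - 0.05) < 1e-9
```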
These normal-type approximations have been suggested and applied by several authors (see, e.g., Chapter 3 of Lehmann [16] for a general discussion and [18,12,22] for their application to the clinical context) and they are extremely accurate and particularly effective in the moderate-to-large sample setting, namely the most representative framework in the context of phase-III clinical trials, rather than in a purely asymptotic one. Note that, from an asymptotic viewpoint, under the "local alternative" approach the alternative hypothesis takes the form H_1 : μ = δ/√n, so the power functions of W_n and Z_n converge to the same limit Φ(δ/(2σ) − z_{1−α}); namely, these tests perform equally well, since their asymptotic local powers coincide.
Thus, in this paper we compare tests W_n and Z_n from a methodological viewpoint on the basis of the approximated power functions in (2.6) and (3.4), henceforth referred to interchangeably as power or approximated power. The ensuing theoretical properties will also be explored through simulations, where the behaviour of the simulated power confirms our analytical results.

Properties of the power of test Z n
To analyze the behaviour of the power of test Z_n it is convenient to rewrite (3.4) as Φ(√n σ⁻¹ G_ρ(μ) − z_{1−α}), since the power depends on μ only through the adopted target, via the function G_ρ(x) = (ρ(x) − 1/2) √(ρ(x)(1 − ρ(x))) / ρ_x(x). Firstly, observe that G_ρ(x) > 0 for any x > 0, so that the power of test Z_n tends to one as n grows regardless of the adopted target. On the other hand, function (3.4) tends to one as μ grows if and only if the target is chosen such that lim_{x→∞} G_ρ(x) = ∞. (3.6) This is trivially fulfilled for targets with k ∈ (1/2, 1) in A2, whereas when k = 1 it is easy to check that condition (3.6) is also satisfied by every previously introduced target.
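As a quick numerical check of these properties, the function G_ρ can be evaluated for the Normal target ρ_N(x) = Φ(x/T), whose derivative is ρ_x(x) = φ(x/T)/T. This is a sketch: positivity and growth of G_ρ are verified only on a grid of points, not proved.

```python
import math
from statistics import NormalDist

_SN = NormalDist()

def g_normal(x, T):
    """G_rho(x) = (rho(x) - 1/2)*sqrt(rho(x)*(1 - rho(x)))/rho_x(x) for the
    Normal target rho_N(x) = Phi(x/T), whose derivative is phi(x/T)/T."""
    rho = _SN.cdf(x / T)
    rho_x = _SN.pdf(x / T) / T
    return (rho - 0.5) * math.sqrt(rho * (1.0 - rho)) / rho_x

# G is positive for x > 0, so the power of Z_n tends to one as n grows;
# on this grid it is also increasing in x, consistently with (3.6).
vals = [g_normal(x, 1.0) for x in (0.5, 1.0, 2.0, 4.0)]
assert all(v > 0 for v in vals) and vals == sorted(vals)
```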
The following Lemma 1 and Corollary 3.1 provide the conditions on the chosen target under which the ensuing approximated power of test Z_n is monotonically increasing in μ. In order to identify suitable classes of targets satisfying (3.7), for any given ρ let us introduce the corresponding hazard function h_ρ(x) = ρ_x(x)/[1 − ρ(x)] for x > 0. The following Corollary provides some sufficient conditions on the target allocation guaranteeing that the approximated power of test Z_n is monotonically increasing in μ.

Corollary 3.1. Given a target allocation ρ, the approximated power function of test Z_n is monotonically increasing in μ whenever condition (3.8) holds. Moreover, additional conditions guaranteeing the non-negativity of the LHS of (3.8), and therefore the monotonicity in μ, are given in (3.9) and (3.10).

Gain of power
We now compare the performance of test Z_n with that of the Wald test, given the same desired target allocation ρ(μ). The following Theorem shows how Z_n could induce a gain of power with respect to the Wald test.

Theorem 3.1. Adopting a given target ρ, test Z_n is uniformly more powerful than the Wald test W_n (i.e., the approximated power (3.4) is uniformly greater than the one in (2.6)) if and only if
ρ(x) − 1/2 ≥ x ρ_x(x) for any x > 0, (3.12)
with a strict inequality for certain values of x > 0. Conversely, the Wald test cannot be uniformly more powerful than test Z_n.
Proof. The first statement follows directly from (2.6) and (3.4). Moreover, the non-superiority of the Wald test can be easily argued by observing that, due to the properties of the target function ρ, the RHS of (3.12) goes to 0 as x grows, while the LHS tends to k − 1/2 > 0. Therefore, there does not exist a target such that ρ(x) − 1/2 ≤ x ρ_x(x) for every x > 0.
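Condition (3.12) can also be checked numerically; for instance, for the Normal target ρ_N(x) = Φ(x/T) the two sides read ρ(x) − 1/2 = Φ(x/T) − 1/2 and x ρ_x(x) = (x/T) φ(x/T). A grid-based sketch (not a proof):

```python
import math
from statistics import NormalDist

_SN = NormalDist()

def sides_312(x, T):
    """LHS and RHS of condition (3.12) for the Normal target rho_N(x) = Phi(x/T):
    LHS = Phi(x/T) - 1/2 and RHS = x*rho_x(x) = (x/T)*phi(x/T)."""
    return _SN.cdf(x / T) - 0.5, (x / T) * _SN.pdf(x / T)

# (3.12) holds on a grid of points, in line with Example 3.2: under rho_N,
# test Z_n is uniformly more powerful than the Wald test W_n.
for x in (0.1 * i for i in range(1, 51)):
    lhs, rhs = sides_312(x, 1.0)
    assert lhs >= rhs
```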

Example 3.2.
By adopting ρ_L in (3.11), test Z_n is uniformly more powerful than W_n. Indeed, condition (3.12) becomes T [exp(2x/T) − 1] ≥ 2x exp(x/T), which holds for any x, T > 0. The same conclusion still holds for ρ_R in (2.9) and ρ_S in (2.10), via simple algebra. Moreover, condition (3.12) is also fulfilled by ρ_E in (2.2), since T [1 − exp(−x/T)] ≥ x exp(−x/T) for every x, T > 0, and by ρ_N in (2.1), for which (3.12) is satisfied for any x, T > 0 (indeed, at x = T the RHS attains its maximum, given by (2πe)^{−1/2} ≅ 0.242). Taking now into account the target ρ_B in (3.13), condition (3.12) is satisfied only when x ≥ T − 1, and therefore test Z_n is more powerful than W_n only for T ≤ 1.
In order to validate these theoretical results and provide some practical implications of the suggested methodology, we have performed a simulation study by adopting the ERADE (with randomization parameter 0.5) and starting sample size n_0 = 1. The results come from 5000 simulations with n = 250, where the responses are assumed homoscedastic normal with σ² = 1 and μ_B = 1. Figure 1 compares the simulated power of the Wald test W_n (dashed line) with that of Z_n (solid line) under targets ρ_E and ρ_R for T = 1 and 5.
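A compact, self-contained sketch of this simulation scheme is given below (ERADE with randomization parameter 0.5, n_0 = 1, σ² = 1, μ_B = 1). The functional form used for ρ_E, a re-scaled exponential cdf, is our reconstruction from the inequalities quoted in Example 3.2, the number of replications is reduced with respect to the paper's 5000, and all names are illustrative.

```python
import math
import random
from statistics import NormalDist

_SN = NormalDist()

def rho_exp(x, T):
    """Re-scaled exponential (Laplace-type) target: a plausible form of rho_E,
    reconstructed from the inequalities quoted in Example 3.2 (an assumption)."""
    return 1.0 - 0.5 * math.exp(-x / T) if x >= 0 else 0.5 * math.exp(x / T)

def simulate_powers(mu, n=250, sigma=1.0, T=1.0, reps=400, alpha=0.05, seed=1):
    """Empirical rejection rates of the Wald test W_n and of test Z_n under
    ERADE (randomization parameter 0.5, n0 = 1), mu_B = 1 and mu_A = 1 + mu."""
    z, rng = _SN.inv_cdf(1.0 - alpha), random.Random(seed)
    rej_w = rej_z = 0
    for _ in range(reps):
        ya, yb = [rng.gauss(1.0 + mu, sigma)], [rng.gauss(1.0, sigma)]
        sa, sb = ya[0], yb[0]
        for _ in range(n - 2):
            rho_hat = rho_exp(sa / len(ya) - sb / len(yb), T)
            pi = len(ya) / (len(ya) + len(yb))
            if pi > rho_hat:          # ERADE assignment probability for A
                p = 0.5 * rho_hat
            elif pi < rho_hat:
                p = 1.0 - 0.5 * (1.0 - rho_hat)
            else:
                p = rho_hat
            if rng.random() < p:
                ya.append(rng.gauss(1.0 + mu, sigma)); sa += ya[-1]
            else:
                yb.append(rng.gauss(1.0, sigma)); sb += yb[-1]
        ma, mb, pi_n = sa / len(ya), sb / len(yb), len(ya) / n
        var_hat = (sum((y - ma) ** 2 for y in ya)
                   + sum((y - mb) ** 2 for y in yb)) / (n - 2)
        mu_hat = ma - mb
        w = math.sqrt(n) * mu_hat * math.sqrt(pi_n * (1.0 - pi_n)) / math.sqrt(var_hat)
        rho_x = math.exp(-abs(mu_hat) / T) / (2.0 * T)   # derivative of rho_exp
        lam = math.sqrt(var_hat) * rho_x / math.sqrt(pi_n * (1.0 - pi_n))
        rej_w += w > z
        rej_z += math.sqrt(n) * (pi_n - 0.5) / lam > z
    return rej_w / reps, rej_z / reps
```

For μ around 0.2-0.4 the empirical power of Z_n typically exceeds that of W_n, in line with the behaviour reported in Figure 1; the exact figures depend on the seed and on the number of replications.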
The simulations show the gain of power induced by the adoption of the proposed design strategy. Notice that, even if in this Figure μ varies between 0 and 0.8 (to highlight the differences between the power functions), ρ_E does not satisfy (2.7) and (2.8), and therefore under this target the power of the Wald test is not monotonically increasing and vanishes as μ grows (see Table 1 and [6]).
Note that, when the target has a high ethical component (in particular, when ρ_x grows extremely fast locally around zero), slightly inflated type-I errors of test Z_n could be present (in any case, never greater than 6.5%, i.e., at most 1.5% above that of the Wald test). This is due to the functional form of the chosen target which, around zero, could induce a slightly unstable behaviour of π_n as an estimator of ρ(μ). However, this inflated type-I error of test Z_n is not present for every target. For instance, Table 1 compares the simulated power of W_n and Z_n adopting ρ_N and ρ_L with T = 1 (we stress the degenerate behaviour of the power of the Wald test for large values of μ, due to the fact that these targets do not satisfy conditions (2.7) and (2.8)).
We now compare the approximated power of test Z n ensuing from the adoption of an asymptotically best RA design converging to a given target ρ(μ) with the one of the classical Wald test under its optimal scenario, namely when the allocations are balanced, showing the conditions on the chosen target guaranteeing the superiority of the proposed design strategy also in this situation.

Theorem 3.2. Adopting an asymptotically best RA design converging to a target ρ chosen such that
G_ρ(x) ≥ x/2 for any x > 0, (3.14)
test Z_n is uniformly more powerful than W_n equipped with balance.
Proof. In the case of balance, the approximated power function of the Wald test simply becomes Φ((2σ)⁻¹ √n μ − z_{1−α}) for μ > 0; therefore, condition (3.14) follows directly from (3.4).

Example 3.3.
Adopting an asymptotically best RA design converging to ρ_L, test Z_n is uniformly more powerful than W_n under the balanced design. Indeed, condition (3.14) becomes 2T sinh(x/(2T)) ≥ x, which is satisfied for any x, T > 0. This still holds for ρ_E (for any x, T > 0), for ρ_R (because T(T + 2x) ≥ T² for any x, T > 0) and for ρ_S (since T/2 ≤ T(T + 2√x) for any x, T > 0). Whereas, adopting ρ_N or ρ_B, condition (3.14) does not hold.
Under the same simulation scenario of Example 3.2, Figure 2 shows how the proposed design strategy with targets ρ_E and ρ_R outperforms, in terms of power, the classical Wald test equipped with balance (note that, in our setting, when ρ(μ) = 1/2 for every μ, the ERADE becomes Efron's biased coin with bias parameter 0.75). As regards test Z_n, the gain of power is quite evident and confirms the previously obtained theoretical results. Clearly, as T increases the ethical component of the targets vanishes and, asymptotically, the allocations tend to be balanced; in this case the simulated powers of tests Z_n and W_n tend to coincide. As previously shown, test Z_n presents slightly inflated type-I errors, while W_n is quite conservative (due to the balanced allocation), with a type-I error of 0.046. As discussed in Example 3.2, the inflated type-I error of test Z_n can be overcome by taking into account target allocations with a low slope around μ = 0, like ρ_L.

Table 2 quantifies numerically the gain in terms of power induced by the proposed strategy, stressing also the ethical impact related to the chosen target. In particular, we compare the values of the power of tests W_n (equipped with balance) and Z_n adopting ρ_R and ρ_L (with T = 1) as μ varies, showing a substantial improvement in terms of inferential precision and ethics as well. For instance, taking into account ρ_R, the gain of power is about 11% at μ = 0.2 (while also assigning 8% more subjects to the best treatment), and the inflation of the type-I error is around 1%. Whereas, adopting ρ_L, the type-I error is preserved and the gain in terms of power with respect to the Wald test is still considerable, even if lower than under ρ_R (around 5% at μ = 0.2, with an ethical improvement of 5%). We stress that, adopting ethical targets ρ = ρ(μ), although the Wald test preserves the nominal type-I error, its power could be locally decreasing or could even vanish as μ grows (see, e.g., [6] and Table 1).

Ethics improves inference
Taking into account the suggested design strategy, this subsection deals with the improvement in terms of power that could be induced by increasing the ethical skew of suitably chosen targets. In order to stress the dependence on T, in what follows we denote the target function by ρ = ρ(x, T) and, analogously, h_ρ = h_ρ(x, T) and G_ρ = G_ρ(x, T). Since T manages the randomization component, with the ethical skew growing as T → 0, from now on we assume that ρ(x, T) is differentiable in T with ρ_T(x, T) ≤ 0 for any x, T (analogously to the previous notation, for any function ϕ(s, t) we let ϕ_s = ∂ϕ/∂s, ϕ_t = ∂ϕ/∂t and ϕ_st = ∂²ϕ/∂s∂t).
The following results show how, by adopting the new test Z n , it is possible to overcome the usual trade-off between ethical demands and inferential precision.
Theorem 3.3. If the chosen target is such that ∂G_ρ(x, T)/∂T ≤ 0 for any x, T > 0, (3.15) then the approximated power function of test Z_n is decreasing in T.

Corollary 3.2. Any given target ρ(x, T) under which the function in (3.16) is increasing in T satisfies (3.15).
Proof. From (3.15), a sufficient condition for the decreasingness in T is that the function in (3.16) is increasing in T, for any x, T > 0.

This sufficient condition holds, e.g., for ρ_B in (3.13); indeed, the corresponding function in (3.16) is increasing in T, since its partial derivative with respect to T is non-negative for any x, T > 0. Although ρ_E does not satisfy Corollary 3.2, it nevertheless fulfills condition (3.15) directly; the same holds for ρ_L and ρ_S after simple algebra. Figure 3 shows the behaviour of the simulated power of test Z_n under targets ρ_L, ρ_E and ρ_R for T = 1 (solid line) and T = 3 (dashed line); the results come from the same simulation scenario as before (see Example 3.2). This Figure confirms that, for any considered target, a high ethical component (i.e., low values of T) induces a gain of power.
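These monotonicity properties can be illustrated numerically for a logistic form of ρ_L, ρ_L(x) = 1/(1 + exp(−x/T)), which is our reconstruction consistent with the inequalities quoted in Example 3.2 (an assumption). Under this form, G_ρ reduces to the closed expression T sinh(x/(2T)), which is decreasing in T and dominates x/2:

```python
import math

def g_logistic(x, T):
    """G function for the logistic target rho_L(x) = 1/(1 + exp(-x/T));
    simple algebra reduces it to the closed form T*sinh(x/(2T))."""
    rho = 1.0 / (1.0 + math.exp(-x / T))
    rho_x = math.exp(-x / T) / (T * (1.0 + math.exp(-x / T)) ** 2)
    return (rho - 0.5) * math.sqrt(rho * (1.0 - rho)) / rho_x

for x in (0.2, 0.5, 1.0, 2.0):
    # the closed form matches the definition ...
    assert abs(g_logistic(x, 1.5) - 1.5 * math.sinh(x / 3.0)) < 1e-9
    # ... G decreases in T (ethics improves inference) and dominates x/2 (3.14)
    assert g_logistic(x, 1.0) > g_logistic(x, 3.0) >= x / 2.0
```

Since sinh(v) ≥ v, the bound T sinh(x/(2T)) ≥ x/2 of Theorem 3.2 is immediate for this target, while sinh(v) − v cosh(v) < 0 for v > 0 gives the decreasingness in T.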
Thus, adopting test Z_n with a target satisfying (3.15) (or, analogously, (3.16)), small values of T guarantee that more subjects will be assigned to the better treatment and, at the same time, a gain in terms of power to discriminate the chosen alternative with higher precision. The adoption of test Z_n induces a partial order between targets on the basis of G: given two targets ρ_1 and ρ_2, then ρ_1 ⪰_G ρ_2 if and only if G_{ρ_1}(x, T) ≥ G_{ρ_2}(x, T) for every x, T > 0 (i.e., ρ_1 guarantees a uniformly more powerful test than ρ_2). Within this framework, if two targets have the same functional form with different values of the randomization parameter (namely, ρ_l = ρ(x, T_l)), then Theorem 3.3 and Corollary 3.2 characterize the classes of targets guaranteeing a simultaneous improvement of the ethical skew and of the approximated power of test Z_n.
Furthermore, for any fixed T , it is possible to find classes of targets having better performances in terms of power with respect to others, as the following Remark shows.
for every x > 0.
The case of heteroscedastic responses

The following Theorem provides the condition, labelled (4.7), under which the adopted target preserves the monotonicity in μ of the approximated power function of test Z_n for heteroscedastic responses. Proof. See Appendix A.4.
For instance, targets ρ R and ρ S satisfy condition (4.7) via straightforward algebra.
The following result provides sufficient conditions on a target for inducing a monotonically increasing approximated power of test Z n .
Thus, regardless of ν, (4.8) guarantees that the approximated power of test Z_n is monotonically increasing in μ, and therefore conditions (3.9) and (3.10) of Corollary 3.1 are still sufficient. In particular, from Example 3.1, adopting ρ_E, ρ_L and ρ_S the power of test Z_n is increasing in μ.
Remark 4.1. Given a target ρ, from (4.2) and (4.6) it is straightforward to show that test Z_n is uniformly more powerful than the Wald test W_n if ρ(x, T) − 1/2 ≥ x ρ_x(x, T) for any x, T > 0, i.e., the same condition as in the case of homoscedastic outcomes (see Theorem 3.1).
The next result compares the power of test Z n ensuing from the adoption of an asymptotically best RA design converging to a given target with those of the classical Wald test under the Neyman allocation.

Corollary 4.2.
Test Z_n ensuing from the adoption of an asymptotically best RA design converging to a target ρ chosen such that condition (4.9) holds for every x, T > 0 is uniformly more powerful than W_n under the Neyman allocation. Moreover, regardless of ν, a sufficient condition for (4.9) is given by (4.10).

Proof. Adopting the Neyman allocation, from (4.2) the approximated power function of the Wald test becomes Φ(√n μ (σ_A + σ_B)⁻¹ − z_{1−α}) for μ > 0. Therefore, condition (4.9) follows directly from (4.6), while (4.10) can be easily derived.

Conditions (4.9) and (4.10) are quite restrictive: within the class of targets previously defined, only ρ_S in (2.10) satisfies (4.10) and therefore guarantees the superiority of test Z_n with respect to the Wald test combined with the Neyman allocation, regardless of the values of σ²_A and σ²_B. Finally, the next result extends Theorem 3.3 to the case of heteroscedastic responses, showing how test Z_n can overcome the usual trade-off between ethical demands and inferential precision.
then the approximated power of test Z_n is decreasing in T. Moreover, for any ν > 0, the increasingness in T of function (3.16) is sufficient to guarantee the decreasingness in T of the power.
Therefore, from Example 3.4, adopting ρ_R, ρ_S and ρ_B, the power of test Z_n remains decreasing in T even under heteroscedasticity.

Discussion
In general, the ethical goal of assigning more subjects to the better treatment and the aim of improving the care of future patients by maximizing inferential precision are two conflicting demands. In this paper, we provided a novel statistical test that allows one, under mild conditions on the adopted target as well as on the chosen RA rule, to obtain a simultaneous improvement in both the ethical demands and the power of the test. Under this design strategy, which combines an asymptotically best RA procedure, the proposed test and a suitably chosen target allocation, the compromise between ethics and inference translates into a trade-off between the ethical skew of the target and a possibly slightly inflated type-I error. This is particularly true when the sample size is small and the chosen target ρ(μ) is characterized by a strong ethical improvement, since in this case the treatment allocation proportion π_n tends to be slightly unstable as an estimator of ρ(μ). However, this does not hold for targets ρ_L and ρ_N coupled with T = 1, which, under our strategy, are able to control type-I errors.
Moreover, as further simulations omitted here for brevity have shown, for sample sizes smaller than 200 combined with a target having a high ethical component and/or small values of T, an inflation of type-I errors may arise for both the suggested test and the classical Wald test. To overcome this drawback, in the case of small sample sizes the ethical component of the target should be kept quite low, while targets having a strong ethical skew (under which the Wald test is not suitable, since its power is not monotonically increasing [6]) require adequately large samples.
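As a rough illustration of how such small-sample type-I error behaviour can be explored, the following Monte Carlo sketch estimates the rejection rate of a two-sided Wald test under H0 with a toy adaptive rule; both the rule and its 0.7/0.3 assignment probabilities are hypothetical and far cruder than the design strategy studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def wald_type1(n, reps=2000):
    """Monte Carlo type-I error of a two-sided Wald test under H0: mu = 0.

    A toy adaptive rule (not the paper's design) assigns each new patient
    to the arm with the higher current sample mean with probability 0.7.
    """
    rejections = 0
    z = 1.959963984540054  # 97.5% standard normal quantile
    for _ in range(reps):
        xA, xB = [rng.normal()], [rng.normal()]  # burn-in: one patient per arm
        for _ in range(n - 2):
            better_A = np.mean(xA) >= np.mean(xB)
            to_A = rng.random() < (0.7 if better_A else 0.3)
            (xA if to_A else xB).append(rng.normal())
        se = np.sqrt(np.var(xA, ddof=1) / len(xA) + np.var(xB, ddof=1) / len(xB))
        if abs(np.mean(xA) - np.mean(xB)) / se > z:
            rejections += 1
    return rejections / reps

print(wald_type1(50))  # compare the estimate with the nominal 0.05 level
```

Replacing the toy rule with a strongly skewed target-driven RA rule, and varying n, reproduces qualitatively the small-sample inflation discussed above.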
In particular, we showed both analytically and via simulations that the choice of target ρ_L with T close to 1 not only preserves nominal type-I errors, but also guarantees a gain in terms of both power and ethics. For instance, under the homoscedastic normal model, compared with the classical Wald test, our design strategy with ρ_L (T = 1) guarantees a power gain for Z_n of around 5% and assigns 5% more patients to the better treatment when μ = 0.2.
Taking ρ_E (with T = 1) instead, a slightly inflated type-I error is matched with a power gain of around 8% for μ = 0.2 and 9% more patients assigned to the better treatment. In contrast, although the Wald test in general preserves nominal type-I errors, under the exponential target it exhibits an anomalous behaviour of the power, which is locally decreasing and vanishes as the difference between the treatment effects grows [6].
Further research beyond the present work is needed. For instance, a possible way to overcome the instability induced by the discontinuous allocation function of ERADE could be the employment of the Doubly-Adaptive Biased Coin Design [13] with extremely high values of the randomization parameter (indeed, as the deterministic component of the allocation increases, this design tends to be asymptotically best). Another open topic is how to generalize our design strategy to multi-treatment clinical trials with more complex statistical models. Although the case of several treatments may seem a straightforward extension, the definition and the properties of the corresponding RA designs, as well as those of the vectorial target functions, are not yet available. For instance, ERADE has not yet been generalized to the case of several treatments, so that, up to now, there is no RA procedure able to converge to any desired target allocation while attaining its minimum asymptotic variance.
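For reference, a minimal sketch of the DBCD allocation function in the standard Hu–Zhang form, assuming this is the formulation adopted in [13]; the randomization parameter gamma governs the deterministic component mentioned above, with larger values pushing the allocation more forcefully toward the target.

```python
def dbcd_prob(x, rho, gamma):
    """Hu-Zhang allocation function g(x, rho) of the Doubly-Adaptive
    Biased Coin Design: probability of assigning the next patient to
    treatment A when the current allocation proportion is x and the
    estimated target is rho. Larger gamma -> more deterministic design.
    """
    if x <= 0:
        return 1.0  # boundary convention g(0, rho) = 1
    if x >= 1:
        return 0.0  # boundary convention g(1, rho) = 0
    num = rho * (rho / x) ** gamma
    den = num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return num / den

# As gamma grows, the next-assignment probability reacts more sharply
# to the gap between the current proportion x and the target rho:
for gamma in (2, 20, 200):
    print(gamma, round(dbcd_prob(0.45, 0.55, gamma), 4))
```

Note that g(ρ, ρ) = ρ for any gamma, so the rule randomizes at the target itself; only off-target proportions are corrected, increasingly deterministically as gamma grows.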
then (A.4) can be rewritten accordingly, and therefore (3.15) follows directly.

A.3. Proof of Theorem 4.1
Proof. Starting from equation (