Estimation accuracy under covariate-adaptive randomization procedures

Abstract: In this paper we provide general asymptotic properties of covariate-adaptive (CA) randomized designs aimed at balancing the allocations of two treatments across a set of chosen covariates. In particular, we establish the central limit theorem for a vast class of covariate-adaptive procedures characterized by i) a different allocation function for each covariate profile and ii) sequences of allocation rules instead of a pre-fixed one. This result allows one to derive theoretically the asymptotic expressions of the loss of information induced by imbalance and of the selection bias due to the lack of randomness, which are the fundamental properties for estimation of every CA rule, widely used to compare different CA procedures. Besides proving some unsolved conjectures about CA designs suggested in the literature, explored up to now almost exclusively through simulations, our results provide substantial insight for future proposals and represent an accurate tool for large sample comparisons between CA designs. A numerical study is also performed to assess the validity of the suggested approach.


Introduction
The statistical research on adaptive randomized designs for treatment comparisons has advanced substantially over the past five decades, especially in the context of medical and pharmacological studies, where randomization is regarded as a must, since it guarantees comparable treatment groups and suitable protection against several forms of bias, e.g., the selection bias arising from the investigators being able to guess the treatment allocations in advance, the accidental bias due to the presence of confounders or time trends, etc. Brought to the fore by Efron [17], adaptive randomization rules are sequential allocation procedures where at each step the accrued information is used to make decisions about the way of randomizing the allocation of the next subject. In particular, Efron's Biased Coin Design (BCD) and, more generally, the so-called restricted randomization methods are adaptive procedures that, by taking into account at each step the sequence of previous assignments, are intended to achieve nearly balanced groups by forcing the allocation of the under-represented treatment, while maintaining at the same time a good degree of randomness (see e.g. [1,32,33,37]).
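As a concrete illustration (not part of the paper), Efron's coin can be sketched in a few lines; the bias p = 2/3 is Efron's classical choice, while the sample size and seed are arbitrary assumptions:

```python
import random

def efron_bcd(n, p=2/3, seed=0):
    """Simulate Efron's Biased Coin Design: favour the currently
    under-represented treatment with probability p (p = 2/3 in Efron's paper).
    Returns the allocation sequence (1 = treatment A, 0 = treatment B)."""
    rng = random.Random(seed)
    alloc = []
    d = 0  # running imbalance: (#A - #B)
    for _ in range(n):
        if d == 0:
            prob_a = 0.5        # groups tied: toss a fair coin
        elif d < 0:
            prob_a = p          # A under-represented: bias towards A
        else:
            prob_a = 1 - p      # A over-represented: bias towards B
        a = 1 if rng.random() < prob_a else 0
        alloc.append(a)
        d += 1 if a == 1 else -1
    return alloc

alloc = efron_bcd(1000)
imbalance = 2 * sum(alloc) - len(alloc)  # D_n = #A - #B stays tight
```

Unlike complete randomization, whose imbalance grows like the square root of n, the mean-reverting coin keeps the imbalance bounded in probability.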
However, real life experiments and, in particular, clinical trials often involve additional information on the experimental units, expressed by a set of important covariates/prognostic factors, usually of a categorical nature. Taking these factors into account is fundamental for planning the experiment in a suitable way as well as for correct inference (see [7,22,25,29]), so that starting from the works of Zelen [40] and Taves [34], several authors have dealt with the topic of covariate-adaptive (CA) designs. These procedures modify at each step the allocation probabilities by taking into account the assignments and the characteristics of previous subjects, as well as those of the current patient, with the aim of ensuring balance between the treatment groups among a set of pre-specified covariates.
Even if balance is often considered desirable from several viewpoints, its mathematical justification is strictly related to the linear homoscedastic model assumptions (see for instance [8]). Let A and B be the treatments to be compared and suppose that for each subject entering the experiment we observe a vector X of categorical covariates, which can be represented by a vector of dummies of their levels. In general, patients' covariates are assumed to be random (i.e., they are not under the experimenters' control) but can be observed before assigning a treatment. Each statistical unit is assigned to one of the two treatments, and we assume that the observation of the ith subject with covariate profile x_i follows (at least approximately) the linear homoscedastic model
E(Y_i) = \delta_i \mu_A + (1 - \delta_i)\mu_B + f^t(x_i)\beta, \quad Var(Y_i) = \sigma^2 \quad (i = 1, \ldots, n), \quad (1.1)
where \delta_i denotes the corresponding allocation, with \delta_i = 1 if he/she is assigned to A and 0 otherwise, \mu_A and \mu_B are the treatment effects, f^t(\cdot) is a known vector function which may include interactions among the covariates, and \beta is a (q - 1)-dim vector of common regression parameters. This model can be rewritten in matrix form as
E(Y_n) = \delta_n(\mu_A - \mu_B) + F_n\tilde\beta, \quad Var(Y_n) = \sigma^2 I_n,
with Y_n = (Y_1, \ldots, Y_n)^t, \delta_n = (\delta_1, \ldots, \delta_n)^t, F_n = \{(1; f^t(x_i))\}_{n \times q}, \tilde\beta^t = (\mu_B; \beta^t), and I_n denotes the n-dim identity matrix. In this setting it is customary to regard \beta as a nuisance, and the inferential interest typically lies in estimating \mu_A - \mu_B, or (\mu_A; \mu_B), as precisely as possible. In both cases, the efficiency of the design is given by 1 - n^{-1}L_n, where (assuming F_n^t F_n non-singular)
L_n = b_n^t (F_n^t F_n)^{-1} b_n \quad (1.2)
is the loss associated with an experiment involving n patients and b_n^t = (2\delta_n - 1_n)^t F_n is usually called the imbalance vector.
Thus, the estimation precision is maximized when the design is perfectly balanced, namely when b_n = 0_q (where 0_q denotes the q-dim vector of zeros), which means that the two treatments should be globally equireplicated and, depending on the absence or presence of interactions among covariates, also marginally balanced (i.e., A and B are equally replicated at every level of each covariate) or jointly balanced (i.e., A and B are equireplicated within each combination of the covariate levels, usually called a stratum).
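The loss and the imbalance vector defined above are straightforward to evaluate numerically. The following sketch (illustrative, not from the paper) computes L_n = b_n^t(F_n^t F_n)^{-1}b_n on a toy design with an intercept and one binary covariate: the within-stratum balanced assignment yields zero loss, while assigning everyone to A yields loss n = 4 (zero efficiency):

```python
import numpy as np

def loss(F, delta):
    """Loss of estimation precision L_n = b_n^t (F_n^t F_n)^{-1} b_n,
    with imbalance vector b_n = F_n^t (2*delta_n - 1_n)."""
    v = 2 * np.asarray(delta, dtype=float) - 1.0
    b = F.T @ v
    return float(b @ np.linalg.solve(F.T @ F, b))

# Toy design matrix: intercept plus one binary covariate, 4 subjects
# (two per stratum), so that F_n^t F_n is non-singular.
F = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
delta_balanced = [1, 0, 1, 0]    # A and B balanced within each stratum
delta_all_A    = [1, 1, 1, 1]    # degenerate design: everyone on A
```

Here `loss(F, delta_balanced)` is 0 and `loss(F, delta_all_A)` equals n = 4, consistent with the efficiency formula 1 - n^{-1}L_n.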
Adopting an optimum design approach, in order to minimize (1.2) sequentially, Atkinson [1] suggested his famous D_A-optimum BCD, which assigns the (n + 1)st subject to A with probability proportional to the quantity \{1 - (1; f^t(x_{n+1}))(F_n^t F_n)^{-1} b_n\}^2, namely
\varphi = \frac{[1 - (1; f^t(x_{n+1}))(F_n^t F_n)^{-1} b_n]^2}{[1 - (1; f^t(x_{n+1}))(F_n^t F_n)^{-1} b_n]^2 + [1 + (1; f^t(x_{n+1}))(F_n^t F_n)^{-1} b_n]^2}, \quad (1.3)
where (1; f^t(x_{n+1}))(F_n^t F_n)^{-1} b_n represents a measure of overall covariate imbalance for the profile x_{n+1} after n steps. Note that under (1.3), at each step the components of the imbalance vector b_n that are activated are the global imbalance and the imbalance terms corresponding to the margins and the stratum of the present subject, suitably weighted on the basis of the observed representativeness of the different profiles. Even if this rule is perfectly justified from a mathematical perspective and is also highly general, since it deals with any number of treatments and any type of covariates, its mathematical structure is quite involved, and this is the main reason why there is a lack of theoretical results about the properties of the D_A-optimum BCD, which have been explored almost exclusively through simulations (see [2,3,4,5,6]). In the same setting, Begg and Iglewicz [13] suggested a simplified scenario where the treatment allocations are deterministically based on the sign of the quantity (1; f^t(x_{n+1}))b_n, which corresponds to substituting F_n^t F_n with I_q in (1.3); thus, the ensuing overall imbalance measure puts together the global imbalance, the marginal imbalances and the stratum imbalance involved for the current profile (namely, the same imbalance terms of the D_A-BCD) by summing them without weights; no particular justification, nor properties, was provided for such a proposal (see [28]).
On the other hand, thorough theoretical work was done by Smith [32,33], who modified Atkinson's idea by approximating n^{-1}F_n^t F_n with its limit Q = E[(1; f^t(X))^t(1; f^t(X))], the non-singular matrix depending on the covariate distribution, which is assumed to be a priori known; therefore, the weights are now related to the true representativeness of the covariates in the population of interest, but the required knowledge of Q strongly reduces the applicability of the procedure in actual practice.
Additional classes of CA rules are the well-known minimization methods of Pocock and Simon [26], aimed at achieving marginal balance, where an Efron's coin is applied to a measure of overall imbalance that sums the imbalances over the margins of the covariates (see also [20,31,34]), and the stratified randomization methods, which assign the treatments by applying a restricted randomization rule to a measure of the current stratum imbalance in order to achieve joint balance (see [18,23]). Generally, stratified randomization is quite simple to implement and is still widely used in clinical practice (see e.g. [35,42]).
However, as correctly stated in Rosenberger and Sverdlov [28] and Ma, Hu and Zhang [25], there is a lack of theoretical results about CA rules as well as the related statistical inference. For instance, only recently have Baldi Antognini and Zagoraiou [8] provided the asymptotics of a class of stratified randomization mechanisms - called covariate-adaptive biased coin designs (CABCD) - showing that the imbalance process is bounded in probability, which guarantees excellent performance in terms of loss. Moreover, the same property holds for a family of CA rules introduced by Hu and Hu [21], where the allocations are randomized according to an Efron's coin applied to the overall imbalance measure of Begg and Iglewicz (suitably generalized by weighting the roles of the global, marginal and stratum imbalances on the basis of a set of prefixed weights).
Even if the CABCD and Hu and Hu's design seem very attractive, since they guarantee a higher order of convergence to balance than all the other CA rules, they are also very exposed to potential bias induced by the ensuing lack of randomness (i.e., predictability), both asymptotically and for finite samples, which itself plays a fundamental role from the viewpoint of estimation precision. Indeed, as emphasized by Atkinson [6] in his review, for a suitable comparison between CA procedures it is important to combine the loss (or other measures of balance) with a measure of selection bias. Otherwise, "It is, however, self-evident that, if selection bias is not an issue, deterministic construction of optimum designs will provide the lowest loss out of all rules considering one treatment allocation at a time". As discussed by many authors (see e.g. [5,6,7,10,33]), a suitable compromise between these two goals consists in adopting procedures that strongly force balance for small samples and become increasingly random as the sample size grows. This is possible for CA rules under which the imbalance process satisfies a CLT property but, until now, excluding Smith's work, the only theoretical results about the asymptotics of CA rules have been proved for the CABCD and Hu and Hu's rule (see [8,21,25]), where the emphasis is on the ergodic properties of the imbalance process, which has a Markovian structure. Moreover, although a theoretical analysis of hypothesis testing under CA rules has recently been provided (see [25,29,30]), the effects of adopting CA randomization procedures on estimation, as well as the comparisons of different methods, have been approached almost exclusively through simulations (see [2,3,13,23,26]).
The aim of this paper is to provide general theoretical results on the asymptotic properties of covariate-adaptive randomized designs aimed at balancing the allocations of two competing treatments in the presence of prognostic factors, taking into account both i) the loss of estimation precision induced by imbalance and ii) the selection bias due to the lack of randomness in the allocation process, namely the fundamental ingredients from an estimation viewpoint under CA randomization. In particular, we prove the asymptotic normality of the treatment allocation proportion for a vast class of CA procedures, which includes stratified randomization as well as allocation rules depending on the information accrued from all the strata, taking also into account designs characterized by sequences of allocation rules instead of a prefixed one. Moreover, we provide the asymptotic variance-covariance structure in explicit form, stressing the consequences in terms of estimation precision. Under this approach it is possible to derive the asymptotics of several suggested procedures as special cases, as well as to prove some conjectures of the current CA literature that have remained unsolved up to now, also providing a general asymptotic framework for future proposals.
The paper is structured as follows. After introducing the notation in Section 2, Section 3 deals with the central limit theorem for the treatment allocation proportion of CA rules, while in Section 4 we derive analytically the ensuing loss and selection bias for asymptotically normal CA procedures. Section 5 discusses stratified randomization rules as a special case, while Section 6 deals with the asymptotics of Atkinson's rule. A simulation study is performed in Section 7 to assess the validity of the suggested approach, giving also some recommendations about a proper choice of CA rules in actual practice.

Notation for Covariate-Adaptive designs
The standard notation for CA rules is usually based on the specification of the covariates of interest and their levels, for instance through the definition of f(x) in the linear model (1.1), in order to characterize the different strata as well as the margins of the chosen factors. However, to avoid cumbersome notation, for most of the paper we adopt a simplified notation by vectorizing the combinations of the levels of the chosen covariates into K ≥ 1 different strata, denoted by k = 1, \ldots, K, and letting Z_i = k represent the stratum of the ith subject. Only when required (as in Section 4), to emphasize the role of the margins of the covariates of interest, this vectorized notation will simply change according to a linear transformation preserving the chosen ordering of the strata.
Suppose that the covariates are iid with joint probability distribution over the strata p = (p_1, \ldots, p_K)^t, where p_k = Pr(Z = k) > 0 and p^t 1_K = 1, with 1_K the K-dim vector of ones. After n steps, at each stratum k (with k = 1, \ldots, K) let N_{nk} = \sum_{i=1}^n 1_{\{Z_i = k\}} be the number of subjects with this covariate profile, \hat p_{nk} = n^{-1}N_{nk} the corresponding percentage, and \tilde N_{nk} = \sum_{i=1}^n \delta_i 1_{\{Z_i = k\}} the number of allocations to A within this stratum (where 1_{\{E\}} is the indicator function of the event E); also, let \pi_{nk} = \tilde N_{nk}/N_{nk} be the allocation proportion to A and D_{nk} = N_{nk}(2\pi_{nk} - 1) the corresponding imbalance, namely the difference between the number of subjects assigned to A and those assigned to B in stratum k. Moreover, we set \pi_n = (\pi_{n1}, \ldots, \pi_{nK})^t, N_n = (N_{n1}, \ldots, N_{nK})^t, \hat p_n = (\hat p_{n1}, \ldots, \hat p_{nK})^t, \hat P_n = diag(\hat p_n), P = diag(p) and D_n = (D_{n1}, \ldots, D_{nK})^t, where clearly N_n^t 1_K = n and \hat p_n^t 1_K = 1 for n ≥ 1, while D_n^t 1_K = D_n is the global imbalance between the two arms after n steps.
In general, under CA procedures the sequence of allocations is a stochastic process that can be represented by the sequence of the conditional probabilities of assigning A given the previous allocations and covariates, as well as the covariate of the present subject, namely Pr(\delta_{n+1} = 1 | \delta_1, \ldots, \delta_n; Z_1, \ldots, Z_n, Z_{n+1}). So, letting \mathcal{F}_n = \sigma\{\delta_1, \ldots, \delta_n; Z_1, \ldots, Z_n\} be the \sigma-field representing the natural history of the experiment up to step n, with \mathcal{F}_0 the trivial \sigma-algebra, from now on we consider the class of CA procedures such that the first allocation at each stratum is completely randomized and, at each step n,
Pr(\delta_{n+1} = 1 | \mathcal{F}_n, Z_{n+1} = k) = \varphi_{nk}(\pi_n, \hat p_n), \quad k = 1, \ldots, K, \quad (2.1)
where \varphi_{nk} is the so-called allocation function, and we set \varphi_n(\pi_n, \hat p_n) = (\varphi_{n1}(\pi_n, \hat p_n), \ldots, \varphi_{nK}(\pi_n, \hat p_n))^t. Note that this framework is quite general, since
• at each stratum k, it allows one to consider a sequence of allocation functions \{\varphi_{nk}\}_{n≥1} instead of a prefixed one \varphi_k, in order to model the relative importance of the experimental goals (often balance and randomness) also as a function of the sample size [11,12];
• the allocation functions can differ across strata, to allow a possibly different importance of the covariate profiles [20,21]; otherwise, letting \varphi_{nk} = \varphi_n for every k = 1, \ldots, K, all the strata are treated in the same way;
• under (2.1), the evolutions of the allocation proportions at different strata are not independent, since they may involve the information accrued from all the strata. Clearly, the so-called stratified randomization is a special case with \varphi_{nk}(\pi_n, \hat p_n) = \varphi_{nk}(\pi_{nk}, \hat p_{nk}) for k = 1, \ldots, K, i.e., for any given stratum, at each step the allocation probability depends only on the actual allocation proportion and the current percentage of subjects observed for this profile.
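To make the framework concrete, here is a minimal simulation sketch (an illustration, not from the paper) of a stratified randomization rule in which an Efron-type coin with bias 2/3 is applied independently within each stratum, i.e. \varphi_{nk} depends only on the current imbalance of stratum k; the covariate distribution, sample size and seed are arbitrary assumptions:

```python
import random

def stratified_efron(n, p_strata, coin=2/3, seed=1):
    """Stratified randomization: an Efron coin applied independently
    within each of the K strata. p_strata is the covariate distribution
    over the strata (must sum to 1). Returns the per-stratum imbalances
    D_nk and sample sizes N_nk."""
    rng = random.Random(seed)
    K = len(p_strata)
    D = [0] * K   # per-stratum imbalance D_nk = #A - #B in stratum k
    N = [0] * K   # per-stratum sample size N_nk
    for _ in range(n):
        # sample the stratum Z of the incoming subject
        u, k = rng.random(), 0
        while u > p_strata[k]:
            u -= p_strata[k]
            k += 1
        # Efron coin based only on this stratum's imbalance
        if D[k] == 0:
            prob_a = 0.5
        elif D[k] < 0:
            prob_a = coin
        else:
            prob_a = 1 - coin
        a = 1 if rng.random() < prob_a else 0
        D[k] += 1 if a == 1 else -1
        N[k] += 1
    return D, N

D, N = stratified_efron(2000, [0.3, 0.3, 0.3, 0.1])
```

Because the coin is mean-reverting within each stratum, every D_nk stays small even though the stratum sizes N_nk are random.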
Since the goal of covariate-adaptive randomization is to balance the allocations across covariates, natural conditions (fulfilled by all the CA designs suggested in the literature) that will always be assumed throughout the paper are:
H1: the two treatments A and B should be treated symmetrically (namely, their roles could be exchanged); this corresponds to assuming at each step n the following symmetric structure for the allocation functions:
\varphi_{nk}(\pi_n, \hat p_n) = 1 - \varphi_{nk}(1_K - \pi_n, \hat p_n), \quad k = 1, \ldots, K,
which clearly implies that \varphi_{nk}(2^{-1}1_K, \hat p_n) = 1/2;
H2: at each step n, the allocation function \varphi_{nk}(\pi_n, \hat p_n) should be decreasing in \pi_{nk}, in order to force balance on the basis of the current allocation proportion (for any k = 1, \ldots, K).
Within this framework, several authors have suggested suitable conditions on the allocation rule that guarantee the asymptotic balance of CA rules, i.e.,
\lim_{n\to\infty} \pi_n = 2^{-1}1_K \quad a.s. \quad (2.4)
These conditions are based on technical approaches inspired by i) non-negative almost supermartingales [27], ii) Stochastic Approximation (SA) methods [16] and iii) the down-crossing methodology [11]. From now on we consider asymptotically balanced CA procedures, assuming that H1-H2 and (2.4) are always satisfied, in order to characterize the asymptotic normality of their treatment allocation proportion.

Asymptotic normality for CA rules
We now provide a central limit theorem for the treatment allocation proportions of asymptotically balanced CA designs in the case of categorical covariates. The underlying idea is to fit the evolution of the treatment assignments of a CA rule into a multivariate SA algorithm, in order to derive their asymptotic normality (similar results for response-adaptive designs in the absence of covariates can be found in [24,41]). The conditions of Theorem 3.1 require, in particular, that the limiting allocation function \varphi(x, y) = (\varphi_1(x, y), \ldots, \varphi_K(x, y)) is Lipschitz continuous and locally differentiable at (2^{-1}1_K; p).
Remark 3.1. Note that:
• when \varphi_n(x, y) = \varphi(x, y), condition C3 becomes trivial, while C1 simply reads that \varphi(x, y) is a continuous function locally differentiable at (2^{-1}1_K; p) with bounded partial derivatives; clearly, condition C1 is quite restrictive and excludes CA procedures characterized by discontinuous allocation rules (like, e.g., the minimization method of Pocock and Simon [26] and the CA rule proposed by Hu and Hu [21] based on Efron's step function).
Moreover, the spectrum of J_{\varphi_x} also has a crucial role in deriving the matrix exponential e^{J_{\varphi_x}u} in (3.1). For instance, if J_{\varphi_x} is symmetric, then all the eigenvalues \lambda_i are real and J_{\varphi_x} can be diagonalized by an orthogonal matrix, i.e., there exists a real orthogonal matrix K such that \Lambda = K^t J_{\varphi_x} K is a diagonal matrix containing the eigenvalues of J_{\varphi_x}, and therefore e^{J_{\varphi_x}u} = K e^{\Lambda u} K^t. More generally, if J_{\varphi_x} is not diagonalizable, the Jordan canonical form can be used in order to derive e^{J_{\varphi_x}u}. Since J_{\varphi_x} is similar to a block diagonal matrix of Jordan blocks, T^{-1}J_{\varphi_x}T = diag(B_1, \ldots, B_m), where the columns of T are the generalized eigenvectors of J_{\varphi_x}, we have e^{J_{\varphi_x}u} = T diag(e^{B_1 u}, \ldots, e^{B_m u}) T^{-1} and hence e^{B_i u} = e^{\lambda_i u}\sum_{j=0}^{s-1}(u^j/j!)(B_i - \lambda_i I_s)^j, where s denotes the matrix dimension of the Jordan block B_i.

Bias
Lack of randomness of a given design, usually interpreted in terms of predictability, manifests itself in the possibility for the experimenter to partially guess the sequence of treatment allocations in order to select the most appropriate patients. Following the approach of Blackwell and Hodges [14], the selection bias is usually measured by the proportion of correct guesses under the best guessing strategy, namely picking at each step the under-represented treatment, with no preference in case of a tie.
Let U_i = 1 if the ith assignment is guessed correctly, and 0 otherwise; then the lack of randomness induced by a design after n steps can be measured by SB_n = n^{-1}\sum_{i=1}^n U_i, so that comparisons between different procedures in terms of predictability are usually based on the expected value E(SB_n) = n^{-1}\sum_{i=1}^n Pr(U_i = 1), often called the selection bias indicator (possibly re-scaled to lie in [0; 1]). For CA rules, at each step the allocation of the next subject depends on his/her covariate profile, which identifies the stratum of interest. Thus, when the ith patient, belonging to the kth stratum (k = 1, \ldots, K), is ready to be randomized, the probability of correctly guessing his/her treatment allocation under the optimal strategy is \varphi_{ik} \vee (1 - \varphi_{ik}), where a \vee b denotes the maximum between a, b \in R. A straightforward consequence of Corollary 4.1 is that \lim_{n\to\infty} E(SB_n) = 2^{-1}, namely CA rules satisfying the CLT property are asymptotically unpredictable, like complete randomization.
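A Monte Carlo estimate of E(SB_n) under the Blackwell-Hodges convergence strategy can be sketched as follows (illustrative code, not from the paper; the two-strata setting and all parameters are assumptions): complete randomization should give roughly 1/2 correct guesses, while a fully deterministic stratified rule is far more predictable (about 3/4 in this setting):

```python
import random

def selection_bias(n, rule, K=2, p=None, seed=2):
    """Estimate E(SB_n): the proportion of correct guesses when the
    experimenter always guesses the under-represented treatment in the
    incoming subject's stratum (fair guess on ties). `rule` maps the
    current stratum imbalance D_nk to the probability of assigning A."""
    rng = random.Random(seed)
    p = p or [1.0 / K] * K
    D = [0] * K
    correct = 0
    for _ in range(n):
        u, k = rng.random(), 0
        while u > p[k]:
            u -= p[k]
            k += 1
        # experimenter's guess: under-represented treatment in stratum k
        if D[k] == 0:
            guess = 1 if rng.random() < 0.5 else 0
        else:
            guess = 1 if D[k] < 0 else 0
        a = 1 if rng.random() < rule(D[k]) else 0
        correct += (guess == a)
        D[k] += 1 if a == 1 else -1
    return correct / n

cr  = selection_bias(20000, lambda d: 0.5)  # complete randomization
det = selection_bias(20000, lambda d: 1.0 if d < 0 else (0.0 if d > 0 else 0.5))
```

The deterministic rule alternates between a tied stratum (correct with probability 1/2) and a forced assignment (correct with probability 1), hence about 3/4 overall.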

Loss
To explore the properties of CA rules in terms of loss of estimation precision, a linear model setup is essential for discriminating the different roles of the covariate margins with respect to those of the strata. However, as discussed in Section 2, on the basis of the chosen f(x) specifying the covariates taken into account, their levels (i.e., the associated dummies) as well as the presence or absence of interactions among them, the vectorized notation of the previous Sections is still valid, provided that the corresponding linear transformation preserving the chosen ordering of the strata is applied. Indeed, for any chosen (q - 1)-dim vector f(x) there exists a unique (q - 1) × K-dim Boolean matrix \dot A (i.e., with elements 0 and 1) such that E[f(X)] = \dot A p. Essentially, \dot A specifies the correspondence between strata and covariate margins with respect to the chosen order (namely, which strata collapse into a given margin for each covariate). Thus, letting A^t = (1_K : \dot A^t), it follows that E(F_1^t) = Ap, b_n = AD_n and F_n^t F_n = nA\hat P_n A^t for every n, where A has full row rank q and is non-singular when the model is full (i.e., when q = K). (4.2)
Proof. See Appendix C.
Therefore, any large sample comparison between different CA rules that are asymptotically normal should be made by taking into account exclusively the asymptotic expected loss of estimation precision in (4.2), because these procedures are always asymptotically unpredictable. Moreover, Theorem 4.1 states a clear relationship between the asymptotic loss of precision and the asymptotic variance of the allocation proportion. Indeed, from (3.1) it is possible to compare different CA rules satisfying the CLT property, looking for designs that induce a suitably small variability in the allocation proportion, which corresponds to a lower asymptotic loss of inferential precision. By employing the Löwner ordering for symmetric matrices (denoted here by <_L), namely \Sigma^{(1)} <_L \Sigma^{(2)} if and only if \Sigma^{(2)} - \Sigma^{(1)} is positive definite, given two CA designs satisfying the hypotheses of Theorem 3.1 with asymptotic variances \Sigma^{(1)} and \Sigma^{(2)}, respectively, rule-1 is better than rule-2 if and only if \Sigma^{(1)} <_L \Sigma^{(2)}. It is quite obvious that the variance achieves its maximum when J_{\varphi_x} = O_K, namely when the asymptotic allocation rule \varphi(x, y) does not depend on x (as, e.g., for complete randomization). In such a case \Sigma = (4P)^{-1} and therefore n^{-1/2}b_n converges in distribution to N \sim N(0_q; APA^t); thus L_n converges in distribution to a Chi-squared r.v. \chi^2_q and hence \lim_{n\to\infty} E(L_n) = q. Thus, the asymptotic expected loss depends on the design only through the main diagonal of \Sigma, and therefore for every CA rule satisfying the hypotheses of Theorem 3.1 it is possible to identify a stratified randomization rule which guarantees the same expected asymptotic loss.
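The Löwner comparison of two asymptotic variance-covariance matrices reduces to a positive-definiteness check on their difference. A small numerical sketch (illustrative, not from the paper): the covariate distribution is an arbitrary assumption, (4P)^{-1} is the complete-randomization variance given above, and (20P)^{-1} is the value derived for Atkinson's rule in Section 6:

```python
import numpy as np

def loewner_less(S1, S2, tol=1e-12):
    """True iff S1 <_L S2, i.e. S2 - S1 is positive definite
    (all eigenvalues of the symmetric difference strictly positive)."""
    diff = np.asarray(S2, dtype=float) - np.asarray(S1, dtype=float)
    return bool(np.all(np.linalg.eigvalsh(diff) > tol))

P = np.diag([0.3, 0.3, 0.3, 0.1])   # illustrative covariate distribution
S_cr  = np.linalg.inv(4 * P)        # complete randomization: (4P)^{-1}
S_atk = np.linalg.inv(20 * P)       # Atkinson's rule (Section 6): (20P)^{-1}
```

Here `loewner_less(S_atk, S_cr)` holds, i.e. Atkinson's rule dominates complete randomization in the Löwner order, consistent with its smaller asymptotic loss.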

A special case: stratified randomization
Under stratified randomization, at each step the evolution of each stratum depends only on the information gathered up to that step for this covariate profile, namely Pr(\delta_{n+1} = 1 | \mathcal{F}_n, Z_{n+1} = k) = \varphi_{nk}(\pi_{nk}, \hat p_{nk}), k = 1, \ldots, K.
In this simplified setting, when the hypotheses of Theorem 3.1 are satisfied, the Jacobian of the limiting allocation function ϕ has a diagonal structure and its entries coincide with the corresponding eigenvalues.
Proof. See Appendix D.
Notice that, from Theorem 4.1, N \sim N(0_q; A[I_K - 2J_{\varphi_x}]^{-1}PA^t); therefore, in general, the loss L_n does not converge to a Chi-squared r.v. Theorem 5.1 and Corollary 5.1 derive analytically the relationship between the asymptotic expected loss and the number of covariates/levels involved in the model; in particular, results (5.4) and (5.5) allow one to choose the allocation function in order to obtain the desired asymptotic precision.
For instance, adopting a restricted randomization rule with the same allocation function \varphi at every stratum, to obtain a given asymptotic loss \bar L_\infty that does not depend on q, we should choose \varphi such that \rho = 2^{-1}(1 - q/\bar L_\infty); since this quantity decreases as q grows, the deterministic component in the allocation function should be increasing in q, in order to guarantee that the asymptotic loss of estimation precision is not affected by the number of prognostic factors and their levels.
Moreover, since \tilde N_{nk} = \pi_{nk}N_{nk}, the global number of allocations to A after n steps is \tilde N_n = \sum_k \pi_{nk}N_{nk}, i.e. n^{-1}\tilde N_n = \pi_n^t \hat p_n, and a further consequence of Corollary 5.1 is that, as n tends to infinity, the (global) percentage of allocations to A is asymptotically normal, with asymptotic variance depending on both i) the different allocation functions chosen in the strata and ii) the representativeness of the strata. When \rho_1 = \ldots = \rho_K = \rho (i.e., by choosing the same allocation function at each stratum), the asymptotic variance in (5.6) simply becomes [4(1 - 2\rho)]^{-1}, which confirms the conjecture of Atkinson [6] (page 19) that the distribution of \tilde N_n is asymptotically independent of the presence of covariates.

Example 5.1. Generalized Friedman Urn. We now analyze the asymptotic properties of a stratified randomization procedure based on the well-known Generalized Friedman Urn (GFU) model (see [19,36,37,39]), which can be described as follows. At each stratum k = 1, \ldots, K, suppose that there is an urn u_k containing balls of 2 different types (i.e., colors), one for each treatment. When the (n + 1)st statistical unit belonging to stratum k is ready to be randomized, a ball is drawn at random from among all the ones in u_k and, if the chosen ball is of type j (j = A, B), the corresponding treatment is assigned to the present unit. The selected ball of type j is then replaced in u_k together with \alpha additional balls of the same type and \zeta additional balls of the other type, with \alpha, \zeta ≥ 0, while the composition of every other urn u_{\tilde k} with \tilde k \neq k remains unchanged. Starting with an initial urn composition W^k_{A0} = W^k_{B0} = w and letting W^k_{jn} represent the number of balls of type j present in u_k after n assignments, then W^k_{An} = w + \alpha\tilde N_{nk} + \zeta(N_{nk} - \tilde N_{nk}); thus, the total number of balls in u_k after n steps is 2w + (\alpha + \zeta)N_{nk}, since for each subject belonging to the kth stratum (\alpha + \zeta) balls are added to u_k. This procedure corresponds to a stratified randomization with the same sequence of allocation functions at each stratum k = 1, \ldots, K.
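The per-stratum urn mechanism is easy to simulate. The sketch below (illustrative, not from the paper) runs a single stratum's urn with w = 1, \alpha = 0 and \zeta = 1 (Friedman's original scheme; all parameter choices are arbitrary assumptions) and checks that the allocation proportion approaches 1/2:

```python
import random

def friedman_urn(n, w=1, alpha=0, zeta=1, seed=3):
    """Generalized Friedman Urn within a single stratum: draw a ball,
    assign the matching treatment, then return it together with `alpha`
    balls of the same type and `zeta` balls of the opposite type.
    Returns the allocation proportion to A after n draws."""
    rng = random.Random(seed)
    wa, wb = float(w), float(w)   # current numbers of A-balls and B-balls
    n_a = 0
    for _ in range(n):
        if rng.random() < wa / (wa + wb):
            n_a += 1               # treatment A assigned
            wa += alpha
            wb += zeta
        else:                      # treatment B assigned
            wb += alpha
            wa += zeta
    return n_a / n

pi = friedman_urn(10000)
```

With \zeta > \alpha the reinforcement favours the opposite colour after each draw, so the urn composition, and hence the allocation proportion, is driven towards 1/2.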

Example 5.2. Reinforced Doubly-adaptive Biased Coin Design. Since the evolution and the convergence of the allocation proportion at each stratum depend on the number of subjects falling into that covariate profile, strata that could potentially be under-represented might induce high deviations from the corresponding targets. For this reason, Baldi Antognini and Zagoraiou [9] proposed the Reinforced Doubly-adaptive Biased Coin Design (RD-BCD), a general class of randomization procedures for categorical covariates intended to converge to any given allocation proportion by forcing closeness to the target when necessary. Taking into account the balanced target, assume at each step n the allocation rule (5.8) for the kth stratum, governed by a continuous and decreasing function \nu: (0; 1) \to R^+ \cup \{0\} that can be chosen on the basis of the need for more/less balance. Clearly, allocation (5.8) satisfies assumptions H1-H2 and, furthermore, every \varphi_k is decreasing in \pi_{nk}, so that the almost sure convergence to balance in (2.4) is guaranteed (see [11]). Moreover, letting for instance \nu(\hat p_{nk}) = \hat p_{nk}^{-1}, then J_{\varphi_x} = -P^{-1} and hence \Sigma = diag\{[4(p_k + 2)]^{-1}\}_{k=1,\ldots,K}. In general, the asymptotic expected loss of this procedure depends on the covariate distribution; under the uniform distribution (i.e., p_k = K^{-1} for every k = 1, \ldots, K), straightforward calculations show that \lim_{n\to\infty} E(L_n) = q/(1 + 2K), recalling that q = K when the model is full.

Remark 5.2. Even if Corollary 5.1 is quite general, it evidently cannot be applied to every stratified CA rule, as, for instance, to the CABCD suggested by Baldi Antognini and Zagoraiou [8]. This procedure assigns the (n + 1)st patient belonging to stratum k to A with probability F_k(D_{nk}), where at each stratum k, F_k(\cdot): R \to [0, 1] is a non-increasing and symmetric function with F_k(-x) = 1 - F_k(x). This corresponds to assuming at each step n
\varphi_{nk}(\pi_n; \hat p_n) = \varphi_{nk}(\pi_{nk}; \hat p_{nk}) = F_k[n\hat p_{nk}(2\pi_{nk} - 1)], \quad k = 1, \ldots, K, \quad (5.9)
and, from the properties of F_k, the function \varphi_{nk}(x; y) is non-increasing in x with \varphi_{nk}(1/2; y) = 1/2 for all y \in (0; 1). Thus, \varphi is antitone and therefore the CABCD is asymptotically balanced, i.e. \lim_{n\to\infty}\pi_n = 2^{-1}1_K a.s. However, the asymptotic normality stated in Theorem 3.1 does not hold, since the limiting allocation function is a step function of the allocation proportion, which is discontinuous at 1/2. Roughly speaking, although at each stratum k = 1, \ldots, K, \{\varphi_{nk}\}_{n≥1} is a sequence of continuous and differentiable functions, as n grows the partial derivative of \varphi_{nk}(x; y) with respect to x evaluated around 1/2 decreases and tends to -\infty (i.e., the asymptotic variance in (5.3) vanishes). Therefore, compared to the other CA procedures satisfying the CLT, the CABCD guarantees a higher order of convergence towards balance, due to the underlying ergodic structure of \{D_n\}_{n≥1}, but this clearly implies a remarkable drawback in terms of predictability of the allocations.

Proof of some unsolved conjectures about Atkinson's D A -optimum BCD
Inspired by the Wynn-Fedorov sequential construction of D-optimal designs under the linear model (1.1), Atkinson's D_A-BCD is a randomized CA rule defined by (6.1). This rule is asymptotically balanced (see for instance [1,11,15]) and coincides (or not) with a stratified randomization on the basis of the presence (or absence) of interactions among covariates in the model; for this reason we treat these two situations separately.

The case of full model
Assuming model (1.1) with full interaction effects among covariates, A is non-singular. Thus, letting a_k^t represent the kth row of A^t and e_k be the kth standard basis vector, if the (n + 1)st subject belongs to the kth stratum, the quantity in (1.3) reduces to the relative imbalance D_{nk}/N_{nk} of the corresponding stratum. Therefore the D_A-BCD in (6.1) becomes a stratified randomization rule (see [8]). Clearly, procedure (6.2) assumes the same allocation rule at each step n, as well as at every stratum k = 1, \ldots, K, i.e.,
\varphi(x, y) = \frac{(1 - x)^2}{x^2 + (1 - x)^2},
which is a continuous and differentiable function, decreasing in x with \varphi(1/2, y) = 1/2 for any y \in (0; 1); therefore, from Corollary 5.1, as n tends to infinity, \pi_n \to 2^{-1}1_K a.s. and \sqrt{n}(\pi_n - 2^{-1}1_K) \to_d N(0_K; (20P)^{-1}), since \rho_k = -2 for any k = 1, \ldots, K.
As regards the loss, from Corollary 4.1 and (4.3), L_n \to_d 5^{-1}\chi^2_q as n tends to infinity and therefore \lim_{n\to\infty} E(L_n) = q/5, regardless of the covariate distribution. Remark 6.1. In the same spirit as Wei [38] and Smith [32,33], Atkinson's rule (6.2) can be generalized to a whole family of allocation rules, provided that f(\cdot) is locally differentiable around 0 with bounded derivative. Moreover, these results agree with the theoretical properties derived by Smith [32,33], as well as with those obtained by Wei [38] in the absence of covariates (which can be regarded as a special case of our framework with K = 1 and p_K = 1).
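The q/5 result can be checked by simulation. The sketch below (illustrative, not from the paper) runs the stratified form of the D_A-BCD, whose allocation function under the full model is \varphi(\pi) = (1 - \pi)^2/[\pi^2 + (1 - \pi)^2], with K = q = 4 uniform strata, and compares the Monte Carlo average of L_n = \sum_k D_{nk}^2/N_{nk} with the theoretical limit q/5 = 0.8; sample size and replication count are arbitrary assumptions:

```python
import random

def atkinson_stratified(n, K=4, seed=None):
    """Stratified form of Atkinson's D_A-BCD under the full model:
    within the incoming subject's stratum, assign A with probability
    (1 - pi)^2 / (pi^2 + (1 - pi)^2), pi = current allocation proportion.
    Returns the loss L_n = sum_k D_nk^2 / N_nk (full model, q = K)."""
    rng = random.Random(seed)
    D = [0] * K   # per-stratum imbalance
    N = [0] * K   # per-stratum sample size
    for _ in range(n):
        k = rng.randrange(K)               # uniform covariate distribution
        if N[k] == 0:
            prob_a = 0.5                    # first allocation: completely randomized
        else:
            pi = (N[k] + D[k]) / (2 * N[k])  # current proportion on A
            prob_a = (1 - pi) ** 2 / (pi ** 2 + (1 - pi) ** 2)
        a = 1 if rng.random() < prob_a else 0
        D[k] += 1 if a == 1 else -1
        N[k] += 1
    return sum(d * d / m for d, m in zip(D, N) if m > 0)

losses = [atkinson_stratified(1000, seed=s) for s in range(200)]
mean_loss = sum(losses) / len(losses)   # theory: E(L_n) -> q/5 = 0.8
```

The Monte Carlo mean should settle near 0.8, in line with Burman's conjecture proved in the text.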

Model without interactions
In the absence of some/all interactions among the covariates, Atkinson's procedure is not a stratified randomization method. Indeed, from (6.1), the allocation rule of Atkinson's D_A-BCD for the kth stratum after n steps can be rewritten as ϕ_nk(π_n; p_n) = {1 + [(1 + h_k(π_n; p_n))/(1 − h_k(π_n; p_n))]²}⁻¹, where rank(A) = q < K, so that A is singular. After tedious algebra (see Appendix E), Re(λ_max) = 0, because Ξ is an idempotent matrix. Thus, from (4.2), lim_{n→∞} E(L_n) = q/5. Consequently, the asymptotic expected loss of estimation precision of Atkinson's D_A-BCD is always q/5, regardless of i) the covariate distribution, as well as the presence/absence of a correlation structure among the covariates, and ii) the interactions among covariates included in the model. This result also confirms the conjecture of Burman [15] about the asymptotic loss of Atkinson's design, where the quantity q/5 was observed in a simulation study of Atkinson's rule (in its deterministic version) with four normally distributed covariates. Moreover, from (5.7), it is easy to show that Atkinson's rule always guarantees a lower asymptotic expected loss than the Generalized Friedman Urn.
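Atkinson's allocation probability can be written either as an odds-type ratio in the imbalance quantity or in the quadratic form used in the full-model case; the two expressions are algebraically identical. A small numerical check of the identity 1/{1 + [(1 + h)/(1 − h)]²} = (1 − h)²/[(1 − h)² + (1 + h)²] over a grid of imbalance values:

```python
def phi_ratio(d):
    # odds-type form of Atkinson's rule in the (relative) imbalance d
    return 1.0 / (1.0 + ((1 + d) / (1 - d)) ** 2)

def phi_quad(d):
    # equivalent quadratic form
    return (1 - d) ** 2 / ((1 - d) ** 2 + (1 + d) ** 2)

grid = [-0.9 + 0.1 * i for i in range(19)]   # d in (-1, 1)
max_gap = max(abs(phi_ratio(d) - phi_quad(d)) for d in grid)
```

The gap is zero up to floating-point rounding, so either form can be used interchangeably in the analysis.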

A simulation study for comparing CA rules
In the present section we describe how to use the previous results in order to compare different CA procedures satisfying the CLT. A numerical study is performed to assess the properties of some CA rules on the basis of the ensuing asymptotic expected loss of estimation precision, taking into account the selection bias as well, in order to analyze the speed of convergence of E(L_n) and E(SB_n) to their asymptotic values and thus to show the validity of the suggested approach. Since, in general, the loss depends on the model assumptions, and in particular on the presence or absence of interactions among the covariates, these two situations will be treated separately (in Table 1 and Table 2, respectively).
More in detail, we consider model (1.1) with two binary covariates (i.e., K = 4) under four different scenarios: in the presence or absence of interactions (namely with q = 4 or q = 3, respectively) and under two different covariate distributions, the uniform one p^(1) = 4⁻¹1_4 and p^(2) = (0.3; 0.3; 0.3; 0.1)^t. Note that these choices of covariate distribution are motivated by the fact that for the RD-BCD the highest asymptotic expected loss is induced by i) the uniform distribution when the model is full and ii) a distribution where one stratum tends to vanish while the remaining strata are equally represented.
Table 1: Expected loss of estimation precision and selection bias (within brackets) under model (1.1) in the full version for n = 100, 200 and 500.
In what follows we compare the performances of Atkinson's D_A-BCD and the RD-BCD, discussed in Example 5.2, with ν(p̂_nj) = p̂_nj⁻¹. We also consider Hu and Hu's rule [21] - denoted by Hu&Hu(p) - with biased coin probability p equal to 2/3 and 3/4 and weights ω_g = 1/3 for the global imbalance, ω_s = 1/3 for the stratum imbalances, and the same weight 1/6 for the marginal imbalances of each of the two covariates. Moreover, to stress the peculiarities of minimization methods as well, we take into account Pocock and Simon's procedure - denoted by P&S(p) - with the same choices of the biased coin probability, i.e. 2/3 and 3/4.
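To make the weighted-imbalance mechanism concrete, here is a minimal sketch of a Hu & Hu-type allocation step for two binary covariates, with the weights used above (ω_g = ω_s = 1/3 and 1/6 per margin). It is a simplification: the coin is biased according to the sign of the current weighted imbalance, rather than comparing the hypothetical imbalances of the two candidate assignments as in the original rule.

```python
import random

def hu_hu_step(rng, counts, z, p=2/3, w_g=1/3, w_s=1/3, w_m=1/6):
    """One allocation step (sketch). counts[(z1, z2)] = [N_A, N_B] per
    stratum; z = (z1, z2) is the profile of the incoming subject."""
    D_glob = sum(a - b for a, b in counts.values())
    D_str = counts[z][0] - counts[z][1]
    D_m1 = sum(a - b for (x1, _), (a, b) in counts.items() if x1 == z[0])
    D_m2 = sum(a - b for (_, x2), (a, b) in counts.items() if x2 == z[1])
    imb = w_g * D_glob + w_s * D_str + w_m * (D_m1 + D_m2)
    # favour the under-represented treatment with probability p
    prob_A = 0.5 if imb == 0 else (1 - p if imb > 0 else p)
    a = rng.random() < prob_A
    counts[z][0 if a else 1] += 1
    return a

rng = random.Random(1)
counts = {(i, j): [0, 0] for i in (0, 1) for j in (0, 1)}
for _ in range(4000):
    z = (int(rng.random() < 0.5), int(rng.random() < 0.5))
    hu_hu_step(rng, counts, z)

n_A = sum(a for a, _ in counts.values())   # overall allocations to A
```

The strong control of the weighted imbalance keeps the overall allocation very close to 1/2, while also limiting the stratum and marginal imbalances.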
When the model is full (K = q = 4), the expected loss of both stratified randomization rules rapidly converges towards its theoretical value, given by 4/5 for the D_A-BCD (regardless of the covariate distribution) and by 4/9 and 0.439 for the RD-BCD under p^(1) and p^(2), respectively. The same behavior also holds in the absence of interactions, where the asymptotic expected loss of Atkinson's rule is 3/5, while it is 1/3 and 0.35 for the RD-BCD under p^(1) and p^(2), respectively.
In the presence of interactions, the superiority of stratified randomization rules with respect to P&S is evident. Even if the bias is substantially equivalent for the D_A-BCD and the RD-BCD, the latter guarantees an asymptotic expected loss which is almost half of Atkinson's; this is also confirmed in the absence of interactions, although the gain is smaller.
In general, for n > 200 the differences in terms of selection bias between D A -BCD and RD-BCD are negligible (i.e., of order 10 −3 ) and this validates our suggestion that the comparisons between different CA rules satisfying the CLT should be based only on the loss indicator.
As regards Hu&Hu's procedure, the expected loss rapidly vanishes (it is comparable to that of the RD-BCD only for small samples), but this procedure is characterized by extremely high levels of predictability, around 66% when p = 2/3 and 73% when p = 3/4 (namely like those of the permuted block designs of sizes 2 and 3, respectively), regardless of the model assumptions.
A different analysis should be made for Pocock and Simon's minimization method. Regardless of the presence/absence of interactions, the selection bias of P&S is extremely high (around 70% when p = 3/4 and 64% when p = 2/3, although P&S is less predictable than Hu&Hu's rule with the same biased coin probability) and quite stable for any sample size, which clearly prejudices the usefulness of minimization methods due to their high exposure to potential bias. As regards the loss, when the model is full the asymptotic loss of Pocock and Simon's rule is higher than that of the considered stratified randomization procedures; moreover, the loss does not vanish and, asymptotically, seems to converge to 1 (even if the marginal imbalances and the global one are always bounded in probability, as proved in Ma, Hu and Zhang [25], in this case the stratum imbalances seem to be unbounded, as conjectured by Hu and Hu [21]). Thus, Pocock and Simon's minimization method performs worse than the D_A-BCD and stratified randomization with respect to both loss and predictability, namely the P&S rule is dominated in the sense of Atkinson [5,6]. Only in the absence of interactions does the P&S method perform very well in terms of loss, which vanishes asymptotically, while both the RD-BCD and the D_A-BCD induce a higher loss, having however a negligible impact in terms of inferential efficiency (the expected loss of efficiency is of order 10⁻³). In other words, from the viewpoint of loss of inferential precision, the performance of the P&S rule is strictly related to the model assumptions, in particular to the presence/absence of interactions among the covariates, while its level of predictability, which clearly increases as the biased coin probability grows, is always quite large and greater than 1/2.
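Predictability figures of this kind are obtained by simulating the Blackwell-Hodges "convergence" guessing strategy, i.e. always guessing the treatment that the rule currently favours, and recording the proportion of correct guesses. As a self-contained illustration (Efron's single-coin BCD, not one of the rules in the tables), the following sketch estimates this proportion for p = 2/3: the imbalance is zero about 1/4 of the time (a coin-flip guess), and otherwise the guess is correct with probability 2/3, giving a long-run rate of 5/8 = 62.5%.

```python
import random

def guess_rate_efron(n, p=2/3, seed=3):
    """Estimate the long-run proportion of correct guesses under the
    convergence strategy for Efron's BCD(p) (no covariates, K = 1)."""
    rng = random.Random(seed)
    D, correct = 0, 0            # D = N_A - N_B
    for _ in range(n):
        prob_A = 0.5 if D == 0 else (p if D < 0 else 1 - p)
        guess_A = prob_A >= 0.5  # guess the more likely treatment
        a = rng.random() < prob_A
        correct += (guess_A == a)
        D += 1 if a else -1
    return correct / n

rate = guess_rate_efron(100000)
```

The same scheme, applied stepwise to any of the CA rules above, is what produces the bracketed selection-bias entries in the tables.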
Therefore, our suggestion is that minimization methods should be adopted only i) for clinical trials characterized by strong double-blind protocols, ii) when the adopted linear model does not contain interactions among the covariates, and iii) if the number of strata - i.e., the number of covariates and their levels - is large compared with the available sample size (in this case stratified randomization rules cannot evolve suitably, since the number of subjects in every stratum tends to be extremely small; this situation is anyhow critical from an estimation viewpoint, due to the magnitude of the standard errors of the estimates). Hu&Hu's rule, too, should be employed only in the case of strong double-blind protocols, especially for large sample sizes, due to the asymptotic convergence of its loss to 0. In all other situations, stratified randomization methods - and in particular the RD-BCD - can guarantee superior performances regardless of the model assumptions.

Appendix A: Proof of Theorem 3.1
In this proof we first show that, under CA randomization, the evolution of a suitable transformation of the treatment allocation proportions can be represented as a multivariate stochastic approximation (SA) algorithm; then, we derive the asymptotic normality of π_n by applying the so-called stochastic differential equation method (see [24]).
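The SA representation rests on the elementary recursion for allocation proportions: writing δ_{n+1} ∈ {0, 1} for the (n + 1)st assignment indicator, N_A(n + 1)/(n + 1) = π_n + (δ_{n+1} − π_n)/(n + 1), i.e. a Robbins-Monro-type update with step size 1/(n + 1). A minimal numerical sketch of this identity (the fair-coin assignments are only a placeholder; any allocation rule obeys the same recursion):

```python
import random

rng = random.Random(7)
n_A, n = 0, 0
pi = 0.0
for _ in range(1000):
    delta = int(rng.random() < 0.5)   # placeholder assignment indicator
    n_A += delta
    n += 1
    # stochastic-approximation form of the update
    pi = pi + (delta - pi) / n

gap = abs(pi - n_A / n)   # recursion reproduces the running proportion
```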