Statistical properties of two-color randomly reinforced urn design targeting fixed allocations

Abstract: This paper deals with the statistical properties of a response-adaptive design, described in terms of a two-color urn model, targeting prespecified asymptotic allocations. Results on the rate of divergence of the number of patients assigned to each treatment are proved, as well as results on the asymptotic behavior of the urn composition. Suitable statistics are introduced and studied to test hypotheses on the difference between treatments.


Introduction
In this paper we study the statistical properties of a response-adaptive design, described in terms of a two-color urn model, that is able to target any fixed asymptotic allocation probability. The model considered in this work is the Modified Randomly Reinforced Urn (MRRU) introduced and studied in [4]. The generality of the mathematical setting allows this experimental design to be applied in a broad range of areas. However, since urn models are usually adopted to compare two or more competing treatments, this work is illustrated within a clinical trial framework. In this context, adaptive designs are attractive because they aim to achieve two simultaneous goals, one statistical and one ethical: (a) collecting evidence to determine the superior treatment, and (b) increasing the allocation of units to the superior treatment. For a complete literature review on response-adaptive designs see [18] and [28]. Urn models are among the most attractive adaptive designs, since they guarantee the randomization of allocations [28]. Asymptotic results concerning urn models with an irreducible mean reinforcement matrix can be found in [5,6,8,20] and [28]. This irreducibility assumption is not satisfied, for example, by the Randomly Reinforced Urn (RRU) studied in [22,26,27], which is described by a diagonal mean replacement matrix. The RRU models were introduced in [10] for binary responses, applied to dose-finding problems in [11,12], and then extended to the case of continuous responses in [7,26]. In these models, an urn is sequentially sampled and virtually reinforced by adding a random quantity of balls that depends on the response to the treatment associated with the sampled color. For instance, the generalized Polya urn models with different reinforcement means belong to this class.
RRU designs have usually been adopted to compare competing treatments in a clinical trial framework, when the main goal is to minimize the number of subjects assigned to the inferior treatment. In fact, an interesting property of RRU models is that the probability of allocating units to the superior treatment converges to one as the sample size increases. However, because of this asymptotic behavior, RRU models do not belong to the large class of designs targeting a fixed proportion $\eta \in (0, 1)$, which is usually chosen to satisfy some optimality criterion. Hence, the desirable asymptotic properties of these procedures presented in the literature (see, for instance, [24] and [25]) are not straightforwardly fulfilled by RRU designs. Moreover, the asymptotic behavior of the RRU design presents other drawbacks that are relevant for the inferential phase of the trial. For large samples, RRU designs generate treatment groups with very different sample sizes. Hence, inferential procedures based on these designs are usually characterized by very low power. For these reasons, in [4] the urn scheme of the RRU design was suitably modified in order to construct a new urn model, called the Modified Randomly Reinforced Urn (MRRU) design, that asymptotically targets an allocation proportion $\eta \in (0, 1)$ while still minimizing the number of subjects allocated to the inferior treatment. Other papers have described urn models that can target any desired allocation. For instance, in [8] a general class of immigrated urn models with this feature is presented. In this paper, we provide some asymptotic results concerning reinforced urn models that in [8] are approached under very particular conditions.
In Section 2 we describe the MRRU model, on which this work is based. Consider an urn containing balls of two colors (red and white) that is sequentially sampled. Each time, the extracted ball is returned to the urn together with a random number of balls of the same color. To fix the notation, we call $\mu_R$ and $\mu_W$ the laws of the random reinforcements of red and white balls, respectively, and $m_R$, $m_W$ the corresponding means. Let $X = (X_n)_{n\in\mathbb{N}}$ ($X_n \in \{0, 1\}$, $n = 1, 2, \ldots$) be the sequence of colors sampled from the urn and $Z = (Z_n)_{n\in\mathbb{N}}$ ($Z_n \in (0, 1)$, $n = 0, 1, 2, \ldots$) the sequence of urn proportions before each draw. We report the main result proved in [4], concerning the almost sure convergence of the process $(Z_n)_{n\in\mathbb{N}}$ to a fixed parameter $\eta \in (0, 1)$ whenever the means of the reinforcement distributions are different. We prove that the proportion of colors sampled from the urn converges to the same limit as the urn composition. Since this proportion also represents the proportion of patients assigned to each treatment, we are able to control the asymptotic allocation of patients.
Section 3 focuses on the rate of convergence of the process $(Z_n)_{n\in\mathbb{N}}$ in the MRRU model. Important results on the asymptotic behavior of the urn proportion $(Z_n)_{n\in\mathbb{N}}$ for an RRU model were developed in [13], in the case of reinforcements with different expected values. In [13] it was proved that the rate of convergence of the process $(Z_n)_{n\in\mathbb{N}}$ to one (i.e. its limit in the case $m_R > m_W$) is equal to $1/n^{\gamma}$, with $\gamma = 1 - \frac{m_W}{m_R} < 1$. Moreover, the quantity $n^{\gamma}(1 - Z_n)$ converges almost surely to a positive random variable, whose behavior has been studied in [19] and [23]. In Theorem 3.1 of this paper it is proved that, for the MRRU model, the rate of convergence of the process $(Z_n)_{n\in\mathbb{N}}$ to its limit $\eta \in (0, 1)$ is $1/n$. This asymptotic result is obtained by defining a particular Markov process, denoted $(T_n)_{n\in\mathbb{N}}$, based on the quantities that rule the urn process. The study of the stochastic properties of the process $\tilde{T}_n$ (see Appendix and Proposition 3.1) is crucial for proving Theorem 3.1. Moreover, Theorem 3.1 shows that the sequence $n(\eta - Z_n)$ converges in distribution to a real random variable, whose probability law is related to the unique invariant distribution $\pi$ of the process $(\tilde{T}_n)_{n\in\mathbb{N}}$.
Section 4 is devoted to the inferential properties of the design described in Section 2. We deal with a classical framework, testing the null hypothesis that the reinforcement means are equal ($m_R = m_W$) against the one-sided alternative hypothesis ($m_R > m_W$). We consider different statistical tests, based either (a) on adaptive estimators of the unknown means or (b) on the urn proportion. Under the null hypothesis, the asymptotic behavior of statistics of type (a) has been studied in many works (see for instance [25] and the bibliography therein) for adaptive designs with target allocation $\eta \in (0, 1)$ and in [13] for RRU designs. On the other hand, asymptotic properties of statistics of type (b) in an RRU design were investigated in [1,2,3].
However, under the null hypothesis the asymptotic distribution of the urn proportion's limit is still unknown, except in a few particular cases. Under the alternative hypothesis the behavior of statistics based on adaptive estimators of the unknown parameters has been investigated, for instance, in [29,18] for adaptive designs with target allocation η ∈ (0, 1). For RRU designs, the asymptotic properties of both types of statistics have been studied in [13]. We compare statistical properties of tests based on RRU design and tests based on the MRRU design.
In Section 5 we present some simulation studies on the probability distribution $\pi$ and on the statistical properties of the tests introduced in Section 4. Section 7 contains a final discussion and concludes the paper. To ease comprehension, the most technical proofs are deferred to the Appendix.

The modified randomly reinforced urn design
Consider a clinical trial with two competing treatments, say $R$ and $W$. In this section we describe a response-adaptive design, presented as an urn model, that is able to target any fixed asymptotic allocation. This model, called MRRU and introduced in [4], is a modified version of the RRU design studied in [26]. In both cases the reinforcements are modeled as random variables following different probability distributions. In the MRRU model we modify the reinforcement scheme of the urn to asymptotically target an optimal allocation proportion. The term target refers to the limit of the urn proportion process. Let us consider two probability distributions $\mu_R$ and $\mu_W$ with support contained in $[\alpha_R, \beta_R]$ and $[\alpha_W, \beta_W]$ respectively, where $0 < \alpha_R \le \beta_R < +\infty$ and $0 < \alpha_W \le \beta_W < +\infty$. Let $(U_n)_{n\in\mathbb{N}}$ be a sequence of independent uniform random variables on $(0, 1)$. We interpret $\mu_R$ and $\mu_W$ as the laws of the responses to treatments $R$ and $W$, respectively. We assume that both the means $m_R = \int_{\alpha_R}^{\beta_R} x\,\mu_R(dx)$ and $m_W = \int_{\alpha_W}^{\beta_W} x\,\mu_W(dx)$ are strictly positive. Moreover,
Assumption 2.1. At least one of these two conditions is satisfied: the measure $\mu_W$ is absolutely continuous with respect to the Lebesgue measure and the derivative is strictly positive, i.e. ∃ µW (dx)
Consider an urn initially containing $r_0$ balls of color $R$ and $w_0$ balls of color $W$. Set
At time $n = 1$, a ball is sampled from the urn; its color is $X_1 = \mathbf{1}_{[0,Z_0]}(U_1)$, a random variable with Bernoulli($Z_0$) distribution. Let $M_1$ and $N_1$ be two independent random variables with distribution $\mu_R$ and $\mu_W$, respectively; assume that $X_1$, $M_1$ and $N_1$ are independent.
Next, if the sampled ball is $R$, it is returned to the urn together with $X_1 M_1$ balls of the same color if $Z_0 < \eta$, where $\eta \in (0, 1)$ is a suitable parameter; otherwise the urn composition does not change. If the sampled ball is $W$, it is returned to the urn together with $(1 - X_1) N_1$ balls of the same color if $Z_0 > \delta$, where $\delta < \eta$ is a suitable parameter; otherwise the urn composition does not change. So we can update the urn composition in the following way
Now iterate this sampling scheme forever. Thus, at time $n + 1$, given the sigma-field $\mathcal{F}_n$ generated by $X_1, \ldots, X_n$, $M_1, \ldots, M_n$ and $N_1, \ldots, N_n$, let $X_{n+1} = \mathbf{1}_{[0,Z_n]}(U_{n+1})$ be a Bernoulli($Z_n$) random variable and, independently of $\mathcal{F}_n$ and $X_{n+1}$, assume that $M_{n+1}$ and $N_{n+1}$ are two independent random variables with distribution $\mu_R$ and $\mu_W$, respectively. Set
We thus generate an infinite sequence $X = (X_n, n = 1, 2, \ldots)$ of Bernoulli random variables, with $X_n$ representing the color of the ball sampled from the urn at time $n$, and a process $(Z, D) = ((Z_n, D_n), n = 0, 1, 2, \ldots)$ with values in $[0, 1] \times (0, \infty)$, where $D_n$ represents the total number of balls in the urn before it is sampled for the $(n + 1)$-th time, and $Z_n$ is the proportion of balls of color $R$; we call $X$ the process of colors generated by the urn, while $(Z, D)$ is the process of its compositions. Let us observe that the process $(Z, D)$ is a Markov sequence with respect to the filtration $\mathcal{F}_n$. In [4] it was proved that the sequence of proportions $Z = (Z_n, n = 0, 1, 2, \ldots)$ of the urn process converges almost surely to the following limit
Since the urn proportion $Z_{n-1}$ represents the conditional probability of assigning subject $n$ to treatment $R$, this result shows that the target allocation depends on which treatment is superior. The parameter $\delta$ represents the desired limit when $W$ is the superior treatment ($m_R < m_W$), while $\eta$ is the desired limit when $R$ is the superior treatment ($m_R > m_W$).
The dichotomy between the possible limits 0 and 1 in the RRU design becomes the dichotomy between $\delta$ and $\eta$ in the MRRU design. The parameters $\delta$ and $\eta$ can be arbitrarily fixed by the experimenter, either to asymptotically assign a small proportion of subjects to the inferior treatment or to balance the allocations. A way to set $\delta$ and $\eta$ in order to improve the statistical performance of tests based on the trial is studied in [16].
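The sampling and reinforcement rules described in this section can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simulate_mrru`, its arguments, the default initial composition and the uniform reinforcement laws used in the usage note are all hypothetical choices made for the example.

```python
import random

def simulate_mrru(n, eta, delta, draw_red, draw_white, r0=1.0, w0=1.0, seed=0):
    """Simulate n draws of a two-color MRRU.

    draw_red / draw_white are callables taking a random.Random instance and
    returning one reinforcement (hypothetical interface for this sketch).
    Returns (colors, proportions): X_1..X_n and Z_0..Z_n.
    """
    rng = random.Random(seed)
    r, w = r0, w0
    colors = []
    proportions = [r / (r + w)]          # Z_0
    for _ in range(n):
        z = r / (r + w)                  # urn proportion before the draw
        x = 1 if rng.random() <= z else 0  # color is red with probability z
        m = draw_red(rng)                # reinforcement for red
        k = draw_white(rng)              # reinforcement for white
        if x == 1 and z < eta:
            r += m                       # red reinforced only below eta
        elif x == 0 and z > delta:
            w += k                       # white reinforced only above delta
        colors.append(x)
        proportions.append(r / (r + w))
    return colors, proportions
```

For instance, with reinforcements uniform on [8, 12] for red and on [4, 6] for white (so that the red mean is larger), `simulate_mrru(5000, 0.8, 0.2, lambda g: g.uniform(8, 12), lambda g: g.uniform(4, 6))` should return proportions whose last entries are close to the target 0.8.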
In this paper we study the urn process under the hypothesis $m_R > m_W$, since the case $m_R < m_W$ is symmetric. Let us notice that in this case $P(Z_n < \delta \text{ i.o.}) = 0$; then, since we deal with asymptotic results, from now on we can assume without loss of generality that $\delta = 0$.
In this section we study some interesting features of the urn process. The first result concerns the proportion of colors sampled from the urn. Here we prove that it converges to the same limit as the urn proportion $Z_n$.
Proof. Let us denote $\xi_n = \frac{Z_{n-1} - X_n}{n}$ for any $n \ge 1$, with $\xi_0 = 0$. Then, $(\xi_n)_{n\in\mathbb{N}}$ is a sequence of random variables adapted with respect to the filtration $(\mathcal{F}_n)_{n\in\mathbb{N}}$ by using Kronecker's lemma, and so where the first term goes to zero thanks to the Toeplitz Lemma, since $Z_n$ converges to $\eta$ almost surely.
The following proposition shows the rate of divergence of the total number of balls in the urn. The sequence (D n /n, n = 0, 1, 2, . . .) converges almost surely to the mean of the inferior treatment.
where the almost sure convergence to zero of the last term can be proved with the same arguments used to prove Proposition 2.1. This result implies that

A. Ghiglietti and A. M. Paganoni
Since $Z_n \to \eta$ almost surely, we get
Globally we obtain
Notice that in an RRU model the sequence $D_n/n$ converges almost surely to the mean of the superior treatment. In fact, in an RRU model, when on a set of probability one. The result (2.6) is proved following the same arguments of (2.5).
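The two limits discussed so far (the share of red draws tending to $\eta$ and $D_n/n$ tending to the mean of the inferior treatment when $m_R > m_W$) can be checked numerically. The sketch below is only an illustration: the function name `mrru_long_run`, the initial composition and the uniform reinforcement laws are assumptions made for this example.

```python
import random

def mrru_long_run(n, eta, delta, red_range, white_range, seed=1):
    """Return (share of red draws, D_n / n, Z_n) after n draws of an MRRU.

    Reinforcements are uniform on red_range and white_range, an assumption
    made only for this example.
    """
    rng = random.Random(seed)
    r, w = 1.0, 1.0                      # hypothetical initial composition
    red_draws = 0
    for _ in range(n):
        z = r / (r + w)
        if rng.random() <= z:            # a red ball is sampled
            red_draws += 1
            if z < eta:                  # reinforce red only below eta
                r += rng.uniform(*red_range)
        elif z > delta:                  # white sampled, reinforce above delta
            w += rng.uniform(*white_range)
    d = r + w
    return red_draws / n, d / n, r / d
```

With red reinforcements uniform on [8, 12] (mean 10) and white ones uniform on [4, 6] (mean 5), a long run is expected to give a red share near 0.8 and a value of $D_n/n$ near 5, the mean of the inferior treatment.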
Here, we show that the proportion of times the urn proportion $Z_n$ is below the limit $\eta$ converges almost surely to a quantity that depends only on the reinforcement means $m_R$ and $m_W$.
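A heuristic balance argument suggests what this quantity should be. The sketch below is not the proof given in the paper: it assumes $\delta = 0$ and $m_R > m_W$, uses the fact that red balls are sampled with asymptotic frequency $\eta$ and that $D_n/n \to m_W$, and treats the indicator $\mathbf{1}_{\{Z_{i-1}<\eta\}}$ as asymptotically decoupled from the reinforcements. The symbol $p$ is introduced here only for illustration.

```latex
\begin{align*}
\frac{D_n}{n}
  &= \frac{D_0}{n}
   + \frac{1}{n}\sum_{i=1}^{n} X_i M_i \mathbf{1}_{\{Z_{i-1}<\eta\}}
   + \frac{1}{n}\sum_{i=1}^{n} (1-X_i) N_i \\
  &\approx \eta\, m_R\, p + (1-\eta)\, m_W,
  \qquad p := \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{\{Z_i<\eta\}}, \\
m_W &= \eta\, m_R\, p + (1-\eta)\, m_W
  \;\Longrightarrow\; p = \frac{m_W}{m_R}.
\end{align*}
```

This value agrees with the later statement that 0 is the $\frac{m_W}{m_R}$-percentage point of the limiting distribution.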
To prove Proposition 2.3 we need the following lemma
where the almost sure convergence to zero of the last terms can be proved with the same arguments used to prove Proposition 2.1. Moreover, this result implies (2.8) due to the fact that $\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\{Z_i<\eta\}}$ cannot be asymptotically close to zero. This fact can be proved by contradiction: suppose that
We have that on a set of probability one. This contradicts the assumption (2.9).
Remark 2.2. By following the same arguments used to prove Proposition 2.1 and Lemma 2.1 it can be proved also that Proof of the Proposition 2.3. Let us observe that on a set of probability one where the last equality is based on the result of Lemma 2.1. Finally, we note that the equality (2.11) holds if and only if

Asymptotic results
We want to study the asymptotic behavior of the quantity $n \cdot (\eta - Z_n)$. To do this, let us introduce a real stochastic process $(T_n)_{n\in\mathbb{N}}$, whose features depend on the random variables ruling the urn process:
The process $(Z_n, T_n)_{n\in\mathbb{N}}$ is a homogeneous Markov sequence. Then, there exists a transition probability kernel $K$ for the process $T_n$ such that for any
The analytic form of the transition probability kernel is the following
If the probability measures $\mu_R$ and $\mu_W$ are absolutely continuous with respect to the Lebesgue measure, we can write as well
where $f_R(\cdot)$ and $f_W(\cdot)$ are the Radon-Nikodym derivatives of the measures $\mu_R$ and $\mu_W$ with respect to the Lebesgue measure.
Since the marginal process $T_n$ needs to be coupled with the process $Z_n$ to obtain a bivariate Markov process $(T_n, Z_n)$, the application of many results on Markov processes on a continuous state space is not straightforward. Therefore, we define a new auxiliary process $\tilde{T}_n$, strictly related to $T_n$, in this way:
Bernoulli random variables of parameter $\eta$ independent of the sequences $(M_n)_{n\in\mathbb{N}}$ and $(N_n)_{n\in\mathbb{N}}$. It is easy to see that $\tilde{T}_n$ is a Markov process. In fact, the transition kernel $K_\eta$ of $\tilde{T}_n$ is independent of the quantity $z_0$
Using Assumption 2.1 we can prove (see Appendix) that the Markov process $\tilde{T}_n$ is an aperiodic recurrent Harris chain. So, the following holds:
Proposition 3.1. Let us call $\pi$ the stationary distribution of the recurrent aperiodic Harris chain $\tilde{T} = (\tilde{T}_n)_{n\in\mathbb{N}}$. Then, for every $t_0 \in \mathbb{R}$, we have that
Proof. The Markov process $\tilde{T}_n$ is a recurrent aperiodic Harris chain (see Appendix). This result implies that there exists a unique invariant probability distribution $\pi$ and (3.6) holds for any $t_0$ such that
The thesis is proved since (3.7) holds for any $t_0 \in \mathbb{R}$ (see Appendix).
Now, we can state the main result where ψ is a real random variable with probability distribution π.
Proof. Using equation (3.2), Proposition 2.2 and Slutsky's theorem, it is sufficient to prove that $T_n \xrightarrow{\mathcal{L}} \psi$, where $\psi$ is a real random variable with probability distribution $\pi$. Notice that for any interval $C \subset \mathbb{R}$
From Proposition 3.1 we have that the second term converges to zero as $n$ goes to infinity. Then, to prove the thesis we have to study the first term.
Let us take $\alpha, \beta \in \mathbb{R}^+$ such that $\alpha_0 < \alpha < \beta < \beta_0$; then, let us introduce the set
and the probability measure
Then, it is easy to see that there exists a sequence of positive numbers $(\epsilon_{z_n})_{n\in\mathbb{N}}$ such that, if $t_0 \in A$, then $K_{z_n}(t_0, C) \ge \epsilon_{z_n} \rho(C)$ for all $n \in \mathbb{N}$. By following the same procedure adopted in the proof of Proposition A.1, a possible choice for the terms of the sequence is
Since the sequence $Z_n$ is strictly less than one and converges to $\eta$ almost surely, we have that $\epsilon := \inf_{n\in\mathbb{N}} \{\epsilon_{z_n}\} > 0$. Besides, it is trivial to see that $K_\eta(t_0, C) \ge \epsilon \rho(C)$, because P (
Then, let us construct two sequences of stopping times
Naturally, the times $(\tilde{\tau}_i)_{i\in\mathbb{N}}$ are all almost surely finite because the process $\tilde{T}_n$ is a recurrent Harris chain. It is easy to show that the times $(\tau_i)_{i\in\mathbb{N}}$ are also almost surely finite. The procedure to prove the recurrence of the process $T_n$ is analogous to the one used for the process $\tilde{T}_n$.
Let us imagine that when the process (either $T_n$ or $\tilde{T}_n$) is in the set $A$, we flip a Bernoulli coin with parameter $\epsilon$: if it comes up one, the process evolves according to the probability law $\rho(dt)$; otherwise, if it comes up zero, the process moves according to the modified transition kernel
The sequences $\xi_n$ and $\tilde{\xi}_n$ represent the outcomes of the Bernoulli trials when the process is in $A$. Let us denote by $\lambda_{\tau_i}$ and $\tilde{\lambda}_{\tilde{\tau}_i}$ the probability measures of the random variables $T_{\tau_i}$ and $\tilde{T}_{\tilde{\tau}_i}$ respectively, when both processes start from the same initial point $t_0 \in \mathbb{R}$. Hence, we have that
for any $C \in \mathcal{B}(\mathbb{R})$.
By comparing the transition kernels of the processes $T_n$ and $\tilde{T}_n$ we have that
for any $\omega_n \ge \frac{|z_n - \eta|}{\min\{\eta;\, 1-\eta\}}$. Therefore, since $Z_n$ converges to $\eta$ almost surely, there exists a sequence $(\omega_n)_{n\in\mathbb{N}}$, going to zero as $n$ goes to infinity, such that for any $t_0 \in \mathbb{R}$
For any integers $k, n, n_0 \in \mathbb{N}$, any $t_0, s_0 \in \mathbb{R}$ and any set $C \in \mathcal{B}(\mathbb{R})$, we have
Now, let us define the quantities $S$ and $Q$ as follows

By using (i), (ii) and (3.9), we obtain
Therefore, we can prove that, for every $k, n_0 \in \mathbb{N}$,
Let us define the stopping time
where the second term converges to zero if we let $m = m_n$ go to infinity as $n$ increases, since $P(\tilde{T}_n \in C \mid \tilde{T}_0 = t_0)$ is a Cauchy sequence.

Testing hypothesis
In this section we focus on the inferential properties of the MRRU design. Let us introduce the classical hypothesis test aimed at comparing the means of two distributions $\mu_R$, $\mu_W$:
We approach the statistical problem (4.1) by considering first a non-adaptive design, and then the MRRU model. Let $(M_n)_{n\in\mathbb{N}}$ and $(N_n)_{n\in\mathbb{N}}$ be i.i.d. sequences of random variables with distribution $\mu_R$ and $\mu_W$, respectively. For a fixed design with sample sizes $n_R$ and $n_W$, the usual test statistic is
where $\bar{M}_{n_R}$ and $\bar{N}_{n_W}$ are the sample means and $s_R^2$ and $s_W^2$ are consistent estimators of the variances. When the non-adaptive design allows both sample sizes $n_R$ and $n_W$ to go to infinity, by the central limit theorem we have that, under the null hypothesis, $\zeta_0$ converges in distribution to a standard normal variable. Then, fixing a significance level $\alpha \in (0, 1)$, we define
as the critical region of asymptotic level $\alpha$, where $z_\alpha$ is the $\alpha$-percentage point of the standard Gaussian distribution. Now, let us assume that the rate of divergence of the sample sizes is such that $\frac{n_R}{n_R + n_W} \to \eta$, for some $\eta \in (0, 1)$. Then, the power of the test defined in (4.3) can be approximated, for large $n_R$ and $n_W$, as
where $Z$ is a standard Gaussian random variable. Now, let us consider an adaptive design described in terms of an urn model. Let us denote by $N_R(n)$ and $N_W(n)$ the sample sizes after the first $n$ draws, by $\bar{M}(n)$ and $\bar{N}(n)$ the corresponding sample means, and by $s_R^2(n)$ and $s_W^2(n)$ the adaptive consistent estimators. Plugging the corresponding adaptive quantities into (4.2), we obtain the statistic
Using Proposition 3.1 of [4] and Slutsky's Theorem, it can be proved that for the MRRU model, when $m_R = m_W$, the statistic $\zeta_0(n)$ converges to a standard normal variable. Hence, the critical region (4.3) still defines a test of asymptotic level $\alpha$.
Moreover, calling η the limit of the urn proportion Z n under the alternative hypothesis, the power of the test defined in (4.3) can be approximated, for large n, as (4.4).
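The test described above can be illustrated with a short computation. The sketch assumes that the statistic has the usual two-sample form with unpooled variance estimates and that the power follows the standard normal approximation for a one-sided test; the function names `zeta0` and `approx_power` and their arguments are hypothetical, introduced only for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def zeta0(red, white):
    """Two-sample z statistic: difference of sample means over its
    estimated standard error (assumed form of the test statistic)."""
    se = sqrt(variance(red) / len(red) + variance(white) / len(white))
    return (mean(red) - mean(white)) / se

def approx_power(m_r, m_w, var_r, var_w, n, eta, alpha=0.05):
    """Normal approximation of the power of the one-sided test that rejects
    when the statistic exceeds z_alpha, with a share eta of the n subjects
    assigned to R and 1 - eta to W."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)      # upper alpha point
    shift = (m_r - m_w) / sqrt(var_r / (eta * n) + var_w / ((1 - eta) * n))
    return 1 - NormalDist().cdf(z_alpha - shift)
```

Under this approximation the power depends on the design only through the allocation share `eta`: with equal variances, an allocation close to 0 or 1 gives a lower power than a more balanced one at the same total sample size, which is the mechanism behind the comparison between RRU and MRRU designs.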
Remark 4.1. The behavior of the statistics ζ 0 defined in (4.5) in the case of RRU model was studied in [13]. In that paper, the asymptotic normality of ζ 0 (n) under the null hypothesis was proved; then (4.3) defines a test of asymptotic level α also in the RRU case. However, under the alternative hypothesis ζ 0 (n) converges to a mixture of Gaussian distributions, where the mixing variable ϕ 2 is a strictly positive random variable such that Therefore, it follows that in the RRU case the power of the test defined in (4.3) can be approximated, for large n, as where Z is a Gaussian standard random variable independent of ϕ.

Remark 4.2. Let us rewrite the power of the test defined in (4.3) as follows
Let us notice that $\gamma_n$ represents the part of (4.8) that depends on the particular adaptive design that rules the trial. When the RRU design is used, (4.6) allows us to approximate the quantity $\gamma_n$ as
which diverges as $n$ goes to infinity. In the same way, when the MRRU design is applied, we can approximate $\gamma_n$ as
which converges to a constant. Therefore, when both MRRU and RRU designs are applied with the same sample size $n$, and $n$ is large enough, the power of the test (4.3) using the MRRU design is greater than the one obtained using the RRU design.
A different test statistics based on the urn proportion of a RRU model has been investigated in [14,15]. Let us denote as c defines a test asymptotically of level α. As explained in [15], the power of this test can be approximated, for large n, as where ϕ 2 is the random quantity defined in (4.6). Now, we consider the statistics Z n as the urn proportion of a MRRU model, with parameters δ and η. Let us denote as c test {Z n > c (δ,η) α } can be approximated, for large n, as where ψ is the random quantity defined in Theorem 3.1.

Simulation study
This section presents the simulation studies aimed at exploring the asymptotic behavior of the urn proportion $Z_n$. In this section, all the urns are simulated with the following parameters: $\delta = 0.2$ and $\eta = 0.8$. Further studies based on changing the values of $\delta$ or $\eta$ can be of great interest, but this is not the main purpose of the paper. First, we focus on supporting the convergence result proved in Theorem 3.1. The reinforcement distributions $\mu_R$ and $\mu_W$ are chosen to be Gaussian, with means set to $m_R = 10$ and $m_W = 5$ respectively. The variances are assumed to be equal and fixed at $\sigma_R^2 = \sigma_W^2 = 1$. Theorem 3.1 shows that, when $m_R > m_W$, the quantity $n(\eta - Z_n) m_W$ converges in distribution to a random variable $\psi$, whose probability law is $\pi$. Through some simulations, we compute the empirical distribution of $n(\eta - Z_n) m_W$ for $n = 10^2$ and $n = 10^4$. The corresponding histograms are presented in Figure 1.
In Proposition 3.1 it was proved that the probability measure $\pi$ is the unique invariant distribution of the process $(\tilde{T}_n)_{n\in\mathbb{N}}$. This means that $\pi$ is the unique solution of the functional equation
where $K_\eta$ is the transition kernel of the process $\tilde{T}_n$ defined in (3.5). Taking the discrete version of (5.1) we compute the density of the measure $\pi$, which is superimposed on both histograms in Figure 1. The close agreement between the empirical distribution of $n(\eta - Z_n) m_W$ and the discrete estimate of $\pi$ motivated the authors to prove the convergence result described in Theorem 3.1.
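As an alternative to discretizing the functional equation, the invariant law can be explored by simulating the auxiliary chain directly. The recursion used below is an assumption reconstructed from the urn dynamics with $\delta = 0$, taking $T_n = D_n(\eta - Z_n)$ and replacing the Bernoulli($Z_n$) draws with Bernoulli($\eta$) draws: a red step subtracts $(1-\eta)$ times a red reinforcement when the state is positive, and a white step adds $\eta$ times a white reinforcement. The function name `simulate_aux_chain` and the uniform reinforcement laws are hypothetical.

```python
import random

def simulate_aux_chain(steps, eta, red_range, white_range, t0=0.0, seed=3):
    """Simulate the auxiliary chain (assumed recursion, see lead-in) and
    return the list of visited states."""
    rng = random.Random(seed)
    t = t0
    path = []
    for _ in range(steps):
        if rng.random() <= eta:          # red step, Bernoulli(eta)
            if t > 0:                    # red is reinforced only when t > 0
                t -= (1 - eta) * rng.uniform(*red_range)
        else:                            # white step
            t += eta * rng.uniform(*white_range)
        path.append(t)
    return path
```

Under this recursion, a zero-drift argument at stationarity gives a long-run share of positive states equal to the ratio between the white and red means, so a run with means 5 and 10 should spend about half of the time above zero.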
The simulation study then encouraged the authors to prove some further theoretical results. The first one we present provides a simple expression for a quantile of the probability law of $\psi$. In general, the asymptotic distribution of the quantity $n(\eta - Z_n)$ depends on the value of $\eta$ and on the reinforcement distributions $\mu_R$ and $\mu_W$. Nevertheless, the following proposition states that 0 is always the $\frac{m_W}{m_R}$-percentage point of the distribution $\pi$, regardless of $\eta$ or the types of distributions involved.
Proof. Since $P(Z_n < \eta) = P(T_n > 0)$ we know that $P(Z_n < \eta)$ is a convergent sequence. In particular
Therefore, by using the dominated convergence theorem, the Toeplitz Lemma and Proposition 2.3, we obtain
Another interesting result that came out of the simulation analysis concerns the correspondence between the asymptotic distribution of $Z_n$ and a linear transformation of the reinforcement laws. This property is explained in the following proposition.
Proposition 5.2. Let $Z_n$ and $\hat{Z}_n$ be the urn proportions of two MRRU models with reinforcement distributions $(\mu_R, \mu_W)$ and $(\hat{\mu}_R, \hat{\mu}_W)$ respectively. Assume that there exists $c > 0$ such that, for any $a, b \in \mathbb{R}$ with $a < b$
i.e. $\hat{M}_n \overset{\mathcal{L}}{=} c \cdot M_n$ and $\hat{N}_n \overset{\mathcal{L}}{=} c \cdot N_n$ for any $n \in \mathbb{N}$. Then, for any $a, b \in \mathbb{R}$ with $a < b$, we have $\pi((a, b)) = \hat{\pi}((c \cdot a, c \cdot b))$ (5.4), i.e. $\hat{\psi} \overset{\mathcal{L}}{=} c \cdot \psi$.

Statistical properties of a MRRU design 725
Proof. Let us denote the initial compositions of the two urn processes by $(r_0, w_0)$ and $(\hat{r}_0, \hat{w}_0)$. The proof is based on the particular choice $\hat{r}_0 = c \cdot r_0$ and $\hat{w}_0 = c \cdot w_0$. However, since by Proposition 3.1 the invariant distribution $\pi$ is independent of the initial composition, the generality of the result still holds. For any $n \ge 1$, by conditioning on the event $\{(\hat{T}_n, \hat{Z}_n) = (c \cdot T_n, Z_n)\}$, we have that
For ease of notation, let us denote by $\lambda_{(T_n, Z_n)}$ and $\lambda_{(\hat{T}_n, \hat{Z}_n)}$ the bivariate laws of the pairs of random variables $(T_n, Z_n)$ and $(\hat{T}_n, \hat{Z}_n)$ respectively. Then, let us notice that the equivalence of the initial compositions of the two processes $Z_n$ and $\hat{Z}_n$ implies that the event $\{(\hat{T}_0, \hat{Z}_0) = (c \cdot T_0, Z_0)\}$ has probability one. Hence, for any $n \ge 1$, we have
The thesis is proved since the equivalence $\lambda_{(\hat{T}_n, \hat{Z}_n)} = \lambda_{(c \cdot T_n, Z_n)}$ implies (5.4).
The assumption (5.3) also implies that $\hat{m}_R = c \cdot m_R$ and $\hat{m}_W = c \cdot m_W$. Then, from Theorem 3.1 we deduce the equivalence between the asymptotic laws of $Z_n$ and $\hat{Z}_n$. Propositions 5.1 and 5.2 suggest that urn processes with the same ratio of reinforcement means present a similar asymptotic behavior. For this reason, we prefer to use the ratio $\frac{m_R}{m_W}$ as the parameter measuring the distance between the means, instead of the usual mean difference $m_R - m_W$.
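The scaling property can be illustrated pathwise: if the initial composition and all reinforcements of an urn are multiplied by the same constant and the same random stream is used, the urn proportions do not change. The sketch below is only an illustration; the function name `mrru_proportions`, the uniform reinforcement laws and the seed are assumptions made for this example.

```python
import random

def mrru_proportions(n, eta, delta, scale, r0, w0, seed=4):
    """Urn proportions Z_0..Z_n of an MRRU whose reinforcements are
    scale * Uniform(8, 12) for red and scale * Uniform(4, 6) for white
    (hypothetical laws chosen for this example)."""
    rng = random.Random(seed)
    r, w = r0, w0
    props = [r / (r + w)]
    for _ in range(n):
        z = r / (r + w)
        u = rng.random()
        m = scale * rng.uniform(8, 12)   # both reinforcements are drawn at
        k = scale * rng.uniform(4, 6)    # every step, so the random stream
        if u <= z:                       # is consumed identically in each urn
            if z < eta:
                r += m
        elif z > delta:
            w += k
        props.append(r / (r + w))
    return props
```

Comparing `mrru_proportions(2000, 0.8, 0.2, 1.0, 1.0, 1.0)` with `mrru_proportions(2000, 0.8, 0.2, 2.0, 2.0, 2.0)` should give the same sequence of proportions, which is why only the ratio of the means matters for the asymptotic allocation.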
Finally, we present some simulations concerning the hypothesis test (4.1). In particular, we focus on comparing the power of the tests defined in (4.9) and (4.11). The empirical power is computed using $n = 10^4$ subjects, for different values of the ratio $\frac{m_R}{m_W}$. The empirical power functions are reported in Figure 2. As shown in Figure 2, the MRRU design yields a test more powerful than the one based on the RRU design with the same sample size, for any choice of the reinforcement means. Although this property makes the MRRU design very attractive, the RRU model has the advantage that, with the same sample size, it allocates fewer subjects to the inferior treatment. Hence, it is of real interest to study the power functions of the tests (4.9) and (4.11) for different values of $N_W$, i.e. the number of subjects assigned to the inferior treatment. We compute the empirical power functions for $N_W = 20, 50, 100, 500$ and we report the plots in Figure 3.
By inspection of Figure 3 we can conclude that, for high values of $\frac{m_R}{m_W}$, the powers of the tests (4.9) and (4.11) are very similar. When the ratio $\frac{m_R}{m_W}$ is small, the power of the test based on the MRRU design seems to be considerably greater, for any value of $N_W$.

A case study
In this section we present a case study that aims at comparing two different treatments. In particular, we conduct the analysis following the subject allocation strategy of both an RRU model and an MRRU model. Our data consist of treatment times of patients affected by ST-Elevation Myocardial Infarction. The main rescue procedure for these patients is Primary Angioplasty. It is well known that, to improve the outcome of patients and reduce in-hospital mortality, the time between the arrival at the ER (called Door) and the time of intervention (called Balloon) must be reduced as much as possible. So our treatment response is represented by the Door to Balloon time (DB). We distinguish two treatments: the patients managed by the 118 (toll-free number for emergencies in Italy) and the self-presenting ones. We design our experiment to allocate the majority of patients to the better-performing treatment, and simultaneously to collect evidence for comparing the means of the DB time distributions. The dataset gathers data concerning 1179 patients. Among them, 657 subjects were managed by the 118, while the other 522 subjects reached the hospital by themselves. We identify treatment $W$ with the choice of calling 118 and treatment $R$ with the choice of going to the hospital by themselves. Treatment responses are represented by DB times (in minutes). Since the lower the response (DB time), the better the treatment, without loss of generality we transform the responses through a monotonic decreasing function. The means of treatments $R$ and $W$ have been estimated using all data, obtaining $m_R = 1.503$, $m_W = 1.996$. The true difference of the means $\Delta = m_R - m_W = -0.493$ is negative, so $W$ is the best treatment in this case.
We consider the following one-sided hypothesis test
The statistic $\zeta_0$, defined in (4.2), has been used to construct the critical region (6.1): $R_\alpha = \{\zeta_0 < -z_\alpha\}$, where $z_\alpha$ is the $1 - \alpha$ quantile of the standard normal distribution (level $\alpha$ set to 0.05). For both urn designs (RRU and MRRU), and for different values of the sample size $n$, we performed 5000 simulation runs of the urn procedure to compute the empirical power of the test. Each replication uses a subset of responses selected by permutation from the whole dataset. The results are depicted in Figure 4. Notice that the MRRU design requires a smaller sample size than the RRU design to achieve any given power.

Conclusions
In the present work, we have completed the study of the asymptotic statistical properties of the MRRU design, a response-adaptive design, expressed in terms of a randomly reinforced urn model, that is able to asymptotically target any prespecified allocation. This urn design overcomes the difficulties faced by the RRU design, whose asymptotic allocation degenerates to the singular values 0 or 1. Nevertheless, we are able to obtain also in this case the rate of convergence of the urn proportion to its limit. In this way we can construct suitable asymptotic hypothesis tests on the difference between treatments and compare the performance of this design with that of the RRU design in terms of statistical efficiency. There are many interesting open problems whose solution could help research on optimal randomized adaptive designs; in particular, further studies based on changing the values of the parameters $\delta$ and $\eta$ can help to explore the possibilities offered by the MRRU design. As ongoing work, we are studying the asymptotic properties of the urn process when $\delta$ and $\eta$ are defined as time-dependent functions of some unknown parameters modeling the reinforcement distributions, and their adaptive estimators are used to update the estimates of $\delta$ and $\eta$ adopted in the urn procedure. Although the formal study of the extension of this MRRU model to a multi-treatment setting is outside the scope of this work, the main results of this paper may be extended to the case of an urn composed of an arbitrary number of colors. When there is a unique superior treatment, the asymptotic behavior of the dominant color may be studied by considering a two-color urn design in which the reinforcement distribution of the inferior color is modeled as a mixture of the distributions of all the inferior treatments. In this case the extension is straightforward.

Appendix
In the following we assume, without loss of generality, that condition (a) of Assumption 2.1 is satisfied; the symmetric case (b) can be treated in the same way.
Proof. Let us take $\alpha, \beta \in \mathbb{R}^+$ such that $\alpha_0 < \alpha < \beta < \beta_0$. First, notice that if $t \in (t_0 + \alpha\eta,\ t_0 + \beta\eta)$, then since $\frac{t - t_0}{\eta} \in (\alpha, \beta)$. For the same reason, for any $k \in \mathbb{N}$, we have that if $t \in (t_0 + k\alpha\eta,\ t_0 + k\beta\eta)$, then Let us introduce the sequence of sets $(A_k)_k$ such that for $k \geq 1$. Then, for any $n \in \mathbb{N}$, we have that if where we choose Therefore, a sufficient condition for P ( so the thesis holds for any $t \geq t_0 + \left[\frac{\beta}{\beta - \alpha}\right]\alpha\eta$.

Proposition A.1. The Markov process $\tilde{T} = (\tilde{T}_n)_{n \in \mathbb{N}}$ on the state space $\mathbb{R}$ is a Harris chain.
Proof. Let us start by recalling that the Markov process $\tilde{T}_n$ on the state space $\mathbb{R}$ is a Harris chain if there exist $A, B \subset \mathbb{R}$, a constant $\epsilon > 0$ and a probability measure $\rho$ with $\rho(B) = 1$, such that
(a) if $\tau_A := \inf\{n \geq 0 : \tilde{T}_n \in A\}$, then $P(\tau_A < \infty \mid \tilde{T}_0 = t_0) > 0$ for any $t_0 \in \mathbb{R}$;
(b) if $t_0 \in A$ and $C \subset B$, then $K_\eta(t_0, C) \geq \epsilon\,\rho(C)$.
• Second case: We fix $t \geq t_0 + \left[\frac{\beta}{\beta - \alpha}\right]\alpha\eta$ and we define $n \in \mathbb{N}$, $I \subset \mathbb{R}$ as follows Fixing $t \in I$, we have from the previous lemma that for every $\zeta > 0$ since $t \geq n(1 - \eta)x_0 \geq t$. Then, let us fix $\zeta$ small enough that $t + \zeta \in I$. Let n := inf n ≥ 1 : We can write We have already proved that the second term of this product is strictly positive, so we focus on the first term. Let us call
• Third case: $t_0 < 0$. We fix $t \geq \max\left\{t_0 + \left[\frac{\beta}{\beta - \alpha}\right]\alpha\eta;\ 0\right\}$ and then we follow the same strategy used in the second case ($t_0 > (\beta - \alpha)\eta$).
Let us now prove condition (b). Let and the probability measure for any set $C \subset B$. For every $t_0 \in A$,

A. Ghiglietti and A. M. Paganoni
In what follows, for any interval $I \subset \mathbb{R}$, we will refer to $(\tau^I_i)_i$ as the sequence of stopping times For ease of notation, we will write $\tau^I$ for $\tau^I_1$.

Proposition A.2. The Harris chain $\tilde{T} = (\tilde{T}_n)_{n \in \mathbb{N}}$ on the state space $\mathbb{R}$ is recurrent.
Proof. Let us recall that $\tilde{T}_n$ is recurrent if $P(\tau_A < \infty \mid \tilde{T}_0 \in A) = 1$ for any initial probability distribution $\tilde{\lambda}_0$, where $\tau_A := \inf\{n \geq 1 : \tilde{T}_n \in A\}$. In fact, we are able to prove a stronger property, namely $P(\tau_A < \infty \mid \tilde{T}_0 = t_0) = 1$ for any $t_0 \in \mathbb{R}$, which implies the condition we need. Let
• $I$ be the closed interval defined as
• $c$ be the constant defined as $c := \min_{t \in I} P(\tau_A < \infty \mid \tilde{T}_0 = t)$; $c$ is strictly positive because the process $\tilde{T}_n$ is a Harris chain and so $P(\tau_A < \infty \mid \tilde{T}_0 = t_0) > 0$ for all $t_0 \in \mathbb{R}$,
• $\tilde{n}$ be the integer defined as $\tilde{n} := \inf\{n \geq 1 : \min$
Now, we focus on proving that the stopping times $(\tau^I_i)_i$ are almost surely finite:
(a) First case: $t_0 \in (0, \infty)$. Looking at the transition kernels (3.3) and (3.5) of the processes $T_n$ and $\tilde{T}_n$ respectively, we note that for any $t_0 \in (0, \infty)$, $P(\tilde{T}_1 \leq T_1 \mid \tilde{T}_0 = T_0 = t_0) = 1$. This implies that Then, we have that where the passage from $\tilde{T}_n$ to $T_n$ is due to relation (A.3) and the last probability is equal to zero because $P(T_n < 0 \text{ i.o.} \mid T_0 = t_0) = P(Z_n > \eta \text{ i.o.} \mid T_0 = t_0) = 1$ for any $t_0 \in \mathbb{R}$.
since from case (a) we have that, for all $t_0 > 0$, $P(\tau^I = \infty \mid \tilde{T}_0 = t_0) = 0$. Therefore, we conclude that $P\left(\bigcap_{i=1}^{\infty} \{\tau^I_i < \infty\} \mid \tilde{T}_0 = t_0\right) = 1$, which means that $(\tau^I_i)_i$ is a sequence of almost surely finite stopping times.
Therefore, for any $t_0 \in \mathbb{R}$ we have that and so the thesis is proved.
Proposition A.3. The recurrent Harris chain $\tilde{T} = (\tilde{T}_n)_{n \in \mathbb{N}}$ on the state space $\mathbb{R}$ is aperiodic.
Proof. The recurrent Harris chain $\tilde{T}_n$ is aperiodic if there exists $n_0 \in \mathbb{N}$ such that $P(\tilde{T}_n \in A \mid \tilde{T}_0 \in A) > 0$ for any integer $n \geq n_0$ and for any distribution law $\tilde{\lambda}_0$ on $\tilde{T}_0$. Let us define the stopping time τ This stopping time is almost surely finite. In fact, since $P(\tau_{(-\infty,0)} < \infty \mid \tilde{T}_0 = t_0) = 1$ for any $t_0 \in \mathbb{R}$, we have that Hence, there exists $n_0 \in \mathbb{N}$ such that $P(\tau_{A^-} = n_0 \mid \tilde{T}_0 \in A) > 0$. We notice also that Then, for every $n \geq n_0$, we have $P(\tilde{T}_n \in A \mid \tilde{T}_0 \in A) \geq P(\tau_{A^-} = n \mid \tilde{T}_0 \in A) \geq \eta^{n - n_0} \cdot P(\tau_{A^-} = n_0 \mid \tilde{T}_0 \in A) > 0$, and so the thesis is proved.