Refined Asymptotics for the Composition of Cyclic Urns

A cyclic urn is an urn model for balls of types $0,\ldots,m-1$. The urn starts at time zero with an initial configuration. Then, in each time step, first a ball is drawn from the urn uniformly and independently from the past. If its type is $j$, it is then returned to the urn together with a new ball of type $j+1 \mod m$. The case $m=2$ is the well-known Friedman urn. The composition vector, i.e., the vector of the numbers of balls of each type after $n$ steps is, after normalization, known to be asymptotically normal for $2\le m\le 6$. For $m\ge 7$ the normalized composition vector is known not to converge. However, there is an almost sure approximation by a periodic random vector. In the present paper the asymptotic fluctuations around this periodic random vector are identified. We show that these fluctuations are asymptotically normal for all $7\le m\le 12$. For $m\ge 13$ we also find asymptotically normal fluctuations when normalizing in a more refined way. These fluctuations are of maximal dimension $m-1$ only when $6$ does not divide $m$. For $m$ being a multiple of $6$ the fluctuations are supported by a two-dimensional subspace.


Introduction and result
A cyclic urn is an urn model with a fixed number $m \ge 2$ of possible colors of balls, which we call types $0, \ldots, m-1$. We assume that initially there is one ball of an arbitrary type. In each step we draw a ball from the urn, uniformly at random from among the balls in the urn and independently of the history of the urn process. If its type is $j \in \{0, \ldots, m-1\}$, it is placed back into the urn together with a new ball of type $j+1 \bmod m$. These steps are iterated.
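The dynamics just described are straightforward to simulate. The following sketch (function and variable names are our own, not from the paper) runs $n$ steps and returns the composition vector:

```python
import random

def cyclic_urn(m, n, seed=None):
    """Simulate n steps of a cyclic urn with m types, started with a
    single ball of type 0; returns the vector of counts per type."""
    rng = random.Random(seed)
    balls = [0]  # list of the types of all balls currently in the urn
    for _ in range(n):
        j = rng.choice(balls)      # draw a ball uniformly at random
        balls.append((j + 1) % m)  # return it and add a ball of type j+1 mod m
    counts = [0] * m
    for b in balls:
        counts[b] += 1
    return counts

# after n steps the urn always holds n + 1 balls
counts = cyclic_urn(m=3, n=100, seed=1)
assert sum(counts) == 101
```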
We denote by $R_n = (R_{n,0}, \ldots, R_{n,m-1})^t$ the (column) vector of the numbers of balls of each type after $n$ steps when starting with one ball of type 0. Hence, we have $R_0 = e_0$, where $e_j$ denotes the $j$-th unit vector in $\mathbb{R}^m$, indexing the unit vectors by $0, \ldots, m-1$. For fixed $m \ge 2$ we denote the primitive $m$-th root of unity by $\omega := \exp(\frac{2\pi i}{m})$. Furthermore, we set
$$\lambda_k := \Re(\omega^k) = \cos\frac{2\pi k}{m}, \qquad \mu_k := \Im(\omega^k) = \sin\frac{2\pi k}{m}, \qquad v_k := \frac{1}{m}\bigl(1, \overline{\omega}^{\,k}, \overline{\omega}^{\,2k}, \ldots, \overline{\omega}^{\,(m-1)k}\bigr)^t.$$
Note that $v_0 = \frac{1}{m}\mathbf{1} := \frac{1}{m}(1, 1, \ldots, 1)^t \in \mathbb{R}^m$. The asymptotic distributional behavior of the sequence $(R_n)_{n \ge 0}$ has been identified in Janson [9,10,11], see also Pouyanne [18,19] and, for the case $m = 2$, Freedman [7]. Janson developed a limit theory for the compositions of rather general urn schemes. For the cyclic urns he showed that the normalized composition vector $R_n$ converges in distribution towards a multivariate normal distribution for $2 \le m \le 6$, whereas for $m \ge 7$ there is no convergence of a conventionally standardized version of the $R_n$ due to subtle periodicities. Further, for $m \ge 7$, there exists a complex valued random variable $\Xi_1$ (depending on $m$) such that almost surely, as $n \to \infty$, we have
$$n^{-\lambda_1}\Bigl(R_n - \frac{n+1}{m}\mathbf{1}\Bigr) - 2\Re\bigl(n^{i\mu_1}\,\Xi_1 v_1\bigr) \longrightarrow 0. \tag{2}$$
We focus mainly on the periodic case $m \ge 7$. In the present paper we study the fluctuations of $n^{-\lambda_1}(R_n - \frac{n+1}{m}\mathbf{1})$ around the periodic sequence $(2\Re(n^{i\mu_1}\Xi_1 v_1))_{n \ge 0}$. We call the differences in (2) residuals.
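The dichotomy at $m = 7$ comes from the real parts $\lambda_k = \cos(2\pi k/m)$: the conventional normalization works exactly as long as $\lambda_1 \le \frac12$, i.e., $m \le 6$, and for $6 \mid m$ the pair $k = m/6, 5m/6$ sits exactly on the threshold $\frac12$. A quick numerical check of these threshold facts (our own sketch, not from the paper):

```python
import math

def lambda_k(m, k):
    """Real part of the eigenvalue omega^k = exp(2*pi*i*k/m)."""
    return math.cos(2 * math.pi * k / m)

# lambda_1 <= 1/2 for m <= 6 (normal limit), lambda_1 > 1/2 for m >= 7
for m in range(2, 7):
    assert lambda_k(m, 1) <= 0.5 + 1e-12
for m in range(7, 20):
    assert lambda_k(m, 1) > 0.5

# for 6 | m, the pair k = m/6, 5m/6 lies exactly on the threshold 1/2
for m in (6, 12, 18):
    assert abs(lambda_k(m, m // 6) - 0.5) < 1e-9
    assert abs(lambda_k(m, 5 * m // 6) - 0.5) < 1e-9
```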
To formulate our results we fix some notation. For $v \in \mathbb{C}^m$ we write $v^*$ for the conjugate transpose of $v$, and for $z \in \mathbb{C}$, $\bar z$ denotes the complex conjugate of $z$. Furthermore, $6 \mid m$ and $6 \nmid m$ are short for $6$ divides (resp. does not divide) $m$.
We distinguish the cases $6 \mid m$ and $6 \nmid m$ as follows.
Theorem 1.1. Let $m \ge 2$ with $6 \nmid m$ and set $r := \lfloor (m-1)/6 \rfloor$. Then, there exist complex valued random variables $\Xi_1, \ldots, \Xi_r$ such that, as $n \to \infty$, we have
$$\frac{1}{\sqrt{n}}\biggl(R_n - \frac{n+1}{m}\mathbf{1} - \sum_{k=1}^{r} 2\Re\bigl(n^{\omega^k}\,\Xi_k v_k\bigr)\biggr) \xrightarrow{d} \mathcal{N}\bigl(0, \Sigma^{(m)}\bigr).$$
The covariance matrix $\Sigma^{(m)}$ has rank $m-1$ and is given by
$$\Sigma^{(m)} = \frac{1}{m}\Bigl(\mathrm{Id}_m - \frac{1}{m}\mathbf{1}\mathbf{1}^t\Bigr).$$
When $6 \mid m$, the normalization requires an additional $\sqrt{\log n}$ factor and the rank of the covariance matrix is reduced to 2:
Theorem 1.2. Let $m \ge 2$ with $6 \mid m$ and set $r := \lfloor (m-1)/6 \rfloor$. Then, there exist complex valued random variables $\Xi_1, \ldots, \Xi_r$ such that, as $n \to \infty$, we have
$$\frac{1}{\sqrt{n \log n}}\biggl(R_n - \frac{n+1}{m}\mathbf{1} - \sum_{k=1}^{r} 2\Re\bigl(n^{\omega^k}\,\Xi_k v_k\bigr)\biggr) \xrightarrow{d} \mathcal{N}\bigl(0, \Sigma^{(m)}\bigr).$$
The covariance matrix $\Sigma^{(m)}$ has rank 2 and is given by
$$\Sigma^{(m)} = v_{m/6}\, v_{m/6}^* + v_{5m/6}\, v_{5m/6}^*.$$
Note that the sum $\sum_{k=1}^r$ in Theorem 1.1 is empty for $2 \le m \le 5$, as is the sum in Theorem 1.2 for $m = 6$. Hence, for $2 \le m \le 6$ our theorems reduce to the central limit laws of Janson [9,10,11]. For $m \in \{7, 8, 9, 10, 11\}$ Theorem 1.1 shows that there is a direct normalization of the residuals which implies a multivariate central limit law (CLT). The case $m = 12$ also admits a multivariate CLT under a different scaling, see Theorem 1.2. For $m > 12$ the residuals cannot directly be normalized to obtain convergence. However, Theorems 1.1 and 1.2 describe refined residuals which satisfy a multivariate CLT for all $m > 12$. These can be considered as asymptotic expansions of the random variables $R_n$.
The convergences in Theorems 1.1 and 1.2 also hold for all moments. For an expansion of E[R n ] see (5).
We expect Theorems 1.1 and 1.2 to be prototypical for a phenomenon occurring frequently in related random combinatorial structures. E.g., we expect similar behavior for other urn models with analogous almost sure random periodic behavior, see Janson [10, Theorem 3.24], further for the size of random m-ary search trees, cf. [4], or for the number of leaves in random d-dimensional (point) quadtrees [3]. (For the latter two instances only the case of Theorem 1.1 is expected to occur.) The remainder of the present paper contains a proof of Theorems 1.1 and 1.2. An outline of the proof is given in Section 2, where the occurrence of the contributions $n^{\omega^k}\Xi_k v_k$ in Theorems 1.1 and 1.2 is also explained. Roughly, our proof combines a spectral decomposition of the residuals and estimates of their mixed moments with a recursive decomposition of the urn process and stochastic fixed-point arguments. In work in progress by the first author of the present paper, an alternative route via martingales is also being explored. Within the details of the proofs of the present paper we make mild use of martingales. However, we could also carry out the whole proof without resorting to any martingale, which may provide a useful general technique for related applications where no martingales are available.
The results of this paper were announced in the extended abstract [15].

Explanation of the result and outline of the proof
In this section we explain our approach to the proof of Theorems 1.1 and 1.2 and the occurrence of the summands $n^{\omega^k}\Xi_k v_k$ and the normal fluctuations in these theorems. We first recall known asymptotic behavior and a spectral decomposition of $R_n$ which are used subsequently. Then we state a more refined result on certain projections of residuals in Proposition 2.1 which directly implies Theorems 1.1 and 1.2. Finally, an outline of the proof of Proposition 2.1 is given. Technical steps and estimates are then carried out in Section 3. Throughout, we fix $m \ge 2$.
For the cyclic urn with $m$ colors we consider an initial configuration of one ball of type 0 and write $R_n$ for the composition vector after $n$ steps. Its dynamics are summarized in the $m \times m$ replacement matrix
$$A := \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix}, \tag{3}$$
where $A_{ij}$ indicates that after drawing a ball of type $i$ it is placed back together with $A_{ij}$ balls of type $j$, for all $0 \le i, j \le m-1$. The canonical filtration is given by the $\sigma$-fields $\mathcal{F}_n = \sigma(R_0, \ldots, R_n)$ for $n \ge 0$. The dynamics of the urn process imply the well-known almost sure relation
$$\mathbb{E}[R_{n+1} \mid \mathcal{F}_n] = \Bigl(\mathrm{Id}_m + \frac{1}{n+1} A^t\Bigr) R_n, \qquad n \ge 0. \tag{4}$$
Here, $\mathrm{Id}_m$ denotes the $m \times m$ identity matrix and $A^t$ the transpose of $A$.
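Since $A^t$ acts as a cyclic shift, its eigenvalues are exactly the $m$-th roots of unity $\omega^k$. Under the normalization of the eigenvectors used here (an assumption of this sketch), $v_k[j] = \overline{\omega}^{\,jk}/m$, one can verify $A^t v_k = \omega^k v_k$ numerically:

```python
import cmath

def shift_matrix(m):
    """A^t for the cyclic replacement matrix A: (A^t)_{j,i} = 1 iff j = i+1 mod m."""
    return [[1 if j == (i + 1) % m else 0 for i in range(m)] for j in range(m)]

def eigvec(m, k):
    """Candidate eigenvector v_k (our normalization, an assumption):
    v_k[j] = conj(omega)^(j*k) / m with omega = exp(2*pi*i/m)."""
    w = cmath.exp(2j * cmath.pi * k / m)
    return [w.conjugate() ** j / m for j in range(m)]

m, k = 7, 2
At, v = shift_matrix(m), eigvec(m, k)
Av = [sum(At[j][i] * v[i] for i in range(m)) for j in range(m)]
w_k = cmath.exp(2j * cmath.pi * k / m)
# check the eigenvalue equation A^t v_k = omega^k v_k
assert all(abs(Av[j] - w_k * v[j]) < 1e-12 for j in range(m))
```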
Note that $v_0$ has the direction of the drift vector $\mathbf{1}$ in Theorems 1.1 and 1.2 and, for $m \ge 7$, the vector $v_1$ determines the direction of the a.s. periodic fluctuations around the drift there. By diagonalizing the matrices on the right hand side of (4) one finds an exact asymptotic expression for the mean of the $R_n$, cf. [12, Lemma 6.7]. These expressions imply the expansion
$$\mathbb{E}[R_n] = \frac{n+1}{m}\mathbf{1} + \sum_{k=1}^{m-1} \frac{\Gamma(n+1+\omega^k)}{\Gamma(n+1)\,\Gamma(1+\omega^k)}\, v_k. \tag{5}$$
It is also known that the variances and covariances of $R_n$ are of the order $n^{2\lambda_1}$ when $m \ge 7$, with appropriate periodic prefactors. This explains the normalization $n^{-\lambda_1}(R_n - \frac{n+1}{m}\mathbf{1})$ in (2). The analysis of the asymptotic distribution as stated in (2) has been done by different techniques (partly only in a weak sense): by embedding into continuous time multitype branching processes, by (more direct) use of martingale arguments, and by stochastic fixed-point arguments, see [10,18,12].
For our further analysis we use a spectral decomposition of the process (R n ) n≥0 . This also leads to an explanation of the terms and fluctuations appearing in Theorems 1.1 and 1.2, see the comments after the proof of Theorem 1.2 in the present section.
We denote by $\pi_k$ the projection onto the eigenspace in $\mathbb{C}^m$ spanned by $v_k$, for $0 \le k \le m-1$. Hence, we have
$$\pi_k(w) = u_k(w)\, v_k, \qquad w \in \mathbb{C}^m,$$
where $u_0, \ldots, u_{m-1}$ denotes the basis dual to $v_0, \ldots, v_{m-1}$, as $A$ is diagonalizable. We have deterministically $\pi_0(R_n) = \frac{n+1}{m}\mathbf{1}$. For the other projections $\pi_k(R_n)$ one has similar periodic behavior as for the composition vector $R_n$, cf. (2), as long as we have $\lambda_k > \frac12$. We call the projections $\pi_k(R_n)$ large if $\lambda_k > \frac12$, since their magnitudes have orders larger than $\sqrt n$. Projections $\pi_k$ with $\lambda_k \le \frac12$ we call small. For the large projections, i.e., for all $k \ge 1$ with $\lambda_k > \frac12$, we set
$$X_{n,k} := n^{-1/2}\Bigl(u_k\bigl(R_n - \mathbb{E}[R_n]\bigr) - \frac{\Gamma(n+1+\omega^k)}{\Gamma(n+1)\,\Gamma(1+\omega^k)}\,\Xi_k\Bigr), \tag{6}$$
with an appropriate complex valued random variable $\Xi_k$, defined as a martingale limit in (11), Section 3.1. The behaviour of the small projections $\pi_k(R_n)$ has already been determined, see [10,14]. For those $k$ with $\lambda_k < \frac12$ we have, as $n \to \infty$,
$$X_{n,k} := n^{-1/2}\, u_k\bigl(R_n - \mathbb{E}[R_n]\bigr) \xrightarrow{d} \mathcal{N}_k, \tag{7}$$
with a centered complex normal random variable $\mathcal{N}_k$. If $m$ is even, then $X_{n,m/2} := n^{-1/2}\, u_{m/2}(R_n)$; here no centering is needed since $\mathbb{E}[u_{m/2}(R_n)] = 0$ for $n \ge 1$ by (4). Finally, if $6 \mid m$, then there is the pair $\lambda_{m/6} = \lambda_{5m/6} = \frac12$. In this case the scaling requires an additional $\sqrt{\log n}$ factor: for $k \in \{m/6, 5m/6\}$ we set
$$X_{n,k} := (n \log n)^{-1/2}\, u_k\bigl(R_n - \mathbb{E}[R_n]\bigr).$$
We prove the convergence of the variances and covariances of all $X_{n,k}$ in Section 3.1. Set $X_{n,0} := u_0(R_n - \mathbb{E}[R_n]) = 0$. Now, the $X_{n,k}$ are defined for all $0 \le k \le m-1$ and describe the normalized fluctuations of all the projections. For the small projections we already know that they are asymptotically normally distributed, see (7). As a main contribution of the present paper we show that the residuals of the large projections as normalized in (6) are also asymptotically normal. For each pair of complex conjugate eigenvalues it is sufficient to study only one of the two eigenspaces. Then, all these fluctuations are also asymptotically independent; this is the content of Proposition 2.1. For the case $6 \nmid m$, Proposition 2.1 holds as well: we just have no $k$ with $\lambda_k = \frac12$. Thus the matrix corresponding to $M_m$ for the case $6 \nmid m$ does not have the block $\frac12 \mathrm{Id}_2$ and, if $m$ is odd, the last block $\frac13$ is also missing.
Proposition 2.1 (and its version for $6 \nmid m$) directly implies Theorems 1.1 and 1.2: the convergence in Theorem 1.1 follows from Proposition 2.1 and the continuous mapping theorem, where $\Sigma^{(m)}$ is as in the statement of Theorem 1.1. The matrix $m\Sigma^{(m)}$ can be interpreted as the orthogonal projection onto $\mathrm{span}\{v_1, \ldots, v_{m-1}\}$; hence its rank is $m-1$.
Rearranging terms as in the proof of Theorem 1.1, we obtain Theorem 1.2; here the matrix $m\Sigma^{(m)}$ can be interpreted as the orthogonal projection onto $\mathrm{span}\{v_{m/6}, v_{5m/6}\}$; hence its rank is 2.
The proofs of Theorems 1.1 and 1.2 via Proposition 2.1 indicate how the terms $n^{\omega^k}\Xi_k v_k$ and the total fluctuation making up the normal limit occur, see also Figure 1: All eigenspaces with $\lambda_k > \frac12$ (excluding the deterministic drift for $\lambda_k = 1$) contribute two asymptotic components. First, there is the almost sure periodic component of order $n^{\lambda_k}$ with a random periodic factor, periodic roughly in $\log n$. Second, there is a normal fluctuation (in distribution) of order $\sqrt n$. All eigenspaces with $\lambda_k < \frac12$ contribute to the normal fluctuation of order $\sqrt n$, which is the visible order within these eigenspaces. For $6 \mid m$ there are eigenvalues with $\lambda_k = \frac12$. Our calculations show that in these two eigenspaces there is a normal fluctuation of order $\sqrt{n \log n}$. According to Proposition 2.1 all these fluctuations within the eigenspaces are asymptotically independent, which explains the overall asymptotic normal fluctuation. Since this normal fluctuation is of order $\sqrt n$ and $\sqrt{n \log n}$, respectively, all the almost sure periodic contributions from the eigenspaces with $\lambda_k > \frac12$ are visible as well.
To prove Proposition 2.1 we first derive moments and mixed moments in Section 3.1, needed for the normalization. In Section 3.2 a pointwise recursive equation for the complex random variables $\Xi_1, \ldots, \Xi_r$ is obtained, together with a recurrence for the sequence $(R_n)_{n \ge 0}$ which extends to a recurrence for the residuals in (2) as well as to the residuals $Z_n$ of the projections of the $R_n$, see equation (16) in Section 3.2. Equation (16) is then the starting point to show the convergence in Proposition 2.1. For this, a stochastic fixed-point argument in the context of the contraction method within the Zolotarev metric $\zeta_3$, see [17] for general reference, is used. Here, we fall back on an approach to bounding the Zolotarev distance and some estimates from [16], where a related but simpler (univariate) problem was discussed.

Proof of Proposition 2.1
We start with estimates for the covariance matrix of the $Z_n$ appearing in Proposition 2.1 in Section 3.1. In Section 3.2 we derive the recurrence (16) for the $Z_n$. The use of the Zolotarev metric $\zeta_3$ requires a slightly modified version of recurrence (16). This is explained in Section 3.3, see in particular the quantities $N_n$ in (20), which are the modified versions of the $Z_n$. Then, in Section 3.4, asymptotics for the coefficients appearing in the recurrences for $Z_n$ and $N_n$, respectively, are derived. Based on these asymptotics, finally, in Section 3.5 convergence of the $N_n$ is shown within the Zolotarev metric, which implies convergence in distribution of the $Z_n$ as stated in Proposition 2.1.

Convergence of the Covariance Matrix
As indicated in Section 2, we study the centered process $(R_n - \mathbb{E}[R_n])_{n \ge 0}$ via its spectral decomposition with respect to the orthogonal basis $\{v_k : 0 \le k < m\}$ of the unitary vector space $\mathbb{C}^m$, i.e.,
$$R_n - \mathbb{E}[R_n] = \sum_{k=0}^{m-1} u_k\bigl(R_n - \mathbb{E}[R_n]\bigr)\, v_k,$$
where $u_k(w) := 1 \cdot w_0 + \omega^k \cdot w_1 + \cdots + \omega^{(m-1)k} \cdot w_{m-1}$. The evolution (4) of the process implies that the complex normalization
$$M_{k,n} := \frac{\Gamma(n+1)\,\Gamma(1+\omega^k)}{\Gamma(n+1+\omega^k)}\, u_k\bigl(R_n - \mathbb{E}[R_n]\bigr), \qquad n \ge 1, \tag{10}$$
turns all the eigenspace coefficients, $0 \le k \le m-1$, into centered martingales. We set $M_{k,0} := 0$. Depending on $\lambda_k$, these martingales are known to exhibit two different kinds of asymptotic behaviour, see [10,11,18]: For all $k \in \{0, \ldots, m-1\}$ with $\lambda_k = \Re(\omega^k) > 1/2$, there exists a complex random variable $\Xi_k$ such that, as $n \to \infty$, we have
$$M_{k,n} \longrightarrow \Xi_k \quad \text{almost surely}, \tag{11}$$
where the convergence also holds in $L^p$ for every $p \ge 1$. Note that the $\Xi_k$ in (11) are identical with the $\Xi_k$ appearing in (6) and in Theorems 1.1 and 1.2. The $M_{k,n}$ with $\lambda_k = \Re(\omega^k) \le 1/2$ are known to converge in distribution, after proper normalization, to normal limit laws.
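The Gamma quotient arises because, by (4), one urn step multiplies the conditional expectation of $u_k(R_n)$ by the factor $1 + \omega^k/(n+1)$, and the accumulated product $\prod_{i=1}^n (1 + \omega^k/i)$ equals $\Gamma(n+1+\omega^k)/(\Gamma(n+1)\Gamma(1+\omega^k))$. A minimal exact check of the one-step identity, with our own helper functions (not from the paper):

```python
import cmath

def u(k, r):
    """Dual functional u_k(w) = sum_j omega^{jk} w_j for an m-vector w."""
    m = len(r)
    w = cmath.exp(2j * cmath.pi / m)
    return sum(w ** (j * k) * r[j] for j in range(m))

def step_mean_u(k, r):
    """Exact conditional expectation of u_k after one urn step from
    composition r: type j is drawn w.p. r_j / N and adds one ball of type j+1."""
    m, N = len(r), sum(r)
    total = 0
    for j in range(m):
        r2 = list(r)
        r2[(j + 1) % m] += 1
        total += (r[j] / N) * u(k, r2)
    return total

m, k = 7, 2
r = [3, 1, 0, 2, 0, 1, 1]            # an arbitrary composition with N = 8 balls
w_k = cmath.exp(2j * cmath.pi * k / m)
# one urn step scales E[u_k] by exactly 1 + omega^k / N
assert abs(step_mean_u(k, r) - (1 + w_k / sum(r)) * u(k, r)) < 1e-10
```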
From Section 3.2 on, our analysis will also require starting the cyclic urn process with initially one ball of type $j \in \{0, \ldots, m-1\}$. The corresponding composition vector $R^{[j]}_n$ is obtained in distribution by the relation
$$R^{[j]}_n \stackrel{d}{=} (A^t)^j R_n, \tag{12}$$
with the replacement matrix $A$ from (3), where $\stackrel{d}{=}$ denotes equality in distribution. Similar to the identity (12), one obtains the corresponding martingales $M^{[j]}_{k,n}$. Our subsequent analysis requires asymptotics of moments and of correlations between the $u_k(R_n)$. Exploiting the dynamics of the urn in (4), elementary calculations imply the identities of Lemma 3.1.
Proof. The first two identities immediately follow from (4). For (13), let $k, \ell \in \{0, \ldots, m-1\}$ and $n \ge 1$ and note that, almost surely,
$$\mathbb{E}\bigl[u_k(R_n)\, u_\ell(R_n) \,\big|\, \mathcal{F}_{n-1}\bigr] = \Bigl(1 + \frac{\omega^k + \omega^\ell}{n}\Bigr) u_k(R_{n-1})\, u_\ell(R_{n-1}) + \frac{\omega^{k+\ell}}{n}\, u_{k+\ell}(R_{n-1}).$$
Here, we use the abbreviation $u_{k+\ell}(R_{n-1}) := u_{(k+\ell) \bmod m}(R_{n-1})$. Iterating this relation yields (13).

Remark 2. From (13) we obtain the mixed moments of the corresponding real and imaginary parts via the identities
$$\Re z = \frac{z + \bar z}{2}, \qquad \Im z = \frac{z - \bar z}{2i}, \qquad \overline{u_k(R_n)} = u_{m-k}(R_n).$$
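Spelled out, a routine expansion reduces each mixed moment of real and imaginary parts to the complex mixed moments from (13), using that $\overline{u_k(w)} = u_{m-k}(w)$ for real vectors $w$; for instance:

```latex
\mathbb{E}\bigl[\Re u_k(R_n)\,\Re u_\ell(R_n)\bigr]
  = \tfrac{1}{4}\,\mathbb{E}\Bigl[\bigl(u_k(R_n)+u_{m-k}(R_n)\bigr)
      \bigl(u_\ell(R_n)+u_{m-\ell}(R_n)\bigr)\Bigr],
\qquad
\mathbb{E}\bigl[\Im u_k(R_n)\,\Im u_\ell(R_n)\bigr]
  = -\tfrac{1}{4}\,\mathbb{E}\Bigl[\bigl(u_k(R_n)-u_{m-k}(R_n)\bigr)
      \bigl(u_\ell(R_n)-u_{m-\ell}(R_n)\bigr)\Bigr].
```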
From Lemma 3.1 we obtain the order of magnitude of the $L^2$-distance of the residuals of the martingales $(M_{k,n})_{n \ge 0}$ with $\lambda_k > \frac12$. This is needed for the proper normalization of these residuals.

Lemma 3.2. For $k \ge 1$ such that $1/2 < \lambda_k < 1$ and $\Xi_k$ as in (11), the $L^2$-distance $\mathbb{E}[|M_{k,n} - \Xi_k|^2]$ is of the order $n^{1-2\lambda_k}$ as $n \to \infty$. In particular, $M_{k,n} \to \Xi_k$ in $L^2$.
Proof. We show the claim for $\mathbb{E}[|M_{k,n} - \Xi_k|^2]$ in an exemplary way; here, we decompose the difference into the martingale increments $M_{k,j} - M_{k,j-1}$ for $j > n$.
The preceding calculations imply that the covariance matrix of $Z_n$, see Proposition 2.1, converges as $n \to \infty$. Its limit is given by $M_m$ defined in (9).

Embedding and Recursions
In this section we briefly explain how to derive an almost sure recurrence for the sequence (R n ) n≥0 which then extends to the projections. These recursive representations transfer to the martingale limits Ξ k and thus also to the components of Z n .
We embed the cyclic urn process into a random binary search tree generated by a sequence (U n ) n≥1 of i.i.d. random variables, where U := U 1 is uniformly distributed on [0, 1]. The random binary search tree starts with one external node at time 0, the so-called root. At time n = 1, the first key U is inserted in this external node, turning it into an internal node. The occupied node then gets two external nodes attached along a left and right branch. We then successively insert the following keys, where each key traverses the internal nodes starting at the root, which is occupied by U . Whenever the key traversing is less than the occupying key at a node it moves on to the left child of that node, otherwise to its right child. The first external node visited is occupied by the key, turning it into an internal node with two new external nodes attached. It is easy to see that in each step one of the external nodes is chosen uniformly at random (and independently of the previous choices) and replaced by one internal node with two new external nodes attached. See, e.g., Mahmoud [13], for a detailed description of random binary search trees.
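The insertion dynamics of a random binary search tree can be sketched in a few lines; this is our own minimal implementation (names and structure are ours, not from the paper):

```python
import random

class Node:
    """Internal node of a binary search tree, holding a key and children."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert a key: move left on smaller keys, right otherwise,
    and occupy the first external (None) position reached."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def random_bst(n, seed=None):
    """Grow a binary search tree from n i.i.d. uniform keys U_1, ..., U_n."""
    rng = random.Random(seed)
    root = None
    for _ in range(n):
        root = insert(root, rng.random())
    return root

def size(t):
    """Number of internal nodes of the tree t."""
    return 0 if t is None else 1 + size(t.left) + size(t.right)

t = random_bst(50, seed=7)
assert size(t) == 50
```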
The cyclic urn is now embedded into the evolution of the random binary search tree by labeling its external nodes by the types of the balls. The initial external node is labeled by type 0. Whenever an external node of type $j \in \{0, \ldots, m-1\}$ is replaced by an internal node, its new left external node is labeled $j$ (corresponding to returning the chosen ball of type $j$ to the urn) and its new right external node is labeled $(j+1) \bmod m$ (corresponding to adding a new ball of type $(j+1) \bmod m$ to the urn). A related embedding was exploited in [12, Section 6.3], see also [2]. Note that the binary search tree starting with one external node labeled 0 decomposes into its left and right subtree, starting with external nodes of types 0 and 1, respectively. The size (number of internal nodes) $I_n$ of the left subtree is uniformly distributed on $\{0, \ldots, n-1\}$ and, conditional on $U = u$, it is binomial $B_{n-1,u}$ distributed, $u \in (0,1)$. This implies, with $J_n := n - 1 - I_n$, the recurrence
$$R_n = R^{(0)}_{I_n} + R^{[1]}_{J_n}, \qquad n \ge 1, \tag{14}$$
where $(R^{(0)}_n)_{n \ge 0}$ is a cyclic urn process started with one ball of type 0 at time 0 and $(R^{[1]}_n)_{n \ge 0}$ is a cyclic urn process started with one ball of type 1; conditionally on $I_n$, the two subtree processes are independent. Now, applying the transformation and scaling (10), which turn $R_n$ into $M_{k,n}$, to the left and right hand side of (14), letting $n \to \infty$ and using the convergence in (11) yields the following almost sure recursive equation for the $\Xi_k$:
$$\Xi_k = U^{\omega^k}\, \Xi^{(0)}_k + \omega^k (1-U)^{\omega^k}\, \Xi^{(1)}_k + b_k(U), \tag{15}$$
where $U$, $\Xi^{(0)}_k$ and $\Xi^{(1)}_k$ are independent and $\Xi^{(0)}_k$ and $\Xi^{(1)}_k$ have the same distribution as $\Xi_k$. Here, $U^{\omega^k} := \exp(\omega^k \log U)$ and $b_k(u) := u^{\omega^k} + \omega^k (1-u)^{\omega^k} - 1$. Here and subsequently, we make no use of the fact that the martingale limits $\Xi_k$ can also be written explicitly as deterministic functions of the limit of the random binary search tree, when interpreting the evolution of the random binary search tree as a transient Markov chain and its limit as a random variable in the Markov chain's Doob–Martin boundary, see [6,8]. Following this path, the $\Xi_k$ become deterministic functions of $(U_n)_{n \ge 1}$, and from this representation the self-similarity relation (15) can be read off as well.
See [1] for a related explicit construction.
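The distributional fact used above, namely that the left subtree size $I_n$ is uniform on $\{0, \ldots, n-1\}$, can be checked exactly: conditional on $U = u$, $I_n$ is $B_{n-1,u}$-distributed, and integrating against the uniform density gives $\mathbb{P}(I_n = i) = \int_0^1 \binom{n-1}{i} u^i (1-u)^{n-1-i}\,du = 1/n$ by the Beta integral. A small exact computation (our own sketch):

```python
from math import comb, factorial

def p_left_subtree_size(n, i):
    """P(I_n = i) for the mixed binomial B_{n-1,U} with U uniform on [0,1],
    computed exactly via the Beta integral
    int_0^1 u^i (1-u)^(n-1-i) du = i! (n-1-i)! / n!."""
    beta = factorial(i) * factorial(n - 1 - i) / factorial(n)
    return comb(n - 1, i) * beta

n = 9
probs = [p_left_subtree_size(n, i) for i in range(n)]
assert all(abs(p - 1 / n) < 1e-12 for p in probs)  # uniform on {0, ..., n-1}
assert abs(sum(probs) - 1) < 1e-12
```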
Returning to $Z_n$, we see that the recursive decomposition transfers to a recurrence (16) for the $Z_n$, where $\sigma_0 := \sigma_1 := \mathrm{Id}_{m-1}$ and the $\sigma_k$ are the coefficient matrices of the recursive part, where the additional factor of $\sqrt{\log}$ is needed for the eigenspace $m/6$ (recall that $\lambda_{m/6} = \frac12$), and where the "error term" $F_n$ is made up of three components: setting, for $\ell \in \{0, \ldots, n-1\}$, the quantities $G_{1,n}(\ell)$ as the corresponding rescaling coefficients, the first component of $F_n$ is a function of $G_{1,n}(I_n)$, and $F^{(2)}_n$ is given by the sum of the remaining terms.

The Zolotarev metric
In the last subsection, we prepared for a proof of Proposition 2.1 that is based on the contraction method. To be more precise, the weak convergence in Proposition 2.1 is shown by (the stronger) convergence within the Zolotarev metric. The Zolotarev metric has been studied systematically in the context of distributional recurrences in [17]. We only give definitions of the relevant quantities and properties here. For $x \in \mathbb{R}^d$, we denote by $\|x\|$ the standard Euclidean norm of $x$, and for $B \in \mathbb{R}^{d \times d}$, $\|B\|_{\mathrm{op}}$ denotes the corresponding operator norm. For random variables $X$ and $p \ge 1$, we denote by $\|X\|_p$ the $L^p$-norm of $X$.
For two $\mathbb{R}^d$-valued random variables $X$ and $Y$ we set
$$\zeta_3(X, Y) := \sup_{f \in \mathcal{F}_3} \bigl|\mathbb{E}[f(X)] - \mathbb{E}[f(Y)]\bigr|, \qquad \mathcal{F}_3 := \bigl\{f \in C^2(\mathbb{R}^d, \mathbb{R}) : \|f''(x) - f''(y)\| \le \|x - y\|\bigr\}.$$
We call a pair $(X, Y)$ $\zeta_3$-compatible if the expectations and the covariance matrices of $X$ and $Y$ coincide and if both $\|X\|_3, \|Y\|_3 < \infty$. This implies that $\zeta_3(X, Y) < \infty$. A basic property is that $\zeta_3$ is $(3,+)$-ideal, i.e., for random vectors $X$, $Y$, $Z$, where $Z$ is independent of $(X, Y)$, and $c > 0$,
$$\zeta_3(X + Z, Y + Z) \le \zeta_3(X, Y), \qquad \zeta_3(cX, cY) = c^3\, \zeta_3(X, Y).$$
For a linear transformation $A$ of $\mathbb{R}^d$, we have
$$\zeta_3(AX, AY) \le \|A\|_{\mathrm{op}}^3\, \zeta_3(X, Y).$$
The following lemma will be used in the proof of Proposition 2.1 and can be proved similarly to Lemma 2.1 in [16].
In order to work with the Zolotarev metric later, it is necessary to adjust the covariance matrix of $Z_n$. That is, we need to work with a sequence of random vectors that is sufficiently close to $(Z_n)_{n \ge 0}$ and has fixed covariance matrix $M_m$, to guarantee the finiteness of the corresponding Zolotarev distances $\zeta_3$.
As noted in Section 3.1, the covariance matrices $(\mathrm{Cov}(Z_n))_{n \ge 0}$ converge componentwise to $M_m$, and $M_m$ is invertible. Thus, there exists $n_0 \in \mathbb{N}$ such that $\mathrm{Cov}(Z_n)$ is invertible for all $n \ge n_0$. We may therefore define deterministic matrices $(\Sigma_n)_{n \ge 0}$ such that $\Sigma_n$ is invertible for all $n \ge 0$ and $\Sigma_n Z_n$ has covariance matrix $M_m$ for all $n \ge n_0$. We now set $N_n := \Sigma_n Z_n$; inserting recurrence (16) then yields the recurrence (20), whose right hand side is a recursive decomposition of $N_n$ with accordingly modified coefficients.

Preparatory Lemmata
In this section we collect some technical lemmata needed in the proof of Proposition 2.1 in the next section. We first look at the asymptotics of the coefficients arising in recursion (20).

Proof. The triangle inequality implies an estimate with two summands, which we treat separately.
We start by considering the first summand in the latter display. Denote by $B_{n-1,U}$ a mixed binomial distribution with parameters $n-1$ and $U$; note that $I_n$, conditional on $U = u$, has the $B_{n-1,u}$ distribution. Employing the Marcinkiewicz–Zygmund inequality, there exists a constant $C$, independent of $u \in [0,1]$, bounding the corresponding centered moment. For the analysis of the second summand in (21), we also condition on $U$. We divide the resulting integral into two parts. For this purpose, define $E_u := \{B_{n-1,u} \ge un/e\}$. Chernoff's inequality, applied for $0 \le t < u(n-1)$, allows us to bound the expectation on $E_u^c$. On $E_u$, we apply the mean value theorem to $h_1((1+y)^{\lambda_k}) - h_1(1^{\lambda_k})$ with $y = (B_{n-1,u} - nu)/(nu)$. Note that $(\min\{1, 1+y\}, \max\{1, 1+y\}) \subset [\frac{1}{e}, \frac{1}{u}]$ on $E_u$ and that $|h_1'|$ is nonnegative and increasing on this interval. Thus, we obtain a bound with some constant $C_k > 0$. Combining these estimates, we obtain the claimed rate as $n \to \infty$. This implies the assertion.
We have that $\Xi^{(0)}_k$ and $\Xi^{(1)}_k$ are independent of $(I_n, U)$. The triangle inequality, combined with Lemma 3.6, implies the first bound. Also, for $n \to \infty$, the remaining term vanishes as before. Now, the sequence $(\|\Sigma_n\|_{\mathrm{op}})_{n \ge 0}$ is convergent and thus bounded, which implies the claim.
Finally, we use recursion (20) for $N_n$ to show that the sequence $(\|N_n\|_3)_{n \ge 0}$ is bounded.
Proof. Recall that the composition vector $R_n$ takes only finitely many values, the random variables $\Xi_k$ have finite absolute moments of arbitrary order, see (11), and $\|\Sigma_n\|_{\mathrm{op}} \to 1$. Hence, we have $\|N_n\|_3 < \infty$ for all $n \ge 0$. Recursion (20) implies an estimate (22) for $\mathbb{E}\|N_n\|^3$ in terms of $\mathbb{E}\|N_k\|^3$, $k < n$. Set $\beta_n := \max_{0 \le k \le n} \mathbb{E}\|N_k\|^3$. By Lemma 3.7, $\mathbb{E}\|b_n\|^3 \to 0$ as $n \to \infty$. Now, by construction, $\mathrm{Cov}(N_n) = M_m$ for all $n \ge n_0$, so $\max_{0 \le k \le n-1} \|N_k\|_2^2 < K$ for some $K > 0$ and hence $\mathbb{E}\bigl[\|Y^{(0)}\|^2 \|Y^{(1)}\|\bigr] = O(1)$. The same applies to $\mathbb{E}\bigl[\|Y^{(1)}\|^2 \|Y^{(0)}\|\bigr]$.
All other summands in (22) can be bounded using Hölder's inequality. Combining all these bounds with the asymptotics in Lemma 3.5, and since $\beta_n \ge 1$, there exist $J \in \mathbb{N}$ and a constant $0 < E < \infty$ such that for all $n \ge J$, $\mathbb{E}\|N_n\|^3 \le (9/10)\,\beta_{n-1} + E$. Induction on $n$ gives that for all $n \ge 0$, $\mathbb{E}\|N_n\|^3 \le \max\{\beta_J, 10E\}$. This is sufficient: here, convergence in the Zolotarev metric implies weak convergence, and the difference $Z_n - N_n$ tends to 0 almost surely. Also, note that $\mathcal{N}(0, M_m)$ is a solution to the distributional recursion
$$N \stackrel{d}{=} \sqrt{U}\, N^{(0)} + \sqrt{1-U}\, N^{(1)},$$
where $N^{(0)}$, $N^{(1)}$ and $U$ are independent, $U$ is uniform on $[0,1]$ and $N^{(0)}$ and $N^{(1)}$ have the same distribution as $N$. First, we use recursion (20) for $N_n$ to define hybrid random variables that link $N_n$ to $\mathcal{N}(0, M_m)$ as follows: Let $N^{(0)}$ and $N^{(1)}$ be defined on the same probability space as $(U_n)_{n \ge 0}$, independent with distribution $\mathcal{N}(0, M_m)$ and also independent of the process $(R_n)_{n \ge 0}$. We eliminate the error term in the given recursion and denote the resulting hybrid quantities by $Q_n$. In order to ensure finiteness of the Zolotarev metric, the covariance matrix of $Q_n$ has to be adjusted: $\mathrm{Cov}(Q_n)$ has full rank for all $n \ge 0$. This implies that we can find a deterministic sequence of matrices $(B_n)_{n \ge 0}$ with $\mathrm{Cov}(B_n Q_n) = M_m$ for all $n \ge 0$ and $B_n \to \mathrm{Id}_{m-1}$ componentwise and in operator norm as $n \to \infty$. We write $B_n = \mathrm{Id}_{m-1} + K_n$ with $(K_n)_{n \ge 0}$ tending to 0 componentwise. Hence, with $N$ as before and $n \ge n_0$, each pair of $N_n$, $(\mathrm{Id}_{m-1} + K_n)Q_n$ and $N$ is $\zeta_3$-compatible, and the triangle inequality implies
$$\zeta_3(N_n, N) \le \zeta_3\bigl(N_n, (\mathrm{Id}_{m-1} + K_n)Q_n\bigr) + \zeta_3\bigl((\mathrm{Id}_{m-1} + K_n)Q_n, N\bigr),$$
which is finite for all $n \ge n_0$. First we show that $\zeta_3((\mathrm{Id}_{m-1} + K_n)Q_n, N) = o(1)$ by use of an upper bound of $\zeta_3$ by the minimal $L^3$-metric $\ell_3$, which is given by
$$\ell_3(X, Y) := \inf\bigl\{\|X' - Y'\|_3 : X' \stackrel{d}{=} X,\ Y' \stackrel{d}{=} Y\bigr\}.$$
By construction, $\|K_n\|_{\mathrm{op}} \to 0$ and, by Lemma 3.7, $\|b_n\|_3 \to 0$.
Also, by Lemma 3.8, $\sup_{n \ge 0} \|\Phi_n\|_3 < \infty$ and $\sup_{n \ge 0} \|Q_n\|_3 < \infty$; this yields
$$\zeta_3\bigl(N_n, (\mathrm{Id}_{m-1} + K_n)Q_n\bigr) \le \zeta_3(\Phi_n, Q_n) + o(1).$$
The previous estimates and (23) imply an upper bound on $\zeta_3(\Phi_n, Q_n)$ in terms of the distances $\zeta_3(N_k, N)$ for $k < n$. Now a standard argument shows that $\zeta_3(N_n, N) \to 0$ as $n \to \infty$; see [16], for example.
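The normal fixed-point property used above, namely that $\mathcal{N}(0, M_m)$ solves $N \stackrel{d}{=} \sqrt{U} N^{(0)} + \sqrt{1-U} N^{(1)}$, is elementary: conditionally on $U = u$, the right hand side is centered normal with covariance $u M_m + (1-u) M_m = M_m$. A quick simulation of the scalar case $M_m = 1$ illustrates this (our own sketch, not from the paper):

```python
import random

def sample_fixed_point(rng):
    """One draw of sqrt(U) N0 + sqrt(1-U) N1 with N0, N1 ~ N(0,1) i.i.d.
    and U uniform on [0,1], all independent (scalar case)."""
    u = rng.random()
    return (u ** 0.5) * rng.gauss(0, 1) + ((1 - u) ** 0.5) * rng.gauss(0, 1)

rng = random.Random(42)
xs = [sample_fixed_point(rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean * mean
# the output is again (exactly) standard normal
assert abs(mean) < 0.02 and abs(var - 1.0) < 0.05
```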