Moment convergence of balanced Pólya processes

It is known that in an irreducible small Pólya urn process, the composition of the urn, after suitable normalization, converges in distribution to a normal distribution. We show that if the urn is also balanced, this normal convergence holds with convergence of all moments, thus yielding asymptotics of (central) moments.


Introduction
A Pólya urn process is defined as follows. Consider an urn containing balls of s different colours, which we label 1, . . . , s. At each time step, we draw a ball at random from the urn; we then replace it and, if its colour was i, we add r_{ij} further balls of colour j, for each j = 1, . . . , s. Here

R := (r_{ij})_{i,j=1}^s    (1.1)

is a given matrix, called the replacement matrix. The state of the urn at time n is described by a vector X_n = (X_{n1}, . . . , X_{ns}), where X_{nj} is the number of balls of colour j. We start with some given (deterministic) X_0, and it is clear that X_n evolves according to a Markov process.
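To make the dynamics concrete, here is a minimal simulation sketch (ours, not from the paper); the two-colour replacement matrix in the example is an illustrative choice.

```python
import random

def polya_urn(X0, R, n_steps, seed=0):
    """Simulate a Polya urn: draw colour i with probability proportional
    to X[i], then add R[i][j] balls of colour j for every j."""
    rng = random.Random(seed)
    X = list(X0)
    for _ in range(n_steps):
        i = rng.choices(range(len(X)), weights=X)[0]
        for j, r in enumerate(R[i]):
            X[j] += r
    return X

# Example: a balanced urn with row sums m = 3.
X = polya_urn([1, 1], [[2, 1], [1, 2]], 100)
# For a balanced urn the total is deterministic: |X_n| = |X_0| + n*m.
assert sum(X) == 2 + 100 * 3
```

The final assertion checks the deterministic total ball count |X_n| = |X_0| + nm of a balanced urn.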
As usual, we assume that r_{ij} ≥ 0 when i ≠ j, but we allow r_{ii} to be negative, meaning removal of balls, provided the urn is tenable, i.e., that it is impossible to get stuck. (See (2.2)–(2.3), and see Remark 1.8 for an extension that allows some negative r_{ij}.) Urn processes of this type have been studied by many different authors, with varying generality, going back to Eggenberger and Pólya [5]; see for example Janson [8], Flajolet, Gabarró and Pekari [6], Pouyanne [14], Mahmoud [12], and the further references given there.
In the present paper we study only the balanced case, meaning that the total number of balls added each time is deterministic, i.e., that the row sums of the matrix (1.1) are constant, say m; we assume further that m > 0.
We define, for an arbitrary vector (x_1, . . . , x_n), |(x_1, . . . , x_n)| := Σ_{i=1}^n |x_i|. In particular, the total number of balls in the urn is |X_n|. Note that when the urn is balanced, this number is deterministic, with |X_n| = |X_0| + nm.
In the description above, it is implicit that the numbers r_{ij} are integers. However, it has been noted many times that the process is also well-defined for real r_{ij}, see e.g. [8, Remark 4.2], [9] and [14] (cf. also [11] for the related case of branching processes); this can be interpreted as an urn containing a certain amount (mass) of each colour, rather than discrete balls. We give a detailed definition of this more general version in Section 2, and use it in our results below.
Results on the asymptotic distribution of X_n as n → ∞ have been given by many authors under varying assumptions, using different methods. It is well known that the asymptotic behaviour of X_n depends on the eigenvalues of R, or equivalently of its transpose A = R^t, see e.g. [8]. By the Perron–Frobenius theory of positive matrices (applied to R + cI for some c ≥ 0), R has a largest real eigenvalue λ_1, and all other eigenvalues λ satisfy Re λ < λ_1. We say that an eigenvalue λ is large if Re λ > ½λ_1, small if Re λ ≤ ½λ_1, and strictly small if Re λ < ½λ_1. Similarly, we say that the Pólya process (or urn) is small (strictly small) if λ_1 is simple and all other eigenvalues are small (strictly small); a process is large whenever it is not small. We call a Pólya process critically small if it is small but not strictly small, i.e., if the process is small and R admits an eigenvalue λ such that Re λ = λ_1/2. We define, letting Λ be the set of eigenvalues,

σ_2 := max{Re λ : λ ∈ Λ \ {λ_1}}  if λ_1 is a simple eigenvalue, and σ_2 := λ_1 otherwise.    (1.2)

Thus the Pólya urn is strictly small if σ_2 < λ_1/2, critically small if σ_2 = λ_1/2, and large if σ_2 > λ_1/2.

In the main results we assume that the urn is irreducible, i.e., that the matrix R is irreducible. (In other words, every colour is dominating in the sense of [8].) Then the largest eigenvalue λ_1 is simple. (Thus the second case in (1.2) does not occur.) As said above, we also assume the urn to be balanced, with all row sums of R equal to m, and then λ_1 = m, with corresponding right eigenvector (1, . . . , 1). Furthermore, there exists a positive left eigenvector v_1 of R with eigenvalue m; we assume that v_1 is normalized by |v_1| = 1, and then v_1 is unique. If the urn is irreducible and small, then X_n is asymptotically normal [8].
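The trichotomy above is easy to check numerically for a given replacement matrix; the following sketch (ours, not the paper's) uses numpy and a floating-point tolerance for the critical equality σ_2 = λ_1/2.

```python
import numpy as np

def classify(R):
    """Classify a replacement matrix via sigma_2, the largest real part
    among the eigenvalues after removing one copy of lambda_1."""
    eig = np.linalg.eigvals(np.asarray(R, dtype=float))
    order = np.argsort(eig.real)
    lam1 = eig[order[-1]].real                 # Perron-Frobenius eigenvalue
    sigma2 = max(eig[k].real for k in order[:-1])
    if np.isclose(sigma2, lam1 / 2):
        return "critically small"
    return "strictly small" if sigma2 < lam1 / 2 else "large"

print(classify([[2, 1], [1, 2]]))   # eigenvalues 3, 1: strictly small
print(classify([[3, 1], [1, 3]]))   # eigenvalues 4, 2 = 4/2: critically small
print(classify([[4, 1], [1, 4]]))   # eigenvalues 5, 3 > 5/2: large
```

All three example matrices are balanced and irreducible, so λ_1 equals the common row sum m, as stated above.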
More precisely, if v_1 is the positive eigenvector of R defined above, and ν = 0 if the urn is strictly small and ν ≥ 1 is the integer defined in Theorem 1.2 below if the urn is critically small, then, as n → ∞,

(X_n − nλ_1 v_1) / (n log^ν n)^{1/2} →(d) N(0, Σ).    (1.3)

By [14], if the urn is large, then there exist (complex) random variables W_k, (complex) left eigenvectors v_k of R and an integer ν ≥ 0 such that (1.4) holds, a.s. and in every L^p. In general, there will be oscillations (coming from complex eigenvalues λ_k) and X_n will not converge in distribution (after any non-trivial normalization). Mixed moments of the limit distributions W_k in (1.4) can be computed, see [14]. However, there is in general no explicit description of the limit laws for a large urn. See [2], [4], [3] and Mailler [13] for some recent progress on these distributions. Note also that (1.4) is valid as soon as the urn is large and λ_1 is a simple eigenvalue, whether the urn is irreducible or not (see [14]).
Results of this type have been proven by several authors, under varying assumptions, using several different methods. The proofs in Janson [8] use an embedding in a continuous-time multi-type branching process, a method that was introduced by Athreya and Karlin [1]. This method leads to general results on convergence in distribution, but not to results on the moments. A different method was developed by Pouyanne [14], where algebraic expressions were obtained for (mixed) moments of various components of X n , and asymptotics were derived. For large urns, the resulting moment estimates and some simple martingale arguments give the limit results, with convergence a.s. and in L p , and thus convergence of all moments (after suitable normalization). The method applies also to small urns, and yields limits for the moments. In principle, it should be possible to use the resulting expressions and the method of moments to show (1.3). However, the expressions for the limits are a bit involved, and it seems difficult to do this in general.
The purpose of the present paper is to show moment convergence for small urns by combining these two methods. We use the convergence in distribution (1.3) proven in [8], and we use the estimates of moments proven in [14] to show that any moment of the left-hand side of (1.3) is bounded as n → ∞; these together imply moment convergence in (1.3). (We thus do not have to calculate the limits provided by [14] exactly; it suffices to find bounds of the right order of magnitude.) This yields the following theorems, which are our main results.
All limits and o(· · ·) in this paper are as n → ∞.

Theorem 1.1. Assume that the Pólya urn is irreducible, balanced and strictly small. Then (1.3) holds, with ν = 0, with convergence of all moments. In particular, E X_n = nλ_1 v_1 + o(n^{1/2}) and Var(X_n) = nΣ + o(n).

Theorem 1.2. Assume that the Pólya urn is irreducible, balanced and critically small, and let 1 + d be the dimension of the largest Jordan block of R corresponding to an eigenvalue λ with Re λ = λ_1/2 (d ≥ 0). Then (1.3) holds, with ν = 2d + 1, with convergence of all moments. In particular, E X_n = nλ_1 v_1 + o((n log^ν n)^{1/2}) and the covariance matrix Var(X_n) = n log^ν n · Σ + o(n log^ν n).

Corollary 1.3. Let w = (w_1, . . . , w_s) be any vector in R^s and let Y_n := ⟨w, X_n⟩. Then the corresponding one-dimensional version of (1.3) holds for Y_n, with convergence of all moments.
The remainder of this section is devoted to remarks and problems that can be skipped on a first reading.

Remark 1.4.
For the mean and variance, similar results are also proven in [10], by a related but somewhat different method (under somewhat more general assumptions); that method does not seem to generalise easily to higher moments.

Remark 1.5. In Corollary 1.3, the limit may be degenerate: if w = cu_1 + u_0 with c ∈ R, u_1 = (1, . . . , 1) and Ru_0 = 0, then ⟨u_0, X_n⟩ is constant, and thus Y_n = ⟨w, X_n⟩ = Y_0 + ncm is deterministic, see [10, Theorem 3.6].
On the other hand, in the critically small case, the rank of Σ is typically only 1 or 2, and there are non-trivial vectors w such that γ = 0 and thus Var(Y_n) = o(n log^ν n).

Remark 1.6. More precise error estimates in Theorems 1.1 and 1.2 can be obtained from the proofs below. In particular, for the expectation we have in the strictly small case E X_n = nλ_1 v_1 + O(n^{σ_2/λ_1} log^{ν_1} n) + O(1) for some ν_1. See also [10].

Remark 1.7. It is possible to let balls of different colours have different activities, say a_i ≥ 0 for balls of colour i, with the probability of a ball being drawn proportional to its activity [8]. The condition that the urn is balanced is now that the total activity added each time is constant. In the case when all activities are positive, this is easily reduced to the standard case a_i = 1 by using the real version above: we just multiply the number of balls of colour i by a_i (both in the urn and in the replacement matrix). In general, when there are "dummy balls" of activity 0, which thus are never drawn (see e.g. [8] for the use of such balls), the results above still hold, assuming that the urn is irreducible when dummy balls are ignored. (Note that we get another Pólya process by ignoring dummy balls, and that the non-zero eigenvalues remain the same.) This can be shown by the same proofs as given below; we only have to modify the definitions of balanced in (2.4) and of A and Φ in (2.5) and (2.6) by replacing the forms ℓ_k used there by a_k ℓ_k, and note that it is easy to verify that the results in [14] still hold (with the corresponding modification of Φ_∂ defined there).

Remark 1.8. In the settings of [14] and [7] just mentioned, (1.3) holds because there is an equivalent urn with random replacements that satisfies the conditions of [8].

Remark 1.9. It is possible to let the replacement vectors (r_{ij})_{j=1}^s be random, see [8]: with our notations of Section 2, assume that random V-valued increment vectors W_1, . . .
, W_s are given and that they admit moments of order p, where p ≥ 2 is an integer or ∞. In this case, the conditional transition probabilities (2.1) keep the same form, and the increment at each step, given K = k, is a new copy of W_k, independent of everything that has happened so far. The tenability assumptions (2.2)–(2.3) must be modified: it is sufficient that ℓ_j(W_k) ≥ 0 a.s. for all j, k; more generally, (2.2) should hold a.s. Assume further that the urn is almost surely balanced, which means that (2.4) is a.s. satisfied (with w_k replaced by W_k).
Then our results extend to this case, with the moment convergence valid up to order p.
To see this, note first that in this random replacement context, all results of [8] hold. The techniques developed in [14] and the arguments given in the present paper also remain valid after the following adaptations: the replacement operator (2.5) and the transition operator (2.6) are adapted to the random increments W_k, the latter restricted to polynomials f of degree at most p.

Remark 1.10. For an example of applications of the results above to random tree processes (m-ary search trees and preferential attachment trees), see [7, Remark 3.3].

Problem 1.11. As said above, we consider in this paper only balanced urns. It is a challenging open problem to extend the results to non-balanced urns.

Preliminaries
We follow [14] and use the following coordinate-free description of the urn process. It is easily seen to be equivalent to the traditional description in Section 1, with r_{ij} = ℓ_j(w_i), allowing these numbers to be real and not necessarily integers.
Let V be a real vector space of finite dimension s ≥ 1 and let ℓ_1, . . . , ℓ_s be a basis of the dual space V*; let V⁺ := {v ∈ V : ℓ_j(v) ≥ 0, j = 1, . . . , s} \ {0} be the positive orthant. Let X_0 and w_1, . . . , w_s be given vectors in V, with X_0 ∈ V⁺. Given X_n ∈ V⁺, for some n ≥ 0, we let X_{n+1} := X_n + w_K, where the random index K is chosen with conditional probability, given X_n,

P(K = k | X_n) = ℓ_k(X_n) / Σ_{j=1}^s ℓ_j(X_n),  k = 1, . . . , s.    (2.1)

This defines the Pólya process (X_n)_0^∞ (as a Markov process), provided the process is tenable, i.e., X_n ∈ V⁺ for all n.
The standard sufficient set of conditions for tenability, used by many authors, is in our formulation: for all j, k = 1, . . . , s,

ℓ_j(w_k) ≥ 0  whenever j ≠ k,    (2.2)

and, if ℓ_k(w_k) < 0, then ℓ_k(w_k) divides ℓ_k(X_0) and ℓ_k(w_j) for every j.    (2.3)

We assume (2.2)–(2.3) for simplicity, but as said in Remark 1.8, the results hold more generally under suitable conditions. In the present paper, we also assume that the process is balanced, which in this context means

Σ_{k=1}^s ℓ_k(w_j) = m,  j = 1, . . . , s,    (2.4)

for some fixed m. We assume further that m > 0, and we may without loss of generality assume m = 1, since we may divide all X_n and w_k (or, alternatively, all ℓ_j) by m.
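As a small illustration of the balance condition (2.4) and the reduction to m = 1 (our sketch, not the paper's; exact rational arithmetic keeps the rescaled row sums exact):

```python
from fractions import Fraction

def balance(R):
    """Return m if all row sums of R are equal (condition (2.4)), else None."""
    sums = {sum(row) for row in R}
    return sums.pop() if len(sums) == 1 else None

def reduce_to_m1(R, X0):
    """Divide all increment vectors w_k (rows of R) and X_0 by m,
    giving an equivalent process with m = 1."""
    m = balance(R)
    assert m is not None and m > 0
    m = Fraction(m)
    return [[r / m for r in row] for row in R], [x / m for x in X0]

R1, X1 = reduce_to_m1([[2, 1], [1, 2]], [1, 1])
assert all(sum(row) == 1 for row in R1)   # rescaled urn is balanced with m = 1
```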
We shall also use the following notation from [14], where further details are given.
The replacement matrix R (or rather its transpose) now corresponds to the replacement operator A : V → V defined by

A(v) := Σ_{k=1}^s ℓ_k(v) w_k.    (2.5)

We choose a basis (v_k)_1^s in the complexification V_C that yields a Jordan block decomposition of A, and let (u_k)_1^s be the corresponding dual basis in V*_C. We may assume that these vectors are numbered such that u_1 and v_1 correspond to the eigenvalue λ_1 = m = 1, and, moreover, for each k either u_k ∘ A = λ_k u_k (so that u_k is an eigenvector of the dual operator A*) or u_k ∘ A = λ_k u_k + u_{k−1}, for some eigenvalue λ_k. Since the urn is supposed to be irreducible, λ_1 = 1 is a simple eigenvalue; furthermore, the balance condition (2.4) (with m = 1) implies that Σ_{j=1}^s ℓ_j ∈ V* is an eigenvector of A* with eigenvalue 1; hence we may assume that u_1 = Σ_{j=1}^s ℓ_j. This means that v_1 is normalized by Σ_{j=1}^s ℓ_j(v_1) = 1. Let λ := (λ_1, . . . , λ_s) be the vector of eigenvalues of A (repeated according to algebraic multiplicity).
Let π_k denote the projection of V_C onto Cv_k defined by π_k(v) := u_k(v) v_k. Note that Σ_{k=1}^s π_k = I. For a multi-index α = (α_1, . . . , α_s) ∈ Z_{≥0}^s, let u^α := Π_{i=1}^s u_i^{α_i}; this is a homogeneous polynomial function on V_C. We call such multi-indices α powers, and we say that α is a small power if only linear forms u_i corresponding to small eigenvalues appear in u^α, i.e., if Re λ_i ≤ ½ whenever α_i > 0; we define strictly small power in the same way. Let Φ be the linear operator in the space of (complex-valued) functions on V defined by

Φ(f)(v) := Σ_{k=1}^s ℓ_k(v) (f(v + w_k) − f(v));    (2.6)

thus the expected evolution of any function f of X_n is described by Φ. Note also that Φ is the infinitesimal generator of the Markov branching process defined by (X_n)_n after embedding in continuous time (see [1, 8, 2, 3]).
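Assuming the standard form Φ(f)(v) = Σ_k ℓ_k(v)(f(v + w_k) − f(v)) of the transition operator, with ℓ_k the coordinate forms, its relation to one-step conditional expectations can be checked numerically (our sketch, not the paper's):

```python
def add(v, w):
    return tuple(x + y for x, y in zip(v, w))

def Phi(f, v, w):
    """Phi(f)(v) = sum_k l_k(v) * (f(v + w_k) - f(v)) for coordinate forms l_k."""
    return sum(v[k] * (f(add(v, w[k])) - f(v)) for k in range(len(v)))

# One-step identity E[f(X_{n+1}) | X_n = v] = f(v) + Phi(f)(v) / |v|:
w = [(2, 1), (1, 2)]                 # increment vectors w_k (rows of R)
v = (3, 5)
f = lambda x: x[0] ** 2              # a polynomial test function
lhs = sum(v[k] / sum(v) * f(add(v, w[k])) for k in range(len(v)))
assert abs(lhs - (f(v) + Phi(f, v, w) / sum(v))) < 1e-12
```

The identity holds for every function f, since P(K = k | X_n = v) = ℓ_k(v)/|v| by (2.1).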
We order the multi-indices by the degree-antialphabetic order, see [14], and define S_α := span{u^β : β ≤ α}. Then S_α is a finite-dimensional space of polynomials, and S_α is Φ-stable [14]. Moreover,

E Q_α(X_n) = O(n^{Re⟨λ,α⟩} log^{ν_α} n),    (2.8)

where ν_α is the index of nilpotence of Q_α defined in (2.7).
Our proofs use the whole machinery of [14]. We define a polyhedral cone Σ and, for every power α, a polyhedron A_α (to be precise, the set of integer points in a convex polyhedron). (There should be no risk of confusion with the covariance matrix Σ in (1.3); we denote this cone too by Σ in order to fit the notation in [14].) Let δ_j denote the multi-index α with α_i = δ_{ij}, i.e., with a single 1 in the j-th place. The cone Σ can be defined by its spanning edges, as a Minkowski sum, or by a family of inequalities (2.10), one for every subset I of {1, . . . , s}; the equivalence between the two definitions is proven in [14]. (Moreover, it suffices to consider I with 1 ≤ #I ≤ s − 1 in (2.10); these I correspond to the faces of Σ, see [14].)

When α ∈ Z_{≥0}^s, the polyhedron A_α is defined in terms of α − D_α, where α − D_α denotes {α − d : d ∈ D_α} and D_α is the set of Z_{≥0}-linear combinations of all vectors δ_k − δ_{k−1} such that u_k is not an eigenfunction of A*. Note that for such k,

λ_k = λ_{k−1};    (2.13)

as a consequence, for every α' ∈ A_α, |α'| = |α| and ⟨λ, α'⟩ = ⟨λ, α⟩. Note also that always α ∈ A_α, and that if A is diagonalizable, then D_α = {0}, and thus A_α = {α}.

Proofs
Recall that, for convenience and without loss of generality, we assume λ_1 = m = 1.

Powers and nilpotence indices
We begin with the strictly small case, which is rather simple.

(The definition of D_α above corrects a minor error in [14].)
The rest of this subsection is devoted to the critically small case, where we have to pay special attention to eigenvalues λ with Re λ = ½; such eigenvalues are called critical.
Recall that we have chosen a basis (v_1, . . . , v_s) that yields a Jordan block decomposition of A. A set of indices J ⊆ {1, . . . , s} that corresponds to a Jordan block is called a monogenic block of indices [14]; if the corresponding eigenvalue is critical, J is called a critical monogenic block.
The support of a power or another vector α = (α_1, . . . , α_s) ∈ Z^s is supp(α) := {k : α_k ≠ 0}. The power (vector) α is called critical if α_k ≠ 0 ⟹ Re λ_k ∈ {1, ½}, and α is called strictly critical if α_k ≠ 0 ⟹ Re λ_k = ½. Furthermore, α is called monogenic when its support is contained in some monogenic block J, and α is called a quasi-monogenic power when supp(α) ⊆ {1} ∪ J for some monogenic block J. We consider only critical monogenic blocks, i.e., blocks associated to a critical eigenvalue. (Note that a power α = cδ_1 is critical and quasi-monogenic, and associated to any monogenic block J; otherwise J is determined by α.) Recall that K_α is the set of powers defined in (2.17).

Lemma 3.2.
Assume that the urn is critically small.
(ii) If α is a critical power, then any β ∈ K α is critical.
As a consequence of Lemma 3.2 and Theorem 2.2, the space C of polynomial functions on V defined by C := span{Q_α : α critical} is Φ-stable; thus, when α is a critical power, ν_α is also the index of Q_α for the nilpotent endomorphism induced by Φ − ⟨λ, α⟩ on C. This property is the basic fact that allows us to prove Proposition 3.3, which constitutes the key argument of Theorem 1.2.

Proposition 3.3.
Assume that the urn is critically small. If α is a quasi-monogenic critical power associated with a Jordan block of size 1 + r, r ≥ 0, then ν_α ≤ r + ½|α|.
The remainder of this section is devoted to the proof of Proposition 3.3. We assume that α is a critical power with supp(α) ⊆ {1} ∪ J for some monogenic block J, and we may without loss of generality assume that J = {2, . . . , r + 2} for some r ≥ 0, since we otherwise may permute the Jordan blocks of the chosen basis. In this case, we define, for a power γ, the quantity M(γ); note that M(γ) is a linear function of γ.

Proofs of Theorems 1.1 and 1.2, and of Corollary 1.3
Proof of Theorems 1.1 and 1.2. Assume that the urn is small. Let P_I := Σ_{k: Re λ_k < ½} π_k and P_II := Σ_{k: Re λ_k = ½} π_k, so that id_{C^s} = π_1 + P_I + P_II. Recall that π_k(v) = u_k(v) v_k.

When the urn is strictly small (Theorem 1.1), P_II = 0 and thus

X_n = π_1(X_n) + P_I(X_n) = nv_1 + P_I(X_n) + O(1),    (3.25)

and (3.22) implies E|X_n − nv_1|^2 = O(n). When the urn is critically small (Theorem 1.2), we instead have

X_n = π_1(X_n) + P_I(X_n) + P_II(X_n) = nv_1 + P_I(X_n) + P_II(X_n) + O(1),    (3.27)

so that (3.22) and (3.23) imply

E|X_n − nv_1|^2 = O(n log^{2d+1} n).    (3.28)

In other words, if X̃_n denotes X̃_n := (X_n − nv_1)/n^{1/2} when the urn is strictly small and X̃_n := (X_n − nv_1)/(n log^{2d+1} n)^{1/2} when the urn is critically small, then E|X̃_n|^{2ℓ} = O(1) for every positive integer ℓ. Consequently, if 0 ≤ p < 2ℓ, then the sequence |X̃_n|^p is uniformly integrable. Since ℓ is arbitrary, this sequence is uniformly integrable for every fixed p ≥ 0. Furthermore, by [8, Theorems 3.22 and 3.23], X̃_n →(d) N(0, Σ) for some covariance matrix Σ. The uniform integrability just shown implies that any mixed moment E X̃_n^α converges to the corresponding moment of N(0, Σ).
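The final step rests on the standard fact that convergence in distribution together with uniform integrability yields moment convergence; in display form (a restatement of the argument above, not a new result):

```latex
% If \widetilde X_n \xrightarrow{d} N(0,\Sigma) and
% \sup_n \mathbb{E}|\widetilde X_n|^{2\ell} < \infty for every integer \ell \ge 1,
% then (|\widetilde X_n|^p)_n is uniformly integrable for each fixed p \ge 0, whence
\mathbb{E}\,\widetilde X_n^{\alpha} \;\longrightarrow\; \mathbb{E}\,Z^{\alpha},
\qquad Z \sim N(0,\Sigma),
% for every multi-index \alpha.
```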
Proof of Corollary 1.3. The estimates for E Y_n and Var Y_n follow directly from the results for E X_n and Var(X_n) in Theorem 1.1 or 1.2, respectively.

EJP 23 (2018), paper 34.