Random replacements in P\'olya urns with infinitely many colours

We consider the general version of P\'olya urns recently studied by Bandyopadhyay and Thacker (2016+) and Mailler and Marckert (2017), with the space of colours being any Borel space $S$ and the state of the urn being a finite measure on $S$. We consider urns with random replacements, and show that these can be regarded as urns with deterministic replacements using the colour space $S\times[0,1]$.


Introduction
The original Pólya urn, studied already in 1917 by Markov [13] but later named after Pólya, who studied it in Eggenberger and Pólya [6] (1923) and Pólya [16] (1930), contains balls of two colours. At discrete time steps, a ball is drawn at random from the urn (uniformly), and it is replaced together with $a$ balls of the same colour, where $a \ge 1$ is some given constant. The (contents of the) urn is thus a Markov process $(X_n)_{n=0}^\infty$, with state space $\mathbb{Z}_{\ge 0}^2$. (The initial state $X_0$ is some arbitrary given non-zero state.) This urn model has been generalized by various authors in a number of ways, all keeping the basic idea of a Markov process of sets of balls of different colours (types), where balls are drawn at random and the drawn balls determine the next step in the process. (The extensions are all usually called Pólya urns, or perhaps generalized Pólya urns.) These generalizations have been studied by a large number of authors, and have found a large number of applications, see for example [11], [8], [3], [12] and the references given there. The extensions include (but are not limited to) the following, in arbitrary combinations.
(i) The number of different colours can be any finite integer $d \ge 2$. The state space is thus $\mathbb{Z}_{\ge 0}^d$.
(ii) The new balls added to the urn can be of any colours. We have a (fixed) replacement matrix $(R_{i,j})_{i,j=1}^d$ of non-negative integers; when a ball of colour $i$ is drawn, it is replaced together with $R_{i,j}$ new balls of colour $j$, for every $j = 1, \dots, d$.
(iii) The replacements can be random. Instead of a fixed replacement matrix as in (ii), we have for each colour $i$ a random vector $(R_{i,j})_{j=1}^d$. Each time a ball of colour $i$ is drawn, replacements are made according to a new copy of this vector, independent of everything that has happened so far.
Date: 27 November, 2017. Partly supported by the Knut and Alice Wallenberg Foundation.
(iv) The "numbers of balls" of different colours can be arbitrary non-negative real numbers (which can be interpreted as the amount or mass of each colour). The state space is thus $\mathbb{R}_{\ge 0}^d$, and the replacement matrix $(R_{i,j})$ in (ii), or its random version in (iii), has arbitrary entries in $\mathbb{R}_{\ge 0}$.
(v) Balls may also be removed from the urn. This means that $R_{i,j}$ in (ii) or (iii) may be negative. (Some conditions are required in order to guarantee that we never remove balls that do not exist; the state space is still $\mathbb{Z}_{\ge 0}^d$ or $\mathbb{R}_{\ge 0}^d$.) The simplest case, which frequently appears in applications, is drawing without replacement; then $R_{i,i} = -1$ is allowed but $R_{i,j} \ge 0$ when $i \ne j$; this means that the drawn ball is not replaced (but balls of other colours are added).
In contrast to the many papers on Pólya urns with a finite number of colours, there have so far been very few studies of extensions to infinitely many colours. One example is Bandyopadhyay and Thacker [2,1], who studied the case when the space of colours is $\mathbb{Z}^d$ and the replacements are translation invariant. A very general version of Pólya urns was introduced by Blackwell and MacQueen [4] in a special case (with every replacement having the colour of the drawn ball, as in the original Pólya urn, see Example 3.2), and much more generally (with rather arbitrary deterministic replacements) by Bandyopadhyay and Thacker [3] and Mailler and Marckert [12]; this version can be described by:
(vi) The space $S$ of colours is a measurable space. The state space is now the space $\mathcal{M}(S)$ of finite measures on $S$; if the current state is $\mu$, then the next ball is drawn with the distribution $\mu/\mu(S)$.
This version seems very powerful, and can be expected to find many applications in the future.
Remark 1.1. Note that the case when $S$ is finite in (vi) is equivalent to the version (iv). Also with an infinite $S$ in (vi), a state $\mu \in \mathcal{M}(S)$ of the process can be interpreted as the amount of different colours in the urn. (The amount is thus now described by a measure; note that the measure may be diffuse, meaning that each single colour has mass 0.)
Remark 1.2.
The colour space $S$ is assumed to be a Polish topological space in [4] and [12], and for the convergence results in [3], while the representation results in [3] are stated for a general $S$. We too make our definitions for an arbitrary measurable space $S$, but we restrict to Borel spaces in our main result. (This includes the case of a Polish space, see Lemma 2.1 below. Our results do not use any topology on $S$.)
The purpose of the present note is to show that this model with a measure-valued Pólya urn and the results for it by [3] and [12] extend almost automatically to the case of random replacements, at least in the case with no removals. In fact, we show that the model is so flexible that a random replacement can be seen as a deterministic replacement using the larger colour space $S \times [0, 1]$, where the extra coordinate is used to simulate the randomization. Random replacement in this general setting was raised as an open problem in [12], and our results together with the results of [12] thus answer this question.
We give a precise definition of the measure-valued version of Pólya urns with random replacement in Section 3. We include there a detailed treatment of measurability questions, showing that there are no such problems. (This was omitted in [3] and [12], where the situation is simpler and straightforward. In our, technically more complex, situation, there is a need to verify measurability explicitly.) The main theorem is the following. The proof is given in Section 4.
Theorem 1.3. Consider a measure-valued Pólya urn process $(X_n)_{n=0}^\infty$ in a Borel space $S$, with random replacements. Then there exists a Pólya urn process $(\tilde X_n)_{n=0}^\infty$ in $S \times [0, 1]$ with deterministic replacements such that $\tilde X_n = X_n \times \lambda$ and thus $X_n = \pi^\sharp(\tilde X_n)$ for every $n \ge 0$, where $\lambda$ is the Lebesgue measure, $\pi : S \times [0, 1] \to S$ is the projection, and $\pi^\sharp$ the corresponding mapping of measures.
Urns without replacement or with other removals, see (v), are treated in Section 5. We show that Theorem 1.3 holds in this case too, but the result in this case is less satisfactory than in the case without removals, and it cannot be directly applied to extend the results for this case in [12], see Section 5.
Remark 1.4. Many papers, including [3] and [12], consider only balanced Pólya urns, i.e., urns where the total number of balls added to the urn each time is deterministic, and thus the total number of balls in the urn after $n$ steps is a deterministic linear function of $n$; in the measure-valued context, this means that the total mass is $X_n(S) = an + b$, where $b = X_0(S)$. (We may without loss of generality assume $a = 1$ by rescaling.) We have no need for this assumption in the present paper.
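As an illustration of the finite-colour extensions (ii) and (iv) and of the balance condition in Remark 1.4, the following is a minimal sketch in Python. The function name `simulate_urn`, the example matrix, and the seed are hypothetical choices made for this sketch only; they do not come from the cited papers.

```python
import random

def simulate_urn(initial, replacement, steps, seed=0):
    """Simulate a d-colour urn with a fixed replacement matrix.

    initial[i] is the initial mass of colour i; replacement[i][j] is the
    mass of colour j added when a ball of colour i is drawn.
    Returns the list of urn states X_0, ..., X_steps.
    """
    rng = random.Random(seed)
    state = list(initial)
    history = [tuple(state)]
    for _ in range(steps):
        # Draw a colour with probability proportional to its current mass.
        i = rng.choices(range(len(state)), weights=state, k=1)[0]
        # The drawn ball is returned together with row i of the matrix.
        for j, r in enumerate(replacement[i]):
            state[j] += r
        history.append(tuple(state))
    return history

# A balanced example (hypothetical): every row sums to a = 3.
history = simulate_urn([1, 1], [[2, 1], [0, 3]], steps=50)
totals = [sum(x) for x in history]
```

Since every row of the example matrix sums to 3 and the initial mass is 2, the total mass after $n$ steps is $3n + 2$ regardless of which colours are drawn, in line with the formula $X_n(S) = an + b$.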

Preliminaries
We state some more or less well-known definitions and facts, adding a few technical details.

Measurable spaces.
A measurable space $(S, \mathcal{S})$ is a set $S$ equipped with a σ-field $\mathcal{S}$ of subsets of $S$. We often abbreviate $(S, \mathcal{S})$ to $S$ when the σ-field is evident. When $S = [0, 1]$ or another Polish topological space (i.e., a complete separable metric space), we tacitly assume $\mathcal{S} = \mathcal{B}(S)$, the Borel σ-field generated by the open subsets. If $X$ is a random element of $S$, its distribution is an element of $\mathcal{P}(S)$, denoted by $\mathcal{L}(X)$.

Borel spaces.
A Borel space is a measurable space that is isomorphic to a Borel subset of [0, 1]. This can be reformulated by the following standard result.
Lemma 2.1. The following are equivalent for a measurable space (S, S), and thus each property characterizes Borel spaces.
For a proof, see e.g. [5,Theorem 8.3.6] or [14,Theorem I.2.12]. An essentially equivalent statement is that any two Borel spaces with the same cardinality are isomorphic.
In Theorem 1.3, we consider only Borel spaces; Lemma 2.1 shows that this is no great loss of generality for applications.
Proof. By Lemma 2.1, we may assume that $S$ is a Borel subset of $[0, 1]$. Then, for every $B \in \mathcal{S}$ and $\mu \in \mathcal{M}_\pm(S)$,
Proof. $\mathcal{M}(S)$ is Borel as a special case of [10, Theorem 1.5]. Alternatively, by Lemma 2.1, we may assume that $S$ is a compact metric space with its Borel σ-field. Then, see e.g. [9, Theorem A2.3], $\mathcal{M}(S)$ is a Polish space, and its Borel σ-field equals the σ-field defined above for $\mathcal{M}(S)$; hence, $\mathcal{M}(S)$ is a Borel space.
Next, M * (S) and P(S) are measurable subsets of M(S) and thus also Borel spaces.
A probability kernel is a kernel that maps S into P(T ), i.e., a kernel µ such that µ s is a probability measure for every s ∈ S.
If $\mu$ is a probability kernel from $S$ to $T$ and $\nu$ is a probability measure on $S$, then a probability measure $\nu \otimes \mu$ is defined on $S \times T$ by
$$(\nu \otimes \mu)(A) := \int_S \int_T 1_A(s, t) \, d\mu_s(t) \, d\nu(s) \tag{2.3}$$
for measurable $A \subseteq S \times T$. Note that if the random element $(X, Y) \in S \times T$ has the distribution $\nu \otimes \mu$, then the marginal distribution of $X$ is $\nu \in \mathcal{P}(S)$; we denote the marginal distribution of $Y$ by $\nu \cdot \mu \in \mathcal{P}(T)$.
If $X$ and $Y$ are random elements of $S$ and $T$, respectively, then a regular conditional distribution of $Y$ given $X$ is a probability kernel $\mu$ from $S$ to $T$ such that for each $B \in \mathcal{T}$, $\mathbb{P}(Y \in B \mid X) = \mu_X(B)$ a.s. (I.e., $\mu_X(B)$ is a version of the conditional probability $\mathbb{P}(Y \in B \mid X)$.) This is easily seen to be equivalent to: $\mu$ is a probability kernel such that $(X, Y)$ has the distribution $\mathcal{L}(X) \otimes \mu$ given by (2.3).
If $\mu$ is a probability kernel from a measurable space $S$ to itself, and $\mu_0 \in \mathcal{P}(S)$ is any distribution, we can iterate (2.3) and define, for any $N \ge 1$, a probability measure $\mu_0 \otimes \mu \otimes \cdots \otimes \mu$ on $S^{N+1}$ such that if $(X_0, \dots, X_N)$ has this distribution, then $X_0, \dots, X_N$ is a Markov chain with initial distribution $X_0 \sim \mu_0$ and transitions given by the kernel $\mu$, i.e., $\mathbb{P}(X_n \in B \mid X_0, \dots, X_{n-1}) = \mathbb{P}(X_n \in B \mid X_{n-1}) = \mu_{X_{n-1}}(B)$ for any $B \in \mathcal{S}$ and $1 \le n \le N$. Moreover, these finite Markov chains extend to an infinite Markov chain $X_0, X_1, \dots$ with the transition kernel $\mu$.
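The construction of a Markov chain from an initial distribution and a probability kernel can be sketched on a finite state space as follows. This is only an illustration under simplifying assumptions: distributions are represented as dictionaries from states to probabilities, and the names `sample_chain` and `cyclic_kernel` are hypothetical.

```python
import random

def sample_chain(mu0, kernel, n_steps, seed=0):
    """Sample X_0, ..., X_{n_steps} with X_0 ~ mu0 and transitions `kernel`.

    mu0 is a dict {state: probability}; kernel(x) returns such a dict,
    playing the role of the probability measure mu_x.
    """
    rng = random.Random(seed)

    def draw(dist):
        states = list(dist)
        return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]

    x = draw(mu0)
    path = [x]
    for _ in range(n_steps):
        # The next state depends on the past only through the current state.
        x = draw(kernel(x))
        path.append(x)
    return path

def cyclic_kernel(x):
    # A degenerate (deterministic) kernel on {0, 1, 2}: move to x + 1 mod 3.
    return {(x + 1) % 3: 1.0}

path = sample_chain({0: 1.0}, cyclic_kernel, n_steps=4)
```

The degenerate kernel is used here only because it makes the sampled path predictable; any function returning a probability dictionary could be substituted.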
Remark 2.5. The existence of an infinite Markov chain follows without any condition on $S$ by a theorem by Ionescu Tulcea [9, Theorem 6.17]. (If $S$ is a Borel space, we may also, as an alternative, use Kolmogorov's theorem [9, Theorem 6.16].) The construction of an infinite Markov chain extends to any sequence of different measurable spaces $S_0, S_1, \dots$ and probability kernels $\mu_i$ from $S_{i-1}$ to $S_i$, $i \ge 1$, but we need here only the homogeneous case.
The assumption says that $C \subset A$. Furthermore, $C$ is closed under multiplication, since $\Psi_{h_1} \Psi_{h_2} = \Psi_{h_1 + h_2}$. It follows by the monotone class theorem, in e.g. the version given in [7, Theorem A.1], that $A$ contains every bounded function that is measurable with respect to the σ-field $F(C)$ generated by $C$.
Let again h ∈ B + (T ). Then, for every ν ∈ M(T ), Hence the mapping ν → h(ν) is F(C)-measurable. In particular, taking h = 1 B , it follows that ν → ν(B) is F(C)-measurable for every B ∈ T . Since these maps generate the σ-field of M(T ), it follows that if D ⊆ M(T ) is measurable, then 1 D is F(C)-measurable, and thus 1 D ∈ A. This means that is measurable for all such D, which means that s → µ s is measurable.
We shall also use the following lemma from [9].

Pólya urns
In this section, we give formal definitions of the Pólya urn model with an arbitrary colour space S. The state space of the urn process is M(S), or more precisely M * (S), since the process gets stuck and stops when there is no ball left in the urn.
In this section we consider for simplicity only urns with replacement and no removals, i.e., all replacements are positive. See Section 5 for the more general case.
We treat first the deterministic case defined and studied by [3] and [12]; our model is the same as theirs and we add only some technical details as a preparation for the random replacement case.
3.1. Deterministic replacements. The replacements are described by a replacement kernel, which is a kernel $R = (R_s)_{s \in S}$ from $S$ to itself, i.e., a measurable map $S \to \mathcal{M}(S)$; the interpretation is that if we draw a ball of colour $s$, then it is returned together with an additional measure $R_s$. More formally, we define, for $\mu \in \mathcal{M}_*(S)$, a function $\varphi_\mu : S \to \mathcal{M}_*(S)$ by
$$\varphi_\mu(s) := \mu + R_s; \tag{3.1}$$
thus if the composition of the urn is described by the measure $\mu$, and we draw a ball of colour $s$, then the new composition of the urn is $\varphi_\mu(s)$. Moreover, the ball is drawn with distribution $\mu' := \mu/\mu(S)$. Hence, letting $\varphi_\mu^\sharp : \mathcal{P}(S) \to \mathcal{P}(\mathcal{M}_*(S))$ denote the mapping of probability measures induced by $\varphi_\mu$, the composition after the draw has the distribution
$$\mathbf{R}_\mu := \varphi_\mu^\sharp(\mu') \in \mathcal{P}(\mathcal{M}_*(S)). \tag{3.2}$$
Since $\mu \in \mathcal{M}_*(S)$ implies $\varphi_\mu(s) \in \mathcal{M}_*(S)$ by (3.1), $\mathbf{R}$ is also a kernel from $\mathcal{M}_*(S)$ to itself. Finally, $\mathbf{R}$ is a probability kernel, since $\mathbf{R}_\mu$ is a probability measure by (3.2).
The Pólya urn process $(X_n)_{n=0}^\infty$ is the Markov process with values in $\mathcal{M}_*(S)$ defined as in Section 2.3 by the probability kernel $\mathbf{R}$ in (3.2) and an arbitrary initial state $X_0 \in \mathcal{M}_*(S)$. (In general $X_0$ may be random, but we assume for simplicity that $X_0$ is deterministic; this is also the case in most applications.)
Example 3.2. We illustrate the definition with a classical example.
Let $S$ be any measurable space and let the replacement kernel be $R_s = \delta_s$, i.e., $R_s(B) = 1_B(s)$ for $s \in S$ and $B \in \mathcal{S}$. This means that the drawn ball is returned together with another ball of the same colour. (Note that $\delta_s$ is well defined even if $\{s\} \notin \mathcal{S}$.) With $S = \{0, 1\}$ and $X_0$ an integer-valued measure, this is the urn studied by Markov [13], Eggenberger and Pólya [6] and Pólya [16].
The case when $S$ is an arbitrary Polish space and $X_0 \in \mathcal{M}(S)$ is arbitrary was studied by Blackwell and MacQueen [4]; they showed that $X_n/X_n(S)$ a.s. converges (in total variation) to a random discrete probability measure, with a so-called Ferguson distribution. See also Pitman [15, Exercises 2.2.6 and 0.3.2, and Section 3.2] (the case $S = [0, 1]$, which is no loss of generality by Lemma 2.1), which imply that the limit can be represented as $\sum_i P_i \delta_{\xi_i}$ with $\xi_i$ i.i.d. with distribution $X_0/X_0(S)$ and $(P_i)_{i=1}^\infty$ with the Poisson-Dirichlet distribution $\mathrm{PD}(0, X_0(S))$. By Lemma 2.1, the result of [4] extends to any Borel space $S$. In fact, the result holds for an arbitrary measurable space $S$; this can for example be seen by considering the same process on $S \times [0, 1]$, starting with $X_0 \times \lambda$, regarding the second coordinate as labels and using the result for $[0, 1]$; we omit the details.
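The urn with $R_s = \delta_s$ can be simulated directly when the initial measure is diffuse. The sketch below assumes, purely for illustration, that $S = [0, 1]$ and $X_0 = \theta \lambda$ with $\lambda$ the Lebesgue measure, so that $X_n$ is $\theta\lambda$ plus a sum of unit atoms; the function name `blackwell_macqueen` and the parameter values are hypothetical.

```python
import random
from collections import Counter

def blackwell_macqueen(theta, steps, seed=0):
    """Simulate the urn R_s = delta_s on S = [0, 1] started from theta * Lebesgue.

    Returns a Counter mapping each colour drawn so far to the mass of the
    atoms added at that colour; the diffuse part theta * Lebesgue is implicit.
    """
    rng = random.Random(seed)
    atoms = Counter()
    for n in range(steps):
        total = theta + n  # total mass X_n(S)
        if rng.random() * total < theta:
            # The draw falls in the diffuse part: a fresh uniform colour.
            s = rng.random()
        else:
            # The draw falls on an existing atom, chosen proportionally to mass.
            colours = list(atoms)
            s = rng.choices(colours, weights=[atoms[c] for c in colours], k=1)[0]
        atoms[s] += 1  # return the ball together with one more of colour s
    return atoms

atoms = blackwell_macqueen(theta=2.0, steps=1000)
weights = sorted((m / 1000 for m in atoms.values()), reverse=True)
```

In such a run the normalized atom masses in `weights` are typically dominated by a few colours, which is consistent with the discrete limit described above, although a single simulation is of course no substitute for the convergence result of [4].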

Random replacement.
For the more general version with random replacement, the replacement measures $R_s$, $s \in S$, are random. We let $\mathcal{R}_s := \mathcal{L}(R_s) \in \mathcal{P}(\mathcal{M}(S))$ for every $s \in S$; $\mathcal{R}_s$ is thus the distribution of the replacement, and we assume that $s \mapsto \mathcal{R}_s$ is a given probability kernel $S \to \mathcal{P}(\mathcal{M}(S))$. This means that $\varphi_\mu(s)$ in (3.1) is a random measure in $\mathcal{M}_*(S)$, with a distribution that we denote by $\Phi_\mu(s) \in \mathcal{P}(\mathcal{M}_*(S))$. Note that for a fixed $\mu \in \mathcal{M}_*(S)$, the map $\psi_\mu : \nu \mapsto \mu + \nu$ is measurable $\mathcal{M}(S) \to \mathcal{M}_*(S)$, and thus induces a measurable map $\psi_\mu^\sharp : \mathcal{P}(\mathcal{M}(S)) \to \mathcal{P}(\mathcal{M}_*(S))$; furthermore, $\Phi_\mu(s) = \psi_\mu^\sharp(\mathcal{R}_s)$.
If we draw from an urn with composition $\mu \in \mathcal{M}_*(S)$, then the drawn colour $s$ has as above distribution $\mu' := \mu/\mu(S)$, and the resulting urn has thus a distribution $\mathbf{R}_\mu$ that is the corresponding mixture of the distributions $\Phi_\mu(s)$, i.e., in the notation of Section 2.3, see (2.3) and the comments after it,
$$\mathbf{R}_\mu := \mu' \cdot \Phi_\mu \in \mathcal{P}(\mathcal{M}_*(S)).$$
$$\tilde R_{s,u} := f(s, u) \times \lambda. \tag{4.1}$$
The mapping $\mu \mapsto \mu \times \lambda$ is measurable $\mathcal{M}(S) \to \mathcal{M}(\tilde S)$; hence $\tilde R_{s,u}$ is measurable, and thus a kernel from $\tilde S$ to itself. We now let $(\tilde X_n)_{n=0}^\infty$ be the Pólya urn process in $\mathcal{M}(\tilde S)$ defined by the replacement kernel $\tilde R$, with initial value $\tilde X_0 = X_0 \times \lambda$.
We claim that we can couple the processes such that $\tilde X_n = X_n \times \lambda$ for every $n \ge 0$. We prove this by induction. Given $\tilde X_n = \mu \times \lambda$, we draw a ball $(s, u)$ with the distribution $(\mu \times \lambda)' = \mu' \times \lambda$, which means that $s$ has distribution $\mu'$ and $u$ is uniform and independent of $s$; hence, given $s$, $\tilde R_{s,u} = f(s, u) \times \lambda$ has the same distribution as $f(s, U) \times \lambda \overset{d}{=} R_s \times \lambda$. We may thus assume (formally by the transfer theorem [9, Theorem 6.10]) that $\tilde R_{s,u} = R_s \times \lambda$, and thus $\tilde X_{n+1} = \tilde X_n + \tilde R_{s,u} = (X_n + R_s) \times \lambda = X_{n+1} \times \lambda$.
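The idea of using the extra coordinate $u \in [0, 1]$ to carry the randomness can be sketched in a toy case with two colours. Everything specific here is an assumption made for illustration: the colour space $\{0, 1\}$, the probabilities in `P_SAME`, and the particular function `f`, which adds one ball of the drawn colour with probability `P_SAME[s]` and one ball of the other colour otherwise. Since the state is always of the form $\mu \times \lambda$, the code tracks only the marginal $\mu$.

```python
import random

# Hypothetical random replacement: after drawing colour s, add one ball of
# colour s with probability P_SAME[s], otherwise one ball of colour 1 - s.
P_SAME = {0: 0.7, 1: 0.4}

def f(s, u):
    """Deterministic function of (s, u) such that f(s, U) has the law of the
    random replacement R_s when U is uniform on [0, 1)."""
    return {s: 1} if u < P_SAME[s] else {1 - s: 1}

def step_enlarged(mu, rng):
    """One step of the urn on S x [0, 1] with state mu x lambda.

    The drawn ball (s, u) has s ~ mu / mu(S) and u uniform, independent of s;
    the replacement f(s, u) x lambda is a deterministic function of the ball.
    """
    colours = list(mu)
    s = rng.choices(colours, weights=[mu[c] for c in colours], k=1)[0]
    u = rng.random()
    new = dict(mu)
    for colour, mass in f(s, u).items():
        new[colour] = new.get(colour, 0) + mass
    return new

rng = random.Random(0)
mu = {0: 1, 1: 1}
for _ in range(100):
    mu = step_enlarged(mu, rng)
```

All randomness enters through the draw of the ball $(s, u)$; once the ball is known, the replacement is fixed, which is the sense in which the enlarged urn has deterministic replacements.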

Urns without replacement or with other subtractions
The models in Section 3 can easily be extended to urns without replacement or with removals (subtractions) of other balls.

Deterministic replacements.
In the deterministic case, we let the replacements $R_s$ be given by a measurable map $S \to \mathcal{M}_\pm(S)$. We assume that we are given some measurable subset $M_0$ of $\mathcal{M}_*(S)$ such that for every $\mu \in M_0$,
$$R_s + \mu \in M_0 \quad \text{for $\mu$-a.e. } s. \tag{5.1}$$
I.e., by (3.1), $\varphi_\mu(s) \in M_0$ $\mu$-a.e., which means that $\mathbf{R}_\mu$ in (3.2) is a probability measure on $M_0 \subseteq \mathcal{M}_*(S)$. Lemma 3.1 is modified to say that $\mathbf{R}$ is a probability kernel from $M_0$ to itself; the proof is the same. Then, assuming also $X_0 \in M_0$, the Pólya urn process is defined by the kernel $\mathbf{R}$ as before; we have $X_n \in M_0$ for every $n$.
The interpretation is that the drawn ball is discarded, and instead we add a set of balls described by the (positive, integer-valued) measure $R_s + \delta_s$; this is thus the classical case of drawing without replacement, see (v) in Section 1. Then (5.1) holds, and thus a Pólya urn process is defined for any initial $X_0 \in \mathcal{N}_*(S)$. If the urn is balanced, this is essentially the same as "κ-discrete MVPPs" in [12].
thus allowing $0 \in M_0$ and consequently $R_s = -\mu$ in (5.1), which means that we remove all balls from the urn, leaving the urn empty, i.e., $X_{n+1} = 0$. In this case, we stop the process, and define $X_m = 0$ for all $m > n$. Formally, $\mathbf{R}$ as defined in (3.2) then is a probability kernel from $M_0 \setminus \{0\}$ to $M_0$; we extend it to a kernel from $M_0$ to $M_0$ by defining $\mathbf{R}_0 = \delta_0$. We leave further details for this case to the reader.

Random replacements.
In the random case, we similarly assume that (5.1) holds a.s., for some measurable $M_0 \subseteq \mathcal{M}_*(S)$, every $\mu \in M_0$ and $\mu$-a.e. $s$. We assume that $S$ is a Borel space; then Lemma 2.2 implies that $M_0$ is a measurable subset of $\mathcal{M}_\pm(S)$, and thus so is, for every $\mu$,
$$M_\mu := \{\nu \in \mathcal{M}_\pm(S) : \nu + \mu \in M_0\}.$$
Hence the condition is that $\mathcal{R}$ is a probability kernel from $S$ to $\mathcal{M}_\pm(S)$ such that for every $\mu \in M_0$, $\mathcal{R}_s(M_\mu) = 1$ for $\mu$-a.e. $s$. Then the argument in Section 3.2 shows that $\mathbf{R}$ is a probability kernel from $M_0$ to itself, and thus defines a Pólya urn process for any initial $X_0 \in M_0$.
Theorem 1.3 holds in this setting too, with the same proof given in Section 4; the deterministic urn $\tilde X_n$ in $S \times [0, 1]$ is defined as in Section 5.1 using $\tilde M_0 := \{\mu \times \lambda : \mu \in M_0\} \subseteq \mathcal{M}_*(S \times [0, 1])$.
Example 5.4 (Random drawing without replacement). Let $\mathcal{N}_*(S)$ be as in Example 5.1 and assume that $R_s$ is a random replacement such that (5.2) holds a.s. for every $s \in S$; as always we assume also that $s \mapsto \mathcal{R}_s := \mathcal{L}(R_s) \in \mathcal{P}(\mathcal{M}_\pm(S))$ is measurable. Then (5.1) holds a.s. for every $\mu \in \mathcal{N}_*(S)$ and $\mu$-a.e. $s$, and thus $R_s$ defines a Pólya urn process for any initial $X_0 \in \mathcal{N}_*(S)$. Theorem 1.3 gives an equivalent urn in $S \times [0, 1]$ with deterministic replacements. Note, however, that this deterministic urn is of the type in Example 5.2, and not of the simpler type in Example 5.1, unlike the random urn. Hence, Theorem 1.3 may be less useful in this setting.