Marginals of multivariate Gibbs distributions with applications in Bayesian species sampling

Gibbs partition models are the largest class of infinite exchangeable partitions of the positive integers generalizing the product form of the probability function of the two-parameter Poisson-Dirichlet family. Recently those models have been investigated in a Bayesian nonparametric approach to species sampling problems as alternatives to the Dirichlet and the Pitman-Yor process priors. Here we derive marginals of conditional and unconditional multivariate distributions arising from exchangeable Gibbs partitions to obtain explicit formulas for joint falling factorial moments of corresponding conditional and unconditional Gibbs sampling formulas. Our proofs rely on a known result on factorial moments of sum of non independent indicators. We provide an application to a Bayesian nonparametric estimation of the predictive probability to observe a species already observed a certain number of times.

where the combinatorial factor accounts for the number of partitions of [n] in which the j-th block in order of appearance has n j elements. When the order of the blocks is irrelevant an alternative, more tractable coding for the joint distribution (3) is in exchangeable random order (cfr. Eq. (2.7) in [30]) (1−α) nj −1 , (4) that, from now on, we term multivariate Gibbs distribution of parameters (n, α, V ). Corresponding Gibbs sampling formula, encoding the partition of n by the vector of the numbers of blocks of different sizes, is obtained by the obvious change of variable in (4) and is given by P α,V (C 1,n = c 1 , . . . , C n,n = c n ) = n!V n,k for c i = k j=1 1{n j = i}, for i = 1, . . . , n, n i=1 ic i = n and n i=1 c i = k. Note that this is the general Gibbs analog of the Ewens sampling formula (cfr. [8]) P θ (C 1,n = c 1 , . . . , C n,n = c n ) = n!θ k (θ) n encoding by the vector of counts the Dirichlet (θ) partition model, ( [13,22]), whose EPPF is well-known to arise for α = 0 in (1). A comprehensive reference for the study of (5), also called component frequency spectrum, for general combinatorial random structures is [1].
In this paper we study marginals of (4), both conditional and unconditional, in order to derive joint falling factorial moments of corresponding conditional and unconditional sampling formulas. Our main motivation comes from applications in Bayesian nonparametric estimation in species sampling problems. In this setting, given n observations from a population of species with multiplicities of the first k species observed (n 1 , . . . , n k ), interest may lie in conditional predictive estimation of quantities related to a further sample of m observations (cfr. e.g. [24], [25]), or in conditional estimation of some diversity index of the whole population, (see e.g. [5]). A common prior assumption in the Bayesian nonparametric approach is that the unknown relative abundances (P i ) i≥1 of the species in the population follow a random discrete distribution belonging to the Gibbs family, i.e. are such that, by Kingman's correspondence (cfr. [23]), where (i 1 , . . . , i k ) ranges over all ordered k-tuples of distinct positive integers. This is equivalent to assume that the theoretically infinite sequence of species labels (X i ) i≥1 is exchangeable with almost surely discrete de Finetti measure representable as P (·) = ∞ i=1 P i δ Yi (·), for (P i ) any rearrangement of the ranked frequencies (P ↓ i ) satisfying (7), independent of (Y i ) ∼ IID H(·), for H some non atomic probability distribution.
Actually the study of conditional Gibbs structures in this perspective has been initiated in [24] and [25] and some results for conditional falling factorial moments of components of (5) are in [11]. Nevertheless in those papers some confusion arises between conditional EPPFs, and conditional multivariate distributions of the vector of sizes and number of the blocks in exchangeable random order, which heavily affects the complexity of the proofs.
Here, after deriving marginals of conditional and unconditional multivariate Gibbs distributions, we obtain joint falling factorial moments of any order of (5), both conditional and unconditional, and explicit formulas for some distributions of interest generalizing some particular cases obtained in [11], in a direct way. Our analysis, besides providing a more effective technique for the study of Gibbs sampling formulas, with a view toward Bayesian nonparametric applications, establishes the first systematic study of joint multivariate distributions arising from Gnedin-Pitman's Gibbs partition models. The paper is organized as follows: in Section 2 we provide marginals of (4) and, resorting to a result in [20] for sum of non independent indicators, derive general formulas for joint falling factorial moments of (5), together with some explicit marginal distributions and their expected values. In Section 3 we derive conditional multivariate Gibbs distributions and their marginals, for sizes and number of new blocks induced by the additional m-sample. A complete analysis is performed for conditional Gibbs sampling formulas exploiting the same technique adopted in Section 2. In Section 4 we focus on multivariate Pólya-like distributions arising by the conditional allocation of the additional sample in old blocks. Finally, in Section 5, we provide an application of marginals of multivariate Gibbs distributions to a Bayesian nonparametric estimation of a m-step ahead probability to detect at observation n + m + 1 a species already observed a certain number of times.

Marginals of multivariate Gibbs distributions
To obtain the marginal distributions for general Multivariate Gibbs distributions (4) it is enough to resort to the definition of generalized central Stirling numbers (cfr. Eq. 1.9 and 1.19 in [30]) (see the Appendix for further details) where the sum ranges over all (n 1 , . . . , n k ) compositions of n. From now on we refer to (4) omitting the ex power in the notation.

Corollary 1.
By a known result in [16] for each model (2) the number of blocks K n has distribution hence, conditioning (8) on K n = k yields independently of the specific Gibbs model. For r = k this is the general Gibbs analog of Eq. (41.8) in [9], and for r = 1, 0 ≤ k − 1 ≤ n − n 1 and n 1 = 1, . . . , n − k + 1 generalized non-central Stirling numbers (see (57) in the Appendix). Marginalizing (8) with respect to K n yields

Joint factorial moments of Gibbs sampling formulas
Joint falling factorial moments for the Ewens' sampling formula (6) of order (r 1 , . . . , r n ), for r l non negative integers and n − l lr l ≥ 0, are in [9] (cfr. Eq. (41.9)) and correspond to Under the same conditions, the generalization to the (α, θ) Poisson-Dirichlet partition model (1) has been obtained is [33] and is given by In the following Proposition we obtain the general result for the Gibbs sampling formula (5) by resorting to a result in [20], first established in [7] then studied in [21]. See also [18,19].
Proposition 2. Under a general (α, V ) Gibbs partition model, joint falling factorial moments of the vector of counts (C 1,n , . . . , C n,n ) of order (r 1 , . . . , r n ) for l lr l ≤ n are given by lr l . For r l = r ≤ ⌈ n l ⌉ and r j = 0 for every j = l, the r-th falling factorial moment of C l,n results Proof. For K n = k, let C l,n = ..,ar) P(N a1 = l, . . . , N ar = l), (12) where the summation is extended over all r-combinations (a 1 , . . . , a r ) of {1, . . . , k}. Since in our case the number of blocks K n is random, and the vector (N 1 , . . . , N r |K n = k) is exchangeable then, for l = 1, . . . , n, Notice that (10) generalizes the result in [11] Eq. (11), stated in terms of generalized factorial coefficients, (cfr. Eq. (53) and (54) in the Appendix), which corresponds to (11). Next Proposition generalizes Proposition 2 (Dirichlet case) and Proposition 4 (two parameter Poisson-Dirichlet case) in [11].
Proposition 3. Under a general (α, V ) Gibbs partition model, for each n ≥ 1 the law of C l,n , the number of blocks of size l, has distribution (15) and the distribution of the number of singleton species C 1,n follows from (14) with expected value Proof. (14) arises by the known relationship between discrete probability distributions and falling factorial moments (cfr. (58) in the Appendix). (15) follows from (11) for r = 1. (16) and (17) follow for l = 1 and generalize (41.10) and (41.11) in [9] to the entire Gibbs family.

Conditional multivariate Gibbs distributions
The study of conditional exchangeable random partitons, i.e. random partitions starting with an initial allocation of the first n natural integers in a certain number k of blocks, has been initiated in [24] in view of proposing a Bayesian conditional nonparametric estimation of the richness of a population of species under priors on the unknown relative abundances belonging to the Gibbs class. (See also [15], Sect. 7). In [2] it has been shown that the corresponding conditional partition probability function, describing the conditional allocation in new and old blocks of integers n + 1, n + 2, . . . , can be obtained by a multi-step variation of the classical Chinese restaurant process construction (CRP) for exchangeable partitions, (first devised by Dubins and Pitman, see [30] Ch. 3). This variation helps to properly place the Bayesian nonparametric approach to species sampling problems under Gibbs priors into the Gnedin-Pitman's exchangeable random partitions theoretical framework. Here we recall the multi-step CRP for completeness. (2), assume that an unlimited numbers of groups of customers arrive sequentially in a restaurant with an unlimited numbers of circular tables, each capable of sitting an unlimited numbers of customers. Given the placement of the first group of n customers in a n = (n 1 , . . . , n j ) configuration in j tables, a new group of m ≥ 1 customers is a) all seated at the j old tables in configuration

Proposition 4. (Cerquetti, 2009) Given an infinite EPPF model
c) a subset s < m of the new customers is seated at k new tables in configuration (s 1 , . . . , s k ) and the remaining m − s customers are seated at the old tables in which, by the multiplicative property of rising factorials (45), simplifies to Now, as in [25], given the allocation of the first n integers in j blocks with multiplicities (n 1 , . . . , n j ), let K m be the number of new blocks generated by the additional m integers, (S 1 , . . . , S Km ) the vector of the sizes of the new blocks in exchangeable random order and S m = Proposition 5. Under a general (α, V ) Gibbs partition model the joint conditional distribution of (S m , K m , S 1 , . . . , S Km ), for S 1 , . . . , S Km in exchangeable random order, given the initial allocation of n integers in j blocks, corresponds to Moreover, conditioning on S m , by Eq. (11) in [25], yields while conditioning on K m , by Eq. (4) in [24], eliminates the dependency on the specific (V n,k ) Gibbs model as in (9) Remark 1. Further results for the conditional moments of any order of K m and for the conditional asymptotic distribution of a proper normalization of K m under (α, θ) Poisson-Dirichlet partition models are in [10]. A simplified approach to the posterior analysis of the two-parameter model exploiting the deletion of classes property and the Beta-Binomial distribution of S m |K n = j is in [3]. A general result for conditional α diversity for Poisson-Kingman partition models driven by the stable subordinator ( [29]) has been obtained in [4]. Remark 2. Notice that equations (21), (22), and (23) fix corresponding formulas (9), (19) and (34) in [25] which are missing the combinatorial coefficients. The problem in Lijoi et al. (2008) seems to follow from some confusion between conditional Gibbs EPPFs, as arising from the multistep sequential construction of Proposition 4, and joint conditional distributions of the corresponding random vectors. We stress here that an EPPF provides the probability of a particular partition characterized by a certain allocation in a certain number of blocks with certain multiplicities. This differs from the probability of the random vector of the multiplicities to assume that specific value, which is obtained by summing over all different partitions providing the same multiplicities in the same number of blocks. The results in the following sections show that once the corrected formulas for the joint conditional distribution are properly identified, the derivation of estimators for quantities of interest in Bayesian nonparametric species sampling modeling simply follows by working with joint conditional marginals of (24).
The first step is to define the conditional analog of (4).
In the next Proposition, mimicking the technique adopted in the previous section for the unconditional case, we derive marginals of (24) as the tools to obtain joint conditional falling factorial moments of the conditional Gibbs sampling formula. For W l,m = Km i=1 1{S i = l} this is given by the usual change of variable in (24), hence In what follows we will resort to the convolution relation which defines noncentral generalized Stirling numbers in terms of central generalized Stirling numbers see the Appendix (cfr. (55)) for further details.
The following Proposition generalizes Theorem 2. in [11]. We adopt the notation W  Proof. By the analogy between (27) and (8) the proof moves along the same lines as the proof of Proposition 2, exploiting the marginals obtained in Proposition 6.
Remark 3. Notice the great computational advantage provided by the technique based on marginals of multivariate Gibbs distributions devised in the previous section with respect to the complexity of the approach adopted in [11]. (28) immediately follows as the conditional analog of the result obtained in Proposition 2 without any need to provide a new proof.
Proposition 8. Under a general (α, V ) Gibbs partition model the marginal distribution of (25) for x = 0, . . . , ⌈m/l⌉ corresponds to Its expected value, which provides the Bayesian nonparametric estimator under quadratic loss function, for the number of new species represented l times, arises from (29) for r = 1 The conditional distribution of the number of new singleton species W (n) 1,m will be

Multivariate Pólya-Gibbs distributions
In this Section we focus on the conditional random allocation of the additional m integers in the j old blocks.
Next Proposition provides the general marginal that we need to obtain joint falling factorial moments of the vector of counts corresponding to (32).
Proof. By (32), the jont marginal of the first r blocks and S m is easily obtained as marginalizing with respect to S m , and multiplying and dividing by (m− r i=1 m i )! yields , and the result follows by an application of (26).

Now let O
, and the one-dimensional marginal of (36) corresponds to (37) The following result easily follows from (35) as the analog of Propositions 2 and 7. n+m,m ), after the allocation of the additional m-sample, given the initial allocation n 1 , . . . , n k is given by for Ξ r1 = (ξ 1 , . . . , ξ r1 ), . . . , Ξ rn+m = (ξ l r l −rn+m , . . . , ξ n+m l=1 r l ), ξ i : n ξi ≤ l, and each Ξ r l ranges over all the combinations of r l elements of j. For r l = r and r j = 0 for j = l, then for ξ i : n ξi ≤ l, which agrees with the result in Theorem 1. in [11].
Next Proposition generalizes the results in Proposition 5 (two-parameter Poisson-Dirichlet case) and Proposition 9 (one parameter Gnedin-Fisher case [15]) in [11] to the entire (α, V ) Gibbs family. l,m is given by Its expected value, which plays the role of the Bayesian nonparametric estimator, under quadratic loss function, of the number of old species represented l times, follows from (37) Multiplying for the way to choose t blocks among the old and r − t among the new for every t, from (29) and (38) we get which agrees with Theorem 3. in [11].
In the next section we provide one more example of the importance of working with marginals of conditional multivariate Gibbs distributions in the implementation of the Bayesian nonparametric approach to species sampling problems under Gibbs priors.

Bayesian nonparametric estimation of the probability to observe a species of a certain size
In species sampling problems, particularly in ecology or genomics, given a basic sample (n 1 , . . . , n j ), interest may be in estimating the probability to observe at step n + m + 1 a species already represented l times both belonging to an old species or to a new species eventually arising in the m-additional sample which is still not observed. This is the topic of a recent paper by Favaro et al. (2012b) and can be seen as a generalization of the problem of estimating the discovery probability, i.e. the probability to discover a new species, not represented in the previous n + m observations. A Bayesian nonparametric estimator of the discovery probability under general (α, V ) Gibbs partition models has been first derived in [24]. In this Section we show that working with marginals of conditional Gibbs multivariate distributions greatly simplifies the derivation of the results obtained in [12], thus providing another example of the importance of the technique proposed in this paper.
Given a basic sample (n 1 , . . . , n j ), but assuming as in [12] an intermediate m-sample still to be observed, the probability to observe a species represented l times among new species at observation n + m + 1 will be a random variable, namely for K m the random number of new species induced by the additional sample and W (n) l,m the random number of species represented l times in the additional sample given the basic sample.
In the following Proposition we show how the Bayesian nonparametric estimator, under quadratic loss function, of (40), (cfr. Theorem 2. in [12]), may be obtained in few elegant steps.
By analogous approach we provide a straightforward derivation for the Bayesian nonparametric estimator for the probability to observe a species represented l times among the old species, namely  The main reference is [30]. Additionally, to facilitate the reading of the results contained in [24,25,11] and [12], the relationship between central and non central generalized factorial coefficients and generalized Stirling numbers is reported.

A.1. Generalized rising factorials
For n = 0, 1, 2, . . . , and arbitrary real x and h, (x) n↑h denotes the nth factorial power of x with increment h (also called generalized rising factorial) where (x) n stands for (x) n↑1 , and (x) h↑0 = x h , for which the following multiplicative law holds From e.g. [26] (cfr. eq. 2.41 and 2.45) a binomial formula also holds, namely as well as a generalized version of the multinomial theorem, i.e.

A.2. Partitions and compositions
A partition of the finite set [n] = (1, . . . , n) into k blocks is an unordered collection of non-empty disjoint sets {A 1 , . . . , A k } whose union is [n], where the blocks A i are assumed to be listed in order of appearance, i.e. in the order of their least elements. The sequence (|A 1 |, . . . , |A k |) of the sizes of blocks, (n 1 , . . . , n k ), defines a composition of n, i.e. a sequence of positive integers with sum n and P k [n] denotes the space of all partitions of [n] with k blocks. From [30] (cfr. eq. (1.9)) the number of ways to partition [n] into k blocks and assign each block a W combinatorial structure such that the number of W -structures on a set of j elements is w j , in terms of sum over compositions of n into k parts is given by where B n,k (w • ) is a polynomial in variables w 1 , . . . , w n−k+1 known as the (n, k)th partial Bell polynomial.