Large deviation principles for the Ewens-Pitman sampling model

Let $M_{l,n}$ be the number of blocks with frequency $l$ in the exchangeable random partition induced by a sample of size $n$ from the Ewens-Pitman sampling model. We show that, as $n$ tends to infinity, $n^{-1}M_{l,n}$ satisfies a large deviation principle, and we characterize the corresponding rate function. A conditional counterpart of this large deviation principle is also presented. Specifically, given an initial sample of size $n$ from the Ewens-Pitman sampling model, we consider an additional sample of size $m$. For any fixed $n$ and as $m$ tends to infinity, we establish a large deviation principle for the conditional number of blocks with frequency $l$ in the enlarged sample, given the initial sample. Interestingly, the conditional and unconditional large deviation principles coincide, namely there is no long-lasting impact of the given initial sample. Potential applications of our results are discussed in the context of Bayesian nonparametric inference for discovery probabilities.

Furthermore, let $M_{l,n}$ be the number of blocks with frequency $l \ge 1$, so that $K_n = \sum_{1\le l\le n} M_{l,n}$ and $n = \sum_{1\le l\le n} l M_{l,n}$. Pitman [26] showed that
$$\lim_{n\to+\infty} \frac{M_{l,n}}{n^{\alpha}} = \frac{\alpha(1-\alpha)_{(l-1)}}{l!}\, S_{\alpha,\theta} \quad \text{a.s.,} \tag{5}$$
where $(x)_{(n)} = x(x+1)\cdots(x+n-1)$ denotes the rising factorial of $x$ of order $n$, with the proviso $(x)_{(0)} = 1$. In contrast, for $\alpha = 0$ and $\theta > 0$, $K_n$ and $M_{l,n}$ have a different asymptotic behaviour. Specifically, Korwar and Hollander [18] and Arratia et al. [1] showed that $\lim_{n\to+\infty} K_n/\log n = \theta$ and $\lim_{n\to+\infty} M_{l,n} = P_{\theta/l}$ almost surely, where $P_{\theta/l}$ is distributed according to a Poisson distribution with parameter $\theta/l$. See Arratia et al. [2], Barbour and Gnedin [4] and Schweinsberg [27] for recent generalizations and refinements of (4) and (5). Feng and Hoppe [14] further investigated the large $n$ asymptotic behaviour of the random variable $K_n$ and, in particular, established a large deviation principle for $K_n$. Specifically, for any $\alpha \in (0,1)$ and $\theta > -\alpha$, they showed that $n^{-1}K_n$ satisfies a large deviation principle with speed $n$ and rate function
$$I_{\alpha}(x) = \sup_{\lambda \in \mathbb{R}}\{\lambda x - \Lambda_{\alpha}(\lambda)\}, \tag{6}$$
where $\Lambda_{\alpha}(\lambda) = -\log\bigl(1-(1-e^{-\lambda})^{1/\alpha}\bigr)\mathbb{1}_{(0,+\infty)}(\lambda)$. In contrast, for $\alpha = 0$ and $\theta > 0$, it was shown by Feng and Hoppe [14] that $(\log n)^{-1}K_n$ satisfies a large deviation principle with speed $\log n$ and an explicit rate function depending on $\theta$. It is worth pointing out that the rate function (6) depends only on the parameter $\alpha$, which displays the different roles of the two parameters $\alpha$ and $\theta$ at different scales. We refer to Feng and Hoppe [14] for an intuitive explanation in terms of a Poisson embedding scheme for the Ewens-Pitman sampling model. In this paper we establish a large deviation principle for $M_{l,n}$. Specifically, for any $\alpha \in (0,1)$ and $\theta > -\alpha$ we show that, as $n$ tends to infinity, $n^{-1}M_{l,n}$ satisfies a large deviation principle with speed $n$, and we characterize the corresponding rate function $I^{\alpha}_{l}$. We also present a conditional counterpart of this large deviation principle. To this end, with a slight abuse of notation, we write $X \mid Y$ to denote a random variable whose distribution coincides with the conditional distribution of $X$ given $Y$. Moreover, let $(X_1,\ldots,X_n)$ be an initial sample from $P_{\alpha,\theta,\nu}$ featuring $K_n = j$ blocks with corresponding frequencies $\mathbf{N}_n = \mathbf{n}$, and let $(X_{n+1},\ldots,X_{n+m})$ be an additional unobserved sample. Recently, Lijoi et al. [19], Favaro et al. [9] and Favaro et al. [10] derived and investigated the conditional distributions of the number $K^{(n)}_m$ of new blocks generated by the additional sample and of the number $M^{(n)}_{l,m}$ of blocks with frequency $l \ge 1$ in $(X_1,\ldots,X_{n+m})$, given $(X_1,\ldots,X_n)$. In particular, they showed that
$$\lim_{m\to+\infty} \frac{K^{(n)}_m}{m^{\alpha}} \,\Big|\, (K_n = j, \mathbf{N}_n = \mathbf{n}) = S^{(n,j)}_{\alpha,\theta} \quad \text{a.s.} \tag{7}$$
and
$$\lim_{m\to+\infty} \frac{M^{(n)}_{l,m}}{m^{\alpha}} \,\Big|\, (K_n = j, \mathbf{N}_n = \mathbf{n}) = \frac{\alpha(1-\alpha)_{(l-1)}}{l!}\, S^{(n,j)}_{\alpha,\theta} \quad \text{a.s.,} \tag{8}$$
where $S^{(n,j)}_{\alpha,\theta} \stackrel{d}{=} B_{j+\theta/\alpha,\,n/\alpha-j}\, S_{\alpha,\theta+n}$, with $B_{j+\theta/\alpha,\,n/\alpha-j}$ and $S_{\alpha,\theta+n}$ being independent and distributed according to a Beta distribution with parameter $(j+\theta/\alpha, n/\alpha-j)$ and according to (3) with $q = \theta+n$, respectively. Intuitively, as suggested by the fluctuations (5) and (8), one may expect that $M_{l,n}$ and $M^{(n)}_{l,m} \mid (K_n, \mathbf{N}_n)$ have different asymptotic behaviours also in terms of large deviations, as $n$ and $m$ tend to infinity, respectively. Here we show that, for any fixed $n$ and as $m$ tends to infinity, $m^{-1}M^{(n)}_{l,m} \mid (K_n, \mathbf{N}_n)$ satisfies a large deviation principle with speed $m$ and rate function $I^{\alpha}_{l}$. In other terms, we show that there is no long-lasting impact of the given initial sample on the large deviations. A similar behaviour was recently observed in Favaro and Feng [11] with respect to the large deviation principles for $K_n$ and $K^{(n)}_m \mid (K_n, \mathbf{N}_n)$.
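To make the Legendre transform in (6) concrete, the following minimal Python sketch evaluates the Feng-Hoppe rate function for $n^{-1}K_n$ by numerical maximization. It is our own illustration, not code from any of the cited papers, and the function names are ours; the stable evaluation via `expm1` matters because $(1-e^{-\lambda})^{1/\alpha}$ approaches 1 rapidly as $\lambda$ grows.

```python
# Numerical sketch of I_alpha(x) = sup_lambda {lambda*x - Lambda_alpha(lambda)},
# with Lambda_alpha(lambda) = -log(1 - (1 - e^{-lambda})^{1/alpha}) for lambda > 0.
import numpy as np
from scipy.optimize import minimize_scalar

def Lambda(lam, alpha):
    """Limiting log-moment generating function of n^{-1}K_n; zero for lam <= 0."""
    if lam <= 0.0:
        return 0.0
    u = -np.expm1(-lam)                            # u = 1 - e^{-lam} in (0, 1)
    return -np.log(-np.expm1(np.log(u) / alpha))   # stable -log(1 - u**(1/alpha))

def rate_K(x, alpha):
    """Legendre transform at x in (0, 1), by numerical maximization over lambda."""
    obj = lambda lam: -(lam * x - Lambda(lam, alpha))
    res = minimize_scalar(obj, bounds=(1e-8, 60.0), method="bounded")
    return -res.fun

if __name__ == "__main__":
    alpha = 0.5
    for x in (0.1, 0.5, 0.9):
        print(f"I_{alpha}({x}) ~ {rate_K(x, alpha):.4f}")
    # sanity check: I(x) -> -log(alpha) as x -> 1 (all-singletons probability alpha^{n-1})
    print(rate_K(0.999, alpha), -np.log(alpha))
```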
The problem of studying conditional properties of exchangeable random partitions was first considered in Lijoi et al. [20]. Such a problem consists in evaluating, conditionally on the random partition $(K_n, \mathbf{N}_n)$ induced by a sample $(X_1,\ldots,X_n)$ from $P_{\alpha,\theta,\nu}$, the distribution of statistics of an additional sample $(X_{n+1},\ldots,X_{n+m})$. As observed in Lijoi et al. [19], these statistics have direct applications in Bayesian nonparametric inference for species sampling problems arising in ecology, biology, genetics, linguistics, etc. Indeed, from a Bayesian perspective, (2) is a nonparametric model for the individuals $X_i$ from a population with infinitely many species, where $\Pi$ is the prior distribution on the composition of such a population. The aforementioned $M^{(n)}_{l,m}$ is a representative statistic of practical interest; see, e.g., Griffiths and Spanò [17] and Bacallado et al. [3] for other statistics. In particular, $P[M^{(n)}_{l,m} = m_l \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ takes on the interpretation of the posterior distribution of the number of species with frequency $l$ in the enlarged sample $(X_1,\ldots,X_{n+m})$, given that $(X_1,\ldots,X_n)$ features $j$ species with frequencies $\mathbf{n}$. Hence $E_{\alpha,\theta}[M^{(n)}_{l,m} \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ is the corresponding Bayesian nonparametric estimator under a squared loss function. In such a framework our conditional large deviation principle provides a large $m$ approximation of the posterior probability $P[m^{-1}M^{(n)}_{l,m} \ge x \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$, for any $x \ge 0$; for large $m$ this is the right tail of the posterior proportion of species with frequency $l$ in the enlarged sample.
A closer inspection of the fluctuations (7) and (8) reveals that for $l = 1$ our conditional large deviation principle has a natural interpretation in the context of Bayesian nonparametric inference for discovery probabilities. Indeed, let $P[D^{(n)}_m \in \cdot \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ be the conditional, or posterior, distribution of the probability of discovering a new species at the $(n+m+1)$-th draw, given the random partition $(K_n, \mathbf{N}_n)$ induced by $(X_1,\ldots,X_n)$; the additional sample $(X_{n+1},\ldots,X_{n+m})$ is assumed to be unobserved. For large $m$, we show that the conditional distribution of $m^{-1}M^{(n)}_{1,m}$ provides an approximation of this posterior distribution and, in turn, of the estimator of the probability of discovering a new species at the $(n+m+1)$-th draw, namely $E_{\alpha,\theta}[D^{(n)}_m \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$, which first appeared in Lijoi et al. [19]. An illustration of these asymptotic estimators is presented using a genomic dataset. The interest in $E_{\alpha,\theta}[D^{(n)}_m \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ and in $P[D^{(n)}_m \ge x \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$, as well as in their large $m$ approximations, is related to the problem of determining the optimal sample size in species sampling problems. Indeed, this problem is typically faced by setting a threshold $\tau$ for an exact or approximate mean functional of $P[D^{(n)}_m \in \cdot \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$, and then making inference on the sample size $m$ for which this mean functional falls below, or above, $\tau$. This introduces a criterion for evaluating the effectiveness of further sampling.
The paper is structured as follows. In Section 2 we present the main result of the paper, namely the large deviation principle for $M_{l,n}$. Section 3 contains the conditional counterpart of this large deviation principle. In Section 4 we discuss potential applications of our conditional large deviation principle in the context of Bayesian nonparametric inference for species sampling problems.

Large deviations for $M_{l,n}$
For any $\alpha \in (0,1)$ and $\theta > -\alpha$ the large deviation principle for $M_{l,n}$ is established through a detailed study of the moment generating function of $M_{l,n}$. This is in line with the approach originally adopted in Feng and Hoppe [14] for $K_n$. For any $\lambda > 0$ let $y = 1 - e^{-\lambda}$, and let $G_{M_{l,n}}(y;\alpha,\theta)$, displayed in (9), be the moment generating function of the random variable $M_{l,n}$. Let $(y)_{[n]} = y(y-1)\cdots(y-n+1)$ be the falling factorial of $y$ of order $n$, with the proviso $(y)_{[0]} = 1$. Proposition 1 in Favaro et al. [10] provides an explicit expression for $E_{\alpha,\theta}[(M_{l,n})_{[r]}]$. Recalling that $(y)_{(n)} = \sum_{0\le i\le n}\sum_{0\le j\le i} |s(n,i)|\,S(i,j)\,(y)_{[j]}$, where $s$ and $S$ denote the Stirling numbers of the first and of the second kind, respectively, an explicit expression for $E_{\alpha,\theta}[(M_{l,n})_{(r)}]$ is obtained. Specifically, we obtain the expressions (10) and (11), in which the sum over $i$ is nonnull for $0 \le i \le \min(r, \lfloor n/l\rfloor)$. In the next lemma we provide an explicit expression for the moment generating function $G_{M_{l,n}}(y;\alpha,0)$. This result follows by combining (11) with the series expansion on the right-hand side of (9), and by means of standard combinatorial manipulations.
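As a quick sanity check of the combinatorial identity above, the following sympy snippet (our own illustration, not from the paper) verifies $(y)_{(n)} = \sum_{0\le i\le n}\sum_{0\le j\le i}|s(n,i)|S(i,j)(y)_{[j]}$ symbolically for small $n$.

```python
# Symbolic check of the identity linking rising and falling factorials
# through unsigned Stirling numbers of the first and second kind.
from sympy import symbols, rf, ff, expand, simplify
from sympy.functions.combinatorial.numbers import stirling

y = symbols('y')

def rising_from_falling(n):
    return sum(stirling(n, i, kind=1, signed=False) * stirling(i, j, kind=2) * ff(y, j)
               for i in range(n + 1) for j in range(i + 1))

for n in range(6):
    assert simplify(expand(rf(y, n)) - expand(rising_from_falling(n))) == 0
print("identity verified for n = 0,...,5")
```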
Lemma 1. For any $\alpha \in (0,1)$ the moment generating function $G_{M_{l,n}}(y;\alpha,0)$ admits the explicit expression displayed in (12).

In the next theorem, which is the main result of the paper, we exploit (12) and (10) in order to establish the large deviation principle for $M_{l,n}$. The proof of this theorem is split into three main parts. The first two parts deal with the large deviation principle for $M_{l,n}$ under the assumption $\alpha \in (0,1)$ and $\theta = 0$, whereas the third part deals with the general case $\alpha \in (0,1)$ and $\theta > -\alpha$.
Theorem 1. For any $\alpha \in (0,1)$ and $\theta > -\alpha$, as $n$ tends to infinity, $n^{-1}M_{l,n}$ satisfies a large deviation principle with speed $n$ and the rate function $I^{\alpha}_{l}$ displayed in (17). In particular, for almost all $x \in [0,1/l]$,
$$\lim_{n\to+\infty} \frac{1}{n}\log P[M_{l,n} \ge nx] = -I^{\alpha}_{l}(x).$$

Proof. In the first part of the proof we show that, assuming $\alpha \in (0,1)$ and $\theta = 0$, $n^{-1}M_{l,n}$ satisfies a large deviation principle with speed $n$, and we characterize the corresponding rate function $I^{\alpha}_{l}$. For large $n$, by means of (11) we obtain the bound (13); if $n/l$ is an integer, then the final term in (13) is $n\tilde y^{\,n/l}$. By direct calculation we have that $\lim_{n\to+\infty} n^{-1}\log E_{\alpha,0}[e^{\lambda M_{l,n}}] = 0$ for any $\lambda \le 0$. Also, for any $\lambda > 0$ the upper bound (14) holds. In particular, by combining the inequalities stated in (13) and (14), we obtain an upper bound for $\limsup_{n\to+\infty} n^{-1}\log E_{\alpha,0}[e^{\lambda M_{l,n}}]$. On the other hand, for any $\epsilon$ in $(0,1/l)$ there exists a sequence $(i_n)_{n\ge1}$ such that $(i_n/n)_{n\ge1}$ converges to $\epsilon$ as $n$ tends to infinity; for this particular sequence a matching lower bound is obtained in terms of a function $\varphi(\epsilon)$, so that
$$\Lambda_{\alpha,l}(\lambda) := \lim_{n\to+\infty} \frac{1}{n}\log E_{\alpha,0}[e^{\lambda M_{l,n}}] = \sup_{\epsilon \in (0,1/l)} \varphi(\epsilon).$$
Moreover, since $\varphi'(0+) = +\infty$ and $\varphi'(1/l-) = -\infty$, the function $\varphi(\epsilon)$ reaches a maximum at a point $\epsilon_0$ in $(0,1/l)$ where $\varphi'(\epsilon_0) = 0$. Clearly $\epsilon_0$ depends on $\alpha$, $l$ and $\lambda$. Moreover, note that $\varphi''(\epsilon) = -\alpha/[\epsilon(1-(l-\alpha)\epsilon)(1-l\epsilon)] < 0$, which implies that $\epsilon_0(\lambda)$ is unique and that $\Lambda_{\alpha,l}(\lambda) = \log[1 + \alpha\epsilon_0/(1-l\epsilon_0)]$. In particular, $\epsilon_0$ satisfies $h_1(\lambda) = h_2(\epsilon_0)$, with $h_1$ and $h_2$ as displayed in (15) and (16). Since both the functions $h_1$ and $h_2$ are strictly increasing with differentiable inverses, $\epsilon_0 = h_2^{-1}\circ h_1(\lambda)$ is a differentiable strictly increasing function of $\lambda$ and, in particular, $\lim_{\lambda\to0}\epsilon_0 = 0$ and $\lim_{\lambda\to+\infty}\epsilon_0 = 1/l$. Now, if we set $\Lambda_{\alpha,l}(\lambda)$ to be zero for nonpositive $\lambda$, then it is clear that $\{\lambda : \Lambda_{\alpha,l}(\lambda) < +\infty\} = \mathbb{R}$ and that $\Lambda_{\alpha,l}(\lambda)$ is differentiable for $\lambda \ne 0$. The left derivative of $\Lambda_{\alpha,l}(\lambda)$ at zero is clearly zero. On the other hand, since $\epsilon_0$ converges to zero as $\lambda \downarrow 0$, it follows from direct calculation that the right derivative at zero also vanishes. Accordingly, $\Lambda_{\alpha,l}(\lambda)$ is differentiable everywhere. By the Gärtner-Ellis theorem (see Dembo and Zeitouni [6] for details), a large deviation principle holds for $n^{-1}M_{l,n}$ on $\mathbb{R}$, as $n$ tends to infinity, with speed $n$ and good rate function $I^{\alpha}_{l}(x) = \sup_{\lambda}\{\lambda x - \Lambda_{\alpha,l}(\lambda)\}$. This completes the first part of the proof.

In the second part of the proof we further specify the rate function $I^{\alpha}_{l}$. In particular, let us rewrite $\Lambda_{\alpha,l}(\lambda)$ as $\Lambda_{\alpha,l}(\lambda) = \lambda/l + \tilde\Lambda_{\alpha,l}(\lambda)$, where we defined $\tilde\Lambda_{\alpha,l}(\lambda) = -\lambda/l$ for $\lambda \le 0$, and by the expression in (18) for $\lambda > 0$. For this representation to yield (17), we need to verify that $I^{\alpha}_{l}(x)$ is finite for $x$ in $(0,1/l]$. For any $\lambda \ge 1$, let $d_2$ be the value of $\epsilon_0$ at $\lambda = 1$. Then $\epsilon_0 \ge d_2$ for any $\lambda \ge 1$, and this implies that $\tilde\Lambda_{\alpha,l}(\lambda)$ is bounded for all $\lambda \ge 1$. Accordingly, the supremum over $\lambda \ge 1$ in (19) is finite for any $x$ in $(0,1/l]$, and (19) implies (18). This completes the second part of the proof.

Finally, in the third part of the proof we extend the large deviation principle to the case $\alpha \in (0,1)$ and $\theta > -\alpha$. By combining the definition (9) with (10), and by means of standard combinatorial manipulations, one relates $E_{\alpha,\theta}[e^{\lambda M_{l,n}}]$ to $E_{\alpha,0}[e^{\lambda M_{l,n}}]$ through a function $D$ such that $D(\alpha,\theta,n,0) = 1$, with an explicit expression for any $1 \le i \le \lfloor n/l\rfloor$. Since $\theta/\alpha > -1$, it follows from basic algebra that one can find positive constants, say $d_3$ and $d_4$, independent of $n$ and $i$, such that
$$d_3 n^{-k} \le D(\alpha,\theta,n,i) \le d_4 n^{k}, \tag{20}$$
where $k$ is the smallest integer greater than $1 + |\theta| + |\theta/\alpha|$. Accordingly, we have $\lim_{n\to+\infty} n^{-1}\log E_{\alpha,\theta}[e^{\lambda M_{l,n}}] = \Lambda_{\alpha,l}(\lambda)$. Then, for any $\alpha \in (0,1)$ and $\theta > -\alpha$, $n^{-1}M_{l,n}$ satisfies a large deviation principle with speed $n$ and rate function $I^{\alpha}_{l}$. This completes the third part of the proof.
In general it is difficult to obtain a more explicit expression for $I^{\alpha}_{l}$. Indeed, $\Lambda_{\alpha,l}$ depends on $\lambda$ only in an implicit form, namely through $h_2^{-1}\circ h_1(\lambda)$, where $h_1$ and $h_2$ are the functions in (15) and (16), respectively. However, under the assumption $\alpha = 1/2$ and $l = 1$, an explicit expression for $I^{\alpha}_{l}$ can be derived. For any $\alpha \in (0,1)$ and $\theta > -\alpha$, the rate function $I^{\alpha}_{l}$ displayed in (17) can be easily evaluated by means of standard numerical techniques.
Proposition 1. Let $B_1$ be the function specified in (23). Then, for any $x \in [0,1]$, the rate function $I^{1/2}_{1}$ admits the explicit expression obtained by combining (22) with (23).

Proof. Under the assumption $\alpha = 1/2$ and $l = 1$, the equation $h_1(\lambda) = h_2(\epsilon_0)$ can be solved explicitly, and the rate function takes the form (22), where $B$ solves the cubic equation (21) with the associated discriminant. Let $G(B)$ denote the left-hand side of (21). By a direct calculation it follows that $G'(B) = 0$ has two negative roots. This, combined with the fact that $G(0) = -2x < 0$, implies that one and only one of the three roots of (21) is positive. Denote this root by $B_1(x)$. Then the rate function is (22) evaluated at $B_1(x)$. Making the change of variable $C = B + 2/(3(1-x))$ in (21), we obtain a depressed form of the cubic equation, whose solution follows by a direct application of Viète's trigonometric formula. The proof is completed by combining the rate function (22) with the function $B_1$ in (23).
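The trigonometric step in the proof can be illustrated with a generic solver for a depressed cubic $t^3 + pt + q = 0$ with three real roots. The sketch below is our own; the test coefficients are arbitrary, since the actual coefficients of (21) depend on $x$ and are not reproduced here.

```python
# Viete's trigonometric formula for t^3 + p t + q = 0 with 4p^3 + 27q^2 <= 0,
# the device used to extract the positive root B_1(x) in Proposition 1.
import numpy as np

def viete_roots(p, q):
    """All three real roots of t^3 + p t + q = 0 via the trigonometric method."""
    assert 4 * p**3 + 27 * q**2 <= 0, "three real roots required"
    r = 2.0 * np.sqrt(-p / 3.0)
    phi = np.arccos(np.clip(3.0 * q / (2.0 * p) * np.sqrt(-3.0 / p), -1.0, 1.0))
    return np.array([r * np.cos(phi / 3.0 - 2.0 * np.pi * k / 3.0) for k in range(3)])

if __name__ == "__main__":
    p, q = -3.0, 1.0                                   # arbitrary test coefficients
    print(np.sort(viete_roots(p, q)))
    print(np.sort(np.roots([1.0, 0.0, p, q]).real))    # agreement with numpy
```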
To some extent Theorem 1 provides a generalization of the large deviation principle for $K_n$ established in Theorem 1.2 of Feng and Hoppe [14], for any $\alpha \in (0,1)$ and $\theta > -\alpha$. Indeed, recall the relations between $K_n$ and $M_{l,n}$: $K_n = \sum_{1\le l\le n} M_{l,n}$ and $n = \sum_{1\le l\le n} l M_{l,n}$. However, so far it is not clear to us how to relate the large deviation principle for $M_{l,n}$ to the large deviation principle for $K_n$. In this respect we believe that the results in Dinwoodie and Zabell [7] may be helpful in understanding such a relation.
The random variables displayed in (24) and (25) add up to the number $M^{(n)}_{l,m}$ of blocks with frequency $l$ in the enlarged sample $(X_1,\ldots,X_{n+m})$: this is the number of new blocks with frequency $l$ generated by $(X_{n+1},\ldots,X_{n+m})$ plus the number of old blocks with frequency $l$ that arise by updating, via $(X_{n+1},\ldots,X_{n+m})$, the frequencies already induced by $(X_1,\ldots,X_n)$. Specifically, the rising factorial moments displayed in (27) and (28) hold, in which the sum over $i$ is nonnull for $0 \le i \le \min(r, \lfloor m/l\rfloor)$. Note that the conditional distribution of the number $N^{(n)}_{l,m}$ of new blocks with frequency $l$ depends on the initial sample only through $K_n$; in other terms, the number $K_n$ of blocks in the initial sample is a sufficient statistic for $N^{(n)}_{l,m}$. This property of sufficiency was pointed out in Favaro et al. [10]. Along lines similar to Lemma 1, in the next lemma we provide an explicit expression for the moment generating function $G_{N^{(n)}_{l,m}}(y;\alpha,0)$.

Lemma 2. For any $\alpha \in (0,1)$ the moment generating function $G_{N^{(n)}_{l,m}}(y;\alpha,0)$ admits the explicit expression displayed in (29).

In the next theorem we exploit the moment generating function (29) and the rising factorial moment (27) in order to establish the large deviation principle for $M^{(n)}_{l,m} \mid (K_n, \mathbf{N}_n)$. Such a result provides a conditional counterpart of Theorem 1.

Theorem 2. For any $\alpha \in (0,1)$ and $\theta > -\alpha$, for any fixed $n$ and as $m$ tends to infinity, $m^{-1}M^{(n)}_{l,m} \mid (K_n, \mathbf{N}_n)$ satisfies a large deviation principle with speed $m$ and rate function $I^{\alpha}_{l}$.
Proof. As we anticipated, in order to prove the theorem it is sufficient to prove the large deviation principle for $N^{(n)}_{l,m} \mid K_n$, for any $\alpha \in (0,1)$ and $\theta > -\alpha$. We start with the assumption $\alpha \in (0,1)$ and $\theta = 0$, and then we consider the general case. From the moment generating function (29) we can write $E_{\alpha,0}[e^{\lambda N^{(n)}_{l,m}} \mid K_n = j]$ in terms of coefficients $C(i,m;n,j,\alpha,l)$. Note that, for any index $i$ such that $\lfloor m/l\rfloor + 1 \le i \le \lfloor (m+n)/l\rfloor$, these coefficients admit elementary lower and upper bounds involving the quantity $n+m+i\alpha-il-1$; arguing as in the first part of the proof of Theorem 1, it then follows that $m^{-1}N^{(n)}_{l,m} \mid K_n$ satisfies a large deviation principle with speed $m$ and rate function $I^{\alpha}_{l}$. In order to deal with the general case $\alpha \in (0,1)$ and $\theta > -\alpha$, one needs a termwise comparison between (27) and (28). In particular, for any $i \le m/l$ let us define the ratio $D(m,i;\alpha,\theta,n,j)$ of the corresponding terms. By an argument similar to the one used in deriving (20), it follows that one can find constants $d_5 > 0$ and $d_6 > 0$ and positive integers $k_1$ and $k_2$, independent of $m$ and $i$, such that
$$d_5 (n+m)^{-k_1} \le D(m,i;\alpha,\theta,n,j) \le d_6 (n+m)^{k_2},$$
which leads to $\lim_{m\to+\infty} m^{-1}\log E_{\alpha,\theta}[e^{\lambda N^{(n)}_{l,m}} \mid K_n = j] = \Lambda_{\alpha,l}(\lambda)$. Such a result, combined with Theorem 1, implies that $m^{-1}N^{(n)}_{l,m} \mid K_n$ satisfies a large deviation principle with speed $m$ and rate function $I^{\alpha}_{l}$. Hence, by a direct application of Corollary B.9 in Feng [13], $m^{-1}M^{(n)}_{l,m} \mid (K_n, \mathbf{N}_n)$ satisfies a large deviation principle with speed $m$ and rate function $I^{\alpha}_{l}$, and the proof is completed.
In contrast with the fluctuations (5) and (8), Theorem 1 and Theorem 2 show that, in terms of large deviations, the given initial sample $(X_1,\ldots,X_n)$ has no long-lasting impact: the large deviation principles for $M_{l,n}$ and $M^{(n)}_{l,m} \mid (K_n, \mathbf{N}_n)$ coincide as $n$ and $m$ tend to infinity, respectively. This is caused by the two different scalings involved, namely $m^{-1}$ for the large deviations and $m^{-\alpha}$ for the fluctuations. According to Corollary 20 in Pitman [23], conditioning on the initial sample $(X_1,\ldots,X_n)$ amounts to modifying the parameter $\theta$ in the conditional distribution of $P_{\alpha,\theta,\nu}$ given $(X_1,\ldots,X_n)$. Hence we conjecture that the conditional and the unconditional large deviation results will differ if $n$ is allowed to grow, leading to a large parameter $\theta$. In the unconditional setting this kind of asymptotic behaviour is discussed in Feng [12], where the parameter $\theta$ and the sample size $n$ grow together and the large deviation result depends on the relative growth rate between $n$ and $\theta$.
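Although large deviation probabilities themselves are too small to estimate by naive simulation, the conditional quantities discussed above are easy to explore by sampling sequentially from the predictive distribution (1). The following Python sketch is our own illustration; it assumes the standard form of the Ewens-Pitman predictive, under which a new block is created with probability $(\theta+\alpha K_n)/(\theta+n)$ and an existing block of frequency $n_i$ is updated with probability $(n_i-\alpha)/(\theta+n)$.

```python
# Monte Carlo exploration of M_{l,n} and its conditional counterpart M^{(n)}_{l,m}
# by sequential sampling from the two-parameter predictive distribution.
import numpy as np

rng = np.random.default_rng(0)

def grow(freqs, steps, alpha, theta):
    """Extend a partition with given block frequencies by `steps` draws."""
    freqs = list(freqs)
    n = sum(freqs)
    for _ in range(steps):
        probs = np.array([f - alpha for f in freqs] + [theta + alpha * len(freqs)])
        probs /= theta + n
        k = rng.choice(len(probs), p=probs)
        if k == len(freqs):
            freqs.append(1)          # a new block is discovered
        else:
            freqs[k] += 1            # an old block is updated
        n += 1
    return freqs

def M_l(freqs, l):
    return sum(1 for f in freqs if f == l)

if __name__ == "__main__":
    alpha, theta, l = 0.5, 1.0, 1
    # unconditional: M_{l,n}/n^alpha has the (random) a.s. limit in (5)
    part = grow([], 5000, alpha, theta)
    print("M_{1,n}/n^alpha ~", M_l(part, l) / 5000**alpha)
    # conditional: continue a fixed initial sample by m additional draws
    enlarged = grow(part, 5000, alpha, theta)
    print("M_{1,n+m} in the enlarged sample:", M_l(enlarged, l))
```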
If $m$ depends on $n$ and both approach infinity, then one can expect very different behaviours in terms of laws of large numbers and fluctuations. In this regime the large deviation principle for $M^{(n)}_{l,m} \mid (K_n, \mathbf{N}_n)$ may not be easily derived, by means of a direct comparison argument, from the large deviation principle for $N^{(n)}_{l,m} \mid K_n$. In this respect, it is helpful to study the moment generating function displayed in (33). As in Lemma 2, an explicit expression for (33) follows by combining the rising factorial moments of $M^{(n)}_{l,m}$ with the corresponding series expansion. We intend to pursue this study further in a subsequent project.

Discussion
Our large deviation results contribute to the study of conditional and unconditional properties of the Ewens-Pitman sampling model. Theorem 2 has potential applications in the context of Bayesian nonparametric inference for species sampling problems. Indeed, as we pointed out in the Introduction, in such a context $P[M^{(n)}_{l,m} \in \cdot \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ takes on the interpretation of the posterior distribution of the number of species with frequency $l$ in a sample $(X_1,\ldots,X_{n+m})$ from $P_{\alpha,\theta,\nu}$, given the initial observed sample $(X_1,\ldots,X_n)$ featuring $K_n = j$ species with corresponding frequencies $\mathbf{N}_n = \mathbf{n}$. The reader is referred to Favaro et al. [10] for a comprehensive account of this posterior distribution, with applications to Bayesian nonparametric inference for the so-called rare, or local, species variety.
For large $m$, $m^{-1}M^{(n)}_{l,m}$ is the random proportion of species with frequency $l$ in $(X_1,\ldots,X_{n+m})$. In Theorem 2 we characterized the rate function $I^{\alpha}_{l}$ of the conditional large deviation principle associated to such a random proportion. The rate function $I^{\alpha}_{l}$ is nondecreasing over the set $[0,1/l]$. Hence the number of discontinuity points of $I^{\alpha}_{l}$ is at most countable, and therefore $\inf_{z\ge x} I^{\alpha}_{l}(z) = \inf_{z>x} I^{\alpha}_{l}(z)$ for almost all $x \in [0,1/l]$. Accordingly, for almost all $x \in [0,1/l]$, Theorem 2 provides a large $m$ approximation of $P[m^{-1}M^{(n)}_{l,m} \ge x \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$, for any $x \ge 0$, namely the approximations displayed in (34) and (35). Hereafter we thoroughly discuss $T^{(n)}_{1,m}$ in the context of Bayesian nonparametric inference for discovery probabilities. In particular, we introduce a novel approximation, for large $m$, of the posterior distribution of the probability of discovering a new species at the $(n+m+1)$-th draw. Such an approximation, then, induces a natural interpretation of $T^{(n)}_{1,m}$ in the context of Bayesian nonparametric inference for the probability of discovering a new species at the $(n+m+1)$-th draw.
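As a toy illustration of the sample-size criterion mentioned in the Introduction, note that under the rough tail approximation $P[m^{-1}M^{(n)}_{l,m} \ge x \mid \cdot] \approx e^{-mI^{\alpha}_{l}(x)}$, the smallest $m$ bringing this tail below a threshold $\tau$ is $\lceil \log(1/\tau)/I^{\alpha}_{l}(x)\rceil$. The snippet below is our own, and the numbers are hypothetical.

```python
# Smallest additional sample size m with exp(-m * I(x)) <= tau.
import math

def minimal_additional_sample(I_x, tau):
    """Assumes a rate value I_x = I(x) > 0 and a threshold tau in (0, 1)."""
    return math.ceil(math.log(1.0 / tau) / I_x)

# hypothetical inputs: a rate value I(x) = 0.02 and a threshold tau = 0.01
print(minimal_additional_sample(0.02, 0.01))   # -> 231
```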

Discovery probabilities and large deviations
Let $D^{(n)}_m$ be the probability of discovering a new species at the $(n+m+1)$-th draw. Since the additional sample $(X_{n+1},\ldots,X_{n+m})$ is not observed, $D^{(n)}_m \mid (K_n, \mathbf{N}_n)$ is a random probability, the randomness being determined by $(X_{n+1},\ldots,X_{n+m})$. In particular, by means of the predictive distribution (1), we observe that $P[D^{(n)}_m \in \cdot \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ is related to $P[K^{(n)}_m \in \cdot \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ as follows:
$$D^{(n)}_m \mid (K_n = j, \mathbf{N}_n = \mathbf{n}) \stackrel{d}{=} \frac{\theta + (j + K^{(n)}_m)\alpha}{\theta+n+m}\,\Big|\,(K_n = j, \mathbf{N}_n = \mathbf{n}), \tag{36}$$
where the conditional, or posterior, distribution $P[K^{(n)}_m \in \cdot \mid K_n = j]$ was obtained in Lijoi et al. [19] and then investigated in Favaro et al. [9]. Specifically, let $\mathscr{C}(n,k;\alpha,\gamma)$ be the noncentral generalized factorial coefficient; see Charalambides [5] for details. Then, for any $k = 0,1,\ldots,m$, explicit expressions for $P[K^{(n)}_m = k \mid K_n = j]$ and for $E_{\alpha,\theta}[K^{(n)}_m \mid K_n = j]$ are displayed in (37) and (38), respectively. The distribution (36) is the posterior distribution of the probability of discovering a new species at the $(n+m+1)$-th draw; an explicit expression for this distribution is obtained by means of (37). Also, $\hat D^{(n)}_m = E_{\alpha,\theta}[D^{(n)}_m \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ is the Bayesian nonparametric estimator, with respect to a squared loss function, of the probability of discovering a new species at the $(n+m+1)$-th draw. An explicit expression for this estimator is obtained by combining (36) with (38).
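The noncentral generalized factorial coefficients can be tabulated by a triangular recurrence. The sketch below is our own and assumes the defining relation $(\sigma t+\gamma)_{[n]} = \sum_{0\le k\le n}\mathscr{C}(n,k;\sigma,\gamma)(t)_{[k]}$, one common convention (Charalambides [5] should be consulted for the exact convention used in (37)); the code verifies the identity it implements, so the recurrence is internally consistent with the stated definition.

```python
# Tabulating noncentral generalized factorial coefficients C(n,k;sigma,gamma)
# via the recurrence implied by (s*t+g)_[n+1] = (s*t+g)_[n] * (s*t+g-n).
from sympy import symbols, ff, expand, simplify, Rational

def gen_fact_coeffs(n_max, sigma, gamma):
    C = {(0, 0): 1}
    for n in range(n_max):
        for k in range(n + 2):
            C[(n + 1, k)] = (sigma * k + gamma - n) * C.get((n, k), 0) \
                            + sigma * C.get((n, k - 1), 0)
    return C

if __name__ == "__main__":
    t = symbols('t')
    sigma, gamma, n = Rational(1, 2), Rational(-3, 2), 4
    C = gen_fact_coeffs(n, sigma, gamma)
    lhs = expand(ff(sigma * t + gamma, n))
    rhs = expand(sum(C.get((n, k), 0) * ff(t, k) for k in range(n + 1)))
    assert simplify(lhs - rhs) == 0
    print("defining identity verified for n =", n)
```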
We introduce a large $m$ approximation of $P[D^{(n)}_m \in \cdot \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ and a corresponding large $m$ approximation of the Bayesian nonparametric estimator $\hat D^{(n)}_m$. This approximation sets a novel connection between the posterior distributions of $M^{(n)}_{1,m}$ and of $D^{(n)}_m$, via the asymptotic results (8) and (39). Equivalent, though less rough, normalization rates for (8) and (39) are
$$r_M(m;\alpha,\theta,n,j,m_1) = \frac{\Gamma(\theta+\alpha+n+m-1)}{\Gamma(\theta+n+m-1)} \tag{42}$$
and
$$r_D(m;\alpha,\theta,n,j) = \frac{\Gamma(\theta+\alpha+n+m)}{\Gamma(\theta+n+m+1)}, \tag{43}$$
respectively. Obviously, in terms of asymptotics, $r_M(m;\alpha,\theta,n,j,m_1)/m^{\alpha} \to 1$ and $r_D(m;\alpha,\theta,n,j)/m^{\alpha-1} \to 1$ as $m$ tends to infinity. These corrected normalization rates are determined in such a way that the approximation (40) of $P[D^{(n)}_m \in \cdot \mid K_n = j, \mathbf{N}_n = \mathbf{n}]$ holds; besides providing such an approximation, the result displayed in (40) induces a natural interpretation of the conditional large deviation principle of Theorem 2, with $l = 1$, in the context of Bayesian nonparametric inference for discovery probabilities. Indeed, by combining the approximations in (35) and (40), we can write the large $m$ approximation displayed in (44). By exploiting the corrected normalization rates (42) and (43), a corrected version of (44), namely (45), is obtained in terms of $T^{(n)}_{1,m}$. In other terms, Theorem 2 with $l = 1$ provides a large $m$ approximation of the Bayesian nonparametric estimator of the right tail of the probability of discovering a new species at the $(n+m+1)$-th draw, without observing $(X_{n+1},\ldots,X_{n+m})$. We point out that if $\alpha = 1/2$, then the rate function in the approximations (44) and (45) can be computed exactly by means of Proposition 1.
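A quick numerical check of the corrected rate $r_D$ in (43), computed in log-space with `gammaln` (our own illustration), shows how slowly $r_D(m)/m^{\alpha-1}$ approaches 1 when $m$ is not much larger than $n$ and $\theta$, which is precisely the regime where the correction matters.

```python
# r_D(m) = Gamma(theta+alpha+n+m) / Gamma(theta+n+m+1) versus the naive rate m^(alpha-1).
import numpy as np
from scipy.special import gammaln

def r_D(m, alpha, theta, n):
    return np.exp(gammaln(theta + alpha + n + m) - gammaln(theta + n + m + 1))

alpha, theta, n = 0.5, 132.92, 1000   # theta as in the normalized library; n illustrative
for m in (10**2, 10**4, 10**6):
    print(m, r_D(m, alpha, theta, n) / m**(alpha - 1))
```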

Illustration
We present an illustration of our results dealing with a well-known benchmark Expressed Sequence Tag (EST) dataset. This dataset is obtained by sequencing two cDNA libraries of the amitochondriate protist Mastigamoeba balamuthi: the first library is non-normalized, whereas the second library is normalized, namely it undergoes a normalization protocol which aims at making the frequencies of genes in the library more uniform, so as to increase the discovery rate. See Susko and Roger [28] for details. In applying the model (2), the first issue to face is the specification of the parameter $(\alpha,\theta)$ characterizing the prior $\Pi$. This is typically achieved by adopting an empirical Bayes procedure in order to obtain an estimate $(\hat\alpha,\hat\theta)$ of $(\alpha,\theta)$: specifically, $(\alpha,\theta)$ is fixed so as to maximize the likelihood function of the model (2) under the observed sample. Alternatively, one could specify a prior distribution for $(\alpha,\theta)$. Here we adopt a less elaborate specification of the parameter $(\alpha,\theta)$: we choose $\alpha = 1/2$ and then we set $\theta$ such that $E_{1/2,\theta}[K_n] = 2\theta\bigl(\frac{(\theta+1/2)_{(n)}}{(\theta)_{(n)}} - 1\bigr) = j$. Empirical investigations with simulated data suggest that $\alpha = 1/2$ is always a good choice when no precise prior information is available; see Lijoi et al. [19] for details. This approach gives $(\alpha,\theta) = (1/2, 206.75)$ for the non-normalized Mastigamoeba library and $(\alpha,\theta) = (1/2, 132.92)$ for the normalized one.
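The moment-matching equation for $\theta$ is easily solved numerically, since $E_{1/2,\theta}[K_n]$ is increasing in $\theta$. The sketch below is our own; the values of $n$ and $j$ are placeholders, not the Mastigamoeba counts.

```python
# Solve 2*theta*((theta+1/2)_(n)/(theta)_(n) - 1) = j for theta, given alpha = 1/2.
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammaln

def expected_Kn(theta, n, alpha=0.5):
    # rising-factorial ratio (theta+alpha)_(n) / (theta)_(n) computed via log-gamma
    log_ratio = (gammaln(theta + alpha + n) - gammaln(theta + alpha)
                 - gammaln(theta + n) + gammaln(theta))
    return (theta / alpha) * (np.exp(log_ratio) - 1.0)

def fit_theta(n, j):
    return brentq(lambda t: expected_Kn(t, n) - j, 1e-6, 1e6)

# placeholder sample sizes, chosen only for illustration
print(fit_theta(n=715, j=460))
```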
For the Mastigamoeba non-normalized and normalized cDNA libraries, Table 1 and Table 2 report the exact estimates together with their large $m$ approximations based on the naive and on the corrected normalization rates. Table 1 and Table 2 clearly show that the corrected normalization rates (42) and (43) are of fundamental importance when the additional sample size $m$ is not much larger than the sample size $n$ and the parameter $\theta$. Figure 1 and Figure 2 display the corresponding large deviation approximations (44) and (45).