Large deviations for Dirichlet processes and Poisson-Dirichlet distribution with two parameters

Large deviation principles are established for the two-parameter Poisson-Dirichlet distribution and two-parameter Dirichlet process when the parameter θ approaches infinity. The motivation for these results is to understand the differences in terms of large deviations between the two-parameter models and their one-parameter counterparts. New insight is obtained about the role of the second parameter α through a comparison with the corresponding results for the one-parameter Poisson-Dirichlet distribution and Dirichlet process.

The Poisson-Dirichlet distribution was introduced by Kingman (14) to describe the distribution of gene frequencies in a large neutral population at a particular locus. In the population genetics setting it is intimately related to the Ewens sampling formula, which describes the distribution of the allelic partition of a sample of n genes selected from the population. The component $P_k(\theta)$ represents the proportion of the kth most frequent allele. If u is the individual mutation rate and $N_e$ is the effective population size, then the parameter $\theta = 4N_e u$ is the scaled population mutation rate.
The GEM distribution can be obtained from the Poisson-Dirichlet distribution through a procedure called size-biased sampling. Here is a brief explanation. Consider a population consisting of individuals of countably many different types labelled {1, 2, ...}. Assume that the proportion of type i individuals in the population is $p_i$. An individual is randomly selected from the population and its type is denoted by σ(1). Next remove all individuals of type σ(1) from the population and then randomly select a second individual. This is repeated to get more samples. Denote the type of the ith selected individual by σ(i). Then $(p_{\sigma(1)}, p_{\sigma(2)}, \ldots)$ is called a size-biased permutation of $(p_1, p_2, \ldots)$. The sequence $X_k^{\theta}$, k = 1, 2, ..., defined in (1.1) with α = 0 has the same distribution as the size-biased permutation of P(θ) = P(0, θ). The name GEM distribution was coined by Ewens after R.C. Griffiths, S. Engen and J.W. McCloskey for their contributions to the development of the structure. The Dirichlet process first appeared in (9).
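The size-biased sampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is truncated to finitely many types, and the function name `size_biased_permutation` and the example proportions are hypothetical, not taken from the paper.

```python
import random

def size_biased_permutation(p, rng=None):
    """Return the indices of p in size-biased order.

    p is a finite list of positive proportions (a truncation of the
    countable case). Each step picks one of the remaining types with
    probability proportional to its proportion, then removes that type.
    """
    rng = rng or random.Random()
    remaining = list(range(len(p)))
    order = []
    while remaining:
        weights = [p[i] for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(chosen)
        remaining.remove(chosen)
    return order

# Hypothetical proportions, used only for illustration.
p = [0.5, 0.3, 0.15, 0.05]
sigma = size_biased_permutation(p, random.Random(0))
permuted = [p[i] for i in sigma]  # (p_sigma(1), p_sigma(2), ...)
```

Larger proportions tend to appear earlier in `permuted`, but the order is random, which is what distinguishes a size-biased permutation from a ranking in decreasing order.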
The literature on the Poisson-Dirichlet distribution and Dirichlet process with two parameters is relatively small but is growing rapidly. Carlton (2) includes detailed calculations of moments and parameter estimation for the two-parameter Poisson-Dirichlet distribution. The most comprehensive study of the two-parameter Poisson-Dirichlet distribution is carried out in Pitman and Yor (20). In (6) and the references therein one can find connections between the two-parameter Poisson-Dirichlet distribution and models in physics including mean-field spin glasses, random map models, fragmentation, and returns of a random walk to the origin. The two-parameter Poisson-Dirichlet distribution has also found applications in macroeconomics and finance (1).
The Poisson-Dirichlet distribution and its two-parameter counterpart have many similar structures, including the urn construction in (12) and (8), the GEM representation, the sampling formula (18), etc. A special feature of the two-parameter Poisson-Dirichlet distribution is given in Pitman (17), where it is shown that the two-parameter Poisson-Dirichlet distribution is the most general distribution whose size-biased permutation has the same distribution as the GEM representation (1.1).
The objective of this paper is to establish large deviation principles (henceforth LDP) for GEM(θ, α), PD(α, θ), and Dirichlet(θ, α, ν) with positive α when θ approaches infinity. Note that for the one-parameter model, θ is the scaled population mutation rate. For a fixed individual mutation rate u, large θ corresponds to large population size. In the two-parameter setting, we no longer have the same interpretation. But it can be seen from (1.1) that for nonzero α, large θ plays mathematically a very similar role as in the case α = 0.
The LDP for Dirichlet(θ, ν) has been established in (15) and (3) using different methods. Recently in (4), the LDP was established for PD(θ). From (1.1), one can see that for every fixed k, the impact of α diminishes as θ becomes large. It is thus reasonable to expect similar LDPs for GEM(θ) and GEM(θ, α). But in PD(α, θ) and Dirichlet(θ, α, ν), every term in (1.1) counts. One might therefore expect the LDPs for PD(θ) and Dirichlet(θ, ν) to differ from the corresponding LDPs for PD(α, θ) and Dirichlet(θ, α, ν). It turns out, however, that the impact of α only appears in the LDP for Dirichlet(θ, α, ν).
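For readers who want to experiment with these objects, the following Python sketch samples the first n weights of a GEM(θ, α) sequence. The display (1.1) is not reproduced in this text, so the sketch assumes the standard stick-breaking construction with independent $U_k$ distributed as Beta(1 − α, θ + kα). The function names and parameter values are hypothetical. Sorting a truncated sample in decreasing order gives only an approximation to a PD(α, θ) sample.

```python
import random

def gem_sample(theta, alpha, n, rng=None):
    """First n weights of a GEM(theta, alpha) sequence.

    Assumption (standard construction, not shown in the source text):
    U_k ~ Beta(1 - alpha, theta + k * alpha) independently, and
    X_1 = U_1, X_k = U_k * (1 - U_1) * ... * (1 - U_{k-1}).
    """
    rng = rng or random.Random()
    weights, stick = [], 1.0
    for k in range(1, n + 1):
        u = rng.betavariate(1.0 - alpha, theta + k * alpha)
        weights.append(u * stick)   # piece broken off the remaining stick
        stick *= 1.0 - u            # length of the stick still unbroken
    return weights

def ranked(weights):
    """Weights in decreasing order (approximate PD sample after truncation)."""
    return sorted(weights, reverse=True)

rng = random.Random(1)
x_two = gem_sample(50.0, 0.5, 200, rng)  # two-parameter case
x_one = gem_sample(50.0, 0.0, 200, rng)  # one-parameter case, alpha = 0
```

Under this assumed construction α enters the kth factor only through the Beta parameters 1 − α and θ + kα, and for fixed k the ratio (θ + kα)/θ tends to one as θ grows.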
LDP results turn out to be quite useful in understanding certain critical phenomena in population genetics. In Gillespie (11), simulations were done for several models in order to understand the roles of mutation and selection forces in the evolution of a population. In the simulations for the infinite-alleles model with selective over-dominance, it was observed that when the mutation rate and selection intensity get large together with the population size or θ, the selective model behaves like a neutral model. In other words, the role of mutation and the role of selection are indistinguishable at a certain scale associated with the population size. Through the study of the stationary distribution of the infinitely many alleles diffusion with heterozygote advantage, it was shown in Joyce, Krone and Kurtz (13) that phase transitions occur depending on the relative strength of mutation rate and selection intensity. The LDP for PD(θ) provides a more natural way of studying these phase transitions (4).
The LDP for GEM(θ, α) is given in Section 2. Using Perman's formula and an inductive structure, we establish the LDP for PD(α, θ) in Section 3. The LDP for Dirichlet(θ, α, ν) is established in Section 4 using the subordinator representation in (20) and a combination of the methods in (15) and (3). Further comments are included in Section 5.
The reference (5) includes all the terminology and standard techniques on large deviations used in this article. Since the state spaces encountered here are all compact, there is no need to distinguish between a rate function and a good rate function.

LDP for GEM
Let E = [0, 1], and let $E^{\infty}$ be the infinite Cartesian product of E. Set

and consider the map

By a proof similar to that used in Lemma 3.1 in (4), one obtains the following lemma.
Lemma 2.1. For each k ≥ 1, the family of the laws of $U_k$ satisfies a LDP on E with speed θ and rate function

Theorem 2.2. The family {GEM(θ, α) : θ > 0, 0 < α < 1} satisfies a LDP on E with speed θ and rate function

else.
Proof: Since $U_1, U_2, \ldots$ are independent, for every fixed n the law of $(U_1, \ldots, U_n)$ satisfies a LDP with speed θ and rate function $\sum_{i=1}^{n} I_1(u_i)$. For any u, v in $E^{\infty}$, set

Then for any δ′′ > 0 and u in $E^{\infty}$, one can choose n ≥ 1 and small enough 0

Since $E^{\infty}$ is compact, by letting n approach infinity in (2.4) and (2.5), it follows that the law of $(U_1, U_2, \ldots)$ satisfies a LDP with speed θ and rate function

Since the map G is continuous, it follows from the contraction principle and Lemma 2.1 that the family {GEM(θ, α) : θ > 0, 0 < α < 1} satisfies a LDP on E with speed θ and rate function

For each 1 ≤ n ≤ +∞,

Hence the rate function in (2.7) is the same as $S(x_1, x_2, \ldots)$.
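The last step of the proof, where a sum of one-coordinate rates collapses into a single function of $(x_1, x_2, \ldots)$, can be checked numerically. The displays defining G, $I_1$ and S are not reproduced in this text, so the sketch below rests on assumptions: G is taken to be the stick-breaking map $x_k = u_k \prod_{i<k}(1 - u_i)$, the one-coordinate rate is taken to be $-\log(1 - u)$, and the candidate for S on finitely many coordinates is $-\log(1 - \sum_i x_i)$. All function names are hypothetical.

```python
import math
import random

def stick_breaking_map(u):
    """Map (u_1, ..., u_n) to (x_1, ..., x_n), x_k = u_k * prod_{i<k} (1 - u_i)."""
    x, stick = [], 1.0
    for u_k in u:
        x.append(u_k * stick)
        stick *= 1.0 - u_k
    return x

def sum_of_rates(u):
    """Sum of -log(1 - u_i): the assumed one-coordinate rate, summed."""
    return sum(-math.log(1.0 - u_i) for u_i in u)

def candidate_rate(x):
    """-log(1 - sum x_i): the assumed form of S on finitely many coordinates."""
    return -math.log(1.0 - sum(x))

rng = random.Random(0)
u = [rng.uniform(0.0, 0.5) for _ in range(10)]
x = stick_breaking_map(u)
gap = abs(sum_of_rates(u) - candidate_rate(x))  # zero up to rounding error
```

The agreement reflects the identity $1 - \sum_{i \le n} x_i = \prod_{i \le n} (1 - u_i)$, which follows by induction on n from the definition of the map.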
Lemma 3.2. The family of the laws of $P_1(\alpha, \theta)$ satisfies a LDP on E with speed θ and rate function $I_1(p)$ given in (2.3).
Proof: It follows from the GEM representation that

On the other hand, from the representation in Proposition 22 of (20) we obtain that

Since both the laws of $U_1$ and $P_1(\theta)$ satisfy LDPs with speed θ and rate function $I_1(\cdot)$, we conclude from Lemma 2.4 of (4) that the law of $P_1(\alpha, \theta)$ satisfies a LDP with speed θ and rate function $I_1(\cdot)$.
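The speed-θ decay of the upper tail of $U_1$ used in this argument can be explored numerically. The sketch below assumes $U_k$ is distributed as Beta(1 − α, θ + kα), since the display (1.1) is not reproduced in this text, and compares $-\theta^{-1} \log P(U_k \ge u)$ with $-\log(1 - u)$. That comparison value is the form the Beta tail suggests; it is used here as an assumption because the display (2.3) is also missing. The tail probability is computed with a plain midpoint rule in log space, and the function names are hypothetical.

```python
import math

def log_beta_tail(a, b, u, steps=200_000):
    """log P(U >= u) for U ~ Beta(a, b), by a midpoint rule in log space."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = (1.0 - u) / steps
    logs = []
    for j in range(steps):
        x = u + (j + 0.5) * h  # midpoint of the j-th subinterval of [u, 1]
        logs.append((a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))
    m = max(logs)  # subtract the maximum before exponentiating (log-sum-exp)
    total = sum(math.exp(v - m) for v in logs)
    return m + math.log(total * h) - log_norm

def empirical_rate(theta, alpha, k, u):
    """-(1/theta) * log P(U_k >= u) under the assumed Beta law of U_k."""
    return -log_beta_tail(1.0 - alpha, theta + k * alpha, u) / theta

for theta in (50.0, 500.0, 5000.0):
    # Columns: theta, empirical rate at u = 0.5, comparison value log 2.
    print(theta, empirical_rate(theta, 0.5, 1, 0.5), math.log(2.0))
```

For α = 0 the law is Beta(1, θ), the tail is exactly $(1 - u)^{\theta}$, and the empirical rate equals $-\log(1 - u)$ for every θ. For α > 0 the two values differ at finite θ and the difference shrinks as θ grows.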
Then it follows from (3.16), (3.19) and (3.22) that

Introduce a metric $d_k$ on $\nabla_k$ such that for any p, q in $\nabla_k$

For any δ > 0, set

For every $p \in \nabla_k^{\circ}$, one can choose δ small enough such that $V_\delta(p) \subset \bar{V}_\delta(p) \subset \nabla_k^{\circ}$. Let µ denote the Lebesgue measure on $\nabla_k$. Then by Jensen's inequality and (3.24),

Letting δ approach zero and using the continuity of $I_k(\cdot)$ at p, one gets

Since the family $\{P_{\theta,k} : \theta > 0\}$ is exponentially tight, a partial LDP holds (21). Let J be any rate function associated with a certain subsequence of $\{P_{\theta,k} : \theta > 0\}$. Then it follows from (3.26) that for any p in $\nabla_k^{\circ}$, $J(p) \le I_k(p)$. On the other hand, for any p in $\nabla_k^{\circ}$,

The existence of such $q_\delta$ is due to the continuity of f over $\nabla_k^{\circ}$. Letting δ approach zero, one has

Next consider the case that p is such that $p_k > 0$, $\sum_{i=1}^{k} p_i = 1$. Then p k /p k = 1. For small enough δ, we have q k /q k > 1/2 for $q \in \bar{V}_\delta(p)$.
Thus $A_n(\alpha, \theta)(u) = 0$ for all n ≥ 1 on $\bar{V}_\delta(p)$ and it follows from (3.23) that

where $a_\delta$ is such that

Letting δ go to zero, one gets

The only remaining case is when there is an l ≤ k such that $p_l = 0$. The upper bound in this case is obtained by focusing on a lower dimensional space of the positive coordinates.
Thus we have shown that for every p in $\nabla_k$

which combined with the exponential tightness implies the result.
Proof: Proposition 21 in (20) gives the subordinator representation for PD(α, θ). The lemma follows from this representation and the construction outlined on page 254 in (19).
Proof: First note that both functions ϕ and L are essentially smooth. Let

It follows from (4.39) and (4.40) that

The fact that ν has support E implies that $\nu(A_i) > 0$ for i = 1, ..., n, and

Clearly the function Λ is differentiable on $D_\Lambda^{\circ}$ and grad(Λ)($\lambda_1, \ldots, \lambda_n$) = 1

If a sequence $\lambda_m$ approaches the boundary of $D_\Lambda^{\circ}$ from inside, then at least one coordinate sequence approaches one. Since the interior of {λ : ϕ(λ) < ∞} is (−∞, 1) and ϕ is essentially smooth, it follows that Λ is steep and thus essentially smooth. The theorem then follows from the Gärtner-Ellis theorem (5).
Theorem 4.7. The family of the laws of Ξ θ,α,ν on space M 1 (E) satisfies a LDP with speed θ and rate function I α (·).
For any ω, µ in $M_1(E)$, define

(4.57)

Then ρ is a metric on $M_1(E)$ and generates the weak topology.

Further Comments
Our results show that the LDPs for GEM(θ, α) and PD(α, θ) have the same rate function. Since GEM(θ, α) and PD(α, θ) differ only by the ordering, one would expect to derive the LDP for one from the LDP for the other. Unfortunately, the ordering operation is not continuous, and it is not easy to establish an exponential approximation. The LDPs for GEM(θ, α) and PD(α, θ) also have the same rate function as the LDPs for GEM(θ) and PD(θ). Thus α does not play a role in these LDPs. This is mainly due to the topology used. It would be interesting to investigate whether the role of α becomes visible when the corresponding LDPs are established on a stronger topology.
The process $Y_{\alpha,\theta}(t)$ is a process with exchangeable increments. One could try to establish a general LDP result for processes with exchangeable increments and derive the result in Section 4 through the contraction principle. The proofs here illustrate most of the procedures needed for pursuing such a general result, from which the LDP for $\Xi_{\theta,\alpha,\nu}$ would follow.