Characteristic Functionals of Dirichlet Measures

We compute characteristic functionals of Dirichlet-Ferguson measures over a locally compact Polish space and prove continuous dependence of the random measure on the parameter measure. In finite dimension, we identify the dynamical symmetry algebra of the characteristic functional of the Dirichlet distribution with a simple Lie algebra of type $A$. We study the lattice determined by characteristic functionals of categorical Dirichlet posteriors, showing that it has a natural structure of weight Lie algebra module and providing a probabilistic interpretation. A partial generalization to the case of the Dirichlet-Ferguson measure is also obtained.

1. Introduction and main results. Let $X$ be a locally compact Polish space with Borel σ-algebra $\mathcal{B}(X)$ and let $\mathcal{P}(X)$ be the space of probability measures on $(X, \mathcal{B}(X))$. For $\sigma \in \mathcal{P}(X)$ we denote by $\mathcal{D}_\sigma$ the Dirichlet-Ferguson measure [9] on $\mathcal{P}(X)$ with probability intensity $\sigma$.
The characteristic functional of $\mathcal{D}_\sigma$ is commonly recognized as hardly tractable [14], and any approach to $\mathcal{D}_\sigma$ based on characteristic functional methods appears de facto ruled out in the literature. Notably, this led to the introduction of different characterizing transforms (e.g. the Markov-Krein transform [16,43] or the c-transform [14]), to inversion formulas based on characteristic functionals of other random measures (in particular, the Gamma measure, as in [32]), and, at least in the case $X = \mathbb{R}$, to the celebrated Markov-Krein identity (see e.g. [24]).
These investigations are based on complex analysis techniques and integral representations of special functions, in particular the Lauricella hypergeometric function ${}_kF_D$ [21] and Carlson's $R$ function [5]. The novelty of this work lies in the combinatorial/algebraic approach adopted, which allows for broader generality and far-reaching connections, especially with Lie algebra theory.
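Fourier analysis. Denote by $D_{\alpha_k}$ the Dirichlet distribution on the standard simplex $\Delta^{k-1}$ with parameter $\alpha_k \in \mathbb{R}^k_+$, which we regard as the discretization of $\mathcal{D}_\sigma$ induced by a measurable $k$-partition $\mathbf{X}_k$ of $X$ (see §2 below). Our first result is the following.
Theorem 1.1 (see Thm. 3.10). Let $\sigma \in \mathcal{P}(X)$. The characteristic functional of $\mathcal{D}_\sigma$ is given, for every $f \in C_c(X)$ and $t \in \mathbb{R}$, by
$$\widehat{\mathcal{D}_\sigma}(t f^*) = \sum_{n=0}^{\infty} \frac{i^n t^n}{n!}\, Z_n\big(\sigma f^1, \dots, \sigma f^n\big),$$
where $i = \sqrt{-1}$ is the imaginary unit, $Z_n$ is the cycle index polynomial (2.1) of the $n$-th symmetric group and $f^j$ denotes the $j$-th power of $f$. Furthermore, the map $\sigma \mapsto \mathcal{D}_\sigma$ is continuous with respect to the narrow topologies.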
The characteristic functional representation is new. It provides, in the unified framework of Fourier analysis: (a) a new (although non-explicit) construction of $\mathcal{D}_\sigma$ as the unique probability measure on $\mathcal{P}(X)$ satisfying $\widehat{\mathcal{D}_\sigma} = \lim_k \widehat{D_{\alpha_k}}$ (see Cor. 3.16; following [45], we call this construction a weak Fourier limit); (b) new proofs of known results on the tightness and asymptotics of families of Dirichlet-Ferguson measures (see Cors. 3.12 and 3.13), proved elsewhere in the literature with ad hoc techniques; (c) the continuity statement in the Theorem, which strengthens [37, Thm. 3.2], concerned with norm-to-narrow continuity. This last result is sharp, in the sense that the domain topology cannot be relaxed to the vague topology.
Representations of $SL_2$-currents and Bayesian non-parametrics. The Dirichlet-Ferguson measure $\mathcal{D}$, the gamma measure $\mathcal{G}$ [19,42] and the 'multiplicative infinite-dimensional Lebesgue measure' $\mathcal{L}^+$ [42,45] play an important rôle in a longstanding program [20,42,46] for the study of representations of measurable $SL_2$-current groups, i.e. spaces of $SL_2$-valued bounded measurable functions on a smooth manifold $X$. Within such a framework, connections between these measures and Lie structures of special linear type are not entirely surprising. In particular, the measure $\mathcal{L}^+$ is constructed (see [45, §4.1]) as the weak Fourier limit for $k \to \infty$ of rescaled Haar measures on the identity connected components $dSL^+_{k+1}$ in maximal toral (commutative) subgroups of the special linear groups $SL_{k+1}(\mathbb{R})$. (For details on this construction see §4.2 below.) Relying on connections between cycle index polynomials and Pólya Enumeration Theory, we identify the special linear object acting on the Dirichlet distribution $D_{\alpha_k}$ as the dynamical symmetry algebra, in the sense of [26,28], of the Fourier transform $\widehat{D_{\alpha_k}}$. In contrast with the case of $\mathcal{L}^+$, we are able to detail the action of the whole (non-commutative) dynamical symmetry algebra, and to provide a suitable interpretation of this action in terms of Bayesian statistics. Indeed, one remarkable property [9,29,36] of Dirichlet measures is that their posterior distributions, given knowledge of the occurrences of some categorical random variables, are themselves Dirichlet measures with different parameters; that is, Dirichlet measures are self-conjugate priors. We show how this property is related to the action of the dynamical symmetry algebra. More precisely, for $\alpha \in \Delta^{k-1}$ and $p \in (\mathbb{Z}^+_0)^k$ denote by $D^p_\alpha$ the posterior distribution of the prior $D_\alpha$ given atoms of mass $p_i$ at point $i \in [k]$ (see property iii in §2.2). We prove the following.
Theorem 1.2 (see Thm. 4.12). The dynamical symmetry algebra $\mathfrak{g}_k$ of the function $\widehat{D_\alpha}$ (see Def. 4.4) is (isomorphic to) the Lie algebra $\mathfrak{sl}_{k+1}(\mathbb{R})$ of real square matrices with vanishing trace. Furthermore, if $\alpha$ is chosen in the interior of $\Delta^{k-1}$, the universal enveloping algebra $U(\mathfrak{g}_k)$ naturally acts on an infinite-dimensional linear space $\mathcal{O}_{\Lambda_\alpha}$ detailed in the proof. Special subalgebras of $U(\mathfrak{g}_k)$ may be identified, whose actions fix the linear span $\mathcal{O}_{H_\alpha} \subseteq \mathcal{O}_{\Lambda_\alpha}$ of the family of characteristic functionals $\{\widehat{D^p_\alpha}\}_p$, with $p$ varying in $(\mathbb{Z}^+_0)^k$, or the linear span $\mathcal{O}_{\Lambda^+_\alpha} \subseteq \mathcal{O}_{\Lambda_\alpha}$ of characteristic functionals of some distinguished improper priors of Dirichlet-categorical posteriors.
Theorem 1.1 allows for a partial extension of this result to the infinite-dimensional case of $\mathcal{D}_\sigma$. Since $\mathcal{L}^+$ and $\mathcal{G}$ may be expressed as product measures with $\mathcal{D}$ as the only truly infinite-dimensional factor, cf. [43], we expect Theorem 1.2 to provide further algebraic insights on these measures.
Quasi-invariance of $\mathcal{D}$. (Quasi-)invariance properties of $\mathcal{D}$, $\mathcal{G}$ and $\mathcal{L}^+$ have been studied with respect to different group actions [20,34,35,42]. Given $(X, \sigma)$ a Riemannian manifold with normalized volume measure $\sigma$, let $G$ be some subgroup of (bi-)measurable isomorphisms of $(X, \mathcal{B}(X))$. We are interested in the quasi-invariance of $\mathcal{D}_\sigma$ with respect to the group action $\psi.\eta := \psi_\sharp \eta$, where $\psi$ is in $G$, $\eta$ is in $\mathcal{P}(X)$ and $\psi_\sharp \eta := \eta \circ \psi^{-1}$ denotes the push-forward of $\eta$ via $\psi$. When $X = \mathbb{S}^1$ and $G = \mathrm{Diff}(X)$, the quasi-invariance of $\mathcal{D}_\sigma$ with respect to a similar action was a key tool in the construction of stochastic dynamics on $\mathcal{P}(X)$ with $\mathcal{D}_\sigma$ or the related entropic measure $\mathbb{P}_\sigma$ as invariant measures, see [34,38].
Whereas Theorem 1.1 allows for results related to the Bochner-Minlos and Lévy Continuity Theorems to come into play, the non-multiplicativity of $\widehat{\mathcal{D}_\sigma}$ (corresponding to the non-infinite-divisibility of the measure) immediately rules out the usual approach to quasi-invariance via Fourier transforms [2,20,42,43]. Other approaches to this problem rely on finite-dimensional approximation techniques, variously concerned with approximating the space [34,35], the σ-algebra [20] or the acting group [11,45]. The common denominator here is for the approximation to be a filtration (cf. e.g. [20, Def. 9]), in order to allow for some kind of martingale convergence, and, possibly, for the approximating objects to be (embedded in) linear structures (cf. e.g. [34,45]).
The goal of Theorem 1.2 is ultimately to provide approximating sequences (at the same time of the space $X$, the σ-algebra on $\mathcal{P}(X)$ and the acting group) that are suitable in the sense above.
Plan of the work. Preliminary results are collected in §2, together with the definition and properties of Dirichlet measures and an account of the discretization procedure that we dwell upon in the following. In §3 we prove Theorem 1.1. As a consequence, by the classical theory of characteristic functionals we recover known asymptotic expressions for $\mathcal{D}_{\beta\sigma}$ when the real parameter $\beta$ tends to $0$ or to $\infty$ (Cor. 3.13, cf. [37, p. 311]), propose a Gibbsean interpretation thereof (Rem. 3.14), and prove analogous expressions for the entropic measure $\mathbb{P}^\beta_\sigma$ on compact Riemannian manifolds [41], generalizing the case $X = \mathbb{S}^1$ [34, Prop. 3.14]. In the process of deriving Theorem 1.1 we obtain a moment formula for the Dirichlet distribution in terms of the cycle index polynomials $Z_n$ (Thm. 3.3). In light of Pólya Enumeration Theory we interpret this result by means of a coloring problem (§4.1). This motivates the study of the dynamical symmetry algebra $\mathfrak{l}_k$ of the Humbert function ${}_k\Phi_2$, resulting in the proof of Theorem 1.2. Finally, in §4.3 we study the limiting action of the dynamical symmetry algebra $\mathfrak{l}_k$ when $k$ tends to infinity.
Some preliminary results in topology and measure theory are collected in the Appendix.

Combinatorial preliminaries.
Set and integer partitions. For a subset $L \subseteq [n]$ denote by $\tilde{L}$ the ordered tuple of elements in $L$ in the usual order of $[n]$. An ordered set partition of $[n]$ is an ordered tuple $\tilde{\mathbf{L}} := (\tilde{L}_1, \tilde{L}_2, \dots)$ of tuples $\tilde{L}_i$ such that the corresponding sets $L_i$, termed clusters or blocks, satisfy $\emptyset \subsetneq L_i \subseteq [n]$ and $\bigsqcup_i L_i = [n]$. The order of the tuples in $\tilde{\mathbf{L}}$ is assumed ascending with respect to the cardinalities of the corresponding subsets and, subordinately, ascending with respect to the first element in each tuple. A set partition $\mathbf{L}$ of $[n]$ is the family of subsets corresponding to an ordered set partition. This correspondence is bijective. For any set partition write $\mathbf{L} \vdash [n]$, and $\mathbf{L} \vdash_r [n]$ if $\#\mathbf{L} = r$, i.e. if $\mathbf{L}$ has $r$ clusters.
A (integer) partition $\lambda$ of $n$ into $r$ parts (write: $\lambda \vdash_r n$) is an integer solution $\lambda \geq 0$ of the system $\mathbf{n} \cdot \lambda = n$, $\lambda_\bullet = r$, where $\mathbf{n} := (1, \dots, n)$ and $\lambda_\bullet := \lambda_1 + \dots + \lambda_n$; if the second equality is dropped we term $\lambda$ a (integer) partition of $n$ (write: $\lambda \vdash n$). We always regard a partition in its frequency representation, i.e. as the tuple of its ordered frequencies (cf. e.g. [3, §1.1]). To a set partition $\mathbf{L} \vdash_r [n]$ one can associate in a unique way a partition $\lambda(\mathbf{L}) \vdash_r n$ by setting $\lambda_i(\mathbf{L}) := \#\{h \mid \#L_h = i\}$.
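Permutations and cycle index. A permutation $\pi$ in $\mathfrak{S}_n$ is said to have cycle structure $\lambda$, write $\lambda = \lambda(\pi)$, if $\lambda_i$ equals the number of cycles in $\pi$ of length $i$ for each $i$. Let $\mathfrak{S}_n(\lambda) \subseteq \mathfrak{S}_n$ be the set of permutations with cycle structure $\lambda$, so that $\mathfrak{S}_n(\lambda(\pi)) = K_\pi$, the conjugacy class of $\pi$, and $\#\mathfrak{S}_n(\lambda) = M_2(\lambda) := n!/(\lambda!\, \mathbf{n}^\lambda)$ [40, Prop. I.1.3.2], where $\lambda! := \prod_i \lambda_i!$ and $\mathbf{n}^\lambda := \prod_i i^{\lambda_i}$. Let now $G < \mathfrak{S}_n$ be any permutation group. The cycle index polynomial of $G$ is defined by
$$Z_G(\mathbf{t}) := \frac{1}{\#G} \sum_{\pi \in G} \mathbf{t}^{\lambda(\pi)}, \qquad \mathbf{t}^\lambda := \prod_i t_i^{\lambda_i}.$$
We write $Z_n := Z_{\mathfrak{S}_n}$ for the cycle index polynomial of $\mathfrak{S}_n$, satisfying, for $\mathbf{t} := (t_1, \dots, t_n)$ and $\mathbf{t}_k := (t_1, \dots, t_k)$ with $k \leq n$, the identities
$$Z_n(\mathbf{t}) = \frac{1}{n!} \sum_{\lambda \vdash n} M_2(\lambda)\, \mathbf{t}^\lambda, \qquad Z_n(a t_1, a^2 t_2, \dots, a^n t_n) = a^n Z_n(\mathbf{t}), \quad a \in \mathbb{R}, \tag{2.1}$$
and the recurrence relation
$$Z_n(\mathbf{t}_n) = \frac{1}{n} \sum_{k=0}^{n-1} Z_k(\mathbf{t}_k)\, t_{n-k}, \qquad Z_0 := 1. \tag{2.2}$$

Definition 2.1 (Dirichlet distribution). We denote by $D_\alpha(y)$ the Dirichlet distribution with parameter $\alpha \in \mathbb{R}^k_+$ (e.g. [29]), i.e. the probability measure with density
$$\frac{\mathbb{1}_{\Delta^{k-1}}(y)}{\mathrm{B}(\alpha)} \prod_{i=1}^{k} y_i^{\alpha_i - 1}, \qquad \mathrm{B}(\alpha) := \frac{\prod_{i=1}^{k} \Gamma(\alpha_i)}{\Gamma(\alpha_\bullet)},$$
with respect to the $k$-dimensional Lebesgue measure on the hyperplane of equation $y_\bullet = 1$ in $\mathbb{R}^k$, concentrated on (the interior of) $\Delta^{k-1}$. Alternatively, for any measurable $A \subseteq \mathbb{R}^{k-1}$,
$$D_\alpha(A) = \frac{1}{\mathrm{B}(\alpha)} \int_{A \cap \{y_i > 0,\ y_1 + \dots + y_{k-1} < 1\}} \Big(\prod_{i=1}^{k-1} y_i^{\alpha_i - 1}\Big) \Big(1 - \sum_{i=1}^{k-1} y_i\Big)^{\alpha_k - 1} dy_1 \cdots dy_{k-1}.$$
Whereas both descriptions are common in the literature, the first one makes property ii below more apparent. Namely, write '$\sim$' for 'distributed as' and let $Y$ be any $\Delta^{k-1}$-valued random vector. The following properties of the Dirichlet distribution are well-known:
i. aggregation (e.g. [9, p. 211, property i$^\circ$]). For $i = 2, \dots, k$ set $y_{+i} := (y + y_i e_{i-1})_{\hat\imath}$, i.e. the vector obtained from $y$ by adding its $i$-th entry to the $(i-1)$-th one and then removing the $i$-th entry. Then,
$$Y \sim D_\alpha \implies Y_{+i} \sim D_{\alpha_{+i}}. \tag{2.4}$$
ii. quasi-exchangeability (or symmetry). For all $\pi \in \mathfrak{S}_k$,
$$Y \sim D_\alpha \implies Y_\pi \sim D_{\alpha_\pi}. \tag{2.5}$$
iii. Bayesian property (e.g. [9, p. 212, property iii$^\circ$] for the case $r = 1$). Let $W \in [k]^r$ be a vector of $[k]$-valued random variables, conditionally i.i.d. with law $Y$ given $Y$, and let $P \in (\mathbb{Z}^+_0)^k$ be the vector of occurrences defined by $P_i := \#\{j \in [r] \mid W_j = i\}$, and denote by $D^p_\alpha$ the distribution of $Y$ given $P = p$, termed here the posterior distribution of $D_\alpha$ given atoms with masses $p_i$ at points $i \in [k]$. Then,
$$D^p_\alpha = D_{\alpha + p}.$$
Most properties of the Dirichlet distribution may be inferred from its characteristic functional ${}_k\Phi_2$, a confluent form of the $k$-variate Lauricella hypergeometric function ${}_kF_D$ (see e.g. [7]).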
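As a sanity check on the cycle index identities (2.1), the following Python sketch evaluates $Z_n$ in two ways: as the sum over partitions in frequency representation, and through the standard recurrence $n Z_n = \sum_{j=1}^{n} t_j Z_{n-j}$ with $Z_0 = 1$. The function names are ours and purely illustrative, and the recurrence is assumed in this standard form; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from math import factorial


def partitions_freq(n):
    """Yield the partitions of n in frequency form {part size i: multiplicity lambda_i}."""
    def rec(remaining, max_part):
        if remaining == 0:
            yield {}
            return
        # choose the largest part together with its full multiplicity
        for part in range(min(remaining, max_part), 0, -1):
            for mult in range(1, remaining // part + 1):
                for rest in rec(remaining - part * mult, part - 1):
                    out = dict(rest)
                    out[part] = mult
                    yield out
    yield from rec(n, n)


def cycle_index_sum(t):
    """Z_n(t) as (1/n!) * sum over lambda of M_2(lambda) t^lambda, with n = len(t)."""
    total = Fraction(0)
    for lam in partitions_freq(len(t)):
        term = Fraction(1)
        for i, m in lam.items():
            # M_2(lambda)/n! = 1 / prod_i (lambda_i! * i^lambda_i)
            term *= Fraction(t[i - 1]) ** m / (factorial(m) * i ** m)
        total += term
    return total


def cycle_index_rec(t):
    """Z_n(t) via the recurrence n Z_n = sum_{j=1}^n t_j Z_{n-j}, Z_0 = 1."""
    z = [Fraction(1)]
    for m in range(1, len(t) + 1):
        z.append(sum(Fraction(t[j - 1]) * z[m - j] for j in range(1, m + 1)) / m)
    return z[-1]


if __name__ == "__main__":
    t = [2, 3, 5]
    print(cycle_index_sum(t), cycle_index_rec(t))
    a = 3  # homogeneity: Z_n(a t_1, a^2 t_2, ...) = a^n Z_n(t)
    scaled = [a ** (j + 1) * tj for j, tj in enumerate(t)]
    print(cycle_index_rec(scaled) == a ** len(t) * cycle_index_rec(t))
```

For $n = 3$ both routines reproduce $Z_3(\mathbf{t}) = (t_1^3 + 3 t_1 t_2 + 2 t_3)/6$.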

Recall the following representations of
c > a > 0 and its confluent form (or second k-variate Humbert function [7, ibid.] The distribution D α is moment determinate for any α > 0 by compactness of ∆ k−1 . Its moments are straightforwardly computed via the multinomial theorem as so that the characteristic functional of the distribution indeed satisfies (cf. [7, §7.4.3]) 2.3. The Dirichlet-Ferguson measure.
Notation. Everywhere in the following let $(X, \tau(X))$ be a second countable locally compact Hausdorff topological space with Borel σ-algebra $\mathcal{B}$. We denote respectively by $\operatorname{cl} A$, $\operatorname{int} A$, $\operatorname{bd} A$ the closure, interior and boundary of a set $A \subseteq X$ with respect to $\tau$. Recall (Prop. 2.2) that any space $(X, \tau(X))$ as above is Polish, i.e. there exists a metric $d$, metrising $\tau$, such that $(X, d)$ is separable and complete; we denote by $\operatorname{diam} A$ the diameter of $A \subseteq X$ with respect to any such metric $d$ (apparent from context and thus omitted in the notation). Denote by $C_c(X)$ (resp. $C_b(X)$) the space of continuous compactly supported (resp. continuous bounded) functions on $(X, \tau(X))$, (both) endowed with the topology of uniform convergence; by $C_0(X)$ the completion of $C_c(X)$, i.e. the space of continuous functions on $X$ vanishing at infinity; by $\mathcal{M}_b(X)$ (resp. $\mathcal{M}^+_b(X)$) the space of finite, signed (resp. non-negative) Radon measures on $(X, \mathcal{B}(X))$, the topological dual of $C_c(X)$ and $C_0(X)$, endowed with the vague topology $\tau_v(\mathcal{M}_b(X))$, i.e. the weak* topology, and the induced Borel σ-algebra. Denote further by $\mathcal{P}(X) \subseteq \mathcal{M}^+_b(X)$ (cf. Cor. 5.3) the space of probability measures on $(X, \mathcal{B}(X))$. If not otherwise stated, we assume $\mathcal{P}(X)$ to be endowed with the vague topology $\tau_v(\mathcal{P}(X))$ and σ-algebra $\mathcal{B}_v(\mathcal{P}(X))$. On $\mathcal{M}^+_b(X)$ (resp. on $\mathcal{P}(X)$) we additionally consider the narrow topology $\tau_n(\mathcal{M}^+_b(X))$ (resp. $\tau_n(\mathcal{P}(X))$), i.e. the topology induced by duality with $C_b(X)$. Finally, given any measure $\nu \in \mathcal{M}_b(X)$ and any bounded measurable function $g$ on $(X, \mathcal{B}(X))$, denote by $\nu g$ the expectation of $g$ with respect to $\nu$ and by $g^* \colon \nu \mapsto \nu g$ the linear functional induced by $g$ on $\mathcal{M}_b(X)$ via integration.
The following statement is well-known. A proof is sketched to establish further notation.
Proposition 2.2. A topological space (X, τ (X)) is second countable locally compact Hausdorff if and only if it is locally compact Polish, i.e. such that τ (X) is a locally compact separable completely metrizable topology on X. Moreover, if (X, B(X)) additionally admits a fully supported diffuse measure ν, then (X, τ (X)) is perfect, i.e. it has no isolated points.
Sketch of proof. Let (αX, τ (αX)) denote the Alexandrov compactification of (X, τ (X)) and α : X → αX denote the associated embedding. Notice that αX is Hausdorff, for X is locally compact Hausdorff; hence αX is metrizable, for it is second countable compact Hausdorff, and separable, for it is second countable metrizable, thus Polish by compactness. Finally, recall that X is (homeomorphic via α to) a G δ -set in αX and every G δ -set in a Polish space is itself Polish. The converse and the statement on perfectness are trivial.
Partitions. Fix σ ∈ P(X). We denote by P k (X) the family of measurable non-trivial kpartitions of (X, B, σ), i.e. the set of tuples X := (X 1 , . . . , X k ) such that Given X ∈ P k (X) we say that it refines A in B if X i ⊆ A whenever X i ∩ A = ∅, respectively that it is a continuity partition for σ if σ(bd X i ) = 0 for all i ∈ [k]. We denote by P k (A ⊆ X), resp. P k (X, τ (X), σ) the family of all such partitions. Given X 1 ∈ P k 1 (X) and X 2 ∈ P k 2 (X) with k 1 < k 2 we say that X 2 refines X 1 , write . We denote the family of all such null-arrays by Na(X). Analogously to partitions, we write with obvious meaning of the notation Na(A ⊆ X) and Na(X, τ (X), σ). If σ is diffuse (i.e. atomless), then lim h σX h,i h = 0 for every choice of X h,i h ∈ X h with (X h ) h ∈ Na(X).
Given a (real-valued) simple function f and a partition X ∈ P k (X), we say that f is locally Given a function f in C c we say that a sequence of (measurable) simple functions (f h ) h is a good approximation of f if |f h | ↑ h |f | and lim h f h = f pointwise. The existence of good approximations is standard.
The Dirichlet-Ferguson measure. By a random probability over (X, B(X)) we mean any probability measure on P(X). For X ∈ P k (X) and η in P(X) set η X := (ηX 1 , . . . , ηX k ) and Recall (cf. [39]) that, if σ ∈ P(X) is diffuse, then for every k ∈ N 1 and y ∈ int ∆ k−1 there exists X ∈ P k (X) such that σ X = y. , Fleming-Viot with parent-independent mutation [8]; see e.g. [36, §2] for an explicit construction) is the unique random probability over (X, B(X)) such that Existence was originally proved in [9] by means of Kolmogorov Extension Theorem (cf. Fig. 1 below). A construction on spaces more general than in our assumptions is given in [18]. Other characterizations are available (see e.g. [36]). Since X is Polish (Prop. 2.2), in (2.11) it is in fact sufficient to consider u continuous with |u| < 1 and, by the Portmanteau Theorem, X ∈ P k (X, τ (X), σ) (cf. e.g. [41, p. 15]).
Let P be a P(X)-valued random field on a probability space (Ω, F , P) and recall the following properties of D σ , to be compared with those of D α , i. realization properties: §4, Thm. 2], with supp P (ω) = supp σ [9, §3, Prop. 1] or [25]. In particular, if σ is diffuse and fully supported, then I is countable and {x i } i is P-a.e. dense in X. The sequence (η i ) i is distributed [12] according to the stick-breaking process. In particular, (independent also of the η i 's [6]) and σ-distributed. ii. σ-symmetry: for every measurable σ-preserving map ψ : X → X, i.e. such that ψ σ = σ, [15,Lem. 9.0] together with (2.10) and the quasi-exchangeability of D α ). In particular, P X is distributed as a function of σ X for every X ∈ P k (X) for every k.
iii. Bayesian property [9, §3, Thm. 1]: Let W := (W 1 , . . . , W r ) be a sample of size r from P , conditionally i.i.d., and denote by D W σ the distribution of P given W, termed the posterior distribution of D σ given atoms W. Then, Discretizations. In order to consider finite-dimensional marginalizations of D βσ , we introduce the following discretization procedure (cf. [33] for a similar construction). Any partition X ∈ P k (X) induces a discretization of X to [k] by collapsing X i ∈ X to an arbitrary point in X i , uniquely identified by its index i ∈ [k], i.e. via the map pr X : The finite σ-algebra σ 0 (X) generated by X induces then a discretization of P(X) to the space P([k]) via the mapping µ → i µX i δ i . Since the latter space is in turn homeomorphic to the standard simplex ∆ k−1 via the mapping i y i δ i → y, every choice of X ∈ P k (X) induces a discretization of P(X) to ∆ k−1 via the resulting composition ev X = pr X . It is then precisely the content of (2.10) that any partition X as above induces a discretization of the tuple ((X, σ), (P(X), D βσ )) to the tuple (([k], α), (∆ k−1 , D α )), where α := β ev X σ is identified with the measure i α i δ i on [k] (cf. Fig. 1 below).
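The interplay between discretization and the Bayesian property can be illustrated on a toy example. The Python sketch below assumes the standard form of the posterior update, namely intensity $\beta\sigma \mapsto \beta\sigma + \sum_j \delta_{W_j}$ in the infinite-dimensional case and $\alpha \mapsto \alpha + p$ in the finite-dimensional one, and checks that the two updates are intertwined by the evaluation map $\mathrm{ev}^{\mathbf{X}}$. The base space (a six-point set), the partition and all function names are hypothetical choices made only for illustration.

```python
from collections import Counter
from fractions import Fraction


def discretize(measure, cell_of, k):
    """ev^X: map a finitely supported measure {point: mass} to the vector (mu X_1, ..., mu X_k)."""
    out = [Fraction(0)] * k
    for point, mass in measure.items():
        out[cell_of(point)] += mass
    return tuple(out)


def posterior_intensity(intensity, sample):
    """Assumed standard update: add a unit atom at each observed point W_j."""
    post = dict(intensity)
    for w in sample:
        post[w] = post.get(w, Fraction(0)) + 1
    return post


def posterior_alpha(alpha, observed_cells):
    """Finite-dimensional update alpha -> alpha + p, with p the vector of occurrences."""
    counts = Counter(observed_cells)
    return tuple(a + counts.get(i, 0) for i, a in enumerate(alpha))


if __name__ == "__main__":
    beta = Fraction(2)
    points = range(6)                      # toy base space X = {0, ..., 5}
    sigma = {x: Fraction(1, 6) for x in points}
    intensity = {x: beta * m for x, m in sigma.items()}
    cell_of = lambda x: x // 2             # partition into k = 3 cells (0-based indices)
    sample = [0, 1, 4]

    alpha = discretize(intensity, cell_of, 3)
    lhs = discretize(posterior_intensity(intensity, sample), cell_of, 3)
    rhs = posterior_alpha(alpha, [cell_of(w) for w in sample])
    print(alpha, lhs, rhs)
```

Here both routes give the parameter $(8/3, 2/3, 5/3)$: discretizing the posterior intensity agrees with updating the discretized parameter by the cell occurrences.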
Going further in this fashion, the subgroup S X of bi-measurable isomorphisms ψ of (X, B(X)) respecting X, i.e. such that ψ (X) := (ψ(X 1 ), . . . , ψ(X k )) = X up to reordering, is naturally isomorphic to the symmetric group S k , the bi-measurable isomorphism group Iso The canonical action of S X on X, corresponding to the canonical action of S k on [k], lifts to the action of S k on ∆ k−1 by permutation of its vertices, that is, to the action on P([k]) defined by π.y := π y under the identification of y with the measure i y i δ i .  The following result is a rather obvious generalization of the latter fact, obtained by substituting degeneracy maps with arbitrary maps. We provide a proof for completeness.
Remark 3.2. Assuming the point of view of conditional expectations rather than that of marginalizations, (2.10) may be restated as where σ 0 (X) denotes as before the σ-algebra generated by some partition X ∈ P k (X). The aggregation property (2.4) is but an instance of the tower property of conditional expectations, whereas its generalization (3.2) is a consequence of the σ-symmetry of D σ . Theorem 3.3 (Moments of D α ). Fix α > 0 and s ∈ R k . Then, the following identity holds The statement is equivalent toμ n =ζ n , which we prove in two steps.
Step 1. The following identity holds By induction on n with trivial (i.e. 1 = 1) base step n = 1. Inductive step. Assume for every α > 0 and s in R kμ If k ≥ 2, we can choose j = . Applying (3.6) to both sides of (3.4) yields where the latter equality holds by lettingμ −1 := 0. Letting now α := α + e j and applying the inductive hypothesis (3.5) with α in place of α yields for every j = . By arbitrariness of j = , the bracketed quantity is a polynomial in the sole variables s and α of degree at most n − 1 (obviously, the same holds also in the case k = 1). As a consequence (or trivially if k = 1), every monomial not in the sole variable s cancels out by arbitrariness of s, yielding The latter quantity is proved to vanish as soon as in fact a particular case of the well-known Chu-Vandermonde identity Step 2. It holds thatμ n =ζ n . By strong induction on n with trivial (i.e. 1 = 1) base step n = 0.
The inductive hypothesis, (3.4) and (3.6) yield By arbitrariness of j this implies thatζ n [s, α] −μ n [s, α] is constant as a function of s (for fixed α), hence vanishing by choosing s = 0.
Remark 3.4. Here, we gave an elementary combinatorial proof of the moment formula for $D_\alpha$, independently of any property of the distribution. Notice for further purposes that, defining $\mu_n[s, \alpha]$ as in (3.3), the statement holds with identical proof for all $\alpha$ in $\mathbb{C}^k$ such that $\alpha_\bullet \notin \mathbb{Z}^-_0$. For further representations of the moments see Remark 3.11 below.
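The moment formula lends itself to a direct numerical check. The sketch below assumes the identity in the form $\mathbb{E}\,(s \cdot Y)^n = \frac{n!}{\langle \alpha_\bullet \rangle_n}\, Z_n(s^1 \cdot \alpha, \dots, s^n \cdot \alpha)$ for $Y \sim D_\alpha$, where $s^j$ denotes the entrywise $j$-th power and $\langle a \rangle_n$ the rising factorial; this form is inferred from the moment expression in the proof of Theorem 3.10 and is an assumption of the example. It is compared against the multinomial expansion with the classical mixed moments $\mathbb{E}\, Y^m = \prod_i \langle \alpha_i \rangle_{m_i} / \langle \alpha_\bullet \rangle_n$. All function names are ours.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def rising(a, n):
    """Rising factorial <a>_n = a (a+1) ... (a+n-1)."""
    out = Fraction(1)
    for j in range(n):
        out *= a + j
    return out


def cycle_index(t):
    """Z_n(t) for n = len(t), via n Z_n = sum_j t_j Z_{n-j}, Z_0 = 1."""
    z = [Fraction(1)]
    for m in range(1, len(t) + 1):
        z.append(sum(Fraction(t[j - 1]) * z[m - j] for j in range(1, m + 1)) / m)
    return z[-1]


def moment_multinomial(s, alpha, n):
    """E[(s.Y)^n] by expanding (s.y)^n and using the mixed moments of Dirichlet(alpha)."""
    total = Fraction(0)
    for m in product(range(n + 1), repeat=len(s)):
        if sum(m) != n:
            continue
        term = Fraction(factorial(n))
        for si, ai, mi in zip(s, alpha, m):
            term *= Fraction(si) ** mi * rising(ai, mi) / factorial(mi)
        total += term
    return total / rising(sum(alpha), n)


def moment_cycle_index(s, alpha, n):
    """E[(s.Y)^n] through the cycle index polynomial (assumed form of the moment formula)."""
    power_sums = [sum(Fraction(a) * Fraction(si) ** j for si, a in zip(s, alpha))
                  for j in range(1, n + 1)]
    return factorial(n) * cycle_index(power_sums) / rising(sum(alpha), n)


if __name__ == "__main__":
    alpha = (Fraction(1, 2), Fraction(1, 3), Fraction(2))
    s = (1, -2, 3)
    for n in range(6):
        print(n, moment_multinomial(s, alpha, n), moment_cycle_index(s, alpha, n))
```

The two columns printed agree exactly for each $n$, since the computation is carried out over the rationals.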
Proposition 3.5. The function k Φ 2 [ts; 1; α] is the exponential generating function of the polynomials Z n , in the sense that, for all α ∈ ∆ k−1 , More generally, Proof. Recalling that k Φ 2 [α; α • ; s] = D α (s) by (2.9) and noticing that α • = 1, Theorem 3.3 provides an exponential series representation for the characteristic functional of the Dirichlet distribution in terms of the cycle index polynomials of symmetric groups, viz.
Replacing s with −its above and using (2.1) to extract the term t n from each summand, the conclusion follows. The second statement has a similar proof.
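The generating-function statement can be tested numerically in a case where a closed form is classical. The sketch below assumes the series $\sum_n Z_n(s^1 \cdot \alpha, \dots, s^n \cdot \alpha)\, t^n / n!$ for $\alpha_\bullet = 1$ (our reading of the Proposition) and compares it, for $\alpha = (1/2, 1/2)$ and $s = (1, 0)$, with the Laplace and Fourier transforms of the arcsine law on $[0,1]$, namely $e^{t/2} I_0(t/2)$ and $e^{i\tau/2} J_0(\tau/2)$. The function names and the truncation order are illustrative choices.

```python
import math


def cycle_indices(p):
    """Return [Z_0, ..., Z_N] evaluated at p = (p_1, ..., p_N), via n Z_n = sum_j p_j Z_{n-j}."""
    z = [1.0]
    for n in range(1, len(p) + 1):
        z.append(sum(p[j - 1] * z[n - j] for j in range(1, n + 1)) / n)
    return z


def char_functional_series(alpha, s, t, terms=40):
    """Partial sum of sum_n Z_n(s^1.alpha, ..., s^n.alpha) t^n / n!, assuming sum(alpha) == 1.

    The argument t may be real (Laplace transform) or complex (Fourier transform).
    """
    p = [sum(a * si ** j for a, si in zip(alpha, s)) for j in range(1, terms + 1)]
    z = cycle_indices(p)
    return sum(z[n] * t ** n / math.factorial(n) for n in range(terms + 1))


def bessel_i0(x, terms=30):
    """Modified Bessel function I_0 by its power series."""
    return sum((x / 2) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))


def bessel_j0(x, terms=30):
    """Bessel function J_0 by its power series."""
    return sum((-1) ** m * (x / 2) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))


if __name__ == "__main__":
    alpha, s = (0.5, 0.5), (1.0, 0.0)
    t = 1.3
    print(char_functional_series(alpha, s, t), math.exp(t / 2) * bessel_i0(t / 2))
    tau = 2.0
    print(char_functional_series(alpha, s, 1j * tau),
          complex(math.cos(tau / 2), math.sin(tau / 2)) * bessel_j0(tau / 2))
```

A further consistency check is the degenerate choice $s = (c, c)$, for which $s \cdot Y = c$ almost surely and the series reduces to $e^{tc}$.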
Remark 3.6. It is well-known that the characteristic functional of a measure µ on R d (or, more generally, on a nuclear space) is always positive definite, i.e. it holds that The following Lemma also appeared in [22].
Proof. Since D α is moment determinate, it suffices -by compactness of ∆ k−1 and Stone-Weierstraß Theorem -to show the convergence of its moments. By Theorem 3.3 (cf. also (2.1)), As a consequence of the Lemma further confluent forms of k Φ 2 may be computed:

3.2. Infinite-dimensional statements. Together with the introductory discussion, Proposition 3.1 suggests the following Mapping Theorem for $\mathcal{D}_\sigma$, to be compared with the analogous result for the Poisson random measure $P_\sigma$ over $(X, \mathcal{B}(X))$ (see e.g. [17, §2.3 and passim]). The σ-symmetry of $\mathcal{D}_{\beta\sigma}$ and the quasi-exchangeability and aggregation property of $D_\alpha$ are trivially recovered from the Theorem by (2.10).
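Theorem 3.9 (Mapping theorem for $\mathcal{D}_\sigma$). Let $(X, \tau(X), \mathcal{B}(X))$ and $(X', \tau(X'), \mathcal{B}(X'))$ be second countable locally compact Hausdorff spaces, $\nu$ a non-negative finite measure on $(X, \mathcal{B}(X))$ and $f \colon (X, \mathcal{B}(X)) \to (X', \mathcal{B}(X'))$ be any measurable map. Then,
$$(f_\sharp)_\sharp \mathcal{D}_\nu = \mathcal{D}_{f_\sharp \nu}.$$
Proof. Choosing $\mathbf{X} := (g^{-1}(1), \dots, g^{-1}(k))$, the characterization (2.11) is equivalent to the requirement that $(g_\sharp)_\sharp \mathcal{D}_\nu = D_{g_\sharp \nu}$ for any $g \colon X \to [k]$ such that every $\nu$-representative of $g$ is surjective, which makes $\mathbf{X}$ non-trivial for $\nu$. Denote by $S(X, \nu, k)$ the family of such functions and notice that if $h \in S(X', f_\sharp \nu, k)$, then $g := h \circ f \in S(X, \nu, k)$. The proof is now merely typographical:
$$(h_\sharp)_\sharp (f_\sharp)_\sharp \mathcal{D}_\nu = \big((h \circ f)_\sharp\big)_\sharp \mathcal{D}_\nu = D_{(h \circ f)_\sharp \nu} = D_{h_\sharp (f_\sharp \nu)},$$
where the second equality suffices to establish that $(f_\sharp)_\sharp \mathcal{D}_\nu$ is a Dirichlet-Ferguson measure by arbitrariness of $h$, while the third one characterizes its intensity as $f_\sharp \nu$.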
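In finite dimension, the Mapping Theorem reduces to the statement that pushing a Dirichlet vector forward along a surjection $g \colon [k] \to [m]$ yields a Dirichlet vector with pushed-forward parameter. The following Python sketch checks this at the level of moments, in exact arithmetic, using only the classical mixed moments of the Dirichlet distribution. The function names and the particular map $g$ (with 0-based indices) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def rising(a, n):
    """Rising factorial <a>_n."""
    out = Fraction(1)
    for j in range(n):
        out *= a + j
    return out


def moment(s, alpha, n):
    """E[(s.Y)^n] for Y ~ Dirichlet(alpha), via the multinomial theorem."""
    total = Fraction(0)
    for m in product(range(n + 1), repeat=len(s)):
        if sum(m) != n:
            continue
        term = Fraction(factorial(n))
        for si, ai, mi in zip(s, alpha, m):
            term *= Fraction(si) ** mi * rising(ai, mi) / factorial(mi)
        total += term
    return total / rising(sum(alpha), n)


def pushforward(vec, g, m):
    """Push a vector on [k] forward along g: [k] -> [m] (0-based): sum the entries over fibers."""
    out = [Fraction(0)] * m
    for i, v in enumerate(vec):
        out[g[i]] += v
    return tuple(out)


if __name__ == "__main__":
    alpha = (Fraction(1, 2), Fraction(1, 3), Fraction(2), Fraction(1))
    g = (0, 1, 0, 1)          # surjection [4] -> [2]
    s_target = (3, -1)        # test vector on the target simplex
    s_pulled = tuple(s_target[j] for j in g)
    for n in range(5):
        # s'.(g_# Y) = (s' o g).Y, so both sides are moments of the same random variable
        print(n, moment(s_pulled, alpha, n), moment(s_target, pushforward(alpha, g, 2), n))
```

Since the Dirichlet distribution is moment determinate, agreement of all such moments characterizes the law of the pushed-forward vector; the script only verifies finitely many of them.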
We denote by P(P(X)) the space of probability measures on (P(X), B n (P(X))), endowed with the narrow topology τ n (P(P(X))) induced by duality with C b (P(X)). We are now able to prove the following more general version of Theorem 1.1.
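Theorem 3.10 (Characteristic functional of $\mathcal{D}_{\beta\sigma}$). Let $(X, \tau(X), \mathcal{B}(X))$ be a second countable locally compact Hausdorff space, $\sigma$ a probability measure on $X$ and fix $\beta > 0$. Then, for every $f \in C_c(X)$ and $t \in \mathbb{R}$,
$$\widehat{\mathcal{D}_{\beta\sigma}}(t f^*) = \sum_{n=0}^{\infty} \frac{i^n t^n}{\langle \beta \rangle_n}\, Z_n\big(\beta\sigma f^1, \dots, \beta\sigma f^n\big), \tag{3.7}$$
where $\langle \beta \rangle_n := \beta(\beta+1)\cdots(\beta+n-1)$ denotes the rising factorial. Moreover, the map $\nu \mapsto \mathcal{D}_\nu$ is narrowly continuous on $\mathcal{M}^+_b(X)$.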
Proof. Characteristic functional. Fix f in C c and let (f h ) h be a good approximation of f , locally constant on X h := (X h,1 , . . . , X h,k h ) with values s h for some (X h ) h ∈ Na(X). Fix n > 0 and set α h := βσ X h . Choosing u : thus, by Dominated Convergence Theorem, continuity of Z n and arbitrariness of f , ∀f ∈ C c µ D βσ n [tf * ] =n! β −1 n Z n t 1 βσf 1 , . . . , t n βσf n , t ∈ R .
Using (2.1) to extract the term t n from Z n and substituting t with i t on the right-hand side, the conclusion follows by definition of exponential generating function.
Continuity. Assume first that (X, τ (X)) is compact. By compactness of (X, τ (X)), the narrow and vague topology on P(X) coincide and P(X) is compact as well by Prokhorov Theorem. Let (ν h ) h∈N be a sequence of finite non-negative measures narrowly convergent to ν ∞ . Again by Prokhorov Theorem and by compactness of P(X) there exists some τ n (P(P(X)))-cluster point D ∞ for the family {D ν h } h . By narrow convergence of ν h to ν ∞ , continuity of Z n and absolute convergence of D · (f ), it follows that lim h D ν h = D ν∞ pointwise on C c (X), hence, by Corollary 5.3, it must be D ∞ = D ν∞ .
In the case when X is not compact, recall the notation established in Proposition 2.2, denote by B(αX) the Borel σ-algebra of (αX, τ (αX)) and by P(αX) the space of probability measures on (αX, B(αX)). By the Continuous Mapping Theorem there exists the narrow limit τ n (P(X))lim h α ν h = α ν ∞ , thus, by the result in the compact case applied to the space (αX, B α ) together with the sequence α ν h , τ n (P(P(X)))-lim The narrow convergence of ν h to ν ∞ implies that α ν ∞ does not charge the point at infinity in αX, hence the measure spaces (X, B(X), ν * ) and (αX, B(αX), α ν * ) are isomorphic for * = h, ∞ via the map α, with inverse α −1 defined on im α αX. The continuity of α −1 and the Continuous Mapping Theorem together yield the narrow continuity of the map (α −1 ) . The conclusion follows by applying (α −1 ) to (3.8) and using the Mapping Theorem 3.9. In the case when ν h converges to ν ∞ in total variation, the continuity statement in the Theorem and the asymptotics for β → 0 in Corollary 3.13 below were first shown in [37,Thm. 3.2], relying on Sethuraman's stick-breaking representation. The following result was also obtained, again with different methods, in [37].  where, in the first case, δ : X → P(X) denotes the Dirac embedding x → δ x .
Proof. The existence of D 0 σ and D ∞ σ as narrow cluster points for {D βσ } β>0 follows by Corollary 3.12. Retaining the notation established in Theorem 3.10, Corollary 3.8 yields for all k hence the order of the limits in each left-hand side of (3.11) may be exchanged, for the convergence in k is uniform with respect to β. This shows (3.9).
By Theorem 3.10, βσ may be substituted with any sequence (β h σ h ) h with lim h β h = 0, ∞ and {σ h } h a tight family. Observe that, despite the similarity with Lemma 3.7, Corollary 3.13 is not a direct consequence of the former, since the evaluation map ev X is never continuous.
Let $Z^H_\beta := \langle \exp(-\beta H) \rangle$, $F_\beta := -\beta^{-1} \ln Z^H_\beta$ and $G_\beta := (Z^H_\beta)^{-1} \exp(-\beta H)$ respectively denote the partition function, the Helmholtz free energy and (the distribution of) the Gibbs measure of the system. It was heuristically argued in [34, §3.1] that, at least in the case when $(X, \mathcal{B}, \sigma)$ is the unit interval,
$$d\mathcal{D}_{\beta\sigma}(\eta) = \frac{1}{Z_\beta}\, e^{-\beta S(\eta)}\, d\mathcal{D}^*_\sigma(\eta),$$
where: $S$ is now an entropy functional (rather than an energy functional), $Z_\beta$ is a normalization constant and $\beta$ plays the rôle of the inverse temperature. Here, $\mathcal{D}^*_\sigma$ denotes a non-existing (!) uniform distribution on $\mathcal{P}(X)$. Borrowing again the terminology, this time in full generality, one can say that for small $\beta$ (i.e. large temperature) the system thermalizes towards the "uniform" distribution $\delta_\sharp \sigma$ induced by the reference measure $\sigma$ on the base space, while for large $\beta$ it crystallizes to $\delta_\sigma$, so that all randomness is lost. Consistently with property i of $\mathcal{D}_\sigma$, we see that $\mathbb{E}_{\mathcal{D}^\infty_\sigma} \eta_i = 0$ and $\mathbb{E}_{\mathcal{D}^0_\sigma} \eta_i = \delta_{i1}$ for all $i$, where $\delta_{ab}$ denotes the Kronecker symbol; in fact, both statements hold with probability 1.
It is worth noticing that a different interpretation for the parameter β has been given in [22], where the latter is regarded as a 'time' parameter in the definition of a PCOC.
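The two regimes can be observed already on a single cell. For a measurable set $A$ with $y := \sigma A \in (0,1)$, the marginalization along the partition $(A, A^{\mathrm{c}})$ makes $\eta A$ a Beta random variable with parameters $(\beta y, \beta(1-y))$, whose variance is $y(1-y)/(\beta+1)$. The Python sketch below (function names are ours) computes this variance exactly: for $\beta \to 0$ it approaches the variance $y(1-y)$ of the indicator $\mathbb{1}_A(x)$ with $x \sim \sigma$, in agreement with the limit $\delta_\sharp\sigma$, and for $\beta \to \infty$ it vanishes, in agreement with the limit $\delta_\sigma$.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial <a>_n."""
    out = Fraction(1)
    for j in range(n):
        out *= a + j
    return out


def second_moment(beta, y):
    """E[(eta A)^2] under the Dirichlet-Ferguson measure with intensity beta*sigma, y = sigma(A)."""
    return rising(beta * y, 2) / rising(beta, 2)


def variance(beta, y):
    """Var(eta A); the mean of eta A equals y for every beta > 0."""
    return second_moment(beta, y) - y ** 2


if __name__ == "__main__":
    y = Fraction(1, 3)
    for beta in (Fraction(1, 10 ** 6), Fraction(1), Fraction(10 ** 6)):
        print(beta, float(variance(beta, y)))
    # the limiting values are y(1-y) = 2/9 for beta -> 0 and 0 for beta -> infinity
```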
Remark 3.15. By the Continuous Mapping Theorem, both the continuity statement in Theorem 3.10 and the asymptotic expressions in Corollary 3.13 hold, mutatis mutandis, for every narrowly continuous image of $\mathcal{D}_{\beta\sigma}$, hence, for instance, for the entropic measure $\mathbb{P}^\beta_\sigma$ [34,41]. This generalizes [34, Prop. 3.14] and the discussion for the entropic measure thereafter.
Corollary 3.16 (Alternative construction of $\mathcal{D}_{\beta\sigma}$). Assume there exists a nuclear function space $S \subseteq C_0(X)$, continuously embedded into $C_0(X)$ and such that $S \cap C_c(X)$ is norm-dense in $C_0(X)$ and dense in $S$. Then, there exists a unique Borel probability measure on the dual space $S'$, namely $\mathcal{D}_{\beta\sigma}$, whose characteristic functional is given by the extension of (3.7) to $S$.
Proof. By the classical Bochner-Minlos Theorem (see e.g. [10, §4.2, Thm. 2]), it suffices to show that the extension to $S$, say $\chi$, of the functional (3.7) is a characteristic functional. By the convention in (2.2), $\chi(0_S) = \chi(0_{C_c(X)}) = 1$. The (sequential) continuity of $\chi$ on $S$ follows from that on $C_0(X)$ and the continuity of the embedding $S \subseteq C_0(X)$. It remains to show the positivity (see Rmk. 3.6) of $\chi$, which can be checked only on $S \cap C_c(X)$ by $\|\cdot\|$-density of the inclusion $S \cap C_c(X) \subseteq C_0(X)$. The positivity of $\chi$ restricted to $C_c(X)$ follows from the positivity of ${}_k\Phi_2$ in Remark 3.6 by approximation of $f$ with simple functions as in the proof of Theorem 3.10.
Remark 3.17. Let us notice that the assumption of Corollary 3.16 is satisfied, whenever X is (additionally) either finite (trivially), or a differentiable manifold, or a topological group (by the main result in [1]). In particular, when X = R d , we can choose S = S(R d ), the space of Schwartz functions on R d .
Since f * is τ n (P(X))-continuous for every f ∈ C b (X) and bounded by f , the map G is continuous.
Proof. The continuity of D β · is proven in Theorem 3.10. By e.g. [9,Thm. 3] for all f ∈ C c (X) one has D βσ f * = σf , hence G inverts D β · on its image.

Finite-dimensional statements.
Multisets. Given a set S, a (finite integer-valued ) S-multi-set is any function f : S → N 1 such that #f is finite, where # denotes integration on S with respect to the counting measure. We Recall that the number of [n]-multi-sets with cardinality r is r n /r! (see e.g. [40, §I.1.2]). • the symmetry property (2.5) when g = π ∈ S k and, more generally, Proposition 3.1; • the aggregation property (2.4); • the marginalization (2.10) (recall that pr X = ev X ); • the symmetry property (2.12) when f = ψ is measure preserving and, more generally, Theorem 3.9; the commutation of the solid sub-diagram delimited by the two dashed triangles corresponds to the requirement of Kolmogorov consistency. We say that two k-colorings where p k,i [t] := 1 · t i with 1 ∈ R k denotes the i th k-variate power sum symmetric polynomial.
In the following we consider an extension of PET to multisets of colors and explore its connections (arising in the case $G = \mathfrak{S}_n$) with the Dirichlet distribution $D_\alpha$. A different approach in terms of colorings, limited to the case $\alpha_\bullet = 1$, was briefly sketched in [16, §7].
Let s α be an integer-valued multiset with α ∈ R k + , henceforth a palette. As before, we understand the elements s 1 , . . . , s k of its underlying set

this is the coefficient of the monomial t

Corollary 4.3. Let S n,k,r denote the set of S n -equivalence classes ϕ • of α-shadings of [n] such that α ≥ 0 and α • = r. Then, the probability p α h 1 ,...,h k of some ϕ • uniformly drawn from S n,k,r having exactly h i occurrences of the i th color satisfies

Proof. The number of palettes with total number of shades r equals the number r n /n! of integer-valued [n]-multisets of cardinality r, thus, choosing r = α • , hence, by homogeneity

The conclusion follows by Corollary 4.2 and Theorem 3.3.
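The counting behind Corollary 4.3 can be checked by brute force for small integer palettes. The sketch below is ours: it enumerates all multisets of n shades drawn from a palette with α i shades of the i th color, tallies the color counts (h 1 , . . . , h k ), and compares the resulting frequencies with the Dirichlet-multinomial (Pólya) probability. That this closed form is the one displayed in Corollary 4.3 is an assumption on our part, based on the combinatorial argument in the proof; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial


def rising(a, n):
    """Rising factorial a (a + 1) ... (a + n - 1), with value 1 for n = 0."""
    out = 1
    for j in range(n):
        out *= a + j
    return out


def color_counts_by_enumeration(alpha, n):
    """Enumerate every multiset of n shades from a palette with alpha[i] shades
    of color i, and return the frequency of each color-count vector h."""
    color_of_shade = [i for i, a in enumerate(alpha) for _ in range(a)]
    tally = {}
    for shades in combinations_with_replacement(range(len(color_of_shade)), n):
        h = [0] * len(alpha)
        for s in shades:
            h[color_of_shade[s]] += 1
        tally[tuple(h)] = tally.get(tuple(h), 0) + 1
    total = sum(tally.values())
    return {h: Fraction(c, total) for h, c in tally.items()}


def polya_pmf(alpha, h):
    """Assumed closed form: n! / <alpha.>_n * prod_i <alpha_i>_{h_i} / h_i!."""
    n = sum(h)
    p = Fraction(factorial(n), rising(sum(alpha), n))
    for a, hi in zip(alpha, h):
        p *= Fraction(rising(a, hi), factorial(hi))
    return p


if __name__ == "__main__":
    alpha, n = (2, 1, 3), 4
    freqs = color_counts_by_enumeration(alpha, n)
    # Report whether enumeration and closed form agree on every count vector.
    print(all(p == polya_pmf(alpha, h) for h, p in freqs.items()))
```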
The study of D α in the case when α • = 1 is singled out as computationally easiest (as suggested by Theorem 3.3, noticing that the rising factorial ⟨1⟩ n equals n!), with α representing in that case a probability on [k], as detailed in §2. For these reasons, this is often the only case considered (cf. e.g. [16]). On the other hand, the general case when α > 0 is the one relevant in Bayesian non-parametrics, since posterior distributions of Dirichlet-categorical and Dirichlet-multinomial priors do not have probability intensity. The above coloring problem suggests that the case when α ∈ (Z + ) k is interesting from the point of view of PET, since it allows for some natural operations on palettes, corresponding to functionals of the distribution.
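To make the role of the total mass α • concrete, the following sketch (ours, with hypothetical function names) computes the n th moment of the linear functional s · y under D α in two ways: directly from the mixed moments of the Dirichlet distribution, and as n!/⟨α • ⟩ n times the cycle index Z n evaluated at the weighted power sums α · s j . The second expression is the classical generating-function identity, which we take to be the content of Theorem 3.3; this identification is an assumption, since the theorem is not restated here. When α • = 1 the prefactor n!/⟨1⟩ n equals 1 and the moment reduces to Z n itself.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def rising(a, n):
    """Rising factorial a (a + 1) ... (a + n - 1), with value 1 for n = 0."""
    out = 1
    for j in range(n):
        out *= a + j
    return out


def cycle_index_n(ts):
    """Z_n at ts = (t_1, ..., t_n) via m * Z_m = sum_j t_j * Z_{m-j}."""
    z = [Fraction(1)]
    for m in range(1, len(ts) + 1):
        z.append(sum(Fraction(ts[j - 1]) * z[m - j] for j in range(1, m + 1)) / m)
    return z[-1]


def moment_direct(alpha, s, n):
    """E[(s . y)^n] by multinomial expansion and Dirichlet mixed moments."""
    total = Fraction(0)
    for m in product(range(n + 1), repeat=len(alpha)):
        if sum(m) != n:
            continue
        term = Fraction(factorial(n))
        for a, x, mi in zip(alpha, s, m):
            term *= Fraction(rising(a, mi)) * Fraction(x) ** mi / factorial(mi)
        total += term
    return total / rising(sum(alpha), n)


def moment_cycle_index(alpha, s, n):
    """E[(s . y)^n] as n! / <alpha.>_n * Z_n(alpha . s, ..., alpha . s^n)."""
    ts = [sum(a * Fraction(x) ** j for a, x in zip(alpha, s)) for j in range(1, n + 1)]
    return factorial(n) * cycle_index_n(ts) / rising(sum(alpha), n)


if __name__ == "__main__":
    alpha = (Fraction(1, 2), Fraction(3, 2), Fraction(2))
    s = (1, -2, 3)
    # Report whether the two expressions agree for the first few moments.
    print(all(moment_direct(alpha, s, n) == moment_cycle_index(alpha, s, n)
              for n in range(6)))
```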
Indeed, we can change the number of colors and shades in a palette s α by composing any permutation of the indices [k] with the following elementary operations:
• (i) 'widen', respectively (ii) 'narrow the color spectrum', by adding a color, say s k+1 , respectively removing a color, say s k . That is, we consider new palettes (s ⊕ s k+1 ) α⊕α k+1 , respectively (s 1 , . . . , s k−1 ) (α 1 ,...,α k−1 ) ;
• (iii) 'reduce color resolution' by regarding two different colors, say s i and s i+1 , as the same, relabeled s i . In so doing we regard the shades of the former colors as distinct shades of the new one, so that it has α i + α i+1 shades. That is, we consider the new palette (sî) α +i ;
• (iv) 'enlarge', respectively (v) 'reduce the color depth', by adding a shade, say the (α i + 1) th , to the color s i , respectively removing a shade, say the α th i , from the color s i . We allow the latter operation only if α i > 1, so as to make it distinct from removing the color s i from the palette. That is, we consider the new palettes s α+e i , resp. s α−e i when α i > 1.
Increasing the color resolution of a multi-shaded color, say s k with α k > 1 shades, by splitting it into two colors, say s k and s k+1 with α′ k > 0 and α′ k+1 > 0 shades respectively and such that α′ k + α′ k+1 = α k , is not an elementary operation. It can be obtained by widening the spectrum of the palette by adding a color s k+1 with α′ k+1 shades and reducing the color depth of the color s k to α′ k . Thus, this operation is not listed above. We do not allow for the number of shades of a color to be reduced to zero: although this is morally equivalent to removing that color, the latter operation amounts more rigorously to removing the color placeholder from the palette.
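The elementary operations (i)-(v), and the fact that splitting a color is a composite of them, can be sketched on the level of the shade counts α alone. The Python encoding below is ours and purely illustrative: a palette is represented by the tuple (α 1 , . . . , α k ), indices are 0-based, and all function names are hypothetical.

```python
from typing import Tuple

Palette = Tuple[int, ...]  # alpha[i] = number of shades of color i (0-based)


def widen(alpha: Palette, shades: int) -> Palette:
    """(i) Add a new last color with the given number of shades."""
    return alpha + (shades,)


def narrow(alpha: Palette) -> Palette:
    """(ii) Remove the last color."""
    return alpha[:-1]


def reduce_resolution(alpha: Palette, i: int) -> Palette:
    """(iii) Merge colors i and i + 1; the merged color keeps all shades."""
    return alpha[:i] + (alpha[i] + alpha[i + 1],) + alpha[i + 2:]


def enlarge_depth(alpha: Palette, i: int) -> Palette:
    """(iv) Add one shade to color i, i.e. alpha + e_i."""
    return alpha[:i] + (alpha[i] + 1,) + alpha[i + 1:]


def reduce_depth(alpha: Palette, i: int) -> Palette:
    """(v) Remove one shade from color i, allowed only if alpha[i] > 1."""
    if alpha[i] <= 1:
        raise ValueError("reducing to zero shades is not allowed; use narrow")
    return alpha[:i] + (alpha[i] - 1,) + alpha[i + 1:]


def split_last(alpha: Palette, first: int, second: int) -> Palette:
    """Split the last color into two colors with `first` and `second` shades,
    as a composite of (i) followed by repeated applications of (v)."""
    if first <= 0 or second <= 0 or first + second != alpha[-1]:
        raise ValueError("need first, second > 0 with first + second = alpha[-1]")
    k = len(alpha) - 1
    out = widen(alpha, second)
    for _ in range(second):
        out = reduce_depth(out, k)
    return out


if __name__ == "__main__":
    palette = (2, 5)
    finer = split_last(palette, 3, 2)
    # Merging the two new colors again recovers the original palette.
    print(finer, reduce_resolution(finer, 1) == palette)
```

Note that every intermediate call to `reduce_depth` inside `split_last` acts on a color with more than one shade, so the restriction α i > 1 is never violated.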
The said elementary operations are of two distinct kinds: (i)-(iii) alter the number of colors in a palette, while (iv)-(v) fix it. We restrict our attention to the latter ones and ask how the probability p α h 1 ,...,h k changes under them. By Corollary 4.3 this is equivalent to studying the corresponding functionals of the n th moment of the Dirichlet distribution. For fixed k, we address all the moments at once, by studying the moment generating function

Namely, we look for natural transformations yielding the mappings

where C α is some constant, possibly dependent on α. Here 'natural' means that we only allow for meaningful linear operations on generating functions: addition, scalar multiplication by variables or constants, differentiation and integration. For practical reasons, it is convenient to consider the following construction.

Definition 4.4 (Dynamical symmetry algebra of k Φ 2 ). Denote by g k the minimal Lie algebra containing the linear span of the operators E ±1 , . . . , E ±k in (4.2) endowed with the bracket induced by their composition. Following [26], we term the Lie algebra g k the dynamical symmetry algebra of the function k Φ[α; s] := k Φ 2 [α; α • ; s], characterized below.

4.1.2. Dynamical symmetry algebras. We compute now the dynamical symmetry algebra of the function k Φ[α; s] := k Φ 2 [α; α • ; s], in this section always regarded as the meromorphic extension (2.9) of the Fourier transform of D α (s) in the complex variables α, s ∈ C k . The choice of complex variables is merely motivated by this identification and every result in the following concerned with complex Lie algebras holds verbatim for their split real form. For dynamical symmetry algebras of Lauricella hypergeometric functions see [26,27] and references therein; we refer to [13] for the general theory of Lie algebra (representations) and for the theory of Weyl groups.
Notation and definitions. Denote by E i,j , for i, j ∈ [k + 1], the canonical basis of Mat k+1 (C), with [E i,j ] m,n = δ mi δ nj , where δ ab is the Kronecker delta, and by A * the conjugate transpose of a matrix A. The following is standard.
Then, the complex Lie sub-algebra l k of gl k+1 (C) generated by these vectors is l k = sl k+1 (C), with sl 2 -triples Denote further by f k < l k the sub-algebra spanned by {e i,j , f j,i , h i,j } i,j∈ [k] . Then, f k ∼ = sl k (C).
Everywhere in the following we regard l k together with the distinguished Cartan sub-algebra h k < l k of diagonal traceless matrices spanned by the basis {h 0,j } j∈[k] ; the root system Ψ k induced by h k , with simple roots γ j corresponding to the sl 2 -triples of the vectors e j−1,j for j ∈ [k]; positive, resp. negative, roots Ψ ± k corresponding to the spaces of strictly upper, resp. strictly lower, triangular matrices n ± k . The inclusion f k < l k induces the decomposition of vector spaces (not of algebras)

The subscript k is omitted whenever apparent from context.
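For concreteness, the standard matrix realization of sl k+1 (C) can be checked mechanically. The sketch below is ours and rests on an assumption: we take the usual choice e i,j = E i,j , f j,i = E j,i and h i,j = E i,i − E j,j with indices 0, . . . , k, which may differ in normalization from the definition used in the text. It verifies the sl 2 -triple relations and, for the simple root vectors e j−1,j , the type A Serre relations between neighbouring and non-neighbouring nodes. Function names are hypothetical.

```python
def unit(n, i, j):
    """Matrix unit E_{i,j} of size n x n (0-based indices)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def bracket(a, b):
    """Commutator [a, b] = ab - ba."""
    return sub(matmul(a, b), matmul(b, a))


def sl2_triple(n, i, j):
    """Assumed triple (e, f, h) = (E_{i,j}, E_{j,i}, E_{i,i} - E_{j,j})."""
    return unit(n, i, j), unit(n, j, i), sub(unit(n, i, i), unit(n, j, j))


if __name__ == "__main__":
    k = 3
    n = k + 1
    zero = [[0] * n for _ in range(n)]
    e, f, h = sl2_triple(n, 0, 2)
    print(bracket(h, e) == scale(2, e),
          bracket(h, f) == scale(-2, f),
          bracket(e, f) == h)
    # Simple root vectors e_{j-1,j} and two Serre relations of type A.
    e1, e2, e3 = unit(n, 0, 1), unit(n, 1, 2), unit(n, 2, 3)
    print(bracket(e1, bracket(e1, e2)) == zero, bracket(e1, e3) == zero)
```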
For fixed α ∈ C k regard k Φ[α; · ] as a formal power series and let f α : C 2k+1 s,u,t −→ C be

Let A ⊆ C k . It is readily seen that the functions {f α } α∈A are (finitely) linearly independent, since so are the functions {f α (1, u, 1) ∝ u α } α∈A . Set

and define the following differential operators, acting formally on O,

where i, j ∈ [k], i ≠ j and ∇ y := (∂ y 1 , . . . , ∂ y k ) for y = u, s. Term the operators E α i , resp. E −α i , raising, resp. lowering, operators. Finally, let g k be the complex linear span of the operators (4.4) endowed with the bracket induced by their composition.
Actions on spaces of holomorphic functions. Let Λ α := α + Z k and set, for every ℓ ∈ R + ,

Proof. The statement on J α i is straightforward. Moreover,

Remark 4.7. The variables u and t are merely auxiliary (cf. [28, §1]). The operators do not depend on the parameter α; rather, the subscripts indicate which indices they affect. Heuristically, the action of the operators (4.4) given in Lemma 4.6 may be derived from that [26, (1.5)] of operators in the dynamical symmetry algebra of k F D by a formal contraction procedure [26, p. 1398], letting (in the notation of [26]) α = 0, β = α, γ = α • and dropping redundancies.
Remark 4.8. If α • = 1, the action of the lowering operators E −α i vanishes. This is natural when regarding f α as a formal power series, whereas it is conventional when regarding f α as a meromorphic function, for the functions (1 − α • )f α−e i are in fact, after cancellations, well-defined, not identically vanishing, and holomorphic in s even for α • = 1. The convention here reads 0 × ∞ = 0, which is consistent with the usual convention in measure theory when we identify α • − 1 with the quantity (σ − δ y )X for any y in X; the reason for such identification will be apparent in §4.3 below.
where i, j = 0, . . . , k and p, q = 1, . . . , k with i ≠ j, p ≠ q and, conventionally,

Proof. Given the action of the operators in (4.5) straightforward computations yield

Proposition 4.11. Let ρ : l k → End(O) be the linear map defined by

with j > i. Then, for any fixed α ∈ C k , the pair ρ α : Lie algebra representation of l k with image g k O Λα . Furthermore, the functions f α transform as basis vectors for ρ α , in the sense that for every v in the basis for l k and every α in

Proof. By Corollary 4.9, ρ α is a well-defined linear morphism into End(O Λα ). The fact that f α transforms as a basis vector of O Λα is an immediate consequence of Lemma 4.6. For α ∈ Λ α such that α > 1, the actions of operators in (4.4) on O α are mutually different again by Lemma 4.6, hence ρ α is injective. In order to show that ρ α l = g O Λα is a Lie algebra of type A k and that ρ α is a Lie algebra representation, it suffices to verify the Serre relations [13, §18.1] of type A for the operators ρ α v with v = v j in an sl 2 -triple corresponding to the simple root γ j in Ψ k . These are readily deduced from Lemma 4.10.

In order to prove (i)-(ii) it suffices to show that, for all α ∈ Λ + α and ℓ ∈ Z + , one has

, v in the basis of f, ℓ ∈ N 1 and w in the basis for h ⊕ r + . All of the above follow immediately from Lemma 4.6. Notably, since α • = 1, h acts on O α precisely by weight α.
Since α ∈ ∆ k−1 , then f α+p ( · , 1, 1) = D α+p ( · ). By the Bayesian property of D α the space O Hα is spanned precisely by the Fourier transforms of the form D p α . It remains to show that U(r + ).
The uniqueness of v follows by the fact that, since r + is Abelian, U(r + ) coincides with the (Abelian) symmetric algebra generated by r + (see [13, §17.2]). This proves (iii).
In order to show (iv), recall (e.g. [13, §12.1]) that the Weyl group W k of Ψ k is isomorphic to S k+1 and its action on Ψ k may be canonically identified as dual to the action of S k+1 on h k via conjugation by permutation matrices in P k+1 ∼ = S k+1 < GL(h k ) ∼ = GL k+1 (C). Let P 2:k+1 < GL k+1 (C) denote the subgroup of permutation matrices whose action on Mat k+1 (C) fixes the first row and column. Clearly S k ∼ = P 2:k+1 < P k+1 . Composing the isomorphism ρ α with the identification of the action of P k+1 above completes the proof.
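The identification of the Weyl group action used above can be illustrated numerically. In the sketch below (ours, with hypothetical function names) a permutation p of {0, . . . , k} acts on a diagonal traceless matrix by conjugation with the associated permutation matrix; the result is again diagonal and traceless, with permuted entries. The permutations fixing the index 0, which we take to model the subgroup P 2:k+1 fixing the first row and column, are exactly k! in number.

```python
from itertools import permutations
from math import factorial


def perm_matrix(p):
    """Permutation matrix P with P e_c = e_{p[c]}."""
    n = len(p)
    return [[1 if r == p[c] else 0 for c in range(n)] for r in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def diag(d):
    n = len(d)
    return [[d[r] if r == c else 0 for c in range(n)] for r in range(n)]


def conjugate(p, d):
    """Return P diag(d) P^{-1}; for a permutation matrix, P^{-1} = P^T."""
    P = perm_matrix(p)
    return matmul(matmul(P, diag(d)), transpose(P))


if __name__ == "__main__":
    k = 3
    d = [5, -2, -4, 1]  # a diagonal traceless matrix in h_k
    m = conjugate((1, 2, 0, 3), d)
    print([m[i][i] for i in range(k + 1)])
    stabilizer = [p for p in permutations(range(k + 1)) if p[0] == 0]
    print(len(stabilizer) == factorial(k))
```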

4.2. Invariant measures on simplices and affine spheres. In the following we lay out a comparison between the results obtained in the previous section and some known facts about the multiplicative infinite-dimensional Lebesgue measure L + [43]. For the reader's convenience, let us briefly recall the construction of L + given in [45].

to be compared with the density (2.3) of the Dirichlet distribution. The k · -invariance of the measures λ k,r k,β on rescaled affine spheres corresponds (a), in the finite-dimensional case, to the projective invariance of the measures L α with respect to the same action, with Radon-Nikodým derivative

for L α -a.e. y ∈ M k−1 and (b), in the infinite-dimensional case, to the projective invariance [43, 4.1] of L + β,σ with respect to the action of the group of multipliers exp(C c (X)) M + b (X) given by g.η := g · η where g = e h ∈ C + b (X) for some h ∈ C c (X). The Radon-Nikodým derivative satisfies in this case (see [43, 4.1])

The commutative action of h k . It is the content of Theorem 4.12(i) that the characteristic functionals of the measures D α , varying α ∈ int ∆ k−1 , are projectively invariant under the action of the maximal toral subalgebra h k < l k in the representation ρ α . Since h k acts on O α by weight α (see the proof of Thm. 4.12(i)), for arbitrary J t := t 1 J α 1 + · · · + t k J α k ∈ h k one has

The non-commutative action of l k and a family of distinguished improper priors. In contrast to the case of the measures L α on affine spheres, where only the action of the commutative subgroup dSL + k (R) < SL k (R) is taken into account, in the case of the Dirichlet distributions D α it is possible to detail the full non-commutative action of the algebra l k on their characteristic functionals. Incidentally, let us notice that the acting object, although of special linear type in both cases, is a (subgroup of a) Lie group in the first case, but the corresponding Lie algebra in the latter case. This is because the action is, in the first case, an action on measures themselves, whereas, in the second case, on their characteristic functionals.
If α ∈ int ∆ k−1 , then (a) the action of basis elements in r + k amounts to taking (characteristic functionals of) Dirichlet-categorical posteriors; it fixes the space O Hα of (characteristic functionals of) such posteriors. On the other hand, (b) the action of basis elements in r − k amounts to taking (characteristic functionals of) Dirichlet-categorical priors; such priors should be allowed to be improper, in the sense that they are no longer probability measures, but rather (in-)finite definite (i.e., positive or negative, not signed) measures. Indeed, if we let D̃ α be any such improper prior, with density given by (2.3) in the case when α ∈ Λ + α , then D̃ α has sign given by

The action of r − k fixes the space O Λ + α of (characteristic functionals of) all such priors and vanishes on the line M α,0 , the singular set of the normalization constant B[α ] −1 . Finally, (c) the action of basis elements in f k contains every non-trivial combination of the actions (a) and (b), and fixes isoplethic hypersurfaces M α,ℓ , i.e. those where the intensity α has constant total mass α • .
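The Bayesian reading of the raising action in (a) can be checked on mixed moments. The sketch below is ours, with hypothetical function names, and makes one assumption explicit: that the raising action in the i th direction corresponds to the parameter shift α → α + e i , i.e. to the Dirichlet-categorical posterior after one observation of the i th category. It verifies, in exact arithmetic, that size-biasing D α by y i yields D α+e i on monomials.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial a (a + 1) ... (a + n - 1), with value 1 for n = 0."""
    out = 1
    for j in range(n):
        out *= a + j
    return out


def mixed_moment(alpha, m):
    """E[prod_i y_i^{m_i}] under the Dirichlet distribution with parameter
    alpha, namely prod_i <alpha_i>_{m_i} / <alpha.>_{|m|}."""
    num = Fraction(1)
    for a, mi in zip(alpha, m):
        num *= rising(a, mi)
    return num / rising(sum(alpha), sum(m))


def posterior(alpha, i):
    """Parameter of the posterior after observing category i: alpha + e_i."""
    return tuple(a + 1 if j == i else a for j, a in enumerate(alpha))


def size_biased_moment(alpha, i, m):
    """E[y_i * y^m] / E[y_i] under the Dirichlet distribution with parameter alpha."""
    e_i = tuple(1 if j == i else 0 for j in range(len(alpha)))
    m_plus = tuple(a + b for a, b in zip(m, e_i))
    return mixed_moment(alpha, m_plus) / mixed_moment(alpha, e_i)


if __name__ == "__main__":
    alpha = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))  # a point of int Delta^2
    m = (2, 0, 3)
    print(size_biased_moment(alpha, 1, m) == mixed_moment(posterior(alpha, 1), m))
    # The posterior intensity has total mass alpha. + 1, hence is not a probability.
    print(sum(posterior(alpha, 1)))
```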
In this framework, the case α ∈ bd ∆ k−1 is spurious, since the intensity measure α should always be assumed fully supported.

4.3. Infinite-dimensional statements. For a ∈ R we denote by M >a b (X) the space of finite signed measures ν in M b (X) such that νX > a.
Theorem 4.13. Let (X, τ (X), B(X)) be a second countable locally compact Hausdorff space and ν be a diffuse fully supported non-negative finite measure on X. Let further

and

Then,

(iii) let σ be a diffuse fully supported probability measure on (X, τ (X)) and let further (X h ) h ∈ Na(X, τ (X), σ). For σ-a.e. x, such that X h,i h ↓ h {x}, and for every good approximation (f h ) h of f , locally constant on X h and uniformly convergent to f , there exist the pointwise limiting rescaled actions

Proof. The functional Φ[ν, f ] is well-defined in the first place since νX > 0. For c, t > 0 denote by P c,t ⊆ R n the polydisk {y ∈ R n : |y i | ≤ ct i }. By induction and (2.2) it is not difficult to show that max Pc,t |Z n | = Z n [c(t 1) n ]; moreover, by (2.1) and Theorem 3.3, the latter equals t n c n /n!. As a consequence, for arbitrary ν in M >0 b (X) and f ∈ C c (X), letting y i := νf i above,

Let now A be in B and (X h ) h as in (ii). Fix f in C c (X), set α h := ν X h and let (f h ) h be a good approximation of f , locally constant on X h with values s h . Equation (4.5) yields by summation

More explicitly, since f h is constant on each X h,i with value s h,i , Proposition 3.5 yields

Since |f h | ≤ |f | pointwise, the sequence (f i h ) h converges strongly in L 1 ν for every i ≤ n and every n ∈ N 1 , thus by continuity of Z n , there exists the limit

The proof of the statement for E A,−B is analogous. This completes the proof of (ii). The requirement that νX > 1 is necessary for the convergence of Φ[ν − δ y , f ] for y ∈ X in the definition of E A,−B , whereas it may be relaxed to νX > 0 in the case of E A . We will make use of this fact in the proof of (iii).
Fix now x in X and let i h := i h (x) be such that X h,i h ↓ h {x}. By Lemma 5.1, the sequence (i h ) h is unique for σ-a.e. x. With the same notation as in (ii), let now A = X h,i h in (4.7). Then,

thus, (4.8) and (4.9) yield, together with the continuity of y → D σ+δy (f * ) for fixed f and σ,

By the Bayesian property D x σ = D σ+δx , this yields the conclusion for the limiting raising action. Finally, since σ is a probability measure, (α h ) • = 1 for all h, thus by Lemma 4.10,

where the second equality for the first limiting action follows by (3.11). In all three cases, independence of the limits from the chosen (good) approximation is straightforward.

Appendix.
We collect here some results in topology and measure theory.
Lemma 5.1. Let (X, τ (X), B, σ) be a second countable locally compact Hausdorff Borel measure space of finite diffuse fully supported measure. Then, for every (X h ) h ∈ Na(X, τ (X), σ) and for σ-a.e. x in X there exists a unique sequence (X h,i h ) h , with i h := i h (x), such that X h ∋ X h,i h ↓ h {x}.
Proof. Proposition 2.2 justifies well-posedness of the requirements in the definition of (X h ) h . Without loss of generality, each X h,i may be chosen to be closed by replacing it with its closure cl X h,i = X h,i ∪ bd X h,i . Hence X h may be chosen to consist of closed sets (disjoint up to a σ-negligible set) with non-empty interior. It follows by the finite intersection property that every decreasing sequence of sets (X h,i h ) h such that X h,i h ∈ X h admits a non-empty limit, which is a singleton because of the vanishing of diameters. Vice versa, for any choice of (X h ) h and every point x in X it is not difficult to construct a (possibly non-unique) sequence X h,i h (with i h := i h (x)) convergent to x and such that X h,i h ∈ X h . Furthermore, letting x be a point for which there exists more than one such sequence, we see that for every h the point x belongs to some intersection X h,i 1 ∩ X h,i 2 ∩ . . . , hence, since every partition has disjoint interiors by construction, x ∈ bd X h,i 1 ∩ bd X h,i 2 ∩ . . . . Since for every h and i ≤ k h each set X h,i is a continuity set for σ, the whole union ∪ h≥0 ∪ i∈[k h ] bd X h,i is σ-negligible, thus so is the set of points x considered above, so that for σ-a.e. x there exists a unique sequence (X h,i h ) h such that X h,i h ∈ X h and lim h X h,i h = {x} and x belongs to each X h,i h in the sequence.
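A simple instance of the lemma, offered here only as an illustration of ours, is X = [0, 1] with the Lebesgue measure and the closed dyadic intervals [i/2^h , (i + 1)/2^h ] as cells of X h . We assume, without checking it against the definition (which is not restated in this section), that this family belongs to Na(X, τ (X), σ). The sketch below lists the cells containing a given point: for a non-dyadic point there is exactly one cell per level and the cells are nested, whereas a dyadic rational lies on a common boundary of two cells from some level on, which is the σ-negligible exceptional set of the proof. Function names are hypothetical.

```python
from fractions import Fraction


def cells_containing(x, h):
    """Indices i of the closed dyadic cells [i / 2^h, (i + 1) / 2^h] containing x."""
    n = 2 ** h
    return [i for i in range(n) if Fraction(i, n) <= x <= Fraction(i + 1, n)]


def is_nested(indices):
    """Check that the cell at level h + 1 is contained in the cell at level h,
    i.e. that halving the index at level h + 1 gives the index at level h."""
    return all(indices[h + 1] // 2 == indices[h] for h in range(len(indices) - 1))


if __name__ == "__main__":
    x = Fraction(1, 3)  # not a dyadic rational
    seq = [cells_containing(x, h) for h in range(7)]
    print(all(len(c) == 1 for c in seq), is_nested([c[0] for c in seq]))
    y = Fraction(1, 2)  # a dyadic rational: two cells from level 1 on
    print([len(cells_containing(y, h)) for h in range(5)])
```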
Finally, recall the following form of Lévy's Continuity Theorem.

Let (Y, τ (Y )) be a completely regular Hausdorff topological space, V be a linear subspace of C(Y ) separating points in Y and χ be a complex-valued functional on V . If (µ γ ) γ is a narrowly precompact net of Radon probability measures on (Y, B(Y )) and lim γ µ γ (v) = χ(v) for every v in V , then (µ γ ) γ converges narrowly to a Radon probability measure µ whose characteristic functional coincides with χ.