Large Deviations for Non-Crossing Partitions

We prove a large deviations principle for the empirical law of the block sizes of a uniformly distributed non-crossing partition. As an application we obtain a variational formula for the maximum of the support of a compactly supported probability measure in terms of its free cumulants, provided these are all non-negative. This is useful in free probability theory, where sometimes the R-transform is known but cannot be inverted explicitly to yield the density.


Introduction
In this paper we study the block structure of a non-crossing partition chosen uniformly at random. Any partition π of the set n = {1, . . . , n} can be represented on the circle by marking the points 1, . . . , n and connecting by a straight line any two points whose labels are in the same block of the partition. The partition is then said to be non-crossing if none of the lines intersect. These objects were introduced by G. Kreweras [13] and have been studied in the combinatorics literature as an example of a Catalan structure.
We study the empirical measure defined by the blocks of a uniformly random non-crossing partition π. That is, if π has r blocks of sizes B_1, . . . , B_r we consider the random probability measure on N defined by $\lambda_n = \frac{1}{r} \sum_{j=1}^{r} \delta_{B_j}$.
We will prove that the sequence (λ_n)_{n∈N} satisfies a large deviations principle of speed n on the space M_1(N) of probability measures on the natural numbers. This result is obtained via a construction of a uniformly random non-crossing partition from suitably conditioned independent geometric random variables. As a stepping-stone we establish a joint large deviations principle for the process versions of the empirical mean and measure of that independent sequence.
A main application of the large deviations result comes from free probability theory. Often one can obtain the free cumulants of a non-commutative random variable. These cumulants characterise the underlying distribution, but obtaining the density involves locally inverting an analytic function which may not lead to a closed-form expression. In such a situation one would still hope to deduce some properties of the underlying probability measure, for example about its support.
The free analogue of the moment-cumulant formula expresses the moments of a noncommutative random variable in terms of its free cumulants. More precisely the moments can be written as the expectation of an exponential functional (defined in terms of the free cumulants) of a non-crossing partition, chosen uniformly at random. Knowing the large deviations behaviour of the latter allows us to apply Varadhan's lemma to describe the asymptotic behaviour of the moments. This in turn yields the maximum of the support of the underlying distribution in terms of the free cumulants.

Statement of Results
Our first main result is the large deviations property of the random measures (λ n ) n∈N . In the form we are stating it here it is a direct corollary of Theorem 4.1.
Theorem 1.1. The sequence (λ_n)_{n∈N} satisfies a large deviations principle in M_1(N) with a good convex rate function J, expressed in terms of the entropy H(µ) of a probability measure µ and its mean m_1(µ).
Since J(ν) = 0 if and only if ν is the geometric distribution G_2 with parameter 1/2, we obtain a law of large numbers as an immediate corollary: λ_n → G_2 almost surely in the weak topology.
In the proof of the theorem we need to work with the function m_1(µ), which is not continuous in the weak topology. As a stepping-stone we therefore establish a joint large deviations principle for the path versions of empirical mean and measure of i.i.d. geometric random variables. Theorem 1.1 has an application in free probability. Namely it allows us to express the maximum of the support of a compactly supported probability measure in terms of its free cumulants, provided the latter are non-negative. For some background on free probability see Section 5 and the references given there.
Theorem 5.9. Let µ be a compactly supported probability measure whose free cumulants (k_n)_{n∈N} are all non-negative. Then the maximum ρ_µ of the support of µ is given by
$$\log(\rho_\mu) = \sup\Big\{ \frac{1}{m_1(p)} \sum_{n \in L} p_n \log\frac{k_n}{p_n} - \frac{\Theta(m_1(p))}{m_1(p)} : p \in M_1^1(L) \Big\},$$
where L is the set of n ∈ N such that k_n ≠ 0 and M_1^1(L) denotes the set of p ∈ M_1^1(N) with p(L^c) = 0.
Acknowledgements. The author would like to thank his PhD advisor, Neil O'Connell, for his advice and support in the preparation of this paper. We also thank Philippe Biane, Jon Warren and Nikolaos Zygouras for helpful discussions and suggestions.

Uniformly Random Non-Crossing Partitions
In this section we introduce the combinatorial objects mentioned in the introduction. We describe how to generate the uniform distribution on the set of Dyck paths or non-crossing partitions using two sequences of independent and identically distributed geometric random variables. This construction will be used in Section 4 to prove the large deviations result, Theorem 4.1.

Catalan Structures
A Dyck path of semilength n is a lattice path in Z² that never falls below the horizontal axis, starting at (0, 0) and ending at (2n, 0), consisting of steps (1, 1) (upsteps) and (1, −1) (downsteps). Every such path consists of exactly n upsteps and n downsteps. The set of Dyck paths of semilength n is denoted by P(n). A maximal sequence of upsteps is called an ascent, while a maximal sequence of downsteps is referred to as a descent.
Non-crossing partitions were introduced by G. Kreweras [13]. A partition π of the set n = {1, . . . , n} is said to be crossing if there exist distinct blocks V 1 , V 2 of π and x j , y j ∈ V j such that x 1 < x 2 < y 1 < y 2 . Otherwise π is said to be non-crossing. Equivalently label the vertices of a regular n-gon 1, . . . , n, then π is non-crossing if and only if the convex hulls of the blocks are pairwise disjoint. The set of all non-crossing partitions of n is denoted by NC(n). Combinatorial results on non-crossing partitions, including instances where they arise in topology and mathematical biology can be found in R. Simion's survey [22]. For other areas of mathematics where non-crossing partitions appear see also McCammond [15]. The role of non-crossing partitions in free probability is detailed in Section 5.
There exists a well-known bijection Φ : P(n) → NC(n) which maps the descents of p ∈ P(n) to the blocks of Φ(p); see for example Callan [6] or Yano-Yoshida [26]. Given p ∈ P(n), label the upsteps from left to right by 1, . . . , n. Label each downstep by the same index as its corresponding upstep, that is, the first upstep to the left on the same horizontal level. Then the descents induce an equivalence relation on n: two labels are equivalent if and only if the corresponding downsteps are part of the same descent. The associated partition is then easily seen to be non-crossing.
Conversely, given π = {V 1 , . . . , V r } ∈ NC(n) write the elements of each block V j in descending order, then sort the blocks in ascending order by their least elements. This gives the descent structure of Φ −1 (π), which can be completed by the ascents in a unique way to form a Dyck path.
The common cardinality of P(n) and NC(n) is $C_n = \frac{(2n)!}{n!\,(n+1)!}$, the n-th Catalan number. Such combinatorial objects are referred to as Catalan structures and have been much studied. A list of Catalan structures has been compiled by R. Stanley [24], where many results and references on Catalan structures can also be found.
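The bijection Φ and the Catalan count can be checked by brute force for small n. The following Python sketch (our own illustration; all function names are ours) enumerates Dyck paths, applies the descent-to-block map described above, and verifies that the images are distinct non-crossing partitions, C_n in number:

```python
from math import comb

def dyck_paths(n):
    """Enumerate Dyck paths of semilength n as tuples of +1 (up) / -1 (down)."""
    def rec(path, height, ups, downs):
        if ups == 0 and downs == 0:
            yield tuple(path)
            return
        if ups > 0:
            yield from rec(path + [1], height + 1, ups - 1, downs)
        if downs > 0 and height > 0:
            yield from rec(path + [-1], height - 1, ups, downs - 1)
    return rec([], 0, n, n)

def to_noncrossing(path):
    """Phi: label upsteps 1..n; a downstep inherits the label of the upstep
    it closes; maximal runs of downsteps (descents) become blocks."""
    stack, blocks, label = [], [], 0
    for i, step in enumerate(path):
        if step == 1:
            label += 1
            stack.append(label)
        else:
            lbl = stack.pop()
            if path[i - 1] == -1:
                blocks[-1].add(lbl)      # same descent -> same block
            else:
                blocks.append({lbl})     # a new descent opens a new block
    return frozenset(frozenset(b) for b in blocks)

def is_noncrossing(partition):
    """No a < c < b < d with a, b in one block and c, d in another."""
    blocks = [sorted(b) for b in partition]
    for V in blocks:
        for W in blocks:
            if V is W:
                continue
            for a in V:
                for b in V:
                    for c in W:
                        for d in W:
                            if a < c < b < d:
                                return False
    return True

def catalan(n):
    return comb(2 * n, n) // (n + 1)
```

For instance, the path with steps U U D D maps to the one-block partition {{1, 2}}, while U D U D maps to {{1}, {2}}.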
Of course our results extend to any statistic s of any other Catalan structure σ for which there exists a bijection Ψ : σ −→ NC(n) that maps s to the blocks of the image partition. Examples include the blocks of non-nesting partitions (see Reiner [20], Remark 2) or the length of chains in ordered trees as described in Prodinger [19].

A Representation for the Uniform Measure
Since the sets NC(n) and P(n) are finite there exists a uniform distribution on them. Let w have this distribution on P(n). Such a random variable is also referred to as a Bernoulli excursion. We will study the descent structure of w. Because of the above bijection this is equivalent to studying the blocks of a uniformly random element of NC(n).
The number of Dyck paths of semilength n with exactly k descents is given [10] by the Narayana number $N(n,k) = \frac{1}{n}\binom{n}{k}\binom{n}{k-1}$. Therefore the expected number of descents in w is (n + 1)/2. Further it follows from results in [26] (p. 3153) that the expected number of descents of length 1 is given by (n² + n)/(4n − 4). Asymptotically we therefore have about n/2 descents, roughly half of which are singletons. However there do not seem to be asymptotic results beyond the singleton descents in the literature. Heuristic arguments suggest that about half of the remaining descents are of length 2 and so on, and indeed, the law of large numbers mentioned in Section 1 (Corollary 4.3) confirms that this is the case.
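Both Narayana identities used above, that the N(n, k) refine the Catalan numbers and that the mean number of descents is (n + 1)/2, are easy to verify numerically. A short sketch (function names are ours):

```python
from math import comb

def narayana(n, k):
    # N(n, k) = (1/n) C(n, k) C(n, k-1): the number of Dyck paths of
    # semilength n with exactly k descents
    return comb(n, k) * comb(n, k - 1) // n

def catalan(n):
    return comb(2 * n, n) // (n + 1)
```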
We now construct a Bernoulli excursion using conditioned geometric random variables. For any n ∈ N let $b_n : \mathbb{N}^{2n} \to \bigcup_{k\in\mathbb{N}} P(k)$ (where P(k) here denotes the set of all lattice paths of length 2k on Z, starting at zero and consisting of steps (1, 1) and (1, −1)) denote the map that reconstructs a path from a sequence of ascents and descents. That is, b_n(x_1, y_1, . . . , x_n, y_n) is the path described by x_1 upsteps, y_1 downsteps, then x_2 upsteps and so on, terminating with y_n downsteps.
Let X_n, Y_n be i.i.d. random variables with common law given by the geometric distribution with parameter 1/2. We will view these as the successive ascents and descents of a simple random walk Σ on R starting at 0 with an upstep. The first n ascents and descents together take $\sum_{j=1}^{n}(X_j + Y_j)$ steps in total; let τ_n be the number of descents completed after 2n steps of the simple random walk. We will later work with a renormalisation of τ_n, namely $\bar\tau_n = \tau_n/(2n)$. We denote by E_n the event (2.1) that $b_{\tau_n}(X_1, Y_1, \ldots, X_{\tau_n}, Y_{\tau_n})$ is a Dyck path of semilength n. The following lemma is now straightforward to check.

Lemma 2.2.
Conditioned on E_n the distribution of $b_{\tau_n}(X_1, Y_1, \ldots, X_{\tau_n}, Y_{\tau_n})$ on P(n) is uniform. Hence, conditioned on E_n, the random measure $\lambda_n = \frac{1}{\tau_n}\sum_{j=1}^{\tau_n} \delta_{Y_j}$ is the empirical measure of the descents of a Bernoulli excursion or, equivalently, of the block sizes of a uniformly random element of NC(n).
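A direct, if inefficient, rendering of this construction is rejection sampling: draw i.i.d. Geometric(1/2) ascents and descents and accept as soon as the event E_n occurs. The sketch below (our own illustration, with hypothetical function names) does exactly this:

```python
import random

def geometric_half(rng):
    """Geometric(1/2) on {1, 2, ...}: fair-coin flips up to and including
    the first head."""
    k = 1
    while rng.random() < 0.5:
        k += 1
    return k

def uniform_dyck_path(n, rng):
    """Rejection sampling: build a path from i.i.d. Geometric(1/2) ascents
    X_j and descents Y_j, and accept when the first 2n steps form a Dyck
    path (the walk never goes below zero and sits at zero at time 2n)."""
    while True:
        steps, height, ok = [], 0, True
        while len(steps) < 2 * n and ok:
            x, y = geometric_half(rng), geometric_half(rng)
            steps += [1] * x + [-1] * y
            height += x - y
            if height < 0:
                ok = False          # walk dipped below the axis: reject
        if ok and height == 0 and len(steps) == 2 * n:
            return steps
```

By Lemma 2.2 each accepted sample is uniform on P(n); the acceptance probability decays only polynomially in n (of order n^{−3/2}), so this is practical for moderate n.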

Process Level Large Deviations
Let (X_n)_{n∈N} be an i.i.d. sequence of geometric random variables with parameter 1/2 and denote their common law by G_2. We define processes S_n, L_n, indexed by the unit interval and taking values in the real numbers and in the positive finite measures on N respectively, by
$$S_n(t) = \frac{1}{n}\sum_{j=1}^{\lfloor nt \rfloor} X_j, \qquad (3.1)$$
$$L_n(t) = \frac{1}{n}\sum_{j=1}^{\lfloor nt \rfloor} \delta_{X_j}. \qquad (3.2)$$
In this section we prove a large deviations principle for the pair (S_n, L_n). We start by proving a joint LDP for the pair of end-points (S_n(1), L_n(1)) via a projective limit argument. We then adapt arguments from Dembo-Zajic [7] to obtain the path-wise result.
Remark 3.3. The reason for obtaining this joint large deviations principle is that our main large deviations result requires the mean as well as the empirical measure, but the map µ → m_1(µ) is not continuous in the weak topology. An alternative would have been to strengthen the topology on M_+(N) a priori to the Monge-Kantorovich topology, the coarsest topology that makes the map m_1 continuous and is finer than the weak topology. However, results by Schied [21] show that in this topology Sanov's theorem does not hold for geometric random variables, because this distribution does not possess all exponential moments.
Let us first recall the definition of a large deviations principle. For background on large deviations theory see for example Dembo-Zeitouni [8].
Definition 3.4. A sequence of measures (µ_n)_{n∈N} on a Polish space is said to satisfy a large deviations principle of speed a = (a_n)_{n∈N} with rate function I if a is a strictly increasing sequence diverging to ∞, I is lower semi-continuous with compact level sets, and
$$\liminf_{n\to\infty} \frac{1}{a_n} \log \mu_n(G) \ge -\inf_G I \qquad (3.5)$$
for every open set G, while
$$\limsup_{n\to\infty} \frac{1}{a_n} \log \mu_n(F) \le -\inf_F I \qquad (3.6)$$
for every closed set F. (3.5) and (3.6) are often referred to as the large deviations lower bound and upper bound respectively.
All the large deviations principles considered in this paper will be of speed n, that is a n = n for all n ∈ N.
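To make these objects concrete for the law G_2 used throughout: the log-moment generating function of a Geometric(1/2) variable is Λ(λ) = λ − log(2 − e^λ) for λ < log 2, and its Legendre transform (the Cramér rate function) has the closed form (x − 1) log(x − 1) − x log x + x log 2 for x > 1. The sketch below (our own check; function names are ours) verifies this numerically, and also that the relative entropy H(G_q | G_{1/2}) of the geometric law with mean x agrees with it, as one expects from contracting Sanov's theorem to Cramér's:

```python
from math import exp, log

def cgf(lam):
    # log E[exp(lam X)] for X ~ Geometric(1/2) on {1,2,...}; finite for lam < log 2
    return lam - log(2.0 - exp(lam))

def rate_numeric(x, grid=200000):
    # Legendre transform: sup over lam < log 2 of lam*x - cgf(lam), on a grid
    lo, hi = -10.0, log(2.0) - 1e-9
    return max((lo + i * (hi - lo) / grid) * x - cgf(lo + i * (hi - lo) / grid)
               for i in range(grid))

def rate_closed(x):
    # (x-1) log(x-1) - x log x + x log 2, valid for x > 1
    return (x - 1) * log(x - 1) - x * log(x) + x * log(2.0)

def entropy_vs_g2(x):
    # relative entropy H(G_q | G_{1/2}) of the geometric law with mean x = 1/q
    q = 1.0 / x
    return log(q) + (x - 1) * log(1.0 - q) + x * log(2.0)
```

The rate vanishes exactly at the mean x = 2 of G_2, matching the law of large numbers.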

Joint Sanov and Cramér
Denote by M_+(N) the space of finite measures on N and let M_1(N) be the subset of probability measures. We equip M_+(N) with the topology of weak convergence. This topology is induced by the complete separable metric β given for µ, ν ∈ M_+(N) by
$$\beta(\mu, \nu) = \sup\Big\{ \Big|\int f \, d\mu - \int f \, d\nu\Big| : \|f\|_L \le 1 \Big\}, \qquad (3.7)$$
where ∥·∥_L denotes the Lipschitz norm. So M_+(N) is a Polish space, and so is M_1(N) when equipped with the subspace topology; see Appendix A of Dembo-Zajic [7]. Our goal here is to establish a joint large deviations principle on Y := R × M_+(N) for the empirical mean and measure of the X_n. To be more precise, define random elements S_n := S_n(1) ∈ R and L_n := L_n(1) ∈ M_1(N). By Cramér's theorem and Sanov's theorem respectively, the laws of S_n and L_n already satisfy large deviations principles on R and M_1(N) individually. The point here is to show that this also holds for the pair. Recall that m_1(µ) denotes the mean of a probability measure µ.
Proposition 3.8. Let η_n denote the law of (S_n, L_n). Then (η_n)_{n∈N} satisfies a large deviations principle in Y with good rate function I_1 given by
$$I_1(x, \mu) = \begin{cases} H(\mu \,|\, G_2) & \text{if } \mu \in M_1(\mathbb{N}) \text{ and } m_1(\mu) = x,\\ +\infty & \text{otherwise,}\end{cases} \qquad (3.9)$$
where H(·|·) denotes the relative entropy of two probability measures, i.e. $H(\mu|\nu) = \sum_m \mu_m \log\frac{\mu_m}{\nu_m}$, and $H(p) = -\sum_m p_m \log(p_m)$ is the entropy of a probability measure p.
Proof. Recall that the weak topology on M_1(N) is induced by the dual action of the space C_b(N) of bounded continuous functions on N. For f_1, . . . , f_d ∈ C_b(N) the random vectors (X_n, f_1(X_n), . . . , f_d(X_n)) ∈ R^{d+1} are independent and identically distributed, so by Cramér's theorem [8, Corollary 6.1.6] their laws satisfy a large deviations principle on R^{d+1} with a good convex rate function. The idea is now to take a projective limit approach, following closely Section 4.6 in [8].
We first construct a suitable projective limit space in which R × M + (N) can be embedded. The proposition will follow from an application of the Dawson-Gärtner theorem.
Denote W = C_b(N) and let W′ be its algebraic dual, equipped with the τ(W′, W)-topology, that is, the weakest topology making the maps W′ ∋ f → f(w) ∈ R continuous for all w ∈ W. Let further J be the set of finite-dimensional subspaces of W, partially ordered by inclusion; this yields a projective system (Y_V, p_{U,V} : U ⊆ V ∈ J). Denote by X its projective limit, equipped with the subspace topology inherited from the product topology. Let further X̃ = R × W′ and let Φ denote the natural map between X̃ and X. This is clearly a bijection. Using the definition of the weak topology in terms of open balls, as in Chapter 8 of Bollobás [5], it is clear that Φ is actually a homeomorphism. Next, the embedding of R × M_+(N) into X̃ defines a map Ψ. Then Ψ is a homeomorphism onto its image, which we denote by E. Let η̃_n be the image measure of η_n under Ψ. By the Dawson-Gärtner theorem and the finite-dimensional large deviations principle mentioned above, these satisfy a large deviations principle with good rate function I_Ψ. By Cramér's and Sanov's theorems respectively we have exponential tightness for the sequences (S_n)_{n∈N} and (L_n)_{n∈N} separately. Therefore the sequence of pairs ((S_n, L_n))_{n∈N} is exponentially tight in R × M_1(N). The inverse contraction principle (see Theorem 4.2.4 in [8] and remark (a) following it) now yields the desired LDP for (S_n, L_n) with a good rate function I_1. It remains to show that I_1 is actually of the form (3.9). Suppose that m_1(µ) = x, let (λ_1, . . . , λ_{d+1}) ∈ R^{d+1} and define φ(y) = λ_1 y + Σ_{j=1}^{d} λ_{j+1} f_j(y). By Jensen's inequality the resulting expression is bounded above by H(µ|G_2), and therefore I_1(x, µ) ≤ H(µ|G_2). If µ is a Dirac mass the inequality I_1(x, µ) ≥ H(µ|G_2) can be checked directly, so we assume that µ is not a Dirac mass. Define e_j ∈ C_b(N) by e_j(m) = δ_{jm} and write supp(µ) = {n_k : k ∈ J}. Fix d ∈ J, and let g(λ) denote the function inside the supremum. The effective domain of $\Lambda_{e_{n_1},\ldots,e_{n_d}}$ is (−∞, log 2) × R^d.
Because µ is not a Dirac mass the function g(λ) tends to −∞ as |λ| → ∞, in any direction. So the supremum of g is attained at some λ_0 ∈ (−∞, log 2) × R^d. Then λ_0 is a local maximum of g, whence ∇g(λ_0) = 0, or equivalently $\nabla\Lambda_{e_{n_1},\ldots,e_{n_d}}(\lambda_0) = (x, \mu_{n_1}, \ldots, \mu_{n_d})^T$. So we can define an exponential tilting ν_{λ_0} of µ. The probability measure ν_{λ_0} has mean x and satisfies ∫ e_{n_j} dµ = ∫ e_{n_j} dν_{λ_0} for all j ∈ {1, . . . , d}, and therefore I_1(x, µ) ≥ H(µ|G_2). The same estimate shows that I_1(x, µ) = +∞ whenever m_1(µ) ≠ x.
It now follows from Lemma 4.1.5 (a) in [8] that the LDP also holds in the larger space Y = R × M + (N), by setting I 1 (x, µ) = ∞ whenever µ is not a probability measure.

The Sample-Path Result
Theorem 3.10. Let ξ_n denote the law of (S_n, L_n) on C([0, 1]; Y), the space of continuous functions from the unit interval to Y. The sequence (ξ_n)_{n∈N} satisfies a large deviations principle on C([0, 1]; Y) with good rate function I_2 of the form (3.11), which is finite only when the derivative ṗ(t) exists in the weak topology for almost every t ∈ [0, 1] and has the property that m_1(ṗ(·)) = ṁ(·).
For (L_n(·)) on its own the analogous result can be found in Dembo-Zajic [7], and we will follow a similar strategy, using the joint large deviations principle for empirical mean and measure established above. We first prove exponential tightness for the pair of paths.
Lemma 3.12. ((S_n(·), L_n(·)))_{n∈N} is exponentially tight in C([0, 1]; Y).
Proof. Fix a metric d on Y inducing its topology. By Lemma A.2 in [7] we get exponential tightness for the laws ξ_n of (S_n, L_n) if (a) for each fixed t ∈ Q ∩ [0, 1] the sequence ((S_n(t), L_n(t)))_{n∈N} is exponentially tight, and (b) for every ρ > 0,
$$\lim_{\delta\to 0}\limsup_{n\to\infty} \frac{1}{n}\log P\Big(\sup_{|t-s|<\delta} d\big((S_n(t), L_n(t)), (S_n(s), L_n(s))\big) \ge \rho\Big) = -\infty.$$
Exponential tightness of ((S_n(t), L_n(t)))_{n∈N} for every fixed t ∈ Q ∩ [0, 1] is a direct consequence of Proposition 3.8. Moreover, for 0 ≤ s < t ≤ 1 the increment of (S_n, L_n) over [s, t] is controlled by the maximum of the X_j over the (finite) set of j with ⌊ns⌋ ≤ j ≤ ⌊nt⌋. For any δ, ρ > 0 and n ∈ N it follows therefore that $\frac{1}{n}\log P\big(\sup_{|t-s|<\delta} d((S_n(t), L_n(t)), (S_n(s), L_n(s))) \ge \rho\big)$ admits an upper bound which diverges to −∞ as δ → 0. So condition (b) also holds and (ξ_n)_{n∈N} is exponentially tight.
Lemma 3.13. For any fixed $0 = t_0 < t_1 < \cdots < t_m \le 1$ the sequence (Z_n)_{n∈N} of increments of (S_n, L_n) over the intervals (t_{j−1}, t_j] satisfies a large deviations principle in Y^m with a good rate function.
Proof. Let n be large enough that ⌊nt_j⌋ < ⌊nt_{j+1}⌋ − 1 and denote E = Y^m. A direct calculation yields the limiting logarithmic moment generating function for any $f = (\lambda_j, g_j)_{j=1}^m \in E^*$. By Corollary 4.6.14 of [8] this implies that the laws of Z_n satisfy a large deviations principle on E with good rate function Λ* given by the associated Legendre transform. Since I_1 is convex it follows from the results of Section 3.1 and Theorem 4.5.10(b) in [8] that Λ* can be expressed in terms of I_1, and the lemma is proved.
The proof of Theorem 3.10 now follows closely that of Theorem 1 of [7]. An application of the contraction principle to the map $(z_1, \ldots, z_m) \mapsto (z_1, z_1 + z_2, \ldots, z_1 + \cdots + z_m)$ yields the large deviations principle for the laws of (S_n(t_1), L_n(t_1), . . . , S_n(t_m), L_n(t_m)) with a good convex rate function. Applying the Dawson-Gärtner theorem as in the proof of Lemma 3 in [7] yields an LDP for the laws of the pair process (S_n, L_n) on C([0, 1]; Y) with a good rate function. Obviously m_1(µ(t)) ≠ x(t) for some t implies I_2(x, µ) = ∞. Lemma 4 in [7] then implies that I_2 is of the form (3.11). This completes the proof of Theorem 3.10.
Finally let (X_n)_{n∈N}, (Y_n)_{n∈N} be two independent sequences of i.i.d. random variables with common law G_2 and define L^X_n, L^Y_n, S^X_n, S^Y_n analogously to (3.1) and (3.2). By Corollary 2.9 of Lynch-Sethuraman [14] we obtain the following.
Corollary 3.14. The sequence of the laws of (S^X_n, L^X_n, S^Y_n, L^Y_n) satisfies a large deviations principle on C([0, 1]; Y²) with good rate function I given, for (x, p, y, q) ∈ C([0, 1]; Y²), by I(x, p, y, q) = I_2(x, p) + I_2(y, q).

Large Deviations for NC(n)
Recall that, in the notation of Section 2, $\lambda_n = \frac{1}{\tau_n}\sum_{j=1}^{\tau_n} \delta_{Y_j}$ is the empirical measure of the blocks of a non-crossing partition picked uniformly at random. Define further $\sigma_n = m_1(\lambda_n) = \frac{1}{\tau_n}\sum_{j=1}^{\tau_n} Y_j$. Let ν_n denote the law of $(\sigma_n, \lambda_n, \bar\tau_n)$ on Y × [0, 1]. The main result of this section is the following.
Proof. Write E_n in terms of the random walk Σ that has ascents X_1, X_2, . . . and descents Y_1, Y_2, . . .. Using the Markov property of Σ, the probability of E_n factorises, and the second probability in the factorisation is just that of running a simple random walk for 2n steps and obtaining a Dyck path. A direct computation using Stirling's formula [11, p. 64] yields that $\frac{1}{n}\log C_n \to \log 4$ as n → ∞.

The Upper Bound
We are now in a position to prove the large deviations upper bound. We will first give a bound via the process version and then show that it can be written in terms of the stated rate function, where the (closed) subset $\tilde F$ of Y² is defined by
$$\tilde F = \Big\{ (x, p, y, q) \in E^2 : \big(\tfrac{1}{\tau} y(\tau), \tfrac{1}{\tau} q(\tau), \tau\big) \in F,\; x(\tau) = y(\tau),\; x(s) \ge y(s)\ \forall s \le \tau \Big\}, \quad \text{where } \tau = \tau(x + y).$$
Proof. Recall that $\lambda_n = \frac{1}{\bar\tau_n} L^Y_{2n}(\bar\tau_n)$; by Lemma 4.4 the second term on the right-hand side of the resulting decomposition converges to 0. Further, $\bar\tau_n = \inf\{\frac{k}{2n} : \frac{1}{2n}\sum_{j=1}^{k}(X_j + Y_j) \ge 1\}$, so that τ̄_n is the least integer multiple of 1/(2n) less than τ(S^X_{2n} + S^Y_{2n}), with equality if and only if S^X_{2n}(τ̄_n) + S^Y_{2n}(τ̄_n) = 1. This certainly holds on E_n, so we can write the event E_n in terms of the L and S; for ease of notation we denote τ_S := τ(S^X_{2n} + S^Y_{2n}). Hence lim sup_{n→∞} (1/n) log ν_n(F) is bounded by twice the corresponding lim sup for the process pair. Since τ_S is a continuous function of (S^X_{2n}, L^X_{2n}, S^Y_{2n}, L^Y_{2n}), the set on the right-hand side is closed in Y² and we can apply Corollary 3.14 to obtain (4.7).

The Lower Bound
We now turn to proving the lower bound. By the local nature of large deviations lower bounds (see [8], identity (1.2.8) and the adjacent remarks) it is enough to prove the following, where B(µ, r) denotes the ball in M_1 of radius r centred on µ, with respect to β, the metric of (3.7) inducing the weak topology.
Recall that $\lim_{n\to\infty} \frac{1}{n}\log P(E_n) = 0$. Moreover $E_n = \bigcup_r E_{n,r}$, where E_{n,r} denotes the event that E_n occurs with exactly r descents. On E_{n,r} we have $\frac{r}{2n} = \bar\tau_n \in (t - \rho_3, t + \rho_3)$, and the condition σ_n ∈ (m − ρ_2, m + ρ_2) is equivalent to $r \in \big(\frac{n}{m+\rho_2}, \frac{n}{m-\rho_2}\big)$. We may therefore restrict attention to r in an interval $I_n \subseteq \big(\frac{n}{m+\rho_2}, \frac{n}{m-\rho_2}\big)$. Fix now w > 0 and let N_1 be large enough that $nw > \frac{2}{\rho_1}$; the estimates below hold for $r \in I_n^{(w)} = I_n \cap (wn, \infty)$ and n ≥ N_1. Using the independence of the X_j, Y_j, the fact that P(Z = a) = P(Z > a) for any Z with law G_2, that $\tau(\cdot) \le \bar\tau_n$, and that the S-processes are increasing in time, we obtain a lower bound in terms of the processes. Denote by $\tilde E_{n,r}$ the latter event and define $\tilde E_n = \bigcup_{r\in I_n^{(w)}} \tilde E_{n,r}$. We obtain a lower bound for $P\{(\sigma_n, \lambda_n, \bar\tau_n) \in G;\ \tilde E_n\}$. Let now N_2 be large enough that n ≥ N_2 implies $n > \frac{2}{w} \vee \frac{2}{\rho_3} \vee \frac{4w^2}{\rho_1} \vee \frac{8}{w\rho_1}$ and $\frac{1}{m_1+2\rho_2} - \frac{1}{n} < \frac{1}{m_1+\rho_2}$. Then, using the fact that $\bar\tau_n - \tau_S < \frac{1}{n}$ repeatedly, we arrive at the bound (4.15), whose right-hand side is of the form $(S^X_{2n}, L^X_{2n}, S^Y_{2n}, L^Y_{2n}) \in U$ for an open subset U of C([0, 1]; Y²). So we can apply Corollary 3.14, then let w → 0, and obtain the lower bound: choosing paths (x, p) and (y, q) with $y(t) = m_1(\tilde p)$ (and $g_2 \sim G_2$) gives (x, p), (y, q) ∈ E, τ(x + y) = t ∈ I_τ and, by construction, (x, y) ∈ I_S. This concludes the proof of the lower bound, and hence of Theorem 4.1 for (ν_n)_{n∈N}.

A Formula for the Maximum of the Support
In this section we apply our large deviations result to a problem from free probability theory. Fix a compactly supported probability measure µ. Its Cauchy transform is the analytic function G_µ given, for z ∈ C \ supp(µ), by
$$G_\mu(z) = \int \frac{1}{z - t}\, \mu(dt).$$
The function G_µ is analytic on C \ supp(µ) and locally invertible on a neighbourhood of ∞. Its inverse K_µ is meromorphic around zero, where it has a simple pole of residue 1. Removing this pole we obtain an analytic function
$$R_\mu(z) = K_\mu(z) - \frac{1}{z} = \sum_{n \ge 1} k_n(\mu)\, z^{n-1}.$$
The function R_µ is called the R-transform of µ, while its coefficients (k_n(µ))_{n∈N} are called the free cumulants of µ. Since µ has compact support it is determined by its R-transform. So, given an R-transform we can, at least in theory, obtain the corresponding probability measure. However, in order to do so one needs to find the functional inverse of R(z) + 1/z, for which a closed-form expression may not exist. Using the large deviations principle of Section 4 we can deduce the right edge of the support of µ, provided that the free cumulants are non-negative.
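For the standard semicircle law (radius 2) all of these transforms are explicit: G(z) = (z − √(z² − 4))/2, K(z) = 1/z + z, and hence R(z) = z, so k_2 = 1 and all other free cumulants vanish. A small numerical sanity check (our own illustration, not part of the text):

```python
import cmath

def G_semicircle(z):
    # Cauchy transform of the standard semicircle law (support [-2, 2]):
    # G(z) = (z - sqrt(z^2 - 4)) / 2, branch chosen so that G(z) ~ 1/z
    return (z - cmath.sqrt(z * z - 4)) / 2

def K_semicircle(z):
    # local inverse of G near 0; R(z) = K(z) - 1/z = z, so k_2 = 1 and all
    # other free cumulants vanish
    return 1 / z + z
```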
The problem of determining a measure from its R-transform occurs in free probability: if a_1, a_2 are free non-commutative random variables with laws µ_1, µ_2 respectively, then the law µ of a_1 + a_2 has the property that k_n(µ) = k_n(µ_1) + k_n(µ_2), and the law ν of λa_1 has k_n(ν) = λ^n k_n(µ_1) for any λ ∈ R. This linearity property allows the computation of laws of free random variables, similarly to the moment generating function in commutative probability theory. For background on free probability theory see for example [12, 25] and the survey of probabilistic aspects of free probability theory [4].
Because the R-transform determines the underlying probability measure one might still hope to recover some information about the measure, for example about the support, even when the Cauchy transform cannot be obtained explicitly. The special case where the underlying law is a free convolution of a semicircular law with another distribution has been studied extensively by P. Biane [3].
In this section we describe how the maximum of the support of µ can be deduced from the free cumulants.
Combinatorial considerations of the way the R- and Cauchy transforms are related [17, 23] give rise to the free moment-cumulant formula:
$$\int t^n\, \mu(dt) = \sum_{\pi \in NC(n)} \prod_{j} k_j^{B_j(\pi)}, \qquad (5.1)$$
where B_j(π) is the number of blocks of size j in π. Our starting point is the observation that the edge of the support of a measure can be deduced from the logarithmic asymptotics of its moments: namely, if ρ_µ is the maximum of the support of µ then
$$\log \rho_\mu = \limsup_{n\to\infty} \frac{1}{n} \log \int t^n\, \mu(dt). \qquad (5.2)$$
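The right-hand side of (5.1) can be evaluated without enumerating NC(n): conditioning on the block containing the element 1 yields the standard recursion $m_N = \sum_{s=1}^{N} k_s \sum_{i_1+\cdots+i_s = N-s} m_{i_1}\cdots m_{i_s}$, with m_0 = 1. A Python sketch (the function name is ours):

```python
def moments_from_free_cumulants(kappa, nmax):
    """Moments m_0..m_nmax from free cumulants via the recursion
    m_N = sum_{s=1}^{N} kappa(s) * sum_{i_1+...+i_s = N-s} m_{i_1}...m_{i_s},
    an equivalent rewriting of the sum over NC(N) in (5.1)."""
    m = [1.0]                       # m_0 = 1
    for N in range(1, nmax + 1):
        total = 0.0
        for s in range(1, N + 1):
            # c[j] = sum over s-tuples of nonnegative integers summing to j
            # of the products m_{i_1}...m_{i_s} (an s-fold convolution)
            c = [1.0] + [0.0] * (N - s)
            for _ in range(s):
                c = [sum(c[j - i] * m[i] for i in range(j + 1))
                     for j in range(N - s + 1)]
            total += kappa(s) * c[N - s]
        m.append(total)
    return m
```

For the semicircle (k_2 = 1, all others 0) this returns the Catalan numbers at even orders, and for the free Poisson with parameter 1 (k_n ≡ 1) the moments are the Catalan numbers themselves, since each partition in NC(n) then contributes 1.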
Suppose for the moment that all cumulants are positive (which is indeed the first case we will consider, in Section 5.1). Then we can re-write (5.1) as the expectation of an exponential functional of a uniformly random non-crossing partition. Namely, if θ : N → R is given by θ_j := log k_j, E_n = E(·|E_n) (recall that E_n denotes the event we conditioned on in Section 2), and C_n, the n-th Catalan number, is the cardinality of NC(n), then
$$\int t^n\, \mu(dt) = \sum_{\pi \in NC(n)} e^{\sum_j B_j(\pi)\,\theta_j} = C_n\, \mathbb{E}_n\Big[ e^{2n \bar\tau_n \langle \theta, \lambda_n\rangle}\Big].$$
In Section 5.1 we evaluate the logarithmic asymptotics of this expectation by Varadhan's Lemma, using the large deviations principle we have proved above.
Using the fact that $\lim_{\varepsilon\to 0} \varepsilon \log \varepsilon = 0$ one might suspect that a similar result still holds when some of the cumulants are allowed to be zero. This is indeed the case, and we will prove it in Section 5.2.
Remark 5.3. For γ ∈ R the shift operation given by S_γ(µ)(E) = µ({x − γ : x ∈ E}) shifts the maximum of the support by γ to the right. Also S_γ(µ) = µ ⊞ δ_γ, which leaves all cumulants invariant except for the first, which is incremented by γ. So we can always take the first cumulant to be anything we like.

All Free Cumulants Positive
We first consider the case where all free cumulants are positive. Examples include the free Poisson distribution.
Theorem 5.4. Let µ be a compactly supported probability measure on [0, ∞) whose free cumulants (k_j)_{j∈N} are all positive. Then the right edge ρ_µ of the support of µ is given by
$$\log(\rho_\mu) = \sup\Big\{ \frac{1}{m_1(p)} \sum_{n\in\mathbb{N}} p_n \log\frac{k_n}{p_n} - \frac{\Theta(m_1(p))}{m_1(p)} : p \in M_1^1(\mathbb{N}) \Big\}. \qquad (5.5)$$
Remark. A formally similar formula, involving the quantities q_f(j|i) = q(i, j)/Σ_r q(i, r), is known for deterministic matrices. Despite the apparent formal similarities we do not seem to be able to relate that formula to ours. This is because the free cumulants of a deterministic matrix are given by a very complicated function of its entries.
Proof. Define g : M_1(N) × [0, 1] → R by g(µ, t) = 2t⟨θ, µ⟩. It is a direct application of the contraction principle, Theorem 4.2.1 in [8], that $(\lambda_n, \bar\tau_n)_{n\in\mathbb{N}}$ satisfies a large deviations principle on M_1(N) × [0, 1] with rate function J_13 given by J_13(µ, t) = J(µ, m_1(µ), t).
Suppose first that the sequence (k_n)_{n∈N} is bounded by K ∈ (0, ∞). Then g is continuous and bounded, with ∥g∥_∞ ≤ 2 log K, so Varadhan's lemma applies for any γ > 1. We now turn to the general case, that is, we remove the assumption that the sequence of free cumulants is bounded. Because µ is compactly supported, its R-transform is analytic on a neighbourhood of zero, by Theorem 3.2.1 in Hiai-Petz [12]. So there exist Γ, R ∈ (0, ∞) with k_n ≤ ΓR^n for all n ∈ N. Define the dilation operator of scale c by D_c(µ)(A) = µ({x/c : x ∈ A}) and let k̃_n = R^{−n} k_n be the n-th cumulant of D_{R^{−1}}(µ). The sequence (k̃_n)_{n∈N} is bounded, so the above applies to D_{R^{−1}}(µ), which completes the proof of Theorem 5.4.

Non-Negative Free Cumulants
We now consider non-commutative random variables all of whose free cumulants are non-negative, but some of them are allowed to be zero. We will denote by L the set of n ∈ N such that k_n ≠ 0. As a prominent example we mention the centred semicircle distributions, where L = {2}. It turns out that the variational formula (5.5) still holds, provided we follow the convention that 0 log 0 = 0.
Theorem 5.9. Let µ be a compactly supported probability measure whose free cumulants (k_n)_{n∈N} are all non-negative. Then the maximum ρ_µ of the support of µ is given by
$$\log(\rho_\mu) = \sup\Big\{ \frac{1}{m_1(p)} \sum_{n\in L} p_n \log\frac{k_n}{p_n} - \frac{\Theta(m_1(p))}{m_1(p)} : p \in M_1^1(L) \Big\}, \qquad (5.10)$$
where M_1^1(L) denotes the set of p ∈ M_1^1(N) such that p(L^c) = 0.
Proof. Since the set {p ∈ M_1(N) : p(L^c) = 0} is closed, the direction '≤' in (5.10) follows directly from Exercise 2.1.24 in Deuschel-Stroock [9]. So we only need to show that the logarithm of the maximum of the support of our measure is bounded below by the variational formula. Let p be the free Poisson distribution with parameter 1 and recall that p has support [0, 4]. For ε > 0 let ν_ε = D_ε(p), the ε-dilation of p (see the proof of Theorem 5.4). Then k_n(ν_ε) = ε^n. By the remarks after Example 3.2.3 in [12] (page 98) the maximum of the support of µ_ε := µ ⊞ ν_ε is no bigger than the sum of those of µ and ν_ε. Moreover Theorem 5.4 applies to µ_ε, whose cumulants k_n + ε^n are all positive since ε > 0. Letting ε tend to zero yields the '≥' direction of (5.10).

Examples
We conclude with a few examples where our formula can be applied. The main requirement, that the free cumulants be non-negative, is satisfied in a wide range of cases.
Example 6.1. As a warm-up let us consider two (known) examples where the variational problem can be solved to give an explicit formula for the maximum of the support. The simplest example is the centred semicircle law of radius r, with density
$$\sigma_r(dx) = \frac{2}{\pi r^2}\sqrt{r^2 - x^2}\; dx \quad \text{on } [-r, r].$$
Then, in the notation of Section 5, L = {2} and k_2(σ_r) = r²/4. The only probability measure on L is δ_2, which has m_1(δ_2) = 2, and the variational formula recovers ρ_{σ_r} = r.
Next let λ ≥ 1 and consider the free Poisson distribution p_λ with parameter λ, whose density is $\frac{1}{2\pi x}\sqrt{4\lambda - (x - 1 - \lambda)^2}$ on $[(1-\sqrt\lambda)^2, (1+\sqrt\lambda)^2]$. The free cumulants are given by k_n = λ for all n ∈ N and therefore
$$\log \rho_{p_\lambda} = \sup\Big\{ 2\tau \log(\lambda) + 2\tau H(p) + 2\tau\, \Theta\big(\tfrac{1}{2\tau}\big) : m_1(p) = \tfrac{1}{2\tau} \Big\},$$
which reduces to a supremum over the single variable τ ∈ (0, 1/2].
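Formula (5.2) itself is easy to test numerically on the semicircle example. For the semicircle law of radius r the even moments are m_{2n} = C_n (r/2)^{2n}, so (1/(2n)) log m_{2n} should approach log r. A sketch (our own check; names are ours):

```python
from math import lgamma, log

def log_catalan(n):
    # log C_n = log((2n)!) - log(n!) - log((n+1)!), via log-gamma
    return lgamma(2 * n + 1) - lgamma(n + 1) - lgamma(n + 2)

def edge_estimate(r, n):
    # (1/(2n)) log m_{2n}, with m_{2n} = C_n (r/2)^{2n} for the semicircle
    # law of radius r
    return (log_catalan(n) + 2 * n * log(r / 2.0)) / (2.0 * n)
```

The error decays like (log n)/n, reflecting the polynomial prefactor in C_n ~ 4^n n^{-3/2}/√π.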

Freely Infinitely Divisible Distributions
Let µ be freely infinitely divisible. That is, for every n ∈ N there exists a compactly supported probability law µ_n such that µ is the n-fold free convolution of µ_n with itself: $\mu = \mu_n \boxplus \cdots \boxplus \mu_n$ (n times).
Freely infinitely divisible probability measures have been studied by Barndorff-Nielsen and Thorbjørnsen [1, 2]. Many of their properties are non-commutative analogues of those enjoyed by classical infinitely divisible distributions; for example they lead to the concept of free Lévy processes. There exists an analogue of the Lévy-Khintchine representation, a version of which is given in [12], where Theorem 3.3.6 states that µ is freely infinitely divisible if and only if there exist α ∈ R and a positive finite measure ν with compact support in R such that the R-transform R_µ of µ can be written, for z in a neighbourhood of (C \ R) ∪ {0}, as
$$R_\mu(z) = \alpha + \int \frac{z}{1 - xz}\, \nu(dx). \qquad (6.3)$$
We call ν the free Lévy-Khintchine measure associated to µ. By Remark 5.3 we lose no generality by setting k 1 (µ) = α = 0. Setting m 0 (ν) := ν(R) we can express the cumulants of µ in terms of the sequence (m n (ν)) n≥0 by observing that k n (µ) = m n−2 (ν) for n ≥ 2. So if µ is freely infinitely divisible and the moments of its free Lévy-Khintchine measure are all non-negative the variational formula for the maximum of the support of µ from Theorem 5.9 applies.
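The identity k_n(µ) = m_{n−2}(ν) can be read off the expansion of (6.3), since $\int \frac{z}{1-xz}\,\nu(dx) = \sum_{k\ge 0} m_k(\nu)\, z^{k+1}$. A sketch for atomic ν (our own illustration; the function name is ours): ν = δ_0 gives only k_2 = 1, i.e. the semicircle law, while ν = δ_1 gives k_n = 1 for all n ≥ 2.

```python
def free_cumulants_from_levy(nu, alpha, order):
    """k_1 = alpha and k_n = m_{n-2}(nu) for n >= 2, read off from
    R(z) = alpha + int z/(1-xz) nu(dx) = alpha + sum_{k>=0} m_k(nu) z^{k+1};
    nu is an atomic measure given as a dict {atom: mass}."""
    ks = [alpha]                                   # k_1 = alpha
    for k in range(order - 1):
        ks.append(sum(mass * x ** k for x, mass in nu.items()))  # k_{k+2}
    return ks
```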

Series of Free Random variables
Let ξ_1, ξ_2, . . . be a sequence of free self-adjoint random variables with identical distribution µ_1 and consider the series $\xi = \sum_{n=1}^{\infty} n^{-\beta} \xi_n$, where β > 0 is chosen large enough for the series to converge in operator norm. Let k_n(µ_1) denote the free cumulants of µ_1; then, by the additivity and scaling properties of free cumulants, the R-transform R_ξ of ξ is given by $R_\xi(z) = \sum_{n=1}^{\infty}\sum_{r=1}^{\infty} n^{-r\beta} k_r(\mu_1) z^{r-1}$. Let U be a neighbourhood of zero on which R_{µ_1} is analytic; then we have absolute convergence on U and hence may interchange the order of the two summations:
$$\sum_{n=1}^{\infty}\sum_{r=1}^{\infty} n^{-r\beta} k_r(\mu_1)\, z^{r-1} = \sum_{r=1}^{\infty} \zeta(\beta r)\, k_r(\mu_1)\, z^{r-1},$$
where ζ denotes the Riemann zeta function. So we conclude that the free cumulants of ξ are given in terms of those of ξ_1 by k_n(ξ) = ζ(βn) k_n(µ_1). It may not be possible to locally invert the corresponding analytic function R_ξ(z) + 1/z in closed form. In this case our formula comes in useful and we obtain:
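The relation k_n(ξ) = ζ(βn) k_n(µ_1) is easy to check numerically in the simplest case: if µ_1 is the standard semicircle law then only k_2(µ_1) = 1 is non-zero, so ξ is again semicircular with variance ζ(2β); for β = 1 this is ζ(2) = π²/6. A sketch using a crude partial sum for ζ (our own illustration; names are ours):

```python
def zeta_partial(s, terms=500000):
    # crude partial sum for the Riemann zeta function; adequate for s >= 2
    return sum(n ** (-s) for n in range(1, terms + 1))

def series_cumulant(k_mu1, beta, n):
    # k_n(xi) = zeta(beta * n) * k_n(mu_1) for xi = sum_j j^{-beta} xi_j
    return zeta_partial(beta * n) * k_mu1(n)
```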