Universality in Random Moment Problems

Let $\mathcal{M}_n(E)$ denote the set of vectors of the first $n$ moments of probability measures on $E\subset\mathbb{R}$ with existing moments. The investigation of such moment spaces in high dimension has found considerable interest in the recent literature. For instance, it has been shown that a uniformly distributed moment sequence in $\mathcal M_n([0,1])$ converges in the large $n$ limit to the moment sequence of the arcsine distribution. In this article we provide a unifying viewpoint by identifying classes of more general distributions on $\mathcal{M}_n(E)$ for $E=[a,b],\,E=\mathbb{R}_+$ and $E=\mathbb{R}$, respectively, and discuss universality problems within these classes. In particular, we demonstrate that the moment sequence of the arcsine distribution is not universal for $E$ being a compact interval. On the other hand, on the moment spaces $\mathcal{M}_n(\mathbb{R}_+)$ and $\mathcal{M}_n(\mathbb{R})$ the random moment sequences governed by our distributions exhibit for $n\to\infty$ a universal behaviour: The first $k$ moments of such a random vector converge almost surely to the first $k$ moments of the Marchenko-Pastur distribution (half line) and Wigner's semi-circle distribution (real line). Moreover, the fluctuations around the limit sequences are Gaussian. We also obtain moderate and large deviations principles and discuss relations of our findings with free probability.


Introduction
Let P(E) denote the set of probability measures on a (possibly unbounded) interval E ⊂ R with finite moments of all orders. For a measure µ ∈ P(E) denote by $m_j(\mu) = \int_E x^j \, d\mu(x)$ its j-th moment and define $\mathcal{M}_n(E) := \{(m_1(\mu), \dots, m_n(\mu)) : \mu \in \mathcal{P}(E)\}$ as the set of moment sequences up to order n generated by P(E). The set M_n(E) is convex and has been the subject of many studies, beginning with Karlin and Shapley (1953), Karlin and Studden (1966) and Krein and Nudelman (1977). In these classical works, geometric aspects of moment spaces were studied. While the even more classical moment problems deal with all possible moment sequences, a probabilistic investigation asks instead what a typical moment sequence looks like. This was initiated in Chang et al. (1993), where a uniform distribution on M_n([0, 1]) was considered. There it was shown that the first k moments of such a random vector converge almost surely to the first k moments $m^*_1, \dots, m^*_k$ of the arcsine distribution (1.1), and that the fluctuations around this limit are asymptotically Gaussian (1.2), with covariance matrix $\Sigma_k = (m^*_{i+j} - m^*_i m^*_j)_{i,j=1}^k$. Gamboa and Lozada-Chang (2004) investigated corresponding large deviations principles, while Lozada-Chang (2005) studied similar problems for moment spaces corresponding to more general functions defined on a bounded set.
More recently, Dette and Nagel (2012) defined special probability distributions on the noncompact moment spaces M n ([0, ∞)) and M 2n−1 (R). They established results analogous to (1.2), with the moments of the arcsine distribution replaced by those of the Marchenko-Pastur distribution (on [0, ∞)) and of the semicircle distribution (on R), respectively.
In this article, we investigate this surprising occurrence of the arcsine, Marchenko-Pastur and semicircle distributions in more detail. We are particularly interested in a possible universality of these distributions, as in random matrix theory the latter two appear naturally for large classes of random matrices with independent entries (see e.g. Bai and Silverstein (2010) and references therein). The arcsine measure also appears as a universal distribution of zeros of orthogonal polynomials with respect to weight functions on compact intervals (see Stahl and Totik (1992)). Especially for unbounded moment spaces a clarification of universality seems desirable, as there is no uniform measure and thus the consideration of a particular probability measure needs justification. In other words, we ask how typical the moment sequences of the arcsine, semicircle and Marchenko-Pastur distributions are.
The paper will be organized as follows. In Section 2 we review some basic facts about moment spaces and introduce general classes of distributions on the moment spaces under consideration. They keep two key features of the uniform distribution on M n ([a, b]) and can be used to interpolate between distributions on compact and non-compact moment spaces. For these distributions we derive laws of large numbers of the type (1.1). In particular, we show that for moment spaces M n ([a, b]) corresponding to compact intervals there is no universality of the arcsine distribution. Instead, the arising measures are known as free binomial distributions, i.e. the analogues of the binomial distribution in free probability theory. On the other hand, for the moment spaces M n ([0, ∞)) and M n (R) the first k moments of a random vector always converge to the first k moments of Marchenko-Pastur and semicircle distributions, respectively. The occurrence of both distributions will be explained in terms of free Poissonian and free central limit theorems for the free binomial distribution. In Section 3 we consider central limit theorems of the form (1.2) and investigate moderate and large deviations principles for random moment sequences. All proofs are postponed to Section 4. Our results provide an extensive description of the distributional properties of random moment sequences and a unifying view on several findings in the recent literature.

Laws of Large Numbers
To motivate the class of distributions considered in this paper, we remark first that a real valued sequence $(m_i)_{i \in \mathbb{N}_0}$ is a sequence of moments corresponding to a Borel measure on the real line if and only if all Hankel matrices $(m_{i+j})_{i,j=0}^n$ are positive semi-definite (see Hamburger (1920)). Similar characterizations exist for measures supported on the half line [0, ∞) and on compact intervals, and the corresponding sequences are called Stieltjes and Hausdorff moment sequences (see Dette and Studden (1997)). Due to restrictions and relations of this type, the components of a random moment vector in M n (E) are generically not independent coordinates. Moreover, for a compact interval E the moment space M n (E) is a rather small set. For instance, it is known that the volume of M n ([0, 1]) is of order $O(2^{-n^2})$ (see Karlin and Shapley (1953)), as for a given moment sequence $(m_1, \dots, m_{n-1}) \in M_{n-1}([0,1])$, the possible range of the n-th moment $m_n$ is very small.
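The Hankel characterizations above can be made concrete. The following sketch (our own illustration, with hypothetical function names) tests whether a finite sequence could be a Hamburger moment sequence, and additionally checks the shifted Hankel condition that characterizes Stieltjes sequences:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    # Positive semi-definite up to a small numerical tolerance.
    return np.all(np.linalg.eigvalsh(A) >= -tol)

def hamburger_ok(m):
    """m = [m_0, m_1, ..., m_{2n}]; check the Hankel matrices (m_{i+j})."""
    n = (len(m) - 1) // 2
    H = np.array([[m[i + j] for j in range(n + 1)] for i in range(n + 1)])
    return is_psd(H)

def stieltjes_ok(m):
    """Additionally check the shifted Hankel matrix (m_{i+j+1})."""
    n = (len(m) - 2) // 2
    H1 = np.array([[m[i + j + 1] for j in range(n + 1)] for i in range(n + 1)])
    return hamburger_ok(m) and is_psd(H1)

# Moments of the standard Gaussian: m_{2k} = (2k-1)!!, odd moments vanish.
gauss = [1, 0, 1, 0, 3, 0, 15]
# Moments of the exponential distribution on [0, inf): m_k = k!.
expo = [1, 1, 2, 6, 24]
```

Here `gauss` passes the Hamburger test but fails the Stieltjes test (the Gaussian is not supported on the half line), while `expo` passes both; the sequence `[1, 0, 1, 0, 0.5]` violates $m_4 \geq m_2^2$ and fails already the Hamburger test.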
For these reasons, we will consider different sets of coordinates that scale with the possible range of values. Although there are infinitely many choices of such coordinates, some are particularly natural and have found considerable attention in the literature. To be precise, assume that $(m_1, \dots, m_{j-1}) \in M_{j-1}([a,b])$ is a given vector of moments up to order j − 1. Then, by convexity of $M_j([a,b])$, the set of possible values of the j-th moment is an interval $[m_j^-, m_j^+]$. Following Dette and Studden (1997), we define for $m_j^- \neq m_j^+$ and a given j-th moment $m_j$ the j-th canonical moment

$$p_j := \frac{m_j - m_j^-}{m_j^+ - m_j^-}. \qquad (2.1)$$

The canonical moments are left undefined if $m_j^- = m_j^+$ (in this case the vector $(m_1, \dots, m_{j-1})$ is a boundary point of the set $M_{j-1}([a,b])$; see Karlin and Studden (1966)). Clearly, $p_j \in [0,1]$, and $p_j$ gives the relative position of $m_j$ in the available section of the set $M_j([a,b])$. It is also worthwhile to mention that canonical moments are invariant under linear transformations of the measure (see Dette and Studden (1997), p. 13). The correspondence map between the canonical and ordinary moments is one-to-one from $(0,1)^n$ onto $\mathrm{Int}(M_n([a,b]))$ (Int denoting the interior), and many classical quantities of the measure, especially of its associated orthogonal polynomials and the continued fraction expansion of its Stieltjes transform, have expressions in terms of the canonical moments (see Dette and Studden (1997) for more details). Canonical moments were introduced in a series of papers by Skibinsky (1967, 1968, 1969) and are closely related to the Verblunsky coefficients, which were investigated much earlier by Verblunsky (1935, 1936) for measures on the unit circle. In the case of the uniform distribution on M n ([0, 1]), as studied in Chang et al. (1993), the canonical moments have two important properties. After a change of variables by (2.1), the uniform distribution on M n ([0, 1]) has a density w.r.t.
the Lebesgue measure on $(0,1)^n$ proportional to

$$\prod_{j=1}^{n-1} \big(p_j(1-p_j)\big)^{n-j}. \qquad (2.2)$$

Thus, the canonical moments are independent and, for n ≫ j, nearly identically distributed.
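The density (2.2) makes the canonical moments independent with $p_j \sim \mathrm{Beta}(n-j+1, n-j+1)$, so a uniformly distributed moment vector can be simulated directly. The sketch below (our own illustration) maps sampled canonical moments to ordinary moments via the standard chain-sequence identities $\zeta_1 = p_1$, $\zeta_j = (1-p_{j-1})p_j$, $\alpha_{k+1} = \zeta_{2k} + \zeta_{2k+1}$, $\beta_k = \zeta_{2k-1}\zeta_{2k}$ (Dette and Studden (1997)) and the truncated Jacobi matrix, and compares with the arcsine moments $m^*_j = \binom{2j}{j} 4^{-j}$ on [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5000, 3  # dimension of the moment space and number of moments checked

# Uniform distribution on M_n([0,1]): p_j ~ Beta(n-j+1, n-j+1), independent.
p = np.array([rng.beta(n - j + 1, n - j + 1) for j in range(1, 2 * k)])

# Chain sequence zeta_j = (1 - p_{j-1}) p_j, with zeta_1 = p_1.
zeta = np.empty(2 * k - 1)
zeta[0] = p[0]
for j in range(1, 2 * k - 1):
    zeta[j] = (1 - p[j - 1]) * p[j]

# Recursion coefficients of the monic orthogonal polynomials.
alpha = np.empty(k)
beta = np.empty(k - 1)
alpha[0] = zeta[0]
for i in range(1, k):
    alpha[i] = zeta[2 * i - 1] + zeta[2 * i]
    beta[i - 1] = zeta[2 * i - 2] * zeta[2 * i - 1]

# Moments via powers of the (symmetrized) truncated Jacobi matrix: m_j = (J^j)_{11}.
J = np.diag(alpha) + np.diag(np.sqrt(beta), 1) + np.diag(np.sqrt(beta), -1)
moments = [np.linalg.matrix_power(J, j)[0, 0] for j in range(1, k + 1)]
```

For large n the output is close to the arcsine moments 1/2, 3/8, 5/16, in line with the result of Chang et al. (1993).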
To investigate a possible universality of the arcsine distribution, we will now define a class of distributions respecting these two properties. However, we will generalize the situation by allowing for different distributions of even and odd canonical moments. This takes into account the different roles that even and odd moments play. While even moments are always positive and give some rough information about the size of the support of the measure, odd moments give information about location of the support and the symmetry of the measure. In canonical moments, symmetry around the center of [a, b] can be characterized easily as the property that all odd canonical moments are 1/2 (see Skibinsky (1969)).
Let us now formulate our first result for random moment sequences of measures supported on the interval [a, b]. Here and later on, we will tacitly assume that the random variables $(m_j^{(n)})_{j,n \geq 1}$ are defined on the same probability space.
Theorem 2.1. Let a < b and $V_1, V_2 \in C^2((0,1))$ be continuous at 0 and 1. Assume that the functions $W_1(p) := V_1(p) - \log(p(1-p))$ and $W_2(p) := V_2(p) - \log(p(1-p))$ each have a unique minimizer $p_1^* \in (0,1)$ and $p_2^* \in (0,1)$, respectively. Let $m^{(n)} = (m_1^{(n)}, \dots, m_n^{(n)})$ be distributed according to $P_{n,[a,b],V_{1,2}}$. Then for any fixed k, the vector $(m_1^{(n)}, \dots, m_k^{(n)})$ converges almost surely and in $L^1$ to the first k moments of a probability measure $\mu_{p_1^*, p_2^*}$ on [a, b].

(1) If $p_1^* = p_2^* = 1/2$, the measure $\mu_{p_1^*,p_2^*}$ in Theorem 2.1 is the arcsine distribution on the interval [a, b]. Note that this does not imply $V_1 = V_2 \equiv 0$. However, we see that for $p_1^* \neq 1/2$ or $p_2^* \neq 1/2$, the limiting measure (the measure having the limiting moments) is not the arcsine measure or an affine rescaling of it. We conclude that the moments of the arcsine measure are not universal within the class of random moment sequences in M n ([a, b]) with nearly i.i.d. canonical moments. On the other hand, there is still some universality, as the limiting measure depends on $V_1, V_2$ only via the parameters $p_1^*$ and $p_2^*$.
(2) Since for probability measures supported on a fixed compact set convergence of moments is equivalent to convergence in distribution, the convergence result of Theorem 2.1 can be restated as follows: let $\mu_n \in \mathcal{P}([a,b])$ be a random probability measure with first n moments $(m_1^{(n)}, \dots, m_n^{(n)})$. Then $\mu_n$ converges a.s. (and in expectation) weakly to $\mu_{p_1^*,p_2^*}$ as n → ∞. The measure $\mu_{p_1^*,p_2^*}$ is known in the literature under (at least) two different names. In the context of probability theory on graphs, it is called the Kesten-McKay measure (see Kesten (1959); McKay (1981)). It has also been studied in the context of orthogonal polynomials (see Cohen and Trenholme (1984); Saitoh and Yoshida (2001); Castro and Grünbaum (2013)). In free probability, it is called the free binomial distribution (see Nica and Speicher (2006)). It will turn out to be useful to explain this naming in more detail.
Free probability is a variant of non-commutative probability theory initiated by Voiculescu (see Nica and Speicher (2006) or Chapter 22 by Speicher in Akemann et al. (2011) for an introduction and references) that has found its applications in particular in random matrix theory. For our purposes it suffices to know that free probability theory uses a different notion of independence, called freeness, that manifests itself in a different convolution of probability measures. A constructive approach to this convolution uses random matrices: Let $H_{1,n}, H_{2,n}$ be deterministic diagonal n × n matrices with diagonal entries $h_{1,n}(ii)$ and $h_{2,n}(ii)$, respectively. Assume that the empirical measures of the diagonal entries, i.e. of the eigenvalues, converge for n → ∞ weakly to probability measures $\mu_1$ and $\mu_2$ of bounded support, that is,

$$\frac{1}{n}\sum_{j=1}^{n} \delta_{h_{i,n}(jj)} \longrightarrow \mu_i \quad \text{weakly}, \qquad i = 1, 2.$$
Now let for each n a Haar distributed random unitary n × n matrix $U_n$ be given on a common probability space. The Haar probability measure on the unitary group $\mathcal{U}_n$ is the unique Borel probability measure that is invariant under left (and right) multiplication with any group element. Letting $x_1, \dots, x_n$ denote the n real random eigenvalues of the Hermitian random matrix $H_{1,n} + U_n H_{2,n} U_n^*$, the empirical measure of the $x_i$'s converges for n → ∞ almost surely in distribution to a non-random limit. This limit is called the free (additive) convolution of $\mu_1$ and $\mu_2$, in symbols $\mu_1 \boxplus \mu_2$. In analogy to classical probability, the free binomial distribution with parameters n ∈ N and p ∈ [0, 1] is then the n-fold free convolution of the Bernoulli distribution $\mu = (1-p)\delta_0 + p\delta_1$ with itself. It seems convenient to extend the name to convolutions of measures $\mu = (1-p)\delta_c + p\delta_d$ with themselves, c, d ∈ R. Moreover, even fractional convolution powers are possible using an analytic approach to the free convolution via the so-called R-transform (see (Akemann et al., 2011, Chapter 22)). It seems difficult to give a direct interpretation of the occurrence of the free binomial distribution in the context of random moments. For instance, it is not hard to verify that for $\mu = \frac{1}{2}\delta_c + \frac{1}{2}\delta_d$ the free convolution $\mu \boxplus \mu$ is the arcsine measure with support [2c, 2d], but in general the measure $\mu_{p_1^*,p_2^*}$ is not just a two-fold convolution of a Bernoulli measure with itself.
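The random matrix construction of $\mu \boxplus \mu$ can be simulated directly. The following sketch (our own illustration) takes $\mu = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1$, builds a Haar unitary by QR decomposition of a complex Ginibre matrix (with the standard phase correction of the R factor), and compares empirical moments of $H_1 + U H_2 U^*$ with the arcsine moments on [−2, 2] ($m_2 = 2$, $m_4 = 6$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Haar unitary via QR of a complex Ginibre matrix, with phase correction.
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
Q, R = np.linalg.qr(G)
d = np.diagonal(R)
U = Q * (d / np.abs(d))          # scales columns: U = Q diag(d/|d|)

# Two diagonal matrices with spectral distribution (delta_{-1} + delta_1)/2.
diag = np.repeat([-1.0, 1.0], n // 2)
H1 = np.diag(diag)
H2 = np.diag(rng.permutation(diag))

eig = np.linalg.eigvalsh(H1 + U @ H2 @ U.conj().T)
m2, m4 = np.mean(eig**2), np.mean(eig**4)   # arcsine on [-2,2]: 2 and 6
```

Already for moderate n the empirical moments are close to those of the arcsine law, illustrating the two-fold free convolution of a symmetric Bernoulli measure.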
However, free probability indicates that universal limiting measures may be expected if random moment problems are considered for the moment spaces $M_n(\mathbb{R}_+)$ with $\mathbb{R}_+ := [0, \infty)$ and $M_n(\mathbb{R})$. Indeed, analogous to classical probability, there are free analogues of the Poisson limit theorem and the central limit theorem for the free binomial distribution (Akemann et al., 2011, Chapter 22). Typically, they are considered for $\mu = (1-p_m)\delta_0 + p_m\delta_1$ and show weak convergence of the suitably rescaled m-th free convolution power $\mu^{\boxplus m}$ to the free Poisson law (Marchenko-Pastur distribution) or to the free Gaussian law (semicircle distribution), as m → ∞, depending on whether $p_m$ tends to zero or to a non-zero limit.
The following corollary can be seen as a variant of these limit theorems. The proof is straightforward and will be omitted.
The density of the absolutely continuous part of $\mu_{p^*_{1,m}, p^*_{2,m}}$ converges pointwise to the density of the absolutely continuous part of $\mu_{MP, z_1^*, z_2^*}$, and uniformly on compact subsets of $(l_-, l_+)$. Moreover, the moments of $\mu_{p^*_{1,m}, p^*_{2,m}}$ converge to the moments of $\mu_{MP, z_1^*, z_2^*}$.
(1) The measure $\mu_{MP,z_1^*,z_2^*}$ is called the Marchenko-Pastur distribution (see Hiai and Petz (2000) or Nica and Speicher (2006)). For $z_1^* \geq z_2^*$ (the absolutely continuous case) it is the equilibrium measure on $\mathbb{R}_+$ (in the sense of (2.4)) to the corresponding external field. Besides its role in free probability theory as the free analogue of the Poisson distribution, it is particularly well known for its universality in random matrix theory. More precisely, let X denote an m × n random matrix with real i.i.d. entries having mean 0 and variance σ² > 0. Assume that m, n → ∞ with m/n → λ ∈ (0, ∞). Then the empirical distribution of the eigenvalues of the sample covariance matrix $XX^T/n$ converges a.s. and in expectation weakly to $\mu_{MP,z_1,z_2}$, where $z_1 := \sigma^2(1+\sqrt{\lambda})/(1+\sqrt{\lambda})^2$ and $z_2 := \lambda z_1$. For this result and generalizations we refer to Bai and Silverstein (2010) and references therein.
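The sample covariance statement can be checked numerically. The sketch below (our own illustration, stated in the common $(\sigma^2, \lambda)$ parametrization rather than the $(z_1, z_2)$ parametrization of (2.5)) uses Gaussian entries for convenience; the limit is the same for any entry distribution with mean 0 and variance $\sigma^2$, and the first two Marchenko-Pastur moments are $\sigma^2$ and $\sigma^4(1+\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, sigma2 = 300, 600, 1.0     # aspect ratio lambda = m/n = 1/2 (our choice)

# Sample covariance matrix of a matrix with i.i.d. mean-0, variance-sigma^2 entries.
X = rng.standard_normal((m, n)) * np.sqrt(sigma2)
eig = np.linalg.eigvalsh(X @ X.T / n)

# Limiting Marchenko-Pastur moments for sigma^2 = 1, lambda = 1/2:
# m_1 = sigma^2 = 1 and m_2 = sigma^4 (1 + lambda) = 1.5.
m1, m2 = eig.mean(), np.mean(eig**2)
```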
(2) The measure $\mu_{SC,\alpha^*,\beta^*}$ is called the semicircle distribution. It is the equilibrium measure (in the sense of (2.4)) to a quadratic external field. In free probability, it plays the role of the Gaussian distribution. In random matrix theory it is the universal limit of so-called Wigner matrices: let X be an n × n random matrix whose entries on and above the diagonal are real i.i.d. with mean 0 and variance σ² > 0, and whose entries below the diagonal are chosen such that X is symmetric. Then the empirical distribution of the eigenvalues of $X/\sqrt{n}$ converges a.s. and in expectation weakly to $\mu_{SC,\alpha,\beta}$ as n → ∞, where α = 0 and β = σ²; see e.g. Bai and Silverstein (2010). The universality in these random matrix statements lies in the fact that the limiting distribution is always the same regardless of the distribution of the matrix entries.
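To stress that only mean and variance of the entries matter, the following sketch (our own illustration) uses Rademacher (±1) entries and compares with the semicircle moments for α = 0, β = σ² = 1, namely $m_1 = 0$, $m_2 = 1$, $m_4 = 2$ (the even moments are Catalan numbers):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Symmetric matrix with i.i.d. Rademacher entries on and above the diagonal.
A = rng.choice([-1.0, 1.0], size=(n, n))
X = np.triu(A) + np.triu(A, 1).T
eig = np.linalg.eigvalsh(X / np.sqrt(n))

# Semicircle moments for alpha = 0, beta = 1: m_1 = 0, m_2 = 1, m_4 = 2.
m1, m2, m4 = eig.mean(), np.mean(eig**2), np.mean(eig**4)
```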
(3) The measures µ p * 1 ,p * 2 , µ M P,z * 1 ,z * 2 and µ SC,α * ,β * all belong to the so-called free Meixner class. It consists of the free analogues of the six classical Meixner class distributions which are Gaussian, Poisson, gamma, binomial, negative binomial and hyperbolic secant distribution. The distributions of the free Meixner class enjoy some interesting characterizing properties, for instance having a generating function of resolvent type for the corresponding orthogonal polynomials (see Anshelevich (2007) for details) in analogy to the generating functions of the classical Meixner class being of exponential type (see Meixner (1934)).
Let us now turn to infinite moment spaces, starting with $M_n(\mathbb{R}_+)$ (recall $\mathbb{R}_+ = [0, \infty)$). Following Dette and Nagel (2012), we may define canonical moments $z_1, \dots, z_n$ of a moment sequence $m_1, \dots, m_n$ in the interior of $M_n(\mathbb{R}_+)$. Here one uses that, given $(m_1, \dots, m_{k-1}) \in \mathrm{Int}(M_{k-1}(\mathbb{R}_+))$, the section of possible values of $m_k$ is an interval of the form $[m_k^-, \infty)$ (see Karlin and Studden (1966), Chapter V). Clearly, $z_k \in \mathbb{R}_+$. The correspondence

$$\varphi_n^{\mathbb{R}_+}: z_n = (z_1, \dots, z_n) \mapsto m_n = (m_1, \dots, m_n) \qquad (2.7)$$

between canonical and ordinary moments is one-to-one from $(0, \infty)^n$ onto $\mathrm{Int}(M_n(\mathbb{R}_+))$ (for all n ∈ N). The Jacobian of this transformation is readily computed (see (2.8)). To define a probability measure on $\mathrm{Int}(M_n(\mathbb{R}_+))$, consider continuous functions $V_1, V_2: \mathbb{R}_+ \to \mathbb{R}$ such that for some ε > 0 the growth condition (2.9) holds for all z large enough. Then define a probability measure $P_{n,\mathbb{R}_+,V_{1,2}}$ on $M_n(\mathbb{R}_+)$ by $P_{n,\mathbb{R}_+,V_{1,2}}(\partial M_n(\mathbb{R}_+)) = 0$ and on $\mathrm{Int}(M_n(\mathbb{R}_+))$ via the density (2.10), where $Z_{n,\mathbb{R}_+,V_{1,2}}$ is the normalizing constant that makes $P_{n,\mathbb{R}_+,V_{1,2}}$ a probability measure with respect to the Lebesgue measure on $\mathrm{Int}(M_n(\mathbb{R}_+))$. This is possible due to (2.8) and (2.9). Because of (2.8), the canonical moments $z_1, \dots, z_n$ are independent under $P_{n,\mathbb{R}_+,V_{1,2}}$ and, for large n and fixed k, nearly identically distributed. Note that Dette and Nagel (2012) considered the special case of (2.10) with $V_1(t) = V_2(t) = t - c_n \log t$ and showed that under this measure the (ordinary) moments converge to those of the Marchenko-Pastur distribution. Here we will show that the moments of the Marchenko-Pastur distribution are in fact universal for all generic functions $V_1, V_2$.
Theorem 2.5. Let $V_1, V_2 \in C^2((0, \infty))$ be continuous at 0, satisfy (2.9), and assume that the associated functions $W_1$ and $W_2$ have unique minimizers $z_1^*, z_2^* \in (0, \infty)$. Then for any fixed k the vector $(m_1^{(n)}, \dots, m_k^{(n)})$ converges almost surely and in $L^1$ to $(m_1^*, \dots, m_k^*)$, where $m_1^*, \dots, m_k^*$ are the first k moments of the Marchenko-Pastur distribution $\mu_{MP,z_1^*,z_2^*}$ defined in (2.5).

Next, we consider the moment space corresponding to measures supported on R. We will use the recurrence coefficients of the corresponding orthogonal polynomials as a coordinate system. To be precise, note that for any measure µ ∈ P(R) there is a sequence of monic polynomials $P_0(x), P_1(x), \dots$ with $\deg P_j = j$ that is orthogonal in $L^2(\mu)$. If µ is supported on finitely many points, the sequence is finite. In any case, $P_j(x)$ depends on the measure µ only via its moment sequence $(m_1, \dots, m_{2j-1})$. The orthogonal polynomials satisfy a three-term recurrence relation of the form

$$P_{j+1}(x) = (x - \alpha_{j+1})P_j(x) - \beta_j P_{j-1}(x), \qquad j = 1, 2, \dots \qquad (2.11)$$
Lemma 2.6. There is a bijection

$$\varphi_{2n}^{\mathbb{R}}: (\mathbb{R} \times (0, \infty))^n \to \mathrm{Int}(M_{2n}(\mathbb{R})), \quad (\alpha_1, \beta_1, \alpha_2, \dots, \alpha_n, \beta_n) \mapsto (m_1, \dots, m_{2n}) \qquad (2.13)$$

between the recursion coefficients of the orthogonal polynomials and the corresponding moments. The Jacobian of $\varphi_{2n}^{\mathbb{R}}$ has an explicit product form in the $\beta_j$.

The values $\beta_j$ have a simple interpretation in terms of moments, being related to the ratio of two consecutive even moments. The coefficients $\alpha_j$ give information about the symmetry of the measure; e.g., for µ symmetric around 0 one has $\alpha_j = 0$ for all j. Taking into account these two different roles, we will again consider two continuous functions $V_1: \mathbb{R} \to \mathbb{R}$ and $V_2: \mathbb{R}_+ \to \mathbb{R}$ such that for some ε > 0 the growth condition (2.14) holds for |α|, β large enough. With these notations we define the probability measure $P_{n,\mathbb{R},V_{1,2}}$ on $M_n(\mathbb{R})$ by $P_{n,\mathbb{R},V_{1,2}}(\partial M_n(\mathbb{R})) = 0$ and on $\mathrm{Int}(M_n(\mathbb{R}))$ via a density $P_{n,\mathbb{R},V_{1,2}}(m_1, \dots, m_n)$ defined analogously to (2.10), and obtain the following universal law of large numbers.
Theorem 2.7. Let $V_1 \in C^2(\mathbb{R})$, $V_2 \in C^2((0, \infty))$ be continuous at 0 and satisfy (2.14). Furthermore, assume that the associated functions $W_1$ and $W_2$ have unique minimizers $\alpha^* \in \mathbb{R}$ and $\beta^* \in (0, \infty)$, respectively. Then for any fixed k the vector $(m_1^{(n)}, \dots, m_k^{(n)})$ converges almost surely and in $L^1$ to $(m_1^*, \dots, m_k^*)$, where $m_1^*, \dots, m_k^*$ are the first k moments of the semicircle distribution $\mu_{SC,\alpha^*,\beta^*}$ defined in (2.6).

We finish this section with some concluding remarks concerning the class of models we consider. We study random moment sequences with independent and nearly identically distributed canonical moments or recurrence coefficients, respectively. Dropping either of the two properties will in general result in non-universal limiting sequences even on unbounded intervals, if there is any limit at all. Nevertheless, other related models have been used for successful studies of random matrix models. More precisely, so-called Gaussian beta ensembles admit tridiagonal matrix models; see Dumitriu and Edelman (2002). More recently, Krishnapur et al. (2016) have used tridiagonal matrix models for studying non-Gaussian beta ensembles. They consider $\exp(-n \operatorname{Tr} Q(T)) \det(D\varphi_n^{\mathbb{R}})$ as density on the space of recursion coefficients, where T is the symmetric tridiagonal matrix (truncated Jacobi operator) with the $\alpha_j$'s on the main diagonal and the $\beta_j$'s on the neighbouring diagonals, Q is a strictly convex polynomial and Tr denotes the trace. It is not hard to see from the results in Krishnapur et al. (2016) that the limiting moments corresponding to this model are those of the equilibrium measure to Q (see (2.4)); only for Q quadratic (the case studied in Dumitriu and Edelman (2002)) do the moments of the semicircle law appear.
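The map from recursion coefficients to moments underlying Lemma 2.6 is easy to realize numerically: with the truncated (symmetrized) Jacobi matrix T, one has $m_k = (T^k)_{11}$. A sketch (our own illustration), checked against the semicircle case $\alpha_j = 0$, $\beta_j = 1$, whose even moments are the Catalan numbers 1, 2, 5 and whose odd moments vanish:

```python
import numpy as np

def moments_from_jacobi(alpha, beta, k):
    """Moments m_1, ..., m_k of the measure with Jacobi coefficients
    (alpha_j, beta_j), via m_k = (T^k)_{11} for the truncated Jacobi matrix."""
    T = np.diag(alpha) + np.diag(np.sqrt(beta), 1) + np.diag(np.sqrt(beta), -1)
    return [np.linalg.matrix_power(T, j)[0, 0] for j in range(1, k + 1)]

# Semicircle law with alpha* = 0, beta* = 1: moments 0, 1, 0, 2, 0, 5.
k = 6
m = moments_from_jacobi(np.zeros(k), np.ones(k - 1), k)
```

A truncation of size k suffices here because a walk of length k on the tridiagonal matrix starting and ending at the first index never leaves the first k states.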
The connection between certain random matrix ensembles and canonical moments/recursion coefficients has also been used in Gamboa et al. (2016) and Gamboa et al. (2017) for deriving so-called sum rules for free binomial, semicircle and Marchenko-Pastur distribution.

Asymptotic Normality, Moderate and Large Deviations
In this section, we examine the fluctuations of the random moment sequences around their non-random limits. We state the central limit theorem and moderate and large deviations results. For the uniform distribution on the moment space M n ([0, 1]), results of this type were obtained by Chang et al. (1993) and Gamboa and Lozada-Chang (2004), respectively. The following theorem shows that the fluctuations of random moment vectors around their limits are Gaussian. We will adopt a short notation that allows us to state the three cases E = [a, b], E = R + , E = R simultaneously. Note that the functions W 1 , W 2 as well as the limiting moments m * j differ, depending on E.
Theorem 3.1. In the situation of Theorem 2.1, Theorem 2.5 or Theorem 2.7, assume in addition that $W_1''(y_1^*) > 0$ and $W_2''(y_2^*) > 0$. Then in any of the three cases $E = [a,b]$, $E = \mathbb{R}_+$, $E = \mathbb{R}$, for any k ≥ 1, as n → ∞,

$$\sqrt{n}\,\big(m_1^{(n)} - m_1^*, \dots, m_k^{(n)} - m_k^*\big) \xrightarrow{d} \mathcal{N}(0, \Sigma_k),$$

where the matrix $\Sigma_k$ is given by

$$\Sigma_k = D\varphi_k^E(y^*)\, \Lambda\, \big(D\varphi_k^E(y^*)\big)^T, \qquad \Lambda = \mathrm{diag}\big(W_1''(y_1^*)^{-1}, W_2''(y_2^*)^{-1}, W_1''(y_1^*)^{-1}, \dots\big).$$

Here, the maps $\varphi_k^E$ have been defined in (2.1), (2.7) and (2.12), (2.13), the diagonal matrix Λ is of size k × k and $y^* = (y_1^*, y_2^*, y_1^*, \dots) \in \mathbb{R}^k$. In the case $E = \mathbb{R}_+$ and $z_1^* = z_2^*$, the covariance matrix takes a modified form.

Theorem 3.1 shows that in all considered cases the $1/\sqrt{n}$-fluctuations of $m_1^{(n)}, \dots, m_k^{(n)}$ around $m_1^*, \dots, m_k^*$ are Gaussian. We will now study larger fluctuations. The appropriate tool for describing the exponentially small probabilities associated to these fluctuations is the large deviations principle. Recall that a sequence of random vectors $(X_n)_n$ with values in a Polish space $\mathcal{X}$ is said to satisfy a large deviations principle with speed $(b_n)_n$, $\lim_{n\to\infty} b_n = \infty$, and good rate function I, if $I: \mathcal{X} \to [0, \infty]$ is lower semi-continuous, has compact level sets $\{x \in \mathcal{X}: I(x) \leq K\}$, K ≥ 0, and for any open set $O \subset \mathcal{X}$ and closed set $U \subset \mathcal{X}$

$$\liminf_{n\to\infty} \frac{1}{b_n} \log P(X_n \in O) \geq -\inf_{x \in O} I(x), \qquad \limsup_{n\to\infty} \frac{1}{b_n} \log P(X_n \in U) \leq -\inf_{x \in U} I(x),$$

cf. (Dembo and Zeitouni, 2010, p. 6). The next theorem is a result on moderate deviations. It shows that on scales up to o(1) the exponential leading order asymptotics are still given by the Gaussian distributions from Theorem 3.1; in particular, they are universal.
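For the uniform case on $M_n([0,1])$ the Gaussian fluctuations can be checked in the simplest coordinate: the first moment is $m_1^{(n)} = p_1 \sim \mathrm{Beta}(n, n)$, and the covariance of Chang et al. (1993) predicts the limit variance $\Sigma_1 = m_2^* - (m_1^*)^2 = 3/8 - 1/4 = 1/8$. A Monte Carlo sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 2000, 20000

# Uniform distribution on M_n([0,1]): first moment m_1 = p_1 ~ Beta(n, n).
m1 = rng.beta(n, n, size=trials)
fluct = np.sqrt(n) * (m1 - 0.5)

# Predicted limiting variance: Sigma_1 = m_2* - (m_1*)^2 = 1/8.
var_hat = fluct.var()
```

Indeed, $\mathrm{Var}(\mathrm{Beta}(n,n)) = 1/(4(2n+1))$, so $n \cdot \mathrm{Var} \to 1/8$, matching the covariance formula.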
Theorem 3.2. Let the conditions of Theorem 3.1 be satisfied. Then for any of the three cases $E = [a,b]$, $E = \mathbb{R}_+$, $E = \mathbb{R}$ and any real-valued sequence $(a_n)_n$ with $\lim_{n\to\infty} a_n = \infty$ and $a_n = o(\sqrt{n})$, the sequence of random variables $a_n(m_1^{(n)} - m_1^*, \dots, m_k^{(n)} - m_k^*)$ satisfies a large deviations principle with speed $n/a_n^2$ and good rate function $J(x) = \frac{1}{2} x^T \Sigma_k^{-1} x$.

The next result shows that for fluctuations of order 1 a new, non-universal rate function arises.
Theorem 3.3. Let the conditions of Theorem 2.1, Theorem 2.5 or Theorem 2.7 be satisfied. Then in each of the three cases, the sequence $(m_1^{(n)}, \dots, m_k^{(n)})$ satisfies a large deviations principle with speed n and good rate function

$$I(m_1, \dots, m_k) = \sum_{j \text{ odd}} \big(W_1(y_j) - W_1(y_1^*)\big) + \sum_{j \text{ even}} \big(W_2(y_j) - W_2(y_2^*)\big)$$

on the interior of the moment space, and ∞ elsewhere. Here $y_i^*$, i = 1, 2, are as in Theorem 3.1 and the $y_j$, j = 1, ..., k, denote the coordinates of $(m_1, \dots, m_k)$, defined similarly to the $p_j$ ($E = [a,b]$), the $z_j$ ($E = \mathbb{R}_+$) or, for $E = \mathbb{R}$, the recursion coefficients $\alpha_j, \beta_j$.
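In the uniform case on $M_n([0,1])$ (i.e. $V_1 \equiv 0$), the rate for a single odd canonical moment reduces to $I_1(p) = -\log(4p(1-p))$, since $W_1(p) = -\log(p(1-p))$ is minimized at $p_1^* = 1/2$. This can be tested against the exact Beta(n, n) tail, computed in log space to avoid underflow (a sketch of our own):

```python
import math

def log_beta_tail(n, x, grid=20000):
    """log P(B >= x) for B ~ Beta(n, n), by midpoint quadrature in log space."""
    log_norm = 2 * math.lgamma(n) - math.lgamma(2 * n)   # log of the Beta(n,n) constant
    h = (1.0 - x) / grid
    pts = [x + (i + 0.5) * h for i in range(grid)]
    logs = [(n - 1) * math.log(p * (1 - p)) for p in pts]
    mx = max(logs)
    s = sum(math.exp(l - mx) for l in logs)
    return mx + math.log(s * h) - log_norm

n, x = 500, 0.8
rate_hat = -log_beta_tail(n, x) / n
rate = -math.log(4 * x * (1 - x))     # I_1(0.8) = -log(0.64)
```

For n = 500 the empirical rate already agrees with $I_1(0.8) \approx 0.446$ up to a correction of order $(\log n)/n$.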
Proof of Theorem 3.3. For the sake of brevity we restrict ourselves to the case E = [a, b]; the remaining cases can be proved analogously. To this end, we will show that each $p_{2i-1}^{(n)}$ satisfies a large deviations principle on [0, 1] with good rate function

$$I_1(p) := W_1(p) - W_1(p_1^*)$$

on the interval (0, 1) and ∞ elsewhere, where $W_1(p) = V_1(p) - \log(p(1-p))$. Analogously, the $p_{2i}^{(n)}$ satisfy a large deviations principle on [0, 1] with good rate function $I_2(p) := W_2(p) - W_2(p_2^*)$ on the interval (0, 1) and ∞ elsewhere. The assertion then follows from the independence of the $p_i$'s and the contraction principle. Note that $\varphi_k^{[a,b]}$ is bijective and thus the rate function does not change when passing from canonical to ordinary moments.
For the upper bound (3.2), let U ⊂ [0, 1] be a closed set. If U ⊂ {0, 1}, then (3.2) is trivially true by the definition of $P_{n,[a,b],V_{1,2}}$, and thus we may assume U ∩ (0, 1) ≠ ∅. Then, setting $W_U := \inf_{p \in U} W_1(p)$, a direct estimate of the normalized density yields (3.2). For the lower bound, one argues similarly.

Next, we prove the results on laws of large numbers in Section 2. It follows from Theorem 3.3 and the Borel-Cantelli lemma that in all three cases $(m_1^{(n)}, \dots, m_k^{(n)})$ converges almost surely to $(m_1^*, \dots, m_k^*)$; convergence in $L^1$ follows since the $m_j^{(n)}$ are uniformly integrable thanks to the exponential decay from the large deviations principle. It remains to identify the measures corresponding to the moment sequences $(m_1^*, m_2^*, \dots)$. The general technique to do this is to consider the Jacobi operator associated to the recurrence coefficients of the orthogonal polynomials and derive an equation for the Stieltjes transform of the desired measure via a continued fraction expansion. We start with the simplest case of Theorem 2.7, where we explain the strategy in detail.
We will make use of the following lemma.
Lemma 4.1. Let µ be a Borel probability measure on R that is determined by its moments (i.e. the Hamburger moment problem for the moments of µ is determinate). Let $\alpha_1, \beta_1, \alpha_2, \beta_2, \dots$ denote the recurrence coefficients of the monic orthogonal polynomials of the measure µ (see (2.11)). If µ is supported on N points, we set $\beta_j := 0$ for j ≥ N. Then the Stieltjes transform of µ, defined for $z \in \mathbb{C}_+ := \{z \in \mathbb{C}: \Im z > 0\}$, has the continued fraction expansion

$$\int \frac{d\mu(x)}{z - x} = \cfrac{1}{z - \alpha_1 - \cfrac{\beta_1}{z - \alpha_2 - \cfrac{\beta_2}{z - \alpha_3 - \cdots}}}.$$

Here the convergents $f_l$ converge locally uniformly in $\mathbb{C}_+$ as l → ∞.
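Lemma 4.1 can be checked numerically; the following sketch (our own illustration, not part of the proof) takes the semicircle law on [−2, 2], for which $\alpha_j = 0$, $\beta_j = 1$ and the Stieltjes transform is the branch of $(z - \sqrt{z^2 - 4})/2$ with negative imaginary part on $\mathbb{C}_+$:

```python
import cmath

def cf_convergent(z, depth):
    """Bottom-up evaluation of the continued fraction of Lemma 4.1
    with alpha_j = 0, beta_j = 1 (semicircle on [-2, 2])."""
    f = 0j
    for _ in range(depth):
        f = 1.0 / (z - f)      # f <- 1 / (z - alpha - beta * f)
    return f

z = 1.0 + 2.0j
g_cf = cf_convergent(z, 200)
g_exact = (z - cmath.sqrt(z * z - 4)) / 2   # principal branch is correct here
```

The iteration converges geometrically to the attracting fixed point, which is the root of smaller modulus and hence the Stieltjes transform.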
Although the connection between continued fractions, Stieltjes transforms and orthogonal polynomials is classical and this result should be well-known, we did not manage to find this lemma in the literature. For measures with compact support, it is called Markov's theorem. We will give an elementary derivation.
Proof of Lemma 4.1. Let µ be a measure whose support consists of precisely N distinct points. Then the monic orthogonal polynomials $P_1, \dots, P_N$ up to order N with respect to µ and the corresponding recursion coefficients $\alpha_1, \beta_1, \alpha_2, \beta_2, \dots, \beta_{N-1}, \alpha_N$ are well-defined. Moreover, if µ has masses $\omega_1, \dots, \omega_N$ at the points $t_1, \dots, t_N$ and $m_j$ denotes the j-th moment of µ, the monic orthogonal polynomial $P_N$ is proportional to a determinant whose entries are the moments $m_{i_j}$. This determinant vanishes whenever two indices $i_j$ and $i_k$ coincide. If all indices are different, the determinant is equal (up to a sign) to the polynomial $\ell(t) = \prod_{i=1}^{N}(t - t_i)$. Consequently, the polynomials $\tilde{P}_N$ and $P_N$ are also proportional to $\ell(t)$ and therefore vanish precisely at the support points $t_1, \dots, t_N$ of the measure µ.
We now define for $z \in \mathbb{C}_+$ the continued fraction

$$f_j(z) := \cfrac{1}{z - \alpha_1 - \cfrac{\beta_1}{z - \alpha_2 - \cdots - \cfrac{\beta_{j-1}}{z - \alpha_j}}}.$$

Writing $f_j(z)$ as a single fraction $A_j(z)/B_j(z)$, we see that $A_j(z)$ and $B_j(z)$, j = 1, ..., N, satisfy the recursions $A_0(z) := 0$, $B_0(z) := 1$, $A_1(z) := 1$, $B_1(z) := z - \alpha_1$ and

$$A_j(z) = (z - \alpha_j)A_{j-1}(z) - \beta_{j-1}A_{j-2}(z), \qquad B_j(z) = (z - \alpha_j)B_{j-1}(z) - \beta_{j-1}B_{j-2}(z)$$

for 2 ≤ j ≤ N. Clearly, $B_j$ is a polynomial in z of degree j with leading coefficient 1, and as it satisfies the same recursion as the orthogonal polynomials $P_j$, we conclude $B_j = P_j$ for 0 ≤ j ≤ N. Furthermore, the sequence of functions $Q_j$ satisfies the same recursion as $A_j$, from which we conclude $Q_j = A_j$ for 0 ≤ j ≤ N. As the roots of $P_N$ are precisely the support points of the measure µ, we obtain the asserted continued fraction expansion in this case.

An induction argument over the sum i + j shows that $g_{i,j}$ is a homogeneous polynomial of degree i in $z_1, z_2, \dots$. Consequently, the partial derivative $\partial g_{i,j}/\partial z_k$ is a homogeneous polynomial of degree i − 1. Following the arguments of Dette and Nagel (2012) we have $g_{k,k} = m_k$.

Proof of Theorem 3.2. We will only prove the case E = [a, b]; the remaining cases are treated similarly. We first show that each $a_n(p_{2j-1}^{(n)} - p_1^*)$ satisfies a large deviations principle with good rate function $J(x) := W_1''(p_1^*)x^2/2$ and speed $b_n$, where $(a_n)_n$ and $(b_n)_n$ are chosen as in Theorem 3.2. In order to see this, let U ⊂ R be an arbitrary closed set and 0 < ε < 1 sufficiently small so that $W_1''(y) \geq M > 0$ holds for all $y \in (p_1^* - \varepsilon, p_1^* + \varepsilon)$ and some constant M > 0. Set $\gamma := \inf_{x \in U} |x|$, $R(p) := (p(1-p))^{-(2i-1)}$ and let $I_1$ be the function (4.3). Note that $I_1 \geq 0$ with unique zero $p_1^*$ and $I_1''(p_1^*) = W_1''(p_1^*)$. The case γ = ∞ is trivial, since then U = ∅, so we may assume γ < ∞. We will first consider $U \cap \{|x| \geq \varepsilon a_n\}$.
We get

$$\limsup_{n\to\infty} \frac{1}{b_n} \log \int_U 1_{\{|x| \geq \varepsilon a_n\}} e^{-n I_1(x/a_n + p_1^*)} R(x/a_n + p_1^*)\, dx$$
$$\leq \limsup_{n\to\infty} \frac{1}{b_n} \log \int_{\mathbb{R}} 1_{\{|x| \geq \varepsilon a_n\}} e^{-(2i-1)V_1(x/a_n + p_1^*)} \exp\Big(-(n - (2i-1)) \inf_{|y - p_1^*| \geq \varepsilon} I_1(y)\Big)\, dx$$
$$\leq \limsup_{n\to\infty} \frac{1}{b_n} \log \int_{\mathbb{R}} a_n\, e^{-(2i-1)V_1(t)} \exp\Big(-(n - (2i-1)) \inf_{|y - p_1^*| \geq \varepsilon} I_1(y)\Big)\, dt$$
$$\leq \limsup_{n\to\infty} \frac{a_n^2}{n} \Big(\log a_n - (n - (2i-1)) \inf_{|y - p_1^*| \geq \varepsilon} I_1(y)\Big) = -\infty.$$