Large deviations principle for biorthogonal ensembles and variational formulation for the Dykema-Haagerup distribution

This note provides a large deviations principle for a class of biorthogonal ensembles. We extend the results of Eichelsbacher, Sommerauer and Stolz to more general types of interactions. Our result covers the case of the singular values of the lower triangular random matrices with independent entries introduced by Cheliotis. In particular, we obtain as a consequence a variational formulation for the Dykema-Haagerup distribution, as it is the limit law of the singular values of lower triangular matrices with i.i.d. complex Gaussian entries.


Introduction and results
The aim of this note is to extend the work of Eichelsbacher, Sommerauer and Stolz [ESS11] in order to prove a large deviations principle for a wide class of biorthogonal ensembles which includes the matrix models introduced by Cheliotis in [Che14]. The authors of [ESS11] proved a large deviations principle for a wide variety of models, such as the biorthogonal Laguerre ensembles or the matrix model of Lueck, Sommers and Zirnbauer [LSZ06] for disordered bosons. Those models deal with particle systems in $\mathbb{R}$ or $\mathbb{C}$ with a density involving a double interaction term of the type $\prod_{i<j} |x_i - x_j|\,|x_i^\theta - x_j^\theta|$ with $\theta \in \mathbb{N}^*$. Biorthogonal ensembles were introduced in physics by Muttalib in [Mut95] and in mathematics by Borodin in [Bor98]. The recent article [BLTW15] develops potential theory for the model we study. Large deviations for particle systems with general repulsion have been studied in [CGZ14], and we show that their results apply to this kind of problem.
In the article [Che14], Cheliotis presented a lower triangular random matrix model for which the distribution of the singular values can be computed and forms a class of biorthogonal ensembles. Later, in [FW15], Forrester and Wang found another matrix model for these ensembles. Large deviations principles for the empirical measures of biorthogonal ensembles enter the general framework of [CGZ14], but it is not clear that this model fits their technical hypotheses.
Triangular matrices are elementary objects that appear in many factorization algorithms, such as the Cholesky or the LU decomposition, so one could wonder whether, starting from a random matrix, we can compute the distribution of the coefficients of its Cholesky decomposition. Bartlett answered that question in [Bar33] and proved that the entries of the Cholesky decomposition of a Wishart random matrix are independent: Gaussian variables off the diagonal and chi random variables on the diagonal. This result is known as the Bartlett decomposition of a Wishart matrix. Cheliotis studied the reverse problem: given a simple model of random triangular matrices $T_n$, what can we say about the eigenvalues of $T_n T_n^*$? Fix a positive integer $n \in \mathbb{N}$ and two parameters $b > 0$ and $\theta \ge 0$; we consider the random lower triangular matrix $T_n = (X_{i,j})_{1 \le i,j \le n}$ with independent random coefficients $X_{i,j}$ distributed according to: where $c_j = \theta(j-1) + b$ and $d\mathbb{C}$ is the Lebesgue measure on the complex plane. Note that when $\theta$ equals 0 and $b$ equals 1, the non-zero entries are i.i.d. complex Gaussians.
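The $\theta = 0$, $b = 1$ case is easy to simulate numerically. The sketch below (an illustration of ours, not part of the original paper; the function name is hypothetical) samples a lower triangular matrix with i.i.d. standard complex Gaussian entries and computes the eigenvalues of $\frac{1}{n} T_n T_n^*$. Note that the mean of these eigenvalues equals $\frac{1}{n^2}\sum_{i \ge j} |X_{i,j}|^2$, whose expectation is $(n+1)/2n \to 1/2$.

```python
import numpy as np

def triangular_eigenvalues(n, seed=0):
    """Sample the theta = 0, b = 1 case of the model: a lower triangular
    matrix T_n with i.i.d. standard complex Gaussian entries (variance 1),
    and return the eigenvalues of (1/n) T_n T_n^*."""
    rng = np.random.default_rng(seed)
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    t = np.tril(g)                                # keep the lower triangular part
    return np.linalg.eigvalsh(t @ t.conj().T / n)  # Hermitian eigenvalues

eigs = triangular_eigenvalues(500)
print(eigs.mean())  # close to 1/2 for large n
```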
In the article [Che14], Cheliotis was able to compute the distribution of the ordered eigenvalues of these matrices: it is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$ with density: when $\theta > 0$. When $\theta = 0$, the density of the distribution of $\Lambda_n$ is: We notice that for suitable choices of $\theta$ and $b$ we recover many classical ensembles, such as the Laguerre ensembles.
In the rest of this note, we are interested in the eigenvalues of $\frac{1}{n} S_n$. The factor $1/n$ is the proper scaling to observe a convergence of the empirical measure. We keep the notation $\lambda_1, \ldots, \lambda_n$ for the eigenvalues of $\frac{1}{n} S_n$ and we define its empirical measure: The special case $\theta = 0$ and $b = 1$, in which all the coefficients $X_{i,j}$ are independent complex random variables with variance 1, is of particular interest. In [DH04], using free probability theory, Dykema and Haagerup proved that $(\mu_n)_{n \in \mathbb{N}^*}$ converges weakly in probability towards a deterministic measure, known as the Dykema-Haagerup distribution. In [Che14], the same result is proved using the moments method and path counting. The Dykema-Haagerup distribution $\mu_{DH}$ is compactly supported and absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}_+^*$ with density: where $W_0$ is the Lambert function. $W_0$ is analytic in $\mathbb{C} \setminus (-\infty, -e^{-1}]$ and can be extended to $\mathbb{C}$ so that it is continuous on the upper half plane, see Figure 1.
The moments of the Dykema-Haagerup distribution are given by: The Stieltjes transform of $\mu_{DH}$ is defined for all $z \in \mathbb{C}$ with $\operatorname{Im}(z) > 0$ and is given by: The R-transform of $\mu_{DH}$ is defined for all $z \in \mathbb{C}$ such that $|z| < 1$ and is given by:
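The moment formula can be tested numerically. The sketch below (ours, not the paper's) compares the empirical moments of the eigenvalues of $\frac{1}{n} T_n T_n^*$ with the closed form $\int x^k \, d\mu_{DH}(x) = k^k/(k+1)!$; this closed form is our reading of the elided formula above (it gives $1/2$, $2/3$, $9/8$ for $k = 1, 2, 3$, and satisfies $m_k^{1/k} \to e$, consistent with the support edge recalled in Remark 1.7).

```python
import numpy as np
from math import factorial

def dh_moment(k):
    # Assumed closed form for the k-th moment of mu_DH: k^k / (k+1)!.
    return k**k / factorial(k + 1)

rng = np.random.default_rng(1)
n = 500
g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
t = np.tril(g)                                 # lower triangular Gaussian model
eigs = np.linalg.eigvalsh(t @ t.conj().T / n)  # eigenvalues of (1/n) T_n T_n^*

for k in (1, 2, 3):
    print(k, np.mean(eigs**k), dh_moment(k))
```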
In this note, the term "weak topology" refers to the topology associated to continuous and bounded test functions. The bounded Lipschitz metric $d$, defined as $d(\mu, \nu) = \sup_f \left| \int f \, d\mu - \int f \, d\nu \right|$, where the supremum is taken over functions bounded by 1 and 1-Lipschitz, metrizes the weak topology and makes $M_1(\mathbb{R}_+)$ a complete space, see [Bog07, Section 8.3].

Definition 1.1 (Logarithmic energy). The logarithmic energy is the functional: $E(\mu) = \iint \log \frac{1}{|x-y|} \, d\mu(x)\, d\mu(y)$. We also define the off-diagonal logarithmic energy, where we integrate on the complement of the diagonal of $(\mathbb{R}_+)^2$.
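For finitely supported measures, the bounded Lipschitz distance can be computed exactly as a small linear program over the values of $f$ at the support points: any admissible vector of values extends to a function on $\mathbb{R}$ bounded by 1 and 1-Lipschitz (McShane extension, then clipping to $[-1,1]$). A sketch of ours, assuming SciPy's `linprog` is available; the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def bl_distance(x, mu, nu):
    """Bounded Lipschitz distance between two probability measures with
    weights mu, nu on the points x: maximize sum_i f_i (mu_i - nu_i)
    over |f_i| <= 1 and |f_i - f_j| <= |x_i - x_j|."""
    m = len(x)
    rows, rhs = [], []
    for i in range(m):           # pairwise Lipschitz constraints
        for j in range(m):
            if i != j:
                row = np.zeros(m)
                row[i], row[j] = 1.0, -1.0
                rows.append(row)
                rhs.append(abs(x[i] - x[j]))
    res = linprog(c=-(np.asarray(mu, float) - np.asarray(nu, float)),  # maximize
                  A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(-1, 1)] * m, method="highs")
    return -res.fun

# Two Dirac masses at distance 5: the distance saturates at 2.
print(bl_distance([0.0, 5.0], [1.0, 0.0], [0.0, 1.0]))
```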
We define the confining potential associated to the eigenvalue distributions (1) and (2). Let $V : \mathbb{R} \to \mathbb{R}$ be given by: As the eigenvalues of $\frac{1}{n} S_n$ are the eigenvalues of $S_n$ divided by $n$, we can compute the distribution of the unordered eigenvalues $(\lambda_1, \ldots, \lambda_n)$. This distribution is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$ with density: where $Z_n$ is a normalizing constant depending on the model. Those two distributions are of the form: if we choose $g = g_\theta$ where: This density can be written in the form: where $Z_n$ is a normalizing constant and $g_*\mu$ is the push-forward of the measure $\mu$ by the function $g$. The term $\prod_{j=1}^n x_j^{b-1}$ will play no role in the large deviations and the same results are valid without it. We keep this term so that the connection with the model of random matrices introduced by Cheliotis is straightforward. To recover every Laguerre ensemble, one can consider "$b = bn$", which would correspond to changing the function $V$.

Theorem 1.2 (Large deviations principle for $\mu_n$). Let $g$ be a $C^1$ function on $\mathbb{R}_+^*$ with positive derivative. Let $V$ be a continuous function on $\mathbb{R}_+$ such that there exists a constant $\beta > b$ for which: Let us define $I : M_1(\mathbb{R}_+) \to \mathbb{R} \cup \{\infty\}$ by: The random sequence $(\mu_n)_{n \in \mathbb{N}}$ satisfies a large deviations principle with speed $n^2$ in $M_1(\mathbb{R}_+)$ for the weak topology, with good rate function $\tilde{I} = I - \inf I$. This means that for any Borel set $A \subset M_1(\mathbb{R}_+)$ we have: In addition, the rate function $I - \inf I$ is lower semi-continuous and strictly convex on the set of measures on which it is finite.
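For the reader's convenience, given the density (3) and the decomposition of $I$ used in Section 2 (a sum of $E(\mu)$, $E(g_*\mu)$ and a linear term), the rate function should take the following shape; this is our reading of the elided formula, up to the multiplicative constants fixed by the elided normalizations:

```latex
I(\mu) \;=\; E(\mu) + E(g_{*}\mu) + \int V \, d\mu
\;=\; \iint \log\frac{1}{|x-y|}\,d\mu(x)\,d\mu(y)
   \;+\; \iint \log\frac{1}{|g(x)-g(y)|}\,d\mu(x)\,d\mu(y)
   \;+\; \int V(x)\,d\mu(x).
```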

Remark 1.3 (Assumptions on $g$ and $V$). The assumptions on $g$ mean that the two interaction terms play the same role of short range repulsion, but at different scales. Our hypothesis on $g$ can be rephrased as "$g$ is locally a $C^1$ diffeomorphism".
The assumptions on $V$ are very standard in large deviations for Coulomb gases. They ensure that $\int e^{-V(x)} dx$ is finite and that the rate function is well defined.
In [ESS11], Eichelsbacher, Sommerauer and Stolz proved a large deviations principle for the empirical measures $\mu_n$ when $g = g_\theta$ with $\theta$ an integer, allowing $V$ to depend on $n$. The classical techniques used to prove large deviations for the empirical measures of Coulomb gases apply there with no modification.
The novelty of our approach is to extend the result of [ESS11] to any function $g$. Our theorem covers the original model of Muttalib from [Mut95] with $g(x) = \mathrm{Argsh}^2(\sqrt{x})$, which was the starting point of the study of biorthogonal ensembles. The matrix model introduced by Cheliotis corresponds to the choice $g = g_\theta$ with $\theta > 0$. Choosing $g(x) = \exp(x)$ gives the large deviations for the model of [CW14]. The key of this article is the way we deal with the lower bound. Instead of adapting the proof of the lower bound originally given by Ben Arous and Guionnet in [BAG97], we adapt the proof of Hiai and Petz from [HP00]. We show that the article [CGZ14] covers a wide class of biorthogonal ensembles, which did not seem obvious. The authors of [BLTW15] consider a closely related model, as they study holomorphic functions $g$ while the density (3) is integrated with respect to more general measures on $\mathbb{C}$ or $\mathbb{R}$. Our techniques rely on the classical probabilistic approach of large deviations, while they adopt a Bernstein-Markov approach.
From this result we obtain two important corollaries, which are the motivation for our study: a variational formulation for the Dykema-Haagerup distribution and the almost sure convergence of $(\mu_n)_{n \in \mathbb{N}}$ towards this measure. We also state a large deviations principle for the rightmost particle.
Corollary 1.4 (Almost sure convergence towards the minimizer). Let $g$ be a $C^1$ function on $\mathbb{R}_+^*$ with positive derivative. Let $\nu$ be the unique minimizer of the functional $I$. Then the random sequence of measures $(\mu_n)_{n \in \mathbb{N}}$ converges weakly almost surely towards the deterministic measure $\nu$.
Corollary 1.5 (Variational characterization of the Dykema-Haagerup distribution). The Dykema-Haagerup distribution $\mu_{DH}$ is the unique minimizer on $M_1(\mathbb{R}_+)$ of the functional: which is strictly convex.
Theorem 1.6 (Large deviations for the largest particle). Let $(x_1, \ldots, x_n)$ be distributed according to (3) and let $x_n^* = \max_{1 \le i \le n} x_i$. Suppose that the hypotheses of Theorem 1.2 are satisfied and assume that there exists a constant $\zeta$ such that: where $Z_{n-1}^*$ is the normalizing constant of the gas (3) with $n-1$ particles and confining potential $\frac{n}{n-1} V$. Let $\mu_{eq}$ be the limit measure of $(\mu_n)_{n \in \mathbb{N}^*}$ and let $b_{eq}$ be the right endpoint of its support. The random sequence $(x_n^*)_{n \in \mathbb{N}^*}$ satisfies a large deviations principle in $\mathbb{R}_+$ with speed $n$ and good rate function: where $\kappa$ is such that $J(b_{eq}) = 0$.

This theorem will not be proved in this note, as the authors of [CE15] already proved it for the model of [ESS11]. In the setting of [CE15], the number of particles at step $n$ is not $n$ but $p(n)$, which makes their result more technical. One could also adapt the proof of the similar theorem from [AGZ10], as the scheme of the proof is the same. First, the product structure of the density (3) allows us to separate the variables and integrate with respect to $x_1 < \cdots < x_{n-1}$. Then, the large deviations principle for the empirical measure tells us that the particles $x_1 < \cdots < x_{n-1}$ generate the same potential as the measure $\mu_{eq}$. Finally, the assumption on the normalizing constants allows us to control the error made when changing the measure from $n$ particles to $n-1$ particles.
Remark 1.7 (Large deviations for the top eigenvalue of Cheliotis' matrix model). In the article [Che14], Cheliotis gives exact formulas for the normalizing constants $Z_n$ when $g = g_\theta$ and $V(x) = x$. It is straightforward to check that this model satisfies the assumptions of Theorem 1.6. One also obtains another proof of the fact that, for the Dykema-Haagerup model, $\lambda_{\max}$ converges almost surely towards $e$.
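The convergence of $\lambda_{\max}$ towards $e$ is easy to observe numerically in the $\theta = 0$, $b = 1$ case. A sketch of ours (the function name is hypothetical):

```python
import numpy as np

def lambda_max(n, seed):
    """Largest eigenvalue of (1/n) T_n T_n^* for the theta = 0, b = 1
    triangular model, via the top singular value of T_n."""
    rng = np.random.default_rng(seed)
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    t = np.tril(g)
    return np.linalg.norm(t, 2) ** 2 / n  # spectral norm squared, rescaled

vals = [lambda_max(400, s) for s in range(3)]
print(vals)  # each value should be close to e ~ 2.718
```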
The rest of the note is devoted to the proofs of the theorems.We start by proving the large deviations principles and then we deduce the variational formula and the almost sure convergence.

Proof of the large deviations principle.
The proof of Theorem 1.2 is very close to the standard proof of the large deviations principle for Coulomb gases in $\mathbb{R}$. Many authors proved similar results following the steps of [BAG97]. For general $b$ and positive integer $\theta$, Theorem 1.2 is a special case of the results of [ESS11]. Not much is new in the proof that we present here, hence we will focus on what differs from the usual techniques. The parts of the proof that are omitted can be taken from [AGZ10] or [CGZ14]. The classical proof is organized as follows:

Step 1: Study of the rate function;
Step 2: Exponential tightness for the non-normalized measures;
Step 3: Weak upper bound for the non-normalized measures;
Step 4: Weak lower bound for the non-normalized measures;
Step 5: Recovery of the full large deviations principle for the normalized measures.
We will give the fundamental inequality needed for Step 1. Then, the classical proofs of Steps 2 and 3 apply with no modification. We will give full details about Step 4, as it is the difficult part of the proof. Once the large deviations principle is proved for the non-normalized measures, Step 5 consists in obtaining the asymptotics of the normalizing constants by applying the large deviations inequalities to the whole space of probability measures.

Study of the rate function.
Definition 2.1. We set, for any non-negative $x$ and $y$: Using the inequality: we obtain: This inequality shows that the function $I$ is well defined and takes its values in $\mathbb{R} \cup \{\infty\}$. It is also the key to proving that $I$ is a good rate function. All the details are given in the reference book [AGZ10, Lemma 2.6.2 p. 72].
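The elided inequality is, presumably, the standard bound used for Coulomb gases; this is our reading, consistent with the reference to [AGZ10, Lemma 2.6.2]:

```latex
|x - y| \le (1 + |x|)(1 + |y|)
\quad\Longrightarrow\quad
\log\frac{1}{|x - y|} \ge -\log(1 + |x|) - \log(1 + |y|),
```

so that the growth assumption on $V$ makes the integrand defining $I$ bounded from below.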
To prove that the rate function $I$ is strictly convex where it is finite, we observe that the logarithmic energy $\mu \mapsto E(\mu)$ is known to be strictly convex where it is finite, see [AGZ10] or [Dei00]. As the map $\mu \mapsto g_*\mu$ is linear, the function $\mu \mapsto E(g_*\mu)$ is strictly convex where it is finite. The rate function $I$ is the sum of two strictly convex functions and a linear function, hence it is strictly convex on the set $\{\mu \in M_1(\mathbb{R}_+) \mid I(\mu) < \infty\}$. The exponential tightness is also a consequence of the inequality (5), see for instance [AGZ10, p. 77].
To prove the upper bound for the non-normalized measures, the strategy is exactly the same as in [CGZ14]. That proof applies with no modification.

Proof of the lower bound.
The proof of the lower bound from [ESS11] does not seem to cover the case where $g$ is not an integer power function. The classical scheme of proof of Ben Arous and Guionnet for the lower bound is not well suited to biorthogonal ensembles. We show that the scheme of proof of [HP00] for the lower bound is more robust and allows us to deal with more general types of interactions.
We want to prove that for any $\sigma \in M_1(\mathbb{R}_+)$: The classical technique consists in constructing configurations for which the density (1) is very close to $\exp(-I(\sigma))$. It is important to check that the measure of the configurations we construct in $\mathbb{R}^n$ does not decay too fast. Unfortunately, this is not easy to do for a general measure $\sigma$. We notice that it suffices to prove the bound for sufficiently regular measures $\sigma$.

First step: Reduction to "nice" measures.
We will prove that for any sufficiently regular measure $\sigma$, we have: where the infimum is taken over the neighborhoods $G$ of $\sigma$. In order to prove that this bound is sufficient to obtain the lower bound of the large deviations principle, we prove the upper semi-continuity of the function $\phi : M_1(\mathbb{R}_+) \to \mathbb{R}$ given by: Let $G$ be a neighborhood of $\sigma$; then there exists an integer $K$ such that for all $k \ge K$, $\sigma_k \in G$. This implies that for any $k \ge K$: where the $G_k$ are neighborhoods of $\sigma_k$. Taking the limit superior of this inequality and the infimum over the neighborhoods $G$ of $\sigma$, we obtain the upper semi-continuity of $\phi$. If we prove (8) for a dense set of measures, then for any measure $\sigma \in M_1(\mathbb{R}_+)$ there exist measures $\sigma_k$ such that (8) holds and $\sigma_k \to \sigma$, and we get: We will consider a specific sequence of measures $\sigma_k$ such that for any $k$, $\sigma_k$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}_+$, with compact support in $\mathbb{R}_+^*$ and density bounded from above and below by positive constants, and such that: Once we obtain this sequence, we will only have to prove the lower bound for the measures satisfying the regularity conditions given above.

Consider the renormalized restrictions of $\sigma$ to compact sets; as $f$ is bounded from below, by the monotone convergence theorem we get: so we can assume that $\sigma$ has compact support in $\mathbb{R}_+^*$. Now let $\phi_\varepsilon$ be a $C^\infty$ probability density with support in $[0, \varepsilon]$ and set $\sigma_\varepsilon = \phi_\varepsilon * \sigma$. The measures $\sigma_\varepsilon$ have compact support in $\mathbb{R}_+^*$ with continuous density and converge towards $\sigma$ as $\varepsilon$ goes to zero. Since it is easy to check that $\int V(x)\, d\sigma_\varepsilon(x) \to \int V(x)\, d\sigma(x)$ as $\varepsilon \to 0$, we only have to prove that for any $\varepsilon$: Recall that the function $-E$ is concave, so if we notice that: then, thanks to the Jensen inequality and the invariance by translation of the logarithmic energy, we obtain the desired inequality. The last property we want for our "nice" measures is a density bounded from above and from below. As the densities of the measures $\sigma_\varepsilon$ are continuous with compact support, they are already bounded from above.
Changing $\sigma_\varepsilon$ to $\delta m + (1 - \delta)\sigma_\varepsilon$, where $m$ is the uniform measure on the support of $\sigma_\varepsilon$, allows us to deal with measures with continuous density bounded from above and from below.
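The mollification step can be checked numerically. The sketch below is an illustration of ours: a triangular bump stands in for a $C^\infty$ mollifier $\varphi_\varepsilon$ supported in $[0, \varepsilon]$, $\sigma$ is the uniform law on $[1, 2]$, and $V(x) = x$ (the potential of Remark 1.7); it verifies that $\int V \, d\sigma_\varepsilon \to \int V \, d\sigma$ as $\varepsilon \to 0$.

```python
import numpy as np

dx = 1e-3
x = np.arange(0.0, 4.0, dx)
sigma = np.where((x >= 1.0) & (x <= 2.0), 1.0, 0.0)  # uniform density on [1, 2]

def smoothed_mean(eps):
    s = np.arange(0.0, eps, dx)
    phi = np.maximum(0.0, 1.0 - np.abs(2.0 * s / eps - 1.0))  # bump on [0, eps]
    phi /= phi.sum() * dx                # normalize to a probability density
    sigma_eps = np.convolve(sigma, phi) * dx   # density of phi_eps * sigma
    y = np.arange(len(sigma_eps)) * dx
    return np.sum(y * sigma_eps) * dx    # int V d(sigma_eps) with V(x) = x

exact = 1.5  # int V d(sigma) = mean of the uniform law on [1, 2]
diffs = [abs(smoothed_mean(e) - exact) for e in (0.4, 0.2, 0.1)]
print(diffs)  # the error shrinks with eps (here it equals eps/2 up to discretization)
```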
Second step: lower bound for "nice" measures.
From now on, $\sigma$ will be a measure with compact support $[a, b] \subset \mathbb{R}_+^*$, with density $h$ with respect to the Lebesgue measure on $\mathbb{R}_+$, for which there exists a constant $C > 0$ such that $1/C \le h(x) \le C$ for all $x \in [a, b]$. Let $a_0, \ldots, a_n$ be the $1/n$-quantiles of $\sigma$, with $a_0 = a$ and $a_n = b$. We have that for any $k$: then for any $(z_1, \ldots, z_n) \in \Delta_n$ we have: where $d$ is the bounded Lipschitz distance. We are now ready to prove the lower bound. Let $\rho_1$ be the finite measure on $\mathbb{R}_+$ with density $x^{b-1} e^{-V(x)}$ with respect to the Lebesgue measure, and let $\rho_n = \rho_1 \otimes \cdots \otimes \rho_1$ be the finite $n$-fold product measure on $(\mathbb{R}_+)^n$.
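The quantile construction can be implemented directly from the cumulative distribution function. A numerical sketch of ours (names hypothetical), assuming only NumPy:

```python
import numpy as np

def quantiles(h, a, b, n, grid=100001):
    """Return the (k/n)-quantiles a_0 <= ... <= a_n of the probability
    measure with density h on [a, b], with a_0 = a and a_n = b, computed
    by inverting the discretized cumulative distribution function."""
    x = np.linspace(a, b, grid)
    cdf = np.cumsum(h(x))
    cdf = cdf / cdf[-1]                        # normalized discretized CDF
    qs = np.interp(np.linspace(0.0, 1.0, n + 1), cdf, x)
    qs[0], qs[-1] = a, b                       # enforce the endpoint convention
    return qs

# Uniform density on [0, 1]: the quantiles are just k/n.
print(quantiles(lambda x: np.ones_like(x), 0.0, 1.0, 4))
```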
We notice that: Hence, to obtain the lower bound, it is sufficient to prove that: and, using the fact that the functions $\log$ and $g$ are increasing: and also: If we temporarily admit the inequalities (10), (11) and (12), the proof of the lower bound for regular measures is complete. The last step consists in proving those three inequalities.

Last step: Proof of the inequalities.
First, (10) is easy to check, as we approximate a continuous integrable function on $[a, b]$ by simple functions.
We now prove (11), following the proof of [HP00]. We temporarily admit that there exists a constant $A > 0$ such that for $i < j$: and also that: We postpone the proof of the inequalities (13) and (14) and prove (11). We consider the kernel $\log |x - y|$ and we want to prove that: then for every $\varepsilon > 0$ we have: Taking the limit superior on both sides, and then the limit as $\varepsilon \to 0$, proves (11).
We now prove inequality (13). From inequality (9), we get for any $k > 0$: We deduce that the left-hand side of the inequality is bounded by a constant independent of $k$ and $n$, which proves (13). In order to prove (14), we start from: If we show that: can be made as small as desired when $k$ is bigger than a certain constant independent of $n$, then (14) will be proved. Using (9) we get: Those two terms can be made as small as desired if $k$ is sufficiently large, independently of $n$, which proves (14).

The proof of inequality (12) mimics the proof of inequality (11). As in the previous case, it is sufficient to find a constant $A$ such that for any $i < j$: and to prove that: As the support of $\sigma$ is a compact set included in $\mathbb{R}_+^*$, there exist two constants $m$ and $M$ such that for all $x$ in this support: The inequality (15) is a consequence of (13), using the mean value theorem for $g$ and the fact that its derivative is bounded from above and from below. The inequality (16) amounts to proving that the quantities: can be made as small as desired when $k$ is large enough. Using the mean value theorem we get: The other term is treated in the same way. Now that we have proved (15) and (16), the proof of (12) is exactly the same as the proof of (11).
As the function $I$ is lower semi-continuous and strictly convex, it has a unique minimizer, which we call $\mu_{b,\theta}$. Consider the sets: As $I$ is lower semi-continuous, $\inf\{\tilde{I}(\mu), \mu \in A_\varepsilon\} > 0$; then, thanks to the Borel-Cantelli lemma, we get: We already know that when $b = 1$ and $\theta = 0$ the random sequence $(\mu_n)_{n \in \mathbb{N}^*}$ converges weakly in probability towards the Dykema-Haagerup distribution $\mu_{DH}$. We also know from Corollary 1.4 that $(\mu_n)_{n \in \mathbb{N}^*}$ converges almost surely weakly towards the minimizer of $I$. Hence we obtain the following characterization of $\mu_{DH}$:

Perspectives.
Our result can be extended to any finite number of interactions of the type: where each $f_k$ is locally a $C^1$ diffeomorphism and the $\beta_k$ are positive numbers. Large deviations remain valid if the confining potential $V$ dominates all the functions $f_k$ simultaneously at infinity. The proof of this result would be similar to the proof of Theorem 1.2. The result of this note can be extended to any dimension if we make additional assumptions on the function $g$. One could assume that $g$ is continuously differentiable and that on any compact $K$ there exists a constant $m_K$ such that for any $x, y \in K$: $\|g(x) - g(y)\| \ge m_K \|x - y\|$. This condition is equivalent to $g$ being locally a $C^1$ diffeomorphism. The article [BLTW15] covers the complex case.
The model studied by Götze and Venker in [GV14] is not covered by this note, as they deal with a double interaction term of the type $\prod_{i<j} |x_i - x_j|^2 \varphi(x_i - x_j)$. This is really the combination of two different interactions, whereas our model deals with the usual logarithmic interaction at two different scales. As this model is covered by the study [CGZ14], one could try to find optimal conditions on $\varphi$ under which a large deviations principle holds.
We would like to thank Dimitris Cheliotis whose work [Che14] is the starting point of this study.