Asymptotic freeness through unitaries generated by polynomials of Wigner matrices

We study products of functions evaluated at self-adjoint polynomials in deterministic matrices and independent Wigner matrices; we compute the deterministic approximations of such products and control the fluctuations. We focus on minimizing the smoothness assumptions on those functions while optimizing the error term with respect to $N$, the size of the matrices. As an application, we build on the idea that the long-time Heisenberg evolution associated to Wigner matrices generates asymptotic freeness, as first shown in [9]. More precisely, given a self-adjoint non-commutative polynomial $P$ and a $d$-tuple $Y^N$ of independent Wigner matrices, we prove that the quantum evolution associated to the operator $P(Y^N)$ yields asymptotic freeness for large times.


1 Introduction and main results
In this paper, we consider polynomials in several independent Wigner matrices $Y^N_1, \dots, Y^N_d$. In the early nineties, in his seminal work [40], Voiculescu showed that the fitting framework to understand spectral properties of such polynomials is free probability, a non-commutative probability theory with a notion of freeness analogous to independence in classical probability. For Gaussian Wigner matrices he proved that, for any given collection of polynomials $P_1, \dots, P_k$ and indices $i_j \in [1, d]$, almost surely,
$$\lim_{N \to \infty} \operatorname{tr}_N\big(P_1(Y^N_{i_1}) \cdots P_k(Y^N_{i_k})\big) = \tau\big(P_1(x_{i_1}) \cdots P_k(x_{i_k})\big), \qquad (1.1)$$
where $\operatorname{tr}_N$ denotes the normalized trace on $\mathbb{M}_N(\mathbb{C})$, $x_1, \dots, x_d \in \mathcal{C}_d$ is a system of $d$ free semicircular variables, and $\tau$ is the trace on the C*-algebra $\mathcal{C}_d$; see Definition 2.7. This result was generalized in numerous ways. To begin with, in [14] the author extended it to non-Gaussian Wigner matrices coupled with deterministic matrices under certain assumptions; see also Theorem 5.4.5 of [3] for a proof without those assumptions on the deterministic matrices. It is further possible to consider continuous functions, but only by approximating them with polynomials. For such results, one usually first proves the convergence in expectation, and then uses concentration inequalities to establish the almost sure convergence. One of the limitations of such combinatorial methods is that they are not well suited to obtain estimates on the convergence rate in (1.1). Strong quantitative estimates for smooth functions were first obtained by Haagerup and Thorbjørnsen in [22] in the Gaussian case. They studied not only the convergence of the trace but also of the operator norm. These results were extended to include deterministic matrices in [32,37], and to more general Wigner matrices in [1,4]. Beyond the question of polynomials, it is also worth noting that the case of non-commutative rational functions was tackled in [11,41]. Those papers, however, focus on studying the expectation rather than proving almost sure estimates, i.e. estimates on the difference between our random variable and its deterministic limit which hold with high probability. While measure concentration inequalities are usually sufficient to deduce almost sure results, one does not necessarily have such tools for general Wigner matrices. Indeed, if the laws of the entries of our random matrices satisfy a log-Sobolev inequality, then so does their joint law; see Section 2.3 of [3] for a good introduction on the topic. However, in this paper we do not make this kind of assumption on our random matrices. A possible way to address this issue would be the approach of [26], which uses mollified log-Sobolev inequalities to prove concentration estimates. In this paper we choose a more direct approach by studying high order moments.
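As a quick sanity check of the convergence (1.1), one can simulate it numerically. The sketch below is only illustrative (it assumes GOE-type matrices, a fixed modest size $N$, and tests two particular moments); the normalization is the one used throughout, $E|Y_{i,j}|^2 = 1/N$ off the diagonal.

```python
import numpy as np

def wigner(n, rng):
    """A real symmetric Wigner matrix with E|Y_ij|^2 = 1/n off the diagonal."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

rng = np.random.default_rng(0)
n = 500
y1, y2 = wigner(n, rng), wigner(n, rng)

# tr_N(Y_1^4) -> tau(x^4) = 2 (a Catalan number), while by freeness of x1 and x2,
# tr_N(Y_1 Y_2 Y_1 Y_2) -> tau(x1 x2 x1 x2) = 0.
tr4 = np.trace(y1 @ y1 @ y1 @ y1) / n
tr_mixed = np.trace(y1 @ y2 @ y1 @ y2) / n
print(tr4, tr_mixed)  # close to 2 and 0 respectively
```

At $N = 500$ the empirical traces already match the free values up to fluctuations of order $1/N$.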
The main results of our paper are high probability estimates for the trace as well as the matrix entries of any product of sufficiently smooth functions evaluated at polynomials in deterministic and independent Wigner matrices. We are in particular interested in optimizing the error term with respect to not only $N$ but also the derivatives of our functions. This will be especially useful for Theorem 1.2 below.
The following notions will be used in our statement of Theorem 1.1. A square random matrix $Y = (Y_{i,j})$ of size $N$ is a Wigner matrix if it is Hermitian or real symmetric and its entries are independent up to the symmetry constraints. The entries are assumed to be centered with variances $E[|Y_{i,j}|^2] = 1/N$ for all $i \neq j$, and they satisfy moment bounds for any $p \geq 2$, c.f. Definition 2.1 below. Given a sequence of random variables $(X_N, Y_N)_{N \geq 1}$ as well as a sequence $(\varepsilon_N)_{N \geq 1}$ of non-negative real numbers, we say that $X_N = Y_N + O(\varepsilon_N)$ with high probability if for any $k > 0$ there exists a numerical constant $C$ such that $P(|X_N - Y_N| \geq C \varepsilon_N) \leq N^{-k}$ holds for any $N$ sufficiently large; see Definition 2.4.
Theorem 1.1. Let the following objects be given:
• $Y^N = (Y^N_1, \dots, Y^N_d)$ independent real symmetric or complex Hermitian Wigner matrices of size $N$, defined as in Definition 2.1 below,
• $A^N = (A^N_1, \dots, A^N_q)$ deterministic matrices of size $N$ such that $\sup_{1 \leq i \leq q,\, N \in \mathbb{N}^*} \|A^N_i\| < \infty$,
• $x = (x_1, \dots, x_d)$ a system of free semicircular variables, free from $A^N$, i.e. they belong to the free product $\mathcal{C}_d * \mathbb{M}_N(\mathbb{C})$, where $\mathcal{C}_d$ is the C*-algebra generated by $x$ (see Definition 2.7 below),
• $f_1, \dots, f_k$, $k \geq 1$, functions such that either $f_i = \mathrm{id}_{\mathbb{R}}$ or there exists a complex-valued measure $\mu_i$ such that $\forall t \in \mathbb{R}$, $f_i(t) = \int_{\mathbb{R}} e^{ity} \, d\mu_i(y)$,
• $P_1, \dots, P_k$ non-commutative polynomials such that, whenever $f_i \neq \mathrm{id}_{\mathbb{R}}$, $P_i$ is self-adjoint (see Subsection 2.3).
Then set the convention $\|f_i\|_4 = 1$ if $f_i = \mathrm{id}_{\mathbb{R}}$, and otherwise define $\|f_i\|_4$ in terms of the variation $|\mu_i|$ of the measure $\mu_i$, as in Equation (1.4). Then we have the following result: for any $\varepsilon > 0$, Equation (1.5) holds with high probability, where $\operatorname{tr}_N$ is the normalized trace on $\mathbb{M}_N(\mathbb{C})$, while $\tau_N$ is the trace on the free product $\mathcal{C}_d * \mathbb{M}_N(\mathbb{C})$; see Definition 2.7. Moreover, for $x, y \in \mathbb{C}^N$, Equation (1.6) also holds with high probability, where $E_{\mathbb{M}_N(\mathbb{C})}$ is the conditional expectation from $\mathcal{C}_d * \mathbb{M}_N(\mathbb{C})$ to $\mathbb{M}_N(\mathbb{C})$ (see Definition 2.8 and the remarks that follow, notably for some special cases where this conditional expectation is easy to compute). Finally, if for every $i$, $Y^N_i$ is a GUE or GOE random matrix, then one can replace $\|f_i\|_4$ by $\|f_i\|_2$ in all of the previous formulas.
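For orientation, here is a numerical sketch of the simplest instance of the approximation in Equation (1.5): $k = 1$, $P_1 = X_1$, no deterministic matrices, and a hypothetical test function $f(t) = e^{-t^2/2}$, whose representing measure $\mu$ is Gaussian and hence has finite moments of every order. The sizes and tolerances are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
a = rng.standard_normal((n, n))
y = (a + a.T) / np.sqrt(2 * n)          # one Wigner matrix, P_1 = X_1

f = lambda t: np.exp(-t ** 2 / 2)        # f(t) = int e^{ity} dmu(y), mu Gaussian
lhs = np.mean(f(np.linalg.eigvalsh(y)))  # tr_N f(Y^N) via the spectrum

# tau(f(x)) for x semicircular: integrate f against sqrt(4 - t^2) / (2 pi).
t = np.linspace(-2.0, 2.0, 200001)
dt = t[1] - t[0]
rhs = np.sum(f(t) * np.sqrt(4 - t ** 2)) / (2 * np.pi) * dt
print(lhs, rhs)  # difference of order N^{-1+eps}
```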
Theorem 1.1 calls for the following remarks:
• It is important to note that, unlike previous works, we do not study the Stieltjes transform to get Equations (1.5) and (1.6); instead, we study the Fourier transform of the functions $f_i$. A different strategy would be to use Helffer-Sjöstrand calculus after studying a product of resolvents, i.e. of terms of the form $(z_i - P_i(Y^N, A^N))^{-1}$ for some polynomials $P_i$ and complex numbers $z_i$. But then the error term would depend on $\max_i |\Im z_i|^{-k}$, which translates into a bound in terms of $\max_i \|f_i\|_k$. However, as one can see in Equations (1.5) and (1.6), this dependence on $k$ is not optimal as soon as $k$ is larger than 4.
• The norm $\|\cdot\|_4$ is the one associated to the fourth Wiener space $W_4(\mathbb{R})$; we refer to [35], Section 3.2, for more information on the topic. In particular, if $f$ and its Fourier transform $\hat{f}$ are integrable, then $f$ satisfies Assumption (1.3) with the measure $d\mu(y) = \hat{f}(y)\, dy$. Assuming that $\|f\|_4$ is finite then implies that $f$ is four times differentiable. Thus, heuristically, $\|f\|_4$ is related to the fourth derivative of $f$.
• Note that in Theorem 1.1 one can choose the functions $(f_i)_{i \in [1,k]}$ to depend on $N$; this implies that the norm in the error term will also depend on $N$. This allows us, for example, to consider mesoscopic test functions of the form $f_i : t \mapsto g_i(N^a(t - E))$, where $E \in \mathbb{R}$, $a \in [0, 1/2)$ and the functions $g_i$ satisfy Assumptions (1.3) and (1.4). However, the error term will then be of order $N^{4a+\varepsilon-1} \max_i \|g_i\|_4$ (or $N^{2a+\varepsilon-1} \max_i \|g_i\|_2$ in the Gaussian case). Besides, the deterministic approximation will depend on $N$ in Equations (1.5) and (1.6).
• The estimates in (1.5) and (1.6) are optimal in terms of the powers of $N$, since in the case of GUE random matrices the difference between the left-hand side and the right-hand side of Equation (1.5), multiplied by $N$, converges in law towards a Gaussian random variable, hence it has to be of order 1 with respect to $N$. Indeed, thanks to Theorem 3.4 of [37], one can replace the trace on the right-hand side by the expectation of the left-hand side. Proving such a central limit theorem is then a well-known problem; see Theorem 7.6 of [21] for the case of polynomials. However, we believe that the optimal exponent of $y = 1 + \max_i |y_i|$ in Propositions 3.3 and 3.6 should be two, as is the case for the Gaussian ensembles; see also Theorem 2.6 in [9] for $k \leq 4$. In turn, we thus expect that the estimates in (1.5) and (1.6) hold for functions in the second Wiener space, i.e. with $\|f\|_2$ instead of $\|f\|_4$.
• Lastly, we remark that the Wigner matrices $Y^N$ can be both real symmetric or complex Hermitian, or a mix of them, in (1.5) and (1.6). Further, we do not need the matrix entries to have the same law, and the variances of the diagonal entries are only assumed to be of order $1/N$.
One can notably compare Theorem 1.1 with Theorem 2.6 of [9] and Corollary 2.7 of [10]. The biggest difference in our work is that we consider arbitrary polynomials $P_1, \dots, P_k$, whereas in [9] the authors considered monomials of degree 1, i.e. $P_i = X_1$ for a single Wigner matrix.
For such products, the key result used to prove Theorem 2.6 in [9] is a multi-resolvent local law (Theorem 3.4 of their paper). Let $G^N(z) := (Y^N - z)^{-1}$, $z \in \mathbb{C} \setminus \mathbb{R}$, denote the resolvent or Green function of a single Wigner matrix $Y^N$. The Green function $G^N(z)$ is well approximated by $m_{sc}(z) I_N$, with $m_{sc}(z)$ the Stieltjes transform of the semicircle law, in averaged sense [15,18,19] as well as in isotropic sense [7,28], for $|\Im z| \gg N^{-1}$, i.e. down to local scales slightly above the typical eigenvalue spacing. Theorem 3.4 in [9] and Theorem 2.5 in [10] give the deterministic approximation for multi-resolvent chains of the form $G^N(z_1) A^N_1 G^N(z_2) A^N_2 \cdots$, with optimal error terms and on local scales, where the $(A^N_i)$ are as in Theorem 1.1. The Helffer-Sjöstrand calculus then allows one to extend this local law to observables where the resolvents are replaced by functions $f_i(Y^N)$, in averaged and isotropic sense, yielding the results in Theorem 2.6 of [9]. Yet, local laws for a single Green function of a polynomial in Wigner matrices [2,16,20,33] are difficult to derive, partly due to hard-to-check stability conditions stemming from the linearization trick [22]. In particular, for our desired results the local laws would need to be established up to the spectral edges, which for general polynomials seems challenging, not to speak of multi-resolvent local laws.
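The single-resolvent approximation just mentioned is easy to visualize; the sketch below (one Wigner matrix, one spectral parameter, all sizes illustrative) compares the averaged trace of the resolvent with $m_{sc}(z)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 800
a = rng.standard_normal((n, n))
y = (a + a.T) / np.sqrt(2 * n)

z = 1.0 + 0.5j                                           # Im z of order one
g_avg = np.trace(np.linalg.inv(y - z * np.eye(n))) / n   # tr_N G^N(z)

# m_sc(z) = (-z + sqrt(z^2 - 4)) / 2; the principal branch is the correct one
# for this z (Re z > 0, Im z > 0); in general the branch must be chosen so
# that m_sc(z) ~ -1/z at infinity.
m_sc = (-z + np.sqrt(z * z - 4)) / 2
print(g_avg, m_sc)
```

Note that $m_{sc}$ solves the self-consistent equation $m^2 + zm + 1 = 0$, which is how the branch can be checked.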
We circumvent this difficulty by working with the Fourier transform instead of the Stieltjes transform. A main novelty is a set of concentration estimates for quantities of the form $e^{iP_1 y_1} R_1 \cdots e^{iP_k y_k} R_k$, with $(P_i)$ and $(R_i)$ polynomials in $Y^N$ and $A^N$, and $(y_i) \in \mathbb{R}^k$; see Propositions 3.3 and 3.6 below. These concentration results are obtained by a long-time continuous interpolation between Wigner matrices and the GUE, applied to the non-commutative setting using recursive moment estimates [23,31]; see [29,30] for long-time interpolations for Green functions of single Wigner matrices. This strategy does not rely on the linearization of non-commutative polynomials and hence avoids stability issues. The concentration estimates in Propositions 3.3 and 3.6 are then combined with results in [37] to replace GUE matrices with semicircular variables by interpolating them with the help of free stochastic calculus. This strategy of interpolating between random matrices and free variables was first developed in [12] before being refined to get a better estimate of the remainder term in [37]. Via the Fourier transform, our results in Theorem 1.1 then follow for functions in the fourth Wiener space.
Another difference to [9] is that our method focuses on algebraic rather than combinatorial aspects. In [9], the main results, including the multi-resolvent local law, are stated using advanced combinatorics, such as free cumulant functions, partial traces or the Kreweras complement. A key insight of [9] was to connect resolvent expansions to non-crossing partitions and free cumulants. In this paper, we rely instead on the algebraic formalism of free probability, e.g. free products of C*-algebras or conditional expectations, which allows us to formulate the convergence results in (1.5) and (1.6) in all generality, i.e. with the $P_i$ arbitrary polynomials in $Y^N$ and $A^N$. In Subsection 2.2, we outline how to pass from an algebraic to a combinatorial formulation in some special cases.
Motivations for Theorem 1.1 come from free probability and mathematical physics. As explained in [9], the RAGE theorem (see [13], Theorem 5.8) states that given a self-adjoint operator $H$ on an infinite dimensional Hilbert space, a vector state $\varphi$ in the continuous spectral subspace of $H$, and a compact operator $C$, the Heisenberg time evolution of $C$ vanishes on $\varphi$ when $t$ goes to infinity; more precisely, the Cesàro mean of $\langle \varphi, e^{itH} C e^{-itH} \varphi \rangle$ vanishes for large $t$. Thus, heuristically, the RAGE theorem describes the asymptotic behavior of the Heisenberg time evolution under some assumptions on $C$ and $\varphi$. For example, one may consider the operator $H = P(x_1, \dots, x_d)$, where $P$ is a polynomial and the $(x_i)$ are free semicircular variables. It is well known that the spectrum of a polynomial in independent Wigner matrices behaves similarly to that of the same polynomial evaluated in free semicircular variables; see for example Theorem 5.4.2 in [3]. Consequently, one expects that, up to an error which depends on $N$, the Heisenberg time evolution associated with the operator $H = P(Y^N_1, \dots, Y^N_d)$ will behave similarly. Indeed, from Theorem 1.1 we get that if $P$ is such a polynomial, then for $t$ and $N$ large enough, Equations (1.7) and (1.8) hold for any bounded sequences of deterministic matrices $(B^N)$ and $(C^N)$, as well as vectors $x, y \in \mathbb{C}^N$. That the quantities on the right-hand sides do not necessarily converge towards 0 comes from the fact that $(C^N)$ may not behave asymptotically as a compact operator; indeed, one could for example take for $(C^N)$ the identity matrices. Equations (1.7) and (1.8) indicate decay of correlations under the long-time Heisenberg evolution, a phenomenon referred to as thermalization in [9]. More generally, the authors showed in Corollary 2.12 of [9] that the long-time Heisenberg evolution associated to a Wigner matrix generates asymptotic freeness. The first link between free probability and random matrix theory was made by Voiculescu in the nineties. In [40] he showed that the trace of any polynomial in GUE random matrices converges towards the trace of the same polynomial evaluated in free semicircular operators. He introduced the notion of asymptotic freeness accordingly: a sequence of families of random matrices $(X^N_1, \dots, X^N_d)$ is asymptotically free if for every collection of polynomials $P_1, \dots, P_n \in \mathbb{C}[X]$ such that $\operatorname{tr}_N(P_j(X^N_{i_j}))$ converges towards 0, and such that $i_j \neq i_{j+1}$ for every $j$, the normalized trace of the product $P_1(X^N_{i_1}) \cdots P_n(X^N_{i_n})$ converges towards 0. In particular, if $(A^N)$ and $(B^N)$ are asymptotically free, then $\operatorname{tr}_N(A^N B^N) - \operatorname{tr}_N(A^N) \operatorname{tr}_N(B^N)$ converges towards 0. Consequently, Equation (1.8) is a corollary of the asymptotic freeness of $e^{itP(Y^N)} C^N e^{-itP(Y^N)}$ and $B^N$. Given two sequences of real numbers $(u_N)$ and $(v_N)$, we write $u_N \ll v_N$ when $u_N / v_N$ converges towards 0. Then, under the growth and separation assumptions of Theorem 1.2 on the times, almost surely the family of non-commutative random variables $(a^N_1, \dots, a^N_k)$ converges jointly in distribution towards $(a_1, \dots, a_k)$, where the $(a_i)$ are free. Moreover, if for every $i$, $Y^N_i$ is a GUE or GOE random matrix, then one can replace $N^{1/4}$ by $N^{1/2}$ in the previous assumptions. One can notably compare this theorem with Corollary 2.12 of [9], which studied the case of a single Wigner matrix, i.e. $d = 1$ and $P = X_1$ in the theorem above. Besides, the equivalents of Equations (1.5) and (1.6) are given in Theorem 2.6 of [9] (respectively Corollary 2.7 of [10]); there, the error term depends on the $k$-th Sobolev norm (respectively the $k/2$-th) of the functions $(f_i)$, where $k$ is the number of functions considered. In the case of the function $f_y : t \in \mathbb{R} \mapsto e^{iyt}$, the $k$-th Sobolev norm of $f_y$ is of order $y^k$, whereas if we use Theorem 1.1, we get that $\|f_y\|_4$ is of order $y^4$. While the former is better for smaller $k$, this dependence on $k$ requires one to assume in Corollary 2.12 of [9] that $1 \ll y^N_{i+1} - y^N_i \ll N^{1/k}$ for any $k$, which can be improved to $N^{2/k}$ thanks to Remark 2.8 of [10]. Consequently, in order to derive the simultaneous convergence of every moment (i.e. the convergence in distribution as defined in Definition 2.5) out of this result, one has to assume that the time differences go to infinity slower than any power of $N$.
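The thermalization phenomenon can be visualized numerically. The following sketch (single Wigner matrix, $P = X_1$, projections as test matrices; all parameters are illustrative choices, not the paper's) tracks $\operatorname{tr}_N(B e^{itY} C e^{-itY})$ at $t = 0$ and at a large time:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
a = rng.standard_normal((n, n))
y = (a + a.T) / np.sqrt(2 * n)                 # single Wigner matrix, P = X_1
evals, evecs = np.linalg.eigh(y)

b = np.diag((np.arange(n) < n // 2).astype(float))   # projection, tr_N b = 1/2
c = b

def heisenberg_overlap(t):
    """tr_N(b e^{itY} c e^{-itY}), computed in the eigenbasis of Y."""
    u = evecs @ np.diag(np.exp(1j * t * evals)) @ evecs.conj().T
    return (np.trace(b @ u @ c @ u.conj().T) / n).real

v0, v_late = heisenberg_overlap(0.0), heisenberg_overlap(25.0)
print(v0, v_late)
```

At $t = 0$ the overlap equals $\operatorname{tr}_N(BC) = 1/2$, while for large $t$ it settles near $\operatorname{tr}_N(B) \operatorname{tr}_N(C) = 1/4$, the value predicted by asymptotic freeness.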
One can also wonder about the thermalization decay rates in Theorem 1.2. Indeed, given a polynomial $P$ as defined in this theorem, the convergence rates depend on the behavior of the Fourier transform of the limiting spectral measure of $P$ evaluated in free semicircular variables. For example, in the case of a single Wigner matrix, the limiting spectral measure is the semicircle law, which has square-root decay at the spectral edges; this yields the exponent $\delta_P = 3/2$, as in Corollary 2.10 of [9]. For general $P$, thanks to Theorem 1.1(5-6) of [39], one can show that there exists a constant $\delta_P > 0$ such that for any polynomial $Q$ and any $\varepsilon > 0$, the corresponding decay estimate holds with high probability. This estimate could be further improved depending on the polynomial $Q$, as in Corollary 2.11 of [9]. However, for a general polynomial $P$ it is hard to compute the corresponding exponent $\delta_P$; thus we use the Riemann-Lebesgue lemma in the last step of the proof of Theorem 1.2. This is why, unlike Corollaries 2.9, 2.10 and 2.11 in [9], we do not give precise thermalization decay rates. Finally, note that although it does not yield an exact formula, it is possible to use the algorithm of Theorem 4.1 in [5] to approximate the spectral distribution of a given polynomial in semicircular variables. We conclude this section by summarizing the organization of the paper. In Section 2, we first recall definitions from random matrix theory and free probability theory. Then, in Subsection 2.3, we introduce the formalism suited to handle polynomials in non-commutative random variables and their derivatives. In Subsection 2.4, we recall the Schwinger-Dyson equations for Gaussian Wigner matrices as well as the cumulant expansions, which can be viewed as a generalization of the Schwinger-Dyson equations to Wigner matrices. In Section 3, we derive key concentration estimates for the trace of products of exponentials of polynomials in Wigner matrices. This is accomplished by a continuous interpolation between Wigner matrices and GUE matrices in combination with a recursive moment estimate. In Section 4, we then combine the concentration estimates from Section 3 with the main results from [37] to prove our main results, Theorem 1.1 and Theorem 1.2.
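In the single-matrix case, the exponent $\delta_P = 3/2$ can be checked against the classical closed form for the Fourier transform of the semicircle law, $\int e^{ity}\, d\sigma_{sc}(y) = J_1(2t)/t$, which decays like $t^{-3/2}$. The sketch below (illustrative grid sizes) compares this formula with direct numerical integration:

```python
import numpy as np
from scipy.special import j1

def semicircle_fourier(t, m=400001):
    """Numerical integral of e^{ity} against the semicircle density on [-2, 2]."""
    y = np.linspace(-2.0, 2.0, m)
    dens = np.sqrt(np.maximum(4 - y ** 2, 0.0)) / (2 * np.pi)
    return np.sum(np.exp(1j * t * y) * dens) * (y[1] - y[0])

for t in [1.0, 5.0, 20.0]:
    closed_form = j1(2 * t) / t          # classical formula, Bessel function J_1
    print(t, closed_form, semicircle_fourier(t).real)
```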
2 Framework and standard properties

2.1 Definitions from Random Matrix Theory
In this subsection we define the main object of our study, the Wigner matrix. We make the assumption that the matrix entries are independent and have finite moments of every order; however, we do not need to assume that they have the same law.

Definition 2.1. We say that a square random matrix $Y$ of size $N$ is a Wigner matrix if it is a Hermitian or symmetric matrix whose entries are independent up to the Hermitian symmetry and such that for any $i, j$, $E[Y_{i,j}] = 0$, $E[|Y_{i,j}|^2] = 1/N$ for $i \neq j$, and for every $p \geq 2$ the moments $E[|\sqrt{N} Y_{i,j}|^p]$ are bounded uniformly in $N$, $i$ and $j$.

The assumption that the large moments are finite is natural, as we are working with polynomials. Since in Theorem 1.1 we give equations which hold true with high probability rather than estimates on the expectation, there might be a way around this assumption by using the truncation method (see Section 2 of [1]), but this would reflect in the error terms of Equations (1.5) and (1.6).
Besides, we use two specific types of Wigner matrices whose entries are all Gaussian: the Gaussian Unitary Ensemble (GUE) and the Gaussian Orthogonal Ensemble (GOE).

Definition 2.2. A GUE random matrix $X^N$ of size $N$ is a Hermitian matrix whose entries are random variables with the following laws:
• For $1 \leq i \leq N$, the random variables $\sqrt{N} X^N_{i,i}$ are independent centered Gaussian random variables of variance 1.
• For $1 \leq i < j \leq N$, the random variables $\sqrt{2N}\, \Re(X^N_{i,j})$ and $\sqrt{2N}\, \Im(X^N_{i,j})$ are independent centered Gaussian random variables of variance 1, independent of the $(X^N_{i,i})_i$.

Definition 2.3. A GOE random matrix $X^N$ of size $N$ is a symmetric matrix whose entries are random variables with the following laws:
• For $1 \leq i \leq N$, the random variables $\sqrt{N/2}\, X^N_{i,i}$ are independent centered Gaussian random variables of variance 1.
• For $1 \leq i < j \leq N$, the random variables $\sqrt{N} X^N_{i,j}$ are independent centered Gaussian random variables of variance 1, independent of the $(X^N_{i,i})_i$.

Finally, we conclude this subsection by defining a notation that we will use regularly in the rest of the paper.

Definition 2.4. Given a sequence of random variables $(X_N, Y_N)_{N \geq 1}$, as well as a sequence $(\varepsilon_N)_{N \geq 1}$ of non-negative real numbers, we say that $X_N = Y_N + O(\varepsilon_N)$ with high probability if for any $k > 0$ there exists a numerical constant $C$ such that $P(|X_N - Y_N| \geq C \varepsilon_N) \leq N^{-k}$ holds for any $N$ sufficiently large.
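For concreteness, the two ensembles can be sampled as follows. This is a sketch under one common normalization, with off-diagonal variance $1/N$ (matching the Wigner convention above) and GOE diagonal variance $2/N$; the sampler itself is an illustration, not taken from the paper.

```python
import numpy as np

def gue(n, rng):
    h = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (h + h.conj().T) / np.sqrt(2 * n)   # Hermitian, E|X_ij|^2 = 1/n

def goe(n, rng):
    h = rng.standard_normal((n, n))
    return (h + h.T) / np.sqrt(2 * n)          # symmetric, E|X_ij|^2 = 1/n off-diagonal

rng = np.random.default_rng(4)
n = 1000
x_gue, x_goe = gue(n, rng), goe(n, rng)

off = x_gue[np.triu_indices(n, k=1)]
print(np.mean(np.abs(off) ** 2) * n)           # approx 1: off-diagonal variance 1/n
print(np.var(np.diag(x_goe)) * n)              # approx 2: GOE diagonal variance 2/n
```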

2.2 Definitions from Free Probability Theory
In order to be self-contained, we begin by recalling the following definitions from free probability.Definition 2.5.
• A C*-probability space $(\mathcal{A}, *, \tau, \|\cdot\|)$ is a unital C*-algebra $(\mathcal{A}, *, \|\cdot\|)$ endowed with a state $\tau$, i.e. a linear map $\tau : \mathcal{A} \to \mathbb{C}$ satisfying $\tau(1_{\mathcal{A}}) = 1$ and $\tau(a^* a) \geq 0$ for all $a \in \mathcal{A}$. In this paper we always assume that $\tau$ is a trace, i.e. that it satisfies $\tau(ab) = \tau(ba)$ for any $a, b \in \mathcal{A}$. An element of $\mathcal{A}$ is called a non-commutative random variable. We will always work with a faithful trace, namely one such that, for $a \in \mathcal{A}$, $\tau(a^* a) = 0$ if and only if $a = 0$.
• Let $\mathcal{A}_1, \dots, \mathcal{A}_n$ be *-subalgebras of $\mathcal{A}$ having the same unit as $\mathcal{A}$. They are said to be free if for all $k$, for all $a_i \in \mathcal{A}_{j_i}$ with $j_1 \neq j_2, j_2 \neq j_3, \dots, j_{k-1} \neq j_k$ and $\tau(a_i) = 0$ for every $i$, one has $\tau(a_1 \cdots a_k) = 0$. Families of non-commutative random variables are said to be free if the *-subalgebras they generate are free.
• Let $A = (a_1, \dots, a_k)$ be a $k$-tuple of non-commutative random variables. The joint distribution of the family $A$ is the linear form $\mu_A : P \mapsto \tau[P(A, A^*)]$ on the set of polynomials in $2k$ non-commutative variables. By convergence in distribution for a sequence of families of variables $(A_N)_{N \geq 1}$, we mean the pointwise convergence of the maps $\mu_{A_N} : P \mapsto \tau_N[P(A_N, A_N^*)]$.
• A family of non-commutative random variables $x = (x_1, \dots, x_d)$ is called a free semicircular system if the non-commutative random variables are free, self-adjoint ($x_i = x_i^*$), and for all $k \in \mathbb{N}$ and $i \in [1, d]$ one has
$$\tau(x_i^k) = \int_{-2}^{2} t^k\, \frac{\sqrt{4 - t^2}}{2\pi}\, dt.$$

It is important to note that, thanks to [34, Theorem 7.9], which we recall next, one can consider free copies of any non-commutative random variable.
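The even moments of a standard semicircular variable are the Catalan numbers, $\tau(x_i^{2k}) = \frac{1}{k+1}\binom{2k}{k}$, and the odd moments vanish; a short numerical check (the grid size is an illustrative choice):

```python
import numpy as np
from math import comb

# Integrate t^{2k} against the semicircle density sqrt(4 - t^2) / (2 pi).
y = np.linspace(-2.0, 2.0, 400001)
dens = np.sqrt(np.maximum(4 - y ** 2, 0.0)) / (2 * np.pi)
dy = y[1] - y[0]

moments = [float(np.sum(y ** (2 * k) * dens) * dy) for k in range(5)]
catalans = [comb(2 * k, k) // (k + 1) for k in range(5)]
print(moments)   # approx [1, 1, 2, 5, 14]
print(catalans)  # [1, 1, 2, 5, 14]
```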
Theorem 2.6. Let $(\mathcal{A}_i, \varphi_i)_{i \in I}$ be a family of C*-probability spaces such that the functionals $\varphi_i : \mathcal{A}_i \to \mathbb{C}$, $i \in I$, are faithful traces. Then there exist a C*-probability space $(\mathcal{A}, \varphi)$, with $\varphi$ a faithful trace, and a family of norm-preserving unital *-homomorphisms $W_i : \mathcal{A}_i \to \mathcal{A}$, $i \in I$, such that:
• $\varphi \circ W_i = \varphi_i$ for every $i \in I$.
• The unital C*-subalgebras $W_i(\mathcal{A}_i)$, $i \in I$, form a free family in $(\mathcal{A}, \varphi)$.
We will usually denote A by * i∈I A i or simply A 1 * A 2 when I only has two elements.
Let us finally fix a few notations concerning the spaces and traces that we use in this paper.Definition 2.7.
• M N (C) is the set of N × N matrices with coefficients in C.
• $(\mathcal{A}_N, \tau_N)$ is the free product $\mathbb{M}_N(\mathbb{C}) * \mathcal{C}_d$ of $\mathbb{M}_N(\mathbb{C})$ with the C*-algebra $\mathcal{C}_d$ generated by a free semicircular system $x = (x_1, \dots, x_d)$, that is, the C*-probability space built in Theorem 2.6. Note that when restricted to $\mathbb{M}_N(\mathbb{C})$, $\tau_N$ is just the regular normalized trace on matrices; in this case we denote it by $\operatorname{tr}_N$. The restriction of $\tau_N$ to $\mathcal{C}_d$ is denoted by $\tau$. Note that one can view this space as the limit of a matrix space; we refer to Proposition 3.5 of [12].
• Tr N is the non-normalized trace on M N (C), while tr N is the normalized one.
• We denote by $E_{r,s} \in \mathbb{M}_N(\mathbb{C})$ the matrix whose entries are all equal to 0, except the $(r, s)$ entry, which is equal to 1.
In order to interpret Theorem 1.1, we need to define the conditional expectation first.
Definition 2.8. Let $\mathcal{A}$ be a C*-algebra and $\mathcal{B}$ be a unital C*-subalgebra of $\mathcal{A}$; then a linear map $E : \mathcal{A} \to \mathcal{B}$ is said to be a conditional expectation if it fixes $\mathcal{B}$ and is $\mathcal{B}$-bimodular, i.e. $E(b) = b$ and $E(b_1 a b_2) = b_1 E(a) b_2$ for all $b, b_1, b_2 \in \mathcal{B}$ and $a \in \mathcal{A}$. It is well known that we then have the property stated in Proposition 2.9. With those definitions and properties, one can deduce the following remarks, which give us some understanding of Equation (1.6).
• While the C*-algebra $\mathcal{A}_N$ built in Definition 2.7 is not necessarily a von Neumann algebra, one can always replace it by its enveloping von Neumann algebra. Hence, since $\mathbb{M}_N(\mathbb{C})$ is a von Neumann algebra, there exists a unique conditional expectation $E_{\mathbb{M}_N(\mathbb{C})}$ from $\mathcal{A}_N$ to $\mathbb{M}_N(\mathbb{C})$. That being said, it is not really necessary to understand the methods used to define the conditional expectation in order to read this paper. The rest of this remark provides more details on how to estimate this quantity in most cases.
• Thanks to Proposition 2.9, we can deduce several properties of the conditional expectation in Equation (1.6); first and foremost, one has the bound $\|E_{\mathbb{M}_N(\mathbb{C})}[a]\| \leq \|a\|$. Besides, if $f_i = \mathrm{id}_{\mathbb{R}}$, then $f_i(P_i(x, A^N)) = P_i(x, A^N)$, and otherwise the norm of $f_i(P_i(x, A^N))$ is bounded by the total variation of $\mu_i$. Consequently, as long as our functions do not depend on $N$, the first order term in Equation (1.6) is also independent of $N$, except with respect to the norms of the matrices $A^N$ and of the vectors $x, y$, which is to be expected.
• It is also worth noting that it is not necessary to compute the conditional expectation exactly to determine the first order term of Equation (1.6). However, the formulation with the conditional expectation makes it clear that this term is of order one.
• It is possible to use the advanced combinatorics associated with free probability to compute the leading terms on the right-hand sides of (1.5) and (1.6), especially when, for every $i$ such that $f_i \neq \mathrm{id}_{\mathbb{R}}$, $P_i$ is a polynomial in $x$ rather than in $(x, A^N)$; see Section 2 of [9]. For example, with $s$ a single semicircular variable, the leading term can be written as a sum, over non-crossing partitions $\pi$, of products of a quantity that depends only on the matrices $A^N_1, \dots, A^N_k$ and the Kreweras complement $K(\pi)$ of $\pi$, with a free cumulant function $\kappa^{sc}[\,\cdot\,]$ that depends on the functions $f_1, \dots, f_k$ and the semicircular distribution. We refer to [34] for an introduction to the combinatorics of free probability.
• If one does not have any matrices $A^N$, then, since $f_i(P_i(x))$ belongs to $\mathcal{C}_d$, which is free from $\mathbb{M}_N(\mathbb{C})$, the conditional expectation reduces to $E_{\mathbb{M}_N(\mathbb{C})}[f_i(P_i(x))] = \tau(f_i(P_i(x)))\, I_N$. Hence, the first order term of Equation (1.6) can be computed with the trace $\tau$ alone.

2.3 Noncommutative polynomials and derivatives
Let $\mathcal{A}_{d,2r} = \mathbb{C}\langle X_1, \dots, X_d, Z_1, \dots, Z_{2r} \rangle$ be the set of non-commutative polynomials in $d + 2r$ variables. We set $q = 2r$ to simplify notations. We denote by $\deg M$ the total degree of a monomial $M$ (that is, the sum of its degrees in each letter $X_1, \dots, X_d, Z_1, \dots, Z_{2r}$). Let us now define several maps that are frequently used in operator algebra. First, for $A, B, C \in \mathcal{A}_{d,q}$, let
$$(A \otimes B)\, \#\, C = ACB, \qquad m(A \otimes B) = BA. \qquad (2.3)$$
We define an involution $*$ on the generators by $X_i^* = X_i$ and $Z_i^* = Z_{i+r}$ for $i \leq r$, and then we extend it to $\mathcal{A}_{d,q}$ antilinearly with the formula $(\alpha P Q)^* = \overline{\alpha}\, Q^* P^*$. $P \in \mathcal{A}_{d,q}$ is said to be self-adjoint if $P^* = P$. Self-adjoint polynomials have the property that if $x_1, \dots, x_d, z_1, \dots, z_r$ are elements of a C*-algebra such that $x_1, \dots, x_d$ are self-adjoint, then so is $P(x_1, \dots, x_d, z_1, \dots, z_r, z_1^*, \dots, z_r^*)$.

Definition 2.11. We define the non-commutative derivative $\partial_i$ with respect to $X_i$ as follows. First, on a monomial $M \in \mathcal{A}_{d,q}$, one sets
$$\partial_i M = \sum_{M = A X_i B} A \otimes B,$$
and then one extends it by linearity to all polynomials. We can also define $\partial_i$ by induction with the formulas
$$\partial_i (PQ) = \partial_i P \cdot (1 \otimes Q) + (P \otimes 1) \cdot \partial_i Q, \qquad \partial_i X_j = \delta_{i,j}\, 1 \otimes 1.$$
Similarly, with $m$ as in (2.3), one defines the cyclic derivative $D_i : \mathcal{A}_{d,q} \to \mathcal{A}_{d,q}$ for $P \in \mathcal{A}_{d,q}$ by $D_i P = m \circ \partial_i P$.

In this paper we need to work not only with polynomials but also with more general functions; since we will work with the Fourier transform, we introduce the following space.

Definition 2.12. We set $S = \{R \in \mathcal{A}_{d,q} \mid R^* = R\}$; then we denote by $\mathcal{F}_{d,q}$ the set of polynomials in $X_1, \dots, X_d, Z_1, \dots, Z_{2r}$ as well as in a family of variables $(E_R)_{R \in S}$ indexed by $S$. Then, given $y = (x_1, \dots, x_d, z_1, \dots, z_r, z_1^*, \dots, z_r^*)$ elements of a C*-algebra, one can define by induction the evaluation of an element of $\mathcal{F}_{d,q}$ in $y$ by following the rule
• $\forall R \in S$, $E_R(y) = e^{iR(y)}$.

One can extend the involution $*$ from $\mathcal{A}_{d,q}$ to $\mathcal{F}_{d,q}$ by setting $(E_R)^* = E_{(-R)}$, and then again we have that if $Q \in \mathcal{F}_{d,q}$ is self-adjoint, then so is $Q(y)$. Finally, in order to make notations more transparent, we will usually write $e^{iR}$ instead of $E_R$.
Note that, for technical reasons which are explained in Remark 2.10 of [37], one cannot view $\mathcal{F}_{d,q}$ as a subalgebra of the set of formal power series in $X_1, \dots, X_d, Z_1, \dots, Z_{2r}$; hence the need to introduce the notation $E_R$. Now, as we will see in Proposition 2.15, a natural way to extend the definition of $\partial_i$ (and $D_i$) to $\mathcal{F}_{d,q}$ is by setting
$$\partial_i\, e^{iR} = i \int_0^1 \left(e^{i\alpha R} \otimes 1\right) \partial_i R \left(1 \otimes e^{i(1-\alpha)R}\right) d\alpha.$$
However, we cannot define the integral properly on $\mathcal{F}_{d,q} \otimes \mathcal{F}_{d,q}$; but after evaluating our polynomials in a matrix space, this is not a problem anymore. Indeed, a tensor product of matrix spaces is also a matrix space, and hence the integral of continuous functions on this space is well-defined. Thus we define the non-commutative differential on $\mathcal{F}_{d,q}$ as follows.
Definition 2.13. For $\alpha \in [0, 1]$, we define $\partial_{\alpha,i} : \mathcal{F}_{d,q} \to \mathcal{F}_{d,q} \otimes \mathcal{F}_{d,q}$ as the map which satisfies (2.4) and is such that, for any $R \in \mathcal{A}_{d,q}$ self-adjoint,
$$\partial_{\alpha,i}\, e^{iR} = i \left(e^{i\alpha R} \otimes 1\right) \partial_i R \left(1 \otimes e^{i(1-\alpha)R}\right).$$
Then, given $y = (y_1, \dots, y_{d+q})$ elements of $\mathbb{M}_N(\mathbb{C})$, we define $\partial_i Q(y) = \int_0^1 \partial_{\alpha,i} Q(y)\, d\alpha$ for any $Q \in \mathcal{F}_{d,q}$. Note that for any $P \in \mathcal{A}_{d,q}$, since $\int_0^1 1\, d\alpha = 1$, this agrees with $\partial_i Q$ defined as in Definition 2.11. Thus Definition 2.13 indeed extends the definition of $\partial_i$ from $\mathcal{A}_{d,q}$ to $\mathcal{F}_{d,q}$. Besides, it also means that we can rigorously define the composition of non-commutative differentials. Since the map $\partial_{\alpha,i}$ goes from $\mathcal{F}_{d,q}$ to $\mathcal{F}_{d,q} \otimes \mathcal{F}_{d,q}$, it is very easy to do so. For example, one defines the following operator, which we use later on.

Definition 2.14. Let $Q \in \mathcal{F}_{d,q}$; given $y = (y_1, \dots, y_{d+q})$ elements of a C*-algebra, $i \in [1, d]$, and with $\circ$ denoting the composition of operators, we define the associated iterated differential evaluated in $y$.
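The purely polynomial part of these derivatives (Definition 2.11, without the exponential variables $E_R$) is easy to implement symbolically; the toy encoding below, with monomials as tuples of letter names, is only an illustration:

```python
# d_i M = sum over decompositions M = A X_i B of A (tensor) B, represented as
# pairs of words; the cyclic derivative applies m(A tensor B) = BA to each pair.
def nc_derivative(monomial, letter):
    return [(monomial[:k], monomial[k + 1:])
            for k, l in enumerate(monomial) if l == letter]

def cyclic_derivative(monomial, letter):
    return [b + a for a, b in nc_derivative(monomial, letter)]

m = ("X1", "X2", "X1")                        # the monomial X1 X2 X1
print(nc_derivative(m, "X1"))
print(cyclic_derivative(m, "X1"))
```

The first output lists the two decompositions $X_1 X_2 X_1 = A X_1 B$, namely $1 \otimes X_2 X_1$ and $X_1 X_2 \otimes 1$; the second applies $m(A \otimes B) = BA$ to each of them.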

2.4 The Schwinger-Dyson equations and their generalization
A tool that we use repeatedly with Gaussian random matrices is the so-called Schwinger-Dyson equation. It is a consequence of Gaussian integration by parts, which can be summarized in the following formula: if $Z$ is a centered Gaussian random variable with variance one and $f$ a $C^1$ function, then
$$E[Z f(Z)] = E[f'(Z)].$$
From there, we deduce the Schwinger-Dyson equations in the following proposition. For more information about these equations and their applications, we refer to [3], Lemma 5.4.7.
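A quick Monte Carlo check of the integration by parts formula, with the illustrative choice $f = \sin$ (the sample size and seed are arbitrary):

```python
import numpy as np

# E[Z f(Z)] = E[f'(Z)] for Z ~ N(0, 1); here f = sin, f' = cos, and both sides
# equal E[cos Z] = exp(-1/2).
rng = np.random.default_rng(5)
z = rng.standard_normal(2_000_000)
lhs = np.mean(z * np.sin(z))
rhs = np.mean(np.cos(z))
print(lhs, rhs)  # both approx exp(-1/2) = 0.6065...
```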
Proposition 2.15. Let $X^N$ be a GOE random matrix of size $N$, $A^N$ deterministic matrices, and $Q \in \mathcal{F}_{1,q}$; then the corresponding Schwinger-Dyson equation holds, where $h$ is the linear map such that $h(A \otimes B) = A^T B$, with $A^T$ the transpose of $A$.
Proof. Let us first assume that $Q \in \mathcal{A}_{1,q}$. One can write $X^N = \frac{1}{\sqrt{N}} (x_{r,s})_{1 \leq r,s \leq N}$ and apply Gaussian integration by parts entrywise, where notably we used that for any matrices $A, B \in \mathbb{M}_N(\mathbb{C})$, $\sum_{1 \leq i,j \leq N} A_{i,j} B_{i,j} = \operatorname{Tr}_N(A^T B)$.
If $Q \in \mathcal{F}_{1,q}$, then the proof is pretty much the same, but we need to use Duhamel's formula (for a very similar proof see [38], Proposition 2.2), which states that for any matrices $A$ and $B$,
$$e^A - e^B = \int_0^1 e^{\alpha A} (A - B)\, e^{(1-\alpha)B}\, d\alpha.$$
This allows us to prove the analogous integration by parts for exponentials of self-adjoint polynomials, and the conclusion follows.
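Duhamel's formula, $e^A - e^B = \int_0^1 e^{\alpha A}(A - B)\, e^{(1-\alpha)B}\, d\alpha$, can be verified numerically; the sketch below uses small random matrices and a midpoint rule (all sizes are illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
a_m = 0.5 * rng.standard_normal((5, 5))
b_m = 0.5 * rng.standard_normal((5, 5))

# Midpoint-rule approximation of the integral over [0, 1].
m = 500
ts = (np.arange(m) + 0.5) / m
integral = sum(expm(t * a_m) @ (a_m - b_m) @ expm((1 - t) * b_m) for t in ts) / m

err = np.linalg.norm(expm(a_m) - expm(b_m) - integral)
print(err)  # small: only the midpoint-rule discretization error remains
```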
In the case of GUE random matrices, we have an even shorter formula. The following proposition is actually Proposition 2.23 of [37], whose proof is quite similar to the one of Proposition 2.15.
Proposition 2.16. Let $X^N$ be a GUE random matrix of size $N$; then Equation (2.9) holds.

For studying Wigner matrices, there exists a more general formula, the cumulant expansion. Its usefulness in random matrix theory was recognized in [27] and it has been widely used since, e.g. [8,17,23,24,25,31,36]. We will use a specific version with an explicit expression for the remainder. To do so, we follow the proof of Proposition 3.1 of [36], but we do not upper bound the remainder.

Proposition 2.17. Let $u, v$ be real random variables such that $E[|u|^{\ell+2}] < \infty$ and $E[|v|^{\ell+2}] < \infty$ for some natural number $\ell$. Let $(\kappa_{n,m})_{n,m \in \mathbb{N}}$ be the joint cumulants of $u$ and $v$, i.e. the numbers that satisfy the cumulant-generating identity. Then, for any function $\Phi : \mathbb{R}^2 \to \mathbb{C}$ of class $C^{\ell+1}$, the cumulant expansion (2.10) holds with an explicitly defined remainder.

Proof. Thanks to the defining identity of the cumulants, for any polynomial $P$ of degree smaller than $\ell$, the expansion holds exactly. Then, thanks to Taylor's theorem, there exists a polynomial $\pi_\ell$ of degree at most $\ell$ approximating $\Phi$, and thanks to Equation (2.11) one controls the resulting difference.

Hence, by induction we get that
Note that we used that $\Phi$ was of class $\mathcal{C}^{\ell+2}$ in this computation, but by a density argument it is eventually enough for $\Phi$ to be of class $\mathcal{C}^{\ell+1}$. Thus, in conclusion, this proves Equation (2.10).
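As a sanity check on the proposition, here is a one-variable analogue (our own Python illustration; the variable, test function and cumulant values are hypothetical choices): for a Rademacher variable $u$ taking values $\pm 1$, whose cumulants are $\kappa_1 = 0$, $\kappa_2 = 1$, $\kappa_3 = 0$, $\kappa_4 = -2$, the expansion with $\ell = 3$ is exact on the polynomial $\Phi(u) = u^3$.

```python
import numpy as np
from math import factorial

# One-variable analogue of the cumulant expansion:
# E[u Phi(u)] = sum_{n <= l} kappa_{n+1}/n! * E[Phi^{(n)}(u)],
# which is exact when Phi is a polynomial of degree <= l.
u = np.array([-1.0, 1.0])              # Rademacher law: E[.] = average over {-1, +1}
kappa = [0.0, 1.0, 0.0, -2.0]          # kappa_1, ..., kappa_4 of a Rademacher variable
phi = [
    lambda x: x**3,                    # Phi(x) = x^3
    lambda x: 3 * x**2,                # Phi'
    lambda x: 6 * x,                   # Phi''
    lambda x: 6 + 0 * x,               # Phi'''
]

lhs = np.mean(u * phi[0](u))           # E[u Phi(u)] = E[u^4] = 1
rhs = sum(kappa[n] / factorial(n) * np.mean(phi[n](u)) for n in range(4))
print(lhs, rhs)  # 1.0 1.0
```

The non-Gaussian fourth cumulant $\kappa_4 = -2$ contributes $-2$ to the right-hand side, exactly compensating the Gaussian-like term $\kappa_2\,\mathbb{E}[\Phi'(u)] = 3$.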
3 Reduction of the problem to the case of GUE matrices

Concentration of the trace
In order to prove Theorem 1.1, the first step is to reduce the proof to the case of GUE random matrices. To do so, we prove two concentration results: first for the trace in Proposition 3.3, then for the scalar product in Proposition 3.6. Before giving those propositions, we state the following lemmas, which will be useful for the proofs.
For $a, b \ge 0$ and conjugate exponents $p, q$, we have Young's inequality $ab \le \frac{a^p}{p} + \frac{b^q}{q}$. Consequently, with $p = \frac{n}{n-g}$, $q = \frac{n}{g}$, we have that

and the claim follows.

Lemma 3.2. Given
• $Y^N$ a $d$-tuple of independent Wigner matrices as in Definition 2.1,

Besides, if one writes each $Q_j$ as a linear combination of terms of the form $e^{iP_1} R_1 \cdots e^{iP_k} R_k$, then the upper bound in the equation above does not depend on the (self-adjoint) polynomials $P_1, \dots, P_k$.
Proof. To begin with, thanks to Hölder's inequality, one can always assume that $l = 1$. Besides, one has for any $p \ge 1$,

Consequently, one can assume that $c$ is an even integer, in which case we have that

Hence, it is sufficient to prove that for a given

By linearity, one can also assume that there exist non-commutative polynomials $P_1, \dots, P_k, R_1, \dots, R_k \in \mathcal{A}_{d,q}$ such that $P_1, \dots, P_k$ are self-adjoint and

Then, thanks once again to Hölder's inequality, one has that
Let us now remark that given a self-adjoint matrix $T$, the matrix $e^{iT}$ is unitary since $(e^{iT})^* e^{iT} = e^{-iT} e^{iT} = I_N$, hence $\|e^{iT}\| = 1$. Thus
Thanks to Theorem 5.4.5 of [3], such a quantity is uniformly bounded with respect to $N$ (since we assumed that the norms of our deterministic matrices are uniformly bounded with respect to $N$).
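The norm computation above rests on $e^{iT}$ being unitary for self-adjoint $T$. A quick numerical illustration (our own Python sketch, forming the exponential via the spectral theorem; the matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
T = (M + M.T) / 2                        # a real symmetric (self-adjoint) matrix

# e^{iT} via the spectral theorem: T = V diag(lam) V^T.
lam, V = np.linalg.eigh(T)
U = (V * np.exp(1j * lam)) @ V.conj().T  # U = e^{iT}

# U is unitary, hence has operator norm 1.
print(np.linalg.norm(U, 2))                        # ~ 1.0
print(np.max(np.abs(U.conj().T @ U - np.eye(6))))  # ~ 0.0
```

All singular values of $U$ equal $1$, which is exactly the fact $\|e^{iT}\| = 1$ used in the proof.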
We can now state our concentration estimate for the trace.

Proposition 3.3. Given
• $Y^N$ a $d$-tuple of independent Wigner matrices as in Definition 2.1,
• $X^N$ a $d$-tuple of independent GUE random matrices, independent of $Y^N$.
Let $y_i \in \mathbb{R}$, $y = 1 + \max_i |y_i|$, and $P_1, \dots, P_k, R_1, \dots, R_k \in \mathcal{A}_{d,q}$ non-commutative polynomials. If we assume that $P_1, \dots, P_k$ are self-adjoint, then, with $Q = e^{iP_1 y_1} R_1 \cdots e^{iP_k y_k} R_k$, we have for any $\varepsilon > 0$ that, with high probability,

Besides, if $Y^N$ is a $d$-tuple of independent GOE or GUE matrices, then one can replace $y^4$ by $y^2$ in the previous estimate.
In order to make the proof easier to read, we divide it into five steps. The first one establishes an equation on the moments of $M_N$ (defined in Equation (3.5) below). Then, in each of the following three steps, we bound a specific error term. In the last step, we study the case of the GUE and the GOE.
Proof of Proposition 3.3. Step 1 (Using the cumulant expansion): Our first step is to define

Then we want to study the moments of $M_N$ in order to use Markov's inequality. To do so, we are going to prove that for any $n \in \mathbb{N}$ and any $\varepsilon > 0$, there exists a constant $C_n$ such that for $N$ large enough,

Let us now set, for $t \in \mathbb{R}_+$,

so that we have the following equality,

where we used the Schwinger-Dyson equations, i.e. Proposition 2.16, to get the last line. Note that from (3.7), the non-commutative differential of $X^N_t$ with respect to $X^N$ is $(1 - e^{-t})^{1/2}\, 1 \otimes 1$, which is why the factor $(1 - e^{-t})^{-1/2}$ disappears from the second to the third line above. Since

we have that

Next we want to use the cumulant expansion of Proposition 2.17 to the third order ($\ell = 2$). Here $\Phi_{i,j}$ is chosen to reproduce the first term in the integrand on the right-hand side of (3.8), i.e.
We now compute the first derivatives of $\Phi_{i,j}$. If $i \neq j$, we have that

and

Further, if $i = j$, then $\partial_u \Phi_{i,j}(u, v)$ is obtained similarly but with $E_{i,i}$ instead of $E_{i,j} + E_{j,i}$, and

Besides, with $u$ and $v$ defined as such, their cumulants satisfy

In particular, as we set in Definition 2.1, if $i = j$, $\kappa_{2,0} + \kappa_{0,2} = 1/N$. Thus, by using the cumulant expansion of Proposition 2.17 with $\ell = 2$, we get that

where $H^{N,n}_{i,j}$ is the term coming from the second-order derivatives (i.e.
and $R^{N,n}_{i,j}$ is the remainder $\varepsilon_3$; consequently, this term corresponds to the third-order derivatives of $\Phi_{i,j}$.
Step 2 (Bounding the first order error term): In this step, we focus on upper bounding the terms in Equations (3.14c) to (3.14g); to do so, we use the following inequality: given two matrices $A$ and $B$ of size $N$,

Consequently, thanks to the moment assumption in (2.1), after summing over $i$ and $j$, one can bound the terms in (3.14c), (3.14d), (3.14f) and (3.14g) by

for some $B_1, B_2 \in \mathcal{F}_{2d,q}$ and $Z^N = (X^N, Y^N, A^N)$. Indeed, for example, if one looks separately at (3.14f),

which is finite thanks to the moment assumption in (2.1). Then, if we write $D_s Q = \sum_l c_l M_l$, where $c_l \in \mathbb{C}$ and the $M_l$ are unitary monomials in $X^N_t$ and $e^{iR(X^N_t)}$ (for any self-adjoint $R$), then thanks to Equation (2.5), $d_l = \sup_{y_1, \dots, y_k \in \mathbb{R}} c_l / y$ is finite and

for some constant $K$. Hence we can find

Similarly, after summing over $i$ and $j$, one can bound the terms in (3.14e) by

Then, by using Lemma 3.1,
Thus, thanks to Lemma 3.2, there exists a constant $C_n$ such that the terms in Equations (3.16) and (3.17) can be upper bounded by

Consequently, after summing over $i, j$, Equation (3.14) yields

Step 3 (Bounding the second order error term): Next we want to tackle the term $H^{N,n}_{i,j}$ in (3.18). We recall that it is the term coming from the second-order derivatives, i.e.
The first-order derivatives of $\Phi_{i,j}$ are given in Equations (3.11) and (3.12). Besides, we compute

Moreover, thanks to the moment assumption in (2.1), one can bound the mixed cumulants of the real and imaginary parts of $\sqrt{N} (Y^N_s)_{i,j}$ uniformly over $i, j$ and $N$. Note that since this term comes from the second-order derivatives, the cumulants which appear due to Proposition 2.17 are of order 3, i.e. $\kappa_{n,m}$ with $n + m = 3$, which is why we normalize $H^{N,n}_{i,j}$ by $N^{3/2}$ in Equation (3.14h). Consequently, $H^{N,n}_{i,j}$ can be bounded by a linear combination (whose coefficients are independent of $i, j$) of terms of the form

with $F_1, F_2, F_3 \in \{E_{i,j}, E_{j,i}\}$ and $A, B, C$ elements of $\mathcal{F}_{d,q}$ evaluated in $X^N_t$. So one can sum $H^{N,n}_{i,j}$ over $i, j$, and by using the following kind of inequalities,

one has the following bounds for some constant $C_n$, and $B_1, B_2, B_3 \in \mathcal{F}_{2d,q}$ and $Z^N = (X^N, Y^N, A^N)$,

Next, we use Lemma 3.1 with

and $g = 1, 2, 3$. Thanks to Lemma 3.2, we get that there exists a constant $C_n$ such that

Step 4 (Bounding the third order error term): It remains to bound $R^{N,n}_{i,j}$ in (3.18). Since we have an explicit expression for the remainder $\varepsilon_3$ in Proposition 2.17, we let $M^{u,i,j,s}_N$ be defined just like $M_N$ but with the entries $(j,i)$ and $(i,j)$ of the matrix $Y^N_s$ multiplied by $u \in [0,1]$. Then one has that $R^{N,n}_{i,j}$ is bounded by a linear combination (whose coefficients are independent of $i, j$) of terms of the form

where $r, m \in [1,4]$, $B_1, \dots, B_l \in \mathcal{F}_{d,q}$, and $\widetilde{Z}^N$ is defined like $Z^N$ but with the entries $(j,i)$ and $(i,j)$ of the matrix $Y^N_s$ multiplied by $u$. This is notably due to the fact that, by Hölder's inequality, for example, for any matrices $R, S, T, U$,

Besides, given a matrix $R$,

thus we have that for any $u \in [0,1]$,

for some polynomials $P, C_1, \dots, C_{l'}$ and a constant $c$. Hence, one can bound (3.22) by a linear combination (whose coefficients are once again independent of $i, j$) of terms of the form, for $h \in [0, 2n - m]$,

Next, by using Equation (3.23) again, one can find polynomials $P, D_1, \dots, D_{l''}$ and a constant $c'$ such that the quantity above is bounded by

We can then use Lemma 3.1 with

Then, thanks to the moment assumption in (2.1) and Lemma 3.2, we can once again upper bound Equation (3.22) by

for some constant $C_n$. Consequently, we have, for some other constant $C_n$, that

Thus, by combining Equations (3.18), (3.21) and (3.25), we get that for some constant

Hence, by plugging this result into Equation (3.8), for $N$ sufficiently large,

With the above moment estimate at hand, we can apply Markov's inequality: for any $\delta > 0$, we have for any $n$, for $N$ large enough,

Hence, we choose $\delta = y^4 N^{2\varepsilon - 1}$, and we have that for any $n$, for $N$ large enough,

Consequently, for any $\varepsilon > 0$, we have with high probability that

which proves Equation (3.4).
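For the reader's convenience, the Markov step just used can be written out as follows (our reconstruction, consistent with the moment bound and the choice of $\delta$ above):
\[
\mathbb{P}\big(|M_N| \ge \delta\big) \;\le\; \frac{\mathbb{E}\big[|M_N|^{2n}\big]}{\delta^{2n}} \;\le\; C_n \Big(\frac{y^4}{N\delta}\Big)^{2n} \;=\; C_n\, N^{-4n\varepsilon} \qquad \text{for } \delta = y^4 N^{2\varepsilon - 1},
\]
so that for any $\gamma > 0$, taking $n$ large enough makes the right-hand side smaller than $N^{-\gamma}$, i.e. $|M_N| \le y^4 N^{2\varepsilon - 1}$ with high probability.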
Step 5 (The case of the GUE and the GOE): In the case where $Y^N_s$ is a GOE random matrix, by using Proposition 2.15 (with a slight modification to take into account the extra term),

where $A^T$ is the transpose of $A$, and $h$ is the linear map such that $h(A \otimes B) = A^T B$. Thus, similarly to Equation (3.18), we get that

Hence, by reinjecting this result into Equation (3.8), we prove that

The case where $Y^N_s$ is a GUE random matrix is even simpler. Indeed, we compute that

and the result follows.
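Step 5 rests on the Schwinger-Dyson equation for the GUE (Proposition 2.16). As a sanity check (our own Monte Carlo sketch in Python, with arbitrary parameters), in the special case $Q = X^3$ the equation predicts $\mathbb{E}[\mathrm{tr}_N(X^4)] = 2\,\mathbb{E}[\mathrm{tr}_N(X^2)] + \mathbb{E}[\mathrm{tr}_N(X)^2] \approx 2$ for large $N$:

```python
import numpy as np

# Monte Carlo check of a Schwinger-Dyson consequence for the GUE with Q = X^3:
# E[tr_N(X^4)] = 2 E[tr_N(X^2)] + E[tr_N(X)^2], both sides close to 2 for large N.
rng = np.random.default_rng(3)
N, samples = 200, 50
m2s, m4s, tr2 = [], [], []
for _ in range(samples):
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    X = (G + G.conj().T) / np.sqrt(2 * N)   # GUE normalization: E|X_ij|^2 = 1/N
    m2s.append(np.trace(X @ X).real / N)
    m4s.append(np.trace(np.linalg.matrix_power(X, 4)).real / N)
    tr2.append((np.trace(X).real / N) ** 2)

lhs = np.mean(m4s)
rhs = 2 * np.mean(m2s) + np.mean(tr2)
print(lhs, rhs)   # both close to 2, the fourth moment of the semicircle law
```

The limiting value $2$ is the second Catalan number, i.e. the fourth moment of the semicircle distribution.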
Remark 3.4. We suspect that it should be possible to improve the error term $N^{\varepsilon} y^4 / N$ in (3.4) for Wigner matrices to $N^{\varepsilon} y^2 / N$, as is the case for GOE and GUE matrices. The weaker bound in the general case comes from certain inequalities, e.g. (3.20) or (3.22), used to bound contributions from the second- and third-order terms in the cumulant expansion; these cannot easily be improved for general polynomials, as one does not expect cancellations from higher-order terms in the cumulant expansion.

Concentration of the scalar product
Next we prove a similar concentration result for the scalar product. The main difference is that instead of having concentration of order $N^{-1}$, it is only of order $N^{-1/2}$. Note that this speed of convergence cannot be improved by pushing the cumulant expansion further, since it comes from the first error term (see Equations (3.35c) to (3.35g)). Besides, it is in line with results obtained by proving local laws; see for example Theorem 2.6 of [6]. The main difference between local laws and Proposition 3.6 comes from the term $y^4$, which, as we discussed in Remark 3.4, we suspect is not optimal. Before giving the concentration estimate, we prove the following lemma, which we will need.

Lemma 3.5. Given
• Y N a d-tuple of independent Wigner matrices as in Definition 2.1,

(3.28)
Besides, if one writes each $Q_j$ as a linear combination of terms of the form $e^{iP_1} R_1 \cdots e^{iP_k} R_k$, then the upper bound in the equation above does not depend on the (self-adjoint) polynomials $P_1, \dots, P_k$.
Proof. For any $k$ and $j$, one has that
Thus, thanks to Lemma 3.2, one has that for any $k$, there exists a constant $C_k$ such that for any $N$,

Hence the conclusion follows by picking $k$ such that $\frac{l}{2k} \le \gamma$.

We can now state our concentration estimate.

Proposition 3.6. Given
• $Y^N$ a $d$-tuple of independent Wigner matrices as in Definition 2.1,

Besides, if $Y^N$ is a $d$-tuple of independent GOE or GUE matrices, then one can replace $y^4$ by $y^2$ in the previous estimate.
Similarly to the proof of Proposition 3.3, we divide the following proof into four steps. The first one establishes an equation on the moments of $M_N$ (defined in Equation (3.30)). Then, in each of the following two steps, we bound a specific error term. In the last step, we study the case of the GUE and the GOE.
Proof of Proposition 3.6. Step 1 (Using the cumulant expansion): First, one has that for any matrix $R \in M_N(\mathbb{C})$, $\langle x, Ry \rangle = \mathrm{Tr}_N(R\, yx^*)$. Consequently, with $P = yx^*$, as previously we set

and we want to study the moments of $M_N$ in order to use Markov's inequality. We want to prove that for any $n \in \mathbb{N}$ and $\varepsilon > 0$, there exists a constant $C_n$ such that for $N$ large enough,
(3.31)
One can always assume that $y^4 \le \sqrt{N}$; otherwise, since

The case where $Y^N_s$ is a GUE random matrix is even simpler. Indeed, we compute that

and the result follows.
Hence we have that

Finally, with $f_1$ to $f_k$ defined as in Theorem 1.1, we set

Then, with $Q = f_1(P_1) \cdots f_k(P_k)$, and $\mu_i$ the Dirac measure at $1$ if $f_i = \mathrm{id}_{\mathbb{R}}$, thanks to functional calculus we have

Thus, by combining Equations (4.1) and (4.3), we get Equation (1.5) after integrating over $y$.
Besides, in the case where we are working only with GUE and GOE random matrices, one can replace $y^4$ by $y^2$ in Equation (4.1). Then, if $y^2 \le N$, Equation (4.2) lets us conclude, whereas if $y^2 \ge N$, we simply use the fact that

To prove Equation (1.6), we also start by using Proposition 3.6 to show that for any $\varepsilon > 0$, with high probability,

where one can replace $y^4$ by $y^2$ in the case where we are working only with GUE and GOE random matrices. Then we need to estimate $\mathbb{E}\langle x, Q(X^N, A^N) y \rangle$. Thanks again to Lemma 3.6 of [37], we get that

where $R_{\alpha,\beta,\delta,\gamma,r,t}$ is once again such that, for some constant $C$ independent of $\alpha, \beta, \delta, \gamma, r, t, y$ and $N$, $\mathbb{E}[\|R_{\alpha,\beta,\delta,\gamma,r,t}\|] \le C y^4$.
Consequently, since $P = yx^*$ is of rank 1 and $\|P\| = \|x\|_2 \|y\|_2$,

The rest of the proof follows as for Equation (1.5), with the difference that after integrating over $y$, we use the fact that $b \mapsto \mathbb{E}_{M_N(\mathbb{C})}[b]$ is a continuous linear functional to switch the integral and the conditional expectation.
Proof of Theorem 1.2. Let $Q_1, \dots, Q_p$ be non-commutative polynomials and $i_1, \dots, i_p \in [1, k]$ be such that for every $j$, $\tau(Q_j(a_{i_j})) = 0$ and, if $j < p$, $i_j \neq i_{j+1}$. Then, if we set $u^N_{2j-1} := e^{i(y^N_{i_j} - y^N_{i_{j-1}}) P(x)}$ (with the convention $i_0 = i_p$) and $u^N_{2j} := Q_j(A^N_{i_j})$, one can apply Theorem 1.1 to obtain that with high probability

Further, if $Y^N$ is a family of GUE and GOE matrices, then

Since by assumption, for any $i, j$, $|y^N_i - y^N_j| \ll N^{1/2}$ in the case of GUE and GOE matrices, and $|y^N_i - y^N_j| \ll N^{1/4}$ in all generality, we get in both cases that with high probability

Thus the former equality is true almost surely by Borel-Cantelli. But then, thanks to Proposition 11.4 of [34], with $NC[2p]$ the set of non-crossing partitions of $[1, 2p]$ (i.e. the set of partitions of $[1, 2p]$ such that one cannot find $a, c$ in one block as well as $b, d$ in another block with $a < b < c < d$), we can write
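The mechanism behind Theorem 1.2 can be seen in a toy simulation (our own Python sketch, not from the paper; a single GUE matrix stands in for $P(Y^N)$, and the size $N$, time $t$ and test matrix $A$ are hypothetical choices): conjugating a traceless deterministic matrix $A$ by $e^{itX}$ decorrelates it from $A$, so that $\mathrm{tr}_N\big(e^{itX} A e^{-itX} A\big) \to \tau(a)^2 = 0$ for large $t$ and $N$.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 300
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
X = (G + G.conj().T) / np.sqrt(2 * N)         # GUE matrix, spectrum ~ semicircle on [-2, 2]
A = np.diag([(-1.0) ** i for i in range(N)])  # deterministic, tr_N(A) = 0, tr_N(A^2) = 1

lam, V = np.linalg.eigh(X)

def evolve(t):
    # tr_N(e^{itX} A e^{-itX} A), the mixed trace of A with its Heisenberg evolution.
    U = (V * np.exp(1j * t * lam)) @ V.conj().T
    return np.trace(U @ A @ U.conj().T @ A).real / N

print(evolve(0.0))    # = tr_N(A^2) = 1 (sanity check)
print(evolve(25.0))   # ~ 0: the evolved copy decorrelates from A
```

In the large-$N$ limit this trace equals $|\varphi(t)|^2$, with $\varphi$ the Fourier transform of the semicircle law, which decays as $t \to \infty$; this is the long-time asymptotic freeness the theorem quantifies.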

Proposition 2.9.
Let $(A, \tau)$ be a $W^*$-algebra, i.e. a von Neumann algebra with a faithful normal trace $\tau$, and $B$ a $W^*$-subalgebra. Then there exists a unique conditional expectation $E : A \to B$. Besides, given $a \in A$, it is characterized by the fact that for any $b \in B$, $\tau(E[a]b) = \tau(ab)$. Finally, for every element $a \in A$, we have that $\|E[a]\| \le \|a\|$, where $\|\cdot\|$ is the operator norm.
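In the finite-dimensional toy case $A = M_N(\mathbb{C})$ with $\tau = \mathrm{tr}_N$ and $B$ the diagonal subalgebra, the conditional expectation is simply the projection onto the diagonal. The following sketch (our own Python illustration, with arbitrary test matrices) checks the characterizing identity and the norm contraction:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5
a = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Ea = np.diag(np.diag(a))                 # E[a]: keep only the diagonal part

tr = lambda m: np.trace(m) / N           # normalized trace tau = tr_N

# tau(E[a] b) = tau(a b) for every b in the diagonal subalgebra B.
b = np.diag(rng.standard_normal(N))
print(abs(tr(Ea @ b) - tr(a @ b)))       # ~ 0

# ||E[a]|| <= ||a|| in operator norm.
print(np.linalg.norm(Ea, 2) <= np.linalg.norm(a, 2))  # True
```

The identity holds exactly here because $\tau(ab)$ with diagonal $b$ only sees the diagonal entries of $a$, and $\|E[a]\| = \max_i |a_{ii}| \le \|a\|$ since each $a_{ii} = \langle e_i, a e_i \rangle$.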