Edge universality of correlated Gaussians

We consider a Gaussian random matrix with correlated entries whose correlations have a power-law decay of order d > 2, and we prove universality for the extreme eigenvalues. A local law is proved using the self-consistent equation combined with a decomposition of the matrix. This local law, along with concentration of eigenvalues around the edge, allows us to obtain a bound on the extreme eigenvalues. Using a recent result on Dyson Brownian motion, we prove universality of the extreme eigenvalues.


Introduction
The Wigner-Dyson-Mehta conjecture asserts that the local eigenvalue statistics of large random matrices are universal in the sense that they depend only on the symmetry class of the model (real symmetric or complex Hermitian) but are otherwise independent of the underlying details of the model. There are two types of universality results. Bulk universality involves the spacing distribution of eigenvalues that lie well within the support of the limiting spectral distribution, while edge universality involves the extreme eigenvalues.
There has recently been a lot of progress in proving the Wigner-Dyson-Mehta conjecture for an increasingly large class of models. In [7,8,10,11,14,13], universality was proved for Wigner matrices whose entries are independent and have identical variance; parallel results were obtained independently in various cases in [18,17]. In [3,1], this type of result was extended to more general variance patterns, while still maintaining the independence of the matrix entries.
Most of the previous works rely heavily on the independence between matrix entries and deal with bulk universality. Only recently have results appeared for models with a general correlation structure. In [6,4,5], bulk universality is proved for matrices where the correlation decays fast enough, e.g., faster than any polynomial. In a recent paper [9], Erdős et al. consider a model where the correlation between matrix entries has a power-law decay of order d ≥ 12 in the long range and d ≥ 2 in the short range. They use a combinatorial expansion to obtain an optimal local law, then prove bulk universality. They remark in Example 2.12 that in the Gaussian case, d ≥ 2 for both long-range and short-range correlations is sufficient to satisfy the assumptions of their main theorem.
In this paper, we prove edge universality for Gaussian matrices with a correlation structure that decays as a power law of order d > 2, namely |E[h_ij h_kl]| ≤ 1/(|i − l|^d + |j − k|^d), where h_ij are the entries of the random matrix H.
By a standard rule of thumb, it is much easier to prove universality when the matrix under consideration is close to one whose entries are all independent. When there are stronger correlations, it becomes significantly more difficult to apply the few known algebraic techniques to derive a local law. When one applies row removal to obtain a self-consistent equation, the correlations make it significantly harder to bound the error term. When applying the loop equation, naive choices of the region of integration lead to poor concentration bounds. It is believed that d > 2 is the optimal regime in which one can still obtain universality estimates. The local law is hardest to prove in this regime, and one has to be very careful in controlling errors.
In order to deal with the difficulty involved in deriving some of these error estimates, it would normally be necessary to expand the matrix to large orders and perform a sophisticated combinatorial expansion of the matrix entries of the Green's function. We give a proof of this result that does not require this expansion or a corresponding fluctuation-averaging type result; instead we rely on a decomposition of Gaussian random variables.
Correlated matrices regularly occur in statistical applications. A population researcher may seek to determine the existence of various subpopulations, where the correlations within one subpopulation are greater than those between different subpopulations. Biological researchers studying the genetic history of species, protein structure, or gene expression use these sorts of correlations to determine genealogical relationships or to construct a map between parts of DNA and the proteins they encode. Since the decay d > 2 is optimal, proving universality in this regime yields a statement robust enough for all such applications.
We follow a three-step strategy to prove universality. The first step is to derive a self-consistent equation: we find an operator S such that E[G(−S(G) − z)] = I. Removing the expectation creates some error term.
The goal is to show that a small error exists with high probability on our matrix ensemble, as is done in [4,6].
From [6], it is known that the self-consistent equation for correlated matrix entries is of the form G(−S(G) − z) = I. Via a local Fourier transform, this can be transformed into a vector equation involving an integral operator that is the continuous version of S. There are two difficulties in our case: obtaining a small error for our self-consistent equation and proving the stability of the equation near the edge.
In order to obtain a small error for the self-consistent equation, we avoid the procedure of removing blocks of elements, which requires combinatorial expansion; instead we apply integration by parts and concentration results along a careful decomposition of the probability space. This gives us a weak local law, which can be bootstrapped to give an even better bound for the expected value of the Green's function. Once we have bounds on the expected value, we use the concentration of the eigenvalues around their mean values to prove an upper bound for the top eigenvalue at the edge.
In order to prove stability, we first embed the matrix equation into the continuous setting, up to small errors. However, the entry-wise error is not small enough to allow this embedding directly. We use the fact that the operator S has a smoothing effect that reduces the error: S acts as a convolution with a decaying function, and this convolution effectively regularizes the error.
A double iteration of the operator F(G) = (−S(G) − z)^{−1} creates a matrix F(F(G)) that satisfies F(F(G)) = F(F(F(G))) + R, (1.2) where R has sufficiently fast decay in the off-diagonal entries. A similar strategy based on the smoothing effect of F is also used in [5]. Then we can embed and apply the stability of the continuous solution. In order to prove the decay properties of the double iteration, we apply a perturbation around a fixed matrix that is known to have decay of its matrix entries.
With sufficiently strong upper bounds on the top eigenvalue and lower bounds on the bottom eigenvalue, we are able to use the result of [16] to get universality for the extreme eigenvalues conditioned on a sub-σ-field, where the scaling factors and edge locations vary. Then we prove the existence of a scaling factor uniform for all matrices in the model, which gives us the final universality statement.
The main novelties of this paper are: the use of a Gaussian random variable decomposition, which allows us to obtain better concentration estimates for the Green's function around its expected value; the extension of the result of [16] via a uniformization of scaling factors; and an improved analysis of the stability of the self-consistent equations in the slow-decay regime d > 2.
The structure of this paper is as follows. The second section is devoted to proving a self-consistent equation with sufficiently small error. The third section of this paper involves proving stability of the self-consistent equation to get a local law to prove an upper bound on eigenvalues. The final section uses this upper bound in order to prove universality.

The model and assumptions
Before we define our model, we start with some preliminary notation. We need a notion of toroidal distance in order to later define correlation decay. In the sequel, we will use the letters i, j, k, l as indices for the matrix entries; for an N by N matrix, we view these indices as elements of Z_N = Z/NZ. On Z/NZ we define the natural distance dist_{Z/NZ}(i, j) := min{|i − j + kN| : k ∈ Z}, which for simplicity of notation we still denote by |i − j| unless there is danger of confusion.
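As an aside, the toroidal distance above is simple to compute; the following sketch illustrates it (the function name dist_torus is ours, not the paper's).

```python
# Hypothetical illustration of the toroidal distance on Z/NZ used to
# measure correlation decay between matrix indices.
def dist_torus(i, j, N):
    """min over k of |i - j + k*N|, i.e., the distance on the discrete torus."""
    r = abs(i - j) % N
    return min(r, N - r)

# On the torus of size 10, indices 0 and 9 are adjacent.
print(dist_torus(0, 9, 10))   # -> 1
print(dist_torus(2, 7, 10))   # -> 5
```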
We consider an N by N random matrix H = (h_ij)_{1≤i,j≤N} whose entries are centered Gaussian random variables with correlation ξ^(N). For simplicity of notation we omit the superscript (N).
We need the following assumptions on the behavior of the ξ_ijkl.
1. We have a limiting profile for the covariances: assume there is a Lipschitz function describing the correlations in the large-N limit.
2. We have a power-law decay of the correlations of order d, for some d > 2.
3. We have a nonsingularity condition: we assume that the covariance matrix ξ_ijkl is strictly positive definite.
Since we are dealing with Gaussian matrices, this is equivalent to the existence of c_2 > 0 such that H admits a decomposition H = c_2 X + Y, (2.4) where X is a GOE matrix independent of Y.

Remark 2.2.
The third, nonsingularity, condition was used to gain control of an inverse operation during our study of the loop equation.
There are examples of Gaussian ensembles with correlation decay that do not satisfy universality. As a simple counterexample, consider the ensemble built from a GOE matrix G and a fixed 2 by 2 matrix X, where X is either the 2 by 2 identity matrix or the 2 by 2 matrix with all entries 1. If X were the identity, all eigenvalues would appear with double multiplicity and universality could not hold. If X were the all-1 matrix, the resulting matrix would have half rank. The small GOE component causes sufficient spreading of the eigenvalues to restore universality.

Remark 2.3.
Though writing our ensemble in the form H = c_2 X + Y, with X and Y independent, would suggest that we use results from free probability to prove a local law, we would need local law results for the matrix Y. However, here Y is a Gaussian matrix of a similar type as H, and we do not know a priori good local laws for Y. Our method avoids this issue of infinite regress.
We fix as a control parameter an arbitrary α ∈ (2, d). We say that a constant is universal if it only depends on c_1, c_2, d, α and φ. In this paper, we write a ≲ b if there is a universal constant c > 0 such that a ≤ cb. We also write a ∼ 1 if a ≲ 1 and 1 ≲ a.
For β > 0 and any matrix A (finite square or infinite) we define a family of norms ‖·‖_β. Our main result is the following. Let λ_1 ≤ · · · ≤ λ_N be the eigenvalues of H, and let λ̃_1 ≤ · · · ≤ λ̃_N be the eigenvalues of an N by N GOE matrix.
There exists a universal constant γ such that for any f ∈ C^1(R^{k−1}), the following inequality holds for N large enough, for some small c > 0.

The loop equation
The following lemma is one of the building blocks of the loop equation.
Proof. This follows directly from an identity known as Stein's lemma (Gaussian integration by parts). We also use the following decomposition lemma, which allows us to construct a special sigma-algebra. We then derive the loop equation by taking the conditional expectation with respect to this sigma-algebra. As we will see later, this sigma-algebra has the benefit that we can obtain strong concentration bounds upon removing the conditional expectation. Lemma 2.6. Let Z = (Z_k)_{k=1}^p be a centered Gaussian random vector and let 1 ≤ q < p. Then there is a constant matrix (a_kl)_{1≤l≤q, q+1≤k≤p} such that (Z_k − Σ_{l=1}^q a_kl Z_l)_{k=q+1}^p forms a Gaussian random vector whose components are independent of (Z_l)_{l=1}^q. Proof. This can be done by a carefully chosen linear transform.
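The linear transform behind Lemma 2.6 is the standard Gaussian regression step. A minimal numerical sketch, assuming the usual formula a = Σ_21 Σ_11^{-1} (the variable names and the example covariance below are ours, not the paper's):

```python
import numpy as np

# Sketch of Lemma 2.6: for a centered Gaussian vector Z with covariance
# Sigma, the coefficients a = Sigma_21 @ inv(Sigma_11) make the residuals
# Z_k - sum_l a_kl Z_l (k > q) uncorrelated with (and hence, being jointly
# Gaussian, independent of) (Z_l)_{l <= q}.
p, q = 5, 2
rng = np.random.default_rng(0)
M = rng.standard_normal((p, p))
Sigma = M @ M.T                      # a generic positive definite covariance

S11 = Sigma[:q, :q]                  # covariance of the conditioning block
S21 = Sigma[q:, :q]                  # cross-covariance
a = S21 @ np.linalg.inv(S11)         # regression coefficients of Lemma 2.6

# Cross-covariance of the residual block with the first block:
# Cov(Z_{>q} - a Z_{<=q}, Z_{<=q}) = S21 - a @ S11 = 0 exactly.
cross = S21 - a @ S11
print(np.abs(cross).max())           # numerically ~ 0
```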
We start with the trivial matrix identity G(H − z) = I, which can be written entry-wise. Without loss of generality, fix j = 1. According to Lemma 2.6 we may write h_ab = Σ_k γ_abk1 h_k1 + h̃_ab, (2.9) where h̃_ab is a Gaussian random variable that is independent of (h_k1)_{1≤k≤N}. In particular, γ_a1k1 = δ_ak and h̃_a1 = 0 for all a ∈ Z/NZ. In order to apply Lemma 2.5 to (2.8), let F_1 be the σ-algebra generated by (h̃_ab)_{a≠1,b≠1}, and define the corresponding conditional expectation E_1. We will then be able to apply Lemma 2.5 to get (2.10). For technical reasons, we define a cut-off version ξ̃ of ξ, so that ξ̃_iklj has a power-law decay as i and j get farther apart, and define a linear map S accordingly. Since the expectation operator E_1 is equivalent to integrating over N weakly dependent Gaussian random variables, we may remove the expectation at the cost of some small error terms, after which we obtain a self-consistent equation of the form (2.14). Define a map F; then the above equation can be written as a perturbation of the fixed point equation G = F(G). Here the error is entry-wise bounded by roughly O((Nη)^{−1/2}). However, this entry-wise bound is not strong enough to use the stability of the equation G = F(G). Therefore, we iterate the map F on G. The new error term has a power-law decay in the off-diagonal entries; this comes from two facts that will be established in detail later. The first is that the operator S(G) acts like a convolution with ξ_ijkl, which gives decay of the entries of S(G). The second is that the inverse of a matrix whose entries decay will also have decay of the same order. Hence, two applications of F give us an error that is much smaller than the original error, and we obtain a good estimate on F(F(G)). Using F(F(G)) we can recover G and obtain a bound on |G − G_0|, where G_0 is some deterministic matrix.

Limiting version of self-consistent equation
In order to formulate a local law, we need a space in which we can define our limiting self-consistent equation. This limiting equation is best described on a space of convolution operators on the torus.
The argument in Lemma 4.15 of [6] can be modified to show that ϕ ∼ 1. We give a brief summary of that argument. By the definition of the profile, one can write ϕ in terms of the variance of a suitable Gaussian average Y_N. Var Y_N can be controlled via the decay assumptions on ξ_ijkl. Choosing g carefully gives pointwise bounds on ϕ. This is the only place where the argument differs from Lemma 4.15 in [6], which used the finite range of the correlations.
The decay condition (2.2) guarantees that ϕ is Lipschitz. Define Ψ accordingly. Equations like (2.20) are studied in detail in [2]. Since the function ϕ is bounded above and bounded below away from 0, the function Φ satisfies conditions A1-A3 and the block fully indecomposable condition of Definition 2.9 of [2]. Also, since ϕ is Lipschitz, it satisfies (2.22) in that article. Therefore, their Theorem 2.6 says that the above equation has a unique solution g ∈ K_+, bounded by a universal constant c_3 < +∞. Let m(z) := ∫_{T²} g(s, u) ds du. Then m is the Stieltjes transform of a compactly supported probability measure ν on R. Theorem 2.6 in [2] then says that ν has a 1/3-Hölder continuous density ρ ∈ C_c(R) with square-root behavior at the left and right edges; let E_L := inf supp ν and E_R := sup supp ν.
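For orientation, in the GOE reference case the measure ν is the semicircle law, whose Stieltjes transform is available in closed form and exhibits exactly the square-root edge behavior described above. A sketch (illustrative only; the paper's ν is more general):

```python
import numpy as np

# GOE reference case: the Stieltjes transform of the semicircle law solves
# the self-consistent equation m(z) = 1/(-z - m(z)), and Im m recovers a
# density sqrt(4 - E^2)/(2*pi) with square-root behavior at the edges +-2.
def m_semicircle(z):
    # branch of the square root chosen so that Im m > 0 in the upper half-plane
    s = np.sqrt(z * z - 4)
    if s.imag < 0:
        s = -s
    return (-z + s) / 2

z = 0.3 + 0.2j
m = m_semicircle(z)
print(abs(m - 1 / (-z - m)))         # self-consistent equation holds, ~ 0

# Im m(E + i0)/pi approximates the density at E inside the bulk
E = 1.5
rho = m_semicircle(E + 1e-9j).imag / np.pi
print(abs(rho - np.sqrt(4 - E**2) / (2 * np.pi)))  # ~ 0
```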
For h ∈ K, define the Fourier coefficients ĥ(s, k) := ∫_T h(s, u) e^{−2πiku} du. On K we may define a norm ‖·‖_β for β ≥ 0. In view of Theorem 3.2 of this work, it is easy to see that ‖g‖_α ∨ ‖g^{−1}‖_α ≲ 1 on any bounded subdomain of C_+. For any N ∈ N, define a discretization operator D^(N). We have the following lemma concerning the discretization D(g): using the decay of â and the Lipschitz continuity of b, we bound the l^∞ → l^∞ norm; here we used the fact that α > 2 to get decay. Similarly, the l^1 → l^1 norm is bounded by the same quantity, hence the operator norm has the same bound by interpolation. The second estimate follows from a similar argument. Fix z; in the following, we let g(s, t) be the solution of (2.20) at this point z. We now define Z(z) := {|g(s, t)| : s, t ∈ T}. Equation (2.20) gives us that Z is bounded away from 0 and +∞. For K > 0 let D_K = {z ∈ C_+ : |z| ≤ K}.
Proof. Let θ ≪ 1 be some parameter to be chosen.
where L is the Lipschitz constant of h with respect to the first variable. By the chain rule we know that ‖h‖_∞ ≲ θ^{−3} and L ≲ θ^{−2}. Therefore ‖h‖_2 ≲ θ^{−3}, and hence D(g)D(g)* − x is invertible. That means x is not in the singular spectrum of D(g).
Proof. According to Lemma 2.7 and equation (2.20), we have the necessary bounds. In particular, the singular spectra of F(D(g)) and F(F(D(g))) are contained in a compact subset of R_+. Proof. Using the notation from the previous corollary, if D(g)(−S(D(g)) − z) − I = R, then F(D(g)) = (I + R)^{−1} D(g).
By perturbation theory, the singular spectrum of F(D(g)) is within N^{−1/2} of that of D(g); therefore it is a compact subset of R_+. On the other hand, a simple algebraic calculation yields F(F(D(g))) = (I + F(D(g)) S(F(D(g)) R))^{−1} F(D(g)).

Concentration lemmas
The following lemma says that a Lipschitz function of weakly dependent Gaussian random variables concentrates around its expectation.
Lemma 2.12. Let X = (X_1, · · · , X_n) be an array of centered Gaussian random variables with covariance matrix Σ, and let f be a Lipschitz function with constant L. Then f(X) concentrates around its expectation. Proof. Let Y = Σ^{−1/2} X, so that Y is an n-dimensional random vector with independent N(0, 1) components. The Gaussian concentration inequality of [19] applies to the function y ↦ f(Σ^{1/2} y) with Lipschitz constant L_1. It is easy to see that L_1 ≤ L ‖Σ^{1/2}‖, which concludes the proof.
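A quick numerical illustration of this dimension-free concentration (a sketch with Σ = I and f = max, which is 1-Lipschitz in the l² norm; not part of the paper's argument):

```python
import numpy as np

# Sanity check of the Lemma 2.12 phenomenon: a 1-Lipschitz function of a
# Gaussian vector fluctuates on a scale at most L * ||Sigma^{1/2}||,
# independent of the dimension n.
rng = np.random.default_rng(1)
n, reps = 200, 4000
sigma_half_norm = 1.0                # here Sigma = I, so ||Sigma^{1/2}|| = 1

X = rng.standard_normal((reps, n))
f = X.max(axis=1)                    # f = max is 1-Lipschitz in the l2 norm
print(f.std())                       # O(1), far below sqrt(n) ~ 14
```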
In the future, we will frequently use the following lemma.
Proof. Without loss of generality let β = 1; the bound then follows by testing against an arbitrary vector v ∈ R^n. Recall that in Section 2.2 we defined a map S (see (2.12)). Thanks to the decay condition (2.2), the operator S is bounded, as will be seen in the following lemma.
Lemma 2.14. Let A ∈ C N ×N . Then there is a universal constant c > 0 such that the following inequalities hold.

Proof. The first inequalities follow directly from the definition of S(A); this proves the third inequality. As for the fourth inequality, we use the Cauchy-Schwarz inequality.

Error estimate
Recall the decomposition (2.9). Taking the covariance with h_l1 for any l ∈ Z/NZ, we obtain the stated bound; in the second inequality above we used the boundedness of Σ^{−1}. Using the decay rate (2.2), the relevant sum is bounded by C(α − 2)^{−1}, according to Lemma 2.13. Therefore we obtain (2.36); in the second inequality we used the Ward identity. Similarly, define the shorthand Q_kl := ∇_1 G_kl. By (2.36) we have |Q|²_∞ ≤ CΓ²γη^{−1}, and then (2.38) follows. Now we use the bound (2.36), the decay (2.2), and Lemma 2.14 to obtain (2.39). The observations above yield the following lemma.
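The Ward identity invoked above states that for G = (H − z)^{−1} with H symmetric and z = E + iη, one has Σ_k |G_ik|² = Im G_ii / η. A numerical sanity check on an arbitrary symmetric matrix (the example matrix is ours, not the model's H):

```python
import numpy as np

# Ward identity: for G = (H - z)^{-1} with H real symmetric, z = E + i*eta,
#   sum_k |G_ik|^2 = Im G_ii / eta,
# which follows from G - G* = 2i*eta*G*G^* for Hermitian H.
rng = np.random.default_rng(2)
N = 50
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)       # arbitrary symmetric test matrix

E, eta = 0.5, 0.01
G = np.linalg.inv(H - (E + 1j * eta) * np.eye(N))

lhs = (np.abs(G) ** 2).sum(axis=1)   # sum_k |G_ik|^2 for each row i
rhs = G.diagonal().imag / eta        # Im G_ii / eta
print(np.abs(lhs - rhs).max())       # ~ 0 up to floating point error
```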
Note that G̃ = G on the event {Γ ≤ K}; therefore the corresponding bound holds. On the other hand, in view of (2.39), a similar argument yields an analogous estimate. Now we go back to the identity (2.13), removing E_1 at the cost of some error term and replacing 1 with a generic j. In particular, for a crude bound, we may take t = log N and K = 2/η, so that P[Γ > K] = 0. The lemma above then yields the claimed estimate.

Power law decay of inverse matrices
Proof. Note that by definition, ‖A‖_{min{β_1,β_2}} ‖B‖_{min{β_1,β_2}} ≤ ‖A‖_{β_1} ‖B‖_{β_2}, so it is sufficient to prove the case β_1 = β_2 = β. Without loss of generality assume ‖A‖_β = ‖B‖_β = 1. Since either |i − j| or |j − k| is ≥ |i − k|/2, the relevant quantity is bounded as claimed. The following argument is based on a similar argument of Jaffard [15].
We will show decay of the matrix elements of the solution to the self-consistent equation. Though we will only apply this to the solution of the limiting equation (2.21), the following theorem phrases the result in terms of matrices for convenience of notation. Any positive semi-definite matrix A can be written as the square of a positive semi-definite matrix M. Once we know that M has decay of order d − 1/2, we can show that S(M) has even better decay; it will have decay of order α. We can then repeat the same argument with this better decay estimate to show that M has matrix decay of order α.
Proof. Let A := F(D(g)) and R := S(M − D(g)). Next, we define R = S(F(M) − F(D(g))) and A = F(F(D(g))). Then ‖R‖_α ≤ c according to the above argument. The last claim follows from the estimates above and Corollary 2.10.

Local law
Recall definitions (2.31) and (2.34); for a constant T > 0 to be chosen, define the domain D. The estimates below hold with probability 1 − e^{−a_3(log N)²}; the case κ > ρ is handled analogously. Proof. Take K := log N, and let {z_k} be an N^{−4}-net of D. Define the event Ω; then by Proposition 3.4, on Ω we can apply the stability Lemma 2.11 to J(F(F(G))), which approximately satisfies the self-consistent equation. This implies that ‖J(F(F(G))) − g‖_∞ ≲ (‖R‖_∞ + N^{−1})ω^{−1}. Discretizing gives the inequality ‖F(F(G)) − D(g)‖_∞ ≲ (‖R‖_∞ + N^{−1})ω^{−1}. By the closeness of G to F(G), and of F(G) to F(F(G)), coming from the self-consistent equation, this inequality yields the desired bound on Λ(z), since if we take the T in the definition of D to be a large enough constant, then the former case holds with O(e^{−c(log N)²}) probability.
We use Theorem 3.5 and the crude bound Im m ≤ ηκ^{−2} to get the conclusion. Remark 3.7. In the proof of this local law, the only error estimates that depend strongly on the particular model under consideration are the stability results for the limiting vector equation. For sample covariance matrices, though they are not exactly covered by our proof, the stability results and the square-root behavior at the right edge hold. Thus, we would be able to prove a local law for sample covariance matrices.

Upper bound of top eigenvalue
In the previous section, we established optimal term-wise estimates on the entries of the Green's function. Estimates on the trace of the Green's function, however, are generally better due to Central Limit Theorem type cancellations.
One way to see this is to prove a fluctuation averaging lemma [12], which would involve combinatorial expansions. In the Gaussian case, we can implicitly see the same effect by using a general result on the concentration of the largest eigenvalue along with our optimal term-wise estimates for the Green's function. The following lemma makes this intuition rigorous. Lemma 3.8. For N ∈ N, consider a family of random measures µ_N = (1/N) Σ_{k=1}^N δ_{λ_k}, where λ_1 ≥ · · · ≥ λ_N, such that there is a deterministic λ̃_1 satisfying λ_1 = λ̃_1 + o(N^{−1/2+ε}) for any ε > 0. Assume that there exists a deterministic measure ν whose Green's function satisfies (3.6) for some γ > 0 and all z = E + iη with dist(E, supp ν) ≥ N^{−ϵ} and η ≥ N^{−δ−1/2}, for some δ, ϵ > 0.
Proof. Assume for contradiction that λ_1 lies outside a distance N^{−ϵ} of supp ν, for some ϵ smaller than the one for which condition (3.6) holds for z = E + iη with dist(E, supp ν) ≥ N^{−ϵ}. The exact value of ϵ will be specified later. By our concentration result for λ_1 around λ̃_1, this is equivalent to assuming that λ̃_1 is at distance N^{−ϵ} from supp ν. Since λ_1 always lies in an N^{−1/2+ε} neighborhood of λ̃_1, we will be able to prove that the integral of the Green's function in a neighborhood around λ̃_1 is bounded below by a constant times N^{−1}.
More specifically, consider an interval I around λ̃_1 of width N^{−1/2−γ'}, with γ' < γ ∧ δ/2, and set η = N^{−1/2−δ}. One should note that with the above conditions, η is always less than N^{−1/2−γ'}. The term in the middle of the above inequality is of order 1/N. To see why the first inequality is true, one should first note that a one-sided η-neighborhood of λ_1 always lies in the interval I for sufficiently large N; this is our concentration assumption λ_1 = λ̃_1 + o(N^{−1/2+ε}), where we can choose ε less than γ'. Without loss of generality, we may assume that Ĩ := [λ_1 − η, λ_1] is contained in I.
For E ∈ Ĩ, the function η/((E − λ_1)² + η²) is bounded below by 1/(2η). The integral of this function over Ĩ is then clearly bounded below by 1/2. We see that we can take C = 2, for example.
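The lower bound used here can be spelled out (a short worked detail in the notation of the surrounding text):

```latex
% For E in the interval \tilde I = [\lambda_1 - \eta, \lambda_1] we have
% (E - \lambda_1)^2 \le \eta^2, hence
\frac{\eta}{(E-\lambda_1)^2 + \eta^2} \;\ge\; \frac{\eta}{\eta^2 + \eta^2}
\;=\; \frac{1}{2\eta},
% and integrating over \tilde I, an interval of length \eta, gives
\int_{\tilde I} \frac{\eta}{(E-\lambda_1)^2 + \eta^2}\, dE
\;\ge\; \eta \cdot \frac{1}{2\eta} \;=\; \frac{1}{2}.
```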
EJP 24 (2019), paper 44.
In (3.8) we used the assumption (3.6), while in (3.10) we used the fact that ν satisfies (3.5). In the final line, we used the assumption that λ̃_1 lies at a distance N^{−ϵ} from the support of ν, which is at a much greater scale than the length of I.
Notice that we have set η = N^{−1/2−δ} for δ positive; we can now replace ϵ by min(ϵ, δ/4) and see that the error in (3.10) will be o(1)/N. This contradiction implies that for large N, λ̃_1 must be at distance less than N^{−ϵ} from supp ν. By the concentration of λ_1 around λ̃_1, λ_1 will then also be at distance less than N^{−ϵ} from the support of ν. Proof. We would like to apply Lemma 3.8. First notice that by Gaussian concentration, we are able to verify the concentration assumption of Lemma 3.8.
Then we check that the error bounds in Corollary 3.6 are sufficient for our purposes.
The error that appears there, together with the definition of D and the Lipschitz continuity of g, gives |E (1/N) Tr G − m_ν| = O(N^{−1/2−γ'}) for some γ' > 0, as long as η ≫ N^{−3/4+δ} and κ ∼ N^{−ϵ} for very small ϵ, δ > 0. Since δ can be arbitrarily small, we may choose η such that N^{−3/4+δ} ≪ η ≪ N^{−1/2}, and we can apply Lemma 3.8.

Universality
In the previous section, we proved a local law for m_N as well as an improved local law for E[m_N], and combined them with the concentration of the top eigenvalue to prove an upper bound on the top eigenvalue. According to a recent result of Landon and Yau [16], stated below, the local law together with the upper bound on the top eigenvalue is sufficient to prove universality near the edge.
We call a deterministic matrix V η * -regular if it satisfies the following properties.
Consider the ensemble V_t = V + √t G, where G is an independent GOE ensemble. Let t satisfy N^{−ϵ} ≥ t ≥ N^{ϵ} η_*, and let F : R^{k+1} → R be a test function such that ‖F‖_∞ ≤ C and ‖∇F‖_∞ ≤ C. Then there are deterministic parameters γ_0 ∼ 1 and E_− such that (4.1) holds. The first expectation is with respect to the eigenvalues of the ensemble V_t, ordered λ_1 < λ_2 < ... < λ_N. The second expectation is taken with respect to the eigenvalues λ̃_i of a GOE, ordered λ̃_1 < λ̃_2 < ... < λ̃_N. Here i_0 is the first index i such that the ith smallest eigenvalue of V is greater than −1/2.
Call H the ensemble with correlation structure ξ_ijkl. Theorem 3.9 combined with Theorem 3.5 shows that there exists a parameter Φ > 0 such that, with high probability, a matrix M produced by H is η_*-regular with η_* = N^{−φ} for any φ < Φ. We choose some sufficiently small φ < Φ and set t = N^{−φ}. We will use this t whenever it is referenced in the following sections. It will be important that φ is sufficiently small in what follows.
In order to apply the theorem, we would like to write our matrix ensemble in the form H = H' + √t G, where G is a standard GOE matrix and H' is a matrix ensemble independent of G.
Recall the notation ξ_ijkl from equation (2.1). We let H' = (h'_ij) be the auxiliary Gaussian ensemble whose correlation structure is that of H with the GOE component of size t subtracted. We see that when N is large enough, H' has a positive semidefinite correlation matrix, and so we can construct the ensemble H'. Since H' is a correlated Gaussian ensemble, we have a local law as in Theorem 3.5 as well as bounds for the extremal eigenvalues as in Theorem 3.9.
We will apply (4.1) as follows. A matrix H from our original Gaussian ensemble can be written as H' + √t G, where H' is produced by our auxiliary ensemble and G is a GOE matrix independent of H'. Let U be an orthogonal matrix such that V := U* H' U is diagonal. Notice that U* H U has the same eigenvalues as H and can be written as U* H U = V + √t Ĝ, where Ĝ is a GOE matrix. This is possible because the GOE is invariant under orthogonal conjugation.
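The invariance used in this step can be seen from the fact that the GOE density depends on H only through Tr(H²), which is preserved under conjugation by any orthogonal matrix. A quick numerical check (the example matrices and names below are ours):

```python
import numpy as np

# The GOE density is proportional to exp(-N*Tr(H^2)/4), and
# Tr((U^T H U)^2) = Tr(H^2) for any orthogonal U, so conjugating a GOE
# matrix by a fixed orthogonal matrix yields again a GOE matrix.
rng = np.random.default_rng(3)
N = 30
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)       # symmetric matrix playing the role of G
U, _ = np.linalg.qr(rng.standard_normal((N, N)))   # an orthogonal matrix

Hc = U.T @ H @ U                     # conjugated matrix
print(abs(np.trace(Hc @ Hc) - np.trace(H @ H)))    # ~ 0: Tr(H^2) preserved
print(np.abs(Hc - Hc.T).max())       # ~ 0: still symmetric
```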
We can condition on the matrix H' and apply Theorem 4.1. The ensemble H' + √t G, with H' fixed and G a GOE, has eigenvalue density near the left edge described by a square-root profile for ct² ≥ E ≥ E_−(t); E_−(t) will be called the edge of this ensemble and γ(t) the scaling factor. This is the content of Lemma 2.3 of [16]. The universality result coming from applying Theorem 4.1 is (4.3).
We used that for N large enough, the smallest eigenvalue of H' is at distance less than 1/2 from the edge, so the index i_0 is 1. The only issue with (4.3) is that γ_0 is a function of the initial data; we will replace it with a universal constant in the next section.

Changing the scaling factor
A priori, the scaling factor γ_0 appearing in (4.3) is only known to be a function of the initial matrix H' used as input to Theorem 4.1. However, we have explicit complex-analytic equations determining the scaling factors in terms of the initial data. Using Rouché's theorem, the local law, and the Lipschitz continuity of the Green's function, we can show that these equations are stable under small perturbations of the initial data. This allows us to show that, with high probability, the scaling factor does not change too much between two different initial data points, so we can choose a common scaling factor γ for the entire ensemble. The theorem below shows that any two matrices produced from H will asymptotically have the same scaling factor.
Proof. Define z_i as follows, where κ is a parameter depending on t to be specified later.
The importance of the point z i is contained in the following relation.
This is a standard property of the free convolution; one can refer to equations (7.2) and (7.3) of [16] for a proof.
We can determine scaling factors using the following relation. In (4.8), the first term can be bounded by a sufficiently good local law. The second term can be bounded by a Lipschitz condition, provided z_1 and z_2 are sufficiently close to each other.
We will now attempt to bound the quantity |z 1 − z 2 |.
For the first term in the second line, we used the local law around z_1 to bound the quantity by t³; for the second quantity, we used the Lipschitz continuity of m_0² combined with the estimate on |z_1 − z_2| coming from (4.3).
Notice that if we now have the above bound, then it would clearly be impossible for the inequality in (4.19) to hold.
One should note that this argument also shows that γ_1(t) is within O(t) of the scaling factor corresponding to the empirical spectral distribution of the ensemble H.

Final universality result
Using the scaling results from the previous section, we can translate (4.3) as follows.

Theorem 4.4.
Let H be the Gaussian ensemble with correlation structure ξ_ijkl satisfying the assumptions of Section 2.1. There exists a deterministic scaling factor γ, depending on the ensemble H, such that the following inequality holds for test functions F, where λ_1 < λ_2 < ... < λ_N. Proof. First, notice that we can find a function F : R^{k+1} → R such that ‖F‖_∞ and ‖∇F‖_∞ are bounded. Recall from the earlier discussion that we can write any matrix from the ensemble H as H' + √t G, where H' is Gaussian with correlation structure ξ_abcd − tδ_{ab=cd} and G is an independent GOE matrix.
Let Ω be the event on which H' has sufficiently good regularity so that (4.3) holds for the function F. On Ω, we would like to change the scaling factor γ_0 to γ, the scaling factor at the edge of the spectral density corresponding to H.
From Theorem 4.2, we know that the difference between the γ_0 appearing in (4.3) and the γ appearing here is of order t = N^{−φ}. Finally, one can use the fact that F is Lipschitz, as well as the fact that the N^{2/3}(λ_{i_k} − E_−) are bounded, to control the resulting error. We can take the expectation of the above quantity in the ensemble H' + √t G, with G an independent GOE, and apply the triangle inequality with (4.3). Translating this statement back to H, we get that the desired estimate holds on the set Ω. One would now like to remove the conditional expectation in the above expression.
Namely, we would like to integrate (4.22) over Ω while using the trivial bound that |E_H[G] − E_GOE[G]| is bounded by a constant on the complement of Ω. We thus get the full universality statement as desired. Remark 4.5. As long as a version of the Dyson Brownian motion result holds for sample covariance matrices, we would be able to prove edge universality using the local law and the upper bound for the top eigenvalue from the previous sections.

A Decay of inverse matrices
In this section we prove Theorem 3.2. Let B = I − A. Since ‖B‖ < 1, we can expand A^{−1} = Σ_{k=0}^∞ B^k. We need the following lemma to bound each term. For simplicity, we will prove the statement of polynomial decay of the inverse of order 1 given matrix decay of order 2 + δ. The proof readily generalizes to show decay of the inverse of order d − 1 − δ, δ > 0, given matrix decay of order d for d > 2. We use the following interpolation identity, which appears in [15]. Proof. Notice that the decay of B̃ is of order 1 + δ with coefficient ‖B‖_{2+δ}. Thus B̃ lies in l^q for q ≥ 1/(1+δ). More specifically, we have the stated bound, where E is a constant that depends on δ.
Also see that |(M B̃ N)_xy| = |⟨M e_x, B̃ N e_y⟩|, where (e_x) is the canonical basis of our matrix space. By Young's inequality, we can say that ‖B̃ N e_y‖_{l²} ≤ E ‖B‖_{2+δ} ‖N e_y‖_{l²} ≤ E ‖B‖_{2+δ} ‖N‖_{l²}. (A.5) This is the result of Young's convolution inequality ‖f ∗ g‖_r ≤ ‖f‖_p ‖g‖_q with 1 + r^{−1} = p^{−1} + q^{−1}. Here we use r = p = 2 and q = 1, with the q-norm taken on the B̃ term and the p-norm taken on the N e_y term. We finally apply the Cauchy-Schwarz inequality to get |⟨M e_x, B̃ N e_y⟩| ≤ E ‖M‖_{l²} ‖B‖_{2+δ} ‖N‖_{l²}.
Applying the above lemma to each term of the form B^i B̃ B^{n−i−1}, we can bound [B^i B̃ B^{n−i−1}]_{xy}. With the lemma in hand, we are able to iterate the interpolation result as it appears in [15].
The main issue is that we are no longer able to estimate quantities like ⟨M e_i, B̃ N e_j⟩ in (A.2) using the l² norms of M and N; instead one must use the l^p norms of M and N for p between 1 and 2.
One must then interpolate the l^p norms of M and N with the l² norm and the appropriate α-norm, via bounds of the form ‖B‖_{l^p} ≤ c_p ‖B‖. If one wants to prove inductively a bound of the form ‖B^n‖_α ≤ n^k R^n, then inserting this estimate into the double product B^i B̃ B^{n−i−1} and applying the trivial bound i^k (n − i − 1)^k ≤ n^{2k}, we would need n^{2k(2−2/p)} ≤ n^k. One notices that this is only possible if 2 − 2/p ≤ 1/2, i.e., p ≤ 4/3, so we could only choose p < 4/3. If one knows that ‖I − A‖ is bounded away from 1, then one can analyze the recursion at any order α < d, but the growth rate in the recursion will no longer be ‖I − A‖ but some parameter r > ‖I − A‖.
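The Neumann series mechanism of this appendix can be illustrated numerically: for A = I − B with B decaying at a power-law rate and ‖B‖ < 1, the entries of A^{−1} = Σ_k B^k decay away from the diagonal. A sketch with an arbitrary test matrix (all parameters below are ours, chosen only so that ‖B‖ < 1):

```python
import numpy as np

# Illustration of the decay-of-inverse phenomenon: B has power-law decay
# of order 2 + delta in the toroidal distance and small norm, so the
# Neumann series A^{-1} = sum_k B^k inherits off-diagonal decay.
N = 40
d = 2.5                              # decay exponent of B (the "2 + delta")
i = np.arange(N)
gap = np.abs(i[:, None] - i[None, :])
dist = np.minimum(gap, N - gap)      # toroidal distance between indices
B = 0.2 / (1.0 + dist) ** d          # decaying matrix; row sums are < 1
Ainv = np.linalg.inv(np.eye(N) - B)  # = sum_k B^k since ||B|| < 1

near = abs(Ainv[0, 1])               # entry at toroidal distance 1
far = abs(Ainv[0, N // 2])           # entry at maximal toroidal distance
print(near, far)                     # far entries are much smaller than near ones
```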