Universality of random matrices with correlated entries

We consider an $N$ by $N$ real symmetric random matrix $X=(x_{ij})$ where $\mathbb{E}x_{ij}x_{kl}=\xi_{ijkl}$. Under the assumption that $(\xi_{ijkl})$ is the discretization of a piecewise Lipschitz function and that the correlation is short-ranged we prove that the empirical spectral measure of $X$ converges to a probability measure. The Stieltjes transform of the limiting measure can be obtained by solving a functional equation. Under the slightly stronger assumption that $(x_{ij})$ has a strictly positive definite covariance matrix, we prove a local law for the empirical measure down to the optimal scale $\text{Im} z \gtrsim N^{-1}$. The local law implies delocalization of eigenvectors. As another consequence we prove that the eigenvalue statistics in the bulk agrees with that of the GOE.


Introduction
The Wigner-Dyson-Mehta conjecture asserts that the local eigenvalue statistics of large random matrices are universal in the sense that they depend only on the symmetry class of the model -real symmetric or complex Hermitian -but are otherwise independent of the underlying details of the model, such as the distribution of the individual matrix entries. In particular they agree with the case that the entries are real or complex iid Gaussians -the so-called Gaussian Orthogonal and Unitary Ensembles (GOE/GUE) -for which there are explicit formulas. The past decade has seen spectacular progress in the study of local statistics of random matrix ensembles. In a series of works [21,18,30,25,20,23] the Wigner-Dyson-Mehta conjecture was established for Wigner ensembles which consist of random matrices with independent entries of identical variance. Parallel results were obtained independently in various cases in [47,46].
The Wigner-Dyson-Mehta conjecture extends beyond the class of Wigner ensembles. In fact the results of [21,18,30,25,20,23] also apply to generalized Wigner ensembles in which the variances are allowed to vary but are assumed to be of the same order, and the matrix of variances is assumed to be stochastic. In the work [5] the authors consider matrix ensembles of general Wigner-type in which the stochasticity condition on the variances is dropped. In [1] the authors consider the adjacency matrix of a sparse random graph model whose degree distribution satisfies a power law. For such models the global statistics no longer follow the semicircle law, however the universality of the local statistics is unchanged. Universality has been proved for a class of random band matrices [12], deformed Wigner ensembles [40,41] and the adjacency matrices of sparse random graphs [18,37,1].
The study of these models has relied heavily on the independence of the matrix entries and the local statistics of matrices with a general correlation structure have not been considered. Existing work on correlated random matrices has been restricted to models that have a specific correlation structure which could be exploited. The universality of the adjacency matrices of random regular graphs was obtained in [11,10]. Universality for Gaussian matrices with a translation invariant correlation structure was studied in [6,4]. The local law was obtained for certain additive models in [8,9]. Universality was obtained for sparse random graph Laplacians in [36] and for polynomials of certain Gaussian matrices in [31,33]. Apart from results on the local scale, the convergence of empirical measure on the global scale was obtained for models with translation invariant correlation (see e.g. [42,45,44,14,15]) and for a model with piecewise translation invariant correlation [7]. In all cases the analysis relied on the special structure of the matrix ensemble.
In this article we extend the Wigner-Dyson-Mehta conjecture to real symmetric random matrices with a general correlation structure, E [x ij x kl ] = ξ ijkl . Our assumptions are that ξ is the discretization of piecewise Lipschitz function, and that the correlation is short-ranged, i.e., that the (i, j)th and (k, l)th entry are independent if either |i − k| or |j − l| > K where K is fixed. Under these conditions we obtain a global law for the empirical eigenvalue density. Under the additional hypothesis that ξ is strictly positive definite we obtain a local law as well as universality -that the local statistics coincide with the GOE in the limit N → ∞.
In addition to proving universality for Wigner matrices, the works of Erdős-Yau et.al, [21,18,30,25,20,23] established a robust framework for proving universality for general matrix models. This approach consists of a three-step strategy: Universality of random matrices with correlated entries The second ingredient is that the fixed point equation m = F (m) is stable, i.e., that an approximate solution is in fact close to the solution. In the case of the semicircle law and Wigner matrices the stability of the above scalar equation is trivial. In order to prove the local law for generalized Wigner matrices one adopts a similar strategy but in this case shows that the vector v := (G ii (z)) i satisfies an approximate vector fixed point equation.
In this case the matrix of variances is stochastic and so the solution is in fact a constant vector.
In order to study the general Wigner type matrices [5], the Gaussian matrix with translation-invariant correlation [4] and the sparse random graph ensembles of [1] one shows, as in the generalized Wigner case, that the vector v = (G ii (z)) i satisfies a vector fixed point equation When the matrix of variances S is not stochastic the solution to the above fixed point equation is not in general a constant vector. This type of equations is often crucial in identifying the limiting eigenvalue distributions of random matrices. One of the key contributions of [2] and [1] is to show that the above equation is stable in the bulk of the limiting spectrum, which is needed for proving the local law. An important element in previous works on the local law is the independence of the matrix entries. In particular, the ith row and column are independent of the ith minor; this key fact allows one to establish the approximate fixed point equation, and in the model considered here the loss of independence presents a serious challenge in this step of the proof. We take advantage of the short-range nature of the correlation and find that the entire matrix of Green's function elements G = (G ij ) ij satisfies a fixed point where now F is a function on the space of N × N matrices. In particular it is no longer sufficient to control only the trace m N or the vector of diagonal entries (G ii ) i , as the offdiagonal entries G ij are not necessarily small. This generalizes the equations considered in, e.g. [2], [1]. The assumption that ξ is a discretization of a Lipschitz function ψ allows us to construct a limiting version of the equation (1.3) on an auxilliary function space and establish stability for the finite N equation. A similar equation was derived in [43], where the authors considered H 0 + W where H 0 is a given symmetric matrix and W is a Gaussian matrix with correlation between the matrix entries. In [32], similar equations were considered for random matrices with certain block structures. The equation was studied in [35] in a more general setting, where unique solvability was proved for an operator-valued self-consistent equation by applying the Earle-Hamilton fixed point theorem to a subdomain of a C * -algebra. We use the same argument to show the unique solvability and stability of the matrix equation (1.3) for fixed N and z ∈ C + . In the limit N → ∞, we obtain a limiting equation in a functional space T := L ∞ ([0, 1), K) where K is the space of convolution operators on l 2 (Z): Here Ψ is an integral operator and the inverse is taken in the space K. The limiting equation (1.4), after Fourier transform, is a quadratic vector equation similar to (1.2). In [2], it is proved that this type of quadratic vector equations are stable in the bulk of spectrum, which is needed in our proof of the local law. The proof of stability in the bulk relies on the Krein-Rutman theorem on positive integral operators. We then show that one can approximate the solution of the finite N equation (1.3) by the solution of (1.4) with an O N −1 error. This approximation scheme enables us to prove the stability of (1.3) in the bulk. As mentioned above, the remainder of the strategy developed in [21,18,30,25,20,23] to prove universality of consists of analyzing the DBM flow starting from our correlated matrix ensemble and showing that the statistics are unchanged under the DBM flow.
We modify the DBM flow in order to preserve the correlation structure of our model. We then apply the results of [39] to show that the statistics agree with the GOE after a short time. In the perturbation step we rely on the argument of [13] which shows that the statistics are unchanged.
The main new ideas of this article are as follows. 1) In the case of matrices with independent entries, the Schur complement formula is a useful tool for "decoupling" a row and column of the matrix from its minor, but is no longer effective in the correlated case. We replace the Schur complement formula with a simple method that provides such a decoupling in the case of finite range correlations. 2) The off-diagonal entries of G are possibly of order 1, which does not happen for Wigner matrices and causes serious trouble in the correlated case. We solve this by finding a precise estimate for the off-diagonal entries based on the properties of solution of (1.3). 3) We develop a scheme to approximate the continuum solution of (1.4) by the matrix solution of (1.3), which enables us to identify the limit of the Stieltjes transform 1 N tr G and prove stability in the bulk.
In this article we focus on real symmetric matrices. The main results are easily generalized to complex Hermitian matrices. Parallel results can be obtained for sample covariance matrices using the same approach, as this case can be reduced to analyzing the eigenvalues of Hermitian matrices. The assumption that the correlation is finiteranged can be relaxed to, e.g. assuming that the correlation between the matrix entries decays exponentially fast (plus some more assumptions on weak dependence of distant entries). However, this extension is technical and is not included in this article.
We outline the rest of the article. In Section 2 we define the model and lay out the assumptions, introduce the self-consistent equations, then state the main results. In Section 3, we show that the Green's function satisfies the self-consistent equation up to an error term. In Section 4 we solve the self-consistent equation and prove that it converges to a limit in a certain sense; we also show that the self-consistent equation is stable under small perturbation. Section 5 is devoted to proving two of the main results, the global law and the local law of the Stieltjes transform of empirical measure. In Section 6, we prove the universality of local statistics of eigenvalues using the local law and a result in [39].

Definition of the model
For each N ∈ N, we consider an array of centered real random variables (x ij ) 1≤i≤j≤N . We assume that there is a four dimensional tensor ξ = ξ (N ) such that We assume that the (x ij ) are K-dependent for some constant K > 0 in the following sense: there is no correlation between entries, the spectrum is asymptotically the semicircle law. The spectrum becomes more singular as the correlation gets stronger and the range of correlation increases. Figure 1a is a sample from a Gaussian Orhtogonal Ensemble.
The sample matrix in Figure 1b is constructed from a GOE by adding to each entry 0.3 times the sum of its neighboring entries. The sample matrix in Figure 1c is constructed from a GOE by adding to each entry 0.5 times the sum of its neighboring entries. The sample matrix in 1d is constructed from the sample in 1b by adding to each entry 0.3 times the sum of its neighboring entries. In this article, a constant only depending on K and (µ p ) (defined below) is regarded as a universal constant and we will omit the dependence. We would like the model to include the adjacency matrices of random sparse graphs, in which there are roughly N τ edges connected to each of N vertices. We therefore introduce a sparsity parameter q = N τ for some fixed τ ∈ (0, 1]. We assume that there is a sequence of constants (µ p ) p∈N such that (x ij ) satisfies the bounds Without loss of generality assume µ 2 = 1 so that sup i,j Var x ij ≤ 1. This implies in particular that so that H is roughly of order 1. We are going to analyze the Green's function G(z) given by Throughout this article, we always denote the imaginary part of z by η. The empirical measure µ N : As N → ∞, the limit of the empirical measure, if it exists, is in general not the semicircle law unless the correlation between matrix entries are rather weak.
Finally, we state a condition that will be assumed in some but not all of the main results. In each main result, we will specify whether the condition is assumed or not. We consider the family of random variables (x ij ) 1≤i≤j≤N as a random vector in R N (N +1)/2 . The condition roughly says that almost surely, the family of random variables (x ij ) 1≤i≤j≤N does not admit a non-trivial linear relation.
be the covariance matrix of the family of random variables (x ij ) 1≤i≤j≤N . We say that the tensor ξ is positive definite with lower bound c 0 > 0 if Σ (N ) ≥ c 0 for all N .
When the condition is assumed, any constant that depends on on c 0 will be seen as a universal constant and we will omit the dependence. Here we remark that the condition is very mild, since if (w ij ) is a family of i.i.d. random variables with mean 0 and a small variance ε > 0, then the ξ of (x ij + w ij ) will be positive definite with lower bound ε. In particular, if the family of random varialbes (x ij ) 1≤i≤j≤N are i.i.d. with mean 0 and variance 1, then the matrix Σ (N ) is the identity matrix.

Remark 2.3.
The assumption that K is fixed can be relaxed to e.g. K = (log N ) log log N with little technical difficulty. We refrain from doing so in order to make the argument transparent to the reader.

Remark 2.4.
In this article we focus on real symmetric matrices. However, as mentioned in the introduction, the argument can be easily generalized to the complex case.

Self-consistent equations
Before stating the main results, we would like to introduce the equations that the Green's function satisfies. As ξ ijkl has only been defined for i ≤ j, k ≤ l, we will extend it to all (i, j, k, l) ∈ N 4 . For technical reasons we extend it in such a way: for i > j or k > l Note that now (2.1) is not always true for each (i, j, k, l). To compensate for this, let w be a random variable uniformly distributed on [0, 1) and definê (2.4) One can easily see that ξ ijkl = E x * ijx kl holds for all (i, j, k, l) ∈ N 4 . For each N define a matrix-valued map Ξ : For each N ∈ N and z ∈ C + we consider the equation We will show that the Green's function G is approximately the solution of this equation when N is large. In Section 4 we show that the equation is uniquely solvable in a certain class of matrices, by a fixed point argument. In order to apply a fixed point argument, we define a map F : Naturally, one expects M has some sort of limit when N → ∞. In order to identify the limit, we consider the space where K is the space of bi-infinite complex-valued sequences that act as bounded convolution operators on l 2 (Z), i.e., An element in T can be regarded either as a function from [0, 1) × Z → C or a function from [0, 1) → K. To clarify notations, for any f ∈ T and each ζ ∈ [0, 1), we denote f (ζ) := (f (ζ, k)) k∈Z as a member in K. Letf denote the inverse Fourier transform in the k variable, i.e.,f (θ, ζ) := k f (θ, k)e i2πkζ . It is well known that the norm of a = (a(k)) k∈Z ∈ K satisfies a = ǎ ∞ (see e.g. [38]). The norm on T therefore satisfies f T = sup (2.10) Here dl denotes the counting measure on Z. Define the inverse f −1 of f on T through Here the second inverse is taken in the space K. We regard z ∈ C as an element f z ∈ T given by f z (θ, k) = zδ k0 . Now we are ready to write down the self-consistent equation on T : m = (−z − Ψ(m)) −1 .

(2.12)
This equation is uniquely solvable in a certain subdomain of T . It turns out that if m is the solution to the equation above, tr m(z) (see Definition 2.5) is the Stieltjes transform of a probability measure on R, i.e., there is a probability measure µ on R such that If ξ is positive definite in the sense of Definition 2.2, then µ has a continuous density µ(dx) = ρ(x)dx . (2.13) In Section 4 we will see that under inverse Fourier transform, equation (2.12) becomes a quadratic vector equation as studied in [2,3], where the behavior of the limit density ρ is described in detail.

Main results
Our first main theorem concerns the unique solvability of equation (2.6) and equation (2.12). The theorem follows from Theorem 4.6, Theorem 4.17 and Theorem 4.22. Apart from unique solvability, the equations are also stable under small perturbations, but we will state the stability results in Section 4. Recall the definition (2.8) of the space T . The operator norm of a matrix A is denoted by A .  The assumption that Im tr m is bounded below on D might look a bit odd. However, this is satisfied if ρ(Re z) is bounded below for z ∈ D, since ρ(Re z) = lim Im z→0 + 1 π Im tr m(z).
Before stating the limiting laws of 1 N tr G, we introduce a deterministic control parameter that will frequently appear throughout the article.
Now we state the global law of 1 N tr G which says that with very high probability, 1 N tr G(z) converges to tr m(z) uniformly in compact subsets of C + . Moreover, each entry of G is well approximated by the corresponding entry of the deterministic matrix M which solves equation (2.6). Note that M is a N × N deterministic matrix depending on N and z. In this paper, the notation A ⊂⊂ B means that A's closure is a compact subset of the interior of the set B. P sup The local law of 1 N tr G says that if Re z is in the bulk of the limit density, then with very high probability, 1 N tr G(z) converges to tr m(z). Moreover, each entry of G is well approximated by the corresponding entry of the deterministic matrix M which solves equation (2.6), with error roughly of the size Φ.
Then for σ small enough and p large enough, the following estimates hold for all N ≥ N ω,σ,p Delocalization of eigenvectors says that the eigenvectors corresponding to bulk eigenvalues are flat. Proving the corollary from Theorem 2.11 can be done by a routine argument (see e.g. [20]). γ −∞ ρ(x)dx = k/N } be the k-th classical location of the limiting density, and u k = (u k (i)) i be the eigenvector associated with λ k . Let ω > 0 be a fixed number and σ be an arbitrarily small number, then hold with probability 1 − N −σp for p large enough and N ≥ N ω,σ,p .
As a consequence of Theorem 2.11, we have the universality of k-point correlation functions in the bulk. For a point process Π on the real line, the k-point correlation function is defined as follows: The correlation function is not well defined when the point process lives on a discrete set, which happens when each of the matrix entries satisfies a discrete distribution. In that case, we denote ρ (k) (y 1 , · · · , y k )dy 1 · · · dy k to be the measure (which is called the k-th factorial moment measure) such that for any test function O ∈ C c (R k ), Theorem 2.13 (Bulk universality). Assume that ξ is positive definite (see Definition 2.2). Let E ∈ R be in the bulk of ρ, that is, ρ has a positive density in a neighborhood of E.
Let O be a test function on R k . Fix a parameter b = N −1+c for any c > 0. We have,

Derivation of the self-consistent equations
In this section we will show that the Green's function G approximately satisfies the equation (2.6), up to an error term of size Φ. The main result of this section is Lemma 3.10. The main tools are some algebraic identities stated in Subsection 3.1 and concentration inequalities of quadratic forms of weakly dependent random variables in Subsection 3.2.

Resolvent indenities
Definition 3.2. We will frequently use a stochastic control parameter For technical reasons we also need the following stochastic control parameter where the second sup is taken over all I, J ⊂ [i − 2K, . . . , i + 2K] such that I ∩ J = ∅.
We prove two resolvent identities that we will frequently use in the article. These identities were named the first decoupling identity [24] and second resolvent decoupling identity [29]. Although their proofs are very simple; they play fundamental roles in the proof of local laws for random matrices.
Proof. The second equation follows from taking the (T, J)-block of the resolvent identity On the other hand, taking the (I, J)-block of the same identity one has Combining this equation with (3.3) yields (3.2).
The first resolvent identity (3.2) immediately impliy the following corollary, which will be used many times in the rest of the section.
We will also need the Ward identity for the resolvent of Hermitian matrices.

Concentration inequalities
The following lemma is a corollary of Lemma A.1 in [19].
Let (A i ), (B ij ) be deterministic families or random variables that are independent of (a i ).
Then for any σ > 0 and p large enough we have with probability at Here c p depends only on p.
Proof. For the first estimate, one only need to split the sum i A i a i into at most K parts, each part being the sum of independent random variables. One can apply Lemma A.1 in [19] to each part and get the first estimate. For the second estimate, one can write The first sum on the right hand side can be split into at most K 2 parts and each part is estimated using Lemma A.1 in [19]. The second sum on the right hand side can be estimated using (3.4).

Corollary 3.7.
Under the same condition as Lemma 3.8. Assume further that Here c is an universal constant.
Proof. By Lemma 3.6 and the Ward identity, for any σ > 0 and p ≥ 2 we have with The second estimate holds similarly.

Expansion of the Green's function
Throughout this section, we fix an arbitrarily small σ > 0 and an integer p ≥ 100/σ. Definition 3.8. Let (a (N ) ) and (b (N ) ) be two sequences of random variables. We say that a = O σ,p (b) if there are universal constants c and N 0 ∈ N depending on σ and p such that |a| ≤ cb holds with probability at least 1 − (log N ) c N −σp when N > N 0 .
We start with the trivial identity Let T be the set of indices that are correlated with j.
There are two cases: The lemma follows from (3.7) and (3.8).
T c ,T c H T c ,j that appears on the left hand side of (3.6). Now let S ⊂ T c be the set of indices correlated with T and let U = S ∪ T.

Then we can split
The good news is that one can condition on the index set U c and apply Lemma 3.6 to the first term on the right hand side of (3.10), which yields the following lemma: Proof. In view of (3.6) and (3.10), it is sufficient to prove that We first estimate the first term on the left hand side. By Corollary 3.7 and the fact that On the other hand we use Lemma 3.6 to get Combining all the three estimates above we get (3.13) The lemma follows from (3.12) and (3.13).

Solving the self-consistent equations
In this section, we show the unique solvability and stability of equation (2.6) and (2.12). We also show that as N goes to infinity, the solution M to (2.6) converges to the solution m of (2.12) in a certain sense. We give an explicit construction of M from m, up to an error of size O N −1 .

Solution for fixed N and z
The strategy to solve (2.6) is to write it as a fixed point equation and apply the Earle-Hamilton fixed point theorem to a certain subdomain of C N ×N . This was done by Helton et al. [35] in a general setting for operator-valued self-consistent equations. For the readers' convenience, we give self-contained proofs in our case. We also prove the stability of (2.6) under perturbations that has small · ∞ norm. This relies on the off-diagonal decay of the solution. The main results of this subsection are Theorem 4.6 and Theorem 4.12.
We restate the self-consistent equation below, recalling the map Ξ defined in (2.5): It is remarkable that Ξ(M ) is a band matrix with band width 2K + 1, which will be used to prove the exponential decay of off-diagonal entries of the solution. We are going to solve the equation using a fixed-point argument. For this purpose, we define Then equation (2.6) becomes simply However, F is not defined on the entire space C N ×N , therefore we introduce a domain where F is well-defined.
In the sequel, we denote the operator norm of a matrix A by A and denote   (2.4)). Therefore, we only need to show that for any unit vector v, The two estimates above yield M ∈ M N ( ε δ 2 , 1 ε ).
The following corollary tells us that the map F not only maps M + N to itself, but also takes a compact subset of M + N to its 'strict' interior.  The image of M N (0, δ/2) under the map F is contained in M N (ε, δ/4), whose ε-neighborhood is a subset of M N (0, δ/2).
Proof. If M ∈ M N (0, δ/2), then by (4.2), However, we need more than this theorem to get the stability of solutions. Therefore, we prove the following lemma in our settings. The proof is a slight modification of that of Theorem 4.4 (see [34] for its proof).

Lemma 4.5.
There is a metric d on M N (0, δ/2) such that the map F defined in (4.1) is a strict contraction. In particular, The metric is equivalent to · in the interior of M N (0, δ/2) in the sense that it Proof. Let ∆ be the unit disk in C. α(γ(t), γ (t))dt .
Then we define the distance d(Q 1 , Q 2 ) between Q 1 and Q 2 by minimizing the length of curves connecting two points: which is a map from M N (0, δ/2) to itself, because the diameter of M N (0, δ/2) is at most δ. Taking the diffrential at Q 0 , DF (Q 0 ) = (1 + ε/δ)DF (Q 0 ). Let g : M N (0, δ/2) → ∆ be holomorphic. By the chain rule, for any V ∈ C N ×N , Since g is arbitrary, by definition of α, This is true for any Q 0 ∈ M N (0, δ/2). Let γ be a smooth curve in M N (0, δ/2), then for any t ∈ [0, 1] we set V = γ (t) and Q 0 = γ(t) and see that Integrating over t, we have By definition of the metric d we have Thus we have proved that F is a strict contraction under d.
Integrating over t yields On the other hand, for any Q 0 ∈ M N (β, δ/2 − β), V ∈ C N ×N and holomorphic g : Thus α(Q 0 , V ) ≤ β −1 V . It follows that for any piecewise smooth curve γ ⊂ M N (β, δ/2− β), we have α(γ, γ ) ≤ β −1 γ . By the convexity of M N (β, δ/2 − β), one can always find a curve γ ⊂ M N (β, δ/2 − β) that connects Q 1 and Q 2 . Integrating over t, Proof. Take any Q 0 ∈ M + N . Define recursively Q k+1 := F (Q k ). It is easy to check that The stability of solution follows from the fact that F is a strict contraction under the metric d and that d is equivalent to · in the interior of M N (0, δ/2).    . Then, Proof. This theorem is a immediate consequence of Lemma 4.9 below, which is a corollary of Theorem 2.4 in [16]. .

Proof. By the simple observation that
Theorem 4.10, which we state below, implies that .
Combing the above estimates with the trivial bound |A * (i, k)| ≤ A and that κ(AA * ) ≤ κ(A) 2 , we have . We state below the theorem that we used in the proof. The original theorem consists of two cases, where the matrix A is positive definite or A is not positive definite. Our situation is the second case, but the result there is not as strong as we need, thus we do not directly apply it. We only state the first case of original theorem for our purpose. Besides the off-diagonal decay of M , another important observation is that the map Ξ has a 'mollifying effect', as illustrated by Lemma 4.11 below. This lemma, although fairly simple, is very important because we will use it over and over again in the proof of stability. In the sequel, A ∞ := sup i,j |A ij | for any A ∈ C N ×N . Proof. The claim that Ξ(A) is a band matrix follows from the definition of the map Ξ. By the assumption that Var x ij ≤ 1, we have |ξ ijkl | ≤ 1, which implies The conclusion of the lemma follows from the fact that A ∞ ≤ A .
Once whereR ij = R ij 1 |i−j|≤K . Now thatR is a band matrix with band width 2K + 1, we have R ≤ (2K + 1) R ∞ . Therefore, when z is small enough, Theorem 4.7 implies that

Solution to the limiting equation
Recall that in Subsection 2.2 we defined T = L ∞ ([0, 1), K), where K is the space of bi-infinite sequences viewed as convolution operators on l 2 (Z). For the readers' convenience, we restate equation (2.12) below, recalling the definition (2.10) of Ψ. The inverse in the space T is defined in (2.11). m = (−z − Ψ(m)) −1 .
Since we will also solve it by a fixed point argument, we define a map F on T by The map F is not well-defined on the entire space, thus we introduce a subdomain of T such that F is well-defined. Recall that the inverse Fourier transform of f ∈ T in the second variable is denoted byf .  Su(θ, s) := ψ (θ, φ, s, t)u(φ, t)dφdt,  Proof. Take an arbitrary real continuous function g ∈ C([0, 1] 2 ). For each N ∈ N define a random variable Herex is defined as in (2.4). One can easily compute the variance of Y N : (4.11) Since g is arbitrary, we conclude that 0 ≤ψ(θ, φ, s, t) ≤ K 2 . If ξ is positive definite, then Var Y N ≥ c 0 . Letting N → ∞ we haveψ(θ, φ, s, t) ≥ c 0 . Finally, the claim that Ψ is a bounded operator follows from the upper bound ofψ.
This lemma together with (4.9) immediately yields the following corollary: The solution m is Liptchitz on each I j (see (2.3)).
The off-diagonal entries of m decay exponentially: The solution is stable in the sense that if m ∈ T (0, δ/2) satisfies a perturbed equation Proof. By Lemma 4.15 we see that if inf θ,sf (θ, s) ≥ 0, then (−z − Sf (θ, s)) −1 ∈ (ε, δ/4) for all (θ, s) ∈ [0, 1) 2 . Therefore, F maps T (0, δ/2) to T (ε, δ/4), whose ε-neighborhood is a subset of T (0, δ/2). Now one can repeat exactly the same argument in the proof of Lemma 4.5, because the argument has nothing to do with the structure of T except for that T is a complex Banach space. Therefore, there is a metric d on T (0, δ/2) under which the map F is a strict contraction satisfying Thus one can obtain the solution by iterating F from an arbitrary initial point Q 0 ∈ T (0, δ/2). The exponential decay of m(θ, k) follows from Lemma 4.9. The stability follows from exactly the same argument in the proof of Theorem 4.7.
Note that in the above theorem, the stability relies on the η and |z|. If η is too small or |z| is too large, the estimate breaks down. The following theorem says that equation (2.12) is stable for z ∈ C + as long as Im tr m(z) is positive, even if z is very close to the real axis. The theorem follows from Theorem 2.12 of [2] and Lemma 4.15. For the sake of completeness, we give a self-contained proof in our setting. Proof. Let u, u andř be the inverse Fourier transform in the second variable of m, m and r, respectively. Consider a spaceŤ = L ∞ ([0, 1) 2 ) which is isometric to T by Fourier transform. Then, u ∈Ť satisfies a perturbed version of (4.7): In the sequel, we denote the norm onŤ by · ∞ . Then the assumptions of the theorem translated into u andř reads Im u(θ, s)dθdsu ≥ ω > 0 , In view of Corollary 4.16, if ε ω is small enough, then inf θ,s Im u(θ, s) ∧ Im u (θ, s) ≥ c ω .
Dividing both sides by |u| and note that η ≥ 0 we have Im u |u| ≥ |u| (S Im u) . Now define an operator T on T by Denote w := Im u |u| . Recall the definition of S, then (4.13) becomes w = T w .
Note that T is an self-adjoint integral operator with a strictly positive integral kernel, by Krein-Rutman theorem, T 's largest eigenvalues is 1 and has a positive spectral gap δ depending on c 0 . Now take the difference between (4.7) and (4.12), (4.14) Note that (z + Su) −2 = u 2 . Denote q := u−u |u| and define a function α ∈Ť through e iα := u/ |u|, then q = e i2α T q + O c ω u − u 2 ∞ −ř . We claim that 1 − e i2α T is invertible and its inverse is bounded by some C ω . To prove this claim, it is sufficient to prove Re(v * e i2α T v) ≤ 1 − c ω for any unit vector v ∈ L 2 ([0, 1) 2 ) and some c ω > 0. Write v = v 1 + v 2 where v 1 is parallel to w and v 2 is orthogonal to w. (4.16) The first term on the right hand side by definition equals Note that cos(2α) ≤ 1 − c ω since Im u ≥ ω and |u| ≤ C ω , thus Plugging into (4.16) we have, This estimate is useful when v 2 is small. On the other hand, Here we have used the spectral gap of T . The right hand side is small when v 2 is big. Combining (4.18) and (4.19) we have, for some new constant c ω > 0, (4.20) Thus we have proved the claim. Therefore, (4.15) yileds Recall that q = (u − u )/ |u|, and that |u| is bounded, we have Take ε ω small enough, then we get Plugging in (4.14) and recalling that S is an integral operator with bounded integral kernel, we have By the definitions of u, u andř, the above estimate is equivalent to m − m ≤ c ω r .
One expects the solution M of (2.6) converges to the solution m of (2.12) as N → ∞.
However, it is not clear how to define the limit of a sequence of band matrices whose sizes go to infinity. Fortunately, we can show that 1 N tr M , as a sequence of holomorphic functions on C + , does converge to tr m. The trick is that one can 'imbed' M into the space T and show that this gives an approximate solution of (2.12) which is close to m. We claim that −q is in the domain T + . To prove this claim, we need to show that for any (θ, s) ∈ [0, 1) 2 , − Im k q(θ, k)e i2πsk ≥ 0. For θ whose N −1/2 -neighborhood is contained in one of the I α 's, take a vector The left hand side is non-negative because Therefore, − Im k q(θ, k)e i2πsk ≥ η/2 when N is large enough. For θ whose N −1/2neighborhood contains the endpoints of the I j 's, the estimate also holds because of the continuity of M −1 . Thus we have proved the claim. Now we hope thatm is an approximate solution to (2.12). However, it is not clear whetherm is in the domain T + or not. Instead, F(m) ≈ (−q) −1 is in the space T + . Thus, we prove that F(m) is an approximate solution, then we show that tr F(m) is close to m. We first show thatm = F(m) approximately holds. By definition lm (θ, l)q(θ, h − l) = δ 0h − w(θ, h) , (4.21) where w(θ, h) = |l−h|≤Km (θ, l)((M −1 ) N θ+l , N θ +h − q(θ, h − l)). Now we estimate w(θ) . Note that w(θ) ≤ h |w(θ, h)|, by the off-diagonal decay ofm, Here α z is a constant less than 1 and depending on z. Integrating over θ, we have w(θ) dθ ≤ c z /N . Now we estimate q − (−z − Ψ(m)), which is roughly the difference between an integral and its Riemann sum. For θ whose K/N -neighborhood is contained in one of the I α 's, This combined with (4.21) and (4.22) yields m(−z − Ψ(m))(θ) dθ ≤ c z /N .
Remember that −q ∈ T (η/2, c z ) when N is large. Since −q is close to (z + Ψ(m)), we have a bound (z + Ψ(m)) ∈ T (c z , c z ) when N is large. Therefore, it is comfortable to take the inverse of (−z − Ψ(m)), which is F(m). Moreover, F(m) has exponential decaying off-diagonal entries. Therefore, we multiply F(m) to the estimate above and see

Stability in the bulk
In Theorem 4.18 we see that if Im tr m(z) is bounded below, then the solution m of (2.12) is stable under small perturbations, even if z is close to the real axis. In this subsection we show that under the same assumption that Im tr m(z) is bounded below, the solution M to the finite-demensional equation (2.6) is also stable (Theorem 4.23). The strategy is to show that one can approximate m by 'imbedding' M into the space T . The stability of m will imply the stability of M . Before proving the stability, we also prove Theorem 4.22 which says that 1 N tr M converges to tr m in any domain where Im tr m is bounded below. Theorem 4.22 will be used in the proof of Theorem 4.23.
We will need the following property of the map Ξ. Proof. Assume that A has spectral decomposition A ij = α u α i u α j λ α . Then for By the assumption that ξ is positive definite, i,j,k,l v i v k ξ ijkl u α j u α l is bounded below by c 0 . It follows that v * Ξ(A)v ≥ c 0 .

Corollary 4.21.
Assume that ξ is positive definite with lower bound c 0 > 0 in the sense of Definition 2.2. Suppose |z| ≤ ω −1 and Q ∈ M + N satisfies Im tr Q ≥ ω > 0, then there is a c ω such that F (Q) ∈ M n (c ω , c −1 ω ). In particular, assume M is the solution to equation (2.6). Suppose |z| ≤ ω −1 and Im tr M ≥ ω > 0. Then Proof. By Lemma 4.20, Im tr Q ≥ ω and Q ∈ M + N implies The following theorem says when Im tr m(z) is bounded below, then the solution M to (2.6) converges to m in a certain sense. Again the strategy is similar to Theorem 4.19, that is, we 'imbed' M into the space T and show that this gives an approximate solution that is close to m. However, one needs to use a bootstrapping argument, because we will use Theorem 4.18 that assume the smallness of m − m where m is the approximate solution constructed from M . We do not have an a priori bound for m − m when z is close to the real axis. Therefore we need to start with z far from the real axis, then iteratively get the bound close to the real axis. Proof. Fix a z ∈ D such that tr m − 1 N tr M ≤ N −1/2 . Therefore, 1 N tr M ≥ ω/2 when N is large. Such a z exists by Theorem 4.19.
The left hand side has a positive imaginary part, indeed, Therefore, − Im k q(θ, k)e i2πsk ≥ c ω when N is big enough. For θ whose N −1/2neighborhood contains the endpoints of the I j 's, the estimate also holds because of the continuity of M −1 . Thus we have proved the claim. Now we hope thatm is an approximate solution to (2.12). However, it is not clear whetherm is in the domain T + or not. Instead we prove that F(m) is an approximate solution, then show that tr F(m) is close to m. We first show thatm = F(m) approximately holds. By definition, lm (θ, l)q(θ, h − l) = δ 0h − w(θ, h) , (4.25) where w(θ, h) = |l−h|≤Km (θ, l)((M −1 ) N θ+l , N θ +h − q(θ, h − l)). Now we estimate w(θ) . Note that w(θ) ≤ h |w(θ, h)|, by the off-diagonal decay ofm, Integrating over θ, we have w(θ) dθ ≤ c D /N . Now we estimate q − (−z − Ψ(m)), which is roughly the difference between an integral and its Riemann sum. For θ whose K/N -neighborhood is contained in one of the I α 's, Recall that −q ∈ T (c D , c D ) when N is large. Since −q is close to (z + Ψ(m)) we have a bound (z + Ψ(m)) ∈ T (c D , c D ) when N is large. Therefore, it is comfortable to take the inverse of −(z + Ψ(m)), which is F(m). Moreover, F(m) has exponential decaying off-diagonal entries. Therefore, we multiply F(m) to the estimate above and see m − F(m) dθ ≤ c D /N . (4.28) Applying the map f → −z − Ψ(f ) to bothm and F(m), Proof. At the end of this proof, we need the quantitym that was defined in the proof of Theorem 4.22 and some estimates obtained in that proof. First we prove the theorem under the additional assumption that R is an band matrix with band width (2K + 1).
Later on we will remove this additional assumption.
We first get bounds on F (M ). When ε D is small enough, we have 1 Define m ∈ T by m (θ, k) = M N θ , N θ +k and q ∈ T by q(θ, k) = (F (M ) −1 ) N θ , N θ +k if θ's K/N neighborhood is contained in one of the I α 's (see (2.3)) and q(θ, k) := iωδ 0k otherwise. We claim that −q is in the domain T (c D , C D ). To prove this claim, we need to show that −q ∈ T (c D , +∞), that is, for any θ, s ∈ [0, 1), − Im k q(θ, k)e i2πsk ≥ c D . For θ whose N −1/2 -neighborhood is contained in one of the I α 's, take a vector v = (v k ) = (e i2πsk 1 |N k−θ|≤N −1/2 ) .
Therefore, − Im k q(θ, k)e i2πsk ≥ c D when N is big enough. For θ whose N −1/2neighborhood contains the endpoints of the I j 's, the estimate also holds because of the continuity of Ξ(M ). Thus we have proved the claim. Next, we prove that F(m ) is an approximate solution. We first show that m = F(m ) approximately holds. By definition l m (θ, l)q(θ, h − l) = δ 0h − w(θ, h) , (4.31) where w(θ, h) = |l−h|≤K m (θ, l)((M −1 ) N θ+l , N θ +h − q(θ, h − l)). Now we estimate w(θ) . Note that w(θ) ≤ h |w(θ, h)|, by the off-diagonal decay of m , Integrating over θ, we have Now we estimate q − (−z − Ψ(m )), which is roughly the difference between an integral and its Riemann sum. For θ whose K/N -neighborhood is contained in one of the I α 's, This combined with (4.31) and (4.32) yields Recall that q ∈ T (c D , c D ) when N is large. Since −q is close to (z + Ψ(m )) we have a bound (z + Ψ(m )) ∈ T (c D , c D ) when N is large. Therefore, it is comfortable to take the inverse of −(z + Ψ(m )), which is F(m ). Moreover, F(m ) has exponential decaying off-diagonal entries. Therefore, we multiply F(m ) to the estimate above and see Applying the map f → −z − Ψ(f ) to both m and F(m ), Now we take the inverse of (−z − Ψ(m )) and (−z − Ψ(F(m ))). Because everything is bounded, we have  By definition of m andm,

Proof of the global law
In order to apply Lemma 3.10, we need estimates for Γ and γ for fixed z: with probability 1 − c p N −σp+s . Here s is a universal constant.
Proof. Clearly Γ ≤ η −1 . In order to bound γ, we denote K := I ∪ J for any I, J ⊂ [i − 2K, . . . , i + 2K] and an arbitrary i. By Schur's complement formula, The operator norm can be estimated term by term. First, For the readers' convenience, we restate the global law below: Proof. By Lemma 3.10 and 5.1 and the fact that G is Liptchitz on D we have In other words, Here the error term satisfies F (G)R ∞ = O ( R ∞ ) by Lemma 4.9. Now Theorem 4.12 immediately implies the conclusion, with tr m replaced by 1 N tr M . The proof is concluded by using Theorem 4.19 and taking ν < σ small enough.

Proof of the local law
In order to apply Lemma 3.10, we need estimates for Γ and γ. The estimate for γ is easy to get when the matrix entries are independent, since the off-diagonals of G are small. In our case where G has possibly big off-diagonal entries, the estimate for γ relies on the properties of the solution M .
Proof. Let σ = (ν ∧ τ )/20, so that N 5σ Φ < N −σ on D ν . All the estimates in this proof hold when N ≥ N ω,σ,p , where N ω,σ,p changes from line to line, but only for finite times. For every N , choose a discrete subset Λ ⊂ D ν such that the N −10 neighborhood of Λ contains D ν . By Lemma 3.10, we have Therefore, the value of G at any point in D ν can be well approximated by the points in Λ, with error less than N −8 . Hence . Therefore we can bound the probability of a smaller set Let z 0 be a fixed number in D, the second probability is less than , which is less than c p N σp+s by Theorem 2.9. Therefore, The first estimate in the theorem follows from absorbing the constants c p and s by N −σp/2 then replacing σ by 5σ and replacing p/10 by p. The second estimate follows from the first estimate and Theorem 4.19.

Proof of bulk universality
The strategy is as follows: first, we run an Ornstein-Uhlenbeck process on the matrix entries that has the same correlation structure as (x ij ) and show that this does not change the local statistics as long as t ≤ N −1+ε . Second, we prove that the local statistics at time t = N −1+ε agrees with the local statistics of the GOE. Then we can conclude that the local statistics of the original matrix agrees with the GOE.

Correlated Ornstein-Uhlenbeck process
Let (B ij (t)) 1≤i≤j≤N be a family of Brownian motions that has the same correlation structure as (x ij ) does, i.e., Then we define X(t) to be the matrix with upper diagonal part equal to (x ij (t)) 1≤i≤j≤N . It is easy to check that X(t) has the same correlation structure as X(0) does, and that (x ij (t)) satisfy the same moment bounds (2.2). Therefore, the local law holds for each t ≥ 0. We shall need the following lemma in the sequel: Lemma 6.1. Let x = (x 1 , . . . , x m ) be an array of K-dependent real centered random variables such that sup k E x 3 k 1/3 ≤ κ 3 . Let f be a C 2 function on R m . Then, Proof. If f is a linear function, then the equality is exact without the error term. In general, let T be the set of indices correlated with i. Denote x (T) := (x k 1 k / ∈T ). By Taylor's expansion, Let S ⊂ T c be the set of indices correlated with T, and U = S ∪ T. Note that k∈T ∂ k f (x (T) )x k = k∈T ∂ k f (x (U) )x k + k∈T ,l∈S 1 0 (1−t)∂ kl f (x (U) +t(x (T) −x (U) ))x k x l dt.
Set H t := N −1/2 X(t). The above lemma enables us to compare f (H t ) and f (H 0 ) where f is a C 3 function on the matrix space.
Adding the two equations above, the first term on the right hand side of (6.4) cancels.
The conclusion follows from the observation that E [|h ij h i j h i j |] ≤ µ 3 3 N −1 q −1/2 and E [|h i j |] ≤ N −1 . Now we are ready to prove the following Green's function comparison lemma: Lemma 6.3. Let δ > 0 be arbitrary and choose an η such that N −1−δ ≤ η ≤ N −1 . For any sequence of positive integers k 1 , · · · , k n and complex parameters z m j = E m j ± iη, j = 1, . . . , k m , m = 1, . . . , n with an arbitrary choice of the signs and ρ(E j ) ≥ ω, we have the following. Let G t (z) = (H t − z) −1 be the resolvent and let f (x 1 , x 2 , . . . , x N ) be a test function such that for any multi-index α = (α 1 , . . . , α N ) with 1 ≤ |α| ≤ 3 and for any positive, sufficiently small κ, we have Proof. We consider only the n = 1, k 1 = 1 case for simplicity, the general case can be handled likewise. We want to show that In order to apply (6.4), we need to bound the derivatives of 1 N tr G θ with respect to the entries of H θ = N −1/2 θX, here G θ = (N −1/2 θX − z) −1 , θ ∈ [0, 1] Tij and (i, j) ∈ N 2 . Note that for |α| ≤ 3, by direct calculation, ∂ α 1 N tr G ≤ Γ 4 . (6.5) Note that ∂Γ ∂η ≤ Γ η , thus Γ(E + iη) ≤ Γ(E + iN −1+δ )N 2δ , where Γ(E + iN −1+δ ) can be bounded using the local law Theorem 2.11. Therefore, on an event with probability 1 − N −D for some large D. Outside this event, we have a crude bound ∂ α 1 N tr G ≤ CN 8 . It is not hard to show that Theorem 2.11 holds for G θ uniformly for θ ∈ T ij and all (i, j) ∈ N 2 by a continuity argument. Therefore the estimate (6.6) holds uniformly for θ ∈ T ij and (i, j) ∈ N 2 with some larger D > 0.
Finally, we apply Lemma 6.2 (or rather, the estimate (6.4)) to complete the proof. Lemma 6.3 enables us to approximate test functions on the microscopic scale by a standard argument. We have the following comparison theorem. For the proof we refer the readers to Theorem 6.4 in [24] . It remains to show that the local statistics of H t in the bulk agrees with that of the GOE. A important observation is that dB can be decomposed dB ij = dB 1 ij + dB 2 ij , such that B 1 and B 2 are independent and dB 2 ij is a GOE scaled by a small constant.
Therefore, when t ≥ N −1+ε , we can write x ij (t) =x ij (t) + w ij , where (w ij ) are i.i.d. Gaussian with variance N −1+ε and (x ij (t)) is a family of random variables that have a positive definite correlation structureξ which is positive definite in the sense of Definition 2.2 with lower bound c 0 − o(1). It is tedious but easy to show that the local law holds for X(t) = (x ij (t)). DenoteH t := N −1/2X (t) andG := (H t − z) −1 . The local law implies that Im 1 N trG(z) is bounded above and bounded below for z = E + iN −1+ε where E is in the bulk of the limiting spectrum.

Comparison with GOE
Theorem 2.13 is a corollary of Theorem 6.4 above and Theorem 2.4 in [39]. Here we remark that the result in [22] also applies to our case.
Consequently,H(t) satisfies the regularity condition in Theorem 2.4 of [39]. Therefore, the k-point correlation function of H t near E converges to that of the GOE in the sense of Theorem 2.4 of [39]. The conclusion of Theorem 2.13 follows in view of Theorem 6.4.