Random characteristics for Wigner matrices

We extend the random characteristics approach to Wigner matrices whose entries are not required to have a normal distribution. As an application, we give a simple and fully dynamical proof of the weak local semicircle law in the bulk.


Introduction
Starting with the seminal works of Erdős, Schlein, and Yau [15,16], a large portion of recent progress in random matrix theory rests on strong concentration of measure phenomena for the resolvent on almost microscopic scales. Such estimates are an important model-dependent step in proving Wigner-Dyson universality of the local eigenvalue statistics and uniform delocalization bounds for the eigenvectors. The mean-field setting is especially well understood: the methods in [12,17,18] give strong results and the central ideas are robust enough to cover a wide range of models (see, for example, [2, 3, 5-7, 9, 13, 14] and references therein). The most basic example consists of the $N \times N$ Wigner matrices, whose entries $H_{ij}$ are drawn, independently up to symmetry, from some density with mean zero and variance $N^{-1}$. A typical result proves that the resolvent $G(z) = (H - z)^{-1}$ has essentially deterministic entries
$$G_{ij}(z) \approx \delta_{ij}\, m_{sc}(z), \tag{1.1}$$
the approximation being valid on the smallest possible scale $\operatorname{Im} z \gg N^{-1}$. The function
$$m_{sc}(z) = \frac{1}{2\pi} \int_{-2}^{2} \frac{\sqrt{4 - x^2}}{x - z}\, dx$$
is the Stieltjes transform of the semicircle law. The approximations (1.1) are usually proved by deriving an approximate self-consistent equation
$$1 + \left(z + \langle G(z) \rangle\right) \langle G(z) \rangle \approx 0, \tag{1.2}$$
where we used the notation $\langle A \rangle = N^{-1} \operatorname{Tr} A$ that we retain throughout this paper. The stability of the self-consistent equation is then used to show that the approximate solution $\langle G(z) \rangle$ is close to the solution $m_{sc}(z)$ of the exact equation. Because it is usually not possible to prove the validity of the self-consistent equation on local scales $\operatorname{Im} z \gg N^{-1}$ directly, the analysis uses the stability of (1.2) again to show that rough estimates on the validity of (1.2) self-improve at finer scales. This idea enables a careful bootstrapping scheme, which allows one to successively improve the scale of the approximation (1.1).
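To make the exact version of the self-consistent equation (1.2) concrete, the following Python sketch evaluates the Stieltjes transform of the semicircle law through the closed form $m = (-z \pm \sqrt{z^2 - 4})/2$ and checks that $1 + (z + m)m$ vanishes. The closed form and the function names `m_sc` and `self_consistent_residual` are illustrative choices made here and do not appear in the text.

```python
import cmath


def m_sc(z: complex) -> complex:
    """Stieltjes transform of the semicircle law on [-2, 2] for Im z > 0.

    Assumption of this sketch: m_sc(z) is the root of m^2 + z*m + 1 = 0
    with positive imaginary part.
    """
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    # The two roots multiply to 1, so exactly one lies in the upper half-plane.
    if m.imag <= 0:
        m = (-z - root) / 2
    return m


def self_consistent_residual(z: complex) -> complex:
    """Left side of the exact self-consistent equation 1 + (z + m) m = 0."""
    m = m_sc(z)
    return 1 + (z + m) * m


if __name__ == "__main__":
    for z in (0.5 + 0.01j, 1j, -1.2 + 0.3j):
        print(z, m_sc(z), abs(self_consistent_residual(z)))
```

The residual is zero up to rounding for every spectral parameter in the upper half-plane. The content of the local law is that $\langle G(z) \rangle$ satisfies the same equation only approximately.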
Pastur [22] noted that the self-consistent equation (1.2) can also be viewed as the terminal constraint at $t = 1$ of the deterministic advection equation
$$\partial_t m(t,z) = m(t,z)\, \partial_z m(t,z), \tag{1.3}$$
which can be easily solved by considering a suitable set of deterministic characteristic curves.
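As a hedged illustration of the characteristic method, suppose the advection equation takes the complex Burgers form $\partial_t m = m\, \partial_z m$ with initial data $m(0, z) = -1/z$. This form is an assumption made here, chosen to be consistent with the characteristic curves $\dot{\gamma} = -\langle G \rangle$ used later in the paper. Then $m$ is constant along the curves $\gamma(t, w) = w + t/w$. The Python sketch below, whose function names are hypothetical, compares the transported value $-1/w$ with the Stieltjes transform of the semicircle law of variance $t$, taken to be the root of $t m^2 + z m + 1 = 0$ in the upper half-plane.

```python
import cmath


def m_semicircle(t: float, z: complex) -> complex:
    """Root with Im > 0 of t*m^2 + z*m + 1 = 0 (semicircle law of variance t > 0)."""
    root = cmath.sqrt(z * z - 4 * t)
    m = (-z + root) / (2 * t)
    if m.imag <= 0:
        m = (-z - root) / (2 * t)
    return m


def characteristic(t: float, w: complex) -> complex:
    """Deterministic characteristic started at w: gamma(t, w) = w - t * m(0, w)."""
    m0 = -1 / w  # assumed initial data m(0, w) = -1/w
    return w - t * m0


def transported_value(w: complex) -> complex:
    """Value carried unchanged along the characteristic started at w."""
    return -1 / w


if __name__ == "__main__":
    w = 0.3 + 1.5j  # |w| > 1 keeps the curve in the upper half-plane up to t = 1
    for t in (0.25, 0.5, 1.0):
        z = characteristic(t, w)
        print(t, z, abs(m_semicircle(t, z) - transported_value(w)))
```

At $t = 1$ the endpoint is $w + 1/w$ and the transported value $-1/w$ equals $m_{sc}(w + 1/w)$, which is the terminal constraint expressed by (1.2).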
In fact, if one generates a Gaussian Wigner matrix dynamically by evolving the entries with Brownian motion [11], one can derive a stochastic version of (1.3) with some explicit matrix-valued martingale $M(t, z)$ (see, for example, [24]). This approach can be used to derive the validity of the semicircle law on global scales $\operatorname{Im} z = O(1)$ in the infinite-volume limit [4]. Deterministic characteristic curves have also featured in the analysis on local scales in more recent random matrix literature such as [1,8,19,20]. The works [25,26] showed that the SDE (1.4) can also be analyzed on local scales by considering the evolution along the random characteristic
$$\dot{\gamma}(t,z) = -\langle G(t, \gamma(t,z)) \rangle, \tag{1.5}$$
yielding a simple dynamical mechanism for directly proving concentration of measure for the resolvent. The method thus allows one to completely separate any stability arguments from local concentration of measure estimates, thereby circumventing the need for a bootstrap argument. The purpose of this paper is to show that this approach to local resolvent estimates is not limited to Wigner matrices with Gaussian entries. The basis of this is the construction of a matrix martingale $H(t)$ whose rescaled entries follow a given density $\varrho$ at time $t = 1$. This will be possible for densities $\varrho$ satisfying the following assumption.
Assumption 1.1. The density $\varrho > 0$ is strictly positive, has zero mean and unit variance, and the function $a$ defined by the integral equation (1.6) is bounded and Lipschitz continuous on $\mathbb{R}$.
The integral equation (1.6) is equivalent to so the condition that $a$ be bounded essentially amounts to $\varrho$ being sub-Gaussian, whereas the Lipschitz continuity of $a$ is linked to the regularity of $\varrho$. Madan and Yor [21] used the Lipschitz continuity of $a$ to construct a scalar martingale $h(t)$ satisfying where $b$ is a standard Brownian motion. Inspired by an idea of Dupire [10], they then showed that Kolmogorov's forward equation implies Hence, to construct the matrix process $H(t)$, we take an array $(B_{ij})$ of standard Brownian motions that are independent up to the constraint $B_{ij} = B_{ji}$ and define $H_{ij}(t)$ as the solution of the SDE Then $H(t) = \sqrt{t}\, H$ in distribution, where $H$ is a Wigner matrix whose rescaled entries $\sqrt{N} H_{ij}$ have distribution $\varrho$. It will be important in the sequel that the quadratic variation of $H_{ij}(t)$ satisfies Combining Itô's lemma with the Neumann expansion of the resolvent yields the matrix-valued SDE for the resolvent process $G(t, z) = (H(t) - z)^{-1}$ with $z \in \mathbb{C}_+$. Multiplying out the drift gives an expression of the form The matrix-valued operator $S[t, \cdot]$, which acts on an $N \times N$ matrix $A$ as is a dominant term whose presence in (1.8) provides the self-energy corrections in the resolvent.
On the other hand, the operator should be thought of as a finite-volume error reflecting the lack of Hermitian symmetry in our model. Therefore, the self-energy correction coincides with the Itô correction. This is similar in spirit (and identical in the Gaussian case) to the cumulant expansion of He, Knowles, and Rosenthal [18]. However, the technical details are simpler here because the quadratic variation process naturally encodes the fluctuations around the Gaussian noise driving the dynamics.
We will illustrate our method by giving a new proof of the weak bulk local semicircle law for the normalized resolvent trace of Wigner matrices whose entries are drawn from densities satisfying Assumption 1.1. To state this result, we will make use of the stochastic domination language of [12]. Let $\{X(u)\}_{u \in U}$ and $\{Y(u)\}_{u \in U}$ be two non-negative $N$-dependent families of random variables. We will say that $X(u)$ is stochastically dominated by $Y(u)$ uniformly in $u \in U$ if, given any $\varepsilon, p > 0$, the inequality
$$\sup_{u \in U} \mathbb{P}\left( X(u) > N^{\varepsilon}\, Y(u) \right) \le N^{-p}$$
is satisfied for all sufficiently large $N \ge N_0(\varepsilon, p)$. We express this relationship by writing $X \prec Y$ uniformly in $u \in U$. Although most of the probabilities in this paper can be controlled with an explicit exponential tail, we have refrained from doing so for the sake of brevity.
Our proof of the local semicircle law will be valid in a bulk spectral domain for some $\kappa > 0$. For simplicity, we will assume that the minimal spectral scale is given by
$$\eta = N^{-1+\theta},$$
where $\theta > 0$ is fixed, but arbitrarily small.
Theorem 1.2. The normalized resolvent trace of $H(1)$ satisfies
$$\sup_{z \in D} \left| \langle G(1,z) \rangle - m_{sc}(z) \right| \prec \frac{1}{\sqrt{N\eta}}.$$
We stress that this theorem first appeared in [15]. It has been extended both to Wigner matrices with minimal assumptions on the distribution of the entries [2] and to random matrices with a more general spread-out variance profile [12]. Theorem 1.2 implies that $\operatorname{Im} \langle G(1, z) \rangle$ is bounded both above and away from zero when $z \in D$. The Schur complement formula and classical concentration of measure results show that Similar considerations apply to the off-diagonal entries These calculations have become standard and are explained in great detail in [17]. However, given Theorem 1.2, no further bootstrapping is required to conclude (1.9) and (1.10).
The organization of this paper is as follows. In Section 2, we show in which sense the evolution (1.8) approximates (1.4). Then, in Section 3, we prove that (1.5) defines an approximate characteristic flow, which we use to derive the local semicircle law in Section 4.

The self-energy correction
In comparison with the Gaussian case, the main complication in the random characteristic approach for more general Wigner matrices is that the self-energy operator $S[t, \cdot]$ remains random and time-dependent. Nevertheless, we will prove in Theorem 2.1 that with very high probability so that the characteristic curve (1.5) still counteracts a large part of the term The resulting error term is not small a priori, but retains a specific structure that enables the Grönwall scheme of Lemma 3.3.
For technical reasons, we will prove Theorem 2.1 for spectral parameters $z$ in a very large domain and for times $t$ that are greater than a cutoff
$$t_0 = N^{-K} \tag{2.2}$$
with some fixed $K \in \mathbb{N}$ to be specified later in the proof of Lemma 3.3. In the statement of Theorem 2.1, and throughout this paper, $\|\cdot\|$ denotes the operator norm.
Proof. We first show that uniformly in $t \in [t_0, 1]$, $z \in D'$, and $k \in \{1, \dots, N\}$. Let $H^k(t)$ denote the matrix obtained by replacing the $k$-th row and column of $H(t)$ by zeros. Denoting by $G^k(t, z)$ the resolvent of $H^k(t)$, the resolvent identity implies using (2.4) and the trivial resolvent bound for the summand with $j = k$. Similarly, we have To prove (2.3) it thus suffices to show that uniformly in $t \in [t_0, 1]$, $z \in D'$, and $k \in \{1, \dots, N\}$. After conditioning on $H^k(t)$, the random variables $\sigma_{kj}(t)\, G^k_{jj}(t, z)$ are independent and bounded by $C N^{-1} |G^k_{jj}(t, z)|$. Using Hoeffding's inequality with respect to the conditional probability $P_k$ we get and which proves (2.5). The extension of (2.3) to the maximum over $k$ is by the union bound, whereas the extension to the supremum over all $z \in D'$ and $t \in [t_0, 1]$ beyond the cutoff $t_0 = N^{-K}$ is by a stochastic continuity bound that we turn to in the next step.
We fix $r > 0$ and consider the neighborhood for any $t \ge t_0$. Since $S[t, G(t, z)]$ is a polynomial combination of the $\sigma_{kj}(t)$ and $G(t, z)$, which are bounded by $N^{-1}$ and $\eta^{-1} \le N$ respectively, the bound (2.6) follows.
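The way Hoeffding's inequality produces a stochastic domination bound can be illustrated with a small Python computation. It is a simplified sketch under assumptions that are not part of the proof: the $N$ independent summands are taken to be bounded in absolute value by $C/N$ (that is, the resolvent factors are treated as bounded by one), and the function names are hypothetical. The general two-sided Hoeffding bound $2 \exp(-2 t^2 / \sum_i (b_i - a_i)^2)$ evaluated at the deviation $t = C N^{\varepsilon - 1/2}$ then equals $2 \exp(-N^{2\varepsilon}/2)$, which decays faster than any power of $N$.

```python
import math


def hoeffding_tail(t: float, ranges: list) -> float:
    """Two-sided Hoeffding bound 2*exp(-2 t^2 / sum_i (b_i - a_i)^2).

    `ranges` lists the lengths b_i - a_i of the intervals containing the
    independent summands.
    """
    return 2 * math.exp(-2 * t * t / sum(r * r for r in ranges))


def domination_tail(n: int, eps: float, c: float = 1.0) -> float:
    """Tail bound for a sum of n independent terms, each bounded by c/n.

    Illustrative assumption: the deviation level is t = c * n**(eps - 1/2).
    """
    bound_per_term = c / n
    t = c * n ** (eps - 0.5)
    # Each summand lies in an interval of length 2 * bound_per_term.
    return hoeffding_tail(t, [2 * bound_per_term] * n)


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4):
        print(n, domination_tail(n, eps=0.25), n ** -5.0)
```

For $\varepsilon = 1/4$ and $N = 10^4$ the bound is $2 e^{-50}$, which is already far below $N^{-5}$. In the notation of the introduction this says that the centered sum is stochastically dominated by $N^{-1/2}$.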

Fluctuations along characteristic curves
We will show that, for fixed realizations of the randomness, the unique solutions of
$$\dot{\gamma}(t, z) = -\langle G(t, \gamma(t, z)) \rangle, \qquad \gamma(0, z) = z$$
serve as approximate characteristic curves of the resolvent SDE (1.8). We start these curves at spectral parameters $z$ in an initial spectral domain where $\delta > 0$ is any constant satisfying Thus $G(0, z) = -z^{-1}$ is uniformly bounded and Lipschitz continuous in $D_0$. Given an initial point $z \in D_0$, we consider the process $R(t, z) = G(t, \xi(t, z))$, where $\xi(t, z) = \gamma(t \wedge \tau_z, z)$ is the characteristic flow stopped at the time $\tau_z$.

The main observation is that R(t, z) is approximately constant.
Theorem 3.1. The process $R(t, z)$ satisfies
Proof. Since $\xi$ defines a piecewise $C^1$-process, an application of Itô's lemma shows that $R(t, z)$ satisfies the same SDE as $G(t, z)$ but with an additional counter-term in the drift. More precisely, while $t \le \tau_z$, the evolution consists of two terms When dealing with the integrated versions of these processes, we will choose the initial conditions $F(0, z) = A(0, z) = 0$ so that $R(t, z) - R(0, z) = F(t, z) + A(t, z)$. The proof of Theorem 3.1 is then a direct consequence of the estimates on $F(t, z)$ and $A(t, z)$ provided in Lemma 3.2 and Lemma 3.3, respectively.
The proofs of the subsequent results will repeatedly use the crucial fact that any continuously differentiable function $f$ satisfies the identity
$$\int_a^b f'(\operatorname{Im} \xi(s, z))\, \operatorname{Im} \langle R(s, z) \rangle\, ds = -\int_a^b f'(\operatorname{Im} \xi(s, z))\, d(\operatorname{Im} \xi(s, z)) = f(\operatorname{Im} \xi(a, z)) - f(\operatorname{Im} \xi(b, z))$$
provided that $a, b \le \tau_z$. The term $f(\operatorname{Im} \xi(a, z)) - f(\operatorname{Im} \xi(b, z))$ is then usually estimated by a trivial bound in terms of $\eta$. We refer to these two steps simply as the "integration trick".
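The integration trick can be checked numerically in an idealized setting. The Python sketch below is not the random flow of the paper: it assumes the deterministic large-$N$ picture in which the characteristic started at $w$ is $\xi(s) = w + s/w$ and the normalized trace along it is frozen at $-1/w$, so that $\frac{d}{ds} \operatorname{Im} \xi(s) = -\operatorname{Im}(-1/w)$. With $f(x) = 1/x$, a midpoint rule for the left side of the identity is compared with $f(\operatorname{Im} \xi(0)) - f(\operatorname{Im} \xi(1))$. The function name is hypothetical.

```python
def integration_trick_check(w: complex, steps: int = 20000):
    """Return (lhs, rhs) of the integration-trick identity for f(x) = 1/x on [0, 1].

    Assumed idealization: xi(s) = w + s / w, with the trace constant and equal
    to -1/w along the curve.
    """
    im_r = (-1 / w).imag  # Im<R>, constant along the idealized flow

    def im_xi(s: float) -> float:
        return (w + s / w).imag

    h = 1.0 / steps
    # f'(x) = -1/x^2, integrated against Im<R> ds by the midpoint rule.
    lhs = sum(-im_r / im_xi((k + 0.5) * h) ** 2 for k in range(steps)) * h
    # f(Im xi(a)) - f(Im xi(b)) with a = 0 and b = 1.
    rhs = 1 / im_xi(0.0) - 1 / im_xi(1.0)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = integration_trick_check(0.3 + 1.5j)
    print(lhs, rhs, abs(lhs - rhs))
```

Both sides agree up to the quadrature error. The right side involves only the endpoint values of $\operatorname{Im} \xi$, which is why a lower bound on $\operatorname{Im} \xi$ in terms of $\eta$ suffices to control such integrals.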
Proof. The martingale part of F (t, z) is and the quadratic variation of its unit trace M is given by where · 2 = Tr | · | 2 denotes the Hilbert-Schmidt norm, ⊙ denotes the entrywise product, and √ σ(t) is the matrix with entries σ jk (t). If t ≤ τ z , we conclude that Tr |R| 2 (s, z) (Im ξ(s, z)) 2 ds Im R(s, z) (Im ξ(s, z)) 3 ds = Im R(s) (Im ξ(s, z)) 2 ds, is also stochastically dominated by (N η) −1 uniformly in z ∈ D 0 because of the integration trick.  Proof. We will split the integral at the point t 0 from (2.2). By the trivial bound on the resolvent, the characteristic ξ(t, z) started at z ∈ D 0 remains in the domain D ′ from (2.1) for all t ≤ 1. Setting Theorem 2.1 shows that the second part of the integral in (3.2) is bounded by Im R(s, z) (Im ξ(s, z)) 3/2 u(s) ds Im R(s, z) (Im ξ(s, z)) 3/2 u(s) ds on an event that also has probability 1 − N −p for arbitrary p > 0 and large enough N . On this event, Grönwall's inequality implies Im R(s, z) N 1/2 (Im ξ(s, z)) 3/2 ds and the integral inside the exponential is bounded by 4(N η) −1/2 = 4N −θ/2 because of the integration trick. Since R(0, z) ≤ δ −1 for z ∈ D 0 , we have shown that sup t≤1 u(t) ≺ 1.
We now insert this bound on u back into the integral in (3.3). Choosing the cutoff exponent K in (2.2) large enough yields Im R(s, z) (Im ξ(s, z)) 3/2 ds ≺ 1 √ N η via the integration trick.
Having established Theorem 3.1, we now argue that the stochastic domination holds simultaneously for a continuum of points using a discretization argument.
Rearranging, we get To prove the theorem, we choose a finite grid $\Lambda \subset D_0$ such that its cardinality is bounded by $|\Lambda| \le N^L$ for some $L \in \mathbb{N}$ and From Theorem 3.1 and the union bound, we have so it suffices to show that the left side of (3.6) controls the left side of (3.4). Given $z \in D_0$, we pick $w \in \Lambda$ such that $|z - w| \le \eta^3 / \sqrt{N\eta}$. Then $\tau'_w \le \tau_w$ and $\tau'_z \le \tau_w$ since (3.5) guarantees that as long as $\operatorname{Im} \gamma(t, z) \ge \eta/4$ and $\operatorname{Im} \gamma(t, w) \ge \eta/4$. The trivial $C\eta^{-2}$-Lipschitz continuity of the resolvent in $D_0$ then shows that the process $R(t, z)$ stays within an error $C/\sqrt{N\eta}$ of $R(t, w)$ for all times $t \le \tau'_z$.
Before proving Theorem 1.2, we mention that the relation
$$w' + \frac{1}{w'} = z \tag{4.2}$$
is equivalent to $-1/w' = m_{sc}(z)$. Since the semicircle law is analytic in the bulk interval $W$, its Stieltjes transform $m_{sc}$ is Lipschitz continuous in $D$ with a constant independent of $N$.
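The equivalence can be verified in two lines. The derivation below reads (4.2) as $w' + 1/w' = z$, an interpretation consistent with the expression $w + w^{-1}$ appearing in the proof of Theorem 1.2, and assumes $\operatorname{Im} w' > 0$.

```latex
\begin{align*}
w' + \frac{1}{w'} = z, \quad m := -\frac{1}{w'}
  &\implies -\frac{1}{m} - m = z \\
  &\implies 1 + (z + m)\, m = 0 .
\end{align*}
% Since Im m = Im w' / |w'|^2 > 0, m is the root of the quadratic
% lying in the upper half-plane, which is m_sc(z).
```

The other root of the quadratic is $1/m$, which has negative imaginary part, so the condition $\operatorname{Im} w' > 0$ singles out $m_{sc}(z)$ uniquely.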
Proof of Theorem 1.2. It suffices to prove that
$$\sup_{z \in D} \left| \langle G(1, z) \rangle - m_{sc}(z) \right| \le O\!\left( \frac{N^{\varepsilon}}{\sqrt{N\eta}} \right)$$
on the event $A_\varepsilon$ for all $\varepsilon < \theta/2$. Let $z \in D$, let $w = w(z) \in D_0$ be the point furnished by Lemma 4.1, and let $w' = -m_{sc}(z)^{-1}$ be the solution of (4.2). By Lemma 4.1 we have $w + w^{-1} \in D$ for all sufficiently large $N$, so the Lipschitz continuous dependence of $1/w'$ on $z \in D$ implies On the event $A_\varepsilon$ the bound is valid with a uniform constant for all $z \in D$.