Singularities of the density of states of random Gram matrices

For large random matrices $X$ with independent, centered entries but not necessarily identical variances, the eigenvalue density of $XX^*$ is well-approximated by a deterministic measure on $\mathbb{R}$. We show that the density of this measure has only square-root and cubic-root singularities away from zero. We also extend the bulk local law in [arXiv:1606.07353] to the vicinity of these singularities.


Introduction
The empirical eigenvalue density, or density of states, of many large random matrices is well-approximated by a deterministic probability measure, the self-consistent density of states. If $X$ is a $p \times n$ random matrix with independent, centered entries of identical variances, then the limit of the eigenvalue density of the sample covariance matrix $XX^*$ for large $p$ and $n$ with $p/n$ converging to a constant was identified by Marchenko and Pastur in [9]. However, some applications in wireless communication require understanding the spectrum of $XX^*$ without the assumption of identical variances of the entries of $X = (x_{kq})_{k,q}$ [6,8,10]. In this case, the matrix $XX^*$ is a random Gram matrix. For constant variances, the self-consistent density of states is obtained by solving a scalar equation for its Stieltjes transform, the scalar Dyson equation. If the variances $s_{kq} := \mathbb{E}|x_{kq}|^2$ depend nontrivially on $k$ and $q$, the self-consistent density of states is obtained from the solution $m(\zeta) = (m_1(\zeta), \ldots, m_p(\zeta)) \in \mathbb{H}^p$ of the vector Dyson equation [7]
$$-\frac{1}{m_k(\zeta)} = \zeta - \sum_{q=1}^{n} s_{kq} \frac{1}{1 + \sum_{l=1}^{p} s_{lq} m_l(\zeta)}, \qquad k \in [p], \tag{1.1}$$
for all $\zeta \in \mathbb{H}$. Here, we introduced $\mathbb{H} := \{\zeta \in \mathbb{C} : \operatorname{Im} \zeta > 0\}$ and $[p] := \{1, \ldots, p\}$. Indeed, the average $\langle m(\zeta) \rangle_1 := p^{-1} \sum_{k=1}^{p} m_k(\zeta)$ is the Stieltjes transform of the self-consistent density of states, denoted by $\langle \nu \rangle_1$. If the limit of $\langle \nu \rangle_1$ as $p, n \to \infty$ exists, then it can be studied via an infinite-dimensional version of (1.1) (see (2.3) below).
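To make the vector Dyson equation concrete, the following sketch solves it numerically by a plain fixed-point iteration. It assumes that the equation has the form $-1/m_k(\zeta) = \zeta - \sum_q s_{kq}/(1 + \sum_l s_{lq} m_l(\zeta))$ as in [7]. The iteration scheme, the function name `solve_gram_dyson` and all parameter values are illustrative choices made here and are not taken from the paper. For constant variances $s_{kq} = 1/n$ and $p = n$, the vector equation collapses to the scalar quadratic $\zeta m^2 + \zeta m + 1 = 0$, which provides a simple consistency check.

```python
import numpy as np

def solve_gram_dyson(S, zeta, tol=1e-12, max_iter=10_000):
    """Hypothetical helper: iterate m -> -1 / (zeta - S (1 / (1 + S^t m))).

    S is the p x n matrix of variances s_kq, zeta a point with Im zeta > 0.
    The map preserves the upper half-plane, so we start from m = i.
    """
    p, _ = S.shape
    m = np.full(p, 1j, dtype=complex)
    for _ in range(max_iter):
        inner = 1.0 / (1.0 + S.T @ m)       # entries 1 / (1 + (S^t m)_q), length n
        m_new = -1.0 / (zeta - S @ inner)   # right-hand side of the equation, length p
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m

# Constant variances s_kq = 1/n with p = n (an assumed test case).
p = n = 50
S = np.full((p, n), 1.0 / n)
zeta = 1.0 + 0.5j

m = solve_gram_dyson(S, zeta)
avg_m = m.mean()                            # the average <m(zeta)>_1

# Scalar check: zeta*m^2 + zeta*m + 1 = 0, keeping the root with Im m > 0.
roots = np.roots([zeta, zeta, 1.0])
m_scalar = roots[roots.imag > 0][0]
residual = abs(avg_m - m_scalar)
```

The imaginary part of `avg_m` divided by $\pi$ approximates the self-consistent density of states at $\operatorname{Re}\zeta$ when $\operatorname{Im}\zeta$ is small, although the plain iteration is expected to slow down considerably in that regime.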
For Wigner-type matrices, i.e., Hermitian random matrices with independent (up to the Hermiticity constraint), centered entries, the analogue of (1.1) is a quadratic vector equation (QVE) in the language of [1,3]. In these papers, finite and infinite-dimensional versions of the QVE have been extensively studied to analyze the self-consistent density of states, whose Stieltjes transform is the average of the solution to the QVE. The authors show that the self-consistent density of states has a 1/3-Hölder continuous density. Except for finitely many square-root and cubic-root singularities, this density is real-analytic. The square-root behaviour emerges solely at the edges of the connected components of the support of the self-consistent density of states, whereas the cubic-root singularities lie inside these components. The detailed stability analysis in [1] is then used in [2] to obtain the local law for Wigner-type matrices. A local law typically refers to a statement about the convergence of the eigenvalue density to a deterministic measure on a scale slightly above the typical local eigenvalue spacing.
For the Dyson equation for random Gram matrices, we obtain away from ζ = 0 the same results as mentioned above in the QVE setup. Furthermore, we extend our local law for random Gram matrices in [5] to the vicinity of the singularities of the self-consistent density of states. This can be seen as another instance of the universality phenomenon in random matrix theory. Despite the different structure of Gram and Wigner-type matrices, the densities of states of these Hermitian random matrices have the same types of singularities. We refer to [5] and the references therein for related results about random Gram matrices.
There is a close connection between Gram and Wigner-type matrices. The Dyson equation (1.1) can be transformed into a QVE in the sense of [1], and the spectrum of $XX^*$ is closely related to that of a Wigner-type matrix in the sense of [2]. This is most easily explained on the random matrix level through a special case of the linearization trick: if $X$ has independent and centered entries, then the random matrix
$$H := \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}$$
is a Wigner-type matrix and the spectra of $H^2$ and $XX^*$ agree away from zero. Therefore, instead of trying to analyze (1.1) and $XX^*$ directly, it is more efficient to study the corresponding QVE and Wigner-type matrix as in [5]. However, owing to the large zero blocks in $H$, its variance matrix is not uniformly primitive (see A3 in [1]), a key assumption for the analysis in [1]. Indeed, the stability operator of the QVE possesses an additional unstable direction $f_-$, which has to be treated separately. In [5], this study has been conducted in the bulk spectrum and away from the support of $\langle \nu \rangle_1$, where $f_-$ did not play an important role, at least away from zero.
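The relation between the spectra of $H^2$ and $XX^*$ can be checked numerically. The sketch below assumes that $H$ is the Hermitian block matrix with zero diagonal blocks and off-diagonal blocks $X$ and $X^*$. The dimensions, the Gaussian entries and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 6

# Complex p x n matrix with independent, centered entries of variance 1/n.
X = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2 * n)

# Linearization: Hermitian (p + n) x (p + n) block matrix.
H = np.block([[np.zeros((p, p)), X],
              [X.conj().T, np.zeros((n, n))]])

eig_H_sq = np.sort(np.linalg.eigvalsh(H) ** 2)   # spectrum of H^2, ascending
eig_gram = np.linalg.eigvalsh(X @ X.conj().T)    # spectrum of X X^*, ascending

# H^2 is block diagonal with blocks X X^* and X^* X, so for p <= n its spectrum
# consists of n - p zeros followed by every eigenvalue of X X^* taken twice.
zero_part = eig_H_sq[: n - p]
doubled_part = eig_H_sq[n - p:]
expected_doubled = np.repeat(eig_gram, 2)
```

Since $H^2$ is block diagonal with blocks $XX^*$ and $X^*X$, and these two blocks share their nonzero eigenvalues, each nonzero eigenvalue of $XX^*$ appears twice in the spectrum of $H^2$.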
In this note, we present a new argument needed in the analysis of the cubic equation (see (3.19) below) describing the stability of the QVE close to its singularities in order to incorporate the additional unstable direction. In fact, the analysis of the cubic equation in [1] heavily relies on the uniform primitivity of the variance matrix. Adapting this argument to the current setup cannot exclude that the coefficients of the cubic and the quadratic term in the cubic equation vanish at the same time due to the presence of $f_-$. A nonvanishing cubic or quadratic coefficient is, however, absolutely crucial for the cubic stability analysis in [1]. Otherwise, not only square-root or cubic-root but also higher-order singularities would emerge. Our main novel ingredient, a very detailed analysis of these coefficients, actually excludes this scenario. With this essential new input, the regularity and the singularity structure of (1.1) as well as the local law for $XX^*$ follow by correctly combining the arguments in [1,2,5].

Acknowledgement
The author is very grateful to László Erdős for many fruitful discussions and many valuable suggestions. The author would also like to thank Torben Krüger for several helpful conversations.

Structure of the solution to the Dyson equation
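Let $(\mathfrak{X}_1, \mathcal{S}_1, \pi_1)$ and $(\mathfrak{X}_2, \mathcal{S}_2, \pi_2)$ be two finite measure spaces such that $\pi_1(\mathfrak{X}_1)$ and $\pi_2(\mathfrak{X}_2)$ are strictly positive. Moreover, we denote the spaces of bounded and measurable functions on $\mathfrak{X}_1$ and $\mathfrak{X}_2$ by
$$\mathcal{B}_1 := \{u \colon \mathfrak{X}_1 \to \mathbb{C} : \|u\|_\infty := \sup_{k \in \mathfrak{X}_1} |u(k)| < \infty\}, \qquad \mathcal{B}_2 := \{v \colon \mathfrak{X}_2 \to \mathbb{C} : \|v\|_\infty := \sup_{q \in \mathfrak{X}_2} |v(q)| < \infty\}.$$
We consider $\mathcal{B}_1$ and $\mathcal{B}_2$ equipped with the supremum norm $\|\cdot\|_\infty$. We denote the induced operator norms by $\|\cdot\|_{\mathcal{B}_1 \to \mathcal{B}_2}$ and $\|\cdot\|_{\mathcal{B}_2 \to \mathcal{B}_1}$. For $u \in \mathcal{B}_1$, we write $u_k = u(k)$ for $k \in \mathfrak{X}_1$. We use the same notation for $v \in \mathcal{B}_2$.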
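Let $s$ : [...] We define the bounded linear operators $S \colon \mathcal{B}_2 \to \mathcal{B}_1$ and $S^t \colon \mathcal{B}_1 \to \mathcal{B}_2$ [...] We are interested in the solution $m \colon \mathbb{H} \to \mathcal{B}_1$ of the Dyson equation
$$-\frac{1}{m(\zeta)} = \zeta - S \frac{1}{1 + S^t m(\zeta)} \tag{2.3}$$
for $\zeta \in \mathbb{H}$, which satisfies $\operatorname{Im} m(\zeta) > 0$ for all $\zeta \in \mathbb{H}$.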
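Further assumptions on $\pi_1$, $\pi_2$ and $S$ will yield a more detailed understanding of the measures $\nu_k$. To formulate these assumptions, we introduce the averages of $u \in \mathcal{B}_1$ and $v \in \mathcal{B}_2$ through
$$\langle u \rangle_1 := \frac{1}{\pi_1(\mathfrak{X}_1)} \int_{\mathfrak{X}_1} u_k \, \pi_1(dk), \qquad \langle v \rangle_2 := \frac{1}{\pi_2(\mathfrak{X}_2)} \int_{\mathfrak{X}_2} v_q \, \pi_2(dq).$$
Additionally, we set $\|u\|_t$ [...] and $t \geq 1$. Moreover, for $k \in \mathfrak{X}_1$ and $q \in \mathfrak{X}_2$, we define the functions $S_k \colon \mathfrak{X}_2 \to \mathbb{R}$, $r \mapsto s_{kr}$, and $(S^t)_q \colon \mathfrak{X}_1 \to \mathbb{R}$, $l \mapsto s_{lq}$. We call $S_k$ and $(S^t)_q$ the rows and columns of $S$, respectively.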
(A3) The rows and columns of $S$ are sufficiently close to each other in the sense that there is a continuous, strictly monotonically decreasing function $\gamma \colon (0, 1] \to \mathbb{R}_0^+$ such that $\lim_{\varepsilon \downarrow 0} \gamma(\varepsilon) = \infty$ and, for all $\varepsilon \in (0, 1]$, we have [...]
(A4) The operators $S$ and $S^t$ map square-integrable functions continuously to bounded functions, i.e., there are constants $\Psi_1, \Psi_2 > 0$ such that [...]
Our estimates will be uniform in all models that satisfy Assumptions 2.2 with the same constants. Therefore, the constants $\pi_*$, $\pi^*$ from (A1), $L_1$, $L_2$, $\kappa_1$, $\kappa_2$ from (A2), the function $\gamma$ from (A3) and $\Psi_1$, $\Psi_2$ from (A4) are called model parameters. We refer to Remark 2.4 below for an easily checkable sufficient condition for (A3). We now state our main result about the regularity and the possible singularities of $\nu_k$ defined in (2.4).

Theorem 2.3. If we assume (A1)-(A4), then the following statements hold true:
(i) (Regularity of $\nu$) There are $\nu^0 \in \mathcal{B}_1$ and $\nu^d$ : [...] For all $k \in \mathfrak{X}_1$, we have [...] There is $\rho_* > 0$ depending only on the model parameters and $\delta$ such that the Lebesgue measure of each connected component of [...] The point $E_0$ is the intersection of the closures of two connected components of $P \cap (\delta, \infty)$ and $\nu^d$ has a cubic-root singularity at [...]

Remark 2.4 (Piecewise Hölder-continuous rows and columns of $S$ imply (A3)). Let $\mathfrak{X}_1$ and $\mathfrak{X}_2$ be two nontrivial compact intervals in $\mathbb{R}$ and $\pi_1$ and $\pi_2$ the Lebesgue measures. In this case, (A3) holds true if the maps $k \mapsto S_k$ and $r \mapsto (S^t)_r$ are piecewise 1/2-Hölder continuous in the sense that there are two finite partitions $(I_\alpha)_{\alpha \in A}$ and $(J_\beta)_{\beta \in B}$ of $\mathfrak{X}_1$ and $\mathfrak{X}_2$, respectively, such that, for all $\alpha \in A$ and $\beta \in B$, we have [...] There is a similar condition for (A3) if $\mathfrak{X}_1 = [p]$ and $\mathfrak{X}_2 = [n]$ for some $p, n \in \mathbb{N}$ and the measures $\pi_1$ and $\pi_2$ are the (unnormalized) counting measures on $[p]$ and $[n]$, respectively.

Local law for random Gram matrices
In this subsection, we state our results on random Gram matrices. We now set $\mathfrak{X}_1 = [p]$, $\mathfrak{X}_2 = [n]$ and let $\pi_1$ and $\pi_2$ be the (unnormalized) counting measures on $[p]$ and $[n]$, respectively. In particular, $\pi_1(\mathfrak{X}_1) = p$ and $\pi_2(\mathfrak{X}_2) = n$.
(B2) All entries of $X$ have bounded moments in the sense that there are $\mu_m > 0$ for $m \geq 3$ such that [...] The sequence $(\mu_m)_{m \geq 3}$ in (B2) is also considered a model parameter.
Furthermore, for any $\varepsilon > 0$ and $D > 0$, there is a constant $C_{\varepsilon, D} > 0$ such that, for any deterministic vector [...] The constant $C_{\varepsilon, D}$ in (2.8) depends only on the model parameters as well as $\delta$ and $\gamma$ in addition to $\varepsilon$ and $D$.

Remark 2.7. (i) (Corollaries of the local law) In the same way as in [2] and in [5], the standard corollaries of a local law (convergence of the cumulative distribution function, rigidity of eigenvalues, anisotropic law and delocalization of eigenvectors) may be proven.
(ii) (Local law in the bulk and away from $\operatorname{supp} \nu$) In the bulk, Theorem 2.6 has already been proven in [5]. Away from $\operatorname{supp} \nu$, the convergence rate in (2.8a) and (2.8b) can be improved and thus the condition $\operatorname{Im} \zeta \geq p^{-1+\gamma}$ can be removed. See [5] for Gram matrices and [4] for Kronecker matrices.
(iii) (Local law close to zero) Under a strengthened version of assumption (A2), we have proven the local law close to zero in the cases $n = p$ and $|p - n| \geq cn$ in [5].
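The averaged form of a local law can be illustrated by a small simulation that compares the normalized trace of the resolvent of $XX^*$ with the average of the solution of the Dyson equation. The sketch below is a sanity check at a fixed spectral parameter far above the local eigenvalue spacing and not a verification of Theorem 2.6. It assumes the Dyson equation in the form $-1/m = \zeta - S(1 + S^t m)^{-1}$. The helper `solve_gram_dyson`, the variance profile, the Gaussian entries, the seed and all sizes are hypothetical choices made here.

```python
import numpy as np

def solve_gram_dyson(S, zeta, tol=1e-12, max_iter=10_000):
    """Hypothetical helper: fixed-point iteration for -1/m = zeta - S(1/(1 + S^t m))."""
    m = np.full(S.shape[0], 1j, dtype=complex)
    for _ in range(max_iter):
        m_new = -1.0 / (zeta - S @ (1.0 / (1.0 + S.T @ m)))
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(1)
p, n = 200, 300

# Smooth, strictly positive variance profile s_kq of order 1/n (an assumed example).
k = np.arange(p)[:, None] / p
q = np.arange(n)[None, :] / n
S = (1.0 + 0.5 * np.cos(np.pi * (k + q))) / n

# Real Gaussian matrix with independent centered entries and E|x_kq|^2 = s_kq.
X = np.sqrt(S) * rng.standard_normal((p, n))

zeta = 1.0 + 0.5j
eigs = np.linalg.eigvalsh(X @ X.T)
g_avg = np.mean(1.0 / (eigs - zeta))        # p^{-1} Tr (X X^* - zeta)^{-1}
m_avg = solve_gram_dyson(S, zeta).mean()    # average of the Dyson solution
error = abs(g_avg - m_avg)
```

Decreasing $\operatorname{Im} \zeta$ towards the scale $p^{-1+\gamma}$ probes the local regime, where the plain fixed-point iteration used here converges more slowly.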

Quadratic vector equation
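In this section, we translate (2.3) into a quadratic vector equation in the sense of [1] (see (3.2) below) and show that Proposition 2.1 trivially follows from [1]. However, the singularity analysis in [1] has to be changed essentially due to the violation of the uniform primitivity condition, A3 in [1], on $\boldsymbol{S}$ (cf. (3.1) below) in our setup. Let $\mathfrak{X} := \mathfrak{X}_1 \sqcup \mathfrak{X}_2$ be the disjoint union of $\mathfrak{X}_1$ and $\mathfrak{X}_2$ and $\pi$ the probability measure defined through
$$\pi(A \sqcup B) = (\pi_1(\mathfrak{X}_1) + \pi_2(\mathfrak{X}_2))^{-1} (\pi_1(A) + \pi_2(B))$$
for $A \in \mathcal{S}_1$ and $B \in \mathcal{S}_2$. Moreover, we denote the set of bounded measurable functions $\mathfrak{X} \to \mathbb{C}$ by $\mathcal{B} := \{w \colon \mathfrak{X} \to \mathbb{C} : \|w\|_\infty := \sup_{x \in \mathfrak{X}} |w(x)| < \infty\}$ with the supremum norm $\|\cdot\|_\infty$. Finally, on $\mathcal{B} = \mathcal{B}_1 \oplus \mathcal{B}_2$, we define the linear operator $\boldsymbol{S} \colon \mathcal{B} \to \mathcal{B}$ through
$$\boldsymbol{S} w := S(w|_{\mathfrak{X}_2}) + S^t(w|_{\mathfrak{X}_1}). \tag{3.1}$$
Here, we consider $S(w|_{\mathfrak{X}_2})$ and $S^t(w|_{\mathfrak{X}_1})$ as functions $\mathfrak{X} \to \mathbb{C}$, extended by zero outside of $\mathfrak{X}_1$ and $\mathfrak{X}_2$, respectively. Instead of (2.3), we study the quadratic vector equation (QVE)
$$-\frac{1}{\boldsymbol{m}(z)} = z + \boldsymbol{S} \boldsymbol{m}(z) \tag{3.2}$$
for $z \in \mathbb{H}$. If $\boldsymbol{m} = (m_1, m_2)$ solves (3.2), then $m(\zeta) := m_1(\sqrt{\zeta})/\sqrt{\zeta}$ for $\zeta \in \mathbb{H}$, with the branch of the square root taking values in $\mathbb{H}$, is a solution of (2.3). If $\boldsymbol{m}$ has positive imaginary part, then so does $m$.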
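For $u \in \mathcal{B}$, we write $u_x := u(x)$ with $x \in \mathfrak{X}$. For $u, w \in \mathcal{B}$, we denote the scalar product of $u$ and $w$ and the average of $u$ by
$$\langle u, w \rangle := \int_{\mathfrak{X}} \overline{u_x} \, w_x \, \pi(dx), \qquad \langle u \rangle := \langle 1, u \rangle.$$
We also introduce the Hilbert space $L^2(\pi) := \{u \colon \mathfrak{X} \to \mathbb{C} : \langle u, u \rangle < \infty\}$. The operator $\boldsymbol{S}$ is symmetric on $\mathcal{B}$ with respect to $\langle \cdot, \cdot \rangle$ and positivity preserving, as $s_{kr} \geq 0$ for all $k \in \mathfrak{X}_1$ and $r \in \mathfrak{X}_2$. Therefore, by Theorem 2.1 in [1], there exists $\boldsymbol{m} \colon \mathbb{H} \to \mathcal{B}$ which satisfies (3.2) for all $z \in \mathbb{H}$. This function is unique if we require that the solution of (3.2) satisfies $\operatorname{Im} \boldsymbol{m}(z) > 0$ for $z \in \mathbb{H}$. Moreover, $\boldsymbol{m} \colon \mathbb{H} \to \mathcal{B}$ is analytic and, for all $z \in \mathbb{H}$, we have $\|\boldsymbol{m}(z)\|_2$ [...] Furthermore, for all $x \in \mathfrak{X}$, there are symmetric probability measures $\rho_x$ on $\mathbb{R}$ such that
$$m_x(z) = \int_{\mathbb{R}} \frac{\rho_x(d\tau)}{\tau - z}$$
for all $z \in \mathbb{H}$ [1]. That means that $m_x$ is the Stieltjes transform of $\rho_x$. By (2.7) in [1], the definition of $\Sigma$ in (2.5) and [...]

Assumptions 3.1. In the remainder of this section, we assume that (A1), (A2), (A4) and the following condition hold true:
(C2) There are $\delta > 0$ and $\Phi > 0$ such that for all $z \in \mathbb{H}$ satisfying $|z| \geq \delta$, we have [...]

Remark 3.2 (Relation between (A3) and (C2)). By slightly adapting the proofs of Theorem 6.1 (ii) and Proposition 6.6 in [1], we see that, by (A3), for each $\delta > 0$, there is $\Phi_\delta > 0$ such that (C2) is satisfied with a constant $\Phi \equiv \Phi_\delta$.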
Since our estimates in this section will be uniform in all models that satisfy (A1), (A2), (A4) and (C2) with the same constants, we introduce the following notion.
(ii) The measure $\rho$ from (3.6) is absolutely continuous, i.e., there is a function $\rho^d$ : [...] for all $x \in \mathfrak{X}$.
A similar result has been obtained in Theorem 2.4 in [1], essentially relying on the uniform primitivity assumption A3 in [1]. For discrete $\mathfrak{X}_1$ and $\mathfrak{X}_2$ without assuming (C2), Lemma 3.8 in [5] shows Hölder continuity of $\langle \boldsymbol{m} \rangle$ instead of $\boldsymbol{m}$ with a smaller exponent than 1/3. Both conditions, A3 in [1] and the discreteness of $\mathfrak{X}_1$ and $\mathfrak{X}_2$, are violated in our setup. However, based on the proof of Theorem 2.4 in [1], we now explain how to extend the arguments of [1] and [5] to show Proposition 3.4. Here, as in the proof of Lemma 3.1 in [5], the uniform primitivity assumption A3 of [1] has to be replaced by (B') in [5], which is a direct consequence of (A2).
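The Hölder continuity and the analyticity of $\boldsymbol{m}$, and hence of $\rho^d$, will be consequences of analyzing the perturbed QVE (3.12), for $z \in \mathbb{H}$ and a perturbation $d$ given by a difference of two spectral parameters, as well as the stability operator $\boldsymbol{B}$ defined through [...] (3.13), where $\boldsymbol{F}(z) \colon \mathcal{B} \to \mathcal{B}$ is defined through $\boldsymbol{F}(z) u = |\boldsymbol{m}(z)| \boldsymbol{S}(|\boldsymbol{m}(z)| u)$ for any $u \in \mathcal{B}$ (cf. [1,5]). Correspondingly, we introduce $F(z) \colon \mathcal{B}_2 \to \mathcal{B}_1$ via $F(z) w = |m_1(z)| S(|m_2(z)| w)$ for $w \in \mathcal{B}_2$ and $F^t(z) \colon \mathcal{B}_1 \to \mathcal{B}_2$ via $F^t(z) u = |m_2(z)| S^t(|m_1(z)| u)$ for $u \in \mathcal{B}_1$.
To formulate the key properties of $\boldsymbol{F}$ and $\boldsymbol{B}$, we now introduce some notation. The operator norms for operators on $\mathcal{B}$ and $L^2(\pi)$ are denoted by $\|\cdot\|_\infty$ and $\|\cdot\|_2$, respectively. If $T \colon L^2 \to L^2$ is a compact self-adjoint operator, then the spectral gap $\operatorname{Gap}(T)$ is the difference between the two largest eigenvalues of $|T|$. We remark that $S$ and hence $F F^t$ are compact operators due to (A4).
Lemma 3.6 (Properties of $\boldsymbol{F}$). The eigenspace of $\boldsymbol{F}$ associated to $\|\boldsymbol{F}\|_2$ is one-dimensional and spanned by a unique $L^2(\pi)$-normalized positive $f_+ \in \mathcal{B}$. The eigenspace associated to $-\|\boldsymbol{F}\|_2$ is one-dimensional and spanned by $f_- := f_+ e_- \in \mathcal{B}$. We have $f_+ \sim 1$ (3.14) uniformly for $z \in \mathbb{H}^{\Sigma}_{\delta}$ and [...] for all $u \in \mathcal{B}$ satisfying $\langle f_+, u \rangle = 0$ and $\langle f_-, u \rangle = 0$. Furthermore, we have $\|\boldsymbol{F}\|_2 \leq 1$, $\operatorname{Gap}(F(z) F^t(z)) \sim 1$ uniformly for $z \in \mathbb{H}^{\Sigma}_{\delta}$.
Lemma 3.6 is a consequence of the proof of Lemma 3.3 in [5] with $r = |\boldsymbol{m}|$ and (3.10).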
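The spectral structure described in Lemma 3.6 can be observed in a small discrete example. The sketch below makes several assumptions: the QVE is taken in the standard form $-1/\boldsymbol{m}(z) = z + \boldsymbol{S}\boldsymbol{m}(z)$ of [1], the operator $\boldsymbol{S}$ is represented by the block matrix with off-diagonal blocks $S$ and $S^t$, and $e_-$ is taken to be $+1$ on $[p]$ and $-1$ on $[n]$. The variance profile, the spectral parameter and the fixed-point iteration are illustrative choices made here. The code normalizes $f_+$ in the Euclidean norm, which differs from the $L^2(\pi)$-normalization only by a constant factor.

```python
import numpy as np

p, n = 6, 9

# Strictly positive variance profile s_kq of order 1/n (an assumed example).
k = np.arange(p)[:, None] / p
q = np.arange(n)[None, :] / n
S = (1.0 + 0.5 * np.cos(np.pi * (k + q))) / n

# Block operator acting on functions on the disjoint union of [p] and [n].
S_big = np.block([[np.zeros((p, p)), S],
                  [S.T, np.zeros((n, n))]])

# Solve the assumed QVE -1/m = z + S_big m by fixed-point iteration.
z = 0.7 + 0.3j
m = np.full(p + n, 1j, dtype=complex)
for _ in range(20_000):
    m_new = -1.0 / (z + S_big @ m)
    converged = np.max(np.abs(m_new - m)) < 1e-13
    m = m_new
    if converged:
        break

# F u = |m| S_big (|m| u) is a real symmetric matrix with nonnegative entries.
abs_m = np.abs(m)
F = abs_m[:, None] * S_big * abs_m[None, :]

eigvals, eigvecs = np.linalg.eigh(F)                # ascending eigenvalues
norm_F = eigvals[-1]                                # largest eigenvalue, equal to the norm of F
f_plus = eigvecs[:, -1] * np.sign(eigvecs[:, -1].sum())

e_minus = np.concatenate([np.ones(p), -np.ones(n)])
f_minus = f_plus * e_minus                          # candidate eigenvector for -norm_F
```

In this example, flipping the sign of $f_+$ on $[n]$ turns the eigenvector for $\|\boldsymbol{F}\|_2$ into an eigenvector for $-\|\boldsymbol{F}\|_2$, which reflects the zero diagonal blocks of $\boldsymbol{S}$.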

Singularities of ρ d and the cubic equation
We now study the behaviour of $\rho^d$ near points $\tau \in \mathbb{R}$ where $\rho^d$ is not analytic. Theorem 2.6 in [1] describes the density near the edges and the cusps as well as the transition between the bulk and the singularity regimes in a quantitative manner. The same results hold for $\rho^d$ as well: [...]
For the proof of Proposition 3.8, we follow Chapters 8 and 9 in [1], which contain the proof of Theorem 2.6 in [1], the analogue of Proposition 3.8, and describe the necessary changes as well as the main philosophy.
The shape of the singularities of $\boldsymbol{m}$ as well as the stability of the QVE (cf. Chapter 10 in [1]) will be a consequence of the stability of a cubic equation. We note that, similarly to Lemma 8.1 of [1], the following properties of the stability operator $\boldsymbol{B} = \boldsymbol{B}(z)$ defined in (3.13) can be proven. There is $\varepsilon_* \sim 1$ such that, for $z \in \mathbb{H}^{\Sigma}_{\delta}$ satisfying $\operatorname{Im} \boldsymbol{m}(z) \leq \varepsilon_*$, $\boldsymbol{B}$ has a unique eigenvalue $\beta = \beta(z)$ of smallest modulus and $|\beta'| - |\beta| \gtrsim 1$ for all $\beta' \in \operatorname{Spec}(\boldsymbol{B}) \setminus \{\beta\}$. The eigenspace associated to $\beta$ is one-dimensional and there is a unique vector $b = b(z) \in \mathcal{B}$ in this eigenspace such that $\langle b(z), f_+ \rangle = 1$.