The density of states of 1D random band matrices via a supersymmetric transfer operator

Recently, T. and M. Shcherbina proved a pointwise semicircle law for the density of states of one-dimensional Gaussian band matrices of large bandwidth. The main step of their proof is a new method to study the spectral properties of non-self-adjoint operators in the semiclassical regime. The method is applied to a transfer operator constructed from the supersymmetric integral representation for the density of states. We present a simpler proof of a slightly upgraded version of the semicircle law, which requires only standard semiclassical arguments and some peculiar elementary computations. The simplification is due to the use of supersymmetry, which manifests itself in the commutation between the transfer operator and a family of transformations of superspace, and was applied earlier in the context of band matrices by Constantinescu. Other versions of this supersymmetry have been a crucial ingredient in the study of the localization--delocalization transition by theoretical physicists.


Introduction
Band operators and band matrices. Random band operators are popular toy models of disordered systems in theoretical physics. Their properties depend on a large parameter, called the bandwidth and denoted W. Informally, the large matrix elements lie in a band of width W about the main diagonal. One natural example is a Hermitian random operator H, represented by the biinfinite Gaussian random matrix with covariance

$\mathbb{E}\, H_{x,y} \overline{H_{x',y'}} = \frac{1}{W}\,\delta_{x,x'}\,\delta_{y,y'}\,\mathbb{1}_{|x-y|\le W}, \qquad x, y \in \mathbb{Z}. \tag{1.1}$

In this paper we mostly focus on a different example, the Hermitian Gaussian operator with covariance

$\mathbb{E}\, H_{x,y} \overline{H_{x',y'}} = \delta_{x,x'}\,\delta_{y,y'}\,\big((-W^2\Delta + \mathbb{1})^{-1}\big)_{x,y}, \tag{1.2}$

where $\Delta$ is the one-dimensional discrete Laplacian and $\mathbb{1}$ is the biinfinite identity matrix.
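For concreteness, here is a numerical sketch of a finite-volume sample from an ensemble of the type (1.1). The function name and the treatment of the diagonal entries are our own simplifications, not taken from the text:

```python
import numpy as np

def band_matrix(n, w, rng):
    """Sample an n x n Hermitian Gaussian band matrix with (approximately)
    E|H_xy|^2 = 1/W inside the band |x - y| <= W and zero outside, cf. (1.1)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (a + a.conj().T) / 2                      # make the matrix Hermitian
    x = np.arange(n)
    band = np.abs(x[:, None] - x[None, :]) <= w   # indicator of the band
    return h * band / np.sqrt(w)

rng = np.random.default_rng(0)
H = band_matrix(200, 10, rng)
```

The band structure is visible directly: entries with |x − y| > W vanish identically, while the matrix stays Hermitian.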
Along with the infinite-volume operators H, one considers their finite-volume versions H_N of dimension N × N, called random band matrices. Again, we single out Gaussian band matrices, and among them those with the covariances

$\mathbb{E}\, H_{x,y} \overline{H_{x',y'}} = \frac{1}{W}\,\delta_{x,x'}\,\delta_{y,y'}\,\mathbb{1}_{|x-y|\le W}, \qquad 1 \le x, y \le N, \tag{1.3}$

and

$\mathbb{E}\, H_{x,y} \overline{H_{x',y'}} = \delta_{x,x'}\,\delta_{y,y'}\,\big((-W^2\Delta_N + \mathbb{1})^{-1}\big)_{x,y}, \qquad 1 \le x, y \le N, \tag{1.4}$

where $\Delta_N$ is the discrete Laplacian on {1, . . . , N}.
Localization length. By the general theory of one-dimensional random operators [PF92], finite-range band operators (including (1.1)) exhibit localization for any value of W, manifesting itself in pure point spectrum with exponentially decaying eigenfunctions. The rate of exponential decay of the eigenfunctions is known as the localization length and denoted L_loc. An essentially equivalent quantity is the minimal value of N such that 9/10 of the ℓ² mass of the eigenvectors is concentrated in a subinterval of length N/10. Anderson localization also occurs for long-range random operators with sufficiently fast decay of the off-diagonal elements (such as (1.2)); see [VM99].
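The "9/10 of the ℓ² mass" proxy can be made concrete numerically. The helper below is our own construction (name and rounding slack are ours, not the text's); it returns the minimal number of consecutive sites carrying a given fraction of the ℓ² mass of a vector:

```python
import numpy as np

def mass_window(psi, frac=0.9):
    """Minimal number of consecutive sites containing a fraction `frac`
    of the l^2 mass of the vector psi."""
    p = np.abs(psi) ** 2
    p = p / p.sum()
    c = np.concatenate(([0.0], np.cumsum(p)))   # c[j] = mass of sites < j
    best = len(p)
    for i in range(len(p)):
        # smallest j such that the mass of [i, j) reaches frac
        j = np.searchsorted(c, c[i] + frac - 1e-12)
        if j <= len(p):
            best = min(best, j - i)
    return best

# a delta function is captured by a window of length 1,
# a uniform vector on 100 sites needs ~90 sites
delta = np.zeros(100); delta[50] = 1.0
uniform = np.ones(100)
```

For an exponentially localized eigenvector, `mass_window` is of the order of the localization length, up to constants.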
A long-standing question is to determine the asymptotic dependence of the localization length on the bandwidth. It is widely believed that L loc scales as W 2 for large W , for eigenvectors corresponding to energies |E| < 2. This belief is supported by various convincing albeit not mathematically rigorous arguments [CMI90; Cas+93; FM91; FM94]; see further below.
On the rigorous side, Schenker proved [Sch09] that L loc ≤ CW 8 for a class of band matrices including (1.1). His argument is reminiscent of the Mermin-Wagner theorem in statistical mechanics. A combination of the result of [Sch09] with the recent Wegner estimate from [Pel+16] yields a slight improvement L loc ≤ CW 7 .
As to lower bounds, the results of [EK11b; EK11a] pertaining to the quantum evolution imply a weak delocalization result for W ≫ N.

We mention that band matrices and band operators admit a generalization to higher spatial dimension. If the spatial structure of the band is d-dimensional with d ≥ 3, they are expected to exhibit an Anderson-type spectral phase transition similar to the conjectural metal-insulator transition in realistic solid state models. The dimension d = 2 is critical, and localization is expected for all values of the bandwidth. See further the review [Spe11].
Density of states. The behaviour of the eigenvectors of H is controlled by the quantity $\langle |G_{xy}(E + i\varepsilon)|^2 \rangle$, where $G_{xy}(z) = G_{xy}[H](z) = (z - H)^{-1}_{xy}$ is the Green's function of the random matrix H, E ∈ ℝ is a spectral parameter, and ⟨·⟩ denotes averaging over the disorder. In infinite volume, ε should be sent to zero, while W is large but fixed. In finite volume, ε should be taken of order 1/N. The quantity $\langle |G_{xy}(E + i\varepsilon)|^2 \rangle$ also controls the properties of the (quenched) spectral measure µ_H of H. In this paper we focus on a simpler quantity, the average Green function $\langle G_{xx}(E + i\varepsilon) \rangle$, and the related quantity $\langle G_{xy}(E + i\varepsilon) G_{yx}(E + i\varepsilon) \rangle$. The former controls the behaviour of the average density of states ρ = ρ[H], which can be defined as the Radon-Nikodym derivative of the disorder average ⟨µ_H⟩ of µ_H. We also consider the average density of states in finite volume, ρ_N = ρ[H_N].
For Gaussian random band matrices including both (1.1) and (1.2), the density of states exists in both finite and infinite volume by a general argument of Wegner [Weg81]. The average density of states is related to ⟨G₀₀⟩ by the Stieltjes inversion formula

$\rho(E) = -\frac{1}{\pi} \lim_{\varepsilon \searrow 0} \Im\, \langle G_{00}(E + i\varepsilon) \rangle. \tag{1.5}$

In finite volume, ρ_N can also be written as the expected empirical eigenvalue density,

$\rho_N(E) = \frac{1}{N}\, \mathbb{E} \sum_{j=1}^{N} \delta\big(E - \lambda_j(H_N)\big)$

(understood in the sense of distributions), justifying its name. Also, one has $\lim_{N\to\infty} \rho_N(E) = \rho(E)$ (see the proof of Proposition 4.2). It is believed that the average $\langle G_{xy}(E + i\varepsilon) G_{yx}(E + i\varepsilon) \rangle$ and the average density of states do not reflect the localization properties of the eigenvectors and the spectral type of H: see e.g. [Weg81]. Still, these quantities are of intrinsic interest.
In the works [BMP91; MPK92], it was proved that as W → ∞ the densities ρ converge weakly to the semicircle density

$\rho_{\mathrm{s.c.}}(E) = \frac{1}{2\pi} \sqrt{(4 - E^2)_+}\,,$

i.e. $\int \varphi(E)\, \rho(E)\, dE \to \int \varphi(E)\, \rho_{\mathrm{s.c.}}(E)\, dE$ for any bounded continuous test function φ. The arguments in these works apply to a wide class of band matrices including (1.1), (1.2). The available pointwise results are much less general, and we list them below following a brief discussion of supersymmetry.
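This weak convergence is easy to probe by Monte Carlo. In the sketch below the normalization 1/(2W+1) is our own choice (it makes the row variances sum to 1, so the limiting support is [−2, 2]); we compare the fraction of eigenvalues in [−1, 1] with the semicircle prediction $\sqrt{3}/(2\pi) + 1/3 \approx 0.609$:

```python
import numpy as np

# Eigenvalue histogram of one Gaussian band matrix vs. the semicircle law
# rho_sc(E) = sqrt(4 - E^2)/(2 pi) on [-2, 2].
rng = np.random.default_rng(1)
n, w = 1000, 200
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = np.arange(n)
band = np.abs(x[:, None] - x[None, :]) <= w
H = ((a + a.conj().T) / 2) * band / np.sqrt(2 * w + 1)
eig = np.linalg.eigvalsh(H)
# mass in [-1, 1]; the ideal semicircle value is ~0.609, finite-volume
# boundary effects shift it slightly upward
frac = np.mean(np.abs(eig) <= 1.0)
```

For a single sample at this size the agreement is only approximate; boundary rows have reduced variance, which slightly distorts the histogram.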
Supersymmetry One of the powerful methods for the study of random operators is the supersymmetric formalism. First introduced by Wegner and Schäfer and developed in the works of Efetov, it allows to rewrite the disorder averages of various observables as high-dimensional superintegrals. A general introduction may be found in the monographs [Weg16;Efe99].
In the context of random band matrices, the supersymmetric approach was applied by Fyodorov and Mirlin [FM91; FM94], who confirmed the W² dependence of the localization length on the bandwidth and also described the crossover occurring at W ≍ √N. For Gaussian random band matrices, the average $\langle |G_{xy}(E + i\varepsilon)|^2 \rangle$ corresponds, through Berezin integration and superbosonization or certain formal versions of the Hubbard-Stratonovich transformation [Zir06], to a high-dimensional super-integral dominated by a complicated saddle manifold. The average density of states ρ_N and the average $\langle G_{xy}(E + i\varepsilon) G_{yx}(E + i\varepsilon) \rangle$ lead, in the same way, to an integral dominated by saddle points.
The four main steps in the works [FM91;FM94] are the derivation of the supersymmetric integral representation; the σ-model approximation, in which the integration domain is restricted to the saddle manifold; the continuum limit; and a semiclassical analysis of the infinitesimal transfer operator. All these steps, and particularly the last three, have so far not been put on firm mathematical ground. (See [SS18] for recent progress on the second step.) A version of the SUSY formalism that uses similar algebraic identities (Berezin integration), but simpler supersymmetries than those involved in superbosonization, had been used early on in rigorous investigations, e.g., of localization in d = 1 random Schrödinger operators [CK86;KMP86].
Pointwise estimates. The models (1.2) and their counterparts in higher dimension are especially convenient for supersymmetric analysis, since the dual supersymmetric model has nearest neighbour coupling (see Proposition 2.1; in classical statistical mechanics, the idea of duality between carefully chosen long-range models and nearest neighbour models goes back to the work of Mark Kac [Kac59, Section 9]). This is why most of the pointwise results established to date pertain to this class of operators. One exception is the upper bound valid for a reasonably wide class of Gaussian random band matrices including both (1.1) and (1.2), and their counterparts in arbitrary dimension.
Much more is known for models of the form (1.2). In dimension d = 3, pointwise control of the density of states was proved in [DPS02]; corresponding results were also proved in finite volume. The argument of [DPS02] relies on a cluster expansion similar to the one used for the Wegner orbital model in [Con+87].
Recently, a parallel result was also proved for d = 2 in [DL17]. In d = 1, the cluster expansion methods of [DPS02; DL17] run into difficulties. Instead, the method of transfer matrices can be used. While the method of transfer matrices has been successfully applied in the physical literature to this and more involved problems [FM91; FM94], a mathematical justification is far from straightforward, since the transfer matrix is not self-adjoint. One possible strategy to perform semi-classical analysis for non-self-adjoint operators was suggested in [DS16]; for now, this strategy has only been implemented for toy operators much simpler than the one considered here.
Recently, Mariya and Tatyana Shcherbina developed a different and very general method of semi-classical analysis for non-self-adjoint operators [SS17]. In the work [SS16], they applied the method to the problem under discussion and proved a pointwise semicircle law for the density of states. The main goal of the current paper is to provide an alternative and arguably simpler proof of the result of [SS16]. Most of the work in [SS16] is devoted to proving the spectral gap for the transfer operator and some properties of its top eigenfunction (Theorem 4.2 therein). The argument is based on methods developed in the non-supersymmetric framework in [SS17]. Our proof, by contrast, exploits the supersymmetry of the problem. Apart from an important simplification and reorganization of the algebra, supersymmetry manifestly fixes the top eigenvalue to 1, implies a simple equation for the top eigenfunction, and reduces the proof of the spectral gap to an elementary spectral bound. We also emphasize that, as in the classical applications of transfer operators in statistical mechanics, the original problem with two large parameters, N and W, is reduced to a collection of asymptotic problems each containing only one parameter.
Our method is not applicable to problems lacking supersymmetry, such as the correlation length of characteristic polynomials considered in [SS17], but has the advantage of simplicity. It also allows to obtain stronger results, summarized in the theorems below.

Theorem 1. The finite volume density is close to the infinite volume one, (1.7), and the infinite volume density is well approximated by the semicircle law, (1.8).

Remark 1.1. The estimate (1.7) together with (1.8) yields (1.9). Note that, for large bandwidth, N ∼ W log W, the finite volume deviation (1.7) is substantially bigger than the accuracy (1.8) of the semicircle law.
Theorem 2. Assume that |E| < 2. For N ≥ C(E)W log W the following hold.
1. For any 1 ≤ y, y′ ≤ N, the estimate (1.10) holds.

2. The finite volume density ρ_N is smooth, and for all n ≥ 1 the bound (1.11) holds.

Remark 1.2. From (1.7) and (1.11) we obtain that the densities ρ_N(E) and ρ(E) admit an analytic extension to a domain of complex energies.

Remark 1.3. Throughout the paper, expressions of the form C(E), C′(E), c(E) are bounded from above and below on compact subintervals of (−2, 2). The values of C(E), C′(E), c(E) and of the numerical constants C, C′, c, etc. may change from line to line.
Plan of the paper. In Section 2, we recall the representation of the mean density of states ρ_N(E), as well as of the left-hand sides of (1.10)-(1.11), via Berezin integration, and set up a supersymmetric transfer operator T, (2.16). Then we discuss a certain supersymmetry, already described in [Con88], which is the supermatrix analogue of rotational symmetry and polar coordinates. This supersymmetry allows to reduce the transfer operator (2.16) to a much simpler operator T, (2.37), acting on functions of two real variables.
In Section 3 we analyse the transfer operator T. The two main steps are (a) the construction of an approximate top eigenfunction u 0 , Tu 0 ≈ u 0 , and (b) a bound on the restriction of T to a complement of the top eigenfunction. For a weakened version of Theorem 1 (with a worse error term), it would suffice to use the simple ansatz with an appropriately chosen α. In order to prove the results as stated, we develop, in Section 3.2, a systematic construction ("supersymmetric WKB") which allows to find an approximate solution to an arbitrary order in W −1/2 (for the purposes of the current paper, we use this construction up to order 5). As for step (b), the key fact is Proposition 3.1, proved via a simple semiclassical argument. Corollary 3.4 and Lemma 3.5 contain refinements needed to obtain the improved error term in Theorem 1 and for Theorem 2.
In Section 4 we prove Theorems 1 and 2 for |E| ≤ √(32/9). The relatively simple Proposition 3.7 would suffice to prove a version of Theorem 1 with a worse error term; we work a bit more to prove the results as stated.
In Section 5 we extend the argument to all |E| < 2, using a contour deformation and applying the general strategy of Section 4 to a power of the transfer operator (the power being chosen depending on the energy E).
With the exception of the Berezin integral representation (well explained elsewhere) and undergraduate analysis, our proof is self-contained.
2 Berezin algebra for the DOS and SUSY

Supersymmetric integral representation
The Berezin integral representation of ρ_N(E) involves the use of Grassmann variables. These will be denoted by the symbols ρ, ρ̄, η, η̄ and ξ, ξ̄, possibly with an index. All Grassmann variables anticommute with one another. The Grassmann algebra G (i.e. the free anticommutative algebra) generated over ℂ by all Grassmann variables admits a ℤ/2ℤ grading, in which a monomial in the generators is even or odd according to the parity of the number of symbols.
On the Grassmann algebra we define complex conjugation by ρ ↦ ρ̄, ρ̄ ↦ −ρ (and similarly for the other generators), and ordinary complex conjugation on the coefficients¹. An element f of the Grassmann algebra generated by the family $\{\rho_x, \bar\rho_x\}_{x=1}^N$ has a unique decomposition

$f = \sum_{I, J \subset \{1, \dots, N\}} c_{IJ}\, \rho_I\, \bar\rho_J, \tag{2.1}$

where $c_{IJ} \in \mathbb{C}$, $\rho_I = \prod_{i \in I} \rho_i$ and $\bar\rho_J := \prod_{j \in J} \bar\rho_j$, and the products are understood with respect to some fixed arbitrary order on the set {1, . . . , N}. In particular, every element can be written as f = f₀ + n, where f₀ ∈ ℂ and n is nilpotent, i.e. there exists k ≥ 1 such that nᵏ = 0. The Grassmann algebra has a natural decomposition into even and odd elements.

Having a Grassmann algebra G at hand, we consider the algebra of C^∞-smooth functions f : ℂᵐ → G. A general element of this larger algebra has the form (2.2). Smooth multivariate functions (z₁, . . . , zₘ) ↦ f(z₁, . . . , zₘ) can be extended to functions of the form (2.2) taking values in the Grassmann algebra by replacing the variables by even elements of the algebra, z_i ↦ z_i + n_i with n_i nilpotent, and taking the Taylor series around (z₁, . . . , zₘ) until it terminates by nilpotency. For example, for m = 1 one has

$f(z + n) = \sum_{k \ge 0} \frac{f^{(k)}(z)}{k!}\, n^k,$

which is a finite sum by nilpotency.

The Berezin integral is the linear functional that selects the coefficient of top degree (in the generators) in (2.1) and multiplies it by $(2\pi)^{-m/2}$, where m is the number of generators. Thus, if ρ, ρ̄ are the only generators, the Berezin integral of $f = c_{00} + c_{10}\rho + c_{01}\bar\rho + c_{11}\rho\bar\rho$ equals $c_{11}/(2\pi)$; the differentials dρ, dρ̄ anticommute with each other and with all Grassmann generators.

A 2m × 2m supermatrix is a matrix of the form

$R = \begin{pmatrix} A & \Sigma \\ \Gamma & B \end{pmatrix},$

where A, B (resp. Σ, Γ) are m × m matrices whose matrix elements are even (respectively, odd) elements of the Grassmann algebra. The supertrace and superdeterminant of a supermatrix are defined as

$\mathrm{Str}\, R = \mathrm{tr}\, A - \mathrm{tr}\, B, \qquad \mathrm{Sdet}\, R = \frac{\det(A - \Sigma B^{-1} \Gamma)}{\det B},$

where tr and det are the ordinary trace and determinant. We shall use the symbol R (possibly with an index) to denote 2 × 2 supermatrices of the form

$R = \begin{pmatrix} a & \bar\rho \\ \rho & ib \end{pmatrix},$

where a, b are real numbers and ρ, ρ̄ are generators of the Grassmann algebra.
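The sign rules above can be made concrete in a few lines. The toy implementation below is our own illustration (the class name and the storage convention are not from the text): an element of the Grassmann algebra is stored by its coefficients c_IJ, and multiplication tracks the sign of the permutation sorting the generators. It checks anticommutation, nilpotency, and the truncation of the exponential:

```python
class Grassmann:
    """Element of the Grassmann algebra with generators 0, 1, ..., stored as
    {sorted tuple of generator indices: coefficient}."""
    def __init__(self, coeffs=None):
        self.c = {k: v for k, v in (coeffs or {}).items() if v != 0}

    @staticmethod
    def gen(i):
        return Grassmann({(i,): 1.0})

    def __add__(self, other):
        r = dict(self.c)
        for k, v in other.c.items():
            r[k] = r.get(k, 0) + v
        return Grassmann(r)

    def __rmul__(self, scalar):
        return Grassmann({k: scalar * v for k, v in self.c.items()})

    def __mul__(self, other):
        if not isinstance(other, Grassmann):
            return self.__rmul__(other)
        r = {}
        for k1, v1 in self.c.items():
            for k2, v2 in other.c.items():
                if set(k1) & set(k2):
                    continue                 # generators square to zero
                m, sign = list(k1 + k2), 1
                for i in range(len(m)):      # bubble sort, tracking the sign
                    for j in range(len(m) - 1):
                        if m[j] > m[j + 1]:
                            m[j], m[j + 1] = m[j + 1], m[j]
                            sign = -sign
                key = tuple(m)
                r[key] = r.get(key, 0) + sign * v1 * v2
        return Grassmann(r)

rho, rhobar = Grassmann.gen(0), Grassmann.gen(1)
anti = rho * rhobar + rhobar * rho            # anticommutator, vanishes
square = rho * rho                             # nilpotency
# exp(-z rhobar rho) = 1 - z rhobar rho, since (rhobar rho)^2 = 0
z = 0.7
expz = Grassmann({(): 1.0}) + (-z) * (rhobar * rho)
# the Berezin integral selects the top-degree coefficient (up to normalization)
top_coeff = expz.c.get((0, 1), 0.0)
```

Note that in the basis ordered as ρρ̄, the element ρ̄ρ carries coefficient −1, so the top coefficient of $e^{-z\bar\rho\rho}$ is +z; the overall sign and the $(2\pi)^{-m/2}$ normalization depend on the conventions fixed in the text.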
By a superfunction F of one or several variables R we mean an expression of the form (2.2) in which the z variables are replaced by a, b. See [Ber13] for more details on these conventions and definitions.
We will use a supersymmetric representation for the average Green's function, for which we will now introduce some notation.
Let $J := -W^2 \Delta_N + \mathbb{1}$ be the inverse of the covariance introduced in (1.4). Abbreviate $E_\varepsilon := E + i\varepsilon$. Let Γ_a and Γ_b be two horizontal lines in ℂ, oriented from left to right, such that Γ_a lies below the pole E + iε ∈ ℂ, while no additional constraint is needed for Γ_b. The integral in a_x is taken along Γ_a, and the integral in b_x is taken along Γ_b. We will denote $a = (a_x)_{x=1}^N$ and $b = (b_x)_{x=1}^N$.
Proposition 2.1. In the above notation the following identities hold.
and setting E, Ẽ ∈ ℝ, by the same arguments as in [DPS02, Sect. 3] the generating function can be written as follows, where we used $(\partial_{\tilde E_y} + \partial_{a_y})\, \mathrm{Sdet}(\hat E_y + i\varepsilon \mathbb{1}_2 - R_y) = 0$ and performed integration by parts in a_y. The same argument applies to $\partial_{E_y} J(E\mathbb{1}_N, E\mathbb{1}_N)$. This completes the proof of (2.6). Performing integration by parts twice, we obtain (2.7). Similar arguments yield (2.8): here we used that $(\partial_{a_y} - i\partial_{b_y})(a_y - ib_y) = 0$ for any y, and the first summand can be written as

$\Big\langle \Big(\sum_y \mathrm{Str}\, R_y\Big)^{n-1} \Big\rangle_{SUSY} = \partial_\gamma^{\,n-1}\, \Big\langle e^{\gamma \sum_y \mathrm{Str}\, R_y} \Big\rangle_{SUSY} \Big|_{\gamma = 0} = 0 \quad \forall n > 1, \tag{2.9}$

since, performing the translation $R \to R + \gamma \mathbb{1}_2$, we see that the average does not depend on γ. This completes the proof of (2.8).
Identities of the form (2.5) for supersymmetric averages of supersymmetric functions go back to the work of Parisi and Sourlas, cf. [Weg16,Chapter 15]. Note that the first two items appear also in this form, for example, in [SS16;DL17].
Translation to the saddle. Disregarding mathematical precision in our thinking about Grassmann variables, we have the following heuristics. The Laplacian coupling forces the "integration field" R_x to be almost constant, and the Laplace method then implies that the main contribution comes from fields R_x in the vicinity of the "saddle points" of $e^{-\frac{1}{2} \mathrm{Str}\, R_x^2}\, (\mathrm{Sdet}(E_\varepsilon - R_x))^{-1}$.

Now we proceed with the rigorous argument. Setting the Grassmann variables equal to 0 and neglecting the ε contribution, it is easy to see that the saddle points (i.e. the critical points of the logarithm) are at

$a_\pm = \frac{E}{2} \pm i \sqrt{1 - \frac{E^2}{4}}, \tag{2.10}$

and similarly for ib. The point a₊ is in the same half space as the pole; therefore we choose as contour Γ_a the translate of the real axis passing through a₋. We will abbreviate $\mathcal{E} := a_- = \frac{E}{2} - i\sqrt{1 - \frac{E^2}{4}}$. Note that $\mathcal{E} \bar{\mathcal{E}} = 1$. It will turn out later that b₋ is the dominant saddle point.³ We therefore choose as contour Γ_b the translate of the real axis passing through b₋. After this translation, the limit ε ↘ 0 can be performed by dominated convergence. We introduce the modified SUSY average ⟨·⟩′_{SUSY}, (2.11), where ε is now set equal to 0 and the integrals in a_x, b_x are taken along the real axis. Moreover, we used $E - \mathcal{E} = \bar{\mathcal{E}}$, and the last term vanishes since Str 𝟙₂ = 0. We then have the following.

Corollary 2.2. The identities (2.12)-(2.14) hold.

Proof. (2.12) follows from (2.6) by replacing a and ib by a + 𝓔 and ib + 𝓔. In the same way, (2.13) follows from (2.7) by observing that, in addition, (2.12) implies $\langle (Ja)_{y'} \rangle'_{SUSY} = \langle (J(ib))_{y'} \rangle'_{SUSY}$.

³ If, as is the case for GUE, ρ_x ≡ ρ were a collective variable, i.e. a single Grassmann generator, and similarly for ρ̄, this could be seen heuristically by evaluating the integrand at the saddles, $\mathcal{I}\big|_{R = R_\pm,\, \varepsilon = 0} = I_0^\pm + I^\pm\, \bar\rho\rho$. If there were no observable, the Grassmann integral ∫ dρ̄ dρ would select I^±. Explicit calculation gives an I^− which is large (proportional to N), while $I^+ = -N(1 - |\mathcal{E}|^2) = 0$. A more convincing argument would have to analyze "fluctuations" in ρ, ρ̄.
Finally, (2.14) follows from (2.8), observing that $\langle (\sum_y \mathrm{Str}\, R_y)^n \rangle'_{SUSY} = 0$ for all n ≥ 1 by the same argument as in (2.9).
Note that $-\frac{1}{\pi}\, \Im\, \mathcal{E} = \frac{1}{2\pi} \sqrt{4 - E^2}$ is the semicircle density. Thus, for example, (1.9) amounts to showing that the remaining term in (2.12) is small for large W and N.
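The relation between the dominant saddle and the semicircle density is elementary to check numerically (variable names are ours):

```python
import numpy as np

# The saddle a_- = E/2 - i sqrt(1 - E^2/4) of (2.10) lies on the unit circle,
# and -(1/pi) Im a_- reproduces the semicircle density sqrt(4 - E^2)/(2 pi).
E = np.linspace(-1.9, 1.9, 9)
saddle = E / 2 - 1j * np.sqrt(1 - E**2 / 4)
on_circle = np.abs(saddle)              # identically 1
density = -saddle.imag / np.pi          # equals sqrt(4 - E^2)/(2 pi)
```

Both identities hold pointwise in E on (−2, 2), which is exactly the regime of the theorems.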

Supersymmetric transfer operator
The representation of Corollary 2.2 can be stated using a supersymmetric transfer operator. Recall that $\mathrm{Str}(R + \mathcal{E}\mathbb{1}_2) = \mathrm{Str}\, R$, since Str 𝟙₂ = 0, and define the operator T by (2.16).
Using (2.15) and (2.16) we have, for all x, the representation (2.17), where Str(·) is the "multiplication operator" by Str R and we use the convention T⁰ = id. In the same way we get, for any n ≥ 1, the representation (2.18), where we defined y₀ = 1, y_{q+1} = N + 1, n₀ = n_{q+1} = 0, and

$I^{m, n_1, \dots, n_q}_{x, y_1, \dots, y_q} := \int dR\, \big[T^{x - y_m}\, \mathrm{Str}(\cdot)^{n_m} \cdots T^{y_2 - y_1}\, \mathrm{Str}(\cdot)^{n_1}\, T^{y_1 - 1} e^{-V}\big](R). \tag{2.19}$

Moreover, in the special case m = 0 the first product is replaced by $T^{x-1} e^{-V}$, while for m = q the second product is replaced by $T^{N-x} e^{-V}$. For n = 1, (2.18) can be simplified: for x ≤ x′, I_{xx′} was defined in (2.17), and we used $I^{0,1}_{xy} = I_{xy}$, $I^{1,1}_{xy} = I_{N-x+1, N-y+1}$.

We want to reduce T to an ordinary transfer operator, involving no Grassmann integration variables. This could be done by hand, i.e. by expanding in the Grassmann variables and selecting top degree coefficients. In this paper, we will instead exploit a supersymmetry which, morally, states that T commutes with the superrotations of the supermatrix R. This will yield a purely bosonic (Grassmann-free) representation of T in appropriately defined polar coordinates.
To make this intuition precise, let ξ, η be two new Grassmann generators (which may or may not depend on ρ, ρ̄). Then the matrix $U_{\eta\xi}$ is a "superunitary" rotation in the following sense. It is easy to see that $\mathrm{Str}\, R^n = \mathrm{Str}\, (U_{\eta\xi}^{-1} R\, U_{\eta\xi})^n$ and $\mathrm{Sdet}\, R = \mathrm{Sdet}\, (U_{\eta\xi}^{-1} R\, U_{\eta\xi})$ for any n ≥ 1.
Lemma 2.4. The measure dR is invariant under superunitary rotations: for any Schwartz function F(R) one has

$\int dR\, F(U_{\eta\xi}^{-1} R\, U_{\eta\xi}) = \int dR\, F(R). \tag{2.21}$

Proof. This is a special case of the general framework of Berezin integration [Ber13], as applied to the conjugation supersymmetry at hand. We will prove it here by direct computation. The function F has a unique decomposition into monomials in the Grassmann generators with smooth coefficients.

Replacing in this formula the variables by their images under the superunitary rotation and expanding by nilpotency, we compare the two sides term by term; here we used $n^2 = -2\bar\rho\rho\,\eta\xi$ and $n^3 = 0$. Performing now the integration over ρ, ρ̄, we obtain a sum of terms of the form considered above, and the claim follows.

Every supermatrix R with (a, b) ≠ (0, 0) has a polar decomposition as follows, (2.24). We call λ = (λ₁, λ₂) the eigenvalues of R, and (λ, η, ξ) the polar coordinates of R.
A function F invariant under superunitary rotations depends on the eigenvalues λ₁, λ₂ only. In this case we will write F(R) = f(λ₁, λ₂). In particular, we have $\mathrm{Str}\, R^n = \lambda_1^n - (i\lambda_2)^n$ for all n ≥ 0 and $\mathrm{Sdet}\, R = \lambda_1/(i\lambda_2)$. Hence the potential term in T, (2.15), becomes a function of λ alone, (2.25). Note that λ₁, λ₂ are even elements of the Grassmann algebra. We will see below that inside an integral we can replace them by ordinary real variables.
Lemma 2.5. The transfer operator T preserves superunitary rotation invariance. Precisely, let F(R) be invariant under superunitary rotations, with all coefficients F_I(a, b) smooth and suitably bounded; then TF is invariant as well.

Proof. We abbreviate $U_{\eta\xi}$ by U. In the second line we used $V(U^{-1} R U) = V(R)$ (since both Sdet and Str are invariant) and $F(R') = F(U R' U^{-1})$ (since we assumed F is invariant). Finally, in the last line we used (2.21) with U replaced by U⁻¹.
Remark 2.6. This lemma implies that we can always replace R by the diagonal matrix diag(λ₁, iλ₂) when evaluating (TF)(R), as long as F is invariant under superunitary rotations and Str R = a − ib ≠ 0.

Some useful identities
For certain functions the transformed TF can be explicitly computed.
Let $d\mu_z$, (2.26), denote the supergaussian measure with variance z⁻¹. By direct computation one obtains (2.27).

Lemma 2.7. Let 1(R) := 1 be the constant function, and $\Omega_\alpha(R) := e^{-W\alpha\, \mathrm{Str}\, R^2}$ the Gaussian with α ∈ ℂ, ℜα > 0. Then the following identities hold, where f is a polynomial in λ and we used the measure defined in (2.26) with z = W²/µ.
Proof. The first identity above follows by translating all variables, R′ → R′ + R, and applying (2.27). For the second identity we first complete the square and then perform the translation R′ → R′ + µR. The second equality follows from (2.27), since $W\mu_\alpha = W + 2\alpha$ has positive real part. Repeating the same arguments we get (2.30).
The following identities will be useful later.
Lemma 2.9. For any z ∈ ℂ with ℜz > 0 and any supermatrix R, the following identities hold.

Proof. Without loss of generality we may assume R to be diagonal, R = diag(λ₁, iλ₂). Indeed, by superunitary rotations any $R = \begin{pmatrix} a & \bar\rho \\ \rho & ib \end{pmatrix}$ with (a, b) ≠ (0, 0) can be reduced to a diagonal supermatrix; the case (a, b) = (0, 0) can then be recovered by continuity, remarking that both sides of the identities above are polynomials in the variables a and b.

(a) By (2.38), $\int d\mu_z(R')\, \mathrm{Str}(R')^2 = 0$, and the remaining integral vanishes by parity under R′ → −R′.

(b) The second and fourth terms are odd under R′ → −R′, hence the corresponding integrals vanish. Direct computation shows that the integral of each remaining matrix component equals zero.
(c) Expanding, we see that the second and third terms are odd, i.e. change sign under R′ → −R′, hence the corresponding integrals vanish. The integral of the first term vanishes by (2.38). Since the integral of each matrix component of R′² equals zero, the fifth term also disappears. It remains to consider Str(RR′)². Since R is assumed to be diagonal, the integrals of the last two terms vanish by parity, while the integral of the first term vanishes by (2.38).
This concludes the proof.
As a direct consequence of this lemma, we can compute exactly the action of T on certain functions. This is done in the following corollary.

Transfer operator in polar coordinates
In the following we will consider only functions R ↦ F(R) taking values in even elements of the Grassmann algebra and such that F(R) contains no Grassmann generators except ξ, η (equivalently ρ̄, ρ). If such an F is invariant under superunitary rotations, then the corresponding function f maps ℝ² to ℝ. Our goal is to write TF as an operator T acting on f instead. This requires a change of coordinates in the integral, (a, b, ρ̄, ρ) → (λ₁, λ₂, η, ξ). The operation is well defined only for functions that vanish at the origin: if F(R) is a "nice" function (we will make this precise below) with F(0) = 0, then the Berezin integration formula (cf. [Ber13]) expresses ∫ dR F(R) in polar coordinates, where $\mathrm{Ber} = (\lambda_1 - i\lambda_2)^2$ is the so-called Berezinian (the super-analogue of the Jacobian determinant), and a, b, λ₁, λ₂ are real integration variables. The function appearing in our transfer-operator representation does not, in general, vanish at the origin. To solve the problem one can decompose the integrand into two pieces $\tilde F_1 + \tilde F_2$, where the functions $\tilde F_1(R')$, $\tilde F_2(R')$ must satisfy: (a) both functions are 'nice', in particular integrable; (b) $\tilde F_1(0) = 0$, so that we can apply the change of coordinates; and (c) the integral $\int dR'\, \tilde F_2(R')$ is easy to compute without any coordinate change. When F is invariant under superunitary rotations, the following two choices are especially convenient.
In both cases the function $\tilde F_1$ vanishes at 0. Moreover, the function $\tilde F_2$ can be evaluated exactly: the first integral can be performed directly as the one in (2.28), while the second integral is a consequence of the rotation symmetry (cf. Remark 2.13 below). We will see below that these choices translate into two equivalent representations of the operator T acting on f. We first need some notation. For f : ℝ² → ℝ, we denote by Λ the operator of multiplication by $\Lambda = \lambda_1 - i\lambda_2$ (from the Ber term in (2.4)), and by $\bar\Lambda$ the multiplication by $\bar\Lambda = \lambda_1 + i\lambda_2$, so that $\Lambda\bar\Lambda = \lambda_1^2 + \lambda_2^2$. Finally, we denote by $e^{-V}$ the multiplication by $e^{-V(\lambda)}$ (cf. (2.25)).

Proposition 2.11. Assume that R ↦ F(R) takes values in even elements of the Grassmann algebra, that F(R) contains no Grassmann generators except ξ, η (equivalently ρ̄, ρ), and that the coefficients of F satisfy a growth bound with constant K < W²/2. Then TF is also rotation invariant. Moreover, T can be represented as an operator acting on f; thus if f vanishes at the origin, so does Tf. For f vanishing at the origin we then have Tf = T f.
Remark 2.13. A direct consequence of this result is the following localization identity. The above decomposition appears already in [Con88] and is a special case of the general framework of Berezin integration [Ber13] applied to the conjugation supersymmetry at hand. Generalizations to more complicated supersymmetries have been an important theoretical tool in condensed matter physics. All such results are proven by an inspired sequence of elementary applications of Stokes' theorem. We give a proof of the present simple case for the convenience of the reader.
Proof. Let F be as above. Then, setting $\hat\lambda = \mathrm{diag}(\lambda_1, i\lambda_2)$, we get the expression below from Lemma 2.5. The arguments of f are even elements of the Grassmann algebra; inserting all this in the integral, we obtain the first representation, and the second integral can be reorganized as follows. In the first line we use Stokes' theorem in the form of the Cauchy-Pompeiu [= Cauchy-Green] formula, (2.39), where the first term on the right-hand side vanishes in the limit r → ∞ as long as our test function f(λ) does not increase too fast. This concludes the proof of the first representation of T. The second one follows in the same way.

Recall that $G[H](z) = (z - H)^{-1}$.
where q, m, n_i, x, y_i are defined as in (2.19). Also set I_{xx′} as above for x ≤ x′, and for x′ < x set $I_{xx'} = I_{N-x+1, N-x'+1}$. As we see from the proof below, these definitions are consistent with (2.19) and (2.17).
For n > 1 we have the following.

Proof. From Corollary 2.2 and the relations (2.17), the proof of the first two identities reduces to the study of the integrals above. In the last line we applied again Stokes' theorem, (2.39), and we used that $a f_i(a, b) = 0$ for a = b = 0. The proof of the third identity is done in the same way.

Analysis of the transfer operator
So far we treated T and T as formal operations. To apply spectral-theoretic methods, recall that for f vanishing at the origin, Tf = T f, and hence $T^n f = T^n f$ for all n. It is safe to use expressions such as $e^{-V/2}$ and V, since $e^{-V}$ does not vanish, except at λ₂ = 1 when E = 0 (cf. (2.15)); we will show in Sect. 4.4 below that this problem is easy to deal with. For f sufficiently regular (but not necessarily vanishing at zero), the computation of $T^n f$ will be reduced to the following two ingredients.
(i) We show in Section 3.1 that the operator e − 1 2 V δ ⋆ W 2 e − 1 2 V is bounded in L p (R 2 ) for any p ∈ [1, ∞], and spectral theory yields a good bound on its norm for p ∈ (1, ∞).
(ii) In Section 3.2, we construct a solution u of the eigenvalue equation Tu = u satisfying in addition u(0) = 1. Unlike in Section 3.1, the argument relies on explicit elementary computations rather than on methods of operator theory, and the term 'eigenfunction' is used without specifying the construction of the operator.⁴

We then recover $T^n f$ from (i) and (ii) via the decomposition f = f(0)u + [f − f(0)u]. Note that u(0) = 1 ensures (f − f(0)u)(0) = 0. The results of this section mostly pertain to the case |E| ≤ √(32/9). The extension to all |E| < 2 is discussed in Section 5.
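The strategy (i)+(ii) can be illustrated on a finite-dimensional toy model. The operator below is a generic stand-in (a discretized kernel with a hypothetical quadratic potential, not the actual operator T of (2.37)); the point is only the mechanism: a simple top eigenvalue plus a spectral gap let one compute powers through the splitting f = c u + (f − c u):

```python
import numpy as np

# Toy operator K = e^{-v/2} (Gaussian smoothing) e^{-v/2} on a grid.
x = np.linspace(-6.0, 6.0, 300)
dx = x[1] - x[0]
v = 0.5 * x**2                                        # hypothetical potential
G = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.5)   # smoothing kernel
G *= dx / np.sqrt(0.5 * np.pi)                        # row mass ~ 1
K = np.exp(-v[:, None] / 2) * G * np.exp(-v[None, :] / 2)   # symmetric
evals, evecs = np.linalg.eigh(K)
top, second = evals[-1], evals[-2]                    # top eigenvalue, gap below

u = evecs[:, -1]                                      # top eigenfunction
f = np.exp(-np.abs(x))                                # arbitrary test function
c = u @ f
g = f - c * u                                         # remainder, orthogonal to u
n = 10
Kn_f = np.linalg.matrix_power(K, n) @ f
err = np.linalg.norm(Kn_f - top**n * c * u)           # <= second^n * ||g||
```

The remainder is contracted geometrically at rate (second/top)^n, which is the finite-dimensional shadow of the spectral-gap bound used in the text.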

Operator norm bound
Denote by ‖·‖_p the norm of an operator acting on $L^p(\mathbb{R}^2)$. We recall for further use that, for an integral operator with kernel K,

$\|K\|_1 \le \sup_{\lambda'} \int |K(\lambda, \lambda')|\, d\lambda, \qquad \|K\|_\infty \le \sup_{\lambda} \int |K(\lambda, \lambda')|\, d\lambda', \tag{3.2}$

and that, for all 1 < p < ∞,

$\|K\|_p \le \|K\|_1^{1/p}\, \|K\|_\infty^{1 - 1/p}. \tag{3.3}$

The combination of the relations (3.2) and (3.3) is known as Schur's bound. It immediately follows from (2.36) and (3.3) that $\|\delta \star_{W^2}\|_p \le 1$ for any p ∈ [1, ∞]. It is also easy to check that, for |E| ≤ √(32/9), ℜV(λ) has non-degenerate global minima at λ = 0 and at $\lambda = (0,\, 2\sqrt{1 - E^2/4})$, with value 0. In particular, ℜV ≥ 0, so that $\|e^{-V}\|_p \le 1$, where $\|e^{-V}\|_p$ means the operator norm of the corresponding multiplication operator. In particular,

$\big\| e^{-\frac{1}{2}V}\, \delta \star_{W^2}\, e^{-\frac{1}{2}V} \big\|_p \le 1. \tag{3.5}$

The following standard semiclassical argument improves these bounds for 1 < p < ∞.
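Schur's bound is elementary to sanity-check on a finite kernel (a generic random matrix; variable names are ours):

```python
import numpy as np

# For a matrix kernel: the l^1 -> l^1 norm is the max column sum, the
# l^inf -> l^inf norm is the max row sum, and Schur's bound gives
# ||K||_{2->2} <= sqrt(col * row).
rng = np.random.default_rng(2)
K = rng.standard_normal((80, 80))
col = np.abs(K).sum(axis=0).max()    # operator norm on l^1
row = np.abs(K).sum(axis=1).max()    # operator norm on l^infty
op2 = np.linalg.norm(K, 2)           # operator norm on l^2 (spectral norm)
```

For a Gaussian matrix the spectral norm is of order √N while the row and column sums are of order N, so the bound holds with a large margin; the inequality itself is valid for every kernel.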
Proposition 3.1. For any p ∈ (1, ∞) there exists c > 0 such that the following estimate holds.

Remark 3.2. The estimate fails for p = 1, ∞.

Proof. We prove the estimate for p = 2, in which case this is a standard semiclassical argument (see [Hel02]), which we repeat for the convenience of the reader. The general case follows from the case p = 2 and (3.5) by Riesz-Thorin interpolation.
The following corollary will be a key ingredient in the proof of our theorems.
Corollary 3.4. Let f ∈ L^p with 1 < p < ∞ be such that Λ^{−1}f ∈ L^p. Then we have for all n ≥ 1, m ≥ 0 (3.6) Proof. For the first bound we use where in the first line we applied equation (3.1) and in the second line equation (3.5) and Proposition 3.1. For the second bound we use (3.8). The bound ‖Λ^{m+1} e^{−V}‖_∞ ≤ C^{m+1}(m + 1)! follows from the quadratic growth of ℜV at infinity.
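The factorial bound at the end rests on an elementary fact: a quadratically growing exponent absorbs polynomial weights at factorial cost. A quick numerical check (illustrative; the constant C in the text accounts for the precise growth of ℜV):

```python
import math

# sup_{t>=0} t^m * exp(-t) = (m/e)^m, which is at most m! by
# Stirling's lower bound m! >= (m/e)^m; check on a grid.
for m in range(1, 12):
    sup = max((k / 1000.0) ** m * math.exp(-k / 1000.0) for k in range(40001))
    assert sup <= math.factorial(m)
```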
We conclude this subsection with a few additional properties of the operator T that will prove important later.
Lemma 3.5. The following holds.
(i) Let p ∈ (1, ∞), n ∈ N and 1 ≤ q ≤ p be such that f, Λ^n f ∈ L^p and Λ^{−1}f ∈ L^q.
Then Tf ∈ C^∞(R²) and there exists a constant C = C_{p,q} > 0 such that (3.9) (ii) Let n ∈ N and let f ∈ L^1 ∩ L^∞ be a C^1 function such that Λ^n f ∈ L^1 ∩ L^∞. Then Tf is smooth, e^V Tf ∈ L^p for all p ∈ (1, ∞), and it satisfies the bound Proof of (i). We drag the multiplier Λ^{n+1} through δ ⋆ W², as follows. The decomposition where in the last term we extracted a fraction of the exponential decay to bound the factor |Λ − Λ′|. The result (i) now follows from (3.10). Proof of (ii). From definition (2.37), ‖e^V Λ^n Tf‖_p ≤ ‖Λ^n e^{−(W²/2)λ²}‖_p |f(0)| + ‖e^V Λ^n T f‖_p. The first term is bounded by ‖Λ^n e^{−(W²/2)λ²}‖_p |f(0)| ≤ C^{n+1}(n + 1)! W^{−n−2/p} |f(0)|. The result now follows from (3.9) with q = 1. The condition Λ^{−1}f ∈ L^1 is ensured by f ∈ C^1 ∩ L^1 and f(0) = 1. Finally, f, Λ^n f ∈ L^1 ∩ L^∞ ensures that f, Λ^n f ∈ L^p for all p ∈ (1, ∞).
Remark 3.6. Note that while the function e −V is bounded, the function e V develops a singularity of the form 1/(λ 2 − 1) at λ 2 = 1 for E = 0. The lemma above shows this causes no problem, as long as the function appears together with the operator T.

The top eigenfunction
Using the bound of the previous subsection, we will construct a solution u of the equation Tu = u, normalized so that u(0) = 1. We call this solution an eigenfunction (with eigenvalue 1), without specifying an operator-theoretic construction.
The strategy is as follows. First, we guess an approximate eigenfunction u_0 such that Tu_0 ≈ u_0; this is done below in Proposition 3.8. Then we upgrade it to the true eigenfunction u := u_0 + Σ_{n≥0} T^n(Tu_0 − u_0). (3.11) The next proposition ensures that this procedure is justified, and yields an eigenfunction that is close to u_0.
Proposition 3.7. Let u_0 ∈ L^1 ∩ L^∞ be a C^1 function such that u_0(0) = 1. We define v := Tu_0 − u_0. (3.12) Then for all p ∈ (1, ∞) it holds that v ∈ L^p and Λ^{−1}v ∈ L^p. Moreover, the series converges in L^p to a solution u of the equation Tu = u which satisfies and (3.15) The limit function u is independent of the initial choice of u_0.
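The mechanism behind Proposition 3.7 is purely algebraic: if the series u = u_0 + Σ_{n≥0} T^n v with v = Tu_0 − u_0 converges, then Tu = u. A finite-dimensional sketch (hypothetical 6×6 model of T: it fixes the value at the distinguished coordinate and contracts the complementary subspace, mimicking the role of the origin):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
P = np.eye(d); P[0, 0] = 0.0          # projector onto {x_0 = 0}
A = P @ rng.standard_normal((d, d)) @ P
A *= 0.5 / np.linalg.norm(A, 2)       # contraction on the subspace
b = P @ rng.standard_normal(d)
e0 = np.zeros(d); e0[0] = 1.0
T = np.outer(e0 + b, e0) + A          # T e0 = e0 + b, T restricted to {x_0=0} is A

u0 = e0 + P @ rng.standard_normal(d)  # ansatz with u0[0] = 1
v = T @ u0 - u0                       # v[0] = 0, so the series converges
u = u0.copy()
term = v.copy()
for _ in range(200):                  # u = u0 + sum_n T^n v
    u += term
    term = T @ term

assert abs(u[0] - 1.0) < 1e-9         # normalization survives
assert np.allclose(T @ u, u)          # genuine eigenvector, eigenvalue 1
```

The telescoping computation Tu = Tu_0 + Σ_{n≥1} T^n v = u_0 + v + Σ_{n≥1} T^n v = u holds for any operator for which the series converges; convergence here uses only the contraction on the subspace of vectors vanishing at the distinguished coordinate.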
The independence of the limit from u_0 follows from (3.17) again.

Approximate eigenfunction
We need to make a good choice of u_0 in order for u − u_0 to be relatively small. Since we expect u to also be an eigenfunction of the supersymmetric operator T, we take as initial ansatz some function U_0(R) invariant under superunitary rotations, hence U_0(R) = u_0(λ). Guided by equation (2.29) and Corollary 2.10, and in analogy with semiclassical analysis, we take as ansatz where Q^{(j)} is a polynomial in R of degree j, consisting of sums of products of supertraces, which we assume invariant under superunitary rotations: Q^{(j)}(R) = q^{(j)}(λ). Finally, α ∈ C is some constant. Then from (2.30) (3.19) By abuse of notation we will use the same letter Err^{(M)}(·) to denote the function of R and the function of λ. Then The precision of our approximation is therefore determined by the leading order in W^{−1/2} of Err(R/√W). The following proposition shows that we can make this error function at least as small as O(W^{−3}).
Proposition 3.8. Let α be the solution of the equation where c_3, c_4^j (j = 0, …, 3) and c_5^j (j = 0, …, 7) form the solution of the following system of equations (3.23) Proof. It is more convenient to study the error term in the R coordinates. After the rescaling R → W^{−1/2}R the polynomials Q_3, Q_4 can be written as where we defined The W prefactors ensure that all terms in P_4 contribute to the same order W^{−2} and all terms in P_5 contribute to the same order W^{−5/2}. It follows from Corollary 2.10 and the additional formulas in the Appendix that
where the sum is absolutely convergent for |λ| small. Moreover Inserting all this into the rescaled error function yields, for M = 0, Finally, the same arguments yield Err^{(5)}(R W^{−1/2}) = O(W^{−3}). The result follows.
Remark 3.9. Note that Proposition 3.8 remains valid for all |E| < 2, since it only relies on the properties of the kernel in the vicinity of the origin. For the same reason, the conclusion remains valid if the kernel (or the contour) is deformed outside a vicinity of the origin. We shall use this in Section 5.
A first consequence of these results is the following Corollary.
Note that, combining Proposition 3.8 with equation (3.14), we have: To improve these bounds, we need a longer argument.
Proof. First, note that where in the first line we used (3.15) and in the last line we used the explicit expression Tu_0^{(3)} = e^{−V(λ)} e^{−αWµλ²}(1 + Wµ³ q^{(3)}(λ)), together with the constraint n ≥ 1. The same argument yields the bound on ‖e^V Λ^n u‖_p.
Remark 3.11. In the rest of the paper we will mostly use u_0^{(0)} and u_0^{(3)} rather than u_0^{(5)}. While the latter gives a better approximation of u, the first two are easier to deal with. This last feature is particularly useful in some parts of the proof.
A key ingredient of our proofs will be the comparison of the function e −V , where the operator T n applies, with the exact eigenfunction u. This is done in the following lemma.
Proof. The first bound follows directly from ‖e^{−V}‖_p ≤ O(1) and Remark 3.11. It suffices to prove the second bound with u replaced by u_0^{(3)}. By similar arguments we may replace e^{−V} by e^{−λ²/2} in the estimate. We have We use different bounds in the various integration regions. For t ≤ W^{−1/2} we have In all other regions we estimate e^{−t²/2} and e^{−Wαt²} separately. A direct computation gives This concludes the proof.
4 Proof of Theorems 1 and 2 away from the edges
Throughout this section we assume that |E| ≤ √(32/9). This assumption will be relaxed in Section 5. For technical reasons we will initially also assume that E ≠ 0; we explain at the end of this section how to deal with E = 0.

Preliminary results
Recalling the definitions of ρ(E) and ρ_N(E) in (1.5), the relevant quantities to study are lim_{ε↘0} where H_N is the finite-volume operator, whereas H is the operator in infinite volume. From (2.41) we have where I_N(y) = (1/2π) and where we used (e^{−V} − u)(0) = 0. Therefore I_N(y) = I_1 + I_2(y − 1) + I_2(N − y) + I_3(y − 1), where (4.3) Note that I_1 is independent of N and y. The following proposition estimates the decay of I_2 and I_3 and is a key ingredient in the proof of our results. The estimate we obtain here for I_2 is not optimal; we will prove a stronger bound in Proposition 4.3 below.
Proposition 4.1. Let I_2, I_3 be as above. Then for all k ≥ 1 Proof. For k ≥ 2 we write Inserting absolute values and setting 1/q + 1/p = 1, p > 2, yields where in the first line we applied (3.6), while in the second line we used (3.27) for p > 2 and Corollary 3.10. For k = 1, using again (3.27) for p > 2 and Corollary 3.10, the bound reduces to It remains to prove the decay of I_3(k). Since N ≫ 1, it holds that k − 1 > 2 or N − k > 2. We assume without loss of generality k − 1 > 2. Then applying (3.6) and (3.27) for p = 2 we get A first consequence of these estimates is the following representation for the infinite-volume Green's function.
Proposition 4.2. Let I_1, I_2, I_3 be as in (4.3). Then {2I_2(y) + I_3(y)}. (4.5) Proof. Let y = y(N) be such that min(y, N − y) ≥ N^{0.01}, and rewrite For fixed ε, the second term vanishes in the limit N → ∞. Indeed, it is equal to a sum of several boundary terms. Each of these terms tends to zero: indeed, |G_{1y}[H](E_ε)| ≤ ε^{−1}, whereas |G_{y1}[H_N](E_ε)| tends to zero by (an appropriate version of) the Combes–Thomas bound (see e.g. [AW15]). Precisely, let X_N denote the event |(H_N)_{jk}| ≤ K N^α J_{jk}^{1/2}, where J_{jk} = ((−W²∆_N + id)^{−1})_{jk} is the covariance of the random matrix H_N and decays exponentially, J_{jk} ≤ (c/W) e^{−|j−k|/W}; here 0 < α < 1 and K ≫ 1 are some fixed parameters. Then sup_x Σ_y |(H_N)_{xy}(e^{δ|x−y|} − 1)| ≤ K′ N^α as long as J_{jk}^{1/2} e^{δ|j−k|} retains some exponential decay. By Combes–Thomas where c > 0 is some constant and we used y/N^α → ∞ as N → ∞. To conclude, we show that X_N^c has vanishing probability: We obtain (recall y = y(N)) where in the last equality we can exchange the limits since, after translating to the saddle in the integral representation for |G_{y1}[H_N](E_ε)|, all integrals are bounded uniformly in ε.
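The Combes–Thomas mechanism invoked above can be illustrated numerically (a sketch with illustrative parameters, not the matrices of the paper):

```python
import numpy as np

# For a Hermitian band matrix, the entries of G = (H - E - i*eps)^{-1}
# obey the a-priori bound |G_xy| <= 1/eps, and the off-diagonal entries
# decay exponentially in |x - y|/W: the content of Combes-Thomas.
rng = np.random.default_rng(1)
N, W, E, eps = 200, 5, 0.0, 3.0
H = np.zeros((N, N))
for x in range(N):
    for y in range(x, min(N, x + W + 1)):
        H[x, y] = rng.standard_normal() / np.sqrt(W)
        H[y, x] = H[x, y]
G = np.linalg.inv(H - (E + 1j * eps) * np.eye(N))
assert np.max(np.abs(G)) <= 1.0 / eps + 1e-12   # trivial resolvent bound
assert np.abs(G[0, N - 1]) < 1e-4 * np.abs(G[0, 0])  # exponential decay
```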
The result now follows from representation (4.2) and estimates (4.4).
Using this proposition, the two statements of Theorem 1 reduce to a study of I_1 and of I_2, I_3, respectively. However, to obtain the error estimate (1.8) we will need an improved version of (4.4) for I_2(k), which requires substantially more work. This will be done in Proposition 4.3 below. The bound on I_1 follows from the properties of u, in particular its approximate symmetry under λ → −λ. The argument is given in Section 4.3 below.

Convergence to ρ
Our goal is to prove |ρ_N(E) − ρ(E)| ≤ C′(E)/N for |E| ≤ √(32/9) (away from the edge) and N ≥ C(E)W log W. Recalling the definitions of ρ(E) and ρ_N(E) in (1.5), Proposition 4.2 implies To extract the correct W prefactor we need the following improved version of (4.4).
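For orientation, a minimal numerical sketch of the limiting density (assuming the standard normalization with spectral edges at E = ±2, consistent with the range |E| < 2 used throughout):

```python
import math

# Semicircle density rho(E) = sqrt(4 - E^2)/(2*pi) on [-2, 2]
# (assumed normalization); it integrates to 1.
def rho(E):
    return math.sqrt(max(4.0 - E * E, 0.0)) / (2.0 * math.pi)

n = 200000                    # midpoint rule on [-2, 2]
mass = sum(rho(-2.0 + 4.0 * (k + 0.5) / n) * (4.0 / n) for k in range(n))
assert abs(mass - 1.0) < 1e-4
```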
Proposition 4.3 (Improved estimate on I_2(k)). Let I_2 be as above. Then Proof. Note that for any (regular enough) functions f, g it holds (4.7) Replacing f = (e^{−V} − u) and g = Λ^{−1} e^V u, the integral I_2(k) can be written as (4.8) The proof of the proposition relies on two main ingredients: (a) Λu is an approximate eigenfunction of T with eigenvalue µ, i.e. T(Λu) ≃ µ(Λu), and (b) the contribution I_2(0) from k = 0 is smaller than expected. More precisely where Rem := 3c_3 (4.10) The proofs are given below. We now decompose the integral in (4.8) as I_2(k) = I_{2,0}(k) + I_{2,1}(k) + I_{2,3}(k), where I_{2,0} was defined above and For the first integral we obtain where we applied (3.27) for p > 2. Finally, applying Corollary 3.10, where we used Λ = Str R. From equation (2.30) and Lemma A.1 (d) we find, and Err^{(0)}, Err^{(3)} were introduced in (3.19). Then (3.20) yields (4.9). In order to prove (4.10) we insert the decomposition T^n = µ^n id + Σ_{l=0}^{n−1} µ^{n−1−l} T^l (T − µ id): Using |µ| ≤ 1 − c/W we get The estimates (3.23) now yield The same bound holds for ‖e^V Rem‖_p. This completes the proof.
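The telescoping decomposition T^n = µ^n id + Σ_{l=0}^{n−1} µ^{n−1−l} T^l (T − µ id) used in this proof is an exact algebraic identity, valid for any operator T and scalar µ. A matrix sanity check (illustrative values):

```python
import numpy as np

# Verify: T^n = mu^n * id + sum_{l=0}^{n-1} mu^{n-1-l} T^l (T - mu*id).
rng = np.random.default_rng(2)
T = rng.standard_normal((5, 5))
mu, n = 0.9, 7
lhs = np.linalg.matrix_power(T, n)
rhs = mu ** n * np.eye(5)
for l in range(n):
    rhs = rhs + mu ** (n - 1 - l) * np.linalg.matrix_power(T, l) @ (T - mu * np.eye(5))
assert np.allclose(lhs, rhs)
```

Expanding the sum, consecutive terms µ^{n−1−l}T^{l+1} and µ^{n−l}T^l cancel pairwise, leaving T^n − µ^n id.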
Proof of Proposition 4.5. Inserting the decomposition (4.11) The first two integrals vanish. This is most easily seen by going back to the R coordinates: (4.12) The first integral equals 0 by parity, the second by explicit computation. In the same way where the last integral vanishes by parity and the other contributions vanish by (4.12). Hence, using 1 − e^{−V} = O(λ²), Finally, using Corollary 3.10 again, This concludes the proof.

Proof of Theorem 2
Our goal is to prove the estimate |∂_E^n ρ_N(E)| ≤ W^{n−1} C(E)^n n!, n ≥ 1, uniformly in N. Using the supersymmetric representation, we have seen that where I^{m,n_1,…,n_q}_{x,y_1,…,y_q} was defined in Corollary 2.14 and we use the convention y_0 = 1, y_{q+1} = N + 1. Since the two transfer operators agree on Λf, we can reorganize the integral as follows: I^{m,n_1,…,n_q}_{x,y_1,…,y_q} = (4.13) where we use the convention n_0 = 0 = n_{q+1}. Note that by (3.6), for all m, n ≥ 1 it holds that When f = Λu this estimate gives a factor ‖u‖_p = O(W^{−1/p}). The following lemma shows that the bound can be improved.
Lemma 4.6. For all n, m ≥ 1 where in the first line we used (3.6) and in the last line Corollary 3.10, Proposition 4.4 and the explicit form of u 0 , together with the constraint n ≥ 1.

Case n = 1
For n = 1, from Corollary 2.14 we have the representation lim_{ε↘0} G_{yy} and for x′ < x we set We insert again the decomposition e^{−V} = u + (e^{−V} − u), which yields I_{xx′} = (4.14) and where we used in addition (4.7). The first integral is bounded by where we used (3.6) and Corollary 3.10. The second integral is bounded by where in the first term we used again (3.6) together with (3.27) for p > 2. In the second term we used ‖e^V Λu‖_q = O(W^{−1/q − 1/2}) (cf. Corollary 3.10) for the case k′ = 0. When k′ ≥ 1 we apply (3.9) to get The estimate now follows from Lemma 4.6 and (3.6). Note that we are forced to estimate the factor e^V together with T^{k′}Λu, since for k = 0 the term e^V(e^{−V} − u) is not integrable. Finally, the constraint k + k′ + k″ = N − 1 guarantees that k ≥ 1 or k′ + k″ ≥ 1. We can assume without loss of generality that k ≥ 1. Then using (3.27) for p = 2, if N ≥ C(E)W ln W for some large constant C(E) > 0. This completes the proof of (1.10). Performing the sum over y, y′ we obtain |∂_E ρ_N(E)| ≤ C.
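The role of the threshold N ≥ C(E)W ln W is transparent from the spectral gap |µ| ≤ 1 − c/W (cf. the proof of Proposition 4.3); schematically,

```latex
|\mu|^{N} \;\le\; \Bigl(1-\frac{c}{W}\Bigr)^{N} \;\le\; e^{-cN/W} \;\le\; W^{-c\,C(E)}
\qquad \text{for } N \ge C(E)\,W\ln W,
```

so these subleading contributions become smaller than any fixed power of W once C(E) is chosen large enough.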

Case n > 1
As in the case n = 1, we insert the decomposition e^{−V} = u + (e^{−V} − u) and reorganize the integral (applying also (4.7) where necessary) as the sum of three terms of the following form: where l ≥ 1, l′ ≥ 0, n_j, n′_k ≥ 1 and m_j, m′_k ≥ 0 for all j, k, with the constraint Σ_{j=0}^{l} n_j + Σ_{k=0}^{l′} n′_k = n. Finally, m̃, m̃′ ≥ 0 must satisfy the constraint Σ_{j=0}^{l} m_j + Σ_{k=0}^{l′} m′_k + m̃ + m̃′ = N − 1. The proof now works as in the case n = 1 and yields |I^{m,n_1,…,n_q}_{x,y_1,…,y_q}| ≤ where y_max := max[y_q, x] and y_min := min[y_1, x]. Hence

The case E = 0
At E = 0 the factor e^V may develop a pole. To solve this problem we use (3.18) to replace u by u = Tu_0 + T(u − u_0) before doing any other manipulation. The formulas become slightly more cumbersome, but each factor e^V now comes with a compensating prefactor e^{−V}.

Contour deformation
To extend the proof to the entire range E ∈ (−2, 2), the contour of integration has to be deformed. One possible strategy (followed in [Dis04]) is to rotate the contour. The rotation angle must ensure that ℜV has only one non-degenerate global minimum, at the saddle point. This can only be achieved for a rotation angle close to π/6 (cf. [Dis04, Section 5.1.2]). However, the corresponding transfer operator e^{−ℜV/2} e^{−W²(λ−λ′)²} e^{−ℜV/2} is then no longer self-adjoint. Another strategy, developed in [DS16], consists in performing a complex rotation that makes the operator approximately normal. The results of [DS16] require the resulting function e^{−ℜV} to have only one non-degenerate global minimum. In the present case we would need to rotate by approximately π/8, but then e^{−ℜV} has two minima.
Here we therefore use the following strategy: first (Section 5.1), we find a good contour Γ for the bosonic variable. After the contour deformation, the operator T is transformed into a new operator T_Γ. The main technical difficulty is to find a replacement for the operator norm bound of Proposition 3.1. We show (Proposition 5.2) that a similar bound holds when the operator is replaced by its k-th power, where k is a sufficiently large number, independent of W. Having this bound at hand, the proof follows the lines of its counterparts for |E| ≤ √(32/9). We fix an energy |E| < 2, the dependence on which is omitted from the notation. An inspection of the argument shows that all the estimates are uniform on compact subintervals of (−2, 2). (2) the angle between Γ and the real axis stays in the range (−π/4 (1 − c_Γ), π/4 (1 − c_Γ));
Note that as |E| → 2 we need to take c_Γ → 0 too. The proof is an elementary verification; we reduce it to a similar verification already performed in [Dis04].
Proof. In [Dis04, Section 5.1.2] it is proved that for any |E| < 2 there exists ζ with |ζ| = 1, |arg ζ| < π/4, such that min_{a∈ζR} ℜV_1(a) is uniquely attained at a = 0 and the singularity of ℜV_1 does not lie between ζR and R. We first choose a large C > 0 and then a small c > 0. For sufficiently large C one has ℜV_1 > const > 0 in the entire domain {|z| > C, z between R and ζR}.
In particular, one has ℜV_1 > const > 0 on the four segments (1), (2), (6), (7). For this value of C, one can choose c > 0 sufficiently small so that, by a continuity argument, the minimum of ℜV_1 on the union of the remaining segments (3), (4), (5) is uniquely attained at the origin. For these values of C and c, let Γ = Γ(c, C).
Then Γ satisfies the conditions (1)-(4), and a weakened form of (5) with Γ in place of Γ + . By an additional continuity argument, (5) also holds as stated provided that c Γ is chosen sufficiently small.
For a contour Γ, denote by K_Γ the integral operator with kernel acting on L^p(Γ × R). Here we use the convention λ² = λ_1² + λ_2². The main technical step is the following proposition, whose proof is the subject of Section 5.2.

Proof of the operator norm bound
To prove Proposition 5.2, we study the kernel of the operator K_Γ^k.
In the case |λ| ≥ 2C_Γ we split Arguing as before, with the bounds for the non-deformed case in place of Lemma 5.3 for the first integral, we obtain that the left-hand side of (5.7) is less than one and, in fact, decays exponentially with k. This concludes the proof of (5.5).