Traffic distributions of random band matrices

We study random band matrices within the framework of traffic probability, an operadic non-commutative probability theory introduced by Male based on graph operations. As a starting point, we revisit the familiar case of the permutation invariant Wigner matrices and compare the situation to the general case in the absence of this invariance. Here, we find a departure from the usual free probabilistic universality of the joint distribution of independent Wigner matrices. We then show how the traffic space of Wigner matrices completely realizes the traffic central limit theorem. We further prove general Markov-type concentration inequalities for the joint traffic distribution of independent Wigner matrices. We then extend our analysis to random band matrices, as studied by Bogachev, Molchanov, and Pastur, and investigate the extent to which the joint traffic distribution of independent copies of these matrices deviates from the Wigner case.


Introduction and main results
For a self-adjoint N × N matrix $A_N \in \mathrm{Mat}_N(\mathbb{C})$, let $(\lambda_k(A_N))_{1\le k\le N}$ denote the eigenvalues of $A_N$, counting multiplicity, arranged in non-increasing order. We write $\mu(A_N)$ for the empirical spectral distribution (or ESD for short) of $A_N$, i.e.,
$$\mu(A_N) = \frac{1}{N}\sum_{k=1}^{N} \delta_{\lambda_k(A_N)}.$$
Definition 1.1 (Wigner matrix). Let $(X_{i,j})_{1\le i<j<\infty}$ and $(X_{i,i})_{1\le i<\infty}$ be independent families of i.i.d. random variables: the former real-valued (resp., complex-valued), centered, and of unit variance; the latter real-valued and of finite variance. Taken together, the two families define a random real symmetric (resp., complex Hermitian) N × N matrix $X_N$ with entries given by
$$X_N(i,j) = \begin{cases} X_{i,j} & \text{if } i < j,\\ X_{i,i} & \text{if } i = j,\\ \overline{X_{j,i}} & \text{if } i > j.\end{cases}$$
We call $X_N$ an unnormalized real (resp., complex) Wigner matrix. We introduce the standard normalization via a Hadamard-Schur product. Let $J_N$ denote the N × N all-ones matrix, and define $\mathcal{N}_N = N^{-1/2} J_N$. We call the random real symmetric (resp., complex Hermitian) N × N matrix $W_N$ defined by $W_N = \mathcal{N}_N \circ X_N$ a normalized real (resp., complex) Wigner matrix. We simply refer to Wigner matrices when the context is clear, or when considering the definition altogether.
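As a quick numerical illustration of the ESD and of the normalization $W_N = N^{-1/2} X_N$, the following Python sketch samples a real Wigner matrix and compares the low-order moments of $\mu(W_N)$ with those of the semicircle distribution (second moment 1, fourth moment 2, odd moments 0). It assumes Gaussian entries purely for convenience, and the helper names are hypothetical.

```python
import numpy as np

def wigner_matrix(n, rng):
    """Sample a normalized real Wigner matrix W_N = N^{-1/2} X_N.

    Assumption: Gaussian entries (unit variance off the diagonal and on it);
    the definition only requires centered, unit-variance i.i.d. entries.
    """
    g = rng.standard_normal((n, n))
    x = np.triu(g) + np.triu(g, k=1).T  # symmetric: X(i, j) = X(j, i)
    return x / np.sqrt(n)

def esd_moment(a, k):
    """k-th moment of the ESD mu(A) = (1/N) sum_k delta_{lambda_k(A)}."""
    eigs = np.linalg.eigvalsh(a)
    return np.mean(eigs ** k)

rng = np.random.default_rng(0)
w = wigner_matrix(1000, rng)
# Semicircle moments: m_1 = m_3 = 0, m_2 = 1, m_4 = 2 (Catalan numbers).
print([round(float(esd_moment(w, k)), 2) for k in (1, 2, 3, 4)])
```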
We define the parameter β of a Wigner matrix as the pseudo-variance of its unnormalized strictly upper triangular entries, i.e., $\beta = \mathbb{E}[X_{1,2}^2]$. Note that a Wigner matrix is a real Wigner matrix iff its parameter $\beta = 1$. We further note that the distribution of a Wigner matrix is invariant under conjugation by the permutation matrices iff its parameter $\beta \in [-1, 1] \subset \mathbb{R}$ (in general, $\beta \in \mathbb{D} \subset \mathbb{C}$). This in turn is equivalent to the real and imaginary parts of $X_N(i, j)$ being uncorrelated.
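The parameter β can be realized concretely. The sketch below assumes Gaussian real and imaginary parts that are uncorrelated, with variances $(1+\beta)/2$ and $(1-\beta)/2$ for $\beta \in [-1,1]$. This yields centered entries with $\mathbb{E}|X|^2 = 1$ and pseudo-variance $\mathbb{E}[X^2] = \beta$. The choice $\beta = 1$ gives real entries, and $\beta = 0$ gives circularly-symmetric (GUE-type) entries. The function name is hypothetical.

```python
import numpy as np

def entries_with_beta(beta, size, rng):
    """Centered complex entries with E|X|^2 = 1 and pseudo-variance E[X^2] = beta.

    Assumption (illustration only): Gaussian, uncorrelated real and imaginary
    parts with variances (1 + beta)/2 and (1 - beta)/2; requires -1 <= beta <= 1.
    """
    re = np.sqrt((1 + beta) / 2) * rng.standard_normal(size)
    im = np.sqrt((1 - beta) / 2) * rng.standard_normal(size)
    return re + 1j * im

rng = np.random.default_rng(1)
for beta in (1.0, 0.0, -0.5):
    x = entries_with_beta(beta, 200_000, rng)
    # Empirical variance and pseudo-variance: expected close to 1 and beta.
    print(beta, np.mean(np.abs(x) ** 2).round(3), np.mean(x ** 2).real.round(3))
```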
Much work has since been done on Wigner matrices and other classical random matrix ensembles. The recent monographs [4,1,24,5,12] provide excellent introductions to the subject. Free probability, introduced by Voiculescu [26], explains the distinguished role of the semicircle distribution. Motivated by the study of free group factors, Voiculescu discovered a remarkable analogue of classical independence for non-commuting random variables, the so-called free independence. Free analogues of classical constructions from (commutative) probability theory abound: for example, the free central limit theorem (CLT), free convolution, free cumulants, free entropy, and a free stochastic calculus. In particular, the semicircle distribution is the attractor in the free CLT.
In the landmark paper [27], Voiculescu showed that free independence describes the asymptotic behavior of the ESD for a large class of random matrices, such as those invariant in distribution under conjugation by the orthogonal group (in the real symmetric case) or the unitary group (in the complex Hermitian case). Wigner's semicircle law can thus be seen as a consequence of the free CLT. We refer the reader to [25,13,20,23,18] for further reading on the various aspects of free probability.
On the other hand, many random matrix models of interest do not possess the aforementioned invariance. This consideration led Male to introduce an operadic noncommutative probability theory based on graph operations that describes the asymptotic behavior of random matrices invariant in distribution under conjugation by the symmetric group [14], which includes ensembles outside of the domain of free probability [15,16,2]. This additional operad structure admits a corresponding notion of independence, the so-called traffic independence. At the same time, traffic probability captures certain aspects of both classical and free probability [14,9]. An as yet incomplete understanding of this relationship yields insightful feedback between the different theories.
In a different direction, the universality of non-invariant ensembles constitutes a major ongoing program of research. We recall one prominent model of interest: random band matrices.

Definition 1.2 (Band matrix). Let $(b_N)$ be a sequence of nonnegative integers. We write $B_N$ for the corresponding N × N band matrix of ones with band width $b_N$. Let $X_N$ be an unnormalized Wigner matrix. We call the random matrix $\Xi_N$ defined by $\Xi_N = B_N \circ X_N$ an unnormalized random band matrix. We introduce a normalization based on the growth rate of the band width $b_N$. We say that $(b_N)$ is of slow growth (resp., proportional in which case we use the normalization We call $c$ the proportionality constant: we say that $(b_N)$ is of full proportion if $c = 1$ and proper otherwise. For a fixed band width $b_N \equiv b$, we use the normalization $\Upsilon_N = (2b+1)^{-1/2} J_N$. In any case, we call the random matrix $\Theta_N$ defined by $\Theta_N = \Upsilon_N \circ \Xi_N$ a normalized random band matrix. We simply refer to random band matrices (or RBMs for short) when the context is clear, or when considering the definition altogether.
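A minimal sketch of Definition 1.2 in the fixed band width case follows. It assumes the non-periodic band $|i-j| \le b$ for the ones of $B_N$ and Gaussian entries for $X_N$; both are assumptions for illustration, and the helper names are hypothetical. It forms $\Xi_N = B_N \circ X_N$, applies the normalization $(2b+1)^{-1/2}$, and reports the second and fourth moments of the ESD.

```python
import numpy as np

def band_mask(n, b):
    """B_N: ones on the entries with |i - j| <= b, zeros elsewhere.

    Assumption: non-periodic band; a periodic variant would test
    min(|i - j|, n - |i - j|) <= b instead.
    """
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= b).astype(float)

def fixed_band_rbm(n, b, rng):
    """Theta_N = Upsilon_N o Xi_N with Upsilon_N = (2b + 1)^{-1/2} J_N and Xi_N = B_N o X_N."""
    g = rng.standard_normal((n, n))
    x = np.triu(g) + np.triu(g, k=1).T   # unnormalized real Wigner matrix X_N (Gaussian entries assumed)
    xi = band_mask(n, b) * x             # Hadamard-Schur product B_N o X_N
    return xi / np.sqrt(2 * b + 1)       # Hadamard product with (2b + 1)^{-1/2} J_N

rng = np.random.default_rng(0)
theta = fixed_band_rbm(1000, 3, rng)
eigs = np.linalg.eigvalsh(theta)
print(np.mean(eigs ** 2), np.mean(eigs ** 4))
```

With this normalization, the second moment is close to one, since all but $O(b)$ rows carry exactly $2b+1$ nonzero entries. The fourth moment need not match the semicircular value 2, which is consistent with the non-universal limit for fixed band widths in [7].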
A long-standing conjecture proposes a dichotomy for the spectral theory of RBMs: random matrix theory local statistics and eigenvector delocalization for large band widths; Poisson local statistics and eigenvector localization for small band widths; and a sharp transition around the critical value $b_N = \sqrt{N}$ (see [8] and the references therein).
At the macroscopic level, Bogachev, Molchanov, and Pastur proved that the class of band widths in Definition 1.2 determines the global universality classes of the RBMs [7]. For slow growth RBMs, $\mu(\Theta_N)$ converges to the semicircle distribution $\mu_{sc}$. For proportional growth RBMs of proper proportion $c \in (0,1)$, $\mu(\Theta_N)$ converges to a non-semicircular distribution $\mu_c$ of bounded support. For fixed band width RBMs having a symmetric distribution for the entries, $\mu(\Theta_N)$ converges to a non-universal symmetric distribution $\mu_b$. The authors further proved a continuity result for these distributions, namely,
$$\lim_{c\to 0^+}\mu_c = \lim_{c\to 1^-}\mu_c = \mu_{sc} \quad\text{and}\quad \lim_{b\to\infty}\mu_b = \mu_{sc}. \tag{1.1}$$
The work above concerns the distribution of a single RBM. Naturally, this invites the question of the joint distribution of such matrices. Shlyakhtenko showed that freeness with amalgamation in the context of operator-valued free probability governs what he called Gaussian RBMs [21]. Otherwise, to our knowledge, RBMs have not received much attention from the non-commutative probabilistic perspective. Nevertheless, we show that the framework of traffic probability allows for tractable computations in multiple RBMs. Our main result identifies the joint limiting traffic distribution (or LTD for short) of independent RBMs of possibly mixed band width types under a strong uniform control on the moments.  N ) i∈I1∪I3 are identical, the latter already being known from [14].
The traffic distribution, which is defined in terms of graph observables, can often be difficult to interpret. Notwithstanding, the equality of the LTD for (Θ  converges in distribution to a semicircular system. N and X (2) N for N = 10000. Here, we consider slow growth RBMs Θ , which we plot in blue and red respectively.
The overlapping region is colored blue + red = purple and dominates the graph, as predicted by Corollary 1.4. The common LSD is given by the so-called tetilla law [19,10], which is supported on the interval − 11+5

Remark 1.5 (Coarseness of the traffic distribution).
We do not make any assumptions on the relative rates of growth for the band widths $(b_N)$. In particular, and perhaps not surprisingly, we fail to observe any sort of transition around the conjectured critical value for RBMs at the level of (first-order) freeness. Moreover, our result shows that the traffic distribution, despite all of its additional structure, falls short of capturing some other macroscopic features. In particular, Theorem 3 in [7] implies that $\lambda_1(\Theta_N) \xrightarrow{\text{a.s.}} \infty$ for slow growth RBMs, whereas Bai and Yin showed that $\lambda_1(W_N) \xrightarrow{\text{a.s.}} 2$ iff the entries of $X_N$ have finite fourth moments [3].
Unfortunately, traffic probability has less to say about proportional growth RBMs and less still about fixed band width RBMs. We show that independent proportional growth (resp., fixed band width) RBMs are not asymptotically traffic independent unless c = 1 (resp., b = 0). We also prove the traffic analogue of equation (1.1), showing that the continuity of the LSD in the band width extends to the LTD as well. Here, we find a subtle difference in how these limits are attained, leading into our analysis of mixed band width types.
We organize the paper as follows. Section 2 sets out the necessary background and notation. Section 3 considers Wigner matrices in the generality of [11], where we also prove general Markov-type concentration inequalities for the traffic distribution of independent Wigner matrices. Section 4 then treats the case of RBMs, beginning with a preliminary version of our main result for periodic RBMs. As an application of Theorem 1.3, we compute in the appendix the LSD of the degree matrix of a proportional growth RBM, which we find to be almost Gaussian in the sense of its moments.
Definitions 2.1 (Free probability). A non-commutative probability space is a pair (A, ϕ) consisting of a unital algebra A over C equipped with a unital linear functional ϕ : A → C. We refer to elements a ∈ A as non-commutative random variables (or simply random variables) with ϕ playing the role of the expectation.
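The distribution of a family of random variables $a = (a_i)_{i\in I}$ in a non-commutative probability space $(\mathcal{A}, \varphi)$ is the linear functional $\mu_a : \mathbb{C}\langle x\rangle \to \mathbb{C}$ defined by taking the expectation of a non-commutative polynomial in $x = (x_i)_{i\in I}$ evaluated in the random variables $a$, i.e.,
$$\mu_a : P \mapsto \varphi(P(a)), \qquad P \in \mathbb{C}\langle x\rangle.$$
We say that a sequence of families $(a_n) = ((a_n^{(i)})_{i\in I})$, each living in a non-commutative probability space $(\mathcal{A}_n, \varphi_n)$, converges in distribution to $a$ if the corresponding sequence of distributions $(\mu_{a_n})$ converges pointwise to $\mu_a$, i.e.,
$$\lim_{n\to\infty} \mu_{a_n}(P) = \mu_a(P), \qquad \forall P \in \mathbb{C}\langle x\rangle.$$
We say that unital subalgebras $(\mathcal{A}_i)_{i\in I}$ of $\mathcal{A}$ are classically independent if the $(\mathcal{A}_i)_{i\in I}$ commute (i.e., $[\mathcal{A}_i, \mathcal{A}_j] = 0$ for $i \neq j$) and $\varphi$ is multiplicative across the $(\mathcal{A}_i)_{i\in I}$ in the following sense: for any $k \geq 1$ and distinct indices $i(1), \ldots, i(k) \in I$,
$$\varphi(a_{i(1)} \cdots a_{i(k)}) = \varphi(a_{i(1)}) \cdots \varphi(a_{i(k)}), \qquad \forall a_{i(j)} \in \mathcal{A}_{i(j)}.$$
We say that unital subalgebras $(\mathcal{A}_i)_{i\in I}$ of $\mathcal{A}$ are freely independent (or simply free) if for any $k \geq 1$ and consecutively distinct indices $i(1) \neq i(2) \neq \cdots \neq i(k) \in I$,
$$\varphi(a_{i(1)} \cdots a_{i(k)}) = 0, \qquad \forall a_{i(j)} \in \mathring{\mathcal{A}}_{i(j)},$$
where $\mathring{\mathcal{A}}_{i} = \{a \in \mathcal{A}_i : \varphi(a) = 0\}$ denotes the subspace of centered elements.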
Example 2.2 (Random matrices). Naturally, we will focus on the non-commutative probability space $(\mathrm{Mat}_N(L^{\infty-}(\Omega, \mathcal{F}, \mathbb{P})), \mathbb{E}\frac{1}{N}\mathrm{tr})$ of random N × N matrices whose entries have finite moments of all orders. Free probability describes the large N limit behavior of such matrices in many generic situations [18].
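To contrast the two rules of Definitions 2.1 on random matrices, the sketch below uses two independent real Wigner matrices $a$ and $b$ with Gaussian entries (an assumption for illustration). It takes $\varphi = \frac{1}{N}\mathrm{tr}$ on a single sample rather than $\mathbb{E}\frac{1}{N}\mathrm{tr}$. Such matrices are asymptotically free and centered, so the free rule predicts $\varphi(abab) \approx 0$ and $\varphi(a^2 b^2) \approx \varphi(a^2)\varphi(b^2) \approx 1$. Classical independence of commuting variables would instead force $\varphi(abab) = \varphi(a^2)\varphi(b^2)$.

```python
import numpy as np

def wigner(n, rng):
    """Normalized real Wigner matrix with Gaussian entries (assumption)."""
    g = rng.standard_normal((n, n))
    return (np.triu(g) + np.triu(g, k=1).T) / np.sqrt(n)

def phi(a):
    """Normalized trace (1/N) tr, playing the role of the expectation."""
    return np.trace(a).real / a.shape[0]

rng = np.random.default_rng(0)
a, b = wigner(800, rng), wigner(800, rng)
# Free rule: phi(abab) vanishes for centered free a, b, while phi(aabb) = phi(a^2) phi(b^2).
print(phi(a @ b @ a @ b), phi(a @ a @ b @ b), phi(a @ a) * phi(b @ b))
```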
At the combinatorial level, classical independence and free independence simply amount to rules for calculating mixed moments in independent random variables from the pure moments. Of course, such a rule should satisfy certain natural properties to warrant consideration as a probabilistic notion. In the setting of Definitions 2.1, Speicher showed that if one requires the rule to be suitably universal in an algebraic sense, then in fact classical independence and free independence are the only candidates [22] (see also [6] for a categorical axiomatization).
Traffic probability is a recent extension of the framework in Definitions 2.1. To make this precise, we will need the language of graph theory.

Definitions 2.3 (Graphs).
A multidigraph G = (V, E, src, tar) consists of a non-empty set of vertices V , a set of edges E, and a pair of maps src, tar : E → V specifying the source src(e) and target tar(e) of each edge e ∈ E. Such a graph G is said to be bi-rooted if it has a pair of distinguished (not necessarily distinct) vertices (v in , v out ) ∈ V 2 , the coordinates of which we call the input and the output respectively. For a bi-rooted multidigraph g = (G, v in , v out ), we define ∆(g) (resp., ∆(g)) as the bi-rooted multidigraph (resp., multidigraph) obtained from g by identifying the input and the output v in ∼ v out (resp., and further forgetting the information of the roots).
A graph operation is a finite, connected, bi-rooted multidigraph $g = (G, v_{\mathrm{in}}, v_{\mathrm{out}}, o)$ together with an ordering of its edges $o : E \xrightarrow{\sim} [\#(E)]$. We interpret $g = g(\cdot_1, \ldots, \cdot_K)$ as a function of $K = \#(E)$ arguments, one for each edge $e \in E$, with coordinates specified by the ordering $o$. In particular, we call such a graph $g$ a K-graph operation. We write $\mathcal{G}_K$ for the set of all K-graph operations and $\mathcal{G} = \bigsqcup_{K\geq 0} \mathcal{G}_K$ for the graded set of all graph operations.
A test graph $T = (G, \gamma)$ in $I$ is a finite, connected multidigraph $G$ with edge labels $\gamma : E \to I$. We write $\mathcal{T}\langle I\rangle$ for the set of all test graphs in $I$ and $\mathbb{C}\mathcal{T}\langle I\rangle$ for the complex vector space generated by $\mathcal{T}\langle I\rangle$. Suppose that $I = \bigsqcup_{j\in J} I_j$ is a disjoint union, where we think of each $j \in J$ as a different "color". The graph of colored components $\mathrm{GCC}(T) = (\mathcal{V}, \mathcal{E})$ is the simple bipartite graph obtained from $T$ as follows. For each $j \in J$, let $(T_{j,k})_k$ denote the connected components of the subgraph $T_j$ of $T$ spanned by the color $j$. In particular, each $T_{j,k}$ is a test graph in $I_j$. Let $V_1$ denote the subset of vertices of $T$ that belong to more than one of the components For the sake of brevity, we restrict ourselves to a minimal working definition of the traffic probability framework.
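The construction of $\mathrm{GCC}(T)$ can be prototyped directly. The sketch below computes the colored components $T_{j,k}$ with a union-find per color and tests whether $\mathrm{GCC}(T)$ is a tree. It assumes the standard construction in which the vertex set of $\mathrm{GCC}(T)$ consists of $V_1$ together with the colored components, with an edge joining $v \in V_1$ to each component containing $v$. The encoding of test graphs as `(src, tar, label)` triples and the function names are hypothetical.

```python
from collections import defaultdict

def colored_components(edges, color):
    """Vertex sets of the connected components of each single-color subgraph T_j.

    edges: list of (src, tar, label); color: dict mapping label -> color j.
    """
    by_color = defaultdict(list)
    for s, t, lab in edges:
        by_color[color[lab]].append((s, t))
    comps = []
    for es in by_color.values():
        parent = {}

        def find(v):
            parent.setdefault(v, v)
            while parent[v] != v:
                parent[v] = parent[parent[v]]  # path halving
                v = parent[v]
            return v

        for s, t in es:
            parent[find(s)] = find(t)          # union the endpoints of each edge
        groups = defaultdict(set)
        for v in list(parent):
            groups[find(v)].add(v)
        comps.extend(frozenset(g) for g in groups.values())
    return comps

def gcc_is_tree(edges, color):
    """True iff the bipartite graph GCC(T) (assumed construction) is a tree."""
    comps = colored_components(edges, color)
    count = defaultdict(int)
    for c in comps:
        for v in c:
            count[v] += 1
    v1 = [v for v, m in count.items() if m > 1]  # vertices in more than one component
    nodes = [("v", v) for v in v1] + [("c", k) for k in range(len(comps))]
    if not nodes:
        return True                              # a test graph without edges
    gcc_edges = [(("v", v), ("c", k)) for k, c in enumerate(comps) for v in v1 if v in c]
    if len(gcc_edges) != len(nodes) - 1:         # a tree has #nodes - 1 edges ...
        return False
    adj = defaultdict(list)
    for x, y in gcc_edges:
        adj[x].append(y)
        adj[y].append(x)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:                                 # ... and is connected
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(nodes)

color = {"a": 1, "b": 2}
print(gcc_is_tree([(0, 1, "a"), (1, 2, "b")], color), gcc_is_tree([(0, 1, "a"), (1, 0, "b")], color))
```

The two calls contrast a path with alternating colors, whose GCC is a tree, with a 2-cycle with one edge of each color, whose GCC contains a cycle.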

Definitions 2.4 (Traffic probability).
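A $\mathcal{G}$-algebra is a complex vector space $\mathcal{A}$ together with an action $(Z_g)_{g\in\mathcal{G}}$ of the operad of graph operations. In particular, each graph operation $g \in \mathcal{G}_K \subset \mathcal{G}$ defines a linear map $Z_g : \mathcal{A}^{\otimes K} \to \mathcal{A}$ satisfying certain natural compatibility conditions. Note that a $\mathcal{G}$-algebra structure on $\mathcal{A}$ defines a unital $\mathbb{C}$-algebra structure on $\mathcal{A}$, with the product given by the 2-graph operation whose underlying graph is a directed path of length two from the input to the output.

An algebraic traffic space is a pair $(\mathcal{A}, \tau)$ consisting of a $\mathcal{G}$-algebra $\mathcal{A}$ equipped with a $\mathcal{G}$-compatible linear functional $\tau : \mathbb{C}\mathcal{T}\langle\mathcal{A}\rangle \to \mathbb{C}$ called the traffic state. The traffic state induces a tracial unital linear functional $\varphi_\tau = \tau \circ \Delta : \mathcal{A} \to \mathbb{C}$. Graphically, $\varphi_\tau(a)$ is the value of $\tau$ on the test graph consisting of a single vertex with a loop labeled $a$. In particular, $(\mathcal{A}, \varphi_\tau)$ is a non-commutative probability space. We define a transform of the traffic state called the injective traffic state $\tau^0 : \mathbb{C}\mathcal{T}\langle\mathcal{A}\rangle \to \mathbb{C}$ by the Möbius inversion
$$\tau^0[T] = \sum_{\pi\in\mathcal{P}(V)} \mu(\pi)\, \tau[T^\pi].$$
Here $(\mathcal{P}(V), \leq)$ is the poset of partitions of $V$ with the reversed refinement order $\leq$. The quantity $\mu(\pi) = \prod_{B\in\pi} (-1)^{\#(B)-1}(\#(B)-1)!$ is the corresponding Möbius function evaluated between the partition of $V$ into singletons and $\pi$. Finally, $T^\pi$ is the test graph obtained from $T$ by identifying the vertices within each block $B \in \pi$. One recovers the traffic state via the inversion
$$\tau[T] = \sum_{\pi\in\mathcal{P}(V)} \tau^0[T^\pi].$$
For example, if $T$ consists of a single edge labeled $a$ between two distinct vertices, then $\tau[T] = \tau^0[T] + \tau^0[T']$, where $T'$ is the single loop labeled $a$.

The traffic distribution of a family of random variables $a = (a_i)_{i\in I}$ in an algebraic traffic space $(\mathcal{A}, \tau)$ is the linear functional $\nu_a : \mathbb{C}\mathcal{T}\langle x\rangle \to \mathbb{C}$. It is defined by evaluating the traffic state on test graphs in $x = (x_i)_{i\in I}$ under the substitution $x_i \mapsto a_i$, i.e.,
$$\nu_a(T) = \tau[T(a)], \qquad T \in \mathcal{T}\langle x\rangle.$$
We say that a sequence of families $(a_n) = ((a_n^{(i)})_{i\in I})$, each living in an algebraic traffic space $(\mathcal{A}_n, \tau_n)$, converges in traffic distribution to $a$ if the corresponding sequence of traffic distributions $(\nu_{a_n})$ converges pointwise to $\nu_a$, i.e.,
$$\lim_{n\to\infty} \nu_{a_n}(T) = \nu_a(T), \qquad \forall T \in \mathcal{T}\langle x\rangle.$$
We define the injective traffic distribution in the obvious way. Note that convergence in traffic distribution is equivalent to convergence in injective traffic distribution.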
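We say that sub-$\mathcal{G}$-algebras $(\mathcal{A}_i)_{i\in I}$ of $\mathcal{A}$ are traffic independent if
$$\tau^0[T] = \mathbb{1}\{\mathrm{GCC}(T) \text{ is a tree}\} \prod_{i\in I}\prod_{k} \tau^0[T_{i,k}]$$
for every test graph $T$ in $\bigsqcup_{i\in I}\mathcal{A}_i$, where the graph of colored components is constructed with respect to the colors $i \in I$.

Example 2.5 (Graph polynomials). A graph monomial $t = (G, \gamma)$ in $x = (x_i)_{i\in I}$ is a bi-rooted multidigraph $G = (V, E, \mathrm{src}, \mathrm{tar}, v_{\mathrm{in}}, v_{\mathrm{out}})$ with edge labels $\gamma : E \to I$. We write $\mathcal{G}\langle x\rangle$ for the set of all graph monomials in $x$ and $\mathbb{C}\mathcal{G}\langle x\rangle$ for the complex vector space generated by $\mathcal{G}\langle x\rangle$, the so-called graph polynomials in $x$. The graph polynomials $\mathbb{C}\mathcal{G}\langle x\rangle$ form a $\mathcal{G}$-algebra under the action of composition. For a K-graph operation $g = (V, E, \mathrm{src}, \mathrm{tar}, v_{\mathrm{in}}, v_{\mathrm{out}}, o)$ and a K-tuple of graph monomials $(t_1, \ldots, t_K)$ with $t_i = (V_i, E_i, \mathrm{src}_i, \mathrm{tar}_i, v_{\mathrm{in}}^{(i)}, v_{\mathrm{out}}^{(i)}, \gamma_i)$, we define $Z_g(t_1 \otimes \cdots \otimes t_K)$ as the graph monomial obtained by substitution. Formally, one removes each edge $e \in E$ and installs a copy of $t_{o(e)}$ in its place by identifying the vertices $\mathrm{src}(e) \sim v_{\mathrm{in}}^{(o(e))}$ and $\mathrm{tar}(e) \sim v_{\mathrm{out}}^{(o(e))}$.

Example 2.6 (Graphs of matrices [17]). Returning to Example 2.2, we define an action of the operad of graph operations on $\mathrm{Mat}_N(L^{\infty-}(\Omega, \mathcal{F}, \mathbb{P}))$ by the coordinate formula
$$Z_g(A^{(1)} \otimes \cdots \otimes A^{(K)})(i,j) = \sum_{\substack{\phi : V \to [N]\\ \phi(v_{\mathrm{out}}) = i,\ \phi(v_{\mathrm{in}}) = j}} \prod_{e\in E} A^{(o(e))}(\phi(\mathrm{tar}(e)), \phi(\mathrm{src}(e))).$$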
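For notational convenience, we often write $\phi(e) := (\phi(\mathrm{tar}(e)), \phi(\mathrm{src}(e)))$. The reader can easily verify that the $\mathcal{G}$-algebra structure recovers the usual matrix multiplication. At the same time, the action of the graph operations also produces matrices of additional linear algebraic structure. For example, one can obtain the diagonal matrix of row sums of $A$ as $Z_g(A)$, where $g$ has a vertex $v = v_{\mathrm{in}} = v_{\mathrm{out}}$ serving as both roots and a single edge from a second vertex into $v$. Note that the trace of a graph of matrices only depends on the test graph obtained by identifying the input and the output, namely,
$$\mathrm{tr}\, Z_g(A^{(1)} \otimes \cdots \otimes A^{(K)}) = \sum_{\phi : V \to [N]} \prod_{e\in E} A^{(o(e))}(\phi(e)),$$
where $V$ and $E$ now denote the vertices and edges of $\Delta(g)$. In particular, the traffic state $\tau_N : \mathbb{C}\mathcal{T}\langle\mathrm{Mat}_N(L^{\infty-}(\Omega, \mathcal{F}, \mathbb{P}))\rangle \to \mathbb{C}$ defined by
$$\tau_N[T] = \mathbb{E}\Big[\frac{1}{N}\sum_{\phi : V \to [N]} \prod_{e\in E} \gamma(e)(\phi(e))\Big],$$
where the label $\gamma(e)$ of each edge is itself a random matrix, recovers the normalized trace $\varphi_{\tau_N} = \mathbb{E}\frac{1}{N}\mathrm{tr}$. The injective traffic state $\tau_N^0$ admits an explicit formula without reference to the Möbius function in the matricial setting, namely,
$$\tau_N^0[T] = \mathbb{E}\Big[\frac{1}{N}\sum_{\phi : V \hookrightarrow [N]} \prod_{e\in E} \gamma(e)(\phi(e))\Big], \qquad \forall T \in \mathcal{T}\langle\mathrm{Mat}_N(L^{\infty-}(\Omega, \mathcal{F}, \mathbb{P}))\rangle,$$
whence the name. In the sequel, we use the notation $\phi : V \hookrightarrow [N]$ to indicate an injective labeling of the vertices.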
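The matricial formulas above are easy to check by brute force on small matrices. In the following sketch, helper names are hypothetical, and test graphs are encoded as lists of `(src, tar, k)` triples, with `k` indexing a list of matrices. The sketch evaluates $Z_g$ from the coordinate formula and recovers the product $AB$ and the diagonal matrix of row sums. It then verifies both directions of the Möbius inversion between $\tau$ and $\tau^0$ for deterministic matrices, where the expectation plays no role.

```python
import itertools
import math
import numpy as np

def set_partitions(items):
    """Yield all set partitions of the list `items`, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part

def weight(phi, edges, mats):
    """prod_e A_k(phi(tar(e)), phi(src(e))) over edges (src, tar, k)."""
    return math.prod(mats[k][phi[t], phi[s]] for s, t, k in edges)

def z_g(n_vertices, edges, v_in, v_out, mats):
    """Z_g(A_1 (x) ... (x) A_K)(i, j): sum over all phi with phi(v_out) = i and phi(v_in) = j."""
    n = mats[0].shape[0]
    out = np.zeros((n, n))
    for phi in itertools.product(range(n), repeat=n_vertices):
        out[phi[v_out], phi[v_in]] += weight(phi, edges, mats)
    return out

def tau(n_vertices, edges, mats, injective=False):
    """(1/N) sum over labelings phi : V -> [N] (injective ones only, if requested)."""
    n = mats[0].shape[0]
    if injective:
        maps = itertools.permutations(range(n), n_vertices)
    else:
        maps = itertools.product(range(n), repeat=n_vertices)
    return sum(weight(phi, edges, mats) for phi in maps) / n

def quotient(edges, partition):
    """T^pi: identify the vertices in each block; returns (#blocks, relabeled edges)."""
    block = {v: b for b, blk in enumerate(partition) for v in blk}
    return len(partition), [(block[s], block[t], k) for s, t, k in edges]

def mobius(partition):
    """Moebius function between the singleton partition and pi."""
    return math.prod((-1) ** (len(b) - 1) * math.factorial(len(b) - 1) for b in partition)

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
# Directed path input 0 -> 1 -> 2 output: edge 0 -> 1 carries B, edge 1 -> 2 carries A.
print(np.allclose(z_g(3, [(0, 1, 1), (1, 2, 0)], 0, 2, [A, B]), A @ B))          # True
# Row sums: both roots at vertex 0, single edge from vertex 1 into vertex 0.
print(np.allclose(z_g(2, [(1, 0, 0)], 0, 0, [A]), np.diag(A.sum(axis=1))))       # True

T = [(0, 1, 0), (1, 2, 0), (2, 0, 1), (0, 1, 1)]  # a test graph on 3 vertices
parts = list(set_partitions([0, 1, 2]))
lhs = tau(3, T, [A, B])
rhs = sum(tau(*quotient(T, p), [A, B], injective=True) for p in parts)
tau0 = sum(mobius(p) * tau(*quotient(T, p), [A, B]) for p in parts)
print(np.isclose(lhs, rhs), np.isclose(tau0, tau(3, T, [A, B], injective=True)))  # True True
```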
The setting of Definitions 2.4 allows us to circumvent Speicher's dichotomy. Notably, permutation invariant random matrices provide a canonical example of asymptotically traffic independent random variables [14]. Of course, one can still define the usual notions of independence in an algebraic traffic space (A, τ ) by virtue of the induced expectation ϕ τ . This subsumption allows for an interplay between the different notions of independence in the traffic framework. Indeed, one finds many striking relationships between them: for example, general criteria for when traffic independence implies free independence or classical independence [14,9]. In particular, we note that the information of the traffic distribution ν a contains the information of the distribution µ a .

Wigner matrices
We restrict ourselves to Wigner matrices X N = (X where the entries (X In particular, compared to our original definition, we now allow the random variables within our matrices to vary with the dimension N ; moreover, we no longer assume that they are identically distributed. For technical reasons, we assume that the real and imaginary parts of an off-diagonal entry X (i) For example, this includes the class of all real Wigner matrices (β i = 1), but also circularly-symmetric ensembles such as the GUE (β i = 0). We comment on the general case of β i ∈ D when possible, though the situation becomes much different and often intractable (especially for RBMs). Thus, unless stated otherwise, we assume that

Limiting traffic distribution
Our first result extends the traffic convergence of the Wigner matrices in [14] to the generality of Equation (3.1). In order to formulate the LTD, we will need some definitions.
We say that $T$ is a fat tree if $T$ becomes a tree when disregarding the orientation and multiplicity of the edges. We further say that $T$ is a double tree if it is a fat tree with exactly two edges between adjacent vertices. We call the pair of edges connecting adjacent vertices in a double tree twin edges: congruent if they have the same orientation, opposing otherwise. Finally, we say that $T$ is a colored double tree if $T$ is a double tree such that each pair of twin edges $\{e, e'\}$ shares a common label $\gamma(e) = \gamma(e') \in I$. We record the number $c_i(T)$ of pairs of congruent twin edges with the common label $i$ in a colored double tree $T$. We introduce some notation to emphasize the relevant features of our test graphs. This notation will greatly simplify our analysis and features prominently in the remainder of the article. We start with a finite (not necessarily connected) multidigraph $G = (V, E)$. We partition the set of edges $E = L \cup N$ to distinguish between the loops $L$ and the non-loop edges $N = L^c$. As suggested by Definition 3.1, we define $\overline{G} = (V, \overline{E})$ as the undirected graph obtained from $G$ by disregarding the orientation and multiplicity of the edges. Formally, $\overline{E} = E/{\sim}$ consists of equivalence classes in $E$, where $e \sim e' \iff \{\mathrm{src}(e), \mathrm{tar}(e)\} = \{\mathrm{src}(e'), \mathrm{tar}(e')\}$.
In this case, our partition $E = L \cup N$ projects down to a partition $\overline{E} = \overline{L} \cup \overline{N}$ between equivalence classes of loops and equivalence classes of non-loops respectively. We may then write the underlying simple graph of $\overline{G} = (V, \overline{E})$ as $(V, \overline{N})$.  hold for a test graph $T$ whose projection $\overline{T}$ is a tree, then $T$ is a colored double tree.
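The following sketch checks whether a test graph is a colored double tree in the sense of the definitions above. It encodes the graph as a list of `(src, tar, label)` triples (a hypothetical encoding) and groups the edges into the classes $[e] \in \overline{E}$. It returns the counts $c_i(T)$ of congruent twin pairs together with the counts $o_i(T)$ of opposing twin pairs used after (3.13). The function name is hypothetical.

```python
from collections import defaultdict

def colored_double_tree_counts(n_vertices, edges):
    """Return None unless the test graph is a colored double tree; otherwise a dict
    label -> (c_i, o_i), the numbers of congruent and opposing twin pairs with label i.

    edges: list of (src, tar, label) on the vertices 0, ..., n_vertices - 1.
    """
    classes = defaultdict(list)                 # classes [e]: unordered vertex pairs
    for s, t, lab in edges:
        if s == t:
            return None                         # a loop never occurs in a double tree
        classes[frozenset((s, t))].append((s, t, lab))
    if len(classes) != n_vertices - 1:          # the projection needs #V - 1 edges ...
        return None
    adj = defaultdict(set)
    for pair in classes:
        u, v = tuple(pair)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:                                # ... and must be connected, hence a tree
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    if len(seen) != n_vertices:
        return None
    counts = defaultdict(lambda: [0, 0])
    for twins in classes.values():
        if len(twins) != 2 or twins[0][2] != twins[1][2]:
            return None                         # not exactly two edges, or labels differ
        (s1, t1, lab), (s2, t2, _) = twins
        counts[lab][0 if (s1, t1) == (s2, t2) else 1] += 1
    return {lab: tuple(c) for lab, c in counts.items()}

print(colored_double_tree_counts(3, [(0, 1, "a"), (1, 0, "a"), (1, 2, "b"), (1, 2, "b")]))
# {'a': (0, 1), 'b': (1, 0)}
```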
We analyze the asymptotics of (3.6) by working piecemeal in order to count the number of contributing maps φ (i.e., maps such that the summand is nonzero). First, we note that the independence of the random variables $X_N^{(i)}(j,k)$ and the injectivity of the maps φ allow us to factor the product over the expectation, provided that we take into account multi-edges. The relevant information is contained precisely in the projected graph $\overline{T} = (V, \overline{E})$, which allows us to recast (3.6) as For non-loop edges e ∈ N , the independence of the centered random variables X    Finally, we make use of our strong moment assumption (3.1) to bound the summands in (3.7) uniformly in φ and N. In particular, our bound only depends on T, i.e., Putting everything together, we arrive at the asymptotic ). (3.12) The inequalities (3.8)-(3.10) then imply that $\tau_N^0[T(W_N)]$ vanishes in the limit unless T is a colored double tree. For such a test graph T, (3.7) becomes [e] are congruent} , (3.13) where $N^{\underline{\#(V)}}$ denotes the falling factorial $N(N-1)\cdots(N-\#(V)+1)$. The limit (3.5) now follows.
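Consider the smallest nontrivial colored double trees: two vertices joined by a pair of twin edges with a common label. For these, (3.13) predicts $\tau_N^0[T(W_N)] = \frac{N(N-1)}{N^2}\beta$ for congruent twins and $\frac{N(N-1)}{N^2}$ for opposing twins. The Monte Carlo sketch below checks this. It assumes Gaussian entries with uncorrelated real and imaginary parts, so that $\beta \in [-1,1]$, and the helper names are hypothetical.

```python
import numpy as np

def hermitian_wigner(n, beta, rng):
    """Normalized Hermitian Wigner matrix; strictly upper entries satisfy E|X|^2 = 1, E[X^2] = beta.

    Assumption: Gaussian, uncorrelated real and imaginary parts; real Gaussian diagonal.
    """
    re = np.sqrt((1 + beta) / 2) * rng.standard_normal((n, n))
    im = np.sqrt((1 - beta) / 2) * rng.standard_normal((n, n))
    upper = np.triu(re + 1j * im, k=1)
    diag = np.diag(rng.standard_normal(n))
    return (upper + upper.conj().T + diag) / np.sqrt(n)

def tau0_double_edge(w, congruent):
    """(1/N) sum_{j != k} W(k, j)^2 (congruent twins) or W(k, j) W(j, k) (opposing twins)."""
    prod = w * w if congruent else w * w.T
    return (prod.sum() - np.trace(prod)) / w.shape[0]

rng = np.random.default_rng(0)
beta, n, samples = 0.4, 300, 20
congruent = np.mean([tau0_double_edge(hermitian_wigner(n, beta, rng), True) for _ in range(samples)])
opposing = np.mean([tau0_double_edge(hermitian_wigner(n, beta, rng), False) for _ in range(samples)])
# Compare with the finite-N predictions beta * (N - 1) / N and (N - 1) / N.
print(congruent.real, beta * (n - 1) / n, opposing.real, (n - 1) / n)
```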

Equation (3.13) explains the apparent asymmetry in the LTD of the Wigner matrices:
if we record the number o i (T ) of pairs of opposing twin edges with the common label i in a colored double tree T , then we can rewrite the nontrivial part of Equation (3.5) as Working directly with this LTD, one can easily prove the asymptotic traffic independence of the Wigner matrices W N .
The careful reader will notice that we have made use of (3.2) in formulating (3.13).
Indeed, by assuming that $\beta_i = \overline{\beta_i}$, we were able to disregard the ordering on the vertices induced by the maps φ and conclude that congruent twin edges $[e]$ always give a contribution of $\beta_{\gamma([e])}$. In general, for a colored double tree T, a summand $S_\phi(T)$ of (3.7) will depend on φ. To compute the limit, we must then keep track of the ordering $\psi_\phi$ on the vertices induced by φ: if two maps $\phi_1$ and $\phi_2$ induce the same ordering $\psi_{\phi_1} = \psi_{\phi_2}$, then the corresponding summands are equal, i.e., $S_{\phi_1}(T) = S_{\phi_2}(T)$.
Thus, for an ordering ψ : In this case, (3.13) becomes (3.14) One can easily verify that however, in anticipation of Section 4, we give a natural integral representation of this limit instead. To this end, we introduce a set of indeterminates x V = (x v ) v∈V indexed by the vertices of our graph. A straightforward weak convergence argument then shows that δ p φ to obtain the expression inside of the limit on the left-hand side of (3.15) (up to an asymptotically negligible correction factor). The limit N → ∞ then converts this discretization into the uniform measure on [0, 1] V . Finally, we arrive at the analogue of (3.5) for general β i ∈ D, In contrast to Proposition 3.2, the LTD (3.16) does not necessarily describe asymptotically traffic independent random matrices. In fact, if we divide our index set I into two camps N ) i∈I C are asymptotically traffic independent, but the matrices W C N are not.
For the first statement, we need only to note that the representative value $S_\psi(T)$ does not depend on the ordering of the vertices that are only adjacent to edges with labels $i \in I_R$, for which $\beta_i = \overline{\beta_i}$. We can formalize this by considering the subgraphs $T_R = (V_R, E_R)$ and $T_C = (V_C, E_C)$ of $T$ with edge labels in $I_R$ and $I_C$ respectively. We write $T_C = C^C_1 \cup \cdots \cup C^C_{k_1}$ for the connected components of $T_C$, each of which is a colored double tree $C = (V_C, E_C)$, and similarly for $T_R = C^R_1 \cup \cdots \cup C^R_{k_2}$. We call such a graph a forest of colored double trees. It follows that a summand $S_\phi(T)$ only depends on the orderings In this case, for a concatenation of orderings we write $S_\psi$ for the common value of We may then write as was to be shown.
Intuitively, we imagine each pair of twin edges [e] imposing a constraint coming from the ordering of its adjacent vertices {src([e]), tar([e])}. We gather these constraints in the ordering ψ φ to carry out the calculation of S φ = S ψ(φ) ; however, if γ([e]) ∈ I R , the constraint becomes vacuous and we can disregard it, which corresponds to discarding the edge [e] (but keeping the adjacent vertices). In this way, we arrive at the integrals in Equation (3.17) (and, after discarding the isolated vertices, the forest of colored double trees T C ). We return to this notion of a "free" edge [e] in a slightly different context in Section 4.
For the second statement (about the lack of asymptotic traffic independence for W C N ), we give a simple counterexample, namely, for β C where the equalities follow from Equation (3.16). Yet, we know that free independence describes the asymptotic behavior of the Wigner matrices regardless of the parameters (β i ) i∈I [11]. Naturally, we would like to know how to extract this information from the LTD (in particular, how this is consistent with the distinct LTDs (3.5) and (3.16)). To see this, note that the joint distribution µ W N factors through the traffic distribution ν W N via We use the injective traffic state to rewrite this as In the limit, the only contributions come from (colored) double trees C π . We claim that if C π is a double tree, then it can only have opposing twin edges (an opposing double tree). Indeed, assume that π ∈ P(V ) identifies the sources src(e 1 ) π ∼ src(e 2 ) and targets tar(e 1 ) π ∼ tar(e 2 ) of two distinct edges e 1 , e 2 ∈ E. We write C ρ for the graph intermediate to C and C π obtained from C by only making these two identifications. If e 1 and e 2 are consecutive edges in the cycle C, then C ρ consists of a directed cycle with two loops coming out of a particular vertex ("rabbit ears"). Otherwise, C ρ consists of two almost disjoint directed cycles overlapping in the twin edge [e] = {e 1 , e 2 } (a "butterfly"). In both cases, we see that no further identifications can possibly result in a double tree C π . Thus, from the perspective of the joint distribution, we need only to consider the behavior of the LTD on opposing colored double trees T . In this case, we see that the LTDs (3.5) and (3.16) agree on the value of Figure 5: An example of a butterfly C ρ starting from a directed cycle C.
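The claim that a double tree quotient $C^\pi$ of a directed cycle only has opposing twin edges can be tested exhaustively for short cycles. The sketch below (hypothetical helper names) enumerates all partitions $\pi$ of the vertices of a directed cycle of length $2k$. It keeps those for which $C^\pi$ is a double tree and checks that every pair of twin edges is opposing. The number of such $\pi$ should be the Catalan number $\frac{1}{k+1}\binom{2k}{k}$, matching the semicircular moments.

```python
from collections import defaultdict
from math import comb

def set_partitions(items):
    """Yield all set partitions of the list `items`, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part

def double_tree_type(n_vertices, edges):
    """None if the directed graph is not a double tree; else True iff all twin pairs are opposing."""
    classes = defaultdict(list)
    for s, t in edges:
        if s == t:
            return None
        classes[frozenset((s, t))].append((s, t))
    if len(classes) != n_vertices - 1 or any(len(c) != 2 for c in classes.values()):
        return None
    adj = defaultdict(set)
    for pair in classes:
        u, v = tuple(pair)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    if len(seen) != n_vertices:
        return None
    return all(c[0] != c[1] for c in classes.values())  # opposing: same pair, reversed direction

def cycle_double_tree_quotients(length):
    """Types of all double tree quotients C^pi of the directed cycle 0 -> 1 -> ... -> 0."""
    cycle = [(v, (v + 1) % length) for v in range(length)]
    results = []
    for p in set_partitions(list(range(length))):
        block = {v: b for b, blk in enumerate(p) for v in blk}
        kind = double_tree_type(len(p), [(block[s], block[t]) for s, t in cycle])
        if kind is not None:
            results.append(kind)
    return results

for k in (1, 2, 3, 4):
    res = cycle_double_tree_quotients(2 * k)
    print(2 * k, len(res), comb(2 * k, k) // (k + 1), all(res))
```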

Remark 3.3 (A traffic probability characterization of semicircular systems). An important application of traffic probability lies in the relationship between traffic independence and free independence. In certain situations, one can actually deduce free independence from traffic independence [14,9], the advantage being that the traffic framework might be more tractable. Of course, the two notions do not perfectly align, as seen even in the case of Wigner matrices (Lemma 3.4 in [14] gives yet another example). In this case, we see that the traffic distribution specifies the behavior of our matrices in situations that are not relevant to their joint distribution: in a certain sense, traffic independence asks for too much. Nevertheless, we can still use the traffic framework to make free probabilistic statements, even when an LTD might not exist! In particular, from our work above, we see that if a family of random variables a n = (a then $a_n$ converges in distribution to a semicircular system $a = (a_i)_{i\in I}$. Note that we do not specify the behavior of $\tau_n^0[T(a_n)]$ on general double trees T (in particular, we do not assume that $\lim_{n\to\infty}\tau_n^0[T(a_n)]$ even exists). We will use this criterion in Section 4 to treat the case of RBMs of a general parameter $\beta_i \in \mathbb{D}$.

Concentration of the traffic distribution
For a test graph T = (V, E, γ) ∈ T x , we define the random variable For natural reasons, we are interested in bounding the deviation of tr[T (W N )] from its mean. In particular, we would like to emulate the usual approach for Wigner matrices to , which would allow us to upgrade the convergence in Proposition 3.2 to the almost sure sense. It turns out that this approach will not work in general, but it will be instructive to see just how it falls short.
For notational convenience, we will consider the deviation of tr[T (X N )] instead, where X N = √ N W N are the unnormalized Wigner matrices. To begin, We again make use of our strong moment assumption (3.1), this time to bound our summands uniformly in φ 1 , φ 2 , and N . In particular, our bound only depends on T , i.e., We are then interested in the number of pairs (φ 1 , φ 2 ) that actually contribute in (3.19) (i.e., such that the summand (3.21) is nonzero). To this end, note that the maps φ induce maps φ : N (j, k) implies that the outermost product of (3.21) factors over the expectation, resulting in a zero summand.
Thus, we need only to consider so-called edge-matched pairs (φ 1 , φ 2 ). For our purposes, it will be convenient to incorporate the data of such a pair into the graph T itself.
For a pair (φ 1 , φ 2 ), we construct a new graph T φ1 φ2 by considering two disjoint copies T 1 and T 2 of T (associated to φ 1 and φ 2 respectively), reversing the direction of the edges of T 2 , and then identifying the vertices according to their images under the maps φ 1 and φ 2 ; formally, the vertices of T φ1 φ2 are given by ).
An edge match between $\phi_1$ and $\phi_2$ then corresponds to an overlay of edges, though not necessarily in the same direction. Note that

Figure 6: An example of $T_{\phi_1\phi_2}$ for an edge-matched pair $(\phi_1, \phi_2)$. Here, we omit the edge labels to emphasize the vertex labels $\phi(v)$. Recall that we reverse the direction of the edges of the second copy $T_2$ before identifying the vertices.
The sum over the set of edge-matched pairs $(\phi_1, \phi_2)$ can then be decomposed into a double sum. The first is over the set $S_T$ of connected graphs $T' = (V', E', \gamma')$ obtained by gluing the vertices of two disjoint copies of $T$ with at least one edge overlay (we reverse the direction of the edges of the second copy beforehand, and we keep track of the origin of the edges $E' = E^{(1)} \sqcup E^{(2)}$). The second is over the set of injective labelings $\phi : V' \hookrightarrow [N]$ of the vertices of $T'$. We may then recast (3.19) as the double sum (3.22). We defined $S_T$ by reversing the direction of the edges of the second copy of $T$ before gluing in order to write (3.22) without reference to the transposes (3.20). Moreover, by keeping track of the origin of the edges, we ensure that $S_T$ does not conflate otherwise isomorphic graphs, thereby guaranteeing a faithful reconstruction of (3.19) from (3.22).
The set $S_T$ is of course a finite set whose size only depends on $T$. We consider a generic $T' \in S_T$, iterating the proof of Proposition 3.2. We decompose the set of edges $E' = L' \cup N'$ as before, and the same for $\overline{E'} = \overline{L'} \cup \overline{N'}$ (recall that $\overline{E'}$ denotes the set of equivalence classes in $E'$). Suppose that there exists a lone edge $e_0 \in [e] \in \overline{N'}$ with the label $\gamma'(e_0) = i_0 \in I$ so that Without loss of generality, we may assume that $e_0 \in E^{(1)}$. We write The independence of the centered random variables $X_N^{(i)}(j,k)$ and the injectivity of the maps φ imply that Thus, for $T' \in S_T$ to contribute, each label $i \in I$ present in a class $[e] \in \overline{N'}$ must occur with multiplicity $m_{i,[e]} \geq 2$.     (3.27) or, equivalently, falling short of our goal. One might hope that we were overly generous in our bounds and that equality in is not attainable in practice. In fact, in the usual situation of traces of powers (3.30) this is indeed the case; however, in general, (3.27) is tight. In particular, note that if we start with a tree $T$, we can overlay two disjoint copies $T_1$ and $T_2$ of $T$, the second with reversed edges, to obtain an opposing colored double tree $T'$. In this case, we have equality in (3.23)-(3.26). Proposition 3.2 then shows that the contribution of $T'$ in (3.22) is $\Theta(N^{\#(E)+1})$.
Figure 7: An example of an overlay of trees. Here, we consider two copies T_1 and T_2 of the tree T. We depict the second copy T_2 with the direction of its edges already reversed.
Working backwards, we can identify the worst-case scenario. For (3.23)-(3.26) to hold with equality, we need to glue (not necessarily overlay) disjoint copies T_1 and T_2 of T with at least one edge overlay so as to obtain a colored double tree T', although T itself need not be a tree in general. In the classical case (3.30), T corresponds to a cycle of length 1 + · · · + m, and no such gluing exists. Starting with an edge overlay between two copies of the cycle, we obtain a butterfly as in Figure 5 (although the twin edges in the butterfly may now be opposing). This leads to a strict inequality in (3.29) and hence to the usual asymptotic O(N^{-2}) in place of (3.28).
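The gap between the classical trace case and the tree case can be seen numerically. The following is a minimal Monte Carlo sketch. It assumes Gaussian entries of unit variance, including on the diagonal, which is one admissible choice in Definition 1.1; the helper names are ours. The sketch compares the injective trace of the single-edge test graph (a tree) with the normalized trace of W_N². A direct variance computation shows that N · Var of the former tends to 2, while N² · Var of the latter tends to 4. This is consistent with the Θ(N^{-1}) versus O(N^{-2}) dichotomy described above.

```python
import numpy as np

def wigner(n, rng):
    """Normalized real Wigner matrix with N(0, 1) entries on and above the diagonal."""
    g = rng.standard_normal((n, n))
    x = np.triu(g) + np.triu(g, 1).T  # symmetric, unit-variance entries
    return x / np.sqrt(n)

def edge_trace(w):
    """Injective trace of the one-edge test graph: (1/N) * sum_{i != j} W[i, j]."""
    return (w.sum() - np.trace(w)) / w.shape[0]

def cycle_trace(w):
    """(1/N) tr(W^2) = (1/N) * sum_{i, j} W[i, j]^2 for symmetric W."""
    return np.sum(w * w) / w.shape[0]

def scaled_variances(n, samples, seed=0):
    """Monte Carlo estimates of (N * Var(edge_trace), N^2 * Var(cycle_trace))."""
    rng = np.random.default_rng(seed)
    edge = np.empty(samples)
    cycle = np.empty(samples)
    for s in range(samples):
        w = wigner(n, rng)
        edge[s] = edge_trace(w)
        cycle[s] = cycle_trace(w)
    return n * edge.var(), n ** 2 * cycle.var()

tree_scaled, cycle_scaled = scaled_variances(n=200, samples=300)
print(tree_scaled, cycle_scaled)  # roughly 2 and 4, respectively
```

The tree functional therefore fluctuates on the scale N^{-1/2}, while the trace of a power fluctuates only on the scale N^{-1}.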
The careful reader will notice that we have actually proven a stronger result in the presence of loops (L ≠ ∅): in place of (3.26), we can instead use a tighter bound. We summarize our findings thus far.

Lemma 3.4 (Preliminary concentration for Wigner matrices). For a family of Wigner
The bound is tight in the sense that there exist test graphs T ∈ T x such that The colored double tree obstruction in Lemma 3.4 ramifies into a forest of colored double trees for higher powers, but this construction remains the lone outlier. We exploit this feature to prove concentration for higher powers.
The bound is tight in the sense that there exist test graphs T ∈ T x such that N )+1) ).
Proof. The concrete case of m = 2 contains all of the essential ideas; we encourage the reader to follow through the proof with this simpler case in mind.
To begin, we expand the absolute value as in (3.19) to obtain (3.31), a sum over 2m-tuples (φ_1, . . . , φ_{2m}) of injective labelings of the vertices of T. Our strong moment assumption (3.1) again ensures that we can bound the summands in (3.31) uniformly in (φ_1, . . . , φ_{2m}) and N, with a constant that depends only on T; this gives (3.32).

We now analyze the contributing 2m-tuples Φ = (φ_1, . . . , φ_{2m}). Using the same notation as before, we say that a coordinate φ_ℓ of a 2m-tuple Φ is unmatched if no labeled matrix entry X^{(γ(e))}_N(φ_ℓ(e)) that it visits is also visited, up to transposition, by another coordinate. Similarly, we say that distinct coordinates φ_ℓ and φ_{ℓ'} (i.e., ℓ ≠ ℓ') are matched if they visit a common labeled entry, up to transposition. We further say that a 2m-tuple Φ is unmatched if it has an unmatched coordinate φ_ℓ; otherwise, we say that Φ is matched. We define an equivalence relation ∼ on the coordinates of Φ by matchings; thus,

φ_ℓ ∼ φ_{ℓ'} ⟺ ∃ ℓ_1, . . . , ℓ_k ∈ [2m] : φ_{ℓ_j} and φ_{ℓ_{j+1}} are matched for j = 0, . . . , k, where ℓ_0 = ℓ and ℓ_{k+1} = ℓ'.

We write Φ̄ for the set of equivalence classes in Φ. The expectation in (3.32) then factors as a product over the classes in Φ̄. For an unmatched Φ, this product contains a zero factor, so from now on we only consider matched 2m-tuples.

We incorporate the data of such a tuple into the graph T as before. For a 2m-tuple Φ, we construct a new graph T_Φ in three steps. First, we take 2m disjoint copies (T_1, . . . , T_{2m}) of T, associated to φ_1, . . . , φ_{2m}, respectively. Second, we reverse the direction of the edges of (T_2, T_4, . . . , T_{2m}). Third, we identify the vertices according to their images under the maps Φ. Formally, the vertex set of T_Φ is φ_1(V) ∪ · · · ∪ φ_{2m}(V) ⊂ [N].

The sum over the set of matched 2m-tuples Φ can then be decomposed into a double sum. The first runs over the set S_T of (not necessarily connected) graphs T' = (V', E', γ') obtained by gluing the vertices of 2m disjoint copies of T so that each copy has at least one edge overlay with at least one other copy. We reverse the direction of the edges of the even copies beforehand, and we again keep track of the origin of the edges, E' = E^{(1)} ⊔ · · · ⊔ E^{(2m)}. The second runs over the set of injective labelings φ: V' → [N] of the vertices of T'. We write C(T') = {C_1, . . . , C_{d_{T'}}} for the set of connected components of T'. We emphasize that d_{T'} ≤ m: since each copy has an edge overlay with at least one other copy, every component contains at least two of the 2m copies. For each component C_p = (V_p, E_p, γ_p), we correspondingly write E_p = E^{(j_p(1))} ⊔ · · · ⊔ E^{(j_p(k_p))}, where j_p(1), . . . , j_p(k_p) index the copies glued into C_p.
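The equivalence relation ∼ is the connectivity relation of a simple graph: its vertices are the 2m coordinates, and its edges are the matched pairs. Its classes can therefore be computed with a union-find structure. The sketch below assumes a concrete reading of "matched": each coordinate φ_ℓ is summarized by the set of (label, unordered entry) pairs it visits, and two coordinates are matched when these sets intersect. Indices run over 0, . . . , 2m − 1 rather than 1, . . . , 2m, and all names are hypothetical. The last check illustrates why a matched tuple has at most m classes: every class contains at least two coordinates.

```python
def entry(label, j, k):
    """A labeled, unordered matrix entry; (j, k) and (k, j) give the same key."""
    return (label, frozenset({j, k}))

def matched_pairs(images):
    """images[l] is the set of labeled entries visited by phi_l (0-based).
    Two distinct coordinates are matched when their sets intersect."""
    pairs = []
    for a in range(len(images)):
        for b in range(a + 1, len(images)):
            if images[a] & images[b]:
                pairs.append((a, b))
    return pairs

def matching_classes(num_coords, pairs):
    """Classes of the equivalence relation generated by the matched pairs."""
    parent = list(range(num_coords))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    classes = {}
    for a in range(num_coords):
        classes.setdefault(find(a), []).append(a)
    return sorted(classes.values())

def is_matched(num_coords, pairs):
    """A tuple is matched when every coordinate belongs to some matched pair."""
    touched = {a for pair in pairs for a in pair}
    return touched == set(range(num_coords))

# Hypothetical 2m = 4 coordinates with a single label "x".
images = [
    {entry("x", 1, 2)},
    {entry("x", 2, 1), entry("x", 2, 3)},
    {entry("x", 3, 2), entry("x", 5, 7)},
    {entry("x", 7, 5)},
]
pairs = matched_pairs(images)
print(pairs)                       # [(0, 1), (1, 2), (2, 3)]
print(matching_classes(4, pairs))  # [[0, 1, 2, 3]]
```

Pairing the coordinates two by two instead, as in the forest of opposing colored double trees, gives exactly m classes.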
We may then recast (3.31) as (3.34). We consider a generic T ∈ S_T. Note that our earlier analysis applies to each of the connected components C_p = (V_p, E_p, γ_p). In particular, using the same notation as before, the components of a contributing T must satisfy the analogous constraints. We also have some inherent (in)equalities. Putting everything together, we arrive at the asymptotic bound. Tightness follows much as before. If we start with a tree T, we can overlay pairs of the 2m disjoint copies (T_1, . . . , T_{2m}) of T to obtain a forest of d_T = m opposing colored double trees. In this case, we have equality in (3.33) and (3.35)-(3.38).

Once again, Proposition 3.2 shows that the contribution of this forest in (3.34) is Θ(N^{m(#(N)+1)}). As in the case m = 1, a forest of m colored double trees corresponds to the worst-case scenario.
Reintroducing the standard normalization W_N = N^{-1/2} X_N, we obtain the corresponding asymptotic bound on the deviation.
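A moment bound of this kind feeds into concentration through the standard Markov and Borel-Cantelli argument. The sketch below writes Y_N as our shorthand for an injective trace τ^0_N[T(W_N)]. As a hypothesis, not a restatement of the theorem above, it assumes a bound E|Y_N − E Y_N|^{2m} ≤ C_{T,m} N^{-m} for some constant C_{T,m}.

```latex
\mathbb{P}\bigl(|Y_N - \mathbb{E}Y_N| > \varepsilon\bigr)
  \le \frac{\mathbb{E}\,|Y_N - \mathbb{E}Y_N|^{2m}}{\varepsilon^{2m}}
  \le \frac{C_{T,m}}{\varepsilon^{2m}}\, N^{-m},
\qquad
\sum_{N \ge 1} N^{-m} < \infty \quad \text{for } m \ge 2 .
```

By Borel-Cantelli, for each ε > 0 the event |Y_N − E Y_N| > ε then occurs only finitely often, almost surely. Intersecting over ε = 1/k, k ∈ ℕ, gives Y_N − E Y_N → 0 almost surely. Note that the second moment (m = 1) alone would not suffice for this step, since Σ N^{-1} diverges.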

Random band matrices
Our analysis of the Wigner matrices in Section 3 crucially relies on two features of our ensemble: the homogeneity of the vertices in our graphs T and the divergence of our normalization √N. By the first property, we mean that the label φ(v) ∈ [N] of a vertex v ∈ V does not constrain our choice of a contributing label φ(w) for an adjacent vertex w ∼_e v (or, in the case of an injective labeling φ, constrains it uniformly in the choice of φ(v)). At the level of the matrices X_N, this corresponds to the fact that any given row (resp., column) of a Wigner matrix looks much the same as any other row (resp., column). For example, if we consider a real Wigner matrix as in Definition 1.1, then the rows (resp., columns) all have the same distribution up to a cyclic permutation of the entries. More generally, there exists a permutation invariant realization of our ensemble X_N iff β_i ∈ [−1, 1]. This property does not hold for random band matrices Ξ_N = B_N • X_N. Rows near the top or the bottom of the matrix (resp., columns near the far left or the far right) will in general have fewer nonzero entries, owing to the asymmetry of the band condition B_N. We can recover the homogeneity of our ensemble by reflecting the band width across the perimeter of the matrix. This yields the so-called periodic random band matrices, which provide an intermediate model between the Wigner matrices and the random band matrices. We start with this technically simpler model and work our way up to the RBMs. We summarize the main results at the end of the section on proportional growth RBMs.

Periodic random band matrices
To begin, we formalize the periodic model.
Definition 4.2 (Periodic RBM). Let (b_N) be a sequence of nonnegative integers. We write P_N for the corresponding N × N periodic band matrix of ones with band width b_N. That is, P_N(j, k) = 1 if |j − k|_N ≤ b_N and P_N(j, k) = 0 otherwise, where |j − k|_N = min(|j − k|, N − |j − k|) denotes the periodic distance. Let X_N be an unnormalized Wigner matrix. We call the random matrix Γ_N = P_N • X_N an unnormalized periodic RBM. Using the normalization Υ_N = (2b_N)^{-1/2} J_N, we call the random matrix Λ_N = Υ_N • Γ_N a normalized periodic RBM. We simply refer to periodic RBMs when the context is clear, or when considering both definitions together.
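The homogeneity contrast between B_N and P_N shows up directly in the row sums of the two 0/1 masks. The following is a minimal NumPy sketch. It assumes the periodic distance min(|j − k|, N − |j − k|) for the reflected band, and the function names are ours. Every row of P_N has 2b_N + 1 ones when 2b_N + 1 ≤ N. The rows of B_N near the top and bottom have fewer ones. Once b_N ≥ N/2, P_N becomes the all-ones matrix J_N.

```python
import numpy as np

def band_mask(n, b):
    """0/1 matrix B_N with ones where |j - k| <= b."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return (dist <= b).astype(int)

def periodic_band_mask(n, b):
    """0/1 matrix P_N with ones where min(|j - k|, n - |j - k|) <= b."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return (np.minimum(dist, n - dist) <= b).astype(int)

print(band_mask(10, 2).sum(axis=1))           # [3 4 5 5 5 5 5 5 4 3]
print(periodic_band_mask(10, 2).sum(axis=1))  # [5 5 5 5 5 5 5 5 5 5]
```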
, which we fill in provided that the band width condition |i−j| ≤ b N (resp., |i − j| N ≤ b N ) is satisfied.
N ) i∈I be a family of unnormalized Wigner matrices as before. We consider a family of divergent band widths (b  Proof. The proof follows much along the same lines as Proposition 3.2 except that we must take care to account for the differing rates of growth in the band widths (b (i) N ) i∈I . To begin, suppose that T = (V, E, γ). By definition, we have that     Here, the situation for general β i ∈ D becomes much different. For a single periodic RBM Λ N of divergent band width b N → ∞, the LTD follows (3.16) as in the Wigner case; however, the joint LTD of P N might not exist depending on the fluctuations of the band widths (b (i) N ) i∈I . In this case, we need to make additional assumptions on the band widths (e.g., proportional growth) to ensure the existence of an asymptotic proportion for an ordering ψ of the vertices (i.e., the analogue of (3.15)). We comment more on this situation later. On the other hand, the orderings play no role in the calculation of τ 0 N [T (P N )] for an opposing colored double tree T . Thus, we can apply the criteria (3.18) in Remark 3.3 to conclude that P N = (Λ (i) N ) i∈I converges in distribution to a semicircular system a = (a i ) i∈I regardless of the parameters (β i ) i∈I .
Note that a periodic RBM Λ N with band width b N = N/2 corresponds to a standard Wigner matrix W N . As such, we can view Lemma 4.3 as a generalization of Proposition 3.2. We extend the result to include RBMs of slow growth in the next section.
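As a numerical sanity check on the semicircular limit, one can sample a normalized periodic RBM Λ_N and compare the first four moments of its ESD with the semicircle moments 0, 1, 0, 2. The sketch below assumes Gaussian entries (including on the diagonal) and the periodic distance used above. With b_N = N/2, the mask is all ones and (2b_N)^{-1/2} = N^{-1/2}, so the sketch recovers W_N. For finite b_N, we expect the fourth moment to carry a correction of order 1/b_N, so the agreement is only approximate.

```python
import numpy as np

def periodic_rbm(n, b, rng):
    """Lambda_N = (2b)^{-1/2} (P_N o X_N) with Gaussian X_N (an assumed choice)."""
    g = rng.standard_normal((n, n))
    x = np.triu(g) + np.triu(g, 1).T
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    mask = np.minimum(dist, n - dist) <= b
    return np.where(mask, x, 0.0) / np.sqrt(2 * b)

def esd_moments(a, kmax=4):
    """First kmax moments of the empirical spectral distribution of symmetric a."""
    eig = np.linalg.eigvalsh(a)
    return [float(np.mean(eig ** k)) for k in range(1, kmax + 1)]

rng = np.random.default_rng(1)
banded = esd_moments(periodic_rbm(600, 60, rng))
wigner_case = esd_moments(periodic_rbm(600, 300, rng))  # b_N = N/2
print(banded, wigner_case)
```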

Slow growth
To begin, we partition the index set I of our matrices X N = (X  (4.10) We form the corresponding family of periodic RBMs as before, For i ∈ I 2 , we also form the corresponding family of slow growth RBMs (Definition 1.2), S if T is a colored double tree, 0 otherwise. (4.11) Proof. In view of Lemma 4.3, it suffices to show that Of course, the only difference between the families P N and M N comes from the periodization of the slow growth RBMs S N . Equation (4.12) then asserts that the contribution of the additional entries arising from this periodization becomes negligible in the limit.
Our notation works just as well in this case to produce the analogue of (4.4) for our sum, U (γ(e )) N (φ(e )) .
Naturally, we then look for the analogue of (4.5). Note that the corresponding version of the band width condition (4.7) must now take into account the index γ(e For an edge e ∈ N , we define |φ(e)| = |φ(src(e)) − φ(tar(e))|. We may then write the analogue of (4.5) for our family M N as X (γ(e )) N (φ(e )) .
is of course a finite set. In this case, we have the bound ]. We may then recast (4.17) as . (4.20) T being a colored double tree, we know that  Equations (4.19)-(4.21) formalize our earlier intuition. The periodic version of an RBM differs from the original only within band width's distance of the perimeter. For a slow growth RBM, one must be very close to the perimeter to see this difference, so the corresponding interior region accounts for the bulk of the calculations. The result now follows.

Figure 9: An illustration of the "interior" region of a random band matrix (resp., periodic random band matrix) at band width's distance b_N/N = o(1) from the perimeter. Here, we cut off the boundary to see that the two interior regions are indeed identical.

If I ≠ I_2, then we need to account for the possibility that the band widths of the periodic RBMs are large enough to bring us close to the perimeter, so that the walk crosses over with a step from a periodized version of a slow growth RBM. Following the simpler case I = I_2, our analysis shows that a generic walk stays within a region in which the slow growth RBMs and their periodized versions are identical.
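The claim that periodization only changes entries near the perimeter can be quantified exactly. For N ≥ 2b_N + 1, the periodic band has exactly b_N(b_N + 1) more ones than the ordinary band, all sitting in the two corners, out of N(2b_N + 1) ones in total. The relative change is therefore about b_N/(2N). It vanishes under slow growth and stays near c/2 under proportional growth b_N ≈ cN. The counting sketch below reuses the assumed periodic distance min(|j − k|, N − |j − k|); the function name is ours.

```python
import numpy as np

def periodization_excess(n, b):
    """Count entries inside the periodic band but outside the ordinary band."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    ordinary = dist <= b
    periodic = np.minimum(dist, n - dist) <= b
    return int(np.sum(periodic & ~ordinary))

# b = sqrt(n) (slow growth) versus b = n / 4 (proportional growth)
for n, b in [(400, 20), (1600, 40), (400, 100), (1600, 400)]:
    excess = periodization_excess(n, b)
    print(n, b, excess, excess / (n * (2 * b + 1)))
```

The printed fractions shrink from about 0.026 to 0.013 in the slow growth rows and stay near 0.125 in the proportional rows.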
We encounter the same problem from before when considering general β i ∈ D: without further assumptions on the band widths (b (i) N ) i∈I , their fluctuations could possibly preclude the existence of a joint LTD. In general, we must again settle for the convergence N ) i∈I2 in distribution to a semicircular system a = (a i ) i∈I . Recall that the Wigner matrices W N are asymptotically traffic independent iff β i ∈ R, and that a permutation invariant realization of our ensemble W N exists iff β i ∈ R. One might then ask if permutation invariance is a necessary condition for matricial asymptotic traffic independence; however, we see that this is not the case. In particular, one cannot find a permutation invariant realization of the periodic RBMs (except in the trivial case of b N ∼ N/2), nor of the slow growth RBMs. Instead, we relied on the aforementioned homogeneity property and the divergence of our normalization. Taken alone, neither of these two properties suffices, as we shall see in the proportional growth regime (which lacks homogeneity) and the fixed band width regime (which has a fixed normalization).

Proportional growth
Not surprisingly, the periodization trick from the previous section fails for proportional growth RBMs unless c = 1 (recall that c = lim_{N→∞} b_N/N ∈ (0, 1]). In the case of a proper proportion c ∈ (0, 1), the entries introduced by reflecting the band width across the perimeter account for an asymptotically nontrivial region in the unit square. They therefore no longer make a negligible contribution to the calculations. Nevertheless, we can adapt our earlier work to prove the existence of a joint LTD supported on colored double trees T, although in general the value of this limit will depend on the particular degree structure of the graph T. To formalize our result, we now split the index set I = I_1 ∪ I_2 ∪ I_3 ∪ I_4 into four camps. We consider a class of divergent band widths (b For i ∈ I_1 ∪ I_2, we form the corresponding families of periodic RBMs and slow growth RBMs as before, For i ∈ I_3 ∪ I_4, we form the corresponding families of proportional growth RBMs, We start with the simpler case of the single family O if T is a colored double tree, 0 otherwise, (4.22) where p_T(c_4) > 0 depends only on the test graph T and the proportions c_4 = (c_i)_{i∈I_4}.
Proof. As usual, we begin by expanding and rewriting the summands as  We may think of the ratio C_N(T)/N^{#(V)} as the proportion of admissible maps φ: V → [N]. Unfortunately, the vertices of our graph T lack the earlier homogeneity property because of the asymmetry of the band condition (4.23). This makes computing C_N(T) extremely tedious (and highly dependent on T). Nevertheless, much as in [7], we can give an integral representation of the limit of this ratio. In particular, a straightforward weak convergence argument shows that The remaining term in (4.22) follows as . For general β_i ∈ D, we must again keep track of the orderings ψ of the vertices. In this case, we combine the integrands of (3.15) and (4.24) to define , which replaces the 1/#(V)! term in (3.16). In particular, we can write the LTD of O Similarly, we group the normalizations coming from the twin edges F ⊂ E with ). If F = E, we write Cut_{T,r} = Cut_{E,r} (resp., Norm_T(c_4) = Norm_E(c_4)). In this case, We will need some simple bounds on the integral Int_T(c_4). We start with an easy upper bound. Consider a leaf vertex v_0 of our colored double tree T. Let v_1 ∼_{[e_0]} v_0 denote the unique vertex adjacent to v_0. We compute the diameter f(x_{v_1}) of a cross section in the banded strip of the unit square [0, 1]^2. In particular,

It follows that
where T \ [e_0] is the colored double tree obtained from T by removing the leaf v_0 and its adjacent twin edges [e_0]. Iterating this construction, we obtain the upper bound The same reasoning of course shows that but we can do much better for small proportions c_4. In particular, assume that Then Thus, for ĉ < 1/2, we have the bounds  We view the limit ĉ → 0^+ as approaching the slow growth regime. In view of (4.27), the LTD (4.22) of the proportional growth RBMs behaves accordingly; in particular, it converges to the LTD (4.11) of the slow growth RBMs.
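The cross-section length f(x) = |{y ∈ [0, 1] : |x − y| ≤ c}| of the banded strip is piecewise linear. For c ∈ (0, 1], it satisfies the elementary bounds c ≤ f(x) ≤ 2c, the kind of estimate that drives a leaf-by-leaf upper bound. We state these bounds as plain geometry, not as the paper's (4.26). The small sketch below (names ours) evaluates f and checks the strip area ∫_0^1 f(x) dx = c(2 − c) with the midpoint rule. It covers only the one-edge geometry, not the normalization entering Int_T(c_4).

```python
import numpy as np

def cross_section(x, c):
    """Length of {y in [0, 1] : |x - y| <= c} for x in [0, 1], c in (0, 1]."""
    return np.minimum(x + c, 1.0) - np.maximum(x - c, 0.0)

def strip_area(c, n=200_000):
    """Midpoint-rule approximation of the area of {|x - y| <= c} in [0, 1]^2."""
    x = (np.arange(n) + 0.5) / n
    return float(np.mean(cross_section(x, c)))

for c in (0.1, 0.3, 0.7, 1.0):
    print(c, strip_area(c), c * (2 - c))
```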
In an easier direction, we can also consider the limit c = min One then clearly has (4.28) We can push this limit through the integral by dominated convergence to obtain Of course, the same convergence also holds for the normalizations (4.25), (4.31) We view the limit c → 1 − as approaching the usual Wigner matrices, or, more generally, the full proportion RBMs. Again, our limit (4.31) shows that the LTD (4.22) behaves accordingly (in particular, we have convergence to the LTD (3.5) of the Wigner matrices).
Up to now, our analysis of the integral Int T (c 4 ) essentially follows [7]. We take care to account for possibly different band widths by grouping them in the min c or the maxĉ, but in both cases we indiscriminately send the proportions to a single boundary value {0, 1}. From this point of view, we fail to perceive any differences in the limits We start with the simpler case of sending the band width c i0 of a single label i 0 ∈ I 4 in our colored double tree T to 1 − . We write T i0 = (V i0 , E i0 ) for the subgraph of T with edge labels in i 0 . In general, T i0 is a forest of colored double trees (in the single "color" i 0 ). We define T i0 = (V i0 , E i0 ) as before. We remove the twin edges E i0 from T to obtain a forest of colored double trees T \ E i0 (say, with connected components T 1 , . . . , T k ). We emphasize that we only remove the edges E i0 ; in particular, we keep any resulting isolated vertices. We then have the analogues of (4.28)-(4.30):   It follows that In particular, if T consists of an isolated vertex, then p T (c 4 ) = 1. One can then effectively discard the isolated vertices of T \ E i0 and just consider the resulting forest of nontrivial colored double trees. We choose to keep these vertices in writing a simple, consistent formula for our limit.
Of course, there is nothing special about only sending one of the band widths c i0 → 1 − . In fact, the same argument clearly applies to any collection of labels i 0 , . . . , i j in a colored double tree T . We state the full result later once we have also considered the behavior of p T (c 4 ) in the limit c i0 → 0 + , but first we must introduce some more notation.
For any pair of subsets W ⊂ V and F ⊂ E, we define the conditional expectation For example, the reader can easily verify that As before, we start with a single label i_0 ∈ I_4 in T, for which we now consider the limit c_{i0} → 0^+. To simplify the argument, we first assume that there is a unique pair of twin edges [e_{i0}] with the label γ([e_{i0}]) = i_0. For notational convenience, we write We condition on the vertices {a, b} to obtain is a bounded continuous function that does not depend on c_{i0}, and is the uniform (probability) measure on the banded strip in the unit square [0, 1]^2 defined by |x_a − x_b| ≤ c_{i0}. In the limit, we have the weak convergence In particular, this implies that ] is the colored double tree obtained from T by contracting the twin edges [e_{i0}] (i.e., we remove the edges [e_{i0}] and merge the vertices {a, b}). Note the contrast with the situation in (4.36) in the limit c_{i0} → 1^−, where we remove the edges but do not otherwise modify the vertices.
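The weak convergence of the uniform measure on the strip {|x_a − x_b| ≤ c_{i0}} to the diagonal as c_{i0} → 0^+ is what turns an edge into a contraction. The grid-based sketch below (names ours) averages a continuous test function g over the strip. As the strip narrows, the average approaches ∫_0^1 g(x, x) dx. For g(x, y) = xy this limit is 1/3, whereas over the whole square (c = 1) the average is 1/4.

```python
import numpy as np

def strip_average(g, c, n=1000):
    """Average of g(x, y) under the uniform measure on {|x - y| <= c} in [0, 1]^2,
    approximated on an n-by-n midpoint grid."""
    t = (np.arange(n) + 0.5) / n
    x, y = np.meshgrid(t, t, indexing="ij")
    mask = np.abs(x - y) <= c
    return float(np.sum(g(x, y) * mask) / np.sum(mask))

def product(x, y):
    return x * y

for c in (1.0, 0.2, 0.05, 0.02):
    print(c, strip_average(product, c))  # decreasingly far from 1/3 as c shrinks
```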
We must take care if the label i 0 appears in more than one set of twin edges. In any case, we can always identify the subgraph T i0 of T with edge labels in i 0 . In general, is again a bounded continuous function that does not depend on c i0 . In this case, we cannot immediately write (4.38) in terms of probability measures as we did in (4.37) since, in general, however, our work (4.27) from before shows that Thus, we can instead write As before, we note that where T /T i0 is the colored double tree obtained from T by contracting the edges of T i0 (i.e., for each ∈ [k], we remove the edges E and merge the vertices V into a single vertex).
We can easily adapt our argument to accommodate multiple band widths c i0 , . . . , c ij in the limit max(c i0 , . . . , c ij ) → 0 + . In this case, we replace T i0 with T i , the subgraph of T with edge labels in i = {i 0 , . . . , i j }; otherwise, the same argument goes through just as well.

Figure 11: A comparison of the resulting graphs in the limit c_{i0} → 1^− (resp., c_{i0} → 0^+). Here, we start with a colored double tree T and remove (resp., contract) the edges with label x_{i0} to obtain the limit graph T \ E_{i0} (resp., T/T_{i0}). Note that the two operations can produce very different graphs.
At this point, we see how the limits (4.32) arise in different ways. In the limit ĉ → 0^+, we contract all of the edges, leaving a single isolated vertex. In the limit c → 1^−, we remove all of the edges, leaving #(V) isolated vertices.
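The two limiting operations on a colored double tree can be sketched in a few lines of Python: removing the twin edges with a given label (c_{i0} → 1^−) and contracting them (c_{i0} → 0^+). The sketch assumes that each pair of twin edges is stored once as an undirected triple (u, w, label); all names are hypothetical. The helper limit_forest combines both operations, contracting the labels sent to 0^+ and removing those sent to 1^−, to mimic the forest F. The checks reproduce the two extreme cases: contracting everything leaves one vertex, and removing everything leaves #(V) isolated vertices.

```python
def _find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def remove_labels(vertices, edges, labels):
    """c -> 1^- picture: delete every twin-edge pair whose label is in `labels`,
    keeping all vertices (isolated vertices survive)."""
    return sorted(vertices), [e for e in edges if e[2] not in labels]

def contract_labels(vertices, edges, labels):
    """c -> 0^+ picture: merge the endpoints of every twin-edge pair whose label
    is in `labels`, then drop those edges."""
    parent = {v: v for v in vertices}
    for u, w, lab in edges:
        if lab in labels:
            parent[_find(parent, u)] = _find(parent, w)
    new_vertices = sorted({_find(parent, v) for v in vertices})
    new_edges = [(_find(parent, u), _find(parent, w), lab)
                 for u, w, lab in edges if lab not in labels]
    return new_vertices, new_edges

def count_components(vertices, edges):
    """Number of connected components of the (multi)graph."""
    parent = {v: v for v in vertices}
    for u, w, _ in edges:
        parent[_find(parent, u)] = _find(parent, w)
    return len({_find(parent, v) for v in vertices})

def limit_forest(vertices, edges, to_zero, to_one):
    """Contract labels sent to 0^+ and remove labels sent to 1^- (disjoint sets)."""
    v, e = remove_labels(vertices, edges, to_one)
    return contract_labels(v, e, to_zero)

V = [0, 1, 2, 3, 4]
E = [(0, 1, "a"), (1, 2, "b"), (1, 3, "a"), (3, 4, "b")]
print(count_components(*remove_labels(V, E, {"a"})))  # 3
print(contract_labels(V, E, {"a"}))  # ([2, 3, 4], [(3, 2, 'b'), (3, 4, 'b')])
```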
Finally, the result for a collection of band widths sent to possibly different boundary values should come as no surprise. We combine our work in the two previous cases, taking care to account for parts moving simultaneously in different directions. To begin, let J_0 (resp., J_1) denote the collection of labels in our colored double tree T whose band widths are to be sent to 0^+ (resp., 1^−). We define c_0 = (c_i)_{i∈J_0} and c_1 = (c_i)_{i∈J_1}, and write c_2 = c_4 \ (c_0 ∪ c_1) for the remaining band widths. We are then interested in the limit c_0 → 0^+, c_1 → 1^−. We decompose our graph as before. We write T_{0+} for the subgraph of T with edge labels in J_0. In general, T_{0+} = (V_{0+}, E_{0+}) is a forest T_{0+} = T^+_1 ⊔ · · · ⊔ T^+_k of colored double trees T^+_ℓ = (V^+_ℓ, E^+_ℓ), except that these may now have multiple colors. Similarly, we write T_{1−} = (V_{1−}, E_{1−}) for the subgraph of T with edge labels in J_1. Finally, we write E_2 = E \ (E_{0+} ∪ E_{1−}) for the remaining edges. Conditioning on the vertices V_{0+} = V^+_1 ⊔ · · · ⊔ V^+_k of T_{0+}, we obtain the analogue of (4.39), where δ(c_0) is a real number depending on c_0 such that Despite considering multiple band widths c_0, we still have the weak convergence As before, is a bounded continuous function that does not depend on c_0; however, f_{c_1} does depend on c_1. In particular, the function it follows that The monotonicity of Cut_{E_{1−},c_1} in the proportions c_1 then allows us to conclude that where F is the forest of colored double trees F = T_1 ⊔ · · · ⊔ T_s obtained from T by removing the edges E_{1−} and contracting the edges E_{0+}. Our treatment of p_T(c_4) suggests the following form for the joint LTD of the matrices O_N. We leave the by-now familiar details of the proof to the diligent reader. (4.40) where F = T_1 ⊔ · · · ⊔ T_s is the forest of colored double trees obtained from T by contracting the edges with labels in I_2 and removing the edges with labels in I_3.

Theorem 4.8 (Traffic convergence for RBMs). For any test graph
Proof. The statements about asymptotic traffic independence follow from the calculation of the forest F from our colored double tree T (we simply remove the edges with labels in I_3) and the multiplicativity of (4.41). For the statements about non-asymptotic traffic independence, we give a simple counterexample for i_2 ∈ I_2 and i_4 ∈ I_4, which covers both statements.
The careful reader will notice that if the band widths of the periodic RBMs P_N have a subsequence of slow growth and another subsequence of proportional growth, then the LTDs along these two subsequences will differ. If we assume that the band widths b (1) fall into one of these two regimes, slow growth or proportional growth, then we can prove the extension of Theorem 4.8 to P^{(1)}_N. In this case, the LTD essentially follows (4.40), except that we must now also contract the edges with labels in I 1 and remove the edges with labels in I 1 (regardless of the limiting proportions The contraction of the edges with labels in I 1 should come as no surprise given Lemma 4.4, where we saw that periodizing a slow growth RBM does little to affect the calculations. Just as we contract the labels in I_2, we should then also expect to contract the labels in I 1. On the other hand, as we noted before, periodizing a proportional growth RBM changes the situation entirely. Formally, we need to work with the periodic absolute value 1] in our integral to account for the edges with labels in I 1; however, the analogue of (4.26) does not depend on where we measure the diameter of our cross section This balances out perfectly with the normalization of the periodic RBMs, so we can integrate out the vertices that are only adjacent to edges with labels in I 1 without changing the value of the integral. This corresponds to simply removing the edges with labels in I 1 when calculating p_F(c_4). Iterating the proof of Corollary 4.9, we see that the periodic RBMs P For general β_i ∈ D, we must again settle for convergence in distribution.
The family a 1 ∪ a 1 ∪ a 2 ∪ a 3 is a semicircular system; the families a 1 , a 3 , and a 4 are free; the families a 2 and a 4 are not free, nor are the families a 1 and a 4 ; finally, the family a 4 = (a i ) i∈I4 is not free.
Proof. The convergence in distribution follows from a modified version of the criteria (3.18) in Remark 3.3. In particular, we do not actually need to know the value of for an opposing colored double tree T , just that it exists. In this case, we know that the value of this limit is equal to p F (c 4 ), which in turn is equal to 1 if there are no edges with labels in I 4 . This proves the first statement about a 1 ∪ a 1 ∪ a 2 ∪ a 3 .
For the second statement, about a 1 ∪ a 3 ∪ a 4 , it suffices to prove that a 3 and a 4 are free. Indeed, this follows from the calculation of p F (c 4 ): edges with labels in either I 1 or I 3 are both treated just the same and simply removed. In particular, this implies that the joint distributions µ a 1 ∪a3∪a4 and µ a 3 ∪a3∪a4 = µ b3∪a4 are identical, where a 3 is the limit of the full proportion RBMs O N . The standard techniques then apply to show that a 3 and a 4 are free [27].
Similarly, the joint distributions µ a2∪a4 and µ a 1 ∪a4 are also identical, so we only need to consider the families a_2 and a_4. Let a_{i2} ∈ a_2 and a_{i4} ∈ a_4. If a_{i2} and a_{i4} were free, then ϕ(a_{i4}^2 a_{i2} a_{i4}^2 a_{i2}) = ϕ(a_{i4}^2)^2 ϕ(a_{i2}^2) = 1; however, one can easily calculate that this moment takes a different value. Finally, suppose that a_{i4} ≠ a_{j4} ∈ a_4 with 0 < c_{i4} ≤ c_{j4} < 1. If a_{i4} and a_{j4} were free, then ϕ(a_{i4}^2 a_{j4}^2) = ϕ(a_{i4}^2)ϕ(a_{j4}^2) = 1; however, one can again show that this moment differs from 1, using p_S({c_{i4}, c_{j4}}) as in the proof of Corollary 4.3.

We note that the periodic RBMs P^{(1)}_N converge in distribution to a semicircular system regardless, even without assuming that their band widths fall into a single growth regime.
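The freeness identities used in the proof can be probed numerically for two independent Wigner matrices, which are asymptotically free. With ϕ = (1/N) tr, both ϕ(a²ba²b) and ϕ(a²b²) should be close to 1, the values predicted by freeness. This minimal sketch assumes Gaussian entries and reproduces only the free benchmark. The normalization of the slow and proportional growth RBMs (Definition 1.2) is not restated in this section, so the contrasting values for a_{i2} and a_{i4} are not computed here.

```python
import numpy as np

def wigner(n, rng):
    """Normalized real Wigner matrix with N(0, 1) entries on and above the diagonal."""
    g = rng.standard_normal((n, n))
    return (np.triu(g) + np.triu(g, 1).T) / np.sqrt(n)

def phi(m):
    """Normalized trace (1/N) tr(m)."""
    return float(np.trace(m)) / m.shape[0]

def free_benchmarks(n=800, seed=2):
    """Return (phi(a^2 b a^2 b), phi(a^2 b^2)) for independent Wigner a, b."""
    rng = np.random.default_rng(seed)
    a, b = wigner(n, rng), wigner(n, rng)
    a2 = a @ a
    return phi(a2 @ b @ a2 @ b), phi(a2 @ b @ b)

print(free_benchmarks())  # both entries near 1
```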
Finally, the same considerations that allowed us to translate Proposition 3.2 to Theorem 4.8 also work to prove the RBM version of the concentration inequalities in Theorem 3.5. Here, we do not make any assumptions on the band widths (b (i) N ) i∈I1 beyond their divergence (4.1), nor on the parameters β i ∈ D.
The bound is tight in the sense that there exist test graphs T ∈ T x such that As before, we can use Theorem 4.12 to upgrade the convergence in Theorems 4.8 and 4.10 to the almost sure sense.
In particular, note that Θ

Fixed band width
We have much less to say in the fixed band width regime. For starters, we cannot work in the generality of the Wigner matrices of Section 3. Instead, we must further assume that the off-diagonal entries (resp., the diagonal entries) of X N are identically distributed and independent of N ; otherwise, in general, the LSD of even a single fixed band width RBM Θ N = Υ N • Ξ N = Υ N • (B N • X N ) might not exist, never mind the LTD.
We assume hereafter that any fixed band width RBM arises from this restricted setting.
Assuming a symmetric distribution for the entries of X N , Section 6 in [7] proves the existence of a symmetric non-universal LSD µ b for a real symmetric RBM Θ N of fixed band width b N ≡ b. The authors further prove that the distribution µ b converges weakly to the standard semicircle distribution µ sc in the limit b → ∞. We consider the joint LTD of independent fixed band width RBMs (real and complex) without this symmetry assumption and prove the analogous convergence to the semicircular traffic distribution in the large band width limit.
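In the fixed band width regime, both the non-universality and the behavior as b → ∞ already appear in the fourth moment. By the moment method, away from the boundary, E (1/N) tr Θ_N^4 is a sum over closed walks of length 4 with steps of size at most b, weighted by the moments of the entries and divided by (2b + 1)^2. The sketch below enumerates these walks exactly. Its assumptions are ours: real symmetric, centered entries; off-diagonal variance 1 and fourth moment kappa_off; diagonal variance var_diag and fourth moment kappa_diag. The boundary rows, which contribute O(b/N), are ignored. With var_diag = 1 and kappa_diag = kappa_off = κ, the enumeration gives 2 + (κ − 2)/(2b + 1). This value depends on κ (Gaussian κ = 3 versus Rademacher κ = 1) and tends to the semicircle value 2 as b → ∞.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def expected_fourth_moment(b, kappa_off, var_diag=1, kappa_diag=None):
    """Interior limit of E (1/N) tr Theta_N^4 for normalization (2b + 1)^{-1/2}."""
    if kappa_diag is None:
        kappa_diag = kappa_off
    steps = range(-b, b + 1)
    total = Fraction(0)
    for s1, s2, s3 in product(steps, repeat=3):
        walk = [0, s1, s1 + s2, s1 + s2 + s3, 0]
        if abs(walk[3]) > b:  # the closing step must also stay in the band
            continue
        mult = Counter(frozenset(walk[t:t + 2]) for t in range(4))
        weight = Fraction(1)
        for edge, m in mult.items():
            is_diag = len(edge) == 1
            if m % 2:
                # An odd multiplicity in a length-4 walk forces some entry to
                # appear exactly once, so the expectation vanishes.
                weight = Fraction(0)
                break
            if m == 2:
                weight *= var_diag if is_diag else 1
            else:  # m == 4: the same entry appears four times
                weight *= kappa_diag if is_diag else kappa_off
        total += weight
    return total / (2 * b + 1) ** 2

for b in (1, 2, 5, 10):
    print(b, float(expected_fourth_moment(b, 3)), float(expected_fourth_moment(b, 1)))
```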
To formalize our result, we consider a class of fixed band widths b = (b (i) We form the corresponding family of fixed band width RBMs We write µ i (resp., ν i ) for the distribution of the strictly upper triangular entries X (i) N (j, k) (resp., the diagonal entries X (i) N (j, j)) so that µ i = L(X (i) N (j, k)) and ν i = L(X (i) N (j, j)), ∀j < k.
In contrast to the previous sections, our fixed normalizations Υ (i) N = (2b i + 1) −1/2 J N force us to also consider non-tree-like test graphs T in the large N limit.  where µ = (µ i ) i∈I , ν = (ν i ) i∈I , and (V, N 0 ) is any spanning tree of (V, N ).

We note the contrast to the situation in (3.14). In particular, we cannot use the same weak convergence argument to give an integral representation of lim N →∞ p we see that the sequence (a Theorem 4.14 still holds for general β i ∈ D: in fact, since we already kept track of the orderings ψ, the same proof goes through just as well (except with different values for S ψ (T )). In this case, the limit (4.45) might not exist depending on the relative rates of growth in the band widths (b i ) i∈I . If we assume that the band widths grow at the same rate in the limit b → ∞, then the proportions q (ψ) N will tend to 1 #(V ) as in (3.16), but one can skew these proportions along different subsequences to create an obstruction. One can also periodize the fixed band width RBMs without affecting the calculations (a fixed band width is in some sense the slowest growth possible, and so we can adapt the techniques from the slow growth case).
At this point, we can combine everything into a result for the joint (traffic) distribution of periodic RBMs, slow growth RBMs, proportional growth RBMs, and fixed band width RBMs; however, the result is not much more interesting than what is already known from the previous section due to the form of the LTD (4.44). In particular, we do not have any interesting asymptotic independences arising between the fixed band width RBMs and those of the previously considered regimes, nor amongst the fixed band width RBMs themselves (except in the trivial case b i = 0 of the diagonal matrices).

A An almost Gaussian degree matrix
As an application of Theorem 4.8, we compute the LSD of the degree matrix deg(Θ N ) of a proportional growth RBM Θ N . For simplicity, we restrict our attention to real Wigner matrices X N . We find that the LSD is almost Gaussian in the sense of its moments.
As before, we form the corresponding proportional growth RBMs, the unnormalized Ξ_N and the normalized Θ_N. Let c = lim_{N→∞} b_N/N ∈ (0, 1] denote the limiting proportion of the band width b_N. The entries of the degree matrix D_N = deg(Θ_N) can then be written as One can use the asymptotics of partial sums of falling factorials to compute the limits, for example, by choosing a convenient realization of the random variables X_N(i, k) and then appealing to the universality of (4.40). However, one can avoid such a tedious calculation and obtain the answer from (4.40) directly. In particular, note that we can factor the distribution µ_{D_N} through the traffic distribution ν_{Θ_N} via where we have made use of (4.26) in the last equality.
We recognize the double factorial (2ℓ − 1)!! as the 2ℓ-th moment of the standard normal distribution. In view of Theorem 4.12, the limits (A.1) and (A.2) show that µ_{D_N} converges weakly almost surely to a symmetric distribution ν_c of unit variance with almost Gaussian moments (if c = 1, then these moments are precisely Gaussian). In particular, we can compute the limits
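The Gaussian comparison can be checked directly in the Wigner case, which the text identifies with c = 1. Assuming that deg(A) is the diagonal matrix of row sums, each degree D(i, i) = Σ_k W_N(i, k) is a normalized sum of independent entries. Its even moments should therefore approach (2ℓ − 1)!!. This minimal Monte Carlo sketch assumes Gaussian entries, and the names are ours. It does not reproduce the almost Gaussian moments of ν_c for c < 1, which depend on the normalization of Definition 1.2.

```python
import numpy as np

def double_factorial(k):
    """k!! for odd k >= -1; (2l - 1)!! is the 2l-th moment of N(0, 1)."""
    out = 1
    while k > 1:
        out *= k
        k -= 2
    return out

def degree_moments(n, trials, powers=(2, 4), seed=0):
    """Empirical even moments of the degrees D(i, i) = sum_k W(i, k) of
    normalized Gaussian Wigner matrices, pooled over `trials` samples."""
    rng = np.random.default_rng(seed)
    degrees = []
    for _ in range(trials):
        g = rng.standard_normal((n, n))
        w = (np.triu(g) + np.triu(g, 1).T) / np.sqrt(n)
        degrees.append(w.sum(axis=1))
    d = np.concatenate(degrees)
    return {p: float(np.mean(d ** p)) for p in powers}

moments = degree_moments(1000, 5)
print(moments, [double_factorial(2 * l - 1) for l in (1, 2)])  # compare with 1 and 3
```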