A random walk approach to linear statistics in random tournament ensembles

We investigate the linear statistics of random matrices with purely imaginary Bernoulli entries of the form $H_{pq} = \overline{H}_{qp} = \pm i$, that are either independently distributed or exhibit global correlations imposed by the condition $\sum_{q} H_{pq} = 0$. These are related to ensembles of so-called random tournaments and random regular tournaments respectively. Specifically, we construct a random walk within the space of matrices and show that the induced motion of the first $k$ traces in a Chebyshev basis converges to a suitable Ornstein-Uhlenbeck process. Coupling this with Stein's method allows us to compute the rate of convergence to a Gaussian distribution in the limit of large matrix dimension.


Introduction
The idea of using a stochastic dynamical evolution to unearth the spectral properties of random matrices was first proposed by Dyson [1]. His insight was that, by initiating a suitable Brownian motion within the space of certain invariant matrix ensembles, one could induce a corresponding motion in the eigenvalues, which is independent of the eigenvectors. Thus, solving the associated Fokker-Planck equation for the stationary solution would recover the joint probability density function for the eigenvalues. Dyson Brownian motion (DBM), as it is now known, has since become a powerful tool in random matrix theory (see for instance [2][3][4]). In [5] the present authors advocated an approach in which the idea of using stochastic dynamics to obtain spectral statistics was extended to Bernoulli matrix ensembles. In particular, we argued heuristically that by initiating a suitable discrete random walk in the space of matrices, the induced motion of the eigenvalues would tend, in some fashion, to DBM in the limit of large matrix size. Then, as a consequence, the spectral properties of Bernoulli matrices would converge to those of the Gaussian orthogonal ensemble (GOE). In the present article we apply this approach to the linear statistics of matrices associated to random tournaments and random regular tournaments. Tournament graphs are widely studied objects in combinatorics, with results and open questions regarding their enumeration, score sequences, cycle properties and Perron-Frobenius eigenvalues, for instance [6][7][8][9][10][11][12]. However, beyond [13], there appears to be little analysis from a random matrix theory perspective.
For a random (self-adjoint) $N \times N$ matrix $M$ with eigenvalues $\lambda_1, \ldots, \lambda_N$, the linear statistic, for some test function $h$, refers to the distribution of the random variable $\Phi_h(M) = \sum_{i=1}^{N} h(\lambda_i(M))$ (1.1). For Wigner-type ensembles the rescaled mean $\mathbb{E}[\Phi_h(M)]/N$ converges, as $N \to \infty$, to the integral of $h$ against the semicircle density [14,15]. This result is therefore, in some respects, analogous to the law of large numbers in standard probability theory.
One is therefore led to the question regarding fluctuations about this mean, i.e. what is the distribution of Φ h (M ) − E[Φ h (M )] for some particular random matrix ensemble? This was first addressed by Jonsson [16] in the case of Wishart matrices, showing this random variable is Gaussian in the large N limit. Later, this was also shown to be the case for Wigner matrices [17,18] and also for β-ensembles with appropriate potentials [19] for various forms of the test function h. Notice there is no obvious analogy with the classic CLT, since the eigenvalues in (1.1) are highly correlated, meaning the usual 1/ √ N normalisation is not required. Proving this behaviour has become a key part of the universality hypothesis within random matrix theory, since it addresses the global spectral behaviour, and has thus garnered much attention since the first results were established. For instance, many authors have attempted to classify for which test functions h the Gaussian behaviour is retained [20][21][22][23][24]. Others have investigated large deviation aspects [25], rates of convergence [26] or different kinds of random matrix ensembles such as band matrices [27] or those with non-trivial correlations [26,28].
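As a quick numerical illustration of this point, the following sketch (our own, not part of the analysis in this article; the rescaling $H/\sqrt{N}$, the monomial test function and the sample sizes are choices we make for illustration) checks that the variance of a polynomial linear statistic of a matrix with independent $\pm i$ entries does not grow with $N$, so no $1/\sqrt{N}$ normalisation is needed:

```python
import numpy as np

def sample_ite(N, rng):
    """Sample a matrix with independent purely imaginary Bernoulli entries,
    H_pq = conj(H_qp) = +/- i and zero diagonal."""
    signs = rng.choice([-1.0, 1.0], size=(N, N))
    upper = np.triu(signs, k=1)
    return 1j * (upper - upper.T)   # Hermitian by construction

def linear_statistic(H, k=4):
    """Phi_h(H) for h(x) = x^k, applied to the rescaled matrix H / sqrt(N)."""
    N = H.shape[0]
    lam = np.linalg.eigvalsh(H / np.sqrt(N))
    return np.sum(lam ** k)

rng = np.random.default_rng(0)
for N in (40, 80):
    vals = [linear_statistic(sample_ite(N, rng)) for _ in range(200)]
    # the empirical variance stays O(1) as N doubles
    print(N, np.var(vals))
```

The eigenvalues are strongly correlated, which is what suppresses the fluctuations relative to a sum of independent terms.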
To show the convergence of (1.1) for all polynomial test functions of degree k one may, instead, show the joint convergence for a polynomial basis. A particularly convenient choice are the Chebyshev polynomials of the first kind: it was first observed by Johansson [19] that if M is chosen from one of the standard Gaussian ensembles, then in the limit of large matrix size the random variables (Tr(T 1 (M )), . . . , Tr(T k (M ))) converge to independent Gaussian random variables. A Brownian motion approach has already been used to show convergence to independent Gaussian random variables of Tr(T n (M )) in the Gaussian unitary ensemble [29] and more general β-ensembles [30], as well as traces of unitary matrices Tr(U n ) in the classical compact groups [31] and the circular β-ensembles [32]. In particular, the works [30][31][32] utilised a multivariate form of Stein's method, developed by Chatterjee & Meckes and Reinert & Röllin [33][34][35], to obtain rates of convergence, something which, beyond [26], is often neglected in the analysis of linear statistics. However the scenarios [29,31,32] have involved invariant matrix ensembles, which have permitted the use of exact expressions for the eigenvalue motion, which are not available in this context. We therefore turn to an alternative combinatorial approach, similar to that applied in [36] for the unimodular ensemble and [37,38] for random regular graphs. In particular, we express the variables Tr(T n (M )) in terms of sums over non-backtracking cycles and analyse how these behave under the random walk. The difficulties arise in providing accurate bounds for the remainder terms, which involve the expectations of certain products of matrix elements with respect to the appropriate ensembles.
The article is outlined as follows: In Section 2.1 we discuss the ensembles of random tournaments and random regular tournaments, which lead to Definition 1 and Definition 2 for the matrix ensembles we call the imaginary tournament ensemble (ITE) and regular imaginary tournament ensemble (RITE) respectively. We then present our main results, given in Theorem 2.1 and Theorem 2.2, which provide rates of convergence to independent Gaussian random variables of the first k traces of Chebyshev polynomials for matrices in the ITE and RITE respectively. In Section 2.2 we attempt to give an intuitive explanation of the random walk approach, including Theorem 2.3 (due to [33][34][35]) regarding the multidimensional exchangeable pairs approach to Stein's method, and briefly outline the methods used to evaluate the appropriate remainder terms.
In Section 3 we introduce some graph theoretical tools required for subsequent analysis. Sections 4 and 5 are dedicated to showing how to construct suitable random walks for the ITE and RITE respectively. Specifically, we prove Propositions 4.1 and 5.1 (respectively) which show the remainders contained in Theorem 2.3 are small enough to allow for the results of Theorem 2.1 and Theorem 2.2. In particular, although interesting in its own right, the ITE will serve as an illustrative example that the approach works in simple settings and will help introduce ideas also needed for the more complicated RITE.
Finally, in Section 6 we offer some concluding thoughts and remarks about possible extensions and in the appendix we collect some necessary theorems, proofs and identities. In particular, Appendix B contains a proof for the growth rate of expectation of products of matrix elements in the RITE. This is adapted from the work of McKay [7] on the number of regular tournaments and is critical in estimating the remainders in Proposition 5.1.

Definitions and results
A tournament graph on N vertices is a complete graph in which every edge has a specific orientation (see e.g. Figure 1). Player p is said to win against player q (equivalently player q loses against player p) if there is a directed edge from p to q. This is represented by an adjacency matrix A admitting the property that $A_{pq} = 1 - A_{qp} = 1$ (resp. 0) if player p wins (resp. loses). Since a player cannot play themselves, the diagonal entries satisfy $A_{pp} = 0$. We denote the set of tournaments on N vertices as $\mathcal{T}_N$, with cardinality $|\mathcal{T}_N| = 2^{N(N-1)/2}$ - the number of possible choices of direction for each edge.
If all players win the same number of games, or equivalently the number of incoming edges at each vertex is equal to the number of outgoing edges, then the tournament graph is said to be regular (see e.g. Figure 2). This is characterised by the condition $\sum_q A_{pq} = (N-1)/2$ for all $p = 1, \ldots, N$, which enforces that $N$ must be odd. We denote the set of regular tournaments on $N$ vertices by $\mathcal{R}_N$. An exact formula for $|\mathcal{R}_N|$ is not available; however, McKay [7] (improving on an earlier estimate of Spencer [6]) obtained an asymptotic formula for large $N$. In particular, one observes that $|\mathcal{R}_N|/|\mathcal{T}_N| \to 0$ as $N \to \infty$, and therefore one cannot immediately infer properties of regular tournaments from tournaments by ergodicity arguments. Hence the restriction on the row sums must be dealt with in another manner.

Due to the non-symmetric nature of the adjacency matrices the eigenvalues are, in general, complex. However, applying the simple transformation $H = i(2A - (E_N - I_N))$ (where $i = \sqrt{-1}$ and $E_N$ is the $N \times N$ matrix in which every element is 1) brings the matrices into self-adjoint form. Thus $A_{pq} = 0$ (resp. 1) corresponds to $H_{pq} = +i$ (resp. $-i$) for all off-diagonal elements $p \neq q$, and $H_{pp} = 0$ for all $p = 1, \ldots, N$. Importantly, this means complex conjugation yields $\overline{H} = -H$, which in turn implies that if $\lambda$ is an eigenvalue of $H$ then so is $-\lambda$, with the eigenvectors being complex conjugates of each other; the spectrum is therefore symmetric about zero. In order to make a distinction we say that $H$ is an imaginary tournament matrix (resp. regular imaginary tournament matrix) if $A$ is a tournament (resp. regular tournament). Therefore, with a slight abuse of notation, we write either $H \in \mathcal{T}_N$ or $H \in \mathcal{R}_N$ respectively.

Definition 1 (Imaginary tournament ensemble). Let $\mathcal{T}_N$ be the set of imaginary tournament matrices of size $N$. Then the imaginary tournament ensemble (ITE) is given by the set of $H \in \mathcal{T}_N$ with the uniform probability measure $P(H) = |\mathcal{T}_N|^{-1}$.
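To make the transformation concrete, the sketch below (our own illustration; the function names are ours) builds a uniform random tournament adjacency matrix $A$, applies $H = i(2A - (E_N - I_N))$, and confirms that $H$ is self-adjoint with a spectrum symmetric about zero:

```python
import numpy as np

def random_tournament(N, rng):
    """Uniform random tournament: A_pq = 1 - A_qp in {0,1}, A_pp = 0."""
    wins = rng.integers(0, 2, size=(N, N))
    A = np.triu(wins, k=1)              # orient each upper-triangular edge
    A = A + np.triu(1 - wins, k=1).T    # losing direction on the transpose
    return A

N = 7
rng = np.random.default_rng(1)
A = random_tournament(N, rng)
E = np.ones((N, N))
I = np.eye(N)
H = 1j * (2 * A - (E - I))              # self-adjoint form

assert np.allclose(H, H.conj().T)       # Hermitian
lam = np.sort(np.linalg.eigvalsh(H))
assert np.allclose(lam, -lam[::-1])     # spectrum symmetric about 0
```

Since $2A - (E_N - I_N)$ is real antisymmetric, its eigenvalues come in $\pm i\mu$ pairs, so those of $H$ come in real $\pm\mu$ pairs (with a zero eigenvalue when $N$ is odd).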
Definition 2 (Regular imaginary tournament ensemble). Let $\mathcal{R}_N$ be the set of regular imaginary tournament matrices of size $N$ (with $N$ being odd). Then the regular imaginary tournament ensemble (RITE) is given by the set of $H \in \mathcal{R}_N$ with the uniform probability measure $P(H) = |\mathcal{R}_N|^{-1}$.
Note that Definition 1 is equivalent to choosing the entries $H_{pq}$ equal to $\pm i$ independently and with equal probability, whereas Definition 2 is equivalent to choosing $H_{pq}$ equal to $\pm i$ with equal probability but with the constraint that $\sum_q H_{pq} = 0$ for all $p = 1, \ldots, N$.
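The constraint in Definition 2 is easy to check numerically. The sketch below (our own construction, not from the article) uses the standard rotational tournament, in which $p$ beats $q$ whenever $(q - p) \bmod N \in \{1, \ldots, (N-1)/2\}$, to produce one member of $\mathcal{R}_N$ and verifies the row-sum condition $\sum_q H_{pq} = 0$:

```python
import numpy as np

def rotational_regular_tournament(N):
    """A regular tournament on odd N: p beats q iff (q - p) mod N lies in
    {1, ..., (N-1)/2}, so every vertex wins exactly (N-1)/2 games."""
    assert N % 2 == 1
    diff = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N
    return ((diff >= 1) & (diff <= (N - 1) // 2)).astype(int)

N = 9
A = rotational_regular_tournament(N)
H = 1j * (2 * A - (np.ones((N, N)) - np.eye(N)))

assert (A.sum(axis=1) == (N - 1) // 2).all()   # regularity of the tournament
assert np.allclose(H.sum(axis=1), 0)           # row-sum constraint of the RITE
```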
Due to the independence of the elements in the ITE, many of the techniques developed to treat Wigner matrices are directly applicable, for example the universality of local statistics has been established in this case [13]. Moreover, since H is related to A by a (complex) rank one perturbation, the spectral properties of the ITE can be related to the complex eigenvalues of random tournaments [13]. However, to the best of our knowledge, there are no such results for the RITE, although linear statistics [37,38], local semicircle estimates [39,40] and local universality results [41] have been obtained for random regular graphs using switching methods.
Theorem 2.1 (Convergence for ITE). Let $Z = (Z_2, Z_3, \ldots, Z_k)$ be a collection of independent Gaussian random variables with mean 0 and variance $\sigma_n^2 = \mathbb{E}[Z_n^2] = n$. Let $H$ be chosen according to the ITE and define the random variables $Y_n(H)$, $n = 2, \ldots, k$, as in (2.3). Then $Y = (Y_2, \ldots, Y_k)$ converges in distribution to $Z$ as $N \to \infty$, at an explicit rate.

Outline of ideas and methods
In order to prove Theorems 2.1 and 2.2 we introduce random walks within $\mathcal{T}_N$ and $\mathcal{R}_N$ with two properties. Firstly, the stationary distributions correspond to $P(H) = |\mathcal{T}_N|^{-1}$ and $P(H) = |\mathcal{R}_N|^{-1}$, as per Definitions 1 and 2 respectively. Secondly, the induced motion of the random variable $Y(H)$ will be closely described by a process whose stationary distribution is given by $Z = (Z_2, Z_3, \ldots, Z_k)$, as in Theorems 2.1 and 2.2. More precisely, suppose that at some discrete time $t \in \mathbb{N}$ our random walker is situated at the matrix $H$; then we have a transition probability $\rho(H \to H')$ for the walker to be at the matrix $H'$ at time $t+1$. From this one may track how the corresponding variable $Y_n(H)$ changes to $Y_n(H')$. For instance, since this is a Markov process, the expected change is given by (2.6). Similarly, fluctuations are obtained by calculating the second moment (2.7).

Now suppose that, by designing our random walk correctly, we observe the moments take the forms (2.8) and (2.9), where $\alpha_N$ is a certain constant depending only on $N$ and $R_n(H)$, $R_{nm}(H)$ are small remainders (the nature of "small" will be clarified later). Then, for arbitrary test functions $f \in C^3(\mathbb{R}^{k-1})$, expanding $f(Y(H')) = f(Y(H) + \delta Y(H, H'))$ in a Taylor series gives (2.10), with remainder $S_f(H, H')$ and operator $A$ given by (2.11). If the Markov process is started from a unique stationary state, then the distributions of $H$ and $H'$ will be the same, in which case the random variables are referred to as an exchangeable pair. Moreover, the expected change in $f$ satisfies (2.12). In particular, $\mathbb{E}[Af(Z)] = 0$ for all $f \in C^2(\mathbb{R}^k)$ if and only if $Z$ has the Gaussian stationary distribution.

Proof. One should consult e.g. Lemma 1 in [35] for details. Briefly: using the stationarity of the solution with respect to $A^*$ (see Equation (A.3)), integration by parts yields $\mathbb{E}[Af(Z)] := \int \mathrm{d}Z\, P(Z) Af(Z) = \int \mathrm{d}Z\, f(Z) A^* P(Z) = 0$ for any $f \in C^2(\mathbb{R}^k)$, and thus establishes the first implication. For the converse one requires the exact form of the solution to equation (2.12) presented in Proposition A.1 in Appendix A.
Of course the remainder will not, in general, be zero, but one might expect that if it is close to zero (in some appropriate manner) then the corresponding variable $Y(H)$ will be close to $Z$. Stein's realisation was that $A$ and $f$ could be connected via an auxiliary test function $\phi$ in what is now known as Stein's equation, $Af_\phi(x) = \phi(x) - \mathbb{E}[\phi(Z)]$. The aim is therefore to find a bound for $|\mathbb{E}[Af(Y)]|$ using the function $\phi$, as this will allow for an estimate on the distributional distance between $Y$ and $Z$. This idea was initially developed by Charles Stein as an alternative method for proving the classical CLT [43]. Stein's method now refers to the overall technique of recovering the distributional distance by bounding the quantity $\mathbb{E}[Af(Y)]$.
For readers unfamiliar with the basics of Stein's method, the review by Ross [44] provides an excellent introduction and overview of the different ways this may be achieved. The work of Götze [45] and Barbour [46] in the early 90s allowed for an extension of Stein's method to multivariate Gaussian distributions and established an explicit connection between Stein's method and Markov processes. Using these ideas a number of authors adapted the exchangeable pairs mechanism to multivariate Gaussian distributions [33][34][35] (the thesis of Döbler offers an excellent overview of this [47]), from which the following theorem is obtained.

Theorem 2.3. Suppose $(Y, Y')$ is an exchangeable pair whose moments satisfy conditions of the form (2.8) and (2.9). Then for all $\phi \in C^2(\mathbb{R}^k)$ we have the bound (2.16), where $\nabla^j \phi$ is given in (2.4), the $c_j$ are fixed positive constants and $R^{(j)} = \sum_{n_1, \ldots, n_j} \mathbb{E}|R_{n_1 \ldots n_j}(M)|$.
Proof. Theorem 2.3 is a specific form of Theorem 3 in [35], except that we have decided to use the alternative quantities ∇ k φ as bounds since these are easier to state and make minimal difference in the outcome of our main results. We have therefore decided to include the proof of Theorem 2.3 in Appendix A for completeness and to aid the understanding of the interested reader, even though, beyond minimal adjustments, there is nothing new.
Remark 1. As was first noted by Götze [45] and Barbour [46], the operator A is the generator for a specific multi-dimensional Ornstein-Uhlenbeck (OU) process. Thus, in essence, Theorem 2.3 is stating that if the random walk is close (i.e. the remainders R n , R nm etc and the constant α N go to 0 in the limit of large N ) to that of the associated OU-process then the corresponding stationary distributions will also be close -in the distributional sense of (2.16). This association is described in more detail in Section 4 of [36].
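To illustrate Remark 1 concretely, here is a minimal Euler-Maruyama sketch (our own; the step size, horizon and the choice $dX = -nX\,dt + n\sqrt{2}\,dW$ are ours) of a one-dimensional OU process whose stationary variance is $(n\sqrt{2})^2/(2n) = n$, matching $\sigma_n^2 = n$ in the theorems:

```python
import numpy as np

def ou_sample(n, steps=60000, dt=1e-3, rng=None):
    """Euler-Maruyama for dX = -n X dt + n*sqrt(2) dW.  The stationary
    law is Gaussian with variance (n*sqrt(2))^2 / (2n) = n."""
    rng = rng or np.random.default_rng(0)
    x = 0.0
    burn = steps // 2          # discard the transient before sampling
    samples = []
    for t in range(steps):
        x += -n * x * dt + n * np.sqrt(2.0) * np.sqrt(dt) * rng.standard_normal()
        if t >= burn:
            samples.append(x)
    return np.array(samples)

xs = ou_sample(n=3)
print(np.var(xs))   # close to sigma_3^2 = 3
```

The drift rate $n$ also sets the relaxation time $1/n$, which is the time rescaling discussed in Remark 2.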

Remark 2.
In principle one could remove the factor of $n$ present in (A.3) and achieve the same stationary distribution, but it will transpire that the evolution of our observables $Y_n(H)$, given in (2.3), can only be analysed if it is included. This is because this factor corresponds to rescaling the time $t \to nt$, which is independent of the random variable in question. Thus, in general, the linear statistic $\Phi_h(H)$ will not evolve according to a single one-dimensional OU process, but rather as a linear combination of independent one-dimensional OU processes evolving at different rates.
The novel aspect of our work concerns the evaluation of the remainders $R_n(H)$, $R_{nm}(H)$ and $R_{nml}(H)$. For comparison, the CLT results in [30,32,47], whilst slightly stronger, heavily utilise Dyson Brownian motion, which affords a closed-form expression for the evolution of the spectrum. In other words, the remainders are functions of the eigenvalues, i.e. $R_n(H) = R_n(\lambda_1(H), \ldots, \lambda_N(H))$ etc. However, since our ensembles are not invariant under, say, unitary or orthogonal transformations, we do not have this luxury. We therefore use alternative combinatorial methods to obtain estimates in terms of the matrix dimension $N$.
The starting point of these methods comes from a generalised form of the Bartholdi identity, developed in [48] to obtain a trace formula for the eigenvalues of (magnetic) regular graphs. This allows us to relate the centred Chebyshev polynomials $Y_n(H)$ to sums of products of matrix elements, like $H_{p_1 p_2} H_{p_2 p_3} \cdots$, associated to non-backtracking cycles (see Section 3). The change of such products under the appropriate random walks leads to remainder terms comprised, again, of certain classes of matrix products. Estimating the remainders consists of bounding the expectations of these quantities with respect to either the ITE or the RITE. Here is where the combinatorial aspects arise, since, just as was first done by Wigner in showing convergence to the semicircle distribution [14,15], one must evaluate the contributions arising from certain walks.
For the ITE the estimates are relatively straightforward because the matrix elements are independent. This means the contributions from many cycles are precisely zero, and those cycles that remain give contributions tending to 0 in the large $N$ limit. For the RITE, however, a more complicated random walk leads, inevitably, to more complicated expressions for the remainder terms. Moreover, the lack of independence means the expectations of matrix products that were identically zero for the ITE are no longer so for the RITE. A key part of our analysis is therefore showing the correlations are small enough that the expectations go to zero sufficiently fast in $N$ (see Lemma 5.2). This is achieved by adapting McKay's methods [7] for counting regular tournaments. Specifically, we transform the expectation of matrix products into a multi-dimensional integral, which is shown to be of a certain order in $1/N$.

Graph theoretical tools
Before proceeding to our random walks we first introduce some necessary terminology and simple results. A graph G consists of a set of vertices V (G) and edges E(G) connecting these vertices. G is said to be simple if every pair of vertices is connected by at most one edge and there are no vertices connected to themselves. G is also said to be complete if every pair of vertices has precisely one edge connecting them.
A walk $\omega$ of length $n$ on a graph $G$ is an ordered sequence of vertices $\omega = (p_0, p_1, \ldots, p_{n-1}, p_n)$ such that $p_{i+1} \neq p_i$ and all pairs $(p_i, p_{i+1}) \in E(G)$, $i = 0, \ldots, n-1$, are edges of the graph. If $p_{i+2} = p_i$ for some $i = 0, \ldots, n-2$ then the walk is said to be backtracking; otherwise $\omega$ is non-backtracking. A walk is also a cycle (of length $n$) if the first and last vertices are the same, i.e. $p_0 = p_n$. Note that, in the present article, cycles will be distinguished by the starting vertex, so for example $\omega = (1, 2, 3, 4, 1) \neq (2, 3, 4, 1, 2) = \omega'$. Again, the cycle is backtracking if there exists some $i$ such that $p_{i+2 \,(\mathrm{mod}\, n)} = p_i$ and non-backtracking otherwise.
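These definitions are simple to encode. The following helper (our own, purely illustrative) tests whether a cycle, given as a vertex sequence with $p_0 = p_n$, is non-backtracking in the cyclic sense just described:

```python
def is_nonbacktracking_cycle(omega):
    """omega = (p_0, ..., p_{n-1}, p_0): check p_{i+2 mod n} != p_i for
    all i, treating the walk cyclically as in the text."""
    assert omega[0] == omega[-1], "input must be a cycle"
    inner = omega[:-1]          # drop the repeated final vertex
    n = len(inner)
    return all(inner[(i + 2) % n] != inner[i] for i in range(n))

print(is_nonbacktracking_cycle((1, 2, 3, 4, 1)))   # True
print(is_nonbacktracking_cycle((1, 2, 1, 2, 1)))   # False: backtracks
```

Note that the cyclic condition correctly classifies a triangle $(1, 2, 3, 1)$ as non-backtracking.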
Proof. Let $C$ denote the number of connected components of $G'$. We can create a new graph $\tilde{G} \subseteq G$ by adding a minimal number of edges to $G'$ such that $\tilde{G}$ is connected. Here $\beta(\tilde{G}) = |E(\tilde{G})| - |V(\tilde{G})| + 1$ is the first Betti number of $\tilde{G}$, which counts the number of fundamental cycles. However, since $\tilde{G}$ is a subgraph of $G$ it cannot have more fundamental cycles than $G$, and so $\beta(\tilde{G}) \leq \beta(G)$. The condition $|V(G')| \geq 1$ ensures that $C \geq 1$, which completes the result.
Let $G'_i$, $i = 1, \ldots, C$, denote the subgraphs of the connected components and $\beta_i$ the corresponding Betti numbers. Proof. By construction, summing the contributions of the components gives the stated bound. Let us suppose first that $G'$ is connected (i.e. $C = 1$) and $|V_\omega \cap V_{\omega'}| \geq 1$; then the result follows from Lemma 3.1. It thus remains to check the case $|V_\omega \cap V_{\omega'}| = 0$. In this case $|F_\omega \cap F_{\omega'}| = 0$, and the bound follows directly. Extending this to $C$ connected components completes the result.
For our imaginary tournament matrices there is an intimate connection between the traces of Chebyshev polynomials (see Equation (1.4)) and the sets of non-backtracking cycles. This is given by the following lemma.
Lemma 3.2. Let $M$ be a self-adjoint $N \times N$ matrix with $|M_{pq}| = 1$ for all $p \neq q$ and $M_{pp} = 0$ for all $p$. Then the trace of the $n$-th Chebyshev polynomial of $M$ may be expressed as a sum of the products $M_\omega$ over $\omega \in \Omega_{2n}$, where $\Omega_{2n}$ denotes the set of non-backtracking cycles of length $2n$ and $M_\omega$ is given in (3.1).
Proof. We are aware of two related methods for proving the validity of this statement that we shall not recount here. The first approach is to make a generalisation of the so-called Bartholdi identity (see e.g. [36,48]) that relates the spectrum of M to another matrix associated to non-backtracking walks in the edge space. This connection is applicable since M can be considered as a magnetic adjacency matrix of a complete graph on N vertices. The second approach is based upon showing that polynomials associated to non-backtracking walks obey the same recursion relations as the Chebyshev polynomials (see e.g. [49] and references therein).
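While we do not reproduce either proof, the trace of a Chebyshev polynomial of a matrix is itself easy to compute via the three-term recursion $T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)$ applied to matrices. The sketch below (our own sanity check, with an arbitrary Hermitian test matrix rather than a tournament matrix) confirms that the matrix recursion agrees with summing $T_n$ over the eigenvalues:

```python
import numpy as np

def chebyshev_T(n, M):
    """Chebyshev polynomial of the first kind of a square matrix, via
    T_0 = I, T_1 = M, T_{k+1} = 2 M T_k - T_{k-1}."""
    I = np.eye(M.shape[0], dtype=M.dtype)
    if n == 0:
        return I
    Tkm1, Tk = I, M
    for _ in range(n - 1):
        Tkm1, Tk = Tk, 2 * M @ Tk - Tkm1
    return Tk

def chebyshev_scalar(n, x):
    """Same recursion applied entrywise to an array of scalars."""
    if n == 0:
        return np.ones_like(x)
    tkm1, tk = np.ones_like(x), x
    for _ in range(n - 1):
        tkm1, tk = tk, 2 * x * tk - tkm1
    return tk

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 6))
M = (X + X.T) / 2                      # Hermitian test matrix
lam = np.linalg.eigvalsh(M)

for n in range(1, 6):
    assert np.isclose(np.trace(chebyshev_T(n, M)).real,
                      chebyshev_scalar(n, lam).sum())
```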

Imaginary tournament ensemble
We now construct the random walk process in $\mathcal{T}_N$. Many of the intricate details of this walk are discussed in [5] and so we keep to the essential points. Suppose that at time $t \in \mathbb{N}$ we have a matrix $H \in \mathcal{T}_N$; then at time $t+1$ we randomly choose another matrix $H' \in \mathcal{T}_N$ by selecting with equal probability one of the upper triangular elements of $H$ (say $H_{pq}$ with $p < q$) and, together with its symmetric partner $H_{qp}$, changing its sign, $H_{pq} \to -H_{pq}$. We write $\delta H^{(pq)}$ to denote the $N \times N$ rank 2 difference matrix obtained as a result of performing this change of sign; here $e_p$ is the column vector with a 1 in entry $p$ and 0 everywhere else. This switch corresponds to changing the direction of an edge (see Figure 1) in the associated tournament graph, as described in Section 2.1.

Figure 1: The Markov process consists of choosing an edge $(p, q)$ uniformly at random in the tournament graph (a) and then switching the orientation to obtain the tournament graph in (b). In this example the $(p, q)$-th element of the associated adjacency matrix changes accordingly.

Interpreting this in terms of a random walk, we say that if the walker is at $H$ at time $t$ then in each unit time step the walker may move, with equal probability, to any matrix $H' \in \mathcal{T}_N$ which is exactly a Hamming distance 2 away (i.e. differs in exactly one pair of symmetric entries), giving the transition probability $\rho(H \to H')$. If $P_t(H)$ is the probability for the random walker to be at matrix $H$ at time $t$, then the probability to be at some other matrix $H' \in \mathcal{T}_N$ at time $t+1$ follows from the Markov property. One may then verify easily that $P_t(H') = |\mathcal{T}_N|^{-1}$ (the measure of the ITE in Definition 1) is the stationary distribution of this process, since $\#\{H' : |H' - H| = 1\} = d_N$. In this instance the random matrices $H$ and $H'$ have the same distribution and are thus an exchangeable pair. The expected change of some observable $f(H)$ with respect to this random walk is hence given by (4.3). Similarly, higher moments are obtained by taking the expectation of products of changes, i.e. (4.4).
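One step of this walk is a simple operation on the matrix. The sketch below (our own illustration; `sample_ite` and `ite_walk_step` are hypothetical helper names) flips a uniformly chosen symmetric pair of entries and checks that the result is again an imaginary tournament matrix differing from $H$ in exactly the two entries $(p, q)$ and $(q, p)$:

```python
import numpy as np

def sample_ite(N, rng):
    """Independent purely imaginary Bernoulli matrix: H_pq = conj(H_qp) = +/- i."""
    signs = rng.choice([-1.0, 1.0], size=(N, N))
    upper = np.triu(signs, k=1)
    return 1j * (upper - upper.T)

def ite_walk_step(H, rng):
    """Flip the sign of one uniformly chosen pair H_pq, H_qp (p != q)."""
    N = H.shape[0]
    p, q = rng.choice(N, size=2, replace=False)
    Hp = H.copy()
    Hp[p, q] *= -1
    Hp[q, p] *= -1
    return Hp

rng = np.random.default_rng(3)
H = sample_ite(10, rng)
Hp = ite_walk_step(H, rng)

assert np.allclose(Hp, Hp.conj().T)          # still self-adjoint
assert np.count_nonzero(Hp != H) == 2        # Hamming distance 2 in entries
```

Since each of the $N(N-1)/2$ edge flips is chosen uniformly and flips are involutions, the uniform measure on $\mathcal{T}_N$ is clearly stationary for this chain.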
We are now in a position to state how the observables $Y_n(H)$, given in (2.3), behave under this random walk.
Proof. The proofs for Parts (a), (b) and (c) will be presented in Sections 4.1, 4.2 and 4.3 respectively.

Proof of Proposition 4.1 Part (a) - Drift term
Inserting the form (4.7) for $\delta Y_n^{pq}$ into the expression (4.3) for the expected change of an observable undergoing this random walk leads to the drift term. Using the expression (4.5) for $Y_n(H)$ therefore gives the remainder, written as a sum over $\overline{\Lambda}_{2n}$, the set of non-backtracking cycles in $\Lambda_{2n}$ in which all edges are traversed exactly once. We also write $\Lambda^{\bullet}_{2n} = \Lambda_{2n} \setminus \overline{\Lambda}_{2n}$ for the set of non-backtracking cycles in which at least one edge is traversed more than once. Importantly, for all $\omega \in \overline{\Lambda}_{2n}$ we have $\sum_{p<q} \phi_{\omega,pq} = 2n$ (4.10). Therefore the sum over $\omega \in \Lambda_{2n}$ in (4.9) can be reduced to the lesser sum over $\Lambda^{\bullet}_{2n}$. As outlined in Section 3, let us write $[\omega]$ for the equivalence class of vertex labelings of non-backtracking cycles and $[\Lambda^{\bullet}_{2n}]$ for the set of such equivalence classes in $\Lambda^{\bullet}_{2n}$. Given that $\sum_{p<q} \phi_{\omega,pq}$ is the same for all $\omega \in [\omega]$, we may work with $H_{\omega,\omega'} := H_\omega H_{\omega'}$, as in (3.3). Since $\omega \in \Lambda^{\bullet}_{2n}$ there must be at least one edge that is traversed more than once. The contribution from the term inside the square root is thus obtained by labelling the independent vertices in $V_{\omega,\omega'}$, exactly as done by Wigner [14]. Up to a constant, this is controlled by $|[\Lambda^{\bullet}_{2n}]|$, the number of unlabelled non-backtracking cycles $\omega \in \Lambda^{\bullet}_{2n}$; since the labelling has been removed, this quantity is independent of $N$.

Proof of Proposition 4.1 Part (b) -Diffusion term
Similar to the proof of the drift term, we start by inserting the form (4.7) for $\delta Y_n^{pq}$ into the expression (4.4) for the expected change of multiple observables, leading to the diffusion term (4.12). We estimate the cases $n = m$ and $n \neq m$ separately. For the former case let us take $\overline{\Lambda}_{2n}$ as in Section 4.1. If $\omega$ is a single loop, then for a fixed $\omega_1$ there are $4n$ possible $\omega_2$ such that $\omega_1 \cong \omega_2$, obtained by choosing the $2n$ possible starting vertices of the cycle and the 2 possible orientations. Labelling the independent vertices of $\omega_1$ leads to a contribution involving $\alpha_{\omega_1,\omega_2} := \sum_{p<q} \phi_{\omega_1,pq} \phi_{\omega_2,pq}$. Using that $\alpha_{\omega_1,\omega_2}$ is the same for all $\omega_1, \omega_2 \in [\omega_1, \omega_2]$, we obtain the leading behaviour. Since we want to maximise the number of vertices, the main contribution will come from cycles $\omega_1, \omega_2 \in \overline{\Lambda}_{2n}$ in which $|V_{\omega_1}| = |V_{\omega_2}| = 2n$ (i.e. all vertices are distinct). However, $\omega_1$ and $\omega_2$ must share at least one edge (otherwise $\alpha_{\omega_1,\omega_2} = 0$). The resulting graph is therefore connected and will contain edges that are traversed only once; the remaining vertices form a single loop connected by the edges in $F_{\omega_1,\omega_2}$. The number of equivalence classes is independent of $N$, and also $\alpha_{\omega_1,\omega_2} = O(1)$, since it is equal to, at most, the number of shared edges of $\omega_1$ and $\omega_2$. It thus remains to evaluate $\mathbb{E}|R_{nm}(H)|$ for $n \neq m$. In this instance we have, from (4.12), that the main contribution will again come from cycles $\omega_1$ and $\omega_2$ in which all vertices are distinct, i.e. $|V_{\omega_1}| = 2n$ and $|V_{\omega_2}| = 2m$. However, since $n \neq m$, $\omega_1$ and $\omega_2$ cannot share all the same edges. We have $\alpha_{\omega_1,\omega_2} > 0$ only if $\omega_1$ and $\omega_2$ share at least one edge, and therefore, for the same reasons as above, those contributing collections of cycles $(\omega_1, \omega_2, \omega'_1, \omega'_2)$ for which $\mathbb{E}[H_{\omega_1,\omega_2,\omega'_1,\omega'_2}]$ is non-zero satisfy the analogous bound on $|V_{\omega_1,\omega_2,\omega'_1,\omega'_2}|$.

Proof of Proposition 4.1 Part (c) -Remainder term
For the remainder term we again insert the expression (4.7) into (4.4), which gives a sum over the set of non-backtracking cycles that all traverse the edge $(p, q)$ an odd number of times. Taking the expectation over the ITE subsequently leads to the required bound. The main contribution to the above will again come from non-backtracking cycles in which all vertices are distinct ($|V_{\omega_1}| = |V_{\omega'_1}| = 2n$ etc.), as this maximises the number of vertices. In this case all the cycles $\omega_i, \omega'_i$, $i = 1, 2, 3$, must traverse the edge $(p, q)$ precisely once. The expectation $\mathbb{E}[H_{\omega_1,\omega_2,\omega_3,\omega'_1,\omega'_2,\omega'_3}]$ is only non-zero when every edge in $E_{\omega_1,\omega_2,\omega_3,\omega'_1,\omega'_2,\omega'_3}$ is traversed an even number of times by $(\omega_1, \omega_2, \omega_3, \omega'_1, \omega'_2, \omega'_3)$. Therefore the number of vertices will be maximised when every edge (other than $(p, q)$) is traversed precisely twice, in which case $|V_{\omega_1,\omega_2,\omega_3,\omega'_1,\omega'_2,\omega'_3}| = 2n + 2m + 2l - 4$. However, the two vertices $p$ and $q$ are fixed, so when obtaining the contribution inside the square root above by labelling the vertices, $p$ and $q$ contribute no additional factors of $N$.

Regular imaginary tournament ensemble
In a similar manner to the previous section we shall introduce a random walk within $\mathcal{R}_N$, which in turn induces a random walk in the variables $Y_n(H)$. Obviously this must be different to that of the ITE in the previous section, for if we simply change the sign of one element of $H$ then we no longer have $\sum_q H_{pq} = 0$ for all $p$, and therefore the new matrix $H' \notin \mathcal{R}_N$. To remedy this situation we use a random walk that has already been investigated in the literature [50]. To describe this Markov process we first note that every regular tournament on $N$ vertices contains directed cycles $q = (q_0, q_1, q_2, q_0)$ of length 3, i.e. $H_{q_0 q_1} = H_{q_1 q_2} = H_{q_2 q_0}$ (see e.g. Figure 2(a)). We shall refer to such directed cycles as triangles, of which there are precisely $d_N$, given in (5.1), in every regular tournament. Note that we distinguish labelled triangles, so $(1, 2, 3, 1) \neq (2, 3, 1, 2)$ for example.
Proof of (5.1). Let us introduce the following indicator function $\Theta_q(H)$, which satisfies (5.3). Summing over $q$ and using that $H_{pq}^2 = -1$ and $\sum_{r: r \neq p,q} H_{qr} = -H_{qp}$ gives the result.

The random walk is performed by choosing one of these $d_N$ triangles $q$ uniformly at random and then reversing the orientation, i.e. $H_{q_0 q_1}, H_{q_1 q_2}, H_{q_2 q_0} \to -H_{q_0 q_1}, -H_{q_1 q_2}, -H_{q_2 q_0}$ (see Figure 2). This guarantees the new matrix $H' = H + \delta H^{(q)}$ is contained in $\mathcal{R}_N$ as it satisfies $\sum_q H'_{pq} = 0$ for all $p$. The difference matrix is given explicitly by (5.5). We may summarise this random walk in the transition probability $\rho(H \to H')$, where $|H - H'|_{\mathcal{R}_N} = \frac{1}{6} \sum_{p,q} |H_{pq} - H'_{pq}|$ is equal to 1 if and only if $H, H' \in \mathcal{R}_N$ differ by the reversal of exactly one triangle. Starting at any tournament $H \in \mathcal{R}_N$, one may reach any other tournament $H' \in \mathcal{R}_N$ by performing successive reversals. Moreover, this Markov process is known to be mixing [50].
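This triangle-reversal step can be checked directly. The sketch below (our own code; the rotational tournament is merely a convenient member of $\mathcal{R}_N$, and the function names are ours) enumerates directed triangles, reverses one chosen at random, and verifies that every row sum of the new matrix is still zero:

```python
import numpy as np
from itertools import permutations

def rotational_rite(N):
    """One member of R_N: H built from the rotational regular tournament."""
    diff = (np.arange(N)[None, :] - np.arange(N)[:, None]) % N
    A = ((diff >= 1) & (diff <= (N - 1) // 2)).astype(int)
    return 1j * (2 * A - (np.ones((N, N)) - np.eye(N)))

def directed_triangles(H):
    """All ordered triples (q0, q1, q2) with H_{q0q1} = H_{q1q2} = H_{q2q0}."""
    N = H.shape[0]
    return [t for t in permutations(range(N), 3)
            if H[t[0], t[1]] == H[t[1], t[2]] == H[t[2], t[0]]]

def reverse_triangle(H, t):
    """Flip the three entries of the triangle (and their symmetric partners)."""
    q0, q1, q2 = t
    Hp = H.copy()
    for a, b in ((q0, q1), (q1, q2), (q2, q0)):
        Hp[a, b] *= -1
        Hp[b, a] *= -1
    return Hp

rng = np.random.default_rng(4)
H = rotational_rite(7)
tris = directed_triangles(H)
Hp = reverse_triangle(H, tris[rng.integers(len(tris))])

assert np.allclose(Hp.sum(axis=1), 0)    # row sums preserved: H' stays in R_N
assert np.allclose(Hp, Hp.conj().T)      # still self-adjoint
```

The row sums are preserved because, within a triangle, each vertex has exactly one incoming and one outgoing edge, so the two flipped entries in each row cancel.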
If $P_t(H)$ is the probability of the random walker to be at $H$ at time $t$, then the evolution follows from the transition probability. Thus $P_t(H) = |\mathcal{R}_N|^{-1}$ implies that $P_{t+1}(H') = |\mathcal{R}_N|^{-1}$ also, i.e. $H$ and $H'$ are an exchangeable pair. Using the indicator function (5.3), the expected change of some observable under this random walk is therefore given by (5.7). Similarly, higher moments are obtained by taking the expectation of products of changes. Here we are again interested in the particular observables $Y_n(H)$ given in (2.3). Using the expression (3.8) for $Y_n(H)$ in Section 3 we find (5.9), with $\Omega_{2n}$ and $\Lambda_{2n}$ the same as in previous sections. Note, however, that in contrast to the analogous expression (4.5) for the ITE, the expectation term is not identically zero. This is precisely due to the global correlations enforced by demanding the row sums of $H$ are zero, and will require the use of Lemma 5.2 below to evaluate.
The following proposition describes how the Y n (H) behave under the aforementioned random walk. Before progressing to Section 5.1 we first outline two lemmas that are necessary for the proofs.
Lemma 5.1. We have the simplification (5.10), where the prime in the sum denotes that $q_0 \neq q_1 \neq q_2 \neq q_0$.
Proof. Starting with the expression (5.2) for $\Theta_q(H)$, we can remove the factor $(1 - \delta_{q_0 q_1} \delta_{q_1 q_2} \delta_{q_2 q_0})$ provided we assume that $q_0 \neq q_1 \neq q_2 \neq q_0$. Therefore, expanding in the same way as (5.4) and writing $\delta f_q := f(H + \delta H^{(q)}) - f(H)$ to condense notation, we find the claimed form, where in the last line we have cyclically permuted the indices $q_0$, $q_1$ and $q_2$.

The proof of Lemma 5.2 is given in Appendix B.
We stress the above lemma provides a key part in our subsequent analysis of the remainder terms in the motion of $Y_n(H)$. In contrast to Wigner ensembles, in which the elements are independent with mean 0, the RITE has global correlations between the matrix elements. However, Lemma 5.2 shows that whilst we do not have $\mathbb{E}[H_E] = \prod_{(p,q) \in E} \mathbb{E}[H_{pq}] = 0$, as in the Wigner case, the correlations for a fixed number of elements are sufficiently weak as to allow for convergence to universal behaviour in the large $N$ limit.
Thus $\chi_{e,q}$ is an indicator function equal to 1 if the edge $e = (p', q')$ coincides with one of the (undirected) edges $\{(q_0, q_1), (q_1, q_2), (q_2, q_0)\}$ and 0 otherwise. The corresponding change in the matrix product $H_\omega$, for the non-backtracking cycle $\omega = (p_0, p_1, \ldots, p_{2n-1}, p_0)$, can then be written in terms of $\phi_{\omega,q}$, an indicator function equal to 1 if the edges in the triangle $q$ are traversed an odd number of times by $\omega$ and 0 otherwise. The change in $Y_n(H)$ brought about by reversing the orientation of $q$ can therefore be expressed using (5.12) as (5.13).
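The triangle-reversal move can be sketched numerically: a minimal illustration (helper names are ours) that flips the orientation of a cyclically oriented triangle $q = (q_0, q_1, q_2)$ and confirms the move preserves the zero row sums, so the walk stays inside $R_N$.

```python
import numpy as np
from itertools import combinations

def flip_triangle(H, q0, q1, q2):
    """Reverse the orientation of the triangle (q0, q1, q2): negate the
    three entries on its edges together with their transposes."""
    H = H.copy()
    for a, b in [(q0, q1), (q1, q2), (q2, q0)]:
        H[a, b] *= -1
        H[b, a] *= -1
    return H

def is_cyclic(H, q0, q1, q2):
    """A triangle is cyclically oriented iff its edges all point the same
    way around the cycle, i.e. H_{q0 q1} = H_{q1 q2} = H_{q2 q0}."""
    return H[q0, q1] == H[q1, q2] == H[q2, q0]

# A small regular tournament matrix (rotational construction, N = 5).
N = 5
H = np.zeros((N, N), dtype=complex)
for p in range(N):
    for q in range(N):
        if p != q:
            H[p, q] = 1j if (q - p) % N <= (N - 1) // 2 else -1j

# Flipping any cyclically oriented triangle keeps H inside R_N.
for tri in combinations(range(N), 3):
    if is_cyclic(H, *tri):
        H2 = flip_triangle(H, *tri)
        assert np.allclose(H2.sum(axis=1), 0)   # row sums remain zero
        assert np.allclose(H2, H2.conj().T)     # still Hermitian
```

The cancellation in each row sum occurs precisely because, for a cyclic triangle, the two edges meeting at a vertex carry opposite entries; a transitive triangle would not have this property, which is why only cyclic triangles are reversed.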

Proof of Proposition Part (a): Drift term
Starting from the expression (5.7) for the expected change of an observable, inserting the expression (5.13) and utilising Lemma 5.1, we obtain the drift term together with a remainder. Now, crucially, by splitting the sum over $\Lambda_{2n}$ into $\overline{\Lambda}_{2n} = \{\omega \in \Lambda_{2n} : |F_\omega| = 2n\}$ and $\Lambda^{\circ}_{2n} = \Lambda_{2n} \setminus \overline{\Lambda}_{2n}$ (see Section 4), the constant expectation term in the above can be expressed, subject to a subleading correction in $N$, in the following alternative manner. To see why this is the case, first note that Lemma 5.2 implies each cycle $\omega$ contributes at most $O(N^{|V_\omega| - |F_\omega|/2})$ (see Section 3 for the definition of $F_\omega$ -- the set of free edges), with the contribution of $O(N^{|V_\omega|})$ coming from the number of possibilities of labelling the vertices in $\omega$. Let us consider those $\omega$ in which every edge is traversed at most twice (all other cycles give a negligible contribution in comparison) and form the graph $\hat{G} = (V_\omega, E_\omega)$. Since $\omega$ is a cycle, the graph $\hat{G}$ is connected and satisfies $2|E_\omega| - |F_\omega| = 2n$, with first Betti number $\beta(\hat{G}) = |E_\omega| - |V_\omega| + 1$. Thus $|V_\omega| - |F_\omega|/2 = n + 1 - \beta(\hat{G})$. Now $\beta(\hat{G}) > 0$, otherwise $\omega$ would be backtracking. In addition, suppose $\beta(\hat{G}) = 1$; then $\hat{G}$ must be a loop (there can be no dangling edges since $\omega$ is non-backtracking), however this is only possible for walks $\omega$ in which $|F_\omega| = 2n$ or $|F_\omega| = 0$, which means $\omega \notin \Lambda^{\circ}_{2n}$. Hence $\beta(\hat{G}) \geq 2$ and therefore $|V_\omega| - |F_\omega|/2 \leq n - 1$, meaning the second term in (5.16) is of order $O(N^{n-1})$.
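For reference, the edge and vertex counting used above can be collected in one place (this simply restates the three relations from the proof):

```latex
% Counting for a non-backtracking cycle \omega of length 2n in which every
% edge is traversed at most twice, with \hat G = (V_\omega, E_\omega):
\begin{align*}
  2|E_\omega| - |F_\omega| &= 2n
     && \text{(free edges traversed once, all others twice)}\\
  \beta(\hat G) &= |E_\omega| - |V_\omega| + 1
     && \text{(first Betti number of the connected graph } \hat G\text{)}\\
  \Rightarrow\quad |V_\omega| - \tfrac{1}{2}|F_\omega| &= n + 1 - \beta(\hat G).
\end{align*}
```

Combined with $\beta(\hat G) \geq 2$ for $\omega \in \Lambda^{\circ}_{2n}$, this gives the $O(N^{n-1})$ bound directly.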

Proof of Lemma 5.3 Part (b). Let us define the following sets of walks
Then, splitting the sum over $\Lambda_{2n}$ in (5.23) and applying Proposition 5.2, the triangle inequality provides a bound. The result is then obtained by showing that all the terms within the square brackets are at most of order $O(N^n)$.
To proceed further, we note that a similar application of the regularity condition can be applied to the index $p_{2n-2}$. Applying the same method to $C_{2n-3}$ and so forth leads to a telescoped expression. Moreover, due to the regularity of $H$, the sum over $C_3$ is constant. Therefore, inserting (5.37) into (5.34), and then (5.34) into (5.33), leads to an expression with the constant $c_{n,N}$ given by the second term on the right-hand side of (5.37). Importantly, this constant is of order $O(N^n)$, which would lead to a larger bound in Lemma 5.3 Part (b); however, by subtracting the expectation of the same quantity, as in (5.23), this leading order is removed. Hence, inserting (5.38) into (5.23), the result is obtained once we show all terms involving expectations are at most $O(N^n)$. We start with $D_{2n-2j}$. In this case each walk $\omega$ has $|V_\omega| = |F_\omega| = 2n - 2j$ and therefore $|V_\omega| - |F_\omega|/2 = n - j$. The same bound holds for $B_{2n}$ and $\Lambda^{\dagger}_{2n}$ since they are both contained in $D_{2n}$. For walks in $A_{2n}$ we have $|V_\omega| \leq 2n + 1$ and $|F_\omega| = 2n + 2$, giving $|V_\omega| - |F_\omega|/2 \leq n$, and so the same result follows.
Finally, noting that there is a prefactor of order $O(N^{-n-m-1})$, we conclude that $\mathbb{E}|R_{nm}(H)| = O(N^{-1})$.
However, this time only two vertices of $q$ are contained in $V_{\omega_1,\omega_2,\omega_3}$, which reduces the contribution to (5.47). Returning to (5.46), and noting that the sum over $q$ gives a contribution of $O(N^3)$, we find $\mathbb{E}|R_{nml}(H)| = O(N^{-1})$, as desired.

Conclusions
We have used a combination of appropriate random walks and Stein's method to provide rates of convergence for the traces of random Bernoulli ensembles derived from both tournaments and regular tournaments. Specifically, we have shown that under this random walk the traces, in a basis of Chebyshev polynomials, behave like independent Ornstein-Uhlenbeck processes in the limit of large matrix size. Subsequently, this allows us to use the results of Chatterjee & Meckes [33], Reinert & Röllin [34] and Meckes [35], regarding the multivariate version of the exchangeable pairs mechanism for Stein's method, in order to obtain rates of convergence to an appropriate Gaussian distribution. In particular, we are able to obtain these results using combinatorial methods, closely related to previous calculations for showing distributional convergence, but without explicit rates (see e.g. [28,49]). Moreover, this approach only requires estimates involving third order moments to show distributional convergence.
We would like to finish with a couple of comments. Firstly, we note that in the bound for the distributional distance (2.5) of the RITE in Theorem 2.2, the first term is of order $O(N^{-1/2})$. This comes from a single set of walks, arising due to the regularity of the matrix $H$ (see the proof of Lemma 5.3 Part (b)). It is not clear whether this can be improved to $O(N^{-1})$ in order to match the corresponding result in Theorem 2.1 for the ITE. Secondly, we believe the results could be easily applied to other types of matrix ensembles, such as Wigner matrices or tournaments with different score sequences. For Wigner matrices the random walk would be very similar -- one may choose a matrix element at random and then resample it from the appropriate distribution. However, Lemma 3.2 is not immediately applicable and would therefore have to be amended, although results in this direction have already been achieved [51]. For tournaments with different score sequences, similar random walks to the RITE have already been analysed [50] and the number of such tournaments has been asymptotically estimated [8], expanding on the techniques developed by McKay for regular tournaments [7], which suggests a result akin to Lemma 5.2 would also be possible.
with a more complicated type of function bound, and so we keep to derivatives of the form $\nabla^j \phi$ for simplicity. This does not have any meaningful effect on our final result.
Proposition A.1 (Stein solution). Let $\mathcal{A}$ be the operator given in (2.11). Then the solution to Stein's equation (2.12) is given by the stated integral, where $P(X \to Z; t) = \prod_{n=2}^{k} P_n(X_n \to Z_n; t)$ describes the evolution in the corresponding one-dimensional OU process for a fixed initial position $X_n$. After a simple change of variables one obtains the second equality, where $Z = (Z_2, \ldots, Z_k)$, $Z_n \sim \mathcal{N}(0, n)$ are independent Gaussian random variables with $\mu(Z)$ the associated measure, and $\tilde{X} = (\tilde{X}_2, \ldots, \tilde{X}_k)$ with $\tilde{X}_n(X, Z; t) = X_n e^{-nt} + \sqrt{1 - e^{-2nt}}\, Z_n$.
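For concreteness, the one-dimensional OU transition density consistent with the substitution $\tilde{X}_n = X_n e^{-nt} + \sqrt{1 - e^{-2nt}}\, Z_n$, $Z_n \sim \mathcal{N}(0, n)$, reads

```latex
P_n(X_n \to Z_n;\, t)
  = \frac{1}{\sqrt{2\pi n \left(1 - e^{-2nt}\right)}}
    \exp\!\left( -\frac{\left(Z_n - X_n e^{-nt}\right)^2}
                       {2 n \left(1 - e^{-2nt}\right)} \right),
```

which relaxes to the stationary distribution $\mathcal{N}(0, n)$ as $t \to \infty$.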
Proof. Let us write $\mathcal{A} := \sum_{n=2}^{k} n L_n$; then the solution to the backward Fokker-Planck equation $\partial_t P(Z; t) = \mathcal{A} P(Z; t)$ is well known (see e.g. [52,53]) and given by (A.1). The result follows, with $\nabla^j f$ and $\nabla^j \phi$ defined in (2.4).
Proof. Writing $\mathrm{d}\mu(Z) = \mathrm{d}Z\, P(Z)$ and changing variables in the derivatives, where $n_i = 2, \ldots, k$, integration by parts may be performed on the $Z_{n_j}$ variable. Thus, using $\mathbb{E}[|Z_{n_j}|] = \sqrt{2 n_j / \pi}$ for $Z_{n_j} \sim \mathcal{N}(0, n_j)$, gives the stated bound. Therefore, expanding $f(Y')$ in a Taylor series about $Y$ and substituting the expressions (2.13) and (2.14), we obtain an expansion in which $S_f(M, M')$ is the integral form of the remainder obtained in Taylor's theorem. Finally, using Lemma A.1 we have $\|\nabla^j f\| \leq r_j \|\nabla^{j-1} \phi\|$ with explicit values for the $r_j$.
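The integration-by-parts step uses the standard Gaussian identities, which for $Z \sim \mathcal{N}(0, n_j)$ read

```latex
\int f(Z)\, Z \,\mathrm{d}\mu(Z) = n_j \int f'(Z)\,\mathrm{d}\mu(Z),
\qquad
\mathbb{E}\big[|Z|\big] = \sqrt{\frac{2 n_j}{\pi}}.
```

The first is the one-dimensional Stein identity for the Gaussian measure with variance $n_j$; the second is the mean absolute value of a centred Gaussian.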

B Expectations in the RITE
Proof of Lemma 5.2. In order to prove the lemma we use the ideas of McKay [7], who was originally interested in establishing the asymptotic number of regular tournaments. This was achieved via what he describes as a saddle-point argument, which we adapt here for our current purposes. The main idea is to rewrite the expectation $\mathbb{E}[H_E]$ in terms of a trigonometric integral (see Equation (B.4)), with $N$ angles $\theta_p$, one corresponding to each of the $N$ rows in the matrix $H$. Crucially, the integrand depends only on the differences $\theta_p - \theta_q$ of these angles and is maximised when all angles are equal. Therefore we show that the main contribution comes from the region where $\theta_p \approx \theta_q$ for all $p, q$, and that the remaining regions are negligible in the limit of large $N$.
To construct the appropriate integral expression we begin with the characteristic function $\chi_{R_N}(H)$. An analytical expression for $\chi_{R_N}(H)$ may be achieved via the Kronecker delta function. If we let $S_p = -\sum_q i H_{pq}$ be the row sums, then our matrix $H$ belongs to $R_N$ only if $S_p = 0$ for all $p$; here we have used that $H_{pq} = -H_{qp}$. We notice in the expressions above that, since $S_p$ is always even, the integrand is invariant under the shift $\theta_p \to \theta_p + \pi$ for any $p$. Summing over all possible matrices $H \in T_N$ and weighting by this characteristic function leads to an integral expression for the number $|R_N|$ of regular tournaments, evaluated by McKay [7]. Using the same approach we can evaluate the expectation in Lemma 5.2: using the characteristic function (B.1), the expectation (5.11) becomes the integral $I$, where $E^c = \{(p, q) : 1 \leq p < q \leq N\} \setminus E$.
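As a sanity check on the quantity $|R_N|$ (outside the paper's argument), one can count labelled regular tournaments directly for small $N$; the brute force below recovers the known small values $|R_3| = 2$ and $|R_5| = 24$. The function name is ours.

```python
from itertools import combinations, product

def count_regular_tournaments(N):
    """Count labelled regular tournaments on N (odd) vertices by checking
    every orientation of the complete graph K_N for constant
    out-degree (N - 1)/2, i.e. vanishing row sums S_p."""
    edges = list(combinations(range(N), 2))
    count = 0
    for orientation in product((0, 1), repeat=len(edges)):
        out = [0] * N
        for (p, q), d in zip(edges, orientation):
            out[p if d == 0 else q] += 1
        if all(o == (N - 1) // 2 for o in out):
            count += 1
    return count

assert count_regular_tournaments(3) == 2    # the two cyclic triangles
assert count_regular_tournaments(5) == 24   # known value of |R_5|
```

The enumeration is over all $2^{\binom{N}{2}}$ orientations, so it is only feasible for very small $N$; McKay's saddle-point analysis is what gives the large-$N$ asymptotics used in the text.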
To evaluate the integral $I$ we split the integration range into those parts which are dominant and subdominant. To this end let us define the following quantities:
• $A_s = [(s-4)\pi/8, (s-3)\pi/8]$, so in particular $[-\frac{\pi}{2}, \frac{\pi}{2}] = \bigcup_{s=0}^{7} A_s$.
• $n_j = n_j(s) = \#\{p : s_p = j\}$, which counts the number of angles $\theta_p$ in the segment $A_j$.
• $D^{(1)} = \{s : n_j + n_{j+1(8)} + n_{j+2(8)} = N \text{ for some } j\}$, with indices taken mod 8, and $D^{(2)}$ its complement.
We will show subsequently that $J^{(1)}$ provides the leading contribution, where $k = |E|$ is the number of edges in $E$, and that $|J^{(2)}|$ is negligible in comparison in the large-$N$ limit. Hence, inserting the expression for $|R_N|$, gives $\mathbb{E}_{R_N}[H_E] = O(N^{-k/2})$, as desired.
We begin by showing the result (B.6) for $J^{(1)}$, which provides the leading contribution. From the form of $D^{(1)}$ we see that all angles are contained in a range $[-\frac{\pi}{2}, \frac{\pi}{2}]$ up to translation, so we may employ the bounds (B.8), which are valid for $|x| \leq \frac{\pi}{2}$. Inserting these, employing the translation $\theta_p \to \theta_p - \theta_N$ for all $p = 1, \ldots, N-1$, integrating over the redundant $\theta_N$ and extending the integration range to the whole real line leads to a Gaussian integral over $\theta = (\theta_1, \ldots, \theta_{N-1})$ with covariance matrix $\Sigma_{N-1}$. Here $I_r$ denotes the $r \times r$ identity matrix and $E_r$ the $r \times r$ matrix in which every element is 1. The inverse can be easily verified, and thus $\det(\Sigma_{N-1}) = N^{1-(N-1)}$. Hence, using Hölder's inequality, the integral may be bounded. Using the form of the covariance matrix (B.10), the Gaussian expectation of two random variables is $\mathbb{E}_\theta[\theta_p \theta_q] = \Sigma_{pq} = \frac{1}{N}(\delta_{pq} + 1)$. Therefore $\mathbb{E}_\theta[(\theta_p - \theta_q)^2] = \mathbb{E}_\theta[\theta_p^2] - 2\mathbb{E}_\theta[\theta_p \theta_q] + \mathbb{E}_\theta[\theta_q^2] = 2/N$.

We now turn to the evaluation of $J^{(2)}$. Due to the condition $\sum_j n_j = N$, at least one of $n_7 + n_0$, $n_1 + n_2$, $n_3 + n_4$ and $n_5 + n_6$ is greater than or equal to $N/4$. Suppose this is the case for $n_3 + n_4$. Let us denote $A = A_3 \cup A_4 = [-\frac{\pi}{8}, \frac{\pi}{8}]$, $B = A_2 \cup A_5 = [-\frac{\pi}{4}, -\frac{\pi}{8}] \cup [\frac{\pi}{8}, \frac{\pi}{4}]$ and $C = A_0 \cup A_1 \cup A_6 \cup A_7 = [-\frac{\pi}{2}, -\frac{\pi}{4}] \cup [\frac{\pi}{4}, \frac{\pi}{2}]$, with $n_A = n_3 + n_4$, $n_B = n_2 + n_5$ and $n_C = n_0 + n_1 + n_6 + n_7$ accordingly. If we write $F := \{s \in D^{(2)} : n_A \geq N/4\}$ and account for the four possibilities of having at least $N/4$ angles in the particular segment, then we obtain (B.12). In addition, we split $F = F_> \cup F_\leq$, where for some $\epsilon > 0$ we have $F_> = \{s \in F : n_C > N^\epsilon\}$ and $F_\leq = \{s \in F : n_C \leq N^\epsilon\}$, and evaluate each part separately. If $\theta_p \in A$ and $\theta_q \in C$ (or vice versa) then $|\theta_p - \theta_q| \geq \pi/8$, and so $|\cos(\theta_p - \theta_q)| \leq \cos(\pi/8) = e^{-c}$ for $c = -\log(\cos(\pi/8)) > 0$. In addition, for $\theta_p, \theta_q \in A \cup B$ and $\theta_p, \theta_q \in C$ we can employ the bounds (B.8), and for all others write $|\sin(\theta_p - \theta_q)| \leq 1$ and $|\cos(\theta_p - \theta_q)| \leq 1$.
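The covariance structure above can be verified numerically. The sketch below assumes $\Sigma_{N-1} = \frac{1}{N}(I_{N-1} + E_{N-1})$, which matches the stated entries $\Sigma_{pq} = \frac{1}{N}(\delta_{pq} + 1)$, and checks our derived inverse $N I_{N-1} - E_{N-1}$, the determinant $N^{1-(N-1)}$ and the expectation $\mathbb{E}_\theta[(\theta_p - \theta_q)^2] = 2/N$.

```python
import numpy as np

N = 8                         # matrix dimension; the Gaussian has N - 1 angles
n = N - 1
I, E = np.eye(n), np.ones((n, n))

Sigma = (I + E) / N           # assumed form: Sigma_pq = (delta_pq + 1) / N
Sigma_inv = N * I - E         # candidate inverse (our derivation)

assert np.allclose(Sigma @ Sigma_inv, I)                        # inverse checks out
assert np.isclose(np.linalg.det(Sigma), N ** (1 - (N - 1)), atol=0)
p, q = 0, 1                   # E[(theta_p - theta_q)^2] = 2 / N
assert np.isclose(Sigma[p, p] - 2 * Sigma[p, q] + Sigma[q, q], 2 / N)
```

The inverse follows since $E_{N-1}^2 = (N-1) E_{N-1}$, and the determinant from the eigenvalues $1$ (once) and $1/N$ (with multiplicity $N-2$).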
Therefore, using the arguments above for the Gaussian integral, the restriction of (B.12) to $F_>$ satisfies a bound involving $k_{AC} = \#\{(p,q) \in E : \theta_p \in A, \theta_q \in C\}$, $k_{A \cup B} = \#\{(p,q) \in E : \theta_p, \theta_q \in A \cup B\}$ and $k_C = \#\{(p,q) \in E : \theta_p, \theta_q \in C\}$. Now, given that $k_{AC} \leq k$, we have $e^{c k_{AC}} \leq e^{ck}$. Also, since $k_{A \cup B} + k_C \leq k$, and $n_A + n_B$ and $n_C$ cannot be equal to zero for $s \in F$,
$$\left(\frac{2}{n_A + n_B}\right)^{k_{A \cup B}/2} \left(\frac{2}{n_C}\right)^{k_C/2} \leq 2^{k/2}.$$
In addition, $n_A n_C \geq \frac{1}{4} N^{1+\epsilon}$ for $s \in F_>$, so the expression in (B.13) is bounded by a sum in which $F_k = 2^{k/2} e^{ck}$ and we have used $r = n_C$ and $N - r = n_A + n_B$. The factor $\binom{N}{r}$ accounts for the number of ways of placing $r$ angles in $C$ and $N - r$ angles in $A \cup B$. The summand is maximised when $r = N/2$ and so, using the bound $\sqrt{2\pi n}\, n^n e^{-n} \leq n! \leq 2\sqrt{2\pi n}\, n^n e^{-n}$, valid for $n \geq 1$, for the factorial, we find that the contribution from $F_>$ is negligible in comparison to the contribution from $J^{(1)}$ given in (B.6).
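The factorial bound invoked above is a two-sided Stirling estimate and is easily checked numerically for moderate $n$:

```python
import math

# Check sqrt(2 pi n) n^n e^{-n} <= n! <= 2 sqrt(2 pi n) n^n e^{-n} for n >= 1.
for n in range(1, 40):
    stirling = math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)
    assert stirling <= math.factorial(n) <= 2 * stirling
```

The lower bound is the classical Stirling inequality, and the upper bound holds since the correction factor $e^{1/(12n)}$ never exceeds $2$.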
This leaves the evaluation of $F_\leq$. Restricting the expression (B.12) to $F_\leq$ and following exactly the same steps as for the contribution from $F_>$ above, we find, since $n_A \geq N/4$ and $n_C \geq 1$, that this contribution is again negligible in comparison to (B.6).