Black holes and random matrices

We argue that the late time behavior of horizon fluctuations in large anti-de Sitter (AdS) black holes is governed by the random matrix dynamics characteristic of quantum chaotic systems. Our main tool is the Sachdev-Ye-Kitaev (SYK) model, which we use as a simple model of a black hole. We use an analytically continued partition function |Z(β + it)|2 as well as correlation functions as diagnostics. Using numerical techniques we establish random matrix behavior at late times. We determine the early time behavior exactly in a double scaling limit, giving us a plausible estimate for the crossover time to random matrix behavior. We use these ideas to formulate a conjecture about general large AdS black holes, like those dual to 4D super-Yang-Mills theory, giving a provisional estimate of the crossover time. We make some preliminary comments about challenges to understanding the late time dynamics from a bulk point of view.


Introduction
One of the deep questions in quantum gravity is the origin of the discrete spectrum of black hole microstates, from the bulk perspective of holographic duality. For large black holes the AdS/CFT duality makes the answer clear from the boundary perspective -a boundary field theory on a compact space generically has a discrete spectrum of states. But its origin from bulk gravity or string theory, even including nonperturbative effects like branes, is still mysterious.
Maldacena [1] pointed out a signature of a discrete energy spectrum that can (in principle) be computed in the bulk -the lack of decay of two-point functions evaluated at very late time. Dyson, Lindesay, and Susskind [2] applied these ideas to the study of correlators in de Sitter space.
To understand the way in which a two-point function diagnoses a discrete energy spectrum we can express it in the energy basis. The two-point correlation function of a Hermitian operator 1 O(t) at inverse temperature β is given by Here, Z(β) = Tr e −βH is the partition function and |n are energy eigenstates with energies E n . At early times we can replace the sum over eigenvalues by a coarse grained integral over a smooth density. G(t) will generically decay exponentially in time, but the decay does not continue indefinitely. At late times the discreteness of the spectrum becomes important, and the phases in (1.1) cause G(t) to oscillate rapidly and erratically. The correlation function is exponentially small and no longer decays. Holographically the coarse grained approximation is equivalent to a perturbative gravity calculation, and the exponential decay to quasinormal mode behavior [3]. The decay continues forever in this approximation.
There is a somewhat simpler diagnostic of a discrete energy spectrum, introduced in the black hole context by [4]. We define Z(β, t) ≡ Tr e −βH−iHt . (1. 2) The quantity Z(β, t) can be obtained by starting with Z(β) and analytically continuing β → β + it. At late times Z(β, t) also oscillates erratically. The time average of an observable and its moments is a simple way to quantify its late time behavior. 2 In fact, the time average of Z(β, t) vanishes, which means that at late times this observable fluctuates around zero. The typical size of the fluctuations can be 1 We assume that in a quantum field theory the operator is suitably smeared to eliminate any short distance divergences. 2 The authors of [5] use this idea in a closely related context. As in the case of the two-point function, the late time behavior of this quantity is generically complicated. One can make progress by taking the long-time average, where terms with oscillating phases average to zero and only terms with E m = E n survive. It is given by 4) where N E is the degeneracy of the energy level E. If the spectrum has no degeneracies (N E = 1), the long-time average becomes (1.5) Z generically scales as e aS where S is the entropy and a is a positive constant. So (1.5) scales as e −aS . In the holographic context S is the black hole entropy which scales as 1/g 2 s ∼ 1/G N , where g s and G N are the string coupling and Newton constants of the bulk theory, so (1.5) is nonperturbative in the bulk coupling. For large black holes, S is given by the thermal entropy of the boundary field theory, and it scales with the number of degrees of freedom. In particular, we have S ∼ N 2 in matrix theories like super-Yang-Mills (SYM) theory, and S ∼ N in vector theories like the Sachdev-Ye-Kitaev model [6,7]. Either way, the quantity (1.5) is non-perturbative in 1/N . Now, suppose we attempt to compute the left-hand side of (1.5) by making a coarse grained approximation. If we replace the discrete sum over states in (1.3) by an integral over a smooth density we find that the long-time average vanishes. In holography, by analytically continuing saddle points we also find disagreement with (1.5). (See section 9 and also [8].) Therefore, by studying how the long-time decay of the partition function (or of the correlator) is avoided in gravity we are in fact probing the discreteness of the black hole spectrum -a basic characteristic of its quantum nature. 3 From the bulk perspective, Maldacena initially suggested that an instanton might be responsible for the analogous O(e −aS ) root-mean-square (RMS) height of the correlator. Barbon and Rabinovici [9] pointed out that such an instanton might not describe the details of the irregular long-time fluctuations expected in the correlator. Information loss in correlation functions was also studied in [10,11] in the context of 2d CFTs. These questions have been difficult to address in standard holographic contexts like N = 4 SYM, due to the difficulty in analyzing the chaotic boundary theory with sufficient precision.
The Sachdev-Ye-Kitaev (SYK) model [6,7] is a good laboratory to explore these questions. It is a quantum mechanical model of N Majorana fermions with random q-fermion JHEP05(2017)118 couplings that is soluble at large N . The theory is highly chaotic: at strong coupling it saturates [12][13][14] the chaos bound [15], a property that is characteristic of black holes in Einstein gravity [16][17][18]. It realizes a (highly curved) description of a "nearly AdS 2 "/"nearly CFT 1 " system [14,[19][20][21][22]. As is the case for other vector models, there is an exact rewrite of the disorder-averaged model in terms of a functional integral over bilocal O(N ) singlet fields G, Σ that presumably are related to the bulk description. 4 The SYK model has several other properties that make it useful in the study of late time properties. The average over the random couplings should rattle the energy eigenvalues sufficiently to make the rapidly oscillating terms in equations (1.1), (1.3) average to zero at a fixed time, making these quantities smooth functions of time. This makes them more amenable to study. In addition, the model is computationally simple enough that numerical methods can yield significant insight [31,32]. (After we had finished our numerical analysis the paper [33] appeared. It has significant overlap with our numerical results.) One goal of this paper is to explore the late time behavior of the SYK model. We present numerous numerical results about such behavior in the model, and interpret them using a variety of analytic and conceptual arguments. One of our key findings is a close relationship between the late time behavior of the model and the behavior of random matrices. 5 It is a widely held conjecture [36] that the spacing statistics of nearby energy levels in quantum chaotic systems should be well approximated by an appropriate random matrix ensemble. Since late times corresponds to small energy differences our result is a natural one.
Building on these observations we can make a plausible conjecture about the behavior of more complicated holographic systems, like the Type IIB AdS 5 / N = 4 SYM system.

Summary of results
Here we give an outline of the paper and summarize the main results. In section 2 we introduce the SYK model. Then in section 3 we write down the spectral form factor, which is given by |Z(β, t)| 2 /Z(β) 2 averaged over the random couplings. At late times this quantity goes over to a plateau value given approximately by (1.5), which characterizes the discreteness of the spectrum. By numerically computing this quantity we find that its late time behavior exhibits an interesting feature, see figure 1. Starting at t = 0, the spectral form factor first dips below its plateau value and then climbs back up in a linear fashion (we call this region the 'ramp'), joining onto the plateau. This behavior is readily explained if we approximate the SYK Hamiltonian by a Gaussian random matrix, as shown in figure 2. Further evidence for the relation between the late time behavior and random matrix theory (RMT) is given in section 3.1, where we show the relation between the choice of RMT ensemble (GUE, GOE, or GSE) and the detailed shape of the late time behavior in SYK. See figure 4. 4 Higher dimensional versions of SYK have been constructed in [23,24]. A supersymmetric generalization of the model has been constructed in [25]. A multiflavor version has been constructed in [26]. Other related work includes [27][28][29][30]. 5 Another discussion of random matrices in black hole physics is [34]. Recent discussion of a connection between chaotic systems and random ensembles, including observables generalizing |Z(β, t)| 2 , appears in [35].

JHEP05(2017)118
In section 4 we make a digression to discuss the thermodynamic properties of SYK. We compute the entropy and energy numerically, and by extrapolating these results to infinite N we find excellent agreement with existing analytical calculations carried out in the large N limit. This serves as an incisive check both on our results and on existing analytic calculations.
In section 5 we review the analytical origin of the ramp and plateau in RMT, and the relation of the ramp to the phenomenon of spectral rigidity. We show that the ramp can be understood as a perturbative effect in RMT (though not as a perturbative 1/N effect in SYK, as we explain).
In section 6 we explain the early-time power-law decay in SYK visible in figure 1. This is related to the low energy portion of the spectrum, dominant in the large N , large βJ limit, that is described by the Schwarzian theory of reparametrizations. We argue this is exact in a double scaling limit. In the large N , large βJ limit a sector of the model [14,20] is dual to a dilaton gravity [37,38] black hole in AdS 2 . We argue that the subsequent linearly growing ramp and the plateau should survive in this limit, suggesting a connection between the late time behavior of black holes and random matrix theory.
In section 7 we discuss a similar ramp that appears in SYK correlators. We work out the conditions under which the fermion two-point function exhibits the ramp/plateau structure of the spectral form factor, and check these results numerically.
In section 8 we consider the behavior of the spectral form factor for a single realization of the random couplings. The motivation here is to make contact with theories such as Yang-Mills which do not involve an averaging over couplings. For a single realization the spectral form factor exhibits large fluctuations even at large N , but we argue that by time averaging (and no disorder averaging) the underlying ramp/plateau structure can be brought into view.
In section 9 we make a connection with N = 4 SYM, giving a preliminary estimate of the gravity saddle points that give the early-time decay of |Z(β, t)| 2 . We also argue that there should be a subsequent long period of time where this quantity is growing and dominated by 'ramp' physics, folded against the coarse-grained density of states of the SYM theory.
We conclude and discuss future directions and ongoing work in section 10. Several appendices contain additional results and discussion.
In appendix A we review the particle-hole symmetry of the SYK model, whose properties depend on N mod 8 [32,39].
In appendix B we discuss the double-scaled limit of SYK, where the disorder-averaged density of states can be computed exactly.
In appendix C we consider a toy model of the G, Σ path integral, which is an exact rewrite of the SYK model in terms of bosonic bilocal fields. We explain how the original fermionic behavior can arise from these bosonic variables.
In appendix D we again consider the G, Σ formulation of the model. We point out the existence of a family of subleading saddle points that show up both in the SYK model and in the integrable version q = 2 of it. We explain why this infinite family of saddle points does not significantly affect the thermodynamics of the model at large N .

JHEP05(2017)118
In appendix E we further discuss these saddle points in the integrable q = 2 version of the SYK model, and show how they lead to a simple kind of random matrix theory behavior at late times.
In appendix F we make some preliminary remarks on the origin of the amplitude of the ramp in SYK.
In appendix G we present constraints on a simple single saddle point explanation of the ramp in SYK correlators.
Finally, in appendix H we present additional numerical data.

The Sachdev-Ye-Kitaev model
Consider N Majorana fermions ψ a (a = 1, . . . , N ) in 0+1 dimensions that obey the algebra The coupling tensor J abcd is completely anti-symmetric, and each independent element is a random real number chosen from a Gaussian distribution with zero mean and variance given by and we set J = 1 for convenience.
In this work we mainly focus on the model with 4-fermion interactions, although we will sometimes discuss the generalization where the fermions interact in groups of q.
For N even it is often useful to implement the model using N d = N 2 Dirac fermions c i (i = 1, . . . , N d ) by defining The Dirac fermions satisfy the algebra We can write down a fermion number charge given by Q = N d i=1c i c i . The Hamiltonian (2.1) does not preserve this charge, but it does preserve charge parity (Q mod 2). Therefore, the Hamiltonian has two blocks corresponding to even and odd values of Q.

JHEP05(2017)118 3 Spectral form factor
We define disorder-averaged analogs of the quantity in equation (1.3) as follows.
Here · J denotes the disorder average -the average over the ensemble of random couplings. Z(β, t) was defined in (1.2). As discussed in the introduction, the late-time behavior of these quantities probes the discreteness of the spectrum, similar to the late-time behavior of two-point functions. Notice that we are working with annealed quantities, meaning that we are taking the disorder average separately in the numerator and denominator. This is in contrast with quenched quantities such as |Z(β, t)| 2 /Z(β) 2 J . The advantage of working with annealed quantities is that they require a finite number of replicas in analytic calculations (g requires two replicas, g d requires just one), whereas quenched quantities require an arbitrary number of replicas. 7 Now we present one of the central results of this work, g(t) for the SYK model. In figure 1 we present g(t; β = 5) for N = 34, computed numerically. 8 Notice that g(t) at early times does not simply join onto the late-time plateau, but instead dips below the plateau and then climbs back up. One goal of this work is to understand the source and implications of this behavior, and to estimate how prevalent it is both in SYK (for various values of the parameter βJ) and in quantum field theory in general.
Notice that g(t) is smooth, and does not exhibit the large fluctuations that one expects at late times in a typical quantum theory. This is due to the disorder average, which smooths out the fluctuations exhibited by each realization of the random couplings. (Some fluctuations are apparent at late times, but these are an artifact due to the finite number of samples used in the computation. We will discuss this point further in section 8.) We will be discussing the curve g(t) at length, so let us point out the main features in this plot and introduce some nomenclature. Starting with t = 0, at early times the value of g(t) drops quickly along what we will call the 'slope', until it reaches a minimum at the 'dip time' t d . Next comes a period of linear growth that we will call the 'ramp'. It ends at the plateau time t p , and beyond this we have an almost constant value of g(t) that we call the 'plateau'. The plateau height is equal to the long-time average of g(t).  Figure 1. A log-log plot of SYK g(t; β = 5), plotted against time for N = 34. Here we use the dimensionless combination tJ for time. Initially the value drops quickly, through a region we call the slope, to a minimum, which we call the dip. After that the value increases roughly linearly, ∼ t, until it smoothly connects to a plateau around tJ = 3 × 10 4 . We call this increase the ramp, and the time at which the extrapolated linear fit of the ramp in the log-log plot crosses the fitted plateau level the plateau time. The data was taken using 90 independent samples, and the disorder average was taken for the numerator and denominator separately. plateau is 2Z(2β)/Z 2 (β) ∼ e −aS , in accordance with (1.4). The factor of 2 is due to a 2-fold degeneracy in the spectrum (see appendix A).
Quantities such as g, g d , and g c are studied extensively in the field of quantum chaos. In particular, g(t) (typically used with β = 0) is called the spectral form factor and it is a standard diagnostic of the pair correlation function of energy eigenvalues. We will often refer to g(t) by this name. It supplies information about the correlations of eigenvalues at different energy separations. 9 One of the basic conjectures in the field of quantum chaos is that the fine grained energy eigenvalue structure of a chaotic system is the same as that of a random matrix chosen from one of the standard Dyson ensembles [40]: Gaussian Unitary Ensemble (GUE), Gaussian Orthogonal Ensemble (GOE), or Gaussian Symplectic Ensemble (GSE). (For reviews, see [36,41].) The particular ensemble to use depends on the symmetries of the original Hamiltonian. Random matrix theory can then be used to compute certain quantities (such as the spectral form factor) that are sensitive to eigenvalue correlations. You, Ludwig and Xu [32] first discussed the quantum chaotic properties of SYK by studying the distribution of spacings between nearest-neighbor energy levels, another standard quantum chaos observable. They showed that the distribution is consistent with RMT predictions. Time tJ GUE, L = 4096, 1200 samples, β=5, g(t ) Figure 2. A log-log plot of g(t; β = 5) against time for GUE random matrices, dimension L = 2 12 . A dip, ramp and plateau structure similar to figure 1 is apparent.
In figure 2 we present g(t; β = 5) for the GUE ensemble of matrices of rank L RMT = 2 12 , computed numerically, with a normalization such that the eigenvalues typically lie in the range −2 < λ < 2 (see (5.1)). At β = 0 the height of the plateau is of order 1/L RMT and the plateau time is at t of order L RMT , the inverse mean level spacing.
Note the similarity between the RMT result and the SYK result, and in particular the presence of the ramp and the plateau. We will argue that the late-time behavior of the spectral form factor in SYK can be explained by random matrix theory. The early time behavior of RMT differs from SYK, although it is not obvious from the plots. The typical eigenvalue density has different dependence on energy in the two systems, which leads to somewhat different initial decays. Moreover, at early times RMT is governed by a perturbative expansion in 1/L, while SYK is governed by an expansion in 1/N . On the other hand, at times well beyond the dip, g(t) is determined by eigenvalue correlations on scales much smaller than the total width of the spectrum, and there one expects to find agreement between SYK and RMT.
What is the physical origin of the ramp in RMT? Eigenvalues of generic matrices repel, so near degeneracies are extremely unlikely. This causes the plateau. The time of onset of the plateau is determined by the scale of near neighbor eigenvalue spacings. The ramp, though, is due to the repulsion between eigenvalues that are far apart in the spectrum. This repulsion, when balanced against the effects that keep the energy finite, gives rise to a very rigid eigenvalue structure. This phenomenon is referred to as long-range spectral rigidity [36,40,41]. More quantitatively, if δE n denotes the deviation of an energy from its average value, then at leading order δE n δE m ∼ log |n − m|. For comparison, if the eigenvalues formed a one dimensional crystal with harmonic near neighbor interactions, then δE n δE m ∼ |n−m|, a much less rigid behavior [36,40,41]. The log |n−m| form, after suitable processing we will discuss below, accounts for the linear behavior of the ramp. The ramp lies below the plateau because repulsion causes the eigenvalues to be anticorrelated.

The ramp and the eightfold way
We now present further evidence of the relation between random matrix theory and the presence of the ramp in the SYK spectral form factor.
The Hamiltonian of a chaotic theory is generally believed to resemble a random matrix when studied at sufficiently fine energy resolution. One basic property of random matrices is their nearest-neighbor level statistics, namely the distribution of the distance s between pairs of neighboring energy levels [42]. 10 The nearest-neighbor statistics of an integrable theory follow an exponential distribution e −s , while those of a chaotic theory generally follow one of the three reference ensembles GUE, GOE, and GSE. The particular ensemble depends on the symmetries of the Hamiltonian.
You, Ludwig and Xu [32] studied the nearest-neighbor level spacing distribution in SYK. They made the important point that all three Gaussian RMT ensembles are implemented in the model as we now review.
The SYK model has a particle-hole symmetry given by [31,32,39] where K is an anti-linear operator. The properties of this operator determine the class of RMT statistics of each charge parity sector of the Hamiltonian. In particular, the statistics are determined by the value of (N mod 8) as follows (see appendix A for details).
• When N mod 8 = 2 or 6, the symmetry P maps the even and odd parity sectors to each other. Individual sectors do not have any anti-linear symmetry, and the corresponding ensemble of each sector is GUE.
• When N mod 8 = 0, P maps each sector to itself and P 2 = 1. The corresponding ensemble is GOE.
• When N mod 8 = 4, P again maps each sector to itself but now P 2 = −1. The corresponding ensemble is GSE. Figure 3 shows the nearest-neighbor statistics of SYK with N = 30, 32, and we see excellent agreement with RMT predictions. While the nearest-neighbor spacing distribution is sensitive to correlations between adjacent energy levels, the spectral form factor probes correlations between energy levels at larger separations. The t parameter in g(t) determines the scale of the energy differences being probed. As discussed above, beyond the plateau time only individual energy levels are probed, while at earlier times (and in particular on the ramp) g(t) is sensitive to correlations between levels that are much farther apart than the mean level spacing. The structure of these correlations depends on the ensemble. The three RMT ensembles all exhibit a ramp and a plateau but with slightly different shapes: in GUE the (unfolded) 10 More precisely, one considers the distribution of spacings between unfolded energy levels [43]. These are the levels one obtains by making a change of variables such that the mean level spacing becomes one everywhere. For further details, see [41]. ramp and plateau connect at a sharp corner, in GOE they connect smoothly, and in GSE they connect at a kink. 11 The shape of the ramp in each case agrees with the RMT prediction outlined above. In particular, the kinks visible for N = 20, 28 are a signature feature of the ramp in the GSE ensemble. For N = 34 (GUE) a careful comparison that confirms the RMT ramp shape is described in section 6. As an initial test we fitted the ramp at times well before the plateau time (where unfolding effects discussed in section 6 become significant). We found a power behavior agreeing with the expected GUE behavior g(t) ∼ t 1 to within a few percent. These are strong pieces of evidence that the ramp structure in SYK can be attributed to random matrix theory.
For β = 0 the early time behavior exhibits oscillations, which will not play a role in this work. The oscillations are due to the fact that, at infinite temperature, the spectral form factor is sensitive to the hard edges at both ends of the energy spectrum.
Let us now consider the plateau heights of figure 4 in detail. They are equal to the time-average value of g(t), which at β = 0 is given by (1.4) Here N E is the degeneracy of energy level E. As explained in appendix A, the SYK spectrum has a double degeneracy (N E = 2) when (N mod 8) = 0 due to the particle-hole symmetry, leading to a plateau height of 2/L at β = 0. When (N mod 8) = 0 there is no protected degeneracy, and in those cases the plateau height is 1/L. These facts are consistent with the pattern of plateau heights exhibited by figure 4. In particular, they explain why the plateaus of N = 16, 24, 32 are reduced by a factor of 2 compared with the rest.  One important consequence of figure 4 is that it allows us to learn about the large N behavior of the ramp. As we go to larger N the dip time grows quickly, but the plateau time grows even faster, resulting in a more and more prominent ramp. (For further discussion of the numerical evidence, see appendix H.2.) We are led to the reasonable conjecture that the ramp is a feature of the large N theory, and that the dip time is a new time scale in the theory. In section 6 we will present an analytic argument that supports this conclusion.

Thermodynamics of the SYK model
In this section we compute the thermodynamic properties of SYK numerically, and extrapolate to the large N limit. We find excellent agreement with existing analytic results, both for the infinite N limit and for the leading O(1/N ) correction. This serves as an important cross-check both on our results and on existing results.
We begin with a brief review of the known analytic results. There is an exact rewrite of the SYK model in terms of bi-local anti-symmetric variables G(τ 1 , τ 2 ) and Σ(τ 1 , τ 2 ). The path integral is given in Euclidean time by [7,46] Z = DGDΣ e −I , (4.1) We remind the reader that we set q = 4 in most of our analysis. The action (4.2) is obtained by performing the disorder average over couplings J ijkl , introducing a Hubbard-Stratonovich field for the fermion bi-linear, and integrating out the fermions [7,46]. In particular, G(τ 1 , τ 2 ) should be thought of as the fermion bi-linear 1 N N a=1 ψ a (τ 1 )ψ a (τ 2 ) and Σ(τ 1 , τ 2 ) as a Lagrange multiplier enforcing this identification. To compute Z(β + it)Z(β − it) J we need two copies (called replicas) of the fermion fields labelled by replica indices α, β = 1, 2. G, Σ become G αβ , Σ αβ . The convergence of (4.1) is manifest with a contour choice described in appendix C.
To solve the theory at large N one now writes the saddle point equations for the bi-local fields.
The first equation is in frequency space, and the second is in Euclidean time. These equations can be solved analytically in the limits βJ → 0 and βJ → ∞ [47], and can be solved numerically for arbitrary values of βJ. Plugging the result back in (4.2) gives the large N thermal free energy [48]. Certain perturbative 1/N corrections to the free energy have also been computed [14,49]. At finite N we compute the mean energy and other thermodynamic quantities numerically by fully diagonalizing the Hamiltonian. To make contact with the analytic calculation, we extrapolate the numerical results to large N as follows. At fixed temperature T we compute E(T ) /N at different N values and fit to a polynomial in 1/N of degree 2. The leading O(N 0 ) coefficient is then the infinite N result, the next term is the 1/N correction, and so on. Figure 5 shows the mean energy extrapolated to infinite N , compared with the result obtained from a direct solution of the large N saddle point equations. We find excellent agreement between the two methods of computation, although even at N = 32 (the largest value considered here) the result is not close to the infinite N answer. The mean energy

JHEP05(2017)118
can be written at low temperature as The normalization is such that all coefficients scale as O(N 0 ). The coefficients c (the large N specific heat) and a have been computed in the large N theory 12 and are given by The coefficient c 2 has not been reported in the literature, but we believe it should be c 2 = −0.419. 13 Notice that the linear term in (4.4) is subleading in 1/N . This must be the case because this term corresponds to a log(T ) term in the entropy, which becomes negative at finite temperature. Let us now compare these coefficients to the extrapolated numerical results: 14 We see that a is suppressed at infinite N as expected, while c is within fifteen percent of the expected value (4.5). Next, we fit the 1/N piece of the extrapolated energy and find Here the fitted value of a = 1.6 is fairly close to the expected value a = 3 2 . Next, figure 6 shows the entropy extrapolated to infinite N . We again find excellent agreement with a direct infinite N calculation. At low temperature the entropy is given by (4.9) Here s 0 ≈ 0.2324 ≈ 1 2 log(1.592) is the analytic zero-temperature entropy density (in the large N limit). Notice that the numerical extrapolation correctly captures the large N zero-temperature entropy, even though at any fixed N the entropy goes to zero as T → 0.

Spectral form factor in random matrix theory
In this section we review properties of the spectral form factor in the GUE random matrix ensemble [36,41]. We derive two of the main features of figure 2: the late time behavior of the slope and the early time behavior of the ramp. Both are described by power laws, and from there we get an estimate of the dip time in RMT. 12 The coefficient a was computed in [14,49] from a one-loop fluctuation correction to the large N saddle, or equivalently from summing diagrams formed by bending ladder diagrams around into a loop. 13 This is based on a conjectured 1/β 2 term in the free energy, which in the notation of [14] reads (4.6) 14 We included a T 3.77 term to account for the first nontrivial operator dimension in the model [14].
Surprisingly, the fit agrees with large N results slightly better if we replace this with a T 4 term.  Consider the GUE ensemble of Hermitian matrices M of rank L, with ensemble averaging defined by In this context, the matrix M is analogous to the SYK Hamiltonian, and the rank L corresponds to the dimension of the Hilbert space. One important difference is that the natural perturbative parameter in SYK is 1/N , whereas in RMT we typically expand in The partition function for a given realization of M is defined by The spectral form factor g and the related quantities g d and g c are then defined by (3.1)-(3.3), where the average · J over the random couplings is replaced by the average · GUE over random matrix elements, given by (5.1). 15 Let us diagonalize M and change variables from its matrix elements to its eigenvalues and a unitary change of basis. This introduces a Jacobian that describes repulsion between eigenvalues. In the large L limit the eigenvalues can be described by a density ρ. We will use ρ for the physical density, andρ for the unit normalized density: Replacing the individual eigenvalues λ i byρ(λ), one obtains 16 (5.5) 15 For example, for the partition function we have 16 The normalization of ρ(λ) should be imposed (for example) by a Lagrange multiplier. The resulting saddle point equations are solved subject to this constraint.

JHEP05(2017)118
The large L saddle point of the above is given by the Wigner semicircle law, The physical eigenvalue density is given by ρ(λ) = Lρ s (λ). Notice that the average eigenvalue spacing is of order 1/L. We now turn to discuss the slope and ramp that appear in the spectral form factor, shown in figure 2. Roughly speaking, g(t) is dominated by the disconnected piece g d (t) before the dip time, and by the connected piece g c (t) after the dip time. We will discuss each in turn.
The leading large L behavior of Z(β, t) follows from the semicircle law. Working for simplicity at infinite temperature, we have Here J 1 is a Bessel function of the first kind. At late times we find that the partition function decays as L/t 3/2 , and therefore at late times we have This is true also at finite temperature. Before the dip time, the spectral form factor g(t) is dominated by the disconnected part g d (t). Therefore, the late time decay of g(t) before the dip time is also proportional to 1/t 3 . This particular power is a consequence of the fact that the mean eigenvalue density (5.6) vanishes as a square root near the edge of the spectrum.

The ramp and the dip time
We now review how to derive the presence of a ramp in RMT. We focus for simplicity on the connected spectral form factor g c (t; β = 0), and show that g c (0, t) ∼ t L 2 at times 1 t ≤ L. Beyond the dip time, g(t) and g c (t) are almost equal, both exhibiting the ramp/plateau structure. However, for g c the ramp extends to very early times, giving better perturbative control. 17 The connected spectral form factor can be written as Here R 2 is the connected pair correlation function of the unit-normalized densityρ, and δρ(λ) ≡ρ(λ) −ρ s (λ) is the fluctuation around the mean eigenvalue densityρ s (λ) given by the semicircle law (5.6). A basic result of RMT is that, near the center of the semicircle,

JHEP05(2017)118
R 2 (λ 1 , λ 2 ) is given by the square of the sine kernel [44,45,50] plus a delta function at coincident points: Fourier transforming as in (5.9) gives This explains the observed behavior in figure 2 beyond the dip time: there is a ramp up to the plateau time 2L, and a constant plateau value beyond. 18 The ramp lies below the plateau because the eigenvalues are anticorrelated as reflected in the minus sign in (5.11). This is a good explanation of the ramp and plateau, but it requires an appeal to the sine kernel. In fact, one can derive the ramp in a more basic way without knowing about the sine kernel. Notice that the initial linear time dependence of the ramp can be obtained by approximating the sine kernel by We now review how to derive this perturbatively from the action (5.5) following Altshuler and Shklovskii [53]. Writingρ =ρ s + δρ and expanding the action (5.5) about the saddle point, we find the quadratic term We can now carry out the Gaussian integral to determine the two-point function (5.10). We go to Fourier space δρ(λ) = ds 2π δρ(s) exp(isλ) and find Notice that long-wavelength fluctuations of ρ are strongly suppressed: this is the spectral rigidity referred to in RMT. 19 Then we find A calculationally more efficient way of studying g(t) in RMT is the formalism developed by Brezin and Zee [54] which uses standard 't Hooft large L perturbation theory to compute the double resolvent of M . We discuss this technology in appendix F. 18 Our analysis here only applies to the contribution from eigenvalues near the center of the semicircle, where the mean density is L/π. We will show how to include regions with different mean densities in (6.3). Brezin and Hikami derive remarkable nonperturbative formulas for g(t) in [51,52]. 19 By observing that the local eigenvalue density is the inverse of the level spacing one can read off from the following result the δEnδEm ∼ log |En − Em| signature of spectral rigidity discussed earlier.

JHEP05(2017)118
Equating the slope (5.8) and the ramp (5.12) gives the RMT dip time t d ∼ √ L, exponential in the "entropy" log L. We find that the ratio t p /t d ∼ √ L, also exponential in the entropy, and therefore the ramp in the RMT spectral form factor survives in the large L limit.
This derivation makes it clear that (5.16) is a perturbative result in RMT at order 1/L 2 . Its contribution to g c (t) is proportional to t/L 2 , capturing the ramp part of (5.12). In other words the ramp is a perturbative RMT effect. By contrast, the plateau is not. 20 Indeed, the appearance of the plateau depends on the oscillating factor in the more exact sine kernel (5.11) expression, which can be obtained from a RMT instanton expression e −2LE imag with imaginary energy [55,56]. The oscillating term comes from continuing to real energy and extracting the appropriate part of the result.
This has important consequences for the application to the SYK model. For SYK, L = 2 N/2 so 1/L ∼ e −aN . Therefore, perturbative RMT effects are nonperturbative in SYK, of order e −2aN . Nonperturbative RMT effects of order e −L are of order exp(−e aN ), an extremely small nonperturbative effect.

Spectral form factor in the SYK model
The presence of the ramp in the results of section 3 suggests that the SYK model possesses spectal rigidity, even for eigenvalue spacings far larger than the mean nearest-neighbor spacing. By combining this assumption with coarse-grained features of the large N spectrum, we reproduce reasonably well the g(t) curve obtained from exact diagonalization.
First, let us explain how an assumption of spectral rigidity produces the ramp observed in g(t). Starting with the general definition of ZZ * , it is convenient to define x = λ 1 − λ 2 and E = λ 1 +λ 2 2 . Notice that in this expression and below, we are using ρ, the physical eigenvalue density, normalized so dλ ρ = L. Now, for late times we assume that the integral is dominated by regions where x is sufficiently small that we can approximate the density-density correlator by GOE, GSE, or GUE statistics. For simplicity, we take GUE statistics which leads to 21 In GUE the ramp is the full perturbative result, while in other RMT ensembles (such as GOE and GSE) the ramp receives higher-order perturbative corrections. Non-perturbative corrections to the ramp exist in all cases. 21 We should make a few comments about this formula. First, if the local statistics are GOE or GSE, then we would replace the ramp function in (6.3) by the appropriate spectral form factor. Second, in cases where the spectum is uniformly d-fold degenerate, we should multiply the ramp term by d 2 and divide ρ(E) inside the second term by d.

JHEP05(2017)118
Eq. (6.3) can be interpreted as follows: we approximate the spectrum by bands over which ρ(E) varies very little. From each band, we get a GUE ramp. The integral over energy in (6.3) is simply summing up these individual ramps which then yields another smoothed ramp. This is the inverse of an "unfolding" process. In a theory with many degrees of freedom, we expect the integral over E to be strongly peaked around a maximum. In general, the location of this maximum will depend on time. The ramp will join the plateau at the time t p = e S(2β) , where the energy that maximizes the integral is simply E(2β), the energy that dominates the canonical ensemble at inverse temperature 2β. One can check that the derivative of g c (t) will smoothly approach zero at t p , giving a C 1 transition onto the plateau even though individual bands have a kink.
One would like to apply (6.3) to SYK, but there is an important subtlety. The second term in (6.3) should be understood as exponentially smaller than the first, as long as t is not too large. In SYK, we also expect correlations between eigenvalues that are only power-law suppressed by N (more precisely of order 1/N q ). One source of such fluctuations would be the overall scale of the Hamiltonian, which varies from J configuration to J configuration. Such terms would dominate over the ramp contribution at short times. However, we might hope that these 1/N q terms will always be smaller than either the first term or the second term in (6.3), so the formula still gives a reasonable picture of SYK. We will return to this point below.
Let us now attempt to evaluate (6.3) for large N SYK. First we discuss the disconnected first term. We can numerically evaluate the large N saddle point that determines Z(β + it) , but for large values of β + it, we also need to consider fluctuations about this saddle. There are a set of modes that become soft for large β + it, which can be captured by the partition function of the effective Schwarzian derivative theory [14,17]: Here, 0 < u < 2π is the physical time variable of the model, and τ (u) is a reparametrization of the thermal circle. The parameter J sets the scale of the Hamiltonian in a way appropriate for general values of q, and α S is a numerical coefficient that depends on q; these are related to the specific heat c by c = 4π 2 α S J . The classical and one-loop contributions to this action have been studied previously [14], with the result However, notice that when we continue to large values of β + it, the coefficient multiplying the action (6.4) becomes small, and τ (u) will have large fluctuations. Naively, this invalidates a perturbative analysis, making it difficult to evaluate Z. In fact, with the correct measure, the theory turns out to be one-loop exact. We will present a somewhat indirect derivation of this fact. A direct proof is also possible, and will appear in [57]. Our derivation is based on an intermediate step where we think about the SYK model for large values of q. Then the coefficient in the action becomes [14]

JHEP05(2017)118
In particular, the coefficient is only a function of N q 2 . We can therefore take a "doublescaled" limit of large N and large q with N q 2 held fixed. It is clear that the Schwarzian part of the theory will survive in this limit, but the rest of the SYK theory simplifies significantly, and it becomes possible to exactly compute the disorder-averaged density of states using techniques from [58]. We sketch this in appendix B. 22 To isolate the contribution of the Schwarzian, we take a further "triple-scaled" limit where we take N q 2 large and the energy (E − E 0 ) above the ground state small, with the product held fixed. In this limit, we find the density of states (see appendix B eq. (B.15)) We expect that the Schwarzian sector is the only part of the theory that survives this triplescaled limit, so (6.7) should be an exact result for the Schwarzian theory. Computing the partition function via Z = dEρ(E)e −βE , we learn that the one-loop result (6.5) is actually the exact answer for the Schwarzian theory. The conclusion of this discussion is that we can include the effect of the soft mode integral by simply dividing the large N saddle point expression for the partition function by a factor of (β +it) 3/2 . Using the expression for the large N free energy in the holographic limit, log Z = N ( 0 β + s 0 + c 2β ), one finds that the disconnected term in (6.3) contributes the following to g(t): .
The time dependence of the exponent becomes negligible at t √ N , and we have a power law decay ∼ t −3 . Away from the holographic limit, one would replace the piece in the exponential by the appropriate finite β saddle point action, which can be computed numerically. Now, we would like to evaluate the second term in (6.3). Away from the holographic limit, one has to use the numerical S(E) determined by solving the Schwinger-Dyson equations. However, we can give a simple formula in the holographic limit, where S(E) = N s 0 + 2c(E − E 0 )N . Neglecting one-loop factors from the integral over E, we have the contribution to g(t) . Notice that this function is C 1 . We can evaluate the dip time by equating (6.8) and (6.9), which gives t d ∼ e N s 0 /2 .
One can also make a more exact analysis of the large N function, by evaluating the finite β saddle point action numerically, and doing the integral over E in (6.3). In figure 7 22 Notice that if we take a double scaling limit q, N → ∞ keeping q/N α fixed, then the scrambling time is of order log(N ) when α < 1 2 , while it is of order 1 when α > 1 2 . Therefore q 2 ∼ N marks the boundary between the behaviors expected for k-local and nonlocal Hamiltonians [59][60][61]. we show the result of doing this computation and plugging in N = 34 to compare to the exact diagonalization data. We also take into account the two-fold degeneracy in the spectrum for N = 34 and evaluate the numerical finite temperature saddle for the slope portion, slightly correcting (6.8). The agreement is reasonably good, although the ramp and plateau are off by factors that represent the discrepancy in the exact free energy vs. the large N saddle point. (Presumably this factor would be mostly resolved by a complete one-loop correction to the large N partition function.) We caution the reader that although the agreement in figure 7 looks reasonable, it is very possible that the true large N answer for g(t) would differ in important ways. In particular, we are not confident that the slope region continues to be described by the simple Schwarzian effective theory out to very long times of order e N s 0 /2 . Another possibility is that some effect leads to the slope portion of g(t) decreasing more rapidly at an earlier timescale. For example, this could be the result of some 1/N q effect that tends to smooth out the sharp √ E − E 0 edge in the spectrum, leading to a faster decay. 23 In this situation, the slope would crash and intersect the ramp much sooner, leading to a short dip time, perhaps of order t d ∼ N q . Another possibility is that the very bottom of the spectrum would be controlled by a spin-glass phase that was argued to exist in the Sachdev-Ye model [48], and may also be present at exponentially low temperatures in the SYK model [62]. Such effects may also lead to a softer edge in the spectrum, again leading to a faster decay of the slope and a correspondingly shorter dip time.

JHEP05(2017)118
We are fairly confident that the dip time should be no later than e N s 0 /2 , based on the idea that neglected effects are not likely to make the spectrum vanish more sharply. As an extreme fallback position, one can argue without any calculation that the dip time is less than e N s 0 , which is enough to establish a parametrically long ramp at non-zero temperature. To make the argument, one assumes that the slow decay in the slope is monotonic and roughly independent of temperature, based on the idea that it comes from the edge of the spectrum. Note that t d can never be larger than t p , because at times t > t p the spectral form factor g(t) is only sensitive to individual energy levels, with all correlations between different levels getting washed out by the oscillating terms. For t > t p , g(t) is equal to the constant plateau height g p . This allows us to conclude that

Correlation functions
In this section we will discuss when two-point correlation functions exhibit ramp + plateau structure at late times. We will use the Eigenstate Thermalization Hypothesis (ETH) [63,64] to estimate matrix elements. As we will see, the answer depends on the (N mod 8) symmetry pattern [31,32,39], which is reviewed in appendix A.
As before, we focus on the annealed ('factorized') two-point function in which the disorder average is taken separately in the numerator and the denominator. This quantity is easier to study analytically than the quenched correlator. We note in passing that it is sometimes useful to consider the average of the squared two-point function [2,9], but for our purposes it will be enough to consider the average of the two-point function itself. Let us first determine whether the two-point function has a nonzero plateau. This can be determined by considering the following long-time average in a single realization of the random couplings.
Here, |n is the energy eigenbasis with energies E n in the random couplings realization, and ψ stands for any one of the fermions ψ i . (We neglect the effect of degeneracies for simplicity.) We expect a non-zero plateau to appear unless the matrix element vanishes. If N/2 is even then there is no degeneracy between the charge parity odd and even sectors (see appendix A). In this case the matrix element in (7.2) equals zero and the plateau vanishes. If N/2 is odd then we can use the particle-hole operator P to write down a selection rule for the matrix element. Let |n , |m denote degenerate states with E n = E m , such that P |m = |n . Then we are interested in whether m|ψ|n can be nonzero. We have m|ψ|n = |m , ψP |m = P ψP |m , P |m = η(N ) ψ|m , P |m = η(N ) m|ψ|n , 3)

JHEP05(2017)118
where we used inner product notation |1 , |2 = 1|2 for clarity. In the second equality we used the antiunitarity of P , and in the third equality we used (A.3). We conclude that a plateau can only appear when η(N ) = 1, or equivalently when (N mod 8) = 2. Next we ask when will a ramp appear in the two-point function. To answer this question, we put the disorder-averaged two-point function in the following form.
Here, ρ(E) is the energy spectrum in a given realization of the random couplings. Notice again that the matrix element E|ψ i |E connects eigenstates from two different charge parity sectors. We will again consider two separate cases, depending on whether N/2 is even or odd. If N/2 is even then there is no degeneracy between the two sectors. The two ρ factors that appear in (7.4) are de-correlated for sufficiently small energy differences (corresponding to late times), and we do not expect a ramp to appear.
If N/2 is odd then the two charge parity sectors are degenerate, so effectively there is only one sector. As discussed above, at late times the correlator probes small energy differences in the spectrum, where we expect each sector of the Hamiltonian to resemble a Gaussian random matrix. For such a matrix, the averages over eigenvalues and eigenstates factorize, and we can approximate Furthermore, for a Gaussian random matrix | E|ψ i |E | 2 J is a smooth function of the small energy difference |E −E |, as in ETH, and we approximate it by a constant. The value of this function at E = E determines whether there is a non-zero plateau, as discussed above. The remaining factor ρ(E)ρ(E ) J gives the spectral form factor. It will lead to a ramp, just as in the case of the observable g(t) discussed in previous sections.
To summarize, the two-point function will display the following combinations of a ramp and a non-zero plateau, depending on the value of (N mod 8).
• If (N mod 8) = 0, 4 then there will be no ramp or plateau.
• If (N mod 8) = 2 then there will be a ramp and a non-zero plateau.
• If (N mod 8) = 6 then there will be a ramp but no plateau (the two-point function will vanish at late times). Figures 8, 9 show a numerical computation of the two-point function that bears out these conclusions.
Finally, let us estimate the height of the correlator plateau written down in (7.2) in the cases where it does not vanish. This requires us to estimate the typical size of the matrix elements |ψ nm | 2 = | n|ψ|m | 2 . For typical eigenstates |n and |m , ETH predicts that the matrix elements will be of the same order as for random states, which would  give |ψ nm | 2 = 1/L. However, in our case, |n and |m are related by the action of the P operator, |n = P |m , so we are actually considering a diagonal expectation value |ψ nm | 2 = | m|ψP |m | 2 . Then ETH instructs us to estimate this by replacing |m with a random state. One can check that this also gives |ψ nm | 2 ∼ 1/L. The height of the plateau in the correlation function should then be of order 1/L. Notice that the spectral form factor plateau at β = 0, given by (1.4), is also of order 1/L. We therefore expect the correlator plateau and the spectral form factor plateau to be of the same order. This holds true for the N = 18 data, where the correlator plateau height is approximately 0.0075, and the spectral form factor plateau is at approximately 0.004.

The ramp in more general theories
One goal of this work is to evaluate how generic the ramp/plateau structure is in chaotic quantum field theories. In this section we ask whether it is plausible for this structure to appear in two-point functions of the form G(t) = O(t)O(0) in such theories. We will make two assumptions: that ETH holds for the theory, and that the late time behavior includes a ramp, as predicted by RMT.
If O is a simple operator, we expect the two-point function to approach its time average G p plus fluctuations of order e −S ∼ 1/L. We would like to make sure that ramp behavior is consistent with this expectation.
The correlator can be written in the energy basis as Here we assumed the spectrum is non-degenerate for simplicity. The first sum in (7.6), coming from terms with E n = E m in the double energy sum, exactly gives the plateau height G p . If the diagonal matrix elements |O nn | 2 are of order unity then we find a plateau height G p ∼ 1 as discussed above. The second sum in (7.6) encodes correlations between different energy levels. Beyond the dip time, it is responsible for the linear time dependence of the ramp. ETH predicts that the off-diagonal matrix elements |O nm | 2 are of order 1/L -much smaller than the diagonal ones. To get an estimate for the second sum we assume that these matrix elements can be treated as constant, |O off−diag. | 2 ∼ 1/L. The remaining sum then describes the ramp of the spectral form factor, sans the plateau contribution. Altogether, the two-point function (7.6) is given schematically by Note that the Z(β) factor in front of the parentheses is needed because the correlator is normalized differently than the spectral form factor. In writing the above expression, we are imagining that we are averaging over time somewhat, in order to supress fluctuations of G(t) and get a smooth ramp. This type of averaging will be discussed further in section 8. In any case, the conclusion of this analysis is that ; the difference is suppressed by 1/L as expected.

Single realization of random couplings
It is important to ask whether the late time features of the spectral form factor (the dip, the ramp and the plateau) appear in ordinary chaotic quantum field theories without a disorder average. As a first step towards addressing this question, in this section we consider the SYK spectral form factor g(t; β) the random couplings J ijkl . Figure 10 compares the single sample result with the disorder averaged spectral form factor. Before the dip time there is good agreement between the single sample and the averaged results. This is consistent with our expectation that in the large N limit and at early times, a typical sample should give a good approximation to the disorder-averaged spectral form factor. We say that the spectral form factor is self averaging at early times. At late times, and in particular after the dip time, the spectral form factor is not self averaging [65]. This implies that at large N a typical realization of the couplings does not give a result that approaches the disorder-averaged value. In particular, at late times a typical realization exhibits large fluctuations as shown in figure 10. We expect ordinary quantum field theories (with no disorder average) to have similar behavior. 24 Despite the large fluctuations, the underlying dip, ramp and plateau are still clearly visible. These features can be made clear by averaging over a sliding time window of width t ave , smearing out the fluctuations. 25,26 For this to work we need to be able to take t ave parametrically shorter than the length of the ramp, so that the features we are interested in will not get smeared out along with the fluctuations. To estimate the required size of t ave , consider the auto-correlations in the random variable |Z(β, t)| 2 , Set t to be a fixed time greater than the dip time. At such fixed t the autocorrelation h(t, dt; β) decays with dt with a typical time scale t decay . After t decay the signal is essentially uncorrelated. As we will see shortly, for t on the plateau and at large N and large β we have t decay ∼ 1/ √ N . For t on the ramp t decay ∼ 1 (We have suppressed the β, J dependence here).

JHEP05(2017)118
Given t decay we can estimate the minimal value of t ave required to remove the fluctuations from the single-sample data. We expect the fractional standard deviation (the ratio of the standard deviation to the mean) for such averaged data to behave like t decay tave . To have a curve with fluctuations smaller than, say, 1/N 2 would require t ave to be no greater than N 4 , even if t decay is as large as 1. Such a value of t ave is parametrically smaller than the length of the ramp, which is exponentially large in N . Therefore, at large N the averaging width t ave can be taken to be parametrically smaller than the length of the ramp. Now we show that on the plateau t decay ∼ 1/ √ N at large N and large fixed β. In the energy eigenbasis the autocorrelation function can be written as Let t be greater than the plateau time t p . For such t, and when dt is small, the first sum is dominated by terms which obey E k − E l + E m − E n = 0. For a chaotic spectrum we generally expect only two solutions. One solution, E k = E l , E m = E n , cancels with the disconnected part. The second solution, E k = E n , E l = E m , gives the approximate answer (Here we assumed that there are no degeneracies for simplicity.) We find that at very late times t the time dependence of the autocorrelation is given by the spectral form factor g(dt; 2β) at early times. At large β and small dt, equation (6.8) provides a good approximation to the spectral form factor, which decays as g(dt; 2β) ∼ exp −cN dt 2 /(2β) 3 . The typical decay time scales as t decay ∼ 1/ √ N , as advertised above. After a time of order a few β the exponential decay is replaced by a 1/(dt) 3 power law decay. By this time the spectral form factor (and hence the autocorrelation h) is already exponentially suppressed. A 1/(dt) 3 power law is integrable so it does not alter our above estimate for the required time averaging window.
On the ramp the analysis is more subtle. First we make the plausible assumption that the leading multipoint energy eigenvalue correlation functions at the exponentially small scales appropriate to the ramp are the same as the RMT correlators, up to 1/N q corrections. In GUE these correlators factorize into sums of products of sine kernels [36]. Then we can use a procedure like that leading to (6.3) to show that for most of the ramp after a time dt of order 1 the autocorrelation function h(dt) decays like 1/(dt) 4 . For the earliest part of the ramp, t < e N s 0 , h(dt) ∼ 1/(dt) 3 . These power laws are integrable and so we estimate that on the ramp t decay ∼ 1. This means that t ave can be chosen to make the error smaller than any power of N and still leave exponentially many data points on the exponentially long ramp. Numerics are not conclusive here, but do show a systematic decrease of error after time averaging.

JHEP05(2017)118
9 Conjecture about super-Yang-Mills The above ideas make it possible to give a conjecture about the behavior of g(t) and correlation functions G(t) in the canonical AdS/CFT duality AdS 5 /CFT 4 where CF T 4 is the N = 4 SU(N) super Yang-Mills theory on S 3 . We will assume that the fine grained spectral statistics of this system are described by random matrix theory. This seems highly plausible given that this system at large 't Hooft coupling λ is maximally chaotic, i.e., it saturates the chaos bound [15]. We also assume that there are no new intervening nonperturbative time scales governing the behavior of g(t) between the relatively short times governed by gravity and the very long times governed by random matrix theory. A distinctive aspect of this system compared to SYK is that at very high temperatures T the entropy becomes parametrically large and the plateau parametrically high relative to the dip.
There is no ensemble of Hamiltonians in this system so we want to describe the time averaged behavior of Z(β, t) as discussed in the previous section. We can relate this to the full density of states ρ(E) At early times and large N 2 we evaluate (9.1) by saddle point and use the bulk gravitational action to determine ρ(E). The initial behavior of Z(β+it) should then be given by analytically continuing the large euclidean AdS-Schwarzschild black hole action to complex β.
In the following we use the results and follow the notation of [67]. The black hole metric has warp factor V (r) = 1 − µ/r n−2 + r 2 /l 2 where µ ∼ G n M , n + 1 = D is the bulk spacetime dimension, and l is the AdS radius. The horizon radius r + is determined by V (r + ) = 0.
The inverse temperature is determined by finding the periodicity of time of the Euclidean signature metric, β = 4π(l 2 r + )/(nr 2 + + (n − 2)l 2 ) . (9. 2) The action I, (Z = e −I ) is given by Here and below C is a positive constant and G N ∼ 1 N 2 . The 3 8 l 4 term is specific to n = 4, D = 5. These Casimir energy type terms are missed without thinking about holographic renormalization [67]. The thermal AdS action in this scheme is We find it convenient to subtract the ground state energy, and study (9.1) via e −I where I = I BH − I AdS . In other words, we do not include the Casimir term. Now we analytically continue. As β → β + it, r + becomes complex. For small real β, r + ∼ 1/β. Adding a small positive imaginary part to β corresponds to adding a small negative imaginary part to r + . At large t, r + → −i n−2 n l.

JHEP05(2017)118
At t = 0, Z ∼ e cN 2 /β 3 which for small β is very large. This reflects the very large entropy at high temperatures. (Here and below c denotes positive constants of order one.) As t is increased and β → β + it, |Z| 2 drops very quickly. At t ∼ β, |Z| 2 becomes less than one. Then another saddle, the thermal AdS solution, dominates. Including the one loop determinant around this saddle representing a gas of gravitons we find |Z| 2 ∼ e c/β 3 , an N -independent much smaller value. 27 But this is not the whole story. As t increases to AdS scale |Z| 2 increases again, eventually becoming of order |Z| 2 ∼ e cN 2 (with no 1 β 3 enhancement). This apparently dominates over the thermal AdS again.
But there is another wrinkle. As t becomes large the solutions of V (r + ) = 0 coalesce. This causes the fluctuation corrections to the saddle point in (9.1) to behave like t N 2 , becoming large at t ∼ N 2 . Taken at face value these large corrections invalidate the saddle point analysis for times larger than this.
Other saddle points could be relevant here. For example, at high temperature, small β, there is a 10D small Schwarzschild black hole saddle point with r + l. Using the n = 9 version of the above formulas gives an initial |Z| 2 1, evolving at t ∼ β to |Z| 2 ∼ e cN 2 and then rapidly decreasing to |Z| 2 1 again. But at t ∼ l, r + becomes of order l and AdS corrections become important. A more careful analysis would be required.
Although it is never thermodynamically dominant, the recent analysis of [68] indicates that there is a Gregory-Laflamme-type 5D to localized 10D transition in the space of saddles. At first glance this could produce a singularity in ρ(E) leading to a slow long-time falloff. If this transition is caused by a single mode becoming tachyonic then it produces a branch point singularity in Z which presumably can be analytically continued around. If there is a more serious kind of large N transition it may produce a more extreme form of singularity. In any event, at large but finite N this feature will be smoothed out, so we do not expect it to produce significant long-time effects past times polynomial in N .
Although this analysis is far from conclusive 28 it does seem like the most plausible values of |Z| 2 in the slope region leading up to the dip have N scaling |Z| 2 ∼ 1 or |Z| 2 ∼ e cN 2 . We will assume these values and compute the dip by matching onto the ramp, to which we now turn.

The ramp in SYM
SYM at large 't Hooft coupling λ is maximally chaotic according to the out of time order correlator diagnostic, so it is plausible to conjecture that its fine grained eigenvalue statistics are described by random matrix theory. The relevant ensemble will be determined by the symmetries of the system. For simplicity let us imagine that a nonzero θ term is present to 27 To be precise, there are of order 1 β 3 weakly interacting gravitons of AdS energy and so Z(β+it) oscillates with AdS frequency. 28 As an example of the subtleties here, for n = 5, D = 6 the dominant saddle causes |Z| 2 to diverge as

JHEP05(2017)118
break the T symmetry. Then we expect GUE statistics. 29 For simplicity, we will discuss the case of a high temperature state, where β is small compared to the spatial S 3 radius. We can outline a simple expectation for the ramp behavior using the formula (6.3). We interpret the expectation value · as denoting a time average rather than a disorder average, as discussed in section 8. The procedure is equivalent to "unfolding" the spectrum, analyzing the ramp and plateau for each narrow energy band, and then adding them up together. First, we study the ramp at reasonably late time, t > e #N 2 , where the relevant energies will be high enough that we can use planar SYM formulas for S(E): At large N the integral in (6.3) will be sharply peaked, and the band that makes the largest contribution at time t will be the band which is just reaching the plateau at time t. That is, S(E) = log t. Using the above equations we then have The growth is somewhat slower than linear, and the ramp joins the plateau at time t p = e S(2β) = e 4c 0 N 2 /(2β) 3 where the derivative of (9.8) vanishes.
To understand the dip time t d we need to work out the behavior of the ramp at earlier times. It is possible that weak interactions in the AdS gas could lead to a small ramp, but we focus our attention on the region where the ramp would be associated to black hole states. The smallest black holes that dominate the microcanonical ensemble are determined by microscopic parameters, as discussed by [69][70][71], but in fact these black holes give contributions to the ramp that are smaller than the slope contribution to g(t). To see this, we consider 10D Schwarzschild black holes of mass E with Schwarzschild radius r s much smaller than the AdS radius l where (in the remainder of this section we suppress numerical factors) Here G N = l 8 /N 2 is in 10D. The contribution of such black holes to the ramp would be 9) where we used that the integral is dominated by the value of r s such that t = e N 2 r 8 s /l 8 . The dip time is the first time such that this contribution is larger than the contribution of 29 We thank Alex Maloney for this observation.

JHEP05(2017)118
the slope. The expected slope contribution is no smaller than 1 Z(β) 2 . Eq. (9.9) first exceeds this value when the dominant value of r s is r s = β, or equivalently at a time t d = e N 2 β 8 /l 8 . (9.10) So we see that the first black holes that are relevant are small, with r s ∼ β, but not microscopic. The associated dip time is exponential, but with a parametrically small coefficient at high temperature. Note that this conclusion is rather sensitive to the longtime behavior of the slope, so this identification of the dip time is tentative. For example, if we have g slope = 1 Z(β) 2 e cN 2 instead, then we would expect t d ∼ e c N 2 for an order one c . Either way, at high temperature we have a large hierarchy between the dip time and the plateau time t p ∼ e 4c 0 N 2 /(2β) 3 , leading to a parametrically long ramp. 30 The early time behavior of AdS 3 /CFT 2 is under greater analytic control and has been analyzed in [8]. There is also an exponential hierarchy, although not as large, in this system.
It would be interesting to consider observables that probe the ramp during earlier times where microscopic black holes are relevant. One possibility would be to directly consider a microcanonical partition function that selects this part of the spectrum.
There is a subtlety in these estimates. SYM and many other theories have global symmetries (like the SO(6) R symmetry). We expect the spectrum within each sector to have chaotic RMT correlations, but the different sectors would be essentially uncorrelated. We expect the number of thermodynamically significant sectors at fixed β to be at most polynomial in the entropy S. If we denote the separate sectors by indices a, b we can write g(t) = a,b g ab (t) where g ab contains the sum over energies in the fixed sectors a, b. The diagonal terms in this sum contribute as usual to the ramp and plateau; the off-diagonal terms have vanishing contribution at late time and large S. So the overall heights of the ramp and plateau are suppressed by polynomials in S. This effect is subleading to the exponential effects we are interested in and so we ignore it. We have confirmed these ideas in the Dirac SYK model which contains a U(1) charge.

Discussion
In this paper we have argued that the late time behavior of horizon fluctuations in large AdS black holes is governed by random matrix dynamics. Our main tool was the SYK model, which we used as a simple model of a black hole, adequate for such qualitative questions.
Using numerical techniques we established random matrix behavior at late times. We were able to determine the early time behavior precisely in the double scaling limit. This enabled us to give a plausible estimate for the dip time by computing the intersection of these two curves. 31 The dip time is exponentially late, and the ramp region, controlled by 30 In fact here the hierarchy is more dramatic than in SYK because the plateau can be made arbitrarily high by increasing the black hole temperature. 31 As noted earlier there could be new phenomena at early times, like spin-glass behavior, or 1/N q effects absent in the double scaling limit. It seems quite plausible these would cause the slope to decay faster and so make the dip time earlier.

JHEP05(2017)118
long-range spectral rigidity, is exponentially long, stretching until the asymptotic plateau behavior sets in. It will be useful to have analytic insight into the ramp behavior in the SYK model. In appendix F we make some preliminary remarks about the origin of the e −2S scale of the height of the ramp in this model.
We used these ideas to formulate a conjecture about more general large AdS black holes, like those dual to 4D SYM theory. Here we rely on the widely accepted intuition that the fine grained structure of energy levels in chaotic systems is described by random matrix theory. We then estimated the time at which the ramp appears by making a provisional estimate of the analytically continued 5D AdS-Schwarzschild black hole metric. Again we find an exponential hierarchy of scales. 32 The early time behavior of AdS 3 /CFT 2 is under greater analytic control and has been analyzed in [8]. There is also an exponential hierarchy in this system.
In all of these situations the dip time does not signal a new physical phenomenon 33 -it is just the time when the ramp becomes larger in size than the slope. To understand the new physics of the ramp it would be interesting to follow it "underneath" the slope to see what happens at early times. For instance in SYM one would start accessing regions controlled by string scale black holes, and eventually the chaotic graviton gas. To do this it might be useful to use a more refined probe than g(t).
A more indirect strategy would be to look for precursors of the ramp starting from short times. Do the individual terms in the 1/N expansion get large as time is increased? 34 Or is there just a factorial growth of coefficients signaling an asymptotic expansion with an exponentially small error sufficient to accommodate the ramp and plateau signals? Knowing this would be helpful in looking for signals in SYM of these phenomena.
To understand the SYM situation better it would be useful to understand more about the averaging procedures that are available. Averaging over time windows has been discussed in section 8. But perhaps one could take an ensemble of SYM theories with slightly different parameters. This possibility may be easier to implement in calculations.
Another set of ideas that might be useful are developments in the theory of sparse random matrices. From this point of view the SYK model is a certain type of sparse random matrix with correlated randomness in the entries. Insights have emerged [58,[72][73][74] about the universality of dense random matrix behavior in the fine grained eigenvalue statistics of various types of sparse matrices. These might give clues about the SYK model, and more general contexts. This is a question we would like to return to in future research.
Perhaps the central question this work raises is the nature of the bulk interpretation of the random matrix behavior. The disorder averaged SYK model can be rewritten exactly in terms of the bilocal collective fields G(t, t ), Σ(t, t ). For g(t) one needs two copies 32 In fact here the hierarchy is more dramatic because the plateau can be made arbitrarily high by increasing the black hole temperature. 33 If the spin glass or 1/N q possibilities are present then there is a new physical phenomenon at the dip. 34 The disconnected partition function g d (t) provides an example of this. As discussed above, the gaussian fluctuations of the edge of the eigenvalue distribution produce a gaussian falloff in g d (t). These are signaled by a series of terms of the form (t 2 /N q−2 ) k . This softening cancels out in the time dependence of g(t).

JHEP05(2017)118
("replicas") of the fermions and so the G αβ , Σ αβ fields carry replica indices α, β = 1, 2. An appropriate contour can be chosen so the functional integral over G αβ (t, t ), Σ αβ (t, t ) is nonperturbatively well-defined, as discussed in appendix C. This functional integral is a rough proxy for a bulk description, because it involves O(N ) singlet objects and in some rough way bulk fields should be able to be reconstructed from the bilocal singlet fields. This functional integral must contain the ramp and plateau behavior -the question is how.
We cannot yet answer this question -it will continue to be a focus of our research. Here we will just make some preliminary comments.
The coefficient of the G, Σ action includes N , so new saddle points are a natural mechanism for such e −N effects. For q = 2, as explained in appendix E, quenched correlators do seem to be described by sums over new saddle points with appropriate fluctuation corrections. Here the ramp is a perturbative 1/N 2 effect and the plateau is an e −N effect.
For the interacting case q > 2 the situation is qualitatively different. Here the interplay between L and N discussed in section 5 becomes crucial. The ramp is a 1/L 2 effect, which means an e −N effect. In appendix G we point out obstacles to a possible single saddle point explanation of the ramp in the correlator G(t). But various auxiliary quantities like the f k discussed in appendix F can be computed by saddle point, giving the desired 2 −N value for large k. It is unclear whether this has anything to do with an actual saddle point description of the ramp involving a sum over many saddle points.
The N mod 8 "eightfold way" pattern noted in [31,32,39] must have an explanation in the G, Σ integral. In some ways it seems analogous to the behavior of the Haldane spin chain [75] as the spin varies from integer to half integer. There the explanation in the continuum is a topological term in the action. That would be a natural guess here, and the question is what topology is being probed. As an initial step it will be important to find the origin of this effect in the moment calculations discussed in appendix F.
The origin of the plateau in the G, Σ integral is another mystery. After continuing to imaginary energy this is an effect of order exp (−L) which is of order exp (−e N ). This is an unusually small nonperturbative effect, the size of the error in an asymptotic series of multi-instanton corrections. A more natural way to explain these effects would be a mapping from G, Σ to new effective random matrix degrees of freedom with effective coupling 1/L whose dynamics would give the plateau as a standard Andreev-Altshuler instanton nonperturbative effect [55,56]. This map would be related to reconstituting the fermions from the collective fields. 35 This is a challenging proposition but the SYK model provides the most concrete arena known in which to explore it.  (September 17, 2016). We thank these institutions for their hospitality. In accordance with institutional policy the data used in the preparation of this paper is available to other scientists on request.
A Particle-hole symmetry of SYK In the Dirac description (2.3) the Hamiltonian has conserved charge parity, where the charge (fermion number) operator isQ = ic i c i . The Hamiltonian (2.1) has two sectors for charge parity even and odd.
The theory also has a particle-hole symmetry under the operator [31, 32, 39] where K is the anti-linear operator that takes z →z, z ∈ C (here we choose c i ,c i to be real with respect to K). One can check that The action on the fermions is given by One can now check that P is a symmetry, For some values of N d this leads to a degeneracy in the spectrum. P maps a state with fermion number Q to N d − Q (in our convention the Fock space vacuum has fermion number 0).
1. If N d = N/2 is odd then P maps the even and odd charge parity sectors to each other, and so the two sectors are degenerate.

JHEP05(2017)118
2. If N d = N/2 is even then P maps each charge parity sector to itself.
(a) If (N d mod 4) = 2 then P 2 = −1. Since P is both anti-linear and obeys P 2 = −1 it cannot map energy eigenstates states to themselves, and we have double degeneracy within each sector.
(b) If (N d mod 4) = 0 then P 2 = 1. In this case there is no protected degeneracy.

B The double-scaled SYK theory
In this appendix we compute the disorder-averaged spectrum of the SYK theory in the double-scaled limit The computation is a small modification of the analysis by Erdős and Schröder in [58] for closely related systems (composed of Pauli matrices with random couplings instead of Majorana operators). The argument goes as follows: first, we compute the moments tr H k . Then, we appeal to a combinatoric result in [76] to get the distribution for which these are the moments.
First, we discuss the computation of the moments. We would like to evaluate for k even. We evaluate the J integral by Wick contractions. This involves pairing up the various terms in different H factors and contracting the flavor indices of the fermions that appear in the pair. If all of the Wick-contracted pairs were adjacent in the product, we could evaluate each pair as 1 2 q , since ψ i ψ i = 1 2 . Taking the product over the pairs and summing over the possible fermion flavors that can occur in each pair, we get tr H k assuming all pairs next to each other [14]. Now, of course we also have to consider cases where Wick-contracted pairs are not adjacent. The procedure is to commute the terms past each other until the contracted pairs are adjacent or nested, so that Wick-contraction lines do not cross.
Let's consider what happens when we move one product of fermions past another. Notice that ψ a 1 . . . ψ aq ψ b 1 . . . ψ bq = (−1) # fermions in common ψ b 1 . . . ψ bq ψ a 1 . . . ψ aq . (B.4) The important feature of the limit where we hold q 2 /N fixed is that the expected number of fermions in common stays of order one in this limit. More precisely, the number is Poisson distributed, with distribution P (m fermions in common) = λ m m! e −λ , λ = q 2 N . (B.5)

JHEP05(2017)118
Now, in principle, things will get complicated because we have to consider the possibility that the same fermions that are shared between two copies of the Hamiltonian containing ψ a 1 · · · ψ aq and ψ b 1 · · · ψ bq might also be shared with a third copy containing ψ c 1 · · · ψ cq . Or, more generally, that the number of such terms might be correlated. However, the probability is proportional to 1/N , without a q 2 enhancement, so we ignore it in the double-scaled limit. This is the key point that makes it possible to solve. So now, each time we have to commute a product of q fermions past each other, we get a factor ∞ m=0 (−1) m P (m fermions in common) = e −2λ . (B.6) Notice that this step differs somewhat from the case considered by [58]. Specifically, this is where the fact that we have Majoranas instead of spins is relevant. In the spin case, the analogous sum gives e −4λ/3 . Anyhow, doing this sum independently for each set of fermions that we need to commute past each other, we can now correct the expression (B. Here cross() gives the number of commutations required to get the pairs arranged in a way so that they are all adjacent or nested. We can describe this target situation by saying that lines connecting the Wick pairs will not cross. Then cross() is just the initial number of crossings of Wick contraction lines. The final step is to notice [58] that the distribution with these moments is known [76]. It is related to the integration measure for the Q-Hermite polynomials, with Q = e −λ . The distribution is given by , for |a| < 1 and zero otherwise. The normalization factor can be determined from the constraint that the total number of states is 2 N/2 . It is convenient to rewrite ρ as follows In the second line we used the Poisson resummation formula. In the last line we did the n integral by contour integration, summing over a geometric series of cuts of finite length

JHEP05(2017)118
along the imaginary n axis. This formula is now in a convenient form for discussing the behavior at small λ.
For example, if we take λ → 0 with a fixed, the first term dominates, and exactly reproduces the large q thermodynamics computed in [14].
Our primary goal is to use this to evaluate the partition function of the Schwarzian theory, so we take a further "triple-scaled" limit In this limit, we approximate a = −1 + zλ 2 + O(λ 3 ) and we have We conclude that in the triple-scaled limit we have One can check that for small λ the normalization factor is N = 2 N/2 J λ π , which leads to where E 0 = − J λ and S 0 = N log(2) 2 − π 2 4λ . This agrees with the 1-loop calculation of [14], but here we conclude that it is the exact answer in the triple-scaled limit that isolates the Schwarzian.
Finally, we will mention that there is another way to analyze the double-scaled limit, starting from the G, Σ action for the disorder-averaged partition function: To take the double-scaled limit, we write where now g(τ 1 , τ 2 ) is a symmetric function of its two arguments that is constrained to vanish when they coincide. The action in the double-scaled limit is

JHEP05(2017)118
Notice that σ appears quadratically, so we can integrate it out exactly. We get which has the form of a Liouville action on a Lorentzian space. One can analyze this theory by studying perturbation theory in J 2 . This is equivalent to computing moments as in the Erdos-Schroder analysis. Note that at each order in J 2 we have a simple Gaussian integral.

C A toy G, Σ integral
In the main text, we asserted that G, Σ give a nonperturbatively exact formulation of the disorder-averaged SYK model. In this appendix, we discuss a toy model for the G, Σ path integral. We discuss the contour of integration and saddle points, and we see how Grassmann behavior can arise from bosonic variables. The example that we will discuss can be thought of as the SYK Grassmann path integral on a space where we replace the time dimension by two points, labeled 1 and 2. Then the fermion variables are ψ i (1), ψ i (2) where i = 1, . . . , N . Concretely, the integral we consider for fixed disorder is (C.1) The average over couplings gives We can write this as a G, Σ integral by the standard manipulation: we introduce a variable σ that is a Lagrange multiplier that sets g = 1 N i ψ i (1)ψ i (2). This leads to the expression (C.5) We would now like to describe how to make sense of this integral. The defining contour of integration for σ is along the imaginary axis, and we start by formally integrating g along th real axis. We then evaluate the integral as follows: if we bring down the log(1 + σ) term and expand in powers of σ, we will have σ integrals of the form N 2πi dσσ p e −N σg = N −p (−∂ g ) p δ(g). This leads to (C.7)

JHEP05(2017)118
This is the right answer, and we got it from an integral over bosonic variables, but the final g integral was supported in a neighborhood of the origin, and the calculation reduced rather trivially to a direct fermionic computation of (C.3). However, we can also change the contour and make the integral more manifestly welldefined. We rotate the g and σ contours in opposite directions by e iπ/q . Here it is simplest to define new variablesσ = −iσe −iπ/q andg = e iπ/q g. Then we have where we integrateg,σ over the real axis. It is easy to check that numerical integration first overg and then overσ indeed gives the correct answer (C.7) for a few values of N, q.
One can also discuss saddle points for this integral. For these purposes we go back to the g, σ variables. There are q saddle points, the solutions of the equations There is one real solution, and this is the one that naively dominates. We have not analyzed the deformation of the integration contour in detail, but we observe that this leading saddle does in fact give the right large N behavior, comparing to (C.7). A confusing aspect of the G, Σ representation is that the fundamental variables are Grassmann variables, and we could ask how this is consistent with a representation by g, σ.
For example, the fact that the square of a Grassmann vanishes should imply that g N +1 = 0. This seems inconsistent with the fact that we are integrating over nonzero values ofg, and indeed studying saddle points with g nonvanishing. In fact, one can check that an insertion of g p with p > N will make the integral zero. This is easiest to see from (C.6), based on the fact that we are at most taking N derivatives of the integrand before setting g = 0, so a term of degree N + 1 will give zero.

D Subleading saddle points in the G, Σ variables
Besides the standard saddle point that gives the themodynamics discussed in section 4, there are a family of subleading saddles for the path integral (4.2). We do not have their explicit form for q = 4, but we can understand some of their properties numerically, and by comparison to the simpler q = 2 theory.
In the q = 2 model, the saddle point equations for different Matsubara frequencies decouple, and we have Choosing G + for all frequencies gives the dominant saddle. Choosing G − for some of the frequencies will lead to subdominant saddles. The difference in saddle point action induced by choosing G − (for both ω n = 2π β (n + 1/2) > 0 and the corresponding −ω n ) is For large βJ, we see that the saddles become almost degenerate. Naively, this would suggest a soft mode connecting the saddles, but because the imaginary part differs by an order one amount, we do not have such a mode. However, at large βJ N one would have to sum over all of these saddles. We will see that they play an important role in appendix E.
In the q = 4 model we do not have explicit formulas, but we can find subleading saddles numerically. It seems that for each q = 2 solution there is a corresponding q = 4 solution, which can be found by starting with the q = 2 solution and iterating the Schwinger-Dyson equations while slowly increasing q from two to four. We give a plot of some solutions in figure 11. An important difference between the q = 2 and q = 4 cases is that the actions do not become degenerate at large βJ. For the simplest case, where we start with a q = 2 solution with a single frequency pair ω n flipped, we find numerically that the q = 4 action is given by We are not sure that this simple expression is exactly correct, only that it is within a percent or so of the numerical answer for the first few frequencies n = 0, . . . , 5 where we were able to check. The important point is that there is a large N 2 gap in the action even at very low temperature. This explains why these additional saddles do not disturb the large N thermodynamics. A logical possibility is that the relative dominance of these saddles could change when we study complex β, but preliminary investigation suggests that this is not the case, and that the gap remains.

JHEP05(2017)118
Finally, we will mention that in the q = 4 theory there also appear to be saddle points that depend nontrivially on both of the time arguments, not just the difference. In other words, we have saddle points that spontaneously break time translation invariance. We have not studied these systematically, but the examples that we found numerically had larger action than the standard saddle.

E Saddle points and the q = 2 model
The q = 2 model is qualitatively different than the model with q > 2, since it is equivalent to a model of free fermions with a random mass matrix [26,77,78]. It is not a chaotic system, but the explicit N × N random matrix leads to a "mini-ramp" and "mini-plateau" in certain quantities, with plateau time t p ∼ N instead of t p ∼ L. In this appendix we show how the saddle points discussed in appendix D contribute to this behavior.
The Hamiltonian of the q = 2 model is where J ij is a real antisymmetric matrix. Conjugating with an orthogonal matrix Q, we can take J ij to a block diagonal form with each block given by where λ k > 0 and with k running from 1 to N/2. Then the Hamiltonian can be written as Whereψ i = (Qψ) i , and we made Dirac fermions out of these pairs of Majoranas, Notice that iJ ij is a skew Hermitian matrix, not a GUE Hermitian matrix. Its eigenvalue statistics are known [26,36]. At large N the spectrum is a semicircle, with 1/N corrections. Eigenvalue (mass) pair correlations R 2 (λ, λ ) are described by a modified sine kernel whose short distance behavior is that of GUE.
It follows that eigenvalues in the single particle sector will repel, because of the usual eigenvalue repulsion of a random matrix. However, nearby multiparticle eigenvalues coming from sectors with very different particle numbers will repel only weakly. Because the eigenvalues that repel each other have an average spacing ∼ 1/N instead of 1/L, we expect that the plateau time in this model is t p ∼ N .
The simplest observable in this model with a ramp is the (quenched) disorder averaged squared correlation function. It turns out this is easier to calculate than g d (t). The averaged correlation function (not squared) does not have a ramp.
Part of the reason that the correlation functions are easier to calculate is that the matrix elements of ψ i , n|ψ i |m , are only nonzero for |n , |m belonging to particle number JHEP05(2017)118 sectors differing by a particle number of one. This means that the correlation function only receives contributions from energy differences that are equal to the single particle sector energies, making the calculation much simpler. Explicitly, the Euclidean quenched correlation function is Hereρ(λ) is the average mass density. In the above integral we are extending it to a symmetric functionρ(−λ) =ρ(λ). We can see that the real time correlation function, obtained by continuing τ → it in the above expression, will not have a ramp or plateau. However, the connected part of the quenched disorder averaged square of the correlation function will have a ramp and plateau Note that the annealed correlator cannot be written simply in terms of R 2 (λ, λ ). After analytically continuing τ → it and τ → it , because of the presence of R 2 (λ, λ) , G 2 c (t, t ) will have a ramp and plateau. In particular, at β = 0 it is precisely equal to g c (t) for the ensemble of skew Hermitian matrices.
This simple expression for the square of the averaged correlation function in terms of the mass pair correlator suggests that it may be possible to calculate in a simple way by saddle point. Kamenev and Mezard [56] calculated R 2 (λ, λ ) in the GUE by saddle point with an integral that is very similar to the path integral (4.1) with q = 2. 36 This is why we want to consider the quenched disorder averaged correlation function instead of the annealed disorder average correlation function (where we would J average the denominator and numerator in (E.5) separately).
The quenched disorder averaged squared correlation function in Matsubara frequency space, G 2 c (ω n , ω m ), can be calculated by coupling sources z n to the operators Let Z({z}) be the partition function with the source term included, The key simplification is that the partition function is a product over all the frequencies, Z({z}) = ∞ n=0 Z n ({z n }), and since the logarithms of these products turn into sums over the different frequencies, the derivatives simplify. We find (E.8) 36 The integral that [56] calculated is closer to the integral (4.1) over only one of the Matsubara frequency modes.

JHEP05(2017)118
Now we evaluate the averaged logarithms of the single frequency factors of the sourced partition function. This is almost exactly the calculation of Kamenev and Mezard [56]. 37 They use the replica trick to rewrite the average of the logarithm in terms of the average of the replicated partition function. They then evaluate the replicated partition function by the saddle point approximation. Their saddle point equation for the average of a single logarithm is equivalent to the equation obtained by combining the saddle point equations for G(ω n ) and Σ(ω n ) for one frequency (D.1), except that we now account for the source with a shift of ω n , −iω n → −iω n + z n .
For the average of the product of logarithms, the saddle point equations have a mixing term. These equations are quadratic and thus have two solutions, G + (ω n ) and G − (ω n ) (D.2). Choosing a replica symmetric solution with G + for each replica gives the dominant contribution to the integrals, the fluctuation is the first term that survives. These contributions correspond to the semicircle part of the mass distribution and mass pair correlation function, and the fluctuations give the ramp. Considering a replica symmetry breaking saddle point involving both G + (ω n ) and G − (ω n ) gives the sine kernel type contribution to R 2 (λ, λ ) in G 2 c (ω n , ω m ), and thus gives us the plateau. As we noted above, calculations of Z(t) and g(t) using saddle points will be more complicated. It appears that the Itzykson-Zuber integral [79] will be helpful. We hope to return to this issue in future work.
F On N −q vs. 2 −N It would be nice to have a direct analytical argument for the ramp and plateau in SYK. As a first step, one would like to understand where the e −2S scale of the ramp comes from. Naively, this is puzzling, because the ramp arises from correlations between the two replicas, and in simple diagrams such correlations are suppressed by powers of N q , not exponential factors. In this appendix, we make a simple comment about how the exponential can emerge from such diagrams.
We start by defining the quantity In principle, knowing F k 1 ,k 2 makes it possible to evaluate the double resolvent tr 1 z−H tr 1 w−H . By taking discontinuities in both z and w across the real axis, one gets an expression for the pair correlation function ρ(z)ρ(w) , which gives rise to the ramp. This procedure has been carried out for the GUE ensemble by Brezin and Zee [54]. At leading order in 1/L 2 , one considers planar graphs only: most of the Wick contractions do not contribute, and many of the remaining graphs for the double resolvent can be summed by replacing z, w by dressed propagators. All that remains is a special class of graphs where we take k 1 = k 2 = k and then Wick-pair the Hamiltonians in (F.1) "straight across" 37 Their calculation applied to the GUE, while ours applies to the ensemble of skew Hermitian matrices.
The difference between our integrals comes from the reality constraint on the Majoranas, which gives a different result at order 1/N.

JHEP05(2017)118
up to an overall reflection. More explicitly, the first H factor in the first trace is paired with the k-th factor in the second trace. The second factor in the first trace is paired with the (k − 1)-st factor in the second trace, and so on. We refer to the result of this special contraction as f k . 38 In GUE one finds f k = 2 −N , which is the origin of the 2 −N coefficient of the ramp. The linear time dependence arises from a singularity in the geometric series that defines the double resolvent, and in particular is sensitive only to the f k for large k. This is an important point so we will emphasize it: the short-distance correlations between eigenvalues, or equivalently the late-time behavior of the ramp, is related to the large k behavior of the f k or F k 1 ,k 2 coefficients.
In SYK, the class of graphs that must be summed at leading order is larger than in GUE. In particular, we have to think about the 1/N expansion instead of the 1/L expansion. We will not attempt to analyze the sum in a systematic way. Instead, we will simply comment on the behavior of the special class of graphs that we used to define f k above, because these already provide a model for the handoff between N −q and 2 −N .
We define f k the same way as above: we let k 1 = k 2 in (F.1) and we Wick-contract the couplings in each factor of H in the first trace with the corrresponding (reflected, as before) factor of H in the second trace. This is equivalent to the following: we imagine writing a product of k of the possible terms that appear in the Hamiltonian. Then f k is simply the probability that such a product has a nonzero trace. For small values of k, f k is suppressed by powers of N q , as expected for a two-replica correlation. For example, ∼ N −q . However, for large values of k, f k approaches a constant value of 2 1−N .
This is because for a product of fermions to have a nonzero trace, we must have an even number of each flavor of fermion, leading to N binary constraints. The exact formula is For large values of k the largest α dominates the sum. This is the value α = 1 when m = 0, N , which leads to f k ≈ 2 1−N . In particular, for the large values of k relevant for the late-time ramp, we find the same behavior as in GUE, for any value of q. We suspect that this is a hint of the universality of local random matrix statistics, and is the basic point behind the origin of the e −2S ∼ 2 −N ramp. We are currently working to make this more precise.

G Constraints on saddle point origins of the ramp
As explained in section 5, the late time plateau is a highly non-perturbative effect in SYK that is expected to involve effects as small as exp(−e N ), based on random matrix theory analysis. On the other hand, the ramp scales as e −N and so it may be a more tractable 38 The contribution of all contractions related to such a configuration by cyclicity would be kf k .

JHEP05(2017)118
non-perturbative effect. In particular, random matrix theory tells us that the part of the ramp that is linear in time is a perturbative effect in RMT, and this part may be an ordinary non-perturbative effect in SYK. In this appendix we make a few comments about the simplest possible approach to explaining the ramp -finding a nontrivial saddle of the original G, Σ action. But because G is small the source log G in the action will deform the saddle point. There is backreaction.
Such a saddle would have to satisfy constraints. First, in order to account for the N mod 8 periodicity discussed in section 7 there would have to multiple saddles with complex action.
The second constraint is more nontrivial. As discussed in section 8, the ramp and plateau are not self-averaging (both in the two-point function and in the spectral form factor) [65]. The fluctuations on the ramp are of the same size as its mean value. But a saddle point explanation requires that we have a limit in which fluctuations are suppressed.
This argument may seem a bit quick because the large fluctuations we are discussing are in the integral over random couplings, but this integral can be performed exactly. In particular, in the G, Σ formulation the disorder integral is done first, followed by the integral over the fermion variables, and we are left with an integral over the G, Σ variables. We checked that the latter integral also exhibits large fluctuations on the ramp and plateau (of the same order as the mean value), by numerically comparing the variance G(t) 2 − G(t) 2 with the mean, directly in the original fermion formulation.
It is possible that the saddle point backreaction for G and for G 2 is delicately tuned to make these answers consistent with numerics, but we see no obvious mechanism for this.

H Data
This section contains some further numerical results. We first present g(t), g c (t), and g d (t) for β = 0, 1, 5 for N = 16, 18, . . . , 34 and discuss the dip-ramp-plateau features of g and g c , which exhibit the mod-8 symmetry pattern. The methods for determining the dip time t d and the plateau time t p are explained next, with the results for N = 10, 12, . . . , 34. We compare the fit of t d with an exponential and a power-law function. The error bars are large but the results for larger N are consistent with the estimate in section 6. They are also consistent with other scenarios involving a crash at earlier time. The available N values are not large enough to disentangle all these effects.
The plateau time t p shows a faster exponential increase, and the numerical result is compared with the results of sections 4 and 6. This, together with the results for t d , show that the ramp length grows exponentially in N . For N = 34 we have fitted the ramp power law omitting times near t p where unfolding effects are important. We find a power consistent with the GUE behavior g(t) ∼ t 1 within a couple percent.
All g(t), g c (t), and g d (t) data discussed so far has been for factorized (annealed) quantities, as in H.1 Plots of g(t), g c (t), and g d (t) In figure 12 we plot g(t), g c (t), and g d (t) on a log-log scale. The oscillation observed for β = 0 before the dip time is also visible for β = 1 but becomes negligible for β = 5. It is due to interference between the upper and lower edges of the eigenvalue distribution.
g d (t) decays quickly to typically much smaller values than g(t) or g c (t) around the dip time. This is consistent with the theoretical expectation of a gaussian falloff due to fluctuations in the edge of the eigenvalue distribution at times of order N (albeit with a somewhat large coefficient). Such effects cancel out in g(t). (Beyond the dip time g d (t) seems to rebound. This is just because the number of samples is finite and hence the cancellation is not perfect.) Around the plateau time, the curves for g(t) and g c (t) exhibits a sharp peak for N = 20 and 28 (GSE), a kink for N = 18, 22, . . . , 34 (GUE), and a smoother connection for N = 16, 24, 32 (GOE), for β = 0. For β = 1 the feature is preserved, while for β = 5 the peak is broadened and the kink is less visible. However, the plateau heights for N ≡ 0 (mod 8) (GUE and GSE) cases appear shifted up compared to those for N ≡ 0 (mod 8) (GOE) cases, for all values of β, and the plateau heights for N = 18, 26, 34 are higher than those for N = 16, 24, 32 for β > 0. All of this is consistent with the RMT interpretation, symmetry considerations and smoothing due to unfolding effects.  Intuitively, the dip time can be determined by finding the minimum value of g. However, with finite statistics, the error is large because of the non-self-averaging nature of g(t) past the dip. Therefore, we estimated the error bar as follows. Firstly we found the minimum value g min . Then, the lower and upper limits of the error bar are estimated as the smallest and largest t which give g(t) < g min × 1.04.
We can fit t dip with an exponential function of N t 0 e κ d N . κ d does not exhibit clear dependence on β from our data (although we expect a weak dependence theoretically). The error bars are large but the results for larger N are consistent with the estimate in section 6. A power-law fit (t d ∼ t 0 N α d ) cannot be ruled out from our data up to N = 34. Again, the available N values are insufficient for a conclusive analysis here.
As discussed in the main text, the function g(t) reaches a plateau at exponentially late time. Numerically, we find that the height agrees with the expectation Z(2β)/Z(β) 2 when we take an average with sufficiently many samples. The plateau t p is defined by fitting the ramp by a power-law of the time (linear function in the log-log plot) and the plateau by a constant, and finding the crossing point of the two lines. We choose the starting point of the fitting range for the ramp as t s = 5 t d if g(5 t d ) < 0.7 g p , otherwise we use the time at which g(t s ) = 0.4 g p . The end of the fitting range is the time at which g(t e ) = 0.7 g p , and we fit log(g(t)) by a linear fit and find the time at which the line reaches log t p . In the right panel of figure 13 we plot t p against N .
As explained in section 6, we expect t p ∼ const. exp(S(2β)). Also, as explained in section 4, the expression for entropy at low temperature is S(2β) = (0.23 + 0.198/β + · · · )N + · · · . As we have seen t p ∼ e κpN and t d ∼ e κ d N , where κ d < κ p = S(2β). Hence log(t p /t d )/N ∼ κ p − κ d should be constant up to 1/N . We observe that κ p − κ d > 0. Therefore, the length of the ramp seems to increase exponentially in N , consistent with section 6. Of course our values of N are not large enough to make definitive statements.

H.3 Comparison of factorized and unfactorized quantities
As explained in section 3, there are two options for defining the spectral form factor. Namely, the factorized, or annealed, quantities (3.1), (3.2), and (3.3), and the unfactorized, or quenched, versions where one averages over J after dividing by Z(β) 2 . These two choices agree when the quantity of interest is self-averaging (up to order 1/N q ). Therefore, g and g u must agree at early time. Numerically we find they agree at large N for all time. g c (t) is not self averaging at early time and so differs from g uc (t) there.

H.4 Density of states ρ(E)
In figure 15 we plot the normalized density of statesρ(E), averaging the spectrum obtained by diagonalizing the Hamiltonian (2.1) for many disorder parameters. Almost periodic oscillations due to level repulsion are clearly observed for small values of N . For large N and fixed q, the distribution will converge in e.g. an L 2 norm sense to a Gaussian [58], with width E ∼ √ N . However, the small tails ofρ for energies of order E ∼ N will not be described by a Gaussian, and will contain an exponentially large number of states. Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.