Equilibration time scales in closed many-body quantum systems

We show that the physical mechanism for the equilibration of closed quantum systems is dephasing, and identify the energy scales that determine the equilibration timescale of a given observable. For realistic physical systems (e.g. those with local Hamiltonians), our arguments imply timescales that do not increase with the system size, in contrast to previously known upper bounds. In particular, we show that, for such Hamiltonians, the matrix representation of local observables in the energy basis is banded, and that this property is crucial in order to derive equilibration times that are non-negligible in macroscopic systems. Finally, we give an intuitive interpretation to recent theorems on equilibration time scales.


Introduction
There is currently a renewed interest in the derivation of statistical mechanics from the kinematics and dynamics of a closed quantum system [1]. In this approach, instead of assuming a priori that the system is in some mixed state, such as, e.g., a micro-canonical ensemble, one describes it at all times using a pure state $|\psi(t)\rangle$. One then seeks to show that, under reasonable conditions, the system behaves as if it were described by a statistical ensemble. In this way the use of statistical mechanics can be justified without introducing additional external degrees of freedom, such as, e.g., thermal 'baths'.
A central part of this programme has been to understand the process of equilibration, i.e., how a constantly evolving closed quantum system can behave as if relaxing to a stable equilibrium. The main insight relies on the fact [2][3][4] that, if measurements are limited to small subsystems or restricted sets of observables, then 'typical' pure states of large quantum systems are essentially indistinguishable from thermal states. It can then be shown [5,6] that under very general conditions on the Hamiltonian and nearly all initial states, the system will eventually equilibrate, in the sense that an (again, restricted) set of relevant physical quantities will remain most of the time very close to fixed, 'equilibrium' values. For example, given some observable A and a system of finite but arbitrarily large size, if its expectation value $\langle A(t)\rangle$ equilibrates, then it must do so around the infinite-time average (see section 5.1 of [1])
$$\bar{A} = \lim_{T\to\infty} \frac{1}{T}\int_0^T \langle A(t)\rangle\, dt .$$
If the infinite-time average of the fluctuations of $\langle A(t)\rangle$ around $\bar{A}$ is small, then we say that the observable A equilibrates.
One major open question is to understand the time scale at which equilibration occurs in a given system, and in particular its scaling with respect to system parameters such as its size (number of degrees of freedom). Various authors have tackled this question, e.g. [7][8][9][10][11][12][13][14][15][16][17][18][19], producing upper bounds that imply finite-time equilibration in various contexts (see also the 'supplementary information' section in [17] for a brief survey of the literature).
Several of these results [8-13, 17, 18] are again obtained in a typicality framework: they estimate, in various different senses, the average equilibration time of evolutions. While in many cases these calculated averages can have an impressive correspondence to experimentally measured equilibration times [17,18], this approach also has some inherent weaknesses. First of all, it is generally believed that many specific physical conditions that are realisable in Nature or in the lab can be very far from 'typical' (for example, the actual Hamiltonians in Nature tend to have a locality structure that may be absent from most members of a mathematically generated ensemble [18]). In addition, by averaging, one loses information about the physical properties that are relevant to the equilibration time scale of any one specific evolution.
Finally, there are a few works by the Bristol group [14,15] that derive general and rigorous bounds on the equilibration times of arbitrary observables, systems and initial states, without any ensemble averaging. However, the bound in [15] scales with the inverse of the minimum energy difference (gap) in the system's spectrum. For physically realistic systems it therefore increases exponentially with the system size, and cannot be a good estimate of the actual equilibration timescale; in particular, equilibration would then not occur in the thermodynamic limit (see section 7 for details). In contrast, a bound derived in [14] can be independent of the system's size, though only in a regime that requires the system to be initially in a nearly completely mixed state, failing to give a physically reasonable estimate in the case of a closed system in a pure initial state.
In this work, we seek to identify the properties of a closed quantum system which are relevant for the equilibration timescale of a given (arbitrary) observable. Our approach is more heuristic than rigorous, but it allows us to estimate a timescale which, under reasonable circumstances, depends only weakly on the system size, and thus seems to capture the relevant physics. The main insight we rely on is that equilibration is due primarily to a process of dephasing between different Fourier components of the dynamical evolution, a point that was briefly made in a classic work [21] but that has apparently not been fully appreciated by the current community.
Although our argument does not result in a rigorous bound such as those in [14,15], nor in a definite average evolution such as in [17,18], we are able to discuss how the equilibration timescale of a given observable A depends on the physical properties of the system. Specifically, we find that the coherences of the observable of interest in the energy basis, $\langle E_i|A|E_j\rangle$, play a fundamental role. More specifically, the equilibration time depends critically on the range of energy gaps $E_i - E_j$ over which these coherences have non-negligible values. In particular, if this range remains roughly constant as the system size increases, the same will be true for the equilibration time. As we discuss below, this indeed happens for many observables of interest in many-body systems. We illustrate these results with numerical simulations of a spin chain, finding reasonable qualitative agreement. It should be noted that similar heuristic methods and conclusions have also recently been proposed in simultaneous, independent research by Wilming et al [22].
This point of view also gives a new understanding of some existing results. For instance, it allows us to identify the reason for the limitations of bounds and estimates such as those in [15,20], which, while rigorous, can vastly overestimate the time scale at which equilibration occurs in realistic systems. In section 7, we argue that the reason for this behaviour is that these estimates ultimately rely on the wrong physical mechanism of equilibration, disregarding the crucial role played by dephasing.
Our main findings can be summarised as follows:
• In section 3 we discuss qualitatively why dephasing is the underlying mechanism of equilibration in closed quantum systems.
• In section 4 we develop a formalism based on the coarse-graining of functions in frequency space, which allows us to apply basic tools from Fourier transform theory, such as uncertainty relations, to equilibration-related questions.
• In section 5 we determine the relevant energy scales that govern the equilibration time for a given observable, namely the energy fluctuations of the initial state and the bandwidth of the matrix of the observable in the energy basis. In particular, we give an independent proof of the fact [23] that local observables have banded matrices when written in the energy basis of a short-ranged spin Hamiltonian.
• In section 6 we illustrate our results with a numerical simulation of the XXZ model.
• In section 7 we discuss some implications of our results. In section 7.1, we reinterpret existing results from the point of view of our dephasing framework. In section 7.2 we discuss implications for the fields of quantum chaos and integrability. In particular we present two models with identical eigenbases but different level statistics that have indistinguishable dynamics over realistic time-scales.

General setting and definition of the problem
Let us consider a closed system whose state is described by a vector in a Hilbert space of dimension $d_T$ and whose Hamiltonian has a spectral representation
$$H = \sum_{k=1}^{d_E} E_k P_k,$$
where $E_k$ are its energies and $P_k$ the projectors onto its eigenspaces. Note that the sum runs over $d_E \le d_T$ terms, since some eigenspaces can be degenerate.
We denote the initial state by $|\psi(0)\rangle$. If the Hamiltonian has degenerate energies, we choose an eigenbasis of H such that $|\psi(0)\rangle$ has non-zero overlap with only one eigenstate $|E_k\rangle$ for each distinct energy. Choosing units such that $\hbar = 1$, the state at time t is then given by
$$|\psi(t)\rangle = \sum_k c_k\, e^{-i E_k t}\, |E_k\rangle, \qquad c_k = \langle E_k|\psi(0)\rangle .$$
In this article, following a number of authors [5, 6, 14, 15], we will study equilibration by focusing on observables. The idea is that a system can be considered in equilibrium if all experimentally relevant (typically, coarse-grained) observables A have equilibrated. In other words, we will focus on understanding how the expectation value $\langle A(t)\rangle$ approaches its equilibrium value $\mathrm{Tr}(A\omega)$, where $\omega$ denotes the infinite-time average of $|\psi(t)\rangle\langle\psi(t)|$. Note that, in order to even talk about equilibration time scales, we must assume that such a condition holds, i.e., that this observable sooner or later equilibrates.
Let us introduce the time signal of A, given the initial state $|\psi(0)\rangle$, as the distance of $\langle A(t)\rangle$ from equilibrium at time t,
$$g(t) := \frac{\langle A(t)\rangle - \mathrm{Tr}(A\omega)}{\Delta_A} = \sum_{i\ne j} \frac{c_i^* c_j A_{ij}}{\Delta_A}\, e^{i(E_i - E_j)t},$$
where $A_{ij} = \langle E_i|A|E_j\rangle$ are the matrix elements of A in the energy eigenbasis, and $\Delta_A = a_{\max} - a_{\min}$ is the range of possible outcomes, $a_{\max(\min)}$ being the largest (smallest) eigenvalue of A. The denominator $\Delta_A$ is introduced to make the time signal dimensionless and to ensure $|g(t)| \le 1$. Note that the time signals of two observables $A$ and $A' = b(A - a\mathbb{1})$ are identical for any constants $a, b$ with $b \ne 0$. We can conveniently rewrite the time signal as
$$g(t) = \sum_\alpha v_\alpha\, e^{i G_\alpha t}, \qquad (6)$$
where the sum runs over the set $\mathcal{G}$ of energy gaps $G_\alpha = E_i - E_j$, and $v_\alpha$ collects the corresponding coefficients $c_i^* c_j A_{ij}/\Delta_A$. We will refer to the complex number $v_\alpha$ as the amplitude of the corresponding gap $G_\alpha$, and to its normalised modulus $q_\alpha := |v_\alpha|/\sum_\beta |v_\beta|$ as the relevance of $G_\alpha$. Note that the set of relevances forms a probability distribution over $\mathcal{G}$.
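As a concrete, purely hypothetical illustration of these definitions, the sketch below builds a small random Hermitian "Hamiltonian" and observable (all names and numbers are ours, not from the paper's numerics), and computes the amplitudes $v_\alpha$, the relevances $q_\alpha$ and the time signal at t = 0. For simplicity each ordered pair (i, j) is treated as its own gap, without merging degenerate gaps:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

# Hypothetical toy model: random Hermitian Hamiltonian and observable
H = rng.normal(size=(d, d)); H = (H + H.T) / 2
A = rng.normal(size=(d, d)); A = (A + A.T) / 2
E, V = np.linalg.eigh(H)

psi0 = rng.normal(size=d); psi0 /= np.linalg.norm(psi0)
c = V.T @ psi0                        # populations c_k = <E_k|psi(0)>
A_e = V.T @ A @ V                     # A in the energy eigenbasis

a = np.linalg.eigvalsh(A)
Delta_A = a.max() - a.min()           # range of outcomes a_max - a_min

# One amplitude v_alpha per ordered pair (i, j), i != j
i, j = np.nonzero(~np.eye(d, dtype=bool))
gaps = E[i] - E[j]
v = c[i] * c[j] * A_e[i, j] / Delta_A  # c is real here, so no conjugation needed

q = np.abs(v) / np.abs(v).sum()       # relevances: a probability distribution
g0 = v.sum()                          # time signal at t = 0

# Cross-check against the direct definition g(0) = (<A>(0) - A_bar)/Delta_A
A_bar = np.sum(np.abs(c)**2 * np.diag(A_e))
g0_direct = (psi0 @ A @ psi0 - A_bar) / Delta_A
```

The normalisation by $\Delta_A$ guarantees $|g(0)| \le 1$, and the relevances sum to one by construction.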
A physical interpretation of this normalisation factor can be given as follows [5, 6, 24]: note that, if the system has non-degenerate gaps, then the time-averaged fluctuations of the time signal are
$$\langle |g|^2\rangle_\infty := \lim_{T\to\infty} \frac{1}{T}\int_0^T |g(t)|^2\, dt = \sum_\alpha |v_\alpha|^2 .$$
For the system to be initially out of equilibrium, i.e. for $|g(0)|$ to be much larger than these typical fluctuations, the phases of the complex numbers $v_\alpha$ in the time signal, equation (6), need to be highly synchronised. This case is presented pictorially in figure 1 (left), where the $v_\alpha$'s are depicted as points in the complex plane.
Result 1 (Equilibration is dephasing). Given a time signal $g(t) = \sum_\alpha v_\alpha e^{i G_\alpha t}$, with $v_\alpha = |v_\alpha| e^{i\theta_\alpha}$ being the initial amplitude of the gap $G_\alpha \in \mathcal{G}$, a necessary condition for the system to be initially out of equilibrium, i.e. $|g(0)|$ significantly larger than the typical equilibrium fluctuation $\sqrt{\langle|g|^2\rangle_\infty}$, is that the initial phases $\theta_\alpha$ are not isotropically distributed but significantly synchronised. More precisely, we quantify the distance from equilibrium as
$$|g(0)|^2 = \Big|\sum_\alpha |v_\alpha|\, e^{i\theta_\alpha}\Big|^2, \qquad (12)$$
which becomes negligible (of the order of the typical fluctuation $\langle|g|^2\rangle_\infty$) when the phases $\theta_\alpha$ are isotropically distributed.

Figure 1. Illustration of the dephasing process of the complex terms $v_\alpha e^{i G_\alpha t}$ (blue dots/black arrows) of a time signal g(t) (green arrow), see equation (6). On the left, the system is far from equilibrium, a short time after t = 0, having been initialised with all $v_\alpha$ real, and g(t) is substantial. Note that half of the complex terms have rotated clockwise, and the other half anticlockwise, as expected from the symmetry of the set of gaps $G_\alpha$. On the right, after a long time, the individual complex vectors have become spread out, and the system has equilibrated, with g(t) close to the typical fluctuation.
Equation (12) follows from a straightforward calculation. To see that isotropically distributed random phases give $|g|^2$ of the order of $\langle|g|^2\rangle_\infty$, let $\{v_\alpha\}$ be a set of independent random complex variables with isotropic probability distributions, i.e. each random variable $v_\alpha$ has fixed modulus $r_\alpha$ and a random phase $\theta_\alpha$ uniformly distributed around the circle. Then the variance of the random variable $g = \sum_\alpha v_\alpha$ is $\sum_\alpha r_\alpha^2$.

• Conversely, suppose the initial state has support over an energy range that does not scale with n. This can happen, for example, in the case of a so-called 'local quench' [33], when a local subsystem of fixed dimension is excited regardless of the full size of the system. In this case, σ_G will also at worst be independent of n, and so even observables with long-range coherences between very different energies, which would otherwise equilibrate quickly, will now take a finite time T_eq to equilibrate.
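The random-phase picture of Result 1 can be checked in a minimal numerical sketch (all quantities hypothetical): gaps and moduli are drawn at random, the phases are initially fully synchronised, and $|g(t)|$ falls from its initial value 1 down to the typical fluctuation scale $\sqrt{\sum_\alpha r_\alpha^2}$ once the phasors spread around the circle:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 2000
G = rng.normal(size=M)                      # hypothetical gaps (arbitrary units)
r = rng.exponential(size=M); r /= r.sum()   # moduli |v_alpha|, normalised to sum to 1

def g(t):
    """Time signal with all phases initially synchronised (theta_alpha = 0)."""
    return np.sum(r * np.exp(1j * G * t))

g0 = abs(g(0.0))                   # = sum_alpha |v_alpha| = 1: far from equilibrium
typical = np.sqrt(np.sum(r**2))    # sqrt(<|g|^2>_inf): equilibrium fluctuation scale
late = abs(g(200.0))               # after the phases have spread: comparable to `typical`
```

With 2000 terms the typical fluctuation is of order $1/\sqrt{M} \approx 0.03$, so the signal decays by roughly a factor 30 purely through dephasing, with no dissipation anywhere in the model.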
Finally, it is also important to mention that the gap dispersion σ_G may actually decay to zero with n, leading to equilibration times that become very long. As an example of this situation, consider two subsystems of increasing size n interacting through a spatially localised border of fixed size. The coherence in the energy basis of the interaction Hamiltonian is bounded by the operator norm of such an interaction, matching the intuition that the stronger the interaction between systems, the faster the relaxation, and vice versa. By rescaling the global Hamiltonian, we can see that the interaction terms become relatively weaker as n grows, and thereby the equilibration becomes slower.

Fourier description of the dephasing framework
In this section we give further substance to the above heuristic argument to estimate the equilibration time-scale by means of Fourier transform techniques. Let us first give a general idea of our approach. Suppose the time signal g(t) decayed more or less steadily to zero, and stayed there. If so, then a good estimate for the equilibration time scale would be given by a few multiples of the standard deviation Δt of the distribution $|g(t)|^2$ (figure 2). In such a case, following the spirit of our previous heuristic example, it would also be tempting to estimate the order of magnitude of Δt by taking the inverse of the spectral variance Δω of the signal. Indeed, this can be justified if we recall the standard uncertainty principle of Fourier analysis [34],
$$\Delta t \cdot \Delta\omega \ge \tfrac{1}{2}.$$
Of course, this is a lower bound, which is saturated exactly only in the case of a Gaussian spectrum. However, it will be nearly saturated ($\Delta t \cdot \Delta\omega = c_1$, where $c_1$ is a constant of order 1) when the spectrum is unimodal and without long tails. In this case, we can also expect the time signal to decrease to a very small value after a time $c_2 \Delta t$, for some small multiple $c_2$ again of order 1. Taking both multiples together, we expect a good estimate for the equilibration time to be of order $c_1 c_2/\Delta\omega$. This is satisfied by equation (16), which we will therefore continue to take as our estimate.
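The saturation of the uncertainty relation for a Gaussian spectrum can be verified numerically. In this sketch (our own construction, with an arbitrary width) we sample a Gaussian time signal, compute Δt from $|g(t)|^2$ and Δω from the FFT power spectrum, and recover the product 1/2:

```python
import numpy as np

# A Gaussian time signal saturates Dt * Dw = 1/2, where Dt and Dw are the
# standard deviations of |g(t)|^2 and |g~(w)|^2 respectively.
dt_target = 0.7
t = np.linspace(-50, 50, 2**15)
g = np.exp(-t**2 / (4 * dt_target**2))

w2 = np.abs(g)**2
Dt = np.sqrt(np.sum(t**2 * w2) / np.sum(w2))        # std of |g(t)|^2

gw = np.fft.fft(g)                                  # phases are irrelevant for |g~|^2
w = 2 * np.pi * np.fft.fftfreq(t.size, d=t[1] - t[0])
s2 = np.abs(gw)**2
Dw = np.sqrt(np.sum(w**2 * s2) / np.sum(s2))        # std of |g~(w)|^2

product = Dt * Dw                                   # close to 0.5
```

For a unimodal, short-tailed but non-Gaussian spectrum the same computation gives a product of order one, which is all the heuristic estimate requires.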
Unfortunately, for finite systems, our initial assumption of steady decay does not apply. The time signal has recurrences, that is, a long time after the dephasing has occurred and the system has equilibrated, the phases get aligned (synchronised) again in the complex plane, and the signal regains strength (figure 2). In order to avoid this problem, in the next subsections we introduce a coarse-grained version of the signal spectrum, which dampens out the recurrences. This allows us to exploit the uncertainty principle to estimate the equilibration time-scales, as described above. Under some mild conditions, we show that the equilibration time-scale provided by this procedure coincides with the one previously given by the heuristic argument of points dephasing in the complex plane.

The frequency signal
We define the frequency signal, $\tilde g(\omega)$, as the Fourier transform of the time signal g(t), which roughly speaking tells us the relevance with which every frequency contributes to the time signal. When both the time and frequency signals are square-integrable ($g, \tilde g \in L^2$), the standard uncertainty principle of equation (18) applies. However, time signals such as in equation (6) are not square-integrable, as can be seen from equation (9) (the integral diverges proportionally to T). The same is true for the frequency signal, since
$$\tilde g(\omega) \propto \sum_\alpha v_\alpha\, \delta(\omega - G_\alpha), \qquad (21)$$
where δ(x) is the Dirac delta distribution. Hence, the uncertainty principle in equation (18) cannot be directly applied.
It is worth noting here that, due to the finite range of energies present in our system, there is an asymmetry between the uncertainties in time and frequency of the signal g(t). On the one hand, the uncertainty in frequency can still be well-defined. To see how, recall first that, for $g(t) \in L^2$, it is possible to write the moments of a frequency signal in terms of the corresponding time signal and its derivatives, e.g.
$$\langle\omega^2\rangle = \frac{\int dt\, |\dot g(t)|^2}{\int dt\, |g(t)|^2}.$$
In our case, although each of these integrals diverges, their ratio does have a well-defined limit, in the sense that
$$\lim_{T\to\infty} \frac{\int_{-T}^{T} dt\, |\dot g(t)|^2}{\int_{-T}^{T} dt\, |g(t)|^2} = \frac{\sum_\alpha |v_\alpha|^2 G_\alpha^2}{\sum_\alpha |v_\alpha|^2}.$$
Taking then this limit as the appropriate definition of $\langle\omega^2\rangle$ in this case, and noting that, in the same sense, $\langle\omega\rangle = 0$, we obtain that $\Delta\omega = \sqrt{\langle\omega^2\rangle}$ is indeed precisely equal to the gap dispersion σ_G defined in equation (15).
On the other hand, the value of Δt diverges, even when taking limits in the same sense as above. This can be understood physically as a consequence of the previously mentioned recurrences in the time signal. Indeed, g(t) is a quasi-periodic function that experiences, over an infinitely large time interval, an infinite number of recurrences to values arbitrarily close to its initial one [35,36].

The coarse-grained signal
We now define the notion of coarse-graining, in which we introduce a microscopic energy scale ε below which the fine-grained details of the spectrum are washed out. As we show below, this is done by replacing the discrete spectrum in equation (21) by a suitable smooth version.
As previously mentioned, such coarse-graining of the frequency signal dampens the time signal g(t), removing the recurrences seen in figure 2 and making the coarse-grained signals $g_\epsilon(t)$ and $\tilde g_\epsilon(\omega)$ belong to $L^2$. We shall see later that it will also allow us to exploit certain existing statements concerning the shapes of energy densities and densities of states of realistic initial states and short-ranged local Hamiltonians [37], which will justify our assumption of quasi-saturating the uncertainty bound.
An important issue in the coarse-graining is obviously the choice of the energy scale ε. This will be discussed later in detail, but we can already see that, in order to remove the recurrences, the discreteness of the frequency signal has to be removed, which requires an ε much larger than the separation between consecutive gaps.
Mathematically, the coarse-graining is accomplished by convolving the frequency signal with an appropriate window function $h_\epsilon(x)$, which is only non-zero over an interval of size O(ε). In our case, we find it convenient to take
$$h_\epsilon(x) = C\, N_\epsilon(x),$$
where $N_\sigma(x)$ is the normalised Gaussian distribution centred at the origin and with standard deviation σ, and C is a constant that is determined below. In this spirit, the ε-coarse-grained version of the frequency signal is defined as
$$\tilde g_\epsilon(\omega) := (h_\epsilon * \tilde g)(\omega).$$
If $\tilde g(\omega)$ is given by equation (21), then the ε-coarse-grained frequency signal is
$$\tilde g_\epsilon(\omega) = C \sum_\alpha v_\alpha\, N_\epsilon(\omega - G_\alpha), \qquad (26)$$
where constants arising from the Fourier convention have been absorbed into C. In other words, coarse-graining corresponds to widening each Dirac-δ in the original spectral function into a Gaussian of O(ε) width (figure 3). Note that, in doing this, we remove fine details of the spectrum such as the level statistics. Furthermore, unlike $\tilde g(\omega)$, the coarse-grained signal $\tilde g_\epsilon(\omega)$ is square-integrable and lies in $L^2$. A coarse-grained frequency signal defines a coarse-grained time signal, proportional to $\check h_\epsilon(t)\, g(t)$, where we have used the convolution theorem for Fourier transforms. The constant C is fixed by imposing that the time signal is not affected by the coarse-graining on time scales $t \ll \epsilon^{-1}$, i.e. $g_\epsilon(0) = g(0)$. This leads to
$$g_\epsilon(t) = e^{-\epsilon^2 t^2/2}\, g(t). \qquad (30)$$
Even if the time signal g(t) equilibrates after some time, we know that it must eventually have recurrences. To determine the equilibration time-scale from $g_\epsilon(t)$, we need $\epsilon^{-1}$ to be much greater than the equilibration timescale, but much smaller than the recurrence timescale. In this way, we ensure that the coarse-grained time signal will be indistinguishable from the original one during the equilibration process, but, unlike the latter, will then decay to zero.

Figure 3. Illustration of the coarse-graining of a discrete gap spectrum with a Gaussian window function. The data here correspond to the XXZ model that is studied in section 6, with n = 12 spins. The solid blue dots represent the amplitudes $v_\alpha(G_\alpha)$ of each gap (in this particular case they are all real and positive).
The dashed blue lines illustrate a few of the corresponding weighted Gaussians $v_\alpha h_\epsilon(\omega - G_\alpha)$, where we have chosen ε = 0.4 (in arbitrary frequency units). The solid red curve represents the full coarse-grained spectrum $\tilde g_\epsilon(\omega)$, obtained by summing these weighted Gaussians, according to equation (26). Note that, for this choice of ε, the width Δω of the coarse-grained spectrum remains close to the dispersion σ_G of the original (discrete) gap spectrum. See figure 6 for a comparison of the corresponding coarse-grained time signal $g_\epsilon(t)$ with the exact one. Note finally that although, for simplicity, we use here a single numerical scale on the vertical axis, the $v_\alpha$ are dimensionless, whereas the continuous curves have physical dimension of time (with units that are the inverse of those used for ω and ε).
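The effect of the coarse-graining on the time signal can be made concrete in a minimal sketch (our own toy construction, not the XXZ data of the figure): with integer-valued gaps the exact signal revives fully at t = 2π, 4π, ..., while a Gaussian frequency window of width ε, which under the convolution theorem multiplies the time signal by a Gaussian envelope of width ~ε⁻¹, leaves short times untouched and suppresses late recurrences:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 40
Gpos = rng.integers(1, 30, size=M).astype(float)  # integer gaps: exact recurrence at 2*pi
G = np.concatenate([Gpos, -Gpos])                 # gaps come in +/- pairs
v = np.tile(rng.exponential(size=M), 2)
v /= v.sum()                                      # real positive amplitudes, g(0) = 1

def g(t):
    return np.sum(v * np.exp(1j * G * t)).real

eps = 0.05
def g_eps(t):
    # Gaussian window of width eps in frequency <-> Gaussian damping in time
    return np.exp(-eps**2 * t**2 / 2) * g(t)

t_rec = 20 * 2 * np.pi             # the 20th exact recurrence of g(t)
short = abs(g_eps(0.5) - g(0.5))   # coarse-graining invisible at short times
revival = g(t_rec)                 # = g(0) = 1 for integer gaps
damped = abs(g_eps(t_rec))         # the recurrence is exponentially suppressed
```

Here ε⁻¹ = 20 sits between the dephasing time (of order one, set by the gap spread) and the chosen recurrence time, exactly the regime described in the text.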

The coarse-grained density of relevant gaps
We now focus on the properties of the variance of gap values with respect to the coarse-grained frequency signal,
$$\Delta\omega_\epsilon^2 := \int d\omega\, \omega^2\, q_\epsilon(\omega) - \left(\int d\omega\, \omega\, q_\epsilon(\omega)\right)^2. \qquad (29)$$
We will refer to $\Delta\omega_\epsilon$ as the 'dispersion of relevant gaps', and to the weight
$$q_\epsilon(\omega) := \frac{|\tilde g_\epsilon(\omega)|^2}{\int d\omega'\, |\tilde g_\epsilon(\omega')|^2}$$
as the density of relevant gaps.
Our goal is to show that, under a wide range of choices of ε and of physically relevant circumstances: (i) $\Delta\omega_\epsilon^2$ is very close to the gap dispersion $\sigma_G^2$ of the original signal, as defined in equation (15), and at the same time (ii) the inverse $\Delta\omega_\epsilon^{-1}$ is a good estimate of the equilibration time.
Note first that, by construction, for every gap $G_\alpha = E_j - E_i$ appearing in equation (6) (and, by extension, in equation (26)), its negative $-G_\alpha$ also appears, with complex-conjugate amplitude. Hence the 'average gap' $\mu_\omega = \int d\omega\, \omega\, q_\epsilon(\omega)$ vanishes for any ε, and the variance $\Delta\omega_\epsilon^2$ is equivalent to $\langle\omega^2\rangle_\epsilon$, i.e.
$$\Delta\omega_\epsilon^2 = \int d\omega\, \omega^2\, q_\epsilon(\omega). \qquad (32)$$
After some straightforward manipulation, using equations (29) and (32), one obtains an explicit expression for $\Delta\omega_\epsilon^2$. It can easily be checked that, if ε → 0, this expression indeed reduces to equation (15). More specifically, $\Delta\omega_\epsilon^2$ will be very close to $\sigma_G^2$ for all $\epsilon \ll \min_\alpha(G_{\alpha+1} - G_\alpha)$. Indeed, in this limit the Gaussian window function $h_\epsilon(\omega)$ becomes negligibly thin with respect to the smallest separation between gaps, and the coarse-grained spectrum $\tilde g_\epsilon(\omega)$ resembles the original discrete spectrum $\tilde g(\omega)$. Precisely for this reason, however, this limit is of little use to our goals. Another way of putting this is that, for such small values of ε, the coarse-grained time signal (equation (30)) does not have time to decay before the recurrence timescale of the original signal, which is of order $[\min_\alpha(G_{\alpha+1} - G_\alpha)]^{-1}$. To make further progress at this point, it is necessary to assume some features of the amplitudes $v_\alpha$. Otherwise, a fine-tuning between the phases and moduli of $v_\alpha$ can make $|\tilde g_\epsilon(\omega)|$ have arbitrary behaviour, preventing any general statement concerning the equilibration of $g_\epsilon(t)$.
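A minimal numerical sketch of point (i) above (hypothetical gaps and real positive amplitudes of our choosing, with $\sigma_G^2 = \sum_\alpha |v_\alpha|^2 G_\alpha^2 / \sum_\beta |v_\beta|^2$, the $|v_\alpha|^2$-weighted dispersion singled out by the $L^2$ moments): for an ε much smaller than the spectral width, the coarse-grained dispersion differs from σ_G only at order ε²:

```python
import numpy as np

# Well-separated hypothetical gaps, in +/- pairs, with real positive amplitudes
G = np.array([-7.0, -4.0, -1.5, 1.5, 4.0, 7.0])
v = np.array([0.10, 0.25, 0.15, 0.15, 0.25, 0.10])

eps = 0.05
w = np.linspace(-12.0, 12.0, 240001)
N_eps = lambda x: np.exp(-x**2 / (2 * eps**2)) / (np.sqrt(2 * np.pi) * eps)

# Coarse-grained spectrum: each delta widened into a Gaussian of width eps
g_tilde = sum(vi * N_eps(w - Gi) for vi, Gi in zip(v, G))

dens = np.abs(g_tilde)**2                       # density of relevant gaps
Dw2 = np.sum(w**2 * dens) / np.sum(dens)        # <w>_eps = 0 by symmetry

sigma_G2 = np.sum(v**2 * G**2) / np.sum(v**2)   # discrete gap dispersion

# Dw2 exceeds sigma_G2 only by about eps^2/2, i.e. by a fraction of a percent here
```

Making ε larger broadens the spectrum and inflates Δω_ε; making it smaller than the gap spacing restores the discrete spectrum and its recurrences, which is the trade-off discussed in the text.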
Inspired by [25], we will adopt a weak-typicality point of view: let us assume that the evolution we are considering is drawn from an ensemble for which the $v_\alpha$'s are describable by some smooth functions plus stochastic fluctuations. Note that we do not assume a uniform ensemble over all states (or any one specific ensemble), as is the case in most typicality studies [8-13, 17, 18], merely one for which the resulting distribution over the $v_\alpha$'s has some very general features, described below. In the spirit of statistical physics, we basically replace complexity by apparent randomness. In most situations, the description of the gap amplitudes $v_\alpha$ in terms of a smooth function plus stochastic fluctuations is a consequence of the energy-level populations $c_i$ and the matrix elements $A_{ij}$ of the observable having this same behaviour. In the next section we discuss under which conditions this is indeed the case.
In the following we show that the process of coarse-graining removes the fluctuations and gives the density of relevant gaps $|\tilde g_\epsilon(\omega)|^2$ a smooth behaviour.
Result 2 (Coarse-grained frequency signal). Let us consider the amplitudes $v_\alpha$ of the gaps $G_\alpha$ to be described by a smooth function plus stochastic fluctuations,
$$v_\alpha = v(G_\alpha) + \delta v_\alpha, \qquad (35)$$
with $\rho(\omega) = \sum_\alpha \delta(\omega - G_\alpha)$ being the density of gaps. The coarse-grained density of gaps $\rho_\epsilon(\omega) = (N_\epsilon * \rho)(\omega)$ describes how many gaps are ε-close to the frequency ω. The process of coarse-graining then washes out the fluctuations $\delta v_\alpha$, turning the coarse-grained frequency signal into a smooth function,
$$\tilde g_\epsilon(\omega) \propto v(\omega)\, \rho_\epsilon(\omega),$$
where the meaning of the approximation is made precise in equation (36).
The proof is tedious and non-illuminating, so we give it in appendix B. Note that the error in (36) has two components. The term Kε is due to the variation of the 'smooth' functions, with Lipschitz constant smaller than K, within an interval of width O(ε). The second component, involving the fluctuation scale γ(ω), shrinks as $1/\sqrt{\epsilon\,\rho_\epsilon(\omega)}$ according to the central limit theorem, where $\epsilon\,\rho_\epsilon(\omega)$ is the number of gaps $G_\alpha$ within an interval of width O(ε).
In order for equation (36) to be meaningful and the error bound small, an optimisation over ε is needed. It is easy to see that for the error to be small, the parameter ε should be much larger than the spacing between consecutive gaps and much smaller than the inverse of the Lipschitz constants of the continuous functions v(ω) and γ(ω), i.e.
$$\min_\alpha (G_{\alpha+1} - G_\alpha) \ll \epsilon \ll K^{-1}.$$
Here 'consecutive' refers to gaps ordered by size, and we use the informal notation '$G_{\alpha+1}$' as a shorthand for 'the gap immediately larger than $G_\alpha$'. Note that, in practice, the lower bound above should be applied only to consecutive relevant gaps, i.e., gaps whose amplitudes $v_\alpha$ make a non-negligible contribution to the sum in equation (6). For many-body systems, the number of gaps increases exponentially in the system size n, and the energy differences between consecutive gaps shrink exponentially to zero in n. If, for example, these gaps all have roughly equal (exponentially small) relevances $q_\alpha$, then ε can also be taken exponentially to zero, as long as this is done at a slower rate than the difference between gaps. This makes the time signal and its coarse-grained version indistinguishable on any realistic time-scale.
In sum, if the gap amplitudes $v_\alpha$ can be described by a continuous part plus a fluctuating part, as in equation (35), then the coarse-grained density of relevant gaps is the smooth function given in equation (37). This will be particularly useful in the following section, where we apply these ideas to many-body systems described by short-ranged Hamiltonians. It is worth noting that the factorisation in equation (37) is also automatically obtained if one assumes, as is often done, that in the thermodynamic limit n → ∞ the discrete gap spectrum may be replaced by a smooth continuous gap density, i.e. taking $\sum_\alpha v_\alpha \to \int d\omega\, \rho(\omega)\, v(\omega)$. Comparing with the definition in equation (19), we see that in this case the frequency signal is again proportional to $v(\omega)\rho(\omega)$. This indicates that the assumptions we have made concerning the smoothness of $v_\alpha$ are not severe. However, the point of obtaining this relation via coarse-graining, while maintaining a finite system size n, is that it allows us to control how the equilibration timescale scales with the system size, and in particular to understand how this scaling depends on the energy scaling of the relevant observable. We turn to this question in the next section.

Relevant energy scales and equilibration time scales for local Hamiltonians
In this section, we focus on the particular but relevant case of short-range Hamiltonians and initial states that have a finite correlation length. For such systems, we express the density of relevant gaps in terms of the energy density of the initial state and the function that describes the matrix-elements of the observable. By doing so, we identify the energy scales that determine the dispersion of gaps. We find that there are mainly two relevant energy scales: the energy fluctuations of the initial state and the bandwidth of the matrix of the observable A in the Hamiltonian eigenbasis. In the case of systems globally out of equilibrium, only those observables that are banded in the Hamiltonian basis can be observed out of equilibrium for a non-negligible time.
Local Hamiltonian. Let us define a short-ranged or local Hamiltonian of a spin-lattice system as
$$H = \sum_{e \in \mathcal{E}} h_e,$$
acting on a Hilbert space whose locality structure is given by a graph $(V, \mathcal{E})$ with vertex set V and edge set $\mathcal{E}$. The number of terms of the Hamiltonian is denoted by $n = |\mathcal{E}|$. We consider systems for which it is possible to define a sequence of Hamiltonians $H_n$ of different sizes. This is trivial in the case of translationally invariant systems and regular lattices, but also includes systems with disorder and defects. The reason for introducing such a sequence of Hamiltonians $H_n$ is that it allows us to define the thermodynamic limit. For simplicity, the subindex n is not written explicitly from now on.
Energy density of the initial state. The energy density of an initial state $|\psi(0)\rangle$ is defined as
$$f(E) := \sum_k |c_k|^2\, \delta(E - E_k).$$
For local Hamiltonians and initial states with a finite correlation length, the coarse-grained energy density is, up to a small error, a Gaussian [37],
$$f_\epsilon(E) \simeq N_{\sigma_E}(E - \mu_E), \qquad (43)$$
where the Gaussian has mean $\mu_E = \langle\psi(0)|H|\psi(0)\rangle$ and standard deviation $\sigma_E = \sqrt{\langle\psi(0)|H^2|\psi(0)\rangle - \mu_E^2}$.
In what follows, we only make use of the energy density inside integrals, so in practice equation (43) allows us to replace the energy density $f_\epsilon(E)$ by the corresponding Gaussian, with vanishing error.
Initial states that are globally out of equilibrium, e.g. globally quenched, have energy densities with mean and standard deviation that scale with the system size as $\mu_E \propto n$ and $\sigma_E \propto \sqrt{n}$. If the system is at criticality and the correlations decay as a power law, this Gaussian shape can no longer be guaranteed. In any case, the energy fluctuations can still scale as $\sigma_E \propto \sqrt{n}$ as long as the power m of the decay is sufficiently fast, i.e. m > D + 1, where D is the spatial dimension of the lattice (see appendix A for details).
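The $\sigma_E \propto \sqrt{n}$ scaling can be checked exactly in a minimal sketch. We use a hypothetical open transverse-field Ising chain (our choice, not the paper's XXZ data) and the product state |00...0⟩: the ZZ terms then do not fluctuate at all, only the n field terms do, so the exact result is $\sigma_E = g\sqrt{n}$:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])

def embed(ops, n):
    """Tensor product with the single-site operators `ops[i]` placed at sites i."""
    return reduce(np.kron, [ops.get(j, I2) for j in range(n)])

def sigma_E(n, g=0.9):
    # Hypothetical short-ranged model: open transverse-field Ising chain
    H = sum(embed({i: Z, i + 1: Z}, n) for i in range(n - 1))
    H = H + g * sum(embed({i: X}, n) for i in range(n))
    psi = np.zeros(2**n); psi[0] = 1.0     # product state |00...0>
    Hpsi = H @ psi
    return np.sqrt(Hpsi @ Hpsi - (psi @ Hpsi)**2)

# sigma_E / sqrt(n) stays constant (= g) as the chain grows
ratios = [sigma_E(n) / np.sqrt(n) for n in (4, 6, 8, 10)]
```

For generic product (or finitely correlated) states both the bond and field terms contribute, but the variance remains a sum of O(n) local covariances, which is the content of the clustering argument in appendix A.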
Matrix elements of an observable in the energy basis. Taking again a weakly stochastic approach, in the same spirit as the argument used in result 2, we restrict ourselves to observables whose off-diagonal matrix elements in the Hamiltonian eigenbasis can be described by a 'continuous' function S(E, ω) plus some fluctuations $\delta A_{ij}$,
$$|A_{ij}|^2 = S(E, \omega) + \delta A_{ij}, \qquad (44)$$
where this choice of writing the arguments of the function S(E, ω) will be shown to be convenient in the following section. As in the previous section, the Lipschitz constant of S(E, ω) is assumed to be bounded by K. Note that the so-called eigenstate thermalisation hypothesis (ETH) [21] can be seen as a particular case of this assumption (44). One popular version of the ETH [38] is an ansatz for the matrix elements of an observable A in the Hamiltonian eigenbasis,
$$A_{ij} = \mathcal{A}(E)\,\delta_{ij} + e^{-\mathcal{S}(E)/2}\, f_{\mathrm{ETH}}(E, \omega)\, R_{ij}, \qquad (45)$$
where E = (E_i + E_j)/2 and ω = (E_i − E_j)/2, $\mathcal{S}(E)$ is the thermodynamic entropy at energy E, the functions $\mathcal{A}$ and $f_{\mathrm{ETH}}(E, \omega)$ are smooth functions of their arguments, and $R_{ij}$ are randomly distributed complex numbers, each with zero mean and unit variance. The essential idea is that both in (44) and (45) the off-diagonal matrix elements of the observable can be described by a smooth function plus fluctuations that vanish upon coarse-graining. Note that in our case, in contrast to the ETH, we do not assume anything about the diagonal elements of the observable in the energy basis. This is because we are not concerned with what the equilibrium state of the system is (whether it is thermal or not), but only with how long the relaxation process takes. Now that we have introduced the energy density $f_\epsilon(E)$ and the function S(E, ω) that describes the observable, we are ready to express the density of relevant gaps in terms of these functions: Result 3 (Density of relevant gaps).
Given a Hamiltonian and an initial state with populations $|c_i|^2$, let A be an observable whose matrix elements in the Hamiltonian eigenbasis, $|A_{ij}|^2$, can be described by means of the smooth function S(E, ω) plus some fluctuations, as in equation (44). Then, up to errors O(εK), the density of relevant gaps can be written as
$$|\tilde g_\epsilon(\omega)|^2 \propto \rho_\epsilon(\omega) \int dE\, f_\epsilon(E)\, f_\epsilon(E + \omega)\, S(E, \omega). \qquad (46)$$
Furthermore, if the Hamiltonian is local and the initial state has a finite correlation length, such that the coarse-grained energy density $f_\epsilon(E)$ is the Gaussian (43), then
$$|\tilde g_\epsilon(\omega)|^2 \propto S(\omega)\, \rho_\epsilon(\omega)\, N_{\sqrt{2}\,\sigma_E}(\omega), \qquad (47)$$
where the function S(ω) is defined as
$$S(\omega) := \int dE\, N_{\sigma_E/\sqrt{2}}\!\big(E - \mu_E + \tfrac{\omega}{2}\big)\, S(E, \omega), \qquad (48)$$
with $\mu_E$ and $\sigma_E$ the mean and standard deviation of the energy density.
Proof. Let us introduce the density (49). By considering that S(E, ω) is the smooth description of the matrix elements |A_ij|², we get, up to errors of order εK, equation (51). Putting equations (49) and (51) together implies (46). Now, for initial states with a Gaussian energy density as in equation (43), simple algebra leads to the identity (52). Plugging (52) into (46) completes the proof.
The function S(ω) describes the average magnitude of the off-diagonal matrix elements |A_ij|² at a distance ω = E_i − E_j from the diagonal.
Result 3, and in particular equation (47), shows that the density of relevant gaps |g̃_ε(ω)|² for local Hamiltonians and initial states with decaying correlations decomposes into the product of two densities: S(ω)ρ_ε(ω) and a Gaussian 𝒩_{√2σ_E}(ω). The density 𝒩_{√2σ_E}(ω) is a Gaussian whose standard deviation is controlled by the energy fluctuations σ_E of the initial state, and S(ω)ρ_ε(ω) is the density of the off-diagonal elements of the observable A.
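To make the role of S(ω) concrete, the following sketch (entirely our own toy construction: the disordered spin chain, the disorder strength, the choice of site, and the bin width are all assumptions, not taken from the text) diagonalises a small chain and extracts a coarse-grained profile of the off-diagonal |A_ij|² as a function of the gap ω:

```python
import numpy as np

# Toy sketch (model and binning are our assumptions): extract the smooth
# profile S(omega) of an observable's off-diagonal elements |A_ij|^2 in the
# energy eigenbasis of a small disordered spin chain.
rng = np.random.default_rng(0)
n = 8
sx = np.array([[0., 1.], [1., 0.]])
sz = np.diag([1., -1.])

def embed(op, i):  # place a single-site operator on site i of the n-site chain
    return np.kron(np.kron(np.eye(2 ** i), op), np.eye(2 ** (n - i - 1)))

# Local Hamiltonian: nearest-neighbour couplings plus weak random z-fields.
H = sum(embed(sx, i) @ embed(sx, i + 1) + embed(sz, i) @ embed(sz, i + 1)
        for i in range(n - 1))
H += sum(rng.normal(0.0, 0.5) * embed(sz, i) for i in range(n))
E, U = np.linalg.eigh(H)

A = U.T @ embed(sx, n // 2) @ U          # local observable in the energy basis
omega = E[:, None] - E[None, :]          # gaps omega = E_i - E_j
offdiag = ~np.eye(2 ** n, dtype=bool)

# Coarse-grain: average |A_ij|^2 over bins of omega (positive gaps only).
bins = np.linspace(0.0, omega.max(), 40)
S = []
for a, b in zip(bins[:-1], bins[1:]):
    vals = np.abs(A[(omega >= a) & (omega < b) & offdiag]) ** 2
    S.append(vals.mean() if vals.size else 0.0)
S = np.array(S)
print(S[:5])
```

For a local observable, the binned profile is expected to decay with ω, in line with the discussion above.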
The dispersion of relevant gaps σ_G, which we expect to estimate the equilibration time, is then controlled by the smaller of the standard deviations of these two densities. When the system is globally out of equilibrium, e.g. after a global quench, the variance of the energy density of the initial state is extensive in the system size, so σ_E ∝ √n. This implies the following statement: Result 4 (Out-of-equilibrium observables in global quenches). Given a local Hamiltonian and an initial state with clustering of correlations, let A be an observable that can be described by a smooth function S(E, ω). Then, the only way to prevent the dispersion of relevant gaps σ_G associated with the observable A from diverging in the macroscopic limit is that the matrix representation of A in the energy basis is banded. More specifically, the density S(ω)ρ_ε(ω) must have a standard deviation σ_A that is independent of the system size, as in equation (53), where μ_A is its first moment.
Note that a divergent dispersion of relevant gaps σ_G is expected to imply an equilibration time that tends to zero in the thermodynamic limit. In other words, observables that are not banded in the energy representation are expected to be always equilibrated, since the amount of time they can spend out of equilibrium is negligible.
It is worth mentioning that generic observables have a flat S(ω) and fulfil σ_A ∝ √n, due to the domination of the density of gaps ρ_ε(ω). Thus, they have microscopically short equilibration times [20]. Observables with the property (53) turn out to be both rare and physically relevant.
In [39] it is shown for several concrete examples that the matrix elements S(ω) indeed decrease exponentially or super-exponentially with ω beyond a certain threshold independent of the system size (see also section 4.3.1.2 of

Result 5 (Local operators are banded in the energy basis). Let us consider a local Hamiltonian, as in (41), acting on a Hilbert space ℋ, with a locality structure given by a graph with vertex set V and edge set ℰ. Then the matrix elements ⟨E_i|A_x|E_j⟩ in the energy eigenbasis of a local operator A_x acting on a site x fulfil condition (54), where J is the strength of the local interactions and α is the lattice animal constant [40] of the graph (V, ℰ).
The proof of result 5 is presented in appendix C. Note that the lattice animal constant mentioned above is a parameter that captures the connectivity of the underlying graph of the Hamiltonian; for D-dimensional cubic lattices it can be bounded as α ≤ 2eD (lemma 2 in [40]). While finishing this manuscript we were alerted to the existence of a result very similar to our result 5, due to Arad et al (theorem 2.1 in [23]). Both proofs are similar in spirit and give similar decay rates; for a D-dimensional cubic lattice with interactions on the edges, they obtain a decay rate of the same order. The main difference is that, while we bound the number of terms by counting lattice animals, they use a combinatorial argument.
Of course, this banded behaviour in the energy basis extends to global operators that can be decomposed into a sum of local terms, as well as to operators that are local not in real space but in momentum space, provided the Hamiltonian is also local in the momentum representation. Note that, indeed, most observables considered in the literature are of this type.
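The bandedness claim of result 5 can be probed numerically. The sketch below is our own construction (the transverse-field Ising chain, the disorder, and the 0.8 ω_max cutoff are assumptions): it compares a single-site operator with a dense random observable in the same energy eigenbasis, where only the former should have negligible matrix elements far from the diagonal:

```python
import numpy as np

# Hypothetical check of result 5: in the eigenbasis of a local Hamiltonian a
# single-site operator is banded, while a generic nonlocal operator is not.
rng = np.random.default_rng(1)
n = 8
dim = 2 ** n
sx = np.array([[0., 1.], [1., 0.]])
sz = np.diag([1., -1.])

def embed(op, i):  # single-site operator on site i of the chain
    return np.kron(np.kron(np.eye(2 ** i), op), np.eye(2 ** (n - i - 1)))

# Local Hamiltonian: transverse-field Ising chain with weak random fields.
H = sum(embed(sz, i) @ embed(sz, i + 1) for i in range(n - 1))
H += sum((1.0 + 0.1 * rng.normal()) * embed(sx, i) for i in range(n))
E, U = np.linalg.eigh(H)

A_local = U.T @ embed(sx, n // 2) @ U          # local observable, energy basis
G = rng.normal(size=(dim, dim))
A_rand = U.T @ ((G + G.T) / 2) @ U             # dense nonlocal observable

# Compare the largest matrix elements far from the diagonal (large omega).
omega = np.abs(E[:, None] - E[None, :])
far = omega > 0.8 * omega.max()
print(np.abs(A_local[far]).max(), np.abs(A_rand[far]).max())
```

The local operator's far-from-diagonal elements come out many orders of magnitude below those of the random observable, in agreement with the exponential bound of result 5.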
Let us now consider the relevant scenario of a local quench [33], in which the system is brought out of equilibrium only in a local region. In such a case, the width of the energy density of the initial state is independent of the system size and related to the operator norm of the perturbation applied to the system. Unlike the global quench scenario, now even the observables that are not banded (and are initially out of equilibrium) will take a finite, non-negligible time to relax. The equilibration timescale is then governed by whichever energy scale is smallest: the energy fluctuations of the state, or the dispersion σ_A of the observable. Note that our results also allow for equilibration times that increase with the system size, as long as either σ_A or σ_E shrinks with it.

Numerical example: the XXZ model
We illustrate our results using the XXZ model in a transverse field and with next-nearest-neighbour coupling. We choose this particular model because it is not integrable, and hence does not have an exponential number of degenerate gaps. In figure 4 we plot the time signal |g_M(t)|² of the magnetisation observable M_x, in the sense of equation (5). The calculations were done using full exact diagonalization of H with Δ=0.5, J_2=1.0 and h_z=0.2, for various system sizes.
We expect g_M(t) to go to zero when the system equilibrates. Indeed, this is what happens initially, for all system sizes, and we can notice that the equilibration time (the time when |g_M(t)|² becomes negligible, just before Jt=20) does not depend much on n. Furthermore, we can compare this value with our heuristic estimate for the equilibration time, T_eq ≈ π/σ_G, where we use equation (15) to calculate σ_G from the numerically obtained eigenvalues. The results are shown in table 1. We can see that the estimated equilibration times also depend only weakly on n, and are in good agreement with the timescale indicated by figure 4.
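The estimate T_eq ≈ π/σ_G can be reproduced in a few lines. The sketch below is a toy stand-in, not the paper's exact computation: the chain, the couplings, and the |v_α|-weighted definition of σ_G are our reading of the discussion around equation (15), not a verbatim transcription.

```python
import numpy as np

# Hedged sketch of T_eq ~ pi / sigma_G for a small spin chain. The
# |v_alpha|-weighting of the gaps below is our reading of equation (15).
n = 8
sx = np.array([[0., 1.], [1., 0.]])
sz = np.diag([1., -1.])

def embed(op, i):
    return np.kron(np.kron(np.eye(2 ** i), op), np.eye(2 ** (n - i - 1)))

# XXZ-like chain with a tilted field (toy parameters, chosen to break
# integrability; these are NOT the Delta, J_2, h_z values of the text).
H = sum(embed(sx, i) @ embed(sx, i + 1) + 0.5 * embed(sz, i) @ embed(sz, i + 1)
        for i in range(n - 1))
H += sum(0.4 * embed(sx, i) + 0.3 * embed(sz, i) for i in range(n))
E, U = np.linalg.eigh(H)

M = sum(embed(sx, i) for i in range(n)) / n      # magnetisation observable
psi0 = np.zeros(2 ** n); psi0[0] = 1.0           # product initial state
c = U.T @ psi0                                   # energy-basis coefficients
A = U.T @ M @ U

# Amplitudes v_alpha = c_i c_j A_ij over pairs i != j, gaps G_alpha = E_i - E_j.
i, j = np.nonzero(~np.eye(2 ** n, dtype=bool))
v = c[i] * c[j] * A[i, j]
G = E[i] - E[j]

w = np.abs(v) / np.abs(v).sum()                  # normalised weights
sigma_G = np.sqrt((w * G ** 2).sum() - (w * G).sum() ** 2)
print("T_eq estimate ~ pi/sigma_G =", np.pi / sigma_G)
```

The weighted mean of the gaps vanishes by the α ↔ −α symmetry of the pairs, so σ_G reduces to the weighted spread of the gap distribution.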
Of course, due to the small size of the simulated systems, the time signals g_M(t) also exhibit strong fluctuations. However, one can already see that, as n increases, the size of these fluctuations tends to decrease, and their onset happens later. Our numerical results therefore seem to corroborate our expectation that the observable M_x does indeed equilibrate in the limit of large n, and that this equilibration happens on a timescale roughly given by T_eq ≈ π/σ_G. To better illustrate the dephasing mechanism behind the equilibration process, in figure 5 we plot the amplitudes v_α e^(iG_α t) for this same situation, in the case n=10. Starting from an initial condition where all the amplitudes are in phase (in this case, all real and negative), one can see them rotating at different speeds and becoming more spread out in the complex plane as time goes by, resulting in the decay seen in figure 4. Note that the approximate four-way symmetry exhibited at Jt=20 (implying g(t)≈0 at this time) is already a symptom of a future recurrence: after four times this interval, all of the amplitudes will have rotated approximately back to their initial positions. Indeed, one can see in figure 4 that a recurrence occurs at around Jt=80.
It is also instructive to compare the exact time evolution of the magnetisation with coarse-grained versions derived according to the procedures described in section 4. In figure 6 we again plot the exact time signal g_M(t) for the chain with n=12 spins (the dark blue line in figure 4), represented here by the black dotted curve. We also plot two coarse-grained time signals g_M^ε(t), with ε=0.4 (red line) and ε=0.02 (blue line). These curves were obtained by Fourier transforming coarse-grained frequency spectra (such as the one in figure 3) that were numerically calculated according to equation (26) (i.e., we did not simply dampen the exact time signal using equation (30)). It can be seen that all three signals are essentially indistinguishable up to the equilibration time (the slight deviation close to t=0 is an artifact of our having discarded terms with v_α < 10⁻⁴ when calculating the sum in equation (26)). We can see that, for small ε (=0.02), the signals remain indistinguishable up to and including the recurrence time. However, by choosing a value of ε that is sufficiently large (0.4), the coarse-grained signal faithfully reproduces the exact one during the equilibration phase, but suppresses later recurrences. Note that ε must still be chosen sufficiently small (ε ≲ σ_G) in order to avoid suppressing the signal even before equilibration has occurred. As discussed in previous sections, under these circumstances we may use the width of the (square-integrable) coarse-grained signal as a measure of the equilibration time of the original signal.

Table 1. Estimated equilibration times T_eq for the XXZ model with next-nearest-neighbour coupling and an external magnetic field (Δ=0.5, J_2=1.0 and h_z=0.2). The gap dispersion σ_G was obtained by explicitly calculating equation (15) from the numerically obtained energy spectrum.
Finally, let us remark that we also obtained similar results for other values of Δ and h z , and also for the XY model. However, in the latter case, the fluctuations do not decrease exponentially with n, but only polynomially, since there are an exponential number of degenerate gaps.

Reinterpretation of previous results
It is useful now to reinterpret some previous results on equilibration times from our dephasing point of view. For example, Short and Farrelly [15] obtain a rigorous upper bound for the equilibration time of any observable by studying the time-averaged fluctuations ⟨|g|²⟩_T in equation (8) above. They determine a value T_0 ∼ d_E/ΔE, where ΔE is the range of energies in the system and d_E the number of distinct energy levels, such that averages taken over intervals longer than T_0 are negligible. This implies that the equilibration time must be upper bounded by T_0. Unfortunately, for a typical many-body system with n degrees of freedom, ΔE scales only polynomially with n, while d_E scales exponentially with n, and thus so does T_0. In other words, although this upper bound is mathematically sound, it vastly overestimates the actual equilibration time of most observables. This suggests that its derivation must be incomplete, in the sense of missing or disregarding an essential physical ingredient [20].
We now argue that this ingredient is, in a word, dephasing. Roughly speaking, in the course of their derivation, the authors bound ⟨|g|²⟩_T by separately bounding the absolute value of every term in its Fourier expansion, disregarding the interference between different terms due to dephasing. Each of these terms, which rotates as e^(i(G_α−G_β)t), does get individually dephased by the time averaging, but only on a timescale t_αβ ∼ 1/(G_α − G_β). The bound in [15] hence corresponds to the amount of time needed for the slowest term in the Fourier sum to average out over time. For many-body systems, the gaps G_α, and also the differences between different gaps, can be exponentially small in n, hence the exponentially long upper bound.

Figure 6. Coarse-grained time signals calculated according to equation (26), with ε=0.4 (red) and ε=0.02 (blue). By choosing a value of ε that is sufficiently large (but not too large), the coarse-grained signal reproduces the exact one during the equilibration phase, but suppresses later recurrences.
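The gap between term-by-term averaging and collective dephasing is easy to see in a toy sum of rotating unit vectors (entirely our own construction; N and σ are arbitrary). The coherent sum decays on the scale 1/σ, while the slowest pairwise difference, which sets the term-by-term bookkeeping above, corresponds to a vastly longer timescale:

```python
import numpy as np

# Toy illustration (our own construction) of dephasing: N unit phases rotating
# at Gaussian-distributed speeds G_alpha. The coherent sum decays on the scale
# 1/sigma, even though the smallest differences |G_a - G_b|, which set the
# term-by-term bound of [15], are vastly smaller.
rng = np.random.default_rng(3)
N, sigma = 2000, 1.0
G = rng.normal(0.0, sigma, N)

def g(t):  # normalised sum of dephasing unit vectors, g(0) = 1
    return np.abs(np.exp(1j * G * t).sum()) / N

# Collective dephasing time ~ 1/sigma; slowest pairwise timescale ~ N^2/sigma.
d = np.abs(np.subtract.outer(G, G))
slowest = 1.0 / d[d > 0].min()
print(g(0.0), g(5.0 / sigma), "slowest-term timescale:", slowest)
```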
Although this bound is therefore much too large to be a reasonable estimate of the equilibration time of most observables, it must be stressed that one can always construct a specific observable A that saturates it. This is not entirely unexpected, as the bound itself is observable-independent. In fact, in [20], Goldstein et al construct such an observable by considering a direct sum of banded matrices in the energy basis, each of which has a bandwidth that is exponentially small in the system size. From the dephasing mindset, it is straightforward to understand what is going on in this example. Since each component of A is banded with an (exponentially) narrow bandwidth, and there are no coherences between different bands, the dispersion of relevant gaps σ_G is exponentially small. In this case, our estimate π/σ_G for the equilibration time becomes exponentially large: the small spread in angular speeds means that it takes a long time until all points are dephased and isotropically distributed in the complex plane. Analogously, in [20] it is also shown that one can always construct an observable that equilibrates extremely fast, by defining an A that is far from banded, having coherences between vastly different energies. Again, the dephasing picture intuitively explains the reason for such quick equilibration.
We can also understand some of the results obtained in another approach to the problem of relaxation of many-body systems, namely the study of the survival probability given by the quantum fidelity F(t) ≔ |⟨ψ(0)|ψ(t)⟩|², often in situations where the initial state |ψ(0)⟩ is generated by a sudden perturbation ('quantum quench') that brings the system out of equilibrium (see [19] and references therein for a review). Note that the quantum fidelity is, up to an additive constant, equivalent to the time evolution of the observable A = |ψ(0)⟩⟨ψ(0)|. In this case, then, both the energy fluctuations of the initial state and the bandwidth of the observable written in the energy basis are given by σ_E. Hence, our considerations within the dephasing picture predict, for local interacting lattice systems, an equilibration time determined by σ_E, in agreement with the results of [19].
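As a minimal numerical illustration (our own toy spectrum; the uniform level density and Gaussian populations are modelling assumptions), the survival probability of a state with a Gaussian energy density of width σ_E indeed decays on a timescale of order 1/σ_E:

```python
import numpy as np

# Sketch (our construction): survival probability of a state with a Gaussian
# energy density of width sigma_E over a dense toy spectrum. The decay time is
# set by 1/sigma_E, matching the dephasing estimate for A = |psi(0)><psi(0)|.
rng = np.random.default_rng(4)
d, sigma_E = 4000, 1.0
E = np.sort(rng.uniform(-6, 6, d))                        # dense toy spectrum
p = np.exp(-E ** 2 / (2 * sigma_E ** 2)); p /= p.sum()    # Gaussian populations

def F(t):  # survival probability F(t) = |sum_k p_k e^{-i E_k t}|^2
    return np.abs(np.sum(p * np.exp(-1j * E * t))) ** 2

print(F(0.0), F(3.0 / sigma_E))  # decays on the timescale ~ 1/sigma_E
```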
Finally, our outlook and conclusions are also compatible with, and in some senses complementary to, recent remarkable results by Reimann et al [17, 18]. In these works, the time signal g(t) in equations (5) and (6) above is rewritten as g(t) = cF(t) + ξ(t), where (in our notation) F(t) depends only on the gap spectrum and on the dimension of the system, but not on the initial condition, the observable or the eigenbasis of the Hamiltonian, and c is a constant (= g(0)). It is then proven that, if one averages the time signal over certain ensembles of Hamiltonians with fixed spectra (i.e. varying only their eigenvectors), then both the average value and the standard deviation of ξ(t) become extremely small in the thermodynamic limit. In other words, for any fixed initial condition and observable of a quantum system, and any given fixed energy spectrum, the 'typical' time signal will always be of the form g(t) = cF(t), up to negligible error. Comparing with equation (6) above, we can see that the effect of the ensemble average is to uniformise the different amplitudes v_α, both in modulus and in phase. As a result, the 'typical' equilibration dynamics described by equation (56) becomes a pure dephasing process, i.e. a sum of uniform-length vectors in the complex plane, all initially pointing along the positive real axis, which then rotate at different speeds. As we have argued above, dephasing/equilibration should then occur on a timescale of order π/σ_G. Indeed, this can be seen in the examples worked out in these references. For instance, one situation considered in [17] assumes an exponential profile c e^(yE) for the spectral weights, where y > 0 and c is a normalisation constant. It is also assumed that the system's initial state has support restricted to a narrow energy band [E−ΔE, E]. In this case F(t) can be calculated exactly; in particular, it is shown that, for ΔE ≫ k_B T, it has the form of a Lorentzian function of width γ = k_B T/ℏ. Let us now analyse this result from the point of view of our approach.
First of all, as we have noted in section 3, restricting the initial state to a finite energy band, and therefore to a finite gap spectrum, ensures that, for all observables, dephasing/equilibration must occur on a finite timescale.
Proof. By using the triangle inequality twice, we have |g(t) − g′(t)| ≤ |g(t) − g_ε(t)| + |g_ε(t) − g′_ε(t)| + |g′_ε(t) − g′(t)| (61). The first and third terms are analogous, and are bounded by equation (30) together with the fact that |g(t)| ≤ 1. Concerning the second term, we note that, since g̃_ε ∈ L², the uniform continuity statement of functional analysis allows us to bound it by the supremum distance between the coarse-grained frequency signals, as in (62). Putting everything together, we obtain the bound (63). Finally, the bound (60) is respected for times up to order ε⁻¹; thus, equation (60) is guaranteed to hold for such times.
Result 6 shows that two frequency signals that become very similar once they are ε-coarse-grained have indistinguishable dynamics up to times of order ε⁻¹. This is particularly relevant in many-body systems, where the separation between consecutive energy levels shrinks exponentially with the system size. One can then consider the possibility of two many-body Hamiltonians with qualitatively different level statistics (for example, one where the distribution of gaps E_{i+1} − E_i between consecutive energy levels follows a Poisson distribution, and another with Wigner-Dyson statistics) giving rise to time signals that are nevertheless indistinguishable in practice for times up to and beyond the equilibration time.
In appendix D we present an example of this kind. We estimate the one-norm distance between two coarse-grained frequency signals evolving according to two Hamiltonians with identical eigenbases but different level statistics (one Poissonian and the other Wigner-Dyson). For this case we find the bound (64), where C is a numerical constant, n is the system size, ε the coarse-graining parameter and d_eff the effective dimension. Note now that, assuming that d_eff increases exponentially in n (d_eff ∼ exp(cn), for some constant c), choosing ε ∼ exp(−cn/4) in equation (64) and using result 6 implies that the dynamics would not be affected up to times t ∼ exp(cn/4). (Here we are also assuming, as in the discussion following equation (38), that ε ≫ E_{i+1} − E_i, where the level spacings also decrease exponentially; this requires choosing d_eff to increase sufficiently slowly, i.e. c sufficiently small.) If these conditions are met, we obtain time signals originating from two Hamiltonians with different level statistics that for all practical purposes display the same dynamical behaviour.
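The indistinguishability claim can be checked with synthetic spectra. The sketch below is a toy construction of ours: the spacing distributions are sampled directly, the weights are taken equal (v_α = 1/d), and the Gaussian damping of equation (30) stands in for the coarse-graining.

```python
import numpy as np

# Toy check: two spectra with the same mean level density (unit spacing) but
# different level statistics, Poissonian vs Wigner-Dyson, give coarse-grained
# time signals that agree for times up to ~ 1/eps.
rng = np.random.default_rng(5)
d = 5000
E1 = np.cumsum(rng.exponential(1.0, d))                   # Poisson spacings
u = rng.uniform(size=d)
E2 = np.cumsum(np.sqrt(-4.0 * np.log(1.0 - u) / np.pi))   # Wigner surmise
E1 -= E1.mean(); E2 -= E2.mean()

eps = 20.0  # coarse-graining width, much larger than the mean level spacing

def g_cg(E, t):  # equal-weight signal, Gaussian-damped as in equation (30)
    return np.mean(np.exp(1j * E * t)) * np.exp(-eps ** 2 * t ** 2 / 2)

ts = np.linspace(0.0, 1.0 / eps, 50)
diff = max(abs(g_cg(E1, t) - g_cg(E2, t)) for t in ts)
print("max signal difference up to t ~ 1/eps:", diff)
```

The residual difference is dominated by finite-size fluctuations of order 1/√d, and vanishes as the spectra grow dense, in line with the discussion above.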
Recall now that it is a well-known conjecture that a Poissonian nearest-neighbour gap distribution is a manifestation of integrability, and that Wigner-Dyson statistics are a signature of quantum chaos. Our example therefore seems to show that it is possible for both kinds of Hamiltonian to lead to identical time signals, with identical equilibration times, at least in some specific cases.
It is less clear what will happen in a more realistic example in which the two Hamiltonians H_1 and H_2 with different level statistics also have different eigenbases (as is in practice always the case). In these situations the two Hamiltonians are related by a perturbation V that does not commute with the integrable Hamiltonian H_1, where both H_1 and V have a locality structure of the type in (41). In such a scenario, the coarse-grained energy density of the initial state is not affected by the perturbation, since the Hamiltonian remains local. Thus, any change in the one-norm distance between frequency signals, and thereby in the dynamical behaviour of the system, must come from a drastic change in the matrix elements of the observable. A question that arises, beyond the scope of this paper, is then how integrability, non-integrability and chaos can be identified in the behaviour of the matrix elements of the observable in the energy basis.

The dephasing mindset for quadratic Hamiltonians
A relevant point to discuss is to what extent the results presented in this paper are valid for integrable models. In this respect, let us focus on quadratic (bosonic or fermionic) Hamiltonians that can be brought to the diagonal form H = Σ_k ε(k) a_k† a_k, where a_k (a_k†) are the annihilation (creation) operators fulfilling fermionic or bosonic commutation relations, and ε(k) is the dispersion relation. For such systems and quadratic observables, the time signal can also be written in the form of equation (6), where the sum now runs not over the gaps of the Hamiltonian but over the gaps of the dispersion relation ε(k). In [41] this is done in detail for the Caldeira-Leggett model (a quadratic system of harmonic oscillators). In sum, we see that the above formalism can also be applied to quadratic Hamiltonians, where, roughly speaking, the Hilbert space has been substituted by the space of modes.
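For a concrete quadratic example (our own choice: a tight-binding ring, with N and J arbitrary), the dephasing sum over the single-particle dispersion ε(k) = −2J cos k controls, for instance, the return probability of an initially localised particle:

```python
import numpy as np

# Sketch: for a quadratic (tight-binding) Hamiltonian the dephasing sum runs
# over gaps of the dispersion relation eps(k) = -2J cos k rather than over
# many-body gaps. Here: return probability of a particle initially at site 0.
N, J = 400, 1.0
k = 2 * np.pi * np.arange(N) / N
eps = -2 * J * np.cos(k)                 # single-particle dispersion relation

def p_return(t):  # |<0|e^{-iHt}|0>|^2 = |(1/N) sum_k e^{-i eps(k) t}|^2
    return np.abs(np.mean(np.exp(-1j * eps * t))) ** 2

print(p_return(0.0), p_return(10.0))
```

The decay happens on a timescale set by the bandwidth 4J, independent of N, illustrating the mode-space version of the dephasing mechanism.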

Conclusions
In this work we have argued that equilibration in closed quantum systems should be understood as a process of dephasing of complex numbers in the complex plane. From this mechanism, we have heuristically estimated the equilibration timescale as roughly the inverse of the dispersion of the relevant gaps. We have seen that, under physically relevant circumstances, the equilibration timescale estimated in this way depends at most weakly on the system size, in agreement with realistic situations. Although our argument does not result in a rigorous bound, we claim that it captures the correct way in which the timescale depends on the physical properties of the system. In particular, we have seen that the coherences ⟨E_i|A|E_j⟩ of the observables of interest in the energy basis play a fundamental role: in order to attain a finite equilibration time for generic initial states, these coherences must become small as |E_i − E_j| increases.
We have also observed that the size of the system plays a role only in the typical size of the fluctuations, not in the equilibration time; thus, small systems fail to equilibrate not because their equilibration time is long, but because their fluctuations are large. We illustrate these results with numerical simulations of spin chains.
Finally, we have applied the dephasing mindset to give an intuitive interpretation of earlier works on equilibration times. Our results satisfactorily reproduce many particular determinations of equilibration timescales found in the literature. Of course, further work is still required to put our claims on a more rigorous foundation. For example, we conjecture that, in a sufficiently wide range of locally interacting n-body systems, the coarse-grained frequency signal g̃_ε(ω) may itself approach a Gaussian in the limit of large n. In this case, the Heisenberg-like uncertainty principle we have been using to heuristically estimate the equilibration timescale would become close to saturated, and would therefore indeed be a bona fide measure of it.

A.2. Effective dimension
The effective dimension tells us how many eigenstates of the Hamiltonian contribute to the superposition forming the initial state. In the previous section we argued that the energy uncertainty of the initial state grows with the system size when the system is brought out of equilibrium by a global quench, and is independent of the system size when the system undergoes a local quench. Local Hamiltonians have an energy range that scales linearly with the system size, while the dimension of the Hilbert space grows exponentially. This implies that the density of states grows exponentially, and so does the effective dimension.
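This scaling argument can be made concrete with a toy spectrum (entirely our construction; the Gaussian density of states with width ∝ √n and the Gaussian populations of the same width are modelling assumptions for the global quench case):

```python
import numpy as np

# Toy model: 2^n levels with a Gaussian density of states of width ~ sqrt(n),
# populated by a Gaussian of the same width (global quench). The effective
# dimension d_eff = 1 / sum_i |c_i|^4 then grows with the Hilbert space.
rng = np.random.default_rng(6)
deff = {}
for n in (8, 10, 12):
    d = 2 ** n
    E = rng.normal(0.0, np.sqrt(n), d)        # sampled Gaussian density of states
    p = np.exp(-E ** 2 / (2.0 * n))           # populations |c_i|^2, width sqrt(n)
    p /= p.sum()
    deff[n] = 1.0 / np.sum(p ** 2)
print(deff)  # d_eff grows roughly like the Hilbert-space dimension 2^n
```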
The number of lattice animals of size l is upper-bounded by means of the so-called lattice animal constant α: given a graph (V, ℰ) and denoting by a_l the number of lattice animals of size l containing the fixed vertex x, the animal constant α is the smallest constant satisfying a_l ≤ α^l. In order for the series to converge, we require β to be sufficiently small. We then compute the absolute value of the matrix element ⟨E_i|·|E_j⟩ in equation (C.1). In order to get an explicit bound independent of β, we optimise over β: given some energy difference E_i − E_j, we look for the β that minimises the upper bound of equation (C.14). To do so, it is useful to rewrite the bound in terms of a new parameter z.

Appendix D. One-norm distance between coarse-grained frequency signals of two systems which only differ in their level statistics

Two Hamiltonians H^(1) and H^(2) with identical eigenstates but eigenvalues with different level statistics give rise, for equal initial states and observables, to different time signals. Note that the standard deviation increases with the separation of the energy levels. This is indeed the behaviour of δG_α if both s_k^(1) and s_k^(2) are independent random variables. However, in our example the spectrum E_k^(2) is built from E_k^(1) with the single goal of changing its level statistics. This can be done by shifting the energy levels by an amount independent of the separation between them. One way to achieve this is to group the energy levels into sets of L consecutive levels, where j labels the different sets; the first and last energy levels of every set are kept fixed, and the other levels are shifted according to the other level statistics. In such a case, the random variable δG_α is bounded by a constant independent of the system size. One strategy to bound the one-norm distance between the coarse-grained frequency signals is then to use the triangle inequality as in equation (D.12).