Eigenstate thermalization: Deutsch's approach and beyond

The eigenstate thermalization hypothesis (ETH) postulates that the energy eigenstates of an isolated many-body system are thermal, i.e., each of them already yields practically the same expectation values as the microcanonical ensemble at the same energy. Here, we review, compare, and extend some recent approaches to corroborate this hypothesis and discuss the implications for the system's equilibration and thermalization.


Introduction
The relaxation of a macroscopic many-body system towards thermal equilibrium is a very common phenomenon, but has still not been satisfactorily understood theoretically [1,2]. In particular, isolated systems and their textbook description at equilibrium by a microcanonical ensemble [3] have recently regained considerable attention [4-6]. An immediate first puzzle is the mere fact that the system apparently approaches a steady long-time limit, although the quantum mechanical time evolution of a non-equilibrium initial state is well known not to become asymptotically time-independent (see e.g. Sect. 2 below). As one possible way out, one could, for instance, try to show that after a sufficiently long "equilibration time", the expectation values of pertinent observables become "practically constant" (fluctuations remain below any reasonable resolution limit) for "practically all" later times (exceptions do exist, e.g. due to quantum revivals, but are exceedingly rare). Indeed, results of this type have been established under fairly weak and plausible assumptions about the initial state, the Hamiltonian, and the observables of the considered system [2,7-12]. As a natural next step, quantitative estimates of the above mentioned equilibration times are currently attracting increasing interest [10,13-16]. This is a very important but also quite difficult issue in its own right, which goes beyond the scope of our present paper. Here, we will rather focus on another natural next issue, named thermalization: Given that the expectation value of an observable equilibrates in the above sense, how well does this long-time limit agree with the corresponding microcanonical expectation value, as predicted by equilibrium statistical mechanics?
A sufficient condition for such good agreement is the so-called eigenstate thermalization hypothesis (ETH), essentially postulating that the expectation values of pertinent observables exhibit negligible variations for all energy eigenstates with sufficiently close energy eigenvalues. This hypothesis has its roots in closely related conjectures by Berry and Voros about the energy eigenstates of (fully) chaotic systems in the semiclassical limit, see e.g. Eq. (9) in [17] and Eq. (6.17) in [18]. Their implications for the (diagonal as well as off-diagonal) matrix elements in energy representation for observables with a well-behaved classical limit were further explored by Feingold and coauthors [19-21]. The key role of ETH for thermalization in high dimensional chaotic systems in the semiclassical regime was first recognised by Srednicki ‡ [22-24]. Even earlier, its actual validity was numerically (and implicitly) exemplified and adopted as an explanation for the observed thermalization in a spin-chain model by Jensen and Shankar [25]. More recently, the seminal paper by Rigol, Dunjko, and Olshanii [26] introduced the term ETH, pinpointed its importance for thermalization, and stimulated numerous, predominantly numerical studies on the validity of ETH for a large variety of specific models (mostly spin-chain- or Hubbard-like), initial conditions (often involving some quantum quench), and observables (mainly few-body or local), see e.g. [27-40].
‡ It may be worth noting that in those semiclassical studies [17-24] the term "microcanonical ensemble" is used quite differently than in the present paper (see Sect. 3), namely referring to classical phase space averages over an infinitely thin energy surface.
Mathematically, the validity of ETH could be demonstrated so far only in special cases, namely for the eigenfunctions of the Laplace operator on an arbitrary dimensional compact Riemannian manifold whose geodesic flow is ergodic. If also the considered observables are sufficiently well-behaving, then ETH can be proven to hold for the vast majority of all eigenfunctions with asymptotically large eigenvalues [41][42][43].
Another analytical key work is due to Deutsch, implicitly verifying ETH for the vast majority of systems whose Hamiltonians have been sampled according to a certain random matrix ensemble [44-46]. Here, we generalise this approach by Deutsch and unravel its close connection with other recent explorations of thermalization, especially by Goldstein and coworkers [5,13,16,47-49].

Equilibration
We consider a large (macroscopic but finite), isolated system, modelled in terms of a Hamiltonian $H$ with eigenvalues $E_n$ and eigenvectors $|n\rangle$, where $n \in \mathbb{N}$ and $E_{n+1} \geq E_n$. System states, either pure or mixed, are described by density operators $\rho(t)$, evolving according to $\rho(t) = U_t \rho(0) U_t^\dagger$ with propagator $U_t := \exp\{-iHt\}$ and $\hbar = 1$. It follows that $\rho_{mn}(t) := \langle m|\rho(t)|n\rangle$ is given by $\rho_{mn}(0) \exp[-i(E_m - E_n)t]$, i.e., unless the system was already in a steady state initially, it remains time-dependent forever. In other words, non-equilibrium initial states do not seem to "equilibrate" towards a steady long-time limit in an obvious way.
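The statement that populations are conserved while coherences keep oscillating can be checked numerically. The following is a minimal sketch, assuming a small hypothetical spectrum and a random pure initial state (the dimension, the spectrum, and the helper name `evolve_in_eigenbasis` are illustrative choices, not taken from the paper).

```python
import numpy as np

def evolve_in_eigenbasis(rho0, energies, t):
    """Return rho_mn(t) = rho_mn(0) * exp(-i (E_m - E_n) t), with hbar = 1."""
    phases = np.exp(-1j * (energies[:, None] - energies[None, :]) * t)
    return rho0 * phases

rng = np.random.default_rng(0)
dim = 6                                          # illustrative dimension
energies = np.sort(rng.uniform(0.0, 1.0, dim))   # hypothetical spectrum E_n

# Random pure initial state rho(0) = |psi><psi| in the energy eigenbasis.
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())

rho_t = evolve_in_eigenbasis(rho0, energies, t=12.3)

# Diagonal elements (level populations) never change ...
populations_conserved = np.allclose(np.diag(rho_t), np.diag(rho0))
# ... while the off-diagonal elements keep rotating in the complex plane.
coherences_changed = not np.allclose(rho_t, rho0)
```

Only the off-diagonal phases depend on time, which is why any approach to a steady state has to be understood at the level of expectation values rather than of $\rho(t)$ itself.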
Observables are represented by self-adjoint operators $A$ with expectation values $\mathrm{Tr}\{\rho(t)A\}$. In order to model real experimental measurements it is, however, not necessary to admit any arbitrary self-adjoint operator [50-57]. Rather, it is sufficient to focus on experimentally realistic observables in the following sense [7,58]: Any observable $A$ must represent an experimental device with a finite range of possible outcomes of a measurement,

$$\Delta_A := a_{\max} - a_{\min} , \qquad (1)$$

where $a_{\max}$ and $a_{\min}$ are the largest and smallest eigenvalues of $A$. Moreover, this working range $\Delta_A$ of the device must be limited to experimentally reasonable values compared to its resolution limit $\delta A$. Indeed, real measurements usually yield at most 10-20 relevant digits, i.e. it is sufficient to consider range-to-resolution ratios $\Delta_A/\delta A \leq 10^{20}$. Next we define for any given $\delta A > 0$ and $T > 0$ the quantity

$$T_{\delta A} := \left| \{ 0 < t < T \, : \, |\mathrm{Tr}\{\rho(t)A\} - \mathrm{Tr}\{\rho_{\rm eq}A\}| \geq \delta A \} \right| , \qquad (2)$$

where $|\{...\}|$ denotes the size (Lebesgue measure) of the set $\{...\}$ and where the time-independent, so-called equilibrium or diagonal ensemble $\rho_{\rm eq}$ is defined as the diagonal part of $\rho(0)$, i.e. $(\rho_{\rm eq})_{mn} := \delta_{mn} \rho_{nn}(0)$. As detailed e.g. in [11,12], one then can show that

$$\frac{T_{\delta A}}{T} \leq 2 \left( \frac{\Delta_A}{\delta A} \right)^2 \max_n \rho_{nn}(0) \qquad (3)$$

for all sufficiently large $T$. For the sake of simplicity, we also have taken here for granted that the energy gaps $E_m - E_n$ are finite and mutually different for all pairs $m \neq n$. Generalisations have been worked out e.g. in [9-12]. According to (2), the left hand side of (3) represents the fraction of all times $t \in [0, T]$ for which there is an experimentally resolvable difference between the true expectation value $\mathrm{Tr}\{\rho(t)A\}$ and the time-independent equilibrium expectation value $\mathrm{Tr}\{\rho_{\rm eq}A\}$. On the right hand side, $\Delta_A/\delta A$ is the above mentioned range-to-resolution ratio and $\max_n \{\rho_{nn}(0)\}$ represents the largest occupation probability of an energy eigenstate (note that the $\rho_{nn}(t)$ are conserved quantities).
For a macroscopic $N$-body system there are roughly $10^{O(N)}$ energy eigenstates with eigenvalues in every interval of 1 J beyond the ground state energy [3]. Since $N = O(10^{23})$, the energy levels are thus unimaginably dense and even the most careful experimentalist will not be able to populate only a few of them with significant probabilities $\rho_{nn}(0)$. In the generic case we thus expect [7,58] that, even if the system's energy is fixed up to an extremely small experimental uncertainty, and even if the energy levels are populated extremely unequally, the largest population $\rho_{nn}(0)$ will still be extremely small (compared to $\sum_n \rho_{nn}(0) = 1$), overwhelming by far the factor $(\Delta_A/\delta A)^2$ on the right hand side of (3).
Since the level populations $\rho_{nn}(0)$ are the result of the system preparation, a more detailed understanding and quantification of those terms necessarily requires the modelling of such a preparation procedure. We come back to this point in Sect. 7, where arguments will be provided that

$$\max_n \rho_{nn}(0) = 10^{-O(N)} \qquad (4)$$

can be expected in many cases. From (3) together with (4) we can conclude that the system generically equilibrates in the sense that it behaves in every possible experimental measurement exactly as if it were in the equilibrium state $\rho_{\rm eq}$ for the overwhelming majority of times within any sufficiently large time interval $[0, T]$.
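The notion of equilibration used above can be made concrete with a small numerical experiment. The sketch below is purely illustrative: it assumes a hypothetical 200-level spectrum, a random pure initial state, and a random real symmetric observable rescaled to unit range, and it estimates the fraction of sampled times at which $\mathrm{Tr}\{\rho(t)A\}$ deviates from $\mathrm{Tr}\{\rho_{\rm eq}A\}$ by at least $\delta A$. It does not test any specific prefactor of an analytical bound.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 200                                         # illustrative number of levels
energies = np.sort(rng.uniform(0.0, 1.0, dim))    # hypothetical non-degenerate spectrum

# Random pure initial state, written in the energy eigenbasis.
psi0 = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi0 /= np.linalg.norm(psi0)
populations = np.abs(psi0) ** 2                   # rho_nn(0), conserved in time

# Random real symmetric observable, rescaled so that its range Delta_A equals 1.
G = rng.normal(size=(dim, dim))
A = (G + G.T) / 2
eigs = np.linalg.eigvalsh(A)
A /= eigs[-1] - eigs[0]
delta_A = 0.1                                     # assumed resolution limit

# Equilibrium (diagonal ensemble) expectation value Tr{rho_eq A}.
A_eq = float(np.sum(populations * np.diag(A)))

# Expectation value Tr{rho(t) A} on a grid of times in [0, T].
T = 1.0e4
times = np.linspace(0.0, T, 2000)
psi_t = psi0[None, :] * np.exp(-1j * times[:, None] * energies[None, :])
A_t = np.sum(psi_t.conj() * (psi_t @ A), axis=1).real   # uses A = A^T

# Fraction of sampled times with a resolvable deviation from equilibrium.
fraction = float(np.mean(np.abs(A_t - A_eq) >= delta_A))
```

For this toy setup the largest population is of order $\ln(\mathrm{dim})/\mathrm{dim}$, so resolvable deviations are already rare; for a macroscopic system the populations are smaller by many orders of magnitude.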

Thermalization
Next we address the question whether, and to what extent, the above discussed equilibrium expectation value Tr{ρ eq A} is in agreement with the corresponding microcanonical expectation value, as predicted by the textbooks on equilibrium statistical mechanics for our isolated N-body system at hand [3].
To begin with, $I_{\rm mic} := [E-\Delta E, E]$ denotes the usual microcanonical energy window about the (approximately known and thus preset) system energy $E$, whose width $\Delta E$ is macroscopically small (below the experimental resolution limit) but microscopically large (much larger than the typical energy level spacing $E_{n+1} - E_n$). The number of energy eigenvalues $E_n$ contained in $I_{\rm mic}$ is denoted as $D$ and is typically very large. The corresponding microcanonical ensemble is given by

$$\rho_{\rm mic} := \frac{1}{D} \sum_{\rm mic} |n\rangle\langle n| , \qquad (5)$$

where the sum $\sum_{\rm mic}$ runs over all $n$ with $E_n \in I_{\rm mic}$. In other words, $\rho^{\rm mic}_{nn} = 1/D$ if $E_n \in I_{\rm mic}$ and $\rho^{\rm mic}_{nn} = 0$ otherwise. Hence, the expectation value of $A$ in the microcanonical ensemble takes the form

$$\mathrm{Tr}\{\rho_{\rm mic} A\} = \frac{1}{D} \sum_{\rm mic} A_{nn} . \qquad (6)$$

On the other hand, recalling that $(\rho_{\rm eq})_{mn} := \delta_{mn} \rho_{nn}(0)$ implies

$$\mathrm{Tr}\{\rho_{\rm eq} A\} = \sum_n \rho_{nn}(0) A_{nn} . \qquad (7)$$

As usual, we henceforth assume that the system is experimentally prepared at the preset macroscopic energy $E$, i.e. also the $\rho_{nn}(0)$'s are negligibly small for energies $E_n$ outside $I_{\rm mic}$. However, within $I_{\rm mic}$ the actual populations $\rho_{nn}(0)$ are still largely unknown and cannot be controlled by the experimentalist. In general we therefore have to admit the possibility that they vary considerably in a largely unknown (pseudo-random) fashion even between neighbouring $n$'s. The problem of thermalization thus amounts to showing that the difference between (6) and (7) is negligible in spite of the lack of knowledge about the $\rho_{nn}(0)$'s.
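The comparison between the microcanonical and the diagonal-ensemble expectation values can be sketched in a few lines. The example below assumes a hypothetical equidistant spectrum, a smooth made-up profile for the diagonal elements $A_{nn}$, and pseudo-random populations confined to the energy window; the function names are illustrative. Since both averages are weighted means of the same $A_{nn}$'s, their difference cannot exceed the spread of $A_{nn}$ inside the window.

```python
import numpy as np

def microcanonical_average(A_diag, energies, E, dE):
    """(1/D) * sum of A_nn over all n with E_n in [E - dE, E]; also returns D."""
    in_window = (energies >= E - dE) & (energies <= E)
    D = int(np.count_nonzero(in_window))
    return float(A_diag[in_window].mean()), D

def diagonal_average(A_diag, populations):
    """sum_n rho_nn(0) * A_nn for the equilibrium (diagonal) ensemble."""
    return float(np.dot(populations, A_diag))

rng = np.random.default_rng(2)
energies = np.linspace(0.0, 1.0, 10_001)   # hypothetical spectrum
A_diag = np.sin(energies)                  # hypothetical smooth profile of A_nn

E, dE = 0.5, 0.01
in_window = (energies >= E - dE) & (energies <= E)

# Pseudo-random, strongly unequal populations restricted to the window.
populations = np.where(in_window, rng.exponential(size=energies.size), 0.0)
populations /= populations.sum()

A_mic, D = microcanonical_average(A_diag, energies, E, dE)
A_eq = diagonal_average(A_diag, populations)

# Spread of A_nn inside the window: an upper bound on |A_eq - A_mic|.
spread = float(A_diag[in_window].max() - A_diag[in_window].min())
```

The smaller the variation of the $A_{nn}$ within the window, the less the unknown populations matter.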

Eigenstate thermalization hypothesis (ETH)
As mentioned in the introduction, the ETH consists in the surmise that the expectation values $A_{nn}$ of an observable $A$ hardly differ for eigenstates $|n\rangle$ of a many-body Hamiltonian $H$ with sufficiently close energy eigenvalues $E_n$ [22-24, 26, 44, 45]. In particular, if the variations of the $A_{nn}$'s are negligible over the entire microcanonical energy window $I_{\rm mic}$, then the (approximate) equality of (6) and (7) follows immediately. In this sense, ETH is a sufficient (but not necessary) condition for thermalization.
Similarly as for the microcanonical ensemble in (6), ETH also implies the equivalence of $\rho_{\rm eq}$ in (7) with a large variety of other pure or mixed steady states, whose level populations are mainly concentrated within the energy window $I_{\rm mic}$. On the one hand, this includes other equilibrium ensembles such as the canonical ensemble, provided the considered energy interval $\Delta E$ is large enough to accommodate all notably populated energy levels. (As we will see later, the latter requirement is in fact quite problematic.) On the other hand, even a single energy eigenstate $|n\rangle$ with $E_n \in I_{\rm mic}$ will do. In other words [26], such an energy eigenstate encapsulates all properties of the considered many-body system at thermal equilibrium! Two rather delicate problems, which any "validation" of ETH has to resolve, are as follows:
(i) For any given Hamiltonian $H$, one can readily construct (a posteriori) observables $A$ which violate ETH, e.g. $A_{nn} = (-1)^n$ and arbitrary $A_{mn}$ for $m \neq n$. (In contrast to what Ref. [39] might suggest, an ETH-violating observable thus need not be a conserved quantity.) In particular, this example implies that ETH cannot be satisfied simultaneously for all observables, and in fact not even for all experimentally realistic observables as specified below Eq. (1).
(ii) While ETH claims that expectation values $A_{nn}$ are (practically) equal for sufficiently close energy eigenvalues $E_n$, generically there are, of course, notable differences $A_{mm} - A_{nn}$ when $E_m - E_n$ is not small. But how can the observable $A$ "feel" whether the two eigenstates $|m\rangle$ and $|n\rangle$ of $H$ belong to similar energies or not, without any a priori knowledge about the Hamiltonian $H$?
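The counterexample in (i) is easy to reproduce. The following minimal sketch takes $A_{nn} = (-1)^n$ inside a window of $D$ levels (the value of $D$ and the choice of populating only even levels are illustrative assumptions) and compares the microcanonical average with the diagonal-ensemble average.

```python
import numpy as np

D = 1000                              # illustrative number of levels in the window
n = np.arange(D)
A_diag = (-1.0) ** n                  # ETH-violating diagonal elements A_nn = (-1)^n

rho_mic = np.full(D, 1.0 / D)                    # microcanonical populations
rho_even = np.where(n % 2 == 0, 2.0 / D, 0.0)    # only even levels populated

A_mic = float(rho_mic @ A_diag)       # alternating signs cancel
A_eq = float(rho_even @ A_diag)       # every populated level contributes +1
```

Both population sets are perfectly admissible and confined to the same energy window, yet the two expectation values differ by the full range of the observable, so thermalization fails for this $A$.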
At first glance, it thus might seem unavoidable to somehow restrict the set of admissible observables. Indeed, the early explorations of ETH [17-24] solely had in mind semiclassical (small $\hbar$) systems, which are classically chaotic, in conjunction with observables which are $\hbar$-independent and derive from smooth classical phase space functions (see Sect. 1). In contrast, the more recent, predominantly numerical studies were mainly focused on spin-chain- and Hubbard-like models [27-40] (i.e. without an obvious classical limit), and on few-body or local observables. Yet another option would be to only admit macroscopic observables, see Sect. 8 below. In either case, it is still not obvious whether and why such a restricted class of observables may get around the above mentioned problems (i) and (ii). The solution of those problems within our present approach will be discussed in Sect. 9.

The approach by Deutsch
In this section, we reconsider the approach by Deutsch, originally published in [44]. For the detailed calculations, announced as Ref. [6] therein, see [45]. For an updated summary, see also [46].

Random matrix model
Following Deutsch [44] we consider Hamiltonians $H$ of the form

$$H = H_0 + V , \qquad (8)$$

consisting of an "unperturbed" part $H_0$ and a "perturbation" $V$. As before, eigenvectors and eigenvalues of $H$ are denoted as $|n\rangle$ and $E_n$ with $E_{n+1} \geq E_n$. Likewise, those of $H_0$ are denoted as $|n\rangle_0$ and $E^0_n$ with $E^0_{n+1} \geq E^0_n$. Typical examples one has in mind [44] are $H_0$ which describe a non-interacting many-body system, e.g. an ideal gas in a box, while $V$ accounts for the particle-particle interactions. Further examples are so-called quantum quenches: $H_0$ describes the system for times $t < 0$, while $H$ applies to $t \geq 0$. In other words, some external condition or some system property suddenly changes at time $t = 0$.

In various such examples, the perturbation matrix

$$V^0_{mn} := {}_0\langle m|V|n\rangle_0 \qquad (9)$$

is often expected or numerically found to be a banded matrix [21,44,59,60], i.e., the typical magnitude of $V^0_{mn}$ decreases with increasing $|m-n|$ towards zero. Furthermore, in the above mentioned example where $H_0$ describes a non-interacting many-body system, the perturbation matrix $V^0_{mn}$ is usually very sparse, i.e., only a small fraction of all matrix elements is non-zero [60-62].
In any case, the perturbation V is required to be sufficiently weak so that the two systems H and H 0 still exhibit similar thermodynamic properties at the considered system energy E, in particular similar densities of the energy levels, see above Eq. (4).
As a next step, the common lore of random matrix theory is adopted [44,47,61]: One samples matrices V 0 mn from a certain random matrix ensemble with statistical properties which imitate reasonably well the main features of the "true" perturbation V (band structure, sparsity etc.), and it is assumed that if a certain property can be shown to apply to the overwhelming majority of such randomly sampled V -matrices, then it will also apply to the actual (non-random) V in (8). A priori, such a random matrix approach may appear "unreasonable" since most of those randomly sampled perturbations V amount to systems which are physically very different from the one actually modelled in (8). Yet, in practice such a random matrix approach turned out to be surprisingly successful in a large variety of specific examples [61], and hence, as in Deutsch's work [44], will be tacitly taken for granted from now on.
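As an illustration of such a random matrix model, the sketch below samples one real symmetric, banded, and sparse perturbation matrix $V^0_{mn}$ and diagonalises $H = H_0 + V$ in the unperturbed basis. All numerical choices (dimension, bandwidth, density of non-zero entries, Gaussian entries, equally spaced $E^0_n$) and the function name `sample_perturbation` are assumptions made for this example only; the paper leaves the ensemble largely unspecified.

```python
import numpy as np

def sample_perturbation(dim, bandwidth, density, strength, rng):
    """Sample a real symmetric matrix that is banded and sparse.

    Entries with |m - n| > bandwidth vanish; inside the band each entry
    is kept with probability `density` and drawn from N(0, strength^2).
    """
    G = rng.normal(scale=strength, size=(dim, dim))
    keep = rng.random((dim, dim)) < density
    i, j = np.indices((dim, dim))
    keep &= np.abs(i - j) <= bandwidth
    upper = np.triu(np.where(keep, G, 0.0))      # upper triangle incl. diagonal
    return upper + np.triu(upper, 1).T           # mirror to make it symmetric

rng = np.random.default_rng(3)
dim = 200
E0 = np.arange(dim, dtype=float)                 # equally spaced unperturbed levels
V0 = sample_perturbation(dim, bandwidth=15, density=0.2, strength=0.5, rng=rng)

H = np.diag(E0) + V0                             # H = H_0 + V in the unperturbed basis
E, eigvecs = np.linalg.eigh(H)                   # E sorted ascending; columns are |n>
```

Repeating the last three lines with fresh samples of `V0` yields the ensemble over which averages $\langle ... \rangle_V$ can be estimated.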

General framework
The randomness of $V$ in (8) entails a randomisation of the eigenstates $|n\rangle$ of $H$ and hence of the basis-transformation matrix

$$U_{mn} := \langle m|n\rangle_0 . \qquad (10)$$

Likewise, any given observable $A$ and its matrix elements in the unperturbed basis,

$$A^0_{mn} := {}_0\langle m|A|n\rangle_0 , \qquad (11)$$

are non-random quantities, while

$$A_{mn} := \langle m|A|n\rangle \qquad (12)$$

will be the elements of a random matrix, inheriting the randomness of the $U$-matrix via

$$A_{mn} = \sum_{jk} U_{mj} A^0_{jk} U^*_{nk} . \qquad (13)$$

Demonstrating ETH thus amounts to showing that $A_{mm} - A_{nn}$ is small for most $V$'s and sufficiently close $m$ and $n$. Formally, this will be achieved by considering the variances

$$\sigma_n^2 := \left\langle \left( A_{nn} - \langle A_{nn} \rangle_V \right)^2 \right\rangle_V , \qquad (14)$$

where $\langle ... \rangle_V$ indicates the average over the random perturbations $V$. In a first step (Sect. 5.4), we will show that the mean values $\langle A_{nn} \rangle_V$ for sufficiently close $n$'s differ very little in comparison to the experimental resolution limit $\delta A$ introduced in Sect. 2. In a second step (Sect. 5.5), we will show that $\sigma_n \ll \delta A$, implying that $A_{nn}$ differs very little from $\langle A_{nn} \rangle_V$ for most $V$. Altogether, this will imply the desired result that for most $V$'s the $A_{nn}$'s change very little upon changing $n$ (Sect. 5.6).
To simplify the algebra, we henceforth assume that the largest and smallest eigenvalues of $A$ in (1) satisfy

$$a_{\max} = -a_{\min} . \qquad (15)$$

As a consequence, $a_{\max} = \Delta_A/2$ according to (1). Note that the assumption (15) does not imply any loss of generality, since adding an arbitrary constant (times the identity operator) to the observable $A$, and hence to all its eigenvalues, does not entail any nontrivial physical consequences. In particular, the above mentioned changes of $\langle A_{nn} \rangle_V$ upon variation of $n$ and the variances (14) remain exactly the same. For later use, we thus can conclude that

$$|\langle \psi|A^\nu|\psi\rangle| \leq (\Delta_A/2)^\nu \qquad (16)$$

for arbitrary normalised vectors $|\psi\rangle$ and any $\nu \in \mathbb{N}$.
For the sake of simplicity, we furthermore assume that all matrix elements $V^0_{mn}$ from (9) and $U_{mn}$ from (10) are real numbers. For example, for systems without spins and magnetic fields, $H_0$ and $V$ in (8) are both purely real operators in position representation and hence the eigenstates $|n\rangle_0$ and $|n\rangle$ can be chosen so that all $V^0_{mn}$ and $U_{mn}$ become real. So, it is natural to assume that also the corresponding random matrix ensembles only involve real matrix elements. In particular, this implies with Eqs. (13) and (14) that

$$\langle A_{nn} \rangle_V = \sum_{jk} A^0_{jk} \langle U_{nj} U_{nk} \rangle_V , \qquad (17)$$

$$\sigma_n^2 = \sum_{ijkl} A^0_{ij} A^0_{kl} \left[ \langle U_{ni} U_{nj} U_{nk} U_{nl} \rangle_V - \langle U_{ni} U_{nj} \rangle_V \langle U_{nk} U_{nl} \rangle_V \right] . \qquad (18)$$

It may well be that our subsequent calculations can be readily extended to systems for which such a transformation to purely real matrices $V^0_{mn}$ and $U_{mn}$ is no longer possible. However, the so far available knowledge, e.g. from random matrix theory or numerical investigations, regarding the statistical properties of the $U_{mn}$'s is only sufficient for our purposes for real matrices (see next Section). Since the subject of our paper is not the exploration of such statistical random matrix properties but rather their implications with respect to ETH, we confine ourselves to the case of real matrices.

Properties of U mn
In view of (17), (18), some basic statistical properties of the matrix elements U mn are needed in order to make any further progress.
At this point it is crucial to note that the Hamiltonian $H$ in (8) gives rise to a very special type of random matrix. Namely, the matrix ${}_0\langle m|H|n\rangle_0$ is the sum of the above discussed random perturbation $V^0_{mn}$ and of the non-random diagonal matrix ${}_0\langle m|H_0|n\rangle_0 = \delta_{mn} E^0_n$, whose diagonal elements $E^0_n$ grow approximately linearly with $n$ (at least within a sufficiently small vicinity of the preset system energy $E$, to which we tacitly restrict ourselves, see also Sects. 3 and 5.6). Out of the huge literature on random matrix theory, only a relatively small number of works pertains to this special case, see e.g. [46,60,62,63] and further references therein. Strictly speaking, their results are obtained for infinitely large matrices $V^0_{mn}$, whose statistical properties do not depend on $m$ and $n$ separately, but only on the difference $m - n$. Likewise, the unperturbed matrix ${}_0\langle m|H_0|n\rangle_0$ is assumed to be infinitely large and of the form $\delta_{mn} E^0_n$ with equally spaced energy gaps $E^0_{n+1} - E^0_n$. Intuitively, these seem quite plausible approximations, at least for not too strong perturbations $V$ in (8). They can be readily justified by numerical examples, but somewhat more rigorous analytical results do not seem to exist. Here, we adopt the widely accepted viewpoint that, for our present purposes, they can be taken for granted [44,60,61].
As a consequence, also the statistical properties of the $U_{mn}$ only depend on $m - n$, e.g. the $\nu$-th moments are of the form

$$\langle (U_{mn})^\nu \rangle_V = u_\nu(m - n) , \qquad (19)$$

where the $u_\nu(n)$ are real (but not necessarily even [63]) functions of $n$, and are furthermore non-negative for even $\nu$.
Known analytical results mainly concern the second moment $u_2(n)$ for various ensembles of possibly banded and/or sparse random $V$-matrices, see e.g. [46,60,62,63] and references therein. In all cases, it is found that $u_2(n)$ is monotonically decreasing for $n \geq 0$ and monotonically increasing for $n \leq 0$, hence exhibiting a global maximum at $n = 0$:

$$u_2(n) \leq u_2(0) . \qquad (20)$$

Since $\sum_k U_{mk} U_{nk} = \delta_{mn}$ we can conclude that

$$\sum_n u_2(n) = 1 , \qquad (21)$$

implying that $u_2(n)$ must approach zero for large $|n|$.
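These properties of $u_2(n)$ can be probed numerically on a small scale. The following sketch assumes equally spaced unperturbed levels and a banded Gaussian perturbation of moderate strength (all parameters and helper names are illustrative), estimates $u_2(m-n) = \langle U_{mn}^2 \rangle_V$ by averaging over bulk rows and a few samples, and checks the orthogonality relation for one sample.

```python
import numpy as np

def sample_perturbation(dim, bandwidth, strength, rng):
    """Real symmetric banded matrix with Gaussian entries inside the band."""
    G = rng.normal(scale=strength, size=(dim, dim))
    V = np.triu(G) + np.triu(G, 1).T
    i, j = np.indices((dim, dim))
    V[np.abs(i - j) > bandwidth] = 0.0
    return V

def overlap_matrix(E0, V):
    """U[m, n] = <m|n>_0 for eigenvectors |m> of H = diag(E0) + V."""
    _, vecs = np.linalg.eigh(np.diag(E0) + V)    # columns: |m> in unperturbed basis
    return vecs.T

rng = np.random.default_rng(5)
dim, K, samples = 300, 40, 10
E0 = np.arange(dim, dtype=float)                 # equally spaced unperturbed levels
bulk = np.arange(100, 200)                       # rows far from the matrix edges
offsets = np.arange(-K, K + 1)                   # values of m - n

u2 = np.zeros(offsets.size)
for _ in range(samples):
    U = overlap_matrix(E0, sample_perturbation(dim, 20, 0.3, rng))
    for a, k in enumerate(offsets):
        u2[a] += np.mean(U[bulk, bulk - k] ** 2)  # average of U_mn^2 at m - n = k
u2 /= samples

peak_offset = int(offsets[np.argmax(u2)])        # expected at m - n = 0
total_weight = float(u2.sum())                   # close to 1 if K covers the tails
```

In this toy setting $u_2(0)$ is of order one, whereas for a macroscopic system it is exponentially small in $N$; the sketch only illustrates the shape and normalisation of $u_2(n)$.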
In all those analytical results, the mean values $\langle V^0_{mn} \rangle_V$ are tacitly assumed to vanish and it is found that $u_2(n)$ then only depends on the second moments $\langle (V^0_{mn})^2 \rangle_V$. Since we are not aware of any justification for this assumption $\langle V^0_{mn} \rangle_V = 0$, we have numerically investigated various examples and found that, indeed, the statistical properties of the $U_{mn}$'s seem to be independent of the first moments $\langle V^0_{mn} \rangle_V$ (while keeping all other cumulants fixed). Furthermore, a simple physical argument is as follows: Replacing an eigenstate $|n\rangle_0$ of $H_0$ by $-|n\rangle_0$ is supposed not to change any physically relevant properties of the given (non-random) model in (8). Note that this argument applies separately to any single $n$. Hence, it is quite plausible that upon randomly flipping the signs for half of all $n$'s, the resulting "new" $V^0_{mn}$'s will be "unbiased" for $m \neq n$. (More precisely: if a random matrix description works at all, then an ensemble with $\langle V^0_{mn} \rangle_V = 0$ seems most appropriate.) Finally, a possibly remaining systematic "bias" of the diagonal elements $V^0_{nn}$ can be removed by adding an irrelevant constant to $V$. (The typical magnitude of the $V^0_{mn}$'s is also estimated in Appendix B of [45], however, not taking into account the possibility that their average may be zero.) For the rest, the detailed properties of $u_2(n)$ are found, as expected, to still depend on the quantitative details of $\langle (V^0_{mn})^2 \rangle_V$. Since no general statement about the latter seems possible for the general class of systems we have in mind with (8), we will focus on conclusions which do not depend on the corresponding details of $u_2(n)$. Rather, we will only exploit the following very crude common denominator of all so far explored particular classes of random matrices $V^0_{mn}$, see e.g. [46,60,62,63] and further references therein:

$$u_2(0) = 10^{-O(N)} . \qquad (22)$$

The basic physical reason is that exceedingly "weak" perturbations $V$ in (8) are tacitly ignored so that the smallest relevant energy scale is the mean level spacing $E_{n+1} - E_n$, being of the order of $10^{-O(N)}$ J according to Sect. 2. Moreover, the ratio between this energy scale and any other relevant energy scale of the system can be very roughly estimated by $10^{-O(N)}$, independently of any further details of the specific model system in (8). As a consequence, also the very crude estimate (22) is independent of these details. Further statistical properties of the $U_{mn}$, which we will, similarly as in [44,45,60,62,63], take for granted later on, are: (i) Their average is zero, i.e., $u_1(n) = 0$ for all $n$.
(ii) They are statistically independent of each other, i.e., (iii) Their distribution does not exhibit long tails, i.e., with an n-independent constant c, which may possibly be very large but which is required not to be so large that it can compete in order of magnitude with 1/u 2 (0) from (22). For instance for a system with N = 10 23 particles, it would be sufficient that c ≤ 10 10 22 .
In other words, we adopt the very weak assumption E.g., for a Gaussian distribution (with zero mean, cf. (23)) one finds that u 4 (n) = 3u 2 2 (n) and hence (25) is satisfied for c = 3. Though a Gaussian distribution is often taken for granted [22,44,45], non-Gaussian distributions have been actually observed e.g. in [59] and also in our own numerical explorations (unpublished), but the more general condition (25) was still satisfied in all cases. We also note that since u 2 (n)/u 2 (0) approaches zero for large |n|, the condition in (25) becomes weaker and weaker with increasing |n|.

Mean values
Evaluating (17) by means of (19), (23), and (24) yields It follows that and hence that The maximum over j can be estimated from above by ∆ A /2 according to (16). Recalling that u 2 (k) is monotonically decreasing for k ≥ 0 and monotonically increasing for k ≤ 0 (see above (20)), the sum over k amounts to 2 u 2 (0) (more generally, this sum amounts to the total variation of u 2 (k); hence, if u 2 (k) exhibits M local maxima, it can be estimated from above by 2M max k u 2 (k)). Altogether, we thus can conclude that This upper bound is tight: One can readily find examples A for which (30) becomes an equality. Moreover, it follows that By generalising the line of reasoning in (29), (30) one can also show that with κ := |m − n| − 1. Under certain conditions, the bound (31) may be better than (32), but never by more than a factor of 2. For sufficiently large |m − n|, (32) is always better since the sum on the right hand side is bounded by unity (see (21)), while the right hand side of (31) is unbounded. (The relevance of large |m − n|-values will become apparent in Sect. 5.6 below). In any case, (32) is a rather tight bound in the sense that one can find examples for A 0 jj so that the left hand side is larger than the right hand side divided by 2 for a set of suitably chosen pairs (m, n) so that the differences m − n may still take any integer value.
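The smoothing mechanism behind these bounds can be illustrated numerically. The sketch below assumes that, for statistically independent $U_{mn}$ with zero mean, the mean value reduces to the convolution $\langle A_{nn} \rangle_V = \sum_j A^0_{jj} u_2(n-j)$, as (17), (19), (23), and (24) suggest. It further assumes a hypothetical Lorentzian-shaped, unimodal $u_2$ and random diagonal elements $A^0_{jj} \in [-1/2, 1/2]$, so that $\Delta_A \leq 1$. For a unimodal kernel the total variation is $2u_2(0)$, hence neighbouring mean values can differ by at most $2 \max_j |A^0_{jj}| \, u_2(0) \leq \Delta_A u_2(0)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unimodal second-moment function u_2(k) on offsets -K..K.
K = 200
k = np.arange(-K, K + 1)
gamma = 25.0                               # assumed width, for illustration only
u2 = 1.0 / (k ** 2 + gamma ** 2)
u2 /= u2.sum()                             # normalisation: sum_k u_2(k) = 1

# Arbitrary (here: random) diagonal elements A^0_jj with |A^0_jj| <= 1/2.
A0 = rng.uniform(-0.5, 0.5, size=5000)

# Mean values <A_nn>_V as a convolution of A^0_jj with u_2
# ("valid" keeps only those n for which the full kernel fits).
mean_A = np.convolve(A0, u2, mode="valid")

# Largest change between neighbouring n versus the total-variation bound.
steps = np.abs(np.diff(mean_A))
bound = 2.0 * np.max(np.abs(A0)) * u2[K]   # u2[K] is u_2(0)
largest_step = float(steps.max())
```

Even for completely erratic $A^0_{jj}$ the averaged diagonal elements vary slowly with $n$, and the more so the smaller $u_2(0)$ is.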

Variances
We rewrite the variance from (18) as and evaluate the four-fold sum by distinguishing 4 possible cases. Case 1: $i \neq k$ and $i = j$. In this case, we only have to keep summands with $l = k$: otherwise the factor $U_{nk}$ on the right hand side of (33) would be independent of the remaining three factors according to (24), and the corresponding summand would vanish according to (19) and (23). Case 2: $i \neq k$ and $i \neq j$. As before, we can conclude that only summands with $l = i$ and $j = k$ give rise to non-vanishing terms. Case 3: $i = k$ and $i \neq l$, implying, as before, that only summands with $j = l$ contribute. Case 4: $i = k$ and $i = l$, implying $j = i$. Consequently, we can rewrite (33) as where the four summands correspond to the above four cases and can be rewritten as: With the help of (24) and (19) we can rewrite (35) as where we exploited (27) in the last equation. Likewise, one finds that Introducing these results into (34) thus yields The three factors on the right hand side of (44) are all non-negative and hence With (11) one readily sees that the last sum over $k$ amounts to ${}_0\langle i|A^2|i\rangle_0 =: (A^2)^0_{ii}$. Exploiting (20) we thus obtain Likewise, since $u_2(i) \geq 0$ for all $i$, the modulus of (45) can be estimated as Turning to (46), we first note that the last factor $\kappa_4 := u_4(n - i) - 3u_2^2(n - i)$ represents the 4th cumulant of the random variable $U_{ni}$. For a Gaussian distribution (the case considered by Deutsch [44,45]), this cumulant vanishes, but for more general distributions it may be finite and of either sign. We thus estimate $|\kappa_4|$ from above by $u_4(n - i) + 3u_2^2(n - i)$. Observing (26) and $u_2^2(n - i) \leq u_2(0) u_2(n - i)$ (see (20)), we thus can bound (46) by Next, we invoke the Cauchy-Schwarz inequality to conclude for arbitrary Hermitian operators $B$ and vectors $|\psi\rangle$. In particular, it follows that Finally, we exploit (16) and (21), resulting in where we used (26) in the last step.

Discussion
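From (14), (53), and Markov's inequality it follows that

$$\mathrm{Prob}\left( |A_{nn} - \langle A_{nn} \rangle_V| \geq \epsilon \right) \leq (\Delta_A/\epsilon)^2 \, 10^{-O(N)} \qquad (54)$$

for any $\epsilon > 0$, where $\mathrm{Prob}(X)$ denotes the probability that a randomly sampled $V$ in (8) entails property $X$. For instance, if in the last term $O(N) = 10^{23}$ and $\epsilon = \Delta_A 10^{-10^{22}}$ then the right hand side of (54) is still $10^{-O(10^{23})}$. Consequently, the probability that at least one $A_{nn}$ with $n \in \{n_0, ..., n_0 + \Delta n\}$ deviates notably from $\langle A_{nn} \rangle_V$ still remains negligibly small if $0 \leq \Delta n \ll 10^{O(N)}$; in other words, with overwhelming probability every $A_{nn}$ is practically indistinguishable from $\langle A_{nn} \rangle_V$ simultaneously for all those $n$. On the other hand, (22) and (31) imply that the difference $\langle A_{mm} \rangle_V - \langle A_{nn} \rangle_V$ remains below the experimental resolution limit $\delta A$ of $A$ (cf. Sect. 2) even for quite large range-to-resolution ratios $\Delta_A/\delta A$, provided $|m - n|$ remains much smaller than of the order of $10^{O(N)}$. In other words, also the variations of the $\langle A_{nn} \rangle_V$ remain negligibly small within the "window" of $n$-values $\{n_0, ..., n_0 + \Delta n\}$ if $0 \leq \Delta n \ll 10^{O(N)}$.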
Altogether, we thus arrive at the conclusion that for the vast majority of randomly sampled perturbations $V$ in (8), the $A_{nn}$'s remain practically constant (below the experimental resolution limit) as long as $n$ varies by much less than $10^{O(N)}$.
The latter property is sometimes referred to as the strong ETH [33,38]. It immediately implies the practical indistinguishability of the two expectation values (6) and (7), and hence thermalization, provided both the $\rho_{nn}(0)$ and the $\rho^{\rm mic}_{nn}$ are negligibly small outside a window of $n$-values much smaller than $10^{O(N)}$, but otherwise without any further restriction on the initial condition $\rho(0)$.
If the range $\Delta n$ of admitted $n$-values is no longer much smaller than $10^{O(N)}$, then we can no longer conclude from (53) that with high probability all the $A_{nn}$'s remain simultaneously close to the $\langle A_{nn} \rangle_V$'s. However, we still can conclude that for the vast majority of $n$'s, those differences remain negligibly small §. If, in addition, also the variations of the $\langle A_{nn} \rangle_V$'s remained small, we still could conclude that "most" $A_{nn}$'s are practically equal (for the overwhelming majority of $V$'s), i.e. the so-called weak ETH is satisfied [33,38]. As a consequence, the expectation values (6) and (7) would again be practically equal under certain additional conditions on the initial condition $\rho(0)$. For instance, the total weight of all $\rho_{nn}(0)$'s corresponding to exceptionally large differences $A_{nn} - \langle A_{nn} \rangle_V$ should remain sufficiently small. E.g. (4) would obviously be a sufficient condition.
However, this line of reasoning contains a problem: If the variations $\Delta n$ of $n$ are no longer much smaller than $10^{O(N)}$, then Eqs. (31) and (22) no longer imply that the variations of $\langle A_{nn} \rangle_V$ remain negligible. The same conclusion follows from the bound (32). Since the latter bound is already rather tight (see below (32)), we can conclude that the restriction to windows of $n$-values much smaller than $10^{O(N)}$ is not merely a technical problem but rather an indispensable prerequisite of the random matrix model from Sect. 5.1. In particular, this restriction also concerns the original findings by Deutsch [44].
§ With $x_n := |A_{nn} - \langle A_{nn} \rangle_V|$ and $\delta := (\Delta_A/\epsilon)^2 10^{-O(N)}$, Eq. (54) can be rewritten as $\langle \Theta(x_n - \epsilon) \rangle_V \leq \delta$, where $\Theta(x)$ is the Heaviside step function. Furthermore, $Z_\epsilon := \sum_{n=1}^{D} \Theta(x_n - \epsilon)$ counts how many of the $x_n$'s exceed $\epsilon$. It follows that $\langle Z_\epsilon \rangle_V \leq D\delta$ and with Markov's inequality that $\mathrm{Prob}(Z_\epsilon \geq qD) \leq \delta/q$, where $\mathrm{Prob}(Z_\epsilon \geq qD)$ is the probability that at least a fraction $q$ of all $x_n$'s exceed $\epsilon$.
Note that $\Delta n$ from above is identical to the number $D$ of energy eigenvalues $E_n$ contained in the microcanonical energy window $I_{\rm mic} := [E - \Delta E, E]$ from Sect. 3. The above discussed restriction thus amounts to

$$D \ll 10^{O(N)} \qquad (55)$$

and implies that $\Delta E$ must remain very much smaller than any macroscopically resolvable energy difference. (This follows from the fact that the energy eigenstates are exponentially dense in the system size $N$, see Sect. 2.)

Srednicki's ETH for the off-diagonal matrix elements
In Refs. [23,24], Srednicki formulated, besides the so far considered ETH for the diagonal matrix elements $A_{nn}$, also a corresponding ETH for the off-diagonal elements $A_{mn}$ with $m \neq n$. This hypothesis can also be readily confirmed within our present framework: Similarly as in (17), we find from (13) that For $m \neq n$, the last factor $\langle U_{mj} U_{nk}\rangle_V$ equals $\langle U_{mj}\rangle_V \langle U_{nk}\rangle_V$ according to (24), and hence vanishes according to (19) and (23). In conclusion, $\langle A_{mn}\rangle_V = 0$ for all $m \neq n$.
Turning to the second moment (variance), one finds similarly as in (14), (18) that According to (24), for $m \neq n$ the last term $\langle U_{mi} U_{nj} U_{nk} U_{ml}\rangle_V$ now factorises into Step by step as in (44), (47), (48), (53) it follows that for all $m \neq n$. In view of (22) we see that the off-diagonals $|A_{mn}|$ are typically exponentially small in the system size $N$, in agreement with Srednicki's prediction in Refs. [23,24]. The overall conclusion applying to any given Hermitian operator $A$ of finite range $\Delta_A$ is: The representation of $A$ in the eigenbasis of $H$ is, for the overwhelming majority of randomly sampled perturbations $V$ in (8), very close to a diagonal matrix, whose diagonal elements $A_{nn}$ change very slowly with $n$.

Implications for the level populations $\rho_{nn}(0)$
Throughout this section, we consider the density operator ρ(0) (pure or mixed state) from Sect. 2 and abbreviate it as ρ.
Since ρ is a Hermitian operator, all results so far for general observables A are immediately applicable to ρ. However, this particular observable A = ρ also exhibits some subtle special features. Therefore, we first focus on a simple example.

Simple example
We consider a pure energy eigenstate of the unperturbed Hamiltonian $H_0$ in (8), i.e.
$$\rho = |m\rangle_0\, {}_0\langle m| \tag{61}$$
with an arbitrary but fixed $m$. Its eigenvalues are either zero or one, hence the range from (1) is $\Delta_\rho = 1$. Observing that $\rho^0_{ik} = \delta_{im}\delta_{km}$ it follows from (13) and (17) where the variance from (14) is given for $A = \rho$ by As a concrete example, we may focus on Gaussian distributed $U_{mn}$'s (see below Eq. (26)), so that $u_4(n) = 3u_2^2(n)$ and hence $\sigma_n^2 = 2u_2^2(n - m)$.
Altogether, the standard deviation $\sigma_n$ of the random variable $\rho_{nn}$ from (62) is thus comparable to its mean value $\langle\rho_{nn}\rangle_V$, and both are, according to (20) and (22), extremely small compared to the range $\Delta_\rho = 1$ of the considered observable $\rho$. Within any reasonable resolution limit $\delta\rho$ of this observable we can thus conclude that, for the vast majority of random perturbations $V$ in (8), all $\rho_{nn}$'s are practically equal (namely zero), in agreement with the general validation of ETH from Sect. 5. But for our present purposes, this usual resolution limit $\delta\rho$ is still way too large. On the actual scale of interest, the $\rho_{nn}$'s from (62) are not at all a slowly varying function of $n$, but rather exhibit very significant random fluctuations (see also (24)). In particular, it would be wrong to argue that the $\rho_{nn}(0)$'s in (7) are now practically constant and hence, upon comparison with (6), thermalization follows.
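The Gaussian relation $\sigma_n^2 = 2u_2^2(n-m)$ and the resulting statement that standard deviation and mean of $\rho_{nn}$ are comparable can be checked with a small numerical experiment. The sketch below is not part of the original derivation. It assumes that, for the pure state considered here, $\rho_{nn}$ is the square of a single real, zero-mean Gaussian matrix element $U_{nm}$, and the value `u2=1e-3` is a hypothetical stand-in for $u_2(n-m)$.

```python
import math
import random


def sample_population_stats(u2, samples=200_000, seed=0):
    """Mean and standard deviation of rho_nn = U_nm**2.

    Assumption: U_nm is a real, zero-mean Gaussian variable with
    second moment u2 (a hypothetical stand-in for u_2(n - m)).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(u2)
    values = [rng.gauss(0.0, sigma) ** 2 for _ in range(samples)]
    mean = sum(values) / samples
    variance = sum((v - mean) ** 2 for v in values) / samples
    return mean, math.sqrt(variance)


mean, std = sample_population_stats(u2=1e-3)
# Gaussian prediction: sqrt(u4 - u2**2) / u2 = sqrt(3 - 1) = sqrt(2)
ratio = std / mean
print(f"mean = {mean:.3e}, std = {std:.3e}, std/mean = {ratio:.3f}")
```

The ratio does not depend on the chosen `u2`, so the relative fluctuations of the level populations stay of order one however small the populations themselves become.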

General case
We return to general density operators $\rho$, i.e., we only assume that $\rho$ is a Hermitian, non-negative operator of unit trace and purity In the following, we will exploit these properties of $\rho$, which, however, would be lost after adding a constant to $\rho$ so that (15) is satisfied. Hence we can only employ those previous results which were obtained without the help of (15). Along these lines, one finds exactly as in (27) and (52) that where $\sigma_n^2$ is defined in (65) and $(\rho^2)^0_{ii} := {}_0\langle i|\rho^2|i\rangle_0$. Likewise, (57) and (59) yield for all $m \neq n$.
Introducing (20) into (68) implies where we exploited that the sum over $j$ equals $\mathrm{Tr}\,\rho = 1$. Likewise, (69) yields Rewriting the definition from (28) as we find along similar lines of reasoning that Moreover, we can conclude that Similarly as in (30), the last sum is seen to be equal to $2u_2(0)$. The remaining sum over $j$ equals $\mathrm{Tr}\,\rho = 1$ and hence $\sum_n |\Delta_n| \le 2u_2(0)$.
Eqs. (72), (73) indicate that, in contrast to pure states in (61), for mixed states of small purity $\mathrm{Tr}\{\rho^2\}$, the random fluctuations of the $\rho_{nn}$'s about their mean values may become negligible. Moreover, the right hand side of (75) usually turns out [44,45,60,62,63] to be of the order of $u_2^2(0)$. The same conclusion is also suggested by (77). Consequently, the variations of $\langle\rho_{nn}\rangle_V$ as a function of $n$ also become small. Unlike for pure states we can thus now conclude that (6) and (7) are approximately equal, implying thermalization. However, assuming a small purity of $\rho$ represents a quite strong restriction in the first place.
Returning to general $\rho$, we can deduce from (20) and (68) that The sum over $n$ equals one according to (21) and the two remaining sums are equal to $\mathrm{Tr}\,\rho = 1$, i.e., Likewise, Eqs. (67) and (69) imply Altogether, this yields and with (25) Since $(\max_n \rho_{nn})^2 = \max_n (\rho_{nn})^2 \le \sum_n (\rho_{nn})^2$ it follows that and hence by Markov's inequality a corresponding bound on the probability that $\max_n \rho_{nn}$ exceeds a given threshold; see also the explanations below (54).

Off-diagonals
We focus on the off-diagonal matrix elements $\rho_{mn}$ with $m \neq n$. Their mean values are zero according to (70). Their variance can be readily bounded by the Cauchy-Schwarz inequality Likewise, introducing $|\rho^0_{ik}|^2 \le \rho^0_{ii}\rho^0_{kk}$ into (71) yields with (68) the estimate Similarly as below (47) one sees that $\sum_{mn} |\rho_{mn}|^2 = \mathrm{Tr}\{\rho^2\}$ and hence
$$\sum_{m \neq n} |\rho_{mn}|^2 = \mathrm{Tr}\{\rho^2\} - \sum_n (\rho_{nn})^2 \le \mathrm{Tr}\{\rho^2\} . \tag{87}$$
According to (82), the last inequality in (87) is in fact a very tight upper bound. The same estimate follows by summing in (71) over $m$ and $n$. Neither of these results indicates that the off-diagonal matrix elements are typically much smaller than the diagonal elements. We thus conjecture that typical off-diagonal matrix elements will in fact not be small compared to the diagonal elements. Trivial exceptions are pure states (61). Non-trivial exceptions may be mixed states of low purity, similarly as below (77).

Discussion
The main result of this section is (84): It implies that for the overwhelming majority of randomly sampled perturbations $V$ in (8) the last term in (3) is unimaginably small (essentially in agreement with (4)). In other words, equilibration in the sense of Sect. 2 is verified. We emphasise that all these conclusions do not depend on any further details of the actual initial condition $\rho(0)$, except that it is assumed to be fixed, i.e. independent of the randomly sampled $V$. The physical interpretation is as follows: One specific, usually not very well known, but nevertheless well-defined initial state $\rho(0)$ (pure or mixed) is "given" to us, and then evolves further in time according to one particular, randomly picked system Hamiltonian (8). Our results guarantee that the vast majority of those randomly sampled Hamiltonians gives rise to equilibration in the sense of Eq. (3). The only part that remains unproven is the assumption that the actual (in detail not exactly known) system does not correspond to one of the rare, untypical Hamiltonians of the considered random ensemble.
Since $\mathrm{Tr}\{\rho\} = \sum_n \rho_{nn} = 1$ we can conclude from (84) that, for most $V$'s, the number of non-negligible $\rho_{nn}$'s cannot be much smaller than $10^{O(N)}$. As a consequence, the strong ETH scenario from Sect. 5.6 does not apply. In turn, to apply this scenario, a different physical setup is required, with a different physical view of how the initial condition $\rho$ arises. Namely, one particular, but "typical" $V$ in (8) is considered to have been randomly sampled but is now held fixed. Since we assumed the system is typical, strong ETH as specified in Sect. 5.6 can and will be taken for granted. In a next step, the initial state $\rho = \rho(0)$ for this particular system is specified, arising, e.g., as the result of an experimental preparation procedure for this very system (with respect to other systems $H$ of the ensemble, this preparation procedure may not be physically meaningful or not even well defined). Finally, there must be good reasons (e.g. a very careful experimentalist) to assume that this preparation process yields level populations $\rho_{nn}$ which are negligible outside a window of $n$-values much smaller than $10^{O(N)}$, see Eq. (55).
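The counting step behind the first sentence of the paragraph above can be made explicit. The following is only a sketch: the symbols $\epsilon$ (an assumed upper bound on the largest level population, of the kind a result like (84) provides with high probability) and $M$ (the number of levels with non-vanishing population) are introduced here for illustration and are not part of the original equations.

```latex
% Assumption: \max_n \rho_{nn} \le \epsilon for most V;
% M := number of levels n with \rho_{nn} > 0.
1 = \sum_n \rho_{nn}
  \le M \max_n \rho_{nn}
  \le M \epsilon
\quad\Longrightarrow\quad
M \ge \frac{1}{\epsilon} .
```

If $\epsilon$ is of the order $10^{-O(N)}$, as the conclusion drawn in the text suggests, then at least of the order of $10^{O(N)}$ levels must be populated.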
In conclusion, our present formalism is able to validate either equilibration or thermalization, but not both of them simultaneously for one and the same physical model system.

Comparison with the approach by Goldstein and coworkers
In a series of works [5,13,16,47,48] Goldstein and coauthors addressed the problem of thermalization by means of an approach which, at first glance, seems to be entirely different from our present one. In particular, the "intermediate" problem of equilibration (see Sect. 2) apparently can be entirely circumvented. One key point of their approach is the restriction to so-called macroscopic observables, i.e. observables whose statistical fluctuations are negligibly small for macroscopic systems at thermal equilibrium. In this section, we will show that the latter defining property of a macroscopic observable implies that it satisfies (weak) ETH. In other words, the restriction to such observables is essentially equivalent to assuming ETH.
We define the microcanonical mean and variance of any given observable $A$ as where $\rho_{\rm mic}$ is the microcanonical ensemble from (5). By definition, an observable $A$ is called a macroscopic observable, if its fluctuations $\Delta A$ are negligibly small. A more precise formal requirement would be vanishing fluctuations in the thermodynamic limit. A more appropriate real world (experimentally useful) version would be to require that the fluctuations are smaller than the experimental resolution limit $\delta A$, with, e.g. $\delta A = 10^{-10}\Delta_A$, where $\Delta_A$ is the measurement range of the experimental instrument modelled by $A$, see Sect. 2. The above requirement represents a minimal condition: Whatever alternative definition of a macroscopic observable may be proposed, if it admits non-small fluctuations in the microcanonical ensemble then it would not seem well-defined to us. Indeed, the definitions employed in [5,13,16] are quite similar but not exactly identical to ours. We also note that the microcanonical ensemble itself is only used here as a formal device to define the notion of a macroscopic observable. It does not in any way anticipate that the actual system of interest should exhibit thermalization.
Introducing (5) into (88) and (89) implies where $(A^2)_{nn} := \langle n|A^2|n\rangle$. In other words, (90) represents the average over those $A_{nn}$'s, whose energies $E_n$ are contained in the microcanonical energy window. Their typical deviation from this average is quantified by the variance From (51) we can conclude that $(A_{nn})^2 \le (A^2)_{nn}$ and hence that $\Delta A_{\rm ETH} \le \Delta A$.
Assuming that $A$ is a macroscopic observable thus implies that $\Delta A_{\rm ETH}$ is small [33].
According to the definition in (92) it follows that most $A_{nn}$'s must be close to the microcanonical mean of $A$, i.e. weak ETH is satisfied (see Sect. 5.6). Closely related considerations are originally due to [33], however focusing on so-called intensive local few-body operators rather than on macroscopic observables.
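The inequality $\Delta A_{\rm ETH} \le \Delta A$ used above can be illustrated on a toy matrix. The sketch below rests on assumptions that are not spelled out in this section: the microcanonical ensemble is taken to weight all $D$ levels of the window equally, the whole (tiny) Hilbert space is taken to coincide with that window, and $A$ is a real symmetric matrix given directly in the energy eigenbasis. The function names are hypothetical.

```python
import math
import random


def microcanonical_spreads(A):
    """Return (mean, dA, dA_eth) for a real symmetric matrix A.

    Assumptions: A is given in the energy eigenbasis, and all D levels
    lie inside the microcanonical window with equal weight 1/D.
    """
    D = len(A)
    diag = [A[n][n] for n in range(D)]
    # (A^2)_nn = sum_k A_nk A_kn
    diag_of_square = [sum(A[n][k] * A[k][n] for k in range(D)) for n in range(D)]
    mean = sum(diag) / D
    var_mic = sum(diag_of_square) / D - mean ** 2   # full fluctuations
    var_eth = sum(a * a for a in diag) / D - mean ** 2  # spread of the A_nn
    return mean, math.sqrt(max(var_mic, 0.0)), math.sqrt(max(var_eth, 0.0))


def random_symmetric(D, seed=1):
    """Random real symmetric D x D matrix with entries in [-1, 1]."""
    rng = random.Random(seed)
    A = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for j in range(i, D):
            A[i][j] = A[j][i] = rng.uniform(-1.0, 1.0)
    return A


mean, dA, dA_eth = microcanonical_spreads(random_symmetric(6))
print(f"mean = {mean:.4f}, dA = {dA:.4f}, dA_eth = {dA_eth:.4f}")
```

In this toy setting the difference of the two variances is the average of $\sum_{k \neq n} (A_{nk})^2$ over $n$, so the two spreads coincide exactly when $A$ is diagonal in the energy eigenbasis.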
Restricting oneself to macroscopic observables clearly has a long and well founded tradition, especially with respect to the thermodynamic roots of statistical physics. On the other hand, statistical physics itself is by no means restricted to such observables. Rather, it is understood and experimentally (and numerically) seen that also "microscopic" observables are perfectly well described by this theory. Such observables exhibit non-negligible fluctuations about their mean values and as such are hardly encountered in our everyday macroscopic world (exceptions may arise near critical points). But already with the help of an optical microscope, interesting observables exhibiting non-negligible thermal fluctuations (e.g. Brownian motion) become accessible. Even more so, within the rapidly developing fields of nanotechnology and single molecule experiments, such microscopic observables become of increasing practical relevance.
Relations between such an approach and ETH other than the one discussed above have also been pointed out in Refs. [36,48]. Furthermore, the concept of randomised Hamiltonians (or random matrices) also plays a key role in the approach by von Neumann and by Goldstein and coworkers [47]. However, rather than introducing this randomness directly into the Hamiltonian $H$ itself (as done in the approach by Deutsch via Eq. (8)), the randomness is now introduced by prescribing the statistical properties of the randomly sampled eigenbases $|n\rangle$ of $H$, see Sect. 5.2, while keeping the spectrum of $H$ fixed. In spite of these technical differences, the two approaches are thus in fact very close in spirit (see, e.g. Sect. 6 in [47]).

Summary and conclusions
In a first step, we reconsidered the random matrix model of Deutsch [44][45][46] and worked out a more detailed and slightly more general demonstration that it validates ETH: For the overwhelming majority of the corresponding random ensemble of Hamiltonians $H$, any given observable $A$ is represented in the eigenbasis of $H$ by an almost diagonal matrix with very slowly varying diagonal elements. More precisely: Apart from a fraction of exceptional $H$'s which is exponentially small in the system size $N$, the off-diagonal matrix elements $A_{mn}$ are exponentially small in $N$ and the changes of the diagonal elements $A_{nn}$ as a function of $n$ are also exponentially small in $N$. This implies the following solution of problem (i) from Sect. 4: For any given $H$, one can readily construct (a posteriori) an ETH-violating $A$ (see Sect. 4), but any such $A$ still continues to satisfy ETH for most other $H$'s. In turn, if $H$ is not known in all details with extremely high precision, then a given observable is exceedingly likely to exhibit ETH.
The generalisation to more than one observable is straightforward: Given that every single observable is exponentially unlikely to violate ETH, it is still extremely likely that all of them will simultaneously exhibit ETH, as long as their number is not exponentially large, i.e. remains within the limits of what is feasible in any real (or numerical) experiment.
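The statement about several observables amounts to a union bound. As a sketch, with $p$ denoting an assumed upper bound on the probability that one given observable violates ETH and $K$ the number of observables $A_1, \dots, A_K$ (both symbols are introduced here for illustration only):

```latex
\mathrm{Prob}\bigl(\text{at least one of } A_1,\dots,A_K \text{ violates ETH}\bigr)
  \le \sum_{k=1}^{K} \mathrm{Prob}\bigl(A_k \text{ violates ETH}\bigr)
  \le K\,p .
```

For $p$ of the order $10^{-O(N)}$, the right hand side remains negligible unless $K$ itself grows exponentially with $N$.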
In a second step, we have shown by means of a further generalisation of Deutsch's approach that also an essential prerequisite for equilibration, namely Eq. (4), will be satisfied for the overwhelming majority of Hamiltonians H. In doing so, an arbitrary (pure or mixed) state ρ(0) is admitted as initial condition. But this initial state ρ(0) must then remain always the same for the entire ensemble of random Hamiltonians H.
We also identified a not yet satisfactorily solved aspect of Deutsch's original approach and our present generalisation: On the one hand, the changes of the diagonal matrix elements $A_{nn}$ as a function of $n$ are exponentially small in the system size $N$ up to exponentially rare exceptions. On the other hand, the typical difference between neighbouring energy levels $E_n$ is also exponentially small in $N$ (cf. Sect. 2), i.e. the number of energy eigenvalues contained in an energy interval $[E - \Delta E, E]$ is exponentially large in $N$ for the usual $\Delta E$'s of interest. Hence, the variations of $A_{nn}$ within the entire energy interval may no longer be negligible. As a consequence, thermalization, i.e., the practical indistinguishability of (6) and (7), can only be proven under the extra condition that the interval of relevant $n$-values, which notably contribute to those sums in (6) and (7), is not too large, namely much smaller than $10^{O(N)}$ (cf. (55)). In other words, only exceedingly small $\Delta E$'s are admitted. In the following four paragraphs, we conclude with four noteworthy remarks and implications.
In spite of this restriction, the admitted range of $n$-values is still huge, and likewise for the admitted energy intervals $\Delta E$ in comparison with the energy level spacings. In particular, they are still of physical interest: For instance, one may imagine that the experimentalist has prepared the system with a sufficiently small uncertainty in the total system energy $E$ so that the corresponding condition can be safely taken for granted for the $\rho_{nn}$'s appearing in (6) and (7).
As mentioned already in Sect. 4, our findings imply the equivalence of $\rho_{\rm eq}$ in (7) not only with the microcanonical ensemble $\rho_{\rm mic}$ in (6) but also with any other equilibrium (i.e. steady state) ensemble, provided that its level populations are mainly concentrated within a sufficiently small energy window as specified above. Unfortunately, this condition is not satisfied e.g. for the canonical ensemble.
The other, more fortunate side of the coin is that within our present approach the diagonal matrix elements $A_{nn}$ are indeed not forbidden to exhibit non-negligible variations for sufficiently large changes of $n$, or, equivalently, for macroscopically notable changes of $E_n$. This solves problem (ii) from Sect. 4.
Assuming one and the same initial state $\rho(0)$ for the entire ensemble of random Hamiltonians $H$, as done in our above discussion of equilibration, implies that the number of $\rho_{nn}$'s which notably contribute in (7) is not much smaller than $10^{O(N)}$ for most $H$'s. In conclusion, our present generalisation of Deutsch's approach allows us to corroborate either equilibration or thermalization, but not both of them simultaneously for one and the same physical model system. The root of the problem is as before: Whether and why the dependence of the diagonal matrix elements $A_{nn}$ on $n$ is neither too strong nor too weak is not yet fully satisfactorily understood. A solution of this problem is presently being worked out.