Transportless equilibration in isolated many-body quantum systems

A general analytical theory of temporal relaxation processes in isolated quantum systems with many degrees of freedom is elaborated, which unifies and substantially amends several previous approximations. Specifically, the Fourier transform of the initial energy distribution is found to play a key role, which is furthermore equivalent to the so-called survival probability in case of a pure initial state. The main prerequisite is the absence of any notable transport currents, caused for instance by some initially unbalanced local densities of particles, energy, and so on. In particular, such a transportless relaxation scenario naturally arises when both the system Hamiltonian and the initial non-equilibrium state do not exhibit any spatial inhomogeneities on macroscopic scales. A further requirement is that the relaxation must not be notably influenced by any approximate (but not exact) constant of motion or metastable state. The theoretical predictions are compared with various experimental and numerical results from the literature.


Introduction and Overview
Relaxation processes in systems with many degrees of freedom play a key role in a large variety of different physical contexts [1,2,3,4,5,6,7]. Quite often, an essential feature of the pertinent non-equilibrium initial states are some unbalanced local densities of particles, energy, etc., giving rise to transport currents during the relaxation towards equilibrium. Paradigmatic examples are compound systems, parts of which are initially hotter than others, or a simple gas in a box, streaming through a little hole into an empty second box. As a consequence, the temporal relaxation crucially depends on the system size, and may become arbitrarily slow for sufficiently large systems.
In the present work, the focus is on the complementary class of equilibration processes, which do not entail any such transport currents. In the simplest case, one may think of systems without any spatial inhomogeneities on the macroscopic scale, for instance a fluid or solid with spatially constant densities of all particle species, energy, and so on. (Inhomogeneities on the microscopic (atomic) scale are obviously still admitted; they are outside the realm to which concepts like "densities" and associated "transport currents" are applicable, see also section 5.) The non-equilibrium character of an initial state could then for instance manifest itself in a non-thermal velocity distribution. Another concrete experimental example, to which we will actually apply our theory in section 6, is the excitation of an "electron gas" by a laser pulse, resulting in a system state, which is spatially homogeneous but exhibits strong deviations from the usual Fermi-Dirac statistics at equilibrium. Further pertinent examples, which are often considered in numerical investigations, and which will also be compared with our present theory later on, are so-called quantum quenches, where the initial state is given by the ground state (or some other eigenstate or thermal equilibrium state) of a Hamiltonian, which is different from the Hamiltonian that governs the actual relaxation dynamics. Still focusing on spatially homogeneous Hamiltonians and states, also other types of "handmade" non-equilibrium initial conditions are commonly explored in the literature, e.g., so-called Néel states (antiferromagnetic order) in the context of various spin models. In all these cases of transportless equilibration, it is reasonable to expect (and will be confirmed later on) that the temporal relaxation is practically independent of the system size, and that the typical time scales will be much faster than for transport governed equilibration. As yet another striking feature, we will find that transportless relaxation is usually not exponential in time.
The general issues of equilibration and thermalization in isolated many-body quantum systems have stimulated during recent years a steadily growing amount of analytical, numerical, as well as experimental activity, reviewed, e.g. in [1,2,3,4,5,6,7]. (In doing so, also open systems (interacting and possibly entangled with an environment) can be treated by considering the environment (thermal bath, particle reservoir etc.) and the actual system of interest as an isolated compound system.) Strictly speaking, the relaxation of such an isolated system towards a steady long-time limit is immediately ruled out by the unitary time evolution and, in particular, by the well-know quantum revival effects [8]. Nevertheless, "practical equilibration" (almost steady expectation values for the vast majority of all sufficiently large times) has been rigorously established in [9,10,11,12,13] under quite general conditions.
In section 2, the essential points of those previous results on equilibration will be made plausible once again by means of a new, less rigorous, but much simpler and intuitive reasoning. It should be emphasized that the issue of equilibration is related to, but different from the issue of thermalization, i.e., the question whether or not the above mentioned (almost) steady expectation values in the long-time limit agree with the textbook predictions of equilibrium statistical mechanics. The latter issue of thermalization does not play any role throughout this paper: all results are valid independently of whether or not the considered system thermalizes.
In section 3, the previous rigorous approach to transportless equilibration from [14,15] is revisited in terms of an alternative, non-rigorous but physically much simpler line of reasoning, while in sections 4 and 5 its main preconditions are worked out in considerable more detail than before. A representative comparison of this theory with experimental observations is provided by section 6. Section 7 represents the actual core of the paper, and the formal approach adopted in this section is substantially more elaborate than in the previous sections 2 and 3. Technically speaking, the crucial idea is to skillfully "rearrange" the systems's very dense energy eigenvalues and to "redistribute" the possibly quite heterogeneous populations of the corresponding eigenstates, yielding an effective description in terms of an auxiliary Hamiltonian with approximately equally populated eigenstates. The main result is a unification and substantial amendment of the earlier findings in [14,15,16,17], formally summarized by the compact final equation (74). The decisive quantity, which governs the temporal relaxation via the last term in equation (74), will furthermore be identified in section 7 with the Fourier transform of the system's initial energy distribution, and in case the system is in a pure state, also with the so-called survival probability of the initial state. These analytical predictions are compared with previously published numerical simulations in section 8.
Even when focusing solely on analytical investigations, previous studies related to relaxation time scales and the like are still quite numerous, and pointing out in each case the similarities and differences to our present approach goes beyond the scope of this paper. A first major issue in this context, addressed e.g. in [11,18], is the derivation of general upper bounds for some suitably defined relaxation time. While in some specifically tailored examples, the relaxation may indeed become extremely slow [19], those upper bounds are still not quantitatively comparable to the actually observed time scales in more realistic situations. On the other hand, extremely fast time scales have been predicted, e.g., in [19,20]. Finally, investigations of particular classes of models, observables, or initial conditions are provided, among others, in [21,22]. One important step forward of our present work is that not only an estimate of some characteristic time scale, but also a detailed description of the entire temporal relaxation behavior is provided and quantitatively verified against experimental and numerical data.

Equilibration and thermalization
We consider an isolated system, modeled by a Hamiltonian H = n E n |n n| (1) and an initial state ρ(0) (pure or mixed and in general far from equilibrium), which evolves in time according to with propagator Hence, the expectation value of any given observable A in the time evolved state ρ(t) follows as where ρ mn (t) := m|ρ(t)|n , A nm := n|A|m , and where, depending on the specific problem under consideration, the indices n and m run from 1 to infinity or to some finite upper limit. In particular, p n := n|ρ(0)|n = ρ nn (0) (6) represents the population of the n-th energy level, i.e., the probability that the system is found in the energy eigenstate |n when averaging over many repetitions of the measurement and -in the case of a mixed state -over the statistical ensemble described by ρ(0).
The main examples we have in mind are macroscopic systems with, say, f ≈ 10 23 degrees of freedom. While such many-body quantum systems are generically nonintegrable, so-called integrable systems are still admitted in most of what follows. Likewise, compound systems, consisting of a subsystem of actual interest and a much larger environmental bath, are also included as special cases.
Equation (5) represents the completely general and formally exact solution of the dynamics, exhibiting the usual symmetry properties of quantum mechanics under time inversion. Moreover, the right hand side is a quasi-periodic function of t, giving rise to the well-known quantum revival effects [8]: A ρ(t) must return very close to A ρ(0) for certain, very rare times t.
The problem of equilibration amounts to the question whether, in which sense, and under what conditions the expectation value (5) approaches some constant (timeindependent) value for large t. Unless this expectation value is constant right from the beginning, which is not the case under generic (non-equilibrium) circumstances, the above mentioned revivals immediately exclude equilibration in the strict sense that (5) converges towards some well-defined limit for t → ∞. On the other hand, "practical equilibration" in the sense that (5) becomes virtually indistinguishable from a constant value for the overwhelming majority of all sufficiently large t has been demonstrated, for instance, in [9,10,11,12,13] under quite weak conditions on H, ρ(0), and A. In particular, equilibration in this sense still admits transient initial relaxation processes and is compatible with the above mentioned time inversion symmetry and quantum revival properties.
For the rigorous derivation of these results and the detailed requirements on H, ρ(0), and A, we refer to the above mentioned literature. Here, we confine ourselves to a complementary, predominantly heuristic discussion of the essential points: Averaging (5) over all times t ≥ 0 yields the result A ρ dia , where the so-called diagonal ensemble is defined as and where we exploited (6) in the last step ‡. Given the system equilibrates at all (in the above specified sense), it follows that (5) must remain extremely close to A ρ dia for the vast majority of all sufficiently large times t. Intuitively, the essential mechanism is expected to be a "dephasing" [9,23,24] of the oscillating summands on the right hand side of (5): there must be sufficiently many different "frequencies" [E n −E m ]/ which notably contribute to the sum, resulting in an approximate cancellation for most sufficiently large t, provided H, ρ(0), and A satisfy certain "minimal" conditions: To begin with, some of the oscillating summands in (5) may assume arbitrary large amplitudes by suitably tailoring the A nm 's, even for otherwise quite harmless ρ(0) and H, thus prohibiting equilibration in any meaningful sense. To exclude such pathologies, a convenient minimal requirement on A turns out to be that it must represent an experimental device with a finite range ∆ A of possible measurement outcomes, where ∆ A is given by the difference between the largest and smallest eigenvalues of A. Furthermore, the resolution limit δA of the considered device must be limited to experimentally reasonable values compared to its working range ∆ A . Quantitatively, all measurements known to the present author yield less than 20 significant figures, implying that the resolution limit δA must be lower bounded by 10 −20 ∆ A . Maybe some day 100 or 1000 significant figures will become feasible, but it seems reasonable that a theory which does not go very much beyond this will do. Note that similar restrictions also apply to numerical experiments by computer simulations. We finally remark that the same or some equivalent assumption on A is, at least implicitly, taken for granted in all pertinent ‡ If H exhibits degeneracies, we tacitly choose the eigenvectors |n so that ρ mn (0) is diagonal within every eigenspace. Regarding the existence of the time average for infinite dimensional Hilbert spaces see [12]. works in this context, and it is obvious that considering only such observables will be sufficient for all practical purposes.
Similarly, with respect to ρ(0) it is quite plausible that if two (or more) level populations p n in (6) with non-degenerate energies E n are not very small (compared to n p n = 1) then non-negligible Rabi oscillations may arise in (5), which prohibit equilibration in any reasonable sense §, even for otherwise quite harmless A and H. In other words, all level populations must satisfy the condition p n ≪ 1 apart from possibly one exception. More generally, if H exhibits degenerate eigenvalues E n , then analogous conditions must be fulfilled by the populations of the energy eigenspaces in order to rule out any non-negligible "coherent oscillations" on the right hand side of (5). For similar reasons, not too many of the "energy gaps" E n − E m in (5) may coincide, or if they coincide, they must contribute with sufficiently small weights. In view of the usually very dense and irregular energy spectra, the above (or some equivalent) requirements are commonly taken for granted under all experimentally relevant conditions.
Given H, ρ(0), and A satisfy the above "minimal requirements", there are no further obvious reasons which may prevent equilibration via a "dephasing" of the summands on the right hand side of (5). One thus expects that, after initial transients have died out, the system behaves practically indistinguishable from the steady state (7); deviations are either unresolvably small (below the resolution limit δA) or unimaginably rare in time. All this has been rigorously confirmed, e.g., in [9,10,11,12,13].
As an aside we note that the preparation of an initial condition ρ(0) with a distinct non-equilibrium expectation value of A at time t = 0 must actually amount to a quite special selection of the terms ρ mn (0)A nm (in particular of their complex phases) on the right hand side of (5) [23]. This issue is in fact also quite closely related to a variety of so-called typicality concepts and results, see [25,26,27].
In the rest of the paper we always tacitly focus on systems, for which the above "minimal conditions" are fulfilled, and hence equilibration can be taken for granted. For the sake of simplicity, we will further restrict ourselves to the generic case that the energy differences E m − E n are non-zero and mutually different for all pairs m = n, and that is fulfilled for all level populations in (6), i.e., we neglect the above mentioned generalization that there may be one exceptional index n which violates (8). Similarly, also our above restriction on the energy differences E m − E n could in principle still be lifted to some degree, as shown in [11,12]. The natural next question is whether the system exhibits thermalization, that is, whether the long-time average A ρ dia (see above (7)) is well approximated by the pertinent microcanonical expectation value, as predicted by equilibrium statistical mechanics. Throughout the present paper, this issue of whether the system thermalizes or not will be largely irrelevant. In particular, so-called integrable systems and systems exhibiting many body localization (MBL), which are commonly expected to exhibit equilibration but not thermalization [1,2,3,6,28], are still admitted.

Typical temporal relaxation
Taking for granted equilibration as specified above, the main focus of this section is on the detailed temporal relaxation of the expectation value (5) from its initial value at time t = 0 towards the (apparent) long-time limit A ρ dia (see above (7)).
Similarly as in section 2, while a mathematically rigorous derivation of the subsequent results is provided in [14,15], the following line of reasoning amounts to a much shorter, less rigorous, but physically more instructive alternative derivation.
To begin with, we assume that only some large but finite number D of the energy levels E n exhibit non-negligible populations p n = ρ nn (0) (see (6)) and, without loss of generality, we label them so that n ∈ {1, ..., D} for all those E n . Accordingly, all other ρ nn (0)'s are approximated as being strictly zero. For a more detailed, quantitative justification of this approximation we refer to Appendix A. The Cauchy-Schwarz inequality |ρ mn | 2 ≤ ρ mm ρ nn then implies that only m, n ≤ D actually matter in (1), (5), (7), i.e., Note that if the number D of non-negligible level populations were not large, then equilibration as discussed in section 2 may not be expected in the first place. On the other hand, (10) can be shown to approximate (5) very well under quite general conditions (see also Appendix A). The examples of foremost interest are isolated many-body systems with a macroscopically well defined energy, i.e., all relevant energies E 1 ,...,E D are confined to some microcanonical energy window [E − ∆E, E] of microscopically large but macroscopically small width ∆E. Henceforth it is taken for granted that the considered system is of this type.
The summands with m = n in (10) can be readily rewritten by means of the diagonal ensemble from (11) as A ρ dia , yielding where the symbol ′ indicates a sum over all m, n ∈ {1, ..., D} with m = n. Since D is large, the number D(D − 1) of those summands is even much larger. For any given t, those very numerous e mn 's are distributed on the complex unit circle according to (13). All of them start out from e mn = 1 for t = 0, and subsequently spread out along the unit circle as t increases. Hence, their distribution on the unit circle will be highly non-uniform (strongly peaked around unity) for small t, while they are expected to become roughly speaking uniformly distributed as t → ∞. More precisely, since the number of e mn 's is large but finite, their collective motion on the unit circle must be quasi-periodic, i.e., occasional "recurrences" and other appreciable deviations from a uniform distribution necessarily must occur for certain, arbitrary large times t, but they are expected to be extremely rare and thus safely negligible for all practical purposes.
Turning to (14), one readily concludes from the Cauchy-Schwarz inequality that |A nm | ≤ A , where A indicates the operator norm of A (largest eigenvalue in modulus). Likewise, one sees that |ρ mn (0)| ≤ ρ(0) ≤ 1, i.e., all the a mn 's are distributed inside a circle of radius A in the complex plane.
Note that the matrix elements A nm = n|A|m in (14) are independent of the energy eigenvalues E n , while the e mn 's in (13) are independent of the corresponding energy eigenvectors |n . Furthermore, only indices m and n with macroscopically small differences E n − E m (see below (11)) and with m = n actually matter in (12). In the absence of any a priori reasons to the contrary, one thus expects that the quantitative values of the matrix elements A nm will not be "correlated" in any specific way with the e mn 's, see also [3,21,29]. Put differently, how should the observable A "feel" for example whether or not a given pair of eigenvectors |n and |m belongs to a small energy differences E n − E m in (13) without any a priori knowledge about the Hamiltonian H in (9) ? After all, without such extra knowledge, the |n 's are orthogonal to each other but for the rest may be arranged in any way within the high dimensional Hilbert space under consideration.
Similar considerations as for the A nm apply to the matrix elements ρ mn (0) in (14). All these arguments suggest that both the e mn 's and the a mn 's may be roughly speaking viewed as two large sets of pseudorandom numbers, which are essentially independent of each other, implying the approximation ′ e mn a mn Indeed, since D(D − 1) is the number of summands in each of the three sums in (15), the left hand side amounts to the correlation of the e mn 's and the a mn 's, which, for statistically independent random numbers and D → ∞, is known to converge (with probability → 1) towards the product of the two mean values on the right hand side. Qualitatively, somewhat similar ideas have also been developed in [24], but the quantitative details were quite different. Concerning the above justification of (15), our first side remark is that the e mn 's and the a mn 's are actually only required to be uncorrelated, which is strictly speaking a weaker condition than being independent. Second, we note that the e mn 's need not be uniformly distributed on the unit circle . Third, focusing on the a mn 's alone, it is not necessary that they are uncorrelated or independent from each other, and likewise for the e mn 's.
This heuristic approximation in (15) will be the key ingredient of our subsequent line of reasoning. Further arguments in support of it are: (i) It amounts to an exact identity for t = 0. (ii) Likewise, upon averaging over all times t ≥ 0 and taking for granted that all energies E n are non-degenerate (see above (8)), one can show that (15) becomes an exact identity.
The first sum on the right hand side of (15) can be rewritten by means of (13) as Likewise, the last sum in (15) can be rewritten by means of (14) as and with (10), (11) it follows that Upon introducing (15)- (19) into (12), we finally obtain as our first main result the approximation where F (t) := (D|φ(t)| 2 − 1)/(D − 1). Since D ≫ 1 this yields the very accurate approximation where φ(t) is given by (17) and therefore may be interpreted as the Fourier transform of the spectral density of H.
The key ingredient for the derivation of (20) was the heuristic approximation (15). While it makes the derivation short and physically instructive, a more rigorous justification of (15) seems very difficult. On the other hand, the very same formula (20) can also be rigorously obtained by means of a technically very different, more arduous and less instructive approach, see [14,15], using averages over unitary transformations, Also in probability theory, two random variables may well be statistically independent (or uncorrelated), no matter how each of the two single variables is distributed. One (or both) of them may even be non-random (corresponding to a delta-distribution), in which case the independence property is always trivially fulfilled.
under which the locality properties of a given Hamiltonian are in general not preserved (see also sections 4 and 5).
Upon comparison with (17) we see that F (t) in (21) quantifies the above discussed distribution of the e mn 's on the complex unit circle. In particular, one readily finds that F (0) = 1 and 0 ≤ F (t) ≤ 1 for all t. Moreover, the following properties of F (t) were derived previously in [14]: (i) F (t) remains negligibly small for the vast majority of all sufficiently large times t, provided the maximal degeneracy of the energies E 1 , ..., E D is much smaller than D (see also above (8)). The extremely rare exceptional t's are inherited from the above mentioned quasi-periodic motion of the e mn 's on the unit circle. Our main result (20) thus captures at least qualitatively correctly the decay from the initial expectation value A ρ(0) towards the long-time average A ρ dia , and also the well-known quantum revivals at arbitrarily large but exceedingly rare times [8]. (ii) Denoting by Ω(E) the number of energies E n below E, by k B and S(E) := k B ln Ω(E) Boltzmann's constant and entropy, respectively, and by T := 1/S ′ (E) the corresponding formal temperature, one can often approximate the sum in (15) by an integral over a suitably smoothened level density, yielding the approximation As may have been expected, the above mentioned quasi-periodicities of F (t) and the concomitant quantum revivals get lost within such a continuum approximation. We also note that T and S(E) can be identified with the usual temperature and entropy of the thermalized system (at energy E), provided the system does approach thermal equilibrium for large times (see end of section 2). In the opposite case of a non-thermal long-time limit, T and S(E) are usually still well defined formal quantities, but without an immediate physical meaning. Rather, they may be viewed as the equilibrium temperature and entropy of some auxiliary initial state ρ aux (0), which does exhibit thermalization, and whose energy expectation value Tr{ρ aux (0)H} is identical to the "true" system energy E := Tr{ρ(0)H}. In particular, such a ρ aux (0) always exists (for instance the microcanonical ensemble), and hence (22) remains valid even for non-thermalizing initial states ρ(0). The only prerequisite is that the thermal equilibrium properties of H are "as usual", i.e., the density of states is very high and grows very fast with E.
A further implication of (17) and (21) is that F (−t) = F (t) for all t. Hence, the fundamental symmetry properties of quantum mechanics under time inversion mentioned below (5) are still maintained by (20). Remarkably, the time inversion symmetry of (20) even persists in cases where it is broken in the microscopic quantum dynamics, e.g., due to an external magnetic field. This is reminiscent of the second law of thermodynamics, which also remains valid for systems with a magnetic field and thus with broken microscopic time inversion symmetry.

Exceptional cases
In this section, we collect the main a priori reasons announced above (15), which may invalidate the approximation (15) and hence our main result (20).
To begin with, we note that is the commutator between the Hamiltonian (9) and the observable A. If A is a conserved quantity it satisfies [H, A] = 0, implying that A nm = 0 whenever E n = E m . If we now slightly perturb the Hamiltonian under consideration, one can infer from ordinary perturbation theory (for extremely small perturbations) or more sophisticated nonperturbative methods [30] (for moderately small perturbations) that the new matrix elements A nm in the basis of the perturbed Hamiltonian are non-negligible only for relatively small E n − E m . With reference to the new, slightly perturbed system, the observable A may thus be called "almost conserved", still exhibiting a significant correlation between the energy differences E n − E m and the magnitude of the matrix elements A nm . Hence, also the e mn 's in (13) and the a mn 's in (14) will be correlated and the argument below (15) breaks down. One thus expects that the temporal relaxation of such an almost conserved quantity will be slower than predicted by (20).
Important examples are the energies of two weakly coupled subsystems (of an isolated compound system), or the total momentum of an isolated system, such as a simple gas in a box, which is not conserved due to momentum exchange with the system boundaries (and similarly for the total angular momentum). All these observables then amount to almost conserved quantities since they represent "volume" properties (extensive quantities), which only can change through "surface" effects (exchange of energy, momentum etc. via "particle-wall interactions"). Our present theory only applies if such quantities assume their equilibrium value right from the beginning (e.g., the total momentum must be zero), or if they can be approximated as being strictly conserved (e.g., the weak coupling between subsystems is "switched off"). Put differently, this is a first instance where we see that macroscopic transport in the sense of section 1 must be excluded.
An analogous breakdown of (15) and hence of (20) is expected if ρ(0) is an "almost conserved" quantity.
Next, let us replace the original H from (9) by the transformed Hamiltonian where U is an arbitrary but fixed unitary transformation. In other words, the eigenvalues of H U are still given by E n , while the eigenvectors are now U|n instead of |n . Accordingly, the original definition ρ mn (0) := m|ρ(0)|n in (10)-(19) must be replaced by ρ mn (0) := m|U † ρ(0)U|n , and analogously for the definitions of A nm and of ρ dia in (11). In the final result (20), the initial value A ρ(0) as well as the function F (t) are not affected by such a unitary transformation, while the quantitative value of the longtime average A ρ dia may in general change. Similarly, the e mn 's in (13) are independent of U, while the a mn 's in (14) are typically "redistributed" in a very complicated way. Therefore, (15) is expected to be satisfied in very good approximation for most U's. A more detailed verification of this expectation is provided in [14,15]. The key point is that this finding is independent of whether (15) was satisfied by the original Hamiltonian H in (9) or not.
In conclusion, (20) cannot be correct if the temporal relaxation, encapsulated by the U independent function F (t), is notably different for the "true" Hamiltonian H than for most other Hamiltonians H U .
One readily sees that the latter criterion, in particular, also excludes the previously discussed cases when A or ρ(0) is an almost conserved quantity.

Restriction to transportless relaxation
A pivotal feature of almost all physical systems of interest is that they can be very well described in terms of some "elementary constituents" (atoms, molecules, quasiparticles etc.), which are reasonably localized in space and whose interaction is of short range. Formally, the model Hamiltonian H is thus composed solely of so-called local operators. Only in such cases it makes sense to ask for the amount of energy, charge, particles etc. within some subdomain of the system: If the considered volume is not too small then the interaction with the rest of the system is weak and can be approximately ignored (surface effects are small compared to volume contributions). In other words, local densities are reasonably well-defined concepts. Since they are usually "local descendants" of some globally conserved quantities (energy, charge, particle numbers etc.) their content within a given volume can only change via transport currents through the boundaries of that volume.
As discussed in section 2, all those local densities will equilibrate towards certain (approximately) steady values after sufficiently long times. If all local densities for a given initial state ρ(0) agree (at every point in space and in sufficiently good approximation) with the corresponding equilibrium values, then ρ(0) is called a macroscopically homogeneous initial state. The word "homogeneous" refers to the fact that the densities after equilibration are indeed spatially homogeneous in many examples of interest. For simplicity, we tacitly focus on such situations in the following discussion. However, analogous conclusions remain valid even when the equilibrated densities are actually inhomogeneous. (It is only the naming which becomes "wrong", not the argument). The word "macroscopic" refers to the fact that the very concept of a density or a transport current breaks down on microscopic length scales. (For instance, the number of atoms within a small volume should be well approximated by the corresponding particle density times the volume. If the volume is so small that it only contains a few atoms, this is no longer true. Put differently, the microscopically discrete particles are no longer well described by a continuum approximation in terms of densities and concomitant currents.) In real systems, the equilibration of initial inhomogeneities via the above mentioned transport currents takes an increasingly long time over increasingly large distances. On the other hand, the function F (t) from (17) and (21), which governs the temporal relaxation in (20), is independent of the initial state and thus of the distance between possible inhomogeneities. Moreover, the characteristic time scale, predicted, e.g., by (22) is very short ( /k B T ≃ 26 fs at room temperature). In other words, (20) must be invalid for initial conditions which give rise to significant spatial inhomogeneities on macroscopic scales.
The underlying a priori reason (see section 4) is as follows. In contrast to H (see above), most transformed Hamiltonians H U in (23) can no longer be interpreted as a description of certain basic constituents (atoms etc.) which are spatially well localized and exhibit short range interactions, nor can they any longer be rewritten as (sums of) local operators. Hence, local densities and transport currents are not any more well defined, and the very same initial conditions ρ(0), which entailed spatial inhomogeneities when dealing with H, are no longer expected to equilibrate particularly slowly when H U governs the dynamics. Hence the "exclusion criterion" at the end of section 4 applies to such a system Hamiltonian H.
It is interesting to consider the same thing from yet another viewpoint. Namely, one readily sees from the discussion below (23) that instead of replacing H by H U (while leaving ρ(0) and A unchanged), one could as well keep H unchanged and replace ρ(0) and A by ρ U (0) := U † ρ(0)U and A U := U † AU, respectively. In other words, only the initial state and the specific observable under consideration are changed, whereas local densities etc. are represented by the same operators before and after the transformation, and, in particular, still remain perfectly well defined concepts even in the transformed setup. For any given such invariant operator B, one can show along the lines of [26] that the initial expectation value B ρ U (0) is practically indistinguishable from the pertinent equilibrium value B ρ dia for most U's. In particular, B may quantify the amount of energy (or charge etc.) within a macroscopically small but microscopically still not too small volume V , and thus B/V accounts for the corresponding density at the location of that volume. The same remains true simultaneously for several different observables B 1 , .., B K , where K may be sufficiently large to specify the entire spatial dependence of the densities within any experimentally resolvable resolution. As a consequence, most ρ U (0)'s must be (approximately) homogeneous and hence their relaxation (under H) is not expected to be particularly slow.
In conclusion, systems with short range interactions in combination with initial conditions, which give rise to non-negligible spatial inhomogeneities on macroscopic scales, must be excluded in (20). Put differently, the total energy, (angular) momentum, particle numbers etc. within any macroscopic part of the system must remain constant during the entire relaxation process. Accordingly, the relaxation process must not entail any significant transport currents, caused by some unbalanced local densities.
For instance, such a transportless relaxation scenario often arises quite naturally when the system Hamiltonian and the initial non-equilibrium state do not exhibit any spatial inhomogeneities on macroscopic scales. Strictly speaking, one also has to exclude the possibility of spontaneous symmetry breaking during relaxation, initial states with non-vanishing total momentum (resulting in transport through system boundaries), etc., see also section 4.
In case of notable spatial inhomogeneities, it may still be possible to approximately partition the system into sufficiently small, non-interacting subsystems and then describe the relaxation within each of them by (20). Essentially, this is tantamount to the well established concept of local equilibration. Usually, this local equilibration is much faster than the subsequent, global equilibration of the small subsystems relatively to each other. The latter, slow processes are no longer covered by our theory (20). In turn, the clear-cut separation of the two time scales usually admits some Markovian approximation for the slow processes, resulting in an exponential decay, whose timescale still depends on many details of the system. For similar reasons, also correlation and entanglement properties of spatially well separated regions are beyond the realm of our present theory; very roughly speaking, they may be viewed as being governed by transport of information, whose propagation speed is limited, e.g., by Lieb-Robinson bounds [2,31].
Closely related further generalizations of the above local equilibration paradigm are the concepts of hindered equilibrium, quasi-equilibrium, metastability, and, above all, prethermalization [1,32,33,34,35,36]. The first three concepts play a crucial role for instance in chemical reactions with long-lived intermediates, or in quantum systems exhibiting "glassy behavior" [37,38], while the concept of prethermalization refers, e.g., to a fast but only partial thermalization of a certain subset of modes, (quasi-)particles ¶, or other generalized degrees of freedom [14].
More formally, the latter cases have their origin in certain almost conserved quantities of the pertinent Hamiltonian H, which significantly slow down some intermediate steps of the temporal relaxation, while the same is no longer true for most of the transformed Hamiltonians H U within the framework discussed at the end of section 4.
As already mentioned, analogous conclusions remain valid even when the equilibrated densities are actually inhomogeneous, provided all of them are (approximately) equal to the initial densities. The only indispensable prerequisite is the absence of transport during relaxation. This case is of particular interest when the system is composed of a small subsystem of actual interest and a bath. Usually the bath can be considered as equilibrated right from the beginning, hence the decisive question is whether all densities in the small subsystem remain (practically) unchanged during the equilibration process. In particular, if the subsystem is so small that no meaningful local densities can be defined, then the above considerations no longer imply that some initial conditions must be excluded a priori. In turn, if the subsystem is not small and all transport currents are still excluded, one expects a largely similar relaxation behavior in the presence and in the absence of the bath. ¶ In general, quasiparticles are expected to become a meaningful concept only after prethermalization [33].

Comparison with experiments
As recognized in the preceding section 5, an indispensable prerequisite of our present theory is that the initial non-equilibrium state must be spatially homogeneous. Though most published experiments on equilibration and thermalization admittedly do not fulfill this requirement, there still exists a considerable number which do fulfill it.
A variety of such experimental (as well as numerical) data from the literature have been demonstrated already in [14,15] to agree remarkably well with the theoretical predictions in (20) and (22). It is worth mentioning that most of those data have not been quantitatively explained by any other analytical theory so far.
Note that the relevant time scale /k B T in (22) is approximately 26 fs at room temperature. In many cases, such extremely fast processes may be experimentally difficult to observe, or they have simply not been looked for until now. In particular, spatially inhomogeneous initial conditions usually exhibit a much slower relaxation, but they are not covered by our present theory. On the other hand, for systems at extremely low temperatures, such as atomic Bose gases, the relevant time scale /k B T will be more easily accessible, hence these are promising candidates for a comparison with our present theory [14,15]. Finally, the relaxation dynamics near a quantum critical point is known to be governed by the very same time scale /k B T under very general conditions, i.e., independently of any further microscopic details of the system [39].
For a concrete experimental (or numerical) setup at hand, the value of A ρ(0) in (20) is sometimes quite obvious, but more often its quantitative determination is very difficult by purely theoretical means, and likewise for the long-time average A ρ dia in (20). On the one hand, to analytically determine those values is not a main issue of our present work. On the other hand, even the experimental data themselves are often reported in arbitrary units. Therefore, the quantitative values of A ρ(0) and A ρ dia in (20) usually must be taken over from the experiment (or the numerics), hence the only remaining parameter of the theory is the temperature T in (22). Once again, the relevant temperature value, as discussed below (22), is often not available as an experimentally determined quantity, and hence must be estimated indirectly or treated as yet another fit parameter [14,15].
In the remainder of this section, we focus on one of the rare examples, for which the pertinent temperature in (22) is experimentally available. Namely, we consider the pump-probe experiment from [40], where the electron gas in a graphene monolayer is excited by an ultrashort "pump" laser pulse, and then its re-thermalization is monitored by a second "probe" pulse, yielding the number of electrons in the conduction band N CB , see also figure 1. In other words, the observable A in (20) is chosen so that A ρ(t) = N CB (t). A more detailed modeling of the actual observable A corresponding to the experimental measurement procedure would be quite difficult, but fortunately is not needed ! Prior to the pump pulse, the system is at room temperature and A ρ(t) = N CB (t) is known to be negligibly small [40]; i.e., N CB (t) = 0 for t < 0. At time t = 0, the pump pulse suddenly excites a certain number A ρ(0) = N CB (0) of electrons into the conduction band (hence the discontinuity of the dotted line in figure 1). Subsequently, these excited electrons generate secondary electron-hole pairs via impact ionization (inverse Auger scattering) so that A ρ(t) = N CB (t) further increases [40]. If the electron gas were strictly isolated from the rest of the world (as assumed in our theory), it would approach a new thermal equilibrium with some temperature T . Identifying the corresponding long-time average of N CB (t) with A ρ dia in (20), one can deduce from figure 6a in [41] the estimate In particular, the corresponding electron gas temperature in figure 6e of [41] is comparable to the experimentally relevant value (see below). However, in the actual experiment, there is -besides the dominating electron-electron interactions -also a relatively weak interaction via electron-phonon scattering with the atomic "backgroundlattice" of the graphene layer, resulting in a relatively slow relaxation of the electronlattice compound towards a thermal equilibrium state of the total system, which is different from the above mentioned hypothetical equilibrium of the electron gas alone, and which is not covered by our present theory (the energy of the electron gas is an almost conserved quantity, see section 4). Experimentally, one observes that the phonon effects are still approximately negligible for times up to about t = 25 fs, while the electron gas already approximately thermalizes. Therefore, only times up to t = 25 fs have been included in figure 1. In turn, one can deduce from Figure 4 in the Supplemental Material of [40] that the corresponding electron temperature T in (22) is approximately 2000 K.
The resulting theoretical prediction is indicated as dotted line in figure 1 and does not agree very well with the experimental data. The quite obvious reason is that while both laser pulses are extremely short in the experiment, their duration is still not negligible compared to the relaxation time scale of the electron gas. Theoretically, we roughly take into account the finite widths of both pulses by convoluting our above prediction with a Gaussian of standard deviation 5.5 fs. The latter value for the combined widths of both pulses has been experimentally determined, as detailed in the Supplemental Material of [40] (see last paragraph of page 3 therein). The so obtained solid line in figure 1 agrees very well with the experimental findings, especially in view of the fact that, apart from the unknown units of the experimental data, there remains no free fit parameter in the underlying theory.
With respect to the probe pulse, the above convolution with a Gaussian seems an intuitively quite plausible modeling of the "smeared out" time point t of the experimental measurement. With respect to the pump pulse, it represents a rather poor "effective description" since our entire theoretical approach becomes strictly speaking invalid when the duration of the initial perturbation becomes comparable to the relaxation time [14]. One the other hand, it still seems reasonable to expect that the finite widths of the pump and of the probe pulses will have roughly comparable effects on the measurement outcome. Alternatively, one may imagine that the probe pulse is indeed very sharply peaked in time, but the location of the delta-peak is slightly different for spatially different regions on the graphene monolayer, and that those regions interact only very weakly with each other.

Amended theory of transportless relaxation
As already mentioned in section 2, generic many-body systems exhibit an extremely dense energy spectrum: for a macroscopic system with f ≫ 1 degrees of freedom, the distance between neighboring energy levels is exponentially small in f . Hence, even for an initial state ρ(0) with a macroscopically well defined energy, there is still an exponentially large number of energy levels E n which a priori may possibly be populated with a non-negligible probability p n in (6). Moreover, it seems reasonable to assume that it is impossible to experimentally realize initial states ρ(0) with appreciable populations p n of only a few energy levels. (The opposite case essentially amounts to a Schrödinger cat and usually rules out equilibration in the sense of section 2 right from the beginning). In view of n p n = 1 it follows that every single p n must be extremely small (usually exponentially small in f ), see also (8). All these assumptions are tacitly taken for granted in textbook statistical physics and also in all what follows.
Even when every single level population p n is very small, some of them may still be even much smaller than others (for instance those with energies E n far outside the microcanonical energy window [E − ∆E, E] mentioned below (11)). An important implicit assumption of the approach from section 3 is that some of them are actually negligible (can be approximated as being strictly zero), while all the others can be treated on an equal footing. But in practice, the quantitative choice of the threshold between negligible and non-negligible p n 's is often somewhat ambiguous. Moreover, all the remaining non-negligible p n 's are usually still far from being approximately equally large, hence it is not obvious why the larger ones should not play in some sense a more important role than the smaller ones. The main objective of this section is to amend the approach from section 3 along these lines. Accordingly, we no longer work with (9)-(11) but rather return to the original equations (1)-(7).

Setting the stage
Our starting point is the following property of the dynamics (5), which is intuitively quite plausible and rigorously derived in Appendix A: Consider an arbitrary but fixed ρ(0) with level populations p n as defined in (6). Next we choose a set of "auxiliary populations"p n , which satisfyp n ≥ 0 and np n = 1, but otherwise may still be arbitrary. Then there exists a corresponding "auxiliary density operator"ρ(0) with level populations ρ nn (0) =p n (25) and with the property that is satisfied in very good approximation for arbitrary t and A on condition that n |p n −p n | ≪ 1 .
Taking for granted (27), we thus can and will work withρ(t) instead of ρ(t) in the following. In particular, sufficiently small p n 's can now be safely replaced by strictly vanishingp n 's. Moreover, also the remaining non-negligible p n 's may be "redistributed" among thep n 's within the limits imposed by (27). Since every single p n is usually still extremely small (see above), quite significant changes of many level populations are still admissible along these lines. (However, choosing all the non-vanishingp n 's equally large is usually still impossible without violating (27).) The explicit form ofρ(t) is provided in Appendix A, showing thatρ(t) still closely resembles ρ(t) if (27) is fulfilled. Moreover, whenever ρ(t) is a pure state, alsoρ(t) will be pure. Incidentally, the above approximation (or the more precise version in (A.1)) seems to be a quite interesting new result on its own, that may also be of use for instance in the context of quantum information.
In a second step we assume that the Hamiltonian which governs the time evolution ofρ(t) is not any more given by (1) but rather bỹ H := nẼ n |n n| .
As a result, one again finds that (26) remains a very good approximation on condition that where I denotes the set of indices n with non-vanishing level populationsp n , Intuitively, this finding appears quite plausible upon a closer look at the time evolution of ρ(t) in (5) and the analogous formula forρ(t). A more detailed derivation is provided in Appendix B.

Main idea and assumptions
Very roughly speaking, the key idea is to tailor suitable degeneracies of the modified energiesẼ n 's in (28) so that the probabilitiesp n are equally distributed among the different eigenspaces. More precisely, the set I in (30) must be partitioned into M disjoint subsets I 1 , ..., I M with the property that all energiesẼ n with n ∈ I µ are equal, sayẼ and the concomitant "eigenspace populations" are equal for all µ = 1, ..., M. Since n∈Ip n = 1 we can conclude that M µ=1 p ′ µ = 1 (33) and thus for all µ = 1, ..., M.
In the above described construction, two further constraints have to be taken into account for reasons that will become clear shortly: (a) The number of subsets M must be large, (b) The energy shiftsẼ n − E n must remain so small that t max in (29) is still much larger than the actual relaxation time scale of the system under consideration. Since generic level populations p n and energy level distances are extremely small (see beginning of this section) and in view of the possibility to "redistribute" the p n 's among thep n 's (see below (27)) and to "rearrange" the energy levels (see (29)), it seems reasonable to expect that the above described construction can be successfully implemented in many cases of interest. One particularly simple possibility is as follows: Assuming that the system exhibits a macroscopically well-defined energy (see above equation (12) and beginning of this section), there exists a microcanonical energy window W := [E − ∆E, E], whose width ∆E is small on the macroscopic scale, but still so large that we can setp n = 0 for all n with E n ∈ W (see below equation (27)). In other words, the set I in (30) only contains n's with E n ∈ W . Similarly as above (9), we can and will temporally redefine the corresponding indices so that n ∈ {1, ..., D} for all those E n 's contained in W , and thus I = {1, ..., D}. Moreover, we can assume without loss of generality that those E n 's are ordered by magnitude (i.e. E n+1 ≥ E n for all n ∈ {1, ..., D − 1}). In a second step, we defineM as the smallest integer with the property thatM ≥ 1/ √p max , wherep max := max npn . According to the discussion at the beginning of this section,p max will usually be exponentially small in f for a system with f degrees of freedom, henceM will be exponentially large in f . Next, we choose I 1 := {1, ..., D 1 }, where D 1 is the smallest integer with the property that D 1 n=1p n ≥ 1/M. Finally, the latter inequality can be turned into an equality, i.e., D 1 n=1p n = 1/M, by slightly reducing some of thep n 's with n ≤ D 1 (and at the same time slightly increasing some with n > D 1 ). By modifying thep n 's along this line, one readily sees that the original sum on the left hand side of (27)  If we now change the labels µ so thatS = {1, ..., M} and define E ′ µ := E Dµ , then all requirements of our above described construction are fulfilled. In particular, t max in (29) will be exponentially large in f .

Derivation of the main result
In order to explain the main ideas, we temporarily focus on pure states ρ(t) (for mixed states see section 7.6). Hence, alsoρ(t) is pure (see below (27)), i.e., there exist certain (normalized) vectors |ψ(t) and |ψ(t) so that Since the dynamics of ρ(t) is governed by the Hamiltonian H from (1) and that ofρ(t) byH from (28), it follows that see also (2) and (3). Exploiting (37), the level populations in (25) can be rewritten as where c n := n|ψ(0) . Sincep n = 0 unless n ∈ I (see (30)) it follows that In passing we note that a pure state like in (36) may still exhibit a small population p n of every single energy level, as required throughout our present approach. In particular, the diagonal ensemble in (7), which governs the long-time behavior (after equilibration) will then exhibit a small purity Tr{ρ 2 dia } notwithstanding the fact that we are dealing with a pure state, i.e., Tr{[ρ(0)] 2 } = 1.
Taking for granted that the construction from the previous subsection has been successfully implemented, the approximation A ρ(t) = ψ (t)|A|ψ(t) (42) will thus be fulfilled very well for all t ≪ t max . Furthermore, it follows from (32) and (40) that the vectors (44) and that (41) can be rewritten as Moreover, we can infer from (28) and (31) that and with (39) and (45) that Exploiting (42), we finally arrive at where the last relation follows from (37) and (45). In particular, ρ ′ µν (0) is a well defined M × M density matrix (Hermitian, positive, of unit trace).
The right hand side of (48) is formally identical to that of (10). But now all level populations are equal (see (34)), i.e., we got rid of the shortcomings mentioned at the beginning of section 7.
At this point, the assumption (a) from (35) is needed. Namely, due to this assumption and the formal equivalence of (48) with (10), the heuristic considerations from section 3 or the more rigorous treatment in [14,15] can be adopted to arrive at the counterpart of (20), namely Exploiting (35) once more, one can infer from (52), similarly as in (21), the very accurate approximation Upon comparison of χ(t) in (53) with φ(t) in (17), the main properties of G(t) in (56) readily follow from those of F (t) in (21), see above (22) 1 for all t. (iii) G(t) remains negligibly small for the vast majority of all sufficiently large times t. In the latter statement we took (35) for granted and we assumed without loss of generality that the E ′ µ in (31) were chosen so that E ′ µ = E ′ ν for all µ = ν.
Setting t = 0 in (51), the above property (i) implies that More precisely, (57) is an approximation of the same quality as (51) itself. Next we make use of the assumption (b) below (35) that A ρ(t) approaches its approximately constant long-time limit already for times t much smaller than t max in (29). On the one hand, for (most of) those times t the result (51) is still valid and the function G(t) therein must assume values close to zero. On the other hand, we know from section 2 that A ρ(t) stays very close to A ρ dia for most t's beyond the initial relaxation time span. We thus can conclude that in very good approximation By introducing (57) and (58) into (51) we arrive at the main new result of our paper, namely (59)

Discussion of G(t)
A first set of basic qualitative features of G(t) are the properties (i)-(iii) mentioned below (56). The remainder of this subsection is devoted to recasting G(t) from (56) and (53) into physically more illuminating and practically more convenient forms. By utilizing the approximation (34) and the definition (32) we can conclude with (53) that Observing (31) and that the set I is the disjoint union of the subsets I 1 , .., I M (see above (31)) implies Sincep n = 0 for n ∈ I (see (30)) we arrive at By similar (but simpler) calculations as in Appendix B (especially around (A.57)) in combination with our assumption (29) one finds that theẼ n 's in (62) can be very well approximated by the E n 's. Furthermore, δ from (63) can be safely neglected in (62) due to our assumption (27). Exploiting (25), we thus obtain as a first main result of this subsection This is the announced amendment of (17), quantitatively accounting for our previous expectation that larger level populations ρ nn (0) should somehow play a more important role than smaller ones.
Next we rewrite (64) in the equivalent form The function ρ(E) thus quantifies the detailed population of all the energy levels, and χ(t) is its Fourier transform + . Usually, the energies E n are extremely dense and the sum of delta functions in (66) can be replaced by a reasonably smoothened approximation without any notable change of χ(t) in (65) during the entire initial relaxation time period, see also Appendices A and B. In other words, ρ(E) may be viewed as the smoothened (coarse grained) energy distribution of the system. While this distribution is hardly ever available in experiments, it often is in numerical simulations, as exemplified in section 8. The same approximation as for F (t) in (22) is readily recovered for G(t) via (56) and (65) if the ρ nn (0) in (66) are (approximately) equally large for all E n below some threshold energy E and (practically) negligible for all E n > E, and provided that the Hamiltonian H exhibits reasonable thermodynamic properties (well defined entropy S(E) and (positive, intensive) temperature T := 1/S ′ (E)). The same result still applies if only energies E n within a microcanonical energy window [E − ∆E, E] contribute, as long as its width ∆E is much larger than the thermal energy k B T , as it is usually the case. More precisely, it is only the coarse grained ρ(E) (see below (66)) that must closely resemble the one which would be obtained for strictly equally large ρ nn (0)'s for all E n ∈ [E − ∆E, E]. The actual ρ nn (0)'s (before coarse graining) may thus still exhibit quite considerable "fine grained" variations. In other words, the approximation (67) is found to remain valid under substantially weaker premises than its predecessor in (22). Instead of such a microcanonical distribution, one might also consider a canonical distribution, i.e., the ρ nn (0)'s are (approximately) proportional to exp{−E n /k B T }. Similarly as in (22), a straightforward calculation then yields Note that dE(T )/dT is the system's specific heat and dE(T )/d(k B T ) is a dimensionless number which is typically comparable in order of magnitude to the number f of the system's degrees of freedom. However, it must be emphasized that there is no reasonable argument of why the far from equilibrium initial state ρ(0) at time t = 0 should exhibit a canonical energy distribution in the basis of the Hamiltonian H which governs the relaxation dynamics of the isolated system for t > 0. For systems at thermal equilibrium, the so-called equivalence of ensembles is often taken for granted under quite general conditions. However, no such equivalence is to be expected for the temporal relaxation of far from equilibrium initial states, as exemplified by the very different findings (67) and (68).
More generally speaking, the above examples illustrate the fact that the function G(t) depends on the details of the initial energy distribution, but does not depends on any further properties of the initial condition.
Taking into account (1), (36), and (38), one can rewrite (64) as i.e., χ(t) represents the overlap between the time evolved state and the initial state. Similarly, (56) takes the form i.e., G(t) may be viewed as a survival probability (of the initial state) or return probability (of the time evolved state), sometimes also denoted as (quantum) fidelity. Mathematically speaking, (38) and (69) immediately imply that for any, arbitrary but fixed reference time point s ∈ R. Physically speaking, this observation is quite remarkable: The crucial function G(t) in (59) can be recovered from the overlap decay in (71) with respect to any time evolved state |ψ(s) of the system, even if the reference time s is chosen very "late" and thus one might have expected that the system has already equilibrated in any meaningful sense, and, in particular, has "forgotten" the initial disequilibrium conditions.

Summary and discussion
The main result of this section consists in the approximation (59) for the temporal relaxation, where G(t) in (56) follows from either of the equivalent forms (64), (65), or (69). They encapsulate the details of how the function G(t) in (59) decays from its initial value G(0) = 1 towards G(t) ≃ 0 for (most) sufficiently large t. In particular, upon rewriting (59) as taking for granted the assumptions underlying this result (see below), and observing that G(t) in (70) is independent of A, we can conclude that, for any given ρ(0), the left hand side in (72) exhibits for all observables A the same temporal relaxation behavior. Provided that the additional information required in (64), (65), or (69) is available, this result (59) represents a significant step beyond the previously known approximation (20), wherein F (t) follows from (17) and (21).
In particular, to determine F (t) one usually needs to explicitly specify some appropriate energy window (see above equation (12)). In addition, in order to evaluate (17) and (21), one must determine the eigenvalues of the Hamiltonian. In contrast, G(t) can be determined via (70) without explicitly specifying some energy window and without diagonalizing the Hamiltonian.
The underlying key idea and main requirements essentially amount to the following three steps: To begin with, all extremely small level populations p n are neglected. The remaining, non-negligible p n 's are then distributed into subsets I µ with approximately equal net populations n∈Iµ p n . Moreover, all energies E n belonging to the same subset must be very close to each other. In the end, the initially neglected p n 's are redistributed among the subsets, and also the non-negligible p n 's may still be slightly adjusted, the main aim being to further equalize the subset populations.
Once such a rearrangement of the energy eigenvalues and redistribution of the level populations is accomplished, the same arguments as in section 3 or in [14,15] can be adopted to arrive at (59). In so far as these arguments are non-rigorous (no error bounds or systematic improvements or are available), the result (59) may be viewed as an approximative proposition of the same character.
The remaining requirements are largely the same as in sections 4 and 5. The basic reason is that the prediction (59) is essentially a modification of (20), it is not expected to cover previously excluded cases.
In passing we note that when focusing for a given pure state (36) on the particular observable A = |ψ(0) ψ(0)|, then the expectation value on the left hand side of (59) coincides exactly with the survival probability in (70). On the right hand side of (59), one readily find that A ρ(0) = 1 and A ρ dia = n p 2 n ≤ max n p n . Since p n ≪ 1 for all n (see (8) and beginning of section 7), our result (59) thus reproduces the exact result very well in this special case. The latter exact result apparently goes back to Torres-Herrera, Vyas, and Santos (see [16,17] and further references therein), hence our present work may be viewed as a generalization of theirs.

Mixed states
So far, our main result (59) has only be justified for pure states (see section 7.3). Turning to mixed states, we recall that any given density operator ρ can be written in the form for some suitably chosen set of pure (normalized) states |ψ j and weights w j ≥ 0 with J j=1 w j = 1. In general, the vectors |ψ j need not be pairwise orthogonal and not even linearly independent, hence there usually exist many different "representations" (73) of the same density operator ρ. The same properties remain true when the density operator and the pure states in (73) acquire a time dependence via the pertinent Liouville-von Neumann and Schrödinger equations, respectively. Such a time dependence is henceforth tacitly assumed in (73), while arguments t are still omitted.
Taking for granted that every pure state |ψ j in (73) satisfies the requirements from section 7.5, the approximation (59) will be valid for each of them. Next we observe that all expectation values appearing in (59) are linear functionals of ρ. But in general, also G(t) on the right hand side is a non-trivial (non-linear) functional of ρ according to (56) and (64). It follows that (59) cannot be valid in full generality (the left hand side is linear and the right hand side non-linear in ρ). However, under the extra assumption that G(t) is (approximately) identical for all |ψ j with non-negligible weights w j in (73), one readily concludes that also their linear combination in (73) will satisfy (59), where the symbols ρ and ρ dia in (59) now refer to the actual density operator ρ on the left hand side of (73), and likewise for the ρ's appearing in (64)-(66). It seems reasonable to expect that such approximately identical G(t)'s may arise -at least for one of the many possible representations (73) of the same ρ -in many cases of interest.
In fact, if the initial state ρ(0) is of low purity ("strongly mixed"), i.e., Tr{[ρ(0)] 2 } ≪ 1, it is rigorously shown in Appendix C that our main result (59) still amounts to a very good approximation, where G(t) is again given by (56) and (64). In other words, (59) is known to apply both for pure and strongly mixed states. Once again, it is therefore quite plausible that the same result will remain (approximately) correct also in the intermediate case, i.e., when the purity Tr{[ρ(0)] 2 } is neither unity nor close to zero, see also end of Appendix C. However, providing a more rigorous demonstration or criterion appears to be a very daunting task.

Comparison with numerics
As already mentioned at the beginning of section 6, the spatial homogeneity requirement of our present theory considerably restricts the number of suitable experimental and numerical examples in the literature, with which it might be compared. Moreover, our amended theoretical prediction (59) requires information about the function G(t) in (56) and thus either about the level populations in (64)-(66) or about the overlaps in (69), which is not available in most experiments up to now. However, it is noteworthy that the overlap of two quantum many-body states has recently been successfully measured for ultra-cold bosonic atoms in optical lattices [42], hence a direct comparison of our theory with experiments may become feasible in the future. With respect to numerical results, the latter information should in principle be accessible quite often, but in practice it is provided as published data in a relatively small number of cases. In the following, we compare our theory with two such examples, for which all the necessary data are available.
Our first example is the extended Hubbard model for 8 strongly correlated fermions on a one dimensional lattice with 24 sites, whose thermalization after a quantum quench has been numerically explored by Rigol in [43]. figure 2 exemplifies a representative nonintegrable case with nearest-neighbor hopping and interaction parameters τ = V = 1 and next-nearest-neighbor hopping and interaction parameters τ ′ = V ′ = 0.32, corresponding to the data from figures 2(g) and 7(a) in [43]. The numerical findings are compared in figure 2 with the amended theory from (56), (59), and (64), as well as with its predecessor from (20) and (22) [43] for the density-density structure factor δN k (t) of a one-dimensional fermionic model system (for more details see main text and [43]). Solid: Theoretical prediction from (59), where G(t) was evaluated according to (56) and (64) by employing the numerically determined values of E n and ρ nn (0) from [43], see figure 7(a) therein (the original data were kindly provided by Marcos Rigol). Dashed: Theoretical prediction from (20) and (22) (or from (59) and (67)), adopting the estimate T = 3 provided by [43]. Both in (20) and (56), the quantitative values of A ρ(0) and A ρ dia have been fitted to the numerical data. Following [43], the units have been chosen so that k B = = 1.
Bosons on 24 sites), whose magnitude can be estimated from the non-stationarity of the numerical data beyond the actual relaxation time span in figure 2 (see also figure 2(g) in [43]), it is impossible to decide which of the two theoretical curves exhibit a better agreement. Within these numerical finite size effects (which are beyond the theory) both curves agree reasonably well with the data. We also may recall that the only fit parameters of the theory are the initial value A ρ(0) and the long-time average A ρ dia . As already mentioned in section 6, the quantitative determination of those two values for the quite elaborate observable at hand (a dimensionless descendant of the densitydensity structure factor [43]) is not a main objective of our present work.
Our second example is the spin-chain model, numerically explored by Torres-Herrera, Vyas, and Santos in [16], see figure 3. Specifically, the relaxation of an initial state, consisting of 8 alternating pairs of parallel spins is observed via the correlation C z (t) of two neighboring spins in the middle of the chain [16], for which the initial expectation value is known to be C z (0) = 0.25. The two examples in figure 3 with λ = 0 correspond to integrable systems, which are in general not expected to thermalize in the long-time limit, while the three examples with λ = 0 are non-integrable, hence C z (t) should approach the thermal long-time limit zero. This expected long-time behavior is reasonably but not extremely well fulfilled by the numerical results for the two integrable and the three non-integrable cases in figure 3(a). In fact, temporal "oscillations" comparable to those of the cross-and star-symbols in figure 3(a) for t ∈ [1,2] are found to persist in all five cases up to (practically) arbitrarily large times t (not shown). Similarly as in the previous example in figure 2, these persistent oscillations are probably due to the still relatively small system size (16 spins). In other words, it seems reasonable to expect that the behavior of C z (t) for much larger systems may still deviate by 0.05 (or even more) from the corresponding results in figure 3(a). Analogously, the numerically obtained results from [16] for the survival probability G(t) in (70) are reproduced in figure 3(b). Apparently, the numerical finite size effects for this quantity G(t) are considerably weaker than for the quantity C z (t) depicted in figure 3(a).
To connect these numerical results with our present theory, C z (t) in figure 3(a) must identified with A ρ(t) in equation (59), while G(t) in figure 3(b) coincides with G(t) in (59). Still, the theory does not imply any prediction regarding either of these two quantities themselves. Rather, it predicts that the two quantities should be related to each other according to (59). In doing so, the initial value A ρ(0) appearing in (59) is known to be C z (0) = 0.25 (see above). Moreover, the long-time limit A ρ dia appearing in (59) must be estimated from the long-time behavior of C z (t) in figure 3(a). In view of the above mentioned finite-size effects of the numerical data for C z (t) in figure 3(a), the agreement between this theoretical prediction of equation (59) and the numerical findings in figure 3 is quite satisfying.

Conclusions
The main result of this paper is the following approximation for the temporal relaxation of a (pure or mixed) state ρ(t), whose dynamics is governed by a Hamiltonian with energy eigenvalues E n and eigenstates |n : where the observable A has been tacitly "rescaled" so that the long-time average of the left hand side is zero. The first main prerequisite for (74) is that the system must equilibrate at all, i.e., the left hand side of (74) must remain very close to a constant value (here assumed to be zero) for the vast majority of all sufficiently large times t, where "very close" is meant in comparison with the full range of possible measurement outcomes of A. To guarantee the latter equilibration property, we have taken for granted a set of sufficient conditions, which are already rather weak, and which could still be considerably weakened in principle. Most importantly, it is required that there are no degenerate energies and energy gaps (i.e. the energy differences E m − E n are non-zero and mutually different for all pairs m = n), and that all level populations n|ρ(0)|n are small (cf. (6) and (8)). On the other hand, it is not required that the system exhibits thermalization, i.e., the long-time average in (74) may still be different from the pertinent thermal equilibrium value.
The second main prerequisite for (74) is the absence of any notable macroscopic transport currents, caused, e.g., by some initially unbalanced local densities. Such a transportless relaxation can usually be taken for granted if both the system Hamiltonian and the initial state are spatially homogeneous on macroscopic scales. A more detailed discussion of further possible prerequisites for (74) is provided by sections 4 and 5 (see also sections 7.2 and 7.6). In fact, formulating conditions, which are strictly sufficient for (74) but not too restrictive for practical purposes, remains an open problem. In this respect, the situation is somewhat similar as in density functional theory, random matrix theory, and other "non-systematic", but practically very successful approximations.
The most striking property of (74) is that the considered observable A does not matter in the last factor, which encapsulates the entire time dependence of the relaxation. Generically, this factor is unity for t = 0 and very close to zero for practically all sufficiently late times. Specifically for a pure initial state |ψ(0) , the last factor in (74) can be identified with | ψ(t)|ψ(0) | 2 (survival probability). On the one hand, (74) may thus be viewed as a (very substantial) generalization of previous results by Torres-Herrera, Vyas, and Santos [16,17]. On the other hand, also the earlier results from [14,15] are recovered as a special case, namely when all level populations n|ρ(0)|n can be approximated as being either strictly zero, or equal to some (small but finite) constant value.
In many cases of practical interest, the last factor in (74) can be further approximated as 1/[1 + (t k B T / ) 2 ], where T is the temperature after thermalization, or, if the system does not thermalize, the temperature of a thermalized auxiliary system with the same (macroscopic) energy as the true system. In general, transportless relaxation is thus predicted to be non-exponential in time, and the relevant time scale /k B T to be very small.
In principle, all these predictions may be viewed as approximative propositions due to the non-rigorous line of reasoning adopted in section 3 or in [14,15]. On the other hand, they have been validated by showing that they compare very favorably with various previously published experimental and numerical results for systems, which satisfy the above mentioned main prerequisites of the theory reasonably well.
for arbitrary t and A, where the time evolution of both ρ(0) andρ(0) is governed by the Hamiltonian (1), and where ∆ A is the range of the observable A, i.e., the difference between its largest and smallest eigenvalues. Since any real measurement device corresponding to the observable A has a finite range ∆ A as well as a finite resolution δA (see also section 2), it follows that the two expectation values on the left hand side of (A.1) are experimentally indistinguishable if the sum on the right hand side is smaller than (δA/∆ A ) 2 . Altogether, this amounts to the precise quantitative justification of the two above mentioned approximations.
A secondary goal of this appendix is to show that whenever ρ(t) is a pure state thenρ(t) will be pure as well.
To begin with, we recall from the beginning of section 2 the relations The left hand side of (A.4) is understood as usual: In other words,ρ(0) indeed exhibits the given level populationsp n . Moreover, one readily verifies thatρ(0) is a non-negative, Hermitian operator of unit trace, i.e., a well-defined density operator.
If ρ(0) is a pure state, it can be written in the form |ϕ ϕ| for some |ϕ of the form c n |n . By means of (A.9) and (A.11) it follows thatρ(0) can be rewritten as |φ φ| with |φ := g n c n |n , i.e. alsoρ(0) is a pure state. Since ρ(0) is a non-negative Hermitian operator, there exists a Hermitian operator σ with the property that σ 2 = ρ(0). Considering Tr{C † 1 C 2 } as a scalar product between two arbitrary linear (but not necessarily Hermitian) operators C 1,2 , the Cauchy-Schwarz inequality takes the form |Tr{C † 1 C 2 }| 2 ≤ Tr{C † 1 C 1 }Tr{C † 2 C 2 }. Choosing C 1 = (Qσ) † and C 2 = σB we can infer from (A.17) that Observing that all operators on the right hand side of (A. 19) are Hermitian and exploiting the cyclic invariance of the trace yields we can conclude that Since we assumed that |ψ is normalized, also |ψ ′ in (A.45) will be normalized and the last factor in (A.52) can be upper bounded by Rewriting |ψ as n c n |n with c n := n|ψ , the normalization takes the form n |c n | 2 = 1. Furthermore, we can infer from (1) Since this bound is independent of |ψ , one finds by means of (A.56), (A.42), and (A.40) that Exploiting the definition ρ mn (0) := m|ρ(0)|n (see below (5)) and the Cauchy-Schwarz inequality one can readily show that |ρ mn (0)| 2 ≤ ρ mm (0)ρ nn (0) = p m p n (see also (A.34)). It follows that only those summands in (5) are non-zero for which both m and n are contained in the set I from (A.35). Without loss of generality we thus can focus on the case thatẼ n = E n for all n ∈ I. As a consequence, it is sufficient to maximize in (A.60) over all n ∈ I, i.e., we recover the announced final result (A.33).

Appendix C
The purpose of this appendix is to show that (59) with G(t) from (56) and (64) is fulfilled in very good approximation if ρ(0) is a mixed state of low purity, that is, if P := Tr{[ρ(0)] 2 } ≪ 1 . (A.61) Conceptually, the subsequent considerations are somewhat similar to the explorations of dynamical typicality in [44,45,46,47]. Technically, the calculations are particularly close to those in [47].
To begin with, we denote the eigenvalues and eigenvectors of ρ(0) by r n and |ϕ n , respectively, implying where r n ≥ 0 and n r n = 1. Next, we consider an ensemble of (not necessarily normalized) random vectors |ϕ , defined via Similarly as below (A.28), the operator norm A on the right hand side of (A.75) can furthermore be replaced by ∆ A /2, where ∆ A is the measurement range of the observable A (largest minus smallest eigenvalue). Invoking Chebyshev's inequality once more, one thus arrives at In view of (A.61), the vast majority of all vectors |ϕ in (A.64) thus exhibit expectation values A ϕ(t) , whose deviations from the ensemble average A ρ(t) are very small compared to full range ∆ A over which those expectation values in principle could vary.
Recalling the definition of ρ dia in (7) and defining in the same vein the auxiliary observable The same result is readily recovered also for W from (A.93). With (A.63) we thus can conclude that σ 2 χ ≤ P . Due to (A.89) and Chebyshev's inequality it follows that Prob |χ ϕ (t) − χ(t)| ≤ P will be satisfied in very good approximation for most |ψ 's. A more detailed quantitative demonstration that all four approximations (A.102)-(A.105) will be simultaneously fulfilled very well by most |ψ 's can be worked out analogously as in [47]. At this point, a subtle notational difference between the main text and this appendix comes into play: In the main text, the result (59) with G(t) from (56) was derived under the condition that ρ(0) is a pure state, see (37), and hence G(t) can be written in the form (70). In the present appendix, ρ(0) represents a mixed state of low purity according to (A.61). In turn, the above mentioned result for pure states in the main text should now be rewritten for the pure states |ψ(t) considered in this appendix as with χ(t) from (A.85). Since the latter equation is equivalent to (64), we thus have proven that (59) in the main text in fact also holds true for mixed states ρ(0) of low purity, as announced at the beginning of this appendix.
In the above conclusion, we have tacitly taken for granted one more assumption, namely that there exists at least one |ψ which satisfies (A.102)-(A.105) very well, and which at the same time satisfies the preconditions for (A.106), as discussed in sections 4 and 5. While a rigorous justification of this extra assumption seems to be a quite daunting task, it also seems quite reasonable to expect that the assumption will be fulfilled if (and only if) the mixed state ρ(0) itself satisfies those preconditions from sections 4 and 5.
Finally, we turn to the case that the mixed state ρ(0) is not of low purity (but still not a pure state). In such a case, there is no reason to expect that (A.102)-(A.106) will be simultaneously fulfilled for most |ψ 's. However, one may still expect that (A.102)-(A.106) will be simultaneously fulfilled for at least one |ψ , at least for some such ρ(0)'s. If so, (A.107) and thus (59) in the main text still remain true even when the purity of ρ(0) is not small.