Integrals of motion in the Many-Body localized phase

We construct a complete set of quasi-local integrals of motion for the many-body localized phase of interacting fermions in a disordered potential. The integrals of motion can be chosen to have binary spectrum $\{0,1\}$, thus constituting exact quasiparticle occupation number operators for the Fermi insulator. We map the problem onto a non-Hermitian hopping problem on a lattice in operator space. We show how the integrals of motion can be built, under certain approximations, as a convergent series in the interaction strength. An estimate of its radius of convergence is given, which also provides an estimate for the many-body localization-delocalization transition. Finally, we discuss how the properties of the operator expansion for the integrals of motion imply the presence or absence of a finite temperature transition.


Introduction
The thermodynamic description of macroscopic bodies, as shown by Boltzmann in his work on the foundations of statistical mechanics, is based on the assumption that the underlying by the conservation of energy and momenta only. The long-time relaxation then only leads to a restricted (generalized) Gibbs ensemble.
Given this similarity, it was conjectured that, like in the non-interacting limit, an extensive set of (quasi-)local integrals of motion should exist in the MBL phase [33,35,34]. By definition, those do not evolve with time, as they commute with the Hamiltonian. They thus constrain the dynamics to remain very close to the initial condition in which the system was prepared. The existence of such local integrals of motion was recently proven for a particular spin Hamiltonian in [35], under reasonable assumptions bounding potential level attraction. The notion of locality used above refers to the set of degrees of freedom, which the conserved operator affects. Conserved quantities in integrable systems are not local in this sense, as they are sums over all space of certain local terms. 4 The non-locality (in our sense) of those integrals still allows for finite transport in integrable systems. A further important difference between MBL systems and integrable ones is the fact that MBL is robust with respect to any sufficiently small perturbation of the Hamiltonian, while integrability in 1d systems is broken by generic perturbations.
The aim of this paper is to show that quasi-local integrals of motion exist for weakly interacting disordered electrons, under the same set of assumptions that were made in the original work by Basko et al. [4] (henceforth referred to as BAA). We find such integrals by solving equations for conserved operators within perturbation theory. Our approach reduces the problem to the solution of a single-particle-like hopping problem in operator space, for which we present a solution in the strongly localized regime, and determine the radius of convergence of the construction. This furnishes an estimate of the delocalization very similar to that obtained by Basko et al. [4]. We hope that our technique will help to obtain analytic results on many-body localization in the future.

Outline and summary of this work
Here, we present a short outline of this work, summarizing the main steps, and the problems we address.
We are seeking integrals of motion for disordered electrons with weak short range interactions, as defined in Eqs. (1), (2). In Section 2 we coarse-grain the model, reducing it to an array of coupled quantum dots of size of the order of the single-particle localization length.
The non-interacting model has trivial integrals of motion, namely the occupation numbers of the single-particle eigenstates. We then look for their generalization in the presence of interactions, "dressing" these integrals of motion (Section 3). This leads us to a set of linear equations (Section 4, Eq. (40)) in the space of number conserving operators, which we expand in the basis (28) of products of single particle creation and annihilation operators. For any strength λ of the interaction, these equations define a unique set of conserved operators. The main question to analyze is whether they act locally, or whether they significantly affect a spatially unbounded set of degrees of freedom. We address the question of locality within the so-called forward approximation, introduced in Section 5, where we only determine the leading term in perturbation theory for the expansion coefficients. Since the interaction terms act locally, for the conserved quantities to be non-local increasingly high orders of perturbation theory must contribute to the expansion, i.e., the perturbative expansion diverges (Section 5.2). We represent diagrammatically the particle-hole creation processes, which dominate the forward approximation, in Section 5. 4. In order to study the statistics of the diagrams at high orders, we need to solve three main technical problems. One is the estimate of their number, due to the freedom in choosing the interaction vertices and their order. We solve this (Section 5.4) by introducing an integral representation that sums correlated diagrams sharing the same interaction vertices. This reduces the factorially many (in the order N of the perturbation theory) terms to an only sub-exponential number of terms, which are products of N denominators. The second problem concerns their statistical distribution. In the many-body problem the denominators are correlated even within the forward approximation, at variance with one-particle problems. Therefore determining the statistics of large deviations, which dominate the probability of creating excitations at large distance, is a challenge. We solve it using a transfer-matrix technique (Section 7). Finally, we have to count the number of processes leading to a given configuration in operator space, which is a combinatorial problem in the space of diagrams (Section 8). The last two ingredients allow us to determine the decay rate of the largest of these terms, which dominates the expansion. Requiring a positive spatial decay rate determines the range of convergence of the operator expansion in the forward approximation.
After solving these technical problems, we obtain the final result in Section 9, namely the existence of quasi-local integrals of motion for disordered electrons for sufficiently small interaction λ < λ c . We find λ c in the forward approximation: in the same spirit as Anderson's "upper bound" approximation, this is expected to yield a lower bound for the actual phase boundary for manybody localized phase of the lattice system at infinite temperature. In a final section, we discuss possible scenarios for a localization transition or crossover at finite temperature (Section 10).

Model Hamiltonian and coarse-graining
We consider a Hamiltonian describing weakly interacting, spinless electrons in a disordered background. At variance with the work by BAA, we consider a model on a lattice Λ, where (Λ) is the lattice Laplacian, V dis is a random disordered potential and U is a short range interaction. We choose to work with a lattice model, because in a finite volume its Hilbert space is finite, and both spectrum and energy per particle are bounded. This will allow us to take a meaningful limit of infinite temperature, and to make statements about many-body localization in that limit.
It is convenient to write the interaction in the form where ν is the density of states, and u(i − j) is a dimensionless, normalized, short-ranged interaction kernel. The dimensionless parameter λ measures the interaction strength.
We consider a disorder potential such that the single particle part of the Hamiltonian possesses only fully localized wave-functions φ α , α = 1, ..., |Λ|, with typical localization length ξ . Moreover, we are interested in the disorder regime relatively close to single particle delocalization, where ξ is significantly bigger than the lattice spacing a. Let us denote by δ ξ = 1/νξ d the average level spacing in a localization volume, and by W the band width of the single particle problem. The condition ξ ≫ a ensures that a large number of single particle wave-functions overlap significantly in space. This will provide a large parameter for our analysis.
It is convenient to switch to the basis of single particle wave-functions φ α , in which the Hamiltonian assumes the form where n α = c † α c α , and the Greek indices label single particle eigenstates obtained in the absence of interaction. We also choose a certain ordering relation "<" among the indices β, γ .
Our choice of the basis φ α is different from that of BAA, who worked with Hartree-Fock (HF) orbitals. Our choice allows us to work in full generality in the operator space, while HF orbitals depend on the non-interacting occupation numbers, i.e., the many body state around which one analyzes stability with respect to interactions. In Section 8.1 we will argue, however, that in the approximation in which we are working, we can neglect the interaction vertices U αβ,γ δ with two or more coinciding indices, even without resorting to HF, which resums most of those terms. Thus, the two different choices of basis sets lead essentially to the same combinatoric analysis of diagrams.
To simplify the above model further, we assume the single particle energies ϵ α to be random and uncorrelated. The interaction term U is antisymmetrized: U αβ,γ δ = U βα,δγ = −U βα,γ δ . We further simplify it by taking its matrix elements U αβ,γ δ to be local in space, i.e., they are assumed to be non-zero only if the corresponding single particle states have localization center within one localization volume. Hereby we define the localization center of a single particle state as Moreover, it is known that the matrix elements decrease rather rapidly (as a power law) when the energy difference between involved levels exceeds the level spacing in the localization volume δ ξ . This motivates the use of a simplified interaction in which we take u αβ,γ δ to be non-zero only if In these cases we assume where η αβ,γ δ is a random variable, box-distributed in [−1, 1].

Coarse-graining
Let us now coarse-grain the model: we assume that the interaction U αβ,γ δ connects wavefunctions either on the same localization volume or on neighboring localization volumes. For either vertices we assume the same amplitude λ, as long as the restrictions (7) on the energy levels are respected.
This differs from the coarse-graining by BAA, who divided the sample into d-dimensional regions of linear size ξ , restricted the single particle levels α to those regions, but included a (small) hopping term between localization volumes (elastic processes). In contrast the interaction term, responsible for inelastic processes, was restricted to scattering within a given localization cell.

Integrals of motion and absence of transport
In the absence of interactions (λ = 0), the occupation numbers n α of single particle levels are mutually commuting, conserved quantities. These operators are quasi-local in real space, as follows immediately from their expansion in the basis of lattice operators: where φ α is the corresponding localized single particle eigenfunction. By quasi-locality of the n α we mean that an operator c † i c j contributes in the expansion with a weight which decays exponentially in the distance between the localization center ⃗ r α of φ α and the sites it acts on (its support -here the sites i, j ).
By truncating the sum (9) to terms with support only within a neighborhood of mξ of ⃗ r α one obtains an operator, whose commutator with the Hamiltonian vanishes up to exponentially small terms. As m → ∞ the operator rapidly converges (in the operator norm) to the conserved n α . In the non-interacting case this follows directly from the spatial localization of the single particle wave-functions. Our goal is to find an analogue of these operators in the interacting case.
That such a generalization should exist was proven by Imbrie [35] under certain hypotheses on the spectrum in a 1d spin chain, for which he constructed a quasi-local unitary rotation U which essentially diagonalizes the Hamiltonian H . More precisely, it brings it to the canonical form where the k-spin interactions J i 1 ,...,i k decay exponentially with the diameter of their index set. Applying the inverse unitary on the conserved spins σ z i provides one with integrals of motion of the original Hamiltonian H , I i = Uσ z i U † . At the same time, Huse and Oganesyan [33], and independently, Serbyn et al. [34] argued for the existence of such local integrals of motion in general MBL systems.
Note that the set of conserved and mutually commuting quantities is by no means unique. For example, any set of independent polynomials of σ z i 's is conserved as well. A nice property of the set of σ z i , however, is the binarity of their spectrum, {−1, 1}, or the property that (σ z i ) 2 = 1. Knowing the eigenvalues of N independent integrals of motion like this allows one to unambiguously label the 2 N eigenstates of the Hilbert space of an N -spin system [33].
An alternative construction of conserved (but non-binary) quantities is discussed in [36] for a random spin chain, where infinite time averages of local operators are considered (such as n i (t) = e iH t n i e −iH t in our case). By definition of the time average, it commutes with the Hamiltonian. In an MBL phase one expects the average to remain non-zero, whereas it vanishes due to diffusion in an ergodic delocalized phase.
In this paper, we make a different choice, which nevertheless defines a unique set of binary integrals of motion. Our construction consists in two steps. We will first prove the existence of local integrals of motion in perturbation theory in λ, not requiring the binarity of their spectrum. This is the most difficult task and will take the largest part of the paper.

Construction of exact quasiparticles of the Fermi insulator
Since our procedure will leave us some freedom in the choice of integrals, in Appendix A we will show how this freedom can be used to fix the spectrum to be binary, order by order in the interaction λ. Notice that the latter amounts to the construction of exact quasiparticle occupation numbers of the interacting Fermi insulator. In contrast to Fermi liquids where such exact quasiparticle operators cannot be constructed, neither in real nor in momentum space, it becomes possible in the MBL phase. Rewritten in terms of these occupation numbers ñ α , the Hamiltonian can then be seen as an exact quasiparticle energy functional, which determines the energy E (qp) α of any quasiparticle as a function of the occupations of all others, as:

Complete set of local integrals implies absence of transport
Before outlining the construction of the integrals of motion, let us first show how their existence implies the absence of any d.c. transport, and hence many-body localization.
In order to show the absence of d.c. transport, consider the Kubo formula for the conductivity σ associated with the local current density J r , associated with a conserved quantity, such as charge or energy. Let be the current at frequency ω and position r arising in linear response to a spatially homogeneous field E, and denote by the spatially averaged current density, V being the volume of the system. At finite inverse temperature β, the dissipative part of the conductivity is given by: where Π(ω, r) is the Fourier transform of the retarded correlation function of the current operator, with Lehmann representation: Here Z is the partition function, and the limit η → 0 is to be taken after the thermodynamic limit.
Let us now show first that for a complete set of strictly local conserved quantities the conductivity vanishes with probability one in the thermodynamic limit. By strict locality operators we mean that they only act on degrees of freedom that belong to a compact spatial region with finite diameter ζ . We call a set of conserved quantities complete if for any two distinct eigenstates m ̸ = m ′ at least one of those integrals of motion takes a different eigenvalue.
For two eigenstates m, m ′ let Ĩ be such a distinguishing integral, with corresponding eigenvalues Ĩ |m⟩ =Ĩ m |m⟩ and Ĩ |m ′ ⟩ =Ĩ m ′ |m ′ ⟩, with Ĩ m ′ ̸ =Ĩ m . For a strictly local current operator and r sufficiently much bigger than ζ , it follows immediately that one of the two current matrix elements vanishes, since one of the two commutators vanishes. Thus, in Eq. (17) the sum over r can be restricted to r ζ . Furthermore, for any fixed eigenstate m the sum over eigenstates m ′ is restricted to a finite set, since J r ′ |m⟩ can differ only in a finite number (≤ exp(cζ d ), with c = O(1)) of integrals of motion from |m⟩. Thus, in the thermodynamic limit, where we have to send η → 0, the contribution to the δ-function vanishes with probability one, and thus Re[σ (ω = 0)] = 0. Note that the potentially singular term from m = m ′ does not contribute because ⟨m|J r |m⟩ = 0 by time reversal invariance. This discussion is of course over-simplified since the actual integrals of motion are only quasilocal, in the sense that there are corrections to a strict locality, which decay exponentially with the diameter of their support on a typical scale ζ . However, the derivation above reflects the essential mechanism by which a complete set of integrals of motion suppresses transport. Consider the matrix elements ⟨m ′ |J r ′ |m⟩ for eigenstates that differ significantly only in integrals of motion whose support is centered up to a distance xζ from r ′ . These matrix elements are then not exactly zero, but exponentially small in x. There are also exponentially many states m, m ′ which satisfy these criteria, and thus some energy differences E m − E ′ m in (17) become exponentially small. One might worry that these exponentially small denominators can contribute to the δ-function in the thermodynamic limit, leading to a non-zero conductivity. However, the very construction of the local integrals of motion outlined in the following, and the convergence of that procedure, strongly suggest that with probability tending to one as η → 0 the exponential smallness of the energy denominators is dominated by the decay of the matrix elements in (17), in the sense that at small η the contributions to the δ-function come with weights that are almost surely much smaller than η. If this were not the case, resonant energy denominators would systematically appear in the construction of the conserved quantities and prevent their locality. Therefore, the consistency and convergence of the following construction implies the suppression of d.c. transport in systems admitting a complete set of quasi-local conserved operators.

Recipe for the construction of integrals of motion
Let us now come back to the actual construction of quasi-local conserved operators. In order to find a generalization of the single particle occupation numbers to the interacting case one should construct an extensive set of |Λ| functionally independent operators 5 {I α }, which are quasi-local and satisfy Since the spectrum of the many-body system is almost surely non-degenerate, it follows that such conserved quantities also satisfy [I α , I β ] = 0. Their mutual commutativity implies that they form a commutative algebra. As we discussed above, the choice of a basis spanning this algebra is not at all unique. It is worth mentioning that if the operators I α commute with H and span the algebra of operators then we can write as we claimed above. The couplings J 's have similar exponential decay as those in (10).
Here we present a specific construction of conserved operators, which fixes the arbitrariness in their definition in a unique way. Our construction starts from the idea that at weak interactions the I α should be expected to be a perturbed version of the n α . Thus, we look for a perturbative series in λ, For the further discussion it is useful to introduce some natural operator subspaces. I α can be sought as an element of the space C of particle-conserving operators on the Hilbert space, and without loss of generality we may require it to be Hermitian. Since we will require [H, where the same ordering "<" as previously is chosen for the indices β, γ . 5 Functional independence means that no I α can be expressed as a function of all the other I β .
At the nth stage of perturbation theory one has to solve the equation In order for this equation to have a solution one has to make sure that [U, I (n−1) α ] ∈ O. 6 If this is the case, I (n) α is determined up to an element of K. In Appendix A we show how to use this freedom to impose binarity of the spectrum of I α , spec(I α ) = {0, 1}, i.e., I 2 α = I α . The latter allows these operators to be interpreted as generalized quasiparticle number operators of the interacting Fermi insulator.
Below we describe the construction of conserved I α based on a simpler choice, however. In particular, we claim that if our Hamiltonian is time-reversal invariant, and thus has real matrix elements in the basis of single particle eigenstates, there is a unique solution of (24) with I α ∈ O. This choice implies that the only term in the expansion of I α that commutes with H 0 will be the very first one, n α . To prove this at the perturbative level, we have to show that one always finds [U, I ] it follows that x(Ψ 0 ) is purely imaginary, and thus vanishes indeed.
From the above it follows that we can express the solution of Eq. (24) formally as which determines the successive terms in perturbation theory recursively. As we show in Appendix A, the recipe to construct a binary operator consists in modifying order by order the terms in the perturbative expansion by adding to each I (n) α a diagonal operator K (n) α ∈ K, which is determined by the previous orders in perturbation theory as: It is plausible that the convergence for binary operators is essentially the same as for the operators constructed below. Based on the above perturbative argument, we make the following ansatz for the conserved quantities: 6 Note that it is not obvious from the outset that this simple perturbative scheme should work and produce a local operator. Indeed we construct perturbation theory for an extensive set of operators which are all null eigenvectors of [H 0 , .]. In principle one should thus use degenerate perturbation theory for all these operators simultaneously, which could turn out to require a non-local change of basis. The further steps below show, however, that this is not the case.
where the sets I, J run over all sets of indices {β 1 < · · · < β N } of single particle states. Linear constraints on the coefficients A (α) I,J are found by imposing the conservation condition [I α , H ] = 0. The coefficients result as λ-dependent functions of the random energies ϵ α and of the random matrix elements U αβ,γ δ , which vanish in the limit λ = 0. Since the resulting operators I α are functionally independent for λ = 0, we expect the same to hold for any finite λ before the delocalization transition. Indeed it is hard to see how a polynomial of I β̸ =α 's could contain only a single diagonal term n α .
It is important to note that the expansion (28) should not be seen as an expansion in λ, but rather as an expansion in the support on which the operators O I,J act. A formal expansion in λ must always be re-summed locally when rare, but very small denominators are encountered, implying that the naive perturbative series (24), (25) has vanishing radius of convergence in λ [1]. In Appendix B we discuss a simple example where such a re-summation is necessary.
We point out that in any finite system the above ansatz, even though motivated by a perturbative consideration, uniquely determines a conserved operator even if perturbation theory does not converge, despite of re-summations. In that case I α is defined as the finite (possibly exponentially large) sum (28) whose coefficients satisfy the linear system of Eq. (40) below. In a delocalized regime that operator will have support on the whole system.

Convergence criterion
We argue that for sufficiently small λ the expansion (28) converges in the operator norm. The convergence holds in probability, that is, for any ϵ > 0: where r(I, J) = max β∈I∪J |⃗ r α − ⃗ r β | is the maximal distance between the localization center of the state α and any of the states β that are acted upon by the operator O I,J . P is the probability measure over the disorder realizations. This ensures that the series defining the operator I α converges almost surely, since ∥O I,J ∥ = 1 for all I, J.
The resulting operator I α is quasi-local in the sense defined above. As will become clear below, cf. Section 5.2, one can associate a length scale to the support of these operators like for the non-interacting case: truncating the expansion at that length scale yields operators that are conserved up to exponentially small corrections. This scale is essentially the localization length pertaining to the interacting problem.
The many-body delocalization transition is expected to happen at a sharply defined critical value λ = λ c of the interaction strength, at which thermalization and ergodicity are restored. It is natural to expect that this coincides with the delocalization of physically defined conserved quantities, such as the time average of local operators. There is also a sharply defined interaction strength λ = λ ′ c at which our integrals I α become non-local with probability one. Logically we cannot exclude that λ ′ c is slightly smaller than λ c (since it might be possible to find a prescription for conserved quantities that leads to more local operators than ours); however, we believe that within the approximations we are making, see Section 5, λ c and λ ′ c cannot be distinguished. We therefore use the notation λ c indistinctly for both critical values.
To discuss the convergence (29), we map the problem of constructing conserved quantities into an equivalent problem of a particle hopping on a disordered lattice whose sites are labeled by the Fock indices (I, J). In particular, the exponential decay of the coefficients of I α corresponds to the localization of the particle on that lattice, in analogy with the non-interacting case (9). In turn, the delocalization of the particle corresponds to the divergence of the operator expansion (28).

Explicit construction of the integrals of motion
In this section we present the equations defining A I,J in (28) and discuss how to solve them. To illustrate the procedure, we first solve exactly a non-interacting case and then proceed with the interacting problem.

Non-interacting single-particle example
Consider a non-interacting one-dimensional disordered Hamiltonian: where ϵ i are random energies and the hopping t is treated perturbatively. In this case, the ansatz is consistent. Imposing [H, I k ] = 0, we obtain a set of linear equations for the coefficients A (k) ij , one equation for each index k. If for identical indices we define: then the equations for A (k) ij with i ̸ = j can be compactly written as: In view of these equations, one may re-interpret A (k) ij as the wave-function amplitudes of a particle on a square lattice with sites (i, j), and correlated on-site disorder E i,j = ϵ i − ϵ j , subject to the constraint (32). An explicit expression for them can be given in terms of the eigenfunctions φ α of the Anderson problem (30) as: where the ω k α have to be determined from the constraint The exponential decay of the amplitudes (34) in the distance between the sites i, j follows from the localization in space of the eigenstates φ α . It implies the convergence of the expansion (31). Therefore, the operators I k are quasi-local conserved operators, similarly to the particle number operators n α in (9).
However, note that these two sets of operators differ, in particular (31) does not contain any diagonal terms (i = j ̸ = k). Using (34), (35) one can also explicitly check that the operators (31) do not coincide with the time average of the operators n k (t).

Interacting case
We now return to the interacting case. Since the operators I α will contain strings of c † 's and c's of arbitrary length, we need a way to deal with large index sets. We introduce the following notation: for any index set X = (x 1 · · · x N ), we define diagonal coefficients as zero, except if X = {α}: Moreover, for any l, m (with l < m) and any single particle labels γ , δ (with γ < δ), define the index sets: In general, the set X ··· ··· is obtained from X by eliminating the indices in the subscript and appending the ones in the superscript on the left. Note that the resulting sets are thus not ordered. Let σ [·] denote the sign of the permutation which orders the set, and define: Finally, for index sets with |Y| = |Z|, define the modified amplitudes: With this notation, the condition [H, I α ] = 0 is equivalent to the following set of linear equations for A (α) where (I, J) = (α 1 · · · α N , β 1 · · · β N ) and I ̸ = J. The diagonal coefficients appearing on the righthand side are defined in (36).

Topology of the operator lattice
Similarly as in the previous single-particle example, Eq. (40) can be thought of as a hopping problem for a single particle on a lattice with sites given by the Fock indices (I, J) and local, correlated disorder E I,J = N n=1 (ϵ α n − ϵ β n ). The hopping is provided by the interaction U , see  The lattice topology, as determined by the interactions, is rather complicated. However, Eq. (40) has a clear hierarchical structure: the equation for index sets I, J of length N are coupled only to amplitudes with index sets of equal or shorter length. Therefore, the sites can be organized into generations, according to the length of their index sets. Hopping is possible only within the same generation (second term in Eq. (40)) or between consecutive ones (third term in Eq. (40)). In the latter case, the hopping is unidirectional, and thus the hopping problem is non-Hermitian.
The connectivity of the lattice is determined by the restrictions in energy, Eq. (7), and space (particles need to be in the same or in an adjacent localization volume) of the matrix elements U αβ,γ δ . Hoppings from a site (I, J) in generation N to a site (I ′ , J ′ ) in generation N + 1 requires a particle (or hole) in a state α to scatter to the closest energy level γ above or below α, while another particle-hole pair of adjacent levels (β, δ) is created. The particle β can be chosen in N loc ways with N loc given in (3), and there are two choices for γ and δ, respectively. Therefore, the number of Fock states (I ′ , J ′ ) accessible from (I, J) via the decay of a given quasiparticle α is: In contrast, hoppings from (I, J) to a site of the same generation correspond to processes where each member of a pair of particles (or holes) scatter to one of the two closest energy levels: there are 4 possible final states to which a given pair can decay. At this point we emphasize that we are not restricting ourselves to a specific many-body state or energy sector. Thus no assumption about the occupation of the levels or about the position of the Fermi level E F is made. This gives the largest possible connectivity K. It will be reduced to an effective connectivity once we consider the restriction of the integrals I a to subspaces of a definite energy by means of a projector over many-body states, Ĩ a = P I a P , where This projection will alter the connectivity K, so as to reflect the higher probability for some processes to be Fermi-blocked, since the involved levels might already be occupied. This yields an effective connectivity K eff , whose typical value depends both on the average energy density of the states E a and the average filling fraction of the band. It is not difficult to see that if we use typical values for occupation numbers as given by the Fermi distribution (without assuming the underlying states to be thermal), repeating the above considerations at finite temperature T ≪ E F we obtain K eff ∼ T /δ ξ , in analogy to the analysis in [4].

Simplifications due to large connectivity, ξ ≫ a
The requirement of convergence of the operator expansion, Eq. (29), can be interpreted as a localization condition for the hopping problem on the disordered lattice of Fock indices. In order to investigate under which conditions localization occurs, we introduce the main approximation of this work: we neglect the second term of Eq. (40), that accounts for the hopping between sites in the same generation.
This approximation is motivated by the following consideration, assuming that the number of single particle levels per localization volume, and thus K, is large: for operator sites with a density of Fock indices per localization volume much smaller than the maximally possible ∼ K/ξ d , the connectivity within the same generation is much smaller than the connectivity K among sites in different generations (41). Note, however, that transitions from a given state (I, J) due to the second term of Eq. (40) can involve any pair of particles or holes in the same localization volume. Therefore, for operators with a high density of indices per localization volume those transitions are as numerous as the third class of terms in Eq. (40). Our approximation of dropping the second term is therefore not fully controlled at sufficiently high orders in perturbation theory where operators with a high density of indices per localization volume appear. We postpone further discussions of the subtleties related to this approximation to Section 10.
Once the second term in (40) is dropped, the equations reduce to recursive equations for increasing generations, with the initial condition A (α) α 1 ,β 1 = δ α 1 ,β 1 δ α 1 ,α . However, only some of the amplitudes A (α) I,J in (28) are determined through the recursion, while we approximate all other amplitudes to be zero: in generation N , the non-zero amplitudes correspond to sites (I, J) that can be reached from (α, α) via directed paths of length N − 1. Retaining only these sites simplifies the structure of the lattice of Fock indices very substantially, see Fig. 1 The amplitudes on these sites (I, J) can be written as the sum over all directed paths that connect them to the root (α, α) in Fig. 1(b): The path weights ω path are of the form: in close analogy to forward approximations in single particle problems [1,[37][38][39][40].
The factor (−1) σ path takes into account the global fermionic sign associated with the path, arising from the sign factors in Eq. (40). However, we will see below that these signs are immaterial at the level of our approximation.
Note that the resulting expression for A (α) I,J is of order λ N−1 , that is, the lowest possible order in λ for amplitudes of operators involving 2N particle-hole indices. Indeed, at least N − 1 interactions are needed to create the corresponding excitations.

Probability of resonances on the operator lattice
Let us discuss the configuration in real space of the indices (I, J) with |I| = N , which are retained within the forward approximation, cf. Fig. 1(b). Since the amplitudes A (α) I,J are of order λ N−1 and the interaction is local, the indices satisfy r(I, J) ≤ N ξ : amplitudes involving single particle states sufficiently far away from the localization center α must belong to sufficiently high generations. Within the approximations made, the convergence criterion (29) can then be restated in terms of the generation number N as: for arbitrary ϵ > 0. A sufficient condition for Eq. (45) to hold is that for some z < 1 and for N * sufficiently big: The left hand side of Eq. (46) can be interpreted as the probability that no resonance 7 occurs at large distance from the unperturbed localization center (α, α). Whenever it holds, it implies the quasi-locality of the operators I α within the forward approximation: indeed, Eq. (46) implies that the first appearance of operators c β , c † β 's in I α , with |⃗ r β − ⃗ r α | ≈ N ξ and N ≫ 1 is with high probability exponentially small in N .
In the following we will show that Eq. (46) holds in a regime of small couplings λ; the critical value λ c at which (46) ceases to hold gives an estimate for the radius of convergence of the operator series, and thus for the boundary of the many-body localized phase.

Similarities and differences with localization problems on trees
The similarity to a one-particle problem allows us to revisit analogies and differences between many-body localization and single particle problems on lattices which have some features of a Cayley tree [41] (see also [42,43] and references therein). Indeed, in the simplified lattice Hoppings on the lattice correspond to vertices U α 1 α 2 ,β 1 β 2 in the graph. The energy E I,J of an intermediate state is the sum of the energy differences E α 1 α 2 ,β 1 β 2 = ϵ α 1 + ϵ α 2 − ϵ β 1 − ϵ β 2 associated with all preceding scatterings. The three excitations emanating from a vertex are associated to the outgoing legs as follows: the excitation with energy level adjacent to the incoming one is associated with the central leg. The upper and lower leg correspond to the particle and the hole, respectively, of the additionally created pair. The condition (7) requires them to have an energy difference of the order of δ ξ . of Fig. 1(b), the number of sites at distance N from the localization center (α, α) grows as K N with K given in (41). This exponential growth is analogous to the growth on trees and other hierarchical lattices, see e.g. [44]. However, we caution the reader that, despite superficial similarities, the calculation we will perform does not reduce to studying an equivalent single particle problem on a Cayley tree as in [45]. Indeed, in the latter problem there is a unique path leading from the root to a given site and thus there are no loops. In contrast, in the operator lattice, there are typically exponentially many diagrams (or effective paths) leading to a given site, and thus plenty of loops, similarly as in finite dimensional lattices. Nevertheless, it is usually the case that among those many paths only very few dominate the sum over all paths -an observation we will heavily rely on in the sequel.
Our present problem also differs from the study of the decay of excitations in a zerodimensional quantum dot, as considered in [41]. There, no genuine delocalization can take place due to the finite available phase space. Instead, it is essential that our operator expansion leave the localization volume of the initial state α, for delocalization to be possible beyond a critical interaction strength λ c .

Connection with many-body diagrammatic perturbation theory
Insight into the meaning of the forward approximation at the level of the many-body system is given by a diagrammatic representation of the paths, as shown in Fig. 2.  Fig. 3. Loops in the many-body lattice corresponding to different processes with the same final state, and the corresponding ordered graphs. The graphs differ only in the order in which the interactions U 1 , U 2 , U 3 act. The weights of such paths are strongly correlated: they are all proportional to the same product of matrix elements, U 1 U 2 U 3 , and have highly correlated denominators. The sum over all these ordered graphs constitutes a diagram.
To any path of length N in the operator lattice we uniquely associate an ordered graph with N vertices. These graphs have two main branches representing the decay of the operators c α and c † α of the initial operator n α . Directed paths of length N on the lattice translate into graphs having the geometry of a tree, with a root and N nodes corresponding to the creation of particle-hole pairs. The intermediate states of the graph correspond to the sites (I, J) along the path in the operator lattice, their energy being E I,J . Note that the order of the sites along the path fixes the order of the interaction vertices in the graph.
Such graphs can be grouped into diagrams: members of the same diagram only differ in the ordering of vertices, while sharing the same geometry and labeling of the legs; they are obviously highly correlated among each other. An example is shown in Fig. 3, where all three paths connect the state (I, J) = (α 2 β 2 β 1 α 3 , γ 2 γ 1 δ 3 γ 3 ) to the root (α, α), and involve the same interaction matrix elements.
Such correlated paths exist for all diagrams with branchings (i.e., vertices where more than one of the outgoing excitations undergo further scattering). The order of the subsequent interactions on different branches can be permuted. This corresponds to different paths on the lattice and different ordered graphs, respectively.
Obviously we should sum over all possible vertex order permutations of branched diagrams with fixed geometry and labeling of legs.

Singly branched diagrams
Consider the sum of the energy denominators 8 of the three path weights in the example of Fig. 3. It is immediate to check that the following holds: where E i is the energy difference between out-and in-going states at the vertex i. Thus, the sum over the three paths weights in Fig. 3 can be written as a single term ω Γ : where η i is the random variable associated the vertex i. More precisely, ω Γ is the product of two weights of the form (44), describing the independent decay of the particle c † α and the hole c α , respectively. It can easily be checked by induction that this factorization generalizes to an arbitrary number of interactions in such singly branched diagrams: for any of them, a weight of the form (49) is obtained by summing over all the path weights. We refer to ω Γ as the weight of the effective path associated to the diagram, and denote the latter by Γ .

Multiply branched diagrams
Let us now discuss further branchings in the sub-diagrams describing the independent decays of the particle c † α and the hole c α . Consider a multi-branched decay of the single particle c † α , as shown in Fig. 4(a). There the particles γ and δ, which are produced in the first scattering, decay further through n vertices U i=1,...,n , and the vertex Ũ , respectively. The possible orderings of this diagram correspond to n + 1 correlated paths, which differ by the relative position of the vertex Ũ with respect to the U i . Their sum, does not simply factorize, but it can nevertheless be written in compact form through an integral representation, where ω − i = ω i − iϵ. Indeed, the sum Σ ′ (multiplied by the matrix elements of the correspondent vertices) must be equal to the retarded Green function associated to the independent, parallel decay of the particle γ and the hole δ, computed in the forward scattering approximation and at energy E 0 . For loop-free graphs like the one of Fig. 4(a), the decay processes of the particle γ and the hole δ are independent. In the time domain, the Green function of their joint decay is the product of the individual Green functions, which leads to the convolution (51) in frequency space.
The above formula is rather natural when relating with standard many-body perturbation theory. Indeed, after the summation over orderings of vertices, the diagrams of a fixed geometry are in direct correspondence with the diagrams obtained by BAA in the perturbative expansion of the Keldysh self energy in the imaginary self consistent Born approximation. The latter neglects the renormalization of the real part of the self energy and retains only processes where at each vertex an additional particle-hole pair is created. In our formalism, this corresponds to the directed paths jumping from generation to generation, see also the discussion in Appendix B. Not surprisingly, the statistical analysis of this class of diagrams will give an estimate of the radius of convergence for the operator expansion (28) which is similar to the criterion for the breakdown of stability of the localized phase found by BAA, or to its extension to infinite temperature [46]. Our further analysis is also very similar to the calculation in Ref. [47], but differs in some points, which will be indicated.
The expression (51) for a branched diagram is a random variable, whose probability distribution is hard to analyze. However, the analytic structure of the integrand can be exploited to rewrite Σ ′ as a sum over a much smaller number of terms than the number of orderings in Eq. (50). After performing the integral over ω 2 in Eq. (51), we find a number of poles in the complex plane of ω 1 . Using the residue theorem, we can write (51) as the sum over residues of the poles in the half plane, which contains less poles. In the particular example considered, closing the contour on the upper half plane yields the algebraic identity: The two terms in (52) have a similar structure as the denominators in the original path weight (44). For the considered sub-diagram, the sum over all the n + 1 orderings of vertices could thus be reduced to the sum of only two "effective path" weights.

General branched diagrams
A convolution formula analogous to Eq. (51) can be written for any branched diagram: to each branching one associates an integral of the form (51) with one auxiliary frequency per decaying branch, as well as an energy conserving δ-function for the vertex (see Appendix C for an example). Then one eliminates the δ-functions by integrating over the frequency variable, that occurs most often in the denominators. Using the residue theorem, the remaining integrals can be carried out, and the sum over all orderings of a diagram with fixed geometry can be expressed as a much smaller sum of weights of effective paths, as in the example above. The number of such terms is given by the product of the number of residues obtained for each auxiliary frequency.
The number of effective paths associated to a general diagram depends on its structure; to obtain an upper bound on this number, consider the diagram with the maximal number of branchings at fixed order N , see Fig. 4(b). In Appendix C, we show that in this case the number of effective paths scales as exp[log 3 (log N) 2 + O(log N log(log N))]. This upper bound implies that the number of effective paths associated to an arbitrary diagram is always sub-exponential in N .

Summing diagrams
In this section we show that in the localized region, at a any given order of the expansion, a few terms dominate the operator sum. The term with the largest coefficient in turn is dominated by the maximal diagram contributing to it.

Summing over diagrams and their effective paths
Let D I,J denote the set of all diagrams with final state I, J, each diagram being characterized by its geometry and the labeling of its segments. For any diagram d ∈ D I,J , let P(d) be the set of effective path weights ω Γ associated to it, following the procedure described in the previous section. The corresponding amplitude on the operator lattice can then be written as As we shall prove in the following section, the ω Γ are random variables with fat-tailed distributions. The effective paths associated to a diagram d ∈ D I,J all involve the same set of energies in their denominators and are thus correlated. Nevertheless, we argue that the tail of the distribution of their sum, S(d), is still very similar to the tail distribution of a single effective path, since in the case of a large deviation, S(d) is very likely to be dominated by the effective path with the biggest weight. Indeed, consider a rare set of energies E i , which produces an atypically large value of S(d). There is typically one single effective path for which all denominators become simultaneously small, while the combination of energies in the denominators of other effective paths are very likely to be suboptimal for a fraction of the denominators. Therefore, with high probability, S(d) will approximately be equal to the maximum over all effective paths weights: The set of energies E i that optimize distinct effective paths are typically different, and thus these rare events can be approximated as being independent from each other. Hence, the tail of the distribution of S(d) is enhanced with respect to the tail of a single path weight by a factor |P(d)|. We shall see, however, that due to the sub-exponential scaling of the number of effective paths, this enhancement is immaterial for the estimate of the radius of convergence of the operator series.
Inspecting the explicit examples of Eq. (52) or Eq. (C.3), one can see that there exist energy realizations for which cancellations occur between effective paths with significant weight. This happens when the single path weights are individually big, but Ẽ is much smaller than all the other energy variables E i , which leads to a cancellation between effective paths. However, such configurations require an atypically small Ẽ and do not occur with significant probability. Therefore the suppression of the tail distribution due to such effects is hardly relevant.
Correlations between effective path weights of different diagrams are even weaker than those above, since they share at most a fraction of all E i . Therefore we may approximate rare deviations of S(d) and S(d ′ ) as independent if d ̸ = d ′ . Given that the S(d) are themselves fat-tailed random variables, the sum over diagrams is dominated by the largest term. Therefore, the full operator amplitude A (α) I,J is likely to be dominated by one single effective path: where on the right hand side the maximum is taken over all effective paths from (α, α) to (I, J). As a consequence, for the tail of the probability distribution we obtain the approximation where P(d) is an average number of effective paths contributing to a diagram.

Summing over amplitudes: probability of resonances
Similarly to the effective path weights of different diagrams, also the amplitudes A (α) I,J associated to different sites I, J are weakly correlated, and we treat them as independent random variables. Let us now consider the probability in (46): Here we approximated the probability to satisfy the condition at each generation to be independent from the previous generations. As follows from (55) and from the fact that the effective paths ω Γ have fat tails, the amplitudes A (α) I,J have themselves a fat-tailed distribution. Their sum is therefore dominated by the maximal amplitude, and each factor on the right hand side (56) can be computed as: Using (55), the exponent in (57) is re-written as: The probability in (58) is a large deviation probability: indeed, the weights ω Γ of effective paths are of order O(λ N ): in order for ω Γ to be bigger than z N (with z arbitrarily close to 1), this decay factor must be compensated by an atypical smallness of the energy denominators. We devote the following section to the computation of the probability of these large deviation events. The calculation will reveal that, for λ sufficiently small, the probability decays exponentially with N . This decay competes with the exponential growth of the total number of effective paths of length N : which we estimate in Section 8 below. The competition between these two terms leads to a transition at a given critical value of λ, which we determine in Section 9.

Large deviations of paths with correlated denominators
In the previous section we argued that the large deviations of operator amplitudes are essentially determined by the large deviations of effective path weights. The weight of any effective path is the products of two terms, describing the decay of c † α and c α , respectively. In each of those terms (cf. (52) e.g.), the functional dependence on the E i is similar to that in the original path weights (44). We will first discuss the latter and then show that general effective paths behave essentially identically.
Because of the energy restrictions (7) the energy differences E αβ,γ δ /δ ξ are random variables of order O(1). For simplicity, we take them as independent Gaussian random variables with zero mean and unit variance. The denominators in (44) are partial sums of such energies, and we may write: where In path weights of the form (60) we are mostly interested in characterizing the distribution of the product of denominators. The numerator behaves as ∼ (λη typ ) N−1 , with η typ = exp[⟨log |η|⟩] = 1/e, and we neglect the Gaussian fluctuations of its logarithm.
The fact that the denominators in (60) are correlated distinguishes the many-body problem from single particle localization. These correlations are a feature that any perturbative treatment of MBL has to deal with, and it is thus important to develop a method to calculate the large deviations in this case.
The distribution function P N (y) of the logarithm of the product of denominators, can be obtained from its generating function, by inverse Laplace transform, where B is the Bromwich path in the complex k-plane.
In the present case, the relevant y scales linearly with N , and thus we define ỹ = y/N , and where the function has a well-defined limit, φ(ỹ, k), for large N . In that limit, the integral over k can be done by a saddle point approximation. The contour has to be deformed to pass parallel to the imaginary axis through k * = k * (ỹ), which satisfies: Large deviations correspond to ỹ = O(1). In the case of parametrically small interaction strength λ (which is relevant in the case of large connectivity K) we will see that we can restrict our attention to ỹ ≫ 1, see Section 9. For large values of ỹ, we will see that the saddle point tends to k * → −1.
The computation of the generating function G N is given in Appendix D. Here it suffices to say that the recursive structure of the denominators s i lends itself naturally to a transfer matrix expression for G N , which grows as the N th power of the largest eigenvalue.
The final result for the exponent at the saddle point is for ỹ ≫ 1. From this we obtain the large deviation probability: where C contains only negligible logarithmic corrections to the exponent, and

Comparison between correlated and uncorrelated denominators
It is interesting to compare the large deviation distribution (68) with the tails of the distribution of the random variable: where X i are i.i.d. Gaussian random variables with zero mean and unit variance. As derived in Appendix D, at leading order in N , up to a correction F → F − log 2/(2ỹ) + O(1/ỹ 2 ), both have the same form (68). Physically, this result can be understood as follows. By restricting to ỹ ≫ 1, we are concentrating on very rare realizations of Y N . Those are insensitive to the details in the structure of the denominators. Indeed, atypically big values of objects like ( N i=1 s i ) −1 arise from restraining the random walk (s 1 , · · · , s N ) to the vicinity of the origin. This boils down to computing the probability that s i is small conditioned on the fact that s i−1 was small. To leading order in the typical smallness of such denominators, one obtains the same result as by minimizing N denominators independently. The leading correction with respect to the case of i.i.d. denominators consists in a small suppression of the tail, since it is slightly less probable to encounter small denominators, when they are correlated.
The above reasoning can be extended to more general weights ω Γ , associated with effective paths. Indeed, the corresponding denominators are still products of single energies or partial sums (see Eq. (52) or Eq. (C.3)). In the limit of very large deviations (ỹ ≫ 1) they all share the same tail distribution (68), the only relevant parameter being the total number N of denominators. Therefore, approximating the numerator in ω Γ with its typical value (λη typ ) N and using (68), we finally obtain: with F given in (69).

Justification of neglecting interaction vertices with equal indices
We recall that we have neglected interaction terms U αβ,γ δ where two or more indices are identical. This will significantly simplify the combinatorics of counting diagrams. Let us now give a justification a posteriori for this approximation, by showing that such terms would make contributions which are down by factors of 1/K. Consider the various scattering processes with one pair of equal indices among the four legs of a vertex, whereby we restrict to one ingoing and three out-going particles. Consider first the scattering α → β with the simultaneous creation of a pair (γ , α). The constraints |ϵ α − ϵ γ | < δ ξ , |ϵ α − ϵ β | < δ ξ imply that all levels have to lie within δ ξ from each other. The phase space for such events is smaller by a factor of 1/K with respect to generic scattering processes where γ is unrestricted.
The second case is more subtle. It consists in a scattering α → β from a particle γ , which remains in place. If this is to be a resonant contribution one needs the energy increment E of the vertex to be | E| = |ϵ α − ϵ β | δ ξ /K. In a scattering where γ switches to a neighboring state δ, with |ϵ γ − ϵ δ | ∼ δ ξ , one can optimize α, β among the K different choices, such as to make E of order δ ξ /K. However, if γ remains in place, the optimum over the K choices for α, β will yield a parametrically bigger E = ϵ α − ϵ β , because of the repulsion between the neighboring levels α, β. Therefore such processes are systematically much less resonant than processes involving four distinct levels. 9

Combinatorics of diagrams
We now estimate the total number of diagrams N N at a given order N , cf. Eq. (59). For simplicity, we restrict here to the case of spatial dimension d = 1.
Consider any amplitude A (α) I,J with index set (I, J) = (α 1 · · · α N , β 1 · · · β N ). The localization centers r α i , r β i , cf. Eq. (6), of the single particle indices are distributed over a certain number of localization volumina of length ξ around r α , with a given number of single particle indices per localization length. Due to the energy restrictions imposed on the interactions, particles and holes belonging to the same localization volume are organized in pairs: members of a pair are produced in the same scattering process, and have an energy difference of order δ ξ .
Due to the fact that the interaction is local, only particle-hole pairs in nearby localization volumina can be involved in the same interaction vertex: this imposes some constraints on the geometry of the diagrams representing the scattering processes with (I, J) as final state. For example, states (I, J) having only one particle-hole pair per localization length must be associated to diagrams with no branchings in the decays of c † α and c α , since the particle-hole pairs must be created in a fixed order dictated by their spatial sequence, and thus no permutation is possible. In contrast, final states with several pairs per localization length can be reached by a variety of diagrams.
In the following, we construct the subset of diagrams corresponding to scattering processes with a "necklace structure", in which the particle-hole pairs are created in a sequence of n groups of m i=1,...,n pairs, each group belonging to a single localization volume. This furnishes a lower bound on the number of all diagrams. Note that m i is bounded by the maximal number of particle-hole pairs per localization volume (N loc = K/4), and n i=1 m i = N . Due to locality, pairs belonging to the ith and (i + 1)th group belong to neighboring localization volumina in real space; pairs belonging to different groups i, j ̸ = {i − 1, i + 1} might belong to the same localization volume.
This construction is done in two steps: first, for every group i we build all possible subdiagrams with final indices corresponding to the indices of the m i pairs, as illustrated in Fig. 5. In a second step, we connect sub-diagrams of neighboring groups by a single scattering vertex. We thus obtain a global necklace diagram, and count how many different diagrams with this structure there are. The counting is similar to Ref. [47], but here we include diagrams corresponding to final states with a non-uniform density of particle and hole indices per localization length, since these have a larger abundance.
A central ingredient for the combinatorics is the number of all possible geometries of diagrams with m interactions in a given localization volume, see Fig. 5. We denote this number by T m . It equals the number of trees with one root (of connectivity 2) and m nodes (of connectivity 4). As we derive in Appendix E: length ξ , and hence of the level spacing δ ξ , on λ. This in turn might induce a small shift of λ c . However, since their subsequent analysis boils down to dropping the same terms as we have argued above, this shift is expected to be a 1/K correction. Following the reasonings explained in Fig. 5, we find the number of necklace diagrams associated with fixed groups of m i pairs to be The origin of the various factors is explained in detail in Fig. 5: the factor m i counts the number of pairs which are created subsequently to the first pair entering the volume associated to the group i. One of those m i pairs belongs to the adjacent localization volume and creates the subsequent cascade of pair creations there. The second factor describes the choice of two levels (the level closest in energy above or below) to which an incoming quasiparticle may scatter at a vertex. The factorial term comes from the choice of assigning the m i pairs to the final legs of a given tree diagram in the localization volume of group i. Consider first the case in which only a single group i of pairs occupies a given localization volume. The number of choices of {m i } particle-hole pairs is then given by Indeed, a configuration of m i pairs of (disjoint) adjacent levels, and the remaining N loc − 2m i untouched levels in the same localization volume form a set of N loc − m i objects, out of which m i are pairs. This explains the binomial factor. For each pair, one can choose how to assign the two levels to particle and hole, respectively. This yields the factor 2 m i .
As we will see below, the relevant m i are of order O(1) ≪ K. We therefore approximate: Note that the necklace structure will in general fold back and forth in real space, such that several groups will get to lie in the same volume. Nevertheless, the above approximation remains good as long as the total number of pairs created in a given localization volume is significantly smaller than K.
Combining Eqs. (73)-(75), the total number of necklace diagrams is: where the average number of effective paths per diagram, P(d), scales sub-exponentially with N . The factors of 2 arise due to freedom of each group to scatter to the left or the right of the preceding group as long as there is still significant phase space in the corresponding localization volumina. The correction due to the finiteness of K ≫ 1 is small and was thus neglected. We now determine the distribution of group sizes {m i } which dominates the sum (76), writing and thus n m The Lagrange multiplier µ is fixed by the constraint: The saddle point solution can thus be written as  Fig. 6(a). The probability that a given pair is created in a scattering process involving a total of m pairs in the same localization volume is plotted in Fig. 6(b). We see that most pairs are created together with a few more pairs within the same localization volume. Plugging (82) into the saddle point for N N , we find the number of diagrams to grow like (dropping pre-exponential factors) This result is based on the approximation that we only allow for diagrams with a necklace structure, where groups of m i pairs are connected by a single scattering between subsequent localization volumina. Performing the calculation without this restriction is difficult since it is less easy to control the spatial constraints. However, we can easily obtain an upper bound by realizing that all possible diagrams consist in all geometrically possible labellings of trees of size N . The number of trees grows as (27/4) N . For each label one has roughly 3K possibilities, as the pair must lie in a localization volume adjacent to or identical with the one of the pair preceding it on the tree. This yields the simple upper bound which yields a growth factor which is only about a factor of 2 bigger than the much more conservative estimate (83). Let us thus write with 10.6 < C < 20.25.

Effect of Fermi blocking
The above counting is still not entirely complete. Indeed, eventually the operators we have constructed should act on some many body states, and get annihilated when attempting to create particles on occupied states or holes on already empty states. In an infinite temperature state, and at a filling fraction ν each particle-hole creation operator has a chance to annihilate the state with probability 1 − ν(1 − ν), or, in other words, only a fraction of [ν(1 − ν)] N of all operators will not annihilate a typical infinite temperature state. One should thus modify the number of relevant diagrams to In the next section we use this result to determine the radius of convergence of the operator series. Similar considerations apply to finite temperature as we will discuss below.

Structure of the dominant operator terms
Our result differs from the similar analysis in Ref. [47]. The main difference consists in our assumption that the sum of diagrams that add up to the amplitude of a given operator O I,J is dominated by the biggest term (provided the considered amplitude is among the largest ones at that order). In contrast, the authors of [47] assumed that the exponentially many diagrams have comparable amplitudes, but random signs, and applied the central limit theorem to the sum. Moreover, we allow for fluctuations of the number of pairs generated in each localization volume instead of imposing a homogeneous spatial density. We find that in the restricted set of necklace diagrams the optimal distribution of group sizes m i s is peaked at values of order O(1), but still clearly larger than one. Upon folding of the necklace, the number of pairs per localization volume will become even more significantly larger than 1. Thus we see that multiple scattering processes within a localization volume significantly enhance the delocalization tendency. This shows that the many-body problem is genuinely different from an effective one-body problem, in which a simple excitation would propagate nearly ballistically, by shedding one particle-hole excitation in every localization volume.

Estimate of the radius of convergence
We now have all the ingredients to estimate the probability of resonances at generation N , in order to prove that for λ sufficiently small there are no delocalizing resonances and (46) holds true.
Consider the probability in expression (58). Using (71), we estimate: Note that the large deviation result applies since x ≥ log( z λη typ ) ≫ 1. Approximating the integral with the value of the integrand at the extremum, setting z = 1 and neglecting sub-exponential terms in N we obtain: Substitution of (89) and (87) into (58) yields: with Taking into account (56) and (57), we finally obtain: If G(λ, K) < 1, then, for N * sufficiently big, each of the factors in (92) is arbitrarily close to 1. Therefore, their product converges to 1 in the limit N * → ∞ (see also [48] for a similar reasoning). This allows us to conclude that, for all values of λ for which G(λ, K) < 1 holds, (46) holds, too, and the series in operator space (28) converges to a quasi-local operator. In this regime, the excitation of the single particle level α, localized at ⃗ r α , is very unlikely to create a distant disturbance at ⃗ r β with large L = |⃗ r β − ⃗ r α |, its probability tending to zero exponentially as L → ∞: there is no diffusion at small λ.

Comparison with a single particle on the Bethe lattice
It is interesting to note that the delocalization threshold (93) looks identical to the critical ratio between hopping and disorder strength for a single particle problem on a Bethe lattice (see Eq. (5.8) in [45]) with effective connectivity K eff = ν(1 − ν)(C/ √ 2π)K, which is a significantly bigger than the connectivity associated with each vertex, ν(1 − ν)K. This reflects the fact that in the many-body problem the same final state can be reached with many different decay processes. The results are nevertheless similar, because both problems are dominated by very few resonant paths, whereby the large local connectivity in the many-body problem ensures that different resonant paths are likely to be uncorrelated, even if they lead to the same final state.

Possible implications for delocalization in higher dimensions
According to the above calculation, in the dominating decaying processes only groups of O(1) particle-hole pairs are created at the same time in a localization volume. This suggests that the necklace-type diagrams are diffusing back and forth a lot. This contrasts with the model of BAA, where the hopping strength between adjacent volumina was assumed to be parametrically smaller than λ, which favored the particle-hole creation cascade to fully explore a localization volume before moving on to the next volume. The latter led them to conjecture a critical exponent for the localization length in higher dimensions by relating the decay processes of single particle excitations to self-avoiding random walks. This scenario hardly holds in our model, as the optimal processes are not of this kind.

Finite temperature
So far we have been discussing the convergence of the expansion of integrals of motion in the forward approximation. If the expansion converges, we have succeeded in constructing a complete set of quasi-local conserved quantities which entail the absence of transport in whatever state the system is, in particular at any temperature, including the limit T → ∞. Note again, that the latter limit is meaningful because we work on a lattice on which the energy density is bounded.
An interesting question arises when we ask about transport at finite temperature, and the possibility of a MBL transition as a function of temperature, as predicted by BAA. How would this reflect at the level of integrals of motion? If there is a finite temperature transition, one expects that the localized low T phase is still governed by local conservation laws which inhibit transport, while local integrals of motion do not exist at higher temperature. Clearly the latter rules out the convergence of the conserved operators in the operator norm. Rather one has to invoke that the norm of operators O I,J , when restricted to typical low temperature states, becomes exponentially small in N = |I|, if the index sets I, J contain a finite fraction of hole excitations above E F + T or particle excitations below E F − T . This effect may enhance the convergence of the series expansion. This is certainly so at the level of the forward approximation where the temperature T essentially replaces the bandwidth in the analytical estimates of our expansion. This will lead to a larger domain of (weak) convergence of the operator expansion, suggesting the possibility of a delocalization transition at finite temperature.
A similar consideration shows that the transition (93) at T = ∞ takes place in a regime where the operator expansion is not convergent in the operator norm, but converges only weakly on typical high energy states. This is due to the Fermi blocking discussed in Section 8.3.  (28). In the pictures, the wave-functions are the single particle states contributing to (I, J) and (I ′ , J ′ ). Both operators involve degrees of freedom whose maximal distance to the localization center r α is the same: r(I, J) = r(I ′ , J ′ ); however, the length of the support N of the operators (shaded in the picture) increases when N grows in the first case, while it remains bounded in the second case.
However, a different scenario is possible as well. The operator series in Eq. (28), or subsequences of it, can diverge for two reasons: (i) Either the amplitude of terms with growing N do not decrease sufficiently fast, and thus the diameter of the support of these terms grows indefinitely. (ii) There can be subsequences of (28) whose terms have bounded index level N , but supports which wander off to infinity. These two possibilities are illustrated in Fig. 7.
Possibility (i) is what is obtained within the forward approximation. The fraction of terms at λ c , which survive when applied on finite T states, decreases rapidly with N . However, such a projector would not affect the convergence properties of a subsequence of type (ii). Upon restricting to finite T states the norm of the relevant operators is typically reduced by a factor, which remains bounded from below. Therefore the series will continue to diverge despite the projection.
To address the question of whether or not a finite temperature transition is possible one has to consider the interaction strength λ c at which the infinite T transition takes place, i.e., where the integrals of motion delocalize. If there is a subsequence of type (ii), which diverges at this point, the delocalization transition is a function of λ only, but independent of T . In the delocalized phase (λ > λ c ) transport would always remain finite, even though it may become very inefficient and strongly activated at low T . If instead there is no subsequence with bounded index cardinality, which diverges at λ c , a transition in temperature should be expected, as predicted by BAA. Such a transition was recently reported by a numerical study [17].
Physically the scenario (ii) corresponds to transport and delocalization driven by rare, compact, but mobile regions with a local "temperature" above the putative T c . At first sight one is tempted to rule this out because one would expect such a hot bubble to diffuse and loose its extra energy forever to the environment. However, the environment being in the supposed MBL phase cannot transport the extra energy to infinity, and thus there should be a finite recurrence time until the hot bubble forms again. Whether such a bubble would nevertheless have to remain localized, or whether its internally delocalized state would allow it to move around is a difficult open question. Recently, it was argued that big enough bubbles could undergo resonant delocalization [49]. At the level of integrals of motion these two scenarii translate into the above dichotomy about critical subsequences.
Note that a divergence of type (ii) by a set of operators with bounded support is made less likely by the large parameter K. We in fact invoked this large parameter to neglect these terms, similarly as BAA. However, it is difficult to exclude that there is no such divergent subsequence which contributes with a finite, but with a relative weight which is parametrically small in K. In that case, numerical approaches such as [17,50] would not capture this divergence.
It would be interesting to revisit the question of the finite T transition also as a function of density. In the low density limit, the effective connectivity K eff (resulting from projection onto typical states) can be reduced to K eff ≪ 1, in which propagation channels of type (ii) become parametrically favorable, and may be the ones to induce delocalization -if interactions can induce a transition at all under such circumstances.

Conclusion
In this work we have constructed explicit quasi-local integrals of motion within the weakly interacting regime, which we argued to imply the absence of any d.c. transport. We reduced the problem of constructing such operators to a non-Hermitian hopping problem in operator space, an idea that we hope to have potential for further more rigorous studies. We have also obtained an explicit recipe for constructing generalized occupation numbers of a Fermi insulator order by order in perturbation theory.
We have used the large parameter K (proportional to the number of sites in a single particle localization volume) to concentrate on processes where one more particle-hole pair is created at every order of perturbation theory. Within this forward approximation, and based on an analysis of rare resonances at large distance, we found an analytical estimate of the radius of convergence of this perturbative construction, yielding a critical value of the reduced interaction strength λ c = √ 2π /(Cν(1 − ν)2e K log K) with 10.6 < C < 20.25, at infinite T and filling fraction ν, similar to the prediction by BAA based on the analysis of the life time of a single particle injection.
We believe that the spatial structure of our integrals of motion provides a good picture for the "quantum avalanche" created by injection of an extra particle. We have found that the optimal way of its propagation is by exciting a necklace of groups of O(1) particle-hole pairs per localization volume. Due to the meandering of the necklace structure, several groups of such pairs may be created in the same localization volume, an effect which is enhanced in low dimensions.
The convergence of our construction for the local integrals of motion implies the absence of transport and equilibration at any temperature and density. Taken as such, it appears to be blind to potential phase transitions upon varying those parameters. However, projecting the operator series onto typical states with thermal single particle occupations, one may discuss the weak convergence of the operator expansion. In this vein, we have discussed the question of the existence of a genuine finite temperature transition, depending on the properties of the operator series at its critical point at T = ∞. Further investigations of this question would be interesting.
This procedure leads to a modified expansion for I α : with B (m) α given explicitly in Eqs. (26), (27). In the following, we work by induction on m. We set B (0) α = n α and we omit the index α for simplicity. We define the truncation to mth order of I : and assume that the property (A.1) holds to order O(λ m−1 ), namely: Note that I ≤0 is naturally binary, with (I ≤0 ) 2 = I ≤0 .
We denote with Î (m) the solution of the equation: in the subspace O, cf. Eq. (25), and definê The operator Î ≤m is not binary to order O(λ m ); however, we show that it is possible to add to Î (m) a suitably chosen operator K (m) in the kernel K of the linear map f (X) = [H 0 , X], so that is binary to order O(λ m ). To show this, it is sufficient to show that the difference (Î ≤m ) 2 −Î ≤m , truncated to order O(λ m ), is an element of the subspace K, i.e.: and using (A.5), we find which proves (A.8).
A simple computation shows that by choosing: the condition (A.4) is fulfilled to order O(λ m ). Eq. (27) follows from noticing that:

Appendix B. Local re-summation in the case of small denominators
In the following we present a simple example in which the perturbative expansion in λ, Eq. (22), diverges. Suppose that at order n the series expansion contains the term: where O = c † i 1 · · · c † i m c j 1 · · · c j m−1 is a string of operators with i, j ̸ = {α, β, γ , δ}, and that the amplitude J n = O(λ n ) therefore contains the energy denominator: Suppose E to be atypically small. One then easily finds a subsequence of the series (22), which contains arbitrarily high powers of the small denominator. Indeed, let us restrict the interaction to the term U αβ,γ δ (c † α c † β c γ c δ + h.c.) in the interaction U ; higher order terms in the perturbative expansion are obtained by subsequent application of (25) to J n ; this produces: with E αβ,γ δ = ϵ α + ϵ β − ϵ γ − ϵ δ . By iteration of this procedure, a sub-sequence of operators containing arbitrarily high powers of ( E) −1 is generated, preventing the convergence of the series if the term in brackets is larger than 1.
Divergences of this kind are of the same nature as local resonances encountered in single particle localization [1]. They have to be properly re-summed for the series expansion to make sense. For example, all terms multiplying Oc † β c γ c δ re-sum into a self-energy correction of the denominator in the first line of (B.3): The term in square brackets in (B.4) contains a very large self energy correction U 2 αβ,γ δ / E, which compensates the divergence in J n when E → 0.
Self-energy corrections like this are neglected in the forward approximation. Their main effect is to weaken the role of small denominators: As noticed by Anderson, small denominators essentially neutralize themselves by introducing enormous self-energies for the neighboring sites which then appear as very large denominators [1]. The resummation thus increases the convergence as compared to the naive perturbative expansion in forward approximation in [1]. In single particle localization problems with large connectivity, the critical hopping is increased by a factor e/2 [45], and a similar effect is expected here [4].

Appendix C. Evaluating diagrams as sums over effective paths: a more involved example
As an additional example for the evaluation of diagrams as sums over effective paths, we give the explicit expression for effective path weights associated to diagrams with the geometry of Fig. C.8.
For fixed indices on all segments, there are 105 different orderings of the interactions. 10 Their sum has the integral representation: where ω − i ≡ ω i − iϵ, and I 1 (ω) is the integral representation of the sum of all the weights of the subdiagram in the dashed frame, with incoming energy ω: .

(C.2)
By means of the residue theorem, I 0 can be rewritten as the sum over only 8 effective path weights: and f (X) = 1 Note that as a function of the E i and Ẽ i , I 0 has poles only due to denominators which involve the incoming energy E 0 , while I 0 remains regular as any of the Ẽ i → 0, due to cancellations among different terms.
The minimal number of effective paths associated to a diagram equals to the product of the number of residua of any of the performed integrals. This number can be determined from the structure of the diagram using the following rules: First, one eliminates the final leaves which are not associated to auxiliary frequencies, since they do not contribute with poles in the integral representation (Fig. C.9 represents the diagram of Fig. C.8, with these eliminated branches colored in gray). Then, one determines the directed path (branch) with the maximal number of interactions along it (red one in Fig. C.9). The auxiliary frequencies along this path are eliminated integrating the corresponding δ-functions. All remaining branches contribute one more residua than interactions along the branch. In the example of Fig. C.9, the three branches that remain after eliminating the red one contribute 2 residua each. The total number of effective paths is obtained by multiplying these numbers, which gives 2 3 = 8 in the present case. With the help of these rules, we count the minimal number of effective paths associated to the maximally branched diagram with N interactions, shown in Fig. C.10(a). We denote this number by |P|.
The maximally branched diagram consists of two regular rooted trees with L(N) ≡ log(N + 1)/log 3 generations. Since the weights of the two sub-diagrams factorize, we need to count only the effective paths associated to one of them, and square their number. We therefore consider one sub-diagram, and organize its branches according to the number of interactions along it (in Fig. C.10 As claimed in the main text, this number is sub-exponential in N .

Appendix D. Probability of large deviations in products of correlated denominators
Here we derive the probability of large deviations of effective path weights, i.e., the product of correlated denominators, as they occur in perturbation theory in the forward approximation.
We denote by s k = x 1 + · · · + x k the partial sums of i.i.d. random variables x i ≡ E i /δ ξ . Let us assume the x i to be unit Gaussian variables with probability density Consider the distribution function P N (y) of the random variable log |s i |, (D.2) and its generating function G N (k) Let us compute G N for N ≫ 1. We start by taking the expectation value over the joint distribution of x i = s i − s i−1 (s 0 ≡ 0): where the integral operator O k [·] acting on a function g is given by Consider now the basis of even functions: In this basis the linear action of O k is given by: with the matrix where c(k) = φ max,0 · m≥0 a m φ max,m , and φ max is the normalized eigenvector corresponding to λ max . Numerical results for the maximal eigenvalue are shown in Fig. D.11. They are obtained by truncating O to an increasing set of basis states (or chain of sites) m ≤ L. For k close to the singularity k = −1 the results rapidly converge with increasing size L. In this region, we can extract information on the limiting curve λ max (k) from the truncated chain. In particular, we see from the plot that both the function log λ max (k) and its negative slope diverge at k = −1, which will also follow form the analysis below. Hence, k −1 is the relevant region for the saddle point approximation of Eq. (65), if very large deviations ỹ ≫ 1 are considered.
Due to the proximity to a logarithmic divergence at k = −1, to order O(1 + k) the eigenstate φ max for k ∼ −1 is localized on the first site (n = 0) of the corresponding hopping chain: with an eigenvalue (D.12) Corrections to the maximal eigenvalue (D.12) can be evaluated perturbatively in the matrix elements O ik̸ =00 (D.8), which yields One can show that δλ(k) is analytic around k = −1 and satisfies δλ(k → −1) → 0. This is due to the fact that in nth order perturbation theory λ (n) max is proportional to denominators of the form 1/O n−1 00 ∼ (k + 1) n−1 . The leading term in δλ(k) results from: (k + 1) + O(k + 1) 2 . (D.14) A plot of the corrections to the maximal eigenvalue (D.12) is given in Fig. D It has a saddle point at k = k * (ỹ), determined bỹ To isolate the singularity in k = −1 we use the Laurent expansion of ψ (0) (x) around x = 0: where γ is the Euler constant. This allows us to recast (D.16) in the following form: Here, Q(·) is an analytic function with expansion: This yields the equation c(k * (ỹ)) λ max (k * (ỹ)) (D. 25) yields only logarithmic corrections to the exponent. As commented in the main text, when restricting to the linear term in (D.24), the large deviation statistics for the correlated denominators coincides with that of independent identically distributed energy denominators. Indeed, from Eqs. (D.10) and (D.12) it follows that to leading order in k + 1 the exponential growth of G N is almost equal to that of the generating function g N (k) = [2 k+1 2 Γ ( k+1 2 )/ √ 2π] N associated with products of N independent Gaussian denominators with unit variance. For ỹ ≫ 1, the tail of the distribution is determined by the residue of the pole of the generation function at k = −1, which is identical in the two cases. Repeating the above derivation of large deviations for independent denominators with generating function g N (k), one finds that it differs from (D. 23  In general, for k-body interactions we have T (n) = (k−1)n n /((k − 2)n + 1) diagrams. For k = 3 these are the numbers of binary trees with n vertices, or Catalan numbers.
There are two ways to solve Eq. (E.1) and find T n . The first one is to notice that its generating function T(x) satisfies T(x) = T (x) 2 , write Eq. (E.6) in terms of T and use Lagrange's inversion theorem again. Alternatively, one can use the explicit form of T (n) and apply a summation formula for the ratio of four Γ -functions to obtain: with L(N) = log(N + 1)/ log 3. Using that 2 P 1∞ k=1 3 ¡−k log(k) = 0.29, one finds that the minimal number of effective paths |P| for diagrams with this geometry, which is the square of the above, scales as: which should replace the estimate in Eq. (C.7). Among all the possible geometries of diagrams with a fixed number of interactions N , the maximally branched geometry is the one that maximizes the number of effective paths. Thus, Eq. (2) is an upper bound for the average |P(d)| introduced in Eq. (55), which we also expect to have an exponential scaling in N : with 0 < ®α < 0.58. Accounting for this correction, the total number of effective paths of length N , N N in Eqs. (59), is modified accordingly: Since the additional factor is only exponential in N , the conclusion about the convergence of the construction of integrals of motion for small enough interactions is unaltered. The effect of the correction is to slightly diminish the radius of convergence of the construction.
The precise effect of this correction depends on the relative weight of the effective paths and their mutual interference.
If we make the simplifying assumption that the effective paths associated to the same diagram can be treated as independent random variables, the sum S(d) in Eq. (53) is dominated by the largest term, and the factor |P(d)| enhances the tail of the distribution of A (®α) I,J as compared to the tail corresponding to a single path weight, see Eq. (58). The above discussed correction would thus modify by a factor e ®α the numerical constant C in Eq. (93). (Note, however, that this constant C was already subject to uncertainty, see Eq. (85), due to the approximations going into the estimate of N N .) Approximating e ®α ¼≈ e 0.58 = 1.79 we find that the result of Eq. (93) holds with the following uncertainty on C: 18.97 < C < 36.25.
The above assumption neglects, however, that the effective paths associated to a given diagram are not independent. Indeed, they involve the same energy variables in the denominators, but in different combinations. These correlations might be relevant when computing the large deviations for S(d). In fact there could be disorder realizations in which all the energy variables are simultaneously small, in such a way that there is no dominant effective path. In an extreme case, all !ω 0 contributing to S(d) might happen to be of the same order of magnitude and atypically large. These contributions will come with different signs and partially cancel, which counteracts the enhancement of the total amplitude. To estimate an upper bound for the effect of the exponential number of effective paths on the constant C we neglect those partial cancellations, and assume that the diagrams dominating the tails of S(d) are such that essentially all effective paths add up constructively with comparable weights. Under this extreme scenario, the large deviations of S(d) would be given in terms of those of a single path weight !ω by setting S(d) »∼ |P(d)|!ω. In this approximation, Eq. (93) is recovered with the substitution λ !→¸λe ®α . This shifts the estimated interval for C in Eq. (5) only by a logarithmic factor.