Precise evaluation of thermal response functions by optimized density matrix renormalization group schemes

This paper provides a study and discussion of earlier as well as novel more efficient schemes for the precise evaluation of finite-temperature response functions of strongly correlated quantum systems in the framework of the time-dependent density matrix renormalization group (tDMRG). The computational costs and bond dimensions as functions of time and temperature are examined for the example of the spin-1/2 XXZ Heisenberg chain in the critical XY phase and the gapped N\'eel phase. The matrix product state purifications occurring in the algorithms are in one-to-one relation with corresponding matrix product operators. This notational simplification elucidates implications of quasi-locality on the computational costs. Based on the observation that there is considerable freedom in designing efficient tDMRG schemes for the calculation of dynamical correlators at finite temperatures, a new class of optimizable schemes, as recently suggested in arXiv:1212.3570, is explained and analyzed numerically. A specific novel near-optimal scheme that requires no additional optimization reaches maximum times that are typically increased by a factor of two, when compared against earlier approaches. These increased reachable times make many more physical applications accessible. For each of the described tDMRG schemes, one can devise a corresponding transfer matrix renormalization group (TMRG) variant.


I. INTRODUCTION
This paper addresses the efficient evaluation of finite-temperature response functions χÂB(β, t) := 1 Z β Tr(e −βĤB (t)Â) withB(t) ≡ e iĤtB e −iĤt (1) for quantum many-particle systems on one-dimensional (1D) lattices in the framework of the density matrix renormalization group (DMRG) [1][2][3]. From the theoretical perspective, such quantities occur for example in the context of linear response theory [4,5] and characterize the effect of a perturbationÂ of the system at time zero on the expectation value of an observableB at time t. Initially, the system is in thermal equilibrium, where β = 1/T is the inverse temperature and Z β = Tr e −βĤ . The units are chosen such that Boltzmann's and Planck's constants are k B = 1 and = 1. Such response functions contain important information on the many-body physics and are addressed in many experimental setups. For example, recent advances in neutron-scattering techniques make very precise measurements possible. See for example Refs. [6][7][8][9][10]. It is hence very important to have numerical tools at hand that allow for an efficient and precise evaluation of thermal response functions for (strongly-correlated) condensed matter models in order to match theoretical models to actual materials and to gain an understanding of the underlying physical processes. As discussed and demonstrated in several works [11][12][13], finite-temperature response functions for strongly-correlated 1D systems can be evaluated up to some maximum reachable time by using the time-dependent density matrix renormalization group (tDMRG) [14][15][16] applied to a purification [17][18][19] of the density matrix. In the DMRG approach, those purifications are approximated with a controllable precision by matrix product states (MPS) [20][21][22]. To address finite-temperature states with tDMRG was suggested in Refs. [23,24] with first applications in Refs. [25,26]. A difficulty in the simulations of time-evolved states is the growth of entanglement with time [27][28][29][30]. In tDMRG calculations, this leads to a corresponding, typically exponential, increase of the computation cost with time and a strong limitation of the maximum reachable times, depending on the available computational resources and the desired accuracy of the simulation. The effect is much more drastic for mixed states. The focus of this paper is hence to study the computation cost of the different tDMRG schemes for the evaluation of thermal response functions and to suggest novel more efficient approaches that allow for a substantial increase in the maximum reachable times.
The results of such simulations can be Fourier transformed to study the spectral properties of the response. In order to avoid ringing artifacts from the Fourier transformation of the data on a restricted time interval, one can either use filters, which result in an artificial broadening, or use linear prediction [31,32]. This was employed first in Ref. [33] for T = 0 and in Ref. [11] for T > 0. See also Refs. [34,35], for further applications of linear prediction in DMRG calculations at T = 0.
Earlier alternative DMRG approaches for finite-temperature response functions have built on the transfer matrix renormalization group (TMRG) [36][37][38][39]. In a first step, analytic continuation techniques as in Quantum Monte Carlo (QMC) were employed [40]. In a second development, transfer matrices comprising the imaginary-and the real-time evolution were used to access autocorrelation functions [41,42]. In finite-temperature QMC calculations (e.g. positive-definite path integral [43,44] or stochastic series expansion with directed loops [45]), real-time or real-frequency response functions have to be extracted by analytic contin-uation from imaginary-time results [46] which is ill-conditioned and numerically challenging. Finite temperatures have also been addressed by a hybrid algorithm based on Monte Carlo and tDMRG [47,48].
In this paper, I discuss how the simulation based on the evolution of a certain MPS purification of the thermal density matrix as described in [11,13,23] can actually be understood as evolving a matrix product operator (MPO) representation of the square root of the thermal density matrix. The description in terms of purifications is in this sense superfluous. A decisive observation is now that there are considerable degrees of freedom for the choice of a specific evaluation scheme. For the two specific schemes, corresponding to Refs. [11,12] (scheme A) and Ref. [13] (scheme B ), respectively, I examine the scaling of the computation cost with time. It turns out that scheme A has some advantage at low temperatures and scheme B is advantageous at higher temperatures, for which I give an explanation on the basis of quasi-locality [49,50]. The maximum reachable times t A max (β) and t B max (β) differ, for the same computation cost and accuracy, by a factor of order one. It is then shown that an optimized evaluation scheme, for which one optimizes over the aforementioned degrees of freedom, yields maximum reachable times that are at least twice as large as for scheme B, specifically t opt max (2β) 2t B max (β). In all cases one finds that t max (2β) ≈ t max (β) as t max varies slowly as a function of log β. We can devise a new scheme C that does not require any optimization, but outperforms scheme B by a factor of two in the maximum reachable time. It is only outdone by scheme A at very low temperatures. To go beyond scheme C, one can study the computation cost for calculating MPO approximations of operators e iĤs e −αĤB e −iĤs to determine optimized evolution schemes. Finally, the connection between scheme A and an earlier TMRG approach [41] is explained. For each of the tDMRG schemes, one can devise a corresponding TMRG variant.
Due to the substantially increased reachable times and its simplicity (no need for optimization), the novel scheme C, which was introduced recently in Ref. [51] and is explained here in detail, can be expected to be the method of choice for future applications. In Ref. [51], it allowed us to show that the thermal spectral functions of 1D bosons in the quantum critical regime with dynamic critical exponent z = 2 follows a universal scaling form and to compute the corresponding scaling function precisely.

II. MPS PURIFICATION VERSUS MPO DENSITY MATRIX
As described for example in Refs. [11,13,23], the (unnormalized) thermal density matrix ρ β := e −βĤ , Z β := Tr e −βĤ for a chain of L sites can be obtained in the form of an MPS purification where the |σ i label orthonormal site basis states for H, the auxiliary Hilbert space H aux is isomorphic to H, and |σ i aux label orthonormal site basis states for H aux . The MPS is constructed from are also called bond dimensions; see for example the review [3].
In order for |ρ β to be a purification ofρ β , it needs to be constructed such that This can be achieved by choosing the infinite-temperature purification to be which can be written as an MPS (2) with bond dimensions M i = 1. In this so-called ancilla approach, finite-temperature purifications can be calculated by imaginary-time evolution and response functions are evaluated after a subsequent real-time evolution The evolution of the MPS can for example be implemented by decomposing the propagators into circuits of local gates by a Trotter-Suzuki decomposition [14][15][16] as described in Section VI. The square brackets in Eq. (3) indicate which parts of the expression are represented as MPS after the evolution. Note that Z β , which is required in Eq. (3), can be determined as ρ β |ρ β = Tr e −βĤ = Z β . Equivalently, one can work with purifications |ρ β that are normalized to one. In each step of the tDMRG, the evolved states |ψ = |ψ(β, t) are approximated by an MPS with bond dimensions M i = M i (β, t) that are as small as possible for a given constraint on the desired precision of the approximation [3]. This is achieved through truncations. For every splitting of the system into a left and a right part, one does a Schmidt decomposition |ψ = M k=1 λ k |k L ⊗ |k R of the state [19] which boils down to doing singular value decompositions of the tensors A i . The corresponding reduced density matrices are k λ 2 k |k L k| L and k λ 2 k |k R k| R . The bond dimension is then reduced fromM to some value M <M by retaining only the M largest Schmidt coefficients and truncating all smaller ones.
The precision is in each step of the algorithm controlled by bounding the truncation weight As a matter of fact, the notation of this procedure as an algorithm on purifications is in a sense superfluous. Due to the fact that B(H), the space of linear maps on H, and the tensor product space H ⊗ H are isomorphic, everything can be rewritten in terms of MPOs  (6). In the graphical representation, boxes represent the tensors A i that define the MPS or MPO, and lines represent partial tensor contractions. The diagram for Tr(ŶBX) represents the tensor network that has to be contracted in order to evaluate the trace of two MPOsX andŶ , and a local observableB as occurring in the tDMRG schemes for the evaluation of thermal response functions. See for example Eq. (9). The computational cost is determined by the bond dimensions of the MPOs as in Eq. (10).
Examples for this one-to-one relation are where the transposition is to be executed in the |σ basis. In particular, the counterpart of the MPS purification (2) is the MPO See for example Refs. [24,52,53] for discussions of MPOs. Applying evolution operators e −∆βĤ or e ±i∆tĤ to the left or right of an MPOX can be done by tDMRG in exactly the same way as for MPS, for example, by a Trotter-Suzuki decomposition as described in Section VI. The truncation weight, which is kept below a certain bound in order to control the precision, is then given by where X 2 = TrX †X is the Schatten 2-norm. This is the natural norm within the DMRG framework.

III. EVALUATION SCHEMES A AND B
Evaluation scheme A as described and employed in Refs. [11,12] corresponds to Eq. (3). In the MPO notation it reads The square brackets indicate which parts of the expression are represented as MPOs after the evolution. Figure 2 shows the algorithm diagrammatically. The partition function Z β can be obtained from the Schatten 2-norm Z β = ( [e −βĤ/2 ] 2 ) 2 . In the DMRG method, multiplications of the matrices A σ i ,σ i i and singular value decompositions, required for the truncations of the MPOs, dominate the computation costs. Hence, the computation costs for a single evolution step for the first MPO (bond dimensions M i ) and the second MPO (bond dimensions M i ), and for the final evaluation of Eq. (9) scale as respectively [54]; see Figure 1. For a nonsingular operatorT ∈ B(H), the value of (9) is invariant under transformations of the involved MPOs [57]. This is a considerable degree of freedom that can be exploited to Néel phase Figure 3: Zero-temperature phase diagram of the spin-1/2 XXZ Heisenberg model in a magnetic field (13). The ground state is fully polarized in the ferromagnetic phase. In the Néel phase, it has a finite staggered magnetization. The system is critical (vanishing energy for excitations) in the XY, or spin-liquid, phase (gray). The phase boundaries can be obtained from the Bethe ansatz [59,60]. In this work, the properties of different tDMRG schemes for the evaluation of thermal response functions are exemplified with this model at the three points J z = 0, 1, 3 with h = 0.
reduce, for given β, t, and , the bond dimensions M i of the MPOs and, hence, to reduce the computation cost. In cases where such a reduction is possible, one can extend the simulation to longer times t. Some numerical experiments showed that the corresponding computation cost for the optimization ofT scales exponentially with the system size. So a search for globally optimalT seems to be a hopeless exercise. It is hence reasonable to constrain ourselves to certain classes of transformations, e.g.,T = e −β Ĥ e −iĤt and to optimize with respect to the degrees of freedom of the class -β and t in that case. See Section V. The modified evaluation scheme B (see Fig. 2) employed in Refs. [13,58] corresponds to the choiceT = e −iĤt and reads in the operator notation

IV. COSTS OF SCHEMES A AND B AND THEIR EXPLANATION
In order to study the computational costs of evaluation schemes A and B [Eqs. (9) and (12)], as functions of time and temperature, let us choose as an example the spin-1/2 XXZ Heisenberg modelĤ The phase diagram [59,60,62], as derived by the Bethe ansatz, is shown in Figure 3. The simulations are carried out for vanishing magnetic field h = 0, system size L = 128, and coupling constants J z = 0 and J z = 1, where the model is gapless, and at J z = 3, where the model is in its gapped antiferromagnetic phase. For the time evolution, a fourth order Trotter-Suzuki decomposition with step sizes ∆β = ∆t = 1/8 is used. The truncation weights in the imaginary-time and real-time evolutions were fixed to values β and t which are specified in the captions of the corresponding figures. Figure 4 shows the computation cost per time step for the operatorsÂ =B † =Ŝ + L/2 in Eqs. (9) and (12), i.e., spin-flips in the middle of the system [61]. The density plots show, as a function of time and temperature, the number of operations needed for a single time step as measured by i (M i (β, t)) 3 . For scheme A, the cost is dominated by the cost for the computation of the MPO e −iĤtÂ e −βĤ/2 .
For scheme B, the computation of the MPO dominates the numerical costs. Hence, the bond dimensions M i (β, t) of those operators were used in the diagrams. Please notice the logarithmic scale for β = 1/T and the cost, i.e., the color coding and the distances of the contour lines [54]. The contour lines correspond to maximum reachable times for certain computational resources and a predefined precision of the simulation, determined by β and t . Figure 5 shows, for the same simulations, the maximum bond dimensions max i∈ [1,L] First of all, the results show that the computation costs increase, for fixed temperature, exponentially with time t. This corresponds to a linear increase of the entanglement entropy in the evolution of pure states after a quench [27][28][29]. For the noncritical case, J z = 3, the costs per time step become β-independent for temperatures that are sufficiently below the excitation gap. At higher temperatures, the cost for scheme B is systematically smaller than that for scheme A. The effect is strongest for the non-interacting system at J z = 0. For J z = 1 and J z = 3, the increase in the maximum reachable times is more moderate with a factor of 1.4. At lower temperatures, scheme A is more efficient than scheme B. This trend strengthens when the accuracy of the simulation is reduced (higher truncation weights β and t ) as documented in the left part of Figure 6. Figure 5 shows that the maximum bond dimensions max i M i (β, t) evolve almost in the same way as the computation costs. For the noncritical case J z = 3, one sees however that the maximum bond dimensions occurring in scheme A are for all temperatures smaller than those occurring in scheme B, whereas the computation cost for scheme B is lower at higher temperatures. Also, for J z = 1, the maximum bond dimensions occurring in the two schemes are quite similar.
These findings can be explained as follows. In Ref. [13] it was pointed out that the operator (15) of scheme B is time-independent for the simple caseÂ = 1, whereas the computation cost for the MPO (14), occurring in scheme A, can increase with time, even in this trivial case. More generally, the following argument applies for all operatorsÂ with finite spatial support supp(Â). The typical condensed matter systems are quasi-local [49,50], i.e., the spatial support of operators like e −iĤtÂ e iĤt , occurring in scheme B, grows only linearly with time. More precisely, outside a certain space-time cone originating from supp(Â), the evolved operator acts almost like the identity and does hence not change the entanglement in that region; see Figure 7. In mathematical terms, quasi-locality means where all termsĥ i in the HamiltonianĤ = iĥ i are required to be short-ranged and    (14) and (15) as occurring in schemes A and B, respectively. The plots show the bond dimensions M i = M i (β, t) forÂ =Ŝ + L/2 , high as well as low temperatures, and the critical point J z = 1 as well as the gapped J z = 3. The truncation weights were β = 10 −12 and t = 10 −10 . Unlike in scheme A, the MPO for scheme B remains unchanged outside a certain space-time cone also at high temperatures. This is due to the quasi-locality (right); see Eq. (16). norm-bounded,Ĥ V = i∈Vĥ i is the Hamiltonian truncated to a vicinity V of the spatial support of the operatorÂ (supp(Â) ⊂ V ), ∂V is the boundary of V , and v ∝ sup i ĥ i is the Lieb-Robinson velocity [49,50]. So the norm-distance between the exactly evolved operator and the operator, evolved on the subsystem V only, decays exponentially with increasing distance of ∂V to a space-time cone defined by the Lieb-Robinson velocity v, and originating from supp(Â) at time zero. As the evolved operator e −iĤtÂ e iĤt , hence, behaves up to exponentially small corrections like the identity outside the specified spacetime cone, it does not alter the operator e −βĤ/2 in that region -in particular, not the MPO bond dimensions M i -for the product (15) occurring in scheme B. This effect is very well visible in the plots for J z = 1 and β = 3 shown in Figure 7. For both tDMRG schemes, the maximum bond dimensions occur in the middle of the system, whereÂ =Ŝ + L/2 acts, and grow (exponentially) with time t. The maxima have comparable values. Whereas, in scheme B, M i remains unchanged outside the Lieb-Robinson space-time cone due to the quasi-locality, they grow for all bonds in scheme A. This implies according to Eq. (10) an increased cost for scheme A at those temperatures.
Nevertheless, as Figures 4-7 exemplify, scheme A is often advantageous at low temperatures, especially for noncritical systems. For low temperatures, the quasi-locality is not essential as e −βĤ/2 limits the effect of the real-time propagators e ±iĤt to a low-energy subspace. Sufficiently far away from the support ofÂ, the resulting dynamics is then very restricted and can not lead to a big increase of the bond dimensions. This is true for both schemes. Furthermore, in the middle of the system, whereÂ =Ŝ + L/2 acts, the growth of the bond dimensions is slower for scheme A which can be attributed to the fact that one acts with one propagator in Eq. (14) instead of acting with two propagators in Eq. (15). As the plots for J z = 1, 3 and β = 40, 48 in Figure 7 exemplify, the effect is much more pronounced for gapped systems.
The described properties have also been found for other local operatorsÂ andB likê S z L/2 etc. The behavior for non-local operators likeŜ + k=0 := iŜ + i is also quite similar as exemplified in Figure 6 for J z = 0, 3. Please note that [Ŝ + k=0 ,Ĥ] = 0 in the isotropic case J z = 1 for which the evolution ofŜ + k=0 (t) is hence trivial.
V. OPTIMIZED SCHEMES AND NEAR-OPTIMAL SCHEME C Schemes A and B, Eqs. (9) and (12), are typically far from optimal. One has a lot of freedom in designing a scheme that is as efficient as possible. The choice for a specific measure for the efficiency of a scheme can depend on the available computational resources. In this paper, the computation cost per time step as quantified by i (M i (β, t)) 3 [Eq. (10)] and, as an alternative, the maximum bond dimension max i M i (β, t) are considered as useful measures. With a nonsingular operatorT , the most generic scheme involving two MPOs is Z −1 β Tr e iĤtBT T −1 e −iĤtÂ e −βĤ . As pointed out in Section III, to optimize efficiently with respect to genericT is in general not possible. Thus, it is reasonable to constrain ourselves to a certain subclass of schemes and to optimize the efficiency over the remaining degrees of freedom of that class. Let us consider the class of schemes  (18) with J z = 0, 1, and 3, respectively. withÂ =B † =Ŝ + L/2 ,Ŝ + k=0 , the schemes have been optimized with respect to β and t . The corresponding efficiency measures were chosen to be the computation cost per time step for the top and bottom rows of diagrams, and the maximum bond dimension was chosen for the middle row of diagrams. In all computations, the truncation weights were chosen as β = 10 −12 , t = 10 −10 .
As before, the two factors in square brackets are to be approximated by MPOs [57]. For given inverse temperature β, time t, and accuracy as quantified by the truncation weight [Eq. (8)] one can now optimize the chosen efficiency measure with respect to β , t , and t . Scheme A corresponds to the choice (β , t , t ) = (β/2, t, 0), scheme B to (β , t , t ) = (β/2, 0, 0), and the choice (β , t , t ) = (β, 0, 0) for example to the Heisenberg picture. In actual applications, one can first study the computation cost, for obtaining operators e iĤs e −αĤB e −iĤs and e −iĤsÂ e −αĤ e iĤs . From the results, one can then conclude on optimal values for β , t , and t in the evaluation of the response function according to Eq. (17). For the demonstrational purposes of this paper, it is sufficient to not explore the full three-dimensional parameter space, but to constrain ourselves in the following to the subspace with t = t , i.e., to the schemes Figure 8 shows, withB † =Â [61], the computation cost per time step and the maximum bond dimensions for the optimized scheme. The maximum reachable times t opt max of the optimized scheme are at least twice as large as the reachable times t B max (β) for scheme B, specifically t opt max (β) 2t B max (β/2). See also Figure 9. The factor 1/2 in the temperature argument is inessential as one finds that t max (β) ≈ t max (β/2). This is due to the fact that t max varies slowly as a function of log β for all response functions considered in this paper. Let us discuss the possible scenarios for the optimum values of β and t for the casê B † =Â: The computation cost for scheme B is dominated by the tDMRG computation of the operator e −iĤtÂ e −β Ĥ /2 e iĤt in Eq. (12). The corresponding maximum time, reachable for some given computational resources, truncation weight , and inverse temperature β , is denoted by t B max (β ). The maximum reachable time for the scheme (18) as a function of β is then given by There are two possible scenarios, as depicted in Figure 10. If t B max (β ) is concave on the interval [0, β], the optimal scheme corresponds to β = β/2. Otherwise, there will be an optimum β = β/2. In the examples studied here, t B max (β ) was found to be almost concave in the relevant temperature ranges.
One can hence devise a corresponding scheme C that is near-optimal among the schemes (18) (at least forB † =Â), does not require any optimization, but outperforms scheme B by a factor of two in the maximum reachable times. It is only outdone by scheme A at very low temperatures. The scheme corresponds to the choice β = β/2 in Eq. (18) After the imaginary-time evolution that yields [e − β 2Ĥ ], one runs two real-time tDMRG simulations to obtain MPOs e iĤt B e − β 2ĤB e −iĤt B and e −iĤt AÂ e − β 2Ĥ e iĤt A . With Eq. (20) one then obtains χ Ĉ AB (β, t A +t B ); see Figure 10. The accuracy of the MPOs should be kept under control during the whole simulation, for example, as described in Section II by bounding the truncation error [57]. If this is done properly, it is of minor importance what specific t A and t B = t − t A are chosen to evaluate χÂB(β, t) for a given time t. The maximum reachable time t is of course given by the sum of the maximum reachable t A and t B . For the caseÂ =B † (or cases whereB † is for example simply a translate ofÂ [61]), the maximum reachable times for t A and t B are equal, and the total maximum reachable time with this scheme is then twice as large as the maximum time of scheme B. This makes many more physical applications accessible.
As pointed out above, scheme C is optimal among the classes of schemes corresponding   (19)] reachable with the optimizable tDMRG scheme (18) for the caseÂ =B † and given computational resources. If t B max (β ) is non-concave on the interval [0, β], the optimal scheme corresponds to some nontrivial β . This can for example occur for systems with dynamics on different energy scales. If t B max (β ) is concave, the optimal scheme is given by β = β/2, i.e., scheme C. In the examples studied here, t B max (β ) was found to be almost concave in the relevant temperature ranges. Right: Scheme C for the evaluation of the response function according to Eq. (20).
to Eq. (18) if t B max (β ) is a convex function. For the cases studied here it was indeed found to be convex or almost convex in the considered temperature ranges. One can expect a different behavior for example for systems with dynamics on different energy scales. In such cases one can achieve considerable performance improvements by optimizing among the class of schemes (17) or (18). In exceptional cases it also occurs that evolved operatorsÂ(t) have an MPO representation with a time-independent bond dimension. This implies t B max (0) = ∞ and one can compute the response function for arbitrary times by using β = β and t = 0 in Eq. (18). One such case is the operatorŜ z j (t) for the XY model (J z = 0) which can for all times be written as an MPO of bond dimension M j = 4 [63].

VI. tDMRG VERSUS TMRG CONTRACTIONS
The time evolution of matrix product states or density operators, as employed in the presented evaluation schemes for thermal response functions, can be implemented within the tDMRG framework in several different ways. The results for the computation costs of the different evaluation schemes are essentially independent of the chosen time evolution algorithm. The currently most common choice [14][15][16] employs a Trotter-Suzuki decomposition [64][65][66][67][68]. Alternatively, one can use for example the Arnoldi method, Runge-Kutta method or other Krylov subspace approaches [69][70][71].
Let us consider in the following a HamiltonianĤ = L i=1ĥ i with nearest-neighbor interactionsĥ i , operatorsÂ andB with finite spatial support (for simplicity single sites), and implementing the time evolution with a Trotter-Suzuki decomposition. This case allows for an alternative evaluation of the thermal response functions on the basis of the transfer matrix renormalization group (TMRG) [36][37][38][39]. where τ = β or τ = ±it, respectively, ∆τ = τ /N is the time step, andĤ odd,even contain the Hamiltonian terms on even and odd bonds, respectively. The coefficients a n and b n sum up to one ( n a n , n b n = 1) and define different decompositions of order p and number of stages m per time step. A common choice is the leapfrog algorithm which has m = 1 stages and order p = 2 with a 1 = a 2 = 1/2 and b 1 = 1. In this work, a decomposition with m = 5 stages and order p = 4 was used. When the operator exponentials occurring in the formula Z −1 β Tr(e −βĤ/2 e iĤtB e −iĤtÂ e −βĤ/2 ) for the response function in scheme A, Eq. (9), are decomposed by such a Trotter-Suzuki decomposition, one ends up with the task of contracting a 2D tensor network (quantum cellular automaton) of local gates as displayed in Figure 11 with the space direction being horizontal and the time direction being vertical. Similar tensor networks result for the other evaluation schemes.
The tDMRG approach scheme A corresponds to starting with a trivial MPO [1] that represents the identity with bond dimensions M i = 1 ∀ i∈ [1,L] . Beginning at the bottom, one then contracts one row of local gates after another to that MPO until reaching the row that contains operatorB, yielding [e −iĤtÂ e −βĤ/2 ]. Multiplying, in every step of this � � � � � � Figure 11: Visualization of the 2D tensor network that has to be contracted for the evaluation of the thermal response function Z −1 β Tr(e −βĤ e iĤtB e −iĤtÂ ) on the basis of scheme A when using a Trotter-Suzuki decomposition (21) of the operator exponentials (nearest-neighbor interactions). This can be done either using tDMRG or using TMRG. In the tDMRG method, one contracts one row of local gates after another (with intermediate truncation steps), evolving two MPOs that are oriented horizontally. The final evaluation of the response function according to Eq. (9) corresponds to taking the Hilbert-Schmidt scalar product of the two MPOs. Alternatively, one can contract this tensor network with TMRG. In this case the meaning of states and dual states is changed as indicated on the right side of the figure. One starts on the left and on the right with vertically oriented MPS. One column of local gates after another, the transfer matrices, is applied to those MPS, again with intermediate truncation steps to reduce the bond dimensions. Finally, the expectation value is obtained by evaluating the scalar product of the two MPS.
iteration, one row of gates with the current MPO in an exact manner, gives a resulting MPO with increased bond dimensions. A subsequent truncation step reduces the bond dimensions again. The degree to which the bond dimensions are reduced in the truncation steps is controlled by the predefined truncation weight (8) which determines the accuracy of the computation. Similarly, starting from the top, one contracts rows of gates to obtain [e −βĤ/2 e iĤt ]. Finally, one computes the response function by evaluating the Hilbert-Schmidt scalar product (9) as visualized in Figure 1.
In the TMRG approach, the columns of local gates in Figure 11 are interpreted as transfer matricesT =T (β, t) (alsoTÂ andTB) and operate on a Hilbert space of dimension d β/∆β+2t/∆t , where d denotes the dimension of the single-site Hilbert space. The last column corresponds to a pure state |ψ L and the first column to a dual state ψ 0 |. Those transfer matrices and states have matrix product representations with bond dimensions M i ≤ d 2 ∀ i . Starting from the left and the right with the states ψ 0 or ψ L , respectively, one transfer matrix after another is applied with intermediate truncation steps, for example, until one reaches the column containing operatorB. The response function is then given by the overlap of the two resulting MPS, [ ψ 0 |T · · ·TTÂT · · ·T ][TBT · · ·T |ψ L ].
The advantages of the tDMRG approaches, pursued in this work, are that, (i), the number of matrices in the MPOs is fixed by the lattice size L instead of growing linearly with the time t and inverse temperature β in the TMRG approach, (ii), the operatorsÂ andB can have non-local support, (iii), one can evaluate the spectral function for all times t, up to the maximum reachable time, by two tDMRG runs whereas, in the TMRG approach, one has to do a new calculation for every time t of interest.
Concerning (ii), it should be pointed out that non-local operatorsÂ andB would also be accessible in the TMRG approach as long as they are MPOs with sufficiently small bond dimensions. In Refs. [41,42] the focus was on autocorrelation functions of local operators in the thermodynamic limit L → ∞. In this case, one does not need to contract one column after another, but can evaluate the response function directly from the left and right transfer matrix eigenvectors with maximum eigenvalue. When one uses the so-called infinite-system DMRG [1] to obtain those eigenvectors (a single build-up sweep for the finite imaginary-time lattice; not to be mistaken with the algorithm presented in Ref. [72] which generates translationally invariant MPS for infinite systems), problem (iii) does not occur and autocorrelation functions can be evaluated for all accessible times t in a single run. One should however be careful in applying infinite-system DMRG only, as this algorithm has several pitfalls [3]. At least for larger times, finite-system DMRG (multiple sweeps) should be necessary.

VII. CONCLUSION
In this paper, I have studied and explained the computation costs of different tDMRG schemes for the efficient and precise evaluation of finite-temperature response functions of strongly correlated quantum systems. Simplifying the notation to some extent by formulating everything in terms of MPOs, elucidated the effects of quasi-locality on the costs. The new class of optimizable evaluation schemes, Eqs. (17) and (18), typically outperforms the earlier schemes from the literature in terms of the maximum reachable times by a factor of 2. This gives access to many more physical applications. The novel scheme C, that requires no additional optimization and is near-optimal in many typical examples, can be expected to be the method of choice for future applications. For more complex models, like systems with dynamics on different energy scales, scheme C still outperforms the older tDMRG methods, but one can achieve further substantial performance gains by first studying the costs for the computation of operators e iĤs e −αĤB e −iĤs , and then using this information to determine the optimal parameters for the scheme (17). Finally, it has been argued that the tDMRG schemes are in some respects favorable to corresponding TMRG variants. As a first application, scheme C has been used in Ref. [51] to calculate, for 1D bosons in the quantum critical regime with dynamic critical exponent z = 2, the universal scaling function for the thermal spectral function.
Discussions with J. Barthel