Time-evolution methods for matrix-product states

Matrix-product states have become the de facto standard for the representation of one-dimensional quantum many-body states. During the last few years, numerous new methods have been introduced to evaluate the time evolution of a matrix-product state. Here, we will review and summarize the recent work on this topic. We will explain and compare the different methods available, namely the time-evolving block decimation, the MPO W I,II method, the global Krylov method, the local Krylov method and the one- and two-site time-dependent variational principle. We will also apply these methods to four different representative examples of current problem settings in condensed matter physics.


Introduction
The dynamics of correlated quantum systems is a rich playground for non-equilibrium physics. Studying these dynamics, however, is a challenge for experiment as well as for theory. In the context of tensor networks, numerical methods were introduced around 2004 that opened the path to studying the dynamical behavior of low-dimensional systems in a controlled fashion. In particular, the Trotterized time evolution of matrix-product states (MPS) as produced by the density-matrix renormalization group (TEBD, tDMRG, tMPS) [1][2][3][4][5][6][7][8] marked a breakthrough, as it combined powerful time-evolution schemes with the efficient truncation of the exponentially large Hilbert space intrinsic to quantum many-body systems. Nowadays, DMRG methods formulated in the language of matrix-product states (MPS) [9][10][11][12][13][14] are the de facto standard for the investigation of one-dimensional systems. Here, we will focus on the time evolution of quantum states along the real and imaginary time axes. Specifically, we will review and compare in some detail the various MPS-based approaches which have been developed in the past few years, sometimes with large improvements over the first generation of methods.
The main goal of these schemes is to combine the efficient truncation of the Hilbert space with an accurate time-evolution method. Naturally, time-evolution methods are dealt with regularly in the mathematical literature (cf. the review [15]), but their applicability to MPS-based problem settings is often limited due to the peculiarity of the MPS approach in approximating the Hilbert space. Since we have quantum systems in mind, the underlying differential equation is the time-dependent Schrödinger equation (TDSE), whose solution can formally be recast as the task of applying the time-evolution operator

Û(δ) = e^{−iδĤ}   (1)

with the time-independent Hamiltonian Ĥ, time-step size δ and ℏ ≡ 1. It is therefore natural to ask for efficient MPS methods for the time evolution of quantum states by combining suitable approaches to matrix exponentials with the MPS formulation of quantum (lattice) systems. Indeed, most of this review article is concerned precisely with this question and explores under which circumstances which ansatz will be the most successful one in terms of accuracy and efficiency.
In the context of quantum systems, first attempts to numerically integrate the time-dependent Schrödinger equation beyond the reach of exact diagonalization using sparse matrix exponentials [16] are reported in quantum chemistry [17,18]. When attempting to do the same for MPS, two main approaches have to be differentiated: the first evaluates or constructs the time-evolved state |ψ(t+δ)⟩ = Û(δ)|ψ(t)⟩ directly by approximating the action of Û(δ) in a sufficiently small subspace. In this approach, Û(δ) itself is never calculated or approximated. The global Krylov method [19][20][21] presented in Sec. 5 simply translates the Lanczos formalism [22] for unitary time evolution to matrix-product states. The local Krylov approach [21,23,24,25,26,27] (Sec. 6.1) and the time-dependent variational principle [28,29] (Sec. 6.2) further adapt the Krylov method to the special MPS structure and re-derive the TDSE in suitable local subspaces.
Alternatively, one may ask for an approximation of Û(δ) which can be obtained efficiently and also applied efficiently to the quantum state. One way to deal with such exponentials is well known in quantum field theory, where it permits setting up path integrals [30]: decomposing the time-evolution operator for "sufficiently small" time steps using the Suzuki-Trotter decomposition [31]. This decomposition leads to conveniently smaller matrix exponentials. The method can be directly applied in the context of matrix-product states [4][5][6][7][8], as explained in Sec. 4.1. Building on this, we may ask for efficient matrix-product operator (MPO) approximations of Û(δ) which exploit the MPO structure directly [32] to allow efficient exponentiation, cf. Sec. 4.2. A direct comparison of widely used approaches, including the recently developed methods, has been missing so far. Since those methods have different strengths and weaknesses, it is difficult to say a priori which one will be the most efficient for a given problem setting. We will attempt to fill this gap by providing an overview and comparison. In particular, we discuss in detail the Suzuki-Trotter decomposition and the MPO W I,II method for constructing Û(δ) directly, and TDVP- and Krylov-subspace-based approaches for evaluating Û(δ)|ψ⟩. All methods are tested and compared by treating four representative problems of dynamics in correlated systems. This includes the real- and imaginary-time evolution of chains, time-dependent correlators, critical and gapped systems, and attempts at two-dimensional systems. To cross-check our results, we use two independent implementations of the time-evolution methods within the SciPal [33] and SyTen [34,35] toolkits.

Matrix-product states and operators
The main problem in the numerical treatment of quantum many-body systems is the large Hilbert space which grows exponentially with system size [14]. This strongly restricts exact diagonalization approaches to small system sizes; for spin systems one can reach ∼ 40 − 50 lattice sites [36]. A variety of numerical methods (e.g., quantum Monte Carlo [37], dynamical mean-field theory [38], ...) have been developed to overcome this restriction. For one-dimensional systems, matrix-product state (MPS) methods are extremely successful because they can efficiently represent weakly-entangled states that obey the area law [14,39,40]. Here, we start by recapitulating general properties of the MPS ansatz class and its efficient numerical use, with the scaling costs of each application summarized in Tab. 1 at the very end of the chapter.

Tensor notation
The fundamental objects of MPS algorithms are tensors. In our context, tensors are multi-dimensional collections of complex or real numbers. Each index of a tensor corresponds to one of its dimensions; graphically, tensors are represented as shapes (circles, triangles, squares) with one leg per tensor index (cf. Fig. 1). A tensor T on, e.g., three associated vector spaces A, B and C has three indices a = 1, . . . , dim(A), b = 1, . . . , dim(B) and c = 1, . . . , dim(C). The tensor then has scalar entries T_{a,b,c}. We explicitly do not consider global symmetries [11] here and hence do not associate a direction to our tensor legs (or, conversely, differentiate between vector and dual spaces). As such, the upstairs/downstairs location of tensor indices is meaningless and T^{a,b,c} = T^{a}_{b,c} = T_{a,b,c}. By complex-conjugating every element of a tensor T, we obtain a new tensor T̄ with elements T̄_{ā,b̄,c̄} = (T_{a,b,c})*. We will use barred indices such as ā to denote indices of conjugated tensors. Contracting T and T̄ over the indices a and b, we write

∑_{a,b} T_{a,b,c} T̄_{a,b,c̄} ,

which is equivalent to ∑_{a,ā,b,b̄} δ_{a,ā} δ_{b,b̄} T_{a,b,c} T̄_{ā,b̄,c̄}. Given two tensors A_{a,b,c} and B_{b,d,c}, the shorthand A · B denotes the contraction over all shared indices. Tensor contractions can also be represented graphically by drawing tensors with connected legs, cf. Fig. 1.
Note that we will, where possible without confusion, also use a, b, c, . . . to refer to the dimensions dim(A), dim(B), dim(C), respectively.
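As an illustration, the contraction of T with its conjugate over the indices a and b can be written with numpy's einsum (numpy and the chosen dimensions are our assumptions for this sketch, not part of the text):

```python
import numpy as np

# A random rank-3 tensor T_{a,b,c} on spaces of dimensions 2, 3 and 4.
rng = np.random.default_rng(0)
T = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))

# Contract T with its complex conjugate over the indices a and b,
# leaving a rank-2 result R_{c,cbar} = sum_{a,b} T_{a,b,c} * conj(T_{a,b,cbar}).
R = np.einsum('abc,abd->cd', T, T.conj())

# The diagonal entries of R are the squared norms of the slices T[:, :, c].
assert np.allclose(np.diag(R).real,
                   [np.linalg.norm(T[:, :, c]) ** 2 for c in range(4)])
```

The einsum subscript string plays exactly the role of the graphical notation: repeated labels are contracted legs, the remaining labels are the open legs of the result.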
For a specific set of local states {σ_1 . . . σ_L}, we use a single matrix M^{σ_j}_j per site. We hence evaluate a matrix product to obtain a single entry c_{σ_1...σ_L} of the coefficient tensor, resulting in the name "matrix-product" states. We will use σ_j to index the local physical states on site j. Depending on the bond dimensions m_j of the tensors M_j (also called virtual or auxiliary dimensions), different quantum states can be represented exactly. If we let m_j grow exponentially by a factor of σ per site towards the center of the system, any quantum state can be written as an MPS. The entanglement entropy between the part of an MPS to the left of bond (j, j+1) and the part to the right of this bond is bounded by log(m_j). The bond dimension m = max_i(m_i) required to represent a quantum state exactly hence grows exponentially with its entanglement along left/right partitions. On the other hand, lowly-entangled states require only a small bond dimension, leading to an efficient representation of the quantum state [14,39,40].
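A minimal sketch of evaluating a single coefficient c_{σ_1...σ_L} as a matrix product, assuming numpy, random site tensors with index order (σ, left bond, right bond), and dummy (dimension-1) boundary bonds:

```python
import numpy as np

# A small MPS: L = 4 spin-1/2 sites, bulk bond dimension m = 3.
rng = np.random.default_rng(1)
L, d, m = 4, 2, 3
dims = [1, m, m, m, 1]
M = [rng.standard_normal((d, dims[j], dims[j + 1])) for j in range(L)]

def coefficient(M, config):
    """Evaluate c_{sigma_1...sigma_L} as the product of one matrix per site."""
    mat = np.eye(1)
    for M_j, s in zip(M, config):
        mat = mat @ M_j[s]          # multiply the matrix selected by sigma_j
    return mat[0, 0]                # boundary bonds are one-dimensional

c = coefficient(M, (0, 1, 1, 0))    # one entry of the full coefficient tensor
```

Each choice of local configuration selects one matrix per site, and the scalar coefficient is the product of these L matrices.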
The only difference is that the tensor components W_j are now rank-4 tensors to account for the domain and image Hilbert spaces. There are different avenues to constructing matrix-product operators [41][42][43]. When used to represent the Hamiltonian Ĥ or other operators which are sums of local terms, the construction can be understood by splitting the system at bond j (connecting sites j and j+1). We then separate terms of the operator that act only within their partition, Ĥ^L_{j−1} and Ĥ^R_{j+1}, from those that connect the partitions, ĥ^L_{j;a_j} and ĥ^R_{j;a_j}. Based on the tensor-product structure, there is an operator-valued matrix Ŵ_j which relates the partitioned representations between the bonds j−1 and j. The operator-valued matrices Â, B̂, Ĉ and D̂ then define the recursion relations to iteratively build the complete operator Ĥ. This picture directly leads to a construction of MPOs based on finite-state machines (FSM) [43,45]. In analogy to matrix-product states with bond dimensions m_i and a maximal bond dimension m, the bonds of matrix-product operators are labelled by w_i, with the maximal bond dimension denoted by w.
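As a hedged, self-contained example (the transverse-field Ising chain is our choice for illustration, not discussed in the text), the finite-state-machine construction yields a bond-dimension-3 operator-valued matrix W for H = −J ∑_j Sz_j Sz_{j+1} − h ∑_j Sx_j, which we cross-check by contracting the MPO to a dense matrix:

```python
import numpy as np

Sz = np.array([[0.5, 0.0], [0.0, -0.5]])
Sx = np.array([[0.0, 0.5], [0.5, 0.0]])
I2 = np.eye(2)
J, h = 1.0, 0.7

# W[w_left, w_right] is a 2x2 operator; lower-triangular FSM structure:
# state 2 = "no term started", state 1 = "Sz placed", state 0 = "term complete".
W = np.zeros((3, 3, 2, 2))
W[0, 0] = I2
W[1, 0] = Sz           # completes the Sz Sz bond term
W[2, 0] = -h * Sx      # on-site field term
W[2, 1] = -J * Sz      # starts the Sz Sz bond term
W[2, 2] = I2

def mpo_to_dense(W, L):
    """Contract the MPO to a dense 2^L x 2^L matrix (for cross-checking only)."""
    left = np.array([0.0, 0.0, 1.0])    # enter in FSM state 2
    right = np.array([1.0, 0.0, 0.0])   # leave in FSM state 0
    op = left.reshape(1, 1, 3)
    for _ in range(L):
        D = op.shape[0]
        op = np.einsum('sta,abuv->sutvb', op, W).reshape(2 * D, 2 * D, 3)
    return np.einsum('stb,b->st', op, right)

def kron(*ops):
    out = np.array([[1.0]])
    for o in ops:
        out = np.kron(out, o)
    return out

# Cross-check against the explicitly built three-site Hamiltonian.
H_ref = (-J * (kron(Sz, Sz, I2) + kron(I2, Sz, Sz))
         - h * (kron(Sx, I2, I2) + kron(I2, Sx, I2) + kron(I2, I2, Sx)))
assert np.allclose(mpo_to_dense(W, 3), H_ref)
```

The three FSM states correspond exactly to the recursion blocks: the identity rows propagate "nothing placed yet" and "all terms placed", while the off-diagonal entries start and complete the two-site term.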

Canonical form
Considering Eq. (6), we can clearly insert resolutions of the identity X X^{−1} between any two MPS tensors M_j and M_{j+1}. Multiplying X into M_j and X^{−1} into M_{j+1} changes the numerical content of each tensor while keeping the state invariant. This gauge freedom can be exploited to increase the numerical stability of the algorithm and to simplify many tensor contractions. Two possible choices to fix the gauge are left- and right-normalized tensors.

Figure 4: Left- (right-) normalized tensor A_j (red, right-pointing triangle) (B_j (green, left-pointing triangle)) contracted with its adjoint, resulting in an identity.
Graphically, they are represented by red (left-normalized) or green (right-normalized) triangles (see Fig. 4), where the orientation of the triangles also indicates the normalization. Left-/right-canonical MPS are now defined by requiring that they consist of left-/right-normalized tensors only. Furthermore, a mixed-canonical MPS is defined by fixing a site j with unnormalized tensor M_j and demanding all site tensors to its left/right to be left-/right-normalized (see Fig. 5). The unnormalized site is often called the active site or orthogonality center.

Normalizing an MPS
Given an unnormalized MPS with site tensors M_j, the first step is often to bring it into a left- or right-canonical form. This can be done by a series of QR decompositions: the tensor M_j is reshaped into a matrix M̃ with the left and physical tensor legs forming the rows and the right tensor leg the columns of the matrix, that is M̃_{(σ_j m_{j−1}), m_j} = M^{σ_j}_{j; m_{j−1}, m_j}. Applying a QR decomposition to this matrix, one obtains two new matrices Q_j and R_j (we label tensors living on bonds (j, j+1) with an underlined index j). The reshaping operation on M_j is done in reverse on Q to give the new left-normalized site tensor A_j, while the transfer tensor R_j is multiplied into the next site tensor M_{j+1}. Likewise, reshaping M_j with right and physical legs as rows and the left tensor leg as columns results in a right-normalized tensor B_j (with the transfer tensor multiplied into the previous site tensor M_{j−1}).
Starting on the left edge of the system and subsequently performing left-normalizations on each MPS tensor results in a complete left-normalized state. Equivalently, starting on the right edge of the system and moving to the left results in a right-normalized state.
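The left-normalization sweep just described can be sketched as follows, assuming numpy and the (σ, left bond, right bond) index order used above (fusing the left bond and physical legs into the row index; the grouping order does not affect the normalization):

```python
import numpy as np

rng = np.random.default_rng(2)
L, d, m = 4, 2, 3
dims = [1, m, m, m, 1]
M = [rng.standard_normal((d, dims[j], dims[j + 1])) for j in range(L)]

for j in range(L - 1):
    s, ml, mr = M[j].shape
    # fuse (left bond, physical) into the row index and QR-decompose
    Mtilde = M[j].transpose(1, 0, 2).reshape(ml * s, mr)
    Q, R = np.linalg.qr(Mtilde)
    k = Q.shape[1]
    M[j] = Q.reshape(ml, s, k).transpose(1, 0, 2)   # new left-normalized A_j
    # absorb the transfer matrix R into the next site tensor
    M[j + 1] = np.einsum('ij,sjk->sik', R, M[j + 1])

# each updated tensor is now left-normalized: sum_sigma A^dagger A = identity
for j in range(L - 1):
    A = M[j]
    gram = np.einsum('sij,sik->jk', A, A)
    assert np.allclose(gram, np.eye(A.shape[2]))
```

The last site tensor collects the overall norm; continuing the sweep onto it and normalizing the final R would yield a fully left-canonical, normalized state.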

Truncating an MPS
Operations on matrix-product states typically increase the bond dimension of the state (e.g. MPO-MPS applications or the addition of two MPS). Finding an optimal approximation to such a quantum state at a smaller bond dimension is the purpose of this section. This is of particular relevance to time-evolution methods, as entanglement generically grows during real-time evolution, and time-evolved states hence inherently need a larger bond dimension than e.g. ground states. Finding good approximation methods is therefore crucial. Let us consider a state |ψ⟩ which is represented by an MPS with an initially large bond dimension m. We wish to find another state |ψ̃⟩ with smaller bond dimension m̃ which approximates |ψ⟩ well in the sense that it minimizes the Hilbert-space distance ‖ |ψ⟩ − |ψ̃⟩ ‖.
The most direct way to proceed is to use a series of singular value decompositions to successively truncate each bond of the MPS. On each individual bond, the optimal choice is made, but this does not have to result in the globally optimal state |ψ̃⟩. It is also possible to optimize each site tensor of |ψ̃⟩ sequentially to maximize the overlap between |ψ̃⟩ and |ψ⟩. Multiple sweeps of this variational optimization can be done to approach the global optimum as closely as possible.

Direct truncation via SVD
Consider a cut of the MPS on one bond (j, j+1) into a left and a right half, such that the state |ψ⟩ can be represented in effective left and right basis sets as

|ψ⟩ = ∑_{l,r=1}^{m_j} Ψ_{l,r} |l⟩_L ⊗ |r⟩_R .
The coefficient tensor Ψ_{l,r} then occupies the bond between sites j and j+1; such rank-2 tensors are called bond tensors in the following. For orthonormal left and right basis sets, as realized in an MPS with orthogonality center on bond (j, j+1), we can use a singular value decomposition (SVD, cf. Ref. [14] Sec. 4.5) of the Ψ tensor to obtain a locally optimal approximation. That is, we decompose Ψ = U S V†

Figure 7: Iterative truncation considering the truncated state to be in a mixed canonical form. The left-hand side can then be reduced to the new optimal site tensor M̃_j. The right-hand side, which needs to be considered completely, can nevertheless be calculated iteratively via the bond tensors L̃ and R̃.
with U and V† left-/right-unitary and S diagonal and non-negative. An approximation is then obtained by keeping only the m̃ largest singular values and correspondingly truncating the rows and columns of U and V†. The error of this approximation of |ψ⟩ is the square root of the sum of the discarded squared singular values; this sum is also known as the discarded weight. In practice, one selects m̃ such that some target discarded weight (e.g. 10^{−10}, corresponding to an error ≈ 10^{−5} · L) is obtained. Instead of working on the bond tensor Ψ_{l,r}, we can also apply the decomposition directly to the MPS tensor M_j by a suitable reshaping (cf. Figs. 6a and 6b). Starting e.g. on the left end of a right-normalized MPS, we can sweep left-to-right and sequentially truncate every bond, each time obtaining the locally optimal state. Due to this sweeping through the system, the truncation of site L−1 becomes dependent on the truncation of site 1, but not vice versa. In the case of small truncations, the error resulting from this asymmetry is small and can be ignored. To increase accuracy in the case of more severe truncations, a subsequent variational optimization of the state may be necessary.
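A minimal numpy sketch of the bond-tensor truncation, illustrating that the 2-norm error equals the square root of the discarded weight (the dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
Psi = rng.standard_normal((20, 20))
Psi /= np.linalg.norm(Psi)          # normalized bond tensor

U, S, Vh = np.linalg.svd(Psi, full_matrices=False)

m_trunc = 5                         # target bond dimension
discarded_weight = np.sum(S[m_trunc:] ** 2)
Psi_trunc = U[:, :m_trunc] @ np.diag(S[:m_trunc]) @ Vh[:m_trunc, :]

# the 2-norm distance equals sqrt(discarded weight) for this rank-m approximation
err = np.linalg.norm(Psi - Psi_trunc)
assert np.isclose(err, np.sqrt(discarded_weight))
```

Choosing m_trunc to meet a target discarded weight, as described above, directly controls this error bond by bond.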

Variational truncation
To overcome the problems of the direct truncation by SVD, an iterative sweeping mechanism is often employed. By sweeping multiple times through the system and finding the locally optimal tensor on each site, it is more likely that one obtains the globally optimal MPS. We start from an initial guess state |φ⟩ with tensors M̃_j and a chosen bond dimension m̃ and variationally minimize the distance to the untruncated state |ψ⟩ with tensors M_j. A good choice for the initial state is given by the direct truncation using the SVD. The distance to minimize is given in Eq. (18). We now keep all tensors but M̃_j fixed and hence only minimize over the single site tensor M̃_j, differentiating with respect to M̃_j. Because this tensor only occurs in the second half of Eq. (18), the new (optimized) M̃_j can be obtained directly. Let us now consider that the truncated state is in a mixed canonical form with the active site at position j.
Then the new tensor is obtained by contracting the left (right) parts of the tensor network (Fig. 7, right-hand side) into the bond tensors L̃ (R̃). It is never necessary to calculate the complete contraction of the boundary tensors in Eq. (21): the next tensor in sweep direction was already calculated in the preceding sweep, and the other tensor is obtained by reusing the one from the previous sweep step.
In practice, one should consider a two-site variational optimization in which the neighboring site tensors M̃_j, M̃_{j+1} are optimized at the same time. This allows for flexibility in the bond dimension and in the distribution of quantum-number sectors on the bond m_j. For convenience, we will demonstrate why a two-site update is necessary to permit an increased bond dimension. Consider the reshaped updated two-site tensor Θ. Any matrix factorization Θ = X · Y may then create an index with intermediate bond dimension bounded by min{m_{j−1} σ_j, σ_{j+1} m_{j+1}} and potentially larger than m_j, which is possible iff at least two MPS sites are considered at the same time.
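The bond-growth argument can be illustrated directly; the dimensions below are arbitrary choices:

```python
import numpy as np

# Splitting a two-site tensor Theta by SVD yields an intermediate bond of
# dimension min(m_left * sigma, sigma * m_right), which may exceed the old m_j.
rng = np.random.default_rng(4)
m_left, m_right, sigma, m_old = 4, 4, 2, 3

Theta = rng.standard_normal((m_left * sigma, sigma * m_right))
U, S, Vh = np.linalg.svd(Theta, full_matrices=False)

new_bond = S.size
assert new_bond == min(m_left * sigma, sigma * m_right)   # here 8 > m_old = 3
```

A single-site update, by contrast, factorizes a matrix whose smaller dimension is at most the existing m_j, so the bond can never grow.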

Finite temperatures
MPS by default can only represent pure quantum states. To describe the mixed states encountered at finite temperatures, one therefore requires an additional ingredient. Two possible choices exist within the MPS framework: minimally entangled typical thermal states [46] and thermofield double states [8,47,48]. In the following, we will concentrate on thermofield double states or "purifications" of a mixed state; a detailed comparison between both methods can be found in [49]. The purification works by doubling the Hilbert space H → H_p ⊗ H_a, where a denotes the auxiliary or ancilla degrees of freedom. The density matrix we wish to represent is simply the trace over the auxiliary space of our purified quantum state |ψ⟩ ∈ H_p ⊗ H_a:

ρ̂ = Tr_a |ψ⟩⟨ψ| .
At infinite temperature (β = 0), ρ̂ is the identity matrix. Hence, an infinite-temperature quantum state in the grand-canonical ensemble can be constructed simply as a product state of maximally entangled states of physical and auxiliary sites. To construct a canonical infinite-temperature state, it is best to start with a complete vacuum state. In the Hubbard model, for instance, with creators on physical (auxiliary) sites denoted by ĉ†_{j,σ} (ĉ†_{a(j),σ}), one applies a suitable particle-creation operator repeatedly to create equal superpositions of particles until the desired filling is reached [50]. To obtain a finite-temperature state at β, we time-evolve the infinite-temperature initial state along the imaginary axis over a range β/2. The density matrix is then

ρ̂ ∝ Tr_a { e^{−(β/2)Ĥ} |ψ⟩⟨ψ| e^{−(β/2)Ĥ} } ∝ e^{−βĤ}   (26)

up to normalization, which is taken care of by keeping the purified state |ψ⟩ normalized at all times. For a more detailed discussion, cf. Ref. [14] Sec. 7.2. The dynamics of this density operator is described by the von Neumann equation. In the case of the purification, this is identical to the Liouville equation, which has two copies of the Hamiltonian acting on the physical and the auxiliary space [51], but with a negative sign for the auxiliary degrees of freedom. Formally, this corresponds to a time evolution which is forward in time on the physical degrees of freedom but backward in time on the auxiliary space. Despite this formal resemblance, as one traces out the auxiliary degrees of freedom, a gauge freedom exists which allows applying any unitary operator V̂_a acting only on the auxiliary space to the state |ψ⟩. It is clear that this leaves the density matrix

ρ̂ = Tr_a |ψ⟩⟨ψ| = Tr_a { V̂_a |ψ⟩⟨ψ| V̂†_a }

invariant. The simplest choice then is to obtain the real-time evolution by time-evolving the purified state with the Hamiltonian on the physical space, leaving the auxiliary degrees of freedom untouched.
Moreover, the question arises whether there are unitaries acting on the auxiliary space which may reduce the entanglement of the purified state |ψ⟩; finding such an optimized scheme would be extremely important in practice. Multiple choices for such unitary operators have been worked out, whose effect on computational efficiency depends on the system at hand. For example, during a real-time evolution of the physical system under the Hamiltonian Ĥ, an easy and often helpful [52,53] choice is to evolve the auxiliary system under the Hamiltonian −Ĥ, which formally corresponds to the solution of the Liouville equation for the purified density operator as mentioned above. Finding further schemes is a topic of present research; for details and more elaborate approaches, we refer to the literature, e.g. Refs. [54,55]. Furthermore, cf. Sec. 7.5 later for a detailed discussion of how specifically one should purify a density matrix when aiming to evaluate a series of time-dependent observables.
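For a single spin-1/2 site, the grand-canonical infinite-temperature purification and its partial trace can be sketched as follows (numpy assumed):

```python
import numpy as np

# The beta = 0 purification of one physical site is the maximally entangled
# state of the physical and ancilla spins; tracing out the ancilla yields
# the (normalized) identity density matrix.
d = 2
psi = np.eye(d).reshape(d * d) / np.sqrt(d)   # |psi> = sum_s |s>_p |s>_a / sqrt(d)

rho = np.outer(psi, psi).reshape(d, d, d, d)  # indices (p, a, p', a')
rho_phys = np.einsum('saua->su', rho)         # trace over the ancilla index

assert np.allclose(rho_phys, np.eye(d) / d)   # identity at beta = 0
```

The full infinite-temperature MPS is then a product of such maximally entangled pairs, one per physical site.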

Application of an MPO to an MPS
One of the most important operations within the framework of matrix-product states is the application of an MPO to an MPS. In theory, this can be done by a straightforward tensor multiplication of the corresponding site tensors of the MPS and MPO. In practice, this is not the method of choice, as in most applications the resulting target state Ô|ψ⟩ has a much higher bond dimension m′ = m · w which, however, is mostly not required to represent the target state efficiently. It is therefore helpful to look at different approaches. Nevertheless, for pedagogical reasons we will begin with the direct application before we turn to more elaborate application schemes.

Direct application
The direct application of an MPO to an MPS is obtained by regrouping the contractions such that each target MPS tensor is given as a tensor product of the corresponding individual MPS and MPO site tensors.

Figure 8: The variational application of an MPO to an MPS, considering the truncated guess state |φ⟩ to be in a mixed canonical form. The left-hand side can then be reduced to the active site j that we want to optimize. The right-hand side, which needs to be evaluated completely, can nevertheless be calculated iteratively via the bond tensors L_{j−1} and R_{j+1}.
The resulting state |φ⟩ is therefore again an MPS, but with a larger bond dimension m′ = m · w, with w being the bond dimension of the MPO site tensors W_j. Repeated application of an operator onto a state in this manner hence quickly increases the bond dimension of the state, and truncation becomes necessary, with computational costs scaling as m³w³d per site if we do an SVD truncation or m̃²mwd per site for the variational compression.
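A sketch of the direct application on a single site, showing the fused bond dimension m · w (random tensors and the index orders are our assumptions, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
m, w, d = 3, 4, 2
M = rng.standard_normal((d, m, m))        # MPS tensor (sigma, m_left, m_right)
W = rng.standard_normal((w, w, d, d))     # MPO tensor (w_left, w_right, sigma', sigma)

# contract the physical index and fuse the (w, m) bond pairs on each side
N = np.einsum('abst,tij->saibj', W, M).reshape(d, w * m, w * m)
assert N.shape == (d, m * w, m * w)
```

Chaining these site-wise contractions over the whole chain yields the exact MPS for Ô|ψ⟩ at bond dimension m · w, which is then truncated.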

Variational application
In the spirit of variationally compressing a state towards a target state, we can apply the same considerations to the application of an MPO to a source state |ψ⟩, combined with the subsequent compression of the state, in one optimization step. We therefore seek to minimize the distance between a guess state |φ⟩ with tensors M̃_j and bond dimension m̃ and the source state with the MPO applied to it, which we denote as |Ôψ⟩, over all guess site tensors M̃_j. If we keep the current guess state in a mixed canonical form, the above set of coupled equations again reduces to a local update scheme for the target tensors, where the boundary tensors L_{j−1}, R_{j+1} can be built recursively by sweeping through the system and evaluating the corresponding contractions. The overall on-site contractions are depicted in Fig. 8. Care must be taken to use the (typically) optimal contraction order.

Figure 9: The essential steps of the zip-up method proposed in Ref. [46]. (left) The initial tensor network that shall be contracted, consisting of a right-canonical MPS and an MPO that only slightly destroys the canonical form. The first step is to interpret the combination of the first MPS tensor and the first MPO tensor as a new tensor M^{σ_1}_{1; w_0 m_0, w_1 m_1}, which is slightly non-normalized and therefore framed in blue. (right) The next step is to apply an SVD (including a relaxed truncation) to the tensor M_1 and finally build the next tensor M_2, leaving the contracted, left-normalized MPS tensor A^{σ_1}_{1; m_0 w_0, s_1} on the left side.

The zip-up method
An alternative to the direct application of an MPO to an MPS is the zip-up method described in Ref. [46]. The central assumption is that the operator, such as a time-evolution operator, only slightly destroys the canonical form of the MPS. Hence, a modest truncation is already possible during the contraction process without too much loss of information because there is only a small loss of orthogonality in the used left and right basis sets.
The first step is similar to Eq. (29), but we work with a right-normalized initial tensor B_1. Note that m_0 and w_0 are dummy indices and can therefore easily be fused into a new dummy index. Applying an SVD with a relaxed truncation criterion, we obtain the left-normalized tensor A_1 with a single right index m_1. In the next step, the remaining parts of the result of the SVD are incorporated as before, now also including the next MPO tensor, to obtain the next slightly unnormalized tensor M_2. This procedure is depicted in Fig. 9. It is repeated until the right end of the system is reached and hence the complete operator has been applied. For the relaxed truncation scheme, a maximal growth factor of the MPS bond dimension m′ = 2m and a truncated weight of 1/10 of the target weight turn out to be suitable choices. The contraction then has complexity 2m³wσ + 2m²w²σ². The main cost is in the SVD of a 2mσ × mw matrix at cost O(m³σw), i.e. linear in w. The MPS is now in left-canonical form. A subsequent compression as described in Sec. 2.6 should be applied to obtain the resulting MPS with the target bond dimension.
Our experience. The zip-up method (Sec. 2.8.3) is typically sufficiently accurate and fast. In some cases, it is worthwhile to follow it up with some additional sweeps of the variational optimization (Sec. 2.8.2) to increase accuracy. In particular, if the bond dimension of the state is already large and the operator close to the identity (as would be the case for a small time step), the variational optimization will converge quickly and accurately. It hence may be useful to also consider the MPS bond dimension when dynamically selecting the best application method.

Table 1: Computational costs of common MPS operations, split into matrix multiplications, QR decompositions and SVDs, for initial bond dimension m, target bond dimension n, MPO bond dimension w and local dimension σ.

Operation | Matrix multiplications | QR | SVD
Overlap of two MPS | mn²σ + m²nσ | none | none
Expectation value of MPO between two MPS | mn²σw + mnσ²w² + m²nσw | none | none
Variational truncation of MPS from m to n; single-site; one optimization sweep | 2mn²σ + 2m²nσ + n³σ | n³σ | none
Variational truncation of MPS from m to n; two-site; one optimization sweep | 2mn²σ + 2m²nσ + mn²σ² + n³σ | none | n³σ³
Direct truncation of MPS from m to n with SVD (incl. initial normalization) | m³ + m³σ + mn² + m²nσ | m³σ | min{m²nσ, mn²σ²}
Direct MPO-MPS application and subsequent direct truncation with SVD to bond dimension n | m²σ²w² + m³w³ + m³σw³ + mn²w + m²nσw² | m³σw³ | mn²σ²w
Zip-up MPO-MPS application with target bond dimension n and intermediate bond dimension 2n (incl. initial normalization; second line is the final truncation sweep from 2n to n) | m³ + m³σ + 2m²nσw + 2mnσ²w² + 4mn²w; +4n³σ + 2n³ | m³σ | 4σ²n²mw; +4n³σ
Variational MPO-MPS application with target bond dimension n; single-site; one optimization sweep | 2mn²σw + 2mnσ²w² + 2m²nσw + n³σ | n³σ | none
Variational MPO-MPS application with target bond dimension n; two-site; one optimization sweep | mn²σw + 3mnσ²w² + 3m²nσw + mn²σ²w + n³ + n³σ | none | n³σ³
Conversely, when using the zip-up method, repeated truncation sweeps according to a defined singular value threshold should be used to obtain the most efficient representation of the state. Further interesting possibilities may be the truncation based on the left/right density matrix [56] or by optimizing the local density matrix [57] which aims at preserving local observables.

Expectation values
In standard dense numerical linear algebra, to evaluate the expectation value ⟨φ|Ô|ψ⟩, it is necessary to evaluate Ô|ψ⟩ and subsequently the overlap between ⟨φ| and Ô|ψ⟩. As we have seen in the previous section, MPO-MPS products are relatively costly to evaluate. Luckily, the tensor network representing ⟨φ|Ô|ψ⟩ (cf. Fig. 10) allows many different contraction orders. Contracting it from left to right (using iteratively updated tensors L_j as defined in Eq. (33)) or from right to left (using R_j as defined in Eq. (34)) leads to an asymptotic contraction cost of O(L(m³wσ + m²w²σ²)). The optimal contraction sequence is also indicated by the shading in Fig. 10.
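The left-to-right contraction with iteratively updated boundary tensors can be sketched as follows (numpy assumed; random tensors stand in for actual states and operators, and the index orders are our convention):

```python
import numpy as np

rng = np.random.default_rng(7)
Lsites, d, m, w = 4, 2, 3, 3
dims = [1, m, m, m, 1]
M = [rng.standard_normal((d, dims[j], dims[j + 1])) for j in range(Lsites)]  # |psi>
N = [rng.standard_normal((d, dims[j], dims[j + 1])) for j in range(Lsites)]  # |phi>
wd = [1, w, w, w, 1]
W = [rng.standard_normal((wd[j], wd[j + 1], d, d)) for j in range(Lsites)]   # MPO

# boundary tensor Lb with indices (bra bond, MPO bond, ket bond)
Lb = np.ones((1, 1, 1))
for j in range(Lsites):
    Lb = np.einsum('abc,tcj->abtj', Lb, M[j])            # absorb the ket tensor
    Lb = np.einsum('abtj,bdst->adsj', Lb, W[j])          # then the MPO tensor
    Lb = np.einsum('adsj,sai->idj', Lb, N[j].conj())     # then the conjugated bra

value = Lb.reshape(-1)[0]     # scalar <phi|O|psi>: all remaining bonds are dummies
```

Absorbing one column at a time in this order keeps every intermediate at most rank-4, which is what produces the quoted O(L(m³wσ + m²w²σ²)) cost.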

Overview of time-evolution methods
In the following sections, we will discuss in detail five different time-evolution methods for MPS which are currently in use to solve the time-dependent Schrödinger equation (TDSE). Each of them has different strengths and weaknesses, requiring a careful consideration of each individual problem to determine the most promising approach. In Sec. 8 we then examine some prototypical examples in an attempt to provide some guidance.
The first two methods (Sec. 4) approximate the time-evolution operator Û(δ) = e^{−iδĤ}, which is then repeatedly applied to the MPS |ψ(t)⟩ to yield time-evolved states |ψ(t+δ)⟩, |ψ(t+2δ)⟩ etc.

Figure 10: The expectation value ⟨φ|Ô|ψ⟩ of an operator between two (possibly different) states, represented as MPO and MPS respectively. The optimal contraction order is sideways, e.g. from left to right, as indicated by the shading. An easy option for two-fold parallelization is the concurrent evaluation of L_{j−1} and R_j from left and right respectively. Here and in all following diagrams, we leave off the dummy left/right indices indicated by dotted lines earlier.

The time-evolving block decimation, also known as the Trotter method and abbreviated here as TEBD, was developed around
2004 both for MPS [4,7,8] and the original classical DMRG [5,6]. It uses a Trotter-Suzuki decomposition of Û(δ), which is in particular applicable to short-ranged Hamiltonians. The time evolution is unitary up to the inherent truncation error, but the energy is typically not conserved. The main variants use a second-order decomposition (TEBD2 in the following) and a fourth-order decomposition (TEBD4) to minimise the error due to the finite time step. While TEBD2 is essentially always useful, as it produces only two to three times as many exponential terms as a first-order decomposition, TEBD4 produces four to five times as many exponentials as TEBD2. Depending on the desired error, this may or may not be worthwhile. In contrast, the MPO W I,II method was introduced only recently [32] and relies on a different decomposition scheme for the time-evolution operator Û(δ), particularly suited to construct an efficient representation as a matrix-product operator. It can directly deal with long-range interactions and generally generates smaller MPOs than TEBD. Its primary downside is that the evolution is not exactly unitary. The time-step error of both TEBD and the MPO W I,II method is larger than in the other methods described below.
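The time-step scaling of the symmetric (second-order) Trotter-Suzuki splitting can be checked on small dense matrices; the matrices below are arbitrary Hermitian examples, not a physical Hamiltonian:

```python
import numpy as np

def expm_h(H, t):
    """exp(-i t H) for Hermitian H via its eigendecomposition."""
    lam, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * lam)) @ V.conj().T

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)); A = A + A.T
B = rng.standard_normal((4, 4)); B = B + B.T

# symmetric splitting exp(-i d (A+B)) ~ exp(-i d A/2) exp(-i d B) exp(-i d A/2)
errs = []
for dt in (0.02, 0.01):
    exact = expm_h(A + B, dt)
    trotter2 = expm_h(A, dt / 2) @ expm_h(B, dt) @ expm_h(A, dt / 2)
    errs.append(np.linalg.norm(exact - trotter2))

# halving the step reduces the one-step error by about 2^3 = 8 (third-order
# local error of the second-order scheme)
assert 6 < errs[0] / errs[1] < 10
```

The same cubic one-step error, accumulated over 1/δ steps, gives the quadratic global error that the name "second-order" refers to.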
Compared to these methods, algorithms based on Krylov subspaces [58,59] directly approximate the action of Û(δ) on |ψ⟩, i.e., they produce a state |ψ(t + δ)⟩ without explicitly constructing Û(δ) in the full Hilbert space, while preserving the unitarity of the time evolution. The main advantage lies in a very precise evaluation of |ψ(t + δ)⟩ with a very small inherent finite time-step error [60].
The global Krylov algorithm (Sec. 5) is related to the time-dependent Lanczos method in exact diagonalization approaches to time evolution [16][17][18] and has only partially [19][20][21] been adapted to the MPS setting. Interestingly, the evaluation of observables on a very fine time grid (e.g., δ = 10^{−5}) is possible, which would be prohibitively expensive with any of the time-stepping methods. The downside of this global Krylov method is its need to represent potentially highly-entangled Krylov vectors as matrix-product states.
These highly-entangled global Krylov vectors do not need to be represented if, instead of working globally, one works in the local reduced basis (Sec. 6). From a DMRG perspective, this can be seen as a variant of the time-step targeting method [24][25][26][27]. Its primary objective is to solve the TDSE locally on each pair of lattice sites to construct an optimal MPS for the time-evolved state. Conversely, this local method can no longer evaluate observables precisely at arbitrarily small intermediate time steps δ′ < δ (as no complete MPS is available at such intermediate times) but works much like TEBD2 as a time-stepper, producing a single final state |ψ(t + δ)⟩. In contrast to TEBD and the MPO W^{I,II} method, it allows for the treatment of arbitrary Hamiltonians. An uncontrolled projection error may, however, lead to incorrect results, as the MPS-projected Hamiltonian cannot represent the actual physical Hamiltonian well if the MPS used for the projection has only a small bond dimension. A further development of this approach is the time-dependent variational principle [28,29] (TDVP). The TDVP can be considered an approach to remedy the dominant errors in the local Krylov approach by a thorough theoretical analysis leading to an optimized evolution method. Its implementation is nearly identical to the local Krylov method, but the different derivation of the local equations leads to smaller errors because it arrives at a closed solution of a series of coupled equations. In particular, using the two-site variant of TDVP (2TDVP), nearest-neighbor Hamiltonians do not incur a projection error, which is often the primary error source in the local methods. The single-site variant (1TDVP) has a larger projection error and always works at constant MPS bond dimension, but it violates neither unitarity nor energy conservation during the time evolution.
We complement this part of the review by presenting a subjective selection of useful tricks which are applicable to most of the time-evolution methods and which can help in the treatment of complicated systems. We will discuss in some detail: (i) how to combine the time evolution in the Heisenberg and the Schrödinger picture, respectively, to reach longer times; (ii) how to select time steps to increase the accuracy of the integrator; (iii) how removing low-lying eigenstates and the application of linear prediction helps in calculating Green's functions; (iv) how to specifically select the optimal choice of purification schemes for finite-temperature calculations; and (v) briefly summarize the local basis optimization technique to treat systems with many bosonic degrees of freedom.

Approximations to Û(δ)
The following two subsections will summarize two popular choices to approximate the time-evolution operator Û(δ) which, when applied to an MPS |ψ(t)⟩, gives a new MPS |ψ(t + δ)⟩ at time t + δ. Future developments to construct such Û(δ) in an easier and more generic manner are conceivable.

Time-evolving block decimation (TEBD) or Trotter decomposition
At its heart, this method relies on a Trotter-Suzuki [31] decomposition and subsequent approximation of the time-evolution operator Û_exact(δ). It was developed around 2004 both in the (then new) context of MPS [4,8] and the classical system-environment DMRG [5,6] and remained the most popular time-evolution method for both settings in the following years. As an illustrative example, let us first consider the nearest-neighbor Heisenberg chain. Its Hamiltonian is given by

    Ĥ = Σ_{j=1}^{L−1} ĥ_{j,j+1}

with local bond terms ĥ_{j,j+1}. The exact evolution is then given by Û_exact(δ) = e^{−iδĤ}. If we split Ĥ into two summands Ĥ_even and Ĥ_odd as

    Ĥ_even = Σ_{j even} ĥ_{j,j+1},   Ĥ_odd = Σ_{j odd} ĥ_{j,j+1},

we can use the Baker-Campbell-Hausdorff formula to approximate

    Û_exact(δ) ≈ e^{−iδĤ_even} e^{−iδĤ_odd} ≡ Û_TEBD1(δ). (46)

Now, both e^{−iδĤ_even} and e^{−iδĤ_odd} are easy to evaluate: all the summands ĥ_{j,j+1} within each group commute with each other, so it suffices to exponentiate them individually. Writing e^{−iδĥ_{j,j+1}} as an MPO, we have identity tensors with bond dimension w = 1 everywhere except on sites j and j+1. Splitting the two-site gate into two individual tensors (e.g. with an SVD) results in a bond between those sites which has at most dimension σ². Hence writing e^{−iδĤ_even} as an MPO is also efficient; it has alternating bond dimensions σ² and 1. Accordingly, e^{−iδĤ_odd} written as an MPO has alternating bond dimensions 1 and σ², and their product e^{−iδĤ_even} e^{−iδĤ_odd} can be written with bond dimensions σ² throughout. If we keep the decomposition of Ĥ into Ĥ_even and Ĥ_odd fixed and hence the commutator [Ĥ_even, Ĥ_odd] constant, the error per time step δ between Û_TEBD1(δ) and Û_exact(δ) is of second order,

    Û_TEBD1(δ) = Û_exact(δ) + O(δ²),

because we can approximate e^{−iδ(Ĥ_even+Ĥ_odd)} = e^{−iδĤ_even} e^{−iδĤ_odd} + O(δ² [Ĥ_even, Ĥ_odd]). Considering a longer interval T, which we divide into T/δ smaller intervals at which we apply Û_TEBD1(δ), the errors accumulate to (T/δ) O(δ²) = O(δ) per time interval T. The error is hence of first order in δ, which gives Û_TEBD1(δ) its name as first-order TEBD.
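The first-order error scaling of Eq. (46) is easy to verify numerically on a small chain, where all operators fit into dense matrices. The following sketch (our illustration, not part of the original algorithm; sites are numbered from 0, and the helper names are ours) builds Ĥ_even and Ĥ_odd for an L = 6 Heisenberg chain and confirms that the per-step error of Û_TEBD1 is quadratic in δ:

```python
import numpy as np

def expmH(H, tau):
    """Matrix exponential e^{tau*H} for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(tau * w)) @ V.conj().T

# Spin-1/2 operators and the two-site Heisenberg bond term h = s.s
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
h2 = sum(np.kron(s, s) for s in (sx, sy, sz))

def embed(op, j, L):
    """Embed a two-site operator acting on sites (j, j+1) into an L-site chain."""
    return np.kron(np.kron(np.eye(2**j), op), np.eye(2**(L - j - 2)))

L = 6
H_even = sum(embed(h2, j, L) for j in range(0, L - 1, 2))
H_odd = sum(embed(h2, j, L) for j in range(1, L - 1, 2))

def trotter1_error(delta):
    """Spectral-norm distance between U_TEBD1(delta) and U_exact(delta)."""
    U_exact = expmH(H_even + H_odd, -1j * delta)
    U_t1 = expmH(H_even, -1j * delta) @ expmH(H_odd, -1j * delta)
    return np.linalg.norm(U_t1 - U_exact, 2)

# Per-step error is O(delta^2): halving delta reduces it by roughly a factor 4
print(trotter1_error(0.1) / trotter1_error(0.05))
```

The ratio printed at the end approaches 4 as δ shrinks, exactly the signature of a per-step error of second order in δ.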
By symmetrizing the decomposition, it is straightforward to construct

    Û_TEBD2(δ) = e^{−iδ/2 Ĥ_even} e^{−iδ Ĥ_odd} e^{−iδ/2 Ĥ_even},

which has a third-order error per step,

    Û_TEBD2(δ) = Û_exact(δ) + O(δ³),

and hence a second-order error per time interval. Similarly, we can construct a fourth-order TEBD evolution operator as

    Û_TEBD4(δ) = Û_TEBD2(δ₁) Û_TEBD2(δ₁) Û_TEBD2(δ₂) Û_TEBD2(δ₁) Û_TEBD2(δ₁)

with

    δ₁ = δ/(4 − 4^{1/3}),   δ₂ = δ − 4δ₁.

We then have Û_TEBD4(δ) = Û_exact(δ) + O(δ⁵) and hence a fourth-order error per time interval.

Figure 11: Exemplary structure of a tensor network representing a time-evolved state Û_TEBD2(δ)|ψ⟩ of 12 sites. The state is written as an MPS |ψ⟩ at the top and evolved using TEBD2 under the Hamiltonian Ĥ = Σ_{j=1}^{10} ŝ_j · (ŝ_{j+1} × ŝ_{j+2}) + h.c.. The three-site action of the Hamiltonian necessitates the decomposition into three sums Ĥ_{1,2,3}. To achieve a second-order error, the evolution is symmetrized by applying the operators Û_{1,2,3} in reverse order after each half-step. The individual time-evolution operators written as MPOs have two thick bonds alternating with a one-dimensional dummy bond.

This approach can be generalized to more complicated divisions of the full Hamiltonian Ĥ into N_H internally-commuting parts Ĥ_1, Ĥ_2, …,

    Ĥ = Σ_{α=1}^{N_H} Ĥ_α,   Ĥ_α = Σ_k ĥ^k_α,
such that each ĥ^k_α can be diagonalized efficiently and [ĥ^k_α, ĥ^l_α] = 0. The local terms ĥ^k_α do not need to be limited to two sites, but it must still be possible to evaluate their exponential efficiently. A first-order TEBD time-evolution operator Û_TEBD1(δ) can then be written as

    Û_TEBD1(δ) = Û_1(δ) Û_2(δ) · · · Û_{N_H}(δ),   Û_α(δ) = e^{−iδĤ_α}.

The error per time interval is still of first order as before. The second-order decomposition is likewise obtained as a symmetrization of Û_TEBD1(δ):

    Û_TEBD2(δ) = Û_1(δ/2) · · · Û_{N_H}(δ/2) Û_{N_H}(δ/2) · · · Û_1(δ/2).

It has the same second-order error per time interval as above; an exemplary tensor network for the case of a three-site interaction is given in Fig. 11. A fourth-order decomposition could be constructed, but the number of required terms grows very quickly with the number of summands N_H in the Hamiltonian, making the fourth-order decomposition less attractive. Recently, Ref. [61] investigated optimized decomposition schemes beyond the naive approaches presented here. These schemes have the potential to further reduce errors and the number of exponentials required, but they still need to be tested in practice.

Errors
TEBD suffers from two errors, both of which are controllable and can be estimated fairly straightforwardly. The first is the time-step error of order O(δ²) per time step for first-order TEBD and O(δ³) per time step for second-order TEBD. With a fixed total interval T divided into N = T/δ steps, the error over the entire interval is O(δ) and O(δ²), respectively, for first- and second-order TEBD. This time-step-inherent error does not violate unitarity of a real-time evolution, as each of the constructed operators e^{−iδĤ_α} is unitary. However, the energy and other conserved quantities are not necessarily constant if the time-step error is large. The primary option to reduce this discretization error is to choose a smaller time-step size δ or a higher-order decomposition like TEBD4. It is not always optimal to use TEBD4, as higher-order decompositions typically result in more MPOs which have to be applied to the state sequentially to do a single time step. For example, if a Trotter error of 10^{−2} per unit time is desired, TEBD1 needs a step size δ = 0.01 and accordingly 100 steps to reach time t = 1. For the same error, TEBD2 needs a step size δ = 0.1 and hence 10 steps to reach t = 1, while TEBD4 could work with a step size of √0.1 ≈ 0.3 and hence approximately three steps. However, where TEBD1 (e.g., for N_H = 3) needs three exponentials e^{−iδĤ_1} e^{−iδĤ_2} e^{−iδĤ_3}, TEBD2 has to apply five exponentials e^{−iδ/2 Ĥ_1} e^{−iδ/2 Ĥ_2} e^{−iδĤ_3} e^{−iδ/2 Ĥ_2} e^{−iδ/2 Ĥ_1} (or four, if we combine the e^{−iδ/2 Ĥ_1} of neighboring steps), and TEBD4 has to work with 21 (20 when combining neighboring steps) exponentials. That is, while TEBD4 only needs three times fewer steps than TEBD2 for a final error of 10^{−2}, each individual step needs approximately five times as much work.
In comparison, if we target an error of 10^{−8}, TEBD1 needs 10^8 steps, TEBD2 needs 10^4 steps and TEBD4 only needs 10^2 steps: for doing five times as much work per step, we reduce the number of steps by a factor of 100 when going from TEBD2 to TEBD4.
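The step counts quoted in this and the preceding paragraph follow from the scaling ε ≈ δ^n of the error per unit time for an order-n stepper. A throwaway helper (ours, purely illustrative) reproduces the numbers:

```python
def steps_per_unit_time(eps, order):
    """Steps needed to evolve one unit of time, assuming the error per
    unit time scales as delta**order and should not exceed eps."""
    delta = eps ** (1.0 / order)
    return round(1.0 / delta)

# Target error 1e-2 per unit time: TEBD1, TEBD2, TEBD4 need 100, 10 and ~3 steps
print([steps_per_unit_time(1e-2, n) for n in (1, 2, 4)])
# Target error 1e-8: the counts become 1e8, 1e4 and 100
print([steps_per_unit_time(1e-8, n) for n in (1, 2, 4)])
```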
As an alternative to such higher-order decompositions, one may combine a large-scale Trotter decomposition with a small-scale Krylov-based approach to make the commutators of the Hamiltonian summands Ĥ_α smaller [62].
The second source of errors is due to the mandatory truncation of the time-evolved state to a realistic bond dimension. This truncation error can also be controlled and measured by the discarded weight during the truncation as usual. It affects both the unitarity of the evolution and the conserved quantities, but can be made smaller at comparatively little cost by allowing a larger bond dimension of the MPS.

Long-range interactions
Long-range terms in the Hamiltonian can be treated by introducing swap gates [46] in order to exponentiate each individual contribution. Each swap gate exchanges the physical indices of two nearest neighbors, hence moving the interacting sites next to each other. The evolution term e^{−iδĥ^k_α} is then sandwiched between two series of swap gates which first bring the sites on which ĥ^k_α acts next to each other and then, after the evolution, move them back to their original places. A smart ordering of the individual evolution terms often allows one to cancel two series of swap gates from different ĥ^k_α and ĥ^l_α against each other. Nearly the entire overhead of such swap gates can be removed in this way.
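The mechanism is easiest to see in dense form. The sketch below (a three-site toy example of ours, not MPS code) shows that conjugating a nearest-neighbor gate with a swap gate reproduces the evolution under a next-nearest-neighbor term ŝ^z_1 ŝ^z_3 exactly:

```python
import numpy as np

def expmH(H, tau):
    """e^{tau*H} for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(tau * w)) @ V.conj().T

sz = np.diag([0.5, -0.5])
I2 = np.eye(2)
# SWAP exchanges the physical indices of two neighboring sites
SWAP = np.array([[1., 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])

delta = 0.1
h_13 = np.kron(np.kron(sz, I2), sz)   # long-range term s^z_1 s^z_3
h_12 = np.kron(np.kron(sz, sz), I2)   # its nearest-neighbor counterpart

# Swap sites 2 and 3, evolve the now-adjacent pair (1,2), swap back
S23 = np.kron(I2, SWAP)
U_swapped = S23 @ expmH(h_12, -1j * delta) @ S23
U_direct = expmH(h_13, -1j * delta)
print(np.allclose(U_swapped, U_direct))  # True
```

Because the swap gate is its own inverse, the conjugation S₂₃ e^{−iδĥ_{1,2}} S₂₃ equals e^{−iδ S₂₃ ĥ_{1,2} S₂₃} = e^{−iδĥ_{1,3}}, which is exactly the cancellation structure exploited above.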

Algorithm
The two steps of the TEBD method (constructing the evolution operator and repeatedly applying it to the state) are described in Alg. 1, using the second-order TEBD2 method as an example.
To construct the time-evolution operator, one first needs to split the Hamiltonian into internally-commuting parts Ĥ_α (cf. Eq. (58)). Each summand ĥ^k_α of each part is then individually exponentiated as exp(−iδĥ^k_α). If the summand is non-local, swap gates are introduced before and after the exponentiation such that the exponentiation works on nearest-neighbor sites. One then obtains a number of MPO component tensors (one for each site between the leftmost and rightmost sites on which ĥ^k_α acts). These tensors are placed within an MPO spanning the entire system to eventually yield Û_α. Once all Û_α are constructed (in a specific, selected ordering), one has a representation of the first-order decomposition of Û. Going from this first-order decomposition to a second-order decomposition is straightforward: one only has to first replace δ by δ/2 in the initial exponentiation and second append the reversed list of MPOs to itself, i.e., store a list {Û_1, Û_2, Û_3, Û_3, Û_2, Û_1}. This new Û is then symmetric; each half advances time by δ/2 for a total step of size δ.
To then time-evolve the state, one only has to applyÛ repeatedly to it, each time advancing the state by a step δ in time. In practice, we found it useful to write the time-evolved state |ψ(t) to disk after every step. In this way, the time evolution can proceed as quickly as possible and evaluations (e.g. for expectation values, overlaps with other states etc.) may proceed in parallel by using the generated state files, possibly even on separate computers.
An interesting avenue to parallelization [63] in TEBD opens up if, instead of working in the MPO framework, all n-site gates are kept separate. For very small time steps, gates should not affect singular value spectra on far-away bonds. It should hence be possible to apply many gates independently and also truncate many bonds independently (using a slightly different gauge not discussed here). While not yet extensively utilized in the literature, such real-space parallelization promises much larger speed-ups than local (e.g. BLAS) parallelization options if the associated errors are kept sufficiently small.
Splitting or combining Û(δ). It may provide an additional computational advantage to combine multiple MPOs representing Û_α(δ) into one larger MPO representing Û(δ). For example, in the case of the nearest-neighbor Heisenberg model, one naively arrives at two MPOs Û_even and Û_odd, the exponentials of Ĥ_even and Ĥ_odd respectively. These MPOs have alternating bond dimensions 1-4-1-4-… and 4-1-4-1-…. Multiplying both MPOs together, one obtains a single new MPO with bond dimension 4 throughout the system. Applying the new MPO only requires a single sweep over the system, whereas applying the two original MPOs requires two sequential sweeps. While the computational effort to leading order is identical, the former option is much more cache-friendly, requiring access to each individual site tensor from RAM (or even a hard disk drive) only once instead of twice.
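The bond-dimension bookkeeping of such an MPO-MPO product can be sketched with plain numpy arrays. The index convention (w_l, w_r, σ_out, σ_in) and the random placeholder tensors are our choices, not actual evolution gates; the point is only that the product's bond dimensions are the element-wise products of the factors' bond dimensions:

```python
import numpy as np

def mpo_product_site(a, b):
    """Site tensor of the product MPO A*B (B acts on the state first).
    Index order of all tensors: (wl, wr, s_out, s_in)."""
    # contract A's input physical leg with B's output physical leg
    t = np.einsum('xcsu,bdut->xbcdst', a, b)
    return t.reshape(a.shape[0] * b.shape[0], a.shape[1] * b.shape[1],
                     a.shape[2], b.shape[3])

d = 2  # physical dimension
rng = np.random.default_rng(0)
site = lambda wl, wr: rng.standard_normal((wl, wr, d, d))

# Three-site toy chain: U_even has internal bond dims (4, 1), U_odd has (1, 4)
U_even = [site(1, 4), site(4, 1), site(1, 1)]
U_odd  = [site(1, 1), site(1, 4), site(4, 1)]
U = [mpo_product_site(a, b) for a, b in zip(U_even, U_odd)]
print([t.shape[:2] for t in U])  # [(1, 4), (4, 4), (4, 1)]
```

The alternating dimensions 4 and 1 indeed combine to bond dimension 4 throughout, as stated above.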
Similarly, if the Hamiltonian needs to be split into multiple terms Ĥ_α, it is potentially helpful to first multiply the MPOs for the Û_α and then apply the resulting Û to the MPS. In general, applying many Û_α with very small bond dimensions leads to a large overhead, whereas Û_α with very large bond dimensions are more costly to multiply with states. The optimal trade-off depends on the problem at hand. As a rule of thumb, one should multiply two time-evolution operators if this does not increase the maximal bond dimension (as in the example above). If the bond dimension of the time-evolution operators is very small, it might also make sense to combine them until a comparatively large bond dimension (e.g. w ≈ 10) is obtained, which reduces the overhead associated with many small matrix multiplications.

Algorithm 1
The second-order TEBD2 method. The main input is an analytic expression for the Hamiltonian Ĥ, necessary to split it into its constituent internally-commuting parts. Additionally, one needs to select an application method apply to evaluate MPO-MPS products (including truncation as required). To run, one first calls Prepare-U(δ) and then repeatedly Timestep to generate the time-evolved states.

    Multiply pairs of constituents Û_α, Û_{α+1} of Û if possible within the bond dimensions
    return Û
end procedure
procedure Timestep(list of MPOs Û, initial state |ψ⟩, application method apply)
    for each term Û_α in Û do
        |ψ⟩ ← apply(Û_α, |ψ⟩)
    end for
    return |ψ⟩
end procedure

The MPO W^{I,II} method

An alternative to the Trotterized time evolution is a direct approximation of the time-evolution operator e^{−iδĤ}, which can be implemented efficiently using MPOs. In this scheme, the error per site is independent of the system size. Furthermore, the construction is capable of dealing with long-ranged interaction terms, making it suitable for two-dimensional systems, too. In this section, we will sketch the derivation of the MPO representations W^{I,II} and discuss how to construct these operators as well as the numerical errors and the stability of the overall method.

Motivation and construction
The general idea is to exploit the intrinsic factorization of MPO representations of operators. We consider operators which have a local structure in the sense that they are given by a sum of terms Ĥ_j acting on a subset of the lattice, starting at site j:

    Ĥ = Σ_j Ĥ_j.

The Euler approximation of the operator exponential is

    e^{−iδĤ} = 1 − iδ Σ_j Ĥ_j + O(δ²),

and since there are ∼ L² second-order contributions from all possible combinations of local terms with finite support, the error per site is ∼ Lδ²; hence the approximation becomes more and more unstable with increasing system size. Ref. [32] introduced the following local version of the Euler stepper,

    Û^I(δ) = 1 − iδ Σ_j Ĥ_j + (−iδ)² Σ′_{j,k} Ĥ_j Ĥ_k + · · ·,

where the primed sum indicates that the local operator terms Ĥ_j, Ĥ_k do not act on a common subset of the lattice, i.e., they do not overlap. Even though the error of Û^I is still of order δ² in the step size, there are only O(L) contributions which are missed, namely those combinations of local terms with overlapping support. Hence, the overall error is bounded by O(Lδ²), and thus the error per site is constant in the system size.
Recall now the decomposition of an MPO into a left, right and local part with N_j interaction terms crossing bond j,

    Ĥ = Ĥ^L_j ⊗ 1̂^R_j + 1̂^L_j ⊗ Ĥ^R_j + Σ_{a_j=1}^{N_j} ĥ^L_{j,a_j} ⊗ ĥ^R_{j,a_j},

or equivalently the block form of the MPO site tensors,

    Ŵ_j = ( 1̂   Ĉ_j   D̂_j
            0   Â_j   B̂_j
            0   0     1̂  ).

The operator-valued matrices Â_j, B̂_j, Ĉ_j, D̂_j specify the local structure of the interactions described by Ĥ, and the set of all matrices Ŵ_1 … Ŵ_L defines the MPO representation of Ĥ. The MPO representation W^I of Û^I is given by

    Ŵ^I_j = ( 1̂ + τD̂_j   √τ Ĉ_j
              √τ B̂_j      Â_j  ),   τ = −iδ.

In this case the representation of Ĥ as a finite state machine is particularly useful since it permits one to directly deduce the MPO representation for W^I. The MPO bond dimension of W^I is w − 1, with w the bond dimension of Ĥ. Hence, this MPO can be applied numerically very efficiently. However, the restriction of W^I to treat only non-overlapping local operator terms Ĥ_j is very strong and fails to even reproduce the correct time evolution of a purely on-site Hamiltonian. For example, W^I for Ĥ = Σ_j ŝ^z_j generates only operator strings ŝ^z_k · · · ŝ^z_{k+n} on distinct sites but not powers such as (ŝ^z_k)². An improvement is to permit operator strings that overlap on one site:

    Û^II(δ) = 1 − iδ Σ_j Ĥ_j + (−iδ)² Σ″_{j,k} Ĥ_j Ĥ_k + · · ·,

where the double-primed sum only excludes terms Ĥ_j, Ĥ_k overlapping at more than one site, i.e., sharing a bond. For instance, in the expansion of the time-evolution operator for the S = 1/2 Heisenberg chain, contributions in which the same bond term appears twice, such as (ŝ_j · ŝ_{j+1})(ŝ_j · ŝ_{j+1}), are discarded. Hence, the error is again of order δ², but contributions with arbitrary powers of single-site terms are treated exactly. There is no closed general MPO representation for Û^II, but we can give an approximation which has an error O(δ³) and hence does not affect the second-order approximation of Û^II. In the following we will first explicitly demonstrate how to numerically construct the MPO representation W^II from the block-triangular structure of the MPO representation of Ĥ and subsequently motivate the formalism used. As for Û^I, the MPO representation of Û^II is of the form

    Ŵ^II_j = ( Ŵ^II_{D_j}   Ŵ^II_{C_j}
               Ŵ^II_{B_j}   Ŵ^II_{A_j} ).

In order to construct the matrices Ŵ^II_{A_j,B_j,C_j,D_j} we employ transition amplitudes between hard-core bosonic states (cf. Sec. 4.2.2), following the notation in Ref. [32].
Let H_{2,a_j} denote the a_j-th hard-core bosonic Hilbert space. N_j of these spaces form H_c = ⊗_{a_j=1}^{N_j} H_{2,a_j}, and we work in the joint Hilbert space H_c ⊗ H̄_c, which comprises 2N_j individual hard-core bosonic Hilbert spaces. The ladder operators acting on H_c and H̄_c are ĉ†_{a_j}, ĉ_{a_j} and c̄†_{ā_j}, c̄_{ā_j}, respectively. The generator of the matrix elements is a map on the joint bosonic and physical Hilbert spaces H_c ⊗ H̄_c ⊗ H_phys and is given by the operator-valued exponential

    Φ̂_{j;a_j,ā_j} = e^{F̂_{j;a_j,ā_j}},   F̂_{j;a_j,ā_j} = ĉ†_{a_j} c̄†_{ā_j} Â_{j;a_j,ā_j} + √τ ĉ†_{a_j} B̂_{j;a_j} + √τ c̄†_{ā_j} Ĉ_{j;ā_j} + τ D̂_j.

Denoting the combined bosonic vacuum state by |0⟩ ⊗ |0̄⟩ ≡ |0,0̄⟩, the following transition amplitudes determine the operator-valued entries of the MPO representation Ŵ^II_j:

    Ŵ^II_{A_j;a_j,ā_j} = ⟨0,0̄| ĉ_{a_j} c̄_{ā_j} Φ̂_{j;a_j,ā_j} |0,0̄⟩,

and analogously for Ŵ^II_{B_j;a_j} (with ĉ_{a_j} only), Ŵ^II_{C_j;ā_j} (with c̄_{ā_j} only) and Ŵ^II_{D_j} (with no ladder operators), cf. Eqs. (71) to (74). For clarity, we demonstrate the calculation of the elements of Ŵ^II_{A_j;a_j,ā_j} explicitly; hard-core bosonic states and operators are represented by two-dimensional vectors and 2×2 matrices. Since F̂_{j;a_j,ā_j} only contains creation operators for the modes a_j, ā_j, the vacuum expectation value over all 2N_j bosonic modes can be restricted to the relevant modes, i.e., we can replace |0,0̄⟩ → |0_{a_j}, 0̄_{ā_j}⟩. Therefore, the generator is obtained by exponentiating the matrix F̂ truncated to the relevant bosonic Hilbert spaces H_{2,a_j}, H̄_{2,ā_j}. The entries of Ŵ^II_{A_j;a_j,ā_j} are then obtained by evaluating the power series of the matrix exponential and calculating the vacuum expectation value

    Ŵ^II_{A_j;a_j,ā_j} = ⟨0,0̄| ĉ_{a_j} c̄_{ā_j} Φ̂_{j;a_j,ā_j} |0,0̄⟩ = ⟨1_{a_j}, 1̄_{ā_j}| Φ̂_{j;a_j,ā_j} |0,0̄⟩.

A compact notation for all matrix elements of Ŵ^II is obtained if we let the annihilation operators ĉ_{a_j}, c̄_{ā_j} in Eqs. (71) to (74) act on the bra ⟨0,0̄|. Hence, the N²_j exponentials exp(F̂_{j;a_j,ā_j}) already contain all the relevant information to construct the stepper W^II, and in particular there is no need to calculate different exponentials as suggested by Eqs. (71) to (74).
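Before turning to the detailed derivation, the simpler W^I stepper introduced above can be checked numerically. The sketch below (our illustration; the boundary-vector convention and the dense-contraction helper are our own choices, and we assume the block form Ŵ^I = [[1̂+τD̂, √τ Ĉ],[√τ B̂, Â]] stated above) builds W^I for the Ising-type Hamiltonian Ĥ = Σ_j ŝ^z_j ŝ^z_{j+1}, for which Â = D̂ = 0 and B̂ = Ĉ = ŝ^z, and confirms the O(δ²) per-step error:

```python
import numpy as np

def expmH(H, tau):
    """e^{tau*H} for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(tau * w)) @ V.conj().T

sz = np.diag([0.5, -0.5]).astype(complex)
I2 = np.eye(2, dtype=complex)

def wI_site(delta):
    """W^I site tensor [[1 + tau*D, sqrt(tau)*C], [sqrt(tau)*B, A]]
    for H = sum_j s^z_j s^z_{j+1}: here A = D = 0 and B = C = s^z."""
    st = np.sqrt(-1j * delta)
    W = np.zeros((2, 2, 2, 2), dtype=complex)  # (wl, wr, s_out, s_in)
    W[0, 0] = I2
    W[0, 1] = st * sz
    W[1, 0] = st * sz
    return W

def dense_mpo(W, L):
    """Contract L copies of W into a dense operator, boundary index 0."""
    P = W[0]  # fix left boundary; shape (wr, d, d)
    for _ in range(L - 1):
        P = np.einsum('wST,wvst->vSsTt', P, W)  # attach one more site
        P = P.reshape(W.shape[1], P.shape[1] * P.shape[2], -1)
    return P[0]  # fix right boundary

L = 4
H = sum(np.kron(np.kron(np.eye(2**j), np.kron(sz, sz)), np.eye(2**(L - j - 2)))
        for j in range(L - 1))

def wI_error(delta):
    return np.linalg.norm(dense_mpo(wI_site(delta), L) - expmH(H, -1j * delta), 2)

# Missing overlapping contributions give an O(delta^2) error per step:
print(wI_error(0.1) / wI_error(0.05))  # roughly 4
```

For L = 2 the contraction gives exactly 1 + τ ŝ^z ⊗ ŝ^z, i.e. the Euler term; the discrepancy to e^{τĤ} stems precisely from the overlapping products excluded from the primed sum.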

Detailed derivation of the W II representation
We will present the construction of W^II in more detail in this section, following Ref. [32]. Before digging into the details, let us briefly sketch the derivation: the goal is to find an MPO representation of the time stepper Û^II(δ) capturing interaction terms in the series expansion as described above. The corresponding MPO site tensors Ŵ^II are obtained by making use of the MPO recursion Eq. (66) to factorize the exponential e^{−iδĤ}. The factorization itself is performed by exploiting complex Gaussian integrals via the introduction of auxiliary fields φ_j, φ̄_j on each bond, which are integrated over. The resulting MPO consists of site tensors Û_{φ̄_{j−1},φ_j}; that is, the bond indices φ_j, φ̄_j are continuous degrees of freedom. In a final step, these bond indices are discretized using bosonic coherent-state path integrals, yielding the desired expression for the MPO site tensors.
We begin by employing the bond representation of Ĥ (compare Eq. (65)) and write the operator as a formal scalar product over auxiliary degrees of freedom a_j = 1 … N_j,

    Ĥ = Ĥ^L_{j−1} + Ĥ^R_{j+1} + J̄̂_j · Ĵ_j,

where N_j again is the number of interaction terms crossing the bond j, and the products between the entries of the operator-valued "vectors" J̄̂_j and Ĵ_j are tensor products between operators acting on the partitioned Hilbert spaces to the left and right of the bond j, respectively. Next, we introduce complex vector fields φ_{j;a_j} and their complex conjugates φ̄_{j;a_j} and define a mapping from the auxiliary indices into the full Hilbert space, where we will suppress the identities acting on the left and right partitions of the Hilbert space in the following. We then use complex Gaussian integrals to decouple each pair of left and right interaction terms:

    e^{J̄̂_{j;a_j} Ĵ_{j;a_j}} = (1/π) ∫ dφ̄_{j;a_j} dφ_{j;a_j} e^{−φ̄_{j;a_j} φ_{j;a_j} + J̄̂_{j;a_j} φ_{j;a_j} + φ̄_{j;a_j} Ĵ_{j;a_j}}.
Furthermore, we factorize the operator exponential e^{−iδĤ} (setting τ = −iδ),

    e^{τĤ} = e^{τĤ^L_{j−1}} e^{τ J̄̂_j · Ĵ_j} e^{τĤ^R_{j+1}} + O(τ²),

where the error occurs at second order in τ if the operators Ĵ_j, J̄̂_j, Ĥ^L_{j−1}, Ĥ^R_{j+1} do not commute. Exploiting N_j complex Gaussian integrals and defining

    D[φ̄_j, φ_j] = Π_{a_j} (dφ̄_{j;a_j}/√π)(dφ_{j;a_j}/√π),

we obtain a representation of e^{τĤ} as an integral over the auxiliary fields. Here we used that the identities 1̂^L_j and 1̂^R_j can also be interpreted as maps from an auxiliary index into the full Hilbert space. Now the MPO recursion Eq. (66) can be applied to, e.g., shift the active bond j → j+1 in the third exponential. From now on we replace the formal dot product D̂_j · 1̂^R_{j+1} by its action on the full Hilbert space, which reduces for the case of D̂_j to on-site terms only. The same holds for the dot product φ̄_j · B̂_j 1̂^R_{j+1} ≡ φ̄_j · B̂_j, which only connects sites to the left of j. In the exponent, the summands √τ φ̄_j · B̂_j · 1̂^R_{j+1}, τ D̂_j · 1̂^R_{j+1} and τ 1̂_j · Ĥ^R_{j+1} already act separately on site j and the right partition of the Hilbert space H^R_{j+1}. In order to separate the remaining summands, too, we introduce another set of auxiliary fields φ_{j+1}, φ̄_{j+1} by inserting more complex Gaussian integrals, with site tensors

    U_{φ̄_j,φ_{j+1}} = e^{φ̄_j · Â_j · φ_{j+1} + …}

generated by the integration of U_{φ̄_j,φ_{j+1}} e^{−φ̄_{j+1} φ_{j+1}} over the continuous bond degrees of freedom φ̄_j, φ_{j+1}. Iterating through the whole system, we finally obtain a first-order approximation in τ of e^{τĤ} in terms of an MPO with continuous bond labels φ_j = (φ_1, · · · , φ_{N_j}). In a final step, the continuous bond labels φ_{a_j} are transformed into discrete ones. We note that for a set of N_j bosonic ladder operators ĉ_{a_j}, ĉ†_{a_j} we can rewrite the tensors at site j as expectation values of coherent states |φ_j⟩ = e^{φ_j · ĉ†}|0⟩, which have the property ĉ_{a_j}|φ_j⟩ = φ_{a_j}|φ_j⟩. For instance, we may consider the case N_j ≡ 1.
Then a single bond label φ_{j;a_j} ≡ z_j is sufficient and (abbreviating ĉ_{a_j} ≡ ĉ_j, ĉ†_{a_j} ≡ ĉ†_j) we can express the tensor at the bond (j, j+1) through coherent-state matrix elements, expanding the coherent states |z_j⟩, |z_{j+1}⟩ in the bosonic occupation-number bases |n_j⟩, |n_{j+1}⟩. Now we turn to the integral over one pair of bond labels z̄_j, z_j, considering only those factors which depend on them. Parametrizing the complex integration in polar coordinates and exploiting the integral representation of the Γ function, the Gaussian integrals reduce to sums over discrete occupation numbers. For the general case N_j > 1 we define the boson occupation numbers n_j = (n_1 · · · n_{N_j}) for each bond. Since we can always reorder the φ_{a_j}, we can perform the same integrals for each continuous bond label, so that we finally obtain the desired MPO representation with discrete bond labels Û_{n_{j−1},n_j}. A pedestrian way to obtain a compact MPO representation from this very formal derivation is to analyze Û_{n_{j−1},n_j} when truncating the bosonic Hilbert spaces to a maximal boson occupation b, i.e., n_{j;a_j} ∈ {0, . . . , b}. We define bosonic field operators φ̂†_j = (ĉ†_1 · · · ĉ†_{N_j}) and denote the maximal occupation number by an upper index (b). Then, for each b we have operator-valued bosonic matrix elements Û^{(b)}_{j;n_{j−1},n_j} of maps Φ̂_j on the joint Hilbert spaces H_{b;a_j} of b bosons for every a_j ∈ {1, . . . , N_j} at each bond and the physical Hilbert space H_phys at each site. Let us consider b ≡ 0: the matrix elements collapse to pure on-site terms, Û_{j;0,0} = ⟨0,0|Φ̂_j|0,0⟩ = e^{τD̂_j}. The next order b ≡ 1 additionally collects contributions from all interactions crossing either the bond (j−1, j) or the bond (j, j+1), which can be arranged as an operator-valued matrix and yields the suggested form of W^II. The lower-right matrix element Ŵ^II_{A_j} contains two bosons. When formally contracting Û^{(1)}_j with the neighboring matrix Û^{(1)}_{j+1}, these bosons connect, for instance, all local operators from Â_j to local operators from B̂_{j+1}.
Following this procedure, we find that by truncating the MPO approximation to b = 1 we keep all local operator strings that overlap at most at one site, and hence the error is O(τ²).

Errors
The MPO W^II approximation to the time-evolution operator Û(δ) primarily exhibits an error O(δ²) due to the truncation of the auxiliary degrees of freedom to hard-core bosons with maximal occupation b ≡ 1. The Trotter error introduced by Eq. (86) was shown in Ref. [32] to be O(τ³), so that it is subleading compared to the auxiliary boson field truncation error. However, the W^II MPO representation is not invariant under the particular choice of the decomposition of the Hamiltonian into local terms Ĥ = Σ_j Ĥ_j. This degree of freedom can be used to reduce the truncation error and is discussed in detail in Ref. [32].
We want to point out that the method of complex time steps discussed in Sec. 7.2 can transform a first-order stepper with error O(δ²) into a second-order stepper with error O(δ³). This is particularly suitable for the W^II MPO representation since the W^II bond dimension w = N_j + 1 is comparably small even for long-ranged Hamiltonians. An improved stepper is hence available without too much numerical effort and should be used (it is used in this form in the experiments run later). In turn, this construction in general destroys the unitarity of the stepper. Furthermore, as already discussed in Sec. 4.1.1 for TEBD, the choice of the MPO application method can significantly improve (or reduce) the accuracy of the time stepper.
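The algebra behind the complex-step trick is independent of MPOs: for a first-order stepper Â(τ) = 1 + τĤ + O(τ²), the conjugate substeps τ₁ = τ(1+i)/2 and τ₂ = τ(1−i)/2 satisfy τ₁ + τ₂ = τ and τ₁τ₂ = τ²/2, so their product matches e^{τĤ} through second order. A dense sketch (generic Euler stepper and a random Hermitian Ĥ, purely our illustration):

```python
import numpy as np

def expmH(H, tau):
    """e^{tau*H} for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(tau * w)) @ V.conj().T

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
H = (A + A.T) / 2                      # small Hermitian "Hamiltonian"

def euler(tau):
    """First-order stepper for e^{tau*H}, error O(tau^2)."""
    return np.eye(6) + tau * H

delta = 0.05
tau = -1j * delta
U_exact = expmH(H, tau)

e1 = np.linalg.norm(euler(tau) - U_exact, 2)                # O(delta^2)
U2 = euler(tau * (1 - 1j) / 2) @ euler(tau * (1 + 1j) / 2)  # complex substeps
e2 = np.linalg.norm(U2 - U_exact, 2)                        # O(delta^3)
print(e2 < e1)  # the complex-substep combination is markedly more accurate
```

Note that each substep is non-unitary (the substep sizes are complex), which is exactly the loss of unitarity mentioned above.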

The global Krylov method
Krylov subspace methods [16][17][18][19][20][21] (e.g. the Lanczos method [22,59]) are well-known iterative techniques from the field of numerical linear algebra. In their application to time-dependent problems, one approximates the action of Û_exact(δ) on the quantum state |ψ(t)⟩ directly, resulting in a time-evolved state |ψ(t+δ)⟩; the method does not provide access to the exponential e^{−iδĤ} in the standard physical basis. The most straightforward approach is to ignore the special structure of the MPS/MPO representation and directly implement the iterative procedure, as detailed below. This is what we call the global Krylov method. In contrast, a variant exploiting the structure of the MPS ansatz will be discussed in Sec. 6.1. We first introduce the Krylov method independently of the specific representation (dense vectors as in exact diagonalization, MPS, tree tensor networks etc.) used. This algorithm is also used as the local integrator for the local Krylov method of Sec. 6.1 and the TDVP in Sec. 6.2. Subsequently, we will discuss particular caveats of applying the method globally to matrix-product states.
The Krylov subspace K_N of a Hamiltonian Ĥ and initial state |ψ⟩ is defined as the span of the vectors {|ψ⟩, Ĥ|ψ⟩, . . . , Ĥ^{N−1}|ψ⟩}. This space is spanned by the orthonormal Krylov vectors |v_0⟩, |v_1⟩, . . . , |v_{N−1}⟩, where the first Krylov vector |v_0⟩ is |ψ⟩ normalized to norm 1 and each subsequent Krylov vector |v_i⟩ is constructed by applying Ĥ to |v_{i−1}⟩ and orthonormalizing against all previous Krylov vectors. In exact arithmetic with Hermitian Ĥ, it suffices to orthogonalize against the previous two vectors |v_{i−1}⟩ and |v_{i−2}⟩, which is equivalent to the Lanczos algorithm [59]. However, due to round-off errors intrinsic to a numerical implementation, orthogonality of the Krylov vectors is usually lost. If the precision required of each solution is low, one may abstain from curing this problem and simply work in a very small subspace. However, due to the accumulation of errors during a time evolution and the calculation of spectral or time-dependent properties, it is typically necessary to explicitly orthogonalize each new Krylov vector against all previous Krylov vectors.

Algorithm 2: The Krylov method. The main input is the Hamiltonian, the initial state, and the (possibly complex) time step. Additionally, a procedure ApplyAndOrthonormalize is needed, which in turn requires the operator-state product and the orthogonalization of states. For details on a variational approach to this, see Sec. 5.3. The function ComputeEffectiveH only needs to update the new elements of T_{j+1} compared to T_j.

procedure Orthonormalize(|w⟩, {|v_0⟩, · · · , |v_k⟩})
    |w⟩ ← |w⟩ − Σ_{i=0}^{k} ⟨v_i|w⟩ |v_i⟩
    if ‖|w⟩‖ < ε then
        return invariant subspace found; evolution exact using just {|v_0⟩, · · · , |v_k⟩}
    end if
    return |w⟩ / ‖|w⟩‖
end procedure
procedure ApplyAndOrthonormalize(Ĥ, {|v_k⟩, · · · , |v_0⟩})
    |w_k⟩ ← Ĥ|v_k⟩
    return Orthonormalize(|w_k⟩, {|v_0⟩, · · · , |v_k⟩})
end procedure
procedure Timestep(Ĥ, |ψ(t)⟩, δ)
    |v_0⟩ ← |ψ(t)⟩ / ‖|ψ(t)⟩‖
    for j ← 1, 2, . . . do
        |v_j⟩ ← ApplyAndOrthonormalize(Ĥ, {|v_0⟩, · · · , |v_{j−1}⟩})
        T_{j+1} ← ComputeEffectiveH(T_j, Ĥ, {|v_0⟩, · · · , |v_j⟩})
        c^{j+1} ← exp(−iδ T_{j+1}) applied to the unit vector (1, 0, . . . , 0)^T
        if ‖c^{j+1} − c^j‖ is sufficiently small then break
    end for
    return |ψ(t+δ)⟩ ≈ Σ_{i=0}^{j} c^{j+1}_i |v_i⟩
end procedure

The method then proceeds by searching for the element of K_N which approximates the result of the exact evolution most closely,

    |ψ_N(t+δ)⟩ = argmin_{|φ⟩ ∈ K_N} ‖ |φ⟩ − Û_exact(δ)|ψ(t)⟩ ‖.

To do so, we define the projector onto K_N,

    P̂_N = Σ_{i=0}^{N−1} |v_i⟩⟨v_i| = V†_N V_N,

where we have introduced matrices V_N and V†_N to represent the maps from the Hilbert space onto the Krylov space and vice versa. The solution to the above minimization problem is given by

    |ψ_N(t+δ)⟩ = P̂_N Û_exact(δ) |ψ(t)⟩.

Note that for N = dim H ≡ N_H this is exact. Expanding the projectors and writing down the formal Taylor series for Û_exact(δ), we find

    |ψ_N(t+δ)⟩ ≈ V†_N e^{−iδT_N} V_N |ψ(t)⟩,   (108)

where N ≪ N_H and T_N = V_N Ĥ V†_N is the Krylov-space representation of Ĥ with coefficients

    (T_N)_{i,i′} = ⟨v_i|Ĥ|v_{i′}⟩.

The Krylov approximation is introduced in Eq. (108). Note that for n > N − 1, V†_N T^n_N V_N |ψ(t)⟩ no longer coincides with P̂_N Ĥ^n P̂_N |ψ(t)⟩. This implies that the error in the Taylor expansion of the time-evolution operator is of order δ^N/N!, indicating that already a few iterations suffice to give a very small error in the integrator. If V†_N V_N were a proper identity, we could insert it between any power of Ĥ and obtain exactly V_N Ĥ^n V†_N = T^n_N. However, our Krylov subspace is much smaller than the true Hilbert space, and the projection induced by V†_N V_N is hence very large, V†_N V_N ≠ 1.
But due to the special structure of this Krylov space, V_N^† T_N^n V_N |ψ(t)⟩ converges much more quickly to Ĥ^n |ψ(t)⟩ than would be expected if we, e.g., selected random vectors in the full Hilbert space to construct our subspace.
In exact arithmetic and with Hermitian Ĥ, T_N would be tridiagonal, i.e., (T_N)_{i,i′} = 0 if |i − i′| > 1. While in practice this is not necessarily true, we found that it improves numerical stability and accuracy of the results to assume T_N to be tridiagonal, evaluating only those elements while forcibly setting all other elements to zero. Returning to Eq. (108), we have V_N |ψ(t)⟩ = (‖ψ(t)‖, 0, ..., 0)^T, as all other Krylov vectors are orthogonal to |v_0⟩ and |v_0⟩ is the normalized version of |ψ(t)⟩. T_N can be exponentiated efficiently using standard diagonalization routines from Lapack, as it is only of size N × N.
For a given number of Krylov vectors N and step size δ, we hence obtain

|ψ(t + δ)⟩ ≈ Σ_{i=0}^{N−1} c_i |v_i⟩ , c_i = ‖ψ(t)‖ (e^{−iδ T_N})_{i,0} .

For typical problems as presented in the example section, the number of Krylov vectors used by us was between 3 and 10. The algorithmic procedure is summarized in Alg. 2.
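As a concrete illustration, the whole scheme of Alg. 2 with a forced-tridiagonal T_N fits in a few lines of dense linear algebra. This is a sketch for exact state vectors, not the MPS variant, and the function name and return values are our own:

```python
import numpy as np
from scipy.linalg import expm

def krylov_step(H, psi, dt, N=8, eps=1e-12):
    """One global Krylov time step: |psi(t+dt)> ~ sum_i c_i |v_i>.

    Dense stand-in for Alg. 2: Lanczos with full (two-pass)
    reorthogonalization; T_N is forced to be tridiagonal as in the text."""
    norm0 = np.linalg.norm(psi)
    vs = [psi / norm0]
    for _ in range(N - 1):
        w = H @ vs[-1]
        for _ in range(2):                  # two-pass reorthogonalization
            for v in vs:
                w = w - (v.conj() @ w) * v
        nw = np.linalg.norm(w)
        if nw < eps:                        # invariant subspace: evolution exact
            break
        vs.append(w / nw)
    V = np.column_stack(vs)                 # columns are the Krylov vectors
    T = V.conj().T @ H @ V                  # effective Hamiltonian T_N
    T = np.triu(np.tril(T, 1), -1)          # keep only the tridiagonal part
    c = norm0 * expm(-1j * dt * T)[:, 0]    # V_N |psi(t)> = (norm0, 0, ..., 0)^T
    return V @ c, c, V, T
```

For Hermitian H and moderate Wδ, already a handful of vectors reproduces the exact evolution to near machine precision, in line with the δ^N/N! error estimate above.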

Evaluation of expectation values ⟨Ô⟩(t + δ)
It is not actually necessary to construct the time-evolved state |ψ_N(t + δ)⟩ to evaluate ⟨Ô⟩(t + δ) for arbitrary δ. Instead, evaluating the matrix elements ⟨v_i|Ô|v_{i′}⟩ for all i, i′ ∈ [0, ..., N − 1] and constructing only the coefficient vector c_N is sufficient to evaluate the observable. One can hence potentially avoid adding up the Krylov vectors and constructing the time-evolved state |ψ(t + δ)⟩. Indeed (cf. Sec. 5.4 later), once the matrix elements ⟨v_i|Ô|v_{i′}⟩ are available, it becomes possible to calculate the value of the observable at any time δ′ < δ by operating only on small N × N matrices. Hence very small time steps unattainable by the other time-stepping methods (e.g. δ′ = 10^{−5}) become available.
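A dense sketch of this trick (our own minimal illustration; V is a matrix whose columns are the Krylov vectors and T = V†ĤV the effective Hamiltonian, here taken as plain arrays):

```python
import numpy as np
from scipy.linalg import expm

def krylov_observable(O, V, T, norm0, dt):
    """<O>(t+dt) from small matrices only: O is projected into the Krylov
    basis once, O_N[i,i'] = <v_i|O|v_i'>, and for every dt only an N x N
    exponential is needed; the time-evolved state itself is never built."""
    O_N = V.conj().T @ O @ V                  # N x N projected observable
    c = norm0 * expm(-1j * dt * T)[:, 0]      # coefficient vector c_N
    return (c.conj() @ O_N @ c).real
```

Once O_N is stored, sweeping through thousands of values of dt costs only small-matrix work.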

Errors
Elaborate bounds are available to estimate the error incurred by the Krylov approximation [60]. Unfortunately, these bounds are proven under the assumption of exact arithmetic and hence do not necessarily apply in the context of matrix-product states. The main take-away, which is confirmed in practice, is that the Krylov error is in O(δ^N) as long as √(Wδ) ≤ N, where W is the spectral range of Ĥ, which, in turn, is roughly of the same order of magnitude as the system size. For a typical system of L = 100 sites with δ = 0.1, this condition is fulfilled as soon as N ≈ 3; more precise bounds are available in exact arithmetic. 5 Hence, there are two approaches to measure the convergence of the Krylov space: (i) the bottom-rightmost entry of the effective matrix T_N measures the weight scattered out of the Krylov space by application of the Hamiltonian and typically decays exponentially; (ii) the full Hilbert-space 2-norm distance between two sequential iterations is cheaply available through the coefficients of the Krylov vectors produced in the two iterations. In our experience, this second measure makes for an excellent convergence criterion.
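Both convergence measures use only the small effective matrix. A dense sketch (names ours): because the Krylov vectors are orthonormal, the full Hilbert-space distance between the states of two successive iterations equals the distance between their coefficient vectors, with the shorter vector padded by zero.

```python
import numpy as np
from scipy.linalg import expm

def krylov_converged(T, norm0, dt, tol=1e-10):
    """Convergence measures for a Krylov space with effective Hamiltonian T:
    (i)  leak: bottom-rightmost subdiagonal entry of T, the weight scattered
         out of the previous span when H is applied;
    (ii) dist: 2-norm distance between the evolved states of the last two
         iterations, computed from coefficient vectors alone."""
    N = T.shape[0]
    c_prev = norm0 * expm(-1j * dt * T[:N - 1, :N - 1])[:, 0]
    c_curr = norm0 * expm(-1j * dt * T)[:, 0]
    dist = np.linalg.norm(c_curr - np.append(c_prev, 0.0))
    return dist < tol, dist, abs(T[N - 1, N - 2])
```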
In addition to the inherent Krylov error, which can often be made extremely small (O(10^{−10}) or smaller), the Krylov method of course also suffers from the standard MPS truncation error; this error, too, can be measured precisely (via the discarded weight) and be made very small. As such, both errors of the global Krylov method can be made extremely small at finite time-step size, albeit at relatively large numerical cost. The method hence excels in particular if used to evaluate states very precisely, e.g., to measure the errors of other methods on short time scales.

Application to matrix-product states
Up to this point, there was no need to narrow the description down to a specific representation, which demonstrates the versatility of the Krylov method. In our practical calculations, however, we wish to use MPS to represent the time-evolved quantum states and intermediate Krylov vectors and an MPO to represent the Hamiltonian Ĥ, which requires a few minor adaptations for efficiency and accuracy. Note that, in stark contrast to the TEBD and MPO W^{I,II} methods, only an MPO representation of Ĥ and no analytical or other decomposition is required.
First and foremost, the most obvious improvement is in the calculation of the last entry of the effective Krylov matrix T_N. In exact or dense arithmetic, the evaluation of ⟨v_{N−1}|Ĥ|v_{N−1}⟩ requires computing the matrix-vector product Ĥ|v_{N−1}⟩. This is not the case in the MPS approach: indeed, evaluating the expectation value ⟨v_{N−1}|Ĥ|v_{N−1}⟩ is much cheaper than calculating the MPS representing Ĥ|v_{N−1}⟩, cf. Sec. 2.9. As such, to generate an N × N-dimensional effective Krylov matrix T_N, one only needs to evaluate N − 1 MPO-MPS products and avoids the most costly MPO-MPS application onto the last Krylov vector. In our experience, the bond dimension of each additional Krylov vector grows superlinearly, making this optimization very worthwhile.
To construct the time-evolved state |ψ(t + δ)⟩, it is necessary to sum the N Krylov vectors together. Various methods to do so exist; in our implementation, we sequentially add two vectors (resulting in a new MPS with bond dimension 2m), which are then truncated back down to the target bond dimension m. In N − 1 steps, all N Krylov vectors are summed together at cost O(N(2m)³). One could follow up on this procedure with some sweeps of variational optimization, or alternatively directly optimize variationally, but this does not appear to be necessary for our application. 5 In exact arithmetic for a spectral width W of Ĥ, the error is smaller than 10 e^{−N²/(5Wδ)} if N is between √(4Wδ) and 2Wδ, and smaller than (10/(Wδ)) e^{−Wδ} (eWδ/N)^N if N ≥ 2Wδ [60]. It is unknown how this translates to the case of inexact arithmetic.
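The pairwise add-and-truncate pattern can be illustrated on a dense stand-in in which a "state of bond dimension m" is replaced by a rank-m matrix and MPS truncation by an SVD cut. This is a toy analogue of our own devising, not MPS code:

```python
import numpy as np

def truncated_sum(mats, m):
    """Sum N 'states' pairwise, truncating back to rank m after each addition.
    The matrix rank plays the role of the MPS bond dimension and the SVD cut
    plays the role of MPS truncation."""
    acc = mats[0]
    for mat in mats[1:]:
        acc = acc + mat                      # rank may grow up to 2m here
        U, s, Vh = np.linalg.svd(acc, full_matrices=False)
        acc = (U[:, :m] * s[:m]) @ Vh[:m]    # truncate back to rank m
    return acc
```

When m is large enough to hold the exact rank of the sum, the procedure is lossless; otherwise each pairwise step discards weight, exactly as MPS truncation does.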

Loss of orthogonality
A typical problem of Krylov-subspace methods is the possible loss of orthogonality of the basis vectors due to the finite-precision arithmetic of floating-point operations. This matter becomes substantially more pressing in the matrix-product algebra, as truncation is crucial to keeping computations feasible. If many Krylov vectors are desired, truncation errors affecting the orthogonality of the basis vectors do not simply add to the overall error (see above), but may quickly degrade the overall quality of the Krylov space, leading to a poor result. In this case, it is necessary to check for orthogonality of the basis and, if needed, re-orthogonalize the basis vectors successively. However, if one uses a simple Gram-Schmidt procedure to orthogonalize vectors by successive additions of MPS, new truncation errors are introduced during this procedure, which quite often entails the same problem.
In our experience, it has proven fruitful to orthogonalize each new Krylov state variationally against all other Krylov states. This is essentially variational compression of a state under the additional constraint of having zero overlap with all previous Krylov states, see Sec. 2.6. The additional constraints can be incorporated with the method of Lagrange multipliers: for each constraint (orthogonality to the vector |v_i⟩), one introduces a respective Lagrange multiplier β_i coupling to the overlap with |v_i⟩ in the minimized functional. As with variational compression, this can also be solved by iteratively solving the local one- or two-site problems (without explicitly evaluating Ĥ²). Care should be taken to ensure local orthogonality by using the pseudo-inverse of the Gram matrix, as explained in Ref. [21]. Using a two-site approach entails an additional truncation step after each local optimization step and hence again implies a loss of orthogonality. However, the two-site approach converges much better towards the global optimum than the single-site approach. In practice, we hence first do a few sweeps using the two-site optimization (or, similarly, a single-site optimization with subspace expansion [64]) and follow up with a few sweeps of fully single-site optimization without expansion and hence also without truncation. The resulting state is then exactly orthogonal to all previous states. Note that when initially starting the optimization Eq. (115), the available vector space on the first few sites is likely to be very small (its dimension is bounded by the local tensor dimension σ_j m_{j−1} m_j, which is small near the chain boundary) and the orthogonalization is hence overconstrained. To avoid this problem, one should add the constraints one by one during subsequent sweeps. This variational orthogonalization can either be used as a separate orthogonalization step after the MPO-MPS application (using any of the algorithms presented in Sec. 2.8) or it can be combined with the variational operator application (cf. Sec. 2.8.2).
Whether it is better to first do the MPO-MPS application using the zip-up method and then variationally orthogonalize the result, or to do both steps at once, depends on the system at hand: in particular with long-range interactions, the variational approach may require more sweeps to converge, whereas short-range interactions are dealt with very efficiently there.
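For comparison, the dense analogue of this re-orthogonalization is the classic "twice is enough" iterated Gram-Schmidt; in the MPS setting the role of round-off is played by truncation, and the plain projections are replaced by the constrained variational sweeps described above. A minimal dense sketch (function name ours):

```python
import numpy as np

def orthonormalize(w, basis, eps=1e-12):
    """Orthonormalize w against a list of orthonormal vectors.  Two passes of
    classical Gram-Schmidt restore orthogonality lost to round-off."""
    for _ in range(2):
        for v in basis:
            w = w - (v.conj() @ w) * v
    norm = np.linalg.norm(w)
    if norm < eps:
        return None          # w lies (numerically) inside span(basis)
    return w / norm
```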

Dynamic step sizing
Dynamic step sizing is one of the most interesting and powerful features of this method and can be used in several ways. The idea is that a Krylov subspace, which was computed for some time step δ, can be recycled for some other step length. It is possible to distinguish two cases: interpolation and extrapolation.

Interpolation
In some applications, the time evolution needs to be performed on a very fine grid in time. The time-stepping methods would require a separate step for each point of the grid, which can quickly become cumbersome or even impossible. On the other hand, if we have a Krylov subspace that was used to perform a large time step, it can be re-used to compute any intermediate smaller time step at the same or higher accuracy. This immediately follows from the construction of the Krylov space and the convergence criteria/assumptions made above. As the diagonalization of the effective Hamiltonian is already known, all we need to do is exponentiate its eigenvalues times the new time step, map back into the Krylov basis to get the coefficient vector, and compute the new MPS as a superposition of Krylov vectors. If one is only interested in expectation values of an observable Ô, it is advantageous to compute its projection into the Krylov space via (O_N)_{i,i′} = ⟨v_i|Ô|v_{i′}⟩. With the coefficient vector c_N, the desired expectation value can be computed as c_N^† O_N c_N, entirely skipping the more expensive superposition of Krylov states.
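A sketch of this reuse (dense stand-in, names ours): the effective Hamiltonian is diagonalized once, after which any intermediate step δ′ costs only O(N²) small-matrix work, plus one superposition of Krylov vectors if the full state is wanted.

```python
import numpy as np

def make_interpolator(T, V, norm0):
    """Cache the eigendecomposition of the effective Hamiltonian T; the
    returned closures evaluate the coefficient vector (and, optionally,
    the state) for arbitrary step sizes dt at negligible cost."""
    lam, U = np.linalg.eigh(T)      # T is Hermitian (tridiagonal)
    u0 = U.conj().T[:, 0]           # Krylov representation of |psi(t)>
    def coeffs(dt):
        return norm0 * U @ (np.exp(-1j * dt * lam) * u0)
    def state(dt):
        return V @ coeffs(dt)       # superposition of Krylov vectors
    return coeffs, state
```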

Extrapolation
Albeit trickier to implement, extrapolation can significantly improve performance when used as a kind of automatic adaptive step-sizing scheme. The idea is as follows: given a Krylov space, it is often also possible to recycle it for larger step sizes by adding only a small number of additional Krylov vectors (or none at all). It follows that the optimal Krylov subspace dimension minimizes the ratio of the time needed to compute its basis and the number of steps it can be used for. As crude approximations of these quantities, we assume that the cost of any new Krylov vector grows exponentially, i.e. the ratio of the costs of successive vectors is fixed. Furthermore, we also assume that each new Krylov vector allows us as many additional time steps as the previous Krylov vector. We then continuously monitor the time needed to construct a new Krylov vector and the number of steps we are able to take with it. Once a decision has to be taken whether to extend the Krylov space or rebuild it from scratch, we use those values as estimates for our decision. In practice, this heuristic has proven to be quite reliable.

Figure 12: The effective state |ψ^eff_j⟩ is obtained by projecting the MPS with itself. In the case of a mixed-orthogonal MPS with orthogonality center on site j, |ψ^eff_j⟩ is simply the local site tensor M_j and the left and right projectors are identity matrices.

Figure 13: The effective Hamiltonian Ĥ^eff_j is obtained by projecting the MPO for Ĥ using ψ^L_{j−1}, ψ^R_{j+1} and their conjugates ψ̄^L_{j−1}, ψ̄^R_{j+1}. Note that this tensor is never explicitly constructed! Only its action on the state tensor M_j is evaluated.
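The extend-or-rebuild decision follows directly from the two model assumptions (fixed cost ratio between successive vectors, fixed number of extra steps per vector); the function below and its signature are our own illustration of that cost model:

```python
def extend_or_rebuild(cost_last, cost_ratio, steps_per_vector,
                      cost_rebuild, steps_needed):
    """Heuristic from the text: each new Krylov vector costs cost_ratio times
    the previous one (exponential growth) and buys steps_per_vector further
    time steps.  Compare the projected cost of extending the current space
    against rebuilding it from scratch."""
    cost_extend, cost, vectors = 0.0, cost_last, 0
    while vectors * steps_per_vector < steps_needed:
        cost *= cost_ratio          # next vector is more expensive
        cost_extend += cost
        vectors += 1
    return "extend" if cost_extend <= cost_rebuild else "rebuild"
```

In a real implementation, cost_last, cost_ratio and steps_per_vector would be the continuously monitored quantities described above.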

MPS-local methods
The global Krylov method of the previous section uses generic features of Krylov subspaces to compute the time evolution of an MPS. However, working "globally" and only using MPS to represent the Krylov vectors comes with the major drawback that those vectors are typically much more entangled than the actual time-evolved state |ψ(t + δ)⟩. To circumvent this problem, it may be desirable to work in the local basis of reduced sites. 6 There are two different ways to obtain such a local basis: one via an attempt to directly Lie-Trotter decompose the Hamiltonian into local subspaces [19,24-27], the other by enforcing the constraint that the evolution lies within the MPS manifold [28,29]. Both have been used in the literature, but their precise derivation requires some work. In the following, we will first present a very heuristic understanding of this approach, followed by a detailed derivation of the first approach, which leads to the local Krylov method presented in Sec. 6.1, and subsequently of the second approach, which results in the time-dependent variational principle (TDVP) presented in Sec. 6.2.
We start with an MPS |ψ , a HamiltonianĤ and a site j. The tensors ψ L j−1 ≡ (M 1 , . . . , M j−1 ) and ψ R j+1 ≡ (M j+1 , . . . , M L ) then constitute a left and right map, respectively, from the joint Hilbert space on sites 1 through j − 1 onto the bond space m j−1 and from the joint Hilbert space on sites j + 1 through L onto the bond space m j .
By sandwiching the Hamiltonian between two copies of ψ^L_{j−1} and ψ^R_{j+1}, we can transform it to act on the effective single-site space on site j (of dimension σ_j × m_{j−1} × m_j). Let us call this effective single-site Hamiltonian Ĥ^eff_j, cf. Fig. 13. Likewise, we can take the MPS |ψ⟩ and act on it with a single copy of ψ̄^L_{j−1} and ψ̄^R_{j+1} to obtain the effective single-site tensor M_j representing the effective state |ψ^eff_j⟩, cf. Fig. 12. If we move the orthogonality center of the MPS to site j first, the transformation of the state is an identity operation, as the Ā_1 of the transformation operator cancels with the A_1 in the state, etc. This is desirable for many reasons, primarily numerical stability and avoiding a generalized eigenvalue problem in favor of a standard eigenvalue problem.
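In MPO-based codes, Ĥ^eff_j is never formed explicitly (cf. Fig. 13); only its action on M_j is needed, which is a contraction of the left environment, the local MPO tensor and the right environment. A numpy sketch with our own index conventions:

```python
import numpy as np

def apply_h_eff(L, W, R, M):
    """Action of the effective single-site Hamiltonian on the site tensor M,
    without ever building H_eff.  Index convention (ours):
    L[a', w, a], W[w, v, s', s], R[b', v, b], M[a, s, b] -> out[a', s', b']."""
    return np.einsum('xwa,wvts,asb,yvb->xty', L, W, M, R)
```

In production code one contracts stage-wise (L with M, then W, then R) to keep the cost at O(m³) times MPO and physical dimensions; np.einsum can find such a pairwise path automatically with optimize=True.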
Instead of mapping onto the space of a single site, we can also map onto the space of two sites. This results in an effective state |ψ^eff_{(j,j+1)}⟩, given by the product of the two site tensors M_j · M_{j+1} over their shared bond, and likewise an effective Hamiltonian Ĥ^eff_{(j,j+1)}. On the other hand, if we map onto the space of a single bond between sites j and j+1 (without a physical index), we obtain the center matrix 7 C_j to represent the effective state |ψ^eff_j⟩ and the effective center-site Hamiltonian Ĥ^eff_j. In any of these smaller spaces, typically in the basis of left- and right-orthogonal environments, it is very easy to solve the time-dependent Schrödinger equation exactly from time t to time t+δ using a local Krylov procedure or any other exponential solver [15]. This results in a new effective state |ψ^eff_j(t+δ)⟩. Unfortunately, the new state will be represented in a relatively bad basis of the environments ψ^L and ψ^R, which are optimized to represent the original state at time t. To alleviate this issue, after evolving the site tensor M_j forwards by δ, one now wishes to obtain a global state |ψ(t)⟩ at the original time but in the new left auxiliary basis. If this is achieved, we may iteratively update the basis transformation tensors to be suitable for the new state |ψ(t+δ)⟩. This is where the local Krylov method and the TDVP diverge and proceed differently. 8 However, regardless of how one obtains this old state in the new basis, by sweeping once through the system, all site tensors are incrementally optimized for a representation of the next time step while the global state is kept at the original time. Upon reaching the far end of the system, we keep the local state tensor at the new time and hence obtain a global state at the new time t+δ written using near-optimal basis transformation tensors.
Instead of solving the forward problem a single site at a time, it is also possible to consider the effective Hamiltonian and state of two neighboring sites. In this two-site scheme, one obtains a forward-evolved tensor M_{(j,j+1)}, which needs to be split up using an SVD, with the single-site tensor on site j+1 then evolved backwards to keep the state at the original time. The advantage is the variable bond dimension, which allows a successive adaptation of the MPS bond dimension to the entanglement growth in the system. The downside is that an additional truncation error is introduced. In practice, it appears that a hybrid scheme which first uses the two-site method to increase the bond dimension to the usable maximum and then switches to the single-site variant may fare best, at least for some observables [66].
Depending on the particular way in which we obtain the local problems, solving the system of local Schrödinger equations exactly (using a small direct solver each time) is possible. The result is that both the energy and the norm (where applicable) of the state are conserved exactly. This conservation may extend to other global observables [66,67]. In practice, the TDVP achieves this goal.
Finally, one may turn the first-order integrator of sweeping once through the system into a second-order integrator by sweeping once left-to-right and then right-to-left through the system, each time with a reduced time step δ/2.
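The gain from symmetrization is the same as for plain operator splittings, where e^{−iδA} e^{−iδB} has local error O(δ²) while the symmetrized e^{−iδA/2} e^{−iδB} e^{−iδA/2} has O(δ³). A toy check with generic Hermitian matrices (an illustration of the order counting, not the actual MPS sweep):

```python
import numpy as np
from scipy.linalg import expm

def splitting_errors(A, B, dt):
    """Local error of the first-order (one 'sweep') and symmetrized
    second-order (two half-'sweeps') splittings vs exp(-i dt (A+B))."""
    exact = expm(-1j * dt * (A + B))
    lie = expm(-1j * dt * A) @ expm(-1j * dt * B)
    strang = expm(-1j * dt / 2 * A) @ expm(-1j * dt * B) @ expm(-1j * dt / 2 * A)
    return np.linalg.norm(lie - exact), np.linalg.norm(strang - exact)
```

Halving dt should shrink the first error by roughly 4 and the second by roughly 8, reflecting the local orders δ² and δ³.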

The local Krylov method
Deriving a Lie-Trotter integration scheme while working in the local reduced spaces is in fact equivalent to translating the time-step targeting DMRG [19,24-27] into the MPS framework. Crucially, the MPS framework makes it possible to precisely analyze the errors made, something which would be very difficult, and to our knowledge has not been done, in the standard environment-system DMRG picture. To integrate the local time-dependent Schrödinger equations resulting from the Lie-Trotter decomposition, we will use a Krylov-based approach [19,25,26]. This approach has the advantage of a very precise solution of the local equations and a large degree of similarity with both ground-state search DMRG and the time-dependent variational principle (cf. Sec. 6.2). Alternatively, Runge-Kutta integrators have also been used extensively with only minor changes to the overall method [24,27]. In particular, the error analysis presented here is also valid for the Runge-Kutta integrator, though one of course then also has to include the additional time-step error of that integrator.
We begin by looking at the Lie-Trotter decomposition of the time-dependent Schrödinger equation. The goal is to find a decomposition scheme Ĥ = Σ_ν Ĥ_ν such that we can integrate each summand separately by taking advantage of the MPS representation of the state vector |ψ⟩. Therefore we define orthogonal projectors P̂^{L/R,|ψ⟩}_j built from the mappings ψ^L_{j;m_j} (ψ^R_{j;m_j}) from a part of the physical Hilbert space into the bond space m_j. By construction, these operators fulfill (P̂^{L/R,|ψ⟩}_j)² = P̂^{L/R,|ψ⟩}_j, i.e., they are projectors. They are explicitly constructed from left-/right-orthogonalized MPS site tensors. Acting on an MPS representation of |ψ⟩ in canonical form with orthogonality center at site j+1 (j−1), they reduce to the identity: if the state |ψ⟩ has its orthogonality center to the right (left) of the target index j, the projectors constructed from it act as an identity on their Hilbert-space partition. Next we define the projector on the reduced site space at site j (cf. Fig. 14) via

Π̂^{|ψ⟩}_j = P̂^{L,|ψ⟩}_{j−1} ⊗ 1̂_j ⊗ P̂^{R,|ψ⟩}_{j+1} .

The action of such a projector Π̂^{|ψ⟩}_j on a state |ψ⟩ is to shift the orthogonality center of |ψ⟩ to the site j, which can be shown by applying the manipulation depicted in Fig. 15 to |ψ⟩ repeatedly. Therein, gauge invariance is employed so that the action of the projector is trivial on the site tensors A_k/B_k, k ≠ j, rendering j the center of orthogonality. Thus, the quantum state |ψ⟩ remains unchanged under the action of Π̂^{|ψ⟩}_j and therefore Π̂^{|ψ⟩}_j |ψ⟩ = |ψ⟩. It is instructive to think of Π̂^{|ψ⟩}_j as an operator acting on both the physical and the gauge degrees of freedom of the MPS representation. On the physical system it acts as a projector on the physical indices σ_j of the source state |ψ⟩. In the gauge degrees of freedom of the MPS representation, Π̂^{|ψ⟩}_j |φ⟩ fixes the orthogonality center to site j. As the physical content of the state is independent of the location of its orthogonality center, we must have Π̂^{|ψ⟩}_j |ψ⟩ = |ψ⟩ for every j, as also immediately follows from Eq. (122).
Now, we can reformulate the action of the Hamiltonian by decomposing it into representations ĤΠ̂^{|ψ⟩}_j acting only on reduced site spaces,

Ĥ |ψ⟩ = (1/L) Σ_{j=1}^{L} ĤΠ̂^{|ψ⟩}_j |ψ⟩ ,

which indeed yields a Lie-Trotter decomposition of the time-dependent Schrödinger equation. Typically, this decomposition is not exact; it depends, for instance, on the size of the chosen basis, but given a sufficiently large MPS bond dimension, the error made will be small. Having obtained the Lie-Trotter decomposition, a first-order approximation to the evolved state can be obtained by solving each problem d/dt |ψ(t)⟩ = −i ĤΠ̂^{|ψ⟩}_j |ψ(t)⟩ independently. 9 Let |ψ^{Π_j}(t)⟩ be the solution of the j-th problem. An approximation to the overall time-evolved state from t → t+δ can then be obtained by sequentially solving the initial value problems, setting |ψ^{Π_0}(t+δ)⟩ ≡ |ψ(t)⟩ and identifying |ψ(t+δ)⟩ ≡ |ψ^{Π_L}(t+δ)⟩ with the approximated time-evolved state. (Footnote 9: The prefactors 1/L are absorbed into the normalization of the site tensor that is currently evolved.) Comparing the formal
Taylor expansions of the exactly integrated state |ψ(t+δ)⟩_exact with the approximation, one readily finds that the first error occurs in the second-order terms of the expansion and is given by commutators [ĤΠ̂^{|ψ⟩}_i, ĤΠ̂^{|ψ⟩}_j] |ψ(t)⟩, with further commutators at arbitrarily high orders. Due to the dependence of the currently solved reduced problem on the previous solution in Eq. (129), the projectors P̂^{L,|ψ⟩}_{j−1} in our integration scheme are in fact time-dependent, because they need to be constructed from site tensors A_{j′<j}(t+δ) that have already been evolved. In contrast, P̂^{R,|ψ⟩}_{j+1} is constructed from unevolved site tensors B_{j′>j}(t). We hence actually have to solve L such initial value problems. At this point we can take advantage of the local representation of the state by multiplying each effective problem at site j with the bond maps ψ^L_{j−1} and ψ^R_{j+1}, which defines the effective reduced-site state and operator representations M_j and H^eff_j, respectively (see also Figs. 12 and 13). It can easily be checked that the projection of the effective problems onto their reduced site spaces at site j leaves the solution invariant. Carrying out the differentiation on the left-hand side explicitly yields a sum of partial derivatives. However, on the right-hand side ĤΠ̂^{|ψ⟩}_j acts only on the reduced site space, and hence all differentials must vanish except the one acting on the tensor in the reduced site space.
Unfortunately, a direct adoption of the recursive solution strategy proposed above is not possible, because the current problem at site j requires the projectors to be constructed from left-/right-canonical site tensors A_1, ..., A_{j−1}, B_{j+1}, ..., B_L. However, the solution M_{j−1} of the previous reduced problem is in general not in a canonical representation, so that in order to construct the next projector Π̂^{|ψ⟩}_j we first need to perform a basis transformation. There is no prescription in the derived decomposition scheme which corresponds to such a basis transformation keeping the evolved site tensors in canonical form. On the other hand, we can of course rewrite the updated site tensor as M_{j−1} → A_{j−1} R_{j−1} in a left-canonical representation. But, as already pointed out, we have to discard the transformation matrix R_{j−1} to obey the decomposition scheme. To proceed with the next reduced problem, we need to resolve the situation of having two different basis representations between the sites (j−1, j). Hence, we introduce a transformation between the unevolved and evolved site tensors A_{j−1}, M_j. The simplest guess is to consider a direct mapping between the unevolved and evolved site-tensor basis sets. This transformation can be readily obtained from the iterative solution strategy and is performed by projecting the evolved onto the unevolved bond space.
To see this, we write the partially evolved state explicitly in the mixed canonical form and introduce a compact notation |ψ^L_j(t)⟩_{m_j} and |ψ^R_j(t)⟩_{m_{j−1}} for the left/right partition's basis states. Without truncation, |ψ^L_j(t)⟩_{m_j} and |ψ^R_j(t)⟩_{m_{j−1}} would constitute complete maps from the physical degrees of freedom in the left/right partition, and we could perform an exact basis transformation mapping the evolved into the unevolved left basis; the matrix elements Q_{j−1; m′_{j−1}, m_{j−1}}(t, t+δ) of the basis transformation Q_{j−1}(t, t+δ) are constructed from the overlaps ⟨ψ^L_{j−1}(t)|ψ^L_{j−1}(t+δ)⟩. The transformation Q_j(t, t+δ) maps bond basis states |ψ^L_j(t+δ)⟩_{m_j}, which are optimized to represent the evolved state |ψ(t+δ)⟩, into bond basis states |ψ^L_j(t)⟩_{m_j}, which are optimized to represent the unevolved state |ψ(t)⟩. Now, if we let Q_{j−1}(t, t+δ) act on the already optimized canonical site tensor A^{σ_{j−1}}_{j−1}(t+δ), the effect would be to undo the previous site optimization. Hence, the inverse transformation Q^†_{j−1}(t, t+δ) is employed to transform the unevolved bond basis, labeled by m_{j−1}, of the current site tensor M_j(t). Observing that Q_{j−1} is constructed from the bond maps ψ^L_{j−1}(t) and ψ^L_{j−1}(t+δ) introduced above, which themselves should be built recursively, the transformation can be updated and applied before solving the j-th problem. If we allow for truncation, the error incurred by this mapping depends on the overlap ⟨ψ(t+δ)|ψ(t)⟩ as well as on the discarded weight. This basis transformation is mostly motivated by its straightforward availability during the sweeping procedure. However, to the best of our knowledge there is no rigorous mathematical justification, and we can only give the physical motivation that for small time steps δ the time-evolved state is expected to be relatively close to the unevolved state (deviation ∝ Lδ², as follows from the considerations in Sec. 6.1.1). Instead of mapping onto the space of a single site, in practice we map onto the space of two sites.
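The bond-basis projection itself is a small overlap computation. A dense sketch in our own notation, where the columns of phi_old/phi_new span the unevolved/evolved left bond bases:

```python
import numpy as np

def project_to_evolved_basis(phi_old, phi_new, M):
    """Express the (unevolved) site tensor M[m, s, r] in the evolved left
    bond basis.  Q[n, m] = <phi_new_n | phi_old_m> is the bond-basis overlap;
    if the two bases span different subspaces, this is a projection whose
    error is controlled by their overlap, as discussed in the text."""
    Q = phi_new.conj().T @ phi_old          # bond-basis overlap matrix
    return np.einsum('nm,msr->nsr', Q, M)
```

When the evolved basis is only a gauge rotation of the unevolved one, the mapping is exact: contracting the environment with the transformed tensor reproduces the original state.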
The two-site local TDSE is solved using the time-dependent Lanczos approach to obtain A j (t + δ). The original orthogonality center MPS tensor M j+1 (t) is then projected onto the new left basis as described above. This allows for a flexible adaptation of not only the tensor A j itself but also of the bond basis and -if necessary -MPS bond dimension between sites j and j + 1.
Historically, only this two-site variant was used; but in analogy to the 2TDVP method presented later, it may well make sense to initially use the two-site local Krylov method until the desired maximal bond dimension has been obtained and then switch to the single-site integrator to save computational effort.

Errors
Four errors are present in the local Krylov method when used in its (standard) two-site variant. The smallest of these stems from the inexact solution of the local TDSE Eq. (135). This error can be made very small using a precise solver; in practice, a Krylov exponential as described in Sec. 5 with very few (4-5) vectors is sufficient. The second error is the standard truncation error incurred during the SVD used to split the merged two-site tensors again while truncating to the desired bond dimension. This error can be measured and monitored throughout the calculation and is much the same as in the other methods.
The third error is due to the approximation in Eq. (125). This projection error is difficult to measure and strongly depends on the initial state. If the initial state has a reasonably large bond dimension and the Hamiltonian has reasonably short-ranged interactions, this error will be very small. The longer-ranged the interactions in the Hamiltonian, the larger the bond dimension of the state has to be. In the two-site method, nearest-neighbor interactions can be handled at all bond dimensions; in the single-site variant, only on-site interactions are error-free at small bond dimensions. The projection error is particularly problematic when globally quenching from a product state with a very non-local Hamiltonian (e.g. one resulting from a 2D → 1D map). When calculating equilibrium Green's functions for short-range Hamiltonians, this error is quite negligible.
Finally, there is an error due to the sequential solution of the local TDSE resulting from the Lie-Trotter decomposition. This error can be quantified, but doing so requires some additional work, which follows now: we continue from the Taylor expansion Eq. (132). We emphasize that the action of the commutators [ĤΠ̂^{|ψ⟩}_i, ĤΠ̂^{|ψ⟩}_j] |ψ(t_0)⟩ needs to be evaluated with respect to the iteration the commutators are generated from. Consider, for instance, the action of ĤΠ̂^{|ψ⟩}_2 ĤΠ̂^{|ψ⟩}_3 |ψ(t)⟩, which is generated from the first-order contribution ĤΠ̂^{|ψ⟩}_3 |ψ(t)⟩; its evaluation is demonstrated for the case of a four-site system with i = 2, j = 3 by performing most of the contractions graphically. In order to obtain the matrix element with an arbitrary state |φ⟩, we introduce a compact notation for contractions of MPS and MPO site tensors with the boundary tensors (partially contracted MPS-MPO-MPS networks L_j/R_j) and the transfer tensors E_j = M̄_j W_j M_j. We also need transfer tensors with the target state, which we define by E^φ_j = M̄^φ_j W_j M_j. Finally, there will be open bonds at sites i and j that correspond to the contractions originating from the "brace" contractions in the projectors P̂^{L,|ψ⟩}_i and P̂^{R,|ψ⟩}_i as well as P̂^{L,|ψ⟩}_{j+1} and P̂^{R,|ψ⟩}_{j+1}, and we will label these bonds explicitly. Considering, for instance, the first summand of the commutator (cf. Fig. 16), the left part of the contractions can be written compactly in this notation. To obtain a compact notation for the right part, we introduce the "dangling" transfer tensors D_j = W_j M_j if only the ket site tensor of the state is included and D̄_j = M̄_j W_j if only the bra site tensor is considered.
In the same manner as for the left part, we can now write the right contractions compactly. In general, we obtain the matrix element of the commutator with the target state |φ⟩ (suppressing the Kronecker-δ for brevity), and it becomes immediately clear that only those terms in the Hamiltonian contribute to the commutator which cross the bond (j−1, j) or (i, i+1). For the current purpose, it suffices to find a general estimate for the incurred error ∝ δ² in the Taylor expansion. Thus, we will only consider the contributions from nearest-neighbor interactions and hence set i = j−1. The crucial observation here is that for each "open" bond index pair we can treat the combined MPS-MPS and MPO-MPO tensor contractions over the bond (j−1, j) as a scalar product between the left and right parts of the system. We can thus write both the first and the second summand as such a scalar product for each open index pair. We can then bound the sums over the open bonds for the scalar products in terms of absolute values of expectation values of effective Hamiltonians ĥ^{L,eff}_{j−1} and, defined in a similar way with L_{j−1} → R_{j+1}, ĥ^{R,eff}_j. The formal absolute values |·| on the right-hand side can be estimated by replacing the L_j/R_j tensors with fractions of the overall energy expectation value. This is a valid approximation as long as there are no interactions connecting the left/right contracted MPS-MPO-MPS tensor networks with sites to the right/left of them, which was exactly the condition for non-vanishing contributions to the commutator. We hence compare only squares of single-site expectation values with either one or two MPO site tensors sandwiched between the effective site tensors at sites j−1, j. For a large system with smoothly varying site tensors, the differences at neighboring sites are negligible, so that the commutator can be estimated easily in terms of the coupling strength of the nearest-neighbor interaction term, which we denote by Γ.
The contributions from the boundary tensors L j /R j are bounded by the square of the system size and cancel with the prefactors 1/L of each projected Hamiltonian. We hence conclude this analysis with the somewhat straightforward statement that, in general, the error at second order in δ scales as L · Γ 2 . However, there is one special situation which simplifies the preceding arguments drastically, namely if we set |φ = Π |ψ j |ψ(t) . Then the commutator compares only local overlaps between site tensors sandwiched between either one or two MPO tensors on neighboring sites, which scale as 1/L. Therefore, equal-time observables are evolved with very high precision ∼ 1/L 2 , and the contribution due to nearest-neighbor interactions is strongly suppressed.

Algorithm
The two-site local Krylov method is described in detail in Alg. 4. It relies on some basic initialization functions familiar from the DMRG algorithm which are summarized in Alg. 3.

Algorithm 3 Common helper functions for the local Krylov method and the TDVP algorithms.
Algorithm 4 The local Krylov method. Also cf. Alg. 3 for definitions of Initialize, Contract-Left, Contract-Right and the needed overall Timestep method.

The time-dependent variational principle (TDVP)
There is an alternative to the Lie-Trotter decomposition introduced in the previous Sec. 6.1 which also results in a series of local problems: the time-dependent variational principle [28,29]. The motivation of this approach is quite different: its primary aim is to constrain the time evolution to a specific manifold of matrix-product states of a given initial bond dimension. To do so, it projects the action of the Hamiltonian into the tangent space to this manifold and then solves the TDSE solely within the manifold. While ideally used in its single-site variant, the two-site variant allows for flexibility in the bond dimension.

Derivation
The main difference between the TDVP and the local Krylov method lies in the derivation of the series of local time-dependent Schrödinger equations and the recovery of the original time step after each local forward update: Instead of simply projecting the original site tensor onto the new basis as done in the local Krylov approach, the TDVP explicitly solves a backwards-evolution equation. To embark on the derivation of the TDVP, we need to introduce a few additional ingredients: First, we define the single-site tangent space T |ψ of a given MPS |ψ as the space spanned by variations of single MPS tensors. One may, e.g., change the first site tensor of the MPS keeping all others fixed to obtain a new state and combine the result with another MPS where only the second site tensor was changed, but one may not change two (or more) site tensors in the same basis state. The projector P T |ψ which projects onto this tangent space is given by [28,29] a sum of terms where P L,|ψ j (P R,|ψ j ) projects on the sites left (right) of and including site j (cf. Fig. 14) and is exactly the same as the projectors used in the local Krylov method. As before, these projectors use the gauge-fixed left- and right-normalised MPS tensors, i.e., they depend on the MPS |ψ and can be written in terms of the collection of left- (right-) normalised MPS tensors on site j and to its left (right), as in the local Krylov approach. The first contributing sum in Eq. (158) filters for all MPS which differ at most on one site from |ψ , whereas the second contributing sum removes all those states which coincide with |ψ . Put differently, individual tangent vectors are constructed by replacing any orthogonality center tensor M j of the MPS by another tensor N j which is orthogonal to M j , i.e., M j · N j = 0. In contrast to the projectors Π |ψ j of the local Krylov method, the total projector P T |ψ projects onto some subspace of the Hilbert space and is only coincidentally written as a sum of local terms.
Second, when inserting the projector P T |ψ into the TDSE, we obtain Eq. (162). While an exact solution of Eq. (162) is still not possible, we can approximate it by solving each term individually and sequentially, i.e., solve L forward-evolving equations.
Figure 17: Right-hand side of the effective single-site forwards-evolving Schrödinger equation (with j = 4). The effective Hamiltonian Ĥ eff j is given by the cornered green, orange and red tensors. The effective state is given by the blue circled tensor M j . During the calculation, the connected dashed lines are contracted, resulting in a new tensor with three legs (the three open dashed lines).
We then multiply each individual equation above by the single-site map ψ L j−1 ⊗ ψ R j+1 or the center-bond map ψ L j ⊗ ψ R j+1 , respectively. As a result, instead of having to work with the full MPS |ψ , we can work directly with the effective single-site and effective center-matrix tensors and their associated local Schrödinger equations: The tensor contraction on the RHS of Eq. (165) is graphically represented in Fig. 17, while the RHS of Eq. (166) is shown in Fig. 18. Each of these equations can be solved with a local application of the Krylov method, much as in DMRG or the local Krylov method of the previous section.
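The local exponentials in these equations are typically computed with a few Lanczos iterations, as in Sec. 5. Below is a minimal numpy sketch of such a local Krylov exponentiation; the function name and the dense-vector interface are our own, and in an actual MPS code apply_h would be the contraction with the effective Hamiltonian of Fig. 17 acting on the flattened site tensor.

```python
import numpy as np

def lanczos_expm(apply_h, v0, delta, k=10):
    """Approximate exp(-1j*delta*H) @ v0 in a k-dimensional Krylov space.

    apply_h implements the action of the (effective) Hamiltonian on a vector;
    v0 is the starting vector (e.g. a flattened local MPS tensor)."""
    v0 = np.asarray(v0, dtype=complex)
    norm0 = np.linalg.norm(v0)
    V = [v0 / norm0]
    alpha, beta = [], []
    for _ in range(k):
        w = apply_h(V[-1])
        alpha.append(np.vdot(V[-1], w).real)
        for v in V:                      # full reorthogonalisation for stability
            w = w - np.vdot(v, w) * v
        b = np.linalg.norm(w)
        if b < 1e-12:                    # Krylov space exhausted -> result exact
            break
        beta.append(b)
        V.append(w / b)
    m = len(alpha)
    T = np.diag(alpha) + np.diag(beta[: m - 1], 1) + np.diag(beta[: m - 1], -1)
    evals, evecs = np.linalg.eigh(T)     # exponentiate the small tridiagonal T
    coeff = evecs @ (np.exp(-1j * delta * evals) * evecs[0])
    return norm0 * sum(c * v for c, v in zip(coeff, V[:m]))
```

In practice k is chosen adaptively from a convergence estimate rather than fixed.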
Sweeping right-to-left (rather than left-to-right) through the system results in solving the equations in reverse order. This turns the initial first-order integrator into a second-order integrator, reducing the time step error (as described in Sec. 6.2.2) from O(δ) to O(δ 2 ) if both sweeps are done with halved time steps δ/2.
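The order gain from combining a forward and a reverse sweep with halved steps can be checked on a toy two-term splitting, with dense anti-Hermitian generators standing in for the sequence of local propagators (this illustrates the integrator structure only, not actual sweep code):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)); A = -1j * (A + A.T) / 2   # generator -i H_A
B = rng.standard_normal((6, 6)); B = -1j * (B + B.T) / 2   # generator -i H_B

exact = lambda d: expm(d * (A + B))
# one sweep direction only: first-order Lie-Trotter splitting
lie = lambda d: expm(d * A) @ expm(d * B)
# forward sweep then reverse sweep, each with step d/2: symmetric splitting
strang = lambda d: expm(d / 2 * A) @ expm(d * B) @ expm(d / 2 * A)

err = lambda step, d: np.linalg.norm(step(d) - exact(d), 2)
# estimated convergence order from halving the step size
order = lambda step, d: np.log2(err(step, d) / err(step, d / 2))
```

The estimated order is close to 2 for the one-directional splitting and close to 3 for the symmetric one, matching the per-time-step errors O(δ 2 ) and O(δ 3 ).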
An interesting property of the single-site TDVP variant (1TDVP) is that the projection of the Hamiltonian onto the MPS manifold occurs before the time evolution and no truncation has to happen after the evolution. As such, both the norm and energy of the state are conserved under real-time evolution. Alternatively, it is straightforward to extend the mechanism to a two-site variant. This 2TDVP forward-evolves a local tensor M (j,j+1) which needs to be split into two separate site tensors again following the evolution. The advantage is that the bond dimension of the state can be adapted on the fly. However, norm and energy are now no longer conserved exactly if a truncation of the evolved bond is necessary.

Errors
The TDVP has four sources of errors: First, there is a projection error due to the projection of the full time-dependent Schrödinger equation (TDSE) onto the MPS manifold of limited bond dimension. This error is particularly severe if the MPS in question has a small bond dimension, but it is exactly zero if the MPS has maximal (exponentially growing) bond dimension. However, the projection error occurs during the projection of the TDSE onto the relevant subspace, i.e., before the time evolution. As such, it cannot lead to a violation of energy conservation or change the norm of the time-evolved state (during real-time evolution). Using a two- or multi-site variance [68] it is possible to estimate this projection error. If the n-site variance of the state is large, the (n − 1)TDVP will provide inadequate results. Vice versa, if the up-to-n-site variance of a state is small, the nTDVP will consider this state an eigenstate of the Hamiltonian and the time evolution will only add a global phase to the state. As a corollary, the 2TDVP can evolve Hamiltonians with only nearest-neighbor interactions without incurring a projection error.
Second, the chain of forwards and backwards evolutions can be considered a sequential solution of a series of coupled TDSEs (which are the result of the projection above), each describing the evolution of one particular site tensor. Except in the special case that all these evolutions describe exactly the same dynamics (due to the state having maximal bond dimension), there is a finite time-step error of order O(δ 3 ) per time step and order O(δ 2 ) per unit time. In practice, the prefactor of this error is often much smaller than, e.g., in a TEBD calculation, in particular if the bond dimension of the input state is reasonably large. If the bond dimension is very small, the time-step error will be relatively large.
Third, the 2TDVP contains an SVD to split the evolved two-site tensor into two separate tensors again. During this SVD, a truncation is typically unavoidable, leading to a measurable truncation error. As always, careful analysis of this truncation error is necessary, but it proceeds in much the same way as for the other methods. In 1TDVP, this error is exactly zero.
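Schematically, the 2TDVP split of the evolved two-site tensor and the associated discarded weight can be sketched as follows (leg ordering and names are our own conventions, not those of any particular implementation):

```python
import numpy as np

def split_two_site(theta, chi_max, cutoff=1e-10):
    """Split an evolved two-site tensor theta with legs
    (m_left, sigma1, sigma2, m_right) into a left-canonical tensor A and a
    remainder SB carrying the singular values, discarding singular values
    below cutoff and beyond chi_max.  Also returns the discarded weight,
    which quantifies the 2TDVP truncation error at this bond."""
    m_l, s1, s2, m_r = theta.shape
    U, S, Vh = np.linalg.svd(theta.reshape(m_l * s1, s2 * m_r),
                             full_matrices=False)
    keep = max(1, min(chi_max, int(np.sum(S > cutoff))))
    discarded = float(np.sum(S[keep:] ** 2) / np.sum(S ** 2))
    A = U[:, :keep].reshape(m_l, s1, keep)                 # left-canonical
    SB = (S[:keep, None] * Vh[:keep]).reshape(keep, s2, m_r)
    return A, SB, discarded
```

With chi_max large and cutoff zero the split is exact and the discarded weight vanishes; any nonzero discarded weight is precisely the truncation error discussed above.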
Algorithm 5 The 1TDVP method. For a more detailed description of the Lanczos solver used in lines 3, 7, 16 and 20 see Sec. 5. Also cf. Alg. 3 for definitions of Initialize, Contract-Left, Contract-Right and the needed overall Timestep method. To run, one first initializes the worker object using Initialize and then does incremental time steps using the Timestep function.

The fourth source of error lies in the inexact solution of the local equations. Using sufficiently many Krylov vectors locally, it is very easy to make this error small. Therefore, one should always use sufficiently many vectors such that the obtained error is at least smaller than the truncation error in the previous step.
Note that changing the time-step size δ in the TDVP affects the four errors differently: the projection and truncation errors affect each time step largely independently of the size of that step. Hence, increasing the number of time steps during a fixed total time evolution increases the projection and truncation errors. The finite time-step error and the error from the inexact local solution, on the other hand, decrease when the number of time steps is increased and the total time is kept fixed. As such, choosing a smaller δ decreases the time-step error but increases the projection and truncation errors. It is hence typically useful to take some care when choosing, e.g., the truncation threshold and the time-step size such as to approximately balance the induced errors.
Additionally, the energy and norm of the state are conserved exactly within the 1TDVP and are only affected by the truncation error in the 2TDVP. This exact conservation may extend to other conserved quantities contained within the Hamiltonian [66,67]. While such energy conservation is certainly very helpful for obtaining long-time hydrodynamic observables such as diffusion constants, care has to be taken when using only 1TDVP during the calculation, as shown in Ref. [69]. Specifically, one has to take great care to ensure that the obtained data is completely converged in the bond dimension of the state at all times.

Algorithm 6
The 2TDVP method. Also cf. Alg. 3 for definitions of Initialize, Contract-Left, Contract-Right and the needed overall Timestep method.

Algorithm
In practice, the 1/2TDVP method is quite similar to the 1/2DMRG method without subspace expansion or density matrix perturbation and nearly identical to the local Krylov method. Compared to the DMRG method, the only change is the replacement of the local eigensolver obtaining the local eigenstate by a local exponentiation obtaining the local time-evolved state. The 1TDVP algorithm is described in detail in Alg. 5, the 2TDVP algorithm in Alg. 6.

Additional tricks
Having reviewed the most commonly used methods in detail, we will now collect and summarize some generic improvements, without any claim to completeness. These are essentially tricks which are relatively independent of the actual time-evolution method and can mostly be implemented on top of all of the methods we just reviewed. They are not strictly necessary to implement in conjunction with any method and as such will not be benchmarked in detail later, but they are useful to keep in mind for particularly challenging problems.

Combining Heisenberg and Schrödinger picture time evolution
In the context of MPS methods, combining Schrödinger- and Heisenberg-picture [70,71] time evolution was first proposed in Ref. [72]. Consider a time-dependent observable between two arbitrary states: in the Schrödinger picture, we apply time-evolution operators to the states |φ and |ψ to obtain time-evolved states |φ(t) and |ψ(t) and then evaluate the time-independent observable between them. The maximum time t obtainable is then typically limited by the entanglement growth in the states, resulting in larger and larger bond dimensions or errors. In comparison, the Heisenberg picture would see us time-evolve the operator Ô while keeping the states |φ , |ψ static. Again, the maximal obtainable time t is limited by the entanglement growth in the operator Ô and the maximal bond dimension we can use to represent it. If we now combine the two evolutions, we can obtain times t 1 + t 2 while only requiring MPO and MPS bond dimensions typical of times t 1 and t 2 respectively. Note that in this case, the computationally limiting operation is no longer the time evolution itself but the evaluation of observables given as the tensor network of a large-w MPO between two large-m MPS [72].

Complex time steps
Let us assume that we have a time-evolution operator Û(δ) = 1 − iδĤ which is exact to first order. Applying this operator to a state will result in an error O(δ 2 ) compared to the exact evolution with the operator Û(δ) = e −iδĤ . Repeating the process T /δ times to obtain a state at final time T , we incur an error O(δ 2 ) · T /δ = O(δ). However, if we allow complex intermediate steps δ 1 and δ 2 , we can solve Û(δ 1 )Û(δ 2 ) = e −iδĤ + O(δ 3 ) for δ 1 and δ 2 by expanding the left-hand side. Two solutions are possible; one of them is δ 1 = (1 − i)δ/2, δ 2 = (1 + i)δ/2. Choosing these values for δ 1,2 then results in a third-order error per time step and a second-order error overall. This choice of time steps is suggested in particular in combination with the MPO W I,II method to obtain a better error per time step. The cost of the method only grows linearly with the number of evolution operators; e.g., four operators Û(δ 1,2,3,4 ) are required for a third-order error [32]. The drawback is the loss of unitarity at each individual time step, which may be disadvantageous. Furthermore, if the time evolution is purely imaginary (e.g., for finite-temperature calculations) and the Hamiltonian does not contain complex coefficients, one may avoid complex arithmetic entirely and use only real floating-point scalars for 50% less memory usage and an approximately four-fold speed-up on matrix multiplications. In that case, however, it is impossible to use this trick to reduce the time-step error.
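A quick dense-matrix check of the stated orders, using the solution δ 1 = (1 − i)δ/2, δ 2 = (1 + i)δ/2, which follows from matching δ 1 + δ 2 = δ and δ 1 δ 2 = δ 2 /2 against the Taylor expansion of e −iδĤ :

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
H = rng.standard_normal((6, 6))
H = (H + H.T) / 2                          # Hermitian test Hamiltonian
I = np.eye(6)

euler = lambda d: I - 1j * d * H           # plain first-order step

def complex_steps(d):                      # two complex substeps
    d1, d2 = (1 - 1j) * d / 2, (1 + 1j) * d / 2
    # the product is 1 - i(d1+d2)H - d1*d2*H^2 = 1 - i d H - d^2 H^2 / 2,
    # i.e. it matches exp(-i d H) through second order in d
    return (I - 1j * d2 * H) @ (I - 1j * d1 * H)

err = lambda step, d: np.linalg.norm(step(d) - expm(-1j * d * H), 2)
# estimated convergence order from halving the step size
order = lambda step, d: np.log2(err(step, d) / err(step, d / 2))
```

The estimated per-step order is close to 2 for the plain first-order step and close to 3 for the pair of complex substeps.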

Green's functions 1: Removal of low-lying states
This trick was first proposed in Ref. [23] and is relatively straightforward to implement. Assume a ground state |0 as an MPS obtained via DMRG, and let us shift the Hamiltonian such that this state has energy E 0 = 0, Ĥ|0 = 0. We are then interested in a time-dependent observable x(t), where Â and B̂ are typically local operators such as creators or annihilators. The evolution of B̂|0 is generically non-trivial, and if we want to capture all frequency contributions in x(t), we need to evolve until at least times t ′ = 1/E 1 , where E 1 is the energy of the lowest eigenstate with non-zero energy |1 contained in B̂|0 . In contrast, to capture contributions of higher-energy states |n with energies E n > E 1 , we only need to evolve to shorter times t = 1/E n < t ′ . However, a few low-lying eigenstates can often also be calculated with DMRG by orthogonalizing against previously-found eigenstates. Hence, if we run DMRG multiple times, we can obtain not just the ground state |0 but also further eigenstates |1 , |2 , etc. If we use quantum numbers and B̂ changes the quantum number of the state, these additional eigenstates should be calculated in the quantum-number sector of B̂|0 . If we then orthogonalize B̂|0 against |1 , |2 , etc., we remove the contributions which rotate (in real-time evolutions) or decay (in imaginary-time evolutions) the slowest and hence require the longest time evolutions. The evolution of the removed states can then be done exactly, as we know both their energies and initial weights in B̂|0 . Even missing one of the eigenstates due to convergence problems with DMRG does not introduce an error but merely decreases the effectiveness of the method.
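The bookkeeping of this trick can be illustrated with a small exact-diagonalization toy model (all names here are ours; in practice |0 and |1 would be MPS from DMRG and the evolution an MPS time stepper):

```python
import numpy as np
from scipy.linalg import expm, eigh

rng = np.random.default_rng(2)
H = rng.standard_normal((8, 8)); H = (H + H.T) / 2
E, V = eigh(H)
H = H - E[0] * np.eye(8)            # shift so the ground-state energy is zero
E = E - E[0]
g = V[:, 0]                         # ground state |0>
B = rng.standard_normal((8, 8))     # stand-in for the local operator B
psi = B @ g                         # the state B|0> to be evolved

# project out the slowest-rotating eigenstate |1> before the time evolution
c1 = V[:, 1] @ psi
rest = psi - c1 * V[:, 1]

# evolve only the remainder and add the removed state back in exactly,
# using nothing but its known energy E[1] and initial weight c1
t = 3.7
evolved = expm(-1j * t * H) @ rest + c1 * np.exp(-1j * t * E[1]) * V[:, 1]
```

Since |1 is an exact eigenstate, its analytically added phase reproduces the full evolution exactly; in an MPS setting the benefit is that the slow mode no longer limits the required evolution time.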

Green's functions 2: Linear prediction
Calculating dynamical structure factors or, more generally, spectral functions from time-dependent data requires two Fourier transformations: first from real space to momentum space and second from real time to frequency. The former transformation is typically unproblematic, but the latter suffers either from overdamping or from strong spectral leakage if the available maximal time t is insufficient. Linear prediction [73][74][75] assumes that the real-time momentum-space Green's function G(k, t) is composed of multiple distinct exponentially decaying and oscillating contributions arising from a distinct pole structure of G(k, ω). If this is true for a time series x 1 , x 2 , x 3 , . . ., an additional data point x̃ n can be approximated well by the form x̃ n = −Σ p i=1 a i x n−i with suitably chosen coefficients a i independent of n. We hence first compute a finite time series which is still as long as we can manage with time-dependent MPS methods. Subsequently, we need to find coefficients a i such that the above holds for the data we computed exactly. Using those a i , we can then extend the time series to arbitrarily long times, generating a series long enough that a subsequent Fourier transform only requires minimal damping and hence provides clear features.
It is useful to divide the available calculated data into three segments: first, one should discard an initial interval of the data. To select the a i , we want to minimize the prediction error over a training window. Note that to evaluate x̃ k , the number of training points D must be larger than the number of coefficients p. The coefficient vector a is obtained from a linear system involving a matrix R and a vector r built from the training data. Once the a i are obtained, data can ideally be generated first for the interval [x D+N , . . . , x max ] and, once verified to coincide with the calculated data, extended to arbitrary times. Several numerical pitfalls need to be considered here: First, the matrix R may be singular. Two possible remedies are the addition of a small shift ε or a reduction of the number of parameters p. Ideally the latter should be considered, but it may lead to problems finding the optimal non-singular p. Second, if we construct the vector of the last p data points, we can move it forward one step by multiplication with a companion matrix A whose eigenvalues α i contain the frequencies and dampings of the aforementioned oscillations and exponential decays. As such, eigenvalues with |α i | > 1 are unphysical and need to be dealt with; it appears [75] that setting those contributions to zero works best.
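A compact numpy implementation of this scheme might look as follows; the sign convention x̃ n = −Σ i a i x n−i , the normal equations and the companion-matrix clamping of |α i | > 1 follow the description above, while variable names and defaults are our own illustrative choices:

```python
import numpy as np

def linear_predict(x, p, n_extra, eps=0.0):
    """Extend the complex time series x by n_extra points using linear
    prediction with p coefficients, x~_n = -sum_i a_i x_{n-i}."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    # normal equations R a = -r, accumulated over the training data
    R = np.zeros((p, p), dtype=complex)
    r = np.zeros(p, dtype=complex)
    for n in range(p, N):
        past = x[n - p:n][::-1]            # (x_{n-1}, ..., x_{n-p})
        R += np.outer(past.conj(), past)
        r += past.conj() * x[n]
    a = np.linalg.solve(R + eps * np.eye(p), -r)
    # companion matrix moving (x_{n-1},...,x_{n-p}) one step forward
    A = np.diag(np.ones(p - 1), -1).astype(complex)
    A[0, :] = -a
    w, U = np.linalg.eig(A)
    w[np.abs(w) > 1] = 0.0                 # clamp unphysical growing modes
    A = U @ np.diag(w) @ np.linalg.inv(U)
    out = list(x)
    vec = x[-1:-p - 1:-1].copy()
    for _ in range(n_extra):
        vec = A @ vec
        out.append(vec[0])
    return np.array(out)
```

For data that truly consists of a few decaying oscillations, the extension reproduces the exact continuation to high accuracy.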

Purification insertion point (PIP)
Calculating out-of-time-ordered correlators (OTOCs) allows us to measure the scrambling of quantum information and finds many interesting current applications. In general, an OTOC of operators Ŵ , V̂ is given as an ensemble average. At finite temperature, we have to use a purification to evaluate this quantity. If we calculated the time evolutions in F V̂ ,Ŵ β (t) naively by direct evolution only in the physical degrees of freedom, we would require O(N 2 ) time steps to obtain the OTOC at time t = N δ. Clearly, the growing numerical expense prevents reaching both large system sizes and long time scales t. The graphical notation (see Fig. 19) immediately suggests transforming the operators in the OTOC in some way so as to evenly distribute the required time evolutions, leading to an effort that scales only linearly in time t. In the following, we will explain how to transform these operators in the purification picture and alter the purification insertion point (PIP). For related work in the framework of matrix-product operators, cf. Ref. [76].
Consider the ensemble average F X̂,Ŷ ,Ẑ,β ≡ Tr ρ(β)ẐŶ X̂ for some global operators X̂, Ŷ , Ẑ at inverse temperature β. Using the cyclic property of the trace, the ensemble average can be written as an expectation value in the enlarged Hilbert space, where we have introduced the purified finite-temperature state | β /2 ≡ ρ( β /2) |0 based on the infinite-temperature state |0 (cf. Sec. 2.7). A graphical representation of recasting the trace into an expectation value is given by the two networks in Fig. 20a and Fig. 20b, with out-going indices representing row vectors and in-going indices representing column vectors.
From the pictorial representation we motivate the infinite-temperature state |0 to be represented by a rank-(2, 0) tensor |0 ≡ Σ a,b D a,b |a |b and correspondingly 0| by a rank-(0, 2) tensor 0| ≡ Σ a,b D a,b a| b |, where we have placed a bar over those indices labeling ancilla degrees of freedom.
These tensors have to fulfill orthogonality conditions so that the tensor elements can be chosen to be D a,b ≡ D a,b δ ā b and D a,b ≡ D a,b δ b̄ a . When contracted over physical degrees of freedom, the action of these tensors is to convert row vectors into column vectors and vice versa. If we now interpret indices carrying a bar as maps between ancilla degrees of freedom, we can reformulate the purification in terms of the D tensors. Inserting identities on the physical Hilbert space between ρ and X̂ as well as between X̂ and Ŷ , and making explicit use of the representation of D, we obtain transformed operators X̃ ρ̃( β /2) ≡ ρ t ( β /2) X t acting on the ancilla space H A , i.e., we have shifted the purification insertion point. Again, these manipulations can be represented efficiently in a graphical notation and are given in Fig. 20c.
Using this procedure, we can rewrite the OTOC F V̂ ,Ŵ β (t). Defining the initial states and their purified time evolutions, the OTOC can be obtained by calculating the corresponding expectation value. We hence only need N steps to evaluate all expectation values for times t = N δ. From a more general point of view, shifting the purification insertion point in the OTOCs reformulates the multiple Schrödinger time evolutions of the physical system in the canonical choice of the PIP into a Heisenberg time evolution, on both the physical and ancilla systems, of a generalized observable V̂ † P ⊗ V̂ t A .
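The essence of shifting the PIP, namely that a physical operator can equivalently act (transposed) on the ancilla leg of the purification, can be verified with dense matrices (a toy check, not MPS code):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
d = 6
H = rng.standard_normal((d, d)); H = (H + H.T) / 2
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
beta = 0.8

# |0> = sum_a |a>|a-bar>, stored as a d x d matrix: operators on the physical
# leg act by left multiplication, operators on the ancilla leg by right
# multiplication with the transposed matrix
psi = expm(-beta / 2 * H)                     # |beta/2> = rho(beta/2)|0>
Z = np.trace(expm(-beta * H))

lhs = np.trace(psi.conj().T @ X @ psi) / Z    # canonical PIP: X on physical leg
rhs = np.trace(psi.conj().T @ psi @ X) / Z    # shifted PIP: X^T on ancilla leg
direct = np.trace(expm(-beta * H) @ X) / Z    # Tr[rho(beta) X] for reference
```

Both insertion points reproduce the thermal expectation value; in the OTOC setting this freedom is what allows distributing the time evolutions evenly.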

Local basis optimization
While the dimension of local Hilbert spaces is typically very limited in spin and electronic systems, bosonic systems potentially require a large local dimension σ = O(100). As this local dimension typically enters at least quadratically in some operations on matrix-product states, a way to dynamically select the most relevant subspace of the local Hilbert space is potentially extremely helpful. The local basis transformation [77][78][79] method provides just this: by inserting an additional matrix U j on each physical leg of the MPS (cf. Fig. 21), the large physical dimension is transformed into a smaller effective basis. The rank-3 MPS tensor then only has to work with this smaller effective basis. The method was adapted for TEBD time evolution in Ref. [79], but it is also straightforward to use in the other time-evolution methods presented here. For the MPO W I,II and global Krylov methods, only the MPO-MPS product has to be adapted to additionally generate a new optimal local basis after each step. The TDVP and local Krylov methods translate [35,80,81] directly in much the same way as DMRG.
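The core of the method, obtaining an optimal truncated U j from the singular vectors of the physical leg, can be sketched in a few lines of numpy (leg ordering and names are our own conventions):

```python
import numpy as np

def optimal_local_basis(M, d_eff):
    """For a single MPS site tensor M of shape (m_l, sigma, m_r), compute a
    truncated local basis transformation U (sigma x d_eff) and the site
    tensor compressed into that effective basis."""
    m_l, sigma, m_r = M.shape
    # singular vectors of the physical leg, i.e. the eigenbasis of the local
    # reduced density matrix of that leg
    mat = np.transpose(M, (1, 0, 2)).reshape(sigma, m_l * m_r)
    U, S, _ = np.linalg.svd(mat, full_matrices=False)
    U = U[:, :d_eff]                                   # keep d_eff basis states
    M_eff = np.einsum('sk,lsr->lkr', U.conj(), M)      # compressed site tensor
    return U, M_eff
```

If the tensor actually lives in a d_eff-dimensional physical subspace, applying U to M_eff recovers M exactly; otherwise the truncation keeps the most relevant local states.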

Examples
The following four subsections serve as exemplary applications of the different time-evolution methods to demonstrate, justify and verify the theoretical remarks of the earlier reviews of each method. To some extent it is our hope that the examples here are sufficiently general to be used as benchmarks for future time-evolution methods as a way to increase the comparability with previous work. A pairwise comparison of a new method with one of the methods tested here would hence also serve as a comparison with all other methods tested here. To this end, we focus on a clear description of the problem setting and analysis of the runtime behaviour of the methods instead of analyzing the physical results in great detail.
In Sec. 8.1, we will calculate the dynamical structure factor of an XXZ spin chain in a staggered magnetic field at zero temperature. This is a standard application of one-dimensional time-dependent MPS techniques which is straightforward to reproduce in any implementation. Sec. 8.2 studies the cooling of a Hubbard chain in the canonical ensemble as an application of imaginary-time evolution and a study of the stability of each evolution method against the build-up of errors. Sec. 8.3 considers the melting of Néel order on a two-dimensional lattice in real time, a very challenging problem due to the very long-range interactions which come into play when projecting the system onto a chain geometry, a necessary step for treating higher-dimensional systems with MPS. Finally, in Sec. 8.4 we simulate the evolution of out-of-time-order correlators on an interacting spin chain at infinite temperature. This problem setting combines the need for a purification ansatz with real-time evolution and the calculation of different-time correlators, which, like the dynamical structure factor, requires the correct treatment of phases during the evolution.

Dynamical spin structure factor (DSF)
In this section we examine the longitudinal dynamical spin structure factor (DSF) S zz (q, ω). This quantity can be measured directly in, e.g., neutron-scattering experiments. Our system of choice here is an anisotropic S = 1/2 Heisenberg chain in a staggered magnetic field, described by the Hamiltonian of Eq. (199) with |0 its ground state. We discretize the time coordinate t n = nδ and truncate the Fourier integral at a finite maximum evolution time T = N δ. It is of course important to choose the system size L sufficiently large that the initial central excitation does not reach the edges of the open chain during the finite simulation time T . Due to this finite simulation time T , it is also necessary to introduce a damping factor η > 0. Without the damping factor, we would observe large spectral leakage due to the finite interval; with η > 0, we obtain an artificial broadening of spectral lines instead. The overall simulation procedure was to calculate the ground state |0 of Eq. (199) with DMRG at a maximum discarded weight of 10 −14 and energy convergence better than 10 −12 . The excited state |1 = ŝ z L /2 |0 was then evolved in time until the final time T = 200 in units of ∆ was reached. During the evolution, we need to evaluate the time-dependent correlator S zz (j, t n ), where the term 0|ŝ z j ŝ z L /2 |0 is of course only evaluated once. The last equality only holds for representations of the time stepper Û (δ) which act on the ground state by multiplying a phase, i.e., Û (δ) |0 = e −iE 0 δ |0 . This should ideally be the case, but in practice it is not achieved by all methods. In general, S zz (q, ω) is very sensitive to the build-up of global phases by the particular time stepper, rendering this quantity an important test case.
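The discretized double Fourier transform with damping described above can be sketched as follows (the momentum grid, phase conventions and normalisation are our own illustrative choices):

```python
import numpy as np

def dsf(C, dt, eta, omegas):
    """Transform C[j, n], the real-space, real-time correlator on an L-site
    chain at times t_n = n*dt, to momentum and frequency with damping
    exp(-eta*t) against spectral leakage.  Returns S of shape
    (L, len(omegas))."""
    L, N = C.shape
    t = dt * np.arange(N)
    damped = C * np.exp(-eta * t)[None, :]
    j = np.arange(L) - L // 2                    # sites measured from centre
    k = 2 * np.pi * np.arange(L) / L
    Ck = np.exp(-1j * np.outer(k, j)) @ damped / L   # space -> momentum
    phases = np.exp(1j * np.outer(t, omegas))        # time -> frequency
    return 2 * dt * np.real(Ck @ phases)
```

For a synthetic single-mode correlator the resulting S peaks at the mode's momentum and frequency, which makes this a convenient sanity check before applying it to MPS data.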

Real-space evolution
In Fig. 22, S zz (j, t n ) as obtained from SyTen 2TDVP and TEBD2 is shown in the top panels. Here we already see the main error source when using the raw data to calculate the spin structure factor: a global phase shift in the correlator, which can be identified as a uniformly evolving stripe pattern in the TEBD2 data superimposed on the perturbation light cone. When calculating the Fourier transformations, this uniformly evolving phase will cause a k = π signal which may ultimately overlay the relevant spectral information. In the bottom panels of Fig. 22 we plot the low-energy part of the DSF with a damping η applied such that e −200η = 0.1. This plot reveals a dominant k = π feature visible in the TEBD2 data which is absent in the 2TDVP data. At least to some extent, this phase shift can be reduced by calculating the explicit time evolution of the ground state Û (t n ) |0 ≡ |0(t n ) in the evaluation of S zz (j, t n ).

Spectral functions
More generally, most methods do not evolve the area outside the light cone exactly. That is, while in this region the Hamiltonian should lead to trivial dynamics, small errors are introduced. Given enough time, these errors can accumulate, with two major results: (i) the slightly erroneous state is now even less of an eigenstate of the (effective) Hamiltonian and hence acquires dynamics, leading to a runaway effect; (ii) additional entanglement accumulates and makes the calculation more difficult. The precise way in which this error occurs depends slightly on the method: For TEBD2 and the MPO W II method, the error is simply a finite time-step error due to the Trotterization or other δ-dependent error terms. The local Krylov method is primarily subject to the error created by the basis transformation after solving each local problem, which is proportional to the step size δ. This error affects every site and, after comparatively short times, results in site tensors which have lost the property of being exact eigenstates of the effective Hamiltonian outside the light cone. Altogether, these methods can produce results at time step δ = 0.1, but the results only qualitatively reproduce the spectrum, require a large bond dimension m and also carry a large spectral weight around ω = 0.
For the global Krylov method, we had initially selected a Krylov error < 10 −6 , believing that this error should be sufficiently small to provide good results. However, it turns out that an extremely small Krylov error < 10 −10 is necessary to avoid the runaway effect and the accumulation of errors outside the light cone (observed best in Figs. 25 and 26). For this method, the additional dynamics also make the problem more complicated, hence requiring more Krylov vectors for a precise solution. As these Krylov vectors are typically more strongly entangled, their truncation will be more severe. Conversely, the Krylov error at a fixed number of Krylov vectors will increase, leading to yet larger errors. In effect, the method runs into an entirely artificial exponential wall, built from accumulated errors outside the light cone and stopping the calculation in just a few steps (cf. Fig. 26). Selecting a smaller time step size δ = 0.01 and disabling the extrapolation (cf. Sec. 5.4.2) -hence forcing a very small Krylov error -leads to a very precise calculation which is also competitive in runtime.
The 2TDVP, in comparison, does evolve the area outside of the light cone exactly: the MPS tensors there are exact eigenstates of the local effective Hamiltonian, as they are the result of a DMRG ground-state search procedure. As such, 2TDVP also produces good results at δ = 0.1 and (like the global Krylov method at δ = 0.01) no spectral weight around ω = 0. Fig. 23 shows the light cones produced by 2TDVP, TEBD2, the global Krylov method and the local Krylov method. 2TDVP and the global Krylov method give small, homogeneous correlations outside the light cone. With sufficiently large frequency resolution, we could therefore find some spectral weight exactly at ω = 0; in practice, our resolution is too low and the prefactor too small for this effect to be visible. TEBD2, in contrast, also results in slow dynamics in this region. These dynamics can be seen in some remaining spectral weight close to ω = 0, but the light cone itself is reproduced very well. The local Krylov method, in contrast, immediately gives large contributions to the correlations outside the light cone. These correlations (i) give large spectral weight around ω = 0 even with δ = 0.01 and (ii) result in a shift of the overall spectral function.
As an example of the effects discussed above, consider S^zz(k = π, ω) plotted in Fig. 24. Most importantly, all methods are able to resolve the location of the primary peak at ω ≈ 0.5 and also the subsequent peaks at ω ≈ 0.67, ω ≈ 0.8 and ω ≈ 0.88. However, only 2TDVP produces the "correct" function at δ = 0.1 and is only joined in this by the global Krylov method at δ = 0.01. All other methods produce additional unphysical features such as an overall shift (local Krylov method at both δ values; TEBD2 and the MPO W II method at δ = 0.1) or spectral weight towards ω = 0 (TEBD2, MPO W II and local Krylov methods at all δ).
(Fig. 24 caption: All other methods produce some spectral weight around ω = 0; the errors of the local Krylov method also produce consistently negative intensities. Inset: full spectrum as produced by 2TDVP for these parameters. The x-axis of the main graph corresponds to a vertical line at k = π in the inset.)
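A spectral function such as S^zz(k, ω) is obtained by Fourier transforming the real-time correlation data; a minimal sketch of the final transform step follows (the Gaussian window and all parameter values are our assumptions, not the paper's actual post-processing):

```python
import numpy as np

def spectrum(C_t, dt, sigma, pad=4):
    """One-sided time -> frequency transform of a correlator C(t).

    Illustrative sketch: a Gaussian window suppresses ringing from the
    finite reachable time; `pad` only refines the frequency grid.
    """
    t = np.arange(len(C_t)) * dt
    Cw = np.asarray(C_t) * np.exp(-0.5 * (t / sigma) ** 2)
    omegas = 2 * np.pi * np.fft.fftfreq(pad * len(C_t), d=dt)
    S = np.array([(dt / np.pi) * np.real(np.sum(np.exp(1j * om * t) * Cw))
                  for om in omegas])
    order = np.argsort(omegas)
    return omegas[order], S[order]

# a correlator oscillating at omega0 yields a peak at omega = omega0
omega0, dt = 0.5, 0.05
C_t = np.exp(-1j * omega0 * np.arange(500) * dt)
freqs, S = spectrum(C_t, dt, sigma=8.0)
```

The window width `sigma` controls the trade-off discussed in the text: a broader window sharpens peaks but amplifies artifacts from the (error-contaminated) late-time data.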

Benchmark
Finally, we compare the runtime and growth of the bond dimension in the different methods in Figs. 25 and 26. All calculations were performed single-threaded on an Intel Xeon Gold 6150 CPU with 192 GB of RAM and no hard-disk caching. The growth of the bond dimension is slower when the time step used is smaller (especially δ = 0.01), suggesting that at least some part of the entanglement structure is wrong due to a finite time-step error. At δ = 0.01, the bond dimension grows slowest during the 2TDVP evolution, resulting in a bond dimension at t = 200 roughly a factor of two smaller than for the local Krylov and MPO W II methods. The local Krylov method at δ = 0.1 and δ = 0.05 very quickly saturates the maximal bond dimension m = 200, which is consistent with the picture of the site tensors being pushed away from the ground state. The strong dependence on the step size is related to the error induced by the basis transformation, which in turn can be estimated by the overlap between the unevolved and evolved state and is reduced for smaller time steps. Curiously, the situation is comparable for the global Krylov method even though the underlying reasoning is very different. In this case the strong dependence of the bond dimension on the time step is due to the incurred Krylov error. Once the site tensors are no longer the ground-state solutions of the reduced two-site problem, the incurred Krylov error increases even faster as the overall state rapidly evolves away from the global ground state and the small Krylov subspace cannot faithfully represent the action of the operator exponential.
The growth of the bond dimension also translates directly into the computational effort, plotted in Fig. 26. Consistently, we find the local Krylov method to be much slower than the comparable 2TDVP even with the smallest time step. Compared to TEBD2, the third method which produces good data at least in the range 0.4 < ω < 1, the 2TDVP performance is acceptable though not superb. The global Krylov method also performs comparably well as long as the time step is small enough (δ = 0.01) but is completely unsuitable at δ = 0.1.

Conclusion
We calculated the dynamical spin structure factor, which boils down to the evaluation of non-equal-time correlation functions. In our analysis, we find 2TDVP to provide both the best numerical data and very efficient calculations due to its numerical stability also at larger time steps. In fact, it is the only method which generated a stable time evolution if we set δ = 0.1. The global Krylov method provides equally good numerical results as long as its Krylov errors are sufficiently small, which unfortunately requires a very small time step and makes recycling of Krylov subspaces (cf. Sec. 5.4.2) impossible. Putting the focus on the runtime, we find that TEBD2 provides a satisfying trade-off between quick MPO-MPS application also at smaller time steps and satisfying numerical results, even though there is a loss of spectral weight at very small energies ω ≈ 0 due to the build-up of global phases during the time evolution. Finally, the local Krylov and the MPO W II methods are apparently the most susceptible to phase errors, with very large errors at δ = 0.1. Even at δ = 0.01, the local Krylov method gives large spectral leakage over the entire frequency range, while the MPO W II method gives relatively reasonable data, though slower and of worse quality than TEBD2.
An interesting remark is that, counterintuitively, higher-precision calculations can be faster than lower-precision calculations. That is, as long as the time-evolution method only induces very small errors, the area outside the light cone stays nearly a (local) eigenstate of the Hamiltonian. If instead large errors are made by the method, these nonphysical errors then need to be evolved also in subsequent steps, which may be much harder than "simply" solving the physical problem. The most obvious example of this behaviour is the global Krylov method. In fact, however, all methods exhibit larger bond dimensions at larger time-step sizes (related to larger Trotterization errors) and a faster growth of bond dimensions, cf. Fig. 25. Without the artificial limit to m = 200, we would hence expect an eventual cross-over where the more precise calculation at δ = 0.01 with ten times more steps per unit time is faster than the large-step calculation at δ = 0.1.

Cooling of a doped Hubbard chain
Let us now consider the cooling of a Hubbard chain from infinite temperature (β = 0) to a finite inverse temperature β > 0. We will work with a purified state in a canonical ensemble [50] conserving both particle number and spin projection S^z, i.e. we implement the associated U(1)_N × U(1)_{S^z} symmetries. The local physical dimension is four, with each quantum number sector (N, S^z) containing only a single state (i.e., (0, 0), (1, −1/2), (1, +1/2) and (2, 0)). The physical chain has a length of L = 24 sites, resulting in an MPS with L = 48 tensors. Physical sites will be labelled by indices 1, . . . , 24, the auxiliary sites as a(1), . . . , a(24). The initial state is constructed (cf. Sec. 2.7) by applying the operator 20 times to a vacuum state, resulting in 5/6 filling. The generated state has a maximal bond dimension m = 80 and is used as the initial state for imaginary-time evolution under the Hubbard Hamiltonian with hopping t = 1 and interaction U = 4. To simplify notation in this example, we will time-evolve using the evolution operator e^−δĤ and refer to "times" via the real-valued β variable. That is, we drop the prefactor −i in δ (which would otherwise be needed to obtain an imaginary-time evolution from e^−iδĤ) and we also drop the factor of 1/2 (which relates the target "temperature" β as the result of the evolution to the actual inverse temperature β in the physical system, cf. Sec. 7.5). In practice, all calculations are done using complex-valued arithmetic and negative imaginary time steps.
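The sign convention just described can be checked directly in a toy calculation (an arbitrary 2×2 Hermitian matrix stands in for Ĥ here): feeding a negative imaginary step into a real-time propagator reproduces e^−δĤ.

```python
import numpy as np
from scipy.linalg import expm

# Toy check of the convention described in the text: an imaginary-time
# step e^{-delta H} equals the real-time propagator e^{-i tau H} with
# tau = -i * delta (complex arithmetic throughout).
H = np.array([[1.0, 0.3], [0.3, -0.5]])   # stand-in Hermitian "Hamiltonian"
delta = 0.05
U_imag = expm(-delta * H)                 # direct imaginary-time step
U_subst = expm(-1j * (-1j * delta) * H)   # real-time code with tau = -i*delta
assert np.allclose(U_imag, U_subst)
# note: e^{-delta H} is not unitary, so the evolved state must be renormalized
```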
In the following, we primarily consider the long-range spin-spin correlator Z_1 = Σ_{j=2}^{L} ⟨ŝ^z_1 ŝ^z_j⟩. The on-site term ⟨ŝ^z_1 ŝ^z_1⟩ is about the same order of magnitude as all the long-range terms, which is why it is not included here. During the cooling simulation, the value of this correlator evolves monotonously from 0 until it reaches a plateau, ideally at the ground-state value (however, in practice, it may over- or undershoot, see below). In an error-free calculation, the simulated state would then be the ground state of the Hamiltonian and only acquires a global prefactor upon further imaginary-time evolution. In practice, due to accumulated errors, particles are able to leave the physical system and move into the auxiliary system of equal size, and the state can "tunnel" into the global ground state in the combined physical and auxiliary degrees of freedom, which is not what is looked for. Measuring the particle number in the physical
system is, therefore, a measure for the stability of the procedure. This process is monitored by evaluating N̂_p = Σ_{j=1}^{L} n̂_j and comparing it to its true value, which is N_p = 20 here. We run calculations at bond dimensions m = 100, 200, 300, 400 and 500 and time-step sizes δ = 0.01, 0.05 and 0.1 up to β = 20. In this problem, the global Krylov method has to be disqualified directly, as the highly entangled Krylov vectors generated by it exceed the time and memory resources of the other methods by more than a factor of 100 (CPU time) and 20 (RAM), respectively. The 1TDVP method is also unsuitable here (cf. Sec. 8.2.3 below). We hence focus on the 2TDVP and MPO W II methods (both with SyTen and SciPal) as well as the second-order TEBD and local Krylov methods (with SyTen only). SyTen was configured to apply MPOs (in the MPO W II method and TEBD2) using the zip-up method, whereas the MPO W II method in SciPal is configured to use the zip-up method followed by a few variational optimization sweeps. As always, for the plots we have selected the more suitable of the two implementations.

Loss of particles
As we would like to simulate a canonical ensemble, it is quite relevant that loss of particles to the auxiliary system does not occur too early. Studying N_p, we find, first, that increasing the step size usually reduces the loss of particles (cf. Fig. 27). Second, there is no clear relation between the selected bond dimension and the onset of particle loss: e.g., 2TDVP at δ = 0.1 starts losing particles at m = 400 around β = 15 but is stable until β = 17 for m = 100, 200, 300 and 500 (not shown); similarly, while the MPO W II method is stable at δ = 0.05 for m = 100 and m = 500, it loses particles at m = 300 (cf. Fig. 28). As expected, the behaviour of the local Krylov method and 2TDVP on this relatively large-scale observable is very comparable.
The configuration difference in the MPO W II routines to apply the operator to the state between SyTen and SciPal leads to different results here: SyTen was configured to only use the zip-up method whereas SciPal also did a few variational sweeps afterwards to optimize the state. This variational optimization leads to considerably better stability and mostly no loss of particles at all, but is computationally more expensive.

Spin-spin correlator
Our proxy for a long-range observable, the sum of correlators Z_1, attains values comparable to the ground-state value around β = 15 to β = 18. This is already deep in the regime where particles tend to be lost to the auxiliary system. Additionally, it is often unnecessary to consider this point when using a finite-temperature method, as we could conceivably use DMRG directly to obtain ground-state properties.
Hence, we will concentrate on a higher temperature at β = 1 and consider the value of the correlator as a function of m and δ, cf. Fig. 29. The first relevant observation is that 2TDVP, TEBD2 and the MPO W II method converge to the same result once m ≥ 300 and (for TEBD2 and MPO W II ) δ = 0.01. Curiously, the local Krylov method appears to suffer a relatively large time-step error here; only its results with δ = 0.01 are comparable to the other methods. In contrast, 2TDVP provides very good results starting at m = 200 which are also nearly independent of the time-step size. Interestingly, the remaining dependence of the 2TDVP error on the step size is such that the δ = 0.01 calculation has the highest deviation from the other data points at m ≥ 300. That is, while the MPO W II and TEBD2 methods suffer from a simple finite time-step error which becomes consistently smaller with smaller time steps, the behavior of 2TDVP is more complicated. It can be understood by noting that the projection error is 2TDVP's primary error source: the projection error acts on each individual time step, hence increasing the number of time steps also increases the accumulated projection error. As the initial state is exact at m = 80, we cannot increase the MPS manifold to make up for the increased projection error.
Here again we observe that the variational optimization following the zip-up considerably reduces the error in the MPO W II method as used by SciPal compared to the SyTen implementation (not shown). In contrast, 2TDVP provides exactly the same result in both implementations.

Projection error in 1TDVP and 2TDVP
We have excluded 1TDVP from the set of viable time-evolution methods here, as the nature of the problem, global cooling and redistribution of particles, does not lend itself to the very restricted manifold available to 1TDVP. Indeed, imaginary-time evolution with 1TDVP results in a fixed state around β = 3 with energy zero. In comparison, a 2TDVP calculation at β = 3 results in an energy of ≈ −17.2. Additionally, calculating the one-site variance and the two-site variance [68] (initially, they sum to ≈ 45.9) in both cases shows that 1TDVP continuously minimizes the one-site variance but leaves the two-site variance invariant. 2TDVP minimizes the two-site variance at the first time step and then slowly minimizes the one-site variance. The large initial value of the one- and two-site variances suggests that the three-site variance is also large, and 2TDVP's abrupt initial minimization of the two-site variance also suggests that its projection induces some relatively large error which cannot be reduced further by using smaller step sizes.
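The role of the variance as an eigenstate diagnostic (of which the one- and two-site variances of Ref. [68] are the efficiently computable components) can be illustrated with a dense-matrix sketch; the function name and test matrices here are our own:

```python
import numpy as np

def energy_variance(H, psi):
    """<psi|(H - E)^2|psi> with E = <psi|H|psi>; zero iff psi is an eigenstate."""
    psi = psi / np.linalg.norm(psi)
    E = np.real(np.vdot(psi, H @ psi))
    Hpsi = H @ psi
    return np.real(np.vdot(Hpsi, Hpsi)) - E**2

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 10))
H = (A + A.T) / 2
_, vecs = np.linalg.eigh(H)
assert energy_variance(H, vecs[:, 0]) < 1e-10          # eigenstate: zero variance
assert energy_variance(H, rng.normal(size=10)) > 1e-3  # generic state: nonzero
```

A stalled 1TDVP evolution that has driven only this quantity to zero may thus still be far from the true imaginary-time trajectory, as the two- and higher-site variances remain large.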

Runtime comparison
In this experiment, each calculation was run on a single core of a Xeon E5-2630 v4 with 64 GB of RAM and no hard-disk caching. Comparing run times, the major advantage of the 2TDVP due to its very limited time-step error is apparent. Judging from Fig. 29, the MPO W II method and TEBD2 both require a step size of δ = 0.01 to provide data comparable to 2TDVP at δ = 0.1. Presumably, the local Krylov method would need an even smaller time-step size to provide perfect results. Per individual time step, we compare runtimes in Fig. 30. This runtime per step is lowest for the TEBD2 method and highest for the MPO W II method; the 2TDVP and local Krylov methods perform steps similarly quickly. The relatively long runtime of the MPO W II method is largely due to the variational optimization necessary to obtain good results; the MPO W II method in SyTen (using only the zip-up) is approximately a factor of two faster than TEBD2 but also incurs large errors and loss of particles.
Taking into account the larger time step available with 2TDVP compared to TEBD2 and the MPO W II method (as seen in Fig. 29), the 2TDVP method is certainly the fastest approach here. We also confirm that if the number of Krylov vectors is chosen adaptively during the local solution of the 2TDVP problem, the 2TDVP displays slightly longer CPU times per step for larger time steps (as more local Krylov vectors are required). The MPO W II and TEBD2 CPU times per step generically do not depend on the step size. Only at m = 500 is the bond dimension not yet saturated at β = 1 if too few time steps have been taken, resulting in slightly lower runtimes for larger step sizes.

Conclusion
To summarize the results of this example, we find that for small β, 2TDVP provides adequate results very quickly, with an overall speed-up of 5-10 compared to TEBD2 and the MPO W II method. The local Krylov method incurs large errors which, while its runtime is comparable to 2TDVP, make it unsuitable here. At large β, the MPO W II method with variational optimization tends to be the most stable, displaying nearly no particle loss. We did not investigate a possible combination of 2TDVP initially with a later switch to 1TDVP once the bond dimension has grown sufficiently: this would then ensure stability against particle loss at large β combined with the computational efficiency of the TDVP method here.

Melting of Néel order in the two-dimensional Heisenberg model
In this example we study the melting of Néel order in the ferromagnetic Heisenberg model on the two-dimensional square lattice of side length L. The Hamiltonian reads Ĥ = J Σ_⟨i,j⟩ ŝ_i · ŝ_j (with ferromagnetic coupling J < 0), where ⟨i, j⟩ denotes nearest-neighbor site indices. The initial state at t = 0 is chosen to be the Néel state, i.e. spins pointing up and down in a checkerboard pattern with sublattices A and B visualized in Fig. 31. To map the two-dimensional system onto a one-dimensional MPS, we use a tilted z-shaped mapping (cf. Fig. 31) as suggested in Ref. [82]. This mapping transports entanglement between the vertical and horizontal bipartitions of the lattice through O(L) bonds, while diagonal bipartitions only cut a single bond. However, nearest-neighbor interactions now have to be carried through ≈ √2 L other sites. Under real-time evolution, the Néel order of the state is expected to decay, and at the same time the entanglement of this initial product state will grow with time. To monitor this decay, we consider the staggered magnetization per site, which we define as m(t) = 1/L² |Σ_{i∈A} ⟨ŝ^z_i(t)⟩ − Σ_{i∈B} ⟨ŝ^z_i(t)⟩|. Initially, m(0) = 1/2. In addition to m(t), we also measure the energy of the state to check the accuracy of energy conservation in the local Krylov and TDVP approaches. Furthermore, we note that the system has an inversion symmetry: mirroring the initial Néel state on the MPS chain at the central bond results in the same local expectation values. Since the Hamiltonian does not break this symmetry, it has to be preserved under real-time evolution. Hence, computing the deviation from this symmetry gives a measure for the error of the time-evolution scheme. In our test, we set L = 8, i.e. work with 8 × 8 lattices of 64 sites with open boundary conditions in both directions. We limit the CPU time for computing the time evolution to one hour on a single core of a Xeon E5-2630 v4 clocked at 2.20 GHz. After this hour, we then evaluate observables on the stored MPS representing the individual time-evolved states.
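The staggered magnetization is straightforward to evaluate from the local expectation values; a short sketch (assuming the definition amounts to the sublattice-magnetization difference per site, as the original expression is garbled in this excerpt):

```python
import numpy as np

def staggered_magnetization(sz):
    """m = |sum_A <s^z_i> - sum_B <s^z_i>| / L^2 for an L x L array of <s^z>.

    Sublattices A and B are the two colors of the checkerboard, encoded
    here via the sign pattern (-1)^(row + col).
    """
    L = sz.shape[0]
    signs = (-1.0) ** np.add.outer(np.arange(L), np.arange(L))
    return abs(np.sum(signs * sz)) / L**2

# Néel state: <s^z> = +1/2 on A and -1/2 on B, giving m = 1/2
L = 8
neel = 0.5 * (-1.0) ** np.add.outer(np.arange(L), np.arange(L))
assert np.isclose(staggered_magnetization(neel), 0.5)
```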
The bond dimension of the states is limited to m = 200 and the discarded weight varied between 10^−8 and 10^−12. It appears that the maximal bond dimension limit m = 200 is not the leading source of errors, at least at short times t < 1: for such short times, calculations running for 24 CPU hours at m = 2000 produced essentially identical results. As this is a highly challenging problem for MPS methods, we prefer at this point not to investigate the limitations of our state-of-the-art approaches, but instead compare them at rather restricted bond dimensions, focusing on the short-time regime. It would be interesting to see which time scales can be reached when further optimizing our procedure (in particular increasing the number of kept states m, which quickly leads to a substantial increase in the required computational resources, but also further aspects such as the mapping to the chain geometry), and we hope that our considerations here on the short time scales may help future developments of such optimized schemes for treating 2D or quasi-2D systems.
In the following, we choose time steps δ to be either 0.1 or 0.01 to verify approximate convergence and to investigate the error caused by the finite time step.
TDVP and product initial states. The state |ψ(0)⟩ is a product state and can be represented at bond dimension m = 1. Attempting to time-evolve this MPS with TDVP or the local Krylov method is pointless, as many of the long-range interactions are lost during the projection step and the resulting time evolution is simply wrong. To avoid this problem, we run an initial DMRG calculation with the Hamiltonian supplemented by a term of strength α. We initialize α = 1 and then, for ten successive sweeps, reduce it by a factor of ten each time. Finally, we set α = 0 and run five more sweeps. Throughout, we use the subspace expansion to increase the bond dimension of the state up to m_init = 50, 100, 200 and set the discarded-weight threshold to zero to keep the state at this artificially large bond dimension. Using the time-evolution Hamiltonian Ĥ together with the subspace expansion guarantees that the generated additional states have some relevance to the physical problem, being generated from (partial) applications of Ĥ to |ψ(0)⟩. The DMRG calculation takes up to 120 seconds; the allowed runtime of the TDVP and local Krylov methods is hence reduced accordingly.
Time-evolving block decimation. The TEBD2 requires splitting the Hamiltonian into internally-commuting parts. To minimize the bond dimensions of the MPOs, each column and each row of the lattice is split into two Hamiltonians. These Hamiltonians then do not contain overlapping gates and have a maximal bond dimension of 4. For optimal inversion symmetry, we apply gates on rows 1 and 8 first, then on rows 2 and 7, rows 3 and 6 and finally rows 4 and 5 (or vice versa) and proceed in the same way with the column terms. These MPOs are applied using the zip-up method which was tested to be as precise as the variational optimization here, but much faster.
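The second-order splitting underlying TEBD2 can be checked on a tiny system by exact diagonalization. The sketch below uses an even/odd split of a small Heisenberg chain as a stand-in for the row/column split described above (all names and sizes are illustrative) and verifies the expected O(δ³) per-step error by halving the step:

```python
import numpy as np
from scipy.linalg import expm

# Second-order Trotter splitting (the scheme behind TEBD2):
# exp(-i d H) ~ exp(-i d/2 H_odd) exp(-i d H_even) exp(-i d/2 H_odd)
L = 6
sz = np.diag([0.5, -0.5])
sp = np.array([[0.0, 1.0], [0.0, 0.0]])
h2 = np.kron(sz, sz) + 0.5 * (np.kron(sp, sp.T) + np.kron(sp.T, sp))

def bond(i):   # embed the two-site Heisenberg term on bond (i, i+1)
    return np.kron(np.kron(np.eye(2**i), h2), np.eye(2**(L - i - 2)))

H_even = sum(bond(i) for i in range(0, L - 1, 2))   # internally commuting
H_odd = sum(bond(i) for i in range(1, L - 1, 2))    # internally commuting

def trotter_error(d):
    U2 = expm(-1j * d / 2 * H_odd) @ expm(-1j * d * H_even) \
        @ expm(-1j * d / 2 * H_odd)
    return np.linalg.norm(U2 - expm(-1j * d * (H_even + H_odd)), 2)

# halving the step should reduce the per-step error by roughly 2^3 = 8
ratio = trotter_error(0.1) / trotter_error(0.05)
```

The symmetric gate ordering on rows and columns described in the text serves the same purpose as the symmetric odd-even-odd ordering here.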
The global Krylov method. As can be expected, repeatedly applying Ĥ to the Néel state generates large amounts of entanglement. This is problematic for the global Krylov method, as its Krylov vectors would ideally have a very large bond dimension. To still proceed with the calculation, we force the truncation of the Krylov vectors to a maximal bond dimension m = 200. This truncation can lead to very wrong results when measuring long-range correlators but should be acceptable for the staggered magnetization m(t). Given the initial product state, the variational orthogonalization has to work in the two-site variant.

The staggered magnetization m(t)
We find that neither the 1TDVP nor the 2TDVP results change with the step size, while some minimal changes can be seen in the local Krylov results between δ = 0.1 and δ = 0.01 (cf. Fig. 32). Furthermore, we find that the 2TDVP and local Krylov results for m(t) do not change with the initial bond dimension m_init = 50, 100, 200 as obtained by DMRG and converge to the same result at least for short times. The 1TDVP results do depend on this initial bond dimension (cf. Fig. 33) and even in the best case of m_init = 200 only qualitatively reproduce the other results. In the following, we hence use m_init = 50, δ = 0.1 for 2TDVP, m_init = 100, δ = 0.1 for 1TDVP and m_init = 50, δ = 0.1 and δ = 0.01 for the local Krylov method.
The global Krylov results change neither with the step size (δ = 0.01 or δ = 0.1) nor with the discarded weight (10^−8, 10^−10 or 10^−12) and are in good agreement with the other methods until t ≈ 0.7. Beyond this point, the method provides worse data than 1TDVP due to the very strong truncation of the Krylov vectors. TEBD2 provides data within the range of the other methods at δ = 0.1 with the zip-up algorithm and a discarded weight of 10^−8. At smaller δ or when using the variational optimization for the operator application, the evolution only reaches times t ≈ 0.5 (but remains in agreement with all other methods).
When using the MPO W II method with large time steps δ = 0.1, the results very quickly deviate from the other methods; in the SyTen implementation (using either zip-up or variational application), additional instabilities occur around t ≈ 4. At δ = 0.01, the data is in line with the other methods mostly independently of the discarded weight. The variational operator application is much slower than the zip-up method here, so we use the latter also for the MPO W II method. Fig. 34 shows the staggered magnetization for the selected configurations of each method. The global Krylov method is handicapped by its need to represent highly-entangled Krylov vectors at m = 200, which leads to large errors. The MPO W II method suffers a large time-step error, which in turn increases the runtime by approximately a factor of ten compared to those methods which can use a larger step size. TEBD2 provides very reasonable data up to t = 3, and 2TDVP and the local Krylov method are able to go beyond, up to t ≈ 7 and t ≈ 10 respectively. However, as the truncation error is of the order of 10^−2 per step in both methods at larger t, the results will have large quantitative errors at large times, as discussed next.
(Figure caption: step size is δ = 0.1; the best TEBD2 result is plotted as a reference. The 2TDVP and the local Krylov method do not strongly depend on the initial bond dimension.)

Energy conservation and inversion symmetry
Under the unitary time evolution, both the energy of the system and the inversion symmetry over the central MPS bond should be conserved. To measure the latter, we consider the maximal deviation of the local expectation values from those of the mirrored sites. Considering the same configurations as before, we find in Fig. 35 that the inversion error stays relatively small for the 2TDVP, the local Krylov and the TEBD2 methods, while it becomes up to an order of magnitude larger for 1TDVP, the global Krylov and the MPO W II methods at intermediate times. At very short times, we find TEBD2 and (with δ = 0.01) the local Krylov method to have the smallest inversion-symmetry error. The other methods (run at δ = 0.01) also provide similarly good results (not shown).
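Such a symmetry check is cheap to implement; a hypothetical sketch (the precise definition used in the paper is not reproduced in this excerpt, so we take the maximal deviation between a profile of local ⟨ŝ^z⟩ values and its mirror image):

```python
import numpy as np

def inversion_error(sz_profile):
    """Maximal deviation between <s^z_i> and its mirror partner <s^z_{L+1-i}>."""
    sz_profile = np.asarray(sz_profile, dtype=float)
    return float(np.max(np.abs(sz_profile - sz_profile[::-1])))

assert inversion_error([0.5, -0.3, -0.3, 0.5]) == 0.0            # symmetric
assert np.isclose(inversion_error([0.5, -0.3, -0.2, 0.5]), 0.1)  # broken by 0.1
```

Since the exact dynamics preserve the symmetry, any nonzero value is attributable to the approximations made by the time-evolution scheme.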
Considering the error in the energy per site, |E(t) − E_0|/L² (cf. Fig. 36), provides nearly the same picture except for a much larger difference between the local Krylov method and 2TDVP: the latter conserves energy to a much higher precision than the former. 1TDVP (not shown) conserves the energy up to 10^−7 even at very late times, while the global Krylov method is very precise until t ≈ 0.7, where its error quickly increases.

Conclusion
Obtaining the time evolution in 2D is, even for a non-entangled product initial state, a challenge for all methods. Even though the projection steps in the 2TDVP, the local Krylov, and the 1TDVP methods incur problems with such product initial states, this can be healed by simply increasing the initial bond dimension as discussed above, which is also often done in tDMRG in such situations. Apart from better energy conservation, 2TDVP and the local Krylov method are nearly comparable, with the latter being approximately 20-30% faster (however, the maximal bond dimension m = 200 is here already exhausted at t < 1). TEBD2 mostly suffers from slow MPO-MPS products at large bond dimensions but actually has an acceptable error caused by the finite time step, allowing us to choose δ = 0.1. In contrast, the MPO method sports fast MPO-MPS products (due to the smaller Ŵ II ) but incurs a large time-step error. It hence appears that for longer time scales, the 2TDVP or local Krylov method are the most promising approaches, while for short times, any of the methods work reasonably well.

Out-of-time-order correlators of a periodically driven Ising chain
In this final example we consider out-of-time-order correlators (OTOCs), which can be interpreted physically as a measure for the causal relation between correlators during time evolution. This is typically done at non-zero temperature, so we need to describe the system in an enlarged Hilbert space H = H_P ⊗ H_A to be able to represent mixed states, cf. Sec. 2.7. Here, we work in the infinite-temperature limit β = 0 of a canonical ensemble with S^z = 0. As carried out in Sec. 7.5, the time evolution of a purified, (in-)finite-temperature state can be manipulated by shifting the purification insertion point so that we can evaluate the OTOC by performing two time evolutions and calculating only local expectation values:

F^{ẑ_r ẑ_0}_β = ⟨Z(β, t)| ẑ†_{P;r} ⊗ ẑ^t_{A;r} |Z_0(t)⟩ (220)
|Z_0(t)⟩ = [Û_P(t) ⊗ Û_A(t)] [ẑ†_{P;0} ⊗ 1] |0⟩ (221)
|Z(β, t)⟩ = [Û_P(t) ⊗ Û_A(t)] [ρ̂_P(β/2) ⊗ ẑ_{A;0} ρ̂_A(β/2)] |0⟩ . (222)
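For orientation, the infinite-temperature OTOC can be cross-checked on a tiny chain by exact diagonalization, where the trace over the full Hilbert space replaces the purification machinery. The sketch below uses a static transverse-field Ising chain as a stand-in (the text's model is periodically driven; couplings here are illustrative):

```python
import numpy as np
from scipy.linalg import expm

# Tiny exact-diagonalization cross-check of the infinite-temperature OTOC
# F(r, t) = Tr[ z_r(t) z_0 z_r(t) z_0 ] / 2^L  with  z_r = s^z_{L/2 + r}.
L, g = 6, 1.05
sz = np.diag([0.5, -0.5])
sx = np.array([[0.0, 0.5], [0.5, 0.0]])

def op(o, i):   # single-site operator embedded into the L-site chain
    return np.kron(np.kron(np.eye(2**i), o), np.eye(2**(L - i - 1)))

H = sum(-4.0 * op(sz, i) @ op(sz, i + 1) for i in range(L - 1))
H = H + sum(-2.0 * g * op(sx, i) for i in range(L))

def otoc(r, t):
    U = expm(-1j * t * H)
    z0 = op(sz, L // 2)
    zr_t = U.conj().T @ op(sz, L // 2 + r) @ U     # Heisenberg picture
    return np.real(np.trace(zr_t @ z0 @ zr_t @ z0)) / 2**L
```

At t = 0 and r ≠ 0 the operators commute, so F reduces to Tr[(s^z)²(s^z)²]/2^L = 1/16; deviations from this plateau at later times signal operator spreading.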
We calculated the time evolutions for 32, 64 and 128 physical sites with step sizes δ = 0.05, 0.01 and kept m = 1000 as the maximum number of states. We find the global Krylov method to be unsuitable for the current problem: due to the infinite-temperature initial state and the time evolutions acting on both the physical and auxiliary system, the Krylov vectors are highly entangled from the very beginning, forbidding a calculation on reasonable time scales.

Spreading of the light cone
In our setup we considered open boundary conditions, so we need to ensure that excitations from the boundary do not disturb the OTOC. With the chosen set of coupling constants this is fulfilled by restricting the maximal computed time evolution to kT ≤ L/2; in order to study the hydrodynamic tail we will therefore discuss only the results for the largest system with L = 128 physical sites. In Fig. 37 the OTOC F^{ẑ_r ẑ_0}_{β=0}(t) obtained with 2TDVP and a step width δ = 0.01 is shown as an example for the time evolution at all relative lattice distances r with respect to the central site of the system. We have marked the envelope of the spreading light-cone front and the bulk for the case of 2TDVP, which gives a quantitative picture of the broadening of the light-cone front.
(Fig. 38 caption: F^{ẑ_r ẑ_0}_{β=0}(t) of a periodically driven Ising chain with local operators ẑ_r = ŝ^z_{L/2+r} at constant times and δ = 0.05. 2TDVP appears to resemble the expected spreading, while TEBD2, the local Krylov and the MPO W II methods exhibit only a reduced spreading and create artificial offsets; results from these simulations for time slices t = 50, 62 are not shown for the sake of clarity.)
Fig. 38 shows the corresponding results obtained with the local Krylov method at δ = 0.05. Comparing to the time evolution obtained with 2TDVP, we find that both the spreading of the light-cone front and the broadening of the light cone itself are much less pronounced and disturbed by rays in the light cone which decay only slowly. These rays are generated by structure inside the light cone which is absent in the 2TDVP datasets. Apart from even-odd effects, we would expect the dynamics inside the light cone to generate a homogeneous amplitude of the OTOC, strengthening the 2TDVP results. Similar plots can be obtained for the MPO W II method, which suffers from the same effects as the local Krylov method.
In order to investigate the light-cone spreading more carefully, we show in Fig. 39 the decay of the light cone at various slices throughout the right half of the system at constant times t = 2, 14, 26, 38, 50 and 62 at fixed time step δ = 0.05. Only at the first two time slices do all methods (nearly) agree with each other, displaying the expected broadening of the light-cone front. The results for the 2TDVP method show an overall decay of the slope at the light-cone front at later times, as expected. In turn, the local Krylov, TEBD2 and MPO W II methods produce a decay of the slope of the light-cone front under the time evolution which can be attributed to additional ray-like structures in the light cone, cf. Fig. 38. Furthermore, we find that the saturation value outside the light cone for these three methods lowers during the time evolution. This can be linked to additional phases built up during the time evolutions between the two states |Z_0(t)⟩ and |Z(β = 0, t)⟩, as already observed in Sec. 8.1.

Hydrodynamic tail
In the case of periodic boundary conditions the long-time behavior behind the light-cone front, which travels with the butterfly velocity v_B, is expected to exhibit an algebraic power-law decay, i.e., for v_B · t ≫ |r| the OTOC is dominated by a hydrodynamic tail. We check this by comparing to an indicated reference function f(t) = a/√(v_B · t − 1), where the parameters a and v_B are obtained by a suitable fit of the 2TDVP datasets. Furthermore, there are only small deviations when tuning the time steps, strengthening the validity of these results. However, comparing against the local Krylov, MPO W II and TEBD2 datasets, there is agreement only until t ≈ 10T, after which the methods start to deviate and exhibit quite distinct oscillatory behavior. It seems that at least for the local Krylov (at δ = 0.01) and MPO W II (at δ = 0.05, 0.01) methods one can identify the expected hydrodynamic tail, but the data is far from reliable. Since the initial states describe a (perturbed) ensemble at infinite temperature with comparably large initial bond dimensions (m ∼ 150), we could expect the projection errors in 2TDVP and the local Krylov method, respectively, to be very small because there will be a finite overlap with a large portion of the states in the many-body Hilbert space. Subsequently, the effect of the perturbation ŝ^z_{L/2} on the infinite-temperature state is to create quasiparticles from the entire spectrum, resulting in complicated dynamics which are a tough challenge for all time-evolution schemes. Nevertheless, since TDVP (1TDVP exactly, 2TDVP approximately) fulfills conservation laws for the energy, particle number etc., it is not too surprising to find the proper long-time dynamics being captured by TDVP. Even though the datasets created by 2TDVP are physically reasonable, the local Krylov method, as the other method approximating the action of the exponential locally, suffers from similar phase accumulation as in Sec. 8.1.
Consistently we also find the strongest dependence on the step size δ for the local Krylov method.
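The fit of the reference function described above can be sketched as follows; the data here is synthetic (mock values), and reading the garbled expression as f(t) = a/√(v_B·t − 1) is our assumption:

```python
import numpy as np
from scipy.optimize import curve_fit

# Fitting the hydrodynamic reference function f(t) = a / sqrt(v_B * t - 1)
# to late-time OTOC data; a and v_B are the fit parameters from the text.
def f(t, a, vB):
    return a / np.sqrt(vB * t - 1.0)

t = np.linspace(5.0, 60.0, 50)
mock = f(t, 0.8, 1.3) + 1e-3 * np.sin(3.0 * t)   # small deterministic "noise"
(a_fit, vB_fit), _ = curve_fit(f, t, mock, p0=(1.0, 1.0),
                               bounds=([0.1, 0.25], [10.0, 10.0]))
```

The lower bound on v_B keeps the square-root argument positive over the fitted time window; in practice one would restrict the window to times well behind the light-cone front.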

Benchmark
Finally, we take a look at the benchmark data for the calculated time evolutions, using the evolution of the state |Z_0(t)⟩ as an example. The calculations were performed single-threaded on an Intel Xeon Gold 6150 CPU with 192 GB of RAM and no hard-disk caching. All methods are subject to a rapid increase of the maximal bond dimension m (cf. Fig. 41). 2TDVP exhibits the quickest increase and already reaches the maximal permitted m at times t < 5T, while the MPO W II method has the slowest growth and saturates around t ≈ 10T, which may be attributed to the variational optimization during the MPO application.
In Fig. 42 we plot the CPU time required per unit time T, averaged over one driving cycle. We find the best performance for 2TDVP, followed by TEBD2 and the MPO W^II method. Interestingly, the performance of the MPO W^II method at δ = 0.05 is very competitive with TEBD2 at the same step size, but when reducing the step size towards δ = 0.01, the variational update in the MPO application, which is absolutely necessary to ensure numerical stability in this test case, renders the simulations extremely slow. The large difference between the runtimes of 2TDVP and the local Krylov method can be explained by inspecting the number of local Krylov vectors required to achieve the target precision of 10^{-10}. TDVP typically requires only 5 iterations of the local Krylov solver to converge, while the local Krylov method reaches the maximum number of permitted iterations (12) already after a few time steps. Consequently, the local Krylov method is unsuitable for small time steps such as δ = 0.01.

Conclusion
We have tested four time-evolution methods, 2TDVP, TEBD2, the local Krylov method and MPO W^II, for their ability to describe the time evolution of out-of-time-order correlators for a periodically-driven Ising chain. The best results are provided by 2TDVP with respect to both physical accuracy and performance. TEBD2 again yields a competitive runtime, at least for larger step sizes, but the data fails to reproduce the broadening of the lightcone, and even the hydrodynamic tail of the OTOC is only qualitatively reproduced. Both the local Krylov and MPO W^II methods fail to adequately reproduce the broadening of the lightcone front at the numerically feasible step size δ = 0.05 and, more severely, pick up global offsets in the OTOC at long runtimes. The hydrodynamic tail may be identified in these datasets, but confirming it reliably would require a very careful analysis with more time steps and different maximal bond dimensions m. Since fulfilling conservation laws appears advantageous for capturing the complicated dynamics inside the lightcone, possibly the best strategy would be to perform a 2TDVP time evolution until the bulk bond dimensions have saturated at the permitted maximum value and then switch to 1TDVP. In addition to the results presented here using the purification approach, it was recently shown that TDVP, when used with METTS, can also provide reasonable data for OTOCs [86].

Future developments
While one should of course not lose sight of new alternative methods to evaluate excitation spectra [87][88][89][90][91] or Green's functions [92], the direct evaluation of real-time observables will always find interesting applications. For the future, at least three distinct directions for new developments of real-time evolution methods should be considered: Improved MPS-local methods. Following the spirit of tDMRG, the local Krylov method and TDVP may provide the best avenue towards hydrodynamics, with the 1TDVP method already enforcing complete energy conservation though still suffering from a limited bond dimension [69]. Combining, e.g., the 2TDVP method with a truncation scheme which also ensures energy conservation [57] may prove particularly fruitful.
Easier and more accurate construction of Û(δ). At the moment one has to use either the TEBD2 or the MPO W^{I,II} method to construct an MPO representation of Û(δ). Both incur relatively large time-step errors and require more than simply a set of tensors describing the MPO as input. One potential alternative here is a construction as (1 − iδ′Ĥ)^N with N · δ′ = δ and N ≫ 1. The ingredients can all be represented within the MPO framework, and MPO compression may provide a very helpful tool for conserving as much accuracy as possible during the construction [93]. Alternatively, combining different approaches, the newest example being a large-scale Trotter decomposition combined with a small-scale Krylov method, may also lead to flexible time steppers which suffer only from minimal time-step-size errors [62].
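The first-order product construction above can be illustrated on a small dense matrix. This is a hedged sketch with a random Hermitian stand-in for Ĥ; in the MPS context each factor (1 − iδ′Ĥ) would be an MPO and the running product would be compressed after every multiplication, whereas here plain matrices suffice to show how the error shrinks with N.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (A + A.conj().T) / 2                  # random Hermitian "Hamiltonian"

delta = 0.1
U_exact = expm(-1j * delta * H)           # reference U(delta)

errs = []
for N in (1, 10, 100):
    step = np.eye(d) - 1j * (delta / N) * H      # single Euler factor
    U_approx = np.linalg.matrix_power(step, N)   # (1 - i*delta'*H)^N
    errs.append(np.linalg.norm(U_approx - U_exact, 2))
    print(f"N = {N:4d}: spectral-norm error {errs[-1]:.2e}")
```

The per-factor error is O(δ′²), so the accumulated error decreases as O(δ²/N); note that each Euler factor is slightly non-unitary, which is one reason compression and normalization matter in the MPO version.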
"Better" Krylov vectors. The major drawback of the global Krylov method at the moment is the fast growth of entanglement in the Krylov vectors. While it is used where absolutely necessary -due to its flexibility in the choice of time step sizes -alternative methods provide much faster long-time evolution. In addition to this flexibility, it also only requires an MPO representation as input, making it potentially useful in applications where decompositions are not straightforward. Hence, an approach to use it also in complex settings would be highly appreciated. This could be done by using less entangled Krylov-like subspaces whose basis vectors can be represented efficiently as matrix-product states. Alternatively, one can envision a second basis transformation finding minimally-entangled basis vectors for the standard Krylov subspace.